RDF by Example: rdfpuml for True RDF Diagrams, rdf2rml for R2RML Generation

Vladimir Alexiev

<2022-09-20>

Introduction

See these publications:

RDF is a graph data model, so the best way to understand RDF data schemas (ontologies, application profiles, RDF shapes) is with a diagram. Many RDF visualization tools exist, but they either focus on large graphs (where the details are not easily visible), or the visualization results are not satisfactory, or manual tweaking of the diagrams is required.

If the example instances include embedded source field names, they can describe a mapping precisely. I've implemented a few more tools to generate transformations:

See http://twitter.com/hashtag/rdfpuml for news, diagrams and announcements.

License and Citation

This work is covered by the Artistic-2.0 license.

If you use this software, please cite it as shown above.

Documentation

rdfpuml shots

Everyone likes screenshots!

But until I do it, here are a couple of shots.

Shot: Company and Person Info

Information about Sirma Group and Atanas Kiryakov from Refinitiv PermID:

test/permid/sample-SirmaGroup-AtanasKiryakov.png

Shot: Complex Class Expressions

Intersections and restrictions, represented using blank nodes and rdf:List:

test/complex-types/example2.png

Shot: CIDOC CRM and Reification

The tool can depict reification using rdf:Statement and various other constructs, in this case crm:E17_Type_Assignment:

doc/img/CONA-types-reification.png

Shot: Arrow Styles

doc/img/arrowStyles.png

Unicode (actual) and semantic URL links for each node and property (idea). "Zmrzlina" is Czech for "ice cream" and may have the highest concentration of consonants in the world:

test/unicode/svg-links.png

rdf2sparql shots

Declarative model (left) vs generated SPARQL Update for OntoRefine (right two):

doc/img/rdf2sparql-organizations.png

Software:

The following papers use or mention this software:

Installation

Check out this repo and add rdf2rml/bin to your path. Install the following prerequisites:

Docker Image

If you prefer to work with Docker so you don't need to install software manually, you can use this rdf2rml image from the public Nexus (Docker Registry) of Ontotext. To run it, use:

docker run -v <directory>:/files --rm docker-registry.ontotext.com/rdf2rml:latest

Here <directory> is the local directory holding your .ttl files. The image was built on 31 May 2023 and uses the following versions:

Note: pull request 7 of 17 Sep 2019 by Jem Rayfield (@jazzyray) dockerizes the installation and makes extra changes related to input/output and configuration. However, it has not been merged yet.

Debian Repo

Jonas Smedegaard (@jonassmedegaard, dr at jones fullstop dk) has volunteered for some of the tasks below. His development is at https://salsa.debian.org/debian/rdf2rml/branches. To adopt changes, do something like this.

ToDo Tasks

See CHANGELOG.org for implemented features and fixes.

Help needed for the following tasks. Post bugs and enhancement requests to this repo!

Near-term

Modularize and Package Better

Regression Tests

rdf2rml: disentangle inverse edge

In the case Y-P-X described above:

Release on CPAN

Add Unicode tests

Add ttl with non-ASCII chars: Accented, Cyrillic, French, etc.

Prefixes

Allow specifying the prefixes file

See https://github.com/VladimirAlexiev/rdf2rml/pull/7

Eliminate Curie.pm

./lib/RDF/Prefixes/Curie.pm remembers @base and uses that for URL shortening. Once perlrdf#131 is fixed, eliminate this dependency (a local module).

Remember prefixes from input file

rdfpuml shortens URLs using prefixes only from prefixes.ttl, but should also use prefixes defined in the individual input file.
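
The idea can be illustrated with a small sketch. It is written in Python purely for illustration (rdfpuml itself is Perl), and the function names collect_prefixes and shorten are hypothetical, not part of the tool. It assumes prefix declarations each start on their own line, and merges the prefixes of the shared file with those of the input file, letting the input file win:

```python
import re

# Matches the start of Turtle-style "@prefix ex: <http://...>" and
# SPARQL-style "PREFIX ex: <http://...>" declarations (one per line).
PREFIX_RE = re.compile(
    r"^\s*(?:@prefix|PREFIX)\s+([A-Za-z][\w.-]*)?:\s*<([^>]*)>",
    re.IGNORECASE | re.MULTILINE,
)

def collect_prefixes(*turtle_texts):
    """Merge prefix declarations; later texts override earlier ones."""
    prefixes = {}
    for text in turtle_texts:
        for name, iri in PREFIX_RE.findall(text):
            prefixes[name] = iri  # name is "" for the default prefix
    return prefixes

def shorten(iri, prefixes):
    """Shorten iri with the longest matching namespace, else return <iri>."""
    best = None
    for name, ns in prefixes.items():
        if ns and iri.startswith(ns) and (best is None or len(ns) > len(best[1])):
            best = (name, ns)
    if best is None:
        return f"<{iri}>"
    return f"{best[0]}:{iri[len(best[1]):]}"

# Hypothetical contents of prefixes.ttl and of one input file
shared = "@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .\n"
local = "@prefix crm: <http://www.cidoc-crm.org/cidoc-crm/> .\n"
prefixes = collect_prefixes(shared, local)
print(shorten("http://www.cidoc-crm.org/cidoc-crm/E17_Type_Assignment", prefixes))
```

A real implementation would take the prefixes from the RDF parser rather than from a regular expression. The sketch only shows the intended precedence between the two sources.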

Support more RDF Formats

Currently only Turtle is supported, because prefixes.ttl is concatenated to the main file. If all prefixes can be collected from the RDF files themselves, such concatenation won't be needed.

Batch Processing

Issue #1: plantuml is slow to start up, so we'd like to process a bunch of puml files at once. The best way is to have a smarter script or Makefile that uses the following http://plantuml.com/command-line features:

"Manual" Batching

Before I discovered the -checkmetadata option, I had the idea that rdfpuml could put several diagrams in one puml file:

@startuml file1.png
  ' made from file1.ttl
@enduml
@startuml file2.png
  ' made from file2.ttl
@enduml

However, this interferes with make processing that regenerates only png for changed ttl files, and makes things less modular overall.
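
The make-style alternative (regenerate only what changed, but start PlantUML once for the whole batch) could look roughly like the sketch below. It is Python for illustration only. The function names are hypothetical, and the `plantuml` launcher is an assumption: some installations provide such a wrapper script, while others need `java -jar` with the path to the PlantUML jar.

```python
from pathlib import Path

def stale_sources(directory, src_ext=".puml", out_ext=".png"):
    """Return source files whose output is missing or older than the source."""
    stale = []
    for src in sorted(Path(directory).glob(f"*{src_ext}")):
        out = src.with_suffix(out_ext)
        if not out.exists() or out.stat().st_mtime < src.stat().st_mtime:
            stale.append(src)
    return stale

def plantuml_command(files, launcher=("plantuml",)):
    """Build one command line for all files, so the JVM starts only once.

    `launcher` is an assumption; adjust it to the local installation,
    e.g. ("java", "-jar", "/path/to/plantuml.jar").
    """
    return [*launcher, *(str(f) for f in files)]
```

To run the batch, pass the result to `subprocess.run(plantuml_command(stale_sources("test")), check=True)`, skipping the call when the list of stale files is empty.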

Mid-Term

Upgrade to use Attean

Trine (Perl RDF) is end of life. Attean is the new generation.

Integrate in Emacs org-mode

Write Turtle, see diagram (easy to do)

Node colors, icons, tooltips

See ./ideas

More arrow types and styles

dotted|dashed|plain|bold|hidden|norank|single|thickness

Extra Layout Options

Local layout options are described in Help on Layout:

Global options include (eg see this diagram):

skinparam Linetype ortho
skinparam NodeSep 80
skinparam RankSep 80
skinparam Padding 5
skinparam MinClassWidth 40
skinparam SameClassWidth true

And there are a lot more undocumented features: https://forum.plantuml.net/7095

Custom Reification

Ability to describe custom reification situations using the Property Reification Vocabulary (PRV)

Use MindMap/WBS for Hierarchies

PlantUML now has MindMap and WBS (or OBS) diagrams that use a simple bulleted syntax to draw hierarchies.

It would be nice to use this to draw hierarchies of individuals, in particular taxonomies.

Here are examples of the two styles:

Long-Term

rdf2soml to Generate Semantic Object Models

A new tool rdf2soml to generate Ontotext Platform SOML from RDF examples.

What's missing? Most importantly: property cardinality and virtual inverses.

PlantUML can show arrow cardinalities, and this simple and natural PlantUML code:

X "0:1" -left-> "1:m" Y : prop/\ninvProp

Is depicted as follows:

There are two options for expressing this in triples:

Cardinality With RDF*

##### model triples
:X :prop :Y.
##### puml triples
<< :X :prop :Y >>
  puml:arrow puml:left; # direction
  puml:min 1; puml:max puml:inf; # cardinality
  puml:inverseAlias [puml:min 0; puml:max 1; puml:name "invProp"]. # virtual inverse

Cardinality With Blank Node

##### model triples
:X :prop :Y.
##### puml triples
:X puml:left :Y. # direction
:X :prop [ # a puml:Cardinality; # may need this marker class to skip the node from the diagram
  puml:min 1; puml:max puml:inf; # cardinality
  puml:object :Y; # only needed if X has several relations "prop" and they need different annotations
  puml:inverseAlias [puml:min 0; puml:max 1; puml:name "invProp"] # virtual inverse
].
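
Whichever encoding is chosen, the generator would have to turn the min/max annotations back into the PlantUML arrow shown earlier. The following is a minimal sketch of that step, in Python for illustration only. The names INF, card and arrow are hypothetical, and the placement of the inverse cardinality at the subject end is inferred from the example arrow above.

```python
INF = "inf"  # stands in for puml:inf (unbounded maximum)

def card(min_, max_):
    """Format a cardinality such as "0:1" or "1:m" (m means unbounded)."""
    return f"{min_}:{'m' if max_ == INF else max_}"

def arrow(subj, prop, obj, min_, max_, inverse=None, direction=None):
    """Render one PlantUML arrow line with cardinality labels.

    `inverse` is an optional (min, max, name) tuple for the virtual inverse;
    its cardinality is placed at the subject end of the arrow.
    """
    head = f"-{direction}->" if direction else "-->"
    left = f' "{card(inverse[0], inverse[1])}"' if inverse else ""
    label = prop + (f"/\\n{inverse[2]}" if inverse else "")
    return f'{subj}{left} {head} "{card(min_, max_)}" {obj} : {label}'

# Reproduces the example arrow from the annotations used in both options
print(arrow("X", "prop", "Y", 1, INF, inverse=(0, 1, "invProp"), direction="left"))
```

The same function covers the plain case without a virtual inverse, where only the object-end cardinality is emitted.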

rdf2shape to Describe & Generate RDF Shapes

Visualize RDF Shapes (SHACL and ShEx)

Issue #8: discussion with Thomas Francart of Sparna

I developed this SHACL to PlantUML converter, in Java, based on TopQuadrant SHACL lib, and the result is at https://shacl-play.sparna.fr/play/draw and code at https://github.com/sparna-git/shacl-play/tree/master/shacl-diagram

I don't have a strong opinion on the example you provide, an alternative idea that comes to my mind is

:node1 :link [
  rdf:value :node2;
  puml:min 1 ;
  puml:max 2 ;
]

But this changes the structure of the example graph itself, which might not be convenient

Generate transformations for other than relational sources

R2RML works great for RDBMS, but how about other sources? Extend rdf2rml to generate: