SHACLer
SHACL
The Shapes Constraint Language (SHACL) allows validating a dataset that has been created according to an RDF Schema, for instance the SPHN RDF Schema or a project-specific RDF Schema (see Generate a project RDF Schema from the RDF Schema Template and Instantiate data according to the project RDF Schema). For further information on SHACL, see Data validation with SHACL.
The SHACLer tool
The SPHN SHACL Generator (SHACLer) is a Python-based tool developed by the DCC.
The code is a single Python 3 file and requires only a few additional packages.
Projects can use the SHACLer to generate a set of SHACL rules from a given RDF Schema and an optional exception file. From these files, the SHACLer automatically produces a SHACL file in the Turtle format.
For a tutorial on how to run the SHACLer to generate SHACL files, see the user guide SHACL constraints in the SHACLer.
SHACLer internals
The SHACLer organizes all validation rules into NodeShapes, each centered on one class from the RDF Schema. All domain, range, restriction and cardinality annotations, as well as individuals, are collected from the RDF Schema. This information is stored in internal dictionaries and passed on to the SHACL generator.
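To illustrate the one-NodeShape-per-class idea, here is a minimal Python sketch that renders a hypothetical internal dictionary entry as a Turtle sh:NodeShape. The dictionary layout, the function name node_shape_turtle, the "Shape" naming suffix and the ex: prefix are assumptions for illustration only, not the SHACLer's actual data structures; prefix declarations (sh:, xsd:, ex:) are omitted.

```python
from typing import Dict, TypedDict


class PropertyInfo(TypedDict, total=False):
    """Hypothetical per-property constraint info (not the SHACLer's real layout)."""
    min_count: int
    max_count: int
    range: str  # prefixed name of a class or an xsd: datatype


def node_shape_turtle(cls: str, props: Dict[str, PropertyInfo]) -> str:
    """Render one sh:NodeShape for one class as a Turtle snippet."""
    shape = f"{cls}Shape"  # hypothetical naming convention for the shape IRI
    statements = [f"sh:targetClass {cls}"]
    for path, info in sorted(props.items()):
        parts = [f"sh:path {path}"]
        if "min_count" in info:
            parts.append(f"sh:minCount {info['min_count']}")
        if "max_count" in info:
            parts.append(f"sh:maxCount {info['max_count']}")
        rng = info.get("range")
        if rng is not None:
            # Assumption: xsd: ranges become sh:datatype, anything else sh:class
            key = "sh:datatype" if rng.startswith("xsd:") else "sh:class"
            parts.append(f"{key} {rng}")
        statements.append("sh:property [ " + " ; ".join(parts) + " ]")
    return f"{shape} a sh:NodeShape ;\n    " + " ;\n    ".join(statements) + " ."
```

For example, node_shape_turtle("ex:Patient", {"ex:hasBirthYear": {"min_count": 1, "max_count": 1, "range": "xsd:gYear"}}) yields a shape targeting ex:Patient with one property constraint. Keeping all constraints of a class in one NodeShape mirrors the per-class organization described above.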
In detail, to extract this information from the RDF Schema, the generator looks for all owl:ObjectProperty and owl:DatatypeProperty definitions and parses their domain and range specifications. It also parses the rdfs:subClassOf information of the classes involved. This is needed because some properties have an upper-level concept as their domain; logically, this implies that the lower-level concepts have that property as well. In addition, the SHACLer looks for owl:Restriction statements and parses them according to specific criteria, i.e., whether each one is a cardinality restriction or a restriction on a property value.
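The distinction between cardinality restrictions and value restrictions could be sketched as follows. This minimal Python example assumes the schema has already been parsed into (subject, predicate, object) tuples of prefixed names; the function classify_restrictions and its result layout are hypothetical. It uses the standard OWL restriction predicates and, for brevity, ignores owl:onClass and literal datatypes.

```python
from typing import Dict, Iterable, Tuple

Triple = Tuple[str, str, str]

# Standard OWL predicates, grouped by the kind of restriction they express
CARDINALITY = {
    "owl:cardinality", "owl:minCardinality", "owl:maxCardinality",
    "owl:qualifiedCardinality", "owl:minQualifiedCardinality",
    "owl:maxQualifiedCardinality",
}
VALUE = {"owl:someValuesFrom", "owl:allValuesFrom", "owl:hasValue"}


def classify_restrictions(triples: Iterable[Triple]) -> Dict[str, dict]:
    """Map each owl:Restriction node to its property, kind(s), values and classes."""
    triples = list(triples)
    result = {
        s: {"property": None, "kinds": set(), "values": {}, "classes": set()}
        for s, p, o in triples
        if p == "rdf:type" and o == "owl:Restriction"
    }
    for s, p, o in triples:
        if p == "rdfs:subClassOf" and o in result:
            result[o]["classes"].add(s)  # class that carries this restriction
        if s not in result:
            continue
        if p == "owl:onProperty":
            result[s]["property"] = o
        elif p in CARDINALITY:
            result[s]["kinds"].add("cardinality")
            result[s]["values"][p] = o
        elif p in VALUE:
            result[s]["kinds"].add("value")
            result[s]["values"][p] = o
    return result
```

A cardinality result would typically feed sh:minCount/sh:maxCount, while a value result would feed constraints such as sh:class or sh:in; that mapping is an assumption here, not a description of the SHACLer's code.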
Although we require RDFS inference for the validation, an upper-level concept may be excluded because it should not be instantiable on its own. Therefore, we annotate the property at all allowed levels. This also makes the shapes easier to read for a human, one concept at a time.
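Annotating a property at all allowed levels amounts to copying its domain down the rdfs:subClassOf hierarchy while skipping excluded concepts. The following minimal Python sketch assumes the domains and (subclass, superclass) pairs have already been extracted from the schema; the function annotate_all_levels and the ex: class names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def annotate_all_levels(
    domains: Dict[str, Set[str]],            # property -> declared rdfs:domain classes
    subclass_of: Iterable[Tuple[str, str]],  # (subclass, superclass) pairs
    excluded: Set[str],                      # concepts not instantiable on their own
) -> Dict[str, Set[str]]:
    """Return class -> properties, copying each property to every subclass of its domain."""
    children = defaultdict(set)
    for sub, sup in subclass_of:
        children[sup].add(sub)

    per_class = defaultdict(set)
    for prop, classes in domains.items():
        stack, seen = list(classes), set()
        while stack:  # depth-first walk down the class hierarchy
            cls = stack.pop()
            if cls in seen:
                continue
            seen.add(cls)
            if cls not in excluded:
                per_class[cls].add(prop)  # annotate the property at this level
            stack.extend(children[cls])
    return dict(per_class)
```

With ex:hasCode declared on an excluded upper-level ex:Concept, the property ends up annotated on ex:Diagnosis and its subclass ex:IcdDiagnosis, but not on ex:Concept itself.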
Validating data with the SHACL file
Data producers can use this SHACL file to validate data that has been exported according to the given RDF Schema (see Data validation with SHACL). Validating data before distribution prevents distributing data that is inconsistent with the RDF Schema (e.g., data with missing properties, data with properties that are not specified in the RDF Schema, or data with wrong data types).
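The following toy Python sketch shows the kind of checks involved (missing properties, too many values, properties not specified in the schema) on hypothetical in-memory data. It is not a SHACL engine: it performs no RDFS inference and no datatype checks, and the shape dictionary and the check_data function are assumptions for illustration. In practice, the generated SHACL file is run with a SHACL processor such as pySHACL.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Triple = Tuple[str, str, str]


def check_data(shapes: Dict[str, Dict[str, dict]], data: Iterable[Triple]) -> List[str]:
    """Toy check of minCount/maxCount and undeclared properties (not a SHACL engine)."""
    types = defaultdict(set)
    values = defaultdict(lambda: defaultdict(list))  # node -> property -> objects
    for s, p, o in data:
        if p == "rdf:type":
            types[s].add(o)
        else:
            values[s][p].append(o)

    report = []
    for node in sorted(types):
        shaped = sorted(types[node] & set(shapes))  # only classes that have a shape
        if not shaped:
            continue
        declared = set()
        for cls in shaped:
            for prop, rule in sorted(shapes[cls].items()):
                declared.add(prop)
                n = len(values[node].get(prop, []))
                if n < rule.get("min_count", 0):
                    report.append(f"{node}: missing {prop} (minCount {rule['min_count']})")
                if "max_count" in rule and n > rule["max_count"]:
                    report.append(f"{node}: too many {prop} values ({n} > {rule['max_count']})")
        # Properties present in the data but not declared for any of the node's classes
        for prop in sorted(set(values[node]) - declared):
            report.append(f"{node}: {prop} is not specified in the schema")
    return report
```

An empty report corresponds to conforming data in this toy model; a real SHACL processor additionally returns a structured validation report.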
Availability and usage rights
© Copyright 2022, Personalized Health Informatics Group (PHI), SIB Swiss Institute of Bioinformatics
The SHACLer is available at https://git.dcc.sib.swiss/sphn-semantic-framework/sphn-shacl-generator (access on request to the DCC at dcc@sib.swiss) and is licensed under the GPLv3 license.
For any question or comment, please contact the SPHN Data Coordination Center (DCC) at dcc@sib.swiss.