SHACLer

SHACL

The Shapes Constraint Language (SHACL) is used to validate a dataset against the ontology or RDF schema it was built to follow, for instance the SPHN RDF Schema or a project-specific RDF Schema (see Generate a SPHN project-specific RDF Schema and Instantiate data according to the project RDF Schema). For further information on SHACL, see Data validation with SHACL.

The SHACLer tool

The SPHN SHACL Generator (SHACLer) is a Python-based tool developed by the DCC. It consists of a single Python 3 file and requires only a few additional packages. Projects can use the SHACLer to generate a set of SHACL rules from a given RDF schema and an optional exception file. From these inputs, the SHACLer automatically produces a SHACL file in Turtle format.

For a tutorial on how to run the SHACLer to generate SHACL files, see the user guide SHACL constraints in the SHACLer.

SHACLer internals

The SHACLer generates all validation rules as NodeShapes, each centered on one class from the RDF Schema. All domain, range, restriction and cardinality annotations, as well as individuals, are collected from the RDF Schema, stored in internal dictionaries, and passed on to the SHACL generator.
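To make the per-class layout concrete, the following is a minimal sketch of how information held in a dictionary might be rendered as a NodeShape in Turtle. It is not the SHACLer's actual code: the dictionary layout, the `render_node_shape` helper, the `ex:` namespace and the example class and properties are all assumptions made for illustration.

```python
# Hypothetical sketch, NOT the SHACLer implementation.
# Assumed dictionary layout: class -> {property path: constraints}.
class_info = {
    "ex:Patient": {
        "ex:hasIdentifier": {"datatype": "xsd:string", "min": 1, "max": 1},
        "ex:hasBirthDate": {"datatype": "xsd:date", "max": 1},
    }
}

PREFIXES = (
    "@prefix sh: <http://www.w3.org/ns/shacl#> .\n"
    "@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .\n"
    "@prefix ex: <http://example.org/> .\n"
)


def render_node_shape(cls, properties):
    """Render one sh:NodeShape in Turtle that targets the class `cls`."""
    statements = ["a sh:NodeShape", f"sh:targetClass {cls}"]
    for path, constraints in properties.items():
        parts = [f"sh:path {path}"]
        if "datatype" in constraints:  # range of a datatype property
            parts.append(f"sh:datatype {constraints['datatype']}")
        if "class" in constraints:  # range of an object property
            parts.append(f"sh:class {constraints['class']}")
        if "min" in constraints:  # minimum cardinality
            parts.append(f"sh:minCount {constraints['min']}")
        if "max" in constraints:  # maximum cardinality
            parts.append(f"sh:maxCount {constraints['max']}")
        statements.append("sh:property [ " + " ; ".join(parts) + " ]")
    # The shape is named after the class, e.g. ex:PatientShape.
    return f"{cls}Shape " + " ;\n    ".join(statements) + " .\n"


turtle = PREFIXES + "\n" + "".join(
    render_node_shape(cls, props) for cls, props in class_info.items()
)
print(turtle)
```

Keeping one shape per class means every constraint that applies to a concept can be read in one place, which matches the class-centric layout described above.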

In detail, to extract the information from the RDF Schema, the generator looks for all owl:ObjectProperty and owl:DatatypeProperty declarations and parses their range and domain specifications. For range specifications, it also parses the corresponding rdfs:subClassOf information. This is needed because some properties have an upper-level concept as their domain, which logically implies that the lower-level concepts carry the property as well. In addition, the SHACLer looks for owl:Restriction and parses the information according to specific criteria (i.e., is it a cardinality restriction or a restriction on a property value?). Although RDFS inference is required for validation, an upper-level concept may be excluded because it should not be instantiable on its own; the property is therefore annotated at every allowed level. This also makes the rules easier for a human reader to follow on a per-concept basis.
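The idea of carrying a property from an upper-level concept down to its lower-level concepts can be sketched as follows. This is an illustrative sketch only, not the SHACLer's code: the schema is represented as plain (subject, predicate, object) tuples rather than a parsed RDF graph, and the `ex:` names and both helper functions are hypothetical.

```python
from collections import defaultdict

# Hypothetical schema triples (subject, predicate, object); illustration only.
triples = [
    ("ex:Measurement", "rdf:type", "owl:Class"),
    ("ex:BodyWeight", "rdfs:subClassOf", "ex:Measurement"),
    ("ex:BodyHeight", "rdfs:subClassOf", "ex:Measurement"),
    ("ex:hasDateTime", "rdf:type", "owl:DatatypeProperty"),
    ("ex:hasDateTime", "rdfs:domain", "ex:Measurement"),
    ("ex:hasDateTime", "rdfs:range", "xsd:dateTime"),
]


def subclasses_of(cls, triples):
    """Return `cls` together with all its direct and indirect subclasses."""
    children = defaultdict(set)
    for s, p, o in triples:
        if p == "rdfs:subClassOf":
            children[o].add(s)
    found, stack = {cls}, [cls]
    while stack:
        for child in children[stack.pop()]:
            if child not in found:
                found.add(child)
                stack.append(child)
    return found


def properties_per_class(triples):
    """Map each class to {property: [ranges]}, including inherited properties."""
    property_types = {"owl:ObjectProperty", "owl:DatatypeProperty"}
    properties = {
        s for s, p, o in triples if p == "rdf:type" and o in property_types
    }
    result = defaultdict(dict)
    for prop in properties:
        ranges = sorted(
            o for s, p, o in triples if s == prop and p == "rdfs:range"
        )
        for s, p, o in triples:
            if s == prop and p == "rdfs:domain":
                # Annotate the property on the domain class and on every
                # class below it in the rdfs:subClassOf hierarchy.
                for cls in subclasses_of(o, triples):
                    result[cls][prop] = ranges
    return dict(result)


per_class = properties_per_class(triples)
for cls in sorted(per_class):
    print(cls, per_class[cls])
```

In this sketch, `ex:hasDateTime` is declared once on `ex:Measurement` but ends up listed for `ex:BodyWeight` and `ex:BodyHeight` too, so a shape written for a lower-level concept would still carry the constraint even if the upper-level concept were excluded.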

Validating data with the SHACL file

Data producers can use this SHACL file to validate data that has been exported according to the given RDF Schema (see Data validation with SHACL). Validating data before distribution avoids distributing data that is inconsistent with the RDF Schema (e.g., data with missing properties, data with properties that have not been specified in the RDF Schema, or data with wrong data types).

Availability and usage rights

© Copyright 2022, Personalized Health Informatics Group (PHI), SIB Swiss Institute of Bioinformatics

The SHACLer is available at https://git.dcc.sib.swiss/sphn-semantic-framework/sphn-shacl-generator (access can be requested from the DCC at dcc@sib.swiss) and is licensed under the GPLv3 license.

For any question or comment, please contact the SPHN Data Coordination Center (DCC) at dcc@sib.swiss.