SHACL constraints in the SHACLer

Note

For an introduction to SHACL, visit the SHACL Background section.

Target Audience

This document is mainly intended for RDF experts, and possibly also for data providers, project data managers and researchers, who want to understand how SHACL constraints have been implemented in the SHACLer tool. This section first lists the assumptions made when building the SHACL constraints and then goes into detail about the SHACL rules implemented in the context of SPHN.

1. Assumptions

The SHACLer is a Python-based generator for SHACL (Shapes Constraint Language) in which the ontology is interpreted according to some assumptions. When the assumptions hold, an ontology file can be used to generate SHACL files. The SPHN ontology conforms to these assumptions starting with version 2021.1. Projects based on this version (and future versions) of the SPHN RDF schema also conform to the assumptions.

The assumptions are the following:

  • SHACL validation must be run with RDFS inference turned on. This is required because ranges may refer to upper-level concepts (e.g. SNOMED CT subtrees). Because SNOMED CT in RDF is an OWL ontology, some of its subclass relations use OWL syntax instead of RDFS syntax. To be able to apply only RDFS reasoning in the validation phase, the SNOMED CT exploit feature can be used to extend the ranges to all non-RDFS subclasses.

  • There are no further ObjectProperties/DataProperties than the ones that are defined in the ontology (although, there might be further classes with predicates).

  • An rdfs:domain or rdfs:range annotation on an object property indicates that only these properties are allowed on the corresponding classes (this also applies to inherited properties).

  • An rdfs:domain of a property pointing to an owl:unionOf list means that the property can be used on instances of any of the list items.

  • An rdfs:range of a property pointing to an owl:unionOf list means that the property must always point to an instance of one of the referenced classes (or of a subclass of one of them).

  • If there are individuals that are typed both as owl:NamedIndividual and as a class, these individuals are treated as the only allowed instances of that class.

  • owl:equivalentClass properties link SPHN concepts to external terminologies (e.g. SNOMED CT, LOINC). These properties are not picked up or evaluated during SHACL generation. Although such links are logically valid, and also technically valid when OWL 2 inference is applied, the SHACL rules focus on SPHN concepts.

  • An owl:Restriction annotation on a property overrides its rdfs:range annotation.
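The first assumption (RDFS inference turned on) can be illustrated with a minimal sketch. It is not part of the SHACLer: triples are modelled as plain Python tuples instead of using an RDF library, and the class names ex:UpperConcept and ex:ChildConcept as well as the helper rdfs_type_closure are hypothetical. The sketch shows why an instance of a subclass only counts as an instance of the upper-level concept once the inferred rdf:type triples are present.

```python
# Illustrative sketch only: triples are (subject, predicate, object) tuples of strings.
def rdfs_type_closure(triples):
    """Return the input triples plus the rdf:type triples entailed by rdfs:subClassOf."""
    parents = {}
    for s, p, o in triples:
        if p == "rdfs:subClassOf":
            parents.setdefault(s, set()).add(o)

    def ancestors(cls, seen=None):
        # Collect all classes reachable via one or more rdfs:subClassOf steps.
        seen = set() if seen is None else seen
        for parent in parents.get(cls, ()):
            if parent not in seen:
                seen.add(parent)
                ancestors(parent, seen)
        return seen

    inferred = set(triples)
    for s, p, o in triples:
        if p == "rdf:type":
            for ancestor in ancestors(o):
                inferred.add((s, "rdf:type", ancestor))
    return inferred


# Hypothetical data: a range points to ex:UpperConcept, the data uses a child class.
data = [
    ("ex:ChildConcept", "rdfs:subClassOf", "ex:UpperConcept"),
    ("ex:code1", "rdf:type", "ex:ChildConcept"),
]
closure = rdfs_type_closure(data)
print(("ex:code1", "rdf:type", "ex:UpperConcept") in closure)  # True
```

Without the inferred triple, a check such as sh:class ex:UpperConcept would not see ex:code1 as a valid value.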

2. SHACL constraints implemented for SPHN

A specific set of constraints is implemented in the SHACLer in the context of SPHN.

More details about the implemented SHACL constraints are accessible at: https://git.dcc.sib.swiss/sphn-semantic-framework/sphn-shacl-generator.

3. Template of SHACL constraints

Note

Some of the examples shown below are shortened to improve readability. The original ones can be looked up in the shacl.ttl generated for the SPHN RDF schema (here).

There exist three different node shape patterns. The first one consists of Cardinality constraints, Restriction on classes, and Literal type constraints. Restricting on individuals/instances is another implemented pattern.

Cardinality constraints

In SPHN, properties may have a specific cardinality, which means that there exists a restriction on how often a property can be used with a certain type of data instance. The cardinalities defined in SPHN are implemented in the RDF schema. They include information on:

  1. links connecting each SPHN concept to patient (via sphn:hasSubjectPseudoIdentifier), provider (via sphn:hasDataProviderInstitute), and case (via sphn:hasAdministrativeCase);

  2. the number of times specific metadata (i.e. properties) can be connected to a certain concept.

One example of these constraints concerns the properties :hasAdministrativeCase and :hasSubjectPseudoIdentifier of the class :Biobanksample. A :Biobanksample may have at most one :AdministrativeCase and must have exactly one :SubjectPseudoIdentifier. This rule is expressed by the following SHACL constraints:

constraints:Biobanksample a sh:NodeShape ;
    sh:closed true ;
    sh:ignoredProperties ( rdf:type ) ;
    sh:property [ sh:class :Biosample ;
            sh:maxCount 1 ;
            sh:minCount 1 ;
            sh:path :hasBiosample ],
        [ sh:class :AdministrativeCase ;
            sh:maxCount 1 ;
            sh:minCount 0 ;
            sh:path :hasAdministrativeCase ],
        [ sh:class :SubjectPseudoIdentifier ;
            sh:maxCount 1 ;
            sh:minCount 1 ;
            sh:path :hasSubjectPseudoIdentifier ] ;
    sh:targetClass :Biobanksample .

We can interpret this rule as follows: For all instances of the class :Biobanksample, the property :hasAdministrativeCase can be omitted (sh:minCount 0) and can be used at most once (sh:maxCount 1). For all instances of the class :Biobanksample, the property :hasSubjectPseudoIdentifier must be used exactly once (sh:minCount 1 and sh:maxCount 1).
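The cardinality rule above can be sketched in a few lines of Python. This is an illustration of what sh:minCount and sh:maxCount check, not the way a SHACL engine is implemented: triples are plain tuples, and the helper check_cardinality and the ex: instance names are hypothetical.

```python
# Illustrative sketch only: triples are (subject, predicate, object) tuples of strings.
def check_cardinality(triples, focus, path, min_count=0, max_count=None):
    """Return True if `focus` uses `path` between min_count and max_count times."""
    values = [o for s, p, o in triples if s == focus and p == path]
    if len(values) < min_count:
        return False
    if max_count is not None and len(values) > max_count:
        return False
    return True


# Hypothetical instance data for two :Biobanksample instances.
data = [
    ("ex:sample1", "rdf:type", ":Biobanksample"),
    ("ex:sample1", ":hasSubjectPseudoIdentifier", "ex:patient1"),
    ("ex:sample2", "rdf:type", ":Biobanksample"),
    ("ex:sample2", ":hasAdministrativeCase", "ex:case1"),
    ("ex:sample2", ":hasAdministrativeCase", "ex:case2"),
]

# ex:sample1 has exactly one subject identifier and no case: both rules hold.
ok_subject = check_cardinality(data, "ex:sample1", ":hasSubjectPseudoIdentifier", 1, 1)
ok_case = check_cardinality(data, "ex:sample1", ":hasAdministrativeCase", 0, 1)

# ex:sample2 has two cases and no subject identifier: both rules are violated.
bad_case = check_cardinality(data, "ex:sample2", ":hasAdministrativeCase", 0, 1)
bad_subject = check_cardinality(data, "ex:sample2", ":hasSubjectPseudoIdentifier", 1, 1)
print(ok_subject, ok_case, bad_case, bad_subject)  # True True False False
```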

Restriction on classes

A common pattern is the restriction of properties on classes: a certain property has to refer to an instance of a specific class or of one of a specific set of classes. One example where this constraint is required is the property :hasCode for instances of the class :Substance. These constraints are expressed as follows:

constraints:Substance a sh:NodeShape ;
    sh:closed true ;
    sh:ignoredProperties ( rdf:type ) ;
    sh:property [ sh:maxCount 1 ;
            sh:minCount 0 ;
            sh:or ( [ sh:class :Code ] [ sh:class sphn-atc:ATC ] [ sh:class snomed:105590001 ] ) ;
            sh:path :hasCode ],
        [ sh:class :Quantity ;
            sh:maxCount 1 ;
            sh:minCount 0 ;
            sh:path :hasQuantity ] ;
    sh:targetClass :Substance .

The above constraints can be interpreted as follows: For all instances of the class :Substance, it must hold that the property :hasCode refers to an instance of at least one of the enumerated classes (i.e. an SPHN Code, an ATC class or a SNOMED CT class of the specific value or its children). This is ensured by the usage of the SHACL expression sh:or, which lists all accepted classes.
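As a rough illustration of the sh:or check described above, the following sketch tests whether each :hasCode value carries at least one of the accepted types. It assumes that rdf:type triples for parent classes have already been materialised by RDFS inference; the helper names and the ex: instances are hypothetical, and triples are plain tuples rather than objects of an RDF library.

```python
# Illustrative sketch only: triples are (subject, predicate, object) tuples of strings.
ALLOWED_CLASSES = {":Code", "sphn-atc:ATC", "snomed:105590001"}


def types_of(triples, node):
    """Return all classes the node is typed with."""
    return {o for s, p, o in triples if s == node and p == "rdf:type"}


def has_code_conforms(triples, focus):
    """Mimic the :hasCode property shape: at most one value, typed with an allowed class."""
    values = [o for s, p, o in triples if s == focus and p == ":hasCode"]
    if len(values) > 1:  # sh:maxCount 1 (sh:minCount 0 allows no value at all)
        return False
    return all(types_of(triples, value) & ALLOWED_CLASSES for value in values)


# Hypothetical instance data.
data = [
    ("ex:substance1", ":hasCode", "ex:atcCode"),
    ("ex:atcCode", "rdf:type", "sphn-atc:ATC"),
    ("ex:substance2", ":hasCode", "ex:otherCode"),
    ("ex:otherCode", "rdf:type", "ex:UnrelatedClass"),
]
print(has_code_conforms(data, "ex:substance1"))  # True
print(has_code_conforms(data, "ex:substance2"))  # False
print(has_code_conforms(data, "ex:substance3"))  # True, no code given
```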

In addition, for some properties that have to refer to an instance of a specific class or of one of a specific set of classes, instances of subclasses of the specified classes are not allowed. One example where this constraint is required is the property :hasCode for instances of the class :AdministrativeGender. These constraints are expressed as follows:

constraints:AdministrativeGender a sh:NodeShape ;
    sh:closed false ;
    sh:ignoredProperties ( rdf:type ) ;
    sh:property [ sh:class :SubjectPseudoIdentifier ;
            sh:minCount 1 ;
            sh:path :hasSubjectPseudoIdentifier ],
        [ sh:maxCount 1 ;
            sh:minCount 1 ;
            sh:or ( [ sh:class snomed:703118005 ] [ sh:class snomed:703117000 ] [ sh:class snomed:74964007 ] [ sh:class snomed:261665006 ] ) ;
            sh:path :hasCode ] ;
    sh:sparql [ a sh:SPARQLConstraint ;
            sh:message "No subclasses of the specified codes are allowed" ;
            sh:select """
                                        SELECT ?this (<https://biomedit.ch/rdf/sphn-ontology/sphn#hasCode> as ?path) (?class as ?value)
                                        WHERE {
                                        ?this <https://biomedit.ch/rdf/sphn-ontology/sphn#hasCode>/<http://www.w3.org/1999/02/22-rdf-syntax-ns#type> ?class .
                                        FILTER (?class NOT IN (<http://snomed.info/id/703118005>,<http://snomed.info/id/703117000>,<http://snomed.info/id/74964007>,<http://snomed.info/id/261665006>) ) .
                                        FILTER NOT EXISTS {<http://snomed.info/id/703118005> <http://www.w3.org/2000/01/rdf-schema#subClassOf>+ ?class .}
                                        FILTER NOT EXISTS {<http://snomed.info/id/703117000> <http://www.w3.org/2000/01/rdf-schema#subClassOf>+ ?class .}
                                        FILTER NOT EXISTS {<http://snomed.info/id/74964007> <http://www.w3.org/2000/01/rdf-schema#subClassOf>+ ?class .}
                                        FILTER NOT EXISTS {<http://snomed.info/id/261665006> <http://www.w3.org/2000/01/rdf-schema#subClassOf>+ ?class .}
                                        }""" ] ;
    sh:targetClass :AdministrativeGender .

The above constraint can be interpreted as follows: For all instances of the class :AdministrativeGender, it must hold that the property :hasCode refers to an instance of at least one of the enumerated classes (sh:or). No other values are allowed. If the property refers, for example, to an instance of a subclass of one of the enumerated classes, a validation error is reported. This is ensured by the usage of the SHACL expression sh:sparql, which reports a message (sh:message) if the query (sh:select) finds an instance of a subclass.

Note

The no-subclasses-allowed constraint (sh:sparql) is not validated by GraphDB, but ignored.
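The logic of the no-subclasses-allowed query can be sketched in plain Python. The sketch mirrors the two kinds of filters in the sh:select query: a type is tolerated if it is one of the listed codes or a superclass of one of them, and every other type of the :hasCode value is reported. The classes ex:NarrowerCode and ex:BroaderConcept, the ex: instances and the helper names are hypothetical.

```python
# Illustrative sketch only: triples are (subject, predicate, object) tuples of strings.
ALLOWED = {"snomed:703118005", "snomed:703117000", "snomed:74964007", "snomed:261665006"}


def superclasses(triples, cls):
    """All classes reachable from cls via one or more rdfs:subClassOf steps."""
    found, stack = set(), [cls]
    while stack:
        current = stack.pop()
        for s, p, o in triples:
            if s == current and p == "rdfs:subClassOf" and o not in found:
                found.add(o)
                stack.append(o)
    return found


def violating_types(triples, focus):
    """Types of the :hasCode values that are neither allowed codes nor their superclasses."""
    tolerated = set(ALLOWED)
    for code in ALLOWED:
        tolerated |= superclasses(triples, code)
    result = set()
    for s, p, value in triples:
        if s == focus and p == ":hasCode":
            for s2, p2, cls in triples:
                if s2 == value and p2 == "rdf:type" and cls not in tolerated:
                    result.add(cls)
    return result


# Hypothetical data, with rdf:type triples as they would look after RDFS inference.
data = [
    ("ex:NarrowerCode", "rdfs:subClassOf", "snomed:703117000"),
    ("snomed:703117000", "rdfs:subClassOf", "ex:BroaderConcept"),
    ("ex:gender1", ":hasCode", "ex:code1"),
    ("ex:code1", "rdf:type", "snomed:703117000"),
    ("ex:code1", "rdf:type", "ex:BroaderConcept"),
    ("ex:gender2", ":hasCode", "ex:code2"),
    ("ex:code2", "rdf:type", "ex:NarrowerCode"),
    ("ex:code2", "rdf:type", "snomed:703117000"),
    ("ex:code2", "rdf:type", "ex:BroaderConcept"),
]
print(violating_types(data, "ex:gender1"))  # set()
print(violating_types(data, "ex:gender2"))  # {'ex:NarrowerCode'}
```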

SPARQL target constraints

To avoid unwanted validation errors when instances of subclasses are validated against the constraints of their parent class, SPARQL target constraints are implemented for the SPHN classes that have subclasses. They ensure that only direct instances of the class are validated against its constraints, and not instances of its subclasses.

constraints:Measurement a sh:NodeShape ;
    sh:closed false ;
    sh:ignoredProperties ( rdf:type ) ;
    sh:property [ sh:class :Quantity ;
            sh:path :hasQuantity ] ;
    sh:target [ a sh:SPARQLTarget ;
            sh:select """SELECT ?this
                    WHERE {
                    ?this <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <https://biomedit.ch/rdf/sphn-ontology/sphn#Measurement> .
                    MINUS {
                        ?this <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <https://biomedit.ch/rdf/sphn-ontology/sphn#Measurement> .
                        ?this <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> ?other_type .
                        FILTER (?other_type != <https://biomedit.ch/rdf/sphn-ontology/sphn#Measurement> )
                        ?other_type <http://www.w3.org/2000/01/rdf-schema#subClassOf>+  <https://biomedit.ch/rdf/sphn-ontology/sphn#Measurement> .
                        }
                    }""" ] .

The above constraint can be interpreted as follows: Only instances of the class :Measurement that are not also instances of a subclass of :Measurement are validated against this constraint. Therefore, instances of subclasses (e.g. :OxygenSaturation) are validated only against the constraints:OxygenSaturation shape and not against the shape of their parent class, constraints:Measurement. This is ensured by the usage of the SHACL expression sh:SPARQLTarget, where instances of subclasses are excluded from the select query (sh:select).

Note

The target class constraint (sh:SPARQLTarget) is not validated by GraphDB, but causes errors.
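The selection performed by such a SPARQL target can be sketched as follows. The sketch follows the interpretation given above (instances of :Measurement that are not also instances of one of its subclasses) and is not the SHACLer's own code: triples are plain tuples, and the helper names and the ex: instances are hypothetical.

```python
# Illustrative sketch only: triples are (subject, predicate, object) tuples of strings.
def subclasses_of(triples, cls):
    """All classes that reach cls via one or more rdfs:subClassOf steps."""
    found, stack = set(), [cls]
    while stack:
        current = stack.pop()
        for s, p, o in triples:
            if p == "rdfs:subClassOf" and o == current and s not in found:
                found.add(s)
                stack.append(s)
    return found


def target_nodes(triples, cls):
    """Instances of cls that are not also typed with one of its subclasses."""
    narrower = subclasses_of(triples, cls)
    targets = set()
    for s, p, o in triples:
        if p == "rdf:type" and o == cls:
            own_types = {o2 for s2, p2, o2 in triples if s2 == s and p2 == "rdf:type"}
            if not own_types & narrower:
                targets.add(s)
    return targets


# Hypothetical data: ex:m2 is an :OxygenSaturation and, by inference, a :Measurement.
data = [
    (":OxygenSaturation", "rdfs:subClassOf", ":Measurement"),
    ("ex:m1", "rdf:type", ":Measurement"),
    ("ex:m2", "rdf:type", ":OxygenSaturation"),
    ("ex:m2", "rdf:type", ":Measurement"),
]
print(target_nodes(data, ":Measurement"))       # {'ex:m1'}
print(target_nodes(data, ":OxygenSaturation"))  # {'ex:m2'}
```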

Sequence paths

Some properties have a sequence of nodes specified as a path. This is expressed as follows:

constraints:Age a sh:NodeShape ;
    sh:closed true ;
    sh:ignoredProperties ( rdf:type ) ;
    sh:property [ sh:class :SubjectPseudoIdentifier ;
            sh:maxCount 1 ;
            sh:minCount 1 ;
            sh:path :hasSubjectPseudoIdentifier ],
        [ sh:in ( ucum:h ucum:wk ucum:a ucum:d ucum:mo ucum:min ) ;
            sh:maxCount 1 ;
            sh:minCount 1 ;
            sh:path ( :hasQuantity :hasUnit :hasCode ) ] ;
    sh:targetClass :Age .

The above constraint can be interpreted as follows: For all instances of the class :Age, it must hold that the value reached over the sequence path :hasQuantity / :hasUnit / :hasCode is one of the enumerated values (sh:in).

It means that when an age is given, the only possible values for its unit are hour, week, year, day, month and minute.

Note

The sequence paths are not validated by GraphDB, but ignored.
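To make the evaluation of a sequence path more concrete, the following sketch follows the three predicates one after the other and compares the reached value with the sh:in list. It is an illustration under simplifying assumptions: triples are plain tuples, the helper names and ex: instances are hypothetical, and ucum:s merely stands for a unit that is not in the list.

```python
# Illustrative sketch only: triples are (subject, predicate, object) tuples of strings.
ALLOWED_UNITS = {"ucum:h", "ucum:wk", "ucum:a", "ucum:d", "ucum:mo", "ucum:min"}
AGE_UNIT_PATH = [":hasQuantity", ":hasUnit", ":hasCode"]


def follow_path(triples, start, path):
    """Return all nodes reached from `start` by following the predicates in order."""
    nodes = {start}
    for predicate in path:
        nodes = {o for s, p, o in triples if s in nodes and p == predicate}
    return nodes


def age_unit_conforms(triples, age):
    """Exactly one value must be reached (min/maxCount 1) and it must be in the sh:in list."""
    values = follow_path(triples, age, AGE_UNIT_PATH)
    return len(values) == 1 and values <= ALLOWED_UNITS


# Hypothetical instance data.
data = [
    ("ex:age1", ":hasQuantity", "ex:quantity1"),
    ("ex:quantity1", ":hasUnit", "ex:unit1"),
    ("ex:unit1", ":hasCode", "ucum:a"),
    ("ex:age2", ":hasQuantity", "ex:quantity2"),
    ("ex:quantity2", ":hasUnit", "ex:unit2"),
    ("ex:unit2", ":hasCode", "ucum:s"),
]
print(follow_path(data, "ex:age1", AGE_UNIT_PATH))  # {'ucum:a'}
print(age_unit_conforms(data, "ex:age1"))           # True
print(age_unit_conforms(data, "ex:age2"))           # False
```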

Literal type constraints

Besides the object properties, for which Restrictions on classes are used, there are also data properties. On data properties, the possible datatypes can be restricted using Literal type constraints. In the class :Code, three of them are in use: for the properties :hasCodingSystemAndVersion, :hasIdentifier and :hasName, the SHACL file validates that the literal used is of type xsd:string.

constraints:Code a sh:NodeShape ;
    sh:closed true ;
    sh:ignoredProperties ( rdf:type ) ;
    sh:property [ sh:datatype xsd:string ;
            sh:maxCount 1 ;
            sh:minCount 1 ;
            sh:path :hasCodingSystemAndVersion ],
        [ sh:datatype xsd:string ;
            sh:maxCount 1 ;
            sh:minCount 0 ;
            sh:path :hasName ],
        [ sh:datatype xsd:string ;
            sh:maxCount 1 ;
            sh:minCount 1 ;
            sh:path :hasIdentifier ] ;
    sh:targetClass :Code .

The interpretation of the above constraint is: Whenever in an instance of :Code the property :hasName is used, the object needs to be a Literal of type xsd:string.
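A literal type check of this kind can be sketched as follows. For the purpose of the illustration, a literal is modelled as a (lexical form, datatype) tuple and an IRI as a plain string; this modelling, the helper names and the ex: instances are assumptions of the sketch and not part of the SHACLer.

```python
# Illustrative sketch only: literals are (lexical_form, datatype) tuples, IRIs are strings.
def is_literal_of(value, datatype):
    """Return True if value is a literal tuple with the given datatype."""
    return isinstance(value, tuple) and len(value) == 2 and value[1] == datatype


def has_name_conforms(triples, code):
    """Mimic the :hasName property shape: at most one value, of datatype xsd:string."""
    names = [o for s, p, o in triples if s == code and p == ":hasName"]
    return len(names) <= 1 and all(is_literal_of(n, "xsd:string") for n in names)


# Hypothetical instance data.
data = [
    ("ex:code1", ":hasName", ("example name", "xsd:string")),
    ("ex:code2", ":hasName", ("42", "xsd:integer")),
    ("ex:code3", ":hasName", "ex:notALiteral"),
]
print(has_name_conforms(data, "ex:code1"))  # True
print(has_name_conforms(data, "ex:code2"))  # False
print(has_name_conforms(data, "ex:code3"))  # False
```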

Restricting on individuals/instances

There exist cases where it is forbidden to create new instances of a class, but only already existing so-called individuals (instances) are allowed. This constraint is, for instance, applied on entities of the type :Biosample_fixationType as shown in the following:

constraints:Biosample_fixationType a sh:NodeShape ;
    sh:closed true ;
    sh:ignoredProperties ( rdf:type ) ;
    sh:property [ sh:in ( :AldehydeBased :RNALater :VacuumTechnologyStabilization :Other :AlcoholBased :HeatStabilization :AllprotectTissueReagent :NeutralBufferedFormalin :SnapFreezing :UNK :OptimumCuttingTemperatureMedium :PAXgeneTissue :NonaldehydeWithAceticAcid :NonaldehydeBasedWithoutAceticAcid :NonbufferedFormalin ) ;
            sh:path [ sh:inversePath rdf:type ] ] ;
    sh:targetClass sphn:Biosample_fixationType .

This SHACL constraint ensures that only explicitly enumerated individuals are used as instances of the class :Biosample_fixationType. In addition, by means of the inverse path constraint (sh:inversePath rdf:type), it forbids that new entities are derived as subclasses.
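The effect of this restriction can be sketched as a simple membership test: every node typed as :Biosample_fixationType must be one of the enumerated individuals. The sketch only illustrates the intended outcome, not how a SHACL engine evaluates the inverse path; the helper unexpected_instances and the instance ex:MyOwnFixation are hypothetical.

```python
# Illustrative sketch only: triples are (subject, predicate, object) tuples of strings.
ALLOWED_INDIVIDUALS = {
    ":AldehydeBased", ":RNALater", ":VacuumTechnologyStabilization", ":Other",
    ":AlcoholBased", ":HeatStabilization", ":AllprotectTissueReagent",
    ":NeutralBufferedFormalin", ":SnapFreezing", ":UNK",
    ":OptimumCuttingTemperatureMedium", ":PAXgeneTissue",
    ":NonaldehydeWithAceticAcid", ":NonaldehydeBasedWithoutAceticAcid",
    ":NonbufferedFormalin",
}


def unexpected_instances(triples, cls, allowed):
    """Return the nodes typed as cls that are not among the allowed individuals."""
    return {s for s, p, o in triples if p == "rdf:type" and o == cls and s not in allowed}


# Hypothetical data: one predefined individual and one newly created instance.
data = [
    (":SnapFreezing", "rdf:type", ":Biosample_fixationType"),
    ("ex:MyOwnFixation", "rdf:type", ":Biosample_fixationType"),
]
print(unexpected_instances(data, ":Biosample_fixationType", ALLOWED_INDIVIDUALS))
# {'ex:MyOwnFixation'}
```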

4. Implementation examples

Class Example

constraints:Quantity a sh:NodeShape ;
    sh:closed true ;
    sh:ignoredProperties ( rdf:type ) ;
    sh:property [ sh:class :Unit ;
            sh:maxCount 1 ;
            sh:minCount 1 ;
            sh:path :hasUnit ],
        [ sh:maxCount 1 ;
            sh:minCount 1 ;
            sh:or ( [ sh:datatype xsd:double ] [ sh:datatype xsd:string ] ) ;
            sh:path :hasValue ] ;
    sh:targetClass :Quantity .

The NodeShape shown here is generated from various parts of the ontology. From bottom to top:

  • there is a class :Quantity in the ontology (last Line: sh:targetClass :Quantity)

  • the properties :hasUnit and :hasValue have the class :Quantity in their domain specification (sh:property and following)

  • both properties have given cardinalities (sh:minCount and sh:maxCount)

  • the property :hasUnit has the :Unit class in its range (sh:property and following). The :Unit class will have a NodeShape of its own.

  • the property :hasValue has the datatypes xsd:double and xsd:string in its range (sh:or and following lines).

  • the rdf:type is ignored unless explicitly specified

  • the shape is closed (sh:closed true) to define there are no other properties allowed.
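The mapping listed above, from domain, range and cardinality information to a NodeShape, can be sketched as a small text generator. This is a hypothetical illustration and not the SHACLer's actual implementation: the function render_node_shape and the dictionary layout of the property descriptions are assumptions made for this example.

```python
# Illustrative sketch only: not the SHACLer's code, the input structure is assumed.
def render_node_shape(cls, properties, closed=True):
    """Render a NodeShape in Turtle from a class name and its property descriptions."""
    lines = [
        f"constraints:{cls} a sh:NodeShape ;",
        f"    sh:closed {'true' if closed else 'false'} ;",
        "    sh:ignoredProperties ( rdf:type ) ;",
    ]
    blocks = []
    for prop in properties:
        if len(prop["range"]) == 1:
            # A single range entry becomes a direct sh:class or sh:datatype.
            parts = [f"{prop['kind']} {prop['range'][0]}"]
        else:
            # Several range entries (e.g. from owl:unionOf) become an sh:or list.
            alternatives = " ".join(f"[ {prop['kind']} {r} ]" for r in prop["range"])
            parts = [f"sh:or ( {alternatives} )"]
        parts.append(f"sh:minCount {prop['min']}")
        parts.append(f"sh:maxCount {prop['max']}")
        parts.append(f"sh:path {prop['path']}")
        blocks.append("[ " + " ; ".join(parts) + " ]")
    lines.append("    sh:property " + ",\n        ".join(blocks) + " ;")
    lines.append(f"    sh:targetClass :{cls} .")
    return "\n".join(lines)


# Property descriptions for :Quantity, written by hand for this example.
quantity_properties = [
    {"path": ":hasUnit", "kind": "sh:class", "range": [":Unit"], "min": 1, "max": 1},
    {"path": ":hasValue", "kind": "sh:datatype",
     "range": ["xsd:double", "xsd:string"], "min": 1, "max": 1},
]
shape = render_node_shape("Quantity", quantity_properties)
print(shape)
```

Each property whose domain contains the class contributes one sh:property block, and a range with several entries is rendered as an sh:or list.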

Meaning Binding / Individual Example

constraints:OncologyTreatmentAssessment_result a sh:NodeShape ;
    sh:closed true ;
    sh:ignoredProperties ( rdf:type ) ;
    sh:property [ sh:in ( :CompleteResponse :StableDisease :Unknown :ProgressiveDisease :PartialResponse ) ;
            sh:path [ sh:inversePath rdf:type ] ] ;
    sh:targetClass :OncologyTreatmentAssessment_result .

A Meaning Binding or Individual also results in a NodeShape, as shown just above. From bottom to top:

  • there is a class :OncologyTreatmentAssessment_result in the ontology (last Line: sh:targetClass :OncologyTreatmentAssessment_result)

  • the inverse path sh:inversePath rdf:type means that all instances of the class :OncologyTreatmentAssessment_result have to be in the sh:in list. Only :CompleteResponse, :StableDisease, :Unknown, :ProgressiveDisease and :PartialResponse are allowed

  • the rdf:type is ignored unless explicitly specified

  • the shape is closed (sh:closed true) to define there are no other properties allowed.