SPHN Schema Forge

The SPHN Schema Forge is a web service for building SPHN-compliant RDF schemas by simply uploading an Excel file as input. It simplifies the process of generating such schemas, and additionally generates a human-readable HTML representation of the schema, as well as SHACL shapes and SPARQL queries which are used for data quality checks.

The web service is available at: https://schemaforge.dcc.sib.swiss. The figure below displays the home page of the SPHN Schema Forge:


Figure 1. The SPHN Schema Forge web service homepage.

Usage

There are two options to run the web service:

  1. Upload an SPHN-compliant Dataset (one Excel file, template and user guide) to generate the full stack of Semantic Web content (from the project-specific RDF Schema to the HTML Visualization).

  2. Upload an SPHN-compliant project-specific RDF Schema and the SPHN RDF Schema (two Turtle files, template and user guide) to build the SHACL rules, SPARQL queries and HTML Visualization.

In addition, it is possible to upload a SHACL exception file for the SHACL output (JSON file) and to upload concept images for the human-readable HTML representation (PNG files).

Note

When uploading a project-specific Turtle file, please ensure the file name has the structure “prefix_anything.ttl” to differentiate it from the SPHN Turtle file.

The web service runs the Dataset2RDF, the Documentation and Visualization tool, the SPARQLer and the SHACLer in the background, so none of these tools needs to be installed. After completion, the RDF Schema, the HTML documentation, the SPARQL queries and the SHACL constraints are generated and can be downloaded.


Figure 2. Workflow integrating the SPHN Schema Forge web service in the context of the SPHN Semantic Interoperability Framework.

Tools integrated in SPHN Schema Forge

Several tools are integrated in the SPHN Schema Forge; together they automatically build the full stack of Semantic Web content in SPHN (RDF schema, HTML visualization, SPARQL queries and SHACL shapes).

The integrated tools are described below.

SPHN Dataset2RDF

The SPHN Dataset2RDF is a Python tool developed by the DCC. It translates the concepts and composedOfs defined in the SPHN Dataset into a formal representation using RDF, RDFS and OWL. The output of the Dataset2RDF tool is an SPHN-compliant RDF Schema. Since 2023, the DCC has used the Dataset2RDF to generate the SPHN RDF Schema from the SPHN Dataset (from the 2023.2 release onwards).

Usage

The Dataset2RDF is built to support two types of scenarios:

  • SPHN Dataset: In this scenario, the input is the SPHN Dataset .xlsx and the output is the SPHN RDF Schema as a .ttl file.

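  • Project-specific Dataset: In this scenario, the input is a project-specific Dataset .xlsx, adapted from the Dataset Template, and the output is the project-specific RDF Schema together with the SPHN RDF Schema as .ttl files.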

The Dataset2RDF tool can parse the SPHN Dataset .xlsx as follows:

dataset2rdf --input SPHN_Dataset.xlsx \
     --output sphn_rdf_schema.ttl \
     --config dataset2rdf/config.yaml

The tool takes as input the SPHN Dataset .xlsx and a config.yaml, and generates an RDF Schema .ttl.

The Dataset2RDF tool can also parse a project-specific Dataset .xlsx as follows:

dataset2rdf --input Project_Dataset.xlsx \
     --output sphn_rdf_schema.ttl \
     --project-output project_specific_rdf_schema.ttl \
     --config dataset2rdf/config.yaml

The tool takes as input a project-specific Dataset .xlsx and a config.yaml, and generates a project-specific RDF Schema .ttl as well as the SPHN RDF Schema.

Note

To generate the project-specific Dataset you should make use of the SPHN Dataset Template and adapt it for your project needs. For more information refer to Generate a project-specific RDF Schema.

SHACLer

Note

For an introduction to SHACL, visit the SHACL Background section.

Shapes Constraint Language (SHACL) allows validating data that has been created according to an RDF schema, for instance the SPHN RDF Schema or a project-specific RDF Schema (see Generate a project-specific RDF Schema and Generate data according to a RDF schema). For further information on SHACL, see Data validation with SHACL.

The SPHN SHACL Generator (SHACLer) is a Python-based tool developed by the DCC. The tool takes as input an SPHN-compliant RDF Schema and generates a set of SHACL rules in Turtle format.

Projects can use the SHACLer to generate a set of SHACL rules based on a project-specific RDF Schema.

Note

An (optional) exception file can be provided together with the RDF schema if any of the defined classes has an exception to be handled separately.

See the instructions on how to run the SHACLer to generate SHACL files here.

SHACLer internals

The SHACLer generates all validation rules as NodeShapes, each centred on one class of the RDF schema. All domain, range, restriction and cardinality annotations, as well as individuals, are collected from the RDF schema.

Specifically, the SHACLer looks for all owl:ObjectProperty and owl:DatatypeProperty declarations and parses their domain and range specifications. For domain specifications, it also parses the corresponding rdfs:subClassOf information, since some properties have an upper-level concept as their domain, which logically implies that the property is also valid for the lower-level concepts. In addition, the SHACLer looks for owl:Restriction axioms and parses them according to specific criteria (i.e. whether the information is a cardinality restriction or a restriction on a property value). Although RDFS inference is required for the validation, an upper-level concept may not be instantiable on its own and is then excluded; the SHACLer therefore annotates the property at all allowed levels. This also makes the shapes easier to read on a per-concept basis.

All parsed information is stored in internal dictionaries and transported to the SHACL generator.
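To make the collection step more concrete, here is a minimal Python sketch. It assumes a hypothetical toy representation (two plain dictionaries for rdfs:subClassOf and rdfs:domain) rather than the SHACLer's real internal dictionaries, and shows how a property declared on an upper-level concept ends up annotated on each of its subclasses as well.

```python
from collections import defaultdict

# Hypothetical toy schema content; the SHACLer's real internal dictionaries may differ.
# child class -> parent class (rdfs:subClassOf)
SUBCLASS_OF = {
    ":OxygenSaturation": ":Measurement",
}

# property -> classes listed in its rdfs:domain
DOMAIN = {
    ":hasQuantity": [":Measurement"],
    ":hasCode": [":Substance"],
}


def descendants(cls: str) -> set[str]:
    """Return all direct and indirect subclasses of cls."""
    result: set[str] = set()
    for child, parent in SUBCLASS_OF.items():
        if parent == cls:
            result.add(child)
            result |= descendants(child)
    return result


def properties_per_class() -> dict[str, set[str]]:
    """Annotate each property on its domain classes and on all of their subclasses."""
    per_class: dict[str, set[str]] = defaultdict(set)
    for prop, domain_classes in DOMAIN.items():
        for cls in domain_classes:
            for target in {cls} | descendants(cls):
                per_class[target].add(prop)
    return dict(per_class)
```

With this toy input, :OxygenSaturation receives :hasQuantity from its parent :Measurement, which is the "annotate at all allowed levels" behaviour described above.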

Assumptions made on the SHACLer

When building the SHACL rules, the SHACLer makes some assumptions about the RDF schema. When these assumptions hold, an RDF schema can be used to generate SHACL files. The SPHN RDF Schema conforms to these assumptions starting with version 2021.1, and projects based on this version (or later versions) of the SPHN RDF Schema must also conform to them.

The assumptions are the following:

  • SHACL validation must be run with RDFS inference turned on. This is required because some ranges point to upper-level concepts (e.g. SNOMED CT subtrees). As SNOMED CT in RDF is an OWL ontology, some of its subclasses are expressed in OWL syntax instead of RDFS syntax. To be able to apply only RDFS reasoning in the validation phase, the SNOMED CT exploit feature can be used to extend the ranges to all non-RDFS subclasses.

  • There are no object or datatype properties other than the ones defined in the RDF schema (although there might be further classes with predicates).

  • An rdfs:domain or rdfs:range annotation of an object property indicates that only these properties are allowed in the corresponding classes (this also applies to inherited properties).

  • An rdfs:domain of a property pointing to an owl:unionOf list means that the property can be used in instances of any of the listed classes.

  • An rdfs:range of a property pointing to an owl:unionOf list means that the property must always point to an instance of one of the listed classes (or of one of their subclasses).

  • If a class has individuals (instances of owl:NamedIndividual), these individuals are the only allowed instances of that class.

  • owl:equivalentClass axioms link SPHN concepts to external terminologies (e.g. SNOMED CT, LOINC). They are not picked up or evaluated in the SHACL generation: although using them would be logically valid, and technically valid when OWL 2 inference is applied, the SHACL rules focus on SPHN concepts.

  • An owl:Restriction annotation on a property overrides its rdfs:range annotation.
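Two of these assumptions, the owl:unionOf reading of rdfs:range and the precedence of owl:Restriction over rdfs:range, can be illustrated with a minimal Python sketch. The two dictionaries are hypothetical stand-ins for parsed schema content, not the SHACLer's actual data model; the output format simply mimics the sh:class and sh:or fragments shown in the shapes further below.

```python
# Hypothetical parsed schema content, not the SHACLer's actual data model.
# property -> classes of an rdfs:range that points to an owl:unionOf list
RANGE = {
    ":hasCode": [":Code", "sphn-atc:ATC", "snomed:105590001"],
}
# (class, property) -> classes allowed by an owl:Restriction on that class
RESTRICTION = {
    (":AdministrativeGender", ":hasCode"): [
        "snomed:261665006", "snomed:703117000", "snomed:74964007", "snomed:703118005",
    ],
}


def allowed_classes(cls: str, prop: str) -> list[str]:
    """An owl:Restriction on (class, property) overrides the property's rdfs:range."""
    return RESTRICTION.get((cls, prop), RANGE.get(prop, []))


def sh_class_constraint(classes: list[str]) -> str:
    """One class becomes sh:class; a union of classes becomes sh:or over sh:class."""
    if len(classes) == 1:
        return f"sh:class {classes[0]}"
    alternatives = " ".join(f"[ sh:class {c} ]" for c in classes)
    return f"sh:or ( {alternatives} )"


print(sh_class_constraint(allowed_classes(":Substance", ":hasCode")))
# prints: sh:or ( [ sh:class :Code ] [ sh:class sphn-atc:ATC ] [ sh:class snomed:105590001 ] )
```

For :AdministrativeGender the restriction wins, so the generated sh:or lists the four SNOMED CT codes instead of the generic range.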

Constraints implemented in SPHN using SHACL

Note on the formatting: the severity level of each validation constraint is given in square brackets before the constraint type. See Validation constraint severity levels for more information about the levels.

For each class in the RDF Schema (standalone SPHN or in combination with a project), the following is checked (see Restriction on classes):

  • [ERROR] no properties other than those specified in the RDF schema, with inference rules applied (as displayed in the pyLODE visualization), are used for this class

  • [ERROR] the properties occur with the right cardinality (see Cardinality constraints)

  • [ERROR] the properties lead to the right target type, datatype or class (see Literal type constraints)

  • [ERROR] when terminology value sets are used, the specification of whether children/descendants (direct and indirect subclasses) are allowed is checked

  • [ERROR] when terminology value sets are used, the validity of the codes is checked against the restricted value sets

  • [ERROR] when start and end datetimes are specified in a class, it is asserted that the start is before the end datetime (see Restricting that the start is before the end)

For SPHN/project valuesets in the RDF Schema:

In general:

  • [WARN] naming conventions are obeyed for instances of project/SPHN classes (see Naming convention on schema instances)

  • [WARN] naming conventions are obeyed for instances of shared resources, e.g. external terminologies (see Naming convention on shared instances)

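For historized terminologies (e.g. ATC, CHOP, ICD-10-GM):

  • [INFO] old versioned codes that are no longer valid in the current version because of a code meaning change are reported (see Old versioned code that has been valid, but is not valid in the current version)

  • [INFO] old versioned codes that are not in the most recent version but are still valid are reported (see Old versioned code which is still valid)

  • [WARN] unversioned codes whose meaning changed during the historization are reported (see Unversioned code has changed its meaning in historization)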

SHACL constraint components implemented in SPHN

A specific set of constraints is implemented in the SHACLer in the context of SPHN, which are listed below:

| SHACL Constraint | Description |
|---|---|
| sh:closed | when true, a value node may only have the properties explicitly enumerated via sh:property (plus those in sh:ignoredProperties); when false, further properties are tolerated |
| sh:ignoredProperties | properties that are permitted in addition to those explicitly enumerated via sh:property |
| sh:datatype xsd:dateTime | verifies that a property value has the type xsd:dateTime |
| sh:datatype xsd:double | verifies that a property value has the type xsd:double |
| sh:datatype xsd:string | verifies that a property value has the type xsd:string |
| sh:class … sh:path | verifies that the range of a property is used correctly, i.e. that the class of an instance matches the specified type constraint |
| sh:maxCount, sh:minCount | checks that the cardinality of a property is respected, e.g. that there is just one value for a given property |
| sh:inversePath rdf:type | only those values are allowed that have been explicitly enumerated in the expression as a type |
| sh:or … sh:path | values of the specified sh:path need to conform to at least one of the enumerated alternatives, e.g. be an instance of one of the listed classes |
| sh:in … sh:path | values of the specified sh:path need to correspond to one of the explicitly enumerated IRIs |
| sh:in … sh:inversePath | values need to correspond to explicitly enumerated value lists of individuals |
| sh:sparql … sh:select | verifies that a property value is correct when subclasses of the specified codes are not allowed |
| sh:SPARQLTarget … sh:select | the constraints are validated only for instances of this class and not for instances of its subclasses |

Patterns of implemented SHACL constraints

Note

Some of the examples shown below are shortened to improve readability. The original code can be looked up in the SHACL .ttl file generated for the SPHN RDF Schema (here).

The SHACLer implements different node shape patterns, such as Cardinality constraints, Restriction on classes, Literal type constraints and Restricting on individuals/instances.

Cardinality constraints

In SPHN, properties may have a specific cardinality, which means that there exists a restriction on how often a property can be used with a certain type of data instance. The cardinalities defined in SPHN are implemented in the RDF schema. They include information on:

  1. links connecting each SPHN concept to patient (via sphn:hasSubjectPseudoIdentifier), provider (via sphn:hasDataProviderInstitute), and case (via sphn:hasAdministrativeCase);

  2. the number of times specific metadata (i.e. properties) can be connected to a certain concept.

One example of these constraints is the shape for :Biobanksample: a biobank sample may be linked to at most one administrative case (:hasAdministrativeCase) and must be linked to exactly one subject pseudo identifier (:hasSubjectPseudoIdentifier). These rules are expressed by the following SHACL constraints:

constraints:Biobanksample a sh:NodeShape ;
    sh:closed false ;
    sh:ignoredProperties ( rdf:type ) ;
    sh:property [ sh:class :Biosample ;
            sh:maxCount 1 ;
            sh:minCount 1 ;
            sh:path :hasBiosample ],
        [ sh:class :AdministrativeCase ;
            sh:maxCount 1 ;
            sh:minCount 0 ;
            sh:path :hasAdministrativeCase ],
        [ sh:class :SubjectPseudoIdentifier ;
            sh:maxCount 1 ;
            sh:minCount 1 ;
            sh:path :hasSubjectPseudoIdentifier ] ;
    sh:targetClass :Biobanksample .

We can interpret this rule as follows: for all instances of the class :Biobanksample, the property :hasAdministrativeCase can be used either zero times (sh:minCount 0) or once (sh:maxCount 1), while the property :hasSubjectPseudoIdentifier must be used exactly once (sh:minCount 1 and sh:maxCount 1).
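What a SHACL engine does with sh:minCount and sh:maxCount can be sketched in a few lines of plain Python. This is not the SHACLer (which only generates the shapes) nor a real SHACL engine; it is a minimal illustration that assumes instance data given as (subject, predicate, object) tuples, with hypothetical node names, and reuses the bounds of the Biobanksample shape above.

```python
from collections import Counter

# path -> (sh:minCount, sh:maxCount), copied from the constraints:Biobanksample shape above
BIOBANKSAMPLE_CARDINALITY = {
    ":hasBiosample": (1, 1),
    ":hasAdministrativeCase": (0, 1),
    ":hasSubjectPseudoIdentifier": (1, 1),
}


def cardinality_violations(focus, triples, spec):
    """Count how often each property is used on the focus node and compare with the bounds."""
    counts = Counter(p for s, p, _ in triples if s == focus)
    violations = []
    for path, (min_count, max_count) in spec.items():
        n = counts.get(path, 0)
        if n < min_count:
            violations.append(f"{path}: sh:minCount {min_count}, found {n}")
        if n > max_count:
            violations.append(f"{path}: sh:maxCount {max_count}, found {n}")
    return violations


# Hypothetical data: one sample with two administrative cases and no pseudo identifier
triples = [
    ("ex:sample1", ":hasBiosample", "ex:biosample1"),
    ("ex:sample1", ":hasAdministrativeCase", "ex:case1"),
    ("ex:sample1", ":hasAdministrativeCase", "ex:case2"),
]
for message in cardinality_violations("ex:sample1", triples, BIOBANKSAMPLE_CARDINALITY):
    print(message)
```

Both problems in the hypothetical sample are reported: :hasAdministrativeCase exceeds its maximum of one, and the mandatory :hasSubjectPseudoIdentifier is missing.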

Restriction on classes

A common pattern is a restriction of properties on classes: a certain property has to refer to an instance of a specific class or of a specific set of classes. One example where this constraint is required is the property :hasCode for instances of the class :Substance. These constraints are expressed as follows:

constraints:Substance a sh:NodeShape ;
    sh:closed false ;
    sh:ignoredProperties ( rdf:type ) ;
    sh:property [ sh:maxCount 1 ;
            sh:minCount 0 ;
            sh:or ( [ sh:class :Code ] [ sh:class sphn-atc:ATC ] [ sh:class snomed:105590001 ] ) ;
            sh:path :hasCode ],
    [ sh:class :Quantity ;
        sh:maxCount 1 ;
        sh:minCount 0 ;
        sh:path :hasQuantity ] ;
    sh:targetClass :Substance .

The above constraints can be interpreted as follows: for all instances of the class :Substance, the property :hasCode must refer to an instance of at least one of the enumerated classes (i.e. an SPHN Code, an ATC class, or the specified SNOMED CT class or one of its subclasses). This is ensured by the SHACL expression sh:or, which lists all accepted classes.
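The sh:or check, combined with the RDFS inference that the validation requires, can be sketched as follows. This is a minimal plain-Python illustration under stated assumptions: the class hierarchy, the node names and the made-up class snomed:ExampleSubstanceChild are hypothetical, and a real SHACL engine evaluates the shape directly on the RDF graph.

```python
# Hypothetical hierarchy and data; "snomed:ExampleSubstanceChild" is made up for illustration.
SUBCLASS_OF = {"snomed:ExampleSubstanceChild": "snomed:105590001"}
ASSERTED_TYPES = {
    "ex:code1": {"sphn-atc:ATC"},
    "ex:code2": {"snomed:ExampleSubstanceChild"},
    "ex:code3": {":Unit"},
}
# The alternatives of sh:or in the constraints:Substance shape above
SUBSTANCE_HAS_CODE = [":Code", "sphn-atc:ATC", "snomed:105590001"]


def inferred_types(node):
    """Asserted types plus all of their superclasses, as RDFS inference would add them."""
    types = set()
    for t in ASSERTED_TYPES.get(node, set()):
        while t is not None:
            types.add(t)
            t = SUBCLASS_OF.get(t)
    return types


def conforms_to_or(node, classes):
    """sh:or ( [ sh:class A ] [ sh:class B ] ... ): at least one alternative must hold."""
    return any(cls in inferred_types(node) for cls in classes)


for node in ASSERTED_TYPES:
    print(node, conforms_to_or(node, SUBSTANCE_HAS_CODE))
```

Because of the inferred superclass, ex:code2 conforms even though only a subclass of snomed:105590001 is asserted; ex:code3 matches none of the alternatives.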

In addition, if a certain property has to refer to an instance of a specific class or set of classes and instances of their subclasses are not allowed as values, the shape is complemented with an sh:sparql expression. One example where this constraint is required is the property :hasCode for instances of the class :AdministrativeGender. These constraints are expressed as follows:

constraints:AdministrativeGender a sh:NodeShape ;
    sh:closed false ;
    sh:ignoredProperties ( rdf:type ) ;
    sh:property [ sh:class :SubjectPseudoIdentifier ;
            sh:minCount 1 ;
            sh:path :hasSubjectPseudoIdentifier ],
        [ sh:datatype xsd:dateTime ;
            sh:maxCount 1 ;
            sh:minCount 0 ;
            sh:path :hasRecordDateTime ],
        [ sh:maxCount 1 ;
            sh:minCount 1 ;
            sh:or ( [ sh:class snomed:261665006 ] [ sh:class snomed:703117000 ] [ sh:class snomed:74964007 ] [ sh:class snomed:703118005 ] ) ;
            sh:path :hasCode ] ;
    sh:sparql [ a sh:SPARQLConstraint ;
            sh:message "No descendents (all subclasses) of the specified codes are allowed" ;
            sh:select """SELECT ?this (<https://biomedit.ch/rdf/sphn-schema/sphn#hasCode> as ?path) (?class as ?value)
                                        WHERE {
                                            ?this <https://biomedit.ch/rdf/sphn-schema/sphn#hasCode>/<http://www.w3.org/1999/02/22-rdf-syntax-ns#type> ?class .
                                            FILTER( ?values IN ( <http://snomed.info/id/261665006>, <http://snomed.info/id/703117000>, <http://snomed.info/id/74964007>, <http://snomed.info/id/703118005> )) .
                                            FILTER (?class NOT IN ( ?values ) ) .
                                            FILTER NOT EXISTS { ?values <http://www.w3.org/2000/01/rdf-schema#subClassOf>+ ?class .}
                                        }""" ] ;
    sh:targetClass :AdministrativeGender .

The above constraint can be interpreted as follows: for all instances of the class :AdministrativeGender, the property :hasCode must refer to an instance of at least one of the enumerated classes (sh:or). No other value is allowed. If the property value points, for example, to an instance of a subclass of one of the enumerated classes, a validation result is reported. This is ensured by the SHACL expression sh:sparql, which reports a message (sh:message) for every instance of a subclass found by its query (sh:select).
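The intended logic of the "no descendants" rule can be sketched in plain Python. This is an illustration of the rule, not a translation of the SPARQL query above; it assumes a hypothetical single-parent hierarchy (real SNOMED CT concepts can have several parents) with made-up example codes, and leaves values outside the value set to the sh:or constraint.

```python
# Hypothetical single-parent hierarchy; the "Example..." codes are made up.
SUBCLASS_OF = {
    "snomed:ExampleChild": "snomed:261665006",
    "snomed:ExampleGrandchild": "snomed:ExampleChild",
}
# Enumerated codes of the constraints:AdministrativeGender shape above
ALLOWED = {"snomed:261665006", "snomed:703117000", "snomed:74964007", "snomed:703118005"}


def ancestors(cls):
    """All direct and indirect superclasses of cls."""
    result = []
    parent = SUBCLASS_OF.get(cls)
    while parent is not None:
        result.append(parent)
        parent = SUBCLASS_OF.get(parent)
    return result


def descendant_not_allowed(code_class, allowed=ALLOWED):
    """Flag a class that is a strict subclass (any depth) of an enumerated code.

    Classes outside the value set entirely are left to the sh:or constraint.
    """
    return code_class not in allowed and any(a in allowed for a in ancestors(code_class))


for cls in ["snomed:261665006", "snomed:ExampleChild", "snomed:ExampleGrandchild", ":Unit"]:
    print(cls, descendant_not_allowed(cls))
```

Only the two made-up descendants are flagged; the enumerated code itself passes.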

Furthermore, if a certain property has to refer to an instance of a specific class or set of classes and only instances of direct subclasses of the specified classes are allowed, an sh:sparql expression is again used to encode such restrictions. One example where this constraint is required is the property :hasCode for instances of the class :Intent. These constraints are expressed as follows:

constraints:Intent a sh:NodeShape ;
    sh:closed false ;
    sh:ignoredProperties ( rdf:type ) ;
    sh:property [ sh:class snomed:363675004 ;
            sh:maxCount 1 ;
            sh:minCount 1 ;
            sh:path :hasCode ] ;
    sh:sparql [ a sh:SPARQLConstraint ;
            sh:message "Only children (direct subclasses) of the specified codes are allowed" ;
            sh:select """SELECT ?this (<https://biomedit.ch/rdf/sphn-schema/sphn#hasCode> as ?path) (?class as ?value)
                                        WHERE {
                                            ?this <https://biomedit.ch/rdf/sphn-schema/sphn#hasCode>/<http://www.w3.org/1999/02/22-rdf-syntax-ns#type> ?class .
                                            FILTER( ?values IN ( <http://snomed.info/id/363675004> )) .
                                            ?child <http://www.w3.org/2000/01/rdf-schema#subClassOf> ?values .
                                            FILTER (?class NOT IN ( ?values, ?child) ) .
                                            FILTER NOT EXISTS { ?values <http://www.w3.org/2000/01/rdf-schema#subClassOf>+ ?class .}
                                            FILTER NOT EXISTS { ?child <http://www.w3.org/2000/01/rdf-schema#subClassOf>+ ?class .}
                                        }""" ] ;
    sh:targetClass :Intent .
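The "only direct children" rule can likewise be sketched in plain Python. As in the query above, the enumerated code itself and its direct subclasses pass (FILTER (?class NOT IN ( ?values, ?child ))), and anything deeper is flagged. The hierarchy below snomed:363675004 and the "ex:" class names are hypothetical, and a single parent per class is assumed for simplicity.

```python
# Hypothetical single-parent hierarchy below snomed:363675004; the "ex:" classes are made up.
SUBCLASS_OF = {
    "ex:IntentChild": "snomed:363675004",
    "ex:IntentGrandchild": "ex:IntentChild",
}
# Enumerated code of the constraints:Intent shape above
ALLOWED = {"snomed:363675004"}


def deeper_than_child(code_class, allowed=ALLOWED):
    """Flag a class that descends from an enumerated code by more than one level."""
    parent = SUBCLASS_OF.get(code_class)
    # The enumerated code itself and its direct subclasses are accepted.
    if code_class in allowed or parent in allowed:
        return False
    # Walk further up: reaching an enumerated code here means grandchild or deeper.
    while parent is not None:
        if parent in allowed:
            return True
        parent = SUBCLASS_OF.get(parent)
    return False


for cls in ["snomed:363675004", "ex:IntentChild", "ex:IntentGrandchild"]:
    print(cls, deeper_than_child(cls))
```

Only ex:IntentGrandchild is flagged, since it sits two levels below the enumerated code.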

Note

The no-subclasses-allowed and only-direct-subclasses-allowed constraints (sh:sparql) are not validated by GraphDB; they are ignored.

SPARQL target constraints

To avoid unwanted validation errors when instances of subclasses are validated against the constraints of their parent class, SPARQL targets are implemented for the SPHN classes that have subclasses.

constraints:Measurement a sh:NodeShape ;
    sh:closed false ;
    sh:ignoredProperties ( rdf:type ) ;
    sh:property [ sh:class :Quantity ;
            sh:path :hasQuantity ] ;
    sh:target [ a sh:SPARQLTarget ;
            sh:select """SELECT ?this
                    WHERE {
                    ?this <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <https://biomedit.ch/rdf/sphn-schema/sphn#Measurement> .
                    MINUS {
                        ?this <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <https://biomedit.ch/rdf/sphn-schema/sphn#Measurement> .
                        ?this <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> ?other_type .
                        FILTER (?other_type != <https://biomedit.ch/rdf/sphn-schema/sphn#Measurement> )
                        ?other_type <http://www.w3.org/2000/01/rdf-schema#subClassOf>+ <https://biomedit.ch/rdf/sphn-schema/sphn#Measurement> .
                        }
                    }""" ] .

The above constraint can be interpreted as follows: only instances of the class :Measurement that are not also instances of a subclass of :Measurement are validated against this shape. Therefore, instances of subclasses (e.g. :OxygenSaturation) are validated only against the constraints:OxygenSaturation shape and not against the shape of their parent class, constraints:Measurement. This is ensured by the SHACL expression sh:SPARQLTarget, whose query (sh:select) excludes instances of subclasses.
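The target selection can be mirrored in plain Python to show which focus nodes end up being validated. This minimal sketch assumes hypothetical instance data in which RDFS inference has already added the parent type :Measurement to the :OxygenSaturation instance; it illustrates the idea of the MINUS block, not a SPARQL engine.

```python
# Hypothetical data after RDFS inference: each node carries its asserted and inferred types.
SUBCLASS_OF = {":OxygenSaturation": ":Measurement"}
TYPES = {
    "ex:measurement1": {":Measurement"},
    "ex:saturation1": {":OxygenSaturation", ":Measurement"},
}


def is_strict_subclass(cls, ancestor):
    """True if cls is a direct or indirect subclass of ancestor (and not ancestor itself)."""
    parent = SUBCLASS_OF.get(cls)
    while parent is not None:
        if parent == ancestor:
            return True
        parent = SUBCLASS_OF.get(parent)
    return False


def sparql_target(target_class):
    """Instances of target_class that are not also typed with one of its subclasses."""
    return sorted(
        node
        for node, types in TYPES.items()
        if target_class in types
        and not any(is_strict_subclass(t, target_class) for t in types)
    )


print(sparql_target(":Measurement"))       # ['ex:measurement1']
print(sparql_target(":OxygenSaturation"))  # ['ex:saturation1']
```

A plain sh:targetClass :Measurement would have selected both nodes, because inference makes ex:saturation1 a :Measurement too.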

Note

The SPARQL target constraint (sh:SPARQLTarget) is supported by GraphDB since version 10.3.

Sequence paths

Some constraints apply to values that are reached through a sequence of properties, specified as a sequence path. This is expressed as follows:

constraints:Age a sh:NodeShape ;
    sh:closed false ;
    sh:ignoredProperties ( rdf:type ) ;
    sh:property [ sh:class :SubjectPseudoIdentifier ;
            sh:maxCount 1 ;
            sh:minCount 1 ;
            sh:path :hasSubjectPseudoIdentifier ],
        [ sh:in ( ucum:h ucum:wk ucum:a ucum:d ucum:mo ucum:min ) ;
            sh:maxCount 1 ;
            sh:minCount 1 ;
            sh:path ( :hasQuantity :hasUnit :hasCode ) ] ;
    sh:targetClass :Age .

The above constraint can be interpreted as follows: for all instances of the class :Age, the value reached via the sequence path :hasQuantity / :hasUnit / :hasCode must be one of the enumerated values (sh:in).

In other words, when an age is given, the only possible code values for its unit are hour, week, year, day, month or minute.
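Following a sequence path amounts to hopping from node to node, one property at a time, and checking whatever values remain at the end. A minimal plain-Python sketch, assuming hypothetical instance data stored as a (subject, predicate) to objects mapping and reusing the sh:in list of the Age shape above:

```python
# Hypothetical data: (subject, predicate) -> list of objects
GRAPH = {
    ("ex:age1", ":hasQuantity"): ["ex:quantity1"],
    ("ex:quantity1", ":hasUnit"): ["ex:unit1"],
    ("ex:unit1", ":hasCode"): ["ucum:a"],
    ("ex:age2", ":hasQuantity"): ["ex:quantity2"],
    ("ex:quantity2", ":hasUnit"): ["ex:unit2"],
    ("ex:unit2", ":hasCode"): ["ucum:cm"],
}
AGE_PATH = (":hasQuantity", ":hasUnit", ":hasCode")
# sh:in list copied from the constraints:Age shape above
AGE_UNITS = {"ucum:h", "ucum:wk", "ucum:a", "ucum:d", "ucum:mo", "ucum:min"}


def follow_sequence_path(node, path):
    """Collect all values reached by following the properties of a sequence path in order."""
    nodes = [node]
    for prop in path:
        nodes = [obj for n in nodes for obj in GRAPH.get((n, prop), [])]
    return nodes


def sh_in_violations(node, path, allowed):
    """Values at the end of the path that are not in the sh:in list."""
    return [value for value in follow_sequence_path(node, path) if value not in allowed]


print(sh_in_violations("ex:age1", AGE_PATH, AGE_UNITS))  # []
print(sh_in_violations("ex:age2", AGE_PATH, AGE_UNITS))  # ['ucum:cm']
```

The hypothetical ex:age2 uses a length unit, which is not in the list of time units and is therefore reported.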

Note

Sequence paths are not validated by GraphDB; they are ignored.

Literal type constraints

Besides the object properties, where Restrictions on classes are used, there are also data properties. On data properties, the possible datatypes can be restricted using Literal type constraints. In the class :Code, three of them are in use: on the properties :hasCodingSystemAndVersion, :hasIdentifier and :hasName, the SHACL file validates that the literal used is of type xsd:string.

constraints:Code a sh:NodeShape ;
    sh:closed false ;
    sh:ignoredProperties ( rdf:type ) ;
    sh:property [ sh:datatype xsd:string ;
            sh:maxCount 1 ;
            sh:minCount 1 ;
            sh:path :hasCodingSystemAndVersion ],
        [ sh:datatype xsd:string ;
            sh:maxCount 1 ;
            sh:minCount 0 ;
            sh:path :hasName ],
        [ sh:datatype xsd:string ;
            sh:maxCount 1 ;
            sh:minCount 1 ;
            sh:path :hasIdentifier ] ;
    sh:targetClass :Code .

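The interpretation of the above constraint is: whenever the property :hasName is used in an instance of :Code, its object needs to be a literal of type xsd:string. The same holds for :hasCodingSystemAndVersion and :hasIdentifier, which in addition must be present exactly once (sh:minCount 1 and sh:maxCount 1).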
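The datatype comparison can be sketched as below. To keep it self-contained, literals are modelled as hypothetical (lexical form, datatype IRI) pairs; note that a real sh:datatype check also verifies that the lexical form is well-formed for the datatype, which this sketch does not do.

```python
XSD_STRING = "xsd:string"

# Expected datatype per property, taken from the constraints:Code shape above
CODE_DATATYPES = {
    ":hasCodingSystemAndVersion": XSD_STRING,
    ":hasIdentifier": XSD_STRING,
    ":hasName": XSD_STRING,
}

# Hypothetical :Code instance: property -> list of (lexical form, datatype IRI)
code_values = {
    ":hasIdentifier": [("B01AC06", XSD_STRING)],
    ":hasCodingSystemAndVersion": [("ATC", XSD_STRING)],
    ":hasName": [("42", "xsd:integer")],  # wrong datatype on purpose
}


def datatype_violations(values, expected):
    """Return (property, lexical form) pairs whose datatype differs from sh:datatype."""
    return [
        (prop, lexical)
        for prop, literals in values.items()
        for lexical, datatype in literals
        if prop in expected and datatype != expected[prop]
    ]


print(datatype_violations(code_values, CODE_DATATYPES))  # [(':hasName', '42')]
```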

Restricting on individuals/instances

In some cases it is forbidden to create new instances of a class; only already existing, so-called individuals (instances) are allowed. This constraint is, for instance, applied to entities of the type :Biosample_fixationType, as shown in the following:

constraints:Biosample_fixationType a sh:NodeShape ;
    sh:closed false ;
    sh:ignoredProperties ( rdf:type ) ;
    sh:property [ sh:in ( :AldehydeBased :RNALater :VacuumTechnologyStabilization :Other :AlcoholBased :HeatStabilization :AllprotectTissueReagent :NeutralBufferedFormalin :SnapFreezing :UNK :OptimumCuttingTemperatureMedium :PAXgeneTissue :NonaldehydeWithAceticAcid :NonaldehydeBasedWithoutAceticAcid :NonbufferedFormalin ) ;
            sh:path [ sh:inversePath rdf:type ] ] ;
    sh:targetClass sphn:Biosample_fixationType .

This SHACL constraint ensures that only the explicitly enumerated individuals are used as instances of the class :Biosample_fixationType. In addition, by means of the inverse path constraint sh:inversePath rdf:type, it forbids deriving new entities as subclasses.
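The core idea, that every instance of the class must be one of the enumerated individuals, can be sketched in plain Python. The individuals are copied from the shape above; the instance data, including the newly created ex:MyOwnFixation, is hypothetical. This illustrates the rule only and does not model the inverse-path mechanics.

```python
# Enumerated individuals, copied from the sh:in list of constraints:Biosample_fixationType
FIXATION_TYPES = {
    ":AldehydeBased", ":RNALater", ":VacuumTechnologyStabilization", ":Other",
    ":AlcoholBased", ":HeatStabilization", ":AllprotectTissueReagent",
    ":NeutralBufferedFormalin", ":SnapFreezing", ":UNK",
    ":OptimumCuttingTemperatureMedium", ":PAXgeneTissue",
    ":NonaldehydeWithAceticAcid", ":NonaldehydeBasedWithoutAceticAcid",
    ":NonbufferedFormalin",
}

# Hypothetical data: node -> asserted rdf:type
TYPE_OF = {
    ":SnapFreezing": ":Biosample_fixationType",
    "ex:MyOwnFixation": ":Biosample_fixationType",  # newly created, not allowed
}


def non_enumerated_instances(cls, allowed):
    """Instances of cls that are not among the enumerated individuals."""
    return sorted(node for node, t in TYPE_OF.items() if t == cls and node not in allowed)


print(non_enumerated_instances(":Biosample_fixationType", FIXATION_TYPES))  # ['ex:MyOwnFixation']
```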

Restricting that the start is before the end

Whenever there are start and end datetimes given in the schema, a constraint is created to ensure that it is a valid timeframe (start before end).

constraints:ElectrocardiographicProcedure a sh:NodeShape ;
    sh:closed false ;
    sh:ignoredProperties ( rdf:type :hasIntent :hasPhysiologicState ) ;
    sh:sparql [ a sh:SPARQLConstraint ;
            sh:message "Invalid time frame between sphn:hasStartDateTime and sphn:hasEndDateTime" ;
            sh:select """SELECT ?this (<https://biomedit.ch/rdf/sphn-schema/sphn#hasStartDateTime> as ?path) (?hasStartDateTime as ?value)
                    WHERE {
                        ?this <https://biomedit.ch/rdf/sphn-schema/sphn#hasStartDateTime> ?hasStartDateTime .
                        ?this <https://biomedit.ch/rdf/sphn-schema/sphn#hasEndDateTime> ?hasEndDateTime .
                        FILTER (?hasStartDateTime > ?hasEndDateTime)
                    }""" ] ;
    sh:targetClass :ElectrocardiographicProcedure .

This shortened excerpt of the SHACL shape for :ElectrocardiographicProcedure uses sh:sparql to report every instance whose :hasStartDateTime value is later than its :hasEndDateTime value, i.e. it ensures that the start does not happen after the end.
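The FILTER condition can be mirrored with Python's standard datetime module. This is a simplified sketch: the timestamps are hypothetical and all carry the same explicit UTC offset, because Python refuses to compare naive and timezone-aware datetimes, whereas SPARQL applies its own xsd:dateTime comparison rules. As in the query, an equal start and end is not flagged.

```python
from datetime import datetime


def invalid_time_frame(start: str, end: str) -> bool:
    """Mirror FILTER (?hasStartDateTime > ?hasEndDateTime): flag only a start after the end."""
    return datetime.fromisoformat(start) > datetime.fromisoformat(end)


# Hypothetical timestamps, all with the same explicit UTC offset
print(invalid_time_frame("2023-05-01T10:00:00+02:00", "2023-05-01T10:30:00+02:00"))  # False
print(invalid_time_frame("2023-05-01T11:00:00+02:00", "2023-05-01T10:30:00+02:00"))  # True
```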

Naming convention on schema instances

The naming convention in 2.2 Naming convention for SPHN data instances describes the convention that must be used to instantiate resources of SPHN and project classes. This convention is translated into a validation constraint with severity sh:Warning, as shown in the following excerpt for :GenomicPosition:

constraints:GenomicPosition_Warning_Naming a sh:NodeShape ;
    sh:severity sh:Warning ;
    sh:sparql [ a sh:SPARQLConstraint ;
            sh:message "Instantiated unique resource not matching naming convention '^https://biomedit.ch/rdf/sphn-resource/.*GenomicPosition-.*$'" ;
            sh:select """PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
                         PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
                         SELECT ?this (?class as ?path) (?this as ?value)
                         WHERE {
                             ?this rdf:type ?class .
                             FILTER(!REGEX(STR(?this), "^https://biomedit.ch/rdf/sphn-resource/.*GenomicPosition-.*$"))
                             }""" ] ;
    sh:targetClass :GenomicPosition .
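The regular expression in the shape can be tried out directly with Python's re module. A minimal sketch, assuming two hypothetical instance IRIs; the pattern is copied unchanged from the shape, so its unescaped dots match any character, as they do in the SPARQL REGEX.

```python
import re

# Pattern copied from the sh:message and FILTER of constraints:GenomicPosition_Warning_Naming
GENOMIC_POSITION_PATTERN = re.compile(r"^https://biomedit.ch/rdf/sphn-resource/.*GenomicPosition-.*$")


def naming_warnings(instance_iris, pattern=GENOMIC_POSITION_PATTERN):
    """Return the IRIs that do not match the naming convention (reported as warnings)."""
    return [iri for iri in instance_iris if not pattern.match(iri)]


# Hypothetical instance IRIs
iris = [
    "https://biomedit.ch/rdf/sphn-resource/CHE-000-GenomicPosition-abc",
    "https://example.org/genomic-position/1",
]
print(naming_warnings(iris))  # ['https://example.org/genomic-position/1']
```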

Naming convention on shared instances

The naming convention in Naming convention for shared resources describes the convention that must be used to instantiate shared resources such as terminology instances. This convention is translated into a validation constraint with severity sh:Warning, as shown in the following excerpt for :HeartRate:

constraints:sphnHeartRate_Warning_Naming a sh:NodeShape ;
    sh:severity sh:Warning ;
    sh:sparql [ a sh:SPARQLConstraint ;
            sh:message "Instantiated shared resource not matching naming convention '^https://biomedit.ch/rdf/sphn-resource/.*-Code-.*$'" ;
            sh:select """PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
                        SELECT ?this (<https://biomedit.ch/rdf/sphn-schema/sphn#hasRegularityCode> as ?path) (?code as ?value)
                        WHERE {
                            ?this <https://biomedit.ch/rdf/sphn-schema/sphn#hasRegularityCode> ?code .
                            FILTER(!REGEX(STR(?code), "^https://biomedit.ch/rdf/sphn-resource/.*-Code-.*$"))
                            }""" ] ;
    sh:targetClass :HeartRate .

Old versioned code that has been valid, but is not valid in the current version

If an old versioned code is used that has been valid but is not valid anymore, it is checked whether this code was affected by a meaning change during the period covered by the historization. A single shape covers this for all terminology instances used, regardless of the class in which they are used. It relies on a lookup property (sphn:hasMeaningValidityInCurrent) precomputed by the terminology service.

constraints:OldVersionedCodeHasBeenValid a sh:NodeShape ;
    sh:severity sh:Info ;
    sh:sparql [ a sh:SPARQLConstraint ;
            sh:message "The versioned code is not valid anymore due to code meaning change." ;
            sh:select """PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
                        PREFIX sphn: <https://biomedit.ch/rdf/sphn-schema/sphn#>
                        SELECT  ?this (rdf:type as ?path) (?type as ?value)
                        WHERE {
                            ?this rdf:type ?type .
                            }""" ] ;
    sh:target [ a sh:SPARQLTarget ;
            sh:select """PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
                        PREFIX sphn: <https://biomedit.ch/rdf/sphn-schema/sphn#>
                        PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
                        SELECT  ?this
                        WHERE {
                            ?this rdf:type ?type .
                            ?type sphn:hasMeaningValidityInCurrent ?validity .
                            FILTER(?validity = false)
                            }""" ] .

Whenever an instance belongs to a code class annotated with sphn:hasMeaningValidityInCurrent false, i.e. a class whose meaning is no longer valid in the current version, a result is triggered. It is reported with severity sh:Info and the message that the versioned code is not valid anymore due to a code meaning change; the class of the instance is given as the result value.

Old versioned code which is still valid

If an old versioned code is used that is not in the most recent version but is still valid, it is checked whether this code was affected by a meaning change. A single shape covers this for all terminology instances used, regardless of the class in which they are used. This uses lookup properties precomputed by the terminology service.

constraints:OldVersionedCodeStillValid a sh:NodeShape ;
    sh:severity sh:Info ;
    sh:sparql [ a sh:SPARQLConstraint ;
            sh:message "The versioned code is old but still valid" ;
            sh:select """PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
                        PREFIX sphn: <https://biomedit.ch/rdf/sphn-schema/sphn#>
                        SELECT  ?this (rdf:type as ?path) (?type as ?value)
                        WHERE {
                            ?this rdf:type ?type .
                            }""" ] ;
    sh:target [ a sh:SPARQLTarget ;
            sh:select """PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
                            PREFIX sphn: <https://biomedit.ch/rdf/sphn-schema/sphn#>
                            PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
                            SELECT  ?this
                            WHERE {
                                ?this rdf:type ?type .
                                ?type sphn:hasMeaningValidityInCurrent ?validity .
                                OPTIONAL {?type sphn:isCurrent ?isCurrent . }
                                FILTER(?validity = true && (!BOUND(?isCurrent) || ?isCurrent != true))
                                }""" ] .

Whenever an instance belongs to a code class annotated with sphn:hasMeaningValidityInCurrent true that is not marked as current (sphn:isCurrent is missing or not true), a result is triggered. It is reported with severity sh:Info and the message that the versioned code is old but still valid; the class of the instance is given as the result value.
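The target conditions of the two versioned-code shapes can be condensed into one small Python function. This is a sketch under an assumption: the precomputed lookup values of a code class are passed as a hypothetical dictionary, where a missing key stands for an unbound value in SPARQL. The returned messages are copied from the shapes above.

```python
def versioned_code_result(annotations):
    """Return (severity, message) of the historization shape targeting a code class, or None.

    `annotations` is a hypothetical dict with the precomputed lookup values
    'hasMeaningValidityInCurrent' and 'isCurrent' (a missing key means "not bound").
    """
    validity = annotations.get("hasMeaningValidityInCurrent")
    is_current = annotations.get("isCurrent")
    # constraints:OldVersionedCodeHasBeenValid: FILTER(?validity = false)
    if validity is False:
        return ("sh:Info", "The versioned code is not valid anymore due to code meaning change.")
    # constraints:OldVersionedCodeStillValid:
    # FILTER(?validity = true && (!BOUND(?isCurrent) || ?isCurrent != true))
    if validity is True and is_current is not True:
        return ("sh:Info", "The versioned code is old but still valid")
    return None


print(versioned_code_result({"hasMeaningValidityInCurrent": True}))
print(versioned_code_result({"hasMeaningValidityInCurrent": True, "isCurrent": True}))  # None
```

A code class without any validity annotation is targeted by neither shape, just as the SPARQL targets require the ?validity binding.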

Unversioned code has changed its meaning in historization

If an unversioned code is used where a versioned code could have been used, it is checked whether this code was affected by a meaning change during the period covered by the historization. A single shape covers this for all terminology instances used, regardless of the class in which they are used. It relies on a lookup property (sphn:hasMeaningChangeInHistorization) precomputed by the terminology service.

constraints:UnversionedCodeHasMeaningChange a sh:NodeShape ;
    sh:severity sh:Warning ;
    sh:sparql [ a sh:SPARQLConstraint ;
            sh:message "The unversioned code you are using has changed its meaning during the time. Please consider using a versioned code." ;
            sh:select """PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
                         PREFIX sphn: <https://biomedit.ch/rdf/sphn-schema/sphn#>
                         SELECT  ?this (rdf:type as ?path) (STR(?codeChange) as ?value)
                         WHERE {
                             ?this rdf:type ?type .
                             ?type sphn:hasMeaningChangeInHistorization ?codeChange .
                             }""" ] ;
    sh:target [ a sh:SPARQLTarget ;
            sh:select """PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
                         PREFIX sphn: <https://biomedit.ch/rdf/sphn-schema/sphn#>
                         SELECT  ?this
                         WHERE {
                             ?this rdf:type ?type .
                             ?type sphn:hasMeaningChangeInHistorization ?codeChange .
                             }""" ] .

Any instance of a class that is annotated as affected by a meaning change in the historization triggers a result. This produces a warning stating that the meaning of the code has changed, with the year of the change returned as the result value.
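The sketch below shows the logic of this shape by emulating its SPARQL constraint in plain Python. It runs over a hypothetical list of (subject, predicate, object) tuples, without an RDF library. The example IRIs (ex:code1, ex:CODE_A) and the year value are invented for illustration. In real data, the sphn:hasMeaningChangeInHistorization annotation is provided by the terminology service.

```python
# Minimal, stdlib-only emulation of the UnversionedCodeHasMeaningChange check.
# Triples are (subject, predicate, object) tuples; all example IRIs and the
# year value are hypothetical.
RDF_TYPE = "rdf:type"
MEANING_CHANGE = "sphn:hasMeaningChangeInHistorization"


def meaning_change_warnings(triples):
    """Return (this, path, value) rows, mirroring the shape's SELECT clause."""
    # Lookup annotations: code class -> meaning-change values (precomputed upstream)
    changes = {}
    for s, p, o in triples:
        if p == MEANING_CHANGE:
            changes.setdefault(s, set()).add(o)
    rows = []
    # ?this rdf:type ?type . ?type sphn:hasMeaningChangeInHistorization ?codeChange .
    for s, p, o in triples:
        if p == RDF_TYPE and o in changes:
            for change in sorted(changes[o]):
                rows.append((s, RDF_TYPE, str(change)))  # STR(?codeChange) as ?value
    return rows


if __name__ == "__main__":
    data = [
        ("ex:code1", RDF_TYPE, "ex:CODE_A"),    # instance of an unversioned code
        ("ex:code2", RDF_TYPE, "ex:CODE_B"),
        ("ex:CODE_A", MEANING_CHANGE, "2021"),  # hypothetical lookup annotation
    ]
    for row in meaning_change_warnings(data):
        print("WARNING", row)  # prints: WARNING ('ex:code1', 'rdf:type', '2021')
```

Only ex:code1 is reported, because only its class carries the meaning-change annotation.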

Implementation examples

Class Example
constraints:Quantity a sh:NodeShape ;
    sh:closed true ;
    sh:ignoredProperties ( rdf:type ) ;
    sh:property [ sh:class :Unit ;
            sh:maxCount 1 ;
            sh:minCount 1 ;
            sh:path :hasUnit ],
        [ sh:maxCount 1 ;
            sh:minCount 1 ;
            sh:or ( [ sh:datatype xsd:double ] [ sh:datatype xsd:string ] ) ;
            sh:path :hasValue ] ;
    sh:targetClass :Quantity .

The NodeShape shown above is derived from several parts of the schema. From bottom to top:

  • there is a class :Quantity in the schema (last line: sh:targetClass :Quantity)

  • the properties :hasUnit and :hasValue have :Quantity in their domain (sh:property and the following lines)

  • both properties have defined cardinalities (sh:minCount and sh:maxCount)

  • the property :hasUnit has the class :Unit in its range (sh:class :Unit). The :Unit class has a NodeShape of its own.

  • the property :hasValue has xsd:double and xsd:string as alternative datatypes in its range (sh:or)

  • rdf:type is listed in sh:ignoredProperties, so it is allowed on instances even though it is not declared as a property of the shape

  • the shape is closed (sh:closed true), meaning that no properties other than the declared ones (and the ignored rdf:type) are allowed.
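The following is a minimal, library-free sketch of the checks that the Quantity NodeShape encodes: cardinalities, the range of each property, and the closed-shape rule. It does not describe how the SHACLer or a SHACL engine works internally. Several parts are assumptions made for illustration: the dictionary-based data model, the Python float/str stand-ins for xsd:double/xsd:string, and the example IRIs. In addition, sh:class is simplified to a direct type check without subclass reasoning.

```python
# Minimal sketch of the checks encoded in the Quantity NodeShape, using plain
# Python data instead of an RDF graph (hypothetical data model):
#   node       = {property: [values]}
#   node_types = {node IRI: set of classes}   (used for the sh:class check)
IGNORED = {"rdf:type"}  # sh:ignoredProperties ( rdf:type )

# property -> (sh:minCount, sh:maxCount, value check)
QUANTITY_SHAPE = {
    # sh:class :Unit (simplified: direct type only, no rdfs:subClassOf)
    ":hasUnit": (1, 1, lambda v, node_types: ":Unit" in node_types.get(v, set())),
    # sh:or ( xsd:double xsd:string ), approximated by Python float or str
    ":hasValue": (1, 1, lambda v, node_types: isinstance(v, (float, str))),
}


def validate_closed(node, shape, node_types):
    """Return a list of violation messages for one focus node."""
    errors = []
    for prop, (min_count, max_count, check) in shape.items():
        values = node.get(prop, [])
        if not min_count <= len(values) <= max_count:
            errors.append(f"{prop}: expected {min_count}..{max_count} values, got {len(values)}")
        errors += [f"{prop}: invalid value {v!r}" for v in values if not check(v, node_types)]
    # sh:closed true: any property that is neither declared nor ignored is reported
    for prop in node:
        if prop not in shape and prop not in IGNORED:
            errors.append(f"{prop}: property not allowed by the closed shape")
    return errors


node_types = {"ex:unit1": {":Unit"}}
good = {"rdf:type": [":Quantity"], ":hasUnit": ["ex:unit1"], ":hasValue": [5.2]}
bad = {":hasValue": [5.2, "n/a"], ":hasComment": ["note"]}
print(validate_closed(good, QUANTITY_SHAPE, node_types))  # []
print(validate_closed(bad, QUANTITY_SHAPE, node_types))   # three messages
```

For the bad node, the sketch reports three problems: the missing :hasUnit, the second :hasValue, and the undeclared :hasComment property.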

Meaning Binding / Individual Example
constraints:OncologyTreatmentAssessment_result a sh:NodeShape ;
    sh:closed true ;
    sh:ignoredProperties ( rdf:type ) ;
    sh:property [ sh:in ( :CompleteResponse :StableDisease :Unknown :ProgressiveDisease :PartialResponse ) ;
            sh:inversePath rdf:type ] ;
    sh:targetClass :OncologyTreatmentAssessment_result .

A Meaning Binding or Individual also results in a NodeShape, as shown above. From bottom to top:

  • there is a class :OncologyTreatmentAssessment_result in the schema (last line: sh:targetClass :OncologyTreatmentAssessment_result)

  • the inverse path of rdf:type (sh:inversePath rdf:type) combined with sh:in means that all instances of the class :OncologyTreatmentAssessment_result have to be one of the individuals in the sh:in list. Only :CompleteResponse, :StableDisease, :Unknown, :ProgressiveDisease and :PartialResponse are allowed

  • rdf:type is listed in sh:ignoredProperties, so it is allowed even though it is not declared as a property of the shape

  • the shape is closed (sh:closed true), meaning that no other properties are allowed.
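Following the description above, the value-set check can be sketched in Python as a set difference between the instances of the target class and the sh:in list. The triples and the individual :Cured are hypothetical example data. They are not taken from the SPHN schema.

```python
# Minimal sketch of the value-set check described above, on a hypothetical
# list of (subject, predicate, object) triples: every instance of the target
# class must be one of the sh:in values.
TARGET = ":OncologyTreatmentAssessment_result"
ALLOWED = {":CompleteResponse", ":StableDisease", ":Unknown",
           ":ProgressiveDisease", ":PartialResponse"}  # the sh:in list


def value_set_violations(triples, target_class, allowed):
    """Return the instances of target_class that are not in the allowed list."""
    # Instances are the subjects of (x, rdf:type, target_class) triples
    instances = {s for s, p, o in triples if p == "rdf:type" and o == target_class}
    return sorted(instances - allowed)


triples = [
    (":StableDisease", "rdf:type", TARGET),
    (":Cured", "rdf:type", TARGET),  # hypothetical individual not in the value set
]
print(value_set_violations(triples, TARGET, ALLOWED))  # [':Cured']
```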

Validating data with the SHACL file

Data producers can use this SHACL file to validate data exported according to the given RDF Schema (see Data validation with SHACL). Validating data before sending it to users prevents the distribution of data that is inconsistent with the RDF Schema (e.g., data with missing properties, with properties not specified in the RDF Schema, or with wrong datatypes).

Validation constraint severity levels

There are three different severity levels implemented in the SHACLer:

  • [ERROR] : violating a constraint with this severity fails the validation. For the validation to succeed, the error must be fixed. The validation error message provides information about the issue.

  • [WARN] : violating a constraint with this severity does not fail the validation. It is recommended to check the data for a potential error.

  • [INFO] : violating a constraint with this severity does not fail the validation. It is purely informational, e.g., to report that an old but still valid code of a historized terminology is used.
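The pass/fail rule above can be sketched as follows. Several elements are assumptions: the mapping of [ERROR], [WARN] and [INFO] to sh:Violation, sh:Warning and sh:Info, the Result class, and the example messages. In the SHACL specification itself, sh:conforms is false as soon as any validation result is reported, whatever its severity. The sketch applies the SHACLer policy described above instead.

```python
from dataclasses import dataclass

# Assumed mapping from SHACL severities to the SHACLer levels listed above.
LEVEL = {"sh:Violation": "ERROR", "sh:Warning": "WARN", "sh:Info": "INFO"}


@dataclass
class Result:
    focus_node: str
    severity: str  # "sh:Violation", "sh:Warning" or "sh:Info"
    message: str


def summarize(results):
    """Return (passed, report lines); only ERROR-level results fail validation."""
    lines = [f"[{LEVEL[r.severity]}] {r.focus_node}: {r.message}" for r in results]
    passed = all(r.severity != "sh:Violation" for r in results)
    return passed, lines


results = [
    Result("ex:code1", "sh:Warning", "The unversioned code has changed its meaning"),
    Result("ex:code2", "sh:Info", "Old but still valid code"),  # hypothetical message
]
passed, lines = summarize(results)
print(passed)  # True: warnings and infos do not fail the validation
for line in lines:
    print(line)
```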

SPHN Schema Visualization Tool

The SPHN Schema Visualization Tool generates a human-readable HTML document describing an RDF schema by extracting information directly from the schema.

The SPHN Schema Visualization Tool is used to generate the HTML version of the SPHN RDF Schema available at: https://biomedit.ch/rdf/sphn-schema/sphn.

The tool is based on pyLODE and provides detailed information about:

  • classes

  • class restrictions

  • object properties

  • datatype properties

  • annotation properties

  • named individuals

as defined in the RDF Schema.

In SPHN, the tool adds a search function to the web documentation that lets users look for classes, properties, and individuals of interest (see Figure 1).

Figure 1. Search function of the SPHN Schema Visualization.

SPHN projects can use the SPHN Schema Visualization Tool to build their own HTML file that presents their schema in a human-readable form. For details, please check the user guide.

Visualization of classes

For each class, as shown in Figure 2, the following information is displayed:

  • the name of the class

  • the URI of the class

  • a description defining the class

  • a graphical representation of the class and its outgoing properties

  • if provided, a meaning binding of the class to standard existing terminology codes

  • the parent classes of the class

  • a table listing the properties of which the class is the subject, with the allowed cardinality, the class or datatype of the range, and any restrictions

  • restrictions on properties of the class (where the class is the subject), detailing the codes allowed for each restriction

  • the list of properties that have the class in their range, i.e., where the class can be used as a value.

Figure 2. View of the ‘Drug’ class in HTML. The metadata and connections to the Drug class are represented in the HTML generated by the SPHN Schema Visualization Tool.

Note

Currently, the class-level images in the SPHN RDF Schema visualization are not generated by the SPHN Schema Visualization Tool; they are created manually. The tool then integrates them into the corresponding class sections of the generated HTML documentation.

Visualization of properties

For each property, whether it is an object property or a datatype property, the following information is displayed:

  • the name of the property

  • the URI of the property

  • a description defining the property

  • the parent properties of the property (called ‘Super-properties’)

  • the domain(s) of the property (the class(es) allowed as the subject of the property)

  • the range(s) or datatype(s) of the property (the class(es) or datatype(s) allowed as the object/value of the property).

Figure 3. View of the ‘hasTranscript’ property in HTML. The metadata of the hasTranscript property is represented in the HTML generated by the SPHN Schema Visualization Tool.

Visualization of valuesets and individuals

In the RDF schemas, individuals can be defined to accommodate values that cannot be encoded with standard codes. These individuals are connected to specific classes that reflect the Valuesets and the context in which they can be used. A Valueset is a collection of individual values used to describe a class restriction. For each Valueset, the following information is displayed:

  • The name of the Valueset

  • The URI of the Valueset

  • A description of the Valueset usage

  • The parent, which is either the SPHN Valueset or a project Valueset

  • The Individuals that make up the Valueset

Figure 4. View of the ‘Comparator’ valueset in HTML.

For each Individual, the following information is displayed:

  • The name of the Individual

  • The Valueset to which the Individual belongs

  • The URI of the Individual

Figure 5. View of the ‘GreaterThan’ individual in HTML.

SPHN SPARQLer

The SPARQLer generates statistical queries that make data exploration easier and can serve as a starting point for more refined project queries. Four kinds of queries are generated:

  • Flattening

  • hasCode

  • Count Instances

  • Min Max

Flattening

The first kind of query is the flattening query. For any base concept, this query displays all connections and downstream concepts that can be instantiated within the SPHN and project schemas. If the schema contains loops, each loop is explored only once. An example for the Assessment Result concept is shown below:

PREFIX sphn:<https://biomedit.ch/rdf/sphn-schema/sphn#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

SELECT *
WHERE {
    ?resource a <https://biomedit.ch/rdf/sphn-schema/sphn#AssessmentResult> .
    optional{ ?resource sphn:hasQuantity ?Quantity . }
    optional{ ?resource sphn:hasQuantity/sphn:hasComparator ?Quantity_Comparator . }
    optional{ ?resource sphn:hasQuantity/sphn:hasValue ?Quantity_Value . }
    optional{ ?resource sphn:hasQuantity/sphn:hasUnit ?Quantity_Unit . }
    optional{ ?resource sphn:hasQuantity/sphn:hasUnit/sphn:hasCode ?Quantity_Unit_Code . }
    optional{ ?resource sphn:hasQuantity/sphn:hasUnit/sphn:hasCode/sphn:hasIdentifier ?Quantity_Unit_Code_Identifier . }
    optional{ ?resource sphn:hasQuantity/sphn:hasUnit/sphn:hasCode/sphn:hasName ?Quantity_Unit_Code_Name . }
    optional{ ?resource sphn:hasQuantity/sphn:hasUnit/sphn:hasCode/sphn:hasCodingSystemAndVersion ?Quantity_Unit_Code_CodingSystemAndVersion . }
    optional{ ?resource sphn:hasCode ?Code . }
    optional{ ?resource sphn:hasCode/sphn:hasIdentifier ?Code_Identifier . }
    optional{ ?resource sphn:hasCode/sphn:hasName ?Code_Name . }
    optional{ ?resource sphn:hasCode/sphn:hasCodingSystemAndVersion ?Code_CodingSystemAndVersion . }
    optional{ ?resource sphn:hasStringValue ?StringValue . }
}
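The flattening query can be derived mechanically from the schema. The sketch below assumes a simplified, hypothetical schema model: a dictionary that maps each class to its (property, range class) pairs, with None for literal or value-set ranges. It walks this model depth first. The loop handling stops descending into a class that is already on the current path. This is one possible reading of "explored only once" and is not necessarily what the SPARQLer does. With the toy schema below, the generated OPTIONAL patterns are the same as in the example above, in the same order.

```python
# Sketch of generating a flattening query from a simplified, hypothetical
# schema model: class -> list of (property, range class or None).
SPHN = "https://biomedit.ch/rdf/sphn-schema/sphn#"
PREFIXES = (
    f"PREFIX sphn:<{SPHN}>\n"
    "PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>\n"
    "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>\n"
)


def var_name(prop):
    # "hasQuantity" -> "Quantity", matching the variable names in the example
    return prop[3:] if prop.startswith("has") else prop


def flatten_lines(schema, cls, path=(), names=(), seen=frozenset()):
    """Depth-first walk emitting one OPTIONAL pattern per reachable property."""
    visited = seen | {cls}
    lines = []
    for prop, range_cls in schema.get(cls, []):
        p, n = path + (f"sphn:{prop}",), names + (var_name(prop),)
        lines.append(f"    optional{{ ?resource {'/'.join(p)} ?{'_'.join(n)} . }}")
        # Assumed loop handling: do not descend into a class already on the path
        if range_cls is not None and range_cls not in visited:
            lines += flatten_lines(schema, range_cls, p, n, visited)
    return lines


def flattening_query(schema, cls):
    return (PREFIXES + "\nSELECT *\nWHERE {\n"
            f"    ?resource a <{SPHN}{cls}> .\n"
            + "\n".join(flatten_lines(schema, cls)) + "\n}")


SCHEMA = {  # hypothetical excerpt, reduced to what the example query needs
    "AssessmentResult": [("hasQuantity", "Quantity"), ("hasCode", "Code"),
                         ("hasStringValue", None)],
    "Quantity": [("hasComparator", None), ("hasValue", None), ("hasUnit", "Unit")],
    "Unit": [("hasCode", "Code")],
    "Code": [("hasIdentifier", None), ("hasName", None),
             ("hasCodingSystemAndVersion", None)],
}
print(flattening_query(SCHEMA, "AssessmentResult"))
```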

hasCode

The hasCode query is based on the same idea as the flattening query. However, it displays only the codes reached through hasCode properties, together with a count of instances per code. For Assessment Result, the generated query is as follows:

PREFIX sphn:<https://biomedit.ch/rdf/sphn-schema/sphn#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

SELECT *
WHERE {
    {
    SELECT ?origin ?code ?code_identifier ?code_name ?code_codingSystemAndVersion (COUNT(?code) as ?count_instances)
    WHERE {
        ?resource a <https://biomedit.ch/rdf/sphn-schema/sphn#AssessmentResult> .

        BIND("sphn:hasQuantity/sphn:hasUnit/sphn:hasCode" as ?origin)
        optional{ ?resource sphn:hasQuantity/sphn:hasUnit/sphn:hasCode ?code . }
        optional{ ?resource sphn:hasQuantity/sphn:hasUnit/sphn:hasCode/sphn:hasIdentifier ?code_identifier . }
        optional{ ?resource sphn:hasQuantity/sphn:hasUnit/sphn:hasCode/sphn:hasName ?code_name . }
        optional{ ?resource sphn:hasQuantity/sphn:hasUnit/sphn:hasCode/sphn:hasCodingSystemAndVersion ?code_codingSystemAndVersion . }

    }
    GROUP BY ?origin ?code ?code_identifier ?code_name ?code_codingSystemAndVersion
    }
        UNION
    {
    SELECT ?origin ?code ?code_identifier ?code_name ?code_codingSystemAndVersion (COUNT(?code) as ?count_instances)
    WHERE {
        ?resource a <https://biomedit.ch/rdf/sphn-schema/sphn#AssessmentResult> .

        BIND("sphn:hasCode" as ?origin)
        optional{ ?resource sphn:hasCode ?code . }
        optional{ ?resource sphn:hasCode/sphn:hasIdentifier ?code_identifier . }
        optional{ ?resource sphn:hasCode/sphn:hasName ?code_name . }
        optional{ ?resource sphn:hasCode/sphn:hasCodingSystemAndVersion ?code_codingSystemAndVersion . }

    }
    GROUP BY ?origin ?code ?code_identifier ?code_name ?code_codingSystemAndVersion
    }
}
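The hasCode query has a regular structure: one grouped subquery per path ending in sphn:hasCode, joined with UNION. A small string builder can reproduce it. The function names and the hand-written list of origin paths are assumptions. A real generator would derive the origins from the schema.

```python
# Sketch of assembling a hasCode query: one grouped subquery per origin path
# (each ending in sphn:hasCode), joined with UNION.
CODE_FIELDS = [("hasIdentifier", "code_identifier"), ("hasName", "code_name"),
               ("hasCodingSystemAndVersion", "code_codingSystemAndVersion")]
GROUP_VARS = "?origin ?code ?code_identifier ?code_name ?code_codingSystemAndVersion"


def code_subquery(class_iri, origin):
    optionals = [f"        optional{{ ?resource {origin} ?code . }}"]
    optionals += [f"        optional{{ ?resource {origin}/sphn:{prop} ?{var} . }}"
                  for prop, var in CODE_FIELDS]
    return "\n".join([
        "    {",
        f"    SELECT {GROUP_VARS} (COUNT(?code) as ?count_instances)",
        "    WHERE {",
        f"        ?resource a <{class_iri}> .",
        f'        BIND("{origin}" as ?origin)',
        *optionals,
        "    }",
        f"    GROUP BY {GROUP_VARS}",
        "    }",
    ])


def has_code_query(class_iri, origins):
    prefix = "PREFIX sphn:<https://biomedit.ch/rdf/sphn-schema/sphn#>\n\n"
    subqueries = "\n        UNION\n".join(code_subquery(class_iri, o) for o in origins)
    return prefix + "SELECT *\nWHERE {\n" + subqueries + "\n}"


print(has_code_query("https://biomedit.ch/rdf/sphn-schema/sphn#AssessmentResult",
                     ["sphn:hasQuantity/sphn:hasUnit/sphn:hasCode", "sphn:hasCode"]))
```

The code patterns are OPTIONAL. As a result, resources without any code fall into a group with an unbound ?code, and COUNT(?code) for that group is 0.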

Count Instances

The Count Instances query works like the flattening query, except that it does not display the flattened results. Instead, it only displays the count for each of these connections. This makes it easy to estimate data sizes.
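The Count Instances query itself is not shown here. As an assumption, the sketch below builds it in the same grouped-subquery layout as the Min Max query, with one COUNT per flattened path. The SPARQLer's actual output may differ.

```python
# Sketch of a Count Instances query, assuming it reuses the flattening paths
# and the grouped-subquery layout of the Min Max query (hypothetical layout).
def count_subquery(class_iri, path):
    return "\n".join([
        "    {",
        "    SELECT ?origin (COUNT(?value) as ?count_instances)",
        "    WHERE {",
        f"        ?resource a <{class_iri}> .",
        f'        BIND("{path}" as ?origin)',
        f"        optional{{ ?resource {path} ?value . }}",
        "    }",
        "    GROUP BY ?origin",
        "    }",
    ])


def count_instances_query(class_iri, paths):
    prefix = "PREFIX sphn:<https://biomedit.ch/rdf/sphn-schema/sphn#>\n\n"
    body = "\n        UNION\n".join(count_subquery(class_iri, p) for p in paths)
    return prefix + "SELECT *\nWHERE {\n" + body + "\n}"


print(count_instances_query("https://biomedit.ch/rdf/sphn-schema/sphn#AssessmentResult",
                            ["sphn:hasQuantity", "sphn:hasQuantity/sphn:hasValue", "sphn:hasCode"]))
```

Counting each path in a separate subquery avoids the cross product of a single flattened result. In that result, a resource with two codes and two quantities would appear in four rows, which would inflate every count.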

Min Max

If the data includes numeric or date-time values, a useful metric is their minimum (Min) and maximum (Max). The following query computes these for Assessment Result:

PREFIX sphn:<https://biomedit.ch/rdf/sphn-schema/sphn#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

SELECT *
WHERE {
    {
    SELECT ?origin (MIN(?value) as ?min) (MAX(?value) as ?max)
    WHERE {
        ?resource a <https://biomedit.ch/rdf/sphn-schema/sphn#AssessmentResult> .

        BIND("sphn:hasQuantity/sphn:hasValue" as ?origin)
        optional{ ?resource sphn:hasQuantity/sphn:hasValue ?value . }

    }
    GROUP BY ?origin
    }
}
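Min and Max are only meaningful for some paths. The sketch below assumes a hypothetical mapping from property paths to range datatypes. It keeps only the numeric and date/time paths and emits one subquery per path, in the layout of the query above.

```python
# Sketch: keep only paths whose (hypothetical) range datatype is numeric or
# temporal, then emit one MIN/MAX subquery per path.
MIN_MAX_DATATYPES = {"xsd:double", "xsd:decimal", "xsd:integer", "xsd:date", "xsd:dateTime"}


def min_max_query(class_iri, path_datatypes):
    """path_datatypes: {property path: datatype}; returns None if nothing qualifies."""
    parts = []
    for path, datatype in path_datatypes.items():
        if datatype not in MIN_MAX_DATATYPES:
            continue  # MIN/MAX over strings or IRIs is rarely informative
        parts.append("\n".join([
            "    {",
            "    SELECT ?origin (MIN(?value) as ?min) (MAX(?value) as ?max)",
            "    WHERE {",
            f"        ?resource a <{class_iri}> .",
            f'        BIND("{path}" as ?origin)',
            f"        optional{{ ?resource {path} ?value . }}",
            "    }",
            "    GROUP BY ?origin",
            "    }",
        ]))
    if not parts:
        return None
    prefix = "PREFIX sphn:<https://biomedit.ch/rdf/sphn-schema/sphn#>\n\n"
    return prefix + "SELECT *\nWHERE {\n" + "\n        UNION\n".join(parts) + "\n}"


print(min_max_query("https://biomedit.ch/rdf/sphn-schema/sphn#AssessmentResult",
                    {"sphn:hasQuantity/sphn:hasValue": "xsd:double",
                     "sphn:hasStringValue": "xsd:string"}))
```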

Availability and usage rights

© Copyright 2024, Personalized Health Informatics Group (PHI), SIB Swiss Institute of Bioinformatics

The SPHN Schema Forge is available at https://schemaforge.dcc.sib.swiss/ and is licensed under the GPLv3 license.

The Dataset2RDF is available at https://git.dcc.sib.swiss/sphn-semantic-framework/dataset2rdf and is licensed under the GPLv3 license.

The SHACLer is available at https://git.dcc.sib.swiss/sphn-semantic-framework/sphn-shacl-generator and is licensed under the GPLv3 license.

The SPHN Schema Visualization Tool is available at https://git.dcc.sib.swiss/sphn-semantic-framework/sphn-schema-documentation-visualization and is licensed under the GPLv3 license. It is based on pyLODE, which is licensed under the BSD 3-Clause license.

For any question or comment, please contact the SPHN Data Coordination Center (DCC) at dcc@sib.swiss.