SPHN RDF Schema

Scope of the SPHN RDF schema

The SPHN RDF schema provides an interoperable framework for conveying information and storing health-related data from SPHN-related projects using semantic web technologies including RDF (see background section on RDF). The schema facilitates the integration of existing external resources within the SPHN framework. The SPHN RDF schema is based on the SPHN Dataset (available at: https://sphn.ch/document/sphn-dataset/) and transforms its elements into a formal structure (see Figure 1). This documentation provides an overview of the content of the SPHN RDF schema that complies with version 2022.1 and version 2022.2 of the schema. A visual representation of the schema can be found at: https://biomedit.ch/rdf/sphn-ontology/sphn.


Figure 1. Core elements defined in the SPHN dataset and their translation into RDF.

Note

The words schema and ontology are used interchangeably to refer to the SPHN RDF schema.

Concepts of the SPHN dataset are translated into classes (owl:Class), and the compositions of a concept become either object properties (owl:ObjectProperty) or data properties (owl:DatatypeProperty), depending on the type of the value. Table 1 summarizes these properties.

Table 1. Data transformation into object and data property based on the type of the composedOfs’ values.

Property  Concept composition  Example
--------  -------------------  ----------------------------------------------------------------
Object    Another concept      A class
Object    Qualitative element  A set of values valid for a concept provided in the SPHN dataset
Data      String               xsd:string element
Data      Quantitative         xsd:double element
Data      Temporal             xsd:dateTime element
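
As a rough illustration of Table 1, the mapping from the value type of a composition to the kind of RDF property can be sketched in Python. The dictionary keys and the function name are assumptions made for this example; they are not part of the SPHN tooling.

```python
# Illustrative lookup mirroring Table 1. The keys and names are assumptions
# made for this sketch and are not defined by the SPHN RDF schema.
PROPERTY_KIND_BY_VALUE_TYPE = {
    "concept": ("object property", "a class"),
    "qualitative": ("object property", "a value set"),
    "string": ("data property", "xsd:string"),
    "quantitative": ("data property", "xsd:double"),
    "temporal": ("data property", "xsd:dateTime"),
}


def property_kind(value_type: str) -> tuple[str, str]:
    """Return (kind of property, range) for the value type of a composition."""
    return PROPERTY_KIND_BY_VALUE_TYPE[value_type.lower()]


print(property_kind("Temporal"))
```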

Value sets defined in the dataset are represented as individuals (owl:NamedIndividual). The meaning binding to SNOMED CT and LOINC provided in the SPHN dataset is represented as one or more equivalent classes to an SPHN class (owl:equivalentClass).

Note

Meaning bindings provided in the dataset for ‘composedOf’ are not taken into account in the RDF schema because, semantically, a property in RDF (the ‘composedOf’ statement from the dataset) cannot be equivalent to a class (the meaning binding provided in the dataset).

Technical specification of the schema

SPHN RDF namespace

The namespace of the SPHN RDF schema may be divided into two parts:

  • The SPHN ontology IRI: https://biomedit.ch/rdf/sphn-ontology/sphn. The ontology IRI remains fixed and must be defined in the SPHN RDF schema. The ontology IRI can be considered as the “base prefix” and will be used by both data providers (to annotate data) and data users (to query for the relevant classes/properties).

  • The SPHN versioned IRI: https://biomedit.ch/rdf/sphn-ontology/sphn/2022/2. It is provided by the DCC with each published release of the SPHN RDF schema and makes it possible to distinguish different versions of the schema. The versioned IRI is used by the projects to refer to a specific version of the SPHN RDF schema and is therefore included in the header of all datasets generated using this schema version.

Note

The SPHN RDF schema has dereferenceable links, meaning that its content (classes, properties, individuals) is accessible and resolvable on the web when following the link.

Versioning of the schema

Each release of the SPHN RDF schema has a version associated with it. The version, indicated by the tag owl:versionIRI in the SPHN header information, contains the year of release and the release number for that year (e.g. https://biomedit.ch/rdf/sphn-ontology/sphn/2022/1 corresponds to the first release of the SPHN RDF schema in 2022). The published ontology IRI will always point to the latest version IRI of the SPHN RDF schema.
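
The structure of the versioned IRI described above (ontology IRI, then release year, then release number) can be sketched in Python. The function names are hypothetical, and the pattern is inferred only from the example IRIs given in this document.

```python
import re

# The fixed ontology IRI ("base prefix") as given in this document.
ONTOLOGY_IRI = "https://biomedit.ch/rdf/sphn-ontology/sphn"


def parse_version_iri(version_iri: str) -> tuple[int, int]:
    """Split a versioned IRI into (release year, release number of that year)."""
    pattern = re.escape(ONTOLOGY_IRI) + r"/(\d{4})/(\d+)"
    match = re.fullmatch(pattern, version_iri)
    if match is None:
        raise ValueError(f"not an SPHN versioned IRI: {version_iri}")
    return int(match.group(1)), int(match.group(2))


def build_version_iri(year: int, release: int) -> str:
    """Build the versioned IRI for a given year and release number."""
    return f"{ONTOLOGY_IRI}/{year}/{release}"


print(parse_version_iri("https://biomedit.ch/rdf/sphn-ontology/sphn/2022/1"))
```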

Official releases of the SPHN RDF schema are:

SPHN header information

The header of the SPHN RDF schema contains the following information:

  • The title of the schema (dc:title)

  • A short description of the content of the schema (dc:description)

  • The license defining the terms and conditions of usage of the schema (dct:license)

  • The version of the schema (owl:versionIRI)

  • The external ontologies to be imported (owl:imports)

  • The previously released version of the schema (owl:priorVersion)

Below is the header information of a turtle file:

<https://biomedit.ch/rdf/sphn-ontology/sphn> rdf:type owl:Ontology ;
    owl:versionIRI <https://biomedit.ch/rdf/sphn-ontology/sphn/2022/2> ;
    owl:imports <http://biomedit.ch/rdf/sphn-resource/icd-10-gm/2022/2> ,
                <http://snomed.info/sct/900000000000207008/version/20220207> ,
                <https://biomedit.ch/rdf/sphn-resource/atc/2022/1> ,
                <https://biomedit.ch/rdf/sphn-resource/chop/2022/4> ,
                <https://biomedit.ch/rdf/sphn-resource/loinc/2.72/1> ,
                <https://biomedit.ch/rdf/sphn-resource/ucum/2021/1> ;
    dc:description "The RDF schema describing concepts defined in the official SPHN dataset"@en ;
    dc:title "The SPHN RDF Schema"@en ;
    dct:license <https://creativecommons.org/licenses/by-nc-sa/4.0/> ;
    owl:priorVersion <https://biomedit.ch/rdf/sphn-ontology/sphn/2022/1> .

Note

  • The schema header provides information about the version of the external terminologies used in the SPHN RDF schema 2022.2.

  • The IRI of SNOMED CT is not within the biomedit.ch domain since the turtle file is generated automatically using the Snomed OWL Toolkit.

About the SPHN RDF classes

Naming convention for classes

The classes defined in the SPHN RDF schema come from the concepts defined in the SPHN dataset (column ‘general concept name’): one concept corresponds to one class. The unique identifier of a class is the concatenation of the words forming the concept name, written in UpperCamelCase. For example, the concept Radiotherapy Procedure is defined as the RadiotherapyProcedure class in RDF. A class necessarily contains the following information:

  • An rdfs:label corresponding to the class name with a space between words for better readability,

  • An rdfs:comment containing the description of the class.

A class may have other annotations which are detailed in the section Annotations for enriching knowledge content.

In addition, a few classes have a meaning binding associated with either a SNOMED CT or a LOINC code. This meaning binding is represented in the schema with owl:equivalentClass.
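
The class naming convention described in this section can be sketched as a small Python helper. The function name is hypothetical, and dropping punctuation from the concept name is an assumption of this sketch rather than a documented rule.

```python
import re


def class_name(concept_name: str) -> str:
    """Concatenate the words of a concept name into a class identifier.

    Assumption of this sketch: anything that is not a letter or digit acts
    as a word separator and is dropped.
    """
    words = re.findall(r"[A-Za-z0-9]+", concept_name)
    # Upper-case only the first letter so that acronyms stay unchanged.
    return "".join(word[0].upper() + word[1:] for word in words)


print(class_name("Radiotherapy Procedure"))
```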

Additional classes definitions in the SPHN RDF schema

Besides the concepts defined in the SPHN Dataset, five classes have been generated to represent specific types of metadata in RDF:

  • A Terminology class, which groups classes and individuals of external resources (e.g. SNOMED CT, ATC, CHOP) used within the SPHN context to be able to refer to them as possible values (e.g. the SNOMED CT code 703117000 is a possible value for the SPHN gender property), equivalent classes (e.g. the LOINC code 8867-4 is an equivalent class of the SPHN HeartRate class), or even individuals (e.g. UCUM units are individuals that can be directly referred to as property values).


Figure 2. SPHN RDF tree. Terminology is the parent class of ATC, CHOP, ICD-10-GM, LOINC, SNOMED CT and UCUM that are imported as external terminologies. Note in the figure above that UCUM does not have an arrow to further expand sub-classes because UCUM elements are defined as individuals and not as classes.

  • A ValueSet class, which groups classes defining specific SPHN instances of possible values to use for certain object properties. The convention used for defining a ValueSet is: <Class>_<composedOf general concept name> (e.g. AdverseEvent_consequences, Biosample_fixationType).

  • A DataRelease class, which stamps the extraction date of a dataset together with information related to the version of the SPHN or project dataset (see Versioning of the data for more information).

  • A Deprecated class, which groups classes from a previous release of the schema that are no longer used in the current version.

  • A Measurement class, which groups classes belonging to the category of measurements done on a patient (e.g. OxygenSaturation, BodyWeight).
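
The ValueSet naming convention listed above (<Class>_<composedOf general concept name>) can be sketched as follows. The function name is hypothetical, and the lowerCamelCase form of the second part is inferred from the two examples given (AdverseEvent_consequences, Biosample_fixationType); the input "fixation type" is an assumed spelling of the general concept name.

```python
def value_set_name(class_name: str, composed_of_name: str) -> str:
    """Build a ValueSet identifier of the form <Class>_<composedOf name>.

    Assumption of this sketch: the composedOf name is written in
    lowerCamelCase, as the documented examples suggest.
    """
    words = composed_of_name.split()
    if not words:
        raise ValueError("composedOf name must not be empty")
    camel = words[0] + "".join(w[0].upper() + w[1:] for w in words[1:])
    return f"{class_name}_{camel}"


print(value_set_name("Biosample", "fixation type"))
```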

Class hierarchies

Some hierarchies are defined in the SPHN Dataset and interpreted in the RDF schema. The following classes are parents of the listed child classes defined in the schema:

  • Diagnosis parent of FOPHDiagnosis, ICDODiagnosis, NursingDiagnosis

  • Procedure parent of DiagnosticRadiologicExamination, FOPHProcedure, RadiotherapyProcedure

  • MedicalDevice parent of LabAnalyzer

  • Measurement parent of BloodPressure, BodyHeight, BodyTemperature, BodyWeight, CircumferenceMeasure, HeartRate, OxygenSaturation and RespiratoryRate.

The rules of property inheritance are applied. All properties annotated on a parent class are automatically inherited by its child classes. Therefore, in the rdfs:domain of inherited properties, only the parent class is explicitly stated.

The only exception is the sphn:hasCode property of the Diagnosis which is specified as sphn:hasMorphologyCode for the ICDODiagnosis.
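
A minimal Python sketch of this inheritance rule is given below: a property whose domain names only the parent class is treated as applicable to every child class. The data structures and function names are assumptions made for illustration, and the ICDODiagnosis exception mentioned above is deliberately not modelled.

```python
# Child-to-parent links taken from the hierarchy listed in this section.
PARENT = {
    "FOPHDiagnosis": "Diagnosis",
    "NursingDiagnosis": "Diagnosis",
    "LabAnalyzer": "MedicalDevice",
}

# Only the parent class is stated in the domain of an inherited property.
DOMAIN = {"hasCode": {"Diagnosis"}}


def ancestors(cls):
    """Yield the class itself, then each parent up to the root."""
    while cls is not None:
        yield cls
        cls = PARENT.get(cls)


def applies_to(prop: str, cls: str) -> bool:
    """True if the class or one of its ancestors is in the property's domain."""
    domain = DOMAIN.get(prop, set())
    return any(candidate in domain for candidate in ancestors(cls))


print(applies_to("hasCode", "NursingDiagnosis"))
```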

About the SPHN RDF properties

Concepts in the SPHN Dataset contain compositions (i.e. information about a concept) that are translated in the SPHN RDF schema into object properties (relationships between individuals) or data properties (relationships between an individual and a literal value).

The IRIs of properties are named after the general concept name column in the SPHN Dataset. The convention used is to define properties with has + <general concept name> as showcased in the tables provided in the next subsections (Table 2 for object properties and Table 3 for data properties).
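
The has + <general concept name> convention can be sketched in Python as follows. The function names are hypothetical; the "#" separator between the ontology IRI and the property name follows the property IRIs quoted elsewhere in this documentation.

```python
ONTOLOGY_IRI = "https://biomedit.ch/rdf/sphn-ontology/sphn"


def property_name(general_concept_name: str) -> str:
    """Turn a general concept name into a property identifier (has + name)."""
    words = general_concept_name.split()
    return "has" + "".join(word[0].upper() + word[1:] for word in words)


def property_iri(general_concept_name: str) -> str:
    """Full IRI of the property within the SPHN ontology namespace."""
    return f"{ONTOLOGY_IRI}#{property_name(general_concept_name)}"


print(property_name("scoring system code"))
```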

Object properties

Object properties in the SPHN RDF schema can define relationships between:

  • Resources from the clinical data (a given instance of BodySite is connected to a patient’s CircumferenceMeasure) and

  • Resources and elements collected from external ontologies (the FOPH Diagnosis is identified by a specific ICD-10-GM Code).

Table 2. Example of object property identifiers from convention.

object property     domain                range
------------------  --------------------  ---------------
hasBodySite         CircumferenceMeasure  BodySite
hasTherapeuticArea  HealthcareEncounter   TherapeuticArea
hasCode             FOPHDiagnosis         ICD-10-GM

An additional set of four object properties is generated in the RDF schema; these are neither represented in nor required by the SPHN Dataset:

  • hasSubjectPseudoIdentifier, connecting a piece of information to the patient identifier

  • hasDataProviderInstitute, connecting a piece of information to the data provider

  • hasAdministrativeCase, connecting a piece of information to the case

  • hasExtractionDateTime, providing information about the time of extraction of the data.

Data properties

Data properties in the SPHN RDF schema point to literal values of given SPHN concepts.

Table 3. Example of data properties identified with the dataset property names.

data property         domain                 general concept name (from SPHN Dataset)  range
--------------------  ---------------------  ----------------------------------------  ----------
hasComment            LabResult              comment                                   xsd:string
hasScoringSystemCode  SimpleScore            scoring system code                       xsd:string
hasFractionsNumber    RadiotherapyProcedure  fractions number                          xsd:double

Constraints added to properties

An owl:Restriction is a particular type of class description that places value and cardinality constraints on a property. Multiple constraints can be added to narrow down and/or enrich a concept. Value constraints restrict the possible types of values of a property. Cardinality constraints restrict the number of instances of a property.

The rules for inheritance of restrictions are applied. Any restrictions annotated on a property are automatically inherited by the subproperties. That is, if a subclass (inheriting the parent class’ property) has a subproperty of that property, the restrictions of this subproperty must be in the range of the restrictions of the parent property. For example, if the parent property has cardinality 0:1, the subproperty may have cardinalities 0:0, 0:1 or 1:1.

Each instance of a subproperty is also an instance of the parent property.
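
The cardinality example above can be checked with a short Python sketch. It assumes that cardinalities are written as (min, max) pairs of integers; unbounded maxima are not handled, and the function name is hypothetical.

```python
def is_valid_refinement(parent: tuple[int, int], child: tuple[int, int]) -> bool:
    """True if the child's (min, max) cardinality lies within the parent's."""
    parent_min, parent_max = parent
    child_min, child_max = child
    return parent_min <= child_min <= child_max <= parent_max


# A parent property with cardinality 0:1 and candidate subproperty cardinalities.
for candidate in [(0, 0), (0, 1), (1, 1), (0, 2)]:
    print(candidate, is_valid_refinement((0, 1), candidate))
```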

Value constraint

The range of a property usually encodes the type of value allowed for this property. However, in some cases, it was not possible to straightforwardly encode additional information needed to account for the full context. In these cases, value constraints are used to ensure a clean modeling of the data.

For instance, the Code of a Unit given to an OxygenSaturation is %. However, it is not possible to annotate that the property hasCode can only have as a range % since hasCode can be used in another context.

To solve this issue, from version 2022.1 of the SPHN RDF Schema, value constraints (using owl:Restriction) have been integrated in these particular cases.

Coming back to the example of the oxygen saturation’s unit constraint, the following constraint (in .ttl format) can be built:

sphn:OxygenSaturation rdf:type owl:Class ;
       rdfs:subClassOf [ rdf:type owl:Restriction ;
               owl:onProperty sphn:hasQuantity ;
               owl:someValuesFrom [ rdf:type owl:Restriction ;
                               owl:onProperty sphn:hasUnit ;
                               owl:someValuesFrom [ rdf:type owl:Restriction ;
                                               owl:onProperty sphn:hasCode ;
                                               owl:hasValue ucum:percent ]]] .

The code above can be read line by line as:

  • The sphn:OxygenSaturation is a class.

  • It has the following restriction statement:

  • whenever the sphn:hasQuantity is used for describing the sphn:OxygenSaturation,

  • only some specific values are allowed, which is in reality a second constraint statement.

  • A constraint is applied to the property sphn:hasUnit

  • that again also has only specific values allowed leading to a third constraint statement.

  • A constraint is now applied to the property sphn:hasCode

  • which allows only for the following value: ucum:percent.

In other words, when annotating the quantity of an OxygenSaturation, the Code representing the Unit must be the instance percent coming from the UCUM notation (see Figure 3). No other values are allowed. The constraints are represented in a nested structure that follows the properties’ path as they are encoded in the SPHN RDF schema.


Figure 3. Example of property path from OxygenSaturation to the possible code of the Unit.
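
To make the nested constraint more concrete, the following Python sketch follows the same property path (hasQuantity, then hasUnit, then hasCode) through a record represented as nested dictionaries and checks the final value. The dictionary representation, the sphn:hasValue key and the placeholder unit are assumptions made for illustration; this is not how the schema or its validation is implemented.

```python
# Property path and required value taken from the restriction shown above.
PATH = ("sphn:hasQuantity", "sphn:hasUnit", "sphn:hasCode")
REQUIRED_VALUE = "ucum:percent"


def follow(node, path):
    """Walk the nested dictionaries along the path; None if a step is missing."""
    for prop in path:
        if not isinstance(node, dict) or prop not in node:
            return None
        node = node[prop]
    return node


def satisfies_constraint(instance: dict) -> bool:
    """True if the value at the end of the path is the required UCUM instance."""
    return follow(instance, PATH) == REQUIRED_VALUE


# Hypothetical instance data; "sphn:hasValue" is an assumed property name.
oxygen_saturation = {
    "sphn:hasQuantity": {
        "sphn:hasValue": 97.0,
        "sphn:hasUnit": {"sphn:hasCode": "ucum:percent"},
    }
}
print(satisfies_constraint(oxygen_saturation))
```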

Cardinality constraint

Cardinality constraints have been implemented to restrict the number of values an instance of a class may have for specific properties. The owl:minCardinality and owl:maxCardinality notations are used.

For instance, a Lab Result may have zero or at most one Lab Test connected:

owl:intersectionOf ( [ rdf:type owl:Restriction ;
                                 owl:onProperty sphn:hasLabTest ;
                                 owl:minCardinality "0"^^xsd:nonNegativeInteger
                     ]
                     [ rdf:type owl:Restriction ;
                                 owl:onProperty sphn:hasLabTest ;
                                 owl:maxCardinality "1"^^xsd:nonNegativeInteger
                    ]
                   ) ;
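
As a hedged illustration of what such a constraint means for instance data, the Python sketch below counts the values given for a property and compares the count with the minimum and maximum cardinality. The function name and the list representation of values are assumptions of this example.

```python
def within_cardinality(values, min_count=0, max_count=None) -> bool:
    """True if the number of values respects the min and max cardinality.

    A max_count of None means that no upper bound is set.
    """
    count = len(values)
    if count < min_count:
        return False
    return max_count is None or count <= max_count


# A Lab Result with 0..1 Lab Test: one test is fine, two are not.
print(within_cardinality(["lab-test-1"], min_count=0, max_count=1))
print(within_cardinality(["lab-test-1", "lab-test-2"], min_count=0, max_count=1))
```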

Both value and cardinality constraints are visualized in the PyLODE documentation in the definitions of the classes (https://biomedit.ch/rdf/sphn-ontology/sphn/2022/1).


Figure 4. Example of cardinality constraints shown in the SPHN RDF visualization for Lab Result (with PyLODE).

Annotations for enriching knowledge content

In some classes, there is a need to annotate specific information provided in the SPHN Dataset. These additional annotations relate to exceptions in the way property restrictions should be understood. In addition, a new SPHN annotation (sphn:replaces) was created to ease connecting elements from one version of the SPHN RDF schema to another.

skos:definition for subclass values not allowed

SNOMED CT provides a hierarchical structure of its codes. When values are given as SNOMED CT codes, any child of a provided code is, by default, also a valid value under the inheritance rules of the Semantic Web: every instance of a child class is also an instance of its parent class, the child simply being more specific. In some cases, SPHN restricts a value set so that only the explicitly stated codes are valid. This information is encoded as a skos:definition with the following text provided for each property: sphn:<property> subclasses not allowed.

For instance, values for defining the administrative gender of a patient are strictly limited to male, female, other and unknown. This value set is provided as an owl:Restriction:

[ rdf:type owl:Restriction ;
       owl:onProperty sphn:hasCode ;
       owl:someValuesFrom [ rdf:type owl:Class ;
                       owl:unionOf (   <http://snomed.info/id/261665006>
                                       <http://snomed.info/id/703117000>
                                       <http://snomed.info/id/703118005>
                                       <http://snomed.info/id/74964007>
                                    )
                           ]
]

In addition, the following annotation, given as information in the schema, states that only these values are permitted and no children of these codes are allowed:

sphn:AdministrativeGender skos:definition "sphn:hasCode subclasses not allowed" .

This information is used in the SHACLer for strictly limiting the set of values of a property and is provided in the PyLODE visualization of the SPHN RDF ontology to indicate in the restriction field that child terms are not allowed. Here is the same AdministrativeGender hasCode property example represented in PyLODE:


Figure 5. Example of the interpretation of a skos:definition constraint in the SPHN RDF visualization (with PyLODE). The skos:definition indicates that the child terms of the given value set are not allowed to be provided.
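
The effect of the "subclasses not allowed" annotation can be sketched in Python. The allowed codes are those of the AdministrativeGender restriction shown above; the child code and its parent link are hypothetical placeholders, and the function is an illustration only, not the SHACLer implementation.

```python
# Codes listed in the owl:unionOf of the AdministrativeGender restriction.
ALLOWED = {
    "snomed:261665006",
    "snomed:703117000",
    "snomed:703118005",
    "snomed:74964007",
}

# Hypothetical child-to-parent link, used only to demonstrate the check.
PARENT_OF = {"snomed:hypothetical_child": "snomed:703117000"}


def is_allowed(code, allowed, parent_of, subclasses_allowed) -> bool:
    """Accept listed codes; accept their descendants only if permitted."""
    if code in allowed:
        return True
    if not subclasses_allowed:
        return False
    current = parent_of.get(code)
    while current is not None:
        if current in allowed:
            return True
        current = parent_of.get(current)
    return False


print(is_allowed("snomed:hypothetical_child", ALLOWED, PARENT_OF, False))
```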

skos:note for permitted standards and extendable value sets

The SPHN dataset in some cases defines the external standards (or terminologies) that can be used for a given property in a certain context. This information is encoded in the schema as a skos:note.

For instance, the code of a ProblemCondition may come from ICPC or another standard. This information is depicted as follows in .ttl:

sphn:ProblemCondition skos:note "sphn:hasCode allowed coding system: ICPC or other" .

The SPHN dataset in some cases also defines extendable value sets. An extendable value set provides recommended or example values for a property, and the data provider may extend it with other values as they see fit. This is represented in RDF as follows: the owl:Restriction given for the property does not list the individual values provided by the dataset but points to a higher-level class in the hierarchy of the coding system, which allows other values to be used. The values recommended in the SPHN dataset are provided with the annotation skos:note.

For instance, the list of codes about body sites where the oxygen saturation can be measured is an extendable value set in SPHN. The owl:Restriction is given in BodySite where Code values allowed are only children of snomed:123037004 | body structure (body structure) |, as represented below in .ttl:

[ rdf:type owl:Restriction ;
       owl:onProperty sphn:hasCode ;
       owl:someValuesFrom <http://snomed.info/id/123037004>
]

The specific values recommended in the SPHN dataset to be used are: snomed:29707007 | Toe structure (body structure) |, snomed:7569003 | Finger structure (body structure) |, snomed:48800003 | Ear lobule structure (body structure) |.

These values are indicated in the following note annotated in the OxygenSaturation class:

sphn:OxygenSaturation skos:note "sphn:hasBodySite/sphn:hasCode recommended values [snomed:29707007; snomed:7569003; snomed:48800003]" .
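
A consumer of the schema might want to read the recommended values back out of such a note. The Python sketch below does this with a regular expression; the note format is inferred solely from the single example above, so both the pattern and the function name are assumptions.

```python
import re

NOTE = (
    "sphn:hasBodySite/sphn:hasCode recommended values "
    "[snomed:29707007; snomed:7569003; snomed:48800003]"
)


def parse_recommended_values(note: str):
    """Return (property path, list of codes), or None if the note does not match."""
    match = re.fullmatch(r"(\S+) recommended values \[(.*)\]", note.strip())
    if match is None:
        return None
    path = match.group(1).split("/")
    codes = [code.strip() for code in match.group(2).split(";") if code.strip()]
    return path, codes


print(parse_recommended_values(NOTE))
```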

sphn:replaces for reference of previous classes and properties

Changes in classes or properties between different versions of the SPHN RDF Schema are tracked directly in RDF with the sphn:replaces annotation.

For instance, in SPHN 2022.1, the property https://biomedit.ch/rdf/sphn-ontology/sphn#hasActiveIngredient replaces the property https://biomedit.ch/rdf/sphn-ontology/sphn#hasDrugActiveIngredientSubstance defined in SPHN 2021.2. The following statement is written in the SPHN 2022.1 RDF schema:

sphn:hasActiveIngredient       rdf:type owl:ObjectProperty ;
                               sphn:replaces sphn:hasDrugActiveIngredientSubstance .

This annotation is currently used in the migration path tool to generate a CSV file of the differences between two versions of the SPHN RDF schema.
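
The idea of turning sphn:replaces statements into a CSV file can be sketched in Python. This is not the migration path tool itself: the column names, the dictionary input and the function name are assumptions made for illustration.

```python
import csv
import io

# New element -> replaced element, taken from the sphn:replaces example above.
REPLACES = {"sphn:hasActiveIngredient": "sphn:hasDrugActiveIngredientSubstance"}


def migration_csv(replaces: dict) -> str:
    """Render replaces statements as CSV text with assumed column names."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["previous", "current"])
    for new, old in sorted(replaces.items()):
        writer.writerow([old, new])
    return buffer.getvalue()


print(migration_csv(REPLACES))
```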

Development process

After the release of each new version of the SPHN Dataset, the corresponding SPHN RDF Schema is reviewed by the SPHN Data Coordination Center (DCC) in close collaboration with the IT experts of the Swiss University Hospitals and revised as necessary.

Availability and usage rights

The SPHN RDF schema is available at https://git.dcc.sib.swiss/sphn-semantic-framework/sphn-ontology/ and can be visualized at https://biomedit.ch/rdf/sphn-ontology/sphn.

External terminologies are accessible through the Terminology Service.

If you need further information, please contact the SPHN Data Coordination Center (DCC) at dcc@sib.swiss.

The SPHN RDF schema is under the CC BY-NC-SA 4.0 License.