SPHN RDF Schema

Scope of the SPHN RDF schema

The SPHN RDF schema provides an interoperable framework for the transport and storage of health-related data in SPHN-related projects. It also facilitates the integration of, and connection to, existing external resources within the SPHN framework. Based on the SPHN dataset (available at: https://sphn.ch/document/sphn-dataset/), the schema transforms elements of the dataset into a formal structure (see Figure 1). This documentation gives an overview of the content of the SPHN RDF schema and relates to version 2021.2 of the schema.

[Figure: Dataset to RDF]

Figure 1. Core elements of the SPHN dataset and their translation into the Web Ontology Language (OWL).

Concepts of the SPHN dataset are translated into classes (owl:Class), and the compositions of a concept become either object properties (owl:ObjectProperty) or data properties (owl:DatatypeProperty). Table 1 summarises these properties.

Table 1. Translation of concept compositions into object and data properties.

Property  Concept composition  Example
--------  -------------------  -------
Object    Another concept      A class
Object    Qualitative element  A specific set of values valid for a concept composition provided in the SPHN dataset
Data      String               xsd:string element
Data      Quantitative         xsd:double element
Data      Temporal             xsd:dateTime element

Value sets defined in the dataset are represented as individuals (owl:NamedIndividual). The meaning binding to SNOMED CT and LOINC provided by the SPHN dataset is represented as one or more equivalent classes of an SPHN class (owl:equivalentClass).

Technical specification

SPHN RDF namespace

The namespace of the SPHN RDF schema consists of two parts:

  • The SPHN ontology IRI: https://biomedit.ch/rdf/sphn-ontology/sphn. The ontology IRI remains fixed and must be defined in the SPHN RDF schema. The ontology IRI can be considered as the “base prefix” and will be used by both data providers (to annotate data) and data users (to query for the relevant classes/properties).

  • The SPHN versioned IRI: https://biomedit.ch/rdf/sphn-ontology/sphn/2021/2. It is provided by the DCC at each published release of the SPHN RDF schema. The versioned IRI is used by the projects to refer to a specific version of the SPHN RDF schema and is therefore imported in the header of all datasets generated using this schema version.

Versioning of the schema

Each release of the SPHN RDF schema has a version associated with it. The version, given by owl:versionIRI in the header of the schema, contains the year of release and the release number for that year (e.g. https://biomedit.ch/rdf/sphn-ontology/sphn/2021/2 corresponds to the second release of the SPHN RDF schema in 2021). The published ontology IRI will always point to the latest version IRI of the SPHN RDF schema.
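The structure of a versioned IRI described above can be sketched in Python as follows. This is only an illustration under the assumption that a versioned IRI is always the ontology IRI followed by /<year>/<release>; the names SchemaVersion, parse_version_iri and build_version_iri are hypothetical and not part of the SPHN tooling.

```python
import re
from typing import NamedTuple

# Ontology IRI as given in this documentation.
SPHN_ONTOLOGY_IRI = "https://biomedit.ch/rdf/sphn-ontology/sphn"


class SchemaVersion(NamedTuple):
    """Year of release and release number within that year."""
    year: int
    release: int


def parse_version_iri(version_iri: str) -> SchemaVersion:
    """Split a versioned IRI of the assumed form <ontology IRI>/<year>/<release>."""
    pattern = re.escape(SPHN_ONTOLOGY_IRI) + r"/(\d{4})/(\d+)"
    match = re.fullmatch(pattern, version_iri)
    if match is None:
        raise ValueError(f"not an SPHN versioned IRI: {version_iri!r}")
    return SchemaVersion(year=int(match.group(1)), release=int(match.group(2)))


def build_version_iri(version: SchemaVersion) -> str:
    """Rebuild the versioned IRI from its two components."""
    return f"{SPHN_ONTOLOGY_IRI}/{version.year}/{version.release}"
```

For example, parse_version_iri("https://biomedit.ch/rdf/sphn-ontology/sphn/2021/2") yields year 2021 and release 2, and build_version_iri turns that pair back into the same IRI.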

SPHN header information

The header of the SPHN RDF schema contains the following information:

  • The title of the schema (dc:title),

  • A short description of the content of the schema (dc:description),

  • The license defining the terms and conditions of usage of the schema (dct:license),

  • The version of the schema (owl:versionIRI),

  • The external ontologies to be imported (owl:imports).

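# Conventional prefix bindings (assumed here; not shown in the original excerpt)
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix dc: <http://purl.org/dc/elements/1.1/> .
@prefix dct: <http://purl.org/dc/terms/> .

<https://biomedit.ch/rdf/sphn-ontology/sphn> rdf:type owl:Ontology ;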
    owl:versionIRI <https://biomedit.ch/rdf/sphn-ontology/sphn/2021/2> ;
    owl:imports <http://biomedit.ch/rdf/sphn-resource/icd-10-gm/2021/1> ,
         <http://snomed.info/sct/900000000000207008/version/20210304> ,
         <https://biomedit.ch/rdf/sphn-resource/atc/2021/3> ,
         <https://biomedit.ch/rdf/sphn-resource/chop/2020/3> ,
         <https://biomedit.ch/rdf/sphn-resource/loinc/2.69/1> ,
         <https://biomedit.ch/rdf/sphn-resource/ucum/2021/1> ;
    dc:description "The RDF schema describing concepts defined in the official SPHN dataset"@en ;
    dc:title "The SPHN RDF Schema"@en ;
    dct:license <https://creativecommons.org/licenses/by-nc-sa/4.0/> .

SPHN RDF classes

The classes defined in the SPHN RDF schema come from the concepts defined in the SPHN dataset: one concept corresponds to one class. The unique identifier of a class is the concatenation of the words forming the concept name, written in UpperCamelCase (e.g. the Radiotherapy Procedure concept is defined as the RadiotherapyProcedure class in RDF). A class contains the following information:

  • An rdfs:label corresponding to the class name with a space between words for better readability,

  • An rdfs:comment containing the description of the class.
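The identifier and label conventions above can be illustrated with a minimal Python sketch. It assumes that concept names are plain space-separated words; the function names class_id and class_label are hypothetical, and acronyms inside a name would need extra handling that is not shown here.

```python
import re


def class_id(concept_name: str) -> str:
    """Concatenate the words of a concept name in UpperCamelCase."""
    # Only the first letter of each word is upper-cased; the rest is kept as is.
    return "".join(word[:1].upper() + word[1:] for word in concept_name.split())


def class_label(identifier: str) -> str:
    """Insert a space before each capital letter that follows a lower-case letter or digit."""
    return re.sub(r"(?<=[a-z0-9])(?=[A-Z])", " ", identifier)
```

With these helpers, class_id("Radiotherapy Procedure") gives RadiotherapyProcedure, and class_label("RadiotherapyProcedure") gives back the readable label Radiotherapy Procedure.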

In addition, a few classes have a meaning binding associated with either a SNOMED CT or a LOINC code. This meaning binding is represented in the schema with owl:equivalentClass.

The RDF schema provides a supplementary level of information by grouping some classes together. For instance, Measurement is a class created in the RDF schema as a parent of the classes that represent measurements (e.g. OxygenSaturation, BodyWeight).

In addition to the concepts defined in the dataset, three classes needed to be generated to represent specific types of metadata in RDF:

  • A Terminology class, which groups classes and individuals of external resources (e.g. SNOMED CT, ATC, CHOP) used within the SPHN context, so that they can be referred to as possible property ranges (e.g. the SNOMED CT code 703117000 is a possible value for the SPHN gender property), equivalent classes (e.g. the LOINC code 8867-4 is an equivalent class of the SPHN HeartRate class), or individuals (e.g. UCUM units are individuals that can be directly referred to as property values).

[Figure: RDF terminology tree]

Figure 2. SPHN RDF tree. Terminology is the parent class of ATC, CHOP, ICD-10-GM, LOINC, SNOMED CT and UCUM, which are imported as external terminologies. Note that, in the figure, UCUM does not have an arrow to expand sub-classes because UCUM elements are defined as individuals and not as classes.

  • A ValueSet class, which groups classes defining specific SPHN instances of possible values to use for certain object properties. The convention used for defining a ValueSet is: <Class>_<datasetPropertyName> (e.g. AdverseEvent_consequences, Biosample_fixationType). These classes are provided as property ranges of associated object properties.

  • A DataRelease class, which stamps the extraction date of a dataset together with information related to the version of the SPHN or project dataset (see Versioning of the data for more information).
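The ValueSet naming convention <Class>_<datasetPropertyName> mentioned in the list above can be sketched as follows. The sketch assumes that concept and dataset property names are given as space-separated words; the helper names are hypothetical.

```python
def upper_camel(words: str) -> str:
    """Join space-separated words in UpperCamelCase."""
    return "".join(word[:1].upper() + word[1:] for word in words.split())


def lower_camel(words: str) -> str:
    """Join space-separated words in lowerCamelCase."""
    camel = upper_camel(words)
    return camel[:1].lower() + camel[1:]


def value_set_id(concept_name: str, dataset_property_name: str) -> str:
    """Build a ValueSet identifier of the form <Class>_<datasetPropertyName>."""
    return f"{upper_camel(concept_name)}_{lower_camel(dataset_property_name)}"
```

For the two examples given above, value_set_id("Adverse Event", "consequences") returns AdverseEvent_consequences and value_set_id("Biosample", "fixation type") returns Biosample_fixationType.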

SPHN RDF properties

Concepts in the SPHN dataset contain compositions (i.e. metadata about a concept) that are translated in the SPHN RDF schema into object properties (relationships between individuals) or data properties (relationships between an individual and a literal value).

Object properties

Object properties in the SPHN RDF schema can define relationships between:

  • resources from the clinical data (e.g. a given instance of HeartRate is connected to a patient's SubjectPseudoIdentifier), and

  • resources and elements taken from external ontologies (e.g. the FOPH Diagnosis is identified by a specific ICD-10-GM code).

The general convention used to generate the identifiers of object properties is the following: has<DomainClass><RangeClass>. Some examples are given in Table 2.

Table 2. Example of object property identifiers from general convention.

object property                        domain                 range
-------------------------------------  ---------------------  ---------------
hasCentralVenousPressureCatheter       CentralVenousPressure  Catheter
hasHealthcareEncounterTherapeuticArea  HealthcareEncounter    TherapeuticArea

If multiple object properties have the same range the ‘specific’ object properties are grouped into a single ‘generic’ property.

Note: When the meaning of the properties was identical, the ‘specific’ properties were deleted from the schema. The convention used for defining the ‘generic’ object properties is: has<RangeClass>.

Example: the classes Biobanksample, LabResult and TumorSpecimen each have a relationship to the class Biosample. The meaning of these properties is identical: they all link to ‘any material sample from a biological entity for testing, diagnostic, propagation, treatment or research purposes’ (definition taken from the SPHN dataset). An object property hasBiosample has therefore been created (see Table 3). The domain of this object property is the union of Biobanksample, LabResult and TumorSpecimen. The range is Biosample. No hasBiobanksampleBiosample, hasLabResultBiosample or hasTumorSpecimenBiosample object properties are created.

Table 3. Example of object property identifiers from collapsed convention.

object property  domain                                      range
---------------  ------------------------------------------  ---------
hasBiosample     Biobanksample|LabResult|TumorSpecimen       Biosample
hasFrequency     DrugPrescription|HeartRate|RespiratoryRate  Frequency
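The general and collapsed conventions can be combined in a short Python sketch. It is only an illustration: it assumes that all links sharing a range also share the same meaning (the exception described below for differing meanings is not handled), and the function name collapse_object_properties is hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def collapse_object_properties(
    links: Iterable[Tuple[str, str]],
) -> Dict[str, Tuple[List[str], str]]:
    """Map property identifiers to (domain classes, range class).

    `links` holds (domain class, range class) pairs. A range reached from a
    single domain keeps the general has<DomainClass><RangeClass> identifier;
    a range shared by several domains gets one generic has<RangeClass>
    property whose domain is the union of those classes.
    """
    domains_by_range: Dict[str, List[str]] = defaultdict(list)
    for domain_class, range_class in links:
        domains_by_range[range_class].append(domain_class)

    properties: Dict[str, Tuple[List[str], str]] = {}
    for range_class, domains in domains_by_range.items():
        if len(domains) == 1:
            identifier = f"has{domains[0]}{range_class}"
        else:
            identifier = f"has{range_class}"
        properties[identifier] = (domains, range_class)
    return properties
```

Given the pairs (Biobanksample, Biosample), (LabResult, Biosample), (TumorSpecimen, Biosample) and (CentralVenousPressure, Catheter), the sketch produces a single hasBiosample property with the three classes as domain, plus hasCentralVenousPressureCatheter.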

Exceptions are made when subjects are linked to the same type of object but with a different meaning. For example, Drug is connected to two different types of Substance: one corresponds to the active ingredient and the other to the inactive ingredient. This distinction is explicitly mentioned in the SPHN dataset. In the RDF schema, two distinct object properties are defined: hasDrugActiveIngredientSubstance and hasDrugInactiveIngredientSubstance. Both properties have Drug as domain and Substance as range, but each carries a different meaning. In these cases, the convention used for writing the identifier is as follows: has<DomainClass><datasetPropertyName><RangeClass>

Three additional object properties are generated in the RDF schema that are neither represented in nor required by the SPHN dataset:
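A minimal Python sketch of this exception convention is given below. The dataset property names "active ingredient" and "inactive ingredient" are assumptions made for the illustration, and the function name is hypothetical.

```python
def specific_object_property_id(
    domain_class: str, dataset_property_name: str, range_class: str
) -> str:
    """Build has<DomainClass><datasetPropertyName><RangeClass>."""
    # The dataset property name is assumed to be space-separated words.
    property_part = "".join(
        word[:1].upper() + word[1:] for word in dataset_property_name.split()
    )
    return f"has{domain_class}{property_part}{range_class}"
```

Called with ("Drug", "active ingredient", "Substance") it returns hasDrugActiveIngredientSubstance, and with ("Drug", "inactive ingredient", "Substance") it returns hasDrugInactiveIngredientSubstance, so the two properties stay distinct although they share domain and range.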

  • hasSubjectPseudoIdentifier

  • hasDataProviderInstitute

  • hasAdministrativeCase

These three properties have been created to make it easier to link health-related and patient data to the patient, the data provider, or the administrative case that holds the different healthcare encounters.

Data properties

Data properties in the SPHN RDF schema point to literal values of given SPHN concepts. The convention used for defining these properties is: has<DomainClass><datasetPropertyName>. If the domain name is contained in the dataset property name, the information is not duplicated and only the data property name is taken into account (see Table 4).

Table 4. Example of data properties identified with the dataset property names.

data property                            domain                 dataset property name    range
---------------------------------------  ---------------------  -----------------------  ----------
UnstructuredLabResult                    LabResult              unstructured lab result  xsd:string
hasSimpleScoreScoringSystem              SimpleScore            scoring system           xsd:string
hasRadiotherapyProcedureFractionsNumber  RadiotherapyProcedure  fractions number         xsd:double
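The data property convention, including the rule that avoids repeating the domain name, can be sketched in Python as follows. The function name is hypothetical, and the sketch assumes that the has prefix is kept in the shortened form; Table 4 lists the shortened identifier for LabResult as UnstructuredLabResult, so that detail may differ in the actual schema.

```python
def data_property_id(domain_class: str, dataset_property_name: str) -> str:
    """Build has<DomainClass><datasetPropertyName> without duplicating the domain name."""
    property_part = "".join(
        word[:1].upper() + word[1:] for word in dataset_property_name.split()
    )
    if domain_class in property_part:
        # The domain name is already contained in the property name:
        # only the property name is used (the has prefix is an assumption).
        return f"has{property_part}"
    return f"has{domain_class}{property_part}"
```

For the rows of Table 4, data_property_id("SimpleScore", "scoring system") returns hasSimpleScoreScoringSystem, while data_property_id("LabResult", "unstructured lab result") does not repeat LabResult in the identifier.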

Two exceptions relate to the properties:

  • hasDateTime, which groups data properties that have xsd:dateTime as range. These properties are further split into startDateTime and endDateTime, which have two distinct meanings.

  • hasValue, which groups data properties representing either double values or string values of a resource.

Development process

After the release of each new version of the SPHN dataset, the corresponding SPHN RDF schema is reviewed by the SPHN Data Coordination Center (DCC), in close collaboration with the IT experts of the Swiss University Hospitals, and revised as necessary.

Availability and usage rights

The SPHN RDF schema is available at https://git.dcc.sib.swiss/sphn-semantic-framework/sphn-ontology/. If you need further information, please contact the SPHN Data Coordination Center (DCC) at dcc@sib.swiss. The SPHN RDF schema is released under the CC BY-NC-SA 4.0 license.