Generate data according to an RDF schema
Target Audience
This document is mainly intended for data providers who need to deliver data for a project following the schema provided.
Data providers can generate data automatically using the SPHN Connector (recommended) or build their own pipelines to create health-related data compliant with the SPHN RDF Schema or the project-specific RDF Schema.
If you prefer or need to build your own pipeline, please read this document, which provides a few tips on how to properly generate data following the conventions and rules defined in the SPHN Semantic Interoperability Framework.
1. Versioning of the data
In SPHN, data provided in RDF format must contain information about:
the extraction date of the data
the version of the SPHN (or project-specific) RDF Schema to which this data conforms
This information is commonly provided in the header of the RDF file.
The class DataRelease
has been created in the SPHN RDF Schema
to enable the annotation of these metadata.
An instance of DataRelease
has the following properties:
- hasExtractionDateTime (i.e. the date of extraction of the data)
- hasDataProvider (i.e. the Enterprise Identification Number UID of the institute providing the data)
- a link to the RDF schema the data resources conform to, with dct:conformsTo (more information)
In the RDF file containing the data resource, a data provider must create
an instance of DataRelease
that includes these three pieces of information.
1.1 Data following the SPHN RDF Schema
If the data is generated using the SPHN RDF Schema (provided by the DCC),
the SPHN schema versionIRI
used must be encoded as follows:
resource:CHE_108_907_884-DataRelease_1660833908 a sphn:DataRelease ;
dct:conformsTo <https://biomedit.ch/rdf/sphn-schema/2024/2> ;
sphn:hasExtractionDateTime "2024-02-02"^^xsd:date ;
sphn:hasDataProvider resource:CHE_108_907_884-DataProvider-7d87e4a5d5a7 .
where:
- resource:CHE_108_907_884-DataRelease_1660833908
is an instance of the class DataRelease
- resource:CHE_108_907_884-DataProvider-7d87e4a5d5a7
is an instance of the class DataProvider
For more information on how to represent data, see section 2. Instantiation of the data.
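As a minimal sketch of how a custom pipeline might emit this header, the Python function below formats a DataRelease block from its three pieces of information. The function name `data_release_turtle` and its parameters are illustrative assumptions and not part of any SPHN tooling; the `resource:`, `sphn:`, `dct:` and `xsd:` prefixes are assumed to be declared elsewhere in the file.

```python
def data_release_turtle(release_id: str, schema_iri: str,
                        extraction_date: str, provider_id: str) -> str:
    """Render a sphn:DataRelease instance as a Turtle snippet (hypothetical helper)."""
    lines = [
        f"resource:{release_id} a sphn:DataRelease ;",
        f"    dct:conformsTo <{schema_iri}> ;",
        f'    sphn:hasExtractionDateTime "{extraction_date}"^^xsd:date ;',
        f"    sphn:hasDataProvider resource:{provider_id} .",
    ]
    return "\n".join(lines)


# Values copied from the example in section 1.1.
snippet = data_release_turtle(
    release_id="CHE_108_907_884-DataRelease_1660833908",
    schema_iri="https://biomedit.ch/rdf/sphn-schema/2024/2",
    extraction_date="2024-02-02",
    provider_id="CHE_108_907_884-DataProvider-7d87e4a5d5a7",
)
print(snippet)
```

Plain string formatting keeps the sketch dependency-free; a real pipeline would more likely build the triples with an RDF library and let it handle serialization and escaping.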
1.2 Data following a project-specific RDF Schema
If the data is generated using a project-specific RDF schema,
the project schema versionIRI
used must be encoded as follows
(example from the PSSS project):
resource:CHE_108_907_884-DataRelease_1620055600 a sphn:DataRelease ;
dct:conformsTo <https://biomedit.ch/rdf/sphn-schema/iicu/2024/1> ;
sphn:hasExtractionDateTime "2024-01-21"^^xsd:date ;
sphn:hasDataProvider resource:CHE_108_907_884-DataProvider .
Note
In this second case (project-specific RDF Schema), the version of the SPHN RDF Schema used is known implicitly, since the PSSS schema must import the SPHN schema it uses (see section on Generate a project-specific RDF Schema). However, the data file could also state explicitly that it conforms to <https://biomedit.ch/rdf/sphn-schema/sphn/2024/2/>.
To ensure the uniqueness of a DataRelease instance ID (i.e. the dataset identifier), a UNIX epoch timestamp can be appended to it as a suffix (e.g. CHE_108_907_884-DataRelease_1620055600, where 1620055600 corresponds to Monday, May 3, 2021 3:26:40 PM UTC).
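The timestamp suffix can be sketched in Python as follows. The helper name `data_release_id` is hypothetical, and interpreting the timestamp as UTC is an assumption made for this example.

```python
from datetime import datetime, timezone


def data_release_id(provider_uid: str, extracted_at: datetime) -> str:
    """Build a DataRelease instance ID with a UNIX epoch suffix (hypothetical helper)."""
    epoch_seconds = int(extracted_at.timestamp())
    return f"{provider_uid}-DataRelease_{epoch_seconds}"


# Extraction time matching the example ID, expressed in UTC (assumption).
extracted_at = datetime(2021, 5, 3, 15, 26, 40, tzinfo=timezone.utc)
release_id = data_release_id("CHE_108_907_884", extracted_at)
print(release_id)

# Decoding the suffix recovers the extraction time.
suffix = int(release_id.rsplit("_", 1)[1])
decoded = datetime.fromtimestamp(suffix, tz=timezone.utc)
print(decoded.isoformat())
```

Using a timezone-aware datetime avoids the ID depending on the local timezone of the machine that runs the pipeline.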
2. Instantiation of the data
An instance in RDF is defined as a member of a class, and a class is
typically a collection or a set of instances that share the characteristics
of the class.
For example, if DataProvider
is defined as a class, CHUV
, HUG
and USZ
could be defined as instances of the class DataProvider
.
The following sections provide guidance on how data must be instantiated and which conventions must be followed.
2.1 IRI prefix for data instances
All unique identifiers of data instances (IRIs) are resources and must be defined in the context of SPHN with a prefix in the form of:
@prefix resource: <https://biomedit.ch/rdf/sphn-resource/> .
This allows one to distinguish concrete resources (sphn-resource
)
from schema elements (which use sphn-schema
).
The exception concerns valuesets defined in SPHN where instances
are provided in the SPHN or in the project-specific RDF Schema.
Note
Values coming from valuesets defined in the RDF Schema are the only exception, since their prefix is:
@prefix sphn-individual: <https://biomedit.ch/rdf/sphn-schema/sphn/individual#> .
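As a rough illustration of this distinction, the sketch below classifies an IRI by its namespace. The function name `iri_kind` and the three labels it returns are assumptions made for this example.

```python
RESOURCE_NS = "https://biomedit.ch/rdf/sphn-resource/"
SCHEMA_NS = "https://biomedit.ch/rdf/sphn-schema/"


def iri_kind(iri: str) -> str:
    """Classify an IRI as a data resource, a schema element, or other (hypothetical helper)."""
    if iri.startswith(RESOURCE_NS):
        return "resource"
    if iri.startswith(SCHEMA_NS):
        return "schema"
    return "other"


# A data instance lives under sphn-resource.
print(iri_kind(RESOURCE_NS + "CHE-101-064-173-sphn-Sample-e40e2caa36ae"))
# A valueset individual is provided by the schema, so it lives under sphn-schema.
print(iri_kind("https://biomedit.ch/rdf/sphn-schema/sphn/individual#Glass"))
```

Note that the UCUM namespace used in later examples of this document also sits under sphn-resource, so a prefix check alone cannot separate terminology classes from data instances.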
2.2 Naming convention for SPHN data instances
Each IRI defined for an instance of a data element must follow conventions defined in the SPHN project. Concepts usually have unique data instances defined in the setting of a data provider and can’t be shared between providers.
- Resources that must be unique for each data provider must follow this convention:
resource:<provider_id>-<prefix>-<class_name>-<unique_id>
where:
- <provider_id> is the Data Provider UID
- <prefix> is the SPHN prefix or the project prefix, depending on the namespace that this instance belongs to
- <class_name> is the name of the class which this instance is a type of
- <unique_id> is the unique identifier defined by the data provider
Examples of instantiation of unique data resources:
# Instantiation of a patient with an identifier attribute:
resource:CHE-101-064-173-sphn-SubjectPseudoIdentifier-7d0f0e6c67ab a sphn:SubjectPseudoIdentifier ;
sphn:hasIdentifier "123456789"^^xsd:string .
# Instantiation of a HeartRateMeasurement with its mandatory properties:
resource:CHE-101-064-173-sphn-HeartRateMeasurement-1aa3d8a84743 a sphn:HeartRateMeasurement ;
sphn:hasStartDateTime "2021-04-02T00:12:00"^^xsd:dateTime ;
sphn:hasSubjectPseudoIdentifier resource:CHE-101-064-173-sphn-SubjectPseudoIdentifier-7d0f0e6c67ab ;
sphn:hasResult resource:CHE-101-064-173-sphn-HeartRate-4e9983c80418 ;
sphn:hasDataProvider resource:DataProvider-UID-CHE-101-064-173-3e189e2dcbb5 ;
sphn:hasSourceSystem resource:CHE-101-064-173-sphn-SourceSystem-ec286978f4cc .
The uniqueness of the data points must be ensured by the data provider to avoid any data conflict. There are many ways to generate the unique ID, the most common being UUID4 (a randomly generated version 4 UUID).
Note
The unique IDs used in the above and all subsequent examples are UUID4 values, truncated for the sake of readability. In real scenarios the unique ID will be longer.
Important
This does not apply to instances of the Code class! Instances of the Code class follow the naming convention of external resources (see 2.3 Instantiation of external resources and Code classes).
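The naming convention of section 2.2 can be sketched in Python as shown below. The helper name `instance_local_name` is hypothetical; it generates a full UUID4 when no identifier is supplied, whereas the examples in this document show truncated IDs.

```python
import uuid
from typing import Optional


def instance_local_name(provider_id: str, prefix: str, class_name: str,
                        unique_id: Optional[str] = None) -> str:
    """Build <provider_id>-<prefix>-<class_name>-<unique_id> (hypothetical helper)."""
    if unique_id is None:
        unique_id = str(uuid.uuid4())  # random UUID4 when no ID is supplied
    return f"{provider_id}-{prefix}-{class_name}-{unique_id}"


# New instance with a freshly generated UUID4.
name = instance_local_name("CHE-101-064-173", "sphn", "HeartRateMeasurement")
print(f"resource:{name}")

# Reproducing the patient identifier from the example above.
fixed = instance_local_name("CHE-101-064-173", "sphn",
                            "SubjectPseudoIdentifier", "7d0f0e6c67ab")
print(f"resource:{fixed}")
```

Passing an explicit `unique_id` is useful when the same source record must always map to the same IRI across pipeline runs.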
2.3 Instantiation of external resources and Code classes
External resources (or terminologies) provided by the SPHN project in RDF
(i.e., ATC, CHOP, GENO, HGNC, ICD-10-GM, LOINC, SNOMED CT, SO, UCUM) are commonly used for
referring to standard codes when annotating a particular data element.
In the RDF files containing the codes to external terminologies,
the codes are represented as rdfs:Class
(or owl:Class
).
Therefore, as it is done for SPHN concept resources,
it is necessary for data providers to instantiate their codes and link
them to the appropriate terminology class element.
Since the 2024.1 release of the SPHN RDF Schema, the previous assumption that codes are reusable across data providers is no longer valid. The same applies to the Code class. As a result, coded concepts must be instantiated using the following conventions.
- For codes:
resource:<provider_id>-<prefix-of-class-where-used>-<name-of-class-where-used>-<id-of-class-where-used>-sphn-Code-<unique_id-of-the-code>
- For external terminology elements:
resource:<provider_id>-<prefix-of-class-where-used>-<name-of-class-where-used>-<id-of-class-where-used>-sphn-Code-<coding_system>-<code_identifier>
where:
- <provider_id>
is the Data Provider UID
- <prefix-of-class-where-used>
is the SPHN prefix or the project prefix, depending on the namespace of the class where this code or terminology element is used
- <name-of-class-where-used>
is the name of the class where this code or terminology element is used
- <id-of-class-where-used>
is the unique ID of the instance of the class where this code or terminology element is used
For codes:
- <unique_id-of-the-code>
is the unique identifier defined by the data provider
For external terminology elements:
- <coding_system>
is the coding system used
- <code_identifier>
is the identifier of the code in that coding system
In the SPHN Connector the above terminology elements are created together with:
<termid> = <coding_system>-<code_identifier>
Note
The SPHN Connector automatically handles the changed naming-convention requirements for the usual classes. In concepts where an exact instance of an external terminology code is referenced, the coordinates can be used for reference.
For example, the use of a SNOMED CT code for annotating information about the Consent status should be as follows (reduced example for clarity):
@prefix sphn: <https://biomedit.ch/rdf/sphn-schema/sphn#> .
@prefix snomed: <http://snomed.info/id/> .
@prefix resource: <https://biomedit.ch/rdf/sphn-resource/> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
resource:CHE-101-064-173-sphn-Consent-6aba1601c334 a sphn:Consent ;
sphn:hasTypeCode resource:CHE-101-064-173-sphn-Consent-6aba1601c334-sphn-Code-SNOMED-CT-385645004 .
resource:CHE-101-064-173-sphn-Consent-6aba1601c334-sphn-Code-SNOMED-CT-385645004 a snomed:385645004 .
Here, hasTypeCode
points to a resource resource:CHE-101-064-173-sphn-Consent-6aba1601c334-sphn-Code-SNOMED-CT-385645004
which is an instance of the SNOMED CT class 385645004
(that stands for accepted
).
Another example is given below with UCUM (reduced example for clarity):
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix sphn: <https://biomedit.ch/rdf/sphn-schema/sphn#> .
@prefix ucum: <https://biomedit.ch/rdf/sphn-resource/ucum/> .
@prefix resource: <https://biomedit.ch/rdf/sphn-resource/> .
# BodyHeight has a specific Unit provided through the Quantity
resource:CHE-101-064-173-sphn-BodyHeight-6f917b6b1e19 a sphn:BodyHeight ;
sphn:hasQuantity resource:CHE-101-064-173-sphn-Quantity-10252cef2d6f .
# Quantity contains the Unit
resource:CHE-101-064-173-sphn-Quantity-10252cef2d6f a sphn:Quantity ;
sphn:hasComparator sphn:GreaterThan ;
sphn:hasUnit resource:CHE-101-064-173-sphn-Unit-da14cbef52be ;
sphn:hasValue "158" .
# Unit is linked to a UCUM resource
resource:CHE-101-064-173-sphn-Unit-da14cbef52be a sphn:Unit ;
sphn:hasCode resource:CHE-101-064-173-sphn-Unit-da14cbef52be-sphn-Code-UCUM-cm .
# The UCUM resource is instantiating the corresponding UCUM class
resource:CHE-101-064-173-sphn-Unit-da14cbef52be-sphn-Code-UCUM-cm a ucum:cm .
For terminologies provided in the context of a project, the members of the project must deliver the RDF files of the terminologies to the data providers and data providers must follow the notation used by the project.
Recommendations on how to generate RDF schemas for external terminologies are provided in FAIRify external terminologies in RDF.
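The two naming conventions of section 2.3 can be sketched in Python as follows. The function names `code_local_name` and `terminology_local_name` are hypothetical and only illustrate how the parts are concatenated.

```python
def code_local_name(provider_id: str, prefix: str, class_name: str,
                    class_instance_id: str, code_unique_id: str) -> str:
    """Local name for an instance of the Code class (hypothetical helper)."""
    owner = f"{provider_id}-{prefix}-{class_name}-{class_instance_id}"
    return f"{owner}-sphn-Code-{code_unique_id}"


def terminology_local_name(provider_id: str, prefix: str, class_name: str,
                           class_instance_id: str, coding_system: str,
                           code_identifier: str) -> str:
    """Local name for an instance of an external terminology class (hypothetical helper)."""
    term_id = f"{coding_system}-{code_identifier}"  # <termid> as described above
    owner = f"{provider_id}-{prefix}-{class_name}-{class_instance_id}"
    return f"{owner}-sphn-Code-{term_id}"


# SNOMED CT code used by a Consent instance (values from the Consent example).
consent_code = terminology_local_name(
    "CHE-101-064-173", "sphn", "Consent", "6aba1601c334", "SNOMED-CT", "385645004")
print(f"resource:{consent_code}")

# UCUM unit used by a Unit instance (values from the BodyHeight example).
unit_code = terminology_local_name(
    "CHE-101-064-173", "sphn", "Unit", "da14cbef52be", "UCUM", "cm")
print(f"resource:{unit_code}")
```

Because the identifier of the owning instance is part of the name, the same terminology code used by two different instances yields two distinct resources.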
2.4 Examples of data instantiation
2.4.1 Instantiation of a Heart Rate of a patient
The code block below shows an example of data instantiation
of a heart rate measurement (i.e. HeartRateMeasurement
and HeartRate
) on a patient (i.e. SubjectPseudoIdentifier
).
It highlights the specific instances (e.g. Code
, Quantity
, BodySite
) that need
to be generated following the definitions of the SPHN RDF Schema.
It showcases examples of SNOMED CT and UCUM codes instantiation.
Note that instances of optional properties may have been omitted.
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix sphn: <https://biomedit.ch/rdf/sphn-schema/sphn#> .
@prefix ucum: <https://biomedit.ch/rdf/sphn-resource/ucum/> .
@prefix snomed: <http://snomed.info/id/> .
@prefix resource: <https://biomedit.ch/rdf/sphn-resource/> .
# Instantiation of a SubjectPseudoIdentifier (i.e. patient) with an identifier:
resource:CHE-101-064-173-sphn-SubjectPseudoIdentifier-7d0f0e6c67ab a sphn:SubjectPseudoIdentifier ;
sphn:hasIdentifier "123456789"^^xsd:string .
# Instantiation of a DataProvider:
resource:CHE-101-064-173-sphn-DataProvider-d27b937ec578 a sphn:DataProvider ;
sphn:hasCode resource:CHE-101-064-173-sphn-DataProvider-d27b937ec578-sphn-Code-f3ecd42aba3b .
# Instantiation of the Code of the DataProvider:
resource:CHE-101-064-173-sphn-DataProvider-d27b937ec578-sphn-Code-f3ecd42aba3b a sphn:Code ;
sphn:hasName "SIB Institut Suisse de Bioinformatique"^^xsd:string ;
sphn:hasIdentifier "CHE-101-064-173"^^xsd:string ;
sphn:hasCodingSystemAndVersion "UID"^^xsd:string .
# Instantiation of the HeartRateMeasurement linked to a patient:
resource:CHE-101-064-173-sphn-HeartRateMeasurement-67e274d74e3f a sphn:HeartRateMeasurement ;
sphn:hasBodySite resource:CHE-101-064-173-sphn-BodySite-0148faeb8f53 ;
sphn:hasSubjectPhysiologicState resource:CHE-101-064-173-sphn-PhysiologicState-7120eab9a4e5 ;
sphn:hasSubjectPseudoIdentifier resource:CHE-101-064-173-sphn-SubjectPseudoIdentifier-7d0f0e6c67ab ;
sphn:hasDataProvider resource:CHE-101-064-173-sphn-DataProvider-d27b937ec578 ;
sphn:hasMethodCode resource:CHE-101-064-173-sphn-HeartRateMeasurement-67e274d74e3f-sphn-Code-SNOMED-CT-29303009 ;
sphn:hasResult resource:CHE-101-064-173-sphn-HeartRate-3e9dcba13f33 ;
sphn:hasStartDateTime "2021-04-02T00:12:00"^^xsd:dateTime .
# Instantiation of a HeartRate that results from the measurement:
resource:CHE-101-064-173-sphn-HeartRate-3e9dcba13f33 a sphn:HeartRate ;
sphn:hasQuantity resource:CHE-101-064-173-sphn-Quantity-222b114ca3c4 ;
sphn:hasRegularityCode resource:CHE-101-064-173-sphn-HeartRate-3e9dcba13f33-sphn-Code-SNOMED-CT-61086009 ;
sphn:hasDateTime "2021-04-02T00:13:00"^^xsd:dateTime .
# Instantiation of the Quantity:
resource:CHE-101-064-173-sphn-Quantity-222b114ca3c4 a sphn:Quantity ;
sphn:hasValue "85"^^xsd:double ;
sphn:hasUnit resource:CHE-101-064-173-sphn-Unit-bb14c6d7ebba .
# Instantiation of the Unit used in Quantity:
resource:CHE-101-064-173-sphn-Unit-bb14c6d7ebba a sphn:Unit ;
sphn:hasCode resource:CHE-101-064-173-sphn-Unit-bb14c6d7ebba-sphn-Code-UCUM-cblbeatscbrpermin .
# Instantiation of the UCUM unit:
resource:CHE-101-064-173-sphn-Unit-bb14c6d7ebba-sphn-Code-UCUM-cblbeatscbrpermin a ucum:cblbeatscbrpermin .
# Instantiation of the BodySite and connection to its Code:
resource:CHE-101-064-173-sphn-BodySite-0148faeb8f53 a sphn:BodySite ;
sphn:hasCode resource:CHE-101-064-173-sphn-BodySite-0148faeb8f53-sphn-Code-SNOMED-CT-8205005 ;
sphn:hasLaterality resource:CHE-101-064-173-sphn-Laterality-ca8fa62c4eb7 .
# Instantiation of the Laterality used in BodySite:
resource:CHE-101-064-173-sphn-Laterality-ca8fa62c4eb7 a sphn:Laterality ;
sphn:hasCode resource:CHE-101-064-173-sphn-Laterality-ca8fa62c4eb7-sphn-Code-SNOMED-CT-7771000 .
# Instantiation of the Code used in Laterality:
resource:CHE-101-064-173-sphn-Laterality-ca8fa62c4eb7-sphn-Code-SNOMED-CT-7771000 a snomed:7771000 .
# Instantiation of the PhysiologicState:
resource:CHE-101-064-173-sphn-PhysiologicState-7120eab9a4e5 a sphn:PhysiologicState ;
sphn:hasCode resource:CHE-101-064-173-sphn-PhysiologicState-7120eab9a4e5-sphn-Code-SNOMED-CT-128974000 .
# Instantiation of the Code used in PhysiologicState ("Baseline state"):
resource:CHE-101-064-173-sphn-PhysiologicState-7120eab9a4e5-sphn-Code-SNOMED-CT-128974000 a snomed:128974000 .
# Instantiation of the method code:
resource:CHE-101-064-173-sphn-HeartRateMeasurement-67e274d74e3f-sphn-Code-SNOMED-CT-29303009 a snomed:29303009 .
# Heart Rate taken on the wrist:
resource:CHE-101-064-173-sphn-BodySite-0148faeb8f53-sphn-Code-SNOMED-CT-8205005 a snomed:8205005 .
# Regularity of the heart rate is 'Pulse irregular':
resource:CHE-101-064-173-sphn-HeartRate-3e9dcba13f33-sphn-Code-SNOMED-CT-61086009 a snomed:61086009 .
The snippet of mock data represented above has been translated into a graph to better display the connections between the resources.
2.4.2 Instantiation of a Sample
Below is an example of data instantiation for a Sample,
also visualized as a graph. You can observe the use of external terminologies
(SNOMED CT) as well as references to values from value sets defined in SPHN.
Note that connections to the SubjectPseudoIdentifier
, AdministrativeCase
and DataProvider
are not shown here for a less complex visualization.
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix xml: <http://www.w3.org/XML/1998/namespace> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix sphn: <https://biomedit.ch/rdf/sphn-schema/sphn#> .
@prefix sphn-ind: <https://biomedit.ch/rdf/sphn-schema/sphn/individual#> .
@prefix snomed: <http://snomed.info/id/> .
@prefix resource: <https://biomedit.ch/rdf/sphn-resource/> .
# Instantiation of a Sample:
resource:CHE-101-064-173-sphn-Sample-e40e2caa36ae a sphn:Sample;
sphn:hasIdentifier "d3d5f4g5"^^xsd:string ;
sphn:hasSharedIdentifier "https://example.org/sdfg90w34nhf234"^^xsd:anyURI ;
sphn:hasBodySite resource:CHE-101-064-173-sphn-BodySite-8944032421f6 ;
sphn:hasMaterialTypeCode resource:CHE-101-064-173-sphn-Sample-e40e2caa36ae-sphn-Code-SNOMED-CT-309051001 ;
sphn:hasFixationType sphn-ind:AldehydeBased ;
sphn:hasPrimaryContainer sphn-ind:Glass ;
sphn:hasCollectionDateTime "2021-07-04T12:12:00"^^xsd:dateTime .
# Instantiation of the BodySite:
resource:CHE-101-064-173-sphn-BodySite-8944032421f6 a sphn:BodySite ;
sphn:hasCode resource:CHE-101-064-173-sphn-BodySite-8944032421f6-sphn-Code-SNOMED-CT-53120007 ;
sphn:hasLaterality resource:CHE-101-064-173-sphn-Laterality-e72b070131d9 .
# Instantiation of the Laterality:
resource:CHE-101-064-173-sphn-Laterality-e72b070131d9 a sphn:Laterality ;
sphn:hasCode resource:CHE-101-064-173-sphn-Laterality-e72b070131d9-sphn-Code-SNOMED-CT-7771000 .
# Instantiation of the SNOMED CT code for the laterality
resource:CHE-101-064-173-sphn-Laterality-e72b070131d9-sphn-Code-SNOMED-CT-7771000 a snomed:7771000 .
# Instantiation of the SNOMED CT code for the BodySite
resource:CHE-101-064-173-sphn-BodySite-8944032421f6-sphn-Code-SNOMED-CT-53120007 a snomed:53120007 .
# Instantiation of the SNOMED CT code for the material type
resource:CHE-101-064-173-sphn-Sample-e40e2caa36ae-sphn-Code-SNOMED-CT-309051001 a snomed:309051001 .
4. Cardinality of the data
The majority of classes (e.g. AdministrativeSex
, BodyTemperatureMeasurement
)
defined in the SPHN RDF Schema can be linked to three main concepts:
SubjectPseudoIdentifier
, SourceSystem
and AdministrativeCase
.
These three main classes can also be connected together.
The properties connecting these classes together have specific cardinalities
which are defined in the SPHN RDF Schema as owl:minCardinality and owl:maxCardinality restrictions and have been implemented in the SHACL rules (read more here).
The cardinalities specify whether an instance of a class is linked directly to a patient, a data provider, an administrative case
or a specific source system, and also how (e.g. is it possible to have multiple BodyTemperatureMeasurement
connected to a patient? How many types of AdministrativeSex
can the patient have, at minimum and at maximum?). In addition, for each concept, the schema defines which metadata
are mandatory and how many of them can be provided for a single data instance.
This data interconnectivity not only facilitates the process of quality control of the data but it
can also help in the data exploration depending on the interest of the user.
For additional information about how cardinalities are set, please visit: 6. Defining cardinalities.
When instantiating data, data providers must comply with the cardinality restrictions for the data to be valid.
Note that a class that is not linked to any of the three main concepts is necessarily a class that must be instantiated in the context of another class.
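The idea of a cardinality check can be illustrated with the toy Python sketch below. The `CARDINALITIES` table, the function name `cardinality_violations` and the triple tuples are all hypothetical: the bounds shown are not taken from the SPHN RDF Schema, and actual validation relies on the SHACL rules mentioned above.

```python
from collections import Counter

# Hypothetical (min, max) bounds per property; max=None means unbounded.
# Illustrative values only, NOT the cardinalities of the SPHN RDF Schema.
CARDINALITIES = {
    "hasSubjectPseudoIdentifier": (1, 1),
    "hasSourceSystem": (1, None),
    "hasAdministrativeCase": (0, 1),
}


def cardinality_violations(triples, subject):
    """Return messages for properties of `subject` outside their (min, max) bounds."""
    counts = Counter(p for s, p, _ in triples if s == subject)
    problems = []
    for prop, (low, high) in CARDINALITIES.items():
        n = counts[prop]
        if n < low:
            problems.append(f"{prop}: expected at least {low}, found {n}")
        elif high is not None and n > high:
            problems.append(f"{prop}: expected at most {high}, found {n}")
    return problems


# A measurement wrongly linked to two patients and to no source system.
triples = [
    ("m1", "hasSubjectPseudoIdentifier", "p1"),
    ("m1", "hasSubjectPseudoIdentifier", "p2"),
]
violations = cardinality_violations(triples, "m1")
for message in violations:
    print(message)
```

Running such a check before delivery can catch obvious gaps early, but it does not replace validation against the official SHACL rules.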