Instantiate data according to the project RDF Schema
Target Audience
This document is mainly intended for data providers who need to ensure that data delivered for a project follows the ontology provided for that project. To facilitate the data delivery steps, this document provides a few tips on how to properly generate data following the conventions and rules set in the SPHN interoperability framework.
1. Versioning of the data
Data provided in RDF format must contain information about:
the extraction date of the data
the version of the SPHN (or project-specific) RDF schema to which this data conforms.
This information is commonly provided in the header of the RDF file.
The class DataRelease has been created in the SPHN RDF schema to enable the annotation of these metadata.
An instance of DataRelease has the following properties:
hasExtractionDateTime (i.e. the date of extraction of the data),
hasDataProviderInstitute (i.e. the Enterprise Identification Number (UID) of the institute providing the data), and
dct:conformsTo (i.e. a link to the RDF schema the data resource conforms to).
In the RDF file containing the data resource, a data provider must provide an instance of DataRelease that includes these three pieces of information.
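The requirement above can be checked before delivery. The following is a minimal sketch in Python, assuming the properties of a DataRelease instance have already been collected into a dictionary keyed by prefixed property name; the function name and the dictionary layout are illustrative assumptions, not part of the SPHN tooling.

```python
# Hypothetical helper: the three properties a DataRelease instance must carry,
# written with the prefixes used in this document (sphn:, dct:).
REQUIRED_DATA_RELEASE_PROPERTIES = (
    "sphn:hasExtractionDateTime",
    "sphn:hasDataProviderInstitute",
    "dct:conformsTo",
)


def missing_data_release_properties(properties):
    """Return the required properties that are absent or have no value.

    `properties` maps a prefixed property name to a list of values
    (an assumed, simplified view of the RDF data).
    """
    return [
        name
        for name in REQUIRED_DATA_RELEASE_PROPERTIES
        if not properties.get(name)
    ]


# Example: a release that lacks the link to the schema it conforms to.
incomplete_release = {
    "sphn:hasExtractionDateTime": ["2021-05-03"],
    "sphn:hasDataProviderInstitute": ["resource:CHE_108_907_884-DataProviderInstitute"],
}
print(missing_data_release_properties(incomplete_release))
```

An empty list means that all three pieces of information are present; it does not check that the values themselves are valid.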
1.1 Data following the SPHN RDF schema
If the data is generated using the SPHN RDF schema (provided by the DCC), the versionIRI of the SPHN schema used must be encoded as follows:
resource:CHE_108_907_884-DataRelease_1660833908 a sphn:DataRelease ;
    dct:conformsTo <https://biomedit.ch/rdf/sphn-ontology/sphn/2022/1> ;
    sphn:hasExtractionDateTime "2022-08-18"^^xsd:date ;
    sphn:hasDataProviderInstitute resource:CHE_108_907_884-DataProviderInstitute .
where resource:CHE_108_907_884-DataRelease_1660833908 is an instance of the class DataRelease and resource:CHE_108_907_884-DataProviderInstitute is an instance of the class DataProviderInstitute.
For more information on how to represent data, see section 2. Instantiation of the data.
1.2 Data following a project-specific RDF schema
If the data is generated using a project-specific RDF schema, the versionIRI of the project schema used must be encoded as follows (example from the PSSS project):
resource:CHE_108_907_884-DataRelease_1620055600 a sphn:DataRelease ;
    dct:conformsTo <https://biomedit.ch/rdf/sphn-ontology/psss/2021/3> ;
    sphn:hasExtractionDateTime "2021-05-03"^^xsd:date ;
    sphn:hasDataProviderInstitute resource:CHE_108_907_884-DataProviderInstitute .
Note
In this second case (project-specific RDF schema), the version of the SPHN RDF schema used is known implicitly, since the PSSS schema must import the SPHN schema it uses (see section on Generate a SPHN project-specific RDF Schema). However, the data file could also state explicitly that it conforms to <https://biomedit.ch/rdf/sphn-ontology/sphn/2021/2/> .
To ensure the uniqueness of a DataRelease instance ID (i.e. the dataset identifier), a UNIX Epoch timestamp should ideally be concatenated to it as a suffix (e.g. “CHE_108_907_884-DataRelease_1620055600”, where ‘1620055600’ is the UNIX Epoch timestamp of ‘Monday, May 3, 2021 3:26:40 PM’ UTC).
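The timestamp suffix can be generated programmatically. The sketch below, in Python, is one possible way to do it; the function name data_release_id is a hypothetical helper, and the underscore-separated layout simply mirrors the example identifier above.

```python
from datetime import datetime, timezone


def data_release_id(provider_uid, created):
    """Build a DataRelease identifier with a UNIX Epoch timestamp suffix.

    `created` must be timezone-aware so that the resulting timestamp
    does not depend on the local time zone of the machine.
    """
    if created.tzinfo is None:
        raise ValueError("created must be a timezone-aware datetime")
    epoch_seconds = int(created.timestamp())
    return f"{provider_uid}-DataRelease_{epoch_seconds}"


# Reproduces the identifier used in the PSSS example.
release_id = data_release_id(
    "CHE_108_907_884",
    datetime(2021, 5, 3, 15, 26, 40, tzinfo=timezone.utc),
)
print(release_id)  # CHE_108_907_884-DataRelease_1620055600
```

In practice, `datetime.now(timezone.utc)` would typically be passed at the time the data release is created.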
2. Instantiation of the data
An instance in RDF is defined as a member of a class, and a class is generally composed of a set of instances.
This section provides guidance on how data must be instantiated and which conventions must be followed to support data interoperability.
2.1 IRI prefix for data instances
All unique identifiers of data instances (IRIs) are resources and must be defined in the context of SPHN with a prefix in the form of:
@prefix resource: <https://biomedit.ch/rdf/sphn-resource/> .
This makes it possible to distinguish concrete resources (sphn-resource) from ontology elements (which use sphn-ontology).
The exception concerns valuesets defined in SPHN, whose instances are provided in the SPHN ontology.
Note
Values coming from valuesets defined in the SPHN ontology are the only exception, since their prefix is:
@prefix sphn: <https://biomedit.ch/rdf/sphn-ontology/sphn#> .
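The distinction between data resources and ontology elements can be tested on the IRI alone. The following Python sketch assumes full (non-prefixed) IRIs as input; the function name and the three labels it returns are illustrative choices.

```python
# Namespaces taken from the prefixes declared in this document.
RESOURCE_NS = "https://biomedit.ch/rdf/sphn-resource/"
ONTOLOGY_NS = "https://biomedit.ch/rdf/sphn-ontology/"


def iri_kind(iri):
    """Classify an IRI as a data resource, an ontology element, or external."""
    if iri.startswith(RESOURCE_NS):
        return "resource"
    if iri.startswith(ONTOLOGY_NS):
        # Includes valueset values such as sphn:GreaterThan.
        return "ontology"
    return "external"


print(iri_kind("https://biomedit.ch/rdf/sphn-resource/CHE-101-064-173-HeartRate-f9c87482"))
print(iri_kind("https://biomedit.ch/rdf/sphn-ontology/sphn#GreaterThan"))
print(iri_kind("http://snomed.info/id/385645004"))
```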
2.2 Naming convention for SPHN data instances
Each IRI defined for an instance of a data element must follow the conventions put in place in the SPHN project. Concepts are separated into two categories:
Concepts that must have unique data instances defined in the setting of a data provider and that can’t be shared between providers.
Concepts where data instances can be shared across data providers, meaning that the same instance is reused in data from different providers.
2.2.1 Unique resource instantiation
Resources that must be unique for each data provider must follow this convention:
resource:<provider_id>-<ClassName>-<unique_id>
where unique_id is a unique identifier defined by the data provider.
Note
Resources that must be unique generally have a temporal attribute connected to them (e.g. HeartRate, BodyTemperature, Biosample) and usually correspond to patient-related specific information.
See below a couple of examples of the instantiation of unique resources:
# Instantiation of a patient with an identifier attribute:
resource:CHE-101-064-173-SubjectPseudoIdentifier-123456789 a sphn:SubjectPseudoIdentifier ;
    sphn:hasIdentifier "123456789"^^xsd:string .

# Instantiation of a HeartRate with a date and patient attribute:
resource:CHE-101-064-173-HeartRate-f9c87482fg a sphn:HeartRate ;
    sphn:hasMeasurementDateTime "2021-04-02T00:12" ;
    sphn:hasSubjectPseudoIdentifier resource:CHE-101-064-173-SubjectPseudoIdentifier-123456789 .
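The naming convention for unique resources can be applied with a small helper. This is a sketch in Python under stated assumptions: the function name is hypothetical, and falling back to a random UUID when no identifier is supplied is one possible choice, not an SPHN requirement.

```python
import uuid

RESOURCE_NS = "https://biomedit.ch/rdf/sphn-resource/"


def unique_resource_iri(provider_id, class_name, unique_id=None):
    """Build an IRI of the form <provider_id>-<ClassName>-<unique_id>.

    If the data provider has no identifier of its own for the instance,
    a random UUID is used instead (an assumption made for this sketch).
    """
    if unique_id is None:
        unique_id = uuid.uuid4().hex
    return f"{RESOURCE_NS}{provider_id}-{class_name}-{unique_id}"


heart_rate_iri = unique_resource_iri("CHE-101-064-173", "HeartRate", "f9c87482")
print(heart_rate_iri)
```

A provider-defined identifier is preferable when one exists, since it keeps the IRI stable across repeated exports of the same record.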
2.3 Instantiation of external resources
External resources provided by the SPHN project in RDF (i.e., ATC, CHOP, ICD-10-GM, LOINC, SNOMED CT, UCUM) are commonly used for referring to standard codes when annotating a particular data element.
In the RDF files containing the codes of external terminologies, the codes are represented as rdfs:Class, similarly to SPHN concepts.
Therefore, as is done for SPHN concept resources, data providers need to instantiate their codes and link them to the appropriate terminology class element.
Since these codes are reusable across data providers, the instantiation of these resources should follow the convention already defined for shared resources:
resource:<ClassName>-<coding_system>-<identifier>
where ClassName will always be Code.
For example, the use of a SNOMED CT code for annotating information about the Consent status should be as follows:
@prefix sphn: <https://biomedit.ch/rdf/sphn-ontology/sphn#> .
@prefix snomed: <http://snomed.info/id/> .
@prefix resource: <https://biomedit.ch/rdf/sphn-resource/> .

resource:CHE-101-064-173-Consent-gseriow340rokfo a sphn:Consent ;
    sphn:hasTypeCode resource:Code-SNOMED-CT-385645004 .

resource:Code-SNOMED-CT-385645004 a snomed:385645004 .
We can see that hasTypeCode has for target a resource (Code-SNOMED-CT-385645004) which is an instance of the SNOMED CT class 385645004 (that stands for accepted).
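Because code instances follow a fixed pattern, the corresponding Turtle statement can be generated from the coding system and the code identifier. The Python sketch below is an illustration only; the function name and its parameters are assumptions, and the terminology prefix (e.g. snomed) is expected to be declared elsewhere in the file.

```python
def code_instance_turtle(coding_system, identifier, class_prefix):
    """Return the Turtle statement that instantiates a terminology code.

    The instance is named Code-<coding_system>-<identifier> and typed
    with the terminology class <class_prefix>:<identifier>.
    """
    local_name = f"Code-{coding_system}-{identifier}"
    return f"resource:{local_name} a {class_prefix}:{identifier} ."


# Reproduces the statement from the Consent example.
print(code_instance_turtle("SNOMED-CT", "385645004", "snomed"))
```

Since the resulting instance is shared, generating it twice for the same code yields the same statement, which RDF treats as a single triple.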
The only exception is the UCUM codes, which are already incorporated as owl:NamedIndividual, meaning that these instances can be directly referred to by the data providers:
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix sphn: <https://biomedit.ch/rdf/sphn-ontology/sphn#> .
@prefix ucum: <https://biomedit.ch/rdf/sphn-resource/ucum/> .
@prefix resource: <https://biomedit.ch/rdf/sphn-resource/> .

# BodyHeight has a specific Unit provided through the Quantity
resource:CHE-101-064-173-BodyHeight-ytrewq3 a sphn:BodyHeight ;
    sphn:hasQuantity resource:CHE-101-064-173-Quantity-trew0123 .

# Quantity contains the Unit
resource:CHE-101-064-173-Quantity-trew0123 a sphn:Quantity ;
    sphn:hasUnit resource:Unit-UCUM-cm ;
    sphn:hasValue "158"^^xsd:double ;
    sphn:hasComparator sphn:GreaterThan .

# Unit is linked to a UCUM code
resource:Unit-UCUM-cm a sphn:Unit ;
    sphn:hasCode ucum:cm .
2.4 Examples of data instantiation
2.4.1 Instantiation of a Heart Rate of a patient
The code block below provides an example representing data for the annotation of a measured heart rate (i.e. HeartRate) of a patient (i.e. SubjectPseudoIdentifier).
It highlights the specific instances (e.g. Code, Quantity, BodySite) that need to be generated following the definitions of the SPHN RDF schema.
It showcases examples of SNOMED CT instantiation and a reference to UCUM units.
Note that the instance of Quantity does not show the optional property hasComparator here.
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix xml: <http://www.w3.org/XML/1998/namespace> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix sphn: <https://biomedit.ch/rdf/sphn-ontology/sphn#> .
@prefix ucum: <https://biomedit.ch/rdf/sphn-resource/ucum/> .
@prefix snomed: <http://snomed.info/id/> .
@prefix resource: <https://biomedit.ch/rdf/sphn-resource/> .

# Instantiation of a SubjectPseudoIdentifier (i.e. patient) with an identifier:
resource:CHE-101-064-173-SubjectPseudoIdentifier-123456789 a sphn:SubjectPseudoIdentifier ;
    sphn:hasIdentifier "123456789"^^xsd:string .

# Instantiation of a HeartRate connected to that SubjectPseudoIdentifier:
resource:CHE-101-064-173-HeartRate-f9c87482 a sphn:HeartRate ;
    sphn:hasQuantity resource:CHE-101-064-173-Quantity-mnopqrst ;
    sphn:hasBodySite resource:CHE-101-064-173-BodySite-987654 ;
    sphn:hasPhysiologicStateCode resource:Code-SNOMED-CT-128974000 ;
    sphn:hasRegularityCode resource:Code-SNOMED-CT-61086009 ;
    sphn:hasSubjectPseudoIdentifier resource:CHE-101-064-173-SubjectPseudoIdentifier-123456789 ;
    sphn:hasDateTime "2021-04-02T00:12" .

# Instantiation of the Quantity:
resource:CHE-101-064-173-Quantity-mnopqrst a sphn:Quantity ;
    sphn:hasValue "85"^^xsd:double ;
    sphn:hasUnit resource:Unit-UCUM-cblbeatscbrpermin .

# Instantiation of the Unit used in Quantity:
resource:Unit-UCUM-cblbeatscbrpermin a sphn:Unit ;
    sphn:hasCode ucum:cblbeatscbrpermin .

# Instantiation of the BodySite and connection to its Code:
resource:CHE-101-064-173-BodySite-987654 a sphn:BodySite ;
    sphn:hasCode resource:Code-SNOMED-CT-8205005 ;
    sphn:hasLaterality resource:Laterality-SNOMED-CT-7771000 .

# Instantiation of the Laterality used in BodySite:
resource:Laterality-SNOMED-CT-7771000 a sphn:Laterality ;
    sphn:hasCode resource:Code-SNOMED-CT-7771000 .

# Instantiation of the Code used in Laterality:
resource:Code-SNOMED-CT-7771000 a snomed:7771000 .

# Heart Rate taken on the wrist:
resource:Code-SNOMED-CT-8205005 a snomed:8205005 .

# Physiologic state of the heart rate is 'Baseline state':
resource:Code-SNOMED-CT-128974000 a snomed:128974000 .

# Regularity of the heart rate is 'Pulse irregular':
resource:Code-SNOMED-CT-61086009 a snomed:61086009 .
The snippet of mock data above has been translated into a graph to visualize the connections between the resources.
2.4.2 Instantiation of a Biosample
Below is an example of data instantiation for a Biosample, also visualized as a graph. You can observe the use of external terminologies (SNOMED CT) as well as references to values from valuesets defined in SPHN.
Note that connections to the SubjectPseudoIdentifier, AdministrativeCase and DataProviderInstitute are not shown here, to keep the graph clear.
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix xml: <http://www.w3.org/XML/1998/namespace> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix sphn: <https://biomedit.ch/rdf/sphn-ontology/sphn#> .
@prefix snomed: <http://snomed.info/id/> .
@prefix resource: <https://biomedit.ch/rdf/sphn-resource/> .

# Instantiation of a Biosample:
resource:CHE-101-064-173-Biosample-d3d5f4g5 a sphn:Biosample ;
    sphn:hasBodySite resource:CHE-101-064-173-BodySite-ay4efj ;
    sphn:hasMaterialTypeLiquid sphn:AscitesFluid ;
    sphn:hasMaterialTypeTissue sphn:Placenta ;
    sphn:hasFixationType sphn:AldehydeBased ;
    sphn:hasPrimaryContainer sphn:Glass ;
    sphn:hasStorageContainer sphn:OriginalPrimaryContainer ;
    sphn:hasCollectionDateTime "2021-07-04T12:12" .

# Instantiation of the BodySite:
resource:CHE-101-064-173-BodySite-ay4efj a sphn:BodySite ;
    sphn:hasCode resource:Code-SNOMED-CT-53120007 ;
    sphn:hasLaterality resource:Laterality-SNOMED-CT-7771000 .

# Instantiation of the Laterality:
resource:Laterality-SNOMED-CT-7771000 a sphn:Laterality ;
    sphn:hasCode resource:Code-SNOMED-CT-7771000 .

# Instantiation of the SNOMED CT code for the BodySite:
resource:Code-SNOMED-CT-53120007 a snomed:53120007 .
2.4.3 Instantiation of a Diagnostic Radiologic Examination
Below is an excerpt showing the data instantiation for a Diagnostic Radiologic Examination, also visualized as a graph. You can observe the use of external terminologies (SNOMED CT, CHOP, UCUM) as well as references to values from valuesets defined in SPHN (e.g., CT) and the use of values of type double.
Note that connections to the SubjectPseudoIdentifier, AdministrativeCase and DataProviderInstitute are not shown here, to keep the graph clear.
The Laterality instance is also not shown connected to the BodySite, as it is optional metadata.
The instance of Quantity does not show the optional property hasComparator here.
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix xml: <http://www.w3.org/XML/1998/namespace> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix sphn: <https://biomedit.ch/rdf/sphn-ontology/sphn#> .
@prefix ucum: <https://biomedit.ch/rdf/sphn-resource/ucum/> .
@prefix snomed: <http://snomed.info/id/> .
@prefix chop: <https://biomedit.ch/rdf/sphn-resource/chop/> .
@prefix resource: <https://biomedit.ch/rdf/sphn-resource/> .

# Instantiation of a Diagnostic Radiologic Examination:
resource:CHE-101-064-173-DiagnosticRadiologicExamination-g4d3e5t6 a sphn:DiagnosticRadiologicExamination ;
    sphn:hasCode resource:Code-CHOP-Z88.25 ;
    sphn:hasMethod sphn:CT ;
    sphn:hasRadiationQuantity resource:CHE-101-064-173-Quantity-xcvb54 ;
    sphn:hasBodySite resource:CHE-101-064-173-BodySite-fghj56 .

# Instantiation of the CHOP code used in Diagnostic Radiologic Examination:
resource:Code-CHOP-Z88.25 a chop:Z88.25 .

# Instantiation of the Radiation Quantity:
resource:CHE-101-064-173-Quantity-xcvb54 a sphn:Quantity ;
    sphn:hasValue "50"^^xsd:double ;
    sphn:hasUnit resource:Unit-UCUM-Gy .

# Instantiation of the UCUM unit Gy:
resource:Unit-UCUM-Gy a sphn:Unit ;
    sphn:hasCode ucum:Gy .

# Instantiation of the BodySite and its Code:
resource:CHE-101-064-173-BodySite-fghj56 a sphn:BodySite ;
    sphn:hasCode resource:Code-SNOMED-CT-12921003 .

resource:Code-SNOMED-CT-12921003 a snomed:12921003 .
3. Cardinality of the data
The majority of classes (e.g. AdministrativeGender, BodyTemperature) defined in the SPHN RDF schema can be linked to the three main concepts: SubjectPseudoIdentifier, DataProviderInstitute and AdministrativeCase.
These three main classes are also connected to each other.
The properties connecting these classes have specific cardinalities, which are defined in the SPHN RDF schema as owl:minCardinality and owl:maxCardinality restrictions and have been implemented in the SHACL rules.
The cardinalities determine whether a class is linked directly to a patient, a data provider or a case, and also how (e.g. is it possible to have multiple BodyTemperature instances connected to a patient? How many types of AdministrativeGender can the patient have at the minimum and maximum?). In addition, for each concept, the schema defines which metadata are mandatory and how many of them can be provided for a single data instance.
This data interconnectivity not only facilitates quality control of the data but can also help with data exploration, depending on the angle the user is interested in when looking into the data.
When instantiating data, data providers must comply with the cardinality restrictions for the data to be validated.
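To illustrate what a cardinality check involves, here is a minimal sketch in Python that counts how often each property occurs on one instance and compares the count with a minimum and maximum. It is not the SHACL validation itself; the function name, the tuple-based triples and the constraint values are hypothetical, and the actual cardinalities must be taken from the SPHN RDF schema.

```python
from collections import Counter


def cardinality_violations(subject, triples, constraints):
    """List the cardinality violations for one instance.

    `triples` is an iterable of (subject, predicate, object) tuples.
    `constraints` maps a predicate to (min_count, max_count), where
    max_count may be None for "no upper bound".
    """
    counts = Counter(p for s, p, _ in triples if s == subject)
    violations = []
    for predicate, (min_count, max_count) in constraints.items():
        found = counts[predicate]
        if found < min_count:
            violations.append(
                f"{predicate}: expected at least {min_count}, found {found}"
            )
        elif max_count is not None and found > max_count:
            violations.append(
                f"{predicate}: expected at most {max_count}, found {found}"
            )
    return violations


# Hypothetical constraints, for illustration only.
heart_rate_constraints = {
    "sphn:hasSubjectPseudoIdentifier": (1, 1),
    "sphn:hasQuantity": (0, 1),
}

heart_rate = "resource:CHE-101-064-173-HeartRate-f9c87482"
example_triples = [
    (heart_rate, "sphn:hasQuantity", "resource:CHE-101-064-173-Quantity-mnopqrst"),
    (heart_rate, "sphn:hasQuantity", "resource:CHE-101-064-173-Quantity-abcdefgh"),
]

for violation in cardinality_violations(heart_rate, example_triples, heart_rate_constraints):
    print(violation)
```

With these assumed constraints, the example instance is reported twice: once for the missing patient link and once for the second Quantity.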