Data generation following the project ontology

Data providers need to ensure that data delivered for a project follows the ontology provided for that project. To facilitate the data delivery steps, here are a few tips on how to properly generate data following the conventions and rules set in the SPHN interoperability framework.

1. Versioning of the data

Data provided in RDF format must contain information about:

  • the extraction date of the data

  • the version of the SPHN (or project-specific) RDF schema to which this data conforms.

This information is commonly provided in the header of the RDF file.

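The class DataRelease has been created in the SPHN RDF schema to enable the annotation of these metadata. An instance of DataRelease has the following properties:

  • dct:conformsTo: the versionIRI of the RDF schema to which the data conforms

  • sphn:hasExtractionDateTime: the extraction date of the data

  • sphn:hasDataProviderInstitute: the institute providing the data

In the RDF file containing the data resource, a data provider must provide an instance of DataRelease that includes these three pieces of information.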

1.1 Data following the SPHN RDF schema

If the data is generated using the SPHN RDF schema (provided by the DCC), the versionIRI of the SPHN schema used is encoded as follows:

resource:CHE_108_907_884-DataRelease_1612279600 a sphn:DataRelease ;
    dct:conformsTo <https://biomedit.ch/rdf/sphn-ontology/2021/2> ;
    sphn:hasExtractionDateTime  "2021-02-02"^^xsd:date ;
    sphn:hasDataProviderInstitute resource:CHE_108_907_884-DataProviderInstitute .

where resource:CHE_108_907_884-DataRelease_1612279600 is an instance of DataRelease and resource:CHE_108_907_884-DataProviderInstitute is an instance of a DataProviderInstitute. For more information on how to represent data, check the Representation of the data paragraph.

1.2 Data following a project-specific RDF schema

If the data is generated using a project-specific RDF schema, the versionIRI of the project schema used is encoded as follows (example from the PSSS project):

resource:CHE_108_907_884-DataRelease_1620055600 a sphn:DataRelease ;
    dct:conformsTo <https://biomedit.ch/rdf/sphn-ontology/psss/2021/3> ;
    sphn:hasExtractionDateTime  "2021-05-02"^^xsd:date ;
    sphn:hasDataProviderInstitute resource:CHE_108_907_884-DataProviderInstitute .

Note

  • In this second case, the version of the SPHN RDF schema used is known implicitly, since the PSSS schema must import the SPHN schema it uses (see Generating a SPHN project-specific ontology section). However, the data file could also state explicitly that it conforms to <https://biomedit.ch/rdf/sphn-ontology/sphn/2021/2/>.

  • To ensure the uniqueness of a DataRelease instance ID (i.e. the dataset identifier), a UNIX Epoch timestamp should ideally be concatenated to it as a suffix (e.g. “CHE_108_907_884-DataRelease_1612279600”, where ‘1612279600’ is the UNIX timestamp for ‘Tuesday, February 2, 2021 3:26:40 PM’ in UTC).
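
The timestamp suffix described in the note above can be sketched in Python as follows. This is a minimal illustration, not part of the SPHN tooling: the function name data_release_id is hypothetical, and it assumes the extraction time is expressed as a timezone-aware datetime.

```python
from datetime import datetime, timezone


def data_release_id(provider_id: str, extracted_at: datetime) -> str:
    """Build '<provider_id>-DataRelease_<epoch seconds>' (hypothetical helper)."""
    if extracted_at.tzinfo is None:
        # A naive datetime would be interpreted in the local time zone,
        # which makes the resulting identifier machine-dependent.
        raise ValueError("extracted_at must be timezone-aware")
    epoch_seconds = int(extracted_at.timestamp())
    return f"{provider_id}-DataRelease_{epoch_seconds}"


release_id = data_release_id(
    "CHE_108_907_884",
    datetime(2021, 2, 2, 15, 26, 40, tzinfo=timezone.utc),
)
print(release_id)  # CHE_108_907_884-DataRelease_1612279600
```

Requiring an explicit time zone keeps the suffix reproducible across machines; whether seconds are precise enough to avoid collisions depends on how often a provider generates releases.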

2. Instantiation of the data

An instance in RDF is defined as a member of a class, and a class is generally composed of a set of instances. This section provides guidance on how data must be instantiated and which conventions must be followed to support data interoperability.

2.1 IRI prefix for data instances

All unique identifiers of data instances (IRIs) are resources and must be defined in the context of SPHN with a prefix in the form of:

@prefix resource: <https://biomedit.ch/rdf/sphn-resource/> .

This makes it possible to distinguish concrete resources (sphn-resource) from ontology elements (which use sphn-ontology). The exception concerns valuesets defined in SPHN, whose instances are provided in the SPHN ontology.

Note

Values coming from valuesets defined in the SPHN ontology are the only exception, since their prefix is:

@prefix sphn: <https://biomedit.ch/rdf/sphn-ontology/sphn#> .
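
As a rough illustration of the distinction between the two namespaces, the sketch below classifies a full IRI by its prefix. The function name iri_kind and the "external" label are assumptions made for this example; only the two namespace strings come from the text above.

```python
RESOURCE_NS = "https://biomedit.ch/rdf/sphn-resource/"
ONTOLOGY_NS = "https://biomedit.ch/rdf/sphn-ontology/"


def iri_kind(iri: str) -> str:
    """Classify an IRI as a data instance, an ontology element, or neither."""
    if iri.startswith(RESOURCE_NS):
        return "resource"
    if iri.startswith(ONTOLOGY_NS):
        # Includes schema elements and values from SPHN valuesets.
        return "ontology"
    return "external"


print(iri_kind(RESOURCE_NS + "CHE-101-064-173-HeartRate-f9c87482"))
print(iri_kind(ONTOLOGY_NS + "sphn#AscitesFluid"))
print(iri_kind("http://snomed.info/id/385645004"))
```

Such a check can serve as a quick sanity test before delivery, for instance to flag data instances that were accidentally minted in the ontology namespace.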

2.2 Naming convention for SPHN data instances

Each IRI defined for an instance of a data element must follow the conventions put in place in the SPHN project. Concepts are separated into two categories:

  • Concepts that must have unique data instances defined in the setting of a data provider and that can’t be shared between providers.

  • Concepts where data instances can be shared across data providers, meaning that the same instance is reused in data from different providers.

2.2.1 Unique resource instantiation

Resources that must be unique for each data provider must follow this convention:

resource:<provider_id>-<ClassName>-<unique_id>

where unique_id is a unique identifier defined by the data provider.

Note

Resources that must be unique generally have a temporal attribute provided with them (e.g. HeartRate, BodyTemperature, Biosample) and usually correspond to patient-related information.

See below two examples of unique resource instantiation:

# Instantiation of a patient with an identifier attribute:
resource:CHE-101-064-173-SubjectPseudoIdentifier-123456789 a sphn:SubjectPseudoIdentifier ;
    sphn:hasIdentifier "123456789"^^xsd:string .

# Instantiation of a HeartRate with a date and patient attribute:
resource:CHE-101-064-173-HeartRate-f9c87482fg a sphn:HeartRate ;
    sphn:hasDateTime "2021-04-02T00:12:00"^^xsd:dateTime ;
    sphn:hasSubjectPseudoIdentifier resource:CHE-101-064-173-SubjectPseudoIdentifier-123456789 .
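
The convention for unique resources can be sketched as a small Python helper. The function name unique_resource_iri is hypothetical, and falling back to a random UUID when no identifier is supplied is an assumption of this sketch; the convention itself only requires that unique_id be unique within the data provider.

```python
import uuid
from typing import Optional


def unique_resource_iri(
    provider_id: str, class_name: str, unique_id: Optional[str] = None
) -> str:
    """Return 'resource:<provider_id>-<ClassName>-<unique_id>' as a prefixed name."""
    if unique_id is None:
        # Assumption: a random UUID is an acceptable provider-defined identifier.
        unique_id = uuid.uuid4().hex
    return f"resource:{provider_id}-{class_name}-{unique_id}"


patient_iri = unique_resource_iri(
    "CHE-101-064-173", "SubjectPseudoIdentifier", "123456789"
)
print(patient_iri)  # resource:CHE-101-064-173-SubjectPseudoIdentifier-123456789
print(unique_resource_iri("CHE-101-064-173", "HeartRate"))  # random suffix
```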

2.2.2 Shared resource instantiation

Shared resources correspond to data that can be provided by different data providers with the same meaning and therefore the same identifier. These resources carry fairly generic information and generally do not depend on a date or time.

The list of concepts shared across data providers in the SPHN schema can be further separated into two sub-lists:

  • concepts that only have a Code or Value from an SPHN valueset

  • concepts that only have a value and a Unit

For instances of SPHN Concepts that only have Code information, it is recommended to generate IRIs following this convention:

resource:<ClassName>-<coding_system>-<unique_id>

where coding_system should follow the conventions written here.

The SPHN Concepts impacted by this convention are:

  • Body Site

  • Care Handling

  • Code

  • Data Determination

  • Data Provider Institute

  • Intent

  • Measurement Method

  • Medical Device

  • Time Pattern

  • Unit

  • Therapeutic Area

For instances of SPHN Concepts that only have a value and a Unit, it is recommended to generate IRIs following this convention:

resource:<ClassName>-<value>-<unit_identifier>

The SPHN Concepts impacted by the value and Unit convention are:

  • Duration

  • Frequency

  • Substance Amount

  • Gestational Age at Birth
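
The two conventions for shared resources can be sketched in Python as follows. Both function names are hypothetical, and the sketch assumes the caller already supplies the coding system label and the unit identifier in their expected encoded form (for example SNOMED-CT or cblbeatscbrpermin); it does not derive them.

```python
def coded_resource_iri(class_name: str, coding_system: str, unique_id: str) -> str:
    """Return 'resource:<ClassName>-<coding_system>-<unique_id>'."""
    return f"resource:{class_name}-{coding_system}-{unique_id}"


def value_unit_resource_iri(class_name: str, value, unit_identifier: str) -> str:
    """Return 'resource:<ClassName>-<value>-<unit_identifier>'."""
    # str() keeps the value exactly as the caller provided it (85 -> "85").
    return f"resource:{class_name}-{value}-{unit_identifier}"


print(coded_resource_iri("BodySite", "SNOMED-CT", "8205005"))
print(value_unit_resource_iri("Frequency", 85, "cblbeatscbrpermin"))
```

Because these IRIs contain no provider identifier, two providers calling the helpers with the same arguments obtain the same IRI, which is what makes the instances shareable.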

2.3 Instantiation of external resources

External resources provided by the SPHN project in RDF (i.e., ATC, CHOP, ICD-10-GM, LOINC, SNOMED CT, UCUM) are commonly used to refer to standard codes when annotating a particular data element. In these RDF files, the codes are all represented as classes (rdfs:Class), similarly to SPHN concepts. Therefore, as is done for SPHN concept resources, data providers must instantiate their codes and link them to the appropriate terminology class element.

Since these codes are reusable across data providers, the instantiation of these resources should follow the convention already defined for shared resources:

resource:<ClassName>-<coding_system>-<identifier>

where ClassName will always be Code.

For example, a SNOMED CT code annotating the Consent status should be used as follows:

@prefix sphn: <https://biomedit.ch/rdf/sphn-ontology/sphn#> .
@prefix snomed: <http://snomed.info/id/> .
@prefix resource: <https://biomedit.ch/rdf/sphn-resource/> .

resource:CHE-101-064-173-Consent-gseriow340rokfo a sphn:Consent ;
    sphn:hasConsentStatusCode resource:Code-SNOMED-CT-385645004 .

resource:Code-SNOMED-CT-385645004 a snomed:385645004 .

Here, hasConsentStatusCode targets a resource (Code-SNOMED-CT-385645004) that is an instance of the SNOMED CT class 385645004 (which stands for accepted).

The only exception is UCUM: its codes are already provided as owl:NamedIndividual, so data providers can refer to these instances directly:

@prefix sphn: <https://biomedit.ch/rdf/sphn-ontology/sphn#> .
@prefix ucum: <https://biomedit.ch/rdf/sphn-resource/ucum/> .
@prefix resource: <https://biomedit.ch/rdf/sphn-resource/> .

# BodyHeight has a specific Unit
resource:CHE-101-064-173-BodyHeight-ytrewq3 a sphn:BodyHeight ;
    sphn:hasBodyHeightUnit resource:Unit-UCUM-cm .

# Unit is linked to a UCUM code
resource:Unit-UCUM-cm a sphn:Unit ;
    sphn:hasUnitCode ucum:cm .
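
The two cases of this section (codes that must be instantiated, and UCUM units that are referenced directly) can be sketched as Python helpers that emit the corresponding Turtle statements. The function names and the PREFIX_BY_CODING_SYSTEM mapping are hypothetical; the prefix labels are the ones declared in the examples of this page, and the output assumes those @prefix declarations are present in the target file.

```python
# Hypothetical mapping from coding system label to the Turtle prefix used
# in the examples of this page.
PREFIX_BY_CODING_SYSTEM = {"SNOMED-CT": "snomed", "CHOP": "chop"}


def code_instance_turtle(coding_system: str, identifier: str) -> str:
    """Type a shared Code instance with its terminology class."""
    prefix = PREFIX_BY_CODING_SYSTEM[coding_system]
    return f"resource:Code-{coding_system}-{identifier} a {prefix}:{identifier} ."


def unit_instance_turtle(ucum_identifier: str) -> str:
    """Create a Unit instance that points directly at the UCUM individual."""
    return (
        f"resource:Unit-UCUM-{ucum_identifier} a sphn:Unit ;\n"
        f"    sphn:hasUnitCode ucum:{ucum_identifier} ."
    )


print(code_instance_turtle("SNOMED-CT", "385645004"))
print(unit_instance_turtle("cm"))
```

UCUM is deliberately absent from the mapping: since its codes are already individuals, no Code instance needs to be created for them.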

2.4 Examples of data instantiation

2.4.1 Instantiation of a Heart Rate of a patient

The code block below provides an example representing data for the annotation of a measured heart rate (i.e. HeartRate) of a patient (i.e. SubjectPseudoIdentifier). It highlights the specific instances (e.g. Code, Frequency, BodySite) that need to be generated to follow the definitions of the SPHN RDF schema. It also showcases an example of reference to UCUM units.

@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix xml: <http://www.w3.org/XML/1998/namespace> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix sphn: <https://biomedit.ch/rdf/sphn-ontology/sphn#> .
@prefix ucum: <https://biomedit.ch/rdf/sphn-resource/ucum/> .
@prefix snomed: <http://snomed.info/id/> .
@prefix resource: <https://biomedit.ch/rdf/sphn-resource/> .


# Instantiation of a SubjectPseudoIdentifier (i.e. patient) with an identifier:
resource:CHE-101-064-173-SubjectPseudoIdentifier-123456789 a sphn:SubjectPseudoIdentifier ;
    sphn:hasIdentifier "123456789"^^xsd:string .

# Instantiation of a HeartRate connected to that SubjectPseudoIdentifier:
resource:CHE-101-064-173-HeartRate-f9c87482 a sphn:HeartRate ;
    sphn:hasFrequency resource:Frequency-85-cblbeatscbrpermin ;
    sphn:hasHeartRateBodySite resource:BodySite-SNOMED-CT-8205005 ;
    sphn:hasHeartRatePhysiologicStateCode resource:Code-SNOMED-CT-128974000 ;
    sphn:hasHeartRateRegularityCode resource:Code-SNOMED-CT-61086009 ;
    sphn:hasSubjectPseudoIdentifier resource:CHE-101-064-173-SubjectPseudoIdentifier-123456789 ;
    sphn:hasDateTime "2021-04-02T00:12:00"^^xsd:dateTime .

# Instantiation of the Frequency:
resource:Frequency-85-cblbeatscbrpermin a sphn:Frequency ;
    sphn:hasEvents "85"^^xsd:double ;
    sphn:hasFrequencyUnit resource:Unit-UCUM-cblbeatscbrpermin .

# Instantiation of the Unit used in Frequency:
resource:Unit-UCUM-cblbeatscbrpermin a sphn:Unit ;
    sphn:hasUnitCode ucum:cblbeatscbrpermin .

# Instantiation of the BodySite and connection to its Code:
resource:BodySite-SNOMED-CT-8205005 a sphn:BodySite ;
    sphn:hasBodySiteCode resource:Code-SNOMED-CT-8205005 .

# Heart Rate taken on the wrist:
resource:Code-SNOMED-CT-8205005 a snomed:8205005 .

# Physiologic state of the heart rate is 'Baseline state':
resource:Code-SNOMED-CT-128974000 a snomed:128974000 .

# Regularity of the heart rate is 'Pulse irregular':
resource:Code-SNOMED-CT-61086009 a snomed:61086009 .

The mock data above has been translated into a graph to visualize the connections between the resources.

HeartRate example graph

2.4.2 Instantiation of a Biosample

Below is an example of complete data instantiation for a Biosample, also visualized as a graph. You can observe the use of an external terminology (SNOMED CT) as well as references to values from valuesets defined in SPHN.

@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix xml: <http://www.w3.org/XML/1998/namespace> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix sphn: <https://biomedit.ch/rdf/sphn-ontology/sphn#> .
@prefix snomed: <http://snomed.info/id/> .
@prefix resource: <https://biomedit.ch/rdf/sphn-resource/> .


# Instantiation of a Biosample:
resource:CHE-101-064-173-Biosample-d3d5f4g5 a sphn:Biosample;
    sphn:hasBiosampleBodySite resource:BodySite-SNOMED-CT-53120007 ;
    sphn:hasBiosampleMaterialTypeLiquid sphn:AscitesFluid ;
    sphn:hasBiosampleMaterialTypeTissue sphn:Placenta ;
    sphn:hasBiosampleFixationType sphn:AldehydeBased ;
    sphn:hasBiosamplePrimaryContainerType sphn:Glass ;
    sphn:hasBiosampleStorageContainer sphn:OriginalPrimaryContainer ;
    sphn:hasDateTime "2021-07-04T12:12:00"^^xsd:dateTime .

# Instantiation of the BodySite:
resource:BodySite-SNOMED-CT-53120007 a sphn:BodySite ;
    sphn:hasBodySiteCode resource:Code-SNOMED-CT-53120007 .

# Instantiation of the SNOMED CT code for the BodySite
resource:Code-SNOMED-CT-53120007 a snomed:53120007 .

Biosample example graph

2.4.3 Instantiation of a Diagnostic Radiologic Examination

Below is an example of complete data instantiation for a Diagnostic Radiologic Examination, also visualized as a graph. It uses external terminologies (SNOMED CT, CHOP, UCUM), references values from a valueset defined in SPHN (e.g., CT) and includes a value of type double.

@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix xml: <http://www.w3.org/XML/1998/namespace> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix sphn: <https://biomedit.ch/rdf/sphn-ontology/sphn#> .
@prefix ucum: <https://biomedit.ch/rdf/sphn-resource/ucum/> .
@prefix snomed: <http://snomed.info/id/> .
@prefix chop: <https://biomedit.ch/rdf/sphn-resource/chop/> .
@prefix resource: <https://biomedit.ch/rdf/sphn-resource/> .

# Instantiation of a Diagnostic Radiologic Examination:
resource:CHE-101-064-173-DiagnosticRadiologicExamination-g4d3e5t6 a sphn:DiagnosticRadiologicExamination;
    sphn:hasDiagnosticRadiologicExaminationCode resource:Code-CHOP-Z88.25 ;
    sphn:hasDiagnosticRadiologicExaminationMethod sphn:CT ;
    sphn:hasDiagnosticRadiologicExaminationRadiationAmount "50"^^xsd:double ;
    sphn:hasDiagnosticRadiologicExaminationUnit resource:Unit-UCUM-Gy ;
    sphn:hasBodySite resource:BodySite-SNOMED-CT-12921003 .

# Instantiation of the CHOP code used in Diagnostic Radiologic Examination:
resource:Code-CHOP-Z88.25 a chop:Z88.25 .

# Instantiation of the UCUM unit Gy:
resource:Unit-UCUM-Gy a sphn:Unit ;
    sphn:hasUnitCode ucum:Gy .

# Instantiation of the BodySite Code:
resource:BodySite-SNOMED-CT-12921003 a sphn:BodySite ;
    sphn:hasBodySiteCode resource:Code-SNOMED-CT-12921003 .

resource:Code-SNOMED-CT-12921003 a snomed:12921003 .

Diagnostic radiologic examination example graph

3. Cardinality of the data

Most classes defined in the SPHN RDF schema (e.g. AdministrativeGender, BodyTemperature) can be linked to the three main concepts: SubjectPseudoIdentifier, DataProviderInstitute and AdministrativeCase. These three main classes are also connected to each other. The properties connecting these classes have specific cardinalities, which are defined in the following document and have been implemented in the SHACL rules (read more here). The cardinalities state whether a class is linked directly to a patient, a data provider or a case, and how (e.g. can multiple BodyTemperature instances be connected to a patient? What are the minimum and maximum numbers of AdministrativeGender values a patient can have?). This interconnectivity facilitates quality control of the data and can also help data exploration, whichever angle the user takes when looking into the data.

Note

The cardinalities accessible in the document will be implemented in the SPHN RDF schema version 2022.
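
To make the idea of a cardinality check concrete, here is a minimal Python sketch that counts how many times each subject uses a given property and reports subjects outside a minimum and maximum. It only illustrates the principle: actual validation relies on the SHACL rules, and the function name, the simplified IRIs and the 1..1 cardinality used below are assumptions of this example, not values taken from the SPHN cardinality document.

```python
from collections import Counter


def cardinality_violations(triples, subjects, prop, min_count, max_count=None):
    """Map each violating subject to the number of times it uses `prop`."""
    counts = Counter(s for s, p, _o in triples if p == prop)
    violations = {}
    for subject in subjects:
        n = counts[subject]  # a Counter returns 0 for unseen subjects
        too_few = n < min_count
        too_many = max_count is not None and n > max_count
        if too_few or too_many:
            violations[subject] = n
    return violations


# Mock triples with simplified, hypothetical IRIs.
triples = [
    ("resource:HeartRate-a", "sphn:hasSubjectPseudoIdentifier", "resource:Patient-1"),
    ("resource:HeartRate-b", "sphn:hasSubjectPseudoIdentifier", "resource:Patient-1"),
    ("resource:HeartRate-b", "sphn:hasSubjectPseudoIdentifier", "resource:Patient-2"),
]
subjects = ["resource:HeartRate-a", "resource:HeartRate-b", "resource:HeartRate-c"]

# Assumed rule for the example: exactly one patient per HeartRate.
violations = cardinality_violations(
    triples, subjects, "sphn:hasSubjectPseudoIdentifier", 1, 1
)
print(violations)  # {'resource:HeartRate-b': 2, 'resource:HeartRate-c': 0}
```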