Generate a project-specific RDF Schema
Target Audience
This document is intended for project data managers and researchers interested in generating their project-specific RDF Schema. It explains how to create a project-specific RDF Schema from either the SPHN Dataset Template or the SPHN RDF Schema Template.
Introduction
An SPHN project can extend existing SPHN concepts and create new concepts (referred to as semantics in the following paragraphs) to fit its needs. Note that under no circumstances can a project modify existing content of the SPHN Dataset. Extending the semantics for project-specific needs implies that the project must generate its own project-specific RDF Schema. The project shares this project-specific RDF Schema with data providers so that they can deliver data compliant with the new schema. The project-specific RDF Schema always extends the content (i.e. semantics) defined in the SPHN RDF Schema.
There are two ways for a project to extend the SPHN semantics and produce its RDF Schema (see Figure 1):
Figure 1: The two options to generate a project-specific RDF Schema.
Option 1: starting from the SPHN Dataset Template (Excel file with the content of the SPHN Dataset) provided by the DCC, the project extends the file with the semantics it needs. The project then passes the modified SPHN Dataset Template (which becomes the project-specific Dataset) as input to the SPHN Schema Forge, which produces the project-specific RDF Schema automatically.
Option 2: starting from the SPHN RDF Schema Template (Turtle file with the content of the SPHN RDF Schema) provided by the DCC, the project defines its semantics and edits the RDF template directly, in any editor compliant with Semantic Web technologies (e.g. Protégé), to produce the project-specific RDF Schema manually.
Procedure to update the semantics
The procedure for updating the semantics is the same in both options. Figure 2 shows the process to follow when using and modifying the content of the SPHN Dataset (semantics) to fit project-specific needs. The project can reuse existing SPHN concepts, extend SPHN concepts or create new concepts. Modified and newly built concepts may later be integrated into the SPHN Dataset. We encourage you to design a new concept or modify an existing concept according to the Guiding principles for concept design.

Figure 2: Process on how to use and modify the SPHN Dataset for the project-specific needs.
The extension or modification of existing SPHN concepts can result in additional composedOfs, an alternative semantic standard that needs to be added, or a required extension of an existing value set. Various reasons can call for extensions, e.g. the implementation of a new standard in the applicable jurisdiction, a change in the availability of biomedical data, new needs of research projects, or expanded medical knowledge.
Note
Three SPHN concepts have a special meaning in the processing: Subject Pseudo Identifier, Data Provider Institute and Administrative Case. Any extension or modification of these concepts might result in invalid pipelines. Please inform the DCC if you want to modify these concepts.
It may happen that you find the concept you need in the SPHN Dataset, but a piece of information is missing. For example, you need data for a specific measurement, e.g. Body Temperature, and the different methods for measuring Body Temperature matter for your research question. Body Temperature is represented in the SPHN Dataset as a concept. However, the measurement method, with the appropriate value set, is not yet defined as a composedOf. In this case, you can extend the SPHN concept with the additional composedOf in your project-specific Dataset.
Note
If you create an extension for your project, please submit a corresponding change request to the DCC via dcc@sib.swiss. A change request template is available on https://git.dcc.sib.swiss/sphn-semantic-framework/sphn-ontology/-/tree/master/templates. The extension might be relevant to other projects. The DCC can coordinate an extension to the SPHN Dataset if needed.
Example of semantic extension
| | | description | type |
|---|---|---|---|
| concept | Body Temperature | body temperature of the individual | |
| composedOf | temperature | measured temperature | quantitative |
| composedOf | datetime | datetime of measurement | temporal |
| composedOf | body site | body site of measurement | Body Site |
| composedOf | unit | unit in which the temperature is expressed | Unit |
| composedOf | method | method used to measure the temperature | Measurement Method |
For the example above, the next step would be to define your value set or subset for the new composedOf. If you choose SNOMED CT as the controlled vocabulary to express the values for the method of Body Temperature measurements, you can define a subset as all descendants of the SNOMED CT concept 56342008 | Temperature taking (procedure) |.
| | | description | type | value set or subset |
|---|---|---|---|---|
| composedOf | method | method used to measure the temperature | Measurement Method | child of: 56342008 \| Temperature taking (procedure) \| |
The semantics to be integrated into the project must be defined before moving on to the technical implementation, which is described below for each of the two options for producing the project-specific RDF Schema.
Option 1: Produce an RDF Schema from the SPHN Dataset Template
The Dataset Template is provided as an Excel sheet to be modified by projects to extend the SPHN Dataset according to their needs.
Definitions of the terms used in the Dataset (e.g., concept, composedOf) can be found in the Guideline sheet of the Dataset Template, as well as in the SPHN Dataset.
Once the Dataset Template Excel file is opened, do the following:
1. Add project’s metadata
Select the Metadata sheet and add the following information below the already filled SPHN metadata line:
prefix: define the prefix that will be used in your project
title: provide a short title for the dataset
description: provide a short description of the content of the dataset
version: the version of the dataset you are building. It should be in the form <year>.<number>
prior version: if any, provide the previous version of the dataset
copyright: provide information about the copyright of the dataset
license: provide the IRI of the license under which the content of the dataset and the schema is released
canonical_iri: provide the full canonical IRI of the dataset that will be created
versioned_iri: provide the versioned IRI of the dataset that will be created. It should match the version information provided in version.
Example
A project called “Genotech” that wants to fill in the Dataset Template starts by providing its metadata:
| prefix | title | description | version | prior version | copyright | license | canonical_iri | versioned_iri |
|---|---|---|---|---|---|---|---|---|
| genotech | The Genotech project Dataset | The Dataset of the Genotech project, based on the SPHN Dataset 2023.1 | 2023.1 | | © Copyright 2023, Genotech Institute | | | |
Note
The Genotech project builds a Dataset for the first time, therefore the ‘prior version’ field is left empty.
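The metadata rules above can be checked before the template is submitted. The following is a minimal Python sketch, not an SPHN tool: the `ProjectMetadata` class and `check_metadata` function are hypothetical names. The assumption that the versioned IRI ends with `<year>/<number>` is inferred from the Genotech IRIs used later in this document (e.g. `.../genotech/2023/1#Cost`), and the versioned IRI in the usage lines is a hypothetical value.

```python
import re
from dataclasses import dataclass
from typing import List


@dataclass
class ProjectMetadata:
    """One row of the Metadata sheet; field names mirror the sheet's columns."""
    prefix: str
    title: str
    description: str
    version: str              # expected form: <year>.<number>, e.g. "2023.1"
    prior_version: str = ""   # left empty for a first release
    copyright: str = ""
    license: str = ""
    canonical_iri: str = ""
    versioned_iri: str = ""


VERSION_RE = re.compile(r"^(\d{4})\.(\d+)$")


def check_metadata(meta: ProjectMetadata) -> List[str]:
    """Return the problems found; an empty list means every check passed."""
    problems: List[str] = []
    match = VERSION_RE.match(meta.version)
    if not match:
        problems.append(f"version {meta.version!r} is not of the form <year>.<number>")
    elif meta.versioned_iri:
        year, number = match.groups()
        # Assumption: the versioned IRI ends with <year>/<number>,
        # as in .../genotech/2023/1#Cost in the Cost example.
        if not meta.versioned_iri.rstrip("/").endswith(f"/{year}/{number}"):
            problems.append("versioned_iri does not match version")
    if meta.prior_version and not VERSION_RE.match(meta.prior_version):
        problems.append(f"prior version {meta.prior_version!r} is malformed")
    return problems


# Usage with the Genotech example; the versioned IRI is a hypothetical value.
genotech = ProjectMetadata(
    prefix="genotech",
    title="The Genotech project Dataset",
    description="The Dataset of the Genotech project, based on the SPHN Dataset 2023.1",
    version="2023.1",
    copyright="© Copyright 2023, Genotech Institute",
    versioned_iri="https://www.biomedit.ch/rdf/sphn-ontology/genotech/2023/1",
)
print(check_metadata(genotech))  # []
```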
2. Add information about coding systems
The DCC provides information about terminologies, standards, vocabularies, and ontologies - henceforth collectively referred to as “coding systems” - that can be used in the SPHN Dataset for representing particular values with codes from the coding systems.
This information is given in the Coding System and Version sheet.
Some of these coding systems are provided in RDF, either by the original provider of the coding system or by the DCC, while others are not. In the latter case, codes from such coding systems are represented in the data as instances of the Code concept.
Since the 2023.3 release of the SPHN Dataset Template, the Coding System and Version sheet has been updated to incorporate additional information about the coding systems used in SPHN and SPHN projects.
The intention of this change was to:
clarify and differentiate which coding systems are used and/or provided in SPHN and SPHN projects,
facilitate the import of coding systems in RDF by the SPHN Dataset2RDF tool.
To that end, a project must now also update the Coding System and Version sheet with information about the coding systems supported and used in the project, whether or not they are provided in RDF.
The following columns of the Coding System and Version sheet can be populated:
short name: common abbreviation of the coding system
full name: full name or title of the coding system
coding system and version: short name of the coding system followed by a pattern that represents the way the coding system is versioned by the provider
example: example of an existing version of the coding system (that conforms to the pattern expressed in the ‘coding system and version’ column)
provided in RDF (yes/no): indicate whether the coding system is provided in RDF (by the DCC or by the project)
downloadable in RDF (yes/no): indicate whether the coding system is downloadable in RDF from any location on the web (typically from the original provider)
provided by: the name of the project (should be the same as the prefix written in 1. Add project’s metadata) to indicate that this coding system is provided/used in the project (i.e. the coding system is not provided/used in the SPHN Dataset and is specifically needed for the project’s semantics)
prefix: prefix of the coding system, typically corresponds to the ‘short name’ from the ‘short name’ column
root node: indicate the root node that will be used to group all concepts from the coding system in RDF
canonical iri: IRI of the codes taken from the coding system or defined by the project (it can be a biomedit.ch-based IRI if the coding system does not have web-resolvable IRIs for its codes)
resource prefix: if applicable, a specific resource node can be created to group all codes (including the root node) under it. A specific prefix must be given for this resource node
resource iri: if applicable, the IRI of the resource node. The IRI must be of the form https://biomedit.ch/rdf/sphn-resource/... This column goes hand in hand with the ‘resource prefix’ column
versioned iri: the versioned IRI of the coding system. This IRI is used to import the coding system in the RDF schema
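Some of these columns follow rules that can be checked mechanically: the ‘example’ must conform to the ‘coding system and version’ pattern, and the ‘resource iri’ must start with https://biomedit.ch/rdf/sphn-resource/. The sketch below is a hypothetical helper, not part of the SPHN tooling. The placeholder set ([YEAR], [MONTH], [DAY]) is taken from the examples in this document, and reading “goes hand in hand” as “both or neither” is an assumption.

```python
import re
from typing import Dict, List

# Placeholders seen in the 'coding system and version' column of the examples
# in this document (assumed to be the complete set for this sketch).
PLACEHOLDERS = {"[YEAR]": r"\d{4}", "[MONTH]": r"\d{2}", "[DAY]": r"\d{2}"}
RESOURCE_IRI_BASE = "https://biomedit.ch/rdf/sphn-resource/"


def pattern_to_regex(pattern: str) -> re.Pattern:
    """Turn a version pattern such as 'ATC-[YEAR]' into an anchored regex."""
    regex = re.escape(pattern)
    for placeholder, replacement in PLACEHOLDERS.items():
        # Swap each escaped placeholder for its digit pattern.
        regex = regex.replace(re.escape(placeholder), replacement)
    return re.compile(f"^{regex}$")


def check_row(row: Dict[str, str]) -> List[str]:
    """Return the problems found in one row of the Coding System and Version sheet."""
    problems: List[str] = []
    if not pattern_to_regex(row["coding system and version"]).match(row["example"]):
        problems.append("example does not conform to the version pattern")
    has_prefix = bool(row.get("resource prefix"))
    has_iri = bool(row.get("resource iri"))
    if has_prefix != has_iri:  # assumption: both columns are given, or neither
        problems.append("'resource prefix' and 'resource iri' must be given together")
    if has_iri and not row["resource iri"].startswith(RESOURCE_IRI_BASE):
        problems.append(f"'resource iri' must start with {RESOURCE_IRI_BASE}")
    for column in ("provided in RDF (yes/no)", "downloadable in RDF (yes/no)"):
        if row.get(column) not in ("yes", "no"):
            problems.append(f"{column!r} must be 'yes' or 'no'")
    return problems


# The Oncotree row from the examples (no resource node is defined).
oncotree = {
    "coding system and version": "oncotree_[YEAR]_[MONTH]_[DAY]",
    "example": "oncotree_2021_11_02",
    "provided in RDF (yes/no)": "yes",
    "downloadable in RDF (yes/no)": "no",
    "resource prefix": "",
    "resource iri": "",
}
print(check_row(oncotree))  # []
```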
Examples
ATC - provided in RDF by DCC
In SPHN, the ATC coding system is actively used and provided in RDF by the DCC.
ATC codes have dereferenceable IRIs, which are recorded in the ‘canonical iri’ column.
However, a root node is created in order to group all ATC codes under the same parent.
This root node is defined as ATC and uses the IRI from the ‘resource iri’ column.
Information about ATC is provided as follows:
| short name | full name | coding system and version | example | provided in RDF (yes/no) | downloadable in RDF (yes/no) | provided by | prefix | root node | canonical iri | resource prefix | resource iri | versioned iri |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| ATC | Anatomical Therapeutic Chemical classification | ATC-[YEAR] | ATC-2023 | yes | no | SPHN | atc | ATC | | sphn-atc | | |
ORPHA - provided in RDF on the web
ORPHA is a coding system that provides codes that represent rare diseases.
Let’s assume that the Genotech project wants to use ORPHA and aims to provide the ORPHA codes in RDF. ORPHA is already listed in the SPHN Dataset Template, but it is not provided in RDF by the DCC.
During the investigation phase, the Genotech project members discover ORDO (Orphanet Rare Disease Ontology), which represents ORPHA codes in a structured way that complies with Semantic Web standards. The ORDO ontology fits their needs.
Therefore, the Genotech project would like to use the ORDO ontology and will provide it in RDF.
The Genotech project can then update the line containing ORPHA to add metadata about the coding system as follows:
| short name | full name | coding system and version | example | provided in RDF (yes/no) | downloadable in RDF (yes/no) | provided by | prefix | root node | canonical iri | resource prefix | resource iri | versioned iri |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| ORPHA | Orphanet nomenclature of rare diseases | ORPHA-[YEAR]-[MONTH] | ORPHA-2021-07 | yes | yes | GENOTECH | orpha | ORPHA | | sphn-orpha | | |
Note
The ‘canonical iri’ corresponds to the IRI used for ORPHA codes in the ORDO ontology.
The ‘resource iri’ and ‘resource prefix’ are internal to the Genotech project (and defined in the context of SPHN) in order to group all the content from the ORDO ontology under the same root node ORPHA.
The ‘versioned iri’ follows the way ORDO is versioned; here it corresponds to version 4.2 of the ORDO ontology.
Oncotree - provided in RDF by the project
Oncotree is an example of a coding system which is neither downloadable in RDF nor provided in RDF by the DCC. The project first needs to “FAIRify” and translate the coding system into RDF as much as possible (see FAIRification of External Terminologies in RDF for projects) before using and sharing it.
Again, let’s assume that the Genotech project wants to use Oncotree and decides to provide it in RDF.
The following metadata is encoded in the Coding System and Version sheet:
| short name | full name | coding system and version | example | provided in RDF (yes/no) | downloadable in RDF (yes/no) | provided by | prefix | root node | canonical iri | resource prefix | resource iri | versioned iri |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Oncotree | Oncotree: A Cancer Classification System for Precision Oncology | oncotree_[YEAR]_[MONTH]_[DAY] | oncotree_2021_11_02 | yes | no | GENOTECH | oncotree | ONCOTREE | | | | |
Note
In this example, the ‘resource prefix’ and ‘resource iri’ columns do not need to be defined: the root node (ONCOTREE) and all Oncotree resources share the same namespace, since Oncotree resources do not have properly defined, dereferenceable IRIs.
Coding systems’ copyright
It is important to keep in mind that, before providing any coding system in RDF and eventually sharing it with data providers and/or other data users, the project is responsible for checking the terms and regulations for using and sharing the coding system.
Copyright information must be stated in the RDF terminology file generated/used in the context of the project. This is an important step that should not be neglected.
The use and sharing of a coding system’s RDF file in the context of a project is the responsibility of the project data manager. If the DCC later integrates that coding system in the SPHN Dataset and provides it in the Terminology Service, it becomes the responsibility of the DCC.
3. Concept definition
The next step is to go to the Concepts sheet, which already contains all concepts and composedOfs defined in SPHN.
A project is only allowed to:
create new concepts
add composedOfs to these new concepts
add composedOfs to existing SPHN concepts
3.1 Add a new concept
A project can decide to add a new concept to their dataset. A concept is an idea or notion that represents, in the context of SPHN, clinical-, health- and genomic-related elements. A concept here can be compared to the notion of “class” in other fields.
The following columns in the dataset template can be filled:
release version: version of the dataset when the concept is created (it should be the version marked in the metadata sheet)
IRI: the versioned IRI ending with the concept name in UpperCase convention
active status: yes or no are the allowed values. You must select yes for a newly added concept, since it can be used from this version of the dataset on and is thus active
concept reference: provide the name of the concept to create with the following notation: <prefix>:<Concept Name>
concept or concept compositions or inherited: indicate, by selecting one of the three options, whether the row corresponds to a concept, a composedOf, or a composedOf inherited from another concept. In this case (i.e. adding a new concept), this cell must be filled with concept
general concept name: provide the general name of the concept which will be used for building the RDF schema
general description: provide the general description of the concept which will be used in the RDF schema
contextualized concept name: provide the contextualized name of the concept in the particular context it is used
contextualized concept description: provide the contextualized description of the concept explaining its meaning in the particular context it is used
parent: provide the general concept name of the parent with the following notation: <prefix>:<ConceptName>
Note
Unlike in the concept reference column, the parent is written in UpperCase convention without spaces. The root concept in SPHN is called SPHNConcept and contains all concepts defined in SPHN. All SPHN concepts are descendants of SPHNConcept, but some concepts are children of another SPHN concept, the parent concept; the child concept must then have a more specific meaning than the parent concept. Similarly, all project concepts should be descendants of the project root concept, which is defined as <prefix>:<PREFIX>Concept. As in the SPHN Dataset, multiple levels of hierarchy can be created: a project concept can be the child of another project (or SPHN) concept if it has a more specific meaning.
meaning binding: if available, a meaning binding of the concept to an external coding system can be provided to further anchor the meaning of the concept defined in the project
additional information: text can be added to provide details to the reader of the dataset
Note
This information will not be processed by SPHN tools that generate, for instance, the RDF schema.
cardinality for the concept to Administrative Case: provide the cardinality of the concept with respect to the Administrative Case by keeping in mind the following: how should the instance of this concept be expected to be linked to the Administrative Case?
cardinality for the concept to Data Provider Institute: provide the cardinality of the concept with respect to the Data Provider Institute by keeping in mind the following: how should the instance of this concept be expected to be linked to the Data Provider Institute?
cardinality for the concept to Subject Pseudo Identifier: provide the cardinality of the concept with respect to the Subject Pseudo Identifier by keeping in mind the following: how should the instance of this concept be expected to be linked to the Subject Pseudo Identifier?
Note
You can highlight a concept by making the line bold as it is done in the SPHN Dataset.
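The naming rules for a new concept (concept reference as <prefix>:<Concept Name>, parent in UpperCase convention, project root <prefix>:<PREFIX>Concept) can be expressed as small string helpers. This is a minimal sketch with hypothetical function names; it assumes that “UpperCase convention” means capitalising each word and removing the spaces, as in SPHNConcept.

```python
from typing import Optional, Tuple


def upper_case_name(name: str) -> str:
    """'Skin Moisture' -> 'SkinMoisture': capitalise each word, drop spaces."""
    return "".join(word[:1].upper() + word[1:] for word in name.split())


def split_reference(reference: str) -> Tuple[str, str]:
    """Split a concept reference '<prefix>:<Concept Name>' into its two parts."""
    prefix, sep, name = reference.partition(":")
    if not sep or not prefix or not name.strip():
        raise ValueError(f"expected <prefix>:<Concept Name>, got {reference!r}")
    return prefix, name.strip()


def project_root_concept(prefix: str) -> str:
    """Root concept of a project, following <prefix>:<PREFIX>Concept."""
    return f"{prefix}:{prefix.upper()}Concept"


def parent_cell(reference: str, parent_reference: Optional[str] = None) -> str:
    """Value for the 'parent' column of a new concept.

    Without an explicit parent, the concept sits directly below the project
    root concept; otherwise the parent is written as <prefix>:<ConceptName>.
    """
    if parent_reference is None:
        prefix, _ = split_reference(reference)
        return project_root_concept(prefix)
    prefix, name = split_reference(parent_reference)
    return f"{prefix}:{upper_case_name(name)}"


print(parent_cell("genotech:Cost"))                               # genotech:GENOTECHConcept
print(parent_cell("genotech:Skin Moisture", "sphn:Measurement"))  # sphn:Measurement
```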
3.2 Add a new composedOf
Once a concept is created, a project can add composedOfs to it. A composedOf can be considered as metadata of a concept (i.e., specific information about the concept) and can be compared to a property or attribute of the concept.
The following columns in the Dataset Template can (and should whenever possible) be filled:
release version: version of the dataset when the composedOf is created (it should be the version marked in the metadata sheet)
IRI: the versioned IRI ending with the composedOf name in lowerCase convention
active status: in principle, you should select yes for a new composedOf
concept reference: provide the name of the concept this composedOf belongs to with the following notation: <prefix>:<Concept Name>
concept or concept compositions or inherited: select either composedOf or, if the composedOf is inherited from another concept, inherited
general concept name: provide the general name of the composedOf which will be used for building the RDF schema
general description: provide the general description of the composedOf which will be used in the RDF schema
contextualized concept name: provide the contextualized name of the composedOf in the particular context it is used
contextualized concept description: provide the contextualized description of the composedOf explaining its meaning in the particular context it is used
parent: provide the general composedOf name of the parent with the following notation: <prefix>:<composedOf>
Note
Unlike in the concept reference column, the parent is written in lowerCase convention without spaces. There are two root attributes in SPHN for composedOfs: SPHNAttributeDatatype for datatype attribute composedOfs and SPHNAttributeObject for object attribute composedOfs. Similarly, when a project composedOf is not a descendant of another composedOf, its parent should point to one of the project root attributes, <prefix>:<PREFIX>AttributeDatatype or <prefix>:<PREFIX>AttributeObject.
type: provide the type of the composedOf (e.g., quantitative, qualitative, Code, any SPHN/project concept)
standard: when the type of the composedOf is Code, a coding system can be referenced to indicate possible values. Indicate the name of that coding system in this column
value set or subset: when the type of the composedOf is Code or qualitative, a set of values or a subset of values can be specified in this column
Note
Indicate a subset by starting with descendant of: followed by the identifiers/values. Indicate a value set by listing the values separated by a semicolon (;). The standard nomenclature for writing codes from a coding system is <coding system name>: <identifier> <label> (e.g., LOINC: 20228-3 Anatomical part Laterality). SNOMED CT codes are the exception and are written with vertical bars around the label: SNOMED CT: <identifier> | <label> |.
additional information: text can be added to provide details to the reader of the dataset
cardinality for composedOf: indicate the range of cardinality for the composedOf with respect to the concept for which it is defined, keeping in mind that this cardinality is expected to hold whenever the concept is instantiated.
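The notation rules in the note above (the subset marker, semicolon-separated value sets, and the SNOMED CT exception for codes) are easy to get subtly wrong by hand. A minimal sketch with hypothetical helper names, following the descendant of: marker described in that note:

```python
from typing import List, Tuple

SUBSET_MARKER = "descendant of:"


def format_code(system: str, identifier: str, label: str) -> str:
    """Write a code as '<coding system name>: <identifier> <label>'.

    SNOMED CT is the exception: its label is enclosed in vertical bars.
    """
    if system == "SNOMED CT":
        return f"{system}: {identifier} | {label} |"
    return f"{system}: {identifier} {label}"


def parse_value_set(cell: str) -> Tuple[str, List[str]]:
    """Classify a 'value set or subset' cell and split it into its entries."""
    text = cell.strip()
    kind = "value set"
    if text.startswith(SUBSET_MARKER):
        kind = "subset"
        text = text[len(SUBSET_MARKER):]
    # Entries are separated by semicolons; empty entries are ignored.
    entries = [entry.strip() for entry in text.split(";") if entry.strip()]
    return kind, entries


method = format_code("SNOMED CT", "56342008", "Temperature taking (procedure)")
print(method)  # SNOMED CT: 56342008 | Temperature taking (procedure) |
print(parse_value_set(f"{SUBSET_MARKER} {method}"))
# ('subset', ['SNOMED CT: 56342008 | Temperature taking (procedure) |'])
```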
Example
The Genotech project would like to add the concept of Cost to their project. The concept Cost is described by a value and a currency code.
The Dataset Template would be filled as follows:
a new line is created for the concept Cost with the following information:
release version: 2023.1
IRI: https://www.biomedit.ch/rdf/sphn-ontology/genotech/2023/1#Cost
active status: yes
concept reference: genotech:Cost
concept or concept compositions or inherited: concept
general concept name: genotech:Cost
general description: an amount that has to be paid or spent to buy or obtain something
contextualized concept name: genotech:Cost
contextualized concept description: an amount that has to be paid or spent to buy or obtain something
parent: genotech:GENOTECHConcept
meaning binding:
additional information:
cardinality for concept to Administrative Case:
cardinality for the concept to Data Provider Institute:
cardinality for the concept to Subject Pseudo Identifier:
Note
This line gives the information about a concept called Cost, which does not have any link to the Administrative Case, Data Provider Institute or Subject Pseudo Identifier. It is possible to have a concept X that is not connected to any of these three concepts; in that case, concept X must be reused in a composedOf of another concept.
a new line is created below Cost to add the composedOf value:
release version: 2023.1
IRI: https://www.biomedit.ch/rdf/sphn-ontology/genotech/2023/1#hasValue
active status: yes
concept reference: genotech:Cost
concept or concept compositions or inherited: composedOf
general concept name: genotech:value
general description: value of the concept
contextualized concept name: value
contextualized concept description: value of the cost paid or spent
parent: genotech:GENOTECHAttributeDatatype
type: quantitative
standard:
value set or subset:
additional information:
cardinality for composedOf: 1:1
Note
value is a composedOf used in the context of Cost (i.e., the concept reference). With the cardinality, the project indicates that a Cost must have exactly one value connected.
a new line is created below value to add the composedOf currency code:
release version: 2023.1
IRI: https://www.biomedit.ch/rdf/sphn-ontology/genotech/2023/1#hasCurrencyCode
active status: yes
concept reference: genotech:Cost
concept or concept compositions or inherited: composedOf
general concept name: genotech:currency code
general description: currency of the concept
contextualized concept name: currency
contextualized concept description: currency of the value paid or spent
parent: genotech:GENOTECHAttributeObject
type: Code
standard: ISO 4217
value set or subset:
additional information:
cardinality for composedOf: 1:1
Note
currency code is a composedOf used in the context of Cost (i.e., the concept reference) and is of type Code from the SPHN Dataset. With the cardinality, the project indicates that a Cost must have exactly one currency code connected.
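The IRIs in the Cost example follow a recognisable pattern: the versioned IRI, then `#`, then either the concept name in UpperCase convention or, for a composedOf, `has` followed by the name in UpperCase convention (value becomes hasValue, currency code becomes hasCurrencyCode). The sketch below reproduces that pattern. Both the `has` rule and the mapping of quantitative, qualitative and temporal types to the datatype root attribute are inferred from the examples, not documented rules, and the function names are hypothetical.

```python
# Assumption: these composedOf types map to the datatype root attribute,
# every other type (Code, a concept, ...) to the object root attribute.
DATATYPE_TYPES = {"quantitative", "qualitative", "temporal"}


def upper_case_name(name: str) -> str:
    """'currency code' -> 'CurrencyCode'."""
    return "".join(word[:1].upper() + word[1:] for word in name.split())


def concept_iri(versioned_iri: str, concept_name: str) -> str:
    """Versioned IRI + '#' + concept name in UpperCase convention."""
    return f"{versioned_iri}#{upper_case_name(concept_name)}"


def composed_of_iri(versioned_iri: str, composed_of_name: str) -> str:
    """Versioned IRI + '#has' + composedOf name (inferred from the examples)."""
    return f"{versioned_iri}#has{upper_case_name(composed_of_name)}"


def root_attribute(prefix: str, composed_of_type: str) -> str:
    """Project root attribute to use as 'parent' when there is no other parent."""
    kind = "AttributeDatatype" if composed_of_type in DATATYPE_TYPES else "AttributeObject"
    return f"{prefix}:{prefix.upper()}{kind}"


BASE = "https://www.biomedit.ch/rdf/sphn-ontology/genotech/2023/1"
print(concept_iri(BASE, "Cost"))
print(composed_of_iri(BASE, "currency code"))
print(root_attribute("genotech", "Code"))  # genotech:GENOTECHAttributeObject
```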
3.3 Add a composedOf to an existing SPHN concept
Projects have the possibility to extend existing SPHN concepts with additional composedOfs (properties/attributes) that would be needed in the context of the project.
These new composedOfs can be added either at the end of the Dataset Template, as done previously for a new concept or composedOf, or on a new line directly below the SPHN concept in question.
Example
The Genotech project wants to add a cost to an Administrative Case to retain information about the costs or bills of a case. The project has already created a new concept Cost and now wants this information to be part of Administrative Case.
A new line is added at the end of the Dataset Template as follows:
release version: 2023.1
IRI: https://www.biomedit.ch/rdf/sphn-ontology/genotech/2023/1#hasCost
active status: yes
concept reference: Administrative Case
concept or concept compositions or inherited: composedOf
general concept name: cost
general description: cost of the concept
contextualized concept name: cost
contextualized concept description: cost written in the administrative case
parent: genotech:GENOTECHAttributeObject
type: Cost
standard:
value set or subset:
additional information:
cardinality for composedOf: 0:1
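The cardinality cells used in these examples (1:1, 0:1) follow a min:max notation. A minimal parser is sketched below with hypothetical function names; treating `n` as an unbounded maximum (as in 0:n) is an assumption, since this document only shows numeric bounds.

```python
from typing import Optional, Tuple


def parse_cardinality(cell: str) -> Tuple[int, Optional[int]]:
    """Parse a 'min:max' cardinality cell; 'n' (assumed) means no upper bound."""
    low, sep, high = cell.strip().partition(":")
    if not sep:
        raise ValueError(f"expected <min>:<max>, got {cell!r}")
    minimum = int(low)
    maximum = None if high.strip().lower() == "n" else int(high)
    if minimum < 0 or (maximum is not None and maximum < minimum):
        raise ValueError(f"inconsistent cardinality {cell!r}")
    return minimum, maximum


def describe_cardinality(cell: str) -> str:
    """Plain-language reading of a cardinality cell."""
    minimum, maximum = parse_cardinality(cell)
    if maximum is None:
        return f"at least {minimum}"
    if minimum == maximum:
        return f"exactly {minimum}"
    return f"between {minimum} and {maximum}"


for cell in ("1:1", "0:1", "0:n"):
    print(cell, "->", describe_cardinality(cell))
```

With these helpers, the hasCost composedOf (0:1) reads as optional and single-valued, while value and currency code (1:1) are mandatory and single-valued.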
4. Inheritance of SPHN Concepts
A project can define an SPHN concept as the parent of a project concept. This is called inheritance: the project concept then has a more specific meaning than its SPHN parent concept (for more on semantic inheritance as defined and used in the SPHN Dataset, see semantic-inheritance).
The rule is that when a project concept inherits from an SPHN concept, it must inherit all the properties of that SPHN concept. In the Dataset Template, this means that a project concept whose parent is an SPHN concept lists the composedOfs of that SPHN concept under the project concept as “inherited” composedOfs (value to be selected in the column ‘concept or concept compositions or inherited’). In this case, the project may narrow down the set of values allowed for a given inherited composedOf.
Note
Note 1: When an inherited property has type “Code”, the SPHN Dataset usually “restricts” the coding systems to be used to X, Y, Z or other. The “or other” in principle enables the project to use any Terminology (coding system provided in RDF format, codes used with IRIs) or Code (coding system not provided in RDF, codes used as Code) without breaking the SPHN Dataset logic.
Note 2: Terminologies provided in RDF in a project-specific Dataset will be listed in the project-specific RDF Schema under a project:Terminology class, which must be a subclass of sphn:Terminology. The project:Terminology class is generated automatically by the SPHN Dataset2RDF tool.
The hierarchy of terminologies would be interpreted in the project-specific RDF Schema as follows:
- sphn:Terminology
  - ATC
  - CHOP
  - …
  - project:Terminology
    - TERMINOLOGY A
    - TERMINOLOGY B
    - …
  - SNOMED CT
  - …
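Note 2 implies that a project terminology remains an sphn:Terminology through subclassing, which is what lets it fill an “or other” slot. The sketch below encodes the hierarchy above as a child-to-parent map and walks it upwards; the map and function names are illustrative only, and the terminology names are the placeholders from the hierarchy.

```python
from typing import Dict, List

# Illustrative child -> parent map for the hierarchy above; "project:" stands
# for the project's own prefix and the terminology names are placeholders.
SUBCLASS_OF: Dict[str, str] = {
    "ATC": "sphn:Terminology",
    "CHOP": "sphn:Terminology",
    "SNOMED CT": "sphn:Terminology",
    "project:Terminology": "sphn:Terminology",
    "TERMINOLOGY A": "project:Terminology",
    "TERMINOLOGY B": "project:Terminology",
}


def ancestors(node: str, subclass_of: Dict[str, str]) -> List[str]:
    """Follow the parent links upwards and return every ancestor, nearest first."""
    chain: List[str] = []
    while node in subclass_of:
        node = subclass_of[node]
        if node in chain:
            raise ValueError(f"cycle detected at {node!r}")
        chain.append(node)
    return chain


def is_sphn_terminology(node: str, subclass_of: Dict[str, str]) -> bool:
    """True if the node is (transitively) a subclass of sphn:Terminology."""
    return "sphn:Terminology" in ancestors(node, subclass_of)


print(ancestors("TERMINOLOGY A", SUBCLASS_OF))
# ['project:Terminology', 'sphn:Terminology']
```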
Example
The Genotech project would like to create the concept Skin Moisture as a Measurement. The SPHN concept sphn:Measurement is reused as the parent of project:Skin Moisture.
| | general name | general description | parent |
|---|---|---|---|
| concept | Measurement | annotation used to indicate the size or magnitude of […] | SPHNConcept |
| composedOf | quantity | value and unit of the concept | SPHNAttributeObject |
| composedOf | measurement datetime | datetime of measurement | hasDateTime |
| concept | Skin Moisture | hydration state of the outer epidermis | Measurement |
| inherited | quantity | value and unit of the concept | SPHNAttributeObject |
| inherited | measurement datetime | datetime of measurement | hasDateTime |
In this example, the Genotech project lists all composedOfs of Measurement as inherited under the concept Skin Moisture. The semantics are respected and inheritance is used as intended.
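The inheritance rule (a project concept whose parent is an SPHN concept lists all of the parent’s composedOfs as “inherited”) can be sketched as a row transformation. The `Row` type and `inherited_rows` function are hypothetical and model only the columns shown in the example table, plus the concept each row belongs to. Narrowing the allowed values of an inherited composedOf is not modelled.

```python
from dataclasses import dataclass, replace
from typing import List


@dataclass(frozen=True)
class Row:
    kind: str          # 'concept', 'composedOf' or 'inherited'
    name: str          # general name
    description: str   # general description
    parent: str
    concept: str = ""  # concept the row belongs to (empty for concept rows)


def inherited_rows(parent: str, child: str, rows: List[Row]) -> List[Row]:
    """Copy every composedOf of `parent` (own or inherited) to `child` as 'inherited'."""
    return [
        replace(row, kind="inherited", concept=child)
        for row in rows
        if row.concept == parent and row.kind in ("composedOf", "inherited")
    ]


measurement = [
    Row("concept", "Measurement", "annotation used to indicate the size or magnitude of […]", "SPHNConcept"),
    Row("composedOf", "quantity", "value and unit of the concept", "SPHNAttributeObject", "Measurement"),
    Row("composedOf", "measurement datetime", "datetime of measurement", "hasDateTime", "Measurement"),
]
for row in inherited_rows("Measurement", "Skin Moisture", measurement):
    print(row.kind, "|", row.name, "|", row.parent)
```

Including rows that are themselves “inherited” lets the same function handle several levels of hierarchy, e.g. a project concept below Skin Moisture.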
5. Transform the Dataset Template into an RDF Schema
Once the Dataset Template is filled, one last step remains: generating the project-specific RDF Schema and, optionally, all related content from the SPHN Semantic Interoperability Framework (SHACL rules, SPARQL queries, pyLODE Schema Visualization). The updated Dataset Template, which has now become the project-specific Dataset, can be given as input to the SPHN Schema Forge (https://schemaforge.dcc.sib.swiss), a web service that generates all of the materials listed above.
Note
The project can add sheets to the Dataset Template to keep track of additional metadata (e.g., release notes), but this information will not be processed by the SPHN Schema Forge.
Option 2: Generate an RDF Schema from the SPHN RDF Schema Template
Note
To find out more, you can also watch the Tutorial on Expanding the SPHN RDF Schema.
This subsection explains how to modify and extend the SPHN RDF Schema Template using Protégé to fit the needs of a project.
To facilitate the creation of a project-specific schema, the DCC provides the RDF Schema Template with pre-filled elements, accessible here.
This template contains:
the SPHN RDF Schema imported (as Direct Imports) and the related external resources imported (as Indirect Imports)
adequate imports of RDF libraries used in the context of SPHN (e.g. http://purl.org/dc/terms/)
pre-filled metadata (annotations) for the project-specific schema, to be updated by the projects.
1. Create a project schema in Protégé
Load the RDF Schema Template file provided by the DCC from Git into Protégé:
First open the template file: File –> Open
Make sure to link to the adequate SPHN RDF Schema and external terminologies when requested to import them (the catalog.xml file provided in Git facilitates the import: instructions are available in the README file)
Save this project with the project name: File –> Save As –> Select the format (recommended: Turtle syntax, OWL/XML Syntax)
Select the location to save and name the project accordingly (e.g. psss_schema, frailty_schema).
2. Edit metadata of the project schema
2.1 Update the ontology IRI
A schema released by a project, which extends the SPHN RDF Schema, should define its own ontology IRI (namespace). The ontology IRI, also called base prefix, is used both by data providers (to annotate data) and by data users (to query for the relevant classes/properties).
The convention for defining this ontology IRI is: https://biomedit.ch/rdf/sphn-ontology/ + <name of the project> + / or # (e.g., for the PSSS project, the ontology IRI can be https://biomedit.ch/rdf/sphn-ontology/psss/).
In addition to the ontology IRI, the project must generate and provide a version IRI for each published release of its RDF schema. The version IRI must be of the form <ontologyIRI> + <year> + / + <version> + / (e.g. https://biomedit.ch/rdf/sphn-ontology/psss/2021/3/ for the third release of the PSSS RDF Schema in 2021).
The version IRI of a project called PSSS would be reflected in an RDF Turtle file as follows:
@prefix : <https://biomedit.ch/rdf/sphn-ontology/psss/> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
<https://biomedit.ch/rdf/sphn-ontology/psss/> a owl:Ontology ;
    owl:versionIRI <https://biomedit.ch/rdf/sphn-ontology/psss/2021/3/> .
In the loaded template, the ontology IRI and the ontology version IRI must be updated in the Active Ontology tab, section Ontology Header, following the conventions above: simply replace the text “PROJECT-NAME” with the actual project name.
2.2 Update the annotations
Below the Ontology header section are the annotations holding the metadata about the project's schema:
the title (dc:title), which should be project-specific (e.g. ‘the PSSS RDF Schema’)
the short comment (dc:description), which should be a short sentence reflecting the content of the project's schema
the license of the project (dcterms:license), which should be the same as the SPHN licensing.
Make sure to update the title and the description by replacing “PROJECT-NAME” with the actual project name. The license does not need any changes.
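A minimal sketch of how the updated ontology header could look in Turtle for a project called PSSS. Two assumptions apply: the dc: prefix is taken to be the Dublin Core elements namespace (http://purl.org/dc/elements/1.1/), so check the prefix bindings in your copy of the template; and the license value is not reproduced because it stays exactly as provided in the template. The title and description strings are illustrative.

```turtle
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix dc:  <http://purl.org/dc/elements/1.1/> .   # assumed binding, check the template

<https://biomedit.ch/rdf/sphn-ontology/psss/> a owl:Ontology ;
    owl:versionIRI <https://biomedit.ch/rdf/sphn-ontology/psss/2021/3/> ;
    # "PROJECT-NAME" replaced by the project name in title and description
    dc:title "The PSSS RDF Schema" ;
    dc:description "RDF Schema of the PSSS project, extending the SPHN RDF Schema" .
# dcterms:license is left exactly as provided in the template
```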
2.3 Update the imports
In the template, the SPHN RDF Schema is already imported through the following statement in the project-specific Turtle file (example with the PSSS project):
@prefix : <https://biomedit.ch/rdf/sphn-ontology/psss/> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
<https://biomedit.ch/rdf/sphn-ontology/psss/> a owl:Ontology ;
    owl:versionIRI <https://biomedit.ch/rdf/sphn-ontology/psss/2021/3/> ;
    owl:imports <https://biomedit.ch/rdf/sphn-ontology/sphn/2021/1/> .
Note
owl:imports means that the contents of another OWL ontology (here, the SPHN RDF Schema) are imported into the current one (here, the PSSS RDF Schema). More information can be found at: https://www.w3.org/TR/owl-ref/#imports-def.
If you wish to import any other terminology or schema into the project, follow these steps:
In Ontology imports, click the + sign next to Direct Imports.
Choose Import an ontology contained in a local file, then Continue.
Select the ontology to import with Browse, then click Continue, and finally Finish.
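After these steps, the additional ontology appears as one more owl:imports value in the ontology header. The sketch below assumes Protégé records the import under the ontology IRI of the imported file, which is its typical behaviour; https://example.org/hypothetical-terminology/ is a placeholder, not a real resource.

```turtle
@prefix owl: <http://www.w3.org/2002/07/owl#> .

<https://biomedit.ch/rdf/sphn-ontology/psss/> a owl:Ontology ;
    owl:versionIRI <https://biomedit.ch/rdf/sphn-ontology/psss/2021/3/> ;
    # first import: already present in the template (SPHN RDF Schema)
    # second import: hypothetical terminology added via "Direct Imports"
    owl:imports <https://biomedit.ch/rdf/sphn-ontology/sphn/2021/1/> ,
                <https://example.org/hypothetical-terminology/> .
```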
2.4 Add the project schema prefix
In the Ontology Prefixes tab, update the value of the base prefix (usually the first line, which has an empty prefix) by replacing the text ‘PROJECT-NAME’ with the actual project name.
Then add the schema prefix of the project, where the Prefix is the project name and the Value is the project ontology IRI, for better readability of the .ttl or .owl file.
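For the PSSS example, the two prefix declarations would serialize roughly as follows in the saved .ttl file. Because both prefixes map to the same ontology IRI, :Encounter and psss:Encounter denote the same class; the named prefix only makes the file easier to read. The class declaration is there solely to show the expansion.

```turtle
# Base prefix (empty prefix): "PROJECT-NAME" replaced by the project name
@prefix :     <https://biomedit.ch/rdf/sphn-ontology/psss/> .
# Project prefix added in Ontology Prefixes (Prefix: psss, Value: ontology IRI)
@prefix psss: <https://biomedit.ch/rdf/sphn-ontology/psss/> .
@prefix owl:  <http://www.w3.org/2002/07/owl#> .

# :Encounter and psss:Encounter both expand to
# <https://biomedit.ch/rdf/sphn-ontology/psss/Encounter>
psss:Encounter a owl:Class .
```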
3. Implement modifications in the RDF schema
First, create a root class (<PROJECT-NAME>Concept), a root data property (<PROJECT-NAME>AttributeDatatype) and a root object property (<PROJECT-NAME>AttributeObject) for the project-specific ontology. All classes and properties specific to the project will be defined as sub-elements of these roots.
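A sketch of the three root elements for a project named PSSS, assuming the <PROJECT-NAME> placeholder is replaced by the upper-case project acronym (giving psss:PSSSConcept, psss:PSSSAttributeDatatype and psss:PSSSAttributeObject); the labels are illustrative. Project-specific classes and properties would then point to these roots with rdfs:subClassOf and rdfs:subPropertyOf.

```turtle
@prefix psss: <https://biomedit.ch/rdf/sphn-ontology/psss/> .
@prefix owl:  <http://www.w3.org/2002/07/owl#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .

# Root class: all PSSS-specific classes become subclasses of it
psss:PSSSConcept a owl:Class ;
    rdfs:label "PSSS Concept" .

# Root data property: all PSSS datatype properties become sub-properties of it
psss:PSSSAttributeDatatype a owl:DatatypeProperty ;
    rdfs:label "PSSS attribute datatype" .

# Root object property: all PSSS object properties become sub-properties of it
psss:PSSSAttributeObject a owl:ObjectProperty ;
    rdfs:label "PSSS attribute object" .
```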
Changes to the RDF schema should be done following the process highlighted in Figure 3:
Figure 3: Process on how to use and modify the SPHN RDF Schema for the project specific needs.
This section describes how a project should update the SPHN RDF Schema depending on the type of modification.
3.1 Modify an existing class
A project modifying an existing class of the SPHN RDF Schema in any way (minor edit or change breaking compatibility) must provide the modified class under its project prefix. This implies that the project generates a new class with the same name but a different prefix (e.g. a modification of the class sphn:Encounter by the PSSS project would become psss:Encounter).
In Protégé, a new class must be created in the project ontology with the same name, but its IRI is built from the project ontology IRI (e.g. https://biomedit.ch/rdf/sphn-ontology/psss/Encounter).
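In Turtle, the modified class from this example could be declared as below. The sketch assumes the project root class is named psss:PSSSConcept (following the <PROJECT-NAME>Concept convention at the start of section 3); the comment text is a placeholder for the project's own definition.

```turtle
@prefix psss: <https://biomedit.ch/rdf/sphn-ontology/psss/> .
@prefix owl:  <http://www.w3.org/2002/07/owl#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .

# Same local name as sphn:Encounter, but under the project namespace:
# <https://biomedit.ch/rdf/sphn-ontology/psss/Encounter>
psss:Encounter a owl:Class ;
    rdfs:subClassOf psss:PSSSConcept ;    # assumed project root class
    rdfs:label "Encounter" ;
    rdfs:comment "encounter as defined by the PSSS project" .   # placeholder
```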
Note
Following this example, real data conforming to the PSSS ontology must provide the Encounter data elements based on the PSSS project's definition. Therefore, the prefixed name (and the IRI) used will always be psss:Encounter (and https://biomedit.ch/rdf/sphn-ontology/psss/Encounter).
3.2 Modify an existing property
Any change affecting a property from the SPHN RDF Schema must result in the creation of a new property with the project ontology IRI.
For example, the DCC has defined a material type liquid property for the concept Biosample, sphn:hasMaterialTypeLiquid, with a restricted list of possible values. If the PSSS project decides to narrow down this list of possible values, it must define its own psss:hasMaterialTypeLiquid property, whose value set is restricted to the values allowed by the PSSS project.
Note
Since version 2022.1 of the SPHN RDF Schema, value set restrictions are encoded as owl:Restriction (see section Constraints added to properties).
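A minimal Turtle sketch of the new PSSS property. Declaring it as an owl:ObjectProperty assumes its values are value set individuals (see 3.5 Value sets as individuals), and psss:PSSSAttributeObject is the assumed name of the project root object property. The narrowed value set itself is not shown; it would be attached through an owl:Restriction following the pattern in 3.5.

```turtle
@prefix sphn: <https://biomedit.ch/rdf/sphn-ontology/sphn#> .
@prefix psss: <https://biomedit.ch/rdf/sphn-ontology/psss/> .
@prefix owl:  <http://www.w3.org/2002/07/owl#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .

# Project-specific counterpart of sphn:hasMaterialTypeLiquid:
# same local name, but under the PSSS namespace
psss:hasMaterialTypeLiquid a owl:ObjectProperty ;
    rdfs:subPropertyOf psss:PSSSAttributeObject ;   # assumed project root object property
    rdfs:domain sphn:Biosample ;
    rdfs:label "has material type liquid" ;
    rdfs:comment "material type of a liquid biosample, limited to the values allowed by PSSS" .
```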
If a project would like to reuse a property in another context (i.e. to describe metadata of another class), a new property must be created following the conventions defined in the section About the SPHN RDF properties.
3.3 Add a new property to an existing class
Adding a new property to an existing class can lead to two different scenarios.
1. If the property does not change the meaning of the class, the project can define the property with its own prefix and attach it to the SPHN class, as shown in the example below:
sphn:Encounter (class)
psss:hasServiceType (new property)
The project should submit a change request to the DCC for adding the new property to the concept. If the change is evaluated to be of general importance, the DCC will adapt the concept accordingly in the next release of the SPHN RDF Schema. This would result in the following:
sphn:Encounter
sphn:hasServiceType
2. If the property changes the meaning of the class and breaks compatibility, a new class must be created with the project prefix (following the recommendations of section 3.1 Modify an existing class), and the property is defined for this new class:
psss:Encounter
psss:hasEndDate
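The two scenarios could be written in Turtle roughly as follows. Whether psss:hasServiceType is an object property, which range psss:hasEndDate takes (xsd:date here), and the names of the project root properties are all assumptions made for illustration.

```turtle
@prefix sphn: <https://biomedit.ch/rdf/sphn-ontology/sphn#> .
@prefix psss: <https://biomedit.ch/rdf/sphn-ontology/psss/> .
@prefix owl:  <http://www.w3.org/2002/07/owl#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix xsd:  <http://www.w3.org/2001/XMLSchema#> .

# Scenario 1: meaning of sphn:Encounter unchanged,
# so the project property is attached to the SPHN class
psss:hasServiceType a owl:ObjectProperty ;      # object property: assumption
    rdfs:subPropertyOf psss:PSSSAttributeObject ;
    rdfs:domain sphn:Encounter ;
    rdfs:label "has service type" .

# Scenario 2: compatibility broken,
# so the property is attached to the project class psss:Encounter
psss:hasEndDate a owl:DatatypeProperty ;
    rdfs:subPropertyOf psss:PSSSAttributeDatatype ;
    rdfs:domain psss:Encounter ;
    rdfs:range xsd:date ;                        # range chosen for illustration
    rdfs:label "has end date" .
```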
For more guidance on whether a property breaks the meaning of a class, or whether a specific change requires a project-specific class or property, do not hesitate to contact the DCC (dcc@sib.swiss).
3.4 Meaning binding to controlled vocabularies
For the meaning binding you can use any controlled vocabulary that is appropriate for your concept. Please refer to the guiding principles for Controlled vocabulary. If you need help with the meaning binding, please contact the DCC (dcc@sib.swiss).
Meaning binding is integrated into RDF classes through owl:equivalentClass. The example below states that the LOINC code 8302-2 is an equivalent class of the SPHN class BodyHeight:
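@prefix sphn: <https://biomedit.ch/rdf/sphn-ontology/sphn#> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
### https://biomedit.ch/rdf/sphn-ontology/sphn#BodyHeight
sphn:BodyHeight rdf:type owl:Class ;
    owl:equivalentClass <https://loinc.org/rdf/8302-2> ;
    rdfs:subClassOf sphn:Measurement ;
    rdfs:comment "height of the individual" ;
    rdfs:label "Body Height" .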
To annotate an equivalent class through Protégé, follow these instructions:
1. In the Class hierarchy section, select the class of interest.
2. In the Description section, click on the + sign next to Equivalent To.
3. In the pop-up window that appears, go to the tab Class expression editor.
4. In the text field, type the label of the equivalent class (press Tab for autocomplete).
Note
The external terminologies used for meaning binding (e.g., SNOMED CT, LOINC, GENO and SO) must be loaded in the ontology space in order to find and connect the equivalent classes.
Classes whose names consist of multiple words are easier to find via autocomplete when an apostrophe is typed at the beginning of the Class expression editor text field.
3.5 Value sets as individuals
Value sets can be defined by the project to set and limit the possible values of a given property (see section Standards and value sets). Each possible value needs to be created as an individual in RDF (owl:NamedIndividual). These individuals are then grouped into a value set, represented by a specific class. This class is then used to constrain the values of the property, meaning that the individuals linked to that class are the possible values for that property.
Creating the values as individuals and linking a set of values to a property requires the following steps in Protégé:
1. Create an individual for each value:
   select the Individuals tab,
   click on Add individual,
   write the name of the individual to generate the IRI,
   add a label for each individual created.
2. If not done already, create a ValueSet class to group all sets of values.
3. Create a class that is a subclass of ValueSet. The IRI of the class should follow the convention <DomainClassName>_<propertyName>, where ‘DomainClassName’ is the domain of the property.
4. Select the class created, then:
   click on the + sign next to Instances,
   select the individuals that belong to this ‘valueset class’ (multiple individuals can be selected with Ctrl+Click),
   click OK.
   All individuals of the value set are now connected to a specific valueset class.
5. Add the valueset class in an owl:Restriction on the class that has the property allowing these values:
   select the class,
   click on the + sign next to SubClass Of,
   under Class expression editor, write the restriction with the pattern property-name + some + valueset-class (e.g. hasMethod some DiagnosticRadiologicExamination_method),
   click OK.
For example, the class DiagnosticRadiologicExamination has the property hasMethod, which has six possible values (PET CT, CT, MRI, PET, SPECT, X-ray). These six values are created one by one as individuals. The class DiagnosticRadiologicExamination_method is then created as a subclass of ValueSet, and the six individuals are added as instances of the class DiagnosticRadiologicExamination_method. The class DiagnosticRadiologicExamination_method is set as a value restriction on the class DiagnosticRadiologicExamination for the property hasMethod (as shown below in .ttl format):
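@prefix sphn: <https://biomedit.ch/rdf/sphn-ontology/sphn#> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
sphn:DiagnosticRadiologicExamination
    rdfs:subClassOf [ rdf:type owl:Restriction ;
        owl:onProperty sphn:hasMethod ;
        owl:someValuesFrom sphn:DiagnosticRadiologicExamination_method
    ] .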
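To complete this example, the value set class and its individuals could be declared as sketched below. The individual IRIs (sphn:CT, sphn:MRI) are hypothetical and only two of the six values are shown; sphn:ValueSet is assumed to be the IRI of the ValueSet class mentioned in the steps above. Typing each individual with the value set class corresponds to adding it under Instances in Protégé.

```turtle
@prefix sphn: <https://biomedit.ch/rdf/sphn-ontology/sphn#> .
@prefix owl:  <http://www.w3.org/2002/07/owl#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .

# Value set class: <DomainClassName>_<propertyName>, subclass of ValueSet
sphn:DiagnosticRadiologicExamination_method a owl:Class ;
    rdfs:subClassOf sphn:ValueSet .              # assumed IRI of ValueSet

# One owl:NamedIndividual per allowed value (hypothetical IRIs),
# typed with the value set class as done via "Instances" in Protégé
sphn:CT a owl:NamedIndividual , sphn:DiagnosticRadiologicExamination_method ;
    rdfs:label "CT" .

sphn:MRI a owl:NamedIndividual , sphn:DiagnosticRadiologicExamination_method ;
    rdfs:label "MRI" .

# PET CT, PET, SPECT and X-ray would follow the same pattern
```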
4. Best practices when producing the RDF
When creating a new class or a new property, following best practices increases the consistency and the readability of the schema. Here are a few recommendations:
use Pascal case notation for classes (e.g. AdministrativeGender) and camel case notation for data and object properties (e.g. hasEndDateTime) when creating the IRIs
data and object properties should follow the convention given in the section About the SPHN RDF properties
for all classes and properties, generate a label (rdfs:label) with spaces between words for better readability (e.g. hasEndDateTime would have the label has end date time)
for all classes and properties, create a description (rdfs:comment) that explains the meaning of the class or property in an understandable and unambiguous sentence
choose an appropriate controlled vocabulary (meaning binding) to represent your class through owl:equivalentClass (see section Controlled vocabulary for the guiding principles on meaning binding to external terminologies, e.g., SNOMED CT, LOINC, GENO or SO).
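As an illustration of these conventions, a hypothetical project datatype property could be declared as follows, with a camel case IRI, a spaced rdfs:label and a one-sentence rdfs:comment. The domain psss:Encounter, the range xsd:dateTime and the root property psss:PSSSAttributeDatatype are assumptions chosen for the example.

```turtle
@prefix psss: <https://biomedit.ch/rdf/sphn-ontology/psss/> .
@prefix owl:  <http://www.w3.org/2002/07/owl#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix xsd:  <http://www.w3.org/2001/XMLSchema#> .

# Camel case IRI, spaced label, one unambiguous sentence as comment
psss:hasEndDateTime a owl:DatatypeProperty ;
    rdfs:subPropertyOf psss:PSSSAttributeDatatype ;   # assumed project root data property
    rdfs:domain psss:Encounter ;
    rdfs:range xsd:dateTime ;
    rdfs:label "has end date time" ;
    rdfs:comment "date and time at which the encounter ended" .
```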
5. Visualizing the project-specific schema
Once the project-specific RDF Schema is created, it can be visualized with the PyLODE-based SPHN Schema Visualization Tool (see https://git.dcc.sib.swiss/sphn-semantic-framework/sphn-ontology-documentation-visualization). The tool generates human-readable HTML documents for RDF schemas: it takes the given ontologies and terminologies as input, processes and merges them into a single preprocessed schema, and then generates an HTML document.
The HTML document is structured as follows: it starts with general information about the schema (URI, version, etc.) and is then divided into five main sections, each giving detailed information about the corresponding schema components. The end of the HTML document provides information about namespaces, along with legends.
Classes: the list of classes defined in the schema, with the sections shown in the table below:
| Section | Description |
|---|---|
| URI | URI of the class |
| Description | short description of the class |
| Schema representation | image containing the class schema and its outgoing properties and metadata |
| Meaning binding (Equivalent-class) | link to the equivalent class (e.g. SNOMED CT, LOINC, GENO or SO class) |
| Parents | links to super-classes |
| Children (Sub-classes) | links to sub-classes |
| Property (in the domain of) | list of properties where the class is listed in the domain, with given cardinalities, class or datatype information and restriction information (Yes/No) |
| Restrictions | details about the restrictions applied on properties in the context of the class (e.g. specified SNOMED codes) |
| Notes | notes for specified properties (allowed coding system or recommended values) |
| Used in (In the range of) | list of properties where the class is listed in the range |
Object Properties: provides the list of object properties defined in the schema with their URI, description, super-properties, domain(s) and range.
Datatype Properties: provides the list of datatype properties defined in the schema with their URI, description, super-properties, domain(s) and data type.
Annotation Properties: provides the list of annotation properties with their URI and description (if provided).
Named Individuals: provides the list of named individuals with their URI and the class in which they appear.
To render a project-specific RDF Schema with the PyLODE-based SPHN Schema Visualization Tool, follow the instructions provided in the README - User Guide. The generated HTML file can then be shared by the project members with anyone who wishes to view the project-specific schema in a browser.
6. Validating the project-specific schema
Validation is possible with the SHACLer tool.
Note
The steps presented in option 2 are performed automatically in option 1 by the SPHN Schema Forge.
Reporting back to DCC
The DCC welcomes any feedback on the SPHN Dataset and the SPHN RDF Schema to improve these specifications. If you have any specific change requests to the SPHN Dataset or to the SPHN RDF Schema, please submit them by email to dcc@sib.swiss. For any change request to the SPHN Dataset, please include the concept(s) or composedOf(s) affected by the change request, the version of the Dataset, a description of the rationale behind the change request, and your proposal, including the suggested changes in a table structure following the SPHN Dataset design.