Generating a SPHN project-specific ontology

Note

To find out more, watch the Tutorial on Expanding the SPHN RDF Schema.

This page provides guidance on how to create a project-specific RDF ontology based on the SPHN RDF schema. Information on how to modify and extend the SPHN RDF schema to fit the needs of the project is also given.

Figure 1: Process for using and modifying the SPHN Dataset for project-specific needs.

1. Project-specific ontology creation

To facilitate the steps in creating a project-specific ontology, the DCC provides an RDF template with pre-filled elements accessible at: template_ontology.

This RDF template contains:

  • the SPHN RDF ontology imported (as direct Imports) and the related external resources imported (as indirect Imports)

  • adequate imports of RDF libraries used in the context of SPHN (e.g. http://purl.org/dc/terms/)

  • pre-filled metadata (annotations) for the project-specific schema to be updated by the projects.

Please use this file to create your project-specific ontology.

1.1 Create a project-specific ontology in Protégé

Load the template file provided by the DCC (template_ontology) into Protégé and follow these steps to update the ontology information:

  • First open the template file: File –> Open

  • Make sure to link to the adequate SPHN ontology and external terminologies when requested to import them (the catalog.xml file provided in Git facilitates the import: simply make sure to have the SPHN ontology and the external terminologies in the directory of the template ontology)

  • Save this project with the project name: File –> Save As –> Select the format (recommended: Turtle syntax, OWL/XML Syntax)

  • Select location to save and name the project accordingly (e.g. psss_ontology, frailty_ontology).

1.1.1 Update the ontology IRI

A schema released by a project, which extends the SPHN RDF schema, should have its own ontology IRI (namespace) defined. The ontology IRI, also called base prefix, will be used by both data providers (to annotate data) and data users (to query for the relevant classes/properties). The convention to follow for defining this ontology IRI is:

https://biomedit.ch/rdf/sphn-ontology/ + <name of the project> + / or #

(e.g., for the PSSS project, the ontology IRI can be: https://biomedit.ch/rdf/sphn-ontology/psss/).

In addition to the ontology IRI, a version IRI must be generated and provided by the project for each published release of their RDF schema. The version IRI must be in the form of:

<ontologyIRI> + <year> + / + <version> + /

(e.g. https://biomedit.ch/rdf/sphn-ontology/psss/2021/3/ for the third release of the PSSS RDF schema in 2021).

The version IRI of a project called PSSS would be reflected in an RDF Turtle file as follows:

@prefix : <https://biomedit.ch/rdf/sphn-ontology/psss/> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .

<https://biomedit.ch/rdf/sphn-ontology/psss/>
       owl:versionIRI <https://biomedit.ch/rdf/sphn-ontology/psss/2021/3/> .

In the loaded template, update the ontology IRI and the ontology version IRI in the Active Ontology tab, section Ontology Header, following the conventions cited above: simply change the text “PROJECT-NAME” to the actual project name.

1.1.2 Update annotations

Below the Ontology Header section are the annotations holding the metadata about the project ontology:

  • the title (dc:title) should be a project-specific title (e.g. ‘the PSSS schema’)

  • the short comment (dc:description) should be a short sentence reflecting the content of the project ontology

  • the license of the project (dcterms:license), which should be the same as the SPHN license

Make sure to update the title and the description by changing the “PROJECT-NAME” to the actual project name. The license does not need any changes.

1.1.3 Information about Imports

In the template, the SPHN ontology is already imported, which produces the following statement in the project-specific Turtle file (here, an example with the PSSS project):

@prefix : <https://biomedit.ch/rdf/sphn-ontology/psss/> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .

<https://biomedit.ch/rdf/sphn-ontology/psss/>
       owl:versionIRI <https://biomedit.ch/rdf/sphn-ontology/psss/2021/3/> ;
       owl:imports <https://biomedit.ch/rdf/sphn-ontology/sphn/2021/1/> .

Note

owl:imports means that the contents of another OWL ontology (here, the SPHN RDF schema) are imported into the current ontology (here, the PSSS RDF schema). More information can be found at: https://www.w3.org/TR/owl-ref/#imports-def.

If you wish to import any other ontology in the project, follow these steps:

  • In Ontology imports, click the + sign next to Direct Imports

  • Choose Import an ontology contained in a local file., then Continue

  • Select the ontology to import with Browse, then Continue, and finally Finish.

1.1.4 Add the project ontology prefix

In the tab Ontology Prefixes, update the value of the base prefix (usually the first line, which has an empty prefix) by changing the text ‘PROJECT-NAME’ to the actual project name. Then add a prefix for the project ontology, where the Prefix is the project name and the Value is the project ontology IRI; this improves readability of the .ttl or .owl file.
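As a rough illustration, the prefix declarations at the top of the saved Turtle file might look as follows once the base prefix and the project prefix are set. The PSSS project name is taken from the examples on this page and the sphn: namespace is the one appearing in the class IRIs shown further down; the exact set of prefixes in your template may differ.

```turtle
# Base (empty) prefix: the project ontology IRI
@prefix :     <https://biomedit.ch/rdf/sphn-ontology/psss/> .
# Named project prefix pointing to the same IRI, added for readability
@prefix psss: <https://biomedit.ch/rdf/sphn-ontology/psss/> .
# SPHN namespace as used in the class IRIs on this page
@prefix sphn: <https://biomedit.ch/rdf/sphn-ontology/sphn#> .
# Standard vocabularies
@prefix owl:  <http://www.w3.org/2002/07/owl#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
```

With both the empty prefix and psss: bound to the same IRI, :Encounter and psss:Encounter denote the same resource; the named prefix simply makes the origin of a term explicit.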

2. New concepts and modification of existing ones

We encourage you to design a new concept or modify an existing concept according to the Guiding principles for concept design.

First, create a root class (<PROJECT-NAME>Concept), a root data property (<PROJECT-NAME>AttributeDatatype) and a root object property (<PROJECT-NAME>AttributeObject) for the project-specific ontology. All classes and properties specific to the project will then be defined as sub-elements of these roots.
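A minimal sketch of these root elements in Turtle, assuming the PSSS project as an example. The labels and the class psss:ExampleConcept are hypothetical and only illustrate how project-specific elements hang below the roots.

```turtle
@prefix psss: <https://biomedit.ch/rdf/sphn-ontology/psss/> .
@prefix rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix owl:  <http://www.w3.org/2002/07/owl#> .

# Root class: parent of all project-specific classes
psss:PSSSConcept rdf:type owl:Class ;
    rdfs:label "PSSS Concept" .            # label is an assumption

# Root data property: parent of all project-specific data properties
psss:PSSSAttributeDatatype rdf:type owl:DatatypeProperty ;
    rdfs:label "PSSS attribute datatype" . # label is an assumption

# Root object property: parent of all project-specific object properties
psss:PSSSAttributeObject rdf:type owl:ObjectProperty ;
    rdfs:label "PSSS attribute object" .   # label is an assumption

# Hypothetical project-specific class defined as a sub-element of the root
psss:ExampleConcept rdf:type owl:Class ;
    rdfs:subClassOf psss:PSSSConcept .
```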

2.1 Extension or modification of existing SPHN concepts

An extension or modification of an existing SPHN concept can take the form of one or more additional composedOfs, an alternative semantic standard that needs to be added, or a required extension of an existing value set. Extensions may be needed for various reasons, e.g. implementation of a new standard in the applicable jurisdiction, a change in the availability of biomedical data, new needs of research projects, or expanded medical knowledge.

It may happen that you find the concept in the SPHN Dataset for the data you need, but a piece of information is missing. For example, you need data for a specific measurement, e.g. Body Temperature, and the different methods for measuring the Body Temperature matter for your research question. The specific measurement Body Temperature is represented in the SPHN Dataset as a concept. However, the measurement method with the appropriate value set is not yet defined as a composedOf. In this case, you can extend the SPHN concept with the additional composedOf in your project-specific Dataset. Please inform the DCC about this extension: it might be relevant to other projects as well, and the DCC can coordinate an extension to the SPHN Dataset if needed.

Table 1. Example of concept Body Temperature extended by composedOf method.

                              description                                 type
----------  ----------------  ------------------------------------------  ------------------
concept     Body Temperature  body temperature of the individual
composedOf  temperature       measured temperature                        quantitative
composedOf  datetime          datetime of measurement                     temporal
composedOf  body site         body site of measurement                    Body Site
composedOf  unit              unit in which the temperature is expressed  Unit
composedOf  method            method used to measure the temperature      Measurement Method

For the example above, the next step would be to define your value set or subset for the new composedOf. If you choose SNOMED CT as the controlled vocabulary to express your values for the method of Body Temperature measurements, you can define a subset as all descendants of the SNOMED CT concept 56342008 |Temperature taking (procedure)|.

Table 2. Example of composedOf method for the concept Body Temperature with subset definition in SNOMED CT.

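                    description                             type                value set or subset
----------  ------  --------------------------------------  ------------------  ---------------------------------------------------
composedOf  method  method used to measure the temperature  Measurement Method  child of: 56342008 |Temperature taking (procedure)|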

2.2 Implementation of changes in RDF

Figure 2: Process for using and modifying the SPHN RDF Schema for project-specific needs.

This section displays information about the way a project should update the SPHN RDF schema depending on the modification to perform.

2.2.1 Modifying an existing class

A project modifying an existing class of the SPHN RDF schema in any way (minor edit or change breaking compatibility) must provide the modified class with its project prefix. This implies that a new class is generated by the project, with the same name but a different prefix (e.g. a modification of the class sphn:Encounter by the PSSS project would become psss:Encounter). In Protégé, a new class must be created in the project ontology with the same name, but with an IRI based on the project ontology IRI (e.g. https://biomedit.ch/rdf/sphn-ontology/psss/Encounter).

Note

Following the example provided, real data conforming to the PSSS ontology must provide the Encounter data elements based on the definition of the PSSS project. The prefixed name (and the IRI) used will therefore always be psss:Encounter (and https://biomedit.ch/rdf/sphn-ontology/psss/Encounter).
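As a sketch of what the modified class could look like in the project Turtle file, assuming the PSSS project and the project root class psss:PSSSConcept described in section 2. The label and comment are placeholders, and whether the new class should additionally be related to sphn:Encounter is not specified here and is left out.

```turtle
@prefix psss: <https://biomedit.ch/rdf/sphn-ontology/psss/> .
@prefix rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix owl:  <http://www.w3.org/2002/07/owl#> .

# Same local name as sphn:Encounter, but minted in the project namespace
psss:Encounter rdf:type owl:Class ;
    rdfs:subClassOf psss:PSSSConcept ;   # project root class (see section 2)
    rdfs:label "Encounter" ;
    rdfs:comment "PSSS-specific definition of an encounter (placeholder text)" .
```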

2.2.2 Modifying an existing property

Any change affecting a property from the SPHN RDF schema must result in the creation of a new property with the project ontology IRI. For example, the DCC has defined a material property for the concept ‘biosample’: sphn:hasBiosampleMaterial, with a set of possible values. The PSSS project decides to narrow down the list of possible values for this material property. Therefore, the PSSS project must define its own psss:hasBiosampleMaterial. In this psss:hasBiosampleMaterial, the value set will be restricted to only the values allowed by the PSSS project.

If a project would like to reuse a property in another context (meaning to describe metadata of another class), a new property must be created following the conventions defined in the section SPHN RDF schema.
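The psss:hasBiosampleMaterial example could be sketched in Turtle roughly as follows. Several names here are assumptions made for illustration: the domain class sphn:Biosample, the object-property typing, and the value set class psss:Biosample_material (named after the <DomainClassName>_<propertyName> convention from section 3). The actual restricted values are project decisions and are not shown.

```turtle
@prefix psss: <https://biomedit.ch/rdf/sphn-ontology/psss/> .
@prefix sphn: <https://biomedit.ch/rdf/sphn-ontology/sphn#> .
@prefix rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix owl:  <http://www.w3.org/2002/07/owl#> .

# Hypothetical project value set class holding only the values PSSS allows
psss:Biosample_material rdf:type owl:Class .

# Project-specific counterpart of sphn:hasBiosampleMaterial
psss:hasBiosampleMaterial rdf:type owl:ObjectProperty ;
    rdfs:subPropertyOf psss:PSSSAttributeObject ;  # project root object property
    rdfs:domain sphn:Biosample ;                   # assumed class name
    rdfs:range psss:Biosample_material ;
    rdfs:label "has biosample material" .
```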

2.2.3 Adding a new property to an existing class

Adding a new property to an existing class can lead to two different scenarios.

  1. If the property does not change the meaning of the class, the project can define their property with their prefix associated to the SPHN class as shown in the example below:

  • sphn:Encounter (class)

  • psss:hasServiceType (new property)

The project should submit a change request to the DCC for adding the new property to the concept. If the change is evaluated to be of general importance, the DCC would adapt the concept accordingly in the next release of the SPHN RDF schema. This would result in the following:

  • sphn:Encounter

  • sphn:hasServiceType

  2. If the property changes the meaning of the class and breaks compatibility, a new class must be created with the project prefix (following the recommendations from Modifying an existing class) and the property would be defined for this new class:

  • psss:Encounter

  • psss:hasEndDate

For more guidance on knowing whether a property eventually breaks the meaning of a class or if a specific change needs the creation of a project-specific class/property, do not hesitate to contact the DCC members.
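The two scenarios can be sketched in Turtle as follows. This is an illustration only: the property types (object vs. data property) and the labels are assumptions, and ranges are omitted because they are not specified in the examples above.

```turtle
@prefix psss: <https://biomedit.ch/rdf/sphn-ontology/psss/> .
@prefix sphn: <https://biomedit.ch/rdf/sphn-ontology/sphn#> .
@prefix rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix owl:  <http://www.w3.org/2002/07/owl#> .

# Scenario 1: the new project property is attached to the unchanged SPHN class
psss:hasServiceType rdf:type owl:ObjectProperty ;   # property type is an assumption
    rdfs:domain sphn:Encounter ;
    rdfs:label "has service type" .

# Scenario 2: the property changes the meaning of the class, so a
# project-specific class is created and the property is defined on it
psss:Encounter rdf:type owl:Class ;
    rdfs:label "Encounter" .

psss:hasEndDate rdf:type owl:DatatypeProperty ;     # property type is an assumption
    rdfs:domain psss:Encounter ;
    rdfs:label "has end date" .
```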

2.3 Meaning binding to controlled vocabulary

For the meaning binding you can use any controlled vocabulary that is appropriate for your concept. If you use SNOMED CT or LOINC, the SNOMED CT Browser and the LOINC Browser are valuable tools to find appropriate SNOMED CT concepts and LOINC codes for the meaning binding. To use the LOINC Browser, you would need to create a free LOINC account. There are good practices for meaning binding in SNOMED CT. Appropriate training is provided by SNOMED International on the elearning platform. Further, please refer to the guiding principles for controlled vocabulary. If you need help with meaning binding, please contact us at dcc@sib.swiss.

The integration of meaning binding to RDF classes is represented by owl:equivalentClass. The example below shows that the LOINC code 8302-2 is an equivalent class of the SPHN class BodyHeight:

###  https://biomedit.ch/rdf/sphn-ontology/sphn#BodyHeight
sphn:BodyHeight rdf:type owl:Class ;
               owl:equivalentClass <https://loinc.org/rdf/8302-2> ;
               rdfs:subClassOf sphn:Measurement ;
               rdfs:comment "height of the individual" ;
               rdfs:label "Body Height" .

To annotate an equivalent class through Protégé, please follow these instructions:

  1. on the Class hierarchy section, select the class of interest

  2. on the Description section click on the + sign next to Equivalent To

  3. in the pop-up window that appears, go to the tab Class expression editor

  4. in the text field, type the label of the equivalent class (for autocomplete, press Tab)

Note

  • The external terminologies (SNOMED CT and LOINC) must be provided in the ontology space in order to be able to find and connect the equivalent classes.

  • Classes composed of multiple words are better found via autocomplete when an apostrophe is entered at the beginning in the Class expression editor text field.

3. Valuesets as individuals in the RDF schema

Valuesets can be defined by the project in order to set and limit the possible values for a certain property (see more here). Each possible value needs to be created as an individual in RDF (owl:NamedIndividual). These individuals are then grouped into the same valueset, represented with a specific class. This class is then set as the range of the property, meaning that the individuals linked to that class are the possible values for that property.

Creating a value as an individual and linking a set of values to a property requires the following steps:

  1. Create an individual for each value:

  • Select tab Individuals,

  • Click on Add individual,

  • Write the name of the individual to generate the IRI,

  • Add a label for each individual created.

  2. If not done already, create a ValueSet class to group all sets of values

  3. Create a class which should be a sub-class of ValueSet. The IRI of the class should follow the convention: <DomainClassName>_<propertyName> where ‘DomainClassName’ is the Domain of the property.

  4. Select the class created, then:

  • Click on the + sign next to Instances,

  • Select the individuals that are linked to this ‘valueset class’ (multiple individuals can be selected with Ctrl+Click),

  • Click OK,

  • Now all individuals of a valueset are connected to a specific valueset class.

  5. The valueset class can now be added in the range of the property where these values can be used:

  • Select the property,

  • Click on the + sign next to Ranges (intersection),

  • Under Class hierarchy browse and select the specific valueset class,

  • Click OK.

For example, the property hasDiagnosticRadiologicExaminationMethod defined for the DiagnosticRadiologicExamination class has six possible values (PET CT, CT, MRI, PET, SPECT, X-ray). These six values are created one by one as individuals. The class DiagnosticRadiologicExamination_method is then generated as a subclass of ValueSet. The six individuals are added as instances of the class DiagnosticRadiologicExamination_method. Finally, the class DiagnosticRadiologicExamination_method is set as the range of hasDiagnosticRadiologicExaminationMethod.
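The outcome of these steps could look roughly like this in Turtle. The individual IRIs follow the owl:oneOf example further down this page; the labels, the IRI sphn:ValueSet and the object-property typing are assumptions made for the sketch.

```turtle
@prefix sphn: <https://biomedit.ch/rdf/sphn-ontology/sphn#> .
@prefix rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix owl:  <http://www.w3.org/2002/07/owl#> .

# Class grouping all value sets, and the specific valueset class below it
sphn:ValueSet rdf:type owl:Class .
sphn:DiagnosticRadiologicExamination_method rdf:type owl:Class ;
    rdfs:subClassOf sphn:ValueSet .

# One individual per allowed value, each an instance of the valueset class
sphn:CT    rdf:type owl:NamedIndividual , sphn:DiagnosticRadiologicExamination_method ; rdfs:label "CT" .
sphn:MRI   rdf:type owl:NamedIndividual , sphn:DiagnosticRadiologicExamination_method ; rdfs:label "MRI" .
sphn:PET   rdf:type owl:NamedIndividual , sphn:DiagnosticRadiologicExamination_method ; rdfs:label "PET" .
sphn:PETCT rdf:type owl:NamedIndividual , sphn:DiagnosticRadiologicExamination_method ; rdfs:label "PET CT" .
sphn:SPECT rdf:type owl:NamedIndividual , sphn:DiagnosticRadiologicExamination_method ; rdfs:label "SPECT" .
sphn:X-ray rdf:type owl:NamedIndividual , sphn:DiagnosticRadiologicExamination_method ; rdfs:label "X-ray" .

# The valueset class is the range of the property
sphn:hasDiagnosticRadiologicExaminationMethod rdf:type owl:ObjectProperty ;
    rdfs:domain sphn:DiagnosticRadiologicExamination ;
    rdfs:range sphn:DiagnosticRadiologicExamination_method .
```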

Note

If you would like to see the list of values for a valueset at a glance in the RDF file, you may use the owl:oneOf construct. To provide this information, do the following in Protégé:

  • Select the specific class to annotate under ValueSet (e.g. in the previous example, DiagnosticRadiologicExamination_method),

  • Click on the + sign next to Equivalent To,

  • Select the Class expression editor,

  • Add the list of values in curly brackets (e.g. {CT , MRI , PET , 'PET CT' , SPECT , X-ray}), each value separated by a comma.

This would result in the following RDF statement for the example of DiagnosticRadiologicExamination_method:

###  https://biomedit.ch/rdf/sphn-ontology/sphn#DiagnosticRadiologicExamination_method
sphn:DiagnosticRadiologicExamination_method rdf:type owl:Class ;
                owl:equivalentClass [ rdf:type owl:Class ;
                                       owl:oneOf ( sphn:CT
                                                   sphn:MRI
                                                   sphn:PET
                                                   sphn:PETCT
                                                   sphn:SPECT
                                                   sphn:X-ray
                                                 )
                                     ] .

4. Best practices when generating the RDF

When creating a new class or a new property, following best practices improves the consistency and readability of the schema. Here are a few recommendations:

  • use UpperCamelCase notation for classes (e.g. AdministrativeGender) and lowerCamelCase notation for data and object properties (e.g. hasEndDateTime) when creating the IRIs,

  • data and object properties should follow the convention given at SPHN RDF properties,

  • for all classes and properties, generate a label (rdfs:label) with spaces in between words for better readability of classes and properties (e.g. hasEndDateTime would have as label has end date time),

  • for all classes and properties, create a description (rdfs:comment) that explains the meaning of the class or property in an understandable and unambiguous sentence,

  • choose an appropriate controlled vocabulary (meaning binding) to represent your class through the use of owl:equivalentClass. See (SPHN Dataset controlled vocabulary) for the guiding principles for SNOMED CT and LOINC meaning binding.
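A short sketch of a class and a property that follow the naming, label and description recommendations. The psss: terms and the comment texts are hypothetical placeholders that reuse the example names above; the meaning binding via owl:equivalentClass is shown in section 2.3 and is left out here.

```turtle
@prefix psss: <https://biomedit.ch/rdf/sphn-ontology/psss/> .
@prefix rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix owl:  <http://www.w3.org/2002/07/owl#> .

# Class: UpperCamelCase IRI, label with spaces, one-sentence description
psss:AdministrativeGender rdf:type owl:Class ;
    rdfs:label "Administrative Gender" ;
    rdfs:comment "administrative gender of the individual (placeholder text)" .

# Property: lowerCamelCase IRI, label with spaces, one-sentence description
psss:hasEndDateTime rdf:type owl:DatatypeProperty ;
    rdfs:label "has end date time" ;
    rdfs:comment "date and time at which the described event ended (placeholder text)" .
```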

5. Loading of the project into WebProtégé

Once the project has been updated in Protégé, WebProtégé (https://webprotege.dcc.sib.swiss/) can be used to edit the ontology in a collaborative environment.

Note

WebProtégé can be used from the beginning to modify the SPHN ontology, but please be aware that some features are not available in this web-based tool (e.g. it is not possible to create a property with the project IRI that has the same identifier as in the SPHN ontology). This is why the documentation shows the steps for updating the ontology using Protégé.

The following steps must be taken for WebProtégé to correctly parse the data:

  1. All ontologies (i.e. project ontology + SPHN ontology + external terminologies) must be in RDF/XML, OWL/XML, Manchester OWL, Functional OWL or OBO format. Turtle (TTL) and N-Triples do not work.

  2. The main ontology file (i.e. the project ontology) must be named root-ontology.owl.

  3. The import of the SPHN ontology (specified in the header of the project RDF schema file) has to use the ontology IRI. The version IRI, which would be best practice, does not work.

  4. Put all ontologies in a .zip file and load them into WebProtégé: Create New Project –> Fill in the different fields –> Browse for the .zip created –> Click Create New Project.

When this is done, the project RDF schema in WebProtégé may be accessed and all the necessary external ontologies will be imported automatically.

6. Reporting back to DCC

The DCC welcomes any feedback to the SPHN dataset and to the RDF schema to improve these specifications. If you have any specific change requests to the SPHN dataset, or to the RDF schema, please submit them by email to dcc@sib.swiss. For any change requests to the SPHN dataset, please include the concept(s) or the composedOf(s), which are affected by the change request, the version of the dataset, a description of the rationale behind the change request, and your proposal including suggested changes in a table structure following the SPHN Dataset design.