SNOMED CT

Introduction to the classification

Systematized Nomenclature of Medicine Clinical Terms (SNOMED CT, https://www.snomed.org/) is a global standard for health terms and a common language designed for use in Electronic Health Records (EHRs). The international edition of SNOMED CT is released twice a year by the International Health Terminology Standards Development Organization (IHTSDO), which trades as SNOMED International. Switzerland holds a SNOMED CT license, so Swiss organizations can use SNOMED CT free of charge after registering with eHealth Suisse, the Swiss National Release Center (NRC). SNOMED CT is published in Release Format 2 (RF2) and can be explored on the web at https://browser.ihtsdotools.org. IHTSDO also offers regular training courses and a library of training videos on all topics. These are available to anyone after free registration on the SNOMED CT e-learning platform (https://elearning.ihtsdotools.org/).

SNOMED CT provides unique identifiers for concepts (clinical ideas). Concepts are defined through is-a relationships and attribute relationships to other concepts. SNOMED CT also provides several descriptions for each concept, such as synonyms, each with its own identifier. Concepts are organized polyhierarchically, which means that a concept can have multiple parents.
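In a polyhierarchy, a concept can have several parents. Collecting all ancestors of a concept therefore means following every parent path and de-duplicating the results. The Python sketch below shows this on a tiny hypothetical hierarchy. The names A, B, C and Root are placeholders, not SNOMED CT content. The concept with two parents reaches the shared root through both paths, but the root is reported only once.

```python
from typing import Dict, Mapping, Set

# Hypothetical toy hierarchy (NOT SNOMED CT content): concept -> direct is-a parents.
PARENTS: Dict[str, Set[str]] = {
    "Root": set(),
    "A": {"Root"},
    "B": {"Root"},
    "C": {"A", "B"},  # C has two parents: this is what a polyhierarchy allows
}


def ancestors(concept: str, parents: Mapping[str, Set[str]]) -> Set[str]:
    """Return every ancestor of `concept`, following all parent paths."""
    seen: Set[str] = set()
    stack = list(parents.get(concept, ()))
    while stack:
        current = stack.pop()
        if current not in seen:  # a shared ancestor is recorded only once
            seen.add(current)
            stack.extend(parents.get(current, ()))
    return seen


def is_a(concept: str, candidate: str, parents: Mapping[str, Set[str]]) -> bool:
    """True if `concept` is `candidate` itself or one of its descendants."""
    return concept == candidate or candidate in ancestors(concept, parents)


if __name__ == "__main__":
    print(sorted(ancestors("C", PARENTS)))  # ['A', 'B', 'Root']
    print(is_a("C", "B", PARENTS), is_a("A", "B", PARENTS))  # True False
```

The `seen` set is the design point. Without it, ancestors reachable through several parents would be visited repeatedly, which adds up quickly in a hierarchy as large as SNOMED CT.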

Information for use in data science

SNOMED CT can be used for analytics of both structured and unstructured data. Examples are querying clinical data using the machine-processable concept definitions in SNOMED CT, or analyzing free text with Natural Language Processing (NLP) tools. This fact sheet focuses on analytics of structured data. The features described above allow queries along the SNOMED CT hierarchies (e.g. body structure, substance, clinical finding) as well as along so-called attribute relationships (e.g. causative agent, pathological process). In addition, SPHN defines value sets based on SNOMED CT, which can help define query criteria.

Implementation of SNOMED CT in RDF

The RDF version of SNOMED CT was generated with the SNOMED OWL Toolkit <https://github.com/IHTSDO/snomed-owl-toolkit>.

The SNOMED OWL Toolkit outputs the ontology in OWL Functional Syntax. Some ontology editors can read this format, but not all databases support it. Therefore, the DCC provides conversions to other formats such as Turtle and N-Triples.
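N-Triples is the simplest of these formats. It has one triple per line, IRIs in angle brackets, and a terminating " ." on each line. The hypothetical helper `to_ntriples` below illustrates what such output looks like; it is not the DCC conversion pipeline. It serializes triples whose objects are either IRIs or string literals with an optional language tag. The subClassOf triple reflects the Peanut / Pulse vegetable relationship used in the query example of this fact sheet. The label literal is illustrative.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple, Union


@dataclass(frozen=True)
class Literal:
    """A string literal with an optional language tag such as "en" or "en-gb"."""
    text: str
    lang: str = ""


Term = Union[str, Literal]  # a plain str is treated as an absolute IRI


def _escape(text: str) -> str:
    # N-Triples string escapes: backslash first, then quote and line breaks.
    return (text.replace("\\", "\\\\")
                .replace('"', '\\"')
                .replace("\n", "\\n")
                .replace("\r", "\\r"))


def _term(term: Term) -> str:
    if isinstance(term, Literal):
        suffix = f"@{term.lang}" if term.lang else ""
        return f'"{_escape(term.text)}"{suffix}'
    return f"<{term}>"  # assumes the IRI is already valid and needs no escaping


def to_ntriples(triples: Iterable[Tuple[str, str, Term]]) -> str:
    """Serialize (subject IRI, predicate IRI, object) triples, one per line."""
    return "".join(f"<{s}> <{p}> {_term(o)} .\n" for s, p, o in triples)


if __name__ == "__main__":
    SCT = "http://snomed.info/id/"
    RDFS = "http://www.w3.org/2000/01/rdf-schema#"
    print(to_ntriples([
        (SCT + "762952008", RDFS + "subClassOf", SCT + "227313005"),
        (SCT + "762952008", RDFS + "label", Literal("Peanut (substance)", "en")),  # illustrative label
    ]), end="")
```

Because every line is a complete triple, N-Triples files can be split, concatenated or streamed line by line. This is one reason databases tend to accept it more readily than OWL Functional Syntax.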

The namespace used in the RDF to refer to SNOMED CT concepts is <http://snomed.info/id/>.
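SNOMED CT identifiers (SCTIDs) are 6 to 18 digits long. The last digit is a Verhoeff check digit, and the two digits before it form the partition identifier ("00" or "10" for concepts). Checking both before turning a locally stored code into an IRI catches typos early.

The sketch below assumes the http://snomed.info/id/ namespace used by the `PREFIX snomed:` declarations in this fact sheet's examples. `concept_iri` is a hypothetical helper, not part of any SNOMED tool.

```python
# Verhoeff tables: multiplication in the dihedral group D5 (_D) and position permutations (_P).
_D = (
    (0, 1, 2, 3, 4, 5, 6, 7, 8, 9),
    (1, 2, 3, 4, 0, 6, 7, 8, 9, 5),
    (2, 3, 4, 0, 1, 7, 8, 9, 5, 6),
    (3, 4, 0, 1, 2, 8, 9, 5, 6, 7),
    (4, 0, 1, 2, 3, 9, 5, 6, 7, 8),
    (5, 9, 8, 7, 6, 0, 4, 3, 2, 1),
    (6, 5, 9, 8, 7, 1, 0, 4, 3, 2),
    (7, 6, 5, 9, 8, 2, 1, 0, 4, 3),
    (8, 7, 6, 5, 9, 3, 2, 1, 0, 4),
    (9, 8, 7, 6, 5, 4, 3, 2, 1, 0),
)
_P = (
    (0, 1, 2, 3, 4, 5, 6, 7, 8, 9),
    (1, 5, 7, 6, 2, 8, 3, 0, 9, 4),
    (5, 8, 0, 3, 7, 9, 6, 1, 4, 2),
    (8, 9, 1, 6, 0, 4, 3, 5, 2, 7),
    (9, 4, 5, 3, 1, 2, 6, 8, 7, 0),
    (4, 2, 8, 6, 5, 7, 3, 9, 0, 1),
    (2, 7, 9, 3, 8, 0, 6, 4, 1, 5),
    (7, 0, 4, 6, 9, 1, 3, 2, 5, 8),
)

SNOMED_NS = "http://snomed.info/id/"  # namespace used in this fact sheet's examples
CONCEPT_PARTITIONS = {"00", "10"}  # short- and long-format concept identifiers


def verhoeff_valid(number: str) -> bool:
    """Return True if the trailing Verhoeff check digit of `number` is correct."""
    check = 0
    for position, char in enumerate(reversed(number)):
        check = _D[check][_P[position % 8][int(char)]]
    return check == 0


def concept_iri(sctid: str) -> str:
    """Validate a concept SCTID and return its IRI (hypothetical helper)."""
    if not (sctid.isascii() and sctid.isdigit() and 6 <= len(sctid) <= 18):
        raise ValueError(f"{sctid!r} is not a 6-18 digit SCTID")
    if not verhoeff_valid(sctid):
        raise ValueError(f"{sctid!r} fails the Verhoeff check digit")
    if sctid[-3:-1] not in CONCEPT_PARTITIONS:
        raise ValueError(f"{sctid!r} is not a concept identifier")
    return SNOMED_NS + sctid


if __name__ == "__main__":
    print(concept_iri("762952008"))  # http://snomed.info/id/762952008
    print(verhoeff_valid("762952009"))  # False: last digit mistyped
```

The concept IDs used in this fact sheet (105590001, 227313005, 762952008 and 900000000000207008) all pass the check. Any single mistyped digit makes it fail.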

The ontology IRI defined by SNOMED International is <http://snomed.info/sct/900000000000207008>. A version IRI is provided for each release in the form <http://snomed.info/sct/900000000000207008/version/20210304>, where the last path segment is the release date in the format yyyyMMdd.
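Because the release date is embedded in the version IRI, a script can check which release a file comes from or pick the newest of several. The yyyyMMdd pattern corresponds to Python's "%Y%m%d" format. The helpers below (`version_iri`, `release_date`, `latest`) are hypothetical names used for illustration.

```python
from datetime import date, datetime
from typing import Iterable

ONTOLOGY_IRI = "http://snomed.info/sct/900000000000207008"
VERSION_PREFIX = ONTOLOGY_IRI + "/version/"


def version_iri(release: date) -> str:
    """Build the version IRI for a release date (yyyyMMdd suffix)."""
    return f"{VERSION_PREFIX}{release:%Y%m%d}"


def release_date(iri: str) -> date:
    """Extract the release date from a version IRI."""
    if not iri.startswith(VERSION_PREFIX):
        raise ValueError(f"unexpected version IRI: {iri}")
    return datetime.strptime(iri[len(VERSION_PREFIX):], "%Y%m%d").date()


def latest(iris: Iterable[str]) -> str:
    """Pick the most recent release among several version IRIs."""
    return max(iris, key=release_date)


if __name__ == "__main__":
    print(release_date(VERSION_PREFIX + "20210304"))  # 2021-03-04
```

Comparing parsed dates rather than raw strings makes `release_date` reject malformed suffixes with a `ValueError` instead of silently mis-ordering them.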

The RDF representation of SNOMED CT contains the following information:

  1. Every concept is implemented as an owl:Class and every property as an owl:ObjectProperty.

  2. rdfs:subClassOf statements build the class hierarchy, and rdfs:subPropertyOf statements build the property hierarchy.

  3. Names are provided as annotations using rdfs:label, skos:prefLabel, and skos:altLabel. Annotations typically carry the language tag @en. In some cases, alternative labels exist with the tags @en-gb or @en-us.

  4. owl:equivalentClass (for fully defined concepts) or rdfs:subClassOf (for primitive concepts) relates a class to its logical definition, which combines other classes and attribute relationships.
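When presenting query results, you usually want one label per concept in a chosen language, with a fallback. The sketch below assumes the labels of a class have already been fetched as (property, text, language tag) tuples with prefixed property names. `pick_label` and `synonyms` are hypothetical helpers. The Pulse vegetable label values are illustrative rather than copied from the release file; only "legume" as a synonym of Pulse vegetable comes from the query example in this fact sheet.

```python
from typing import List, Optional, Sequence, Tuple

# (annotation property, text, language tag) attached to one SNOMED CT class.
Label = Tuple[str, str, str]

# Illustrative values only, not taken from the SNOMED CT RDF file.
PULSE_VEGETABLE: List[Label] = [
    ("rdfs:label", "Pulse vegetable (substance)", "en"),
    ("skos:prefLabel", "Pulse vegetable", "en"),
    ("skos:altLabel", "Legume", "en"),
]


def pick_label(labels: Sequence[Label], langs: Sequence[str],
               properties: Sequence[str] = ("skos:prefLabel", "rdfs:label")) -> Optional[str]:
    """Try languages in order (e.g. en-gb, then en); within a language, try properties in order."""
    for lang in langs:
        for prop in properties:
            for p, text, tag in labels:
                if p == prop and tag.lower() == lang.lower():  # language tags are case-insensitive
                    return text
    return None


def synonyms(labels: Sequence[Label], base_lang: str = "en") -> List[str]:
    """All skos:altLabel texts tagged `base_lang` or a regional variant such as en-gb."""
    base = base_lang.lower()
    return sorted(
        text for p, text, tag in labels
        if p == "skos:altLabel" and (tag.lower() == base or tag.lower().startswith(base + "-"))
    )


if __name__ == "__main__":
    print(pick_label(PULSE_VEGETABLE, ["en-gb", "en"]))  # Pulse vegetable
    print(synonyms(PULSE_VEGETABLE))  # ['Legume']
```

Trying the language before the property means a regional @en-gb label wins over a generic @en one, which is usually what a localized display needs. Swap the loops if property preference matters more.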

Example of a SPARQL query

The SPHN dataset specifies SNOMED CT as the standard for expressing the substance that triggered an allergy episode in a patient. It further specifies that the SNOMED CT concept identifying the substance must be a descendant of SNOMED CT concept 105590001 |Substance (substance)|. Connecting biomedical data with SNOMED CT allows queries along the SNOMED CT hierarchies. To illustrate one of these hierarchy paths, suppose that an allergy episode to the substance Peanut is recorded for some patients. A researcher wants to query all patients who experienced an allergy episode due to consumption of legume (a synonym of Pulse vegetable).

The Turtle code below contains example data for four patients, three of whom have an allergy: two are recorded as allergic to Pulse vegetable and one specifically to Peanut:

PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX snomed: <http://snomed.info/id/>
PREFIX sphn: <https://biomedit.ch/rdf/sphn-ontology/sphn/>
PREFIX resource: <http://biomedit.ch/rdf/sphn-resource/>

resource:patient123 a sphn:SubjectPseudoIdentifier ;
    sphn:hasIdentifier "patient123" .

resource:patient234 a sphn:SubjectPseudoIdentifier ;
    sphn:hasIdentifier "patient234" .

resource:patient345 a sphn:SubjectPseudoIdentifier ;
    sphn:hasIdentifier "patient345" .

resource:patient456 a sphn:SubjectPseudoIdentifier ;
    sphn:hasIdentifier "patient456" .

resource:allergyEpisode123 a sphn:AllergyEpisode ;
    sphn:hasSubstance resource:substance1 ;
    sphn:hasSubjectPseudoIdentifier resource:patient123 .

resource:allergyEpisode234 a sphn:AllergyEpisode ;
    sphn:hasSubstance resource:substance2 ;
    sphn:hasSubjectPseudoIdentifier resource:patient234 .

resource:allergyEpisode456 a sphn:AllergyEpisode ;
    sphn:hasSubstance resource:substance1 ;
    sphn:hasSubjectPseudoIdentifier resource:patient456 .

resource:substance1 a sphn:Substance ;
    sphn:hasSubstanceCode resource:substance1-code .

# Pulse vegetable
resource:substance1-code a snomed:227313005 .

resource:substance2 a sphn:Substance ;
    sphn:hasSubstanceCode resource:substance2-code .

# Peanut
resource:substance2-code a snomed:762952008 .

The following SPARQL query retrieves all patients allergic to Pulse vegetable or to any of its descendants:

PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX snomed: <http://snomed.info/id/>
PREFIX sphn: <https://biomedit.ch/rdf/sphn-ontology/sphn/>
PREFIX resource: <http://biomedit.ch/rdf/sphn-resource/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>

SELECT DISTINCT ?patient
WHERE {
    # Allergy episodes and their substance codes:
    ?allergy rdf:type sphn:AllergyEpisode .
    ?allergy sphn:hasSubstance ?substance .
    ?substance sphn:hasSubstanceCode ?code .

    # Patients linked to these episodes:
    ?allergy sphn:hasSubjectPseudoIdentifier ?patient .

    # Substance code should be a pulse vegetable (snomed:227313005) or any descendant:
    ?code rdf:type ?pulse_veg_and_descendants .
    ?pulse_veg_and_descendants rdfs:subClassOf* snomed:227313005 .

}

The query therefore returns all three patients with an allergy episode, including the one who is specifically allergic to Peanut:

Results of the query:

?patient
patient123
patient234
patient456

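This is due to the hierarchical structure of the ontology and the query capabilities of semantic graph technologies. The property path rdfs:subClassOf* in the query follows rdfs:subClassOf links transitively, including zero steps. A code typed with Pulse vegetable or with any of its descendants therefore matches. This requires the SNOMED CT RDF to be available to the query alongside the patient data, for example loaded into the same triple store. For this query, the relevant part of the SNOMED CT substance hierarchy is the concept Pulse vegetable and the concept Peanut, which is a direct descendant of Pulse vegetable in SNOMED CT: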

  • 105590001 |Substance (substance)|

    • . . .

    • 227313005 |Pulse vegetable (substance)|

      • 762952008 |Peanut (substance)|
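The Python sketch below makes explicit what the rdfs:subClassOf* property path contributes by evaluating the same pattern over the example data by hand. For each allergy episode, it follows substance, then code, then the code's rdf:type, and walks rdfs:subClassOf links upward. The patient is kept if Pulse vegetable is reached in zero or more steps.

This illustrates the query semantics only, not how a triple store evaluates SPARQL. The dictionaries and function names are hypothetical, and only the single Peanut to Pulse vegetable edge from the hierarchy above is included.

```python
from typing import Dict, Mapping, Set, Tuple

# The example Turtle data, with prefixes dropped.
EPISODES: Dict[str, Tuple[str, str]] = {  # allergy episode -> (patient, substance)
    "allergyEpisode123": ("patient123", "substance1"),
    "allergyEpisode234": ("patient234", "substance2"),
    "allergyEpisode456": ("patient456", "substance1"),
}
SUBSTANCE_CODE = {"substance1": "substance1-code", "substance2": "substance2-code"}
CODE_CLASS = {  # rdf:type of each code resource
    "substance1-code": "227313005",  # Pulse vegetable
    "substance2-code": "762952008",  # Peanut
}
# Only the edge needed here; a real store would hold the full SNOMED CT RDF.
SUBCLASS_OF: Dict[str, Set[str]] = {"762952008": {"227313005"}}


def superclasses_star(cls: str, edges: Mapping[str, Set[str]]) -> Set[str]:
    """All classes reachable via rdfs:subClassOf*, including `cls` itself (zero steps)."""
    reached = {cls}
    stack = [cls]
    while stack:
        for parent in edges.get(stack.pop(), ()):
            if parent not in reached:
                reached.add(parent)
                stack.append(parent)
    return reached


def patients_allergic_to(target: str) -> Set[str]:
    """Distinct patients (like SELECT DISTINCT) whose substance code is `target` or a descendant."""
    return {
        patient
        for patient, substance in EPISODES.values()
        if target in superclasses_star(CODE_CLASS[SUBSTANCE_CODE[substance]], SUBCLASS_OF)
    }


if __name__ == "__main__":
    print(sorted(patients_allergic_to("227313005")))
    # ['patient123', 'patient234', 'patient456']
```

Querying for Peanut (762952008) instead returns only patient234. The path walks upward from each code's class, so a record coded as Pulse vegetable does not imply an allergy to Peanut.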

Availability and usage rights

The SNOMED CT RDF file is available in the Terminology Service through the BioMedIT portal (accessible via SWITCH edu-ID).

Copyright follows the instructions provided by SNOMED International (https://www.snomed.org/): SNOMED CT is copyright © SNOMED International 2021 v3.15.1. To use the file, please register with eHealth Suisse for a SNOMED CT affiliate license (free of charge): https://www.e-health-suisse.ch/technik-semantik/semantische-interoperabilitaet/snomed-ct/registration-und-lizenz.html