SPHN Dataset

Introduction

The SPHN Dataset contains atomic building blocks, called concepts, that can be used to represent biomedical data and its meaning. With these well-defined concepts, clinical information can be understood the same way across hospitals and projects. Each concept contains all the elements, called composedOfs, needed to understand it. A concept refers to recommended value sets and/or semantic standards (e.g. LOINC, SNOMED CT, ICD-10-GM, CHOP, ATC, ICD-O-3, HGNC, GENO, SO) to express the data. Additionally, SPHN concepts are composed in a modular way so that the same information is expressed in the same way even if the context is different; e.g. a substance can be the substance someone is allergic to or the active ingredient of a drug. The use of internationally recognized standards such as SNOMED CT and LOINC as controlled vocabularies fosters semantic interoperability not only within SPHN but also with international partners. Creating links to international ontologies additionally allows us to leverage the domain knowledge represented in these ontologies.

Scope of the SPHN Dataset

The SPHN Dataset includes concepts for core clinical data, such as demographic data, administrative case, diagnoses, procedures, lab results, medications, different measurements and biosample information, as well as concepts for medical specialties, such as oncology and intensive care. The following criteria were applied to include a concept in the SPHN Dataset:

  • Concepts of general importance for research (e.g. Birth Date, Administrative Gender, FOPH Diagnosis, Drug Administration Event)

  • Concepts relevant for more than one use case in SPHN (e.g. Allergy) or of high importance for other future personalized health projects (e.g. Oncological Treatment Assessment, TNM Classification)

Guiding principles for concept design

Conceptualization and semantic representation

In the SPHN Dataset, a concept is represented by the elements it is composed of (see Table 1). Each of those elements can potentially be a concept itself, and the elements are separated so that each one carries a single meaning. Each project can choose the elements of interest according to its research question, in the sense of “take what is needed” instead of “all or nothing”.

Table 1. Example of concept Oxygen Saturation in the SPHN Dataset.

concept       Oxygen Saturation      fraction of oxygen present in the blood
composedOf    quantity               value and unit of the concept
composedOf    measurement datetime   datetime of measurement
composedOf    body site              body site where the concept was measured, performed or collected
composedOf    measurement method     measurement method of the concept

An alternative representation would be to create single concepts such as Arterial Oxygen Saturation, Intracardiac Oxygen Saturation, etc. In the SPHN Dataset, however, the information about what is measured (quantity) is separated from the information about where it is measured (body site), so that different parts of the meaning are held in different composedOf elements. This allows concepts to be reused.
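To make the separation of “what” and “where” and the “take what is needed” principle concrete, here is a minimal Python sketch of Oxygen Saturation as a record of independent composedOfs. The names `OxygenSaturation`, `Quantity` and `select` are hypothetical and not part of the SPHN Dataset, and body site is simplified to a plain string instead of a coded Body Site concept.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class Quantity:
    # value and unit of the concept, e.g. 97.0 and "%"
    value: float
    unit: str


@dataclass
class OxygenSaturation:
    # Each composedOf carries one part of the meaning. All are optional so that
    # a project can "take what is needed" instead of "all or nothing".
    quantity: Optional[Quantity] = None         # what is measured
    measurement_datetime: Optional[str] = None  # when, e.g. "2023-05-01T08:30:00"
    body_site: Optional[str] = None             # where (simplified to a string), so
                                                # no ArterialOxygenSaturation is needed
    measurement_method: Optional[str] = None    # how


def select(record: OxygenSaturation, wanted: set[str]) -> dict[str, object]:
    """Hypothetical helper: keep only the composedOfs a project asked for."""
    return {f.name: getattr(record, f.name)
            for f in fields(record) if f.name in wanted}


reading = OxygenSaturation(
    quantity=Quantity(97.0, "%"),
    measurement_datetime="2023-05-01T08:30:00",
    body_site="finger",
)
print(select(reading, {"quantity", "body_site"}))
```

A project interested only in values and measurement sites would receive just those two composedOfs; the remaining elements stay available to other projects without changing the concept.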

The principle of reuse requires generalized descriptions, such as “value and unit of the concept”. Such general descriptions are accompanied by contextualized descriptions that explain to the reader the meaning of each individual composedOf.

Table 2. Example of concept Oxygen Saturation in the SPHN Dataset with contextualized concept names and descriptions.

concept       Oxygen Saturation      fraction of oxygen present in the blood
composedOf    saturation             measured oxygen saturation, and unit
composedOf    measurement datetime   datetime of measurement
composedOf    body site              body site of measurement
composedOf    measurement method     method of measurement

Controlled vocabulary

International controlled vocabularies provide unambiguous semantic meaning for concepts. Linking the SPHN concepts to such controlled vocabularies, also referred to as meaning binding, provides expressions that are both machine-readable and human-readable. Meaning binding is not limited to a single controlled vocabulary; a concept can be encoded in several standards. For example, clinical concepts are bound to SNOMED CT concepts and/or to LOINC codes, and genomic concepts can use genomic-specific terminologies such as HGNC, GENO and SO. For further guidance on meaning binding, please refer to the Meaning binding section.

In addition, controlled vocabularies are used as standards for value set definitions. This means that instead of local value set definitions such as 1=male; 2=female; 33=other, the value set for the SPHN Administrative Gender concept contains the following values from the controlled vocabulary SNOMED CT: 446151000124109 |Identifies as male gender (finding)|; 446141000124107 |Identifies as female gender (finding)|; 74964007 |Other (qualifier value)|. By using a standard vocabulary for value sets, the SPHN Dataset supports interoperability with datasets from other initiatives or organizations that use a controlled vocabulary, such as SNOMED CT.
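In practice, moving from local codes to the SNOMED CT value set amounts to a lookup when data are delivered. The sketch below assumes a hypothetical hospital system that uses the local codes from the example above; the `LOCAL_TO_SNOMED` table and the `to_snomed` helper are illustrative assumptions, while the three SNOMED CT codes are the ones listed for Administrative Gender.

```python
# SNOMED CT codes of the Administrative Gender value set (taken from the text).
ADMINISTRATIVE_GENDER_VALUE_SET = {
    "446151000124109": "Identifies as male gender (finding)",
    "446141000124107": "Identifies as female gender (finding)",
    "74964007": "Other (qualifier value)",
}

# Hypothetical local coding of a source system, mapped before data delivery.
LOCAL_TO_SNOMED = {
    "1": "446151000124109",
    "2": "446141000124107",
    "33": "74964007",
}


def to_snomed(local_code: str) -> tuple[str, str]:
    """Map a local gender code to a (SNOMED CT code, display term) pair."""
    try:
        code = LOCAL_TO_SNOMED[local_code]
    except KeyError:
        raise ValueError(f"no SNOMED CT mapping for local code {local_code!r}") from None
    return code, ADMINISTRATIVE_GENDER_VALUE_SET[code]


print(to_snomed("2"))
```

Failing loudly on an unmapped local code, rather than passing it through, keeps non-standard values out of the delivered data.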

Reuse of concepts: a meaning defined only once

With the iterative growth of the SPHN Dataset, it has become obvious that certain concepts are used in several medical specialties, such as intensive care and cardiology. To prevent a single meaning from being represented twice and in different ways, we define each meaning (concept) only once. For example, a body site can be the body site where a heart rate was measured, where oxygen saturation was measured, or where the patient felt pain. The meaning of body site itself does not change, whether it is the site where a heart rate was measured or the site on which another measurement or procedure was performed. Therefore, the meaning of Body Site is represented in the Dataset only once.

Table 3. Concept Body Site in the SPHN Dataset.

                           general description                                                                                   type
concept       Body Site    any anatomical structure, any nonspecific and anatomical site, as well as morphologic abnormalities
composedOf    code         code, name, coding system and version describing the concept                                          Code
composedOf    laterality   localization with respect to the side of the body                                                     Laterality

The Body Site concept is reused several times in the SPHN Dataset. For example, in the concepts Heart Rate and Oxygen Saturation, Body Site is used to describe where the heart rate or oxygen saturation was measured. In the concept Access Device Presence, Body Site is used twice, once as the insertion point and once as the resting point.


Figure 1. Concept Body Site and its reuse.

For concept reuse in the SPHN Dataset, it is the type that specifies which concept is reused. The illustration below shows the concept Body Site and its reuse in the concepts Heart Rate and Oxygen Saturation by setting Body Site as a type.


Figure 2. Concept Body Site with its composedOfs and its reuse.
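Reuse by type can be pictured with ordinary Python dataclasses: Body Site is defined once, and every concept that needs a location declares a field of type `BodySite`. This is a simplified sketch; the class and field names are hypothetical, the measurement concepts are reduced to a single value, and the only code used is the generic SNOMED CT body structure code 123037004 quoted in Table 6.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Code:
    # code, name, coding system and version describing the concept
    code: str
    name: str
    coding_system: str
    version: Optional[str] = None


@dataclass(frozen=True)
class BodySite:
    # Defined once; the concepts below reuse it by declaring it as a field type.
    code: Code
    laterality: Optional[Code] = None


@dataclass
class HeartRate:
    value: float                          # simplified: beats per minute only
    body_site: Optional[BodySite] = None  # where it was measured


@dataclass
class OxygenSaturation:
    value: float                          # simplified: percent only
    body_site: Optional[BodySite] = None  # same type, same meaning


@dataclass
class AccessDevicePresence:
    # Body Site used twice, in two different roles.
    insertion_point: Optional[BodySite] = None
    resting_point: Optional[BodySite] = None


site = BodySite(Code("123037004", "Body structure (body structure)", "SNOMED CT"))
hr = HeartRate(72.0, body_site=site)
spo2 = OxygenSaturation(97.0, body_site=site)
device = AccessDevicePresence(insertion_point=site, resting_point=site)
print(hr.body_site == spo2.body_site)
```

Because every location is a `BodySite`, a change to its definition (for example adding a composedOf) propagates to all concepts that use it, which is the point of defining a meaning only once.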

Semantic inheritance

Semantic inheritance is a mechanism by which a specific concept is derived from a broader concept. The specific concept (child concept) and the broader concept (parent concept) have a hierarchical relationship and share common composedOfs. For example, a diagnosis in general has a datetime when it was recorded, and any specific diagnosis, such as an ICD-O diagnosis, also has the information of when it was recorded. Therefore, both concepts share the same composedOf, record datetime. The following graphic illustrates the hierarchical relationship between the parent concept Diagnosis and its child concepts FOPH Diagnosis, Nursing Diagnosis and ICD-O Diagnosis.


Figure 3. Concept Diagnosis and its child concepts.

In the SPHN Dataset, inheritance is represented by the child concept having the parent concept as its type, and the elements that the child concept inherits from the parent concept are specified as inherited. The child concept inherits all composedOfs from the parent concept, but it can have additional composedOfs.


Figure 4. Concept FOPH Diagnosis inheriting elements from the Diagnosis concept.
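Semantic inheritance maps naturally onto class inheritance. In this minimal Python sketch, the class names are hypothetical, and the `code` field of `FOPHDiagnosis` is an illustrative assumption rather than the actual FOPH Diagnosis definition. The child concept gets `record_datetime` from its parent without redefining it, and a small helper lists which composedOfs are inherited.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class Diagnosis:
    # Parent concept: composedOfs shared by every kind of diagnosis.
    record_datetime: Optional[str] = None


@dataclass
class FOPHDiagnosis(Diagnosis):
    # Child concept: inherits record_datetime and adds its own composedOf.
    code: Optional[str] = None  # illustrative additional composedOf


def inherited(child: type, parent: type) -> list[str]:
    """Names of the composedOfs a child concept inherits from its parent."""
    parent_names = {f.name for f in fields(parent)}
    return [f.name for f in fields(child) if f.name in parent_names]


print(inherited(FOPHDiagnosis, Diagnosis))  # ['record_datetime']
```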

Specification of a concept

A concept is an independent element that carries a semantic meaning by itself. Every element used in a composed concept is a concept itself. A concept can refer to a data point (e.g. Data Provider Institute), or it can be an empty container whose data points are all represented by the concept’s composition (e.g. Blood Pressure). The elements (properties) of a concept are called composedOfs. A composedOf can be based on an already defined concept, and it can carry semantic information specific to the concept it is part of.

IRI - Internationalized Resource Identifier

Each concept and each composedOf is uniquely identified by an IRI. The IRI is a resolvable versioned URL (Uniform Resource Locator) pointing to a website where details of the current version of the concept or composedOf can be found.

Concept name

There is a general and a contextualized concept name, and the two can differ for composedOfs. The general concept name aims to provide unique and consistent naming across the complete dataset, so that elements with the same meaning can be identified independently of the context in which they are used. The contextualized concept name aims to provide a more specific name that makes the composedOf understandable within the particular concept it is used in; it is mainly intended for human readers. For example, the contextualized composedOf name “encounter identifier” is related to the general composedOf name “identifier”.

Description

For each concept and composedOf there is a concise description in natural language. The description needs to explain the general (context-independent) meaning of the concept. Since well-formulated descriptions of biomedical concepts already exist, e.g. in the UMLS Metathesaurus, existing descriptions are reused wherever possible. Abbreviations should be avoided unless they are stated in the list of abbreviations in the SPHN Dataset. Descriptions can contain examples to illustrate the meaning of the concept.

Table 4. Example of general and contextualized descriptions in the SPHN Dataset.

general concept name   general description                         contextualized concept name   contextualized description
identifier             unique identifier identifying the concept   encounter identifier          a unique pseudonymized encounter ID for the given data delivery/research purpose
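The pairing of general and contextualized names and descriptions can be held in one small record per composedOf. The sketch below is hypothetical: the `ComposedOf` type and the `same_meaning` helper are not part of the SPHN Dataset. It stores the values from Table 4 and treats two composedOfs as sharing a meaning when their general names match, which is an assumption made for illustration. The second composedOf, “patient identifier”, is invented for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ComposedOf:
    general_name: str                # consistent across the whole dataset
    general_description: str         # context-independent meaning
    contextualized_name: str         # specific to the concept it belongs to
    contextualized_description: str  # meaning explained for the human reader


encounter_identifier = ComposedOf(
    general_name="identifier",
    general_description="unique identifier identifying the concept",
    contextualized_name="encounter identifier",
    contextualized_description=(
        "a unique pseudonymized encounter ID for the given "
        "data delivery/research purpose"
    ),
)

patient_identifier = ComposedOf(  # hypothetical second use of "identifier"
    general_name="identifier",
    general_description="unique identifier identifying the concept",
    contextualized_name="patient identifier",
    contextualized_description="hypothetical contextualized description",
)


def same_meaning(a: ComposedOf, b: ComposedOf) -> bool:
    """Assumption: matching general names mean matching meanings."""
    return a.general_name == b.general_name


print(same_meaning(encounter_identifier, patient_identifier))
```

Machines compare the general names, while people read the contextualized ones; keeping both in the same record avoids the two drifting apart.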

Semantic type

For each composedOf there is a type indicating what kind of data can be mapped to this composedOf. The following types are used:

  • string: a sequence of characters; used for free text information such as “problem” in the Problem Condition concept,

  • temporal: any datetime information; used for time points such as assessment dates, start dates or end dates; granularity can vary from seconds (e.g. timestamps of a machine) to years (e.g. if only year of birth is allowed to be shared within a project); format should be: YYYY, YYYY-MM, YYYY-MM-DD or YYYY-MM-DDThh:mm:ss,

  • quantitative: expressing a certain value of a Quantity; technical types can be integer or float,

  • qualitative: expressing a certain characteristic with a pre-defined set of options, which are not expressed with controlled vocabulary (yet).

The type can also be a concept pointing to another concept in the SPHN Dataset, e.g. Code, Body Site.
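The temporal and quantitative rules above lend themselves to simple validation. The following Python sketch uses hypothetical function names and is not an official SPHN validator. It accepts exactly the four listed temporal formats, checks calendar validity with `datetime.strptime`, and treats integers and floats (but not booleans) as quantitative values.

```python
import re
from datetime import datetime

# (granularity, exact shape, strptime format) for the four allowed formats
TEMPORAL_FORMATS = [
    ("year", r"\d{4}", "%Y"),
    ("month", r"\d{4}-\d{2}", "%Y-%m"),
    ("day", r"\d{4}-\d{2}-\d{2}", "%Y-%m-%d"),
    ("second", r"\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}", "%Y-%m-%dT%H:%M:%S"),
]


def temporal_granularity(value: str) -> str:
    """Return the granularity of a temporal value or raise ValueError."""
    for granularity, shape, fmt in TEMPORAL_FORMATS:
        if re.fullmatch(shape, value, re.ASCII):
            # The regex fixes the shape; strptime rejects impossible dates
            # such as month 13 (it raises ValueError).
            datetime.strptime(value, fmt)
            return granularity
    raise ValueError(f"not an allowed temporal format: {value!r}")


def is_quantitative(value: object) -> bool:
    """Quantitative values are technically integers or floats (bool excluded)."""
    return isinstance(value, (int, float)) and not isinstance(value, bool)


print(temporal_granularity("1985"), temporal_granularity("2023-05-01T08:30:00"))
```

Returning the granularity rather than a plain yes/no lets a project check, for example, that only the year of birth is delivered when that is all it is allowed to share.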

Standards and value sets

“Standards” are controlled terminologies, classification systems, ontologies or other coding systems. They are to be used to represent the data in an interoperable way, i.e. they serve as semantic standards for value set definitions. Value set definitions can be broad, medium-detailed or detailed. For a broad value set definition only the standard is stated in the “standards” column and the column “value set or subset” is empty. Medium-detailed definitions refer to a substructure of a standard. Detailed definitions contain a finite set of qualitative options or codes from a controlled vocabulary. The following examples illustrate the difference between these three types of definitions.

Table 5. Example of broad definition.

                     description                                                    standard   value set or subset
concept       Unit   unit of measurement
composedOf    code   code, name, coding system and version describing the concept   UCUM

Table 6. Example of medium-detailed definition.

                           description                                                                                           standard    value set or subset
concept       Body Site    any anatomical structure, any nonspecific and anatomical site, as well as morphologic abnormalities
composedOf    code         code, name, coding system and version describing the concept                                          SNOMED CT   descendant of: 123037004 |body structure (body structure)|

Table 7. Example of detailed definition.

                              description                                                                     standard    value set or subset
concept       Care Handling   describes the relationship between the individual and care provider institute
composedOf    code            code, name, coding system and version describing the type of the concept        SNOMED CT   394656005 |Inpatient care (regime/therapy)|; 371883000 |Outpatient procedure (procedure)|; 304903009 |Provision of day care (regime/therapy)|
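The three levels of value set definition differ only in how strictly a coded value is checked. In the following Python sketch, `ValueSet` and `conforms` are hypothetical names, and the medium-detailed case delegates the hierarchy test to a caller-supplied `is_descendant` function; in practice this would query a terminology service, which is an assumption here. The three example value sets mirror Tables 5 to 7.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class ValueSet:
    standard: str                      # e.g. "UCUM" or "SNOMED CT"
    ancestor: Optional[str] = None     # medium-detailed: "descendant of"
    codes: Optional[frozenset] = None  # detailed: finite set of codes


def conforms(vs: ValueSet, coding_system: str, code: str,
             is_descendant: Callable[[str, str], bool]) -> bool:
    """Check one coded value against a broad, medium-detailed or detailed value set.

    is_descendant(code, ancestor) is a hypothetical hook supplied by the caller
    (e.g. backed by a terminology service); only medium-detailed sets use it.
    """
    if coding_system != vs.standard:
        return False                 # wrong standard fails at every level
    if vs.codes is not None:
        return code in vs.codes      # detailed: must be one of the listed codes
    if vs.ancestor is not None:
        return is_descendant(code, vs.ancestor)  # medium-detailed: sub-hierarchy
    return True                      # broad: any code of the standard


UNIT = ValueSet("UCUM")
BODY_SITE = ValueSet("SNOMED CT", ancestor="123037004")
CARE_HANDLING = ValueSet(
    "SNOMED CT", codes=frozenset({"394656005", "371883000", "304903009"})
)


def no_hierarchy(code: str, ancestor: str) -> bool:
    return False  # placeholder for a real hierarchy lookup


print(conforms(CARE_HANDLING, "SNOMED CT", "394656005", no_hierarchy))
```

Keeping the hierarchy lookup outside the function reflects that a medium-detailed definition cannot be checked from the value set alone; it always needs the structure of the standard it refers to.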

Value sets are defined for concepts of type:

Meaning binding

SPHN concepts with a clinical or other meaning are associated with an international standard (e.g. SNOMED CT, LOINC or others) through a so-called meaning binding. These meaning bindings support the machine readability of the concepts and allow researchers to use the clinical knowledge contained in these terminologies in their research projects.

There are several criteria to consider in meaning binding, and the following guiding principles help to understand concept selection and find meaning bindings for new concepts:

  • Fit for purpose - binding to a single concept or code from an external terminology (otherwise not usable in URIs);

  • Suitability instead of completeness - no binding if there is no suitable concept or code from an external terminology;

  • Exact fit - no binding to more or less specific terms, e.g. ICD-O Diagnosis is not bound to 439401001 |Diagnosis (observable entity)|

  • Best fit - independent of the standards’ hierarchy, except:

      • LOINC: no binding to panel codes

      • SNOMED CT: attribute hierarchy codes not to be used for concepts

  • Avoid using the same code for different items in the SPHN Dataset.
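Two of these principles can be checked mechanically: a binding holds at most one code per standard, and the same code should not be bound to different items. This hypothetical Python sketch stores bindings as a mapping from concept name to {standard: code}, so the first rule holds by construction, and it reports every (standard, code) pair used by more than one concept. The Problem Condition codes come from Table 8; the other two concepts are invented for the example.

```python
from collections import defaultdict


def duplicate_bindings(
    bindings: dict[str, dict[str, str]],
) -> dict[tuple[str, str], list[str]]:
    """Return every (standard, code) pair bound to more than one concept.

    bindings maps a concept name to {standard: code}. A dict holds one code per
    standard, so "bind to a single code" holds by construction, and a concept
    with no suitable code simply has an empty dict.
    """
    users: dict[tuple[str, str], list[str]] = defaultdict(list)
    for concept, per_standard in bindings.items():
        for standard, code in per_standard.items():
            users[(standard, code)].append(concept)
    return {pair: names for pair, names in users.items() if len(names) > 1}


bindings = {
    "Problem Condition": {"SNOMED CT": "55607006", "LOINC": "44100-6"},
    "Hypothetical Concept": {"SNOMED CT": "55607006"},  # invented clash
    "Unbound Concept": {},  # no suitable code, so no binding
}
print(duplicate_bindings(bindings))
```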

The following example illustrates how meaning bindings are stated in the SPHN Dataset. In the example, there is a meaning binding to a SNOMED CT concept and a meaning binding to a LOINC code.

Table 8. Example of meaning binding in the SPHN Dataset.

                                  description                                                                                                                          meaning binding
concept       Problem Condition   clinical condition, problem, diagnosis, or other event, situation, issue, or clinical concept that has risen to a level of concern   SNOMED CT: 55607006 |Problem (finding)|; LOINC: 44100-6 Medical problem

Development process

Requests for adding a new concept and/or making changes to the SPHN Dataset need to be submitted to the SPHN Data Coordination Center (DCC). The SPHN Dataset is developed in collaboration with experts from the five Swiss university hospitals, SPHN National Data Stream (NDS) data managers, and clinical and genomic experts, with the DCC coordinating the development. One to two releases per year are published after approval by the Hospital IT strategy alignment group of SPHN.

Availability and usage rights

The SPHN Dataset is available on the SPHN website.

The SPHN Dataset is under the CC-BY 4.0 License.

For any question or comment, please contact the Data Coordination Center (DCC) at dcc@sib.swiss.