SPHN Dataset

Introduction

The SPHN Dataset contains a number of concepts that enable clinical information to be shared in a standardized way between individual hospitals and with projects. Each concept contains all the information necessary to understand it. A concept refers to recommended value sets and semantic standards (e.g. LOINC, SNOMED CT, ICD-10, CHOP, ATC, ICD-O-3) to express the data. Additionally, SPHN concepts are composed in a modular way so that the same information is expressed in the same way even if the context is different, e.g. a substance can be the substance someone is allergic to or the active ingredient of a drug. Using international standards (e.g. SNOMED CT, LOINC) as controlled vocabularies fosters semantic interoperability not only within SPHN but also with other international partners. Linking to international ontologies additionally allows us to leverage the domain knowledge contained in these standards.

Scope of the SPHN Dataset

The current version of the SPHN Dataset includes, for example, demographic data, administrative case, diagnoses, procedures, lab results, medications, different measurements and biosample information, as well as domain-specific concepts, e.g. for oncology. The following criteria were applied to include a concept in the SPHN Dataset:

  • Concepts of general importance for research (e.g. Birth date, Administrative gender, FOPH Diagnosis, Drug administration event)

  • Concepts relevant for more than one use case in SPHN (e.g. Allergy) or of high importance for other future personalized health projects (e.g. Oncological treatment assessment, TNM classification)

Guiding principles for concept design

Semantic representation and normalization

In the SPHN Dataset, a conceptual modeling approach has been applied in which a concept is represented by the elements it is composed of (see Table 1). Each of those elements can itself be a concept, and each element carries a single meaning. The user is then free to choose the elements of interest for a given project, in the sense of “take what is needed” instead of “all or nothing”.

Table 1. Example of concept Oxygen Saturation in the SPHN Dataset.

concept: Oxygen Saturation
  description: fraction of oxygen present in the blood
composedOf: saturation
  description: measured oxygen saturation
composedOf: datetime
  description: datetime of measurement
composedOf: body site
  description: body site of measurement
composedOf: method
  description: method of measurement
composedOf: unit
  description: unit of oxygen saturation

An alternative representation would be to create single concepts such as Arterial Oxygen Saturation, Intracardiac Oxygen Saturation, etc. However, in the SPHN Dataset the information about what is measured (saturation) is separated from the information about where it is measured (body site), so that different parts of the meaning are held in different composedOf elements. This allows concepts to be reused.
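The compositional approach can be sketched in Python as follows. This is a minimal illustration under assumptions, not SPHN tooling: the classes `Concept` and `ComposedOf` and the helper `select_elements` are hypothetical names, and the measurement values are made up. Because the body site is its own composedOf element, arterial and intracardiac measurements are two records of one concept, and a project keeps only the elements it needs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ComposedOf:
    """One element of a concept (hypothetical structure for illustration)."""
    name: str
    description: str


@dataclass(frozen=True)
class Concept:
    """A concept represented by the elements it is composed of."""
    name: str
    description: str
    composed_of: tuple[ComposedOf, ...] = ()

    def element_names(self) -> set[str]:
        return {element.name for element in self.composed_of}


OXYGEN_SATURATION = Concept(
    name="Oxygen Saturation",
    description="fraction of oxygen present in the blood",
    composed_of=(
        ComposedOf("saturation", "measured oxygen saturation"),
        ComposedOf("datetime", "datetime of measurement"),
        ComposedOf("body site", "body site of measurement"),
        ComposedOf("method", "method of measurement"),
        ComposedOf("unit", "unit of oxygen saturation"),
    ),
)


def select_elements(record: dict, concept: Concept, wanted: set[str]) -> dict:
    """Keep only the requested composedOf elements ("take what is needed")."""
    unknown = wanted - concept.element_names()
    if unknown:
        raise ValueError(f"not elements of {concept.name}: {sorted(unknown)}")
    return {key: value for key, value in record.items() if key in wanted}


# Illustrative values only. Both records use the same concept and differ
# in the "body site" element instead of needing two separate concepts.
arterial = {"saturation": 97, "unit": "%", "body site": "arterial blood",
            "datetime": "2021-03-01T08:15:00", "method": "blood gas analysis"}
intracardiac = {**arterial, "saturation": 75, "body site": "right atrium"}

print(select_elements(arterial, OXYGEN_SATURATION, {"saturation", "unit"}))
# {'saturation': 97, 'unit': '%'}
```

Asking for an element the concept does not have raises a `ValueError`, which makes a project's selection explicit rather than silently empty.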

Controlled vocabulary

To give a concept an unambiguous semantic meaning, it is important to link the specific concepts and the values of a value set to an international controlled vocabulary, a process called meaning binding. It is not necessary to choose only one controlled vocabulary; a concept can be encoded in several standards. For the clinical concepts, we started with meaning bindings to SNOMED CT and LOINC. To assign appropriate codes, the following guiding principles have been applied:

  • Fit for purpose - binding to a single LOINC code or SNOMED CT concept (otherwise the binding cannot be used in URIs);

  • Suitability instead of completeness - no binding if there is no suitable SNOMED CT concept or LOINC code;

  • Exact fit - no binding to broader or narrower terms, e.g. ICD-O Diagnosis is not bound to 439401001 |Diagnosis (observable entity)|;

  • Best fit - independent of SNOMED CT's hierarchy.
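Only the first two principles lend themselves to a mechanical check; exact fit and best fit remain expert judgments. The sketch below assumes a hypothetical representation in which a binding maps each standard to one code, to `None` when no suitable code exists, or (incorrectly) to a list of several codes. The function name `binding_problems` and the placeholder codes `code-A` and `code-B` are illustrative only.

```python
from typing import Optional, Union

# Hypothetical representation: standard -> single code, None (no suitable
# code exists), or, as an error case, a list of several candidate codes.
Binding = dict[str, Optional[Union[str, list[str]]]]


def binding_problems(binding: Binding) -> list[str]:
    """Report violations of the single-code rule ("fit for purpose")."""
    problems = []
    for standard, code in binding.items():
        if code is None:
            # "Suitability instead of completeness": leaving it unbound is fine.
            continue
        if isinstance(code, list):
            problems.append(f"{standard}: {len(code)} codes given, expected one")
        elif not code.strip():
            problems.append(f"{standard}: empty code")
    return problems


# Problem Condition is bound to one code per standard (see Table 6).
problem_condition = {"SNOMED CT": "55607006", "LOINC": "44100-6"}
# Hypothetical concept: two SNOMED CT candidates, no suitable LOINC code.
ambiguous = {"SNOMED CT": ["code-A", "code-B"], "LOINC": None}

print(binding_problems(problem_condition))  # []
print(binding_problems(ambiguous))  # ['SNOMED CT: 2 codes given, expected one']
```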

Reuse of concepts: a meaning defined only once

As the SPHN Dataset grows iteratively, it has become evident that certain concepts are used in several therapeutic areas, such as intensive care and cardiology. To prevent a single meaning from being represented twice and in different ways, we define each meaning (concept) only once. For example, a substance can be the substance someone is allergic to or the active ingredient of a drug. The meaning of substance itself does not change, whether it is the active ingredient of a drug or a substance causing an allergic reaction. Therefore, the meaning of Substance is represented in the Dataset only once.

Table 2. Concept Substance in the SPHN Dataset.

concept: Substance
  description: any matter of defined composition that has discrete existence, whose origin may be biological, mineral or chemical
composedOf: code
  description: code, name, coding system and version representing the substance, e.g. ATC or SNOMED CT
  type: Code
composedOf: generic name
  description: name of the substance, for not yet approved medications the international nonproprietary name (INN) of a substance given by the World Health Organization (WHO)
  type: string

In the concept Drug, Substance is reused twice: once as active ingredient and once as inactive ingredient. In the concept Allergy, Substance is reused as the substance considered responsible for the allergic reaction.


In the SPHN Dataset, the type of a composedOf specifies which concept is reused. The illustration below shows how the concept Substance is reused in the concepts Drug and Allergy.

[Figure: Substance concept, reused in the concepts Drug and Allergy]
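The same idea can be sketched with Python dataclasses, where the declared type of an element plays the role of the SPHN type column: both Drug elements and the Allergy element are typed as `Substance`, so the meaning is defined once. This is an illustration under assumptions: the class and field names are hypothetical (in particular `responsible_substance`), and the ATC version string is a placeholder.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Code:
    """Code, name, coding system and version (the Code concept)."""
    identifier: str
    name: str
    coding_system: str
    version: str


@dataclass(frozen=True)
class Substance:
    """Defined once and reused wherever a substance is meant."""
    code: Code
    generic_name: str


@dataclass(frozen=True)
class Drug:
    # Typing both elements as Substance is what marks them as reuses.
    active_ingredient: Substance
    inactive_ingredient: Optional[Substance] = None


@dataclass(frozen=True)
class Allergy:
    # Hypothetical element name for the substance responsible for the reaction.
    responsible_substance: Substance


# ATC code N02BE01 is paracetamol; the version string is a placeholder.
paracetamol = Substance(
    Code("N02BE01", "paracetamol", "ATC", "placeholder-version"), "paracetamol"
)
drug = Drug(active_ingredient=paracetamol)
allergy = Allergy(responsible_substance=paracetamol)

# The same meaning appears in both contexts and compares equal.
print(drug.active_ingredient == allergy.responsible_substance)  # True
```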

Specification of a concept

A concept is an independent element that carries a semantic meaning by itself. Every element used in a composed concept is a concept itself. A concept can refer to a data point (e.g. Data Provider Institute), or it can be an empty container where the data points are all represented by the concept’s composition (e.g. Systemic Arterial Blood Pressure). The elements (properties) of a concept are called composedOf. A composedOf can be based on an already defined concept and it can carry semantic information specific to the concept it is part of.

Unique number

Each concept and each composedOf is identified by a unique ID, which is a 10-digit number. A new unique ID is created for a concept whenever one or more of the following changes are made to it:

  • Name of the concept changes;

  • Description of the concept changes;

  • Meaning binding of the concept changes.

A new unique ID is created for a composedOf whenever one or more of the following changes are made to it:

  • Name of the composedOf changes;

  • Description of the composedOf changes;

  • Meaning binding of the composedOf changes;

  • Value set or subset of the composedOf changes.
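The two lists of triggers can be encoded directly. In this sketch the attribute labels and function names are hypothetical; any change not listed above (e.g. to the type) is assumed not to require a new ID.

```python
import re

# Changes that trigger a new unique ID, as listed above.
CONCEPT_TRIGGERS = frozenset({"name", "description", "meaning binding"})
COMPOSEDOF_TRIGGERS = CONCEPT_TRIGGERS | {"value set or subset"}


def needs_new_id(kind: str, changed: set[str]) -> bool:
    """Decide whether a set of changed attributes requires a new unique ID."""
    if kind == "concept":
        triggers = CONCEPT_TRIGGERS
    elif kind == "composedOf":
        triggers = COMPOSEDOF_TRIGGERS
    else:
        raise ValueError(f"unknown kind: {kind!r}")
    # Assumption: changes outside the trigger list keep the existing ID.
    return not triggers.isdisjoint(changed)


def is_valid_unique_id(value: str) -> bool:
    """A unique ID is a 10-digit number (ASCII digits only)."""
    return re.fullmatch(r"[0-9]{10}", value) is not None


print(needs_new_id("concept", {"value set or subset"}))     # False
print(needs_new_id("composedOf", {"value set or subset"}))  # True
```

Note the asymmetry: a value set change only affects composedOf elements, because in this model value sets are attached to them rather than to the concept.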

Description

For each concept and each composedOf there is a concise description in natural language. The description needs to explain the general (context-independent) meaning of the concept. Since well-formulated descriptions already exist for many biomedical concepts, e.g. in the UMLS Metathesaurus, existing descriptions are reused wherever possible. Descriptions are generally written in lower case. Abbreviations should be avoided unless they appear in the list of abbreviations of the SPHN Dataset. Descriptions can contain examples to illustrate the meaning of the concept.
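Two of these conventions can be checked mechanically. The following lint is a rough sketch, not an official SPHN check: it treats any run of two or more capital letters as an abbreviation, and the allowed abbreviations are passed in as a hypothetical set standing in for the Dataset's list of abbreviations.

```python
import re

# Runs of two or more capital letters are treated as abbreviations.
ABBREVIATION = re.compile(r"\b[A-Z]{2,}\b")


def description_warnings(description: str, known_abbreviations: set[str]) -> list[str]:
    """Flag a capitalised first word and abbreviations missing from the list."""
    warnings = []
    words = description.split()
    first_word = words[0] if words else ""
    # "Generally lower case": a leading abbreviation such as "ATC" is accepted.
    if first_word[:1].isupper() and not first_word.isupper():
        warnings.append("description should start in lower case")
    # dict.fromkeys drops duplicates while keeping first-seen order.
    for abbreviation in dict.fromkeys(ABBREVIATION.findall(description)):
        if abbreviation not in known_abbreviations:
            warnings.append(f"abbreviation not in list: {abbreviation}")
    return warnings


generic_name = ("name of the substance, for not yet approved medications the "
                "international nonproprietary name (INN) of a substance given "
                "by the World Health Organization (WHO)")
print(description_warnings(generic_name, {"INN", "WHO"}))  # []
print(description_warnings(generic_name, set()))
# ['abbreviation not in list: INN', 'abbreviation not in list: WHO']
```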

Semantic type

For each composedOf there is a type indicating what kind of data can be mapped to this composedOf. The following types are used:

  • string: a sequence of characters; used for free text information such as “problem”

  • temporal: any datetime information; used for time points such as assessment dates, start dates or end dates; granularity can vary from seconds (e.g. machine-generated timestamps) to years (e.g. if only the year of birth may be shared within a project); the format should be YYYY, YYYY-MM, YYYY-MM-DD or YYYY-MM-DDThh:mm:ss; alternatively, qualitative information such as “at ICU admission”

  • quantitative: expressing a certain quantity, amount or range; usually a measurement unit is attached to the quantity; technical types can be integer or float

  • qualitative: expressing a certain characteristic with a pre-defined set of options

The type can also be a concept pointing to another concept in the SPHN Dataset, e.g. Code, Body Site.
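The four allowed temporal formats can be validated as sketched below; the function name `temporal_granularity` is hypothetical. The regular expressions enforce the exact shape (including zero padding, which `datetime.strptime` alone would not), and `strptime` then rejects impossible dates. Qualitative temporal values such as “at ICU admission” are deliberately out of scope here.

```python
import re
from datetime import datetime
from typing import Optional

# (shape, strptime format, granularity) for the four allowed formats.
TEMPORAL_FORMATS = [
    (r"[0-9]{4}", "%Y", "year"),
    (r"[0-9]{4}-[0-9]{2}", "%Y-%m", "month"),
    (r"[0-9]{4}-[0-9]{2}-[0-9]{2}", "%Y-%m-%d", "day"),
    (r"[0-9]{4}-[0-9]{2}-[0-9]{2}T[0-9]{2}:[0-9]{2}:[0-9]{2}",
     "%Y-%m-%dT%H:%M:%S", "second"),
]


def temporal_granularity(value: str) -> Optional[str]:
    """Return the granularity of a temporal value, or None if it is not valid."""
    for shape, fmt, granularity in TEMPORAL_FORMATS:
        if re.fullmatch(shape, value):
            try:
                datetime.strptime(value, fmt)  # rejects e.g. 2021-02-30
            except ValueError:
                return None
            return granularity
    # Qualitative values such as "at ICU admission" are not handled here.
    return None


print(temporal_granularity("1985"))                 # year
print(temporal_granularity("2021-03-01T08:15:00"))  # second
print(temporal_granularity("2021-02-30"))           # None
```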

Standards and value sets

Standards are controlled terminologies, classification systems, ontologies or other coding systems that should be used, or that serve as the semantic standard for the value set or subset defined in the column “value set or subset”. For certain concepts, the standard is defined at a general level, without restricting which substructure of the standard to use. In such cases, the column “value set or subset” is empty.

Table 3. Example of concept Unit.

concept: Unit
  description: unit of measurement
  standard: UCUM
  value set or subset: (empty)

For other concepts, the column “value set or subset” defines which substructure of the standard is applicable.

Table 4. Example of concept Body Site in the SPHN Dataset.

concept: Body Site
  description: any anatomical structure, any nonspecific and anatomical site, as well as morphologic abnormalities
composedOf: code
  description: code, name, coding system and version assigned to the body site
  standard: SNOMED CT
  value set or subset: child of: 123037004 |body structure (body structure)|
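A “child of” subset can be checked by walking up the is-a hierarchy. This sketch assumes that “child of” admits any descendant of the root but not the root itself, and it uses a tiny hand-made hierarchy with placeholder identifiers (not real SNOMED CT codes, apart from the root 123037004); real data would come from a SNOMED CT release or a terminology server.

```python
from collections import deque

BODY_STRUCTURE = "123037004"  # root of the Body Site value set

# Hypothetical is-a fragment (child -> parents) with placeholder identifiers.
PARENTS = {
    "heart-structure": {BODY_STRUCTURE},
    "right-atrium-structure": {"heart-structure"},
    "some-procedure": {"procedure-root"},
}


def is_descendant(code: str, ancestor: str, parents: dict[str, set[str]]) -> bool:
    """Breadth-first walk up the is-a links; the ancestor itself does not count."""
    queue = deque(parents.get(code, ()))
    seen: set[str] = set()
    while queue:
        current = queue.popleft()
        if current == ancestor:
            return True
        if current not in seen:
            seen.add(current)
            queue.extend(parents.get(current, ()))
    return False


print(is_descendant("right-atrium-structure", BODY_STRUCTURE, PARENTS))  # True
print(is_descendant("some-procedure", BODY_STRUCTURE, PARENTS))          # False
```

The `seen` set handles concepts with several parents, which SNOMED CT allows, without revisiting shared ancestors.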

In the SPHN Dataset, the version of the standard is stated when the value set of a composedOf has been specified in detail.

Table 5. Example of concept Care Handling in the SPHN Dataset.

concept: Care Handling
  description: describes the relationship between the individual and care provider institute
composedOf: code
  description: code, name, coding system and version assigned to the type of relationship between the individual and care provider institute
  standard: SNOMED CT 2021-01-31
  value set or subset: 394656005 |Inpatient care (regime/therapy)|; 371883000 |Outpatient procedure (procedure)|; 304903009 |Provision of day care (regime/therapy)|; 261665006 |Unknown (qualifier value)|
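The three cases above (whole standard, subset, versioned enumeration) can be captured in one small value set check. In this sketch the class `ValueSet` is hypothetical, `codes=None` stands for an empty “value set or subset” column as for Unit, and rejecting codes from a different release than the stated version is a policy choice of the sketch, not something the Dataset prescribes.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ValueSet:
    standard: str
    version: Optional[str] = None            # stated only for detailed value sets
    codes: Optional[frozenset[str]] = None   # None: whole standard is allowed

    def allows(self, standard: str, code: str, version: Optional[str] = None) -> bool:
        if standard != self.standard:
            return False
        if self.codes is None:
            return True  # no subset defined, e.g. Unit with UCUM
        if self.version is not None and version != self.version:
            return False  # policy choice: require the stated version
        return code in self.codes


CARE_HANDLING_CODE = ValueSet(
    standard="SNOMED CT",
    version="2021-01-31",
    codes=frozenset({"394656005", "371883000", "304903009", "261665006"}),
)
UNIT = ValueSet(standard="UCUM")

print(CARE_HANDLING_CODE.allows("SNOMED CT", "394656005", "2021-01-31"))  # True
print(UNIT.allows("UCUM", "mm[Hg]"))                                      # True
```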

In general, a value set specifies a set of allowed values. Value sets are defined

Meaning binding

With meaning bindings to SNOMED CT concepts and/or LOINC codes, the SPHN Dataset supports machine-readable meaning in addition to the human-readable descriptions. The meaning bindings can be used to identify Uniform Resource Identifiers (URIs), e.g. when using the SPHN Dataset as the basis for developing an RDF (Resource Description Framework) schema.

Table 6. Example of meaning binding in the SPHN Dataset.

concept: Problem Condition
  description: clinical condition, problem, diagnosis, or other event, situation, issue, or clinical concept that has risen to a level of concern
  meaning binding SNOMED CT: 55607006 |Problem (finding)|
  meaning binding LOINC: 44100-6 Medical problem
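Because each binding holds a single code per standard, it maps to exactly one URI. The sketch below uses the SNOMED CT URI form `http://snomed.info/id/<SCTID>`; the LOINC prefix `https://loinc.org/rdf/` is an assumption, so substitute whatever prefix your RDF schema declares. The function name `binding_uris` is hypothetical.

```python
# SNOMED CT URI Standard: http://snomed.info/id/<SCTID>.
# The LOINC prefix is an assumption; use the one your RDF schema declares.
URI_PREFIXES = {
    "SNOMED CT": "http://snomed.info/id/",
    "LOINC": "https://loinc.org/rdf/",
}


def binding_uris(binding: dict[str, str]) -> dict[str, str]:
    """Turn a single-code meaning binding into one URI per standard."""
    return {standard: URI_PREFIXES[standard] + code
            for standard, code in binding.items()}


problem_condition = {"SNOMED CT": "55607006", "LOINC": "44100-6"}
for standard, uri in binding_uris(problem_condition).items():
    print(f"{standard}: <{uri}>")
# SNOMED CT: <http://snomed.info/id/55607006>
# LOINC: <https://loinc.org/rdf/44100-6>
```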

Development process

New concepts and change requests for the SPHN Dataset need to be submitted to the SPHN Data Coordination Center (DCC). The development of new versions of the SPHN Dataset is coordinated by the DCC in close collaboration with the IT experts of the five Swiss university hospitals and the domain experts of the SPHN Driver projects. One to two releases per year are published after approval by the Hospital IT strategy alignment group of SPHN.

Availability and usage rights

The SPHN Dataset is available on the SPHN website.

The SPHN Dataset is under the CC-BY-NC-SA 4.0 License.