SPHN Dataset
Introduction
The SPHN Dataset contains a number of concepts that enable clinical information to be shared in a standardized way between individual hospitals and with projects. Each concept contains all the information necessary to understand it. A concept refers to recommended value sets and semantic standards (e.g. LOINC, SNOMED CT, ICD-10, CHOP, ATC, ICD-O-3) to express the data. Additionally, SPHN concepts are composed in a modular way so that the same information is expressed in the same way even if the context is different, e.g. a substance can be the substance someone is allergic to or the active ingredient of a drug. The use of international standards as controlled vocabulary (e.g. SNOMED CT, LOINC) fosters semantic interoperability not only within SPHN but also with international partners. Linking to international ontologies also allows us to leverage the domain knowledge present within these standards.
Scope of the SPHN Dataset
The current version of the SPHN Dataset includes, for example, demographic data, administrative cases, diagnoses, procedures, lab results, medications, various measurements and biosample information, as well as domain-specific concepts, e.g. for oncology. The following criteria were applied to include a concept in the SPHN Dataset:
Concepts of general importance for research (e.g. Birth date, Administrative gender, FOPH Diagnosis, Drug administration event)
Concepts relevant for more than one use case in SPHN (e.g. Allergy) or of high importance for other future personalized health projects (e.g. Oncological treatment assessment, TNM classification)
Guiding principles for concept design
Semantic representation and normalization
In the SPHN Dataset, a conceptual modeling approach has been applied where a concept is represented by the elements it is composed of (see Table 1). Each of those elements can itself be a concept, and the elements are separated so that each carries a single meaning. Users are then free to choose the elements of interest for a given project, in the sense of “take what is needed” instead of “all or nothing”.
| concept | Oxygen Saturation | fraction of oxygen present in the blood |
|---|---|---|
| composedOf | saturation | measured oxygen saturation |
| composedOf | datetime | datetime of measurement |
| composedOf | body site | body site of measurement |
| composedOf | method | method of measurement |
| composedOf | unit | unit of oxygen saturation |
An alternative representation would be to create single concepts such as Arterial Oxygen Saturation, Intracardiac Oxygen Saturation, etc. However, in the SPHN Dataset the information about what is measured (saturation) is separated from the information about where it is measured (body site), so that different parts of the meaning are held in different composedOf elements. This allows concepts to be reused.
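The compositional approach can be sketched in a few lines of Python. This is only an illustration and not part of the SPHN Dataset: the classes `Concept` and `ComposedOf` and the method `select` are hypothetical names, and only the element names and descriptions are taken from the Oxygen Saturation example.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ComposedOf:
    """One element of a concept (hypothetical class, for illustration only)."""
    name: str
    description: str


@dataclass
class Concept:
    """A concept and the elements it is composed of (hypothetical class)."""
    name: str
    description: str
    composed_of: list = field(default_factory=list)

    def select(self, *names: str) -> list:
        """Return only the requested elements ("take what is needed")."""
        wanted = set(names)
        return [element for element in self.composed_of if element.name in wanted]


oxygen_saturation = Concept(
    name="Oxygen Saturation",
    description="fraction of oxygen present in the blood",
    composed_of=[
        ComposedOf("saturation", "measured oxygen saturation"),
        ComposedOf("datetime", "datetime of measurement"),
        ComposedOf("body site", "body site of measurement"),
        ComposedOf("method", "method of measurement"),
        ComposedOf("unit", "unit of oxygen saturation"),
    ],
)

# A project interested only in the value and its timestamp picks two elements.
selected = oxygen_saturation.select("saturation", "datetime")
print([element.name for element in selected])
```

Because the body site is a separate element, no dedicated concept such as Arterial Oxygen Saturation is needed in this sketch; the same `Concept` covers every measurement location.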
Controlled vocabulary
To give a concept an unambiguous semantic meaning, it is important to link concepts and the values of a value set to an international controlled vocabulary; this is called meaning binding. It is not necessary to choose only one controlled vocabulary: a concept can be encoded in several standards. For the clinical concepts, we started with meaning bindings to SNOMED CT and LOINC. To assign appropriate codes, the following guiding principles have been applied:
Fit for purpose - binding to a single LOINC or SNOMED CT code (otherwise not usable in URIs);
Suitability instead of completeness - no binding if there is no suitable SNOMED CT concept or LOINC code;
Exact fit - no binding to more or less specific terms, e.g. ICD-O Diagnosis is not bound to 439401001 |Diagnosis (observable entity)|;
Best fit - independent of SNOMED CT's hierarchy.
Reuse of concepts: a meaning defined only once
With the iterative growth of the SPHN Dataset, it has become obvious that certain concepts are used in several therapeutic areas, such as intensive care and cardiology. To prevent a single meaning from being represented twice and differently, we define each meaning (concept) only once. For example, a substance can be the substance someone is allergic to or the active ingredient of a drug. The meaning of substance itself does not change, whether it is the active ingredient of a drug or a substance causing an allergic reaction. Therefore, the meaning of Substance is represented in the Dataset only once.
| | | description | type |
|---|---|---|---|
| concept | Substance | any matter of defined composition that has discrete existence, whose origin may be biological, mineral or chemical | |
| composedOf | code | code, name, coding system and version representing the substance, e.g. ATC or SNOMED CT | Code |
| composedOf | generic name | name of the substance, for not yet approved medications the international nonproprietary name (INN) of a substance given by the World Health Organization (WHO) | string |
In the concept Drug, Substance is reused twice: once as the active ingredient and once as the inactive ingredient. In the concept Allergy, Substance is reused as the substance considered responsible for the allergic reaction.
For concept reuse in the SPHN Dataset, it is the type that specifies which concept is reused. The illustration below shows how the concept Substance is reused in the concepts Drug and Allergy.
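Reuse through the type can also be expressed as a short Python sketch. The class and field names are hypothetical simplifications: the `code` element, which has type Code in the Dataset, is reduced to a plain string, the other elements of Drug and Allergy are omitted, and the code value is a placeholder rather than a real ATC or SNOMED CT code.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Substance:
    """Defined once and reused wherever a substance is meant."""
    code: str  # simplified: of type Code in the Dataset
    generic_name: Optional[str] = None


@dataclass(frozen=True)
class Drug:
    # The field type says which concept is reused: Substance, twice.
    active_ingredient: Substance
    inactive_ingredient: Optional[Substance] = None


@dataclass(frozen=True)
class Allergy:
    # The same Substance concept, in a different context.
    substance: Substance


# Placeholder values for illustration only.
example = Substance(code="EXAMPLE-CODE", generic_name="example substance")
drug = Drug(active_ingredient=example)
allergy = Allergy(substance=example)

# Both contexts rely on the single Substance definition.
print(type(drug.active_ingredient) is type(allergy.substance))
```

The meaning of `Substance` is stated in one place; `Drug` and `Allergy` only add the context in which it is used.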
Specification of a concept
A concept is an independent element that carries a semantic meaning by itself. Every element used in a composed concept is a concept itself. A concept can refer to a data point (e.g. Data Provider Institute), or it can be an empty container where the data points are all represented by the concept’s composition (e.g. Systemic Arterial Blood Pressure).
The elements (properties) of a concept are called composedOf. A composedOf can be based on an already defined concept and it can carry semantic information specific to the concept it is part of.
Unique number
Each concept and each composedOf is identified by a unique ID, which is a 10-digit number. A new unique ID is created for a concept when one or more of the following changes are made to it:
Name of the concept changes;
Description of the concept changes;
Meaning binding of the concept changes.
A new unique ID is created for a composedOf when one or more of the following changes are made to it:
Name of the composedOf changes;
Description of the composedOf changes;
Meaning binding of the composedOf changes;
Value set or subset of the composedOf changes.
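The rules above amount to a simple comparison between two versions of an element. The sketch below assumes a hypothetical `ElementVersion` structure and hypothetical helper names; how IDs are actually generated is not described here, and treating the ID as a string of exactly ten digits is an assumption.

```python
import re
from dataclasses import dataclass, replace
from typing import FrozenSet, Optional


@dataclass(frozen=True)
class ElementVersion:
    """Snapshot of a concept or composedOf (hypothetical structure)."""
    name: str
    description: str
    meaning_binding: Optional[str] = None
    value_set: Optional[FrozenSet[str]] = None  # only compared for a composedOf


def is_well_formed_id(unique_id: str) -> bool:
    """Assumption: the unique ID is written as exactly ten digits."""
    return re.fullmatch(r"[0-9]{10}", unique_id) is not None


def needs_new_id(old: ElementVersion, new: ElementVersion,
                 is_composed_of: bool = False) -> bool:
    """Apply the listed rules: name, description or meaning binding changed;
    for a composedOf, a changed value set or subset also counts."""
    changed = (
        old.name != new.name
        or old.description != new.description
        or old.meaning_binding != new.meaning_binding
    )
    if is_composed_of:
        changed = changed or old.value_set != new.value_set
    return changed


before = ElementVersion(
    name="code",
    description="illustrative description",
    value_set=frozenset({"394656005", "371883000"}),
)
# Extending the value set is a change that only matters for a composedOf.
after = replace(before, value_set=before.value_set | {"304903009"})

print(needs_new_id(before, after, is_composed_of=True))
```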
Description
For each concept and each composedOf there is a concise description in natural language. The description needs to explain the general (context independent) meaning of the concept. Since there are already very well formulated descriptions for biomedical concepts, e.g. in the UMLS Metathesaurus, existing descriptions are reused wherever possible. Descriptions are generally written in lower case. Abbreviations should be avoided unless they are stated in the list of abbreviations in the SPHN Dataset. Descriptions can contain examples to illustrate the meaning of the concept.
Semantic type
For each composedOf there is a type indicating what kind of data can be mapped to this composedOf. The following types are used:
string: a sequence of characters; used for free text information such as “problem”
temporal: any datetime information; used for time points such as assessment dates, start dates or end dates; granularity can vary from seconds (e.g. timestamps of a machine) to years (e.g. if only year of birth is allowed to be shared within a project); format should be YYYY, YYYY-MM, YYYY-MM-DD or YYYY-MM-DDThh:mm:ss; or qualitative information such as “at ICU admission”
quantitative: expressing a certain quantity, amount or range; usually there is a measurement unit attached to the quantity; technical types can be integer, float
qualitative: expressing a certain characteristic with a pre-defined set of options
The type can also be a concept pointing to another concept in the SPHN Dataset, e.g. Code, Body Site.
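The four formats listed for the temporal type can be checked with regular expressions. The following is a minimal sketch with hypothetical names; it checks the shape of the string only and not calendar validity (a day such as 30 February would pass), and qualitative values such as “at ICU admission” are deliberately reported as matching none of the formats.

```python
import re
from typing import Optional

_DATE = r"[0-9]{4}-(0[1-9]|1[0-2])-(0[1-9]|[12][0-9]|3[01])"
_TIME = r"([01][0-9]|2[0-3]):[0-5][0-9]:[0-5][0-9]"

# One pattern per granularity named in the text (names are hypothetical).
_TEMPORAL_PATTERNS = {
    "year": re.compile(r"[0-9]{4}"),
    "month": re.compile(r"[0-9]{4}-(0[1-9]|1[0-2])"),
    "day": re.compile(_DATE),
    "second": re.compile(_DATE + "T" + _TIME),
}


def temporal_granularity(value: str) -> Optional[str]:
    """Return the granularity of a temporal string, or None if no format matches."""
    for name, pattern in _TEMPORAL_PATTERNS.items():
        if pattern.fullmatch(value):
            return name
    return None


for sample in ["1985", "1985-07", "2021-01-31", "2021-01-31T14:05:09", "31.01.2021"]:
    print(sample, "->", temporal_granularity(sample))
```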
Standards and value sets
Standards are controlled terminologies, classification systems, ontologies or other coding systems that should be used, or that serve as the semantic standard, for the value set or subset defined in the column “value set or subset”. For certain concepts, the standard is defined at a general use-case level, without any further restriction on which substructure of the standard to use. In such cases, the column “value set or subset” is empty.
| | | description | standard | value set or subset |
|---|---|---|---|---|
| concept | Unit | unit of measurement | UCUM | |
For certain concepts, the column “value set or subset” defines which substructure of the standard is applicable.
| | | description | standard | value set or subset |
|---|---|---|---|---|
| concept | Body Site | any anatomical structure, any nonspecific and anatomical site, as well as morphologic abnormalities | | |
| composedOf | code | code, name, coding system and version assigned to the body site | SNOMED CT | |
In the SPHN Dataset, the version of the standard is stated when the value set has been specified in detail for a composedOf.
| | | description | standard | value set or subset |
|---|---|---|---|---|
| concept | Care Handling | describes the relationship between the individual and care provider institute | | |
| composedOf | code | code, name, coding system and version assigned to the type of relationship between the individual and care provider institute | SNOMED CT 2021-01-31 | 394656005 \| Inpatient care (regime/therapy)\|; 371883000 \| Outpatient procedure (procedure)\|; 304903009 \| Provision of day care (regime/therapy)\|; 261665006 \| Unknown (qualifier value)\| |
In general, a value set specifies a set of allowed values. Value sets are defined:
for concepts of type qualitative, e.g. principal; secondary; complementary (for diagnosis rank);
for concepts of type Code, e.g. 394656005 | Inpatient care (regime/therapy)|; 371883000 | Outpatient procedure (procedure)|; 304903009 | Provision of day care (regime/therapy)|; 261665006 | Unknown (qualifier value)|;
as a range of allowed SNOMED CT codes, e.g. child of: 122869004 | measurement procedure (procedure) | (for Measurement Procedure - code).
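Checking a value against an enumerated value set is a membership test. The sketch below uses the two enumerated examples from the list above; the variable and function names are hypothetical. The third case, a range such as “child of”, is not covered because it requires access to the SNOMED CT hierarchy, for instance through a terminology service.

```python
# Enumerated value set for a composedOf of type Code (codes and names from the text).
CARE_HANDLING_VALUE_SET = {
    "394656005": "Inpatient care (regime/therapy)",
    "371883000": "Outpatient procedure (procedure)",
    "304903009": "Provision of day care (regime/therapy)",
    "261665006": "Unknown (qualifier value)",
}

# Enumerated value set for a composedOf of type qualitative (diagnosis rank).
DIAGNOSIS_RANK_VALUE_SET = {"principal", "secondary", "complementary"}


def is_allowed(value: str, value_set) -> bool:
    """Return True if the value is one of the allowed values."""
    return value in value_set


print(is_allowed("394656005", CARE_HANDLING_VALUE_SET))
print(is_allowed("tertiary", DIAGNOSIS_RANK_VALUE_SET))
```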
Meaning binding
With meaning binding to SNOMED CT concepts and/or LOINC codes, the SPHN Dataset supports machine-readable meaning in addition to the human-readable descriptions. The meaning bindings can be used to identify Uniform Resource Identifiers (URIs), e.g. when using the SPHN Dataset as the basis for developing an RDF (Resource Description Framework) schema.
| | | description | meaning binding SNOMED CT | meaning binding LOINC |
|---|---|---|---|---|
| concept | Problem Condition | clinical condition, problem, diagnosis, or other event, situation, issue, or clinical concept that has risen to a level of concern | | |
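A meaning binding to a single SNOMED CT code can be turned into a URI by appending the code to a base address. The sketch below assumes the base `http://snomed.info/id/` from the SNOMED CT URI standard; whether an SPHN RDF schema uses exactly this form is not stated here, and the function name is hypothetical. The code used is the Inpatient care code that appears in the Care Handling value set.

```python
# Assumption: base address as defined by the SNOMED CT URI standard.
SNOMED_URI_BASE = "http://snomed.info/id/"


def snomed_uri(code: str) -> str:
    """Build a URI for a single SNOMED CT code given as a string of digits."""
    if not (code.isascii() and code.isdigit()):
        raise ValueError(f"not a numeric SNOMED CT code: {code!r}")
    return SNOMED_URI_BASE + code


print(snomed_uri("394656005"))
```

This also shows why the “fit for purpose” principle asks for a single code per binding: one code maps to exactly one URI.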
Development process
New concepts and change requests to the SPHN Dataset need to be submitted to the SPHN Data Coordination Center (DCC). The development of the new SPHN Dataset is coordinated by the DCC, in close collaboration with the IT experts of the five Swiss university hospitals and the domain experts of the SPHN Driver projects. One to two releases per year are published after approval by the Hospital IT strategy alignment group of SPHN.
Availability and usage rights
The SPHN Dataset is available on the SPHN website.
The SPHN Dataset is under the CC-BY-NC-SA 4.0 License.