SPHN Dataset
Introduction
The SPHN Dataset contains atomic building blocks, called concepts, that can be used to represent biomedical data and its meaning. With these well-defined concepts, clinical information can be understood the same way across hospitals and projects. Each concept contains all the elements, called composedOfs, needed to understand it. A concept refers to recommended value sets and/or semantic standards (e.g. LOINC, SNOMED CT, ICD-10-GM, CHOP, ATC, ICD-O-3, HGNC, GENO, SO) to express the data. Additionally, SPHN concepts are composed in a modular way so that the same information is expressed in the same way even if the context is different, e.g. a substance can be the substance someone is allergic to or the active ingredient of a drug. The use of internationally recognized standards as controlled vocabulary, such as SNOMED CT and LOINC, fosters semantic interoperability not only within SPHN but also with international partners. Creating links to international ontologies additionally allows us to leverage the domain knowledge represented in these ontologies.
Scope of the SPHN Dataset
The SPHN Dataset includes concepts for core clinical data, such as demographic data, administrative case, diagnoses, procedures, lab results, medications, different measurements and biosample information, as well as concepts for medical specialties, such as oncology and intensive care. The following criteria were applied to include a concept in the SPHN Dataset:
- Concepts of general importance for research (e.g. Birth Date, Administrative Gender, FOPH Diagnosis, Drug Administration Event)
- Concepts relevant for more than one use case in SPHN (e.g. Allergy) or of high importance for other future personalized health projects (e.g. Oncological Treatment Assessment, TNM Classification)
Guiding principles for concept design
Conceptualization and semantic representation
In the SPHN Dataset, a concept is represented by the elements it is composed of (see Table 1). Each of those elements can potentially be a concept itself, and the elements are separated so that each carries a single meaning. Each project can choose the elements of interest according to its research question, in the sense of “take what is needed” instead of “all or nothing”.
| concept | Oxygen Saturation | fraction of oxygen present in the blood |
|---|---|---|
| composedOf | quantity | value and unit of the concept |
| composedOf | measurement datetime | datetime of measurement |
| composedOf | body site | body site where the concept was measured, performed or collected |
| composedOf | measurement method | measurement method of the concept |
An alternative representation would be to create single concepts such as Arterial Oxygen Saturation, Intracardiac Oxygen Saturation, etc. However, in the SPHN Dataset the information about what is measured (quantity) is separated from the information about where it is measured (body site), so that different parts of the meaning are held in different composedOf elements. This allows concepts to be reused.
The principle of reuse requires generalized descriptions, such as “value and unit of the concept”. Such general descriptions are accompanied by contextualized descriptions that explain to the reader what each composedOf means within the concept.
| concept | Oxygen Saturation | fraction of oxygen present in the blood |
|---|---|---|
| composedOf | saturation | measured oxygen saturation, and unit |
| composedOf | measurement datetime | datetime of measurement |
| composedOf | body site | body site of measurement |
| composedOf | measurement method | method of measurement |
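The composition shown in the two Oxygen Saturation tables can be sketched as a small data structure. This is only an illustration: the class names `ComposedOf` and `Concept`, their attribute names and the `select` helper are assumptions made for this example and are not part of the SPHN Dataset. The names and descriptions are taken from the tables above.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class ComposedOf:
    """One element of a concept, with its general and contextualized wording."""
    general_name: str
    general_description: str
    contextualized_name: str
    contextualized_description: str


@dataclass(frozen=True)
class Concept:
    """A concept and the elements it is composed of (hypothetical model)."""
    name: str
    description: str
    composed_of: Tuple[ComposedOf, ...] = ()

    def select(self, *names: str) -> "Concept":
        """Keep only the composedOfs a project needs ("take what is needed")."""
        kept = tuple(c for c in self.composed_of if c.contextualized_name in names)
        return Concept(self.name, self.description, kept)


oxygen_saturation = Concept(
    name="Oxygen Saturation",
    description="fraction of oxygen present in the blood",
    composed_of=(
        ComposedOf("quantity", "value and unit of the concept",
                   "saturation", "measured oxygen saturation, and unit"),
        ComposedOf("measurement datetime", "datetime of measurement",
                   "measurement datetime", "datetime of measurement"),
        ComposedOf("body site",
                   "body site where the concept was measured, performed or collected",
                   "body site", "body site of measurement"),
        ComposedOf("measurement method", "measurement method of the concept",
                   "measurement method", "method of measurement"),
    ),
)

# A project interested only in the value and the body site:
subset = oxygen_saturation.select("saturation", "body site")
print([c.contextualized_name for c in subset.composed_of])
```

Selecting a subset leaves the concept name and description untouched, which mirrors the idea that a project picks elements without changing what the concept means.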
Knowledge-centric design
SPHN adopts a knowledge-centric approach to the design of concepts. By “knowledge-centric” we mean that concepts should be designed to represent either a process or an entity. If one is modeling something that occurs over a period of time, has an input and generates an output, one would model it as a Process. For example, a Measurement (such as Heart Rate, Blood Pressure, Oxygen Saturation), an Electrocardiographic Procedure, and an Adverse Event can be considered Processes. If one is modeling something that is an input and/or an output to a Process, one would model it as an Entity. Sample, Outcome, Result, and File are examples of Entities.
Applied to concept design, one should therefore follow a logical train of thought from past to present. Such chronological order comes naturally when describing the series of events resulting in data generation. In general, some kind of input undergoes a process or event to result in an output.
For example, let's consider a patient who undergoes an electrocardiographic procedure, which results in an electrocardiogram.

Figure 1. Electrocardiographic Procedure
The figure above illustrates, at a very high level, a Process concept that has an input Entity and an output Entity. One level below is the Procedure concept (a type of Process), which has an input Patient (a.k.a. Subject Pseudo Identifier) and an output Result. One level further down is the Electrocardiographic Procedure concept (a type of Procedure), with an input Patient and an output Electrocardiogram.
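The three levels described for Figure 1 can be written down as a minimal sketch. The class and attribute names (`Entity`, `Process`, `input_entity`, `output_entity`) are chosen for this illustration only and do not reflect how SPHN implements its concepts.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    """Something that is an input and/or an output of a Process."""
    name: str


@dataclass(frozen=True)
class Process:
    """Something that occurs over time, takes an input and generates an output."""
    input_entity: Entity
    output_entity: Entity


@dataclass(frozen=True)
class Procedure(Process):
    """A type of Process; in the text its input is a Patient, its output a Result."""


@dataclass(frozen=True)
class ElectrocardiographicProcedure(Procedure):
    """A type of Procedure whose output is an Electrocardiogram."""


ecg = ElectrocardiographicProcedure(
    input_entity=Entity("Patient"),
    output_entity=Entity("Electrocardiogram"),
)

# The specific procedure is still a Procedure and a Process.
print(isinstance(ecg, Procedure), isinstance(ecg, Process))
```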
Domain independence
Data concepts used in several medical specialties, such as Body Site and Intent, or general concepts, such as Quantity, should be defined in a general manner. This allows the reuse of concepts in different contexts, e.g. the reuse of the concept Quantity in Body Height and Age. The general description of the Quantity concept and its elements is universal, as shown in the table below, and therefore reuse is possible.
| concept | Quantity | an amount or a number of something |
|---|---|---|
| composedOf | value | countable amount of something, e.g. 300 |
| composedOf | unit | unit of the amount, e.g. mL, mg, min |
| composedOf | comparator | qualifier describing imprecise values |
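As a rough illustration of this reuse, the sketch below defines Quantity once and uses it in Body Height and Age. The Python class names, the reduction of both concepts to a single `quantity` attribute, the sample values and the encoding of the comparator as a string such as ">" are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Quantity:
    """An amount or a number of something."""
    value: float                      # countable amount, e.g. 300
    unit: str                         # unit of the amount, e.g. "mL", "mg", "min"
    comparator: Optional[str] = None  # qualifier for imprecise values (assumed encoding)


@dataclass(frozen=True)
class BodyHeight:
    quantity: Quantity  # reuses Quantity unchanged


@dataclass(frozen=True)
class Age:
    quantity: Quantity  # same definition, different context


# Sample values are invented; "a" is the UCUM code for year.
height = BodyHeight(Quantity(value=172.0, unit="cm"))
age = Age(Quantity(value=89.0, unit="a", comparator=">"))

print(type(height.quantity) is type(age.quantity))
```

Because both concepts point to the same Quantity definition, code written against Quantity works for either of them.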
Controlled vocabulary
International controlled vocabularies provide unambiguous semantic meaning for concepts. Linking the SPHN concepts to such controlled vocabularies, also referred to as meaning binding, provides expressions that are both machine-readable and human-readable. For the meaning binding, we do not limit ourselves to a single controlled vocabulary: a concept can be encoded in several standards. For example, clinical concepts are bound to SNOMED CT concepts and/or to LOINC codes, and genomic concepts can use genomic-specific terminologies such as HGNC, GENO and SO. For further guidance on meaning binding, please refer to the meaning binding section.
In addition, controlled vocabularies are used as standards for value set definitions. That means that, instead of value set definitions such as 1=male; 2=female; 33=other, the value set for the SPHN Administrative Gender concept contains the following values coming from the controlled vocabulary SNOMED CT: 446151000124109 |Identifies as male gender (finding)|; 446141000124107 |Identifies as female gender (finding)|; 74964007 |Other (qualifier value)|. By using a standard vocabulary for value sets, the SPHN Dataset supports interoperability with datasets from other initiatives or organizations that use a controlled vocabulary, such as SNOMED CT.
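The following sketch shows how locally coded values could be translated into the SNOMED CT value set quoted above. The local codes 1, 2 and 33 are the illustrative ones from the text; the dictionary names and the `to_snomed` helper are hypothetical, and the mapping between local and SNOMED CT codes is an assumption of this example.

```python
# SNOMED CT value set of the SPHN Administrative Gender concept, as quoted in the text.
ADMINISTRATIVE_GENDER_VALUE_SET = {
    "446151000124109": "Identifies as male gender (finding)",
    "446141000124107": "Identifies as female gender (finding)",
    "74964007": "Other (qualifier value)",
}

# Hypothetical local coding (1=male; 2=female; 33=other) mapped to the value set.
LOCAL_TO_SNOMED = {
    1: "446151000124109",
    2: "446141000124107",
    33: "74964007",
}


def to_snomed(local_code: int) -> str:
    """Return the SNOMED CT code and term for a local gender code."""
    if local_code not in LOCAL_TO_SNOMED:
        raise ValueError(f"no mapping for local code {local_code!r}")
    code = LOCAL_TO_SNOMED[local_code]
    return f"{code} |{ADMINISTRATIVE_GENDER_VALUE_SET[code]}|"


print(to_snomed(2))
```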
Reuse of concepts: a meaning defined only once
With the iterative growth of the SPHN Dataset, it becomes obvious that certain concepts are used in several medical specialties, such as intensive care and cardiology. To prevent a single meaning from being represented twice and differently, we define each meaning (concept) only once. For example, a body site can be the body site where a heart rate was measured, the body site where oxygen saturation was measured, or the body site where the patient felt pain. The meaning of body site itself does not change, regardless of whether a heart rate was measured there or another measurement or procedure was performed there. Therefore, the meaning of Body Site is represented in the Dataset only once.
| | | general description | type |
|---|---|---|---|
| concept | Body Site | any anatomical structure, any nonspecific and anatomical site, as well as morphologic abnormalities | |
| composedOf | code | code, name, coding system and version describing the concept | Code |
| composedOf | laterality | localization with respect to the side of the body | Laterality |
The Body Site concept is reused several times in the SPHN Dataset. For example, in the concepts Heart Rate and Oxygen Saturation, Body Site describes the body site where the heart rate or oxygen saturation was measured. In the concept Procedure, it indicates the body site where the procedure was performed.
Figure 2. Concept Body Site and its reuse.
For concept reuse in the SPHN Dataset, it is the type that specifies which concept is reused. The illustration below shows the concept Body Site and its reuse in the concepts Procedure and Heart Rate by setting Body Site as a type.

Figure 3. Concept Body Site with its composedOfs and its reuse.
Semantic inheritance
Semantic inheritance is a mechanism where a specific concept can be derived from a broader concept. The specific concept (child concept) and the broad concept (parent concept) have a hierarchical relationship and share common composedOfs. For example, a diagnosis in general has a datetime when it was recorded, and any specific diagnosis, such as ICD-O diagnosis, also has the datetime information of when it was recorded. Therefore, both concepts share the same composedOf record datetime.
The following graphic illustrates the hierarchical relationship between the parent concept Diagnosis and its child concepts FOPH Diagnosis, Nursing Diagnosis and ICD-O Diagnosis.

Figure 4. Concept Diagnosis and its child concepts.
In the SPHN Dataset, inheritance is represented by the child concept having the parent concept as its type. The elements that the child concept inherits from the parent concept are marked as inherited. The child concept inherits all composedOfs from the parent concept, but it can have additional composedOfs.

Figure 5. Concept FOPH Diagnosis inheriting elements from the Diagnosis concept.
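Class inheritance in a programming language is a close analogy to this mechanism. In the sketch below, only the shared composedOf record datetime comes from the text; the extra `rank` attribute on the child, the class names and the `inherited_fields` helper are assumptions for illustration.

```python
from dataclasses import dataclass, fields
from typing import List


@dataclass(frozen=True)
class Diagnosis:
    """Parent concept: every diagnosis has a record datetime."""
    record_datetime: str


@dataclass(frozen=True)
class FOPHDiagnosis(Diagnosis):
    """Child concept: inherits record_datetime and adds its own element."""
    rank: str  # hypothetical additional composedOf


@dataclass(frozen=True)
class ICDODiagnosis(Diagnosis):
    """Child concept with no additional elements in this sketch."""


def inherited_fields(child: type, parent: type) -> List[str]:
    """List the child's fields that are defined on the parent."""
    parent_names = {f.name for f in fields(parent)}
    return [f.name for f in fields(child) if f.name in parent_names]


print(inherited_fields(FOPHDiagnosis, Diagnosis))
```

The helper plays the role of the "inherited" marker: it separates the elements a child takes over from its parent from those it adds itself.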
Meaning preservation
Existing concepts in the SPHN Dataset can be adapted to project requirements. However, the meaning of the concept itself should not change. For example, the Blood Pressure concept is described as “blood pressure measured either in the artery, in the vein, or in the pulmonary circulation”. This concept description is still valid and does not change when a new composedOf body position is added.
| concept | Blood Pressure | blood pressure measured either in the artery, in the vein, or in the pulmonary circulation |
|---|---|---|
| composedOf | systolic pressure | measured systolic pressure value and unit |
| composedOf | diastolic pressure | measured diastolic pressure value and unit |
| composedOf | mean pressure | measured average pressure value and unit |
| composedOf | measurement datetime | datetime of measurement |
| composedOf | body site | body site where the concept was measured, performed or collected |
| composedOf | measurement method | measurement method of the concept |

| concept | Blood Pressure | blood pressure measured either in the artery, in the vein, or in the pulmonary circulation |
|---|---|---|
| composedOf | systolic pressure | measured systolic pressure value and unit |
| composedOf | diastolic pressure | measured diastolic pressure value and unit |
| composedOf | mean pressure | measured average pressure value and unit |
| composedOf | measurement datetime | datetime of measurement |
| composedOf | body site | body site where the concept was measured, performed or collected |
| composedOf | measurement method | measurement method of the concept |
| composedOf | body position | body position associated to the concept |
Specification of a concept
A concept is an independent element that carries a semantic meaning by itself. Every element used in a composed concept is a concept itself. A concept can refer to a data point (e.g. Data Provider Institute), or it can be an empty container where the data points are all represented by the concept’s composition (e.g. Blood Pressure). The elements (properties) of a concept are called composedOf. A composedOf can be based on an already defined concept and it can carry semantic information specific to the concept it is part of.
IRI - Internationalized Resource Identifier
Each concept and each composedOf is uniquely identified by an IRI. The IRI is a resolvable versioned URL (Uniform Resource Locator) pointing to a website where details of the current version of the concept or composedOf can be found.
Concept name
There is a general and a contextualized concept name, which can be different for composedOfs. The general concept name aims to provide unique and consistent naming across the complete dataset, so that elements with the same meaning can be recognized independently of the context in which they are used. The contextualized concept name aims to provide a more specific name that makes the composedOf understandable within the particular concept where it is used; it mainly serves human understanding. As an example, the contextualized composedOf name “encounter identifier” is related to the general composedOf name “identifier”.
Description
For each concept and composedOf there is a concise description in natural language. The description needs to explain the general (context independent) meaning of the concept. Since there are already very well formulated descriptions for biomedical concepts, e.g. in the UMLS Metathesaurus, existing descriptions are reused wherever possible. Abbreviations should be avoided unless they are stated in the list of abbreviations in the SPHN Dataset. Descriptions can contain examples to illustrate the meaning of the concept.
| general concept name | general description | contextualized concept name | contextualized description |
|---|---|---|---|
| identifier | unique identifier identifying the concept | encounter identifier | a unique pseudonymized encounter ID for the given data delivery/research purpose |
Semantic type
For each composedOf there is a type indicating what kind of data can be mapped to this composedOf. The following types are used:
- string: a sequence of characters; used for free text information such as “problem” in the Problem Condition concept,
- temporal: any datetime information; used for time points such as assessment dates, start dates or end dates; granularity can vary from seconds (e.g. timestamps of a machine) to years (e.g. if only year of birth is allowed to be shared within a project); format should be: YYYY, YYYY-MM, YYYY-MM-DD or YYYY-MM-DDThh:mm:ss,
- quantitative: expressing a certain value of a Quantity; technical types can be integer or float,
- qualitative: expressing a certain characteristic with a pre-defined set of options, which are not expressed with controlled vocabulary (yet).
The type can also be a concept pointing to another concept in the SPHN Dataset, e.g. Code, Body Site.
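The four accepted formats for the temporal type can be checked with a short routine. This is a minimal sketch under the assumption that a value must match one of the listed formats exactly; the function name `parse_temporal` and the granularity labels it returns are invented for this example.

```python
import re
from datetime import datetime

# Accepted formats from the text, from coarsest to finest granularity.
_TEMPORAL_FORMATS = [
    (re.compile(r"\d{4}"), "%Y", "year"),
    (re.compile(r"\d{4}-\d{2}"), "%Y-%m", "month"),
    (re.compile(r"\d{4}-\d{2}-\d{2}"), "%Y-%m-%d", "day"),
    (re.compile(r"\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}"), "%Y-%m-%dT%H:%M:%S", "second"),
]


def parse_temporal(text: str):
    """Return (datetime, granularity) for an accepted temporal value.

    Raises ValueError if the text matches none of the accepted formats or
    is not a valid calendar date or time.
    """
    for pattern, fmt, granularity in _TEMPORAL_FORMATS:
        if pattern.fullmatch(text):
            return datetime.strptime(text, fmt), granularity
    raise ValueError(f"not an accepted temporal format: {text!r}")


print(parse_temporal("1985"))
print(parse_temporal("2021-03-15T08:30:00"))
```

Returning the granularity alongside the parsed value keeps the information that, for example, only the year of birth was shared, which would otherwise be lost once the value is stored as a full datetime.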
Standards and value sets
“Standards” are controlled terminologies, classification systems, ontologies or other coding systems. They are to be used to represent the data in an interoperable way, i.e. they serve as semantic standards for value set definitions. Value set definitions can be broad, medium-detailed or detailed. For a broad value set definition only the standard is stated in the “standards” column and the column “value set or subset” is empty. Medium-detailed definitions refer to a substructure of a standard. Detailed definitions contain a finite set of qualitative options or codes from a controlled vocabulary. The following examples illustrate the difference between these three types of definitions.
| | | description | standard | value set or subset |
|---|---|---|---|---|
| concept | Unit | unit of measurement | | |
| composedOf | code | code, name, coding system and version describing the concept | UCUM | |
| | | description | standard | value set or subset |
|---|---|---|---|---|
| concept | Body Site | any anatomical structure, any nonspecific and anatomical site, as well as morphologic abnormalities | | |
| composedOf | code | code, name, coding system and version describing the concept | SNOMED CT | descendant of: 123037004 \|body structure (body structure)\| |
| | | description | standard | value set or subset |
|---|---|---|---|---|
| concept | Care Handling | describes the relationship between the individual and care provider institute | | |
| composedOf | code | code, name, coding system and version describing the type of the concept | SNOMED CT | 394656005 \|Inpatient care (regime/therapy)\|; 371883000 \|Outpatient procedure (procedure)\|; 304903009 \|Provision of day care (regime/therapy)\| |
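The three levels of value set definition in the examples above can be told apart mechanically. The sketch below is one possible representation; the class `ValueSetDefinition`, its attributes and its methods are assumptions for illustration, while the standards and codes are those of the Unit, Body Site and Care Handling examples.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional


@dataclass(frozen=True)
class ValueSetDefinition:
    standard: str                        # content of the "standards" column
    subset: Optional[str] = None         # reference to a substructure of the standard
    codes: FrozenSet[str] = frozenset()  # finite set of allowed codes

    def level(self) -> str:
        """Classify the definition as broad, medium-detailed or detailed."""
        if self.codes:
            return "detailed"
        if self.subset:
            return "medium-detailed"
        return "broad"

    def allows(self, code: str) -> Optional[bool]:
        """Check a code against a detailed set; None if it cannot be decided locally."""
        if self.codes:
            return code in self.codes
        return None  # would require access to the standard itself


unit_code = ValueSetDefinition("UCUM")
body_site_code = ValueSetDefinition("SNOMED CT", subset="descendant of: 123037004")
care_handling_code = ValueSetDefinition(
    "SNOMED CT",
    codes=frozenset({"394656005", "371883000", "304903009"}),
)

for definition in (unit_code, body_site_code, care_handling_code):
    print(definition.standard, definition.level())
```

Only detailed definitions can be validated without consulting the terminology; for broad and medium-detailed definitions the sketch deliberately returns `None` rather than guessing.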
Detailed value set definitions must not contain values that overlap in meaning. An overlap in meaning would be, for example, mixing information about the type of surgery with regard to a minimally invasive or open approach with the access route (access through body site) chosen by the surgeon.
| | | standard | value set or subset |
|---|---|---|---|
| composedOf | surgery type | SNOMED CT | 129236007 \|Open approach - access (qualifier value)\|; 103388001 \|Percutaneous approach - access (qualifier value)\|; 129220005 \|Transaxillary approach (qualifier value)\| |
| | | standard | value set or subset |
|---|---|---|---|
| composedOf | surgery access type | SNOMED CT | 129236007 \|Open approach - access (qualifier value)\|; 103388001 \|Percutaneous approach - access (qualifier value)\| |
Value sets are defined for concepts of type:
- qualitative, e.g. for diagnosis rank: principal; secondary; complementary;
- Code, e.g. for care handling code: 394656005 |Inpatient care (regime/therapy)|; 371883000 |Outpatient procedure (procedure)|; 304903009 |Provision of day care (regime/therapy)|
Meaning binding
SPHN concepts with clinical or other meaning are associated with an international standard (e.g. SNOMED CT, LOINC or other) by a so-called meaning binding. These meaning bindings support the machine readability of the concepts and allow researchers to use the clinical knowledge contained in these terminologies in their research projects.
There are several criteria to consider in meaning binding, and the following guiding principles help to understand concept selection and find meaning bindings for new concepts:
- Fit for purpose - binding to a single concept or code from an external terminology (otherwise not usable in URIs);
- Suitability instead of completeness - no binding if there is no suitable concept or code from an external terminology;
- Exact fit - no binding to more or less specific terms, e.g. ICD-O Diagnosis is not bound to 439401001 |Diagnosis (observable entity)|
- LOINC - don’t use panel or group codes
- SNOMED CT
  - Use procedure codes for procedure concepts, e.g. 29303009 |Electrocardiographic procedure (procedure)|
  - Use observable entity codes for observables, e.g. 397155001 |Body position (observable entity)|
- Avoid same code for different items in the SPHN Dataset.
The following example illustrates how meaning bindings are stated in the SPHN Dataset. In the example, there is a meaning binding to a SNOMED CT concept and a meaning binding to a LOINC code.
| | | description | meaning binding |
|---|---|---|---|
| concept | Problem Condition | clinical condition, problem, diagnosis, or other event, situation, issue, or clinical concept that has risen to a level of concern | SNOMED CT: 55607006 \|Problem (finding)\|; LOINC: 44100-6 Medical problem |
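One of the guiding principles, avoiding the same code for different items, lends itself to an automated check. The sketch below is illustrative only: the function `shared_codes` and the dictionary layout are invented for this example, the binding of Electrocardiographic Procedure to 29303009 is assumed from the example code given in the principles, and the entry "Hypothetical Duplicate" exists purely to trigger the check.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Binding = Tuple[str, str]  # (standard, code)


def shared_codes(bindings: Dict[str, List[Binding]]) -> Dict[Binding, List[str]]:
    """Return every binding that is used by more than one item."""
    users: Dict[Binding, List[str]] = defaultdict(list)
    for item, item_bindings in bindings.items():
        for binding in item_bindings:
            users[binding].append(item)
    return {b: items for b, items in users.items() if len(items) > 1}


meaning_bindings = {
    # From the Problem Condition example: one SNOMED CT and one LOINC binding.
    "Problem Condition": [("SNOMED CT", "55607006"), ("LOINC", "44100-6")],
    # Assumed binding, based on the procedure code quoted in the principles.
    "Electrocardiographic Procedure": [("SNOMED CT", "29303009")],
    # Invented entry that reuses a code on purpose.
    "Hypothetical Duplicate": [("SNOMED CT", "29303009")],
}

print(shared_codes(meaning_bindings))
```

Keying on the pair of standard and code, rather than on the code alone, avoids false alarms when two terminologies happen to use the same identifier string.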
Development process
Requests to add a new concept and/or make changes to the SPHN Dataset need to be submitted to the SPHN Data Coordination Center (DCC). The SPHN Dataset is being developed in collaboration with experts from the five Swiss university hospitals, SPHN National Data Stream (NDS) data managers, and clinical and genomic experts, with the DCC coordinating the development. One to two releases per year are published after approval by the Hospital IT strategy alignment group of SPHN.
Availability and usage rights
The SPHN Dataset is available on the SPHN website.
The SPHN Dataset is under the CC-BY 4.0 License.
For any question or comment, please contact the Data Coordination Center (DCC) at dcc@sib.swiss.