To find out more about the SPHN Semantic Strategy and the SPHN Ecosystem, watch the seminars on "Linked data in SPHN" and "SPHN Data Ecosystem for FAIR Data".
The Data challenge
To advance personalized health research, biomedical researchers benefit from access to a variety of health and related data. The challenge, however, is that data commonly 1) comes from various sources and 2) is available in multiple formats. For instance, health-related data is produced and collected in hospitals; genomics data is generated in sequencing facilities; citizen data is collected via mobile devices, etc. These sources typically use different formats, meaning that the same kind of data can be represented in different ways.
This makes it difficult, if not impossible, for researchers to effectively use data in a research project (i.e. it is difficult to collect, connect and understand data coming from different sources). The Swiss Personalized Health Network (SPHN) initiative was launched to overcome this issue. SPHN is a collaborative effort that aims to bring together both data providers (e.g. hospitals, health-care providers) and researchers (data users) to produce and reuse data in a coordinated manner.
SPHN Semantic Interoperability Strategy
SPHN is an initiative of the Swiss government that provides a framework for exchanging health-related data in an interoperable way for research.
Interoperable data means that data coming from different sources follow the same norm and are compatible, comparable, linkable and usable together.
To facilitate interoperability, standards and coding systems can be used to represent data. Currently, there are only a few nationally adopted and implemented standards for medical information in hospitals. Some coding systems are used for patient billing and accounting.
To enable and facilitate the use of health-related data from clinical routine and other sources for research, SPHN has developed a semantic interoperability strategy [Gaudet-Blavignac et al. 2021] based on the following three pillars:
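As a small sketch of why shared coding systems matter: two hypothetical sources record the same observation with different field names and local codes, and a mapping table brings both into a common form. The source names, record layouts, local codes and the `STD:` code are invented for illustration and do not correspond to any real standard or to SPHN tooling.

```python
# Hypothetical mapping from (source, local code) to a shared code.
# None of these codes belong to a real coding system.
CODE_MAP = {
    ("hospital_a", "GLU"): "STD:0001",
    ("hospital_b", "glucose_mmol"): "STD:0001",
}


def harmonize(source, record, code_field, value_field):
    """Convert a source-specific record into a common representation."""
    local_code = record[code_field]
    return {
        "code": CODE_MAP[(source, local_code)],  # KeyError if the code is unmapped
        "value": float(record[value_field]),     # normalize strings and numbers
        "source": source,
    }


# The same kind of measurement, stored differently at each source.
a = harmonize("hospital_a", {"test": "GLU", "result": "5.4"}, "test", "result")
b = harmonize("hospital_b", {"analyte": "glucose_mmol", "val": 5.9}, "analyte", "val")

# After harmonization, both records carry the same code and can be compared.
comparable = a["code"] == b["code"]
print(comparable)
```

Once both records share a code and a value type, they can be linked and analyzed together, which is the practical meaning of "compatible, comparable, linkable and usable together".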
Pillar 1: Semantic representation
Pillar 2: Data transport and storage
Pillar 3: Use cases
Figure 1. Semantic interoperability strategy of SPHN.
The Stakeholder Roles
In the SPHN initiative and this documentation, the following stakeholder roles are defined:
Data Provider (alias the ‘data producer’) refers to:
Clinical Data Manager (in a hospital): a person who maintains data (typically, in a Clinical Data Warehouse) and makes it available for further use to Researchers
Members of a scientific project (alias the ‘data consumers’) refer to:
Project Data Manager: a technical expert who prepares or extends data for researchers and specific scientific projects.
Researcher (user): a biomedical researcher who needs to access and analyze the biomedical data
Project Leader: the person responsible for a specific research project.
The collaborative effort between data producers (typically in hospitals) and research projects is expressed in the SPHN Data Ecosystem.
SPHN Data Ecosystem for FAIR Data
The SPHN initiative has promoted the development of the SPHN Ecosystem [Österle et al. 2021], which comprises multiple components that allow the exchange and reuse of human-related data (see Figure 2):
The Data Coordination Center (DCC) defines the SPHN Dataset, a high-level “data model” describing the meaning of the data (semantics) to be represented. For example, it specifies what an allergy is and which kind of information should be provided to describe it. In more detail, the SPHN Dataset semantically defines health-related concepts (or terms) used in health research in Switzerland (Pillar 1). Note that “dataset” here refers to the metadata of the actual health data to be used, e.g. attributes or field names defined in a clinical health record.
The high-level “data model” and metadata specified in the SPHN Dataset are then represented in a common format to facilitate data exchange, namely RDF (Resource Description Framework) (Pillar 2). The result is the SPHN RDF Schema, which specifies the concepts and rules to follow for generating structured clinical datasets in line with the FAIR principles.
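As a rough illustration of what representing a concept in RDF means, the sketch below encodes one allergy record as subject-predicate-object triples and serializes them as N-Triples using only the Python standard library. The namespace `https://example.org/sphn-demo#`, the class `Allergy` and the property names are hypothetical placeholders and are not taken from the actual SPHN RDF Schema.

```python
# Hypothetical namespace and terms, for illustration only.
DEMO = "https://example.org/sphn-demo#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
XSD_DATE = "http://www.w3.org/2001/XMLSchema#date"


def iri(value):
    """Format an IRI as an N-Triples term."""
    return f"<{value}>"


def literal(value, datatype=None):
    """Format a literal as an N-Triples term, escaping backslashes and quotes."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    term = f'"{escaped}"'
    return f"{term}^^<{datatype}>" if datatype else term


def to_ntriples(triples):
    """Serialize (subject, predicate, object) terms, one triple per line."""
    return "\n".join(f"{s} {p} {o} ." for s, p, o in triples)


allergy = iri(DEMO + "allergy-001")
triples = [
    # The resource is an instance of the (hypothetical) Allergy class.
    (allergy, iri(RDF_TYPE), iri(DEMO + "Allergy")),
    # Plain string literal.
    (allergy, iri(DEMO + "hasSubjectPseudoIdentifier"), literal("patient-42")),
    # Typed literal carrying an xsd:date datatype.
    (allergy, iri(DEMO + "hasFirstRecordDate"), literal("2021-03-15", XSD_DATE)),
]

print(to_ntriples(triples))
```

In practice an RDF library would handle serialization and the full set of escaping rules; the hand-written serializer here only keeps the example dependency-free.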
For facilitating the use and integration of external (national and international) standard terminologies and classifications, a Terminology Service [Krauss et al. 2021] has been put in place to automatically transform existing health-related standards into the RDF format and make them accessible to both data providers and data users (researchers) for connecting and/or representing data with these standards.
Semantic and schema extension: Scientific projects have the possibility to extend the semantics defined in the SPHN Dataset and RDF Schema by adding project-related information. This extension process can happen in two ways:
either the project represents the semantics in the SPHN Dataset Template and uses the SPHN Schema Forge webservice to automatically generate the project-specific RDF Schema
or the project builds its schema directly in the RDF format, using the RDF Schema Template as a starting point to extend the SPHN RDF Schema into a project-specific RDF Schema.
A Project Data Manager can then send the project-specific RDF Schema to the data providers, who integrate this new schema into their pipelines for transforming clinical data warehouse data and generating RDF data files that comply with the given RDF Schema. Since 2023, data providers have the possibility to use the SPHN Connector to build a valid SPHN-compliant RDF Schema.
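Conceptually, a project-specific schema is the base schema plus project-defined additions that attach to existing terms. The sketch below models both as sets of triples and rejects a new class whose parent is not declared anywhere. All names (`demo:`, `proj:`, `DrugAllergy`, `extend_schema`) are hypothetical, and this does not reflect how the SPHN Schema Forge or the RDF Schema Template work internally.

```python
RDF_TYPE = "rdf:type"
RDFS_SUBCLASS = "rdfs:subClassOf"
OWL_CLASS = "owl:Class"

# Hypothetical base schema standing in for a shared schema.
base_schema = {
    ("demo:Concept", RDF_TYPE, OWL_CLASS),
    ("demo:Allergy", RDF_TYPE, OWL_CLASS),
    ("demo:Allergy", RDFS_SUBCLASS, "demo:Concept"),
}

# Hypothetical project-specific additions.
project_additions = {
    ("proj:DrugAllergy", RDF_TYPE, OWL_CLASS),
    ("proj:DrugAllergy", RDFS_SUBCLASS, "demo:Allergy"),
}


def declared_classes(schema):
    """Return every subject that is declared as a class."""
    return {s for s, p, o in schema if p == RDF_TYPE and o == OWL_CLASS}


def extend_schema(base, additions):
    """Return the union of base and additions.

    Raises ValueError if an added subclass link points to a parent class
    that is declared neither in the base schema nor in the additions.
    """
    merged = base | additions
    known = declared_classes(merged)
    for s, p, o in additions:
        if p == RDFS_SUBCLASS and o not in known:
            raise ValueError(f"{s} extends unknown class {o}")
    return merged


project_schema = extend_schema(base_schema, project_additions)
print(sorted(declared_classes(project_schema)))
```

Keeping the base schema untouched and only adding triples means data that was valid against the base schema stays interpretable under the project-specific one.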
Quality assurance: Note that any new data that is generated by a Clinical Data Manager needs to be checked for quality and, in particular, if it complies with the corresponding schema. Data can be validated with the Quality Assurance Framework mainly composed of the SHACLer and the SPARQLer tools. The Quality Assurance Framework tools are integrated in the SPHN Connector.
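To make the idea of schema compliance concrete, the sketch below performs one kind of check that shape-based validation typically covers: every instance of a class must carry a given set of properties. It is a plain-Python stand-in written for illustration, not the SHACLer, the SPARQLer or the SPHN Connector, and the class names, property names and the `REQUIRED` rule are hypothetical.

```python
from collections import defaultdict

RDF_TYPE = "rdf:type"

# Hypothetical rule: each Allergy instance needs both properties.
REQUIRED = {
    "demo:Allergy": {"demo:hasSubjectPseudoIdentifier", "demo:hasSubstanceCode"},
}

# Hypothetical data: allergy-2 lacks a substance code.
data = [
    ("ex:allergy-1", RDF_TYPE, "demo:Allergy"),
    ("ex:allergy-1", "demo:hasSubjectPseudoIdentifier", "patient-42"),
    ("ex:allergy-1", "demo:hasSubstanceCode", "code-A"),
    ("ex:allergy-2", RDF_TYPE, "demo:Allergy"),
    ("ex:allergy-2", "demo:hasSubjectPseudoIdentifier", "patient-43"),
]


def validate(triples, required):
    """Return a sorted list of (subject, missing_property) violations."""
    types = defaultdict(set)  # subject -> classes it is an instance of
    props = defaultdict(set)  # subject -> properties it uses
    for s, p, o in triples:
        if p == RDF_TYPE:
            types[s].add(o)
        else:
            props[s].add(p)

    violations = []
    for subject, classes in types.items():
        for cls in classes:
            missing = required.get(cls, set()) - props[subject]
            violations.extend((subject, prop) for prop in missing)
    return sorted(violations)


report = validate(data, REQUIRED)
for subject, prop in report:
    print(f"{subject} is missing {prop}")
```

An empty report means the data passed this particular check; real validation covers many more constraints, such as datatypes, cardinalities and allowed value sets.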
Data reuse: Once data has been validated and has passed quality assurance checks, Researchers can explore and analyze the data as they need.
Figure 2. Simplified overview of the SPHN Ecosystem. Blue text marks elements developed and maintained by the DCC. Orange text marks tasks to be carried out by the scientific project members. Red text marks tasks to be carried out by the data providers.
Österle, S.; Touré, V.; Crameri, K. (2021), The SPHN Ecosystem Towards FAIR Data. Preprints, 2021090505 (doi: 10.20944/preprints202109.0505.v1)
Gaudet-Blavignac C, Raisaro JL, Touré V, Österle S, Crameri K, Lovis C (2021), A National, Semantic-Driven, Three-Pillar Strategy to Enable Health Data Secondary Usage Interoperability for Research Within the Swiss Personalized Health Network, JMIR Med Inform 9 (6), e27591 (doi: 10.2196/27591)
Touré, V., Krauss, P., Gnodtke, K., Buchhorn, J., Unni, D., Horki, P., Raisaro, J.L., Kalt, K., Teixeira, D., Crameri, K. and Österle, S. (2023), FAIRification of health-related data using semantic web technologies in the Swiss Personalized Health Network. Scientific Data, 10(1), p.127 (doi: 10.1038/s41597-023-02028-y)