Introduction

Note

> Watch the SPHN webinars introducing the Framework and its Toolstack and SPHN Data Ecosystem for FAIR Data!

Warning

If you are not familiar with Semantic Web technologies and their related content (e.g., standards, formats, languages), we strongly encourage you to read the Background Information section in this document.

To advance personalized health research, biomedical researchers benefit from access to a variety of health and related data. The challenge, however, is that data commonly 1) comes from various sources and 2) is available in multiple formats, meaning that the same information can be represented in different ways. For instance, health-related data is produced and collected in hospitals; genomics data is generated in sequencing facilities; citizen data is collected via mobile devices, etc.

This makes it difficult, if not impossible, for researchers to use data effectively in a research project (i.e. it is difficult to collect, connect and understand data from different sources). The Swiss Personalized Health Network (SPHN) initiative was created to overcome this issue. SPHN is a collaborative effort that aims to bring together data providers (e.g. hospitals, health-care providers) and researchers (data users) to produce and reuse data in a coordinated manner.

SPHN Semantic Interoperability Strategy

SPHN is an initiative of the Swiss government that provides a framework for exchanging health-related data in an interoperable way for research.

Note

Interoperable data means that data coming from different sources follow the same norm and are compatible, comparable, linkable and usable together.

To facilitate interoperability, so-called standards and coding systems can be used to represent data. Currently, there are only a few nationally adopted and implemented standards for medical information in hospitals. Some coding systems are used for patient billing and accounting.

To enable and facilitate the use of health-related data from clinical routine and other sources for research, SPHN has developed a semantic interoperability strategy [Gaudet-Blavignac et al. 2021] based on the following three pillars:

  • Pillar 1: SPHN Concepts (semantics)

  • Pillar 2: SPHN RDF Schema (data transport and storage)

  • Pillar 3: Use cases


Figure 1. Semantic interoperability strategy of SPHN.

The Stakeholder Roles

In the SPHN initiative and this documentation, the following stakeholder roles are defined:

Data Provider (alias the ‘data producer’) refers to:

  • Clinical Data Manager: a person who maintains data (typically in a clinical data warehouse) and makes it available to researchers for further use.

Members of a scientific project (alias the ‘data consumers’) refer to:

  • Project Data Manager: a technical expert who prepares or extends data for researchers and specific scientific projects.

  • Researcher (user): a biomedical researcher who needs to access and analyze the biomedical data.

  • Project Leader: the person responsible for a specific research project.

The collaborative effort between data producers (typically in hospitals) and research projects is expressed in the SPHN Data Ecosystem.

The Data Coordination Center (DCC), managed by the Personalized Health Informatics group, coordinates the building of infrastructures to facilitate the exchange between stakeholders.

SPHN Data Ecosystem for FAIR Data

The SPHN initiative has promoted the development of the SPHN Ecosystem [Österle et al. 2021] which encapsulates multiple components to allow exchange and reuse of human-related data (see Figure 2):

The DCC defines the SPHN Dataset, which describes the meaning of the data (i.e. the semantics) to be represented: for example, what an allergy is and which information should be provided to describe a specific allergy. The SPHN Dataset semantically defines health-related concepts (or terms) used in health research in Switzerland (Pillar 1). Note that “dataset” here refers to the metadata of the actual health data to be used, e.g. attributes or field names defined in a clinical health record.

The semantics specified in the SPHN Dataset are then represented in a common format to facilitate data exchange, namely in RDF (Resource Description Framework) (Pillar 2). This is the SPHN RDF Schema that indicates the concepts and rules to follow for generating structured clinical datasets following the FAIR principles.
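As a rough illustration of what representing data in RDF means, the sketch below writes one allergy record as subject-predicate-object triples and serializes them in the N-Triples syntax, using only the Python standard library. The `EX` and `DATA` namespaces and the names `Allergy`, `hasSubjectPseudoIdentifier` and `hasSubstanceName` are hypothetical placeholders chosen for this example; they are not taken from the SPHN RDF Schema.

```python
# Hypothetical namespaces; the real SPHN RDF Schema defines its own IRIs.
EX = "https://example.org/schema#"
DATA = "https://example.org/data/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
XSD_STRING = "http://www.w3.org/2001/XMLSchema#string"


def iri(value):
    """Format an IRI as an N-Triples term."""
    return f"<{value}>"


def literal(value, datatype=XSD_STRING):
    """Format a typed literal, escaping backslashes and double quotes only."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"^^<{datatype}>'


def to_ntriples(triples):
    """Serialize (subject, predicate, object) terms, one statement per line."""
    return "\n".join(f"{s} {p} {o} ." for s, p, o in triples)


allergy = iri(DATA + "allergy-1")
triples = [
    # The record is an instance of the (hypothetical) Allergy concept.
    (allergy, iri(RDF_TYPE), iri(EX + "Allergy")),
    # It is linked to a patient resource and carries a substance name.
    (allergy, iri(EX + "hasSubjectPseudoIdentifier"), iri(DATA + "patient-42")),
    (allergy, iri(EX + "hasSubstanceName"), literal("peanut")),
]

print(to_ntriples(triples))
```

Each printed line is one statement about the allergy record. A real pipeline would normally rely on an RDF library rather than hand-built strings; the helper here does not escape newlines or other control characters.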

To facilitate the use and integration of external (national and international) standard terminologies and classifications, a Terminology Service [Krauss et al. 2021] has been put in place. It automatically transforms existing health-related standards into the RDF format and makes them accessible to both data providers and data users (researchers), who can use these standards to connect and/or represent data.

Semantic and schema extension: Scientific projects have the possibility to extend the semantics defined in the SPHN Dataset and RDF Schema by adding project-related information. This extension process can happen in two ways:

either the project represents the semantics in the SPHN Dataset Template and uses the SPHN Schema Forge webservice to automatically generate the project-specific RDF Schema

or the project builds its schema directly in the RDF format using an editor of its choice.
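Either route ends with a project-specific schema that adds terms on top of the shared ones. The sketch below illustrates that idea with plain tuples as triples, assuming a hypothetical project class `FoodAllergy` declared as a subclass of a hypothetical core class `Allergy`. The `CORE` and `PROJ` namespaces and all class names are assumptions made for this example, not terms from the SPHN RDF Schema; only the `rdfs:subClassOf` IRI is a real RDFS term.

```python
RDFS_SUBCLASS_OF = "http://www.w3.org/2000/01/rdf-schema#subClassOf"

# Hypothetical namespaces standing in for the core and project schemas.
CORE = "https://example.org/core#"
PROJ = "https://example.org/project#"

core_schema = {
    (CORE + "Allergy", RDFS_SUBCLASS_OF, CORE + "Concept"),
}

# The extension only adds statements; it does not modify the core schema.
project_extension = {
    (PROJ + "FoodAllergy", RDFS_SUBCLASS_OF, CORE + "Allergy"),
}

project_schema = core_schema | project_extension


def superclasses(cls, schema):
    """Return every class reachable from cls via rdfs:subClassOf."""
    found = set()
    stack = [cls]
    while stack:
        current = stack.pop()
        for s, p, o in schema:
            if s == current and p == RDFS_SUBCLASS_OF and o not in found:
                found.add(o)
                stack.append(o)
    return found


print(sorted(superclasses(PROJ + "FoodAllergy", project_schema)))
```

Because the project class keeps a core class as its parent, a tool that follows subclass links can still relate project-specific data to the shared concept.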

A Project Data Manager can then send the project-specific RDF Schema to the data providers, who integrate this new schema into their pipelines for transforming clinical data warehouse data and generating RDF data files that comply with the given RDF Schema. Since 2023, data providers can use the SPHN Connector to build a valid SPHN-compliant RDF Schema.

Quality assurance: Any new data generated by a clinical data manager can be checked for quality. This is done automatically when using the SPHN Connector, which includes the Quality Check tool, which in turn integrates the SHACLer and SPARQL queries.
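The kind of rule such a check enforces can be illustrated with a small stand-in written in plain Python. This is not the SHACLer, the Quality Check tool, or the SPHN Connector; it only mimics a minimum-cardinality constraint of the sort that SHACL can express, and the class name, property name and `REQUIRED` rule table are hypothetical.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
EX = "https://example.org/schema#"  # hypothetical namespace

# Hypothetical rule: every Allergy needs at least one substance name.
REQUIRED = {EX + "Allergy": [EX + "hasSubstanceName"]}

data = [
    ("ex:allergy-1", RDF_TYPE, EX + "Allergy"),
    ("ex:allergy-1", EX + "hasSubstanceName", "peanut"),
    ("ex:allergy-2", RDF_TYPE, EX + "Allergy"),  # substance name is missing
]


def check_min_count(triples, required):
    """Return (subject, missing_property) pairs for every violated rule."""
    violations = []
    for subject, predicate, cls in triples:
        # Only type statements for classes that have rules are relevant.
        if predicate != RDF_TYPE or cls not in required:
            continue
        present = {p for s, p, _ in triples if s == subject}
        for prop in required[cls]:
            if prop not in present:
                violations.append((subject, prop))
    return violations


for subject, prop in check_min_count(data, REQUIRED):
    print(f"{subject} is missing {prop}")
```

Here the second allergy record is reported because it has no substance name, while the first one passes.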

Data reuse: Once data is validated, researchers can explore and analyze the data as they need.


Figure 2. Simplified overview of the SPHN Ecosystem. Blue text marks elements developed and maintained by the DCC. Orange text marks tasks to be carried out by the scientific project members. Red text marks tasks to be carried out by the data providers.

References

Österle, S., Touré, V. and Crameri, K. (2021), The SPHN Ecosystem Towards FAIR Data. Preprints, 2021090505 (doi: 10.20944/preprints202109.0505.v1)

Gaudet-Blavignac, C., Raisaro, J.L., Touré, V., Österle, S., Crameri, K. and Lovis, C. (2021), A National, Semantic-Driven, Three-Pillar Strategy to Enable Health Data Secondary Usage Interoperability for Research Within the Swiss Personalized Health Network. JMIR Med Inform, 9(6), e27591 (doi: 10.2196/27591)

Touré, V., Krauss, P., Gnodtke, K., Buchhorn, J., Unni, D., Horki, P., Raisaro, J.L., Kalt, K., Teixeira, D., Crameri, K. and Österle, S. (2023), FAIRification of health-related data using semantic web technologies in the Swiss Personalized Health Network. Scientific Data, 10(1), p.127 (doi: 10.1038/s41597-023-02028-y)