CHUV implementation of SPHN
CHUV research ecosystem
HORUS is the platform at the University Hospital of Lausanne (CHUV) dedicated to the research community. It stands for Hospital Research Unified Data & Analytics Services.
The platform encompasses clinical data (i.e. patient data, metadata, documents, images, IoT), also known as HORUS Data, and research applications or services, also known as HORUS Analytics & Services (see Figure 1).
Figure 1. Overview of the CHUV research ecosystem
Note
A specific data pipeline (blue arrows) has been developed by CHUV to deliver data releases to the SPHN community.
Key applications and services are:
- HORUS Data
Data integration into the Oracle platform (various clinical data sources)
Data standardization (cleansing and FAIR transformation)
Management of Terminologies (ontologies, …)
Management of Data Registries
Data and object storage
RDF graph databases
- HORUS Applications (e.g. HORUS Consent)
Research project registration (protocol, DMP, ethical approval, DTA)
Project patient cohorts
Project patient pseudo-codification (used for de-identification)
Other applications: HORUS Explorer, HORUS Restitution, HORUS Images, HORUS Registry
- HORUS Analytics and Services
CHORUS Digital workspace
Machine Learning platform (MLOps)
Analytics and data visualization tools
- SPHN Federated Query System (provided by SPHN; see the example query sketch after this list)
TI4Health from Tune Insight
Einstein API endpoint
Virtuoso graph database
- SPHN Data Pipeline (developed by CHUV)
Data release generation and delivery for SPHN projects (NDS and DEM)
Incremental data delivery for FQS (TI4Health/Einstein)
- SPHN Connector (provided by SPHN)
RDF data conversion based on the SPHN schema, triggered by the CHUV SPHN data pipeline
Data quality validation
Einstein notifications (patient RDF data)
- SPHN SETT (provided by SPHN)
Data transfer to BioMedIT
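To make the Federated Query System item above concrete, here is a hedged sketch of querying a Virtuoso SPARQL endpoint from Python; the endpoint URL and the SPHN namespace IRI are assumptions for illustration, not the actual FQS configuration.

```python
# Hypothetical FQS-style query against a Virtuoso SPARQL endpoint.
# The endpoint URL and the SPHN namespace IRI below are assumptions.
from SPARQLWrapper import JSON, SPARQLWrapper

sparql = SPARQLWrapper("https://example.org/sparql")  # assumed endpoint
sparql.setReturnFormat(JSON)
sparql.setQuery("""
    PREFIX sphn: <https://biomedit.ch/rdf/sphn-schema/sphn#>
    SELECT (COUNT(DISTINCT ?s) AS ?patients)
    WHERE { ?s a sphn:SubjectPseudoIdentifier . }
""")

# Count the subject pseudo-identifiers visible at this endpoint.
result = sparql.query().convert()
print(result["results"]["bindings"][0]["patients"]["value"])
```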
SPHN Data Provisioning Process
The process involves the following steps (see Figure 2):
Data Preparation
Data Release Generation
Data Release Validation
Data Release Delivery
Figure 2. SPHN Data Provisioning at CHUV
Note
CHUV developed a generic SPHN data pipeline in Python, orchestrated by a Jenkins Docker agent. The pipeline integrates the SPHN Connector.
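As an illustration of the four steps above, the following is a minimal Python sketch of a staged pipeline runner; the stage names, function bodies, and entry point are hypothetical, not CHUV's actual scripts or Jenkins configuration.

```python
"""Hypothetical sketch of a staged SPHN data pipeline runner."""
import logging

def prepare_data(project_id: str) -> None:
    ...  # build the de-identified release tables for the project cohort

def generate_release(project_id: str) -> None:
    ...  # export one JSON document per patient for the SPHN Connector

def validate_release(project_id: str) -> None:
    ...  # run data quality checks on the generated RDF

def deliver_release(project_id: str) -> None:
    ...  # hand the validated release over for transfer (e.g. via SETT)

STAGES = [prepare_data, generate_release, validate_release, deliver_release]

def run_pipeline(project_id: str) -> None:
    """Run all stages in order; a Jenkins agent would call this per project."""
    for stage in STAGES:
        logging.info("Running %s for project %s", stage.__name__, project_id)
        stage(project_id)

if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    run_pipeline("DEMO-PROJECT")  # hypothetical project identifier
```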
Data analysts and engineers can, if required, customize the default scripts to address data release requirements such as the following (see the configuration sketch after the list):
patient cohort definition
inclusion and exclusion criteria
de-identification rules
selection of concepts
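For illustration only, a per-project release configuration could look like the sketch below; all keys and values are hypothetical and do not reflect CHUV's actual scripts.

```python
# Hypothetical per-project release configuration (illustrative only).
RELEASE_CONFIG = {
    "project_id": "DEMO-PROJECT",
    "cohort": {
        # Patient cohort definition with inclusion/exclusion criteria.
        "inclusion": ["general consent signed"],
        "exclusion": ["age < 18"],
    },
    "deidentification": {
        # De-identification rules applied while preparing release tables.
        "date_shift": "per-project offset",
        "hash_identifiers": ["encounter_id"],
    },
    # Selection of SPHN concepts to include in the release.
    "concepts": ["BirthDate", "AdministrativeSex", "LabTestEvent"],
}
```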
Data standardization
Data originates from an Oracle platform that combines an Oracle data warehouse (Oracle Health Foundation) and a data lake, providing structured and unstructured data.
Data standardization is a continuous process, performed as a daily ETL at the data warehouse level.
The challenge consists in standardizing data by making it FAIR, in particular interoperable.
Data standardization components include (a small mapping example follows the list):
Mapping tables (aligning data fields across different data sources)
Business rules (ensuring data complies with predefined rules and logic)
Terminologies (using standard terminologies to unify data meaning)
Metadata (for PACS images, clinical documents, datasets, omics)
Clinical data (organizing descriptive metadata and clinical data for easier processing)
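To illustrate the first two components, here is a minimal sketch of a mapping table and a business rule; the source systems, local codes, and rule are hypothetical, not CHUV's actual mappings (the LOINC code shown is the standard code for serum/plasma glucose).

```python
# Illustrative standardization helpers; source systems and local codes
# are hypothetical, not CHUV's actual mappings.

# Mapping table: align source-specific lab codes to a standard
# terminology (here LOINC 2345-7, glucose in serum or plasma).
LAB_CODE_MAP = {
    ("LAB_SYSTEM_A", "GLU"): "2345-7",
    ("LAB_SYSTEM_B", "GLUC"): "2345-7",
}

def standardize_lab_code(source: str, local_code: str) -> str:
    """Map a local lab code to its standard terminology code."""
    return LAB_CODE_MAP[(source, local_code)]

# Business rule: a quantitative lab result must carry a unit.
def check_has_unit(result: dict) -> bool:
    return bool(result.get("unit"))

assert standardize_lab_code("LAB_SYSTEM_A", "GLU") == "2345-7"
assert check_has_unit({"value": 5.4, "unit": "mmol/L"})
```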
Note
Codification is done at the source for some data systems (e.g. billing and laboratory) or manually by specialists or clinicians.
Data de-identification
The de-identification of the data release is not done by the SPHN Connector but by the CHUV SPHN data pipeline while preparing the release tables.
Dates are shifted according to the project rules, and pseudo-codes (Subject Pseudo Identifiers) are provided by HORUS Consent (i.e. one unique pseudo-ID per patient and project) through an Oracle function.
All identifiers (e.g. encounter ID) are “hashed”. A minimal sketch of these operations follows.
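As an illustration only, a minimal Python sketch of date shifting and identifier hashing, assuming a per-project secret and a fixed offset; the actual Oracle function, project rules, and pseudo-code scheme are not shown here.

```python
# Illustrative de-identification sketch; the secret and offset are
# hypothetical, not CHUV's actual implementation.
import hashlib
import hmac
from datetime import date, timedelta

PROJECT_SECRET = b"per-project-secret"   # hypothetical, kept server-side
DATE_SHIFT = timedelta(days=-13)         # hypothetical project offset

def shift_date(d: date) -> date:
    """Shift a clinical date by the project-defined offset."""
    return d + DATE_SHIFT

def hash_identifier(identifier: str) -> str:
    """Replace an identifier (e.g. an encounter ID) with a keyed hash."""
    return hmac.new(PROJECT_SECRET, identifier.encode(), hashlib.sha256).hexdigest()

print(shift_date(date(2024, 5, 1)))      # 2024-04-18
print(hash_identifier("ENC-0001")[:16])  # stable pseudonymous token
```

A keyed hash (HMAC) is shown rather than a plain hash so that identifiers cannot be re-derived by hashing known values without the project secret; whether CHUV uses a keyed or plain hash is not stated here.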
CHUV pipeline and SPHN Connector
The SPHN pipeline can serve both NDS projects and the Federated Query System.
The diagram below outlines the workflow and the scripts of the pipeline (see Figure 3).
Figure 3. CHUV SPHN data pipeline scripts
Note
Standardized and de-identified CHUV data is transformed into one JSON document per patient and ingested into the SPHN Connector to obtain a validated RDF file (a sketch of the per-patient JSON layout follows this note).
The transfer with SETT is not operated by the CHUV pipeline; it is performed manually upon request.
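For illustration, a hedged sketch of writing one JSON document per patient; the field names and layout are assumptions, not the SPHN Connector's actual input format.

```python
# Hypothetical per-patient JSON release writer; field names are
# illustrative, not the SPHN Connector's actual input schema.
import json
from pathlib import Path

def build_patient_document(pseudo_id: str, concepts: list[dict]) -> dict:
    """Bundle the standardized, de-identified concepts of one patient."""
    return {"subject_pseudo_identifier": pseudo_id, "concepts": concepts}

def write_release(out_dir: Path, patients: dict[str, list[dict]]) -> None:
    """Write one JSON file per patient, as in the release layout above."""
    out_dir.mkdir(parents=True, exist_ok=True)
    for pseudo_id, concepts in patients.items():
        doc = build_patient_document(pseudo_id, concepts)
        (out_dir / f"{pseudo_id}.json").write_text(json.dumps(doc, indent=2))

write_release(Path("release"), {
    "PSEUDO-001": [{"concept": "LabTestEvent", "value": 5.4, "unit": "mmol/L"}],
})
```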
Federated Query System and Einstein
The CHUV pipeline can generate RDF data (as N-Quads) for Einstein in order to provide patient data to the FQS TI4Health (see Figure 4).
Figure 4. CHUV Federated Query System Architecture
Note
The CHUV SPHN data pipeline is able to load incremental data (one file per patient) as well as remove a patient if necessary (e.g. when the general consent is revoked). A minimal sketch follows.
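As an illustration, a hedged sketch of producing per-patient N-Quads with rdflib; the namespace IRI, graph naming, and properties are assumptions, not the actual CHUV or Einstein conventions.

```python
# Illustrative N-Quads generation with rdflib; the namespace IRI,
# graph naming, and properties below are assumptions.
from rdflib import Dataset, Literal, Namespace, URIRef
from rdflib.namespace import RDF

SPHN = Namespace("https://biomedit.ch/rdf/sphn-schema/sphn#")  # assumed IRI

ds = Dataset()
# One named graph per patient keeps incremental loading and
# per-patient removal (e.g. on consent revocation) simple.
patient_graph = ds.graph(URIRef("https://example.org/patient/PSEUDO-001"))
subject = URIRef("https://example.org/subject/PSEUDO-001")
patient_graph.add((subject, RDF.type, SPHN.SubjectPseudoIdentifier))
patient_graph.add((subject, SPHN.hasIdentifier, Literal("PSEUDO-001")))

# One .nq file per patient, matching the incremental delivery above.
ds.serialize(destination="PSEUDO-001.nq", format="nquads")
```

Removing a patient then amounts to dropping that patient's named graph from the triple store.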