CHUV implementation of SPHN

CHUV research ecosystem

HORUS is the platform at the University Hospital of Lausanne (CHUV) dedicated to the research community. It stands for Hospital Research Unified Data & Analytics Services.

The platform encompasses clinical data (i.e. patient data, metadata, documents, images, IoT data), known as HORUS Data, as well as research applications and services, known as HORUS Analytics & Services (see Figure 1).


Figure 1. Overview of the CHUV research ecosystem

Note

A specific data pipeline (blue arrows) has been developed by CHUV to deliver data releases to the SPHN community.

Key applications and services are:

  • HORUS Data
    • Data integration into Oracle platform (various clinical data sources)

    • Data standardization (cleansing and FAIR transformation)

    • Management of Terminologies (ontologies, …)

    • Management of Data Registries

    • Data and object storage

    • RDF Graph databases

  • HORUS Applications (e.g. HORUS Consent)
    • Research project registration (protocol, DMP, Ethical approval, DTA)

    • Project patient cohorts

    • Project patient pseudo-codification (used for de-identification)

    Other applications: HORUS Explorer, HORUS Restitution, HORUS Images, HORUS Registry

  • HORUS Analytics and Services
    • CHORUS Digital workspace

    • Machine Learning platform (MLOps)

    • Analytics and data visualization tools

  • SPHN Federated Query System (provided by SPHN)
    • TI4Health from Tune Insight

    • Einstein API endpoint

    • Virtuoso Graph Database (see the query sketch after this list)

  • SPHN Data Pipeline (developed by CHUV)
    • Data release generation and delivery for SPHN projects (NDS and DEM)

    • Incremental data delivery for FQS (TI4Health/Einstein)

  • SPHN Connector (provided by SPHN)
    • RDF data conversion based on the SPHN schema, triggered by the CHUV SPHN data pipeline

    • Data quality validation

    • Einstein notifications (patient RDF data)

  • SPHN SETT (provided by SPHN)
    • Data transfer to BioMedIT
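
To illustrate how a Virtuoso graph database loaded with SPHN RDF can be queried, here is a minimal Python sketch using SPARQLWrapper. The endpoint URL is a placeholder and the query shape is an assumption; in the FQS, queries are normally issued through TI4Health and Einstein rather than directly against the database.

```python
from SPARQLWrapper import SPARQLWrapper, JSON

# Hypothetical Virtuoso SPARQL endpoint; the real FQS endpoint and its
# authentication are internal details of the CHUV deployment.
ENDPOINT = "https://fqs.example-chuv.ch/sparql"
SPHN = "https://biomedit.ch/rdf/sphn-schema/sphn#"

# Count distinct subjects in the graph; the exact class used here
# (sphn:SubjectPseudoIdentifier) and the graph layout are assumptions.
query = f"""
PREFIX sphn: <{SPHN}>
SELECT (COUNT(DISTINCT ?subject) AS ?n)
WHERE {{ ?subject a sphn:SubjectPseudoIdentifier . }}
"""

sparql = SPARQLWrapper(ENDPOINT)
sparql.setQuery(query)
sparql.setReturnFormat(JSON)
results = sparql.query().convert()
print(results["results"]["bindings"][0]["n"]["value"])
```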

SPHN Data Provisioning Process

The process involves the following steps (see Figure 2):

  • Data Preparation

  • Data Release Generation

  • Data Release Validation

  • Data Release Delivery


Figure 2. SPHN Data Provisioning at CHUV

Note

CHUV developed a generic SPHN data pipeline in Python, orchestrated by a Jenkins Docker agent. The pipeline integrates the SPHN Connector.

Data analysts and engineers can, if required, customize the default scripts to address data release requirements such as the following (a configuration sketch follows the list):

  • patient cohort definition

  • inclusion and exclusion criteria

  • de-identification rules

  • selection of concepts
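
As a minimal sketch of what such customization points could look like in a Python pipeline, the snippet below models inclusion/exclusion criteria and concept selection as overridable fields of a configuration object. All names (ReleaseConfig, build_cohort, DEFAULT_CONCEPTS) are hypothetical, not the actual CHUV script interfaces.

```python
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical default concept selection; the real CHUV defaults are internal.
DEFAULT_CONCEPTS = ["sphn:Diagnosis", "sphn:DrugAdministrationEvent", "sphn:LabResult"]

@dataclass
class ReleaseConfig:
    project_id: str
    concepts: list = field(default_factory=lambda: list(DEFAULT_CONCEPTS))
    include: Callable[[dict], bool] = lambda patient: True   # inclusion criteria
    exclude: Callable[[dict], bool] = lambda patient: False  # exclusion criteria

def build_cohort(patients: list[dict], cfg: ReleaseConfig) -> list[dict]:
    """Apply the project's inclusion and exclusion criteria to candidate patients."""
    return [p for p in patients if cfg.include(p) and not cfg.exclude(p)]

# Example: adults only, excluding patients who revoked the general consent.
cfg = ReleaseConfig(
    project_id="NDS-demo",
    include=lambda p: p.get("age", 0) >= 18,
    exclude=lambda p: p.get("consent_revoked", False),
)
cohort = build_cohort(
    [{"age": 42}, {"age": 15}, {"age": 70, "consent_revoked": True}], cfg
)
print(len(cohort))  # -> 1
```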

Data standardization

Data originates from an Oracle platform, which combines an Oracle data warehouse (Oracle Health Foundation) with a data lake and provides both structured and unstructured data.

Data standardization is a continuous process, implemented as a daily ETL at the data warehouse level. The challenge consists of making the data FAIR, above all interoperable.

Data standardization components include (a mapping sketch follows the list):

  • Mapping tables (aligning data fields across different data sources)

  • Business rules (ensuring data complies with predefined rules and logic)

  • Terminologies (using standard terminologies to unify data meaning)

  • Metadata (for PACS images, clinical documents, datasets, omics)

  • Clinical data (organizing descriptive metadata and clinical data for easier processing)
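
A minimal sketch of the mapping-table and business-rule components, assuming a dictionary-based lookup from local laboratory codes to LOINC; the codes, field names, and rule below are invented for illustration.

```python
# Hypothetical mapping table aligning a local lab code field with LOINC,
# plus one business rule; real CHUV mapping tables live in the warehouse.
LOINC_MAP = {"NA_SERUM": "2951-2", "K_SERUM": "2823-3"}  # local code -> LOINC

def standardize_lab_row(row: dict) -> dict:
    code = LOINC_MAP.get(row["local_code"])
    if code is None:
        raise ValueError(f"unmapped local code: {row['local_code']}")
    # Business rule: a result must carry a unit to be interoperable.
    if not row.get("unit"):
        raise ValueError("missing unit")
    return {"loinc": code, "value": row["value"], "unit": row["unit"]}

print(standardize_lab_row({"local_code": "NA_SERUM", "value": 140, "unit": "mmol/L"}))
```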

Note

Codification is done at the source for some data systems (e.g. billing and laboratory) or manually by specialists or clinicians.

Data de-identification

The de-identification of the data release is not done by the SPHN Connector, but by the CHUV SPHN data pipeline while preparing the release tables.

Dates are shifted according to the project rules, and pseudo-codes (Subject Pseudo Identifiers) are provided by HORUS Consent (i.e. one unique pseudo-id per patient and per project) via an Oracle function. All technical identifiers (e.g. encounter id) are hashed.
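
The following is a minimal sketch of these two operations in Python, assuming a per-project day offset and a keyed hash. The actual shift rules, the HORUS Consent Oracle function, and key management are internal to CHUV, so every name and value below is illustrative only.

```python
import hashlib
import hmac
from datetime import date, timedelta

# Hypothetical project-level secrets; at CHUV the pseudo-code comes from an
# Oracle function in HORUS Consent and the shift rules are project-specific.
PROJECT_KEY = b"per-project-secret"
DAY_SHIFT = 13  # example offset, in days

def shift_date(d: date, offset_days: int = DAY_SHIFT) -> date:
    """Shift a clinical date by the project/patient offset."""
    return d + timedelta(days=offset_days)

def hash_identifier(identifier: str) -> str:
    """Keyed hash for technical identifiers such as encounter ids."""
    return hmac.new(PROJECT_KEY, identifier.encode(), hashlib.sha256).hexdigest()

print(shift_date(date(2023, 5, 1)))            # -> 2023-05-14
print(hash_identifier("encounter-4711")[:16])  # truncated digest for display
```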

CHUV pipeline and SPHN Connector

The SPHN pipeline can serve both NDS projects and the Federated Query System.

The diagram below outlines the workflow and the scripts of the pipeline (see Figure 3).


Figure 3. CHUV SPHN data pipeline scripts

Note

Standardized and de-identified CHUV data is transformed into JSON documents (one per patient) and ingested into the SPHN Connector, which returns a validated RDF file. The transfer with SETT is not operated by the CHUV pipeline but is performed manually upon request.
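
As a sketch of this ingestion step, the snippet below posts one patient JSON document to a Connector-like REST endpoint. The URL, path, payload structure, and absent authentication are placeholders; the actual SPHN Connector API is described in its own documentation.

```python
import requests

# Placeholder Connector endpoint; consult the SPHN Connector documentation
# for the real API paths and authentication scheme.
CONNECTOR_URL = "http://localhost:8080/ingest"

# Illustrative per-patient JSON document (structure is an assumption).
patient_doc = {
    "sourceSystem": "HORUS",
    "content": {"SubjectPseudoIdentifier": {"identifier": "a1b2c3"}},
}

resp = requests.post(CONNECTOR_URL, json=patient_doc, timeout=30)
resp.raise_for_status()
print(resp.status_code)
```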

Federated Query System and Einstein

The CHUV pipeline can generate RDF data (as N-Quads) and deliver it to Einstein in order to provide patient data to the FQS TI4Health (see Figure 4).


Figure 4. CHUV Federated Query System Architecture

Note

The CHUV SPHN data pipeline is able to load incremental data (one file per patient) as well as remove a patient if necessary (e.g. upon revocation of the general consent).
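
A common way to make per-patient loading and removal cheap is to keep each patient in a dedicated named graph, so that removing a patient amounts to dropping one graph on the server. The sketch below builds such a graph with rdflib and serializes it as N-Quads; the graph IRIs and predicates are assumptions for illustration, not the actual CHUV layout.

```python
from rdflib import Dataset, Literal, Namespace, URIRef
from rdflib.namespace import RDF

SPHN = Namespace("https://biomedit.ch/rdf/sphn-schema/sphn#")

ds = Dataset()
# Hypothetical convention: one named graph per patient, so removing a patient
# (e.g. after consent revocation) means dropping exactly this graph.
graph_iri = URIRef("https://example-chuv.ch/graphs/patient/a1b2c3")
g = ds.graph(graph_iri)

subject = URIRef("https://example-chuv.ch/patient/a1b2c3")
g.add((subject, RDF.type, SPHN.SubjectPseudoIdentifier))
g.add((subject, SPHN.hasIdentifier, Literal("a1b2c3")))

# One N-Quads file per patient, ready to be pushed incrementally.
print(ds.serialize(format="nquads"))
```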