SPHN export application at USB
Note
The following documentation about implementation is provided by Universitätsspital Basel (USB). For additional information or questions, please contact the SPHN DCC FAIR Data Team at fair-data-team@sib.swiss.
Introduction
The USB SPHN export application transforms the USB data of the data warehouse into the SPHN export format. The application generates files in the RDF format by concepts as specified either in the SPHN or a project specific ontology. Data export is restricted to a pre-specified set of patients and time period defined for each patient individually depending on the requirement (inclusion criteria) of a project.
The USB export application is based on Python and SQL programming language. Within the Python code, the data generation process is triggered and controlled. The data generation and manipulation statements itself are almost exclusively run within the data warehouse database HANA as SQL statements. The SQL statements are either predefined statements or generated within the Python code.
Application Overview
Python application
Read in the ontologies, transformation to tabular format, export to DWH
Read in project configuration files. These files define which concepts have to be exported for a specific project, the ontologies used, the path for RDF files, whether pseudonymisation should be applied, the DB schema of the project, etc.
Deploy the concept SQL statements for the SPHN transformation layer of the DWH. These statements are ontology version specific.
The components of the Python application are described in more detail in the next chapter.
Data Warehouse (DWH) (Hana DB)
Analysis Layer: data of the clinical application stored in tables in relational format
SPHN transformation layer: based on views defined SQL statements. Restructures DWH analysis layer to meet structure, naming, datatype, encoding of SPHN data.
SPHN Mapping: Mapping tables to transform internal codes to codes required by SPHN, for example Snomed-CT, Loinc codes
Project space: DB schema for a specific project where data are generated in the SPHN format.
Pseudonymization Service: Provides the pseudonymized ID’s for patient and case and the time shift in a consistent, reproducible way.
Data Quality Control
Data Quality check to referential integrity (linking/URI of concepts) done within USB export application
Further checks running the SPHN Quality Framework.
Additional SPARQL queries run on RDF data imported into GraphDb.
Upload data to Biomed-IT node
Using sett tool provided by SPHN, currently manual process.
USB export application: Python Program
Examples of SQL view statement for SPHN transformation layer
Corner stone of the USB SPHN export solution is the SPHN transformation layer of the DWH. It consists of views written in SQL that transform the USB data structure of the DWH into the SPHN data structure. It includes the mapping to the appropriate codes, naming of attributes and the pseudonymization (placeholders that will be filled in during runtime). Multiple views can be used for one concept, for example for the concept SimpleScore each score (GCS, SOFA, …) could have its own view.
The SPHN data structures deviates from a typical relational data model used for analysis purposes as it is quite normalized. For example each quantities and units of a measurement have to go into separate concepts. To make sure only instances of concepts are exported that are needed for a certain project, the views for the such concepts are generated automatically within Python program. They are based on the views for the concepts that have to be exported for a certain project.
In the example of SPHN concept BodyWeight, Quantity (hasQuantity_hasValue, hasQuantity_hasUnit) and Unit (hasQuantity_hasUnit_hasCode) are integrated into BodyWeight concept. The naming of the attributes are derived from the ontology of the concepts Quantity and Unit. The SQL statement is generated automatically within the Python application based on the information provided in the ontology.
CREATE OR REPLACE VIEW V_SPHN_BODY_WEIGHT AS
SELECT
VVMT_BK AS id,
weight.PAT_BK,
weight.CAS_BK,
/*[select pseudonymisation]*/
VVMT_MEASURE_DATE_TS AS OBS_BEG_DATE,
add_days(VVMT_MEASURE_DATE_TS,/*[add_days pseudonymisation]*/ 0) AS hasMeasurementDateTime,
substring(VVMT_NUMERIC_VALUE, 1,6 ) ||'_'|| VIP_UNIT AS hasQuantity,
substring(VVMT_NUMERIC_VALUE, 1,6 ) AS hasQuantity_hasValue,
VIP_UNIT AS hasQuantity_hasUnit,
'ucum:'||VIP_UNIT AS hasQuantity_hasUnit_hasCode,
NULL hasBodySite
FROM CDWH.V_IL_FCT_VITAL_VALUES_METAVISION AS weight
JOIN CDWH.V_IL_DIM_VITAL_PARAMETER_CID AS vital_dim
ON weight.VIP_BK =vital_dim.VIP_BK
AND vital_dim.vip_name = 'Gewicht Aktuell'
/*[join pseudonymisation]*/
WHERE VVMT_NUMERIC_VALUE > 0