USB implementation of SPHN
Data Warehouse at the University Hospital Basel
The University Hospital Basel (USB) has built a Data Warehouse (DWH). The data is stored in a high-performance SAP HANA database, in-memory in column stores. In addition to the DWH, there are various consumers who access data from the DWH. Both are discussed below.
DWH
The DWH is loaded nightly. Execution is fully orchestrated by a Python solution and monitored daily by staff. The three steps of a load—Extraction, Transformation, and Load—are described below. Additionally, the implemented access concept is explained. The technical setup was replicated identically for the University Children’s Hospital Basel (UKBB).
Extraction
Currently, 42 systems are connected, of which 37 use direct database access. Six different database technologies, file imports, and an Active Directory connection are used. Smart Data Adapters facilitate the integration of various databases. Besides full table copying, a partial load criterion (such as an update date) can also be used. The extraction layer currently comprises 13 billion records, 448 GB of data, or 900 tables.
Transformation
Through SQL views, the source data is prepared so users can access it across systems, simply and efficiently. There are 185 entities available, divided into 82 fact and 103 dimension entities.
Load
Since the execution of transformations takes time, these are materialized. All dimensions are historized, meaning it is possible to trace how records have changed over time. Currently, the prepared data includes 5.5 billion records, 241 GB, or 2,500 data attributes.
Access Concept
In principle, a DWH user has access to all tables loaded in the DWH. However, all person-identifying attributes are masked. Per user, a list of patients can be defined, thus restricting data access accordingly.
Consumers
The loaded data is used for various applications.
SQL Access
Selected users or services can execute SQL directly with a personal user account.
Business Intelligence (BI) & Data Insights
Through the SAP BusinessObjects platform, DWH data is provided in a web frontend. Reports, including visualizations, can be created and shared among users. Data can be retrieved on-demand or sent automatically. To enable feasibility studies, a Data Insight platform (MDClone) generates synthetic data based on the DWH data and makes it available. Data queries can be created, enriched with NLP, and shared.