KSB Implementation of SPHN 
==========================

1. Technical Architecture: SPHN Data Integration and Preparation
----------------------------------------------------------------

This document describes the end-to-end architecture implemented at KSB for preparing, curating, and delivering patient data that complies with SPHN standards.
The core infrastructure uses a hybrid deployment model, with pseudonymization performed on-premise and all further processing performed in Azure, to protect data privacy while remaining scalable and secure.

1.1 Architecture Foundation
---------------------------

.. list-table::
   :widths: 10 15 75
   :header-rows: 1

   * - Component
     - Technology
     - Description
   * - Data Platform
     - Databricks (on Azure)
     - Primary environment for processing, transformation, and curation of pseudonymized data. Transformations are implemented using SQL, Spark, and Python, and are orchestrated via dbt (data build tool) for version control, testing, and quality assurance. Data governance and access control across the data lake are managed centrally using Unity Catalog.
   * - Cloud Environment
     - Microsoft Azure (Switzerland Region) 
     - Hosts the Databricks workspace, virtual machines (VMs), and secure storage (Data Lake/ADLS). Crucially, only pseudonymized data is transferred to and stored in Azure; no personally identifiable information (PII) is ever exposed to the cloud environment.
   * - Data Model
     - Based on the FHIR standard
     - The target data model structure is aligned with FHIR, ensuring interoperability and facilitating conversion to SPHN standards and other international requirements.


1.2 Data Preparation and Curation Pipeline (Medallion Architecture)
-------------------------------------------------------------------

The data pipeline follows a structured, multi-stage Medallion Architecture (Bronze, Silver, Gold/Serving) to move data from raw source material to the curated, standardized SPHN view. 


.. image:: ../images/ksb/KSB_Infrastructure.png
   :width: 1000px
   :align: center
   :alt: KSB data curation pipeline (Medallion Architecture)

**Figure 1. High-Level Data Curation Pipeline - Medallion Architecture (Bronze, Silver, Gold)**

**Stage 1: Ingestion and Pseudonymization (On-Premise)**

* **Action:** Data is loaded from on-premise source systems via direct, reverse-engineered connections.
* **Process:** Pseudonymization is performed as part of a complex, secure upload process before data is transferred to the Azure platform. The local Concordance Table (mapping internal IDs to pseudonymized IDs) is securely stored and managed on-premise, ensuring that no PII is ever exposed to the cloud or to Microsoft.
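
The role of the Concordance Table can be illustrated with a minimal Python sketch. It is not KSB's actual upload process, which is not specified here: the function ``pseudonymize``, the field name ``patient_id``, and the use of random tokens are assumptions made for illustration only.

```python
import secrets


def pseudonymize(records, concordance, id_field="patient_id"):
    """Replace the internal ID in each record with a random pseudonym.

    `concordance` maps internal IDs to pseudonyms and is updated in place.
    In this sketch it stands in for the on-premise Concordance Table.
    """
    output = []
    for record in records:
        internal_id = record[id_field]
        if internal_id not in concordance:
            # Random token: the pseudonym cannot be derived from the internal ID.
            concordance[internal_id] = secrets.token_hex(16)
        pseudonymized = dict(record)  # copy, so the source rows stay untouched
        pseudonymized[id_field] = concordance[internal_id]
        output.append(pseudonymized)
    return output


concordance = {}  # hypothetical in-memory table; it stays on-premise
source_rows = [
    {"patient_id": "P-001", "heart_rate": 72},
    {"patient_id": "P-002", "heart_rate": 88},
    {"patient_id": "P-001", "heart_rate": 75},
]
upload_rows = pseudonymize(source_rows, concordance)
```

Only ``upload_rows`` would leave the on-premise environment in this sketch; the same patient always receives the same pseudonym, so records remain linkable after upload.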

**Stage 2: Bronze Layer (Raw Data Persisting)**

* **Description:** Stores the raw, source-system data (now pseudonymized) as-is, providing a historical, immutable snapshot for lineage and auditing.
* **Focus:** Persistence of raw data without transformation.

**Stage 3: Silver Layer (Quality and Metadata)**

* **Description:** Data is cleaned, standardized, and enriched. Transformations are managed via dbt models, utilizing SQL, Spark, and Python logic.
* **Focus:** Adding system metadata, performing initial data quality checks, and handling basic schema enforcement. 
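
As a rough illustration of the Silver-layer focus, the following plain-Python sketch applies basic schema enforcement and attaches system metadata. The production logic lives in dbt models; the schema, the metadata column names (``_source_system``, ``_loaded_at``), and the function ``to_silver`` are hypothetical.

```python
from datetime import datetime, timezone

# Hypothetical expected schema for one source table.
EXPECTED_SCHEMA = {"patient_id": str, "heart_rate": int}


def to_silver(bronze_rows, source_system):
    """Split rows into schema-conformant rows (with metadata) and rejects."""
    accepted, rejected = [], []
    loaded_at = datetime.now(timezone.utc).isoformat()
    for row in bronze_rows:
        # A row conforms if it has exactly the expected columns and types.
        conforms = set(row) == set(EXPECTED_SCHEMA) and all(
            isinstance(row[col], col_type)
            for col, col_type in EXPECTED_SCHEMA.items()
        )
        if not conforms:
            rejected.append(row)
            continue
        accepted.append(
            {**row, "_source_system": source_system, "_loaded_at": loaded_at}
        )
    return accepted, rejected


bronze = [
    {"patient_id": "a1b2", "heart_rate": 72},
    {"patient_id": "c3d4", "heart_rate": "n/a"},  # wrong type
    {"patient_id": "e5f6"},                       # missing column
]
silver, rejects = to_silver(bronze, source_system="monitoring")
```

Keeping the rejected rows instead of dropping them silently makes the quality checks auditable against the Bronze snapshot.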

**Stage 4: RDV to BDV Transformation (KSB Data Model)**

* **Action:** This is the crucial transformation step that bridges the gap between source system architecture and the institutional KSB Data Model. This complex logic is fully implemented and governed by dbt.
* **RDV (Raw Data Vault) to BDV (Business Data Vault):** Data is moved away from the operational system logic and structured according to KSB's unified enterprise data model.

**Stage 5: Serving Layer (Gold, SPHN…)**

* **Action:** Data is mapped from the KSB BDV model to the final, curated SPHN concepts, using reference tables for LOINC, SNOMED CT, CHOP, and other terminologies.
* **Output:** The data is structured into defined SPHN views (the "Serving Layer"), making it ready for querying and export. 
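
The reference-table lookup behind this mapping can be sketched in Python as follows. The local codes (``HB``, ``GLU``), the row layout, and the function ``to_serving`` are hypothetical; only the two LOINC codes are real. The actual mapping is implemented in dbt against KSB's own reference tables.

```python
# Hypothetical reference table: local lab codes -> LOINC codes.
LOCAL_TO_LOINC = {
    "HB": "718-7",    # Hemoglobin [Mass/volume] in Blood
    "GLU": "2345-7",  # Glucose [Mass/volume] in Serum or Plasma
}


def to_serving(bdv_rows, reference=LOCAL_TO_LOINC):
    """Map local codes to LOINC; collect rows without a mapping separately."""
    mapped, unmapped = [], []
    for row in bdv_rows:
        loinc_code = reference.get(row["local_code"])
        if loinc_code is None:
            unmapped.append(row)  # needs a new reference-table entry
            continue
        mapped.append(
            {
                "patient_id": row["patient_id"],
                "loinc_code": loinc_code,
                "value": row["value"],
                "unit": row["unit"],
            }
        )
    return mapped, unmapped


bdv_rows = [
    {"patient_id": "a1b2", "local_code": "HB", "value": 13.5, "unit": "g/dL"},
    {"patient_id": "a1b2", "local_code": "XYZ", "value": 1.0, "unit": "mg/dL"},
]
lab_results, unmapped = to_serving(bdv_rows)
```

Rows that cannot be mapped are returned separately rather than exported with a local code, so gaps in the reference tables become visible.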

1.3 SPHN Request and Export Workflow 
------------------------------------

.. list-table::
   :widths: 10 75
   :header-rows: 1

   * - Step
     - Detail
   * - Cohort Creation
     - Based on the SPHN request criteria, queries are executed against the SPHN Serving Layer to identify and select the correct patient cohort.
   * - Data Export
     - The necessary data elements for the selected cohort are exported as individual CSV files, one per SPHN concept (e.g., condition, lab result, encounter).
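
The two steps above can be sketched in Python as a cohort filter followed by a per-concept CSV export. This is an illustrative sketch, not the queries run on Databricks: the selection criterion (``year``), the concept names, and both function names are assumptions.

```python
import csv
import tempfile
from pathlib import Path


def select_cohort(encounters, min_year):
    """Return pseudonymized patient IDs matching a hypothetical criterion."""
    return {e["patient_id"] for e in encounters if e["year"] >= min_year}


def export_concepts(serving, cohort, out_dir):
    """Write one CSV file per concept, restricted to cohort members."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for concept, rows in serving.items():
        cohort_rows = [r for r in rows if r["patient_id"] in cohort]
        if not cohort_rows:
            continue  # no file for concepts without cohort data
        path = out_dir / f"{concept}.csv"
        with path.open("w", newline="", encoding="utf-8") as handle:
            writer = csv.DictWriter(handle, fieldnames=list(cohort_rows[0]))
            writer.writeheader()
            writer.writerows(cohort_rows)
        written.append(path)
    return written


# Hypothetical serving-layer content, keyed by concept name.
serving = {
    "encounter": [
        {"patient_id": "a1b2", "year": 2024},
        {"patient_id": "c3d4", "year": 2019},
    ],
    "lab_result": [
        {"patient_id": "a1b2", "loinc_code": "718-7", "value": 13.5},
        {"patient_id": "c3d4", "loinc_code": "718-7", "value": 12.1},
    ],
}
cohort = select_cohort(serving["encounter"], min_year=2023)
files = export_concepts(serving, cohort, tempfile.mkdtemp())
```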

1.4 SPHN Connector and Delivery
--------------------------------
The final steps involve preparing the data for external submission using the SPHN Connector.

.. list-table::
   :widths: 10 75
   :header-rows: 1

   * - Step
     - Detail
   * - SPHN Connector Setup
     - Installed on a dedicated Azure VM located within the KSB Azure Tenant (Switzerland region) for data sovereignty.
   * - Project Preparation
     - The SPHN project metadata is prepared within the VM environment.
   * - Data Upload
     - The exported CSV files are collected and manually uploaded as a ZIP file to the VM. *Note: Direct API connectivity has been prepared and will be enabled as SPHN submission volumes increase*.
   * - Connector Execution
     - The SPHN Connector is run against the uploaded CSV files, transforming them into standardized SPHN Turtle files (.ttl).
   * - Final Delivery
     - The generated Turtle files are manually downloaded from the VM and transmitted to the recipient using a locally installed SETT Client.
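
The packaging part of the Data Upload step, which is currently manual, could be scripted along the following lines. This Python sketch only bundles the CSV files into a ZIP archive; the flat archive layout and the function ``bundle_csv_files`` are assumptions, and the sketch does not call the SPHN Connector or the SETT Client.

```python
import tempfile
import zipfile
from pathlib import Path


def bundle_csv_files(csv_dir, zip_path):
    """Bundle all CSV files in `csv_dir` into one flat ZIP archive."""
    csv_files = sorted(Path(csv_dir).glob("*.csv"))
    if not csv_files:
        raise ValueError(f"no CSV files found in {csv_dir}")
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for csv_file in csv_files:
            # arcname drops the directory so the archive stays flat.
            archive.write(csv_file, arcname=csv_file.name)
    return [f.name for f in csv_files]


# Hypothetical export directory with two concept files.
export_dir = Path(tempfile.mkdtemp())
(export_dir / "encounter.csv").write_text("patient_id,year\na1b2,2024\n", encoding="utf-8")
(export_dir / "lab_result.csv").write_text("patient_id,value\na1b2,13.5\n", encoding="utf-8")

zip_path = Path(tempfile.mkdtemp()) / "sphn_export.zip"
bundled = bundle_csv_files(export_dir, zip_path)
```

Raising an error on an empty directory prevents an empty archive from being uploaded by mistake.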