KSA Implementation of SPHN

The SPHN implementation at Kantonsspital Aarau (KSA) is based on the KSA Clinical Data Platform, which is designed to provide a resilient, open and vendor-independent foundation for preparing and delivering clinical datasets to associated parties (e.g. SPHN) in accordance with data governance processes. The architecture enables standardized data transformations and secure, compliant data storage.

Architectural Principles

The architecture follows a strict open-architecture approach. All core components are based on open-source technologies and open standards, ensuring that each component can be replaced independently if required. The platform is designed to avoid vendor lock-in by using open data formats and standardized interfaces.

Scalability and resilience are achieved through cloud-native design patterns, while security and data protection are ensured by operating the platform in a secure private cloud Kubernetes environment. This approach also supports the longevity of the platform, as it can evolve over time without depending on specific vendors, technologies, or proprietary formats.

Data from primary source systems, such as the Hospital Information System (KIS), is updated at regular intervals of approximately ten minutes, ensuring near-real-time data availability.
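
One common way to implement such a periodic refresh is watermark-based incremental extraction. The following is a minimal sketch under that assumption; the text does not state which mechanism KSA uses, and the field name modified_at and all function names are hypothetical.

```python
from datetime import datetime, timedelta

# Refresh interval taken from the text (approximately ten minutes).
REFRESH_INTERVAL = timedelta(minutes=10)


def is_due(last_run, now):
    """Return True when at least one refresh interval has passed."""
    return now - last_run >= REFRESH_INTERVAL


def changed_since(rows, watermark):
    """Select rows modified after the last processed timestamp.

    Assumes each source row carries a 'modified_at' timestamp
    (hypothetical field name).
    """
    return [r for r in rows if r["modified_at"] > watermark]


def advance_watermark(rows, watermark):
    """Move the watermark to the newest change seen, or keep it if none."""
    return max((r["modified_at"] for r in rows), default=watermark)


# Illustrative source rows, not real data.
source_rows = [
    {"id": 1, "modified_at": datetime(2024, 1, 1, 8, 0)},
    {"id": 2, "modified_at": datetime(2024, 1, 1, 8, 7)},
    {"id": 3, "modified_at": datetime(2024, 1, 1, 8, 12)},
]

watermark = datetime(2024, 1, 1, 8, 5)
delta = changed_since(source_rows, watermark)
new_watermark = advance_watermark(delta, watermark)
```

Only rows changed since the previous run are picked up, which keeps each ten-minute cycle small regardless of the total table size.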

Data Storage Layer (Lakehouse)

Clinical data is stored in a secure private cloud using S3-compatible object storage. The lakehouse design is implemented using Apache Iceberg, which provides transactional consistency, schema evolution and time-travel capabilities. Data is stored in an open file format (Parquet), ensuring long-term accessibility and interoperability.

The object storage layer is strictly separated from compute so that the platform can scale and remain flexible. This design is also cost-efficient, as object storage is used for persistent data while compute resources can be provisioned dynamically and scaled according to actual workload demands.

Medallion Principle

The platform follows the Medallion architecture to structure data processing and quality assurance. Raw clinical data is first stored in the Bronze layer, preserving the original structure from source systems.

The Silver layer contains cleaned, standardized and harmonized data that is suitable for analytical processing.

The Gold layer consists of curated and validated datasets that are ready for delivery and consumption. A dedicated SPHN data mart is available within the Gold layer and acts as an interface to the SPHN Connector.
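
The flow through the three layers can be illustrated with a small, self-contained sketch. It uses plain Python lists in place of Iceberg tables, and the record fields, sample values and function names are hypothetical; the actual transformations at KSA are implemented in dbt.

```python
# Bronze: raw records kept exactly as delivered by the source system.
# All values are illustrative, not real data.
BRONZE = [
    {"patient_id": " P001 ", "test": "Hb", "value": "13.5", "unit": "g/dL"},
    {"patient_id": "P002", "test": "hb", "value": "n/a", "unit": "g/dL"},
    {"patient_id": "P003", "test": "HB", "value": "14,1", "unit": "g/dL"},
]


def to_silver(rows):
    """Clean and standardize: trim ids, normalize test codes, parse numbers."""
    silver = []
    for row in rows:
        # Accept a decimal comma as well as a decimal point.
        raw = row["value"].replace(",", ".")
        try:
            value = float(raw)
        except ValueError:
            value = None  # unparseable results stay visible as missing
        silver.append({
            "patient_id": row["patient_id"].strip(),
            "test": row["test"].strip().upper(),
            "value": value,
            "unit": row["unit"].strip(),
        })
    return silver


def to_gold(rows):
    """Curate: keep only rows that passed validation."""
    return [r for r in rows if r["value"] is not None]


silver = to_silver(BRONZE)
gold = to_gold(silver)
```

Bronze is never modified, so the Silver and Gold layers can be rebuilt from it at any time.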

Data Transformation and Quality Management

Data transformations are implemented using dbt, which provides a declarative and version-controlled framework for transforming data between Medallion layers. It enables systematic data quality checks, documentation generation and lineage tracking.

All transformations are reproducible and auditable, which is essential for clinical and research data. dbt serves as the semantic and transformation layer of the platform while remaining loosely coupled with storage and compute.
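
In dbt, quality checks such as the built-in not_null and unique tests are declared alongside the models and pass when they return no failing rows. The plain-Python sketch below only illustrates that logic; it is not dbt code, and the table contents, column names and function names are hypothetical.

```python
from collections import Counter


def not_null_failures(rows, column):
    """Return the rows in which the column is missing or None."""
    return [r for r in rows if r.get(column) is None]


def unique_failures(rows, column):
    """Return the non-null values that occur more than once."""
    counts = Counter(r[column] for r in rows if r.get(column) is not None)
    return sorted(v for v, n in counts.items() if n > 1)


# Illustrative Silver-layer rows, not real data.
silver_lab = [
    {"result_id": 1, "patient_id": "P001"},
    {"result_id": 2, "patient_id": None},
    {"result_id": 2, "patient_id": "P003"},
]

missing_patients = not_null_failures(silver_lab, "patient_id")
duplicate_ids = unique_failures(silver_lab, "result_id")

# A check passes only when it reports no failures.
checks_pass = not missing_patients and not duplicate_ids
```

Returning the failing rows rather than a boolean keeps each check auditable, since the offending records can be inspected directly.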

Terminology Mappings

The mapping procedure focused on standardizing laboratory data to ensure semantic interoperability. Laboratory tests and result units were mapped to LOINC and UCUM using a pipeline based on retrieval-augmented generation (RAG), followed by validation with clinical experts to ensure medical correctness and consistency.

Mappings to SNOMED CT were performed manually by domain experts. ATC codes are already provided by the source systems and are reused without additional transformation efforts.
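
The two-step procedure for laboratory data (automated proposal, then expert validation) can be sketched as a mapping table with a review status. The class, the status values and the local codes are hypothetical, and the LOINC codes are shown for illustration only and should be checked against the current LOINC release.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class LabMapping:
    local_code: str           # code used in the source system (hypothetical)
    loinc: str                # proposed LOINC code
    ucum: str                 # proposed UCUM unit
    status: str = "proposed"  # becomes "approved" after expert review


def approve(mapping):
    """Return a copy of the mapping marked as validated by an expert."""
    return replace(mapping, status="approved")


def lookup(mappings, local_code):
    """Return the approved mapping for a local code, or None."""
    for m in mappings:
        if m.local_code == local_code and m.status == "approved":
            return m
    return None


# Candidates as they might come out of the automated pipeline.
candidates = [
    LabMapping("HB", "718-7", "g/dL"),
    LabMapping("GLUC", "2345-7", "mg/dL"),
]

# Only the first candidate has been reviewed so far.
reviewed = [approve(candidates[0]), candidates[1]]
```

Because lookup ignores unapproved entries, a proposed mapping cannot reach downstream datasets before an expert has confirmed it.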

Data Access and SPHN Connector

Curated SPHN data marts (Gold layer) are delivered to the SPHN Connector through its REST-based ingestion layer. The delivery mechanism retrieves data from Iceberg tables and forwards it to the REST endpoints.
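
A delivery step of this kind can be sketched as batching rows and wrapping each batch in an HTTP POST request. The endpoint URL, the JSON payload shape and the batch size below are assumptions for illustration and do not reflect the actual SPHN Connector API; the sketch builds the requests without sending them.

```python
import json
from urllib.request import Request

# Hypothetical endpoint; the real SPHN Connector paths are not given here.
INGEST_URL = "https://connector.example.org/ingest"


def batches(rows, size):
    """Yield consecutive slices of at most `size` rows."""
    for start in range(0, len(rows), size):
        yield rows[start:start + size]


def build_request(url, batch):
    """Wrap one batch of rows in a JSON POST request (not sent here)."""
    body = json.dumps(batch).encode("utf-8")
    return Request(
        url=url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Illustrative Gold-layer rows, standing in for a read from an Iceberg table.
gold_rows = [{"patient_id": f"P{i:03d}", "value": float(i)} for i in range(5)]

requests_to_send = [build_request(INGEST_URL, b) for b in batches(gold_rows, 2)]
```

Each prepared request could then be sent with urllib.request.urlopen, together with whatever authentication the target endpoint requires.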

The SPHN Connector plays a critical role in the data processing pipeline. It converts the datasets into RDF format, ensuring compatibility with SPHN standards. It also performs comprehensive quality checks, validating both the structure and the semantic consistency of the data before delivery, so that only high-quality, SPHN-compliant data is made available for research and downstream applications.