EOC Implementation of SPHN
EOC IT infrastructure
The Ente Ospedaliero Cantonale (EOC) hospital infrastructure is designed to load production data into a data warehouse in nightly batches.
Data is first moved into a staging area using SQL Server Integration Services (SSIS), where it is cleaned, integrated, and transformed before being stored in the data warehouse.
The data warehouse can be queried using Python with Pandas and ODBC libraries. A dedicated Python script refines the dataset and ensures accurate mapping of hospital categories to SPHN concepts.
The SPHN Connector is then used to generate and validate RDF files (in Turtle format), which are subsequently encrypted and securely transferred to the BioMedIT node via SETT (Figure 1).
Figure 1: Data pipeline upstream of the BioMedIT node. Data flows from the production environment through the data warehouse, R&D data engineering, and the SPHN pipeline before reaching the BioMedIT node.
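The Python refinement step described above can be illustrated with a minimal sketch. It assumes the warehouse extract is already available as a pandas DataFrame (in practice it could be loaded with pd.read_sql over an ODBC connection). The column name case_type, the local category codes, and the target labels are hypothetical placeholders, not actual EOC categories or SPHN value-set entries.

```python
import pandas as pd

# Hypothetical mapping from local hospital categories to placeholder
# concept labels; real SPHN codes would replace the right-hand side.
CATEGORY_TO_CONCEPT = {
    "AMB": "EXAMPLE:Outpatient",
    "STAT": "EXAMPLE:Inpatient",
}


def map_categories(df, column="case_type"):
    """Return a copy of df with an added 'sphn_concept' column.

    Raises ValueError if any category has no mapping, so that
    unmapped values are caught before RDF generation.
    """
    out = df.copy()
    out["sphn_concept"] = out[column].map(CATEGORY_TO_CONCEPT)
    unmapped = out.loc[out["sphn_concept"].isna(), column].unique()
    if len(unmapped) > 0:
        raise ValueError(f"Unmapped categories: {sorted(unmapped)}")
    return out


# In-memory stand-in for a data warehouse extract.
extract = pd.DataFrame({"case_id": [1, 2, 3], "case_type": ["AMB", "STAT", "AMB"]})
mapped = map_categories(extract)
```

Failing on unmapped categories, rather than silently emitting missing values, is one possible design choice for keeping the mapping accurate as new hospital categories appear.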
Data de-identification
The de-identification process is performed in the SPHN Connector:
All identifiers linked to the patient are anonymized (using the scrambleField rule).
Patient dates are shifted by a random timespan that remains consistent across all records of the same patient (using the dateShift rule).
The rules are configured in the following JSON file within the SPHN Connector:
{
  "scrambleField": {
    "defaultScrambling": {
      "applies_to_fields": [
        "sphn:SubjectPseudoIdentifier/id",
        "sphn:SubjectPseudoIdentifier/sphn:hasIdentifier",
        "sphn:AdministrativeCase/id",
        "sphn:AdministrativeCase/sphn:hasIdentifier",
        "sphn:Sample/id",
        "sphn:Sample/sphn:hasIdentifier",
        "sphn:hasSample/id",
        "sphn:hasSample/sphn:hasIdentifier"
      ]
    }
  },
  "dateShift": {
    "defaultDateShift": {
      "low_range": -30,
      "high_range": 30
    }
  }
}
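To make the two rules concrete, the following is a minimal conceptual sketch of identifier scrambling and per-patient date shifting. It is not the SPHN Connector's implementation: the Deidentifier class and its methods are hypothetical, the scrambling is approximated with random UUIDs, and low_range and high_range are assumed to be expressed in days.

```python
import random
import uuid
from datetime import date, timedelta


class Deidentifier:
    """Illustrative stand-in for the scrambleField and dateShift rules."""

    def __init__(self, low_range=-30, high_range=30, seed=None):
        self.low_range = low_range    # assumed unit: days
        self.high_range = high_range  # assumed unit: days
        self._rng = random.Random(seed)
        self._offsets = {}      # patient id -> offset in days
        self._pseudonyms = {}   # original identifier -> scrambled identifier

    def scramble(self, identifier):
        """Replace an identifier with a random value, stable per input."""
        if identifier not in self._pseudonyms:
            self._pseudonyms[identifier] = uuid.uuid4().hex
        return self._pseudonyms[identifier]

    def shift(self, patient_id, original):
        """Shift a date by an offset drawn once per patient and then reused."""
        if patient_id not in self._offsets:
            self._offsets[patient_id] = self._rng.randint(
                self.low_range, self.high_range
            )
        return original + timedelta(days=self._offsets[patient_id])


deid = Deidentifier(seed=42)
admission = deid.shift("patient-1", date(2023, 3, 1))
discharge = deid.shift("patient-1", date(2023, 3, 8))
```

Because the offset is drawn once per patient and reused, the interval between two events of the same patient (here, seven days between admission and discharge) is preserved after shifting.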