Concept Flattening in the SPARQLer

Target Audience

This document is intended for data managers, researchers, RDF experts, and SPHN-affiliated partners who want to use the SPARQLer tool to automatically generate SPARQL queries for the concepts of an SPHN-compliant RDF schema.

This document contains the following information:

  • Definitions of terms

  • Concept flattening overview

  • Example of concept flattening

Definitions

  • Resource: Any IRI or literal denotes something in the world (the “universe of discourse”). These things are called resources. Anything can be a resource, including physical things, documents, abstract concepts, numbers and strings; the term is synonymous with “entity” as it is used in the RDF Semantics specification [RDF11-MT].

  • Property: An RDF property is any relation between subject resources and object resources. Asserting an RDF triple says that some relationship, indicated by the predicate, holds between the resources denoted by the subject and object.

For more information about RDF elements, visit the RDF Background section.
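
As a minimal illustration of these definitions, the Python sketch below models RDF triples as plain tuples and lists the properties asserted about one resource. The `Triple` type, the `properties_of` helper, the `ex:` resources and the identifier value are hypothetical and only serve the example; `sphn:hasIdentifier` and `sphn:hasBiosample` are taken from the query shown later in this document.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str    # resource the statement is about
    predicate: str  # property relating subject and object
    obj: str        # object resource or literal value


# Hypothetical data: the "ex:" resources and the identifier are placeholders.
triples = [
    Triple("ex:sample-1", "rdf:type", "sphn:Biobanksample"),
    Triple("ex:sample-1", "sphn:hasIdentifier", "placeholder-id"),
    Triple("ex:sample-1", "sphn:hasBiosample", "ex:biosample-1"),
]


def properties_of(resource, triples):
    """Collect the predicate/object pairs asserted about one resource."""
    return [(t.predicate, t.obj) for t in triples if t.subject == resource]


for predicate, obj in properties_of("ex:sample-1", triples):
    print(predicate, obj)
```

Each printed pair corresponds to one property of the resource; concept flattening, described next, turns such pairs into the columns of a table.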

Concept flattening

The concept flattening process groups all resources and their properties into one query file per concept. The data is “flattened” in the sense that each query returns the list of resources defined for that concept together with the values of their directly connected properties. If the value of a property is itself another concept, the values of that concept are retrieved as well. A concept flattening query can be run against a SPARQL endpoint to generate one table per concept/class, which is easier to process in specific applications than the graph itself.

Note that the SPARQLer tool requires an RDF schema as input for the concept flattening process. The output is one SPARQL query file (.rq) per concept, and the aim of each query is to retrieve the data as a table. This gives a better overview of the extent of metadata connected to a particular concept. Concept flattening is best understood as a method of converting an RDF graph model into a tabular model, one concept at a time.
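
The idea can be sketched in a few lines of Python. This is not the SPARQLer's actual implementation: the functions `variable_name` and `build_flattening_query` are hypothetical, and the variable naming rule (first property without its "has" prefix, an underscore, then the last property) is an assumption inferred from the Biobanksample query shown in the example section. The sketch takes a concept name and a list of property paths and emits one optional block per path, so that each path becomes one column of the resulting table.

```python
SPHN = "https://biomedit.ch/rdf/sphn-ontology/sphn#"


def variable_name(path):
    """Derive a column name from the first and last property of a path."""
    first = path[0].split(":")[-1]  # e.g. "hasAdministrativeCase"
    last = path[-1].split(":")[-1]  # e.g. "hasIdentifier"
    if first.startswith("has"):
        first = first[len("has"):]
    return first[0].lower() + first[1:] + "_" + last


def build_flattening_query(concept, paths):
    """Return a SELECT query with one OPTIONAL block per property path."""
    lines = [
        f"PREFIX sphn:<{SPHN}>",
        "SELECT *",
        "WHERE {",
        f"   ?resource a <{SPHN}{concept}> .",
    ]
    for path in paths:
        # Properties are chained with "/" so that values of nested
        # concepts end up in the same row as the starting resource.
        lines.append(
            f"   optional{{ ?resource {'/'.join(path)} ?{variable_name(path)} . }}"
        )
    lines.append("}")
    return "\n".join(lines)


query = build_flattening_query(
    "Biobanksample",
    [
        ["sphn:hasAdministrativeCase", "sphn:hasIdentifier"],
        ["sphn:hasIdentifier"],
    ],
)
print(query)
```

Wrapping every path in its own optional block means a resource still appears in the table when some of its properties have no value; the corresponding cells are simply left empty.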

Example of concept flattening

For the Biobanksample concept, all resources of type Biobanksample are extracted with the DataProviderInstitute object, the Biosample object, the identifier value of the Biobanksample, the AdministrativeCase object and the SubjectPseudoIdentifier object.

Figure 1. The Biobanksample and its related metadata.

The query below is the output of the SPARQLer for the Biobanksample concept:

PREFIX sphn:<https://biomedit.ch/rdf/sphn-ontology/sphn#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT *
WHERE {
   ?resource a <https://biomedit.ch/rdf/sphn-ontology/sphn#Biobanksample> .
   optional{ ?resource sphn:hasAdministrativeCase/sphn:hasIdentifier ?administrativeCase_hasIdentifier . }
   optional{ ?resource  sphn:hasAdministrativeCase/sphn:hasAdministrativeCaseCareHandling/sphn:hasCareHandlingTypeCode/rdf:type ?administrativeCase_hasCareHandlingTypeCode .
   ?administrativeCase_hasCareHandlingTypeCode rdfs:label  ?administrativeCase_hasCareHandlingTypeCode_haslabel
   FILTER(strStarts(str(?administrativeCase_hasCareHandlingTypeCode), "http://snomed.info/id/")) . }
   optional{ ?resource sphn:hasAdministrativeCase/sphn:hasAdministrativeCaseCareHandling/sphn:hasCareHandlingTypeCode ?administrativeCase_hasCareHandlingTypeCode . }
   optional{ ?resource sphn:hasAdministrativeCase/sphn:hasAdministrativeCaseDischargeLocation/sphn:hasLocationClass ?administrativeCase_hasLocationClass . }
   optional{ ?resource sphn:hasAdministrativeCase/sphn:hasAdministrativeCaseDischargeLocation/sphn:hasLocationExact ?administrativeCase_hasLocationExact . }
   optional{ ?resource sphn:hasAdministrativeCase/sphn:hasAdministrativeCaseOriginLocation/sphn:hasLocationClass ?administrativeCase_hasLocationClass . }
   optional{ ?resource sphn:hasAdministrativeCase/sphn:hasAdministrativeCaseOriginLocation/sphn:hasLocationExact ?administrativeCase_hasLocationExact . }
   optional{ ?resource sphn:hasAdministrativeCase/sphn:hasAdministrativeCaseAdmissionDateTime ?administrativeCase_hasAdministrativeCaseAdmissionDateTime . }
   optional{ ?resource sphn:hasAdministrativeCase/sphn:hasAdministrativeCaseDischargeDateTime ?administrativeCase_hasAdministrativeCaseDischargeDateTime . }
   optional{ ?resource sphn:hasBiosample/sphn:hasBiosampleBodySite/sphn:hasBodySiteCode ?biosample_hasBodySiteCode . }
   optional{ ?resource sphn:hasBiosample/sphn:hasBiosampleFixationType ?biosample_hasBiosampleFixationType . }
   optional{ ?resource sphn:hasBiosample/sphn:hasBiosampleMaterialTypeLiquid ?biosample_hasBiosampleMaterialTypeLiquid . }
   optional{ ?resource sphn:hasBiosample/sphn:hasBiosampleMaterialTypeTissue ?biosample_hasBiosampleMaterialTypeTissue . }
   optional{ ?resource sphn:hasBiosample/sphn:hasBiosamplePrimaryContainerType ?biosample_hasBiosamplePrimaryContainerType . }
   optional{ ?resource sphn:hasBiosample/sphn:hasBiosampleStorageContainer ?biosample_hasBiosampleStorageContainer . }
   optional{ ?resource sphn:hasBiosample/sphn:hasBiosampleDateTime ?biosample_hasBiosampleDateTime . }
   optional{ ?resource sphn:hasDataProviderInstitute/sphn:hasDataProviderInstituteCode/sphn:hasIdentifier ?dataProviderInstitute_hasIdentifier . }
   optional{ ?resource sphn:hasDataProviderInstitute/sphn:hasDataProviderInstituteCode/sphn:hasCodeCodingSystemAndVersion ?dataProviderInstitute_hasCodeCodingSystemAndVersion . }
   optional{ ?resource sphn:hasDataProviderInstitute/sphn:hasDataProviderInstituteCode/sphn:hasCodeName ?dataProviderInstitute_hasCodeName . }
   optional{ ?resource sphn:hasSubjectPseudoIdentifier/sphn:hasIdentifier ?subjectPseudoIdentifier_hasIdentifier . }
   optional{ ?resource sphn:hasIdentifier ?identifier_hasIdentifier . }
}

The query above can be run against any SPARQL endpoint to retrieve data about the Biobanksample resources found in a database or other data resource that stores data in an SPHN-compliant format. The query produces a table-like representation of the Biobanksample and its related metadata, of which an excerpt is shown below:

Table 1. Excerpt of an example of table generated when running the Biobanksample query.

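resource              | administrativeCase_hasIdentifier | administrativeCase_hasCareHandlingTypeCode
----------------------+----------------------------------+-------------------------------------------
CHE…Biobanksample-001 | CHE…AdmCase-002                  | CareHandling-SNOMED-CT-394656005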
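
To illustrate how a generated query might be run and its result turned into a table like the one above, the following Python sketch sends a query to an endpoint over the standard SPARQL 1.1 Protocol and converts the SPARQL JSON results into rows. The functions `run_select` and `results_to_rows` are hypothetical helpers written for this example, not part of the SPARQLer, and the endpoint URL has to be supplied by the reader. The sketch assumes the endpoint accepts form-encoded POST requests and can return `application/sparql-results+json`.

```python
import json
import urllib.parse
import urllib.request


def run_select(endpoint_url, query):
    """POST a SELECT query and return the parsed SPARQL JSON results."""
    data = urllib.parse.urlencode({"query": query}).encode("utf-8")
    request = urllib.request.Request(
        endpoint_url,
        data=data,
        headers={"Accept": "application/sparql-results+json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def results_to_rows(results):
    """Turn SPARQL JSON results into a header plus one list per solution."""
    header = results["head"]["vars"]
    rows = []
    for binding in results["results"]["bindings"]:
        # Variables from unmatched OPTIONAL blocks are absent from the
        # binding; they are rendered as empty cells.
        rows.append([binding.get(var, {}).get("value", "") for var in header])
    return header, rows
```

Given the text of a .rq file in `query` and an endpoint address in `endpoint_url`, calling `results_to_rows(run_select(endpoint_url, query))` would return the column names and one row per Biobanksample resource.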