Concept Flattening in the SPARQLer
Target Audience
This document is intended for data managers, researchers, RDF experts, and SPHN-affiliated partners who want to use the SPARQLer tool to automatically generate SPARQL queries for the concepts of an SPHN-compatible RDF schema.
This document contains the following information:
Definitions of terms
Concept flattening overview
Example of concept flattening
Statistical queries
Definitions
Resource: Any IRI or literal denotes something in the world (the “universe of discourse”). These things are called resources. Anything can be a resource, including physical things, documents, abstract concepts, numbers and strings; the term is synonymous with “entity” as it is used in the RDF Semantics specification [RDF11-MT].
Properties: An RDF property is any relation between subject resources and object resources. Asserting an RDF triple says that some relationship, indicated by the predicate, holds between the resources denoted by the subject and object.
For more information about RDF elements, visit the RDF Background section.
Concept flattening
The concept flattening process groups all resources and their properties into one query file per concept. The data is “flattened” in the sense that each query returns a list of the resources defined for that concept together with the values of their directly connected properties. If the value of a property is itself another concept, the values of that concept are retrieved as well. A concept flattening query can be run against a SPARQL endpoint to generate one table per concept (class), which is easier to process in specific applications.
The SPARQLer tool requires an RDF schema as input for the concept flattening process. The output is one SPARQL query file (.rq) per concept, and each query retrieves the data of that concept as a table. This gives a better overview of the extent of the metadata connected to a particular concept. In short, concept flattening converts an RDF graph model, concept by concept, into a tabular model.
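The flattening step can be pictured with a small Python sketch that turns a concept IRI and a list of property paths into one query with an OPTIONAL clause per path. This is not the SPARQLer implementation: the function names, the hand-written path list and the variable naming rule (inferred from the Age example in this document) are assumptions made for illustration.

```python
def variable_name(path):
    """Derive a result variable name from a property path.

    Naming rule inferred from the Age example in this document (an
    assumption, not documented SPARQLer behavior): prefixes are dropped,
    segments are joined with "_", and the leading "has" is removed from
    the last segment only.
    """
    segments = [segment.split(":", 1)[-1] for segment in path.split("/")]
    last = segments[-1]
    if last.startswith("has"):
        last = last[len("has"):]
    return "_".join(segments[:-1] + [last])


def build_flattening_query(concept_iri, paths, prefixes):
    """Build one SELECT query with an OPTIONAL clause per property path."""
    lines = [f"PREFIX {name}: <{iri}>" for name, iri in prefixes.items()]
    lines += ["SELECT *", "WHERE {", f"  ?resource a <{concept_iri}> ."]
    for path in paths:
        lines.append(f"  OPTIONAL {{ ?resource {path} ?{variable_name(path)} . }}")
    lines.append("}")
    return "\n".join(lines)


SPHN = "https://biomedit.ch/rdf/sphn-ontology/sphn#"
# Hypothetical, hand-written subset of the property paths of the Age concept.
age_paths = [
    "sphn:hasSubjectPseudoIdentifier/sphn:hasIdentifier",
    "sphn:hasQuantity/sphn:hasValue",
    "sphn:hasQuantity/sphn:hasUnit/sphn:hasCode",
    "sphn:hasDeterminationDateTime",
]
query = build_flattening_query(SPHN + "Age", age_paths, {"sphn": SPHN})
print(query)
```

Because every path sits in its own OPTIONAL clause, a resource that lacks one of the properties still appears in the result, with that column left empty.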
Example of concept flattening
For the Age concept, all resources of type Age are extracted with the SubjectPseudoIdentifier object, the Quantity object and the DeterminationDateTime value of the Age.
Figure 1. The concept Age and its related metadata.
The query below is the output of the SPARQLer for the Age concept:
PREFIX sphn:<https://biomedit.ch/rdf/sphn-ontology/sphn#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT *
WHERE {
  ?resource a <https://biomedit.ch/rdf/sphn-ontology/sphn#Age> .
  optional{ ?resource sphn:hasSubjectPseudoIdentifier/sphn:hasIdentifier ?hasSubjectPseudoIdentifier_Identifier . }
  optional{ ?resource sphn:hasSubjectPseudoIdentifier/sphn:hasDataProviderInstitute/sphn:hasCode ?hasSubjectPseudoIdentifier_hasDataProviderInstitute_Code . }
  optional{ ?resource sphn:hasSubjectPseudoIdentifier/sphn:hasDataProviderInstitute/sphn:hasCode/sphn:hasIdentifier ?hasSubjectPseudoIdentifier_hasDataProviderInstitute_hasCode_Identifier . }
  optional{ ?resource sphn:hasSubjectPseudoIdentifier/sphn:hasDataProviderInstitute/sphn:hasCode/sphn:hasCodingSystemAndVersion ?hasSubjectPseudoIdentifier_hasDataProviderInstitute_hasCode_CodingSystemAndVersion . }
  optional{ ?resource sphn:hasSubjectPseudoIdentifier/sphn:hasDataProviderInstitute/sphn:hasCode/sphn:hasName ?hasSubjectPseudoIdentifier_hasDataProviderInstitute_hasCode_Name . }
  optional{ ?resource sphn:hasQuantity/sphn:hasValue ?hasQuantity_Value . }
  optional{ ?resource sphn:hasQuantity/sphn:hasComparator ?hasQuantity_Comparator . }
  optional{ ?resource sphn:hasQuantity/sphn:hasUnit/sphn:hasCode ?hasQuantity_hasUnit_Code . }
  optional{ ?resource sphn:hasQuantity/sphn:hasUnit/sphn:hasCode/sphn:hasIdentifier ?hasQuantity_hasUnit_hasCode_Identifier . }
  optional{ ?resource sphn:hasQuantity/sphn:hasUnit/sphn:hasCode/sphn:hasCodingSystemAndVersion ?hasQuantity_hasUnit_hasCode_CodingSystemAndVersion . }
  optional{ ?resource sphn:hasQuantity/sphn:hasUnit/sphn:hasCode/sphn:hasName ?hasQuantity_hasUnit_hasCode_Name . }
  optional{ ?resource sphn:hasDeterminationDateTime ?DeterminationDateTime . }
}
The query above can be run against any SPARQL endpoint to retrieve data about Age resources from a database or other data resource that stores data in an SPHN-compliant format. The query returns a table-like representation of the concept Age and its related metadata, of which an excerpt is shown below:
| resource | SubjectPseudoIdentifier_hasIdentifier | Quantity_hasValue | … |
|---|---|---|---|
| CHE…Age-001 | CHE…SubPseId-002 | "20"^^xsd:double | … |
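A client typically receives such a result in the W3C SPARQL 1.1 Query Results JSON Format. The sketch below, with an invented response and a hypothetical helper `bindings_to_rows`, shows how variables left unbound by an OPTIONAL clause turn into empty cells when the result is flattened into rows. The IRIs and values are made up for illustration and do not come from SPHN data.

```python
import json

# Hypothetical response in the SPARQL 1.1 Query Results JSON Format.
# The IRIs and values are invented for illustration only.
response_text = """
{
  "head": {"vars": ["resource", "hasQuantity_Value", "DeterminationDateTime"]},
  "results": {"bindings": [
    {"resource": {"type": "uri", "value": "http://example.org/Age-1"},
     "hasQuantity_Value": {"type": "literal", "value": "20",
                           "datatype": "http://www.w3.org/2001/XMLSchema#double"}},
    {"resource": {"type": "uri", "value": "http://example.org/Age-2"}}
  ]}
}
"""


def bindings_to_rows(result):
    """Turn a SPARQL JSON result into a header and a list of rows.

    Variables left unbound by an OPTIONAL clause are absent from a binding,
    so they become empty cells.
    """
    header = result["head"]["vars"]
    rows = [
        [binding.get(var, {}).get("value", "") for var in header]
        for binding in result["results"]["bindings"]
    ]
    return header, rows


header, rows = bindings_to_rows(json.loads(response_text))
print(" | ".join(header))
for row in rows:
    print(" | ".join(row))
```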
Statistical queries
Example of counting the instances per concept and per predicate
The query below is the output of the SPARQLer for the Age concept:
PREFIX sphn:<https://biomedit.ch/rdf/sphn-ontology/sphn#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT *
WHERE {
  {
    SELECT ?origin (COUNT(?origin) as ?count_instances)
    WHERE {
      ?resource a <https://biomedit.ch/rdf/sphn-ontology/sphn#Age> .
      BIND("sphn:Age" as ?origin)
    }
    GROUP BY ?origin
  } UNION
  {
    SELECT ?origin (COUNT(?predicate) as ?count_instances)
    WHERE {
      ?resource a <https://biomedit.ch/rdf/sphn-ontology/sphn#Age> .
      BIND("sphn:hasSubjectPseudoIdentifier/sphn:hasIdentifier" as ?origin)
      optional{ ?resource sphn:hasSubjectPseudoIdentifier/sphn:hasIdentifier ?predicate . }
    }
    GROUP BY ?origin
  } UNION
  {
    SELECT ?origin (COUNT(?predicate) as ?count_instances)
    WHERE {
      ?resource a <https://biomedit.ch/rdf/sphn-ontology/sphn#Age> .
      BIND("sphn:hasSubjectPseudoIdentifier/sphn:hasDataProviderInstitute/sphn:hasCode" as ?origin)
      optional{ ?resource sphn:hasSubjectPseudoIdentifier/sphn:hasDataProviderInstitute/sphn:hasCode ?predicate . }
    }
    GROUP BY ?origin
  } UNION
  {
    SELECT ?origin (COUNT(?predicate) as ?count_instances)
    WHERE {
      ?resource a <https://biomedit.ch/rdf/sphn-ontology/sphn#Age> .
      BIND("sphn:hasSubjectPseudoIdentifier/sphn:hasDataProviderInstitute/sphn:hasCode/sphn:hasName" as ?origin)
      optional{ ?resource sphn:hasSubjectPseudoIdentifier/sphn:hasDataProviderInstitute/sphn:hasCode/sphn:hasName ?predicate . }
    }
    GROUP BY ?origin
  } UNION
  {
    SELECT ?origin (COUNT(?predicate) as ?count_instances)
    WHERE {
      ?resource a <https://biomedit.ch/rdf/sphn-ontology/sphn#Age> .
      BIND("sphn:hasSubjectPseudoIdentifier/sphn:hasDataProviderInstitute/sphn:hasCode/sphn:hasIdentifier" as ?origin)
      optional{ ?resource sphn:hasSubjectPseudoIdentifier/sphn:hasDataProviderInstitute/sphn:hasCode/sphn:hasIdentifier ?predicate . }
    }
    GROUP BY ?origin
  } UNION
  {
    SELECT ?origin (COUNT(?predicate) as ?count_instances)
    WHERE {
      ?resource a <https://biomedit.ch/rdf/sphn-ontology/sphn#Age> .
      BIND("sphn:hasSubjectPseudoIdentifier/sphn:hasDataProviderInstitute/sphn:hasCode/sphn:hasCodingSystemAndVersion" as ?origin)
      optional{ ?resource sphn:hasSubjectPseudoIdentifier/sphn:hasDataProviderInstitute/sphn:hasCode/sphn:hasCodingSystemAndVersion ?predicate . }
    }
    GROUP BY ?origin
  } UNION
  {
    SELECT ?origin (COUNT(?predicate) as ?count_instances)
    WHERE {
      ?resource a <https://biomedit.ch/rdf/sphn-ontology/sphn#Age> .
      BIND("sphn:hasDeterminationDateTime" as ?origin)
      optional{ ?resource sphn:hasDeterminationDateTime ?predicate . }
    }
    GROUP BY ?origin
  } UNION
  {
    SELECT ?origin (COUNT(?predicate) as ?count_instances)
    WHERE {
      ?resource a <https://biomedit.ch/rdf/sphn-ontology/sphn#Age> .
      BIND("sphn:hasQuantity/sphn:hasComparator" as ?origin)
      optional{ ?resource sphn:hasQuantity/sphn:hasComparator ?predicate . }
    }
    GROUP BY ?origin
  } UNION
  {
    SELECT ?origin (COUNT(?predicate) as ?count_instances)
    WHERE {
      ?resource a <https://biomedit.ch/rdf/sphn-ontology/sphn#Age> .
      BIND("sphn:hasQuantity/sphn:hasValue" as ?origin)
      optional{ ?resource sphn:hasQuantity/sphn:hasValue ?predicate . }
    }
    GROUP BY ?origin
  } UNION
  {
    SELECT ?origin (COUNT(?predicate) as ?count_instances)
    WHERE {
      ?resource a <https://biomedit.ch/rdf/sphn-ontology/sphn#Age> .
      BIND("sphn:hasQuantity/sphn:hasUnit/sphn:hasCode" as ?origin)
      optional{ ?resource sphn:hasQuantity/sphn:hasUnit/sphn:hasCode ?predicate . }
    }
    GROUP BY ?origin
  } UNION
  {
    SELECT ?origin (COUNT(?predicate) as ?count_instances)
    WHERE {
      ?resource a <https://biomedit.ch/rdf/sphn-ontology/sphn#Age> .
      BIND("sphn:hasQuantity/sphn:hasUnit/sphn:hasCode/sphn:hasName" as ?origin)
      optional{ ?resource sphn:hasQuantity/sphn:hasUnit/sphn:hasCode/sphn:hasName ?predicate . }
    }
    GROUP BY ?origin
  } UNION
  {
    SELECT ?origin (COUNT(?predicate) as ?count_instances)
    WHERE {
      ?resource a <https://biomedit.ch/rdf/sphn-ontology/sphn#Age> .
      BIND("sphn:hasQuantity/sphn:hasUnit/sphn:hasCode/sphn:hasIdentifier" as ?origin)
      optional{ ?resource sphn:hasQuantity/sphn:hasUnit/sphn:hasCode/sphn:hasIdentifier ?predicate . }
    }
    GROUP BY ?origin
  } UNION
  {
    SELECT ?origin (COUNT(?predicate) as ?count_instances)
    WHERE {
      ?resource a <https://biomedit.ch/rdf/sphn-ontology/sphn#Age> .
      BIND("sphn:hasQuantity/sphn:hasUnit/sphn:hasCode/sphn:hasCodingSystemAndVersion" as ?origin)
      optional{ ?resource sphn:hasQuantity/sphn:hasUnit/sphn:hasCode/sphn:hasCodingSystemAndVersion ?predicate . }
    }
    GROUP BY ?origin
  }
}
The query returns a table-like representation of the instance counts for the concept Age and its predicates, of which an excerpt is shown below:
| origin | count_instances |
|---|---|
| "sphn:Age" | "10"^^xsd:integer |
| "sphn:hasSubjectPseudoIdentifier/sphn:hasIdentifier" | "10"^^xsd:integer |
| "sphn:hasSubjectPseudoIdentifier/sphn:hasDataProviderInstitute/sphn:hasCode" | "10"^^xsd:integer |
| … | … |
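The structure of the counting query is regular: one sub-select for the concept itself, then one per property path, joined with UNION. A minimal Python sketch of that pattern follows. The helper names and the short path list are hypothetical, and the actual SPARQLer code may differ.

```python
def count_block(concept_iri, origin, path=None):
    """Return one sub-select counting instances (path=None) or path values."""
    counted = "?origin" if path is None else "?predicate"
    lines = [
        "  {",
        f"    SELECT ?origin (COUNT({counted}) AS ?count_instances)",
        "    WHERE {",
        f"      ?resource a <{concept_iri}> .",
        f'      BIND("{origin}" AS ?origin)',
    ]
    if path is not None:
        # COUNT(?predicate) only counts solutions where the path is bound.
        lines.append(f"      OPTIONAL {{ ?resource {path} ?predicate . }}")
    lines += ["    }", "    GROUP BY ?origin", "  }"]
    return "\n".join(lines)


def build_count_query(concept_iri, concept_label, paths, prefixes):
    """Join the concept block and one block per path with UNION."""
    blocks = [count_block(concept_iri, concept_label)]
    blocks += [count_block(concept_iri, path, path) for path in paths]
    header = [f"PREFIX {name}: <{iri}>" for name, iri in prefixes.items()]
    body = " UNION\n".join(blocks)
    return "\n".join(header + ["SELECT *", "WHERE {", body, "}"])


SPHN = "https://biomedit.ch/rdf/sphn-ontology/sphn#"
# Hypothetical, hand-written subset of the property paths of the Age concept.
age_paths = [
    "sphn:hasSubjectPseudoIdentifier/sphn:hasIdentifier",
    "sphn:hasQuantity/sphn:hasValue",
    "sphn:hasDeterminationDateTime",
]
count_query = build_count_query(SPHN + "Age", "sphn:Age", age_paths, {"sphn": SPHN})
print(count_query)
```

Comparing the count of a path with the count for the concept itself shows how completely that property is populated.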
Example of min and max of predicates (dates and values)
The query below is the output of the SPARQLer for the Age concept:
PREFIX sphn:<https://biomedit.ch/rdf/sphn-ontology/sphn#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT *
WHERE {
  {
    SELECT ?origin (MIN(?value) as ?min) (MAX(?value) as ?max)
    WHERE {
      ?resource a <https://biomedit.ch/rdf/sphn-ontology/sphn#Age> .
      BIND("sphn:hasDeterminationDateTime" as ?origin)
      optional{ ?resource sphn:hasDeterminationDateTime ?value . }
    }
    GROUP BY ?origin
  } UNION
  {
    SELECT ?origin (MIN(?value) as ?min) (MAX(?value) as ?max)
    WHERE {
      ?resource a <https://biomedit.ch/rdf/sphn-ontology/sphn#Age> .
      BIND("sphn:hasQuantity/sphn:hasValue" as ?origin)
      optional{ ?resource sphn:hasQuantity/sphn:hasValue ?value . }
    }
    GROUP BY ?origin
  }
}
The query returns a table-like representation of the minimum and maximum values of the date and value predicates of the concept Age, as shown below:
| origin | min | max |
|---|---|---|
| "sphn:hasDeterminationDateTime" | "2022-04-06T11:55:43.673Z"^^xsd:dateTime | "2022-05-27T11:55:43.808Z"^^xsd:dateTime |
| "sphn:hasQuantity/sphn:hasValue" | "0.03431341828097245"^^xsd:double | "0.9737777086827694"^^xsd:double |
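Only predicates with orderable values (here a date and a numeric value) appear in this query. How the SPARQLer selects them is not described in this document. The sketch below assumes, purely for illustration, a hypothetical mapping from property path to datatype and keeps only the paths whose datatype is in a hand-picked set.

```python
# Datatypes treated as orderable in this sketch (an assumption).
ORDERED_DATATYPES = {"xsd:dateTime", "xsd:double"}

# Hypothetical mapping from property path to the datatype of its values.
age_path_datatypes = {
    "sphn:hasSubjectPseudoIdentifier/sphn:hasIdentifier": "xsd:string",
    "sphn:hasDeterminationDateTime": "xsd:dateTime",
    "sphn:hasQuantity/sphn:hasValue": "xsd:double",
}


def build_min_max_query(concept_iri, path_datatypes, prefix_lines):
    """Build a UNION of MIN/MAX sub-selects for the orderable paths."""
    paths = [p for p, dt in path_datatypes.items() if dt in ORDERED_DATATYPES]
    blocks = []
    for path in paths:
        blocks.append("\n".join([
            "  {",
            "    SELECT ?origin (MIN(?value) AS ?min) (MAX(?value) AS ?max)",
            "    WHERE {",
            f"      ?resource a <{concept_iri}> .",
            f'      BIND("{path}" AS ?origin)',
            f"      OPTIONAL {{ ?resource {path} ?value . }}",
            "    }",
            "    GROUP BY ?origin",
            "  }",
        ]))
    body = " UNION\n".join(blocks)
    return "\n".join(prefix_lines + ["SELECT *", "WHERE {", body, "}"])


SPHN = "https://biomedit.ch/rdf/sphn-ontology/sphn#"
min_max_query = build_min_max_query(
    SPHN + "Age", age_path_datatypes, [f"PREFIX sphn: <{SPHN}>"]
)
print(min_max_query)
```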
Example of listing and counting the hasCode values per concept
The query below is the output of the SPARQLer for the Age concept:
PREFIX sphn:<https://biomedit.ch/rdf/sphn-ontology/sphn#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT *
WHERE {
  {
    SELECT ?origin ?code ?code_identifier ?code_name ?code_codingSystemAndVersion (COUNT(?code) as ?count_instances)
    WHERE {
      ?resource a <https://biomedit.ch/rdf/sphn-ontology/sphn#Age> .
      BIND("sphn:hasSubjectPseudoIdentifier/sphn:hasDataProviderInstitute/sphn:hasCode" as ?origin)
      optional{ ?resource sphn:hasSubjectPseudoIdentifier/sphn:hasDataProviderInstitute/sphn:hasCode ?code . }
      optional{ ?resource sphn:hasSubjectPseudoIdentifier/sphn:hasDataProviderInstitute/sphn:hasCode/sphn:hasIdentifier ?code_identifier . }
      optional{ ?resource sphn:hasSubjectPseudoIdentifier/sphn:hasDataProviderInstitute/sphn:hasCode/sphn:hasName ?code_name . }
      optional{ ?resource sphn:hasSubjectPseudoIdentifier/sphn:hasDataProviderInstitute/sphn:hasCode/sphn:hasCodingSystemAndVersion ?code_codingSystemAndVersion . }
    }
    GROUP BY ?origin ?code ?code_identifier ?code_name ?code_codingSystemAndVersion
  } UNION
  {
    SELECT ?origin ?code ?code_identifier ?code_name ?code_codingSystemAndVersion (COUNT(?code) as ?count_instances)
    WHERE {
      ?resource a <https://biomedit.ch/rdf/sphn-ontology/sphn#Age> .
      BIND("sphn:hasQuantity/sphn:hasUnit/sphn:hasCode" as ?origin)
      optional{ ?resource sphn:hasQuantity/sphn:hasUnit/sphn:hasCode ?code . }
      optional{ ?resource sphn:hasQuantity/sphn:hasUnit/sphn:hasCode/sphn:hasIdentifier ?code_identifier . }
      optional{ ?resource sphn:hasQuantity/sphn:hasUnit/sphn:hasCode/sphn:hasName ?code_name . }
      optional{ ?resource sphn:hasQuantity/sphn:hasUnit/sphn:hasCode/sphn:hasCodingSystemAndVersion ?code_codingSystemAndVersion . }
    }
    GROUP BY ?origin ?code ?code_identifier ?code_name ?code_codingSystemAndVersion
  }
}
The query returns a table-like representation of the hasCode values of the concept Age and how often each one occurs, of which an excerpt is shown below:
| origin | code | code_identifier | code_name | code_codingSystemAndVersion | count_instances |
|---|---|---|---|---|---|
| "sphn:hasSubjectPseudoIdentifier/sphn:hasDataProviderInstitute/sphn:hasCode" | resource:Code-Uid_hug | "CHE…" | "HUG" | "UID" | "10"^^xsd:integer |
| "sphn:hasQuantity/sphn:hasUnit/sphn:hasCode" | resource:ucum/d | | | | "10"^^xsd:integer |