Concept Flattening in the SPARQLer

Target Audience

This document is intended for data managers, researchers, RDF experts, and SPHN-affiliated partners who are interested in using the SPARQLer tool to automatically generate SPARQL queries for the concepts of an SPHN RDF-compatible schema.

This document contains the following information:

  • Definitions of terms

  • Concept flattening overview

  • Example of concept flattening

  • Statistical queries

Definitions

  • Resource: Any IRI or literal denotes something in the world (the “universe of discourse”). These things are called resources. Anything can be a resource, including physical things, documents, abstract concepts, numbers and strings; the term is synonymous with “entity” as it is used in the RDF Semantics specification [RDF11-MT].

  • Property: An RDF property is any relation between subject resources and object resources. Asserting an RDF triple says that some relationship, indicated by the predicate, holds between the resources denoted by the subject and object.

For more information about RDF elements, visit the RDF Background section.

Concept flattening

The concept flattening process groups all resources and their properties into one query file per concept. The data is “flattened” in the sense that each query returns the list of resources defined for that concept together with the values of their directly connected properties. If the value of a property is itself another concept, the values of that concept are retrieved as well. A concept flattening query can be run against a SPARQL endpoint to generate one table per concept/class, which is easier to process in specific applications.

The SPARQLer tool requires an RDF schema as input for the concept flattening process. The output is one SPARQL query file (.rq) per concept, and each query retrieves the data as a table. This gives a better overview of the extent of metadata connected to a particular concept. In short, concept flattening converts an RDF graph model into a tabular model, one concept at a time.
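The flattening step can be pictured with a short Python sketch. It is not the SPARQLer implementation: the nested dictionary `AGE_SCHEMA` is a hypothetical, hand-written stand-in for an RDF schema, the function names are invented for illustration, and the column naming rule (drop the `has` of the last step) is an assumption. Any special handling the real tool applies, for instance to coded values, is left out.

```python
SPHN = "https://biomedit.ch/rdf/sphn-ontology/sphn#"

# Hypothetical schema excerpt (assumption, not read from a real schema file).
# Keys are property local names; a nested dict means the value of the
# property is itself a concept, None means a plain value.
AGE_SCHEMA = {
    "hasSubjectPseudoIdentifier": {"hasIdentifier": None},
    "hasQuantity": {"hasValue": None, "hasComparator": None},
    "hasDeterminationDateTime": None,
}


def leaf_paths(schema, prefix=()):
    """Yield every property path (tuple of local names) that ends in a plain value."""
    for prop, nested in schema.items():
        path = prefix + (prop,)
        if nested is None:
            yield path
        else:
            # The value is another concept: follow it and keep extending the path.
            yield from leaf_paths(nested, path)


def variable_name(path):
    """Name a column after its path, dropping the 'has' prefix of the last step."""
    *parents, last = path
    if last.startswith("has"):
        last = last[3:]
    return "_".join(parents + [last])


def flattening_query(concept, schema):
    """Build one SELECT query with an OPTIONAL property path per column."""
    lines = [
        f"PREFIX sphn: <{SPHN}>",
        "",
        "SELECT *",
        "WHERE {",
        f"   ?resource a sphn:{concept} .",
    ]
    for path in leaf_paths(schema):
        property_path = "/".join(f"sphn:{step}" for step in path)
        lines.append(
            f"   OPTIONAL {{ ?resource {property_path} ?{variable_name(path)} . }}"
        )
    lines.append("}")
    return "\n".join(lines)


print(flattening_query("Age", AGE_SCHEMA))
```

Each path is wrapped in its own OPTIONAL so that a resource with a missing property still yields a row, with that column left empty.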

Example of concept flattening

For the Age concept, all resources of type Age are extracted with the SubjectPseudoIdentifier object, the Quantity object and the DeterminationDateTime value of the Age.

Figure 1. The concept Age and its related metadata.

The query below is the output of the SPARQLer for the Age concept:

PREFIX sphn:<https://biomedit.ch/rdf/sphn-ontology/sphn#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

SELECT *
WHERE {
   ?resource a <https://biomedit.ch/rdf/sphn-ontology/sphn#Age> .

   optional{ ?resource sphn:hasSubjectPseudoIdentifier/sphn:hasIdentifier ?hasSubjectPseudoIdentifier_Identifier . }
   optional{ ?resource sphn:hasSubjectPseudoIdentifier/sphn:hasDataProviderInstitute/sphn:hasCode ?hasSubjectPseudoIdentifier_hasDataProviderInstitute_Code . }
   optional{ ?resource sphn:hasSubjectPseudoIdentifier/sphn:hasDataProviderInstitute/sphn:hasCode/sphn:hasIdentifier ?hasSubjectPseudoIdentifier_hasDataProviderInstitute_hasCode_Identifier . }
   optional{ ?resource sphn:hasSubjectPseudoIdentifier/sphn:hasDataProviderInstitute/sphn:hasCode/sphn:hasCodingSystemAndVersion ?hasSubjectPseudoIdentifier_hasDataProviderInstitute_hasCode_CodingSystemAndVersion . }
   optional{ ?resource sphn:hasSubjectPseudoIdentifier/sphn:hasDataProviderInstitute/sphn:hasCode/sphn:hasName ?hasSubjectPseudoIdentifier_hasDataProviderInstitute_hasCode_Name . }
   optional{ ?resource sphn:hasQuantity/sphn:hasValue ?hasQuantity_Value . }
   optional{ ?resource sphn:hasQuantity/sphn:hasComparator ?hasQuantity_Comparator . }
   optional{ ?resource sphn:hasQuantity/sphn:hasUnit/sphn:hasCode ?hasQuantity_hasUnit_Code . }
   optional{ ?resource sphn:hasQuantity/sphn:hasUnit/sphn:hasCode/sphn:hasIdentifier ?hasQuantity_hasUnit_hasCode_Identifier . }
   optional{ ?resource sphn:hasQuantity/sphn:hasUnit/sphn:hasCode/sphn:hasCodingSystemAndVersion ?hasQuantity_hasUnit_hasCode_CodingSystemAndVersion . }
   optional{ ?resource sphn:hasQuantity/sphn:hasUnit/sphn:hasCode/sphn:hasName ?hasQuantity_hasUnit_hasCode_Name . }
   optional{ ?resource sphn:hasDeterminationDateTime ?DeterminationDateTime . }
}

The query above can be run against any SPARQL endpoint to retrieve data about Age resources from a database or other data resource of interest that stores data in an SPHN-compliant format. The query returns a table-like representation of the concept Age and its related metadata, of which an excerpt is shown below:

Table 1. Excerpt of an example table generated when running the Age query.

resource    | SubjectPseudoIdentifier_hasIdentifier | Quantity_hasValue
------------+---------------------------------------+------------------
CHE…Age-001 | CHE…SubPseId-002                      | “20”^^xsd:double
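Once an endpoint has answered, the result still has to be turned into rows before it can be processed as a table. The following is a minimal sketch, assuming the endpoint returns the W3C SPARQL 1.1 Query Results JSON Format. The sample response, its example.org IRIs and the function names are made up for illustration and do not come from real SPHN data.

```python
import csv
import io
import json

# Made-up response in the SPARQL 1.1 Query Results JSON Format (assumption).
SAMPLE_RESPONSE = json.dumps({
    "head": {"vars": ["resource", "hasQuantity_Value", "DeterminationDateTime"]},
    "results": {"bindings": [
        {
            "resource": {"type": "uri", "value": "https://example.org/Age-001"},
            "hasQuantity_Value": {
                "type": "literal",
                "datatype": "http://www.w3.org/2001/XMLSchema#double",
                "value": "20",
            },
            # DeterminationDateTime is absent: its OPTIONAL did not match.
        },
        {
            "resource": {"type": "uri", "value": "https://example.org/Age-002"},
            "hasQuantity_Value": {
                "type": "literal",
                "datatype": "http://www.w3.org/2001/XMLSchema#double",
                "value": "35",
            },
            "DeterminationDateTime": {
                "type": "literal",
                "datatype": "http://www.w3.org/2001/XMLSchema#dateTime",
                "value": "2022-04-06T11:55:43.673Z",
            },
        },
    ]},
})


def bindings_to_rows(response_text):
    """Return (columns, rows); variables left unbound become empty strings."""
    data = json.loads(response_text)
    columns = data["head"]["vars"]
    rows = [
        [binding.get(column, {}).get("value", "") for column in columns]
        for binding in data["results"]["bindings"]
    ]
    return columns, rows


def to_csv(columns, rows):
    """Serialise the flattened table as CSV text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(columns)
    writer.writerows(rows)
    return buffer.getvalue()


columns, rows = bindings_to_rows(SAMPLE_RESPONSE)
print(to_csv(columns, rows))
```

The sketch keeps only the lexical value of each binding; datatypes and language tags are dropped, which may or may not be acceptable for a given application.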

Statistical queries

Example of counting the instances per concept and predicates

The query below is the output of the SPARQLer for the Age concept:

PREFIX sphn:<https://biomedit.ch/rdf/sphn-ontology/sphn#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

SELECT *
WHERE {
  {
  SELECT ?origin (COUNT(?origin) as ?count_instances)
  WHERE {
      ?resource a <https://biomedit.ch/rdf/sphn-ontology/sphn#Age> .
      BIND("sphn:Age" as ?origin)
  }
  GROUP BY ?origin
  } UNION
  {
  SELECT ?origin (COUNT(?predicate) as ?count_instances)
  WHERE {
      ?resource a <https://biomedit.ch/rdf/sphn-ontology/sphn#Age> .

      BIND("sphn:hasSubjectPseudoIdentifier/sphn:hasIdentifier" as ?origin)
      optional{ ?resource sphn:hasSubjectPseudoIdentifier/sphn:hasIdentifier ?predicate . }
  }
  GROUP BY ?origin
  } UNION
  {
  SELECT ?origin (COUNT(?predicate) as ?count_instances)
  WHERE {
      ?resource a <https://biomedit.ch/rdf/sphn-ontology/sphn#Age> .

      BIND("sphn:hasSubjectPseudoIdentifier/sphn:hasDataProviderInstitute/sphn:hasCode" as ?origin)
      optional{ ?resource sphn:hasSubjectPseudoIdentifier/sphn:hasDataProviderInstitute/sphn:hasCode ?predicate . }
  }
  GROUP BY ?origin
  } UNION
  {
  SELECT ?origin (COUNT(?predicate) as ?count_instances)
  WHERE {
      ?resource a <https://biomedit.ch/rdf/sphn-ontology/sphn#Age> .

      BIND("sphn:hasSubjectPseudoIdentifier/sphn:hasDataProviderInstitute/sphn:hasCode/sphn:hasName" as ?origin)
      optional{ ?resource sphn:hasSubjectPseudoIdentifier/sphn:hasDataProviderInstitute/sphn:hasCode/sphn:hasName ?predicate . }
  }
  GROUP BY ?origin
  } UNION
  {
  SELECT ?origin (COUNT(?predicate) as ?count_instances)
  WHERE {
      ?resource a <https://biomedit.ch/rdf/sphn-ontology/sphn#Age> .

      BIND("sphn:hasSubjectPseudoIdentifier/sphn:hasDataProviderInstitute/sphn:hasCode/sphn:hasIdentifier" as ?origin)
      optional{ ?resource sphn:hasSubjectPseudoIdentifier/sphn:hasDataProviderInstitute/sphn:hasCode/sphn:hasIdentifier ?predicate . }
  }
  GROUP BY ?origin
  } UNION
  {
  SELECT ?origin (COUNT(?predicate) as ?count_instances)
  WHERE {
      ?resource a <https://biomedit.ch/rdf/sphn-ontology/sphn#Age> .

      BIND("sphn:hasSubjectPseudoIdentifier/sphn:hasDataProviderInstitute/sphn:hasCode/sphn:hasCodingSystemAndVersion" as ?origin)
      optional{ ?resource sphn:hasSubjectPseudoIdentifier/sphn:hasDataProviderInstitute/sphn:hasCode/sphn:hasCodingSystemAndVersion ?predicate . }
  }
  GROUP BY ?origin
  } UNION
  {
  SELECT ?origin (COUNT(?predicate) as ?count_instances)
  WHERE {
      ?resource a <https://biomedit.ch/rdf/sphn-ontology/sphn#Age> .

      BIND("sphn:hasDeterminationDateTime" as ?origin)
      optional{ ?resource sphn:hasDeterminationDateTime ?predicate . }
  }
  GROUP BY ?origin
  } UNION
  {
  SELECT ?origin (COUNT(?predicate) as ?count_instances)
  WHERE {
      ?resource a <https://biomedit.ch/rdf/sphn-ontology/sphn#Age> .

      BIND("sphn:hasQuantity/sphn:hasComparator" as ?origin)
      optional{ ?resource sphn:hasQuantity/sphn:hasComparator ?predicate . }
  }
  GROUP BY ?origin
  } UNION
  {
  SELECT ?origin (COUNT(?predicate) as ?count_instances)
  WHERE {
      ?resource a <https://biomedit.ch/rdf/sphn-ontology/sphn#Age> .

      BIND("sphn:hasQuantity/sphn:hasValue" as ?origin)
      optional{ ?resource sphn:hasQuantity/sphn:hasValue ?predicate . }
  }
  GROUP BY ?origin
  } UNION
  {
  SELECT ?origin (COUNT(?predicate) as ?count_instances)
  WHERE {
      ?resource a <https://biomedit.ch/rdf/sphn-ontology/sphn#Age> .

      BIND("sphn:hasQuantity/sphn:hasUnit/sphn:hasCode" as ?origin)
      optional{ ?resource sphn:hasQuantity/sphn:hasUnit/sphn:hasCode ?predicate . }
  }
  GROUP BY ?origin
  } UNION
  {
  SELECT ?origin (COUNT(?predicate) as ?count_instances)
  WHERE {
      ?resource a <https://biomedit.ch/rdf/sphn-ontology/sphn#Age> .

      BIND("sphn:hasQuantity/sphn:hasUnit/sphn:hasCode/sphn:hasName" as ?origin)
      optional{ ?resource sphn:hasQuantity/sphn:hasUnit/sphn:hasCode/sphn:hasName ?predicate . }
  }
  GROUP BY ?origin
  } UNION
  {
  SELECT ?origin (COUNT(?predicate) as ?count_instances)
  WHERE {
      ?resource a <https://biomedit.ch/rdf/sphn-ontology/sphn#Age> .

      BIND("sphn:hasQuantity/sphn:hasUnit/sphn:hasCode/sphn:hasIdentifier" as ?origin)
      optional{ ?resource sphn:hasQuantity/sphn:hasUnit/sphn:hasCode/sphn:hasIdentifier ?predicate . }
  }
  GROUP BY ?origin
  } UNION
  {
  SELECT ?origin (COUNT(?predicate) as ?count_instances)
  WHERE {
      ?resource a <https://biomedit.ch/rdf/sphn-ontology/sphn#Age> .

      BIND("sphn:hasQuantity/sphn:hasUnit/sphn:hasCode/sphn:hasCodingSystemAndVersion" as ?origin)
      optional{ ?resource sphn:hasQuantity/sphn:hasUnit/sphn:hasCode/sphn:hasCodingSystemAndVersion ?predicate . }
  }
  GROUP BY ?origin
  }
}

Running the query produces a table with the number of instances of the concept Age and the number of values found for each of its predicates, of which an excerpt is shown below:

Table 2. Excerpt of an example table generated when running the Age query.

origin                                                                       | count_instances
-----------------------------------------------------------------------------+------------------
“sphn:Age”                                                                   | “10”^^xsd:integer
“sphn:hasSubjectPseudoIdentifier/sphn:hasIdentifier”                         | “10”^^xsd:integer
“sphn:hasSubjectPseudoIdentifier/sphn:hasDataProviderInstitute/sphn:hasCode” | “10”^^xsd:integer
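The count query above repeats one grouped sub-select per property path and joins them with UNION. A short Python sketch of that pattern follows; the function names are hypothetical and the code only illustrates the query structure, not how the SPARQLer produces it.

```python
def count_block(concept, path=None):
    """Build one grouped sub-select; without a path it counts the instances themselves."""
    if path is None:
        label, counted, pattern = f"sphn:{concept}", "?origin", ""
    else:
        label, counted = path, "?predicate"
        pattern = f"\n      OPTIONAL {{ ?resource {path} ?predicate . }}"
    return (
        "  {\n"
        f"  SELECT ?origin (COUNT({counted}) AS ?count_instances)\n"
        "  WHERE {\n"
        f"      ?resource a sphn:{concept} .\n"
        f'      BIND("{label}" AS ?origin)'
        f"{pattern}\n"
        "  }\n"
        "  GROUP BY ?origin\n"
        "  }"
    )


def count_query(concept, paths):
    """Join the instance count and one count per property path with UNION."""
    blocks = [count_block(concept)] + [count_block(concept, p) for p in paths]
    return (
        "PREFIX sphn: <https://biomedit.ch/rdf/sphn-ontology/sphn#>\n\n"
        "SELECT *\nWHERE {\n" + " UNION\n".join(blocks) + "\n}"
    )


print(count_query("Age", ["sphn:hasQuantity/sphn:hasValue"]))
```

Because COUNT(?predicate) only counts solutions in which ?predicate is bound, a resource whose OPTIONAL does not match adds nothing to the count, while a resource with several values for the same path is counted once per value.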

Example of min and max of predicates (dates and values)

The query below is the output of the SPARQLer for the Age concept:

PREFIX sphn:<https://biomedit.ch/rdf/sphn-ontology/sphn#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

SELECT *
WHERE {
  {
  SELECT ?origin (MIN(?value) as ?min) (MAX(?value) as ?max)
  WHERE {
      ?resource a <https://biomedit.ch/rdf/sphn-ontology/sphn#Age> .

      BIND("sphn:hasDeterminationDateTime" as ?origin)
      optional{ ?resource sphn:hasDeterminationDateTime ?value . }
  }
  GROUP BY ?origin
  } UNION
  {
  SELECT ?origin (MIN(?value) as ?min) (MAX(?value) as ?max)
  WHERE {
      ?resource a <https://biomedit.ch/rdf/sphn-ontology/sphn#Age> .

      BIND("sphn:hasQuantity/sphn:hasValue" as ?origin)
      optional{ ?resource sphn:hasQuantity/sphn:hasValue ?value . }
  }
  GROUP BY ?origin
  }
}

Running the query produces a table with the minimum and maximum values of the date and value predicates of the concept Age, of which an excerpt is shown below:

Table 3. Excerpt of an example table generated when running the Age query.

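origin                           | min                                      | max
---------------------------------+------------------------------------------+-----------------------------------------
“sphn:hasDeterminationDateTime”  | “2022-04-06T11:55:43.673Z”^^xsd:dateTime | “2022-05-27T11:55:43.808Z”^^xsd:dateTime
“sphn:hasQuantity/sphn:hasValue” | “0.03431341828097245”^^xsd:double        | “0.9737777086827694”^^xsd:double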

Example of listing and counting the hasCode values per concept

The query below is the output of the SPARQLer for the Age concept:

PREFIX sphn:<https://biomedit.ch/rdf/sphn-ontology/sphn#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

SELECT *
WHERE {
  {
  SELECT ?origin ?code ?code_identifier ?code_name ?code_codingSystemAndVersion (COUNT(?code) as ?count_instances)
  WHERE {
      ?resource a <https://biomedit.ch/rdf/sphn-ontology/sphn#Age> .

      BIND("sphn:hasSubjectPseudoIdentifier/sphn:hasDataProviderInstitute/sphn:hasCode" as ?origin)
      optional{ ?resource sphn:hasSubjectPseudoIdentifier/sphn:hasDataProviderInstitute/sphn:hasCode ?code . }
      optional{ ?resource sphn:hasSubjectPseudoIdentifier/sphn:hasDataProviderInstitute/sphn:hasCode/sphn:hasIdentifier ?code_identifier . }
      optional{ ?resource sphn:hasSubjectPseudoIdentifier/sphn:hasDataProviderInstitute/sphn:hasCode/sphn:hasName ?code_name . }
      optional{ ?resource sphn:hasSubjectPseudoIdentifier/sphn:hasDataProviderInstitute/sphn:hasCode/sphn:hasCodingSystemAndVersion ?code_codingSystemAndVersion . }

  }
  GROUP BY ?origin ?code ?code_identifier ?code_name ?code_codingSystemAndVersion
  } UNION
  {
  SELECT ?origin ?code ?code_identifier ?code_name ?code_codingSystemAndVersion (COUNT(?code) as ?count_instances)
  WHERE {
      ?resource a <https://biomedit.ch/rdf/sphn-ontology/sphn#Age> .

      BIND("sphn:hasQuantity/sphn:hasUnit/sphn:hasCode" as ?origin)
      optional{ ?resource sphn:hasQuantity/sphn:hasUnit/sphn:hasCode ?code . }
      optional{ ?resource sphn:hasQuantity/sphn:hasUnit/sphn:hasCode/sphn:hasIdentifier ?code_identifier . }
      optional{ ?resource sphn:hasQuantity/sphn:hasUnit/sphn:hasCode/sphn:hasName ?code_name . }
      optional{ ?resource sphn:hasQuantity/sphn:hasUnit/sphn:hasCode/sphn:hasCodingSystemAndVersion ?code_codingSystemAndVersion . }

  }
  GROUP BY ?origin ?code ?code_identifier ?code_name ?code_codingSystemAndVersion
  }
}

Running the query produces a table that lists the codes reached through the hasCode predicates of the concept Age and counts how often each code occurs, of which an excerpt is shown below:

Table 4. Excerpt of an example table generated when running the Age query.

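origin                                                                       | code                  | code_identifier | code_name | code_codingSystemAndVersion | count_instances
-----------------------------------------------------------------------------+-----------------------+-----------------+-----------+-----------------------------+------------------
“sphn:hasSubjectPseudoIdentifier/sphn:hasDataProviderInstitute/sphn:hasCode” | resource:Code-Uid_hug | “CHE…”          | “HUG”     | “UID”                       | “10”^^xsd:integer
“sphn:hasQuantity/sphn:hasUnit/sphn:hasCode”                                 | resource:ucum/d       |                 |           |                             | “10”^^xsd:integer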
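What GROUP BY together with COUNT(?code) computes in these sub-selects can be mimicked locally on rows that have already been retrieved. The sketch below uses made-up rows with example.org IRIs and a hypothetical helper name. It is only meant to illustrate the grouping logic, not to replace the query.

```python
from collections import Counter

# Made-up flattened rows of (code IRI, code name); None marks a value
# left unbound because the OPTIONAL pattern did not match.
ROWS = [
    ("https://example.org/Code-A", "A"),
    ("https://example.org/Code-A", "A"),
    ("https://example.org/Code-B", "B"),
    (None, None),
]


def count_codes(rows):
    """Group identical rows and count those whose code is bound, like COUNT(?code)."""
    counts = Counter()
    for row in rows:
        code = row[0]
        # An unbound code still forms its own group, but contributes 0 to the count.
        counts[row] += 1 if code is not None else 0
    return counts


for group, count in count_codes(ROWS).items():
    print(group, count)
```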