Warning
This page provides an introduction to SPARQL for querying RDF data. After reading it, you will know what SPARQL is, how a SPARQL query is structured, which kinds of SPARQL queries exist, and how to build and write SPARQL queries for validating data in RDF.
SPARQL
Introduction
SPARQL (SPARQL Protocol and RDF Query Language) is the standard query language for RDF. More specifically, it is a declarative language standardized by the W3C. SPARQL borrows elements from RDF and is similar to SQL.
SPARQL queries are based on ‘graph pattern matching’: the query engine tries to match the pattern given in the query against the data and returns the parts of the data that match.
Shown in Figure 1 is a triple representing a resource resource:HospitalA, which has a relation sphn:hasSubjectPseudoIdentifier to a variable ?patient. This is a valid pattern which can be used for a search, and yields the list of patients for resource:HospitalA.
The syntax of SPARQL queries is similar to Turtle (but not exactly the same).

Figure 1: Example of a graph. A resource HospitalA connects to a variable ?patient via a sphn:hasSubjectPseudoIdentifier link.
Note
A variable in SPARQL is always written with a question mark in front of the variable name, as in ?patient (a dollar sign, as in $patient, is also accepted by the syntax).
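As an illustration, the pattern of Figure 1 can be written out as a complete query. This is a minimal sketch: the IRI bound to the resource: prefix is an assumption (the page does not state it) and has to be adapted to the namespace used in your data.

```sparql
# Assumed namespace for resource: (not given on this page); adjust to your data.
PREFIX resource: <https://biomedit.ch/rdf/sphn-resource/>
PREFIX sphn: <https://biomedit.ch/rdf/sphn-ontology/sphn#>

# Return every node that HospitalA points to via sphn:hasSubjectPseudoIdentifier.
SELECT ?patient
WHERE {
  resource:HospitalA sphn:hasSubjectPseudoIdentifier ?patient .
}
```

Each row of the result table holds one binding of ?patient, that is, one object found at the end of a matching triple.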
Structure of a query
At a minimum, a basic SPARQL query consists of a SELECT clause and a WHERE clause.
It has the following structure:
SELECT <variables>
WHERE {
<graph-pattern>
}
The curly brackets of the WHERE clause enclose the graph pattern.
Furthermore, a SPARQL query can include the following parts as well:
Prefix declarations
Prefix declarations are namespace declarations that allow prefixed names to be written in queries instead of full URIs. With prefix declarations, we can write shorter and clearer code:
prefix dc: <http://purl.org/dc/elements/1.1/>
prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
prefix sphn: <https://biomedit.ch/rdf/sphn-ontology/sphn#>
Note
With a prefix declared for SPHN, we can simply refer to sphn: in our code rather than spell out the entire URI.
Therefore, instead of writing:
?patient rdf:type <https://biomedit.ch/rdf/sphn-ontology/sphn#SubjectPseudoIdentifier>
We can write:
?patient a sphn:SubjectPseudoIdentifier
Type of query declaration
There are four types of query declaration (more information in the W3C documentation on Query Forms):
SELECT
ASK
DESCRIBE
CONSTRUCT
Data set definition
If a triplestore contains multiple data graphs, the specific dataset against which the query should be run can be specified with:
FROM <...>
FROM NAMED <...>
Note
If the dataset is not defined, the query usually runs by default on the complete data set.
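The difference between the two clauses can be sketched as follows. The graph IRIs are hypothetical placeholders, not graphs defined on this page. Triples from a FROM graph are matched directly in the WHERE clause, while triples from a FROM NAMED graph are only reachable through a GRAPH clause.

```sparql
PREFIX sphn: <https://biomedit.ch/rdf/sphn-ontology/sphn#>

SELECT ?patient ?g
# Hypothetical graph IRIs; replace them with graphs that exist in your triplestore.
FROM <https://example.org/graph/hospitalA>        # becomes the default graph of the query
FROM NAMED <https://example.org/graph/hospitalB>  # available as a named graph only
WHERE {
  # Matched against the default graph (hospitalA); ?g stays unbound here.
  { ?patient a sphn:SubjectPseudoIdentifier . }
  UNION
  # Matched against each named graph; ?g is bound to the graph IRI (hospitalB).
  { GRAPH ?g { ?patient a sphn:SubjectPseudoIdentifier . } }
}
```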
Graph pattern
The clause WHERE { ... } is used to define the graph pattern (in the form of triples) that the result of the query should comply with.
Query modifiers
Query modifiers change the way the output of the query is presented:
ORDER BY ...
Establishes the order of a solution sequence
GROUP BY ...
Divides the solutions into groups, so that aggregate values (for example a count or an average) can be calculated per group
HAVING ...
Operates over grouped solution sets and keeps only the groups that satisfy a condition
LIMIT ...
Places a limit on the number of solutions returned
OFFSET ...
Controls where the returned solutions start from
BIND ...
Assigns a value or the result of an expression to a variable (it is written inside the graph pattern)
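The following sketch combines these modifiers in one query. It reuses the class and property names that appear in the examples on this page; the threshold of 10 patients and the page size of 5 are arbitrary values chosen for illustration.

```sparql
PREFIX sphn: <https://biomedit.ch/rdf/sphn-ontology/sphn#>

SELECT ?provider_name (COUNT(?patient) AS ?numPatients)
WHERE {
  ?patient a sphn:SubjectPseudoIdentifier .
  ?patient sphn:hasDataProviderInstitute ?data_provider .
  # BIND: store the string form of the provider IRI in a new variable.
  BIND (STR(?data_provider) AS ?provider_name)
}
GROUP BY ?provider_name          # one group per data provider
HAVING (COUNT(?patient) > 10)    # keep only groups with more than 10 patients
ORDER BY DESC(?numPatients)      # largest groups first
LIMIT 5                          # return at most 5 rows
OFFSET 5                         # skip the first 5 rows, i.e. show the second page
```

LIMIT and OFFSET are typically used together with ORDER BY, since without a defined order the rows that get skipped or returned are not predictable.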
Types of queries
There are four types of SPARQL queries:
SELECT
A SELECT query gets results for requested variables. The output is displayed in a table (see W3C documentation SELECT)
The example below retrieves all instances (?patient) whose type is sphn:SubjectPseudoIdentifier.
PREFIX rdf:<http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs:<http://www.w3.org/2000/01/rdf-schema#>
PREFIX sphn:<https://biomedit.ch/rdf/sphn-ontology/sphn#>
SELECT ?patient
WHERE {
?patient rdf:type sphn:SubjectPseudoIdentifier
}
ASK
An ASK query checks for matches of a requested pattern, and results in a Boolean ‘yes/no’ output (see W3C documentation ASK)
In the example below, the question asked is whether patient77 had an allergy episode annotated.
PREFIX rdf:<http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs:<http://www.w3.org/2000/01/rdf-schema#>
PREFIX sphn:<https://biomedit.ch/rdf/sphn-ontology/sphn#>
ASK
WHERE {
?patient a sphn:SubjectPseudoIdentifier .
?patient sphn:hasIdentifier "patient77" .
?allergy_episode a sphn:AllergyEpisode .
?allergy_episode sphn:hasSubjectPseudoIdentifier ?patient .
}
CONSTRUCT
A CONSTRUCT query gets specific parts of a graph and returns a new graph built from the triple templates given in the query; the stored data itself is not modified (see W3C documentation CONSTRUCT)
The query below adds a diagnosis to patients that have a lab test code LOINC 6690-2. The result is a set of triples that links this new diagnosis to each matching patient.
PREFIX sphn: <https://biomedit.ch/rdf/sphn-ontology/sphn#>
PREFIX snomed: <http://snomed.info/id/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
# The two namespaces below are assumptions (they are not declared in the original example); adjust them to your data.
PREFIX resource: <https://biomedit.ch/rdf/sphn-resource/>
PREFIX loinc: <https://loinc.org/rdf/>
CONSTRUCT {
  resource:Diagnosis1 a sphn:Diagnosis .
  resource:Diagnosis1 sphn:hasSubjectPseudoIdentifier ?patient .
}
WHERE {
  ?patient a sphn:SubjectPseudoIdentifier .
  ?lab a sphn:LabResult .
  ?lab sphn:hasSubjectPseudoIdentifier ?patient .
  ?lab sphn:hasLabResultLabTestCode ?code .
  ?code a loinc:6690-2 .
}
DESCRIBE
A DESCRIBE query gets basic (triple) information about a variable or resource (see W3C documentation DESCRIBE)
In the example below, the query returns all information provided for patient78.
PREFIX sphn: <https://biomedit.ch/rdf/sphn-ontology/sphn#>
PREFIX snomed: <http://snomed.info/id/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
DESCRIBE ?thing
WHERE {
?thing a sphn:SubjectPseudoIdentifier .
?thing sphn:hasIdentifier "patient78" .
}
Query formation
In addition to the already mentioned query types, other constructs are also possible:
Nested queries
Nested queries are referred to as ‘subqueries’ in SPARQL: one SELECT inside another SELECT (more information about subqueries).
The subquery is evaluated first, and its results are then projected to the outer query.
The following query calculates the average number of patients per data provider institute:
PREFIX sphn: <https://biomedit.ch/rdf/sphn-ontology/sphn#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT (avg(?numPatients) AS ?avgNumPatientsByDataProvider)
WHERE {
SELECT ?data_provider (count(?patient) AS ?numPatients)
WHERE {
?patient a sphn:SubjectPseudoIdentifier .
?data_provider a sphn:DataProviderInstitute .
?patient sphn:hasDataProviderInstitute ?data_provider .
} GROUP BY ?data_provider
}
Federated SPARQL
A federated query allows for querying different SPARQL endpoints in the same query using a SERVICE clause (more information about federated querying).
We can thereby combine information that lives in different datasets in one query.
In the example below, the assumption is that the SNOMED CT codes used to annotate the Substance come from the BioPortal instance of SNOMED CT. Using the SERVICE clause, which connects to the BioPortal namespace of SNOMED CT (http://bioportal.bioontology.org/ontologies/SNOMEDCT/), it is possible to retrieve the preferred label of the SNOMED CT code 762952008, which corresponds to the Peanut substance that some patients are allergic to:
PREFIX sphn: <https://biomedit.ch/rdf/sphn-ontology/sphn#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
PREFIX snomed_bioportal: <http://purl.bioontology.org/ontology/SNOMEDCT/>
SELECT ?patient ?label
WHERE {
?patient a sphn:SubjectPseudoIdentifier .
?allergy_episode a sphn:AllergyEpisode .
?substance a sphn:Substance .
?allergy_episode sphn:hasSubjectPseudoIdentifier ?patient .
?allergy_episode sphn:hasSubstance ?substance .
?substance sphn:hasCode ?substance_code .
SERVICE <http://bioportal.bioontology.org/ontologies/SNOMEDCT/> {
?substance_code a snomed_bioportal:762952008 .
?substance_code skos:prefLabel ?label.
}
}
Note
Some tips for working with SPARQL queries:
a is a shortcut for rdf:type
Prefixes are highly recommended for better readability
Being familiar with the dataset structure helps to write a query
The period at the end of a line in the WHERE clause is a conjunction, i.e. AND
A semicolon at the end of a line in a WHERE clause introduces another property of the same subject
A comma at the end of a line in a WHERE clause introduces another object with the same predicate and subject.
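The three punctuation marks from the tips above can be seen side by side in the following sketch. It reuses class and property names from the examples on this page; the variable names ?substance_a and ?substance_b are chosen for illustration only.

```sparql
PREFIX sphn: <https://biomedit.ch/rdf/sphn-ontology/sphn#>

SELECT ?allergy_episode ?patient ?substance_a ?substance_b
WHERE {
  # Abbreviated form:
  #   ';' keeps the subject and starts a new predicate,
  #   ',' keeps the subject and the predicate and adds another object,
  #   '.' ends the group of triples.
  ?allergy_episode a sphn:AllergyEpisode ;
                   sphn:hasSubjectPseudoIdentifier ?patient ;
                   sphn:hasSubstance ?substance_a ,
                                     ?substance_b .

  # Equivalent long form, one full triple per line:
  #   ?allergy_episode a sphn:AllergyEpisode .
  #   ?allergy_episode sphn:hasSubjectPseudoIdentifier ?patient .
  #   ?allergy_episode sphn:hasSubstance ?substance_a .
  #   ?allergy_episode sphn:hasSubstance ?substance_b .
}
```

Both forms describe the same graph pattern. Note that ?substance_a and ?substance_b may bind to the same substance unless a FILTER such as FILTER (?substance_a != ?substance_b) is added.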