SHACL
The Shapes Constraint Language, SHACL (https://www.w3.org/TR/shacl/), is a W3C Recommendation for validating RDF graphs against a set of conditions. These conditions are expressed as shapes, which are themselves written in RDF; an RDF graph that contains shapes is called a shapes graph.
The RDF graphs that are validated against a shapes graph are called data graphs.
Besides their role in the validation process, the shapes that describe these conditions can also be used for other purposes, such as data integration.
The validation process takes a data graph and a shapes graph as input and produces a validation report, as shown in Figure 1. The process is carried out by a SHACL processor, which can be either an RDF triplestore with built-in support for SHACL validation or a dedicated SHACL validation processor (e.g., a library, API, or framework).
Figure 1 Validation process facilitated by a SHACL Processor. A data graph and a shapes graph given as input are mapped to a validation report as the result.
The main steps involved in validating RDF data are (see Figure 2):
The starting point is a data graph containing the RDF data to be validated.
Targets select focus nodes from this data graph, i.e., the nodes that are validated against a given shape (see the SHACL targets section for further information).
Optionally, filters remove some of the selected focus nodes. Filters are not part of SHACL Core; they are defined in SHACL Advanced Features.
Each remaining focus node is checked against the constraints declared by the shape, and every constraint that is not satisfied produces a validation result in the validation report (see the SHACL shapes and SHACL constraints sections for further information).
The validation report is described with the SHACL Validation Report Vocabulary.
Figure 2 Overview of steps involved in the SHACL validation process.
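The steps above can be sketched in Python as follows. This is a minimal toy model, not a SHACL processor: it assumes the data graph is a set of (subject, predicate, object) string triples with literal values as plain strings, and the names `Shape`, `validate`, `patients` and `max_one_identifier` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable

# Toy data graph (assumption): a set of (subject, predicate, object) triples,
# with IRIs as prefixed names and literal values as plain strings.
Triple = tuple[str, str, str]
Graph = set[Triple]


@dataclass
class ValidationResult:
    focus_node: str
    message: str


@dataclass
class Shape:
    # Targets: select the focus nodes from the data graph.
    target: Callable[[Graph], set[str]]
    # Constraints: each returns violation messages for one focus node.
    constraints: list[Callable[[Graph, str], list[str]]]
    # Filters (SHACL Advanced Features only): keep a focus node if all return True.
    filters: list[Callable[[Graph, str], bool]] = field(default_factory=list)


def validate(data: Graph, shapes: list[Shape]) -> dict:
    results = []
    for shape in shapes:
        focus_nodes = shape.target(data)
        kept = {n for n in focus_nodes if all(f(data, n) for f in shape.filters)}
        for node in sorted(kept):
            for constraint in shape.constraints:
                results.extend(ValidationResult(node, m) for m in constraint(data, node))
    # The report conforms only if no validation result was produced.
    return {"conforms": not results, "results": results}


# Hypothetical target: all instances of ex:Patient (no subclass handling here).
def patients(data: Graph) -> set[str]:
    return {s for (s, p, o) in data if p == "rdf:type" and o == "ex:Patient"}


# Hypothetical constraint: at most one ex:hasSubjectPseudoIdentifier value.
def max_one_identifier(data: Graph, node: str) -> list[str]:
    values = [o for (s, p, o) in data if s == node and p == "ex:hasSubjectPseudoIdentifier"]
    return [f"{len(values)} identifiers, expected at most 1"] if len(values) > 1 else []


data = {
    ("ex:Max", "rdf:type", "ex:Patient"),
    ("ex:Max", "ex:hasSubjectPseudoIdentifier", "123-45-678A"),
    ("ex:Erika", "rdf:type", "ex:Patient"),
    ("ex:Erika", "ex:hasSubjectPseudoIdentifier", "123-45-6789"),
    ("ex:Erika", "ex:hasSubjectPseudoIdentifier", "124-35-6789"),
}
report = validate(data, [Shape(target=patients, constraints=[max_one_identifier])])
print(report["conforms"], [r.focus_node for r in report["results"]])  # False ['ex:Erika']
```

Keeping targets, filters and constraints as separate callables mirrors the order of the steps: filters only ever shrink the set of focus nodes, and the report is derived purely from the collected results.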
The following sections provide further information on SHACL targets, SHACL shapes, and SHACL constraints, followed by a SHACL example with explanations.
SHACL targets
SHACL targets select the set of nodes in a data graph against which a shape's constraints are applied. These selected nodes, called focus nodes, must conform to the shape for the validation to succeed.
The SHACL Core Language includes the following kinds of targets:
Target | Description
---|---
sh:targetNode | Selects the given node itself
sh:targetClass | Selects all instances of the given class (including instances of its subclasses)
sh:targetSubjectsOf | Selects all subjects of triples that use the given property
sh:targetObjectsOf | Selects all objects of triples that use the given property
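The four kinds of targets can be illustrated with a small, self-contained Python sketch. It assumes a toy data graph stored as a set of string triples; the function names and the ex:Outpatient subclass are hypothetical. For sh:targetClass, the sketch follows SHACL's notion of instances, which also covers instances of subclasses linked through rdfs:subClassOf in the data graph.

```python
# Toy data graph (assumption): (subject, predicate, object) triples as prefixed names.
Triple = tuple[str, str, str]


def target_node(data: set[Triple], node: str) -> set[str]:
    # sh:targetNode: the given node itself is a focus node.
    return {node}


def target_class(data: set[Triple], cls: str) -> set[str]:
    # sh:targetClass: all instances of cls, including instances of its
    # (transitive) subclasses declared with rdfs:subClassOf in the data graph.
    classes = {cls}
    while True:
        subs = {s for (s, p, o) in data if p == "rdfs:subClassOf" and o in classes}
        if subs <= classes:
            break
        classes |= subs
    return {s for (s, p, o) in data if p == "rdf:type" and o in classes}


def target_subjects_of(data: set[Triple], prop: str) -> set[str]:
    # sh:targetSubjectsOf: every subject of a triple whose predicate is prop.
    return {s for (s, p, o) in data if p == prop}


def target_objects_of(data: set[Triple], prop: str) -> set[str]:
    # sh:targetObjectsOf: every object of a triple whose predicate is prop.
    return {o for (s, p, o) in data if p == prop}


data = {
    ("ex:Max", "rdf:type", "ex:Patient"),
    ("ex:Kevin", "rdf:type", "ex:Outpatient"),  # hypothetical subclass
    ("ex:Outpatient", "rdfs:subClassOf", "ex:Patient"),
    ("ex:Kevin", "ex:isLocatedAt", "ex:UntypedLocation"),
}
print(sorted(target_class(data, "ex:Patient")))     # ['ex:Kevin', 'ex:Max']
print(target_subjects_of(data, "ex:isLocatedAt"))   # {'ex:Kevin'}
print(target_objects_of(data, "ex:isLocatedAt"))    # {'ex:UntypedLocation'}
```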
SHACL shapes
A shape determines how to validate a focus node based on the values of properties and other characteristics of the focus node. To that end, shapes can declare constraints. The SHACL Core language defines two types of shapes:
node shapes, which specify constraints about the focus node itself;
property shapes, which specify constraints about the values reached from the focus node via a particular property or path.
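The following is a very simple example of the syntax for a SHACL node shape. It declares the node shape :Name, selects all instances of the class :Name as focus nodes via sh:targetClass, and requires each focus node to be an IRI (sh:nodeKind sh:IRI).
:Name a sh:NodeShape ; sh:targetClass :Name ; sh:nodeKind sh:IRI .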
For more elaborate examples, also including Property Shapes, please refer to the SHACL example section.
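The difference between the two kinds of shapes comes down to their value nodes: in SHACL, a node shape's value node is the focus node itself, while a property shape's value nodes are the nodes reached from the focus node via its sh:path. The Python sketch below illustrates this with sh:nodeKind sh:IRI; the `IRI`/`Literal` classes and the function names are hypothetical, and paths are simplified to a single predicate.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IRI:
    value: str


@dataclass(frozen=True)
class Literal:
    lexical: str
    datatype: str = "xsd:string"


def value_nodes(data, focus, path=None):
    # Node shape (no path): the only value node is the focus node itself.
    if path is None:
        return {focus}
    # Property shape: the objects reached from the focus node via the path
    # (simplified here to a single predicate).
    return {o for (s, p, o) in data if s == focus and p == path}


def node_kind_iri_violations(values):
    # sh:nodeKind sh:IRI: each value node that is not an IRI violates the constraint.
    return [v for v in values if not isinstance(v, IRI)]


MAX = IRI("http://example.org/Max")
IDENT = IRI("ex:hasSubjectPseudoIdentifier")
data = {(MAX, IDENT, Literal("123-45-678A"))}

# Node shape with sh:nodeKind sh:IRI: the focus node is an IRI, so no violation.
print(node_kind_iri_violations(value_nodes(data, MAX)))  # []
# Hypothetical property shape with the same constraint on ex:hasSubjectPseudoIdentifier:
# its value node is a literal, so it is reported.
print(node_kind_iri_violations(value_nodes(data, MAX, IDENT)))
# [Literal(lexical='123-45-678A', datatype='xsd:string')]
```

Because a node shape always has exactly one value node, counting value nodes is only informative for property shapes, which is consistent with Table 2 listing cardinality constraints for property shapes only.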
SHACL constraints
In general, a constraint is a restriction or limitation. In SHACL, a constraint is a restriction declared when defining a shape or a combination of shapes. SHACL Core groups its constraints into several types, and each type comprises one or more constraint components. Table 2 summarizes the SHACL Core constraint types, their associated constraint components, and whether each type can be used in node shapes and property shapes.
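Constraint type | Constraint components | Node shape | Property shape
---|---|---|---
Value Type | class, datatype, nodeKind | yes | yes
Cardinality | minCount, maxCount | no | yes
Values |  | yes | yes
Value Range | minExclusive, minInclusive, maxExclusive, maxInclusive | no | yes
String-based | minLength, maxLength, pattern, languageIn, uniqueLang | yes | yes
Property Pair Constraints | equals, disjoint, lessThan, lessThanOrEquals | no | yes
Logical Constraints | not, and, or, xone | yes | yes
Qualified Shapes | qualifiedValueShape, qualifiedMinCount, qualifiedMaxCount | no | yes
Closed Shapes | closed, ignoredProperties | yes | no
Non-validating Constraints |  | yes | yes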
Note: The sh: namespace prefix is omitted from the constraint components in Table 2 for better readability.
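One way to picture a constraint component is as a test over the value nodes of a focus node. The Python sketch below covers a handful of the components listed in Table 2; it is a simplification (value nodes are plain Python values, and SHACL's datatype-aware comparisons are ignored), and the `CHECKS` registry and `violated_components` function are hypothetical names.

```python
import re

# Hypothetical checkers for a few SHACL Core constraint components (sh: prefix
# omitted, as in Table 2). Each receives the value nodes of one focus node and
# the parameter value, and returns True when the constraint is satisfied.
CHECKS = {
    # Cardinality: counts the value nodes as a whole.
    "minCount": lambda values, n: len(values) >= n,
    "maxCount": lambda values, n: len(values) <= n,
    # Value range: compares every value node with the bound.
    "minInclusive": lambda values, bound: all(v >= bound for v in values),
    "maxExclusive": lambda values, bound: all(v < bound for v in values),
    # String-based: inspects the string form of every value node.
    "maxLength": lambda values, n: all(len(str(v)) <= n for v in values),
    # sh:pattern uses regex search semantics, hence explicit ^...$ anchors in patterns.
    "pattern": lambda values, regex: all(re.search(regex, str(v)) for v in values),
}


def violated_components(values, constraints):
    """Return the names of the components in `constraints` that the values violate."""
    return [name for name, param in constraints.items() if not CHECKS[name](values, param)]


# A hypothetical property shape on ex:age, applied to a focus node with two ages.
print(violated_components([34, 17], {"maxCount": 1, "minInclusive": 18}))  # ['maxCount', 'minInclusive']
print(violated_components(["123-45-678A"], {"pattern": r"^\d{3}-\d{2}-\d{4}$"}))  # ['pattern']
```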
Validation report
The validation report is the result of the validation process. It states whether the data conforms to the shapes (sh:conforms) and contains the set of all validation results (sh:result).
Consider the following data, which states that the IRI http://example.org/Max is of type :Name:
<http://example.org/Max> a :Name .
Validating this data against the shapes graph example from the SHACL shapes section produces the following validation report:
[ a sh:ValidationReport ; sh:conforms true ] .
This report states that the data conforms to the shapes graph: the focus node http://example.org/Max is indeed an IRI. For more elaborate examples, including a validation report for data that does not conform to a shapes graph, please refer to the SHACL example section below.
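The structure of such a report can be sketched in Python: sh:conforms is true exactly when validation produced no results (of any severity), and each result records at least the focus node, its severity and the source constraint component. The serializer below is a simplified, hypothetical helper (`report_to_turtle`), not the output format of any particular SHACL processor, and it treats terms written as <...> as IRIs by assumption.

```python
from dataclasses import dataclass


@dataclass
class ValidationResult:
    focus_node: str
    component: str                  # value of sh:sourceConstraintComponent
    severity: str = "sh:Violation"  # value of sh:resultSeverity


def report_to_turtle(results):
    """Render results as a simplified sh:ValidationReport in Turtle."""
    # sh:conforms is true exactly when there are no validation results at all.
    parts = ["sh:conforms " + ("false" if results else "true")]
    for r in results:
        parts.append(
            "sh:result [ a sh:ValidationResult ; "
            f"sh:focusNode {r.focus_node} ; "
            f"sh:resultSeverity {r.severity} ; "
            f"sh:sourceConstraintComponent {r.component} ]"
        )
    return "[ a sh:ValidationReport ; " + " ; ".join(parts) + " ] ."


def check_node_kind_iri(focus_nodes):
    # Toy convention (assumption): IRIs are written as <...>; anything else is not an IRI.
    return [ValidationResult(n, "sh:NodeKindConstraintComponent")
            for n in focus_nodes if not n.startswith("<")]


# The focus node http://example.org/Max from the example data.
print(report_to_turtle(check_node_kind_iri(["<http://example.org/Max>"])))
# [ a sh:ValidationReport ; sh:conforms true ] .
```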
SHACL example
The following example demonstrates how SHACL conditions can be represented as shapes and constraints in a shapes graph and used to validate a data graph (adapted from https://www.w3.org/TR/shacl/#shacl-example; the same conventions apply). The data graph contains three instances of the ex:Patient class, defined as follows:
Data graph
ex:Max
a ex:Patient ;
ex:hasSubjectPseudoIdentifier "123-45-678A" .
ex:Erika
a ex:Patient ;
ex:hasSubjectPseudoIdentifier "123-45-6789" ;
ex:hasSubjectPseudoIdentifier "124-35-6789" .
ex:Kevin
a ex:Patient;
ex:birthDate "1971-07-07"^^xsd:date ;
ex:isLocatedAt ex:UntypedLocation .
The SHACL conditions to be represented are:
An instance of ex:Patient can have at most one value for the property ex:hasSubjectPseudoIdentifier. This value must be a string literal with a predetermined format (i.e., three digits, a dash, two digits, another dash, and four digits).
ex:PatientShape
a sh:NodeShape ;
sh:targetClass ex:Patient ; # Applies to all patients
sh:property [
sh:path ex:hasSubjectPseudoIdentifier ; # constrains the values of ex:hasSubjectPseudoIdentifier
sh:maxCount 1 ; # constrains the number of ex:hasSubjectPseudoIdentifier values to at most 1
sh:datatype xsd:string ; # constrains the value to be a string
sh:pattern "^\\d{3}-\\d{2}-\\d{4}$" ; # constrains the value to follow this specific pattern
] ;
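To see how the three components of this property shape act on the example data, here is a minimal Python sketch. It assumes the values of ex:hasSubjectPseudoIdentifier have already been collected per focus node as (lexical form, datatype) pairs; `check_identifier` is a hypothetical helper, and its result tuples only loosely mirror sh:ValidationResult.

```python
import re

# The Turtle string "^\\d{3}-\\d{2}-\\d{4}$" un-escapes to this regular expression.
PATTERN = re.compile(r"^\d{3}-\d{2}-\d{4}$")

# Values of ex:hasSubjectPseudoIdentifier per focus node, as (lexical form, datatype) pairs.
identifiers = {
    "ex:Max": [("123-45-678A", "xsd:string")],
    "ex:Erika": [("123-45-6789", "xsd:string"), ("124-35-6789", "xsd:string")],
    "ex:Kevin": [],
}


def check_identifier(focus, values):
    results = []
    if len(values) > 1:                      # sh:maxCount 1
        results.append((focus, "sh:MaxCountConstraintComponent", None))
    for lexical, datatype in values:
        if datatype != "xsd:string":         # sh:datatype xsd:string
            results.append((focus, "sh:DatatypeConstraintComponent", lexical))
        if not PATTERN.search(lexical):      # sh:pattern (matched against the lexical form)
            results.append((focus, "sh:PatternConstraintComponent", lexical))
    return results


for focus, values in identifiers.items():
    print(focus, check_identifier(focus, values))
# ex:Max [('ex:Max', 'sh:PatternConstraintComponent', '123-45-678A')]
# ex:Erika [('ex:Erika', 'sh:MaxCountConstraintComponent', None)]
# ex:Kevin []
```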
An instance of ex:Patient can have any number of values for the property ex:isLocatedAt. These values must be IRIs and instances of ex:Location_class.
sh:property [
sh:path ex:isLocatedAt ;
sh:class ex:Location_class ;
sh:nodeKind sh:IRI ;
] ;
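The two components of the ex:isLocatedAt property shape can be sketched the same way. The Python below is a hypothetical, simplified check: it assumes a toy data graph of string triples (literals start with a double quote, blank nodes with _:), and it treats a node as an instance of ex:Location_class if one of its rdf:type values is that class or one of its subclasses, following SHACL's definition of instances.

```python
# Toy data graph (assumption): prefixed names for IRIs, literals start with a double quote.
data = {
    ("ex:Kevin", "rdf:type", "ex:Patient"),
    ("ex:Kevin", "ex:birthDate", '"1971-07-07"^^xsd:date'),
    ("ex:Kevin", "ex:isLocatedAt", "ex:UntypedLocation"),
}


def is_iri(term):
    return not term.startswith('"') and not term.startswith("_:")


def superclasses(data, cls):
    # cls plus every class reachable from it via rdfs:subClassOf.
    seen, todo = {cls}, [cls]
    while todo:
        current = todo.pop()
        for (s, p, o) in data:
            if s == current and p == "rdfs:subClassOf" and o not in seen:
                seen.add(o)
                todo.append(o)
    return seen


def is_instance(data, node, cls):
    types = {o for (s, p, o) in data if s == node and p == "rdf:type"}
    return any(cls in superclasses(data, t) for t in types)


def check_located_at(data, focus):
    results = []
    for value in sorted(o for (s, p, o) in data if s == focus and p == "ex:isLocatedAt"):
        if not is_iri(value):                                  # sh:nodeKind sh:IRI
            results.append((focus, "sh:NodeKindConstraintComponent", value))
        if not is_instance(data, value, "ex:Location_class"):  # sh:class ex:Location_class
            results.append((focus, "sh:ClassConstraintComponent", value))
    return results


print(check_located_at(data, "ex:Kevin"))
# [('ex:Kevin', 'sh:ClassConstraintComponent', 'ex:UntypedLocation')]
```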
An instance of ex:Patient cannot have values for any property other than ex:hasSubjectPseudoIdentifier, ex:isLocatedAt, and rdf:type. This is expressed by setting the parameter sh:closed to true, which means that no property values are allowed other than those of the properties used as sh:path in the shape's property shapes; rdf:type is explicitly permitted through sh:ignoredProperties.
sh:closed true ;
sh:ignoredProperties ( rdf:type ) .
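A closed-shape check can be sketched by collecting the allowed properties and reporting every other property the focus node uses. The Python below is a hypothetical simplification over string triples; applied to ex:Kevin from the example data graph, it flags the ex:birthDate triple, since ex:birthDate is neither a sh:path of the shape's property shapes nor listed in sh:ignoredProperties.

```python
# Kevin's triples from the example data graph (toy representation: literals as strings).
data = {
    ("ex:Kevin", "rdf:type", "ex:Patient"),
    ("ex:Kevin", "ex:birthDate", '"1971-07-07"^^xsd:date'),
    ("ex:Kevin", "ex:isLocatedAt", "ex:UntypedLocation"),
}

# sh:closed true: the allowed properties are the sh:path values of the property shapes...
ALLOWED = {"ex:hasSubjectPseudoIdentifier", "ex:isLocatedAt"}
# ...plus the members of sh:ignoredProperties.
IGNORED = {"rdf:type"}


def check_closed(data, focus):
    # Every triple of the focus node whose predicate is not allowed yields one result
    # (focus node, result path, value).
    return sorted(
        (focus, p, o) for (s, p, o) in data
        if s == focus and p not in ALLOWED | IGNORED
    )


print(check_closed(data, "ex:Kevin"))
# [('ex:Kevin', 'ex:birthDate', '"1971-07-07"^^xsd:date')]
```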
Combining all of these fragments gives the complete shape for the target class ex:Patient:
ex:PatientShape
a sh:NodeShape ;
sh:targetClass ex:Patient ; # Applies to all patients
sh:property [
sh:path ex:hasSubjectPseudoIdentifier ; # constrains the values of ex:hasSubjectPseudoIdentifier
sh:maxCount 1 ;
sh:datatype xsd:string ;
sh:pattern "^\\d{3}-\\d{2}-\\d{4}$" ;
] ;
sh:property [
sh:path ex:isLocatedAt ;
sh:class ex:Location_class ;
sh:nodeKind sh:IRI ;
] ;
sh:closed true ;
sh:ignoredProperties ( rdf:type ) .
The representation can be read and understood as follows:
The shape ex:PatientShape declares its target nodes to be the set of all instances of the class ex:Patient by means of the sh:targetClass property. During the validation process, these target nodes become the focus nodes to which the node shape applies.
ex:PatientShape is such a node shape, declaring constraints on its focus nodes: it uses the parameters sh:closed and sh:ignoredProperties, plus two sh:property constraints, each expressed by a property shape.
A property shape can impose several restrictions on property values by combining parameters from multiple constraint components. For each focus node, the values of the property shape's path are validated against all of its components.
The property shape for ex:hasSubjectPseudoIdentifier uses parameters from three constraint components: sh:datatype, sh:pattern, and sh:maxCount. For each focus node, the values of ex:hasSubjectPseudoIdentifier are validated against all three components. In detail, sh:datatype constrains the value to be a string, sh:pattern constrains the string to follow the specified regular expression, and sh:maxCount limits the number of ex:hasSubjectPseudoIdentifier values to one for a single ex:Patient instance.
The property shape for ex:isLocatedAt uses parameters from two constraint components: sh:nodeKind and sh:class. For each focus node, the values of ex:isLocatedAt are validated against both components. In detail, sh:nodeKind and sh:class constrain the values to be IRIs and instances of ex:Location_class, respectively.
SHACL validation of the provided data graph against this shapes graph produces the following validation report (Note: edited for easier reading; see the section Validation report for details on the format).
[ a sh:ValidationReport;
sh:conforms false;
sh:result
[ a sh:ValidationResult;
sh:resultSeverity sh:Violation;
sh:focusNode ex:Max;
sh:value "123-45-678A";
sh:resultPath ex:hasSubjectPseudoIdentifier;
sh:sourceConstraintComponent sh:PatternConstraintComponent;
sh:sourceShape <http://shape.ontotext.com/node#.../1>
],
[ a sh:ValidationResult;
sh:resultSeverity sh:Violation;
sh:focusNode ex:Erika;
sh:resultPath ex:hasSubjectPseudoIdentifier;
sh:sourceConstraintComponent sh:MaxCountConstraintComponent;
sh:sourceShape <http://shape.ontotext.com/node#.../1>
],
[ a sh:ValidationResult;
sh:resultSeverity sh:Violation;
sh:focusNode ex:Kevin;
sh:resultPath ex:isLocatedAt;
sh:value ex:UntypedLocation;
sh:sourceConstraintComponent sh:ClassConstraintComponent;
sh:sourceShape <http://shape.ontotext.com/node#.../2>
],
[ a sh:ValidationResult;
sh:resultSeverity sh:Violation;
sh:focusNode ex:Kevin;
sh:resultPath ex:birthDate;
sh:value "1971-07-07"^^xsd:date;
sh:sourceConstraintComponent sh:ClosedConstraintComponent;
sh:sourceShape ex:PatientShape
]
] .
<http://shape.ontotext.com/node#.../1> a sh:PropertyShape;
sh:path ex:hasSubjectPseudoIdentifier;
sh:maxCount 1;
sh:pattern "^\\d{3}-\\d{2}-\\d{4}$" .
<http://shape.ontotext.com/node#.../2> a sh:PropertyShape;
sh:path ex:isLocatedAt;
sh:class ex:Location_class .
The validation fails (sh:conforms false), meaning that the data does not comply with the SHACL shapes defined. The validation results point to the following problems:
Max's identifier does not match the sh:pattern constraint (i.e., its last character is the letter A rather than a digit).
Erika violates the restriction on the number of ex:hasSubjectPseudoIdentifier values (i.e., two values are given where at most one is allowed).
Kevin's location ex:UntypedLocation is not an instance of ex:Location_class.
Kevin also has a value for ex:birthDate, a property that the closed shape ex:PatientShape does not allow.