SHACL

The Shapes Constraint Language, SHACL (https://www.w3.org/TR/shacl/), is a W3C Recommendation for validating RDF graphs against a set of conditions. These conditions are expressed as shapes, written in RDF, and collected in a shapes graph. The RDF graphs validated against a shapes graph are called data graphs. Besides validation, the shape descriptions can serve additional purposes such as data integration. The validation process takes a data graph and a shapes graph as input and produces a validation report, as shown in Figure 1. The process is carried out by a SHACL Processor, which can be either an RDF triplestore with support for SHACL validation or a dedicated SHACL validation processor (e.g. library, API, framework).

SHACL validation

Figure 1 Validation process facilitated by a SHACL Processor. A data graph and a shapes graph given as input are mapped to a validation report as the result.

The typical steps involved in validating RDF data are (see Figure 2):

  • The starting point is a data graph, which contains the RDF data to be validated.

  • Targets select focus nodes from this data graph, i.e. the nodes to be validated against a given shape (see SHACL targets section for further information).

  • Filters can then be used to remove some of the focus nodes. Filters are part of SHACL Advanced Features.

  • Constraints are then used to ensure the conformance of focus nodes against the shapes, resulting in a validation report (see SHACL shapes and SHACL constraints sections for further information).

The validation report is described with the SHACL Validation Report Vocabulary.

SHACL validation overview

Figure 2 Overview of steps involved in the SHACL validation process.

The following sections provide further information on SHACL targets, SHACL shapes, and SHACL constraints, followed by a SHACL example with explanations.

SHACL targets

SHACL targets enable the selection of a specific set of nodes from RDF data, against which the constraints of a shape are applied. These selected ‘data targets’, or nodes, have to conform to the constraints in order to be valid.

The SHACL Core Language includes the following kinds of targets:

Table 1 SHACL targets available in the SHACL Core Language.

Value             Description
targetNode        Points to a specific node
targetClass       Points to all nodes that have a given type
targetSubjectsOf  Points to all subjects of a specific property
targetObjectsOf   Points to all objects of a specific property
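
As a rough illustration of Table 1, the following sketch declares one shape per target kind. The shape names are hypothetical, and the ex: namespace is assumed to follow the conventions of the W3C SHACL example; each shape reuses the same simple sh:nodeKind constraint so that only the target differs.

```turtle
@prefix sh: <http://www.w3.org/ns/shacl#> .
@prefix ex: <http://example.com/ns#> .

# Hypothetical shapes, one per target kind; names are for illustration only.

ex:SingleNodeShape
    a sh:NodeShape ;
    sh:targetNode ex:Max ;                # selects exactly the node ex:Max
    sh:nodeKind sh:IRI .

ex:AllPatientsShape
    a sh:NodeShape ;
    sh:targetClass ex:Patient ;           # selects every instance of ex:Patient
    sh:nodeKind sh:IRI .

ex:LocatedThingShape
    a sh:NodeShape ;
    sh:targetSubjectsOf ex:isLocatedAt ;  # selects every subject of an ex:isLocatedAt triple
    sh:nodeKind sh:IRI .

ex:LocationValueShape
    a sh:NodeShape ;
    sh:targetObjectsOf ex:isLocatedAt ;   # selects every object of an ex:isLocatedAt triple
    sh:nodeKind sh:IRI .
```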

SHACL shapes

A shape determines how to validate a focus node based on the values of properties and other characteristics of the focus node. To that end, shapes can declare constraints. The SHACL Core language defines two types of shapes:

  • node shapes, that specify constraints about the focus node itself

  • property shapes, that specify constraints about the value of a particular property or path for a focus node.

The following is a very simple example of the syntax for writing a SHACL node shape. It defines a node shape named :Name and specifies that each focus node must be an IRI.

:Name a sh:NodeShape ;
  sh:nodeKind sh:IRI .

For more elaborate examples, also including Property Shapes, please refer to the SHACL example section.

SHACL constraints

In general, a constraint is a restriction or limitation. In SHACL, constraints are the restrictions declared when defining a shape or a combination of shapes. Constraints are grouped into types, and each type comprises several constraint components. Table 2 summarizes the SHACL Core constraint types and their associated components.

Table 2 SHACL Core Constraints. The columns Node Shape and Property Shape indicate in which context the constraints can be used.

Constraint Type             Constraint Components                                      Node Shape  Property Shape
Value Type                  class, datatype, nodeKind                                  yes         yes
Cardinality                 minCount, maxCount                                         no          yes
Values                      node, in, hasValue                                         yes         yes
Value Range                 minInclusive, maxInclusive, minExclusive, maxExclusive     no          yes
String-based                minLength, maxLength, pattern, languageIn, uniqueLang      yes         yes
Property Pair Constraints   equals, disjoint, lessThan, lessThanOrEquals               no          yes
Logical Constraints         not, and, or, xone                                         yes         yes
Qualified Shapes            qualifiedValueShape, qualifiedMinCount, qualifiedMaxCount  no          yes
Closed Shapes               closed, ignoredProperties                                  yes         no
Non-validating Constraints  name, description, order, group, defaultValue              yes         yes

Note

Please note that the namespace prefix sh: is omitted in front of the constraints of Table 2 for better readability.
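
To make a few of the constraint types in Table 2 more concrete, here is a minimal sketch that combines cardinality, value type, value range, property pair and logical constraints in one shape. The class ex:Measurement and the properties ex:value, ex:startDate, ex:endDate and ex:unit are hypothetical and only serve as illustration.

```turtle
@prefix sh:  <http://www.w3.org/ns/shacl#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix ex:  <http://example.com/ns#> .

# Hypothetical shape; ex:Measurement and its properties are made up.
ex:MeasurementShape
    a sh:NodeShape ;
    sh:targetClass ex:Measurement ;
    sh:property [
        sh:path ex:value ;
        sh:minCount 1 ;                   # Cardinality: at least one value
        sh:maxCount 1 ;                   # Cardinality: at most one value
        sh:datatype xsd:decimal ;         # Value Type: values must be xsd:decimal literals
        sh:minInclusive 0.0 ;             # Value Range: value >= 0
    ] ;
    sh:property [
        sh:path ex:startDate ;
        sh:lessThanOrEquals ex:endDate ;  # Property Pair: start must not be after end
    ] ;
    sh:property [
        sh:path ex:unit ;
        sh:or (                           # Logical: each value must match at least one alternative
            [ sh:nodeKind sh:IRI ]
            [ sh:datatype xsd:string ]
        ) ;
    ] .
```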

Validation report

The validation report is the result of the validation process. It states whether the data graph conforms to the shapes graph and lists all validation results (Validation Report).

Consider the following data example, which states that the IRI http://example.org/Max is of type Name:

<http://example.org/Max> a :Name.

If we were to validate this data against the shapes graph example from the SHACL shapes section, the validation report would output the following results:

[     a sh:ValidationReport ;
      sh:conforms true ;
] .

This report indicates that the data conforms to the shapes graph and is therefore valid with respect to this node constraint: http://example.org/Max is indeed an IRI. For more elaborate examples, including a validation report on data that does not conform to a shapes graph, please refer to the SHACL example section below.
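
One caveat, based on our reading of the SHACL specification: a shape only validates the nodes it targets, and the :Name shape shown earlier declares no explicit target. One possible way to make it select all instances of :Name is to declare the shape as an rdfs:Class as well (an implicit class target). The sketch below assumes that the empty prefix is bound to http://example.org/ and adds a hypothetical blank node to show a non-conforming case.

```turtle
@prefix sh:   <http://www.w3.org/ns/shacl#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix :     <http://example.org/> .

# Shapes graph: declaring the shape as a class as well makes every
# instance of :Name a focus node (implicit class target).
:Name
    a sh:NodeShape , rdfs:Class ;
    sh:nodeKind sh:IRI .

# Data graph (shown in the same block for brevity).
<http://example.org/Max> a :Name .   # an IRI, so it conforms
[] a :Name .                         # hypothetical blank node, violates sh:nodeKind sh:IRI
```

With this data, a processor would be expected to report sh:conforms false, with one result whose sh:sourceConstraintComponent is sh:NodeKindConstraintComponent for the blank node.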

SHACL example

The following example is used to demonstrate how SHACL conditions can be represented as shapes and constraints in a shapes graph, and used to validate a data graph (adapted from https://www.w3.org/TR/shacl/#shacl-example; same conventions apply). The data graph contains three instances of the ex:Patient class, defined as follows:

Data graph

ex:Max
      a ex:Patient ;
      ex:hasSubjectPseudoIdentifier "123-45-678A" .

ex:Erika
      a ex:Patient ;
      ex:hasSubjectPseudoIdentifier "123-45-6789" ;
      ex:hasSubjectPseudoIdentifier "124-35-6789" .

ex:Kevin
      a ex:Patient;
      ex:birthDate "1971-07-07"^^xsd:date ;
      ex:isLocatedAt ex:UntypedLocation .

The SHACL conditions to be represented are:

  1. An instance of ex:Patient can have at most one value for the property ex:hasSubjectPseudoIdentifier. This value must be a string literal having a predetermined format (i.e., three digits followed by a dash followed by two digits followed by another dash and followed by four digits).

ex:PatientShape
       a sh:NodeShape ;
       sh:targetClass ex:Patient ; # Applies to all patients
       sh:property [
                       sh:path ex:hasSubjectPseudoIdentifier ; # constrains the values of ex:hasSubjectPseudoIdentifier
                       sh:maxCount 1 ; # allows at most one value for ex:hasSubjectPseudoIdentifier
                       sh:datatype xsd:string ; # constrains the value to be a string
                       sh:pattern "^\\d{3}-\\d{2}-\\d{4}$" ; # constrains the value to follow this specific pattern
                   ] ;

  2. An instance of ex:Patient can have any number of values for the property ex:isLocatedAt. These values must be IRIs and instances of ex:Location_class.

sh:property [
                sh:path ex:isLocatedAt ;
                sh:class ex:Location_class ;
                sh:nodeKind sh:IRI ;
            ] ;

  3. An instance of ex:Patient cannot have values for any property other than ex:hasSubjectPseudoIdentifier, ex:isLocatedAt and rdf:type. This is expressed by setting sh:closed to true, meaning that only the properties described in the property shapes are allowed, plus those listed in sh:ignoredProperties (here rdf:type).

sh:closed true ;
       sh:ignoredProperties ( rdf:type ) .

All of these conditions are combined in the following shape, which forms the complete SHACL shape for the target class ex:Patient:

ex:PatientShape
       a sh:NodeShape ;
       sh:targetClass ex:Patient ; # Applies to all patients
       sh:property [
                       sh:path ex:hasSubjectPseudoIdentifier ; # constrains the values of ex:hasSubjectPseudoIdentifier
                       sh:maxCount 1 ;
                       sh:datatype xsd:string ;
                       sh:pattern "^\\d{3}-\\d{2}-\\d{4}$" ;
                   ] ;
       sh:property [
                       sh:path ex:isLocatedAt ;
                       sh:class ex:Location_class ;
                       sh:nodeKind sh:IRI ;
                   ] ;
       sh:closed true ;
       sh:ignoredProperties ( rdf:type ) .

Find below the explanation of how the representation can be read and understood:

  • The shape ex:PatientShape declares the target nodes to be the set of all instances of the class ex:Patient by means of the sh:targetClass property.

  • During the validation process, the declared target nodes become focus nodes for the node shapes to apply to. The ex:PatientShape is such a node shape, declaring constraints on the focus nodes.

  • The ex:PatientShape declares constraints using the parameters sh:closed and sh:ignoredProperties, plus two sh:property constraints, each of which refers to a property shape.

  • Property shapes impose multiple restrictions on property values by specifying parameters from multiple constraint components. For each focus node, the values of the property given by the property shape's sh:path are validated against all of these components.

  • The property shape for ex:hasSubjectPseudoIdentifier uses parameters from three constraint components: sh:datatype, sh:pattern, and sh:maxCount. For each focus node, the property values of ex:hasSubjectPseudoIdentifier are validated against all three components. In detail, the sh:datatype constrains the value to be a string, the sh:pattern constrains the string to follow the specified regular expression, and the sh:maxCount constrains the maximum number of values for the property ex:hasSubjectPseudoIdentifier to be one for a single ex:Patient instance.

  • The property shape for ex:isLocatedAt uses parameters from two constraint components: sh:nodeKind and sh:class. For each focus node, the property values of ex:isLocatedAt are validated against both components. In detail, the sh:nodeKind and sh:class constrain the values to IRIs and instances of ex:Location_class.

SHACL validation based on the provided data graph and shapes graph would produce the following validation report (Note: edited for easier reading; see the section Validation Report for details on the format).

[ a sh:ValidationReport ;
  sh:conforms false ;
  sh:result
      [ a sh:ValidationResult ;
        sh:resultSeverity sh:Violation ;
        sh:focusNode ex:Max ;
        sh:value "123-45-678A" ;
        sh:resultPath ex:hasSubjectPseudoIdentifier ;
        sh:sourceConstraintComponent sh:PatternConstraintComponent ;
        sh:sourceShape <http://shape.ontotext.com/node#.../1>
      ] ,
      [ a sh:ValidationResult ;
        sh:resultSeverity sh:Violation ;
        sh:focusNode ex:Erika ;
        sh:resultPath ex:hasSubjectPseudoIdentifier ;
        sh:sourceConstraintComponent sh:MaxCountConstraintComponent ;
        sh:sourceShape <http://shape.ontotext.com/node#.../1>
      ] ,
      [ a sh:ValidationResult ;
        sh:resultSeverity sh:Violation ;
        sh:focusNode ex:Kevin ;
        sh:resultPath ex:isLocatedAt ;
        sh:value ex:UntypedLocation ;
        sh:sourceConstraintComponent sh:ClassConstraintComponent ;
        sh:sourceShape <http://shape.ontotext.com/node#.../2>
      ]
] .

<http://shape.ontotext.com/node#.../1> a sh:PropertyShape ;
  sh:path ex:hasSubjectPseudoIdentifier ;
  sh:maxCount 1 ;
  sh:pattern "^\\d{3}-\\d{2}-\\d{4}$" .

<http://shape.ontotext.com/node#.../2> a sh:PropertyShape ;
  sh:path ex:isLocatedAt ;
  sh:class ex:Location_class .
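
The report above lists no result for the ex:birthDate value of ex:Kevin, although ex:PatientShape is closed and ex:birthDate is neither described by a property shape nor listed in sh:ignoredProperties. Going by the corresponding W3C example, a processor that evaluates sh:closed would be expected to add a result along the following lines. This is a sketch of that additional result, not output taken from the processor used above, and the ex: namespace is assumed to follow the W3C example conventions.

```turtle
@prefix sh:  <http://www.w3.org/ns/shacl#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix ex:  <http://example.com/ns#> .

# Sketch of the additional result expected for the closed shape.
[ a sh:ValidationResult ;
  sh:resultSeverity sh:Violation ;
  sh:focusNode ex:Kevin ;
  sh:resultPath ex:birthDate ;             # the property not allowed by the closed shape
  sh:value "1971-07-07"^^xsd:date ;
  sh:sourceConstraintComponent sh:ClosedConstraintComponent ;
  sh:sourceShape ex:PatientShape           # the node shape that declares sh:closed
] .
```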

The report states that validation failed, meaning that the data does not conform to the SHACL shapes defined, and gives hints about the problems:

  • Max’s identifier does not match the string pattern (i.e. the last character is a letter rather than a digit).

  • Erika violates the restriction on the number of ex:hasSubjectPseudoIdentifier values (i.e. two values are given where at most one is allowed).

  • Kevin’s location is not an instance of ex:Location_class (i.e. the location ex:UntypedLocation has no type).
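
For comparison, the following sketch shows one possible corrected data graph that should conform to ex:PatientShape. The replacement identifier for ex:Max and the node ex:TypedLocation are hypothetical values chosen for illustration, and the ex: namespace is again assumed to follow the W3C example conventions.

```turtle
@prefix ex: <http://example.com/ns#> .

ex:Max
    a ex:Patient ;
    ex:hasSubjectPseudoIdentifier "123-45-6780" .  # hypothetical value matching the pattern

ex:Erika
    a ex:Patient ;
    ex:hasSubjectPseudoIdentifier "123-45-6789" .  # only one identifier is kept

ex:Kevin
    a ex:Patient ;
    ex:isLocatedAt ex:TypedLocation .              # ex:birthDate dropped because the shape is closed

ex:TypedLocation
    a ex:Location_class .                          # hypothetical location with the required type
```

Since ex:TypedLocation is not an instance of ex:Patient, it is not selected by sh:targetClass and is therefore not subject to the closed shape itself.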