XML Schematron Editor

Overview

Schematron is a rule-based validation language for making assertions about the presence or absence of patterns in XML trees. It is a structural schema language expressed in XML using a small number of elements and XPath.

Schematron can express constraints that other XML schema languages, such as XML Schema and DTD, cannot. For example, it can require that the content of an element be controlled by one of its siblings. It can also request or require that the root element, whatever element that is, have specific attributes. Schematron can also specify required relationships between multiple XML files.

Constraints and content rules may be associated with “plain-English” validation error messages, allowing translation of numeric Schematron error codes into meaningful user error messages.

XML Schematron Editor in Workgroup works as follows:

[Figure: XML Schematron Editor Overview]

 

An Introduction to Schematron*

The Schematron schema language differs from most other XML schema languages in that it is a rule-based language that uses path expressions instead of grammars. This means that instead of creating a grammar for an XML document, a Schematron schema makes assertions that are applied to a specific context within the document. If the assertion fails, a diagnostic message that is supplied by the author of the schema can be displayed.

One advantage of a rule-based approach is that in many cases the Schematron rules can be created simply by rewording the desired constraint as it is stated in plain English. For example, a simple content model can be written like this: “The Person element in the XML instance document should have an attribute Title and contain the elements Name and Gender, in that order. If the value of the Title attribute is ‘Mr’, the value of the Gender element must be ‘Male’.”

In this description the context in which the assertions should be applied is clearly stated as the Person element, and there are four different assertions:

  • The context element (Person) should have an attribute Title
  • The context element should contain two child elements, Name and Gender
  • The child element Name should appear before the child element Gender
  • If attribute Title has the value ‘Mr’, the element Gender must have the value ‘Male’
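
As an illustration, a minimal instance document that satisfies all four assertions could look like the sketch below. The name value is hypothetical; only the element and attribute names come from the content model above.

```xml
<!-- Hypothetical instance document: Title attribute present,
     Name before Gender, and Gender is Male because Title is Mr -->
<Person Title="Mr">
  <Name>John Smith</Name>
  <Gender>Male</Gender>
</Person>
```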

The path expressions used in Schematron rules are written in XPath, with various extensions provided by XSLT.

As mentioned above, Schematron makes assertions based on a specific context in a document. The assertions and the context are two of the four layers in Schematron’s fixed four-layer hierarchy:

  1. phases (top-level)
  2. patterns
  3. rules (defines the context)
  4. assertions
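
The four layers can be sketched as a schema skeleton. This sketch assumes the Schematron 1.5 namespace and syntax (the version in which patterns carry a name attribute); the id values structure-only and structure are hypothetical.

```xml
<schema xmlns="http://www.ascc.net/xml/schematron">
  <!-- Layer 1: a phase lists the patterns that are active when it is selected -->
  <phase id="structure-only">
    <active pattern="structure"/>
  </phase>
  <!-- Layer 2: a pattern groups related rules -->
  <pattern name="Check structure" id="structure">
    <!-- Layer 3: a rule defines the context -->
    <rule context="Person">
      <!-- Layer 4: an assertion is tested against each context node -->
      <assert test="@Title">The element Person must have a Title attribute.</assert>
    </rule>
  </pattern>
</schema>
```

Note that a phase does not physically contain its patterns; it refers to them by id, so the top layer of the hierarchy is logical rather than a matter of element nesting.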

 

Assertions

The bottom layer in the hierarchy is the assertions, which specify the constraints that should be checked within a specific context of the XML instance document. In a Schematron schema, the typical element used to define assertions is assert. The assert element has a test attribute, whose value is an XPath expression that is evaluated as true or false. In the preceding example, four assertions were made on the document in order to specify the content model, namely:

  • The context element (Person) should have an attribute Title
  • The context element should contain two child elements, Name and Gender
  • The child element Name should appear before the child element Gender
  • If attribute Title has the value ‘Mr’, the element Gender must have the value ‘Male’

Written using Schematron assertions this would be expressed as

Type   | Test | Text
Assert | @Title | The element Person must have a Title attribute.
Assert | count(*) = 2 and count(Name) = 1 and count(Gender) = 1 | The element Person should have the child elements Name and Gender.
Assert | *[1] = Name | The element Name must appear before element Gender.
Assert | (@Title = 'Mr' and Gender = 'Male') or @Title != 'Mr' | If the Title is “Mr” then the gender of the person must be “Male”.

 

If you are familiar with XPath, these assertions are easy to understand, but even for people with limited experience using XPath they are rather straightforward. The first assertion simply tests for the occurrence of an attribute Title. The second assertion tests that the total number of children is equal to 2 and that there is one Name element and one Gender element. The third assertion tests that the first child element is Name, and the last assertion tests that if the person’s title is ‘Mr’, the gender of the person must be ‘Male’.

If the condition in the test attribute is not fulfilled, the content of the assertion element is displayed to the user. 
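
In Schematron's XML syntax, the four assertions might be written as follows. This is a fragment rather than a complete schema: assert elements only take effect as children of a rule element, which supplies the context.

```xml
<!-- Fragment only: in a schema these elements sit inside a rule element -->
<assert test="@Title">The element Person must have a Title attribute.</assert>
<assert test="count(*) = 2 and count(Name) = 1 and count(Gender) = 1">The element Person should have the child elements Name and Gender.</assert>
<assert test="*[1] = Name">The element Name must appear before element Gender.</assert>
<assert test="(@Title = 'Mr' and Gender = 'Male') or @Title != 'Mr'">If the Title is "Mr" then the gender of the person must be "Male".</assert>
```

The third test compares string values, so it would also pass if Gender came first and happened to hold the same text as Name. A stricter alternative, if that matters, is *[1][self::Name].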

Each of these assertions has a condition that is evaluated, but the assertion does not define where in the XML instance document this condition should be checked. For example, the first assertion tests for the occurrence of the attribute Title, but it is not specified on which element in the XML instance document this assertion is applied. The next layer in the hierarchy, the rules, specifies the location of the contexts of assertions.

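The Assert type is used to tag positive assertions about a document: its text is displayed when the test evaluates to false.

The Report type is used to tag negative assertions about a document: its text is displayed when the test evaluates to true.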
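
The difference can be sketched by stating the same constraint both ways. The report message text below is hypothetical, and the rule wrapper is included only to give the two elements a context.

```xml
<rule context="Person">
  <!-- assert: the message is shown when the test evaluates to false -->
  <assert test="@Title">The element Person must have a Title attribute.</assert>
  <!-- report: the message is shown when the test evaluates to true -->
  <report test="not(@Title)">The element Person is missing its Title attribute.</report>
</rule>
```

For a Person element without a Title attribute, both messages would be displayed. For one with the attribute, neither would.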

 

Rules

The rules in Schematron are declared by using the rule element, which has a context attribute. The value of the context attribute must be an XPath expression (more precisely, an XSLT pattern) that selects one or more nodes in the document. As the name suggests, the context attribute specifies the context in the XML instance document where the assertions should be applied. In the previous example the context was specified to be the Person element, and a Schematron rule with the Person element as context would simply be

Id     | Abstract | Context
(none) | False    | Person

 

Since the rules are used to group all assertions together that share the same context, the rules are designed so that the assertions are declared as children of the rule element. For the previous example, this means that the complete Schematron rule would be

The element Person must have a Title attribute.
The element Person should have the child elements Name and Gender.
The element Name must appear before element Gender.
If the Title is "Mr" then the gender of the person must be "Male".
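
In Schematron's XML syntax, the complete rule could be written as the sketch below, with the four assertions declared as children of the rule element.

```xml
<!-- Every assertion is tested against each node matched by the context -->
<rule context="Person">
  <assert test="@Title">The element Person must have a Title attribute.</assert>
  <assert test="count(*) = 2 and count(Name) = 1 and count(Gender) = 1">The element Person should have the child elements Name and Gender.</assert>
  <assert test="*[1] = Name">The element Name must appear before element Gender.</assert>
  <assert test="(@Title = 'Mr' and Gender = 'Male') or @Title != 'Mr'">If the Title is "Mr" then the gender of the person must be "Male".</assert>
</rule>
```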

 

This means that all the assertions in the rule will be tested on every Person element in the XML instance document. If the assertions should not apply to every Person element, it is easy to change the XPath location path to define a more restricted context. The value Database/Person, for example, sets the context to be all the Person elements that have the element Database as their parent.
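
A rule with the more restricted context might look like this sketch; only the context value changes, and a single assertion is shown for brevity.

```xml
<!-- Matches only Person elements whose parent is a Database element -->
<rule context="Database/Person">
  <assert test="@Title">The element Person must have a Title attribute.</assert>
</rule>
```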

 

Patterns

The next layer up in the Schematron hierarchy is the pattern, declared using the pattern element, which is used to group together different rules. The pattern element also has a name attribute that will be displayed in the output when the pattern is checked. For the preceding assertions, you could have two patterns: one for checking the structure and another for checking the co-occurrence constraint. Since patterns group different rules together, Schematron is designed so that rules are declared as children of the pattern element. This means that the previous example, using the two patterns, would look like

The element Person must have a Title attribute.
The element Person should have the child elements Name and Gender.
The element Name must appear before element Gender.
If the Title is "Mr" then the gender of the person must be "Male".
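
Written out as a schema, the two patterns could be sketched as follows. The sketch assumes the Schematron 1.5 namespace; the pattern names are the ones that appear in the validation output.

```xml
<schema xmlns="http://www.ascc.net/xml/schematron">
  <!-- First pattern: the three structural assertions -->
  <pattern name="Check structure">
    <rule context="Person">
      <assert test="@Title">The element Person must have a Title attribute.</assert>
      <assert test="count(*) = 2 and count(Name) = 1 and count(Gender) = 1">The element Person should have the child elements Name and Gender.</assert>
      <assert test="*[1] = Name">The element Name must appear before element Gender.</assert>
    </rule>
  </pattern>
  <!-- Second pattern: the co-occurrence constraint -->
  <pattern name="Check co-occurrence constraints">
    <rule context="Person">
      <assert test="(@Title = 'Mr' and Gender = 'Male') or @Title != 'Mr'">If the Title is "Mr" then the gender of the person must be "Male".</assert>
    </rule>
  </pattern>
</schema>
```

Both rules can use Person as their context because each pattern is evaluated independently of the others.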

 

The name of the pattern will always be displayed in the output, regardless of whether the assertions fail or succeed. If an assertion fails, the output will also contain the content of the assertion element. Additional information is displayed together with the assertion text to help you locate the source of the failed assertion. For example, if the co-occurrence constraint above were violated by having Title='Mr' and Gender='Female', then the following diagnostic would be generated by Schematron:

From pattern "Check structure":
From pattern "Check co-occurrence constraints": 
Assertion fails: "If the Title is "Mr" then the gender of the person must be "Male"." 
at /Person[1]
<Person Title="Mr">...</>

 

The pattern names are always displayed, while the assertion text is only displayed when the assertion fails. The additional information starts with an XPath expression that shows the location of the context element in the instance document (in this case the first Person element) and then on a new line the start tag of the context element is displayed.

The assertion to test the co-occurrence constraint is not trivial, and in fact this rule could be written in a simpler way by using an XPath predicate when selecting the context. Instead of setting the context to all Person elements, the context can be restricted to the Person elements that have the attribute Title='Mr'. If the rule were specified using this technique, the co-occurrence constraint could be described like this

If the Title is "Mr" then the gender of the person must be "Male".
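
A sketch of the simplified rule in XML syntax: the predicate in the context selects only the Person elements with Title='Mr', so the test reduces to a single comparison.

```xml
<!-- The context predicate carries the "if" part of the constraint -->
<rule context="Person[@Title = 'Mr']">
  <assert test="Gender = 'Male'">If the Title is "Mr" then the gender of the person must be "Male".</assert>
</rule>
```

Person elements with any other Title value are never matched by this rule, so the assertion is simply not evaluated for them.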

 

By moving some of the logic from the assertion to the specification of the context, the complexity of the rule has been decreased. This technique is often very useful when writing Schematron schemas.

*[Reference: www.xml.com/pub/a/2003/11/12/schematron.html]