Tuesday, April 06, 2004

XML Schema's unique, key and keyref

XML Schema provides several features that mimic ones found in a relational database, with some differences.
This article covers the following XML Schema elements:
  • key and keyref
  • unique

Get the required software

You need an XML Parser supporting the latest XML Schema specifications. An IDE with an XML Schema Visual Editor would also help.

key and keyref

The XML Schema elements key and keyref ensure that, within an XML document validated against a schema that uses them, a reference cannot be made to a missing value. The key element also enforces the uniqueness of the value within the instance document. As a result, there is no need to add a unique element where a key is already declared. This is quite natural: it would be hard to imagine a keyref referring to several keys with the same value.
For clarity, we will assume in this document that the XML Schema URI is associated with the xsd prefix.
For an element or attribute of an instance document to refer to another value in the same instance document, that value must have been declared as an xsd:key in the associated schema. Then, and only then, can xsd:keyref be used in the schema to state that the value in the instance document must match one declared as a key. Let us consider an XML Schema and three corresponding instance documents:
<schema xmlns="http://www.w3.org/2001/XMLSchema"
        xmlns:xsd="http://www.w3.org/2001/XMLSchema"
        xmlns:tns="http://www.example.org"
        targetNamespace="http://www.example.org"
        elementFormDefault="qualified">
  <element name="root">
    <complexType>
      <sequence minOccurs="1" maxOccurs="1">
        <element name="AAA" type="tns:myAAA"/>
        <element name="BBB" type="tns:myBBB"/>
      </sequence>
    </complexType>
    <xsd:key name="myId">
      <xsd:selector xpath="./tns:AAA/tns:a"/>
      <xsd:field xpath="@id"/>
    </xsd:key>
    <xsd:keyref name="myIdref" refer="tns:myId">
      <xsd:selector xpath="./tns:BBB/tns:b"/>
      <xsd:field xpath="@idref"/>
    </xsd:keyref>
  </element>
  <complexType name="myAAA">
    <sequence minOccurs="1">
      <element name="a" minOccurs="1" maxOccurs="unbounded">
        <complexType>
          <attribute name="id" type="xsd:NCName" use="required"/>
        </complexType>
      </element>
    </sequence>
  </complexType>
  <complexType name="myBBB">
    <sequence minOccurs="1">
      <element name="b" minOccurs="1" maxOccurs="unbounded">
        <complexType>
          <attribute name="idref" type="xsd:NCName" use="required"/>
        </complexType>
      </element>
    </sequence>
  </complexType>
</schema>

<ns:root xmlns:ns="http://www.example.org">
  <ns:AAA>
    <ns:a id="x"/>
    <ns:a id="y"/>
  </ns:AAA>
  <ns:BBB>
    <ns:b idref="x"/>
    <ns:b idref="y"/>
    <ns:b idref="y"/>
  </ns:BBB>
</ns:root>

<root xmlns="http://www.example.org">
  <AAA>
    <a id="x"/>
    <a id="y"/>
  </AAA>
  <BBB>
    <b idref="x"/>
    <b idref="y"/>
    <b idref="y"/>
  </BBB>
</root>

<root xmlns="http://www.example.org">
  <AAA>
    <a id="x"/>
    <a id="y"/>
  </AAA>
  <BBB>
    <b idref="x"/>
    <b idref="y"/>
    <b idref="z"/>
  </BBB>
</root>

The XML Schema above specifies that a root element has to contain exactly one element called AAA first, followed by exactly one element called BBB.
Under AAA, there must be at least one element called a, which has a mandatory attribute called id. Under BBB, there must be at least one element called b, which has a mandatory attribute called idref. The value of each idref attribute must already exist as the id attribute of one of the a elements under AAA.
As a result, the first of the three instance documents above is valid, because every value found by following the XPath /root/BBB/b/@idref (written here informally, without namespace prefixes) can also be reached by following the XPath /root/AAA/a/@id.
The second document is the same as the first, but uses a default namespace instead of a prefix, so it is valid as well. By the same rule, the third document is invalid, because the value z of the third b/@idref cannot be reached by following the /root/AAA/a/@id XPath.
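
As a rough sketch, the same check can be reproduced with the standard JAXP validation API (javax.xml.validation) instead of a vendor-specific parser. The class name KeyrefCheck, its helper methods and the compact inline schema (an abbreviated variant of the schema above that uses anonymous types) are assumptions made for this illustration only.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

class KeyrefCheck {
  static final String NS = "http://www.example.org";

  // Abbreviated variant of the schema above (hypothetical, for illustration):
  // every AAA/a/@id is a key, every BBB/b/@idref must match one of those keys.
  static final String XSD =
      "<xsd:schema xmlns:xsd='http://www.w3.org/2001/XMLSchema' xmlns:tns='" + NS + "'"
    + " targetNamespace='" + NS + "' elementFormDefault='qualified'>"
    + "<xsd:element name='root'>"
    + "<xsd:complexType><xsd:sequence>"
    + container("AAA", "a", "id") + container("BBB", "b", "idref")
    + "</xsd:sequence></xsd:complexType>"
    + "<xsd:key name='myId'>"
    + "<xsd:selector xpath='./tns:AAA/tns:a'/><xsd:field xpath='@id'/>"
    + "</xsd:key>"
    + "<xsd:keyref name='myIdref' refer='tns:myId'>"
    + "<xsd:selector xpath='./tns:BBB/tns:b'/><xsd:field xpath='@idref'/>"
    + "</xsd:keyref>"
    + "</xsd:element></xsd:schema>";

  // Declares <outer> holding one or more <inner> elements with a required attribute.
  private static String container(String outer, String inner, String attr) {
    return "<xsd:element name='" + outer + "'><xsd:complexType><xsd:sequence>"
        + "<xsd:element name='" + inner + "' maxOccurs='unbounded'><xsd:complexType>"
        + "<xsd:attribute name='" + attr + "' type='xsd:NCName' use='required'/>"
        + "</xsd:complexType></xsd:element>"
        + "</xsd:sequence></xsd:complexType></xsd:element>";
  }

  // Builds an instance document with the keys x and y and the given references.
  public static String doc(String... idrefs) {
    StringBuilder sb = new StringBuilder("<root xmlns='" + NS + "'>");
    sb.append("<AAA><a id='x'/><a id='y'/></AAA><BBB>");
    for (String idref : idrefs) {
      sb.append("<b idref='").append(idref).append("'/>");
    }
    return sb.append("</BBB></root>").toString();
  }

  // Returns true when the document satisfies the schema, including key and keyref.
  public static boolean isValid(String xml) {
    Schema schema;
    try {
      schema = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
          .newSchema(new StreamSource(new StringReader(XSD)));
    } catch (SAXException e) {
      throw new IllegalStateException("The schema itself is broken", e);
    }
    try {
      // With no ErrorHandler set, the first validation error is thrown.
      schema.newValidator().validate(new StreamSource(new StringReader(xml)));
      return true;
    } catch (SAXException e) {
      return false;
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    System.out.println(isValid(doc("x", "y", "y"))); // every idref matches a key
    System.out.println(isValid(doc("x", "y", "z"))); // z matches no key
  }
}
```

Run as is, the first call should report a valid document and the second an invalid one, mirroring the first and third instance documents above.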
Important
It is important to notice that xsd:key and xsd:keyref use the XPath syntax, and specification, to locate the appropriate values. In XPath 1.0, an unprefixed name in an expression is never resolved against a default (implicit) namespace; it only matches nodes that are in no namespace. That means that if "http://www.example.org" had been the default namespace in the XML Schema, i.e. declared with no prefix, the XPath expression to locate the key would have become
<xsd:key name="myId">
  <xsd:selector xpath="./AAA/a"/>
  <xsd:field xpath="@id"/>
</xsd:key>
and this XPath would have made instance documents impossible to validate as intended: ./AAA/a only matches elements that are in no namespace, so the key values in the http://www.example.org namespace could not have been located. A validator that accepted this syntax and still resolved the key values would thus break the XPath 1.0 specification.
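
The XPath 1.0 behaviour described here can be observed with the standard javax.xml.xpath API. The following is only a sketch: the class name XPathNamespaceDemo, the count helper and the choice to bind the tns prefix in a hand-written NamespaceContext are assumptions made for the illustration. It evaluates full XPath 1.0, not the restricted subset allowed in xsd:selector, but the treatment of unprefixed names is the same.

```java
import java.io.StringReader;
import java.util.Collections;
import java.util.Iterator;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class XPathNamespaceDemo {
  static final String NS = "http://www.example.org";

  // Instance document that uses a default namespace, like the second one above.
  static final String XML =
      "<root xmlns='" + NS + "'><AAA><a id='x'/><a id='y'/></AAA></root>";

  // Counts the nodes selected by the expression. Only the prefix "tns" is bound.
  public static int count(String expression) {
    try {
      DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
      dbf.setNamespaceAware(true);
      Document doc = dbf.newDocumentBuilder().parse(new InputSource(new StringReader(XML)));

      XPath xpath = XPathFactory.newInstance().newXPath();
      xpath.setNamespaceContext(new NamespaceContext() {
        @Override
        public String getNamespaceURI(String prefix) {
          return "tns".equals(prefix) ? NS : XMLConstants.NULL_NS_URI;
        }

        @Override
        public String getPrefix(String uri) {
          return NS.equals(uri) ? "tns" : null;
        }

        @Override
        public Iterator<String> getPrefixes(String uri) {
          return NS.equals(uri)
              ? Collections.singletonList("tns").iterator()
              : Collections.<String>emptyIterator();
        }
      });

      Double n = (Double) xpath.evaluate("count(" + expression + ")", doc, XPathConstants.NUMBER);
      return n.intValue();
    } catch (Exception e) {
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    // Unprefixed names only match elements in no namespace: nothing is selected.
    System.out.println(count("/root/AAA/a/@id"));             // 0
    // Prefixed names bound to the target namespace select both id attributes.
    System.out.println(count("/tns:root/tns:AAA/tns:a/@id")); // 2
  }
}
```

The default namespace declared in the instance document plays no role in the evaluation; only the prefix bindings supplied to the XPath engine do, which is why the schema above binds the target namespace to the tns prefix.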

unique

XML Schema makes it possible to enforce the uniqueness of a value without having to make it a key. Both the key and unique elements enforce uniqueness of the values; key additionally allows a unique value to be referred to (with a keyref). Let us consider an XML Schema and two corresponding instance documents:
<schema xmlns="http://www.w3.org/2001/XMLSchema"
        xmlns:xsd="http://www.w3.org/2001/XMLSchema"
        xmlns:tns="http://www.example.org"
        targetNamespace="http://www.example.org"
        elementFormDefault="qualified">
  <element name="root">
    <complexType>
      <sequence minOccurs="1" maxOccurs="1">
        <element name="AAA" type="tns:myAAA"/>
      </sequence>
    </complexType>
    <xsd:unique name="uniqueA">
      <xsd:selector xpath="./tns:AAA/tns:a"/>
      <xsd:field xpath="@id"/>
    </xsd:unique>
  </element>
  <complexType name="myAAA">
    <sequence minOccurs="1">
      <element name="a" minOccurs="1" maxOccurs="unbounded">
        <complexType>
          <attribute name="id" type="xsd:NCName" use="required"/>
        </complexType>
      </element>
    </sequence>
  </complexType>
</schema>

<root xmlns="http://www.example.org">
  <AAA>
    <a id="x"/>
    <a id="y"/>
    <a id="z"/>
  </AAA>
</root>

<root xmlns="http://www.example.org">
  <AAA>
    <a id="x"/>
    <a id="y"/>
    <a id="y"/>
  </AAA>
</root>

The second document breaks the uniqueness rule: the id value y appears twice.
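
A minimal sketch of this rule with the standard JAXP validation API follows. The class name UniqueCheck and its helpers are hypothetical. One deliberate deviation from the schema above: the id attribute is declared optional here, so that the last two calls can hint at one practical difference between the two constraints, namely that xsd:unique only considers elements where the field is present, whereas xsd:key also requires the field to be present on every selected element.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

class UniqueCheck {
  static final String NS = "http://www.example.org";

  // Builds a compact schema; kind is "unique" or "key".
  // Assumption for this sketch: the id attribute is optional.
  public static String schema(String kind) {
    return "<xsd:schema xmlns:xsd='http://www.w3.org/2001/XMLSchema' xmlns:tns='" + NS + "'"
        + " targetNamespace='" + NS + "' elementFormDefault='qualified'>"
        + "<xsd:element name='root'>"
        + "<xsd:complexType><xsd:sequence>"
        + "<xsd:element name='AAA'><xsd:complexType><xsd:sequence>"
        + "<xsd:element name='a' maxOccurs='unbounded'><xsd:complexType>"
        + "<xsd:attribute name='id' type='xsd:NCName'/>"
        + "</xsd:complexType></xsd:element>"
        + "</xsd:sequence></xsd:complexType></xsd:element>"
        + "</xsd:sequence></xsd:complexType>"
        + "<xsd:" + kind + " name='idOfA'>"
        + "<xsd:selector xpath='./tns:AAA/tns:a'/><xsd:field xpath='@id'/>"
        + "</xsd:" + kind + ">"
        + "</xsd:element></xsd:schema>";
  }

  // Builds an instance document; a null entry produces an a element without id.
  public static String doc(String... ids) {
    StringBuilder sb = new StringBuilder("<root xmlns='" + NS + "'><AAA>");
    for (String id : ids) {
      sb.append(id == null ? "<a/>" : "<a id='" + id + "'/>");
    }
    return sb.append("</AAA></root>").toString();
  }

  // Returns true when the document satisfies the schema built for the given kind.
  public static boolean isValid(String kind, String xml) {
    Schema schema;
    try {
      schema = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
          .newSchema(new StreamSource(new StringReader(schema(kind))));
    } catch (SAXException e) {
      throw new IllegalStateException("The schema itself is broken", e);
    }
    try {
      // With no ErrorHandler set, the first validation error is thrown.
      schema.newValidator().validate(new StreamSource(new StringReader(xml)));
      return true;
    } catch (SAXException e) {
      return false;
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    System.out.println(isValid("unique", doc("x", "y", "z")));   // distinct ids
    System.out.println(isValid("unique", doc("x", "y", "y")));   // y is duplicated
    System.out.println(isValid("unique", doc("x", null, null))); // absent ids under unique
    System.out.println(isValid("key", doc("x", null, null)));    // absent ids under key
  }
}
```

The first two calls correspond to the two instance documents above. The last two go slightly beyond them and depend on the optional attribute introduced for this sketch.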

Using Java to validate such constraints

The class below uses the Oracle XML parser (the oracle.xml.parser packages) to validate an instance document against a schema. It expects key.xsd, valid.xml and invalid.xml to be available as resources next to the class.

import java.io.InputStream;
import java.net.URL;
import oracle.xml.parser.schema.XMLSchema;
import oracle.xml.parser.schema.XSDBuilder;
import oracle.xml.parser.v2.XMLDocument;
import oracle.xml.parser.v2.XMLParser;
import oracle.xml.parser.v2.DOMParser;

public class Validator {
  private final static String SCHEMA_LOCATION  = "key.xsd";
  private final static String VALID_DOCUMENT   = "valid.xml";
  private final static String INVALID_DOCUMENT = "invalid.xml";

  public Validator() {
    try {
      // Locate the schema and the two sample documents relative to this class.
      URL validatorStream      = this.getClass().getResource(SCHEMA_LOCATION);
      URL docToValidate        = this.getClass().getResource(VALID_DOCUMENT);
      URL docToInValidate      = this.getClass().getResource(INVALID_DOCUMENT);
      // Configure the parser for XML Schema validation.
      DOMParser parser = new DOMParser();
      parser.showWarnings(true);
      String version = DOMParser.getReleaseVersion();
      System.out.println("Using " + version);
      parser.setErrorStream(System.out);
      parser.setValidationMode(XMLParser.SCHEMA_VALIDATION);
      parser.setPreserveWhitespace(true);
      // Build the schema object and hand it to the parser.
      XSDBuilder xsdBuilder = new XSDBuilder();
      InputStream is = validatorStream.openStream();
      XMLSchema xmlSchema = (XMLSchema)xsdBuilder.build(is, null);
      parser.setXMLSchema(xmlSchema);
      // Swap the two lines below to try the invalid document instead.
      URL doc = docToValidate;
//    URL doc = docToInValidate;
      parser.parse(doc);
      XMLDocument valid = parser.getDocument();
      System.out.println("Valid");
    } catch (Exception ex) {
      // Reached on a validation error, and also on any other failure (e.g. a missing file).
      System.out.println("Invalid...");
      ex.printStackTrace();
    }
  }

  public static void main(String[] args) {
    Validator validator = new Validator();
  }
}