Xee: a new XPath and XSLT Engine

The Paligo CCMS is all about structured authoring, which enables its powerful content reuse capabilities. Paligo is built around XML, a standardized markup language that supports highly structured documents. XML comes with a whole ecosystem of specifications, including the standardized programming languages XPath, to query XML, and XSLT, to transform it. Paligo uses both extensively.
At Paligo we’re working on a new implementation of the XPath and XSLT standards, and we recently decided to open source this project. Its name is Xee, which stands for “XML Execution Engine”. Because Xee is now open source, others are free to use it and can also contribute to its development. We hope it finds many such users in the future, as anyone interested in XML can benefit from this collaboration, including Paligo.
Here’s the Xee repository that contains the project source code.
XML, XPath and XSLT
To explain what Xee does we first need to explain what XPath and XSLT are.
Consider this simple XML document:
<?xml version="1.0" encoding="UTF-8"?>
<article xmlns="http://docbook.org/ns/docbook">
  <chapter>
    <title>First Chapter</title>
    <para>This is the first paragraph in the first chapter.</para>
    <para>This is the second paragraph in the first chapter.</para>
    <section>
      <title>A Section in the First Chapter</title>
      <para>This is a paragraph in a section of the first chapter.</para>
    </section>
  </chapter>
  <chapter>
    <title>Second Chapter</title>
    <para>This is the first paragraph in the second chapter.</para>
  </chapter>
</article>
As you can see, this XML document defines an article with multiple chapters, which in turn contain sections and paragraphs. XML uses tags, delimited by angle brackets (< and >), to “mark up” parts of the document. Each element encloses its content between an opening tag and a closing tag (like <para> and </para>).
The tags say something about what they enclose: this is a chapter, that is a section or a paragraph, and so on. There are many different ways to structure XML; the example here uses DocBook, which is what Paligo uses.
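XPath lets you query a document like this one. For example, the following XPath expression finds all the paragraphs in the article:
/article//para
This starts at the root element named article and then finds all elements named para inside it, at any depth. (Because the sample document declares the DocBook namespace as its default, the XPath processor must be told to use that namespace as its default element namespace, which XPath 2.0 and later allow, for these unprefixed names to match.)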
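To make the query concrete, here is a minimal Rust sketch (Rust being the language Xee is written in) of what /article//para does, assuming a hypothetical, hand-built Element tree in place of parsed XML. The names Element, descendants_named, article_paras and sample_article are illustrative inventions, not Xee's API, and namespaces are ignored for simplicity.

```rust
// Hypothetical toy element tree for illustration only; this is NOT Xee's API.
#[derive(Debug)]
pub struct Element {
    pub name: &'static str,
    pub text: &'static str,
    pub children: Vec<Element>,
}

impl Element {
    pub fn new(name: &'static str, text: &'static str, children: Vec<Element>) -> Self {
        Element { name, text, children }
    }
}

// The `//para` step: collect every descendant named `name`, at any depth,
// in document order (depth-first, pre-order).
fn descendants_named<'a>(node: &'a Element, name: &str, out: &mut Vec<&'a Element>) {
    for child in &node.children {
        if child.name == name {
            out.push(child);
        }
        descendants_named(child, name, out);
    }
}

// Evaluates the equivalent of `/article//para` against a root element.
pub fn article_paras(root: &Element) -> Vec<&Element> {
    let mut out = Vec::new();
    // `/article`: the root element must be named `article`.
    if root.name == "article" {
        // `//para`: all `para` descendants of the article.
        descendants_named(root, "para", &mut out);
    }
    out
}

// Builds the sample document from the article (namespace omitted).
pub fn sample_article() -> Element {
    let e = Element::new;
    e("article", "", vec![
        e("chapter", "", vec![
            e("title", "First Chapter", vec![]),
            e("para", "This is the first paragraph in the first chapter.", vec![]),
            e("para", "This is the second paragraph in the first chapter.", vec![]),
            e("section", "", vec![
                e("title", "A Section in the First Chapter", vec![]),
                e("para", "This is a paragraph in a section of the first chapter.", vec![]),
            ]),
        ]),
        e("chapter", "", vec![
            e("title", "Second Chapter", vec![]),
            e("para", "This is the first paragraph in the second chapter.", vec![]),
        ]),
    ])
}
```

Calling article_paras(&sample_article()) yields the four para elements in document order, including the one nested inside the section. That nesting is what // adds: a fixed path such as /article/chapter/para would miss the paragraph in the section.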
XPath is a fully fledged programming language that supports sophisticated queries; you can query an XML document much as you would query a database. XPath is fully specified in a series of W3C specifications. The W3C (World Wide Web Consortium) is an international standards organization for web-related technologies. Because XPath is so thoroughly specified, we can expect it to remain stable in its present form for a long time, which makes it a dependable foundation to build on.
Built on top of XPath is another standard named XSLT, also published as W3C specifications. XSLT defines another programming language, geared towards transforming one XML document into another (or into HTML and other formats). Given a structured document as input, it can produce many different kinds of useful output.
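XSLT stylesheets are built from template rules: each rule matches certain nodes and produces output, and xsl:apply-templates hands a node's children back to whichever rules match them. The Rust sketch below loosely mimics that recursion with a match on element names to turn a toy document into HTML. It is an analogy under stated assumptions, not how Xee or any XSLT processor is implemented; the Element type and the function names are hypothetical.

```rust
// Hypothetical toy element tree for illustration; not Xee's data model.
pub struct Element {
    pub name: &'static str,
    pub text: &'static str,
    pub children: Vec<Element>,
}

impl Element {
    pub fn new(name: &'static str, text: &'static str, children: Vec<Element>) -> Self {
        Element { name, text, children }
    }
}

// Escapes the characters that are significant in HTML text content.
fn escape(text: &str) -> String {
    text.replace('&', "&amp;").replace('<', "&lt;").replace('>', "&gt;")
}

// Plays the role of xsl:apply-templates: process each child in order.
fn apply_to_children(node: &Element, out: &mut String) {
    for child in &node.children {
        apply_templates(child, out);
    }
}

// Picks a "template rule" by element name and writes HTML to `out`.
pub fn apply_templates(node: &Element, out: &mut String) {
    match node.name {
        "article" => {
            out.push_str("<html><body>");
            apply_to_children(node, out);
            out.push_str("</body></html>");
        }
        "chapter" | "section" => {
            out.push_str("<div>");
            apply_to_children(node, out);
            out.push_str("</div>");
        }
        "title" => {
            out.push_str("<h1>");
            out.push_str(&escape(node.text));
            out.push_str("</h1>");
        }
        "para" => {
            out.push_str("<p>");
            out.push_str(&escape(node.text));
            out.push_str("</p>");
        }
        // Unmatched elements: just descend into the children. XSLT's built-in
        // rule would also copy text content; this toy ignores it.
        _ => apply_to_children(node, out),
    }
}
```

A real stylesheet expresses each arm as a separate template with an XPath match pattern, so new output rules can be added without touching the others; the single match here trades that extensibility for brevity.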
XPath and XSLT are very useful to anyone who works with XML, and Paligo uses them a lot!
Existing implementations of XPath and XSLT
If you use the Java programming language, you have access to good XPath and XSLT support through Saxon, an implementation that has been around for a long time. Saxon is available for various other programming languages as well, though integrating it can be somewhat involved.
If you step outside the Java world and its periphery and look for other implementations, you run into libxml2 and libxslt, which together form a single larger project. The libxml2/libxslt combination is everywhere: on Linux servers, in macOS, and inside many larger projects. libxml2/libxslt is written in C, a widely used and venerable programming language.
Software libraries written in C tend to be easier to integrate with other programming languages than Java libraries are, so libxml2/libxslt has been integrated with many other languages, such as Python. In fact, I, the main developer of Xee, once created lxml, a popular Python integration for libxml2.
Unfortunately, libxml2 is stuck in the past: it implements XPath, but only XPath 1.0, and similarly libxslt implements only XSLT 1.0. These are old specifications from 1999. XPath 2.0 was released in 2007, and the current version, XPath 3.1, in 2017. Similarly, XSLT 2.0 was released in 2007 and XSLT 3.0, the current version, in 2017. These newer standards define many new features for XPath and XSLT that libxml2/libxslt does not support.
Why a New Implementation?
But Saxon already implements the modern versions of the XPath and XSLT standards. Why, then, create a new implementation with Xee?
Having an alternative implementation is good for the standards themselves: for XPath and XSLT to be thriving standards, they need multiple implementations, in multiple programming languages, by multiple parties. Xee contributes such an alternative. Xee is built using a modern programming language named Rust. Rust is a systems programming language like C, meaning it offers a lot of control as well as high performance, which makes it well suited to implementing programming languages such as XPath and XSLT. Unlike C, which was used to implement libxml2/libxslt, Rust prevents many of the pitfalls that systems programming traditionally brings, such as memory-management errors and the security flaws they can cause. Software written in Rust also plays well with other programming languages; we therefore hope that Xee becomes integrated with many of them, such as Python and PHP. In fact, we've created a prototype named xee-php that lets you use Xee's XPath facilities from PHP.
Our hope is that Xee can be a more modern and secure alternative to libxml2/libxslt that finds its home in the open source world and in many projects.
Xee Status
The Xee project is a huge undertaking; implementing a programming language is no small feat, and Xee aims to implement two! The specifications describing XPath and XSLT run to over 1,800 pages.
What we’re most proud of is Xee’s XPath 3.1 implementation. The XPath core language and most of its standard library have been implemented. There are still gaps in the standard library (some of the formatting functions, for instance, are particularly large), but overall it’s fairly complete.
Besides using Xee from the Rust programming language (and, we hope, from other programming languages in the future), you can use the “xee” command-line tool we provide to execute XPath expressions and, soon, to run XSLT transformations (to the extent that the XSLT implementation supports them). You can download it from our releases page.
There’s an XPath 3.1 conformance test suite that provides a large number of automated tests covering all aspects of XPath. Of its 21,859 tests, 20,221 (about 92.5%) pass at the time of writing; that’s already 91 more than when we recently open sourced Xee, thanks to open source contributors! Most of the remaining failures are due to standard library functionality that has not been implemented yet.
Meanwhile, Xee also provides a solid basis for XSLT, reusing a lot of the XPath infrastructure. We’ve recently made it possible to run the XSLT conformance tests as well, so we can track our progress. We are now passing 921 XSLT conformance tests, which sounds less impressive once you know there are 14,595 of them (about 6.3%). Still, that’s a lot more than just a week ago! A few people have already started to contribute to the XSLT implementation as well.
Xee has now grown beyond being just a Paligo project. We hope we’ll find more people who are interested in going on this journey with us, together!