Roaster: an Open API Router for eXist

We’re happy to announce that the request routing library, which was originally developed for TEI Publisher 7, has been released as a separate package with extended functionality. The library, called roaster, is generic and can be used for any eXist-based project. It implements the Open API 3.0 standard to support well-documented, versioned and formally specified APIs.

Background

In previous versions of TEI Publisher, clients (i.e. your web browser) would communicate with the server by directly calling a variety of XQuery scripts. The server-side API, if you can even call it one, was thus scattered over many different files. Finding your way through the code and figuring out which parameters were expected or what the response should look like was rather difficult. Overriding the default behaviour – e.g. to replace the generated table of contents – required substantial coding skills. The scripts also changed between TEI Publisher versions, so if you were working on a standalone edition generated from Publisher, updates could be tricky.

With TEI Publisher 7, the entire server-side API can be viewed on a single documentation page. It clearly describes the URL paths you can use, as well as any parameter you can pass in and the type it should conform to. One can also see the different possible responses and what kind of content they would return.

The new API

Looking at the first route of the documents section in the API (see screenshot below), it is easy to construct a URL which returns the source XML: the path template to use is /api/document/{id} and {id} should contain the path to a document – relative to the data root of TEI Publisher.

Documents API screenshot

So to retrieve the TEI/XML for Graves’ letter, located in the file path test/graves6.xml, we can use the following URL:

https://teipublisher.com/exist/apps/tei-publisher/api/document/test%2Fgraves6.xml

Note that the / in the document path needs to be URL-encoded as %2F. This is a requirement of the Open API specification.

If instead of the TEI/XML we would like to see the letter rendered to HTML, we can use the third route in the list and simply add /html to the end of the URL:

https://teipublisher.com/exist/apps/tei-publisher/api/document/test%2Fgraves6.xml/html

or if we prefer a PDF:

https://teipublisher.com/exist/apps/tei-publisher/api/document/test%2Fgraves6.xml/pdf
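The three URLs above can also be built programmatically. The following is a minimal Python sketch; the helper name document_url is made up for this illustration, and only the path templates and the %2F encoding come from the description above.

```python
from urllib.parse import quote

# Base URL of the public TEI Publisher instance used in the examples above.
BASE = "https://teipublisher.com/exist/apps/tei-publisher"


def document_url(doc_path, output=None):
    """Build an /api/document/{id} URL; `output` may be "html" or "pdf".

    Hypothetical helper for illustration, not part of TEI Publisher.
    """
    # safe="" makes quote() encode "/" as %2F as well.
    doc_id = quote(doc_path, safe="")
    url = f"{BASE}/api/document/{doc_id}"
    return f"{url}/{output}" if output else url


print(document_url("test/graves6.xml"))
print(document_url("test/graves6.xml", "html"))
print(document_url("test/graves6.xml", "pdf"))
```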

Of course, as an ordinary user, you don’t need to know any of this: when you use the web interface of TEI Publisher, the web components on the page take care of constructing and calling the above URLs for you. But if you are a developer, having a well-defined API is a game changer. Just imagine that you want to support your co-workers with a script which allows them to preview a local TEI document as HTML on the fly: sending the content of the document with an HTTP POST request to /api/preview is all you need! Our Visual Studio Code plugin does it like this.
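As a rough idea of what such a preview script could look like, here is a minimal Python sketch. Only the POST to /api/preview is taken from the description above. The local base URL, the application/xml content type and the function names are assumptions, and any query parameters the endpoint may expect are omitted, so check the API documentation page of your installation before relying on it.

```python
import urllib.request

# Assumption: a local TEI Publisher instance running at this address.
BASE = "http://localhost:8080/exist/apps/tei-publisher"


def build_preview_request(tei_xml, base=BASE):
    """Prepare (but do not send) a POST request to /api/preview."""
    return urllib.request.Request(
        f"{base}/api/preview",
        data=tei_xml.encode("utf-8"),
        # Assumption: the endpoint accepts the document as application/xml.
        headers={"Content-Type": "application/xml"},
        method="POST",
    )


def preview(tei_xml, base=BASE):
    """Send the document to the server and return the response body as text."""
    with urllib.request.urlopen(build_preview_request(tei_xml, base)) as resp:
        return resp.read().decode("utf-8")
```

With a running instance, calling preview() with the content of a local TEI file would return the rendered page, which the script could then write to disk or open in a browser.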

You can use any scripting or programming language you like, say Bash, Python, Perl – you name it. And because Open API is a widely used standard, there are plenty of tools for documentation, testing or code generation. The API documentation page in TEI Publisher is generated by such a tool (Swagger UI).

Implementation

roaster is essentially an implementation of the Open API standard in pure XQuery. It reads the formal API specification (in JSON format) and determines which route to take for each incoming HTTP request. It also checks whether the parameters, headers or request bodies passed in comply with the rules given in the definition of the route. An error is generated if the request does not comply with the definition, e.g. because a required parameter is missing or has the wrong type. It can also fill in default values for parameters, enforce correct content types for the response etc.

From a developer perspective, this means you don’t have to worry about parameters. Your handler function will receive a single parameter, containing all the necessary information about the request and you can safely assume that it complies with the spec you provided.
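To make the idea more concrete, here is a small Python sketch of the general technique: match the request path against the path templates, check and convert the parameters, fill in defaults, then call a handler with a single request object. It is an illustration only, not roaster's actual XQuery code. The miniature SPEC, the page parameter and all function names are invented for this example.

```python
from urllib.parse import unquote

# A tiny, hand-written stand-in for an Open API definition (hypothetical).
SPEC = {
    "/api/document/{id}": {
        "get": {
            "operationId": "get_document",
            "parameters": [
                {"name": "id", "in": "path", "required": True,
                 "schema": {"type": "string"}},
                {"name": "page", "in": "query",
                 "schema": {"type": "integer", "default": 1}},
            ],
        }
    }
}

CASTS = {"string": str, "integer": int}


def match(template, path):
    """Return the decoded path parameters if `path` fits `template`, else None."""
    t_parts, p_parts = template.split("/"), path.split("/")
    if len(t_parts) != len(p_parts):
        return None
    params = {}
    for t, p in zip(t_parts, p_parts):
        if t.startswith("{") and t.endswith("}"):
            if not p:
                return None
            params[t[1:-1]] = unquote(p)
        elif t != p:
            return None
    return params


def route(method, path, query, handlers):
    """Find the matching operation, validate parameters, call its handler."""
    for template, operations in SPEC.items():
        path_params = match(template, path)
        operation = operations.get(method.lower())
        if path_params is None or operation is None:
            continue
        raw = {**query, **path_params}
        params = {}
        for p in operation["parameters"]:
            name, schema = p["name"], p["schema"]
            if name in raw:
                try:
                    params[name] = CASTS[schema["type"]](raw[name])
                except ValueError:
                    raise ValueError(f"{name} must be of type {schema['type']}") from None
            elif p.get("required"):
                raise ValueError(f"missing required parameter {name}")
            elif "default" in schema:
                params[name] = schema["default"]
        # The handler receives one object describing the validated request.
        request = {"method": method, "path": path, "parameters": params}
        return handlers[operation["operationId"]](request)
    raise LookupError(f"no route for {method} {path}")


def get_document(request):
    p = request["parameters"]
    return f"document {p['id']}, page {p['page']}"


HANDLERS = {"get_document": get_document}
print(route("GET", "/api/document/test%2Fgraves6.xml", {}, HANDLERS))
# prints: document test/graves6.xml, page 1
```

The handler never sees raw query strings: by the time it runs, the id has been decoded, page has been converted to an integer or set to its default, and malformed requests have already been rejected.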

If you are interested in the details, please refer to the README. For a TEI Publisher-related example, check out the FAQ article, which describes how to replace the default table of contents with a custom one.

Roaster 1.0.0 can be installed into your local eXist via the package manager in the dashboard. TEI Publisher 7 shipped with a slightly older version, 0.5.1, but you can run both versions side by side.

Cross search

With a growing number of editions realized with TEI Publisher, the logical next step is a search service which can run queries across multiple corpora at the same time.

Usually, the problem to solve is the great diversity of encoding across projects, even if they all use TEI as their vocabulary of choice. Even commonly represented information, like the language of the source document, can be stored in various locations in a TEI document. Lucene-based fields and facets, introduced in eXist-db 5.0, provide a mechanism to smoothly abstract away these encoding differences – we can just define, say, a language facet, and it is the role of the collection index configuration to specify where exactly to grab the data from.

The next potential issue is actually running the queries across corpora, particularly in a decentralized infrastructure where editions are hosted on diverse servers. The answer here is to define an API which individual editions need to expose, so that the aggregate search engine can just poll all its registered ‘members’, regardless of their location or how they implement the search internally.

Cross-search results page

The cross-search prototype is exactly such a search engine. With a simple configuration one can register all ‘member’ editions. The only requirement for the editions themselves is that they expose the api/search/document API endpoint, which is a matter of simple customization for all TEI Publisher 7 applications, since they support Open API specifications out of the box. The api/search/document endpoint must accept a number of parameters defined in the specification. For this prototype, the title, author and lang(uage) fields as well as the genre, language and corpus facets were assumed.
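The polling idea can be sketched in a few lines of Python. This is not the prototype's code: the member registry, the placeholder URLs, the way fields and facets are passed as plain query parameters and the shape of the returned hits are all assumptions made for illustration. The fetch function is passed in, so the sketch runs without network access.

```python
from urllib.parse import urlencode

# Hypothetical registry of member editions (placeholder URLs).
MEMBERS = {
    "edition-a": "https://a.example.org/exist/apps/edition-a",
    "edition-b": "https://b.example.org/exist/apps/edition-b",
}


def search_url(base, fields, facets):
    """Build the api/search/document URL for one member edition."""
    # Assumption: fields and facets are sent as plain query parameters.
    return f"{base}/api/search/document?{urlencode({**fields, **facets})}"


def cross_search(fetch, fields, facets=None, members=MEMBERS):
    """Poll every member and collect the hits, tagged with their origin."""
    hits = []
    for name, base in members.items():
        try:
            results = fetch(search_url(base, fields, facets or {}))
        except OSError:
            continue  # an unreachable member should not break the whole search
        hits.extend({"member": name, **hit} for hit in results)
    return hits


def fake_fetch(url):
    """Stand-in for a real HTTP call; returns one made-up hit per member."""
    return [{"title": "Letter", "url": url}]


for hit in cross_search(fake_fetch, {"author": "Graves"}, {"language": "en"}):
    print(hit["member"], hit["url"])
```

Because every member answers the same endpoint with the same parameters, the aggregator does not need to know anything about how an individual edition encodes or indexes its documents.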

We are very happy to report that our prototype works really well as a proof of concept with the eclectic collection of documents from the TEI Publisher demo apps, all originating from vastly different projects with diverse encoding styles. Next, we intend to extend this idea into a general portal for archives and libraries, and we would welcome collaboration from such institutions.

Our sincere thanks go to the Bibliothek für Bildungsgeschichtliche Forschung des DIPF / Research Library for the History of Education at DIPF for supporting this project.