Blog

Digital editions survival kit

2021-10-26, Magdalena Turska

Reconstructing an edition

Computer systems are not meant to last. On the contrary: not only do they require regular maintenance, but we also need to take into account the unavoidable cycle of major refurbishments. This paper, just presented at the virtual TEI conference, aims to demonstrate how critically important aspects of an edition can be reconstructed from a rather minimal data set, and how such a survival kit can be useful not only for disaster recovery but also as a sustainable approach to the maintenance of scholarly publications.

The key components of the survival kit are:

  • Source documents
  • Document encoding and transformation scheme
  • Layout templates
  • Interoperable metadata mapping specification

TEI is the perfect archival format for text-centric data: human-readable and easy to process. As long as we have the capacity to read text files, we can recover the information from a repository of TEI texts. TEI-encoded documents with an associated ODD schema and documentation already form a solid basis for reconstruction, even if the ODD says nothing about the final form of the publication as intended by the editors.

TEI source and rendition via the Processing Model

The TEI Processing Model covers part of this territory, describing how a source document should be transformed for publication. Nevertheless, in the virtual realm, the document is always accompanied by a certain context on the page: controls to zoom in or out, a facing facsimile image, or a switch between normalized and original spellings, just to name a few options. To explicitly define such a context and specify how a publication page should look and behave, we can rely on HTML5 layout templates. A modern approach to website design based on web components gives us a beautifully simple and expressive method of assembling web pages from the virtual equivalent of Lego blocks.
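To make the idea of a declarative transformation more concrete, here is a minimal Python sketch. It is not the TEI Processing Model itself and not TEI Publisher code: the `RULES` table, the `render` function and the sample fragment are assumptions made up for illustration, loosely borrowing behaviour names such as heading, paragraph and inline. The point is only that a small rule table, rather than hand-written transformation code, decides how each element is rendered.

```python
import xml.etree.ElementTree as ET
from html import escape

TEI_NS = "http://www.tei-c.org/ns/1.0"

# Hypothetical rule table: TEI element name -> (behaviour, HTML tag).
# A real ODD would express this declaratively; this dict is only a stand-in.
RULES = {
    "head": ("heading", "h2"),
    "p": ("paragraph", "p"),
    "hi": ("inline", "em"),
}


def render(node):
    """Recursively turn a TEI element into an HTML string using RULES."""
    local = node.tag.split("}")[-1]  # strip the namespace from the tag name
    inner = escape(node.text or "")
    for child in node:
        inner += render(child) + escape(child.tail or "")
    rule = RULES.get(local)
    if rule is None:
        return inner  # no rule: keep the content, drop the wrapper element
    behaviour, tag = rule
    return f'<{tag} class="{behaviour}">{inner}</{tag}>'


# Illustrative fragment, not taken from any real edition.
SAMPLE = (
    f'<div xmlns="{TEI_NS}">'
    "<head>Letter</head>"
    "<p>Dear <hi>friend</hi>, hello.</p>"
    "</div>"
)

html = render(ET.fromstring(SAMPLE))
print(html)
```

Changing the presentation then means editing the rule table, which is the part worth archiving alongside the sources.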

HTML5 page layout using web components

The last missing piece is to document how abstract concepts, e.g. author or date of creation, are realized in the encoding, so that we can recover them and use them for queries within the publication as well as for data interchange with other systems. Given the richness of TEI, it’s impossible to prescribe what metadata needs to be gathered and how exactly it should be encoded in any given project. On the other hand, it is rather simple to express the mapping in XML, e.g. with an index configuration syntax.
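As a rough illustration of such a mapping, the following Python sketch pairs abstract field names with paths into a TEI header and uses them to build a metadata record. It does not reproduce the index configuration syntax mentioned above: the `FIELD_MAP` dictionary, the `extract_fields` helper and the trimmed sample document (which is not schema-valid and contains made-up values) are all assumptions for the sake of the example.

```python
import xml.etree.ElementTree as ET

NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Hypothetical mapping: abstract field name -> path into the TEI header.
# Each project would document its own paths here.
FIELD_MAP = {
    "title": ".//tei:titleStmt/tei:title",
    "author": ".//tei:titleStmt/tei:author",
    "date": ".//tei:creation/tei:date",
}

# Trimmed, made-up sample; a real TEI header contains more required parts.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <fileDesc>
      <titleStmt>
        <title>Sample letter</title>
        <author>Jane Doe</author>
      </titleStmt>
    </fileDesc>
    <profileDesc>
      <creation><date when="1989-11-09">9 November 1989</date></creation>
    </profileDesc>
  </teiHeader>
</TEI>"""


def extract_fields(xml_text, field_map=FIELD_MAP):
    """Return a dict of field name -> value for every mapped path that matches."""
    root = ET.fromstring(xml_text)
    record = {}
    for name, path in field_map.items():
        node = root.find(path, NS)
        if node is None:
            continue  # field not encoded in this document
        # Prefer a normalized @when value if present, else the text content.
        record[name] = node.get("when") or "".join(node.itertext()).strip()
    return record


record = extract_fields(SAMPLE)
print(record)
```

Because the mapping lives in one small table, the same record can feed both the search index and an export to other systems.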

Sample index configuration with fields and facets for a TEI document

Such a set of specifications preserves all the information necessary to rebuild the edition from scratch, focusing on the intentions and decisions of the editor while filtering out the ephemeral or secondary presentation aspects. Good to put in the vault and send into space but equally useful when the time to migrate to a new infrastructure comes.

Original Dodis document view

How does it work in practice? You might want to have a closer look at one of TEI Publisher’s demo apps, When the Wall came Down, which we recreated on the basis of the TEI sources and accompanying ODD released by Dodis on the 30th anniversary of the fall of the Berlin Wall. We got a draft version working with only 165 lines of custom code, in a single day of a pre-conference workshop. Our task would have been simpler still had the web page template and index configuration also been available.

Recreated document view

Given that there’s barely any extra effort involved in assembling the survival kit, preparing it is a clear win. After all, we already have the sources and the ODD! Enriching the ODD with a processing model is not particularly difficult, especially if we use it to generate our transformations. Similarly, in most database systems we will need to prepare index configurations anyway. At this point we probably don’t need to mention that TEI Publisher has implemented this approach for quite a few versions (ODD with the processing model from inception, web components for the user interface since version 4, and fields and facets since version 5).

Just think about it: if you pack your edition nicely, it becomes a present that archives and libraries would very much like to keep safe in their vaults and running on their servers forever…

Annotation editor released with new TEI Publisher 7.1.0

Answering the secret dream of many TEI users, the new TEI Publisher version 7.1.0 incorporates a beautifully simple yet powerful way to enrich existing TEI documents. Just select a text passage, click on a button and within seconds, without a pointy bracket in sight, mark it as one of many supported annotation types. A place or person? Sure, and with built-in connectors for external authority files, too. Critical apparatus entries? We got you! Dates, corrections, regularizations and even quick fixes for typos in your transcription are covered as well.

As usual, everything is customizable and extendable, so if you want a particular kind of annotation we do not support out of the box, it’s not difficult to add your own or tinker with existing ones. Read more in the documentation.

The good news doesn’t end there: you can now use the TEI formula element with TeX notation for math. See the component’s demo page, which presents some elaborate formulae, or visit Publisher’s Demo collection, which now sports shiny new examples: Euler’s Algebra for a wee help with your quadratic equations, or The Italienische Madrigal by Alfred (not Albert!) Einstein, with musical scores encoded in MEI. The scores are nicely rendered with the Verovio library through a dedicated pb-mei component, and you can even listen to the piece to cheer yourself up. You can now also set Publisher’s interface to simplified or traditional Chinese.

TEI Publisher 7.1.0 is available as an application package on top of the eXist XML Database. Install it into a recent eXist (5.0.0 or newer) by going to the dashboard and selecting TEI Publisher from the package manager.

For more information refer to the documentation or visit the homepage to play around with it.

It’s not the first time that our special thanks go to the Office of the Historian of the United States Department of State, this time for funding the major portion of the annotation editor. The math support has been kindly funded by the Bernoulli-Euler Zentrum in Basel.

Archives Online offers hosting in cooperation with e-editiones

2021-03-28, Wolfgang Meier

In cooperation with e-editiones, Archives Online is building an infrastructure for digital scholarly editions based on TEI Publisher and IIIF. It is offering comprehensive, long-term support and maintenance to keep digital editions from going dark. The offer is available to all interested editions world-wide.

The offer will be complemented by a portal which allows users to search across all editions participating in the service. The portal application has been developed by e-editiones, Archives Online and the Staatsarchiv Zürich, and will go online as soon as the first editions are ready for publication. All code will be made available as free software. The distributed search feature is based on earlier work financed by the DIPF Berlin (Leibniz Institute for Research and Information in Education), and the Karl Barth-Gesamtausgabe supported the server setup and automation.

The goal is to provide an easy, long-term hosting option for editions based on minimal, well-documented requirements: ultimately any edition which complies with the recommended practices can benefit from the hosting offer and participate in the portal service.

If you are interested in having an edition hosted, please contact Archives Online. The service will not be free: any serious long-term hosting has to cover certain maintenance costs if it wants to follow more than an "install and forget" policy. But we’re confident that our solution minimizes the costs while providing the best possible service, in particular if you’re looking for long-term availability.

Technical Background

The central goal of e-editiones is to provide editions with a sustainable publishing solution which ensures long-term availability with minimal maintenance. With the redesign in TEI Publisher 6 and 7, we prepared the necessary technical foundations: all editions generated by TEI Publisher now share a common API.

With the new version 7 it became possible to:

  1. host multiple editions created with different versions of the libraries side by side,
  2. simplify the update procedure to make sure editions benefit from new developments while keeping maintenance costs very low,
  3. search across local and distributed editions (the latter hosted on a different server) by leveraging the shared API.

Also, despite looking different on the surface, the building blocks for any TEI Publisher-based edition are the same, which allows server administrators to create automated setup and maintenance routines.
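The cross-edition search mentioned in the list above can be pictured with a small Python sketch. It makes no claim about the actual TEI Publisher API: each edition is represented by a plain callable standing in for a request to that edition, and the function names, hit structure and scores are invented for illustration. What it shows is the benefit of a shared interface: a portal can query every edition the same way, merge the hits and tolerate an edition that is temporarily unreachable.

```python
from typing import Callable, Dict, List

Hit = Dict[str, object]
SearchFn = Callable[[str], List[Hit]]


def federated_search(query: str, editions: Dict[str, SearchFn], limit: int = 10) -> List[Hit]:
    """Query every edition through the same interface and merge hits by score."""
    merged: List[Hit] = []
    for name, search in editions.items():
        try:
            hits = search(query)
        except Exception:
            continue  # an unreachable edition should not break the portal
        for hit in hits:
            merged.append({**hit, "edition": name})
    merged.sort(key=lambda hit: hit["score"], reverse=True)
    return merged[:limit]


# Hypothetical stand-ins for editions; a real portal would issue HTTP requests.
def local_edition(query: str) -> List[Hit]:
    return [{"doc": "letter-12.xml", "score": 0.9}]


def remote_edition(query: str) -> List[Hit]:
    return [{"doc": "report-3.xml", "score": 0.7}, {"doc": "memo-8.xml", "score": 0.95}]


def offline_edition(query: str) -> List[Hit]:
    raise ConnectionError("server unreachable")


results = federated_search(
    "wall",
    {"local": local_edition, "remote": remote_edition, "offline": offline_edition},
)
for hit in results:
    print(hit["edition"], hit["doc"], hit["score"])
```

Comparing raw scores across servers is a simplification; a production portal would likely need to normalize relevance between indexes.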

Outlook

Among many other things, TEI Publisher 8 will further improve the architecture by introducing more generic concepts for persistent URLs, navigation and addressing documents, allowing editions to use an addressing scheme which better reflects the logic of the edition rather than technical requirements.

A direct integration with git will allow editions to update published data without having to touch the command line.

The upcoming version will also include index configuration and API endpoints for local and distributed portal search, so generated editions can automatically participate in a multi-edition portal like sources-online.

With these low-level technical questions solved, e-editiones is now also putting a strong focus on describing practical recommendations for the encoding of texts. The idea is to create a set of best practice guidelines for corpora, with a pledge that any text following these guidelines will look good out of the box when rendered via TEI Publisher and will be ready for incorporation into the search portal. Initial work is under way, and we expect a community meetup soon to discuss it in a broader forum.

This way we aim to:

  • flatten the learning curve for the many projects starting out with TEI encoding,
  • reduce the amount of customization work required for an edition, allowing users to publish materials with minimal effort,
  • ensure that the project data is ready to be integrated into larger-scale search portals.

Needless to say, the recommendations we have in mind are aimed solely at interoperability and do not in any way limit the customization capacities already embodied by the TEI Publisher approach.