Cross search

With a growing number of editions realized with the TEI Publisher it is a logical next step to wish for a search service which can run queries across multiple corpora at the same time.

Usually the problem to solve would be the great diversity of encoding across projects, even if they all use TEI as a vocabulary of choice. Even commonly represented information, like the language of the source document, can be stored in various locations in a TEI document. Lucene-based fields and facets, introduced in eXist-db 5.0 provide a mechanism to smoothly abstract away these encoding differences – we can just define, say, a language facet and it’s the collection index configuration’s role to take care of specifying where exactly to grab data from.

The next potential issue would be actually running the queries across corpora, particularly with the decentralized infrastructure where editions are hosted on diverse servers. The answer here is to define an API which individual editions need to expose, so that the aggregate search engine can just poll all its registered ‘members’, regardless of their location or how they implement the search internally.

cross-search results
Cross-search results page

The cross-search prototype is exactly such a search engine. With a simple configuration one can register all ‘member’ editions. Only requirement for the editions themselves is that they expose the api/search/document API endpoint, which is a matter of simple customization for all TEI Publisher 7 applications which support Open API specifications out of the box. The api/search/document endpoint must accept a number of parameters defined in the specification. For this prototype the title, author and lang(uage) fields as well as genre, language and corpus facets were assumed.

We are very happy to report that our prototype works really well as a proof of concept with the eclectic collection of documents from TEI Publisher demo apps, all originating with vastly different projects with diverse encoding styles. Next, we intend to extend this idea into a general portal for archives and libraries and we would welcome collaboration from such institutions.

Our sincere thanks go to the Bibliothek für Bildungsgeschichtliche Forschung des DIPF / Research Library for the History of Education at DIPF for supporting this project.

Open Source Advent

2020-12-21, Magdalena Turska

Through December we’re aiming to present a number of recent open source developments related to e-editiones mission to empower digital editions of cultural, scientific, and artistic works by promoting open standards related to digital editions and advancing open source applications based on them.

Continue reading “Open Source Advent”

TEI “Vanilla” meeting summary

Following the spontaneous exchange on e-editiones Slack channel, we have held an online community meeting to discuss the emerging TEI Vanilla proposal on October 20th. It was a pleasure to see such a broad interest and diverse representation of participants, spanning the entire range from TEI newcomers to current and former TEI Council and Board members.

Continue reading “TEI “Vanilla” meeting summary”