Arkhangelskiy, Timofey
fcs-clarin-endpoint-hamburg

FCS Clarin endpoint

Overview
This is an endpoint for Federated Content Search (FCS).
Many linguistic corpora are available online. They run on different platforms and use a variety of query languages. FCS is a mechanism that lets you search multiple corpora at once, using simple text queries or a CQL-like language. This way, you can discover or compare corpora that may be useful for your research and then continue working in the corpora themselves. The search across corpora is performed through the Aggregator.
An endpoint is a piece of software that serves as an intermediary between FCS and individual corpora. It translates FCS requests into the corpus-specific query language, waits for the results, and then renders them in the XML format required by FCS.
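The translate-then-render flow described above can be illustrated with a minimal sketch. It is not the code of this endpoint: the function names `translate_query` and `render_results`, the `wf=` query form used for Tsakorpus, and the `<results>`/`<hit>` element names are all hypothetical and chosen for illustration only. The real FCS response format is defined by the CLARIN FCS specifications.

```python
import xml.etree.ElementTree as ET


def translate_query(fcs_query: str, platform: str) -> str:
    """Turn a simple-text FCS query into a platform-specific query.

    Illustrative only: a real endpoint has to handle the full query
    language, not just a single search term.
    """
    term = fcs_query.strip().strip('"')
    if platform == "annis":
        # Exact token match written in the style of ANNIS Query Language.
        return f'tok="{term}"'
    if platform == "tsakorpus":
        # Hypothetical word-form parameter; an assumption for this sketch.
        return f"wf={term}"
    raise ValueError(f"Unsupported platform: {platform}")


def render_results(hits: list) -> str:
    """Wrap corpus hits in a simplified XML document.

    The element names are placeholders, not the actual FCS schema.
    """
    root = ET.Element("results")
    for hit in hits:
        ET.SubElement(root, "hit").text = hit
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(translate_query('"dog"', "annis"))
    print(render_results(["a dog barks"]))
```

Keeping translation and rendering in separate functions mirrors the description: only the translation step depends on the corpus platform, while the output step is shared.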
Different corpus platforms or online databases require different endpoints. This endpoint works with the following platforms or resources:

ANNIS
Tsakorpus
Database of the Formulae-Litterae-Chartae project


Documentation
All documentation is available here.
The CLARIN FCS specifications that this endpoint implements are available here.

Requirements
This software was tested on Ubuntu and Windows. Its dependencies are the following:

Python >= 3.8
Python modules: fastapi, uvicorn, lxml, Jinja2 (you can install them from requirements.txt)
It is recommended to deploy the endpoint through apache2 with wsgi or through nginx
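Before deploying, it can help to confirm that the requirements listed above are met. The following is a small sketch that uses only the standard library; the helper name `missing_dependencies` is hypothetical and not part of this repository, and the import names (`jinja2` for the Jinja2 package in particular) are assumed to be the usual ones.

```python
import importlib.util
import sys

# Import names of the modules listed in the requirements (assumed).
REQUIRED_MODULES = ["fastapi", "uvicorn", "lxml", "jinja2"]


def missing_dependencies(modules):
    """Return the names of the modules that cannot be found."""
    return [name for name in modules
            if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    if sys.version_info < (3, 8):
        print("Python 3.8 or newer is required.")
    missing = missing_dependencies(REQUIRED_MODULES)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All required modules were found.")
```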


License
The software is distributed under the CC BY license (see LICENSE).

Funding
The development of this software was funded by the Akademie der Wissenschaften in Hamburg.