WILPS 2020 Project: Related Items
This repository acts as the entrypoint for the Related Items project. It contains information for the overall project architecture as well as links to the different parts and repositories which are part of the Related Items project.
Authors: Fabian Rausch and Felix Welter
(fabian.rausch@studium.uni-hamburg.de and felix.welter@studium.uni-hamburg.de)
Quick links
- bigbluebutton-html5 (Use branch related-items)
- kaldi-model-server (Use branch rtc-stream-docker)
- message-broker
- relevant-terms-tagger
- slide-index
- elastic-search-middleware
- wikipedia-search-service
- wikipedia-elastic-search
- deepl-translation-api
Context
This project was developed during the Master Project "Web Interfaces for Language Processing Systems 2020" and was supervised by Prof. Dr. Chris Biemann and Dr. Seid Muhie Yimam. In light of the 2020 Corona pandemic, this Master Project aimed to improve the online conferencing software BigBlueButton.
Project overview
The Related Items project lets users transcribe speech and look up related items (e.g. lecture slides or a Wikipedia article) through an easy-to-use interface from within a BigBlueButton conference. For that purpose, BigBlueButton was connected to the speech recognition software Kaldi. The transcribed text is available to all participants in the meeting. Every client can select the transcription language for itself, so users can read the transcribed text in their preferred language while the current speaker speaks a different one. Furthermore, information sources (called indices) were connected to BigBlueButton, enabling all users to quickly search for terms that have been mentioned during the talk.
You can check out this project in action on YouTube or the Mafiasi Cloud.
Software architecture
The following image visualizes the software architecture.
Almost all services are dockerized and can be run either on a single machine or each on a different server.
In the original setup, it worked well to put all services on one machine except for the indices, which are hosted on separate servers so that they can be scaled independently.
We recommend putting each service behind an nginx proxy, which easily adds SSL support and improves performance. SSL support in particular makes development more hassle-free.
Standard request format
This is a standard format for requests to and responses from an index (e.g. slide-index, wikipedia). This way, new sources for related items can easily be added: the service just needs to adhere to the standard request format, and a few lines need to be added to the html5-client.
The microservice is queried via the POST endpoint /search.
The request contains the POST params term, context and amount.
Currently, amount is usually set to three.
The microservice returns a JSON object containing the response type and type-dependent data.
For images:
{
  "type": "image",
  "paths": [
    {
      "path": "/path/to/image/",
      "url": "/url/to/more/information/or/full/size/image"
    },
    {
      "path": "/path/to/image2/",
      "url": "/different/url/to/more/information/or/full/size/image2"
    }
  ]
}
For texts:
{
  "type": "text",
  "texts": [
    {
      "text": "This will hopefully be useful information.",
      "url": "http://example.com/"
    },
    {
      "text": "This is about funny frog.",
      "url": "/link/to/external/page"
    }
  ]
}
If no information was found:
{
  "type": "miss"
}
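From the client side, the request and the three response shapes above could be handled roughly as in the following Python sketch. It is only an illustration: the function names, the example base URL, and the choice of form-encoded POST params are assumptions and are not taken from the html5-client.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def build_search_request(base_url, term, context, amount=3):
    """Build a POST request for the /search endpoint of an index service."""
    # Assumption: the POST params are sent form-encoded.
    body = urlencode({"term": term, "context": context, "amount": amount})
    return Request(
        base_url.rstrip("/") + "/search",
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


def extract_items(response):
    """Return a list of (content, url) pairs from a standard response dict."""
    kind = response.get("type")
    if kind == "image":
        return [(item["path"], item["url"]) for item in response.get("paths", [])]
    if kind == "text":
        return [(item["text"], item["url"]) for item in response.get("texts", [])]
    if kind == "miss":
        return []
    raise ValueError(f"unknown response type: {kind!r}")


def search(base_url, term, context, amount=3):
    """Send the request and decode the JSON answer (needs a running service)."""
    with urlopen(build_search_request(base_url, term, context, amount)) as reply:
        return extract_items(json.load(reply))
```

Treating "miss" as an empty list lets a caller query several indices in the same way and simply skip those without results.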
An example implementation with Flask can be found in the slide-index repository.
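To give an idea of how little a new index has to provide, here is a minimal sketch that uses only the Python standard library instead of Flask. It is not the slide-index implementation: the ARTICLES data, the function and class names, the port, and the form-encoded request body are all assumptions made for illustration.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import parse_qs

# Hypothetical in-memory data standing in for a real index.
ARTICLES = {
    "frog": [{"text": "This is about funny frog.", "url": "/link/to/external/page"}],
}


def find_related(term, context, amount):
    """Return a response dict in the standard format for one search term."""
    # `context` is accepted but unused in this toy lookup.
    hits = ARTICLES.get(term.lower(), [])[:amount]
    if not hits:
        return {"type": "miss"}
    return {"type": "text", "texts": hits}


class SearchHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        if self.path != "/search":
            self.send_error(404)
            return
        # Read and decode the form-encoded POST params.
        length = int(self.headers.get("Content-Length", 0))
        params = parse_qs(self.rfile.read(length).decode("utf-8"))
        result = find_related(
            params.get("term", [""])[0],
            params.get("context", [""])[0],
            int(params.get("amount", ["3"])[0]),
        )
        payload = json.dumps(result).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)


def make_server(port=8080):
    """Create (but do not start) an HTTP server bound to localhost."""
    return HTTPServer(("127.0.0.1", port), SearchHandler)
```

Calling make_server().serve_forever() would start the service locally. Keeping the lookup in find_related, separate from the HTTP handling, makes the response format easy to check without a running server.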
Docker
There are docker containers available for the following components:
- modified kaldi-model-server
- slide-index
- elastic-search-middleware
- wikipedia-search-service
- relevant-terms-tagger
Unless stated otherwise, always use the latest version available. Specific instructions to run each container can be found in the respective repositories.
The html5-client is not dockerized since it adheres to the development guidelines of BigBlueButton.
The message-broker is a trivial Node.js script and can easily be run on a BigBlueButton server.
If you have no experience with Docker, please check out docker installation and docker usage.
Nginx
This section gives an overview of the nginx setup and a general installation procedure. Please note that some steps may be redundant on your system and/or additional steps may be required (e.g. firewall exceptions) depending on your setup.
This guide also assumes that you have a registered (sub-)domain which points to your server(s), since this is required for SSL certificates. If this requirement does not suit you, please refer to the section Alternatives to nginx+SSL setup below.
Installation
sudo apt update
sudo apt install nginx
Make sure that the domain name is set up in the configuration, e.g. /etc/nginx/sites-available/default.
The configuration needs to contain a line like server_name example.com www.example.com;
Most likely the directive server_name is already present and only the domain names need to be added.
SSL certificates
Install certbot and request SSL certificates. This automatically installs the SSL components for nginx.
sudo add-apt-repository ppa:certbot/certbot
sudo apt-get update
sudo apt-get install python-certbot-nginx
sudo certbot --nginx -d example.com
More details regarding nginx and the SSL setup can be found here: https://www.digitalocean.com/community/tutorials/how-to-set-up-let-s-encrypt-with-nginx-server-blocks-on-ubuntu-16-04
Nginx reverse proxy
Nginx needs to know where your service is running. This can be done with the following example configuration:
location / {
    proxy_pass http://127.0.0.1:8080/;
    proxy_http_version 1.1;
    proxy_set_header Upgrade $http_upgrade;
    proxy_set_header Connection "Upgrade";
    proxy_set_header Host $host;
    proxy_buffering off;
    proxy_connect_timeout 600;
    proxy_send_timeout 600;
    proxy_read_timeout 600;
    send_timeout 600;
    add_header 'Access-Control-Allow-Origin' '*' always;
    add_header 'Access-Control-Allow-Methods' 'GET, POST, OPTIONS' always;
}
More generally:
location <URL_PATH_PREFIX> {
    proxy_pass <SERVICE_CONNECTION>;
    proxy_http_version 1.1;
    proxy_set_header Upgrade $http_upgrade;
    proxy_set_header Connection "Upgrade";
    proxy_set_header Host $host;
    proxy_buffering off;
    proxy_connect_timeout 600;
    proxy_send_timeout 600;
    proxy_read_timeout 600;
    send_timeout 600;
    add_header 'Access-Control-Allow-Origin' '*' always;
    add_header 'Access-Control-Allow-Methods' 'GET, POST, OPTIONS' always;
}
URL_PATH_PREFIX: This defines the matching path for the service. / will forward all requests to the service. By using e.g. /msg_broker and several location blocks, multiple services can run on one server.
SERVICE_CONNECTION: This specifies where the service runs (e.g. http://127.0.0.1:8080/). Make sure that it has a trailing / so that the URL_PATH_PREFIX is not sent to the service.
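The effect of the trailing / can be illustrated with a small Python model of how nginx derives the forwarded path for a prefix location. This is a simplified sketch, not nginx code: the function name is made up, the prefix /msg_broker/ (written here with a trailing slash) is only an example, and regex locations, URI normalization and query strings are ignored.

```python
from urllib.parse import urlsplit


def upstream_path(location_prefix, proxy_pass, request_path):
    """Approximate the path nginx forwards for a prefix location block."""
    if not request_path.startswith(location_prefix):
        raise ValueError("request does not match this location block")
    uri = urlsplit(proxy_pass).path
    if uri == "":
        # proxy_pass without a URI part: the request path is passed unchanged.
        return request_path
    # proxy_pass with a URI part: the matched prefix is replaced by that URI.
    return uri + request_path[len(location_prefix):]


# With the trailing slash, the service only sees /search.
print(upstream_path("/msg_broker/", "http://127.0.0.1:8080/", "/msg_broker/search"))
# Without it, the prefix is forwarded as well.
print(upstream_path("/msg_broker/", "http://127.0.0.1:8080", "/msg_broker/search"))
```

In this model the first call yields /search and the second /msg_broker/search, which is why a service that only implements /search needs the variant with the trailing slash.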
The last two lines containing Access-Control-Allow are required for the service to be used from a different domain (e.g. the user participates in a conference at bbb.uhh.de but the service is hosted at index.uhh.de).
If your service runs on the same domain/server as the html5 client, these two lines are not required.
For more information, refer to CORS and nginx CORS.
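Whether the two headers are needed for a given deployment can be estimated with a short Python check. Browsers compare origins, i.e. scheme, host and port, so the sketch below does the same. The helper names and the example URL paths are assumptions chosen for illustration.

```python
from urllib.parse import urlsplit

DEFAULT_PORTS = {"http": 80, "https": 443}


def origin(url):
    """Return the (scheme, host, port) triple that browsers compare."""
    parts = urlsplit(url)
    port = parts.port or DEFAULT_PORTS.get(parts.scheme)
    return (parts.scheme, parts.hostname, port)


def needs_cors_headers(client_url, service_url):
    """True if a browser treats a request from client to service as cross-origin."""
    return origin(client_url) != origin(service_url)


# Conference at bbb.uhh.de, index hosted at index.uhh.de: headers are required.
print(needs_cors_headers("https://bbb.uhh.de/html5client/join",
                         "https://index.uhh.de/search"))
```

Note that a different port or scheme on the same host also counts as a different origin in this comparison.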
For more examples of nginx configurations please have a look at the files in this repository.
Alternatives to nginx+SSL setup
These deployment options are untested for this project. However, they are standard procedures in other software development projects and may be more suitable for others who want to set up the services of the Related Items project.
Option 1
Host everything (including BigBlueButton) locally. According to BigBlueButton this is possible with an Ubuntu 16 machine. Please be aware that BigBlueButton, Kaldi and Elasticsearch need a considerable amount of hardware resources: 16 GB of memory and a powerful CPU (especially for Kaldi) should be available.
Option 2
Host the services on external servers, but instead of connecting to them directly, tunnel via SSH to each server and connect via localhost. In most browsers this will prevent SSL warnings and errors.
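As a rough sketch of Option 2, the following Python helper assembles an ssh local port forwarding command for one service. Like the option itself, it is untested for this project. The host name, user and port are placeholders, and the helper name is made up.

```python
import shlex


def tunnel_command(ssh_target, remote_port, local_port=None):
    """Build an ssh command forwarding a local port to a service on the server."""
    local_port = remote_port if local_port is None else local_port
    return [
        "ssh",
        "-N",  # do not run a remote command, only forward ports
        "-L", f"{local_port}:127.0.0.1:{remote_port}",
        ssh_target,
    ]


# Placeholder target: a service listening on port 8080 of index.example.com.
print(shlex.join(tunnel_command("user@index.example.com", 8080)))
```

While the printed command is running, the service would be reachable at http://127.0.0.1:8080/ on the local machine. One tunnel per service and server is needed.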