WILPS 2020 Project: Related Items
This repository is the entry point for the Related Items project. It describes the overall project architecture and links to the repositories that make up the project.
Authors: Fabian Rausch and Felix Welter
(fabian.rausch@studium.uni-hamburg.de and felix.welter@studium.uni-hamburg.de)
Quick links
- bigbluebutton-html5 client (Use branch related-items)
- kaldi-model-server (Use branch rtc-stream-docker)
- message-broker
- relevant-terms-tagger
- slide-index
- elastic-search-middleware
- wikipedia-search-service
Context
This project was developed during the Master Project "Web Interfaces for Language Processing Systems" and was supervised by Prof. Dr. Chris Biemann and Dr. Seid Muhie Yimam. In light of the 2020 Corona pandemic, the project aims to improve the online conferencing software bigbluebutton. The Related Items project enables users to transcribe speech and look up related items (e.g. lecture slides or a Wikipedia article) via an easy-to-use interface.
Software architecture
The following image visualizes the software architecture.
Almost all services are dockerized and can run either on the same machine or on separate servers.
For example, it works well to put all services on one machine except for the indices, which are hosted on different servers so that they can be scaled independently.
We recommend putting each service behind an nginx proxy, which easily adds SSL support and improves performance.
Standard request format
This is a standard format for requests to and responses from an index (e.g. slide-index, wikipedia) or, more generally, any source of related items. This way, new sources of related items can easily be added: the service just needs to adhere to the standard request format, and a few lines need to be added to the html5-client.
The micro service is queried via the POST endpoint /search.
The request contains the POST params term and context.
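As a rough illustration of the request side, the sketch below builds (but does not send) such a POST request with the Python standard library. The helper name build_search_request, the host index.example.com, and the form-encoded body are assumptions made for this example; the source only states that term and context are POST params.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_search_request(base_url: str, term: str, context: str) -> Request:
    """Build (but do not send) a POST request for the /search endpoint."""
    # Assumption: the two parameters are sent form-encoded in the request body.
    body = urlencode({"term": term, "context": context}).encode("utf-8")
    return Request(
        base_url.rstrip("/") + "/search",
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


# Hypothetical host and query values, for illustration only.
request = build_search_request(
    "https://index.example.com", "neural network", "lecture on deep learning"
)
```

The request could then be sent with urllib.request.urlopen(request), and the body of the answer parsed as JSON.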
The micro service returns a JSON object containing the response type and type-dependent data.
For image:
{
"type": "image",
"paths": [
{
"path": "/path/to/image/",
"url": "/url/to/more/information/or/full/size/image"
},
{
"path": "/path/to/image2/",
"url": "/different/url/to/more/information/or/full/size/image2"
}
]
}
For texts:
{
"type": "text",
"texts": [
{
"text": "This will hopefully be useful information.",
"url": "http://example.com/"
},
{
"text": "This is about funny frog.",
"url": "/link/to/external/page"
}
]
}
If no information was found:
{
"type": "miss"
}
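A consumer of these answers might normalize the three response types into one list before displaying them. The following is a minimal sketch under that assumption; the function related_items and the item keys kind and content are hypothetical and not taken from the html5-client.

```python
import json
from typing import Dict, List


def related_items(raw_response: str) -> List[Dict[str, str]]:
    """Turn a standard-format answer into a flat list of items."""
    data = json.loads(raw_response)
    kind = data.get("type")
    if kind == "image":
        # Each entry carries the image path and a link to more information.
        return [{"kind": "image", "content": p["path"], "url": p["url"]}
                for p in data["paths"]]
    if kind == "text":
        return [{"kind": "text", "content": t["text"], "url": t["url"]}
                for t in data["texts"]]
    if kind == "miss":
        # Nothing was found for the queried term.
        return []
    raise ValueError(f"unknown response type: {kind!r}")
```

Raising on an unknown type is a design choice of this sketch; a client could equally well treat it like a miss.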
An example implementation using Flask can be found in the slide-index repository.
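To show how little a new source needs in order to follow the standard format, here is a dependency-free sketch written as a plain WSGI application. It is not the Flask implementation from the slide-index repository; the in-memory TEXTS lookup, the function names, and the form-encoded request body are assumptions made for illustration.

```python
import json
from urllib.parse import parse_qs

# Hypothetical in-memory "index"; a real service would query slides, Wikipedia, etc.
TEXTS = {
    "frog": [{"text": "This is about funny frog.", "url": "/link/to/external/page"}],
}


def search(term: str, context: str) -> dict:
    """Return a standard-format answer for a term (context is unused in this sketch)."""
    texts = TEXTS.get(term.strip().lower())
    if texts:
        return {"type": "text", "texts": texts}
    return {"type": "miss"}


def app(environ, start_response):
    """WSGI application that answers POST /search and rejects everything else."""
    if environ.get("REQUEST_METHOD") != "POST" or environ.get("PATH_INFO") != "/search":
        start_response("404 Not Found", [("Content-Type", "text/plain")])
        return [b"not found"]
    length = int(environ.get("CONTENT_LENGTH") or 0)
    # Assumption: term and context arrive form-encoded in the request body.
    params = parse_qs(environ["wsgi.input"].read(length).decode("utf-8"))
    answer = search(params.get("term", [""])[0], params.get("context", [""])[0])
    body = json.dumps(answer).encode("utf-8")
    start_response("200 OK", [("Content-Type", "application/json"),
                              ("Content-Length", str(len(body)))])
    return [body]
```

For a local trial, the application could be served with wsgiref.simple_server.make_server("127.0.0.1", 8080, app).serve_forever() from the standard library.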
Docker
There are docker containers available for the following components:
- modified kaldi-model-server
- slide-index
- elastic-search-middleware
- wikipedia-search-service
- relevant-terms-tagger
Always use the latest version available.
The specific command to run each container can be found in the respective repositories.
The html5-client is not dockerized since it adheres to the development guidelines of bigbluebutton.
The message-broker is a trivial nodejs script and can easily be run on a bigbluebutton server.
If you have no experience with docker, please consult the docker installation and docker usage documentation.
Nginx
This section gives an overview of the nginx setup and a general installation procedure. Please note that some steps may be redundant on your system and that additional steps (e.g. firewall exceptions) may be required, depending on your setup.
This setup also assumes that you have a registered (sub-)domain which points to your server(s), since this is required for SSL certificates.
Installation
sudo apt update
sudo apt install nginx
Make sure that the domain name is set up in the configuration, e.g. /etc/nginx/sites-available/default.
The configuration needs to contain a line like server_name example.com www.example.com;
Most likely the directive server_name is already present and only the domain names need to be added.
SSL certificates
Install certbot and request SSL certificates. This automatically installs the SSL components for nginx.
sudo add-apt-repository ppa:certbot/certbot
sudo apt-get update
sudo apt-get install python-certbot-nginx
sudo certbot --nginx -d example.com
More details can be found here: https://www.digitalocean.com/community/tutorials/how-to-set-up-let-s-encrypt-with-nginx-server-blocks-on-ubuntu-16-04
Nginx reverse proxy
Nginx needs to know where your service is running. This can be done with the following example configuration:
location / {
proxy_pass http://127.0.0.1:8080/;
proxy_http_version 1.1;
proxy_set_header Upgrade $http_upgrade;
proxy_set_header Connection "Upgrade";
proxy_set_header Host $host;
proxy_buffering off;
proxy_connect_timeout 600;
proxy_send_timeout 600;
proxy_read_timeout 600;
send_timeout 600;
add_header 'Access-Control-Allow-Origin' '*' always;
add_header 'Access-Control-Allow-Methods' 'GET, POST, OPTIONS' always;
}
More generally:
location <URL_PATH_PREFIX> {
proxy_pass <SERVICE_CONNECTION>;
proxy_http_version 1.1;
proxy_set_header Upgrade $http_upgrade;
proxy_set_header Connection "Upgrade";
proxy_set_header Host $host;
proxy_buffering off;
proxy_connect_timeout 600;
proxy_send_timeout 600;
proxy_read_timeout 600;
send_timeout 600;
add_header 'Access-Control-Allow-Origin' '*' always;
add_header 'Access-Control-Allow-Methods' 'GET, POST, OPTIONS' always;
}
URL_PATH_PREFIX: This defines the matching path for the service. / will forward all requests to the service.
By using e.g. /message_broker and several location blocks, multiple services can run on one server.
SERVICE_CONNECTION: This specifies where the service runs (e.g. http://127.0.0.1:8080/). Make sure that it has a trailing / so that the URL_PATH_PREFIX is not sent to the service.
The last two lines containing Access-Control-Allow are required for the service to be used from a different domain (e.g. the user participates in a conference at bbb.uhh.de but the service is hosted at index.uhh.de).
If your service runs on the same domain/server as the html5 client, these two lines are not required.
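When several services share one server, the location blocks can also be generated instead of copied by hand. The sketch below fills the general template for each service; render_location, the /slide_index prefix, and port 8081 are hypothetical and only illustrate the placeholders described above.

```python
def render_location(path_prefix: str, service_connection: str, cross_origin: bool = True) -> str:
    """Render one nginx location block following the general template."""
    # The text asks for a trailing slash so the path prefix is not sent to the service.
    if not service_connection.endswith("/"):
        service_connection += "/"
    lines = [
        f"location {path_prefix} {{",
        f"    proxy_pass {service_connection};",
        "    proxy_http_version 1.1;",
        "    proxy_set_header Upgrade $http_upgrade;",
        '    proxy_set_header Connection "Upgrade";',
        "    proxy_set_header Host $host;",
        "    proxy_buffering off;",
        "    proxy_connect_timeout 600;",
        "    proxy_send_timeout 600;",
        "    proxy_read_timeout 600;",
        "    send_timeout 600;",
    ]
    if cross_origin:
        # Only needed when the html5 client is served from a different domain.
        lines += [
            "    add_header 'Access-Control-Allow-Origin' '*' always;",
            "    add_header 'Access-Control-Allow-Methods' 'GET, POST, OPTIONS' always;",
        ]
    lines.append("}")
    return "\n".join(lines)


# Hypothetical mapping of path prefixes to locally running services.
services = {
    "/message_broker": "http://127.0.0.1:8080",
    "/slide_index": "http://127.0.0.1:8081/",
}
config = "\n\n".join(render_location(prefix, target) for prefix, target in services.items())
```

The resulting text in config would be placed inside the server block of the site configuration; passing cross_origin=False omits the two Access-Control-Allow lines for services that share a domain with the html5 client.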