diff --git a/en/best_practices.html b/en/best_practices.html
new file mode 100644
index 0000000000000000000000000000000000000000..31846876808f1cef9bf74877d43ca10a4dea35ee
--- /dev/null
+++ b/en/best_practices.html
@@ -0,0 +1,201 @@
+<!DOCTYPE html>
+<html xmlns="http://www.w3.org/1999/xhtml" lang="" xml:lang="">
+<head>
+  <meta charset="utf-8" />
+  <meta name="generator" content="pandoc" />
+  <meta name="viewport" content="width=device-width, initial-scale=1.0, user-scalable=yes" />
+  <title>Best Practices</title>
+  <style>
+    html {
+      line-height: 1.5;
+      font-family: Georgia, serif;
+      font-size: 20px;
+      color: #1a1a1a;
+      background-color: #fdfdfd;
+    }
+    body {
+      margin: 0 auto;
+      max-width: 36em;
+      padding-left: 50px;
+      padding-right: 50px;
+      padding-top: 50px;
+      padding-bottom: 50px;
+      hyphens: auto;
+      word-wrap: break-word;
+      text-rendering: optimizeLegibility;
+      font-kerning: normal;
+    }
+    @media (max-width: 600px) {
+      body {
+        font-size: 0.9em;
+        padding: 1em;
+      }
+    }
+    @media print {
+      body {
+        background-color: transparent;
+        color: black;
+        font-size: 12pt;
+      }
+      p, h2, h3 {
+        orphans: 3;
+        widows: 3;
+      }
+      h2, h3, h4 {
+        page-break-after: avoid;
+      }
+    }
+    p {
+      margin: 1em 0;
+    }
+    a {
+      color: #1a1a1a;
+    }
+    a:visited {
+      color: #1a1a1a;
+    }
+    img {
+      max-width: 100%;
+    }
+    h1, h2, h3, h4, h5, h6 {
+      margin-top: 1.4em;
+    }
+    h5, h6 {
+      font-size: 1em;
+      font-style: italic;
+    }
+    h6 {
+      font-weight: normal;
+    }
+    ol, ul {
+      padding-left: 1.7em;
+      margin-top: 1em;
+    }
+    li > ol, li > ul {
+      margin-top: 0;
+    }
+    blockquote {
+      margin: 1em 0 1em 1.7em;
+      padding-left: 1em;
+      border-left: 2px solid #e6e6e6;
+      color: #606060;
+    }
+    code {
+      font-family: Menlo, Monaco, 'Lucida Console', Consolas, monospace;
+      font-size: 85%;
+      margin: 0;
+    }
+    pre {
+      margin: 1em 0;
+      overflow: auto;
+    }
+    pre code {
+      padding: 0;
+      overflow: visible;
+    }
+    .sourceCode {
+     background-color: transparent;
+     overflow: visible;
+    }
+    hr {
+      background-color: #1a1a1a;
+      border: none;
+      height: 1px;
+      margin: 1em 0;
+    }
+    table {
+      margin: 1em 0;
+      border-collapse: collapse;
+      width: 100%;
+      overflow-x: auto;
+      display: block;
+      font-variant-numeric: lining-nums tabular-nums;
+    }
+    table caption {
+      margin-bottom: 0.75em;
+    }
+    tbody {
+      margin-top: 0.5em;
+      border-top: 1px solid #1a1a1a;
+      border-bottom: 1px solid #1a1a1a;
+    }
+    th {
+      border-top: 1px solid #1a1a1a;
+      padding: 0.25em 0.5em 0.25em 0.5em;
+    }
+    td {
+      padding: 0.125em 0.5em 0.25em 0.5em;
+    }
+    header {
+      margin-bottom: 4em;
+      text-align: center;
+    }
+    #TOC li {
+      list-style: none;
+    }
+    #TOC a:not(:hover) {
+      text-decoration: none;
+    }
+    code{white-space: pre-wrap;}
+    span.smallcaps{font-variant: small-caps;}
+    span.underline{text-decoration: underline;}
+    div.column{display: inline-block; vertical-align: top; width: 50%;}
+    div.hanging-indent{margin-left: 1.5em; text-indent: -1.5em;}
+    ul.task-list{list-style: none;}
+    .display.math{display: block; text-align: center; margin: 0.5rem auto;}
+  </style>
+  <!--[if lt IE 9]>
+    <script src="//cdnjs.cloudflare.com/ajax/libs/html5shiv/3.7.3/html5shiv-printshiv.min.js"></script>
+  <![endif]-->
+</head>
+<body>
+<header id="title-block-header">
+<h1 class="title">Best Practices</h1>
+</header>
+<p>(an excerpt from <em>Guidelines for Building Language Corpora Under German Law</em>, licensed under a CC-BY 4.0 International license)<a href="#fn1" class="footnote-ref" id="fnref1" role="doc-noteref"><sup>1</sup></a></p>
+<h1 id="recommendations-for-building-corpora">Recommendations for building corpora</h1>
+<ul>
+<li><strong>In case of doubt, you should try to obtain licenses and consent</strong>. Right holders are usually cooperative when the purposes are non-commercial and scientific and when no economic or other interests are violated, e.g. by an unrestricted distribution of copies.</li>
+<li>The attempt to get licenses should begin <strong>as early as possible in the planning phase</strong> of a project, since the negotiations may drag on over a long period of time and this is the only way to ensure that the necessary rights may be obtained before the project starts and therefore any <strong>license fees</strong> or other rewards may be included in the <strong>calculation of the project costs</strong>.</li>
+<li>Also as early as possible in the planning phase, <strong>a center should be approached</strong> that is experienced with licensing of the relevant type of resource. It may provide assistance or in some circumstances take care of obtaining the licenses, and at the same time ensure that the licensing terms are drafted so that the data and the results of the project may be included in its own archives/projects after the duration of the project and made available for the long term.</li>
+<li>Recommendations for the draft of license agreements can be found on the CLARIN-D Legal Information Platform.<a href="#fn2" class="footnote-ref" id="fnref2" role="doc-noteref"><sup>2</sup></a></li>
+<li>License agreements typically have a <strong>limited term</strong>, especially if they are associated with fees. Particularly in these cases, it is recommended to develop a strategy in cooperation with a center for making the content sustainably available. It also should be noted that <strong>unintentional interpretations of the licensor</strong> can prevent license renewals and additional licenses regardless of their legality.</li>
+<li>In cases where it is not possible to obtain sufficient rights to make a text corpus permanently available to the scientific community, but the reasons to build the corpus were nevertheless strong enough, the reasons should be documented and <strong>compromise strategies</strong> should be found for achieving at least a rudimentary form of sustainable availability. One possible model is, for example, to document comprehensibly for subsequent users how they may obtain the necessary rights themselves.</li>
+<li>Data protection issues should already be considered in the planning phase of a project. If it is intended to collect personal data on a larger scale, an explicit document on the subject should be created and maintained (data protection concept). It must record which data is collected and for which purposes. If necessary, appropriate consent declaration forms need to be developed and signed by the people affected by the data processing.</li>
+</ul>
+<hr />
+<hr />
+<h1 id="recommendations-for-making-written-corpora-available">Recommendations for making written corpora available</h1>
+<ul>
+<li>It is usually necessary and common practice to <strong>limit the number of users</strong> of corpora to people who have identified themselves and agreed to an End User License Agreement (see below) and, if necessary, additional data protection regulations. In practice this can be achieved e.g. by controlling data access via passwords which are allocated only on application and only in person, or via DFN-AAI authentication and web forms to request consent.</li>
+<li>As a general rule, rights and obligations which result from licensing agreements between right holders and the corpus provider need to be passed on to end users via <strong>end user license agreements</strong> and <strong>data privacy policies</strong> (for example if a corpus provider undertakes an obligation to the licensor to document access to the corpus).</li>
+<li>With regard to personal data, anonymization and pseudonymization should be considered when making corpora available.</li>
+</ul>
+<hr />
+<h1 id="recommendations-for-creating-and-making-own-works-available-derivative-works-and-databases">Recommendations for creating and making own works available: derivative works and databases</h1>
+<ul>
+<li>Works that are <strong>created by scientists themselves should always be released under license terms</strong>, so that subsequent users know whether they can use the work for their own purposes. At the same time, contents that are (or become) free of copyright and in which the scientist did not acquire any other rights should not be portrayed as protected by law, and should as far as possible be explicitly marked as unprotected, e.g. with the help of the "Public Domain Mark" (PDM).</li>
+<li>When selecting license terms, <strong>existing, widely-used standard licenses that are as liberal as possible should be used</strong> (e.g. one of the two Creative Commons licenses recognized in terms of the Open Definition<a href="#fn3" class="footnote-ref" id="fnref3" role="doc-noteref"><sup>3</sup></a>, namely Creative Commons license versions BY and BY SA, or for software, a GNU license or BSD or Apache licenses which refrain from copyleft). The result then comes closest to the Open Access approach. The increasing trend is to publish scientific works with no more limitations than the Creative Commons license type "CC BY - Attribution," while pure data should be licensed entirely free of restrictions by "CC0". Even scientific publishers are increasingly open to such licenses.</li>
+<li>Particular attention should be paid to <strong>indicating the license as accurately as possible and in a place that is easy to find</strong>.</li>
+<li><strong>Problems with derivative works</strong> may be avoided in some cases, for example when annotations are published as an independent work from which the original work cannot be reconstructed. If the license which is advised for a derivative work is roughly equivalent to that of the underlying work, the same license should be used to facilitate reusability. In any case, provisions of the license of the underlying work that sometimes allow only certain licenses for later processing (see e.g. the “Share-Alike” clauses in Creative Commons licenses<a href="#fn4" class="footnote-ref" id="fnref4" role="doc-noteref"><sup>4</sup></a>) should be noted.</li>
+</ul>
+<hr />
+<hr />
+<h1 id="recommendations-for-the-use-of-software-when-creating-derivative-works">Recommendations for the use of software when creating derivative works</h1>
+<ul>
+<li>If no license terms are known, one should attempt to determine if and which restrictions apply to the use of the software.</li>
+<li>Particularly with commercial annotation tools, it may be reasonable to clarify and set out in a supplementary agreement the extent to which the outputs of the software may be distributed, because software license provisions often prohibit this altogether. Generally, however, only reverse engineering is to be prevented.</li>
+<li>Before using or licensing software, it should be clarified to what extent the outputs of the software may still be used after the license term expires.</li>
+</ul>
+<hr />
+<section class="footnotes" role="doc-endnotes">
+<hr />
+<ol>
+<li id="fn1" role="doc-endnote"><p><a href="https://www.dfg.de/download/pdf/foerderung/antragstellung/forschungsdaten/guidelines_review_board_linguistics_corpora.pdf">https://www.dfg.de/download/pdf/foerderung/antragstellung/forschungsdaten/guidelines_review_board_linguistics_corpora.pdf</a><a href="#fnref1" class="footnote-back" role="doc-backlink">↩︎</a></p></li>
+<li id="fn2" role="doc-endnote"><p><a href="http://clarin-d.de/legalissues">http://clarin-d.de/legalissues</a><a href="#fnref2" class="footnote-back" role="doc-backlink">↩︎</a></p></li>
+<li id="fn3" role="doc-endnote"><p><a href="http://opendefinition.org/od/">http://opendefinition.org/od/</a><a href="#fnref3" class="footnote-back" role="doc-backlink">↩︎</a></p></li>
+<li id="fn4" role="doc-endnote"><p>See the variety of content which is combined under different Creative Commons licenses, <a href="https://wiki.creativecommons.org/FAQ#Can_I_combine_material_under_different_Creative_Commons_licenses_in_my_work.3F">https://wiki.creativecommons.org/FAQ#Can_I_combine_material_under_different_Creative_Commons_licenses_in_my_work.3F</a><a href="#fnref4" class="footnote-back" role="doc-backlink">↩︎</a></p></li>
+</ol>
+</section>
+</body>
+</html>
diff --git a/en/copyright.html b/en/copyright.html
new file mode 100644
index 0000000000000000000000000000000000000000..3a824e92298a4b86bff9e1b37d8d88059110d883
--- /dev/null
+++ b/en/copyright.html
@@ -0,0 +1,238 @@
+<!DOCTYPE html>
+<html xmlns="http://www.w3.org/1999/xhtml" lang="" xml:lang="">
+<head>
+  <meta charset="utf-8" />
+  <meta name="generator" content="pandoc" />
+  <meta name="viewport" content="width=device-width, initial-scale=1.0, user-scalable=yes" />
+  <title>Copyright</title>
+  <style>
+    html {
+      line-height: 1.5;
+      font-family: Georgia, serif;
+      font-size: 20px;
+      color: #1a1a1a;
+      background-color: #fdfdfd;
+    }
+    body {
+      margin: 0 auto;
+      max-width: 36em;
+      padding-left: 50px;
+      padding-right: 50px;
+      padding-top: 50px;
+      padding-bottom: 50px;
+      hyphens: auto;
+      word-wrap: break-word;
+      text-rendering: optimizeLegibility;
+      font-kerning: normal;
+    }
+    @media (max-width: 600px) {
+      body {
+        font-size: 0.9em;
+        padding: 1em;
+      }
+    }
+    @media print {
+      body {
+        background-color: transparent;
+        color: black;
+        font-size: 12pt;
+      }
+      p, h2, h3 {
+        orphans: 3;
+        widows: 3;
+      }
+      h2, h3, h4 {
+        page-break-after: avoid;
+      }
+    }
+    p {
+      margin: 1em 0;
+    }
+    a {
+      color: #1a1a1a;
+    }
+    a:visited {
+      color: #1a1a1a;
+    }
+    img {
+      max-width: 100%;
+    }
+    h1, h2, h3, h4, h5, h6 {
+      margin-top: 1.4em;
+    }
+    h5, h6 {
+      font-size: 1em;
+      font-style: italic;
+    }
+    h6 {
+      font-weight: normal;
+    }
+    ol, ul {
+      padding-left: 1.7em;
+      margin-top: 1em;
+    }
+    li > ol, li > ul {
+      margin-top: 0;
+    }
+    blockquote {
+      margin: 1em 0 1em 1.7em;
+      padding-left: 1em;
+      border-left: 2px solid #e6e6e6;
+      color: #606060;
+    }
+    code {
+      font-family: Menlo, Monaco, 'Lucida Console', Consolas, monospace;
+      font-size: 85%;
+      margin: 0;
+    }
+    pre {
+      margin: 1em 0;
+      overflow: auto;
+    }
+    pre code {
+      padding: 0;
+      overflow: visible;
+    }
+    .sourceCode {
+     background-color: transparent;
+     overflow: visible;
+    }
+    hr {
+      background-color: #1a1a1a;
+      border: none;
+      height: 1px;
+      margin: 1em 0;
+    }
+    table {
+      margin: 1em 0;
+      border-collapse: collapse;
+      width: 100%;
+      overflow-x: auto;
+      display: block;
+      font-variant-numeric: lining-nums tabular-nums;
+    }
+    table caption {
+      margin-bottom: 0.75em;
+    }
+    tbody {
+      margin-top: 0.5em;
+      border-top: 1px solid #1a1a1a;
+      border-bottom: 1px solid #1a1a1a;
+    }
+    th {
+      border-top: 1px solid #1a1a1a;
+      padding: 0.25em 0.5em 0.25em 0.5em;
+    }
+    td {
+      padding: 0.125em 0.5em 0.25em 0.5em;
+    }
+    header {
+      margin-bottom: 4em;
+      text-align: center;
+    }
+    #TOC li {
+      list-style: none;
+    }
+    #TOC a:not(:hover) {
+      text-decoration: none;
+    }
+    code{white-space: pre-wrap;}
+    span.smallcaps{font-variant: small-caps;}
+    span.underline{text-decoration: underline;}
+    div.column{display: inline-block; vertical-align: top; width: 50%;}
+    div.hanging-indent{margin-left: 1.5em; text-indent: -1.5em;}
+    ul.task-list{list-style: none;}
+    .display.math{display: block; text-align: center; margin: 0.5rem auto;}
+  </style>
+  <!--[if lt IE 9]>
+    <script src="//cdnjs.cloudflare.com/ajax/libs/html5shiv/3.7.3/html5shiv-printshiv.min.js"></script>
+  <![endif]-->
+</head>
+<body>
+<header id="title-block-header">
+<h1 class="title">Copyright</h1>
+</header>
+<p>(an excerpt from <em>Guidelines for Building Language Corpora Under German Law</em>, licensed under a CC-BY 4.0 International license)<a href="#fn1" class="footnote-ref" id="fnref1" role="doc-noteref"><sup>1</sup></a></p>
+<h1 id="copyright-and-related-rights">Copyright and related rights</h1>
+<p>In general, texts are protected by copyright in Germany<a href="#fn2" class="footnote-ref" id="fnref2" role="doc-noteref"><sup>2</sup></a> if they satisfy an originality standard and it has not been more than 70 years since the death of their authors. How the originality standard is defined, and therefore how it is met, is a controversial question and its answer may differ from case to case and from court decision to court decision. The requirements for meeting the originality standard for copyright protection have been set lower and lower by courts over the past decades. Texts such as simple statements of the news or plain business correspondence may still not be protected by copyright because they do not meet the originality standard. But there is the concept of “kleine Münze”, a sort of everyday creativity of people in general which is fully protected by copyright.</p>
+<p>There are also certain related rights that are especially relevant for texts:</p>
+<p>Since 2013, there has been a related right for press publishers in Germany which sidesteps the originality standard and grants protection to even the shortest paragraphs for a term of one year. This protection follows from mere publication, and is limited to the right of making publicly available. It is, therefore, only invoked when the press content is placed online.</p>
+<p>There are two related rights that have a wider protected domain but a smaller scope of application: one concerning scientific editions of works that are not protected by copyright, and one concerning posthumous works, i.e. works that are published after the death of their authors and, as the case may be, after the copyright term (70 years, see above). These rights protect all uses of these works (not only online use) for 25 years.</p>
+<p>Finally, there is the related right for the creators of databases. Its term is 15 years, and it relates not to the contents of a database but to the manner in which the database is structured. This related right does not apply to unstructured data and requires substantial investments of time and/or money. Right holders (i.e. those who have created a database through substantial investment) are protected from “substantial parts” of the database being reproduced or used further.</p>
+<p>Related rights are distinguished from copyright especially in two key ways. First, related rights have a shorter term of protection. Second, related rights can protect the works created by a legal person, e.g. a company. Under copyright, companies may at most have exclusive rights in copyright-protected works, while authors may only be natural persons.</p>
+<p>The rules of copyright law that are most relevant for written corpora affect the right of reproduction (§ 16), the right of distribution (§ 17), the right of making works available to the public (§ 19a), the related rights on scientific editions (§ 70) and posthumous works (§ 71), the related right of makers of a database (§ 87b) and the related right of press publishers (§ 87f). It is still a legal gray area whether Text and Data Mining (TDM)<a href="#fn3" class="footnote-ref" id="fnref3" role="doc-noteref"><sup>3</sup></a> and thus quantitative linguistic analysis are types of use with copyright implications which are not yet mentioned in § 15 UrhG but protected nonetheless. (More specifically, whether the act of performing analysis on the data falls within the scope of § 15 UrhG; the resulting digital copy undoubtedly falls under § 16 UrhG.) Court decisions clarifying these issues can perhaps be expected in the foreseeable future. Because there are clear parallels between TDM and a human reading a text, which is not a type of use relevant for copyright, it is easily conceivable that courts may rule that TDM is permitted by law even without permission of the right holder, similar to reading.</p>
+<hr />
+<hr />
+<h1 id="copyright-exceptions-and-their-application-to-written-corpora">Copyright exceptions and their application to written corpora</h1>
+<p>Laws that balance the interests of authors and users are so-called copyright exceptions. These determine which types of uses are allowed without the consent of the right holders, and under which circumstances. The use of copyright protected material as research data is only broadly provided for. The so-called research exception (§ 52a UrhG) for example allows making available “small scale” works as well as “individual articles from newspapers or periodicals,” and only if and insofar as this is “necessary” for the respective research purpose and is “justified” for the “pursuit of non-commercial aims”. The copies may be made available “exclusively for a specifically limited circle of persons” which may include a small research team whose members -- according to the legal commentators Dreier/Schulze (2013) -- may be from different research institutions, or a seminar, but not the whole scientific community. The limited circle must be limited to people who access the materials for their own scientific purposes<a href="#fn4" class="footnote-ref" id="fnref4" role="doc-noteref"><sup>4</sup></a> and the measures taken must be effective considering the state of the art at the time.</p>
+<p>The right of temporary acts of reproduction (§ 44a UrhG) allows a temporary caching of electronic data, although this right is often insufficient to legally cover the empirical methods and replicable results required by scientific research. The same can be said for the right of reproductions for private use, which are permitted by § 53 I UrhG, but allows a transfer only in private, i.e. not in a work-related scientific field, and § 53 II UrhG, which allows a reproduction only for one's own personal scientific use (the possibilities of transfer are regulated in § 52a). The right of digital reproductions of complete books or magazines is further limited in § 53 IV UrhG. Concerning all the exceptions, one must keep in mind that they are subordinate to contrary license agreements. Additionally, § 52a IV UrhG states that an equitable remuneration shall be paid (guided by rates set out in the VG WORT case).<a href="#fn5" class="footnote-ref" id="fnref5" role="doc-noteref"><sup>5</sup></a></p>
+<p>Attention should also be paid to the fact that exceptions to copyright protection do not apply to related rights in the same way. They have their own respective protection exceptions that are named in the respective part of the UrhG.</p>
+<p>In conclusion, legal exceptions are typically not a sufficient basis for making written corpora available permanently. Making available a copy of the written corpus that has been the research object is not covered by any of the above-mentioned research exceptions, which may massively complicate the repeatability and thus the verification of the respective research projects. Often enough, even building up a corpus of texts for which no express permission was given is unlawful, because the digital copies produced in the process are not necessarily covered by copyright exceptions.</p>
+<p>For building a corpus in conformity with the law, the consent of the right holders must be obtained, or it must be ensured that only texts are used:</p>
+<ul>
+<li>that are not protected by copyright, such as the text of laws, certain government documents, etc.</li>
+<li>where the term of copyright protection has expired, or</li>
+<li>where the texts do not meet the originality standard.<a href="#fn6" class="footnote-ref" id="fnref6" role="doc-noteref"><sup>6</sup></a></li>
+</ul>
+<p>A thorough checking/clarification of rights is therefore necessary. The costs for this may possibly be reduced by cooperating with other centers that seek to use this data and therefore check its legal status.</p>
+<p>The situation tends to be considerably easier, if the intended use is covered by standard licenses which grant the necessary rights for the use in a corpus, to everyone, in advance. These are called “Public Licenses”. In best-case scenarios, the author has already published his / her texts under a sufficiently liberal standard license. But often this is not the case. This means that individual license agreements with the respective right holders must be made, which requires time and other resources. In the case of texts published by presses / publishing houses, these may typically be contacted directly, because the publisher often obtains the right to license electronic uses in their contracts with authors. The same often applies to texts which are published on web portals, because operators are often granted the respective rights through “Terms and Conditions” agreements.</p>
+<hr />
+<hr />
+<h1 id="copyright-aspects-for-spoken-coprpora">Copyright aspects for spoken corpora</h1>
+<p>For spoken corpora, both copyright and related rights may be issues, especially when it comes to:</p>
+<ul>
+<li>audio and video recordings from radio and TV broadcasts, where authors, producers, broadcasting companies, and others own certain rights</li>
+<li>audio and video recordings from the Internet (streaming platforms and other sources) where the operators of the platform may own rights</li>
+<li>written material that belongs to a spoken corpus as supplementary material (e.g. PowerPoint slides for a speech, coursebooks for the class, etc.) and</li>
+<li>pictures, graphics etc.</li>
+</ul>
+<p>As soon as these materials are used in the course of research, the consent of relevant rightholders is necessary to perform the research legally. A general research and education law regarding copyright and other rights has not yet been implemented in Europe, although such a regulation has been, and is, continuously discussed. Currently, only the quotation exception (§ 51 of the German Act on Copyright and Related Rights (UrhG)) and some special regulations for building personal scientific archives allow very limited use of someone else’s work at all.</p>
+<p>The consent of the rightholders is usually given through an appropriate license agreement (or contract). In practice, it is a considerable problem when rightholders are not known or cannot be found. This is important because every right holder must give his / her consent before any use of the work that is otherwise reserved to right holders is allowed (with the exception of films, and if there are no other special agreements). If more than one person created the work, the consent of each co-rightholder must be obtained.</p>
+<p>This also refers to transcripts of primary data protected by copyright law (e.g. spoken and song recordings), even if the transcript is technically the work of the scientist in the sense of copyright. In such cases this transcript is considered a simple copy of the work which is included in the primary data or a derivative work (e.g. translation). Both of these types of use are assigned to the original rightholder (except in above-mentioned copyright exceptions, e.g. the quotation exception).</p>
+<p>Extra precaution is appropriate if the copyright-protectable material has not yet been, but will be published within a scientific work in a manner which cannot be avoided due to the best scientific practice in disclosing sources. This affects the authors’ right of personality because it is their choice whether their works are disclosed to the public or not.</p>
+<h1 id="adaptations-derivative-works-and-transformations">Adaptations (derivative works) and transformations</h1>
+<p>Adaptations in the meaning of the law are contents that are based on a previous work and meet the originality standards to qualify for protection (the law of copyright in adaptations), even if the previous work is no longer protected by copyright. If the previous work is still protected by copyright, adaptations may only be published with the consent of the author of the previous work. Transformations are, according to prevailing legal opinion, modified versions of previous works that do not meet the requirements of protection for copyright in adaptations. They may also only be published with the consent of the author of the previous work.</p>
+<p>The threshold to adaptation or transformation is reached if an average observer’s impression of a work is changed noticeably. Concerning pictures, this is for example the case if they are cropped or their sizes changed extremely. For films, this is the case e.g. if they are musically rendered. Texts are changed noticeably if they are shortened, amended, mixed with other texts or translated. A new layout or a transmission of a text from analogue to digital form is not an adaptation or transformation -- although usually a reproduction -- meaning generally when a text is removed from its original medium/context and remains recognizable as a discrete work. (In exceptional cases the change of the context of the work may result in an adaptation. For text corpora for research purposes, however, this is hard to imagine.)</p>
+<p>When the original work is no longer recognizable by an average observer, no adaptation exists, but rather a new, independent work. Here, courts have said that the personal characteristics of the pre-existing work “fade away” from the new content.<a href="#fn7" class="footnote-ref" id="fnref7" role="doc-noteref"><sup>7</sup></a> The difference between an adaptation (§ 23 UrhG) and an independent work created in free use (§ 24 UrhG) is, however, fluid.<a href="#fn8" class="footnote-ref" id="fnref8" role="doc-noteref"><sup>8</sup></a> If, on the surface, the new content has nothing in common with the previous material, free use of the previous material is unproblematic (as far as the law of adaptations is concerned). Often, this results from the method which is used within Text and Data Mining. If a text, for example, is statistically analyzed or annotated, it can usually not be reconstructed from the emerging statistics or annotation. Thus neither kind of research result is an adaptation of the source text within the meaning of the law.</p>
+<p>For source texts that are still protected, this does not solve the problem of contractual terms that prohibit the temporary copies / caching that are technically necessary for the development of research results and that allow making the texts permanently available only with consent (see above). Apart from that, TDM may also be contractually prohibited because civil law largely allows contracting parties to agree on what they wish (the law of “private autonomy”). If an editor for example forbids TDM or the publication of TDM results, based on a text within a license agreement with a scientific institution that regulates the access to the material, this must be respected, even if the research results are independent and not adaptations or transformations and TDM should a priori not be regarded as a copyright protected type of use.<a href="#fn9" class="footnote-ref" id="fnref9" role="doc-noteref"><sup>9</sup></a> In this case, the basis for enforcement of the prohibition is not the copyright law, but the contract which was entered into between the two parties. Such a contract, however, affects only the relevant parties.</p>
+<p>It is possible to incorporate certain conditions for the use of the material in the agreement instead of a strict prohibition. This can be executed even by standard licenses, which are contracts. It is therefore conceivable that research or TDM results are made subject to copyleft terms.<a href="#fn10" class="footnote-ref" id="fnref10" role="doc-noteref"><sup>10</sup></a> Disregarding software licenses, however, it is not at all common that the conditions of standard licenses impose conditions independently of any existing legal position based on an absolute right (such as copyright or database protection). The six Creative-Commons licenses even explicitly state that they do not restrict anything that the licensee is allowed to do without the license anyway.<a href="#fn11" class="footnote-ref" id="fnref11" role="doc-noteref"><sup>11</sup></a> Their copyleft terms thus only apply under the pre-condition that there is a legal protection in the first place that requires permission of a rightsholder.<a href="#fn12" class="footnote-ref" id="fnref12" role="doc-noteref"><sup>12</sup></a> Thus copyleft and other limitations of CC licenses would only be effective if TDM is regarded as a type of use within the meaning of the copyright law.</p>
+<p>Since this question is not yet resolved everywhere in the world, the new CC license version 4.0 clarifies explicitly that the results of TDM should not be considered as an adaptation by the licensor. Thus neither the copyleft conditions of CC licenses<a href="#fn13" class="footnote-ref" id="fnref13" role="doc-noteref"><sup>13</sup></a> nor the other conditions "attribution," "no commercial use" and "no edits allowed" need to be taken into account, as far as TDM and its independent results are concerned.</p>
+<p>If research results are still somehow considered adaptations or transformations within the meaning of the law, i.e. outside of TDM and without other licenses influencing the character of adaptation, the same recommendations apply for further use of these research results as for the use of independent works.</p>
+<hr />
+<hr />
+<h1 id="collections-and-database-works">Collections and database works</h1>
+<p>According to § 4 UrhG, collections of works and databases are protected where the selection or arrangement of the elements constitutes the author's own intellectual creation, regardless of whether the individual elements are protected or not. This may be relevant if collections of texts in the public domain are included in a corpus. This protection of “database works” should not be confused with that of mere databases, whose makers are additionally protected by §§ 87a - 87e UrhG (see above). The related right of the maker of a database only requires substantial investment; in contrast, a “database work” requires such an extraordinary arrangement of the content that the arrangement itself can be regarded as a creation (similar to authorship). Thus, the threshold for the (high) level of protection of a “database work” is much higher than that for a database protected in accordance with §§ 87a et seq. UrhG. The latter right of the maker of a database may create restrictions of use if parts of a database are included in a corpus or such a corpus is made available.<a href="#fn14" class="footnote-ref" id="fnref14" role="doc-noteref"><sup>14</sup></a></p>
+<hr />
+<hr />
+<h1 id="orphan-works">Orphan works</h1>
+<p>Since § 61 UrhG was inserted into the Copyright Act in 2014, certain types of use are permitted by law for text works from the collections of publicly accessible libraries, educational institutions, museums and archives, if the works have already been published, the respective right holders cannot be found or identified even by a diligent search (defined in § 61a UrhG), and the result of this search has been recorded in a central register. The permitted types of use are making available to the public (§ 19a UrhG) and reproduction (§ 16 I UrhG). Since the right to create derivative works is not included, it may not be possible to rely on § 61 UrhG when using such works in corpora.<a href="#fn15" class="footnote-ref" id="fnref15" role="doc-noteref"><sup>15</sup></a> To take the path of least legal risk, orphan works should only be included in corpora in a way that involves no adaptation or transformation (see above).</p>
+<p>There is still the unavoidable problem that the status of an orphan work may subsequently expire if the right holders appear and/or become known. From this point in time, the usual rules for the use of works apply again.</p>
+<hr />
+<hr />
+<h1 id="software">Software</h1>
+<p>The terms of use of commercial software are usually clearly laid out, so that it can be determined under which terms the software may be used and what implications arise when it is used to create independent and derivative works. Depending on the approach, the output of the software, i.e. the research result or document, remains independent in its legal status from that of the software.</p>
+<p>The legal status is sometimes vaguer for software tools that were developed in an academic context, as they are often based on data (dictionaries or training corpora) which might be affected by third-party rights.</p>
+<p>For software developed in-house, it should be noted that the decision whether and under which license the software will be released is reserved for the employer for whom the software was created (§ 69b UrhG).</p>
+<hr />
+<section class="footnotes" role="doc-endnotes">
+<hr />
+<ol>
+<li id="fn1" role="doc-endnote"><p><a href="https://www.dfg.de/download/pdf/foerderung/antragstellung/forschungsdaten/guidelines_review_board_linguistics_corpora.pdf">https://www.dfg.de/download/pdf/foerderung/antragstellung/forschungsdaten/guidelines_review_board_linguistics_corpora.pdf</a><a href="#fnref1" class="footnote-back" role="doc-backlink">↩︎</a></p></li>
+<li id="fn2" role="doc-endnote"><p>We only give information about the legislation in the Federal Republic of Germany. How works of German authors are protected in other countries and how foreign authors are protected in Germany is regulated in some international conventions. The most important ones are the Berne Convention for the Protection of Literary and Artistic Works (usually known as the Berne Convention) and the Agreement on Trade-Related Aspects of Intellectual Property Rights (TRIPS). In Art. 5.1. the Berne Convention says that every state party must acknowledge the protection of works of citizens of other state parties as it acknowledges the protection of works of its own citizens. There are 168 countries that are parties to the Berne Convention (i.a. the EU, the USA, China, Japan, Russia and India).<a href="#fnref2" class="footnote-back" role="doc-backlink">↩︎</a></p></li>
+<li id="fn3" role="doc-endnote"><p>We adopt the term Text and Data Mining because it is now frequently used in discussions by the international legal community. At the moment, there is no coherent system of definitions of the different terms which are used for scientific analysis of data, but many slightly different and partly overlapping nomenclatures. It can be argued that the meaning of TDM in any case includes quantitative linguistic analysis.<a href="#fnref3" class="footnote-back" role="doc-backlink">↩︎</a></p></li>
+<li id="fn4" role="doc-endnote"><p>BT-Drucksache 15/38, S. 2<a href="#fnref4" class="footnote-back" role="doc-backlink">↩︎</a></p></li>
+<li id="fn5" role="doc-endnote"><p>In its decision of March 24, 2011 (file reference 6 WG 12/09) concerning the case VG Wort - Federal States, the higher regional court (OLG) of Munich considered a remuneration of 10 euros + VAT per work as equitable for scientific research within the scope of § 52a I Nr. 2 UrhG.<a href="#fnref5" class="footnote-back" role="doc-backlink">↩︎</a></p></li>
+<li id="fn6" role="doc-endnote"><p>Whether the originality standard applies to a succession of randomly assorted sentences is unclear. § 39 UrhG “Alterations of the work,” which belongs to the moral rights, is one argument that this method is not legally sound.<a href="#fnref6" class="footnote-back" role="doc-backlink">↩︎</a></p></li>
+<li id="fn7" role="doc-endnote"><p>Federal Supreme Court of Germany in “Mecki-Igel I”, GRUR 1958, 500, 502.<a href="#fnref7" class="footnote-back" role="doc-backlink">↩︎</a></p></li>
+<li id="fn8" role="doc-endnote"><p>See Dreier/Schulze, Urheberrechtsgesetz Kommentar, 4th ed., § 24 Marginal No. 1 and § 23 Marginal No. 4. <em>[2016 note: a newer 2nd edition was published in 2015]</em><a href="#fnref8" class="footnote-back" role="doc-backlink">↩︎</a></p></li>
+<li id="fn9" role="doc-endnote"><p>Whether TDM can be regarded as a type of use is currently being discussed by jurists and will certainly keep the courts busy. There are many reasons why TDM should be regarded as a kind of reading, which, as so-called Werkgenuss, is permitted without consent. See above, 2.1, at the end.<a href="#fnref9" class="footnote-back" role="doc-backlink">↩︎</a></p></li>
+<li id="fn10" role="doc-endnote"><p>Meaning that the licensee must offer the licensed content to the public under identical or similar conditions.<a href="#fnref10" class="footnote-back" role="doc-backlink">↩︎</a></p></li>
+<li id="fn11" role="doc-endnote"><p>See e.g. section 8.d. in the license CC-BY Version 4.0.<a href="#fnref11" class="footnote-back" role="doc-backlink">↩︎</a></p></li>
+<li id="fn12" role="doc-endnote"><p>The database licenses of Open Data Commons are an exceptional case, because they stipulate their copyleft conditions even for those regions of the world where no database protection law exists, e.g. the United States.<a href="#fnref12" class="footnote-back" role="doc-backlink">↩︎</a></p></li>
+<li id="fn13" role="doc-endnote"><p>The name for the copyleft mechanism of CC licenses is "share alike", abbreviated as "SA".<a href="#fnref13" class="footnote-back" role="doc-backlink">↩︎</a></p></li>
+<li id="fn14" role="doc-endnote"><p>See the decisions of the European Court of Justice (October 9, 2008, Case C-304/07) and of the Federal Supreme Court (August 13, 2009, file reference I ZR 130/04).<a href="#fnref14" class="footnote-back" role="doc-backlink">↩︎</a></p></li>
+<li id="fn15" role="doc-endnote"><p>At the copy deadline of the present document, this was still an open question. Any news on this point will be published on the CLARIN-D Legal Information Platform. <em>[2016 note: in general, it seems that adapted versions are not covered by § 61 UrhG].</em><a href="#fnref15" class="footnote-back" role="doc-backlink">↩︎</a></p></li>
+</ol>
+</section>
+</body>
+</html>
diff --git a/en/data_protection.html b/en/data_protection.html
new file mode 100644
index 0000000000000000000000000000000000000000..de6b16ae6f4f8da16f427967fa4ae6de2fac0113
--- /dev/null
+++ b/en/data_protection.html
@@ -0,0 +1,225 @@
+<!DOCTYPE html>
+<html xmlns="http://www.w3.org/1999/xhtml" lang="" xml:lang="">
+<head>
+  <meta charset="utf-8" />
+  <meta name="generator" content="pandoc" />
+  <meta name="viewport" content="width=device-width, initial-scale=1.0, user-scalable=yes" />
+  <title>Data Protection</title>
+  <style>
+    html {
+      line-height: 1.5;
+      font-family: Georgia, serif;
+      font-size: 20px;
+      color: #1a1a1a;
+      background-color: #fdfdfd;
+    }
+    body {
+      margin: 0 auto;
+      max-width: 36em;
+      padding-left: 50px;
+      padding-right: 50px;
+      padding-top: 50px;
+      padding-bottom: 50px;
+      hyphens: auto;
+      word-wrap: break-word;
+      text-rendering: optimizeLegibility;
+      font-kerning: normal;
+    }
+    @media (max-width: 600px) {
+      body {
+        font-size: 0.9em;
+        padding: 1em;
+      }
+    }
+    @media print {
+      body {
+        background-color: transparent;
+        color: black;
+        font-size: 12pt;
+      }
+      p, h2, h3 {
+        orphans: 3;
+        widows: 3;
+      }
+      h2, h3, h4 {
+        page-break-after: avoid;
+      }
+    }
+    p {
+      margin: 1em 0;
+    }
+    a {
+      color: #1a1a1a;
+    }
+    a:visited {
+      color: #1a1a1a;
+    }
+    img {
+      max-width: 100%;
+    }
+    h1, h2, h3, h4, h5, h6 {
+      margin-top: 1.4em;
+    }
+    h5, h6 {
+      font-size: 1em;
+      font-style: italic;
+    }
+    h6 {
+      font-weight: normal;
+    }
+    ol, ul {
+      padding-left: 1.7em;
+      margin-top: 1em;
+    }
+    li > ol, li > ul {
+      margin-top: 0;
+    }
+    blockquote {
+      margin: 1em 0 1em 1.7em;
+      padding-left: 1em;
+      border-left: 2px solid #e6e6e6;
+      color: #606060;
+    }
+    code {
+      font-family: Menlo, Monaco, 'Lucida Console', Consolas, monospace;
+      font-size: 85%;
+      margin: 0;
+    }
+    pre {
+      margin: 1em 0;
+      overflow: auto;
+    }
+    pre code {
+      padding: 0;
+      overflow: visible;
+    }
+    .sourceCode {
+     background-color: transparent;
+     overflow: visible;
+    }
+    hr {
+      background-color: #1a1a1a;
+      border: none;
+      height: 1px;
+      margin: 1em 0;
+    }
+    table {
+      margin: 1em 0;
+      border-collapse: collapse;
+      width: 100%;
+      overflow-x: auto;
+      display: block;
+      font-variant-numeric: lining-nums tabular-nums;
+    }
+    table caption {
+      margin-bottom: 0.75em;
+    }
+    tbody {
+      margin-top: 0.5em;
+      border-top: 1px solid #1a1a1a;
+      border-bottom: 1px solid #1a1a1a;
+    }
+    th {
+      border-top: 1px solid #1a1a1a;
+      padding: 0.25em 0.5em 0.25em 0.5em;
+    }
+    td {
+      padding: 0.125em 0.5em 0.25em 0.5em;
+    }
+    header {
+      margin-bottom: 4em;
+      text-align: center;
+    }
+    #TOC li {
+      list-style: none;
+    }
+    #TOC a:not(:hover) {
+      text-decoration: none;
+    }
+    code{white-space: pre-wrap;}
+    span.smallcaps{font-variant: small-caps;}
+    span.underline{text-decoration: underline;}
+    div.column{display: inline-block; vertical-align: top; width: 50%;}
+    div.hanging-indent{margin-left: 1.5em; text-indent: -1.5em;}
+    ul.task-list{list-style: none;}
+    .display.math{display: block; text-align: center; margin: 0.5rem auto;}
+  </style>
+  <!--[if lt IE 9]>
+    <script src="//cdnjs.cloudflare.com/ajax/libs/html5shiv/3.7.3/html5shiv-printshiv.min.js"></script>
+  <![endif]-->
+</head>
+<body>
+<header id="title-block-header">
+<h1 class="title">Data Protection</h1>
+</header>
+<h1 id="written-consent">Written consent</h1>
+<p>An example written consent form, used for the learner corpus of Latvian "LaVA", consists of the following (taken from [Kaija, Auziņa 2019]):<a href="#fn1" class="footnote-ref" id="fnref1" role="doc-noteref"><sup>1</sup></a></p>
+<ul>
+<li><p><strong>Information letter</strong>:</p>
+<blockquote>
+<ul>
+<li>basic information about the project, the institutions that are carrying it out, and contact information;</li>
+<li>brief instructions for the participant;</li>
+<li>information about privacy and the security of the data on the server used for the corpus;</li>
+<li>explanation of how to express one's will regarding participation in the project (i.e. what to do if the author decides they no longer want their texts to be used in the corpus)</li>
+</ul>
+</blockquote></li>
+<li><p><strong>Permission</strong> with the following statements:</p>
+<blockquote>
+<ul>
+<li>The author agrees that the corpus is available for free and is made for scientific and teaching purposes. The authors do not receive any financial reward for having their texts included in the corpus.</li>
+<li>The author confirms that none of the data in this text can lead to identification of any existing people.</li>
+<li>The author agrees that the text is anonymous and their name is not mentioned anywhere on the corpus website or its public documentation. Each author receives an anonymous code which makes it possible to recognize several texts written by the same author but does not reveal the identity of the author.</li>
+<li>The data included in the corpus can be cited in the educational materials, research papers, and other work in various forms.</li>
+<li>The corpus and all materials included in it can be publicly accessible for an unlimited period and can be viewed and researched an unlimited amount of times.</li>
+<li>All texts included in the corpus can have linguistic information added to them (e.g. error corrections, part-of-speech annotation, etc.).</li>
+<li>The author will have the right to withdraw their consent at any time. The withdrawal of consent shall not affect the lawfulness of processing based on consent before its withdrawal. The author is aware of this opportunity as a data provider.</li>
+</ul>
+</blockquote></li>
+<li><p><strong>Metadata collection questionnaire</strong> asking about:</p>
+<blockquote>
+<ul>
+<li>age;</li>
+<li>gender;</li>
+<li>other corpus-specific metadata</li>
+</ul>
+</blockquote></li>
+</ul>
+<hr />
+<hr />
+<h1 id="which-data-to-anonymize-or-pseudonymize">Which data to anonymize or pseudonymize</h1>
+<p>Article 4 of the General Data Protection Regulation (GDPR) defines <strong>personal data</strong> as any information relating to an identified or identifiable natural person (‘data subject’); an identifiable natural person is one who can be identified, directly or indirectly, in particular by reference to an identifier such as a name, an identification number, location data, an online identifier or to one or more factors specific to the physical, physiological, genetic, mental, economic, cultural or social identity of that natural person.<a href="#fn2" class="footnote-ref" id="fnref2" role="doc-noteref"><sup>2</sup></a></p>
+<p>Note that a person who has signed a consent form might share personal information about third parties during the session; this data has to be de-identified as well.</p>
+<p>In practice, potentially sensitive data include:</p>
+<ul>
+<li>names (personal names, nicknames, organization names)</li>
+<li>locations (addresses, city names, district names, etc.)</li>
+<li>age</li>
+<li>date expressions</li>
+<li>numbers (such as house number, phone number, Social Security number, etc.)</li>
+<li>email addresses</li>
+<li>URIs</li>
+<li>implicit references (e.g. someone's job)</li>
+</ul>
+<p>Two common approaches to de-identifying the data are anonymization and pseudonymization. Some researchers suggest keeping the raw data layer intact and disclosing to the public only a separate layer which has undergone de-identification.</p>
+<hr />
+<hr />
+<h1 id="anonymization">Anonymization</h1>
+<p>Anonymization is a process of <strong>replacing sensitive data with random strings or standardized category names</strong> (also known as categorization), e.g., "Michael" is replaced with "PERSON_NAME", "Berlin" with "LOCATION_NAME", "<a href="mailto:mail@example.com">mail@example.com</a>" with "EMAIL", etc.</p>
+<p>Since voice and appearance might be interpreted as personal data, an ideal anonymization technique for audio and video data would be hiring an actor to recite or re-enact the original recording. This, however, is often not feasible due to time and/or budget constraints, so you might consider the following measures instead:</p>
+<ul>
+<li>for audio recordings, <strong>bleeping out</strong> the parts containing personal data</li>
+<li>for video recordings, <strong>blackening or pixelating</strong> some parts of the speaker's body (this is relevant e.g. to the processing of sign language data)</li>
+</ul>
+<hr />
+<h1 id="pseudonymization">Pseudonymization</h1>
+<p>Pseudonymization is a process of <strong>replacing sensitive data with semantically similar expressions in such a manner that the data can no longer be attributed to a specific person</strong>. For example, "Michelle" becomes "Sandra", "Berlin" becomes "Munich", etc. Pseudonymization takes more time to carry out than anonymization, but the resulting data is more human-readable and has more potential to be re-used by third-party researchers (e.g., in a study focusing on certain linguistic properties of named entities).</p>
+<section class="footnotes" role="doc-endnotes">
+<hr />
+<ol>
+<li id="fn1" role="doc-endnote"><p><a href="https://doi.org/10.3384/ecp2020172006">https://doi.org/10.3384/ecp2020172006</a><a href="#fnref1" class="footnote-back" role="doc-backlink">↩︎</a></p></li>
+<li id="fn2" role="doc-endnote"><p><a href="https://gdpr-info.eu/art-4-gdpr/">https://gdpr-info.eu/art-4-gdpr/</a><a href="#fnref2" class="footnote-back" role="doc-backlink">↩︎</a></p></li>
+</ol>
+</section>
+</body>
+</html>
diff --git a/en/index.html b/en/index.html
new file mode 100644
index 0000000000000000000000000000000000000000..c6834ecc0afb8d5f951b6cac28c6b35e204b36b8
--- /dev/null
+++ b/en/index.html
@@ -0,0 +1,189 @@
+<!DOCTYPE html>
+<html xmlns="http://www.w3.org/1999/xhtml" lang="" xml:lang="">
+<head>
+  <meta charset="utf-8" />
+  <meta name="generator" content="pandoc" />
+  <meta name="viewport" content="width=device-width, initial-scale=1.0, user-scalable=yes" />
+  <title>QUEST Knowledge Base</title>
+  <style>
+    html {
+      line-height: 1.5;
+      font-family: Georgia, serif;
+      font-size: 20px;
+      color: #1a1a1a;
+      background-color: #fdfdfd;
+    }
+    body {
+      margin: 0 auto;
+      max-width: 36em;
+      padding-left: 50px;
+      padding-right: 50px;
+      padding-top: 50px;
+      padding-bottom: 50px;
+      hyphens: auto;
+      word-wrap: break-word;
+      text-rendering: optimizeLegibility;
+      font-kerning: normal;
+    }
+    @media (max-width: 600px) {
+      body {
+        font-size: 0.9em;
+        padding: 1em;
+      }
+    }
+    @media print {
+      body {
+        background-color: transparent;
+        color: black;
+        font-size: 12pt;
+      }
+      p, h2, h3 {
+        orphans: 3;
+        widows: 3;
+      }
+      h2, h3, h4 {
+        page-break-after: avoid;
+      }
+    }
+    p {
+      margin: 1em 0;
+    }
+    a {
+      color: #1a1a1a;
+    }
+    a:visited {
+      color: #1a1a1a;
+    }
+    img {
+      max-width: 100%;
+    }
+    h1, h2, h3, h4, h5, h6 {
+      margin-top: 1.4em;
+    }
+    h5, h6 {
+      font-size: 1em;
+      font-style: italic;
+    }
+    h6 {
+      font-weight: normal;
+    }
+    ol, ul {
+      padding-left: 1.7em;
+      margin-top: 1em;
+    }
+    li > ol, li > ul {
+      margin-top: 0;
+    }
+    blockquote {
+      margin: 1em 0 1em 1.7em;
+      padding-left: 1em;
+      border-left: 2px solid #e6e6e6;
+      color: #606060;
+    }
+    code {
+      font-family: Menlo, Monaco, 'Lucida Console', Consolas, monospace;
+      font-size: 85%;
+      margin: 0;
+    }
+    pre {
+      margin: 1em 0;
+      overflow: auto;
+    }
+    pre code {
+      padding: 0;
+      overflow: visible;
+    }
+    .sourceCode {
+     background-color: transparent;
+     overflow: visible;
+    }
+    hr {
+      background-color: #1a1a1a;
+      border: none;
+      height: 1px;
+      margin: 1em 0;
+    }
+    table {
+      margin: 1em 0;
+      border-collapse: collapse;
+      width: 100%;
+      overflow-x: auto;
+      display: block;
+      font-variant-numeric: lining-nums tabular-nums;
+    }
+    table caption {
+      margin-bottom: 0.75em;
+    }
+    tbody {
+      margin-top: 0.5em;
+      border-top: 1px solid #1a1a1a;
+      border-bottom: 1px solid #1a1a1a;
+    }
+    th {
+      border-top: 1px solid #1a1a1a;
+      padding: 0.25em 0.5em 0.25em 0.5em;
+    }
+    td {
+      padding: 0.125em 0.5em 0.25em 0.5em;
+    }
+    header {
+      margin-bottom: 4em;
+      text-align: center;
+    }
+    #TOC li {
+      list-style: none;
+    }
+    #TOC a:not(:hover) {
+      text-decoration: none;
+    }
+    code{white-space: pre-wrap;}
+    span.smallcaps{font-variant: small-caps;}
+    span.underline{text-decoration: underline;}
+    div.column{display: inline-block; vertical-align: top; width: 50%;}
+    div.hanging-indent{margin-left: 1.5em; text-indent: -1.5em;}
+    ul.task-list{list-style: none;}
+    .display.math{display: block; text-align: center; margin: 0.5rem auto;}
+  </style>
+  <!--[if lt IE 9]>
+    <script src="//cdnjs.cloudflare.com/ajax/libs/html5shiv/3.7.3/html5shiv-printshiv.min.js"></script>
+  <![endif]-->
+</head>
+<body>
+<header id="title-block-header">
+<h1 class="title">QUEST Knowledge Base</h1>
+</header>
+<h1 id="types-of-corpora">Types of corpora</h1>
+<ul>
+<li>Introduction</li>
+<li>Multilingual corpora</li>
+<li>Multimodal corpora</li>
+</ul>
+<h1 id="media-formats">Media formats</h1>
+<ul>
+<li>Sound recordings</li>
+<li>Video recordings</li>
+</ul>
+<h1 id="annotation-formats">Annotation formats</h1>
+<ul>
+<li>Introduction</li>
+<li>ELAN</li>
+<li>EXMARaLDA</li>
+<li>FOLKER</li>
+<li>FLEX</li>
+</ul>
+<h1 id="quality-control">Quality control</h1>
+<ul>
+<li><a href="corpus_sevices.html">HZSK Corpus services</a></li>
+</ul>
+<h1 id="certification">Certification</h1>
+<ul>
+<li><a href="resource_types.html">Resource types</a></li>
+</ul>
+<h1 id="legal-aspects">Legal aspects</h1>
+<ul>
+<li><a href="best_practices.html">Best practices</a></li>
+<li><a href="data_protection.html">Data protection</a></li>
+<li><a href="copyright.html">Copyright</a></li>
+</ul>
+</body>
+</html>
diff --git a/en/resource_types.html b/en/resource_types.html
new file mode 100644
index 0000000000000000000000000000000000000000..83b691baa7ab55ae54fdbfed27860ade325f3db2
--- /dev/null
+++ b/en/resource_types.html
@@ -0,0 +1,179 @@
+<!DOCTYPE html>
+<html xmlns="http://www.w3.org/1999/xhtml" lang="" xml:lang="">
+<head>
+  <meta charset="utf-8" />
+  <meta name="generator" content="pandoc" />
+  <meta name="viewport" content="width=device-width, initial-scale=1.0, user-scalable=yes" />
+  <title>Granularity and Internal Structure of Audiovisual Resources</title>
+  <style>
+    html {
+      line-height: 1.5;
+      font-family: Georgia, serif;
+      font-size: 20px;
+      color: #1a1a1a;
+      background-color: #fdfdfd;
+    }
+    body {
+      margin: 0 auto;
+      max-width: 36em;
+      padding-left: 50px;
+      padding-right: 50px;
+      padding-top: 50px;
+      padding-bottom: 50px;
+      hyphens: auto;
+      word-wrap: break-word;
+      text-rendering: optimizeLegibility;
+      font-kerning: normal;
+    }
+    @media (max-width: 600px) {
+      body {
+        font-size: 0.9em;
+        padding: 1em;
+      }
+    }
+    @media print {
+      body {
+        background-color: transparent;
+        color: black;
+        font-size: 12pt;
+      }
+      p, h2, h3 {
+        orphans: 3;
+        widows: 3;
+      }
+      h2, h3, h4 {
+        page-break-after: avoid;
+      }
+    }
+    p {
+      margin: 1em 0;
+    }
+    a {
+      color: #1a1a1a;
+    }
+    a:visited {
+      color: #1a1a1a;
+    }
+    img {
+      max-width: 100%;
+    }
+    h1, h2, h3, h4, h5, h6 {
+      margin-top: 1.4em;
+    }
+    h5, h6 {
+      font-size: 1em;
+      font-style: italic;
+    }
+    h6 {
+      font-weight: normal;
+    }
+    ol, ul {
+      padding-left: 1.7em;
+      margin-top: 1em;
+    }
+    li > ol, li > ul {
+      margin-top: 0;
+    }
+    blockquote {
+      margin: 1em 0 1em 1.7em;
+      padding-left: 1em;
+      border-left: 2px solid #e6e6e6;
+      color: #606060;
+    }
+    code {
+      font-family: Menlo, Monaco, 'Lucida Console', Consolas, monospace;
+      font-size: 85%;
+      margin: 0;
+    }
+    pre {
+      margin: 1em 0;
+      overflow: auto;
+    }
+    pre code {
+      padding: 0;
+      overflow: visible;
+    }
+    .sourceCode {
+     background-color: transparent;
+     overflow: visible;
+    }
+    hr {
+      background-color: #1a1a1a;
+      border: none;
+      height: 1px;
+      margin: 1em 0;
+    }
+    table {
+      margin: 1em 0;
+      border-collapse: collapse;
+      width: 100%;
+      overflow-x: auto;
+      display: block;
+      font-variant-numeric: lining-nums tabular-nums;
+    }
+    table caption {
+      margin-bottom: 0.75em;
+    }
+    tbody {
+      margin-top: 0.5em;
+      border-top: 1px solid #1a1a1a;
+      border-bottom: 1px solid #1a1a1a;
+    }
+    th {
+      border-top: 1px solid #1a1a1a;
+      padding: 0.25em 0.5em 0.25em 0.5em;
+    }
+    td {
+      padding: 0.125em 0.5em 0.25em 0.5em;
+    }
+    header {
+      margin-bottom: 4em;
+      text-align: center;
+    }
+    #TOC li {
+      list-style: none;
+    }
+    #TOC a:not(:hover) {
+      text-decoration: none;
+    }
+    code{white-space: pre-wrap;}
+    span.smallcaps{font-variant: small-caps;}
+    span.underline{text-decoration: underline;}
+    div.column{display: inline-block; vertical-align: top; width: 50%;}
+    div.hanging-indent{margin-left: 1.5em; text-indent: -1.5em;}
+    ul.task-list{list-style: none;}
+    .display.math{display: block; text-align: center; margin: 0.5rem auto;}
+  </style>
+  <!--[if lt IE 9]>
+    <script src="//cdnjs.cloudflare.com/ajax/libs/html5shiv/3.7.3/html5shiv-printshiv.min.js"></script>
+  <![endif]-->
+</head>
+<body>
+<header id="title-block-header">
+<h1 class="title">Granularity and Internal Structure of Audiovisual Resources</h1>
+</header>
+<h1 id="deposit">Deposit</h1>
+<p>Any data deposited with a data centre to be ingested into a repository or otherwise distributed can be referred to as a Deposit. For a Deposit, the <strong>legal situation</strong> must be clear, and <strong>basic provenance information</strong> must exist, but regarding size, content or data quality or consistency there are no specific requirements. This is an intended underspecification for a set of files of various kinds, making it possible to handle e.g. valuable legacy data before it can be curated or to describe “work in progress” data. Depending on its characteristics, after curation a Deposit can equal a <span class="title-ref">Collection/Corpus</span>, be a part of a <span class="title-ref">Collection/Corpus</span>, or comprise several <span class="title-ref">Collections/Corpora</span>.</p>
+<hr />
+<h1 id="collection">Collection</h1>
+<p>For a set of files to be called a Collection, further requirements must be met. When talking about audiovisual resources, a Collection is a <strong>structured set</strong> of files, i.e. at least a set of documented recordings, based on a specific design, even if only “with a shared origin and/or topic”. The content itself, however, need not be structured: transcripts, for example, must be browsable but not necessarily searchable, so that even images of handwritten transcripts are possible. Accordingly, there are no requirements regarding the existence of data models for any parts of the content of the resource. Only basic legal and administrative metadata on the resource (for all included files) and basic source metadata for the recording situations, including the participants, are required.</p>
+<hr />
+<h1 id="corpus">Corpus</h1>
+<p>While unstructured annotation data, as described above, makes a resource a <span class="title-ref">collection</span>, structured annotation data alone does not make the resource a corpus. For a resource to qualify as a corpus, further basic requirements must be met regarding the design and processability. The <strong>corpus design</strong> must be thoroughly documented to allow for a manual assessment of the plausibility and suitability regarding requirements on completeness/representativeness for the chosen purpose. Furthermore, the <strong>quality of the content</strong> must have been manually assessed to ensure reliability and validity of the chosen conventions/schemas and their application, and this quality must be documented. Regarding general processability and in particular the complexity and reliability of queries, the following criteria must be met:</p>
+<ul>
+<li>The <strong>main structure</strong> defining the corpus data, i.e. the various files, their relationships and their metadata, must be <strong>machine readable</strong> and all paths to files must be resolvable.</li>
+<li>It must be possible to reliably <strong>select specific parts of the data</strong>, i.e. in effect annotation files, (to query or within a query result) on the basis of metadata on recording situations.</li>
+<li>All <strong>participants</strong> must be defined and recognizable in all parts of the data, i.e. unique speaker IDs are required, and a relationship between participants and annotation data is required when participants are not (redundantly) modelled as metadata of recording situations. It must be possible to reliably select specific parts of the data, i.e. in effect annotation files, (to query or within a query result) on the basis of participant metadata. The concept “contribution” must be modelled by the transcription/annotation data to allow reference to tokens/annotations produced by a specific participant.</li>
+<li>If the recordings have not been completely transcribed/annotated, it must be documented <strong>which parts have been transcribed/annotated</strong> and why. Recordings that have not been transcribed/annotated at all must be documented accordingly. An alternative to more fine-grained time-aligned annotation/analysis are longer events with thematic or structural information, e.g. conversation phases or topics.</li>
+<li>It must be explicit (and machine-readable) <strong>which tiers</strong> (or similar components of the data), if any, contain the most basic annotation, usually an orthographic transcription (“token layer”), and which tiers (or similar components of the data) contain higher level annotations referring to this base layer. The content and conventions/schemas must be documented for all tiers (or similar components of the data). <strong>Transcription conventions</strong> must be syntactically validated on an appropriate level and the result documented. If an annotation schema exists, only tags from this schema may occur in the tier (syntactic consistency).</li>
+<li>Within a transcription, <strong>different information types</strong> must be explicitly marked up and separable, e.g. descriptions of non-verbal behaviour and comments must be identifiable as non-transcription data.</li>
+<li>If a <strong>tokenization</strong> is not explicitly included, tokenization must be possible according to the documented conventions, i.e. textual content must be automatically parsable. The result, however, need not be tokens in the sense of standardized/normalized written words, since this is not a relevant unit in all systems for the description of non-written language.</li>
+</ul>
+<h2 id="sessions-speech-events-and-bundles">Sessions, (Speech) Events and Bundles</h2>
+<ul>
+<li><span class="title-ref">Session</span> = A complete recording session</li>
+<li><span class="title-ref">Sub-Session(?)</span> = A part of a recording session, e.g. corresponding to a task</li>
+<li><span class="title-ref">Bundle</span> = No semantics, just files that belong together, without multiple unsynchronized media files</li>
+</ul>
+</body>
+</html>