(an excerpt from Guidelines for Building Language Corpora Under German Law, licensed under a CC-BY 4.0 International license) [1]
In general, texts are protected by copyright in Germany [2] if they satisfy an originality standard and no more than 70 years have passed since the death of their authors. How the originality standard is defined, and therefore how it is met, is a controversial question, and the answer may differ from case to case and from court decision to court decision. Over the past decades, courts have set the requirements for meeting the originality standard lower and lower. Texts such as simple news statements or plain business correspondence may still not be protected by copyright because they do not meet the originality standard. However, there is the concept of “kleine Münze”, a sort of everyday creativity of people in general, which is fully protected by copyright.
There are also certain related rights that are especially relevant for texts:
Since 2013, there has been a related right for press publishers in Germany that sidesteps the originality standard and grants protection to even the shortest paragraphs for a term of one year. This protection follows from mere publication and is limited to the right of making publicly available. It is, therefore, only invoked when the press content is placed online.
There are two related rights that have a wider protected domain but a smaller scope of application. One concerns scientific editions of works that are not protected by copyright; the other concerns posthumous works, i.e. works that are published after the death of their authors and, as the case may be, after the copyright term (70 years, see above). These rights protect all uses of these works (not only online use) for 25 years.
Finally, there is the related right for the creators of databases. Its term is 15 years, and it relates not to the contents of a database but to the manner in which the database is structured. This related right does not apply to unstructured data and requires substantial investments of time and/or money. Right holders (i.e. those who have created a database through substantial investment) are protected from “substantial parts” of the database being reproduced or used further.
Related rights are distinguished from copyright especially in two key ways. First, related rights have a shorter term of protection. Second, related rights can protect the works created by a legal person, e.g. a company. Under copyright, companies may at most have exclusive rights in copyright-protected works, while authors may only be natural persons.
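The different terms of protection named above can be compared side by side in a small sketch. The labels in `TERMS_IN_YEARS` and both function names are hypothetical and chosen only for this illustration. The sketch also assumes that a term is counted from the end of the calendar year in which the triggering event (death of the author, publication, creation of the database) occurred; it is a rough aid for orientation, not a substitute for checking rights in the individual case.

```python
from datetime import date

# Terms in years as stated in the text; the keys are hypothetical labels.
TERMS_IN_YEARS = {
    "copyright": 70,           # counted from the death of the author
    "press_publisher": 1,      # related right of press publishers
    "scientific_edition": 25,  # related right on scientific editions
    "posthumous_work": 25,     # related right on posthumous works
    "database": 15,            # related right of makers of a database
}


def last_protected_day(kind: str, event_year: int) -> date:
    """Return the last day of protection for a right of the given kind.

    Assumption: the term starts at the end of the calendar year of the
    triggering event, so protection ends on 31 December of a later year.
    """
    return date(event_year + TERMS_IN_YEARS[kind], 12, 31)


def is_protected(kind: str, event_year: int, on: date) -> bool:
    """Check whether the term of protection is still running on a given day."""
    return on <= last_protected_day(kind, event_year)
```

For example, under these assumptions `is_protected("copyright", 1950, date(2021, 1, 1))` is false for an author who died in 1950, while a database created in 2010 would still be within its term in mid-2025.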
The rules of copyright law that are most relevant for written corpora affect the right of reproduction (§ 16), the right of distribution (§ 17), the right of making works available to the public (§ 19a), the related rights on scientific editions (§ 70) and posthumous works (§ 71), the related right of makers of a database (§ 87b) and the related right of press publishers (§ 87f). It is still a legal gray area whether Text and Data Mining (TDM) [3], and thus quantitative linguistic analysis, are types of use with copyright implications which are not yet mentioned in § 15 UrhG but protected nonetheless. (More specifically, whether the act of performing analysis on the data falls within the scope of § 15 UrhG; the resulting digital copy undoubtedly falls under § 16 UrhG.) Court decisions clarifying this issue can perhaps be expected in the foreseeable future. Because there are clear parallels between TDM and a human reading a text, which is not a type of use relevant for copyright, it is easily conceivable that courts may rule that TDM, like reading, is permitted by law even without the permission of the right holder.
Laws that balance the interests of authors and users are so-called copyright exceptions. These determine which types of uses are allowed without the consent of the right holders, and under which circumstances. The use of copyright protected material as research data is only broadly provided for. The so-called research exception (§ 52a UrhG), for example, allows making available “small scale” works as well as “individual articles from newspapers or periodicals,” and only if and insofar as this is “necessary” for the respective research purpose and is “justified” for the “pursuit of non-commercial aims”. The copies may be made available “exclusively for a specifically limited circle of persons”, which may include a small research team whose members -- according to the legal commentators Dreier/Schulze (2013) -- may belong to different research institutions, or a seminar, but not the whole scientific community. The limited circle must be limited to people who access the materials for their own scientific purposes [4], and the measures taken must be effective considering the state of the art at the time.
The right of temporary acts of reproduction (§ 44a UrhG) allows temporary caching of electronic data, although this right is often insufficient to legally cover the empirical methods and replicable results required by scientific research. The same can be said of reproductions for private use: § 53 I UrhG permits them but allows a transfer only in private, i.e. not in a work-related scientific context, and § 53 II UrhG allows a reproduction only for one's own personal scientific use (the possibilities of transfer are regulated in § 52a). The right of digital reproduction of complete books or magazines is further limited in § 53 IV UrhG. Concerning all the exceptions, one must keep in mind that they are subordinate to contrary license agreements. Additionally, § 52a IV UrhG states that an equitable remuneration shall be paid (guided by rates set out in the VG WORT case). [5]
Attention should also be paid to the fact that exceptions to copyright protection do not apply to related rights in the same way. They have their own protection exceptions, which are named in the respective part of the UrhG.
In conclusion, legal exceptions are typically not a sufficient basis for making written corpora available permanently. Making available a copy of the written corpus that has been the research object is not covered by any of the above-mentioned research exceptions, which may massively complicate the repeatability and thus the verification of the respective research projects. Often enough, even building up a corpus of texts for which no express permission was given is unlawful, because the digital copies produced in the process are not necessarily covered by copyright exceptions.
For building a corpus in conformity with the law, the consent of the right holders must be obtained, or it must be ensured that only texts are used:
A thorough checking/clarification of rights is therefore necessary. The costs for this may possibly be reduced by cooperating with other centers that seek to use this data and therefore check its legal status.
The situation tends to be considerably easier, if the intended use is covered by standard licenses which grant the necessary rights for the use in a corpus, to everyone, in advance. These are called “Public Licenses”. In best-case scenarios, the author has already published his / her texts under a sufficiently liberal standard license. But often this is not the case. This means that individual license agreements with the respective right holders must be made, which requires time and other resources. In the case of texts published by presses / publishing houses, these may typically be contacted directly, because the publisher often obtains the right to license electronic uses in their contracts with authors. The same often applies to texts which are published on web portals, because operators are often granted the respective rights through “Terms and Conditions” agreements.
For spoken corpora, both copyright and related rights may be issues, especially when it comes to:
As soon as these materials are used in the course of research, the consent of relevant rightholders is necessary to perform the research legally. A general research and education law regarding copyright and other rights has not yet been implemented in Europe, although such a regulation has been, and is, continuously discussed. Currently, only the quotation exception (§ 51 of the German Act on Copyright and Related Rights (UrhG)) and some special regulations for building personal scientific archives allow very limited use of someone else’s work at all.
The consent of the rightholders is usually given through an appropriate license agreement (or contract). In practice, it is a considerable problem when rightholders are not known or cannot be found. This is important because every right holder must give his / her consent before a use of the work which is otherwise only permitted for right holders is allowed (with the exception of films, and if there are no other special agreements). If more than one person created the work, the consent of each co-rightholder must be obtained.
This also refers to transcripts of primary data protected by copyright law (e.g. spoken and song recordings), even if the transcript is technically the work of the scientist in the sense of copyright. In such cases this transcript is considered a simple copy of the work which is included in the primary data or a derivative work (e.g. translation). Both of these types of use are assigned to the original rightholder (except in above-mentioned copyright exceptions, e.g. the quotation exception).
Extra precaution is appropriate if the copyright-protectable material has not yet been, but will be published within a scientific work in a manner which cannot be avoided due to the best scientific practice in disclosing sources. This affects the authors’ right of personality because it is their choice whether their works are disclosed to the public or not.
Adaptations in the meaning of the law are contents that are based on a previous work and meet the originality standards to qualify for protection (the law of copyright in adaptations), even if the previous work is no longer protected by copyright. If the previous work is still protected by copyright, adaptations may only be published with the consent of the author of the previous work. Transformations are, according to prevailing legal opinion, modified versions of previous works that do not meet the requirements of protection for copyright in adaptations. They, too, may only be published with the consent of the author of the previous work.
The threshold to adaptation or transformation is reached if an average observer’s impression of a work is changed noticeably. For pictures, this is the case, for example, if they are cropped or their sizes are changed extremely. For films, this is the case, e.g., if they are set to music. Texts are changed noticeably if they are shortened, amended, mixed with other texts or translated. A new layout or a transfer of a text from analogue to digital form is not an adaptation or transformation -- although usually a reproduction -- which generally holds when a text is removed from its original medium/context and remains recognizable as a discrete work. (In exceptional cases the change of the context of the work may result in an adaptation. For text corpora for research purposes, however, this is hard to imagine.)
When the original work is no longer recognizable by an average observer, no adaptation exists, but rather a new, independent work. Here, courts have said that the personal characteristics of the pre-existing work “fade away” from the new content. [7] The difference between an adaptation (§ 23 UrhG) and an independent work created in free use (§ 24 UrhG) is, however, fluid. [8] If, on the surface, the new content has nothing in common with the previous material, free use of the previous material is unproblematic (as far as the law of adaptations is concerned). Often, this results from the method which is used within Text and Data Mining. If a text, for example, is statistically analyzed or annotated, it can usually not be reconstructed from the emerging statistics or annotation. Thus neither of these research results is an adaptation of the source text within the meaning of the law.
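The point that a text usually cannot be reconstructed from statistics derived from it can be illustrated with a minimal sketch. The function name `word_frequencies` and the two sample sentences are assumptions made for this illustration only; simple word counting stands in for whatever statistical method a research project actually uses.

```python
import re
from collections import Counter


def word_frequencies(text: str) -> Counter:
    """Count how often each lower-cased word occurs in the text."""
    tokens = re.findall(r"\w+", text.lower())
    return Counter(tokens)


# Two hypothetical source texts with different meanings.
text_a = "The dog bites the man"
text_b = "The man bites the dog"

stats_a = word_frequencies(text_a)
stats_b = word_frequencies(text_b)

# Both texts yield exactly the same statistics, so the counts alone
# cannot tell us which of the two texts was analyzed.
same_statistics = stats_a == stats_b
```

Here `same_statistics` is true although the two sentences differ: the word order, and with it the recognizable form of the source text, is not preserved in the result.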
For source texts that are still protected, this does not solve the problem that the temporary copies / caching technically necessary for the development of research results, and making the texts permanently available, may be prohibited by contractual terms or permitted only with consent (see above). Apart from that, TDM may also be contractually prohibited because civil law largely allows contracting parties to agree on what they wish (the law of “private autonomy”). If, for example, an editor forbids TDM, or the publication of TDM results based on a text, in a license agreement with a scientific institution that regulates the access to the material, this must be respected, even if the research results are independent and not adaptations or transformations and TDM should a priori not be regarded as a copyright protected type of use. [9] In this case, the basis for enforcement of the prohibition is not copyright law, but the contract which was entered into by the two parties. Such a contract, however, affects only the relevant parties.
It is possible to incorporate certain conditions for the use of the material in the agreement instead of a strict prohibition. This can be done even by standard licenses, which are contracts. It is therefore conceivable that research or TDM results are made subject to copyleft terms. [10] Disregarding software licenses, however, it is highly unusual for standard licenses to impose conditions independently of any existing legal position based on an absolute right (such as copyright or database protection). The six Creative Commons licenses even explicitly state that they do not restrict anything that the licensee is allowed to do without the license anyway. [11] Their copyleft terms thus only apply under the pre-condition that there is a legal protection in the first place that requires permission of a rightsholder. [12] Thus copyleft and other limitations of CC licenses would only be effective if TDM is regarded as a type of use within the meaning of copyright law.
Since this question is not yet resolved everywhere in the world, the new CC license version 4.0 clarifies explicitly that the results of TDM should not be considered as an adaptation by the licensor. Thus neither the copyleft conditions of CC licenses [13] nor the other conditions "attribution," "no commercial use" and "no edits allowed" need to be taken into account, as far as TDM and its independent results are concerned.
If research results are still somehow considered adaptations or transformations within the meaning of the law, i.e. outside of TDM and without other licenses influencing the character of adaptation, the same recommendations apply for further use of these research results as for the use of independent works.
According to § 4 UrhG, collections of works and databases are protected where the selection or arrangement of the elements constitutes the author's own intellectual creation, regardless of whether the individual elements are protected or not. This may be relevant if collections of texts in the public domain are included in a corpus. This protection of “database works” should not be confused with that of mere databases, whose creators are additionally protected by §§ 87a - 87e UrhG (see above). The related right of the maker of a database only requires substantial investment; in contrast, a “database work” requires such an extraordinary arrangement of the content that the arrangement itself can be regarded as a creation (similar to authorship). Thus, the threshold for the (high) level of protection of a “database work” is much higher than that for a database protected in accordance with §§ 87a et seq. UrhG. The latter right of the maker of a database may create restrictions of use if parts of a database are included in a corpus or such a corpus is made available. [14]
Since § 61 UrhG was inserted into the Copyright Act in 2014, some types of uses have been permitted by law for text works from the collections of publicly accessible libraries, educational institutions, museums and archives, if the works are already published, the respective right holders cannot be found or identified even by a diligent search (defined in § 61a UrhG), and the result of this search was recorded in a central register. The permitted types of use concern making available to the public (§ 19a UrhG) and reproduction (§ 16 I UrhG). Since the right to create derivative works is not included, it may not be possible to rely on § 61 UrhG when using such works in corpora. [15] To take the path of least legal risk, orphan works should only be included in corpora in a way whereby no adaptation or transformation is carried out (see above).
There is still the unavoidable problem that the status of an orphan work may subsequently expire if the right holders appear and/or become known. From this point in time, the usual rules for the use of works apply again.
The terms of use of commercial software are usually clearly laid out, so that it is possible to determine the conditions under which the software may be used and what implications may arise when it is used to create independent and derivative works. Depending on the approach, the output of the software, i.e. the research result or document, remains independent in its legal status from that of the software.
Sometimes the legal status is more vague within software tools that were developed in an academic context, as they are often based on data (dictionaries or training corpora) which might be affected by third party rights.
For software developed in-house, it needs to be noted that the decision whether and under which license the software will be released is reserved for the employer for whom the software was created (§ 69b UrhG).
[1] https://www.dfg.de/download/pdf/foerderung/antragstellung/forschungsdaten/guidelines_review_board_linguistics_corpora.pdf
[2] We only give information about the legislation in the Federal Republic of Germany. How works of German authors are protected in other countries and how foreign authors are protected in Germany is regulated in some international conventions. The most important ones are the Berne Convention for the Protection of Literary and Artistic Works (usually known as the Berne Convention) and the Agreement on Trade-Related Aspects of Intellectual Property Rights (TRIPS). In Art. 5.1, the Berne Convention says that every state party must acknowledge the protection of works of citizens of other state parties as it acknowledges the protection of works of its own citizens. There are 168 countries that are parties to the Berne Convention (i.a. the EU, the USA, China, Japan, Russia and India).
[3] We adopt the term Text and Data Mining because it is now frequently used in discussions by the international legal community. At the moment, there is no coherent system of definitions of the different terms which are used for scientific analysis of data, but many slightly different and partly overlapping nomenclatures. It can be argued that the meaning of TDM in any case includes quantitative linguistic analysis.
[4] BT-Drucksache 15/38, S. 2
[5] In its decision of March 24, 2011 (file reference 6 WG 12/09) concerning the case VG Wort - Federal States, the higher regional court (OLG) of Munich considered a remuneration of 10 euros + VAT per work as equitable for scientific research within the scope of § 52a I Nr. 2 UrhG.
[6] Whether the originality standard applies to a succession of randomly assorted sentences is unclear. § 39 UrhG “Alterations of the work,” which belongs to the moral rights, is one argument that this method is not legally sound.
[7] Federal Supreme Court of Germany in “Mecki-Igel I”, GRUR 1958, 500, 502.
[8] See Dreier/Schulze, Urheberrechtsgesetz Kommentar, 4. ed., § 24 Marginal No. 1 and § 23 Marginal No. 4. [2016 note: a newer 2nd edition was published in 2015]
[9] Whether TDM can be regarded as a type of use is currently being discussed by jurists and will certainly keep the courts busy. There are many reasons why TDM should be regarded as a kind of reading, which, as so-called Werkgenuss, is permitted without consent. See the end of section 2.1 above.
[10] Meaning that the licensee must offer the licensed content to the public under identical or similar conditions.
[11] See e.g. section 8.d. in the license CC-BY Version 4.0.
[12] The database licenses of Open Data Commons are an exceptional case, because these postulate their copyleft conditions even for those regions of the world where no database protection law exists, e.g. the United States.
[13] The name for the copyleft mechanism of CC licenses is "share alike", abbreviated as "SA".
[14] See the court decisions of the European Court of Justice (October 9, 2008, Case C-304/07) and of the Federal Supreme Court (August 13, 2009, file reference I ZR 130/04).
[15] At the copy deadline of the present document, this was still an open question. Any news on this point will be published on the CLARIN-D Legal Information Platform. [2016 note: in general, it seems that adapted versions are not covered by § 61 UrhG].