Data Model Implementation

For the present case, the TEI standard was identified as an appropriate markup for words. For the implementation, this means that the TEI guidelines have to be realized as an object type compatible with the chosen repository framework. However, the TEI standard does not fully cover the diachronic dimension, i.e. information on the development of a word. To remain compatible with the elements of the TEI standard while meeting the requirements of the application, some attributes were added. With this solution, the XML files can still be processed according to the TEI standard by ignoring the additional attributes, and the additional markup can be extracted when needed. The additional attributes comprise a link to the corpus metadata as well as the \emph{position} and \emph{occurrence} of the affixes. Information on the position of affixes and its quantification is potentially relevant for a wealth of research questions, such as predictions on the productivity of derivatives and their interaction with the phonological or syntactic modules. These attributes were therefore included with a view to future use.
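The separation between standard TEI processing and extraction of the additional markup can be illustrated with a minimal sketch. It assumes that the extensions are plain attributes on the TEI elements \emph{w} and \emph{m}; the attribute names `corpus`, `position`, and `occurrence`, the sample word, and the helper functions are hypothetical and chosen for illustration only, not taken from the actual data model.

```python
import xml.etree.ElementTree as ET

# Hypothetical names for the project-specific attributes. The text only states
# that a corpus link, a position, and an occurrence are added.
EXTRA_ATTRS = {"corpus", "position", "occurrence"}

# Invented sample entry: a TEI <w> element with <m> (morpheme) children.
SAMPLE = """
<w lemma="kindness" corpus="#corpus-01">
  <m type="base">kind</m>
  <m type="suffix" position="1" occurrence="1">ness</m>
</w>
""".strip()


def tei_view(elem):
    """Return a copy of the tree without the additional attributes."""
    kept = {k: v for k, v in elem.attrib.items() if k not in EXTRA_ATTRS}
    clean = ET.Element(elem.tag, kept)
    clean.text, clean.tail = elem.text, elem.tail
    for child in elem:
        clean.append(tei_view(child))
    return clean


def extra_markup(elem):
    """Collect the additional attributes per element, in document order."""
    found = []
    for node in elem.iter():
        extras = {k: v for k, v in node.attrib.items() if k in EXTRA_ATTRS}
        if extras:
            found.append((node.tag, (node.text or "").strip(), extras))
    return found


word = ET.fromstring(SAMPLE)
plain = tei_view(word)        # what a TEI-only consumer would see
extras = extra_markup(word)   # what the application can use in addition
```

In this sketch a TEI-only tool works on `plain`, while the application reads `extras`; both are derived from the same file, so no second representation of the word has to be maintained.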

For reasons of efficiency in subsequent processing, the historical dates \emph{begin} and \emph{end} were included in both the word data model and the corpus data model. The resulting word data model is given in Listing~\ref{lst:worddatamodel}. Whereas the attributes of the objecttype are specific to the repository framework, the TEI structure can be recognized in the hierarchy of the metadata element starting with the name \emph{w} (line~\ref{src:wordbegin}).
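The efficiency argument for storing \emph{begin} and \emph{end} in both models can be sketched as follows. The example assumes that the dates are years stored as integers and that each word carries a copy of the dates of its corpus; the class names, field names, and sample values are hypothetical and do not reflect the repository framework.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Corpus:
    corpus_id: str
    begin: int  # assumed: year as integer
    end: int


@dataclass(frozen=True)
class Word:
    form: str
    corpus_id: str  # link to the corpus metadata
    begin: int      # duplicated from the corpus record
    end: int


def make_word(form, corpus):
    """Create a word record that copies the dates of its corpus."""
    return Word(form, corpus.corpus_id, corpus.begin, corpus.end)


def words_in_period(words, start, stop):
    """Select words whose date range overlaps [start, stop].

    Only the word records are inspected; the corpus link is not resolved.
    """
    return [w for w in words if w.begin <= stop and w.end >= start]


# Invented sample data for illustration.
early = Corpus("c1", 1100, 1300)
late = Corpus("c2", 1500, 1700)
words = [make_word("kindness", early), make_word("greatness", late)]
selected = words_in_period(words, 1450, 1600)
```

The duplication trades some redundancy for a period query that needs no lookup in the corpus records, which is presumably the kind of subsequent processing the design has in mind.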