Commit 257752b3 authored by Peukert, Dr. Hagen: Morphilo documentation finalized

parent f7c10a52
No related branches found
No related tags found
No related merge requests found
Pipeline #1600 failed
Showing
with 1266 additions and 1101 deletions
No preview for this file type
No preview for this file type
No preview for this file type
No preview for this file type
No preview for this file type
No preview for this file type
No preview for this file type
Morphilo_doc/_build/html/_images/120px-Green_eyes_kitten.jpg

6.92 KiB

Morphilo_doc/_build/html/_images/FotoHP2012.jpg

294 KiB

Morphilo_doc/_build/html/_images/architecture.png

54.8 KiB

Morphilo_doc/_build/html/_images/morphilo_uml.png

72.6 KiB

Morphilo_doc/_build/html/_images/mycore_architecture-2.png

87.7 KiB

......@@ -3,18 +3,18 @@
Morphilo Project Documentation
==============================
.. toctree::
   :maxdepth: 3
   :caption: Contents:

   source/architecture.rst
   source/datamodel.rst
   source/controller.rst
   source/view.rst
Indices and tables
==================
Software Design
===============
MVC Model
---------
.. image:: architecture.*

A form of the observer pattern called the *Model-View-Controller (MVC)* model
[#f3]_ has become a standard software architecture. This is especially true
for web-based applications that use some form of client-server architecture,
since these systems naturally separate the browser view from the rest of the
program logic and, if set up dynamically, also from the data model, which
usually runs on a separate server.
As already implied, the MVC pattern modularizes the program into three
components, model, view, and controller, which are *loosely* coupled via
interfaces. The view is concerned with everything the user actually sees on
the screen or uses to interact with the machine. The controller recognizes and
processes the events initiated by the user and updates the view. Processing
involves communicating with the model, which may include saving data to or
retrieving data from the database.

It follows that MVC models especially support reusing existing software and
promote the parallel development of their three components. The data model of
an existing program can thus be changed without touching the essentials of the
program logic. The same is true for the code that handles the view. Most of
the time, the view and the data model are the two components that need to be
changed: the appearance and presentation of the software are adapted to the
new user group, and the data are adapted to the requirements of the new
application. Conversely, fixing bugs or making general changes in the
controller usually does not substantially affect the view or the data model.
Another positive consequence of MVC models is that several views (or even
models) can be used simultaneously, which means that the same data can be
presented differently on the user interface.
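The division of labor just described can be illustrated with a minimal Java sketch of MVC in its observer form. All class names (``WordModel``, ``WordView``, ``WordController`` and the two views) are hypothetical and not part of Morphilo or MyCoRe; the sketch only shows that the controller updates the model, the model notifies its views, and two views present the same data differently.

```java
import java.util.ArrayList;
import java.util.List;

// Observer side of the pattern: a view is told whenever the model changes.
interface WordView {
    void update(String word, String annotation);
    String render();
}

// Model: holds the data and notifies all registered views.
class WordModel {
    private final List<WordView> views = new ArrayList<>();
    private String annotation = "";

    void attach(WordView view) {
        views.add(view);
    }

    String getAnnotation() {
        return annotation;
    }

    void setAnnotation(String word, String annotation) {
        this.annotation = annotation;
        // The model pushes the change without knowing how each view presents it.
        for (WordView view : views) {
            view.update(word, annotation);
        }
    }
}

// First presentation: plain text.
class PlainView implements WordView {
    private String text = "";
    public void update(String word, String annotation) { text = word + ": " + annotation; }
    public String render() { return text; }
}

// Second presentation of the same data: an HTML fragment (no escaping, for brevity).
class HtmlView implements WordView {
    private String html = "";
    public void update(String word, String annotation) {
        html = "<span title=\"" + annotation + "\">" + word + "</span>";
    }
    public String render() { return html; }
}

// Controller: translates a user event into a model update.
class WordController {
    private final WordModel model;

    WordController(WordModel model) { this.model = model; }

    void onUserTagged(String word, String annotation) {
        model.setAnnotation(word, annotation);
    }
}
```

Attaching a ``PlainView`` and an ``HtmlView`` to one ``WordModel`` and calling ``onUserTagged`` once updates both presentations; neither view needs to know about the other, and the controller never touches a view directly.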
Morphilo Architecture
---------------------
.. figure:: images/architecture.png

   Figure 1: Basic Architecture of a Take-&-Share-Approach

The architecture of a possible *take-and-share* approach for language
resources is visualized in figure 1. Because the gist of the approach becomes
clearer with a concrete example, the case of annotating lexical derivatives of
Middle English with the help of the Morphilo Tool [#f1]_ using a
`MyCoRe repository <http://www.mycore.de>`_ is given as an illustration.
However, any other tool that helps with manual annotation and manages the
metadata of a corpus could be substituted here. [#f2]_
After an untagged corpus or plain text has been input, it is determined whether
the input material was previously annotated by a different user. This
information is usually provided by the metadata administered by the annotation
tool, in the case at hand the *Morphilo* component. An alternative is a simple
table look-up for all occurring words in the datasets Corpus 1 through
Corpus n. If the words are contained completely, the *yes*-branch is followed;
otherwise, the *no*-branch is taken. The difference between the two branches is
subtle, yet crucial. On both branches, the annotation tool (here *Morphilo*) is
called, which, first, sorts out all words that are not contained in the master
database (here the *MyCoRe* repository) and, second, makes reasonable
suggestions on an optimal annotation of the items. The suggestions made to the
user are based on simple string mapping against a saved list of prefixes and
suffixes, whereas the remainder of the mapping is defined as the word root. The
annotations are linked to the respective items (e.g. words) in the text, but
they are also persistently saved in an extra dataset, i.e. in one of the
datasets Corpus 1 through n delineated in figure 1, together with all available
metadata.
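The suggestion step can be sketched as simple string matching in Java. This is an illustration under stated assumptions, not Morphilo's actual code: the class ``AffixSplitter``, the record ``Segmentation``, the longest-match-first strategy, and the restriction to at most one prefix and one suffix are hypothetical simplifications.

```java
import java.util.Comparator;
import java.util.List;

// One suggested segmentation; an empty string means "no affix found".
record Segmentation(String prefix, String root, String suffix) {}

class AffixSplitter {
    private final List<String> prefixes;
    private final List<String> suffixes;

    AffixSplitter(List<String> prefixes, List<String> suffixes) {
        // Longest affixes first, so that e.g. "able" wins over "le".
        Comparator<String> longestFirst = Comparator.<String>comparingInt(String::length).reversed();
        this.prefixes = prefixes.stream().sorted(longestFirst).toList();
        this.suffixes = suffixes.stream().sorted(longestFirst).toList();
    }

    Segmentation suggest(String word) {
        String w = word.toLowerCase();
        String prefix = "";
        for (String p : prefixes) {
            // Require a non-empty remainder so the whole word is never consumed.
            if (w.startsWith(p) && w.length() > p.length()) {
                prefix = p;
                break;
            }
        }
        String rest = w.substring(prefix.length());
        String suffix = "";
        for (String s : suffixes) {
            if (rest.endsWith(s) && rest.length() > s.length()) {
                suffix = s;
                break;
            }
        }
        // Whatever remains after stripping the affixes is taken as the root.
        String root = rest.substring(0, rest.length() - suffix.length());
        return new Segmentation(prefix, root, suffix);
    }
}
```

With the prefix list ``com, un`` and the suffixes ``le, able, ness``, ``suggest("comfortable")`` yields *com* + *fort* + *able*, which the user can then confirm or correct.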
The difference between the two branches in figure 1 is that in the *yes*-branch
the newly created dataset is compared with all previous datasets of this text,
which is not possible if a text was not annotated before. Within this unit, all
deviations and congruencies of the annotated items are marked and counted. The
underlying assumption is that, with a growing number of comparable texts, the
annotations approach a theoretical true value of a correct annotation while
errors level out, provided that the sample size is large enough. What exactly
the distribution of errors and correct annotations looks like, and whether a
normal distribution can be assumed, is still the object of ongoing research.
Independent of the concrete results, however, the component called *compare
manual annotations* in figure 1 allows for specifying the exact form of the
sample population.
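The marking and counting of deviations and congruencies can be pictured as follows. The sketch assumes, purely for illustration, that each dataset is reduced to a map from word to annotation string; the class ``AnnotationComparer`` and its record ``Tally`` are hypothetical. Items that an earlier dataset does not contain are skipped rather than counted as deviations.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class AnnotationComparer {
    // Per-item tally of agreement with earlier datasets of the same text.
    record Tally(int congruent, int deviating) {}

    static Map<String, Tally> compare(Map<String, String> current, List<Map<String, String>> previous) {
        Map<String, Tally> result = new TreeMap<>();
        for (Map.Entry<String, String> item : current.entrySet()) {
            int same = 0;
            int different = 0;
            for (Map<String, String> earlier : previous) {
                String old = earlier.get(item.getKey());
                if (old == null) {
                    continue; // item was not annotated in that dataset
                }
                if (old.equals(item.getValue())) {
                    same++;
                } else {
                    different++;
                }
            }
            result.put(item.getKey(), new Tally(same, different));
        }
        return result;
    }
}
```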
In fact, it is necessary at that point to define the form of the distribution,
the sample size, and the rejection region. To put it simply, a uniform
distribution in the form of a threshold value of, e.g., 20 could be defined,
which specifies that a word has to be annotated identically by 20 different
users before it enters the master database.
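With such a uniform threshold, the quality check reduces to counting how many distinct users produced an identical annotation. The following Java sketch assumes annotations arrive as (user, word, annotation) triples; the record ``Vote``, the class ``ThresholdGate``, and the tie handling are hypothetical, and the value 20 is only the example figure from the text.

```java
import java.util.Comparator;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

// One manual annotation event: who annotated which word, and how.
record Vote(String user, String word, String annotation) {}

class ThresholdGate {
    private final int threshold;

    ThresholdGate(int threshold) {
        this.threshold = threshold;
    }

    // Returns the annotation of `word` that at least `threshold` distinct users agree on, if any.
    Optional<String> accepted(String word, List<Vote> votes) {
        Map<String, Set<String>> usersPerAnnotation = new HashMap<>();
        for (Vote v : votes) {
            if (!v.word().equals(word)) {
                continue;
            }
            // A set makes repeated annotations by the same user count only once.
            usersPerAnnotation.computeIfAbsent(v.annotation(), k -> new HashSet<>()).add(v.user());
        }
        // If several annotations pass, prefer the one backed by the most users.
        return usersPerAnnotation.entrySet().stream()
                .filter(e -> e.getValue().size() >= threshold)
                .max(Comparator.comparingInt(e -> e.getValue().size()))
                .map(Map.Entry::getKey);
    }
}
```

Constructed as ``new ThresholdGate(20)``, the gate returns an empty ``Optional`` until the twentieth distinct user has submitted the same annotation; repeated submissions by one user do not count twice.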
Continuing the information flow in figure 1, the threshold values or, if so
defined, the results of the statistical calculation for other distributions are
delivered to the quality-control component. Based on these statistics, the
respective items together with their metadata, frequencies, and, of course,
annotations are written to the master database. All information in the master
database is used directly for automated annotation; that is, it is matched
against the input texts or corpora through the *Morphilo* tool. Based on the
entries looked up in the master database, the annotation tool decides which
items are to be annotated manually.
[...] the user will have access to the annotations made in the respective
dataset, can correct them, or save them and resume later. It is important to
note that the user receives the tagged document only after all items have been
fully annotated; partially tagged text cannot be output.
Repository Framework
--------------------
.. figure:: images/mycore_architecture-2.png

   Figure 2: `MyCoRe <http://www.mycore.de>`_-Architecture and Components

To adapt the repository framework, the Morphilo application logic has to be
implemented, a data model specified, and the input, search, and output masks
programmed. Three directories are important for adjusting the MyCoRe
framework to the needs of one's own application. These three directories
correspond essentially to the three components of the MVC model explicated
above. Roughly, they are also visualized in the upper right-hand corner of
figure 2. More precisely, the view (*Layout* in figure 2) and the model layer
(*Datenmodell*, i.e. data model, in figure 2) can be configured completely via
the *interface*, which is a directory with a predefined structure and some
standard files. For the configuration of the logic, an extra directory is
provided (*/src/main/java/custom/mycore/addons/*), to which all Java classes
extending the controller layer should be added.

Practically, all three MVC layers are placed in the *src/main/* directory of
the application. In one of its subdirectories, *datamodel/def*, the data model
specifications are defined as XML files. This parallels the model layer in the
MVC pattern. How the data model was defined is explained in the section Data
Model.
.. rubric:: Notes
.. [#f1] Peukert, H. (2012): From Semi-Automatic to Automatic Affix Extraction in Middle English Corpora: Building a Sustainable Database for Analyzing Derivational Morphology over Time, Empirical Methods in Natural Language Processing, Wien, Scientific series of the ÖGAI, 413-23.
.. [#f2] The source code of a possible implementation is available on https://github.com/amadeusgwin/morphilo. The software runs in test mode on https://www.morphilo.uni-hamburg.de/content/index.xml.
.. [#f3] Butz, Andreas; Antonio Krüger (2017): Mensch-Maschine-Interaktion, De Gruyter, 93ff.
Data Model
==========
.. _concept:
Conceptualization
-----------------
[...] and multi-user processing is necessary. In addition, the framework should
support web technologies, be well documented, and be easy to extend. Ideally,
it realizes the MVC pattern.
The guidelines of the `TEI standard <http://www.tei-c.org/release/doc/tei-p5-doc/en/Guidelines.pdf>`_
on the word level are in line with the word structure outlined above.
In listing :ref:`teiexamp`, an example of a possible markup at the word level
is given for `comfortable <http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-m.html>`_.
.. _teiexamp:

.. code-block:: xml
   :caption: TEI-example for *comfortable*

   <w type="adjective">
     <m type="base">
       <m type="prefix" baseForm="con">com</m>
       ...
     </m>
     <m type="suffix">able</m>
   </w>
This data model reflects just one theoretical conception of a word structure model.
Crucially, the model emanates from the assumption [...]

[...] other hand, is enclosed in the base, which basically means a stronger lexical,
and less abstract, attachment to the root of a word. Modeling prefixes and suffixes on different
hierarchical levels has important consequences for the branching direction at
the subword level (here right-branching). Leaving the theoretical interest aside, the
choice of the *TEI* standard is reasonable with a view to a sustainable architecture that allows for
exchanging data with little to no additional adjustment.
The downside is that the model is not suitable for all languages. [...]
stem and corresponds to the overwhelming majority of all research carried out [...]
Implementation
--------------
It is advantageous to use established standards, and it makes sense
to keep the metadata of each corpus separate from the data model used for the
words to be analyzed.
For the present case, the *TEI* standard was identified as an
appropriate markup for words. In terms of implementation, this means that
the *TEI* guidelines have to be implemented as an object type compatible with the chosen
repository framework. However, the TEI standard is not complete regarding the
diachronic dimension, i.e. information on the development of the word. To
be compatible with the elements of the TEI standard on the one hand
and to best meet the requirements of the application on the other, some attributes
were added. This solution allows for processing the XML files according to
the *TEI* standard by ignoring the additional attributes while, if needed, the
additional markup can still be extracted. The additional attributes
comprise a link to the corpus metadata, but also the *position* and
*occurrence* of the affixes.
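To illustrate why additional attributes do not break TEI-conformant processing, the sketch below parses a word with the JDK's standard DOM API. The attribute names ``position`` and ``occurrence`` on the ``<m>`` elements, and their values, are assumptions for illustration; where exactly Morphilo places them is governed by its word data model. A TEI-only consumer reads ``type`` and the text and simply never asks for the extra attributes, whereas an extended consumer extracts them. The class ``TeiWordReader`` is hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

class TeiWordReader {
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("not well-formed XML", e);
        }
    }

    // TEI-only reading: affix type and form; additional attributes are never read.
    static List<String> teiAffixes(Document doc) {
        List<String> out = new ArrayList<>();
        NodeList ms = doc.getElementsByTagName("m");
        for (int i = 0; i < ms.getLength(); i++) {
            Element m = (Element) ms.item(i);
            String type = m.getAttribute("type");
            if (type.equals("prefix") || type.equals("suffix")) {
                out.add(type + ":" + m.getTextContent());
            }
        }
        return out;
    }

    // Extended reading: also extracts the hypothetical position/occurrence attributes.
    static List<String> extendedAffixes(Document doc) {
        List<String> out = new ArrayList<>();
        NodeList ms = doc.getElementsByTagName("m");
        for (int i = 0; i < ms.getLength(); i++) {
            Element m = (Element) ms.item(i);
            if (m.hasAttribute("position")) {
                out.add(m.getTextContent() + " position=" + m.getAttribute("position")
                        + " occurrence=" + m.getAttribute("occurrence"));
            }
        }
        return out;
    }
}
```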
Information on the position, and some quantification thereof, is potentially relevant for a
wealth of research questions, such as predictions on the productivity of
derivatives and their interaction with the phonological or syntactic modules. They were
therefore included with a view to future use.
For reasons of efficiency in subsequent processing,
the historic dates *begin* and *end* were included in both the word
data model and the corpus data model. The resulting word data model is given
in listing :ref:`worddatamodel`.
Whereas the attributes of the *objecttype* are specific to the repository framework, the TEI structure can be
recognized in the hierarchy of the metadata element starting with the name
*w* (line 17).
.. _worddatamodel:

.. code-block:: xml
   :caption: Word Data Model
   :linenos:
   :emphasize-lines: 17

   <?xml version="1.0" encoding="UTF-8"?>
   <objecttype
     name="morphilo"
     isChild="true"
     ...
           <xs:element name="morphilo">
             <xs:complexType>
               <xs:sequence>
                 <xs:element name="w" minOccurs="0" maxOccurs="unbounded">
                   <xs:complexType mixed="true">
                     <xs:sequence>
                       <!-- stem -->
                       ...
       </element>
     </metadata>
   </objecttype>
Additionally, it is worth mentioning that some attributes are modeled as a
*classification*. All of these have to be listed
as separate elements in the data model. This has been done for all attributes
that are subject to little or no change. In fact, all suffix
and prefix morphemes should be known for the language investigated and are
therefore defined as a classification.
The same is true for the parts of speech, named *pos* in the Morphilo data
model above; here the Penn Treebank tagset was used. Lastly, the different morphemic layers in
the standard model, named *m*, are changed to *m1* through *m5*. This is the
only change to the standard that could be problematic if the data is to be
processed elsewhere and the change is not documented explicitly. Yet this
change was necessary because the MyCoRe repository throws errors caused by ambiguity
issues on the different *m*-layers.
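A renaming of this kind might look as follows in code. The Java sketch assumes that the layer number is simply the nesting depth of an ``<m>`` element below ``<w>`` (outermost ``m1``); this is an interpretation, not a documented rule of the Morphilo data model. It uses the standard DOM Level 3 method ``Document.renameNode``; the class ``MLayerRenamer`` is hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

class MLayerRenamer {
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("not well-formed XML", e);
        }
    }

    // Renames every <m> below `parent` to m<depth>; depth grows by one per enclosing <m>.
    static void rename(Document doc, Element parent, int depth) {
        // Copy the element children first, since renaming may replace nodes in a live NodeList.
        List<Element> children = new ArrayList<>();
        NodeList nl = parent.getChildNodes();
        for (int i = 0; i < nl.getLength(); i++) {
            if (nl.item(i).getNodeType() == Node.ELEMENT_NODE) {
                children.add((Element) nl.item(i));
            }
        }
        for (Element child : children) {
            if (child.getTagName().equals("m")) {
                if (depth > 5) {
                    throw new IllegalStateException("only m1 through m5 are defined");
                }
                // renameNode may return a new node, so continue with its return value.
                Element renamed = (Element) doc.renameNode(child, null, "m" + depth);
                rename(doc, renamed, depth + 1);
            } else {
                rename(doc, child, depth);
            }
        }
    }
}
```

Calling ``rename(doc, doc.getDocumentElement(), 1)`` on a parsed ``<w>`` element turns the base and suffix into ``m1`` and the morphemes inside the base into ``m2``.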
The second data model describes only a few properties of the text corpora
from which the words are extracted. Listing :ref:`corpusdatamodel` depicts
only the metadata element. For the sake of simplicity of the prototype, this
data model is kept as simple as possible. The obligatory field is the name of
the corpus. Specific dates of the corpus are classified as optional because in
some cases a text cannot be dated reliably.
.. _corpusdatamodel:

.. code-block:: xml
   :caption: Corpus Data Model

   <metadata>
     <!-- mandatory fields -->
     <element name="korpusname" type="text" minOccurs="1" maxOccurs="1"/>
     ...
       <target type="morphilo"/>
     </element>
   </metadata>
As a final remark, one might have noticed that all attributes are modeled as
strings, although other data types are available and fields encoding the dates or
the number of words suggest otherwise. The MyCoRe framework even provides a
data type *historydate*. There is no very satisfying answer as to why it was not used.
All that can be said is that using data types other than string
later leads to problems in the integration of the search engine and the
repository framework. These issues seem to be well known and can be followed on
`GitHub <https://github.com/MyCoRe-Org>`_.
Framework
=========
.. figure:: images/mycore_architecture-2.png

   Figure 2: MyCoRe-Architecture and Components [#f1]_
To adapt the MyCoRe framework, the Morphilo application logic has to be implemented,
a data model specified, and the input, search, and output masks programmed.
Three directories are important for adjusting the MyCoRe framework to the needs of
one's own application. These three directories correspond essentially to the three
components of the MVC model as explicated in the section MVC Model. Roughly, they are
visualized in the upper right-hand corner of figure 2. More precisely, the view
(*Layout* in figure 2) and the model layer (*Datenmodell*, i.e. data model, in figure 2)
can be configured completely via the *interface*, which is a directory with a predefined
structure and some standard files. For the configuration of the logic, an extra directory
is provided (*/src/main/java/custom/mycore/addons/*), to which all Java classes
extending the controller layer should be added.

Practically, all three MVC layers are placed in the [...]
*datamodel/def*, the data model specifications are defined as XML files. This parallels the model
layer in the MVC pattern. How the data model was defined is explained in the section Data Model.
.. rubric:: Notes
.. [#f1] source: https://www.mycore.de
View
====
Conceptualization
-----------------
The MyCoRe directory *src/main/resources* contains all code needed
for rendering the data to be displayed on the screen, so it corresponds to
the view in an MVC approach. Rendering is done by XSL files that, unfortunately,
contain some logic that really belongs to the controller. Thus, the division is
not as clear as implied in theory. I will point out this issue more specifically in the
relevant subsection below. Among the resources are also all images, styles, and
JavaScript files.
Implementation
--------------
The view component handles the visual
representation in the form of an interface that allows interaction between
the user and the task to be carried out by the machine. Since the present case is a
web service, all interaction happens via a browser, i.e. web pages are
visualized and responses are recognized by registering mouse or keyboard
events. More specifically, a web page is rendered by transforming XML documents
into HTML pages. The MyCoRe repository framework uses an open-source XSLT
processor from Apache, `Xalan <http://xalan.apache.org>`_. This engine
transforms document nodes described by the XPath syntax into hypertext, making
use of a special form of template matching. All templates are collected in so-called
XML-encoded stylesheets. Since there are two data models with two
different structures, it is good practice to define two stylesheet files, one for
each data model.
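The XSLT step can be reproduced with the JDK's standard ``javax.xml.transform`` API. The sketch below is a miniature under stated assumptions: the stylesheet and the input document are made up for illustration, and the JDK's built-in processor (which is derived from Apache Xalan) stands in for the Xalan setup that MyCoRe ships with. It shows one template matching a node via an XPath pattern and emitting HTML; the class ``WordRenderer`` is hypothetical.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class WordRenderer {
    // Hypothetical miniature stylesheet: one template matches the root <w> and renders HTML.
    static final String XSL = """
        <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
          <xsl:output method="html"/>
          <xsl:template match="/w">
            <p class="word"><xsl:value-of select="."/> (<xsl:value-of select="@type"/>)</p>
          </xsl:template>
        </xsl:stylesheet>
        """;

    static String render(String xml) {
        try {
            Transformer t = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(XSL)));
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("transformation failed", e);
        }
    }
}
```

Rendering ``<w type="adjective">comfortable</w>`` produces a paragraph ``<p class="word">comfortable (adjective)</p>``; MyCoRe's real stylesheets follow the same match-and-emit principle, only with many more templates and parameters.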
As a demonstration, a short extract for rendering the word data is given in
the listing below.
.. code-block:: xml
   :caption: word data rendering in morphilo.xsl
   :name: morphilo.xsl

   <?xml version="1.0" encoding="UTF-8"?>
   <xsl:stylesheet
     xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
     ...
     xmlns:mods="http://www.loc.gov/mods/v3"
     xmlns:encoder="xalan://java.net.URLEncoder"
     xmlns:mcrxsl="xalan://org.mycore.common.xml.MCRXMLFunctions"
     xmlns:mcrurn="xalan://org.mycore.urn.MCRXMLFunctions"
     exclude-result-prefixes="xalan xlink mcr i18n acl mods mcrxsl mcrurn encoder"
     version="1.0">
     <xsl:param name="MCR.Users.Superuser.UserName"/>
     <xsl:template match="/mycoreobject[contains(@ID,'_morphilo_')]">
       <head>
         <link href="{$WebApplicationBaseURL}css/file.css" rel="stylesheet"/>
         ...
     </xsl:template>
     ...
   </xsl:stylesheet>
This template matches
the root node of each *MyCoRe object*, ensuring that a valid MyCoRe model is
used and checking that the document to be processed contains a unique
identifier, here a *MyCoRe-ID*, and the name of the correct data model,
here *morphilo*.
Then another template, *objectAction*, is called together with two parameters, the IDs
of the document object and of the attached files. In the remainder, all relevant
information from the document, such as the word and the lemma, is accessed by XPath,
enriched with hypertext annotations, and rendered as a hypertext document.
The template *objectAction* is key to understanding the coupling process in the software
framework. It is therefore listed separately in :ref:`objActionTempl`.
.. _objActionTempl:

.. code-block:: xml
   :caption: template objectAction
   :linenos:
   :emphasize-lines: 7, 15, 19

   <xsl:template name="objectAction">
     <xsl:param name="id" select="./@ID"/>
     <xsl:param name="accessedit" select="acl:checkPermission($id,'writedb')"/>
     <xsl:param name="accessdelete" select="acl:checkPermission($id,'deletedb')"/>
     <xsl:variable name="derivCorp" select="./@label"/>
     <xsl:variable name="corpID" select="metadata/def.corpuslink[@class='MCRMetaLinkID']/corpuslink/@xlink:href"/>
     <xsl:if test="$accessedit or $accessdelete">
       <div class="dropdown pull-right">
         <xsl:if test="string-length($corpID) &gt; 0 or $CurrentUser='administrator'">
           <button class="btn btn-default dropdown-toggle" style="margin:10px" type="button" id="dropdownMenu1" data-toggle="dropdown" aria-expanded="true">
             ...
             <span class="caret"></span>
           </button>
         </xsl:if>
         <xsl:if test="string-length($corpID) &gt; 0">
           <xsl:variable name="ifsDirectory" select="document(concat('ifs:/',$derivCorp))"/>
           <ul class="dropdown-menu" role="menu" aria-labelledby="dropdownMenu1">
             <li role="presentation">
               <a href="{$ServletsBaseURL}object/tag{$HttpSession}?id={$derivCorp}&amp;objID={$corpID}" role="menuitem" tabindex="-1">
                 <xsl:value-of select="i18n:translate('object.nextObject')"/>
               </a>
             </li>
             ...
       </div>
     </xsl:if>
   </xsl:template>
The *objectAction* template defines the selection menu that appears, once manual tagging has
started, on the upper right-hand side of the web page entitled
*Annotieren* (annotate) and displays the two options *next word* and *back
to project*.
The first thing to note here is that in line 7 a simple test
excludes all guest users from accessing the procedure. After ensuring that only
the user who owns the corpus project has access (line 15), s/he will be
able to access the drop-down menu, which is really a URL, e.g. line 19.
The attentive reader might have noticed that
the URL exactly matches the definition in the *web-fragment.xml* as shown in
listing :ref:`webxml`, line 17, which resolves to the
respective Java class there. Really, this mechanism is the data interface within the
MVC pattern. The URL also contains two variables, named *derivCorp* and
*corpID*, that the Java classes need to identify the corpus and file object
(see section :ref:`controller-section`).
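On the controller side, the two variables only have to be read back from the query string, where ``id`` carries *derivCorp* and ``objID`` carries *corpID*. In a servlet this would normally be done through the request object; the self-contained sketch below instead parses the URL with ``java.net.URI`` and ``URLDecoder`` so that it runs without a servlet container. The class ``TagLinkParser`` and the example URL, including host and identifier values, are hypothetical.

```java
import java.net.URI;
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

class TagLinkParser {
    // Splits the raw query string into decoded key/value pairs, e.g. id=...&objID=...
    static Map<String, String> queryParameters(String url) {
        Map<String, String> params = new LinkedHashMap<>();
        String query = URI.create(url).getRawQuery();
        if (query == null) {
            return params;
        }
        for (String pair : query.split("&")) {
            int eq = pair.indexOf('=');
            String key = eq < 0 ? pair : pair.substring(0, eq);
            String value = eq < 0 ? "" : pair.substring(eq + 1);
            params.put(URLDecoder.decode(key, StandardCharsets.UTF_8),
                       URLDecoder.decode(value, StandardCharsets.UTF_8));
        }
        return params;
    }
}
```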
The *morphilo.xsl* stylesheet contains yet another modification that deserves mention.
In listing :ref:`derobjectTempl`, line 18, two menu options,
*Tag automatically* and *Tag manually*, are defined. The former option
initiates *ProcessCorpusServlet.java*, as can be seen again in listing :ref:`webxml`,
line 7, which determines the words that are not in the master database.
Still, it is important to note that the menu option is only displayed if two restrictions
are met. First, a file has to be uploaded (line 19) and, second, there must be
only one file. This is necessary because other files are generated during the annotation process:
one stores the words that have not yet been processed, another includes the final result. The
generated files follow a certain naming pattern. The file harboring the final, entire TEI-annotated
corpus is prefixed with *tagged*, the other file with *untagged*. This circumstance
is exploited for controlling the second option (line 27): a loop runs through all
files in the respective directory, and if a file name starts with *untagged*,
the option to tag manually is displayed.
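The display rules for the two menu options can be restated in plain Java to make the conditions explicit. This is a hedged reconstruction from the description above, not the XSL code itself: the class ``DerivateMenu``, its method, and the option labels as return values are hypothetical, and the input is simply the list of file names in the derivate directory.

```java
import java.util.ArrayList;
import java.util.List;

class DerivateMenu {
    // "Tag automatically" requires exactly one uploaded file;
    // "Tag manually" appears once a file whose name starts with "untagged" exists.
    static List<String> options(List<String> fileNames) {
        List<String> options = new ArrayList<>();
        if (fileNames.size() == 1) {
            options.add("Tag automatically");
        }
        for (String name : fileNames) {
            if (name.startsWith("untagged")) {
                options.add("Tag manually");
                break; // one matching file is enough to show the option
            }
        }
        return options;
    }
}
```

Because the prefixes differ, a finished corpus file starting with *tagged* never triggers the manual option, while any leftover *untagged* file does.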
.. _derobjectTempl:

.. code-block:: xml
   :caption: derobject template
   :linenos:
   :emphasize-lines: 18, 19, 27

   <xsl:template match="derobject" mode="derivateActions">
     <xsl:param name="deriv" />
     <xsl:param name="parentObjID" />
     ...
       </div>
     </xsl:if>
   </xsl:template>
Besides the two stylesheets *morphilo.xsl* and *corpmeta.xsl*, other stylesheets have
to be adjusted. They are not discussed in detail here because they are largely self-explanatory.
Essentially, they render the overall layout (*common-layout.xsl*, *skeleton_layout_template.xsl*),
the presentation of the search results (*response-page.xsl*) and the definitions of the Solr search fields (*searchfields-solr.xsl*).
The latter two also inherit templates from *response-general.xsl* and *response-browse.xsl*, in which the
navigation bar of the search results can be changed. To support multiple languages, a separate configuration directory
has to be created containing one *.properties* file per display language.
In the current case these are restricted to German and English (*messages_de.properties* and *messages_en.properties*).
The property files contain all *i18n* definitions. All these files are located in the *resources* directory.
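
The following stylesheet sketch shows how such an *i18n* definition could be looked up when a page is rendered. The key ``morphilo.action.annotate`` is hypothetical. The sketch assumes that *messages_en.properties* contains the line ``morphilo.action.annotate=Annotate corpus`` and that *messages_de.properties* contains the line ``morphilo.action.annotate=Korpus annotieren``. The namespace binding ``xalan://org.mycore.services.i18n.MCRTranslation`` and its ``translate`` function are assumed to be the Xalan extension that MyCoRe stylesheets commonly use for translations. Both should be checked against the framework version in use.

.. code-block:: xml

   <?xml version="1.0" encoding="UTF-8"?>
   <!-- Hypothetical sketch: the key name is an assumption, and the i18n prefix is
        assumed to map to MyCoRe's translation class via a Xalan extension. -->
   <xsl:stylesheet version="1.0"
       xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
       xmlns:i18n="xalan://org.mycore.services.i18n.MCRTranslation"
       exclude-result-prefixes="i18n">

     <xsl:template name="annotateOption">
       <!-- Assumed to resolve the key in the properties file of the current
            language, e.g. messages_de.properties or messages_en.properties. -->
       <li>
         <xsl:value-of select="i18n:translate('morphilo.action.annotate')"/>
       </li>
     </xsl:template>
   </xsl:stylesheet>

Keeping the label text out of the stylesheet means that supporting an additional language only requires a further properties file, not a change to the XSL.
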
Furthermore, a search mask and a page for manually entering the annotations had
to be designed.
For these pages, the repository framework recommends a dedicated XML format (*xed*).
<head>
<meta http-equiv="X-UA-Compatible" content="IE=Edge" />
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<title>Morphilo Project Documentation &#8212; Morphilo documentation</title>
<link rel="stylesheet" href="_static/alabaster.css" type="text/css" />
<link rel="stylesheet" href="_static/pygments.css" type="text/css" />
<script type="text/javascript" src="_static/documentation_options.js"></script>
<script type="text/javascript" src="_static/doctools.js"></script>
<link rel="index" title="Index" href="genindex.html" />
<link rel="search" title="Search" href="search.html" />
<link rel="next" title="Software Design" href="source/architecture.html" />
<link rel="stylesheet" href="_static/custom.css" type="text/css" />
<div class="bodywrapper">
<div class="body" role="main">
<div class="section" id="morphilo-project-documentation">
<h1>Morphilo Project Documentation<a class="headerlink" href="#morphilo-project-documentation" title="Permalink to this headline"></a></h1>
<div class="toctree-wrapper compound">
<p class="caption"><span class="caption-text">Contents:</span></p>
<ul>
<li class="toctree-l1"><a class="reference internal" href="source/architecture.html">Software Design</a></li>
<li class="toctree-l1"><a class="reference internal" href="source/framework.html">Framework</a></li>
<li class="toctree-l1"><a class="reference internal" href="source/controller.html">Controller Adjustments</a><ul>
<li class="toctree-l2"><a class="reference internal" href="source/controller.html#general-principle-of-operation">General Principle of Operation</a></li>
<li class="toctree-l2"><a class="reference internal" href="source/controller.html#conceptualization">Conceptualization</a></li>
<li class="toctree-l2"><a class="reference internal" href="source/controller.html#implementation">Implementation</a></li>
</ul>
</li>
</ul>
</div>
</div>
<div class="sphinxsidebarwrapper">
<h3><a href="#">Table Of Contents</a></h3>
<ul>
<li><a class="reference internal" href="#">Morphilo Project Documentation</a></li>
<li><a class="reference internal" href="#indices-and-tables">Indices and tables</a></li>
</ul>
<div class="relations">
<h3>Related Topics</h3>
<ul>
<li><a href="#">Documentation overview</a><ul>
<li>Next: <a href="source/architecture.html" title="next chapter">Software Design</a></li>
</ul></li>
</ul>
</div>