Software Design
===============
\begin{figure}
\centering
\includegraphics[scale=0.8]{architecture.pdf}
\caption{Morphilo Architecture}
\label{fig:architect}
\end{figure}
The architecture of a possible \emph{take-and-share} approach for language
resources is visualized in figure \ref{fig:architect}. Since the gist
of the approach becomes clearer with a concrete example, the case of
annotating lexical derivatives of Middle English and a corresponding database is
used as an illustration.
However, any other tool that helps with manual annotations and manages metadata of a corpus could be
substituted here instead.
After inputting an untagged corpus or plain text, it is determined whether the
input material was annotated previously by a different user. This information is
usually provided by the metadata administered by the annotation tool; in the case at
hand it is called \emph{Morphilizer} in figure \ref{fig:architect}. An
alternative is a simple table look-up for all occurring words in the datasets Corpus 1 through Corpus n. If all words are
contained, the \emph{yes}-branch is followed -- otherwise the \emph{no}-branch
is taken. The difference between the two branches is subtle, yet crucial. On
both branches, the annotation tool (here \emph{Morphilizer}) is called, which, first,
sorts out all words that are not contained in the master database (here \emph{Morphilo-DB})
and, second, makes reasonable suggestions on an optimal annotation of
the items. In both cases the
annotations are linked to the respective items (e.g. words) in the
text, but they are also persistently saved in an extra dataset, i.e. Corpus 1
through n, together with all available metadata.
The difference between both information streams is that
in the \emph{yes}-branch a comparison between the newly created dataset and
all of the previous datasets of this text is carried out. Within this
unit, all deviations and congruencies are marked and counted. The underlying
assumption is that, with a growing number of comparable texts, the
correct annotations approach a theoretical true value
while errors level out, provided that the sample size is large enough. What the
distribution of errors and correct annotations looks like exactly, and whether a
normal distribution can be assumed, is still the object of ongoing research. Independent
of the concrete results, however, the component (called \emph{compare
manual annotations} in figure \ref{fig:architect}) allows for specifying the
exact form of the sample population.
In fact, it is necessary at that point to define the form of the distribution,
the sample size, and the rejection region. The standard settings are a normal
distribution, a rejection region of $\alpha = 0.05$, and a sample size of $30$, so
that a simple Gau\ss{} test can be calculated.
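To make this default setting concrete, the following sketch shows how such a Gau\ss{} test could be computed. The class and method names are hypothetical and not part of the Morphilo code base; the sketch merely illustrates the test statistic $z = (\bar{x} - \mu_0)/(\sigma/\sqrt{n})$ with the two-sided critical value for $\alpha = 0.05$.
\begin{lstlisting}[language=java,caption={Sketch of the default Gau\ss{} test},label=src:gaussSketch]
//Hypothetical sketch: a one-sample Gauss (z-)test over agreement rates,
//using the default setting of alpha = 0.05 and a sample size of about 30.
public class GaussTestSketch
{
    /**
     * @param agreementRates observed agreement per comparable text (the sample)
     * @param mu0            assumed true agreement rate under the null hypothesis
     * @param sigma          assumed standard deviation of the population
     * @return true if the null hypothesis is rejected at alpha = 0.05 (two-sided)
     */
    public static boolean rejectNullHypothesis(double[] agreementRates, double mu0, double sigma)
    {
        int n = agreementRates.length;
        double sum = 0.0;
        for (double rate : agreementRates)
        {
            sum += rate;
        }
        double mean = sum / n;
        //test statistic of the Gauss test
        double z = (mean - mu0) / (sigma / Math.sqrt(n));
        //critical value of the standard normal distribution for alpha = 0.05
        return Math.abs(z) > 1.96;
    }
}
\end{lstlisting}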
Continuing the information flow further, these statistical calculations are
delivered to the quality-control-component. Based on the statistics, the
respective items together with the metadata, frequencies, and, of course,
annotations are written to the master database. All information in the master
database is directly used for automated annotations. Thus it is directly matched
to the input texts or corpora through the \emph{Morphilizer} tool.
Based on the entries looked up in the master database, the annotation tool decides which items
are to be annotated manually.
The processes just described are all hidden from the user, who cannot influence
the set quality standards other than through errors in the annotation process. The
user will only see the number of items of the input text that he or she has to process manually. The
annotator will thus see an estimate of the workload beforehand and, based on this
number, can decide whether to start the annotation at all. It will be
possible to interrupt the annotation work and save progress on the server, and
the user will have access to the annotations made in the respective dataset in order to
correct them or save them and resume later. It is important to note that the user will receive
the tagged document only after all items are fully annotated; no partially
tagged text can be output.
Controller Adjustments
======================
General Principle of Operation
------------------------------
Figure \ref{fig:classDiag} illustrates the dependencies of the five Java classes that were integrated to add the Morphilo
functionality, defined in the package \emph{custom.mycore.addons.morphilo}. The general principle of operation
is the following. The handling of data search, upload, saving, and user
authentication is fully left to the MyCoRe functionality, which is completely
implemented. The class \emph{ProcessCorpusServlet.java} receives a request from the web interface to process an uploaded file,
i.e. a simple text corpus, and checks whether any of the words are available in the master database. All words that are not
listed in the master database are written to an extra file; these are the words that have to be annotated manually. At the end, the
servlet sends a response back to the user interface. If all words are contained in the master, an xml file is generated from the
master database that includes all annotated words of the original corpus. Usually this will not be the case for larger text files,
so if some words are not in the master, the user gets the response to initiate the manual annotation process.
The manual annotation process is handled by the class
\emph{{Tag\-Corpus\-Serv\-let\-.ja\-va}}, which builds a JDOM object for the first word in the extra file.
This is done by creating an object of the \emph{JDOMorphilo.java} class. This class, in turn, uses the methods of
\emph{AffixStripper.java}, which make simple but reasonable suggestions on the word structure. The JDOM object is then
given as a response back to the user. It is presented as a form in which the user can make changes; this is necessary
because the word structure algorithm of \emph{AffixStripper.java} errs in some cases. Once the user agrees on the
suggestions or on his or her corrections, the JDOM object is saved as an xml document that is only searchable, visible, and
changeable by the authenticated user (and the administrator). Another file containing all processed words is created or
updated respectively, and the \emph{TagCorpusServlet.java} servlet restarts until the last word in the extra list is
processed. This enables the user to stop and resume his or her annotation work at a later point in time. The
\emph{TagCorpusServlet} calls methods from \emph{ProcessCorpusServlet.java} to adjust the content of the extra
file harboring the untagged words. If this file is empty, and only then, it is replaced by a file comprising all words
from the original text file -- both the ones from the master database and the ones annotated by the user --
in an annotated xml representation.
Each time \emph{ProcessCorpusServlet.java} is instantiated, it also instantiates \emph{QualityControl.java}. This class checks whether a
new word can be transferred to the master database. The algorithm can be freely adapted to higher or lower quality standards.
In its present configuration, a method tests against a threshold of 20 different
registered users agreeing on the annotation of the same word. More specifically,
if 20 JDOM objects are identical except in the attribute field \emph{occurrences} in the metadata node, the JDOM object becomes
part of the master. The latter is easily done by changing the attribute \emph{creator} from the user name
to \emph{``administrator''} in the service node, which makes the dataset part of the master database. Moreover, the \emph{occurrences}
attribute is updated by adding up all occurrences of the word that stem from
different text corpora of the same time range.
\begin{landscape}
\begin{figure}
\centering
\includegraphics[scale=0.55]{morphilo_uml.png}
\caption{Class Diagram Morphilo}
\label{fig:classDiag}
\end{figure}
\end{landscape}
Conceptualization
-----------------
The controller component is largely
specified and ready to use in some hundred or so Java classes handling the
logic of the search, such as indexing, but also dealing with directories and
files, i.e. saving, creating, deleting, and updating files.
Moreover, a rudimentary user management comprising different roles and
rights is offered. The basic technology behind the controller's logic is the
servlet. As such, all new code has to be registered as a servlet in the
web-fragment.xml of the servlet container (here Apache Tomcat), as listing \ref{lst:webfragment} shows.
\begin{lstlisting}[language=XML,caption={Servlet Registering in the
web-fragment.xml (excerpt)},label=lst:webfragment,escapechar=|]
<servlet>
<servlet-name>ProcessCorpusServlet</servlet-name>
<servlet-class>custom.mycore.addons.morphilo.ProcessCorpusServlet</servlet-class>
</servlet>
<servlet-mapping>
<servlet-name>ProcessCorpusServlet</servlet-name>
<url-pattern>/servlets/object/process</url-pattern>|\label{ln:process}|
</servlet-mapping>
<servlet>
<servlet-name>TagCorpusServlet</servlet-name>
<servlet-class>custom.mycore.addons.morphilo.TagCorpusServlet</servlet-class>
</servlet>
<servlet-mapping>
<servlet-name>TagCorpusServlet</servlet-name>
<url-pattern>/servlets/object/tag</url-pattern>|\label{ln:tag}|
</servlet-mapping>
\end{lstlisting}
Now, the logic has to be extended by the specifications analyzed in chapter
\ref{chap:concept} on conceptualization. More specifically, some
classes have to be added that take care of analyzing words
(\emph{AffixStripper.java, InflectionEnum.java, SuffixEnum.java,
PrefixEnum.java}), extracting the relevant words from the text and checking the
uniqueness of the text (\emph{ProcessCorpusServlet.java}), making reasonable
suggestions on the annotation (\emph{TagCorpusServlet.java}), building the object
of each annotated word (\emph{JDOMorphilo.java}), and checking the quality by applying
statistical models (\emph{QualityControl.java}).
Implementation
--------------
Having taken a bird's eye perspective in the previous chapter, it is now time to take a look at the specific implementation at the level
of methods. Starting with the main servlet, \emph{ProcessCorpusServlet.java}, the class defines five getter methods:
\renewcommand{\labelenumi}{(\theenumi)}
\begin{enumerate}
\item\label{itm:geturl} public String getURLParameter(MCRServletJob, String)
\item\label{itm:getcorp} public String getCorpusMetadata(MCRServletJob, String)
\item\label{itm:getcont} public ArrayList<String> getContentFromFile(MCRServletJob, String)
\item\label{itm:getderiv} public Path getDerivateFilePath(MCRServletJob, String)
\item\label{itm:now} public int getNumberOfWords(MCRServletJob job, String)
\end{enumerate}
Since each servlet in MyCoRe extends the class MCRServlet, it has access to MCRServletJob, from which the http requests and responses
can be used; this is the first argument in the above methods. The second argument of method (\ref{itm:geturl}) specifies the name of a url parameter, i.e.
the object id or the id of the derivate. The method returns the value of the given parameter. Typically, MyCoRe uses the url to exchange
these ids. The second method provides us with the value of a data field in the xml document; here the string defines the name of an attribute.
Method (\ref{itm:getcont}), \emph{getContentFromFile(MCRServletJob, String)}, returns the words of a file as a list when given the file name as a string.
The getter listed in (\ref{itm:getderiv}) returns the path from the MyCoRe repository when the name of
the file is specified. And finally, method (\ref{itm:now}) returns the number of words by simply returning
\emph{getContentFromFile(job, fileName).size()}.
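As an illustration of how little logic these getters contain, a sketch of method (\ref{itm:now}) is given below; apart from the call to \emph{getContentFromFile}, which follows from the description just given, its exact form is an assumption.
\begin{lstlisting}[language=java,caption={Sketch of getter (5)},label=src:nowSketch]
//Sketch of getter (5); the body follows from the description above and is
//otherwise an assumption about the actual implementation.
public int getNumberOfWords(MCRServletJob job, String fileName) throws Exception
{
    //delegate to getter (3) and count the returned word list
    return getContentFromFile(job, fileName).size();
}
\end{lstlisting}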
There are two methods in every MyCoRe servlet that have to be overwritten:
\emph{protected void render(MCRServletJob, Exception)}, which redirects the requests and renders the \emph{POST} or \emph{GET} responses, and
\emph{protected void think(MCRServletJob)}, in which the logic is implemented. Since the latter is important for understanding the
core idea of the Morphilo algorithm, it is displayed in full length in source code \ref{src:think}.
\begin{lstlisting}[language=java,caption={The overwritten think method},label=src:think,escapechar=|]
protected void think(MCRServletJob job) throws Exception
{
this.job = job;
String dateFromCorp = getCorpusMetadata(job, "def.datefrom");
String dateUntilCorp = getCorpusMetadata(job, "def.dateuntil");
String corpID = getURLParameter(job, "objID");
String derivID = getURLParameter(job, "id");
//if NoW is 0, fill with anzWords
MCRObject helpObj = MCRMetadataManager.retrieveMCRObject(MCRObjectID.getInstance(corpID));|\label{ln:bugfixstart}|
Document jdomDocHelp = helpObj.createXML();
XPathFactory xpfacty = XPathFactory.instance();
XPathExpression<Element> xpExp = xpfacty.compile("//NoW", Filters.element());
Element elem = xpExp.evaluateFirst(jdomDocHelp);
//fixes transferred morphilo data from previous stand alone project
int corpussize = getNumberOfWords(job, "");
if (Integer.parseInt(elem.getText()) != corpussize)
{
elem.setText(Integer.toString(corpussize));
helpObj = new MCRObject(jdomDocHelp);
MCRMetadataManager.update(helpObj);
}|\label{ln:bugfixend}|
//Check if the uploaded corpus was processed before
SolrClient slr = MCRSolrClientFactory.getSolrClient();|\label{ln:solrstart}|
SolrQuery qry = new SolrQuery();
qry.setFields("korpusname", "datefrom", "dateuntil", "NoW", "id");
qry.setQuery("datefrom:" + dateFromCorp + " AND dateuntil:" + dateUntilCorp + " AND NoW:" + corpussize);
SolrDocumentList rslt = slr.query(qry).getResults();|\label{ln:solrresult}|
Boolean incrOcc = true;
// if resultset contains only one, then it must be the newly created corpus
if (slr.query(qry).getResults().getNumFound() > 1)
{
incrOcc = false;
}|\label{ln:solrend}|
//match all words in corpus with morphilo (creator=administrator) and save all words that are not in morphilo DB in leftovers
ArrayList<String> leftovers = new ArrayList<String>();
ArrayList<String> processed = new ArrayList<String>();
leftovers = getUnknownWords(getContentFromFile(job, ""), dateFromCorp, dateUntilCorp, "", incrOcc, incrOcc, false);|\label{ln:callkeymeth}|
//write all words of leftover in file as derivative to respective corpmeta dataset
MCRPath root = MCRPath.getPath(derivID, "/");|\label{ln:filesavestart}|
Path fn = getDerivateFilePath(job, "").getFileName();
Path p = root.resolve("untagged-" + fn);
Files.write(p, leftovers);|\label{ln:filesaveend}|
//create a file for all words that were processed
Path procWds = root.resolve("processed-" + fn);
Files.write(procWds, processed);
}
\end{lstlisting}
Using the above mentioned getter methods, the \emph{think} method assigns values to the object ID (needed to get the xml document
that contains the corpus metadata), the file ID, and the beginning and ending dates of the corpus to be analyzed. Lines \ref{ln:bugfixstart}
through \ref{ln:bugfixend} show how to access a MyCoRe object as an xml document, a procedure that will be used in different variants
throughout this implementation.
By means of the object ID, the respective corpus is identified and a JDOM document is constructed, which can then be accessed
by XPath. The compiled XPath expression yields the collection of matching xml nodes. In the present case, it is safe to assume that only one element
of \emph{NoW} is available (see the corpus data model in listing \ref{lst:corpusdatamodel} with $maxOccurs='1'$). So we do not have to loop through
the collection, but can use the first node named \emph{NoW}. The if-test checks whether the number of words of the uploaded file is the
same as the number written in the document. When the document is initially created by the MyCoRe logic, this number is set to zero.
If unequal, the setText(String) method is used to write the number of words of the corpus to the document.
Lines \ref{ln:solrstart}--\ref{ln:solrend} reveal the second important ingredient, i.e. controlling the search engine. First, a Solr
client and a query are initialized. Then, the output of the result set is defined by giving the fields of interest of the document.
In the case at hand, these are the id, the name of the corpus, the number of words, and the beginning and ending dates. With \emph{setQuery}
it is possible to assign values to some or all of these fields. Finally, \emph{getResults()} carries out the search and writes
all hits to a \emph{SolrDocumentList} (line \ref{ln:solrresult}). The test that follows only sets a Boolean
encoding whether the number of occurrences of a word in the master should be updated. To avoid multiple counts,
incrementing the word frequency is only done if the corpus is new.
In line \ref{ln:callkeymeth}, \emph{getUnknownWords(ArrayList, String, String, String, Boolean, Boolean, Boolean)} is called and
returns a list of words. This method is key and will be discussed in depth below. Finally, lines
\ref{ln:filesavestart}--\ref{ln:filesaveend} show how to handle file objects in MyCoRe. Using the file ID, the root path and the name
of the first file in that path are identified. Then, a second file starting with ``untagged'' is created and all words returned from
\emph{getUnknownWords} are written to that file. By the same token, an empty file is created (in the last two lines of the \emph{think} method),
in which all words that are manually annotated will be saved.
In a refactoring phase, the method \emph{getUnknownWords(ArrayList, String, String, String, Boolean, Boolean, Boolean)} could be subdivided into
three methods, one for each Boolean parameter. In fact, this method handles more than one task, mainly in order to avoid code duplication.
%this is just wrong because no resultset will substantially be more than 10-20
%In addition, for large text files this method would run into efficiency problems if the master database also reaches the intended size of about
%$100,000$ entries and beyond because
In essence, an outer loop runs through all words of the corpus and an inner loop runs through all hits in the Solr result set. Because the result
set is supposed to be small, approximately between $10$ and $20$ items, efficiency
problems are unlikely to arise, although there are some more loops running through collections of about the same size.
%As the hits naturally grow larger with an increasing size of the data base, processing time will rise exponentially.
Since each word is identified on the basis of its projected word type, the word form, and the time range it falls into, it is these variables that
have to be checked for existence in the documents. If a variable is not present in the xml documents,
\emph{null} is returned and has to be handled. Moreover, user authentication must be considered. There are three different XPaths that are relevant:
\begin{itemize}
\item[-] \emph{//service/servflags/servflag[@type='createdby']} to test for the correct user
\item[-] \emph{//morphiloContainer/morphilo} to create the annotated document
\item[-] \emph{//morphiloContainer/morphilo/w} to set occurrences or add a link
\end{itemize}
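In the listing that follows, the corresponding checks are only hinted at by comments. As a sketch of how, for instance, the user check on the first of these paths could be realized with the JDOM API, consider the following hypothetical helper; it is an assumption about one possible implementation, not the actual Morphilo code.
\begin{lstlisting}[language=java,caption={Sketch of a possible isAuthorized check},label=src:authSketch]
//Hypothetical helper: derives the isAuthorized flag from the 'createdby'
//service flag of a JDOM document; a missing node simply fails the check.
private boolean isAuthorized(Document jdomDoc, String currentUser)
{
    XPathExpression<Element> xp = XPathFactory.instance().compile(
        "//service/servflags/servflag[@type='createdby']", Filters.element());
    Element creator = xp.evaluateFirst(jdomDoc);
    if (creator == null)
    {
        return false;
    }
    String user = creator.getTextTrim();
    //the master data (creator 'administrator') is readable for everyone
    return user.equals(currentUser) || user.equals("administrator");
}
\end{lstlisting}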
As an illustration of the core functioning of this method, listing \ref{src:getUnknowWords} is given.
\begin{lstlisting}[language=java,caption={Mode of Operation of getUnknownWords Method},label=src:getUnknowWords,escapechar=|]
public ArrayList<String> getUnknownWords(
ArrayList<String> corpus,
String timeCorpusBegin,
String timeCorpusEnd,
String wdtpe,
Boolean setOcc,
Boolean setXlink,
Boolean writeAllData) throws Exception
{
String currentUser = MCRSessionMgr.getCurrentSession().getUserInformation().getUserID();
ArrayList lo = new ArrayList();
for (int i = 0; i < corpus.size(); i++)
{
SolrClient solrClient = MCRSolrClientFactory.getSolrClient();
SolrQuery query = new SolrQuery();
query.setFields("w","occurrence","begin","end", "id", "wordtype");
query.setQuery(corpus.get(i));
query.setRows(50); //more than 50 items are extremely unlikely
SolrDocumentList results = solrClient.query(query).getResults();
Boolean available = false;
for (int entryNum = 0; entryNum < results.size(); entryNum++)
{
...
// update in MCRMetaDataManager
String mcrIDString = results.get(entryNum).getFieldValue("id").toString();
//read the MCRObject and create a JDOM document:
MCRObject mcrObj = MCRMetadataManager.retrieveMCRObject(MCRObjectID.getInstance(mcrIDString));
Document jdomDoc = mcrObj.createXML();
...
//check and correction for word type
...
//checkand correction time: timeCorrect
...
//check if user correct: isAuthorized
...
XPathExpression<Element> xp = xpfac.compile("//morphiloContainer/morphilo/w", Filters.element());
//Iterates w-elements and increments occurrence attribute if setOcc is true
for (Element e : xp.evaluate(jdomDoc))
{
//if the user is authorized and the word type is either given nowhere or identical
if (isAuthorized && timeCorrect
&& ((e.getAttributeValue("wordtype") == null && wdtpe.equals(""))
|| e.getAttributeValue("wordtype").equals(wordtype))) // only for the sake of consistency
{
int oc = -1;
available = true;|\label{ln:available}|
try
{
//adjust the occurrence attribute
if (setOcc)
{
oc = Integer.parseInt(e.getAttributeValue("occurrence"));
e.setAttribute("occurrence", Integer.toString(oc + 1));
}
//write morphilo-ObjectID in xml of corpmeta
if (setXlink)
{
Namespace xlinkNamespace = Namespace.getNamespace("xlink", "http://www.w3.org/1999/xlink");|\label{ln:namespace}|
MCRObject corpObj = MCRMetadataManager.retrieveMCRObject(MCRObjectID.getInstance(getURLParameter(job, "objID")));
Document corpDoc = corpObj.createXML();
XPathExpression<Element> xpathEx = xpfac.compile("//corpuslink", Filters.element());
Element elm = xpathEx.evaluateFirst(corpDoc);
elm.setAttribute("href" , mcrIDString, xlinkNamespace);
}
mcrObj = new MCRObject(jdomDoc);|\label{ln:updatestart}|
MCRMetadataManager.update(mcrObj);
QualityControl qc = new QualityControl(mcrObj);|\label{ln:updateend}|
}
catch(NumberFormatException except)
{
// ignore
}
}
}
if (!available) // if not available in datasets under the given conditions |\label{ln:notavailable}|
{
lo.add(corpus.get(i));
}
}
return lo;
}
\end{lstlisting}
As can be seen from listing \ref{src:getUnknowWords}, getting the unknown words of a corpus is rather a side effect of the equally named method.
More precisely, a Boolean (line \ref{ln:available}) is set whenever the document is manipulated, because in that case the word clearly exists.
If the Boolean remains false (line \ref{ln:notavailable}), the word is put on the list of words that have to be annotated manually. As already explained above, the
outer loop runs through all words of the corpus, and in the following lines a Solr result set is created. This set is also looped through, and it is checked whether the time range
and the word type match and whether the user is authorized. In the remainder, the occurrence attribute of the morphilo document can be incremented (if setOcc is true) and/or the word is linked to the
corpus metadata (if setXlink is true). Since the code lines are largely equivalent to
what was explained for listing \ref{src:think}, it suffices to point out that an
additional namespace, i.e.
``xlink'', has to be defined (line \ref{ln:namespace}). Once the linking of word
and corpus is set, the entire MyCoRe object has to be updated. This is done by the functionality of the framework (lines \ref{ln:updatestart}--\ref{ln:updateend}).
At the end, an instance of \emph{QualityControl} is created.
%QualityControl
The class \emph{QualityControl} is instantiated with a constructor
depicted in listing \ref{src:constructQC}.
\begin{lstlisting}[language=java,caption={Constructor of QualityControl.java},label=src:constructQC,escapechar=|]
private MCRObject mycoreObject;
/* Constructor calls method to carry out quality control, i.e. if at least 20
* different users agree 100% on the segments of the word under investigation
*/
public QualityControl(MCRObject mycoreObject) throws Exception
{
this.mycoreObject = mycoreObject;
if (getEqualObjectNumber() > 20)
{
addToMorphiloDB();
}
}
\end{lstlisting}
The constructor takes a MyCoRe object, a potential word candidate for the
master database, which is assigned to a private class variable because the
object is used, though not changed, by some other methods of the class.
More importantly, there are two further methods: \emph{getEqualObjectNumber()} and
\emph{addToMorphiloDB()}. While the former initiates a process of counting and
comparing objects, the latter is concerned with calculating the correct number
of occurrences from different, but not identical, texts and with generating a MyCoRe object with the same content but with two different flags in the \emph{//service/servflags/servflag} node, i.e. \emph{createdby='administrator'} and \emph{state='published'}.
And of course, the \emph{occurrence} attribute is set to the newly calculated value. The logic corresponds exactly to what was explained for
listing \ref{src:think} and will not be repeated here. The only difference lies in the paths compiled by the XPathFactory. They are
\begin{itemize}
\item[-] \emph{//service/servflags/servflag[@type='createdby']} and
\item[-] \emph{//service/servstates/servstate[@classid='state']}.
\end{itemize}
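A sketch of how the flag handling in \emph{addToMorphiloDB()} could look is given below. It is reconstructed from the description above; in particular, the exact encoding of the state flag and the method name are assumptions rather than the actual Morphilo code.
\begin{lstlisting}[language=java,caption={Sketch of the flag handling in addToMorphiloDB()},label=src:promoteSketch]
//Hypothetical sketch: the creator flag and the state are rewritten so that
//the dataset becomes part of the master database, and the occurrence
//attribute is set to the newly calculated value.
private void promoteToMaster(int newOccurrences) throws Exception
{
    Document doc = mycoreObject.createXML();
    XPathFactory xpfac = XPathFactory.instance();
    Element creator = xpfac.compile("//service/servflags/servflag[@type='createdby']",
        Filters.element()).evaluateFirst(doc);
    creator.setText("administrator");
    Element state = xpfac.compile("//service/servstates/servstate[@classid='state']",
        Filters.element()).evaluateFirst(doc);
    //assumption: the state category is stored in the categid attribute
    state.setAttribute("categid", "published");
    Element word = xpfac.compile("//morphiloContainer/morphilo/w",
        Filters.element()).evaluateFirst(doc);
    word.setAttribute("occurrence", Integer.toString(newOccurrences));
    MCRMetadataManager.update(new MCRObject(doc));
}
\end{lstlisting}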
It is more instructive to document how the number of occurrences is calculated. There are two steps involved. First, a list with all MyCoRe objects that are
equal to the object with which the class was instantiated (``mycoreObject'' in listing \ref{src:constructQC}) is created. This list is looped over and all occurrence
attributes are summed up. Second, all occurrences from equal texts are subtracted. Equal texts are identified on the basis of their metadata and their derivates.
There are some obvious shortcomings of this approach, which will be discussed in chapter \ref{chap:results}, section \ref{sec:improv}. Here it suffices to
understand the mode of operation. Listing \ref{src:equalOcc} shows a possible solution.
\begin{lstlisting}[language=java,caption={Occurrence Extraction from Equal Texts (1)},label=src:equalOcc,escapechar=|]
/* returns number of Occurrences if Objects are equal, zero otherwise
*/
private int getOccurrencesFromEqualTexts(MCRObject mcrobj1, MCRObject mcrobj2) throws SAXException, IOException
{
int occurrences = 1;
//extract corpmeta ObjectIDs from morphilo-Objects
String crpID1 = getAttributeValue("//corpuslink", "href", mcrobj1);
String crpID2 = getAttributeValue("//corpuslink", "href", mcrobj2);
//get these two corpmeta Objects
MCRObject corpo1 = MCRMetadataManager.retrieveMCRObject(MCRObjectID.getInstance(crpID1));
MCRObject corpo2 = MCRMetadataManager.retrieveMCRObject(MCRObjectID.getInstance(crpID2));
//are the texts equal? get list of 'processed-words' derivate
String corp1DerivID = getAttributeValue("//structure/derobjects/derobject", "href", corpo1);
String corp2DerivID = getAttributeValue("//structure/derobjects/derobject", "href", corpo2);
ArrayList result = new ArrayList(getContentFromFile(corp1DerivID, ""));|\label{ln:writeContent}|
result.removeAll(getContentFromFile(corp2DerivID, ""));|\label{ln:removeContent}|
if (result.size() == 0) // the texts are equal
{
// extract occurrences of one the objects
occurrences = Integer.parseInt(getAttributeValue("//morphiloContainer/morphilo/w", "occurrence", mcrobj1));
}
else
{
occurrences = 0; //project metadata happened to be the same, but texts are different
}
return occurrences;
}
\end{lstlisting}
In this implementation, the ids from the \emph{corpmeta} data model are accessed via the xlink attribute in the morphilo documents.
The method \emph{getAttributeValue(String, String, MCRObject)} does exactly the same as demonstrated earlier (see line \ref{ln:namespace}
onwards in listing \ref{src:getUnknowWords}). The underlying logic is that the texts are considered equal if they consist of exactly the same words.
So all words from one file are written to a list (line \ref{ln:writeContent}) and the words from the other file are removed from the
very same list (line \ref{ln:removeContent}). If this list is empty, both files must have contained the same words and the occurrences
are adjusted accordingly. Since this method is called from another private method that merely loops through all equal objects, one gets
the occurrences from all equal texts. For the sake of traceability, the looping method is also given:
\begin{lstlisting}[language=java,caption={Occurrence Extraction from Equal Texts (2)},label=src:equalOcc2,escapechar=|]
private int getOccurrencesFromEqualTexts() throws Exception
{
ArrayList<MCRObject> equalObjects = new ArrayList<MCRObject>();
equalObjects = getAllEqualMCRObjects();
int occurrences = 0;
for (MCRObject obj : equalObjects)
{
occurrences = occurrences + getOccurrencesFromEqualTexts(mycoreObject, obj);
}
return occurrences;
}
\end{lstlisting}
Now, the constructor in listing \ref{src:constructQC} reveals another method that triggers an equally complex chain of procedures.
As implied above, \emph{getEqualObjectNumber()} returns the number of equally annotated words. It does this by falling back to another
method, from whose return value the size of the list is calculated (\emph{getAllEqualMCRObjects().size()}). Hence, we should take a closer look at
\emph{getAllEqualMCRObjects()}. This method has essentially the same design as \emph{int getOccurrencesFromEqualTexts()} in listing \ref{src:equalOcc2}.
The difference is that another method (\emph{Boolean compareMCRObjects(MCRObject, MCRObject, String)}) is used within the loop and
that all equal objects are put into the list of MyCoRe objects that is returned. If this list comprises more than 20
entries,\footnote{This number is somewhat arbitrary. It is inspired by the sample size n in t-distributed data.} the respective document
will be integrated into the master database by the process described above.
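Under the assumption that the candidate objects are retrieved via a Solr query on the word form, as in \emph{getUnknownWords}, a sketch of \emph{getAllEqualMCRObjects()} could look as follows; the query and the compared xpath are assumptions here.
\begin{lstlisting}[language=java,caption={Sketch of getAllEqualMCRObjects()},label=src:equalObjSketch]
//Hypothetical sketch: candidate objects are assumed to come from a Solr query
//on the word form and are compared pairwise with compareMCRObjects.
private ArrayList<MCRObject> getAllEqualMCRObjectsSketch(String wordForm) throws Exception
{
    ArrayList<MCRObject> equalObjects = new ArrayList<MCRObject>();
    SolrClient solrClient = MCRSolrClientFactory.getSolrClient();
    SolrQuery query = new SolrQuery();
    query.setQuery(wordForm);
    query.setRows(50);
    SolrDocumentList results = solrClient.query(query).getResults();
    for (int i = 0; i < results.size(); i++)
    {
        String id = results.get(i).getFieldValue("id").toString();
        MCRObject candidate = MCRMetadataManager.retrieveMCRObject(MCRObjectID.getInstance(id));
        //assumed xpath: compare only the morphilo metadata container
        if (compareMCRObjects(mycoreObject, candidate, "//morphiloContainer/morphilo"))
        {
            equalObjects.add(candidate);
        }
    }
    return equalObjects;
}
\end{lstlisting}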
The comparator logic is shown in listing \ref{src:compareMCR}.
\begin{lstlisting}[language=java,caption={Comparison of MyCoRe objects},label=src:compareMCR,escapechar=|]
private Boolean compareMCRObjects(MCRObject mcrobj1, MCRObject mcrobj2, String xpath) throws SAXException, IOException
{
Boolean isEqual = false;
Boolean beginTime = false;
Boolean endTime = false;
Boolean occDiff = false;
Boolean corpusDiff = false;
String source = getXMLFromObject(mcrobj1, xpath);
String target = getXMLFromObject(mcrobj2, xpath);
XMLUnit.setIgnoreAttributeOrder(true);
XMLUnit.setIgnoreComments(true);
XMLUnit.setIgnoreDiffBetweenTextAndCDATA(true);
XMLUnit.setIgnoreWhitespace(true);
XMLUnit.setNormalizeWhitespace(true);
//differences in occurrences, end, begin should be ignored
try
{
Diff xmlDiff = new Diff(source, target);
DetailedDiff dd = new DetailedDiff(xmlDiff);
//counters for differences
int i = 0;
int j = 0;
int k = 0;
int l = 0;
// list containing all differences
List differences = dd.getAllDifferences();|\label{ln:difflist}|
for (Object object : differences)
{
Difference difference = (Difference) object;
//@begin,@end,... node is not in the difference list if the count is 0
if (difference.getControlNodeDetail().getXpathLocation().endsWith("@begin")) i++;|\label{ln:diffbegin}|
if (difference.getControlNodeDetail().getXpathLocation().endsWith("@end")) j++;
if (difference.getControlNodeDetail().getXpathLocation().endsWith("@occurrence")) k++;
if (difference.getControlNodeDetail().getXpathLocation().endsWith("@corpus")) l++;|\label{ln:diffend}|
//@begin and @end have different values: they must be checked if they fall right in the allowed time range
if ( difference.getControlNodeDetail().getXpathLocation().equals(difference.getTestNodeDetail().getXpathLocation())
&& difference.getControlNodeDetail().getXpathLocation().endsWith("@begin")
&& (Integer.parseInt(difference.getControlNodeDetail().getValue()) < Integer.parseInt(difference.getTestNodeDetail().getValue())) )
{
beginTime = true;
}
if (difference.getControlNodeDetail().getXpathLocation().equals(difference.getTestNodeDetail().getXpathLocation())
&& difference.getControlNodeDetail().getXpathLocation().endsWith("@end")
&& (Integer.parseInt(difference.getControlNodeDetail().getValue()) > Integer.parseInt(difference.getTestNodeDetail().getValue())) )
{
endTime = true;
}
//attribute values of @occurrence and @corpus are ignored if they are different
if (difference.getControlNodeDetail().getXpathLocation().equals(difference.getTestNodeDetail().getXpathLocation())
&& difference.getControlNodeDetail().getXpathLocation().endsWith("@occurrence"))
{
occDiff = true;
}
if (difference.getControlNodeDetail().getXpathLocation().equals(difference.getTestNodeDetail().getXpathLocation())
&& difference.getControlNodeDetail().getXpathLocation().endsWith("@corpus"))
{
corpusDiff = true;
}
}
//if any of @begin, @end ... is identical set Boolean to true
if (i == 0) beginTime = true;|\label{ln:zerobegin}|
if (j == 0) endTime = true;
if (k == 0) occDiff = true;
if (l == 0) corpusDiff = true;|\label{ln:zeroend}|
//if the size of differences is greater than the number of changes admitted in @begin, @end ... something else must be different
if (beginTime && endTime && occDiff && corpusDiff && (i + j + k + l) == dd.getAllDifferences().size()) isEqual = true;|\label{ln:diffsum}|
}
catch (SAXException e)
{
e.printStackTrace();
}
catch (IOException e)
{
e.printStackTrace();
}
return isEqual;
}
\end{lstlisting}
In this method, XMLUnit is heavily used to make all necessary node comparisons. The matter becomes more complicated, however, when some attributes
are not simply ignored but evaluated according to a given definition, as is the case for the time range. If the evaluator and builder classes are
not to be overwritten entirely -- because they are needed for evaluating other nodes of the
xml document -- the above solution appears a bit awkward, so there is potential for improvement before the production version is programmed.
XMLUnit provides us with a
list of the differences between the two documents (see line \ref{ln:difflist}). Four kinds of differences are allowed, namely in the attributes \emph{occurrence},
\emph{corpus}, \emph{begin}, and \emph{end}. For each of them a Boolean variable is set. Because any of these attributes could also be equal to those of the master
document, and the difference list only contains the actual differences, one has to find a way to handle both cases, equal and different, for these attributes.
This could be done by ignoring the nodes altogether. Yet, this would not include testing whether the beginning and ending dates fall into the range of the master
document. Therefore the attributes are counted, as lines \ref{ln:diffbegin} through \ref{ln:diffend} reveal. If any two documents
differ in anything other than the four attributes just specified, the number of differences collected by XMLUnit exceeds the sum of the counters (line \ref{ln:diffsum}) and the equality test fails.
The remaining if-tests assign truth values to the respective
Booleans. It is probably worth mentioning that if a counter is zero (lines
\ref{ln:zerobegin}--\ref{ln:zeroend}), the corresponding attribute values are identical and hence the Boolean has to be set to true explicitly; otherwise the test in line \ref{ln:diffsum} would fail.
%TagCorpusServlet
Once quality control (explained in detail above) has been passed, it is
the user's turn to interact further. By clicking on the option \emph{Manual tagging}, the \emph{TagCorpusServlet} is called. This servlet instantiates
\emph{ProcessCorpusServlet} to get access to the \emph{getUnknownWords} method, which delivers the words still to be
processed and overwrites the content of the file starting with \emph{untagged}. For the next word in \emph{leftovers}, a new MyCoRe object is created
using the JDOM API and added to the file beginning with \emph{processed}. In line \ref{ln:tagmanu} of listing \ref{src:tagservlet}, the previously defined
entry mask is called, in which the proposed word structure can be confirmed or changed. How the word structure is determined will be shown later in
the text.
\begin{lstlisting}[language=java,caption={Manual Tagging Procedure},label=src:tagservlet,escapechar=|]
...
if (!leftovers.isEmpty())
{
ArrayList<String> processed = new ArrayList<String>();
//processed.add(leftovers.get(0));
JDOMorphilo jdm = new JDOMorphilo();
MCRObject obj = jdm.createMorphiloObject(job, leftovers.get(0));|\label{ln:jdomobject}|
//write word to be annotated in process list and save it
Path filePathProc = pcs.getDerivateFilePath(job, "processed").getFileName();
Path proc = root.resolve(filePathProc);
processed = pcs.getContentFromFile(job, "processed");
processed.add(leftovers.get(0));
Files.write(proc, processed);
//call entry mask for next word
tagUrl = prop.getBaseURL() + "content/publish/morphilo.xed?id=" + obj.getId();|\label{ln:tagmanu}|
}
else
{
//initiate process to give a complete tagged file of the original corpus
//if untagged-file is empty, match original file with morphilo
//creator=administrator OR creator=username and write matches in a new file
ArrayList<String> complete = new ArrayList<String>();
ProcessCorpusServlet pcs2 = new ProcessCorpusServlet();
complete = pcs2.getUnknownWords(
pcs2.getContentFromFile(job, ""), //main corpus file
pcs2.getCorpusMetadata(job, "def.datefrom"),
pcs2.getCorpusMetadata(job, "def.dateuntil"),
"", //wordtype
false,
false,
true);
Files.delete(p);
MCRXMLFunctions mdm = new MCRXMLFunctions();
String mainFile = mdm.getMainDocName(derivID);
Path newRoot = root.resolve("tagged-" + mainFile);
Files.write(newRoot, complete);
//return to Menu page
tagUrl = prop.getBaseURL() + "receive/" + corpID;
}
\end{lstlisting}
At the point where no more items are left in \emph{leftovers}, the \emph{getUnknownWords} method is called with the last Boolean parameter
set to true. This indicates that the array list containing all data available and relevant to the respective user is returned, as seen in
the code snippet in listing \ref{src:writeAll}.
\begin{lstlisting}[language=java,caption={Code snippet to deliver all data to the user},label=src:writeAll,escapechar=|]
...
// all data is written to lo in TEI
if (writeAllData && isAuthorized && timeCorrect)
{
XPathExpression<Element> xpath = xpfac.compile("//morphiloContainer/morphilo", Filters.element());
for (Element e : xpath.evaluate(jdomDoc))
{
XMLOutputter outputter = new XMLOutputter();
outputter.setFormat(Format.getPrettyFormat());
lo.add(outputter.outputString(e.getContent()));
}
}
...
\end{lstlisting}
The complete list (\emph{lo}) is written to yet a third file starting with \emph{tagged} and finally returned to the main project webpage.
%JDOMorphilo
The interesting question now is where the word structure that is filled into the entry mask, as asserted above, comes from.
In listing \ref{src:tagservlet}, line \ref{ln:jdomobject}, one can see that a JDOM object is created and the method
\emph{createMorphiloObject(MCRServletJob, String)} is called. The string parameter is the word that needs to be analyzed.
Most of the method is a mere application of the JDOM API given the data model in chapter \ref{chap:concept}, section
\ref{subsec:datamodel}, and listing \ref{lst:worddatamodel}. That means namespaces, elements, and their attributes are defined in the correct
order and hierarchy.
To fill the elements and attributes with text, i.e. prefixes, suffixes, stems, etc., hash maps -- containing the morpheme as
key and its position as value -- are created and filled with the results of an AffixStripper instantiation. Depending on how many prefixes
or suffixes are put in the hash maps, the same number of xml elements is created. As a final step, a valid MyCoRe id is generated using
the existing MyCoRe functionality, and the object is created and returned to the TagCorpusServlet.
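To make this more tangible, the following sketch shows how the element hierarchy for a single word could be assembled with the JDOM API. The getter names on \emph{AffixStripper} and the constructor used here are assumptions; the actual method additionally wraps the element in the \emph{morphiloContainer/morphilo} hierarchy and assigns a valid MyCoRe id, which is omitted in this sketch.
\begin{lstlisting}[language=java,caption={Sketch of the element construction in createMorphiloObject()},label=src:jdomSketch]
//Hypothetical sketch: the analysis results of AffixStripper are turned into
//the w/m1..m5 hierarchy of the word data model (listing lst:worddatamodel).
private Element buildWordElement(String word)
{
    AffixStripper stripper = new AffixStripper(word); //assumed constructor
    Element w = new Element("w");
    w.setAttribute("occurrence", "1");
    Element stem = new Element("m1"); //stem
    Element base = new Element("m2"); //base, enclosing prefixes and root(s)
    for (Map.Entry<String, Integer> pref : stripper.getPrefixMorpheme().entrySet())
    {
        Element m4 = new Element("m4"); //one element per prefix
        m4.setAttribute("position", pref.getValue().toString());
        m4.setText(pref.getKey());
        base.addContent(m4);
    }
    for (String root : stripper.getWordRoot()) //compounds may yield several roots
    {
        Element m3 = new Element("m3");
        m3.setText(root);
        base.addContent(m3);
    }
    stem.addContent(base);
    for (Map.Entry<String, Integer> suf : stripper.getSuffixMorpheme().entrySet())
    {
        Element m5 = new Element("m5"); //one element per suffix
        m5.setAttribute("position", suf.getValue().toString());
        m5.setText(suf.getKey());
        stem.addContent(m5);
    }
    w.addContent(stem);
    return w;
}
\end{lstlisting}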
%AffixStripper explanation
Last, the analysis of the word structure will be considered. It is implemented
in the \emph{AffixStripper.java} file.
All lexical affix morphemes and their allomorphs as well as the inflections were extracted from the
OED\footnote{Oxford English Dictionary http://www.oed.com/} and saved as enumerated lists (see the example in listing \ref{src:enumPref}).
The allomorphic items of these lists are matched successively against the beginning of a word in the case of prefixes
(see listing \ref{src:analyzePref}, line \ref{ln:prefLoop}) or against its end in the case of suffixes
(see listing \ref{src:analyzeSuf}). Since each
morphemic variant maps to its morpheme right away, it makes sense to store the morpheme and so
implicitly keep the relation to its allomorph.
\begin{lstlisting}[language=java,caption={Enumeration Example for the Prefix "over"},label=src:enumPref,escapechar=|]
package custom.mycore.addons.morphilo;
public enum PrefixEnum {
...
over("over"), ufer("over"), ufor("over"), uferr("over"), uvver("over"), obaer("over"), ober("over)"), ofaer("over"),
ofere("over"), ofir("over"), ofor("over"), ofer("over"), ouer("over"),oferr("over"), offerr("over"), offr("over"), aure("over"),
war("over"), euer("over"), oferre("over"), oouer("over"), oger("over"), ouere("over"), ouir("over"), ouire("over"),
ouur("over"), ouver("over"), ouyr("over"), ovar("over"), overe("over"), ovre("over"),ovur("over"), owuere("over"), owver("over"),
houyr("over"), ouyre("over"), ovir("over"), ovyr("over"), hover("over"), auver("over"), awver("over"), ovver("over"),
hauver("over"), ova("over"), ove("over"), obuh("over"), ovah("over"), ovuh("over"), ofowr("over"), ouuer("over"), oure("over"),
owere("over"), owr("over"), owre("over"), owur("over"), owyr("over"), our("over"), ower("over"), oher("over"),
ooer("over"), oor("over"), owwer("over"), ovr("over"), owir("over"), oar("over"), aur("over"), oer("over"), ufara("over"),
ufera("over"), ufere("over"), uferra("over"), ufora("over"), ufore("over"), ufra("over"), ufre("over"), ufyrra("over"),
yfera("over"), yfere("over"), yferra("over"), uuera("over"), ufe("over"), uferre("over"), uuer("over"), uuere("over"),
vfere("over"), vuer("over"), vuere("over"), vver("over"), uvvor("over") ...
...
private String morpheme;
//constructor
PrefixEnum(String morpheme)
{
this.morpheme = morpheme;
}
//getter Method
public String getMorpheme()
{
return this.morpheme;
}
}
\end{lstlisting}
As can be seen in line \ref{ln:prefPutMorph} of listing \ref{src:analyzePref}, the morpheme is saved to a hash map together with its position, i.e. the current size of the
map plus one. In line \ref{ln:prefCutoff}, the \emph{analyzePrefix} method is called recursively until no more matches can be made.
\begin{lstlisting}[language=java,caption={Method to recognize prefixes},label=src:analyzePref,escapechar=|]
private Map<String, Integer> prefixMorpheme = new HashMap<String,Integer>();
...
private void analyzePrefix(String restword)
{
if (!restword.isEmpty()) //termination condition for the recursion
{
for (PrefixEnum prefEnum : PrefixEnum.values())|\label{ln:prefLoop}|
{
String s = prefEnum.toString();
if (restword.startsWith(s))
{
prefixMorpheme.put(s, prefixMorpheme.size() + 1);|\label{ln:prefPutMorph}|
//cut off the prefix that is added to the list
analyzePrefix(restword.substring(s.length()));|\label{ln:prefCutoff}|
}
else
{
analyzePrefix("");
}
}
}
}
\end{lstlisting}
The recognition of suffixes differs only in the cut-off direction, since suffixes occur at the end of a word.
Hence, in the case of suffixes, line \ref{ln:prefCutoff} of listing \ref{src:analyzePref} reads as follows.
\begin{lstlisting}[language=java,caption={Cut-off mechanism for suffixes},label=src:analyzeSuf,escapechar=|]
analyzeSuffix(restword.substring(0, restword.length() - s.length()));
\end{lstlisting}
It is important to note that inflections are suffixes (in the given model case of Middle English morphology) that usually occur only once, at the very
end of a word, i.e. after all lexical suffixes. It follows that inflections
have to be recognized first and without any repetition. So the procedure for inflections can be simplified
to a substantial degree, as listing \ref{src:analyzeInfl} shows.
\begin{lstlisting}[language=java,caption={Method to recognize inflections},label=src:analyzeInfl,escapechar=|]
private String analyzeInflection(String wrd)
{
String infl = "";
for (InflectionEnum inflEnum : InflectionEnum.values())
{
if (wrd.endsWith(inflEnum.toString()))
{
infl = inflEnum.toString();
}
}
return infl;
}
\end{lstlisting}
Unfortunately, the embeddedness problem prevents a very simple algorithm. Embeddedness occurs when a lexical item
is a substring of another lexical item. To illustrate, the suffix \emph{ion} is also contained in the suffix \emph{ation}, as is
\emph{ent} in \emph{ment}, and so on. The embeddedness problem cannot be solved completely on the basis of linear modelling, but
for a large part of embedded items one can work around it by implicitly using Zipf's law, i.e. the correlation between frequency
and length of lexical items: the longer a word becomes, the less frequently it occurs. The simplest conclusion is to assume
that longer suffix strings (measured in letters) are preferred over shorter ones, because the longer the suffix string,
the more likely it is that it represents one (as opposed to several) suffix unit(s). This is done in listing \ref{src:embedAffix}, where
the map \emph{sortedByLengthMap}, built with an anonymous comparator, sorts the affixes by length and the loop from line \ref{ln:deleteAffix} onwards deletes
the respective substrings.
\begin{lstlisting}[language=java,caption={Method to workaround embeddedness},label=src:embedAffix,escapechar=|]
private Map<String, Integer> sortOutAffixes(Map<String, Integer> affix)
{
Map<String,Integer> sortedByLengthMap = new TreeMap<String, Integer>(new Comparator<String>()
{
@Override
public int compare(String s1, String s2)
{
int cmp = Integer.compare(s1.length(), s2.length());
return cmp != 0 ? cmp : s1.compareTo(s2);
}
}
);
sortedByLengthMap.putAll(affix);
ArrayList<String> al1 = new ArrayList<String>(sortedByLengthMap.keySet());
ArrayList<String> al2 = new ArrayList<String>(al1); //copy, so that al1 keeps its original order
Collections.reverse(al2);
for (String s2 : al1)|\label{ln:deleteAffix}|
{
for (String s1 : al2)
if (s1.contains(s2) && s1.length() > s2.length())
{
affix.remove(s2);
}
}
return affix;
}
\end{lstlisting}
Finally, the position of the affix has to be calculated, because the hash map in line \ref{ln:prefPutMorph} of
listing \ref{src:analyzePref} does not keep the original order after the changes made to address affix embeddedness
(listing \ref{src:embedAffix}). Listing \ref{src:affixPos} depicts the preferred solution.
The recursive construction of the method is similar to \emph{private void analyzePrefix(String)} (listing \ref{src:analyzePref}),
only that the two affix types are handled in one method. For that, an additional parameter taking either the value \emph{suffix}
or \emph{prefix} is included.
\begin{lstlisting}[language=java,caption={Method to determine position of the affix},label=src:affixPos,escapechar=|]
private void getAffixPosition(Map<String, Integer> affix, String restword, int pos, String affixtype)
{
if (!restword.isEmpty()) //termination condition for the recursion
{
for (String s : affix.keySet())
{
if (restword.startsWith(s) && affixtype.equals("prefix"))
{
pos++;
prefixMorpheme.put(s, pos);
//prefixAllomorph.add(pos-1, restword.substring(s.length()));
getAffixPosition(affix, restword.substring(s.length()), pos, affixtype);
}
else if (restword.endsWith(s) && affixtype.equals("suffix"))
{
pos++;
suffixMorpheme.put(s, pos);
//suffixAllomorph.add(pos-1, restword.substring(s.length()));
getAffixPosition(affix, restword.substring(0, restword.length() - s.length()), pos, affixtype);
}
else
{
getAffixPosition(affix, "", pos, affixtype);
}
}
}
}
\end{lstlisting}
To give the complete word structure, the root of a word should also be provided. In listing \ref{src:rootAnalyze} a simple solution is offered that, however,
also considers compounds, i.e. words consisting of more than one root.
\begin{lstlisting}[language=java,caption={Method to determine roots},label=src:rootAnalyze,escapechar=|]
private ArrayList<String> analyzeRoot(Map<String, Integer> pref, Map<String, Integer> suf, int stemNumber)
{
ArrayList<String> root = new ArrayList<String>();
int j = 1; //one root always exists
// if word is a compound several roots exist
while (j <= stemNumber)
{
j++;
String rest = lemma;|\label{ln:lemma}|
for (int i=0;i<pref.size();i++)
{
for (String s : pref.keySet())
{
//if (i == pref.get(s))
if (rest.length() > s.length() && s.equals(rest.substring(0, s.length())))
{
rest = rest.substring(s.length(),rest.length());
}
}
}
for (int i=0;i<suf.size();i++)
{
for (String s : suf.keySet())
{
//if (i == suf.get(s))
if (s.length() < rest.length() && (s.equals(rest.substring(rest.length() - s.length(), rest.length()))))
{
rest = rest.substring(0, rest.length() - s.length());
}
}
}
root.add(rest);
}
return root;
}
\end{lstlisting}
The logic behind this method is that the root is the remainder of a word when all prefixes and suffixes are subtracted.
So the loops run through the number of prefixes and suffixes at each position and subtract the affix. Admittedly, there is
some code duplication with the previously described methods, which could be eliminated by making the code more modular in a possible
refactoring phase; again, this is not the concern of a prototype. Line \ref{ln:lemma} defines the initial state of a root,
which is already the final state for monomorphemic words. The \emph{lemma} is defined as the word token without the inflection. Listing
\ref{src:lemmaAnalyze} reveals how this class variable is calculated.
\begin{lstlisting}[language=java,caption={Method to determine lemma},label=src:lemmaAnalyze,escapechar=|]
/*
* Simplification: lemma = wordtoken - inflection
*/
private String analyzeLemma(String wrd, String infl)
{
return wrd.substring(0, wrd.length() - infl.length());
}
\end{lstlisting}
The constructor of \emph{AffixStripper} calls the method \emph{analyzeWord()},
whose only job is to calculate each structure element in the correct order
(listing \ref{src:analyzeWord}). All structure elements are also provided by getters.
\begin{lstlisting}[language=java,caption={Method to determine the complete word structure},label=src:analyzeWord,escapechar=|]
private void analyzeWord()
{
//analyze inflection first because it always occurs at the end of a word
inflection = analyzeInflection(wordtoken);
lemma = analyzeLemma(wordtoken, inflection);
analyzePrefix(lemma);
analyzeSuffix(lemma);
getAffixPosition(sortOutAffixes(prefixMorpheme), lemma, 0, "prefix");
getAffixPosition(sortOutAffixes(suffixMorpheme), lemma, 0, "suffix");
prefixNumber = prefixMorpheme.size();
suffixNumber = suffixMorpheme.size();
wordroot = analyzeRoot(prefixMorpheme, suffixMorpheme, getStemNumber());
}
\end{lstlisting}
To conclude, the Morphilo implementation as presented here aims at fulfilling the task of a working prototype. It is important to note
that it claims neither to be particularly efficient nor to be a software program ready for production use. However, it marks a crucial milestone
on the way to a production system. For some listings, potential improvements were made explicit; for others, no suggestions were made. In the latter
case this does not imply that there is no potential for improvement. Once acceptability tests are carried out, it will be the task of a follow-up project
to identify these potentials and implement them accordingly.
Data Model
==========
Conceptualization
-----------------
From both the user and the task requirements one can derive that four basic
functions of data processing need to be carried out. Data have to be read, persistently
saved, searched, and deleted. Furthermore, some kind of user management
and multi-user processing is necessary. In addition, the framework should
support web technologies, be well documented, and be easy to extend. Ideally, the
MVC pattern is realized.
\subsection{Data Model}\label{subsec:datamodel}
The guidelines of the
\emph{TEI} standard\footnote{http://www.tei-c.org/release/doc/tei-p5-doc/en/Guidelines.pdf} on the
word level are in line with the structure described above in section \ref{subsec:morphologicalSystems}.
In listing \ref{lst:teiExamp} an
example is given for a possible markup at the word level for
\emph{comfortable}.\footnote{http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-m.html}
\begin{lstlisting}[language=XML,
caption={TEI-example for 'comfortable'},label=lst:teiExamp]
<w type="adjective">
<m type="base">
<m type="prefix" baseForm="con">com</m>
<m type="root">fort</m>
</m>
<m type="suffix">able</m>
</w>
\end{lstlisting}
This data model reflects just one theoretical conception of a word structure model.
Crucially, the model emanates from the assumption
that the suffix node is on par with the word base. On the one hand, this
implies that the word stem directly dominates the suffix, but not the prefix. The prefix, on the
other hand, is enclosed in the base, which basically means a stronger lexical,
and less abstract, attachment to the root of a word. Modeling prefixes and suffixes on different
hierarchical levels has important consequences for the branching direction at
subword level (here right-branching). Leaving the theoretical interest aside, the
choice of the TEI standard is reasonable with a view to a sustainable architecture that allows for
exchanging data with little to no additional adjustment.
A drawback is that the model is not suitable for all languages.
It reflects a theoretical construction based on Indo-European
languages. As long as attention is paid to the languages for which this software is used, this will
not be problematic. The model fits most languages of the Indo-European
family, which correspond to the overwhelming majority of all research carried out
(unfortunately).
Implementation
--------------
As laid out in the task analysis in section \ref{subsec:datamodel}, it is
advantageous to use established standards. It was also shown that it makes sense
to keep the meta data of each corpus separate from the data model used for the
words to be analyzed.
For the present case, the TEI-standard was identified as an
appropriate markup for words. In terms of the implementation this means that
Whereas attributes of the objecttype are specific to the repository framework, the TEI-based word structure can be
recognized in the hierarchy of the metadata element starting with the name
\emph{w} (line \ref{src:wordbegin}).
\begin{lstlisting}[language=XML,caption={Word Data Model},label=lst:worddatamodel,escapechar=|]
<?xml version="1.0" encoding="UTF-8"?>
<objecttype
name="morphilo"
isChild="true"
isParent="true"
hasDerivates="true"
xmlns:xs="http://www.w3.org/2001/XMLSchema"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:noNamespaceSchemaLocation="datamodel.xsd">
<metadata>
<element name="morphiloContainer" type="xml" style="dontknow"
notinherit="true" heritable="false">
<xs:sequence>
<xs:element name="morphilo">
<xs:complexType>
<xs:sequence>
<xs:element name="w" minOccurs="0" maxOccurs="unbounded">|label{src:wordbegin}|
<xs:complexType mixed="true">
<xs:sequence>
<!-- stem -->
<xs:element name="m1" minOccurs="0" maxOccurs="unbounded">
<xs:complexType mixed="true">
<xs:sequence>
<!-- base -->
<xs:element name="m2" minOccurs="0" maxOccurs="unbounded">
<xs:complexType mixed="true">
<xs:sequence>
<!-- root -->
<xs:element name="m3" minOccurs="0" maxOccurs="unbounded">
<xs:complexType mixed="true">
<xs:attribute name="type" type="xs:string"/>
</xs:complexType>
</xs:element>
<!-- prefix -->
<xs:element name="m4" minOccurs="0" maxOccurs="unbounded">
<xs:complexType mixed="true">
<xs:attribute name="type" type="xs:string"/>
<xs:attribute name="PrefixbaseForm" type="xs:string"/>
<xs:attribute name="position" type="xs:string"/>
</xs:complexType>
</xs:element>
</xs:sequence>
<xs:attribute name="type" type="xs:string"/>
</xs:complexType>
</xs:element>
<!-- suffix -->
<xs:element name="m5" minOccurs="0" maxOccurs="unbounded">
<xs:complexType mixed="true">
<xs:attribute name="type" type="xs:string"/>
<xs:attribute name="SuffixbaseForm" type="xs:string"/>
<xs:attribute name="position" type="xs:string"/>
<xs:attribute name="inflection" type="xs:string"/>
</xs:complexType>
</xs:element>
</xs:sequence>
<!-- stem-Attribute -->
<xs:attribute name="type" type="xs:string"/>
<xs:attribute name="pos" type="xs:string"/>
<xs:attribute name="occurrence" type="xs:string"/>
</xs:complexType>
</xs:element>
</xs:sequence>
<!-- w -Attribute auf Wortebene -->
<xs:attribute name="lemma" type="xs:string"/>
<xs:attribute name="complexType" type="xs:string"/>
<xs:attribute name="wordtype" type="xs:string"/>
<xs:attribute name="occurrence" type="xs:string"/>
<xs:attribute name="corpus" type="xs:string"/>
<xs:attribute name="begin" type="xs:string"/>
<xs:attribute name="end" type="xs:string"/>
</xs:complexType>
</xs:element>
</xs:sequence>
</xs:complexType>
</xs:element>
</xs:sequence>
</element>
<element name="wordtype" type="classification" minOccurs="0" maxOccurs="1">
<classification id="wordtype"/>
</element>
<element name="complexType" type="classification" minOccurs="0" maxOccurs="1">
<classification id="complexType"/>
</element>
<element name="corpus" type="classification" minOccurs="0" maxOccurs="1">
<classification id="corpus"/>
</element>
<element name="pos" type="classification" minOccurs="0" maxOccurs="1">
<classification id="pos"/>
</element>
<element name="PrefixbaseForm" type="classification" minOccurs="0"
maxOccurs="1">
<classification id="PrefixbaseForm"/>
</element>
<element name="SuffixbaseForm" type="classification" minOccurs="0"
maxOccurs="1">
<classification id="SuffixbaseForm"/>
</element>
<element name="inflection" type="classification" minOccurs="0" maxOccurs="1">
<classification id="inflection"/>
</element>
<element name="corpuslink" type="link" minOccurs="0" maxOccurs="unbounded" >
<target type="corpmeta"/>
</element>
</metadata>
</objecttype>
\end{lstlisting}
Additionally, it is worth mentioning that some attributes are modeled as a
\emph{classification}. All of these have to be listed
as separate elements in the data model. This has been done for all attributes
that are subject to little or no change. In fact, all suffix
and prefix morphemes of the language under investigation should be known in advance and are
therefore defined as classifications; a sketch of such a classification file is given below.
The same is true for the parts of speech, named \emph{pos} in the morphilo data
model above, for which the PENN-Treebank tagset was used.
Last, the different morphemic layers, named \emph{m} in the
standard, are changed to $m1$ through $m5$. This is the
only deviation from the standard that could be problematic if the data is to be
processed elsewhere and the change is not documented explicitly. Yet, it
was necessary because the MyCoRe repository throws errors caused by ambiguity
issues on the different $m$-layers.
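As an illustration, the following is a minimal sketch of how one of these classifications,
here \emph{wordtype}, might be defined as a MyCoRe classification file. The element names
follow the usual MyCoRe classification schema, but the file name, the ID, and the category
values are assumptions made for illustration only.
\begin{lstlisting}[language=XML,caption={Sketch of a classification file
(assumed values)},label=lst:classificationsketch]
<?xml version="1.0" encoding="UTF-8"?>
<!-- assumed example; the actual categories are defined by the project -->
<mycoreclass ID="wordtype">
  <label xml:lang="en" text="word type" description="morphological word type"/>
  <categories>
    <category ID="simplex">
      <label xml:lang="en" text="simplex"/>
    </category>
    <category ID="derivative">
      <label xml:lang="en" text="derivative"/>
    </category>
  </categories>
</mycoreclass>
\end{lstlisting}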
The second data model describes only very few properties of the text corpora
from which the words are extracted. Listing \ref{lst:corpusdatamodel} depicts
only the metadata element. For the prototype, this
data model is kept as simple as possible. The only obligatory field is the name of
the corpus. The dates of the corpus are optional because in
some cases a text cannot be dated reliably.
\begin{lstlisting}[language=XML,caption={Corpus Data
Model},label=lst:corpusdatamodel]
<metadata>
<!-- obligatory fields -->
<element name="korpusname" type="text" minOccurs="1" maxOccurs="1"/>
<!-- optional fields -->
<element name="sprache" type="text" minOccurs="0" maxOccurs="1"/>
<element name="size" type="number" minOccurs="0" maxOccurs="1"/>
<element name="datefrom" type="text" minOccurs="0" maxOccurs="1"/>
<element name="dateuntil" type="text" minOccurs="0" maxOccurs="1"/>
<!-- number of words -->
<element name="NoW" type="text" minOccurs="0" maxOccurs="1"/>
<element name="corpuslink" type="link" minOccurs="0" maxOccurs="unbounded">
<target type="morphilo"/>
</element>
</metadata>
\end{lstlisting}
As a final remark, one might have noticed that all attributes are modelled as
strings although other data types are available and the fields encoding dates or
the number of words suggest otherwise. The MyCoRe framework even provides a
data type \emph{historydate}. There is no fully satisfying answer for not using it.
All that can be said is that data types other than string
later lead to problems in the interplay between the search engine and the
repository framework. These issues seem to be well known and can be followed up on
GitHub.
Framework
=========
\begin{figure}
\centering
\includegraphics[scale=0.33]{mycore_architecture-2.png}
\caption[MyCoRe-Architecture and Components]{MyCoRe-Architecture and Components\protect\footnotemark}
\label{fig:abbMyCoReStruktur}
\end{figure}
\footnotetext{source: https://www.mycore.de}
To adapt the MyCoRe framework, the morphilo application logic has to be implemented,
the TEI data model specified, and the input, search, and output masks programmed.
There are three directories which are
important for adjusting the MyCoRe framework to the needs of one's own application. These three directories
correspond essentially to the three components of the MVC model as explicated in
section \ref{subsec:mvc}. Roughly, they are visualized in figure \ref{fig:abbMyCoReStruktur} in the upper
right-hand corner. More precisely, the view (\emph{Layout} in figure \ref{fig:abbMyCoReStruktur}) and the model layer
(\emph{Datenmodell} in figure \ref{fig:abbMyCoReStruktur}) can be configured
completely via the ``interface'', which is a directory with a predefined
structure and some standard files. For the configuration of the logic, an extra directory is provided
(\emph{/src/main/java/custom/mycore/addons/}). Here, all Java classes
extending the controller layer should be added.
Practically, all three MVC layers are placed in the
\emph{src/main/} directory of the application. In one of its subdirectories,
\emph{datamodel/def}, the data model specifications are defined as XML files; this parallels the model
layer in the MVC pattern. How the data model was defined will be explained in
section \ref{subsec:datamodelimpl}. A sketch of the resulting directory layout is given below.
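The following sketch summarizes how these directories relate to the MVC layers. It only
lists the paths mentioned in this documentation and is not a complete picture of a MyCoRe
application directory tree.
\begin{lstlisting}[caption={Directory layout of the morphilo
application (simplified)},label=lst:dirlayout]
src/main/
  java/custom/mycore/addons/   controller: Java classes with the application logic
  datamodel/def/               model: data model definitions as XML files
  resources/                   view: XSL stylesheets, images, styles, javascripts
\end{lstlisting}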
View
====
Conceptualization
-----------------
Lastly, the third directory (\emph{src/main/resources}) contains all code needed
for rendering the data to be displayed on the screen, so it corresponds to
the view in the MVC approach. The rendering is done by XSL files that (unfortunately)
contain some logic that really belongs to the controller. Thus, the division is
not as clear as the theory implies. I will discuss this issue more specifically in the
relevant subsection below. Among the resources are also all images, styles, and
javascripts.
Implementation
--------------
As explained in section \ref{subsec:mvc}, the view component handles the visual
representation in the form of an interface that allows interaction between
the user and the task to be carried out by the machine. Since the present case is a
web service, all interaction happens via a browser, i.e. webpages are
visualized and responses are recognized by registering mouse or keyboard
events. More specifically, a webpage is rendered by transforming XML documents
into HTML pages. The MyCoRe repository framework uses Xalan, an open source XSLT
processor from Apache.\footnote{http://xalan.apache.org} This engine
transforms document nodes addressed by XPath expressions into hypertext, making
use of a special form of template matching. All templates are collected in so-called
XML-encoded stylesheets. Since there are two data models with two
different structures, it is good practice to define two stylesheet files, one for
each data model.
As a demonstration, listing \ref{lst:morphilostylesheet} below gives a short
extract for rendering the word data.
\begin{lstlisting}[language=XML,caption={stylesheet
morphilo.xsl},label=lst:morphilostylesheet]
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:xalan="http://xml.apache.org/xalan"
xmlns:i18n="xalan://org.mycore.services.i18n.MCRTranslation"
xmlns:acl="xalan://org.mycore.access.MCRAccessManager"
xmlns:mcr="http://www.mycore.org/" xmlns:xlink="http://www.w3.org/1999/xlink"
xmlns:mods="http://www.loc.gov/mods/v3"
xmlns:encoder="xalan://java.net.URLEncoder"
xmlns:mcrxsl="xalan://org.mycore.common.xml.MCRXMLFunctions"
xmlns:mcrurn="xalan://org.mycore.urn.MCRXMLFunctions"
exclude-result-prefixes="xalan xlink mcr i18n acl mods mcrxsl mcrurn encoder"
version="1.0">
<xsl:param name="MCR.Users.Superuser.UserName"/>
<xsl:template match="/mycoreobject[contains(@ID,'_morphilo_')]">
<head>
<link href="{$WebApplicationBaseURL}css/file.css" rel="stylesheet"/>
</head>
<div class="row">
<xsl:call-template name="objectAction">
<xsl:with-param name="id" select="@ID"/>
<xsl:with-param name="deriv" select="structure/derobjects/derobject/@xlink:href"/>
</xsl:call-template>
<xsl:variable name="objID" select="@ID"/>
<!-- set the heading here -->
<h1 style="text-indent: 4em;">
<xsl:if test="metadata/def.morphiloContainer/morphiloContainer/morphilo/w">
<xsl:value-of select="metadata/def.morphiloContainer/morphiloContainer/morphilo/w/text()[string-length(normalize-space(.))>0]"/>
</xsl:if>
</h1>
<dl class="dl-horizontal">
<!-- (1) Display word -->
<xsl:if test="metadata/def.morphiloContainer/morphiloContainer/morphilo/w">
<dt>
<xsl:value-of select="i18n:translate('response.page.label.word')"/>
</dt>
<dd>
<xsl:value-of select="metadata/def.morphiloContainer/morphiloContainer/morphilo/w/text()[string-length(normalize-space(.))>0]"/>
</dd>
</xsl:if>
<!-- (2) Display lemma -->
...
</xsl:template>
...
<xsl:template name="objectAction">
...
</xsl:template>
...
</xsl:stylesheet>
\end{lstlisting}
This template matches the root node of each \emph{MyCoRe object}, ensuring that a valid MyCoRe model is
used and checking that the document to be processed contains a unique
identifier, here a \emph{MyCoRe-ID}, and the name of the correct data model,
here \emph{morphilo}.
Then another template, \emph{objectAction}, is called with two parameters, the ids
of the document object and of the attached files. In the remainder, all relevant
information from the document, such as the word and the lemma, is accessed by XPath
and, enriched with hypertext annotations, rendered as a hypertext document.
The template \emph{objectAction} is key to understanding the coupling process in the software
framework. It is therefore listed separately in listing \ref{lst:objActionTempl}.
\begin{lstlisting}[language=XML,caption={template
objectAction},label=lst:objActionTempl,escapechar=|]
<xsl:template name="objectAction">
<xsl:param name="id" select="./@ID"/>
<xsl:param name="accessedit" select="acl:checkPermission($id,'writedb')"/>
<xsl:param name="accessdelete" select="acl:checkPermission($id,'deletedb')"/>
<xsl:variable name="derivCorp" select="./@label"/>
<xsl:variable name="corpID" select="metadata/def.corpuslink[@class='MCRMetaLinkID']/corpuslink/@xlink:href"/>
<xsl:if test="$accessedit or $accessdelete">|\label{ln:ng}|
<div class="dropdown pull-right">
<xsl:if test="string-length($corpID) &gt; 0 or $CurrentUser='administrator'">
<button class="btn btn-default dropdown-toggle" style="margin:10px" type="button" id="dropdownMenu1" data-toggle="dropdown" aria-expanded="true">
<span class="glyphicon glyphicon-cog" aria-hidden="true"></span> Annotieren
<span class="caret"></span>
</button>
</xsl:if>
<xsl:if test="string-length($corpID) &gt; 0">|\label{ln:ru}|
<xsl:variable name="ifsDirectory" select="document(concat('ifs:/',$derivCorp))"/>
<ul class="dropdown-menu" role="menu" aria-labelledby="dropdownMenu1">
<li role="presentation">
|\label{ln:nw1}|<a href="{$ServletsBaseURL}object/tag{$HttpSession}?id={$derivCorp}&amp;objID={$corpID}" role="menuitem" tabindex="-1">|\label{ln:nw2}|
<xsl:value-of select="i18n:translate('object.nextObject')"/>
</a>
</li>
<li role="presentation">
<a href="{$WebApplicationBaseURL}receive/{$corpID}" role="menuitem" tabindex="-1">
<xsl:value-of select="i18n:translate('object.backToProject')"/>
</a>
</li>
</ul>
</xsl:if>
<xsl:if test="$CurrentUser='administrator'">
<ul class="dropdown-menu" role="menu" aria-labelledby="dropdownMenu1">
<li role="presentation">
<a role="menuitem" tabindex="-1" href="{$WebApplicationBaseURL}content/publish/morphilo.xed?id={$id}">
<xsl:value-of select="i18n:translate('object.editWord')"/>
</a>
</li>
<li role="presentation">
<a href="{$ServletsBaseURL}object/delete{$HttpSession}?id={$id}" role="menuitem" tabindex="-1" class="confirm_deletion option" data-text="Wirklich loeschen">
<xsl:value-of select="i18n:translate('object.delWord')"/>
</a>
</li>
</ul>
</xsl:if>
</div>
<div class="row" style="margin-left:0px; margin-right:10px">
<xsl:apply-templates select="structure/derobjects/derobject[acl:checkPermission(@xlink:href,'read')]">
<xsl:with-param name="objID" select="@ID"/>
</xsl:apply-templates>
</div>
</xsl:if>
</xsl:template>
\end{lstlisting}
The \emph{objectAction} template defines the selection menu that appears -- once manual tagging has
started -- on the upper right-hand side of the webpage, entitled
\emph{Annotieren} and displaying the two options \emph{next word} and \emph{back
to project}.
The first thing to note here is that in line \ref{ln:ng} a simple test
excludes all guest users from accessing the procedure. After ensuring that only
the user who owns the corpus project has access (line \ref{ln:ru}), s/he is
able to open the drop-down menu, whose entries are really urls, e.g. line
\ref{ln:nw1}. The attentive reader might have noticed that
the url exactly matches the definition in web-fragment.xml as shown in
listing \ref{lst:webfragment}, line \ref{ln:tag}, which resolves to the
respective Java class there. In fact, this mechanism is the data interface within the
MVC pattern. The url also contains two variables, named \emph{derivCorp} and
\emph{corpID}, that the Java classes need to identify the corpus and file object
(see section \ref{sec:javacode}).
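Since listing \ref{lst:webfragment} is discussed elsewhere, a minimal sketch of what such a
servlet mapping in a web-fragment.xml might look like is given here. The servlet class name
\emph{TagCorpusServlet} and the exact url pattern are assumptions made for illustration and may
differ from the actual definitions in listing \ref{lst:webfragment}.
\begin{lstlisting}[language=XML,caption={Sketch of a servlet mapping in
web-fragment.xml (assumed names)},label=lst:webfragmentsketch]
<web-fragment xmlns="http://java.sun.com/xml/ns/javaee" version="3.0">
  <!-- assumed servlet class implementing the manual tagging logic -->
  <servlet>
    <servlet-name>TagCorpusServlet</servlet-name>
    <servlet-class>custom.mycore.addons.TagCorpusServlet</servlet-class>
  </servlet>
  <!-- maps the url built in the stylesheet (object/tag) to that class -->
  <servlet-mapping>
    <servlet-name>TagCorpusServlet</servlet-name>
    <url-pattern>/servlets/object/tag</url-pattern>
  </servlet-mapping>
</web-fragment>
\end{lstlisting}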
The morphilo.xsl stylesheet contains yet another modification that deserves mention.
In listing \ref{lst:derobjectTempl}, line \ref{ln:morphMenu}, two menu options --
\emph{Tag automatically} and \emph{Tag manually} -- are defined. The former option
initiates ProcessCorpusServlet.java, as can be seen again in listing \ref{lst:webfragment},
line \ref{ln:process}, which determines the words that are not yet in the master database.
It is important to note that this menu option is only displayed if two restrictions
are met: first, a file has to be uploaded (line \ref{ln:1test}) and, second, there must be
only one file (line \ref{ln:2test}). This is necessary because the annotation process generates further files,
namely one that stores the words not yet processed and one that includes the final result. The
generated files follow a certain naming pattern: the file containing the final, fully TEI-annotated
corpus is prefixed with \emph{tagged}, the other file with \emph{untagged}. This circumstance
is exploited for controlling the second option (line \ref{ln:loop}). A loop runs through all
files in the respective directory and, if a file name starts with \emph{untagged},
the option to tag manually is displayed.
\begin{lstlisting}[language=XML,caption={template
matching derobject},label=lst:derobjectTempl,escapechar=|]
<xsl:template match="derobject" mode="derivateActions">
<xsl:param name="deriv" />
<xsl:param name="parentObjID" />
<xsl:param name="suffix" select="''" />
<xsl:param name="id" select="../../../@ID" />
<xsl:if test="acl:checkPermission($deriv,'writedb')">
<xsl:variable name="ifsDirectory" select="document(concat('ifs:',$deriv,'/'))" />
<xsl:variable name="path" select="$ifsDirectory/mcr_directory/path" />
...
<div class="options pull-right">
<div class="btn-group" style="margin:10px">
<a href="#" class="btn btn-default dropdown-toggle" data-toggle="dropdown">
<i class="fa fa-cog"></i>
<xsl:value-of select="' Korpus'"/>
<span class="caret"></span>
</a>
<ul class="dropdown-menu dropdown-menu-right">
<!-- Morphilo adjustments -->|\label{ln:morphMenu}|
<xsl:if test="string-length($deriv) &gt; 0">|\label{ln:1test}|
<xsl:if test="count($ifsDirectory/mcr_directory/children/child) = 1">|\label{ln:2test}|
<li role="presentation">
<a href="{$ServletsBaseURL}object/process{$HttpSession}?id={$deriv}&amp;objID={$id}" role="menuitem" tabindex="-1">
<xsl:value-of select="i18n:translate('derivate.process')"/>
</a>
</li>
</xsl:if>
<xsl:for-each select="$ifsDirectory/mcr_directory/children/child">|\label{ln:loop}|
<xsl:variable name="untagged" select="concat($path, 'untagged')"/>
<xsl:variable name="filename" select="concat($path,./name)"/>
<xsl:if test="starts-with($filename, $untagged)">
<li role="presentation">
<a href="{$ServletsBaseURL}object/tag{$HttpSession}?id={$deriv}&amp;objID={$id}" role="menuitem" tabindex="-1">
<xsl:value-of select="i18n:translate('derivate.taggen')"/>
</a>
</li>
</xsl:if>
</xsl:for-each>
</xsl:if>
...
</ul>
</div>
</div>
</xsl:if>
</xsl:template>
\end{lstlisting}
Besides the two stylesheets morphilo.xsl and corpmeta.xsl, other stylesheets have
to be adjusted. They will not be discussed in detail here, for they are self-explanatory for the most part.
Essentially, they render the overall layout (\emph{common-layout.xsl}, \emph{skeleton\_layout\_template.xsl}),
the presentation
of the search results (\emph{response-page.xsl}), and the definitions of the Solr search fields (\emph{searchfields-solr.xsl}).
The former and the latter also inherit templates from \emph{response-general.xsl} and \emph{response-browse.xsl}, in which the
navigation bar of the search results can be changed. For multilingual support, a separate configuration directory
has to be created containing one \emph{.properties} file per
language to be displayed. In the current case these are restricted to German and English (\emph{messages\_de.properties} and \emph{messages\_en.properties}).
The property files include all \emph{i18n} definitions; an illustrative extract is sketched below. All these files are located in the \emph{resources} directory.
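The following is a minimal sketch of what a few entries in \emph{messages\_en.properties} might
look like. The keys are those referenced via \emph{i18n:translate} in the stylesheet extracts above,
whereas the values are assumptions made for illustration.
\begin{lstlisting}[caption={Sketch of i18n definitions in
messages\_en.properties (values assumed)},label=lst:i18nsketch]
# keys referenced in morphilo.xsl; values are illustrative assumptions
response.page.label.word = Word
object.nextObject = next word
object.backToProject = back to project
derivate.process = Tag automatically
derivate.taggen = Tag manually
\end{lstlisting}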
Furthermore, a search mask and a page for manually entering the annotations had
to be designed.
For these files, the repository framework recommends using a specially designed XML
standard (\emph{xed}).