# INEL Corpus Services

### How to run it through the CLI

An example:

> java -Xmx3g -jar /path/to/corpus-services.jar -i /path/to/corpus -o path/to/corpus/curation/report-output.html -c INELChecks -f -p "fsm=/path/to/corpus/corpus-utilities/segmentation.fsm"

**Available options**

*-i*, *--input*

Required. The path to the source file(s) on which you want to perform an action.

> -i /path/to/corpus

If it's a path to a directory, then the action will be applied to all the eligible files within the directory and all subdirectories.

> -i /path/to/corpus/selkup.coma

If it's a path to a file, then the action will be applied to that file only.

*-o*, *--output*

Optional. The path to a report file (HTML or JSON) containing warnings and errors found in the source data. If this option is not provided, no report will be made.

> -o path/to/corpus/curation/report-output.html

Produces an HTML version of the report that can be viewed in a browser.

> -o path/to/corpus/curation/report.json

Produces a JSON version of the report.

> -o path/to/corpus/curation/report-output.html -o path/to/corpus/curation/report.json

You may provide the option twice to produce both versions of the report.
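When Corpus Services is called from a script, the doubled *-o* option can be generated from a single report location. The following is a minimal sketch, not part of Corpus Services: the helper name `report_options` is hypothetical, and it simply assumes that both reports should share one file stem.

```python
def report_options(report_stem):
    """Return -o arguments for an HTML and a JSON report sharing one stem.

    Hypothetical helper for illustration; it only builds the argument list.
    """
    options = []
    for suffix in (".html", ".json"):
        # One -o per report format, as in the example above.
        options += ["-o", f"{report_stem}{suffix}"]
    return options


options = report_options("path/to/corpus/curation/report")
print(" ".join(options))
```

The resulting list can be appended to the rest of the command line before it is passed to the shell or to a process launcher.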

*-c*, *--corpusfunction* / *-u*, *--utilityfunction*

Optional. The name of the function you want to run, that is, the case-sensitive name of the respective Java class from the .validation or .utilities package. If neither *-c* nor *-u* is provided, Corpus Services will do nothing. Currently *-c* and *-u* perform the same actions and are thus interchangeable, though that may change in the future.

> -i selkup.coma -c ComaApostropheChecker

Will call ComaApostropheChecker on the comafile.

> -i selkup.coma -u PrettyPrintData

Will call PrettyPrintData on the comafile.

> -i selkup.coma -c ComaApostropheChecker -c ComaAttachedFilepathsChecker -c ComaFileCoverageChecker

You may chain the option to run several checking classes during the same run.

> -c INELChecks

A useful shortcut to run all the functions from the .validation package.

*-f*, *--fix*

Optional, boolean. If set, Corpus Services will automatically fix some errors where possible and rewrite the source files. If not set, Corpus Services will only collect issues to be written to a report, and no changes to the source files will be made.

*-p*, *--property*

Optional. Some checks require properties to be provided by the user and will not run correctly without them. The general syntax is *-p "property_name=property_value"*.

> -c ExbSegmentationChecker -p "fsm=/path/to/corpus/corpus-utilities/segmentation.fsm"

ExbSegmentationChecker looks for an external FSM to perform segmentation. The *property_name* in this example is *fsm*, the *property_value* is */path/to/corpus/corpus-utilities/segmentation.fsm*.

> -u ComaMassLinkFiles -p "exb=true" -p "eaf=true" -f

You may chain several properties in one call, same as with *-c/-u*. In the example above, ComaMassLinkFiles is being used to automatically link EXB and EAF files to their respective communications in the comafile.
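For batch runs, the repeated *-c/-u* and *-p* options can be assembled programmatically. The sketch below is only an illustration under stated assumptions: `build_command` is a hypothetical helper, not part of Corpus Services, and the jar path is the placeholder used throughout this document. It reproduces the ComaMassLinkFiles call above as an argument list.

```python
import shlex


def build_command(jar, source, functions, properties=None, fix=False, flag="-c"):
    """Assemble an argument list for one Corpus Services run.

    Hypothetical helper for illustration; it does not start Java itself.
    """
    command = ["java", "-jar", jar, "-i", source]
    for name in functions:
        # Repeat the flag once per function to chain several of them.
        command += [flag, name]
    for key, value in (properties or {}).items():
        # Each property becomes its own -p name=value pair.
        command += ["-p", f"{key}={value}"]
    if fix:
        command.append("-f")
    return command


command = build_command(
    "/path/to/corpus-services.jar",  # placeholder path
    "selkup.coma",
    ["ComaMassLinkFiles"],
    properties={"exb": "true", "eaf": "true"},
    fix=True,
    flag="-u",
)
print(shlex.join(command))
```

Because the arguments are kept as a list, no extra quoting around `name=value` is needed when the list is handed to something like `subprocess.run(command)`; the quotes in the examples above are only required when typing the command in a shell.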

*-x*, *--xquery*

Optional and specific to the XQueryWrapper class. Specifies the name of the query to run.

> -u XQueryWrapper -x pos

Here Corpus Services will run the query named *pos* that counts parts of speech across several corpora.