# INEL Corpus Services
### How to run it through the CLI
An example:
> java -Xmx3g -jar /path/to/corpus-services.jar -i /path/to/corpus -o path/to/corpus/curation/report-output.html -c INELChecks -f -p "fsm=/path/to/corpus/corpus-utilities/segmentation.fsm"
**Available options**
*-i*, *--input*
Required. The path to source file(s) you want to perform an action on.
> -i /path/to/corpus
If it's a path to a directory, then the action will be applied to all the eligible files within the directory and all subdirectories.
> -i /path/to/corpus/selkup.coma
If it's a path to a file, then the action will be applied to that file only.
*-o*, *--output*
Optional. The path to a report file (HTML or JSON) containing warnings and errors found in the source data. If this option is not provided, no report will be made.
> -o path/to/corpus/curation/report-output.html
Produces an HTML version of the report that can be viewed in a browser.
> -o path/to/corpus/curation/report.json
Produces a JSON version of the report.
> -o path/to/corpus/curation/report-output.html -o path/to/corpus/curation/report.json
You may provide the option twice to produce both versions of the report.
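As a rough sketch of how a wrapper script might request both formats at once, the helper below prints the two *-o* options for a given report directory and base name. The function name `report_options` and its two parameters are assumptions made for illustration; they are not part of Corpus Services.

```shell
# Hypothetical helper (not part of Corpus Services): print the -o options
# for an HTML and a JSON report, one token per line.
#   $1: directory the reports should be written to
#   $2: report file name without extension
report_options() {
    report_dir=$1
    base_name=$2
    printf '%s\n' "-o" "$report_dir/$base_name.html" \
                  "-o" "$report_dir/$base_name.json"
}

# Show the options that would be added to the java call.
report_options path/to/corpus/curation report
```

The printed tokens can then be appended to the java command line by hand or by a surrounding script.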
*-c*, *--corpusfunction* / *-u*, *--utilityfunction*
Optional. The name of the function you want to run, i.e. the case-sensitive name of the respective Java class from the .validation or .utilities package. If neither *-c* nor *-u* is provided, Corpus Services will do nothing. Currently *-c* and *-u* perform the same actions and are thus interchangeable, though that may be subject to change in the future.
> -i selkup.coma -c ComaApostropheChecker
Will call ComaApostropheChecker on the comafile.
> -i selkup.coma -u PrettyPrintData
Will call PrettyPrintData on the comafile.
> -i selkup.coma -c ComaApostropheChecker -c ComaAttachedFilepathsChecker -c ComaFileCoverageChecker
You may chain the option to run several checking classes during the same run.
> -c INELChecks
A useful shortcut to run all the functions from the .validation package.
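When the list of checks is kept in a script rather than typed by hand, the chained *-c* options can be generated from the class names. The following is a minimal sketch; `function_options` is a hypothetical helper name, and only the class names themselves come from the examples above.

```shell
# Hypothetical helper (not part of Corpus Services): turn a list of class
# names into chained -c options, one token per line.
function_options() {
    for class_name in "$@"; do
        printf '%s\n' "-c" "$class_name"
    done
}

# Class names taken from the chaining example in this section.
function_options ComaApostropheChecker ComaAttachedFilepathsChecker ComaFileCoverageChecker
```

Because class names are case-sensitive, a script like this also keeps the spelling in one place.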
*-f*, *--fix*
Optional, boolean. If selected, Corpus Services will automatically fix some errors where possible and rewrite the source files. If not, Corpus Services will collect issues to be written in a report, and no changes to the source files will be made.
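Since *-f* rewrites the source files, one cautious way to use it is to run a report-only pass first and add *-f* only after reviewing the report. This two-pass workflow is a suggestion, not something Corpus Services requires. The sketch below only prints the two commands; `compose_run` and the placeholder paths are assumptions.

```shell
# Hypothetical wrapper (names and paths are placeholders): print the command
# for a report-only pass, or for a fixing pass when $1 is "fix".
#   $1: "report" or "fix"
#   $2: path to the corpus
compose_run() {
    mode=$1
    corpus=$2
    set -- java -Xmx3g -jar /path/to/corpus-services.jar \
        -i "$corpus" -o "$corpus/curation/report-output.html" \
        -c INELChecks -p "fsm=$corpus/corpus-utilities/segmentation.fsm"
    if [ "$mode" = "fix" ]; then
        set -- "$@" -f    # -f lets Corpus Services rewrite the source files
    fi
    # Printed for inspection only; the line is not shell-quoted.
    printf '%s\n' "$*"
}

compose_run report /path/to/corpus   # first pass: only collect issues
compose_run fix /path/to/corpus      # second pass: also apply automatic fixes
```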
*-p*, *--property*
Optional. Some checks require user-supplied properties and will not run correctly without them. The general syntax is *-p "property_name=property_value"*.
> -c ExbSegmentationChecker -p "fsm=/path/to/corpus/corpus-utilities/segmentation.fsm"
ExbSegmentationChecker looks for an external FSM to perform segmentation. The *property_name* in this example is *fsm*, the *property_value* is */path/to/corpus/corpus-utilities/segmentation.fsm*.
> -u ComaMassLinkFiles -p "exb=true" -p "eaf=true" -f
You may chain several properties in one call, same as with *-c/-u*. In the example above, ComaMassLinkFiles is being used to automatically link EXB and EAF files to their respective communications in the comafile.
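Because every property must follow the *name=value* form, a wrapper script could check that shape before building the chained *-p* options. This is a sketch under that assumption; `property_options` is a hypothetical helper, and it does not know which property names a given class actually accepts.

```shell
# Hypothetical helper (not part of Corpus Services): emit one -p option per
# name=value pair, one token per line, and skip arguments without an "=".
property_options() {
    for pair in "$@"; do
        case $pair in
            *=*) printf '%s\n' "-p" "$pair" ;;
            *)   echo "skipping malformed property: $pair" >&2 ;;
        esac
    done
}

# Properties taken from the ComaMassLinkFiles example above.
property_options exb=true eaf=true
```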
*-x*, *--xquery*
Optional; used only by the XQueryWrapper class. Specifies the name of the query to run.
> -u XQueryWrapper -x pos
Here Corpus Services will run the query named *pos* that counts parts of speech across several corpora.