splitted tools
authored
Jul 12, 2019
by
Gallenkamp, Fabian
Tool_pdf2xml.asciidoc
.general data
|===
| name | pdf2xml
| short description | Converts PDF files to XML. The script relies heavily on Apache Tika and pdftotext for text extraction and for the conversion to XML. It tries to combine information from both tools and from different conversion modes.
| software category | scraping documents
| developer | Jörg Tiedemann
| maintainer | Jörg Tiedemann
| current version | None
| last changed | None
| programming language(s) | Perl, Java
| operating system(s) |
| license | GPL-3.0
| costs | 0
| language |
| architecture | library
| web-links | link:https://bitbucket.org/tiedemann/pdf2xml/src/master/[repository]
|===
.features
|===
| supported methods |
| additional features |
|===
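
The short description above mentions text extraction with pdftotext followed by conversion to XML. The following is a minimal sketch of that general idea in Python. It is not pdf2xml's actual implementation (which is written in Perl and Java and also uses Apache Tika). The function names and the `<document>`/`<page>`/`<p>` element layout are hypothetical and chosen only for illustration. The sketch assumes the `pdftotext` command-line tool is installed and relies on its convention of ending each page with a form feed character.

```python
import subprocess
from xml.etree import ElementTree as ET


def pages_to_xml(raw_text: str) -> str:
    """Wrap pdftotext-style output in a hypothetical XML layout.

    The element names (document, page, p) are illustrative assumptions,
    not the schema produced by pdf2xml.
    """
    root = ET.Element("document")
    # pdftotext ends each page with a form feed character.
    for number, page in enumerate(raw_text.split("\f"), start=1):
        if not page.strip():
            continue  # skip the empty chunk after the final form feed
        page_el = ET.SubElement(root, "page", number=str(number))
        # Simplification: treat blank-line-separated blocks as paragraphs.
        for block in page.split("\n\n"):
            text = " ".join(block.split())
            if text:
                ET.SubElement(page_el, "p").text = text
    return ET.tostring(root, encoding="unicode")


def extract_text(pdf_path: str) -> str:
    """Run pdftotext and return the extracted text.

    Passing "-" as the output file makes pdftotext write to stdout.
    """
    result = subprocess.run(
        ["pdftotext", pdf_path, "-"],
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout
```

With a PDF at hand, `pages_to_xml(extract_text("input.pdf"))` would return the XML as a string. Combining this output with a second extractor such as Apache Tika, as the description states pdf2xml does, is left out of the sketch.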