name |
boilerpipeR |
short description |
Generic extraction of the main text content from HTML files; removes ads, sidebars, and headers using the boilerpipe (http://code.google.com/p/boilerpipe/) Java library. The boilerpipe extraction heuristics perform robustly across a wide range of website templates. |
software category |
scraping websites |
developer |
C. Kohlschütter, P. Fankhauser, W. Nejdl |
maintainer |
Mario Annau |
current version |
None |
last changed |
None |
programming language(s) |
R |
operating system(s) |
|
license |
GPL-3.0 |
costs |
0 |
language |
|
architecture |
library |
web-links |
supported methods |
|
additional features |