... | ... | @@ -30,6 +30,22 @@ Complex string manipulations by searching and replacing specific patterns. |
|
|
Conversion between different formats in order to unify the data and handle missing values.
|
|
|
|
|
|
|
|
|
==== algorithmic text extraction
|
|
|
Automated text extraction from markup language. See: (<<Günther_etal2014>>,117)
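
As a minimal sketch of what such an extraction can look like, the following Python example collects the visible text of an HTML snippet using only the standard library. The class name `TextExtractor`, the helper `extract_text`, and the sample markup are illustrative assumptions and are not taken from the cited source.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text and skip the content of script/style elements."""

    SKIPPED = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep non-empty text that is not inside a skipped element.
        text = data.strip()
        if text and self._skip_depth == 0:
            self.chunks.append(text)


def extract_text(html):
    """Return the visible text of an HTML string as one space-joined line."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


sample = (
    "<html><head><style>p {color: red}</style></head>"
    "<body><h1>Title</h1><p>Some <b>bold</b> text.</p></body></html>"
)
print(extract_text(sample))  # Title Some bold text.
```

This only strips markup. It does not yet separate article text from navigation or other boilerplate, which is what the more specific approaches below aim at.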
|
|
|
|
|
|
|
|
|
===== body text extraction
|
|
|
Extraction based on structural features of web pages. See: (<<Günther_etal2014>>,118)
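
One way to make the idea concrete is sketched below. It assumes a formulation in which the page is read as a sequence of tag tokens and word tokens and the body is the contiguous stretch with the most words and the fewest tags. This formulation, the function name `body_text`, and the sample page are assumptions for illustration; the cited source may describe the method differently.

```python
import re

# A token is either a complete tag or a run of non-whitespace text.
TOKEN_RE = re.compile(r"<[^>]+>|[^<\s]+")


def body_text(html):
    """Return the words of the token span that maximises words minus tags."""
    tokens = TOKEN_RE.findall(html)
    # Score +1 for a word token and -1 for a tag token.
    scores = [-1 if t.startswith("<") else 1 for t in tokens]

    # Maximum-sum contiguous span (Kadane's algorithm); best_end is exclusive.
    best_sum, best_start, best_end = 0, 0, 0
    cur_sum, cur_start = 0, 0
    for idx, score in enumerate(scores):
        if cur_sum <= 0:
            cur_sum, cur_start = 0, idx
        cur_sum += score
        if cur_sum > best_sum:
            best_sum, best_start, best_end = cur_sum, cur_start, idx + 1

    words = [t for t in tokens[best_start:best_end] if not t.startswith("<")]
    return " ".join(words)


page = (
    "<div><a href='/'>Home</a><a href='/news'>News</a></div>"
    "<p>This is the main article text with many words.</p>"
    "<div><a href='/imprint'>Imprint</a></div>"
)
print(body_text(page))  # This is the main article text with many words.
```

Because the whole page is scored as one sequence, the result is a single global region. Short content passages that sit between tag-heavy parts can be lost.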
|
|
|
|
|
|
|
|
|
===== boilerpipe
|
|
|
A more local approach than body text extraction. See: (<<Günther_etal2014>>,118)
|
|
|
|
|
|
|
|
|
===== jusText
|
|
|
A heuristic-based boilerplate removal tool. See: (<<Pomikálek2011>>)
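
To illustrate what a heuristic classification of text blocks can look like, the sketch below labels a block as content or boilerplate from its length, link density, and stop word density. It is a simplified illustration and not the jusText implementation: the `Block` type, the `classify` function, the stop word list, and all thresholds are hypothetical.

```python
from dataclasses import dataclass

# Tiny illustrative stop word list; real lists are language-specific and longer.
STOPWORDS = {"the", "a", "an", "of", "and", "is", "in", "to", "that", "it", "for", "on"}


@dataclass
class Block:
    text: str        # visible text of one paragraph-like block
    link_chars: int  # number of characters that sit inside <a> elements


def classify(block, min_length=40, max_link_density=0.2, min_stopword_density=0.3):
    """Return 'good' for likely content and 'bad' for likely boilerplate."""
    words = block.text.lower().split()
    if not words:
        return "bad"
    link_density = block.link_chars / len(block.text)
    stopword_hits = sum(w.strip(".,;:!?") in STOPWORDS for w in words)
    stopword_density = stopword_hits / len(words)
    if link_density > max_link_density:
        return "bad"  # mostly link text, typical for navigation
    if len(block.text) < min_length:
        return "bad"  # too short to judge reliably in this simplified sketch
    return "good" if stopword_density >= min_stopword_density else "bad"


navigation = Block("Home News Sports Contact", link_chars=24)
article = Block(
    "The author of the report said that it is too early to draw a conclusion.",
    link_chars=0,
)
footer = Block("Copyright 2014 Example Media Group International Limited", link_chars=0)

for block in (navigation, article, footer):
    print(classify(block), "-", block.text)
```

The intuition is that running prose contains many function words and few links, while menus, footers, and link lists show the opposite pattern.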
|
|
|
|
|
|
|
|
|
=== text preprocessing
|
|
|
Some text preprocessing tasks in natural language processing.
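
A minimal sketch of three such tasks (lowercasing, tokenization, and stop word removal) is shown below. The function name `preprocess`, the short stop word list, and the example sentence are illustrative assumptions; real pipelines usually rely on language-specific resources.

```python
import re

# Short illustrative stop word list; not a complete resource.
STOPWORDS = {"the", "a", "an", "of", "and", "is", "in", "to"}


def preprocess(text):
    """Lowercase the text, keep runs of letters as tokens, drop stop words."""
    # [^\W\d_]+ matches runs of Unicode letters (no digits, no underscore).
    tokens = re.findall(r"[^\W\d_]+", text.lower())
    return [t for t in tokens if t not in STOPWORDS]


print(preprocess("The Analysis of 3 News-Articles is done in Köln."))
# ['analysis', 'news', 'articles', 'done', 'köln']
```

Whether digits, hyphenated compounds, or stop words should be removed depends on the later analysis step, so each of these choices is a parameter of the pipeline and not a fixed rule.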
|
|
|
|
... | ... | @@ -293,6 +309,8 @@ Using methods to predict the future for estimation of current values. (Example: |
|
|
|
|
|
- [[[Cabrio2018]]] Cabrio, E., & Villata, S. (2018). Five years of argument mining: a data-driven analysis. In Proceedings of the 27th International Joint Conference on Artificial Intelligence (pp. 5427–5433).
|
|
|
|
|
|
- [[[Günther_etal2014]]] Günther, E., & Scharkow, M. (2014). Automatisierte Datenbereinigung bei Inhalts- und Linkanalysen von Online-Nachrichten. In K. Sommer, M. Wettstein, W. Wirth, & J. Matthes (Eds.), Automatisierung der Inhaltsanalyse (pp. 111–126). Köln: Halem.
|
|
|
|
|
|
- [[[Han_etal2012]]] Han, J., Kamber, M., & Pei, J. (2012). Data Mining: Concepts and Techniques. Saint Louis, UNITED STATES: Elsevier Science & Technology.
|
|
|
|
|
|
- [[[Ignatow_etal2017]]] Ignatow, G., & Mihalcea, R. F. (2017). Text mining: A guidebook for the social sciences. Los Angeles, London, New Delhi, Singapore, Washington DC, Melbourne: Sage.
|
... | ... | @@ -307,6 +325,8 @@ Using methods to predict the future for estimation of current values. (Example: |
|
|
|
|
|
- [[[Papilloud_etal2018]]] Papilloud, C., & Hinneburg, A. (Eds.). (2018). Studienskripten zur Soziologie. Qualitative Textanalyse mit Topic-Modellen: Eine Einführung für Sozialwissenschaftler. Wiesbaden: Springer VS.
|
|
|
|
|
|
- [[[Pomikálek2011]]] Pomikálek, J. (2011). Removing Boilerplate and Duplicate Content from Web Corpora (PhD thesis). Masaryk University, Brno.
|
|
|
|
|
|
- [[[Roberts2013]]] Roberts, M. E., Stewart, B. M., Tingley, D., Airoldi, E. M., & others (2013). The structural topic model and applied social science. In Advances in neural information processing systems workshop on topic models: computation, application, and evaluation (pp. 1–20).
|
|
|
|
|
|
- [[[Rogers2013]]] Rogers, R. (2013). Digital methods. Cambridge, Massachusetts, London, England: The MIT Press.
|
... | ... | |