... | ... | @@ -18,35 +18,6 @@ In this project a more inclusive conception of digital methods is assumed: the u |
|
|
Refers to the complete process of 'knowledge mining from data' <<Han_etal2012>>. It can be applied to various data types and consists of different steps and paradigms. For an application in the context of text mining in the social sciences, see the concept of "blended-reading" <<Stulpe_etal2016>>.
|
|
|
|
|
|
|
|
|
=== automated data collection
|
|
|
In principle, there are multiple possible data sources in a data mining process. A basic distinction relevant to automated data collection can be drawn between connected devices (internet, intranets) and unconnected devices (sensors, etc.). +
|
|
|
Furthermore, the client-server model is the established communication paradigm for connected devices. To obtain data from either the server or the client, there are three different interfaces: log files, APIs and user interfaces, which constitute the available procedures <<Jünger2018>>.
|
|
|
|
|
|
|
|
|
==== collect log-data
|
|
|
Collect log data that accrue while a (web) service is provided or information is processed.
|
|
|
|
|
|
|
|
|
==== parsing from api
|
|
|
Parse structured data via a documented REST API.
|
|
|
|
|
|
|
|
|
==== scraping
|
|
|
Automatically parse unstructured or semi-structured data from a regular website (⇒ web scraping) or service.
|
|
|
|
|
|
|
|
|
===== scraping (static content)
|
|
|
Automatically parse data from static HTML websites.
|
|
|
|
|
|
|
|
|
===== scraping (dynamic content)
|
|
|
Automatically parse dynamic content (HTML5/JavaScript); this sometimes requires mimicking user interaction.
|
|
|
|
|
|
|
|
|
==== crawling
|
|
|
Collect websites by starting from an initial set of webpages and following the links they contain <<Ignatow_etal2017>>.
|
|
|
|
|
|
|
|
|
=== data wrangling
|
|
|
Translate data into formats suited for automatic analysis. Example: PDFs ⇒ text. For a practical framework, see also <<Wickham_etal2017>>.
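
Besides file conversion, wrangling often means reshaping tables. As a minimal sketch, the following Python snippet turns a "wide" CSV table into a "long" one with one observation per row. The sample data, the column names and the helper `wide_to_long` are assumptions made for illustration only.

```python
import csv
import io

# Hypothetical input: one row per country, one column per year ("wide").
wide_csv = """country,2018,2019
DE,10,12
FR,7,9
"""


def wide_to_long(text: str, id_column: str, var_name: str, value_name: str) -> list:
    """Turn every non-id column into its own (id, variable, value) row."""
    rows = []
    for record in csv.DictReader(io.StringIO(text)):
        for column, value in record.items():
            if column != id_column:
                rows.append({
                    id_column: record[id_column],
                    var_name: column,
                    # Assumption: all measured values are integers.
                    value_name: int(value),
                })
    return rows


long_rows = wide_to_long(wide_csv, "country", "year", "count")
for row in long_rows:
    print(row)
```

The long format is usually easier to filter, group and plot, because each row can be read on its own.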
|
|
|
|
... | ... | @@ -208,6 +179,35 @@ Visualizations with user interaction or animations. |
|
|
|
|
|
|
|
|
|
|
|
=== automated data collection
|
|
|
In principle, there are multiple possible data sources in a data mining process. A basic distinction relevant to automated data collection can be drawn between connected devices (internet, intranets) and unconnected devices (sensors, etc.). +
|
|
|
Furthermore, the client-server model is the established communication paradigm for connected devices. To obtain data from either the server or the client, there are three different interfaces: log files, APIs and user interfaces, which constitute the available procedures <<Jünger2018>>.
|
|
|
|
|
|
|
|
|
==== collect log-data
|
|
|
Collect log data that accrue while a (web) service is provided or information is processed.
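
As a minimal sketch of what working with such log data can look like, the following Python snippet parses a single line of a web server access log. It assumes the widespread Common Log Format; the sample line, the pattern `LOG_PATTERN` and the helper `parse_log_line` are illustrative assumptions and would need to be adapted to the actual log layout.

```python
import re
from typing import Optional

# Assumption: the server writes Common Log Format lines, i.e.
# host ident user [timestamp] "request" status size
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
)


def parse_log_line(line: str) -> Optional[dict]:
    """Return the fields of one access-log line, or None if it does not match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    record = match.groupdict()
    record["status"] = int(record["status"])
    # A size of "-" means that no response body was sent.
    record["size"] = 0 if record["size"] == "-" else int(record["size"])
    return record


# Hypothetical sample line for illustration.
sample = '192.0.2.1 - - [10/Oct/2020:13:55:36 +0200] "GET /index.html HTTP/1.1" 200 2326'
record = parse_log_line(sample)
print(record["host"], record["request"], record["status"])
```

Returning `None` for lines that do not match keeps malformed entries from aborting a longer run over a whole log file.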
|
|
|
|
|
|
|
|
|
==== parsing from api
|
|
|
Parse structured data via a documented REST API.
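
The following Python sketch illustrates the typical steps: build a request URL, fetch the response and reduce the JSON payload to the fields of interest. The endpoint `https://api.example.org/v1/posts`, its parameters and the response structure are hypothetical; a real API documents its own. A canned response is used so that the example runs without network access.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: a hypothetical endpoint; replace it with a documented API.
BASE_URL = "https://api.example.org/v1/posts"


def build_url(base_url: str, **params) -> str:
    """Append query parameters to the endpoint URL."""
    return f"{base_url}?{urlencode(params)}"


def fetch_json(url: str):
    """Request the URL and decode the JSON response body (needs network access)."""
    with urlopen(url, timeout=10) as response:
        return json.load(response)


def extract_posts(payload: dict) -> list:
    """Reduce the (assumed) response structure to (id, title) pairs."""
    return [(item["id"], item["title"]) for item in payload["items"]]


# A canned response stands in for fetch_json(url) so the sketch runs offline.
canned = json.loads(
    '{"items": [{"id": 1, "title": "first"}, {"id": 2, "title": "second"}]}'
)
url = build_url(BASE_URL, page=1, per_page=2)
posts = extract_posts(canned)
print(url)
print(posts)
```

Because the API already returns structured data, no parsing of markup is needed; the main effort lies in paging through results and respecting the provider's rate limits.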
|
|
|
|
|
|
|
|
|
==== scraping
|
|
|
Automatically parse unstructured or semi-structured data from a regular website (⇒ web scraping) or service.
|
|
|
|
|
|
|
|
|
===== scraping (static content)
|
|
|
Automatically parse data from static HTML websites.
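
A minimal sketch using only the Python standard library: the class `LinkExtractor` (a hypothetical helper) collects all links and their texts from a static page. The HTML snippet is made up for illustration; in practice the page would be downloaded first, and dedicated scraping libraries offer more convenient selectors.

```python
from html.parser import HTMLParser


class LinkExtractor(HTMLParser):
    """Collect (href, link text) pairs from the <a> elements of a page."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None  # href of the <a> element currently open, if any
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        # Only keep text that appears inside a link with an href.
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


# Hypothetical static page for illustration.
page = """
<html><body>
  <h1>Reports</h1>
  <ul>
    <li><a href="/reports/2019.html">Report 2019</a></li>
    <li><a href="/reports/2020.html">Report <b>2020</b></a></li>
  </ul>
</body></html>
"""

parser = LinkExtractor()
parser.feed(page)
print(parser.links)
```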
|
|
|
|
|
|
|
|
|
===== scraping (dynamic content)
|
|
|
Automatically parse dynamic content (HTML5/JavaScript); this sometimes requires mimicking user interaction.
|
|
|
|
|
|
|
|
|
==== crawling
|
|
|
Collect websites by starting from an initial set of webpages and following the links they contain <<Ignatow_etal2017>>.
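
The link-following step can be sketched as a breadth-first traversal. In the following Python example the function `crawl` and the in-memory link structure `SITE` are assumptions for illustration; `get_links` stands in for downloading a page and extracting its links, and a real crawler would additionally respect `robots.txt` and throttle its requests.

```python
from collections import deque
from typing import Callable, Iterable


def crawl(seeds: Iterable[str],
          get_links: Callable[[str], Iterable[str]],
          max_pages: int = 100) -> list:
    """Breadth-first crawl: visit the seed pages, then the pages they link to."""
    queue = deque(dict.fromkeys(seeds))  # drop duplicate seeds, keep their order
    seen = set(queue)
    visited = []
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        visited.append(url)
        for link in get_links(url):
            if link not in seen:  # never enqueue the same page twice
                seen.add(link)
                queue.append(link)
    return visited


# Hypothetical link structure standing in for real HTTP requests.
SITE = {
    "/": ["/a", "/b"],
    "/a": ["/b", "/c"],
    "/b": ["/"],
    "/c": [],
}

order = crawl(["/"], lambda url: SITE.get(url, []))
print(order)
```

The `seen` set prevents endless loops on pages that link back to each other, and `max_pages` bounds the size of the collection.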
|
|
|
|
|
|
|
|
|
=== digital research design
|
|
|
New possibilities in surveys or data acquisition techniques.
|
|
|
|
... | ... | |