
Gallenkamp, Fabian · 2a9c30f9
--- a/MethodsList.asciidoc
+++ b/MethodsList.asciidoc
@@ -4,212 +4,291 @@
 :sectnums:
 :sectnumlevels: 8

-= Digital Methods

-== automated data collection/extraction

-=== data-scraping
+= digital methods 

-==== web-scraping (minimally structured)
-Extract text (/data) from websites.
- (Ignatow & Mihalcea, 2017, pp. 39–41)
+---

-===== static
-From static HTML websites.
+<<Rogers2013>> distinguishes between digitized (virtual) methods and natively digital methods. The former import standard methods from the social sciences and humanities into the new medium; the latter are entirely new methods that emerge from the medium's own structures and properties. +
+This project adopts a broader conception of digital methods: any potential use of digital technology in the research process.

-===== dynamic
-Dynamic content delivered via HTML5/JavaScript (=> user interaction required).

-=== parsing (documented API)
-Data provided through interfaces.
+== data mining
+Refers to the complete process of 'knowledge mining from data' <<Han_etal2012>>. It can be applied to various data types and comprises different steps and paradigms.

-=== device/software-driven (tracking)
-Collection and transmission via the researchers' own hardware/software (provided device, operating system, program, app).

-=== web-crawling
-Building a collection of websites, starting from a selection of links and following the links contained in them.
- (Ignatow & Mihalcea, 2017, pp. 37–39)
+=== automated data collection
+In principle, a data mining process can draw on multiple data sources. With regard to automated data collection, a basic distinction can be drawn between connected devices (internet, intranets) and unconnected devices (sensors, etc.). +
+Furthermore, the client-server model is the established communication paradigm for connected devices. Data can be obtained from either server or client through three different interfaces, which constitute the available procedures: log files, APIs and user interfaces <<Jünger2018>>.

-== data preparation - data wrangling
-Convert data into machine-readable form. Examples: PDFs, inconsistent formats in data

-Tips for implementation in R
- (Wickham & Grolemund, 2017, pp. 119–260)
+==== collect log-data
+Collect log data generated while providing a (web) service or processing information.
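As an illustration, server log data can be parsed with a regular expression. The sketch assumes the Common Log Format used by many HTTP servers; `CLF_PATTERN`, `parse_log_lines` and the sample lines are hypothetical, and real log formats depend on the server configuration.

```python
import re
from collections import Counter

# Assumed Common Log Format: host ident authuser [time] "method path protocol" status size
CLF_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) \S+" (?P<status>\d{3}) (?P<size>\d+|-)'
)

def parse_log_lines(lines):
    """Yield one dict per line matching the log format; non-matching lines are skipped."""
    for line in lines:
        match = CLF_PATTERN.match(line)
        if match:
            yield match.groupdict()

sample_log = [
    '127.0.0.1 - - [10/Oct/2018:13:55:36 +0200] "GET /index.html HTTP/1.1" 200 2326',
    '10.0.0.5 - - [10/Oct/2018:13:56:01 +0200] "GET /about.html HTTP/1.1" 404 -',
    'malformed line',
]
records = list(parse_log_lines(sample_log))
# A typical first aggregation of log data: requests per path
hits_per_path = Counter(record["path"] for record in records)
```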

-== data mining/ knowledge extraction
-"Data mining" is better understood as "knowledge extraction from data". There is no short German equivalent for "data mining".

-=== text mining
-Structural analyses of text.
+==== parsing from api
+Parse structured data obtained via a documented REST API.
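A minimal sketch of API parsing with the Python standard library. The endpoint `https://api.example.org/v1/posts`, its parameters and the response layout (a JSON object with an `items` list) are hypothetical; the response is given as a string here, whereas in practice it would be fetched, e.g. with `urllib.request.urlopen(url).read()`.

```python
import json
from urllib.parse import urlencode

def build_query_url(base_url, **params):
    """Append URL-encoded query parameters to an API endpoint."""
    return f"{base_url}?{urlencode(params)}"

def parse_api_response(json_text, fields):
    """Keep only the requested fields of each item; missing fields become None."""
    payload = json.loads(json_text)
    return [{field: item.get(field) for field in fields} for item in payload["items"]]

# Hypothetical endpoint and response, for illustration only
url = build_query_url("https://api.example.org/v1/posts", q="climate", page=1)
response_text = '{"items": [{"id": 1, "text": "First post", "lang": "en"}, {"id": 2, "text": "Second post"}]}'
posts = parse_api_response(response_text, ["id", "text", "lang"])
```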

-==== search
-==== frequency analyses
-"counting things" as a method
- (Salganik, 2018, pp. 41–45)

-===== word frequency analysis - word frequency extraction
-Frequency analyses of words/terms.
+==== scraping
+Automatically parse unstructured or semi-structured data from an ordinary website (⇒ web scraping) or service.

-===== dictionary approach - dictionary extraction/analysis
-Frequency analyses of word groups.

-==== topic analyses
-Introduction, definition on p. 157
- (Ignatow & Mihalcea, 2017, pp. 156–163)
+===== scraping (static content)
+Automatically parse data from static HTML websites.
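For static pages, the HTML delivered by the server already contains the content, so a plain HTML parser suffices. A minimal sketch using Python's built-in `html.parser`; the class `ParagraphScraper` and the inline HTML are illustrative, and a real scraper would first download the page.

```python
from html.parser import HTMLParser

class ParagraphScraper(HTMLParser):
    """Collect the text of <p> elements and the targets of <a href=...> links."""

    def __init__(self):
        super().__init__()
        self.paragraphs = []
        self.links = []
        self._in_paragraph = False

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self._in_paragraph = True
            self.paragraphs.append("")
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_endtag(self, tag):
        if tag == "p":
            self._in_paragraph = False

    def handle_data(self, data):
        # Text inside nested inline tags (e.g. <b>) is appended to the current paragraph
        if self._in_paragraph:
            self.paragraphs[-1] += data

html = """<html><body>
<h1>News</h1>
<p>First <b>paragraph</b>.</p>
<p>See <a href="/archive">the archive</a>.</p>
</body></html>"""
scraper = ParagraphScraper()
scraper.feed(html)
```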

-===== co-occurrence analysis

-===== Latent Dirichlet Allocation (LDA)
-Detailed definition of the method and its procedure
- (Maier et al., 2018)
+===== scraping (dynamic content)
+Automatically parse dynamic content delivered via HTML5/JavaScript; this sometimes requires mimicking user interaction.
+
+
+==== crawling
+Collect websites starting from an initial set of webpages by following the links they contain <<Ignatow_etal2017>>.
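Crawling amounts to a breadth-first traversal of the link graph. In this sketch the web is replaced by a hypothetical in-memory dictionary `toy_web`, and the `get_links` callback stands in for downloading a page and extracting its links; `max_pages` caps the size of the crawl.

```python
from collections import deque

def crawl(seeds, get_links, max_pages=100):
    """Breadth-first crawl: visit the seed pages, then pages reachable via their links."""
    visited = []
    seen = set(seeds)
    queue = deque(seeds)
    while queue and len(visited) < max_pages:
        page = queue.popleft()
        visited.append(page)
        for link in get_links(page):
            if link not in seen:  # enqueue every page at most once
                seen.add(link)
                queue.append(link)
    return visited

# Stand-in for the web: page -> outgoing links
toy_web = {
    "a": ["b", "c"],
    "b": ["a", "d"],
    "c": ["d"],
    "d": [],
    "e": ["a"],  # not reachable from the seed "a"
}
order = crawl(["a"], lambda page: toy_web.get(page, []))
```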
+
+
+=== data wrangling
+Transform data into formats suited for automated analysis (e.g. PDF ⇒ text). For a practical framework see also <<Wickham_etal2017>>.
+
+
+==== regular expressions
+Perform complex string manipulations by searching for and replacing specific patterns.
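A few typical regular-expression operations with Python's `re` module: normalizing whitespace, extracting dates and masking e-mail addresses. The patterns are simplified illustrations; the e-mail pattern in particular does not cover every valid address.

```python
import re

raw = "Posted  on 2018-10-10 by  user@example.org;   updated 2018-11-02."

# Collapse runs of whitespace into a single space
clean = re.sub(r"\s+", " ", raw)
# Extract ISO dates (YYYY-MM-DD)
dates = re.findall(r"\d{4}-\d{2}-\d{2}", clean)
# Replace e-mail addresses with a placeholder, e.g. for anonymization
anonymized = re.sub(r"[\w.+-]+@[\w-]+\.[\w.]+", "<EMAIL>", clean)
```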
+
+
+==== data-format conversions
+Convert between different formats in order to unify the data and handle missing values.
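A sketch of a simple format conversion from CSV to JSON in which empty or `NA` cells are unified to a single missing-value marker (`None`, serialized as `null`). The function name `csv_to_records` and the set of missing-value codes are assumptions for illustration.

```python
import csv
import io
import json

csv_text = "id,name,age\n1,Alice,34\n2,Bob,\n3,,29\n"

def csv_to_records(text, missing=("", "NA", "n/a")):
    """Read CSV text into dicts, mapping missing-value codes to None and digit strings to int."""
    records = []
    for row in csv.DictReader(io.StringIO(text)):
        record = {}
        for key, value in row.items():
            if value is None or value in missing:  # None occurs for short rows
                record[key] = None
            elif value.isdigit():
                record[key] = int(value)
            else:
                record[key] = value
        records.append(record)
    return records

records = csv_to_records(csv_text)
json_text = json.dumps(records)
```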
+
+
+=== text preprocessing
+Common text preprocessing tasks in natural language processing (NLP).
+
+
+==== tokenization
+Identify words (tokens) in a character input sequence.
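A minimal regular-expression tokenizer, assuming that lowercased runs of word characters, optionally joined by hyphens or apostrophes, form a token; NLP libraries use considerably more elaborate, language-specific rules.

```python
import re

def tokenize(text):
    """Split text into lowercase word tokens; punctuation is dropped."""
    return re.findall(r"\w+(?:[-']\w+)*", text.lower())

tokens = tokenize("Data-driven methods don't replace theory.")
```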
+
+
+==== stop-word removal
+Remove high-frequency words such as pronouns, determiners or prepositions.
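Stop-word removal is usually a set-membership filter over the token list. The stop-word list below is a tiny hypothetical sample, not a standard list.

```python
# Tiny illustrative stop-word list; real projects use curated, language-specific lists
STOP_WORDS = {"the", "a", "an", "of", "in", "on", "and", "or", "to", "is", "are", "it", "they", "this"}

def remove_stop_words(tokens, stop_words=STOP_WORDS):
    """Drop tokens whose lowercase form is in the stop-word set."""
    return [token for token in tokens if token.lower() not in stop_words]

filtered = remove_stop_words(["The", "analysis", "of", "texts", "is", "a", "method"])
```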
+
+
+==== stemming
+Reduce inflected or derived word forms to a common stem, usually by heuristically stripping suffixes.
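To show the idea, here is a deliberately naive suffix-stripping stemmer; it is not the Porter algorithm or any other published stemmer, and the suffix list and minimum stem length are arbitrary assumptions.

```python
# Naive English suffixes, ordered roughly longest first
SUFFIXES = ("ization", "ational", "ness", "ment", "ing", "ies", "ed", "es", "ly", "s")

def naive_stem(word, min_stem=3):
    """Strip the first matching suffix if at least `min_stem` characters remain."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem:
            return word[: -len(suffix)]
    return word

stems = [naive_stem(w) for w in ["connected", "connecting", "connects", "connection"]]
```

The last result shows the limits of such simple rules: "connection" is not reduced to "connect", which is why practical stemmers rely on larger, ordered rule sets.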
+
+
+==== word/sentence segmentation
+Separate a chunk of continuous text into separate words/sentences.
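A heuristic sentence splitter: it splits after `.`, `!` or `?` when whitespace and an uppercase letter follow. This rule is an illustrative assumption rather than a robust segmenter; an abbreviation followed by a capitalized word (e.g. "Dr. Smith") would still be split incorrectly.

```python
import re

def split_sentences(text):
    """Split after ., ! or ? when followed by whitespace and an uppercase letter."""
    parts = re.split(r"(?<=[.!?])\s+(?=[A-Z])", text.strip())
    return [part for part in parts if part]

sentences = split_sentences("Scraping is fast. Is it reliable? Not always, e.g. dynamic pages break it.")
```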
+
+
+==== part-of-speech(POS)-tagging
+Assign a part of speech (e.g. noun, verb, adjective) to each word.
+
+
+==== dependency parsing
+Create corresponding syntactic, semantic or morphological trees from input text.
+
+
+===== syntactic parsing
+Create syntactic trees from input text, mostly using supervised learning on manually annotated treebanks (<<Ignatow_etal2017>>,61).
+
+
+==== word-sense disambiguation
+Recognize the context-sensitive meaning of words.
+
+
+=== information extraction
+Extract factual information (e.g. people, places or situations) from free text.
+
+
+==== (named-)entity-recognition/resolution/extraction
+Identify instances of specific (pre-)defined types (e.g. place, name or color) in text.
+
+
+===== relation extraction
+Extract relationships between entities.
+
+
+=== information retrieval
+Retrieve relevant information in response to information requests.
+
+
+=== indexing
+'organize data in such a way that it can be easily retrieved later on' (<<Ignatow_etal2017>>,137)
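Indexing is commonly implemented as an inverted index that maps every term to the documents containing it. A minimal sketch with hypothetical example documents:

```python
import re
from collections import defaultdict

def build_inverted_index(documents):
    """Map each lowercase term to the sorted list of ids of documents containing it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for term in re.findall(r"\w+", text.lower()):
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}

docs = {
    1: "Web scraping collects data",
    2: "APIs provide structured data",
    3: "Scraping dynamic web pages",
}
index = build_inverted_index(docs)
```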
+
+
+=== searching/querying
+'take information requests in the form of queries and return relevant documents' (<<Ignatow_etal2017>>,137). Different models exist to estimate the similarity between records and search queries, e.g. Boolean, vector space or probabilistic models (ibid.).
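Of the models mentioned, the vector space model is the easiest to sketch: documents and query become tf-idf weighted term vectors and are ranked by cosine similarity. The weighting used here (raw term frequency times log(N/df)) is one common variant among several; the function names and documents are illustrative.

```python
import math
import re
from collections import Counter

def tf_idf_vectors(documents):
    """Return one {term: tf-idf weight} dict per document, plus the idf table."""
    tokenized = [Counter(re.findall(r"\w+", doc.lower())) for doc in documents]
    doc_freq = Counter(term for counts in tokenized for term in counts)
    idf = {term: math.log(len(documents) / df) for term, df in doc_freq.items()}
    vectors = [{t: tf * idf[t] for t, tf in counts.items()} for counts in tokenized]
    return vectors, idf

def cosine(u, v):
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm = math.sqrt(sum(w * w for w in u.values())) * math.sqrt(sum(w * w for w in v.values()))
    return dot / norm if norm else 0.0

def search(query, documents):
    """Return (score, document index) pairs sorted by descending cosine similarity."""
    vectors, idf = tf_idf_vectors(documents)
    query_counts = Counter(re.findall(r"\w+", query.lower()))
    # Query terms unknown to the collection are ignored
    query_vector = {t: tf * idf[t] for t, tf in query_counts.items() if t in idf}
    return sorted(((cosine(query_vector, vec), i) for i, vec in enumerate(vectors)), reverse=True)

documents = [
    "web scraping of news sites",
    "topic models for news texts",
    "survey experiments online",
]
ranking = search("scraping news", documents)
```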
+
+
+=== statistical analysis
+
+
+
+==== frequency analysis
+Descriptive statistical analysis based on the frequencies of specific text features.
+
+
+===== word frequencies/dictionary analysis
+Analyse statistically significant occurrences of words/word groups. Can also be combined with metadata (e.g. the creation time of a document).
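Word frequencies and dictionary counts can both be derived from one frequency table. The two-category dictionary below is hypothetical; applied research uses validated dictionaries. Combining the counts with metadata would amount to grouping them, e.g. by a document's creation date.

```python
import re
from collections import Counter

# Hypothetical dictionary: category -> indicator words
DICTIONARY = {
    "economy": {"market", "markets", "jobs", "inflation"},
    "environment": {"climate", "emissions", "energy"},
}

def word_frequencies(text):
    return Counter(re.findall(r"\w+", text.lower()))

def dictionary_counts(text, dictionary=DICTIONARY):
    """Count how many tokens of the text fall into each dictionary category."""
    freqs = word_frequencies(text)
    return {category: sum(freqs[w] for w in words) for category, words in dictionary.items()}

text = "Climate policy affects markets, jobs and energy prices. Markets react to climate news."
freqs = word_frequencies(text)
categories = dictionary_counts(text)
```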
+
+
+===== co-occurrence analysis
+Analyse statistically significant co-occurrences of words within different contextual units (e.g. sentences or paragraphs).
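A sketch of co-occurrence counting with sentences as contextual units. As an assumption, it uses the Dice coefficient as a simple association measure; a test of statistical significance would replace it with a significance measure such as log-likelihood.

```python
import re
from collections import Counter
from itertools import combinations

def cooccurrences(sentences):
    """Count, per word and per word pair, the number of sentences containing them."""
    word_counts = Counter()
    pair_counts = Counter()
    for sentence in sentences:
        words = sorted(set(re.findall(r"\w+", sentence.lower())))
        word_counts.update(words)
        pair_counts.update(combinations(words, 2))  # pairs in alphabetical order
    return word_counts, pair_counts

def dice(pair, word_counts, pair_counts):
    """Dice coefficient: 2 * joint count / (count of word a + count of word b)."""
    a, b = sorted(pair)
    return 2 * pair_counts[(a, b)] / (word_counts[a] + word_counts[b])

sentences = ["Data mining finds patterns", "Text mining is data mining on text", "Patterns need theory"]
word_counts, pair_counts = cooccurrences(sentences)
```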
+
+
+==== classification/machine learning
+Various techniques to (semi-)automatically identify specific classes. 
+
+
+===== supervised classification
+Use labelled training examples to learn a model that assigns classes to new entities.
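As one example of supervised classification, a multinomial naive Bayes classifier with add-one smoothing fits in a few lines. The training examples and labels are made up for illustration; real applications need far more labelled data and a validation step.

```python
import math
import re
from collections import Counter, defaultdict

def tokenize(text):
    return re.findall(r"\w+", text.lower())

def train_naive_bayes(examples):
    """Collect class and word counts from (text, label) pairs."""
    label_counts = Counter()
    word_counts = defaultdict(Counter)
    vocabulary = set()
    for text, label in examples:
        label_counts[label] += 1
        tokens = tokenize(text)
        word_counts[label].update(tokens)
        vocabulary.update(tokens)
    return label_counts, word_counts, vocabulary

def classify(text, model):
    """Return the label with the highest log posterior (add-one smoothing)."""
    label_counts, word_counts, vocabulary = model
    total_docs = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, doc_count in label_counts.items():
        score = math.log(doc_count / total_docs)
        total_words = sum(word_counts[label].values())
        for token in tokenize(text):
            if token in vocabulary:  # words never seen in training are ignored
                score += math.log((word_counts[label][token] + 1) / (total_words + len(vocabulary)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

training = [
    ("parliament passes new budget law", "politics"),
    ("minister criticises the opposition", "politics"),
    ("team wins the championship final", "sports"),
    ("striker scores twice in the final", "sports"),
]
model = train_naive_bayes(training)
prediction = classify("opposition criticises budget", model)
```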
+
+
+===== latent semantic analysis
+'The basic idea of latent semantic analysis (LSA) is, that text do have a higher order (=latent semantic) structure which, however, is obscured by word usage (e.g. through the use of synonyms or polysemy). By using conceptual indices that are derived statistically via a truncated singular value decomposition (a two-mode factor analysis) over a given document-term matrix, this variability problem can be overcome.'(link:https://cran.r-project.org/web/packages/lsa/lsa.pdf[CRAN-R])
+
+
+===== topic modelling
+Probabilistic models to infer semantic clusters. See especially <<Papilloud_etal2018>>.
+
+
+====== latent dirichlet allocation
+'The application of LDA is based on three nested concepts: the text collection to be modelled is referred to as the corpus; one item within the corpus is a document, with words within a document called terms.(...) +
+The aim of the LDA algorithm is to model a comprehensive representation of the corpus by inferring latent content variables, called topics. Regarding the level of analysis, topics are heuristically located on an intermediate level between the corpus and the documents and can be imagined as content-related categories, or clusters. (...) Since topics are hidden in the first place, no information about them is directly observable in the data. The LDA algorithm solves this problem by inferring topics from recurring patterns of word occurrence in documents.'(<<Maier_etal2018>>,94)
+
+
+====== non-negative-matrix-factorization
+Factorize the document-term matrix into a document-topic and a topic-term matrix under the constraint that all entries are non-negative.
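A sketch of NMF using the multiplicative update rules of Lee and Seung on a toy document-term matrix, written in plain Python for self-containment (in practice a numerical library would be used). Rows of `W` can be read as document-topic weights and rows of `H` as topic-term weights; the matrix and the number of topics `k` are illustrative.

```python
import random

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def frobenius_error(V, W, H):
    """Squared Frobenius norm of V - WH."""
    WH = matmul(W, H)
    return sum((v - x) ** 2 for rv, rx in zip(V, WH) for v, x in zip(rv, rx))

def nmf(V, k, iterations=200, seed=0, eps=1e-9):
    """Factorize non-negative V (docs x terms) into W (docs x k) and H (k x terms)."""
    rng = random.Random(seed)
    n, m = len(V), len(V[0])
    W = [[rng.random() for _ in range(k)] for _ in range(n)]
    H = [[rng.random() for _ in range(m)] for _ in range(k)]
    for _ in range(iterations):
        # H <- H * (W^T V) / (W^T W H); multiplying by non-negative factors keeps H >= 0
        Wt = transpose(W)
        num, den = matmul(Wt, V), matmul(matmul(Wt, W), H)
        H = [[h * nu / (de + eps) for h, nu, de in zip(rh, rn, rd)] for rh, rn, rd in zip(H, num, den)]
        # W <- W * (V H^T) / (W H H^T)
        Ht = transpose(H)
        num, den = matmul(V, Ht), matmul(W, matmul(H, Ht))
        W = [[w * nu / (de + eps) for w, nu, de in zip(rw, rn, rd)] for rw, rn, rd in zip(W, num, den)]
    return W, H

# Toy counts: documents 0-1 share terms 0-1, documents 2-3 share terms 2-3
V = [[3, 2, 0, 0], [2, 3, 0, 0], [0, 0, 4, 1], [0, 0, 1, 4]]
W, H = nmf(V, k=2)
error = frobenius_error(V, W, H)
```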
+
+
+====== structural topic modelling
+Inclusion of metadata. See especially <<Roberts2013>>.

-===== Structural Topic Model
-Description of the method; possible application to "Open-ended Questions in Survey Experiments"
- (Roberts, Stewart, Tingley, Airoldi, & others, 2013)

 ===== sentiment analysis
-Definition
-"Subjectivity and sentiment analysis focuses on the automatic identification of private states, such as opinions, emotions, sentiments, evaluations, beliefs, and speculations in natural language. While subjectivity classification labels text as either subjective or objective, sentiment classification adds an additional level of granularity, by further classifying subjective text as either positive, negative, or neutral." (Ignatow & Mihalcea, 2017, pp. 148–155)
+'Subjectivity and sentiment analysis focuses on the automatic identification of private states, such as opinions, emotions, sentiments, evaluations, beliefs, and speculations in natural language. While subjectivity classification labels text as either subjective or objective, sentiment classification adds an additional level of granularity, by further classifying subjective text as either positive, negative, or neutral.' (<<Ignatow_etal2017>>,148)
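A minimal dictionary-based (lexicon-based) sentiment classifier, one of several possible approaches. The lexicon, the negation words and the one-word negation window are hypothetical simplifications; the sketch only performs polarity classification, not the subjective/objective distinction.

```python
import re

# Hypothetical polarity lexicon; real analyses use validated lexicons
LEXICON = {"good": 1, "great": 2, "helpful": 1, "bad": -1, "terrible": -2, "useless": -1}
NEGATIONS = {"not", "never", "no"}

def sentiment(text):
    """Sum word polarities, flipping the sign of a word directly preceded by a negation."""
    tokens = re.findall(r"\w+", text.lower())
    score = 0
    for i, token in enumerate(tokens):
        polarity = LEXICON.get(token, 0)
        if i > 0 and tokens[i - 1] in NEGATIONS:
            polarity = -polarity
        score += polarity
    if score > 0:
        return "positive", score
    if score < 0:
        return "negative", score
    return "neutral", score
```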
+
+
+===== automated detection/extraction of narratives, argumentative structures, irony and metaphors
+
+
+
+==== network analysis/modelling
+Generate networks from texts or from relationships between texts.

-==== annotation/classification

-===== syntactic
+===== knowledge graph construction
+Modelling entities and their relationships.
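Knowledge graphs are often stored as (subject, relation, object) triples. A minimal in-memory sketch; the class `KnowledgeGraph` is hypothetical, and the triples merely restate relationships mentioned in this document.

```python
from collections import defaultdict

class KnowledgeGraph:
    """Store (subject, relation, object) triples and look up an entity's neighbours."""

    def __init__(self):
        self.outgoing = defaultdict(set)

    def add(self, subject, relation, obj):
        self.outgoing[subject].add((relation, obj))

    def neighbours(self, entity, relation=None):
        """Objects linked from `entity`, optionally restricted to one relation type."""
        return sorted(o for r, o in self.outgoing[entity] if relation is None or r == relation)

kg = KnowledgeGraph()
kg.add("LDA", "is_a", "topic model")
kg.add("structural topic model", "is_a", "topic model")
kg.add("structural topic model", "includes", "metadata")
kg.add("Maier et al. 2018", "applies", "LDA")
```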

-===== semantic
-On supervised automatic classification
- (Lemke & Wiedemann, 2016, pp. 52–54)
-====== supervised learning
-Introduction
- (Ignatow & Mihalcea, 2017, pp. 62–72)
-Supervised learning in the context of text classification
- (Scharkow, 2013)

-==== entity analysis
+=== data visualization
+Visualize the mined information.

-===== named-entity-extraction/recognition

-===== relation modelling
+==== word relationships

-====== Knowledge graph construction

-==== automated (thematic, narrative or metaphor) analysis
-Introduction to the respective areas (theme, narrative and metaphor)
- (Ignatow & Mihalcea, 2017, pp. 73–103)

-==== cluster analyses
+==== networks

-==== Blended Reading
-Exploring large amounts of text by combining different text mining methods
- (Lemke & Wiedemann, 2016, pp. 17–61)

-==== network analysis (applied to texts)

-=== meta-data mining
+==== geo-referenced

-==== (geo-)spatial analysis
-Introduction to mining methods using spatial metadata.
- (Sloan & Quan-Haase, 2017, 285ff.)

-==== temporal analysis

-====== interface methods
-temporal co-occurence analysis
- (Marres, 2017, pp. 106–112)
-detailed account:
- (Marres & Gerlitz, 2016)
+==== dynamic visualizations
+Visualizations with user interaction or time frames.

-===== volatility analysis

-== digital data collection (new possibilities for experimental design) - digital research designs
+== science practice
+General science practice
+
+
+=== digital research design
+New possibilities for surveys and data acquisition techniques.
+

 ==== ecological momentary assessments (EMA)/Experience Sampling Method (ESM)
-EMA and ESM differ only slightly; EMA tends to involve medical questions and measurements and relates to the natural environment.
+The two are largely equivalent: EMA focuses on medical questions or measurements in a natural environment, ESM more on subjective questions in everyday life. Four characteristics [cited after Stone and Shiffman 1994] (<<Salganik2018>>,109): 1) data collection in natural environments, 2) focus on recent events/impressions/actions, 3) questions triggered randomly or by events, 4) multiple questions over a certain period of time.

-Has four characteristics:
-1) data collection in natural environments 2) focus on experiences/behaviours close in time 3) questions that are event-triggered or asked at random 4) as well as multiple questions over a certain period of time [cited after Stone and Shiffman 1994] (Salganik, 2018, p. 109)

 ==== wiki surveys
-Further narrowing down of proposed answers in surveys by means of wiki-like surveys.
-Example: http://www.allourideas.org (Salganik, 2018, pp. 111–115)
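+Combine open and closed questions: respondents vote on pairs of proposed answers and can contribute new answers themselves (e.g. http://www.allourideas.org) (<<Salganik2018>>,111–115).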
+
+
+==== survey data linked to big data sources
+
+
+
+===== enriched asking
+
+
+
+===== amplified asking
+
+
+
+=== collaborative work
+
+
+
+==== open call projects
+Distribute tasks (e.g. annotation) among many contributors via open calls.
+
+
+==== distributed data collection
+
+
+
+=== digital communication
+
+
+
+== statistical modeling
+
+
+
+=== regression analysis
+

-==== surveys linked to big data sources
-Improve answer options in surveys by drawing on large amounts of data.
- (Salganik, 2018, pp. 117–130)

-==== gamification in surveys
-Strategy for increasing participants' motivation.
- (Salganik, 2018, pp. 115–117)
+=== time-series analysis

-== statistical modelling
-=== regression analysis
-Introduction using R;
- (Singh & Allen, 2017, pp. 103–152)

-=== time-series analysis
-Detailed introduction to current models (ARMA, GARCH, VaR).
- (Singh & Allen, 2017, pp. 153–182)

-==== Nowcasting
-Using methods to predict the future for estimation of current values. (Ex. predict influenza epidemiology combining CDC Data and Google Trends.
- (Salganik, 2018, pp. 46–50)
+=== agent-based modeling

-=== econometric models

-==== dynamic models
-"Dynamic models take lagged variables into account so that, as an extension of static models, they make it possible to describe the temporal dynamics of economic processes."
- (Hackl, 2013, p. 286)

-==== multi-equation models
-"Multi-equation models can represent systems in which the development and interaction of more than one endogenous variable are described."
- (Hackl, 2013, p. 286)

-== data visualizations
-Visualization with R in econometrics
- (Singh & Allen, 2017, pp. 75–102)

-== digital science communication
-=== web-based
-=== app-based
-=== audio-based (podcasts)
-== collaborative work
-=== document creation
-=== division of work (open call projects)
-=== distributed data collection
-== social network analysis - (social) network (structure) analysis
-Introduction
- (Cioffi-Revilla, 2014, pp. 89–117)
- (Staab, Koltsova, & Ignatov, 2018)
- (Sloan & Quan-Haase, 2017, 197ff. 365ff.)

-=== link analysis
-== complexity modeling/deep learning
-=== social simulation models
+== social complexity modeling/social simulation

-= References
-Cioffi-Revilla, C. (2014). Introduction to Computational Social Science: Principles and Applications. Texts in Computer Science. London, s.l.: Springer London. Retrieved from http://dx.doi.org/10.1007/978-1-4471-5661-1
-Hackl, P. (2013). Einführung in die Ökonometrie (2nd, updated ed.). Wi - Wirtschaft. München: Pearson. Retrieved from http://lib.myilibrary.com/detail.asp?id=650988

-Ignatow, G., & Mihalcea, R. F. (2017). Text mining: A guidebook for the social sciences. Los Angeles, London, New Delhi, Singapore, Washington DC, Melbourne: Sage.

-Lemke, M., & Wiedemann, Gregor (Eds.). (2016). Text Mining in den Sozialwissenschaften: Grundlagen und Anwendungen zwischen qualitativer und quantitativer Diskursanalyse. Wiesbaden: Springer VS.
+=== nowcasting
+Use methods for predicting the future to estimate current values. Example: estimating current influenza activity by combining CDC data and Google Trends (<<Salganik2018>>,46–50).
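The core idea can be sketched as a regression of a delayed official series on a timely proxy, which is then applied to the current proxy value. All numbers below are made-up toy values (not CDC or Google Trends data), and a single-predictor least-squares line is a simplifying assumption; published nowcasting models are considerably richer.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = a + b * x; returns (a, b)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    b = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum((x - mean_x) ** 2 for x in xs)
    return mean_y - b * mean_x, b

# Made-up weekly values: a search-interest index and the official rate,
# which is only published with a delay
search_index = [10, 20, 30, 40, 50]
official_rate = [1.5, 2.5, 3.5, 4.5, 5.5]

a, b = fit_line(search_index, official_rate)
# Nowcast: this week's search index is already known, the official value is not yet
current_search_index = 60
nowcast = a + b * current_search_index
```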

-Maier, D., Waldherr, A., Miltner, P., Wiedemann, G., Niekler, A., Keinert, A., . . . Adam, S. (2018). Applying LDA Topic Modeling in Communication Research: Toward a Valid and Reliable Methodology. Communication Methods and Measures, 12, 93–118. https://doi.org/10.1080/19312458.2018.1430754

-Marres, N. (2017). Digital sociology: The reinvention of social research. Cambridge, UK, Malden, MA, USA: Polity.
+[bibliography]
+== References

-Marres, N., & Gerlitz, C. (2016). Interface Methods: Renegotiating Relations between Digital Social Research, STS and Sociology. The Sociological Review, 64, 21–46. https://doi.org/10.1111/1467-954X.12314
+- [[[Han_etal2012]]] Han, J., Kamber, M., & Pei, J. (2012). Data Mining: Concepts and Techniques. Saint Louis, United States: Elsevier Science & Technology.

-Roberts, M. E., Stewart, B. M., Tingley, D., Airoldi, E. M., & others (2013). The structural topic model and applied social science. In Advances in neural information processing systems workshop on topic models: computation, application, and evaluation (pp. 1–20).
+- [[[Ignatow_etal2017]]] Ignatow, G., & Mihalcea, R. F. (2017). Text mining: A guidebook for the social sciences. Los Angeles, London, New Delhi, Singapore, Washington DC, Melbourne: Sage.

-Salganik, M. J. (2018). Bit by bit: Social research in the digital age.
+- [[[Jünger2018]]] Jünger, J. (2018). Mapping the Field of Automated Data Collection on the Web: Data Types, Collection Approaches and their Research Logic. In C. Stützer, M. Welker, & M. Egger (Eds.), Computational Social Science in the Age of Big Data: Concepts, Methodologies, Tools, and Applications (pp. 104–130). Neue Schriften zur Online-Forschung der Deutschen Gesellschaft für Online-Forschung (DGOF). Köln: Halem-Verlag.

-Scharkow, M. (2013). Thematic content analysis using supervised machine learning: An empirical evaluation using German online news. Quality & Quantity, 47, 761–773. https://doi.org/10.1007/s11135-011-9545-7
+- [[[Maier_etal2018]]] Maier, D., Waldherr, A., Miltner, P., Wiedemann, G., Niekler, A., Keinert, A., . . . Adam, S. (2018). Applying LDA Topic Modeling in Communication Research: Toward a Valid and Reliable Methodology. Communication Methods and Measures, 12(2-3), 93–118. https://doi.org/10.1080/19312458.2018.1430754

-Singh, A. K., & Allen, D. E. (2017). R in Finance and Economics: WORLD SCIENTIFIC.
+- [[[Papilloud_etal2018]]] Papilloud, C., & Hinneburg, A. (Eds.). (2018). Studienskripten zur Soziologie. Qualitative Textanalyse mit Topic-Modellen: Eine Einführung für Sozialwissenschaftler. Wiesbaden: Springer VS.

-Sloan, L., & Quan-Haase, A. (Eds.). (2017). The SAGE handbook of social media research methods. Los Angeles, London, New Delhi, Singapore, Washington DC, Melbourne: SAGE reference. Retrieved from https://ebookcentral.proquest.com/lib/gbv/detail.action?docID=4771733
+- [[[Roberts2013]]] Roberts, M. E., Stewart, B. M., Tingley, D., Airoldi, E. M., & others (2013). The structural topic model and applied social science. In Advances in neural information processing systems workshop on topic models: computation, application, and evaluation (pp. 1–20).

-Staab, S., Koltsova, O., & Ignatov, D. I. (Eds.). (2018). Information Systems and Applications, incl. Internet/Web, and HCI: Vol. 11185. Social Informatics: 10th International Conference, SocInfo 2018, St. Petersburg, Russia, September 25-28, 2018, Proceedings, Part I. Cham: Springer International Publishing. Retrieved from https://doi.org/10.1007/978-3-030-01129-1
+- [[[Rogers2013]]] Rogers, R. (2013). Digital methods. Cambridge, Massachusetts, London, England: The MIT Press.

-Wickham, H., & Grolemund, G. (2017). R for Data Science: Import, tidy, transform, visualize, and model data. Beijing, Boston, Farnham, Sebastopol, Tokyo: O'Reilly UK Ltd.
+- [[[Salganik2018]]] Salganik, M. J. (2018). Bit by bit: Social research in the digital age. Princeton, NJ: Princeton University Press.

+- [[[Wickham_etal2017]]] Wickham, H., & Grolemund, G. (2017). R for Data Science: Import, tidy, transform, visualize, and model data. Beijing, Boston, Farnham, Sebastopol, Tokyo: O’Reilly UK Ltd.