- 1. user-consented tracking
- 2. scraping
- 3. tools for corpus linguistics/text mining/(semi-)automated text analysis
- 4. computer assisted/aided qualitative data analysis software (CAQDAS)
- 5. natural language processing (NLP)
- 6. topic-models
- 7. sentiment analysis
- 8. visualization
- 9. collaborative annotation
- 10. collaborative writing
- 11. research data archiving
- 12. statistical software
- 13. nowcasting
- 14. network analysis
- 15. search
- 16. ESM/EMA surveys
- 17. audio-transcriptions
- 18. optical character recognition (OCR)
- 19. online experiments
- 20. (remote) eye tracking
- 21. agent-based modeling
- 22. investigative journalism
- 23. miscellaneous
1. user-consented tracking
Collection of sensor data on (mobile) devices in accordance with data protection laws.
- AWARE (website repository-android repository-iOS repository-OSX repository-server )
-
'AWARE is an Android framework dedicated to instrument, infer, log and share mobile context information, for application developers, researchers and smartphone users. AWARE captures hardware-, software-, and human-based data. The data is then analyzed using AWARE plugins. They transform data into information you can understand.' Source, visited: 27.02.2019 < | Apache-2.0 | framework | Java | >
- MEILI (website-dev repository-group )
-
< | GPL-3.0 | framework | Java | >
- Passive Data Kit (website repository-djangoserver repository-android repository-iOS )
-
< | Apache-2.0 | framework | Python, Java | english>
- Web Historian(CE) (website-doi website repository )
-
Chrome browser extension designed to integrate web browsing history data collection into research projects collecting other types of data from participants (e.g. surveys, in-depth interviews, experiments). It uses client-side D3 visualizations to inform participants about the data being collected during the informed consent process. It allows participants to delete specific browsing data or opt-out of browsing data collection. It directs participants to an online survey once they have reviewed their data and made a choice of whether to participate. It has been used with Qualtrics surveys, but any survey that accepts data from a URL will work. It works with the open source Passive Data Kit (PDK) as the backend for data collection. To successfully upload, you need to fill in the address of your PDK server in the js/app/config.js file. < e06b3e174f9668f5c62f30a9bedde223023e0bca | GPL-3.0 | plugin | Javascript | english>
2. scraping
Tools for web scraping.
- TWINT (website repository )
-
TWINT (Twitter Intelligence Tool) 'Formerly known as Tweep, Twint is an advanced Twitter scraping tool written in Python that allows for scraping Tweets from Twitter profiles without using Twitter’s API.' Retrieved 07.03.2019 < | MIT | package | Python | >
- YouTubeComments (website repository )
-
'This repository contains an R script as well as an interactive Jupyter notebook to demonstrate how to automatically collect, format, and explore YouTube comments, including the emojis they contain. The script and notebook showcase the following steps: Getting access to the YouTube API Extracting comments for a video Formatting the comments & extracting emojis Basic sentiment analysis for text & emojis' Retrieved 07.03.2019 < | Unknown | package | R | >
- facepager (wiki repository )
-
< | MIT | package | Python | >
- Scrapy (website repository )
-
< | BSD | package | Python | >
- RSelenium (repository )
-
< | AGPL-3.0 | package | R | >
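At its core, web scraping is fetching pages and parsing their markup. The tools above handle crawling, sessions, and APIs; the parsing step itself can be sketched with nothing but the Python standard library (the HTML snippet is a made-up example):

```python
from html.parser import HTMLParser

class LinkExtractor(HTMLParser):
    """Collects the href attribute of every <a> tag it encounters."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

html = '<p><a href="https://example.org/a">A</a> and <a href="https://example.org/b">B</a></p>'
parser = LinkExtractor()
parser.feed(html)
```

For real projects, frameworks like Scrapy add the parts this sketch omits: request scheduling, politeness delays, retries, and structured item pipelines.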
3. tools for corpus linguistics/text mining/(semi-)automated text analysis
Integrated platforms for corpus analysis and processing.
- AmCAT (website-developer repository wiki )
-
'The Amsterdam Content Analysis Toolkit (AmCAT) is an open source infrastructure that makes it easy to do large-scale automatic and manual content analysis (text analysis) for the social sciences and humanities.' < | AGPL-3.0 | SaaS | Python | >
- COSMOS (website )
-
COSMOS Open Data Analytics software < | Proprietary | standalone | | >
- CWB (website repository-cwb repository-cqpweb )
-
CWB, the IMS[Institut für Maschinelle Sprachverarbeitung Stuttgart] Open Corpus Workbench is 'a fast, powerful and extremely flexible corpus querying system.' < 3.4.15 | GPL-3.0 | framework | C, Perl | english>
- LCM (website )
-
Leipzig Corpus Miner, a decentralized SaaS application for the analysis of very large amounts of news texts. < | LGPL | framework | Java, R | >
- iLCM (website repository-docker )
-
'The iLCM[LCM=Leipzig Corpus Miner] project pursues the development of an integrated research environment for the analysis of structured and unstructured data in a ‘Software as a Service’ architecture (SaaS). The research environment addresses requirements for the quantitative evaluation of large amounts of qualitative data using text mining methods and requirements for the reproducibility of data-driven research designs in the social sciences.' source, retrieved 08.03.2019 < 0.96 | LGPL | SaaS | Java, Python, R | german>
4. computer assisted/aided qualitative data analysis software (CAQDAS)
Software that assists with qualitative research tasks such as transcription analysis, coding and text interpretation, recursive abstraction, content analysis, discourse analysis, and grounded theory methodology.
- ATLAS.ti (website )
-
< | Proprietary | standalone | | >
- Leximancer (website )
-
'Leximancer automatically analyses your text documents to identify the high level concepts in your text documents, delivering the key ideas and actionable insights you need with powerful interactive visualisations and data exports.' < | Proprietary | standalone | | >
- MAXQDA (website-uhh )
-
'MAXQDA is one of the world's leading and most comprehensive QDA software programs for qualitative and mixed-methods research. The software helps you capture, organize, analyze, visualize, and publish your data. Whether grounded theory, literature review, exploratory market research, interviews, website analysis, or surveys: analyze what you want, how you want. MAXQDA Analytics Pro is the extended version of MAXQDA and, in addition to all functions for qualitative and mixed-methods research, also includes a module for quantitative text analysis (MAXDictio) and a module for statistical data analysis (MAXQDA Stats).' (translated from German) Source, visited: 27.02.2019 < | Proprietary | standalone | | >
- NVivo (website )
-
< | Proprietary | standalone | | >
- QDAMiner (website )
-
< | Proprietary | standalone | | >
- ORA Pro (website )
-
< | Proprietary | standalone | | >
- Quirkos (website repository )
-
< | Proprietary | standalone | | >
- RQDA (website repository )
-
'It includes a number of standard Computer-Aided Qualitative Data Analysis features. In addition it seamlessly integrates with R, which means that a) statistical analysis on the coding is possible, and b) functions for data manipulation and analysis can be easily extended by writing R functions. To some extent, RQDA and R make an integrated platform for both quantitative and qualitative data analysis.' < | BSD | package | R | >
- TAMS (website )
-
'Text Analysis Markup System (TAMS) is both a system of marking documents for qualitative analysis and a series of tools for mining information based on that syntax.' < | GPL-2.0 | standalone | | >
5. natural language processing (NLP)
Libraries and frameworks for the automated processing and analysis of natural language text.
- Apache OpenNLP (website repository )
-
'OpenNLP supports the most common NLP tasks, such as tokenization, sentence segmentation, part-of-speech tagging, named entity extraction, chunking, parsing, language detection and coreference resolution.' < | Apache-2.0 | package | Java | >
- GATE (website repository )
-
GATE - General Architecture for Text Engineering < | LGPL | package | Java | >
- Gensim (website repository )
-
'Gensim is a Python library for topic modelling, document indexing and similarity retrieval with large corpora. Target audience is the natural language processing (NLP) and information retrieval (IR) community.' < | LGPL | package | Python | >
- NLTK (website repository )
-
'NLTK is a leading platform for building Python programs to work with human language data. It provides easy-to-use interfaces to over 50 corpora and lexical resources such as WordNet, along with a suite of text processing libraries for classification, tokenization, stemming, tagging, parsing, and semantic reasoning, wrappers for industrial-strength NLP libraries, and an active discussion forum.' < | Apache-2.0 | package | Python | >
- Pandas (website repository )
-
< | BSD | package | Python | >
- polmineR (website-cran repository )
-
< | GPL-3.0 | package | R | >
- quanteda (website repository )
-
'The package is designed for R users needing to apply natural language processing to texts, from documents to final analysis. Its capabilities match or exceed those provided in many end-user software applications, many of which are expensive and not open source. The package is therefore of great benefit to researchers, students, and other analysts with fewer financial resources. While using quanteda requires R programming knowledge, its API is designed to enable powerful, efficient analysis with a minimum of steps. By emphasizing consistent design, furthermore, quanteda lowers the barriers to learning and using NLP and quantitative text analysis even for proficient R programmers.' < | GPL-3.0 | package | R | >
- RapidMiner (website repository )
-
< | AGPL-3.0 | framework | Java | >
- spaCy (website repository )
-
spaCy 'excels at large-scale information extraction tasks. It’s written from the ground up in carefully memory-managed Cython. Independent research has confirmed that spaCy is the fastest in the world. If your application needs to process entire web dumps, spaCy is the library you want to be using.' < | MIT | package | Cython | >
- Stanford CoreNLP (website repository )
-
< | GPL-3.0 | framework | Java | >
- tm (website website-cran repository )
-
< | GPL-3.0 | package | R | >
- xtas (website repository )
-
the eXtensible Text Analysis Suite (xtas) 'is a collection of natural language processing and text mining tools, brought together in a single software package with built-in distributed computing and support for the Elasticsearch document store.' < | Apache-2.0 | framework | Python | >
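A common first step in all of these libraries is tokenization. A crude regex-based stand-in for the tokenizers NLTK or spaCy ship (real tokenizers handle clitics, punctuation, and Unicode far more carefully):

```python
import re

def tokenize(text):
    """Lowercase the text and extract runs of letters, digits,
    and apostrophes; a rough approximation of word tokenization."""
    return re.findall(r"[a-z0-9']+", text.lower())

tokens = tokenize("NLTK is a leading platform for building Python programs.")
```

Libraries such as spaCy go far beyond this, attaching part-of-speech tags, lemmas, and dependency arcs to each token in the same pass.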
6. topic-models
Tools for estimating topic models, i.e. machine learning methods that discover latent themes in document collections.
- MALLET (website repository )
-
< | Apache-2.0 | package | Java | >
- TOME (website repository )
-
'TOME is a tool to support the interactive exploration and visualization of text-based archives, supported by a Digital Humanities Startup Grant from the National Endowment for the Humanities (Lauren Klein and Jacob Eisenstein, co-PIs). Drawing upon the technique of topic modeling—a machine learning method for identifying the set of topics, or themes, in a document set—our tool allows humanities scholars to trace the evolution and circulation of these themes across networks and over time.' < | Unknown | package | Python, Jupyter Notebook | >
- stm (website repository )
-
'The Structural Topic Model (STM) allows researchers to estimate topic models with document-level covariates. The package also includes tools for model selection, visualization, and estimation of topic-covariate regressions. Methods developed in Roberts et al (2014) <doi:10.1111/ajps.12103> and Roberts et al (2016) <doi:10.1080/01621459.2016.1141684>.' < | MIT | package | R | >
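Topic models such as those fitted by MALLET or stm start from a document-term matrix of word counts per document. Building one in pure Python (the two documents are invented):

```python
from collections import Counter

docs = [
    "the economy and the budget",
    "the election and the campaign",
]

# Vocabulary: all distinct words, in a fixed (sorted) order.
vocab = sorted({w for d in docs for w in d.split()})

def doc_term_row(doc):
    """Count how often each vocabulary word occurs in one document."""
    counts = Counter(doc.split())
    return [counts[w] for w in vocab]

dtm = [doc_term_row(d) for d in docs]
```

A topic model then factorizes this matrix into per-document topic proportions and per-topic word distributions; stm additionally lets document-level covariates influence those proportions.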
7. sentiment analysis
Tools for detecting sentiment and opinion in text.
- lexicoder (website )
-
'Lexicoder performs simple deductive content analyses of any kind of text, in almost any language. All that is required is the text itself, and a dictionary. Our own work initially focused on the analysis of newspaper stories during election campaigns, and both television and newspaper stories about public policy issues. The software can deal with almost any text, however, and lots of it. Our own databases typically include up to 100,000 news stories. Lexicoder processes these data, even with a relatively complicated coding dictionary, in about fifteen minutes. The software has, we hope, a wide range of applications in the social sciences. It is not the only software that conducts content analysis, of course - there are many packages out there, some of which are much more sophisticated than this one. The advantage to Lexicoder, however, is that it can run on any computer with a recent version of Java (PC or Mac), it is very simple to use, it can deal with huge bodies of data, it can be called from R as well as from the Command Line, and its free.' < | Proprietary | package | Java | >
- OpinionFinder (website repository )
-
'OpinionFinder is a system that processes documents and automatically identifies subjective sentences as well as various aspects of subjectivity within sentences, including agents who are sources of opinion, direct subjective expressions and speech events, and sentiment expressions. OpinionFinder was developed by researchers at the University of Pittsburgh, Cornell University, and the University of Utah. In addition to OpinionFinder, we are also releasing the automatic annotations produced by running OpinionFinder on a subset of the Penn Treebank.' < | Unknown | package | Java | >
- Readme (website )
-
'The ReadMe software package for R takes as input a set of text documents (such as speeches, blog posts, newspaper articles, judicial opinions, movie reviews, etc.), a categorization scheme chosen by the user (e.g., ordered positive to negative sentiment ratings, unordered policy topics, or any other mutually exclusive and exhaustive set of categories), and a small subset of text documents hand classified into the given categories.' < | CC BY-NC-ND-3.0 | package | R | >
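Dictionary-based tools like Lexicoder score texts by counting hits against curated word lists. A minimal sketch with made-up dictionaries:

```python
# Illustrative word lists; real dictionaries contain thousands of entries.
POSITIVE = {"good", "excellent", "win"}
NEGATIVE = {"bad", "crisis", "lose"}

def sentiment_score(text):
    """Net positive-minus-negative dictionary hits,
    normalised by the number of tokens."""
    tokens = text.lower().split()
    pos = sum(t in POSITIVE for t in tokens)
    neg = sum(t in NEGATIVE for t in tokens)
    return (pos - neg) / max(len(tokens), 1)
```

Systems like OpinionFinder go further, identifying opinion holders and subjective expressions at the sentence level rather than counting isolated words.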
8. visualization
Tools for visualizing data, in particular graphs and networks.
- Gephi (website repository )
-
'Gephi is an award-winning open-source platform for visualizing and manipulating large graphs.' < | GPL-3.0 | package | Java | >
- sigma.js (website repository )
-
'Sigma is a JavaScript library dedicated to graph drawing. It makes easy to publish networks on Web pages, and allows developers to integrate network exploration in rich Web applications.' < | MIT | package | Javascript | >
- scikit-image (website repository )
-
'scikit-image is a collection of algorithms for image processing. It is available free of charge and free of restriction. We pride ourselves on high-quality, peer-reviewed code, written by an active community of volunteers.' < | BSD | package | Python | >
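Graph libraries like sigma.js typically consume a JSON document of nodes and edges. A sketch of building such a payload in Python (the exact keys expected vary between sigma.js versions; the `nodes`/`edges` shape below follows the classic format, and the names are invented):

```python
import json

graph = {
    "nodes": [
        {"id": "n0", "label": "Alice", "x": 0, "y": 0, "size": 1},
        {"id": "n1", "label": "Bob", "x": 1, "y": 1, "size": 1},
    ],
    "edges": [
        {"id": "e0", "source": "n0", "target": "n1"},
    ],
}

# Serialized payload, ready to be served to a sigma.js front end.
payload = json.dumps(graph)
```

Gephi can export graphs in comparable interchange formats (e.g. GEXF), which makes it easy to move between desktop exploration and web publication.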
9. collaborative annotation
Tools for annotating texts collaboratively.
- CATMA (website website-uhh repository )
-
'CATMA (Computer Assisted Text Markup and Analysis) is a practical and intuitive tool for text researchers. In CATMA users can combine the hermeneutic, ‘undogmatic’ and the digital, taxonomy based approach to text and corpora—as a single researcher, or in real-time collaboration with other team members.' < | Apache-2.0 | package | Python | >
- WebAnno (website repository )
-
WebAnno is a multi-user tool supporting different roles such as annotator, curator, and project manager. The progress and quality of annotation projects can be monitored and measured in terms of inter-annotator agreement. Multiple annotation projects can be conducted in parallel. < | Apache-2.0 | package | Python | >
10. collaborative writing
Tools for writing documents collaboratively.
- FidusWriter (website repository )
-
< | AGPL-3.0 | package | Python, Javascript | >
11. research data archiving
Tools for archiving and sharing research data.
- dataverse (website repository )
-
< | Apache-2.0 | framework | Java | >
12. statistical software
Software for estimating specific statistical models.
- gretl (website repository )
-
Is a cross-platform software package for econometric analysis < | GPL-3.0 | package | C | >
- MLwiN (website repository )
-
< | Proprietary | package | | >
- SPSS (website-uhh )
-
< | Proprietary | package | | >
- STATA (website-uhh )
-
< | Proprietary | package | | >
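Whatever the package, the workhorse underneath is model estimation. As an illustration of what even the simplest routines compute, here is ordinary least squares for a single predictor in plain Python:

```python
def ols(x, y):
    """Ordinary least squares with one predictor:
    slope = cov(x, y) / var(x); intercept = mean(y) - slope * mean(x)."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = my - slope * mx
    return slope, intercept
```

Packages like gretl or MLwiN add what matters in practice: standard errors, diagnostics, and estimators for far richer model classes (e.g. multilevel models).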
13. nowcasting
Tools for nowcasting, i.e. predicting the present or the very recent past from high-frequency indicators.
- Nowcasting (website-cran repository )
-
< | GPL-3.0 | package | R | >
14. network analysis
Social network analysis.
- AutoMap (website )
-
'AutoMap enables the extraction of information from texts using Network Text Analysis methods. AutoMap supports the extraction of several types of data from unstructured documents. The type of information that can be extracted includes: content analytic data (words and frequencies), semantic network data (the network of concepts), meta-network data (the cross classification of concepts into their ontological category such as people, places and things and the connections among these classified concepts), and sentiment data (attitudes, beliefs). Extraction of each type of data assumes the previously listed type of data has been extracted.' < | Proprietary | package | Java | >
- NodeXL (website )
-
< | Proprietary | package | | >
- ORA Pro (website repository )
-
< | Proprietary | package | | >
- Pajek (website )
-
< | Proprietary | package | | >
- NetworkX (website repository )
-
'Data structures for graphs, digraphs, and multigraphs Many standard graph algorithms Network structure and analysis measures Generators for classic graphs, random graphs, and synthetic networks Nodes can be 'anything' (e.g., text, images, XML records) Edges can hold arbitrary data (e.g., weights, time-series) Open source 3-clause BSD license Well tested with over 90% code coverage Additional benefits from Python include fast prototyping, easy to teach, and multi-platform.' < | BSD | package | Python | >
- UCINET (website )
-
'UCINET 6 for Windows is a software package for the analysis of social network data. It was developed by Lin Freeman, Martin Everett and Steve Borgatti. It comes with the NetDraw network visualization tool.' < | Proprietary | package | | >
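The basic quantities of social network analysis are easy to state before reaching for a full package. A pure-Python sketch computing node degrees and network density for a small undirected edge list (the names are invented):

```python
edges = [("alice", "bob"), ("bob", "carol"), ("alice", "carol"), ("carol", "dave")]

def degrees(edges):
    """Number of edges incident to each node in an undirected graph."""
    deg = {}
    for u, v in edges:
        deg[u] = deg.get(u, 0) + 1
        deg[v] = deg.get(v, 0) + 1
    return deg

deg = degrees(edges)
n = len(deg)
# Density: observed edges over the n*(n-1)/2 possible undirected edges.
density = 2 * len(edges) / (n * (n - 1))
```

NetworkX provides the same measures (and hundreds more) with tested implementations and sensible handling of directed graphs, multigraphs, and edge attributes.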
15. search
Information retrieval in large datasets.
- LuceneSolr (website repository )
-
< | Apache-2.0 | package | | >
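Engines like Lucene/Solr are built around an inverted index that maps each term to the documents containing it. A toy version in pure Python (the documents are invented):

```python
from collections import defaultdict

docs = {
    1: "open source search engine",
    2: "search large text corpora",
}

# Inverted index: term -> set of document ids containing that term.
index = defaultdict(set)
for doc_id, text in docs.items():
    for term in text.lower().split():
        index[term].add(doc_id)

def search(term):
    """Return the sorted ids of documents containing the term."""
    return sorted(index.get(term.lower(), set()))
```

Lucene layers ranking (e.g. BM25), analyzers, and distributed indexing on top of this same core idea.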
16. ESM/EMA surveys
Data collection in 'natural' environments.
- paco (website repository )
-
< | Apache-2.0 | framework | Objective-C, Java | >
17. audio-transcriptions
Software that converts speech into electronic text documents.
- f4analyse (website )
-
< | Proprietary | standalone | | >
- EXMARaLDA (website repository )
-
'EXMARaLDA is a system for computer-assisted work with (primarily) spoken-language corpora. It consists of a transcription and annotation editor (Partitur-Editor), a tool for managing corpora (Corpus-Manager), and a search and analysis tool (EXAKT).' (translated from German) < | Unknown | framework | Java | >
18. optical character recognition (OCR)
OCR is the mechanical or electronic conversion of images of typed, handwritten or printed text into machine-encoded text.
- tesseract (repository )
-
'Tesseract is an open source text recognizer (OCR) Engine, available under the Apache 2.0 license. It can be used directly, or (for programmers) using an API to extract printed text from images. It supports a wide variety of languages.' < | Apache-2.0 | package | Python | >
19. online experiments
Platforms for conducting (interactive) experiments online.
- LIONESS (website )
-
'LIONESS Lab is a free web-based platform for online interactive experiments. It allows you to develop, test and conduct decision-making experiments with live feedback between participants. LIONESS experiments include a standardized set of methods to deal with the set of challenges arising when conducting interactive experiments online. These methods reflect current ‘best practices’ for, e.g., preventing participants to enter a session more than once, facilitating on-the-fly formation of interaction groups, reducing waiting times for participants, driving down attrition by retaining attention of online participants and, importantly, adequate handling of cases in which participants drop out.With LIONESS Lab you can readily develop and test your experiments online in a user-friendly environment. You can develop experiments from scratch in a point-and-click fashion or start from an existent design from our growing repository and adjust it according your own requirements.' Retrieved 07.03.2019 < | Proprietary | package | Javascript | >
- nodeGame (website repository )
-
'NodeGame is a free, open source JavaScript/HTML5 framework for conducting synchronous experiments online and in the lab directly in the browser window. It is specifically designed to support behavioral research along three dimensions: larger group sizes, real-time (but also discrete time) experiments, batches of simultaneous experiments.' < | MIT | package | Javascript | >
- Breadboard (website repository )
-
'Breadboard is a software platform for developing and conducting human interaction experiments on networks. It allows researchers to rapidly design experiments using a flexible domain-specific language and provides researchers with immediate access to a diverse pool of online participants.' Retrieved: 07.03.2019 < | Unknown | package | Javascript | >
- Empirica(beta) (website repository )
-
'Open source project to tackle the problem of long development cycles required to produce software to conduct multi-participant and real-time human experiments online.' Retrieved: 07.03.2019 < | MIT | package | Javascript | >
20. (remote) eye tracking
Tools for eye tracking in the browser, e.g. via webcam.
- SearchGazer (website repository )
-
SearchGazer: Webcam Eye Tracking for Remote Studies of Web Search < | MIT | package | Javascript | >
21. agent-based modeling
Tools for agent-based modeling: simulating the interactions of autonomous agents to study emergent behavior.
- NetLogo (website repository )
-
'NetLogo is a multi-agent programmable modeling environment. It is used by many tens of thousands of students, teachers and researchers worldwide. It also powers HubNet participatory simulations.' < | Unknown | package | Java, Scala | >
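The core loop of an agent-based model is: each agent inspects the state of the others, updates its own, and the process repeats. A minimal, deterministic sketch of a Granovetter-style threshold model (the thresholds are illustrative):

```python
def run(thresholds, steps=10):
    """Each agent adopts a behavior once the share of adopters
    in the population reaches its personal threshold."""
    n = len(thresholds)
    adopted = [t == 0.0 for t in thresholds]  # threshold-0 agents start adopted
    for _ in range(steps):
        share = sum(adopted) / n
        adopted = [a or t <= share for a, t in zip(adopted, thresholds)]
    return sum(adopted)

# Evenly spread thresholds produce a full cascade; a gap stops it.
full_cascade = run([0.0, 0.2, 0.4, 0.6, 0.8])
stalled = run([0.0, 0.5, 0.5])
```

NetLogo packages this kind of loop into a programmable environment with spatial grids, visualization, and tools for sweeping parameter spaces.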
22. investigative journalism
Tools supporting investigative, data-driven journalism.
- DocumentCloud (website repository )
-
'DocumentCloud is a platform founded on the belief that if journalists were more open about their sourcing, the public would be more inclined to trust their reporting. The platform is a tool to help journalists share, analyze, annotate and, ultimately, publish source documents to the open web.' Source, visited: 04.03.2019 < | MIT | standalone | Ruby | >
- NEW/S/LEAK (website repository )
-
NEW/S/LEAK ('Network of Searchable Leaks') supports investigative journalists in exploring large, unstructured document collections such as leaked data sets. < | AGPL-3.0 | standalone | Ruby | >
23. miscellaneous
Tools that do not fit the categories above.
- spades (website repository )
-
< | None | package | R | >
- scikit-learn (website repository )
-
'Scikit-learn is a free software machine learning library for the Python programming language. It features various classification, regression and clustering algorithms including support vector machines, random forests, gradient boosting, k-means and DBSCAN, and is designed to interoperate with the Python numerical and scientific libraries NumPy and SciPy.' < | BSD | package | Python | >