1. natural language processing (NLP)
1.1. stand-alone applications
- RapidMiner (website repository )
-
< | AGPL-3.0 | stand-alone application | Java | >
1.2. programming frameworks, libraries, etc.
1.2.1. R
- koRpus (website website-cran )
-
'A set of tools to analyze texts.' < | GPL-3.0 | library | R | >
- polmineR (website-cran repository )
-
< | GPL-3.0 | library | R | >
- quanteda (website repository )
-
'The package is designed for R users needing to apply natural language processing to texts, from documents to final analysis. Its capabilities match or exceed those provided in many end-user software applications, many of which are expensive and not open source. The package is therefore of great benefit to researchers, students, and other analysts with fewer financial resources. While using quanteda requires R programming knowledge, its API is designed to enable powerful, efficient analysis with a minimum of steps. By emphasizing consistent design, furthermore, quanteda lowers the barriers to learning and using NLP and quantitative text analysis even for proficient R programmers.' < | GPL-3.0 | library | R | >
- tm (website website-cran )
-
< | GPL-3.0 | library | R | >
- tosca (website-cran )
-
'A framework for statistical analysis in content analysis. In addition to a pipeline for preprocessing text corpora and linking to the latent Dirichlet allocation from the 'lda' package, plots are offered for the descriptive analysis of text corpora and topic models. In addition, an implementation of Chang’s intruder words and intruder topics is provided.' < | GPL-3.0 | library | R | >
1.2.2. Python
- Gensim (website repository )
-
'Gensim is a Python library for topic modelling, document indexing and similarity retrieval with large corpora. Target audience is the natural language processing (NLP) and information retrieval (IR) community.' < | LGPL | library | Python | >
- NLTK (website repository )
-
'NLTK is a leading platform for building Python programs to work with human language data. It provides easy-to-use interfaces to over 50 corpora and lexical resources such as WordNet, along with a suite of text processing libraries for classification, tokenization, stemming, tagging, parsing, and semantic reasoning, wrappers for industrial-strength NLP libraries, and an active discussion forum.' < | Apache-2.0 | framework | Python | >
- Pandas (website repository )
-
< | BSD | library | Python | >
- xtas (website repository )
-
The eXtensible Text Analysis Suite (xtas) 'is a collection of natural language processing and text mining tools, brought together in a single software package with built-in distributed computing and support for the Elasticsearch document store.' < | Apache-2.0 | framework | Python | >
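Several of the Python entries above mention document indexing and similarity retrieval. As a rough illustration of that idea only, the sketch below ranks a few toy documents against a query using bag-of-words cosine similarity with the Python standard library. It does not use the API of Gensim or any other listed package; the function names (`tokenize`, `cosine`, `rank`) and the sample documents are hypothetical and chosen for this example.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two term-frequency Counters."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank(query, documents):
    """Return (index, score) pairs sorted by descending similarity to the query."""
    q = Counter(tokenize(query))
    scores = [(i, cosine(q, Counter(tokenize(d)))) for i, d in enumerate(documents)]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Toy corpus, made up for illustration.
docs = [
    "Topic models summarize large text corpora.",
    "Pandas provides data frames for tabular data.",
    "Tokenization splits text into words.",
]

ranking = rank("text corpora and topic models", docs)
for index, score in ranking:
    print(f"{score:.3f}  {docs[index]}")
```

Raw term counts are used here to keep the sketch short. The dedicated libraries in this list typically add term weighting and index structures for large corpora, which this example leaves out.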
1.2.3. Others
- Apache OpenNLP (website repository )
-
'OpenNLP supports the most common NLP tasks, such as tokenization, sentence segmentation, part-of-speech tagging, named entity extraction, chunking, parsing, language detection and coreference resolution.' < | Apache-2.0 | library | Java | >
- GATE (website repository )
-
GATE - General Architecture for Text Engineering < | LGPL | framework | Java | >
- spaCy (website repository )
-
spaCy 'excels at large-scale information extraction tasks. It’s written from the ground up in carefully memory-managed Cython. Independent research has confirmed that spaCy is the fastest in the world. If your application needs to process entire web dumps, spaCy is the library you want to be using.' < | MIT | library | Cython | >
- Stanford CoreNLP (website repository )
-
< | GPL-3.0 | library | Java | >