1. Scraping by controlling web browsers
Scraping with remote-controlled browser engines.
1.3. Programming frameworks, libraries, etc.
1.3.1. R
- RSelenium (repository )
-
R bindings for Selenium WebDriver. < | AGPL-3.0 | library | R | >
1.3.2. Python
- Pyppeteer (repository website-documentation )
-
Unofficial Python port of Puppeteer, the JavaScript library for automating (headless) Chrome/Chromium browsers. < | MIT | library | Python | >
- splash (website-documentation repository )
-
Splash is a JavaScript rendering service: a lightweight browser with an HTTP API, implemented in Python 3 using Twisted and Qt 5. < | BSD | library | Python | >
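
Because Splash is driven over plain HTTP, a client needs no browser bindings at all. A minimal sketch, assuming a Splash instance is listening on its default port 8050 on localhost; the helper names `splash_render_url` and `fetch_rendered_html` are hypothetical and not part of Splash:

```python
from urllib.parse import urlencode
from urllib.request import urlopen


def splash_render_url(page_url, splash_base="http://localhost:8050", wait=0.5):
    """Build a request URL for Splash's render.html endpoint.

    splash_base is an assumption: adjust it to wherever Splash is running.
    wait is the number of seconds Splash waits after the page has loaded.
    """
    query = urlencode({"url": page_url, "wait": wait})
    return f"{splash_base.rstrip('/')}/render.html?{query}"


def fetch_rendered_html(page_url, **kwargs):
    """Return the page's HTML after Splash has executed its JavaScript."""
    with urlopen(splash_render_url(page_url, **kwargs), timeout=30) as resp:
        return resp.read().decode("utf-8", errors="replace")


if __name__ == "__main__":
    # Only builds the URL; calling fetch_rendered_html needs a running Splash.
    print(splash_render_url("https://example.com"))
```

Keeping URL construction separate from the network call makes the request easy to inspect or log before anything is sent.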
1.3.3. Others
- Puppeteer (website repository )
-
Puppeteer is a Node.js library which provides a high-level API to control Chrome or Chromium over the DevTools Protocol. Puppeteer runs headless by default, but can be configured to run full (non-headless) Chrome or Chromium. < | Apache-2.0 | library | JavaScript | >
- Selenium (website repository )
-
Selenium is an umbrella project encapsulating a variety of tools and libraries enabling web browser automation. Selenium specifically provides infrastructure for the W3C WebDriver specification, a platform- and language-neutral coding interface compatible with all major web browsers. < | Apache-2.0 | library | Java | >
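
Since WebDriver is language-neutral, the same browser session can be driven from Selenium's Python bindings. A rough sketch, assuming the `selenium` package (version 4) and a local Chrome are installed; `normalize_links` and `scrape_links` are hypothetical helper names, and the `--headless=new` flag assumes a recent Chrome release:

```python
from urllib.parse import urljoin


def normalize_links(base_url, hrefs):
    """Resolve hrefs against base_url, dropping empties and duplicates (order kept)."""
    seen, out = set(), []
    for href in hrefs:
        if not href:
            continue
        absolute = urljoin(base_url, href)
        if absolute not in seen:
            seen.add(absolute)
            out.append(absolute)
    return out


def scrape_links(url):
    """Load url in headless Chrome and return the links found on the page."""
    # Imported lazily so normalize_links works without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By

    options = webdriver.ChromeOptions()
    options.add_argument("--headless=new")  # assumption: recent Chrome
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
        anchors = driver.find_elements(By.CSS_SELECTOR, "a[href]")
        return normalize_links(url, [a.get_attribute("href") for a in anchors])
    finally:
        # Always close the browser, even if the page fails to load.
        driver.quit()
```

Calling `scrape_links("https://example.com")` starts a real browser, so pages that build their links with JavaScript are covered as well.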