
Software category: scraping by controlling web browsers (work in progress: this list is preliminary and will be updated continuously)

1. Scraping by controlling web browsers

Scraping with remote-controlled browser engines.

Pyppeteer (repository, website-documentation): Unofficial Python port of Puppeteer, the JavaScript library for automating (headless) Chrome/Chromium browsers. < | MIT | library | Python | >
Splash (website-documentation, repository): Splash is a JavaScript rendering service with an HTTP API. It is a lightweight browser, implemented in Python 3 using Twisted and Qt 5. < | BSD | library | Python | >
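Because Splash is driven over plain HTTP, a scraper only needs to build a request against one of its endpoints. The following is a minimal sketch, assuming a Splash instance is already running locally on its default port 8050; the helper names `splash_render_url` and `fetch_rendered_html` are hypothetical and only serve this illustration. It uses the `render.html` endpoint with its `url` and `wait` parameters.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: Splash is running locally on its default port.
SPLASH_BASE = "http://localhost:8050"


def splash_render_url(page_url, wait=0.5, base=SPLASH_BASE):
    """Build the URL of a Splash render.html request for page_url.

    `wait` is the number of seconds Splash waits after the page has
    loaded, giving its JavaScript time to run.
    """
    query = urlencode({"url": page_url, "wait": wait})
    return f"{base}/render.html?{query}"


def fetch_rendered_html(page_url, wait=0.5, base=SPLASH_BASE, timeout=30):
    """Ask Splash to load page_url and return the rendered HTML as text.

    Requires a reachable Splash instance; raises URLError otherwise.
    """
    request_url = splash_render_url(page_url, wait, base)
    with urlopen(request_url, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")
```

With Splash running, `fetch_rendered_html("https://example.org/")` would return the page's HTML after script execution, which can then be passed to any HTML parser.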

Puppeteer (website, repository): Puppeteer is a Node library which provides a high-level API to control Chrome or Chromium over the DevTools Protocol. Puppeteer runs headless by default, but can be configured to run full (non-headless) Chrome or Chromium. < | Apache-2.0 | library | JavaScript | >
Selenium (website, repository): Selenium is an umbrella project encapsulating a variety of tools and libraries that enable web browser automation. Selenium specifically provides infrastructure for the W3C WebDriver specification, a platform- and language-neutral coding interface compatible with all major web browsers. < | Apache-2.0 | library | Java | >
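To illustrate what "language-neutral" means for WebDriver: the protocol is JSON over HTTP, so any language that can send HTTP requests can drive a browser. The sketch below builds three requests defined in the W3C WebDriver specification (New Session, Navigate To, Get Page Source) using only the Python standard library. It is not how Selenium's own client bindings are implemented; the server address `http://localhost:4444` and all function names are assumptions for this example, and a WebDriver server (for example geckodriver) would have to be running for `send` to work.

```python
import json
from urllib.request import Request, urlopen

# Assumption: a WebDriver server is listening at this address.
DRIVER_URL = "http://localhost:4444"


def _json_post(url, payload):
    """Create a POST request carrying payload as a JSON body."""
    body = json.dumps(payload).encode("utf-8")
    headers = {"Content-Type": "application/json"}
    return Request(url, data=body, headers=headers, method="POST")


def new_session_request(browser_name="firefox", driver_url=DRIVER_URL):
    """New Session: POST /session with the requested capabilities."""
    payload = {"capabilities": {"alwaysMatch": {"browserName": browser_name}}}
    return _json_post(f"{driver_url}/session", payload)


def navigate_request(session_id, page_url, driver_url=DRIVER_URL):
    """Navigate To: POST /session/{session id}/url."""
    return _json_post(f"{driver_url}/session/{session_id}/url", {"url": page_url})


def page_source_request(session_id, driver_url=DRIVER_URL):
    """Get Page Source: GET /session/{session id}/source."""
    return Request(f"{driver_url}/session/{session_id}/source", method="GET")


def send(request, timeout=30):
    """Send a request and return the "value" field of the JSON response."""
    with urlopen(request, timeout=timeout) as response:
        return json.loads(response.read())["value"]
```

A scraping run would call `send(new_session_request())`, read `sessionId` from the result, then send the navigate and page-source requests for that session. In practice the Selenium client libraries wrap these calls behind a more convenient API.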
