Bookmarks tagged [web-scraping]
https://github.com/jsdom/jsdom
jsdom is a pure-JavaScript implementation of many web standards, notably the WHATWG DOM and HTML Standards, for use with Node...
- tags: web-scraping, tools, dom, node.js, javascript
- source code
https://blog.kowalczyk.info/article/ea07db1b9bff415ab180b0525f3898f6/advanced-web-spidering-with-pup...
Puppeteer is a Node.js library that makes it easy to do advanced web scraping and spidering. An older generation of web scraping and spidering tools would grab and analyze HTML pages as returned by a web...
- tags: node.js, puppeteer, web-scraping
https://github.com/chineking/cola
A distributed crawling framework.
- tags: python, web-crawling, web-scraping
- source code
https://pythonhosted.org/feedparser/
Universal feed parser.
- tags: python, web-crawling, web-scraping
https://github.com/lorien/grab
Site scraping framework.
- tags: python, web-crawling, web-scraping
- source code
https://github.com/MechanicalSoup/MechanicalSoup
A Python library for automating interaction with websites.
- tags: python, web-crawling, web-scraping
- source code
https://github.com/scrapinghub/portia
Visual scraping for Scrapy.
- tags: python, web-crawling, web-scraping
- source code
https://github.com/binux/pyspider
A powerful spider system.
- tags: python, web-crawling, web-scraping
- source code
https://github.com/jmcarp/robobrowser
A simple, Pythonic library for browsing the web without a standalone web browser.
- tags: python, web-crawling, web-scraping
- source code
A fast high-level screen scraping and web crawling framework.
- tags: python, web-crawling, web-scraping
- source code