* Bug fixes

  * Fix a bug preventing SSL connections from working

* Major enhancements

  * Added support for HTTP Basic Auth with URLs containing a username and password
  * Added support for anonymous HTTP proxies

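As an illustration of Basic Auth credentials carried in a URL, the sketch below uses only Ruby's standard library. The helper name `request_with_basic_auth` is hypothetical and this is not necessarily how the crawler implements the feature.

```ruby
require 'uri'
require 'net/http'

# Hypothetical helper: build a GET request and move any user:password
# found in the URL into an Authorization header.
def request_with_basic_auth(url)
  uri = URI(url)
  request = Net::HTTP::Get.new(uri.request_uri)
  request.basic_auth(uri.user, uri.password) if uri.user
  request
end

req = request_with_basic_auth('http://user:secret@example.com/private')
puts req['authorization']
```

No network connection is made here; the request object is only constructed.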
* Minor enhancements

  * Added read_timeout option to set the HTTP request timeout in seconds

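A timeout option of this kind could plausibly map onto the standard library's `Net::HTTP#read_timeout=` setting. The sketch below assumes that mapping; `build_connection` is a hypothetical helper, not part of the library.

```ruby
require 'net/http'

# Hypothetical helper: create a connection object with a caller-supplied
# read timeout in seconds. Nothing is sent until a request is made.
def build_connection(host, port, read_timeout: nil)
  http = Net::HTTP.new(host, port)
  http.read_timeout = read_timeout if read_timeout
  http
end

http = build_connection('example.com', 80, read_timeout: 10)
```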
* Bug fixes

  * Don’t raise a fatal error when a page request times out
  * Fix double encoding of links containing %20

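The double-encoding problem can be sketched as follows: escaping a link that already contains `%20` turns it into `%2520`. One way to avoid that, assuming only spaces and stray `%` signs need escaping, is to leave existing `%XX` sequences alone. `escape_link_once` is a hypothetical name and not the library's actual fix.

```ruby
# Escape spaces and stray '%' signs, leaving existing %XX escapes untouched.
def escape_link_once(link)
  link.gsub(/%(?![0-9A-Fa-f]{2})/, '%25').gsub(' ', '%20')
end

# Naive escaping of every '%' shows the bug being described.
naive = '/my%20page'.gsub('%', '%25')
fixed = escape_link_once('/my%20page')
```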
* Major enhancements

  * Added page storage engines for MongoDB and Redis

* Minor enhancements

  * Use xpath for link parsing instead of CSS (faster) (Marc Seeger)
  * Added skip_query_strings option to skip links with query strings (Joost Baaij)

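Skipping links with query strings amounts to a filter like the one below. This is a minimal sketch using the standard `URI` class; `without_query_strings` is a hypothetical helper name.

```ruby
require 'uri'

# Drop every link whose URL carries a query component.
def without_query_strings(links)
  links.reject { |link| URI(link).query }
end

links = [
  'http://example.com/page',
  'http://example.com/search?q=ruby'
]
kept = without_query_strings(links)
```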
* Bug fixes

  * Only consider status codes 300..307 a redirect (Marc Seeger)
  * Canonicalize redirect links (Marc Seeger)

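The two redirect fixes can be sketched with the standard library. It is an assumption here that "canonicalize" means resolving the `Location` value against the requesting URL and normalizing it; the method names are hypothetical.

```ruby
require 'uri'

REDIRECT_CODES = (300..307).freeze

# True only for status codes in the 300..307 range.
def redirect?(code)
  REDIRECT_CODES.include?(code.to_i)
end

# Resolve a possibly relative Location header against the request URL.
def redirect_target(request_url, location)
  URI.join(request_url, location).normalize.to_s
end

target = redirect_target('http://example.com/a/b', '../c')
```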
* Major enhancements

  * Cookies can be accepted and sent with each HTTP request.

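Accepting and re-sending cookies can be illustrated with a very small cookie jar. This sketch is a hypothetical class that ignores attributes such as `Path`, `Domain`, and `Expires`; it is not the library's implementation.

```ruby
# Hypothetical minimal cookie jar: remember name=value pairs from
# Set-Cookie headers and replay them as a single Cookie header value.
class CookieJar
  def initialize
    @cookies = {}
  end

  # Keep only the leading name=value pair; attributes are ignored.
  def store(set_cookie_header)
    pair = set_cookie_header.split(';').first.to_s
    name, value = pair.split('=', 2)
    @cookies[name.strip] = value.to_s.strip if name && !name.strip.empty?
  end

  # Value suitable for a Cookie request header.
  def header
    @cookies.map { |name, value| "#{name}=#{value}" }.join('; ')
  end
end

jar = CookieJar.new
jar.store('session=abc123; Path=/; HttpOnly')
jar.store('lang=en')
```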
* Bug fixes

  * Fixed issue that allowed following redirects off the original domain

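A check that keeps redirects on the original domain might look like the sketch below. It compares hosts only and assumes an exact, case-insensitive match is wanted; `same_host?` is a hypothetical helper.

```ruby
require 'uri'

# True when both URLs point at the same host (case-insensitive).
def same_host?(original_url, redirect_url)
  URI(original_url).host.to_s.downcase == URI(redirect_url).host.to_s.downcase
end

stay = same_host?('http://example.com/a', 'http://EXAMPLE.com/b')
leave = same_host?('http://example.com/a', 'http://other.example.org/b')
```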
* Minor enhancements

  * Added an attr_accessor to Page for the HTTP response body

* Bug fixes

  * Fixed incorrect method calls in CLI scripts

* Major enhancements

  * Option for persistent storage of pages during crawl with TokyoCabinet or PStore

* Minor enhancements

  * Options can be set via methods on the Core object in the crawl block

* Minor enhancements

  * Options are now applied per-crawl, rather than module-wide.

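The difference between module-wide and per-crawl options can be sketched as follows. The class, constant, and option names are hypothetical; the point is only that each crawl merges its own options over frozen defaults instead of mutating shared state.

```ruby
# Hypothetical crawler: each instance gets its own copy of the options.
class Crawler
  DEFAULT_OPTS = { verbose: false, threads: 4 }.freeze

  attr_reader :opts

  def initialize(opts = {})
    # merge returns a new Hash, so the defaults are never modified
    @opts = DEFAULT_OPTS.merge(opts)
  end
end

first = Crawler.new(verbose: true)
second = Crawler.new
```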
* Bug fixes

  * Fixed a bug which caused deadlock if an exception occurred when crawling the last page in the queue.

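This class of deadlock can be illustrated with a worker thread and two queues: if the worker raises before reporting a result, a caller waiting for that result blocks forever. The sketch below is not the library's code and `process_all` is a hypothetical name; it avoids the hang by always reporting something for each URL.

```ruby
require 'thread'

# Process every URL on a worker thread. The handler may raise; the worker
# still reports a result for that URL so the caller never waits forever.
def process_all(urls, &handler)
  work = Queue.new
  done = Queue.new
  urls.each { |url| work << url }
  work << :stop

  worker = Thread.new do
    while (url = work.pop) != :stop
      begin
        done << [url, handler.call(url)]
      rescue StandardError => e
        done << [url, e] # report the failure instead of dying silently
      end
    end
  end

  results = Array.new(urls.size) { done.pop }
  worker.join
  results
end

results = process_all(%w[/a /b /last]) do |url|
  raise 'fetch failed' if url == '/last'
  url.upcase
end
```

Without the `rescue` branch, the exception on `/last` would end the worker thread and the final `done.pop` would never return.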
* Minor enhancements

  * When the :verbose option is set to true, exception backtraces are printed to aid debugging.

* Major enhancements

  * Added HTTPS support.
  * Added CLI program ‘anemone’, a frontend for several tasks.

* Minor enhancements

  * HTTP request response time recorded in Page.
  * Use of persistent HTTP connections.