Releases: spider-rs/spider

v2.22.6

24 Dec 15:46

What's Changed

This release brings in SQLite for improved memory handling, enabled through the feature flags disk_native_tls, disk, and disk_aws.
SQLite is used in a hybrid manner alongside memory in order to maintain performance.

With disk handling and our string interning, the number of URLs crawled can enter the billions of resources, or be effectively unbounded with EFS attached.
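
As a rough illustration of why string interning helps at this scale, the sketch below maps each distinct URL to a small integer id so that later bookkeeping can store ids instead of repeated strings. It is a std-only toy, not the crate's implementation; `UrlInterner` and its methods are hypothetical names, and a real interner would avoid keeping two copies of each string.

```rust
use std::collections::HashMap;

/// Hypothetical interner: assigns one small integer id per distinct URL.
/// Assumption: this is an illustration only, not spider's internal type.
#[derive(Default)]
pub struct UrlInterner {
    ids: HashMap<String, u32>,
    urls: Vec<String>,
}

impl UrlInterner {
    /// Returns the existing id for `url`, or assigns the next free one.
    pub fn intern(&mut self, url: &str) -> u32 {
        if let Some(&id) = self.ids.get(url) {
            return id;
        }
        let id = self.urls.len() as u32;
        self.urls.push(url.to_owned());
        self.ids.insert(url.to_owned(), id);
        id
    }

    /// Looks up the original string for a previously returned id.
    pub fn resolve(&self, id: u32) -> Option<&str> {
        self.urls.get(id as usize).map(String::as_str)
    }

    /// Number of distinct URLs seen so far.
    pub fn len(&self) -> usize {
        self.urls.len()
    }
}
```

Interning the same URL twice returns the same id, so a visited set can hold 4-byte ids rather than full strings.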

Full Changelog: v2.21.33...v2.22.6

v2.21.33

18 Dec 11:53

What's Changed

  • Fix HTTP crawling past the first page
  • Fix safe handling of absolute URLs

Full Changelog: v2.21.27...v2.21.33

v2.21.27

10 Dec 12:21

What's Changed

  • add balance feature flag to switch to global semaphores.
  • add remote_addr feature flag to get the page remote address / ip.
  • add re-usable chrome client and stream ws
  • fix chrome inline page navigations
  • fix constraining html only pages
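
To make the idea of a global semaphore concrete, here is a minimal std-only sketch in which one process-wide limiter is shared by all callers instead of each crawl holding its own. How the balance flag is wired internally is not described in these notes, so `Semaphore`, `GLOBAL_LIMIT`, and the limit of 2 are assumptions for illustration.

```rust
use std::sync::{Condvar, Mutex};

/// Minimal counting semaphore; a stand-in for a shared, crawl-wide limiter.
/// Assumption: the crate's actual limiter may be async and look different.
pub struct Semaphore {
    permits: Mutex<usize>,
    available: Condvar,
}

impl Semaphore {
    pub const fn new(permits: usize) -> Self {
        Self {
            permits: Mutex::new(permits),
            available: Condvar::new(),
        }
    }

    /// Blocks until a permit is free, then takes it.
    pub fn acquire(&self) {
        let mut permits = self.permits.lock().unwrap();
        while *permits == 0 {
            permits = self.available.wait(permits).unwrap();
        }
        *permits -= 1;
    }

    /// Takes a permit only if one is free right now.
    pub fn try_acquire(&self) -> bool {
        let mut permits = self.permits.lock().unwrap();
        if *permits == 0 {
            return false;
        }
        *permits -= 1;
        true
    }

    /// Returns a permit and wakes one waiting caller, if any.
    pub fn release(&self) {
        *self.permits.lock().unwrap() += 1;
        self.available.notify_one();
    }
}

/// One process-wide instance shared by every crawl (the limit is arbitrary).
pub static GLOBAL_LIMIT: Semaphore = Semaphore::new(2);
```

Because the limiter is a single static, two crawls running side by side compete for the same two permits rather than getting two each.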

Full Changelog: v2.20.6...v2.21.27

v2.20.6

05 Dec 14:11

What's Changed

  • fix chrome initial page return links
  • add hydration scripts ignore for Next.js and Astro
  • add base script targets for smart mode
  • add custom domain layer interception for giants
  • add interception analytics and ads blocking
  • fix chrome page timeout bytes transferring

Full Changelog: v2.16.0...v2.20.6

v2.16.0

04 Dec 18:11

What's Changed

  • Chrome crawls now get the total bytes used over the network.
  • Improved ignore list for unwanted crawling requests in Chrome interception.

Full Changelog: v2.15.0...v2.16.0

v2.15.0

04 Dec 14:37

What's Changed

Potentially major performance increase for Chrome crawling by blocking extra unwanted XHR requests and scripts.

  • perf(chrome): add xhr interception
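
The kind of decision an interception layer makes can be sketched as a pure function over the request type and URL. Everything here is hypothetical: `ResourceKind`, `BLOCKED_HOST_FRAGMENTS`, and `should_block` are illustrative names, and the two hosts are sample entries rather than the crate's actual ignore list.

```rust
/// Coarse request categories an interceptor might see (hypothetical).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ResourceKind {
    Document,
    Script,
    Xhr,
    Other,
}

/// Sample host fragments to block; an assumption, not the crate's list.
const BLOCKED_HOST_FRAGMENTS: &[&str] = &["google-analytics.com", "doubleclick.net"];

/// Returns true when a script or XHR request targets a blocked host.
/// Documents and other resource kinds are always allowed through.
pub fn should_block(kind: ResourceKind, url: &str) -> bool {
    match kind {
        ResourceKind::Script | ResourceKind::Xhr => BLOCKED_HOST_FRAGMENTS
            .iter()
            .any(|fragment| url.contains(fragment)),
        ResourceKind::Document | ResourceKind::Other => false,
    }
}
```

Skipping such requests means the browser never downloads or executes them, which is where the time savings would come from.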

Full Changelog: v2.14.0...v2.15.0

v2.14.0

30 Nov 11:18

Release Notes

Features

  • feat(transform): add transform_content_send for async streaming.

Improvements

  • chore(interning): add optional string-interning.
  • chore(website): fix crawl, establish domain removal [#233].
  • chore(transform): add streaming markdown/commonmark transforming.
  • chore(transform): add streaming text transforming.
  • chore(chrome): add request interception analytics ignore.

Bug Fixes

  • chore(page): fix URL encode handling mismatch.
  • chore(transform): fix repeated text streaming.
  • chore(page): fix page link return with full URLs.
  • chore(website): fix crawl delay handling.
  • perf(website): reduce extra context switching on crawls.

Thank you for the help @Revertron!

Full Changelog: v2.13.78...v2.14.0

v2.13.78

27 Nov 15:54

What's Changed

  • Fix infinite loop with backoff Gateway retries
  • Fix limit handling break
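
For readers wondering how a retry loop avoids spinning forever on gateway errors, one common shape is a hard cap on attempts combined with a capped exponential delay. The sketch below is a generic std-only version under that assumption; `backoff_delay` and `retry_with_backoff` are hypothetical helpers, not functions from the crate.

```rust
use std::time::Duration;

/// Exponential backoff delay for a given attempt, capped at `max`.
pub fn backoff_delay(attempt: u32, base: Duration, max: Duration) -> Duration {
    let factor = 2u32.saturating_pow(attempt);
    base.saturating_mul(factor).min(max)
}

/// Calls `op` until it succeeds or `max_attempts` calls have been made.
/// `op` always runs at least once; the last error is returned on failure.
pub fn retry_with_backoff<T, E>(
    max_attempts: u32,
    base: Duration,
    max: Duration,
    mut op: impl FnMut(u32) -> Result<T, E>,
) -> Result<T, E> {
    let mut attempt = 0;
    loop {
        match op(attempt) {
            Ok(value) => return Ok(value),
            // The attempt cap is what guarantees the loop terminates.
            Err(err) if attempt + 1 >= max_attempts => return Err(err),
            Err(_) => {
                std::thread::sleep(backoff_delay(attempt, base, max));
                attempt += 1;
            }
        }
    }
}
```

With `max_attempts` set to 3, an operation that keeps failing is called exactly three times before the error is handed back to the caller.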

Full Changelog: v2.13.64...v2.13.78

v2.13.64

07 Nov 21:24

What's Changed

Major fixes for critical bugs that can hang the process.

  • perf reduce cpu usage for streaming rewriter
  • fix hang on iteration streaming
  • fix chrome connection hang
  • fix cache backend default build
  • fix domain absolute link join
  • fix shutdown break loop
  • add ignore protocol list
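
An ignore list for protocols can be pictured as a scheme check applied to each discovered link before it is queued. This is a minimal sketch under that assumption; `IGNORED_SCHEMES` and `has_ignored_scheme` are hypothetical, and the schemes listed are examples rather than the crate's actual list.

```rust
/// Example schemes a crawler has no reason to follow (an assumption).
const IGNORED_SCHEMES: &[&str] = &["mailto", "tel", "javascript", "data"];

/// Returns true when `link` starts with one of the ignored schemes.
/// Relative links and links without a colon are never ignored.
pub fn has_ignored_scheme(link: &str) -> bool {
    match link.trim_start().split_once(':') {
        Some((scheme, _)) => IGNORED_SCHEMES
            .iter()
            .any(|ignored| scheme.eq_ignore_ascii_case(ignored)),
        None => false,
    }
}
```

The comparison is case-insensitive because URL schemes are, so `JavaScript:void(0)` is filtered the same way as `javascript:void(0)`.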

Full Changelog: v2.12.12...v2.13.64

v2.12.12

05 Nov 03:09

Fix smart mode re-rendering and performance

  • fix smart mode re-rendering inline js detection
  • perf improve smart mode parsing
  • fix encoding smart mode html
  • add pin html pre-parsing
  • add chrome status code check for performing full actions

Full Changelog: v2.11.20...v2.12.12