Add -w/--wait & --random-wait options which implement rate limiting #5

Open: wants to merge 6 commits into base: master

@morgant morgant commented May 20, 2024

IMPORTANT: PR #4 is a prerequisite for this PR and should be merged first.

This implements rate limiting (see Issue #1) by adding:

  • New -w/--wait option which accepts a number of seconds to pause/sleep between subsequent requests
  • New --random-wait option which will cause -w/--wait seconds to be randomized by 0.5x-2x
  • New WaybackMachineDownloader#wait method which implements the functionality using the aforementioned options
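As a rough illustration of how the two options could be parsed, here is a minimal sketch using Ruby's standard OptionParser. The helper name `parse_wait_options` and the option hash keys are assumptions made for this example and are not taken from the PR's diff; the real executable also defines other options (such as --to) that this sketch does not handle.

```ruby
require 'optparse'

# Hypothetical helper: parses only the two rate-limiting options.
# Any remaining non-option arguments (e.g. the URL) are left in argv.
def parse_wait_options(argv)
  options = { wait: nil, random_wait: false }
  OptionParser.new do |opts|
    opts.on('-w', '--wait SECONDS', Float, 'Seconds to pause between requests') do |seconds|
      options[:wait] = seconds
    end
    opts.on('--random-wait', 'Randomize the wait to 0.5x-2x of --wait') do
      options[:random_wait] = true
    end
  end.parse!(argv)
  options
end
```

Calling `parse_wait_options(['--wait', '300', '--random-wait', 'http://www.folklore.org/'])` would yield a wait of 300.0 seconds with randomization enabled, leaving only the URL in the argument array.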

Specifically, WaybackMachineDownloader#wait is called only when requesting additional pages of results in #get_all_snapshots_to_consider and before downloading individual files in #download_files. This means that the first request for pages to download is not delayed, but all subsequent requests and actual page downloads are.
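The behaviour described above can be sketched roughly as follows. This is not the code from the PR: `RateLimitedFetcher`, `wait_seconds`, `fetch_pages`, and the injectable `sleeper` are hypothetical names used only to show a pause that is skipped for the first request and applied before each later one.

```ruby
# Hypothetical stand-in for the downloader; not the PR's actual code.
class RateLimitedFetcher
  def initialize(wait: nil, random_wait: false, sleeper: Kernel.method(:sleep))
    @wait = wait
    @random_wait = random_wait
    @sleeper = sleeper # injectable so the pause can be observed without sleeping
  end

  # Seconds to pause before the next request, or 0 when no wait is configured.
  def wait_seconds
    return 0 unless @wait
    return @wait unless @random_wait

    # Randomize to between 0.5x and 2x of the configured wait.
    rand((@wait * 0.5)..(@wait * 2.0))
  end

  # Pause only when a positive wait is configured.
  def wait
    seconds = wait_seconds
    @sleeper.call(seconds) if seconds.positive?
  end

  # Runs the block once per page, pausing before every page except the first.
  def fetch_pages(page_count)
    (0...page_count).map do |page|
      wait if page.positive?
      yield page
    end
  end
end
```

Injecting the sleeper is a design choice for this sketch only: it lets a caller substitute a lambda that records the requested pauses instead of actually sleeping.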

Example usage:

./bin/wayback_machine_downloader --to 20120222134837 --wait 300 --random-wait http://www.folklore.org/

This should pause for a random interval of 150 (2.5 minutes) to 600 (10 minutes) seconds between requests, since it combines --wait 300 (5 minutes) with --random-wait.
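The arithmetic behind those bounds can be checked with a few lines of Ruby. The helper `random_wait_bounds` is a hypothetical name introduced for this illustration; it simply applies the 0.5x and 2x factors stated in the description.

```ruby
# Hypothetical helper: lower and upper bounds of the randomized pause
# for a given --wait value, using the 0.5x-2x factors from the description.
def random_wait_bounds(wait_seconds)
  [wait_seconds * 0.5, wait_seconds * 2.0]
end

low, high = random_wait_bounds(300)
puts "between #{low} s (#{low / 60} min) and #{high} s (#{high / 60} min)"
# prints "between 150.0 s (2.5 min) and 600.0 s (10.0 min)"
```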

…with Ruby 3.x, plus added some CGI.escape() to ensure that the requested URL's query string parameters are handled correctly and not accidentally mixed with the WayBack Machine's query string parameters
…p/wait between requests, plus a '--wait-random' option which will randomize the number of wait seconds by a 0.5x-2x. These options are used by the new WaybackMachineDownloader#wait method which is called during subsequent requests. Issue cocoflan#1
@morgant morgant changed the title Add '-w'/'--wait' & '--random-wait' options which implement rate limiting Add -w/--wait & --random-wait options which implement rate limiting May 20, 2024
@morgant morgant mentioned this pull request May 20, 2024

morgant commented Jun 3, 2024

I have merged in a fix from the prerequisite PR #4.
