
Rate limiting? #1

Open

eoghanmurray opened this issue Nov 20, 2023 · 6 comments

Comments

@eoghanmurray

I'm experiencing it working for a while and then getting 443 (Connection refused - connect) after about 15 URLs are downloaded... is there a way to hack in some rate limiting? I'm running it via gem install.

@mfncooper

I'm seeing the exact same thing - 17 successful downloads followed by 1469 connection refused messages. Some way of getting around this would be much appreciated. (I too used gem install.)

@WiiNewU

WiiNewU commented Mar 10, 2024

Hi, I have the same issue, both with the gem install and the docker.io install. This is more of a workaround, but it is possible to temporarily increase the latency of your Linux OS’s internet connection to avoid having your IP address blocked by the Internet Archive, and then restore your Linux settings to default afterwards to continue using your PC normally.

I recommend doing this in a VM dedicated to wayback-machine-downloader so it won't interfere with your main system. You can also resume a failed download, rather than restarting from scratch, by running the same wayback-machine-downloader command again in the directory where it last quit. Additionally, don't have multiple wayback-machine-downloader commands running at once or you will get blocked again. Blocks usually last 60 seconds, starting from the last Internet Archive connection.

Using the tc command:
sudo tc qdisc add dev wlo1 root netem delay 400ms

*Replace wlo1 with the network interface of your VM or PC.
*This delays each connection by 400ms, which is slow enough to stay under the rate at which the IA will temporarily block you, but still decently quick.

Test your latency with the “ping github.com” command to see whether it takes longer than 400ms per connection.

Run the “wayback_machine_downloader http://example.com” command again in the same directory to resume the download. It should now run successfully.

When done, run “sudo su” and then “tc qdisc del dev wlo1 root” to clear any tc settings you have, including the one just added. Replace wlo1 with your network interface.

@mfncooper

Unfortunately, I'm using macOS, so this solution doesn't work (unless I set up a Linux VM on my Mac, which seems a bit heavyweight just to be able to download a website).

What's really needed is the integration of something like Strangler, but that doesn't seem likely to happen, since it looks like this project has been abandoned. (The last commit was 7 years ago.) If I were a Ruby developer, I could help, but I'm not.

@IsaacElenbaas

This project is rather small - just open its source wherever you installed / downloaded it to, search for URI, and add sleep(whatever) above the requests. There are only two requests: the main one, which downloads a page, and an initial one that gets the list of pages (which doesn't really need the delay).

I found that you need much more than 0.4s nowadays though, and even then it isn't 100% consistent. Just re-run a few times to catch what was missed.
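Roughly, the patch looks like this (not the gem's actual code; the file list, timestamp, and URL layout below are made-up stand-ins for whatever the installed version builds before each request):

require 'net/http'

# Stand-in for the gem's download loop: the point is simply to sleep before
# every request so the Internet Archive's rate limit isn't tripped.
file_urls = ['http://example.com/', 'http://example.com/about']   # hypothetical file list
timestamp = '20230101000000'                                      # hypothetical snapshot timestamp

file_urls.each do |file_url|
  sleep(1 + rand)   # 1-2 second pause between requests; tune as needed
  uri = URI("https://web.archive.org/web/#{timestamp}id_/#{file_url}")
  response = Net::HTTP.get_response(uri)
  puts "#{file_url}: #{response.code}"
end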

morgant added a commit to UNNA/wayback-machine-downloader that referenced this issue May 20, 2024
…p/wait between requests, plus a '--wait-random' option which will randomize the number of wait seconds by a 0.5x-2x. These options are used by the new WaybackMachineDownloader#wait method which is called during subsequent requests. Issue cocoflan#1
@morgant

morgant commented May 20, 2024

I have implemented rudimentary rate limiting support in the form of new -w/--wait & --random-wait options. See PR #5.
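Assuming the option syntax from that PR (wait time given in seconds), usage would look something like:

wayback_machine_downloader http://example.com --wait 2 --random-wait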

morgant added a commit to UNNA/wayback-machine-downloader that referenced this issue Jun 3, 2024
… including a new '--tries' option accepting a number of times to retry if a connection fails for a fatal error (not just an HTTP 4XX/5XX error; the default is 20 retries). Issue cocoflan#1
@morgant

morgant commented Jun 3, 2024

I have also implemented retries upon network errors (not HTTP errors), incl. overriding the default of 20 tries with a new --tries option. See PR #6, of which PR #5 is a prerequisite.

I implemented it as a separate PR since I used the retryable gem, but maybe we want to implement it more directly to avoid the additional dependency.
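For context, the retryable gem just wraps a block and re-runs it when one of the listed exceptions is raised. The exact exception list and options used in the PR may differ, but the pattern looks roughly like this:

require 'net/http'
require 'retryable'

uri = URI('https://web.archive.org/web/20230101000000id_/http://example.com/')  # example URL

Retryable.retryable(tries: 20, sleep: 1, on: [Errno::ECONNREFUSED, Net::OpenTimeout, SocketError]) do
  Net::HTTP.get_response(uri)   # retried on the listed network errors, up to 20 times
end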
