Rate limiting? #1
Comments
I'm seeing the exact same thing - 17 successful downloads followed by 1469 |
Hi, I have the same issue, both with the gem install and the docker.io install. This is more of a workaround, but it is possible to temporarily increase the latency of your Linux OS’s internet connection to avoid having your IP address blocked by the Internet Archive, and then restore your Linux settings to their defaults to continue using your PC normally. I recommend doing this in a VM dedicated to wayback-machine-downloader so it won't interfere with your main system.

You can also resume a failed download, so you don’t have to restart from scratch, by running the same wayback-machine-downloader command again in the same directory where it last quit. Additionally, don't run multiple wayback-machine-downloader commands at once, or you will get blocked again. Blocks usually last 60 seconds, starting from the last Internet Archive connection.

Using the tc command:
*Replace wlo1 with the network interface of your VM or PC.
Test your latency with the “ping github.com” command to check that each round trip takes longer than 400 ms.
Run the “wayback_machine_downloader http://example.com” command again in the same directory to resume the download. It should now run successfully.
When done, run “sudo su” and then “tc qdisc del dev wlo1 root” to clear any tc settings you have, including the one just added. Replace wlo1 with your network interface.
Unfortunately, I'm using macOS, so this solution doesn't work (unless I set up a Linux VM on my Mac, which seems a bit heavyweight just to be able to download a website). What's really needed is the integration of something like Strangler, but that doesn't seem likely to happen, since it looks like this project has been abandoned. (The last commit was 7 years ago.) If I were a Ruby developer, I could help, but I'm not.
This project is rather small - just open its source wherever you installed / downloaded it to, search for URI, and add a delay before each request. I found that you need much more than 0.4s nowadays though, and even then it isn't 100% consistent. Just re-run a few times to catch what was missed.
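A minimal sketch of that kind of change, assuming the download code issues one request per file. The `throttled` helper and the `DELAY_SECONDS` constant are hypothetical names, not part of wayback-machine-downloader, and the 2-second value is only a guess at "much more than 0.4s".

```ruby
# Hypothetical helper: pause before running each request block.
DELAY_SECONDS = 2.0 # assumption; tune until the connection refusals stop

def throttled(delay = DELAY_SECONDS)
  sleep(delay) # wait before contacting the server again
  yield        # run the actual request and return its result
end

# Usage sketch (needs network access, so it is left commented out):
# require 'open-uri'
# body = throttled { URI.open('http://example.com/').read }
```

Wrapping the request in a block keeps the delay in one place, so it can be raised later without touching each call site.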
…p/wait between requests, plus a '--wait-random' option which will randomize the number of wait seconds by a factor of 0.5x-2x. These options are used by the new WaybackMachineDownloader#wait method, which is called during subsequent requests. Issue cocoflan#1
I have implemented rudimentary rate limiting support in the form of new |
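As a rough illustration of the wait behaviour described in that commit message (a fixed wait, optionally scaled by a random 0.5x-2x factor), here is a sketch. It is not the code from the PR: `wait_duration` and `wait_between_requests` are hypothetical names, and the real WaybackMachineDownloader#wait method may be structured differently.

```ruby
# Hypothetical sketch of a wait step; not the implementation from the PR.
# base_seconds: the configured wait between requests.
# randomize:    when true, scale the wait by a random factor in 0.5..2.0.
def wait_duration(base_seconds, randomize: false, rng: Random.new)
  return 0.0 if base_seconds.nil? || base_seconds <= 0

  factor = randomize ? rng.rand(0.5..2.0) : 1.0
  base_seconds * factor
end

# Sleeps for the computed duration and returns how long it waited.
def wait_between_requests(base_seconds, randomize: false)
  seconds = wait_duration(base_seconds, randomize: randomize)
  sleep(seconds) if seconds > 0
  seconds
end
```

Separating the calculation from the `sleep` call makes the randomization easy to check without actually waiting.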
… including a new '--tries' option accepting a number of times to retry if a connection fails for a fatal error (not just an HTTP 4XX/5XX error; the default is 20 retries). Issue cocoflan#1
I have also implemented retries upon network errors (not HTTP errors), incl. overriding the default of 20 tries with a new '--tries' option. I implemented this as a separate PR because I used the retryable gem, but maybe we want to implement it more directly to remove the additional dependency.
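For comparison, a dependency-free version could look roughly like the sketch below, which uses Ruby's built-in `retry` keyword instead of the retryable gem. The `with_retries` helper, the `NETWORK_ERRORS` list, and the 1-second pause are all assumptions for illustration; only the default of 20 tries comes from the description above.

```ruby
require 'socket'
require 'timeout'

# Hypothetical helper; not part of wayback-machine-downloader or retryable.
# Assumed list of errors that count as retryable network failures.
NETWORK_ERRORS = [
  Errno::ECONNREFUSED, Errno::ECONNRESET, Errno::ETIMEDOUT,
  SocketError, Timeout::Error
].freeze

# Runs the block, retrying on network errors up to `tries` attempts in total.
def with_retries(tries: 20, pause: 1.0)
  attempt = 0
  begin
    attempt += 1
    yield attempt
  rescue *NETWORK_ERRORS
    raise if attempt >= tries # give up after the last allowed attempt
    sleep(pause)              # back off briefly before trying again
    retry
  end
end
```

Errors outside the list (including HTTP-level errors raised by the caller) propagate immediately, which matches the "network errors, not HTTP errors" distinction.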
It works for a while, and then I get
443 (Connection refused - connect
after about 15 URLs are downloaded... is there a way to hack in some rate limiting? I'm running it via gem install.