
Possibility of a mirror? #35

Closed
jaydiablo opened this issue Nov 27, 2017 · 8 comments

@jaydiablo

We've discussed this briefly before (browscap/browscap#511 (comment)) but maybe we could figure out a good way to make this work?

Could the download requests still go to browscap.org for stat collection and abuse mitigation, but the actual file request would be distributed between browscap.org and a mirror we'd provide?

Abusers could still potentially just cache the final redirect, whether it's browscap.org or our mirror, but maybe it would reduce your bandwidth usage to have the legit users split between two file locations.

I'm not sure what the specifics would be yet, just wanted to bring it up for further discussion since bandwidth seems to be a problem lately.
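The split described above could be sketched roughly like this; the mirror URL and the 50/50 weighting are purely illustrative assumptions, not anything browscap.org actually does:

```python
import random

# Hypothetical mirror list; browscap.org would still receive the initial
# request for stats and abuse checks, then 302-redirect to one of these.
MIRRORS = [
    "https://browscap.org/stream",
    "https://mirror.example.org/stream",  # placeholder mirror URL
]

def pick_mirror(weights=(0.5, 0.5)):
    """Choose a file host for the redirect, weighted per mirror."""
    return random.choices(MIRRORS, weights=weights)[0]
```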

@asgrim
Member

asgrim commented Nov 27, 2017

Adding mirrors doesn't really solve the problem, I think; all it does is give us more leeway. I think issuing API keys (see #34) might be the way to go here, as it would require a human to generate an API key. I think the genuine number of downloads is a really manageable number, but right now I couldn't tell you what that is.

Since I took over hosting, I've been hosting it on a 4GB Linode ($20/mo), mainly because it has a 3TB transfer allowance. I think if we cut out the abusive traffic, the actual usage could be seriously low.

@reedzhao

reedzhao commented Nov 27, 2017

Could it be Black Friday, with thousands of new servers being deployed? It could be genuine traffic rather than spam/DDoS.

Browscap is a popular project; it will outgrow the server's bandwidth one day. Today is 27 Nov, and I think an increase in bandwidth could give us enough leeway (a year or two) until a better solution is in place.

I can provide 10TB of bandwidth, and together with others I think 50TB is not a problem. I have dedicated servers running CPU-intensive work with near-zero bandwidth usage.

@asgrim
Member

asgrim commented Nov 27, 2017

The problem has been ongoing for several months, and if it's genuine traffic, why are they hammering the URL with hundreds of requests per minute?

The issue is not bandwidth consumption, it's mitigating and blocking abuse; we need to fix the root cause of the problem, rather than patch up the symptoms :)

@reedzhao

@asgrim You mean some IPs are hammering the URL with hundreds of requests per minute? That's definitely abuse. Or bad programming... some dude wrote a PHP script that updates browscap on every request.

Anyway, thank you for this wonderful project. If there is anything I can contribute, let me know.

@asgrim
Member

asgrim commented Nov 27, 2017

Yep, I have scripts that watch the Varnish logs; occasionally I'll log in, watch the number of requests per minute by IP, and manually block the IP in CloudFlare. Hence I created issue #33, basically to automate this.

The rate limiting does work (most of the IPs already get the rate-limited response), but it doesn't actually stop the traffic, which is why I put the IPs into CloudFlare to block them before they even reach our server.
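The manual workflow described here, counting requests per minute per IP and blocking the worst offenders, could be sketched like this; the log format, threshold, and function name are assumptions for illustration:

```python
from collections import Counter

REQUESTS_PER_MINUTE_LIMIT = 100  # assumed threshold, not the real one

def ips_to_block(log_lines, limit=REQUESTS_PER_MINUTE_LIMIT):
    """Return IPs exceeding the per-minute limit, given lines shaped like
    '<iso-timestamp> <ip> <path>' (an assumed log format)."""
    counts = Counter()
    for line in log_lines:
        ts, ip, _path = line.split(maxsplit=2)
        counts[(ts[:16], ip)] += 1  # ts[:16] truncates the ISO timestamp to minute precision
    return {ip for (_minute, ip), n in counts.items() if n > limit}
```

The returned set would then be fed to the CloudFlare firewall, which is the part #33 is about automating.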

Thank you; it's time I need 😁

@DaAwesomeP
Contributor

I think that a permanent solution would be a CDN. You still pay for the bandwidth, but in combination with caching, the hosting would be faster and less strenuous on one server instance.

CDNs work best with no cache disruption and zero invalidation. To load a new file, you need a unique URL; something like /current or /latest would require a cache invalidation. One way around this is to use dated URLs, for example the UNIX timestamp: /1511814412. This way you serve a new file no more than once a second, and it is served by a global CDN of effectively unlimited capacity.
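A minimal sketch of the dated-URL idea, using the UNIX timestamp as the path segment (the base URL and function name are assumptions):

```python
import time

def dated_url(now=None, base="https://browscap.org/stream"):
    """Build a timestamped URL; every distinct timestamp is a fresh CDN
    cache key, so /current-style invalidation is never needed."""
    ts = int(time.time() if now is None else now)
    return f"{base}/{ts}"
```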

There are also a few methods of rate limiting. You can make a rule that a new file will not be served unless 1800 seconds have elapsed since the last one (otherwise you get a 404). While you would still get an influx of requests at each boundary, the CDN can handle that fine. PHP users would complain, but a background daemon that updates browscap is the better solution anyway, since you're not adding time to user page loads to download browscap. Most CDNs also have some sort of access control; AWS CloudFront has a signed URL and signed cookie system. With the Amazon price calculator, I worked out that it would be about $22/month for 300GB/month. I know that's a lot, but I think a cloud solution is the only way to allow for growth.
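The 1800-second rule could look something like this on the origin server; the bucket size and names are assumptions taken from the suggestion above:

```python
BUCKET_SECONDS = 1800  # serve at most one new file per half hour

def latest_bucket(now):
    """Most recent bucket-aligned UNIX timestamp at time `now`."""
    return now - (now % BUCKET_SECONDS)

def should_serve(requested_ts, now):
    """Serve only bucket-aligned, already-elapsed timestamps; anything
    else gets a 404, so the CDN caches at most one new object per bucket."""
    return requested_ts % BUCKET_SECONDS == 0 and requested_ts <= latest_bucket(now)
```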

You might try reaching out to MaxCDN; they sponsor and host a lot of open source projects. There are also cdnjs and jsDelivr, but I don't know whether they would be able to update the files instantly or would be willing to host browscap.

Another option (which I know won't be popular) is to make people pay for more requests. With API keys, you could charge a monthly price for updating more than once an hour. You could also distribute the requests evenly: give each client a 5-minute window every hour in which it is allowed to download.
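Distributing clients into fixed windows could be done by hashing the API key into one of twelve 5-minute slots per hour; the hashing scheme here is just an illustration, not a real browscap mechanism:

```python
import hashlib

WINDOW_MINUTES = 5  # each client gets one 5-minute slot per hour

def download_window(api_key):
    """Map an API key to a stable (start, end) minute-of-hour window."""
    slot = hashlib.sha256(api_key.encode()).digest()[0] % (60 // WINDOW_MINUTES)
    start = slot * WINDOW_MINUTES
    return start, start + WINDOW_MINUTES

def may_download(api_key, minute_of_hour):
    """True only while the current minute falls inside the client's window."""
    start, end = download_window(api_key)
    return start <= minute_of_hour < end
```

Because the hash is stable, each client always gets the same slot, so the hourly load spreads evenly without any server-side bookkeeping.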

These are just my thoughts. Please do reach out to MaxCDN; they might help you out.

@asgrim
Member

asgrim commented Nov 27, 2017

Bandwidth is our biggest concern, and when talking value for money, we get 3TB for $20/mo, so that wouldn't really be beneficial. I think automated rate limiting with CloudFlare's IP firewall should help significantly; I just need to find the time to do it :)

@asgrim
Member

asgrim commented Mar 2, 2018

Likely we don't need this now; I've moved off Linode and onto Heroku. CF's page caching has worked a treat from the other issue 👍

@asgrim asgrim closed this as completed Mar 2, 2018
@asgrim asgrim self-assigned this Mar 2, 2018