Adding configuration elements to block based on user-agents #20

Closed
wants to merge 7 commits into from


robin-francois

With traffic from generative AI bots increasing, and since those bots do not respect robots.txt, user-agent filtering currently seems a good way to prevent services from being overloaded.

This PR adds a new variable to configure a block list of user agents. This list is case sensitive.
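
For illustration, a user-agent block list in a Caddyfile could look roughly like the sketch below. The site address and the bot names are assumptions chosen for the example, and the variable added by this PR is not shown in this thread, so the list is written out literally rather than templated.

```caddyfile
example.org {
    # Hypothetical block list. The regular expression is matched
    # case sensitively, so "gptbot" would NOT match "GPTBot".
    @badbots header_regexp User-Agent "(GPTBot|ClaudeBot)"

    handle @badbots {
        # Close the connection without sending any response.
        abort
    }

    handle {
        # Placeholder for the regular site configuration.
        respond "OK"
    }
}
```

Requests whose User-Agent header contains one of the listed names fall into the first `handle` block; everything else is served by the second one.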

Collaborator

@tomcbe tomcbe left a comment

Hi @robin-francois

I reviewed the PR for docuteam: I have a few smaller suggestions, but in general it looks good to me.

}

handle @badbots {
    abort
Collaborator

Wouldn't it be better to send an HTTP 403 Forbidden status code (https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/403) instead of just closing the connection?

Author

Well, I would say that if they were respecting us, we could respect them. But they are not. We want to be sure that they do not come back and do not try again.

Collaborator

@tomcbe tomcbe Sep 12, 2024

Artefactual sends a 403 status code by default in their Nginx role for AtoM:

https://github.com/artefactual-labs/ansible-nginx/blob/master/defaults/main/main.yml#L70

I still wonder whether sending a 403 is better than sending no answer at all for keeping them away.

templates/Caddyfile.j2: four outdated review threads, all resolved
@robin-francois
Author

Apart from which response to send, I think all points have been resolved.

Collaborator

@tomcbe tomcbe left a comment

Apart from the discussion about whether to send a 403 HTTP status code or abort the connection, this PR looks good to me now.

@robin-francois
Author

We can easily reply with a 403 if you prefer. I will just need to double-check how to do it.

@robin-francois
Author

@tomcbe I have tested how to reply with a 403 and some simple HTML content. I have pushed a new commit that returns a 403 instead of closing the TCP connection.
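
A minimal sketch of what replying with a 403 and a small HTML body could look like in a Caddyfile, in place of `abort`. The site address, the contents of the `@badbots` matcher, and the HTML text are placeholders chosen for illustration, not the content of the actual commit.

```caddyfile
example.org {
    # Hypothetical block list, matched case sensitively.
    @badbots header_regexp User-Agent "(GPTBot|ClaudeBot)"

    handle @badbots {
        # Mark the body as HTML, then answer with 403 Forbidden
        # instead of closing the connection.
        header Content-Type "text/html; charset=utf-8"
        respond "<h1>403 Forbidden</h1>" 403
    }
}
```

Compared with `abort`, this gives the client an explicit status code, at the cost of sending a full response to every blocked request.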

@tomcbe
Collaborator

tomcbe commented Sep 13, 2024

@robin-francois I have approved your PR now, but as this repository is managed by simplificator, I can't merge it.
I'll create a PR for our provisioning projects to use your fork of the role in the meantime.

@cedricwider @tizpuppi Can one of you review this PR and merge it if you agree with the proposed changes?

@cedricwider
Contributor

Closing this PR in favor of #22
