Meaning of subdomains field #57

Open
mainzelM opened this issue Dec 30, 2020 · 4 comments
Labels
bug Something isn't working

Comments

@mainzelM

I'm wondering about the meaning of the "subdomains" field within a resource. As far as I can tell, the field is either empty (in which case the "rule" field corresponds to the actual URL of the resource) or it contains a list of non-empty names, in which case the "rule" field must be prefixed with each of these names to get the actual URLs. In the latter case, however, is there any information from which I can conclude whether the "rule" field without any subdomain has also been detected as an actual URL?

That is, if there is a resource

{
    "rule": "foo\\.net\\/bar",
    "subdomains": [
        "www"
    ]
}

how can I conclude whether "foo.net/bar" has been detected (in addition to "www.foo.net/bar")?
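The reading described above can be sketched in Python. This is only an illustration of the question, not code from the project: the helper name `expand_resource` is hypothetical, and stripping all backslashes is a simplistic way to undo the regex escaping in "rule".

```python
import json

def expand_resource(resource):
    """Return the URLs implied by a resource entry, under the reading above."""
    # Assumption: "rule" is a regex-escaped URL, so dropping the backslashes
    # recovers the plain URL (good enough for simple rules like this one).
    base = resource["rule"].replace("\\", "")
    subdomains = resource.get("subdomains", [])
    if not subdomains:
        # Empty list: the rule itself is the URL.
        return [base]
    # Non-empty list: prefix the rule with each subdomain.
    return [f"{sub}.{base}" for sub in subdomains]

resource = json.loads(r'{"rule": "foo\\.net\\/bar", "subdomains": ["www"]}')
urls = expand_resource(resource)
print(urls)
```

For the example resource this yields only "www.foo.net/bar". Nothing in the entry says whether the bare "foo.net/bar" was seen as well, which is exactly the open question.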

@kdzwinel
Member

kdzwinel commented Jan 14, 2021

Thanks for bringing this to our attention, @mainzelM! Unfortunately, it turns out that this notation is ambiguous: your example can mean either that the resource was seen only on "www.foo.net/bar" or that it was seen on both "www.foo.net/bar" and "foo.net/bar". We consider it a bug and will fix it either by introducing an empty subdomain (<none> or just "") or by adding an additional property. That being said, I don't know when exactly we will be able to get to it.
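A minimal sketch of how a consumer might read the data if the empty-subdomain variant ("") were adopted. This convention is only proposed in this thread and is not part of the current data format; the helper name `expand_with_empty_subdomain` is hypothetical.

```python
def expand_with_empty_subdomain(resource):
    """Expand a resource, treating "" in "subdomains" as the bare domain."""
    # Assumption: "rule" is a regex-escaped URL; dropping backslashes unescapes it.
    base = resource["rule"].replace("\\", "")
    subdomains = resource.get("subdomains", [])
    if not subdomains:
        return [base]
    # Hypothetical convention: an empty string means "seen without a subdomain".
    return [f"{sub}.{base}" if sub else base for sub in subdomains]

both = expand_with_empty_subdomain(
    {"rule": "foo\\.net\\/bar", "subdomains": ["", "www"]}
)
only_www = expand_with_empty_subdomain(
    {"rule": "foo\\.net\\/bar", "subdomains": ["www"]}
)
print(both)
print(only_www)
```

Under that convention the two cases from the example become distinguishable: a list of "" and "www" covers both URLs, while "www" alone covers only "www.foo.net/bar".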

kdzwinel added the bug label on Jan 14, 2021
@mainzelM
Author

Thanks for the feedback and the clarification, @kdzwinel. I'm glad to hear that you plan to work on this! As I'm deriving blocker lists from the information you provide (https://github.com/mainzelM/ddg-tr-as-easylist), I'm interested in making the rules as concise as possible.

I also came across another, related topic: if a rule includes CNAME information, e.g.

{
    "rule": "foo\\.net\\/bar",
    "subdomains": [
        "tracker"
    ],
    "cnames": [
        {
            "original": "baz.com",
            "resolved": "tracker.foo.net"
        }
    ]
}

I'd be interested in the information, whether "tracker.foo.net/bar" was seen in addition to "baz.com". Currently, I cannot derive this from the data above, right?
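To make the CNAME question concrete, here is a rough Python sketch of what the fields shown above seem to allow a consumer to say about each host. It assumes that "original" is the hostname that was actually requested and "resolved" is its CNAME target; the helper name `classify_hosts` and the status strings are hypothetical.

```python
def classify_hosts(resource):
    """Label each host in a resource by what the entry lets us conclude."""
    # Assumption: the host is the part of the unescaped rule before the first "/".
    domain = resource["rule"].replace("\\", "").split("/", 1)[0]
    cnames = resource.get("cnames", [])
    resolved = {c["resolved"] for c in cnames}
    # Hosts listed as "original" are taken to be the ones actually requested.
    result = {c["original"]: "seen as CNAME original" for c in cnames}
    for sub in resource.get("subdomains", []):
        host = f"{sub}.{domain}"
        if host in resolved:
            # The subdomain may be listed only because something resolved to it.
            result[host] = "CNAME target, direct request unknown"
        else:
            result[host] = "listed subdomain"
    return result

hosts = classify_hosts({
    "rule": "foo\\.net\\/bar",
    "subdomains": ["tracker"],
    "cnames": [{"original": "baz.com", "resolved": "tracker.foo.net"}],
})
print(hosts)
```

For the example, "baz.com" can be marked as seen, while "tracker.foo.net" stays undetermined because the entry does not record whether it was ever requested directly.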

@kdzwinel
Member

Hey @mainzelM, sorry for the late response.

> I'd be interested in the information, whether "tracker.foo.net/bar" was seen in addition to "baz.com". Currently, I cannot derive this from the data above, right?

I believe that you are right - you can't tell that ATM. We may release raw crawl data at some point, but I don't have an ETA.

> https://github.com/mainzelM/ddg-tr-as-easylist

That's awesome to see 👍

@Margieperez

Can you please take the bug off? Thank you.
