Quality check #14
@NielsLeenheer thank you for the input. (I personally also don't like the wrong results.)
If I understand it correctly, this would be hard to achieve with the amount of data, and it's hard to keep up to date (especially the handpicked list of tricky user agents). My idea is a bit different, but similar in many parts.
Then we have a percentage which can indicate possible other wrong results (based on each value). This indicator would also go on the summary front page, to show which providers do real detection. I have already done some work here (but it is not finished).
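Roughly what I have in mind, as a minimal sketch (all the names below are made up for illustration, they are not from the actual comparison code): a provider whose value disagrees with the majority of the other providers gets flagged, and the flags are turned into a percentage per provider.

```python
from collections import Counter

def possible_error_rate(results_per_ua):
    """results_per_ua maps a user agent string to {provider: detected_browser}.
    A provider whose value differs from the majority value is counted as a
    *possible* error (it may of course still be the only correct one)."""
    possible_errors = Counter()
    seen = Counter()
    for ua, by_provider in results_per_ua.items():
        majority_value, _ = Counter(by_provider.values()).most_common(1)[0]
        for provider, value in by_provider.items():
            seen[provider] += 1
            if value != majority_value:
                possible_errors[provider] += 1
    return {p: 100.0 * possible_errors[p] / seen[p] for p in seen}
```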
Yes, this is one of the drawbacks. It will be difficult to compile a list of tricky user agents. I think we should have at least a hundred different ones to start with, to be able to say something meaningful, and the more we can get the better. I'll gladly spend some time on this.
I've considered this approach for a while, but I think there is a fatal flaw in it. Being right isn't a democracy: one library may be right and all of the others wrong. I would say this approach pushes towards harmonisation and not necessarily towards good quality. Take, for example, the Opera Mini user agent string from the issue above (Opera/9.80 (X11; Linux zvav; U; zh) Presto/2.8.119 Version/11.10): if you look at the results you'll see that everybody except WhichBrowser is fooled. Everybody thinks it is Opera 11 running on Linux, while "zvav" indicates it is the desktop mode of Opera Mini trying to mislead you into thinking it is the desktop version. Add 13 to every letter of "zvav" and you get "mini".
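The trick is easy to verify with a couple of lines (just a sketch, not part of any of the libraries):

```python
import codecs

# "zvav" is the platform token from the Opera Mini desktop-mode string above.
# Shifting every letter by 13 places (ROT13) reveals what it really is.
print(codecs.decode("zvav", "rot_13"))  # prints "mini"
```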
You are right... I didn't have that in mind.
You just led me to an (I think awesome) idea... This comparison already uses (only) the test suites of the parsers, which provide them in an independent way.
So why not reuse them in this case? We would have expected values for 100% of the strings... In combination with value harmonization it should be possible?
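As a rough sketch of what I mean (the mapping and the function names are only examples, nothing of this exists yet): harmonize equivalent spellings onto one canonical value, then compare a provider's result against the expected value taken from another parser's test suite.

```python
# Example-only harmonization table; a real one would be much larger.
HARMONIZE = {
    "opera mobile": "opera",
    "gnu/linux": "linux",
    "mac os x": "macos",
}

def harmonize(value):
    """Map differently spelled but equivalent values onto one canonical form."""
    if value is None:
        return None
    key = value.strip().lower()
    return HARMONIZE.get(key, key)

def matches_expected(expected_browser, detected_browser):
    """A provider's result counts as correct if the harmonized values match."""
    return harmonize(expected_browser) == harmonize(detected_browser)
```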
I hadn't thought of this, but I can see some obstacles to overcome. I know that the WhichBrowser test data contains expected results that are flat out wrong. The expected results in my test data are there to be able to do regression testing, not as an indication of a correct result. I expect this to be true of all of the data sources. Just think of it like this: we know there are different results between libraries, which must mean some libraries make mistakes sometimes. We also know that every library passes its own test suite. That logically means the expected results of the test suites contain mistakes.

I'm starting to think that a curated list of tests makes more and more sense. We could define a list of, say, 100 strings and call that the "UAmark 2016" test, or something like that. Because it is curated, the test is always the same and results can be tracked over time. I'm also starting to think it should not contain just tricky user agent strings; it doesn't have to be a torture test. It could contain some basic strings for desktop and mobile, some more exotic ones and some tricky ones. The only thing is, we have to manually confirm the expected results, but I don't think that will be a big problem. And with the combined test suites we have a lot of user agent strings to choose from.
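A sketch of how such a curated list could be organised (the grouping and the one expected value below are only illustrative; every expected result would be confirmed manually):

```python
UAMARK_2016 = {
    "basic desktop": [],   # e.g. current Chrome, Firefox, Safari on desktop
    "basic mobile": [],    # e.g. common Android and iOS browsers
    "exotic": [],          # rarer browsers and platforms
    "tricky": [
        {
            "ua": "Opera/9.80 (X11; Linux zvav; U; zh) Presto/2.8.119 Version/11.10",
            "expected_browser": "Opera Mini",
        },
    ],
}

def score(parse, suite=UAMARK_2016):
    """Percentage of curated cases a library gets right; `parse` is assumed to
    be a callable returning a browser name. Because the suite is fixed and
    curated, scores from different dates stay comparable over time."""
    cases = [case for group in suite.values() for case in group]
    passed = sum(1 for case in cases if parse(case["ua"]) == case["expected_browser"])
    return 100.0 * passed / len(cases)
```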
That could be a little problem...
It's not always a (direct) mistake. Sometimes (like in the example above) it's just a case that isn't covered yet.
I somehow like the idea, but I see some main problems.
Maybe we can have a chat about this some time?
Oh, every string is a real user agent string. But the parsing by the library itself may not be perfect yet. And the expected result mirrors the state of the library. It happens regularly that an improvement to the library causes the expected results to change. And that is okay, as long as it is better than before and not unexpected.
Yes, but I don't think the limited size is a problem. On the contrary: it allows us to select specific strings that we think are important, instead of having 5000 variants of Chrome in the test suite.
You should definitely keep the whole test set, but I think this quality mark can be an addition to the overview page.
That is a good question. Personally I think browser detection should be most important and maybe some device detection. For bot detection I would use completely different criteria, because just looking at 100 strings won't tell much about how well a library supports detecting bots.
That is the hardest part, I think. We should start by setting goals, determine rules and base the selection on those. Some off-the-cuff ideas:
Sure, I'll send you a DM on Twitter to schedule something.
@NielsLeenheer in the meantime (until we have our meeting), I'll add the unit test results to the UserAgent table, so the following information can be displayed on the single UserAgent detail page:
Regardless of the goal we discussed here, I think this is useful.
👍 Did you get my direct message, BTW?
Yep, I got it 😄
The following steps are necessary to get this finally moving:
At this point, the user agent and its result are checked and added to the user agent detail page (so this is already useful). For the badge calculation:
So far the results of UserAgentParserComparison have been extremely useful for me, but the overview page of the parsers can be very misleading if you take the numbers at face value. Having a result for a particular user agent string does not mean it is a good result. Perhaps having no result would have been better.
I've been thinking about how to test the quality of the various user agent detection libraries. I'm not talking about nitpicking details like spelling of model names or stuff like that. Even things like 'Opera' vs. 'Opera Mobile' is not that important. I'm talking about clear detection errors.
What I've come up with so far is:
For example, if we have the following string:
Opera/9.80 (X11; Linux zvav; U; zh) Presto/2.8.119 Version/11.10
The test would pass if the library identifies Opera Mini running in desktop mode; an identification of Opera on Linux would be the obvious mistake.
Another example:
Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Atom/1.0.19 Chrome/43.0.2357.65 Electron/0.30.7 Safari/537.36
The test would pass if the library identifies Atom rather than Chrome, because this is actually the Atom editor and not the Chrome browser.
And finally:
Mozilla/5.0 (Linux; Android 5.0; SAMSUNG SM-N9006 Build/LRX21V) AppleWebKit/537.36 (KHTML, like Gecko) SamsungBrowser/2.1 Chrome/34.0.1847.76 Mobile Safari/537.36
The test would pass if the library identifies the Samsung browser rather than Chrome 34.
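To make the idea concrete, here is a sketch of the three examples expressed as explicit pass conditions (the field names and the exact expected browser names are mine, just for illustration; `parse` stands for whichever library is being tested):

```python
CASES = [
    # Opera Mini in desktop mode; detecting Opera 11 on Linux is the obvious mistake.
    ("Opera/9.80 (X11; Linux zvav; U; zh) Presto/2.8.119 Version/11.10",
     lambda result: result["browser"] == "Opera Mini"),
    # The Atom editor, not the Chrome browser.
    ("Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) "
     "Atom/1.0.19 Chrome/43.0.2357.65 Electron/0.30.7 Safari/537.36",
     lambda result: result["browser"] == "Atom"),
    # The Samsung browser on a Samsung device, not Chrome 34.
    ("Mozilla/5.0 (Linux; Android 5.0; SAMSUNG SM-N9006 Build/LRX21V) AppleWebKit/537.36 "
     "(KHTML, like Gecko) SamsungBrowser/2.1 Chrome/34.0.1847.76 Mobile Safari/537.36",
     lambda result: result["browser"] == "Samsung Browser"),
]

def run(parse):
    """Return how many of the cases the given parser passes."""
    return sum(1 for ua, condition in CASES if condition(parse(ua)))
```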
I do realise that sometimes results are open to interpretation, but I think that despite that, it might be a useful way to identify common problems in libraries and help them raise the quality of their identifications.
Is this something you want to work on with me?