Additional errors #106
Conversation
✅ Deploy Preview for gbfs-validator ready!
Happy to see the wording
Hi @PierrickP and @tdelmas, I wonder if we could make the interface more consistent for users when there are both errors and warnings.
cc @josee-sabourin @isabelle-dr
For what it's worth, I am generally in favor of having the errors (schema and non-schema) in one place, because from the user's perspective it is a spec violation, regardless of whether it was generated by the JSON Schema or by the custom code.
Great extension. I just found this PR searching for a check on
@richfab @isabelle-dr I regrouped the errors together and updated the design. @hbruch I've added a new rule (
Very impressive work @tdelmas!
Thank you very much for this extremely valuable addition to the validator. I am sure many producers will appreciate this.
Your comments on my suggestions are welcome.
Thank you all for your detailed review. I've pushed a commit that applies most of the suggestions from your reviews, and marked them as "resolved" for readability. Comments not marked as resolved are not fixed, but as they require more tests, I strongly feel they are better suited to a follow-up pull request.
Thank you @tdelmas for taking most suggestions into consideration. This is amazing! I will review your changes as soon as possible.
I've left some comments but would like to highlight some issues:
None of the new rules are implemented using partials / schema patching, although after skimming through them, I think a lot of them could well have been (?) - is this a strategic decision? What about the rules that are already implemented that way: are they going to be left like this or ported to the imperative style? The benefit of using the "old" convention is that the output automatically conforms to the regular schema validation error outputs, whereas in the imperative style they have to be constructed manually. Especially since this project has no type-checking, this is prone to errors.
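For readers less familiar with the two styles, here is a rough, hypothetical sketch of the schema-patching approach: the partial only declares the extra constraint and is deep-merged into the base schema, so AJV reports any violation in the same format as ordinary schema errors. The file paths and the merge helper below are assumptions for illustration, not the project's actual layout.

```js
// Hypothetical partial, e.g. versions/partials/vehicle_types_not_empty.js:
// it only declares the extra constraint, nothing else.
module.exports = {
  properties: {
    data: {
      properties: {
        vehicle_types: { minItems: 1 }
      }
    }
  }
}
```

```js
// Elsewhere (also hypothetical), the partial is merged into the base schema
// before running AJV, so violations are reported exactly like any other
// schema error and no error objects are built by hand.
const merge = require('lodash.merge')

const baseSchema = require('./versions/schemas/v2.3/vehicle_types.json')
const partial = require('./versions/partials/vehicle_types_not_empty.js')

const patchedSchema = merge({}, baseSchema, partial)
// ajv.validate(patchedSchema, feedFile) -> standard schema error output
```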
There is a breaking change in the response format of the validator API, which prompts me to raise the question of standardising this format for portability purposes. If validators respond the same way, frontends and validators can be used interchangeably, and validation reports can be consumed later by any "conformant" UI.
Last, and the most serious issue for me: this PR introduces a number of warnings based on assumptions about things that I don't think a generic / canonical GBFS validator should be concerned with. Business rules that depend on operational context should be left out of the validator.
If this validator was a library, one could imagine a feature that allowed library users to provide their own additional rulesets specific to their operational context - but as it stands now, this is just speculative.
Don't get me wrong, I applaud the good intentions behind this, but I don't think it belongs in the validator.
@@ -1,19 +1,92 @@
const o = require('../nonSchemaValidation')
Why is `o` the name of this variable?
After skimming the list of "non-schema" rules, I'm a bit curious. There's already the concept of schema-patching in `gbfs-validator/versions/partials`, and it seems to me that a lot of these rules could very well be implemented there. Why wasn't that convention followed where this was possible?
I'm very confused by this. Are we producing validation warnings based on some assumptions about what different types of trips typically cost? Where do these assumptions come from? What is the currency? This seems very speculative imho.
They are only done for USD and EUR: https://github.com/tdelmas/gbfs-validator/blob/additional-errors/gbfs-validator/nonSchemaValidation/check_pricing_plans.js#L95
Again, I'm skeptical about the concept of giving warnings based on some arbitrary assumptions about what is a high number of available vehicles and docks.
@@ -1,5 +1,7 @@
const fastify = require('fastify')

const last_updated_fresh = Math.floor(Date.now() / 1000) - 30
As a general rule of thumb, testing based on an actual clock could lead to unexpected results and false negatives / positives. Using a mocked clock is preferable.
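As one illustration of the mocked-clock suggestion, here is a minimal sketch using Jest's fake timers (assuming the test suite runs on a Jest version that supports jest.setSystemTime; the pinned date is arbitrary):

```js
// Pin the clock so freshness checks do not depend on the real wall clock.
beforeAll(() => {
  jest.useFakeTimers()
  jest.setSystemTime(new Date('2023-01-01T00:00:00Z'))
})

afterAll(() => {
  jest.useRealTimers()
})

test('freshness fixture is deterministic', () => {
  // Date.now() now returns the pinned time, so "30 seconds ago" is stable
  // no matter when the suite runs.
  const last_updated_fresh = Math.floor(Date.now() / 1000) - 30
  expect(last_updated_fresh).toBe(1672531200 - 30)
})
```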
function build(opts = {}) {
  const app = fastify(opts)

-  app.get('/gbfs.json', async function(request, reply) {
+  app.get('/gbfs.json', async function (request, reply) {
very nit-picky, but why add a whitespace here?
expect(get_errors(result)).toEqual({
  errors: [],
  nonSchemaErrors: [],
  warnings: []
The taxonomy here feels odd. There are errors and there are warnings. But there is also a specific kind of error that does not come from the schema. Where do warnings come from?
My hope is that validation results become standardised for portability reasons, so I hope we can spend the extra effort to consider a rational api.
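To make the suggestion concrete, here is one possible shape such a standardised report could take: a single list of findings, each carrying an explicit severity and rule source, instead of parallel error/warning arrays. This is purely illustrative and not the format produced by this PR.

```js
// Illustrative only - not this PR's output format.
const exampleReport = {
  summary: {
    version: '2.3',
    filesValidated: 8,
    errorCount: 2,
    warningCount: 1
  },
  findings: [
    {
      severity: 'ERROR',       // ERROR | WARNING | INFO
      source: 'json-schema',   // or 'custom-rule'
      rule: 'required',
      file: 'system_information.json',
      path: '#/data/system_id',
      message: "must have required property 'system_id'"
    }
  ]
}
```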
Running the risk of repeating myself: again, I find that the assumptions made here to produce warnings are highly speculative, and I don't think they belong in a validator. They are business rules that depend on operational and geographic circumstances. This just adds unnecessary complexity.
@testower thank you for your review. I'll try to answer your concerns.
Yes. Logic checks are much more difficult to express using schemas, and even where they can be, it leads to problems and bugs. For example, if two rules need to patch the same part of a schema, the result is unpredictable, and the patches can break if the original schema changes. Also, most of these rules are cross-file checks, which means reading a file A, then building the schema for B, and vice versa.
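For illustration, a minimal sketch of the kind of cross-file check being described, with invented names and a hand-constructed error shape (this is not the PR's actual code):

```js
// Check that every vehicle_type_id referenced in free_bike_status.json is
// defined in vehicle_types.json - a rule that is hard to express as a single
// static JSON Schema because it spans two files.
function checkVehicleTypeIds(vehicleTypesFile, freeBikeStatusFile) {
  const errors = []
  const knownIds = new Set(
    (vehicleTypesFile?.data?.vehicle_types || []).map((vt) => vt.vehicle_type_id)
  )

  const bikes = freeBikeStatusFile?.data?.bikes || []
  bikes.forEach((bike, index) => {
    if (bike.vehicle_type_id && !knownIds.has(bike.vehicle_type_id)) {
      errors.push({
        rule: 'vehicle_type_id_exists',
        file: 'free_bike_status.json',
        path: `#/data/bikes/${index}/vehicle_type_id`,
        message: `vehicle_type_id "${bike.vehicle_type_id}" is not defined in vehicle_types.json`
      })
    }
  })

  return errors
}
```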
I feel like some should, in the future.
I agree, this could be improved in a future PR.
That's a good goal.
Warnings are just warnings: they are there to catch attention, because the validator found something that may be an error. I insist on the may. I strongly feel they are important because they help to catch subtle mistakes, such as prices in cents instead of base units, or meters instead of kilometers, because some feeds have had those problems.
A separation could be made; it's an interesting idea for a future PR.
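To make the unit-conversion argument above concrete, here is a heavily simplified sketch of the kind of plausibility heuristic being debated. The threshold, currency list and error shape are invented for illustration and do not reflect the actual check_pricing_plans.js logic.

```js
function warnIfPriceLooksLikeCents(plan) {
  const warnings = []

  // A per-ride price above this value in base currency units is suspicious
  // and may mean the producer exported cents instead of base units.
  const SUSPICIOUS_PRICE = 500

  if (['USD', 'EUR'].includes(plan.currency) && plan.price > SUSPICIOUS_PRICE) {
    warnings.push({
      rule: 'suspicious_price',
      path: `#/data/plans/${plan.plan_id}/price`,
      message: `Price ${plan.price} ${plan.currency} is unusually high; was it exported in cents?`
    })
  }

  return warnings
}

// Example: a plan priced at 1500 EUR triggers a warning, a plan at 15 EUR does not.
```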
Let's take a step back to look at first principles. The GBFS validator is, according to the README
furthermore
The word "canonical" is instructive if taken to mean: "according to the canon", the canon being the "the standard, rule or primary source that is accepted as authoritative"[1]. The "canon" in this context is obviously the GBFS specification. I will interpret this to mean that the validator is intended to be isomorphic[2] to the specification it validates against. Any departure from that principle should be seriously discussed. Let me elaborate on why I think it's a bad idea to introduce validation rules that break this isomorphism. My argument is that any validation rule that doesn't have a corollary in the specification, is a business rule only relevant for a specific operational context. So while it may be useful to get a warning if a car is reported to have room for more than 5 passengers, or a moped has a max range of more than 100km, or a car rental costs more than 50 USD/EUR for an hour (?) - just to name a few examples - these are useful in a specific operational context (perhaps yours?), but it's hard to argue that they are universal truisms about the domain over which the GBFS specification is meant to govern. If they were, they would have been encoded in the specification already. Now I'm not saying that such rules don't have merit, quite the contrary. But I'm saying that they belong in the specific systems that operate in the specific contexts that give rise to their usefulness. Anywhere else, they create clutter and noise. For the sake of argument, I shall entertain the idea that these rules should be in the validator. Now, what process or rationale should determine what these rules and their threshold values should be? Is it consensus-based, or democratic? The greatest common denominator of all producers? Who shall bear the burden of maintaining these over time? As an exercise, imagine that ever more currencies are added to the "typical cost" structure. How is it possible to keep this structure updated and relevant as prices change and currencies inflate and deflate. I believe that the best candidates for taking that responsibility are those who operate in the context where those rules, thresholds and values are useful and relevant. Offloading that to the canonical validator is a recipe for future decay and ultimately complete irrelevance of the validator as a general-purpose tool to validate against the specification. If I were to pretend to be positive about accepting these new rules, I would still expect the following preconditions to be met:
Even with these preconditions met, I'm not enthusiastic about them, since they still introduce a maintenance burden. I.e. someone must take responsibility for keeping them up-to-date and relevant, and someone must make sure that the total set of such non-specification rules is kept at a reasonable level.

I'm particularly skeptical about the "typical cost" check. For one, because it equates USD with EUR, which could quickly change - but more importantly because it opens the possibility of adding checks for any number of currencies, which, needless to say, would very quickly grow to unmanageable scales. Barring additional currencies from being added to avoid this problem just amounts to discrimination, so I don't see a way to accept it in the first place.

Related to the point I made about the output from the validator being standardized, I'm not sure how this fits in. The taxonomy is very unclear and ambiguous about what constitutes a validation error based on the specification and what is the result of an opinionated "sanity check".

All in all, I think this marks a very significant shift in the goal and purpose of the validator that we should not take on lightly. I ask the community to think very carefully about accepting these kinds of changes.

[1] https://en.wikipedia.org/wiki/Canonical
As I said, those rules are not here to detect "high" or "low" prices, but cases where the producer made a conversion error (meters vs kilometers, $ vs cents, etc.). But yes, it's not possible to detect that perfectly. @testower Is there a list of rules we can remove from this PR so it would be acceptable to you?
@tdelmas The issue is that there is a severe break in principle between format-driven validation and the inclusion of subjective detection of possible user errors. It's not about what is acceptable to @testower personally; it's about having a whitelabel validator that works equally well for any user. That is the main point here.

Validation rules (the warnings), like these, do have a role to play, but they would work better as a separate validator, perhaps as a template that each user can configure for local use. That way, each user can define their own limitations in weight, size, doors, and prices according to their own needs - but this still has nothing to do with the GBFS standard.

Our intention is not to be troublesome but to ensure that certain principles are maintained.
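As a sketch of what such a user-configurable ruleset could look like if the validator were consumed as a library: the constructor options and the extraRules field below are purely hypothetical and do not exist in the current codebase.

```js
const GbfsValidator = require('gbfs-validator') // hypothetical export name

// Rules defined and maintained by the local operator, not by the canonical validator.
const localRules = [
  {
    id: 'local/max-hourly-car-price',
    severity: 'WARNING',
    check: (feed) => {
      const plans = feed.system_pricing_plans?.data?.plans || []
      return plans
        .filter((p) => p.currency === 'EUR' && p.price > 50)
        .map((p) => ({
          path: `#/data/plans/${p.plan_id}/price`,
          message: 'Price is above the locally configured threshold'
        }))
    }
  }
]

const validator = new GbfsValidator('https://example.com/gbfs.json', {
  extraRules: localRules // spec-derived validation stays untouched
})
```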
Yes, all the warnings.
Hello everyone,

Thank you all for sharing your views on the validator with such enthusiasm and quality arguments. It’s very exciting to see that this tool is getting the attention it deserves from the community.

Thank you @testower for reminding us of the purpose of the canonical GBFS validator, which is to validate the conformity of a GBFS feed with the specification. MobilityData will make sure to keep this at the forefront of our decisions.

To ensure isomorphism between the specification and the validator, MobilityData recommends continuing to include in the validator only the rules that are in the specification. The data sanity checks proposed by @tdelmas are a very good idea and could have a place in a separate validator configurable by users, as suggested by @JohanEntur.

Here is MobilityData’s proposal:
Please let us know what you all think.
@richfab I think that's a very good solution and it answers my concerns 🥇 If these changes are made, I think we can move on to the more technical aspects of the review.
Hi @tdelmas, please let me know if you have any questions or concerns about the proposal. Thank you!
Thank you for the feedback and your ping.
Add non-jsonSchema validations:
- Checked errors and warnings are listed in README.md
- Display a condensed summary of errors and warnings grouped by types and paths
- Fix #92
- Fix #93 by displaying a warning

Example: https://deploy-preview-106--gbfs-validator.netlify.app/validator?url=https%3A%2F%2Fdubai.publicbikesystem.net%2Fcustomer%2Fgbfs%2Fv2%2Fgbfs.json