-
Notifications
You must be signed in to change notification settings - Fork 161
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add mostly universal version range spec draft #139
Conversation
Originally at aboutcode-org/univers#11 Signed-off-by: Philippe Ombredanne <[email protected]>
It comes with an experimental implementation in Python at https://github.com/nexB/univers/ |
Overall I like the idea! It seems to me that most of the time |
VERSION-RANGE-SPEC.rst
Outdated
|
||
- The ordering of multiple ``<version-constraint>`` in a range specifier is not | ||
significant. The canonical ordering is by sorting these by lexicographical | ||
order applied with this two steps approach: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This sorting ends up being a bit weird, as this sorts first based on the comparator strings and second on the version. Sorting by version is possibly opening a can of worm as this would be scheme specific.
Correct typo and meaning Signed-off-by: Philippe Ombredanne <[email protected]>
This is the same concept as a Package URL type. Signed-off-by: Philippe Ombredanne <[email protected]>
version. And also provides a concrete enumeration of the available ranges as | ||
a daily feed. | ||
|
||
- The version 5 of the NVD CVE JSON data format at |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also note, CycloneDX v1.4 has adopted the CVE v5.0 version range syntax.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
excellent!
vers
is essentially the same just using a compact notation, and reusing Package URL types. There should be a perfect bijection between vers and the NVD 5.0 (and the OSV schema) albeit with a possible need for a minimal mapping. Let me make this clear in the doc and explain what the mapping is.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@stevespringett based on https://github.com/CVEProject/cve-schema/releases/tag/v5.0.0-rc5 I reckon this is still a release candidate... @chandanbn @rsc do you know when this will become 5.0?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
RC designation will be removed when the CVE upload/update API service goes into production sometime early next year.
At this time the format is considered frozen for new additions. Only bug fixes or minor doc changes are being done based on the feedback from team developing the CVE services.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@chandanbn Thank you ++ for details! I look forward to the format going live 🙇
I wonder if there would be a way to hone towards a common list or names of versionType
used your spec and in purl and here?
And possibly also what is called a type
for Package URL and package ecosystems/ aka. collectionURL
in CVE 5?
I reckon the CVE 5 spec only provides examples and not definitive lists but we have many devilishly similar yet different takes on essentially the same thing:
For package type or ecosystem:
-
purl package
type
with a list of types at https://github.com/package-url/purl-spec/blob/master/PURL-TYPES.rst -
CVE 5
collectionURL
with a list of URLs as examples at https://github.com/CVEProject/cve-schema/blob/1c40e3d9f7c51db850b96adb5001df85bd186ab2/schema/v5.0/CVE_JSON_5.0_schema.json#L123 -
OSSF OSV
ecosystem
with both anecosystem
and apurl
at https://github.com/ossf/osv-schema/blob/c4c5522d746476b569618b3d407346bf7f2d6561/validation/schema.json#L48 -
CSAF 2 support purls at https://github.com/oasis-tcs/csaf/blob/master/csaf_2.0/json_schema/csaf_json_schema.json#L235
And for version ranges:
-
this vers spec versioning
scheme
which adopts thepurl
type as a scheme (but can have more) -
CVE 5
versionType
with a list of types at https://github.com/CVEProject/cve-schema/blob/1c40e3d9f7c51db850b96adb5001df85bd186ab2/schema/v5.0/CVE_JSON_5.0_schema.json#L303 -
OSSF OSV ranges
type
https://github.com/ossf/osv-schema/blob/c4c5522d746476b569618b3d407346bf7f2d6561/validation/schema.json#L68
May there is a way we can better control the entropy of the universe? or at the minimum maintain some kind of unambiguous mappings between all these?
@rsc and @oliverchang ping wrt. OSV and @tschmidtb51 and @santosomar ping wrt. CSAF
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(mostly just thinking out loud)
I think a major "pro" of combining purl and vers into one entity is that you'd get the combined benefits of having the identity of a component plus the version ranges in one "thing". Version specifiers to me seem like ideal candidates for representing e.g. vulnerable version of a component. If we take CVE-2022-22818 as an example, we can represent the list of fixed versions as a list with purl:
pkg:pypi/[email protected]
pkg:pypi/[email protected]
pkg:pypi/[email protected]
but also as a list of vulnerable versions:
vers:pypi/>=4.0.0|<4.0.2
vers:pypi/>=3.0.0|<3.2.12
vers:pypi/>=2.0.0|<2.2.27
(or perhapsvers:pypi/<2.2.27
to account for unsupported 0x. and 1.x versions)
The latter however is not related to the Django package in any way, so in whatever database stores this information you have to then identify a versionless Django component to go with the version ranges (perhaps just pkg:pypi/django
). If this were combined into purl, you could nicely represent vulnerable versions as of Django as:
purl:pypi/django@<2.2.27|>=3.0.0|<3.2.12|>=4.0.0|<4.0.2
The unfortunate problem is that the semantics of what the range represents (e.g. vulnerable vs not affected) would have to be represented outside of purl (or perhaps as agreed upon qualifiers).
Perhaps the best compromise would be to recommend the use of qualifiers for vers strings within purl? I.e.:
pkg:pypi/[email protected]?vulnerable_versions=vers:pypi/>=4.0.0|<4.0.2
Wdyt?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mprpic I think vers and purl SHOULD NOT enter the field of describing the state of a software. They are build to identify a product (including product name and version (with now also version ranges)) - and that works well. Determining, whether a product is affected or not (and that is always a question related to a specific vulnerability) is a completely different question which is well understood in security advisories.
I like the Unix philosophy: "Do one thing and do it well."
IMHO the one thing for vers and purl is identifying a product.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@tschmidtb51 But that's the problem I tried to highlight, vers does not identify a product, it only identifies a version range that you then have to relate to a product. I only used the relationship to vulnerability affectedness as an example where you might want to represent identify of a product (purl) and its versions (vers) in one artifact (purlvers).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mprpic Sorry about the misunderstanding - I wanted to convey:
- I agree that
vers
is a good approach to specify a version range purlvers
is able identify a product by name (purl) and a version range (vers)purlvers
should only specify a product (name+version), not the contextaffected
ornot affected
. Otherwise, it becomes to complex to check that the information is consistent/ not contradicting. (Imagine an advisory stating the product A with the identifierpkg:pypi/[email protected]?vulnerable_versions=vers:pypi/>=4.0.0|<4.0.2
is fixed. What would be true in such a situation?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@tschmidtb51 Right, the association of a purlvers to the context should not be a part of purlvers itself, which I guess rules out the qualifier. Having purl:pypi/django@<2.2.27|>=3.0.0|<3.2.12|>=4.0.0|<4.0.2
be the "thing" to associate with some context, like "is fixed" or "is vulnerable", would be more useful in an advisory.
These are no longer used: we now use a coma. Signed-off-by: Philippe Ombredanne <[email protected]>
@ashcrow re:
yes, using a qualifier in a purl would be a way. I think (but need to double-check) that this is possible without the need to encode anything in the vers version range as I picked the vers component separators such that they are both mostly obvious and the ones commonly used and that they do not collide with a Note that this essentially the proposal of @mprpic in #66 (comment) |
@david-a-wheeler @copernico @joshbressers @sbs2001 @Hritik14 @bwillis @coderpatros @jhutchings1 @brianf @jbmaillet ... ping... you all have been involved in the discussions that led to this. Your feedback is badly needed. |
@kerberosmansour @johnmod3 @erosb you had chimed in on this topic too. Your feedback is welcomed! |
This may be problem in some cases. Best is to keep the version as-is even in canonical form. Signed-off-by: Philippe Ombredanne <[email protected]>
This feels like a good idea to me I like that we can specify explicit versions that are affected or not affected. There will be instances where trying to list ranges will be harder than just listing the specific affected versions. Long term I imagine we will want to keep a catalog of known versioning-scheme identifiers. I assume "semver" will end up being the default if no other ecosystem fits (should it be? I currently think yes, but I've only thought about it for a few minutes). In my mind I would compare this catalog of identifiers to how SPDX has a list of known licenses. If you want to add a new one, you can submit an issue or PR and discuss it. I don't have time right now to do this (I might at the end of the month if nobody gets to it first), but I think putting an Examples section at the bottom could make understanding this easier for casual readers. |
Thanks for advancing this draft @pombredanne ! Is there a world where we would want to require or recommend that a lowest common denominator version schema is used all the time? I don't particularly care which one, but having an unbounded set of them means interop could be a challenge for consumers. |
VERSION-RANGE-SPEC.rst
Outdated
|
||
Each ``<version-constraint>`` of this pipe-separated list can be either a | ||
single constraint or a list of constraints separated in turn by an comma "," as | ||
in ``1.2.3|>=2.0.0,<5.0.0``. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
May I suggest a more OSV/CVE 5.0 like approach here? OSV used to have a similar approach to this (using a disjunctive normal form), but we realised there was a much more generalisable and simpler way to this as we were discussing this in depth with the CVE WG.
This would constrain the way these can be specified a little, leading to simpler evaluation, less potential human error and with no loss of expressiveness.
The additional restrictions would be:
- Only
>=
and<
, or=/no operator
are allowed. (The first two operators would correspond to "introduced" and "fixed" in OSV). - Only a single constraint per pipe ("|") item. The meaning of pipe ("|") would also change, in that they just become a separator for the different constraints / points.
I.e. instead of 1.2.3|>=2.0.0,<5.0
, we just have 1.2.3|>=2.0.0|<5.0
.
To evaluate them, these essentially become sign posts / events on a version "timeline" (which could instead be a tree because of branching). The algorithm to evaluate this would be almost identical to how CVE 5.0 / OSV does it:
# e.g. v == '2.0.0'
# constraints = ['1.2.3', '>=2.0.0', '<5.0.0']
func is_v_affected(v, constraints):
status = unaffected
sorted_constraints = sorted(constraints)
for constraint in sorted_constraints:
if is_equals_operator(constraint) and constraint == v:
return True
if v >= constraint and is_greater_equals(constraint):
status = affected
if v >= constraint and is_less_than(constraing):
status = unaffected
return status
Why would we want this instead?
- A more restrictive way of specifying ranges means more consistency and less chance of human error when writing these ranges (e.g. there's no way to write overlapping ranges). They're also much easier to write evaluators for. This should still be just as expressive.
<version
entries have the side effect of pointing to versions that contain fixes. So, they are more descriptive to users who want to know what version to update to.- This also generalizes well to specifying affected parts of complicated git commit trees that are not possible to express using a disjunctive normal form with the same semantics.
@pombredanne WDYT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@oliverchang re: WDYT?
I think this is effing brilliant and much better... I cannot find a case that would not fit for vulnerabilities or dependencies. This effectively becomes much simpler to implement in the general case. It also provides a natural "canonical" sort order, which is simply that of the versions. It may make the range conversion code of some range notation more complex in some complex edge cases: complex code for complex cases is perfectly OK in my book.
There is only one corner case where I wonder if adding a !=
(not equal operator) could help:
- I can express this as an interval by adding a gap in an interval ... eg., if in
>=2 | <6
I want to exclude version5
, then I can rewrite this>=2|<6
as>=2 | <5 | >=5.1 | <6
. - This may demand to know what is the "smallest" next version after
5
and may not be strictly equivalent to!=5
? - I reckon this is not a use case that could happen for vulnerabilities and bugs, yet is somewhat common for dependencies as in:
use any version 2 or later, up to and excluding version 6, except for version 5 that I know is buggy
. - I am split as
>=2 | !=5.1 | <6
may feel a bit simpler ... yet these case may not be exceptional enough to warrant their own extra operator... e.g. less is more, and>=
and<
, or=
/no operators may be enough?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@pombredanne I'm very glad you like this!
It may make the range conversion code of some range notation more complex in some complex edge cases: complex code for complex cases is perfectly OK in my book.
Cannot agree more :)
There is only one corner case where I wonder if adding a != (not equal operator) could help:
I agree that less is more! It's a corner case that can still be expressed using >=
, <
and =
alone. I would expect most ranges to represent different release branches (e.g. >= 2 | < 2.2 | >= 3 | < 3.0.3 | >= 6 | < 6.1
), where the gaps would be naturally encoded by omission.
It's also easier to add operators later if it turns out != is really needed in a lot of cases, rather than removing it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's also easier to add operators later if it turns out != is really needed in a lot of cases, rather than removing it.
That's an excellent point.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@oliverchang Your feedback is mucho welcomed on the latest push. Thank you ++ for this.
@joshbressers wrote:
Thank you for the kind encouragements!
Exactly. Though I came to appreciate that "semver" may be more like an unreachble dream than a reality. For instance, Ruby's semver is not semver. Composer's semver is not semver and am I am not even sure that node-semver is semver strictly either. Also semver has not notation for ranges. This spec could help there.
Good point: I will add a bunch of examples! |
@jhutchings1 re:
Ideally, I'd want to recommend the proposed unified "vers" notation together with a strict semver version syntax |
Is there a world where we would want to require or recommend that a lowest common denominator version schema is used all the time: maybe here? |
|
||
- "=" is implied when used to enumerate vulnerable versions | ||
- ">=" (greater or equal) is for the version that introduces a vulnerability | ||
- "<" (lesser) is for the version that fixes a vulnerability |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note: OSV now supports the equivalent of a "<=", and this brings it to parity with CVE 5.0 (which also does not support >).
We still have strong objections against >, because it can lead to very misleading ranges (see ossf/osv-schema#31 (comment)). GitHub also needed this initially, but they were able to remove this completely from their DB and no longer need this: github/advisory-database#19 (comment)
- **generic**: a generic version comparison algorithm (which will be specified | ||
later, likely based on a split on any wholly alpha or wholly numeric segments | ||
and dealing with digit and string comparisons, like is done in libversion) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suggest to split the generic one into 3 types:
datetime
: The value is interpreted as timestamp according to RFC3339 and ISO 8601 (for simplicity sake just allow YYYY-MM-DDThh:mm:ss[.ffffff[Z|[+-]hh:mm]]). This is especially useful if you want to communicate something that resides in the cloud (like SAAS) where it is hard to determine the version of the running software or to communicate production dates.string
: A simple string comparison following the "usual" string sorting rules. (This seems to be currently not really cover - as soon as the string contains a digit it would be separated.intdot
: Split at.
into integers, ignore strings. Compare those by groups. That would cover 80% of the versions currently around... It includes4.2
as well as8.7.190.182.919
.generic
: Split at a set of delimiters (e.g..,;:#+-
). Compare those by groups but interpret well-known strings likealpha
,beta
as modifier to the group before.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@pombredanne What needs to done to get these things integrated?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As discussed - we should remove generic
and introduce a unknown
. Reasoning: generic
sounds like 'a magic default that fixes everything'. However, that is hard for implementers ('What do I need to implement exactly?') as well as consumers of tools ('What exactly was implemented? Why do I get different results with different implementations?'). The unknown
clearly states that the version semantics are not known - and therefore, it can be computed by machines. Users SHOULD avoid to use it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also: Here are the issues:
intdot
: addintdot
as version scheme #166datetime
: adddatetime
as version scheme #263string
: addstring
as version scheme #265libversion
: addlibversion
as version scheme #266unknown
: as described in Add mostly universal version range spec draft #139 (comment)
@pombredanne 👋🏻 Just wanted to check: how close are we to merging this change? |
@pombredanne Just checking in, are you closer to accepting this yet? |
Are there any blockers for getting this merged? Conversely, how can we help moving it forward? |
@jhutchings1 @nscuro I will tackle this next week. |
Is this one still planned? |
and dealing with digit and string comparisons, like is done in libversion) | ||
|
||
|
||
TODO: add Rust, composer and archlinux, nginx, tomcat, apache. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also add semver
with the rules from https://semver.org/ => see #264
|
||
These are a few known versioning schemes for some common Package URL | ||
`types` (aka. ``ecosystem``). | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also mention here the true
and false
cases: #267
- Simplify the list of constraints. | ||
|
||
|
||
Version constraints simplification |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should only allow the canonical form - simplification and transformation introduce additional places for mistakes...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reading these "simplification rules", they are quite complicated. Hence I expect this complexity to be detrimental to building a well working Purl-ecosystem with many interoperable implementations.
Why is requiring implementations to output a strictly conformant list of constraints and requiring importers to error on non-conformant constraint lists deemed worse?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From experience with PURL, implementations will get this wrong (or just skip it). Erroring on imperfect constraints instead of just accepting or fixing them would be one of the common sources of interoperability failures.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But these "imperfect constraints" are absolutely ambiguous, hence these rules are just a wild guess what could be meant.
E.g.:
!=1
and=1
is valid, and to be interpreted as?
*
would be logical, but this is not denoted.<1
and>1
is valid, and to be interpreted as?
!=1
would be logical, but this is not denoted.<=1
and!=1
is valid, and to be interpreted as?
I have no idea, because they are contradicting each other.- There are other pairs which are contradicting, e.g.
!=1
and>=1
.
The issue with the rules for "redundant, invalid constraints" is well stated at the beginning of that section:
- =, < or <= followed by < or <=: this is the same as < or <=
- > or >= followed by =, > or >=: this is the same as > or >=
The final or
in each line nicely expresses "nobody can know", still the rest of this section tries to interpret them in an unambiguous fashion!?!
If I read the algorithm for discarding "redundant, invalid constraints" correctly, it simply states "for the same version number the first occurring constraint has precedence".
Why not the last one (no, this question is not meant seriously)?
Why not leaving it at "[these] redundant constrains are invalid:" (serious question)?
- If a ``version`` in a ``<version-constraint>`` contains separator or | ||
comparator characters (i.e. ``><=!*|``), it must be quoted using the URL | ||
quoting rules. This should be rare in practice. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be more explicit about using percent encoding to escape a specific set of characters that are ambiguous. From experience with PURL:
- There are two different URL specs which specify slightly different sets of characters that must be encoded.
- Many readers don't understand that URLs encode different sets of characters in different parts of the URL and that percent encoding is not the same as x-www-form-urlencoding, which has special rules about spaces and plus signs.
- Split the specifier from left once on a slash "/". | ||
|
||
- The left hand side is the <versioning-scheme> that must be lowercase. | ||
Tools should validate that the <versioning-scheme> is a known scheme. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe it should be "Tools MAY validate". Not every use case requires the same level of understanding. If you have a list of different ranges in different contexts, you may want to parse ranges you don't understand and be able to see the scheme and serialize the range back out (for example if it's part of a list containing supported and unsupported schemes and you want to preserve the unsupported ranges). It seems likely there will be libraries that have partial support, either by implementing only one of the use cases described in this spec, or by not supporting certain version expressions, in which case it seems like an easy extension to have a library that supports nothing but basic introspection and reserialization, which can be supported for all schemes.
Add Java implementation reference by @nscuro Reference: https://github.com/nscuro/versatile Signed-off-by: Philippe Ombredanne <[email protected]> Co-authored-by: Niklas <[email protected]>
Signed-off-by: Philippe Ombredanne <[email protected]>
Use correct scheme Signed-off-by: Philippe Ombredanne <[email protected]>
All at last I am merging this. There will be several follow up and refinements. Many are tracked in #328 ... getting smaller focused changes in smaller PRs will help keep things in check! |
|
||
- Remove the comparator from ``<version-constraint>`` string start. The | ||
remaining string is the version. | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So >>1.2.3
will be split into >
as comparator and >1.2.3
as version, and <>1.2.3
into <
and >1.2.3
, correct?
IIRC, version strings must start with a digit, so why not splitting a <version-constraint>
at the first digit which occurs?
Also, please consider translating comparators which are used in other notations by expanding the fixed list by:
- If it starts with "==", then the comparator is "=".
- If it starts with "<>", then the comparator is "!=".
- If it starts with ">>", then the comparator is ">".
- If it starts with "<<", then the comparator is "<".
IMO, all other comparators (after splitting a <version-constraint>
at the first digit) should be specified as invalid and result in an error message.
Originally at aboutcode-org/univers#11
This is an work in progress for "vers" a new mostly universal version ranges specification to use as a companion to purl.
This is a possible solution to these issues and PRs:
Signed-off-by: Philippe Ombredanne [email protected]