This repository has been archived by the owner on Jul 9, 2024. It is now read-only.

Treat UUID like a black box #28

Closed
sergeyprokhorenko opened this issue Aug 14, 2021 · 17 comments · Fixed by #58
Labels
Draft 03 IETF Draft 03 Work

Comments

@sergeyprokhorenko

Treat UUID like a black box:

  • Disallow extracting timestamp or any other data from UUID
  • Remove ver, var and node from the RFC (like in UUIDv4)
  • But allow database shard on left part of UUID of any length
@nerg4l

nerg4l commented Aug 14, 2021

  • Disallow extracting timestamp or any other data from UUID

The goal of these new UUID versions is to provide lexicographically sortable Universally Unique Identifiers. To prevent information from being extracted from a UUID, it would have to be encrypted or hashed, and then you could no longer sort UUIDs lexicographically by creation date. Otherwise, any representation of a UUID is readable.
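To illustrate how readable an unencrypted, time-based UUID is, here is a minimal sketch that recovers the creation time from a version 1 UUID using only Python's standard `uuid` module. The helper name `v1_timestamp` is made up for this example, and the sample value is the UUIDv1 quoted later in this thread. The new draft versions use different layouts, so this shows the general point rather than the draft's decoding rules.

```python
import uuid
from datetime import datetime, timedelta, timezone

# UUIDv1 timestamps count 100-nanosecond intervals since 1582-10-15 (RFC 4122).
GREGORIAN_EPOCH = datetime(1582, 10, 15, tzinfo=timezone.utc)


def v1_timestamp(u: uuid.UUID) -> datetime:
    """Return the creation time embedded in a version 1 UUID (hypothetical helper)."""
    if u.version != 1:
        raise ValueError("only version 1 UUIDs are handled in this sketch")
    # u.time is the 60-bit timestamp reassembled from the three time fields.
    return GREGORIAN_EPOCH + timedelta(microseconds=u.time // 10)


# The UUIDv1 value quoted later in this thread.
example = uuid.UUID("449c7bd6-00ca-11ec-9a03-0242ac130003")
created = v1_timestamp(example)
print(created.isoformat())
```

Nothing here needs a secret or a special library, which is why a ban on extraction can only be a rule in the specification text, not a technical guarantee.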

  • Remove ver, var and node from the RFC (like in UUIDv4)
  • The recommended node value is a 48 bit random or pseudo-random number just as it is for UUIDv4.
    From Section 4.3.3

    UUIDv6 node bits SHOULD be set to a 48 bit random or pseudo-random
    number. UUIDv6 nodes SHOULD NOT utilize an IEEE 802 MAC address or
    the [RFC4122], Section 4.5 method of generating a random multicast
    IEEE 802 MAC address.

  • ver and var are quite important for compatibility and being able to distinguish between versions.

  • But allow database shard on left part of UUID of any length

I'm not sure I quite understand what you mean by this. Do you mean 589e3e93-6f3b-4008-aef8-da09f7fa2fb2 should be padded left like shard1#589e3e93-6f3b-4008-aef8-da09f7fa2fb2 or something else?

@sergeyprokhorenko
Author

sergeyprokhorenko commented Aug 14, 2021

  • Disallow extracting timestamp or any other data from UUID

The goal of these new UUID versions is to provide lexicographically sortable Universally Unique Identifiers. To prevent information from being extracted from a UUID, it would have to be encrypted or hashed, and then you could no longer sort UUIDs lexicographically by creation date. Otherwise, any representation of a UUID is readable.

I did not mean encryption, hashing, or any other technical measure. I only meant a ban on data extraction in the text of the RFC.

  • Remove ver, var and node from the RFC (like in UUIDv4)
  • The recommended node value is a 48 bit random or pseudo-random number just as it is for UUIDv4.
    From Section 4.3.3

    UUIDv6 node bits SHOULD be set to a 48 bit random or pseudo-random
    number. UUIDv6 nodes SHOULD NOT utilize an IEEE 802 MAC address or
    the [RFC4122], Section 4.5 method of generating a random multicast
    IEEE 802 MAC address.

OK. But it is better to replace the unclear word node with randomness.

  • ver and var are quite important for compatibility and being able to distinguish between versions.

UUIDs of any version are actually fully compatible with each other, because they have the same length, are globally unique, and are indexable in a DB.

There is actually no need to distinguish between versions, because a UUIDv7 must be treated as a black box as a whole, and parsing of UUIDv7 must be banned.

  • But allow database shard on left part of UUID of any length

I'm not sure I quite understand what you mean by this. Do you mean 589e3e93-6f3b-4008-aef8-da09f7fa2fb2 should be padded left like shard1#589e3e93-6f3b-4008-aef8-da09f7fa2fb2 or something else?

No. I mean that values '589e3', '589e4', '589e5' etc. (as well as '589e', '589f', '589g' etc.) can be used as record selection criteria for database shards.
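A minimal sketch of what prefix-based shard selection could look like, assuming range partitioning on the lowercase hex text of the UUID. The split points and the function name `shard_for` are hypothetical, chosen only for illustration. Only the digits 0-9 and a-f occur in the hex form, and the code compares text without parsing any UUID field, so the identifier stays a black box.

```python
import bisect
import uuid

# Hypothetical split points: shard i holds every key that sorts below
# SPLIT_POINTS[i]; the last shard holds everything from "c" upward.
SPLIT_POINTS = ["4", "8", "c"]  # four shards in total


def shard_for(u: uuid.UUID, split_points=SPLIT_POINTS) -> int:
    """Route a UUID to a shard by comparing its hex text with the split points."""
    # u.hex is 32 lowercase hex digits, so string order matches byte order.
    return bisect.bisect_right(split_points, u.hex)


# The UUID used as an example earlier in this thread.
key = uuid.UUID("589e3e93-6f3b-4008-aef8-da09f7fa2fb2")
print(shard_for(key))
```

Split points of any length work the same way, for example `["589e3", "589e4", "589e5"]`, because the comparison is plain string ordering on the left part of the value.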

@fabiolimace

fabiolimace commented Aug 14, 2021

@sergeyprokhorenko

Remove ver, var and node from the RFC (like in UUIDv4)

But it is better to replace the unclear word node with randomness.

I totally agree that we could drop the word 'node' and use something else like 'random', but version and variant are mandatory for RFC-4122. These fields are important to separate UUID types into different namespaces (or keyspaces?) so they never collide.

UUIDs of any version are actually fully compatible with each other.

Two RFC-4122 UUIDs of different versions are binary compatible. But I don't agree that non-RFC-4122 UUIDs are compatible with RFC-4122 UUIDs, even though they have the same bit length.

No. I mean that values '589e3', '589e4', '589e5' etc. (as well as '589e', '589f', '589g' etc.) can be used as record selection criteria for database shards.

Do you mean Tomas Vondra's Sequential UUIDs? I think it's an excellent candidate for a UUID version (UUIDv9 maybe?). In this type of UUID, it's not possible to extract the creation time because the time-derived part is so short (16 bits). So it works like an opaque or black box UUID.

Github Repo: https://github.com/tvondra/sequential-uuids
Here are some SQL implementations: tvondra/sequential-uuids#2
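A rough sketch of the general idea behind such sequential UUIDs, assuming a 16-bit prefix derived from the clock that wraps around, followed by random bytes. This is not a port of the linked extension: the function name, the 60-second interval, and the exact layout are assumptions made for illustration.

```python
import os
import time
import uuid

PREFIX_BITS = 16          # assumed prefix size, matching the 16 bits mentioned above
INTERVAL_SECONDS = 60     # assumed length of one time block


def time_prefixed_uuid(now=None) -> uuid.UUID:
    """Build a 128-bit value: 2-byte wrapping time prefix + 14 random bytes."""
    if now is None:
        now = time.time()
    # The block counter wraps after 2**16 intervals, so the absolute
    # creation time cannot be recovered from the prefix alone.
    prefix = (int(now) // INTERVAL_SECONDS) % (1 << PREFIX_BITS)
    return uuid.UUID(bytes=prefix.to_bytes(2, "big") + os.urandom(14))


print(time_prefixed_uuid())
```

Values generated within the same interval share a prefix and therefore land close together in an index, while the wraparound keeps the prefix from acting as a full timestamp. Note that this sketch does not set the version or variant bits.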

@sergeyprokhorenko
Author

@sergeyprokhorenko

Remove ver, var and node from the RFC (like in UUIDv4)

But it is better to replace the unclear word node with randomness.

I totally agree that we could drop the word 'node' and use something else like 'random', but version and variant are mandatory for RFC-4122. These fields are important to separate UUID types into different namespaces (or keyspaces?) so they never collide.

Ver and var are not mandatory for RFC-4122. Just look at specification of UUIDv4.

UUIDs will never collide thanks to random parts regardless of ver or var.

No. I mean that values '589e3', '589e4', '589e5' etc. (as well as '589e', '589f', '589g' etc.) can be used as record selection criteria for database shards.

Do you mean Tomas Vondra's Sequential UUIDs? I think it's an excellent candidate for a UUID version (UUIDv9 maybe?). In this type of UUID, it's not possible to extract the creation time because the time-derived part is so short (16 bits). So it works like an opaque or black box UUID.

Github Repo: https://github.com/tvondra/sequential-uuids
Here are some SQL implementations: tvondra/sequential-uuids#2

No. I only mean that the left part of a UUID, of arbitrary length, can be used for grouping records and for populating database shards.

@fabiolimace

fabiolimace commented Aug 14, 2021

Ver and var are not mandatory for RFC-4122. Just look at specification of UUIDv4.

I don't know if 'mandatory' is the correct word (my English is poor). But that's what I understand from section 4.4. of RFC-4122.

4.4.  Algorithms for Creating a UUID from Truly Random or Pseudo-Random Numbers
      
The algorithm is as follows:

   o  Set the two most significant bits (bits 6 and 7) of the
      clock_seq_hi_and_reserved to zero and one, respectively.

   o  Set the four most significant bits (bits 12 through 15) of the
      time_hi_and_version field to the 4-bit version number from
      Section 4.1.3.
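As a sketch of the two steps quoted above, the following builds a version 4 UUID by hand from 16 random bytes. The function name `random_uuid_v4` is made up for this example; in practice `uuid.uuid4()` from the standard library already does the same job.

```python
import os
import uuid


def random_uuid_v4() -> uuid.UUID:
    """Apply the RFC 4122 section 4.4 bit settings to 16 random bytes."""
    b = bytearray(os.urandom(16))
    # Octet 6 is the high byte of time_hi_and_version:
    # its top four bits carry the version number (4 = 0b0100).
    b[6] = (b[6] & 0x0F) | 0x40
    # Octet 8 is clock_seq_hi_and_reserved:
    # its top two bits are set to 1 and 0 (the RFC 4122 variant).
    b[8] = (b[8] & 0x3F) | 0x80
    return uuid.UUID(bytes=bytes(b))


u = random_uuid_v4()
print(u, u.version, u.variant)
```

The remaining 122 bits stay random, which is why a UUIDv4 looks like a black box even though it carries version and variant fields.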

UUIDs will never collide thanks to random parts regardless of ver or var.

Sorry. I mean UUIDs with different versions never collide, i.e., a UUIDv1 doesn't collide with a UUIDv4. They are in different 'spaces'.

No. I only mean that the left part of a UUID, of arbitrary length, can be used for grouping records and for populating database shards.

Maybe it can be a use case for UUIDv8 (the catch all version).

@sergeyprokhorenko
Author

You are correct. ver and var are used in UUIDv4. Nevertheless, nothing prevents us from amending RFC-4122 to eliminate the outdated requirement of ver and var for new versions of UUID.

Database sharding on the left part of a UUID is possible for any version of UUID, because any version of UUID is sortable.

@edo1

edo1 commented Aug 14, 2021

Disallow extracting timestamp or any other data from UUID

Entirely agree. The UUID should be used as a unique identifier and not as a timestamp (MAC address, etc) storage.

IMO clause "4.4.4.2. UUIDv7 Decoding" should be changed to something like this:

Do not rely on UUID internals.

@edo1

edo1 commented Aug 14, 2021

But allow database shard on left part of UUID of any length

Yes. UUIDv7 should be treated as a btree-friendly and sharding-friendly (partitioning-friendly) variant of UUIDv4.

@kyzer-davis
Contributor

But it is better to replace the unclear word node with randomness.

I totally agree that we could drop the word 'node' and use something else like 'random', [...]

The term Node used throughout the draft 01 is directly from RFC 4122, Section 4.1.6. The term was carried over when drafting UUIDv6 and then replicated throughout the document to be consistent both within this document and the previous RFC.

At a glance I don't see any place where Node is not defined properly in draft 01. If there is a spot where we need to add more clarity to the term node please let me know.

  • Each section has a "Node Usage" which defines how the node should be used within that UUID version.
  • Each section features a "layout and bit order" figure plus text which also defines node usage from the figure.
  • "Distributed UUID Generation" and "Security Considerations" also give details about embedding information in the node.

@sergeyprokhorenko
Author

sergeyprokhorenko commented Aug 16, 2021

At a glance I don't see any place where Node is not defined properly in draft 01

Please note that people take terms as is, without reading in depth. So the terms should be used in their natural meaning. I've never heard that node means random.

@broofa
Contributor

broofa commented Aug 18, 2021

Nevertheless, nothing prevents us from amending RFC-4122 to eliminate the outdated requirement of ver and var for new versions of UUID.

Everything prevents this!

Redefine how the version bits are used and you break compatibility with existing RFC4122 UUID versions. Redefine how the variant bits are used and you break compatibility with all other UUIDs. I.e. It's not possible to eliminate or ignore the current semantics of these fields without breaking the guarantee that UUIDs of different versions and variants won't collide.

@sergeyprokhorenko
Author

No UUIDs will collide thanks to random parts (node) of UUIDs.

@broofa
Contributor

broofa commented Aug 19, 2021

No UUIDs will collide thanks to random parts (node) of UUIDs

This doesn’t make sense. Take any UUID of any version, change the version (to a valid version #), and you still have a valid UUID. Any randomly set bits will have a non-zero chance of colliding with those same bits in a different-version uuid. Thus, version is essential to guaranteeing cross-version collisions don’t occur.

… but maybe I misunderstand your point. Can you provide a concrete example?
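To make the version argument concrete, here is a small sketch that replaces only the 4-bit version field of a UUID and leaves every other bit alone. The helper `with_version` is hypothetical, and the sample value is the UUIDv1 quoted elsewhere in this thread.

```python
import uuid

# The version nibble sits at bits 76-79 of the 128-bit integer form.
VERSION_MASK = 0xF << 76


def with_version(u: uuid.UUID, version: int) -> uuid.UUID:
    """Return a copy of u with only the 4-bit version field replaced."""
    cleared = u.int & ~VERSION_MASK
    return uuid.UUID(int=cleared | (version << 76))


v1 = uuid.UUID("449c7bd6-00ca-11ec-9a03-0242ac130003")
as_v4 = with_version(v1, 4)
print(v1, v1.version)
print(as_v4, as_v4.version)
```

The result is still a well-formed UUID, yet it can never equal the original, because the two values differ in the version field. That fixed difference is what keeps each version in its own space no matter what the other 122 bits contain.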

@nerg4l

nerg4l commented Aug 19, 2021

No UUIDs will collide thanks to random parts (node) of UUIDs.

That's not true. Randomness does not guarantee uniqueness; it only decreases the probability of collision.

Microsoft used to create UUIDv1 values when System.Guid.NewGuid() was called and later moved to UUIDv4. Thanks to the version and variant bits, it is guaranteed that earlier UUIDv1 values won't collide with UUIDv4 values, and old and new IDs can be told apart. If they decide to use UUIDvX in the future, the ver and var bits will again guarantee that there are no collisions between versions. Each version is generated differently, so without ver and var there is a greater chance of collision, because a new value might coincide with an existing UUIDv1 (with node ID).

In short, if you generate a v1 449c7bd6-00ca-11ec-9a03-0242ac130003 and create MyUUID which does not contain ver and var, then you risk colliding with the UUIDv1 you previously generated.
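The point that randomness lowers the collision probability without eliminating it can be put into numbers with the standard birthday approximation, p ≈ 1 − exp(−n(n−1) / 2N), where N is the number of possible values and n the number of identifiers generated. The sketch below assumes uniformly random bits; the function name is made up, and 122 is the number of random bits in a UUIDv4.

```python
import math


def collision_probability(count: int, random_bits: int) -> float:
    """Approximate chance that at least two of `count` random IDs collide."""
    space = 2 ** random_bits
    exponent = -count * (count - 1) / (2 * space)
    # expm1 keeps precision when the probability is extremely small.
    return -math.expm1(exponent)


# One billion identifiers with 122 random bits each (as in UUIDv4).
print(collision_probability(10**9, 122))
```

The result is tiny but not zero, which is the distinction being drawn here: version and variant bits rule out cross-version collisions by construction, while random bits only make collisions unlikely.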

@sergeyprokhorenko
Author

It doesn’t make sense to demand a lower probability of collision between versions than between UUIDs of the same version. By the way, a 160-bit UUID will never collide with a 128-bit UUID, regardless of var or ver.

@nerg4l

nerg4l commented Aug 19, 2021

Maybe I'm wrong, but as far as I know, this project wants to extend RFC4122, not redefine it. If you think you could create a better definition and finalise it as an RFC to become a standard, then you should create a different draft that is not tied to RFC4122.

@sergeyprokhorenko
Author

As I see it, this project attempts to improve the ugly RFC-4122 and overcome its outdated restrictions. And it's much easier to add amendments than to create a new RFC.

@kyzer-davis kyzer-davis added the Out of Scope Topics not in scope for the RFC/Draft label Jan 31, 2022
@kyzer-davis kyzer-davis added Draft 03 IETF Draft 03 Work and removed Out of Scope Topics not in scope for the RFC/Draft labels Feb 23, 2022
@kyzer-davis kyzer-davis mentioned this issue Feb 23, 2022
@kyzer-davis kyzer-davis linked a pull request Feb 23, 2022 that will close this issue
6 participants