[HN Gopher] New UUID Formats ___________________________________________________________________ New UUID Formats Author : swyx Score : 265 points Date : 2022-06-12 15:17 UTC (7 hours ago) (HTM) web link (www.ietf.org) (TXT) w3m dump (www.ietf.org) | nixpulvis wrote: | Maybe I'm being slow right now, but can somewhat help me | understand why Max UUID is ever specifically useful? | okr wrote: | My guess is, that it is just a constant, ready to be used for | bitwise operations. | nixpulvis wrote: | Yep... I was just being slow. I thought the specialized | variant was a new kind of UUID (being inverse of RFC4122), | but it's just the inverse of RFC4122's Nil UUID; a single | value. | | Sorry for the silly comment. This value is just a bunch of | binary 1s. | lewisl9029 wrote: | I've been using ULID for a while now, which analogous to UUID v7 | but with a different (better IMHO) string representation. They've | been awesome for using as sort keys in dynamo for instance, since | they're lexicographically sortable as strings. | | But one thing I'm still wary about is exposing these IDs with | millisecond-precision time components to end users, since I've | seen multiple discussions here on HN about the potential for | timing attacks. | | How worried should I really be? Do people have useful heuristics | on the kinds of data where it's safe/unsafe to expose timing | information, or should I just only expose a separate UUID v4 | externally across the board just to be safe? | doliveira wrote: | I came up with a scheme in which the random part of the ULID is | a slice of the hash of a UUID, which seems to work fine because | I'm generating it all server side, I guess? It works very well | for insertion into NoSQL databases, for instance. | | So I'd also like to know the threat model for these timing | attacks. | kbumsik wrote: | UUIDv7 looks interesting, but how is it different from ULID [1] | in practice? I was considering using ULID for a upcoming new | project because it is lexicographically sortable but it looks | like UUIDv7 just can replace that. | | [1]: | https://cran.r-project.org/web/packages/ulid/vignettes/intro... | rekwah wrote: | As the author of a popular ULID implementation in python[1], | the spec has no stewardship anymore. The specification repo[2] | has plenty of open issues and no real guidance or communication | beyond language implementation authors discussing corner cases | and the gaps in the spec. The monotonic functionality is | ambiguous (at best), doesn't consider distributed id | generation, and is implemented differently per-language [3]. | | Functionally, UUIDv7 might be the _same_ but the hope would be | for a more rigid specification for interoperability. | | [1]: https://github.com/ahawker/ulid | | [2]: https://github.com/ulid/spec | | [3]: https://github.com/ulid/spec/issues/11 | RobertRoberts wrote: | Thank you, I've been using ULID for a while now, and it | serves my purposes. But I have long term support concerns. | | UUIDv7 really seems like the sweet spot between pure | INT/BIGINT auto incrementing PKs and universally sortable | universal ids. | kortex wrote: | I've bee using ULIDs in python for about a year now and so | far have been super happy with them, so a) thank you for | maintaining this! b) I always felt a bit uneasy about the way | the spec describes the monotonicity component. Personally I | just rely on the random aspect as I am fortunate enough to | say that two events in the same millisecond are effectively | simultaneous. | | At that point, it's basically just UUID7 with Crockford | base32 encoding, more or less. | | IMHO the in-process monotonically increasing feature of ULID | is misguided. As you mention, distributed ids are a pain. The | instant you start talking distributed, monotonic counters or | orderable events (two threads count as distributed in this | case), you need to talk things like Lamport clocks or other | hybrid clock strategies. It's better to reach for the right | tools in this case, vs half-baked monotonic-only-in-this- | process vague guarantee. | canadiantim wrote: | How would one go about trying to use UUID 7 in a Postgres | database / python codebase now? | Starlevel001 wrote: | Postgres UUIDs are treated as an opaque 8-byte array, you can | use anything as long as it's in the textual format. | oittaa wrote: | I made a simple Python test library that extends the standard | UUID class with UUIDv6 and UUIDv7. You might want to check it | out. https://github.com/oittaa/uuid6-python | | The official UUID Draft repository has also some alternatives | if you'd like to check those out. | https://github.com/uuid6/prototypes | | Postgres supports UUIDs with any version number natively so you | can then do something like this with it: create | table data (id uuid, firstname varchar(100)); insert into | data (id, firstname) values | ('017f21cf-d130-7cc3-98c4-dc0c0c07398f', 'John'); select | * from data; | amluto wrote: | Fortunately this is a bit less relevant today as Windows loses | market share in database and server applications, but: | | UUIDs have historically massively screwed up endian handling. | While this new draft discusses sorting UUIDs as strings of octets | (bytes) and the _text_ of RFC4122 is fairly explicit about most | significant bytes coming first, the C UUID structure in RFC 4122 | appendix A is entirely misguided: typedef | struct { unsigned32 time_low; unsigned16 | time_mid; unsigned16 time_hi_and_version; | unsigned8 clock_seq_hi_and_reserved; unsigned8 | clock_seq_low; byte node[6]; } uuid_t; | | Those aren't bytes -- they're integers of various sizes. (Hint: | do _not_ use integer types in C code for portable data | structures. ntohl, etc are a mess. Just use arrays of bytes.) | | I don't know the whole history, but MS somehow took this | structure at face value and caused problems like this: | | https://github.com/uuid-rs/uuid/issues/277 | | So, if you want to do anything (e.g. sorting) that depends on the | representation of a UUID (or even depends on converting between | string and binary representations), be aware that UUIDs coming | from Windows may be little-endian. In my book, this is a Windows | bug, but opinions may differ here. | Someone wrote: | > Hint: do not use integer types in C code for portable data | structures. ntohl, etc are a mess. Just use arrays of bytes. | | I don't see how that helps much. If a developer forgets to call | _ntohl_ on multi-byte integer fields, I don't trust them to | correctly convert said integers to arrays of bytes, either. | kevincox wrote: | If you write it the naive way it works. | uint8_t bytes[4]; uint32_t = bytes[0] << 24 + | bytes[1] << 16 + bytes[2] << 8 + bytes[3]; | | The endianness is whatever you write in the indexing and will | be the same across architectures. | e4m2 wrote: | Unfortunately, the naive way also turns out to be wrong in | C. uint8_t gets promoted to a signed int when shifting, | which in turn causes undefined behavior for specific input. | One way of fixing this is casting to the desired type | before the shift, thus avoiding surprising conversions. | | On a side note, compiler warnings and sanitizers help with | this kind of stuff greatly, use them if you have the | option: https://godbolt.org/z/8oq9GTcze | quotemstr wrote: | Little endian won. I don't see the point of maintaining | theoretical compatibility with big endian systems that are at | best esoteric right now and soon to be extinct. Likewise, every | reasonable platform aligns data fields on natural alignment | these days. It's just a waste of effort to make software | portable to evolutionary dead ends. | wolverine876 wrote: | You're saying it won for this application (UUIDs)? | Universally? | | What is the most common remaining use of big endian? | tech2 wrote: | TCP/IP might be a reasonable example. There's a reason | "network byte order" and "big endian" are the same thing. | wolverine876 wrote: | > TCP/IP might be a reasonable example. | | That's a pretty significant example of widespread usage | that isn't going away soon. Perhaps the GGP was only | referring to UUID applications. | mort96 wrote: | This isn't about compatibility with big-endian machines. This | is about compatibility between different uuid libraries, | potentially on different operating systems, all on little | endian CPU architectures. | ntauthority wrote: | Every existing library follows the standard which specifies | host byte order, which usually means little endian. The | Rust library cited a few levels up this chain ignored that, | somehow assuming big endian, and then had to correct for | this mistake. | [deleted] | [deleted] | firebird84 wrote: | It's actually worse than that. The first 3 groupings | (textually) of the uuid might be little endian while the other | 2 are big endian. Learning this cost me more time than I care | to admit. | | https://en.wikipedia.org/wiki/Universally_unique_identifier#... | cpach wrote: | What the...? | | This fact will haunt me in my dreams :-p | amluto wrote: | This is consistent with the misguided structure in the RFC. | The first three fields (the time fields) are multibyte | integers. The remainder is just bytes. The dashes in the | textual representation are just there to confuse you. | throwaway9870 wrote: | I had to write some EFI/GPT code earlier this year and was | dumbfounded to learn this. This is right up there with the | mork file format. | formerly_proven wrote: | Making mixed-endian _even more haunted_ is quite the | achievement. I congratulate whoever did this at Microsoft for | their lasting contribution. | secondcoming wrote: | I thought Windows uses GUIDs, not UUIDs | takeda wrote: | It looks like GUID is synonymous with UUID, but the name GUID | also implies that it could contain that bug/feature | mentioned[1] | | [1] https://en.wikipedia.org/wiki/Universally_unique_identifi | er#... | JonathonW wrote: | UUIDs (as GUIDs) in Windows predate RFC 4122, so I don't think | it's that unreasonable that they're not compliant with the spec | (since the exact contents of a UUID aren't typically that | important, but consistency in how you produce them _is_ ). | | The GUID implementation in Windows derives from the DCE RPC | specification [1]. That's where the multibyte integers in the | RFC 4122 specification come from (they're stated the same way | in the DCE RPC spec), and it doesn't explicitly specify their | endianness. It does call them "NDR integers", but NDR integers | can be little endian or big endian depending on implementation. | DCE specifies a mechanism by which you'd indicate which you're | using in an RPC call, but that data's not included in the UUID | format-- you get whatever byte order the system's decided to | use, which, for Windows, is little-endian. | | [1] | https://pubs.opengroup.org/onlinepubs/9629399/apdxa.htm#tagc... | blueflow wrote: | This mishap lives forth in the UUID stored in a machines DMI | data, as well in GPT partition tables, which are required when | EFI is used. It would be really cool if we had some replacement | for EFI that would not harbour these kind of painful legacies. | dtech wrote: | On the other hand, this is an easy to implement conversion, | while changing such a fundamental thing from EFI sounds | pretty hard, making it not worth it. | blueflow wrote: | You cannot fix this with an conversion because you do not | know if your UUID is correct or needs conversion. DMI data | for example has inconsistent endian-ness depending on the | vendor. So if you have a UUID sticker on a new server, you | still have two options which UUID the machine will send | during PXE, either the printed UUID in big-endian encoding | or in the microsoft mixed-endian encoding. | | Use BIOS boot instead of EFI, it has less legacy to | implement: PE executables, FAT file system, Win64 ABI | operator-name wrote: | With the introduction of UUIDv8, my fun script to generate vanity | uuids[0] can finally be spec comfortant! | | [0]:https://github.com/operator-name/vanity-uuid | davidjfelix wrote: | I've been staring at the supposedly readable and memorable uuid | for like 4 minutes without any idea what it says. | operator-name wrote: | Yeah, looking back I could have done a bit more than simple | substitution. Since the sentences don't gave meaning, | interpreting between 1 as i vs l is especially difficult. | | 5eedbed5-f05e-b055-ada0-d15ab11171e5 | | seedbeds-fose-boss-adao-disabilities | | "Memorable" was definitely tongue in cheek, as was the spec | bending. | gigatexal wrote: | Uuidv7 looks really interesting. I like that the article mentions | all the other projects that attempted to fix uuid issues for | different use cases. | stevesimmons wrote: | This had appeared a few times before at earlier stages of the | draft process... | | https://news.ycombinator.com/item?id=28088213 [244 comments] | | Personally, I really like UUIDv7, except I transform the UUID to | a 25-character string which has all the same properties except it | doesn't look like a UUID. The last thing I want is my index of | time-sortable UUIDs getting contaminated with with some UUIDv4 | fully random ones. Since UUIDs may be generated and persisted in | a distributed manner, it's a simple way to at least spot this. | ikornaselur wrote: | The UUID version is in the actual ID, so you would be able to | spot the version as it isn't fully random. | | Although you light mean just visually you'd spot the difference | much easier! | theptip wrote: | Can't you just check the "version" bits and reject if it's not | 4/7? Or are you worried about someone generating a completely | random (non-compliant) set of bits that happens to parse as a | v4/7 | duxup wrote: | I'm always amazed how much work goes into these. | | Meanwhile 99% of the time i just call a function when "I just | need something unique here", -calls function-, and it works! | | Thanks everyone! | GiorgioG wrote: | I used to be a big proponent of using UUIDs for database PKs but | I've found them inherently difficult to work with. It's much | easier to remember/recognize an integer based PK when | troubleshooting a data problem. | | This isn't to say you shouldn't use UUIDs at all, but I much | prefer to use an "ExternalId" column of UUID type if you don't | want to expose your integer based PKs externally. | staticassertion wrote: | I feel like `serial` has got to be pretty fast for writes and | gives you some nice properties like sorted ordering based on | insertion time, a smaller key, and a key that can be compressed | far better than a uuid. | | I get the idea of a ULID/UUID7 encoding a timestamp so you get | sorted order, but I wonder at what scale that beats just using | serial. | stormbrew wrote: | At any scale where you have to have multiple writers | generating IDs or need to merge results from multiple | sources, basically. Synchronizing a serial increment across | hosts and across time is a pain. | | Obviously you can composite a timestamp to an incrementing | id, but then the serial part of it is kind of useless for | ordering, so you may as well use a random number and avoid | needing to synchronize at all. And then you've just | reinvented a time ordered uuid (but to be fair, without the | endian compatibility bs mentioned elsewhere). | staticassertion wrote: | I'm wondering what that performance actually looks like | though. The closest benchmark I have is a single postgres | instance with a uuid pkey and a bigint column, | transactionally incrementing the integer in the column. | | I mean I guess mathing it out, updating an atomic integer | is ~2-100ns, depending on contention. If you need to | coordinate the writes you have anywhere from ~250ms-10ms. | | We can basically throw away the increment at that point | since the network is hundreds/thousands of times slower. | | So at 10ms, that's 100 op/s. At 250ms more like 40kops/s. | | That ignores the fact that your db can perform those writes | concurrently and then batch the writes off to the other | database, so long as it doesn't pretend that they're all | committed at once. psql HOT updates would presumably be a | thing here idk. | | If I had to guess, I'd lean towards the "40kop/s" being | closer than the "100op/s" but idk! I wish we had benchmarks | but I can't find anything :\ | tpetry wrote: | That's in my experience also the best approach. I wrote an | article a few days ago about the exact thing: An integer auto | incrementing PK with an UUID you use externally: | | https://sqlfordevs.io/uuid-prevent-enumeration-attack | tylerscott wrote: | Just seconding this as a sane way to use UUIDs IME. Basically | the sequential integer PK is the "internal ID". IIRC in | SQLite regardless of type or existence of PK there is a | private sequential integer. Super handy pattern to use. | hn_throwaway_99 wrote: | That is how I used to do (and currently still do) DB design, | but honestly I think if DBs start supporting UUID v7s well | that I would use that as the sole primary DB key as well as | the external ID: | | 1. They are still sorted in increasing timestamp order (at | millisecond granularity), so they should have good DB index | characteristics. | | 2. At the same time, they contain 62 bits of randomness, | would should pretty much eliminate IDOR attacks if there is a | bug elsewhere that isn't doing proper access checks. Not good | enough for secure tokens, but just good defense against | access permission check bugs. | | That is, you should basically get the best of both worlds: | ordered keys with enough randomness to make ID-increment | attacks infeasible. | RobertRoberts wrote: | Yep, I want UUID v7, because right now I am using ULID and | it's fantastic, but I'd like more official and wide support | as well. | jeremyjh wrote: | I like the textual representation of ULIDs as well | though; I wish they'd just adopted ULID as v7. At least | it is binary compatible with existing UUID types. | RobertRoberts wrote: | Is there not an equivalent text representation for | UUIDv7? | jeremyjh wrote: | Sure you could encode it in Crockford's base-32 but if it | isn't part of the standard then tools won't implement it | natively, so you couldn't copy a key from a url and look | it up in postgres without running it through a conversion | function, for example. | kortex wrote: | > It's much easier to remember/recognize an integer based PK | when troubleshooting a data problem. | | How often are you actually relying on memory of an ID to | troubleshoot a problem? I mean sure, if you are scanning | visually, it's good to recognize the same ID over and over | again, but my ability to do so caps around 4-6 characters. So I | just look at the last 4 chars regardless when fast scanning. | | I use copy-paste for any time I need to transport IDs between | contexts (that isn't just scripted, which is best). Having a | copy-paste stack (Alfred, Raycast and others have this feature) | is a huge game changer here. | marcos100 wrote: | I can't see a case where an UUID PK is better than an INT (or | BIGINT). Why would you do that? | __s wrote: | If the id appears in a url, you may not want people to guess | ids. The information leak exists even if you do | authentication: maybe you don't want someone to be able to | guess how many records there are, or how quickly records are | being generated | tylerscott wrote: | If for no other reason than simplified debugging I find there | is value. Maybe I'm old but if you have more than one UUID | involved in the debugging I'm more likely to trip up than | just integers. | lkrubner wrote: | There were two problems. | | One problem was the style that started around 2004, and was | very popular with Ruby on Rails and WordPress, and then | Syfmony and Django, where you expose the PK in the URL. If | your integer starts with 1 and then increments, you may not | get to a billion, and you'll never get to a trillion. So it | became ridiculously easy to for outsiders to scan your site: | | http://www.example.com/1 | | http://www.example.com/2 | | http://www.example.com/3 | | ... | | http://www.example.com/10000000000 | | That was one problem. Using UUIDs for PKs means outsiders | can't simply scan your site. | | The other problem was that over the years, everyone ran into | the problem of moving a database, or needing to combine | multiple databases, in which case having PKs the start with 1 | and then increment, a collision of the PKs, from different | databases, is 100% guaranteed. This often happens when | combining WordPress sites, for instance. If you use UUIDs as | your PK, then such collisions become unlikely. | madsbuch wrote: | Some applications have the need to create IDs in a | distributed manner, eg. Og clients, and use that identity | _before_ the database returns it. These systems benefit from | randomly generated IDs. | | You could potentially hist use a random number between 0 and | 2^128-1 and still use ints, though I haven't seen that in | action, usually pk with ints are centrally generated and | consequitive. | chrismorgan wrote: | I've been working on a robust scheme for encrypted sequential | IDs, which is done, including library implementations in Rust, | JavaScript and Python, pending just a smidgeon more writing | about it and reviewing a decision on naming. You store an | integer in the database, then encrypt it with a real block | cipher, and stringify with Base58. I have three modes: one for | 32-bit IDs, using Speck32/64 and producing 4-6 character IDs; | one for 64-bit IDs, using Speck64/128 and producing 8-11 | character IDs; and one hybrid, using the 32-bit mode for IDs | below 232 and the 64-bit mode above that, providing both a | forwards-compatibility measure and a way of producing short IDs | as long as possible. Contact me (see my profile) if you're | interested, or I'll probably publish it in another day or two. | Trouble is that I've been getting distracted with other related | concepts, like optimally-short encoding by using encryption | domains [0, 581), [581, 582), ..., [5810, 264) (this is format- | preserving encryption; the main reputable and practical choices | I've found are Hasty Pudding, which I've just about finished | implementing but would like test vectors for but they're on a | dead FTP site, and NIST's FF1 and FF3, which are patent- | encumbered), and ways of avoiding undesirable patterns (curse | words and such) by skipping integers from the database's ID | sequence if they encode to what you don't want, and check | characters with the Damm algorithm. If I didn't keep getting | distracted with these things, I'd have published a couple of | weeks ago. | | (I am not aware of any open-source library embodying a scheme | like what I propose--all that I've found have either reduced | scope or badly broken encryption; https://github.com/yi- | jiayu/presents encrypts soundly, but doesn't stringify; Hashids | is broken almost beyond belief and should not be considered | encryption; Optimus uses an extremely weak encryption.) | | UUIDs are crazy overkill in any situation where you can have | centralised ID allocation. Fully decentralised? Sure, 128 bits | of randomness or mixed clock and randomness or similar, knock | yourself out. But got a master database? Nah, you're just | generating unreasonably long values that take up unnecessary | space and make for messy URLs and such. | mappu wrote: | I've done something similar to obfuscate private DB IDs in a | large existing application - just ensure they're all | Skip32-encoded in all query parameters with an app-wide | secret. | | It works well but you have to be very disciplined to catch | every case individually. Using GUID PKs from the start just | removes this entire category of problem. | cameronh90 wrote: | UUIDs have a few distinct advantages: you'll never run out, you | don't need a roundtrip to find out what they are after saving | them, they often make a good partitioning key and it makes | things easier if you ever need to combine multiple data sources | together in migration and recovery type scenarios. I also quite | like how they're unique across all data sources and tables, so | if you just encounter a random contextless UUID in the wild, | for example in a support ticket, you can probably still find | what it refers to. | | They are quite unwieldy though. There are a few compact | representations you can use in URLs which make it a bit less | ugly, but they can make your database and logs quite bloated, | in particular if you've got a large number of small records. | CharlesW wrote: | > _There are a few compact representations you can use in | URLs which make it a bit less ugly..._ | | Any thoughts on where to find best-practices guidance? I need | to create an external ID scheme for several million items. | hashids (hashids.org) seems interesting, but I have anxiety | about choosing a solution with weaknesses that I can't | identify given my current level of experience in regards to | this. | munawwar wrote: | I took a shot at the math behind this at | https://www.codepasta.com/databases/2020/09/10/shorter- | uniqu... | | Using the equation listed in the article I couldn't | generate a collision so far. Yet, I still check (in code) | for id collision, and pick new id, just to be 100% sure. | deepsun wrote: | The best strategy is to have integer primary keys for internal | purposes, and some form of uuid for external (think | permalinks). | zbuf wrote: | "Unique IDs" _can_ be super really easy to work with if they're | not so baffling complicated. | | A random string generated using quality randomness can be | adjusted to length to suit the quantity of data (negligible | probability of a collision) which in most cases is very short. | | It's easy to increase the length as you get more data. | | They are visually very different for each item of data. | | They're evenly spread which means they hash/index well. | | You can tune a subset of characters if you want to decrease | ambiguity eg. when exchanged by voice (no zero vs. letter O, | upper/lower case etc.) | | And a final bonus, when working with user input only a a short | prefix is needed to uniquely identify an item (in contrast, it | seems like UUIDs deliberately share a common prefix) | | I'm very happy to concede I must be missing something here, and | would be interested to know. But the above approach has served | me well in a range of uses. | | I can see how UUIDs work, and perhaps "looks like a UUID" is a | useful feature. But reading the URL above and a bit of | Wikipedia doesn't give me much to go on as to _why_ any of this | is happening, and why the hyphens aim to retain meaning to what | is ostensibly a 'unique' number. | jxcole wrote: | We used almost this exact scheme for app id indices and the | curious problem we had to design against was inadvertent | profanity. At some point we decided to just never use vowels | to avoid ever having a complaint about 12f*ck if in the URL | jasonwatkinspdx wrote: | Another approach is to use something like EFF's dice words | lists. One of the smaller lists in particular is | interesting as it's 6^4 words, filtered for profanity, and | where all words have both a unique 3 letter prefix and an | edit distance of 3. That makes them robust for the use case | of someone reading out the phrase to someone typing or | such. | | Never using vowells is a smart idea I wish I'd used in the | past. Previously when I've needed something like this I've | used other dictionary lists vs EFF's, and those were not | curated sufficiently to avoid some really unfortunate | combinations. | manigandham wrote: | Use integer IDs and a library like Hashids for friendly | alphanumeric representations: https://hashids.org/ | | This particular implementation is available in dozens of | languages. | jasonwatkinspdx wrote: | Using purely random ids in your database destroys locality. | They mention this in the introduction: | | > Non-time-ordered UUID versions such as UUIDv4 have poor | database index locality. Meaning new values created in | succession are not close to each other in the index and thus | require inserts to be performed at random locations. The | negative performance effects of which on common structures | used for this (B-tree and its variants) can be dramatic. | | The V7 ids work similarly to what you like, as they're just a | unix timestamp and 74 bits of pseudorandom data (they present | several different schemes you could use to generate this | randomness, but the basic birthday bound says we'd need to be | above 100 billion id's generated in a single millisecond to | worry about collisons. Obviously most systems are nowhere | near that territory. | | So using these id's gives you the practical advantages of | random uinique ids, but with the performance of autoincrement | ids. | zbuf wrote: | Thanks for drawing my attention to that, it's the useful | answer I was looking for; my use cases haven't been bound | by write performance in this manner. However, I'd still be | considering carefully before making use of these UUID | schemes. | jasonwatkinspdx wrote: | If you want something like this, and need a "I just want | it to work, require no central coordination, and to have | vanishingly small probability of collision" then just use | the same concepts but wider than the 128 bit footprint | limit of this scheme. This limit makes sense for the RFC | in the post, as there backwards compatibility is an | explicit goal. But if you used say a 64 bit nanosecond | counter (to preserve best case precision on a single | machine) along with 128+ bits of random data and you'll | need to worry more about gamma ray bursts than | collisions. | jandrewrogers wrote: | 100 billion UUIDs per millisecond is the 50% collision | probability threshold. Achieving an acceptable collision | probability for most applications would limit the UUID | generation rate to more like thousands of UUIDs per | millisecond. | | Even if one was not generating millions of UUIDs per second | on average, the risk of spiky temporal distributions when | generating UUIDs would still need to be considered. | jasonwatkinspdx wrote: | I'm aware of what the bound I quoted is, and am just too | lazy to type the refinement into Wolfram for a discussion | like this. But just knowing the value off the top of my | head for 64 bits is 200k for 1e-9 probability of | collision, I'm pretty happy with 72 bits. | | And although this standard obviously wants to stick | within the existing UUID footprint, if you were say doing | some IoT software that would run on billions of nodes | simultaneously, just add another 32/64/whatever bits of | random data and deal with the minor annoyance of longer | ids and lack of UUID RFC compatibility. But even then, | you can truncate and reformat these these to v4 UUIDs | trivially without meaningfully impacting the collision | resistance, for unsorted external ids in systems that | need the compatability. | | The actual big risk with this sort of scheme is vm | initialization. You need to be sure the CSPRNG you're | using is initialized, which can be slightly tricky in | cloud environments with configuration/control layer stuff | that's racy. Mess this up and two nodes hydrated from the | same snapshot may overlap in sequence as they start | generating ids, and of course the time component cannot | be trusted to save you in this instance. | manigandham wrote: | > _" They're evenly spread which means they hash/index | well."_ | | What do you mean by this? Why would you hash it further? Hash | distribution is primarily down to the hashing algorithm, not | the input data. | | Also indexes are better with somewhat ordered and smaller | data. A 64-bit int sequential counter is much faster and half | the size, and compatible everywhere without the annoyances of | a UUID. | jwilk wrote: | Old-school formatting of the same document: | | https://datatracker.ietf.org/doc/html/draft-peabody-dispatch... | [deleted] | Zamicol wrote: | Why isn't there an option with a strong cryptographic hash like | SHA-256? | staticassertion wrote: | It would be slow and doesn't really serve a purpose since you | either have uuids that are totally random or uuids that need to | preserve their structure. | jwilk wrote: | Or UUIDs derived from other identifiers: | | https://datatracker.ietf.org/doc/html/rfc4122#section-4.3 | kstenerud wrote: | It's a shame they didn't finally drop big endian. Little endian | won out and our newer protocols should reflect that for | efficiency's sake. | wyager wrote: | I guarantee you that byte order swapping is not the limiting | factor on the efficiency of any system involving UUIDs on the | wire. | amluto wrote: | No, they should have tried to find a way to drop _little_ | endian. A cycle or two when generating UUIDs is irrelevant, and | almost all UUID implementations, and most of the RFC 4122 text, | are big-endian. | zuzun wrote: | It's necessary to keep them sortable at byte level. | finnh wrote: | But network byte order is big endian, so I can see arguments | both ways (as it were) | chrismorgan wrote: | Big-endianness is mandatory for opaque bytewise sorting (or | lexicographic ordering in string form), which is a very | desirable property of UUIDv6 and UUIDv7. | ripe wrote: | Side note: I love the HTML format of these IETF RFCs, as in TFA. | Over the decades, I was used to seeing the old text format (which | I like), but this one is particularly easy on the eye, especially | on my Android phone. | [deleted] | hoten wrote: | Only mobile issue I see is there is a really long link that | overflows the intended document width, which introduces an | horizontal scrollbar. That makes scrolling a bit finicky on | mobile. | sureglymop wrote: | I love the design and would like to copy it for my blog! | Especially the code blocks with the nice 'figure' description | below them. | bearjaws wrote: | UUIDv6 coming in just about 10 years late. Guess its something. | [deleted] ___________________________________________________________________ (page generated 2022-06-12 23:00 UTC)