Pure pgSQL implementation #2
Comments
Thanks for posting this. I decided to go with this implementation.
One thing worth considering for anyone writing an equivalent function in application code: some languages, or at least Crystal, floor the value when converting a float to an integer. In the snippet above, Postgres appears to round the float to the nearest integer instead. This caught me out for a while: with an interval length of 60, the UUID was ticking upwards at each half-minute within Postgres, while my Crystal implementation was ticking exactly on the minute.
I decided to change the above snippet rather than my application code, so I'm using:
v_time := floor(extract(epoch FROM clock_timestamp()) / p_interval_length);
My application implementation is:
require "uuid"
struct UUID
  def self.sequential_random(random = Random::Secure, variant = Variant::RFC4122, version = Version::V4)
    new_bytes = random.random_bytes(16)
    v_time = (Time.utc.to_unix_f / 60).to_u
    new_bytes[0] = (v_time >> 8 & 255).to_u8;
    new_bytes[1] = (v_time & 255).to_u8;
    # `new` calls the default UUID constructor, so the byte-array to UUID conversion happens in that function.
    new(new_bytes, variant, version)
  end
end
Now I don't think too many people will have use for the Crystal snippet, but this implementation detail might help others implementing it in their own application software. |
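The rounding difference described above can be reproduced outside the database. The following is a minimal Python sketch, not code from this thread: `bucket_rounded` mimics a float-to-integer conversion that rounds to nearest (the Postgres behaviour as the comment describes it), and `bucket_floored` mimics the `floor(...)` fix and Crystal's conversion. Both function names and the sample timestamps are made up for illustration.

```python
import math

INTERVAL = 60  # seconds, the interval length used in the comment above


def bucket_rounded(epoch_seconds: float) -> int:
    """Round-to-nearest conversion: the bucket changes at the half-interval."""
    return round(epoch_seconds / INTERVAL)


def bucket_floored(epoch_seconds: float) -> int:
    """Floor conversion: the bucket changes exactly on the interval boundary."""
    return math.floor(epoch_seconds / INTERVAL)


# Hypothetical timestamps (seconds since epoch) around the first minute.
for t in (29.0, 31.0, 59.0, 61.0):
    print(t, bucket_rounded(t), bucket_floored(t))
```

With these inputs the rounded bucket already advances at 31 seconds, while the floored bucket only advances at 61 seconds, which matches the half-minute offset reported above.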
@jackturnbull Interesting note about the rounding. I suppose that would only be an issue if you're generating ids both on Postgres and in your application and storing them in the same context. Glad you found it useful! :) |
I also created another version, next_uuid, compatible with ulid. Hope this helps.
CREATE OR REPLACE FUNCTION next_uuid(OUT result uuid) AS $$
DECLARE
    now_millis bigint;
    second_rand bigint;
    hex_value text;
    shard_id int:=1;
BEGIN
    -- Can use clock_timestamp() / statement_timestamp()
    select (extract(epoch from transaction_timestamp())*1000)::BIGINT+(extract(milliseconds from transaction_timestamp()))::BIGINT INTO now_millis;
    select ((random() * 10^18)::BIGINT) INTO second_rand;
    -- Uncomment below line to ignore sharding.
    -- select ((random() * 10^6)::INT) INTO shard_id;
    hex_value := RPAD(TO_HEX(now_millis), 12, '0')||LPAD(TO_HEX(shard_id), 4, '0')||LPAD(TO_HEX(second_rand), 16, '0');
    result := CAST(hex_value AS UUID);
END;
$$ LANGUAGE PLPGSQL; |
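As a rough illustration of the field layout the function above builds (12 hex digits of millisecond timestamp, 4 of shard id, 16 of random data), here is a small Python sketch. It is not a port: `timestamp_uuid` is a hypothetical name, and unlike the `RPAD` above it left-pads the timestamp, which is my assumption about what a ULID-style 48-bit big-endian timestamp needs.

```python
import secrets
import uuid


def timestamp_uuid(now_millis: int, shard_id: int = 1) -> uuid.UUID:
    """48-bit millisecond timestamp, 16-bit shard id, 64 random bits."""
    if not 0 <= now_millis < 1 << 48:
        raise ValueError("timestamp does not fit in 48 bits")
    if not 0 <= shard_id < 1 << 16:
        raise ValueError("shard id does not fit in 16 bits")
    random_bits = secrets.randbits(64)
    # Left-pad every field with zeros so each keeps a fixed width.
    return uuid.UUID(hex=f"{now_millis:012x}{shard_id:04x}{random_bits:016x}")


earlier = timestamp_uuid(1_700_000_000_000)  # hypothetical timestamps
later = timestamp_uuid(1_700_000_000_001)    # one millisecond later
print(earlier, later, earlier < later)
```

Because the timestamp occupies the most significant bits, a value generated one millisecond later always compares greater, regardless of the random part.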
Nice! I wonder how to best track those, so that users can find them. Keeping them buried in a github issue is not great, because people are unlikely to find them here. OTOH I can't just add them to the extension as that won't work on managed services (which I think is the main point of those SQL-only variants). I think two things might work
I think (2) is fine, but I'd like to know your opinions as it's your code. |
@tvondra I think just linking them in the readme should suffice. I published mine as a gist so it's easier to link to https://gist.github.com/rshea0/f4c2e26829d82ed8d38eb5e6e6374ec2 I imagine that most people won't bother installing such a small SQL-only extension anyways and will probably just copy the function into their DB scripts. |
@PachowStudios thanks for the snippet. Maybe I'm missing something but |
@daesu Over what period of time were the ids generated? The ids will only be ordered if they were generated within the same interval period. You can increase the interval length if needed. The point isn't to have every id be ordered, but for them to not be completely random. @tvondra Maybe you can comment on this? Am I correct? |
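To make the point about interval periods concrete, here is a minimal Python sketch. It assumes a 2-byte prefix derived from the interval number (as in the Crystal snippet earlier in the thread) followed by 14 random bytes; the constants and the name `sequential_uuid_bytes` are illustrative, and the UUID version and variant bits are ignored for brevity.

```python
import os

INTERVAL_LENGTH = 60    # seconds per interval (assumed)
INTERVAL_COUNT = 65536  # number of distinct 2-byte prefixes (assumed)


def sequential_uuid_bytes(epoch_seconds: float) -> bytes:
    """Return 16 bytes: a 2-byte interval prefix followed by 14 random bytes."""
    prefix = (int(epoch_seconds) // INTERVAL_LENGTH) % INTERVAL_COUNT
    return prefix.to_bytes(2, "big") + os.urandom(14)


a = sequential_uuid_bytes(1_700_000_000)  # hypothetical timestamps
b = sequential_uuid_bytes(1_700_000_010)  # 10 s later, same interval
c = sequential_uuid_bytes(1_700_000_100)  # 100 s later, a later interval

print(a[:2] == b[:2])  # same interval: shared prefix, random order within it
print(a[:2] < c[:2])   # later interval: larger prefix (until the counter wraps)
```

Values from the same interval share a prefix and are otherwise random, so they land close together in an index without being strictly ordered among themselves.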
I haven't tried the C version, but I ran a quick benchmark of 100,000 calls against each of the versions posted here:
Baseline = EXPLAIN ANALYZE SELECT uuid_generate_v4() FROM generate_series (1,100000) AS g;
@PachowStudios = EXPLAIN ANALYZE SELECT generate_sequential_uuid() FROM generate_series (1,100000) AS g;
@dinhduongha = EXPLAIN ANALYZE SELECT next_uuid() FROM generate_series (1,100000) AS g;
pgulid with base32 removed:
CREATE EXTENSION IF NOT EXISTS pgcrypto;
CREATE OR REPLACE FUNCTION generate_ulid_uuid()
RETURNS uuid
AS $$
DECLARE
    timestamp BYTEA = E'\\000\\000\\000\\000\\000\\000';
    unix_time BIGINT;
BEGIN
    -- 6 timestamp bytes
    unix_time = (EXTRACT(EPOCH FROM clock_timestamp()) * 1000)::BIGINT;
    timestamp = SET_BYTE(timestamp, 0, (unix_time >> 40)::BIT(8)::INTEGER);
    timestamp = SET_BYTE(timestamp, 1, (unix_time >> 32)::BIT(8)::INTEGER);
    timestamp = SET_BYTE(timestamp, 2, (unix_time >> 24)::BIT(8)::INTEGER);
    timestamp = SET_BYTE(timestamp, 3, (unix_time >> 16)::BIT(8)::INTEGER);
    timestamp = SET_BYTE(timestamp, 4, (unix_time >> 8)::BIT(8)::INTEGER);
    timestamp = SET_BYTE(timestamp, 5, unix_time::BIT(8)::INTEGER);
    RETURN encode( timestamp || gen_random_bytes(10) ,'hex')::uuid;
END
$$
LANGUAGE plpgsql
VOLATILE;
So the non-C generate_sequential_uuid seems to be almost 8X slower than uuid_generate_v4, next_uuid seems to be about 3.416X slower, and the pgulid implementation is only 1.8X slower. I think the pgulid implementation is the way to go. Do note it doesn't seem to be compatible with next_uuid, since they handle the timestamp differently?
Edit: EXPLAIN ANALYZE SELECT generate_ulid_uuid2() FROM generate_series (1,100000) AS g;
It's actually faster, and only 1.556X slower than uuid_generate_v4. That said, random() in Postgres and in many languages isn't cryptographically strong, so the pgcrypto version is probably still more reliable, I guess?
CREATE OR REPLACE FUNCTION generate_ulid_uuid2()
RETURNS uuid
AS $$
DECLARE
    timestamp BYTEA = E'\\000\\000\\000\\000\\000\\000';
    unix_time BIGINT;
BEGIN
    unix_time = (EXTRACT(EPOCH FROM clock_timestamp()) * 1000)::BIGINT;
    timestamp = SET_BYTE(timestamp, 0, (unix_time >> 40)::BIT(8)::INTEGER);
    timestamp = SET_BYTE(timestamp, 1, (unix_time >> 32)::BIT(8)::INTEGER);
    timestamp = SET_BYTE(timestamp, 2, (unix_time >> 24)::BIT(8)::INTEGER);
    timestamp = SET_BYTE(timestamp, 3, (unix_time >> 16)::BIT(8)::INTEGER);
    timestamp = SET_BYTE(timestamp, 4, (unix_time >> 8)::BIT(8)::INTEGER);
    timestamp = SET_BYTE(timestamp, 5, unix_time::BIT(8)::INTEGER);
    RETURN (encode( timestamp ,'hex')||LPAD(TO_HEX((random() * 10^6)::INT), 4, '0')||LPAD(TO_HEX((random() * 10^18)::BIGINT), 16, '0'))::uuid;
END
$$
LANGUAGE plpgsql
VOLATILE;
Edit2: Also, for anyone willing to give up ULID compatibility in exchange for a larger pool of random bits, here is a variant (based on the pgcrypto implementation, with the same speed). It effectively starts the epoch at 2020 instead of 1970, which lets the timestamp use one fewer byte without loss of precision. The catch is that a 5-byte millisecond counter wraps around after about 35 years (2^40 ms), not the 84 years the function name suggests.
CREATE EXTENSION IF NOT EXISTS pgcrypto;
CREATE OR REPLACE FUNCTION generate_uuid_84years(
    )
    RETURNS uuid
    LANGUAGE 'plpgsql'
    COST 100
    VOLATILE
AS $BODY$
DECLARE
    timestamp BYTEA = E'\\000\\000\\000\\000\\000';
    unix_time BIGINT;
BEGIN
    unix_time = ((EXTRACT(EPOCH FROM clock_timestamp())-1577836800) * 1000)::BIGINT;
    timestamp = SET_BYTE(timestamp, 0, (unix_time >> 32)::BIT(8)::INTEGER);
    timestamp = SET_BYTE(timestamp, 1, (unix_time >> 24)::BIT(8)::INTEGER);
    timestamp = SET_BYTE(timestamp, 2, (unix_time >> 16)::BIT(8)::INTEGER);
    timestamp = SET_BYTE(timestamp, 3, (unix_time >> 8)::BIT(8)::INTEGER);
    timestamp = SET_BYTE(timestamp, 4, unix_time::BIT(8)::INTEGER);
    RETURN encode( timestamp || gen_random_bytes(11) ,'hex')::uuid;
END
$BODY$; |
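For anyone weighing timestamp width against random bits, the wrap-around period of a millisecond counter is easy to check. This Python sketch is only arithmetic: `wraparound_years` is a hypothetical helper, and it uses a 365.25-day year as an approximation.

```python
MS_PER_YEAR = 365.25 * 24 * 60 * 60 * 1000  # Julian year, an approximation


def wraparound_years(timestamp_bytes: int) -> float:
    """Years until a millisecond counter of the given byte width overflows."""
    return 2 ** (8 * timestamp_bytes) / MS_PER_YEAR


print(round(wraparound_years(5), 1))  # 5-byte timestamp, as in the last function
print(round(wraparound_years(6), 1))  # 6-byte timestamp, as in the ULID-style ones
```

A 5-byte counter comes out at roughly 35 years, so with an epoch starting in 2020 it would wrap in the mid-2050s, while a 6-byte counter lasts for thousands of years.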
Here is another implementation of the time based generator in plpgsql, with the same parameter support as the C extension: https://gist.github.com/Tostino/ca104bff40e704b6db70b9af492664ef |
Here is an improved implementation that is ~2.5x faster than the implementation I left above (as Tostino):
|
A bigint version, with a ~4.3 hour (256-minute) wrap-around:
create function id() returns bigint as $$
begin
  return (
    floor((extract(epoch from clock_timestamp()) / 60) % 256)::int8::bit(8)
    || ('x' || encode(gen_random_bytes(7), 'hex'))::bit(56)
  )::bit(64)::bigint;
end;
$$ language plpgsql;
And an even further text-based departure, with
create function id() returns text as $$
declare
  bytes bytea;
  alphabet bytea = '34abcdefhijkmnoprstwxy';
  output text = '';
begin
  bytes = set_byte(
    gen_random_bytes(14),
    0,
    floor((extract(epoch from clock_timestamp()) / 60) % 256)::integer
  );
  for i in 0..(length(bytes) - 1) loop
    output = output || chr(get_byte(
      alphabet,
      get_byte(bytes, i) % length(alphabet)
    ));
  end loop;
  return output;
end; $$ language plpgsql; |
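One property of the bigint variant that may be worth knowing: the minute counter sits in the top byte, so once it reaches 128 the sign bit is set and the id is negative. The Python sketch below mirrors the layout (1 byte of minute counter, 7 random bytes) under that reading; `minute_prefixed_id` is a hypothetical name and the timestamps are made up.

```python
import os


def minute_prefixed_id(epoch_seconds: float) -> int:
    """Signed 64-bit id: 1 byte of minute counter followed by 7 random bytes."""
    prefix = (int(epoch_seconds) // 60) % 256
    raw = bytes([prefix]) + os.urandom(7)
    # Reading the 8 bytes as a signed integer mirrors a bit(64)::bigint cast.
    return int.from_bytes(raw, "big", signed=True)


low = minute_prefixed_id(60 * 5)     # minute counter 5, top bit clear
high = minute_prefixed_id(60 * 200)  # minute counter 200, top bit set

print(low >= 0, high < 0)
print(256 / 60)  # hours before the one-byte minute counter repeats
```

So ids sort in generation order only within each half of the 256-minute cycle, which is fine if the goal is index locality rather than strict ordering.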
Hello, I created a pure pgSQL implementation of the time-based generation method.
Hopefully this is useful for those of us using a service like AWS RDS where we can't install C-based extensions.
It doesn't support customizing interval_count, but that should be trivial to add.