Best practices for index USING HASH (...or USING HASH WITH BUCKET_COUNT) ? #264

gnat · 2022-08-19T02:48:38Z

This is about CRDB's hash sharded index feature for sequential indexes / primary keys: https://www.cockroachlabs.com/blog/hash-sharded-indexes-unlock-linear-scaling-for-sequential-workloads/

Is there any way to use this feature from django-cockroachdb?

I've tried passing various arguments to _create_index_sql(), with no success:

- condition=
- include=
- db_tablespace=
- expressions=
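For reference, the DDL that the backend would ultimately need to emit looks roughly like the output of the sketch below. It assumes CockroachDB 22.1+ syntax, where USING HASH follows the column list and the bucket count is passed as a storage parameter, WITH (bucket_count = n). Older releases used USING HASH WITH BUCKET_COUNT = n instead. The helper hash_sharded_index_sql is hypothetical and is not part of django-cockroachdb.

```python
def quote_ident(name: str) -> str:
    """Double-quote an SQL identifier, doubling any embedded quotes."""
    return '"' + name.replace('"', '""') + '"'


def hash_sharded_index_sql(table, name, columns, bucket_count=None):
    """Hypothetical helper: render a CockroachDB hash-sharded CREATE INDEX.

    USING HASH goes after the column list. bucket_count is optional; when it
    is omitted, the database's default bucket count applies.
    """
    cols = ", ".join(quote_ident(c) for c in columns)
    sql = f"CREATE INDEX {quote_ident(name)} ON {quote_ident(table)} ({cols}) USING HASH"
    if bucket_count is not None:
        sql += f" WITH (bucket_count = {int(bucket_count)})"
    return sql


if __name__ == "__main__":
    print(hash_sharded_index_sql("app_event", "app_event_ts_idx", ["ts"], bucket_count=8))
```

The placement of the clause after the column list may be why none of the arguments above could produce it, although that is an inference, not something confirmed here.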

Would handling this use case require an extra entry in https://github.com/cockroachdb/django-cockroachdb/blob/master/django_cockroachdb/schema.py?

Any comments or thoughts are appreciated.


ajwerner · 2022-08-19T04:51:02Z

I wonder if https://docs.djangoproject.com/en/4.1/ref/contrib/postgres/indexes/#hashindex would work, at least for the secondary indexes.

gnat · 2022-08-19T05:03:19Z

Yeah, interesting find, @ajwerner. HashIndex won't work for us here, but it may point in the direction we need to go: a new HashShardedIndex type, perhaps.

I've noticed that any modifications I make to add_index seem to apply only to migrations, not at model creation time, so that may be a dead end.
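One plausible reason HashIndex doesn't fit (a reading of the syntax, not confirmed in this thread): Django's HashIndex selects PostgreSQL's hash access method, which is written before the column list, whereas CockroachDB's hash-sharded clause comes after it. The two functions below are illustrative string builders, not Django's or CockroachDB's internal templates.

```python
def _cols(columns):
    """Join column names as double-quoted identifiers."""
    return ", ".join(f'"{c}"' for c in columns)


def postgres_hash_index(table, name, columns):
    # Roughly what Django's HashIndex renders on PostgreSQL: the access
    # method (USING hash) precedes the column list.
    return f'CREATE INDEX "{name}" ON "{table}" USING hash ({_cols(columns)})'


def crdb_hash_sharded_index(table, name, columns):
    # CockroachDB's hash-sharded form: USING HASH follows the column list.
    return f'CREATE INDEX "{name}" ON "{table}" ({_cols(columns)}) USING HASH'


if __name__ == "__main__":
    print(postgres_hash_index("app_event", "app_event_ts_idx", ["ts"]))
    print(crdb_hash_sharded_index("app_event", "app_event_ts_idx", ["ts"]))
```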

gnat · 2022-08-19T05:12:52Z

Ideally, this would become a primary key option, since that is the most common use case.

That said, I'd be fine with this simply being the default, as discussed in the ticket on the main cockroach repo. The default here is already DEFAULT unique_rowid(). Either way, this is a pretty important feature.
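Why the default primary key matters here: unique_rowid() values combine the insert timestamp with the ID of the node running the statement, so they increase roughly over time and new rows cluster at the end of the key range. The sketch below approximates that layout. The exact bit split (10-microsecond ticks shifted left past a 15-bit node ID) is an assumption, and approx_unique_rowid is a hypothetical function, not CockroachDB's implementation.

```python
NODE_ID_BITS = 15  # assumed width of the node-ID field
TICK_NS = 10_000   # assumed timestamp resolution: 10 microseconds


def approx_unique_rowid(now_ns: int, node_id: int) -> int:
    """Approximate a unique_rowid()-style value: timestamp ticks, then node ID."""
    ticks = now_ns // TICK_NS
    return (ticks << NODE_ID_BITS) | (node_id & ((1 << NODE_ID_BITS) - 1))


if __name__ == "__main__":
    base = 1_660_000_000_000_000_000  # an arbitrary wall-clock time in ns
    # One insert per millisecond, rotating across three nodes.
    ids = [approx_unique_rowid(base + i * 1_000_000, node_id=i % 3 + 1) for i in range(5)]
    print(ids)
```

In this model, later inserts always get larger IDs regardless of which node generated them, so without hash sharding they all sort into the same end of the primary key index.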

gnat · 2022-08-19T05:52:49Z

Temporary solution that works:

```python
#BigAutoField='DEFAULT unique_rowid()',
BigAutoField='USING HASH DEFAULT unique_rowid()',
```

@timgraham This will use the default hash bucket count of 16, which is a nice, sane default.
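To illustrate what the bucket count does: a hash-sharded index prefixes each key with a hidden shard value, roughly hash(key) mod bucket_count, so consecutive IDs land in different buckets instead of all going to the end of one key range. The sketch uses 32-bit FNV-1a purely as a stand-in hash. CockroachDB computes its shard column with its own expression, and shard_for is a hypothetical helper.

```python
from collections import Counter

FNV_OFFSET = 0x811C9DC5
FNV_PRIME = 0x01000193


def fnv1a_32(data: bytes) -> int:
    """32-bit FNV-1a hash (stand-in for CockroachDB's shard hash)."""
    h = FNV_OFFSET
    for byte in data:
        h ^= byte
        h = (h * FNV_PRIME) & 0xFFFFFFFF
    return h


def shard_for(key: int, bucket_count: int = 16) -> int:
    """Hypothetical shard assignment: hash the 8-byte key, modulo bucket_count."""
    return fnv1a_32(key.to_bytes(8, "big", signed=True)) % bucket_count


if __name__ == "__main__":
    counts = Counter(shard_for(k) for k in range(1024))
    print(sorted(counts.items()))
```

With the default of 16 buckets, keys 0 through 1023 spread evenly in this sketch (64 per bucket), and any 16 consecutive keys starting at 0 hit 16 distinct buckets. A real table's spread depends on CockroachDB's actual shard hash.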

I'm too new to Django to dive into more advanced topics such as new Field types (if one is even necessary), and am hoping you can take this the rest of the way.

timgraham · 2022-08-25T22:39:28Z

Yes, a new HashShardedIndex class (with bucket_count as an optional parameter) is doable. Usage might look like:

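```python
from django.db import models

# HashShardedIndex is the proposed class; it does not exist yet.
class Customer(models.Model):
    first_name = models.CharField(max_length=100)
    last_name = models.CharField(max_length=100)

    class Meta:
        indexes = [
            # bucket_count would be optional, e.g. bucket_count=8
            HashShardedIndex(fields=['id']),
        ]
```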

The downsides are:

- It's boilerplate that has to be added to every model.
- Models in django.contrib and third-party Django packages can't be adjusted.

gnat mentioned this issue Aug 19, 2022: sql: Use hash-sharded indexes by default cockroachdb/cockroach#78049 (Open)

gnat mentioned this issue Aug 25, 2022: Migrations hang when "default=" is set on a new field, while USING HASH. #265 (Closed)

gnat changed the title ~~Best practices for index USING BUCKET (...or USING HASH WITH BUCKET_COUNT) ?~~ Best practices for index USING HASH (...or USING HASH WITH BUCKET_COUNT) ? Aug 25, 2022

timgraham mentioned this issue Nov 17, 2023: Missing support trigram indexes / index opclasses #287 (Open)
