Best practices for index USING HASH (...or USING HASH WITH BUCKET_COUNT) ? #264

gnat opened this issue Aug 19, 2022 · 5 comments

@gnat

gnat commented Aug 19, 2022

This is about CRDB's hash sharded index feature for sequential indexes / primary keys: https://www.cockroachlabs.com/blog/hash-sharded-indexes-unlock-linear-scaling-for-sequential-workloads/
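To make the idea concrete: a hash-sharded index prefixes the indexed value with a hidden shard column, computed as a hash of the value modulo the bucket count, so sequential keys are spread across several ranges instead of all landing at the end of one. The sketch below illustrates only that bucketing step. `zlib.crc32` is a stand-in chosen for illustration and is not the hash function CockroachDB uses; `shard_for` and `shard_histogram` are hypothetical names.

```python
import zlib
from collections import Counter


def shard_for(key: int, bucket_count: int = 16) -> int:
    """Map a key to a shard number in [0, bucket_count).

    zlib.crc32 is an illustrative stand-in, not CockroachDB's shard hash.
    """
    return zlib.crc32(str(key).encode()) % bucket_count


def shard_histogram(keys, bucket_count: int = 16) -> Counter:
    """Count how many keys fall into each shard."""
    return Counter(shard_for(k, bucket_count) for k in keys)


if __name__ == "__main__":
    # 1,000 sequential IDs, like an auto-incrementing primary key.
    hist = shard_histogram(range(1, 1001))
    for shard in sorted(hist):
        print(f"shard {shard:2d}: {hist[shard]} keys")
```

Without sharding, every new ID in this example would sort after the previous one and hit the same range; with it, consecutive IDs are scattered across the buckets.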

Is there any way to use this feature?

I've tried passing various arguments to _create_index_sql():

condition=
include=
db_tablespace=
expressions=

With no success.

It may require an extra entry in https://github.com/cockroachdb/django-cockroachdb/blob/master/django_cockroachdb/schema.py to handle this use case?
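For reference, CockroachDB expects `USING HASH` after the column list, while Django's index template renders its `using` slot before it (PostgreSQL's `ON table USING method (cols)` order), and `condition`, `include`, and `db_tablespace` only emit their own fixed keywords (`WHERE`, `INCLUDE`, `TABLESPACE`). In the meantime, the statement could be run through a `migrations.RunSQL` operation. A minimal pure-Python sketch of that statement follows; the helper and table names are hypothetical, and the `WITH (bucket_count = N)` form assumes CockroachDB v22.1 or later (earlier releases used `WITH BUCKET_COUNT = N`).

```python
def quote_name(name: str) -> str:
    """Double-quote an SQL identifier, escaping any embedded quotes."""
    return '"%s"' % name.replace('"', '""')


def hash_sharded_index_sql(name, table, columns, bucket_count=None):
    """Build a CockroachDB hash-sharded CREATE INDEX statement.

    Hypothetical helper. bucket_count=None lets the cluster default apply;
    otherwise the v22.1+ storage-parameter form is emitted.
    """
    if not columns:
        raise ValueError("at least one column is required")
    cols = ", ".join(quote_name(c) for c in columns)
    # USING HASH goes after the column list, not before it.
    sql = "CREATE INDEX %s ON %s (%s) USING HASH" % (
        quote_name(name),
        quote_name(table),
        cols,
    )
    if bucket_count is not None:
        sql += " WITH (bucket_count = %d)" % bucket_count
    return sql


if __name__ == "__main__":
    # Either string could be handed to migrations.RunSQL.
    print(hash_sharded_index_sql("customer_id_hash", "app_customer", ["id"]))
    print(hash_sharded_index_sql("customer_id_hash", "app_customer", ["id"], 8))
```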

Any comments or thoughts are appreciated.

@ajwerner

I wonder if https://docs.djangoproject.com/en/4.1/ref/contrib/postgres/indexes/#hashindex would work, at least for the secondary indexes.

@gnat
Author

gnat commented Aug 19, 2022

Yeah, interesting find, @ajwerner. HashIndex won't work for us here, but that may be the direction we need to go in: a new HashShardedIndex type, perhaps.

I've noticed that any modifications I make to add_index seem to apply only to migrations, not at model creation time, so that may be a dead end.

@gnat
Author

gnat commented Aug 19, 2022

Ideally, this would become a primary key option, since that's the most common use case.

Although I'd also be okay with this simply being the default, as discussed in the ticket on the main cockroach repo. The default here is already DEFAULT unique_rowid(). Regardless, this is a pretty important feature.

@gnat
Author

gnat commented Aug 19, 2022

Temporary solution that works:

#BigAutoField='DEFAULT unique_rowid()',
BigAutoField='USING HASH DEFAULT unique_rowid()',
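Why this works, assuming the two lines above are entries in the backend's `data_types_suffix` mapping (the comment doesn't say where they live): when building `CREATE TABLE`, Django appends a field's type suffix after the rest of the column definition, which for a primary key already ends in `PRIMARY KEY`. The suffix therefore lands right after it, producing CockroachDB's inline `PRIMARY KEY USING HASH` form. The pure-Python sketch below only mimics that string assembly; the helper names, the table name, and the `int8` column type are illustrative assumptions.

```python
def pk_column_sql(column: str, db_type: str, type_suffix: str = "") -> str:
    """Mimic how a primary-key column definition is assembled.

    Hypothetical helper: the base definition ends in PRIMARY KEY, and the
    type suffix (assumed to come from data_types_suffix) is appended after it.
    """
    definition = '"%s" %s NOT NULL PRIMARY KEY' % (column, db_type)
    if type_suffix:
        definition += " " + type_suffix
    return definition


def create_table_sql(table: str, columns: list) -> str:
    """Join column definitions into a CREATE TABLE statement."""
    return 'CREATE TABLE "%s" (%s)' % (table, ", ".join(columns))


if __name__ == "__main__":
    default = pk_column_sql("id", "int8", "DEFAULT unique_rowid()")
    hashed = pk_column_sql("id", "int8", "USING HASH DEFAULT unique_rowid()")
    print(create_table_sql("app_customer", [default]))
    print(create_table_sql("app_customer", [hashed]))
```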

@timgraham This will use the default hash bucket count of 16, which is a nice, sane default.

I'm too new to Django to dive into more advanced topics such as new Field types (if those are even necessary), and am hoping you can take this the rest of the way.

gnat changed the title from "Best practices for index USING BUCKET (...or USING HASH WITH BUCKET_COUNT) ?" to "Best practices for index USING HASH (...or USING HASH WITH BUCKET_COUNT) ?" on Aug 25, 2022
@timgraham
Collaborator

Yes, a new HashShardedIndex class (with bucket_count as an optional parameter) is doable. Usage might look like:

from django.db import models

# HashShardedIndex is the proposed class; it does not exist yet.


class Customer(models.Model):
    first_name = models.CharField(max_length=100)
    last_name = models.CharField(max_length=100)

    class Meta:
        indexes = [
            HashShardedIndex(fields=['id']),
        ]

The downsides are:

  1. It's boilerplate that has to be added to every model.
  2. Models in django.contrib and third-party Django packages can't be adjusted.
