add extend_dictionary in dictionary builder for improved performance #6875

Merged
merged 7 commits into from
Dec 19, 2024

rluvaton
Contributor

Which issue does this PR close?

No issue

Rationale for this change

This improves performance when appending an already-built dictionary to an existing builder, by taking advantage of the fact that we do not need to check the value for each key.
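
The idea in the rationale can be illustrated with a small sketch that uses only the standard library. This is not the arrow-rs implementation: `ToyDictionary`, `ToyDictionaryBuilder`, and the `lookups` counter are hypothetical names introduced here, and treating an out-of-range key as null is an assumption of this sketch. It only shows the general technique of resolving each dictionary value once and then translating keys by index.

```rust
use std::collections::HashMap;

/// Toy dictionary-encoded column (hypothetical, not the arrow-rs type):
/// `keys[i]` indexes into `values`, and `None` marks a null slot.
#[derive(Debug, Clone, PartialEq)]
pub struct ToyDictionary {
    pub values: Vec<Option<String>>,
    pub keys: Vec<Option<usize>>,
}

/// Toy builder (hypothetical) that deduplicates values through a hash lookup.
#[derive(Debug, Default)]
pub struct ToyDictionaryBuilder {
    dedup: HashMap<String, usize>,
    values: Vec<String>,
    keys: Vec<Option<usize>>,
    /// Counts hash lookups so the two append strategies can be compared.
    pub lookups: usize,
}

impl ToyDictionaryBuilder {
    /// Returns the key for `value`, inserting the value if it is new.
    fn get_or_insert_key(&mut self, value: &str) -> usize {
        self.lookups += 1;
        if let Some(&key) = self.dedup.get(value) {
            return key;
        }
        let key = self.values.len();
        self.values.push(value.to_owned());
        self.dedup.insert(value.to_owned(), key);
        key
    }

    /// Baseline path: one hash lookup per appended non-null row.
    pub fn append(&mut self, value: Option<&str>) {
        let key = value.map(|v| self.get_or_insert_key(v));
        self.keys.push(key);
    }

    /// Bulk path: one lookup per dictionary value, then a plain
    /// index translation for every key.
    pub fn extend_dictionary(&mut self, dict: &ToyDictionary) {
        // Map each old key to a key in this builder.
        // A null dictionary value maps to `None`.
        let mut mapped = Vec::with_capacity(dict.values.len());
        for value in &dict.values {
            mapped.push(value.as_deref().map(|v| self.get_or_insert_key(v)));
        }

        // Translate the keys without touching the hash map again.
        for key in &dict.keys {
            // Assumption of this sketch: an out-of-range key becomes null.
            let new_key = key.and_then(|k| mapped.get(k).copied().flatten());
            self.keys.push(new_key);
        }
    }

    /// Consumes the builder and returns the distinct values and the keys.
    pub fn finish(self) -> (Vec<String>, Vec<Option<usize>>) {
        (self.values, self.keys)
    }
}
```

In this toy model, extending the builder with a dictionary performs one hash lookup per non-null dictionary value, however many keys reference those values, whereas calling `append` row by row performs one lookup per non-null row.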

What changes are included in this PR?

Added `extend_dictionary` to `PrimitiveDictionaryBuilder` and `GenericByteDictionaryBuilder`.

Are there any user-facing changes?

Yes, these are public methods.

@github-actions github-actions bot added the `arrow` label (Changes to the arrow crate) Dec 12, 2024
@rluvaton
Contributor Author

Hey @tustvold, can you please re-review?

Contributor

@alamb alamb left a comment

Thank you @rluvaton -- this looks like a nice improvement to me.

@alamb
Contributor

alamb commented Dec 18, 2024

I added some small suggestions on how to improve the docstrings, but we could do that as a follow-on PR as well.

@rluvaton
Contributor Author

> I added some small suggestions on how to improve the docstrings, but we could do that as a follow on PR as well

applied

@alamb alamb merged commit 2908a80 into apache:main Dec 19, 2024
26 checks passed
@alamb
Contributor

alamb commented Dec 19, 2024

Thanks @rluvaton

@rluvaton rluvaton deleted the add-extend-dict branch December 21, 2024 16:37
CurtHagenlocher pushed a commit to CurtHagenlocher/arrow-rs that referenced this pull request Dec 28, 2024
apache#6875)

* add `extend_dictionary` in dictionary builder for improved performance

* fix extends all nulls

* support null in mapped value

* adding comment

* run `clippy` and `fmt`

* fix ci

* Apply suggestions from code review

Co-authored-by: Andrew Lamb <[email protected]>

---------

Co-authored-by: Andrew Lamb <[email protected]>