Commit
[BugFix] Do not copy null_column data in fnv_hash() and crc32_hash method of NullableColumn (#52885)

## Why I'm doing:
The fnv_hash() method of ArrayColumn is slow when there are NULLs in the array.

## What I'm doing:
Change fnv_hash() method of NullableColumn to not copy null_column data.

Fixes #issue

fnv_hash() method of ArrayColumn is very slow when there are NULLs in the array.

SQLs to reproduce the issue

Query 1 is very fast (the input arrays do not contain NULLs)
```
with input as (
  select array_generate(1000000) as arr
  union all
  select ARRAY_MAP(x -> CASE WHEN x % 2 = 1 THEN 2 ELSE x END, array_generate(1000000)) as arr
)
select count(distinct arr) from input;
```

Query 2 is very slow (one of the input arrays contains NULLs)
```
with input as (
  select array_generate(1000000) as arr
  union all
  select ARRAY_MAP(x -> CASE WHEN x % 2 = 1 THEN NULL ELSE x END, array_generate(1000000)) as arr
)
select count(distinct arr) from input;
```

The original implementation copies the null_column data in fnv_hash() of NullableColumn and fnv_hash() is called per element in the array.

Benchmark
1. Original Implementation: Query 1 (0.63 s), Query 2 (32.93 s).
2. New Implementation: Query 1 (0.67 s), Query 2 (0.60 s).

Signed-off-by: Yaqi Zhang <[email protected]>
(cherry picked from commit cedf586)
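
A minimal sketch of why the per-element copy hurts, assuming a simplified column layout. This is not the StarRocks source: `NullableInt32Column`, `hash_range`, `fnv_hash_copying`, `fnv_hash_by_reference`, and the two `hash_elements_*` drivers are hypothetical names invented for illustration, and the FNV-1 style mixing is a stand-in for the real hash. The copying variant duplicates the whole null vector on every call, so hashing an array of n elements one element at a time costs O(n^2). The by-reference variant reads the same data in place and stays O(n).

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical, simplified stand-in for a nullable int32 column.
struct NullableInt32Column {
    std::vector<int32_t> data;
    std::vector<uint8_t> nulls;  // 1 marks a NULL row, 0 a non-NULL row
};

constexpr uint32_t kFnvOffsetBasis = 2166136261u;
constexpr uint32_t kFnvPrime = 16777619u;

// FNV-1 style mixing over a byte buffer (illustrative only).
inline uint32_t fnv1_mix(uint32_t h, const void* p, size_t n) {
    const unsigned char* bytes = static_cast<const unsigned char*>(p);
    for (size_t i = 0; i < n; ++i) {
        h = (h * kFnvPrime) ^ bytes[i];
    }
    return h;
}

// Shared loop: hash rows [from, to), mixing a marker byte for NULL rows.
inline void hash_range(const std::vector<int32_t>& data, const std::vector<uint8_t>& nulls,
                       uint32_t* hash, size_t from, size_t to) {
    const uint8_t null_marker = 1;
    for (size_t i = from; i < to; ++i) {
        if (nulls[i]) {
            *hash = fnv1_mix(*hash, &null_marker, sizeof(null_marker));
        } else {
            *hash = fnv1_mix(*hash, &data[i], sizeof(data[i]));
        }
    }
}

// Slow variant: copies the entire null vector on every call.
void fnv_hash_copying(const NullableInt32Column& col, uint32_t* hash, size_t from, size_t to) {
    std::vector<uint8_t> nulls_copy = col.nulls;  // O(column size) per call
    hash_range(col.data, nulls_copy, hash, from, to);
}

// Fast variant: reads the null vector in place, no copy.
void fnv_hash_by_reference(const NullableInt32Column& col, uint32_t* hash, size_t from, size_t to) {
    hash_range(col.data, col.nulls, hash, from, to);
}

// Drivers that mimic hashing an array one element at a time.
uint32_t hash_elements_copying(const NullableInt32Column& col) {
    uint32_t h = kFnvOffsetBasis;
    for (size_t i = 0; i < col.data.size(); ++i) {
        fnv_hash_copying(col, &h, i, i + 1);  // n calls, each copying n bytes
    }
    return h;
}

uint32_t hash_elements_by_reference(const NullableInt32Column& col) {
    uint32_t h = kFnvOffsetBasis;
    for (size_t i = 0; i < col.data.size(); ++i) {
        fnv_hash_by_reference(col, &h, i, i + 1);
    }
    return h;
}
```

Both drivers produce the same hash for the same column; only the amount of work differs. Under these assumptions, a one-million-element array makes the copying variant move on the order of 10^12 bytes, which is consistent in spirit with the slowdown reported for Query 2.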