[BugFix] Do not copy null_column data in fnv_hash() and crc32_hash method of NullableColumn (#52885)

## Why I'm doing:
The fnv_hash() method of ArrayColumn is slow when there are NULLs in the array.

## What I'm doing:

Change fnv_hash() method of NullableColumn to not copy null_column data.

Fixes #issue

The fnv_hash() method of ArrayColumn is very slow when the array contains NULLs.

SQL to reproduce the issue:

Query 1 is very fast (the input arrays do not contain NULLs)
```
with input as (select array_generate(1000000) as arr union all select ARRAY_MAP(x -> CASE WHEN x % 2 = 1 THEN 2 ELSE x END, array_generate(1000000)) as arr) select count(distinct arr) from input;
```

Query 2 is very slow (one of the input arrays contains NULLs)
```
with input as (select array_generate(1000000) as arr union all select ARRAY_MAP(x -> CASE WHEN x % 2 = 1 THEN NULL ELSE x END, array_generate(1000000)) as arr) select count(distinct arr) from input;
```

The original implementation copies the null_column data on every call to fnv_hash() of NullableColumn, and fnv_hash() is called once per element in the array, so the copy is repeated for every element.
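The difference between the two declarations can be seen in a minimal sketch. `ToyNullColumn` and the two helper functions are hypothetical names made up for illustration, and the sketch assumes that `get_data()` returns a reference to an internal buffer, which the fix implies but the diff does not show. With plain `auto`, the reference is dropped and the whole buffer is copy-constructed; with `const auto&`, the name binds to the existing buffer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a null column: owns one byte per row.
class ToyNullColumn {
public:
    explicit ToyNullColumn(std::size_t n) : _data(n, 0) {}

    // Assumption: the accessor hands out a reference to the internal buffer.
    const std::vector<uint8_t>& get_data() const { return _data; }

private:
    std::vector<uint8_t> _data;
};

// `auto` deduces a value type, so this copy-constructs the whole vector.
// Returns true only if the local name still refers to the column's buffer.
bool by_value_shares_buffer(const ToyNullColumn& col) {
    auto null_data = col.get_data();
    return null_data.data() == col.get_data().data();
}

// `const auto&` binds to the existing vector; no elements are copied.
bool by_reference_shares_buffer(const ToyNullColumn& col) {
    const auto& null_data = col.get_data();
    return null_data.data() == col.get_data().data();
}
```

For a non-empty column, `by_value_shares_buffer` returns false (a fresh buffer was allocated) and `by_reference_shares_buffer` returns true. If such a copy happens once per array element, the total work grows with the square of the array length, which is consistent with the slowdown reported for Query 2.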

Benchmark
1. Original Implementation: Query 1 (0.63 s), Query 2 (32.93 s).
2. New Implementation: Query 1 (0.67 s), Query 2 (0.60 s).

Signed-off-by: Yaqi Zhang <[email protected]>
(cherry picked from commit cedf586)
zhangyaqi1989 authored and mergify[bot] committed Nov 20, 2024
1 parent 0800460 commit 76959e7
Showing 1 changed file with 2 additions and 2 deletions.
4 changes: 2 additions & 2 deletions be/src/column/nullable_column.cpp
@@ -312,7 +312,7 @@ void NullableColumn::fnv_hash(uint32_t* hash, uint32_t from, uint32_t to) const
return;
}

- auto null_data = _null_column->get_data();
+ const auto& null_data = _null_column->get_data();
uint32_t value = 0x9e3779b9;
while (from < to) {
uint32_t new_from = from + 1;
@@ -337,7 +337,7 @@ void NullableColumn::crc32_hash(uint32_t* hash, uint32_t from, uint32_t to) cons
return;
}

- auto null_data = _null_column->get_data();
+ const auto& null_data = _null_column->get_data();
// NULL is treat as 0 when crc32 hash for data loading
static const int INT_VALUE = 0;
while (from < to) {