
Clickhouse: new database engine (experimental) #1094

Closed
wants to merge 11 commits into from

@ghost commented Mar 30, 2023

Purpose

Clickhouse is a column-oriented database engine that appears to be quite efficient when running Zonemaster on millions of domains.

Context

  • Clickhouse is currently used internally at Afnic to store the monthly Zonemaster results for the .fr zone (4 million domains)

Changes

  • new Clickhouse database engine
  • uses the MySQL DBI driver (DBI:mysql) to connect to the Clickhouse server

How to test this PR

Set up a Clickhouse server

  • install Clickhouse
  • Generate a password and its double SHA1 hash for the database user
    PASSWORD=$(base64 < /dev/urandom | head -c8); echo "$PASSWORD"; echo -n "$PASSWORD" | sha1sum | tr -d '-' | xxd -r -p | sha1sum | tr -d '-'
    
  • Set up a zonemaster database and a zonemaster user:
    $ clickhouse-client
    :) CREATE DATABASE zonemaster;
    :) CREATE USER zonemaster IDENTIFIED WITH double_sha1_hash BY '<DOUBLE_SHA1>';
    :) GRANT CREATE TABLE, DROP TABLE, SELECT, INSERT, ALTER UPDATE ON zonemaster.* TO zonemaster;
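The password and hash step can also be written as a small POSIX shell function. This is a sketch, not part of the PR: the function name double_sha1 is hypothetical, and it assumes coreutils' sha1sum is available. It computes hex(SHA1(SHA1(password))), hashing the raw 20-byte first digest rather than its hex text, which is what the xxd -r -p step in the command above does.

```shell
#!/bin/sh
# Hypothetical helper: print hex(SHA1(SHA1(password))), the format expected
# by "IDENTIFIED WITH double_sha1_hash". Requires coreutils' sha1sum.
double_sha1() {
    # First round: hex digest of the password, without sha1sum's "  -".
    first=$(printf '%s' "$1" | sha1sum | cut -d' ' -f1)
    # Turn each hex byte into an octal escape (\ddd) that any POSIX printf
    # understands in its format string.
    escapes=""
    for byte in $(printf '%s\n' "$first" | sed 's/../& /g'); do
        escapes="$escapes$(printf '\\%03o' "0x$byte")"
    done
    # Second round: hash the 20 raw digest bytes, not their hex text.
    printf "$escapes" | sha1sum | cut -d' ' -f1
}

# Example: a random 16-character password and its double SHA1 hash.
PASSWORD=$(head -c 12 /dev/urandom | base64)
echo "password: $PASSWORD"
echo "double_sha1_hash: $(double_sha1 "$PASSWORD")"
```

The octal escapes avoid depending on xxd or on bash-only printf features; the printed hash is what goes into the CREATE USER statement.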
    

Configure and use Zonemaster-Backend

Adapt the share/backend_config.ini file and use zmb or zmtest.
You can check the results by running SQL queries with clickhouse-client.

$ clickhouse-client
:) USE zonemaster;
:) SELECT * FROM test_results;

Run the unit tests

Unit tests should pass when using Clickhouse.

TARGET=Clickhouse prove -l t/

@ghost added the V-Minor label (Versioning: The change gives an update of minor in version.) Mar 30, 2023
@ghost added this to the v2023.1 milestone Mar 30, 2023
Three review comments on docs/Configuration.md (outdated, resolved)
@ghost requested a review from matsduf April 5, 2023 15:42
@matsduf (Contributor) commented Apr 14, 2023

Some installation instructions are needed.

@matsduf modified the milestones: v2023.1, v2023.2 May 10, 2023
@matsduf (Contributor) left a comment

This PR is still marked as "draft". Is it really ready for installation?

There are other changes in the code that seem to be unrelated to Clickhouse.

Comment on lines 514 to 516
(SELECT count(*) FROM result_entries JOIN test_results ON result_entries.hash_id = test_results.hash_id AND level = ?) AS nb_critical,
(SELECT count(*) FROM result_entries JOIN test_results ON result_entries.hash_id = test_results.hash_id AND level = ?) AS nb_error,
(SELECT count(*) FROM result_entries JOIN test_results ON result_entries.hash_id = test_results.hash_id AND level = ?) AS nb_warning,
Contributor

Are these changes related to the addition of support of Clickhouse engine?

There are other changes in this module where it is not obvious that they are related to the Clickhouse change.

Author

I moved the changes to DB/Clickhouse.pm to avoid altering any logic related to the other engines, so this PR can stand on its own as an experiment that can be reverted.

Contributor

Thanks. That makes it easier to review. I think it looks fine, but Travis does not. Some kind of documentation is needed (installation and configuration).

@ghost marked this pull request as ready for review December 4, 2023 09:07
@ghost (Author) commented Dec 4, 2023

This PR is still marked as "draft". Is it really ready for installation?

Yes. I marked it as "ready for review" and updated the description.

@ghost changed the title from "Clickhouse: new database engine" to "Clickhouse: new database engine (experimental)" Dec 4, 2023
@tgreenx self-requested a review December 7, 2023 15:13
@tgreenx (Contributor) left a comment

Code looks fine.
I tried to test, and I successfully installed using the instructions in zonemaster/zonemaster#1228.

2023-12-07T16:27:29Z [622] [NOTICE] [Zonemaster::Backend::Config] Loading config: /home/tgreen/zonemaster/zonemaster-backend/backend_config.ini
2023-12-07T16:27:29Z [626] [NOTICE] [main] Daemon spawned
2023-12-07T16:27:30Z [626] [NOTICE] [Zonemaster::Backend::DB] Connecting to database 'DBI:mysql:database=zonemaster;host=127.0.0.1;port=9004' as user 'zonemaster'

However, until #1132 is merged, I can't go any further with testing:

$ ./script/zmtest zonemaster.net
error: method start_domain_test: 500 Internal Server Error at ./script/zmb line 678.

@marc-vanderwal added the S-ReleaseTested label (Status: The PR has been successfully tested in release testing) Dec 12, 2023
@marc-vanderwal (Contributor) commented Dec 12, 2023

Release testing report – Success, no issues

Rocky Linux 9.3/Clickhouse

Merged pnax/clickhouse in a detached head. Followed the installation instructions as updated by zonemaster/zonemaster#1228 in order to get a Clickhouse server running on the same host as zonemaster-backend. After running the “smoke test”, the test_results table contains data, as expected.

For the unit tests, the procedure was incomplete. Unit tests needed a zonemaster_testing database, a zonemaster_testing user and enough privileges for the aforementioned user to touch the database. With those prerequisites out of the way, unit tests pass.
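These prerequisites can be scripted. The sketch below is hypothetical (setup_sql is not part of Zonemaster-Backend): it prints the statements from the PR description for an arbitrary database and user name, reusing the same grant list, which is assumed rather than verified to be sufficient for the unit tests.

```shell
#!/bin/sh
# Hypothetical helper: print the statements that create a database and a
# user, using the grant list from the PR description (assumed, not verified,
# to be sufficient for the unit tests).
setup_sql() {
    db=$1
    user=$2
    hash=$3   # double SHA1 hash of the user's password
    cat <<EOF
CREATE DATABASE IF NOT EXISTS $db;
CREATE USER IF NOT EXISTS $user IDENTIFIED WITH double_sha1_hash BY '$hash';
GRANT CREATE TABLE, DROP TABLE, SELECT, INSERT, ALTER UPDATE ON $db.* TO $user;
EOF
}

# Example: statements for the unit-test database and user.
setup_sql zonemaster_testing zonemaster_testing '<DOUBLE_SHA1>'
```

The output could then be piped to clickhouse-client, for instance with its --multiquery option; depending on the clickhouse-client version, that option may not be required for several statements.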

@tgreenx (Contributor) previously approved these changes Dec 13, 2023

I trust the testing from @marc-vanderwal, otherwise I have reviewed the rest and it looks good to me.

@tgreenx (Contributor) commented Dec 13, 2023

@pnax Before this is merged, it should be rebased on the latest develop (#1132 has now been merged) to make sure that CI passes.

Alexandre Pion added 5 commits December 13, 2023 11:49
Connections to Clickhouse are made through its MySQL interface.
Clickhouse mostly supports standard SQL, but there are some deviations
(for instance in how a row is UPDATEd, or the lack of native support for
auto-incrementing columns).
* Make UPDATE synchronous by setting `mutations_sync = 1` within the
  query, see
  <https://clickhouse.com/docs/en/operations/settings/settings#mutations_sync>
  <https://clickhouse.com/docs/en/sql-reference/statements/alter#synchronicity-of-alter-queries>

* Manually compute the number of affected rows, because Clickhouse does
  not report this number correctly, and this is by design, see
  <ClickHouse/ClickHouse#50970 (comment)>
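The two workarounds above can be sketched as the pair of statements sent for one logical UPDATE. This is an illustration under assumptions: the helper name and its arguments are hypothetical, "?" marks DBI bind parameters, and the SETTINGS clause is appended to the ALTER statement as the commit message describes ("within the query").

```shell
#!/bin/sh
# Hypothetical helper: print the two statements used to emulate an UPDATE
# on Clickhouse. "?" marks DBI bind parameters; the caller binds the WHERE
# values for the count, then the SET and WHERE values for the mutation.
update_statements() {
    table=$1
    assignments=$2
    condition=$3
    # 1. Count the matching rows ourselves, since the affected-row count
    #    returned for a mutation is not reliable.
    printf 'SELECT count(*) FROM %s WHERE %s\n' "$table" "$condition"
    # 2. Ask the server to wait for the mutation to finish (mutations_sync = 1
    #    waits on the current server) before the query returns.
    printf 'ALTER TABLE %s UPDATE %s WHERE %s SETTINGS mutations_sync = 1\n' \
        "$table" "$assignments" "$condition"
}

update_statements test_results 'progress = ?' 'hash_id = ?'
```

Because the count and the mutation are two separate statements, another writer can change the matching rows in between, so the count is only reliable when there is no concurrent update.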
Alexandre Pion added 6 commits December 13, 2023 11:49
Clickhouse "batch_id" column is a non-nullable UInt32 to avoid
performance issue with nullable column, see
<https://clickhouse.com/docs/en/sql-reference/data-types/nullable>.

In comparison, the other database engines store an empty or NULL value
when no "batch_id" is defined, and an integer otherwise.

                       +-------------+-------------+
                       | no batch_id |  batch_id   |
    +------------------+-------------+-------------+
    | Clickhouse       |      0      | UInt32 >= 1 |
    | Other DB engines |  empty/NULL |  integer    |
    +------------------+-------------+-------------+
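The mapping in the table above can be sketched as a pair of conversion helpers. Both function names are hypothetical; the sketch only encodes the convention that 0 means "no batch" in Clickhouse and that real batch IDs are integers greater than or equal to 1.

```shell
#!/bin/sh
# Hypothetical helper: Clickhouse value -> other engines' convention.
# 0 (or empty) means "no batch" and becomes an empty value.
normalize_batch_id() {
    case $1 in
        ''|0) printf '\n' ;;
        *[!0-9]*) echo "invalid batch_id: $1" >&2; return 1 ;;
        *) printf '%s\n' "$1" ;;
    esac
}

# Hypothetical helper: other engines' convention -> Clickhouse UInt32.
# An empty value becomes 0; a real batch_id must be an integer >= 1.
to_clickhouse_batch_id() {
    case $1 in
        '') printf '0\n' ;;
        0|*[!0-9]*) echo "invalid batch_id: $1" >&2; return 1 ;;
        *) printf '%s\n' "$1" ;;
    esac
}
```

Keeping the conversion in one place means the rest of the code can treat "no batch" the same way regardless of the engine.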
Clickhouse does not have a UNIQUE constraint per column. Therefore the
check for user uniqueness is performed manually.
There is no COMMIT with Clickhouse. However, there is no restriction on
the number of bind parameters either, so all domains can be passed in a
single query via bind parameters.

This has been successfully tested with a batch of 2 million domains
(with Clickhouse only).
To avoid potential memory exhaustion.
@matsduf (Contributor) previously approved these changes Dec 13, 2023

Assuming it passes CI.

@marc-vanderwal (Contributor)

I’m currently testing #1121 with the experimental Clickhouse database engine still enabled, and I seem to be hitting issues related to the way Clickhouse implements ALTER TABLE … UPDATE. These issues make me very uncomfortable with using a Clickhouse database backend in production.

For example, I get warnings like the following:

Argument "860b459647d9be55" isn't numeric in numeric eq (==) at /usr/local/share/perl5/5.32/Zonemaster/Backend/DB.pm line 362.

Line 362 is the comparison of $progress with 0 in the following subroutine:

sub test_state {
    my ( $self, $test_id ) = @_;

    my ( $progress ) = $self->dbh->selectrow_array(
        q[
            SELECT progress
            FROM test_results
            WHERE hash_id = ?
        ],
        undef,
        $test_id,
    );
    if ( !defined $progress ) {
        die Zonemaster::Backend::Error::Internal->new( reason => 'job not found' );
    }

    if ( $progress == 0 ) {
        return $TEST_WAITING;
    }
    elsif ( 0 < $progress && $progress < 100 ) {
        return $TEST_RUNNING;
    }
    elsif ( $progress == 100 ) {
        return $TEST_COMPLETED;
    }
    else {
        die Zonemaster::Backend::Error::Internal->new( reason => 'state could not be determined' );
    }
}
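The classification done by test_state can be mirrored with an explicit integer check before any numeric comparison, so that a value such as a hash_id is rejected instead of triggering a warning. The shell function below is a hypothetical sketch, not a fix: the state names stand in for $TEST_WAITING, $TEST_RUNNING and $TEST_COMPLETED, and in the Perl code a similar guard could be a match such as $progress =~ /\A\d+\z/. It does not address the underlying isolation problem.

```shell
#!/bin/sh
# Hypothetical sketch of test_state's progress -> state mapping, with an
# explicit integer check before any numeric comparison.
test_state_from_progress() {
    progress=$1
    case $progress in
        '') echo "job not found" >&2; return 1 ;;
        *[!0-9]*) echo "progress is not an integer: $progress" >&2; return 1 ;;
    esac
    if [ "$progress" -eq 0 ]; then
        echo waiting
    elif [ "$progress" -lt 100 ]; then
        echo running
    elif [ "$progress" -eq 100 ]; then
        echo completed
    else
        echo "state could not be determined" >&2
        return 1
    fi
}
```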

But these “argument isn’t numeric” errors should in theory never happen, because the code fetches exactly one row and column and the datum is, according to the database schema, an integer. How do we end up with a string?

Grepping for 860b459647d9be55 across both log files reveals something even more concerning: the value being compared to a number is actually the ID of a test being run by the other test agent on the machine. In this example, ordering the matches by timestamp gives the following:

zm-testagent-2.log:2023-12-13T08:59:51Z [2296] [WARNING] [main] Argument "860b459647d9be55" isn't numeric in numeric eq (==) at /usr/local/share/perl5/5.32/Zonemaster/Backend/DB.pm line 362.
zm-testagent.log:2023-12-13T08:59:53Z [2152] [INFO] [main] Test found: 860b459647d9be55
zm-testagent-2.log:2023-12-13T08:59:54Z [2296] [WARNING] [main] Argument "860b459647d9be55" isn't numeric in numeric eq (==) at /usr/local/share/perl5/5.32/Zonemaster/Backend/DB.pm line 362.
zm-testagent.log:2023-12-13T09:00:21Z [2812] [INFO] [main] Test starting: 860b459647d9be55
zm-testagent.log:2023-12-13T09:05:53Z [2812] [INFO] [main] Test completed: 860b459647d9be55
zm-testagent.log:2023-12-13T09:05:54Z [2152] [NOTICE] [Zonemaster::Backend::Config] Worker process (pid 2812, testid 860b459647d9be55): Terminated with exit code 0

Due to this, many tests either crash with “state could not be determined” or “illegal transition to COMPLETED”.

It seems that Clickhouse doesn’t provide proper transaction isolation when updating a table. Yet the code currently performs many updates to test_results which, without proper isolation, cause Zonemaster::Backend to behave in very surprising ways.
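The cross-log timeline shown earlier in this comment can be reproduced with a small helper. This sketch is hypothetical: it assumes each log line starts with an ISO 8601 UTC timestamp, as in the excerpts above, and that the log file paths contain no ':'.

```shell
#!/bin/sh
# Hypothetical helper: print every line that mentions a test ID in the given
# log files, ordered by timestamp across all files.
trace_test_id() {
    id=$1
    shift
    # -H prefixes each match with "file:", -F matches the ID literally.
    # The timestamp then starts at the second ':'-separated field, and ISO
    # 8601 UTC timestamps sort correctly as plain text in the C locale.
    grep -H -F -- "$id" "$@" | LC_ALL=C sort -t: -k2
}
```

For example: trace_test_id 860b459647d9be55 zm-testagent.log zm-testagent-2.log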

@matsduf dismissed their stale review December 20, 2023 14:13

Not ready for inclusion yet, not even experimental.

@tgreenx dismissed their stale review December 20, 2023 14:15

More changes are required, in particular to the way the queue is currently implemented in Backend. It should be separated out and take into account the specifics of such a DBMS.

@matsduf modified the milestones: v2023.2, v2024.1 Dec 20, 2023
@ghost closed this by deleting the head repository Feb 9, 2024
This pull request was closed.
Labels
S-ReleaseTested Status: The PR has been successfully tested in release testing V-Minor Versioning: The change gives an update of minor in version.
3 participants