New result entries table #1092
Conversation
Isn't it better to store the hash_id as a number?
TBH I've never thought about that. This uses the same datatype as in the
Will the change decrease or increase the load of the database when running a batch?
Apart from the only comment I have, it looks good to me.
Looks good to me.
Before we implement this change we should understand the gains and losses of the change. At least Mysql and Postgresql have support for querying into the JSON blob as if it consisted of several fields.
> Before we implement this change we should understand the gains and losses of the change. At least Mysql and Postgresql have support for querying into the JSON blob as if it consisted of several fields.
Pro: you could use SQL syntax to query in the table (no need for specific JSON support).
Con: you have to JOIN the tables.
If you see other gains or losses, please share them.
Both MySQL (MariaDB) and PostgreSQL have good support for JSON, so how much do we gain? Are there any performance differences when saving the result in a table instead of in JSON? Are there any performance differences when reading the result?
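To make the pro/con above concrete, here is a sketch of the two query styles being compared. Table names follow this PR; the tag value and the exact JSON path are hypothetical, just for illustration:

```sql
-- Querying into the JSON blob directly (MySQL/MariaDB syntax;
-- PostgreSQL would use jsonb operators such as ->> instead).
SELECT hash_id
FROM test_results
WHERE JSON_EXTRACT(results, '$[0].tag') = 'NO_RESPONSE';

-- The same lookup against a dedicated result_entries table:
-- plain SQL, no JSON functions needed, but it requires a JOIN.
SELECT tr.hash_id
FROM test_results tr
JOIN result_entries re ON re.hash_id = tr.hash_id
WHERE re.tag = 'NO_RESPONSE';
```

The first form keeps everything in one table but ties the query to engine-specific JSON syntax; the second is portable SQL at the cost of the extra table and join.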
@pnax, have you estimated how much time it will take to migrate the database on the zonemaster.net server?
Yes, but I can't find the figure again. If I recall correctly, it was around 2 to 3 hours.
@matsstralbergiis will try to measure the difference. @pnax, have you compared the load of writing with the old and new code?
See #856 (comment) and #1092 (review). Another gain would be to ease the use of another database engine (for instance ClickHouse, see #1094). Regarding the performance gain, I need to redo some tests. (I thought I had done some a few months back, but I can't find my results anymore.)
The comment in #856 was mine, but today I am better informed and understand that it actually works to let the database engine look into the JSON. It still might be a good idea to use a separate table, though: it is conceptually simpler, and possibly has better support in other engines.
Thanks!
```perl
@result_entries = map {
    {
        %$_,
        args => decode_json( $_->{args} ),
```
I'm not sure if performance is an issue for test_results, but if it is, I have a hunch we're spending a lot of cycles on this decoding and re-encoding of all the JSON blobs.
- Do you know if it would be possible to do some hack with string concatenation? Even though it's ugly, if it's possible it might be worth it.
- Also, do you know if decoding lots of small blobs is faster, slower, or the same as decoding one big blob?
I have copied the 'test_result' table to a MySQL instance running on my Mac. The database contains 208 149 tests. Then I created the new 'result_entries' table (I copied the CREATE command from the MySQL.pm in this PR) and extracted the data from the JSON blob in the 'test_result' table into the new table. It now contains 14 518 570 records. After that I performed two queries that I believe get the same result (batch 10 contains 81 689 tests). One against the 'old' format:
And one to the new format:
The results are consistent when repeating the queries. Maybe it is possible to optimize the queries to speed them up. Feel free to suggest improvements.
At least, it does not seem that we lose any performance. The new format seems to take half the time for database lookups.
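The two queries themselves are not reproduced above. For readers who want to run a similar comparison, here is a hypothetical pair in the same spirit (MySQL 8+ / MariaDB 10.6+ syntax for JSON_TABLE; the batch column, level column, and filter value are all assumptions, not the queries actually used):

```sql
-- "Old" format: scan the JSON blob of every test in the batch,
-- expanding the array of entries with JSON_TABLE.
SELECT COUNT(*)
FROM test_results tr,
     JSON_TABLE(tr.results, '$[*]'
         COLUMNS (level VARCHAR(15) PATH '$.level')) AS r
WHERE tr.batch_id = 10
  AND r.level = 'ERROR';

-- "New" format: the same count against the dedicated table.
SELECT COUNT(*)
FROM test_results tr
JOIN result_entries re ON re.hash_id = tr.hash_id
WHERE tr.batch_id = 10
  AND re.level = 'ERROR';
```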
If databases are better at handling integer keys, wouldn't it be better to convert the hash_id into an integer instead?
From the documentation on DROP TABLE [1]:
> If a foreign key references this table, the table cannot be dropped. In this case, it is necessary to drop the foreign key first.

[1]: https://mariadb.com/kb/en/drop-table/#description

These database engines do not support the `IF EXISTS` syntax on `ALTER TABLE ... DROP FOREIGN KEY`.
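A minimal sketch of the resulting workaround (the constraint name is illustrative, not taken from the PR):

```sql
-- In engines where ALTER TABLE ... DROP FOREIGN KEY has no IF EXISTS
-- variant, the constraint must be dropped unconditionally before the
-- table itself can be dropped.
ALTER TABLE result_entries DROP FOREIGN KEY fk_result_entries_hash_id;
DROP TABLE IF EXISTS result_entries;
```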
I've rebased on top of develop to integrate the changes from #1121 and to address a comment from @mattias-p. I believe I have now addressed all comments/remarks. Please re-review.
Nice work :) Just a few cleanup remarks.
* remove unnecessary comments
* rename a variable
* rename a metric
I noticed an issue in the migration script; aside from that, I tested creating a test and querying for the history without issue.
* output the total number of updated rows at the end (SQLite and MySQL)
* store level value
* only update jobs with results (SQLite and MySQL)
Looks really good to me! I only have a few remarks.
```perl
push @records, $r;
}
my $query_values = join ", ", ("(?, ?, ?, ?, ?, ?, ?)") x @records;
my $query = "INSERT INTO result_entries (hash_id, level, module, testcase, tag, timestamp, args) VALUES $query_values";
```
It would be super-nice if we could verify that the job is "running" when performing the insert and throw an error otherwise. I'm open to leaving this as a future exercise though.
Good idea. Just a question on how to do this. I was thinking of adding something like:

```perl
if ( $current_state ne $TEST_RUNNING ) {
    die Zonemaster::Backend::Error::Internal->new( reason => 'cannot store results to a non-running test' );
}
```
but I think it won't be sufficient. What if the test's state is updated by another process between the check and the insert?
Yeah, exactly. I made a proof of concept for how to make it atomic. But I'm not sure about its scalability. Maybe it's great, I don't know. And I also haven't tried it in the other DB engines.
```perl
#!/usr/bin/env perl
use strict;
use warnings;
use 5.014;
use utf8;

use DBI;

my $dbh = DBI->connect( "dbi:SQLite:dbname=:memory:", "", "" );

sub setup {
    $dbh->do( '
        CREATE TABLE test_results (
            id INTEGER,
            progress INTEGER
        )
    ' );
    $dbh->do( '
        CREATE TABLE result_entries (
            test_id INTEGER,
            text VARCHAR(20)
        )
    ' );
    $dbh->do( 'INSERT INTO test_results VALUES (1234, 0)' );
    $dbh->do( 'INSERT INTO test_results VALUES (2345, 50)' );
    $dbh->do( 'INSERT INTO test_results VALUES (3456, 100)' );
}

sub insert {
    my ( $id, @messages ) = @_;
    my $prefix    = 'INSERT INTO result_entries ';
    my $separator = ' UNION ';
    my $select    = 'SELECT id, ? FROM test_results WHERE id = ? AND progress > 0 AND progress < 100';
    my $query     = $prefix . join $separator, ( $select ) x scalar @messages;
    my $sth       = $dbh->prepare( $query );

    my @params;
    for my $message ( @messages ) {
        push @params, $message, $id;
    }

    my $rv = $sth->execute( @params );
    if ( $rv && $rv == 0 ) {
        die "illegal state transition";
    }
}

sub show {
    my ( $table ) = @_;
    my $rows = $dbh->selectall_arrayref( "SELECT * FROM $table" );
    say "Dump of $table:";
    for my $row ( @$rows ) {
        say join ', ', @{$row};
    }
}

my $id       = shift @ARGV;
my @messages = @ARGV;

setup();
show( 'test_results' );
say 'Inserting...';
insert( $id, @messages );
show( 'result_entries' );
```
```
$ ./poc 1234 aaa bbb ccc
Dump of test_results:
1234, 0
2345, 50
3456, 100
Inserting...
illegal state transition at ./poc line 45.

$ ./poc 2345 aaa bbb ccc
Dump of test_results:
1234, 0
2345, 50
3456, 100
Inserting...
Dump of result_entries:
2345, aaa
2345, bbb
2345, ccc
```
Thanks for the proof of concept. I think I'll go with your suggestion to "leav[e] this as a future exercise". I've opened an issue for that, see #1135.
This looks fine to me.
v2023.2 Release Testing

The "How to test this PR" section is not very explicit and doesn't concern itself with the database migration at all, so I made up the following procedure.
When following this procedure I found a couple of problems.
For some reason get_test_results used to add 'ns' arguments to entries from the Nameserver module. These are interpolated into the message anyway, and no other entries are returned with raw arguments. In zonemaster#1092 we accidentally broke this behavior, but since the behavior was undocumented and unintended, it's better to clean it up than to fix it.
Purpose

Move the `test_results.results` JSON array into a dedicated table `result_entries`.

Context

Replaces #856

Changes

- new tables `result_entries` and `log_level`
- foreign keys `result_entries(hash_id) -> test_results(hash_id)` and `result_entries(level) -> log_level(level)`
- the `get_test_result` and `get_test_history` methods are modified to use the new table
- `DB::test_result` is now only a getter; all write operations use the new database methods `add_result_entry` and `add_result_entries`

How to test this PR

It should work the same way as it did before.

Credits to @blacksponge for most of the work.
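A sketch of what the new schema described in this PR could look like. The column list matches the `INSERT INTO result_entries` statement discussed above; the exact types, sizes, and the `log_level` layout are assumptions, not copied from the PR:

```sql
-- Lookup table of log levels referenced by result_entries(level).
CREATE TABLE log_level (
    level VARCHAR(15) NOT NULL PRIMARY KEY  -- e.g. INFO, NOTICE, ERROR (illustrative values)
);

-- One row per log entry, replacing the test_results.results JSON array.
CREATE TABLE result_entries (
    hash_id   VARCHAR(16)  NOT NULL,
    level     VARCHAR(15)  NOT NULL,
    module    VARCHAR(255) NOT NULL,
    testcase  VARCHAR(255) NOT NULL,
    tag       VARCHAR(255) NOT NULL,
    timestamp REAL         NOT NULL,
    args      TEXT         NOT NULL,  -- per-entry arguments, still JSON-encoded
    FOREIGN KEY (hash_id) REFERENCES test_results (hash_id),
    FOREIGN KEY (level)   REFERENCES log_level (level)
);
```

The two foreign keys mirror the ones listed under "Changes": each entry must belong to an existing test and carry a known log level.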