
Use result entries table #856

Closed

Conversation

hannaeko
Member

@hannaeko hannaeko commented Sep 7, 2021

Purpose

Move the test_results.results json array into a dedicated table.

Context

Was briefly mentioned at last group meeting (2021-09-01)

Changes

Adds a new table result_entries:

                                     Table "public.result_entries"
  Column   |          Type          | Collation | Nullable |                  Default                   
-----------+------------------------+-----------+----------+--------------------------------------------
 id        | integer                |           | not null | nextval('result_entries_id_seq'::regclass)
 hash_id   | character varying(16)  |           | not null | 
 level     | log_level              |           | not null | 
 module    | character varying(255) |           | not null | 
 testcase  | character varying(255) |           | not null | 
 tag       | character varying(255) |           | not null | 
 timestamp | real                   |           | not null | 
 args      | json                   |           | not null | 
Indexes:
    "result_entries_pkey" PRIMARY KEY, btree (id)
    "result_entries__hash_id" btree (hash_id)
    "result_entries__level" btree (level)



  • The get_test_result and get_test_history methods are modified to use the new table
  • DB::test_result is now only a getter, all write operations use the new database methods add_result_entry and add_result_entries

How to test this PR

  • Create a few tests
  • Get the history / results
    It should work the same way as before.

@matsduf matsduf added this to the v2021.2 milestone Sep 7, 2021
Contributor

@matsduf matsduf left a comment


I think this looks like a good idea. I have some questions though:

  1. How will this affect the performance of the Test Agent when writing the results?
  2. How will this affect the performance of the RPCAPI reading the results and creating JSON for the response?
  3. Does this have any effect on storage?
  4. The args column in the new table is still a blob. Why not a separate table for that where each argument is one record?
  5. This is a breaking change unless there is a migration script. Is that planned?

I will also do some tests.

@hannaeko
Member Author

hannaeko commented Sep 8, 2021

How will this affect the performance of the Test Agent when writing the results?

I need to make some tests, but I currently have some issues when running batches on develop.

How will this affect the performance of the RPCAPI reading the results and creating JSON for the response?

It should not affect the performance (the RPCAPI was already decoding JSON blobs), but I will test too. As for the get_test_history method, we should see the performance improve, as we no longer need to parse / grep up to 200 JSON blobs.
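
For illustration only (not code from this PR), this is the kind of query a dedicated table makes possible: counting warnings and errors per test directly in SQL instead of decoding every stored JSON blob in Perl. The SQL is PostgreSQL-flavoured, the level names follow the log_level column above, and the connection parameters are assumptions; the real get_test_history logic may differ.

use strict;
use warnings;
use DBI;

# Hypothetical connection; the backend's existing handle would be reused in practice.
my $dbh = DBI->connect( "dbi:Pg:dbname=zonemaster", "zonemaster", "password", { RaiseError => 1 } );

# One grouped query instead of decoding up to 200 JSON blobs in Perl.
my $rows = $dbh->selectall_arrayref(
    q[
        SELECT hash_id,
               COUNT(*) FILTER ( WHERE level IN ( 'ERROR', 'CRITICAL' ) ) AS nb_errors,
               COUNT(*) FILTER ( WHERE level = 'WARNING' )                AS nb_warnings
        FROM result_entries
        GROUP BY hash_id
    ],
    { Slice => {} }
);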

Does this have any effect on storage?

Again, I haven't done extensive testing yet.

The args column in the new table is still a blob. Why not a separate table for that where each argument is one record?

I thought of splitting the args entirely into a separate table (arg_id, entry_id, arg_name, arg_value), but I am not sure of the gain given the extra complexity it would add to both writing and reading the results. I can dig a bit more into that to see what can and cannot be done.

This is a breaking change unless there is a migration script. Is that planned?

Yes, a migration script is planned; I wanted to get some early feedback first.

@hannaeko
Member Author

How will this affect the performance of the Test Agent when writing the results?

I changed the implementation to do only one query to the database instead of one per log entry; this way the performance is mostly unaffected. (Actually, a fair share of the time taken to insert the results into the database is spent in the grep that filters the log entries down to a given minimum log level. This overhead can be reduced by avoiding Moose in the Logger::Entry packages. I have a working POC that I am planning to integrate into the engine.)
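
As a sketch of the single-query approach (not necessarily the PR's add_result_entries implementation; the connection parameters and the sample entries are made up), a multi-row INSERT through DBI could look like this:

use strict;
use warnings;
use DBI;
use JSON::PP qw( encode_json );

# Hypothetical connection and sample data, for illustration only.
my $dbh = DBI->connect( "dbi:Pg:dbname=zonemaster", "zonemaster", "password", { RaiseError => 1 } );

my $hash_id = "0123456789abcdef";
my @entries = (
    { level => "INFO",    module => "System", testcase => "Unspecified", tag => "EXAMPLE_TAG_1", timestamp => 0.01, args => {} },
    { level => "WARNING", module => "Zone",   testcase => "Zone01",      tag => "EXAMPLE_TAG_2", timestamp => 1.23, args => { ns => "ns1.example" } },
);

# One placeholder tuple per entry, executed as a single statement
# instead of one INSERT per log entry.
my $tuples = join ", ", ( "( ?, ?, ?, ?, ?, ?, ? )" ) x scalar @entries;
my @values = map {
    ( $hash_id, $_->{level}, $_->{module}, $_->{testcase}, $_->{tag}, $_->{timestamp}, encode_json( $_->{args} ) )
} @entries;

$dbh->do(
    "INSERT INTO result_entries ( hash_id, level, module, testcase, tag, timestamp, args ) VALUES $tuples",
    undef,
    @values
);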

@matsduf
Contributor

matsduf commented Sep 23, 2021

I will review again when the conflicts are resolved.

@hannaeko hannaeko force-pushed the use-result-entries-table branch from 2106c78 to 3177c4c on September 27, 2021 11:28
@matsduf
Contributor

matsduf commented Sep 27, 2021

 module    | character varying(255) |           | not null | 
 testcase  | character varying(255) |           | not null | 
 tag       | character varying(255) |           | not null | 

Module name, testcase ID and message tag will never reach the length of 255 characters. We could easily specify that they should be a maximum of 32, 32 and 64 characters, respectively. How much would we gain?

If I understand "character varying" it can handle Unicode characters. Today these three codes have names in ASCII. Could we gain some by using a column type for string of 8-bit length characters?

@hannaeko
Member Author

Module name, testcase ID and message tag will never reach the length of 255 characters. We could easily specify that they should be a maximum of 32, 32 and 64 characters. How much would we gain?

varchar stores the data as length + data (without padding), so there won't be any performance gain from restricting the maximum length, apart from the difference in the actual string lengths. What could be done is to store them as an enum or as a separate table + foreign key, but that could be done later, as I want to keep the number of changes brought by this PR to a minimum.
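
A sketch of the "table + foreign key" idea mentioned above, deferred from this PR (the test_case table name, the column sizes and the connection parameters are made up for illustration):

use strict;
use warnings;
use DBI;

# Hypothetical connection; illustration of the deferred normalisation idea.
my $dbh = DBI->connect( "dbi:Pg:dbname=zonemaster", "zonemaster", "password", { RaiseError => 1 } );

# A small lookup table referenced by id from result_entries,
# instead of repeating the varchar value on every row.
$dbh->do(
    q[
        CREATE TABLE test_case (
            id   serial PRIMARY KEY,
            name varchar(64) NOT NULL UNIQUE
        )
    ]
);
$dbh->do(
    q[
        ALTER TABLE result_entries
            ADD COLUMN testcase_id integer REFERENCES test_case ( id )
    ]
);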

If I understand "character varying" I can handle Unicode characters. Today these three codes have names in ASCII. Could we gain some by using a column type for string of 8-bit length characters?

The database is encoded in UTF-8, so there is no overhead for ASCII characters; they are still stored as 8 bits each.

@matsduf
Contributor

matsduf commented Sep 27, 2021

I think this looks fine. We should improve this by having database-engine-independent documentation of the database structure, to ensure that we get consistency between the engines.

Member

@mattias-p mattias-p left a comment


This looks really good to me. Though Travis is unhappy. I only have a couple of questions/suggestions.

$dbh->do(
"CREATE TABLE result_entries (
id integer AUTO_INCREMENT PRIMARY KEY,
hash_id VARCHAR(16) not null,
Member


Did you consider changing this into CHAR(16)? (Here as well as for the other database adapters.)

@@ -170,9 +173,10 @@ sub run {
}
}

$self->{_db}->test_results( $test_id, Zonemaster::Engine->logger->json( 'INFO' ) );
my @entries = grep { $_->numeric_level >= $numeric{INFO} } @{ Zonemaster::Engine->logger->entries };
Member


Did you consider filtering the entries before we even buffer them, as opposed to before we store them? I'm thinking we could save some memory that way.
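
A minimal sketch of the idea, for illustration only: buffer_entry, @buffer and $min_numeric_level below are hypothetical names, not Zonemaster::Engine's actual logger API. The point is to drop entries below the threshold when they are produced, rather than grepping the full buffer just before storing it.

use strict;
use warnings;

my @buffer;    # stands in for the logger's entry buffer

# Filter at buffering time instead of at storage time, so entries below
# the minimum level never take up memory.
sub buffer_entry {
    my ( $entry, $min_numeric_level ) = @_;
    push @buffer, $entry if $entry->numeric_level >= $min_numeric_level;
    return;
}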

@matsduf matsduf modified the milestones: v2021.2, v2022.1 Oct 6, 2021
@hannaeko hannaeko marked this pull request as draft November 10, 2021 09:34
@matsduf matsduf modified the milestones: v2022.1, v2022.2 Apr 27, 2022
@ghost ghost mentioned this pull request May 16, 2022
@matsduf matsduf modified the milestones: v2022.2, v2023.1 Nov 15, 2022
@matsduf
Contributor

matsduf commented Mar 7, 2023

@blacksponge, I think this is interesting. If you, for example, test all domain names under a TLD, you might want to extract all domain names that got a specific message tag, or all message tags with a certain level for each domain name. Today you have to open the JSON blob for each domain name.

What are your plans?

Comment 2023-07-13: Today I am better informed, and it appears that both MySQL and PostgreSQL have good support for extracting data from the JSON blob as if the values were fields in a database table.
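
For example (PostgreSQL-flavoured SQL; the message tag and connection parameters are made up), the stored JSON array can be unnested and filtered without a dedicated result_entries table:

use strict;
use warnings;
use DBI;

# Hypothetical connection, for illustration only.
my $dbh = DBI->connect( "dbi:Pg:dbname=zonemaster", "zonemaster", "password", { RaiseError => 1 } );

# Unnest the JSON array stored in test_results.results and filter on its fields;
# returns the hash_id of every test that logged the given tag.
my $test_ids = $dbh->selectcol_arrayref(
    q[
        SELECT DISTINCT tr.hash_id
        FROM test_results AS tr,
             json_array_elements( tr.results ) AS entry
        WHERE entry ->> 'tag' = 'SOME_MESSAGE_TAG'
    ]
);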

@ghost ghost mentioned this pull request Mar 16, 2023
@ghost

ghost commented Mar 16, 2023

Replaced by #1092. The logic is kept; this is mainly a rebase to fix the conflicts.

@ghost ghost closed this Mar 16, 2023