
Use result entries table #856

Closed

Conversation

hannaeko
Member

@hannaeko hannaeko commented Sep 7, 2021

Purpose

Move the test_results.results json array into a dedicated table.

Context

Was briefly mentioned at last group meeting (2021-09-01)

Changes

Adds a new table result_entries:

                                     Table "public.result_entries"
  Column   |          Type          | Collation | Nullable |                  Default                   
-----------+------------------------+-----------+----------+--------------------------------------------
 id        | integer                |           | not null | nextval('result_entries_id_seq'::regclass)
 hash_id   | character varying(16)  |           | not null | 
 level     | log_level              |           | not null | 
 module    | character varying(255) |           | not null | 
 testcase  | character varying(255) |           | not null | 
 tag       | character varying(255) |           | not null | 
 timestamp | real                   |           | not null | 
 args      | json                   |           | not null | 
Indexes:
    "result_entries_pkey" PRIMARY KEY, btree (id)
    "result_entries__hash_id" btree (hash_id)
    "result_entries__level" btree (level)



  • The get_test_result and get_test_history methods are modified to use the new table
  • DB::test_result is now only a getter, all write operations use the new database methods add_result_entry and add_result_entries

How to test this PR

  • Create a few tests
  • Get the history / results
    It should work the same way as before.

@matsduf matsduf added this to the v2021.2 milestone Sep 7, 2021
Contributor

@matsduf matsduf left a comment


I think this looks like a good idea. I have some questions though:

  1. How will this affect the performance of the Test Agent when writing the results?
  2. How will this affect the performance of the RPCAPI reading the results and creating JSON for the response?
  3. Does this have any effect on storage?
  4. The args column in the new table is still a blob. Why not a separate table for that where each argument is one record?
  5. This is a breaking change unless there is a migration script. Is that planned?

I will also do some tests.

@hannaeko
Member Author

hannaeko commented Sep 8, 2021

How will this affect the performance of the Test Agent when writing the results?

I need to make some tests, but I currently have some issues when running batches on develop.

How will this affect the performance of the RPCAPI reading the results and creating JSON for the response?

It should not affect the performance (the RPCAPI was already decoding JSON blobs), but I will test too. As for the get_test_history method, we should see the performance improve, as we no longer need to parse / grep up to 200 JSON blobs.
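
For illustration only (not code from this PR), this is the kind of query a dedicated table makes possible: counting warnings and errors per test directly in SQL instead of decoding every stored JSON blob in Perl. The SQL is PostgreSQL-flavoured, the level names follow the log_level column above, and the connection parameters are assumptions; the real get_test_history logic may differ.

use strict;
use warnings;
use DBI;

# Hypothetical connection; the backend's existing handle would be reused in practice.
my $dbh = DBI->connect( "dbi:Pg:dbname=zonemaster", "zonemaster", "password", { RaiseError => 1 } );

# One grouped query instead of decoding up to 200 JSON blobs in Perl.
my $rows = $dbh->selectall_arrayref(
    q[
        SELECT hash_id,
               COUNT(*) FILTER ( WHERE level IN ( 'ERROR', 'CRITICAL' ) ) AS nb_errors,
               COUNT(*) FILTER ( WHERE level = 'WARNING' )                AS nb_warnings
        FROM result_entries
        GROUP BY hash_id
    ],
    { Slice => {} }
);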

Does this have any effect on storage?

Again, I haven't done extensive testing yet.

The args column in the new table is still a blob. Why not a separate table for that where each argument is one record?

I thought of splitting the args entirely into a separate table (arg_id, entry_id, arg_name, arg_value), but I am not sure of the gain given the extra complexity it would add to both writing and reading the results. I can dig a bit more into that to see what can and cannot be done.

This is a breaking change unless there is a migration script. Is that planned?

Yes, a migration script is planned; I wanted to get some early feedback first.

@hannaeko
Member Author

How will this affect the performance of the Test Agent when writing the results?

I changed the implementation to do only one query to the database instead of one per log entry; this way the performance is mostly unaffected. (Actually, a fair share of the time taken to insert the results into the database is spent in the grep that filters the log entries down to a given minimum log level. This overhead can be reduced by avoiding Moose in the Logger::Entry packages. I have a working POC that I am planning to integrate into the engine.)
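
As a sketch of the single-query approach (not necessarily the PR's add_result_entries implementation; the connection parameters and the sample entries are made up), a multi-row INSERT through DBI could look like this:

use strict;
use warnings;
use DBI;
use JSON::PP qw( encode_json );

# Hypothetical connection and sample data, for illustration only.
my $dbh = DBI->connect( "dbi:Pg:dbname=zonemaster", "zonemaster", "password", { RaiseError => 1 } );

my $hash_id = "0123456789abcdef";
my @entries = (
    { level => "INFO",    module => "System", testcase => "Unspecified", tag => "EXAMPLE_TAG_1", timestamp => 0.01, args => {} },
    { level => "WARNING", module => "Zone",   testcase => "Zone01",      tag => "EXAMPLE_TAG_2", timestamp => 1.23, args => { ns => "ns1.example" } },
);

# One placeholder tuple per entry, executed as a single statement
# instead of one INSERT per log entry.
my $tuples = join ", ", ( "( ?, ?, ?, ?, ?, ?, ? )" ) x scalar @entries;
my @values = map {
    ( $hash_id, $_->{level}, $_->{module}, $_->{testcase}, $_->{tag}, $_->{timestamp}, encode_json( $_->{args} ) )
} @entries;

$dbh->do(
    "INSERT INTO result_entries ( hash_id, level, module, testcase, tag, timestamp, args ) VALUES $tuples",
    undef,
    @values
);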

@matsduf
Contributor

matsduf commented Sep 23, 2021

I will review again when the conflicts are resolved.

@hannaeko hannaeko force-pushed the use-result-entries-table branch from 2106c78 to 3177c4c on September 27, 2021 11:28
@matsduf
Contributor

matsduf commented Sep 27, 2021

 module    | character varying(255) |           | not null | 
 testcase  | character varying(255) |           | not null | 
 tag       | character varying(255) |           | not null | 

Module name, testcase ID and message tag will never reach the length of 255 characters. We could easily specify that they should be a maximum of 32, 32 and 64 characters, respectively. How much would we gain?

If I understand "character varying" it can handle Unicode characters. Today these three codes have names in ASCII. Could we gain some by using a column type for string of 8-bit length characters?

@hannaeko
Member Author

Module name, testcase ID and message tag will never reach the length of 255 characters. We could easily specify that they should be a maximum of 32, 32 and 64 characters. How much would we gain?

varchar stores the data as length + data (without padding), so there won't be any performance gain from restricting the maximum length, apart from the difference in the actual string lengths. What could be done is to store them as an enum or as a separate table + foreign key, but that could be done later, as I want to keep the number of changes brought by this PR to a minimum.
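
A sketch of the "table + foreign key" idea mentioned above, deferred from this PR (the test_case table name, the column sizes and the connection parameters are made up for illustration):

use strict;
use warnings;
use DBI;

# Hypothetical connection; illustration of the deferred normalisation idea.
my $dbh = DBI->connect( "dbi:Pg:dbname=zonemaster", "zonemaster", "password", { RaiseError => 1 } );

# A small lookup table referenced by id from result_entries,
# instead of repeating the varchar value on every row.
$dbh->do(
    q[
        CREATE TABLE test_case (
            id   serial PRIMARY KEY,
            name varchar(64) NOT NULL UNIQUE
        )
    ]
);
$dbh->do(
    q[
        ALTER TABLE result_entries
            ADD COLUMN testcase_id integer REFERENCES test_case ( id )
    ]
);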

If I understand "character varying" I can handle Unicode characters. Today these three codes have names in ASCII. Could we gain some by using a column type for string of 8-bit length characters?

The database is encoded in UTF-8, so there is no overhead for ASCII characters; they are still stored as 8 bits each.

@matsduf
Contributor

matsduf commented Sep 27, 2021

I think this looks fine. We should improve this by having database-engine-independent documentation of the database structure, to ensure that we get consistency between the engines.

Member

@mattias-p mattias-p left a comment


This looks really good to me. Though Travis is unhappy. I only have a couple of questions/suggestions.

$dbh->do(
"CREATE TABLE result_entries (
id integer AUTO_INCREMENT PRIMARY KEY,
hash_id VARCHAR(16) not null,
Member


Did you consider changing this into CHAR(16)? (Here as well as for the other database adapters.)

@@ -170,9 +173,10 @@ sub run {
}
}

$self->{_db}->test_results( $test_id, Zonemaster::Engine->logger->json( 'INFO' ) );
my @entries = grep { $_->numeric_level >= $numeric{INFO} } @{ Zonemaster::Engine->logger->entries };
Member


Did you consider filtering the entries before we even buffer them, as opposed to before we store them? I'm thinking we could save some memory that way.
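
A minimal sketch of the idea, for illustration only: buffer_entry, @buffer and $min_numeric_level below are hypothetical names, not Zonemaster::Engine's actual logger API. The point is to drop entries below the threshold when they are produced, rather than grepping the full buffer just before storing it.

use strict;
use warnings;

my @buffer;    # stands in for the logger's entry buffer

# Filter at buffering time instead of at storage time, so entries below
# the minimum level never take up memory.
sub buffer_entry {
    my ( $entry, $min_numeric_level ) = @_;
    push @buffer, $entry if $entry->numeric_level >= $min_numeric_level;
    return;
}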

@matsduf matsduf modified the milestones: v2021.2, v2022.1 Oct 6, 2021
@hannaeko hannaeko marked this pull request as draft November 10, 2021 09:34
@matsduf matsduf modified the milestones: v2022.1, v2022.2 Apr 27, 2022
@ghost ghost mentioned this pull request May 16, 2022
@matsduf matsduf modified the milestones: v2022.2, v2023.1 Nov 15, 2022
@matsduf
Contributor

matsduf commented Mar 7, 2023

@blacksponge, I think this is interesting. If you, for example, test all domain names under a TLD, you might want to extract all domain names that got a specific message tag, or all message tags with a certain level for each domain name. Today you have to open the JSON blob for each domain name.

What are your plans?

Comment 2023-07-13: Today I am better informed, and it appears that both MySQL and PostgreSQL have good support for extracting data from the JSON blob as if the values were fields in a database table.
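
For example (PostgreSQL-flavoured SQL; the message tag and connection parameters are made up), the stored JSON array can be unnested and filtered without a dedicated result_entries table:

use strict;
use warnings;
use DBI;

# Hypothetical connection, for illustration only.
my $dbh = DBI->connect( "dbi:Pg:dbname=zonemaster", "zonemaster", "password", { RaiseError => 1 } );

# Unnest the JSON array stored in test_results.results and filter on its fields;
# returns the hash_id of every test that logged the given tag.
my $test_ids = $dbh->selectcol_arrayref(
    q[
        SELECT DISTINCT tr.hash_id
        FROM test_results AS tr,
             json_array_elements( tr.results ) AS entry
        WHERE entry ->> 'tag' = 'SOME_MESSAGE_TAG'
    ]
);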

@ghost ghost mentioned this pull request Mar 16, 2023
@ghost

ghost commented Mar 16, 2023

Replaced by #1092. The logic is kept; this is mainly a rebase to fix the conflicts.

@ghost ghost closed this Mar 16, 2023