-
Notifications
You must be signed in to change notification settings - Fork 23
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
use name normalization method #1132
Conversation
Waiting for #1092 to update the DB migration script. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
4497c2b
to
192a77d
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Press return to run 'make distcheck'...
"/usr/local/bin/perl" "-Iinc" "-MExtUtils::Manifest=fullcheck" -e fullcheck
Not in MANIFEST: t/db_ddl.t
Not in MANIFEST: t/queue.t
The error comes from older PRs, not this one. Fixed in #1136.
|
It does not work to use |
That PR has now been merged. |
I have updated the migration script. I am currently running it against a snapshot of zonemaster.net (1.9M tests) to see how long it takes. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. I have one question though.
while ( my $row = $sth1->fetchrow_hashref ) { | ||
my $hash_id = $row->{hash_id}; | ||
my $raw_params = decode_json($row->{params}); | ||
my $domain = $raw_params->{domain}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We're extracting domain from params and putting in its own field, but we're still keeping a copy inside params. Is this intentional? Should this be further cleaned up in the future?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The domain is still in the params, whether it could / should be removed is a good question, but outside of this PR.
Here I am taking the domain from the params instead its own field for normalization to avoid encoding issues.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, although Travis can't test it.
@blacksponge, What happens if there is a domain name that cannot be normalized, e.g. containing character not allowed or being a U-label that cannot be converted to A-label? I think I would suggest keeping the domain name as-is in the database. |
Good suggestion, I'll do that. |
396bd31
I implemented that. % perl share/patch/patch_db_zonemaster_backend_ver_11.0.3.pl
Configured database engine: SQLite
Starting database migration
-> (1/2) Populating new result_entries table
Will update 0 rows
Progress update: 0 / 0
-> (2/2) Normalizing domain names
Will update 5 rows
Caught error while updating record, ignoring: Caught Zonemaster::Backend::Error::Internal in the `Zonemaster::Backend::DB::_normalize_domain` method: Normalizing domain returned errors. Context: ['Domain name has repeated dots.']
20%
40%
60%
80%
100%
Migration done |
$db->dbh->do('UPDATE test_results SET domain = ?, params = ?, fingerprint = ? where hash_id = ?', undef, $domain, $params, $fingerprint, $hash_id); | ||
}; | ||
if ($@) { | ||
warn "Caught error while updating record, ignoring: $@\n"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it be helpful to explicitly include the hash id or perhaps the domain name itself in this message?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I updated it to include the hash id,
Configured database engine: SQLite
Starting database migration
-> (1/2) Populating new result_entries table
Will update 0 rows
Progress update: 0 / 0
-> (2/2) Normalizing domain names
Will update 5 rows
Caught error while updating record with hash id b212fc97d29d51f3, ignoring: Caught Zonemaster::Backend::Error::Internal in the `Zonemaster::Backend::DB::_normalize_domain` method: Normalizing domain returned errors. Context: ['Le nom de domaine contient plusieurs points successifs.']
20%
40%
60%
80%
100%
Migration done
if ($@) { | ||
warn "Caught error while updating record with hash id $hash_id, ignoring: $@\n"; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you see any other error than the domain name cannot be normalized or some problem with the database engine? For the first error I would suggest a milder wording, something like that the domain name cannot be normalized and is kept in its original form. The user should be not in doubt that this can be ignored.
A database error should maybe even result in a die?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was thinking of decoding related errors.
$domain = Zonemaster::Backend::DB::_normalize_domain( $domain ); | ||
|
||
$db->dbh->do('UPDATE test_results SET domain = ?, params = ?, fingerprint = ? where hash_id = ?', undef, $domain, $params, $fingerprint, $hash_id); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If normalization fails, $domain will be set to empty, won't it? And then the database update will attempt to set an empty domain? And the database update will fail because the domain must not be empty?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, it stops the execution and exit the eval block as there is a die in the _normalize_domain method.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So if there is one single domain name that cannot be normalized then no conversion can be done. I do not think that is reasonable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well not at all, I am updating the domains one by one, so if one domain fail it does not affect the others.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The eval block just handle the migration of one single domain.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It must be possible to ignore a domain name that cannot be normalized. I think the script should ignore such a domain name by default, and just keep that domain name unnormalized, and proceed with the conversion.
This is exactly what it does. |
v2023.2 Release TestingI repeated the steps in the "How to test this PR" section on Rocky Linux 8.9 and found no problems. |
Purpose
Use new name normalization method instead of Basic00.
Context
zonemaster/zonemaster-engine#1040 introduce a new method to normalize domain names before testing. Let's use it.
It also fixes #1032.
Changes
Note: With this PR, domains are normalized before being saved to database, meaning that domains that contain ulabel will be converted to alabel before saving. That might break history for the affected domains. Should we provide a migration script to normalize all the domains in the database?
How to test this PR
Test that the domain validation is working
Test that the validation is also working for nameserver
Test that fetching data from parent is working for idna domains