use name normalization method #1132

Merged
hannaeko merged 8 commits into zonemaster:develop from the domain-normalization branch on Dec 13, 2023


@hannaeko (Member) commented Oct 24, 2023

Purpose

Use the new name normalization method instead of Basic00.

Context

zonemaster/zonemaster-engine#1040 introduced a new method to normalize domain names before testing. Let's use it.

It also fixes #1032.

Changes

  • Normalize the domain name before saving it to the database
  • Use the normalized domain for history search
  • Normalize the domain name before fetching data from the parent zone (fixes an issue that prevented that method from working for IDN domains; see "Cannot lookup data for IDN domain", zonemaster-gui#426)

Note: With this PR, domains are normalized before being saved to the database, meaning that domains containing U-labels will be converted to A-labels before saving. That might break history for the affected domains. Should we provide a migration script to normalize all the domains in the database?
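To make the note concrete, here is a minimal Python sketch (the project itself is Perl) of what converting a U-label to an A-label does, and why rows saved before the change could stop matching a history search. `normalize_domain` is a hypothetical helper, not Zonemaster's API. It relies on Python's built-in `idna` codec, which implements IDNA 2003 (RFC 3490) and may therefore disagree with the conversion Engine performs for some labels.

```python
# Hypothetical sketch: store domain names in lowercase A-label form.
# Python's "idna" codec follows IDNA 2003, so this is only illustrative.

def normalize_domain(name: str) -> str:
    """Return a lowercase A-label form of `name` (illustrative only)."""
    name = name.strip()
    if not name:
        raise ValueError("Domain name is empty.")
    try:
        return name.lower().encode("idna").decode("ascii")
    except UnicodeError as exc:
        # e.g. empty labels ("a..b") or labels longer than 63 octets
        raise ValueError(f"Cannot normalize {name!r}: {exc}") from exc


# A row saved before the change keeps the U-label; new rows get the A-label.
stored_before = ["café.fr"]
stored_after = [normalize_domain("café.fr")]

print(stored_after)                                  # ['xn--caf-dma.fr']
# Searching history with the normalized name misses the old U-label row:
print(normalize_domain("café.fr") in stored_before)  # False
```

This mismatch between old U-label rows and new A-label lookups is what a migration script that normalizes existing rows would remove.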

How to test this PR

Test that the domain validation is working

% ./script/zmb --server http://127.0.0.1:5000 start_domain_test --domain '' | jq .error.data -c
[{"message":"Domain name is empty.","path":"/domain"}]

% ./script/zmb --server http://127.0.0.1:5000 start_domain_test --domain , | jq .error.data -c
[{"path":"/domain","message":"Domain name has an ASCII label (\",\") with a character not permitted."}]

Test that the validation also works for name servers

% ./script/zmb --server http://127.0.0.1:5000 start_domain_test --domain example.org --nameserver ':0.0.0.0' | jq .error.data -c
[{"path":"/nameservers/0/ns","message":"Domain name is empty."}]

Test that fetching data from the parent zone works for IDN domains

% curl 'http://localhost:5000' --data '{"jsonrpc":"2.0","id":0,"method":"get_data_from_parent_zone","params":{"domain":"café.fr"}}'
{"jsonrpc":"2.0","result":{"ds_list":[],"ns_list":[{"ns":"ns1.parkingcrew.net","ip":"13.248.158.159"},{"ns":"ns2.parkingcrew.net","ip":"76.223.21.9"}]},"id":0}
  • Try launching a test using the GUI and check that it works correctly.
  • Then check the history and check that it works correctly.
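The `get_data_from_parent_zone` check can also be scripted. This is a minimal Python sketch that only builds the same JSON-RPC 2.0 request as the curl example (URL, method name and params are copied from it). The helper name and the explicit `Content-Type` header are assumptions; curl's `--data` sends a form content type and the backend accepted it, so the header may not be required.

```python
import json
import urllib.request


def build_parent_zone_request(domain: str, url: str = "http://localhost:5000",
                              request_id: int = 0) -> urllib.request.Request:
    """Build the JSON-RPC call used in the curl example (not sent here)."""
    payload = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "get_data_from_parent_zone",
        "params": {"domain": domain},
    }
    # ensure_ascii=False sends "café.fr" as UTF-8 rather than \u00e9 escapes
    body = json.dumps(payload, ensure_ascii=False).encode("utf-8")
    return urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_parent_zone_request("café.fr")
# Against a running backend:
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp)["result"]["ns_list"])
```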

@hannaeko hannaeko added this to the v2023.2 milestone Oct 24, 2023
@hannaeko hannaeko added the V-Minor Versioning: The change gives an update of minor in version. label Oct 24, 2023
lib/Zonemaster/Backend/DB.pm (outdated)
lib/Zonemaster/Backend/Validator.pm (outdated)
@hannaeko (Member Author) commented Nov 6, 2023

Waiting for #1092 to update the DB migration script.

mattias-p previously approved these changes Nov 17, 2023
@mattias-p (Member) left a comment

LGTM

lib/Zonemaster/Backend/DB.pm (outdated)
@tgreenx tgreenx linked an issue Nov 21, 2023 that may be closed by this pull request
@hannaeko force-pushed the domain-normalization branch from 4497c2b to 192a77d November 21, 2023 15:38
@matsduf (Contributor) left a comment

Press return to run 'make distcheck'...
"/usr/local/bin/perl" "-Iinc" "-MExtUtils::Manifest=fullcheck" -e fullcheck
Not in MANIFEST: t/db_ddl.t
Not in MANIFEST: t/queue.t

The error comes from older PRs, not this one. Fixed in #1136.

@matsduf matsduf dismissed their stale review November 23, 2023 12:54

Error not from this PR.

@matsduf (Contributor) commented Nov 27, 2023

Rå$ttgift.se is flagged as having errors ("Domain name has a non-ASCII label ("Rå$ttgift") which is not a valid U-label.") but Rå_ttgift.se passes without error. I assume this is a bug in Engine.

@matsduf (Contributor) commented Nov 27, 2023

It does not work to use zmb for IDN names with U-label, see #1138.

matsduf previously approved these changes Nov 27, 2023
@marc-vanderwal (Contributor)

Waiting for #1092 to update the DB migration script.

That PR has now been merged.

@hannaeko hannaeko dismissed stale reviews from matsduf and mattias-p via 6c85dbd December 5, 2023 14:24
@hannaeko (Member Author) commented Dec 5, 2023

I have updated the migration script. I am currently running it against a snapshot of zonemaster.net (1.9M tests) to see how long it takes.

mattias-p previously approved these changes Dec 5, 2023
@mattias-p (Member) left a comment

LGTM. I have one question though.

while ( my $row = $sth1->fetchrow_hashref ) {
    my $hash_id = $row->{hash_id};
    my $raw_params = decode_json($row->{params});
    my $domain = $raw_params->{domain};
@mattias-p (Member):

We're extracting the domain from params and putting it in its own field, but we're still keeping a copy inside params. Is this intentional? Should this be cleaned up further in the future?

@hannaeko (Member Author):

The domain is still in the params. Whether it could or should be removed is a good question, but it is outside the scope of this PR.

Here I am taking the domain from the params instead of its own field for normalization, to avoid encoding issues.
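The PR does not say which encoding issues were involved, but reading from params does guarantee that the value has gone through a JSON decode. A minimal Python sketch, assuming params holds JSON text as `decode_json` in the excerpt suggests (`domain_from_params` is a hypothetical helper): a non-ASCII name decodes to the same string whether it was stored literally or as a `\u` escape.

```python
import json


def domain_from_params(raw_params: str) -> str:
    """Return the domain recorded in a test's JSON-encoded params."""
    params = json.loads(raw_params)
    return params["domain"]


# JSON may hold non-ASCII characters literally or as \u escapes;
# json.loads returns the same Python str for both forms.
literal = '{"domain": "café.fr"}'
escaped = '{"domain": "caf\\u00e9.fr"}'
print(domain_from_params(literal) == domain_from_params(escaped))  # True
```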

tgreenx previously approved these changes Dec 5, 2023
@tgreenx (Contributor) left a comment

LGTM, although Travis can't test it.

@matsduf (Contributor) commented Dec 5, 2023

@blacksponge, what happens if a domain name cannot be normalized, e.g. because it contains a character that is not allowed or is a U-label that cannot be converted to an A-label? I would suggest keeping the domain name as-is in the database.

marc-vanderwal previously approved these changes Dec 6, 2023
@hannaeko (Member Author) commented Dec 6, 2023

@blacksponge, What happens if there is a domain name that cannot be normalized, e.g. containing character not allowed or being a U-label that cannot be converted to A-label? I think I would suggest keeping the domain name as-is in the database.

Good suggestion, I'll do that.

@hannaeko hannaeko dismissed stale reviews from marc-vanderwal, tgreenx, and mattias-p via 396bd31 December 8, 2023 13:13
@hannaeko (Member Author) commented Dec 8, 2023

I implemented that.

% perl share/patch/patch_db_zonemaster_backend_ver_11.0.3.pl
Configured database engine: SQLite
Starting database migration

-> (1/2) Populating new result_entries table
Will update 0 rows
Progress update: 0 / 0

-> (2/2) Normalizing domain names
Will update 5 rows
Caught error while updating record, ignoring: Caught Zonemaster::Backend::Error::Internal in the `Zonemaster::Backend::DB::_normalize_domain` method: Normalizing domain returned errors. Context: ['Domain name has repeated dots.']
20%
40%
60%
80%
100%

Migration done

mattias-p previously approved these changes Dec 8, 2023
    $db->dbh->do('UPDATE test_results SET domain = ?, params = ?, fingerprint = ? where hash_id = ?', undef, $domain, $params, $fingerprint, $hash_id);
};
if ($@) {
    warn "Caught error while updating record, ignoring: $@\n";
@mattias-p (Member) Dec 8, 2023

Would it be helpful to explicitly include the hash id or perhaps the domain name itself in this message?

@hannaeko (Member Author):

I updated it to include the hash id:

Configured database engine: SQLite
Starting database migration

-> (1/2) Populating new result_entries table
Will update 0 rows
Progress update: 0 / 0

-> (2/2) Normalizing domain names
Will update 5 rows
Caught error while updating record with hash id b212fc97d29d51f3, ignoring: Caught Zonemaster::Backend::Error::Internal in the `Zonemaster::Backend::DB::_normalize_domain` method: Normalizing domain returned errors. Context: ['Domain name has repeated dots.']
20%
40%
60%
80%
100%

Migration done

Comment on lines +114 to +116
if ($@) {
    warn "Caught error while updating record with hash id $hash_id, ignoring: $@\n";
}
Contributor:

Can you see any errors other than the domain name not being normalizable or a problem with the database engine? For the first kind, I would suggest milder wording, something like "the domain name cannot be normalized and is kept in its original form". The user should be in no doubt that this can be ignored.

Should a database error perhaps even result in a die?
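A minimal sketch of that suggestion, with Python and an in-memory SQLite database standing in for the Perl script. `NormalizationError`, `normalize`, `migrate_row` and the message wording are hypothetical; only the `test_results` table and its `domain` and `hash_id` columns come from the script. Normalization failures produce a mild notice and leave the row as-is, while database errors are not caught and abort the run.

```python
import sqlite3


class NormalizationError(Exception):
    """Hypothetical: raised when a domain name cannot be normalized."""


def normalize(name: str) -> str:
    # Stand-in for the real normalizer: reject repeated dots, else lowercase.
    if ".." in name:
        raise NormalizationError("Domain name has repeated dots.")
    return name.lower()


def migrate_row(conn: sqlite3.Connection, hash_id: str, domain: str,
                notices: list) -> bool:
    """Normalize one row; return True if the row was updated."""
    try:
        new_domain = normalize(domain)
    except NormalizationError as exc:
        # Expected and harmless: say so plainly and leave the row untouched.
        notices.append(f"Domain name of test {hash_id} cannot be normalized "
                       f"({exc}); it is kept in its original form.")
        return False
    # sqlite3.Error is deliberately not caught: a database problem aborts.
    conn.execute("UPDATE test_results SET domain = ? WHERE hash_id = ?",
                 (new_domain, hash_id))
    return True


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test_results (hash_id TEXT PRIMARY KEY, domain TEXT)")
conn.executemany("INSERT INTO test_results VALUES (?, ?)",
                 [("a1", "Example.ORG"), ("b2", "bad..name")])
notices = []
updated = [migrate_row(conn, "a1", "Example.ORG", notices),
           migrate_row(conn, "b2", "bad..name", notices)]

# A missing table is a database error and propagates to the caller.
try:
    migrate_row(sqlite3.connect(":memory:"), "c3", "example.net", [])
except sqlite3.Error as exc:
    db_error = exc
```

In the sketch, the caller (here the final `try`) decides whether a database error is fatal; in the Perl script this would correspond to letting such errors reach a `die` instead of the per-row `warn`.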

@hannaeko (Member Author):

I was thinking of decoding-related errors.

Comment on lines +110 to +112
    $domain = Zonemaster::Backend::DB::_normalize_domain( $domain );

    $db->dbh->do('UPDATE test_results SET domain = ?, params = ?, fingerprint = ? where hash_id = ?', undef, $domain, $params, $fingerprint, $hash_id);
Contributor:

If normalization fails, $domain will be set to empty, won't it? And then the database update will attempt to set an empty domain? And the database update will fail because the domain must not be empty?

@hannaeko (Member Author) Dec 12, 2023

No. The _normalize_domain method calls die, so execution stops and exits the eval block before the update runs.

Contributor:

So if there is one single domain name that cannot be normalized then no conversion can be done. I do not think that is reasonable.

@hannaeko (Member Author):

Not at all: I am updating the domains one by one, so if one domain fails it does not affect the others.

@hannaeko (Member Author):

The eval block only handles the migration of a single domain.
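To illustrate that per-row scope, here is a Python sketch in which `try`/`except` plays the role of Perl's `eval { ... }` followed by `if ($@)`. The dict stands in for the `test_results` table, and `normalize` and `migrate` are hypothetical stand-ins; the warning text is copied from the script. An exception raised during normalization skips the assignment that represents the UPDATE for that row only, and the loop moves on to the next row.

```python
def normalize(name: str) -> str:
    # Hypothetical stand-in: reject repeated dots, otherwise lowercase.
    if ".." in name:
        raise ValueError("Domain name has repeated dots.")
    return name.lower()


def migrate(rows: dict) -> list:
    """Normalize rows in place; return warnings for rows left unchanged."""
    warnings = []
    for hash_id, domain in list(rows.items()):
        try:                                  # one "eval" per row
            new_domain = normalize(domain)    # raising here skips the next line
            rows[hash_id] = new_domain        # stands in for the UPDATE
        except Exception as exc:              # the "if ($@)" branch
            warnings.append(f"Caught error while updating record with hash id "
                            f"{hash_id}, ignoring: {exc}")
    return warnings


rows = {"a1": "Example.ORG", "b2": "bad..name", "c3": "ZoneMaster.NET"}
warnings = migrate(rows)
print(rows)  # {'a1': 'example.org', 'b2': 'bad..name', 'c3': 'zonemaster.net'}
```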

@ghost left a comment

lgtm

@matsduf (Contributor) left a comment

It must be possible to ignore a domain name that cannot be normalized. I think the script should ignore such a domain name by default, and just keep that domain name unnormalized, and proceed with the conversion.

@hannaeko (Member Author)

It must be possible to ignore a domain name that cannot be normalized. I think the script should ignore such a domain name by default, and just keep that domain name unnormalized, and proceed with the conversion.

This is exactly what it does.

@hannaeko hannaeko merged commit d5b4e57 into zonemaster:develop Dec 13, 2023
1 check passed
@mattias-p (Member) commented Jan 15, 2024

v2023.2 Release Testing

I repeated the steps in the "How to test this PR" section on Rocky Linux 8.9 and found no problems.

Labels
S-ReleaseTested Status: The PR has been successfully tested in release testing V-Minor Versioning: The change gives an update of minor in version.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

  • Creating test with empty domain gives confusing error message
  • Normalize zone name before test start
5 participants