Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ncm-nmstate: relax syntax of VLAN interface names #1679

Merged
merged 2 commits into from
Nov 6, 2024

Conversation

jouvin
Copy link
Contributor

@jouvin jouvin commented Apr 16, 2024

Fixes #1678

@jouvin jouvin requested review from jrha and aka7 April 16, 2024 09:08
@jouvin jouvin force-pushed the nmstate_relax_vlan_name_syntax branch from 122dd65 to 4f4c11c Compare April 16, 2024 09:24
@jouvin
Copy link
Contributor Author

jouvin commented Apr 16, 2024

Implementing a fix I uncovered something I consider as a flaw in the current implementation: to guess the VLAN ID, the interface name is took in priority to the device name. It is a problem if you remove the need for a . before the VLAN ID because there is a chance, like in vlan0 example that you don't get the right VLAN ID (in the example it should be 123, as the device is eth0.123). For some reasons, it seems the tests are passing but I'd suggest to take device first and the interface name if device doesn't provide the information. But it has the risk of being a backward incompatible change.... What's your feeling? Did I miss something?

@jouvin jouvin force-pushed the nmstate_relax_vlan_name_syntax branch from 4f4c11c to 8dfd32d Compare April 16, 2024 09:40
@aka7
Copy link
Contributor

aka7 commented Apr 16, 2024

When I did this for backward compatibility for network.pm, I followed this test file.
https://github.com/quattor/configuration-modules-core/blob/master/ncm-network/src/test/resources/route_rule.pan#L36

so I think long as your new change still covers this? then it should be fine for backward compatibility with network.pm?

Due to legacy code we had internally, we only define vlan interface using interfacename.vlan_id, i.e eth0.123 format as part of the interface name when defining vlan iface. So long as this test passes,
https://github.com/quattor/configuration-modules-core/blob/master/ncm-network/src/test/resources/nmstate_advance.pan#L30
we should be ok on our end.
so LGTM but I will test your PR against our profiles later this week to confirm.

regards

@jouvin
Copy link
Contributor Author

jouvin commented Apr 16, 2024

To be clear, my PR currently only relax the pattern matching on interface/device name but doesn't implement my second suggestion of inverting the order in which we use interface name and device to guess the VLAN ID. As for us this second point is not very important because we always use the same value for the interface name and the device. But I have the feeling you would allow the interface name not to convey the VLAN ID but have it attached to the device name instead...

Anyway, with my fix, I'm a bit surprised that vlan0 test is still working... I should look in more details. But this fix has been deployed at IJCLab and fixes the problem we had before as our VLAN interfaces are all named vlanxxx, xxx being the VLAN ID.

@aka7
Copy link
Contributor

aka7 commented Apr 16, 2024

@jouvin from what I see, vlan0 test config is working as expected with your change since you just relaxed it and not removed anything. vlan0 config is using device=eth0.123 to get the vlanid. (https://github.com/quattor/configuration-modules-core/blob/master/ncm-network/src/test/resources/nmstate_advance.pan#L40) so that is correct and I wouldn't have expected your change to break it? so IMO it is working as expected. Unless I'm misunderstanding the solution here.

I'm not fussed with order of check, as we are only setting it one way which is using the format "/system/network/interfaces/eth0.123"

so long as this format of config carries on working too, it should be fine.

I just did quick check on lab host and its a noop as expected.

@jouvin
Copy link
Contributor Author

jouvin commented Apr 16, 2024

@aka7 good! I'll see if I can come up with the reverse order giving the current result. For me it should...

My point with vlan0 is that with the relaxed regex, the ifnzme should match the regex and the vlan ID should be set to 0 which is not what we want. And for me the fix would be to reverse the order as the device will be carry the actual vlan0 ID and will be tested first. I guess it is working because ID 0 is causing the test to evaluate to false. My initial attempt to add vlan1 was failing in fact because it returned ID 1 instead of 123.

@jouvin jouvin force-pushed the nmstate_relax_vlan_name_syntax branch from 8dfd32d to 5b9837d Compare April 18, 2024 08:11
@jouvin
Copy link
Contributor Author

jouvin commented Apr 18, 2024

@aka7 @stdweird I'm trying to fix a problem I identified with the way returned VLAN ID from the interface/device name is tested. The fix is trivial but I'm trying to add a test for this and I'm having trouble because adding the needed lines in the test profile make all the tests failing (including the first one which is just compiling the profile if I'm right) with something that looks as a Pan error except that compiling the profile manually gives no error. Surely a trivial mistake of mine... Any idea?

@aka7
Copy link
Contributor

aka7 commented Apr 18, 2024

@jouvin there is planned meetup discussion today, are you able to join todays call, perhaps we can discuss it there?

@aka7
Copy link
Contributor

aka7 commented Apr 18, 2024

@aka7 good! I'll see if I can come up with the reverse order giving the current result. For me it should...

My point with vlan0 is that with the relaxed regex, the ifnzme should match the regex and the vlan ID should be set to 0 which > is not what we want. And for me the fix would be to reverse the order as the device will be carry the actual vlan0 ID and will be > tested first. I guess it is working because ID 0 is causing the test to evaluate to false. My initial attempt to add vlan1 was failing > in fact because it returned ID 1 instead of 123.

yes I just tried this on the test file, change interface name from vlan0 -> vlan1 and test fails.
May need to look at regex, because interface name with vlan0 is ignored but anything >0 is used as vlan id, which is not what we want on interface name? so I dont think you want to relax the regex for the interface name.

@jouvin
Copy link
Contributor Author

jouvin commented Apr 18, 2024

@aka7 sorry, I'm at a conference this week, it was impossible for me to join the meetup... Don't spend time trying to fix tests, I have almost everything done but currently I only published the side issue I identified and I don't understand why adding a VLAN interface in the test profile beaks the Pan compilation despite Pan being able to compile it successfully if run manually...

@jouvin
Copy link
Contributor Author

jouvin commented Apr 18, 2024

@stdweird any chance you could have a look why the second commit is breaking all tests... I'm sure it's trivial and you'll identify the problem very quickly! Just my lack of Perl/Quattor development these last years!

@jouvin
Copy link
Contributor Author

jouvin commented May 7, 2024

@stdweird I'm still looking for some help to sort out my (probably stupid) mistake with the test additions, so that I can push the real fix related to the issue...

@stdweird
Copy link
Member

stdweird commented May 7, 2024

@jouvin sorry, i don't know where my head is last weeks ;)
can you already rebase the PR?

@stdweird
Copy link
Member

stdweird commented May 7, 2024

@jouvin quick look at the code, but $iface->{device} || $name is not the same as $iface->{device}. i guess you probably want to pass the $name by itself as 3rd argument and also use it to guess the vlanid

@jouvin
Copy link
Contributor Author

jouvin commented May 7, 2024

@stdweird don't worry, I'm totally overwhelmed too these days, tackling to many issues a the same time... My question was not really on the code itself (the line you mentioned is not from me and I was not completely convinced if was correct with Perl, look at a bashism...) but anyway my main problem at this stage is that I added one test in one of the .t file and this breaks all tests. It has nothing to do with the module code itself: just removing the added lines in the test is enough to make all the other tests successful. I guess I made a very stupid mistake in my test additions.

As for your request to rebase, I'd prefer not to do it at this stage (as I have the whole fix ready for this version and would prefer to rebase afterwards), except if you have particular reason for asking for it...

@stdweird
Copy link
Member

stdweird commented May 7, 2024

hmmm, that is not the reason. if you can rebase, i can run the tests locally and debug it quicker

@jouvin
Copy link
Contributor Author

jouvin commented May 7, 2024

Sorry I missed the conflicts... I'll do the rebase then...

@jouvin jouvin force-pushed the nmstate_relax_vlan_name_syntax branch from 5b9837d to 8dbf700 Compare May 7, 2024 15:13
@jouvin
Copy link
Contributor Author

jouvin commented May 7, 2024

@stdweird rebase done

@stdweird
Copy link
Member

stdweird commented May 7, 2024

@jouvin hidden in the output is a

[ERROR] Filename /etc/nmstate/vlan.0.yml found that doesn't match the device regexp. Must be an error in ncm-network.

@stdweird
Copy link
Member

stdweird commented May 7, 2024

@jouvin the pan unittest is an actual example? what is the generated filename for that device? probably the generated yml filename also needs to follow the $device logic you modified.

@jouvin
Copy link
Contributor Author

jouvin commented May 7, 2024

@stdweird yes sure, I need to fix the test for the modified logic but wanted to fix the test logic first... I apologise missing the error, I thought it looked every line... My bad! I'll check in the next days and try to complete this PR.

@stdweird
Copy link
Member

stdweird commented May 8, 2024

if you want to allow such names, just replace the whole device regex with \w+; or implement logic to skip the test (maybe controlled by boolean from the schema. you can have it default to true (ie allow arbitrary names by default)

@jouvin
Copy link
Contributor Author

jouvin commented May 8, 2024

@stdweird yes it is what I have done in fact but I have not pushed it yet.. I wanted to add a test first demonstrated the problem with the vlan0 example which works because of a bug in the configuration module...

For my information, where have you seen this "hidden error"?

@stdweird
Copy link
Member

it's not hidden, it's just a single line of ERROR output; it's easily missed

@jouvin
Copy link
Contributor Author

jouvin commented May 13, 2024

@stdweird I think I understood why I was not seeing the error you mentioned. You need -Dprove.args to see the error message from the configuration module (else I don't see any error related to the missing file) and the error I get is:

[ERROR] Filename /etc/nmstate/vlan.0.yml found that doesn't match the device regexp. Must be an error in ncm-network.

It seems that the component doesn't like the device name vlan.0 because there is no number after vlan and before the .. If using vlan0.0, the problem disappears. The change made seems to cause a problem with the ib1.12345 test but it is something else and doesn't break all the other tests that receive undef instead of the expected yml file contents...

I'll check if there is a good reason for such a restriction...

@jouvin
Copy link
Contributor Author

jouvin commented May 13, 2024

And it is unexpected for me. This error comes from ncm-network and I thought it was not used at all when using ncm-nmstate. I may have missed something...

@stdweird
Copy link
Member

@jouvin
Copy link
Contributor Author

jouvin commented May 15, 2024

@stdweird I found it but is it expected that we execute some ncm-network code when using ncm-nmstate?

@stdweird
Copy link
Member

stdweird commented Jul 9, 2024

@jouvin yes, that is normal. the nmstate backend is a subclass of the network backend

@jrha jrha added this to the 24.next milestone Sep 5, 2024
@jouvin jouvin force-pushed the nmstate_relax_vlan_name_syntax branch from 8dbf700 to 0a71bff Compare October 21, 2024 20:48
@jouvin
Copy link
Contributor Author

jouvin commented Oct 21, 2024

Hi, I have been unable to finish the work on this critical fix for us (required if your VLAN name is not containing a . after vlan prefix, which is the case at IJCLab. The fix is working well but unfortunately I have been struggling with the tests failing for an unidentified reason so far... I rebased but it doesn't help (I was not really expecting it to help)... Not sure about what to do...

@stdweird
Copy link
Member

@jouvin the tests still fail on the same error. you need to adapt the regex to pass them

@jouvin
Copy link
Contributor Author

jouvin commented Oct 22, 2024

Thanks @stdweird . I forgot about it when doing my tests yesterday... Modifying this regex is really worrying... but I'll give it a try, hopefully this week :-)

@jouvin
Copy link
Contributor Author

jouvin commented Oct 23, 2024

I add a look this morning. The problem is not DEVICE_REGEX as the is_valid_interface() method is redefined in nmstate and is using another regex! I'm progressing!

@jouvin jouvin force-pushed the nmstate_relax_vlan_name_syntax branch from 0a71bff to 11f935b Compare October 28, 2024 11:56
@jouvin jouvin marked this pull request as draft October 28, 2024 12:13
@jouvin jouvin force-pushed the nmstate_relax_vlan_name_syntax branch from 11f935b to 7568224 Compare October 28, 2024 16:12
@jouvin
Copy link
Contributor Author

jouvin commented Oct 28, 2024

I was able to look in details into the remaining problems and to the things already fixed. It is mainly the problem of the filename regexp that I relaxed in my last changes. I added a test for vlan.456 but I have a failure during the command_history_ok test. And again I have no clue about what the problem could be, despite -Dprove.args=-v. @stdweird any suggestion where to look? In particular, is there a way to look at the generated YML file contents?

@jouvin jouvin force-pushed the nmstate_relax_vlan_name_syntax branch from 7568224 to d9c563f Compare October 28, 2024 17:49
@jouvin
Copy link
Contributor Author

jouvin commented Oct 28, 2024

Answering to myself... I found the error: command_history_ok check the command order and it was not what was expected... I should have read the command saying the commands are sorted alphabetically, bond last... I guess this sorting is done by nmstate command...

Anyway, the first for the original issue is now complete and working. @jrha is there any possibility to have it included inside the release? It is really critical for us as all our VLANs are name vlan.nnn and we don't want to change that.

Note: the changes include a fix for vlan0 expected config which IMO was wrong due to a bug fixed in 4dc5fff.

@jouvin jouvin marked this pull request as ready for review October 28, 2024 17:57
@jouvin jouvin force-pushed the nmstate_relax_vlan_name_syntax branch from d9c563f to 8b6f7cb Compare October 28, 2024 20:07
jrha
jrha previously approved these changes Oct 31, 2024
@jrha
Copy link
Member

jrha commented Nov 5, 2024

Assuming @aka7 has no problems with this, I'll get it merged tomorrow and start on another RC.

- Use interface name rather than interface device, as it is done for IPv6
- Fix test for vlan0 to match this change (previous test incorrect)
- Add a test for vlan1.123 to check correct definition of VLAN paramters when
  the VLAN ID is specified in the interface name and the device parameter is
  ommitted
- E.g. vlan.456
- Test added for this use case
@jrha jrha force-pushed the nmstate_relax_vlan_name_syntax branch from 8b6f7cb to 4f0524f Compare November 6, 2024 09:07
@jrha
Copy link
Member

jrha commented Nov 6, 2024

Rebased and conflicts addressed.

@jrha jrha merged commit e16692d into quattor:master Nov 6, 2024
2 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

ncm-nmstate: VLAN name pattern too restrictive
4 participants