-
Notifications
You must be signed in to change notification settings - Fork 575
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: dhcpv4: send current hostname, fix spec compliance of renewals #5897
Conversation
23bd03e
to
0b4440d
Compare
/ok-to-test |
internal/app/machined/pkg/controllers/network/operator/dhcp4.go
Outdated
Show resolved
Hide resolved
internal/app/machined/pkg/controllers/network/operator/dhcp4.go
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks for your work, and this looks really great!
go.mod
Outdated
@@ -3,6 +3,10 @@ module github.com/talos-systems/talos | |||
go 1.18 | |||
|
|||
replace ( | |||
// Forked insomniacslk/dhcp with DHCP spec compliancy fixes | |||
// Upstream PR: https://github.com/insomniacslk/dhcp/pull/469 | |||
github.com/insomniacslk/dhcp => github.com/twelho/dhcp v0.0.0-20220708120125-da9fc6c72fb5 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would really love to avoid the need to use the fork unless of course we can't get any feedback from the maintainer.
let's see if we can try to ping them on that PR
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alright, I've pinged the people I believe to be the maintainers in insomniacslk/dhcp#469 now. Let's see if they answer.
func (d *DHCP4) setupHostnameWatch(ctx context.Context) (<-chan state.Event, error) { | ||
hostnameWatchCh := make(chan state.Event) | ||
|
||
return hostnameWatchCh, d.state.WatchKind(ctx, hostnameStatusMetadata, hostnameWatchCh) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
return hostnameWatchCh, d.state.WatchKind(ctx, hostnameStatusMetadata, hostnameWatchCh) | |
return hostnameWatchCh, d.state.Watch(ctx, hostnameStatusMetadata, hostnameWatchCh) |
This should be Watch
to watch a specific resource instance here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated. Is there only ever a single instance of HostnameStatus
, or why is Watch
with resource.VersionUndefined
preferred to using WatchKind
here? Just curious.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Metadata consists of namespace
, type
, id
and version
.
WatchKind
allows you watching items passing only namespace
and type
what makes it return the list of items of this kind.
There is always a single HostnameStatus
resource in Talos and the id for it is known, so you can use Watch
which always watches a single resource.
return | ||
case <-time.After(renewInterval): | ||
case <-hostnameWatchCh: | ||
err := d.updateHostname(ctx, &hostname) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
in fact the event itself carries the new hostname, so we don't need to get it once again.
we should check the even type, but basically something like:
if event.Type == state.Created || event.Type == state.Updated {
hostname = event.Resource.(*network.HostnameStatus).TypedSpec().Hostname
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've now removed updateHostname
altogether and use the hostname directly from the event. Thanks for the pointer. The check needs to cover all event types, since hostname deletion should also indicate to the DHCP server that the node has lost its previous hostname. I'm doing a checked cast to network.HostnameStatus
, which fails intentionally if it gets fed a Tombstone
generated by a deletion event. I've also added a comment about this.
} | ||
|
||
// knownHostname checks if the given hostname has been defined by this operator. | ||
func (d *DHCP4) knownHostname(hostname *string) bool { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if I should do a little bit of refactoring and also put the hostname source into the HostnameStatus
resource, this way we can avoid this comparison in favor of checking if hostname source is the operator
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a good idea. knownHostname
really is a hack, and it would be useful to know which of the hostname candidates was elected as the node hostname for transparency and debugging as well. It would be best if the HostnameStatus
source be used to identify an individual HostnameSpec
, as just knowing the configuration layer of the source is not enough (e.g. ambiguity ensues when both the dhcp4
and dhcp6
operators have HostnameSpec
s defined in the operator layer).
The current functionality with knownHostname
works as-is if you prefer to merge this PR first, but I can also rebase on top of the changes to HostnameStatus
if that's better for you.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks, I think I can get in some changes early next week for that, as we're still waiting for upstream dhcp library fixes?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, go for it. I can then rebase on top of that work. There's one more issue with the hostname "parroting" prevention as well (not directly related to knownHostname
), the approach needs to be changed a bit to work even when the only hostname (apart from the default hostname) is defined by DHCP. I'll look at fixing that early next week.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The new approach is now in place and confirmed working even in the new edge cases I found. When the locally defined hostname and the hostname served by DHCP are the same, both now register correctly.
0b4440d
to
68f214f
Compare
68f214f
to
e85233f
Compare
The CI build job seems to have failed due to a temporary issue in cloning the repo, it needs a restart. |
The original (pre-review) version of the sources can be found here: https://github.com/twelho/talos/tree/dhcpv4-send-hostname-v1 |
The functionality of e85233f has now been tested against |
e85233f
to
56a757c
Compare
I've split the hostname sending and requesting operations into two separate invocations to completely sidestep the DHCP server "parroting" issue, where the DHCP server just automatically replies with the hostname sent by the client without verifying that it is actually registered on the server. The current hostname is always sent when acquiring a new lease (which happens also when the local hostname changes, e.g. due to a machine configuration edit), and the DHCP-served hostname is requested right after with an INFORM request. This avoids needing to wait for the first RENEW, which may also never happen depending on the server configuration. RENEWs will still always request the hostname to detect and apply changes done on the DHCP server side. The INFORM support additionally requires some upstream improvements, which I've submitted at insomniacslk/dhcp#470. |
e5624bc
to
40101cf
Compare
I've identified a race condition when testing the INFORM request build, which has now been fixed in the latest commit. Any hostname request (or lease renewal for that matter) is performed using unicast, which requires the address sourced from DHCP to be available on the host (i.e. bound to an interface). This does not happen immediately after notifying the controller about the new address, and thus any immediate INFORM hostname requests will fail to bind to an address. After trying out a couple different approaches, I settled on a context-cancellable wait for both the address assignment and network upon lease negotiation to safely continue with unicast operations. This resolves the random failures I've been seeing when testing this at a large-ish scale. Another issue I've spotted and subsequently fixed is the machine hostnames disappearing from DNS after a renewal. Turns out that at least Windows Server notices if the hostname is only sent during the initial negotiation and removes it from DNS. The most robust solution I came up with is always sending the hostname during request/renew operations, and extending the use of the INFORM hostname requests to lease renewals as well. So far so good. |
40101cf
to
dfa0a54
Compare
It looks like things fail with our "test" DHCP server, but we can fix that of course. This test DHCP server was never supposed to be complete implementation. |
It looks like upstreaming changes didn't quite work yet? Anything we can do to help? |
dfa0a54
to
fa731c8
Compare
It should keep working even if the server doesn't support or can't process DHCPINFORM requests, hostname request errors are non-fatal on purpose. There are some intermittent CI failures, but I haven't been able to pin-point then to this DHCP rewrite. Of course if you have found some specific failure cases I would be happy to try to fix them.
I will ping the maintainers once more with a message in both insomniacslk/dhcp#469 and insomniacslk/dhcp#470. Of course if you can voice the importance of these changes there might be higher chance of people finally reviewing and merging them, as I think I have now addressed all of the feedback (at least in the former PR) and I don't plan on doing any more changes in either. |
As we're going to branch off v1.2.0 stable branch today, I moved it to the next iteration - v1.3.0 |
649bcfe
to
ebff506
Compare
Great job ! Any plans to finish this one? Thanks |
91b640c
to
cb93448
Compare
@twelho thanks for your work, we plan to cut Talos beta release next week, and I want this PR to be merged before that. If the upstream changes doesn't get forward in this time, I'm looking towards forking the DHCP library with your change (or using your fork) until your PR gets merged. |
Alright, I feel like the DHCP library work is very close. I've think I've satisfied all of the maintainer's constraints from my side, so it's essentially down to them to just decide how INFORM requests should be constructed. I believe it would help if you could also notify them about the urgency one more time in this thread: insomniacslk/dhcp#470 (comment). If they can still make up their mind this week and get the work merged, I can rebase/update this PR right away. However, if it looks like it won't happen before next week, I can also finish up this work using my fork for the time being, but avoiding keeping the fork would be my preference as well. |
0c6ffeb
to
a9e257b
Compare
With insomniacslk/dhcp#470 (and insomniacslk/dhcp#498 as a minor fix) finally merged (yay 🎉), this work does not need to rely on my own fork anymore, as all of my changes have now now landed upstream. With that, I've gone through this PR one more time to remove the Note, however, that I've not yet tested this in its final form. As it's been 8 months since starting this work, I don't have access to the original test setup anymore (which also covered Windows environments), so I will need to create a new test setup for this, but I will only be able to test against |
4b1e63a
to
2a49d53
Compare
2a49d53
to
3ddaf4d
Compare
I did some (small) changes:
The question of detecting "is it DHCP hostname", "should I send the hostname" is still not 100% reliable, but seems to be better now. |
peer review in bullpen? |
/promote integration-misc |
) | ||
|
||
// ToUDPAddr combines the given net.IP and port to form a net.UDPAddr. | ||
func ToUDPAddr(ip net.IP, port uint16) (*net.UDPAddr, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
move to internal/
under operator/
?
move resources/network
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ended up removing this file and doing a straightforward &net.UDPAddr{}
initialization
/promote e2e |
/promote e2e-aws |
This adds support for automatically registering node hostnames in DNS by sending the current hostname to DHCP via option 12. If the current hostname is updated, issue a new DISCOVER to propagate the update to DHCP (updating the hostname on lease renewals is not universally supported by DHCP servers). This addition maintains the previous functionality where the node can also request its hostname from the DHCP server. The received hostname will be processed and prioritized as usual by the `network.HostnameSpecController`. This change set also contains fixes to make DHCP renewals compliant with RFC 2131, specifically avoiding sending the server identifier and requested IP address when issuing renewals using a previous offer. This also uncovered issues and missing features in the upstream `insomniacslk/dhcp` library, the fixes and improvements for which are now finally merged. Sending hostname updates have been tested against `dnsmasq` and the built-in DHCP + DNS services in Windows Server. Hostname retrieval from DHCP and edge cases with overridden hostnames from different configuration layers have been extensively tested against `dnsmasq`. Signed-off-by: Dennis Marttinen <[email protected]> Signed-off-by: Andrey Smirnov <[email protected]>
3ddaf4d
to
45c5b47
Compare
/promote integration-qemu |
/promote e2e-aws |
/promote e2e-nothing |
/m |
This adds support for automatically registering node hostnames in DNS by
sending the current hostname to DHCP via option 12. If the current hostname is
updated, issue a new DISCOVER to propagate the update to DHCP (updating the
hostname on lease renewals is not universally supported by DHCP servers). This
addition maintains the previous functionality where the node can also request
its hostname from the DHCP server. The received hostname will be processed and
prioritized as usual by the
network.HostnameSpecController
.This change set also contains fixes to make DHCP renewals compliant with RFC
2131, specifically avoiding sending the server identifier and requested IP
address when issuing renewals using a previous offer. This also uncovered an
issue in the upstream
insomniacslk/dhcp
library, for which a fix is nowpending at insomniacslk/dhcp#469. As upstream
contributions seem to have stalled, I have temporarily added a replacement for
the library to my own fork that carries the fix.
Sending hostname updates have been tested against
dnsmasq
and the built-inDHCP + DNS services in Windows Server. Hostname retrieval from DHCP and edge
cases with overridden hostnames from different configuration layers have been
extensively tested against
dnsmasq
.Fixes #4536.
Acceptance
Please use the following checklist:
make conformance
)make fmt
)make lint
)make docs
)make unit-tests
)