Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: dhcpv4: send current hostname, fix spec compliance of renewals #5897

Merged
merged 1 commit into from
Mar 29, 2023

Conversation

twelho
Copy link
Contributor

@twelho twelho commented Jul 12, 2022

This adds support for automatically registering node hostnames in DNS by
sending the current hostname to DHCP via option 12. If the current hostname is
updated, issue a new DISCOVER to propagate the update to DHCP (updating the
hostname on lease renewals is not universally supported by DHCP servers). This
addition maintains the previous functionality where the node can also request
its hostname from the DHCP server. The received hostname will be processed and
prioritized as usual by the network.HostnameSpecController.

This change set also contains fixes to make DHCP renewals compliant with RFC
2131, specifically avoiding sending the server identifier and requested IP
address when issuing renewals using a previous offer. This also uncovered an
issue in the upstream insomniacslk/dhcp library, for which a fix is now
pending at insomniacslk/dhcp#469. As upstream
contributions seem to have stalled, I have temporarily added a replacement for
the library to my own fork that carries the fix.

Sending hostname updates have been tested against dnsmasq and the built-in
DHCP + DNS services in Windows Server. Hostname retrieval from DHCP and edge
cases with overridden hostnames from different configuration layers have been
extensively tested against dnsmasq.

Fixes #4536.

Acceptance

Please use the following checklist:

  • you linked an issue (if applicable)
  • you included tests (if applicable)
  • you ran conformance (make conformance)
  • you formatted your code (make fmt)
  • you linted your code (make lint)
  • you generated documentation (make docs)
  • you ran unit-tests (make unit-tests)

@twelho twelho force-pushed the dhcpv4-send-hostname branch from 23bd03e to 0b4440d Compare July 12, 2022 12:22
@smira smira added this to the v1.2 milestone Jul 12, 2022
@smira
Copy link
Member

smira commented Jul 12, 2022

/ok-to-test

Copy link
Member

@smira smira left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for your work, and this looks really great!

go.mod Outdated
@@ -3,6 +3,10 @@ module github.com/talos-systems/talos
go 1.18

replace (
// Forked insomniacslk/dhcp with DHCP spec compliancy fixes
// Upstream PR: https://github.com/insomniacslk/dhcp/pull/469
github.com/insomniacslk/dhcp => github.com/twelho/dhcp v0.0.0-20220708120125-da9fc6c72fb5
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would really love to avoid the need to use the fork unless of course we can't get any feedback from the maintainer.

let's see if we can try to ping them on that PR

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alright, I've pinged the people I believe to be the maintainers in insomniacslk/dhcp#469 now. Let's see if they answer.

func (d *DHCP4) setupHostnameWatch(ctx context.Context) (<-chan state.Event, error) {
hostnameWatchCh := make(chan state.Event)

return hostnameWatchCh, d.state.WatchKind(ctx, hostnameStatusMetadata, hostnameWatchCh)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
return hostnameWatchCh, d.state.WatchKind(ctx, hostnameStatusMetadata, hostnameWatchCh)
return hostnameWatchCh, d.state.Watch(ctx, hostnameStatusMetadata, hostnameWatchCh)

This should be Watch to watch a specific resource instance here

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated. Is there only ever a single instance of HostnameStatus, or why is Watch with resource.VersionUndefined preferred to using WatchKind here? Just curious.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Metadata consists of namespace, type, id and version.
WatchKind allows you watching items passing only namespace and type what makes it return the list of items of this kind.
There is always a single HostnameStatus resource in Talos and the id for it is known, so you can use Watch which always watches a single resource.

return
case <-time.After(renewInterval):
case <-hostnameWatchCh:
err := d.updateHostname(ctx, &hostname)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in fact the event itself carries the new hostname, so we don't need to get it once again.

we should check the even type, but basically something like:

if event.Type == state.Created || event.Type == state.Updated {
   hostname = event.Resource.(*network.HostnameStatus).TypedSpec().Hostname
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've now removed updateHostname altogether and use the hostname directly from the event. Thanks for the pointer. The check needs to cover all event types, since hostname deletion should also indicate to the DHCP server that the node has lost its previous hostname. I'm doing a checked cast to network.HostnameStatus, which fails intentionally if it gets fed a Tombstone generated by a deletion event. I've also added a comment about this.

}

// knownHostname checks if the given hostname has been defined by this operator.
func (d *DHCP4) knownHostname(hostname *string) bool {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if I should do a little bit of refactoring and also put the hostname source into the HostnameStatus resource, this way we can avoid this comparison in favor of checking if hostname source is the operator

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a good idea. knownHostname really is a hack, and it would be useful to know which of the hostname candidates was elected as the node hostname for transparency and debugging as well. It would be best if the HostnameStatus source be used to identify an individual HostnameSpec, as just knowing the configuration layer of the source is not enough (e.g. ambiguity ensues when both the dhcp4 and dhcp6 operators have HostnameSpecs defined in the operator layer).

The current functionality with knownHostname works as-is if you prefer to merge this PR first, but I can also rebase on top of the changes to HostnameStatus if that's better for you.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks, I think I can get in some changes early next week for that, as we're still waiting for upstream dhcp library fixes?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, go for it. I can then rebase on top of that work. There's one more issue with the hostname "parroting" prevention as well (not directly related to knownHostname), the approach needs to be changed a bit to work even when the only hostname (apart from the default hostname) is defined by DHCP. I'll look at fixing that early next week.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The new approach is now in place and confirmed working even in the new edge cases I found. When the locally defined hostname and the hostname served by DHCP are the same, both now register correctly.

@twelho
Copy link
Contributor Author

twelho commented Jul 13, 2022

The CI build job seems to have failed due to a temporary issue in cloning the repo, it needs a restart.

@twelho
Copy link
Contributor Author

twelho commented Jul 13, 2022

The original (pre-review) version of the sources can be found here: https://github.com/twelho/talos/tree/dhcpv4-send-hostname-v1

@twelho
Copy link
Contributor Author

twelho commented Jul 13, 2022

The functionality of e85233f has now been tested against dnsmasq, including edge cases. No issues found.

@twelho twelho force-pushed the dhcpv4-send-hostname branch from e85233f to 56a757c Compare July 19, 2022 14:33
@twelho
Copy link
Contributor Author

twelho commented Jul 19, 2022

I've split the hostname sending and requesting operations into two separate invocations to completely sidestep the DHCP server "parroting" issue, where the DHCP server just automatically replies with the hostname sent by the client without verifying that it is actually registered on the server.

The current hostname is always sent when acquiring a new lease (which happens also when the local hostname changes, e.g. due to a machine configuration edit), and the DHCP-served hostname is requested right after with an INFORM request. This avoids needing to wait for the first RENEW, which may also never happen depending on the server configuration. RENEWs will still always request the hostname to detect and apply changes done on the DHCP server side. The INFORM support additionally requires some upstream improvements, which I've submitted at insomniacslk/dhcp#470.

@twelho twelho force-pushed the dhcpv4-send-hostname branch 4 times, most recently from e5624bc to 40101cf Compare July 27, 2022 13:27
@twelho
Copy link
Contributor Author

twelho commented Jul 27, 2022

I've identified a race condition when testing the INFORM request build, which has now been fixed in the latest commit. Any hostname request (or lease renewal for that matter) is performed using unicast, which requires the address sourced from DHCP to be available on the host (i.e. bound to an interface). This does not happen immediately after notifying the controller about the new address, and thus any immediate INFORM hostname requests will fail to bind to an address. After trying out a couple different approaches, I settled on a context-cancellable wait for both the address assignment and network upon lease negotiation to safely continue with unicast operations. This resolves the random failures I've been seeing when testing this at a large-ish scale.

Another issue I've spotted and subsequently fixed is the machine hostnames disappearing from DNS after a renewal. Turns out that at least Windows Server notices if the hostname is only sent during the initial negotiation and removes it from DNS. The most robust solution I came up with is always sending the hostname during request/renew operations, and extending the use of the INFORM hostname requests to lease renewals as well. So far so good.

@twelho twelho force-pushed the dhcpv4-send-hostname branch from 40101cf to dfa0a54 Compare August 8, 2022 13:12
@smira
Copy link
Member

smira commented Aug 8, 2022

It looks like things fail with our "test" DHCP server, but we can fix that of course. This test DHCP server was never supposed to be complete implementation.

@smira
Copy link
Member

smira commented Aug 8, 2022

It looks like upstreaming changes didn't quite work yet? Anything we can do to help?

@twelho twelho force-pushed the dhcpv4-send-hostname branch from dfa0a54 to fa731c8 Compare August 8, 2022 14:58
@twelho
Copy link
Contributor Author

twelho commented Aug 8, 2022

It looks like things fail with our "test" DHCP server, but we can fix that of course. This test DHCP server was never supposed to be complete implementation.

It should keep working even if the server doesn't support or can't process DHCPINFORM requests, hostname request errors are non-fatal on purpose. There are some intermittent CI failures, but I haven't been able to pin-point then to this DHCP rewrite. Of course if you have found some specific failure cases I would be happy to try to fix them.

It looks like upstreaming changes didn't quite work yet? Anything we can do to help?

I will ping the maintainers once more with a message in both insomniacslk/dhcp#469 and insomniacslk/dhcp#470. Of course if you can voice the importance of these changes there might be higher chance of people finally reviewing and merging them, as I think I have now addressed all of the feedback (at least in the former PR) and I don't plan on doing any more changes in either.

@smira
Copy link
Member

smira commented Aug 15, 2022

As we're going to branch off v1.2.0 stable branch today, I moved it to the next iteration - v1.3.0

@twelho twelho force-pushed the dhcpv4-send-hostname branch 2 times, most recently from 649bcfe to ebff506 Compare September 10, 2022 21:39
@dimatha
Copy link

dimatha commented Feb 26, 2023

Great job ! Any plans to finish this one? Thanks

@twelho twelho force-pushed the dhcpv4-send-hostname branch from 91b640c to cb93448 Compare February 28, 2023 15:48
@smira
Copy link
Member

smira commented Mar 23, 2023

@twelho thanks for your work, we plan to cut Talos beta release next week, and I want this PR to be merged before that.

If the upstream changes doesn't get forward in this time, I'm looking towards forking the DHCP library with your change (or using your fork) until your PR gets merged.

@twelho
Copy link
Contributor Author

twelho commented Mar 23, 2023

Alright, I feel like the DHCP library work is very close. I've think I've satisfied all of the maintainer's constraints from my side, so it's essentially down to them to just decide how INFORM requests should be constructed. I believe it would help if you could also notify them about the urgency one more time in this thread: insomniacslk/dhcp#470 (comment).

If they can still make up their mind this week and get the work merged, I can rebase/update this PR right away. However, if it looks like it won't happen before next week, I can also finish up this work using my fork for the time being, but avoiding keeping the fork would be my preference as well.

@twelho twelho force-pushed the dhcpv4-send-hostname branch 2 times, most recently from 0c6ffeb to a9e257b Compare March 27, 2023 19:59
@twelho
Copy link
Contributor Author

twelho commented Mar 27, 2023

With insomniacslk/dhcp#470 (and insomniacslk/dhcp#498 as a minor fix) finally merged (yay 🎉), this work does not need to rely on my own fork anymore, as all of my changes have now now landed upstream.

With that, I've gone through this PR one more time to remove the replace directive and clean up both the comments and logic. One more mistake was spotted and fixed, namely avoiding sending the server identifier together with INFORM requests as that is against the spec.

Note, however, that I've not yet tested this in its final form. As it's been 8 months since starting this work, I don't have access to the original test setup anymore (which also covered Windows environments), so I will need to create a new test setup for this, but I will only be able to test against dnsmasq. I would appreciate if others would also be able to confirm that this works as intended using their infrastructure.

@twelho twelho force-pushed the dhcpv4-send-hostname branch 2 times, most recently from 4b1e63a to 2a49d53 Compare March 27, 2023 21:04
@smira smira force-pushed the dhcpv4-send-hostname branch from 2a49d53 to 3ddaf4d Compare March 29, 2023 13:36
@smira
Copy link
Member

smira commented Mar 29, 2023

I did some (small) changes:

  • reworked our test dhcpd to be compliant with the new renewal protocol
  • did some changes around the hostname change detection logic, there were some changes to Talos since the time you started your work @twelho (default hostnames), in my testing it seems to work for all cases reasonably well
  • some cosmetic changes

The question of detecting "is it DHCP hostname", "should I send the hostname" is still not 100% reliable, but seems to be better now.

@frezbo
Copy link
Member

frezbo commented Mar 29, 2023

peer review in bullpen?

@smira
Copy link
Member

smira commented Mar 29, 2023

/promote integration-misc

)

// ToUDPAddr combines the given net.IP and port to form a net.UDPAddr.
func ToUDPAddr(ip net.IP, port uint16) (*net.UDPAddr, error) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

move to internal/ under operator/ ?

move resources/network?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ended up removing this file and doing a straightforward &net.UDPAddr{} initialization

@smira
Copy link
Member

smira commented Mar 29, 2023

/promote e2e

@smira
Copy link
Member

smira commented Mar 29, 2023

/promote e2e-aws

This adds support for automatically registering node hostnames in DNS by
sending the current hostname to DHCP via option 12. If the current hostname is
updated, issue a new DISCOVER to propagate the update to DHCP (updating the
hostname on lease renewals is not universally supported by DHCP servers). This
addition maintains the previous functionality where the node can also request
its hostname from the DHCP server. The received hostname will be processed and
prioritized as usual by the `network.HostnameSpecController`.

This change set also contains fixes to make DHCP renewals compliant with RFC
2131, specifically avoiding sending the server identifier and requested IP
address when issuing renewals using a previous offer. This also uncovered
issues and missing features in the upstream `insomniacslk/dhcp` library, the
fixes and improvements for which are now finally merged.

Sending hostname updates have been tested against `dnsmasq` and the built-in
DHCP + DNS services in Windows Server. Hostname retrieval from DHCP and edge
cases with overridden hostnames from different configuration layers have been
extensively tested against `dnsmasq`.

Signed-off-by: Dennis Marttinen <[email protected]>
Signed-off-by: Andrey Smirnov <[email protected]>
@smira smira force-pushed the dhcpv4-send-hostname branch from 3ddaf4d to 45c5b47 Compare March 29, 2023 17:07
@smira
Copy link
Member

smira commented Mar 29, 2023

/promote integration-qemu

@smira
Copy link
Member

smira commented Mar 29, 2023

/promote e2e-aws

@smira
Copy link
Member

smira commented Mar 29, 2023

/promote e2e-nothing

@smira
Copy link
Member

smira commented Mar 29, 2023

/m

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

network: send node hostname to DHCP server
7 participants