bosh-dns noble support #99

Open
Tracked by #892
jpalermo opened this issue Apr 15, 2024 · 12 comments

Comments

@jpalermo
Member

Noble switching from resolvconf to systemd-resolved poses a problem for bosh-dns.

Currently bosh-dns injects itself at the top of /etc/resolv.conf via the resolvconf tooling. Bosh-dns then reads the other entries in /etc/resolv.conf and stores those as upstream recursors. Since bosh-dns is at the top of /etc/resolv.conf, it is always queried first; it can then pass the query on to the recursors, and it has quite a few features to control this behavior.
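
To make the current behavior concrete, here is a minimal sketch of the "read the other entries and treat them as recursors" step. It is not the actual bosh-dns code (which is written in Go); the function name and the sample file contents are made up for illustration, and the 169.254.0.2 address is the one mentioned later in this thread.

```python
BOSH_DNS_ADDRESS = "169.254.0.2"  # bosh-dns listen address from this thread


def upstream_recursors(resolv_conf_text, own_address=BOSH_DNS_ADDRESS):
    """Return nameserver entries other than bosh-dns itself, in file order."""
    recursors = []
    for raw_line in resolv_conf_text.splitlines():
        # Simplification: treat anything after '#' or ';' as a comment.
        line = raw_line.split("#", 1)[0].split(";", 1)[0].strip()
        fields = line.split()
        if len(fields) >= 2 and fields[0] == "nameserver":
            address = fields[1]
            if address != own_address and address not in recursors:
                recursors.append(address)
    return recursors


# Hypothetical resolv.conf contents with bosh-dns injected at the top.
sample = """\
# generated by resolvconf
nameserver 169.254.0.2
nameserver 10.0.0.2
nameserver 8.8.8.8
search example.internal
"""
print(upstream_recursors(sample))  # ['10.0.0.2', '8.8.8.8']
```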

With systemd-resolved, the system resolver does much the same as bosh-dns was doing. This poses a couple of problems.

  • How do we inject bosh-dns into the configuration?
  • What do we do with the current recursor features of bosh-dns?

How do we inject bosh-dns into the configuration:
We have several options here.

  • Reconfigure systemd-resolved to use the old way of writing /etc/resolv.conf and leave bosh-dns alone. This may work, but poses a problem because systemd-resolved has a system bus API for resolving queries that bypasses /etc/resolv.conf, so as more tools switch to that, bosh-dns is left out of the loop.
  • Use the systemd-resolved APIs to register bosh-dns as a resolver. systemd-resolved has both global resolvers and interface-specific resolvers. Unfortunately, it seems like the APIs only allow you to add interface-specific resolvers, so this gets a bit weird.
  • Write the bosh-dns IP to a new file in /etc/systemd/resolved.conf.d/, where the bosh-agent writes the DNS servers it was given. This is the simplest option, but has the downside that we have no control over how other resolvers are used by systemd-resolved.

What do we do with the current recursor features of bosh-dns:
These options are limited by the choice above.

  • Figure out some way to remove all other resolvers from the systemd-resolved configuration and keep bosh-dns in charge of all DNS resolution. This prevents us from using much of the new good behavior provided by systemd-resolved.
  • Disable the bosh-dns upstream resolver functionality in noble. This would leave systemd-resolved in complete control of resolution, and bosh-dns would only be responsible for bosh-specific queries. The DNS configuration for bosh-dns could even include routing domain information so systemd-resolved would only query bosh-dns for the domains bosh-dns actually knows about.
@klakin-pivotal
Contributor

You may already have investigated this, and/or have considered it and disregarded it as being bad. Apologies for the noise if so.

Given that the systemd-resolved APIs don't really do what we want, might it be worth trying to set [Resolve] ... DNSStubListener=no (and maybe also LLMNR=no and MulticastDNS=no) in resolved.conf https://www.freedesktop.org/software/systemd/man/latest/resolved.conf.html#Options, and using the resolvconf-compatibility mode of resolvectl to manage resolv.conf https://www.freedesktop.org/software/systemd/man/latest/resolvectl.html#Compatibility%20with%0A%20%20%20%20resolvconf(8)?
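
As a rough illustration of the settings named above, the sketch below checks whether a resolved.conf text already carries them. The three option names come from the comment; the helper name and the sample text are hypothetical, and Python's configparser is only an approximation of how systemd parses its own files.

```python
import configparser

# Options suggested in the comment above.
SUGGESTED = {"DNSStubListener": "no", "LLMNR": "no", "MulticastDNS": "no"}


def missing_settings(resolved_conf_text, wanted=SUGGESTED):
    """Return the wanted [Resolve] keys that are absent or set differently."""
    parser = configparser.ConfigParser(strict=False, interpolation=None)
    parser.optionxform = str  # keep key case instead of lowercasing
    parser.read_string(resolved_conf_text)
    section = parser["Resolve"] if parser.has_section("Resolve") else {}
    return {key: value for key, value in wanted.items()
            if section.get(key) != value}


# Hypothetical file contents: the stub listener is off, the rest is not.
sample = """\
[Resolve]
DNSStubListener=no
LLMNR=yes
"""
print(missing_settings(sample))  # {'LLMNR': 'no', 'MulticastDNS': 'no'}
```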

@jpalermo
Member Author

Does that avoid the problems in the first option above?

...but poses a problem because systemd-resolved has a system bus API for resolving queries that bypasses /etc/resolv.conf, so as more tools switch to that, bosh-dns is left out of the loop

As more things start ignoring resolv.conf and use the systemd-resolved system bus APIs instead, continuing to configure bosh-dns only through resolv.conf becomes a bigger problem.

@klakin-pivotal
Contributor

Does that avoid the problems in the first option above?

Nope. That first option actually totally precludes the first half of the thing I suggested. It's amazing that there's a documented way to shut down everything but the dbus query interface. Sorry for the noise.

@rkoster rkoster moved this from Inbox to Waiting for Changes | Open for Contribution in Foundational Infrastructure Working Group Apr 18, 2024
@jpalermo
Member Author

Write the bosh-dns IP to a new file in /etc/systemd/resolved.conf.d/, where the bosh-agent writes the DNS servers it was given. This is the simplest option, but has the downside that we have no control over how other resolvers are used by systemd-resolved.

I believe that this does not work. Yes, it's possible to use this to add a new "global" dns server to systemd-resolved, but my understanding of what systemd-resolved would do with that was incorrect.

I was able to change bosh-dns to disable all recursing and to add a reference to itself in /etc/systemd/resolved.conf.d/ rather than using resolvconf, but once it was added systemd-resolved did not do what was desired. It was "possible" that it would resolve a query for a bosh dns address, but it was also just as likely to return an NXDOMAIN response.

The documentation for systemd-resolved mentions that it calls all servers in parallel looking for a response, and since NXDOMAIN is a valid response, if that comes back, that is the returned response.

My next attempt was to modify that configuration file by adding a route-only domains section that marks the bosh-dns server as valid only for those particular domains, so that systemd-resolved would take queries matching those domains and send them to this server.

This also does not work. It seems that systemd-resolved focuses more on interfaces than on DNS servers. Since I was adding the Domains section to a global DNS configuration, it was simply ignored. To use domain-specific DNS resolution, you must configure the network interface with the domain, not the global configuration.
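
For reference, a drop-in of the kind described above might be rendered like this. This is only a sketch of the attempted (and, per this comment, ineffective) global configuration: the helper name is made up, and "bosh" is used as an example domain rather than taken from an actual deployment. The "~" prefix is how systemd-resolved marks a domain as route-only.

```python
import configparser
import io


def render_dropin(dns_server, routing_domains):
    """Render a [Resolve] drop-in with one DNS server and route-only domains."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep key case instead of lowercasing
    parser["Resolve"] = {
        "DNS": dns_server,
        # A leading "~" marks each domain as route-only.
        "Domains": " ".join("~" + domain for domain in routing_domains),
    }
    out = io.StringIO()
    parser.write(out, space_around_delimiters=False)
    return out.getvalue()


# Example values: the bosh-dns address from this thread, a placeholder domain.
print(render_dropin("169.254.0.2", ["bosh"]))
```

The text would be written to a file under /etc/systemd/resolved.conf.d/; the file name is left open here because the thread does not state it.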

The 169.254.0.2/32 address used by bosh-dns is not an interface, but a second IP on the loopback interface. It may be possible to add the Domains= setting to the loopback device while associating the 169.254.0.2 DNS server with that interface, and it might simply "work", but systemd-resolved generally does not treat the loopback device as a configured interface.

It seems like the dbus API is the most practical way to configure the loopback interface DNS, but I haven't been able to figure that out yet.

@jpalermo
Member Author

New findings.

I was wrong before about how it operates, and some of the docs seem to be wrong too.

systemd-resolved only has a single active DNS server at any one time for each interface and for the "global" state. It assumes additional servers on the same interface all behave the same, so it doesn't query them in parallel. The "current servers" for global state and each interface are all queried in parallel by systemd-resolved.
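
A toy model of the behavior described above, to show why bosh-dns cannot simply share a scope with other servers. This is only a sketch of my understanding, not systemd-resolved's real selection logic: it assumes the first configured server in each scope is the "current" one, and all addresses other than 169.254.0.2 are made up.

```python
def servers_queried(scopes):
    """Pick the single "current" server from each scope that has servers.

    `scopes` maps a scope name ("global" or an interface name) to its ordered
    list of configured DNS servers. Assumption: the first entry is current;
    the real daemon may switch to another entry after failures.
    """
    return {name: servers[0] for name, servers in scopes.items() if servers}


# Hypothetical setup: bosh-dns shares the global scope with another server.
scopes = {
    "global": ["10.0.0.2", "169.254.0.2"],
    "eth0": ["169.254.169.254"],
}
print(servers_queried(scopes))
# {'global': '10.0.0.2', 'eth0': '169.254.169.254'}
```

In this model bosh-dns is never asked, because it is not the current server of any scope.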

Currently the bosh-agent configures a global server. For bosh-dns to work, we'd need it to be the only global server or the only server on a particular interface. Both resolvectl and the dbus API refuse to configure DNS servers on the loopback interface.

So one possible scenario would be to have bosh-dns configure itself as the single global server and have the bosh-agent, instead of using the global space, place the provided DNS configuration directly on the other interfaces (normally just eth0, I'm guessing).

@jpalermo
Member Author

My testing was done with GCP where we surprisingly use DHCP for network configuration. This seems to be so the interface is able to discover the GCP provided DNS servers.

However, systemd-resolved doesn't seem to have any way to configure "additional" DNS servers for an interface. We could always have the agent wait for the networking to come online and then modify the DNS servers for the interface manually, but that seems like an awkward interaction.

Something I haven't yet tested is whether, if the agent configures the interface directly rather than through the config files, that configuration will "stick" once DHCP updates the DNS servers for the interface.

If that doesn't work, ripping out systemd-resolved is looking like our best option.

Another option that would work is putting bosh-dns directly on the stemcell, so the agent can always configure it as the single global resolver and can then configure the bosh-dns resolvers with the DNS servers provided in settings.json.

bosh-dns on the stemcell does seem very sane at this point, but it also seems like a lot of work to get it there and to provide a way for the config to be updated later with additional configuration.

@jpalermo
Member Author

With some help from @cunnie, we managed to get it working.

Currently, bosh-dns creates an additional IP on the loopback device. We changed the behavior so that under noble it instead creates a new virtual interface of type dummy and binds the 169.254.0.2 address to that interface instead of the loopback interface.

This allows us to then have systemd-resolved pick up the bosh-dns DNS server IP address from this interface and use it for resolving queries without worrying which DNS server is the "current" one.
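
Roughly, the setup corresponds to the following commands. This sketch only builds the command lines and does not run them (they need root), and it is not necessarily how the change itself is implemented. The interface name "bosh-dns" is a placeholder, and the /32 prefix matches the address as it was used on loopback.

```python
def dummy_interface_commands(interface, address):
    """Build (not run) the commands that set up a dummy link for bosh-dns."""
    return [
        ["ip", "link", "add", interface, "type", "dummy"],
        ["ip", "addr", "add", address + "/32", "dev", interface],
        ["ip", "link", "set", interface, "up"],
        # Register the address as this link's DNS server in systemd-resolved.
        ["resolvectl", "dns", interface, address],
    ]


# "bosh-dns" is a hypothetical interface name used only for illustration.
for command in dummy_interface_commands("bosh-dns", "169.254.0.2"):
    print(" ".join(command))
```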

We also did work to populate the "Domain" configuration. Since systemd-resolved queries DNS servers for all interfaces by default, and since bosh-dns always uses a TTL of 0, bosh-dns records never get cached by systemd-resolved. This means the external DNS servers would get copies of all the bosh-dns queries, which could put them under unexpected additional load. By populating the "Domain" configuration for the virtual interface with both the bosh-dns domains and any alias domains found on the system, systemd-resolved sends those queries only to bosh-dns and not to any of the external DNS servers.
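
A simplified sketch of that domain routing idea: merge the bosh-dns domains with the alias domains, mark each as route-only with a "~" prefix, and check which query names would fall under them. The function names and example domains are hypothetical, and the suffix match is a simplification of what systemd-resolved actually does.

```python
def routing_domains(bosh_domains, alias_domains):
    """Merge both lists, drop duplicates, and prefix each with "~"."""
    seen = []
    for domain in list(bosh_domains) + list(alias_domains):
        name = domain.strip(".").lower()
        if name and name not in seen:
            seen.append(name)
    return ["~" + name for name in seen]


def goes_to_bosh_dns(query, domains):
    """True if the query equals or falls under one of the routing domains."""
    name = query.strip(".").lower()
    return any(name == d[1:] or name.endswith("." + d[1:]) for d in domains)


# Example domains are placeholders, not read from a real system.
domains = routing_domains(["bosh."], ["internal.example", "bosh"])
print(domains)  # ['~bosh', '~internal.example']
print(goes_to_bosh_dns("q-s0.web.default.dep.bosh", domains))  # True
print(goes_to_bosh_dns("example.com", domains))  # False
```

The resulting list could then be passed to something like `resolvectl domain <interface> ...` for the dummy interface.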


@ramonskie
Contributor

A PR is under review: #100

@max-soe

max-soe commented Jun 14, 2024

As we discussed yesterday in the WG meeting, we should start with a list of bosh-dns features that we lose with this new implementation, and also find workarounds/solutions to achieve the same with Noble. I'll start a list here; we will sync internally in the next few days to maybe find more. If you have any other features in mind, feel free to add them:

@beyhan
Member

beyhan commented Jun 14, 2024

@max-soe I think you forgot to add the link.

I can think of:

@ramonskie
Contributor

systemd-resolved:

  • caches all requests by default

Logging can be enabled with resolvectl log-level debug. As we already run some resolvectl commands in bosh-dns, we should be able to add this command there.
A different option would be for bosh-dns to set the resolved config.

@ramonskie
Contributor

Prometheus is able to pull in systemd-resolved metrics: prometheus-community/systemd_exporter#119
