storage.md instructions for disabling mayastor nvme-tcp check #10064

Open
wants to merge 1 commit into base: main

timolson

Documentation addition to further describe disabling of mayastor's nvme-tcp check

@timolson
Author

I still cannot get Mayastor working on Talos 1.9.0, so this edit may not be helpful. After adding nvme-tcp to the kernel modules (an instruction which is also missing from the docs), it was no longer necessary to disable this init container, since it did detect nvme-tcp successfully. However, Mayastor is still not working on 1.9.0.

The OpenEBS ZFS provisioner also seems broken in 1.9.0.

Let me also say that the migration from 1.8.3 to 1.9.0 has been very painful for us. I would hope that minor dot releases are basically backward compatible, but the alteration of security contexts (and perhaps default modules? How did this ever work previously without explicitly loading nvme-tcp?) broke many things for us. :((((

The documentation in both Talos and OpenEBS seems outdated. If I can get Mayastor running on 1.9.0, I'll send another documentation PR.

@timolson timolson closed this Dec 27, 2024
@timolson
Author

I've confirmed that these changes are necessary.

I also added the nvme-tcp module explicitly to my machineconfigs, but it's not clear if this is necessary or if that is a default module in Talos.
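
For readers following along, a machine config patch that loads the module explicitly might look like the sketch below. It assumes the standard Talos `machine.kernel.modules` list; the module name `nvme_tcp` (underscore form) is an assumption here, and whether this patch is needed at all on a given Talos release is exactly the open question in this thread.

```yaml
# Hypothetical machine config patch: explicitly load the NVMe-over-TCP
# kernel module on each node that will run Mayastor.
# Assumption: the module is shipped with the Talos kernel but not loaded by default.
machine:
  kernel:
    modules:
      - name: nvme_tcp
```

If the module turns out to be loaded by default, the patch should be harmless but redundant.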

It also required an upgrade to 1.9.1, since the deadlock bug in 1.9.0 was a significant problem.

Please incorporate the doc changes in this PR, and if it makes sense to also update the machineconfig patch with the nvme-tcp module, I can submit a PR for that, too. LMK.

@timolson timolson reopened this Dec 27, 2024
@dhess

dhess commented Dec 30, 2024

I'm confused: what has changed that makes it necessary to disable Mayastor's init container with Talos v1.9.x? We've been using Mayastor successfully with Talos (pre-v1.9) for many months with no issues and no need to make any changes to Mayastor or the list of loaded modules.

@timolson
Author

timolson commented Dec 30, 2024

Note that the instructions to disable the nvme-tcp init container were already in the docs. This PR merely adds more explicit instructions about how to perform the disable.

Without this fix, the local-pv system seemed to mostly initialize, but Mayastor did not. Many pods were stuck on their init containers, which I traced to the nvme-tcp check before I even saw the existing note in your docs.

The previous document writer said it was due to not mounting /sys, but perhaps the reason has to do with how modules built into the kernel are detected vs. modules that are loaded? This is not my expertise, but Mayastor does not run on my fresh bare-metal Talos cluster with 1.9.x unless I disable that init container.

Many other things also broke for us when moving to 1.9.x, possibly due to related, stricter security restrictions in 1.9.x.

@dhess

dhess commented Dec 30, 2024

Ahh, I see that we already have this disabled in our Helm values for Mayastor/OpenEBS:

mayastor:
  enabled: true
  ...
  csi:
    node:
      initContainers:
        enabled: false

which explains why we haven't needed to edit the manifests.

2 participants