SPLAT-1811: Add vSphere multi disk support #1290

Draft: wants to merge 2 commits into base: master from the vsphere_disk branch

Conversation


@vr4manta (Contributor) commented Sep 18, 2024

SPLAT-1811

Changes

  • Add logic for creating additional disks to new machines
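
For illustration, a minimal sketch of the kind of per-machine data-disk definition this change introduces. The DataDisks field name comes from the diff discussed later in this PR; the stand-in struct and its fields are assumptions, not the final openshift/api shape.

```go
package main

import "fmt"

// vSphereDataDisk is a stand-in for the real provider-spec type; only the
// idea (a named disk with a size, in addition to the OS disk) is taken from
// this PR, the field names here are illustrative.
type vSphereDataDisk struct {
	Name    string
	SizeGiB int32
}

func main() {
	dataDisks := []vSphereDataDisk{
		{Name: "data0", SizeGiB: 100},
		{Name: "data1", SizeGiB: 50},
	}
	fmt.Printf("machine requests %d additional disks beyond the OS disk\n", len(dataDisks))
}
```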

Prerequisites

@openshift-ci bot added the do-not-merge/work-in-progress label (indicates that a PR should not merge because it is a work in progress) on Sep 18, 2024.
@openshift-merge-robot added the needs-rebase label (indicates a PR cannot be merged because it has merge conflicts with HEAD) on Sep 18, 2024.

openshift-ci bot commented Sep 18, 2024

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@openshift-merge-robot removed the needs-rebase label on Sep 18, 2024.
@vr4manta force-pushed the vsphere_disk branch 4 times, most recently from b05d938 to cb52ddf, on September 23, 2024 at 12:33.
@vr4manta changed the title from "WIP: Vsphere disk" to "SPLAT-1811: Vsphere disk" on Sep 23, 2024.
@openshift-ci-robot added the jira/valid-reference label (indicates that this PR references a valid Jira ticket of any type) on Sep 23, 2024.

openshift-ci-robot commented Sep 23, 2024

@vr4manta: This pull request references SPLAT-1811 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the task to target the "4.18.0" version, but no target version was set.

In response to this:

SPLAT-1811

Changes

  • Add logic for creating additional disks to new machines

Prerequisites

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@vr4manta changed the title from "SPLAT-1811: Vsphere disk" to "SPLAT-1811: Add vSphere multi disk support" on Sep 23, 2024.
@vr4manta (Contributor Author) commented:

/test all

@openshift-merge-robot added the needs-rebase label on Sep 24, 2024.
Comment on lines 464 to 465
// Currently, MAPI does not provide any API knobs to configure additional volumes for a VM.
// So, we are expecting the VM to have only one disk, which is OS disk.

Contributor:
This comment will need to be updated

Contributor Author:
Will do. Thanks!

func getAdditionalDiskSpecs(s *machineScope, devices object.VirtualDeviceList, datastore *object.Datastore) ([]types.BaseVirtualDeviceConfigSpec, error) {
	var diskSpecs []types.BaseVirtualDeviceConfigSpec

	klog.InfoS("About to iterate through disks", "disk", s.providerSpec.Disks)

Contributor:
Debug?

Contributor Author:
I'll update those. Thanks!

Contributor Author:
I removed this message.


klog.InfoS("About to iterate through disks", "disk", s.providerSpec.Disks)

klog.Infof("Feature Gates: %v", s.featureGates)

Contributor:
Debug?

Contributor Author:
I'll update those. Thanks!

Contributor Author:
I removed these log messages.
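
For readers following along, here is a rough, hedged sketch of the kind of device change a helper like getAdditionalDiskSpecs would build with govmomi. The controller lookup, sizing approach, and function name are illustrative assumptions rather than the PR's actual implementation.

```go
package vsphere

import (
	"fmt"

	"github.com/vmware/govmomi/object"
	"github.com/vmware/govmomi/vim25/types"
)

// buildDataDiskSpecs sketches one "add disk" device change per requested size.
// It is a simplified stand-in for the getAdditionalDiskSpecs helper above.
func buildDataDiskSpecs(devices object.VirtualDeviceList, datastore *object.Datastore, sizesGiB []int32) ([]types.BaseVirtualDeviceConfigSpec, error) {
	controller, err := devices.FindSCSIController("")
	if err != nil {
		return nil, fmt.Errorf("finding SCSI controller: %w", err)
	}

	var specs []types.BaseVirtualDeviceConfigSpec
	for _, size := range sizesGiB {
		// CreateDisk attaches the new disk to the given controller; an empty
		// backing name lets vSphere pick the .vmdk filename during creation.
		disk := devices.CreateDisk(controller, datastore.Reference(), "")
		disk.CapacityInBytes = int64(size) * 1024 * 1024 * 1024

		specs = append(specs, &types.VirtualDeviceConfigSpec{
			Operation:     types.VirtualDeviceConfigSpecOperationAdd,
			FileOperation: types.VirtualDeviceConfigSpecFileOperationCreate,
			Device:        disk,
		})
	}
	return specs, nil
}
```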

if len(s.providerSpec.Disks) > 0 && !s.featureGates.Enabled(featuregate.Feature(apifeatures.FeatureGateVSphereMultiDisk)) {
	return nil, machinecontroller.InvalidMachineConfiguration(
		"machines cannot contain additional disks due to VSphereMultiDisk feature gate being disabled")
}

Contributor:
This needs to happen at the providerSpec validation webhook level as well

Contributor Author:
Ok, I'll copy this up there too. Thanks!
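
As a rough sketch of the webhook-side counterpart being asked for here (the function name, parameter shapes, and field path are illustrative, not the PR's actual webhook code), the providerSpec validation could reject additional disks whenever the gate is off:

```go
package webhooks

import "k8s.io/apimachinery/pkg/util/validation/field"

// validateDataDisksFeatureGate mirrors the actuator-side check above at
// admission time. Names and parameter shapes are assumptions.
func validateDataDisksFeatureGate(numDataDisks int, multiDiskGateEnabled bool, fldPath *field.Path) field.ErrorList {
	var errs field.ErrorList
	if numDataDisks > 0 && !multiDiskGateEnabled {
		errs = append(errs, field.Forbidden(fldPath.Child("dataDisks"),
			"additional disks cannot be used until the VSphereMultiDisk feature gate is enabled"))
	}
	return errs
}
```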

@@ -547,7 +547,8 @@ func TestNodeGetter(t *testing.T) {
 	testEnv := &envtest.Environment{
 		CRDDirectoryPaths: []string{
 			filepath.Join("..", "..", "..", "install"),
-			filepath.Join("..", "..", "..", "vendor", "github.com", "openshift", "api", "config", "v1", "zz_generated.crd-manifests")},
+			filepath.Join("..", "..", "..", "vendor", "github.com", "openshift", "api", "config", "v1", "zz_generated.crd-manifests"),
+			filepath.Join("..", "..", "..", "third_party", "cluster-api", "crds")},

Contributor:
What new CRDs are needed?

Contributor Author:
It seems that once I fixed the loading of feature gates, some existing tests started to fail with:

E1113 15:26:22.870540 2519608 actuator.go:65] "test" error: test: reconciler failed to Delete machine: unable to remove finalizer for IP address claims: unable to list IPAddressClaims: no matches for kind "IPAddressClaim" in version "ipam.cluster.x-k8s.io/v1beta1"
actuator_test.go:387:
    Unexpected error:
        <*fmt.wrapError | 0xc00073f880>:
        test: reconciler failed to Delete machine: unable to remove finalizer for IP address claims: unable to list IPAddressClaims: no matches for kind "IPAddressClaim" in version "ipam.cluster.x-k8s.io/v1beta1"
        {
            msg: "test: reconciler failed to Delete machine: unable to remove finalizer for IP address claims: unable to list IPAddressClaims: no matches for kind "IPAddressClaim" in version "ipam.cluster.x-k8s.io/v1beta1"",
            err: <*fmt.wrapError | 0xc000f9dba0>{
                msg: "unable to remove finalizer for IP address claims: unable to list IPAddressClaims: no matches for kind "IPAddressClaim" in version "ipam.cluster.x-k8s.io/v1beta1"",
                err: <*fmt.wrapError | 0xc000f9db80>{
                    msg: "unable to list IPAddressClaims: no matches for kind "IPAddressClaim" in version "ipam.cluster.x-k8s.io/v1beta1"",
                    err: <*meta.NoKindMatchError | 0xc000b54d80>{
                        GroupKind: {
                            Group: "ipam.cluster.x-k8s.io",
                            Kind: "IPAddressClaim",
                        },
                        SearchedVersions: ["v1beta1"],
                    },
                },
            },
        }
    occurred

I was able to fix this by including the CRDs that are provided by the cluster-capi-operator. Sadly, Go cannot pull in that directory as a package because there is no .go file alongside the CRD YAML files, so vendoring does not bring them in. Maybe in a separate PR we can fix the other repo to add a dummy Go file there so we can remove this copy. Note that we will still need the filepath.Join logic, but we could then drop what was put into the third_party directory.

Contributor Author:
I have gone through and removed this one entry, but the other locations are still needed. I think I added this here as part of my original debugging.
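
If the cluster-api CRDs ever become vendorable as suggested above, the envtest setup could plausibly point at the vendored manifests instead of the third_party copy. A hedged sketch; the vendored cluster-api path is an assumption and does not exist today:

```go
package vsphere_test

import (
	"path/filepath"

	"sigs.k8s.io/controller-runtime/pkg/envtest"
)

// newTestEnv sketches the CRD wiring discussed above, swapping the
// third_party copy for a hypothetical vendored cluster-api CRD directory.
func newTestEnv() *envtest.Environment {
	return &envtest.Environment{
		CRDDirectoryPaths: []string{
			filepath.Join("..", "..", "..", "install"),
			filepath.Join("..", "..", "..", "vendor", "github.com", "openshift", "api", "config", "v1", "zz_generated.crd-manifests"),
			// Hypothetical: only valid once a dummy .go file makes this directory vendorable.
			filepath.Join("..", "..", "..", "vendor", "sigs.k8s.io", "cluster-api", "config", "crd", "bases"),
		},
	}
}
```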

Comment on lines 2634 to 2635
err = gates.Set(fmt.Sprintf("%v=%v", features.FeatureGateVSphereStaticIPs, tc.staticIPFeatureGateEnabled))
if err != nil {

Contributor:
Nit, should inline this

Suggested change
-	err = gates.Set(fmt.Sprintf("%v=%v", features.FeatureGateVSphereStaticIPs, tc.staticIPFeatureGateEnabled))
-	if err != nil {
+	if err := gates.Set(fmt.Sprintf("%v=%v", features.FeatureGateVSphereStaticIPs, tc.staticIPFeatureGateEnabled)); err != nil {

Contributor Author:
Ok, I'll update that.

Contributor:
Is this related to this change?

Contributor Author:
These are needed for the unit tests to run. I'm not sure why they were not needed before, but I think it is because I am now setting the feature gate information during the tests.

@vr4manta force-pushed the vsphere_disk branch 2 times, most recently from be37562 to ad6f989, on November 14, 2024 at 15:08.
@openshift-merge-robot removed the needs-rebase label on Nov 14, 2024.
@vr4manta force-pushed the vsphere_disk branch 2 times, most recently from 4e8092d to d5ccb46, on November 14, 2024 at 15:19.
@vr4manta (Contributor Author) commented:

/test all


openshift-ci bot commented Dec 11, 2024

@vr4manta: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| ci/prow/e2e-aws-operator | d5ccb46 | link | true | /test e2e-aws-operator |
| ci/prow/e2e-vsphere-operator | d5ccb46 | link | false | /test e2e-vsphere-operator |
| ci/prow/e2e-azure-operator | d5ccb46 | link | false | /test e2e-azure-operator |
| ci/prow/okd-scos-e2e-aws-ovn | d5ccb46 | link | false | /test okd-scos-e2e-aws-ovn |
| ci/prow/golint | d5ccb46 | link | true | /test golint |
| ci/prow/e2e-metal-ipi-ovn-dualstack | d5ccb46 | link | false | /test e2e-metal-ipi-ovn-dualstack |
| ci/prow/e2e-openstack | d5ccb46 | link | false | /test e2e-openstack |
| ci/prow/e2e-aws-ovn-upgrade | d5ccb46 | link | true | /test e2e-aws-ovn-upgrade |
| ci/prow/verify-crds-sync | d5ccb46 | link | true | /test verify-crds-sync |
| ci/prow/goimports | d5ccb46 | link | true | /test goimports |
| ci/prow/e2e-metal-ipi-upgrade | d5ccb46 | link | false | /test e2e-metal-ipi-upgrade |
| ci/prow/e2e-metal-ipi-virtualmedia | d5ccb46 | link | false | /test e2e-metal-ipi-virtualmedia |
| ci/prow/e2e-vsphere-ovn-upgrade | d5ccb46 | link | false | /test e2e-vsphere-ovn-upgrade |
| ci/prow/e2e-metal-ipi | d5ccb46 | link | false | /test e2e-metal-ipi |
| ci/prow/e2e-gcp-operator | d5ccb46 | link | false | /test e2e-gcp-operator |
| ci/prow/e2e-vsphere-ovn | d5ccb46 | link | false | /test e2e-vsphere-ovn |
| ci/prow/e2e-azure-ovn | d5ccb46 | link | false | /test e2e-azure-ovn |
| ci/prow/e2e-metal-ipi-ovn-ipv6 | d5ccb46 | link | false | /test e2e-metal-ipi-ovn-ipv6 |
| ci/prow/e2e-gcp-ovn | d5ccb46 | link | false | /test e2e-gcp-ovn |
| ci/prow/e2e-nutanix | d5ccb46 | link | false | /test e2e-nutanix |
| ci/prow/e2e-vsphere-ovn-serial | d5ccb46 | link | false | /test e2e-vsphere-ovn-serial |
| ci/prow/e2e-aws-ovn | d5ccb46 | link | true | /test e2e-aws-ovn |
| ci/prow/security | d5ccb46 | link | true | /test security |

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@openshift-merge-robot added the needs-rebase label on Dec 11, 2024.

openshift-ci bot commented Dec 12, 2024

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign radekmanak for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-merge-robot removed the needs-rebase label on Dec 13, 2024.
@vr4manta force-pushed the vsphere_disk branch 2 times, most recently from 12880d9 to 73a04b8, on December 13, 2024 at 12:45.
// VM os disks filename contains the machine name in it
// and has the format like "[DATASTORE] path-within-datastore/machine-name.vmdk".
// This is based on vSphere behavior, an OS disk file gets a name that equals the target VM name during the clone operation.
func filterOutVmOsDisk(attachedDisks []attachedDisk, machine *machinev1.Machine) []attachedDisk {
	var disks []attachedDisk
	regex, _ := regexp.Compile(fmt.Sprintf(".*\\/%s(_\\d*)?.vmdk", machine.GetName()))

Contributor:
What is this regex doing differently? What will the additional disks names look like?
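
To make the question concrete, here is a small standalone check of what the pattern above matches, assuming disks created alongside the VM follow vSphere's "<vm-name>_N.vmdk" naming; the sample paths are illustrative:

```go
package main

import (
	"fmt"
	"regexp"
)

func main() {
	machineName := "worker-abc12"
	// Same pattern as in filterOutVmOsDisk above.
	pattern := regexp.MustCompile(fmt.Sprintf(`.*\/%s(_\d*)?.vmdk`, machineName))

	// Illustrative datastore paths; the first three follow the
	// "<vm-name>[_N].vmdk" naming used for disks created with the VM.
	paths := []string{
		"[datastore1] worker-abc12/worker-abc12.vmdk",   // OS disk: matches
		"[datastore1] worker-abc12/worker-abc12_1.vmdk", // additional disk: matches
		"[datastore1] worker-abc12/worker-abc12_2.vmdk", // additional disk: matches
		"[datastore1] pvcs/pvc-1234-dynamic.vmdk",       // unrelated volume: no match
	}
	for _, p := range paths {
		fmt.Printf("%-48s matched=%v\n", p, pattern.MatchString(p))
	}
}
```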

Comment on lines +1107 to +1110
if len(s.providerSpec.DataDisks) > 0 && !s.featureGates.Enabled(featuregate.Feature(apifeatures.FeatureGateVSphereMultiDisk)) {
	return nil, machinecontroller.InvalidMachineConfiguration(
		"machines cannot contain additional disks due to VSphereMultiDisk feature gate being disabled")
}

Contributor:
Is this same check in the validation webhook?

if !used {
	unit32 := int32(unit)
	vd.UnitNumber = &unit32
	klog.Infof("Assigned unit number %d = %d", unit, *vd.UnitNumber)

Contributor:
Wondering if we are getting a little too much in the logs here? This seems rather low level to be logging at the default verbosity.
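
One hedged option, if the information is still worth keeping, is to gate it behind a klog verbosity level rather than logging it by default; the function name here is illustrative:

```go
package vsphere

import "k8s.io/klog/v2"

// logUnitAssignment shows the verbosity-gated alternative: the message is
// only emitted when the controller runs with -v=4 or higher.
func logUnitAssignment(unit int32) {
	klog.V(4).Infof("assigned unit number %d to additional disk", unit)
}
```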

Labels
do-not-merge/work-in-progress: indicates that a PR should not merge because it is a work in progress.
jira/valid-reference: indicates that this PR references a valid Jira ticket of any type.
Projects
None yet
4 participants