Panic if GRID vGPU is attached to VM #1265

Closed
bathomas opened this issue Dec 1, 2020 · 13 comments · Fixed by #1627
Labels: acknowledged (Status: Issue or Pull Request Acknowledged), area/vm (Area: Virtual Machines), bug (Type: Bug)

Comments

@bathomas

bathomas commented Dec 1, 2020

Terraform Version

0.13.5 / 0.12.8

vSphere Provider Version

1.24.2

Affected Resource(s)

  • vsphere_virtual_machine

Panic Output

Error: rpc error: code = Unavailable desc = transport is closing
 panic: interface conversion: types.BaseVirtualDeviceBackingInfo is *types.VirtualPCIPassthroughVmiopBackingInfo, not *types.VirtualPCIPassthroughDeviceBackingInfo

Expected Behavior

Terraform does not panic. Provider v1.18.3 works fine.

Actual Behavior

The VM boots and is operational, but Terraform panics and the run does not complete. Subsequent terraform apply commands fail unless the vGPU is removed.

Steps to Reproduce

Either attach a GRID vGPU while the VM is powered off and then boot it, or use a Packer image with a vGPU profile baked in. Running terraform apply then panics. This appears to be caused by the PCI passthrough interface conversion.
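
The failure mode in the panic output can be reproduced in isolation. The following is a minimal sketch, not the provider's code: `BackingInfo`, `DeviceBacking`, and `VmiopBacking` are hypothetical stand-ins for the govmomi types named in the panic message, and the profile string and PCI ID are made-up values. It shows that a single-value type assertion panics when the interface holds a different concrete type, while the two-value ("comma ok") form reports the mismatch instead.

```go
package main

import "fmt"

// BackingInfo is a hypothetical stand-in for types.BaseVirtualDeviceBackingInfo.
type BackingInfo interface{}

// DeviceBacking is a hypothetical stand-in for
// types.VirtualPCIPassthroughDeviceBackingInfo (a host PCI device).
type DeviceBacking struct{ Id string }

// VmiopBacking is a hypothetical stand-in for
// types.VirtualPCIPassthroughVmiopBackingInfo (a vGPU profile).
type VmiopBacking struct{ Vgpu string }

// uncheckedID uses the single-value assertion and panics
// if b is not a *DeviceBacking.
func uncheckedID(b BackingInfo) string {
	return b.(*DeviceBacking).Id
}

// checkedID uses the two-value assertion and never panics.
func checkedID(b BackingInfo) (string, bool) {
	d, ok := b.(*DeviceBacking)
	if !ok {
		return "", false
	}
	return d.Id, true
}

// panics reports whether calling f panics.
func panics(f func()) (didPanic bool) {
	defer func() {
		if r := recover(); r != nil {
			didPanic = true
		}
	}()
	f()
	return false
}

func main() {
	// Made-up profile name, for illustration only.
	var vgpu BackingInfo = &VmiopBacking{Vgpu: "example-vgpu-profile"}

	fmt.Println("unchecked assertion panics:", panics(func() { uncheckedID(vgpu) }))

	_, ok := checkedID(vgpu)
	fmt.Println("checked assertion matched:", ok)
}
```

With a vGPU-style backing, the unchecked call panics and the checked call returns `ok == false`, which is the distinction the reported panic turns on.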

Important Factoids

  • #0000

Community Note

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • Please do not leave "+1" or other comments that do not add relevant new information or questions, they generate extra noise for issue followers and do not help prioritize the request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment
@bathomas bathomas added the bug Type: Bug label Dec 1, 2020
@ghost

ghost commented Dec 8, 2020

We have the same setup and hit the same issue with the versions above. We use Packer to bake in the vGPU profile, and Terraform panics when we try to spin the VM up. Backing down to v1.18.3 has worked as a workaround for now.

We would love to see this addressed, with Terraform also able to apply vGPU profiles to templates that do not have them baked in. This would match the equivalent Packer functionality and keep us from creating repetitive templates.

@zerotosixty

We have the same issue using a template with a vGPU included.
Terraform 0.13
vSphere provider 1.24.2

There is also a similar issue when destroying a VM if the vGPU is added after the build.

@phajisavvi

Likewise here: same issues as above, and they are blocking some important development work we're doing. Any help with this problem would be appreciated.

@skywalk7

This is the patch that we use to prevent the panic:

 	// Read the PCI passthrough devices.
 	var pciDevs []string
 	for _, dev := range vprops.Config.Hardware.Device {
 		if pci, ok := dev.(*types.VirtualPCIPassthrough); ok {
-			devId := pci.Backing.(*types.VirtualPCIPassthroughDeviceBackingInfo).Id
-			pciDevs = append(pciDevs, devId)
+			if pciBacking, ok := pci.Backing.(*types.VirtualPCIPassthroughDeviceBackingInfo); ok {
+				devId := pciBacking.Id
+				pciDevs = append(pciDevs, devId)
+			} else {
+				log.Printf("[DEBUG] Ignoring VM %q VirtualPCIPassthrough device with backing type of %T",
+					vm.InventoryPath, pci.Backing)
+			}
 		}
 	}
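
As a possible variation on the patch above, a type switch can sort passthrough devices by backing type rather than only skipping the unexpected ones. This is a sketch under stated assumptions, not the fix that was merged: `passthroughDevice`, `hostPCIBacking`, `vmiopBacking`, and `splitBackings` are hypothetical names standing in for the govmomi types, and the sample values are made up.

```go
package main

import "fmt"

// hostPCIBacking is a hypothetical stand-in for
// types.VirtualPCIPassthroughDeviceBackingInfo.
type hostPCIBacking struct{ Id string }

// vmiopBacking is a hypothetical stand-in for
// types.VirtualPCIPassthroughVmiopBackingInfo.
type vmiopBacking struct{ Vgpu string }

// passthroughDevice is a hypothetical stand-in for
// types.VirtualPCIPassthrough; Backing may hold any backing type.
type passthroughDevice struct{ Backing interface{} }

// splitBackings separates host PCI device IDs from vGPU profile names.
// Backings of any other type are counted and otherwise ignored,
// so an unknown type can never cause a panic.
func splitBackings(devs []passthroughDevice) (pciIDs, vgpuProfiles []string, ignored int) {
	for _, d := range devs {
		switch b := d.Backing.(type) {
		case *hostPCIBacking:
			pciIDs = append(pciIDs, b.Id)
		case *vmiopBacking:
			vgpuProfiles = append(vgpuProfiles, b.Vgpu)
		default:
			ignored++
		}
	}
	return pciIDs, vgpuProfiles, ignored
}

func main() {
	devs := []passthroughDevice{
		{Backing: &hostPCIBacking{Id: "0000:3b:00.0"}},       // made-up PCI ID
		{Backing: &vmiopBacking{Vgpu: "example-vgpu-profile"}}, // made-up profile
		{Backing: nil},                                        // unknown backing
	}
	pciIDs, profiles, ignored := splitBackings(devs)
	fmt.Println(pciIDs, profiles, ignored)
}
```

The design choice here is that the read path stays total over all backing types: whether vGPU profiles are then surfaced in state or simply dropped is a separate decision.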

@ghost

ghost commented Jan 29, 2021

Thank you @skywalk7, will give this a try. Hopefully something like this can get baked into the official provider soon. It has become difficult to juggle this alongside the older bugs we hit from being stuck on 1.18.x.

aarnaud added a commit to aarnaud/terraform-provider-vsphere that referenced this issue Mar 31, 2021
Original implementation of PciPassthrough can't be used for vGPU because it's linked to host systemId
@aarnaud

aarnaud commented Mar 31, 2021

Hi, I opened a PR to support vGPU; it is waiting for review.

@ajski1701

Encountered this bug today. It is not the easiest to pinpoint and diagnose, especially while running a Terraform config that manages a large number of resources.

Would love to see a permanent fix in the stable releases of the provider.

@bathomas
Author

I agree. We're currently stuck between either staying with the old provider or not provisioning vGPU-enabled VMs via Terraform.

@tFable

tFable commented Dec 23, 2021

We're also affected by this issue. We're currently forced to:

  • Create VM without vGPU with Terraform
  • Manually add vGPU to VM via vCenter (Terraform doesn't seem to support creating VMs with vGPU)
  • Remove VM from Terraform state and delete Terraform code

It would be great to be able to both create and manage VMs with vGPU via Terraform.

@tenthirtyam
Collaborator

tenthirtyam commented Dec 23, 2021

There's an open PR awaiting product manager and maintainer review. Thanks for adding your 👍🏻 to the original description.

Ryan

@tFable

tFable commented Dec 23, 2021

Hey @tenthirtyam ,

Yeah, I have seen the PR (#1378), if that's what you're referring to.

Hoping that this gets a chance to be reviewed soon since it has been open for 9 months.

Thanks!

@tenthirtyam
Collaborator

I'll plan to discuss this one with the HashiCorp engineering team in January.

Ryan

@tenthirtyam tenthirtyam added needs-triage Status: Issue Needs Triage acknowledged Status: Issue or Pull Request Acknowledged labels Feb 5, 2022
@tenthirtyam tenthirtyam added the area/vm Area: Virtual Machines label Feb 22, 2022
tenthirtyam added a commit that referenced this issue Mar 18, 2022
Adds conditional to avoid a provider panic when a vGPU is added as a PCI device outside of Terraform to a virtual machine. #1265

Signed-off-by: Ryan Johnson <[email protected]>
@tenthirtyam tenthirtyam added this to the v2.2.0 milestone Mar 18, 2022
@tenthirtyam tenthirtyam self-assigned this Mar 18, 2022
appilon pushed a commit that referenced this issue Mar 23, 2022
Adds conditional to avoid a provider panic when a vGPU is added as a PCI device outside of Terraform to a virtual machine. #1265

Signed-off-by: Ryan Johnson <[email protected]>
@github-actions

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues. If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Apr 23, 2022
@tenthirtyam tenthirtyam removed the needs-triage Status: Issue Needs Triage label Jun 15, 2022
tenthirtyam pushed a commit to mristok/terraform-provider-vsphere that referenced this issue Apr 29, 2024
Original implementation of PciPassthrough can't be used for vGPU because it's linked to host systemId
8 participants