Add troubleshooting information for confidential VM
Bad Measurement
Impossible to SSH
Wrong decryption password
olethanh committed Nov 18, 2024
1 parent 7a488d1 commit 41a3e56
Showing 1 changed file with 82 additions and 0 deletions.
82 changes: 82 additions & 0 deletions docs/computing/confidential/troubleshooting.md
@@ -41,3 +41,85 @@ sudo qemu-system-x86_64 \
> Note: Once you have entered your password you might have to wait a minute or so for the disk to decrypt and boot.
To exit qemu: press `Ctrl + a`, then `x` and then `[Enter]`

## Cannot SSH inside the VM

1. **Validate IPv6 Connectivity**
Ensure that IPv6 is functioning correctly on your local network. You can test this by
visiting [https://test-ipv6.com/](https://test-ipv6.com/).

2. **Retrieve the VM's IP Address**
Use the following command to retrieve the VM's IP address:
```bash
aleph instance list
```

3. **Try Different User Logins**
Depending on your distribution, the default user login may differ:
- For Debian, the default user is `root`.
- For Ubuntu, the default user is `ubuntu`.

4. **Check VM Logs**
To investigate further, check the logs with:
```bash
aleph instance logs <vm_hash>
```
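Steps 2 and 3 can be combined into a small loop. The following is a minimal sketch, not part of the Aleph tooling: `try_ssh_users` is a hypothetical helper name, and it assumes key-based login and an OpenSSH client recent enough to support `StrictHostKeyChecking=accept-new`.

```bash
# Hypothetical helper (not an Aleph command): try each candidate login
# against the VM's IPv6 address and report the first one that works.
try_ssh_users() {
    vm_ip="$1"
    shift
    for user in "$@"; do
        # BatchMode=yes fails instead of prompting for a password.
        # ConnectTimeout bounds how long an unreachable host can stall the loop.
        # accept-new records a first-seen host key without asking.
        if ssh -6 -o BatchMode=yes -o ConnectTimeout=10 \
               -o StrictHostKeyChecking=accept-new \
               "${user}@${vm_ip}" true 2>/dev/null; then
            echo "login works: ${user}@${vm_ip}"
            return 0
        fi
    done
    echo "no candidate user could log in to ${vm_ip}" >&2
    return 1
}
```

Call it with the address reported by `aleph instance list`, for example `try_ssh_users <vm_ipv6> root ubuntu`. If every user fails, the VM logs from step 4 are the next place to look.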


## What to Do If You Entered the Wrong Decryption Password

If you entered the wrong decryption password while starting the VM, you must reboot it.

To reboot, run the following command:

```bash
aleph instance restart <vm_hash>
```

Then repeat the start process, beginning with `aleph instance confidential-start`.


## Error: "Bad Measurement"

This section applies when the VM fails to start and the logs display the following error:

```shell
qemu-system-x86_64: sev_launch_start: LAUNCH_START ret=1 fw_error=11 'Bad measurement'
qemu-system-x86_64: sev_kvm_init: failed to create encryption context
qemu-system-x86_64: failed to initialize kvm: Operation not permitted
```
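To check for this failure without reading the logs by hand, the first line of the error can be matched with `grep`. This is a minimal sketch: `has_bad_measurement` is a hypothetical helper name, and it assumes the log text is supplied on standard input.

```bash
# Hypothetical helper: read log text on stdin and succeed (exit 0)
# if it contains the SEV launch failure shown above.
has_bad_measurement() {
    grep -q "fw_error=11 'Bad measurement'"
}
```

Assuming `aleph instance logs` writes the log to standard output, a check could look like `aleph instance logs <vm_hash> | has_bad_measurement && echo "bad measurement detected"`.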

### Probable Causes

1. **Policy Mismatch**
The policy requested by the client does not match the start packet sent by the client. This is unlikely if the VM was
started using the Aleph client unless the default policy was explicitly modified.

2. **Platform Certificate Issues**
The CRN's platform certificate may not match the start packet sent by the client due to one of the following reasons:
- A session generated for one CRN was reused for another CRN.
- The CRN platform certificate was rotated after the client generated the session certificate.
- The CRN platform certificate was rotated, but the old one is still being returned by the CRN API endpoint. (If
other confidential VMs can start, this is likely not the issue.)

### Resolution Steps

1. **On the Client Side:**
Regenerate the session certificate by running:
```bash
aleph instance confidential-init-session
```
When prompted to remove existing certificates, answer "yes". Then continue the normal start process with:
```bash
aleph instance confidential-start
```

2. **If the Error Persists:**
- **On the CRN Node Side:**
Verify whether the cached platform certificate at `<CONFIDENTIAL_DIRECTORY>/certs_export.cert` matches the output of:
```bash
/opt/sevctl export
```
If they do not match, delete the cached certificate file and instruct the client to restart the session process
from scratch.
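The comparison on the CRN side can be scripted once a fresh export is available as a file. The sketch below makes that assumption explicit: both function names are hypothetical, and how the fresh export is written to disk depends on the installed `sevctl` version, so it is left to the operator.

```bash
# Hypothetical helpers, not part of the CRN software.
# Succeeds when the two certificate files are byte-for-byte identical.
certs_match() {
    cmp -s "$1" "$2"
}

# Prints whether the cached certificate (first argument) is still
# identical to a freshly exported one (second argument).
report_cert_state() {
    if certs_match "$1" "$2"; then
        echo "cached certificate is current"
    else
        echo "cached certificate is stale"
    fi
}
```

For example, `report_cert_state <CONFIDENTIAL_DIRECTORY>/certs_export.cert <fresh_export_file>`, where `<fresh_export_file>` is a placeholder for wherever the new export was saved. A "stale" result corresponds to the case above where the cached file should be deleted.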
