Backup fails after a migration and a new destination directory is selected #227
-
Huh, that looks pretty strange to me. I have not yet seen a backup fail in this situation. It seems it is somehow attempting to start the backup job, but then the
It also seems to be unable to thaw the filesystems (which results in the VM being frozen, as filesystems are not
That looks to me as if the complete VM crashes? There have been similar issues in the past related to mysql running within
Maybe it helps if you try to back up without the qemu agent running, to see if it's the same problem?
As you say, the NBD service seems to be running after this action, so it somehow at least managed to start the backup
Also, the latest virtnbdbackup version has seen some improvements and attempts to reconnect to the libvirt daemon in case
Have you considered using the external checkpoint directory for storing checkpoint information, as described here?
-
Ok, some more testing. I set up the same scenario. I backed up the VM a couple times to create some checkpoints. I migrated the VM between hosts, which caused libvirt to lose the checkpoint info. Then I ran the backup using a new directory and did not use the common checkpoint dir as we discussed above. Basically the same thing happened. The output was the same (including the virsh-ssh-helper) except for:
This time, the VM itself did not seem to become unhappy and appeared to be running normally. The nbd port was still left open and I had to shut down the VM again to clear that. The state change lock was still there and needed to be cleared with a libvirtd restart while the VM was down. When the backup happens in this state, libvirtd apparently crashes and restarts (the timestamp doesn't match the guest agent line above b/c they were 2 different tests, but the same thing happened):
I think changing the checkpoint dir will prevent this situation in my case, but hopefully this testing will help if someone else runs into this. I'd be curious if other folks can replicate the issue. It seems like libvirtd shouldn't be crashing, so maybe there is a bug somewhere that needs to be addressed. Let me know if you want any more info. I will have this host in a state where I can do more testing, probably through the rest of the week. Thanks again!
-
is most probably the cause of the issue...
-
I've run into an issue. I have a wrapper script that creates a new backup directory based on a hash of the VM name to spread Full backups over the course of a month. This is working well for the most part.
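For anyone wondering what "based on a hash of the VM name" could look like, here is a minimal sketch. It is not the actual wrapper: the function names, the `/backup` root, the 28-day window and the directory naming are all assumptions made up for illustration. The idea is only that each VM gets a stable day of the month, and the directory name (and so the checkpoint chain) rolls over on that day.

```python
import hashlib
from datetime import date


def full_backup_day(vm_name: str, days: int = 28) -> int:
    """Map a VM name to a stable day of the month in 1..days (hypothetical scheme)."""
    digest = hashlib.sha256(vm_name.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % days + 1


def backup_dir_for(vm_name: str, today: date, root: str = "/backup") -> str:
    """Return the directory for the chain that is current on `today`.

    The name changes once per month, on the VM's assigned day, which is
    what forces a new full backup. Path layout is an assumption.
    """
    day = full_backup_day(vm_name)
    year, month = today.year, today.month
    if today.day < day:
        # Assigned day not reached yet: keep using last month's chain.
        month -= 1
        if month == 0:
            year, month = year - 1, 12
    return f"{root}/{vm_name}/{year:04d}-{month:02d}-{day:02d}"
```

Because the day comes from a hash rather than a counter, the spread is only roughly even, but it needs no shared state between hosts.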
If I migrate a VM between hosts, the checkpoints are lost. If the backup directory remains the same, it will recreate the checkpoints and then perform the backup as expected.
The problem is if a VM has checkpoints, is migrated, and then a new backup directory is created to start a new checkpoint chain. virtnbdbackup doesn't think there are any checkpoints to clean up and it just tries to start a new chain, but the checkpoints still exist as bitmaps in the disk image files. When this happens, the backup dies, leaves the nbd port open, and leaves a state change lock on the VM. Running the backup with -k does not remove the lock. I have to shut down the VM (which frees up the nbd port), restart libvirtd (which clears the lock), and then power the VM back up.
At this point, it appears my options are to point the backup at the old directory to recreate the checkpoints and then use the new dir to start a new chain, or to shut down the VM and remove the bitmap checkpoints from the volume file(s).
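For the second option, a rough sketch of how the leftover bitmaps could be found and turned into removal commands. It assumes qcow2 images and a `qemu-img` new enough to have the `bitmap` subcommand, and it relies on the JSON printed by `qemu-img info --output=json`, where qcow2 bitmaps show up under `format-specific` / `data` / `bitmaps`. The helper name and the sample image path and bitmap names are made up. It only builds the commands and does not run anything, and the VM should be shut down before any of them are executed.

```python
import json
from typing import List


def bitmap_remove_commands(image_path: str, info_json: str) -> List[List[str]]:
    """Build one `qemu-img bitmap --remove` command per bitmap in the image.

    `info_json` is the output of `qemu-img info --output=json IMAGE`.
    Returns an empty list if the image carries no persistent bitmaps.
    """
    info = json.loads(info_json)
    data = info.get("format-specific", {}).get("data", {})
    bitmaps = data.get("bitmaps", [])
    return [
        ["qemu-img", "bitmap", "--remove", image_path, bitmap["name"]]
        for bitmap in bitmaps
    ]


# Example with a trimmed, hand-written info document (names are placeholders).
sample_info = json.dumps({
    "format": "qcow2",
    "format-specific": {
        "type": "qcow2",
        "data": {
            "bitmaps": [
                {"name": "virtnbdbackup.0", "granularity": 65536, "flags": ["auto"]},
                {"name": "virtnbdbackup.1", "granularity": 65536, "flags": ["auto"]},
            ]
        },
    },
})
commands = bitmap_remove_commands("/var/lib/libvirt/images/vm1.qcow2", sample_info)
```

Printing the commands and reviewing them before running is probably the safer way to use this, since removing the wrong bitmap breaks an existing incremental chain.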
Any thoughts on how to deal with this situation? I'm thinking my wrapper could, if it creates a new directory, look for an empty checkpoint list and try to run an estimate using the prev directory first (assuming there is one). That should handle most of the situations unless the prev dir is not available for some reason. Any better way to handle this?
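The wrapper logic described here could be sketched roughly as follows. This is only the decision step, not the real wrapper: `plan_backup` and the step labels are hypothetical, and the actual virtnbdbackup invocations are left out on purpose. The only external command used is `virsh checkpoint-list DOMAIN --name`, which prints one checkpoint name per line.

```python
import subprocess
from typing import List, Optional, Tuple


def libvirt_checkpoints(domain: str) -> List[str]:
    """Return the checkpoint names libvirt currently knows for `domain`."""
    out = subprocess.run(
        ["virsh", "checkpoint-list", domain, "--name"],
        check=True, capture_output=True, text=True,
    ).stdout
    return [line.strip() for line in out.splitlines() if line.strip()]


def plan_backup(
    new_dir_created: bool,
    checkpoints: List[str],
    prev_dir: Optional[str],
    new_dir: str,
) -> List[Tuple[str, str]]:
    """Decide which directories to run against, in order.

    Pass prev_dir=None when there is no previous directory or it is
    unavailable. Step labels are placeholders for wrapper actions.
    """
    if not new_dir_created or checkpoints:
        # Same directory, or libvirt still knows the chain: normal run.
        return [("backup", new_dir)]
    if prev_dir is None:
        # Empty checkpoint list and nothing to recreate it from:
        # leftover bitmaps may need manual removal first.
        return [("manual-cleanup", new_dir)]
    # Possibly migrated VM: let a run against the old directory recreate
    # the checkpoints, then start the new chain.
    return [("recreate-checkpoints", prev_dir), ("backup", new_dir)]
```

One caveat: an empty checkpoint list also describes a VM that has never been backed up, so the `manual-cleanup` case may be a false alarm. Checking the image files for bitmaps would tell the two apart.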
Thanks!
If folks are interested, I will share my wrapper once I get this (hopefully) last failure situation ironed out and I make my code a little more readable.
Here is the failure (the first 6 lines and the last line are my wrapper):