Usability: Make it easier to backup and restore full AiiDA installations #11

Open
2 tasks
ramirezfranciscof opened this issue Mar 1, 2023 · 8 comments
Labels
roadmap/proposed A roadmap item that has been proposed but not yet processed

Comments

@ramirezfranciscof
Member

Motivation

Proper digital data management requires keeping copies of the data in case the main work devices fail. AiiDA has a well-established method for transferring data between installations: the verdi archive command, which exports and imports sets of nodes. However, even when exporting all nodes in the database, the archive may leave out information related to the configuration of the working profile. There is some documentation on creating backups, but it is somewhat convoluted and may have become outdated since the latest changes in aiida-core. As a result, there is currently no officially recommended procedure for backing up AiiDA installations.

Desired Outcome

Have a clear recommended procedure for backing up and restoring full AiiDA profiles/installations. Add any features and/or utility scripts to aiida-core that can automate some or all of the steps, and review and update the corresponding documentation section.

Impact

All users should benefit from improved backup procedures.

Complexity

Originally, creating a backup required just three steps:

  1. Dumping the Postgres database
  2. Copying the file repository folder
  3. Copying the config.json configuration file
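
The three steps above can be sketched roughly as follows. This is only an illustration under assumptions: the function name, the default database user, and the layout of the destination folder are hypothetical, and the sketch does nothing to guard against the profile being in use while it runs. The pg_dump command is returned rather than executed so that it can be inspected first.

```python
import shutil
from pathlib import Path


def backup_installation(repo_dir, config_file, dest_dir, db_name, db_user="aiida"):
    """Copy the repository and config file; return the pg_dump command for the database.

    All argument names and the destination layout are assumptions for illustration.
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)

    # Step 1: dump the Postgres database (command is built here, not executed).
    dump_cmd = ["pg_dump", "-U", db_user, "-d", db_name, "-f", str(dest / "db.sql")]

    # Step 2: copy the file repository folder.
    shutil.copytree(repo_dir, dest / "repository", dirs_exist_ok=True)

    # Step 3: copy the config.json configuration file.
    shutil.copy2(config_file, dest / "config.json")

    return dump_cmd
```

The returned command could then be run with `subprocess.run(cmd, check=True)` on a machine where pg_dump is available and the user has access to the database.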

Since all of this was performed outside of AiiDA, it is unclear what would happen if the procedure were started while the AiiDA instance was in use: nodes created or modified during or between the steps could leave the three parts of the backup inconsistent with each other. Moreover, the recent move to the disk-objectstore (which added another SQLite database inside the file repository) adds an extra level of complexity to live backups.

We need to evaluate whether we can provide a more streamlined and secure way for users to create backups, perhaps by adding new verdi functionality that automates one or more of these steps in a safer manner. We must also decide whether more modular backups (of single profiles, for example) are feasible, or whether anything other than a backup of the full installation is too inconvenient.

Finally, this procedure may also need to be re-structured if we implement some pull/push mechanism in the future (or replaced by it altogether).

Extra Notes

  • The export already contains some information about the profile (see "Simple way to export an entire database", aiida-core#974, for linked PRs).
  • When backing up the disk-objectstore (file repository), one also needs to back up the SQLite database while it is live, which requires special care (the disk-objectstore currently has no way to do this automatically; there is an open issue for it).
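
As an illustration of the "special care" needed for a live SQLite database, here is a minimal sketch using the backup API of Python's standard sqlite3 module. This is a general SQLite technique, not something the disk-objectstore provides; the function name and file paths are hypothetical.

```python
import sqlite3


def snapshot_sqlite(src_path, dest_path):
    """Write a consistent copy of a possibly in-use SQLite database to dest_path."""
    src = sqlite3.connect(src_path)
    dest = sqlite3.connect(dest_path)
    try:
        # Connection.backup copies the database through SQLite itself, so the
        # copy stays consistent even if other connections access the source.
        src.backup(dest)
    finally:
        dest.close()
        src.close()
```

Copying the database file directly with a plain file copy, by contrast, can capture a half-written state if a write happens during the copy.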

Progress

ramirezfranciscof added the roadmap/proposed label on Mar 1, 2023
ramirezfranciscof self-assigned this on Mar 1, 2023
@sphuber

sphuber commented Mar 3, 2023

Since, as of v2.0, it is possible to provide custom storage backends (as done, for example, by aiida-s3), we should take into account that the method for backing up a core.psql_dos backend is not necessarily the correct one for every backend.

Ideally, then, we would define a method on the StorageBackend interface that creates a backup of its contents, as well as a method to restore a backend from such a backup. In this way, we can have a single verdi command that automates the entire backup. It can provide options to back up just the storage of any profile, or to back up the entire instance including configuration and log files.
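
A minimal sketch of what such an interface could look like. Everything here is hypothetical: the class and method names are made up for illustration and are not the actual aiida-core StorageBackend API, and the toy backend simply stores everything in one directory.

```python
import shutil
from abc import ABC, abstractmethod
from pathlib import Path


class BackupCapableStorage(ABC):
    """Hypothetical interface; not the actual aiida-core StorageBackend API."""

    @abstractmethod
    def backup(self, dest: Path) -> None:
        """Write a backup of the storage contents to dest."""

    @abstractmethod
    def restore(self, src: Path) -> None:
        """Replace the storage contents with the backup found at src."""


class DirectoryStorage(BackupCapableStorage):
    """Toy backend whose entire contents live in one directory."""

    def __init__(self, root: Path):
        self.root = Path(root)
        self.root.mkdir(parents=True, exist_ok=True)

    def backup(self, dest: Path) -> None:
        shutil.copytree(self.root, dest, dirs_exist_ok=True)

    def restore(self, src: Path) -> None:
        # Discard the current contents, then copy the backup into place.
        shutil.rmtree(self.root)
        shutil.copytree(src, self.root)


def backup_profile(storage: BackupCapableStorage, dest: Path) -> None:
    """What a single CLI command could call, regardless of the backend type."""
    storage.backup(dest)
```

The point of the design is that the command only calls `backup` and `restore`; each backend (PostgreSQL plus disk-objectstore, S3, and so on) decides how to implement them efficiently.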

One big challenge will be to have the backup/restore methods of the StorageBackend class be performant and work whenever possible without root access. In the past, we would provide manual instructions for backing up the default storage backend since that was the most efficient, i.e., by directly going to psql to dump the database and using rsync for the file repository.

@ramirezfranciscof
Member Author

> One big challenge will be to have the backup/restore methods of the StorageBackend class be performant and work whenever possible without root access.

Why do you mention this specifically? I would agree that one should try to do as much as possible without root access, but if it is necessary, the user should just be prompted for a password when running the command.

@sphuber

sphuber commented Mar 6, 2023

> Why do you mention this specifically? I would agree that one should try to do as much as possible without root access, but if it is necessary the user should just be prompted for password when running the command.

For the same reason that users often experience problems using verdi quicksetup if they don't have root access. Users on these platforms won't be able to make backups if it requires root access and they don't have it.

@ramirezfranciscof
Member Author

Yeah, good point, I forget that users may not have root access on their workstation...

@chrisjsewell
Member

Heya, I would suggest a possible alternative/complementary solution here: provide functionality to "sync" backend instances.
This is effectively what you are doing now when you create/import an archive (since v2 archives are effectively just an instance of a sqlite_zip backend); the limitation at the moment is that you can only create "full" archives, as opposed to having incremental updates.

If you could, for example, sync a "local" psql_dos backend with a "remote" aiida-s3 backend, then you have a backup.
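
To make the idea of an incremental sync concrete, here is a toy sketch. It assumes each backend can be modeled as a mapping from node UUID to content and that stored entries never change after creation; the function name is hypothetical and this is not an existing AiiDA feature.

```python
def sync(source: dict, target: dict) -> list:
    """Copy entries missing from target; return the UUIDs that were transferred."""
    # Only entries the target does not know about yet are transferred,
    # which is what makes the update incremental rather than "full".
    missing = [uuid for uuid in source if uuid not in target]
    for uuid in missing:
        target[uuid] = source[uuid]
    return missing
```

Running the sync a second time without new entries in the source transfers nothing, in contrast to re-creating a full archive every time.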

This obviously relates also to aiidateam/aiida-core#4535

Whether to also sync the configuration and log files would be an open question.
I think there are already one or more open issues about including the configuration in the archive.

@chrisjsewell
Member

(my suggestion ☝️ is somewhat alluded to in the initial issue, but I wanted to make it more concrete)

@giovannipizzi
Member

@eimrek @sphuber this can be closed now?

@sphuber

sphuber commented May 23, 2024

I guess the backup part is there, but it could be argued that restoring can be made a lot easier.

Projects
None yet
Development

No branches or pull requests

4 participants