proposal: Postgres Operator Upgrade #23

Open
wants to merge 1 commit into
base: main
126 changes: 126 additions & 0 deletions enhancements/postgres-upgrade.md
@@ -0,0 +1,126 @@
---
title: Quay Postgres Upgrade
authors:
- "@jonathankingfc"
reviewers:
- TBD
approvers:
- TBD
creation-date: 2023-01-27
last-updated: 2023-01-27
status: provisional
---

# Quay Operator Postgres Upgrade

## Release Signoff Checklist

- [ ] Enhancement is `implementable`
- [ ] Design details are appropriately documented from clear requirements
- [ ] Test plan is defined
- [ ] Graduation criteria for dev preview, tech preview, GA

## Summary

This enhancement proposes upgrading the version of Postgres on the Red Hat Kubernetes Operator
from version 10 to version 14. Due to the EOL of Postgres 10, the upgrade is critical.

Review comment: I'm curious, why not go directly to Postgres 15?

It will also increase performance and provide more features for users.

## Motivation

When Postgres reaches EOL, it is no longer supported by the developer community and receives
no further security updates. Therefore, it is critical to provide an upgrade path
in order to ensure the latest security patches and bug fixes. Postgres 14 also includes a number
of new features which can be used in future versions of Quay.

## Open Questions

What is the best way to infer the database version from the image tag?
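
One possible answer, sketched under stated assumptions: if the managed image uses an upstream-style tag whose leading digits are the major version (for example `postgres:14.6`), the reconciler could parse those digits directly. The function name `majorFromImage`, the constant `targetMajor`, and the example image references are hypothetical and not taken from the Operator code. Images that encode the version in the image name rather than the tag, or that are pinned by digest only, would need a different approach.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// targetMajor is the version this proposal upgrades to.
const targetMajor = 14

// majorFromImage returns the leading integer of the tag in an image reference.
// Assumption: the tag starts with the Postgres major version, e.g. "14.6".
func majorFromImage(image string) (int, error) {
	// Drop a digest suffix ("@sha256:..."); a digest carries no version information.
	if i := strings.Index(image, "@"); i >= 0 {
		image = image[:i]
	}
	// Only look for ':' after the last '/', so a registry port is not mistaken for a tag.
	name := image[strings.LastIndex(image, "/")+1:]
	i := strings.Index(name, ":")
	if i < 0 {
		return 0, fmt.Errorf("image %q has no tag", image)
	}
	tag := name[i+1:]
	// Collect the leading run of ASCII digits.
	end := 0
	for end < len(tag) && tag[end] >= '0' && tag[end] <= '9' {
		end++
	}
	if end == 0 {
		return 0, fmt.Errorf("tag %q does not start with a version number", tag)
	}
	return strconv.Atoi(tag[:end])
}

func main() {
	images := []string{"postgres:10.23", "localhost:5000/postgres:14", "postgres:latest"}
	for _, img := range images {
		major, err := majorFromImage(img)
		if err != nil {
			fmt.Println(img, "->", err)
			continue
		}
		fmt.Println(img, "-> major", major, "upgrade needed:", major < targetMajor)
	}
}
```

Returning an error for tags such as `latest` lets the reconciler skip the upgrade instead of guessing a version.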

### Goals

Smooth Upgrade: The upgrade should happen as a Kubernetes job behind the scenes, and should
maintain compatibility with all Quay features.

Backups: It is important to take a full backup of the database to ensure that you can revert to the
previous version if needed.

### Non-Goals

Database Backup and Restore (as a feature): this should be a separate enhancement.

Review comment: Backup is a goal and a non-goal at the same time.

Do you want to create backups and provide instructions on how to use them manually?


## Proposal

### User Stories

#### Ideally, this upgrade happens without any effect on the user. It should happen in the background and should require no manual steps.

#### Users may want to upgrade from < 3.8; they should follow the direct upgrade path.

### Implementation Details/Notes/Constraints

There are two ways to run the `pg_upgrade` command: Linked and In-Place.

In an in-place upgrade, the upgrade is performed directly on the existing data directory of the
older version of postgres. It does not require additional resources to be created.

In a linked upgrade, a new object of the target version of postgres is created and the data is
transferred from the old object to the new one. This type of upgrade provides more
safety as the original data remains intact, making it easier to recover in case of an issue.

The linked upgrade will allow us to create a new PVC, copy the user's data, and keep the old PVC
intact.
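
As a rough illustration of what the upgrade Job might run, the sketch below builds a `pg_upgrade` argument list. It assumes the goal is to keep the old data directory untouched, so it omits the `--link` flag: per the PostgreSQL documentation, `pg_upgrade` copies data files by default and hard-links them only when `--link` is given. The type `upgradePaths`, the function `pgUpgradeArgs`, and all paths are hypothetical placeholders, not values from the Operator.

```go
package main

import (
	"fmt"
	"strings"
)

// upgradePaths holds the binary and data directories of both clusters.
// All values used in main are placeholders, not paths from the Operator images.
type upgradePaths struct {
	OldBin, NewBin   string
	OldData, NewData string
}

// pgUpgradeArgs builds arguments for pg_upgrade in its default copy mode.
// With checkOnly set, --check makes pg_upgrade verify compatibility without
// changing any data.
func pgUpgradeArgs(p upgradePaths, checkOnly bool) []string {
	args := []string{
		"--old-bindir=" + p.OldBin,
		"--new-bindir=" + p.NewBin,
		"--old-datadir=" + p.OldData,
		"--new-datadir=" + p.NewData,
	}
	if checkOnly {
		args = append(args, "--check")
	}
	return args
}

func main() {
	p := upgradePaths{
		OldBin:  "/opt/pg10/bin", // hypothetical
		NewBin:  "/opt/pg14/bin", // hypothetical
		OldData: "/old-pvc/data", // mounted from the existing PVC
		NewData: "/new-pvc/data", // mounted from the newly created PVC
	}
	fmt.Println("pg_upgrade", strings.Join(pgUpgradeArgs(p, true), " "))
	fmt.Println("pg_upgrade", strings.Join(pgUpgradeArgs(p, false), " "))
}
```

Running the `--check` variant first gives the Job a chance to fail before any data is written to the new PVC.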

The implementation will happen through a Kubernetes job triggered conditionally in the reconcile loop.
The process would look something like the following:

1. A change to the Quay CR is made, in this case an upgrade, causing the reconcile loop to fire
off.
2. The reconciler checks the postgres image tag, and determines the current postgres version being
used.
3. If the postgres version is inspected from the tag and if an upgrade is determined necessary,

Review comment: How will it determine when an upgrade is necessary and ready?

the following events will be triggered.

4. A Kubernetes Job object definition is created that includes the commands to run a postgres backup before the postgres upgrade.

Review comment: How long can we expect the backup to take and will it require downtime?

5. An additional PVC is created, in which the data will be backed up.

Review comment: If the operator is killed right after this step, will it create a new PVC once it is back again?

6. Retrigger reconcile loop
7. A Kubernetes Job object definition is created that includes the commands to run a postgres upgrade.

Review comment: As operators shouldn't be imperative, but declarative, how would you describe (check for) this state, i.e. that the previous steps are done?

8. Retrigger reconcile loop

Review comment: Will there need to be a restart of the Quay pod after the upgrade completes?

- Check the status of the upgrade; if the upgrade is still in progress, retrigger every 60 seconds
- If the upgrade is complete, let the reconcile loop pass and do not retrigger
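
The steps above can be sketched as a pure decision function that derives the next action from observed cluster state rather than from remembered progress. This is a minimal sketch, not the Operator's implementation: the types `observed` and `jobPhase`, the action names, and the function `next` are all hypothetical. It also assumes the backup PVC is created before the backup Job (since the Job needs somewhere to write), and it applies the 60 second interval to every non-terminal step as a simplification.

```go
package main

import (
	"fmt"
	"time"
)

// pollInterval is the 60 second retrigger interval from the proposal.
const pollInterval = 60 * time.Second

// jobPhase is a simplified view of a Kubernetes Job's status.
type jobPhase int

const (
	jobAbsent jobPhase = iota
	jobRunning
	jobSucceeded
	jobFailed
)

// observed is what the reconciler can read back from the cluster on each pass.
type observed struct {
	CurrentMajor, TargetMajor int
	BackupPVCExists           bool
	Backup, Upgrade           jobPhase
}

type action string

const (
	actNone             action = "none"
	actCreateBackupPVC  action = "create-backup-pvc"
	actCreateBackupJob  action = "create-backup-job"
	actCreateUpgradeJob action = "create-upgrade-job"
	actWait             action = "wait"
	actFail             action = "fail"
)

// next returns the action for this pass and how long to wait before the
// next pass. A zero duration means the loop is not retriggered.
func next(o observed) (action, time.Duration) {
	switch {
	case o.CurrentMajor >= o.TargetMajor:
		return actNone, 0 // already on the target version
	case o.Backup == jobFailed || o.Upgrade == jobFailed:
		return actFail, 0 // surface the failure; never upgrade without a backup
	case !o.BackupPVCExists:
		return actCreateBackupPVC, pollInterval
	case o.Backup == jobAbsent:
		return actCreateBackupJob, pollInterval
	case o.Backup == jobRunning:
		return actWait, pollInterval
	case o.Upgrade == jobAbsent:
		return actCreateUpgradeJob, pollInterval
	case o.Upgrade == jobRunning:
		return actWait, pollInterval
	default:
		return actNone, 0 // upgrade succeeded; let the loop pass
	}
}

func main() {
	o := observed{CurrentMajor: 10, TargetMajor: 14, BackupPVCExists: true, Backup: jobSucceeded}
	a, d := next(o)
	fmt.Println(a, d)
}
```

Because each pass recomputes the action from what exists in the cluster, a restarted operator would find the existing backup PVC and continue from the next step instead of creating a second one.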

### Risks and Mitigations

The largest risk involved is that of data loss. In order to mitigate this, the backup job must be
completed in order for the upgrade process to happen. After the upgrade, the Operator will point
to the NEW PVC, meaning the client's original data is never touched. This old PVC can be left on

Review comment: What storage requirements should we list for v3.9.0?

the system, and either manually deleted or cleaned up in a future upgrade.

## Design Details

### Test Plan

Testing can be done using the kuttl framework implemented in the Operator repo. This can be
used to unit test the Kubernetes jobs.
Manual upgrades will be done, but for full end-to-end tests across the entire matrix of upgrades,
clear passing requirements should be provided to QE.

### Upgrade / Downgrade Strategy

This is an open question to be discussed.
Should downgrades be supported? (this is separate from a revert due to failure)

Review comment: I don't think I've ever heard of DB downgrades in production. @kwestpharedhat can chime in, but I'd be wary of reverting a DB upgrade.


## Drawbacks

There are no clear drawbacks, only risks. This is a required upgrade in order for security to be
maintained.

## Alternatives

There are no alternatives. For the client, the alternative would be to use unmanaged storage with
an upgraded version of Postgres, which is a bad user experience.

## Infrastructure Needed [optional]

There is no additional infrastructure needed.