[RELEASE] 2024.11.1 - Hotfix #2807

Open
18 of 25 tasks
viniciusdc opened this issue Oct 30, 2024 · 9 comments
Labels
type: release 🏷 Items related to Nebari releases

Comments

@viniciusdc
Contributor

viniciusdc commented Oct 30, 2024

Release Checklist

Release details

Scheduled release date - 2024/11/01

Release captain responsible - @viniciusdc

Starting point - a new release is out

  • Create this issue to track and discuss the upcoming release.
  • Use the previous release issue for any final release-specific discussions, then close.
    • This can be a good time to debrief and discuss improvements to the release process.

Looking forward - planning

  • Create milestone for next release (if it doesn't already exist) and link it back here.
  • Triage bugs to determine what should be included in the release and add them to the milestone.
  • Decide which new features, if any, will be included in the release and add them to the milestone.
    • This will be, in large part, determined by the roadmap.
    • Is there a focus for this release (e.g., UX/UI, stabilization)?

Pre-release process

  • Decide on a date for the release.
    • What outstanding issues need to be addressed?
    • Has documentation been updated appropriately?
    • Are there any breaking changes that should be highlighted?
    • Are there any upstream releases we are waiting on?
    • Do we need to update the Dask versions in nebari-dask?
    • Will there be an accompanying blog post?
  • Prepare for the release.
    • Update the nebari upgrade command for this release.
      • Add upgrade messaging, including deprecation warnings, version-specific warnings, and so on.
    • Optionally, announce a merge freeze.
    • Release Candidate (RC) cycle.
    • Update RELEASE.md notes.

Cut the official release

If there were changes to the following packages, handle their releases before cutting a new release for Nebari:

These steps must be actioned in the order they appear in this checklist.

@viniciusdc
Contributor Author

The changes included in this release are tracked in issue #2798.

@viniciusdc
Contributor Author

Tested the fixes locally; waiting for user feedback.

@kenafoster
Contributor

When upgrading from 2024.7.1 to 2024.11.1rc1, the upgrade took me through the 2024.9.1 prompts before the 2024.11.1 ones. Was this expected? I thought it was supposed to skip them.

(nebari-2024.11.1) (base) kfoster@Mac nebari_2024.11.1 % nebari --version
2024.11.1rc1
(nebari-2024.11.1) (base) kfoster@Mac nebari_2024.11.1 % nebari upgrade -c nebari-config.yaml 

---> Starting upgrade from 2024.7.1 to 2024.9.1

Setting nebari_version to 2024.9.1

Do you want to replace current tag 2024.7.1 with 2024.9.1 for:
default_images.jupyterhub: quay.io/nebari/nebari-jupyterhub? [Y/n]  (Y): y

Do you want to replace current tag 2024.7.1 with 2024.9.1 for:
default_images.jupyterlab: quay.io/nebari/nebari-jupyterlab? [Y/n]  (Y): y

Do you want to replace current tag 2024.7.1 with 2024.9.1 for:
default_images.dask_worker: quay.io/nebari/nebari-dask-worker? [Y/n]  (Y): y



---> Starting upgrade from 2024.9.1 to 2024.11.1

Setting nebari_version to 2024.11.1

Do you want to replace current tag 2024.9.1 with 2024.11.1 for:
default_images.jupyterhub: quay.io/nebari/nebari-jupyterhub? [Y/n]  (Y): y

Do you want to replace current tag 2024.9.1 with 2024.11.1 for:
default_images.jupyterlab: quay.io/nebari/nebari-jupyterlab? [Y/n]  (Y): y

Do you want to replace current tag 2024.9.1 with 2024.11.1 for:
default_images.dask_worker: quay.io/nebari/nebari-dask-worker? [Y/n]  (Y): y

@kenafoster
Contributor

kenafoster commented Nov 6, 2024

While attempting a 2024.9.1 -> 2024.11.1rc1 upgrade on an existing AWS cluster, I noticed this:

Would you like Nebari to assign the corresponding role/scopes to all of your current groups automatically? [y/N] (N): y
...
ValueError: Failed to connect to Keycloak server: 401: b'{"error":"invalid_grant","error_description":"Invalid user credentials"}'

The reason for this is that I have changed security.keycloak.initial_root_password from the value used when Nebari was first deployed, so that the true password isn't stored in the config in our CI/CD repo. I believe this is a common (best?) practice, so relying on that value to connect to Keycloak for the upgrade's group-creation step won't work.

@viniciusdc
Copy link
Contributor Author

Replying to @kenafoster: "Upgrading 2024.7.1 to 2024.11.1rc1 - the prompt took me through the 2024.9.1 prompt before 11.1 - was this expected? I thought it was supposed to skip"

This was expected. "Skipping" only meant that no actions are performed on the user's behalf for the intermediate version. Because of how the upgrade process works, each intermediate step still runs, and there is currently no mechanism to skip a step entirely. That is something we could enhance.

@viniciusdc
Contributor Author

viniciusdc commented Nov 6, 2024

Replying to @kenafoster: "The reason for this is that I have changed security.keycloak.initial_root_password from its value when Nebari was first deployed so that the true password isn't stored in config in our CICD repo. This is a common (best?) practice, so relying on that value to connect to keycloak for the upgrade group creation step won't work."

Hmm... indeed, that's something many of our tests don't account for, and, as you said, it is best practice. There is no easy way to obtain the new password programmatically, though, other than asking the user for it if the attempt with the password in the YAML config fails. What do you think?

We could also point the user to instructions for doing this manually if an error occurs. That sounds like a good idea, since it will be difficult to cover all the edge cases.
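The fallback proposed above (try the password from the config, prompt the user if it is rejected, and otherwise point them to the manual steps) could look roughly like the sketch below. The connect callable and AuthenticationError are hypothetical stand-ins for whatever opens an authenticated Keycloak admin session and whatever it raises on a 401 invalid_grant. Neither is Nebari's or any Keycloak client library's actual API.

```python
# Hypothetical sketch of the proposed password fallback; `connect` and
# `AuthenticationError` are illustrative stand-ins, not a real API.
from getpass import getpass
from typing import Callable, Optional, TypeVar

Session = TypeVar("Session")


class AuthenticationError(Exception):
    """Hypothetical error raised when Keycloak rejects the credentials."""


def connect_with_fallback(
    connect: Callable[[str], Session],
    config_password: Optional[str],
    prompt: Callable[[str], str] = getpass,
) -> Session:
    # First try the value from security.keycloak.initial_root_password.
    if config_password:
        try:
            return connect(config_password)
        except AuthenticationError:
            print("The password in the config was rejected by Keycloak.")
    # The root password may have been changed since the first deployment,
    # so ask the user for the current one.
    password = prompt("Current Keycloak root password: ")
    try:
        return connect(password)
    except AuthenticationError as exc:
        # Stop and direct the user to the manual procedure instead.
        raise SystemExit(
            "Could not authenticate to Keycloak; please assign the "
            "roles/scopes to your groups manually."
        ) from exc
```

Injecting connect and prompt keeps the fallback logic independent of any particular Keycloak client and easy to exercise without a live server.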

@dcmcand
Contributor

dcmcand commented Nov 6, 2024

@kenafoster can you open a ticket about that issue? I don't think it is a release blocker, but it is a good callout and we should talk about how to best handle that situation in a separate issue. Thanks for all the work you are doing with testing!

@kenafoster
Contributor

Yep, here it is: #2833

@viniciusdc
Contributor Author

viniciusdc commented Nov 21, 2024

I have addressed the issues mentioned above and tested the fixes thoroughly.

  • Tested the upgrade paths 2024.7.1 -> 2024.11.1 and 2024.9.1 -> 2024.11.1;
  • Tested fresh deployments and upgraded deployments;
  • Tested GPU usage, since the ami_id logic is coupled with the instance_type classes;
  • Tested user creation, groups, and spawning of user instances. All other resources were left untouched, so the previous tests done for 2024.9.1 still apply.

Note:
While performing the checks, I encountered a strange issue where my cluster attempted to re-create all resources. I assume this was caused by my local stages folder being out of sync with the one in storage; I was able to deploy successfully once I repeated the same action with clean stages files.

While doing this, I also took some time to think about how we want to handle hotfixes in the future and which policies we should consider in case a similar situation arises.

Projects
Status: New 🚦
Development

No branches or pull requests

3 participants