
Immich upgrade always fails with "timed out waiting for the condition" #1687

Closed
Chaphasilor opened this issue Oct 30, 2023 · 13 comments · Fixed by #1708
Labels
enhancement New feature or request

Comments

@Chaphasilor

Chaphasilor commented Oct 30, 2023

I've been running Immich on my TrueNAS SCALE machine for a few weeks now and have updated about 5 times in that time.
Every time, I got the following error:
[screenshot of the upgrade error]

Full error log:

```
Error: Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/middlewared/job.py", line 427, in run
    await self.future
  File "/usr/lib/python3/dist-packages/middlewared/job.py", line 465, in __run_body
    rv = await self.method(*([self] + args))
  File "/usr/lib/python3/dist-packages/middlewared/schema.py", line 1379, in nf
    return await func(*args, **kwargs)
  File "/usr/lib/python3/dist-packages/middlewared/schema.py", line 1247, in nf
    res = await f(*args, **kwargs)
  File "/usr/lib/python3/dist-packages/middlewared/plugins/chart_releases_linux/upgrade.py", line 115, in upgrade
    await self.upgrade_chart_release(job, release, options)
  File "/usr/lib/python3/dist-packages/middlewared/plugins/chart_releases_linux/upgrade.py", line 298, in upgrade_chart_release
    await self.middleware.call('chart.release.helm_action', release_name, chart_path, config, 'upgrade')
  File "/usr/lib/python3/dist-packages/middlewared/main.py", line 1368, in call
    return await self._call(
  File "/usr/lib/python3/dist-packages/middlewared/main.py", line 1328, in _call
    return await self.run_in_executor(prepared_call.executor, methodobj, *prepared_call.args)
  File "/usr/lib/python3/dist-packages/middlewared/main.py", line 1231, in run_in_executor
    return await loop.run_in_executor(pool, functools.partial(method, *args, **kwargs))
  File "/usr/lib/python3.9/concurrent/futures/thread.py", line 52, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/usr/lib/python3/dist-packages/middlewared/plugins/chart_releases_linux/helm.py", line 44, in helm_action
    raise CallError(f'Failed to {tn_action} chart release: {stderr.decode()}')
middlewared.service_exception.CallError: [EFAULT] Failed to upgrade chart release: Error: UPGRADE FAILED: pre-upgrade hooks failed: timed out waiting for the condition
```

It seems like the upgrade itself is usually working though, and the version is up to date after refreshing the apps list.
For the upgrade from v1.81.1 to v1.82.1, my installation got corrupted and didn't come back up anymore. Luckily the pgBackup pod did create a DB dump, and I was able to re-install and import the old database.

I'm running TrueNAS-SCALE-22.12.4.2, and the issue also happened with an older version of Bluefin. The error also occurs on two separate fresh installs, so a broken installation doesn't seem to be the cause either.

If someone could look into this, I would really appreciate it. Recovering my database wasn't easy because it involved messing around with kubectl, so I would be relieved if I didn't have to worry anymore when upgrading to a new version of Immich :)

If any additional info is needed, please let me know!

@stavros-k
Contributor

Hello, it looks like the pre-upgrade job is failing (it's the job that takes a database backup before the upgrade).
I've tried a few upgrades myself and all succeeded.

Reasons I can think of that this could fail:

  • The database is not accessible when the job is running.
  • The database dump takes too long.

But you mention that the db dump was created, which makes me super curious.

Once the job starts, it runs the following:

```shell
until pg_isready -U ${POSTGRES_USER} -h ${POSTGRES_HOST}; do sleep 2; done
echo "Creating backup of ${POSTGRES_DB} database"
pg_dump --dbname=${POSTGRES_URL} --file {{ $backupPath }}/${POSTGRES_DB}_$(date +%Y-%m-%d_%H-%M-%S).sql || echo "Failed to create backup"
echo "Backup finished"
```
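Worth noting: if the database never becomes ready, the `until pg_isready` loop has no attempt limit, so the hook job hangs until Helm gives up and reports exactly the error from this issue, "timed out waiting for the condition". A hypothetical sketch of the same wait-loop pattern (a stub `probe` stands in for `pg_isready`, and an attempt limit is added so the failure is visible):

```shell
# Hypothetical sketch, not the actual hook: same wait-loop pattern,
# but with an attempt limit so it fails fast instead of hanging.
probe() { return 2; }  # stands in for pg_isready; always "no response" here
attempts=0
until probe || [ "$attempts" -ge 5 ]; do
  attempts=$((attempts + 1))
  sleep 0.1  # the real hook sleeps 2s between probes
done
probe || echo "database never became ready after $attempts probes"
```

With the stub always failing, this prints `database never became ready after 5 probes`; the real hook, lacking any limit, simply loops until Helm's hook timeout kills the job.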

Next time you get the error, can you please check the logs of that job? If you create a Debug artifact right after the failure, it should include those logs so they're easy to retrieve.

You can create the Debug artifact from System Settings -> Advanced -> Save Debug (button in the top right corner).
Note: do not share the debug file publicly, as it might contain private details.

@Chaphasilor
Author

Thanks for the response, I'll do it for 1.83.1 once that is released. Could you clarify when and where exactly I should run those four commands? In the TrueNAS shell? Within the pgBackup pod? Or somewhere else entirely? 🤔

@stavros-k
Contributor

> Thanks for the response, I'll do it for 1.83.1 once that is released. Could you clarify when and where exactly I should run those four commands? In the TrueNAS shell? Within the pgBackup pod? Or somewhere else entirely? 🤔

Oh, sorry, you don't have to run those! I was just showing what the pre-upgrade job runs.
There's not much that can go wrong there, unless the database is not responding or the backup takes too long.

@Chaphasilor
Author

Right, I mistook "it runs these" for "run these". My bad. I'll get back to you after the update!

Just FYI, my Immich library has about 8k photos and 300 videos, totaling 41 GB, and the uncompressed SQL dump is ~90 MB. It took only a few seconds to create when I made one manually before this update.

@stavros-k
Contributor

> Just FYI, my Immich library has about 8k photos and 300 videos, totaling 41 GB, and the uncompressed SQL dump is ~90 MB. It took only a few seconds to create when I made one manually before this update.

Yea, the size and duration you mention are in the ballpark I would expect.

@Chaphasilor
Author

Just updated to v1.84 and this time there was no error, naturally 😅
What I did differently this time was not upgrading the machine-learning container beforehand. I have set up the immich-machine-learning Docker container separately and disabled the bundled ML container, because that was giving me trouble at first. Usually I first shut down Immich, then upgrade immich-machine-learning, then upgrade Immich and start it back up (in hopes of not running into incompatibilities between Immich and immich-machine-learning).
I'll try doing that again next time to reproduce the issue.

@stavros-k
Contributor

stavros-k commented Nov 2, 2023

> Just updated to v1.84 and this time there was no error, naturally 😅 What I did differently this time was not upgrading the machine-learning container beforehand. I have set up the immich-machine-learning Docker container separately and disabled the bundled ML container, because that was giving me trouble at first. Usually I first shut down Immich, then upgrade immich-machine-learning, then upgrade Immich and start it back up (in hopes of not running into incompatibilities between Immich and immich-machine-learning). I'll try doing that again next time to reproduce the issue.

Ah, I see. If you trigger the update while the app is stopped, the backup job cannot complete, because the database is stopped.

@Chaphasilor
Author

That does make sense. So you'll probably need to spin up the immich-postgres container too?
Btw, getting debug logs didn't work either 😅
[screenshot of the failed debug save]

@stavros-k
Contributor

> That does make sense. So you'll probably need to spin up the immich-postgres container too?

There isn't a clean way to do that.

Case 1:
The Postgres container is already running and reachable -> proceed with the backup.
-> No problem!

Case 2:
The Postgres container is not running and not reachable -> start a container in the background, take the backup, then stop the container.
-> No problem!

Case 3:
The Postgres container is already running but not reachable. If you try to spin up another Postgres container using the same data directory, there is a big chance of corrupting data: the new container will try to replay the write-ahead log that the already-running container is managing, putting both containers in a bad state.
-> Problem!!
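For what it's worth, the three cases map roughly onto `pg_isready`'s documented exit codes (0 = accepting connections, 1 = rejecting connections, 2 = no response). A hypothetical sketch, with a stub `probe` standing in for `pg_isready -h $POSTGRES_HOST` and simulating case 3:

```shell
# Hypothetical sketch, not the actual hook: branch on pg_isready's exit code.
# probe() stands in for `pg_isready -h $POSTGRES_HOST`; here it simulates
# case 3: a server process exists but is not accepting connections.
probe() { return 1; }
probe; status=$?
case "$status" in
  0) decision="case 1: reachable, proceed with backup" ;;
  2) decision="case 2: no server responding, could start a temporary one" ;;
  *) decision="case 3: server present but not accepting, abort to avoid corruption" ;;
esac
echo "$decision"
```

Even this doesn't fully solve the problem: exit code 2 can't distinguish a genuinely stopped server from one that is running but unreachable over the network, which is why the hook can't safely decide on its own to start a temporary Postgres.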

The next SCALE release will expose some extra metadata to the chart, so it should handle upgrades from a stopped state better, along with this. But this also means that there won't be a backup during an upgrade from a stopped state, as the job will either never fire, or fire but never complete (need to check this!). We can probably detect that with the extra metadata and provide a more useful error message. TL;DR: it's better to upgrade from a running state, so the flow completes as it should.

In case you are wondering.. "Why not just check if the X container is running using X command?"

Well, Apps are just a GUI for generating manifests using Helm, which then get sent to Kubernetes.
Shell scripts can be added to the containers to run at startup, but you cannot interact with the host to check what is running or not. Well, you can, but only if you give the container all the nasty elevated permissions, which for obvious reasons you don't want to do.


Regarding the Save Debug process not completing, I'd suggest opening a ticket at https://jira.ixsystems.com so it can be investigated.

@Chaphasilor
Author

Alright, I was hoping the new SCALE release would also bring some under-the-hood improvements for apps and not just a UI overhaul, good to hear!
TrueCharts always puts a checkbox at the end of the config/edit dialog, along the lines of "I have checked the documentation"; maybe you could include something similar mentioning that it's best to upgrade from a running state, without shutting down first? 😁

@Chaphasilor
Author

Sweet, thanks for that! I take it this behavior will be available after refreshing the chart for the next Immich upgrade? If so, I'll go ahead and try it out ASAP! 😁

@stavros-k
Contributor

Not yet. The "code" will ship with the next Immich app release, but the actual metadata will only exist in the next Cobia release.
Until then it will work as it does now.

@Chaphasilor
Author

Got it. If I don't forget about it, I'll check back once I've upgraded to Cobia!
