-
Notifications
You must be signed in to change notification settings - Fork 39
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
All builds stuck "discovering any new versions", seems to be NATS cert expiry underneath, recover instructions not worked #334
Comments
Figured this out -- I was using "sh" when I needed "bash" for this to be a valid command. |
I think I'm on step 6: "bosh deploy --recreate --fix <(bosh manifest)" It gives this error, which is the same error that "control-tower deploy" gives to me:
Any suggestions on how to debug or fix? |
I have deleted and recreated my Concourse, which has mostly worked but was massively disruptive |
Lots of my builds are failing with "Docker failed to start within 120 seconds." and lots are just hanging. I think it's because the worker is overloaded because I'm building so much in parallel as I'm starting from scratch, but this isn't a great failure behaviour. EDIT: this is happening even when the server is not busy, so something else is wrong. Any suggestions gratefully received |
I have updated to the latest version and this seems to be settling down. |
This has happened again, one year later I'm following the same instructions again, and failing again on step 6 with the same error |
I might have fixed this by trying lots of different things, including running the step 6: "bosh deploy --recreate --fix <(bosh manifest)" multiple times, then deleting both worker and web VMs, then (after bosh failed to recreate them with NATs cert errors), re-running the "control tower deploy" command |
Summary
All my builds got stuck saying "discovering any new versions" for ages. I looked at concourse/#844 and its linked issues for a while.
As part of that red herring, I found the following issue:
Which looks a lot like https://github.com/EngineerBetter/control-tower/blob/master/docs/troubleshooting.md#bosh-director-certificate-has-expired
We have had similar issues before and had followed the NATS cert renewal instructions last week in an attempt to avoid this.
I followed those instructions but then got:
I then tried to follow https://github.com/EngineerBetter/control-tower/blob/master/docs/troubleshooting.md#nats-certificate-is-expired
but I currently have:
In step 6 of the above, where it says " Run bosh deploy --recreate --fix <(bosh manifest)", what should I use for "bosh manifest"?
Steps to reproduce
Run Concourse for more than one year.
Expected results
Concourse should continue to work, or be easily recoverable.
If there are any errors they should be clear and suggest fixes.
Actual results
Concourse fails with all builds stuck on "discovering any new versions"
Additional context
Triaging info
Any help or advice would be gratefully received
The text was updated successfully, but these errors were encountered: