
Destroy Azure Cosmos DB throwing exception #1266

Open
masashi-shib opened this issue Nov 3, 2021 · 19 comments
Labels
blocked The issue cannot be resolved without 3rd party action. impact/reliability Something that feels unreliable or flaky kind/bug Some behavior is incorrect or out of spec needs-repro Needs repro steps before it can be triaged or fixed

Comments

@masashi-shib

Hello!

  • Vote on this issue by adding a 👍 reaction
  • To contribute a fix for this issue, leave a comment (and link to your pull request, if you've opened one already)

Issue details

When running pulumi destroy we get an exception saying the Cosmos DB account already has an ongoing operation, which appears to be the delete operation issued by pulumi destroy itself.

Running pulumi destroy again throws another exception with the same error.

Note: an Azure Cosmos DB account takes a long time to provision and to delete.

  pulumi:pulumi:Stack :
    error: update failed

  azure-native:documentdb:DatabaseAccount:
    error: Code="PreconditionFailed" Message="There is already an operation in progress which requires exclusive lock on this service xxx. Please retry the operation after sometime.\r\nActivityId: xxxx, Microsoft.Azure.Documents.Common/2.14.0"

package.json

"dependencies": {
    "@pulumi/azure": "^4.0.0",
    "@pulumi/azure-native": "^1.0.0",
    "@pulumi/pulumi": "^3.3.1",

Steps to reproduce

  1. Write code to create a new DB account: return new azureNative.documentdb.DatabaseAccount(... (a minimal sketch is shown after these steps)
  2. pulumi up
  3. pulumi destroy

Expected: I am not sure; maybe the author can reply :)
Actual: An exception is thrown from Pulumi.
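
For reference, a minimal program matching these steps might look like the following. This is only a sketch, not the reporter's actual code; the resource group, resource names, and location are assumed.

    import * as azureNative from "@pulumi/azure-native";

    // Hypothetical resource group; the original report does not show one.
    const rg = new azureNative.resources.ResourceGroup("cosmos-repro-rg");

    // Minimal Cosmos DB account, roughly matching step 1 above.
    const account = new azureNative.documentdb.DatabaseAccount("cosmos-repro", {
        resourceGroupName: rg.name,
        databaseAccountOfferType: "Standard",
        locations: [{ locationName: rg.location, failoverPriority: 0 }],
    });

    export const accountName = account.name;

Running pulumi up followed by pulumi destroy against a stack like this is what intermittently produced the PreconditionFailed error above.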

@masashi-shib masashi-shib added the kind/bug Some behavior is incorrect or out of spec label Nov 3, 2021
@mikhailshilkov mikhailshilkov added the awaiting-feedback Blocked on input from the author label Nov 4, 2021
@mikhailshilkov
Member

Is it just the DatabaseAccount resource that you are creating, or other Cosmos resources as well? Can you share the code for the account? (We have a simple nightly test with it and don't see these errors.) Thank you!

@masashi-shib
Author

@mikhailshilkov thank you for the response.
Basically we are creating the account and also a Database resource.

new documentdb.DatabaseAccount(...)
new documentdb.MongoDBResourceMongoDBDatabase(...)

Strangely enough, it is not consistently reproducible.
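
For context, that combination looks roughly like the following in azure-native. This is only a sketch: the resource group, names, region, and the MongoDB kind are assumptions, not the reporter's actual arguments.

    import * as documentdb from "@pulumi/azure-native/documentdb";

    // Account configured for the MongoDB API (illustrative arguments).
    const account = new documentdb.DatabaseAccount("app-cosmos", {
        resourceGroupName: "my-rg", // hypothetical resource group
        kind: "MongoDB",
        databaseAccountOfferType: "Standard",
        locations: [{ locationName: "westeurope", failoverPriority: 0 }],
    });

    // MongoDB database inside that account.
    new documentdb.MongoDBResourceMongoDBDatabase("app-db", {
        resourceGroupName: "my-rg",
        accountName: account.name,
        resource: { id: "app-db" },
    });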

@mikhailshilkov
Member

How many regions are you deploying to?

@masashi-shib
Author

Just one region currently.

@mikhailshilkov mikhailshilkov added needs-repro Needs repro steps before it can be triaged or fixed and removed awaiting-feedback Blocked on input from the author labels Nov 8, 2021
@masashi-shib
Author

[Screenshot: AzureCosmosProblem]

The screenshot above shows the state of the Cosmos DB account in the Azure Portal: it stays in the Deleting state for 5-10 minutes after Pulumi has thrown that error.
Again, it is not consistently reproducible.

@justinmchase

Any update on this? I am seeing something similar, and there is remarkably little information available about this error message.

@sloncho

sloncho commented Oct 26, 2022

Cosmos DB is very slow to delete. I suspect the reason behind the exception is that the delete operation times out, Pulumi (or the SDK) retries, and because the resource is already being deleted, the retry throws. We solved this by using custom resource options with a custom timeout for the delete operation.

But maybe changing the default timeout for DatabaseAccount would be better.
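
For anyone looking for that workaround, it looks roughly like this. The 30-minute values are an arbitrary illustration rather than a recommendation from this thread, and the resource arguments are placeholders.

    import * as azureNative from "@pulumi/azure-native";

    const account = new azureNative.documentdb.DatabaseAccount("app-cosmos", {
        resourceGroupName: "my-rg", // hypothetical
        databaseAccountOfferType: "Standard",
        locations: [{ locationName: "westeurope", failoverPriority: 0 }],
    }, {
        // Give the slow Cosmos DB create/delete operations more time before Pulumi gives up.
        customTimeouts: { create: "30m", update: "30m", delete: "30m" },
    });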

@mikhailshilkov mikhailshilkov added the impact/reliability Something that feels unreliable or flaky label Dec 20, 2022
@mikocot

mikocot commented Oct 2, 2023

@mikhailshilkov is that something you're planning to fix anytime soon? It's been ~2 years since the bug was reported, and we occasionally still hit the same issue.

@mikocot

mikocot commented Nov 8, 2023

We've seen the same issue with other resources that depend on Cosmos, this time a private endpoint. I guess the issue still persists.

@danielrbradley
Member

@mikocot it looks like we've not received a way to reliably reproduce the issue, which will hamper efforts to find a fix.

From the original conversation here, it appears this error might simply have been caused by a delete taking too long: the Pulumi deployment timed out, and the next deployment failed because the previous deletion was still in progress. It's impossible to be sure without a repro, though.

If you have a way to reproduce a similar issue for another resource, I'd suggest opening that as a new issue.

@thomas11 thomas11 self-assigned this Dec 6, 2023
@thomas11
Contributor

thomas11 commented Dec 8, 2023

Hi everyone, I gave reproducing this issue another try.

I wrote an Automation API program that creates N stacks in parallel. Each one has a Cosmos DB account with a database in it, plus a Cosmos MongoDB account with a database in it.

I ran with N up to 30, in different Azure regions. The results were consistent: it always succeeded. With N=10 it took around 10 minutes of total wall-clock time; with N=30, around 37 minutes.
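
For reference, the parallel repro was along these lines. This is a simplified sketch: the project name, stack names, and inline program are placeholders, and the MongoDB account variant is omitted for brevity.

    import { LocalWorkspace } from "@pulumi/pulumi/automation";
    import * as azureNative from "@pulumi/azure-native";

    const N = 10; // number of parallel stacks

    async function runStack(i: number) {
        const stack = await LocalWorkspace.createOrSelectStack({
            projectName: "cosmos-repro",
            stackName: `repro-${i}`,
            // Inline program: one Cosmos DB account per stack (databases omitted here).
            program: async () => {
                const rg = new azureNative.resources.ResourceGroup(`rg-${i}`);
                new azureNative.documentdb.DatabaseAccount(`cosmos-${i}`, {
                    resourceGroupName: rg.name,
                    databaseAccountOfferType: "Standard",
                    locations: [{ locationName: rg.location, failoverPriority: 0 }],
                });
            },
        });
        await stack.up({ onOutput: console.log });
        await stack.destroy({ onOutput: console.log });
    }

    // Run every stack concurrently and surface any failure.
    Promise.all(Array.from({ length: N }, (_, i) => runStack(i))).catch((err) => {
        console.error(err);
        process.exit(1);
    });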

@mikocot

mikocot commented Dec 12, 2023

@thomas11 @danielrbradley we don't have a reliable way to reproduce it, but my guess is that the Cosmos DB account needs to be in some kind of use for the lock to occur. Anyway, for the moment we don't have any more details, but we also don't see it often.

@mjeffryes mjeffryes added this to the 0.98 milestone Dec 15, 2023
@lukehoban lukehoban added the blocked The issue cannot be resolved without 3rd party action. label Dec 18, 2023
@mjeffryes mjeffryes assigned caboog and unassigned thomas11 Dec 18, 2023
@mjeffryes mjeffryes removed this from the 0.98 milestone Jan 26, 2024
@caboog

caboog commented Feb 14, 2024

Hi all. We are closing this issue as there is no reliable way to reproduce it. If you find a way to repro this, please open a new issue with those steps.

Thanks.

@caboog caboog closed this as not planned Feb 14, 2024
@justinmchase

Normally what you would do is leave the issue open until it's fixed. People will come here and keep adding more context until a reliable reproduction can be found.

@danielrbradley
Member

@justinmchase agreed, we'll leave this open for visibility.

If anyone who's experienced this can let us know if it's been fixed upstream, we'll then close this to clear the issue backlog.

@serpentfabric

> @justinmchase agreed, we'll leave this open for visibility.
>
> If anyone who's experienced this can let us know if it's been fixed upstream, we'll then close this to clear the issue backlog.

we just ran into it again yesterday... hth...

@jirikopecky

Hello, this recently started happening with pretty much every Cosmos DB destroy; a fix would be much appreciated.

@thomas11
Contributor

Hi @jirikopecky, if it happens very reliably for you, it would be very helpful if you could capture verbose logs. Note that they will contain data such as your subscription ID, so you may want to redact it or filter the log down to the HTTP requests and responses to/from Azure.
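
For example, something along these lines captures the verbose engine output to a file (the file name is arbitrary; scrub secrets and subscription IDs before sharing):

    pulumi destroy --logtostderr -v=9 2> pulumi-destroy.log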

@jirikopecky

We use GitHub Actions, so to my knowledge it's not supported to do so - pulumi/actions#589

But to sum things up:

  • We have 2 stacks (global and regional)
  • The global stack has the Cosmos DB account in it
  • The regional stack creates a private endpoint so it can access Cosmos securely (a rough sketch follows this list)
  • They are deleted in reverse order (first regional, then global)
  • The global stack deletion fails with this error on the Cosmos resource
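
To illustrate the shape of that regional-stack setup, a sketch follows. Everything here is an assumption for illustration only: the stack reference name, the exported output name, the resource group, the subnet ID, and the group ID.

    import * as pulumi from "@pulumi/pulumi";
    import * as azureNative from "@pulumi/azure-native";

    // Regional stack: read the Cosmos DB account ID exported by the global stack.
    const globalStack = new pulumi.StackReference("org/global/prod");          // hypothetical stack name
    const cosmosAccountId = globalStack.requireOutput("cosmosAccountId");      // hypothetical output name

    new azureNative.network.PrivateEndpoint("cosmos-pe", {
        resourceGroupName: "regional-rg",                        // hypothetical
        subnet: { id: "/subscriptions/.../subnets/private" },    // hypothetical subnet ID
        privateLinkServiceConnections: [{
            name: "cosmos-connection",
            privateLinkServiceId: cosmosAccountId,
            groupIds: ["Sql"], // or "MongoDB", depending on the account's API kind
        }],
    });

Destroying the regional stack removes this private endpoint first; the global stack's Cosmos account is deleted afterwards, which is the ordering described in the list above.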
