EventSourcedBehavior/Durable State exposed to stack overflow when lots of read-only commands are in the stash #1327

Open
JacobF7 opened this issue May 15, 2024 · 6 comments
Labels
bug Something isn't working help wanted Extra attention is needed needs-reproducible-test

@JacobF7

JacobF7 commented May 15, 2024

Both EventSourcedBehavior and Durable State seem to be exposed to stack overflow exceptions when lots of read-only commands are in the stash. This bug has already been reported as an issue for Akka, and a fix was recently merged there. Are there any plans to fix this in Pekko too?

@pjfanning
Contributor

pjfanning commented May 15, 2024

Legally, we can't take any code that is committed to Akka since its license changed. If someone wants to attempt a clean room fix, that would be great.

Even a reproducible test case that someone else can use to do a clean room fix would be much appreciated.

@He-Pin
Member

He-Pin commented May 15, 2024

@JacobF7 would you like to prepare a reproducer? I'm not using persistence at work.

@pjfanning pjfanning added help wanted Extra attention is needed needs-reproducible-test bug Something isn't working labels May 16, 2024
@pjfanning pjfanning added this to the 1.1.0-M2 milestone May 16, 2024
@pjfanning
Contributor

I've read akka/akka#29933 but don't want to look at the PR, since its license is not compatible with Apache Pekko. It seems like the issue is that 'tryUnstashOne will, after a few steps, call the onCommand method that in turn will call tryUnstashOne'.

So it seems like we need to make tryUnstashOne non-recursive. We may want to allow a small number of recursive calls but after a certain depth, we will need to use something like sending a message to continue the unstashing. There is a good chance that we will need to wait for a response message that tells us the async unstashing has completed.
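
As a rough illustration of the depth-limit idea above, here is a library-free Java sketch. It does not use Pekko and is not the Akka fix. `BoundedUnstash`, `MAX_DEPTH`, and the `Runnable` mailbox are hypothetical stand-ins; only the names `tryUnstashOne` and `onCommand` come from this thread.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical model: recurse a little, then continue via a message to self. */
final class BoundedUnstash {
    // Assumed limit for illustration; not a Pekko setting.
    static final int MAX_DEPTH = 32;

    private final Deque<String> stash = new ArrayDeque<>();
    // Stand-in for the actor mailbox: each entry runs on a fresh call stack.
    private final Deque<Runnable> mailbox = new ArrayDeque<>();

    int processed = 0;
    int maxDepthSeen = 0;
    int continuations = 0;

    void stash(String cmd) {
        stash.addLast(cmd);
    }

    // Stand-in for tryUnstashOne: handles one stashed command per call.
    void tryUnstashOne(int depth) {
        if (stash.isEmpty()) {
            return;
        }
        if (depth >= MAX_DEPTH) {
            // Too deep: enqueue a continuation instead of recursing further.
            continuations++;
            mailbox.addLast(() -> tryUnstashOne(0));
            return;
        }
        maxDepthSeen = Math.max(maxDepthSeen, depth + 1);
        onCommand(stash.pollFirst(), depth);
    }

    // Stand-in for a read-only handler: nothing is persisted,
    // so unstashing resumes immediately from inside the handler.
    void onCommand(String cmd, int depth) {
        processed++;
        tryUnstashOne(depth + 1);
    }

    // Drains the stash, then the mailbox, the way a dispatcher would.
    void run() {
        tryUnstashOne(0);
        Runnable next;
        while ((next = mailbox.pollFirst()) != null) {
            next.run();
        }
    }

    public static void main(String[] args) {
        BoundedUnstash actor = new BoundedUnstash();
        for (int i = 0; i < 100_000; i++) {
            actor.stash("read-" + i);
        }
        actor.run();
        System.out.println("processed=" + actor.processed
                + " maxDepthSeen=" + actor.maxDepthSeen);
    }
}
```

In this model the chain of nested calls never exceeds `MAX_DEPTH`, however many commands are stashed. A real actor would also receive unrelated messages between continuations, which is presumably why a completion signal may be needed; the sketch does not model that.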

Alternatively, we could look at turning this into a loop instead of using recursion.
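
A minimal sketch of the loop alternative, again in plain Java without Pekko. It assumes a re-entrancy guard: when a handler calls `tryUnstashOne` while a drain is already running, the call only sets a flag and the outer loop picks up the next command. `LoopUnstash`, `draining`, and `again` are hypothetical names, not Pekko internals.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical model: the handler's re-entrant call is turned into a loop iteration. */
final class LoopUnstash {
    private final Deque<String> stash = new ArrayDeque<>();
    private boolean draining = false; // true while the outer loop is running
    private boolean again = false;    // set when a handler asks for the next command

    int processed = 0;
    int depth = 0;
    int maxDepthSeen = 0;

    void stash(String cmd) {
        stash.addLast(cmd);
    }

    void tryUnstashOne() {
        if (draining) {
            // Re-entrant call from a handler: do not recurse, just request another pass.
            again = true;
            return;
        }
        draining = true;
        try {
            do {
                again = false;
                String cmd = stash.pollFirst();
                if (cmd == null) {
                    break;
                }
                onCommand(cmd);
            } while (again); // stops if the handler did not ask to continue
        } finally {
            draining = false;
        }
    }

    // Stand-in for a read-only handler that immediately asks for the next command.
    void onCommand(String cmd) {
        depth++;
        maxDepthSeen = Math.max(maxDepthSeen, depth);
        processed++;
        tryUnstashOne();
        depth--;
    }

    public static void main(String[] args) {
        LoopUnstash actor = new LoopUnstash();
        for (int i = 0; i < 1_000_000; i++) {
            actor.stash("read-" + i);
        }
        actor.tryUnstashOne();
        System.out.println("processed=" + actor.processed
                + " maxDepthSeen=" + actor.maxDepthSeen);
    }
}
```

Here the handler nesting stays at one level regardless of stash size. The `again` flag also leaves room for handlers that must not continue right away, such as a write that has to wait for persistence; how that maps onto the real internals is not something this sketch claims to show.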

@pjfanning
Contributor

@JacobF7 Can I get some background? I reproduced the issue but with a use case that is not real world. Have you hit this issue in the real world?

I have messed about with some changes but, in all honesty, they tend to break stuff.

If someone else wants to try their own solution, please feel free to have a look. Just a reminder, we can't accept any solutions that rely on examining the Akka changes.

One further potential solution is to ignore read-only events when there are too many of them. In my testing, it takes more than 1,000 such events to cause an issue. Is there a good reason not to limit the number of read-only events?
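
To make the capping idea concrete, here is a hedged Java sketch of a stash that refuses read-only entries beyond a limit. It is a standalone model, not Pekko's stash; `CappedStash`, `Command`, and the cap value are assumptions chosen for illustration.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical stash that keeps every write but caps the read-only entries. */
final class CappedStash {
    /** Minimal command model; the readOnly flag is an assumption of this sketch. */
    static final class Command {
        final String name;
        final boolean readOnly;

        Command(String name, boolean readOnly) {
            this.name = name;
            this.readOnly = readOnly;
        }
    }

    private final int maxReadOnly;
    private final Deque<Command> stash = new ArrayDeque<>();
    private int readOnlyCount = 0;
    int dropped = 0;

    CappedStash(int maxReadOnly) {
        this.maxReadOnly = maxReadOnly;
    }

    /** Returns false when a read-only command is dropped because the cap is reached. */
    boolean offer(Command cmd) {
        if (cmd.readOnly) {
            if (readOnlyCount >= maxReadOnly) {
                dropped++;
                return false;
            }
            readOnlyCount++;
        }
        stash.addLast(cmd); // writes are always kept, in arrival order
        return true;
    }

    /** Removes and returns the oldest stashed command, or null if empty. */
    Command poll() {
        Command cmd = stash.pollFirst();
        if (cmd != null && cmd.readOnly) {
            readOnlyCount--;
        }
        return cmd;
    }

    int size() {
        return stash.size();
    }

    public static void main(String[] args) {
        CappedStash stash = new CappedStash(1000);
        for (int i = 0; i < 2000; i++) {
            stash.offer(new Command("read-" + i, true));
        }
        stash.offer(new Command("write-0", false));
        System.out.println("size=" + stash.size() + " dropped=" + stash.dropped);
    }
}
```

One cost of this approach is that a dropped read gets no answer unless the caller is told about the rejection, so whether it is acceptable depends on the use case.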

@He-Pin
Member

He-Pin commented May 25, 2024

@JacobF7 Would you like to provide a reproducer? Thanks.

@JacobF7
Author

JacobF7 commented May 26, 2024

@pjfanning - In our scenario (gambling), we have an actor that receives 15,000 requests per second at peak load (both read and write events). Unfortunately, in this particular scenario we don't have the option of distributing the load, because many users need to read from and write to the same resource (actor). Moreover, write events need to be handled one by one, in the order in which they arrive in the inbox. Our performance tests suggested that Pekko Persistence (Event Sourcing in particular, but also Durable State) was not the right fit for this amount of load, so we opted instead for a regular actor without persistence. Persistence was handled by manually snapshotting the state of the actor periodically.
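
For readers unfamiliar with the workaround described above, a minimal Java sketch of manual snapshotting follows. It is not the poster's code. `SnapshottingCounter` is hypothetical, the state is reduced to a single number, the in-memory list stands in for a durable store, and snapshots are triggered by write count rather than by a timer so that the example is deterministic.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical single-writer state holder with count-based snapshots. */
final class SnapshottingCounter {
    // Assumption: snapshot every N writes instead of on a wall-clock schedule.
    private final int snapshotEvery;
    // Stand-in for a durable snapshot store.
    private final List<Long> snapshots = new ArrayList<>();

    private long state = 0;
    private int writesSinceSnapshot = 0;

    SnapshottingCounter(int snapshotEvery) {
        this.snapshotEvery = snapshotEvery;
    }

    /** Read-only command: answered from memory, nothing is stored. */
    long read() {
        return state;
    }

    /** Write command: applied in arrival order, then snapshotted when due. */
    void write(long delta) {
        state += delta;
        writesSinceSnapshot++;
        if (writesSinceSnapshot >= snapshotEvery) {
            snapshots.add(state);
            writesSinceSnapshot = 0;
        }
    }

    /** State that would be restored after a crash: the latest snapshot, or 0 if none. */
    long recover() {
        return snapshots.isEmpty() ? 0 : snapshots.get(snapshots.size() - 1);
    }

    int snapshotCount() {
        return snapshots.size();
    }

    public static void main(String[] args) {
        SnapshottingCounter counter = new SnapshottingCounter(100);
        for (int i = 0; i < 250; i++) {
            counter.write(1);
        }
        System.out.println("live=" + counter.read() + " recoverable=" + counter.recover());
    }
}
```

The trade-off is visible in `recover()`: writes made after the most recent snapshot are lost on a crash, which event sourcing would otherwise preserve.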

@He-Pin - As @pjfanning mentioned, the issue can be replicated by increasing the load (for instance submitting 2000 read requests).

@pjfanning pjfanning modified the milestones: 1.1.0, 1.1.1 Aug 23, 2024
@pjfanning pjfanning modified the milestones: 1.1.1, 1.1.2 Sep 9, 2024
@pjfanning pjfanning modified the milestones: 1.1.2, 1.1.x Sep 27, 2024
@pjfanning pjfanning modified the milestones: 1.1.x, 1.2.0 Jan 4, 2025
3 participants