-
Hi, I have been stuck on this for the past week and eventually zeroed in on an issue with MicroStream (or maybe I am using it wrong). We noticed several of our microservice pods getting OOMKilled even though I thought we had set our Java startup parameters correctly. Heap dumps and JFRs didn't show anything out of the ordinary, but when running `top` I noticed the RES/RSS going way beyond what was even captured by Java NMT (i.e. `jcmd VM.native_memory`). Eventually the pod gets killed and the cycle starts again.

Our use case is simple: we want to cache data we pull from an Oracle DB into MicroStream (we schedule a bulk load every 10 minutes to refresh it). One of our largest Lists is around 250MB, which doesn't seem too large. At first I thought there was an issue with the NioFileSystem, but switching the storage target to SQLite produced the same behavior, and I am following the examples as closely as possible. I know from a Stack Overflow post that MicroStream is implemented using direct ByteBuffers, which explains why I'm having a hard time profiling this with Java-based tools. I even tried jemalloc, thinking it was a fragmentation problem; I believe it slowed the growth down but didn't solve the leak.

This is a simplified version of our boot flow. Assuming the MicroStream database was already persisted, our pod start sequence looked like this (we had more than one EmbeddedStorageManager, one per cache):
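(A sketch rather than the literal code: `CacheBootstrap`, `CacheEntry`, and the storage path are simplified placeholders, `DataRoot` is the root class shown a bit further down, and the package names assume MicroStream 5.x.)

```java
import java.nio.file.Paths;
import java.util.List;

import one.microstream.storage.embedded.types.EmbeddedStorage;
import one.microstream.storage.embedded.types.EmbeddedStorageManager;

public class CacheBootstrap
{
    private volatile List<CacheEntry> cache;   // the data the service actually uses

    public void reload()
    {
        // start the storage manager against the already-persisted storage directory
        final EmbeddedStorageManager storage = EmbeddedStorage.start(Paths.get("/data/cache-a"));

        // read the root graph and keep only the data we need in memory
        final DataRoot root = (DataRoot)storage.root();
        this.cache = root.getEntries();

        // shut the manager down until the next scheduled bulk reload (~every 10 minutes)
        storage.shutdown();
    }
}
```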
Basically, we just wanted to pull the data from the file system, load it into memory for use, and shut down the storage manager until we need to reload it again. I added a bunch of code to check how the memory looked after I closed the StorageManager, and even cleared the reference to the root and performed explicit GCs, which does clear it from the Java heap. However, the RSS/RES was not going down, and the off-heap memory kept going up every time I did this. I even tried the various explicit cleanup methods, roughly like this:
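(Continuing the sketch above; this is the shape of the teardown I tested after each load, not the literal code, and `unload` is a placeholder name.)

```java
// simplified teardown tested after each bulk load
private void unload(final EmbeddedStorageManager storage)
{
    storage.shutdown();   // stop the storage manager for this cache
    this.cache = null;    // drop our reference to the loaded data / root graph
    System.gc();          // explicit GC: the Java heap shrinks as expected,
                          // but RES/RSS (off-heap) keeps climbing
}
```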
All to no avail. This was not noticeable with our smallest caches, but for the 250MB one it was. By the way, our root is set to something like this:
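(Simplified; the field names are changed and `CacheEntry` stands in for our actual entity class.)

```java
import java.util.ArrayList;
import java.util.List;

// root object registered with the EmbeddedStorageManager
public class DataRoot
{
    // our largest instance of this list weighs in at roughly 250MB
    private final List<CacheEntry> entries = new ArrayList<>();

    public List<CacheEntry> getEntries()
    {
        return this.entries;
    }
}
```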
Our container is Oracle Linux 8.4 with the latest OpenJDK 17 and Helidon 2.4.1 (I used the MicroStream version integrated with it, 5.0 GA, but also confirmed the same behavior with the 7.0 GA release of MicroStream). I guess the first question is: are we using the StorageManager the right way? Thanks.
Replies: 5 comments 1 reply
-
FYI, after a lot of trial and error, I found a workaround to this issue. I ended up wrapping the root with a RootWrapper object containing a Lazy Reference, then just before I closed the StorageManager, I called the .clear() method. The off-heap RES then ended up getting freed as expected.
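Roughly, the wrapper looks like this (simplified; `CacheEntry` and the method names are placeholders, and the Lazy API shown assumes MicroStream 5.x):

```java
import java.util.List;

import one.microstream.reference.Lazy;

// the root is now a thin wrapper holding the data behind a Lazy reference
public class RootWrapper
{
    private Lazy<List<CacheEntry>> entries;

    public void setEntries(final List<CacheEntry> data)
    {
        this.entries = Lazy.Reference(data);
    }

    public List<CacheEntry> getEntries()
    {
        return Lazy.get(this.entries);
    }

    // called right before storageManager.shutdown()
    public void clearEntries()
    {
        if (this.entries != null)
        {
            this.entries.clear();   // releases the lazily referenced data
        }
    }
}
```

Calling clearEntries() right before shutdown() is what finally made the off-heap RES drop.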
This also seems related to an earlier discussion about not being able to call .shutdown() and then .start() right after, which is odd. Intuitively, any resources tied to the StorageManager should be cleanly released when it is closed; since it implements AutoCloseable, I would expect it to behave like a DataSource connection.close(). This was particularly nasty since the leak didn't show up in heap dumps, JFRs, or even NMT.
-
Hello, that solution with the Lazy reference is interesting. Do you keep a reference to the data returned from the storage manager's root? If so, this might explain why the memory is not freed after shutting down MicroStream. MicroStream will never clear any user data from memory (except if Lazy is used); that is the task of the Java GC. If there is any reference to the returned root data, the GC cannot free it. In your use case it may be an option to put all the code that loads the storage data into a single method, so that everything except the loaded data can be cleaned up easily:
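Something along these lines (a simplified sketch; `DataRoot`, `CacheEntry`, and the storage path are placeholders taken from the description above):

```java
// everything that touches the storage lives inside this one method,
// so after it returns only the loaded data itself is still referenced
public List<CacheEntry> loadData()
{
    final EmbeddedStorageManager storage = EmbeddedStorage.start(Paths.get("/data/cache-a"));
    try
    {
        final DataRoot root = (DataRoot)storage.root();
        return root.getEntries();
    }
    finally
    {
        storage.shutdown();
    }
}
```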
Regarding the shutdown() / start() topic, I have to apologize for that on behalf of the team.
-
FYI, we are at the postmortem phase of this issue as we are confirming the fix in a live environment. I will be closing this topic within our team, but figured this might be useful to your team. It was the initial stack trace that gave the strongest clue about where the leak was occurring (line numbers should match the 05.00.02-MS-GA version of MicroStream):
-
One last thing I wanted to note to clarify the issue: the GC is indeed clearing the heap objects as intended, as observed in the JFRs and heap dumps, and heap usage stays well below our Xmx setting. It is the off-heap memory that keeps growing. I suspect there is a code path where the PersistenceManager does not properly call deallocateDirectByteBuffer().
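For reference, a quick way to watch the JVM's direct-buffer pools from inside the process (only a sketch; it covers buffers allocated via ByteBuffer.allocateDirect, not memory allocated directly through Unsafe):

```java
import java.lang.management.BufferPoolMXBean;
import java.lang.management.ManagementFactory;

// prints the current size of the JVM's "direct" and "mapped" buffer pools
public class DirectMemoryProbe
{
    public static void main(final String[] args)
    {
        for (final BufferPoolMXBean pool : ManagementFactory.getPlatformMXBeans(BufferPoolMXBean.class))
        {
            System.out.printf("%-8s count=%d used=%d bytes capacity=%d bytes%n",
                pool.getName(), pool.getCount(), pool.getMemoryUsed(), pool.getTotalCapacity());
        }
    }
}
```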
-
Many thanks for your efforts, and sorry that I was not able to provide a good solution. Unfortunately, so far I have not been able to create a scenario that causes a memory leak the way you described. Maybe I missed an important detail…
I’ll forward your bug description to our test guys; maybe they can reproduce the problem some day.