Best way to extract the top N keys from a local cache? #85
-
@javafanboy We don't keep statistics at the key level (and it would be cost prohibitive to do so), but you could possibly use the Invocation Service to get the front cache key set from each of the existing storage-disabled members and then preload the intersection of those key sets, or keys that are present on > X% of members, or something similar. That said, you may not gain much, if anything -- it may be more expensive to calculate and preload the entries that you think are "the best" than to simply load them as the requests on the new member come in. Basically, you are optimizing a simple get from a remote member, which is not an expensive operation unless the entry itself is pretty big (I assume that the latency between storage-enabled and storage-disabled members is quite low). It may be best just to let the near cache do its thing and prime itself based on the actual requests coming in. Just my two cents.
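To illustrate that idea, here is a rough, untested sketch of the Invocation Service approach. The cache name "products" and the service name "InvocationService" are placeholders, it assumes the cache on the storage-disabled members is a NearCache so that getFrontMap() exposes the local front map, and the task class would need to be on the classpath of every member.

```java
import com.tangosol.net.AbstractInvocable;
import com.tangosol.net.CacheFactory;
import com.tangosol.net.InvocationService;
import com.tangosol.net.NamedCache;
import com.tangosol.net.cache.NearCache;

import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/**
 * Invocable that runs on each member and returns the key set of that
 * member's near-cache front map.
 */
public class FrontKeysTask extends AbstractInvocable {

    @Override
    public void run() {
        NamedCache cache = CacheFactory.getCache("products");   // placeholder cache name
        Set<Object> keys = cache instanceof NearCache
                ? new HashSet<Object>(((NearCache) cache).getFrontMap().keySet())
                : new HashSet<Object>();
        setResult(keys);
    }

    /**
     * Run on the newly started member: collect front-map keys from every member
     * running the invocation service and preload the keys that are present on
     * more than the given fraction of them.
     */
    public static void preloadPopularKeys(double fraction) {
        InvocationService service =
                (InvocationService) CacheFactory.getService("InvocationService"); // placeholder service name

        // null member set = send the task to all members running the service
        Map<?, ?> results = service.query(new FrontKeysTask(), null);

        // Count on how many members each key is currently cached
        Map<Object, Integer> counts = new HashMap<>();
        for (Object memberKeys : results.values()) {
            for (Object key : (Set<?>) memberKeys) {
                counts.merge(key, 1, Integer::sum);
            }
        }

        // Keep keys that appear on more than fraction * memberCount members
        int threshold = (int) (results.size() * fraction);
        Set<Object> keysToPreload = new HashSet<>();
        counts.forEach((key, count) -> { if (count > threshold) keysToPreload.add(key); });

        // A single getAll warms the local near cache with those entries
        CacheFactory.getCache("products").getAll(keysToPreload);
    }
}
```

Calling the hypothetical FrontKeysTask.preloadPopularKeys(0.5) on the new member would then warm its near cache with the keys held by more than half of the other members. Note that the task will also run on the new member itself, which is harmless here since its front map starts out empty.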
-
From my personal point of view, however, my earlier improvement proposal "Allow auto scaling without making cluster vulnerable" is much more important, as it makes a big difference for Coherence CE's suitability in a (non-Kubernetes) cloud environment.
-
When starting new VMs that participate as storage-disabled nodes in a "near cache", alongside a cluster that already has several tens of nodes, we would like to pre-load the new front cache with the "best" keys/values based on the usage statistics of the front caches on the already-running VMs (we still need to decide whether to compute some average, trust the longest-running VM, etc.). My question is: what would be the best way to get hold of the "top N" keys from a local cache?
My plan is, once I know the top N keys, to perform a number of getAll calls with the very best keys in the last one (this way they get a slightly more recent access time, making them a little less likely to be evicted). I am also thinking of using an N that is quite a bit smaller than the near cache capacity, again to avoid the "good" keys/values being evicted before they have started picking up use, and to keep the load time down so the new VMs get into use fast.
I am not aware of any other way to "prime" the usage data of newly loaded keys to make them less likely to be evicted -- if anybody has tips on that, let me know!
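As a sketch of the extraction and batched preload described above (the class and method names are hypothetical; it assumes the front map is the default LocalCache whose entries implement ConfigurableCacheMap.Entry, so getTouchCount() is available -- worth verifying against your Coherence version):

```java
import com.tangosol.net.CacheFactory;
import com.tangosol.net.NamedCache;
import com.tangosol.net.cache.ConfigurableCacheMap;
import com.tangosol.net.cache.LocalCache;
import com.tangosol.net.cache.NearCache;

import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class NearCachePrimer {

    /**
     * On an already-running member: rank the keys in the local front map by
     * how often they have been touched and return the top N.
     */
    public static List<Object> topNKeys(String cacheName, int n) {
        NamedCache cache = CacheFactory.getCache(cacheName);
        LocalCache front = (LocalCache) ((NearCache) cache).getFrontMap();

        List<ConfigurableCacheMap.Entry> entries = new ArrayList<>();
        for (Object o : front.entrySet()) {
            entries.add((ConfigurableCacheMap.Entry) o);
        }
        // Most frequently touched entries first
        entries.sort(Comparator
                .comparingInt(ConfigurableCacheMap.Entry::getTouchCount)
                .reversed());

        List<Object> keys = new ArrayList<>();
        for (int i = 0; i < Math.min(n, entries.size()); i++) {
            keys.add(entries.get(i).getKey());
        }
        return keys;
    }

    /**
     * On the new member: warm the near cache in batches, loading the best keys
     * last so they end up with the most recent access timestamps.
     */
    public static void preload(NamedCache cache, List<Object> rankedKeys, int batchSize) {
        List<List<Object>> batches = new ArrayList<>();
        for (int i = 0; i < rankedKeys.size(); i += batchSize) {
            batches.add(rankedKeys.subList(i, Math.min(i + batchSize, rankedKeys.size())));
        }
        // Load the least important batch first and the very best batch last
        for (int i = batches.size() - 1; i >= 0; i--) {
            cache.getAll(batches.get(i));
        }
    }
}
```

The ranked key list could also be returned from an Invocable like the one sketched earlier in the thread, so the new member receives the usage-ordered keys directly from the longest-running VM (or from all of them, if you decide to average).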