Timeseries Analytics Use Case #469
-
I have a use case with possibly millions of time-series data points per day for an entity, a handful of such entities, and millisecond as the lowest level of granularity. My API needs to aggregate these data points based on filters like date range, entity type, etc., or return them unaggregated. The API is consumed by a single-page application using a charting library. I am looking for a solution that seamlessly switches between memory and disk depending on the query and can present data within a few seconds at most. I am thinking of using Microstream with Oracle to do a PoC.

I want the whole application to be lazy, loading data into memory only on the first API call for a given set of filters. The data loaded has to be at the lowest level of granularity, with aggregations then run on it to present a higher-granularity dataset. As the user zooms in on the graph, the data already in memory is used to run the second query, and so on. Any suggestions? Can a scenario like this be achieved with Microstream? If so, any samples would be much appreciated.
-
I'd think it is possible to achieve your scenario with Microstream. The core component would be Microstream's lazy loading feature. With Lazy references you can control the granularity of the loaded data. Of course, there are some things to consider when working with lazy references.
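For illustration, a minimal sketch of declaring and resolving a Lazy reference (package names are taken from recent Microstream releases and may differ in your version; `DataPointBucket` and `DataPoint` are made-up placeholders):

```java
import java.util.ArrayList;
import java.util.List;

import one.microstream.reference.Lazy;

public class DataPointBucket
{
    // The heavy payload sits behind a Lazy reference: it is not loaded
    // together with the bucket, only on the first call to get().
    private final Lazy<List<DataPoint>> points = Lazy.Reference(new ArrayList<>());

    public List<DataPoint> points()
    {
        // Loads the list from storage if it is not in memory yet.
        return this.points.get();
    }
}

// Placeholder for a single millisecond-granularity measurement.
record DataPoint(long timestampMillis, double value) {}
```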
In your scenario I could imagine a tree-like data structure for each entity, with the entity being the root. The leaves then hold a range of your data points. The root node and intermediate nodes could provide precalculated data and lazily load/unload their child nodes on demand (a sketch follows below). One additional note when using a DB as the storage target: Microstream uses the database as a plain blob store for its binary data, so the stored records are not queryable with SQL.
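A rough sketch of what such a tree node could look like (all class and field names are hypothetical; the precalculated aggregate here is just a sum and count, from which a coarser zoom level can be served without touching the children):

```java
import java.util.List;

import one.microstream.reference.Lazy;

// Hypothetical inner node of the per-entity tree. The aggregates live
// directly in the node, while the finer-grained children stay on disk
// until the user zooms into this time range.
public class TimeSeriesNode
{
    private final long   fromMillis;  // time range covered by this subtree
    private final long   toMillis;
    private final double sum;         // precalculated over all points below
    private final long   count;

    private final Lazy<List<TimeSeriesNode>> children;

    public TimeSeriesNode(long fromMillis, long toMillis, double sum, long count,
        List<TimeSeriesNode> children)
    {
        this.fromMillis = fromMillis;
        this.toMillis   = toMillis;
        this.sum        = sum;
        this.count      = count;
        this.children   = Lazy.Reference(children);
    }

    // Served straight from memory; no child nodes are loaded for this.
    public double average()
    {
        return this.count == 0 ? 0.0 : this.sum / this.count;
    }

    // Resolving the Lazy reference loads the children on demand.
    public List<TimeSeriesNode> children()
    {
        return this.children.get();
    }
}
```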
-
I am thinking of having a HashMap with the date as key and, as values, a lazy list of entities, e.g. Signal, which holds fields like the x/y coordinates and timestamp. Attached to Signal might be another object, Metadata, which I don't want loaded by default with Signal either, so that is a Lazy reference again. At the start of the Spring Boot application I don't want anything loaded into memory; only when a web request comes in with a set of filters should the list of entities be loaded. If I don't have a HashMap at the top level, Microstream will have to load the whole graph into memory to do the filtering. At a later point the SPA makes another async call to fetch all the metadata as well. Does this seem like a good approach?

Also, my assumption is that filtering happens in Java, so the first thing to do would be to load the values from the HashMap based on the date key? I can't be loading the billions/trillions of data points that might exist in the whole graph, only a subset based on the date. Once that subset is loaded into memory based on the date key, the filter/map etc. will be executed in-memory? Is that understanding correct?
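Roughly, the root I have in mind would look like this (the fields on Signal and Metadata are just placeholders):

```java
import java.time.LocalDate;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

import one.microstream.reference.Lazy;

public class DataRoot
{
    // Only this map is in memory after startup; each day's signals stay
    // on disk until the Lazy reference for that key is resolved.
    private final Map<LocalDate, Lazy<List<Signal>>> signalsByDay = new HashMap<>();

    public List<Signal> query(LocalDate day, double minX, double maxX)
    {
        Lazy<List<Signal>> lazyDay = this.signalsByDay.get(day);
        if(lazyDay == null)
        {
            return List.of();
        }
        // get() loads that day's signals from storage on first access;
        // the stream filtering below then runs purely in memory.
        return lazyDay.get().stream()
            .filter(s -> s.x() >= minX && s.x() <= maxX)
            .collect(Collectors.toList());
    }
}

// Metadata sits behind its own Lazy reference so it is not pulled into
// memory together with the signal; the SPA fetches it in a second call.
record Signal(double x, double y, long timestampMillis, Lazy<Metadata> metadata) {}
record Metadata(String description) {}
```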
-
Your assumptions are right.
With a HashMap as the storage root that contains lazily loaded lists, Microstream will only load the HashMap at startup, but not the lists in the map. Those will be loaded later, when you access the content of the Lazy reference. After the lazily referenced content has been loaded into memory, filtering can be done with pure Java in memory.
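Putting it together with the root sketched above, startup and the first query could look like this (the default file-system storage target is used here for brevity; an Oracle-backed storage target would be configured instead, and package names may differ between Microstream versions):

```java
import java.time.LocalDate;
import java.util.List;

import one.microstream.storage.embedded.types.EmbeddedStorage;
import one.microstream.storage.embedded.types.EmbeddedStorageManager;

public class TimeSeriesApp
{
    public static void main(String[] args)
    {
        DataRoot root = new DataRoot();

        // Starting the storage loads the object graph down to the Lazy
        // references: the HashMap itself ends up in memory, the lists
        // behind the Lazy references do not.
        EmbeddedStorageManager storage = EmbeddedStorage.start(root);

        // The first request for a day resolves that day's Lazy reference
        // and then filters the loaded subset with plain Java in memory
        // (query(...) is the hypothetical method from the sketch above).
        List<Signal> subset = root.query(LocalDate.of(2021, 6, 1), 0.0, 100.0);
        System.out.println(subset.size() + " signals loaded and filtered");

        storage.shutdown();
    }
}
```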