Timeseries Analytics Use Case #469
-
I have a use case with possibly millions of time-series data points per day for an entity, a handful of such entities, and millisecond as the lowest level of granularity. My API needs to aggregate these data points based on filters like date range, entity type, etc., or return them unaggregated. The API is consumed by a single-page application using a charting library. I am looking for a solution that seamlessly switches between memory and disk depending on the query and can present data within a few seconds at most. I am thinking of using Microstream with Oracle to do a PoC.

I want the whole application to be lazy, loading data into memory only on the first API call for a given set of filters. The data loaded has to be at the lowest level of granularity, with aggregations then run on it to present a higher-granularity dataset. As the user zooms in on the graph, the data already in memory is used to run the second query, and so on. Any suggestions? Can a scenario like this be achieved with Microstream? If so, any samples would be much appreciated.
-
I'd think it is possible to achieve your scenario with Microstream. The core component would be Microstream's lazy loading feature. With Lazy references you can control the granularity of the loaded data. Of course, there are some things to consider when working with lazy references.
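For illustration, a minimal sketch of declaring and resolving a Lazy reference (package names are taken from recent Microstream releases and may differ in your version; `DataPointBucket` and `DataPoint` are made-up placeholders):

```java
import java.util.ArrayList;
import java.util.List;

import one.microstream.reference.Lazy;

public class DataPointBucket
{
    // The heavy payload sits behind a Lazy reference: it is not loaded
    // together with the bucket, only on the first call to get().
    private final Lazy<List<DataPoint>> points = Lazy.Reference(new ArrayList<>());

    public List<DataPoint> points()
    {
        // Loads the list from storage if it is not in memory yet.
        return this.points.get();
    }
}

// Placeholder for a single millisecond-granularity measurement.
record DataPoint(long timestampMillis, double value) {}
```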
In your scenario I could imagine a tree-like data structure for each entity, with the entity being the root. The leaves then hold a range of your data points. The root node and intermediate nodes could provide precalculated data and lazily load/unload their child nodes on demand (a sketch follows below). One additional note when using a DB as the storage target: Microstream uses the database as a plain blob store for its binary data, so the stored records are not queryable with SQL.
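A rough sketch of what such a tree node could look like (all class and field names are hypothetical; the precalculated aggregate here is just a sum and count, from which a coarser zoom level can be served without touching the children):

```java
import java.util.List;

import one.microstream.reference.Lazy;

// Hypothetical inner node of the per-entity tree. The aggregates live
// directly in the node, while the finer-grained children stay on disk
// until the user zooms into this time range.
public class TimeSeriesNode
{
    private final long   fromMillis;  // time range covered by this subtree
    private final long   toMillis;
    private final double sum;         // precalculated over all points below
    private final long   count;

    private final Lazy<List<TimeSeriesNode>> children;

    public TimeSeriesNode(long fromMillis, long toMillis, double sum, long count,
        List<TimeSeriesNode> children)
    {
        this.fromMillis = fromMillis;
        this.toMillis   = toMillis;
        this.sum        = sum;
        this.count      = count;
        this.children   = Lazy.Reference(children);
    }

    // Served straight from memory; no child nodes are loaded for this.
    public double average()
    {
        return this.count == 0 ? 0.0 : this.sum / this.count;
    }

    // Resolving the Lazy reference loads the children on demand.
    public List<TimeSeriesNode> children()
    {
        return this.children.get();
    }
}
```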
-
I am thinking of having a HashMap with the date as key and, as values, a lazy list of entities, e.g. Signal, which holds fields like the x/y coordinates and timestamp. Attached to Signal might be another object, Metadata, which I don't want loaded by default with Signal either, so that is a Lazy reference again. At the start of the Spring Boot application I don't want anything loaded into memory; only when a web request comes in with a set of filters should the list of entities be loaded. If I don't have a HashMap at the top level, Microstream will have to load the whole graph into memory to do the filtering. At a later point the SPA makes another async call to fetch all the metadata as well. Does this seem like a good approach?

Also, my assumption is that filtering happens in Java, so the first thing to do would be to load the values from the HashMap based on the date key? I can't be loading the billions/trillions of data points that might exist in the whole graph, only a subset based on the date. Once that subset is loaded into memory based on the date key, the filter/map etc. will be executed in-memory? Is that understanding correct?
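Roughly, the root I have in mind would look like this (the fields on Signal and Metadata are just placeholders):

```java
import java.time.LocalDate;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

import one.microstream.reference.Lazy;

public class DataRoot
{
    // Only this map is in memory after startup; each day's signals stay
    // on disk until the Lazy reference for that key is resolved.
    private final Map<LocalDate, Lazy<List<Signal>>> signalsByDay = new HashMap<>();

    public List<Signal> query(LocalDate day, double minX, double maxX)
    {
        Lazy<List<Signal>> lazyDay = this.signalsByDay.get(day);
        if(lazyDay == null)
        {
            return List.of();
        }
        // get() loads that day's signals from storage on first access;
        // the stream filtering below then runs purely in memory.
        return lazyDay.get().stream()
            .filter(s -> s.x() >= minX && s.x() <= maxX)
            .collect(Collectors.toList());
    }
}

// Metadata sits behind its own Lazy reference so it is not pulled into
// memory together with the signal; the SPA fetches it in a second call.
record Signal(double x, double y, long timestampMillis, Lazy<Metadata> metadata) {}
record Metadata(String description) {}
```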
-
Your assumptions are right.
With a HashMap as the storage root that contains lazily loaded lists, Microstream will only load the HashMap at startup, but not the lists in the map. Those will be loaded later, when you access the content of the Lazy reference. After the lazily referenced content has been loaded into memory, filtering can be done with pure Java in memory.
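Putting it together with the root sketched above, startup and the first query could look like this (the default file-system storage target is used here for brevity; an Oracle-backed storage target would be configured instead, and package names may differ between Microstream versions):

```java
import java.time.LocalDate;
import java.util.List;

import one.microstream.storage.embedded.types.EmbeddedStorage;
import one.microstream.storage.embedded.types.EmbeddedStorageManager;

public class TimeSeriesApp
{
    public static void main(String[] args)
    {
        DataRoot root = new DataRoot();

        // Starting the storage loads the object graph down to the Lazy
        // references: the HashMap itself ends up in memory, the lists
        // behind the Lazy references do not.
        EmbeddedStorageManager storage = EmbeddedStorage.start(root);

        // The first request for a day resolves that day's Lazy reference
        // and then filters the loaded subset with plain Java in memory
        // (query(...) is the hypothetical method from the sketch above).
        List<Signal> subset = root.query(LocalDate.of(2021, 6, 1), 0.0, 100.0);
        System.out.println(subset.size() + " signals loaded and filtered");

        storage.shutdown();
    }
}
```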