[performance] Improved data structure #2561

Open
markov00 opened this issue Oct 29, 2024 · 0 comments
Labels
:performance Performance related issues :xy Bar/Line/Area chart related

Comments

@markov00
Member

Our current data processing strategy is not well optimized.
Several steps are repeated, we loop over the data multiple times, and the data structure doesn't adapt well to the calculations we need.

In particular:

  1. the library provides multiple ways to describe data: everything in a single spec, multiple specs, spec grouping, data split accessors, and y accessors. This increases the logic complexity in charts, because all of these must be aligned into a single set of "data tables".
  2. several data scans are wasteful: we scan the data multiple times to compute data extents or to fill in missing details. We can probably reduce the number of scans.
  3. the way we describe categorical grouping (groupId, specId, splitAccessors, yAccessors) is not great: it increases the complexity of, and the time spent on, composing and decomposing that grouping.
  4. a lot of the processing generates different variants of the same dataset, with no possibility of reusing them.
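
To make points 2 and 3 more concrete, here is a minimal sketch of one possible direction, not the library's current code: a single pass that groups rows into per-series columns under a composite key built once, and tracks the x and y extents in the same loop. Every name here (`Datum`, `SeriesTable`, `scanOnce`, the `|`-joined key format) is hypothetical and chosen only for illustration.

```typescript
// Hypothetical types: none of these names come from the library.
type Datum = Record<string, string | number>;

interface SeriesTable {
  key: string; // composite key built once, e.g. "specA|y1|web"
  x: number[]; // column of x values
  y: number[]; // column of y values
}

interface ScanResult {
  series: Map<string, SeriesTable>;
  xExtent: [number, number];
  yExtent: [number, number];
}

// Groups the rows by (specId, yAccessor, split values) and computes
// both extents in the same pass over the data.
function scanOnce(
  specId: string,
  data: Datum[],
  xAccessor: string,
  yAccessors: string[],
  splitAccessors: string[],
): ScanResult {
  const series = new Map<string, SeriesTable>();
  let xMin = Infinity;
  let xMax = -Infinity;
  let yMin = Infinity;
  let yMax = -Infinity;

  for (const datum of data) {
    const x = Number(datum[xAccessor]);
    const split = splitAccessors.map((a) => String(datum[a])).join('|');
    for (const yAccessor of yAccessors) {
      const y = Number(datum[yAccessor]);
      // Skip points that cannot be placed on a numeric scale.
      if (!Number.isFinite(x) || !Number.isFinite(y)) continue;
      const key = [specId, yAccessor, split].join('|');
      let table = series.get(key);
      if (!table) {
        table = { key, x: [], y: [] };
        series.set(key, table);
      }
      table.x.push(x);
      table.y.push(y);
      if (x < xMin) xMin = x;
      if (x > xMax) xMax = x;
      if (y < yMin) yMin = y;
      if (y > yMax) yMax = y;
    }
  }
  return { series, xExtent: [xMin, xMax], yExtent: [yMin, yMax] };
}
```

The columnar `x`/`y` arrays are one assumption about what a more reusable structure could look like; stacking, fitting, or other derived datasets could then reference the same columns instead of copying data points.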

All these unoptimized calculations are probably wasting processing time and should be addressed. A few tasks to start with:

  • collect all the processing requirements for cartesian charts (all the operations, statistics, and calculations applied to the data today)
  • research and test an improved general data structure that reduces data access times, reduces memory usage by limiting the number of permutations and copies of our data points, and offers a simplified and optimized way to compute what we need.
  • benchmark 4 or 5 different chart cases with the current setup and the alternative.
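
For the benchmarking task, a rough sketch of what a comparison harness could look like is shown below. It is an assumption-laden illustration: `bench`, `BenchResult`, and the two extent strategies are hypothetical stand-ins, the data is synthetic, and `Date.now()` is used only for portability (a higher-resolution timer would be a better fit for real measurements).

```typescript
interface BenchResult {
  name: string;
  runs: number;
  medianMs: number;
}

// Runs fn a few times to warm up, then reports the median wall-clock time.
function bench(name: string, fn: () => unknown, runs = 20, warmup = 5): BenchResult {
  for (let i = 0; i < warmup; i++) fn();
  const samples: number[] = [];
  for (let i = 0; i < runs; i++) {
    const start = Date.now();
    fn();
    samples.push(Date.now() - start);
  }
  samples.sort((a, b) => a - b);
  const mid = Math.floor(samples.length / 2);
  const medianMs =
    samples.length % 2 === 1 ? samples[mid] : (samples[mid - 1] + samples[mid]) / 2;
  return { name, runs, medianMs };
}

// Synthetic stand-in for one chart case.
const points = Array.from({ length: 100000 }, (_, i) => ({ x: i, y: Math.sin(i) }));

type Extents = { x: [number, number]; y: [number, number] };

// "Current setup" stand-in: one scan per statistic.
function extentsMultiScan(): Extents {
  const xMin = points.reduce((m, p) => Math.min(m, p.x), Infinity);
  const xMax = points.reduce((m, p) => Math.max(m, p.x), -Infinity);
  const yMin = points.reduce((m, p) => Math.min(m, p.y), Infinity);
  const yMax = points.reduce((m, p) => Math.max(m, p.y), -Infinity);
  return { x: [xMin, xMax], y: [yMin, yMax] };
}

// "Alternative" stand-in: all statistics in a single scan.
function extentsSingleScan(): Extents {
  let xMin = Infinity;
  let xMax = -Infinity;
  let yMin = Infinity;
  let yMax = -Infinity;
  for (const p of points) {
    if (p.x < xMin) xMin = p.x;
    if (p.x > xMax) xMax = p.x;
    if (p.y < yMin) yMin = p.y;
    if (p.y > yMax) yMax = p.y;
  }
  return { x: [xMin, xMax], y: [yMin, yMax] };
}

const results = [
  bench('multi-scan extents', extentsMultiScan),
  bench('single-scan extents', extentsSingleScan),
];
console.log(results);
```

Real chart cases (for example stacked bars, multi-spec lines, split series) would replace the synthetic `points` array, and both strategies should be checked for identical output before timings are compared.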
@markov00 markov00 added :performance Performance related issues :xy Bar/Line/Area chart related labels Oct 29, 2024