Caching and Aliasing Queries #265

ryanhamilton · 2024-11-08T09:28:48Z

Users:

user-bg user-sit - Want to join slow and fast data from different sources. babeldb join queries as slow as slowest data source. #264
user-ab user-sit - Wants to join kdb to SQL (diff speed not issue)
user-bl user-sno - Wants to do one big query at top, then drill down into it.
user-ek -user-bof - Wanted to have reusable functions to allow checking the aliases.
Quants in general used notebooks to cache data then drill down into it. @electroly emphasised this cache once, then drill down was how he safely investigates data even though he has full DB access.

Facts

Users would find this caching / drilldown / prod-offload very useful.
The ownsers of the database would prefer fewer queries against production.
BUT we are not building a database, we are supporting multiple databases. Before every implementation we should ask, could this be done using a database itself?
We are supporting fast in-memory by using H2, it's fast but it's also not moving forward like DuckDB. So not for complex queries. Feels like the right choice for simple caching that we want.

Allow configuring queries on the dashboard and pulse level that users can then use as aliases.
e.g. select * from {{@orders}} where customer=`bob
Recommend they use QUERY_DB('SLOWSQL','SELECT * FROM CLIENTS', 30100) with a timer. That pushes it to the dashboard level.
Allow users to specify an alias for a query within a dashboard i.e. on the component editor. Main problem is that this would not carry over between sheets!! User could always look at sheet1, that relies on query on sheet3 so stops working.
We bundle DuckDB, they could upload CSVs or parquet into a DuckDB instance.

The text was updated successfully, but these errors were encountered:

ryanhamilton · 2024-12-07T10:15:03Z

user-andrew - wanted to drill down or filter into one table and all other UI updates to show that subset.

The one big table then drill-down - is current operating policy and simplifies other parts of development.
e.g. Only hits production database once, rest is within current middleware. Allows dragging and dropping to generate charts by less experienced users.
Cons = Hard to gauge memory usage when first adding a new table. Scalability to other ways of working, e.g. real-time.
Users love using filters , bookmarking them and they "feel like an analyst with all the power" including being able to export that one big table.
Dropdown on bar charts to select what is charted (useful to only perform one query rather than separate).
Click / Double-click to zoom in on data.

ryanhamilton · 2024-12-10T21:46:10Z

Caching query is set on timer with target table:
WIthin dashboard, can query "select * from tbltest" within CacheDB.
Notice it only updates on same time as cache query timer.
Data is periodically saved to folder and only permitted tables are accessible from each dashboard.

github-project-automation bot added this to Pulse Project Nov 8, 2024

ryanhamilton added the engine label Nov 8, 2024

ryanhamilton mentioned this issue Nov 8, 2024

babeldb join queries as slow as slowest data source. #264

Open