Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Caching and Aliasing Queries #265

Open
ryanhamilton opened this issue Nov 8, 2024 · 2 comments
Open

Caching and Aliasing Queries #265

ryanhamilton opened this issue Nov 8, 2024 · 2 comments
Labels

Comments

@ryanhamilton
Copy link
Contributor

ryanhamilton commented Nov 8, 2024

Users:

  • user-bg user-sit - Want to join slow and fast data from different sources. babeldb join queries as slow as slowest data source. #264
  • user-ab user-sit - Wants to join kdb to SQL (diff speed not issue)
  • user-bl user-sno - Wants to do one big query at top, then drill down into it.
  • user-ek -user-bof - Wanted to have reusable functions to allow checking the aliases.
  • Quants in general used notebooks to cache data then drill down into it. @electroly emphasised this cache once, then drill down was how he safely investigates data even though he has full DB access.

Facts

  1. Users would find this caching / drilldown / prod-offload very useful.
  2. The ownsers of the database would prefer fewer queries against production.
  3. BUT we are not building a database, we are supporting multiple databases. Before every implementation we should ask, could this be done using a database itself?
  4. We are supporting fast in-memory by using H2, it's fast but it's also not moving forward like DuckDB. So not for complex queries. Feels like the right choice for simple caching that we want.

Questions

  • What do other systems do?

Options:

  1. Allow configuring queries on the dashboard and pulse level that users can then use as aliases.
    e.g. select * from {{@orders}} where customer=`bob
  2. Recommend they use QUERY_DB('SLOWSQL','SELECT * FROM CLIENTS', 30100) with a timer. That pushes it to the dashboard level.
  3. Allow users to specify an alias for a query within a dashboard i.e. on the component editor. Main problem is that this would not carry over between sheets!! User could always look at sheet1, that relies on query on sheet3 so stops working.
  4. We bundle DuckDB, they could upload CSVs or parquet into a DuckDB instance.
@ryanhamilton
Copy link
Contributor Author

ryanhamilton commented Dec 7, 2024

user-andrew - wanted to drill down or filter into one table and all other UI updates to show that subset.

  • The one big table then drill-down - is current operating policy and simplifies other parts of development.
    e.g. Only hits production database once, rest is within current middleware. Allows dragging and dropping to generate charts by less experienced users.
    Cons = Hard to gauge memory usage when first adding a new table. Scalability to other ways of working, e.g. real-time.
  • Users love using filters , bookmarking them and they "feel like an analyst with all the power" including being able to export that one big table.
  • Dropdown on bar charts to select what is charted (useful to only perform one query rather than separate).
  • Click / Double-click to zoom in on data.

@ryanhamilton
Copy link
Contributor Author

Proposed UI

  • Caching query is set on timer with target table:
  • WIthin dashboard, can query "select * from tbltest" within CacheDB.
  • Notice it only updates on same time as cache query timer.
  • Data is periodically saved to folder and only permitted tables are accessible from each dashboard.

image

image

image

Untitled

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
Status: No status
Development

No branches or pull requests

1 participant