Ibis expressions are eagerly evaluated #5499

Closed
jthomasmock opened this issue Nov 25, 2024 · 6 comments

@jthomasmock
Contributor

jthomasmock commented Nov 25, 2024

What I am noticing in Positron is that when working interactively with Ibis, Positron eagerly fetches results (I suspect for the Data Explorer or Variables pane, possibly via a repr called somewhere?). This negates one of the key advantages of a lazily executed query, especially when building up a complex query, because the eager fetch takes quite a bit of time and makes interactive work much less useful.

This is reproduced on version 2024.09.0-1. A good way to see this is to run the code in a VS Code notebook and then run the same notebook in Positron. The chunk of code that creates the query below executes much more quickly in VS Code. If one comments out the line `ibis.options.interactive = True`, repr appears to remain fast. However, running Ibis in interactive mode is definitely the optimal development experience, hence reporting this for Positron.

reprex

import ibis
from ibis import _
import time

ibis.options.interactive = True

con = ibis.duckdb.connect()
taxi = con.read_parquet("s3://voltrondata-labs-datasets/nyc-taxi")

start_time = time.time()
large_parties_with_cash = (
    taxi
    .filter(_.passenger_count > 3)
    .filter(_.payment_type == "Cash")
)
end_time = time.time()

print(f"Run the query in {end_time - start_time:.4f} seconds")

Originally posted by @boshek in #4574

@jthomasmock
Contributor Author

Positron Version: 2024.12.0 (Universal) build 77
Code - OSS Version: 1.93.0
Commit: 4c20d05
Date: 2024-11-25T18:29:19.908Z
Electron: 30.4.0
Chromium: 124.0.6367.243
Node.js: 20.15.1
V8: 12.4.254.20-electron.0
OS: Darwin arm64 24.0.0

I confirmed the reprex is working, albeit with the slower speeds as @boshek indicates.

I'm assuming that this is related to either the variables pane or Data Explorer eagerly evaluating Python objects.

@boshek

boshek commented Nov 27, 2024

Thanks for opening this, @jthomasmock. I am also seeing #5544, which is likely a duplicate.

wesm added a commit that referenced this issue Dec 5, 2024
… is True (#5625)

Addresses #5499 by adding a custom inspector for Ibis expressions. This
is very basic, and per #5573 should perhaps live eventually in Ibis
itself.

Ibis is a bit unusual in that its interactive mode causes computation to
be executed when running the `__repr__` method, for nice interactivity
in the console and in Jupyter notebooks. So here we avoid running the
`__repr__` method so we don't accidentally fire off a BigQuery,
Snowflake, or other query which might have unwanted costs or side
effects.

There is a unit test -- Ibis with DuckDB is a minor dependency to pull
in relative to the rest of our test dependencies so I do not think this
is too onerous.
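
The idea described in the commit message above can be illustrated with a minimal, standard-library-only sketch. It is not the actual Positron inspector or the Ibis API: `LazyExpr` and `summarize_for_variables_pane` are hypothetical names, and the class only mimics an object whose `__repr__` triggers execution in interactive mode. The point is that the display string is built from cheap type metadata and never calls `repr()` on the lazy object.

```python
class LazyExpr:
    """Hypothetical stand-in for a lazily evaluated expression (not the Ibis API)."""

    interactive = True  # mimics an "interactive mode" switch
    executions = 0      # counts how many times the fake query has run

    def __init__(self, description):
        self.description = description

    def execute(self):
        # Stands in for an expensive or billable query against a backend.
        type(self).executions += 1
        return f"rows for {self.description}"

    def __repr__(self):
        # In interactive mode, printing the object runs the query.
        if type(self).interactive:
            return self.execute()
        return f"<LazyExpr {self.description}>"


def summarize_for_variables_pane(obj, lazy_types=(LazyExpr,)):
    """Return a display string for obj without triggering lazy execution."""
    if isinstance(obj, lazy_types):
        # Use only cheap metadata; never call repr() or str() on lazy objects.
        return f"{type(obj).__module__}.{type(obj).__qualname__}"
    return repr(obj)


expr = LazyExpr("passenger_count > 3")
label = summarize_for_variables_pane(expr)  # does not run the fake query
```

With this approach, a variables view can list the expression by type while leaving execution to explicit user actions such as printing the object in the console.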

@testlabauto
Contributor

Verified Fixed

Positron Version(s) : 2025.01.0-32
OS Version          : OSX

Test scenario(s)

Query in initial filing is very fast regardless of interactive setting:
Run the query in 0.0021 seconds

Link(s) to TestRail test cases run or created:

@jthomasmock
Contributor Author

@boshek -- do you want to try this out as well? Thanks!

@boshek

boshek commented Dec 6, 2024

@jthomasmock - I would love to. Any suggestions on how to build/install a dev version? I am seeing all these here post-merge but nothing built. Also trying to grok how there is a 2025-01 version already 🤔
Never mind - I think builds will appear eventually and be added to those tags. I'll just wait.

@boshek

boshek commented Dec 11, 2024

Can confirm that this fixes my issue. So grateful for this work. Genuine QOL improvement.
