toolz.curry Seems to Cause 50x+ Performance Slowdown in Certain Cases #583

Open
duzixian opened this issue Sep 25, 2024 · 0 comments

Issue: Significant Performance Slowdown with toolz.curry

I encountered a performance slowdown where toolz.curry made my script over 50x slower in some cases.

Initially, I had the following code:

        return pipe(
            content,
            process_paragraph,
            map(lambda content: (content, str([size, line.strip(), path]))),
        )

I was working with nearly 10 million records, and the process was extremely slow. After waiting for about 30 minutes, I had to stop the program.

After investigating, I found that modifying the code to the following version significantly improved performance:

        return map(
            lambda content: (content, str([size, line.strip(), path])),
            process_paragraph(content),
        )

At first, I thought the issue was with toolz.pipe. I reviewed its implementation and noticed it’s just a for loop. I then tried using the pipe functions from the expression and returns libraries, but both were still slow.
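To separate the pipe from the curried map, here is a stdlib-only sketch of both versions. It assumes pipe is the plain loop described above (toolz.pipe is essentially this loop) and uses functools.partial as a stand-in for the curried map. process_paragraph, size, line, path and the sample content are hypothetical placeholders, not the original script's data.

```python
from functools import partial


def pipe(data, *funcs):
    """Pass `data` through each function in order: pipe(x, f, g) == g(f(x))."""
    for func in funcs:
        data = func(data)
    return data


def process_paragraph(content):
    """Hypothetical stand-in for the issue's process_paragraph: split into lines."""
    return content.splitlines()


# Hypothetical values for the variables captured by the lambda in the issue
size, line, path = 42, "  some line  ", "/tmp/example.txt"
content = "first\nsecond"

# pipe style: map is partially applied (functools.partial stands in for toolz.curried.map)
piped_result = list(
    pipe(
        content,
        process_paragraph,
        partial(map, lambda c: (c, str([size, line.strip(), path]))),
    )
)

# Direct style, as in the faster version above
direct_result = list(
    map(
        lambda c: (c, str([size, line.strip(), path])),
        process_paragraph(content),
    )
)

print(piped_result == direct_result)
```

If this stdlib version runs quickly on the same data, the loop inside pipe is unlikely to be the bottleneck, which points at the curried map instead.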

Next, I suspected the map function itself. Further investigation showed that the real cause of the drastic slowdown was toolz.curry.

Simplified Test Case

To isolate the issue, I simplified my original script to the bare minimum for testing. Below is the test code:

from builtins import map, filter


import toolz
import cytoolz
import funcy
import expression
import pymonad.tools
import pydash
import returns.curry


import time


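def process_something(num, curried_fn, lambda_fn):
    def process_iter(i):
        a = [str(i)]
        # Partially apply curried_fn to lambda_fn, then call the result on `a`.
        # Like map(lambda ...) inside pipe, this creates a new partial
        # application on every iteration.
        return list(curried_fn(lambda_fn)(a))

    return list(map(process_iter, range(num)))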


def test_curry(flag, curried_fn, lambda_fn):
    t1 = time.perf_counter()
    num = 100_000
    rslt = process_something(num, curried_fn, lambda_fn)
    t2 = time.perf_counter() - t1
    print(f"{flag:<25} {t2:.5f}")
    return [flag, t2, rslt]


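# Function under test. For the filter results below,
# set fn = filter and lambda_fn = lambda x: x is None.
fn = map
fn_str = fn.__name__
lambda_fn = lambda x: x
args_num = 2

# Wrapper with an explicit three-parameter signature, for curry
# implementations (pydash, returns) that infer arity from the signature.
f2 = lambda fn, a, b: fn(a, b)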


curry_list = {
    "toolz.curried": getattr(toolz.curried, fn_str),
    "cytoolz.curried": getattr(cytoolz.curried, fn_str),
    "toolz.curry": toolz.functoolz.curry(fn),
    "cytoolz.curry": cytoolz.functoolz.curry(fn),
    "lambda_curry": lambda x: lambda y: fn(x, y),
    "funcy.curry.seqs": funcy.curry(getattr(funcy.seqs, fn_str)),
    "funcy.curry": funcy.curry(fn),
    "funcy.autocurry": funcy.autocurry(fn),
    "funcy.autocurry.seqs": funcy.autocurry(getattr(funcy.seqs, fn_str)),
    "expression.seq": getattr(expression.collections.seq, fn_str),
    "expression.curry": expression.curry(args_num - 1)(fn),
    "pymonad.tools.curry": pymonad.tools.curry(args_num)(fn),
    "pydash.functions.curry": pydash.functions.curry(f2)(fn),
    "returns.curry": returns.curry.curry(f2)(fn),
}


print(f"{fn_str:<25} time")
rslt = []
for name, curried_fn in curry_list.items():
    r = test_curry(name, curried_fn, lambda_fn)
    rslt.append(r[2])

# Every implementation must produce the same output
rslt_set = set(str(i) for i in rslt)
assert len(rslt_set) == 1

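Test Results

Times are wall-clock seconds (time.perf_counter()) for num = 100_000 iterations of process_something.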

fn = map
fn_str = fn.__name__
lambda_fn = lambda x: x 

map                       time
toolz.curried             11.18867
cytoolz.curried           9.66267
toolz.curry               11.77486
cytoolz.curry             9.98406
lambda_curry              0.15762
funcy.curry.seqs          0.51989
funcy.curry               0.16449
funcy.autocurry           1.64497
funcy.autocurry.seqs      0.95207
expression.seq            0.87715
expression.curry          0.26653
pymonad.tools.curry       0.33536
pydash.functions.curry    0.93974
returns.curry             3.67666



fn = filter
fn_str = fn.__name__
lambda_fn = lambda x: x is None

filter                    time
toolz.curried             7.93439
cytoolz.curried           7.21578
toolz.curry               7.50556
cytoolz.curry             6.93008
lambda_curry              0.20139
funcy.curry.seqs          0.22452
funcy.curry               0.10085
funcy.autocurry           0.82936
funcy.autocurry.seqs      0.49786
expression.seq            0.34209
expression.curry          0.15820
pymonad.tools.curry       0.20793
pydash.functions.curry    0.43155
returns.curry             2.11528
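To make the "50x+" in the title concrete, the snippet below divides each toolz/cytoolz timing by the lambda_curry baseline. The numbers are copied from the two tables above; slowdowns is a hypothetical helper name.

```python
# Timings copied from the results above (seconds for 100,000 iterations)
map_times = {
    "toolz.curried": 11.18867,
    "cytoolz.curried": 9.66267,
    "toolz.curry": 11.77486,
    "cytoolz.curry": 9.98406,
    "lambda_curry": 0.15762,
}
filter_times = {
    "toolz.curried": 7.93439,
    "cytoolz.curried": 7.21578,
    "toolz.curry": 7.50556,
    "cytoolz.curry": 6.93008,
    "lambda_curry": 0.20139,
}


def slowdowns(times, baseline="lambda_curry"):
    """Return each entry's time divided by the baseline's time."""
    base = times[baseline]
    return {name: t / base for name, t in times.items() if name != baseline}


map_ratios = slowdowns(map_times)
filter_ratios = slowdowns(filter_times)

for label, ratios in (("map", map_ratios), ("filter", filter_ratios)):
    for name, r in ratios.items():
        print(f"{label:<7} {name:<16} {r:5.1f}x")
```

With these numbers the toolz/cytoolz entries are roughly 61x to 75x slower than the plain closure for map, and roughly 34x to 39x slower for filter.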

Additional Note

Calling toolz.curried.map(fn, iter) with both arguments at once does not show the slowdown. It occurs only with partial application, toolz.curried.map(fn)(iter).
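One plausible explanation, consistent with this observation: judging from the pure-Python toolz source, curry.__call__ first attempts the full call, and only when that raises TypeError does it inspect the signature and build a new curried object. A plain closure skips all of that. The sketch below is a simplified, hypothetical model of that strategy (TryThenBindCurry and closure_curry are not toolz APIs), timed against a closure using only the standard library; the real implementation differs in details, so absolute numbers will not match the tables above.

```python
import inspect
import timeit
from functools import partial


class TryThenBindCurry:
    """Hypothetical, simplified model of a curry that detects partial calls by
    attempting the full call first. Not toolz's actual implementation."""

    def __init__(self, func, *args):
        self._partial = partial(func, *args)

    def __call__(self, *args):
        try:
            # Attempt the full call; map(f) with one argument raises TypeError.
            return self._partial(*args)
        except TypeError:
            # Simplification: treat every TypeError as "not enough arguments yet".
            func = self._partial.func
            try:
                inspect.signature(func)  # introspection on the slow path
            except ValueError:
                pass  # some builtins expose no signature
            return TryThenBindCurry(func, *self._partial.args, *args)


def closure_curry(func):
    """Hypothetical closure-based curry: no exception, no introspection."""
    return lambda x: lambda y: func(x, y)


identity = lambda x: x
curried_exc = TryThenBindCurry(map)
curried_closure = closure_curry(map)

# Both forms must give the same result
exc_result = list(curried_exc(identity)(["1"]))
closure_result = list(curried_closure(identity)(["1"]))

n = 20_000
t_exc = timeit.timeit(lambda: list(curried_exc(identity)(["1"])), number=n)
t_closure = timeit.timeit(lambda: list(curried_closure(identity)(["1"])), number=n)
print(f"try/except + signature: {t_exc:.4f}s, closure: {t_closure:.4f}s")
```

Passing both arguments at once never reaches the except branch in this model, which matches the note that toolz.curried.map(fn, iter) is unaffected.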

System Information

╰─ uname -a
Linux localhost 4.19.273-VK-X-g0ec5bda45854 #327 SMP PREEMPT Fri Jun 28 14:28:49 CST 2024 aarch64 aarch64 aarch64 GNU/Linux


╰─ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 20.04.6 LTS
Release:        20.04
Codename:       focal


╰─ python -V
Python 3.12.4


╰─ pip show toolz cytoolz
Name: toolz
Version: 0.12.1
Summary: List processing tools and functional utilities
Home-page: https://github.com/pytoolz/toolz/
Author: https://raw.github.com/pytoolz/toolz/master/AUTHORS.md
Author-email:
License: BSD
Location: /root/.pyenv/versions/3.12.4/envs/daily/lib/python3.12/site-packages
Requires:
Required-by: cytoolz
---
Name: cytoolz
Version: 0.12.3
Summary: Cython implementation of Toolz: High performance functional utilities
Home-page: https://github.com/pytoolz/cytoolz
Author: https://raw.github.com/pytoolz/cytoolz/master/AUTHORS.md
Author-email: [email protected]
License: BSD
Location: /root/.pyenv/versions/3.12.4/envs/daily/lib/python3.12/site-packages
Requires: toolz
Required-by: