WIP: Create individual calculators in Compute Studio runs #95
base: master
@andersonfrailey This looks like the right idea. I'm inclined to offer something like a generic

I'll have time to test #95 out this afternoon or early tomorrow and will report back.
@andersonfrailey I refactored some of the changes you made in this PR so that functions were passed to dask instead of methods on

(I made these changes mostly to figure out what the bottleneck was, but if you think they are helpful, I'm happy to help clean them up or rethink how they could better fit with Tax-Brain's API.)

I ran into the initial issue with passing large Python objects to the dask workers again when this code was run in Tax-Brain/cs-config/cs_config/functions.py (lines 175 to 180 in d113622).
I rewrote these lines so that they didn't use dask, since I wasn't sure how to run them without passing the

With all of these changes, the total run time is about 150 seconds. This is about the same as the simulation times that we get on Compute Studio now. However, if we can get the table creation code parallelized, then I think we could get the run time down significantly. Without dask, it takes about 9 seconds per year to create the outputs. If we can parallelize that, maybe we can get it down to 30-45 seconds instead of 90 seconds.

I measured this via the
I set up the dask workers by opening three terminal tabs:
After looking at the run time for
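The idea of parallelizing the per-year table creation mentioned above can be sketched roughly as follows. This is only an illustration under stated assumptions: it uses the standard library's `concurrent.futures` as a stand-in for dask, and `create_tables_for_year` and `create_all_tables` are hypothetical names, not part of Tax-Brain or cs-config. The point is that each year's outputs are produced by a plain function that receives a small argument (the year) rather than a large object.

```python
from concurrent.futures import ThreadPoolExecutor


def create_tables_for_year(year):
    # Hypothetical placeholder for the real per-year table creation,
    # which reportedly takes about 9 seconds per year without dask.
    return {"year": year, "tables": f"outputs for {year}"}


def create_all_tables(start_year, end_year, max_workers=3):
    """Build the outputs for every year in [start_year, end_year] concurrently."""
    years = range(start_year, end_year + 1)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map returns results in the same order as the input years.
        results = list(pool.map(create_tables_for_year, years))
    # Key the results by year so callers can look them up directly.
    return {result["year"]: result for result in results}


if __name__ == "__main__":
    outputs = create_all_tables(2019, 2021)
    print(sorted(outputs))
```

Threads may not speed up CPU-bound work, so in practice the same per-year function would more likely be handed to the dask workers. If the 90 seconds quoted above corresponds to roughly ten years at about 9 seconds each, three workers would give an ideal of about 30 seconds, which is the low end of the 30-45 second estimate.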
This PR implements the idea @hdoupe proposed in issue #94. I've modified the `run` function of the `TaxBrain` object to accept a new argument, `cs_run`, that will change when calculator objects are created in Compute Studio. Right now the tests are failing, but I'm hoping to get that fixed soon.

I'm not sure what the best way is to profile memory usage and speed in order to compare performance. Any ideas, @hdoupe?
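One possible starting point for the profiling question, offered as a minimal sketch rather than a recommendation from the thread: the standard library's `time.perf_counter` and `tracemalloc` can report wall-clock time and peak Python-level memory for a single call. `profile_call` and `build_rows` are hypothetical names used for illustration. The workload is a stand-in that would be replaced by the actual run with and without `cs_run`.

```python
import time
import tracemalloc


def profile_call(func, *args, **kwargs):
    """Run func once and return (result, elapsed_seconds, peak_bytes)."""
    tracemalloc.start()
    start = time.perf_counter()
    try:
        result = func(*args, **kwargs)
        elapsed = time.perf_counter() - start
        # get_traced_memory returns (current, peak) in bytes.
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return result, elapsed, peak


def build_rows(n):
    # Hypothetical stand-in workload; swap in the real call being compared.
    return [i * 2 for i in range(n)]


if __name__ == "__main__":
    rows, seconds, peak = profile_call(build_rows, 100_000)
    print(f"{seconds:.3f} s, peak {peak / 1e6:.1f} MB")
```

Note that `tracemalloc` only sees allocations made through Python's memory allocator, so memory held by native extensions may be undercounted. Tracing also adds overhead, so timings taken this way are best compared against each other rather than read as absolute numbers.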