
Worker exceeded 95% memory budget #476

Open
beyucel opened this issue Jun 2, 2020 · 6 comments

beyucel commented Jun 2, 2020

I just wanted to discuss the memory usage issue with this notebook.
When the chunk size is above 25 (>250 MB), a single worker climbs to 6.3 GB of memory usage and the kernel restarts. When the chunk size is 25 or below, there is no problem.

My question is: why do 300 MB chunks cause such high memory usage?
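
(For context, the message in the title is dask.distributed terminating and restarting a worker once it crosses 95% of its memory limit.) Here is a minimal sketch, with placeholder shapes rather than the notebook's real data, of how I'm checking what a given chunk size means in bytes before the array reaches the workers:

```python
# Hypothetical shapes, just to illustrate inspecting dask chunk sizes.
import dask.array as da

chunk = 25  # samples per chunk; the threshold where the workers start dying
x = da.random.random((1000, 101, 101), chunks=(chunk, 101, 101))

print(x.chunksize)                                      # (25, 101, 101)
print(x.nbytes / x.npartitions / 1e6, "MB per chunk")   # bytes each task holds
```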


wd15 commented Jun 2, 2020

We need to profile the memory usage. Let's check the delta first to see if that makes sense.
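
As a sanity check on what a "sensible" delta looks like, something along these lines (the sample count is hypothetical; cutoff=15 matches the pipeline below) gives a back-of-envelope bound for the two-point statistics step:

```python
# Back-of-envelope size of the two-point correlation output.
n_samples = 1000                     # hypothetical
cutoff = 15                          # matches the pipeline below
voxels = (2 * cutoff + 1) ** 3       # 31**3 points per correlation
mb = n_samples * voxels * 8 / 1e6    # float64 -> bytes -> MB
print(f"expected two-point stats: ~{mb:.0f} MB")
```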

@wd15 wd15 added this to the 0.4 milestone Jun 2, 2020

beyucel commented Jun 3, 2020

I initially tried memory_profiler and the %memit magic function. I'm not sure whether it does what we want or whether I used it incorrectly (I am investigating that), because it reports a notebook peak memory of 219.80 MiB with an increment of 14.20 MiB, which does not seem reasonable. Watching htop, the memory usage for those lines is a lot higher. I will try the other two memory profilers as well.
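
For what it's worth, here is roughly what I'm trying next (a sketch; if the big allocations happen inside separate dask worker processes, %memit only watching the kernel process would explain the small numbers):

```python
# Sketch: measure the kernel *and* its child processes, since plain
# %memit only watches the notebook kernel itself.
from memory_profiler import memory_usage

def run():
    return HomogenizationPipeline(x)  # pipeline and data from the notebook

trace = memory_usage((run, (), {}), interval=0.1, include_children=True)
print(f"peak (kernel + children): {max(trace):.1f} MiB")
```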


wd15 commented Jun 3, 2020

I would do all the memory profiling outside of the notebook for starters, as the notebook can confuse things. Also, start with only one process to get a good baseline and make sure you understand the delta between each step in the code. Furthermore, breaking the code down into imperative steps might help.
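
Something like this skeleton is what I have in mind (the names and the stand-in steps are placeholders): one imperative step per line, decorated with @profile, run outside the notebook so each line gets its own increment.

```python
# Skeleton for line-by-line profiling outside the notebook.
# Run with:  python -m memory_profiler memory_try.py
import numpy as np
from memory_profiler import profile

@profile
def pipeline(x):
    a1 = x.astype("float64")       # stand-in for the first transform
    a2 = a1.reshape(len(a1), -1)   # stand-in for flattening
    return a2.sum(axis=1)          # stand-in for the final reduction

if __name__ == "__main__":
    pipeline(np.random.randint(0, 2, (100, 51, 51, 51)))
```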


beyucel commented Jun 3, 2020

Thanks, Daniel. That is what I am trying to do right now. I will share the delta values for each step.


beyucel commented Jun 4, 2020

```
Filename: memory_try.py

Line #    Mem usage      Increment    Line Contents
================================================
    39    183.129 MiB    183.129 MiB   @profile
    40                                 def HomogenizationPipeline(x):
    41    183.215 MiB      0.086 MiB       a1 = PrimitiveTransformer(n_state=2, min_=0.0, max_=1.0).transform(x)
    42    183.762 MiB      0.547 MiB       a2 = TwoPointCorrelation(periodic_boundary=True, cutoff=15, correlations=[(1, 1)]).transform(a1)
    43    183.762 MiB      0.000 MiB       a3 = FlattenTransformer().transform(a2)
    44  10015.367 MiB   9831.605 MiB       a4 = PCA(n_components=3).fit_transform(a3)
    45  10015.367 MiB      0.000 MiB       return a4
```
This is the non-compute version; it does not tell us much because the first three lines are lazy and all of the computation happens in the PCA fit_transform. I will add the compute version as well for discussion. This still uses the same notebook as above (I just moved the notebook code into a separate .py file and used the notebook as a shell).
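
The compute version I'm putting together looks roughly like this (a sketch: I'm assuming the first three transformers return lazy dask arrays, which the zero increments above suggest, and that they are importable from pymks as in the notebook; np.asarray forces evaluation either way):

```python
import numpy as np
from memory_profiler import profile
from sklearn.decomposition import PCA
from pymks import (FlattenTransformer, PrimitiveTransformer,
                   TwoPointCorrelation)

@profile
def HomogenizationPipeline_compute(x):
    # np.asarray materializes each lazy intermediate, so the memory
    # delta shows up on the line that actually does the work.
    a1 = np.asarray(PrimitiveTransformer(n_state=2, min_=0.0, max_=1.0).transform(x))
    a2 = np.asarray(TwoPointCorrelation(periodic_boundary=True, cutoff=15,
                                        correlations=[(1, 1)]).transform(a1))
    a3 = np.asarray(FlattenTransformer().transform(a2))
    return PCA(n_components=3).fit_transform(a3)
```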

@beyucel beyucel modified the milestones: 0.4, 0.4.1 Jul 2, 2020
@wd15 wd15 modified the milestones: 0.4.1, 0.4.2 Aug 17, 2020
@wd15 wd15 modified the milestones: 0.4.2, 0.5 Aug 2, 2021

wd15 commented Aug 3, 2021

@beyucel is this still an issue? Can this be closed? Please close it if you think this isn't something we can act on.

@wd15 wd15 modified the milestones: 0.5, 0.5.1 Aug 3, 2021