
Memory Leak #9

Open
weixiyen opened this issue Apr 7, 2018 · 9 comments · Fixed by #10

Comments

@weixiyen commented Apr 7, 2018

I'm processing about 2000 images per minute across 8 nodes in production and have noticed that system memory keeps climbing.

:erlang.memory() is stable, but system memory grows by roughly 50 MB per hour per node until the VM crashes.

It seems the NIF bindings in exmagick might be the issue, with memory not being freed somewhere.

The functions I'm using are init!, image_load!, image_dump!, and size!

Version: 0.0.5

@dgvncsz0f (Contributor)

Looking at the code, my guess is image_dump!. enif_make_resource_binary requires a matching enif_release_resource, which is missing. I'll write a test to verify this, and if that's the case I'll put together a patch.

Thanks!
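For reference, a minimal sketch of the allocate/release pattern described above, assuming a hypothetical image_res struct and module name rather than exmagick's actual code: enif_alloc_resource returns a resource whose reference is owned by the NIF, enif_make_resource_binary gives the returned binary its own hold on that resource, and enif_release_resource drops the NIF's reference so the destructor can eventually run.

```c
#include <erl_nif.h>
#include <string.h>

/* Hypothetical resource wrapping an image blob; not exmagick's real struct. */
typedef struct {
    unsigned char *blob;
    size_t         size;
} image_res;

static ErlNifResourceType *IMAGE_RES_TYPE;

/* Runs when the VM drops the last reference to the resource. */
static void image_res_dtor(ErlNifEnv *env, void *obj) {
    image_res *img = (image_res *) obj;
    if (img->blob != NULL)
        enif_free(img->blob);
}

static int load(ErlNifEnv *env, void **priv, ERL_NIF_TERM info) {
    IMAGE_RES_TYPE = enif_open_resource_type(env, NULL, "image_res",
                                             image_res_dtor,
                                             ERL_NIF_RT_CREATE, NULL);
    return IMAGE_RES_TYPE == NULL ? -1 : 0;
}

static ERL_NIF_TERM image_dump(ErlNifEnv *env, int argc,
                               const ERL_NIF_TERM argv[]) {
    /* enif_alloc_resource hands back a resource with one reference
       owned by this NIF. */
    image_res *img = enif_alloc_resource(IMAGE_RES_TYPE, sizeof(image_res));
    img->size = 4;
    img->blob = enif_alloc(img->size);        /* stand-in for real image data */
    memcpy(img->blob, "IMG!", img->size);

    /* The binary term keeps its own reference to the resource, so the
       data stays alive for as long as the binary is reachable. */
    ERL_NIF_TERM bin = enif_make_resource_binary(env, img, img->blob, img->size);

    /* Drop the NIF's reference. Without this call the refcount never
       reaches zero, the destructor never runs, and every dump leaks
       one image-sized buffer: the bug suspected here. */
    enif_release_resource(img);

    return bin;
}

static ErlNifFunc nif_funcs[] = {
    {"image_dump", 0, image_dump}
};

ERL_NIF_INIT(Elixir.ImageLeakDemo, nif_funcs, load, NULL, NULL, NULL)
```

This matches the strategy the erl_nif documentation suggests for enif_make_resource_binary: release the resource right after creating the binary and let the binary's own reference keep it alive for as long as needed.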

@dgvncsz0f (Contributor)

@weixiyen could you test #10 and let me know if it fixes your problem?

@weixiyen (Author) commented Apr 11, 2018

I added the config to my deps and it works fine locally.

{:exmagick, git: "https://github.com/Xerpa/exmagick.git", ref: "b9162a1bd0908d0e3d5d501a07764c0e2936eaa2"},

In prod I'm getting this crash on startup with that commit hash, whereas hex.pm's version 0.0.5 works fine.

2018-04-11 03:36:29 crash_report
    initial_call: {supervisor,kernel,['Argument__1']}
    pid: <0.3408.0>
    registered_name: []
    error_info: {exit,{on_load_function_failed,'Elixir.ExMagick'},[{gen_server,init_it,6,[{file,"gen_server.erl"},{line,352}]},{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,247}]}]}
    ancestors: [kernel_sup,<0.3383.0>]
    messages: []
    links: [<0.3384.0>]
    dictionary: []
    trap_exit: true
    status: running
    heap_size: 376
    stack_size: 27
    reductions: 117
2018-04-11 03:36:29 supervisor_report
    supervisor: {local,kernel_sup}
    errorContext: start_error
    reason: {on_load_function_failed,'Elixir.ExMagick'}
    offender: [{pid,undefined},{id,kernel_safe_sup},{mfargs,{supervisor,start_link,[{local,kernel_safe_sup},kernel,safe]}},{restart_type,permanent},{shutdown,infinity},{child_type,supervisor}]
2018-04-11 03:36:30 crash_report
    initial_call: {application_master,init,['Argument__1','Argument__2','Argument__3','Argument__4']}
    pid: <0.3382.0>
    registered_name: []
    error_info: {exit,{{shutdown,{failed_to_start_child,kernel_safe_sup,{on_load_function_failed,'Elixir.ExMagick'}}},{kernel,start,[normal,[]]}},[{application_master,init,4,[{file,"application_master.erl"},{line,134}]},{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,247}]}]}
    ancestors: [<0.3381.0>]
    messages: [{'EXIT',<0.3383.0>,normal}]
    links: [<0.3381.0>,<0.3380.0>]
    dictionary: []
    trap_exit: true
    status: running
    heap_size: 376
    stack_size: 27
    reductions: 152
2018-04-11 03:36:30 std_info
    application: kernel
    exited: {{shutdown,{failed_to_start_child,kernel_safe_sup,{on_load_function_failed,'Elixir.ExMagick'}}},{kernel,start,[normal,[]]}}
    type: permanent
{"Kernel pid terminated",application_controller,"{application_start_failure,kernel,{{shutdown,{failed_to_start_child,kernel_safe_sup,{on_load_function_failed,'Elixir.ExMagick'}}},{kernel,start,[normal,[]]}}}"}
Kernel pid terminated (application_controller) ({application_start_failure,kernel,{{shutdown,{failed_to_start_child,kernel_safe_sup,{on_load_function_failed,'Elixir.ExMagick'}}},{kernel,start,[normal,

@dgvncsz0f (Contributor)

The master branch of exmagick includes flags that enable dirty schedulers, which requires a fairly recent Erlang version; that is probably why. We've been testing this in production for a while now and it is stable, but we haven't made a release yet to avoid breaking compatibility. Let me fix that and then we can test again.

@dgvncsz0f (Contributor) commented Apr 11, 2018

OK, I've made the dirty scheduler disabled by default (opt-in via an env var). Could you try it again?
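As a rough illustration of what an opt-in dirty-scheduler flag can look like on the C side (the EXM_USE_DIRTY_SCHEDULERS macro, the function, and the module name are hypothetical, not exmagick's actual build setup; the env var would simply control whether the macro is defined when the NIF is compiled):

```c
#include <erl_nif.h>

static ERL_NIF_TERM image_dump(ErlNifEnv *env, int argc,
                               const ERL_NIF_TERM argv[]) {
    /* ... long-running image work would go here ... */
    return enif_make_atom(env, "ok");
}

/* When the (hypothetical) EXM_USE_DIRTY_SCHEDULERS macro is defined at build
   time, the NIF is registered to run on a dirty CPU scheduler. Loading a NIF
   flagged this way on an ERTS without dirty-scheduler support fails with
   on_load_function_failed, which is consistent with the crash above; with the
   flag left at 0, the library loads as a plain NIF. */
#ifdef EXM_USE_DIRTY_SCHEDULERS
#define EXM_DIRTY_FLAGS ERL_NIF_DIRTY_JOB_CPU_BOUND
#else
#define EXM_DIRTY_FLAGS 0
#endif

static ErlNifFunc nif_funcs[] = {
    {"image_dump", 1, image_dump, EXM_DIRTY_FLAGS}
};

ERL_NIF_INIT(Elixir.DirtySchedulerDemo, nif_funcs, NULL, NULL, NULL, NULL)
```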

@weixiyen (Author)

Yep, trying it again.

I just updated my Erlang distribution to erts-9.3 and have the env flag turned on.

I'll get back to you in roughly 12 hours with some graphs comparing before and after.

Thank you!

@weixiyen (Author) commented Apr 12, 2018

Hi @dgvncsz0f - just checking back in.

I think there is still a memory leak, and it's hard to tell how much the fix changed anything.

After running in prod for roughly 24 hours, there is still a big difference between what Erlang thinks it is using and what the Linux kernel reports.

For example, on one node, Erlang thinks it is using 300 MB of memory while the Linux kernel thinks it is using 5 GB.

According to Fred Hebert, this type of symptom could be caused by a NIF:
http://erlang.org/pipermail/erlang-questions/2013-September/075401.html

I can't be 100% sure, though.

Is it possible there are minor leaks in the init!, image_load!, or size! functions?

I'll try to do some more testing on my end as well to see if I can get more helpful information.
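To illustrate the kind of small leak being asked about, here is a generic sketch, not exmagick's actual code (load_image_blob and blob_dimensions are hypothetical stand-ins for calls into the C imaging library): buffers obtained from plain malloc inside a NIF are invisible to :erlang.memory/0, which only tracks the VM's own allocators, so an error path that forgets to free one grows the OS-level RSS without Erlang noticing.

```c
#include <erl_nif.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in: copy the input into a malloc'd buffer, the way a C
   imaging library might hold a decoded blob. malloc'd memory is not tracked
   by erlang:memory/0. */
static unsigned char *load_image_blob(const ErlNifBinary *in, size_t *out_size) {
    unsigned char *buf = malloc(in->size);
    if (buf != NULL) {
        memcpy(buf, in->data, in->size);
        *out_size = in->size;
    }
    return buf;
}

/* Hypothetical stand-in for querying dimensions; always "succeeds" here. */
static int blob_dimensions(const unsigned char *blob, size_t size,
                           unsigned long *w, unsigned long *h) {
    (void) blob; (void) size;
    *w = 100;
    *h = 100;
    return 1;
}

static ERL_NIF_TERM size_nif(ErlNifEnv *env, int argc, const ERL_NIF_TERM argv[]) {
    ErlNifBinary in;
    size_t blob_size;
    unsigned long w, h;

    if (!enif_inspect_binary(env, argv[0], &in))
        return enif_make_badarg(env);

    unsigned char *blob = load_image_blob(&in, &blob_size);
    if (blob == NULL)
        return enif_make_atom(env, "error");

    if (!blob_dimensions(blob, blob_size, &w, &h)) {
        /* The classic "minor" leak: returning on this error path without
           free(blob) leaks one image-sized buffer per failed call, and only
           OS-level RSS shows the growth. */
        free(blob);
        return enif_make_atom(env, "error");
    }

    free(blob);
    return enif_make_tuple2(env, enif_make_ulong(env, w),
                                 enif_make_ulong(env, h));
}

static ErlNifFunc nif_funcs[] = {
    {"size", 1, size_nif}
};

ERL_NIF_INIT(Elixir.SizeLeakDemo, nif_funcs, NULL, NULL, NULL, NULL)
```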

@dgvncsz0f (Contributor)

@weixiyen thanks. Any additional information you can provide will be valuable. Later today I'll continue investigating and look for other possible leaks. I'll get back to you as soon as I have something up.

Thanks!

@dgvncsz0f (Contributor)

Sorry, I accidentally closed this when merging the PR...

@dgvncsz0f dgvncsz0f reopened this Apr 13, 2018