
Memory Leak in Parsing #54

Open
jkrot opened this issue Aug 3, 2016 · 22 comments

jkrot commented Aug 3, 2016

https://github.com/chafey/dicomParser/blob/34c02a7cec6b71a68ae21856029a8f5d957e900e/src/readSequenceElementExplicit.js

So I have been doing tests parsing 1000+ files, and I keep running into a memory leak / inefficient garbage collection with these functions. It is maxing out my heap and causing the parsing to crawl while waiting for garbage collection to finish. I was wondering if anyone else has run into this problem. Also, I think I may be able to fix the underlying issue, but I might need to talk it through with @chafey.

yagni (Collaborator) commented Aug 3, 2016

What's your use case and what browser? I've been using the library to parse a 2GB directory (about 3200 files) with less than 150MB memory usage. Once I parse the files and extract the field values I need, I unset the byteArrays and that keeps the memory around 130MB. Are you sure it's not due to how you're using the objects?
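For example, roughly what I do after each parse (the tag and variable names here are just illustrative):

    var dataSet = dicomParser.parseDicom(byteArray);
    var patientName = dataSet.string('x00100010'); // extract the fields you need
    // then drop the references so the big buffers can be collected
    dataSet.byteArray = undefined;
    byteArray = undefined;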

chafey (Collaborator) commented Aug 3, 2016

I checked the source file you linked and don't see any unsetting of variables - did you mean to link to another file?

jkrot (Author) commented Aug 3, 2016

So I used that as an example of the earlier discussion. I think what is happening is that a fileReader is getting created for each file, rather than one fileReader being reused to parse them one at a time.

chafey (Collaborator) commented Aug 5, 2016

OK so you think your leak is outside the library then? If so, please close

jkrot (Author) commented Aug 5, 2016

I am still confirming that is the case. I am not sure, but I am still seeing some memory issues even with my changes.


chafey (Collaborator) commented Aug 18, 2016

@jkrot any update on this?

jkrot (Author) commented Aug 18, 2016

I have made headway on this. In https://github.com/chafey/dicomParser/blob/master/src/parseDicomDataSet.js, lines 36 & 70, declaring the variable inside the loop was causing heap growth for me. I am still poking around, because it is hard to find all the places where this type of leak is happening, but I am making progress. I will get a commit up by this weekend.

chafey (Collaborator) commented Aug 18, 2016

OK, this isn't really a "leak" per se, but more a case of "using more heap than necessary". The issue is that JavaScript hoists variable declarations but not their initializations. If we move the declaration out of the loop, or declare in the loop without assigning, we should be good. I am going to be working on dicomParser today and will see if I can fix this. Thanks
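For example, a sketch of the pattern I mean (readElement, byteStream, and count are stand-ins, not the actual dicomParser code):

    // Before: initialization written inside the loop body
    for (var i = 0; i !== count; i++) {
        var element = readElement(byteStream); // hypothetical read helper
        dataSet.elements[element.tag] = element;
    }

    // After: declare once, unassigned, so the declaration is not
    // repeated as part of each iteration
    var element;
    for (var i = 0; i !== count; i++) {
        element = readElement(byteStream);
        dataSet.elements[element.tag] = element;
    }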

chafey (Collaborator) commented Aug 18, 2016

jkrot (Author) commented Aug 18, 2016

Correct. The problem is that I am maxing out my heap, and when that happens the parsing slows to an absolute crawl while it waits for the garbage collector to free memory. Part of this was my fault in how I was using the reader, which I fixed, but part of it calls for some optimization in the parsing code for better heap management, so it isn't stuck waiting on the collector.

I guess you never expected to parse several thousand files from disk in parallel.

jkrot (Author) commented Aug 18, 2016

chafey (Collaborator) commented Aug 18, 2016

OK, I'll wait for your PR; please rebase with latest first.

jkrot (Author) commented Aug 22, 2016

So the only way I was able to resolve this was to set up workers and destroy them when done. I tried forcing collection to happen in multiple places while tracing through the stack, and just couldn't get it to happen. I suspect this has more to do with the fileReader than with dicomParser, but I tried setting my fileReader = undefined; in its onloadend function and that didn't help. I traced back from the dicomParser.explicitElementToString util all the way to my promise chain for processing these files and just couldn't get it to free up the memory along the way.

I still feel like there is room for improvement somewhere, but I can't seem to narrow it down. I do know this happens because the max heap size is being hit with 1000+ files, and those files get parsed in under 30 seconds, so collection doesn't have time to run.
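The shape of what ended up working for me (the worker script name and the callbacks are simplified stand-ins):

    // main thread: parse each file in a dedicated worker, then terminate it
    var worker = new Worker('parseWorker.js'); // hypothetical worker script
    worker.onmessage = function (e) {
        saveResults(e.data);   // whatever you do with the extracted fields
        worker.terminate();    // reclaims the worker's entire heap at once
    };
    // transfer the buffer into the worker instead of copying it
    worker.postMessage(byteArray.buffer, [byteArray.buffer]);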

chafey (Collaborator) commented Sep 8, 2016

Any update on this? Please close unless you have evidence of a real issue

jkrot (Author) commented Sep 8, 2016

So, still working on this. I am now classifying it as a heap management issue. The library does a ton of string creation everywhere, and that is all fine and dandy because strings are easy to garbage collect once they are no longer referenced. However, if you are processing a large number of files at a time, they may not be garbage collected fast enough.

The two workarounds I have found are to make everything an object, because you can delete object properties yourself and free that memory, or to push the work into a web worker and terminate the worker when done, because its memory is then cleaned up. I have tried changing this in a few places to free some heap and it has helped a little, but I figure it would take a ton of modification across the entire library, using objects and properties and deleting the properties when done, to fix the heap issue I am encountering.

I have done this in my own code by making the fileReader a property on an object and deleting it after I am done, to force the memory to be cleaned up faster. That has helped some, but there are still strings in the library holding onto their references longer than I expected. This is a very hard issue to track, because I have to inspect heap snapshots and work out why specific memory items could not be collected by garbage collection.
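For example, the object-property approach I described (the names here are illustrative):

    // Keep the FileReader as a deletable property so it can be released
    // explicitly instead of waiting for the collector
    var job = { reader: new FileReader() };
    job.reader.onloadend = function () {
        var byteArray = new Uint8Array(job.reader.result);
        // ... parse byteArray, extract fields ...
        delete job.reader; // drop the reference as soon as we're done
    };
    job.reader.readAsArrayBuffer(file);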

chafey (Collaborator) commented Sep 8, 2016

Interesting information! I would say that your use case is atypical, so making significant changes to the library might be hard to justify, especially if they have any negative side effects for the more common use cases (e.g. slower performance). I hate to say it, but your use case might be better served by a C++ DICOM parsing library like dcmtk or gdcm.

jkrot (Author) commented Sep 14, 2016

I will have my first pass at heap reduction done by this weekend. It won't be complete, but I am aiming for a 10% reduction.

jkrot (Author) commented Sep 16, 2016

#56 Pull request for improvements.

jkrot (Author) commented Jan 24, 2017

Hey @chafey, I know you have been busy; any chance of you looking at the pull request for the heap improvements? I have been running it in a forked branch for a while now.

chafey (Collaborator) commented Feb 16, 2017

@jkrot we will get to this soon, sorry for the delay!

jkrot (Author) commented Mar 6, 2017

So I have noticed something else as well. After the initial parse, when using the untilTag option, consider the byte array returned from

    try {
        var options = {
            omitPrivateAttributes: true,
            untilTag: 'x0020000e',
            maxElementLength: 128
        };
        var dataSet = dicomParser.parseDicom(byteArray, options);
    } catch (ex) {
        console.log('parsing failed: ' + ex);
    }

I was expecting the byteArray returned to be smaller, since we were trimming results to a specific tag, but that was not the case. I need to pre-slice the byteArray before I send it to the parser, because I am not interested in the media in the DICOM file, just the header data itself. Occasionally I am getting parsing errors in this endeavor. I thought 4k of byte stream would be more than enough in most cases, but files with lots of private tags, or files that have been anonymized in a strange manner, sometimes break the parsing. Any ideas/thoughts on how best to trim down the size?
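What I mean by pre-slicing (the 4k cutoff is the guess that sometimes breaks):

    // parse only the head of the file; this fails when private tags or odd
    // anonymization push the tags I want past the cutoff
    var headBytes = byteArray.subarray(0, 4096); // a view, no copy
    var dataSet = dicomParser.parseDicom(headBytes, { untilTag: 'x0020000e' });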
