Memory Leak in Parsing #54
What's your use case and what browser? I've been using the library to parse a 2GB directory (about 3200 files) with less than 150MB memory usage. Once I parse the files and extract the field values I need, I unset the byteArrays and that keeps the memory around 130MB. Are you sure it's not due to how you're using the objects?
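A minimal sketch of the parse-and-unset pattern described above, assuming a Node-style loop over files, the npm package name dicom-parser, and an illustrative tag (x00100010, patient name) for the extracted field:

```js
const fs = require('fs');
const dicomParser = require('dicom-parser');

function extractPatientNames(filePaths) {
  const results = [];
  for (const path of filePaths) {
    // Read and parse one file at a time.
    let byteArray = new Uint8Array(fs.readFileSync(path));
    let dataSet = dicomParser.parseDicom(byteArray);

    // Pull out only the field values needed (example tag: patient name).
    results.push(dataSet.string('x00100010'));

    // "Unset" the large buffers so the GC can reclaim them promptly.
    dataSet = undefined;
    byteArray = undefined;
  }
  return results;
}
```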
I checked the source file you linked and don't see any unsetting of variables - did you mean to link to another file?
That was just an example of earlier code. I think what is happening is that a fileReader is getting created for each file, rather than one fileReader being reused for all of them so they are parsed one at a time (see the sketch below).
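A minimal sketch of the reuse-one-FileReader approach being described, assuming a browser environment where dicomParser is loaded as a global; the function and callback names are illustrative:

```js
// Reuse a single FileReader and process files sequentially, instead of
// creating a new reader per file.
function parseSequentially(files, onDataSet) {
  const reader = new FileReader();
  let index = 0;

  function readNext() {
    if (index >= files.length) {
      return;
    }
    reader.readAsArrayBuffer(files[index++]);
  }

  reader.onload = function () {
    const byteArray = new Uint8Array(reader.result);
    onDataSet(dicomParser.parseDicom(byteArray));
    readNext(); // only start the next file once this one is done
  };

  readNext();
}
```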
OK so you think your leak is outside the library then? If so, please close
I am still confirming whether that is the case; I am not sure yet, but I am still looking into it.
@jkrot any update on this?
I have made headway on this: in https://github.com/chafey/dicomParser/blob/master/src/parseDicomDataSet.js (lines 36 & 70), declaring the variable inside the loop was causing heap growth for me. I am still poking around, because it is hard to find all the places this type of leak happens, but I am making progress. I will get a commit up by this weekend.
OK, this isn't really a "leak" per se, but more a case of "using more heap than is necessary". The issue is that JavaScript hoists var declarations but not their initializations, so the assignment runs on every iteration. If we move the declaration out of the loop, or declare in the loop without assigning, we should be good (see the sketch below). I am going to be working on dicomParser today and will see if I can fix this. Thanks
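A minimal sketch of the two patterns being discussed, with the per-element work reduced to a hypothetical readElement helper; this is an illustration of the commenters' suggestion, not the library's actual code:

```js
// Hypothetical stand-in for the per-element work done inside the parse loop.
function readElement(byteStream) {
  return { tag: 'x' + byteStream.position.toString(16), length: 0 };
}

// Pattern the comments above describe as costly: the variable is declared
// and assigned on every iteration of the loop.
function parseDeclaringInLoop(byteStream, dataSet) {
  while (byteStream.position < byteStream.byteArray.length) {
    var element = readElement(byteStream);
    byteStream.position += 8; // advance past the element (illustrative)
    dataSet.elements[element.tag] = element;
  }
}

// Suggested alternative: hoist the declaration out of the loop (or declare
// without assigning) and reuse the single binding.
function parseWithHoistedDeclaration(byteStream, dataSet) {
  var element;
  while (byteStream.position < byteStream.byteArray.length) {
    element = readElement(byteStream);
    byteStream.position += 8; // advance past the element (illustrative)
    dataSet.elements[element.tag] = element;
  }
}
```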
Correct. The problem is that I am maxing out my heap, and when that happens the parsing slows to an absolute crawl waiting for the garbage collector to free memory so it can be reused. Part of this was my fault in how I was using the reader, which I have fixed, but part of it calls for some optimization in the parsing code for better heap management that doesn't rely on waiting for the collector. I guess you never expected several thousand files to be parsed from disk in parallel.
I also played around with the delete operator: https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Operators/delete
OK, I'll wait for your PR; please rebase with latest first
So the only way I was able to resolve this was to set up workers and destroy them when done (a sketch is below). I tried forcing collection to happen in multiple places while tracing through the stack in the code and just couldn't get it to happen. I suspect this has more to do with the fileReader than with dicomParser, but I tried setting my fileReader = undefined; in its onloadend function and that didn't help. I traced back from dicomParser.explicitElementToString all the way up to my promise chain for processing these files and just couldn't get the memory to free up along the way. I still feel like there is room for improvement somewhere, but I can't seem to narrow it down. I do know this happens because the max heap size is being hit with 1000+ files, and those files get parsed in under 30 seconds, so collection doesn't have time to run.
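A minimal sketch of the worker-and-terminate approach described above, assuming a browser environment; the worker script name, bundle path, and message shape are illustrative, not part of dicomParser:

```js
// parse-worker.js (illustrative): parse inside the worker and post back only
// the small result, keeping the large byteArray in the worker's heap.
importScripts('dicomParser.min.js');
self.onmessage = function (e) {
  const dataSet = dicomParser.parseDicom(new Uint8Array(e.data));
  self.postMessage({ patientName: dataSet.string('x00100010') });
};

// Main thread: spawn a worker per file (or per batch) and terminate it when
// done, so everything it allocated is released without waiting for the GC.
function parseInWorker(arrayBuffer) {
  return new Promise(function (resolve, reject) {
    const worker = new Worker('parse-worker.js');
    worker.onmessage = function (e) {
      worker.terminate(); // frees the worker's entire heap
      resolve(e.data);
    };
    worker.onerror = function (err) {
      worker.terminate();
      reject(err);
    };
    worker.postMessage(arrayBuffer, [arrayBuffer]); // transfer, don't copy
  });
}
```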
Any update on this? Please close unless you have evidence of a real issue
So, still working on this. I am now classifying it as a heap management issue. The library does a ton of string creation everywhere, and that is all fine and dandy on its own, because strings are easy to garbage collect once they are no longer referenced. However, if you are processing a large number of files at a time, they may not be garbage collected fast enough.

The two workarounds I have found are to make everything an object, because you can delete object properties yourself and free that memory, or to push the work into a web worker and terminate the worker when done, because its memory is then cleaned up. I have tried changing this in a few places to reduce heap usage and it has helped a little, but I figure it would take a ton of modification across the entire library to use objects and properties, and to delete those properties when done, to resolve the heap issue I am encountering.

I have done this in my own code by putting the fileReader on an object as a property and deleting that property after I am done with it, to encourage the memory to be cleaned up faster (a sketch is below). That has helped some, but there are still strings in the library holding onto their references longer than I expected. This is a very hard issue to track, because I have to inspect heap snapshots and work out why specific items could not be collected.
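A minimal sketch of the delete-a-property approach described above; the wrapper object and callback names are illustrative, not part of dicomParser:

```js
// Hold the reader as a deletable property on a wrapper object, so the
// reference can be dropped explicitly as soon as the file is parsed.
const parseContext = {};

function parseFile(file, onDone) {
  parseContext.reader = new FileReader();
  parseContext.reader.onloadend = function () {
    const byteArray = new Uint8Array(parseContext.reader.result);
    const dataSet = dicomParser.parseDicom(byteArray);

    // Drop the reference so the reader (and its result buffer) can be
    // collected without waiting for the wrapper to go out of scope.
    delete parseContext.reader;

    onDone(dataSet);
  };
  parseContext.reader.readAsArrayBuffer(file);
}
```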
Interesting information! I would say that your use is atypical, so making significant changes to the library might be hard to justify, especially if they have any negative side effects for the more common use cases (e.g. slower performance). I hate to say it, but your use case might be better served by a C++ DICOM parsing library like dcmtk or gdcm.
I will have my first pass at heap reduction by this weekend. It won't be complete, but I am aiming for a 10% reduction.
#56: pull request for improvements.
Hey @chafey, I know you have been busy; any chance of you looking at the pull request for heap improvements? I have been running it in a forked branch for a while now.
@jkrot we will get to this soon, sorry for the delay!
I have noticed something else as well: after the initial parse, when using the untilTag option, the byte array returned from the parser is not any smaller. I was expecting the returned byteArray to be smaller, since we were trimming results to a specific tag, but that was not the case. I need to pre-slice the byteArray before I send it to the parser, because I am not interested in the media data embedded in the DICOM file, just the rest of the data elements. Occasionally I get parsing errors in this endeavor; I thought 4k of byte stream would be more than enough in most cases, but files with lots of private tags, or files that have been anonymized in a strange manner, sometimes break the parsing. Any ideas/thoughts on how best to trim down the size?
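A minimal sketch of the two approaches mentioned: passing untilTag (a dicomParser parse option) versus pre-slicing the buffer before parsing. The 4096-byte cutoff and the pixel data tag are illustrative, and as noted above a fixed cutoff can fail on files with many private tags:

```js
const raw = new Uint8Array(arrayBuffer); // full file contents (illustrative input)

// Option 1: ask the parser to stop once it reaches pixel data (x7fe00010).
// This ends the parse early, but the byteArray passed in is unchanged, so it
// does not by itself shrink the memory held by the buffer.
const headerOnly = dicomParser.parseDicom(raw, { untilTag: 'x7fe00010' });

// Option 2: pre-slice the buffer to a fixed size before parsing. Cheap, but
// risky: a 4k cutoff can land mid-element and throw a parse error.
const sliced = raw.slice(0, 4096);
let dataSet;
try {
  dataSet = dicomParser.parseDicom(sliced);
} catch (err) {
  // Fall back to the full buffer if the truncated parse fails.
  dataSet = dicomParser.parseDicom(raw);
}
```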
https://github.com/chafey/dicomParser/blob/34c02a7cec6b71a68ae21856029a8f5d957e900e/src/readSequenceElementExplicit.js
So I have been doing tests parsing 1000+ files, and I keep running into a memory leak / inefficient garbage collection with these functions. It is maxing out my heap and causing the parsing to crawl while waiting for garbage collection to finish. I was wondering if anyone else has run into this problem. Also, I think I may be able to fix the underlying issue, but I might need to talk it through with @chafey