-
Notifications
You must be signed in to change notification settings - Fork 39
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Memory leak in parse_string? (or misleading documentation?) #45
Comments
https://github.com/JuliaLang/LightXML.jl/blob/d31f1f43dc04fd1546ec5e4ad039c6f1981892fd/src/document.jl#L103 |
I suggest at a minimum:
|
My concern is if you do |
So, per my suggestion above, please describe that in the documentation.
And, please provide examples of correct usage in the documentation. (FWIW, a segfault is far preferable to a memory leak. A segfault gets found in testing. A memory leak causes stuff to break months later in production and costs way more to fix.) |
A segfault caused by a GC will not necessarily be found in tests. |
I'm not familiar with libxml2 API but is the memory own'ed by a root object and is it always possible to find know what the owner object is when the library returns a pointer (either from the argument or from some libxml2 api)? |
I don't know exactly how to answer your question @yuyichao. libxml2 is a C library and until someone implements a full reference count system here (which is what I believe the R wrapper and maybe also the Python wrapper do), you're responsible for C-like manual memory management when using it through this wrapper. |
What I mean is that we don't necessarily need to keep track of reference counting and we can instead track the reference directly. This may not be simpler in general (although also won't be much more complicated since for RC you also need to lookup the julia object corresponding to a libxml2 pointer) but in the case where each object has a clear owner this would be much simpler. My question was basically if this is true for libxml2. For example, can a xml node belong to multiple xml document simultaneously? And if this is reasonable to keep track of given the API. (i.e. if it is clear from the API when and how the ownership is transfered.) |
Looks like.
|
Actually it looks like this is not allowed since freeing both
|
That's expected, |
Then is there a way to free all the memory allocated without segfault after creating this? |
|
It might be one of those things that the library lets you do but you're not supposed to do it? If we want to do something clever and automatic here, I think we'd really have to carefully study how some other proven, well-tested wrapper of |
IIUC In [1]: from lxml import etree
In [2]: root = etree.Element("root")
In [3]: root2 = etree.Element("root2")
In [4]: c1 = etree.Element("child1")
In [5]: root.append(c1)
In [6]: print(etree.tostring(root).decode())
<root><child1/></root>
In [7]: print(etree.tostring(root2).decode())
<root2/>
In [8]: root2.append(c1)
In [9]: print(etree.tostring(root).decode())
<root/>
In [10]: print(etree.tostring(root2).decode())
<root2><child1/></root2> |
FWIW, here's a detailed description of how this works in the R wrapper for libxml2: |
In XMLDict, I've added The user who reported JuliaCloud/XMLDict.jl#1 says that they can now process large (6 gb) files with constant memory usage. I'd like to reiterate my request to the maintainers of LightXML.jl: Please add, ASAP, a description and and example to the documentation for the simple read-only case that says "if you do x, y and z you'll be fine". I understand that it might take time to figure out a robust way to handle all the corner cases, but in the meantime, the documentation should at least tell the user how to use the library safely in its current form. |
Pull requests encouraged. |
FYI I made a further tweak to the XMLDict.jl work-around: JuliaCloud/XMLDict.jl@d7b63d0 |
I received a report of memory leaks from an XMLDict user JuliaCloud/XMLDict.jl#1.
I'm somewhat surprised. The LightXML.jl doc says, under the heading "Create an XML Document", that "When you create XML documents and elements directly you need to take care not to leak memory". However, under the heading "Read an XML file" there is no mention of memory management. XMLDict.jl only calls LightXML.parse_string, so it does not "create XML documents and elements directly".
The documentation seems to say that if you are just "read[ing] an xml file", as opposed to "create[ing] XML documents and elements directly", then you do not need to worry about memory management.
Is
parse_string
broken? or is the documentation broken?This seems related: #19
The text was updated successfully, but these errors were encountered: