-
Notifications
You must be signed in to change notification settings - Fork 123
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix issues #147 and hopefully #143 #149
Conversation
…uffer to the end in an attempt to make it easier to understand and to better cope with understanding the content while trying to chunk the server response into lines.
Local test results:
|
Just a comment on my patch. I decided to simply create a single buffer out of all incoming data, since I think it makes things simpler. The only thing I didn't really like is that to do so, I had to take the incoming data and temporarily put it into an Uint8Array so that I could append it to the new complete buffer. I couldn't find a way to append it directly into the new complete buffer, so that is sort of a waste of an allocation, but other than that it is fairly clean. Also, I don't claim to be any kind of expert in the IMAP protocol, I took my lead from how the existing code was trying to chunk server response lines and just tried to make it smarter based on my [limited] understanding of what the content of the server response should be. |
The wasted allocation and unneccressary copy operations of incrementally creating the buffer like that is rather problematic. Let's say the total size is 1MB and the buffers 4k, then the intermediate wasted allocation is 128MB. For 10MB attachment, that's 12G of allocation and unnecessary copy and GC. |
I was thinking about that some more. I think I could do it in a similar fashion as the old approach, where I use an array of buffers. I could look into that and open a separate pull request to enhance this one. What do you think? Or I guess it should be possible to generate a new, complete pull request. |
You got me thinking this could be structured more like a state machine, is that the direction you were thinking of taking this? I'm going to test an idea of a refactor, but I think your original pull request #148 is satisfactory to close this issue. |
Well, there is a state machine in there, but it has a little bit of parser sprinkled in there too, since it has to understand escaping and whatnot. So, it's a little bit of a hyrbid, but possibly could be made more pure. Certainly seems to work better for me too. I have another version mostly working where I don't concat the buffers into a single buffer, but I need to resolve one test case failure and then test it on real data. |
Yes, #147. Feeding tokens to the parser would be best: it gets rid of double parsing and takes less memory than creating a buffer with the whole command. |
I agree that using my basic approach to directly parse the data as it comes in would be smarter than breaking it into chunks and parsing the chunks, since breaking it into chunks requires knowledge of how to parse it anyway. Not only would that help resource consumption, it would be a little more efficient too, since the current approach has to start over from scratch each time the buffer doesn't contain enough information for it to continue. Regardless, I am going to submit a new pull request, which is an alternate version of this pull request, which does not do extra copying of the buffers. |
…reation and copying by using an array of buffers.
Ok, apparently github automatically adds my new commit to this pull request rather than letting me create another. Fine. Despite what the CI failure says, all tests still pass for me locally and I have used the modified algorithm to sync 175k messages (minus attachments) without issue. |
I don't know if you are going to work on a refactor or not, but if that isn't going to happen soon, I think it would be helpful to apply this PR in the meantime, since the original code is fairly broken as the two new test cases demonstrate. Thanks. |
Any progress on this? |
I pushed cb47736 where _iterateIncomingBuffer handles the buffer explicitly as a state machine. I also commited your zero-length literal test with it as that fixed #147. The other test wasn't set up correctly - ran appendIncomingBuffer twice in a row without calling iterator.next() in between - so I skipped it. |
First of all, thanks! Regarding the second test case, it is not clear to me why appendIncomingBuffer needs to be interleaved with iterator.next(), since it is just setting up a data structure (i.e., an array of two buffers) which is a legal structure. Regardless, that test is very important since it demonstrates the most egregious bug that was fixed by the pull request, so I'll try to reformulate it. I'll open a separate pull request for that, ok? |
If you look at _onData you see that every time a buffer is pushed to this._incomingBuffers, it is processed in _iterateIncomingBuffer. Therefore the test doesn't describe a situation that can happen in real code. The test passes when it looks like this:
|
Ok, I see what you are saying, but then I guess I don't know how to create a test case that fails then. If you look at #143, there is another more complicated test case in there, but it is still formulated incorrectly because I don't iterate after each append. However, those three strings correspond to the incomingBuffers array that I saw in the wild when the parsing failed. Not sure if that helps, but that's all I have for now. |
I see the same with those three strings when iterating after each push. Is it possible that it was something else that caused the issue? |
Yeah, I agree, if I iterate after each it doesn't give me an error either. But my point was, those strings represent the three entries in incomingBuffers when the parse failure occurred. I could attempt to go back and recreate with the old version, but I am fairly certain that the issue I described is what was happening (i.e., it was seeing a linefeed/carriage return in an incomplete buffer and thinking it was terminating the line). Regardless, I'm testing the new version of my 175k messages right now and will see how that goes and will potentially go back and see if I can further pin down the error on the old version. Thanks. |
BTW, are you planning on doing a release? |
Just a follow up on the other test case. I am still not sure how to best recreate the issue, especially since it happens randomly depending on how data is received from the socket. However, I tried to dig up some more information in hopes that you might be able to figure out how to define a test case. So, with some debug statements added in emailjs-imap-client-imap.js, I see this situation: In a call to onData() before calling iterateIncomingBuffer() I see the following entries in _incomingBuffers:
How exactly it gets into this state, I'm not sure but you can see by scrolling to the end of the last buffer that it is not a complete command. This is the type of incomingBuffers I was trying to create with my test case. Again, the issue here is that these errors are dependent on how the data arrives. The command that was retrieved then from inside _parseIncomingCommands() via _iterateIncomingBuffer() was:
This results in a parse error in _parseIncomingCommands(). This is a fairly important bug to have a test case for. I understand the argument that iterate is called in onData each time data is received, but regardless the incomingBuffers look like what I have above and resulted in an exception. By calling appendIncomingBuffer without interleaving iterates, I'm able to recreate this situation in the test case and my patch worked correctly in that case where the original code did not. |
The problem is already there in the first entry of _incomingBuffers, {3019}\n and not {3019}\r\n as the standard says it should be https://tools.ietf.org/html/rfc3501#page-85. Are the higher level emailjs libraries forgiving to this this kind of IMAP commands and parse these kind of literals? |
Sorry, that might have been a mistake in my translating the buffers to strings. In this case, the email server is Gmail, so I'm sure it's doing the right thing. My simplified example in the submitted test case demonstrated the same issue, but was properly formatted. One thing I should point out, after some tests, I have not reproduced this error with master. However, if I add my submitted test case, then master does fail, but my patch didn't fail in that case, for what it's worth. So, it is possible that your patch potentially prevents it from getting into that state in the wild. I could easily reproduce this prior to your patch. |
I edited my previous comment to fix the \r\n issues. |
Rewrite _iterateIncomingBuffer() to parse from the beginning of the buffer to the end in an attempt to make it easier to understand and to better cope with content handling while trying to chunk the server response into lines.
This addresses issue #147 for me. I don't know if it addresses the reported issue in #143, since I didn't report it, but I think it might. It does address the other similar issue that I reported in the comments of #143.