Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG] MK4 with Octoprint - M486 A<long name> crashes the printer #4123

Closed
DerMouse opened this issue Aug 5, 2024 · 25 comments
Closed

[BUG] MK4 with Octoprint - M486 A<long name> crashes the printer #4123

DerMouse opened this issue Aug 5, 2024 · 25 comments
Labels
bug Something isn't working. stale-issue third-party related to non-genuine-Prusa design, SW, or HW: remote control, extruder mod, different hotends...

Comments

@DerMouse
Copy link

DerMouse commented Aug 5, 2024

Printer model

MK4

Firmware version

6.0.1=14848

Upgrades and modifications

No response

Printing from...

USB/Octoprint

Describe the bug

PruscaSlicer writes M486 A into the gcode file. If is longer thant 83 characters, sending the file to print through Octoprint USB connection will crash/reboot the printer. No problem if same file printed through regular USB key.

How to reproduce

Slice a (very long named) STL file. Send it to MK4 through Octoprint. No need to have any real printing commands in the Gcode file.

Expected behavior

Should not crash anything.

Files

No response

@DerMouse DerMouse added the bug Something isn't working. label Aug 5, 2024
@DerMouse
Copy link
Author

DerMouse commented Aug 5, 2024

Cannot upload a file, so here is some Gcode that triggers the problem.

; length 80
M486 A01234567890123456789012345678901234567890123456789012345678901234567890123456789
; length 81
M486 A012345678901234567890123456789012345678901234567890123456789012345678901234567890
; length 82
M486 A0123456789012345678901234567890123456789012345678901234567890123456789012345678901
; length 83
M486 A01234567890123456789012345678901234567890123456789012345678901234567890123456789012
; length 84
M486 A012345678901234567890123456789012345678901234567890123456789012345678901234567890123
; length 85
M486 A0123456789012345678901234567890123456789012345678901234567890123456789012345678901234
; length 86
M486 A01234567890123456789012345678901234567890123456789012345678901234567890123456789012345

@danopernis
Copy link
Member

@DerMouse thanks for the report. I just tried this and wasn't able to reproduce it. I don't know much about octoprint, I just followed this guide. I also tried reproducing this via serial port terminal and also didn't get any crash.

@danopernis danopernis added the unable to reproduce We need to be able to reproduce the issue in order to help. label Aug 6, 2024
@bkerler
Copy link
Contributor

bkerler commented Aug 27, 2024

I also can't reproduce any "crash" related to gcode "M486 A" so far. @DerMouse did you see a BSOD or Red screen when the crash happened ? And if yes, can you please take a picture of the screen ? or was the serial port just not responsive ?

@DerMouse
Copy link
Author

Sorry for the late answer.
If I remember proberly (not close to the printer yet), it was a red screen with (maybe) some kind of QR code. I'll check this afternoon and will try to take a picture.

@DerMouse
Copy link
Author

So, I did the test again.
As soon as I ask Octoprint to print the file, the printer reboots and, after reboot, displays a red screen :

IMG_20240828_095924

The printer must then be rebooted or reset.

@DerMouse
Copy link
Author

Here is the data from the serial.log file when sending the file :

2024-08-28 18:21:31,608 - Changing monitoring state from "Operational" to "Starting" 2024-08-28 18:21:31,671 - Send: N0 M110 N0*125 2024-08-28 18:21:31,697 - Recv: ok 2024-08-28 18:21:31,700 - Changing monitoring state from "Starting" to "Printing" 2024-08-28 18:21:31,737 - Send: N1 M486 A01234567890123456789012345678901234567890123456789012345678901234567890123456789*73 2024-08-28 18:21:31,743 - Recv: ok 2024-08-28 18:21:31,750 - Send: N2 M486 A012345678901234567890123456789012345678901234567890123456789012345678901234567890*122 2024-08-28 18:21:31,761 - Recv: ok 2024-08-28 18:21:31,769 - Send: N3 M486 A0123456789012345678901234567890123456789012345678901234567890123456789012345678901*74 2024-08-28 18:21:31,823 - Recv: ok 2024-08-28 18:21:31,836 - Send: N4 M486 A01234567890123456789012345678901234567890123456789012345678901234567890123456789012*127 2024-08-28 18:21:31,858 - Recv: Error:checksum mismatch, Last Line: 3 2024-08-28 18:21:31,884 - Recv: Resend: 4 2024-08-28 18:21:31,916 - Recv: ok 2024-08-28 18:21:31,977 - Send: N4 M486 A01234567890123456789012345678901234567890123456789012345678901234567890123456789012*127 2024-08-28 18:21:32,061 - Recv: Error:checksum mismatch, Last Line: 3 2024-08-28 18:21:32,062 - Recv: Resend: 4 2024-08-28 18:21:32,133 - Recv: ok 2024-08-28 18:21:32,143 - Send: N4 M486 A01234567890123456789012345678901234567890123456789012345678901234567890123456789012*127 2024-08-28 18:21:32,174 - Recv: Error:checksum mismatch, Last Line: 3 2024-08-28 18:21:32,175 - Recv: Resend: 4 2024-08-28 18:21:32,184 - Recv: ok 2024-08-28 18:21:32,241 - Send: N4 M486 A01234567890123456789012345678901234567890123456789012345678901234567890123456789012*127 2024-08-28 18:21:32,276 - Recv: Error:checksum mismatch, Last Line: 3 2024-08-28 18:21:32,277 - Recv: Resend: 4 2024-08-28 18:21:32,301 - Recv: ok 2024-08-28 18:21:32,315 - Send: N4 M486 A01234567890123456789012345678901234567890123456789012345678901234567890123456789012*127 2024-08-28 18:21:32,372 - Recv: Error:checksum mismatch, Last Line: 3 2024-08-28 18:21:32,373 - Recv: Resend: 4 2024-08-28 18:21:32,398 - Recv: ok 2024-08-28 18:21:32,418 - Send: N4 M486 A01234567890123456789012345678901234567890123456789012345678901234567890123456789012*127 2024-08-28 18:21:32,441 - Recv: Error:checksum mismatch, Last Line: 3 2024-08-28 18:21:32,449 - Recv: Resend: 4 2024-08-28 18:21:32,469 - Recv: ok 2024-08-28 18:21:32,503 - Send: N4 M486 A01234567890123456789012345678901234567890123456789012345678901234567890123456789012*127 2024-08-28 18:21:32,533 - Recv: Error:checksum mismatch, Last Line: 3 2024-08-28 18:21:32,534 - Recv: Resend: 4 2024-08-28 18:21:32,546 - Recv: ok 2024-08-28 18:21:32,558 - Send: N4 M486 A01234567890123456789012345678901234567890123456789012345678901234567890123456789012*127 2024-08-28 18:21:32,609 - Recv: Error:checksum mismatch, Last Line: 3 2024-08-28 18:21:32,610 - Recv: Resend: 4 2024-08-28 18:21:32,633 - Recv: ok 2024-08-28 18:21:32,645 - Send: N4 M486 A01234567890123456789012345678901234567890123456789012345678901234567890123456789012*127 2024-08-28 18:21:32,697 - Recv: Error:checksum mismatch, Last Line: 3 2024-08-28 18:21:32,698 - Recv: Resend: 4 2024-08-28 18:21:32,715 - Recv: ok 2024-08-28 18:21:32,738 - Send: N4 M486 A01234567890123456789012345678901234567890123456789012345678901234567890123456789012*127 2024-08-28 18:21:32,780 - Recv: Error:checksum mismatch, Last Line: 3 2024-08-28 18:21:32,781 - Recv: Resend: 4 2024-08-28 18:21:32,803 - Recv: ok 2024-08-28 18:21:32,820 - Send: N4 M486 A01234567890123456789012345678901234567890123456789012345678901234567890123456789012*127 2024-08-28 18:21:32,838 - Recv: Error:checksum mismatch, Last Line: 3 2024-08-28 18:21:32,839 - Recv: Resend: 4 2024-08-28 18:21:32,840 - Printer keeps requesting line 4 again and again, communication stuck 2024-08-28 18:21:32,840 - Changing monitoring state from "Printing" to "Error" 2024-08-28 18:21:32,892 - Send: M112 2024-08-28 18:21:32,899 - Send: N5 M112*36 2024-08-28 18:21:32,937 - Send: N6 M104 T0 S0*39 2024-08-28 18:21:32,951 - Send: N7 M140 S0*98 2024-08-28 18:21:33,009 - Changing monitoring state from "Error" to "Offline after error" 2024-08-28 18:21:33,046 - Connection closed, closing down monitor 2024-08-28 18:21:33,531 - Closing down send loop

@danopernis
Copy link
Member

Thanks for the screenshot and logs. Looks like redscreen is just a symptom of octoprint sending M112 as a part of error recovery. It seems that octoprint doesn't like Error:checksum mismatch on the serial line. I will look into why and how this string ends up on the serial line.

@danopernis danopernis added third-party related to non-genuine-Prusa design, SW, or HW: remote control, extruder mod, different hotends... and removed unable to reproduce We need to be able to reproduce the issue in order to help. labels Aug 28, 2024
@DerMouse
Copy link
Author

@danopernis If you need any more data, or some tests to be done, just send a comment here.

@bkerler
Copy link
Contributor

bkerler commented Aug 29, 2024

Yeah, seems to be an issue with octoprint and not the prusa firmware at first sight. I will try some tests against the checksum algorithm just to be sure.

@bkerler
Copy link
Contributor

bkerler commented Aug 29, 2024

Just verified that "N4 M486 A01234567890123456789012345678901234567890123456789012345678901234567890123456789012*127" is correct, the checksum should be 127.

Verifying against the current printer fw 6.1.2, I'm getting a checksum error :
image

Using a debugger, it seems the problem isn't the checksum being calculated incorrectly, but instead the strtol result :
image

The strtol compare does correspond to a value of "12" instead of "127", which means the command string (apos+1 ptr) is nullterminated incorrectly (length-1) due to MAX_CMD_SIZE being 96 as the incoming string buffer is actually 97, thus leading to incorrect checksum validation.

So long story short:
It's an issue with octoprint, please report that issue to their repo. Octoprint should stop sending commands if the checksum is invalid and the commands are numbered.

What should be done for the prusa firmware :
The firmware should return that the command being sent is too large (replying with max payload size being xxx bytes) instead of just validating a checksum for a broken packet due to a command being too long.

@bkerler
Copy link
Contributor

bkerler commented Aug 29, 2024

It seems that octoprint treats checksum errors as no real error and does resend the line assuming that the checksum error happened due to serial issues. With the PR #4168 this should be handled differently by octoprint instead, causing the print to error out.

@cp2004
Copy link

cp2004 commented Aug 29, 2024

Octoprint should stop sending commands if the checksum is invalid and the commands are numbered.

Forgive me if I've missed something here, perhaps not having seen everything in detail. But why should OctoPrint stop sending commands in this situation? Could you explain this a bit more?

Generally, the point of checksums is to react to communication errors. So normally a checksum mismatch would result in a resend request, so communication could recover. Stopping sending any further commands means that communication is never recoverable, which is not what we want.

@cp2004
Copy link

cp2004 commented Aug 29, 2024

Just verified that "N4 M486 A01234567890123456789012345678901234567890123456789012345678901234567890123456789012*127" is correct, the checksum should be 127.

This is also implying to me that OctoPrint is sending the right thing. Apologies again if I'm being dense, but could you clarify what OctoPrint is doing wrong here (or what it should be doing instead)? I'm not very clear.

@foosel
Copy link

foosel commented Aug 30, 2024

From my understanding of this issue here, the problem is indeed not "with OctoPrint", but rather with the checksum validation on firmware side failing due to a buffer overrun in the firmware.

And as far as I understand the core issue here, the slicer (also by prusa) is generating a line (with a prusa specific command?) that is SO close to the firmware's buffer sizes that problems are just waiting to happen, as witnessed here.

We currently don't have a mechanism to communicate buffer sizes from the firmware to OctoPrint, least of all something firmware agnostic, and checksum mismatches due to serial communication issues are indistinguishable from checksum issues due to firmware side buffer overruns. The correct way to handle checksum mismatches in the (way more common) first case is to resend the line. In the second case, I guess all that can be done is abort the print, or maybe reset the line numbers in the hopes that shortens the line length enough so that the firmware doesn't mangle the checksum calculation. But without a way to distinguish one from the other, there's nothing that can be changed in OctoPrint that won't entirely break the first case, and break things for everyone else out there.

As you said yourself, what OctoPrint is sending is correct, it's the firmware that can't handle it and triggers the failure mode. From your explanation above, the firmware should be reading the whole line from last new line to next new line, not just the part that fits into its arbitrarily (and apparently undersized) line buffer.

@danopernis
Copy link
Member

@foosel thanks for the insight, I agree with your points. It is a shame that there is no standard way to distinguish those cases, nor to communicate the actual buffer size. As it stands, firmware is limited in memory and we can't just keep increasing buffers whenever somebody hits the limit. Maybe the slicer should limit the size of cancel object names, we will discuss it.

Worth mentioning, there are actually two limits: One is for the actual line of the gcode and other is for this weird serial protocol which adds line number prefixes and checksum suffixes. Obviously the second one needs to be larger. Is there a way to turn off this serial protocol in OctoPrint? That could be a workaround @DerMouse could use, if the serial line is not noisy.

@DerMouse
Copy link
Author

@foosel thanks for the insight, I agree with your points. It is a shame that there is no standard way to distinguish those cases, nor to communicate the actual buffer size. As it stands, firmware is limited in memory and we can't just keep increasing buffers whenever somebody hits the limit. Maybe the slicer should limit the size of cancel object names, we will discuss it.

I hit this behaviour when slicing (and then sending to print) some STL with a very long file name (90+ characters). Once I understood the problem came from the M486 line, and that it was built from the file name, I shortened the name and everything went back to normal.

Some warning from the slicer, or limiting the M486 A size, would indeed be great to avoid the problem.

@foosel
Copy link

foosel commented Aug 30, 2024

we can't just keep increasing buffers whenever somebody hits the limit.
Maybe the slicer should limit the size of cancel object names, we will discuss it.

I was just about to point out that said "somebody" was probably sitting some rooms further on your corridor ;) This is a very much prusa-internal problem, created by your slicer in your firmware. It would have arised with any host software adhering to the de-factor standard communication protocol as an intermediary, that it happened with OctoPrint was only thanks to its market share.

this weird serial protocol which adds line number prefixes and checksum suffixes.

Oooooof. Ok, I have to admit, this sentence isn't what I'd have expected to see coming from a firmware dev 😅

This "weird serial" protocol is the established industry standard for serial communication in consumer 3d printers, which is notoriously sensitive to EMI thanks to ridiculous cable lengths being used in the field, plus low quality wiring, crossing wires with motor cabling, even refrigerators or other home appliances on the same noisy mains line, down to firmware abusing interrupts, etc. In short: this communication is surprisingly error prone, the communication protocol hence includes ways to implement an error correction at least on the sending side (sadly not on the receiving side, but that's a different story), and that's line numbers and checksums. And that has been the case ever since consumer 3d printing became a thing and significantly predates the existence of OctoPrint and even Marlin.

I'm frankly a bit reluctant about telling you that yes, those CAN in fact be disabled, because I'm afraid that the next thing that will happen is an article in the Prusa Knowledgebase, telling MK4 users who want to use their printer with OctoPrint to disable that crucial error correction "because of an issue with OctoPrint". It wouldn't be the first time that the blame for an issue with your firmware is shifted to me, and the disgruntled users then screaming at me for not properly interoperating with their expensive hardware, when in fact it's the in-house developed firmware that still has some serious bugs to iron out.

So instead of telling you how to disable this error correction as a work around, I'm instead encouraging you to solve the issue at the (in-house) source and use more sensible cancel object labels in the slicer.

@foosel
Copy link

foosel commented Aug 30, 2024

Worth mentioning, there are actually two limits: One is for the actual line of the gcode

Side note, given that this is caused by the slicer not limiting the line length and just using the user supplied STL file name to generate the label here, if I were you I'd take a close look at what happens if the line grows beyond your buffer length even without line numbers and checksums, when coming from a file stored on the printer. My money is on that at least causing label corruption, if not outright crashes as well due to run away quoted strings...

@DerMouse
Copy link
Author

Side note, given that this is caused by the slicer not limiting the line length and just using the user supplied STL file name to generate the label here, if I were you I'd take a close look at what happens if the line grows beyond your buffer length even without line numbers and checksums, when coming from a file stored on the printer. My money is on that at least causing label corruption, if not outright crashes as well due to run away quoted strings...

It could even be worse : a buffer overflow may lead to code execution on the printer (dunno if M486 A may include binary data, though, so several assumptions here). A hand-written malicious GCODE file could then make for interesting results.

@foosel
Copy link

foosel commented Aug 30, 2024

Indeed. A crash in such a case would indicate a potential attack vector.

@danopernis
Copy link
Member

I am not really in the mood of discussing how the entire "established industry standard for serial communication in consumer 3d printers" is just a pile of hacks which are put on top of other hacks which miraculously work most of the time. It resembles how the evolution created totally useless 4.5m nerve in giraffes because it can't go back and fix it because of backwards compatibility 😄 .

Proper engineering solution would involve using proper link layer on top of physical layer, containing proper mechanism for error detection, retransmission and what not. Also not parsing character-oriented instructions like crazy and using proper binary format, with backward compatibilities, versioning, deprecation, proper vendor extensions etc. But it is too late for that, just like it is late for fixing the giraffes, so that's just me ranting. Not sure if that's what you'd expect from firmware developer or not 😉 .

We will fix this, eventually, as we do. It is just tiresome and your Schadenfreude is just not helping to be honest 😫 .

P.S. Crashes are not happening, and the attack vector is not there. Firmware is properly refusing the string which is too long, not overwriting anything. I have no idea what you are talking about.

@foosel
Copy link

foosel commented Aug 30, 2024

your Schadenfreude

There's no Schadenfreude here anywhere, I'm not sure where you are getting that from. I got an issue reported on OctoPrint about this, I went here, explained how things are, gave pointers how to fix this and where else to look for potential related issues and also explained why I didn't want to go on the record with a potential workaround due to those having been abused in communication in the past to shift blame.

Try to look at this from my perspective. Something in this firmware misbehaves, an analysis is posted that even comes to the conclusion that OctoPrint is doing everything correctly, and yet the reporting user is sent to OctoPrint's repository to open a ticket about it. If this would be an isolated case, shit happens, but this has sadly become a bit of a pattern with this and the former firmware, and so I feel the need to be very clear on where the issue lies.

That is not meant as a personal attack, Schadenfreude or anything like that. I just want to see this fixed for the sake of the users, without OctoPrint getting the blame for it (which - sadly - indeed has happened before, repeatedly).

I am not really in the mood of discussing how the entire "established industry standard for serial communication in consumer 3d printers" is just a pile of hacks which are put on top of other hacks which miraculously work most of the time.

We agree on that, however, as you said, it is what it is, and for the sake of interoperability we should see how we can make it work. Throughout the years I've pushed things in the a better direction whenever I could, but with the firmware landscape being as fragmented as it is, there's not much that can be done without fragmenting it even further.

@bkerler
Copy link
Contributor

bkerler commented Aug 30, 2024

@foosel It's not about blaming Octoprint doing anything wrong here at all. It does make sense that in case a checksum fails, the transmission is being resent. It is more about resending an already failed message multiple times again (more than 10 times if I saw it correctly). But maybe this is intentional behaviour by Octoprint, but imho, this clearly should be limited to be less than 5 retries, especially if lines are numbered and checksumed. And I wouldn't expect that a M112 is being send just because a sent message obviously failed and pausing/aborting a print would be sufficient. I didn't dig deep enough in the octoprint repo to see if that is default by design, so sorry for that, if I missed anything obvious.

It is pretty clear that the Prusa printer firmware has to send an error message indicating that the gcode line is too long, that's why I made a PR in order to fix that instead of claiming the checksum is wrong although it was right. As @danopernis already said, there is no vulnerability at all, which I can confirm as I tried to use fuzzing and source debugging to find the root cause of this issue. The printer firmware as of now simply ignores the message and throws a checksum failed error.

Copy link

Thank you for your contribution to our project. This issue has not received any updates for 60 days and may be considered "stale." If this issue is still important to you, please add an update within the next 7 days to keep it open. Administrators can manually reopen the issue if necessary.

Copy link

github-actions bot commented Nov 7, 2024

This issue has been closed due to lack of recent activity. Please consider opening a new one if needed.

@github-actions github-actions bot closed this as not planned Won't fix, can't repro, duplicate, stale Nov 7, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working. stale-issue third-party related to non-genuine-Prusa design, SW, or HW: remote control, extruder mod, different hotends...
Projects
None yet
Development

No branches or pull requests

5 participants