Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Handle lines longer than 1000 bytes: error "bad mail input format" #18

Open
blueyed opened this issue Jan 11, 2014 · 49 comments
Open

Handle lines longer than 1000 bytes: error "bad mail input format" #18

blueyed opened this issue Jan 11, 2014 · 49 comments

Comments

@blueyed
Copy link
Contributor

blueyed commented Jan 11, 2014

Currently dma chokes on mail input which contains lines longer than 1000 bytes, because of the buffer being used.
When it does not find a newline at the end, it adds one, but then assumes/requires this to be the last line.

The error is "bad mail input format".

This might happen when using dma to handle web application errors during development / local setup, and the included debug information contains huge environment etc settings (e.g. "LS_COLORS").

I think that it would be saner to not require RFC2822 compliance, but accept the mail nonetheless.

The code in question: https://github.com/corecode/dma/blob/master/mail.c#L376

@corecode
Copy link
Owner

Do you have a suggestion how to deal with longer lines? Something that
does not complicate the code?

On 01/11/2014 07:00 AM, Daniel Hahler wrote:

Currently dma chokes on mail input which contains lines longer than
1000 bytes, because of the buffer being used.
When it does not find a newline at the end, it adds one, but then
assumes/requires this to be the last line.

The error is "bad mail input format".

This might happen when using dma to handle web application errors
during development / local setup, and the included debug information
contains huge environment etc settings (e.g. "LS_COLORS").

I think that it would be saner to not require RFC2822 compliance, but
accept the mail nonetheless.

The code in question:
https://github.com/corecode/dma/blob/master/mail.c#L376


Reply to this email directly or view it on GitHub
#18.

@blueyed
Copy link
Contributor Author

blueyed commented Jan 11, 2014

The easiest approach would probably be to increase the buffer length, from 1000 bytes to maybe 64kB (65536).
It would/could then still fail in the same way, but it's much more unlikely.

Or, just do not add a newline (and effectively aborting / cutting the input) and keep reading in 1000 byte chunks.

@corecode
Copy link
Owner

The RFC is absolutely clear on this issue:

Each line of characters MUST be no more than 998 characters

and
However, there are so many implementations that (in compliance with
the transport requirements of [RFC5321
http://tools.ietf.org/search/rfc5321]) do not accept messages
containing more than 1000 characters including the CR and LF per line,
it is important for implementations not to create such messages.

It is clear that we can not forward any message with more than 1000
characters per line. What do you suggest as alternative? Insert a
linebreak after 998 characters?

On 01/11/2014 11:15 PM, Daniel Hahler wrote:

The easiest approach would probably be to increase the buffer length,
from 1000 bytes to maybe 64kB (65536).
It would/could then still fail in the same way, but it's much more
unlikely.

Or, just do not add a newline (and effectively aborting / cutting the
input) and keep reading in 1000 byte chunks.


Reply to this email directly or view it on GitHub
#18 (comment).

@blueyed
Copy link
Contributor Author

blueyed commented Jan 12, 2014

Well, I think that dma is not creating the message, but only transports it.

The bug is really with the app that hands it over to dma.

Inserting newlines can be very problematic, if the message is meant to contain (encoded) data, and the result may fail to parse afterwards.

Imagine an automated message, with "key=value" pairs on separate lines, where it may result in "key=verylongvalue\r\nstillvalue".

However, breaking lines after 998 bytes is still far better than rejecting the mail altogether.

I can imagine having a configuration option (in dma.conf) for this: RFC2822_SPLITLINES.
If set, CRLF gets inserted after 998 bytes, otherwise the lines would get bypassed unchanged.

@corecode
Copy link
Owner

It is not optional to forward messages with more than 998 characters per
line: it is forbidden. Do you know how other MTAs handle this? I only
see two options: break line, or reject (bounce).

On 01/12/2014 06:01 PM, Daniel Hahler wrote:

Well, I think that dma is not creating the message, but only
transports it.

The bug is really with the app that hands it over to dma.

Inserting newlines can be very problematic, if the message is meant to
contain (encoded) data, and the result may fail to parse afterwards.

Imagine an automated message, with "key=value" pairs on separate
lines, where it may result in "key=verylongvalue\r\nstillvalue".

However, breaking lines after 998 bytes is still far better than
rejecting the mail altogether.

I can imagine having a configuration option (in dma.conf) for this:
RFC2822_SPLITLINES.
If set, CRLF gets inserted after 998 bytes, otherwise the lines would
get bypassed unchanged.


Reply to this email directly or view it on GitHub
#18 (comment).

@blueyed
Copy link
Contributor Author

blueyed commented Jan 12, 2014

Then breaking the line is better than rejecting the mail.

Postfix has a line_length_limit option, with a default of 2048 (http://www.postfix.org/postconf.5.html):

Upon input, long lines are chopped up into pieces of at most this length; upon delivery, long lines are reconstructed.

It sounds like this will effectively pass through long lines?!

@bcoca
Copy link

bcoca commented Mar 26, 2014

@blueyed, IIRC that limit is for internal processing across Postfix apps and filters, for actual smtp this is the setting:
http://www.postfix.org/postconf.5.html#smtp_line_length_limit

@corecode
Copy link
Owner

Good catch. Any suggestions how we should handle this?

On 03/26/2014 08:24 PM, Brian Coca wrote:

@blueyed https://github.com/blueyed, IIRC that limit is for internal
processing across Postfix apps and filters, for actual smtp this is
the setting:
http://www.postfix.org/postconf.5.html#smtp_line_length_limit


Reply to this email directly or view it on GitHub
#18 (comment).

@bcoca
Copy link

bcoca commented Mar 26, 2014

I think rejection is OK, but maybe add more info to the message, 'bad mail input format' (I just ran into this) is a bit too succinct, knowing that its a line length error helps pinpoint the issue much faster. Also, quoting RFC helps convince programmers change their code vs demand that the MTA just accept their emails.

@corecode
Copy link
Owner

alternatively we could force a linebreak when accepting the mail, like
postfix does when sending?

On 03/26/2014 08:47 PM, Brian Coca wrote:

I think rejection is OK, but maybe add more info the the message, 'bad
mail input format' (I just ran into this) is a bit too succinct,
knowing that its a line length error helps pinpoint the issue much
faster. Also, quoting RFC helps convince programmers change their code
vs demand that the MTA just accept their emails.


Reply to this email directly or view it on GitHub
#18 (comment).

@blueyed
Copy link
Contributor Author

blueyed commented Mar 27, 2014

The Postfix approach seems sensible.
To me everything is better than rejecting the mail.

quoting RFC helps convince programmers change their code

I agree. There could be a warning instead of the current error.

Given the use case of dma (for personal or lightweight use in containers) I do not think it should be too strict (i.e. rejecting the mail).

@bcoca
Copy link

bcoca commented Mar 27, 2014

I disagree, I use dma because it it simple and lightweight and doesn't try to do complex stuff like line splitting with character or armor encodings. If it accepts mail and it then gets rejected by another mailer, it creates more issues than it solves.

I am a fan of fail early and fail loudly, if you start accepting emails and push the problem downstream you never have an incentive to fix the actual issue, the program composing non compliant smtp messages.

@emaste
Copy link
Collaborator

emaste commented Mar 24, 2016

This is one issue that prevented the FreeBSD clusteradm team switching to DMA internally: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=208261

@bcoca
Copy link

bcoca commented Mar 24, 2016

in postfix you CAN disable the validation, does not mean it doesn't do it, postfix has many options to 'tolerate' other broken implementations

@corecode
Copy link
Owner

@emaste what do you suggest dma should do? Just break lines?

@emaste
Copy link
Collaborator

emaste commented Jun 17, 2016

@corecode I want to see the FreeBSD commit mail generator (I think it was the problem) fixed to not generate lines > 998 chars, and I think I agree that by default dma shouldn't accept such mail.

What do you think about having a config file option to set a (non-compliant) maximum line length, and/or one to enable line splitting?

@corecode
Copy link
Owner

I'm fine with not rejecting mails by default - I just don't know what we should do. Split the line? Cut it? Pass it and be non-compliant?

@emaste
Copy link
Collaborator

emaste commented Jun 17, 2016

I don't like just cutting it (silently losing information in an otherwise passed email). Given the choice of passing non-compliant lines and splitting I'd choose the latter.

That leaves us with the question of doing it by default or only under a config file option. If I'm doing the work I wouldn't bother with the config option, but wouldn't object to one to appeal to the "fail early and fail loudly" by default mentioned by @bcoca.

@corecode
Copy link
Owner

I don't think we can easily add to the headers during reading. I'm fine with making the default to split lines. We'll need a strategy for splitting possibly overlong header lines tho, so that they stay being headers. Can we just split words in the middle, or do we have to split on word boundaries?

@emaste
Copy link
Collaborator

emaste commented Jun 17, 2016

For me I consider the line splitting to be a "best effort" to deliver malformated mail, and I'd be happy enough splitting at whatever point the we reach the line length limit. I'd also be fine rejecting overlong headers and splitting only the body, if that's easier.

@ttw
Copy link

ttw commented Jul 29, 2016

I believe the fundamental problem is context; currently there is none. Since 'dma' is, by design, a local MTA (i.e. non-listening) we should readily be able to distinguish from mail we are generating (i.e. out-going mail) and mail we are consuming (i.e. incoming for local delivery); since we are not relaying. I believe the existing behaviour is correct for out-going mail. For incoming mail we should be tolerant and pass the problem to the mail client. I do not believe that mangling email is a valid option as upstream mail signatures will then be incorrect.

To clarify, my own use case is using 'dma' as an out-going MTA and an incoming MDA (through fetchmail).

... had quick look at the code in relation to the above ... I think the initial proposal is correct in that we need to increase the 'readmail' buffer to a maximum (say 64k) and then add a check which will flag the mail as malformed if any line is over 1000 bytes (possibly through a generic 'int qitem.valid' which can map to various validity checks). We can then defer the actual decision until we actually 'deliver' the mail.

PS: you'll probably need a '#define RFC822_MAX_LINE 1000' and a '#define DMA_MAX_LINE' as is see those 'char[1000]' limits in a couple more places.
PPS: I can create a patch if the proposal is acceptable.

@emaste
Copy link
Collaborator

emaste commented Jul 29, 2016

It's the outgoing mail side of this issue that affected the FreeBSD commit mail generator. It is true that the script should not generate long lines, but it's handled by the current MTA.

@ttw
Copy link

ttw commented Jul 29, 2016

@emaste : I can only comment personally to that; firstly, I agree, "the script should not generate long lines". Secondly, we should not emit invalid output (i.e. lines over the limit). Thirdly, splitting content lines presumes a great deal about the content and, whilst it may not impact the FreeBSD output it may malform other content; reducing that problem is a sisyphean task.

In short (and IMHO), FreeBSD is correct it's assessment that 'dma' is an unsuitable mailer for its toolchain (because it produces illegal output).

@ttw
Copy link

ttw commented Jul 30, 2016

Pull request is my best understanding of sane behaviour. It was actually a little simpler than I imagined since we queue the mail anyway; by defining different buffer lengths and only using the RFC length for remote delivery we can bump the error.

That said, upon reviewing the RFC I'm going to have to disagree with @corecode; the RFC does not prohibit lines longer than 1000 characters. IMHO it prohibits the production of messages which have lines longer than 1000 characters (e.g. the FreeBSD toolchain example we're discussing), however, it actually recommends that an MTA "handle an arbitrarily large number of characters". As such, there is probably a bit more work to do on the problem.

PS: I also forgot to patch the line termination so it actually has a valid CRLF, rather than an invalid LF terminator (semantic issue I know but I'll try and do that anyway).

bapt added a commit to bapt/dma that referenced this issue Oct 21, 2016
For mails where the body does not respect RFC2822, split at the
arbitrary length of 990

This should address corecode#18
bapt added a commit to bapt/dma that referenced this issue Oct 21, 2016
For mails where the body does not respect RFC2822, split at the
arbitrary length of 990

This should address corecode#18
bapt pushed a commit to bapt/dma that referenced this issue Dec 12, 2017
For mails where the body does not respect RFC2822, try to split by words
finding the last space before 1000's character

If no spaces are found then consider the mail to be malformed anyway

This should address corecode#18
@Jamie-Landeg-Jones
Copy link

Jamie-Landeg-Jones commented Feb 1, 2022

I don't like the idea of message content being altered - at least, not without warning.

Of course, allowing lines over 1000 characters might cause a message to be corrupted somewhere else down the line.

Surely the clean solution would be to convert to quoted-printable? though this may be a pedantic overkill solution, especially for locally delivered mail.

@corecode
Copy link
Owner

corecode commented Feb 1, 2022

That would change the whole whole content, I don't think we should take that liberty.

@Jamie-Landeg-Jones
Copy link

Jamie-Landeg-Jones commented Feb 3, 2022

Fair enough.

I know I've not been involved with this discussion or development, so I'm not arrogant enough to expect my opinion to matter, but for what it's worth, if I was doing this, I'd be uncomfortable in breaking the spec (as you also are), but I'd also be uncomfortable altering the body in (what to those outside DMA) is an arbitrary manner.

Other applications that do that have caused unexpected (and sometimes unnoticed) errors, as exampled here:

https://stackoverflow.com/questions/9999908/php-mail-function-randomly-adds-a-space-to-message-text

https://stackoverflow.com/questions/10401863/how-to-send-a-csv-attachment-with-lines-longer-than-990-characters

Just my 1 cents worth

Cheers, Jamie

@corecode
Copy link
Owner

corecode commented Feb 3, 2022

How is DMA breaking the spec?

@emaste
Copy link
Collaborator

emaste commented Feb 3, 2022

@bapt added a patch to the FreeBSD base system for this, in freebsd/freebsd-src@b0b2d05 (and a bugfix from me in freebsd/freebsd-src@1a0dde3). This is slightly different from the proposal in #49, in that it does not attempt to split on whitespace. It does not do any splitting on headers.

IMO there are a few improvements that could be made:

  • split headers as well
  • split at whitespace if possible
  • make long line splitting a configurable option (reject mail with long lines, if not enabled)

@corecode
Copy link
Owner

corecode commented Feb 3, 2022

@emaste I think an error in getline will get undetected and return success. Otherwise looks like an acceptable way to address such mails.

@Jamie-Landeg-Jones
Copy link

How is DMA breaking the spec?

It's not. There were suggestions earlier in this thread saying you should just ignore the 1000 character limit. You disagreed, and I agreed with you.

@emaste
Copy link
Collaborator

emaste commented Feb 3, 2022

@corecode getline returns -1 on error or EOF, I think the feof/getline loop is non-canonical but I think checking getline's return for >0 is the appropriate condition.

I think this should go on top of @bapt's change:

diff --git a/contrib/dma/mail.c b/contrib/dma/mail.c
index 9e00c22d71db..0462ec1665ee 100644
--- a/contrib/dma/mail.c
+++ b/contrib/dma/mail.c
@@ -405,10 +405,7 @@ readmail(struct queue *queue, int nodot, int recp_from_header)
        if ((ssize_t)error < 0)
                return (-1);
 
-       while (!feof(stdin)) {
-               newline[0] = '\0';
-               if ((linelen = getline(&line, &linecap, stdin)) <= 0)
-                       break;
+       while ((linelen = getline(&line, &linecap, stdin)) > 0)
                if (had_last_line)
                        errlogx(EX_DATAERR, "bad mail input format:"
                                " from %s (uid %d) (envelope-from %s)",

@emaste
Copy link
Collaborator

emaste commented Feb 3, 2022

@corecode ah, prior to bapt's change we had:

		if (fgets(line, sizeof(line) - 1, stdin) == NULL)
			break;

so same amount of error checking.

I think we can just check errno after the loop,

diff --git a/contrib/dma/mail.c b/contrib/dma/mail.c
index 0462ec1665ee..e052d745398c 100644
--- a/contrib/dma/mail.c
+++ b/contrib/dma/mail.c
@@ -507,8 +507,8 @@ readmail(struct queue *queue, int nodot, int recp_from_header)
                        }
                }
        }
-
-       ret = 0;
+       if (errno == 0)
+               ret = 0;
 fail:
        free(line);
        return (ret);

@corecode
Copy link
Owner

corecode commented Feb 3, 2022

I think we can keep the feof() loop condition, but just do something about error return from getline.

@emaste
Copy link
Collaborator

emaste commented Feb 3, 2022

I think it's fine to use getline()'s -1 return

I have a FreeBSD review in https://reviews.freebsd.org/D34159

@ant5
Copy link

ant5 commented Jun 21, 2023

Hello!

After switching to dma for local delivery and few monthes of things going well I've got an error "bad mail input format" with one email message.

Does I understand right that there is a patch of cure and this patch is only in FreeBSD 13+ source tree?

How can I cure this?
Does I need to extract somehow this patch(es) from FreeBSD source and create custom port for mail/dma then add those patches to the custom port and then make custom package?

Can someone guide me to eliminate future accidents?

P.S. My system is FreeBSD 12-STABLE. Dma is mostly invoked by sieve scripts for coping/forwarding.

@corecode
Copy link
Owner

Are you submitting an incorrectly formatted message? The correct fix would be to only submit correctly formatted messages.

@ant5
Copy link

ant5 commented Jun 21, 2023

One user (who is in company management) sended email to other users. Email message seems very casual.

But then he got email back:
I'm sorry to have to inform you that your message could not be delivered to one or more recipients. ... <[X@Z](mailto:X@Z)>: host 127.0.6.1[127.0.6.1] said: 451 4.2.0 <[X@Z](mailto:X@Z)> Execution of Sieve filters was aborted due to temporary failure (in reply to end of DATA command) ...
for those users who have configured copying/redirecting via sieve scripts.

The log say:
Jun 20 21:49:37 mail dma[e98a03][8169]: bad mail input format: from pop (uid 68) (envelope-from X@Z)

@corecode
Copy link
Owner

I would need to see the message to understand what happened. With the log messages it is not clear what happened. How are pop, sieve, dma, and the user related?

@ant5
Copy link

ant5 commented Jun 21, 2023

Technicaly speaking, user work on workstation with some email client. Email client sent message to postfix via SMTP. Postfix relay message to dovecot via LMTP. Dovecote execute sieve script. Sieve script invoke mailwrapper (sendmail). Mailwrapper execute dma.

I have this message extracted from bounce email but I don't know is it in virgin form and will it raise an error.

@corecode
Copy link
Owner

That is a complicated setup. Why are you using dma and not postfix for dovecot for delivery? Without the original mail delivered to dma we don't know what is going on - it seems that it passed through several MTAs, so the assumption should be that the mail was formatted correctly. This pretty much rules out a 1000+ byte header line.

@ant5
Copy link

ant5 commented Jun 21, 2023

Why are you using dma and not postfix for dovecot for delivery?

They all in separate jails. Dma in dovecot jail will relay to postfix in other jail (host).

@ant5
Copy link

ant5 commented Jun 21, 2023

I'll try to invoke dma and feed it problematic message...
But it's not so simple for me.

@ant5
Copy link

ant5 commented Jun 21, 2023

Yes. The message gives an error.

Zipped and attached:
< deleted confidentional data >

@corecode
Copy link
Owner

Thank you. The problem is that one line is exactly 1000 bytes long including the \r\n.

@haarp
Copy link

haarp commented Jan 4, 2024

Wouldn't a config option setting the max buffer size/line length be the easiest solution here? Yes, it would break the spec, but it would be up to the admin to decide that. And for local-only delivery, there are no other servers further down the line that might choke on it.

Splitting is also a decent solution, but not always desireable. Some content, like code or logs shouldn't be split, to retain proper syntax.

Relegating this to the mail producer is also not always an option. They can be unpredictable. Maybe there was an error during a cronjob and it prints an unpredictably long line. The admin would get no mail, and very likely not realize.

@corecode
Copy link
Owner

corecode commented Jan 6, 2024

we definitely don't want to break the spec. if there is a line >1000 chars, we either have to split it or reject it outright. we can't pass on an invalid message.

@kgcgs
Copy link

kgcgs commented Sep 11, 2024

The only correct thing to do here is mimic the behavior of the fully featured MTAs. Sendmail has always split long lines in order to enforce compliance in messages that it creates but happily passes messages with long lines through SMTP without molesting them. Failure to address this in dma is just going to lead to messages being lost unexpectedly which is how most of us probably got here in the first place. Seems like an easy fix? (Even if it's a part of the RFC that's likely completely unnecessary to respect.)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

9 participants