git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: "Đoàn Trần Công Danh" <congdanhqx@gmail.com>
To: git@vger.kernel.org
Cc: "Đoàn Trần Công Danh" <congdanhqx@gmail.com>
Subject: [PATCH] mailinfo: strip CR from base64/quoted-printable email
Date: Wed, 21 Apr 2021 08:34:04 +0700	[thread overview]
Message-ID: <20210421013404.17383-1-congdanhqx@gmail.com> (raw)

When an SMTP server receives an 8-bit email message, possibly with only
LF as line ending, some of those servers decide to change said LF to
CRLF.

Some other SMTP servers, when receives an 8-bit email message, decide to
encoding such message in base64 and/or quoted-printable instead.

If an email is transfered through those 2 email servers in order, the
final recipients will receive an email contains a patch mungled with
CRLF encoded inside another encoding. Thus, such CR couldn't be dropped
by mailsplit. Such accidents have been observed in the wild [1].

Let's guess if such CR was added automatically and strip them in
mailinfo.

[1]: https://nmbug.notmuchmail.org/nmweb/show/m2lf9ejegj.fsf%40guru.guru-group.fi

Signed-off-by: Đoàn Trần Công Danh <congdanhqx@gmail.com>
---

 I'm not sure if guessing the heuristic to strip CR is a good approach.
 I think it's better to pass --keep-cr down from git-am.
 Let's say --keep-cr=<yes|no|auto>

 mailinfo.c             | 20 +++++++++++++++++---
 t/t5100-mailinfo.sh    |  5 +++++
 t/t5100/cr-base64.mbox | 22 ++++++++++++++++++++++
 t/t5100/info1000       |  5 +++++
 t/t5100/msg1000        |  2 ++
 t/t5100/patch1000      | 22 ++++++++++++++++++++++
 6 files changed, 73 insertions(+), 3 deletions(-)
 create mode 100644 t/t5100/cr-base64.mbox
 create mode 100644 t/t5100/info1000
 create mode 100644 t/t5100/msg1000
 create mode 100644 t/t5100/patch1000

diff --git a/mailinfo.c b/mailinfo.c
index 5681d9130d..dbff867f42 100644
--- a/mailinfo.c
+++ b/mailinfo.c
@@ -988,16 +988,27 @@ static int handle_boundary(struct mailinfo *mi, struct strbuf *line)
 }
 
 static void handle_filter_flowed(struct mailinfo *mi, struct strbuf *line,
-				 struct strbuf *prev)
+				 struct strbuf *prev, int *keep_cr)
 {
 	size_t len = line->len;
 	const char *rest;
 
 	if (!mi->format_flowed) {
+		if (*keep_cr == -1 && len >= 2)
+			*keep_cr = !(line->buf[len - 2] == '\r' &&
+				     line->buf[len - 1] == '\n');
+		if (!*keep_cr && len >= 2 &&
+		    line->buf[len - 2] == '\r' &&
+		    line->buf[len - 1] == '\n') {
+			strbuf_setlen(line, len - 2);
+			strbuf_addch(line, '\n');
+			len--;
+		}
 		handle_filter(mi, line);
 		return;
 	}
 
+	*keep_cr = 1;
 	if (line->buf[len - 1] == '\n') {
 		len--;
 		if (len && line->buf[len - 1] == '\r')
@@ -1036,6 +1047,7 @@ static void handle_filter_flowed(struct mailinfo *mi, struct strbuf *line,
 static void handle_body(struct mailinfo *mi, struct strbuf *line)
 {
 	struct strbuf prev = STRBUF_INIT;
+	int keep_cr = -1;
 
 	/* Skip up to the first boundary */
 	if (*(mi->content_top)) {
@@ -1081,7 +1093,7 @@ static void handle_body(struct mailinfo *mi, struct strbuf *line)
 						strbuf_addbuf(&prev, sb);
 						break;
 					}
-				handle_filter_flowed(mi, sb, &prev);
+				handle_filter_flowed(mi, sb, &prev, &keep_cr);
 			}
 			/*
 			 * The partial chunk is saved in "prev" and will be
@@ -1091,7 +1103,9 @@ static void handle_body(struct mailinfo *mi, struct strbuf *line)
 			break;
 		}
 		default:
-			handle_filter_flowed(mi, line, &prev);
+			/* CR in plain message was processed in mailsplit */
+			keep_cr = 1;
+			handle_filter_flowed(mi, line, &prev, &keep_cr);
 		}
 
 		if (mi->input_error)
diff --git a/t/t5100-mailinfo.sh b/t/t5100-mailinfo.sh
index 147e616533..9ccc11d16a 100755
--- a/t/t5100-mailinfo.sh
+++ b/t/t5100-mailinfo.sh
@@ -228,4 +228,9 @@ test_expect_success 'mailinfo handles unusual header whitespace' '
 	test_cmp expect actual
 '
 
+test_expect_success 'mailinfo strip CR after decode base64' '
+	cp $DATA/cr-base64.mbox 1000 &&
+	check_mailinfo 1000 ""
+'
+
 test_done
diff --git a/t/t5100/cr-base64.mbox b/t/t5100/cr-base64.mbox
new file mode 100644
index 0000000000..6ea9806a6b
--- /dev/null
+++ b/t/t5100/cr-base64.mbox
@@ -0,0 +1,22 @@
+From: A U Thor <mail@example.com>
+To: list@example.org
+Subject: [PATCH v2] sample
+Date: Mon,  3 Aug 2020 22:40:55 +0700
+Message-Id: <msg-id@example.com>
+Content-Type: text/plain; charset="utf-8"
+Content-Transfer-Encoding: base64
+
+T24gZGlmZmVyZW50IGRpc3RybywgcHl0ZXN0IGlzIHN1ZmZpeGVkIHdpdGggZGlmZmVyZW50IHBh
+dHRlcm5zLg0KDQotLS0NCiBjb25maWd1cmUgfCAyICstDQogMSBmaWxlIGNoYW5nZWQsIDEgaW5z
+ZXJ0aW9uKCspLCAxIGRlbGV0aW9uKC0pDQoNCmRpZmYgLS1naXQgYS9jb25maWd1cmUgYi9jb25m
+aWd1cmUNCmluZGV4IGRiMzUzOGIzLi5mN2MxYzA5NSAxMDA3NTUNCi0tLSBhL2NvbmZpZ3VyZQ0K
+KysrIGIvY29uZmlndXJlDQpAQCAtODE0LDcgKzgxNCw3IEBAIGlmIFsgJGhhdmVfcHl0aG9uMyAt
+ZXEgMSBdOyB0aGVuDQogICAgIHByaW50ZiAiQ2hlY2tpbmcgZm9yIHB5dGhvbjMgcHl0ZXN0ICg+
+PSAzLjApLi4uICINCiAgICAgY29uZj0kKG1rdGVtcCkNCiAgICAgcHJpbnRmICJbcHl0ZXN0XVxu
+bWludmVyc2lvbj0zLjBcbiIgPiAkY29uZg0KLSAgICBpZiBweXRlc3QtMyAtYyAkY29uZiAtLXZl
+cnNpb24gPi9kZXYvbnVsbCAyPiYxOyB0aGVuDQorICAgIGlmICIkcHl0aG9uIiAtbSBweXRlc3Qg
+LWMgJGNvbmYgLS12ZXJzaW9uID4vZGV2L251bGwgMj4mMTsgdGhlbg0KICAgICAgICAgcHJpbnRm
+ICJZZXMuXG4iDQogICAgICAgICBoYXZlX3B5dGhvbjNfcHl0ZXN0PTENCiAgICAgZWxzZQ0KLS0g
+DQoyLjI4LjANCl9fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19f
+CmV4YW1wbGUgbWFpbGluZyBsaXN0IC0tIGxpc3RAZXhhbXBsZS5vcmcKVG8gdW5zdWJzY3JpYmUg
+c2VuZCBhbiBlbWFpbCB0byBsaXN0LWxlYXZlQGV4YW1wbGUub3JnCg==
diff --git a/t/t5100/info1000 b/t/t5100/info1000
new file mode 100644
index 0000000000..dab2228b70
--- /dev/null
+++ b/t/t5100/info1000
@@ -0,0 +1,5 @@
+Author: A U Thor
+Email: mail@example.com
+Subject: sample
+Date: Mon, 3 Aug 2020 22:40:55 +0700
+
diff --git a/t/t5100/msg1000 b/t/t5100/msg1000
new file mode 100644
index 0000000000..5e8e860aae
--- /dev/null
+++ b/t/t5100/msg1000
@@ -0,0 +1,2 @@
+On different distro, pytest is suffixed with different patterns.
+
diff --git a/t/t5100/patch1000 b/t/t5100/patch1000
new file mode 100644
index 0000000000..51c4fb4cb5
--- /dev/null
+++ b/t/t5100/patch1000
@@ -0,0 +1,22 @@
+---
+ configure | 2 +-
+ 1 file changed, 1 insertion(+), 1 deletion(-)
+
+diff --git a/configure b/configure
+index db3538b3..f7c1c095 100755
+--- a/configure
++++ b/configure
+@@ -814,7 +814,7 @@ if [ $have_python3 -eq 1 ]; then
+     printf "Checking for python3 pytest (>= 3.0)... "
+     conf=$(mktemp)
+     printf "[pytest]\nminversion=3.0\n" > $conf
+-    if pytest-3 -c $conf --version >/dev/null 2>&1; then
++    if "$python" -m pytest -c $conf --version >/dev/null 2>&1; then
+         printf "Yes.\n"
+         have_python3_pytest=1
+     else
+-- 
+2.28.0
+_______________________________________________
+example mailing list -- list@example.org
+To unsubscribe send an email to list-leave@example.org
-- 
2.31.1.192.g0881477623


             reply	other threads:[~2021-04-21  1:34 UTC|newest]

Thread overview: 35+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-04-21  1:34 Đoàn Trần Công Danh [this message]
2021-04-21  2:09 ` [PATCH] mailinfo: strip CR from base64/quoted-printable email Junio C Hamano
2021-04-21  3:32 ` brian m. carlson
2021-04-21 12:07   ` Đoàn Trần Công Danh
2021-04-22  1:10     ` brian m. carlson
2021-05-04 17:19 ` [PATCH v2 0/5] Teach am/mailinfo to process quoted CR Đoàn Trần Công Danh
2021-05-04 17:19   ` [PATCH v2 1/5] mailinfo: avoid magic number in option parsing Đoàn Trần Công Danh
2021-05-04 17:19   ` [PATCH v2 2/5] mailinfo: warn if CR found in base64/quoted-printable email Đoàn Trần Công Danh
2021-05-05  3:41     ` Junio C Hamano
2021-05-04 17:20   ` [PATCH v2 3/5] mailinfo: skip quoted CR on user's wish Đoàn Trần Công Danh
2021-05-05  4:12     ` Junio C Hamano
2021-05-05 15:53       ` Đoàn Trần Công Danh
2021-05-04 17:20   ` [PATCH v2 4/5] mailinfo: strip quoted CR on users' wish Đoàn Trần Công Danh
2021-05-05  4:27     ` Junio C Hamano
2021-05-04 17:20   ` [PATCH v2 5/5] am: learn to process quoted lines that ends with CRLF Đoàn Trần Công Danh
2021-05-05  4:31   ` [PATCH v2 0/5] Teach am/mailinfo to process quoted CR Junio C Hamano
2021-05-06 15:02 ` [PATCH v3 0/6] " Đoàn Trần Công Danh
2021-05-06 15:02   ` [PATCH v3 1/6] mailinfo: load default metainfo_charset lazily Đoàn Trần Công Danh
2021-05-06 15:02   ` [PATCH v3 2/6] mailinfo: stop parsing options manually Đoàn Trần Công Danh
2021-05-08 10:44     ` Junio C Hamano
2021-05-06 15:02   ` [PATCH v3 3/6] mailinfo: warn if CR found in decoded base64/QP email Đoàn Trần Công Danh
2021-05-08 10:52     ` Junio C Hamano
2021-05-06 15:02   ` [PATCH v3 4/6] mailinfo: allow squelching quoted CR warning Đoàn Trần Công Danh
2021-05-06 15:02   ` [PATCH v3 5/6] mailinfo: allow stripping quoted CR without warning Đoàn Trần Công Danh
2021-05-06 15:02   ` [PATCH v3 6/6] am: learn to process quoted lines that ends with CRLF Đoàn Trần Công Danh
2021-05-08 10:57   ` [PATCH v3 0/6] Teach am/mailinfo to process quoted CR Junio C Hamano
     [not found] ` <cover.1620309355.git.congdanhqx@gmail.com>
2021-05-06 15:02   ` [PATCH v3 2/6] mailinfo: stop parse options manually Đoàn Trần Công Danh
2021-05-06 15:19     ` Đoàn Trần Công Danh
2021-05-09 17:12 ` [PATCH v4 0/6] Teach am/mailinfo to process quoted CR Đoàn Trần Công Danh
2021-05-09 17:12   ` [PATCH v4 1/6] mailinfo: load default metainfo_charset lazily Đoàn Trần Công Danh
2021-05-09 17:12   ` [PATCH v4 2/6] mailinfo: stop parsing options manually Đoàn Trần Công Danh
2021-05-09 17:12   ` [PATCH v4 3/6] mailinfo: warn if CRLF found in decoded base64/QP email Đoàn Trần Công Danh
2021-05-09 17:12   ` [PATCH v4 4/6] mailinfo: allow squelching quoted CRLF warning Đoàn Trần Công Danh
2021-05-09 17:12   ` [PATCH v4 5/6] mailinfo: allow stripping quoted CR without warning Đoàn Trần Công Danh
2021-05-09 17:12   ` [PATCH v4 6/6] am: learn to process quoted lines that ends with CRLF Đoàn Trần Công Danh

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20210421013404.17383-1-congdanhqx@gmail.com \
    --to=congdanhqx@gmail.com \
    --cc=git@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).