All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Đoàn Trần Công Danh" <congdanhqx@gmail.com>
To: git@vger.kernel.org
Cc: "Đoàn Trần Công Danh" <congdanhqx@gmail.com>
Subject: [PATCH] mailinfo: strip CR from base64/quoted-printable email
Date: Wed, 21 Apr 2021 08:34:04 +0700	[thread overview]
Message-ID: <20210421013404.17383-1-congdanhqx@gmail.com> (raw)

When an SMTP server receives an 8-bit email message, possibly with only
LF as line ending, some of those servers decide to change said LF to
CRLF.

Some other SMTP servers, when receives an 8-bit email message, decide to
encoding such message in base64 and/or quoted-printable instead.

If an email is transfered through those 2 email servers in order, the
final recipients will receive an email contains a patch mungled with
CRLF encoded inside another encoding. Thus, such CR couldn't be dropped
by mailsplit. Such accidents have been observed in the wild [1].

Let's guess if such CR was added automatically and strip them in
mailinfo.

[1]: https://nmbug.notmuchmail.org/nmweb/show/m2lf9ejegj.fsf%40guru.guru-group.fi

Signed-off-by: Đoàn Trần Công Danh <congdanhqx@gmail.com>
---

 I'm not sure if guessing the heuristic to strip CR is a good approach.
 I think it's better to pass --keep-cr down from git-am.
 Let's say --keep-cr=<yes|no|auto>

 mailinfo.c             | 20 +++++++++++++++++---
 t/t5100-mailinfo.sh    |  5 +++++
 t/t5100/cr-base64.mbox | 22 ++++++++++++++++++++++
 t/t5100/info1000       |  5 +++++
 t/t5100/msg1000        |  2 ++
 t/t5100/patch1000      | 22 ++++++++++++++++++++++
 6 files changed, 73 insertions(+), 3 deletions(-)
 create mode 100644 t/t5100/cr-base64.mbox
 create mode 100644 t/t5100/info1000
 create mode 100644 t/t5100/msg1000
 create mode 100644 t/t5100/patch1000

diff --git a/mailinfo.c b/mailinfo.c
index 5681d9130d..dbff867f42 100644
--- a/mailinfo.c
+++ b/mailinfo.c
@@ -988,16 +988,27 @@ static int handle_boundary(struct mailinfo *mi, struct strbuf *line)
 }
 
 static void handle_filter_flowed(struct mailinfo *mi, struct strbuf *line,
-				 struct strbuf *prev)
+				 struct strbuf *prev, int *keep_cr)
 {
 	size_t len = line->len;
 	const char *rest;
 
 	if (!mi->format_flowed) {
+		if (*keep_cr == -1 && len >= 2)
+			*keep_cr = !(line->buf[len - 2] == '\r' &&
+				     line->buf[len - 1] == '\n');
+		if (!*keep_cr && len >= 2 &&
+		    line->buf[len - 2] == '\r' &&
+		    line->buf[len - 1] == '\n') {
+			strbuf_setlen(line, len - 2);
+			strbuf_addch(line, '\n');
+			len--;
+		}
 		handle_filter(mi, line);
 		return;
 	}
 
+	*keep_cr = 1;
 	if (line->buf[len - 1] == '\n') {
 		len--;
 		if (len && line->buf[len - 1] == '\r')
@@ -1036,6 +1047,7 @@ static void handle_filter_flowed(struct mailinfo *mi, struct strbuf *line,
 static void handle_body(struct mailinfo *mi, struct strbuf *line)
 {
 	struct strbuf prev = STRBUF_INIT;
+	int keep_cr = -1;
 
 	/* Skip up to the first boundary */
 	if (*(mi->content_top)) {
@@ -1081,7 +1093,7 @@ static void handle_body(struct mailinfo *mi, struct strbuf *line)
 						strbuf_addbuf(&prev, sb);
 						break;
 					}
-				handle_filter_flowed(mi, sb, &prev);
+				handle_filter_flowed(mi, sb, &prev, &keep_cr);
 			}
 			/*
 			 * The partial chunk is saved in "prev" and will be
@@ -1091,7 +1103,9 @@ static void handle_body(struct mailinfo *mi, struct strbuf *line)
 			break;
 		}
 		default:
-			handle_filter_flowed(mi, line, &prev);
+			/* CR in plain message was processed in mailsplit */
+			keep_cr = 1;
+			handle_filter_flowed(mi, line, &prev, &keep_cr);
 		}
 
 		if (mi->input_error)
diff --git a/t/t5100-mailinfo.sh b/t/t5100-mailinfo.sh
index 147e616533..9ccc11d16a 100755
--- a/t/t5100-mailinfo.sh
+++ b/t/t5100-mailinfo.sh
@@ -228,4 +228,9 @@ test_expect_success 'mailinfo handles unusual header whitespace' '
 	test_cmp expect actual
 '
 
+test_expect_success 'mailinfo strip CR after decode base64' '
+	cp $DATA/cr-base64.mbox 1000 &&
+	check_mailinfo 1000 ""
+'
+
 test_done
diff --git a/t/t5100/cr-base64.mbox b/t/t5100/cr-base64.mbox
new file mode 100644
index 0000000000..6ea9806a6b
--- /dev/null
+++ b/t/t5100/cr-base64.mbox
@@ -0,0 +1,22 @@
+From: A U Thor <mail@example.com>
+To: list@example.org
+Subject: [PATCH v2] sample
+Date: Mon,  3 Aug 2020 22:40:55 +0700
+Message-Id: <msg-id@example.com>
+Content-Type: text/plain; charset="utf-8"
+Content-Transfer-Encoding: base64
+
+T24gZGlmZmVyZW50IGRpc3RybywgcHl0ZXN0IGlzIHN1ZmZpeGVkIHdpdGggZGlmZmVyZW50IHBh
+dHRlcm5zLg0KDQotLS0NCiBjb25maWd1cmUgfCAyICstDQogMSBmaWxlIGNoYW5nZWQsIDEgaW5z
+ZXJ0aW9uKCspLCAxIGRlbGV0aW9uKC0pDQoNCmRpZmYgLS1naXQgYS9jb25maWd1cmUgYi9jb25m
+aWd1cmUNCmluZGV4IGRiMzUzOGIzLi5mN2MxYzA5NSAxMDA3NTUNCi0tLSBhL2NvbmZpZ3VyZQ0K
+KysrIGIvY29uZmlndXJlDQpAQCAtODE0LDcgKzgxNCw3IEBAIGlmIFsgJGhhdmVfcHl0aG9uMyAt
+ZXEgMSBdOyB0aGVuDQogICAgIHByaW50ZiAiQ2hlY2tpbmcgZm9yIHB5dGhvbjMgcHl0ZXN0ICg+
+PSAzLjApLi4uICINCiAgICAgY29uZj0kKG1rdGVtcCkNCiAgICAgcHJpbnRmICJbcHl0ZXN0XVxu
+bWludmVyc2lvbj0zLjBcbiIgPiAkY29uZg0KLSAgICBpZiBweXRlc3QtMyAtYyAkY29uZiAtLXZl
+cnNpb24gPi9kZXYvbnVsbCAyPiYxOyB0aGVuDQorICAgIGlmICIkcHl0aG9uIiAtbSBweXRlc3Qg
+LWMgJGNvbmYgLS12ZXJzaW9uID4vZGV2L251bGwgMj4mMTsgdGhlbg0KICAgICAgICAgcHJpbnRm
+ICJZZXMuXG4iDQogICAgICAgICBoYXZlX3B5dGhvbjNfcHl0ZXN0PTENCiAgICAgZWxzZQ0KLS0g
+DQoyLjI4LjANCl9fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19f
+CmV4YW1wbGUgbWFpbGluZyBsaXN0IC0tIGxpc3RAZXhhbXBsZS5vcmcKVG8gdW5zdWJzY3JpYmUg
+c2VuZCBhbiBlbWFpbCB0byBsaXN0LWxlYXZlQGV4YW1wbGUub3JnCg==
diff --git a/t/t5100/info1000 b/t/t5100/info1000
new file mode 100644
index 0000000000..dab2228b70
--- /dev/null
+++ b/t/t5100/info1000
@@ -0,0 +1,5 @@
+Author: A U Thor
+Email: mail@example.com
+Subject: sample
+Date: Mon, 3 Aug 2020 22:40:55 +0700
+
diff --git a/t/t5100/msg1000 b/t/t5100/msg1000
new file mode 100644
index 0000000000..5e8e860aae
--- /dev/null
+++ b/t/t5100/msg1000
@@ -0,0 +1,2 @@
+On different distro, pytest is suffixed with different patterns.
+
diff --git a/t/t5100/patch1000 b/t/t5100/patch1000
new file mode 100644
index 0000000000..51c4fb4cb5
--- /dev/null
+++ b/t/t5100/patch1000
@@ -0,0 +1,22 @@
+---
+ configure | 2 +-
+ 1 file changed, 1 insertion(+), 1 deletion(-)
+
+diff --git a/configure b/configure
+index db3538b3..f7c1c095 100755
+--- a/configure
++++ b/configure
+@@ -814,7 +814,7 @@ if [ $have_python3 -eq 1 ]; then
+     printf "Checking for python3 pytest (>= 3.0)... "
+     conf=$(mktemp)
+     printf "[pytest]\nminversion=3.0\n" > $conf
+-    if pytest-3 -c $conf --version >/dev/null 2>&1; then
++    if "$python" -m pytest -c $conf --version >/dev/null 2>&1; then
+         printf "Yes.\n"
+         have_python3_pytest=1
+     else
+-- 
+2.28.0
+_______________________________________________
+example mailing list -- list@example.org
+To unsubscribe send an email to list-leave@example.org
-- 
2.31.1.192.g0881477623


             reply	other threads:[~2021-04-21  1:34 UTC|newest]

Thread overview: 35+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-04-21  1:34 Đoàn Trần Công Danh [this message]
2021-04-21  2:09 ` [PATCH] mailinfo: strip CR from base64/quoted-printable email Junio C Hamano
2021-04-21  3:32 ` brian m. carlson
2021-04-21 12:07   ` Đoàn Trần Công Danh
2021-04-22  1:10     ` brian m. carlson
2021-05-04 17:19 ` [PATCH v2 0/5] Teach am/mailinfo to process quoted CR Đoàn Trần Công Danh
2021-05-04 17:19   ` [PATCH v2 1/5] mailinfo: avoid magic number in option parsing Đoàn Trần Công Danh
2021-05-04 17:19   ` [PATCH v2 2/5] mailinfo: warn if CR found in base64/quoted-printable email Đoàn Trần Công Danh
2021-05-05  3:41     ` Junio C Hamano
2021-05-04 17:20   ` [PATCH v2 3/5] mailinfo: skip quoted CR on user's wish Đoàn Trần Công Danh
2021-05-05  4:12     ` Junio C Hamano
2021-05-05 15:53       ` Đoàn Trần Công Danh
2021-05-04 17:20   ` [PATCH v2 4/5] mailinfo: strip quoted CR on users' wish Đoàn Trần Công Danh
2021-05-05  4:27     ` Junio C Hamano
2021-05-04 17:20   ` [PATCH v2 5/5] am: learn to process quoted lines that ends with CRLF Đoàn Trần Công Danh
2021-05-05  4:31   ` [PATCH v2 0/5] Teach am/mailinfo to process quoted CR Junio C Hamano
2021-05-06 15:02 ` [PATCH v3 0/6] " Đoàn Trần Công Danh
2021-05-06 15:02   ` [PATCH v3 1/6] mailinfo: load default metainfo_charset lazily Đoàn Trần Công Danh
2021-05-06 15:02   ` [PATCH v3 2/6] mailinfo: stop parsing options manually Đoàn Trần Công Danh
2021-05-08 10:44     ` Junio C Hamano
2021-05-06 15:02   ` [PATCH v3 3/6] mailinfo: warn if CR found in decoded base64/QP email Đoàn Trần Công Danh
2021-05-08 10:52     ` Junio C Hamano
2021-05-06 15:02   ` [PATCH v3 4/6] mailinfo: allow squelching quoted CR warning Đoàn Trần Công Danh
2021-05-06 15:02   ` [PATCH v3 5/6] mailinfo: allow stripping quoted CR without warning Đoàn Trần Công Danh
2021-05-06 15:02   ` [PATCH v3 6/6] am: learn to process quoted lines that ends with CRLF Đoàn Trần Công Danh
2021-05-08 10:57   ` [PATCH v3 0/6] Teach am/mailinfo to process quoted CR Junio C Hamano
     [not found] ` <cover.1620309355.git.congdanhqx@gmail.com>
2021-05-06 15:02   ` [PATCH v3 2/6] mailinfo: stop parse options manually Đoàn Trần Công Danh
2021-05-06 15:19     ` Đoàn Trần Công Danh
2021-05-09 17:12 ` [PATCH v4 0/6] Teach am/mailinfo to process quoted CR Đoàn Trần Công Danh
2021-05-09 17:12   ` [PATCH v4 1/6] mailinfo: load default metainfo_charset lazily Đoàn Trần Công Danh
2021-05-09 17:12   ` [PATCH v4 2/6] mailinfo: stop parsing options manually Đoàn Trần Công Danh
2021-05-09 17:12   ` [PATCH v4 3/6] mailinfo: warn if CRLF found in decoded base64/QP email Đoàn Trần Công Danh
2021-05-09 17:12   ` [PATCH v4 4/6] mailinfo: allow squelching quoted CRLF warning Đoàn Trần Công Danh
2021-05-09 17:12   ` [PATCH v4 5/6] mailinfo: allow stripping quoted CR without warning Đoàn Trần Công Danh
2021-05-09 17:12   ` [PATCH v4 6/6] am: learn to process quoted lines that ends with CRLF Đoàn Trần Công Danh

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20210421013404.17383-1-congdanhqx@gmail.com \
    --to=congdanhqx@gmail.com \
    --cc=git@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.