All of lore.kernel.org
 help / color / mirror / Atom feed
From: Danh Doan <congdanhqx@gmail.com>
To: Junio C Hamano <gitster@pobox.com>
Cc: "Martin Ågren" <martin.agren@gmail.com>,
	"Git Mailing List" <git@vger.kernel.org>
Subject: Re: [PATCH] mailinfo.c::convert_to_utf8: reuse strlen info
Date: Sun, 19 Apr 2020 09:48:05 +0700	[thread overview]
Message-ID: <20200419024805.GC9169@danh.dev> (raw)
In-Reply-To: <xmqqk12ctmm1.fsf@gitster.c.googlers.com>

[-- Attachment #1: Type: text/plain, Size: 2068 bytes --]

On 2020-04-18 16:12:06-0700, Junio C Hamano <gitster@pobox.com> wrote:
> Martin Ågren <martin.agren@gmail.com> writes:
> 
> > This is equivalent as long as `line->len` is equal to
> > `strlen(line->buf)`, which it will be (should be) because it's a
> > strbuf. Ok.
> 
> For the guarantee to hold true, line->buf[0..line->len] should not
> have any '\0' byte in it.  
> 
> This helper has two callers, but in either case, it needs to be
> prepared to work on output of decode_[bq]_segment().  Is there code
> anywhere that guarntees that the decoding won't stuff '\0' in the
> line?

- First caller: `decode_header`, we don't have such guarantee,
  The old code will discard everything after first NUL characters.
  I'm _not_ really familiar with email standard, does email allow
  UTF-16 (albeit in [bq]-encoded) in header? If yes, the current code
  is likely disallow it. The new code accidentally, accept
  [bq]-encoded-utf-16 header and reencode to utf-8 for commit.
  Yes, it's very likely only UTF-8, ISO-8859-1, some variant of
  ISO-2022 is used nowaday in email.
  *If* we would like to not exclude UTF-16, the new question is should
  we trust the length of newly converted utf-8 string.

- Second caller: `handle_commit_msg`: everything get interesting from
  here.

Get back to the question of trusting the length of newly converted
utf-8 string.

I _think_ we should,
because I _think_ there shouldn't be any discrimination against any
encoding (be it utf-8, ISO-8859-1, etc...),
the current code allows NUL character in utf-8 [bq]-encoded string
in this function (early return) and its caller,
and report an error later:

	error: a NUL byte in commit log message not allowed.

meanwhile, if the email was sent in other encoding, the current code 
discards everything after NUL in that line,
thus silently accept broken commit message.

Attached is the faulty patch in ISO-8859-1, which was used to
demonstrate my words.
The current code will make a commit with only "Áb" in the body,
while the new code rightly claims we have a faulty email.

-- 
Danh

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: faulty.patch --]
[-- Type: text/x-diff; charset=utf-8, Size: 579 bytes --]

From e27fb59f5a4854efc0badd902673fd636bcab52e Mon Sep 17 00:00:00 2001
From: =?ISO-8859-1?q?=C4=90o=C3=A0n=20Tr=E1=BA=A7n=20C=C3=B4ng=20Danh?=
 <congdanhqx@gmail.com>
Date: Sun, 19 Apr 2020 09:31:22 +0700
Subject: [PATCH] =?ISO-8859-1?q?=C4=CB=D1=CF=D6?=
MIME-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 8bit

Áb\0çdèfg

---
 afile | 0
 1 file changed, 0 insertions(+), 0 deletions(-)
 create mode 100644 afile

diff --git a/afile b/afile
new file mode 100644
index 0000000000..e69de29bb2
-- 
2.26.1.301.g55bc3eb7cb


  reply	other threads:[~2020-04-19  2:48 UTC|newest]

Thread overview: 39+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-04-18  3:54 [PATCH] mailinfo.c::convert_to_utf8: reuse strlen info Đoàn Trần Công Danh
2020-04-18 19:56 ` Martin Ågren
2020-04-18 20:18   ` [PATCH 0/6] strbuf: simplify `strbuf_attach()` usage Martin Ågren
2020-04-18 20:18     ` [PATCH 1/6] am: use `strbuf_attach()` correctly Martin Ågren
2020-04-18 20:18     ` [PATCH 2/6] strbuf_attach: correctly pass in `strlen() + 1` for `alloc` Martin Ågren
2020-04-18 20:18     ` [PATCH 3/6] strbuf: use `strbuf_attach()` correctly Martin Ågren
2020-04-18 20:18     ` [PATCH 4/6] fast-import: avoid awkward use of `strbuf_attach()` Martin Ågren
2020-04-18 20:18     ` [PATCH 5/6] rerere: " Martin Ågren
2020-04-18 20:18     ` [PATCH 6/6] strbuf: simplify `strbuf_attach()` usage Martin Ågren
2020-04-19  4:44     ` [PATCH 0/6] " Martin Ågren
2020-04-19 12:32     ` [PATCH 0/4] strbuf: fix doc for `strbuf_attach()` and avoid it Martin Ågren
2020-04-19 12:32       ` [PATCH 1/4] strbuf: fix doc for `strbuf_attach()` Martin Ågren
2020-04-20 17:30         ` Junio C Hamano
2020-04-21 18:44           ` Martin Ågren
2020-04-19 12:32       ` [PATCH 2/4] strbuf: introduce `strbuf_attachstr_len()` Martin Ågren
2020-04-19 12:32       ` [PATCH 3/4] strbuf: introduce `strbuf_attachstr()` Martin Ågren
2020-04-20 19:39         ` Junio C Hamano
2020-04-21 18:47           ` Martin Ågren
2020-04-19 12:32       ` [PATCH 4/4] strbuf_attach: prefer `strbuf_attachstr_len()` Martin Ågren
2020-04-18 23:12   ` [PATCH] mailinfo.c::convert_to_utf8: reuse strlen info Junio C Hamano
2020-04-19  2:48     ` Danh Doan [this message]
2020-04-19  4:34       ` Martin Ågren
2020-04-19  5:32         ` Junio C Hamano
2020-04-19 11:00 ` [PATCH v2 0/3] mailinfo: disallow and complains about NUL character Đoàn Trần Công Danh
2020-04-19 11:00   ` [PATCH v2 1/3] t4254: merge 2 steps of a single test Đoàn Trần Công Danh
2020-04-19 12:25     ` Martin Ågren
2020-04-19 14:17       ` Danh Doan
2020-04-19 11:00   ` [PATCH v2 2/3] mailinfo.c::convert_to_utf8: reuse strlen info Đoàn Trần Công Danh
2020-04-19 12:29     ` Martin Ågren
2020-04-19 14:16       ` Danh Doan
2020-04-20 19:59     ` Junio C Hamano
2020-04-20 23:53       ` Danh Doan
2020-04-19 11:00   ` [PATCH v2 3/3] mailinfo: disallow NUL character in mail's header Đoàn Trần Công Danh
2020-04-19 12:30     ` Martin Ågren
2020-04-19 14:24       ` Danh Doan
2020-04-20 23:54 ` [PATCH v3 0/3] Disallow NUL character in mailinfo Đoàn Trần Công Danh
2020-04-20 23:54   ` [PATCH v3 1/3] t4254: merge 2 steps of a single test Đoàn Trần Công Danh
2020-04-20 23:54   ` [PATCH v3 2/3] mailinfo.c: avoid strlen on strings that can contains NUL Đoàn Trần Công Danh
2020-04-20 23:54   ` [PATCH v3 3/3] mailinfo: disallow NUL character in mail's header Đoàn Trần Công Danh

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20200419024805.GC9169@danh.dev \
    --to=congdanhqx@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=martin.agren@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.