git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] mailinfo.c::convert_to_utf8: reuse strlen info
@ 2020-04-18  3:54 Đoàn Trần Công Danh
  2020-04-18 19:56 ` Martin Ågren
                   ` (2 more replies)
  0 siblings, 3 replies; 39+ messages in thread
From: Đoàn Trần Công Danh @ 2020-04-18  3:54 UTC (permalink / raw)
  To: git; +Cc: Đoàn Trần Công Danh

We're passing buffer from strbuf to reencode_string,
which will call strlen(3) on that buffer,
and discard the length of newly created buffer.

Then, we compute the length of the return buffer to attach to strbuf.

We can do better by reusing all the available information,
and call the underlying lower level function that will be called
indirectly by reencode_string.

Signed-off-by: Đoàn Trần Công Danh <congdanhqx@gmail.com>
---
 mailinfo.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/mailinfo.c b/mailinfo.c
index 742fa376ab..0e9911df6d 100644
--- a/mailinfo.c
+++ b/mailinfo.c
@@ -447,19 +447,21 @@ static int convert_to_utf8(struct mailinfo *mi,
 			   struct strbuf *line, const char *charset)
 {
 	char *out;
+	size_t out_len;
 
 	if (!mi->metainfo_charset || !charset || !*charset)
 		return 0;
 
 	if (same_encoding(mi->metainfo_charset, charset))
 		return 0;
-	out = reencode_string(line->buf, mi->metainfo_charset, charset);
+	out = reencode_string_len(line->buf, line->len,
+				  mi->metainfo_charset, charset, &out_len);
 	if (!out) {
 		mi->input_error = -1;
 		return error("cannot convert from %s to %s",
 			     charset, mi->metainfo_charset);
 	}
-	strbuf_attach(line, out, strlen(out), strlen(out));
+	strbuf_attach(line, out, out_len, out_len);
 	return 0;
 }
 
-- 
2.26.1.301.g55bc3eb7cb


^ permalink raw reply related	[flat|nested] 39+ messages in thread

* Re: [PATCH] mailinfo.c::convert_to_utf8: reuse strlen info
  2020-04-18  3:54 [PATCH] mailinfo.c::convert_to_utf8: reuse strlen info Đoàn Trần Công Danh
@ 2020-04-18 19:56 ` Martin Ågren
  2020-04-18 20:18   ` [PATCH 0/6] strbuf: simplify `strbuf_attach()` usage Martin Ågren
  2020-04-18 23:12   ` [PATCH] mailinfo.c::convert_to_utf8: reuse strlen info Junio C Hamano
  2020-04-19 11:00 ` [PATCH v2 0/3] mailinfo: disallow and complains about NUL character Đoàn Trần Công Danh
  2020-04-20 23:54 ` [PATCH v3 0/3] Disallow NUL character in mailinfo Đoàn Trần Công Danh
  2 siblings, 2 replies; 39+ messages in thread
From: Martin Ågren @ 2020-04-18 19:56 UTC (permalink / raw)
  To: Đoàn Trần Công Danh; +Cc: Git Mailing List

Hi Danh (I think that's how you want to be addressed?),

On Sat, 18 Apr 2020 at 06:00, Đoàn Trần Công Danh <congdanhqx@gmail.com> wrote:
> @@ -447,19 +447,21 @@ static int convert_to_utf8(struct mailinfo *mi,
>                            struct strbuf *line, const char *charset)
>  {
>         char *out;
> +       size_t out_len;
>
>         if (!mi->metainfo_charset || !charset || !*charset)
>                 return 0;
>
>         if (same_encoding(mi->metainfo_charset, charset))
>                 return 0;
> -       out = reencode_string(line->buf, mi->metainfo_charset, charset);
> +       out = reencode_string_len(line->buf, line->len,
> +                                 mi->metainfo_charset, charset, &out_len);

This is equivalent as long as `line->len` is equal to
`strlen(line->buf)`, which it will be (should be) because it's a
strbuf. Ok.

>         if (!out) {
>                 mi->input_error = -1;
>                 return error("cannot convert from %s to %s",
>                              charset, mi->metainfo_charset);
>         }
> -       strbuf_attach(line, out, strlen(out), strlen(out));
> +       strbuf_attach(line, out, out_len, out_len);

This conversion is ok as such. I wondered why we pass in the same
value twice (before and after this patch). Turns out this usage is wrong
(as per the documentation in strbuf.h) but safe (as per my understanding
of the implementation in strbuf.c). I'll follow up with a series that
fell out of that investigation.

>         return 0;
>  }

All in all, this conversion is correct and it doesn't leave the
use of `strbuf_attach()` any less correct than it already was.


Martin

^ permalink raw reply	[flat|nested] 39+ messages in thread

* [PATCH 0/6] strbuf: simplify `strbuf_attach()` usage
  2020-04-18 19:56 ` Martin Ågren
@ 2020-04-18 20:18   ` Martin Ågren
  2020-04-18 20:18     ` [PATCH 1/6] am: use `strbuf_attach()` correctly Martin Ågren
                       ` (7 more replies)
  2020-04-18 23:12   ` [PATCH] mailinfo.c::convert_to_utf8: reuse strlen info Junio C Hamano
  1 sibling, 8 replies; 39+ messages in thread
From: Martin Ågren @ 2020-04-18 20:18 UTC (permalink / raw)
  To: git; +Cc: Đoàn Trần Công Danh

On Sat, 18 Apr 2020 at 21:56, Martin Ågren <martin.agren@gmail.com> wrote:
> > -       strbuf_attach(line, out, strlen(out), strlen(out));
> > +       strbuf_attach(line, out, out_len, out_len);
>
> This conversion is ok as such. I wondered why we pass in the same
> value twice (before and after this patch). Turns out this usage is wrong
> (as per the documentation in strbuf.h) but safe (as per my understanding
> of the implementation in strbuf.c). I'll follow up with a series that
> fell out of that investigation.

Here's that series. It could go in parallel to Danh's, or one could go
on top of the other. Any of those would be ok with me.

I think `strbuf_attach()` is sufficiently hard to use that we might as
well simplify it for the majority of the users that don't care about the
distinction between the string's length and the size of the allocated
memory, and avoid it for the few remaining users that are just as well
off using `strbuf_add()`.

The summary is that this function takes `len` and `alloc`, where the
latter must be greater than the former, yet several callers use the same
value for both. I first thought this could cause quite hairy problems
such as writing outside of allocated memory, but it doesn't look that
way. See the patches for more information.

An alternative to the approach taken here would be to introduce
`strbuf_attachstr()` and convert most existing users, then convert the
few remaining ones to use the new function or to move in another
direction. But the new name is a downer -- what else would you attach to
a strbuf if not a string?

Another alternative would be to first convert certain users away from
the function, then simplify it for the remainder. I preferred to first
spend a few patches converting a few existing callers to make it clear
that those are ok.

Martin

Martin Ågren (6):
  am: use `strbuf_attach()` correctly
  strbuf_attach: correctly pass in `strlen() + 1` for `alloc`
  strbuf: use `strbuf_attach()` correctly
  fast-import: avoid awkward use of `strbuf_attach()`
  rerere: avoid awkward use of `strbuf_attach()`
  strbuf: simplify `strbuf_attach()` usage

 strbuf.h             | 16 ++++++++--------
 apply.c              |  2 +-
 archive.c            |  2 +-
 blame.c              |  2 +-
 builtin/am.c         |  2 +-
 convert.c            |  4 ++--
 fast-import.c        |  7 ++++---
 imap-send.c          |  2 +-
 mailinfo.c           |  2 +-
 merge-recursive.c    |  2 +-
 path.c               |  3 +--
 pretty.c             |  4 ++--
 refs/files-backend.c |  2 +-
 rerere.c             |  3 ++-
 strbuf.c             | 11 +++++++----
 trailer.c            |  2 +-
 16 files changed, 35 insertions(+), 31 deletions(-)

-- 
2.26.1


^ permalink raw reply	[flat|nested] 39+ messages in thread

* [PATCH 1/6] am: use `strbuf_attach()` correctly
  2020-04-18 20:18   ` [PATCH 0/6] strbuf: simplify `strbuf_attach()` usage Martin Ågren
@ 2020-04-18 20:18     ` Martin Ågren
  2020-04-18 20:18     ` [PATCH 2/6] strbuf_attach: correctly pass in `strlen() + 1` for `alloc` Martin Ågren
                       ` (6 subsequent siblings)
  7 siblings, 0 replies; 39+ messages in thread
From: Martin Ågren @ 2020-04-18 20:18 UTC (permalink / raw)
  To: git; +Cc: Đoàn Trần Công Danh

Among other parameters, `strbuf_attach()` takes a length and an amount
of allocated memory. In strbuf.h, it is documented that the latter "must
be larger than the string length, because the string you pass is
supposed to be a NUL-terminated string".

In builtin/am.c, we simply pass in the length of the string twice.

My first assumption was that we'd end up with `alloc == len` and that,
e.g., a subsequent `strbuf_avail(sb)` would evaluate `sb->alloc
- sb->len - 1`, resulting in a huge return value, which could be quite
bad. But luckily, we end up in `strbuf_grow()` where we reallocate a
larger buffer, after which we reinstate a '\0' and everything is fine.

One might ask if the function was documented incorrectly in dd613e6b87
("Strbuf documentation: document most functions", 2008-06-04), but on
the other hand, one really has to wonder whether it's actually useful to
be able to pass in `alloc == len` only to end up performing the
allocation, copying and freeing which this function very much looks like
it would keep us from having to do.

Pass in a value one greater than the length for the `alloc` parameter.
The string has been allocated correctly using the strbuf machinery in
`read_commit_msg()` and we really do have an extra byte at the end with
a NUL. This means both that the buffer is as large as we claim it to be
and that the addition is safe.

Signed-off-by: Martin Ågren <martin.agren@gmail.com>
---
 builtin/am.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/builtin/am.c b/builtin/am.c
index e3dfd93c25..e6a9fe8111 100644
--- a/builtin/am.c
+++ b/builtin/am.c
@@ -1101,7 +1101,7 @@ static void am_append_signoff(struct am_state *state)
 {
 	struct strbuf sb = STRBUF_INIT;
 
-	strbuf_attach(&sb, state->msg, state->msg_len, state->msg_len);
+	strbuf_attach(&sb, state->msg, state->msg_len, state->msg_len + 1);
 	append_signoff(&sb, 0, 0);
 	state->msg = strbuf_detach(&sb, &state->msg_len);
 }
-- 
2.26.1


^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH 2/6] strbuf_attach: correctly pass in `strlen() + 1` for `alloc`
  2020-04-18 20:18   ` [PATCH 0/6] strbuf: simplify `strbuf_attach()` usage Martin Ågren
  2020-04-18 20:18     ` [PATCH 1/6] am: use `strbuf_attach()` correctly Martin Ågren
@ 2020-04-18 20:18     ` Martin Ågren
  2020-04-18 20:18     ` [PATCH 3/6] strbuf: use `strbuf_attach()` correctly Martin Ågren
                       ` (5 subsequent siblings)
  7 siblings, 0 replies; 39+ messages in thread
From: Martin Ågren @ 2020-04-18 20:18 UTC (permalink / raw)
  To: git; +Cc: Đoàn Trần Công Danh

This is similar to the previous commit where we corrected a use of
`strbuf_attach()` to not pass in the same value for `len` and `alloc`.
The sites addressed in this commit all use `strlen()`, which makes it
obvious that we're working with NUL-terminated strings and that we can
safely switch to providing `len + 1` for `alloc`.

Signed-off-by: Martin Ågren <martin.agren@gmail.com>
---
 mailinfo.c           | 2 +-
 path.c               | 2 +-
 refs/files-backend.c | 2 +-
 trailer.c            | 3 ++-
 4 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/mailinfo.c b/mailinfo.c
index 742fa376ab..af5d2cad31 100644
--- a/mailinfo.c
+++ b/mailinfo.c
@@ -459,7 +459,7 @@ static int convert_to_utf8(struct mailinfo *mi,
 		return error("cannot convert from %s to %s",
 			     charset, mi->metainfo_charset);
 	}
-	strbuf_attach(line, out, strlen(out), strlen(out));
+	strbuf_attach(line, out, strlen(out), strlen(out) + 1);
 	return 0;
 }
 
diff --git a/path.c b/path.c
index 9bd717c307..5745a71d36 100644
--- a/path.c
+++ b/path.c
@@ -816,7 +816,7 @@ const char *enter_repo(const char *path, int strict)
 			if (!newpath)
 				return NULL;
 			strbuf_attach(&used_path, newpath, strlen(newpath),
-				      strlen(newpath));
+				      strlen(newpath) + 1);
 		}
 		for (i = 0; suffix[i]; i++) {
 			struct stat st;
diff --git a/refs/files-backend.c b/refs/files-backend.c
index 561c33ac8a..76bb2ef490 100644
--- a/refs/files-backend.c
+++ b/refs/files-backend.c
@@ -1514,7 +1514,7 @@ static int commit_ref(struct ref_lock *lock)
 		size_t len = strlen(path);
 		struct strbuf sb_path = STRBUF_INIT;
 
-		strbuf_attach(&sb_path, path, len, len);
+		strbuf_attach(&sb_path, path, len, len + 1);
 
 		/*
 		 * If this fails, commit_lock_file() will also fail
diff --git a/trailer.c b/trailer.c
index 0c414f2fed..135f71aef1 100644
--- a/trailer.c
+++ b/trailer.c
@@ -1095,7 +1095,8 @@ void trailer_info_get(struct trailer_info *info, const char *str,
 	for (ptr = trailer_lines; *ptr; ptr++) {
 		if (last && isspace((*ptr)->buf[0])) {
 			struct strbuf sb = STRBUF_INIT;
-			strbuf_attach(&sb, *last, strlen(*last), strlen(*last));
+			strbuf_attach(&sb, *last, strlen(*last),
+				      strlen(*last) + 1);
 			strbuf_addbuf(&sb, *ptr);
 			*last = strbuf_detach(&sb, NULL);
 			continue;
-- 
2.26.1


^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH 3/6] strbuf: use `strbuf_attach()` correctly
  2020-04-18 20:18   ` [PATCH 0/6] strbuf: simplify `strbuf_attach()` usage Martin Ågren
  2020-04-18 20:18     ` [PATCH 1/6] am: use `strbuf_attach()` correctly Martin Ågren
  2020-04-18 20:18     ` [PATCH 2/6] strbuf_attach: correctly pass in `strlen() + 1` for `alloc` Martin Ågren
@ 2020-04-18 20:18     ` Martin Ågren
  2020-04-18 20:18     ` [PATCH 4/6] fast-import: avoid awkward use of `strbuf_attach()` Martin Ågren
                       ` (4 subsequent siblings)
  7 siblings, 0 replies; 39+ messages in thread
From: Martin Ågren @ 2020-04-18 20:18 UTC (permalink / raw)
  To: git; +Cc: Đoàn Trần Công Danh

Similar to earlier commits, don't pass in the same value for `len` and
`alloc` to `strbuf_attach()`. This string comes from
`reencode_string_iconv()` which ensures we have an extra byte at the
end.

Signed-off-by: Martin Ågren <martin.agren@gmail.com>
---
 strbuf.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/strbuf.c b/strbuf.c
index bb0065ccaf..deb338412e 100644
--- a/strbuf.c
+++ b/strbuf.c
@@ -152,7 +152,7 @@ int strbuf_reencode(struct strbuf *sb, const char *from, const char *to)
 	if (!out)
 		return -1;
 
-	strbuf_attach(sb, out, len, len);
+	strbuf_attach(sb, out, len, len + 1);
 	return 0;
 }
 
-- 
2.26.1


^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH 4/6] fast-import: avoid awkward use of `strbuf_attach()`
  2020-04-18 20:18   ` [PATCH 0/6] strbuf: simplify `strbuf_attach()` usage Martin Ågren
                       ` (2 preceding siblings ...)
  2020-04-18 20:18     ` [PATCH 3/6] strbuf: use `strbuf_attach()` correctly Martin Ågren
@ 2020-04-18 20:18     ` Martin Ågren
  2020-04-18 20:18     ` [PATCH 5/6] rerere: " Martin Ågren
                       ` (3 subsequent siblings)
  7 siblings, 0 replies; 39+ messages in thread
From: Martin Ågren @ 2020-04-18 20:18 UTC (permalink / raw)
  To: git; +Cc: Đoàn Trần Công Danh

As explained in an earlier commit, per the documentation of
`strbuf_attach()`, it is incorrect to pass in the same value for `alloc`
as we do for `len`. But, and this was also explained earlier, doing so
is still ok-ish because we'll end up allocating a large enough buffer
under the hood.

But then one really has to wonder whether

  strbuf_attach(&sb, buf, size, size);

is any better than

  strbuf_reset(&sb);
  strbuf_add(&sb, buf, size);
  free(buf);

The latter is certainly a lot less subtle about what is going on, and if
we're lucky, the strbuf's allocated buffer is large enough that we won't
even need to allocate. So let's change to this more explicit form.

In short, this commit should not make things any worse.

Nearby commits are changing other callsites to pass in a larger 'alloc`
parameter. Maybe that's safe here, too -- I admit I don't quite follow
where this memory comes from. In the future, we could possibly switch
back to `strbuf_attach()` here after looking into the allocations in
more detail. The immediate reason for this commit is that we want to
simplify the usage of `strbuf_attach()`, and we won't be able to pass in
"size, size" any more.

Signed-off-by: Martin Ågren <martin.agren@gmail.com>
---
 fast-import.c | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/fast-import.c b/fast-import.c
index 202dda11a6..7fd501c5cf 100644
--- a/fast-import.c
+++ b/fast-import.c
@@ -2946,10 +2946,11 @@ static void cat_blob(struct object_entry *oe, struct object_id *oid)
 	cat_blob_write("\n", 1);
 	if (oe && oe->pack_id == pack_id) {
 		last_blob.offset = oe->idx.offset;
-		strbuf_attach(&last_blob.data, buf, size, size);
+		strbuf_reset(&last_blob.data);
+		strbuf_add(&last_blob.data, buf, size);
 		last_blob.depth = oe->depth;
-	} else
-		free(buf);
+	}
+	free(buf);
 }
 
 static void parse_get_mark(const char *p)
-- 
2.26.1


^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH 5/6] rerere: avoid awkward use of `strbuf_attach()`
  2020-04-18 20:18   ` [PATCH 0/6] strbuf: simplify `strbuf_attach()` usage Martin Ågren
                       ` (3 preceding siblings ...)
  2020-04-18 20:18     ` [PATCH 4/6] fast-import: avoid awkward use of `strbuf_attach()` Martin Ågren
@ 2020-04-18 20:18     ` Martin Ågren
  2020-04-18 20:18     ` [PATCH 6/6] strbuf: simplify `strbuf_attach()` usage Martin Ågren
                       ` (2 subsequent siblings)
  7 siblings, 0 replies; 39+ messages in thread
From: Martin Ågren @ 2020-04-18 20:18 UTC (permalink / raw)
  To: git; +Cc: Đoàn Trần Công Danh

Similar to the previous commit, avoid passing `strbuf_attach()` the same
value for `len` and `alloc` by switching to `strbuf_add()`. This avoids
a subtle allocate+copy+free dance in favour of a much more obvious one.

Unlike in the previous commit, for this site, I know that it's not safe
to pass in "len, len + 1". Trying that makes `strbuf_attach()` write a
trailing '\0' at `result.ptr[len]`. Running our test suite with
AddressSanitizer will spot an out-of-bounds write due to this.

Signed-off-by: Martin Ågren <martin.agren@gmail.com>
---
 rerere.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/rerere.c b/rerere.c
index 9281131a9f..c3e3714824 100644
--- a/rerere.c
+++ b/rerere.c
@@ -1007,7 +1007,8 @@ static int handle_cache(struct index_state *istate,
 	else
 		io.io.output = NULL;
 	strbuf_init(&io.input, 0);
-	strbuf_attach(&io.input, result.ptr, result.size, result.size);
+	strbuf_add(&io.input, result.ptr, result.size);
+	free(result.ptr);
 
 	/*
 	 * Grab the conflict ID and optionally write the original
-- 
2.26.1


^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH 6/6] strbuf: simplify `strbuf_attach()` usage
  2020-04-18 20:18   ` [PATCH 0/6] strbuf: simplify `strbuf_attach()` usage Martin Ågren
                       ` (4 preceding siblings ...)
  2020-04-18 20:18     ` [PATCH 5/6] rerere: " Martin Ågren
@ 2020-04-18 20:18     ` Martin Ågren
  2020-04-19  4:44     ` [PATCH 0/6] " Martin Ågren
  2020-04-19 12:32     ` [PATCH 0/4] strbuf: fix doc for `strbuf_attach()` and avoid it Martin Ågren
  7 siblings, 0 replies; 39+ messages in thread
From: Martin Ågren @ 2020-04-18 20:18 UTC (permalink / raw)
  To: git; +Cc: Đoàn Trần Công Danh

The last few commits have fixed a decent number of call sites that pass
in the same value for `alloc` and `len` when attaching a string to a
strbuf, despite the documentation being quite clear about this not being
ok. As a result, all users currently pass in `len + 1`.

Because this interface was used incorrectly (albeit not necessarily
insecurely) before, simplify it by making `strbuf_attach()` simply assume that
the allocated length is exactly `len + 1`. If and when we do gain one or
two call sites that know better, we can either keep on using this
pessimistic assumption and it might not actually hurt much, or we could
give them a dedicated function similar to the current more flexible (and
proven error-prone) `strbuf_attach()`.

This means we can drop the call to `strbuf_grow()`. It really only
served to do the alloc+copy+free dance that we were supposedly avoiding.

That said, we should consider the edge case of `buf == NULL`. Let's
check for this and BUG. Even for a zero-length string, you need
somewhere to stick the trailing NUL.

(Also: If someone used to pass `alloc == len` for the case `len == 0`,
we used to end up leaking `buf` on the assumption that we were using
`strbuf_slopbuf`. Not the biggest leak ever, but still.)

Finally, one might wonder if we could go further and make
`strbuf_attach()` call `strlen()`. Some call sites wouldn't mind, but
others do have the length already and we might as well use it. Those
that don't have the length will figure it out as they call this
function. Before this commit, they'd do two calls to `strlen()` so they
either didn't care much or they trusted the compiler to optimize for
them. They're not any worse off now.

Signed-off-by: Martin Ågren <martin.agren@gmail.com>
---
 strbuf.h             | 16 ++++++++--------
 apply.c              |  2 +-
 archive.c            |  2 +-
 blame.c              |  2 +-
 builtin/am.c         |  2 +-
 convert.c            |  4 ++--
 imap-send.c          |  2 +-
 mailinfo.c           |  2 +-
 merge-recursive.c    |  2 +-
 path.c               |  3 +--
 pretty.c             |  4 ++--
 refs/files-backend.c |  2 +-
 strbuf.c             | 11 +++++++----
 trailer.c            |  3 +--
 14 files changed, 29 insertions(+), 28 deletions(-)

diff --git a/strbuf.h b/strbuf.h
index ce8e49c0b2..72dcb43795 100644
--- a/strbuf.h
+++ b/strbuf.h
@@ -110,14 +110,14 @@ void strbuf_release(struct strbuf *sb);
 char *strbuf_detach(struct strbuf *sb, size_t *sz);
 
 /**
- * Attach a string to a buffer. You should specify the string to attach,
- * the current length of the string and the amount of allocated memory.
- * The amount must be larger than the string length, because the string you
- * pass is supposed to be a NUL-terminated string.  This string _must_ be
- * malloc()ed, and after attaching, the pointer cannot be relied upon
- * anymore, and neither be free()d directly.
- */
-void strbuf_attach(struct strbuf *sb, void *str, size_t len, size_t mem);
+ * Attach a string to a buffer. You should specify the string to attach
+ * and its length. The amount of allocated memory will be assumed to be
+ * one greater than the length. The string you pass is supposed to be a
+ * NUL-terminated string.  This string _must_ be malloc()ed, and after
+ * attaching, the pointer cannot be relied upon anymore, nor should it
+ * be free()d directly.
+ */
+void strbuf_attach(struct strbuf *sb, void *str, size_t len);
 
 /**
  * Swap the contents of two string buffers.
diff --git a/apply.c b/apply.c
index 144c19aaca..678cee5535 100644
--- a/apply.c
+++ b/apply.c
@@ -3251,7 +3251,7 @@ static int read_blob_object(struct strbuf *buf, const struct object_id *oid, uns
 		if (!result)
 			return -1;
 		/* XXX read_sha1_file NUL-terminates */
-		strbuf_attach(buf, result, sz, sz + 1);
+		strbuf_attach(buf, result, sz);
 	}
 	return 0;
 }
diff --git a/archive.c b/archive.c
index fb39706120..bef66f8574 100644
--- a/archive.c
+++ b/archive.c
@@ -89,7 +89,7 @@ void *object_file_to_archive(const struct archiver_args *args,
 		struct strbuf buf = STRBUF_INIT;
 		size_t size = 0;
 
-		strbuf_attach(&buf, buffer, *sizep, *sizep + 1);
+		strbuf_attach(&buf, buffer, *sizep);
 		convert_to_working_tree(args->repo->index, path, buf.buf, buf.len, &buf, &meta);
 		if (commit)
 			format_subst(commit, buf.buf, buf.len, &buf);
diff --git a/blame.c b/blame.c
index 29770e5c81..64218c2c6e 100644
--- a/blame.c
+++ b/blame.c
@@ -241,7 +241,7 @@ static struct commit *fake_working_tree_commit(struct repository *r,
 		case S_IFREG:
 			if (opt->flags.allow_textconv &&
 			    textconv_object(r, read_from, mode, &null_oid, 0, &buf_ptr, &buf_len))
-				strbuf_attach(&buf, buf_ptr, buf_len, buf_len + 1);
+				strbuf_attach(&buf, buf_ptr, buf_len);
 			else if (strbuf_read_file(&buf, read_from, st.st_size) != st.st_size)
 				die_errno("cannot open or read '%s'", read_from);
 			break;
diff --git a/builtin/am.c b/builtin/am.c
index e6a9fe8111..0986ea4af2 100644
--- a/builtin/am.c
+++ b/builtin/am.c
@@ -1101,7 +1101,7 @@ static void am_append_signoff(struct am_state *state)
 {
 	struct strbuf sb = STRBUF_INIT;
 
-	strbuf_attach(&sb, state->msg, state->msg_len, state->msg_len + 1);
+	strbuf_attach(&sb, state->msg, state->msg_len);
 	append_signoff(&sb, 0, 0);
 	state->msg = strbuf_detach(&sb, &state->msg_len);
 }
diff --git a/convert.c b/convert.c
index 5aa87d45e3..ee1941c662 100644
--- a/convert.c
+++ b/convert.c
@@ -467,7 +467,7 @@ static int encode_to_git(const char *path, const char *src, size_t src_len,
 		free(re_src);
 	}
 
-	strbuf_attach(buf, dst, dst_len, dst_len + 1);
+	strbuf_attach(buf, dst, dst_len);
 	return 1;
 }
 
@@ -492,7 +492,7 @@ static int encode_to_worktree(const char *path, const char *src, size_t src_len,
 		return 0;
 	}
 
-	strbuf_attach(buf, dst, dst_len, dst_len + 1);
+	strbuf_attach(buf, dst, dst_len);
 	return 1;
 }
 
diff --git a/imap-send.c b/imap-send.c
index 6c54d8c29d..b36b9ff62d 100644
--- a/imap-send.c
+++ b/imap-send.c
@@ -1212,7 +1212,7 @@ static void lf_to_crlf(struct strbuf *msg)
 			new_msg[j++] = '\r';
 		lastc = new_msg[j++] = msg->buf[i];
 	}
-	strbuf_attach(msg, new_msg, j, j + 1);
+	strbuf_attach(msg, new_msg, j);
 }
 
 /*
diff --git a/mailinfo.c b/mailinfo.c
index af5d2cad31..cdcb6af8c1 100644
--- a/mailinfo.c
+++ b/mailinfo.c
@@ -459,7 +459,7 @@ static int convert_to_utf8(struct mailinfo *mi,
 		return error("cannot convert from %s to %s",
 			     charset, mi->metainfo_charset);
 	}
-	strbuf_attach(line, out, strlen(out), strlen(out) + 1);
+	strbuf_attach(line, out, strlen(out));
 	return 0;
 }
 
diff --git a/merge-recursive.c b/merge-recursive.c
index d92e2acf1e..5cdd797ada 100644
--- a/merge-recursive.c
+++ b/merge-recursive.c
@@ -2963,7 +2963,7 @@ static int read_oid_strbuf(struct merge_options *opt,
 		free(buf);
 		return err(opt, _("object %s is not a blob"), oid_to_hex(oid));
 	}
-	strbuf_attach(dst, buf, size, size + 1);
+	strbuf_attach(dst, buf, size);
 	return 0;
 }
 
diff --git a/path.c b/path.c
index 5745a71d36..5474bbd079 100644
--- a/path.c
+++ b/path.c
@@ -815,8 +815,7 @@ const char *enter_repo(const char *path, int strict)
 			char *newpath = expand_user_path(used_path.buf, 0);
 			if (!newpath)
 				return NULL;
-			strbuf_attach(&used_path, newpath, strlen(newpath),
-				      strlen(newpath) + 1);
+			strbuf_attach(&used_path, newpath, strlen(newpath));
 		}
 		for (i = 0; suffix[i]; i++) {
 			struct stat st;
diff --git a/pretty.c b/pretty.c
index 28afc701b6..e1abe8ac89 100644
--- a/pretty.c
+++ b/pretty.c
@@ -590,7 +590,7 @@ static char *replace_encoding_header(char *buf, const char *encoding)
 		return buf; /* should not happen but be defensive */
 	len = cp + 1 - (buf + start);
 
-	strbuf_attach(&tmp, buf, strlen(buf), strlen(buf) + 1);
+	strbuf_attach(&tmp, buf, strlen(buf));
 	if (is_encoding_utf8(encoding)) {
 		/* we have re-coded to UTF-8; drop the header */
 		strbuf_remove(&tmp, start, len);
@@ -1687,7 +1687,7 @@ void repo_format_commit_message(struct repository *r,
 		char *out = reencode_string_len(sb->buf, sb->len,
 						output_enc, utf8, &outsz);
 		if (out)
-			strbuf_attach(sb, out, outsz, outsz + 1);
+			strbuf_attach(sb, out, outsz);
 	}
 
 	free(context.commit_encoding);
diff --git a/refs/files-backend.c b/refs/files-backend.c
index 76bb2ef490..4ce9f2cda4 100644
--- a/refs/files-backend.c
+++ b/refs/files-backend.c
@@ -1514,7 +1514,7 @@ static int commit_ref(struct ref_lock *lock)
 		size_t len = strlen(path);
 		struct strbuf sb_path = STRBUF_INIT;
 
-		strbuf_attach(&sb_path, path, len, len + 1);
+		strbuf_attach(&sb_path, path, len);
 
 		/*
 		 * If this fails, commit_lock_file() will also fail
diff --git a/strbuf.c b/strbuf.c
index deb338412e..e74253f91d 100644
--- a/strbuf.c
+++ b/strbuf.c
@@ -77,13 +77,16 @@ char *strbuf_detach(struct strbuf *sb, size_t *sz)
 	return res;
 }
 
-void strbuf_attach(struct strbuf *sb, void *buf, size_t len, size_t alloc)
+void strbuf_attach(struct strbuf *sb, void *buf, size_t len)
 {
+	if (!buf)
+		BUG("NULL-buffer in strbuf_attach");
+	if (unsigned_add_overflows(len, 1))
+		die("you want to use way too much memory");
 	strbuf_release(sb);
 	sb->buf   = buf;
 	sb->len   = len;
-	sb->alloc = alloc;
-	strbuf_grow(sb, 0);
+	sb->alloc = len + 1;
 	sb->buf[sb->len] = '\0';
 }
 
@@ -152,7 +155,7 @@ int strbuf_reencode(struct strbuf *sb, const char *from, const char *to)
 	if (!out)
 		return -1;
 
-	strbuf_attach(sb, out, len, len + 1);
+	strbuf_attach(sb, out, len);
 	return 0;
 }
 
diff --git a/trailer.c b/trailer.c
index 135f71aef1..30bc622723 100644
--- a/trailer.c
+++ b/trailer.c
@@ -1095,8 +1095,7 @@ void trailer_info_get(struct trailer_info *info, const char *str,
 	for (ptr = trailer_lines; *ptr; ptr++) {
 		if (last && isspace((*ptr)->buf[0])) {
 			struct strbuf sb = STRBUF_INIT;
-			strbuf_attach(&sb, *last, strlen(*last),
-				      strlen(*last) + 1);
+			strbuf_attach(&sb, *last, strlen(*last));
 			strbuf_addbuf(&sb, *ptr);
 			*last = strbuf_detach(&sb, NULL);
 			continue;
-- 
2.26.1


^ permalink raw reply related	[flat|nested] 39+ messages in thread

* Re: [PATCH] mailinfo.c::convert_to_utf8: reuse strlen info
  2020-04-18 19:56 ` Martin Ågren
  2020-04-18 20:18   ` [PATCH 0/6] strbuf: simplify `strbuf_attach()` usage Martin Ågren
@ 2020-04-18 23:12   ` Junio C Hamano
  2020-04-19  2:48     ` Danh Doan
  1 sibling, 1 reply; 39+ messages in thread
From: Junio C Hamano @ 2020-04-18 23:12 UTC (permalink / raw)
  To: Martin Ågren
  Cc: Đoàn Trần Công Danh, Git Mailing List

Martin Ågren <martin.agren@gmail.com> writes:

> This is equivalent as long as `line->len` is equal to
> `strlen(line->buf)`, which it will be (should be) because it's a
> strbuf. Ok.

For the guarantee to hold true, line->buf[0..line->len] should not
have any '\0' byte in it.  

This helper has two callers, but in either case, it needs to be
prepared to work on output of decode_[bq]_segment().  Is there code
anywhere that guarntees that the decoding won't stuff '\0' in the
line?

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH] mailinfo.c::convert_to_utf8: reuse strlen info
  2020-04-18 23:12   ` [PATCH] mailinfo.c::convert_to_utf8: reuse strlen info Junio C Hamano
@ 2020-04-19  2:48     ` Danh Doan
  2020-04-19  4:34       ` Martin Ågren
  0 siblings, 1 reply; 39+ messages in thread
From: Danh Doan @ 2020-04-19  2:48 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Martin Ågren, Git Mailing List

[-- Attachment #1: Type: text/plain, Size: 2068 bytes --]

On 2020-04-18 16:12:06-0700, Junio C Hamano <gitster@pobox.com> wrote:
> Martin Ågren <martin.agren@gmail.com> writes:
> 
> > This is equivalent as long as `line->len` is equal to
> > `strlen(line->buf)`, which it will be (should be) because it's a
> > strbuf. Ok.
> 
> For the guarantee to hold true, line->buf[0..line->len] should not
> have any '\0' byte in it.  
> 
> This helper has two callers, but in either case, it needs to be
> prepared to work on output of decode_[bq]_segment().  Is there code
> anywhere that guarntees that the decoding won't stuff '\0' in the
> line?

- First caller: `decode_header`, we don't have such guarantee,
  The old code will discard everything after first NUL characters.
  I'm _not_ really familiar with email standard, does email allow
  UTF-16 (albeit in [bq]-encoded) in header? If yes, the current code
  is likely disallow it. The new code accidentally, accept
  [bq]-encoded-utf-16 header and reencode to utf-8 for commit.
  Yes, it's very likely only UTF-8, ISO-8859-1, some variant of
  ISO-2022 is used nowaday in email.
  *If* we would like to not exclude UTF-16, the new question is should
  we trust the length of newly converted utf-8 string.

- Second caller: `handle_commit_msg`: everything get interesting from
  here.

Get back to the question of trusting the length of newly converted
utf-8 string.

I _think_ we should,
because I _think_ there shouldn't be any discrimination against any
encoding (be it utf-8, ISO-8859-1, etc...),
the current code allows NUL character in utf-8 [bq]-encoded string
in this function (early return) and its caller,
and report an error later:

	error: a NUL byte in commit log message not allowed.

meanwhile, if the email was sent in other encoding, the current code 
discards everything after NUL in that line,
thus silently accept broken commit message.

Attached is the faulty patch in ISO-8859-1, which was used to
demonstrate my words.
The current code will make a commit with only "Áb" in the body,
while the new code rightly claims we have a faulty email.

-- 
Danh

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: faulty.patch --]
[-- Type: text/x-diff; charset=utf-8, Size: 579 bytes --]

From e27fb59f5a4854efc0badd902673fd636bcab52e Mon Sep 17 00:00:00 2001
From: =?ISO-8859-1?q?=C4=90o=C3=A0n=20Tr=E1=BA=A7n=20C=C3=B4ng=20Danh?=
 <congdanhqx@gmail.com>
Date: Sun, 19 Apr 2020 09:31:22 +0700
Subject: [PATCH] =?ISO-8859-1?q?=C4=CB=D1=CF=D6?=
MIME-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 8bit

Áb\0çdèfg

---
 afile | 0
 1 file changed, 0 insertions(+), 0 deletions(-)
 create mode 100644 afile

diff --git a/afile b/afile
new file mode 100644
index 0000000000..e69de29bb2
-- 
2.26.1.301.g55bc3eb7cb


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH] mailinfo.c::convert_to_utf8: reuse strlen info
  2020-04-19  2:48     ` Danh Doan
@ 2020-04-19  4:34       ` Martin Ågren
  2020-04-19  5:32         ` Junio C Hamano
  0 siblings, 1 reply; 39+ messages in thread
From: Martin Ågren @ 2020-04-19  4:34 UTC (permalink / raw)
  To: Danh Doan; +Cc: Junio C Hamano, Git Mailing List

On Sun, 19 Apr 2020 at 04:48, Danh Doan <congdanhqx@gmail.com> wrote:
>
> On 2020-04-18 16:12:06-0700, Junio C Hamano <gitster@pobox.com> wrote:
> > Martin Ågren <martin.agren@gmail.com> writes:
> >
> > > This is equivalent as long as `line->len` is equal to
> > > `strlen(line->buf)`, which it will be (should be) because it's a
> > > strbuf. Ok.
> >
> > For the guarantee to hold true, line->buf[0..line->len] should not
> > have any '\0' byte in it.

Yes. I sort of assumed it shouldn't ("because strbuf"), but it's a good
question. Especially considering this is about various encodings..

This assumption that "strlen is length so the conversion is a no-op"
could potentially be broken both for input (line->len) and output
(out_len).

> > This helper has two callers, but in either case, it needs to be
> > prepared to work on output of decode_[bq]_segment().  Is there code
> > anywhere that guarntees that the decoding won't stuff '\0' in the
> > line?
>
> the current code allows NUL character in utf-8 [bq]-encoded string
> in this function (early return) and its caller,
> and report an error later:
>
>         error: a NUL byte in commit log message not allowed.
>
> meanwhile, if the email was sent in other encoding, the current code
> discards everything after NUL in that line,
> thus silently accept broken commit message.

My knee-jerk reaction to Junio's question was along the same line:
surely if we could have a NUL in there, the current `strlen()` would use
it as an excuse to silently truncate the string, either before
processing or afterwards. Thanks for looking into that more.

> Attached is the faulty patch in ISO-8859-1, which was used to
> demonstrate my words.
> The current code will make a commit with only "Áb" in the body,
> while the new code rightly claims we have a faulty email.

Good find.

Martin

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 0/6] strbuf: simplify `strbuf_attach()` usage
  2020-04-18 20:18   ` [PATCH 0/6] strbuf: simplify `strbuf_attach()` usage Martin Ågren
                       ` (5 preceding siblings ...)
  2020-04-18 20:18     ` [PATCH 6/6] strbuf: simplify `strbuf_attach()` usage Martin Ågren
@ 2020-04-19  4:44     ` Martin Ågren
  2020-04-19 12:32     ` [PATCH 0/4] strbuf: fix doc for `strbuf_attach()` and avoid it Martin Ågren
  7 siblings, 0 replies; 39+ messages in thread
From: Martin Ågren @ 2020-04-19  4:44 UTC (permalink / raw)
  To: Git Mailing List; +Cc: Đoàn Trần Công Danh

On Sat, 18 Apr 2020 at 22:18, Martin Ågren <martin.agren@gmail.com> wrote:
>
> On Sat, 18 Apr 2020 at 21:56, Martin Ågren <martin.agren@gmail.com> wrote:
> > > -       strbuf_attach(line, out, strlen(out), strlen(out));
> > > +       strbuf_attach(line, out, out_len, out_len);
> >
> > This conversion is ok as such. I wondered why we pass in the same
> > value twice (before and after this patch). Turns out this usage is wrong
> > (as per the documentation in strbuf.h) but safe (as per my understanding
> > of the implementation in strbuf.c). I'll follow up with a series that
> > fell out of that investigation.
>
> Here's that series. It could go in parallel to Danh's, or one could go
> on top of the other. Any of those would be ok with me.

I realize now that some of the reasoning in this series is incorrect.
`strbuf_grow()` will use `ALLOC_GROW` meaning we might reallocate
cheaply rather than doing an entire allocate+copy+free cycle.

Still, there is some buggyness in the usage because we might reallocate
(with a 50% overhead) even when we shouldn't really need to.

I'll resubmit this after tackling it from a different angle: I'll
introduce `strbuf_attachstr()`, simplifying and robustifyng most users.
Then I'll switch a few incorrect users of `strbuf_attach()` to
`strbuf_attachstr()`. Then a small number of users will continue using
`strbuf_attach()` with `alloc == len` for $reasons.

> The summary is that this function takes `len` and `alloc`, where the
> latter must be greater than the former, yet several callers use the same
> value for both. I first thought this could cause quite hairy problems
> such as writing outside of allocated memory, but it doesn't look that
> way. See the patches for more information.
>
> An alternative to the approach taken here would be to introduce
> `strbuf_attachstr()` and convert most existing users, then convert the
> few remaining ones to use the new function or to move in another
> direction. But the new name is a downer -- what else would you attach to
> a strbuf if not a string?

So this is what I'll do instead. The reasoning is, you can attach a
string (a NUL-terminated buffer) or a non-string (non-NUL-terminated
buffer).

Martin

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH] mailinfo.c::convert_to_utf8: reuse strlen info
  2020-04-19  4:34       ` Martin Ågren
@ 2020-04-19  5:32         ` Junio C Hamano
  0 siblings, 0 replies; 39+ messages in thread
From: Junio C Hamano @ 2020-04-19  5:32 UTC (permalink / raw)
  To: Martin Ågren; +Cc: Danh Doan, Git Mailing List

Martin Ågren <martin.agren@gmail.com> writes:

> My knee-jerk reaction to Junio's question was along the same line:
> surely if we could have a NUL in there, the current `strlen()` would use
> it as an excuse to silently truncate the string, either before
> processing or afterwards. Thanks for looking into that more.

Exactly.

The change introduces a behaviour change, and we should describe it
in the log message to help future developers know that we did this
change, knowingly that we are changing the behaviour, and believing
that the new behaviour is a better one.

Thanks.

^ permalink raw reply	[flat|nested] 39+ messages in thread

* [PATCH v2 0/3] mailinfo: disallow and complains about NUL character
  2020-04-18  3:54 [PATCH] mailinfo.c::convert_to_utf8: reuse strlen info Đoàn Trần Công Danh
  2020-04-18 19:56 ` Martin Ågren
@ 2020-04-19 11:00 ` Đoàn Trần Công Danh
  2020-04-19 11:00   ` [PATCH v2 1/3] t4254: merge 2 steps of a single test Đoàn Trần Công Danh
                     ` (2 more replies)
  2020-04-20 23:54 ` [PATCH v3 0/3] Disallow NUL character in mailinfo Đoàn Trần Công Danh
  2 siblings, 3 replies; 39+ messages in thread
From: Đoàn Trần Công Danh @ 2020-04-19 11:00 UTC (permalink / raw)
  To: git
  Cc: Đoàn Trần Công Danh, Junio C Hamano,
	Martin Ågren


On 2020-04-18 22:32:30-0700, Junio C Hamano <gitster@pobox.com> wrote:
> Martin Ågren <martin.agren@gmail.com> writes:
>
> > My knee-jerk reaction to Junio's question was along the same line:
> > surely if we could have a NUL in there, the current `strlen()` would use
> > it as an excuse to silently truncate the string, either before
> > processing or afterwards. Thanks for looking into that more.
>
> Exactly.
>
> The change introduces a behaviour change, and we should describe it
> in the log message to help future developers know that we did this
> change, knowingly that we are changing the behaviour, and believing
> that the new behaviour is a better one.

Well, I intended to do something else first (because of Thunderbird started
with HTML email and replace space with non-breaking-space).

I made this change without thinking that much :( And I weren't sure if this
change is well received.

I've rephased the commit message, added some test, and disallow another NUL
characters mailinfo.

-------------------8<------------------------

As of current state, Git disallow NUL character in commit message.

This indirectly disallow NUL in the body of commit message that sent by email
in UTF-8 encoding.

In those below cases:

* NUL character in header field (be it Subject, Author, Email, etc...)
* NUL in the body of commit message that originally sent in other than UTF-8
  encoding

Git silently accepts them, albeit, truncate at the NUL character.

Those email is clearly not generated by Git, they must be crafted.

Complains loudly about those NUL characters.

Đoàn Trần Công Danh (3):
  t4254: merge 2 steps of a single test
  mailinfo.c::convert_to_utf8: reuse strlen info
  mailinfo: disallow NUL character in mail's header

 mailinfo.c            | 11 +++++++--
 t/t4254-am-corrupt.sh | 52 +++++++++++++++++++++++++++++++++++++++----
 2 files changed, 57 insertions(+), 6 deletions(-)

Range-diff against v1:
-:  ---------- > 1:  d1bc31e692 t4254: merge 2 steps of a single test
1:  9ce4b7c350 ! 2:  e3e542d388 mailinfo.c::convert_to_utf8: reuse strlen info
    @@ Commit message
         We're passing buffer from strbuf to reencode_string,
         which will call strlen(3) on that buffer,
         and discard the length of newly created buffer.
    -
         Then, we compute the length of the return buffer to attach to strbuf.
     
    +    During this process, we introduce a discrimination between mail
    +    originally written in utf-8 and other encoding.
    +
    +    * if the email was written in utf-8, we leave it as is. If there is
    +      a NUL character in that line, we complains loudly:
    +
    +            error: a NUL byte in commit log message not allowed.
    +
    +    * if the email was written in other encoding, we truncate the data as
    +      the NUL character in that line, then we used the truncated line for
    +      the metadata.
    +
         We can do better by reusing all the available information,
         and call the underlying lower level function that will be called
    -    indirectly by reencode_string.
    +    indirectly by reencode_string. By doing this, we will also postpone
    +    the NUL character processing to the commit step, which will
    +    complains about the faulty metadata.
     
         Signed-off-by: Đoàn Trần Công Danh <congdanhqx@gmail.com>
     
    @@ mailinfo.c: static int convert_to_utf8(struct mailinfo *mi,
      	return 0;
      }
      
    +
    + ## t/t4254-am-corrupt.sh ##
    +@@
    + test_description='git am with corrupt input'
    + . ./test-lib.sh
    + 
    ++write_nul_patch() {
    ++	space=' '
    ++	qNUL=
    ++	case "$1" in
    ++		subject) qNUL='=00' ;;
    ++	esac
    ++	cat <<-EOF
    ++	From ec7364544f690c560304f5a5de9428ea3b978b26 Mon Sep 17 00:00:00 2001
    ++	From: A U Thor <author@example.com>
    ++	Date: Sun, 19 Apr 2020 13:42:07 +0700
    ++	Subject: [PATCH] =?ISO-8859-1?q?=C4=CB${qNUL}=D1=CF=D6?=
    ++	MIME-Version: 1.0
    ++	Content-Type: text/plain; charset=ISO-8859-1
    ++	Content-Transfer-Encoding: 8bit
    ++
    ++	EOF
    ++	if test "$1" = body
    ++	then
    ++		printf "%s\0%s\n" abc def
    ++	fi
    ++	cat <<-\EOF
    ++	---
    ++	diff --git a/afile b/afile
    ++	new file mode 100644
    ++	index 0000000000..e69de29bb2
    ++	--$space
    ++	2.26.1
    ++	EOF
    ++}
    ++
    + test_expect_success setup '
    + 	# Note the missing "+++" line:
    + 	cat >bad-patch.diff <<-\EOF &&
    +@@ t/t4254-am-corrupt.sh: test_expect_success 'try to apply corrupted patch' '
    + 	test_i18ncmp expected actual
    + '
    + 
    ++test_expect_success "NUL in commit message's body" '
    ++	test_when_finished "git am --abort" &&
    ++	write_nul_patch body >body.patch &&
    ++	test_must_fail git am body.patch 2>err &&
    ++	grep "a NUL byte in commit log message not allowed" err
    ++'
    ++
    ++test_expect_failure "NUL in commit message's header" '
    ++	test_when_finished "git am --abort" &&
    ++	write_nul_patch subject >subject.patch &&
    ++	test_must_fail git am subject.patch 2>err &&
    ++	grep "a NUL byte in Subject is not allowed" err
    ++'
    ++
    + test_done
-:  ---------- > 3:  cb96947b36 mailinfo: disallow NUL character in mail's header
-- 
2.26.1.301.g55bc3eb7cb


^ permalink raw reply	[flat|nested] 39+ messages in thread

* [PATCH v2 1/3] t4254: merge 2 steps of a single test
  2020-04-19 11:00 ` [PATCH v2 0/3] mailinfo: disallow and complains about NUL character Đoàn Trần Công Danh
@ 2020-04-19 11:00   ` Đoàn Trần Công Danh
  2020-04-19 12:25     ` Martin Ågren
  2020-04-19 11:00   ` [PATCH v2 2/3] mailinfo.c::convert_to_utf8: reuse strlen info Đoàn Trần Công Danh
  2020-04-19 11:00   ` [PATCH v2 3/3] mailinfo: disallow NUL character in mail's header Đoàn Trần Công Danh
  2 siblings, 1 reply; 39+ messages in thread
From: Đoàn Trần Công Danh @ 2020-04-19 11:00 UTC (permalink / raw)
  To: git; +Cc: Đoàn Trần Công Danh

While we are at it, make sure we run a clean up after testing.
In a later patch, we will test for more corrupted patch.

Signed-off-by: Đoàn Trần Công Danh <congdanhqx@gmail.com>
---
 t/t4254-am-corrupt.sh | 6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/t/t4254-am-corrupt.sh b/t/t4254-am-corrupt.sh
index fd3bdbfe2c..ddd35498db 100755
--- a/t/t4254-am-corrupt.sh
+++ b/t/t4254-am-corrupt.sh
@@ -25,10 +25,8 @@ test_expect_success setup '
 #   fatal: unable to write file '(null)' mode 100644: Bad address
 # Also, it had the unwanted side-effect of deleting f.
 test_expect_success 'try to apply corrupted patch' '
-	test_must_fail git -c advice.amWorkDir=false am bad-patch.diff 2>actual
-'
-
-test_expect_success 'compare diagnostic; ensure file is still here' '
+	test_when_finished "git am --abort" &&
+	test_must_fail git -c advice.amWorkDir=false am bad-patch.diff 2>actual &&
 	echo "error: git diff header lacks filename information (line 4)" >expected &&
 	test_path_is_file f &&
 	test_i18ncmp expected actual
-- 
2.26.1.301.g55bc3eb7cb


^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH v2 2/3] mailinfo.c::convert_to_utf8: reuse strlen info
  2020-04-19 11:00 ` [PATCH v2 0/3] mailinfo: disallow and complains about NUL character Đoàn Trần Công Danh
  2020-04-19 11:00   ` [PATCH v2 1/3] t4254: merge 2 steps of a single test Đoàn Trần Công Danh
@ 2020-04-19 11:00   ` Đoàn Trần Công Danh
  2020-04-19 12:29     ` Martin Ågren
  2020-04-20 19:59     ` Junio C Hamano
  2020-04-19 11:00   ` [PATCH v2 3/3] mailinfo: disallow NUL character in mail's header Đoàn Trần Công Danh
  2 siblings, 2 replies; 39+ messages in thread
From: Đoàn Trần Công Danh @ 2020-04-19 11:00 UTC (permalink / raw)
  To: git; +Cc: Đoàn Trần Công Danh

We're passing buffer from strbuf to reencode_string,
which will call strlen(3) on that buffer,
and discard the length of newly created buffer.
Then, we compute the length of the return buffer to attach to strbuf.

During this process, we introduce a discrimination between mail
originally written in utf-8 and other encoding.

* if the email was written in utf-8, we leave it as is. If there is
  a NUL character in that line, we complains loudly:

  	error: a NUL byte in commit log message not allowed.

* if the email was written in other encoding, we truncate the data as
  the NUL character in that line, then we used the truncated line for
  the metadata.

We can do better by reusing all the available information,
and call the underlying lower level function that will be called
indirectly by reencode_string. By doing this, we will also postpone
the NUL character processing to the commit step, which will
complains about the faulty metadata.

Signed-off-by: Đoàn Trần Công Danh <congdanhqx@gmail.com>
---
 mailinfo.c            |  6 ++++--
 t/t4254-am-corrupt.sh | 44 +++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 48 insertions(+), 2 deletions(-)

diff --git a/mailinfo.c b/mailinfo.c
index 742fa376ab..0e9911df6d 100644
--- a/mailinfo.c
+++ b/mailinfo.c
@@ -447,19 +447,21 @@ static int convert_to_utf8(struct mailinfo *mi,
 			   struct strbuf *line, const char *charset)
 {
 	char *out;
+	size_t out_len;
 
 	if (!mi->metainfo_charset || !charset || !*charset)
 		return 0;
 
 	if (same_encoding(mi->metainfo_charset, charset))
 		return 0;
-	out = reencode_string(line->buf, mi->metainfo_charset, charset);
+	out = reencode_string_len(line->buf, line->len,
+				  mi->metainfo_charset, charset, &out_len);
 	if (!out) {
 		mi->input_error = -1;
 		return error("cannot convert from %s to %s",
 			     charset, mi->metainfo_charset);
 	}
-	strbuf_attach(line, out, strlen(out), strlen(out));
+	strbuf_attach(line, out, out_len, out_len);
 	return 0;
 }
 
diff --git a/t/t4254-am-corrupt.sh b/t/t4254-am-corrupt.sh
index ddd35498db..98cda32d0a 100755
--- a/t/t4254-am-corrupt.sh
+++ b/t/t4254-am-corrupt.sh
@@ -3,6 +3,36 @@
 test_description='git am with corrupt input'
 . ./test-lib.sh
 
+write_nul_patch() {
+	space=' '
+	qNUL=
+	case "$1" in
+		subject) qNUL='=00' ;;
+	esac
+	cat <<-EOF
+	From ec7364544f690c560304f5a5de9428ea3b978b26 Mon Sep 17 00:00:00 2001
+	From: A U Thor <author@example.com>
+	Date: Sun, 19 Apr 2020 13:42:07 +0700
+	Subject: [PATCH] =?ISO-8859-1?q?=C4=CB${qNUL}=D1=CF=D6?=
+	MIME-Version: 1.0
+	Content-Type: text/plain; charset=ISO-8859-1
+	Content-Transfer-Encoding: 8bit
+
+	EOF
+	if test "$1" = body
+	then
+		printf "%s\0%s\n" abc def
+	fi
+	cat <<-\EOF
+	---
+	diff --git a/afile b/afile
+	new file mode 100644
+	index 0000000000..e69de29bb2
+	--$space
+	2.26.1
+	EOF
+}
+
 test_expect_success setup '
 	# Note the missing "+++" line:
 	cat >bad-patch.diff <<-\EOF &&
@@ -32,4 +62,18 @@ test_expect_success 'try to apply corrupted patch' '
 	test_i18ncmp expected actual
 '
 
+test_expect_success "NUL in commit message's body" '
+	test_when_finished "git am --abort" &&
+	write_nul_patch body >body.patch &&
+	test_must_fail git am body.patch 2>err &&
+	grep "a NUL byte in commit log message not allowed" err
+'
+
+test_expect_failure "NUL in commit message's header" '
+	test_when_finished "git am --abort" &&
+	write_nul_patch subject >subject.patch &&
+	test_must_fail git am subject.patch 2>err &&
+	grep "a NUL byte in Subject is not allowed" err
+'
+
 test_done
-- 
2.26.1.301.g55bc3eb7cb


^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH v2 3/3] mailinfo: disallow NUL character in mail's header
  2020-04-19 11:00 ` [PATCH v2 0/3] mailinfo: disallow and complains about NUL character Đoàn Trần Công Danh
  2020-04-19 11:00   ` [PATCH v2 1/3] t4254: merge 2 steps of a single test Đoàn Trần Công Danh
  2020-04-19 11:00   ` [PATCH v2 2/3] mailinfo.c::convert_to_utf8: reuse strlen info Đoàn Trần Công Danh
@ 2020-04-19 11:00   ` Đoàn Trần Công Danh
  2020-04-19 12:30     ` Martin Ågren
  2 siblings, 1 reply; 39+ messages in thread
From: Đoàn Trần Công Danh @ 2020-04-19 11:00 UTC (permalink / raw)
  To: git; +Cc: Đoàn Trần Công Danh

Signed-off-by: Đoàn Trần Công Danh <congdanhqx@gmail.com>
---
 mailinfo.c            | 5 +++++
 t/t4254-am-corrupt.sh | 4 +++-
 2 files changed, 8 insertions(+), 1 deletion(-)

diff --git a/mailinfo.c b/mailinfo.c
index 0e9911df6d..c31991e621 100644
--- a/mailinfo.c
+++ b/mailinfo.c
@@ -1138,6 +1138,11 @@ static void handle_info(struct mailinfo *mi)
 		else
 			continue;
 
+		if (memchr(hdr->buf, '\0', hdr->len)) {
+			error("a NUL byte in %s is not allowed.", header[i]);
+			mi->input_error = -1;
+		}
+
 		if (!strcmp(header[i], "Subject")) {
 			if (!mi->keep_subject) {
 				cleanup_subject(mi, hdr);
diff --git a/t/t4254-am-corrupt.sh b/t/t4254-am-corrupt.sh
index 98cda32d0a..d9d1ac6c7d 100755
--- a/t/t4254-am-corrupt.sh
+++ b/t/t4254-am-corrupt.sh
@@ -69,9 +69,11 @@ test_expect_success "NUL in commit message's body" '
 	grep "a NUL byte in commit log message not allowed" err
 '
 
-test_expect_failure "NUL in commit message's header" '
+test_expect_success "NUL in commit message's header" '
 	test_when_finished "git am --abort" &&
 	write_nul_patch subject >subject.patch &&
+	test_must_fail git mailinfo msg patch <subject.patch 2>err &&
+	grep "a NUL byte in Subject is not allowed" err &&
 	test_must_fail git am subject.patch 2>err &&
 	grep "a NUL byte in Subject is not allowed" err
 '
-- 
2.26.1.301.g55bc3eb7cb


^ permalink raw reply related	[flat|nested] 39+ messages in thread

* Re: [PATCH v2 1/3] t4254: merge 2 steps of a single test
  2020-04-19 11:00   ` [PATCH v2 1/3] t4254: merge 2 steps of a single test Đoàn Trần Công Danh
@ 2020-04-19 12:25     ` Martin Ågren
  2020-04-19 14:17       ` Danh Doan
  0 siblings, 1 reply; 39+ messages in thread
From: Martin Ågren @ 2020-04-19 12:25 UTC (permalink / raw)
  To: Đoàn Trần Công Danh; +Cc: Git Mailing List

On Sun, 19 Apr 2020 at 13:03, Đoàn Trần Công Danh <congdanhqx@gmail.com> wrote:
>  #   fatal: unable to write file '(null)' mode 100644: Bad address
>  # Also, it had the unwanted side-effect of deleting f.

(This comment looks a bit unnecessary, but that's not new.)

>  test_expect_success 'try to apply corrupted patch' '
> -       test_must_fail git -c advice.amWorkDir=false am bad-patch.diff 2>actual
> -'
> -
> -test_expect_success 'compare diagnostic; ensure file is still here' '
> +       test_when_finished "git am --abort" &&
> +       test_must_fail git -c advice.amWorkDir=false am bad-patch.diff 2>actual &&
>         echo "error: git diff header lacks filename information (line 4)" >expected &&
>         test_path_is_file f &&
>         test_i18ncmp expected actual

Makes sense. The first mini-test used to just set up "actual" and there
was only one test using it, so it's hard to argue it was a "setup"
phase. This looks better.


Martin

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v2 2/3] mailinfo.c::convert_to_utf8: reuse strlen info
  2020-04-19 11:00   ` [PATCH v2 2/3] mailinfo.c::convert_to_utf8: reuse strlen info Đoàn Trần Công Danh
@ 2020-04-19 12:29     ` Martin Ågren
  2020-04-19 14:16       ` Danh Doan
  2020-04-20 19:59     ` Junio C Hamano
  1 sibling, 1 reply; 39+ messages in thread
From: Martin Ågren @ 2020-04-19 12:29 UTC (permalink / raw)
  To: Đoàn Trần Công Danh; +Cc: Git Mailing List

On Sun, 19 Apr 2020 at 13:03, Đoàn Trần Công Danh <congdanhqx@gmail.com> wrote:
>
> We're passing buffer from strbuf to reencode_string,
> which will call strlen(3) on that buffer,
> and discard the length of newly created buffer.
> Then, we compute the length of the return buffer to attach to strbuf.
>
> During this process, we introduce a discrimination between mail
> originally written in utf-8 and other encoding.
>
> * if the email was written in utf-8, we leave it as is. If there is
>   a NUL character in that line, we complains loudly:
>
>         error: a NUL byte in commit log message not allowed.
>
> * if the email was written in other encoding, we truncate the data as
>   the NUL character in that line, then we used the truncated line for
>   the metadata.
>
> We can do better by reusing all the available information,
> and call the underlying lower level function that will be called
> indirectly by reencode_string. By doing this, we will also postpone
> the NUL character processing to the commit step, which will
> complains about the faulty metadata.
>
> Signed-off-by: Đoàn Trần Công Danh <congdanhqx@gmail.com>

This makes sense to me.

I think the subject could be adapted though? Now it's not about "reusing
info" anymore, it's about using *other*, *better* info. Maybe

  mailinfo.c: avoid strlen on strings that might contain NUL

? Could probably be improved further..

> --- a/mailinfo.c
> +++ b/mailinfo.c
> @@ -447,19 +447,21 @@ static int convert_to_utf8(struct mailinfo *mi,
>                            struct strbuf *line, const char *charset)
>  {
>         char *out;
> +       size_t out_len;
>
>         if (!mi->metainfo_charset || !charset || !*charset)
>                 return 0;
>
>         if (same_encoding(mi->metainfo_charset, charset))
>                 return 0;
> -       out = reencode_string(line->buf, mi->metainfo_charset, charset);
> +       out = reencode_string_len(line->buf, line->len,
> +                                 mi->metainfo_charset, charset, &out_len);
>         if (!out) {
>                 mi->input_error = -1;
>                 return error("cannot convert from %s to %s",
>                              charset, mi->metainfo_charset);
>         }
> -       strbuf_attach(line, out, strlen(out), strlen(out));
> +       strbuf_attach(line, out, out_len, out_len);
>         return 0;
>  }

Same diff as before, ok.

> +write_nul_patch() {
> +       space=' '
> +       qNUL=
> +       case "$1" in
> +               subject) qNUL='=00' ;;
> +       esac

Here it's case/esac...

> +       cat <<-EOF
> +       From ec7364544f690c560304f5a5de9428ea3b978b26 Mon Sep 17 00:00:00 2001
> +       From: A U Thor <author@example.com>
> +       Date: Sun, 19 Apr 2020 13:42:07 +0700
> +       Subject: [PATCH] =?ISO-8859-1?q?=C4=CB${qNUL}=D1=CF=D6?=
> +       MIME-Version: 1.0
> +       Content-Type: text/plain; charset=ISO-8859-1
> +       Content-Transfer-Encoding: 8bit
> +
> +       EOF
> +       if test "$1" = body
> +       then
> +               printf "%s\0%s\n" abc def
> +       fi

Here it's if/fi. Looks a bit inconsistent.

I suppose you tried to have a case for "body" above but couldn't get it
to work? Somehow, it would seem more consistent to have a qSubject and
qBody and handle them the same way, but maybe that's not possible?
Maybe using q_to_nul to create qBody containing a NUL?

> +       cat <<-\EOF
> +       ---
> +       diff --git a/afile b/afile
> +       new file mode 100644
> +       index 0000000000..e69de29bb2
> +       --$space
> +       2.26.1
> +       EOF
> +}

I think this helper function should use &&-chaining.

> +
>  test_expect_success setup '
>         # Note the missing "+++" line:
>         cat >bad-patch.diff <<-\EOF &&
> @@ -32,4 +62,18 @@ test_expect_success 'try to apply corrupted patch' '
>         test_i18ncmp expected actual
>  '
>
> +test_expect_success "NUL in commit message's body" '
> +       test_when_finished "git am --abort" &&
> +       write_nul_patch body >body.patch &&
> +       test_must_fail git am body.patch 2>err &&
> +       grep "a NUL byte in commit log message not allowed" err
> +'

Makes sense.

> +
> +test_expect_failure "NUL in commit message's header" '
> +       test_when_finished "git am --abort" &&
> +       write_nul_patch subject >subject.patch &&
> +       test_must_fail git am subject.patch 2>err &&
> +       grep "a NUL byte in Subject is not allowed" err
> +'

Interesting. :-)


Martin

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v2 3/3] mailinfo: disallow NUL character in mail's header
  2020-04-19 11:00   ` [PATCH v2 3/3] mailinfo: disallow NUL character in mail's header Đoàn Trần Công Danh
@ 2020-04-19 12:30     ` Martin Ågren
  2020-04-19 14:24       ` Danh Doan
  0 siblings, 1 reply; 39+ messages in thread
From: Martin Ågren @ 2020-04-19 12:30 UTC (permalink / raw)
  To: Đoàn Trần Công Danh; +Cc: Git Mailing List

On Sun, 19 Apr 2020 at 13:05, Đoàn Trần Công Danh <congdanhqx@gmail.com> wrote:
> --- a/t/t4254-am-corrupt.sh
> +++ b/t/t4254-am-corrupt.sh
> @@ -69,9 +69,11 @@ test_expect_success "NUL in commit message's body" '
>         grep "a NUL byte in commit log message not allowed" err
>  '
>
> -test_expect_failure "NUL in commit message's header" '
> +test_expect_success "NUL in commit message's header" '
>         test_when_finished "git am --abort" &&
>         write_nul_patch subject >subject.patch &&
> +       test_must_fail git mailinfo msg patch <subject.patch 2>err &&
> +       grep "a NUL byte in Subject is not allowed" err &&
>         test_must_fail git am subject.patch 2>err &&
>         grep "a NUL byte in Subject is not allowed" err
>  '

We used to fail for some reason and it's sort of clear from the size
of the test what that reason is. Now that we flip "failure" to "success"
we can add some more stuff here that works. Makes sense. Of course,
somewhere is a line for stuffing too much into the test, i.e., you'll
reach a point where you should have separate tests, but I think this is
ok.


Martin

^ permalink raw reply	[flat|nested] 39+ messages in thread

* [PATCH 0/4] strbuf: fix doc for `strbuf_attach()` and avoid it
  2020-04-18 20:18   ` [PATCH 0/6] strbuf: simplify `strbuf_attach()` usage Martin Ågren
                       ` (6 preceding siblings ...)
  2020-04-19  4:44     ` [PATCH 0/6] " Martin Ågren
@ 2020-04-19 12:32     ` Martin Ågren
  2020-04-19 12:32       ` [PATCH 1/4] strbuf: fix doc for `strbuf_attach()` Martin Ågren
                         ` (3 more replies)
  7 siblings, 4 replies; 39+ messages in thread
From: Martin Ågren @ 2020-04-19 12:32 UTC (permalink / raw)
  To: git; +Cc: Đoàn Trần Công Danh

This is v2 of my attempt to look into how `strbuf_attach()` works, how
it's documented and how it's actually used.

This is based on top of Danh's series [1]. Both their patch 2/3 and my
4/4 reason about "NUL" at a particular spot. In short, they conflict
about whether to use function "A" or "B" at one spot and a merge should
(in my opinion) use function "C". That seemed odd enough to me that I
decided to build on top instead.

`strbuf_attach()` has always handled `mem == len`. This was intentional.
When the function was eventually documented, this usage was incorrectly
documented as being wrong.

That documentation didn't stop lots of callers from using it
"incorrectly", though. I first fix the documentation. I then try to
avoid such usage anyway, partly because it can be confusing, partly
because if we know we have enough memory, there's no need to reallocate
a larger buffer.

Martin

[1] https://lore.kernel.org/git/cover.1587289680.git.congdanhqx@gmail.com/

Martin Ågren (4):
  strbuf: fix doc for `strbuf_attach()`
  strbuf: introduce `strbuf_attachstr_len()`
  strbuf: introduce `strbuf_attachstr()`
  strbuf_attach: prefer `strbuf_attachstr_len()`

 strbuf.h             | 40 ++++++++++++++++++++++++++++++++++++----
 apply.c              |  2 +-
 archive.c            |  2 +-
 blame.c              |  2 +-
 builtin/am.c         |  2 +-
 convert.c            |  4 ++--
 fast-import.c        |  2 +-
 imap-send.c          |  2 +-
 mailinfo.c           |  2 +-
 merge-recursive.c    |  2 +-
 path.c               |  3 +--
 pretty.c             |  4 ++--
 refs/files-backend.c |  3 +--
 trailer.c            |  2 +-
 14 files changed, 51 insertions(+), 21 deletions(-)

-- 
2.26.1


^ permalink raw reply	[flat|nested] 39+ messages in thread

* [PATCH 1/4] strbuf: fix doc for `strbuf_attach()`
  2020-04-19 12:32     ` [PATCH 0/4] strbuf: fix doc for `strbuf_attach()` and avoid it Martin Ågren
@ 2020-04-19 12:32       ` Martin Ågren
  2020-04-20 17:30         ` Junio C Hamano
  2020-04-19 12:32       ` [PATCH 2/4] strbuf: introduce `strbuf_attachstr_len()` Martin Ågren
                         ` (2 subsequent siblings)
  3 siblings, 1 reply; 39+ messages in thread
From: Martin Ågren @ 2020-04-19 12:32 UTC (permalink / raw)
  To: git; +Cc: Đoàn Trần Công Danh

Commit 917c9a7133 ("New strbuf APIs: splice and attach.", 2007-09-15)
added `strbuf_attach()`. In the commit message, it is explicitly
discussed how using `mem == len` is ok, although possibly costly. When
this function was documented in dd613e6b87 ("Strbuf documentation:
document most functions", 2008-06-04), this distinction was missed and
the documentation simply forbids this case.

The function handles reallocation and truncation, yet the docs say that
the "amount [of allocated memory] must be larger than the string length,
because the string you pass is supposed to be a NUL-terminated string".

Several callers pass in `mem == len` meaning they are compatible with
the implementation and the original commit message, but not with the
documented behavior. In fact, compared to the documentation, they look
outright dangerous.

Later commits will convert most of those call sites to use new, simpler
interfaces, but in at least one instance we really do want to use this
possibility of passing the same value twice: In rerere.c, it is not safe
to pass in "len, len + 1". Doing so makes `strbuf_attach()` write a
trailing NUL at `result.ptr[len]` and running our test suite with
AddressSanitizer will spot an out-of-bounds write due to this.

That is, rerere.c really has use for the flexibility that `mem == len`
gives us. Let's update the documentation to clarify that this is ok.

Signed-off-by: Martin Ågren <martin.agren@gmail.com>
---
 strbuf.h | 10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/strbuf.h b/strbuf.h
index ce8e49c0b2..2a462f70cc 100644
--- a/strbuf.h
+++ b/strbuf.h
@@ -112,10 +112,12 @@ char *strbuf_detach(struct strbuf *sb, size_t *sz);
 /**
  * Attach a string to a buffer. You should specify the string to attach,
  * the current length of the string and the amount of allocated memory.
- * The amount must be larger than the string length, because the string you
- * pass is supposed to be a NUL-terminated string.  This string _must_ be
- * malloc()ed, and after attaching, the pointer cannot be relied upon
- * anymore, and neither be free()d directly.
+ * The amount must be at least as large as the string length. If the two
+ * lengths are equal, reallocation will be handled as appropriate and in
+ * any case, the string will be NUL-truncated as implied by `len`.
+ *
+ * This string _must_ be malloc()ed, and after attaching, the pointer
+ * cannot be relied upon anymore, and neither be free()d directly.
  */
 void strbuf_attach(struct strbuf *sb, void *str, size_t len, size_t mem);
 
-- 
2.26.1


^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH 2/4] strbuf: introduce `strbuf_attachstr_len()`
  2020-04-19 12:32     ` [PATCH 0/4] strbuf: fix doc for `strbuf_attach()` and avoid it Martin Ågren
  2020-04-19 12:32       ` [PATCH 1/4] strbuf: fix doc for `strbuf_attach()` Martin Ågren
@ 2020-04-19 12:32       ` Martin Ågren
  2020-04-19 12:32       ` [PATCH 3/4] strbuf: introduce `strbuf_attachstr()` Martin Ågren
  2020-04-19 12:32       ` [PATCH 4/4] strbuf_attach: prefer `strbuf_attachstr_len()` Martin Ågren
  3 siblings, 0 replies; 39+ messages in thread
From: Martin Ågren @ 2020-04-19 12:32 UTC (permalink / raw)
  To: git; +Cc: Đoàn Trần Công Danh

Most callers of `strbuf_attach()` provide `len + 1` for the `mem`
parameter. That's a bit tedious and, as we will see in the next commit,
also fairly prone to mistakes.

Provide `strbuf_attachstr_len()` for this common case to simplify
several callers of `strbuf_attach()` by making this new function simply
assume that the size of the allocated buffer is one greater than the
length.

We know that the string has already been allocated with space for the
trailing NUL, meaning it is safe to compute `len + 1`.

Disallow NULL-pointers entirely. We could handle `(buf, len) == (NULL,
0)` specially, but none of the callers we convert here seem to worry
about such a case. Handling this corner case specially can still be done
using the regular `strbuf_attach()`.

Another edge case is where someone doesn't have a NUL at `buf[len]`,
i.e., they are (hopefully) trying to attach a substring of some larger
string. One could argue that they should be using `strbuf_attach()` and
that this is BUG-worthy, or that it would be easy enough for us to place
a NUL there for robustness and carry on. This commit does the latter,
but does not have a strong opinion about it.

Signed-off-by: Martin Ågren <martin.agren@gmail.com>
---
 strbuf.h          | 19 +++++++++++++++++++
 apply.c           |  2 +-
 archive.c         |  2 +-
 blame.c           |  2 +-
 convert.c         |  4 ++--
 imap-send.c       |  2 +-
 merge-recursive.c |  2 +-
 pretty.c          |  2 +-
 8 files changed, 27 insertions(+), 8 deletions(-)

diff --git a/strbuf.h b/strbuf.h
index 2a462f70cc..7d0aeda434 100644
--- a/strbuf.h
+++ b/strbuf.h
@@ -121,6 +121,25 @@ char *strbuf_detach(struct strbuf *sb, size_t *sz);
  */
 void strbuf_attach(struct strbuf *sb, void *str, size_t len, size_t mem);
 
+/**
+ * Attach a string to a buffer. You should specify the string to attach
+ * and its length.
+ *
+ * The amount of allocated memory will be assumed to be one greater than
+ * its length. The string you pass _must_ be a NUL-terminated string.
+ * This string _must_ be malloc()ed, and after attaching, the pointer
+ * cannot be relied upon anymore, nor should it be free()d directly.
+ *
+ * Do _not_ use this to truncate the string. That is, the length really
+ * must be `len` already. To truncate (yet keeping track of the amount
+ * of allocated memory), use `strbuf_attach()`.
+ */
+static inline void strbuf_attachstr_len(struct strbuf *sb,
+					char *str, size_t len)
+{
+	strbuf_attach(sb, str, len, len + 1);
+}
+
 /**
  * Swap the contents of two string buffers.
  */
diff --git a/apply.c b/apply.c
index 144c19aaca..cab4055ea4 100644
--- a/apply.c
+++ b/apply.c
@@ -3251,7 +3251,7 @@ static int read_blob_object(struct strbuf *buf, const struct object_id *oid, uns
 		if (!result)
 			return -1;
 		/* XXX read_sha1_file NUL-terminates */
-		strbuf_attach(buf, result, sz, sz + 1);
+		strbuf_attachstr_len(buf, result, sz);
 	}
 	return 0;
 }
diff --git a/archive.c b/archive.c
index fb39706120..17b8add930 100644
--- a/archive.c
+++ b/archive.c
@@ -89,7 +89,7 @@ void *object_file_to_archive(const struct archiver_args *args,
 		struct strbuf buf = STRBUF_INIT;
 		size_t size = 0;
 
-		strbuf_attach(&buf, buffer, *sizep, *sizep + 1);
+		strbuf_attachstr_len(&buf, buffer, *sizep);
 		convert_to_working_tree(args->repo->index, path, buf.buf, buf.len, &buf, &meta);
 		if (commit)
 			format_subst(commit, buf.buf, buf.len, &buf);
diff --git a/blame.c b/blame.c
index 29770e5c81..12ce104fcb 100644
--- a/blame.c
+++ b/blame.c
@@ -241,7 +241,7 @@ static struct commit *fake_working_tree_commit(struct repository *r,
 		case S_IFREG:
 			if (opt->flags.allow_textconv &&
 			    textconv_object(r, read_from, mode, &null_oid, 0, &buf_ptr, &buf_len))
-				strbuf_attach(&buf, buf_ptr, buf_len, buf_len + 1);
+				strbuf_attachstr_len(&buf, buf_ptr, buf_len);
 			else if (strbuf_read_file(&buf, read_from, st.st_size) != st.st_size)
 				die_errno("cannot open or read '%s'", read_from);
 			break;
diff --git a/convert.c b/convert.c
index 5aa87d45e3..9b3a1218a7 100644
--- a/convert.c
+++ b/convert.c
@@ -467,7 +467,7 @@ static int encode_to_git(const char *path, const char *src, size_t src_len,
 		free(re_src);
 	}
 
-	strbuf_attach(buf, dst, dst_len, dst_len + 1);
+	strbuf_attachstr_len(buf, dst, dst_len);
 	return 1;
 }
 
@@ -492,7 +492,7 @@ static int encode_to_worktree(const char *path, const char *src, size_t src_len,
 		return 0;
 	}
 
-	strbuf_attach(buf, dst, dst_len, dst_len + 1);
+	strbuf_attachstr_len(buf, dst, dst_len);
 	return 1;
 }
 
diff --git a/imap-send.c b/imap-send.c
index 6c54d8c29d..37e5b13e51 100644
--- a/imap-send.c
+++ b/imap-send.c
@@ -1212,7 +1212,7 @@ static void lf_to_crlf(struct strbuf *msg)
 			new_msg[j++] = '\r';
 		lastc = new_msg[j++] = msg->buf[i];
 	}
-	strbuf_attach(msg, new_msg, j, j + 1);
+	strbuf_attachstr_len(msg, new_msg, j);
 }
 
 /*
diff --git a/merge-recursive.c b/merge-recursive.c
index d92e2acf1e..ef259e4b74 100644
--- a/merge-recursive.c
+++ b/merge-recursive.c
@@ -2963,7 +2963,7 @@ static int read_oid_strbuf(struct merge_options *opt,
 		free(buf);
 		return err(opt, _("object %s is not a blob"), oid_to_hex(oid));
 	}
-	strbuf_attach(dst, buf, size, size + 1);
+	strbuf_attachstr_len(dst, buf, size);
 	return 0;
 }
 
diff --git a/pretty.c b/pretty.c
index 28afc701b6..e171830389 100644
--- a/pretty.c
+++ b/pretty.c
@@ -1687,7 +1687,7 @@ void repo_format_commit_message(struct repository *r,
 		char *out = reencode_string_len(sb->buf, sb->len,
 						output_enc, utf8, &outsz);
 		if (out)
-			strbuf_attach(sb, out, outsz, outsz + 1);
+			strbuf_attachstr_len(sb, out, outsz);
 	}
 
 	free(context.commit_encoding);
-- 
2.26.1


^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH 3/4] strbuf: introduce `strbuf_attachstr()`
  2020-04-19 12:32     ` [PATCH 0/4] strbuf: fix doc for `strbuf_attach()` and avoid it Martin Ågren
  2020-04-19 12:32       ` [PATCH 1/4] strbuf: fix doc for `strbuf_attach()` Martin Ågren
  2020-04-19 12:32       ` [PATCH 2/4] strbuf: introduce `strbuf_attachstr_len()` Martin Ågren
@ 2020-04-19 12:32       ` Martin Ågren
  2020-04-20 19:39         ` Junio C Hamano
  2020-04-19 12:32       ` [PATCH 4/4] strbuf_attach: prefer `strbuf_attachstr_len()` Martin Ågren
  3 siblings, 1 reply; 39+ messages in thread
From: Martin Ågren @ 2020-04-19 12:32 UTC (permalink / raw)
  To: git; +Cc: Đoàn Trần Công Danh

Similar to the previous commit, introduce `strbuf_attachstr()` where we
don't even have to pass in the length of the string that we want to
attach. Convert existing callers of `strbuf_attachstr()` that use
`strlen()`.

Note how only one caller passes in `mem == len + 1` and that the others
have been using `strbuf_attach()` in direct contradiction to how it was
(incorrectly) documented up until a few commits ago.

Now that the documentation has been fixed, you might say these are all
fine. But the calling convention of `strbuf_attach()` seems sufficiently
hard to get right that it's probably a good idea to introduce this
helper.

This could help reduce reallocations and memory waste. When we
pessimistically pass in `strlen(foo)` for `mem`, the strbuf will have
`alloc == len` and will do a reallocation, not just to get one more byte
for the NUL (which would have been a no-op), but because we're using
`ALLOC_GROW` under the hood, we will ask for 16 more bytes and another
50% on top of that.

Signed-off-by: Martin Ågren <martin.agren@gmail.com>
---
 strbuf.h             | 11 +++++++++++
 path.c               |  3 +--
 pretty.c             |  2 +-
 refs/files-backend.c |  3 +--
 trailer.c            |  2 +-
 5 files changed, 15 insertions(+), 6 deletions(-)

diff --git a/strbuf.h b/strbuf.h
index 7d0aeda434..32cc15de0c 100644
--- a/strbuf.h
+++ b/strbuf.h
@@ -140,6 +140,17 @@ static inline void strbuf_attachstr_len(struct strbuf *sb,
 	strbuf_attach(sb, str, len, len + 1);
 }
 
+/**
+ * Attach a string to a buffer similar to `strbuf_attachstr_len()`.
+ * Useful if you do not know the length of the string.
+ */
+static inline void strbuf_attachstr(struct strbuf *sb, char *str)
+{
+	size_t len = strlen(str);
+
+	strbuf_attach(sb, str, len, len + 1);
+}
+
 /**
  * Swap the contents of two string buffers.
  */
diff --git a/path.c b/path.c
index 9bd717c307..3cd8fd56b4 100644
--- a/path.c
+++ b/path.c
@@ -815,8 +815,7 @@ const char *enter_repo(const char *path, int strict)
 			char *newpath = expand_user_path(used_path.buf, 0);
 			if (!newpath)
 				return NULL;
-			strbuf_attach(&used_path, newpath, strlen(newpath),
-				      strlen(newpath));
+			strbuf_attachstr(&used_path, newpath);
 		}
 		for (i = 0; suffix[i]; i++) {
 			struct stat st;
diff --git a/pretty.c b/pretty.c
index e171830389..5ecdf0cbb2 100644
--- a/pretty.c
+++ b/pretty.c
@@ -590,7 +590,7 @@ static char *replace_encoding_header(char *buf, const char *encoding)
 		return buf; /* should not happen but be defensive */
 	len = cp + 1 - (buf + start);
 
-	strbuf_attach(&tmp, buf, strlen(buf), strlen(buf) + 1);
+	strbuf_attachstr(&tmp, buf);
 	if (is_encoding_utf8(encoding)) {
 		/* we have re-coded to UTF-8; drop the header */
 		strbuf_remove(&tmp, start, len);
diff --git a/refs/files-backend.c b/refs/files-backend.c
index 561c33ac8a..eb058d85b6 100644
--- a/refs/files-backend.c
+++ b/refs/files-backend.c
@@ -1511,10 +1511,9 @@ static int commit_ref(struct ref_lock *lock)
 		 * the lockfile to. Hopefully it is empty; try to
 		 * delete it.
 		 */
-		size_t len = strlen(path);
 		struct strbuf sb_path = STRBUF_INIT;
 
-		strbuf_attach(&sb_path, path, len, len);
+		strbuf_attachstr(&sb_path, path);
 
 		/*
 		 * If this fails, commit_lock_file() will also fail
diff --git a/trailer.c b/trailer.c
index 0c414f2fed..56c4027943 100644
--- a/trailer.c
+++ b/trailer.c
@@ -1095,7 +1095,7 @@ void trailer_info_get(struct trailer_info *info, const char *str,
 	for (ptr = trailer_lines; *ptr; ptr++) {
 		if (last && isspace((*ptr)->buf[0])) {
 			struct strbuf sb = STRBUF_INIT;
-			strbuf_attach(&sb, *last, strlen(*last), strlen(*last));
+			strbuf_attachstr(&sb, *last);
 			strbuf_addbuf(&sb, *ptr);
 			*last = strbuf_detach(&sb, NULL);
 			continue;
-- 
2.26.1


^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH 4/4] strbuf_attach: prefer `strbuf_attachstr_len()`
  2020-04-19 12:32     ` [PATCH 0/4] strbuf: fix doc for `strbuf_attach()` and avoid it Martin Ågren
                         ` (2 preceding siblings ...)
  2020-04-19 12:32       ` [PATCH 3/4] strbuf: introduce `strbuf_attachstr()` Martin Ågren
@ 2020-04-19 12:32       ` Martin Ågren
  3 siblings, 0 replies; 39+ messages in thread
From: Martin Ågren @ 2020-04-19 12:32 UTC (permalink / raw)
  To: git; +Cc: Đoàn Trần Công Danh

After the last few commits, we don't have many users of
`strbuf_attach()`.

Convert the sites in builtin/am.c, strbuf.c and mailinfo.c. They pass in
the same length twice for `len` and `mem` and will eventually hit
`realloc(3)`, which will be a no-op. The string in am.c has been
constructed using the strbuf machinery in `read_commit_msg()`. In
strbuf.c, we've used `reencode_string_iconv()`.  In mailinfo.c, we used
`reencode_string_len()`. So in all cases, we really do have an extra
byte at the end with a NUL.

As explained in the previous commit, it's not just that we avoid calling
`realloc()` to make room for a single NUL byte that we already have, we
avoid asking it for 16 more bytes and another 50% on top of that.

Signed-off-by: Martin Ågren <martin.agren@gmail.com>
---
 builtin/am.c  | 2 +-
 fast-import.c | 2 +-
 mailinfo.c    | 2 +-
 3 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/builtin/am.c b/builtin/am.c
index e3dfd93c25..d777855c98 100644
--- a/builtin/am.c
+++ b/builtin/am.c
@@ -1101,7 +1101,7 @@ static void am_append_signoff(struct am_state *state)
 {
 	struct strbuf sb = STRBUF_INIT;
 
-	strbuf_attach(&sb, state->msg, state->msg_len, state->msg_len);
+	strbuf_attachstr_len(&sb, state->msg, state->msg_len);
 	append_signoff(&sb, 0, 0);
 	state->msg = strbuf_detach(&sb, &state->msg_len);
 }
diff --git a/fast-import.c b/fast-import.c
index 202dda11a6..28fbc4792b 100644
--- a/fast-import.c
+++ b/fast-import.c
@@ -2946,7 +2946,7 @@ static void cat_blob(struct object_entry *oe, struct object_id *oid)
 	cat_blob_write("\n", 1);
 	if (oe && oe->pack_id == pack_id) {
 		last_blob.offset = oe->idx.offset;
-		strbuf_attach(&last_blob.data, buf, size, size);
+		strbuf_attachstr_len(&last_blob.data, buf, size);
 		last_blob.depth = oe->depth;
 	} else
 		free(buf);
diff --git a/mailinfo.c b/mailinfo.c
index c31991e621..942c363bfd 100644
--- a/mailinfo.c
+++ b/mailinfo.c
@@ -461,7 +461,7 @@ static int convert_to_utf8(struct mailinfo *mi,
 		return error("cannot convert from %s to %s",
 			     charset, mi->metainfo_charset);
 	}
-	strbuf_attach(line, out, out_len, out_len);
+	strbuf_attachstr_len(line, out, out_len);
 	return 0;
 }
 
-- 
2.26.1


^ permalink raw reply related	[flat|nested] 39+ messages in thread

* Re: [PATCH v2 2/3] mailinfo.c::convert_to_utf8: reuse strlen info
  2020-04-19 12:29     ` Martin Ågren
@ 2020-04-19 14:16       ` Danh Doan
  0 siblings, 0 replies; 39+ messages in thread
From: Danh Doan @ 2020-04-19 14:16 UTC (permalink / raw)
  To: Martin Ågren; +Cc: Git Mailing List

On 2020-04-19 14:29:06+0200, Martin Ågren <martin.agren@gmail.com> wrote:
> I think the subject could be adapted though? Now it's not about "reusing
> info" anymore, it's about using *other*, *better* info. Maybe
> 
>   mailinfo.c: avoid strlen on strings that might contain NUL

This is better than mine.
I couldn't think about any other alternative.

> 
> > --- a/mailinfo.c
> > +++ b/mailinfo.c
> > @@ -447,19 +447,21 @@ static int convert_to_utf8(struct mailinfo *mi,
> >                            struct strbuf *line, const char *charset)
> >  {
> >         char *out;
> > +       size_t out_len;
> >
> >         if (!mi->metainfo_charset || !charset || !*charset)
> >                 return 0;
> >
> >         if (same_encoding(mi->metainfo_charset, charset))
> >                 return 0;
> > -       out = reencode_string(line->buf, mi->metainfo_charset, charset);
> > +       out = reencode_string_len(line->buf, line->len,
> > +                                 mi->metainfo_charset, charset, &out_len);
> >         if (!out) {
> >                 mi->input_error = -1;
> >                 return error("cannot convert from %s to %s",
> >                              charset, mi->metainfo_charset);
> >         }
> > -       strbuf_attach(line, out, strlen(out), strlen(out));
> > +       strbuf_attach(line, out, out_len, out_len);
> >         return 0;
> >  }
> 
> Same diff as before, ok.
> 
> > +write_nul_patch() {
> > +       space=' '
> > +       qNUL=
> > +       case "$1" in
> > +               subject) qNUL='=00' ;;
> > +       esac
> 
> Here it's case/esac...
> 
> > +       cat <<-EOF
> > +       From ec7364544f690c560304f5a5de9428ea3b978b26 Mon Sep 17 00:00:00 2001
> > +       From: A U Thor <author@example.com>
> > +       Date: Sun, 19 Apr 2020 13:42:07 +0700
> > +       Subject: [PATCH] =?ISO-8859-1?q?=C4=CB${qNUL}=D1=CF=D6?=
> > +       MIME-Version: 1.0
> > +       Content-Type: text/plain; charset=ISO-8859-1
> > +       Content-Transfer-Encoding: 8bit
> > +
> > +       EOF
> > +       if test "$1" = body
> > +       then
> > +               printf "%s\0%s\n" abc def
> > +       fi
> 
> Here it's if/fi. Looks a bit inconsistent.
> 
> I suppose you tried to have a case for "body" above but couldn't get it
> to work? Somehow, it would seem more consistent to have a qSubject and
> qBody and handle them the same way, but maybe that's not possible?
> Maybe using q_to_nul to create qBody containing a NUL?

Those patch was written in different time, with different thought in
mind.

I need to send a re-roll to update the subject and moving oneline in
test code to later patch.

I'll update this hunk with that re-roll.
Your new series _won't_ be affected, though.

> > +       cat <<-\EOF
> > +       ---
> > +       diff --git a/afile b/afile
> > +       new file mode 100644
> > +       index 0000000000..e69de29bb2
> > +       --$space
> > +       2.26.1
> > +       EOF
> > +}
> 
> I think this helper function should use &&-chaining.

Correct.

> > +
> >  test_expect_success setup '
> >         # Note the missing "+++" line:
> >         cat >bad-patch.diff <<-\EOF &&
> > @@ -32,4 +62,18 @@ test_expect_success 'try to apply corrupted patch' '
> >         test_i18ncmp expected actual
> >  '
> >
> > +test_expect_success "NUL in commit message's body" '
> > +       test_when_finished "git am --abort" &&
> > +       write_nul_patch body >body.patch &&
> > +       test_must_fail git am body.patch 2>err &&
> > +       grep "a NUL byte in commit log message not allowed" err
> > +'
> 
> Makes sense.
> 
> > +
> > +test_expect_failure "NUL in commit message's header" '
> > +       test_when_finished "git am --abort" &&
> > +       write_nul_patch subject >subject.patch &&
> > +       test_must_fail git am subject.patch 2>err &&
> > +       grep "a NUL byte in Subject is not allowed" err
> > +'
> 
> Interesting. :-)

Going through the mail now make me think about moving the last grep to
next patch in order to clearly indicate that we expect "git-am" is
failling but it's passed instead.

And that message isn't introduced until the very next change.

-- 
Danh

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v2 1/3] t4254: merge 2 steps of a single test
  2020-04-19 12:25     ` Martin Ågren
@ 2020-04-19 14:17       ` Danh Doan
  0 siblings, 0 replies; 39+ messages in thread
From: Danh Doan @ 2020-04-19 14:17 UTC (permalink / raw)
  To: Martin Ågren; +Cc: Git Mailing List

On 2020-04-19 14:25:41+0200, Martin Ågren <martin.agren@gmail.com> wrote:
> On Sun, 19 Apr 2020 at 13:03, Đoàn Trần Công Danh <congdanhqx@gmail.com> wrote:
> >  #   fatal: unable to write file '(null)' mode 100644: Bad address
> >  # Also, it had the unwanted side-effect of deleting f.
> 
> (This comment looks a bit unnecessary, but that's not new.)

I haven't looked very hard into this part, I guess I could take more
time to look at it and decide if we should remove it or not.

> >  test_expect_success 'try to apply corrupted patch' '
> > -       test_must_fail git -c advice.amWorkDir=false am bad-patch.diff 2>actual
> > -'
> > -
> > -test_expect_success 'compare diagnostic; ensure file is still here' '
> > +       test_when_finished "git am --abort" &&
> > +       test_must_fail git -c advice.amWorkDir=false am bad-patch.diff 2>actual &&
> >         echo "error: git diff header lacks filename information (line 4)" >expected &&
> >         test_path_is_file f &&
> >         test_i18ncmp expected actual
> 
> Makes sense. The first mini-test used to just set up "actual" and there
> was only one test using it, so it's hard to argue it was a "setup"
> phase. This looks better.

-- 
Danh

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v2 3/3] mailinfo: disallow NUL character in mail's header
  2020-04-19 12:30     ` Martin Ågren
@ 2020-04-19 14:24       ` Danh Doan
  0 siblings, 0 replies; 39+ messages in thread
From: Danh Doan @ 2020-04-19 14:24 UTC (permalink / raw)
  To: Martin Ågren; +Cc: Git Mailing List

On 2020-04-19 14:30:19+0200, Martin Ågren <martin.agren@gmail.com> wrote:
> On Sun, 19 Apr 2020 at 13:05, Đoàn Trần Công Danh <congdanhqx@gmail.com> wrote:
> > --- a/t/t4254-am-corrupt.sh
> > +++ b/t/t4254-am-corrupt.sh
> > @@ -69,9 +69,11 @@ test_expect_success "NUL in commit message's body" '
> >         grep "a NUL byte in commit log message not allowed" err
> >  '
> >
> > -test_expect_failure "NUL in commit message's header" '
> > +test_expect_success "NUL in commit message's header" '
> >         test_when_finished "git am --abort" &&
> >         write_nul_patch subject >subject.patch &&
> > +       test_must_fail git mailinfo msg patch <subject.patch 2>err &&
> > +       grep "a NUL byte in Subject is not allowed" err &&
> >         test_must_fail git am subject.patch 2>err &&
> >         grep "a NUL byte in Subject is not allowed" err
> >  '
> 
> We used to fail for some reason and it's sort of clear from the size
> of the test what that reason is. Now that we flip "failure" to "success"
> we can add some more stuff here that works. Makes sense. Of course,
> somewhere is a line for stuffing too much into the test, i.e., you'll
> reach a point where you should have separate tests, but I think this is
> ok.

I'm not really keen to check the "git-mailinfo" failure here.
Since, another developer may come and decide that we should keep the
mailinfo as is, i.e. keep NUL character in Subject, Author, Email,
etc... And let's git-commit complains instead.

Adding a check for git-mailinfo here may limit future development.
Other's feedback is welcomed.

-- 
Danh

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 1/4] strbuf: fix doc for `strbuf_attach()`
  2020-04-19 12:32       ` [PATCH 1/4] strbuf: fix doc for `strbuf_attach()` Martin Ågren
@ 2020-04-20 17:30         ` Junio C Hamano
  2020-04-21 18:44           ` Martin Ågren
  0 siblings, 1 reply; 39+ messages in thread
From: Junio C Hamano @ 2020-04-20 17:30 UTC (permalink / raw)
  To: Martin Ågren; +Cc: git, Đoàn Trần Công Danh

Martin Ågren <martin.agren@gmail.com> writes:

> Commit 917c9a7133 ("New strbuf APIs: splice and attach.", 2007-09-15)
> added `strbuf_attach()`. In the commit message, it is explicitly
> discussed how using `mem == len` is ok, although possibly costly. When
> this function was documented in dd613e6b87 ("Strbuf documentation:
> document most functions", 2008-06-04), this distinction was missed and
> the documentation simply forbids this case.

In other words, mem==len implies that the original lacks the
terminating '\0', so attach needs to reallocate from the get go, so
discouraging such use may make sense, but it is too strong to forbid
it, as the strbuf invariant is held safely?  If so, the description
makes sense to me.

> The function handles reallocation and truncation, yet the docs say that
> the "amount [of allocated memory] must be larger than the string length,
> because the string you pass is supposed to be a NUL-terminated string".

IOW, _attach() does not mind if the original lacks '\0' at the end.

> diff --git a/strbuf.h b/strbuf.h
> index ce8e49c0b2..2a462f70cc 100644
> --- a/strbuf.h
> +++ b/strbuf.h
> @@ -112,10 +112,12 @@ char *strbuf_detach(struct strbuf *sb, size_t *sz);
>  /**
>   * Attach a string to a buffer. You should specify the string to attach,
>   * the current length of the string and the amount of allocated memory.
> + * The amount must be at least as large as the string length. If the two
> + * lengths are equal, reallocation will be handled as appropriate and in
> + * any case, the string will be NUL-truncated as implied by `len`.

NUL-truncated?  Ah, if mem and len are the same, the string is reallocated
to fit an extra byte to NUL-terminate, to make sure strlen(sb->buf)==len
holds.  Makes sense.

> + *
> + * This string _must_ be malloc()ed, and after attaching, the pointer
> + * cannot be relied upon anymore, and neither be free()d directly.

That's a good thing to explain.

>   */
>  void strbuf_attach(struct strbuf *sb, void *str, size_t len, size_t mem);

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 3/4] strbuf: introduce `strbuf_attachstr()`
  2020-04-19 12:32       ` [PATCH 3/4] strbuf: introduce `strbuf_attachstr()` Martin Ågren
@ 2020-04-20 19:39         ` Junio C Hamano
  2020-04-21 18:47           ` Martin Ågren
  0 siblings, 1 reply; 39+ messages in thread
From: Junio C Hamano @ 2020-04-20 19:39 UTC (permalink / raw)
  To: Martin Ågren; +Cc: git, Đoàn Trần Công Danh

Martin Ågren <martin.agren@gmail.com> writes:

> +/**
> + * Attach a string to a buffer similar to `strbuf_attachstr_len()`.
> + * Useful if you do not know the length of the string.
> + */
> +static inline void strbuf_attachstr(struct strbuf *sb, char *str)
> +{
> +	size_t len = strlen(str);
> +
> +	strbuf_attach(sb, str, len, len + 1);
> +}

This is somewhat worrysome in that the interface is _so_ simple that
people may fail to see that str must be allocated piece of memory,
and it is preferrable if string fully fills the allocation.

We should repeat that (instead of just trusting "similar to ..."
would tell them enough) in the doc, perhaps?

> diff --git a/trailer.c b/trailer.c
> index 0c414f2fed..56c4027943 100644
> --- a/trailer.c
> +++ b/trailer.c
> @@ -1095,7 +1095,7 @@ void trailer_info_get(struct trailer_info *info, const char *str,
>  	for (ptr = trailer_lines; *ptr; ptr++) {
>  		if (last && isspace((*ptr)->buf[0])) {
>  			struct strbuf sb = STRBUF_INIT;
> -			strbuf_attach(&sb, *last, strlen(*last), strlen(*last));
> +			strbuf_attachstr(&sb, *last);
>  			strbuf_addbuf(&sb, *ptr);
>  			*last = strbuf_detach(&sb, NULL);
>  			continue;

This is not wrong per-se, but it is unclear if use of strbuf_attach*
family to avoid an explicit malloc/copy/free is buying much at this
callsite.  Simplifying the code here of course is not within the
scope of this series.


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v2 2/3] mailinfo.c::convert_to_utf8: reuse strlen info
  2020-04-19 11:00   ` [PATCH v2 2/3] mailinfo.c::convert_to_utf8: reuse strlen info Đoàn Trần Công Danh
  2020-04-19 12:29     ` Martin Ågren
@ 2020-04-20 19:59     ` Junio C Hamano
  2020-04-20 23:53       ` Danh Doan
  1 sibling, 1 reply; 39+ messages in thread
From: Junio C Hamano @ 2020-04-20 19:59 UTC (permalink / raw)
  To: Đoàn Trần Công Danh; +Cc: git

Đoàn Trần Công Danh  <congdanhqx@gmail.com> writes:

> diff --git a/t/t4254-am-corrupt.sh b/t/t4254-am-corrupt.sh
> index ddd35498db..98cda32d0a 100755
> --- a/t/t4254-am-corrupt.sh
> +++ b/t/t4254-am-corrupt.sh
> @@ -3,6 +3,36 @@
>  test_description='git am with corrupt input'
>  . ./test-lib.sh
>  
> +write_nul_patch() {

Style: SP on both sides of (), i.e.

	write_nul_patch () {

But isn't this misnamed?  You are interested in injecting '\0' byte
in the e-mail headers and bodies, not necessarily part of the patch,
but "nul-patch" somehow hints readers that we are writing out a Null
Patch (something that does not do anything, perhaps?).

sample_mbox_with_nul is the best alternative I can come up with
offhand, which is not great either, but at least it does not say
patch.

> +	space=' '
> +	qNUL=
> +	case "$1" in
> +		subject) qNUL='=00' ;;
> +	esac

Style: case/esac aligns with case arms, i.e.

	case "$1" in
	subject) qNUL='=00' ;;
	esac

> +	cat <<-EOF
> +	From ec7364544f690c560304f5a5de9428ea3b978b26 Mon Sep 17 00:00:00 2001
> +	From: A U Thor <author@example.com>
> +	Date: Sun, 19 Apr 2020 13:42:07 +0700
> +	Subject: [PATCH] =?ISO-8859-1?q?=C4=CB${qNUL}=D1=CF=D6?=
> +	MIME-Version: 1.0
> +	Content-Type: text/plain; charset=ISO-8859-1
> +	Content-Transfer-Encoding: 8bit
> +
> +	EOF

Since the above does have ${qNUL} interpolated, not quoting <<-EOF
is correct.  Good.

> +	if test "$1" = body
> +	then
> +		printf "%s\0%s\n" abc def
> +	fi

OK.  So we won't be able to inject NUL byte in both header and body
at the same time.  If you wanted to allow it, you could write

	case ",$1," in
	*,subject,*)	qNUL="=00" ;;
	esac

in the early part, and then rewrite this one like so:

	case ",$1," in
	*,body,*)	printf "..." ;;
	esac

Then those callers who want to ask for both can say

	sample_mbox_with_nul subject,body

> +	cat <<-\EOF
> +	---
> +	diff --git a/afile b/afile
> +	new file mode 100644
> +	index 0000000000..e69de29bb2
> +	--$space
> +	2.26.1
> +	EOF

Doesn't this want to interpolate $space in the output?  I think you
want to say <<-EOF, without quoting.

    cd t && sh t4254-am-corrupt.sh -d && cat trash*.t4254-*/body.patch

tells me that "--$space" is left in the output, not "-- ".

> +}
> +
>  test_expect_success setup '
>  	# Note the missing "+++" line:
>  	cat >bad-patch.diff <<-\EOF &&
> @@ -32,4 +62,18 @@ test_expect_success 'try to apply corrupted patch' '
>  	test_i18ncmp expected actual
>  '
>  
> +test_expect_success "NUL in commit message's body" '
> +	test_when_finished "git am --abort" &&
> +	write_nul_patch body >body.patch &&
> +	test_must_fail git am body.patch 2>err &&
> +	grep "a NUL byte in commit log message not allowed" err
> +'
> +
> +test_expect_failure "NUL in commit message's header" '
> +	test_when_finished "git am --abort" &&
> +	write_nul_patch subject >subject.patch &&
> +	test_must_fail git am subject.patch 2>err &&
> +	grep "a NUL byte in Subject is not allowed" err
> +'
> +
>  test_done

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v2 2/3] mailinfo.c::convert_to_utf8: reuse strlen info
  2020-04-20 19:59     ` Junio C Hamano
@ 2020-04-20 23:53       ` Danh Doan
  0 siblings, 0 replies; 39+ messages in thread
From: Danh Doan @ 2020-04-20 23:53 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

On 2020-04-20 12:59:37-0700, Junio C Hamano <gitster@pobox.com> wrote:
> Đoàn Trần Công Danh  <congdanhqx@gmail.com> writes:
> 
> > diff --git a/t/t4254-am-corrupt.sh b/t/t4254-am-corrupt.sh
> > index ddd35498db..98cda32d0a 100755
> > --- a/t/t4254-am-corrupt.sh
> > +++ b/t/t4254-am-corrupt.sh
> > @@ -3,6 +3,36 @@
> >  test_description='git am with corrupt input'
> >  . ./test-lib.sh
> >  
> > +write_nul_patch() {
> 
> Style: SP on both sides of (), i.e.
> 
> 	write_nul_patch () {
> 
> But isn't this misnamed?  You are interested in injecting '\0' byte

Originally, this function was written to create a file named
"nul.patch", but it's prohibited in Windows land.

It's still misnamed, though.

> in the e-mail headers and bodies, not necessarily part of the patch,
> but "nul-patch" somehow hints readers that we are writing out a Null
> Patch (something that does not do anything, perhaps?).
> 
> sample_mbox_with_nul is the best alternative I can come up with
> offhand, which is not great either, but at least it does not say
> patch.

I prefer having a verb, but make_sample_mbox_with_nul is too long.
I'll take make_mbox_with_nul. Naming is hard.

> > +	cat <<-\EOF
> > +	---
> > +	diff --git a/afile b/afile
> > +	new file mode 100644
> > +	index 0000000000..e69de29bb2
> > +	--$space
> > +	2.26.1
> > +	EOF
> 
> Doesn't this want to interpolate $space in the output?  I think you
> want to say <<-EOF, without quoting.
> 
>     cd t && sh t4254-am-corrupt.sh -d && cat trash*.t4254-*/body.patch
> 
> tells me that "--$space" is left in the output, not "-- ".

I recalled it now. Originially, I wrote "-- " in that line,
When I try git-am(1) the mail, I saw a warning about trailing space.
I want to get rid of it but forget to change "-\EOF"

Those last 2 lines isn't strictly required, I wanted to mimic a real
patch created by git-format-patch, though.


-- 
Danh

^ permalink raw reply	[flat|nested] 39+ messages in thread

* [PATCH v3 0/3] Disallow NUL character in mailinfo
  2020-04-18  3:54 [PATCH] mailinfo.c::convert_to_utf8: reuse strlen info Đoàn Trần Công Danh
  2020-04-18 19:56 ` Martin Ågren
  2020-04-19 11:00 ` [PATCH v2 0/3] mailinfo: disallow and complains about NUL character Đoàn Trần Công Danh
@ 2020-04-20 23:54 ` Đoàn Trần Công Danh
  2020-04-20 23:54   ` [PATCH v3 1/3] t4254: merge 2 steps of a single test Đoàn Trần Công Danh
                     ` (2 more replies)
  2 siblings, 3 replies; 39+ messages in thread
From: Đoàn Trần Công Danh @ 2020-04-20 23:54 UTC (permalink / raw)
  To: git
  Cc: Đoàn Trần Công Danh, Martin Ågren,
	Junio C Hamano


As of current state, Git disallow NUL character in commit message.

This indirectly disallow NUL in the body of commit message that sent by email
in UTF-8 encoding.

In those below cases:

* NUL character in header field (be it Subject, Author, Email, etc...)
* NUL in the body of commit message that originally sent in other than UTF-8
  encoding

Git silently accepts them, albeit, truncate at the NUL character.

Those email is clearly not generated by Git, they must be crafted.

Complains loudly about those NUL characters.

Change from v2:

* reword 2/3 commit message
* rename helper for crafting faulty patch, use single heredoc,
  and allow quoted NUL in both subject and body
* postpone a grep in test for NUL in subject to clearly indicate that git-am
  is failing instead of wrong error message.
* quote the header that contains NUL in the error message.


Đoàn Trần Công Danh (3):
  t4254: merge 2 steps of a single test
  mailinfo.c: avoid strlen on strings that can contains NUL
  mailinfo: disallow NUL character in mail's header

 mailinfo.c            | 11 +++++++--
 t/t4254-am-corrupt.sh | 53 +++++++++++++++++++++++++++++++++++++++----
 2 files changed, 58 insertions(+), 6 deletions(-)

Range-diff against v2:
1:  d1bc31e692 = 1:  d1bc31e692 t4254: merge 2 steps of a single test
2:  e3e542d388 ! 2:  97ede4aab2 mailinfo.c::convert_to_utf8: reuse strlen info
    @@ Metadata
     Author: Đoàn Trần Công Danh <congdanhqx@gmail.com>
     
      ## Commit message ##
    -    mailinfo.c::convert_to_utf8: reuse strlen info
    +    mailinfo.c: avoid strlen on strings that can contains NUL
     
         We're passing buffer from strbuf to reencode_string,
         which will call strlen(3) on that buffer,
    @@ t/t4254-am-corrupt.sh
      test_description='git am with corrupt input'
      . ./test-lib.sh
      
    -+write_nul_patch() {
    ++make_mbox_with_nul () {
     +	space=' '
    -+	qNUL=
    -+	case "$1" in
    -+		subject) qNUL='=00' ;;
    -+	esac
    ++	q_nul_in_subject=
    ++	q_nul_in_body=
    ++	while test $# -ne 0
    ++	do
    ++		case "$1" in
    ++		subject) q_nul_in_subject='=00' ;;
    ++		body)    q_nul_in_body='=00' ;;
    ++		esac &&
    ++		shift
    ++	done &&
     +	cat <<-EOF
     +	From ec7364544f690c560304f5a5de9428ea3b978b26 Mon Sep 17 00:00:00 2001
     +	From: A U Thor <author@example.com>
     +	Date: Sun, 19 Apr 2020 13:42:07 +0700
    -+	Subject: [PATCH] =?ISO-8859-1?q?=C4=CB${qNUL}=D1=CF=D6?=
    ++	Subject: [PATCH] =?ISO-8859-1?q?=C4=CB${q_nul_in_subject}=D1=CF=D6?=
     +	MIME-Version: 1.0
     +	Content-Type: text/plain; charset=ISO-8859-1
    -+	Content-Transfer-Encoding: 8bit
    ++	Content-Transfer-Encoding: quoted-printable
     +
    -+	EOF
    -+	if test "$1" = body
    -+	then
    -+		printf "%s\0%s\n" abc def
    -+	fi
    -+	cat <<-\EOF
    ++	abc${q_nul_in_body}def
     +	---
     +	diff --git a/afile b/afile
     +	new file mode 100644
    @@ t/t4254-am-corrupt.sh: test_expect_success 'try to apply corrupted patch' '
      
     +test_expect_success "NUL in commit message's body" '
     +	test_when_finished "git am --abort" &&
    -+	write_nul_patch body >body.patch &&
    ++	make_mbox_with_nul body >body.patch &&
     +	test_must_fail git am body.patch 2>err &&
     +	grep "a NUL byte in commit log message not allowed" err
     +'
     +
    -+test_expect_failure "NUL in commit message's header" '
    -+	test_when_finished "git am --abort" &&
    -+	write_nul_patch subject >subject.patch &&
    -+	test_must_fail git am subject.patch 2>err &&
    -+	grep "a NUL byte in Subject is not allowed" err
    -+'
    ++test_expect_failure "NUL in commit message's header" "
    ++	test_when_finished 'git am --abort' &&
    ++	make_mbox_with_nul subject >subject.patch &&
    ++	test_must_fail git am subject.patch
    ++"
     +
      test_done
3:  cb96947b36 ! 3:  b2e760227e mailinfo: disallow NUL character in mail's header
    @@ mailinfo.c: static void handle_info(struct mailinfo *mi)
      			continue;
      
     +		if (memchr(hdr->buf, '\0', hdr->len)) {
    -+			error("a NUL byte in %s is not allowed.", header[i]);
    ++			error("a NUL byte in '%s' is not allowed.", header[i]);
     +			mi->input_error = -1;
     +		}
     +
    @@ t/t4254-am-corrupt.sh: test_expect_success "NUL in commit message's body" '
      	grep "a NUL byte in commit log message not allowed" err
      '
      
    --test_expect_failure "NUL in commit message's header" '
    -+test_expect_success "NUL in commit message's header" '
    - 	test_when_finished "git am --abort" &&
    - 	write_nul_patch subject >subject.patch &&
    +-test_expect_failure "NUL in commit message's header" "
    ++test_expect_success "NUL in commit message's header" "
    + 	test_when_finished 'git am --abort' &&
    + 	make_mbox_with_nul subject >subject.patch &&
    +-	test_must_fail git am subject.patch
     +	test_must_fail git mailinfo msg patch <subject.patch 2>err &&
    -+	grep "a NUL byte in Subject is not allowed" err &&
    - 	test_must_fail git am subject.patch 2>err &&
    - 	grep "a NUL byte in Subject is not allowed" err
    - '
    ++	grep \"a NUL byte in 'Subject' is not allowed\" err &&
    ++	test_must_fail git am subject.patch 2>err &&
    ++	grep \"a NUL byte in 'Subject' is not allowed\" err
    + "
    + 
    + test_done
-- 
2.26.1.301.g55bc3eb7cb


^ permalink raw reply	[flat|nested] 39+ messages in thread

* [PATCH v3 1/3] t4254: merge 2 steps of a single test
  2020-04-20 23:54 ` [PATCH v3 0/3] Disallow NUL character in mailinfo Đoàn Trần Công Danh
@ 2020-04-20 23:54   ` Đoàn Trần Công Danh
  2020-04-20 23:54   ` [PATCH v3 2/3] mailinfo.c: avoid strlen on strings that can contains NUL Đoàn Trần Công Danh
  2020-04-20 23:54   ` [PATCH v3 3/3] mailinfo: disallow NUL character in mail's header Đoàn Trần Công Danh
  2 siblings, 0 replies; 39+ messages in thread
From: Đoàn Trần Công Danh @ 2020-04-20 23:54 UTC (permalink / raw)
  To: git; +Cc: Đoàn Trần Công Danh

While we are at it, make sure we run a clean up after testing.
In a later patch, we will test for more corrupted patch.

Signed-off-by: Đoàn Trần Công Danh <congdanhqx@gmail.com>
---
 t/t4254-am-corrupt.sh | 6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/t/t4254-am-corrupt.sh b/t/t4254-am-corrupt.sh
index fd3bdbfe2c..ddd35498db 100755
--- a/t/t4254-am-corrupt.sh
+++ b/t/t4254-am-corrupt.sh
@@ -25,10 +25,8 @@ test_expect_success setup '
 #   fatal: unable to write file '(null)' mode 100644: Bad address
 # Also, it had the unwanted side-effect of deleting f.
 test_expect_success 'try to apply corrupted patch' '
-	test_must_fail git -c advice.amWorkDir=false am bad-patch.diff 2>actual
-'
-
-test_expect_success 'compare diagnostic; ensure file is still here' '
+	test_when_finished "git am --abort" &&
+	test_must_fail git -c advice.amWorkDir=false am bad-patch.diff 2>actual &&
 	echo "error: git diff header lacks filename information (line 4)" >expected &&
 	test_path_is_file f &&
 	test_i18ncmp expected actual
-- 
2.26.1.301.g55bc3eb7cb


^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH v3 2/3] mailinfo.c: avoid strlen on strings that can contains NUL
  2020-04-20 23:54 ` [PATCH v3 0/3] Disallow NUL character in mailinfo Đoàn Trần Công Danh
  2020-04-20 23:54   ` [PATCH v3 1/3] t4254: merge 2 steps of a single test Đoàn Trần Công Danh
@ 2020-04-20 23:54   ` Đoàn Trần Công Danh
  2020-04-20 23:54   ` [PATCH v3 3/3] mailinfo: disallow NUL character in mail's header Đoàn Trần Công Danh
  2 siblings, 0 replies; 39+ messages in thread
From: Đoàn Trần Công Danh @ 2020-04-20 23:54 UTC (permalink / raw)
  To: git; +Cc: Đoàn Trần Công Danh

We're passing buffer from strbuf to reencode_string,
which will call strlen(3) on that buffer,
and discard the length of newly created buffer.
Then, we compute the length of the return buffer to attach to strbuf.

During this process, we introduce a discrimination between mail
originally written in utf-8 and other encoding.

* if the email was written in utf-8, we leave it as is. If there is
  a NUL character in that line, we complains loudly:

  	error: a NUL byte in commit log message not allowed.

* if the email was written in other encoding, we truncate the data as
  the NUL character in that line, then we used the truncated line for
  the metadata.

We can do better by reusing all the available information,
and call the underlying lower level function that will be called
indirectly by reencode_string. By doing this, we will also postpone
the NUL character processing to the commit step, which will
complains about the faulty metadata.

Signed-off-by: Đoàn Trần Công Danh <congdanhqx@gmail.com>
---
 mailinfo.c            |  6 ++++--
 t/t4254-am-corrupt.sh | 44 +++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 48 insertions(+), 2 deletions(-)

diff --git a/mailinfo.c b/mailinfo.c
index 742fa376ab..0e9911df6d 100644
--- a/mailinfo.c
+++ b/mailinfo.c
@@ -447,19 +447,21 @@ static int convert_to_utf8(struct mailinfo *mi,
 			   struct strbuf *line, const char *charset)
 {
 	char *out;
+	size_t out_len;
 
 	if (!mi->metainfo_charset || !charset || !*charset)
 		return 0;
 
 	if (same_encoding(mi->metainfo_charset, charset))
 		return 0;
-	out = reencode_string(line->buf, mi->metainfo_charset, charset);
+	out = reencode_string_len(line->buf, line->len,
+				  mi->metainfo_charset, charset, &out_len);
 	if (!out) {
 		mi->input_error = -1;
 		return error("cannot convert from %s to %s",
 			     charset, mi->metainfo_charset);
 	}
-	strbuf_attach(line, out, strlen(out), strlen(out));
+	strbuf_attach(line, out, out_len, out_len);
 	return 0;
 }
 
diff --git a/t/t4254-am-corrupt.sh b/t/t4254-am-corrupt.sh
index ddd35498db..1bbc37bc92 100755
--- a/t/t4254-am-corrupt.sh
+++ b/t/t4254-am-corrupt.sh
@@ -3,6 +3,37 @@
 test_description='git am with corrupt input'
 . ./test-lib.sh
 
+make_mbox_with_nul () {
+	space=' '
+	q_nul_in_subject=
+	q_nul_in_body=
+	while test $# -ne 0
+	do
+		case "$1" in
+		subject) q_nul_in_subject='=00' ;;
+		body)    q_nul_in_body='=00' ;;
+		esac &&
+		shift
+	done &&
+	cat <<-EOF
+	From ec7364544f690c560304f5a5de9428ea3b978b26 Mon Sep 17 00:00:00 2001
+	From: A U Thor <author@example.com>
+	Date: Sun, 19 Apr 2020 13:42:07 +0700
+	Subject: [PATCH] =?ISO-8859-1?q?=C4=CB${q_nul_in_subject}=D1=CF=D6?=
+	MIME-Version: 1.0
+	Content-Type: text/plain; charset=ISO-8859-1
+	Content-Transfer-Encoding: quoted-printable
+
+	abc${q_nul_in_body}def
+	---
+	diff --git a/afile b/afile
+	new file mode 100644
+	index 0000000000..e69de29bb2
+	--$space
+	2.26.1
+	EOF
+}
+
 test_expect_success setup '
 	# Note the missing "+++" line:
 	cat >bad-patch.diff <<-\EOF &&
@@ -32,4 +63,17 @@ test_expect_success 'try to apply corrupted patch' '
 	test_i18ncmp expected actual
 '
 
+test_expect_success "NUL in commit message's body" '
+	test_when_finished "git am --abort" &&
+	make_mbox_with_nul body >body.patch &&
+	test_must_fail git am body.patch 2>err &&
+	grep "a NUL byte in commit log message not allowed" err
+'
+
+test_expect_failure "NUL in commit message's header" "
+	test_when_finished 'git am --abort' &&
+	make_mbox_with_nul subject >subject.patch &&
+	test_must_fail git am subject.patch
+"
+
 test_done
-- 
2.26.1.301.g55bc3eb7cb


^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH v3 3/3] mailinfo: disallow NUL character in mail's header
  2020-04-20 23:54 ` [PATCH v3 0/3] Disallow NUL character in mailinfo Đoàn Trần Công Danh
  2020-04-20 23:54   ` [PATCH v3 1/3] t4254: merge 2 steps of a single test Đoàn Trần Công Danh
  2020-04-20 23:54   ` [PATCH v3 2/3] mailinfo.c: avoid strlen on strings that can contains NUL Đoàn Trần Công Danh
@ 2020-04-20 23:54   ` Đoàn Trần Công Danh
  2 siblings, 0 replies; 39+ messages in thread
From: Đoàn Trần Công Danh @ 2020-04-20 23:54 UTC (permalink / raw)
  To: git; +Cc: Đoàn Trần Công Danh

Signed-off-by: Đoàn Trần Công Danh <congdanhqx@gmail.com>
---
 mailinfo.c            | 5 +++++
 t/t4254-am-corrupt.sh | 7 +++++--
 2 files changed, 10 insertions(+), 2 deletions(-)

diff --git a/mailinfo.c b/mailinfo.c
index 0e9911df6d..5681d9130d 100644
--- a/mailinfo.c
+++ b/mailinfo.c
@@ -1138,6 +1138,11 @@ static void handle_info(struct mailinfo *mi)
 		else
 			continue;
 
+		if (memchr(hdr->buf, '\0', hdr->len)) {
+			error("a NUL byte in '%s' is not allowed.", header[i]);
+			mi->input_error = -1;
+		}
+
 		if (!strcmp(header[i], "Subject")) {
 			if (!mi->keep_subject) {
 				cleanup_subject(mi, hdr);
diff --git a/t/t4254-am-corrupt.sh b/t/t4254-am-corrupt.sh
index 1bbc37bc92..daf01c309d 100755
--- a/t/t4254-am-corrupt.sh
+++ b/t/t4254-am-corrupt.sh
@@ -70,10 +70,13 @@ test_expect_success "NUL in commit message's body" '
 	grep "a NUL byte in commit log message not allowed" err
 '
 
-test_expect_failure "NUL in commit message's header" "
+test_expect_success "NUL in commit message's header" "
 	test_when_finished 'git am --abort' &&
 	make_mbox_with_nul subject >subject.patch &&
-	test_must_fail git am subject.patch
+	test_must_fail git mailinfo msg patch <subject.patch 2>err &&
+	grep \"a NUL byte in 'Subject' is not allowed\" err &&
+	test_must_fail git am subject.patch 2>err &&
+	grep \"a NUL byte in 'Subject' is not allowed\" err
 "
 
 test_done
-- 
2.26.1.301.g55bc3eb7cb


^ permalink raw reply related	[flat|nested] 39+ messages in thread

* Re: [PATCH 1/4] strbuf: fix doc for `strbuf_attach()`
  2020-04-20 17:30         ` Junio C Hamano
@ 2020-04-21 18:44           ` Martin Ågren
  0 siblings, 0 replies; 39+ messages in thread
From: Martin Ågren @ 2020-04-21 18:44 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Git Mailing List, Đoàn Trần Công Danh

On Mon, 20 Apr 2020 at 19:30, Junio C Hamano <gitster@pobox.com> wrote:
>
> Martin Ågren <martin.agren@gmail.com> writes:
>
> > The function handles reallocation and truncation, yet the docs say that
> > the "amount [of allocated memory] must be larger than the string length,
> > because the string you pass is supposed to be a NUL-terminated string".
>
> IOW, _attach() does not mind if the original lacks '\0' at the end.

Right, nor if it lacks the space for it.

> > diff --git a/strbuf.h b/strbuf.h
> > index ce8e49c0b2..2a462f70cc 100644
> > --- a/strbuf.h
> > +++ b/strbuf.h
> > @@ -112,10 +112,12 @@ char *strbuf_detach(struct strbuf *sb, size_t *sz);
> >  /**
> >   * Attach a string to a buffer. You should specify the string to attach,
> >   * the current length of the string and the amount of allocated memory.
> > + * The amount must be at least as large as the string length. If the two
> > + * lengths are equal, reallocation will be handled as appropriate and in
> > + * any case, the string will be NUL-truncated as implied by `len`.
>
> NUL-truncated?  Ah, if mem and len are the same, the string is reallocated
> to fit an extra byte to NUL-terminate, to make sure strlen(sb->buf)==len
> holds.  Makes sense.

Exactly. NUL-terminated would be better. I think I'll split that last
sentence and replace it with something like the following:

  If the two lengths are equal, reallocation will be handled as
  needed. And regardless, the string will be NUL-terminated at `len`.

(One might still have strlen(sb->buf) < len though. This just guarantees
"<=")


Martin

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 3/4] strbuf: introduce `strbuf_attachstr()`
  2020-04-20 19:39         ` Junio C Hamano
@ 2020-04-21 18:47           ` Martin Ågren
  0 siblings, 0 replies; 39+ messages in thread
From: Martin Ågren @ 2020-04-21 18:47 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Git Mailing List, Đoàn Trần Công Danh

On Mon, 20 Apr 2020 at 21:39, Junio C Hamano <gitster@pobox.com> wrote:
>
> Martin Ågren <martin.agren@gmail.com> writes:
>
> > +/**
> > + * Attach a string to a buffer similar to `strbuf_attachstr_len()`.
> > + * Useful if you do not know the length of the string.
> > + */
> > +static inline void strbuf_attachstr(struct strbuf *sb, char *str)
> > +{
> > +     size_t len = strlen(str);
> > +
> > +     strbuf_attach(sb, str, len, len + 1);
> > +}
>
> This is somewhat worrysome in that the interface is _so_ simple that
> people may fail to see that str must be allocated piece of memory,
> and it is preferrable if string fully fills the allocation.
>
> We should repeat that (instead of just trusting "similar to ..."
> would tell them enough) in the doc, perhaps?

Yeah, that's a good point. I'll expand on this to try to better get
through that there are things to consider here.

> > @@ -1095,7 +1095,7 @@ void trailer_info_get(struct trailer_info *info, const char *str,
> >       for (ptr = trailer_lines; *ptr; ptr++) {
> >               if (last && isspace((*ptr)->buf[0])) {
> >                       struct strbuf sb = STRBUF_INIT;
> > -                     strbuf_attach(&sb, *last, strlen(*last), strlen(*last));
> > +                     strbuf_attachstr(&sb, *last);
> >                       strbuf_addbuf(&sb, *ptr);
> >                       *last = strbuf_detach(&sb, NULL);
> >                       continue;
>
> This is not wrong per-se, but it is unclear if use of strbuf_attach*
> family to avoid an explicit malloc/copy/free is buying much at this
> callsite.  Simplifying the code here of course is not within the
> scope of this series.

For the other patches in this series, I spent some time and effort
investigating where strings came from, "do I really feel certain that
they're NUL-terminated?". But for this patch, I more or less went "we've
been using strlen on this all this time, surely if it wasn't guaranteed
to be NUL-terminated we'd have messed up already". And I don't think I'm
making anything worse. But yeah, I didn't really step back to look at
what these sites are really doing, and how, as much as I did for the
others.


Martin

^ permalink raw reply	[flat|nested] 39+ messages in thread

end of thread, other threads:[~2020-04-21 18:47 UTC | newest]

Thread overview: 39+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-04-18  3:54 [PATCH] mailinfo.c::convert_to_utf8: reuse strlen info Đoàn Trần Công Danh
2020-04-18 19:56 ` Martin Ågren
2020-04-18 20:18   ` [PATCH 0/6] strbuf: simplify `strbuf_attach()` usage Martin Ågren
2020-04-18 20:18     ` [PATCH 1/6] am: use `strbuf_attach()` correctly Martin Ågren
2020-04-18 20:18     ` [PATCH 2/6] strbuf_attach: correctly pass in `strlen() + 1` for `alloc` Martin Ågren
2020-04-18 20:18     ` [PATCH 3/6] strbuf: use `strbuf_attach()` correctly Martin Ågren
2020-04-18 20:18     ` [PATCH 4/6] fast-import: avoid awkward use of `strbuf_attach()` Martin Ågren
2020-04-18 20:18     ` [PATCH 5/6] rerere: " Martin Ågren
2020-04-18 20:18     ` [PATCH 6/6] strbuf: simplify `strbuf_attach()` usage Martin Ågren
2020-04-19  4:44     ` [PATCH 0/6] " Martin Ågren
2020-04-19 12:32     ` [PATCH 0/4] strbuf: fix doc for `strbuf_attach()` and avoid it Martin Ågren
2020-04-19 12:32       ` [PATCH 1/4] strbuf: fix doc for `strbuf_attach()` Martin Ågren
2020-04-20 17:30         ` Junio C Hamano
2020-04-21 18:44           ` Martin Ågren
2020-04-19 12:32       ` [PATCH 2/4] strbuf: introduce `strbuf_attachstr_len()` Martin Ågren
2020-04-19 12:32       ` [PATCH 3/4] strbuf: introduce `strbuf_attachstr()` Martin Ågren
2020-04-20 19:39         ` Junio C Hamano
2020-04-21 18:47           ` Martin Ågren
2020-04-19 12:32       ` [PATCH 4/4] strbuf_attach: prefer `strbuf_attachstr_len()` Martin Ågren
2020-04-18 23:12   ` [PATCH] mailinfo.c::convert_to_utf8: reuse strlen info Junio C Hamano
2020-04-19  2:48     ` Danh Doan
2020-04-19  4:34       ` Martin Ågren
2020-04-19  5:32         ` Junio C Hamano
2020-04-19 11:00 ` [PATCH v2 0/3] mailinfo: disallow and complains about NUL character Đoàn Trần Công Danh
2020-04-19 11:00   ` [PATCH v2 1/3] t4254: merge 2 steps of a single test Đoàn Trần Công Danh
2020-04-19 12:25     ` Martin Ågren
2020-04-19 14:17       ` Danh Doan
2020-04-19 11:00   ` [PATCH v2 2/3] mailinfo.c::convert_to_utf8: reuse strlen info Đoàn Trần Công Danh
2020-04-19 12:29     ` Martin Ågren
2020-04-19 14:16       ` Danh Doan
2020-04-20 19:59     ` Junio C Hamano
2020-04-20 23:53       ` Danh Doan
2020-04-19 11:00   ` [PATCH v2 3/3] mailinfo: disallow NUL character in mail's header Đoàn Trần Công Danh
2020-04-19 12:30     ` Martin Ågren
2020-04-19 14:24       ` Danh Doan
2020-04-20 23:54 ` [PATCH v3 0/3] Disallow NUL character in mailinfo Đoàn Trần Công Danh
2020-04-20 23:54   ` [PATCH v3 1/3] t4254: merge 2 steps of a single test Đoàn Trần Công Danh
2020-04-20 23:54   ` [PATCH v3 2/3] mailinfo.c: avoid strlen on strings that can contains NUL Đoàn Trần Công Danh
2020-04-20 23:54   ` [PATCH v3 3/3] mailinfo: disallow NUL character in mail's header Đoàn Trần Công Danh

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).