All of lore.kernel.org
 help / color / mirror / Atom feed
From: Jeff King <peff@peff.net>
To: "Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
Cc: Junio C Hamano <gitster@pobox.com>,
	git@vger.kernel.org, Emily Shaffer <emilyshaffer@google.com>,
	Derrick Stolee <stolee@gmail.com>
Subject: Re: [PATCH v3 2/5] strvec: add a strvec_pushvec()
Date: Fri, 3 Sep 2021 07:06:05 -0400	[thread overview]
Message-ID: <YTIBnT8Ue1HZXs82@coredump.intra.peff.net> (raw)
In-Reply-To: <87v93i8svd.fsf@evledraar.gmail.com>

On Fri, Sep 03, 2021 at 01:19:33AM +0200, Ævar Arnfjörð Bjarmason wrote:

> > One thing that did surprise me: the use of "int" here for iterating,
> > rather than size_t. But it seems that strvec is already storing ints,
> > which is an accident!
> 
> Is it really? If you temporarily try to say convert that to "size_t *nr"
> to have the compiler catch all the cases where we use "nr", and then
> s/size_t/int/g those all, you'll find that e.g. setup_revisions() and
> the like expect to take either "int argc" or the strvec equivalent.

It most definitely is an accident, in the sense that the commit which
changed it did not mean to. :)

> We can sensibly convert some of those to size_t, but not all, and the
> int v.s. size_t inconsistency as a result feels weird.
> 
> Since the main point of this API is to be a wrapper for what a C main()
> would take, shouldn't its prototype mirror its prototype? I.e. we should
> stick to "int" here?

That isn't the main point of this API. The reason the name changed away
from argv_array was so that it would be more obviously a generally
useful array of strings.

But even if that were the use, the point isn't that we expect to see 2B
or even 4B strings in any reasonable use case. The point is that we
don't want to accidentally overflow the integer counter, leading to
reads and writes of random bits of memory. And all it takes to do that
is some code like:

  while (strbuf_readline(stdin, &buf))
	strvec_push(&foo, buf.buf);

On a system with 32-bit size_t, you are not likely to actually allocate
2B strings before you'd fail. But on most 64-bit systems, you have
plenty of address space (and these days, often RAM) to do so, but you
still only need 2B strings, because "int" is 32 bits.

So code like setup_revisions() which uses an int is fine. It is reading
from a source with that much smaller "int" limit in the first place.
Whether strvec can handle bigger arrays or not does not matter either
way. And when it later uses ints to operate on such a strvec, it's OK in
practice; even if the size_t gets truncated on the fly to an int, we
know we did not put more than an int's worth of items into the array in
the first place.

But code which has potentially larger inputs (either because it reads
iteratively into the strvec as above, or because it is itself using a
larger data type) is now also OK, because it's using size_t.

What's not OK is code which operates on a potentially-unbounded strvec
using an int. E.g., following the code I showed above with something
like:

  int nr = foo.nr;
  foo.v[nr] = xstrdup("replacement string");

which may write to memory outside of foo.v.

Obviously that's nonsense nobody would ever write directly. It is
possible for some code to do this inadvertently (say, passing a ptr/int
pair through a function interface), but I think it's fairly unlikely in
practice. So while I do think more code ought to be using size_t in
general, I don't think it's necessary to audit this carefully. The
important thing to me is protecting strvec itself.

-Peff

  parent reply	other threads:[~2021-09-03 11:06 UTC|newest]

Thread overview: 42+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-07-27  0:41 [PATCH 0/4] bundle: show progress on "unbundle" Ævar Arnfjörð Bjarmason
2021-07-27  0:41 ` [PATCH 1/4] bundle API: start writing API documentation Ævar Arnfjörð Bjarmason
2021-07-27  0:41 ` [PATCH 2/4] bundle API: change "flags" to be "extra_index_pack_args" Ævar Arnfjörð Bjarmason
2021-07-27  0:41 ` [PATCH 3/4] index-pack: add --progress-title option Ævar Arnfjörð Bjarmason
2021-07-27  0:41 ` [PATCH 4/4] bundle: show progress on "unbundle" Ævar Arnfjörð Bjarmason
2021-08-23 11:02 ` [PATCH v2 0/4] " Ævar Arnfjörð Bjarmason
2021-08-23 11:02   ` [PATCH v2 1/4] bundle API: start writing API documentation Ævar Arnfjörð Bjarmason
2021-08-24 17:01     ` Derrick Stolee
2021-08-24 21:48       ` Ævar Arnfjörð Bjarmason
2021-08-23 11:02   ` [PATCH v2 2/4] bundle API: change "flags" to be "extra_index_pack_args" Ævar Arnfjörð Bjarmason
2021-08-24 17:09     ` Derrick Stolee
2021-08-24 21:41       ` Ævar Arnfjörð Bjarmason
2021-08-24 21:48         ` Derrick Stolee
2021-08-23 11:02   ` [PATCH v2 3/4] index-pack: add --progress-title option Ævar Arnfjörð Bjarmason
2021-08-24 17:18     ` Derrick Stolee
2021-08-24 21:40       ` Ævar Arnfjörð Bjarmason
2021-08-23 11:02   ` [PATCH v2 4/4] bundle: show progress on "unbundle" Ævar Arnfjörð Bjarmason
2021-08-24 17:23     ` Derrick Stolee
2021-08-24 21:39       ` Ævar Arnfjörð Bjarmason
2021-08-26 14:05 ` [PATCH v3 0/5] " Ævar Arnfjörð Bjarmason
2021-08-26 14:05   ` [PATCH v3 1/5] bundle API: start writing API documentation Ævar Arnfjörð Bjarmason
2021-08-26 14:05   ` [PATCH v3 2/5] strvec: add a strvec_pushvec() Ævar Arnfjörð Bjarmason
2021-08-28  1:23     ` Junio C Hamano
2021-08-28  1:29       ` Junio C Hamano
2021-08-28  4:12         ` Jeff King
2021-08-29 23:54           ` Junio C Hamano
2021-08-30 17:30             ` Derrick Stolee
2021-09-02 23:19           ` Ævar Arnfjörð Bjarmason
2021-09-03  5:05             ` Junio C Hamano
2021-09-03 11:06             ` Jeff King [this message]
2021-08-26 14:05   ` [PATCH v3 3/5] bundle API: change "flags" to be "extra_index_pack_args" Ævar Arnfjörð Bjarmason
2021-08-28  1:44     ` Junio C Hamano
2021-08-26 14:05   ` [PATCH v3 4/5] index-pack: add --progress-title option Ævar Arnfjörð Bjarmason
2021-08-28  1:50     ` Junio C Hamano
2021-08-26 14:05   ` [PATCH v3 5/5] bundle: show progress on "unbundle" Ævar Arnfjörð Bjarmason
2021-08-28  1:54     ` Junio C Hamano
2021-09-02 22:47       ` Ævar Arnfjörð Bjarmason
2021-09-05  7:34   ` [PATCH v4 0/4] " Ævar Arnfjörð Bjarmason
2021-09-05  7:34     ` [PATCH v4 1/4] bundle API: start writing API documentation Ævar Arnfjörð Bjarmason
2021-09-05  7:34     ` [PATCH v4 2/4] bundle API: change "flags" to be "extra_index_pack_args" Ævar Arnfjörð Bjarmason
2021-09-05  7:34     ` [PATCH v4 3/4] index-pack: add --progress-title option Ævar Arnfjörð Bjarmason
2021-09-05  7:34     ` [PATCH v4 4/4] bundle: show progress on "unbundle" Ævar Arnfjörð Bjarmason

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=YTIBnT8Ue1HZXs82@coredump.intra.peff.net \
    --to=peff@peff.net \
    --cc=avarab@gmail.com \
    --cc=emilyshaffer@google.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=stolee@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.