From: Philip Oakley <philipoakley@iee.email>
To: "Ævar Arnfjörð Bjarmason" <avarab@gmail.com>, git@vger.kernel.org
Cc: Junio C Hamano <gitster@pobox.com>, Jeff King <peff@peff.net>
Subject: Re: [PATCH v2 0/7] strvec: use size_t to store nr and alloc
Date: Mon, 13 Sep 2021 11:47:12 +0100
Message-ID: <cdf72de4-4879-c4b9-e882-2e134c6ba45f@iee.email>
In-Reply-To: <cover-v2-0.7-00000000000-20210912T001420Z-avarab@gmail.com>

On 12/09/2021 01:15, Ævar Arnfjörð Bjarmason wrote:
> This is a proposed v2 of Jeff King's one-patch change to change
> strvec's nr/alloc from "int" to "size_t". As noted below I think it's
> worthwhile to not only change that in the struct, but also in code
> that directly references the "nr" member.
>
> On Sat, Sep 11 2021, Philip Oakley wrote:
>
>> On 11/09/2021 17:13, Ævar Arnfjörð Bjarmason wrote:
>>> On Sat, Sep 11 2021, Jeff King wrote:
>>>
>>>> We converted argv_array (which later became strvec) to use size_t in
>>>> 819f0e76b1 (argv-array: use size_t for count and alloc, 2020-07-28) in
>>>> order to avoid the possibility of integer overflow. But later, commit
>>>> d70a9eb611 (strvec: rename struct fields, 2020-07-28) accidentally
>>>> converted these back to ints!
>>>>
>>>> Those two commits were part of the same patch series. I'm pretty sure
>>>> what happened is that they were originally written in the opposite order
>>>> and then cleaned up and re-ordered during an interactive rebase. And
>>>> when resolving the inevitable conflict, I mistakenly took the "rename"
>>>> patch completely, accidentally dropping the type change.
>>>>
>>>> We can correct it now; better late than never.
>>>>
>>>> Signed-off-by: Jeff King <peff@peff.net>
>>>> ---
>>>> This was posted previously in the midst of another thread, but I don't
>>>> think was picked up. There was some positive reaction, but one "do we
>>>> really need this?" to which I responded in detail:
>>>>
>>>>   https://lore.kernel.org/git/YTIBnT8Ue1HZXs82@coredump.intra.peff.net/
>>>>
>>>> I don't really think any of that needs to go into the commit message,
>>>> but if that's a hold-up, I can try to summarize it (though I think
>>>> referring to the commit which _already_ did this and was accidentally
>>>> reverted would be sufficient).
>>> Thanks, I have a WIP version of this outstanding starting with this
>>> patch that I was planning to submit sometime, but I'm happy to have you
>>> pursue it, especially with the ~100 outstanding patches I have in
>>> master..seen.
>>>
>>> It does feel somewhere between iffy and a landmine waiting to be stepped
>>> on to only convert the member itself, and not any of the corresponding
>>> "int" variables that track it to "size_t".
>>>
>>> If you do the change I suggested in
>>> https://lore.kernel.org/git/87v93i8svd.fsf@evledraar.gmail.com/ you'll
>>> find that there's at least one first-order reference to this that now
>>> uses "int" that if converted to "size_t" will result in a wrap-around
>>> error, we're lucky that one has a test failure.
>>>
>>> I can tell you what that bug is, but maybe it's better if you find it
>>> yourself :) I.e. I found *that* one, but I'm not sure I found them
>>> all. I just s/int nr/size_t *nr/ and eyeballed the wall of compiler
>>> errors & the code context (note: pointer, obviously broken, but makes
>>> the compiler yell).
>>>
>>> That particular bug will be caught by the compiler as it involves a >= 0
>>> comparison against unsigned, but we may not have that everywhere...
>> I'm particularly interested in the int -> size_t change problem as part
>> of the wider 4GB limitations for the LLP64 systems [0] such as the
>> RaspPi, git-lfs (on windows [1]), and Git-for-Windows[2]. It is a big
>> problem.
> Okay, fine, no fun exercise for the reader then ;)

There are a lot of weeds in there ;-)  In some ways it feels like the
SHA1->SHA256 transition, but without the consensus, as the 'shifting
foundations' problem only affects a subgroup of a subgroup (large files
and repositories on Windows).
>
> This is what I'd been sitting on locally since that recent thread, I
> polished it up a bit since Jeff King posted his version.
>
> The potential overflow bug I mentioned is in rebase.c. See
> 5/7. "Potential" because it's not a bug now, but that code
> intentionally considers a strvec, and then iterates it from nr-1 to 0,
> and if it reaches 0 intentionally counts down one more to -1 to
> indicate that it's visited all elements.

It's these tidbits about how the problems surface, and how they are
detected and resolved, that are really useful, along with the general
awareness raising. At least here the issue is reasonably tightly
focussed, and even then, testing is hard.
>
> We then check that with i >= 0, except of course if it becomes
> unsigned that doesn't become -1, but rather it wraps around.
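
A minimal sketch of that wrap-around, and one way to sidestep it. This is
not the code in builtin/rebase.c: `struct vec` and `find_last_dashdash()`
are hypothetical names standing in for a strvec and a "scan from the end"
loop. The unsafe form is shown only in a comment.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a strvec: an array plus an unsigned count. */
struct vec {
	const char **v;
	size_t nr;
};

/*
 * Return the index of the last element equal to "--", or vec->nr if
 * there is none.
 *
 * The signed idiom
 *
 *	for (i = nr - 1; i >= 0; i--)
 *
 * relies on i reaching -1 to stop. With a size_t i the condition
 * "i >= 0" is always true and 0 - 1 wraps to SIZE_MAX, so the loop
 * would index far out of bounds. "i-- > 0" tests before decrementing,
 * so the body only ever sees nr-1 .. 0.
 */
static size_t find_last_dashdash(const struct vec *vec)
{
	size_t i;

	assert(vec != NULL);
	i = vec->nr;
	while (i-- > 0)
		if (!strcmp(vec->v[i], "--"))
			return i;
	return vec->nr; /* "not found" is signalled by nr, not by -1 */
}
```

Callers compare the result against `nr` rather than testing `i >= 0`,
so nothing depends on the index ever being negative.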
>
> The rest of this is all changes to have that s/int/size_t/ radiate
> outwards, i.e. when we assign that value to a variable somewhere it's
> now a "size_t" instead of an "int" etc.

In the LLP64 case, I'm somewhat concerned about possible pushback
against a widespread s/int/size_t/ because of its effect on the
codebase's look & feel.
 (aside) I don't think there is even a `1S` suffix to match the `1L` and
`1U` shorthands used in various places.
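
To illustrate the aside: on LLP64 both `int` and `long` are 32 bits while
pointers and `size_t` are 64 bits, so `1L`/`1UL` do not give a
size_t-wide constant there. A cast is the usual workaround. A small
sketch, where `SIZE_T_BITS` and `size_bit()` are hypothetical helpers
made up for this example:

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

/* Width of size_t in bits on the platform being compiled for. */
#define SIZE_T_BITS (sizeof(size_t) * CHAR_BIT)

/*
 * Return a size_t with only bit n set.
 *
 * The cast makes the shift happen at size_t's width. Writing
 * "1UL << n" instead would shift at unsigned long's width, which is
 * only 32 bits on LLP64, so n >= 32 would be undefined there even
 * though size_t itself has room for it.
 */
static size_t size_bit(unsigned int n)
{
	assert(n < SIZE_T_BITS);
	return (size_t)1 << n;
}
```

The cast is noisier than a literal suffix, which is part of the look &
feel concern.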

None of that is part of the series, but the patches are beneficial to
the code's portability.

>
>> [0]
>> http://nickdesaulniers.github.io/blog/2016/05/30/data-models-and-word-size/
>> [1] https://github.com/git-lfs/git-lfs/issues/2434  Git on Windows
>> client corrupts files > 4Gb
>> [2] https://github.com/git-for-windows/git/pull/2179  [DRAFT] for
>> testing : Fix 4Gb limit for large files on Git for Windows
> Jeff King (1):
>   strvec: use size_t to store nr and alloc
>
> Ævar Arnfjörð Bjarmason (6):
>   remote-curl: pass "struct strvec *" instead of int/char ** pair
>   pack-objects: pass "struct strvec *" instead of int/char ** pair
>   sequencer.[ch]: pass "struct strvec *" instead of int/char ** pair
>   upload-pack.c: pass "struct strvec *" instead of int/char ** pair
>   rebase: don't have loop over "struct strvec" depend on signed "nr"
>   strvec API users: change some "int" tracking "nr" to "size_t"
>
>  builtin/pack-objects.c |  6 +++---
>  builtin/rebase.c       | 26 ++++++++++++--------------
>  connect.c              |  8 ++++----
>  fetch-pack.c           |  4 ++--
>  ls-refs.c              |  2 +-
>  remote-curl.c          | 23 +++++++++++------------
>  sequencer.c            |  8 ++++----
>  sequencer.h            |  4 ++--
>  serve.c                |  2 +-
>  shallow.c              |  5 +++--
>  shallow.h              |  6 ++++--
>  strvec.h               |  4 ++--
>  submodule.c            |  2 +-
>  upload-pack.c          |  7 +++----
>  14 files changed, 53 insertions(+), 54 deletions(-)
>
> Range-diff against v1:
> -:  ----------- > 1:  2ef48d734e8 remote-curl: pass "struct strvec *" instead of int/char ** pair
> -:  ----------- > 2:  7f59a58ed97 pack-objects: pass "struct strvec *" instead of int/char ** pair
> -:  ----------- > 3:  c35cfb9c9c5 sequencer.[ch]: pass "struct strvec *" instead of int/char ** pair
> -:  ----------- > 4:  2e0b82d4316 upload-pack.c: pass "struct strvec *" instead of int/char ** pair
> -:  ----------- > 5:  be85a0565ef rebase: don't have loop over "struct strvec" depend on signed "nr"
> 1:  498f5ed80dc ! 6:  ba17290852c strvec: use size_t to store nr and alloc
>     @@ Commit message
>          We can correct it now; better late than never.
>      
>          Signed-off-by: Jeff King <peff@peff.net>
>     +    Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
>      
>       ## strvec.h ##
>      @@ strvec.h: extern const char *empty_strvec[];
> -:  ----------- > 7:  2edd9708888 strvec API users: change some "int" tracking "nr" to "size_t"
