All of lore.kernel.org
 help / color / mirror / Atom feed
From: Philip Oakley <philipoakley@iee.email>
To: "Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
Cc: Johannes Schindelin <Johannes.Schindelin@gmx.de>,
	git@vger.kernel.org, Junio C Hamano <gitster@pobox.com>,
	Jeff King <peff@peff.net>, Erik Faye-Lund <kusmabite@gmail.com>,
	Jonathan Nieder <jrnieder@gmail.com>
Subject: Re: [RFC PATCH 00/10] range-diff: fix segfault due to integer overflow
Date: Fri, 24 Dec 2021 18:31:50 +0000	[thread overview]
Message-ID: <54e16578-93f3-33e7-bd31-8909992e3407@iee.email> (raw)
In-Reply-To: <211224.86zgoqgd7q.gmgdl@evledraar.gmail.com>

On 24/12/2021 16:46, Ævar Arnfjörð Bjarmason wrote:
> On Fri, Dec 24 2021, Philip Oakley wrote:
>
>> On 21/12/2021 23:36, Ævar Arnfjörð Bjarmason wrote:
>>> On Tue, Dec 21 2021, Philip Oakley wrote:
>>>
>>>> Sorry for the late comment..
>>>>
>>>> On 10/12/2021 14:31, Johannes Schindelin wrote:
>>>>> Hi Ævar,
>>>>>
>>>>> On Thu, 9 Dec 2021, Ævar Arnfjörð Bjarmason wrote:
>>>>>
>>>>>> The difference between "master" and "git-for-windows/main" is large
>>>>>> enough that comparing the two will segfault on my system. This is
>>>>>> because the range-diff code does some expensive calculations and will
>>>>>> overflow the "int" type.
>>>>> You are holding this thing wrong.
>>>>>
>>>>> The `main` branch of Git for Windows uses merging rebases, therefore you
>>>>> need to use a commit range like
>>>>> `git-for-windows/main^{/^Start.the.merging}..git-for-windows/main` and
>>>>> compare it to `git-for-windows/main..master`.
>>>> I'm not sure that a Git repo has an established way of indicating to how
>>>> it's branching/merging/releasing workflow is set up, especially for
>>>> projects with non-normative use cases, such as Git for Windows. We don't
>>>> have a git document for covering  the different workflows in common use
>>>> for easy reference and consistent terminology.
>>>>
>>>> The merging rebase flow, with 'fake' merge does solve a problem that
>>>> git.git doesn't have but could easily be a common process for 'friendly
>>>> forks' that follow an upstream with local patches. The choice of
>>>> '{/^Start.the.merging}' is currently specific to the Git-for-Windows
>>>> case making it harder to discover this useful maintainer method.
>>> Yes, but let's not get lost in the weeds here. As I noted I just picked
>>> GFW as a handy example of a large history & that command as a handy
>>> example of something that segfaults on "master".
>> Had you already experienced the segfault locally, without using the GFW
>> example? How many commits were present in that case?
> Yes, I ran into it "orginally" just range-diffing as part of a local
> build process.
>
> I could dig up what revision range it was exactly, but does it matter?
No the particular range-diff doesn't matter, I was checking that this
wasn't a confusion about the Git for Windows workflow.

>
>> The GFW example seems like it's taken the discussion in the wrong direction.
>>
>> For me:
>> $ git log git/master..origin/main --pretty=oneline | wc -l
>> 62105
>>
>> That's a lot of commits to have in a range diff. It's almost as big as
>> the whole of git/master
>>
>> $ git log git/master --pretty=oneline | wc -l
>> 65400
>>
>> Personally I'd like a way of trimming 'deadheads' that's a bit easier
>> that needing to remember Dscho's magic string [1], but time will tell.
> There are some repos that move forward by 500-1k commits/day, and people
> do cherry-pick patches etc. So wanting to range-diff after a couple of
> months is something you might do...

It feels to me that in such cases that maybe the algorithm may need
tweaking for the needle in a haystack case ;-)
>
>>> So the point really isn't to say that we should fix range-diff becase
>>> it'll allow us to run this practically useful command on a git.git fork.
>>>
>>>> I fully agree that the range-diff should probably have a patch limit at
>>>> some sensible value.
>>> Why would it? If I'm willing to spend the CPU to produce a range-diff of
>>> an absurdly large range and I've got the memory why shouldn't we support
>>> it?
>> There will always be a limit somewhere, and if it's not commit count or
>> other easily explained & checked limit it will be hard to rationalise
>> about why Git suddenly fails with an error (or segfault) in those
>> humungous case.
> I think it's fairly easy to explain the "your system wouldn't let us
> malloc more, we're dying" that we get from xmalloc(), st_*() and the
> like.
That's better than a segfault, but does it give actionable information
to the user as to what (& how much) they should do? Hence the comment
about a commit count measure.

>>> We don't in cases like xdiff where it's not trivial to just raise the
>>> limits, but here it seems relatively easy.
>>>
>>> I think limits to save users from spending CPU time they didn't expect
>>> are reasonable, but then we can handle them like the diff/merge rename
>>> detection limits, i.e. print a warning/advice, and allow the user to
>>> opt-out.
>>>
>>> That also doesn't really apply here since "diff/merge" will/might still
>>> do something useful in those scenarios, whereas range-diff would just
>>> have truncated output.
>>>
>>>> The 'confusion' between the types size_t, long and int, does ripple
>>>> through a lot of portable code, as shown in the series. Not an easy problem.
>>> Yes, although here we're not just casting and overflowing types, but
>>> overflowing on multiplication and addition, whereas usually we'd just
>>> overflow on "nr" being too big for "int" or similar.
>> I've been very slowly looking at the `long` limits on GFW which have
>> very similar arithmetic issues for pointers, often with no clear answers.
> Right, that's to do with the whole "long" or whatever use in the
> object.c and related code, but I don't think that's applicable here, is
> it?

That question was more about the policy aspects of ensuring that any
proposals aren't 'head against a brick wall' when it comes to the
potential intrusiveness.

Thanks for the clarifications.

Philip


      reply	other threads:[~2021-12-24 18:31 UTC|newest]

Thread overview: 44+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-12-09 19:19 [RFC PATCH 00/10] range-diff: fix segfault due to integer overflow Ævar Arnfjörð Bjarmason
2021-12-09 19:19 ` [RFC PATCH 01/10] string-list API: change "nr" and "alloc" to "size_t" Ævar Arnfjörð Bjarmason
2021-12-09 19:19 ` [RFC PATCH 02/10] range-diff.c: don't use st_mult() for signed "int" Ævar Arnfjörð Bjarmason
2021-12-10  3:39   ` Jeff King
2021-12-10 10:22     ` Ævar Arnfjörð Bjarmason
2021-12-10 11:41       ` Jeff King
2021-12-10 12:31         ` Ævar Arnfjörð Bjarmason
2021-12-10 19:24           ` Phillip Wood
2021-12-14 14:34           ` Jeff King
2021-12-10 14:27         ` Johannes Schindelin
2021-12-10 14:58           ` Ævar Arnfjörð Bjarmason
2021-12-11 14:01             ` Johannes Schindelin
2021-12-12 17:44               ` Ævar Arnfjörð Bjarmason
2021-12-14 14:42           ` Jeff King
2021-12-09 19:19 ` [RFC PATCH 03/10] range-diff.c: use "size_t" to refer to "struct string_list"'s "nr" Ævar Arnfjörð Bjarmason
2021-12-09 19:19 ` [RFC PATCH 04/10] range-diff: zero out elements in "cost" first Ævar Arnfjörð Bjarmason
2021-12-09 19:19 ` [RFC PATCH 05/10] linear-assignment.c: split up compute_assignment() function Ævar Arnfjörð Bjarmason
2021-12-09 19:19 ` [RFC PATCH 06/10] linear-assignment.c: take "size_t", not "int" for *_count Ævar Arnfjörð Bjarmason
2021-12-09 19:19 ` [RFC PATCH 07/10] linear-assignment.c: convert a macro to a "static inline" function Ævar Arnfjörð Bjarmason
2021-12-09 19:19 ` [RFC PATCH 08/10] linear-assignment.c: detect signed add/mul on GCC and Clang Ævar Arnfjörð Bjarmason
2021-12-10  3:56   ` Jeff King
2021-12-09 19:19 ` [RFC PATCH 09/10] linear-assignment.c: add and use intprops.h from Gnulib Ævar Arnfjörð Bjarmason
2021-12-09 19:19 ` [RFC PATCH 10/10] linear-assignment.c: use "intmax_t" instead of "int" Ævar Arnfjörð Bjarmason
2021-12-10  4:00   ` Jeff King
2021-12-10 12:30 ` [RFC PATCH v2 0/5] range-diff: fix segfault due to integer overflow Ævar Arnfjörð Bjarmason
2021-12-10 12:30   ` [RFC PATCH v2 1/5] range-diff: zero out elements in "cost" first Ævar Arnfjörð Bjarmason
2021-12-14 13:36     ` Jeff King
2021-12-10 12:30   ` [RFC PATCH v2 2/5] linear-assignment.c: split up compute_assignment() function Ævar Arnfjörð Bjarmason
2021-12-14 13:39     ` Jeff King
2021-12-10 12:30   ` [RFC PATCH v2 3/5] linear-assignment.c: take "size_t", not "int" for *_count Ævar Arnfjörð Bjarmason
2021-12-14 13:40     ` Jeff King
2021-12-10 12:30   ` [RFC PATCH v2 4/5] range-diff.c: rename "n" to "column_count" in get_correspondences() Ævar Arnfjörð Bjarmason
2021-12-14 13:42     ` Jeff King
2021-12-10 12:30   ` [RFC PATCH v2 5/5] range-diff: fix integer overflow & segfault on cost[i + n * j] Ævar Arnfjörð Bjarmason
2021-12-14 14:04     ` Jeff King
2021-12-10 14:31 ` [RFC PATCH 00/10] range-diff: fix segfault due to integer overflow Johannes Schindelin
2021-12-10 15:07   ` Ævar Arnfjörð Bjarmason
2021-12-21 23:22   ` Philip Oakley
2021-12-21 23:36     ` Ævar Arnfjörð Bjarmason
2021-12-22 20:50       ` Johannes Schindelin
2021-12-22 21:11         ` Jeff King
2021-12-24 11:15       ` Philip Oakley
2021-12-24 16:46         ` Ævar Arnfjörð Bjarmason
2021-12-24 18:31           ` Philip Oakley [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=54e16578-93f3-33e7-bd31-8909992e3407@iee.email \
    --to=philipoakley@iee.email \
    --cc=Johannes.Schindelin@gmx.de \
    --cc=avarab@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=jrnieder@gmail.com \
    --cc=kusmabite@gmail.com \
    --cc=peff@peff.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.