From: Elijah Newren <newren@gmail.com>
To: John Cai <johncai86@gmail.com>
Cc: "Ævar Arnfjörð Bjarmason" <avarab@gmail.com>,
	"John Cai via GitGitGadget" <gitgitgadget@gmail.com>,
	git@vger.kernel.org
Subject: Re: [PATCH 2/2] diff: teach diff to read gitattribute diff-algorithm
Date: Tue, 14 Feb 2023 19:41:00 -0800
Message-ID: <CABPp-BGmFemkiD1OFrrOdaJt9PjGRp+QHoV_azPqvTtx6CdD9Q@mail.gmail.com>
In-Reply-To: <AF5092D2-A561-4B56-8FB8-25DCFA28F32C@gmail.com>

Hi John!

On Tue, Feb 14, 2023 at 1:16 PM John Cai <johncai86@gmail.com> wrote:
> On 9 Feb 2023, at 3:44, Elijah Newren wrote:
> > On Mon, Feb 6, 2023 at 12:47 PM John Cai <johncai86@gmail.com> wrote:
> >>
[...]
> It seems like the performance penalty was because I was adding calls to parse
> attribute files. Piggybacking on the attribute parsing in userdiff.h will
> allow us to not incur this performance penalty:
>
> $ hyperfine -r 5 -L a bin-wrappers/git,git '{a} diff v2.0.0 v2.28.0'
> Benchmark 1: git-bin-wrapper diff v2.0.0 v2.28.0
>   Time (mean ± σ):      1.072 s ±  0.289 s    [User: 0.626 s, System: 0.081 s]
>   Range (min … max):    0.772 s …  1.537 s    5 runs
>
> Benchmark 2: git diff v2.0.0 v2.28.0
>   Time (mean ± σ):      1.003 s ±  0.065 s    [User: 0.684 s, System: 0.067 s]
>   Range (min … max):    0.914 s …  1.091 s    5 runs
>
> Summary
>   'git diff v2.0.0 v2.28.0' ran
>     1.07 ± 0.30 times faster than 'git-bin-wrapper diff v2.0.0 v2.28.0'

Yaay!  Much better.  :-)

I'm curious, though, whether you are showing here a 7% slowdown (which
would still be bad), or just that the feature is correctly choosing a
different (but slower) algorithm for some files, or some kind of mix.

What is the performance difference if you have this feature included,
but don't have any directives in .gitattributes selecting a different
diff algorithm for any files?
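
For reference, the "1.07 ± 0.30" in the quoted summary can be reproduced from the two reported means and standard deviations. The sketch below assumes first-order error propagation for a ratio of independent measurements; that is an assumption about how the summary line is derived, not something the output states, and the variable names are illustrative.

```python
import math

# Means and standard deviations (seconds) copied from the quoted hyperfine output.
wrapper_mean, wrapper_sd = 1.072, 0.289   # bin-wrappers/git (patched build)
system_mean, system_sd = 1.003, 0.065     # installed git

ratio = wrapper_mean / system_mean
# Assumed: relative errors of independent measurements add in quadrature.
ratio_sd = ratio * math.hypot(wrapper_sd / wrapper_mean, system_sd / system_mean)

print(f"{ratio:.2f} ± {ratio_sd:.2f}")
print("1.0 inside interval:", ratio - ratio_sd <= 1.0 <= ratio + ratio_sd)
```

With these inputs the interval around the ratio includes 1.0, so five runs alone cannot separate a real 7% slowdown from run-to-run noise.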

> > And on a separate note...
> >
> > There's another set of considerations we might need to include here as
> > well that I haven't seen anyone else in this thread talk about:
>
> These are some great questions. I'll do my best to answer them.
> >
> > * When trying to diff files, do we read the .gitattributes file from
> > the current checkout to determine the diff algorithm(s)?  Or the
> > index?  Or the commit we are diffing against?
> > * If we use the current checkout or index, what about bare clones or
> > diffing between two different commits?
> > * If diffing between two different commits, and the .gitattributes has
> > changed between those commits, which .gitattributes file wins?
> > * If diffing between two different commits, and the .gitattributes has
> > NOT changed, BUT a file has been renamed and the old and new names
> > have different rules, which rule wins?
>
> In the next version I plan on using Peff's suggestion of utilizing the existing
> diff driver scheme [1]. I believe these four questions are addressed if we use
> the existing userdiff.h API, which in turn calls the attr.h API. We check the
> worktree, then fall back to the index.

So...it sounds like we're just ignoring all the special cases listed
above, and living with bugs related to them?  That's not a criticism;
in fact, it might be okay -- after all, that's exactly what the
existing .gitattributes handling does and you are just hooking into
it.

I am a bit concerned, though, that we're increasing the visibility of
the interactions of .gitattributes with respect to these kinds of
cases.  I think external drivers are probably much less used than what
your feature might be, so folks are more likely to stumble into these
cases and complain.  Perhaps those cases are rare enough that we don't
care, but it might be at least worth documenting the issues (both to
manage user expectations and to give people a heads up about the
potential issues.)
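
To make the rename case from the list above concrete, here is a minimal sketch. The rules, file names, and the `algorithm_for` helper are all hypothetical, and matching basenames with `fnmatchcase` is a deliberate simplification of real gitattributes pattern matching (which also handles directories, negation, and macros). It only illustrates that the old and new names of a renamed file can select different algorithms from one unchanged rule set.

```python
from fnmatch import fnmatchcase

# Hypothetical attribute rules: (pattern, diff algorithm).
# As in .gitattributes, a later matching line overrides an earlier one.
RULES = [
    ("*.txt", "patience"),
    ("*.c", "histogram"),
]

def algorithm_for(path, rules=RULES, default="myers"):
    """Return the algorithm of the last rule matching the basename, else the default."""
    basename = path.rsplit("/", 1)[-1]
    chosen = default
    for pattern, algo in rules:
        if fnmatchcase(basename, pattern):
            chosen = algo
    return chosen

# A file renamed between the two commits being diffed.
old_name, new_name = "src/notes.txt", "src/notes.c"
old_algo = algorithm_for(old_name)
new_algo = algorithm_for(new_name)

print(old_name, "->", old_algo)
print(new_name, "->", new_algo)
print("ambiguous" if old_algo != new_algo else "consistent")
```

Nothing in the rule set itself says which side of the rename should decide, which is the kind of behavior that would be worth spelling out in the documentation.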

(Also, it may be worth mentioning that I tend to focus on unusual
cases for anything that might touch merging; Junio once named one of
my patchsets "en/t6042-insane-merge-rename-testcases".  It's possible
I worry about corner cases more than is justified given their real
world likelihood.)

> By using the userdiff.h API, the behavior will match what users already expect
> when they for instance set an external driver.

s/already expect/already get/

The bugs also affect external drivers; I just suspect external drivers
aren't used enough that users have complained very loudly (yet?).

> 1. https://lore.kernel.org/git/Y+KQtqNPews3vBS8@coredump.intra.peff.net/
>
> >
> > * If per-file diff algorithms are adopted widely enough, will we be
> > forced to change the merge algorithm to also pay attention to them?
> > If it does, more complicated rename cases occur and we need rules for
> > how to handle those.
> > * If the merge algorithm has to pay attention to .gitattributes for
> > this too, we'll have even more corner cases around what happens if
> > there are merge conflicts in .gitattributes itself (which is already
> > kind of ugly and kludged)
>
> I see this feature as a user-experience type convenience feature, so I don't
> believe there's a need for the merge machinery to also pay attention to the diff
> algorithm set through gitattributes. We can clarify this in the documentation.

That would be awesome; *please* do this.  This is my primary concern
with this patchset.

I've spent an awful lot of time dealing with weird corner cases in the
merge machinery, and this appears to open a big can of worms to me.
It'd be a huge relief if we just agreed that the .gitattributes
handling here is only meant for user-facing diffs and will not be
consulted by the merge machinery.

> > Anyway, I know I'm a bit animated and biased in this area, and I
> > apologize if I'm a bit too much so.  Even if I am, hopefully my
> > comments at least provide some useful context.
>
> No problem! thanks for raising these issues.
>
> thanks
> John
