git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: John Cai <johncai86@gmail.com>
To: Elijah Newren <newren@gmail.com>
Cc: phillip.wood@dunelm.org.uk,
	John Cai via GitGitGadget <gitgitgadget@gmail.com>,
	git@vger.kernel.org
Subject: Re: [PATCH 2/2] diff: teach diff to read gitattribute diff-algorithm
Date: Thu, 9 Feb 2023 11:37:25 -0500	[thread overview]
Message-ID: <20230209163725.hyx62qwveut4sj5x@jcai-ubuntu> (raw)
In-Reply-To: <CABPp-BGhuTyq_hrpMc+Ky3yt1UgO7DcAsgcYH15FK--QLdCsQw@mail.gmail.com>

On 23/02/09 12:26AM, Elijah Newren wrote:
> Hi John,
> 
> On Mon, Feb 6, 2023 at 12:02 PM John Cai <johncai86@gmail.com> wrote:
> >
> > Hi Phillip,
> >
> > On 6 Feb 2023, at 11:27, Phillip Wood wrote:
> >
> > > Hi John
> > >
> > > On 05/02/2023 03:46, John Cai via GitGitGadget wrote:
> > >> From: John Cai <johncai86@gmail.com>
> > >>
> > >> It can be useful to specify diff algorithms per file type. For example,
> > >> one may want to use the minimal diff algorithm for .json files, another
> > >> for .c files, etc.
> > >
> > > Have you got any examples of why this is useful? I find myself occasionally changing the algorithm when the default gives a sub-optimal diff but I've not really noticed any pattern with respect to file types.
> >
> > At $DAYJOB, there has been a discussion and request for a feature like this [1].
> > One use case that came up was to be able to set a different diff algorithm for
> > .json files.
> >
> > 1. https://gitlab.com/gitlab-org/gitaly/-/issues/2591
> 
> A couple points:
> 
> First, there seems to be a misunderstanding in that issue.  In
> particular, the merge algorithm does call into the xdiff library to do
> the three-way content merge of individual files, and when it does so,
> it has to specify the diff algorithm (or take the default, currently
> myers).  merge-recursive allows the diff algorithm to be specified by
> the user (there are
> -Xdiff-algorithm={histogram,minimal,patience,myers} flags to
> merge/rebase for it), while merge-ort uses histogram (though it uses
> the same parser as merge-recursive and thus gets the variables set
> from the -Xdiff-algorithm flag, it just ignores those values and
> hardcodes histogram).
> 
> Second, I also think the user request got converted to a particular
> solution without looking at the wider problem space:  The idea seemed
> to assume "myers" is default for a good reason, and thus asked for an
> option to use something else.  I'm not sure the assumption is valid; I
> think "myers" is default for historical reasons and histogram is
> better not just for special Salesforce xml files, but code files too.
> The output makes more sense to users.  So much so that even though my
> simple testing suggested it had a 2% performance penalty compared to
> myers, I forced ort to use it[1] even though I designed  everything
> else in that algorithm around eking out maximum performance.  Others
> who have tested the diff algorithms have also found histogram has very
> similar performance to myers, and oftentimes even beats it[2][3].
> Also, worries about invalidating rerere caches[4] was real, but we
> already paid that price when we switched to ort.  And if performance
> is still a worry, [3] gives me reason to believe we can make our
> histogram implementation faster.  Finally, for the period of time when
> Palantir was letting me make an internal git distribution (mostly for
> testing ort), I also carried a patch that changed the default diff
> algorithm to histogram (not just for ort, but for diff/log/etc. as
> well).  Never had any complaints from the users from it.  Perhaps you
> could do the same in your local version of git used by gitaly?

Thanks for that suggestion, Elijah. Changing the diff alg to histogram has been
something we've considered doing as well. However, we've gotten customer
requests to also have the option to view their diffs with patience. Seeing as
the diff algorithm is sometimes a matter of preference, we thought giving users
the control through the repository would be nice.

thanks
John

> 
> [1] See c8017176ac ("merge-ort: use histogram diff", 2020-12-13)
> [2] From 85551232b5 ("perf: compare diff algorithms", 2012-03-06):
> "This does indeed show that histogram diff slightly beats Myers, while
> patience is much slower than the others."
> [3] https://github.com/pascalkuthe/imara-diff
> [4] https://lore.kernel.org/git/20120307114714.GA14990@sigill.intra.peff.net/

  parent reply	other threads:[~2023-02-09 16:37 UTC|newest]

Thread overview: 78+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-02-05  3:46 [PATCH 0/2] Teach diff to honor diff algorithms set through git attributes John Cai via GitGitGadget
2023-02-05  3:46 ` [PATCH 1/2] diff: consolidate diff algorithm option parsing John Cai via GitGitGadget
2023-02-06 16:20   ` Phillip Wood
2023-02-05  3:46 ` [PATCH 2/2] diff: teach diff to read gitattribute diff-algorithm John Cai via GitGitGadget
2023-02-05 17:50   ` Eric Sunshine
2023-02-06 13:10     ` John Cai
2023-02-06 16:27   ` Phillip Wood
2023-02-06 18:14     ` Eric Sunshine
2023-02-06 19:50     ` John Cai
2023-02-09  8:26       ` Elijah Newren
2023-02-09 10:31         ` "bad" diffs (was: [PATCH 2/2] diff: teach diff to read gitattribute diff-algorithm) Ævar Arnfjörð Bjarmason
2023-02-09 16:37         ` John Cai [this message]
2023-02-06 16:39   ` [PATCH 2/2] diff: teach diff to read gitattribute diff-algorithm Ævar Arnfjörð Bjarmason
2023-02-06 20:37     ` John Cai
2023-02-07 14:55       ` Phillip Wood
2023-02-07 17:00         ` John Cai
2023-02-09  9:09           ` Elijah Newren
2023-02-09 14:44             ` Phillip Wood
2023-02-10  9:57               ` Elijah Newren
2023-02-11 17:39                 ` Phillip Wood
2023-02-11  1:59               ` Jeff King
2023-02-15  2:35                 ` Elijah Newren
2023-02-15  4:21                   ` Jeff King
2023-02-15  5:20                     ` Junio C Hamano
2023-02-15 14:44                 ` Phillip Wood
2023-02-15 15:00                   ` Jeff King
2023-02-07 17:27         ` Ævar Arnfjörð Bjarmason
2023-02-15 14:47           ` Phillip Wood
2023-02-09  8:44       ` Elijah Newren
2023-02-14 21:16         ` John Cai
2023-02-15  3:41           ` Elijah Newren
2023-02-09  7:50     ` Elijah Newren
2023-02-09  9:41       ` Ævar Arnfjörð Bjarmason
2023-02-11  2:04         ` Jeff King
2023-02-07 17:56   ` Jeff King
2023-02-07 20:18     ` Ævar Arnfjörð Bjarmason
2023-02-07 20:47       ` Junio C Hamano
2023-02-07 21:05         ` Ævar Arnfjörð Bjarmason
2023-02-07 21:28           ` Junio C Hamano
2023-02-07 21:44             ` Ævar Arnfjörð Bjarmason
2023-02-09 16:34     ` John Cai
2023-02-11  1:39       ` Jeff King
2023-02-14 21:40 ` [PATCH v2 0/2] Teach diff to honor diff algorithms set through git attributes John Cai via GitGitGadget
2023-02-14 21:40   ` [PATCH v2 1/2] diff: consolidate diff algorithm option parsing John Cai via GitGitGadget
2023-02-15  2:38     ` Junio C Hamano
2023-02-15 23:34       ` John Cai
2023-02-15 23:42         ` Junio C Hamano
2023-02-16  2:14           ` Jeff King
2023-02-16  2:57             ` Junio C Hamano
2023-02-16 20:34               ` John Cai
2023-02-14 21:40   ` [PATCH v2 2/2] diff: teach diff to read gitattribute diff-algorithm John Cai via GitGitGadget
2023-02-15  2:56     ` Junio C Hamano
2023-02-15  3:20       ` Junio C Hamano
2023-02-16 20:37         ` John Cai
2023-02-17 20:21   ` [PATCH v3 0/2] Teach diff to honor diff algorithms set through git attributes John Cai via GitGitGadget
2023-02-17 20:21     ` [PATCH v3 1/2] diff: consolidate diff algorithm option parsing John Cai via GitGitGadget
2023-02-17 21:27       ` Junio C Hamano
2023-02-18  1:36       ` Elijah Newren
2023-02-17 20:21     ` [PATCH v3 2/2] diff: teach diff to read algorithm from diff driver John Cai via GitGitGadget
2023-02-17 21:50       ` Junio C Hamano
2023-02-18  2:56       ` Elijah Newren
2023-02-20 15:32         ` John Cai
2023-02-20 16:21           ` Elijah Newren
2023-02-20 16:49             ` John Cai
2023-02-20 17:32               ` Elijah Newren
2023-02-20 20:53                 ` John Cai
2023-02-22 19:47                 ` Jeff King
2023-02-24 17:44                   ` John Cai
2023-02-18  1:16     ` [PATCH v3 0/2] Teach diff to honor diff algorithms set through git attributes Elijah Newren
2023-02-20 13:37       ` John Cai
2023-02-20 21:04     ` [PATCH v4 " John Cai via GitGitGadget
2023-02-20 21:04       ` [PATCH v4 1/2] diff: consolidate diff algorithm option parsing John Cai via GitGitGadget
2023-02-20 21:04       ` [PATCH v4 2/2] diff: teach diff to read algorithm from diff driver John Cai via GitGitGadget
2023-02-21 17:34       ` [PATCH v4 0/2] Teach diff to honor diff algorithms set through git attributes Junio C Hamano
2023-02-21 18:05         ` Elijah Newren
2023-02-21 18:51           ` Junio C Hamano
2023-02-21 19:36             ` John Cai
2023-02-21 20:16               ` Elijah Newren

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20230209163725.hyx62qwveut4sj5x@jcai-ubuntu \
    --to=johncai86@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=gitgitgadget@gmail.com \
    --cc=newren@gmail.com \
    --cc=phillip.wood@dunelm.org.uk \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).