git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Linus Torvalds <torvalds@linux-foundation.org>
To: Jeff King <peff@peff.net>
Cc: Martin Langhoff <martin.langhoff@gmail.com>,
	Git Mailing List <git@vger.kernel.org>
Subject: Re: Merging limitations after directory renames -- interesting test repo
Date: Fri, 18 Feb 2011 15:27:36 -0800	[thread overview]
Message-ID: <AANLkTimKp+Z==QXJg2Bagot+Df4REeANuxwVi7bpPCXr@mail.gmail.com> (raw)
In-Reply-To: <20110218222151.GB4258@sigill.intra.peff.net>

[-- Attachment #1: Type: text/plain, Size: 4030 bytes --]

On Fri, Feb 18, 2011 at 2:21 PM, Jeff King <peff@peff.net> wrote:
>
>  1. Did you bump up your merge.renamelimit? It's hard to see because it
>     scrolls off the screen amidst the many conflicts, but the first
>     message is:
>
>       warning: too many files (created: 425 deleted: 1093), skipping
>       inexact rename detection
>
>     which you want to use. Try "git config merge.renamelimit
>     10000". Which runs pretty snappily on my machine; I wonder if we
>     should up the default limit.

Yeah, for the kernel, I have

	[diff]
		renamelimit=0

to disable the limit entirely, because the default limit is very low
indeed. Git is quite good at the rename detection.

However, the reason for the low default is not because it's not snappy
enough - it's because it can end up using a lot of memory (and if
you're low on memory, the swapping will mean that it goes from "quite
snappy" to "slow as molasses" - but it still will not be CPU limited,
it's just paging like crazy).

That said, having the error message scroll by and not even be noticed
by somebody doing a merge is really not good. It's an important hint,
and if you miss that hint you can easily have a totally trivial merge
turn into a totally unacceptable one.

So I do think we could try to lift the default a bit, but it might be
even more important to just make the message much more noticeable and
avoid scrolling past it. For example, setting a flag, and not printing
out the message immediately, but instead print it out only if it turns
into trouble at the end.

I dunno.

>  2. Which version of git are you using? Commit 83c9031 produces some
>     funky results on merges. It's not in any released version of git,
>     but it is in master.

Ouch. I thought that commit wasn't _supposed_ to make any difference.
If it does, then that sounds like a bad bug. Junio?

> Hmm. That is probably because it was substantially rewritten:

Damn. We suck.

Lookie here, here's the problem (well, at least part of it):

  [torvalds@i5 etherpad]$ git diff --summary -M10 master...origin/pg |
grep rebuildjar
   rename {trunk/etherpad => etherpad}/bin/rebuildjar.sh (89%)

That looks fine. But how about doing it the other way around:

  [torvalds@i5 etherpad]$ git diff --summary -M10 origin/pg...master |
grep rebuildjar
   copy trunk/infrastructure/bin/comp.sh => etherpad/bin/rebuildjar.sh (13%)
   delete mode 100755 trunk/etherpad/bin/rebuildjar.sh

Yeah, not such a good match. And it's total crap too: that rename is
much worse than the real one. So we've actually done something
actively wrong. Because we have:

   .../etherpad => master:etherpad}/bin/rebuildjar.sh |   92
+++++++++++++++++++-
   1 files changed, 89 insertions(+), 3 deletions(-)

and

   .../comp.sh => master:etherpad/bin/rebuildjar.sh   |  296
+++++++++-----------
   1 files changed, 129 insertions(+), 167 deletions(-)

so that choice of comp.sh was total and utter crap, and the 13%
similarity comes from that.

So we did something wrong in picking that comp.sh file. And I bet it's
because on a very superficial level it looks better: despite the diff
being *obviously* bigger for the crazy comp.sh choice, I think it
decided that they had more lines in common.

We've had stupid bugs in the "diffcore_count_changes()" logic before.
It's just that they're _usually_ hidden by overwhelming common code.

In fact, the attached patch improves things a bit. At a huge added CPU
cost in rename detection. Then I get

  [torvalds@i5 etherpad]$ git diff --summary -M10 origin/pg...master |
grep rebuildjar
   rename {trunk/etherpad => etherpad}/bin/rebuildjar.sh (32%)

which is at least the sane choice for rename detection. It won't fix
really the merge-from-hell issue, though, since we won't consider "32%
similar" to be a rename.

We might want to have a way to force rename detection for certain
patterns for cases like this.

                                    Linus

[-- Attachment #2: patch.diff --]
[-- Type: text/x-patch, Size: 590 bytes --]

 diffcore-rename.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/diffcore-rename.c b/diffcore-rename.c
index df41be5..19957cd 100644
--- a/diffcore-rename.c
+++ b/diffcore-rename.c
@@ -170,7 +170,7 @@ static int estimate_similarity(struct diff_filespec *src,
 	 * and the final score computation below would not have a
 	 * divide-by-zero issue.
 	 */
-	if (base_size * (MAX_SCORE-minimum_score) < delta_size * MAX_SCORE)
+	if (max_size * (MAX_SCORE-minimum_score) < delta_size * MAX_SCORE)
 		return 0;
 
 	if (!src->cnt_data && diff_populate_filespec(src, 0))

  reply	other threads:[~2011-02-18 23:28 UTC|newest]

Thread overview: 34+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2011-02-18 18:58 Merging limitations after directory renames -- interesting test repo Martin Langhoff
2011-02-18 22:21 ` Jeff King
2011-02-18 23:27   ` Linus Torvalds [this message]
2011-02-19  0:26     ` Linus Torvalds
2011-02-19  0:52       ` Junio C Hamano
2011-02-19  1:03         ` Linus Torvalds
2011-02-19  1:20           ` Linus Torvalds
2011-02-19  0:55       ` Linus Torvalds
2011-02-19  0:44     ` Junio C Hamano
2011-02-19  9:08     ` Jeff King
2011-02-19 10:19     ` Jeff King
2011-02-19 10:20       ` [PATCH 1/4] merge: improve inexact rename limit warning Jeff King
2011-02-21 23:33         ` Junio C Hamano
2011-02-22 15:39           ` Jeff King
2011-02-22 18:46             ` Junio C Hamano
2011-02-23  8:02               ` Jeff King
2011-02-19 10:21       ` [PATCH 2/4] bump rename limit defaults (again) Jeff King
2011-02-19 17:54         ` Piotr Krukowiecki
2011-02-20 10:10           ` Jeff King
2011-02-19 20:12         ` Ævar Arnfjörð Bjarmason
2011-02-20 10:12           ` Jeff King
2011-02-19 10:21       ` [PATCH 3/4] commit: stop setting rename limit Jeff King
2011-02-19 10:25       ` [RFC/PATCH 4/4] inexact rename detection eye candy Jeff King
2011-02-19 15:57         ` Linus Torvalds
2011-02-20  9:48           ` Jeff King
2011-02-20  9:51             ` [PATCH 1/3] add inexact rename detection progress infrastructure Jeff King
2011-02-20  9:53             ` [PATCH 2/3] merge: enable progress reporting for rename detection Jeff King
2011-02-20  9:56             ` [PATCH 3/3] pull: propagate --progress to merge Jeff King
2011-02-20 10:37             ` [RFC/PATCH 4/4] inexact rename detection eye candy Jeff King
2011-02-19 16:29         ` Martin Langhoff
2011-02-20 10:04           ` Jeff King
2011-02-20 13:16             ` Martin Langhoff
2011-02-19 15:22       ` Merging limitations after directory renames -- interesting test repo Martin Langhoff
2011-02-19 15:31         ` Martin Langhoff

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='AANLkTimKp+Z==QXJg2Bagot+Df4REeANuxwVi7bpPCXr@mail.gmail.com' \
    --to=torvalds@linux-foundation.org \
    --cc=git@vger.kernel.org \
    --cc=martin.langhoff@gmail.com \
    --cc=peff@peff.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).