git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Linus Torvalds <torvalds@linux-foundation.org>
To: Jeff King <peff@peff.net>, Junio C Hamano <gitster@pobox.com>
Cc: Martin Langhoff <martin.langhoff@gmail.com>,
	Git Mailing List <git@vger.kernel.org>
Subject: Re: Merging limitations after directory renames -- interesting test repo
Date: Fri, 18 Feb 2011 16:26:14 -0800	[thread overview]
Message-ID: <AANLkTimuU4A7sUqo-dpW3ch4H_WJg+G2ynNmagx=C9t8@mail.gmail.com> (raw)
In-Reply-To: <AANLkTimKp+Z==QXJg2Bagot+Df4REeANuxwVi7bpPCXr@mail.gmail.com>

[-- Attachment #1: Type: text/plain, Size: 2460 bytes --]

On Fri, Feb 18, 2011 at 3:27 PM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> We've had stupid bugs in the "diffcore_count_changes()" logic before.
> It's just that they're _usually_ hidden by overwhelming common code.
>
> In fact, the attached patch improves things a bit.

There's a secondary problem too, which is illustrated by this:

  [torvalds@i5 etherpad]$ git diff --summary -M30 $BASE.. | grep
m0024_statistics_table.js
   copy {trunk/etherpad =>
etherpad}/src/etherpad/db_migrations/m0024_statistics_table.js (100%)
   rename trunk/etherpad/src/etherpad/db_migrations/m0024_statistics_table.js
=> etherpad/src/etherpad/db_migrations/m0040_create_plugin_tables.js
(52%)

which also ends up terminally confusing the merge. It sees that dual
source of m0024_statistics_table.js and just gets really really
confused.

And the bug is that we didn't even ask for copy detection! This just
confuses merging more.

Attached is the ugliest patch ever. I'm in no way implying this should
ever be accepted, with

 - that crazy-ugly static variable to pass in the copy state to the
'for_each_hash()' callback

 - that really ugly "if (detect_rename == DIFF_DETECT_COPY)" with
broken indentation just to get rid of the copy-checking phase.

So please consider the attached patch just a "look, guys, this is
wrong, and here's the ugliest hack you've ever seen to fix it".

Anyway, with this, I can at least do

   git merge -Xrename-threshold=30 origin/pg

and while it fails miserably, the failures are now no longer "totally
obviously a git bug". Now it has real rename-rename conflicts like

CONFLICT (rename/rename): Rename
"trunk/etherpad/src/static/crossdomain.xml"->"etherpad/src/static/crossdomain.xml"
in branch "HEAD"
                          rename
"trunk/etherpad/src/static/crossdomain.xml"->"etherpad/src/static/crossdomain.xml.in"
in "origin/pg"

which really _is_ a conflict that needs user input.

Now, I didn't check that they are *all* of this valid kind, but most
of them really are. The directories are:
  (HEAD): etherpad/src/themes/default/templates
 (origin/pg): etherpad/src/templates

so you'd need to fix that up.

Whatever. It's still a nasty merge, but at least git seems to do a
much better job. That "-Xrename-threshold=30" thing is a total hack,
but it's a valid way to say "ok, git didn't find some renames I want
it to, so let's see if I can force it to do a less critical search".

                          Linus

[-- Attachment #2: patch.diff --]
[-- Type: text/x-patch, Size: 2231 bytes --]

 diffcore-rename.c |   12 +++++++++---
 1 files changed, 9 insertions(+), 3 deletions(-)

diff --git a/diffcore-rename.c b/diffcore-rename.c
index df41be5..daa85e4 100644
--- a/diffcore-rename.c
+++ b/diffcore-rename.c
@@ -170,7 +170,7 @@ static int estimate_similarity(struct diff_filespec *src,
 	 * and the final score computation below would not have a
 	 * divide-by-zero issue.
 	 */
-	if (base_size * (MAX_SCORE-minimum_score) < delta_size * MAX_SCORE)
+	if (max_size * (MAX_SCORE-minimum_score) < delta_size * MAX_SCORE)
 		return 0;
 
 	if (!src->cnt_data && diff_populate_filespec(src, 0))
@@ -246,6 +246,8 @@ struct file_similarity {
 	struct file_similarity *next;
 };
 
+static int find_copies_too;
+
 static int find_identical_files(struct file_similarity *src,
 				struct file_similarity *dst)
 {
@@ -277,6 +279,8 @@ static int find_identical_files(struct file_similarity *src,
 			}
 			/* Give higher scores to sources that haven't been used already */
 			score = !source->rename_used;
+			if (source->rename_used && !find_copies_too)
+				continue;
 			score += basename_same(source, target);
 			if (score > best_score) {
 				best = p;
@@ -377,11 +381,12 @@ static void insert_file_table(struct hash_table *table, int src_dst, int index,
  * and then during the second round we try to match
  * cache-dirty entries as well.
  */
-static int find_exact_renames(void)
+static int find_exact_renames(int copies)
 {
 	int i;
 	struct hash_table file_table;
 
+	find_copies_too = copies;
 	init_hash(&file_table);
 	for (i = 0; i < rename_src_nr; i++)
 		insert_file_table(&file_table, -1, i, rename_src[i].one);
@@ -467,7 +472,7 @@ void diffcore_rename(struct diff_options *options)
 	 * We really want to cull the candidates list early
 	 * with cheap tests in order to avoid doing deltas.
 	 */
-	rename_count = find_exact_renames();
+	rename_count = find_exact_renames(detect_rename == DIFF_DETECT_COPY);
 
 	/* Did we only want exact renames? */
 	if (minimum_score == MAX_SCORE)
@@ -551,6 +556,7 @@ void diffcore_rename(struct diff_options *options)
 		rename_count++;
 	}
 
+	if (detect_rename == DIFF_DETECT_COPY)
 	for (i = 0; i < dst_cnt * NUM_CANDIDATE_PER_DST; i++) {
 		struct diff_rename_dst *dst;
 

  reply	other threads:[~2011-02-19  0:26 UTC|newest]

Thread overview: 34+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2011-02-18 18:58 Merging limitations after directory renames -- interesting test repo Martin Langhoff
2011-02-18 22:21 ` Jeff King
2011-02-18 23:27   ` Linus Torvalds
2011-02-19  0:26     ` Linus Torvalds [this message]
2011-02-19  0:52       ` Junio C Hamano
2011-02-19  1:03         ` Linus Torvalds
2011-02-19  1:20           ` Linus Torvalds
2011-02-19  0:55       ` Linus Torvalds
2011-02-19  0:44     ` Junio C Hamano
2011-02-19  9:08     ` Jeff King
2011-02-19 10:19     ` Jeff King
2011-02-19 10:20       ` [PATCH 1/4] merge: improve inexact rename limit warning Jeff King
2011-02-21 23:33         ` Junio C Hamano
2011-02-22 15:39           ` Jeff King
2011-02-22 18:46             ` Junio C Hamano
2011-02-23  8:02               ` Jeff King
2011-02-19 10:21       ` [PATCH 2/4] bump rename limit defaults (again) Jeff King
2011-02-19 17:54         ` Piotr Krukowiecki
2011-02-20 10:10           ` Jeff King
2011-02-19 20:12         ` Ævar Arnfjörð Bjarmason
2011-02-20 10:12           ` Jeff King
2011-02-19 10:21       ` [PATCH 3/4] commit: stop setting rename limit Jeff King
2011-02-19 10:25       ` [RFC/PATCH 4/4] inexact rename detection eye candy Jeff King
2011-02-19 15:57         ` Linus Torvalds
2011-02-20  9:48           ` Jeff King
2011-02-20  9:51             ` [PATCH 1/3] add inexact rename detection progress infrastructure Jeff King
2011-02-20  9:53             ` [PATCH 2/3] merge: enable progress reporting for rename detection Jeff King
2011-02-20  9:56             ` [PATCH 3/3] pull: propagate --progress to merge Jeff King
2011-02-20 10:37             ` [RFC/PATCH 4/4] inexact rename detection eye candy Jeff King
2011-02-19 16:29         ` Martin Langhoff
2011-02-20 10:04           ` Jeff King
2011-02-20 13:16             ` Martin Langhoff
2011-02-19 15:22       ` Merging limitations after directory renames -- interesting test repo Martin Langhoff
2011-02-19 15:31         ` Martin Langhoff

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='AANLkTimuU4A7sUqo-dpW3ch4H_WJg+G2ynNmagx=C9t8@mail.gmail.com' \
    --to=torvalds@linux-foundation.org \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=martin.langhoff@gmail.com \
    --cc=peff@peff.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).