All of lore.kernel.org
 help / color / mirror / Atom feed
* Merging limitations after directory renames -- interesting test repo
@ 2011-02-18 18:58 Martin Langhoff
  2011-02-18 22:21 ` Jeff King
  0 siblings, 1 reply; 34+ messages in thread
From: Martin Langhoff @ 2011-02-18 18:58 UTC (permalink / raw)
  To: Git Mailing List

Briefly,

Fetch git://dev.laptop.org/git/users/martin/etherpad and try to merge
pg and master. A ton of work is left to do by hand, and you don't even
get useful conflicts.

 - Many paths that shoud succeed don't seem to succeed. After using
git merge, tell me if this file doesn't look like at least git should
have _attempted_ a diff3 on it:
   gitk --all  -- etherpad/bin/rebuildjar.sh
trunk/etherpad/bin/rebuildjar.sh
trunk/trunk/etherpad/bin/rebuildjar.sh

 - Git diff seems confused, says "path needs merge" instead of giving
me a diff. May be related to the first problem
    git diff -- etherpad/bin/rebuildjar.sh

   However I can
   git diff master -- etherpad/bin/rebuildjar.sh
   git diff pg -- etherpad/bin/rebuildjar.sh

Note note! The "pg" branch you see has been artificially rebuilt so I
may have made a mistake. However, I've reviewed all the commits and it
seems correct. Original repo comes from SVN via hg -- see
git://github.com/ether/pad.git

Most importantly, even if I misfired here or there, the resulting
history of the paths reads like something git _can_ make sense of if
it traces the history of the paths carefully.

cheers,



m
-- 
 martin.langhoff@gmail.com
 martin@laptop.org -- Software Architect - OLPC
 - ask interesting questions
 - don't get distracted with shiny stuff  - working code first
 - http://wiki.laptop.org/go/User:Martinlanghoff

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: Merging limitations after directory renames -- interesting test repo
  2011-02-18 18:58 Merging limitations after directory renames -- interesting test repo Martin Langhoff
@ 2011-02-18 22:21 ` Jeff King
  2011-02-18 23:27   ` Linus Torvalds
  0 siblings, 1 reply; 34+ messages in thread
From: Jeff King @ 2011-02-18 22:21 UTC (permalink / raw)
  To: Martin Langhoff; +Cc: Linus Torvalds, Git Mailing List

[cc'ing Linus, who is usually interested in deep rename issues]

On Fri, Feb 18, 2011 at 01:58:48PM -0500, Martin Langhoff wrote:

> Fetch git://dev.laptop.org/git/users/martin/etherpad and try to merge
> pg and master. A ton of work is left to do by hand, and you don't even

s{git/}{} in your url.

>  - Many paths that shoud succeed don't seem to succeed.

Hmm. Two questions:

  1. Did you bump up your merge.renamelimit? It's hard to see because it
     scrolls off the screen amidst the many conflicts, but the first
     message is:

       warning: too many files (created: 425 deleted: 1093), skipping
       inexact rename detection

     which you want to use. Try "git config merge.renamelimit
     10000". Which runs pretty snappily on my machine; I wonder if we
     should up the default limit.

  2. Which version of git are you using? Commit 83c9031 produces some
     funky results on merges. It's not in any released version of git,
     but it is in master.

> After using git merge, tell me if this file doesn't look like at least
> git should have _attempted_ a diff3 on it:
>
>    gitk --all  -- etherpad/bin/rebuildjar.sh
>    trunk/etherpad/bin/rebuildjar.sh
>    trunk/trunk/etherpad/bin/rebuildjar.sh

So assuming the two issues above are not a problem, merge says:

  CONFLICT (rename/delete): Rename
  trunk/etherpad/bin/rebuildjar.sh->etherpad/bin/rebuildjar.sh in
  origin/pg and deleted in HEAD

about this and many other files. But that doesn't seem right. Both sides
did the same rename. But we don't seem to detect it on "master":

  $ base=`git merge-base master origin/pg`
  $ git diff -M --summary $base master | grep rebuildjar
   create mode 100755 etherpad/bin/rebuildjar.sh
   delete mode 100755 trunk/etherpad/bin/rebuildjar.sh

Hmm. That is probably because it was substantially rewritten:

  $ wc -l etherpad/bin/rebuildjar.sh
  161

  $ git diff --stat \
      $base:trunk/etherpad/bin/rebuildjar.sh \
      master:etherpad/bin/rebuildjar.sh
   .../etherpad => master:etherpad}/bin/rebuildjar.sh |   92 +++++++++++++-
   1 files changed, 89 insertions(+), 3 deletions(-)

You can ignore the rename-looking bit in the diffstat; it's an artifact
of the way single-blob diffs are done. But the important thing to note
is that it was mostly rewritten on the master branch, and that's why the
rename isn't found.

This seems like an interesting case with respect to git's philosophy of
automatic rename detection. Usually the notion of "if it is so far apart
that we can't do rename detection, a 3-way merge is not likely to be
interesting" works.

But in this case, that may not hold. We have identical renames on either
side, but one side has been modified a lot. The 3-way merge may very
well be useful, but we never get there, because we don't even consider
that the two endpoints may be related.

You can simulate the 3-way merge that would happen with:

  $ git show origin/pg:etherpad/bin/rebuildjar.sh >pg
  $ git show master:etherpad/bin/rebuildjar.sh >master
  $ git show $base:trunk/etherpad/bin/rebuildjar.sh >base
  $ git merge-file master base pg
  $ $EDITOR master

It's an ugly merge, but way more useful than dumping with a
rename/delete conflict.

I'm not sure of the right solution. I guess if there was some way to say
"no really, these are renames" it would make your merge easier. But I
wonder if we can somehow be more clever.

>  - Git diff seems confused, says "path needs merge" instead of giving
> me a diff. May be related to the first problem
>     git diff -- etherpad/bin/rebuildjar.sh

I think this is a side effect of the rename/delete conflict. There is
nothing to diff against on one side.

-Peff

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: Merging limitations after directory renames -- interesting test repo
  2011-02-18 22:21 ` Jeff King
@ 2011-02-18 23:27   ` Linus Torvalds
  2011-02-19  0:26     ` Linus Torvalds
                       ` (3 more replies)
  0 siblings, 4 replies; 34+ messages in thread
From: Linus Torvalds @ 2011-02-18 23:27 UTC (permalink / raw)
  To: Jeff King; +Cc: Martin Langhoff, Git Mailing List

[-- Attachment #1: Type: text/plain, Size: 4030 bytes --]

On Fri, Feb 18, 2011 at 2:21 PM, Jeff King <peff@peff.net> wrote:
>
>  1. Did you bump up your merge.renamelimit? It's hard to see because it
>     scrolls off the screen amidst the many conflicts, but the first
>     message is:
>
>       warning: too many files (created: 425 deleted: 1093), skipping
>       inexact rename detection
>
>     which you want to use. Try "git config merge.renamelimit
>     10000". Which runs pretty snappily on my machine; I wonder if we
>     should up the default limit.

Yeah, for the kernel, I have

	[diff]
		renamelimit=0

to disable the limit entirely, because the default limit is very low
indeed. Git is quite good at the rename detection.

However, the reason for the low default is not because it's not snappy
enough - it's because it can end up using a lot of memory (and if
you're low on memory, the swapping will mean that it goes from "quite
snappy" to "slow as molasses" - but it still will not be CPU limited,
it's just paging like crazy).

That said, having the error message scroll by and not even be noticed
by somebody doing a merge is really not good. It's an important hint,
and if you miss that hint you can easily have a totally trivial merge
turn into a totally unacceptable one.

So I do think we could try to lift the default a bit, but it might be
even more important to just make the message much more noticeable and
avoid scrolling past it. For example, setting a flag, and not printing
out the message immediately, but instead print it out only if it turns
into trouble at the end.

I dunno.

>  2. Which version of git are you using? Commit 83c9031 produces some
>     funky results on merges. It's not in any released version of git,
>     but it is in master.

Ouch. I thought that commit wasn't _supposed_ to make any difference.
If it does, then that sounds like a bad bug. Junio?

> Hmm. That is probably because it was substantially rewritten:

Damn. We suck.

Lookie here, here's the problem (well, at least part of it):

  [torvalds@i5 etherpad]$ git diff --summary -M10 master...origin/pg |
grep rebuildjar
   rename {trunk/etherpad => etherpad}/bin/rebuildjar.sh (89%)

That looks fine. But how about doing it the other way around:

  [torvalds@i5 etherpad]$ git diff --summary -M10 origin/pg...master |
grep rebuildjar
   copy trunk/infrastructure/bin/comp.sh => etherpad/bin/rebuildjar.sh (13%)
   delete mode 100755 trunk/etherpad/bin/rebuildjar.sh

Yeah, not such a good match. And it's total crap too: that rename is
much worse than the real one. So we've actually done something
actively wrong. Because we have:

   .../etherpad => master:etherpad}/bin/rebuildjar.sh |   92
+++++++++++++++++++-
   1 files changed, 89 insertions(+), 3 deletions(-)

and

   .../comp.sh => master:etherpad/bin/rebuildjar.sh   |  296
+++++++++-----------
   1 files changed, 129 insertions(+), 167 deletions(-)

so that choice of comp.sh was total and utter crap, and the 13%
similarity comes from that.

So we did something wrong in picking that comp.sh file. And I bet it's
because on a very superficial level it looks better: despite the diff
being *obviously* bigger for the crazy comp.sh choice, I think it
decided that they had more lines in common.

We've had stupid bugs in the "diffcore_count_changes()" logic before.
It's just that they're _usually_ hidden by overwhelming common code.

In fact, the attached patch improves things a bit. At a huge added CPU
cost in rename detection. Then I get

  [torvalds@i5 etherpad]$ git diff --summary -M10 origin/pg...master |
grep rebuildjar
   rename {trunk/etherpad => etherpad}/bin/rebuildjar.sh (32%)

which is at least the sane choice for rename detection. It won't fix
really the merge-from-hell issue, though, since we won't consider "32%
similar" to be a rename.

We might want to have a way to force rename detection for certain
patterns for cases like this.

                                    Linus

[-- Attachment #2: patch.diff --]
[-- Type: text/x-patch, Size: 590 bytes --]

 diffcore-rename.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/diffcore-rename.c b/diffcore-rename.c
index df41be5..19957cd 100644
--- a/diffcore-rename.c
+++ b/diffcore-rename.c
@@ -170,7 +170,7 @@ static int estimate_similarity(struct diff_filespec *src,
 	 * and the final score computation below would not have a
 	 * divide-by-zero issue.
 	 */
-	if (base_size * (MAX_SCORE-minimum_score) < delta_size * MAX_SCORE)
+	if (max_size * (MAX_SCORE-minimum_score) < delta_size * MAX_SCORE)
 		return 0;
 
 	if (!src->cnt_data && diff_populate_filespec(src, 0))

^ permalink raw reply related	[flat|nested] 34+ messages in thread

* Re: Merging limitations after directory renames -- interesting test repo
  2011-02-18 23:27   ` Linus Torvalds
@ 2011-02-19  0:26     ` Linus Torvalds
  2011-02-19  0:52       ` Junio C Hamano
  2011-02-19  0:55       ` Linus Torvalds
  2011-02-19  0:44     ` Junio C Hamano
                       ` (2 subsequent siblings)
  3 siblings, 2 replies; 34+ messages in thread
From: Linus Torvalds @ 2011-02-19  0:26 UTC (permalink / raw)
  To: Jeff King, Junio C Hamano; +Cc: Martin Langhoff, Git Mailing List

[-- Attachment #1: Type: text/plain, Size: 2460 bytes --]

On Fri, Feb 18, 2011 at 3:27 PM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> We've had stupid bugs in the "diffcore_count_changes()" logic before.
> It's just that they're _usually_ hidden by overwhelming common code.
>
> In fact, the attached patch improves things a bit.

There's a secondary problem too, which is illustrated by this:

  [torvalds@i5 etherpad]$ git diff --summary -M30 $BASE.. | grep
m0024_statistics_table.js
   copy {trunk/etherpad =>
etherpad}/src/etherpad/db_migrations/m0024_statistics_table.js (100%)
   rename trunk/etherpad/src/etherpad/db_migrations/m0024_statistics_table.js
=> etherpad/src/etherpad/db_migrations/m0040_create_plugin_tables.js
(52%)

which also ends up terminally confusing the merge. It sees that dual
source of m0024_statistics_table.js and just gets really really
confused.

And the bug is that we didn't even ask for copy detection! This just
confuses merging more.

Attached is the ugliest patch ever. I'm in no way implying this should
ever be accepted, with

 - that crazy-ugly static variable to pass in the copy state to the
'for_each_hash()' callback

 - that really ugly "if (detect_rename == DIFF_DETECT_COPY)" with
broken indentation just to get rid of the copy-checking phase.

So please consider the attached patch just a "look, guys, this is
wrong, and here's the ugliest hack you've ever seen to fix it".

Anyway, with this, I can at least do

   git merge -Xrename-threshold=30 origin/pg

and while it fails miserably, the failures are now no longer "totally
obviously a git bug". Now it has real rename-rename conflicts like

CONFLICT (rename/rename): Rename
"trunk/etherpad/src/static/crossdomain.xml"->"etherpad/src/static/crossdomain.xml"
in branch "HEAD"
                          rename
"trunk/etherpad/src/static/crossdomain.xml"->"etherpad/src/static/crossdomain.xml.in"
in "origin/pg"

which really _is_ a conflict that needs user input.

Now, I didn't check that they are *all* of this valid kind, but most
of them really are. The directories are:
  (HEAD): etherpad/src/themes/default/templates
 (origin/pg): etherpad/src/templates

so you'd need to fix that up.

Whatever. It's still a nasty merge, but at least git seems to do a
much better job. That "-Xrename-threshold=30" thing is a total hack,
but it's a valid way to say "ok, git didn't find some renames I want
it to, so let's see if I can force it to do a less critical search".

                          Linus

[-- Attachment #2: patch.diff --]
[-- Type: text/x-patch, Size: 2231 bytes --]

 diffcore-rename.c |   12 +++++++++---
 1 files changed, 9 insertions(+), 3 deletions(-)

diff --git a/diffcore-rename.c b/diffcore-rename.c
index df41be5..daa85e4 100644
--- a/diffcore-rename.c
+++ b/diffcore-rename.c
@@ -170,7 +170,7 @@ static int estimate_similarity(struct diff_filespec *src,
 	 * and the final score computation below would not have a
 	 * divide-by-zero issue.
 	 */
-	if (base_size * (MAX_SCORE-minimum_score) < delta_size * MAX_SCORE)
+	if (max_size * (MAX_SCORE-minimum_score) < delta_size * MAX_SCORE)
 		return 0;
 
 	if (!src->cnt_data && diff_populate_filespec(src, 0))
@@ -246,6 +246,8 @@ struct file_similarity {
 	struct file_similarity *next;
 };
 
+static int find_copies_too;
+
 static int find_identical_files(struct file_similarity *src,
 				struct file_similarity *dst)
 {
@@ -277,6 +279,8 @@ static int find_identical_files(struct file_similarity *src,
 			}
 			/* Give higher scores to sources that haven't been used already */
 			score = !source->rename_used;
+			if (source->rename_used && !find_copies_too)
+				continue;
 			score += basename_same(source, target);
 			if (score > best_score) {
 				best = p;
@@ -377,11 +381,12 @@ static void insert_file_table(struct hash_table *table, int src_dst, int index,
  * and then during the second round we try to match
  * cache-dirty entries as well.
  */
-static int find_exact_renames(void)
+static int find_exact_renames(int copies)
 {
 	int i;
 	struct hash_table file_table;
 
+	find_copies_too = copies;
 	init_hash(&file_table);
 	for (i = 0; i < rename_src_nr; i++)
 		insert_file_table(&file_table, -1, i, rename_src[i].one);
@@ -467,7 +472,7 @@ void diffcore_rename(struct diff_options *options)
 	 * We really want to cull the candidates list early
 	 * with cheap tests in order to avoid doing deltas.
 	 */
-	rename_count = find_exact_renames();
+	rename_count = find_exact_renames(detect_rename == DIFF_DETECT_COPY);
 
 	/* Did we only want exact renames? */
 	if (minimum_score == MAX_SCORE)
@@ -551,6 +556,7 @@ void diffcore_rename(struct diff_options *options)
 		rename_count++;
 	}
 
+	if (detect_rename == DIFF_DETECT_COPY)
 	for (i = 0; i < dst_cnt * NUM_CANDIDATE_PER_DST; i++) {
 		struct diff_rename_dst *dst;
 

^ permalink raw reply related	[flat|nested] 34+ messages in thread

* Re: Merging limitations after directory renames -- interesting test repo
  2011-02-18 23:27   ` Linus Torvalds
  2011-02-19  0:26     ` Linus Torvalds
@ 2011-02-19  0:44     ` Junio C Hamano
  2011-02-19  9:08     ` Jeff King
  2011-02-19 10:19     ` Jeff King
  3 siblings, 0 replies; 34+ messages in thread
From: Junio C Hamano @ 2011-02-19  0:44 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Jeff King, Martin Langhoff, Git Mailing List

Linus Torvalds <torvalds@linux-foundation.org> writes:

> Ouch. I thought that commit wasn't _supposed_ to make any difference.
> If it does, then that sounds like a bad bug. Junio?

It already has been reverted.

I haven't really looked at the issue yet, but for one thing it seemed to
break synthesis of the virtual common tree inside merge-recursive for some
reason, and ends up producing a lot more diffs than necessary between the
common and the tips being merged.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: Merging limitations after directory renames -- interesting test repo
  2011-02-19  0:26     ` Linus Torvalds
@ 2011-02-19  0:52       ` Junio C Hamano
  2011-02-19  1:03         ` Linus Torvalds
  2011-02-19  0:55       ` Linus Torvalds
  1 sibling, 1 reply; 34+ messages in thread
From: Junio C Hamano @ 2011-02-19  0:52 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Jeff King, Martin Langhoff, Git Mailing List

Linus Torvalds <torvalds@linux-foundation.org> writes:

> And the bug is that we didn't even ask for copy detection! This just
> confuses merging more.

Yeah, I think it is probably a good idea to really limit the rename
detection only to renames at least inside merge-recursive.  The static
variable does look ugly but it shouldn't be a rocket surgery to pass it
down if we want to.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: Merging limitations after directory renames -- interesting test repo
  2011-02-19  0:26     ` Linus Torvalds
  2011-02-19  0:52       ` Junio C Hamano
@ 2011-02-19  0:55       ` Linus Torvalds
  1 sibling, 0 replies; 34+ messages in thread
From: Linus Torvalds @ 2011-02-19  0:55 UTC (permalink / raw)
  To: Jeff King, Junio C Hamano; +Cc: Martin Langhoff, Git Mailing List

On Fri, Feb 18, 2011 at 4:26 PM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> So please consider the attached patch just a "look, guys, this is
> wrong, and here's the ugliest hack you've ever seen to fix it".

Btw, the more I think about it, the more I suspect that the
"estimate_similarity()" part of the patch is correct, or at least
better than what we used to have.

If we have a file that expanded from 100 lines to 200 lines, and all
of the old contents are there, then I think that logically people
would expect it to be a "50% similarity".

But the thing is, with the old code, we would look at the old smaller
size (100 lines), and take 50% of that. And then when the delta (also
100 lines) is bigger than that 50%, then we'd totally dismiss that
thing from similarity analysis, because it obviously isn't similar
enough.

So using the bigger size as the basis (and taking 50% of _that_ and
comparing it to the delta) is probably the sane thing to do.

The rest of the patch I still think is total crap. The _intention_ is
good, but the patch was written to be small rather than the right way
of doing things.

                          Linus

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: Merging limitations after directory renames -- interesting test repo
  2011-02-19  0:52       ` Junio C Hamano
@ 2011-02-19  1:03         ` Linus Torvalds
  2011-02-19  1:20           ` Linus Torvalds
  0 siblings, 1 reply; 34+ messages in thread
From: Linus Torvalds @ 2011-02-19  1:03 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Jeff King, Martin Langhoff, Git Mailing List

[-- Attachment #1: Type: text/plain, Size: 710 bytes --]

On Fri, Feb 18, 2011 at 4:52 PM, Junio C Hamano <gitster@pobox.com> wrote:
>
> Yeah, I think it is probably a good idea to really limit the rename
> detection only to renames at least inside merge-recursive.  The static
> variable does look ugly but it shouldn't be a rocket surgery to pass it
> down if we want to.

Here's a slight update. It still has the "too ugly to live"
if-statment without proper indentation, but it now changes
"for_each_hash()" to pass in a flag value and uses that.

It also fixes one of the tests that depended on the "find copies"
behavior for its result to actually ask for copies now that we don't
give them unless you do.

                                 Linus

[-- Attachment #2: patch.diff --]
[-- Type: text/x-patch, Size: 5616 bytes --]

 builtin/describe.c            |    4 ++--
 diffcore-rename.c             |   17 ++++++++++-------
 hash.c                        |    4 ++--
 hash.h                        |    2 +-
 t/t4008-diff-break-rewrite.sh |    4 ++--
 5 files changed, 17 insertions(+), 14 deletions(-)

diff --git a/builtin/describe.c b/builtin/describe.c
index 342129f..4f4fe3e 100644
--- a/builtin/describe.c
+++ b/builtin/describe.c
@@ -63,7 +63,7 @@ static inline struct commit_name *find_commit_name(const unsigned char *peeled)
 	return n;
 }
 
-static int set_util(void *chain)
+static int set_util(void *chain, int flag)
 {
 	struct commit_name *n;
 	for (n = chain; n; n = n->next) {
@@ -289,7 +289,7 @@ static void describe(const char *arg, int last_one)
 		fprintf(stderr, "searching to describe %s\n", arg);
 
 	if (!have_util) {
-		for_each_hash(&names, set_util);
+		for_each_hash(&names, set_util, 0);
 		have_util = 1;
 	}
 
diff --git a/diffcore-rename.c b/diffcore-rename.c
index df41be5..3a937ac 100644
--- a/diffcore-rename.c
+++ b/diffcore-rename.c
@@ -170,7 +170,7 @@ static int estimate_similarity(struct diff_filespec *src,
 	 * and the final score computation below would not have a
 	 * divide-by-zero issue.
 	 */
-	if (base_size * (MAX_SCORE-minimum_score) < delta_size * MAX_SCORE)
+	if (max_size * (MAX_SCORE-minimum_score) < delta_size * MAX_SCORE)
 		return 0;
 
 	if (!src->cnt_data && diff_populate_filespec(src, 0))
@@ -247,7 +247,7 @@ struct file_similarity {
 };
 
 static int find_identical_files(struct file_similarity *src,
-				struct file_similarity *dst)
+				struct file_similarity *dst, int copies)
 {
 	int renames = 0;
 
@@ -277,6 +277,8 @@ static int find_identical_files(struct file_similarity *src,
 			}
 			/* Give higher scores to sources that haven't been used already */
 			score = !source->rename_used;
+			if (source->rename_used && !copies)
+				continue;
 			score += basename_same(source, target);
 			if (score > best_score) {
 				best = p;
@@ -306,7 +308,7 @@ static void free_similarity_list(struct file_similarity *p)
 	}
 }
 
-static int find_same_files(void *ptr)
+static int find_same_files(void *ptr, int copies)
 {
 	int ret;
 	struct file_similarity *p = ptr;
@@ -329,7 +331,7 @@ static int find_same_files(void *ptr)
 	 * If we have both sources *and* destinations, see if
 	 * we can match them up
 	 */
-	ret = (src && dst) ? find_identical_files(src, dst) : 0;
+	ret = (src && dst) ? find_identical_files(src, dst, copies) : 0;
 
 	/* Free the hashes and return the number of renames found */
 	free_similarity_list(src);
@@ -377,7 +379,7 @@ static void insert_file_table(struct hash_table *table, int src_dst, int index,
  * and then during the second round we try to match
  * cache-dirty entries as well.
  */
-static int find_exact_renames(void)
+static int find_exact_renames(int copies)
 {
 	int i;
 	struct hash_table file_table;
@@ -390,7 +392,7 @@ static int find_exact_renames(void)
 		insert_file_table(&file_table, 1, i, rename_dst[i].two);
 
 	/* Find the renames */
-	i = for_each_hash(&file_table, find_same_files);
+	i = for_each_hash(&file_table, find_same_files, copies);
 
 	/* .. and free the hash data structure */
 	free_hash(&file_table);
@@ -467,7 +469,7 @@ void diffcore_rename(struct diff_options *options)
 	 * We really want to cull the candidates list early
 	 * with cheap tests in order to avoid doing deltas.
 	 */
-	rename_count = find_exact_renames();
+	rename_count = find_exact_renames(detect_rename == DIFF_DETECT_COPY);
 
 	/* Did we only want exact renames? */
 	if (minimum_score == MAX_SCORE)
@@ -551,6 +553,7 @@ void diffcore_rename(struct diff_options *options)
 		rename_count++;
 	}
 
+	if (detect_rename == DIFF_DETECT_COPY)
 	for (i = 0; i < dst_cnt * NUM_CANDIDATE_PER_DST; i++) {
 		struct diff_rename_dst *dst;
 
diff --git a/hash.c b/hash.c
index 1cd4c9d..c5dd002 100644
--- a/hash.c
+++ b/hash.c
@@ -81,7 +81,7 @@ void **insert_hash(unsigned int hash, void *ptr, struct hash_table *table)
 	return insert_hash_entry(hash, ptr, table);
 }
 
-int for_each_hash(const struct hash_table *table, int (*fn)(void *))
+int for_each_hash(const struct hash_table *table, int (*fn)(void *, int), int flag)
 {
 	int sum = 0;
 	unsigned int i;
@@ -92,7 +92,7 @@ int for_each_hash(const struct hash_table *table, int (*fn)(void *))
 		void *ptr = array->ptr;
 		array++;
 		if (ptr) {
-			int val = fn(ptr);
+			int val = fn(ptr, flag);
 			if (val < 0)
 				return val;
 			sum += val;
diff --git a/hash.h b/hash.h
index 69e33a4..1e2de55 100644
--- a/hash.h
+++ b/hash.h
@@ -30,7 +30,7 @@ struct hash_table {
 
 extern void *lookup_hash(unsigned int hash, const struct hash_table *table);
 extern void **insert_hash(unsigned int hash, void *ptr, struct hash_table *table);
-extern int for_each_hash(const struct hash_table *table, int (*fn)(void *));
+extern int for_each_hash(const struct hash_table *table, int (*fn)(void *, int), int flag);
 extern void free_hash(struct hash_table *table);
 
 static inline void init_hash(struct hash_table *table)
diff --git a/t/t4008-diff-break-rewrite.sh b/t/t4008-diff-break-rewrite.sh
index d79d9e1e..73b4a24 100755
--- a/t/t4008-diff-break-rewrite.sh
+++ b/t/t4008-diff-break-rewrite.sh
@@ -173,8 +173,8 @@ test_expect_success \
     'compare_diff_raw expected current'
 
 test_expect_success \
-    'run diff with -B -M' \
-    'git diff-index -B -M "$tree" >current'
+    'run diff with -B -C' \
+    'git diff-index -B -C "$tree" >current'
 
 cat >expected <<\EOF
 :100644 100644 f5deac7be59e7eeab8657fd9ae706fd6a57daed2 08bb2fb671deff4c03a4d4a0a1315dff98d5732c C095	file0	file1

^ permalink raw reply related	[flat|nested] 34+ messages in thread

* Re: Merging limitations after directory renames -- interesting test repo
  2011-02-19  1:03         ` Linus Torvalds
@ 2011-02-19  1:20           ` Linus Torvalds
  0 siblings, 0 replies; 34+ messages in thread
From: Linus Torvalds @ 2011-02-19  1:20 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Jeff King, Martin Langhoff, Git Mailing List

[-- Attachment #1: Type: text/plain, Size: 333 bytes --]

On Fri, Feb 18, 2011 at 5:03 PM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> It also fixes one of the tests that depended on the "find copies"
> behavior for its result to actually ask for copies now that we don't
> give them unless you do.

Gahh. Here's all of them - there were six cases.

                     Linus

[-- Attachment #2: patch.diff --]
[-- Type: text/x-patch, Size: 2825 bytes --]

 t/t4003-diff-rename-1.sh       |    2 +-
 t/t4004-diff-rename-symlink.sh |    2 +-
 t/t4005-diff-rename-2.sh       |    2 +-
 t/t4008-diff-break-rewrite.sh  |    4 ++--
 t/t4009-diff-rename-4.sh       |    2 +-
 5 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/t/t4003-diff-rename-1.sh b/t/t4003-diff-rename-1.sh
index c6130c4..bfa8835 100755
--- a/t/t4003-diff-rename-1.sh
+++ b/t/t4003-diff-rename-1.sh
@@ -29,7 +29,7 @@ test_expect_success \
 # copy-and-edit one, and rename-and-edit the other.  We do not say
 # anything about rezrov.
 
-GIT_DIFF_OPTS=--unified=0 git diff-index -M -p $tree >current
+GIT_DIFF_OPTS=--unified=0 git diff-index -C -p $tree >current
 cat >expected <<\EOF
 diff --git a/COPYING b/COPYING.1
 copy from COPYING
diff --git a/t/t4004-diff-rename-symlink.sh b/t/t4004-diff-rename-symlink.sh
index 92a65f4..6e562c8 100755
--- a/t/t4004-diff-rename-symlink.sh
+++ b/t/t4004-diff-rename-symlink.sh
@@ -35,7 +35,7 @@ test_expect_success SYMLINKS \
 # a new creation.
 
 test_expect_success SYMLINKS 'setup diff output' "
-    GIT_DIFF_OPTS=--unified=0 git diff-index -M -p $tree >current &&
+    GIT_DIFF_OPTS=--unified=0 git diff-index -C -p $tree >current &&
     cat >expected <<\EOF
 diff --git a/bozbar b/bozbar
 new file mode 120000
diff --git a/t/t4005-diff-rename-2.sh b/t/t4005-diff-rename-2.sh
index 1ba359d..77d7f49 100755
--- a/t/t4005-diff-rename-2.sh
+++ b/t/t4005-diff-rename-2.sh
@@ -29,7 +29,7 @@ test_expect_success \
 # and COPYING.2 are based on COPYING, and do not say anything about
 # rezrov.
 
-git diff-index -M $tree >current
+git diff-index -C $tree >current
 
 cat >expected <<\EOF
 :100644 100644 6ff87c4664981e4397625791c8ea3bbb5f2279a3 0603b3238a076dc6c8022aedc6648fa523a17178 C1234	COPYING	COPYING.1
diff --git a/t/t4008-diff-break-rewrite.sh b/t/t4008-diff-break-rewrite.sh
index d79d9e1e..73b4a24 100755
--- a/t/t4008-diff-break-rewrite.sh
+++ b/t/t4008-diff-break-rewrite.sh
@@ -173,8 +173,8 @@ test_expect_success \
     'compare_diff_raw expected current'
 
 test_expect_success \
-    'run diff with -B -M' \
-    'git diff-index -B -M "$tree" >current'
+    'run diff with -B -C' \
+    'git diff-index -B -C "$tree" >current'
 
 cat >expected <<\EOF
 :100644 100644 f5deac7be59e7eeab8657fd9ae706fd6a57daed2 08bb2fb671deff4c03a4d4a0a1315dff98d5732c C095	file0	file1
diff --git a/t/t4009-diff-rename-4.sh b/t/t4009-diff-rename-4.sh
index de3f174..f22c8e3 100755
--- a/t/t4009-diff-rename-4.sh
+++ b/t/t4009-diff-rename-4.sh
@@ -29,7 +29,7 @@ test_expect_success \
 # and COPYING.2 are based on COPYING, and do not say anything about
 # rezrov.
 
-git diff-index -z -M $tree >current
+git diff-index -z -C $tree >current
 
 cat >expected <<\EOF
 :100644 100644 6ff87c4664981e4397625791c8ea3bbb5f2279a3 0603b3238a076dc6c8022aedc6648fa523a17178 C1234

^ permalink raw reply related	[flat|nested] 34+ messages in thread

* Re: Merging limitations after directory renames -- interesting test repo
  2011-02-18 23:27   ` Linus Torvalds
  2011-02-19  0:26     ` Linus Torvalds
  2011-02-19  0:44     ` Junio C Hamano
@ 2011-02-19  9:08     ` Jeff King
  2011-02-19 10:19     ` Jeff King
  3 siblings, 0 replies; 34+ messages in thread
From: Jeff King @ 2011-02-19  9:08 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Martin Langhoff, Git Mailing List

On Fri, Feb 18, 2011 at 03:27:36PM -0800, Linus Torvalds wrote:

> >  1. Did you bump up your merge.renamelimit? It's hard to see because it
> >     scrolls off the screen amidst the many conflicts, but the first
> >     message is:
> >
> >       warning: too many files (created: 425 deleted: 1093), skipping
> >       inexact rename detection
> >
> >     which you want to use. Try "git config merge.renamelimit
> >     10000". Which runs pretty snappily on my machine; I wonder if we
> >     should up the default limit.
> 
> Yeah, for the kernel, I have
> 
> 	[diff]
> 		renamelimit=0
> 
> to disable the limit entirely, because the default limit is very low
> indeed. Git is quite good at the rename detection.
> 
> However, the reason for the low default is not because it's not snappy
> enough - it's because it can end up using a lot of memory (and if
> you're low on memory, the swapping will mean that it goes from "quite
> snappy" to "slow as molasses" - but it still will not be CPU limited,
> it's just paging like crazy).

I think it can be both. There is an O(n^2) part to the algorithm. I did
some timings a few years ago that showed an n^2 increase in time as you
bumped the limit:

  http://article.gmane.org/gmane.comp.version-control.git/73519

That's staying within a reasonable memory size. I would not be surprised
if you can get much worse behavior by going into swap, but I didn't
measure peak memory use there.

Those tests led to:

  commit 50705915eae89eae490dff30fa370ed02e4d6e72
  Author: Jeff King <peff@peff.net>
  Date:   Wed Apr 30 13:24:43 2008 -0400

    bump rename limit defaults

    The current rename limit default of 100 was arbitrarily
    chosen. Testing[1] has shown that on modern hardware, a
    limit of 200 adds about a second of computation time, and a
    limit of 500 adds about 5 seconds of computation time.

    This patch bumps the default limit to 200 for viewing diffs,
    and to 500 for performing a merge. The limit for generating
    git-status templates is set independently; we bump it up to
    200 here, as well, to match the diff limit.

But perhaps it's time to revisit the test; it's been 2 years, and my
hardware at the time was probably 2 years out of date. :)

Here are the old and new times for various sizes of rename. Details
about the test are in the message referenced above.

   N   Old CPU Seconds   New CPU Seconds
  10              0.43              0.02
 100              0.44              0.20
 200              1.40              0.55
 400              4.87              1.90
 800             18.08              7.01
1000             27.82             10.83

So maybe bump the diff limit to 400 and the merge limit to 1000,
doubling both? That leaves us at around 2 seconds per-commit for a log,
and 10 seconds tacked onto a merge. We could maybe even go higher with
the merge limit. If it's such a big merge, the conflict resolution is
probably going to take forever anyway, so 30 extra seconds if it makes
rename detection work is probably a good thing.

According to top, git only hit around 17M resident on the 1000-sized
one, so I don't think memory is a problem, at least for average repos
(and yes, I know top is an awful way to measure, but it's quick and it
would need to be orders of magnitude off for it to be a problem).

So I'm in favor of bumping the limits, or possibly even removing the
hard number limit and putting in a "try to do renames for this many
seconds" option. If we're going to have something like 30 second delays
on merge, though, we should perhaps write some eye candy to stderr after
2 seconds or so (like we do with "git checkout").

> So I do think we could try to lift the default a bit, but it might be
> even more important to just make the message much more noticeable and
> avoid scrolling past it. For example, setting a flag, and not printing
> out the message immediately, but instead print it out only if it turns
> into trouble at the end.

Yeah, I also think that would be useful. And if that information filters
up to the merge command, it can even give better advice (like how to
tweak the limit).

-Peff

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: Merging limitations after directory renames -- interesting test repo
  2011-02-18 23:27   ` Linus Torvalds
                       ` (2 preceding siblings ...)
  2011-02-19  9:08     ` Jeff King
@ 2011-02-19 10:19     ` Jeff King
  2011-02-19 10:20       ` [PATCH 1/4] merge: improve inexact rename limit warning Jeff King
                         ` (4 more replies)
  3 siblings, 5 replies; 34+ messages in thread
From: Jeff King @ 2011-02-19 10:19 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Junio C Hamano, Martin Langhoff, Git Mailing List

On Fri, Feb 18, 2011 at 03:27:36PM -0800, Linus Torvalds wrote:

> So I do think we could try to lift the default a bit, but it might be
> even more important to just make the message much more noticeable and
> avoid scrolling past it. For example, setting a flag, and not printing
> out the message immediately, but instead print it out only if it turns
> into trouble at the end.

So here's a patch series. It's built on your other 3 patches (all of
which I thought looked good, btw).

  [1/4]: merge: improve inexact rename limit warning
  [2/4]: bump rename limit defaults (again)
  [3/4]: commit: stop setting rename limit
  [4/4]: inexact rename detection eye candy

-Peff

^ permalink raw reply	[flat|nested] 34+ messages in thread

* [PATCH 1/4] merge: improve inexact rename limit warning
  2011-02-19 10:19     ` Jeff King
@ 2011-02-19 10:20       ` Jeff King
  2011-02-21 23:33         ` Junio C Hamano
  2011-02-19 10:21       ` [PATCH 2/4] bump rename limit defaults (again) Jeff King
                         ` (3 subsequent siblings)
  4 siblings, 1 reply; 34+ messages in thread
From: Jeff King @ 2011-02-19 10:20 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Junio C Hamano, Martin Langhoff, Git Mailing List

The warning is generated deep in the diffcore code, which
means that it will come first, followed possibly by a spew
of conflicts, making it hard to see.

Instead, let's have diffcore pass back the information about
how big the rename limit would needed to have been, and then
the caller can provide a more appropriate message (and at a
more appropriate time).

No refactoring of other non-merge callers is necessary,
because nobody else was even using the warn_on_rename_limit
feature.

Signed-off-by: Jeff King <peff@peff.net>
---
 diff.h            |    2 +-
 diffcore-rename.c |    5 +++--
 merge-recursive.c |   10 +++++++++-
 merge-recursive.h |    1 +
 4 files changed, 14 insertions(+), 4 deletions(-)

diff --git a/diff.h b/diff.h
index 0083d92..f774c9a 100644
--- a/diff.h
+++ b/diff.h
@@ -110,7 +110,7 @@ struct diff_options {
 	int pickaxe_opts;
 	int rename_score;
 	int rename_limit;
-	int warn_on_too_large_rename;
+	int needed_rename_limit;
 	int dirstat_percent;
 	int setup;
 	int abbrev;
diff --git a/diffcore-rename.c b/diffcore-rename.c
index 0cd4c13..1d3d5cd 100644
--- a/diffcore-rename.c
+++ b/diffcore-rename.c
@@ -518,12 +518,13 @@ void diffcore_rename(struct diff_options *options)
 	 * but handles the potential overflow case specially (and we
 	 * assume at least 32-bit integers)
 	 */
+	options->needed_rename_limit = 0;
 	if (rename_limit <= 0 || rename_limit > 32767)
 		rename_limit = 32767;
 	if ((num_create > rename_limit && num_src > rename_limit) ||
 	    (num_create * num_src > rename_limit * rename_limit)) {
-		if (options->warn_on_too_large_rename)
-			warning("too many files (created: %d deleted: %d), skipping inexact rename detection", num_create, num_src);
+		options->needed_rename_limit =
+			num_src > num_create ? num_src : num_create;
 		goto cleanup;
 	}
 
diff --git a/merge-recursive.c b/merge-recursive.c
index 16c2dbe..2ecd456 100644
--- a/merge-recursive.c
+++ b/merge-recursive.c
@@ -22,6 +22,11 @@
 #include "dir.h"
 #include "submodule.h"
 
+static const char rename_limit_advice[] =
+"inexact rename detection was skipped because there were too many\n"
+"  files. You may want to set your merge.renamelimit variable to at least\n"
+"  %d and retry this merge.";
+
 static struct tree *shift_tree_object(struct tree *one, struct tree *two,
 				      const char *subtree_shift)
 {
@@ -436,12 +441,13 @@ static struct string_list *get_renames(struct merge_options *o,
 			    o->diff_rename_limit >= 0 ? o->diff_rename_limit :
 			    500;
 	opts.rename_score = o->rename_score;
-	opts.warn_on_too_large_rename = 1;
 	opts.output_format = DIFF_FORMAT_NO_OUTPUT;
 	if (diff_setup_done(&opts) < 0)
 		die("diff setup failed");
 	diff_tree_sha1(o_tree->object.sha1, tree->object.sha1, "", &opts);
 	diffcore_std(&opts);
+	if (opts.needed_rename_limit > o->needed_rename_limit)
+		o->needed_rename_limit = opts.needed_rename_limit;
 	for (i = 0; i < diff_queued_diff.nr; ++i) {
 		struct string_list_item *item;
 		struct rename *re;
@@ -1666,6 +1672,8 @@ int merge_recursive(struct merge_options *o,
 		commit_list_insert(h2, &(*result)->parents->next);
 	}
 	flush_output(o);
+	if (o->needed_rename_limit)
+		warning(rename_limit_advice, o->needed_rename_limit);
 	return clean;
 }
 
diff --git a/merge-recursive.h b/merge-recursive.h
index 981ed6a..1a113e2 100644
--- a/merge-recursive.h
+++ b/merge-recursive.h
@@ -20,6 +20,7 @@ struct merge_options {
 	int diff_rename_limit;
 	int merge_rename_limit;
 	int rename_score;
+	int needed_rename_limit;
 	int call_depth;
 	struct strbuf obuf;
 	struct string_list current_file_set;
-- 
1.7.4.1.26.g5e991

^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH 2/4] bump rename limit defaults (again)
  2011-02-19 10:19     ` Jeff King
  2011-02-19 10:20       ` [PATCH 1/4] merge: improve inexact rename limit warning Jeff King
@ 2011-02-19 10:21       ` Jeff King
  2011-02-19 17:54         ` Piotr Krukowiecki
  2011-02-19 20:12         ` Ævar Arnfjörð Bjarmason
  2011-02-19 10:21       ` [PATCH 3/4] commit: stop setting rename limit Jeff King
                         ` (2 subsequent siblings)
  4 siblings, 2 replies; 34+ messages in thread
From: Jeff King @ 2011-02-19 10:21 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Junio C Hamano, Martin Langhoff, Git Mailing List

We did this once before in 5070591 (bump rename limit
defaults, 2008-04-30). Back then, we were shooting for about
1 second for a diff/log calculation, and 5 seconds for a
merge.

There are a few new things to consider, though:

  1. Average processors are faster now.

  2. We've seen on the mailing list some ugly merges where
     not using inexact rename detection leads to many more
     conflicts. Merges of this size take a long time
     anyway, so users are probably happy to spend a little
     bit of time computing the renames.

Let's bump the diff/merge default limits from 200/500 to
400/1000. Those are 2 seconds and 10 seconds respectively on
my modern hardware.

Signed-off-by: Jeff King <peff@peff.net>
---
 diff.c            |    2 +-
 merge-recursive.c |    2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/diff.c b/diff.c
index 5422c43..869cca7 100644
--- a/diff.c
+++ b/diff.c
@@ -23,7 +23,7 @@
 #endif
 
 static int diff_detect_rename_default;
-static int diff_rename_limit_default = 200;
+static int diff_rename_limit_default = 400;
 static int diff_suppress_blank_empty;
 int diff_use_color_default = -1;
 static const char *diff_word_regex_cfg;
diff --git a/merge-recursive.c b/merge-recursive.c
index 2ecd456..089aa10 100644
--- a/merge-recursive.c
+++ b/merge-recursive.c
@@ -439,7 +439,7 @@ static struct string_list *get_renames(struct merge_options *o,
 	opts.detect_rename = DIFF_DETECT_RENAME;
 	opts.rename_limit = o->merge_rename_limit >= 0 ? o->merge_rename_limit :
 			    o->diff_rename_limit >= 0 ? o->diff_rename_limit :
-			    500;
+			    1000;
 	opts.rename_score = o->rename_score;
 	opts.output_format = DIFF_FORMAT_NO_OUTPUT;
 	if (diff_setup_done(&opts) < 0)
-- 
1.7.4.1.26.g5e991

^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH 3/4] commit: stop setting rename limit
  2011-02-19 10:19     ` Jeff King
  2011-02-19 10:20       ` [PATCH 1/4] merge: improve inexact rename limit warning Jeff King
  2011-02-19 10:21       ` [PATCH 2/4] bump rename limit defaults (again) Jeff King
@ 2011-02-19 10:21       ` Jeff King
  2011-02-19 10:25       ` [RFC/PATCH 4/4] inexact rename detection eye candy Jeff King
  2011-02-19 15:22       ` Merging limitations after directory renames -- interesting test repo Martin Langhoff
  4 siblings, 0 replies; 34+ messages in thread
From: Jeff King @ 2011-02-19 10:21 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Junio C Hamano, Martin Langhoff, Git Mailing List

For its post-commit summary, commit was explicitly setting
the default rename limit to 100. Presumably when the code
was added, it was necessary to do so. These days, however,
it will fall back properly to the diff default, and that
default has long since changed from 100.

Signed-off-by: Jeff King <peff@peff.net>
---
 builtin/commit.c |    1 -
 1 files changed, 0 insertions(+), 1 deletions(-)

diff --git a/builtin/commit.c b/builtin/commit.c
index d7f55e3..aa1510e 100644
--- a/builtin/commit.c
+++ b/builtin/commit.c
@@ -1215,7 +1215,6 @@ static void print_summary(const char *prefix, const unsigned char *sha1)
 	get_commit_format(format.buf, &rev);
 	rev.always_show_header = 0;
 	rev.diffopt.detect_rename = 1;
-	rev.diffopt.rename_limit = 100;
 	rev.diffopt.break_opt = 0;
 	diff_setup_done(&rev.diffopt);
 
-- 
1.7.4.1.26.g5e991

^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [RFC/PATCH 4/4] inexact rename detection eye candy
  2011-02-19 10:19     ` Jeff King
                         ` (2 preceding siblings ...)
  2011-02-19 10:21       ` [PATCH 3/4] commit: stop setting rename limit Jeff King
@ 2011-02-19 10:25       ` Jeff King
  2011-02-19 15:57         ` Linus Torvalds
  2011-02-19 16:29         ` Martin Langhoff
  2011-02-19 15:22       ` Merging limitations after directory renames -- interesting test repo Martin Langhoff
  4 siblings, 2 replies; 34+ messages in thread
From: Jeff King @ 2011-02-19 10:25 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Junio C Hamano, Martin Langhoff, Git Mailing List

During a merge, we might spend many seconds doing inexact
rename detection. It's nice to let the user know that
something is actually happening.

Signed-off-by: Jeff King <peff@peff.net>
---
This feels wrong because it's in such a deep library function. At the
very least we probably need some way to turn it off, so callers can pass
along any --quiet or --no-progress indicators. Though I think it should
perhaps be off entirely for any revision traversals (since the progress
will be per-commit, so it will be in the middle of log output). And then
on for diff (unless --quiet is given), and on for merge.

I made it update progress for each of the rename_src * rename_dst
similarity estimates. We could just as easily count rename_dst items we
look at, but hey, it's eye candy, and obviously bigger numbers are
better.

 diffcore-rename.c |    8 ++++++++
 1 files changed, 8 insertions(+), 0 deletions(-)

diff --git a/diffcore-rename.c b/diffcore-rename.c
index 1d3d5cd..136a86f 100644
--- a/diffcore-rename.c
+++ b/diffcore-rename.c
@@ -5,6 +5,7 @@
 #include "diff.h"
 #include "diffcore.h"
 #include "hash.h"
+#include "progress.h"
 
 /* Table of rename/copy destinations */
 
@@ -449,6 +450,7 @@ void diffcore_rename(struct diff_options *options)
 	struct diff_score *mx;
 	int i, j, rename_count;
 	int num_create, num_src, dst_cnt;
+	struct progress *progress;
 
 	if (!minimum_score)
 		minimum_score = DEFAULT_RENAME_SCORE;
@@ -528,6 +530,10 @@ void diffcore_rename(struct diff_options *options)
 		goto cleanup;
 	}
 
+	progress = start_progress_delay(
+			"Performing inexact rename detection",
+			rename_dst_nr * rename_src_nr, 50, 1);
+
 	mx = xcalloc(num_create * NUM_CANDIDATE_PER_DST, sizeof(*mx));
 	for (dst_cnt = i = 0; i < rename_dst_nr; i++) {
 		struct diff_filespec *two = rename_dst[i].two;
@@ -555,9 +561,11 @@ void diffcore_rename(struct diff_options *options)
 			 */
 			diff_free_filespec_blob(one);
 			diff_free_filespec_blob(two);
+			display_progress(progress, i*rename_src_nr + j);
 		}
 		dst_cnt++;
 	}
+	stop_progress(&progress);
 
 	/* cost matrix sorted by most to least similar pair */
 	qsort(mx, dst_cnt * NUM_CANDIDATE_PER_DST, sizeof(*mx), score_compare);
-- 
1.7.4.1.26.g5e991

^ permalink raw reply related	[flat|nested] 34+ messages in thread

* Re: Merging limitations after directory renames -- interesting test repo
  2011-02-19 10:19     ` Jeff King
                         ` (3 preceding siblings ...)
  2011-02-19 10:25       ` [RFC/PATCH 4/4] inexact rename detection eye candy Jeff King
@ 2011-02-19 15:22       ` Martin Langhoff
  2011-02-19 15:31         ` Martin Langhoff
  4 siblings, 1 reply; 34+ messages in thread
From: Martin Langhoff @ 2011-02-19 15:22 UTC (permalink / raw)
  To: Jeff King; +Cc: Linus Torvalds, Junio C Hamano, Git Mailing List

On Sat, Feb 19, 2011 at 5:19 AM, Jeff King <peff@peff.net> wrote:
> So here's a patch series. It's built on your other 3 patches (all of
> which I thought looked good, btw).

Right, this is looking better! Applied Linus and Jeff's patches, and
things are much more civilized.

Now, I still have a ton of cases where the 2 branches renamed it just
slightly different. In those cases, what I want to do is have a merge
helper script that does...

 git merge-helper --prefer-path foo/orig/README foo/preferred/README
foo/other/README

if git could keep the rename trios that it recognizes during the merge
operation in a "merge notes" file, we could just name

 git merge-helper --prefer-path foo/preferred/README

and have it infer the "non-preferred" paths.

The result of the operation should be the same as if git had magically
known what I wanted to do during merge: git attempts a git-smart diff3
type merge, and stores the results accordingly in the index.

Even for a large messy merge like this, it means I can say

 git merge
 git merge-helper --prefer-path src/templates/themes/default/* other/dir/foo/*

and that will clear out 99% of the stupid work with minimal fuss;
leaving the interesting merge work to be cleared up by hand.

I suspect that teaching git merge to accept about 'preferred paths',
and perhaps "cache/learn" about them, is also desirable. Harder
though.

Thoughts?

cheers,



m
-- 
 martin.langhoff@gmail.com
 martin@laptop.org -- Software Architect - OLPC
 - ask interesting questions
 - don't get distracted with shiny stuff  - working code first
 - http://wiki.laptop.org/go/User:Martinlanghoff

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: Merging limitations after directory renames -- interesting test repo
  2011-02-19 15:22       ` Merging limitations after directory renames -- interesting test repo Martin Langhoff
@ 2011-02-19 15:31         ` Martin Langhoff
  0 siblings, 0 replies; 34+ messages in thread
From: Martin Langhoff @ 2011-02-19 15:31 UTC (permalink / raw)
  To: Jeff King; +Cc: Linus Torvalds, Junio C Hamano, Git Mailing List

On Sat, Feb 19, 2011 at 10:22 AM, Martin Langhoff
<martin.langhoff@gmail.com> wrote:
>  git merge-helper --prefer-path foo/preferred/README

The concrete example in this repo could be

 gitk pg master --  trunk/etherpad/src/templates/store/eepnet_eval_signup.ejs \
    etherpad/src/themes/default/templates/store/eepnet_eval_signup.ejs \
   etherpad/src/templates/store/eepnet_eval_signup.ejs

The ideal thing would be to have a helper that knows what to do if I say

  git merge-helper --prefer
etherpad/src/themes/default/templates/store/eepnet_eval_signup.ejs

cheers,


m
-- 
 martin.langhoff@gmail.com
 martin@laptop.org -- Software Architect - OLPC
 - ask interesting questions
 - don't get distracted with shiny stuff  - working code first
 - http://wiki.laptop.org/go/User:Martinlanghoff

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [RFC/PATCH 4/4] inexact rename detection eye candy
  2011-02-19 10:25       ` [RFC/PATCH 4/4] inexact rename detection eye candy Jeff King
@ 2011-02-19 15:57         ` Linus Torvalds
  2011-02-20  9:48           ` Jeff King
  2011-02-19 16:29         ` Martin Langhoff
  1 sibling, 1 reply; 34+ messages in thread
From: Linus Torvalds @ 2011-02-19 15:57 UTC (permalink / raw)
  To: Jeff King; +Cc: Junio C Hamano, Martin Langhoff, Git Mailing List

On Sat, Feb 19, 2011 at 2:25 AM, Jeff King <peff@peff.net> wrote:
>
> This feels wrong because it's in such a deep library function. At the
> very least we probably need some way to turn it off, so callers can pass
> along any --quiet or --no-progress indicators.

Yeah.

> I made it update progress for each of the rename_src * rename_dst
> similarity estimates. We could just as easily count rename_dst items we
> look at, but hey, it's eye candy, and obviously bigger numbers are
> better.

Uhh. My only big reaction to your patch was literally "why don't you
just do it on the 'dst' items". I really don't think bigger numbers
are better, and if you have _so_ many sources that each dst takes so
long that you'd want updates at that granularity, you're too screwed
anyway.

Don't make the "update progress" be part of the O(n^2) problem.

                  Linus

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [RFC/PATCH 4/4] inexact rename detection eye candy
  2011-02-19 10:25       ` [RFC/PATCH 4/4] inexact rename detection eye candy Jeff King
  2011-02-19 15:57         ` Linus Torvalds
@ 2011-02-19 16:29         ` Martin Langhoff
  2011-02-20 10:04           ` Jeff King
  1 sibling, 1 reply; 34+ messages in thread
From: Martin Langhoff @ 2011-02-19 16:29 UTC (permalink / raw)
  To: Jeff King; +Cc: Linus Torvalds, Junio C Hamano, Git Mailing List

On Sat, Feb 19, 2011 at 5:25 AM, Jeff King <peff@peff.net> wrote:
> During a merge, we might spend many seconds doing inexact
> rename detection. It's nice to let the user know that
> something is actually happening.

Given that we're doing costly work, could the results of rename
matching be recorded in a .git/MERGE_RENAMES file?

If that file's available, the simple merge-helper script I've
described is possible...

git commit would have to unlink it on successful commit...

cheers,



m
-- 
 martin.langhoff@gmail.com
 martin@laptop.org -- Software Architect - OLPC
 - ask interesting questions
 - don't get distracted with shiny stuff  - working code first
 - http://wiki.laptop.org/go/User:Martinlanghoff

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 2/4] bump rename limit defaults (again)
  2011-02-19 10:21       ` [PATCH 2/4] bump rename limit defaults (again) Jeff King
@ 2011-02-19 17:54         ` Piotr Krukowiecki
  2011-02-20 10:10           ` Jeff King
  2011-02-19 20:12         ` Ævar Arnfjörð Bjarmason
  1 sibling, 1 reply; 34+ messages in thread
From: Piotr Krukowiecki @ 2011-02-19 17:54 UTC (permalink / raw)
  To: Jeff King
  Cc: Linus Torvalds, Junio C Hamano, Martin Langhoff, Git Mailing List

On Sat, Feb 19, 2011 at 11:21 AM, Jeff King <peff@peff.net> wrote:
> Let's bump the diff/merge default limits from 200/500 to
> 400/1000. Those are 2 seconds and 10 seconds respectively on
> my modern hardware.

Just curious - what is your modern hardware?

-- 
Piotrek

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 2/4] bump rename limit defaults (again)
  2011-02-19 10:21       ` [PATCH 2/4] bump rename limit defaults (again) Jeff King
  2011-02-19 17:54         ` Piotr Krukowiecki
@ 2011-02-19 20:12         ` Ævar Arnfjörð Bjarmason
  2011-02-20 10:12           ` Jeff King
  1 sibling, 1 reply; 34+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2011-02-19 20:12 UTC (permalink / raw)
  To: Jeff King
  Cc: Linus Torvalds, Junio C Hamano, Martin Langhoff, Git Mailing List

On Sat, Feb 19, 2011 at 11:21, Jeff King <peff@peff.net> wrote:

> Let's bump the diff/merge default limits from 200/500 to
> 400/1000. Those are 2 seconds and 10 seconds respectively on
> my modern hardware.

This is somewhat outside the scope of your patch, but rather than
making these decisions for the user and compiling them into Git
wouldn't it be better to expose it somehow?

E.g. when you do a git-merge show a progress bar (similar to
git-clone) showing that we're trying to do rename detection, show that
we stopped, and when we fail point out that the user could run
git-merge with some switch that would make Git try harder.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [RFC/PATCH 4/4] inexact rename detection eye candy
  2011-02-19 15:57         ` Linus Torvalds
@ 2011-02-20  9:48           ` Jeff King
  2011-02-20  9:51             ` [PATCH 1/3] add inexact rename detection progress infrastructure Jeff King
                               ` (3 more replies)
  0 siblings, 4 replies; 34+ messages in thread
From: Jeff King @ 2011-02-20  9:48 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Junio C Hamano, Martin Langhoff, Git Mailing List

On Sat, Feb 19, 2011 at 07:57:57AM -0800, Linus Torvalds wrote:

> > I made it update progress for each of the rename_src * rename_dst
> > similarity estimates. We could just as easily count rename_dst items we
> > look at, but hey, it's eye candy, and obviously bigger numbers are
> > better.
> 
> Uhh. My only big reaction to your patch was literally "why don't you
> just do it on the 'dst' items". I really don't think bigger numbers
> are better, and if you have _so_ many sources that each dst takes so
> long that you'd want updates at that granularity, you're too screwed
> anyway.
> 
> Don't make the "update progress" be part of the O(n^2) problem.

I timed it and it's not. The progress code is smart enough not to
actually print anything more than once per second. So it's just an extra
function call per loop, which is dwarfed by the massive
estimate_similarity(). It's really not that tight a loop.

That being said, the output seems a little smoother to me hoisting it
out, so I've put that in my re-roll. The bigger numbers are worth
keeping, IMHO, as they are a more accurate reflection of how much work
is being done. You do the same amount of work with 2 dests and 1000
sources as you do with 1000 dests and 2 sources. But one will count to
1000, and the other will count to 2. They should probably both count to
2000, the number of estimate_similarity() calls we must make, which is
the expensive part. (Actually that is not quite accurate, as we may skip
some calls for destinations already found, but it's not worth the effort
to figure out how many calls we'll actually make ahead of time).

So here is the re-roll, which I think is probably OK for inclusion. It
replaces 4/4 from my last series.

  [1/3]: add inexact rename detection progress infrastructure
  [2/3]: merge: enable progress reporting for rename detection
  [3/3]: pull: propagate --progress to merge

-Peff

^ permalink raw reply	[flat|nested] 34+ messages in thread

* [PATCH 1/3] add inexact rename detection progress infrastructure
  2011-02-20  9:48           ` Jeff King
@ 2011-02-20  9:51             ` Jeff King
  2011-02-20  9:53             ` [PATCH 2/3] merge: enable progress reporting for rename detection Jeff King
                               ` (2 subsequent siblings)
  3 siblings, 0 replies; 34+ messages in thread
From: Jeff King @ 2011-02-20  9:51 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Junio C Hamano, Martin Langhoff, Git Mailing List

We might spend many seconds doing inexact rename detection
with no output.  It's nice to let the user know that
something is actually happening.

This patch adds the infrastructure, but no callers actually
turn on progress reporting.

Signed-off-by: Jeff King <peff@peff.net>
---
Replaces the old 4/4. Now not on by default, progress reporting is
hoisted to the outer loop, and actually counts all the way to 100%.

 diff.h            |    1 +
 diffcore-rename.c |   10 ++++++++++
 2 files changed, 11 insertions(+), 0 deletions(-)

diff --git a/diff.h b/diff.h
index f774c9a..9585e41 100644
--- a/diff.h
+++ b/diff.h
@@ -111,6 +111,7 @@ struct diff_options {
 	int rename_score;
 	int rename_limit;
 	int needed_rename_limit;
+	int show_rename_progress;
 	int dirstat_percent;
 	int setup;
 	int abbrev;
diff --git a/diffcore-rename.c b/diffcore-rename.c
index 1d3d5cd..d40e40a 100644
--- a/diffcore-rename.c
+++ b/diffcore-rename.c
@@ -5,6 +5,7 @@
 #include "diff.h"
 #include "diffcore.h"
 #include "hash.h"
+#include "progress.h"
 
 /* Table of rename/copy destinations */
 
@@ -449,6 +450,7 @@ void diffcore_rename(struct diff_options *options)
 	struct diff_score *mx;
 	int i, j, rename_count;
 	int num_create, num_src, dst_cnt;
+	struct progress *progress = NULL;
 
 	if (!minimum_score)
 		minimum_score = DEFAULT_RENAME_SCORE;
@@ -528,6 +530,12 @@ void diffcore_rename(struct diff_options *options)
 		goto cleanup;
 	}
 
+	if (options->show_rename_progress) {
+		progress = start_progress_delay(
+				"Performing inexact rename detection",
+				rename_dst_nr * rename_src_nr, 50, 1);
+	}
+
 	mx = xcalloc(num_create * NUM_CANDIDATE_PER_DST, sizeof(*mx));
 	for (dst_cnt = i = 0; i < rename_dst_nr; i++) {
 		struct diff_filespec *two = rename_dst[i].two;
@@ -557,7 +565,9 @@ void diffcore_rename(struct diff_options *options)
 			diff_free_filespec_blob(two);
 		}
 		dst_cnt++;
+		display_progress(progress, (i+1)*rename_src_nr);
 	}
+	stop_progress(&progress);
 
 	/* cost matrix sorted by most to least similar pair */
 	qsort(mx, dst_cnt * NUM_CANDIDATE_PER_DST, sizeof(*mx), score_compare);
-- 
1.7.4.1.26.g5e991

^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH 2/3] merge: enable progress reporting for rename detection
  2011-02-20  9:48           ` Jeff King
  2011-02-20  9:51             ` [PATCH 1/3] add inexact rename detection progress infrastructure Jeff King
@ 2011-02-20  9:53             ` Jeff King
  2011-02-20  9:56             ` [PATCH 3/3] pull: propagate --progress to merge Jeff King
  2011-02-20 10:37             ` [RFC/PATCH 4/4] inexact rename detection eye candy Jeff King
  3 siblings, 0 replies; 34+ messages in thread
From: Jeff King @ 2011-02-20  9:53 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Junio C Hamano, Martin Langhoff, Git Mailing List

The user can enable or disable it explicitly with the new
--progress, but it defaults to checking isatty(2).

This works only with merge-recursive and subtree. In theory
we could pass a progress flag to other strategies, but none
of them support progress at this point, so let's wait until
they grow such a feature before worrying about propagating
it.

Signed-off-by: Jeff King <peff@peff.net>
---
In theory somebody could have a third-party strategy that cares about
progress. Passing the information would probably have to go through the
environment, as strategies will barf on command lines they don't
understand.

I punted on it until somebody shows up who actually cares.

 Documentation/merge-options.txt |   10 +++++++++-
 builtin/merge.c                 |    7 +++++++
 merge-recursive.c               |    1 +
 merge-recursive.h               |    1 +
 4 files changed, 18 insertions(+), 1 deletions(-)

diff --git a/Documentation/merge-options.txt b/Documentation/merge-options.txt
index e33e0f8..b613d4e 100644
--- a/Documentation/merge-options.txt
+++ b/Documentation/merge-options.txt
@@ -75,9 +75,17 @@ option can be used to override --squash.
 ifndef::git-pull[]
 -q::
 --quiet::
-	Operate quietly.
+	Operate quietly. Implies --no-progress.
 
 -v::
 --verbose::
 	Be verbose.
+
+--progress::
+--no-progress::
+	Turn progress on/off explicitly. If neither is specified,
+	progress is shown if standard error is connected to a terminal.
+	Note that not all merge strategies may support progress
+	reporting.
+
 endif::git-pull[]
diff --git a/builtin/merge.c b/builtin/merge.c
index 8c58c3c..d81bee3 100644
--- a/builtin/merge.c
+++ b/builtin/merge.c
@@ -58,6 +58,7 @@ static int option_renormalize;
 static int verbosity;
 static int allow_rerere_auto;
 static int abort_current_merge;
+static int show_progress = -1;
 
 static struct strategy all_strategy[] = {
 	{ "recursive",  DEFAULT_TWOHEAD | NO_TRIVIAL },
@@ -200,6 +201,7 @@ static struct option builtin_merge_options[] = {
 	OPT__VERBOSITY(&verbosity),
 	OPT_BOOLEAN(0, "abort", &abort_current_merge,
 		"abort the current in-progress merge"),
+	OPT_SET_INT(0, "progress", &show_progress, "force progress reporting", 1),
 	OPT_END()
 };
 
@@ -660,6 +662,8 @@ static int try_merge_strategy(const char *strategy, struct commit_list *common,
 			o.subtree_shift = "";
 
 		o.renormalize = option_renormalize;
+		o.show_rename_progress =
+			show_progress == -1 ? isatty(2) : show_progress;
 
 		for (x = 0; x < xopts_nr; x++)
 			if (parse_merge_opt(&o, xopts[x]))
@@ -946,6 +950,9 @@ int cmd_merge(int argc, const char **argv, const char *prefix)
 	argc = parse_options(argc, argv, prefix, builtin_merge_options,
 			builtin_merge_usage, 0);
 
+	if (verbosity < 0 && show_progress == -1)
+		show_progress = 0;
+
 	if (abort_current_merge) {
 		int nargc = 2;
 		const char *nargv[] = {"reset", "--merge", NULL};
diff --git a/merge-recursive.c b/merge-recursive.c
index 089aa10..6c8f957 100644
--- a/merge-recursive.c
+++ b/merge-recursive.c
@@ -441,6 +441,7 @@ static struct string_list *get_renames(struct merge_options *o,
 			    o->diff_rename_limit >= 0 ? o->diff_rename_limit :
 			    1000;
 	opts.rename_score = o->rename_score;
+	opts.show_rename_progress = o->show_rename_progress;
 	opts.output_format = DIFF_FORMAT_NO_OUTPUT;
 	if (diff_setup_done(&opts) < 0)
 		die("diff setup failed");
diff --git a/merge-recursive.h b/merge-recursive.h
index 1a113e2..7e1e972 100644
--- a/merge-recursive.h
+++ b/merge-recursive.h
@@ -21,6 +21,7 @@ struct merge_options {
 	int merge_rename_limit;
 	int rename_score;
 	int needed_rename_limit;
+	int show_rename_progress;
 	int call_depth;
 	struct strbuf obuf;
 	struct string_list current_file_set;
-- 
1.7.4.1.26.g5e991

^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH 3/3] pull: propagate --progress to merge
  2011-02-20  9:48           ` Jeff King
  2011-02-20  9:51             ` [PATCH 1/3] add inexact rename detection progress infrastructure Jeff King
  2011-02-20  9:53             ` [PATCH 2/3] merge: enable progress reporting for rename detection Jeff King
@ 2011-02-20  9:56             ` Jeff King
  2011-02-20 10:37             ` [RFC/PATCH 4/4] inexact rename detection eye candy Jeff King
  3 siblings, 0 replies; 34+ messages in thread
From: Jeff King @ 2011-02-20  9:56 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Junio C Hamano, Martin Langhoff, Git Mailing List

Now that merge understands progress, we should pass it
along. While we're at it, pass along --no-progress, too.

Signed-off-by: Jeff King <peff@peff.net>
---
Probably not a big deal, but good for people who do "git pull
--no-progress" from a cronjob (which should be !isatty(2), but ISTR some
complaints by people with weird setups).

Fetch respects --no-progress, but I don't think it actually works right.
The transport code seems to only understand "want progress" or "don't
know, guess on isatty". But it's not really related to this topic, so I
didn't investigate too far tonight.

 git-pull.sh |    6 ++++--
 1 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/git-pull.sh b/git-pull.sh
index f6b7b84..63b063a 100755
--- a/git-pull.sh
+++ b/git-pull.sh
@@ -53,6 +53,8 @@ do
 		verbosity="$verbosity -v" ;;
 	--progress)
 		progress=--progress ;;
+	--no-progress)
+		progress=--no-progress ;;
 	-n|--no-stat|--no-summary)
 		diffstat=--no-stat ;;
 	--stat|--summary)
@@ -293,8 +295,8 @@ true)
 	;;
 *)
 	eval="git-merge $diffstat $no_commit $squash $no_ff $ff_only"
-	eval="$eval  $log_arg $strategy_args $merge_args"
-	eval="$eval \"\$merge_name\" HEAD $merge_head $verbosity"
+	eval="$eval  $log_arg $strategy_args $merge_args $verbosity $progress"
+	eval="$eval \"\$merge_name\" HEAD $merge_head"
 	;;
 esac
 eval "exec $eval"
-- 
1.7.4.1.26.g5e991

^ permalink raw reply related	[flat|nested] 34+ messages in thread

* Re: [RFC/PATCH 4/4] inexact rename detection eye candy
  2011-02-19 16:29         ` Martin Langhoff
@ 2011-02-20 10:04           ` Jeff King
  2011-02-20 13:16             ` Martin Langhoff
  0 siblings, 1 reply; 34+ messages in thread
From: Jeff King @ 2011-02-20 10:04 UTC (permalink / raw)
  To: Martin Langhoff; +Cc: Linus Torvalds, Junio C Hamano, Git Mailing List

On Sat, Feb 19, 2011 at 11:29:33AM -0500, Martin Langhoff wrote:

> On Sat, Feb 19, 2011 at 5:25 AM, Jeff King <peff@peff.net> wrote:
> > During a merge, we might spend many seconds doing inexact
> > rename detection. It's nice to let the user know that
> > something is actually happening.
> 
> Given that we're doing costly work, could the results of rename
> matching be recorded in a .git/MERGE_RENAMES file?
> 
> If that file's available, the simple merge-helper script I've
> described is possible...

Yeah we can if it's useful, but I'd rather see a prototype merge helper
first. In particular, your merge helper seems like it would help in two
situations:

  1. In a rename/rename conflict, you have X becoming Y on one branch
     and Z on the other. You want to say "X should have become Y", and
     then have it 3-way merge X, Y, and Z, storing the result under the
     name Y.

     For that do we actually want the rename list, or just the list of
     rename/rename conflicts?

  2. If git misses a rename, then by definition won't that rename not be
     in the MERGE_RENAMES list? I guess it may have picked up the rename
     on one half of history, and you could use that?

     I'm not quite clear on the exact usage.

So it makes sense to me to get a working merge helper that just uses
"git diff" as appropriate to look up old renames, and then we'll know
what information is required.

-Peff

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 2/4] bump rename limit defaults (again)
  2011-02-19 17:54         ` Piotr Krukowiecki
@ 2011-02-20 10:10           ` Jeff King
  0 siblings, 0 replies; 34+ messages in thread
From: Jeff King @ 2011-02-20 10:10 UTC (permalink / raw)
  To: Piotr Krukowiecki
  Cc: Linus Torvalds, Junio C Hamano, Martin Langhoff, Git Mailing List

On Sat, Feb 19, 2011 at 06:54:47PM +0100, Piotr Krukowiecki wrote:

> On Sat, Feb 19, 2011 at 11:21 AM, Jeff King <peff@peff.net> wrote:
> > Let's bump the diff/merge default limits from 200/500 to
> > 400/1000. Those are 2 seconds and 10 seconds respectively on
> > my modern hardware.
> 
> Just curious - what is your modern hardware?

It's a Core i7 840QM. So it's a measly 1.8GHz, but it turboboosts up to
3.2GHz on a single-threaded process.  The multiple cores are irrelevant
for this timing, as it's very single-threaded[1].

The process used in the tens of megabytes of memory, so I think memory
size is irrelevant. It was very CPU bound in my tests.

-Peff

[1] Actually, this is one place where multi-threading the algorithm
would be very easy. It's literally an m*n double-loop, seeing which of
each "n" sources matches each "m" destination best, and the expensive
bit is comparing a source and dest. So you could parallelize score
calculation up to m*n ways, then just sort the result for each dest.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 2/4] bump rename limit defaults (again)
  2011-02-19 20:12         ` Ævar Arnfjörð Bjarmason
@ 2011-02-20 10:12           ` Jeff King
  0 siblings, 0 replies; 34+ messages in thread
From: Jeff King @ 2011-02-20 10:12 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: Linus Torvalds, Junio C Hamano, Martin Langhoff, Git Mailing List

On Sat, Feb 19, 2011 at 09:12:49PM +0100, Ævar Arnfjörð Bjarmason wrote:

> On Sat, Feb 19, 2011 at 11:21, Jeff King <peff@peff.net> wrote:
> 
> > Let's bump the diff/merge default limits from 200/500 to
> > 400/1000. Those are 2 seconds and 10 seconds respectively on
> > my modern hardware.
> 
> This is somewhat outside the scope of your patch, but rather than
> making these decisions for the user and compiling them into Git
> wouldn't it be better to expose it somehow?

What do you mean by expose it? We already have config options; this is
just for the defaults.

Or do you mean allowing setting merge.renamelimits to "30s" and cutting
off detection after 30 seconds? I think that would be a reasonable idea,
though part of me doesn't like non-deterministic answers based on how
loaded the system is.

> E.g. when you do a git-merge show a progress bar (similar to
> git-clone) showing that we're trying to do rename detection, show that
> we stopped, and when we fail point out that the user could run
> git-merge with some switch that would make Git try harder.

Er, isn't that exactly what patches 1/4 and 4/4 in my series do?

-Peff

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [RFC/PATCH 4/4] inexact rename detection eye candy
  2011-02-20  9:48           ` Jeff King
                               ` (2 preceding siblings ...)
  2011-02-20  9:56             ` [PATCH 3/3] pull: propagate --progress to merge Jeff King
@ 2011-02-20 10:37             ` Jeff King
  3 siblings, 0 replies; 34+ messages in thread
From: Jeff King @ 2011-02-20 10:37 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Junio C Hamano, Martin Langhoff, Git Mailing List

On Sun, Feb 20, 2011 at 04:48:04AM -0500, Jeff King wrote:

> (Actually that is not quite accurate, as we may skip some calls for
> destinations already found, but it's not worth the effort to figure
> out how many calls we'll actually make ahead of time).

Hmm. Actually, we do calculate the exact amount of work a few lines
above. So we could be more accurate. I'm still not sure it really
matters.

-Peff

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [RFC/PATCH 4/4] inexact rename detection eye candy
  2011-02-20 10:04           ` Jeff King
@ 2011-02-20 13:16             ` Martin Langhoff
  0 siblings, 0 replies; 34+ messages in thread
From: Martin Langhoff @ 2011-02-20 13:16 UTC (permalink / raw)
  To: Jeff King; +Cc: Linus Torvalds, Junio C Hamano, Git Mailing List

On Sun, Feb 20, 2011 at 5:04 AM, Jeff King <peff@peff.net> wrote:
> Yeah we can if it's useful, but I'd rather see a prototype merge helper
> first. In particular, your merge helper seems like it would help in two
> situations:

Kind of circular -- the helper depends on

 - What unresolved cases do we have?

 - What data can you store from the diff machinery?

cheers,



m
-- 
 martin.langhoff@gmail.com
 martin@laptop.org -- Software Architect - OLPC
 - ask interesting questions
 - don't get distracted with shiny stuff  - working code first
 - http://wiki.laptop.org/go/User:Martinlanghoff

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 1/4] merge: improve inexact rename limit warning
  2011-02-19 10:20       ` [PATCH 1/4] merge: improve inexact rename limit warning Jeff King
@ 2011-02-21 23:33         ` Junio C Hamano
  2011-02-22 15:39           ` Jeff King
  0 siblings, 1 reply; 34+ messages in thread
From: Junio C Hamano @ 2011-02-21 23:33 UTC (permalink / raw)
  To: Jeff King; +Cc: Linus Torvalds, Martin Langhoff, Git Mailing List

Jeff King <peff@peff.net> writes:

> The warning is generated deep in the diffcore code, which
> means that it will come first, followed possibly by a spew
> of conflicts, making it hard to see.
>
> Instead, let's have diffcore pass back the information about
> how big the rename limit would needed to have been, and then
> the caller can provide a more appropriate message (and at a
> more appropriate time).
>
> No refactoring of other non-merge callers is necessary,
> because nobody else was even using the warn_on_rename_limit
> feature.
>
> Signed-off-by: Jeff King <peff@peff.net>

Thanks.

This conflicts with 2840824 (diffcore-rename: fall back to -C when -C -C
busts the rename limit, 2011-01-06) on 'pu', unfortunately.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 1/4] merge: improve inexact rename limit warning
  2011-02-21 23:33         ` Junio C Hamano
@ 2011-02-22 15:39           ` Jeff King
  2011-02-22 18:46             ` Junio C Hamano
  0 siblings, 1 reply; 34+ messages in thread
From: Jeff King @ 2011-02-22 15:39 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Linus Torvalds, Martin Langhoff, Git Mailing List

On Mon, Feb 21, 2011 at 03:33:14PM -0800, Junio C Hamano wrote:

> > Instead, let's have diffcore pass back the information about
> > how big the rename limit would needed to have been, and then
> > the caller can provide a more appropriate message (and at a
> > more appropriate time).
> [...]
> This conflicts with 2840824 (diffcore-rename: fall back to -C when -C -C
> busts the rename limit, 2011-01-06) on 'pu', unfortunately.

Do you want to do the merge, or do you want me to rebase on top of
2840824?

-Peff

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 1/4] merge: improve inexact rename limit warning
  2011-02-22 15:39           ` Jeff King
@ 2011-02-22 18:46             ` Junio C Hamano
  2011-02-23  8:02               ` Jeff King
  0 siblings, 1 reply; 34+ messages in thread
From: Junio C Hamano @ 2011-02-22 18:46 UTC (permalink / raw)
  To: Jeff King
  Cc: Junio C Hamano, Linus Torvalds, Martin Langhoff, Git Mailing List

Jeff King <peff@peff.net> writes:

> On Mon, Feb 21, 2011 at 03:33:14PM -0800, Junio C Hamano wrote:
> ...
>> This conflicts with 2840824 (diffcore-rename: fall back to -C when -C -C
>> busts the rename limit, 2011-01-06) on 'pu', unfortunately.
>
> Do you want to do the merge, or do you want me to rebase on top of
> 2840824?

Please check the resolution queued on 'pu'; I suspect that your series
should graduate before the fall-back-to-c-from-c-c topic, so I'd rather
not to see you rebase this.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 1/4] merge: improve inexact rename limit warning
  2011-02-22 18:46             ` Junio C Hamano
@ 2011-02-23  8:02               ` Jeff King
  0 siblings, 0 replies; 34+ messages in thread
From: Jeff King @ 2011-02-23  8:02 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Linus Torvalds, Martin Langhoff, Git Mailing List

On Tue, Feb 22, 2011 at 10:46:57AM -0800, Junio C Hamano wrote:

> >> This conflicts with 2840824 (diffcore-rename: fall back to -C when -C -C
> >> busts the rename limit, 2011-01-06) on 'pu', unfortunately.
> >
> > Do you want to do the merge, or do you want me to rebase on top of
> > 2840824?
> 
> Please check the resolution queued on 'pu'; I suspect that your series
> should graduate before the fall-back-to-c-from-c-c topic, so I'd rather
> not to see you rebase this.

It looks sane to me. I was going to say we should also have
merge-recursive check for the fallback-to-c flag and tell the user what
happened, but I don't think merge-recursive ever has FIND_COPIES_HARDER
enabled.

Which I think means in 2840824 itself, the warning about fallback can
never be triggered (because merge-recursive is the only caller who
turned on warn_on_too_large_rename).

-Peff

^ permalink raw reply	[flat|nested] 34+ messages in thread

end of thread, other threads:[~2011-02-23  8:02 UTC | newest]

Thread overview: 34+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2011-02-18 18:58 Merging limitations after directory renames -- interesting test repo Martin Langhoff
2011-02-18 22:21 ` Jeff King
2011-02-18 23:27   ` Linus Torvalds
2011-02-19  0:26     ` Linus Torvalds
2011-02-19  0:52       ` Junio C Hamano
2011-02-19  1:03         ` Linus Torvalds
2011-02-19  1:20           ` Linus Torvalds
2011-02-19  0:55       ` Linus Torvalds
2011-02-19  0:44     ` Junio C Hamano
2011-02-19  9:08     ` Jeff King
2011-02-19 10:19     ` Jeff King
2011-02-19 10:20       ` [PATCH 1/4] merge: improve inexact rename limit warning Jeff King
2011-02-21 23:33         ` Junio C Hamano
2011-02-22 15:39           ` Jeff King
2011-02-22 18:46             ` Junio C Hamano
2011-02-23  8:02               ` Jeff King
2011-02-19 10:21       ` [PATCH 2/4] bump rename limit defaults (again) Jeff King
2011-02-19 17:54         ` Piotr Krukowiecki
2011-02-20 10:10           ` Jeff King
2011-02-19 20:12         ` Ævar Arnfjörð Bjarmason
2011-02-20 10:12           ` Jeff King
2011-02-19 10:21       ` [PATCH 3/4] commit: stop setting rename limit Jeff King
2011-02-19 10:25       ` [RFC/PATCH 4/4] inexact rename detection eye candy Jeff King
2011-02-19 15:57         ` Linus Torvalds
2011-02-20  9:48           ` Jeff King
2011-02-20  9:51             ` [PATCH 1/3] add inexact rename detection progress infrastructure Jeff King
2011-02-20  9:53             ` [PATCH 2/3] merge: enable progress reporting for rename detection Jeff King
2011-02-20  9:56             ` [PATCH 3/3] pull: propagate --progress to merge Jeff King
2011-02-20 10:37             ` [RFC/PATCH 4/4] inexact rename detection eye candy Jeff King
2011-02-19 16:29         ` Martin Langhoff
2011-02-20 10:04           ` Jeff King
2011-02-20 13:16             ` Martin Langhoff
2011-02-19 15:22       ` Merging limitations after directory renames -- interesting test repo Martin Langhoff
2011-02-19 15:31         ` Martin Langhoff

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.