All of lore.kernel.org
 help / color / mirror / Atom feed
* [RFC] rename detection: allow more renames
@ 2015-11-13 16:35 Andreas Krey
  2015-11-24 23:33 ` Jeff King
  0 siblings, 1 reply; 5+ messages in thread
From: Andreas Krey @ 2015-11-13 16:35 UTC (permalink / raw)
  To: Git Mailing List

Hi all,

we also ran into the maximum rename limit
in the rename detection. (Yes, we get a lot
of rename candidates when cherry-picking
between two specific releases.)

The code talks about limiting the size
of the rename matrix, but as far as I
can see, the matrix itself never exists
as such, and the only thing that could
actually overflow is the computation
for the progress indication. This
can be fixed by reporting just the
destinations checked instead of the
combinations, since we only update
the progress once per destination
anyway.

If the direction is good, I will
shape it up (and obtain the
proper signoffs).


diff --git a/diffcore-rename.c b/diffcore-rename.c
index 749a35d..279f24e 100644
--- a/diffcore-rename.c
+++ b/diffcore-rename.c
@@ -368,7 +368,7 @@ static int too_many_rename_candidates(int num_create,
 	int rename_limit = options->rename_limit;
 	int num_src = rename_src_nr;
 	int i;
-
+	static int max_rename_limit = 100000;
 	options->needed_rename_limit = 0;
 
 	/*
@@ -380,8 +380,8 @@ static int too_many_rename_candidates(int num_create,
 	 * but handles the potential overflow case specially (and we
 	 * assume at least 32-bit integers)
 	 */
-	if (rename_limit <= 0 || rename_limit > 32767)
-		rename_limit = 32767;
+	if (rename_limit <= 0 || rename_limit > max_rename_limit)
+		rename_limit = max_rename_limit;
 	if ((num_create <= rename_limit || num_src <= rename_limit) &&
 	    (num_create * num_src <= rename_limit * rename_limit))
 		return 0;
@@ -515,7 +515,7 @@ void diffcore_rename(struct diff_options *options)
 	if (options->show_rename_progress) {
 		progress = start_progress_delay(
 				_("Performing inexact rename detection"),
-				rename_dst_nr * rename_src_nr, 50, 1);
+				rename_dst_nr, 50, 1);
 	}
 
 	mx = xcalloc(num_create * NUM_CANDIDATE_PER_DST, sizeof(*mx));
@@ -552,7 +552,7 @@ void diffcore_rename(struct diff_options *options)
 			diff_free_filespec_blob(two);
 		}
 		dst_cnt++;
-		display_progress(progress, (i+1)*rename_src_nr);
+		display_progress(progress, (i+1));
 	}
 	stop_progress(&progress);


And we would also like to see progress when doing
a cherry pick - in our case this takes a few minutes:

 
diff --git a/sequencer.c b/sequencer.c
index 3c060e0..8fce028 100644
--- a/sequencer.c
+++ b/sequencer.c
@@ -282,6 +282,7 @@ static int do_recursive_merge(struct commit *base, struct commit *next,
 
 	for (xopt = opts->xopts; xopt != opts->xopts + opts->xopts_nr; xopt++)
 		parse_merge_opt(&o, *xopt);
+	o.show_rename_progress = isatty(2);
 
 	clean = merge_trees(&o,
 			    head_tree,


Andreas

-- 
"Totally trivial. Famous last words."
From: Linus Torvalds <torvalds@*.org>
Date: Fri, 22 Jan 2010 07:29:21 -0800

^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [RFC] rename detection: allow more renames
  2015-11-13 16:35 [RFC] rename detection: allow more renames Andreas Krey
@ 2015-11-24 23:33 ` Jeff King
  2015-11-25 12:03   ` Andreas Krey
  2015-11-30 18:25   ` Junio C Hamano
  0 siblings, 2 replies; 5+ messages in thread
From: Jeff King @ 2015-11-24 23:33 UTC (permalink / raw)
  To: Andreas Krey; +Cc: Git Mailing List

On Fri, Nov 13, 2015 at 05:35:06PM +0100, Andreas Krey wrote:

> The code talks about limiting the size
> of the rename matrix, but as far as I
> can see, the matrix itself never exists
> as such, and the only thing that could
> actually overflow is the computation
> for the progress indication. This
> can be fixed by reporting just the
> destinations checked instead of the
> combinations, since we only update
> the progress once per destination
> anyway.

I didn't dig in the archive, but I think we discussed the "just show
progress for destinations" before. The problem you run into is that the
items aren't a good indication of the amount of work. You really are
doing n*m work, so if you just count "m", it can be very misleading if
"n" is high (and vice versa).

Might it make more sense just to move to a larger integer size? Or
perhaps to allow a higher limit for each side as long as the product of
the sides does not overflow?

> And we would also like to see progress when doing
> a cherry pick - in our case this takes a few minutes:
> 
>  
> diff --git a/sequencer.c b/sequencer.c
> index 3c060e0..8fce028 100644
> --- a/sequencer.c
> +++ b/sequencer.c
> @@ -282,6 +282,7 @@ static int do_recursive_merge(struct commit *base, struct commit *next,
>  
>  	for (xopt = opts->xopts; xopt != opts->xopts + opts->xopts_nr; xopt++)
>  		parse_merge_opt(&o, *xopt);
> +	o.show_rename_progress = isatty(2);
>  
>  	clean = merge_trees(&o,
>  			    head_tree,

I think that's sensible, for the same reasons that "merge" shows
progress. I suspect the patch you show above catches "git revert", too,
but if not, it should probably get the same treatment.

-Peff

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [RFC] rename detection: allow more renames
  2015-11-24 23:33 ` Jeff King
@ 2015-11-25 12:03   ` Andreas Krey
  2015-11-25 12:36     ` Jeff King
  2015-11-30 18:25   ` Junio C Hamano
  1 sibling, 1 reply; 5+ messages in thread
From: Andreas Krey @ 2015-11-25 12:03 UTC (permalink / raw)
  To: Jeff King; +Cc: Git Mailing List

Hi Jeff,

thanks for the reply!

On Tue, 24 Nov 2015 18:33:28 +0000, Jeff King wrote:
...
> I didn't dig in the archive, but I think we discussed the "just show
> progress for destinations" before. The problem you run into is that the
> items aren't a good indication of the amount of work. You really are
> doing n*m work, so if you just count "m", it can be very misleading if
> "n" is high (and vice versa).

True, but the loops do progress indication for destinations only anyway.
So if you only have three destinations and a zillion sources, you
will still get only three progress updates, even if they say
'one zillion (33%)', 'two zillion (67%)', ...

I think as long as this is the case we can as well report the destination
count; maybe put the source count somewhere in the progress text.

> Might it make more sense just to move to a larger integer size?

Which would be? I'd venture into the progress code to identify
the necessary changes.

> Or
> perhaps to allow a higher limit for each side as long as the product of
> the sides does not overflow?

We're somewhat close to getting there. The rename detection runs
for several minutes in our cases.

Andreas

-- 
"Totally trivial. Famous last words."
From: Linus Torvalds <torvalds@*.org>
Date: Fri, 22 Jan 2010 07:29:21 -0800

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [RFC] rename detection: allow more renames
  2015-11-25 12:03   ` Andreas Krey
@ 2015-11-25 12:36     ` Jeff King
  0 siblings, 0 replies; 5+ messages in thread
From: Jeff King @ 2015-11-25 12:36 UTC (permalink / raw)
  To: Andreas Krey; +Cc: Git Mailing List

On Wed, Nov 25, 2015 at 01:03:15PM +0100, Andreas Krey wrote:

> On Tue, 24 Nov 2015 18:33:28 +0000, Jeff King wrote:
> ...
> > I didn't dig in the archive, but I think we discussed the "just show
> > progress for destinations" before. The problem you run into is that the
> > items aren't a good indication of the amount of work. You really are
> > doing n*m work, so if you just count "m", it can be very misleading if
> > "n" is high (and vice versa).
> 
> True, but the loops do progress indication for destinations only anyway.
> So if you only have three destinations and a zillion sources, you
> will still get only three progress updates, even if they say
> 'one zillion (33%)', 'two zillion (67%)', ...
> 
> I think as long as this is the case we can as well report the destination
> count; maybe put the source count somewhere in the progress text.

Ah, true. Though maybe that is something we should be fixing, rather
than building on. I dunno. I cannot recall the last time I saw rename
detection progress myself.

> > Might it make more sense just to move to a larger integer size?
> 
> Which would be? I'd venture into the progress code to identify
> the necessary changes.

I was thinking of uint64_t; the progress code would need to change, but
I don't see a big problem with that.

> We're somewhat close to getting there. The rename detection runs
> for several minutes in our cases.

Yikes. :)

-Peff

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [RFC] rename detection: allow more renames
  2015-11-24 23:33 ` Jeff King
  2015-11-25 12:03   ` Andreas Krey
@ 2015-11-30 18:25   ` Junio C Hamano
  1 sibling, 0 replies; 5+ messages in thread
From: Junio C Hamano @ 2015-11-30 18:25 UTC (permalink / raw)
  To: Jeff King; +Cc: Andreas Krey, Git Mailing List

Jeff King <peff@peff.net> writes:

> On Fri, Nov 13, 2015 at 05:35:06PM +0100, Andreas Krey wrote:
>
>> The code talks about limiting the size
>> of the rename matrix, but as far as I
>> can see, the matrix itself never exists
>> as such, and the only thing that could
>> actually overflow is the computation
>> for the progress indication. This
>> can be fixed by reporting just the
>> destinations checked instead of the
>> combinations, since we only update
>> the progress once per destination
>> anyway.
>
> I didn't dig in the archive, but I think we discussed the "just show
> progress for destinations" before. The problem you run into is that the
> items aren't a good indication of the amount of work. You really are
> doing n*m work, so if you just count "m", it can be very misleading if
> "n" is high (and vice versa).

Right.

With s/never exists/no longer exists/ in the above observation, I
agree that this topic is sensible that it revisits the stale comment
from days back when we did use the matrix.

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2015-11-30 18:25 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-11-13 16:35 [RFC] rename detection: allow more renames Andreas Krey
2015-11-24 23:33 ` Jeff King
2015-11-25 12:03   ` Andreas Krey
2015-11-25 12:36     ` Jeff King
2015-11-30 18:25   ` Junio C Hamano

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.