git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/3] Improve the documentation and warnings dealing with rename/copy limits
@ 2021-07-11  0:46 Elijah Newren via GitGitGadget
  2021-07-11  0:46 ` [PATCH 1/3] doc: clarify documentation for " Elijah Newren via GitGitGadget
                   ` (3 more replies)
  0 siblings, 4 replies; 44+ messages in thread
From: Elijah Newren via GitGitGadget @ 2021-07-11  0:46 UTC (permalink / raw)
  To: git; +Cc: Elijah Newren

Fix a few small issues with documentation and warnings around the limits for
the quadratic portion of rename (&copy) detection.

I also sent out a separate RFC about how and whether to bump
{diff,merge}.renameLimit [1]. If there's a quick consensus there, I may add
a patch to this series that bumps the limits, but that could also just be
done separately afterward.

[1]
https://lore.kernel.org/git/CABPp-BFzp3TCWiF1QAVSfywDLYrz=GOQszVM-sw5p0rSB8RWvw@mail.gmail.com/T/#u

Elijah Newren (3):
  doc: clarify documentation for rename/copy limits
  doc: document the special handling of -l0
  diff: correct warning message when renameLimit exceeded

 Documentation/config/diff.txt  |  7 ++++---
 Documentation/config/merge.txt | 10 ++++++----
 Documentation/diff-options.txt | 14 +++++++++-----
 diff.c                         |  2 +-
 4 files changed, 20 insertions(+), 13 deletions(-)


base-commit: 670b81a890388c60b7032a4f5b879f2ece8c4558
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-git-1044%2Fnewren%2Frename-limit-documentation-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-git-1044/newren/rename-limit-documentation-v1
Pull-Request: https://github.com/git/git/pull/1044
-- 
gitgitgadget

^ permalink raw reply	[flat|nested] 44+ messages in thread

* [PATCH 1/3] doc: clarify documentation for rename/copy limits
  2021-07-11  0:46 [PATCH 0/3] Improve the documentation and warnings dealing with rename/copy limits Elijah Newren via GitGitGadget
@ 2021-07-11  0:46 ` Elijah Newren via GitGitGadget
  2021-07-11  4:37   ` Bagas Sanjaya
  2021-07-12 15:03   ` Derrick Stolee
  2021-07-11  0:46 ` [PATCH 2/3] doc: document the special handling of -l0 Elijah Newren via GitGitGadget
                   ` (2 subsequent siblings)
  3 siblings, 2 replies; 44+ messages in thread
From: Elijah Newren via GitGitGadget @ 2021-07-11  0:46 UTC (permalink / raw)
  To: git; +Cc: Elijah Newren, Elijah Newren

From: Elijah Newren <newren@gmail.com>

A few places in the docs implied that rename/copy detection is always
quadratic or that all (unpaired) files were involved in the quadratic
portion of rename/copy detection.  The following two commits each
introduced an exception to this:

    9027f53cb505 (Do linear-time/space rename logic for exact renames,
                  2007-10-25)
    bd24aa2f97a0 (diffcore-rename: guide inexact rename detection based
                  on basenames, 2021-02-14)

(As a side note, for copy detection, the basename guided inexact rename
detection is turned off and the exact renames will only result in
sources (without the dests) being removed from the set of files used in
quadratic detection.  So, for copy detection, the documentation was
closer to correct.)

Avoid implying that all files involved in rename/copy detection are
subject to the full quadratic algorithm.  While at it, also note the
default values for all these settings.

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 Documentation/config/diff.txt  |  7 ++++---
 Documentation/config/merge.txt | 10 ++++++----
 Documentation/diff-options.txt | 11 ++++++-----
 3 files changed, 16 insertions(+), 12 deletions(-)

diff --git a/Documentation/config/diff.txt b/Documentation/config/diff.txt
index 2d3331f55c2..31b96e8294f 100644
--- a/Documentation/config/diff.txt
+++ b/Documentation/config/diff.txt
@@ -118,9 +118,10 @@ diff.orderFile::
 	relative to the top of the working tree.
 
 diff.renameLimit::
-	The number of files to consider when performing the copy/rename
-	detection; equivalent to the 'git diff' option `-l`. This setting
-	has no effect if rename detection is turned off.
+	The number of files to consider in the quadratic portion of
+	copy/rename detection; equivalent to the 'git diff' option
+	`-l`.  If not set, the default value is 400.  This setting has
+	no effect if rename detection is turned off.
 
 diff.renames::
 	Whether and how Git detects renames.  If set to "false",
diff --git a/Documentation/config/merge.txt b/Documentation/config/merge.txt
index cb2ed589075..d030b936d1f 100644
--- a/Documentation/config/merge.txt
+++ b/Documentation/config/merge.txt
@@ -33,10 +33,12 @@ merge.verifySignatures::
 include::fmt-merge-msg.txt[]
 
 merge.renameLimit::
-	The number of files to consider when performing rename detection
-	during a merge; if not specified, defaults to the value of
-	diff.renameLimit. This setting has no effect if rename detection
-	is turned off.
+	The number of files to consider in the quadratic portion of
+	rename detection during a merge.  If not specified, defaults
+	to the value of diff.renameLimit.  If neither
+	merge.renameLimit nor diff.renameLimit are specified, defaults
+	to 1000.  This setting has no effect if rename detection is
+	turned off.
 
 merge.renames::
 	Whether Git detects renames.  If set to "false", rename detection
diff --git a/Documentation/diff-options.txt b/Documentation/diff-options.txt
index 32e6dee5ac3..b5682b83956 100644
--- a/Documentation/diff-options.txt
+++ b/Documentation/diff-options.txt
@@ -588,11 +588,12 @@ When used together with `-B`, omit also the preimage in the deletion part
 of a delete/create pair.
 
 -l<num>::
-	The `-M` and `-C` options require O(n^2) processing time where n
-	is the number of potential rename/copy targets.  This
-	option prevents rename/copy detection from running if
-	the number of rename/copy targets exceeds the specified
-	number.
+	The `-M` and `-C` options can require O(n^2) processing time
+	where n is the number of potential rename/copy targets.  This
+	option prevents the quadratic portion of rename/copy detection
+	from running if the number of rename/copy targets exceeds the
+	specified number.  Defaults to diff.renameLimit, or if that is
+	also unspecified, to 400.
 
 ifndef::git-format-patch[]
 --diff-filter=[(A|C|D|M|R|T|U|X|B)...[*]]::
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH 2/3] doc: document the special handling of -l0
  2021-07-11  0:46 [PATCH 0/3] Improve the documentation and warnings dealing with rename/copy limits Elijah Newren via GitGitGadget
  2021-07-11  0:46 ` [PATCH 1/3] doc: clarify documentation for " Elijah Newren via GitGitGadget
@ 2021-07-11  0:46 ` Elijah Newren via GitGitGadget
  2021-07-11  4:54   ` Eric Sunshine
  2021-07-11  0:46 ` [PATCH 3/3] diff: correct warning message when renameLimit exceeded Elijah Newren via GitGitGadget
  2021-07-14  1:12 ` [PATCH v2 0/4] Rename/copy limits -- docs, warnings, and new defaults Elijah Newren via GitGitGadget
  3 siblings, 1 reply; 44+ messages in thread
From: Elijah Newren via GitGitGadget @ 2021-07-11  0:46 UTC (permalink / raw)
  To: git; +Cc: Elijah Newren, Elijah Newren

From: Elijah Newren <newren@gmail.com>

As noted in commit 89973554b52c (diffcore-rename: make diff-tree -l0
mean -l<large>, 2017-11-29), -l0 has had a magical special "large"
historical value associated with it.  Document this value, particularly
since it is not large enough for some uses -- see commit 9f7e4bfa3b6d
(diff: remove silent clamp of renameLimit, 2017-11-13).

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 Documentation/diff-options.txt | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/Documentation/diff-options.txt b/Documentation/diff-options.txt
index b5682b83956..17c8455b908 100644
--- a/Documentation/diff-options.txt
+++ b/Documentation/diff-options.txt
@@ -595,6 +595,9 @@ of a delete/create pair.
 	specified number.  Defaults to diff.renameLimit, or if that is
 	also unspecified, to 400.
 
+	Note that for backward compatibility reasons, a value of 0 is treated
+	the same as if 32767 was passed.
+
 ifndef::git-format-patch[]
 --diff-filter=[(A|C|D|M|R|T|U|X|B)...[*]]::
 	Select only files that are Added (`A`), Copied (`C`),
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH 3/3] diff: correct warning message when renameLimit exceeded
  2021-07-11  0:46 [PATCH 0/3] Improve the documentation and warnings dealing with rename/copy limits Elijah Newren via GitGitGadget
  2021-07-11  0:46 ` [PATCH 1/3] doc: clarify documentation for " Elijah Newren via GitGitGadget
  2021-07-11  0:46 ` [PATCH 2/3] doc: document the special handling of -l0 Elijah Newren via GitGitGadget
@ 2021-07-11  0:46 ` Elijah Newren via GitGitGadget
  2021-07-12 15:09   ` Derrick Stolee
  2021-07-14  1:12 ` [PATCH v2 0/4] Rename/copy limits -- docs, warnings, and new defaults Elijah Newren via GitGitGadget
  3 siblings, 1 reply; 44+ messages in thread
From: Elijah Newren via GitGitGadget @ 2021-07-11  0:46 UTC (permalink / raw)
  To: git; +Cc: Elijah Newren, Elijah Newren

From: Elijah Newren <newren@gmail.com>

The warning when quadratic rename detection was skipped referred to
"inexact rename detection".  For years, the only linear portion of
rename detection was looking for exact renames, so "inexact rename
detection" was an accurate way to refer to the quadratic portion of
rename detection.  However, that changed with commit bd24aa2f97a0
(diffcore-rename: guide inexact rename detection based on basenames,
2021-02-14), so now the correct way to refer to quadratic rename
detection is "quadratic rename detection".  Fix the warning accordingly.

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 diff.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/diff.c b/diff.c
index 52c791574b7..343f391edf8 100644
--- a/diff.c
+++ b/diff.c
@@ -6284,7 +6284,7 @@ static int is_summary_empty(const struct diff_queue_struct *q)
 }
 
 static const char rename_limit_warning[] =
-N_("inexact rename detection was skipped due to too many files.");
+N_("quadratic rename detection was skipped due to too many files.");
 
 static const char degrade_cc_to_c_warning[] =
 N_("only found copies from modified paths due to too many files.");
-- 
gitgitgadget

^ permalink raw reply related	[flat|nested] 44+ messages in thread

* Re: [PATCH 1/3] doc: clarify documentation for rename/copy limits
  2021-07-11  0:46 ` [PATCH 1/3] doc: clarify documentation for " Elijah Newren via GitGitGadget
@ 2021-07-11  4:37   ` Bagas Sanjaya
  2021-07-11  4:52     ` Elijah Newren
  2021-07-12 15:03   ` Derrick Stolee
  1 sibling, 1 reply; 44+ messages in thread
From: Bagas Sanjaya @ 2021-07-11  4:37 UTC (permalink / raw)
  To: Elijah Newren via GitGitGadget, git; +Cc: Elijah Newren

On 11/07/21 07.46, Elijah Newren via GitGitGadget wrote:
>   -l<num>::
> -	The `-M` and `-C` options require O(n^2) processing time where n
> -	is the number of potential rename/copy targets.  This
> -	option prevents rename/copy detection from running if
> -	the number of rename/copy targets exceeds the specified
> -	number.
> +	The `-M` and `-C` options can require O(n^2) processing time
> +	where n is the number of potential rename/copy targets.  This
> +	option prevents the quadratic portion of rename/copy detection
> +	from running if the number of rename/copy targets exceeds the
> +	specified number.  Defaults to diff.renameLimit, or if that is
> +	also unspecified, to 400.

Why not just defaults to diff.renameLimit if the default for it and -l 
are both 400?

-- 
An old man doll... just what I always wanted! - Clara

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 1/3] doc: clarify documentation for rename/copy limits
  2021-07-11  4:37   ` Bagas Sanjaya
@ 2021-07-11  4:52     ` Elijah Newren
  0 siblings, 0 replies; 44+ messages in thread
From: Elijah Newren @ 2021-07-11  4:52 UTC (permalink / raw)
  To: Bagas Sanjaya; +Cc: Elijah Newren via GitGitGadget, Git Mailing List

On Sat, Jul 10, 2021 at 9:37 PM Bagas Sanjaya <bagasdotme@gmail.com> wrote:
>
> On 11/07/21 07.46, Elijah Newren via GitGitGadget wrote:
> >   -l<num>::
> > -     The `-M` and `-C` options require O(n^2) processing time where n
> > -     is the number of potential rename/copy targets.  This
> > -     option prevents rename/copy detection from running if
> > -     the number of rename/copy targets exceeds the specified
> > -     number.
> > +     The `-M` and `-C` options can require O(n^2) processing time
> > +     where n is the number of potential rename/copy targets.  This
> > +     option prevents the quadratic portion of rename/copy detection
> > +     from running if the number of rename/copy targets exceeds the
> > +     specified number.  Defaults to diff.renameLimit, or if that is
> > +     also unspecified, to 400.
>
> Why not just defaults to diff.renameLimit if the default for it and -l
> are both 400?

Good point; I'll make that change and include it in the next re-roll.

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 2/3] doc: document the special handling of -l0
  2021-07-11  0:46 ` [PATCH 2/3] doc: document the special handling of -l0 Elijah Newren via GitGitGadget
@ 2021-07-11  4:54   ` Eric Sunshine
  2021-07-11  4:54     ` Elijah Newren
  0 siblings, 1 reply; 44+ messages in thread
From: Eric Sunshine @ 2021-07-11  4:54 UTC (permalink / raw)
  To: Elijah Newren via GitGitGadget; +Cc: Git List, Elijah Newren

On Sat, Jul 10, 2021 at 8:46 PM Elijah Newren via GitGitGadget
<gitgitgadget@gmail.com> wrote:
> As noted in commit 89973554b52c (diffcore-rename: make diff-tree -l0
> mean -l<large>, 2017-11-29), -l0 has had a magical special "large"
> historical value associated with it.  Document this value, particularly
> since it is not large enough for some uses -- see commit 9f7e4bfa3b6d
> (diff: remove silent clamp of renameLimit, 2017-11-13).
>
> Signed-off-by: Elijah Newren <newren@gmail.com>
> ---
> diff --git a/Documentation/diff-options.txt b/Documentation/diff-options.txt
> @@ -595,6 +595,9 @@ of a delete/create pair.
>         specified number.  Defaults to diff.renameLimit, or if that is
>         also unspecified, to 400.
>
> +       Note that for backward compatibility reasons, a value of 0 is treated
> +       the same as if 32767 was passed.

To get this to format properly, you'll need to drop the indentation
from the new paragraph and replace the blank line preceding it with a
line containing just a plus (`+`).

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 2/3] doc: document the special handling of -l0
  2021-07-11  4:54   ` Eric Sunshine
@ 2021-07-11  4:54     ` Elijah Newren
  0 siblings, 0 replies; 44+ messages in thread
From: Elijah Newren @ 2021-07-11  4:54 UTC (permalink / raw)
  To: Eric Sunshine; +Cc: Elijah Newren via GitGitGadget, Git List

On Sat, Jul 10, 2021 at 9:54 PM Eric Sunshine <sunshine@sunshineco.com> wrote:
>
> On Sat, Jul 10, 2021 at 8:46 PM Elijah Newren via GitGitGadget
> <gitgitgadget@gmail.com> wrote:
> > As noted in commit 89973554b52c (diffcore-rename: make diff-tree -l0
> > mean -l<large>, 2017-11-29), -l0 has had a magical special "large"
> > historical value associated with it.  Document this value, particularly
> > since it is not large enough for some uses -- see commit 9f7e4bfa3b6d
> > (diff: remove silent clamp of renameLimit, 2017-11-13).
> >
> > Signed-off-by: Elijah Newren <newren@gmail.com>
> > ---
> > diff --git a/Documentation/diff-options.txt b/Documentation/diff-options.txt
> > @@ -595,6 +595,9 @@ of a delete/create pair.
> >         specified number.  Defaults to diff.renameLimit, or if that is
> >         also unspecified, to 400.
> >
> > +       Note that for backward compatibility reasons, a value of 0 is treated
> > +       the same as if 32767 was passed.
>
> To get this to format properly, you'll need to drop the indentation
> from the new paragraph and replace the blank line preceding it with a
> line containing just a plus (`+`).

Ok, will do.  Thanks.

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 1/3] doc: clarify documentation for rename/copy limits
  2021-07-11  0:46 ` [PATCH 1/3] doc: clarify documentation for " Elijah Newren via GitGitGadget
  2021-07-11  4:37   ` Bagas Sanjaya
@ 2021-07-12 15:03   ` Derrick Stolee
  2021-07-12 21:27     ` Junio C Hamano
  1 sibling, 1 reply; 44+ messages in thread
From: Derrick Stolee @ 2021-07-12 15:03 UTC (permalink / raw)
  To: Elijah Newren via GitGitGadget, git; +Cc: Elijah Newren

On 7/10/2021 8:46 PM, Elijah Newren via GitGitGadget wrote:
> From: Elijah Newren <newren@gmail.com>
...
>  diff.renameLimit::
> -	The number of files to consider when performing the copy/rename
> -	detection; equivalent to the 'git diff' option `-l`. This setting
> -	has no effect if rename detection is turned off.
> +	The number of files to consider in the quadratic portion of
> +	copy/rename detection; equivalent to the 'git diff' option
> +	`-l`.  If not set, the default value is 400.  This setting has
> +	no effect if rename detection is turned off.
...
>  merge.renameLimit::
> -	The number of files to consider when performing rename detection
> -	during a merge; if not specified, defaults to the value of
> -	diff.renameLimit. This setting has no effect if rename detection
> -	is turned off.
> +	The number of files to consider in the quadratic portion of
> +	rename detection during a merge.  If not specified, defaults
> +	to the value of diff.renameLimit.  If neither
> +	merge.renameLimit nor diff.renameLimit are specified, defaults
> +	to 1000.  This setting has no effect if rename detection is
> +	turned off.
...
>  -l<num>::
> -	The `-M` and `-C` options require O(n^2) processing time where n
> -	is the number of potential rename/copy targets.  This
> -	option prevents rename/copy detection from running if
> -	the number of rename/copy targets exceeds the specified
> -	number.
> +	The `-M` and `-C` options can require O(n^2) processing time
> +	where n is the number of potential rename/copy targets.  This
> +	option prevents the quadratic portion of rename/copy detection
> +	from running if the number of rename/copy targets exceeds the
> +	specified number.  Defaults to diff.renameLimit, or if that is
> +	also unspecified, to 400.

These changes are clear to me, but I also know how a bit about how
the rename algorithm works and what "the quadratic portion" means.
I wonder if this is sufficient detail to help a user less versed in
Git's internals.

Perhaps we should instead be describing the set of "potential inexact
renames" as a more descriptive approach? Here is a possible way to
use this wording instead:

  merge.renameLimit::
      The number of potential inexact renames to consider when
      performing rename detection during a merge; if this limit
      is exceeded, then no inexact renames are computed. Renames
      where the content does not change are excluded from this
      limit. If not specified, defaults to...

Feel free to take or leave any of this example.

Thanks,
-Stolee

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 3/3] diff: correct warning message when renameLimit exceeded
  2021-07-11  0:46 ` [PATCH 3/3] diff: correct warning message when renameLimit exceeded Elijah Newren via GitGitGadget
@ 2021-07-12 15:09   ` Derrick Stolee
  2021-07-12 18:13     ` Elijah Newren
  0 siblings, 1 reply; 44+ messages in thread
From: Derrick Stolee @ 2021-07-12 15:09 UTC (permalink / raw)
  To: Elijah Newren via GitGitGadget, git; +Cc: Elijah Newren

On 7/10/2021 8:46 PM, Elijah Newren via GitGitGadget wrote:
> From: Elijah Newren <newren@gmail.com>
> 
> The warning when quadratic rename detection was skipped referred to
> "inexact rename detection".  For years, the only linear portion of
> rename detection was looking for exact renames, so "inexact rename
> detection" was an accurate way to refer to the quadratic portion of
> rename detection.  However, that changed with commit bd24aa2f97a0
> (diffcore-rename: guide inexact rename detection based on basenames,
> 2021-02-14), so now the correct way to refer to quadratic rename
> detection is "quadratic rename detection".  Fix the warning accordingly.

Now that I read this more specific reason for using "quadratic", my
earlier comments on patch 1 are slightly less helpful. Specifically,
I was recommending to continue using "inexact renames" but that is
not 100% true anymore.

I still think this "quadratic rename detection" is perhaps hard to
parse as a non-expert. This subtlety of some "easy" inexact renames
definitely makes the definition harder.

Since the steps that find inexact renames without the quadratic
algorithm are heuristics, perhaps this portion could instead be
called "exhaustive rename detection" or even "expensive rename
detection"? It perhaps implies more directly that the limit exists
as a way to prevent an expensive operation.

Thanks,
-Stolee

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 3/3] diff: correct warning message when renameLimit exceeded
  2021-07-12 15:09   ` Derrick Stolee
@ 2021-07-12 18:13     ` Elijah Newren
  2021-07-14  0:47       ` Junio C Hamano
  0 siblings, 1 reply; 44+ messages in thread
From: Elijah Newren @ 2021-07-12 18:13 UTC (permalink / raw)
  To: Derrick Stolee; +Cc: Elijah Newren via GitGitGadget, Git Mailing List

On Mon, Jul 12, 2021 at 8:09 AM Derrick Stolee <stolee@gmail.com> wrote:
>
> On 7/10/2021 8:46 PM, Elijah Newren via GitGitGadget wrote:
> > From: Elijah Newren <newren@gmail.com>
> >
> > The warning when quadratic rename detection was skipped referred to
> > "inexact rename detection".  For years, the only linear portion of
> > rename detection was looking for exact renames, so "inexact rename
> > detection" was an accurate way to refer to the quadratic portion of
> > rename detection.  However, that changed with commit bd24aa2f97a0
> > (diffcore-rename: guide inexact rename detection based on basenames,
> > 2021-02-14), so now the correct way to refer to quadratic rename
> > detection is "quadratic rename detection".  Fix the warning accordingly.
>
> Now that I read this more specific reason for using "quadratic", my
> earlier comments on patch 1 are slightly less helpful. Specifically,
> I was recommending to continue using "inexact renames" but that is
> not 100% true anymore.
>
> I still think this "quadratic rename detection" is perhaps hard to
> parse as a non-expert. This subtlety of some "easy" inexact renames
> definitely makes the definition harder.
>
> Since the steps that find inexact renames without the quadratic
> algorithm are heuristics, perhaps this portion could instead be
> called "exhaustive rename detection" or even "expensive rename
> detection"? It perhaps implies more directly that the limit exists
> as a way to prevent an expensive operation.

The name "exhaustive rename detection" seems reasonable to me.  I'll
resubmit using that term and see what folks think.

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 1/3] doc: clarify documentation for rename/copy limits
  2021-07-12 15:03   ` Derrick Stolee
@ 2021-07-12 21:27     ` Junio C Hamano
  0 siblings, 0 replies; 44+ messages in thread
From: Junio C Hamano @ 2021-07-12 21:27 UTC (permalink / raw)
  To: Derrick Stolee; +Cc: Elijah Newren via GitGitGadget, git, Elijah Newren

Derrick Stolee <stolee@gmail.com> writes:

>   merge.renameLimit::
>       The number of potential inexact renames to consider when
>       performing rename detection during a merge; if this limit
>       is exceeded, then no inexact renames are computed. Renames
>       where the content does not change are excluded from this
>       limit. If not specified, defaults to...
>
> Feel free to take or leave any of this example.

Phrases like "finding inexact renames" and "finding which files were
moved with changes" may tell what we are spending cycles on to the
end user, but I have to say I agree that "quadratic portion" would
not resonate with the intended audiences at all.



^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 3/3] diff: correct warning message when renameLimit exceeded
  2021-07-12 18:13     ` Elijah Newren
@ 2021-07-14  0:47       ` Junio C Hamano
  2021-07-14  1:06         ` Elijah Newren
  0 siblings, 1 reply; 44+ messages in thread
From: Junio C Hamano @ 2021-07-14  0:47 UTC (permalink / raw)
  To: Elijah Newren
  Cc: Derrick Stolee, Elijah Newren via GitGitGadget, Git Mailing List

Elijah Newren <newren@gmail.com> writes:

> On Mon, Jul 12, 2021 at 8:09 AM Derrick Stolee <stolee@gmail.com> wrote:
>>
>> Since the steps that find inexact renames without the quadratic
>> algorithm are heuristics, perhaps this portion could instead be
>> called "exhaustive rename detection" or even "expensive rename
>> detection"? It perhaps implies more directly that the limit exists
>> as a way to prevent an expensive operation.
>
> The name "exhaustive rename detection" seems reasonable to me.  I'll
> resubmit using that term and see what folks think.

Funny.  In a sense, since computing content similarity is _more_
precise way to find a pair that is a likely rename than the more
recent heuristics, the more expensive "exhaustive" one can be called
"more precise rename detection", even though it is tempting to use
"inexact" to call it, too ;-)

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 3/3] diff: correct warning message when renameLimit exceeded
  2021-07-14  0:47       ` Junio C Hamano
@ 2021-07-14  1:06         ` Elijah Newren
  2021-07-14  1:10           ` Junio C Hamano
  0 siblings, 1 reply; 44+ messages in thread
From: Elijah Newren @ 2021-07-14  1:06 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Derrick Stolee, Elijah Newren via GitGitGadget, Git Mailing List

On Tue, Jul 13, 2021 at 5:47 PM Junio C Hamano <gitster@pobox.com> wrote:
>
> Elijah Newren <newren@gmail.com> writes:
>
> > On Mon, Jul 12, 2021 at 8:09 AM Derrick Stolee <stolee@gmail.com> wrote:
> >>
> >> Since the steps that find inexact renames without the quadratic
> >> algorithm are heuristics, perhaps this portion could instead be
> >> called "exhaustive rename detection" or even "expensive rename
> >> detection"? It perhaps implies more directly that the limit exists
> >> as a way to prevent an expensive operation.
> >
> > The name "exhaustive rename detection" seems reasonable to me.  I'll
> > resubmit using that term and see what folks think.
>
> Funny.  In a sense, since computing content similarity is _more_
> precise way to find a pair that is a likely rename than the more
> recent heuristics, the more expensive "exhaustive" one can be called
> "more precise rename detection", even though it is tempting to use
> "inexact" to call it, too ;-)

I'm afraid I'm not following.  The more recent heuristics use content
similarity as well; in fact, they use the exact same function with a
higher threshold: estimate_similarity().

The exhaustiveness of the quadratic portion comes from comparing each
file to more other files, not in using a different type of comparison.

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 3/3] diff: correct warning message when renameLimit exceeded
  2021-07-14  1:06         ` Elijah Newren
@ 2021-07-14  1:10           ` Junio C Hamano
  2021-07-14  1:22             ` Elijah Newren
  0 siblings, 1 reply; 44+ messages in thread
From: Junio C Hamano @ 2021-07-14  1:10 UTC (permalink / raw)
  To: Elijah Newren
  Cc: Derrick Stolee, Elijah Newren via GitGitGadget, Git Mailing List

Elijah Newren <newren@gmail.com> writes:

> The exhaustiveness of the quadratic portion comes from comparing each
> file to more other files, not in using a different type of comparison.

Exactly.  By culling potential matches early with heuristics, we
make a trade-off of risking false-negatives but save a lot of cycles
while trying to find "renames with modifications (which is what we
called 'inexact rename')", and my comment equated  fewer false-negatives
with more precision.



^ permalink raw reply	[flat|nested] 44+ messages in thread

* [PATCH v2 0/4] Rename/copy limits -- docs, warnings, and new defaults
  2021-07-11  0:46 [PATCH 0/3] Improve the documentation and warnings dealing with rename/copy limits Elijah Newren via GitGitGadget
                   ` (2 preceding siblings ...)
  2021-07-11  0:46 ` [PATCH 3/3] diff: correct warning message when renameLimit exceeded Elijah Newren via GitGitGadget
@ 2021-07-14  1:12 ` Elijah Newren via GitGitGadget
  2021-07-14  1:12   ` [PATCH v2 1/4] diff: correct warning message when renameLimit exceeded Elijah Newren via GitGitGadget
                     ` (4 more replies)
  3 siblings, 5 replies; 44+ messages in thread
From: Elijah Newren via GitGitGadget @ 2021-07-14  1:12 UTC (permalink / raw)
  To: git
  Cc: Bagas Sanjaya, Elijah Newren, Eric Sunshine, Derrick Stolee,
	Jeff King, Ævar Arnfjörð Bjarmason, Elijah Newren

Fix a few small issues with documentation and warnings around the limits for
the quadratic portion of rename (&copy) detection, and bump the default
limits.

Discussion on bumping the limits can be found at [1]. Although it appears we
generally agree we could switch to an unlimited setting for
merge.renameLimit, that would require some changes to progress bars to
notify users how to take action once things start taking a while. So, for
now, just bump the limits.

[1]
https://lore.kernel.org/git/CABPp-BFzp3TCWiF1QAVSfywDLYrz=GOQszVM-sw5p0rSB8RWvw@mail.gmail.com/T/#u

Changes since v1:

 * Shuffled patch order since the explanation of why "inexact rename
   detection" is incorrect was in the third patch
 * Use the term "exhaustive rename detection" for the quadratic portion
 * Simplify -l description by just stating that it defaults to
   diff.renameLimit (since it in turn has the right default value)
 * Fix asciidoc formating
 * Include bump of the limits in a new patch

Elijah Newren (4):
  diff: correct warning message when renameLimit exceeded
  doc: clarify documentation for rename/copy limits
  doc: document the special handling of -l0
  Bump rename limit defaults (yet again)

 Documentation/config/diff.txt  |  7 ++++---
 Documentation/config/merge.txt | 10 ++++++----
 Documentation/diff-options.txt | 12 ++++++++----
 diff.c                         |  4 ++--
 merge-ort.c                    |  2 +-
 merge-recursive.c              |  2 +-
 6 files changed, 22 insertions(+), 15 deletions(-)


base-commit: d486ca60a51c9cb1fe068803c3f540724e95e83a
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-git-1044%2Fnewren%2Frename-limit-documentation-v2
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-git-1044/newren/rename-limit-documentation-v2
Pull-Request: https://github.com/git/git/pull/1044

Range-diff vs v1:

 3:  44a5d5efaa6 ! 1:  0d1d0f180a3 diff: correct warning message when renameLimit exceeded
     @@ Commit message
          detection" was an accurate way to refer to the quadratic portion of
          rename detection.  However, that changed with commit bd24aa2f97a0
          (diffcore-rename: guide inexact rename detection based on basenames,
     -    2021-02-14), so now the correct way to refer to quadratic rename
     -    detection is "quadratic rename detection".  Fix the warning accordingly.
     +    2021-02-14).  Let's instead use the term "exhaustive rename detection"
     +    to refer to the quadratic portion.
      
          Signed-off-by: Elijah Newren <newren@gmail.com>
      
     @@ diff.c: static int is_summary_empty(const struct diff_queue_struct *q)
       
       static const char rename_limit_warning[] =
      -N_("inexact rename detection was skipped due to too many files.");
     -+N_("quadratic rename detection was skipped due to too many files.");
     ++N_("exhaustive rename detection was skipped due to too many files.");
       
       static const char degrade_cc_to_c_warning[] =
       N_("only found copies from modified paths due to too many files.");
 1:  7dc448df6ec ! 2:  4046993a9a2 doc: clarify documentation for rename/copy limits
     @@ Documentation/config/diff.txt: diff.orderFile::
      -	The number of files to consider when performing the copy/rename
      -	detection; equivalent to the 'git diff' option `-l`. This setting
      -	has no effect if rename detection is turned off.
     -+	The number of files to consider in the quadratic portion of
     ++	The number of files to consider in the exhaustive portion of
      +	copy/rename detection; equivalent to the 'git diff' option
      +	`-l`.  If not set, the default value is 400.  This setting has
      +	no effect if rename detection is turned off.
     @@ Documentation/config/merge.txt: merge.verifySignatures::
      -	during a merge; if not specified, defaults to the value of
      -	diff.renameLimit. This setting has no effect if rename detection
      -	is turned off.
     -+	The number of files to consider in the quadratic portion of
     ++	The number of files to consider in the exhaustive portion of
      +	rename detection during a merge.  If not specified, defaults
      +	to the value of diff.renameLimit.  If neither
      +	merge.renameLimit nor diff.renameLimit are specified, defaults
     @@ Documentation/diff-options.txt: When used together with `-B`, omit also the prei
      -	The `-M` and `-C` options require O(n^2) processing time where n
      -	is the number of potential rename/copy targets.  This
      -	option prevents rename/copy detection from running if
     --	the number of rename/copy targets exceeds the specified
     ++	The `-M` and `-C` options have an exhaustive portion that
     ++	requires O(n^2) processing time where n is the number of
     ++	potential rename/copy targets.  This option prevents the
     ++	exhaustive portion of rename/copy detection from running if
     + 	the number of rename/copy targets exceeds the specified
      -	number.
     -+	The `-M` and `-C` options can require O(n^2) processing time
     -+	where n is the number of potential rename/copy targets.  This
     -+	option prevents the quadratic portion of rename/copy detection
     -+	from running if the number of rename/copy targets exceeds the
     -+	specified number.  Defaults to diff.renameLimit, or if that is
     -+	also unspecified, to 400.
     ++	number.  Defaults to diff.renameLimit.
       
       ifndef::git-format-patch[]
       --diff-filter=[(A|C|D|M|R|T|U|X|B)...[*]]::
 2:  ee0969429cb ! 3:  6f5767607cd doc: document the special handling of -l0
     @@ Commit message
      
       ## Documentation/diff-options.txt ##
      @@ Documentation/diff-options.txt: of a delete/create pair.
     - 	specified number.  Defaults to diff.renameLimit, or if that is
     - 	also unspecified, to 400.
     + 	exhaustive portion of rename/copy detection from running if
     + 	the number of rename/copy targets exceeds the specified
     + 	number.  Defaults to diff.renameLimit.
     +++
     ++Note that for backward compatibility reasons, a value of 0 is treated
     ++the same as if a large value was passed (currently, 32767).
       
     -+	Note that for backward compatibility reasons, a value of 0 is treated
     -+	the same as if 32767 was passed.
     -+
       ifndef::git-format-patch[]
       --diff-filter=[(A|C|D|M|R|T|U|X|B)...[*]]::
     - 	Select only files that are Added (`A`), Copied (`C`),
 -:  ----------- > 4:  8f1deb6dd16 Bump rename limit defaults (yet again)

-- 
gitgitgadget

^ permalink raw reply	[flat|nested] 44+ messages in thread

* [PATCH v2 1/4] diff: correct warning message when renameLimit exceeded
  2021-07-14  1:12 ` [PATCH v2 0/4] Rename/copy limits -- docs, warnings, and new defaults Elijah Newren via GitGitGadget
@ 2021-07-14  1:12   ` Elijah Newren via GitGitGadget
  2021-07-14  1:12   ` [PATCH v2 2/4] doc: clarify documentation for rename/copy limits Elijah Newren via GitGitGadget
                     ` (3 subsequent siblings)
  4 siblings, 0 replies; 44+ messages in thread
From: Elijah Newren via GitGitGadget @ 2021-07-14  1:12 UTC (permalink / raw)
  To: git
  Cc: Bagas Sanjaya, Elijah Newren, Eric Sunshine, Derrick Stolee,
	Jeff King, Ævar Arnfjörð Bjarmason, Elijah Newren,
	Elijah Newren

From: Elijah Newren <newren@gmail.com>

The warning when quadratic rename detection was skipped referred to
"inexact rename detection".  For years, the only linear portion of
rename detection was looking for exact renames, so "inexact rename
detection" was an accurate way to refer to the quadratic portion of
rename detection.  However, that changed with commit bd24aa2f97a0
(diffcore-rename: guide inexact rename detection based on basenames,
2021-02-14).  Let's instead use the term "exhaustive rename detection"
to refer to the quadratic portion.

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 diff.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/diff.c b/diff.c
index 52c791574b7..2454e34cf6d 100644
--- a/diff.c
+++ b/diff.c
@@ -6284,7 +6284,7 @@ static int is_summary_empty(const struct diff_queue_struct *q)
 }
 
 static const char rename_limit_warning[] =
-N_("inexact rename detection was skipped due to too many files.");
+N_("exhaustive rename detection was skipped due to too many files.");
 
 static const char degrade_cc_to_c_warning[] =
 N_("only found copies from modified paths due to too many files.");
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH v2 2/4] doc: clarify documentation for rename/copy limits
  2021-07-14  1:12 ` [PATCH v2 0/4] Rename/copy limits -- docs, warnings, and new defaults Elijah Newren via GitGitGadget
  2021-07-14  1:12   ` [PATCH v2 1/4] diff: correct warning message when renameLimit exceeded Elijah Newren via GitGitGadget
@ 2021-07-14  1:12   ` Elijah Newren via GitGitGadget
  2021-07-14  7:37     ` Ævar Arnfjörð Bjarmason
  2021-07-14  1:12   ` [PATCH v2 3/4] doc: document the special handling of -l0 Elijah Newren via GitGitGadget
                     ` (2 subsequent siblings)
  4 siblings, 1 reply; 44+ messages in thread
From: Elijah Newren via GitGitGadget @ 2021-07-14  1:12 UTC (permalink / raw)
  To: git
  Cc: Bagas Sanjaya, Elijah Newren, Eric Sunshine, Derrick Stolee,
	Jeff King, Ævar Arnfjörð Bjarmason, Elijah Newren,
	Elijah Newren

From: Elijah Newren <newren@gmail.com>

A few places in the docs implied that rename/copy detection is always
quadratic or that all (unpaired) files were involved in the quadratic
portion of rename/copy detection.  The following two commits each
introduced an exception to this:

    9027f53cb505 (Do linear-time/space rename logic for exact renames,
                  2007-10-25)
    bd24aa2f97a0 (diffcore-rename: guide inexact rename detection based
                  on basenames, 2021-02-14)

(As a side note, for copy detection, the basename guided inexact rename
detection is turned off and the exact renames will only result in
sources (without the dests) being removed from the set of files used in
quadratic detection.  So, for copy detection, the documentation was
closer to correct.)

Avoid implying that all files involved in rename/copy detection are
subject to the full quadratic algorithm.  While at it, also note the
default values for all these settings.

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 Documentation/config/diff.txt  |  7 ++++---
 Documentation/config/merge.txt | 10 ++++++----
 Documentation/diff-options.txt |  9 +++++----
 3 files changed, 15 insertions(+), 11 deletions(-)

diff --git a/Documentation/config/diff.txt b/Documentation/config/diff.txt
index 2d3331f55c2..e26a63d0d42 100644
--- a/Documentation/config/diff.txt
+++ b/Documentation/config/diff.txt
@@ -118,9 +118,10 @@ diff.orderFile::
 	relative to the top of the working tree.
 
 diff.renameLimit::
-	The number of files to consider when performing the copy/rename
-	detection; equivalent to the 'git diff' option `-l`. This setting
-	has no effect if rename detection is turned off.
+	The number of files to consider in the exhaustive portion of
+	copy/rename detection; equivalent to the 'git diff' option
+	`-l`.  If not set, the default value is 400.  This setting has
+	no effect if rename detection is turned off.
 
 diff.renames::
 	Whether and how Git detects renames.  If set to "false",
diff --git a/Documentation/config/merge.txt b/Documentation/config/merge.txt
index 6b66c83eabe..aca0c92dbe6 100644
--- a/Documentation/config/merge.txt
+++ b/Documentation/config/merge.txt
@@ -33,10 +33,12 @@ merge.verifySignatures::
 include::fmt-merge-msg.txt[]
 
 merge.renameLimit::
-	The number of files to consider when performing rename detection
-	during a merge; if not specified, defaults to the value of
-	diff.renameLimit. This setting has no effect if rename detection
-	is turned off.
+	The number of files to consider in the exhaustive portion of
+	rename detection during a merge.  If not specified, defaults
+	to the value of diff.renameLimit.  If neither
+	merge.renameLimit nor diff.renameLimit are specified, defaults
+	to 1000.  This setting has no effect if rename detection is
+	turned off.
 
 merge.renames::
 	Whether Git detects renames.  If set to "false", rename detection
diff --git a/Documentation/diff-options.txt b/Documentation/diff-options.txt
index 32e6dee5ac3..11e08c3fd36 100644
--- a/Documentation/diff-options.txt
+++ b/Documentation/diff-options.txt
@@ -588,11 +588,12 @@ When used together with `-B`, omit also the preimage in the deletion part
 of a delete/create pair.
 
 -l<num>::
-	The `-M` and `-C` options require O(n^2) processing time where n
-	is the number of potential rename/copy targets.  This
-	option prevents rename/copy detection from running if
+	The `-M` and `-C` options have an exhaustive portion that
+	requires O(n^2) processing time where n is the number of
+	potential rename/copy targets.  This option prevents the
+	exhaustive portion of rename/copy detection from running if
 	the number of rename/copy targets exceeds the specified
-	number.
+	number.  Defaults to diff.renameLimit.
 
 ifndef::git-format-patch[]
 --diff-filter=[(A|C|D|M|R|T|U|X|B)...[*]]::
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH v2 3/4] doc: document the special handling of -l0
  2021-07-14  1:12 ` [PATCH v2 0/4] Rename/copy limits -- docs, warnings, and new defaults Elijah Newren via GitGitGadget
  2021-07-14  1:12   ` [PATCH v2 1/4] diff: correct warning message when renameLimit exceeded Elijah Newren via GitGitGadget
  2021-07-14  1:12   ` [PATCH v2 2/4] doc: clarify documentation for rename/copy limits Elijah Newren via GitGitGadget
@ 2021-07-14  1:12   ` Elijah Newren via GitGitGadget
  2021-07-14 16:45     ` Jeff King
  2021-07-14  1:12   ` [PATCH v2 4/4] Bump rename limit defaults (yet again) Elijah Newren via GitGitGadget
  2021-07-15  0:45   ` [PATCH v3 0/4] Rename/copy limits -- docs, warnings, and new defaults Elijah Newren via GitGitGadget
  4 siblings, 1 reply; 44+ messages in thread
From: Elijah Newren via GitGitGadget @ 2021-07-14  1:12 UTC (permalink / raw)
  To: git
  Cc: Bagas Sanjaya, Elijah Newren, Eric Sunshine, Derrick Stolee,
	Jeff King, Ævar Arnfjörð Bjarmason, Elijah Newren,
	Elijah Newren

From: Elijah Newren <newren@gmail.com>

As noted in commit 89973554b52c (diffcore-rename: make diff-tree -l0
mean -l<large>, 2017-11-29), -l0 has had a magical special "large"
historical value associated with it.  Document this value, particularly
since it is not large enough for some uses -- see commit 9f7e4bfa3b6d
(diff: remove silent clamp of renameLimit, 2017-11-13).

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 Documentation/diff-options.txt | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/Documentation/diff-options.txt b/Documentation/diff-options.txt
index 11e08c3fd36..ba40ac66cc9 100644
--- a/Documentation/diff-options.txt
+++ b/Documentation/diff-options.txt
@@ -594,6 +594,9 @@ of a delete/create pair.
 	exhaustive portion of rename/copy detection from running if
 	the number of rename/copy targets exceeds the specified
 	number.  Defaults to diff.renameLimit.
++
+Note that for backward compatibility reasons, a value of 0 is treated
+the same as if a large value was passed (currently, 32767).
 
 ifndef::git-format-patch[]
 --diff-filter=[(A|C|D|M|R|T|U|X|B)...[*]]::
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH v2 4/4] Bump rename limit defaults (yet again)
  2021-07-14  1:12 ` [PATCH v2 0/4] Rename/copy limits -- docs, warnings, and new defaults Elijah Newren via GitGitGadget
                     ` (2 preceding siblings ...)
  2021-07-14  1:12   ` [PATCH v2 3/4] doc: document the special handling of -l0 Elijah Newren via GitGitGadget
@ 2021-07-14  1:12   ` Elijah Newren via GitGitGadget
  2021-07-14 16:43     ` Jeff King
  2021-07-15  0:45   ` [PATCH v3 0/4] Rename/copy limits -- docs, warnings, and new defaults Elijah Newren via GitGitGadget
  4 siblings, 1 reply; 44+ messages in thread
From: Elijah Newren via GitGitGadget @ 2021-07-14  1:12 UTC (permalink / raw)
  To: git
  Cc: Bagas Sanjaya, Elijah Newren, Eric Sunshine, Derrick Stolee,
	Jeff King, Ævar Arnfjörð Bjarmason, Elijah Newren,
	Elijah Newren

From: Elijah Newren <newren@gmail.com>

These were last bumped in commit 92c57e5c1d29 (bump rename limit
defaults (again), 2011-02-19), and were bumped both because processors
had gotten faster, and because people were getting ugly merges that
caused problems and reporting it to the mailing list (suggesting that
folks were willing to spend more time waiting).

Since that time:
  * Linus has continued recommending kernel folks to set
    diff.renameLimit=0 (maps to 32767, currently)
  * Folks with repositories with lots of renames were happy to set
    merge.renameLimit above 32767, once the code supported that, to
    get correct cherry-picks
  * Processors have gotten faster
  * It has been discovered that the timing methodology used last time
    probably used too large example files.

The last point is probably worth explaining a bit more:

  * The "average" file size used appears to have been average blob size
    in the linux kernel history at the time (probably v2.6.25 or
    something close to it).
  * Since bigger files are modified more frequently, such a computation
    weights towards larger files.
  * Larger files may be more likely to be modified over time, but are
    not more likely to be renamed -- the mean and median blob size
    within a tree are a bit higher than the mean and median of blob
    sizes in the history leading up to that version for the linux
    kernel.
  * The mean blob size in v2.6.25 was half the average blob size in
    history leading to that point
  * The median blob size in v2.6.25 was about 40% of the mean blob size
    in v2.6.25.
  * Since the mean blob size is more than double the median blob size,
    any file as big as the mean will not be compared to any files of
    median size or less (because they'd be more than 50% dissimilar).
  * Since it is the number of files compared that provides the O(n^2)
    behavior, median-sized files should matter more than mean-sized
    ones.

The combined effect of the above is that the file size used in past
calculations was likely about 5x too large.  Combine that with a CPU
performance improvement of ~30%, and we can increase the limits by
a factor of sqrt(5/(1-.3)) = 2.67, while keeping the original stated
time limits.

Keeping the same approximate time limit probably makes sense for
diff.renameLimit (there is no progress feedback in e.g. git log -p),
but the experience above suggests merge.renameLimit could be extended
significantly.  In fact, it probably would make sense to have an
unlimited default setting for merge.renameLimit, but that would
likely need to be coupled with changes to how progress is displayed.
(See https://lore.kernel.org/git/YOx+Ok%2FEYvLqRMzJ@coredump.intra.peff.net/
for details in that area.)  For now, let's just bump the approximate
time limit from 10s to 1m.

(Note: We do not want to use actual time limits, because getting results
that depend on how loaded your system is that day feels bad, and because
we don't discover that we won't get all the renames until after we've
put in a lot of work rather than just upfront telling the user there are
too many files involved.)

Using the original time limit of 2s for diff.renameLimit, and bumping
merge.renameLimit from 10s to 60s, I found the following timings using
the simple script at the end of this commit message (on an AWS c5.xlarge
which reports as "Intel(R) Xeon(R) Platinum 8124M CPU @ 3.00GHz"):

      N   Timing
   1300    1.995s
   7100   59.973s

So let's round down to nice even numbers and bump the limits from
400->1000, and from 1000->7000.

Here is the measure_rename_perf script (adapted from
https://lore.kernel.org/git/20080211113516.GB6344@coredump.intra.peff.net/
in particular to avoid triggering the linear handling from
basename-guided rename detection):

    #!/bin/bash

    n=$1; shift

    rm -rf repo
    mkdir repo && cd repo
    git init -q -b main

    mkdata() {
      mkdir $1
      for i in `seq 1 $2`; do
        (sed "s/^/$i /" <../sample
         echo tag: $1
        ) >$1/$i
      done
    }

    mkdata initial $n
    git add .
    git commit -q -m initial

    mkdata new $n
    git add .
    cd new
    for i in *; do git mv $i $i.renamed; done
    cd ..
    git rm -q -rf initial
    git commit -q -m new

    time git diff-tree -M -l0 --summary HEAD^ HEAD

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 Documentation/config/diff.txt  | 2 +-
 Documentation/config/merge.txt | 2 +-
 diff.c                         | 2 +-
 merge-ort.c                    | 2 +-
 merge-recursive.c              | 2 +-
 5 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/Documentation/config/diff.txt b/Documentation/config/diff.txt
index e26a63d0d42..167c479f376 100644
--- a/Documentation/config/diff.txt
+++ b/Documentation/config/diff.txt
@@ -120,7 +120,7 @@ diff.orderFile::
 diff.renameLimit::
 	The number of files to consider in the exhaustive portion of
 	copy/rename detection; equivalent to the 'git diff' option
-	`-l`.  If not set, the default value is 400.  This setting has
+	`-l`.  If not set, the default value is 1000.  This setting has
 	no effect if rename detection is turned off.
 
 diff.renames::
diff --git a/Documentation/config/merge.txt b/Documentation/config/merge.txt
index aca0c92dbe6..101eabb3f0c 100644
--- a/Documentation/config/merge.txt
+++ b/Documentation/config/merge.txt
@@ -37,7 +37,7 @@ merge.renameLimit::
 	rename detection during a merge.  If not specified, defaults
 	to the value of diff.renameLimit.  If neither
 	merge.renameLimit nor diff.renameLimit are specified, defaults
-	to 1000.  This setting has no effect if rename detection is
+	to 7000.  This setting has no effect if rename detection is
 	turned off.
 
 merge.renames::
diff --git a/diff.c b/diff.c
index 2454e34cf6d..0244a371d32 100644
--- a/diff.c
+++ b/diff.c
@@ -35,7 +35,7 @@
 
 static int diff_detect_rename_default;
 static int diff_indent_heuristic = 1;
-static int diff_rename_limit_default = 400;
+static int diff_rename_limit_default = 1000;
 static int diff_suppress_blank_empty;
 static int diff_use_color_default = -1;
 static int diff_color_moved_default;
diff --git a/merge-ort.c b/merge-ort.c
index b954f7184a5..8a84375e940 100644
--- a/merge-ort.c
+++ b/merge-ort.c
@@ -2558,7 +2558,7 @@ static void detect_regular_renames(struct merge_options *opt,
 	diff_opts.detect_rename = DIFF_DETECT_RENAME;
 	diff_opts.rename_limit = opt->rename_limit;
 	if (opt->rename_limit <= 0)
-		diff_opts.rename_limit = 1000;
+		diff_opts.rename_limit = 7000;
 	diff_opts.rename_score = opt->rename_score;
 	diff_opts.show_rename_progress = opt->show_rename_progress;
 	diff_opts.output_format = DIFF_FORMAT_NO_OUTPUT;
diff --git a/merge-recursive.c b/merge-recursive.c
index 4327e0cfa33..f19f8cc37bd 100644
--- a/merge-recursive.c
+++ b/merge-recursive.c
@@ -1879,7 +1879,7 @@ static struct diff_queue_struct *get_diffpairs(struct merge_options *opt,
 	 */
 	if (opts.detect_rename > DIFF_DETECT_RENAME)
 		opts.detect_rename = DIFF_DETECT_RENAME;
-	opts.rename_limit = (opt->rename_limit >= 0) ? opt->rename_limit : 1000;
+	opts.rename_limit = (opt->rename_limit >= 0) ? opt->rename_limit : 7000;
 	opts.rename_score = opt->rename_score;
 	opts.show_rename_progress = opt->show_rename_progress;
 	opts.output_format = DIFF_FORMAT_NO_OUTPUT;
-- 
gitgitgadget

^ permalink raw reply related	[flat|nested] 44+ messages in thread

* Re: [PATCH 3/3] diff: correct warning message when renameLimit exceeded
  2021-07-14  1:10           ` Junio C Hamano
@ 2021-07-14  1:22             ` Elijah Newren
  2021-07-14  5:17               ` Junio C Hamano
  0 siblings, 1 reply; 44+ messages in thread
From: Elijah Newren @ 2021-07-14  1:22 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Derrick Stolee, Elijah Newren via GitGitGadget, Git Mailing List

On Tue, Jul 13, 2021 at 6:10 PM Junio C Hamano <gitster@pobox.com> wrote:
>
> Elijah Newren <newren@gmail.com> writes:
>
> > The exhaustiveness of the quadratic portion comes from comparing each
> > file to more other files, not in using a different type of comparison.
>
> Exactly.  By culling potential matches early with heuristics, we
> make a trade-off of risking false-negatives but save a lot of cycles
> while trying to find "renames with modifications (which is what we
> called 'inexact rename')", and my comment equated  fewer false-negatives
> with more precision.

Okay, I think I'm following what you're saying now, mostly, but I'm
curious about the false negative comment.

Am I mixing up negatives/positives (as I'm prone to do), or would it
be more correct to say the new algorithm risks suboptimal positives
rather than that it risks false negatives?

In particular, the new algorithm will compare files with the same
basename and just accept that pairing if they are similar enough, even
if there might be a better match elsewhere.  However, a lack of a
match in same-basenamed files will not cause those files to have no
match; they will instead just be included in the exhaustive detection
portion, so we can still detect renames for such paths.

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 3/3] diff: correct warning message when renameLimit exceeded
  2021-07-14  1:22             ` Elijah Newren
@ 2021-07-14  5:17               ` Junio C Hamano
  2021-07-14 15:09                 ` Elijah Newren
  0 siblings, 1 reply; 44+ messages in thread
From: Junio C Hamano @ 2021-07-14  5:17 UTC (permalink / raw)
  To: Elijah Newren
  Cc: Derrick Stolee, Elijah Newren via GitGitGadget, Git Mailing List

Elijah Newren <newren@gmail.com> writes:

> Am I mixing up negatives/positives (as I'm prone to do), or would it
> be more correct to say the new algorithm risks suboptimal positives
> rather than that it risks false negatives?

I'm prone to mixing them up, too, but I think they are the sides of
the same coin.  Imagine there is a path X on the source side, and
two paths Y and Z on the destination side.  With exhaustive match,
Z might be a better match (content-wise) to X than Y is to X.

For the path X on the source that is matched with a suboptimal
counterpart Y on the destination side, we may call the situation a
false-positive because with a more exhaustive search we might have
been able to find Z that is a better match.  For the path Z on the
destination side that was culled too early with heuristics and
failed to be matched with the source path X that got matched with a
suboptimal destination path Y, it is a loss for Z---it wasn't chosen
when it should have been (i.e. a false negative, as Z saw no
counterparts).

In any case, during the word search for "inexact", "more precise",
"more expensive", I do not think negatives and positives will play a
big role anyway, so...

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v2 2/4] doc: clarify documentation for rename/copy limits
  2021-07-14  1:12   ` [PATCH v2 2/4] doc: clarify documentation for rename/copy limits Elijah Newren via GitGitGadget
@ 2021-07-14  7:37     ` Ævar Arnfjörð Bjarmason
  2021-07-14 16:30       ` Elijah Newren
  0 siblings, 1 reply; 44+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-07-14  7:37 UTC (permalink / raw)
  To: Elijah Newren via GitGitGadget
  Cc: git, Bagas Sanjaya, Eric Sunshine, Derrick Stolee, Jeff King,
	Elijah Newren


On Wed, Jul 14 2021, Elijah Newren via GitGitGadget wrote:

> From: Elijah Newren <newren@gmail.com>
>
> A few places in the docs implied that rename/copy detection is always
> quadratic or that all (unpaired) files were involved in the quadratic
> portion of rename/copy detection.  The following two commits each
> introduced an exception to this:
>
>     9027f53cb505 (Do linear-time/space rename logic for exact renames,
>                   2007-10-25)
>     bd24aa2f97a0 (diffcore-rename: guide inexact rename detection based
>                   on basenames, 2021-02-14)
>
> (As a side note, for copy detection, the basename guided inexact rename
> detection is turned off and the exact renames will only result in
> sources (without the dests) being removed from the set of files used in
> quadratic detection.  So, for copy detection, the documentation was
> closer to correct.)
>
> Avoid implying that all files involved in rename/copy detection are
> subject to the full quadratic algorithm.  While at it, also note the
> default values for all these settings.
>
> Signed-off-by: Elijah Newren <newren@gmail.com>
> ---
>  Documentation/config/diff.txt  |  7 ++++---
>  Documentation/config/merge.txt | 10 ++++++----
>  Documentation/diff-options.txt |  9 +++++----
>  3 files changed, 15 insertions(+), 11 deletions(-)
>
> diff --git a/Documentation/config/diff.txt b/Documentation/config/diff.txt
> index 2d3331f55c2..e26a63d0d42 100644
> --- a/Documentation/config/diff.txt
> +++ b/Documentation/config/diff.txt
> @@ -118,9 +118,10 @@ diff.orderFile::
>  	relative to the top of the working tree.
>  
>  diff.renameLimit::
> -	The number of files to consider when performing the copy/rename
> -	detection; equivalent to the 'git diff' option `-l`. This setting
> -	has no effect if rename detection is turned off.
> +	The number of files to consider in the exhaustive portion of
> +	copy/rename detection; equivalent to the 'git diff' option
> +	`-l`.  If not set, the default value is 400.  This setting has
> +	no effect if rename detection is turned off.

For both this...

>  diff.renames::
>  	Whether and how Git detects renames.  If set to "false",
> diff --git a/Documentation/config/merge.txt b/Documentation/config/merge.txt
> index 6b66c83eabe..aca0c92dbe6 100644
> --- a/Documentation/config/merge.txt
> +++ b/Documentation/config/merge.txt
> @@ -33,10 +33,12 @@ merge.verifySignatures::
>  include::fmt-merge-msg.txt[]
>  
>  merge.renameLimit::
> -	The number of files to consider when performing rename detection
> -	during a merge; if not specified, defaults to the value of
> -	diff.renameLimit. This setting has no effect if rename detection
> -	is turned off.
> +	The number of files to consider in the exhaustive portion of
> +	rename detection during a merge.  If not specified, defaults
> +	to the value of diff.renameLimit.  If neither
> +	merge.renameLimit nor diff.renameLimit are specified, defaults
> +	to 1000.  This setting has no effect if rename detection is
> +	turned off.

...and this we should really have some wording to the effect of:

    ..., defaults to XYZ. The exact (or even approximate) default of XYZ
    should not be relied upon, and may be changed (or these limits even
    removed) in future versions of git.

I.e. let's distinguish a "this is how it works now, FYI" from a forward
promise that it's going to work like that forever.

Which also leads me to:

>  merge.renames::
>  	Whether Git detects renames.  If set to "false", rename detection
> diff --git a/Documentation/diff-options.txt b/Documentation/diff-options.txt
> index 32e6dee5ac3..11e08c3fd36 100644
> --- a/Documentation/diff-options.txt
> +++ b/Documentation/diff-options.txt
> @@ -588,11 +588,12 @@ When used together with `-B`, omit also the preimage in the deletion part
>  of a delete/create pair.
>  
>  -l<num>::
> -	The `-M` and `-C` options require O(n^2) processing time where n
> -	is the number of potential rename/copy targets.  This
> -	option prevents rename/copy detection from running if
> +	The `-M` and `-C` options have an exhaustive portion that
> +	requires O(n^2) processing time where n is the number of
> +	potential rename/copy targets.  This option prevents the
> +	exhaustive portion of rename/copy detection from running if
>  	the number of rename/copy targets exceeds the specified
> -	number.
> +	number.  Defaults to diff.renameLimit.

This mention of O(n^2). This is not an issue with your patch/series, nd
this is an improvement.

But as a side-comment: Do we explain somewhere how exactly this
{diff,merge}.renmeLimit=N works for values of N? It's probably fine to
continue to handwave it away, but is there anything that say tells a
user what happens if they have 400 files in their repository in a "x"
directory and "git mv x y"? Will it work, but if they add a 401th file
it won't for diffs?

I *think* given a skimming of previous discussions that it's "no",
i.e. that this just kicks in for these expensive cases, and e.g. for
such a case of moving the same tree OID from "x" to "y" we'd
short-circuit things, but maybe I'm just making things up at this point.

Feel free to ignore me here, I guess this amounts to a request that it
would be great if we could point to some docs about how this works. It's
probably too much detail for Documentation/config/{merge,diff}.txt, so
maybe a section in either of git-{diff,merge}.txt ?

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 3/3] diff: correct warning message when renameLimit exceeded
  2021-07-14  5:17               ` Junio C Hamano
@ 2021-07-14 15:09                 ` Elijah Newren
  0 siblings, 0 replies; 44+ messages in thread
From: Elijah Newren @ 2021-07-14 15:09 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Derrick Stolee, Elijah Newren via GitGitGadget, Git Mailing List

On Tue, Jul 13, 2021 at 10:17 PM Junio C Hamano <gitster@pobox.com> wrote:
>
> Elijah Newren <newren@gmail.com> writes:
>
> > Am I mixing up negatives/positives (as I'm prone to do), or would it
> > be more correct to say the new algorithm risks suboptimal positives
> > rather than that it risks false negatives?
>
> I'm prone to mixing them up, too, but I think they are the sides of
> the same coin.  Imagine there is a path X on the source side, and
> two paths Y and Z on the destination side.  With exhaustive match,
> Z might be a better match (content-wise) to X than Y is to X.
>
> For the path X on the source that is matched with a suboptimal
> counterpart Y on the destination side, we may call the situation a
> false-positive because with a more exhaustive search we might have
> been able to find Z that is a better match.  For the path Z on the
> destination side that was culled too early with heuristics and
> failed to be matched with the source path X that got matched with a
> suboptimal destination path Y, it is a loss for Z---it wasn't chosen
> when it should have been (i.e. a false negative, as Z saw no
> counterparts).
>
> In any case, during the word search for "inexact", "more precise",
> "more expensive", I do not think negatives and positives will play a
> big role anyway, so...

Indeed, I've gotten us off on a bit of a tangent, but thanks for
taking the time to answer my questions.  :-)

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v2 2/4] doc: clarify documentation for rename/copy limits
  2021-07-14  7:37     ` Ævar Arnfjörð Bjarmason
@ 2021-07-14 16:30       ` Elijah Newren
  2021-07-14 22:08         ` Ævar Arnfjörð Bjarmason
  0 siblings, 1 reply; 44+ messages in thread
From: Elijah Newren @ 2021-07-14 16:30 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: Elijah Newren via GitGitGadget, Git Mailing List, Bagas Sanjaya,
	Eric Sunshine, Derrick Stolee, Jeff King

On Wed, Jul 14, 2021 at 12:47 AM Ævar Arnfjörð Bjarmason
<avarab@gmail.com> wrote:
>
> On Wed, Jul 14 2021, Elijah Newren via GitGitGadget wrote:
>
> > From: Elijah Newren <newren@gmail.com>
> >
...
> > diff --git a/Documentation/config/diff.txt b/Documentation/config/diff.txt
> > index 2d3331f55c2..e26a63d0d42 100644
> > --- a/Documentation/config/diff.txt
> > +++ b/Documentation/config/diff.txt
> > @@ -118,9 +118,10 @@ diff.orderFile::
> >       relative to the top of the working tree.
> >
> >  diff.renameLimit::
> > -     The number of files to consider when performing the copy/rename
> > -     detection; equivalent to the 'git diff' option `-l`. This setting
> > -     has no effect if rename detection is turned off.
> > +     The number of files to consider in the exhaustive portion of
> > +     copy/rename detection; equivalent to the 'git diff' option
> > +     `-l`.  If not set, the default value is 400.  This setting has
> > +     no effect if rename detection is turned off.
>
> For both this...
>
> >  diff.renames::
> >       Whether and how Git detects renames.  If set to "false",
> > diff --git a/Documentation/config/merge.txt b/Documentation/config/merge.txt
> > index 6b66c83eabe..aca0c92dbe6 100644
> > --- a/Documentation/config/merge.txt
> > +++ b/Documentation/config/merge.txt
> > @@ -33,10 +33,12 @@ merge.verifySignatures::
> >  include::fmt-merge-msg.txt[]
> >
> >  merge.renameLimit::
> > -     The number of files to consider when performing rename detection
> > -     during a merge; if not specified, defaults to the value of
> > -     diff.renameLimit. This setting has no effect if rename detection
> > -     is turned off.
> > +     The number of files to consider in the exhaustive portion of
> > +     rename detection during a merge.  If not specified, defaults
> > +     to the value of diff.renameLimit.  If neither
> > +     merge.renameLimit nor diff.renameLimit are specified, defaults
> > +     to 1000.  This setting has no effect if rename detection is
> > +     turned off.
>
> ...and this we should really have some wording to the effect of:
>
>     ..., defaults to XYZ. The exact (or even approximate) default of XYZ
>     should not be relied upon, and may be changed (or these limits even
>     removed) in future versions of git.
>
> I.e. let's distinguish a "this is how it works now, FYI" from a forward
> promise that it's going to work like that forever.

Perhaps I could simplify that to "...currently defaults to XYZ"?  Does
that capture your intent well enough?

I agree very much that this shouldn't be a hard promise.  If it was,
we've broken it repeatedly over the years.  Not only have we modified
the default numbers multiple times, there have also been multiple
times when we've been able to change how many renames could be handled
without changing the default numbers for the rename limits.  (This was
done by adding or improving special cases that allow us to exclude
paths from the exhaustive comparison; exact rename detection that
someone else added and my more recent improvements to exact rename
detection are two simple examples).

> Which also leads me to:
>
> >  merge.renames::
> >       Whether Git detects renames.  If set to "false", rename detection
> > diff --git a/Documentation/diff-options.txt b/Documentation/diff-options.txt
> > index 32e6dee5ac3..11e08c3fd36 100644
> > --- a/Documentation/diff-options.txt
> > +++ b/Documentation/diff-options.txt
> > @@ -588,11 +588,12 @@ When used together with `-B`, omit also the preimage in the deletion part
> >  of a delete/create pair.
> >
> >  -l<num>::
> > -     The `-M` and `-C` options require O(n^2) processing time where n
> > -     is the number of potential rename/copy targets.  This
> > -     option prevents rename/copy detection from running if
> > +     The `-M` and `-C` options have an exhaustive portion that
> > +     requires O(n^2) processing time where n is the number of
> > +     potential rename/copy targets.  This option prevents the
> > +     exhaustive portion of rename/copy detection from running if
> >       the number of rename/copy targets exceeds the specified
> > -     number.
> > +     number.  Defaults to diff.renameLimit.
>
> This mention of O(n^2). This is not an issue with your patch/series, nd
> this is an improvement.
>
> But as a side-comment: Do we explain somewhere how exactly this
> {diff,merge}.renmeLimit=N works for values of N? It's probably fine to
> continue to handwave it away, but is there anything that say tells a
> user what happens if they have 400 files in their repository in a "x"
> directory and "git mv x y"? Will it work, but if they add a 401th file
> it won't for diffs?
>
> I *think* given a skimming of previous discussions that it's "no",
> i.e. that this just kicks in for these expensive cases, and e.g. for
> such a case of moving the same tree OID from "x" to "y" we'd
> short-circuit things, but maybe I'm just making things up at this point.

Tree OIDs aren't used by the rename detection logic, but you're close.

> Feel free to ignore me here, I guess this amounts to a request that it
> would be great if we could point to some docs about how this works. It's
> probably too much detail for Documentation/config/{merge,diff}.txt, so
> maybe a section in either of git-{diff,merge}.txt ?

If git-merge.txt includes it, then rebase, cherry-pick, revert
probably should too.  (and maybe the -3 option of git-am)

Anyway...

In the "git mv dir-x dir-y" case, if no other changes are made to
those files, then the renames would be detectable via blobs having the
same hashes (i.e. exact renames).  So all these renames would be found
regardless of the number of files and without exhaustive detection
even kicking in.

If there were thousands of such files, and every one of them were also
modified, then the basename-guided rename detection which operates in
linear time would still find all the renames, without the exhaustive
detection even kicking in (see commits bd24aa2f97a0 (diffcore-rename:
guide inexact rename detection based on basenames, 2021-02-14) and
37a251436447 (diffcore-rename: use directory rename guided basename
comparisons, 2021-02-27)).

If this were a merge, and there were thousands of such files renamed
and modified, AND they were all further renamed to change their
basename, BUT the original directory was unmodified on the other side
of history, then we wouldn't need any of the renames for the purposes
of the merge and they'd all be excluded from the exhaustive rename
detection.  Sure, we wouldn't find renames for those files, but that
wouldn't change the result of the merge.  So, again, no exhaustive
rename detection.  (See commit 174791f0fb23 (merge-ort: use
relevant_sources to filter possible rename sources, 2021-03-11))

(Rebases & cherry-picks also have an advantage of caching renames from
previous picks on the upstream side of history to avoid needing to
re-detect renames on that side...though it only caches renames that
are either relevant or exact.)

Basically, we bail on exhaustive rename detection if, after the linear
or faster steps have run (excluding previously cached renames, exact
rename detection, skipping irrelevant renames, basename guided rename
detection), the number of remaining unpaired source files times the
number of remaining unpaired destination files is greater than
rename_limit * rename_limit.  (If we can assume the numbers match,
i.e. N = number of remaining unpaired sources = number of remaining
unpaired destinations, then that check simplifies to bailing once N >
rename_limit)


I'm not sure if these details are really going to help the users,
especially if more special cases are added (and more have been
proposed by Johannes and became an Outreachy project, though that one
didn't get off the ground).  And this explanation is not even fully
detailed; I'm still hand waving here in at least four different ways
(mostly in ignoring special cases for different fast steps, but also
by ignoring copy detection that itself messes with the explanation in
multiple ways).

But maybe someone else can figure out a way to hand wave my hand
waving down to a simple enough explanation, while also covering copy
detection, and managing to do so in a way that doesn't mislead.  If
so, we can include that in the docs somewhere.

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v2 4/4] Bump rename limit defaults (yet again)
  2021-07-14  1:12   ` [PATCH v2 4/4] Bump rename limit defaults (yet again) Elijah Newren via GitGitGadget
@ 2021-07-14 16:43     ` Jeff King
  2021-07-14 17:32       ` Elijah Newren
  0 siblings, 1 reply; 44+ messages in thread
From: Jeff King @ 2021-07-14 16:43 UTC (permalink / raw)
  To: Elijah Newren via GitGitGadget
  Cc: git, Bagas Sanjaya, Elijah Newren, Eric Sunshine, Derrick Stolee,
	Ævar Arnfjörð Bjarmason

On Wed, Jul 14, 2021 at 01:12:33AM +0000, Elijah Newren via GitGitGadget wrote:

> The combined effect of the above is that the file size used in past
> calculations was likely about 5x too large.  Combine that with a CPU
> performance improvement of ~30%, and we can increase the limits by
> a factor of sqrt(5/(1-.3)) = 2.67, while keeping the original stated
> time limits.

It's slightly sad that we only got a 30% CPU improvement in the past 10
years. Are you just counting clock speed as a short-hand here? I think
that doesn't tell the whole story. But all of this is a side-note
anyway.  What I care about is your actual timings. :)

(It also seems like this rename detection is ripe for parallelization,
 but obviously that's a totally separate topic).

> Using the original time limit of 2s for diff.renameLimit, and bumping
> merge.renameLimit from 10s to 60s, I found the following timings using
> the simple script at the end of this commit message (on an AWS c5.xlarge
> which reports as "Intel(R) Xeon(R) Platinum 8124M CPU @ 3.00GHz"):
> 
>       N   Timing
>    1300    1.995s
>    7100   59.973s
> 
> So let's round down to nice even numbers and bump the limits from
> 400->1000, and from 1000->7000.

Sounds good. Thanks for thoroughly measuring and updating.

-Peff

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v2 3/4] doc: document the special handling of -l0
  2021-07-14  1:12   ` [PATCH v2 3/4] doc: document the special handling of -l0 Elijah Newren via GitGitGadget
@ 2021-07-14 16:45     ` Jeff King
  2021-07-14 17:17       ` Elijah Newren
  0 siblings, 1 reply; 44+ messages in thread
From: Jeff King @ 2021-07-14 16:45 UTC (permalink / raw)
  To: Elijah Newren via GitGitGadget
  Cc: git, Bagas Sanjaya, Elijah Newren, Eric Sunshine, Derrick Stolee,
	Ævar Arnfjörð Bjarmason

On Wed, Jul 14, 2021 at 01:12:32AM +0000, Elijah Newren via GitGitGadget wrote:

> From: Elijah Newren <newren@gmail.com>
> 
> As noted in commit 89973554b52c (diffcore-rename: make diff-tree -l0
> mean -l<large>, 2017-11-29), -l0 has had a magical special "large"
> historical value associated with it.  Document this value, particularly
> since it is not large enough for some uses -- see commit 9f7e4bfa3b6d
> (diff: remove silent clamp of renameLimit, 2017-11-13).
> 
> Signed-off-by: Elijah Newren <newren@gmail.com>
> ---
>  Documentation/diff-options.txt | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/Documentation/diff-options.txt b/Documentation/diff-options.txt
> index 11e08c3fd36..ba40ac66cc9 100644
> --- a/Documentation/diff-options.txt
> +++ b/Documentation/diff-options.txt
> @@ -594,6 +594,9 @@ of a delete/create pair.
>  	exhaustive portion of rename/copy detection from running if
>  	the number of rename/copy targets exceeds the specified
>  	number.  Defaults to diff.renameLimit.
> ++
> +Note that for backward compatibility reasons, a value of 0 is treated
> +the same as if a large value was passed (currently, 32767).

Given the confusion around what "32767" even means to users, I wonder if
we could just say: a value of 0 removes any artificial limits (but Git
still has some internal limits which real-world cases are not likely to
hit).

Removing limits is after all the point of "0". I'm also not sure if it
is simply for backwards compatibility. We commonly let "0" or "-1" mean
"no limit" for convenience. It seems like something we'd want to
support.

-Peff

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v2 3/4] doc: document the special handling of -l0
  2021-07-14 16:45     ` Jeff King
@ 2021-07-14 17:17       ` Elijah Newren
  2021-07-14 17:33         ` Jeff King
  0 siblings, 1 reply; 44+ messages in thread
From: Elijah Newren @ 2021-07-14 17:17 UTC (permalink / raw)
  To: Jeff King
  Cc: Elijah Newren via GitGitGadget, Git Mailing List, Bagas Sanjaya,
	Eric Sunshine, Derrick Stolee,
	Ævar Arnfjörð Bjarmason

On Wed, Jul 14, 2021 at 9:45 AM Jeff King <peff@peff.net> wrote:
>
> On Wed, Jul 14, 2021 at 01:12:32AM +0000, Elijah Newren via GitGitGadget wrote:
>
> > From: Elijah Newren <newren@gmail.com>
> >
> > As noted in commit 89973554b52c (diffcore-rename: make diff-tree -l0
> > mean -l<large>, 2017-11-29), -l0 has had a magical special "large"
> > historical value associated with it.  Document this value, particularly
> > since it is not large enough for some uses -- see commit 9f7e4bfa3b6d
> > (diff: remove silent clamp of renameLimit, 2017-11-13).
> >
> > Signed-off-by: Elijah Newren <newren@gmail.com>
> > ---
> >  Documentation/diff-options.txt | 3 +++
> >  1 file changed, 3 insertions(+)
> >
> > diff --git a/Documentation/diff-options.txt b/Documentation/diff-options.txt
> > index 11e08c3fd36..ba40ac66cc9 100644
> > --- a/Documentation/diff-options.txt
> > +++ b/Documentation/diff-options.txt
> > @@ -594,6 +594,9 @@ of a delete/create pair.
> >       exhaustive portion of rename/copy detection from running if
> >       the number of rename/copy targets exceeds the specified
> >       number.  Defaults to diff.renameLimit.
> > ++
> > +Note that for backward compatibility reasons, a value of 0 is treated
> > +the same as if a large value was passed (currently, 32767).
>
> Given the confusion around what "32767" even means to users, I wonder if
> we could just say: a value of 0 removes any artificial limits (but Git
> still has some internal limits which real-world cases are not likely to
> hit).

32767 is not an internal limit; and as such, it is absolutely an
artificial limit.  I had to use 48941 just a few years ago, and that
value (and others larger than 32767) are fully supported.

> Removing limits is after all the point of "0". I'm also not sure if it
> is simply for backwards compatibility. We commonly let "0" or "-1" mean
> "no limit" for convenience. It seems like something we'd want to
> support.

Making 0 mean unlimited could be done, and I think it'd be a one-line
change, but that's not what commit 89973554b52c (diffcore-rename: make
diff-tree -l0 mean -l<large>, 2017-11-29) tried to do.

I'm not opposed to such a change in the meaning of "0", but I am
opposed to documenting this value as unlimited unless we make it so.

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v2 4/4] Bump rename limit defaults (yet again)
  2021-07-14 16:43     ` Jeff King
@ 2021-07-14 17:32       ` Elijah Newren
  2021-07-14 17:57         ` Jeff King
  0 siblings, 1 reply; 44+ messages in thread
From: Elijah Newren @ 2021-07-14 17:32 UTC (permalink / raw)
  To: Jeff King
  Cc: Elijah Newren via GitGitGadget, Git Mailing List, Bagas Sanjaya,
	Eric Sunshine, Derrick Stolee,
	Ævar Arnfjörð Bjarmason

On Wed, Jul 14, 2021 at 9:43 AM Jeff King <peff@peff.net> wrote:
>
> On Wed, Jul 14, 2021 at 01:12:33AM +0000, Elijah Newren via GitGitGadget wrote:
>
> > The combined effect of the above is that the file size used in past
> > calculations was likely about 5x too large.  Combine that with a CPU
> > performance improvement of ~30%, and we can increase the limits by
> > a factor of sqrt(5/(1-.3)) = 2.67, while keeping the original stated
> > time limits.
>
> It's slightly sad that we only got a 30% CPU improvement in the past 10
> years. Are you just counting clock speed as a short-hand here? I think
> that doesn't tell the whole story. But all of this is a side-note
> anyway.  What I care about is your actual timings. :)

I'm using shorthand when discussing file sizes above (though I used
actual measurements when picking new values below).  But the 30% came
from measuring the timings with the exact same sample file as you and
using a lightly modified version of your original script (tweaked to
also change file basenames) on an AWS c5xlarge instance.  My timings
showed they were only about 30% faster than what you got when you last
bumped the limits.

All my laptops and even my work machine are pretty old, so I picked an
AWS c5xlarge instance hoping it'd represent a relatively new machine
(since your benchmarks when you bumped were using a new machine at the
time).  Maybe the stock c5xlarge instance I picked wasn't a good
choice (wasn't new enough? had a low relative clock speed? affected by
Spectre patches? had slow disks? something else?).  Or maybe single
processor speed really did just hit up against Moore's law and nearly
all gains have shifted to having more cores available which don't
benefit us here since our algorithm is serial.

> (It also seems like this rename detection is ripe for parallelization,
>  but obviously that's a totally separate topic).

True.

> > Using the original time limit of 2s for diff.renameLimit, and bumping
> > merge.renameLimit from 10s to 60s, I found the following timings using
> > the simple script at the end of this commit message (on an AWS c5.xlarge
> > which reports as "Intel(R) Xeon(R) Platinum 8124M CPU @ 3.00GHz"):
> >
> >       N   Timing
> >    1300    1.995s
> >    7100   59.973s
> >
> > So let's round down to nice even numbers and bump the limits from
> > 400->1000, and from 1000->7000.
>
> Sounds good. Thanks for thoroughly measuring and updating.

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v2 3/4] doc: document the special handling of -l0
  2021-07-14 17:17       ` Elijah Newren
@ 2021-07-14 17:33         ` Jeff King
  2021-07-14 19:32           ` Elijah Newren
  0 siblings, 1 reply; 44+ messages in thread
From: Jeff King @ 2021-07-14 17:33 UTC (permalink / raw)
  To: Elijah Newren
  Cc: Elijah Newren via GitGitGadget, Git Mailing List, Bagas Sanjaya,
	Eric Sunshine, Derrick Stolee,
	Ævar Arnfjörð Bjarmason

On Wed, Jul 14, 2021 at 10:17:21AM -0700, Elijah Newren wrote:

> > > +Note that for backward compatibility reasons, a value of 0 is treated
> > > +the same as if a large value was passed (currently, 32767).
> >
> > Given the confusion around what "32767" even means to users, I wonder if
> > we could just say: a value of 0 removes any artificial limits (but Git
> > still has some internal limits which real-world cases are not likely to
> > hit).
> 
> 32767 is not an internal limit; and as such, it is absolutely an
> artificial limit.  I had to use 48941 just a few years ago, and that
> value (and others larger than 32767) are fully supported.

OK, I thought there was still some 32-bit m*n limits, but I guess not.

> > Removing limits is after all the point of "0". I'm also not sure if it
> > is simply for backwards compatibility. We commonly let "0" or "-1" mean
> > "no limit" for convenience. It seems like something we'd want to
> > support.
> 
> Making 0 mean unlimited could be done, and I think it'd be a one-line
> change, but that's not what commit 89973554b52c (diffcore-rename: make
> diff-tree -l0 mean -l<large>, 2017-11-29) tried to do.
> 
> I'm not opposed to such a change in the meaning of "0", but I am
> opposed to documenting this value as unlimited unless we make it so.

Thanks for clarifying. It does seem silly that "0" means "big, but kind
of arbitrary". But I'm OK to punt on that change for now (and in the
meantime, I have no real opinion on whether or how to document the
current behavior).

-Peff

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v2 4/4] Bump rename limit defaults (yet again)
  2021-07-14 17:32       ` Elijah Newren
@ 2021-07-14 17:57         ` Jeff King
  2021-07-14 20:03           ` Elijah Newren
  0 siblings, 1 reply; 44+ messages in thread
From: Jeff King @ 2021-07-14 17:57 UTC (permalink / raw)
  To: Elijah Newren
  Cc: Elijah Newren via GitGitGadget, Git Mailing List, Bagas Sanjaya,
	Eric Sunshine, Derrick Stolee,
	Ævar Arnfjörð Bjarmason

On Wed, Jul 14, 2021 at 10:32:56AM -0700, Elijah Newren wrote:

> > It's slightly sad that we only got a 30% CPU improvement in the past 10
> > years. Are you just counting clock speed as a short-hand here? I think
> > that doesn't tell the whole story. But all of this is a side-note
> > anyway.  What I care about is your actual timings. :)
> 
> I'm using shorthand when discussing file sizes above (though I used
> actual measurements when picking new values below).  But the 30% came
> from measuring the timings with the exact same sample file as you and
> using a lightly modified version of your original script (tweaked to
> also change file basenames) on an AWS c5xlarge instance.  My timings
> showed they were only about 30% faster than what you got when you last
> bumped the limits.

Interesting. My timings are much faster. With a 20k file, I get (on my
laptop, which is an i9-9880H):

     N     CPU (2008)    CPU (now)
    10          0.43s       0.007s
   100          0.44s       0.071s
   200          1.40s       0.226s
   400          4.87s       0.788s
   800         18.08s       2.887s
  1000         27.82s       4.431s

The 2008 timings are from the old email you linked in your commit
message, and the new one is from running the revised script you showed.
The savings seem like more than 30%. I don't know if that's all CPU or
if something changed in the code.

Using a 3k file (the median for ls-tree), numbers are similar, but a
little smaller (my n=1300 is about 1.4s).  So I think we're both in the
same ballpark (and certainly an AWS machine is a perfectly fine
representative sample of where people might run Git).

-Peff

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v2 3/4] doc: document the special handling of -l0
  2021-07-14 17:33         ` Jeff King
@ 2021-07-14 19:32           ` Elijah Newren
  0 siblings, 0 replies; 44+ messages in thread
From: Elijah Newren @ 2021-07-14 19:32 UTC (permalink / raw)
  To: Jeff King
  Cc: Elijah Newren via GitGitGadget, Git Mailing List, Bagas Sanjaya,
	Eric Sunshine, Derrick Stolee,
	Ævar Arnfjörð Bjarmason

On Wed, Jul 14, 2021 at 10:33 AM Jeff King <peff@peff.net> wrote:
>
> On Wed, Jul 14, 2021 at 10:17:21AM -0700, Elijah Newren wrote:
>
> > > > +Note that for backward compatibility reasons, a value of 0 is treated
> > > > +the same as if a large value was passed (currently, 32767).
> > >
> > > Given the confusion around what "32767" even means to users, I wonder if
> > > we could just say: a value of 0 removes any artificial limits (but Git
> > > still has some internal limits which real-world cases are not likely to
> > > hit).
> >
> > 32767 is not an internal limit; and as such, it is absolutely an
> > artificial limit.  I had to use 48941 just a few years ago, and that
> > value (and others larger than 32767) are fully supported.
>
> OK, I thought there was still some 32-bit m*n limits, but I guess not.

There was once upon a time, but not since 9f7e4bfa3b6d (diff: remove
silent clamp of renameLimit, 2017-11-13).

> > > Removing limits is after all the point of "0". I'm also not sure if it
> > > is simply for backwards compatibility. We commonly let "0" or "-1" mean
> > > "no limit" for convenience. It seems like something we'd want to
> > > support.
> >
> > Making 0 mean unlimited could be done, and I think it'd be a one-line
> > change, but that's not what commit 89973554b52c (diffcore-rename: make
> > diff-tree -l0 mean -l<large>, 2017-11-29) tried to do.
> >
> > I'm not opposed to such a change in the meaning of "0", but I am
> > opposed to documenting this value as unlimited unless we make it so.
>
> Thanks for clarifying. It does seem silly that "0" means "big, but kind
> of arbitrary". But I'm OK to punt on that change for now (and in the
> meantime, I have no real opinion on whether or how to document the
> current behavior).
>
> -Peff

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v2 4/4] Bump rename limit defaults (yet again)
  2021-07-14 17:57         ` Jeff King
@ 2021-07-14 20:03           ` Elijah Newren
  2021-07-14 20:47             ` Jeff King
  0 siblings, 1 reply; 44+ messages in thread
From: Elijah Newren @ 2021-07-14 20:03 UTC (permalink / raw)
  To: Jeff King
  Cc: Elijah Newren via GitGitGadget, Git Mailing List, Bagas Sanjaya,
	Eric Sunshine, Derrick Stolee,
	Ævar Arnfjörð Bjarmason

On Wed, Jul 14, 2021 at 10:57 AM Jeff King <peff@peff.net> wrote:
>
> On Wed, Jul 14, 2021 at 10:32:56AM -0700, Elijah Newren wrote:
>
> > > It's slightly sad that we only got a 30% CPU improvement in the past 10
> > > years. Are you just counting clock speed as a short-hand here? I think
> > > that doesn't tell the whole story. But all of this is a side-note
> > > anyway.  What I care about is your actual timings. :)
> >
> > I'm using shorthand when discussing file sizes above (though I used
> > actual measurements when picking new values below).  But the 30% came
> > from measuring the timings with the exact same sample file as you and
> > using a lightly modified version of your original script (tweaked to
> > also change file basenames) on an AWS c5xlarge instance.  My timings
> > showed they were only about 30% faster than what you got when you last
> > bumped the limits.
>
> Interesting. My timings are much faster. With a 20k file, I get (on my
> laptop, which is an i9-9880H):
>
>      N     CPU (2008)    CPU (now)
>     10          0.43s       0.007s
>    100          0.44s       0.071s
>    200          1.40s       0.226s
>    400          4.87s       0.788s
>    800         18.08s       2.887s
>   1000         27.82s       4.431s
>
> The 2008 timings are from the old email you linked in your commit
> message, and the new one is from running the revised script you showed.
> The savings seem like more than 30%. I don't know if that's all CPU or
> if something changed in the code.

I was using the script you wrote in 2008, but comparing to your
reported numbers in 2011[1].  When you bumped in 2011, you said you
picked the limits of 400 & 1000 in order to give rough timings of 2s
and 10s.  So the table looks more like:

   N  CPU (2008)  CPU (2011)  c5xlarge  CPU (yours)
 400    4.87s         ~2s      1.106s      0.788s
1000   27.82s        ~10s      6.350s      4.431s

So, 2011->c5xlarge on these (just recomputed now numbers) show
improvements of ~45% and ~36%.  Maybe I had an outlier run earlier
that was in the upper 6s range for N=1000 and I rounded off to 30% for
the commit message?  Don't know, those numbers are on a laptop that
died in the last few days.

But yeah, you are seeing a bigger improvement than I did; 2011->your
current laptop shows roughly 56% - 60% improvement.

[1] https://lore.kernel.org/git/20110219102128.GB22508@sigill.intra.peff.net/

> Using a 3k file (the median for ls-tree), numbers are similar, but a
> little smaller (my n=1300 is about 1.4s).  So I think we're both in the
> same ballpark (and certainly an AWS machine is a perfectly fine
> representative sample of where people might run Git).

Yeah, dividing any of the timings I get by the ones you get seem to be
giving a value somewhere around 1.4, so that seems reassuring on the
consistency front.  That doesn't fix my jealousy of your faster CPU,
but that's a separate problem.  :-)

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v2 4/4] Bump rename limit defaults (yet again)
  2021-07-14 20:03           ` Elijah Newren
@ 2021-07-14 20:47             ` Jeff King
  0 siblings, 0 replies; 44+ messages in thread
From: Jeff King @ 2021-07-14 20:47 UTC (permalink / raw)
  To: Elijah Newren
  Cc: Elijah Newren via GitGitGadget, Git Mailing List, Bagas Sanjaya,
	Eric Sunshine, Derrick Stolee,
	Ævar Arnfjörð Bjarmason

On Wed, Jul 14, 2021 at 01:03:43PM -0700, Elijah Newren wrote:

> > The 2008 timings are from the old email you linked in your commit
> > message, and the new one is from running the revised script you showed.
> > The savings seem like more than 30%. I don't know if that's all CPU or
> > if something changed in the code.
> 
> I was using the script you wrote in 2008, but comparing to your
> reported numbers in 2011[1].  When you bumped in 2011, you said you
> picked the limits of 400 & 1000 in order to give rough timings of 2s
> and 10s.  So the table looks more like:
> 
>    N  CPU (2008)  CPU (2011)  c5xlarge  CPU (yours)
>  400    4.87s         ~2s      1.106s      0.788s
> 1000   27.82s        ~10s      6.350s      4.431s
> 
> So, 2011->c5xlarge on these (just recomputed now numbers) show
> improvements of ~45% and ~36%.  Maybe I had an outlier run earlier
> that was in the upper 6s range for N=1000 and I rounded off to 30% for
> the commit message?  Don't know, those numbers are on a laptop that
> died in the last few days.

Ah, right. I forgot about the 2011 update. So yeah, things are getting
faster, but not as much as I would have hoped as somebody who came of
age during the clock speed boom of the 90's. :)

Thanks for humoring my puzzlement (I don't think any of this changes the
applicability of your patch).

-Peff

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v2 2/4] doc: clarify documentation for rename/copy limits
  2021-07-14 16:30       ` Elijah Newren
@ 2021-07-14 22:08         ` Ævar Arnfjörð Bjarmason
  2021-07-14 22:56           ` Elijah Newren
  0 siblings, 1 reply; 44+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-07-14 22:08 UTC (permalink / raw)
  To: Elijah Newren
  Cc: Elijah Newren via GitGitGadget, Git Mailing List, Bagas Sanjaya,
	Eric Sunshine, Derrick Stolee, Jeff King


On Wed, Jul 14 2021, Elijah Newren wrote:

> On Wed, Jul 14, 2021 at 12:47 AM Ævar Arnfjörð Bjarmason
> <avarab@gmail.com> wrote:
>>
>> On Wed, Jul 14 2021, Elijah Newren via GitGitGadget wrote:
>>
>> > From: Elijah Newren <newren@gmail.com>
>> >
> ...
>> > diff --git a/Documentation/config/diff.txt b/Documentation/config/diff.txt
>> > index 2d3331f55c2..e26a63d0d42 100644
>> > --- a/Documentation/config/diff.txt
>> > +++ b/Documentation/config/diff.txt
>> > @@ -118,9 +118,10 @@ diff.orderFile::
>> >       relative to the top of the working tree.
>> >
>> >  diff.renameLimit::
>> > -     The number of files to consider when performing the copy/rename
>> > -     detection; equivalent to the 'git diff' option `-l`. This setting
>> > -     has no effect if rename detection is turned off.
>> > +     The number of files to consider in the exhaustive portion of
>> > +     copy/rename detection; equivalent to the 'git diff' option
>> > +     `-l`.  If not set, the default value is 400.  This setting has
>> > +     no effect if rename detection is turned off.
>>
>> For both this...
>>
>> >  diff.renames::
>> >       Whether and how Git detects renames.  If set to "false",
>> > diff --git a/Documentation/config/merge.txt b/Documentation/config/merge.txt
>> > index 6b66c83eabe..aca0c92dbe6 100644
>> > --- a/Documentation/config/merge.txt
>> > +++ b/Documentation/config/merge.txt
>> > @@ -33,10 +33,12 @@ merge.verifySignatures::
>> >  include::fmt-merge-msg.txt[]
>> >
>> >  merge.renameLimit::
>> > -     The number of files to consider when performing rename detection
>> > -     during a merge; if not specified, defaults to the value of
>> > -     diff.renameLimit. This setting has no effect if rename detection
>> > -     is turned off.
>> > +     The number of files to consider in the exhaustive portion of
>> > +     rename detection during a merge.  If not specified, defaults
>> > +     to the value of diff.renameLimit.  If neither
>> > +     merge.renameLimit nor diff.renameLimit are specified, defaults
>> > +     to 1000.  This setting has no effect if rename detection is
>> > +     turned off.
>>
>> ...and this we should really have some wording to the effect of:
>>
>>     ..., defaults to XYZ. The exact (or even approximate) default of XYZ
>>     should not be relied upon, and may be changed (or these limits even
>>     removed) in future versions of git.
>>
>> I.e. let's distinguish a "this is how it works now, FYI" from a forward
>> promise that it's going to work like that forever.
>
> Perhaps I could simplify that to "...currently defaults to XYZ"?  Does
> that capture your intent well enough?

Yes, and it's not as needlessly verbose :)

> I agree very much that this shouldn't be a hard promise.  If it was,
> we've broken it repeatedly over the years.  Not only have we modified
> the default numbers multiple times, there have also been multiple
> times when we've been able to change how many renames could be handled
> without changing the default numbers for the rename limits.  (This was
> done by adding or improving special cases that allow us to exclude
> paths from the exhaustive comparison; exact rename detection that
> someone else added and my more recent improvements to exact rename
> detection are two simple examples).
>
>> Which also leads me to:
>>
>> >  merge.renames::
>> >       Whether Git detects renames.  If set to "false", rename detection
>> > diff --git a/Documentation/diff-options.txt b/Documentation/diff-options.txt
>> > index 32e6dee5ac3..11e08c3fd36 100644
>> > --- a/Documentation/diff-options.txt
>> > +++ b/Documentation/diff-options.txt
>> > @@ -588,11 +588,12 @@ When used together with `-B`, omit also the preimage in the deletion part
>> >  of a delete/create pair.
>> >
>> >  -l<num>::
>> > -     The `-M` and `-C` options require O(n^2) processing time where n
>> > -     is the number of potential rename/copy targets.  This
>> > -     option prevents rename/copy detection from running if
>> > +     The `-M` and `-C` options have an exhaustive portion that
>> > +     requires O(n^2) processing time where n is the number of
>> > +     potential rename/copy targets.  This option prevents the
>> > +     exhaustive portion of rename/copy detection from running if
>> >       the number of rename/copy targets exceeds the specified
>> > -     number.
>> > +     number.  Defaults to diff.renameLimit.
>>
>> This mention of O(n^2). This is not an issue with your patch/series, nd
>> this is an improvement.
>>
>> But as a side-comment: Do we explain somewhere how exactly this
>> {diff,merge}.renmeLimit=N works for values of N? It's probably fine to
>> continue to handwave it away, but is there anything that say tells a
>> user what happens if they have 400 files in their repository in a "x"
>> directory and "git mv x y"? Will it work, but if they add a 401th file
>> it won't for diffs?
>>
>> I *think* given a skimming of previous discussions that it's "no",
>> i.e. that this just kicks in for these expensive cases, and e.g. for
>> such a case of moving the same tree OID from "x" to "y" we'd
>> short-circuit things, but maybe I'm just making things up at this point.
>
> Tree OIDs aren't used by the rename detection logic, but you're close.
>
>> Feel free to ignore me here, I guess this amounts to a request that it
>> would be great if we could point to some docs about how this works. It's
>> probably too much detail for Documentation/config/{merge,diff}.txt, so
>> maybe a section in either of git-{diff,merge}.txt ?
>
> If git-merge.txt includes it, then rebase, cherry-pick, revert
> probably should too.  (and maybe the -3 option of git-am)
>
> Anyway...

Some of this we do via includes, but I also think we should have more
"concept docs", like gitglossary, gitrevisions, so perhaps
gitrenames(5)? Anyway, not for now...

> In the "git mv dir-x dir-y" case, if no other changes are made to
> those files, then the renames would be detectable via blobs having the
> same hashes (i.e. exact renames).  So all these renames would be found
> regardless of the number of files and without exhaustive detection
> even kicking in.
>
> If there were thousands of such files, and every one of them were also
> modified, then the basename-guided rename detection which operates in
> linear time would still find all the renames, without the exhaustive
> detection even kicking in (see commits bd24aa2f97a0 (diffcore-rename:
> guide inexact rename detection based on basenames, 2021-02-14) and
> 37a251436447 (diffcore-rename: use directory rename guided basename
> comparisons, 2021-02-27)).
>
> If this were a merge, and there were thousands of such files renamed
> and modified, AND they were all further renamed to change their
> basename, BUT the original directory was unmodified on the other side
> of history, then we wouldn't need any of the renames for the purposes
> of the merge and they'd all be excluded from the exhaustive rename
> detection.  Sure, we wouldn't find renames for those files, but that
> wouldn't change the result of the merge.  So, again, no exhaustive
> rename detection.  (See commit 174791f0fb23 (merge-ort: use
> relevant_sources to filter possible rename sources, 2021-03-11))
>
> (Rebases & cherry-picks also have an advantage of caching renames from
> previous picks on the upstream side of history to avoid needing to
> re-detect renames on that side...though it only caches renames that
> are either relevant or exact.)
>
> Basically, we bail on exhaustive rename detection if, after the linear
> or faster steps have run (excluding previously cached renames, exact
> rename detection, skipping irrelevant renames, basename guided rename
> detection), the number of remaining unpaired source files times the
> number of remaining unpaired destination files is greater than
> rename_limit * rename_limit.  (If we can assume the numbers match,
> i.e. N = number of remaining unpaired sources = number of remaining
> unpaired destinations, then that check simplifies to bailing once N >
> rename_limit)
>
>
> I'm not sure if these details are really going to help the users,
> especially if more special cases are added (and more have been
> proposed by Johannes and became an Outreachy project, though that one
> didn't get off the ground).  And this explanation is not even fully
> detailed; I'm still hand waving here in at least four different ways
> (mostly in ignoring special cases for different fast steps, but also
> by ignoring copy detection that itself messes with the explanation in
> multiple ways).
>
> But maybe someone else can figure out a way to hand wave my hand
> waving down to a simple enough explanation, while also covering copy
> detection, and managing to do so in a way that doesn't mislead.  If
> so, we can include that in the docs somewhere.

I think it would be good to briefly explain it in the sense of not
explaining it, i.e. give the user who ran into this some idea of what
they're looking at. Perhaps something like (and maybe this is too
long...);

    [...] these limits are limits to the number of "steps" in an an
    algorithm that's subject to change, and whose number of steps will
    depend on your input data.

    They don't correspond to any easily tangible thing, e.g. a limit of
    100 has no correspondence with detecting all renames in a commit
    that changed 10 files, but not a commit that changed 11 files.

    The limits were picked to limit runtime to an approximate desired
    value, given what was thought to be Representative data from a
    repository similar to linux.git.

    If you find your diff/merge take X amount of time doing rename
    detection, then generally speaking doubling the limit will about
    double the time we spend on it.

    So they're an indirect a way to get to a runtime limit of (what was
    it, 2 and 10 seconds?).

    The reason for this configuration not being a time limit is so that
    git can guarantee identical results for the same input data across
    different systems, given the same version and configuration.
    I.e. the common case of running your attempt at a diff/merge twice
    in a row, a time limit would produce unpredictable results.

    

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v2 2/4] doc: clarify documentation for rename/copy limits
  2021-07-14 22:08         ` Ævar Arnfjörð Bjarmason
@ 2021-07-14 22:56           ` Elijah Newren
  0 siblings, 0 replies; 44+ messages in thread
From: Elijah Newren @ 2021-07-14 22:56 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: Elijah Newren via GitGitGadget, Git Mailing List, Bagas Sanjaya,
	Eric Sunshine, Derrick Stolee, Jeff King

On Wed, Jul 14, 2021 at 3:27 PM Ævar Arnfjörð Bjarmason
<avarab@gmail.com> wrote:
>
> On Wed, Jul 14 2021, Elijah Newren wrote:
>
> > On Wed, Jul 14, 2021 at 12:47 AM Ævar Arnfjörð Bjarmason
> > <avarab@gmail.com> wrote:
> >>
> >> On Wed, Jul 14 2021, Elijah Newren via GitGitGadget wrote:
> >>
...
> >> ...and this we should really have some wording to the effect of:
> >>
> >>     ..., defaults to XYZ. The exact (or even approximate) default of XYZ
> >>     should not be relied upon, and may be changed (or these limits even
> >>     removed) in future versions of git.
> >>
> >> I.e. let's distinguish a "this is how it works now, FYI" from a forward
> >> promise that it's going to work like that forever.
> >
> > Perhaps I could simplify that to "...currently defaults to XYZ"?  Does
> > that capture your intent well enough?
>
> Yes, and it's not as needlessly verbose :)

Cool, I'll include that change then.

> > I agree very much that this shouldn't be a hard promise.  If it was,
> > we've broken it repeatedly over the years.  Not only have we modified
> > the default numbers multiple times, there have also been multiple
> > times when we've been able to change how many renames could be handled
> > without changing the default numbers for the rename limits.  (This was
> > done by adding or improving special cases that allow us to exclude
> > paths from the exhaustive comparison; exact rename detection that
> > someone else added and my more recent improvements to exact rename
> > detection are two simple examples).
> >
> >> Which also leads me to:
> >>
> >> >  merge.renames::
> >> >       Whether Git detects renames.  If set to "false", rename detection
> >> > diff --git a/Documentation/diff-options.txt b/Documentation/diff-options.txt
> >> > index 32e6dee5ac3..11e08c3fd36 100644
> >> > --- a/Documentation/diff-options.txt
> >> > +++ b/Documentation/diff-options.txt
> >> > @@ -588,11 +588,12 @@ When used together with `-B`, omit also the preimage in the deletion part
> >> >  of a delete/create pair.
> >> >
> >> >  -l<num>::
> >> > -     The `-M` and `-C` options require O(n^2) processing time where n
> >> > -     is the number of potential rename/copy targets.  This
> >> > -     option prevents rename/copy detection from running if
> >> > +     The `-M` and `-C` options have an exhaustive portion that
> >> > +     requires O(n^2) processing time where n is the number of
> >> > +     potential rename/copy targets.  This option prevents the
> >> > +     exhaustive portion of rename/copy detection from running if
> >> >       the number of rename/copy targets exceeds the specified
> >> > -     number.
> >> > +     number.  Defaults to diff.renameLimit.
> >>
> >> This mention of O(n^2). This is not an issue with your patch/series, nd
> >> this is an improvement.
> >>
> >> But as a side-comment: Do we explain somewhere how exactly this
> >> {diff,merge}.renmeLimit=N works for values of N? It's probably fine to
> >> continue to handwave it away, but is there anything that say tells a
> >> user what happens if they have 400 files in their repository in a "x"
> >> directory and "git mv x y"? Will it work, but if they add a 401th file
> >> it won't for diffs?
> >>
> >> I *think* given a skimming of previous discussions that it's "no",
> >> i.e. that this just kicks in for these expensive cases, and e.g. for
> >> such a case of moving the same tree OID from "x" to "y" we'd
> >> short-circuit things, but maybe I'm just making things up at this point.
> >
> > Tree OIDs aren't used by the rename detection logic, but you're close.
> >
> >> Feel free to ignore me here, I guess this amounts to a request that it
> >> would be great if we could point to some docs about how this works. It's
> >> probably too much detail for Documentation/config/{merge,diff}.txt, so
> >> maybe a section in either of git-{diff,merge}.txt ?
> >
> > If git-merge.txt includes it, then rebase, cherry-pick, revert
> > probably should too.  (and maybe the -3 option of git-am)
> >
> > Anyway...
>
> Some of this we do via includes, but I also think we should have more
> "concept docs", like gitglossary, gitrevisions, so perhaps
> gitrenames(5)? Anyway, not for now...
>
> > In the "git mv dir-x dir-y" case, if no other changes are made to
> > those files, then the renames would be detectable via blobs having the
> > same hashes (i.e. exact renames).  So all these renames would be found
> > regardless of the number of files and without exhaustive detection
> > even kicking in.
> >
> > If there were thousands of such files, and every one of them were also
> > modified, then the basename-guided rename detection which operates in
> > linear time would still find all the renames, without the exhaustive
> > detection even kicking in (see commits bd24aa2f97a0 (diffcore-rename:
> > guide inexact rename detection based on basenames, 2021-02-14) and
> > 37a251436447 (diffcore-rename: use directory rename guided basename
> > comparisons, 2021-02-27)).
> >
> > If this were a merge, and there were thousands of such files renamed
> > and modified, AND they were all further renamed to change their
> > basename, BUT the original directory was unmodified on the other side
> > of history, then we wouldn't need any of the renames for the purposes
> > of the merge and they'd all be excluded from the exhaustive rename
> > detection.  Sure, we wouldn't find renames for those files, but that
> > wouldn't change the result of the merge.  So, again, no exhaustive
> > rename detection.  (See commit 174791f0fb23 (merge-ort: use
> > relevant_sources to filter possible rename sources, 2021-03-11))
> >
> > (Rebases & cherry-picks also have an advantage of caching renames from
> > previous picks on the upstream side of history to avoid needing to
> > re-detect renames on that side...though it only caches renames that
> > are either relevant or exact.)
> >
> > Basically, we bail on exhaustive rename detection if, after the linear
> > or faster steps have run (excluding previously cached renames, exact
> > rename detection, skipping irrelevant renames, basename guided rename
> > detection), the number of remaining unpaired source files times the
> > number of remaining unpaired destination files is greater than
> > rename_limit * rename_limit.  (If we can assume the numbers match,
> > i.e. N = number of remaining unpaired sources = number of remaining
> > unpaired destinations, then that check simplifies to bailing once N >
> > rename_limit)
> >
> >
> > I'm not sure if these details are really going to help the users,
> > especially if more special cases are added (and more have been
> > proposed by Johannes and became an Outreachy project, though that one
> > didn't get off the ground).  And this explanation is not even fully
> > detailed; I'm still hand waving here in at least four different ways
> > (mostly in ignoring special cases for different fast steps, but also
> > by ignoring copy detection that itself messes with the explanation in
> > multiple ways).
> >
> > But maybe someone else can figure out a way to hand wave my hand
> > waving down to a simple enough explanation, while also covering copy
> > detection, and managing to do so in a way that doesn't mislead.  If
> > so, we can include that in the docs somewhere.
>
> I think it would be good to briefly explain it in the sense of not
> explaining it, i.e. give the user who ran into this some idea of what
> they're looking at. Perhaps something like (and maybe this is too
> long...);
>
>     [...] these limits are limits to the number of "steps" in an an
>     algorithm that's subject to change, and whose number of steps will
>     depend on your input data.
>
>     They don't correspond to any easily tangible thing, e.g. a limit of
>     100 has no correspondence with detecting all renames in a commit
>     that changed 10 files, but not a commit that changed 11 files.
>
>     The limits were picked to limit runtime to an approximate desired
>     value, given what was thought to be Representative data from a
>     repository similar to linux.git.
>
>     If you find your diff/merge take X amount of time doing rename
>     detection, then generally speaking doubling the limit will about
>     double the time we spend on it.

...*quadruple* the time we spend on it.

>     So they're an indirect a way to get to a runtime limit of (what was
>     it, 2 and 10 seconds?).
>
>     The reason for this configuration not being a time limit is so that
>     git can guarantee identical results for the same input data across
>     different systems, given the same version and configuration.
>     I.e. the common case of running your attempt at a diff/merge twice
>     in a row, a time limit would produce unpredictable results.

Seems pretty lengthy.  Thinking about it for a bit, perhaps we could
just say something like:


After preliminary steps that can detect subsets of renames quickly, an
exhaustive fallback algorithm is used that compares all remaining
unpaired sources with all remaining unpaired destinations looking for
similar files that can be marked as renames.  (Or, in the case of copy
detection, it compares all original sources with all remaining
unpaired destinations).  With N sources and N destinations, this
algorithm is O(N^2).  The rename_limit says to skip this exhaustive
step if N is greater than the rename_limit.  (More precisely, it says
to skip the exhaustive step if number_sources * number_destinations >
rename_limit * rename_limit, which handles the case where there are a
different number of sources and destinations.)

^ permalink raw reply	[flat|nested] 44+ messages in thread

* [PATCH v3 0/4] Rename/copy limits -- docs, warnings, and new defaults
  2021-07-14  1:12 ` [PATCH v2 0/4] Rename/copy limits -- docs, warnings, and new defaults Elijah Newren via GitGitGadget
                     ` (3 preceding siblings ...)
  2021-07-14  1:12   ` [PATCH v2 4/4] Bump rename limit defaults (yet again) Elijah Newren via GitGitGadget
@ 2021-07-15  0:45   ` Elijah Newren via GitGitGadget
  2021-07-15  0:45     ` [PATCH v3 1/4] diff: correct warning message when renameLimit exceeded Elijah Newren via GitGitGadget
                       ` (5 more replies)
  4 siblings, 6 replies; 44+ messages in thread
From: Elijah Newren via GitGitGadget @ 2021-07-15  0:45 UTC (permalink / raw)
  To: git
  Cc: Bagas Sanjaya, Elijah Newren, Eric Sunshine, Derrick Stolee,
	Jeff King, Ævar Arnfjörð Bjarmason, Elijah Newren

Fix a few small issues with documentation and warnings around the limits for
the quadratic portion of rename (&copy) detection, and bump the default
limits.

Discussion on bumping the limits can be found at [1]. Although it appears we
generally agree we could switch to an unlimited setting for
merge.renameLimit, that would require some changes to progress bars to
notify users how to take action once things start taking a while. So, for
now, just bump the limits.

[1]
https://lore.kernel.org/git/CABPp-BFzp3TCWiF1QAVSfywDLYrz=GOQszVM-sw5p0rSB8RWvw@mail.gmail.com/T/#u

Changes since v2:

 * Change the meaning of "0" to actually mean unlimited, and modify the
   documentation to mention that.
 * Added 'currently' to descriptions to make it clear the defaults are
   likely to change (again).
 * Added a brief explanation of the exhaustive portion of rename detection,
   as requested by Ævar (though, honestly, I think the thing that actually
   helps people pick values for the limit is the warning that tells people
   that rename detection was skipped and how high they need to set the limit
   if they want to redo the operation and get renames).

Changes since v1:

 * Shuffled patch order since the explanation of why "inexact rename
   detection" is incorrect was in the third patch
 * Use the term "exhaustive rename detection" for the quadratic portion
 * Simplify -l description by just stating that it defaults to
   diff.renameLimit (since it in turn has the right default value)
 * Fix asciidoc formating
 * Include bump of the limits in a new patch

Elijah Newren (4):
  diff: correct warning message when renameLimit exceeded
  doc: clarify documentation for rename/copy limits
  diffcore-rename: treat a rename_limit of 0 as unlimited
  Bump rename limit defaults (yet again)

 Documentation/config/diff.txt  |  7 ++++---
 Documentation/config/merge.txt | 10 ++++++----
 Documentation/diff-options.txt | 16 +++++++++++-----
 diff.c                         |  4 ++--
 diffcore-rename.c              |  2 +-
 merge-ort.c                    |  2 +-
 merge-recursive.c              |  2 +-
 7 files changed, 26 insertions(+), 17 deletions(-)


base-commit: d486ca60a51c9cb1fe068803c3f540724e95e83a
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-git-1044%2Fnewren%2Frename-limit-documentation-v3
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-git-1044/newren/rename-limit-documentation-v3
Pull-Request: https://github.com/git/git/pull/1044

Range-diff vs v2:

 1:  0d1d0f180a3 = 1:  0d1d0f180a3 diff: correct warning message when renameLimit exceeded
 2:  4046993a9a2 ! 2:  193385d7ca1 doc: clarify documentation for rename/copy limits
     @@ Documentation/config/diff.txt: diff.orderFile::
      -	has no effect if rename detection is turned off.
      +	The number of files to consider in the exhaustive portion of
      +	copy/rename detection; equivalent to the 'git diff' option
     -+	`-l`.  If not set, the default value is 400.  This setting has
     -+	no effect if rename detection is turned off.
     ++	`-l`.  If not set, the default value is currently 400.  This
     ++	setting has no effect if rename detection is turned off.
       
       diff.renames::
       	Whether and how Git detects renames.  If set to "false",
     @@ Documentation/config/merge.txt: merge.verifySignatures::
      +	The number of files to consider in the exhaustive portion of
      +	rename detection during a merge.  If not specified, defaults
      +	to the value of diff.renameLimit.  If neither
     -+	merge.renameLimit nor diff.renameLimit are specified, defaults
     -+	to 1000.  This setting has no effect if rename detection is
     -+	turned off.
     ++	merge.renameLimit nor diff.renameLimit are specified,
     ++	currently defaults to 1000.  This setting has no effect if
     ++	rename detection is turned off.
       
       merge.renames::
       	Whether Git detects renames.  If set to "false", rename detection
     @@ Documentation/diff-options.txt: When used together with `-B`, omit also the prei
      -	The `-M` and `-C` options require O(n^2) processing time where n
      -	is the number of potential rename/copy targets.  This
      -	option prevents rename/copy detection from running if
     -+	The `-M` and `-C` options have an exhaustive portion that
     -+	requires O(n^2) processing time where n is the number of
     -+	potential rename/copy targets.  This option prevents the
     -+	exhaustive portion of rename/copy detection from running if
     - 	the number of rename/copy targets exceeds the specified
     +-	the number of rename/copy targets exceeds the specified
      -	number.
     -+	number.  Defaults to diff.renameLimit.
     ++	The `-M` and `-C` options involve some preliminary steps that
     ++	can detect subsets of renames/copies cheaply, followed by an
     ++	exhaustive fallback portion that compares all remaining
     ++	unpaired destinations to all relevant sources.  (For renames,
     ++	only remaining unpaired sources are relevant; for copies, all
     ++	original sources are relevant.)  For N sources and
     ++	destinations, this exhaustive check is O(N^2).  This option
     ++	prevents the exhaustive portion of rename/copy detection from
     ++	running if the number of source/destination files involved
     ++	exceeds the specified number.  Defaults to diff.renameLimit.
       
       ifndef::git-format-patch[]
       --diff-filter=[(A|C|D|M|R|T|U|X|B)...[*]]::
 3:  6f5767607cd ! 3:  00a2072baea doc: document the special handling of -l0
     @@ Metadata
      Author: Elijah Newren <newren@gmail.com>
      
       ## Commit message ##
     -    doc: document the special handling of -l0
     +    diffcore-rename: treat a rename_limit of 0 as unlimited
      
     -    As noted in commit 89973554b52c (diffcore-rename: make diff-tree -l0
     -    mean -l<large>, 2017-11-29), -l0 has had a magical special "large"
     -    historical value associated with it.  Document this value, particularly
     -    since it is not large enough for some uses -- see commit 9f7e4bfa3b6d
     -    (diff: remove silent clamp of renameLimit, 2017-11-13).
     +    In commit 89973554b52c (diffcore-rename: make diff-tree -l0 mean
     +    -l<large>, 2017-11-29), -l0 was given a special magical "large" value,
     +    but one which was not large enough for some uses (as can be seen from
     +    commit 9f7e4bfa3b6d (diff: remove silent clamp of renameLimit,
     +    2017-11-13).  Make 0 (or a negative value) be treated as unlimited
     +    instead and update the documentation to mention this.
      
          Signed-off-by: Elijah Newren <newren@gmail.com>
      
       ## Documentation/diff-options.txt ##
      @@ Documentation/diff-options.txt: of a delete/create pair.
     - 	exhaustive portion of rename/copy detection from running if
     - 	the number of rename/copy targets exceeds the specified
     - 	number.  Defaults to diff.renameLimit.
     -++
     -+Note that for backward compatibility reasons, a value of 0 is treated
     -+the same as if a large value was passed (currently, 32767).
     + 	prevents the exhaustive portion of rename/copy detection from
     + 	running if the number of source/destination files involved
     + 	exceeds the specified number.  Defaults to diff.renameLimit.
     ++	Note that a value of 0 is treated as unlimited.
       
       ifndef::git-format-patch[]
       --diff-filter=[(A|C|D|M|R|T|U|X|B)...[*]]::
     +
     + ## diffcore-rename.c ##
     +@@ diffcore-rename.c: static int too_many_rename_candidates(int num_destinations, int num_sources,
     + 	 * memory for the matrix anyway.
     + 	 */
     + 	if (rename_limit <= 0)
     +-		rename_limit = 32767;
     ++		return 0; /* treat as unlimited */
     + 	if (st_mult(num_destinations, num_sources)
     + 	    <= st_mult(rename_limit, rename_limit))
     + 		return 0;
 4:  8f1deb6dd16 ! 4:  b41278b6680 Bump rename limit defaults (yet again)
     @@ Documentation/config/diff.txt: diff.orderFile::
       diff.renameLimit::
       	The number of files to consider in the exhaustive portion of
       	copy/rename detection; equivalent to the 'git diff' option
     --	`-l`.  If not set, the default value is 400.  This setting has
     -+	`-l`.  If not set, the default value is 1000.  This setting has
     - 	no effect if rename detection is turned off.
     +-	`-l`.  If not set, the default value is currently 400.  This
     ++	`-l`.  If not set, the default value is currently 1000.  This
     + 	setting has no effect if rename detection is turned off.
       
       diff.renames::
      
     @@ Documentation/config/merge.txt
      @@ Documentation/config/merge.txt: merge.renameLimit::
       	rename detection during a merge.  If not specified, defaults
       	to the value of diff.renameLimit.  If neither
     - 	merge.renameLimit nor diff.renameLimit are specified, defaults
     --	to 1000.  This setting has no effect if rename detection is
     -+	to 7000.  This setting has no effect if rename detection is
     - 	turned off.
     + 	merge.renameLimit nor diff.renameLimit are specified,
     +-	currently defaults to 1000.  This setting has no effect if
     ++	currently defaults to 7000.  This setting has no effect if
     + 	rename detection is turned off.
       
       merge.renames::
      

-- 
gitgitgadget

^ permalink raw reply	[flat|nested] 44+ messages in thread

* [PATCH v3 1/4] diff: correct warning message when renameLimit exceeded
  2021-07-15  0:45   ` [PATCH v3 0/4] Rename/copy limits -- docs, warnings, and new defaults Elijah Newren via GitGitGadget
@ 2021-07-15  0:45     ` Elijah Newren via GitGitGadget
  2021-07-15  0:45     ` [PATCH v3 2/4] doc: clarify documentation for rename/copy limits Elijah Newren via GitGitGadget
                       ` (4 subsequent siblings)
  5 siblings, 0 replies; 44+ messages in thread
From: Elijah Newren via GitGitGadget @ 2021-07-15  0:45 UTC (permalink / raw)
  To: git
  Cc: Bagas Sanjaya, Elijah Newren, Eric Sunshine, Derrick Stolee,
	Jeff King, Ævar Arnfjörð Bjarmason, Elijah Newren,
	Elijah Newren

From: Elijah Newren <newren@gmail.com>

The warning when quadratic rename detection was skipped referred to
"inexact rename detection".  For years, the only linear portion of
rename detection was looking for exact renames, so "inexact rename
detection" was an accurate way to refer to the quadratic portion of
rename detection.  However, that changed with commit bd24aa2f97a0
(diffcore-rename: guide inexact rename detection based on basenames,
2021-02-14).  Let's instead use the term "exhaustive rename detection"
to refer to the quadratic portion.

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 diff.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/diff.c b/diff.c
index 52c791574b7..2454e34cf6d 100644
--- a/diff.c
+++ b/diff.c
@@ -6284,7 +6284,7 @@ static int is_summary_empty(const struct diff_queue_struct *q)
 }
 
 static const char rename_limit_warning[] =
-N_("inexact rename detection was skipped due to too many files.");
+N_("exhaustive rename detection was skipped due to too many files.");
 
 static const char degrade_cc_to_c_warning[] =
 N_("only found copies from modified paths due to too many files.");
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH v3 2/4] doc: clarify documentation for rename/copy limits
  2021-07-15  0:45   ` [PATCH v3 0/4] Rename/copy limits -- docs, warnings, and new defaults Elijah Newren via GitGitGadget
  2021-07-15  0:45     ` [PATCH v3 1/4] diff: correct warning message when renameLimit exceeded Elijah Newren via GitGitGadget
@ 2021-07-15  0:45     ` Elijah Newren via GitGitGadget
  2021-07-15  0:45     ` [PATCH v3 3/4] diffcore-rename: treat a rename_limit of 0 as unlimited Elijah Newren via GitGitGadget
                       ` (3 subsequent siblings)
  5 siblings, 0 replies; 44+ messages in thread
From: Elijah Newren via GitGitGadget @ 2021-07-15  0:45 UTC (permalink / raw)
  To: git
  Cc: Bagas Sanjaya, Elijah Newren, Eric Sunshine, Derrick Stolee,
	Jeff King, Ævar Arnfjörð Bjarmason, Elijah Newren,
	Elijah Newren

From: Elijah Newren <newren@gmail.com>

A few places in the docs implied that rename/copy detection is always
quadratic or that all (unpaired) files were involved in the quadratic
portion of rename/copy detection.  The following two commits each
introduced an exception to this:

    9027f53cb505 (Do linear-time/space rename logic for exact renames,
                  2007-10-25)
    bd24aa2f97a0 (diffcore-rename: guide inexact rename detection based
                  on basenames, 2021-02-14)

(As a side note, for copy detection, the basename guided inexact rename
detection is turned off and the exact renames will only result in
sources (without the dests) being removed from the set of files used in
quadratic detection.  So, for copy detection, the documentation was
closer to correct.)

Avoid implying that all files involved in rename/copy detection are
subject to the full quadratic algorithm.  While at it, also note the
default values for all these settings.

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 Documentation/config/diff.txt  |  7 ++++---
 Documentation/config/merge.txt | 10 ++++++----
 Documentation/diff-options.txt | 15 ++++++++++-----
 3 files changed, 20 insertions(+), 12 deletions(-)

diff --git a/Documentation/config/diff.txt b/Documentation/config/diff.txt
index 2d3331f55c2..d1b5cfa3542 100644
--- a/Documentation/config/diff.txt
+++ b/Documentation/config/diff.txt
@@ -118,9 +118,10 @@ diff.orderFile::
 	relative to the top of the working tree.
 
 diff.renameLimit::
-	The number of files to consider when performing the copy/rename
-	detection; equivalent to the 'git diff' option `-l`. This setting
-	has no effect if rename detection is turned off.
+	The number of files to consider in the exhaustive portion of
+	copy/rename detection; equivalent to the 'git diff' option
+	`-l`.  If not set, the default value is currently 400.  This
+	setting has no effect if rename detection is turned off.
 
 diff.renames::
 	Whether and how Git detects renames.  If set to "false",
diff --git a/Documentation/config/merge.txt b/Documentation/config/merge.txt
index 6b66c83eabe..7cd6d7883b6 100644
--- a/Documentation/config/merge.txt
+++ b/Documentation/config/merge.txt
@@ -33,10 +33,12 @@ merge.verifySignatures::
 include::fmt-merge-msg.txt[]
 
 merge.renameLimit::
-	The number of files to consider when performing rename detection
-	during a merge; if not specified, defaults to the value of
-	diff.renameLimit. This setting has no effect if rename detection
-	is turned off.
+	The number of files to consider in the exhaustive portion of
+	rename detection during a merge.  If not specified, defaults
+	to the value of diff.renameLimit.  If neither
+	merge.renameLimit nor diff.renameLimit are specified,
+	currently defaults to 1000.  This setting has no effect if
+	rename detection is turned off.
 
 merge.renames::
 	Whether Git detects renames.  If set to "false", rename detection
diff --git a/Documentation/diff-options.txt b/Documentation/diff-options.txt
index 32e6dee5ac3..58acfff9289 100644
--- a/Documentation/diff-options.txt
+++ b/Documentation/diff-options.txt
@@ -588,11 +588,16 @@ When used together with `-B`, omit also the preimage in the deletion part
 of a delete/create pair.
 
 -l<num>::
-	The `-M` and `-C` options require O(n^2) processing time where n
-	is the number of potential rename/copy targets.  This
-	option prevents rename/copy detection from running if
-	the number of rename/copy targets exceeds the specified
-	number.
+	The `-M` and `-C` options involve some preliminary steps that
+	can detect subsets of renames/copies cheaply, followed by an
+	exhaustive fallback portion that compares all remaining
+	unpaired destinations to all relevant sources.  (For renames,
+	only remaining unpaired sources are relevant; for copies, all
+	original sources are relevant.)  For N sources and
+	destinations, this exhaustive check is O(N^2).  This option
+	prevents the exhaustive portion of rename/copy detection from
+	running if the number of source/destination files involved
+	exceeds the specified number.  Defaults to diff.renameLimit.
 
 ifndef::git-format-patch[]
 --diff-filter=[(A|C|D|M|R|T|U|X|B)...[*]]::
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH v3 3/4] diffcore-rename: treat a rename_limit of 0 as unlimited
  2021-07-15  0:45   ` [PATCH v3 0/4] Rename/copy limits -- docs, warnings, and new defaults Elijah Newren via GitGitGadget
  2021-07-15  0:45     ` [PATCH v3 1/4] diff: correct warning message when renameLimit exceeded Elijah Newren via GitGitGadget
  2021-07-15  0:45     ` [PATCH v3 2/4] doc: clarify documentation for rename/copy limits Elijah Newren via GitGitGadget
@ 2021-07-15  0:45     ` Elijah Newren via GitGitGadget
  2021-07-15 23:17       ` Junio C Hamano
  2021-07-15  0:45     ` [PATCH v3 4/4] Bump rename limit defaults (yet again) Elijah Newren via GitGitGadget
                       ` (2 subsequent siblings)
  5 siblings, 1 reply; 44+ messages in thread
From: Elijah Newren via GitGitGadget @ 2021-07-15  0:45 UTC (permalink / raw)
  To: git
  Cc: Bagas Sanjaya, Elijah Newren, Eric Sunshine, Derrick Stolee,
	Jeff King, Ævar Arnfjörð Bjarmason, Elijah Newren,
	Elijah Newren

From: Elijah Newren <newren@gmail.com>

In commit 89973554b52c (diffcore-rename: make diff-tree -l0 mean
-l<large>, 2017-11-29), -l0 was given a special magical "large" value,
but one which was not large enough for some uses (as can be seen from
commit 9f7e4bfa3b6d (diff: remove silent clamp of renameLimit,
2017-11-13).  Make 0 (or a negative value) be treated as unlimited
instead and update the documentation to mention this.

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 Documentation/diff-options.txt | 1 +
 diffcore-rename.c              | 2 +-
 2 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/Documentation/diff-options.txt b/Documentation/diff-options.txt
index 58acfff9289..0aebe832057 100644
--- a/Documentation/diff-options.txt
+++ b/Documentation/diff-options.txt
@@ -598,6 +598,7 @@ of a delete/create pair.
 	prevents the exhaustive portion of rename/copy detection from
 	running if the number of source/destination files involved
 	exceeds the specified number.  Defaults to diff.renameLimit.
+	Note that a value of 0 is treated as unlimited.
 
 ifndef::git-format-patch[]
 --diff-filter=[(A|C|D|M|R|T|U|X|B)...[*]]::
diff --git a/diffcore-rename.c b/diffcore-rename.c
index 3375e24659e..513ba7b05f1 100644
--- a/diffcore-rename.c
+++ b/diffcore-rename.c
@@ -1021,7 +1021,7 @@ static int too_many_rename_candidates(int num_destinations, int num_sources,
 	 * memory for the matrix anyway.
 	 */
 	if (rename_limit <= 0)
-		rename_limit = 32767;
+		return 0; /* treat as unlimited */
 	if (st_mult(num_destinations, num_sources)
 	    <= st_mult(rename_limit, rename_limit))
 		return 0;
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH v3 4/4] Bump rename limit defaults (yet again)
  2021-07-15  0:45   ` [PATCH v3 0/4] Rename/copy limits -- docs, warnings, and new defaults Elijah Newren via GitGitGadget
                       ` (2 preceding siblings ...)
  2021-07-15  0:45     ` [PATCH v3 3/4] diffcore-rename: treat a rename_limit of 0 as unlimited Elijah Newren via GitGitGadget
@ 2021-07-15  0:45     ` Elijah Newren via GitGitGadget
  2021-07-15 13:36     ` [PATCH v3 0/4] Rename/copy limits -- docs, warnings, and new defaults Derrick Stolee
  2021-07-15 23:20     ` Junio C Hamano
  5 siblings, 0 replies; 44+ messages in thread
From: Elijah Newren via GitGitGadget @ 2021-07-15  0:45 UTC (permalink / raw)
  To: git
  Cc: Bagas Sanjaya, Elijah Newren, Eric Sunshine, Derrick Stolee,
	Jeff King, Ævar Arnfjörð Bjarmason, Elijah Newren,
	Elijah Newren

From: Elijah Newren <newren@gmail.com>

These were last bumped in commit 92c57e5c1d29 (bump rename limit
defaults (again), 2011-02-19), and were bumped both because processors
had gotten faster, and because people were getting ugly merges that
caused problems and reporting it to the mailing list (suggesting that
folks were willing to spend more time waiting).

Since that time:
  * Linus has continued recommending kernel folks to set
    diff.renameLimit=0 (maps to 32767, currently)
  * Folks with repositories with lots of renames were happy to set
    merge.renameLimit above 32767, once the code supported that, to
    get correct cherry-picks
  * Processors have gotten faster
  * It has been discovered that the timing methodology used last time
    probably used too large example files.

The last point is probably worth explaining a bit more:

  * The "average" file size used appears to have been average blob size
    in the linux kernel history at the time (probably v2.6.25 or
    something close to it).
  * Since bigger files are modified more frequently, such a computation
    weights towards larger files.
  * Larger files may be more likely to be modified over time, but are
    not more likely to be renamed -- the mean and median blob size
    within a tree are a bit higher than the mean and median of blob
    sizes in the history leading up to that version for the linux
    kernel.
  * The mean blob size in v2.6.25 was half the average blob size in
    history leading to that point
  * The median blob size in v2.6.25 was about 40% of the mean blob size
    in v2.6.25.
  * Since the mean blob size is more than double the median blob size,
    any file as big as the mean will not be compared to any files of
    median size or less (because they'd be more than 50% dissimilar).
  * Since it is the number of files compared that provides the O(n^2)
    behavior, median-sized files should matter more than mean-sized
    ones.

The combined effect of the above is that the file size used in past
calculations was likely about 5x too large.  Combine that with a CPU
performance improvement of ~30%, and we can increase the limits by
a factor of sqrt(5/(1-.3)) = 2.67, while keeping the original stated
time limits.

Keeping the same approximate time limit probably makes sense for
diff.renameLimit (there is no progress feedback in e.g. git log -p),
but the experience above suggests merge.renameLimit could be extended
significantly.  In fact, it probably would make sense to have an
unlimited default setting for merge.renameLimit, but that would
likely need to be coupled with changes to how progress is displayed.
(See https://lore.kernel.org/git/YOx+Ok%2FEYvLqRMzJ@coredump.intra.peff.net/
for details in that area.)  For now, let's just bump the approximate
time limit from 10s to 1m.

(Note: We do not want to use actual time limits, because getting results
that depend on how loaded your system is that day feels bad, and because
we don't discover that we won't get all the renames until after we've
put in a lot of work rather than just upfront telling the user there are
too many files involved.)

Using the original time limit of 2s for diff.renameLimit, and bumping
merge.renameLimit from 10s to 60s, I found the following timings using
the simple script at the end of this commit message (on an AWS c5.xlarge
which reports as "Intel(R) Xeon(R) Platinum 8124M CPU @ 3.00GHz"):

      N   Timing
   1300    1.995s
   7100   59.973s

So let's round down to nice even numbers and bump the limits from
400->1000, and from 1000->7000.

Here is the measure_rename_perf script (adapted from
https://lore.kernel.org/git/20080211113516.GB6344@coredump.intra.peff.net/
in particular to avoid triggering the linear handling from
basename-guided rename detection):

    #!/bin/bash

    n=$1; shift

    rm -rf repo
    mkdir repo && cd repo
    git init -q -b main

    mkdata() {
      mkdir $1
      for i in `seq 1 $2`; do
        (sed "s/^/$i /" <../sample
         echo tag: $1
        ) >$1/$i
      done
    }

    mkdata initial $n
    git add .
    git commit -q -m initial

    mkdata new $n
    git add .
    cd new
    for i in *; do git mv $i $i.renamed; done
    cd ..
    git rm -q -rf initial
    git commit -q -m new

    time git diff-tree -M -l0 --summary HEAD^ HEAD

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 Documentation/config/diff.txt  | 2 +-
 Documentation/config/merge.txt | 2 +-
 diff.c                         | 2 +-
 merge-ort.c                    | 2 +-
 merge-recursive.c              | 2 +-
 5 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/Documentation/config/diff.txt b/Documentation/config/diff.txt
index d1b5cfa3542..32f84838ac1 100644
--- a/Documentation/config/diff.txt
+++ b/Documentation/config/diff.txt
@@ -120,7 +120,7 @@ diff.orderFile::
 diff.renameLimit::
 	The number of files to consider in the exhaustive portion of
 	copy/rename detection; equivalent to the 'git diff' option
-	`-l`.  If not set, the default value is currently 400.  This
+	`-l`.  If not set, the default value is currently 1000.  This
 	setting has no effect if rename detection is turned off.
 
 diff.renames::
diff --git a/Documentation/config/merge.txt b/Documentation/config/merge.txt
index 7cd6d7883b6..e27cc639447 100644
--- a/Documentation/config/merge.txt
+++ b/Documentation/config/merge.txt
@@ -37,7 +37,7 @@ merge.renameLimit::
 	rename detection during a merge.  If not specified, defaults
 	to the value of diff.renameLimit.  If neither
 	merge.renameLimit nor diff.renameLimit are specified,
-	currently defaults to 1000.  This setting has no effect if
+	currently defaults to 7000.  This setting has no effect if
 	rename detection is turned off.
 
 merge.renames::
diff --git a/diff.c b/diff.c
index 2454e34cf6d..0244a371d32 100644
--- a/diff.c
+++ b/diff.c
@@ -35,7 +35,7 @@
 
 static int diff_detect_rename_default;
 static int diff_indent_heuristic = 1;
-static int diff_rename_limit_default = 400;
+static int diff_rename_limit_default = 1000;
 static int diff_suppress_blank_empty;
 static int diff_use_color_default = -1;
 static int diff_color_moved_default;
diff --git a/merge-ort.c b/merge-ort.c
index b954f7184a5..8a84375e940 100644
--- a/merge-ort.c
+++ b/merge-ort.c
@@ -2558,7 +2558,7 @@ static void detect_regular_renames(struct merge_options *opt,
 	diff_opts.detect_rename = DIFF_DETECT_RENAME;
 	diff_opts.rename_limit = opt->rename_limit;
 	if (opt->rename_limit <= 0)
-		diff_opts.rename_limit = 1000;
+		diff_opts.rename_limit = 7000;
 	diff_opts.rename_score = opt->rename_score;
 	diff_opts.show_rename_progress = opt->show_rename_progress;
 	diff_opts.output_format = DIFF_FORMAT_NO_OUTPUT;
diff --git a/merge-recursive.c b/merge-recursive.c
index 4327e0cfa33..f19f8cc37bd 100644
--- a/merge-recursive.c
+++ b/merge-recursive.c
@@ -1879,7 +1879,7 @@ static struct diff_queue_struct *get_diffpairs(struct merge_options *opt,
 	 */
 	if (opts.detect_rename > DIFF_DETECT_RENAME)
 		opts.detect_rename = DIFF_DETECT_RENAME;
-	opts.rename_limit = (opt->rename_limit >= 0) ? opt->rename_limit : 1000;
+	opts.rename_limit = (opt->rename_limit >= 0) ? opt->rename_limit : 7000;
 	opts.rename_score = opt->rename_score;
 	opts.show_rename_progress = opt->show_rename_progress;
 	opts.output_format = DIFF_FORMAT_NO_OUTPUT;
-- 
gitgitgadget

^ permalink raw reply related	[flat|nested] 44+ messages in thread

* Re: [PATCH v3 0/4] Rename/copy limits -- docs, warnings, and new defaults
  2021-07-15  0:45   ` [PATCH v3 0/4] Rename/copy limits -- docs, warnings, and new defaults Elijah Newren via GitGitGadget
                       ` (3 preceding siblings ...)
  2021-07-15  0:45     ` [PATCH v3 4/4] Bump rename limit defaults (yet again) Elijah Newren via GitGitGadget
@ 2021-07-15 13:36     ` Derrick Stolee
  2021-07-15 23:20     ` Junio C Hamano
  5 siblings, 0 replies; 44+ messages in thread
From: Derrick Stolee @ 2021-07-15 13:36 UTC (permalink / raw)
  To: Elijah Newren via GitGitGadget, git
  Cc: Bagas Sanjaya, Elijah Newren, Eric Sunshine, Jeff King,
	Ævar Arnfjörð Bjarmason

On 7/14/2021 8:45 PM, Elijah Newren via GitGitGadget wrote:
> Fix a few small issues with documentation and warnings around the limits for
> the quadratic portion of rename (&copy) detection, and bump the default
> limits.
> 
> Discussion on bumping the limits can be found at [1]. Although it appears we
> generally agree we could switch to an unlimited setting for
> merge.renameLimit, that would require some changes to progress bars to
> notify users how to take action once things start taking a while. So, for
> now, just bump the limits.

I briefly read through the responses to v2, and it looks like you
updated your v3 appropriately. I think the changes since v2 also
satisfy my comments on v1. This version LGTM.

Thanks,
-Stolee

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v3 3/4] diffcore-rename: treat a rename_limit of 0 as unlimited
  2021-07-15  0:45     ` [PATCH v3 3/4] diffcore-rename: treat a rename_limit of 0 as unlimited Elijah Newren via GitGitGadget
@ 2021-07-15 23:17       ` Junio C Hamano
  0 siblings, 0 replies; 44+ messages in thread
From: Junio C Hamano @ 2021-07-15 23:17 UTC (permalink / raw)
  To: Elijah Newren via GitGitGadget
  Cc: git, Bagas Sanjaya, Elijah Newren, Eric Sunshine, Derrick Stolee,
	Jeff King, Ævar Arnfjörð Bjarmason

"Elijah Newren via GitGitGadget" <gitgitgadget@gmail.com> writes:

> diff --git a/diffcore-rename.c b/diffcore-rename.c
> index 3375e24659e..513ba7b05f1 100644
> --- a/diffcore-rename.c
> +++ b/diffcore-rename.c
> @@ -1021,7 +1021,7 @@ static int too_many_rename_candidates(int num_destinations, int num_sources,
>  	 * memory for the matrix anyway.
>  	 */
>  	if (rename_limit <= 0)
> -		rename_limit = 32767;
> +		return 0; /* treat as unlimited */

OK.  As this short-cuts, the impact a non-positive value may have on
all the remainder of the function (like limit squared must be larger
than the matrix) would not have to be compensated for.  Very simple
and clean.

>  	if (st_mult(num_destinations, num_sources)
>  	    <= st_mult(rename_limit, rename_limit))
>  		return 0;

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v3 0/4] Rename/copy limits -- docs, warnings, and new defaults
  2021-07-15  0:45   ` [PATCH v3 0/4] Rename/copy limits -- docs, warnings, and new defaults Elijah Newren via GitGitGadget
                       ` (4 preceding siblings ...)
  2021-07-15 13:36     ` [PATCH v3 0/4] Rename/copy limits -- docs, warnings, and new defaults Derrick Stolee
@ 2021-07-15 23:20     ` Junio C Hamano
  5 siblings, 0 replies; 44+ messages in thread
From: Junio C Hamano @ 2021-07-15 23:20 UTC (permalink / raw)
  To: Elijah Newren via GitGitGadget
  Cc: git, Bagas Sanjaya, Elijah Newren, Eric Sunshine, Derrick Stolee,
	Jeff King, Ævar Arnfjörð Bjarmason

"Elijah Newren via GitGitGadget" <gitgitgadget@gmail.com> writes:

> Fix a few small issues with documentation and warnings around the limits for
> the quadratic portion of rename (&copy) detection, and bump the default
> limits.
>
> Discussion on bumping the limits can be found at [1]. Although it appears we
> generally agree we could switch to an unlimited setting for
> merge.renameLimit, that would require some changes to progress bars to
> notify users how to take action once things start taking a while. So, for
> now, just bump the limits.
>
> [1]
> https://lore.kernel.org/git/CABPp-BFzp3TCWiF1QAVSfywDLYrz=GOQszVM-sw5p0rSB8RWvw@mail.gmail.com/T/#u
>
> Changes since v2:
>
>  * Change the meaning of "0" to actually mean unlimited, and modify the
>    documentation to mention that.
>  * Added 'currently' to descriptions to make it clear the defaults are
>    likely to change (again).
>  * Added a brief explanation of the exhaustive portion of rename detection,
>    as requested by Ævar (though, honestly, I think the thing that actually
>    helps people pick values for the limit is the warning that tells people
>    that rename detection was skipped and how high they need to set the limit
>    if they want to redo the operation and get renames).

Looks sensible.  The final phrase chosen ("exhaustive" part) does
sound easy-to-understand, too.


^ permalink raw reply	[flat|nested] 44+ messages in thread

end of thread, other threads:[~2021-07-15 23:20 UTC | newest]

Thread overview: 44+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-07-11  0:46 [PATCH 0/3] Improve the documentation and warnings dealing with rename/copy limits Elijah Newren via GitGitGadget
2021-07-11  0:46 ` [PATCH 1/3] doc: clarify documentation for " Elijah Newren via GitGitGadget
2021-07-11  4:37   ` Bagas Sanjaya
2021-07-11  4:52     ` Elijah Newren
2021-07-12 15:03   ` Derrick Stolee
2021-07-12 21:27     ` Junio C Hamano
2021-07-11  0:46 ` [PATCH 2/3] doc: document the special handling of -l0 Elijah Newren via GitGitGadget
2021-07-11  4:54   ` Eric Sunshine
2021-07-11  4:54     ` Elijah Newren
2021-07-11  0:46 ` [PATCH 3/3] diff: correct warning message when renameLimit exceeded Elijah Newren via GitGitGadget
2021-07-12 15:09   ` Derrick Stolee
2021-07-12 18:13     ` Elijah Newren
2021-07-14  0:47       ` Junio C Hamano
2021-07-14  1:06         ` Elijah Newren
2021-07-14  1:10           ` Junio C Hamano
2021-07-14  1:22             ` Elijah Newren
2021-07-14  5:17               ` Junio C Hamano
2021-07-14 15:09                 ` Elijah Newren
2021-07-14  1:12 ` [PATCH v2 0/4] Rename/copy limits -- docs, warnings, and new defaults Elijah Newren via GitGitGadget
2021-07-14  1:12   ` [PATCH v2 1/4] diff: correct warning message when renameLimit exceeded Elijah Newren via GitGitGadget
2021-07-14  1:12   ` [PATCH v2 2/4] doc: clarify documentation for rename/copy limits Elijah Newren via GitGitGadget
2021-07-14  7:37     ` Ævar Arnfjörð Bjarmason
2021-07-14 16:30       ` Elijah Newren
2021-07-14 22:08         ` Ævar Arnfjörð Bjarmason
2021-07-14 22:56           ` Elijah Newren
2021-07-14  1:12   ` [PATCH v2 3/4] doc: document the special handling of -l0 Elijah Newren via GitGitGadget
2021-07-14 16:45     ` Jeff King
2021-07-14 17:17       ` Elijah Newren
2021-07-14 17:33         ` Jeff King
2021-07-14 19:32           ` Elijah Newren
2021-07-14  1:12   ` [PATCH v2 4/4] Bump rename limit defaults (yet again) Elijah Newren via GitGitGadget
2021-07-14 16:43     ` Jeff King
2021-07-14 17:32       ` Elijah Newren
2021-07-14 17:57         ` Jeff King
2021-07-14 20:03           ` Elijah Newren
2021-07-14 20:47             ` Jeff King
2021-07-15  0:45   ` [PATCH v3 0/4] Rename/copy limits -- docs, warnings, and new defaults Elijah Newren via GitGitGadget
2021-07-15  0:45     ` [PATCH v3 1/4] diff: correct warning message when renameLimit exceeded Elijah Newren via GitGitGadget
2021-07-15  0:45     ` [PATCH v3 2/4] doc: clarify documentation for rename/copy limits Elijah Newren via GitGitGadget
2021-07-15  0:45     ` [PATCH v3 3/4] diffcore-rename: treat a rename_limit of 0 as unlimited Elijah Newren via GitGitGadget
2021-07-15 23:17       ` Junio C Hamano
2021-07-15  0:45     ` [PATCH v3 4/4] Bump rename limit defaults (yet again) Elijah Newren via GitGitGadget
2021-07-15 13:36     ` [PATCH v3 0/4] Rename/copy limits -- docs, warnings, and new defaults Derrick Stolee
2021-07-15 23:20     ` Junio C Hamano

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).