* [PATCH 0/3] Improve the documentation and warnings dealing with rename/copy limits @ 2021-07-11 0:46 Elijah Newren via GitGitGadget 2021-07-11 0:46 ` [PATCH 1/3] doc: clarify documentation for " Elijah Newren via GitGitGadget ` (3 more replies) 0 siblings, 4 replies; 44+ messages in thread From: Elijah Newren via GitGitGadget @ 2021-07-11 0:46 UTC (permalink / raw) To: git; +Cc: Elijah Newren Fix a few small issues with documentation and warnings around the limits for the quadratic portion of rename (©) detection. I also sent out a separate RFC about how and whether to bump {diff,merge}.renameLimit [1]. If there's a quick consensus there, I may add a patch to this series that bumps the limits, but that could also just be done separately afterward. [1] https://lore.kernel.org/git/CABPp-BFzp3TCWiF1QAVSfywDLYrz=GOQszVM-sw5p0rSB8RWvw@mail.gmail.com/T/#u Elijah Newren (3): doc: clarify documentation for rename/copy limits doc: document the special handling of -l0 diff: correct warning message when renameLimit exceeded Documentation/config/diff.txt | 7 ++++--- Documentation/config/merge.txt | 10 ++++++---- Documentation/diff-options.txt | 14 +++++++++----- diff.c | 2 +- 4 files changed, 20 insertions(+), 13 deletions(-) base-commit: 670b81a890388c60b7032a4f5b879f2ece8c4558 Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-git-1044%2Fnewren%2Frename-limit-documentation-v1 Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-git-1044/newren/rename-limit-documentation-v1 Pull-Request: https://github.com/git/git/pull/1044 -- gitgitgadget ^ permalink raw reply [flat|nested] 44+ messages in thread
* [PATCH 1/3] doc: clarify documentation for rename/copy limits 2021-07-11 0:46 [PATCH 0/3] Improve the documentation and warnings dealing with rename/copy limits Elijah Newren via GitGitGadget @ 2021-07-11 0:46 ` Elijah Newren via GitGitGadget 2021-07-11 4:37 ` Bagas Sanjaya 2021-07-12 15:03 ` Derrick Stolee 2021-07-11 0:46 ` [PATCH 2/3] doc: document the special handling of -l0 Elijah Newren via GitGitGadget ` (2 subsequent siblings) 3 siblings, 2 replies; 44+ messages in thread From: Elijah Newren via GitGitGadget @ 2021-07-11 0:46 UTC (permalink / raw) To: git; +Cc: Elijah Newren, Elijah Newren From: Elijah Newren <newren@gmail.com> A few places in the docs implied that rename/copy detection is always quadratic or that all (unpaired) files were involved in the quadratic portion of rename/copy detection. The following two commits each introduced an exception to this: 9027f53cb505 (Do linear-time/space rename logic for exact renames, 2007-10-25) bd24aa2f97a0 (diffcore-rename: guide inexact rename detection based on basenames, 2021-02-14) (As a side note, for copy detection, the basename guided inexact rename detection is turned off and the exact renames will only result in sources (without the dests) being removed from the set of files used in quadratic detection. So, for copy detection, the documentation was closer to correct.) Avoid implying that all files involved in rename/copy detection are subject to the full quadratic algorithm. While at it, also note the default values for all these settings. Signed-off-by: Elijah Newren <newren@gmail.com> --- Documentation/config/diff.txt | 7 ++++--- Documentation/config/merge.txt | 10 ++++++---- Documentation/diff-options.txt | 11 ++++++----- 3 files changed, 16 insertions(+), 12 deletions(-) diff --git a/Documentation/config/diff.txt b/Documentation/config/diff.txt index 2d3331f55c2..31b96e8294f 100644 --- a/Documentation/config/diff.txt +++ b/Documentation/config/diff.txt @@ -118,9 +118,10 @@ diff.orderFile:: relative to the top of the working tree. diff.renameLimit:: - The number of files to consider when performing the copy/rename - detection; equivalent to the 'git diff' option `-l`. This setting - has no effect if rename detection is turned off. + The number of files to consider in the quadratic portion of + copy/rename detection; equivalent to the 'git diff' option + `-l`. If not set, the default value is 400. This setting has + no effect if rename detection is turned off. diff.renames:: Whether and how Git detects renames. If set to "false", diff --git a/Documentation/config/merge.txt b/Documentation/config/merge.txt index cb2ed589075..d030b936d1f 100644 --- a/Documentation/config/merge.txt +++ b/Documentation/config/merge.txt @@ -33,10 +33,12 @@ merge.verifySignatures:: include::fmt-merge-msg.txt[] merge.renameLimit:: - The number of files to consider when performing rename detection - during a merge; if not specified, defaults to the value of - diff.renameLimit. This setting has no effect if rename detection - is turned off. + The number of files to consider in the quadratic portion of + rename detection during a merge. If not specified, defaults + to the value of diff.renameLimit. If neither + merge.renameLimit nor diff.renameLimit are specified, defaults + to 1000. This setting has no effect if rename detection is + turned off. merge.renames:: Whether Git detects renames. If set to "false", rename detection diff --git a/Documentation/diff-options.txt b/Documentation/diff-options.txt index 32e6dee5ac3..b5682b83956 100644 --- a/Documentation/diff-options.txt +++ b/Documentation/diff-options.txt @@ -588,11 +588,12 @@ When used together with `-B`, omit also the preimage in the deletion part of a delete/create pair. -l<num>:: - The `-M` and `-C` options require O(n^2) processing time where n - is the number of potential rename/copy targets. This - option prevents rename/copy detection from running if - the number of rename/copy targets exceeds the specified - number. + The `-M` and `-C` options can require O(n^2) processing time + where n is the number of potential rename/copy targets. This + option prevents the quadratic portion of rename/copy detection + from running if the number of rename/copy targets exceeds the + specified number. Defaults to diff.renameLimit, or if that is + also unspecified, to 400. ifndef::git-format-patch[] --diff-filter=[(A|C|D|M|R|T|U|X|B)...[*]]:: -- gitgitgadget ^ permalink raw reply related [flat|nested] 44+ messages in thread
* Re: [PATCH 1/3] doc: clarify documentation for rename/copy limits 2021-07-11 0:46 ` [PATCH 1/3] doc: clarify documentation for " Elijah Newren via GitGitGadget @ 2021-07-11 4:37 ` Bagas Sanjaya 2021-07-11 4:52 ` Elijah Newren 2021-07-12 15:03 ` Derrick Stolee 1 sibling, 1 reply; 44+ messages in thread From: Bagas Sanjaya @ 2021-07-11 4:37 UTC (permalink / raw) To: Elijah Newren via GitGitGadget, git; +Cc: Elijah Newren On 11/07/21 07.46, Elijah Newren via GitGitGadget wrote: > -l<num>:: > - The `-M` and `-C` options require O(n^2) processing time where n > - is the number of potential rename/copy targets. This > - option prevents rename/copy detection from running if > - the number of rename/copy targets exceeds the specified > - number. > + The `-M` and `-C` options can require O(n^2) processing time > + where n is the number of potential rename/copy targets. This > + option prevents the quadratic portion of rename/copy detection > + from running if the number of rename/copy targets exceeds the > + specified number. Defaults to diff.renameLimit, or if that is > + also unspecified, to 400. Why not just defaults to diff.renameLimit if the default for it and -l are both 400? -- An old man doll... just what I always wanted! - Clara ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH 1/3] doc: clarify documentation for rename/copy limits 2021-07-11 4:37 ` Bagas Sanjaya @ 2021-07-11 4:52 ` Elijah Newren 0 siblings, 0 replies; 44+ messages in thread From: Elijah Newren @ 2021-07-11 4:52 UTC (permalink / raw) To: Bagas Sanjaya; +Cc: Elijah Newren via GitGitGadget, Git Mailing List On Sat, Jul 10, 2021 at 9:37 PM Bagas Sanjaya <bagasdotme@gmail.com> wrote: > > On 11/07/21 07.46, Elijah Newren via GitGitGadget wrote: > > -l<num>:: > > - The `-M` and `-C` options require O(n^2) processing time where n > > - is the number of potential rename/copy targets. This > > - option prevents rename/copy detection from running if > > - the number of rename/copy targets exceeds the specified > > - number. > > + The `-M` and `-C` options can require O(n^2) processing time > > + where n is the number of potential rename/copy targets. This > > + option prevents the quadratic portion of rename/copy detection > > + from running if the number of rename/copy targets exceeds the > > + specified number. Defaults to diff.renameLimit, or if that is > > + also unspecified, to 400. > > Why not just defaults to diff.renameLimit if the default for it and -l > are both 400? Good point; I'll make that change and include it in the next re-roll. ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH 1/3] doc: clarify documentation for rename/copy limits 2021-07-11 0:46 ` [PATCH 1/3] doc: clarify documentation for " Elijah Newren via GitGitGadget 2021-07-11 4:37 ` Bagas Sanjaya @ 2021-07-12 15:03 ` Derrick Stolee 2021-07-12 21:27 ` Junio C Hamano 1 sibling, 1 reply; 44+ messages in thread From: Derrick Stolee @ 2021-07-12 15:03 UTC (permalink / raw) To: Elijah Newren via GitGitGadget, git; +Cc: Elijah Newren On 7/10/2021 8:46 PM, Elijah Newren via GitGitGadget wrote: > From: Elijah Newren <newren@gmail.com> ... > diff.renameLimit:: > - The number of files to consider when performing the copy/rename > - detection; equivalent to the 'git diff' option `-l`. This setting > - has no effect if rename detection is turned off. > + The number of files to consider in the quadratic portion of > + copy/rename detection; equivalent to the 'git diff' option > + `-l`. If not set, the default value is 400. This setting has > + no effect if rename detection is turned off. ... > merge.renameLimit:: > - The number of files to consider when performing rename detection > - during a merge; if not specified, defaults to the value of > - diff.renameLimit. This setting has no effect if rename detection > - is turned off. > + The number of files to consider in the quadratic portion of > + rename detection during a merge. If not specified, defaults > + to the value of diff.renameLimit. If neither > + merge.renameLimit nor diff.renameLimit are specified, defaults > + to 1000. This setting has no effect if rename detection is > + turned off. ... > -l<num>:: > - The `-M` and `-C` options require O(n^2) processing time where n > - is the number of potential rename/copy targets. This > - option prevents rename/copy detection from running if > - the number of rename/copy targets exceeds the specified > - number. > + The `-M` and `-C` options can require O(n^2) processing time > + where n is the number of potential rename/copy targets. This > + option prevents the quadratic portion of rename/copy detection > + from running if the number of rename/copy targets exceeds the > + specified number. Defaults to diff.renameLimit, or if that is > + also unspecified, to 400. These changes are clear to me, but I also know how a bit about how the rename algorithm works and what "the quadratic portion" means. I wonder if this is sufficient detail to help a user less versed in Git's internals. Perhaps we should instead be describing the set of "potential inexact renames" as a more descriptive approach? Here is a possible way to use this wording instead: merge.renameLimit:: The number of potential inexact renames to consider when performing rename detection during a merge; if this limit is exceeded, then no inexact renames are computed. Renames where the content does not change are excluded from this limit. If not specified, defaults to... Feel free to take or leave any of this example. Thanks, -Stolee ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH 1/3] doc: clarify documentation for rename/copy limits 2021-07-12 15:03 ` Derrick Stolee @ 2021-07-12 21:27 ` Junio C Hamano 0 siblings, 0 replies; 44+ messages in thread From: Junio C Hamano @ 2021-07-12 21:27 UTC (permalink / raw) To: Derrick Stolee; +Cc: Elijah Newren via GitGitGadget, git, Elijah Newren Derrick Stolee <stolee@gmail.com> writes: > merge.renameLimit:: > The number of potential inexact renames to consider when > performing rename detection during a merge; if this limit > is exceeded, then no inexact renames are computed. Renames > where the content does not change are excluded from this > limit. If not specified, defaults to... > > Feel free to take or leave any of this example. Phrases like "finding inexact renames" and "finding which files were moved with changes" may tell what we are spending cycles on to the end user, but I have to say I agree that "quadratic portion" would not resonate with the intended audiences at all. ^ permalink raw reply [flat|nested] 44+ messages in thread
* [PATCH 2/3] doc: document the special handling of -l0 2021-07-11 0:46 [PATCH 0/3] Improve the documentation and warnings dealing with rename/copy limits Elijah Newren via GitGitGadget 2021-07-11 0:46 ` [PATCH 1/3] doc: clarify documentation for " Elijah Newren via GitGitGadget @ 2021-07-11 0:46 ` Elijah Newren via GitGitGadget 2021-07-11 4:54 ` Eric Sunshine 2021-07-11 0:46 ` [PATCH 3/3] diff: correct warning message when renameLimit exceeded Elijah Newren via GitGitGadget 2021-07-14 1:12 ` [PATCH v2 0/4] Rename/copy limits -- docs, warnings, and new defaults Elijah Newren via GitGitGadget 3 siblings, 1 reply; 44+ messages in thread From: Elijah Newren via GitGitGadget @ 2021-07-11 0:46 UTC (permalink / raw) To: git; +Cc: Elijah Newren, Elijah Newren From: Elijah Newren <newren@gmail.com> As noted in commit 89973554b52c (diffcore-rename: make diff-tree -l0 mean -l<large>, 2017-11-29), -l0 has had a magical special "large" historical value associated with it. Document this value, particularly since it is not large enough for some uses -- see commit 9f7e4bfa3b6d (diff: remove silent clamp of renameLimit, 2017-11-13). Signed-off-by: Elijah Newren <newren@gmail.com> --- Documentation/diff-options.txt | 3 +++ 1 file changed, 3 insertions(+) diff --git a/Documentation/diff-options.txt b/Documentation/diff-options.txt index b5682b83956..17c8455b908 100644 --- a/Documentation/diff-options.txt +++ b/Documentation/diff-options.txt @@ -595,6 +595,9 @@ of a delete/create pair. specified number. Defaults to diff.renameLimit, or if that is also unspecified, to 400. + Note that for backward compatibility reasons, a value of 0 is treated + the same as if 32767 was passed. + ifndef::git-format-patch[] --diff-filter=[(A|C|D|M|R|T|U|X|B)...[*]]:: Select only files that are Added (`A`), Copied (`C`), -- gitgitgadget ^ permalink raw reply related [flat|nested] 44+ messages in thread
* Re: [PATCH 2/3] doc: document the special handling of -l0 2021-07-11 0:46 ` [PATCH 2/3] doc: document the special handling of -l0 Elijah Newren via GitGitGadget @ 2021-07-11 4:54 ` Eric Sunshine 2021-07-11 4:54 ` Elijah Newren 0 siblings, 1 reply; 44+ messages in thread From: Eric Sunshine @ 2021-07-11 4:54 UTC (permalink / raw) To: Elijah Newren via GitGitGadget; +Cc: Git List, Elijah Newren On Sat, Jul 10, 2021 at 8:46 PM Elijah Newren via GitGitGadget <gitgitgadget@gmail.com> wrote: > As noted in commit 89973554b52c (diffcore-rename: make diff-tree -l0 > mean -l<large>, 2017-11-29), -l0 has had a magical special "large" > historical value associated with it. Document this value, particularly > since it is not large enough for some uses -- see commit 9f7e4bfa3b6d > (diff: remove silent clamp of renameLimit, 2017-11-13). > > Signed-off-by: Elijah Newren <newren@gmail.com> > --- > diff --git a/Documentation/diff-options.txt b/Documentation/diff-options.txt > @@ -595,6 +595,9 @@ of a delete/create pair. > specified number. Defaults to diff.renameLimit, or if that is > also unspecified, to 400. > > + Note that for backward compatibility reasons, a value of 0 is treated > + the same as if 32767 was passed. To get this to format properly, you'll need to drop the indentation from the new paragraph and replace the blank line preceding it with a line containing just a plus (`+`). ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH 2/3] doc: document the special handling of -l0 2021-07-11 4:54 ` Eric Sunshine @ 2021-07-11 4:54 ` Elijah Newren 0 siblings, 0 replies; 44+ messages in thread From: Elijah Newren @ 2021-07-11 4:54 UTC (permalink / raw) To: Eric Sunshine; +Cc: Elijah Newren via GitGitGadget, Git List On Sat, Jul 10, 2021 at 9:54 PM Eric Sunshine <sunshine@sunshineco.com> wrote: > > On Sat, Jul 10, 2021 at 8:46 PM Elijah Newren via GitGitGadget > <gitgitgadget@gmail.com> wrote: > > As noted in commit 89973554b52c (diffcore-rename: make diff-tree -l0 > > mean -l<large>, 2017-11-29), -l0 has had a magical special "large" > > historical value associated with it. Document this value, particularly > > since it is not large enough for some uses -- see commit 9f7e4bfa3b6d > > (diff: remove silent clamp of renameLimit, 2017-11-13). > > > > Signed-off-by: Elijah Newren <newren@gmail.com> > > --- > > diff --git a/Documentation/diff-options.txt b/Documentation/diff-options.txt > > @@ -595,6 +595,9 @@ of a delete/create pair. > > specified number. Defaults to diff.renameLimit, or if that is > > also unspecified, to 400. > > > > + Note that for backward compatibility reasons, a value of 0 is treated > > + the same as if 32767 was passed. > > To get this to format properly, you'll need to drop the indentation > from the new paragraph and replace the blank line preceding it with a > line containing just a plus (`+`). Ok, will do. Thanks. ^ permalink raw reply [flat|nested] 44+ messages in thread
* [PATCH 3/3] diff: correct warning message when renameLimit exceeded 2021-07-11 0:46 [PATCH 0/3] Improve the documentation and warnings dealing with rename/copy limits Elijah Newren via GitGitGadget 2021-07-11 0:46 ` [PATCH 1/3] doc: clarify documentation for " Elijah Newren via GitGitGadget 2021-07-11 0:46 ` [PATCH 2/3] doc: document the special handling of -l0 Elijah Newren via GitGitGadget @ 2021-07-11 0:46 ` Elijah Newren via GitGitGadget 2021-07-12 15:09 ` Derrick Stolee 2021-07-14 1:12 ` [PATCH v2 0/4] Rename/copy limits -- docs, warnings, and new defaults Elijah Newren via GitGitGadget 3 siblings, 1 reply; 44+ messages in thread From: Elijah Newren via GitGitGadget @ 2021-07-11 0:46 UTC (permalink / raw) To: git; +Cc: Elijah Newren, Elijah Newren From: Elijah Newren <newren@gmail.com> The warning when quadratic rename detection was skipped referred to "inexact rename detection". For years, the only linear portion of rename detection was looking for exact renames, so "inexact rename detection" was an accurate way to refer to the quadratic portion of rename detection. However, that changed with commit bd24aa2f97a0 (diffcore-rename: guide inexact rename detection based on basenames, 2021-02-14), so now the correct way to refer to quadratic rename detection is "quadratic rename detection". Fix the warning accordingly. Signed-off-by: Elijah Newren <newren@gmail.com> --- diff.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/diff.c b/diff.c index 52c791574b7..343f391edf8 100644 --- a/diff.c +++ b/diff.c @@ -6284,7 +6284,7 @@ static int is_summary_empty(const struct diff_queue_struct *q) } static const char rename_limit_warning[] = -N_("inexact rename detection was skipped due to too many files."); +N_("quadratic rename detection was skipped due to too many files."); static const char degrade_cc_to_c_warning[] = N_("only found copies from modified paths due to too many files."); -- gitgitgadget ^ permalink raw reply related [flat|nested] 44+ messages in thread
* Re: [PATCH 3/3] diff: correct warning message when renameLimit exceeded 2021-07-11 0:46 ` [PATCH 3/3] diff: correct warning message when renameLimit exceeded Elijah Newren via GitGitGadget @ 2021-07-12 15:09 ` Derrick Stolee 2021-07-12 18:13 ` Elijah Newren 0 siblings, 1 reply; 44+ messages in thread From: Derrick Stolee @ 2021-07-12 15:09 UTC (permalink / raw) To: Elijah Newren via GitGitGadget, git; +Cc: Elijah Newren On 7/10/2021 8:46 PM, Elijah Newren via GitGitGadget wrote: > From: Elijah Newren <newren@gmail.com> > > The warning when quadratic rename detection was skipped referred to > "inexact rename detection". For years, the only linear portion of > rename detection was looking for exact renames, so "inexact rename > detection" was an accurate way to refer to the quadratic portion of > rename detection. However, that changed with commit bd24aa2f97a0 > (diffcore-rename: guide inexact rename detection based on basenames, > 2021-02-14), so now the correct way to refer to quadratic rename > detection is "quadratic rename detection". Fix the warning accordingly. Now that I read this more specific reason for using "quadratic", my earlier comments on patch 1 are slightly less helpful. Specifically, I was recommending to continue using "inexact renames" but that is not 100% true anymore. I still think this "quadratic rename detection" is perhaps hard to parse as a non-expert. This subtlety of some "easy" inexact renames definitely makes the definition harder. Since the steps that find inexact renames without the quadratic algorithm are heuristics, perhaps this portion could instead be called "exhaustive rename detection" or even "expensive rename detection"? It perhaps implies more directly that the limit exists as a way to prevent an expensive operation. Thanks, -Stolee ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH 3/3] diff: correct warning message when renameLimit exceeded 2021-07-12 15:09 ` Derrick Stolee @ 2021-07-12 18:13 ` Elijah Newren 2021-07-14 0:47 ` Junio C Hamano 0 siblings, 1 reply; 44+ messages in thread From: Elijah Newren @ 2021-07-12 18:13 UTC (permalink / raw) To: Derrick Stolee; +Cc: Elijah Newren via GitGitGadget, Git Mailing List On Mon, Jul 12, 2021 at 8:09 AM Derrick Stolee <stolee@gmail.com> wrote: > > On 7/10/2021 8:46 PM, Elijah Newren via GitGitGadget wrote: > > From: Elijah Newren <newren@gmail.com> > > > > The warning when quadratic rename detection was skipped referred to > > "inexact rename detection". For years, the only linear portion of > > rename detection was looking for exact renames, so "inexact rename > > detection" was an accurate way to refer to the quadratic portion of > > rename detection. However, that changed with commit bd24aa2f97a0 > > (diffcore-rename: guide inexact rename detection based on basenames, > > 2021-02-14), so now the correct way to refer to quadratic rename > > detection is "quadratic rename detection". Fix the warning accordingly. > > Now that I read this more specific reason for using "quadratic", my > earlier comments on patch 1 are slightly less helpful. Specifically, > I was recommending to continue using "inexact renames" but that is > not 100% true anymore. > > I still think this "quadratic rename detection" is perhaps hard to > parse as a non-expert. This subtlety of some "easy" inexact renames > definitely makes the definition harder. > > Since the steps that find inexact renames without the quadratic > algorithm are heuristics, perhaps this portion could instead be > called "exhaustive rename detection" or even "expensive rename > detection"? It perhaps implies more directly that the limit exists > as a way to prevent an expensive operation. The name "exhaustive rename detection" seems reasonable to me. I'll resubmit using that term and see what folks think. ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH 3/3] diff: correct warning message when renameLimit exceeded 2021-07-12 18:13 ` Elijah Newren @ 2021-07-14 0:47 ` Junio C Hamano 2021-07-14 1:06 ` Elijah Newren 0 siblings, 1 reply; 44+ messages in thread From: Junio C Hamano @ 2021-07-14 0:47 UTC (permalink / raw) To: Elijah Newren Cc: Derrick Stolee, Elijah Newren via GitGitGadget, Git Mailing List Elijah Newren <newren@gmail.com> writes: > On Mon, Jul 12, 2021 at 8:09 AM Derrick Stolee <stolee@gmail.com> wrote: >> >> Since the steps that find inexact renames without the quadratic >> algorithm are heuristics, perhaps this portion could instead be >> called "exhaustive rename detection" or even "expensive rename >> detection"? It perhaps implies more directly that the limit exists >> as a way to prevent an expensive operation. > > The name "exhaustive rename detection" seems reasonable to me. I'll > resubmit using that term and see what folks think. Funny. In a sense, since computing content similarity is _more_ precise way to find a pair that is a likely rename than the more recent heuristics, the more expensive "exhaustive" one can be called "more precise rename detection", even though it is tempting to use "inexact" to call it, too ;-) ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH 3/3] diff: correct warning message when renameLimit exceeded 2021-07-14 0:47 ` Junio C Hamano @ 2021-07-14 1:06 ` Elijah Newren 2021-07-14 1:10 ` Junio C Hamano 0 siblings, 1 reply; 44+ messages in thread From: Elijah Newren @ 2021-07-14 1:06 UTC (permalink / raw) To: Junio C Hamano Cc: Derrick Stolee, Elijah Newren via GitGitGadget, Git Mailing List On Tue, Jul 13, 2021 at 5:47 PM Junio C Hamano <gitster@pobox.com> wrote: > > Elijah Newren <newren@gmail.com> writes: > > > On Mon, Jul 12, 2021 at 8:09 AM Derrick Stolee <stolee@gmail.com> wrote: > >> > >> Since the steps that find inexact renames without the quadratic > >> algorithm are heuristics, perhaps this portion could instead be > >> called "exhaustive rename detection" or even "expensive rename > >> detection"? It perhaps implies more directly that the limit exists > >> as a way to prevent an expensive operation. > > > > The name "exhaustive rename detection" seems reasonable to me. I'll > > resubmit using that term and see what folks think. > > Funny. In a sense, since computing content similarity is _more_ > precise way to find a pair that is a likely rename than the more > recent heuristics, the more expensive "exhaustive" one can be called > "more precise rename detection", even though it is tempting to use > "inexact" to call it, too ;-) I'm afraid I'm not following. The more recent heuristics use content similarity as well; in fact, they use the exact same function with a higher threshold: estimate_similarity(). The exhaustiveness of the quadratic portion comes from comparing each file to more other files, not in using a different type of comparison. ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH 3/3] diff: correct warning message when renameLimit exceeded 2021-07-14 1:06 ` Elijah Newren @ 2021-07-14 1:10 ` Junio C Hamano 2021-07-14 1:22 ` Elijah Newren 0 siblings, 1 reply; 44+ messages in thread From: Junio C Hamano @ 2021-07-14 1:10 UTC (permalink / raw) To: Elijah Newren Cc: Derrick Stolee, Elijah Newren via GitGitGadget, Git Mailing List Elijah Newren <newren@gmail.com> writes: > The exhaustiveness of the quadratic portion comes from comparing each > file to more other files, not in using a different type of comparison. Exactly. By culling potential matches early with heuristics, we make a trade-off of risking false-negatives but save a lot of cycles while trying to find "renames with modifications (which is what we called 'inexact rename')", and my comment equated fewer false-negatives with more precision. ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH 3/3] diff: correct warning message when renameLimit exceeded 2021-07-14 1:10 ` Junio C Hamano @ 2021-07-14 1:22 ` Elijah Newren 2021-07-14 5:17 ` Junio C Hamano 0 siblings, 1 reply; 44+ messages in thread From: Elijah Newren @ 2021-07-14 1:22 UTC (permalink / raw) To: Junio C Hamano Cc: Derrick Stolee, Elijah Newren via GitGitGadget, Git Mailing List On Tue, Jul 13, 2021 at 6:10 PM Junio C Hamano <gitster@pobox.com> wrote: > > Elijah Newren <newren@gmail.com> writes: > > > The exhaustiveness of the quadratic portion comes from comparing each > > file to more other files, not in using a different type of comparison. > > Exactly. By culling potential matches early with heuristics, we > make a trade-off of risking false-negatives but save a lot of cycles > while trying to find "renames with modifications (which is what we > called 'inexact rename')", and my comment equated fewer false-negatives > with more precision. Okay, I think I'm following what you're saying now, mostly, but I'm curious about the false negative comment. Am I mixing up negatives/positives (as I'm prone to do), or would it be more correct to say the new algorithm risks suboptimal positives rather than that it risks false negatives? In particular, the new algorithm will compare files with the same basename and just accept that pairing if they are similar enough, even if there might be a better match elsewhere. However, a lack of a match in same-basenamed files will not cause those files to have no match; they will instead just be included in the exhaustive detection portion, so we can still detect renames for such paths. ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH 3/3] diff: correct warning message when renameLimit exceeded 2021-07-14 1:22 ` Elijah Newren @ 2021-07-14 5:17 ` Junio C Hamano 2021-07-14 15:09 ` Elijah Newren 0 siblings, 1 reply; 44+ messages in thread From: Junio C Hamano @ 2021-07-14 5:17 UTC (permalink / raw) To: Elijah Newren Cc: Derrick Stolee, Elijah Newren via GitGitGadget, Git Mailing List Elijah Newren <newren@gmail.com> writes: > Am I mixing up negatives/positives (as I'm prone to do), or would it > be more correct to say the new algorithm risks suboptimal positives > rather than that it risks false negatives? I'm prone to mixing them up, too, but I think they are the sides of the same coin. Imagine there is a path X on the source side, and two paths Y and Z on the destination side. With exhaustive match, Z might be a better match (content-wise) to X than Y is to X. For the path X on the source that is matched with a suboptimal counterpart Y on the destination side, we may call the situation a false-positive because with a more exhaustive search we might have been able to find Z that is a better match. For the path Z on the destination side that was culled too early with heuristics and failed to be matched with the source path X that got matched with a suboptimal destination path Y, it is a loss for Z---it wasn't chosen when it should have been (i.e. a false negative, as Z saw no counterparts). In any case, during the word search for "inexact", "more precise", "more expensive", I do not think negatives and positives will play a big role anyway, so... ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH 3/3] diff: correct warning message when renameLimit exceeded 2021-07-14 5:17 ` Junio C Hamano @ 2021-07-14 15:09 ` Elijah Newren 0 siblings, 0 replies; 44+ messages in thread From: Elijah Newren @ 2021-07-14 15:09 UTC (permalink / raw) To: Junio C Hamano Cc: Derrick Stolee, Elijah Newren via GitGitGadget, Git Mailing List On Tue, Jul 13, 2021 at 10:17 PM Junio C Hamano <gitster@pobox.com> wrote: > > Elijah Newren <newren@gmail.com> writes: > > > Am I mixing up negatives/positives (as I'm prone to do), or would it > > be more correct to say the new algorithm risks suboptimal positives > > rather than that it risks false negatives? > > I'm prone to mixing them up, too, but I think they are the sides of > the same coin. Imagine there is a path X on the source side, and > two paths Y and Z on the destination side. With exhaustive match, > Z might be a better match (content-wise) to X than Y is to X. > > For the path X on the source that is matched with a suboptimal > counterpart Y on the destination side, we may call the situation a > false-positive because with a more exhaustive search we might have > been able to find Z that is a better match. For the path Z on the > destination side that was culled too early with heuristics and > failed to be matched with the source path X that got matched with a > suboptimal destination path Y, it is a loss for Z---it wasn't chosen > when it should have been (i.e. a false negative, as Z saw no > counterparts). > > In any case, during the word search for "inexact", "more precise", > "more expensive", I do not think negatives and positives will play a > big role anyway, so... Indeed, I've gotten us off on a bit of a tangent, but thanks for taking the time to answer my questions. :-) ^ permalink raw reply [flat|nested] 44+ messages in thread
* [PATCH v2 0/4] Rename/copy limits -- docs, warnings, and new defaults 2021-07-11 0:46 [PATCH 0/3] Improve the documentation and warnings dealing with rename/copy limits Elijah Newren via GitGitGadget ` (2 preceding siblings ...) 2021-07-11 0:46 ` [PATCH 3/3] diff: correct warning message when renameLimit exceeded Elijah Newren via GitGitGadget @ 2021-07-14 1:12 ` Elijah Newren via GitGitGadget 2021-07-14 1:12 ` [PATCH v2 1/4] diff: correct warning message when renameLimit exceeded Elijah Newren via GitGitGadget ` (4 more replies) 3 siblings, 5 replies; 44+ messages in thread From: Elijah Newren via GitGitGadget @ 2021-07-14 1:12 UTC (permalink / raw) To: git Cc: Bagas Sanjaya, Elijah Newren, Eric Sunshine, Derrick Stolee, Jeff King, Ævar Arnfjörð Bjarmason, Elijah Newren Fix a few small issues with documentation and warnings around the limits for the quadratic portion of rename (©) detection, and bump the default limits. Discussion on bumping the limits can be found at [1]. Although it appears we generally agree we could switch to an unlimited setting for merge.renameLimit, that would require some changes to progress bars to notify users how to take action once things start taking a while. So, for now, just bump the limits. [1] https://lore.kernel.org/git/CABPp-BFzp3TCWiF1QAVSfywDLYrz=GOQszVM-sw5p0rSB8RWvw@mail.gmail.com/T/#u Changes since v1: * Shuffled patch order since the explanation of why "inexact rename detection" is incorrect was in the third patch * Use the term "exhaustive rename detection" for the quadratic portion * Simplify -l description by just stating that it defaults to diff.renameLimit (since it in turn has the right default value) * Fix asciidoc formating * Include bump of the limits in a new patch Elijah Newren (4): diff: correct warning message when renameLimit exceeded doc: clarify documentation for rename/copy limits doc: document the special handling of -l0 Bump rename limit defaults (yet again) Documentation/config/diff.txt | 7 ++++--- Documentation/config/merge.txt | 10 ++++++---- Documentation/diff-options.txt | 12 ++++++++---- diff.c | 4 ++-- merge-ort.c | 2 +- merge-recursive.c | 2 +- 6 files changed, 22 insertions(+), 15 deletions(-) base-commit: d486ca60a51c9cb1fe068803c3f540724e95e83a Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-git-1044%2Fnewren%2Frename-limit-documentation-v2 Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-git-1044/newren/rename-limit-documentation-v2 Pull-Request: https://github.com/git/git/pull/1044 Range-diff vs v1: 3: 44a5d5efaa6 ! 1: 0d1d0f180a3 diff: correct warning message when renameLimit exceeded @@ Commit message detection" was an accurate way to refer to the quadratic portion of rename detection. However, that changed with commit bd24aa2f97a0 (diffcore-rename: guide inexact rename detection based on basenames, - 2021-02-14), so now the correct way to refer to quadratic rename - detection is "quadratic rename detection". Fix the warning accordingly. + 2021-02-14). Let's instead use the term "exhaustive rename detection" + to refer to the quadratic portion. Signed-off-by: Elijah Newren <newren@gmail.com> @@ diff.c: static int is_summary_empty(const struct diff_queue_struct *q) static const char rename_limit_warning[] = -N_("inexact rename detection was skipped due to too many files."); -+N_("quadratic rename detection was skipped due to too many files."); ++N_("exhaustive rename detection was skipped due to too many files."); static const char degrade_cc_to_c_warning[] = N_("only found copies from modified paths due to too many files."); 1: 7dc448df6ec ! 2: 4046993a9a2 doc: clarify documentation for rename/copy limits @@ Documentation/config/diff.txt: diff.orderFile:: - The number of files to consider when performing the copy/rename - detection; equivalent to the 'git diff' option `-l`. This setting - has no effect if rename detection is turned off. -+ The number of files to consider in the quadratic portion of ++ The number of files to consider in the exhaustive portion of + copy/rename detection; equivalent to the 'git diff' option + `-l`. If not set, the default value is 400. This setting has + no effect if rename detection is turned off. @@ Documentation/config/merge.txt: merge.verifySignatures:: - during a merge; if not specified, defaults to the value of - diff.renameLimit. This setting has no effect if rename detection - is turned off. -+ The number of files to consider in the quadratic portion of ++ The number of files to consider in the exhaustive portion of + rename detection during a merge. If not specified, defaults + to the value of diff.renameLimit. If neither + merge.renameLimit nor diff.renameLimit are specified, defaults @@ Documentation/diff-options.txt: When used together with `-B`, omit also the prei - The `-M` and `-C` options require O(n^2) processing time where n - is the number of potential rename/copy targets. This - option prevents rename/copy detection from running if -- the number of rename/copy targets exceeds the specified ++ The `-M` and `-C` options have an exhaustive portion that ++ requires O(n^2) processing time where n is the number of ++ potential rename/copy targets. This option prevents the ++ exhaustive portion of rename/copy detection from running if + the number of rename/copy targets exceeds the specified - number. -+ The `-M` and `-C` options can require O(n^2) processing time -+ where n is the number of potential rename/copy targets. This -+ option prevents the quadratic portion of rename/copy detection -+ from running if the number of rename/copy targets exceeds the -+ specified number. Defaults to diff.renameLimit, or if that is -+ also unspecified, to 400. ++ number. Defaults to diff.renameLimit. ifndef::git-format-patch[] --diff-filter=[(A|C|D|M|R|T|U|X|B)...[*]]:: 2: ee0969429cb ! 3: 6f5767607cd doc: document the special handling of -l0 @@ Commit message ## Documentation/diff-options.txt ## @@ Documentation/diff-options.txt: of a delete/create pair. - specified number. Defaults to diff.renameLimit, or if that is - also unspecified, to 400. + exhaustive portion of rename/copy detection from running if + the number of rename/copy targets exceeds the specified + number. Defaults to diff.renameLimit. +++ ++Note that for backward compatibility reasons, a value of 0 is treated ++the same as if a large value was passed (currently, 32767). -+ Note that for backward compatibility reasons, a value of 0 is treated -+ the same as if 32767 was passed. -+ ifndef::git-format-patch[] --diff-filter=[(A|C|D|M|R|T|U|X|B)...[*]]:: - Select only files that are Added (`A`), Copied (`C`), -: ----------- > 4: 8f1deb6dd16 Bump rename limit defaults (yet again) -- gitgitgadget ^ permalink raw reply [flat|nested] 44+ messages in thread
* [PATCH v2 1/4] diff: correct warning message when renameLimit exceeded 2021-07-14 1:12 ` [PATCH v2 0/4] Rename/copy limits -- docs, warnings, and new defaults Elijah Newren via GitGitGadget @ 2021-07-14 1:12 ` Elijah Newren via GitGitGadget 2021-07-14 1:12 ` [PATCH v2 2/4] doc: clarify documentation for rename/copy limits Elijah Newren via GitGitGadget ` (3 subsequent siblings) 4 siblings, 0 replies; 44+ messages in thread From: Elijah Newren via GitGitGadget @ 2021-07-14 1:12 UTC (permalink / raw) To: git Cc: Bagas Sanjaya, Elijah Newren, Eric Sunshine, Derrick Stolee, Jeff King, Ævar Arnfjörð Bjarmason, Elijah Newren, Elijah Newren From: Elijah Newren <newren@gmail.com> The warning when quadratic rename detection was skipped referred to "inexact rename detection". For years, the only linear portion of rename detection was looking for exact renames, so "inexact rename detection" was an accurate way to refer to the quadratic portion of rename detection. However, that changed with commit bd24aa2f97a0 (diffcore-rename: guide inexact rename detection based on basenames, 2021-02-14). Let's instead use the term "exhaustive rename detection" to refer to the quadratic portion. Signed-off-by: Elijah Newren <newren@gmail.com> --- diff.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/diff.c b/diff.c index 52c791574b7..2454e34cf6d 100644 --- a/diff.c +++ b/diff.c @@ -6284,7 +6284,7 @@ static int is_summary_empty(const struct diff_queue_struct *q) } static const char rename_limit_warning[] = -N_("inexact rename detection was skipped due to too many files."); +N_("exhaustive rename detection was skipped due to too many files."); static const char degrade_cc_to_c_warning[] = N_("only found copies from modified paths due to too many files."); -- gitgitgadget ^ permalink raw reply related [flat|nested] 44+ messages in thread
* [PATCH v2 2/4] doc: clarify documentation for rename/copy limits 2021-07-14 1:12 ` [PATCH v2 0/4] Rename/copy limits -- docs, warnings, and new defaults Elijah Newren via GitGitGadget 2021-07-14 1:12 ` [PATCH v2 1/4] diff: correct warning message when renameLimit exceeded Elijah Newren via GitGitGadget @ 2021-07-14 1:12 ` Elijah Newren via GitGitGadget 2021-07-14 7:37 ` Ævar Arnfjörð Bjarmason 2021-07-14 1:12 ` [PATCH v2 3/4] doc: document the special handling of -l0 Elijah Newren via GitGitGadget ` (2 subsequent siblings) 4 siblings, 1 reply; 44+ messages in thread From: Elijah Newren via GitGitGadget @ 2021-07-14 1:12 UTC (permalink / raw) To: git Cc: Bagas Sanjaya, Elijah Newren, Eric Sunshine, Derrick Stolee, Jeff King, Ævar Arnfjörð Bjarmason, Elijah Newren, Elijah Newren From: Elijah Newren <newren@gmail.com> A few places in the docs implied that rename/copy detection is always quadratic or that all (unpaired) files were involved in the quadratic portion of rename/copy detection. The following two commits each introduced an exception to this: 9027f53cb505 (Do linear-time/space rename logic for exact renames, 2007-10-25) bd24aa2f97a0 (diffcore-rename: guide inexact rename detection based on basenames, 2021-02-14) (As a side note, for copy detection, the basename guided inexact rename detection is turned off and the exact renames will only result in sources (without the dests) being removed from the set of files used in quadratic detection. So, for copy detection, the documentation was closer to correct.) Avoid implying that all files involved in rename/copy detection are subject to the full quadratic algorithm. While at it, also note the default values for all these settings. Signed-off-by: Elijah Newren <newren@gmail.com> --- Documentation/config/diff.txt | 7 ++++--- Documentation/config/merge.txt | 10 ++++++---- Documentation/diff-options.txt | 9 +++++---- 3 files changed, 15 insertions(+), 11 deletions(-) diff --git a/Documentation/config/diff.txt b/Documentation/config/diff.txt index 2d3331f55c2..e26a63d0d42 100644 --- a/Documentation/config/diff.txt +++ b/Documentation/config/diff.txt @@ -118,9 +118,10 @@ diff.orderFile:: relative to the top of the working tree. diff.renameLimit:: - The number of files to consider when performing the copy/rename - detection; equivalent to the 'git diff' option `-l`. This setting - has no effect if rename detection is turned off. + The number of files to consider in the exhaustive portion of + copy/rename detection; equivalent to the 'git diff' option + `-l`. If not set, the default value is 400. This setting has + no effect if rename detection is turned off. diff.renames:: Whether and how Git detects renames. If set to "false", diff --git a/Documentation/config/merge.txt b/Documentation/config/merge.txt index 6b66c83eabe..aca0c92dbe6 100644 --- a/Documentation/config/merge.txt +++ b/Documentation/config/merge.txt @@ -33,10 +33,12 @@ merge.verifySignatures:: include::fmt-merge-msg.txt[] merge.renameLimit:: - The number of files to consider when performing rename detection - during a merge; if not specified, defaults to the value of - diff.renameLimit. This setting has no effect if rename detection - is turned off. + The number of files to consider in the exhaustive portion of + rename detection during a merge. If not specified, defaults + to the value of diff.renameLimit. If neither + merge.renameLimit nor diff.renameLimit are specified, defaults + to 1000. This setting has no effect if rename detection is + turned off. merge.renames:: Whether Git detects renames. If set to "false", rename detection diff --git a/Documentation/diff-options.txt b/Documentation/diff-options.txt index 32e6dee5ac3..11e08c3fd36 100644 --- a/Documentation/diff-options.txt +++ b/Documentation/diff-options.txt @@ -588,11 +588,12 @@ When used together with `-B`, omit also the preimage in the deletion part of a delete/create pair. -l<num>:: - The `-M` and `-C` options require O(n^2) processing time where n - is the number of potential rename/copy targets. This - option prevents rename/copy detection from running if + The `-M` and `-C` options have an exhaustive portion that + requires O(n^2) processing time where n is the number of + potential rename/copy targets. This option prevents the + exhaustive portion of rename/copy detection from running if the number of rename/copy targets exceeds the specified - number. + number. Defaults to diff.renameLimit. ifndef::git-format-patch[] --diff-filter=[(A|C|D|M|R|T|U|X|B)...[*]]:: -- gitgitgadget ^ permalink raw reply related [flat|nested] 44+ messages in thread
* Re: [PATCH v2 2/4] doc: clarify documentation for rename/copy limits 2021-07-14 1:12 ` [PATCH v2 2/4] doc: clarify documentation for rename/copy limits Elijah Newren via GitGitGadget @ 2021-07-14 7:37 ` Ævar Arnfjörð Bjarmason 2021-07-14 16:30 ` Elijah Newren 0 siblings, 1 reply; 44+ messages in thread From: Ævar Arnfjörð Bjarmason @ 2021-07-14 7:37 UTC (permalink / raw) To: Elijah Newren via GitGitGadget Cc: git, Bagas Sanjaya, Eric Sunshine, Derrick Stolee, Jeff King, Elijah Newren On Wed, Jul 14 2021, Elijah Newren via GitGitGadget wrote: > From: Elijah Newren <newren@gmail.com> > > A few places in the docs implied that rename/copy detection is always > quadratic or that all (unpaired) files were involved in the quadratic > portion of rename/copy detection. The following two commits each > introduced an exception to this: > > 9027f53cb505 (Do linear-time/space rename logic for exact renames, > 2007-10-25) > bd24aa2f97a0 (diffcore-rename: guide inexact rename detection based > on basenames, 2021-02-14) > > (As a side note, for copy detection, the basename guided inexact rename > detection is turned off and the exact renames will only result in > sources (without the dests) being removed from the set of files used in > quadratic detection. So, for copy detection, the documentation was > closer to correct.) > > Avoid implying that all files involved in rename/copy detection are > subject to the full quadratic algorithm. While at it, also note the > default values for all these settings. > > Signed-off-by: Elijah Newren <newren@gmail.com> > --- > Documentation/config/diff.txt | 7 ++++--- > Documentation/config/merge.txt | 10 ++++++---- > Documentation/diff-options.txt | 9 +++++---- > 3 files changed, 15 insertions(+), 11 deletions(-) > > diff --git a/Documentation/config/diff.txt b/Documentation/config/diff.txt > index 2d3331f55c2..e26a63d0d42 100644 > --- a/Documentation/config/diff.txt > +++ b/Documentation/config/diff.txt > @@ -118,9 +118,10 @@ diff.orderFile:: > relative to the top of the working tree. > > diff.renameLimit:: > - The number of files to consider when performing the copy/rename > - detection; equivalent to the 'git diff' option `-l`. This setting > - has no effect if rename detection is turned off. > + The number of files to consider in the exhaustive portion of > + copy/rename detection; equivalent to the 'git diff' option > + `-l`. If not set, the default value is 400. This setting has > + no effect if rename detection is turned off. For both this... > diff.renames:: > Whether and how Git detects renames. If set to "false", > diff --git a/Documentation/config/merge.txt b/Documentation/config/merge.txt > index 6b66c83eabe..aca0c92dbe6 100644 > --- a/Documentation/config/merge.txt > +++ b/Documentation/config/merge.txt > @@ -33,10 +33,12 @@ merge.verifySignatures:: > include::fmt-merge-msg.txt[] > > merge.renameLimit:: > - The number of files to consider when performing rename detection > - during a merge; if not specified, defaults to the value of > - diff.renameLimit. This setting has no effect if rename detection > - is turned off. > + The number of files to consider in the exhaustive portion of > + rename detection during a merge. If not specified, defaults > + to the value of diff.renameLimit. If neither > + merge.renameLimit nor diff.renameLimit are specified, defaults > + to 1000. This setting has no effect if rename detection is > + turned off. ...and this we should really have some wording to the effect of: ..., defaults to XYZ. The exact (or even approximate) default of XYZ should not be relied upon, and may be changed (or these limits even removed) in future versions of git. I.e. let's distinguish a "this is how it works now, FYI" from a forward promise that it's going to work like that forever. Which also leads me to: > merge.renames:: > Whether Git detects renames. If set to "false", rename detection > diff --git a/Documentation/diff-options.txt b/Documentation/diff-options.txt > index 32e6dee5ac3..11e08c3fd36 100644 > --- a/Documentation/diff-options.txt > +++ b/Documentation/diff-options.txt > @@ -588,11 +588,12 @@ When used together with `-B`, omit also the preimage in the deletion part > of a delete/create pair. > > -l<num>:: > - The `-M` and `-C` options require O(n^2) processing time where n > - is the number of potential rename/copy targets. This > - option prevents rename/copy detection from running if > + The `-M` and `-C` options have an exhaustive portion that > + requires O(n^2) processing time where n is the number of > + potential rename/copy targets. This option prevents the > + exhaustive portion of rename/copy detection from running if > the number of rename/copy targets exceeds the specified > - number. > + number. Defaults to diff.renameLimit. This mention of O(n^2). This is not an issue with your patch/series, nd this is an improvement. But as a side-comment: Do we explain somewhere how exactly this {diff,merge}.renmeLimit=N works for values of N? It's probably fine to continue to handwave it away, but is there anything that say tells a user what happens if they have 400 files in their repository in a "x" directory and "git mv x y"? Will it work, but if they add a 401th file it won't for diffs? I *think* given a skimming of previous discussions that it's "no", i.e. that this just kicks in for these expensive cases, and e.g. for such a case of moving the same tree OID from "x" to "y" we'd short-circuit things, but maybe I'm just making things up at this point. Feel free to ignore me here, I guess this amounts to a request that it would be great if we could point to some docs about how this works. It's probably too much detail for Documentation/config/{merge,diff}.txt, so maybe a section in either of git-{diff,merge}.txt ? ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH v2 2/4] doc: clarify documentation for rename/copy limits 2021-07-14 7:37 ` Ævar Arnfjörð Bjarmason @ 2021-07-14 16:30 ` Elijah Newren 2021-07-14 22:08 ` Ævar Arnfjörð Bjarmason 0 siblings, 1 reply; 44+ messages in thread From: Elijah Newren @ 2021-07-14 16:30 UTC (permalink / raw) To: Ævar Arnfjörð Bjarmason Cc: Elijah Newren via GitGitGadget, Git Mailing List, Bagas Sanjaya, Eric Sunshine, Derrick Stolee, Jeff King On Wed, Jul 14, 2021 at 12:47 AM Ævar Arnfjörð Bjarmason <avarab@gmail.com> wrote: > > On Wed, Jul 14 2021, Elijah Newren via GitGitGadget wrote: > > > From: Elijah Newren <newren@gmail.com> > > ... > > diff --git a/Documentation/config/diff.txt b/Documentation/config/diff.txt > > index 2d3331f55c2..e26a63d0d42 100644 > > --- a/Documentation/config/diff.txt > > +++ b/Documentation/config/diff.txt > > @@ -118,9 +118,10 @@ diff.orderFile:: > > relative to the top of the working tree. > > > > diff.renameLimit:: > > - The number of files to consider when performing the copy/rename > > - detection; equivalent to the 'git diff' option `-l`. This setting > > - has no effect if rename detection is turned off. > > + The number of files to consider in the exhaustive portion of > > + copy/rename detection; equivalent to the 'git diff' option > > + `-l`. If not set, the default value is 400. This setting has > > + no effect if rename detection is turned off. > > For both this... > > > diff.renames:: > > Whether and how Git detects renames. If set to "false", > > diff --git a/Documentation/config/merge.txt b/Documentation/config/merge.txt > > index 6b66c83eabe..aca0c92dbe6 100644 > > --- a/Documentation/config/merge.txt > > +++ b/Documentation/config/merge.txt > > @@ -33,10 +33,12 @@ merge.verifySignatures:: > > include::fmt-merge-msg.txt[] > > > > merge.renameLimit:: > > - The number of files to consider when performing rename detection > > - during a merge; if not specified, defaults to the value of > > - diff.renameLimit. This setting has no effect if rename detection > > - is turned off. > > + The number of files to consider in the exhaustive portion of > > + rename detection during a merge. If not specified, defaults > > + to the value of diff.renameLimit. If neither > > + merge.renameLimit nor diff.renameLimit are specified, defaults > > + to 1000. This setting has no effect if rename detection is > > + turned off. > > ...and this we should really have some wording to the effect of: > > ..., defaults to XYZ. The exact (or even approximate) default of XYZ > should not be relied upon, and may be changed (or these limits even > removed) in future versions of git. > > I.e. let's distinguish a "this is how it works now, FYI" from a forward > promise that it's going to work like that forever. Perhaps I could simplify that to "...currently defaults to XYZ"? Does that capture your intent well enough? I agree very much that this shouldn't be a hard promise. If it was, we've broken it repeatedly over the years. Not only have we modified the default numbers multiple times, there have also been multiple times when we've been able to change how many renames could be handled without changing the default numbers for the rename limits. (This was done by adding or improving special cases that allow us to exclude paths from the exhaustive comparison; exact rename detection that someone else added and my more recent improvements to exact rename detection are two simple examples). > Which also leads me to: > > > merge.renames:: > > Whether Git detects renames. If set to "false", rename detection > > diff --git a/Documentation/diff-options.txt b/Documentation/diff-options.txt > > index 32e6dee5ac3..11e08c3fd36 100644 > > --- a/Documentation/diff-options.txt > > +++ b/Documentation/diff-options.txt > > @@ -588,11 +588,12 @@ When used together with `-B`, omit also the preimage in the deletion part > > of a delete/create pair. > > > > -l<num>:: > > - The `-M` and `-C` options require O(n^2) processing time where n > > - is the number of potential rename/copy targets. This > > - option prevents rename/copy detection from running if > > + The `-M` and `-C` options have an exhaustive portion that > > + requires O(n^2) processing time where n is the number of > > + potential rename/copy targets. This option prevents the > > + exhaustive portion of rename/copy detection from running if > > the number of rename/copy targets exceeds the specified > > - number. > > + number. Defaults to diff.renameLimit. > > This mention of O(n^2). This is not an issue with your patch/series, nd > this is an improvement. > > But as a side-comment: Do we explain somewhere how exactly this > {diff,merge}.renmeLimit=N works for values of N? It's probably fine to > continue to handwave it away, but is there anything that say tells a > user what happens if they have 400 files in their repository in a "x" > directory and "git mv x y"? Will it work, but if they add a 401th file > it won't for diffs? > > I *think* given a skimming of previous discussions that it's "no", > i.e. that this just kicks in for these expensive cases, and e.g. for > such a case of moving the same tree OID from "x" to "y" we'd > short-circuit things, but maybe I'm just making things up at this point. Tree OIDs aren't used by the rename detection logic, but you're close. > Feel free to ignore me here, I guess this amounts to a request that it > would be great if we could point to some docs about how this works. It's > probably too much detail for Documentation/config/{merge,diff}.txt, so > maybe a section in either of git-{diff,merge}.txt ? If git-merge.txt includes it, then rebase, cherry-pick, revert probably should too. (and maybe the -3 option of git-am) Anyway... In the "git mv dir-x dir-y" case, if no other changes are made to those files, then the renames would be detectable via blobs having the same hashes (i.e. exact renames). So all these renames would be found regardless of the number of files and without exhaustive detection even kicking in. If there were thousands of such files, and every one of them were also modified, then the basename-guided rename detection which operates in linear time would still find all the renames, without the exhaustive detection even kicking in (see commits bd24aa2f97a0 (diffcore-rename: guide inexact rename detection based on basenames, 2021-02-14) and 37a251436447 (diffcore-rename: use directory rename guided basename comparisons, 2021-02-27)). If this were a merge, and there were thousands of such files renamed and modified, AND they were all further renamed to change their basename, BUT the original directory was unmodified on the other side of history, then we wouldn't need any of the renames for the purposes of the merge and they'd all be excluded from the exhaustive rename detection. Sure, we wouldn't find renames for those files, but that wouldn't change the result of the merge. So, again, no exhaustive rename detection. (See commit 174791f0fb23 (merge-ort: use relevant_sources to filter possible rename sources, 2021-03-11)) (Rebases & cherry-picks also have an advantage of caching renames from previous picks on the upstream side of history to avoid needing to re-detect renames on that side...though it only caches renames that are either relevant or exact.) Basically, we bail on exhaustive rename detection if, after the linear or faster steps have run (excluding previously cached renames, exact rename detection, skipping irrelevant renames, basename guided rename detection), the number of remaining unpaired source files times the number of remaining unpaired destination files is greater than rename_limit * rename_limit. (If we can assume the numbers match, i.e. N = number of remaining unpaired sources = number of remaining unpaired destinations, then that check simplifies to bailing once N > rename_limit) I'm not sure if these details are really going to help the users, especially if more special cases are added (and more have been proposed by Johannes and became an Outreachy project, though that one didn't get off the ground). And this explanation is not even fully detailed; I'm still hand waving here in at least four different ways (mostly in ignoring special cases for different fast steps, but also by ignoring copy detection that itself messes with the explanation in multiple ways). But maybe someone else can figure out a way to hand wave my hand waving down to a simple enough explanation, while also covering copy detection, and managing to do so in a way that doesn't mislead. If so, we can include that in the docs somewhere. ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH v2 2/4] doc: clarify documentation for rename/copy limits 2021-07-14 16:30 ` Elijah Newren @ 2021-07-14 22:08 ` Ævar Arnfjörð Bjarmason 2021-07-14 22:56 ` Elijah Newren 0 siblings, 1 reply; 44+ messages in thread From: Ævar Arnfjörð Bjarmason @ 2021-07-14 22:08 UTC (permalink / raw) To: Elijah Newren Cc: Elijah Newren via GitGitGadget, Git Mailing List, Bagas Sanjaya, Eric Sunshine, Derrick Stolee, Jeff King On Wed, Jul 14 2021, Elijah Newren wrote: > On Wed, Jul 14, 2021 at 12:47 AM Ævar Arnfjörð Bjarmason > <avarab@gmail.com> wrote: >> >> On Wed, Jul 14 2021, Elijah Newren via GitGitGadget wrote: >> >> > From: Elijah Newren <newren@gmail.com> >> > > ... >> > diff --git a/Documentation/config/diff.txt b/Documentation/config/diff.txt >> > index 2d3331f55c2..e26a63d0d42 100644 >> > --- a/Documentation/config/diff.txt >> > +++ b/Documentation/config/diff.txt >> > @@ -118,9 +118,10 @@ diff.orderFile:: >> > relative to the top of the working tree. >> > >> > diff.renameLimit:: >> > - The number of files to consider when performing the copy/rename >> > - detection; equivalent to the 'git diff' option `-l`. This setting >> > - has no effect if rename detection is turned off. >> > + The number of files to consider in the exhaustive portion of >> > + copy/rename detection; equivalent to the 'git diff' option >> > + `-l`. If not set, the default value is 400. This setting has >> > + no effect if rename detection is turned off. >> >> For both this... >> >> > diff.renames:: >> > Whether and how Git detects renames. If set to "false", >> > diff --git a/Documentation/config/merge.txt b/Documentation/config/merge.txt >> > index 6b66c83eabe..aca0c92dbe6 100644 >> > --- a/Documentation/config/merge.txt >> > +++ b/Documentation/config/merge.txt >> > @@ -33,10 +33,12 @@ merge.verifySignatures:: >> > include::fmt-merge-msg.txt[] >> > >> > merge.renameLimit:: >> > - The number of files to consider when performing rename detection >> > - during a merge; if not specified, defaults to the value of >> > - diff.renameLimit. This setting has no effect if rename detection >> > - is turned off. >> > + The number of files to consider in the exhaustive portion of >> > + rename detection during a merge. If not specified, defaults >> > + to the value of diff.renameLimit. If neither >> > + merge.renameLimit nor diff.renameLimit are specified, defaults >> > + to 1000. This setting has no effect if rename detection is >> > + turned off. >> >> ...and this we should really have some wording to the effect of: >> >> ..., defaults to XYZ. The exact (or even approximate) default of XYZ >> should not be relied upon, and may be changed (or these limits even >> removed) in future versions of git. >> >> I.e. let's distinguish a "this is how it works now, FYI" from a forward >> promise that it's going to work like that forever. > > Perhaps I could simplify that to "...currently defaults to XYZ"? Does > that capture your intent well enough? Yes, and it's not as needlessly verbose :) > I agree very much that this shouldn't be a hard promise. If it was, > we've broken it repeatedly over the years. Not only have we modified > the default numbers multiple times, there have also been multiple > times when we've been able to change how many renames could be handled > without changing the default numbers for the rename limits. (This was > done by adding or improving special cases that allow us to exclude > paths from the exhaustive comparison; exact rename detection that > someone else added and my more recent improvements to exact rename > detection are two simple examples). > >> Which also leads me to: >> >> > merge.renames:: >> > Whether Git detects renames. If set to "false", rename detection >> > diff --git a/Documentation/diff-options.txt b/Documentation/diff-options.txt >> > index 32e6dee5ac3..11e08c3fd36 100644 >> > --- a/Documentation/diff-options.txt >> > +++ b/Documentation/diff-options.txt >> > @@ -588,11 +588,12 @@ When used together with `-B`, omit also the preimage in the deletion part >> > of a delete/create pair. >> > >> > -l<num>:: >> > - The `-M` and `-C` options require O(n^2) processing time where n >> > - is the number of potential rename/copy targets. This >> > - option prevents rename/copy detection from running if >> > + The `-M` and `-C` options have an exhaustive portion that >> > + requires O(n^2) processing time where n is the number of >> > + potential rename/copy targets. This option prevents the >> > + exhaustive portion of rename/copy detection from running if >> > the number of rename/copy targets exceeds the specified >> > - number. >> > + number. Defaults to diff.renameLimit. >> >> This mention of O(n^2). This is not an issue with your patch/series, nd >> this is an improvement. >> >> But as a side-comment: Do we explain somewhere how exactly this >> {diff,merge}.renmeLimit=N works for values of N? It's probably fine to >> continue to handwave it away, but is there anything that say tells a >> user what happens if they have 400 files in their repository in a "x" >> directory and "git mv x y"? Will it work, but if they add a 401th file >> it won't for diffs? >> >> I *think* given a skimming of previous discussions that it's "no", >> i.e. that this just kicks in for these expensive cases, and e.g. for >> such a case of moving the same tree OID from "x" to "y" we'd >> short-circuit things, but maybe I'm just making things up at this point. > > Tree OIDs aren't used by the rename detection logic, but you're close. > >> Feel free to ignore me here, I guess this amounts to a request that it >> would be great if we could point to some docs about how this works. It's >> probably too much detail for Documentation/config/{merge,diff}.txt, so >> maybe a section in either of git-{diff,merge}.txt ? > > If git-merge.txt includes it, then rebase, cherry-pick, revert > probably should too. (and maybe the -3 option of git-am) > > Anyway... Some of this we do via includes, but I also think we should have more "concept docs", like gitglossary, gitrevisions, so perhaps gitrenames(5)? Anyway, not for now... > In the "git mv dir-x dir-y" case, if no other changes are made to > those files, then the renames would be detectable via blobs having the > same hashes (i.e. exact renames). So all these renames would be found > regardless of the number of files and without exhaustive detection > even kicking in. > > If there were thousands of such files, and every one of them were also > modified, then the basename-guided rename detection which operates in > linear time would still find all the renames, without the exhaustive > detection even kicking in (see commits bd24aa2f97a0 (diffcore-rename: > guide inexact rename detection based on basenames, 2021-02-14) and > 37a251436447 (diffcore-rename: use directory rename guided basename > comparisons, 2021-02-27)). > > If this were a merge, and there were thousands of such files renamed > and modified, AND they were all further renamed to change their > basename, BUT the original directory was unmodified on the other side > of history, then we wouldn't need any of the renames for the purposes > of the merge and they'd all be excluded from the exhaustive rename > detection. Sure, we wouldn't find renames for those files, but that > wouldn't change the result of the merge. So, again, no exhaustive > rename detection. (See commit 174791f0fb23 (merge-ort: use > relevant_sources to filter possible rename sources, 2021-03-11)) > > (Rebases & cherry-picks also have an advantage of caching renames from > previous picks on the upstream side of history to avoid needing to > re-detect renames on that side...though it only caches renames that > are either relevant or exact.) > > Basically, we bail on exhaustive rename detection if, after the linear > or faster steps have run (excluding previously cached renames, exact > rename detection, skipping irrelevant renames, basename guided rename > detection), the number of remaining unpaired source files times the > number of remaining unpaired destination files is greater than > rename_limit * rename_limit. (If we can assume the numbers match, > i.e. N = number of remaining unpaired sources = number of remaining > unpaired destinations, then that check simplifies to bailing once N > > rename_limit) > > > I'm not sure if these details are really going to help the users, > especially if more special cases are added (and more have been > proposed by Johannes and became an Outreachy project, though that one > didn't get off the ground). And this explanation is not even fully > detailed; I'm still hand waving here in at least four different ways > (mostly in ignoring special cases for different fast steps, but also > by ignoring copy detection that itself messes with the explanation in > multiple ways). > > But maybe someone else can figure out a way to hand wave my hand > waving down to a simple enough explanation, while also covering copy > detection, and managing to do so in a way that doesn't mislead. If > so, we can include that in the docs somewhere. I think it would be good to briefly explain it in the sense of not explaining it, i.e. give the user who ran into this some idea of what they're looking at. Perhaps something like (and maybe this is too long...); [...] these limits are limits to the number of "steps" in an an algorithm that's subject to change, and whose number of steps will depend on your input data. They don't correspond to any easily tangible thing, e.g. a limit of 100 has no correspondence with detecting all renames in a commit that changed 10 files, but not a commit that changed 11 files. The limits were picked to limit runtime to an approximate desired value, given what was thought to be Representative data from a repository similar to linux.git. If you find your diff/merge take X amount of time doing rename detection, then generally speaking doubling the limit will about double the time we spend on it. So they're an indirect a way to get to a runtime limit of (what was it, 2 and 10 seconds?). The reason for this configuration not being a time limit is so that git can guarantee identical results for the same input data across different systems, given the same version and configuration. I.e. the common case of running your attempt at a diff/merge twice in a row, a time limit would produce unpredictable results. ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH v2 2/4] doc: clarify documentation for rename/copy limits 2021-07-14 22:08 ` Ævar Arnfjörð Bjarmason @ 2021-07-14 22:56 ` Elijah Newren 0 siblings, 0 replies; 44+ messages in thread From: Elijah Newren @ 2021-07-14 22:56 UTC (permalink / raw) To: Ævar Arnfjörð Bjarmason Cc: Elijah Newren via GitGitGadget, Git Mailing List, Bagas Sanjaya, Eric Sunshine, Derrick Stolee, Jeff King On Wed, Jul 14, 2021 at 3:27 PM Ævar Arnfjörð Bjarmason <avarab@gmail.com> wrote: > > On Wed, Jul 14 2021, Elijah Newren wrote: > > > On Wed, Jul 14, 2021 at 12:47 AM Ævar Arnfjörð Bjarmason > > <avarab@gmail.com> wrote: > >> > >> On Wed, Jul 14 2021, Elijah Newren via GitGitGadget wrote: > >> ... > >> ...and this we should really have some wording to the effect of: > >> > >> ..., defaults to XYZ. The exact (or even approximate) default of XYZ > >> should not be relied upon, and may be changed (or these limits even > >> removed) in future versions of git. > >> > >> I.e. let's distinguish a "this is how it works now, FYI" from a forward > >> promise that it's going to work like that forever. > > > > Perhaps I could simplify that to "...currently defaults to XYZ"? Does > > that capture your intent well enough? > > Yes, and it's not as needlessly verbose :) Cool, I'll include that change then. > > I agree very much that this shouldn't be a hard promise. If it was, > > we've broken it repeatedly over the years. Not only have we modified > > the default numbers multiple times, there have also been multiple > > times when we've been able to change how many renames could be handled > > without changing the default numbers for the rename limits. (This was > > done by adding or improving special cases that allow us to exclude > > paths from the exhaustive comparison; exact rename detection that > > someone else added and my more recent improvements to exact rename > > detection are two simple examples). > > > >> Which also leads me to: > >> > >> > merge.renames:: > >> > Whether Git detects renames. If set to "false", rename detection > >> > diff --git a/Documentation/diff-options.txt b/Documentation/diff-options.txt > >> > index 32e6dee5ac3..11e08c3fd36 100644 > >> > --- a/Documentation/diff-options.txt > >> > +++ b/Documentation/diff-options.txt > >> > @@ -588,11 +588,12 @@ When used together with `-B`, omit also the preimage in the deletion part > >> > of a delete/create pair. > >> > > >> > -l<num>:: > >> > - The `-M` and `-C` options require O(n^2) processing time where n > >> > - is the number of potential rename/copy targets. This > >> > - option prevents rename/copy detection from running if > >> > + The `-M` and `-C` options have an exhaustive portion that > >> > + requires O(n^2) processing time where n is the number of > >> > + potential rename/copy targets. This option prevents the > >> > + exhaustive portion of rename/copy detection from running if > >> > the number of rename/copy targets exceeds the specified > >> > - number. > >> > + number. Defaults to diff.renameLimit. > >> > >> This mention of O(n^2). This is not an issue with your patch/series, nd > >> this is an improvement. > >> > >> But as a side-comment: Do we explain somewhere how exactly this > >> {diff,merge}.renmeLimit=N works for values of N? It's probably fine to > >> continue to handwave it away, but is there anything that say tells a > >> user what happens if they have 400 files in their repository in a "x" > >> directory and "git mv x y"? Will it work, but if they add a 401th file > >> it won't for diffs? > >> > >> I *think* given a skimming of previous discussions that it's "no", > >> i.e. that this just kicks in for these expensive cases, and e.g. for > >> such a case of moving the same tree OID from "x" to "y" we'd > >> short-circuit things, but maybe I'm just making things up at this point. > > > > Tree OIDs aren't used by the rename detection logic, but you're close. > > > >> Feel free to ignore me here, I guess this amounts to a request that it > >> would be great if we could point to some docs about how this works. It's > >> probably too much detail for Documentation/config/{merge,diff}.txt, so > >> maybe a section in either of git-{diff,merge}.txt ? > > > > If git-merge.txt includes it, then rebase, cherry-pick, revert > > probably should too. (and maybe the -3 option of git-am) > > > > Anyway... > > Some of this we do via includes, but I also think we should have more > "concept docs", like gitglossary, gitrevisions, so perhaps > gitrenames(5)? Anyway, not for now... > > > In the "git mv dir-x dir-y" case, if no other changes are made to > > those files, then the renames would be detectable via blobs having the > > same hashes (i.e. exact renames). So all these renames would be found > > regardless of the number of files and without exhaustive detection > > even kicking in. > > > > If there were thousands of such files, and every one of them were also > > modified, then the basename-guided rename detection which operates in > > linear time would still find all the renames, without the exhaustive > > detection even kicking in (see commits bd24aa2f97a0 (diffcore-rename: > > guide inexact rename detection based on basenames, 2021-02-14) and > > 37a251436447 (diffcore-rename: use directory rename guided basename > > comparisons, 2021-02-27)). > > > > If this were a merge, and there were thousands of such files renamed > > and modified, AND they were all further renamed to change their > > basename, BUT the original directory was unmodified on the other side > > of history, then we wouldn't need any of the renames for the purposes > > of the merge and they'd all be excluded from the exhaustive rename > > detection. Sure, we wouldn't find renames for those files, but that > > wouldn't change the result of the merge. So, again, no exhaustive > > rename detection. (See commit 174791f0fb23 (merge-ort: use > > relevant_sources to filter possible rename sources, 2021-03-11)) > > > > (Rebases & cherry-picks also have an advantage of caching renames from > > previous picks on the upstream side of history to avoid needing to > > re-detect renames on that side...though it only caches renames that > > are either relevant or exact.) > > > > Basically, we bail on exhaustive rename detection if, after the linear > > or faster steps have run (excluding previously cached renames, exact > > rename detection, skipping irrelevant renames, basename guided rename > > detection), the number of remaining unpaired source files times the > > number of remaining unpaired destination files is greater than > > rename_limit * rename_limit. (If we can assume the numbers match, > > i.e. N = number of remaining unpaired sources = number of remaining > > unpaired destinations, then that check simplifies to bailing once N > > > rename_limit) > > > > > > I'm not sure if these details are really going to help the users, > > especially if more special cases are added (and more have been > > proposed by Johannes and became an Outreachy project, though that one > > didn't get off the ground). And this explanation is not even fully > > detailed; I'm still hand waving here in at least four different ways > > (mostly in ignoring special cases for different fast steps, but also > > by ignoring copy detection that itself messes with the explanation in > > multiple ways). > > > > But maybe someone else can figure out a way to hand wave my hand > > waving down to a simple enough explanation, while also covering copy > > detection, and managing to do so in a way that doesn't mislead. If > > so, we can include that in the docs somewhere. > > I think it would be good to briefly explain it in the sense of not > explaining it, i.e. give the user who ran into this some idea of what > they're looking at. Perhaps something like (and maybe this is too > long...); > > [...] these limits are limits to the number of "steps" in an an > algorithm that's subject to change, and whose number of steps will > depend on your input data. > > They don't correspond to any easily tangible thing, e.g. a limit of > 100 has no correspondence with detecting all renames in a commit > that changed 10 files, but not a commit that changed 11 files. > > The limits were picked to limit runtime to an approximate desired > value, given what was thought to be Representative data from a > repository similar to linux.git. > > If you find your diff/merge take X amount of time doing rename > detection, then generally speaking doubling the limit will about > double the time we spend on it. ...*quadruple* the time we spend on it. > So they're an indirect a way to get to a runtime limit of (what was > it, 2 and 10 seconds?). > > The reason for this configuration not being a time limit is so that > git can guarantee identical results for the same input data across > different systems, given the same version and configuration. > I.e. the common case of running your attempt at a diff/merge twice > in a row, a time limit would produce unpredictable results. Seems pretty lengthy. Thinking about it for a bit, perhaps we could just say something like: After preliminary steps that can detect subsets of renames quickly, an exhaustive fallback algorithm is used that compares all remaining unpaired sources with all remaining unpaired destinations looking for similar files that can be marked as renames. (Or, in the case of copy detection, it compares all original sources with all remaining unpaired destinations). With N sources and N destinations, this algorithm is O(N^2). The rename_limit says to skip this exhaustive step if N is greater than the rename_limit. (More precisely, it says to skip the exhaustive step if number_sources * number_destinations > rename_limit * rename_limit, which handles the case where there are a different number of sources and destinations.) ^ permalink raw reply [flat|nested] 44+ messages in thread
* [PATCH v2 3/4] doc: document the special handling of -l0 2021-07-14 1:12 ` [PATCH v2 0/4] Rename/copy limits -- docs, warnings, and new defaults Elijah Newren via GitGitGadget 2021-07-14 1:12 ` [PATCH v2 1/4] diff: correct warning message when renameLimit exceeded Elijah Newren via GitGitGadget 2021-07-14 1:12 ` [PATCH v2 2/4] doc: clarify documentation for rename/copy limits Elijah Newren via GitGitGadget @ 2021-07-14 1:12 ` Elijah Newren via GitGitGadget 2021-07-14 16:45 ` Jeff King 2021-07-14 1:12 ` [PATCH v2 4/4] Bump rename limit defaults (yet again) Elijah Newren via GitGitGadget 2021-07-15 0:45 ` [PATCH v3 0/4] Rename/copy limits -- docs, warnings, and new defaults Elijah Newren via GitGitGadget 4 siblings, 1 reply; 44+ messages in thread From: Elijah Newren via GitGitGadget @ 2021-07-14 1:12 UTC (permalink / raw) To: git Cc: Bagas Sanjaya, Elijah Newren, Eric Sunshine, Derrick Stolee, Jeff King, Ævar Arnfjörð Bjarmason, Elijah Newren, Elijah Newren From: Elijah Newren <newren@gmail.com> As noted in commit 89973554b52c (diffcore-rename: make diff-tree -l0 mean -l<large>, 2017-11-29), -l0 has had a magical special "large" historical value associated with it. Document this value, particularly since it is not large enough for some uses -- see commit 9f7e4bfa3b6d (diff: remove silent clamp of renameLimit, 2017-11-13). Signed-off-by: Elijah Newren <newren@gmail.com> --- Documentation/diff-options.txt | 3 +++ 1 file changed, 3 insertions(+) diff --git a/Documentation/diff-options.txt b/Documentation/diff-options.txt index 11e08c3fd36..ba40ac66cc9 100644 --- a/Documentation/diff-options.txt +++ b/Documentation/diff-options.txt @@ -594,6 +594,9 @@ of a delete/create pair. exhaustive portion of rename/copy detection from running if the number of rename/copy targets exceeds the specified number. Defaults to diff.renameLimit. ++ +Note that for backward compatibility reasons, a value of 0 is treated +the same as if a large value was passed (currently, 32767). ifndef::git-format-patch[] --diff-filter=[(A|C|D|M|R|T|U|X|B)...[*]]:: -- gitgitgadget ^ permalink raw reply related [flat|nested] 44+ messages in thread
* Re: [PATCH v2 3/4] doc: document the special handling of -l0 2021-07-14 1:12 ` [PATCH v2 3/4] doc: document the special handling of -l0 Elijah Newren via GitGitGadget @ 2021-07-14 16:45 ` Jeff King 2021-07-14 17:17 ` Elijah Newren 0 siblings, 1 reply; 44+ messages in thread From: Jeff King @ 2021-07-14 16:45 UTC (permalink / raw) To: Elijah Newren via GitGitGadget Cc: git, Bagas Sanjaya, Elijah Newren, Eric Sunshine, Derrick Stolee, Ævar Arnfjörð Bjarmason On Wed, Jul 14, 2021 at 01:12:32AM +0000, Elijah Newren via GitGitGadget wrote: > From: Elijah Newren <newren@gmail.com> > > As noted in commit 89973554b52c (diffcore-rename: make diff-tree -l0 > mean -l<large>, 2017-11-29), -l0 has had a magical special "large" > historical value associated with it. Document this value, particularly > since it is not large enough for some uses -- see commit 9f7e4bfa3b6d > (diff: remove silent clamp of renameLimit, 2017-11-13). > > Signed-off-by: Elijah Newren <newren@gmail.com> > --- > Documentation/diff-options.txt | 3 +++ > 1 file changed, 3 insertions(+) > > diff --git a/Documentation/diff-options.txt b/Documentation/diff-options.txt > index 11e08c3fd36..ba40ac66cc9 100644 > --- a/Documentation/diff-options.txt > +++ b/Documentation/diff-options.txt > @@ -594,6 +594,9 @@ of a delete/create pair. > exhaustive portion of rename/copy detection from running if > the number of rename/copy targets exceeds the specified > number. Defaults to diff.renameLimit. > ++ > +Note that for backward compatibility reasons, a value of 0 is treated > +the same as if a large value was passed (currently, 32767). Given the confusion around what "32767" even means to users, I wonder if we could just say: a value of 0 removes any artificial limits (but Git still has some internal limits which real-world cases are not likely to hit). Removing limits is after all the point of "0". I'm also not sure if it is simply for backwards compatibility. We commonly let "0" or "-1" mean "no limit" for convenience. It seems like something we'd want to support. -Peff ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH v2 3/4] doc: document the special handling of -l0 2021-07-14 16:45 ` Jeff King @ 2021-07-14 17:17 ` Elijah Newren 2021-07-14 17:33 ` Jeff King 0 siblings, 1 reply; 44+ messages in thread From: Elijah Newren @ 2021-07-14 17:17 UTC (permalink / raw) To: Jeff King Cc: Elijah Newren via GitGitGadget, Git Mailing List, Bagas Sanjaya, Eric Sunshine, Derrick Stolee, Ævar Arnfjörð Bjarmason On Wed, Jul 14, 2021 at 9:45 AM Jeff King <peff@peff.net> wrote: > > On Wed, Jul 14, 2021 at 01:12:32AM +0000, Elijah Newren via GitGitGadget wrote: > > > From: Elijah Newren <newren@gmail.com> > > > > As noted in commit 89973554b52c (diffcore-rename: make diff-tree -l0 > > mean -l<large>, 2017-11-29), -l0 has had a magical special "large" > > historical value associated with it. Document this value, particularly > > since it is not large enough for some uses -- see commit 9f7e4bfa3b6d > > (diff: remove silent clamp of renameLimit, 2017-11-13). > > > > Signed-off-by: Elijah Newren <newren@gmail.com> > > --- > > Documentation/diff-options.txt | 3 +++ > > 1 file changed, 3 insertions(+) > > > > diff --git a/Documentation/diff-options.txt b/Documentation/diff-options.txt > > index 11e08c3fd36..ba40ac66cc9 100644 > > --- a/Documentation/diff-options.txt > > +++ b/Documentation/diff-options.txt > > @@ -594,6 +594,9 @@ of a delete/create pair. > > exhaustive portion of rename/copy detection from running if > > the number of rename/copy targets exceeds the specified > > number. Defaults to diff.renameLimit. > > ++ > > +Note that for backward compatibility reasons, a value of 0 is treated > > +the same as if a large value was passed (currently, 32767). > > Given the confusion around what "32767" even means to users, I wonder if > we could just say: a value of 0 removes any artificial limits (but Git > still has some internal limits which real-world cases are not likely to > hit). 32767 is not an internal limit; and as such, it is absolutely an artificial limit. I had to use 48941 just a few years ago, and that value (and others larger than 32767) are fully supported. > Removing limits is after all the point of "0". I'm also not sure if it > is simply for backwards compatibility. We commonly let "0" or "-1" mean > "no limit" for convenience. It seems like something we'd want to > support. Making 0 mean unlimited could be done, and I think it'd be a one-line change, but that's not what commit 89973554b52c (diffcore-rename: make diff-tree -l0 mean -l<large>, 2017-11-29) tried to do. I'm not opposed to such a change in the meaning of "0", but I am opposed to documenting this value as unlimited unless we make it so. ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH v2 3/4] doc: document the special handling of -l0 2021-07-14 17:17 ` Elijah Newren @ 2021-07-14 17:33 ` Jeff King 2021-07-14 19:32 ` Elijah Newren 0 siblings, 1 reply; 44+ messages in thread From: Jeff King @ 2021-07-14 17:33 UTC (permalink / raw) To: Elijah Newren Cc: Elijah Newren via GitGitGadget, Git Mailing List, Bagas Sanjaya, Eric Sunshine, Derrick Stolee, Ævar Arnfjörð Bjarmason On Wed, Jul 14, 2021 at 10:17:21AM -0700, Elijah Newren wrote: > > > +Note that for backward compatibility reasons, a value of 0 is treated > > > +the same as if a large value was passed (currently, 32767). > > > > Given the confusion around what "32767" even means to users, I wonder if > > we could just say: a value of 0 removes any artificial limits (but Git > > still has some internal limits which real-world cases are not likely to > > hit). > > 32767 is not an internal limit; and as such, it is absolutely an > artificial limit. I had to use 48941 just a few years ago, and that > value (and others larger than 32767) are fully supported. OK, I thought there was still some 32-bit m*n limits, but I guess not. > > Removing limits is after all the point of "0". I'm also not sure if it > > is simply for backwards compatibility. We commonly let "0" or "-1" mean > > "no limit" for convenience. It seems like something we'd want to > > support. > > Making 0 mean unlimited could be done, and I think it'd be a one-line > change, but that's not what commit 89973554b52c (diffcore-rename: make > diff-tree -l0 mean -l<large>, 2017-11-29) tried to do. > > I'm not opposed to such a change in the meaning of "0", but I am > opposed to documenting this value as unlimited unless we make it so. Thanks for clarifying. It does seem silly that "0" means "big, but kind of arbitrary". But I'm OK to punt on that change for now (and in the meantime, I have no real opinion on whether or how to document the current behavior). -Peff ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH v2 3/4] doc: document the special handling of -l0 2021-07-14 17:33 ` Jeff King @ 2021-07-14 19:32 ` Elijah Newren 0 siblings, 0 replies; 44+ messages in thread From: Elijah Newren @ 2021-07-14 19:32 UTC (permalink / raw) To: Jeff King Cc: Elijah Newren via GitGitGadget, Git Mailing List, Bagas Sanjaya, Eric Sunshine, Derrick Stolee, Ævar Arnfjörð Bjarmason On Wed, Jul 14, 2021 at 10:33 AM Jeff King <peff@peff.net> wrote: > > On Wed, Jul 14, 2021 at 10:17:21AM -0700, Elijah Newren wrote: > > > > > +Note that for backward compatibility reasons, a value of 0 is treated > > > > +the same as if a large value was passed (currently, 32767). > > > > > > Given the confusion around what "32767" even means to users, I wonder if > > > we could just say: a value of 0 removes any artificial limits (but Git > > > still has some internal limits which real-world cases are not likely to > > > hit). > > > > 32767 is not an internal limit; and as such, it is absolutely an > > artificial limit. I had to use 48941 just a few years ago, and that > > value (and others larger than 32767) are fully supported. > > OK, I thought there was still some 32-bit m*n limits, but I guess not. There was once upon a time, but not since 9f7e4bfa3b6d (diff: remove silent clamp of renameLimit, 2017-11-13). > > > Removing limits is after all the point of "0". I'm also not sure if it > > > is simply for backwards compatibility. We commonly let "0" or "-1" mean > > > "no limit" for convenience. It seems like something we'd want to > > > support. > > > > Making 0 mean unlimited could be done, and I think it'd be a one-line > > change, but that's not what commit 89973554b52c (diffcore-rename: make > > diff-tree -l0 mean -l<large>, 2017-11-29) tried to do. > > > > I'm not opposed to such a change in the meaning of "0", but I am > > opposed to documenting this value as unlimited unless we make it so. > > Thanks for clarifying. It does seem silly that "0" means "big, but kind > of arbitrary". But I'm OK to punt on that change for now (and in the > meantime, I have no real opinion on whether or how to document the > current behavior). > > -Peff ^ permalink raw reply [flat|nested] 44+ messages in thread
* [PATCH v2 4/4] Bump rename limit defaults (yet again) 2021-07-14 1:12 ` [PATCH v2 0/4] Rename/copy limits -- docs, warnings, and new defaults Elijah Newren via GitGitGadget ` (2 preceding siblings ...) 2021-07-14 1:12 ` [PATCH v2 3/4] doc: document the special handling of -l0 Elijah Newren via GitGitGadget @ 2021-07-14 1:12 ` Elijah Newren via GitGitGadget 2021-07-14 16:43 ` Jeff King 2021-07-15 0:45 ` [PATCH v3 0/4] Rename/copy limits -- docs, warnings, and new defaults Elijah Newren via GitGitGadget 4 siblings, 1 reply; 44+ messages in thread From: Elijah Newren via GitGitGadget @ 2021-07-14 1:12 UTC (permalink / raw) To: git Cc: Bagas Sanjaya, Elijah Newren, Eric Sunshine, Derrick Stolee, Jeff King, Ævar Arnfjörð Bjarmason, Elijah Newren, Elijah Newren From: Elijah Newren <newren@gmail.com> These were last bumped in commit 92c57e5c1d29 (bump rename limit defaults (again), 2011-02-19), and were bumped both because processors had gotten faster, and because people were getting ugly merges that caused problems and reporting it to the mailing list (suggesting that folks were willing to spend more time waiting). Since that time: * Linus has continued recommending kernel folks to set diff.renameLimit=0 (maps to 32767, currently) * Folks with repositories with lots of renames were happy to set merge.renameLimit above 32767, once the code supported that, to get correct cherry-picks * Processors have gotten faster * It has been discovered that the timing methodology used last time probably used too large example files. The last point is probably worth explaining a bit more: * The "average" file size used appears to have been average blob size in the linux kernel history at the time (probably v2.6.25 or something close to it). * Since bigger files are modified more frequently, such a computation weights towards larger files. * Larger files may be more likely to be modified over time, but are not more likely to be renamed -- the mean and median blob size within a tree are a bit higher than the mean and median of blob sizes in the history leading up to that version for the linux kernel. * The mean blob size in v2.6.25 was half the average blob size in history leading to that point * The median blob size in v2.6.25 was about 40% of the mean blob size in v2.6.25. * Since the mean blob size is more than double the median blob size, any file as big as the mean will not be compared to any files of median size or less (because they'd be more than 50% dissimilar). * Since it is the number of files compared that provides the O(n^2) behavior, median-sized files should matter more than mean-sized ones. The combined effect of the above is that the file size used in past calculations was likely about 5x too large. Combine that with a CPU performance improvement of ~30%, and we can increase the limits by a factor of sqrt(5/(1-.3)) = 2.67, while keeping the original stated time limits. Keeping the same approximate time limit probably makes sense for diff.renameLimit (there is no progress feedback in e.g. git log -p), but the experience above suggests merge.renameLimit could be extended significantly. In fact, it probably would make sense to have an unlimited default setting for merge.renameLimit, but that would likely need to be coupled with changes to how progress is displayed. (See https://lore.kernel.org/git/YOx+Ok%2FEYvLqRMzJ@coredump.intra.peff.net/ for details in that area.) For now, let's just bump the approximate time limit from 10s to 1m. (Note: We do not want to use actual time limits, because getting results that depend on how loaded your system is that day feels bad, and because we don't discover that we won't get all the renames until after we've put in a lot of work rather than just upfront telling the user there are too many files involved.) Using the original time limit of 2s for diff.renameLimit, and bumping merge.renameLimit from 10s to 60s, I found the following timings using the simple script at the end of this commit message (on an AWS c5.xlarge which reports as "Intel(R) Xeon(R) Platinum 8124M CPU @ 3.00GHz"): N Timing 1300 1.995s 7100 59.973s So let's round down to nice even numbers and bump the limits from 400->1000, and from 1000->7000. Here is the measure_rename_perf script (adapted from https://lore.kernel.org/git/20080211113516.GB6344@coredump.intra.peff.net/ in particular to avoid triggering the linear handling from basename-guided rename detection): #!/bin/bash n=$1; shift rm -rf repo mkdir repo && cd repo git init -q -b main mkdata() { mkdir $1 for i in `seq 1 $2`; do (sed "s/^/$i /" <../sample echo tag: $1 ) >$1/$i done } mkdata initial $n git add . git commit -q -m initial mkdata new $n git add . cd new for i in *; do git mv $i $i.renamed; done cd .. git rm -q -rf initial git commit -q -m new time git diff-tree -M -l0 --summary HEAD^ HEAD Signed-off-by: Elijah Newren <newren@gmail.com> --- Documentation/config/diff.txt | 2 +- Documentation/config/merge.txt | 2 +- diff.c | 2 +- merge-ort.c | 2 +- merge-recursive.c | 2 +- 5 files changed, 5 insertions(+), 5 deletions(-) diff --git a/Documentation/config/diff.txt b/Documentation/config/diff.txt index e26a63d0d42..167c479f376 100644 --- a/Documentation/config/diff.txt +++ b/Documentation/config/diff.txt @@ -120,7 +120,7 @@ diff.orderFile:: diff.renameLimit:: The number of files to consider in the exhaustive portion of copy/rename detection; equivalent to the 'git diff' option - `-l`. If not set, the default value is 400. This setting has + `-l`. If not set, the default value is 1000. This setting has no effect if rename detection is turned off. diff.renames:: diff --git a/Documentation/config/merge.txt b/Documentation/config/merge.txt index aca0c92dbe6..101eabb3f0c 100644 --- a/Documentation/config/merge.txt +++ b/Documentation/config/merge.txt @@ -37,7 +37,7 @@ merge.renameLimit:: rename detection during a merge. If not specified, defaults to the value of diff.renameLimit. If neither merge.renameLimit nor diff.renameLimit are specified, defaults - to 1000. This setting has no effect if rename detection is + to 7000. This setting has no effect if rename detection is turned off. merge.renames:: diff --git a/diff.c b/diff.c index 2454e34cf6d..0244a371d32 100644 --- a/diff.c +++ b/diff.c @@ -35,7 +35,7 @@ static int diff_detect_rename_default; static int diff_indent_heuristic = 1; -static int diff_rename_limit_default = 400; +static int diff_rename_limit_default = 1000; static int diff_suppress_blank_empty; static int diff_use_color_default = -1; static int diff_color_moved_default; diff --git a/merge-ort.c b/merge-ort.c index b954f7184a5..8a84375e940 100644 --- a/merge-ort.c +++ b/merge-ort.c @@ -2558,7 +2558,7 @@ static void detect_regular_renames(struct merge_options *opt, diff_opts.detect_rename = DIFF_DETECT_RENAME; diff_opts.rename_limit = opt->rename_limit; if (opt->rename_limit <= 0) - diff_opts.rename_limit = 1000; + diff_opts.rename_limit = 7000; diff_opts.rename_score = opt->rename_score; diff_opts.show_rename_progress = opt->show_rename_progress; diff_opts.output_format = DIFF_FORMAT_NO_OUTPUT; diff --git a/merge-recursive.c b/merge-recursive.c index 4327e0cfa33..f19f8cc37bd 100644 --- a/merge-recursive.c +++ b/merge-recursive.c @@ -1879,7 +1879,7 @@ static struct diff_queue_struct *get_diffpairs(struct merge_options *opt, */ if (opts.detect_rename > DIFF_DETECT_RENAME) opts.detect_rename = DIFF_DETECT_RENAME; - opts.rename_limit = (opt->rename_limit >= 0) ? opt->rename_limit : 1000; + opts.rename_limit = (opt->rename_limit >= 0) ? opt->rename_limit : 7000; opts.rename_score = opt->rename_score; opts.show_rename_progress = opt->show_rename_progress; opts.output_format = DIFF_FORMAT_NO_OUTPUT; -- gitgitgadget ^ permalink raw reply related [flat|nested] 44+ messages in thread
* Re: [PATCH v2 4/4] Bump rename limit defaults (yet again) 2021-07-14 1:12 ` [PATCH v2 4/4] Bump rename limit defaults (yet again) Elijah Newren via GitGitGadget @ 2021-07-14 16:43 ` Jeff King 2021-07-14 17:32 ` Elijah Newren 0 siblings, 1 reply; 44+ messages in thread From: Jeff King @ 2021-07-14 16:43 UTC (permalink / raw) To: Elijah Newren via GitGitGadget Cc: git, Bagas Sanjaya, Elijah Newren, Eric Sunshine, Derrick Stolee, Ævar Arnfjörð Bjarmason On Wed, Jul 14, 2021 at 01:12:33AM +0000, Elijah Newren via GitGitGadget wrote: > The combined effect of the above is that the file size used in past > calculations was likely about 5x too large. Combine that with a CPU > performance improvement of ~30%, and we can increase the limits by > a factor of sqrt(5/(1-.3)) = 2.67, while keeping the original stated > time limits. It's slightly sad that we only got a 30% CPU improvement in the past 10 years. Are you just counting clock speed as a short-hand here? I think that doesn't tell the whole story. But all of this is a side-note anyway. What I care about is your actual timings. :) (It also seems like this rename detection is ripe for parallelization, but obviously that's a totally separate topic). > Using the original time limit of 2s for diff.renameLimit, and bumping > merge.renameLimit from 10s to 60s, I found the following timings using > the simple script at the end of this commit message (on an AWS c5.xlarge > which reports as "Intel(R) Xeon(R) Platinum 8124M CPU @ 3.00GHz"): > > N Timing > 1300 1.995s > 7100 59.973s > > So let's round down to nice even numbers and bump the limits from > 400->1000, and from 1000->7000. Sounds good. Thanks for thoroughly measuring and updating. -Peff ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH v2 4/4] Bump rename limit defaults (yet again) 2021-07-14 16:43 ` Jeff King @ 2021-07-14 17:32 ` Elijah Newren 2021-07-14 17:57 ` Jeff King 0 siblings, 1 reply; 44+ messages in thread From: Elijah Newren @ 2021-07-14 17:32 UTC (permalink / raw) To: Jeff King Cc: Elijah Newren via GitGitGadget, Git Mailing List, Bagas Sanjaya, Eric Sunshine, Derrick Stolee, Ævar Arnfjörð Bjarmason On Wed, Jul 14, 2021 at 9:43 AM Jeff King <peff@peff.net> wrote: > > On Wed, Jul 14, 2021 at 01:12:33AM +0000, Elijah Newren via GitGitGadget wrote: > > > The combined effect of the above is that the file size used in past > > calculations was likely about 5x too large. Combine that with a CPU > > performance improvement of ~30%, and we can increase the limits by > > a factor of sqrt(5/(1-.3)) = 2.67, while keeping the original stated > > time limits. > > It's slightly sad that we only got a 30% CPU improvement in the past 10 > years. Are you just counting clock speed as a short-hand here? I think > that doesn't tell the whole story. But all of this is a side-note > anyway. What I care about is your actual timings. :) I'm using shorthand when discussing file sizes above (though I used actual measurements when picking new values below). But the 30% came from measuring the timings with the exact same sample file as you and using a lightly modified version of your original script (tweaked to also change file basenames) on an AWS c5xlarge instance. My timings showed they were only about 30% faster than what you got when you last bumped the limits. All my laptops and even my work machine are pretty old, so I picked an AWS c5xlarge instance hoping it'd represent a relatively new machine (since your benchmarks when you bumped were using a new machine at the time). Maybe the stock c5xlarge instance I picked wasn't a good choice (wasn't new enough? had a low relative clock speed? affected by Spectre patches? had slow disks? something else?). Or maybe single processor speed really did just hit up against Moore's law and nearly all gains have shifted to having more cores available which don't benefit us here since our algorithm is serial. > (It also seems like this rename detection is ripe for parallelization, > but obviously that's a totally separate topic). True. > > Using the original time limit of 2s for diff.renameLimit, and bumping > > merge.renameLimit from 10s to 60s, I found the following timings using > > the simple script at the end of this commit message (on an AWS c5.xlarge > > which reports as "Intel(R) Xeon(R) Platinum 8124M CPU @ 3.00GHz"): > > > > N Timing > > 1300 1.995s > > 7100 59.973s > > > > So let's round down to nice even numbers and bump the limits from > > 400->1000, and from 1000->7000. > > Sounds good. Thanks for thoroughly measuring and updating. ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH v2 4/4] Bump rename limit defaults (yet again) 2021-07-14 17:32 ` Elijah Newren @ 2021-07-14 17:57 ` Jeff King 2021-07-14 20:03 ` Elijah Newren 0 siblings, 1 reply; 44+ messages in thread From: Jeff King @ 2021-07-14 17:57 UTC (permalink / raw) To: Elijah Newren Cc: Elijah Newren via GitGitGadget, Git Mailing List, Bagas Sanjaya, Eric Sunshine, Derrick Stolee, Ævar Arnfjörð Bjarmason On Wed, Jul 14, 2021 at 10:32:56AM -0700, Elijah Newren wrote: > > It's slightly sad that we only got a 30% CPU improvement in the past 10 > > years. Are you just counting clock speed as a short-hand here? I think > > that doesn't tell the whole story. But all of this is a side-note > > anyway. What I care about is your actual timings. :) > > I'm using shorthand when discussing file sizes above (though I used > actual measurements when picking new values below). But the 30% came > from measuring the timings with the exact same sample file as you and > using a lightly modified version of your original script (tweaked to > also change file basenames) on an AWS c5xlarge instance. My timings > showed they were only about 30% faster than what you got when you last > bumped the limits. Interesting. My timings are much faster. With a 20k file, I get (on my laptop, which is an i9-9880H): N CPU (2008) CPU (now) 10 0.43s 0.007s 100 0.44s 0.071s 200 1.40s 0.226s 400 4.87s 0.788s 800 18.08s 2.887s 1000 27.82s 4.431s The 2008 timings are from the old email you linked in your commit message, and the new one is from running the revised script you showed. The savings seem like more than 30%. I don't know if that's all CPU or if something changed in the code. Using a 3k file (the median for ls-tree), numbers are similar, but a little smaller (my n=1300 is about 1.4s). So I think we're both in the same ballpark (and certainly an AWS machine is a perfectly fine representative sample of where people might run Git). -Peff ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH v2 4/4] Bump rename limit defaults (yet again) 2021-07-14 17:57 ` Jeff King @ 2021-07-14 20:03 ` Elijah Newren 2021-07-14 20:47 ` Jeff King 0 siblings, 1 reply; 44+ messages in thread From: Elijah Newren @ 2021-07-14 20:03 UTC (permalink / raw) To: Jeff King Cc: Elijah Newren via GitGitGadget, Git Mailing List, Bagas Sanjaya, Eric Sunshine, Derrick Stolee, Ævar Arnfjörð Bjarmason On Wed, Jul 14, 2021 at 10:57 AM Jeff King <peff@peff.net> wrote: > > On Wed, Jul 14, 2021 at 10:32:56AM -0700, Elijah Newren wrote: > > > > It's slightly sad that we only got a 30% CPU improvement in the past 10 > > > years. Are you just counting clock speed as a short-hand here? I think > > > that doesn't tell the whole story. But all of this is a side-note > > > anyway. What I care about is your actual timings. :) > > > > I'm using shorthand when discussing file sizes above (though I used > > actual measurements when picking new values below). But the 30% came > > from measuring the timings with the exact same sample file as you and > > using a lightly modified version of your original script (tweaked to > > also change file basenames) on an AWS c5xlarge instance. My timings > > showed they were only about 30% faster than what you got when you last > > bumped the limits. > > Interesting. My timings are much faster. With a 20k file, I get (on my > laptop, which is an i9-9880H): > > N CPU (2008) CPU (now) > 10 0.43s 0.007s > 100 0.44s 0.071s > 200 1.40s 0.226s > 400 4.87s 0.788s > 800 18.08s 2.887s > 1000 27.82s 4.431s > > The 2008 timings are from the old email you linked in your commit > message, and the new one is from running the revised script you showed. > The savings seem like more than 30%. I don't know if that's all CPU or > if something changed in the code. I was using the script you wrote in 2008, but comparing to your reported numbers in 2011[1]. When you bumped in 2011, you said you picked the limits of 400 & 1000 in order to give rough timings of 2s and 10s. So the table looks more like: N CPU (2008) CPU (2011) c5xlarge CPU (yours) 400 4.87s ~2s 1.106s 0.788s 1000 27.82s ~10s 6.350s 4.431s So, 2011->c5xlarge on these (just recomputed now numbers) show improvements of ~45% and ~36%. Maybe I had an outlier run earlier that was in the upper 6s range for N=1000 and I rounded off to 30% for the commit message? Don't know, those numbers are on a laptop that died in the last few days. But yeah, you are seeing a bigger improvement than I did; 2011->your current laptop shows roughly 56% - 60% improvement. [1] https://lore.kernel.org/git/20110219102128.GB22508@sigill.intra.peff.net/ > Using a 3k file (the median for ls-tree), numbers are similar, but a > little smaller (my n=1300 is about 1.4s). So I think we're both in the > same ballpark (and certainly an AWS machine is a perfectly fine > representative sample of where people might run Git). Yeah, dividing any of the timings I get by the ones you get seem to be giving a value somewhere around 1.4, so that seems reassuring on the consistency front. That doesn't fix my jealousy of your faster CPU, but that's a separate problem. :-) ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH v2 4/4] Bump rename limit defaults (yet again) 2021-07-14 20:03 ` Elijah Newren @ 2021-07-14 20:47 ` Jeff King 0 siblings, 0 replies; 44+ messages in thread From: Jeff King @ 2021-07-14 20:47 UTC (permalink / raw) To: Elijah Newren Cc: Elijah Newren via GitGitGadget, Git Mailing List, Bagas Sanjaya, Eric Sunshine, Derrick Stolee, Ævar Arnfjörð Bjarmason On Wed, Jul 14, 2021 at 01:03:43PM -0700, Elijah Newren wrote: > > The 2008 timings are from the old email you linked in your commit > > message, and the new one is from running the revised script you showed. > > The savings seem like more than 30%. I don't know if that's all CPU or > > if something changed in the code. > > I was using the script you wrote in 2008, but comparing to your > reported numbers in 2011[1]. When you bumped in 2011, you said you > picked the limits of 400 & 1000 in order to give rough timings of 2s > and 10s. So the table looks more like: > > N CPU (2008) CPU (2011) c5xlarge CPU (yours) > 400 4.87s ~2s 1.106s 0.788s > 1000 27.82s ~10s 6.350s 4.431s > > So, 2011->c5xlarge on these (just recomputed now numbers) show > improvements of ~45% and ~36%. Maybe I had an outlier run earlier > that was in the upper 6s range for N=1000 and I rounded off to 30% for > the commit message? Don't know, those numbers are on a laptop that > died in the last few days. Ah, right. I forgot about the 2011 update. So yeah, things are getting faster, but not as much as I would have hoped as somebody who came of age during the clock speed boom of the 90's. :) Thanks for humoring my puzzlement (I don't think any of this changes the applicability of your patch). -Peff ^ permalink raw reply [flat|nested] 44+ messages in thread
* [PATCH v3 0/4] Rename/copy limits -- docs, warnings, and new defaults 2021-07-14 1:12 ` [PATCH v2 0/4] Rename/copy limits -- docs, warnings, and new defaults Elijah Newren via GitGitGadget ` (3 preceding siblings ...) 2021-07-14 1:12 ` [PATCH v2 4/4] Bump rename limit defaults (yet again) Elijah Newren via GitGitGadget @ 2021-07-15 0:45 ` Elijah Newren via GitGitGadget 2021-07-15 0:45 ` [PATCH v3 1/4] diff: correct warning message when renameLimit exceeded Elijah Newren via GitGitGadget ` (5 more replies) 4 siblings, 6 replies; 44+ messages in thread From: Elijah Newren via GitGitGadget @ 2021-07-15 0:45 UTC (permalink / raw) To: git Cc: Bagas Sanjaya, Elijah Newren, Eric Sunshine, Derrick Stolee, Jeff King, Ævar Arnfjörð Bjarmason, Elijah Newren Fix a few small issues with documentation and warnings around the limits for the quadratic portion of rename (©) detection, and bump the default limits. Discussion on bumping the limits can be found at [1]. Although it appears we generally agree we could switch to an unlimited setting for merge.renameLimit, that would require some changes to progress bars to notify users how to take action once things start taking a while. So, for now, just bump the limits. [1] https://lore.kernel.org/git/CABPp-BFzp3TCWiF1QAVSfywDLYrz=GOQszVM-sw5p0rSB8RWvw@mail.gmail.com/T/#u Changes since v2: * Change the meaning of "0" to actually mean unlimited, and modify the documentation to mention that. * Added 'currently' to descriptions to make it clear the defaults are likely to change (again). * Added a brief explanation of the exhaustive portion of rename detection, as requested by Ævar (though, honestly, I think the thing that actually helps people pick values for the limit is the warning that tells people that rename detection was skipped and how high they need to set the limit if they want to redo the operation and get renames). Changes since v1: * Shuffled patch order since the explanation of why "inexact rename detection" is incorrect was in the third patch * Use the term "exhaustive rename detection" for the quadratic portion * Simplify -l description by just stating that it defaults to diff.renameLimit (since it in turn has the right default value) * Fix asciidoc formating * Include bump of the limits in a new patch Elijah Newren (4): diff: correct warning message when renameLimit exceeded doc: clarify documentation for rename/copy limits diffcore-rename: treat a rename_limit of 0 as unlimited Bump rename limit defaults (yet again) Documentation/config/diff.txt | 7 ++++--- Documentation/config/merge.txt | 10 ++++++---- Documentation/diff-options.txt | 16 +++++++++++----- diff.c | 4 ++-- diffcore-rename.c | 2 +- merge-ort.c | 2 +- merge-recursive.c | 2 +- 7 files changed, 26 insertions(+), 17 deletions(-) base-commit: d486ca60a51c9cb1fe068803c3f540724e95e83a Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-git-1044%2Fnewren%2Frename-limit-documentation-v3 Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-git-1044/newren/rename-limit-documentation-v3 Pull-Request: https://github.com/git/git/pull/1044 Range-diff vs v2: 1: 0d1d0f180a3 = 1: 0d1d0f180a3 diff: correct warning message when renameLimit exceeded 2: 4046993a9a2 ! 2: 193385d7ca1 doc: clarify documentation for rename/copy limits @@ Documentation/config/diff.txt: diff.orderFile:: - has no effect if rename detection is turned off. + The number of files to consider in the exhaustive portion of + copy/rename detection; equivalent to the 'git diff' option -+ `-l`. If not set, the default value is 400. This setting has -+ no effect if rename detection is turned off. ++ `-l`. If not set, the default value is currently 400. This ++ setting has no effect if rename detection is turned off. diff.renames:: Whether and how Git detects renames. If set to "false", @@ Documentation/config/merge.txt: merge.verifySignatures:: + The number of files to consider in the exhaustive portion of + rename detection during a merge. If not specified, defaults + to the value of diff.renameLimit. If neither -+ merge.renameLimit nor diff.renameLimit are specified, defaults -+ to 1000. This setting has no effect if rename detection is -+ turned off. ++ merge.renameLimit nor diff.renameLimit are specified, ++ currently defaults to 1000. This setting has no effect if ++ rename detection is turned off. merge.renames:: Whether Git detects renames. If set to "false", rename detection @@ Documentation/diff-options.txt: When used together with `-B`, omit also the prei - The `-M` and `-C` options require O(n^2) processing time where n - is the number of potential rename/copy targets. This - option prevents rename/copy detection from running if -+ The `-M` and `-C` options have an exhaustive portion that -+ requires O(n^2) processing time where n is the number of -+ potential rename/copy targets. This option prevents the -+ exhaustive portion of rename/copy detection from running if - the number of rename/copy targets exceeds the specified +- the number of rename/copy targets exceeds the specified - number. -+ number. Defaults to diff.renameLimit. ++ The `-M` and `-C` options involve some preliminary steps that ++ can detect subsets of renames/copies cheaply, followed by an ++ exhaustive fallback portion that compares all remaining ++ unpaired destinations to all relevant sources. (For renames, ++ only remaining unpaired sources are relevant; for copies, all ++ original sources are relevant.) For N sources and ++ destinations, this exhaustive check is O(N^2). This option ++ prevents the exhaustive portion of rename/copy detection from ++ running if the number of source/destination files involved ++ exceeds the specified number. Defaults to diff.renameLimit. ifndef::git-format-patch[] --diff-filter=[(A|C|D|M|R|T|U|X|B)...[*]]:: 3: 6f5767607cd ! 3: 00a2072baea doc: document the special handling of -l0 @@ Metadata Author: Elijah Newren <newren@gmail.com> ## Commit message ## - doc: document the special handling of -l0 + diffcore-rename: treat a rename_limit of 0 as unlimited - As noted in commit 89973554b52c (diffcore-rename: make diff-tree -l0 - mean -l<large>, 2017-11-29), -l0 has had a magical special "large" - historical value associated with it. Document this value, particularly - since it is not large enough for some uses -- see commit 9f7e4bfa3b6d - (diff: remove silent clamp of renameLimit, 2017-11-13). + In commit 89973554b52c (diffcore-rename: make diff-tree -l0 mean + -l<large>, 2017-11-29), -l0 was given a special magical "large" value, + but one which was not large enough for some uses (as can be seen from + commit 9f7e4bfa3b6d (diff: remove silent clamp of renameLimit, + 2017-11-13). Make 0 (or a negative value) be treated as unlimited + instead and update the documentation to mention this. Signed-off-by: Elijah Newren <newren@gmail.com> ## Documentation/diff-options.txt ## @@ Documentation/diff-options.txt: of a delete/create pair. - exhaustive portion of rename/copy detection from running if - the number of rename/copy targets exceeds the specified - number. Defaults to diff.renameLimit. -++ -+Note that for backward compatibility reasons, a value of 0 is treated -+the same as if a large value was passed (currently, 32767). + prevents the exhaustive portion of rename/copy detection from + running if the number of source/destination files involved + exceeds the specified number. Defaults to diff.renameLimit. ++ Note that a value of 0 is treated as unlimited. ifndef::git-format-patch[] --diff-filter=[(A|C|D|M|R|T|U|X|B)...[*]]:: + + ## diffcore-rename.c ## +@@ diffcore-rename.c: static int too_many_rename_candidates(int num_destinations, int num_sources, + * memory for the matrix anyway. + */ + if (rename_limit <= 0) +- rename_limit = 32767; ++ return 0; /* treat as unlimited */ + if (st_mult(num_destinations, num_sources) + <= st_mult(rename_limit, rename_limit)) + return 0; 4: 8f1deb6dd16 ! 4: b41278b6680 Bump rename limit defaults (yet again) @@ Documentation/config/diff.txt: diff.orderFile:: diff.renameLimit:: The number of files to consider in the exhaustive portion of copy/rename detection; equivalent to the 'git diff' option -- `-l`. If not set, the default value is 400. This setting has -+ `-l`. If not set, the default value is 1000. This setting has - no effect if rename detection is turned off. +- `-l`. If not set, the default value is currently 400. This ++ `-l`. If not set, the default value is currently 1000. This + setting has no effect if rename detection is turned off. diff.renames:: @@ Documentation/config/merge.txt @@ Documentation/config/merge.txt: merge.renameLimit:: rename detection during a merge. If not specified, defaults to the value of diff.renameLimit. If neither - merge.renameLimit nor diff.renameLimit are specified, defaults -- to 1000. This setting has no effect if rename detection is -+ to 7000. This setting has no effect if rename detection is - turned off. + merge.renameLimit nor diff.renameLimit are specified, +- currently defaults to 1000. This setting has no effect if ++ currently defaults to 7000. This setting has no effect if + rename detection is turned off. merge.renames:: -- gitgitgadget ^ permalink raw reply [flat|nested] 44+ messages in thread
* [PATCH v3 1/4] diff: correct warning message when renameLimit exceeded 2021-07-15 0:45 ` [PATCH v3 0/4] Rename/copy limits -- docs, warnings, and new defaults Elijah Newren via GitGitGadget @ 2021-07-15 0:45 ` Elijah Newren via GitGitGadget 2021-07-15 0:45 ` [PATCH v3 2/4] doc: clarify documentation for rename/copy limits Elijah Newren via GitGitGadget ` (4 subsequent siblings) 5 siblings, 0 replies; 44+ messages in thread From: Elijah Newren via GitGitGadget @ 2021-07-15 0:45 UTC (permalink / raw) To: git Cc: Bagas Sanjaya, Elijah Newren, Eric Sunshine, Derrick Stolee, Jeff King, Ævar Arnfjörð Bjarmason, Elijah Newren, Elijah Newren From: Elijah Newren <newren@gmail.com> The warning when quadratic rename detection was skipped referred to "inexact rename detection". For years, the only linear portion of rename detection was looking for exact renames, so "inexact rename detection" was an accurate way to refer to the quadratic portion of rename detection. However, that changed with commit bd24aa2f97a0 (diffcore-rename: guide inexact rename detection based on basenames, 2021-02-14). Let's instead use the term "exhaustive rename detection" to refer to the quadratic portion. Signed-off-by: Elijah Newren <newren@gmail.com> --- diff.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/diff.c b/diff.c index 52c791574b7..2454e34cf6d 100644 --- a/diff.c +++ b/diff.c @@ -6284,7 +6284,7 @@ static int is_summary_empty(const struct diff_queue_struct *q) } static const char rename_limit_warning[] = -N_("inexact rename detection was skipped due to too many files."); +N_("exhaustive rename detection was skipped due to too many files."); static const char degrade_cc_to_c_warning[] = N_("only found copies from modified paths due to too many files."); -- gitgitgadget ^ permalink raw reply related [flat|nested] 44+ messages in thread
* [PATCH v3 2/4] doc: clarify documentation for rename/copy limits 2021-07-15 0:45 ` [PATCH v3 0/4] Rename/copy limits -- docs, warnings, and new defaults Elijah Newren via GitGitGadget 2021-07-15 0:45 ` [PATCH v3 1/4] diff: correct warning message when renameLimit exceeded Elijah Newren via GitGitGadget @ 2021-07-15 0:45 ` Elijah Newren via GitGitGadget 2021-07-15 0:45 ` [PATCH v3 3/4] diffcore-rename: treat a rename_limit of 0 as unlimited Elijah Newren via GitGitGadget ` (3 subsequent siblings) 5 siblings, 0 replies; 44+ messages in thread From: Elijah Newren via GitGitGadget @ 2021-07-15 0:45 UTC (permalink / raw) To: git Cc: Bagas Sanjaya, Elijah Newren, Eric Sunshine, Derrick Stolee, Jeff King, Ævar Arnfjörð Bjarmason, Elijah Newren, Elijah Newren From: Elijah Newren <newren@gmail.com> A few places in the docs implied that rename/copy detection is always quadratic or that all (unpaired) files were involved in the quadratic portion of rename/copy detection. The following two commits each introduced an exception to this: 9027f53cb505 (Do linear-time/space rename logic for exact renames, 2007-10-25) bd24aa2f97a0 (diffcore-rename: guide inexact rename detection based on basenames, 2021-02-14) (As a side note, for copy detection, the basename guided inexact rename detection is turned off and the exact renames will only result in sources (without the dests) being removed from the set of files used in quadratic detection. So, for copy detection, the documentation was closer to correct.) Avoid implying that all files involved in rename/copy detection are subject to the full quadratic algorithm. While at it, also note the default values for all these settings. Signed-off-by: Elijah Newren <newren@gmail.com> --- Documentation/config/diff.txt | 7 ++++--- Documentation/config/merge.txt | 10 ++++++---- Documentation/diff-options.txt | 15 ++++++++++----- 3 files changed, 20 insertions(+), 12 deletions(-) diff --git a/Documentation/config/diff.txt b/Documentation/config/diff.txt index 2d3331f55c2..d1b5cfa3542 100644 --- a/Documentation/config/diff.txt +++ b/Documentation/config/diff.txt @@ -118,9 +118,10 @@ diff.orderFile:: relative to the top of the working tree. diff.renameLimit:: - The number of files to consider when performing the copy/rename - detection; equivalent to the 'git diff' option `-l`. This setting - has no effect if rename detection is turned off. + The number of files to consider in the exhaustive portion of + copy/rename detection; equivalent to the 'git diff' option + `-l`. If not set, the default value is currently 400. This + setting has no effect if rename detection is turned off. diff.renames:: Whether and how Git detects renames. If set to "false", diff --git a/Documentation/config/merge.txt b/Documentation/config/merge.txt index 6b66c83eabe..7cd6d7883b6 100644 --- a/Documentation/config/merge.txt +++ b/Documentation/config/merge.txt @@ -33,10 +33,12 @@ merge.verifySignatures:: include::fmt-merge-msg.txt[] merge.renameLimit:: - The number of files to consider when performing rename detection - during a merge; if not specified, defaults to the value of - diff.renameLimit. This setting has no effect if rename detection - is turned off. + The number of files to consider in the exhaustive portion of + rename detection during a merge. If not specified, defaults + to the value of diff.renameLimit. If neither + merge.renameLimit nor diff.renameLimit are specified, + currently defaults to 1000. This setting has no effect if + rename detection is turned off. merge.renames:: Whether Git detects renames. If set to "false", rename detection diff --git a/Documentation/diff-options.txt b/Documentation/diff-options.txt index 32e6dee5ac3..58acfff9289 100644 --- a/Documentation/diff-options.txt +++ b/Documentation/diff-options.txt @@ -588,11 +588,16 @@ When used together with `-B`, omit also the preimage in the deletion part of a delete/create pair. -l<num>:: - The `-M` and `-C` options require O(n^2) processing time where n - is the number of potential rename/copy targets. This - option prevents rename/copy detection from running if - the number of rename/copy targets exceeds the specified - number. + The `-M` and `-C` options involve some preliminary steps that + can detect subsets of renames/copies cheaply, followed by an + exhaustive fallback portion that compares all remaining + unpaired destinations to all relevant sources. (For renames, + only remaining unpaired sources are relevant; for copies, all + original sources are relevant.) For N sources and + destinations, this exhaustive check is O(N^2). This option + prevents the exhaustive portion of rename/copy detection from + running if the number of source/destination files involved + exceeds the specified number. Defaults to diff.renameLimit. ifndef::git-format-patch[] --diff-filter=[(A|C|D|M|R|T|U|X|B)...[*]]:: -- gitgitgadget ^ permalink raw reply related [flat|nested] 44+ messages in thread
* [PATCH v3 3/4] diffcore-rename: treat a rename_limit of 0 as unlimited 2021-07-15 0:45 ` [PATCH v3 0/4] Rename/copy limits -- docs, warnings, and new defaults Elijah Newren via GitGitGadget 2021-07-15 0:45 ` [PATCH v3 1/4] diff: correct warning message when renameLimit exceeded Elijah Newren via GitGitGadget 2021-07-15 0:45 ` [PATCH v3 2/4] doc: clarify documentation for rename/copy limits Elijah Newren via GitGitGadget @ 2021-07-15 0:45 ` Elijah Newren via GitGitGadget 2021-07-15 23:17 ` Junio C Hamano 2021-07-15 0:45 ` [PATCH v3 4/4] Bump rename limit defaults (yet again) Elijah Newren via GitGitGadget ` (2 subsequent siblings) 5 siblings, 1 reply; 44+ messages in thread From: Elijah Newren via GitGitGadget @ 2021-07-15 0:45 UTC (permalink / raw) To: git Cc: Bagas Sanjaya, Elijah Newren, Eric Sunshine, Derrick Stolee, Jeff King, Ævar Arnfjörð Bjarmason, Elijah Newren, Elijah Newren From: Elijah Newren <newren@gmail.com> In commit 89973554b52c (diffcore-rename: make diff-tree -l0 mean -l<large>, 2017-11-29), -l0 was given a special magical "large" value, but one which was not large enough for some uses (as can be seen from commit 9f7e4bfa3b6d (diff: remove silent clamp of renameLimit, 2017-11-13). Make 0 (or a negative value) be treated as unlimited instead and update the documentation to mention this. Signed-off-by: Elijah Newren <newren@gmail.com> --- Documentation/diff-options.txt | 1 + diffcore-rename.c | 2 +- 2 files changed, 2 insertions(+), 1 deletion(-) diff --git a/Documentation/diff-options.txt b/Documentation/diff-options.txt index 58acfff9289..0aebe832057 100644 --- a/Documentation/diff-options.txt +++ b/Documentation/diff-options.txt @@ -598,6 +598,7 @@ of a delete/create pair. prevents the exhaustive portion of rename/copy detection from running if the number of source/destination files involved exceeds the specified number. Defaults to diff.renameLimit. + Note that a value of 0 is treated as unlimited. ifndef::git-format-patch[] --diff-filter=[(A|C|D|M|R|T|U|X|B)...[*]]:: diff --git a/diffcore-rename.c b/diffcore-rename.c index 3375e24659e..513ba7b05f1 100644 --- a/diffcore-rename.c +++ b/diffcore-rename.c @@ -1021,7 +1021,7 @@ static int too_many_rename_candidates(int num_destinations, int num_sources, * memory for the matrix anyway. */ if (rename_limit <= 0) - rename_limit = 32767; + return 0; /* treat as unlimited */ if (st_mult(num_destinations, num_sources) <= st_mult(rename_limit, rename_limit)) return 0; -- gitgitgadget ^ permalink raw reply related [flat|nested] 44+ messages in thread
* Re: [PATCH v3 3/4] diffcore-rename: treat a rename_limit of 0 as unlimited 2021-07-15 0:45 ` [PATCH v3 3/4] diffcore-rename: treat a rename_limit of 0 as unlimited Elijah Newren via GitGitGadget @ 2021-07-15 23:17 ` Junio C Hamano 0 siblings, 0 replies; 44+ messages in thread From: Junio C Hamano @ 2021-07-15 23:17 UTC (permalink / raw) To: Elijah Newren via GitGitGadget Cc: git, Bagas Sanjaya, Elijah Newren, Eric Sunshine, Derrick Stolee, Jeff King, Ævar Arnfjörð Bjarmason "Elijah Newren via GitGitGadget" <gitgitgadget@gmail.com> writes: > diff --git a/diffcore-rename.c b/diffcore-rename.c > index 3375e24659e..513ba7b05f1 100644 > --- a/diffcore-rename.c > +++ b/diffcore-rename.c > @@ -1021,7 +1021,7 @@ static int too_many_rename_candidates(int num_destinations, int num_sources, > * memory for the matrix anyway. > */ > if (rename_limit <= 0) > - rename_limit = 32767; > + return 0; /* treat as unlimited */ OK. As this short-cuts, the impact a non-positive value may have on all the remainder of the function (like limit squared must be larger than the matrix) would not have to be compensated for. Very simple and clean. > if (st_mult(num_destinations, num_sources) > <= st_mult(rename_limit, rename_limit)) > return 0; ^ permalink raw reply [flat|nested] 44+ messages in thread
* [PATCH v3 4/4] Bump rename limit defaults (yet again) 2021-07-15 0:45 ` [PATCH v3 0/4] Rename/copy limits -- docs, warnings, and new defaults Elijah Newren via GitGitGadget ` (2 preceding siblings ...) 2021-07-15 0:45 ` [PATCH v3 3/4] diffcore-rename: treat a rename_limit of 0 as unlimited Elijah Newren via GitGitGadget @ 2021-07-15 0:45 ` Elijah Newren via GitGitGadget 2021-07-15 13:36 ` [PATCH v3 0/4] Rename/copy limits -- docs, warnings, and new defaults Derrick Stolee 2021-07-15 23:20 ` Junio C Hamano 5 siblings, 0 replies; 44+ messages in thread From: Elijah Newren via GitGitGadget @ 2021-07-15 0:45 UTC (permalink / raw) To: git Cc: Bagas Sanjaya, Elijah Newren, Eric Sunshine, Derrick Stolee, Jeff King, Ævar Arnfjörð Bjarmason, Elijah Newren, Elijah Newren From: Elijah Newren <newren@gmail.com> These were last bumped in commit 92c57e5c1d29 (bump rename limit defaults (again), 2011-02-19), and were bumped both because processors had gotten faster, and because people were getting ugly merges that caused problems and reporting it to the mailing list (suggesting that folks were willing to spend more time waiting). Since that time: * Linus has continued recommending kernel folks to set diff.renameLimit=0 (maps to 32767, currently) * Folks with repositories with lots of renames were happy to set merge.renameLimit above 32767, once the code supported that, to get correct cherry-picks * Processors have gotten faster * It has been discovered that the timing methodology used last time probably used too large example files. The last point is probably worth explaining a bit more: * The "average" file size used appears to have been average blob size in the linux kernel history at the time (probably v2.6.25 or something close to it). * Since bigger files are modified more frequently, such a computation weights towards larger files. * Larger files may be more likely to be modified over time, but are not more likely to be renamed -- the mean and median blob size within a tree are a bit higher than the mean and median of blob sizes in the history leading up to that version for the linux kernel. * The mean blob size in v2.6.25 was half the average blob size in history leading to that point * The median blob size in v2.6.25 was about 40% of the mean blob size in v2.6.25. * Since the mean blob size is more than double the median blob size, any file as big as the mean will not be compared to any files of median size or less (because they'd be more than 50% dissimilar). * Since it is the number of files compared that provides the O(n^2) behavior, median-sized files should matter more than mean-sized ones. The combined effect of the above is that the file size used in past calculations was likely about 5x too large. Combine that with a CPU performance improvement of ~30%, and we can increase the limits by a factor of sqrt(5/(1-.3)) = 2.67, while keeping the original stated time limits. Keeping the same approximate time limit probably makes sense for diff.renameLimit (there is no progress feedback in e.g. git log -p), but the experience above suggests merge.renameLimit could be extended significantly. In fact, it probably would make sense to have an unlimited default setting for merge.renameLimit, but that would likely need to be coupled with changes to how progress is displayed. (See https://lore.kernel.org/git/YOx+Ok%2FEYvLqRMzJ@coredump.intra.peff.net/ for details in that area.) For now, let's just bump the approximate time limit from 10s to 1m. (Note: We do not want to use actual time limits, because getting results that depend on how loaded your system is that day feels bad, and because we don't discover that we won't get all the renames until after we've put in a lot of work rather than just upfront telling the user there are too many files involved.) Using the original time limit of 2s for diff.renameLimit, and bumping merge.renameLimit from 10s to 60s, I found the following timings using the simple script at the end of this commit message (on an AWS c5.xlarge which reports as "Intel(R) Xeon(R) Platinum 8124M CPU @ 3.00GHz"): N Timing 1300 1.995s 7100 59.973s So let's round down to nice even numbers and bump the limits from 400->1000, and from 1000->7000. Here is the measure_rename_perf script (adapted from https://lore.kernel.org/git/20080211113516.GB6344@coredump.intra.peff.net/ in particular to avoid triggering the linear handling from basename-guided rename detection): #!/bin/bash n=$1; shift rm -rf repo mkdir repo && cd repo git init -q -b main mkdata() { mkdir $1 for i in `seq 1 $2`; do (sed "s/^/$i /" <../sample echo tag: $1 ) >$1/$i done } mkdata initial $n git add . git commit -q -m initial mkdata new $n git add . cd new for i in *; do git mv $i $i.renamed; done cd .. git rm -q -rf initial git commit -q -m new time git diff-tree -M -l0 --summary HEAD^ HEAD Signed-off-by: Elijah Newren <newren@gmail.com> --- Documentation/config/diff.txt | 2 +- Documentation/config/merge.txt | 2 +- diff.c | 2 +- merge-ort.c | 2 +- merge-recursive.c | 2 +- 5 files changed, 5 insertions(+), 5 deletions(-) diff --git a/Documentation/config/diff.txt b/Documentation/config/diff.txt index d1b5cfa3542..32f84838ac1 100644 --- a/Documentation/config/diff.txt +++ b/Documentation/config/diff.txt @@ -120,7 +120,7 @@ diff.orderFile:: diff.renameLimit:: The number of files to consider in the exhaustive portion of copy/rename detection; equivalent to the 'git diff' option - `-l`. If not set, the default value is currently 400. This + `-l`. If not set, the default value is currently 1000. This setting has no effect if rename detection is turned off. diff.renames:: diff --git a/Documentation/config/merge.txt b/Documentation/config/merge.txt index 7cd6d7883b6..e27cc639447 100644 --- a/Documentation/config/merge.txt +++ b/Documentation/config/merge.txt @@ -37,7 +37,7 @@ merge.renameLimit:: rename detection during a merge. If not specified, defaults to the value of diff.renameLimit. If neither merge.renameLimit nor diff.renameLimit are specified, - currently defaults to 1000. This setting has no effect if + currently defaults to 7000. This setting has no effect if rename detection is turned off. merge.renames:: diff --git a/diff.c b/diff.c index 2454e34cf6d..0244a371d32 100644 --- a/diff.c +++ b/diff.c @@ -35,7 +35,7 @@ static int diff_detect_rename_default; static int diff_indent_heuristic = 1; -static int diff_rename_limit_default = 400; +static int diff_rename_limit_default = 1000; static int diff_suppress_blank_empty; static int diff_use_color_default = -1; static int diff_color_moved_default; diff --git a/merge-ort.c b/merge-ort.c index b954f7184a5..8a84375e940 100644 --- a/merge-ort.c +++ b/merge-ort.c @@ -2558,7 +2558,7 @@ static void detect_regular_renames(struct merge_options *opt, diff_opts.detect_rename = DIFF_DETECT_RENAME; diff_opts.rename_limit = opt->rename_limit; if (opt->rename_limit <= 0) - diff_opts.rename_limit = 1000; + diff_opts.rename_limit = 7000; diff_opts.rename_score = opt->rename_score; diff_opts.show_rename_progress = opt->show_rename_progress; diff_opts.output_format = DIFF_FORMAT_NO_OUTPUT; diff --git a/merge-recursive.c b/merge-recursive.c index 4327e0cfa33..f19f8cc37bd 100644 --- a/merge-recursive.c +++ b/merge-recursive.c @@ -1879,7 +1879,7 @@ static struct diff_queue_struct *get_diffpairs(struct merge_options *opt, */ if (opts.detect_rename > DIFF_DETECT_RENAME) opts.detect_rename = DIFF_DETECT_RENAME; - opts.rename_limit = (opt->rename_limit >= 0) ? opt->rename_limit : 1000; + opts.rename_limit = (opt->rename_limit >= 0) ? opt->rename_limit : 7000; opts.rename_score = opt->rename_score; opts.show_rename_progress = opt->show_rename_progress; opts.output_format = DIFF_FORMAT_NO_OUTPUT; -- gitgitgadget ^ permalink raw reply related [flat|nested] 44+ messages in thread
* Re: [PATCH v3 0/4] Rename/copy limits -- docs, warnings, and new defaults 2021-07-15 0:45 ` [PATCH v3 0/4] Rename/copy limits -- docs, warnings, and new defaults Elijah Newren via GitGitGadget ` (3 preceding siblings ...) 2021-07-15 0:45 ` [PATCH v3 4/4] Bump rename limit defaults (yet again) Elijah Newren via GitGitGadget @ 2021-07-15 13:36 ` Derrick Stolee 2021-07-15 23:20 ` Junio C Hamano 5 siblings, 0 replies; 44+ messages in thread From: Derrick Stolee @ 2021-07-15 13:36 UTC (permalink / raw) To: Elijah Newren via GitGitGadget, git Cc: Bagas Sanjaya, Elijah Newren, Eric Sunshine, Jeff King, Ævar Arnfjörð Bjarmason On 7/14/2021 8:45 PM, Elijah Newren via GitGitGadget wrote: > Fix a few small issues with documentation and warnings around the limits for > the quadratic portion of rename (©) detection, and bump the default > limits. > > Discussion on bumping the limits can be found at [1]. Although it appears we > generally agree we could switch to an unlimited setting for > merge.renameLimit, that would require some changes to progress bars to > notify users how to take action once things start taking a while. So, for > now, just bump the limits. I briefly read through the responses to v2, and it looks like you updated your v3 appropriately. I think the changes since v2 also satisfy my comments on v1. This version LGTM. Thanks, -Stolee ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH v3 0/4] Rename/copy limits -- docs, warnings, and new defaults 2021-07-15 0:45 ` [PATCH v3 0/4] Rename/copy limits -- docs, warnings, and new defaults Elijah Newren via GitGitGadget ` (4 preceding siblings ...) 2021-07-15 13:36 ` [PATCH v3 0/4] Rename/copy limits -- docs, warnings, and new defaults Derrick Stolee @ 2021-07-15 23:20 ` Junio C Hamano 5 siblings, 0 replies; 44+ messages in thread From: Junio C Hamano @ 2021-07-15 23:20 UTC (permalink / raw) To: Elijah Newren via GitGitGadget Cc: git, Bagas Sanjaya, Elijah Newren, Eric Sunshine, Derrick Stolee, Jeff King, Ævar Arnfjörð Bjarmason "Elijah Newren via GitGitGadget" <gitgitgadget@gmail.com> writes: > Fix a few small issues with documentation and warnings around the limits for > the quadratic portion of rename (©) detection, and bump the default > limits. > > Discussion on bumping the limits can be found at [1]. Although it appears we > generally agree we could switch to an unlimited setting for > merge.renameLimit, that would require some changes to progress bars to > notify users how to take action once things start taking a while. So, for > now, just bump the limits. > > [1] > https://lore.kernel.org/git/CABPp-BFzp3TCWiF1QAVSfywDLYrz=GOQszVM-sw5p0rSB8RWvw@mail.gmail.com/T/#u > > Changes since v2: > > * Change the meaning of "0" to actually mean unlimited, and modify the > documentation to mention that. > * Added 'currently' to descriptions to make it clear the defaults are > likely to change (again). > * Added a brief explanation of the exhaustive portion of rename detection, > as requested by Ævar (though, honestly, I think the thing that actually > helps people pick values for the limit is the warning that tells people > that rename detection was skipped and how high they need to set the limit > if they want to redo the operation and get renames). Looks sensible. The final phrase chosen ("exhaustive" part) does sound easy-to-understand, too. ^ permalink raw reply [flat|nested] 44+ messages in thread
end of thread, other threads:[~2021-07-15 23:20 UTC | newest] Thread overview: 44+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2021-07-11 0:46 [PATCH 0/3] Improve the documentation and warnings dealing with rename/copy limits Elijah Newren via GitGitGadget 2021-07-11 0:46 ` [PATCH 1/3] doc: clarify documentation for " Elijah Newren via GitGitGadget 2021-07-11 4:37 ` Bagas Sanjaya 2021-07-11 4:52 ` Elijah Newren 2021-07-12 15:03 ` Derrick Stolee 2021-07-12 21:27 ` Junio C Hamano 2021-07-11 0:46 ` [PATCH 2/3] doc: document the special handling of -l0 Elijah Newren via GitGitGadget 2021-07-11 4:54 ` Eric Sunshine 2021-07-11 4:54 ` Elijah Newren 2021-07-11 0:46 ` [PATCH 3/3] diff: correct warning message when renameLimit exceeded Elijah Newren via GitGitGadget 2021-07-12 15:09 ` Derrick Stolee 2021-07-12 18:13 ` Elijah Newren 2021-07-14 0:47 ` Junio C Hamano 2021-07-14 1:06 ` Elijah Newren 2021-07-14 1:10 ` Junio C Hamano 2021-07-14 1:22 ` Elijah Newren 2021-07-14 5:17 ` Junio C Hamano 2021-07-14 15:09 ` Elijah Newren 2021-07-14 1:12 ` [PATCH v2 0/4] Rename/copy limits -- docs, warnings, and new defaults Elijah Newren via GitGitGadget 2021-07-14 1:12 ` [PATCH v2 1/4] diff: correct warning message when renameLimit exceeded Elijah Newren via GitGitGadget 2021-07-14 1:12 ` [PATCH v2 2/4] doc: clarify documentation for rename/copy limits Elijah Newren via GitGitGadget 2021-07-14 7:37 ` Ævar Arnfjörð Bjarmason 2021-07-14 16:30 ` Elijah Newren 2021-07-14 22:08 ` Ævar Arnfjörð Bjarmason 2021-07-14 22:56 ` Elijah Newren 2021-07-14 1:12 ` [PATCH v2 3/4] doc: document the special handling of -l0 Elijah Newren via GitGitGadget 2021-07-14 16:45 ` Jeff King 2021-07-14 17:17 ` Elijah Newren 2021-07-14 17:33 ` Jeff King 2021-07-14 19:32 ` Elijah Newren 2021-07-14 1:12 ` [PATCH v2 4/4] Bump rename limit defaults (yet again) Elijah Newren via GitGitGadget 2021-07-14 16:43 ` Jeff King 2021-07-14 17:32 ` Elijah Newren 2021-07-14 17:57 ` Jeff King 2021-07-14 20:03 ` Elijah Newren 2021-07-14 20:47 ` Jeff King 2021-07-15 0:45 ` [PATCH v3 0/4] Rename/copy limits -- docs, warnings, and new defaults Elijah Newren via GitGitGadget 2021-07-15 0:45 ` [PATCH v3 1/4] diff: correct warning message when renameLimit exceeded Elijah Newren via GitGitGadget 2021-07-15 0:45 ` [PATCH v3 2/4] doc: clarify documentation for rename/copy limits Elijah Newren via GitGitGadget 2021-07-15 0:45 ` [PATCH v3 3/4] diffcore-rename: treat a rename_limit of 0 as unlimited Elijah Newren via GitGitGadget 2021-07-15 23:17 ` Junio C Hamano 2021-07-15 0:45 ` [PATCH v3 4/4] Bump rename limit defaults (yet again) Elijah Newren via GitGitGadget 2021-07-15 13:36 ` [PATCH v3 0/4] Rename/copy limits -- docs, warnings, and new defaults Derrick Stolee 2021-07-15 23:20 ` Junio C Hamano
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.