From: "Elijah Newren via GitGitGadget" <gitgitgadget@gmail.com>
To: git@vger.kernel.org
Cc: "Derrick Stolee" <dstolee@microsoft.com>,
	"Ævar Arnfjörð Bjarmason" <avarab@gmail.com>,
	"Jonathan Tan" <jonathantanmy@google.com>,
	"Taylor Blau" <me@ttaylorr.com>,
	"Elijah Newren" <newren@gmail.com>
Subject: [PATCH 8/8] diffcore-rename: determine which relevant_sources are no longer relevant
Date: Sat, 13 Mar 2021 22:22:08 +0000
Message-ID: <495c10937b7f2b23b7d2a52c254e02ae9ce810a1.1615674128.git.gitgitgadget@gmail.com>
In-Reply-To: <pull.853.git.1615674128.gitgitgadget@gmail.com>

From: Elijah Newren <newren@gmail.com>

As noted a few commits ago ("diffcore-rename: only compute
dir_rename_count for relevant directories"), when a source file rename
is used as part of directory rename detection, we need to increment
counts for each ancestor directory in dirs_removed with value
RELEVANT_FOR_SELF.  However, since a later commit ("diffcore-rename:
check if we have enough renames for directories early on"), we may have
already downgraded all relevant ancestor directories from
RELEVANT_FOR_SELF to RELEVANT_FOR_ANCESTOR.

For a given file, if no ancestor directory is found in dirs_removed with
a value of RELEVANT_FOR_SELF, then we can downgrade
relevant_sources[PATH] from RELEVANT_LOCATION to RELEVANT_NO_MORE.  This
means we can skip detecting a rename for that particular path (and for
any other paths in the same directory that are only relevant for their
location).
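
The ancestor walk described above can be sketched in isolation as
follows.  This is only an illustration under stated assumptions, not the
code in this patch: struct dir_entry, lookup_dir(), parent_dir(),
source_still_needed() and prune_sources() are hypothetical stand-ins
(the patch itself uses strintmap_get() and get_dirname()), the numeric
enum values are assumed, and every source is assumed to be relevant only
for its location.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Names mirror the patch; the numeric values are an assumption. */
enum dir_relevance {
	NOT_RELEVANT = 0,
	RELEVANT_FOR_ANCESTOR = 1,
	RELEVANT_FOR_SELF = 2
};

/* Hypothetical stand-in for the dirs_removed map. */
struct dir_entry {
	const char *path;
	enum dir_relevance relevance;
};

/* Linear lookup; directories that are not listed count as NOT_RELEVANT. */
static enum dir_relevance lookup_dir(const struct dir_entry *dirs,
				     size_t nr, const char *path)
{
	for (size_t i = 0; i < nr; i++)
		if (!strcmp(dirs[i].path, path))
			return dirs[i].relevance;
	return NOT_RELEVANT;
}

/* Newly allocated parent directory of path; "" for toplevel paths. */
static char *parent_dir(const char *path)
{
	const char *slash = strrchr(path, '/');
	size_t len = slash ? (size_t)(slash - path) : 0;
	char *dir = malloc(len + 1);

	if (!dir)
		abort();
	memcpy(dir, path, len);
	dir[len] = '\0';
	return dir;
}

/* Return 1 if some ancestor directory is still RELEVANT_FOR_SELF. */
static int source_still_needed(const struct dir_entry *dirs, size_t nr,
			       const char *path)
{
	char *dir = parent_dir(path);
	int needed = 0;

	for (;;) {
		enum dir_relevance res = lookup_dir(dirs, nr, dir);
		char *up;

		if (res == NOT_RELEVANT)
			break;		/* not found or irrelevant */
		if (res == RELEVANT_FOR_SELF) {
			needed = 1;	/* cannot drop this source */
			break;
		}
		if (!*dir)
			break;		/* reached the toplevel */
		/* RELEVANT_FOR_ANCESTOR: keep searching upwards */
		up = parent_dir(dir);
		free(dir);
		dir = up;
	}
	free(dir);
	return needed;
}

/* Compact src[] in place, keeping needed sources; returns the new count. */
static size_t prune_sources(const char **src, size_t nr,
			    const struct dir_entry *dirs, size_t dirs_nr)
{
	size_t kept = 0;

	for (size_t i = 0; i < nr; i++)
		if (source_still_needed(dirs, dirs_nr, src[i]))
			src[kept++] = src[i];
	return kept;
}
```

In this sketch, a source is kept only when the walk reaches a
RELEVANT_FOR_SELF directory before it reaches an unlisted or irrelevant
one; everything else is compacted out of the array, so later rename
detection never sees it.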

For the testcases mentioned in commit 557ac0350d ("merge-ort: begin
performance work; instrument with trace2_region_* calls", 2020-10-28),
this change improves the performance as follows:

                            Before                  After
    no-renames:        5.680 s ±  0.096 s     5.665 s ±  0.129 s
    mega-renames:     13.812 s ±  0.162 s    11.435 s ±  0.158 s
    just-one-mega:   506.0  ms ±  3.9  ms   494.2  ms ±  6.1  ms

While this improvement looks rather modest for these testcases (because
all the previous optimizations were sufficient to nearly remove all time
spent in rename detection already), consider this alternative testcase
tweaked from the ones in commit 557ac0350d as follows:

    <Same initial setup as commit 557ac0350d, then...>
    $ git switch -c add-empty-file v5.5
    $ >drivers/gpu/drm/i915/new-empty-file
    $ git add drivers/gpu/drm/i915/new-empty-file
    $ git commit -m "new file"
    $ git switch 5.4-rename
    $ git cherry-pick --strategy=ort add-empty-file

For this testcase, we see the following improvement:

                            Before                  After
    pick-empty:        1.936 s ±  0.024 s     688.1 ms ±  4.2 ms

So roughly a factor of 3 speedup (1.936 s / 0.688 s ≈ 2.8).  At $DAYJOB,
there was a particular repository and cherry-pick that inspired this
optimization; for that case I saw a speedup factor of 7.

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 diffcore-rename.c | 51 ++++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 50 insertions(+), 1 deletion(-)

diff --git a/diffcore-rename.c b/diffcore-rename.c
index 9844cd48788e..7cc24592617e 100644
--- a/diffcore-rename.c
+++ b/diffcore-rename.c
@@ -1129,7 +1129,7 @@ static void handle_early_known_dir_renames(struct dir_rename_info *info,
 	 * a majority.
 	 */
 
-	int i;
+	int i, new_num_src;
 	struct hashmap_iter iter;
 	struct strmap_entry *entry;
 
@@ -1193,6 +1193,55 @@ static void handle_early_known_dir_renames(struct dir_rename_info *info,
 				      RELEVANT_FOR_ANCESTOR);
 		}
 	}
+
+	for (i = 0, new_num_src = 0; i < rename_src_nr; i++) {
+		struct diff_filespec *one = rename_src[i].p->one;
+		int val;
+
+		val = strintmap_get(relevant_sources, one->path);
+
+		/*
+		 * sources that were not found in relevant_sources should
+		 * have already been removed by a prior call to
+		 * remove_unneeded_paths_from_src()
+		 */
+		assert(val != -1);
+
+		if (val == RELEVANT_LOCATION) {
+			int removable = 1;
+			char *dir = get_dirname(one->path);
+			while (1) {
+				char *freeme = dir;
+				int res = strintmap_get(dirs_removed, dir);
+
+				/* Quit if not found or irrelevant */
+				if (res == NOT_RELEVANT)
+					break;
+				/* If RELEVANT_FOR_SELF, can't remove */
+				if (res == RELEVANT_FOR_SELF) {
+					removable = 0;
+					break;
+				}
+				/* Else continue searching upwards */
+				assert(res == RELEVANT_FOR_ANCESTOR);
+				dir = get_dirname(dir);
+				free(freeme);
+			}
+			free(dir);
+			if (removable) {
+				strintmap_set(relevant_sources, one->path,
+					      RELEVANT_NO_MORE);
+				continue;
+			}
+		}
+
+		if (new_num_src < i)
+			memcpy(&rename_src[new_num_src], &rename_src[i],
+			       sizeof(struct diff_rename_src));
+		new_num_src++;
+	}
+
+	rename_src_nr = new_num_src;
 }
 
 void diffcore_rename_extended(struct diff_options *options,
-- 
gitgitgadget

Thread overview: 14+ messages
2021-03-13 22:22 [PATCH 0/8] Optimization batch 10: avoid detecting even more irrelevant renames Elijah Newren via GitGitGadget
2021-03-13 22:22 ` [PATCH 1/8] diffcore-rename: take advantage of "majority rules" to skip more renames Elijah Newren via GitGitGadget
2021-03-13 22:22 ` [PATCH 2/8] merge-ort, diffcore-rename: tweak dirs_removed and relevant_source type Elijah Newren via GitGitGadget
2021-03-13 22:22 ` [PATCH 3/8] merge-ort: record the reason that we want a rename for a directory Elijah Newren via GitGitGadget
2021-03-15 14:31   ` Derrick Stolee
2021-03-15 15:27     ` Elijah Newren
2021-03-28  2:01       ` Junio C Hamano
2021-03-13 22:22 ` [PATCH 4/8] diffcore-rename: only compute dir_rename_count for relevant directories Elijah Newren via GitGitGadget
2021-03-13 22:22 ` [PATCH 5/8] diffcore-rename: check if we have enough renames for directories early on Elijah Newren via GitGitGadget
2021-03-13 22:22 ` [PATCH 6/8] diffcore-rename: add computation of number of unknown renames Elijah Newren via GitGitGadget
2021-03-13 22:22 ` [PATCH 7/8] merge-ort: record the reason that we want a rename for a file Elijah Newren via GitGitGadget
2021-03-13 22:22 ` Elijah Newren via GitGitGadget [this message]
2021-03-15 15:21 ` [PATCH 0/8] Optimization batch 10: avoid detecting even more irrelevant renames Derrick Stolee
2021-03-15 15:34   ` Elijah Newren
