git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: "Elijah Newren via GitGitGadget" <gitgitgadget@gmail.com>
To: git@vger.kernel.org
Cc: Derrick Stolee <dstolee@microsoft.com>,
	Jonathan Tan <jonathantanmy@google.com>,
	Taylor Blau <me@ttaylorr.com>, Elijah Newren <newren@gmail.com>,
	Derrick Stolee <stolee@gmail.com>,
	Elijah Newren <newren@gmail.com>,
	Elijah Newren <newren@gmail.com>
Subject: [PATCH v3 06/13] merge-ort: add data structures for in-memory caching of rename detection
Date: Thu, 20 May 2021 06:09:34 +0000	[thread overview]
Message-ID: <304a2768c97b190ce982ea4090f461268b1e0b10.1621490982.git.gitgitgadget@gmail.com> (raw)
In-Reply-To: <pull.859.v3.git.1621490982.gitgitgadget@gmail.com>

From: Elijah Newren <newren@gmail.com>

When there are many renames between the old base of a series of commits
and the new base for a series of commits, the sequence of merges
employed to transplant those commits (from a cherry-pick or rebase
operation) will repeatedly detect the exact same renames.  This is
wasted effort.

Add data structures which will be used to cache rename detection
results, along with the initialization and deallocation of these data
structures.  Future commits will populate these caches, detect the
appropriate circumstances when they can be used, and employ them to
avoid re-detecting the same renames repeatedly.

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 merge-ort.c | 53 +++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 53 insertions(+)

diff --git a/merge-ort.c b/merge-ort.c
index b1795d838eaf..709445e817ad 100644
--- a/merge-ort.c
+++ b/merge-ort.c
@@ -140,6 +140,48 @@ struct rename_info {
 	int callback_data_nr, callback_data_alloc;
 	char *callback_data_traverse_path;
 
+	/*
+	 * cached_pairs: Caching of renames and deletions.
+	 *
+	 * These are mappings recording renames and deletions of individual
+	 * files (not directories).  They are thus a map from an old
+	 * filename to either NULL (for deletions) or a new filename (for
+	 * renames).
+	 */
+	struct strmap cached_pairs[3];
+
+	/*
+	 * cached_target_names: just the destinations from cached_pairs
+	 *
+	 * We sometimes want a fast lookup to determine if a given filename
+	 * is one of the destinations in cached_pairs.  cached_target_names
+	 * is thus duplicative information, but it provides a fast lookup.
+	 */
+	struct strset cached_target_names[3];
+
+	/*
+	 * cached_irrelevant: Caching of rename_sources that aren't relevant.
+	 *
+	 * If we try to detect a rename for a source path and succeed, it's
+	 * part of a rename.  If we try to detect a rename for a source path
+	 * and fail, then it's a delete.  If we do not try to detect a rename
+	 * for a path, then we don't know if it's a rename or a delete.  If
+	 * merge-ort doesn't think the path is relevant, then we just won't
+	 * cache anything for that path.  But there's a slight problem in
+	 * that merge-ort can think a path is RELEVANT_LOCATION, but due to
+	 * commit 9bd342137e ("diffcore-rename: determine which
+	 * relevant_sources are no longer relevant", 2021-03-13),
+	 * diffcore-rename can downgrade the path to RELEVANT_NO_MORE.  To
+	 * avoid excessive calls to diffcore_rename_extended() we still need
+	 * to cache such paths, though we cannot record them as either
+	 * renames or deletes.  So we cache them here as a "turned out to be
+	 * irrelevant *for this commit*" as they are often also irrelevant
+	 * for subsequent commits, though we will have to do some extra
+	 * checking to see whether such paths become relevant for rename
+	 * detection when cherry-picking/rebasing subsequent commits.
+	 */
+	struct strset cached_irrelevant[3];
+
 	/*
 	 * needed_limit: value needed for inexact rename detection to run
 	 *
@@ -382,6 +424,8 @@ static void clear_or_reinit_internal_opts(struct merge_options_internal *opti,
 		reinitialize ? strmap_partial_clear : strmap_clear;
 	void (*strintmap_func)(struct strintmap *) =
 		reinitialize ? strintmap_partial_clear : strintmap_clear;
+	void (*strset_func)(struct strset *) =
+		reinitialize ? strset_partial_clear : strset_clear;
 
 	/*
 	 * We marked opti->paths with strdup_strings = 0, so that we
@@ -425,6 +469,9 @@ static void clear_or_reinit_internal_opts(struct merge_options_internal *opti,
 		strmap_func(&renames->dir_renames[i], 0);
 
 		strintmap_func(&renames->relevant_sources[i]);
+		strset_func(&renames->cached_target_names[i]);
+		strmap_func(&renames->cached_pairs[i], 1);
+		strset_func(&renames->cached_irrelevant[i]);
 	}
 
 	if (!reinitialize) {
@@ -3667,6 +3714,12 @@ static void merge_start(struct merge_options *opt, struct merge_result *result)
 					 NULL, 0);
 		strintmap_init_with_options(&renames->relevant_sources[i],
 					    0, NULL, 0);
+		strmap_init_with_options(&renames->cached_pairs[i],
+					 NULL, 1);
+		strset_init_with_options(&renames->cached_irrelevant[i],
+					 NULL, 1);
+		strset_init_with_options(&renames->cached_target_names[i],
+					 NULL, 0);
 	}
 
 	/*
-- 
gitgitgadget


  parent reply	other threads:[~2021-05-20  6:10 UTC|newest]

Thread overview: 61+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-03-24 21:32 [PATCH 0/7] Optimization batch 11: avoid repeatedly detecting same renames Elijah Newren via GitGitGadget
2021-03-24 21:32 ` [PATCH 1/7] merge-ort: add data structures for in-memory caching of rename detection Elijah Newren via GitGitGadget
2021-03-24 21:32 ` [PATCH 2/7] merge-ort: populate caches of rename detection results Elijah Newren via GitGitGadget
2021-03-24 21:32 ` [PATCH 3/7] merge-ort: add code to check for whether cached renames can be reused Elijah Newren via GitGitGadget
2021-03-24 21:32 ` [PATCH 4/7] merge-ort: avoid accidental API mis-use Elijah Newren via GitGitGadget
2021-03-24 21:32 ` [PATCH 5/7] merge-ort: preserve cached renames for the appropriate side Elijah Newren via GitGitGadget
2021-03-24 21:32 ` [PATCH 6/7] merge-ort: add helper functions for using cached renames Elijah Newren via GitGitGadget
2021-03-24 21:32 ` [PATCH 7/7] merge-ort, diffcore-rename: employ cached renames when possible Elijah Newren via GitGitGadget
2021-03-24 22:04 ` [PATCH 0/7] Optimization batch 11: avoid repeatedly detecting same renames Junio C Hamano
2021-03-24 23:25   ` Elijah Newren
2021-03-25 18:59     ` Junio C Hamano
2021-03-29 22:34       ` Elijah Newren
2021-03-30 12:07         ` Derrick Stolee
2021-05-04  2:12 ` [PATCH v2 00/13] " Elijah Newren via GitGitGadget
2021-05-04  2:12   ` [PATCH v2 01/13] t6423: rename file within directory that other side renamed Elijah Newren via GitGitGadget
2021-05-04  2:12   ` [PATCH v2 02/13] Documentation/technical: describe remembering renames optimization Elijah Newren via GitGitGadget
2021-05-04  2:12   ` [PATCH v2 03/13] fast-rebase: change assert() to BUG() Elijah Newren via GitGitGadget
2021-05-04  2:12   ` [PATCH v2 04/13] fast-rebase: write conflict state to working tree, index, and HEAD Elijah Newren via GitGitGadget
2021-05-17 13:32     ` Derrick Stolee
2021-05-18  3:42       ` Elijah Newren
2021-05-18 13:54         ` Derrick Stolee
2021-05-04  2:12   ` [PATCH v2 05/13] t6429: testcases for remembering renames Elijah Newren via GitGitGadget
2021-05-04  2:12   ` [PATCH v2 06/13] merge-ort: add data structures for in-memory caching of rename detection Elijah Newren via GitGitGadget
2021-05-17 13:41     ` Derrick Stolee
2021-05-18  3:55       ` Elijah Newren
2021-05-18 13:57         ` Derrick Stolee
2021-05-04  2:12   ` [PATCH v2 07/13] merge-ort: populate caches of rename detection results Elijah Newren via GitGitGadget
2021-05-17 13:51     ` Derrick Stolee
2021-05-20  0:48       ` Elijah Newren
2021-05-04  2:12   ` [PATCH v2 08/13] merge-ort: add code to check for whether cached renames can be reused Elijah Newren via GitGitGadget
2021-05-17 14:01     ` Derrick Stolee
2021-05-04  2:12   ` [PATCH v2 09/13] merge-ort: avoid accidental API mis-use Elijah Newren via GitGitGadget
2021-05-17 14:10     ` Derrick Stolee
2021-05-04  2:12   ` [PATCH v2 10/13] merge-ort: preserve cached renames for the appropriate side Elijah Newren via GitGitGadget
2021-05-04  2:12   ` [PATCH v2 11/13] merge-ort: add helper functions for using cached renames Elijah Newren via GitGitGadget
2021-05-04  2:12   ` [PATCH v2 12/13] merge-ort: handle interactions of caching and rename/rename(1to1) cases Elijah Newren via GitGitGadget
2021-05-17 14:16     ` Derrick Stolee
2021-05-04  2:12   ` [PATCH v2 13/13] merge-ort, diffcore-rename: employ cached renames when possible Elijah Newren via GitGitGadget
2021-05-17 14:23     ` Derrick Stolee
2021-05-20  0:36       ` Elijah Newren
2021-05-22 11:17         ` Derrick Stolee
2021-05-14 17:37   ` [PATCH v2 00/13] Optimization batch 11: avoid repeatedly detecting same renames Elijah Newren
2021-05-14 21:04     ` Derrick Stolee
2021-05-20  6:09   ` [PATCH v3 " Elijah Newren via GitGitGadget
2021-05-20  6:09     ` [PATCH v3 01/13] t6423: rename file within directory that other side renamed Elijah Newren via GitGitGadget
2021-05-20  6:09     ` [PATCH v3 02/13] Documentation/technical: describe remembering renames optimization Elijah Newren via GitGitGadget
2021-05-20 11:32       ` Bagas Sanjaya
2021-05-20 15:14         ` Kerry, Richard
2021-05-20 16:34         ` Elijah Newren
2021-05-20  6:09     ` [PATCH v3 03/13] fast-rebase: change assert() to BUG() Elijah Newren via GitGitGadget
2021-05-20  6:09     ` [PATCH v3 04/13] fast-rebase: write conflict state to working tree, index, and HEAD Elijah Newren via GitGitGadget
2021-05-20  6:09     ` [PATCH v3 05/13] t6429: testcases for remembering renames Elijah Newren via GitGitGadget
2021-05-20  6:09     ` Elijah Newren via GitGitGadget [this message]
2021-05-20  6:09     ` [PATCH v3 07/13] merge-ort: populate caches of rename detection results Elijah Newren via GitGitGadget
2021-05-20  6:09     ` [PATCH v3 08/13] merge-ort: add code to check for whether cached renames can be reused Elijah Newren via GitGitGadget
2021-05-20  6:09     ` [PATCH v3 09/13] merge-ort: avoid accidental API mis-use Elijah Newren via GitGitGadget
2021-05-20  6:09     ` [PATCH v3 10/13] merge-ort: preserve cached renames for the appropriate side Elijah Newren via GitGitGadget
2021-05-20  6:09     ` [PATCH v3 11/13] merge-ort: add helper functions for using cached renames Elijah Newren via GitGitGadget
2021-05-20  6:09     ` [PATCH v3 12/13] merge-ort: handle interactions of caching and rename/rename(1to1) cases Elijah Newren via GitGitGadget
2021-05-20  6:09     ` [PATCH v3 13/13] merge-ort, diffcore-rename: employ cached renames when possible Elijah Newren via GitGitGadget
2021-05-22 11:17     ` [PATCH v3 00/13] Optimization batch 11: avoid repeatedly detecting same renames Derrick Stolee

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=304a2768c97b190ce982ea4090f461268b1e0b10.1621490982.git.gitgitgadget@gmail.com \
    --to=gitgitgadget@gmail.com \
    --cc=dstolee@microsoft.com \
    --cc=git@vger.kernel.org \
    --cc=jonathantanmy@google.com \
    --cc=me@ttaylorr.com \
    --cc=newren@gmail.com \
    --cc=stolee@gmail.com \
    --subject='Re: [PATCH v3 06/13] merge-ort: add data structures for in-memory caching of rename detection' \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

This is a public inbox, see mirroring instructions
on how to clone and mirror all data and code used for this inbox