All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Elijah Newren via GitGitGadget" <gitgitgadget@gmail.com>
To: git@vger.kernel.org
Cc: Junio C Hamano <gitster@pobox.com>,
	Elijah Newren <newren@gmail.com>,
	Elijah Newren <newren@gmail.com>
Subject: [PATCH 17/18] merge-ort: fix a directory rename detection bug
Date: Thu, 07 Jan 2021 20:01:26 +0000	[thread overview]
Message-ID: <551878bd84da895e0899eba620553f6efe60e100.1610049687.git.gitgitgadget@gmail.com> (raw)
In-Reply-To: <pull.835.git.1610049687.gitgitgadget@gmail.com>

From: Elijah Newren <newren@gmail.com>

As noted in commit 902c521a35 ("t6423: more involved directory rename
test", 2020-10-15), when we have a case where

  * dir/subdir/ has several files
  * almost all files in dir/subdir/ are renamed to folder/subdir/
  * one of the files in dir/subdir/ is renamed to folder/subdir/newsubdir/
  * the other side of history (that doesn't do the renames) adds a
    new file to dir/subdir/

Then for the majority of the file renames, the directory rename of
   dir/subdir/ -> folder/subdir/
is actually not represented that way but as
   dir/ -> folder/
We also had one rename that was represented as
   dir/subdir/ -> folder/subdir/newsubdir/

Now, since there's a new file in dir/subdir/, where does it go?  Well,
there's only one rule for dir/subdir/, so the code previously noted that
this rule had the "majority" of the one "relevant" rename and thus
erroneously used it to place the file in folder/subdir/newsubdir/.  We
really want the heavy weight associated with dir/ -> folder/ to also be
treated as dir/subdir/ -> folder/subdir/, so that we correctly place the
file in folder/subdir/.

Add a bunch of logic to make sure that we use all relevant renamings in
directory rename detection.

Note that testcase 12f of t6423 still fails after this, but it gets
further than merge-recursive does.  There are some performance related
bits in that testcase (the region_enter messages) that do not yet
succeed, but the rest of the testcase works after this patch.
Subsequent patch series will fix up the performance side.

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 merge-ort.c | 198 +++++++++++++++++++++-------------------------------
 1 file changed, 81 insertions(+), 117 deletions(-)

diff --git a/merge-ort.c b/merge-ort.c
index 9e4ebd9c4d6..55a835f8b97 100644
--- a/merge-ort.c
+++ b/merge-ort.c
@@ -764,109 +764,6 @@ static char *apply_dir_rename(struct strmap_entry *rename_info,
 	return strbuf_detach(&new_path, NULL);
 }
 
-static void get_renamed_dir_portion(const char *old_path, const char *new_path,
-				    char **old_dir, char **new_dir)
-{
-	char *end_of_old, *end_of_new;
-
-	/* Default return values: NULL, meaning no rename */
-	*old_dir = NULL;
-	*new_dir = NULL;
-
-	/*
-	 * For
-	 *    "a/b/c/d/e/foo.c" -> "a/b/some/thing/else/e/foo.c"
-	 * the "e/foo.c" part is the same, we just want to know that
-	 *    "a/b/c/d" was renamed to "a/b/some/thing/else"
-	 * so, for this example, this function returns "a/b/c/d" in
-	 * *old_dir and "a/b/some/thing/else" in *new_dir.
-	 */
-
-	/*
-	 * If the basename of the file changed, we don't care.  We want
-	 * to know which portion of the directory, if any, changed.
-	 */
-	end_of_old = strrchr(old_path, '/');
-	end_of_new = strrchr(new_path, '/');
-
-	/*
-	 * If end_of_old is NULL, old_path wasn't in a directory, so there
-	 * could not be a directory rename (our rule elsewhere that a
-	 * directory which still exists is not considered to have been
-	 * renamed means the root directory can never be renamed -- because
-	 * the root directory always exists).
-	 */
-	if (end_of_old == NULL)
-		return; /* Note: *old_dir and *new_dir are still NULL */
-
-	/*
-	 * If new_path contains no directory (end_of_new is NULL), then we
-	 * have a rename of old_path's directory to the root directory.
-	 */
-	if (end_of_new == NULL) {
-		*old_dir = xstrndup(old_path, end_of_old - old_path);
-		*new_dir = xstrdup("");
-		return;
-	}
-
-	/* Find the first non-matching character traversing backwards */
-	while (*--end_of_new == *--end_of_old &&
-	       end_of_old != old_path &&
-	       end_of_new != new_path)
-		; /* Do nothing; all in the while loop */
-
-	/*
-	 * If both got back to the beginning of their strings, then the
-	 * directory didn't change at all, only the basename did.
-	 */
-	if (end_of_old == old_path && end_of_new == new_path &&
-	    *end_of_old == *end_of_new)
-		return; /* Note: *old_dir and *new_dir are still NULL */
-
-	/*
-	 * If end_of_new got back to the beginning of its string, and
-	 * end_of_old got back to the beginning of some subdirectory, then
-	 * we have a rename/merge of a subdirectory into the root, which
-	 * needs slightly special handling.
-	 *
-	 * Note: There is no need to consider the opposite case, with a
-	 * rename/merge of the root directory into some subdirectory
-	 * because as noted above the root directory always exists so it
-	 * cannot be considered to be renamed.
-	 */
-	if (end_of_new == new_path &&
-	    end_of_old != old_path && end_of_old[-1] == '/') {
-		*old_dir = xstrndup(old_path, --end_of_old - old_path);
-		*new_dir = xstrdup("");
-		return;
-	}
-
-	/*
-	 * We've found the first non-matching character in the directory
-	 * paths.  That means the current characters we were looking at
-	 * were part of the first non-matching subdir name going back from
-	 * the end of the strings.  Get the whole name by advancing both
-	 * end_of_old and end_of_new to the NEXT '/' character.  That will
-	 * represent the entire directory rename.
-	 *
-	 * The reason for the increment is cases like
-	 *    a/b/star/foo/whatever.c -> a/b/tar/foo/random.c
-	 * After dropping the basename and going back to the first
-	 * non-matching character, we're now comparing:
-	 *    a/b/s          and         a/b/
-	 * and we want to be comparing:
-	 *    a/b/star/      and         a/b/tar/
-	 * but without the pre-increment, the one on the right would stay
-	 * a/b/.
-	 */
-	end_of_old = strchr(++end_of_old, '/');
-	end_of_new = strchr(++end_of_new, '/');
-
-	/* Copy the old and new directories into *old_dir and *new_dir. */
-	*old_dir = xstrndup(old_path, end_of_old - old_path);
-	*new_dir = xstrndup(new_path, end_of_new - new_path);
-}
-
 static int path_in_way(struct strmap *paths, const char *path, unsigned side_mask)
 {
 	struct merged_info *mi = strmap_get(paths, path);
@@ -952,6 +849,14 @@ static char *handle_path_level_conflicts(struct merge_options *opt,
 	return new_path;
 }
 
+static void dirname_munge(char *filename)
+{
+	char *slash = strrchr(filename, '/');
+	if (!slash)
+		slash = filename;
+	*slash = '\0';
+}
+
 static void increment_count(struct strmap *dir_rename_count,
 			    char *old_dir,
 			    char *new_dir)
@@ -973,6 +878,76 @@ static void increment_count(struct strmap *dir_rename_count,
 	strintmap_incr(counts, new_dir, 1);
 }
 
+static void update_dir_rename_counts(struct strmap *dir_rename_count,
+				     struct strset *dirs_removed,
+				     const char *oldname,
+				     const char *newname)
+{
+	char *old_dir = xstrdup(oldname);
+	char *new_dir = xstrdup(newname);
+	char new_dir_first_char = new_dir[0];
+	int first_time_in_loop = 1;
+
+	while (1) {
+		dirname_munge(old_dir);
+		dirname_munge(new_dir);
+
+		/*
+		 * When renaming
+		 *   "a/b/c/d/e/foo.c" -> "a/b/some/thing/else/e/foo.c"
+		 * then this suggests that both
+		 *   a/b/c/d/e/ => a/b/some/thing/else/e/
+		 *   a/b/c/d/   => a/b/some/thing/else/
+		 * so we want to increment counters for both.  We do NOT,
+		 * however, also want to suggest that there was the following
+		 * rename:
+		 *   a/b/c/ => a/b/some/thing/
+		 * so we need to quit at that point.
+		 *
+		 * Note the when first_time_in_loop, we only strip off the
+		 * basename, and we don't care if that's different.
+		 */
+		if (!first_time_in_loop) {
+			char *old_sub_dir = strchr(old_dir, '\0')+1;
+			char *new_sub_dir = strchr(new_dir, '\0')+1;
+			if (!*new_dir) {
+				/*
+				 * Special case when renaming to root directory,
+				 * i.e. when new_dir == "".  In this case, we had
+				 * something like
+				 *    a/b/subdir => subdir
+				 * and so dirname_munge() sets things up so that
+				 *    old_dir = "a/b\0subdir\0"
+				 *    new_dir = "\0ubdir\0"
+				 * We didn't have a '/' to overwrite a '\0' onto
+				 * in new_dir, so we have to compare differently.
+				 */
+				if (new_dir_first_char != old_sub_dir[0] ||
+				    strcmp(old_sub_dir+1, new_sub_dir))
+					break;
+			} else {
+				if (strcmp(old_sub_dir, new_sub_dir))
+					break;
+			}
+		}
+
+		if (strset_contains(dirs_removed, old_dir))
+			increment_count(dir_rename_count, old_dir, new_dir);
+		else
+			break;
+
+		/* If we hit toplevel directory ("") for old or new dir, quit */
+		if (!*old_dir || !*new_dir)
+			break;
+
+		first_time_in_loop = 0;
+	}
+
+	/* Free resources we don't need anymore */
+	free(old_dir);
+	free(new_dir);
+}
+
 static void compute_rename_counts(struct diff_queue_struct *pairs,
 				  struct strmap *dir_rename_count,
 				  struct strset *dirs_removed)
@@ -980,19 +955,11 @@ static void compute_rename_counts(struct diff_queue_struct *pairs,
 	int i;
 
 	for (i = 0; i < pairs->nr; ++i) {
-		char *old_dir, *new_dir;
 		struct diff_filepair *pair = pairs->queue[i];
 
 		if (pair->status != 'R')
 			continue;
 
-		/* Get the old and new directory names */
-		get_renamed_dir_portion(pair->one->path, pair->two->path,
-					&old_dir,        &new_dir);
-		if (!old_dir)
-			/* Directory didn't change at all; ignore this one. */
-			continue;
-
 		/*
 		 * Make dir_rename_count contain a map of a map:
 		 *   old_directory -> {new_directory -> count}
@@ -1000,12 +967,9 @@ static void compute_rename_counts(struct diff_queue_struct *pairs,
 		 * the old filename and the new filename and count how many
 		 * times that pairing occurs.
 		 */
-		if (strset_contains(dirs_removed, old_dir))
-			increment_count(dir_rename_count, old_dir, new_dir);
-
-		/* Free resources we don't need anymore */
-		free(old_dir);
-		free(new_dir);
+		update_dir_rename_counts(dir_rename_count, dirs_removed,
+					 pair->one->path,
+					 pair->two->path);
 	}
 }
 
-- 
gitgitgadget


  parent reply	other threads:[~2021-01-07 20:02 UTC|newest]

Thread overview: 66+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-01-07 20:01 [PATCH 00/18] Add directory rename detection to merge-ort Elijah Newren via GitGitGadget
2021-01-07 20:01 ` [PATCH 01/18] merge-ort: add new data structures for directory rename detection Elijah Newren via GitGitGadget
2021-01-07 20:01 ` [PATCH 02/18] merge-ort: initialize and free new directory rename data structures Elijah Newren via GitGitGadget
2021-01-07 20:01 ` [PATCH 03/18] merge-ort: collect which directories are removed in dirs_removed Elijah Newren via GitGitGadget
2021-01-07 20:01 ` [PATCH 04/18] merge-ort: add outline for computing directory renames Elijah Newren via GitGitGadget
2021-01-07 20:01 ` [PATCH 05/18] merge-ort: add outline of get_provisional_directory_renames() Elijah Newren via GitGitGadget
2021-01-07 20:01 ` [PATCH 06/18] merge-ort: copy get_renamed_dir_portion() from merge-recursive.c Elijah Newren via GitGitGadget
2021-01-07 20:01 ` [PATCH 07/18] merge-ort: implement compute_rename_counts() Elijah Newren via GitGitGadget
2021-01-07 20:01 ` [PATCH 08/18] merge-ort: implement handle_directory_level_conflicts() Elijah Newren via GitGitGadget
2021-01-07 20:01 ` [PATCH 09/18] merge-ort: modify collect_renames() for directory rename handling Elijah Newren via GitGitGadget
2021-01-07 20:01 ` [PATCH 10/18] merge-ort: implement compute_collisions() Elijah Newren via GitGitGadget
2021-01-07 20:01 ` [PATCH 11/18] merge-ort: implement apply_dir_rename() and check_dir_renamed() Elijah Newren via GitGitGadget
2021-01-07 20:01 ` [PATCH 12/18] merge-ort: implement check_for_directory_rename() Elijah Newren via GitGitGadget
2021-01-07 20:01 ` [PATCH 13/18] merge-ort: implement handle_path_level_conflicts() Elijah Newren via GitGitGadget
2021-01-07 20:01 ` [PATCH 14/18] merge-ort: add a new toplevel_dir field Elijah Newren via GitGitGadget
2021-01-07 20:01 ` [PATCH 15/18] merge-ort: implement apply_directory_rename_modifications() Elijah Newren via GitGitGadget
2021-01-07 20:01 ` [PATCH 16/18] merge-ort: process_renames() now needs more defensiveness Elijah Newren via GitGitGadget
2021-01-07 20:01 ` Elijah Newren via GitGitGadget [this message]
2021-01-07 20:01 ` [PATCH 18/18] DO NOT SUBMIT: directory rename stuff for redo_after_renames Elijah Newren via GitGitGadget
2021-01-07 20:02   ` Elijah Newren
2021-01-07 21:35 ` [PATCH v2 00/17] Add directory rename detection to merge-ort Elijah Newren via GitGitGadget
2021-01-07 21:35   ` [PATCH v2 01/17] merge-ort: add new data structures for directory rename detection Elijah Newren via GitGitGadget
2021-01-07 21:35   ` [PATCH v2 02/17] merge-ort: initialize and free new directory rename data structures Elijah Newren via GitGitGadget
2021-01-07 21:35   ` [PATCH v2 03/17] merge-ort: collect which directories are removed in dirs_removed Elijah Newren via GitGitGadget
2021-01-07 21:35   ` [PATCH v2 04/17] merge-ort: add outline for computing directory renames Elijah Newren via GitGitGadget
2021-01-18 19:54     ` Taylor Blau
2021-01-18 20:15       ` Elijah Newren
2021-01-18 20:28         ` Taylor Blau
2021-01-07 21:35   ` [PATCH v2 05/17] merge-ort: add outline of get_provisional_directory_renames() Elijah Newren via GitGitGadget
2021-01-18 20:10     ` Taylor Blau
2021-01-18 20:34       ` Elijah Newren
2021-01-07 21:35   ` [PATCH v2 06/17] merge-ort: copy get_renamed_dir_portion() from merge-recursive.c Elijah Newren via GitGitGadget
2021-01-07 21:35   ` [PATCH v2 07/17] merge-ort: implement compute_rename_counts() Elijah Newren via GitGitGadget
2021-01-18 20:36     ` Taylor Blau
2021-01-18 20:41       ` Elijah Newren
2021-01-07 21:35   ` [PATCH v2 08/17] merge-ort: implement handle_directory_level_conflicts() Elijah Newren via GitGitGadget
2021-01-18 21:00     ` Taylor Blau
2021-01-18 21:36       ` Elijah Newren
2021-01-07 21:35   ` [PATCH v2 09/17] merge-ort: modify collect_renames() for directory rename handling Elijah Newren via GitGitGadget
2021-01-07 21:35   ` [PATCH v2 10/17] merge-ort: implement compute_collisions() Elijah Newren via GitGitGadget
2021-01-07 21:35   ` [PATCH v2 11/17] merge-ort: implement apply_dir_rename() and check_dir_renamed() Elijah Newren via GitGitGadget
2021-01-07 21:36   ` [PATCH v2 12/17] merge-ort: implement check_for_directory_rename() Elijah Newren via GitGitGadget
2021-01-07 21:36   ` [PATCH v2 13/17] merge-ort: implement handle_path_level_conflicts() Elijah Newren via GitGitGadget
2021-01-07 21:36   ` [PATCH v2 14/17] merge-ort: add a new toplevel_dir field Elijah Newren via GitGitGadget
2021-01-07 21:36   ` [PATCH v2 15/17] merge-ort: implement apply_directory_rename_modifications() Elijah Newren via GitGitGadget
2021-01-07 21:36   ` [PATCH v2 16/17] merge-ort: process_renames() now needs more defensiveness Elijah Newren via GitGitGadget
2021-01-07 21:36   ` [PATCH v2 17/17] merge-ort: fix a directory rename detection bug Elijah Newren via GitGitGadget
2021-01-19 19:53   ` [PATCH v3 00/17] Add directory rename detection to merge-ort Elijah Newren via GitGitGadget
2021-01-19 19:53     ` [PATCH v3 01/17] merge-ort: add new data structures for directory rename detection Elijah Newren via GitGitGadget
2021-01-19 19:53     ` [PATCH v3 02/17] merge-ort: initialize and free new directory rename data structures Elijah Newren via GitGitGadget
2021-01-19 19:53     ` [PATCH v3 03/17] merge-ort: collect which directories are removed in dirs_removed Elijah Newren via GitGitGadget
2021-01-19 19:53     ` [PATCH v3 04/17] merge-ort: add outline for computing directory renames Elijah Newren via GitGitGadget
2021-01-19 19:53     ` [PATCH v3 05/17] merge-ort: add outline of get_provisional_directory_renames() Elijah Newren via GitGitGadget
2021-01-19 19:53     ` [PATCH v3 06/17] merge-ort: copy get_renamed_dir_portion() from merge-recursive.c Elijah Newren via GitGitGadget
2021-01-19 19:53     ` [PATCH v3 07/17] merge-ort: implement compute_rename_counts() Elijah Newren via GitGitGadget
2021-01-19 19:53     ` [PATCH v3 08/17] merge-ort: implement handle_directory_level_conflicts() Elijah Newren via GitGitGadget
2021-01-19 19:53     ` [PATCH v3 09/17] merge-ort: modify collect_renames() for directory rename handling Elijah Newren via GitGitGadget
2021-01-19 19:53     ` [PATCH v3 10/17] merge-ort: implement compute_collisions() Elijah Newren via GitGitGadget
2021-01-19 19:53     ` [PATCH v3 11/17] merge-ort: implement apply_dir_rename() and check_dir_renamed() Elijah Newren via GitGitGadget
2021-01-19 19:53     ` [PATCH v3 12/17] merge-ort: implement check_for_directory_rename() Elijah Newren via GitGitGadget
2021-01-19 19:53     ` [PATCH v3 13/17] merge-ort: implement handle_path_level_conflicts() Elijah Newren via GitGitGadget
2021-01-19 19:53     ` [PATCH v3 14/17] merge-ort: add a new toplevel_dir field Elijah Newren via GitGitGadget
2021-01-19 19:53     ` [PATCH v3 15/17] merge-ort: implement apply_directory_rename_modifications() Elijah Newren via GitGitGadget
2021-01-19 19:53     ` [PATCH v3 16/17] merge-ort: process_renames() now needs more defensiveness Elijah Newren via GitGitGadget
2021-01-19 19:53     ` [PATCH v3 17/17] merge-ort: fix a directory rename detection bug Elijah Newren via GitGitGadget
2021-01-19 22:48     ` [PATCH v3 00/17] Add directory rename detection to merge-ort Taylor Blau

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=551878bd84da895e0899eba620553f6efe60e100.1610049687.git.gitgitgadget@gmail.com \
    --to=gitgitgadget@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=newren@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.