All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v8] Detection of directory renames
@ 2010-10-28 22:08 Yann Dirson
  2010-10-28 22:08 ` [PATCH v8 1/5] Introduce bulk-move detection in diffcore Yann Dirson
                   ` (5 more replies)
  0 siblings, 6 replies; 15+ messages in thread
From: Yann Dirson @ 2010-10-28 22:08 UTC (permalink / raw)
  To: git

Changes since v7:

* turned debug traces off but allow them to be easily turned on
* corrected example in gitdiffcore.txt (Junio)
* unidiff output changes:
  * don't use "diff --git" to avoid confusing git-apply, use
    --git-detect-bulk-moves instead (Junio, 2008)
  * use "bulk move with similarity index NNN%"
  * don't use trailing asterisk, useless there
* added missing &&'s in testsuite
* improved short help messages
* documented the unidiff output format and --hide-bulk-move-details
* splitted raw output format, under discussion, into its own patch
* improved commit messages, including current rationale for using "/*"
* listed bulk-removal/copy/add as possible future related works
* renamed test script for consistency

I'll experiment next with bulk-removal detection, with bulk-move
rerolled on top of it.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH v8 1/5] Introduce bulk-move detection in diffcore.
  2010-10-28 22:08 [PATCH v8] Detection of directory renames Yann Dirson
@ 2010-10-28 22:08 ` Yann Dirson
  2010-10-29  1:45   ` Jonathan Nieder
  2010-11-25 10:08   ` [PATCH v8 1/5] " Ævar Arnfjörð Bjarmason
  2010-10-28 22:08 ` [PATCH v8 2/5] Raw diff output format for bulk moves Yann Dirson
                   ` (4 subsequent siblings)
  5 siblings, 2 replies; 15+ messages in thread
From: Yann Dirson @ 2010-10-28 22:08 UTC (permalink / raw)
  To: git; +Cc: Yann Dirson, Yann Dirson

This feature tries to group together files moving from and to
identical directories - a common case being directory renames.

This only adds the detection logic.  The output of raw diff is displayed
as "Rnnn a/ b/", and unified diff does not display them at all.  Output
formats will be refined later in the series.

It is implemented as a new pass in diffcore-rename, occuring after the
file renames get detected, grouping those renames looking like a move
of a full directory into some other place. It is activated by the new
--detect-bulk-moves diffcore flag.

Possible optimisations to this code include:
* avoid use of i_am_not_single by using a separate list
* use a more informative prefixcmp to avoid strcmp calls
  eg. in discard_if_outside()
* optimize for bulk insertions (avoid useless successive memmove's)

Other future developements to be made on top of this include:
* detect bulk removals (well, that one is rather a subset than a layer above),
  and possibly bulk additions
* detect bulk copies
* detect inexact bulk-moves/copies (where some files were not moved, or were
  moved to a different place) - problem of computing a similarity score
* display as such the special case of directory move/rename
* application of such new diffs: issue a conflict, or just a warning ?
* teach git-svn (and others ?) to make use of that flag
* handle new conflict type "bulk-move/add"
* detect "directory splits" as well
* use inexact dir renames to bump score of below-threshold renames
  from/to same locations
* support other types of bluk-grouping, like prefixes (see eg. kernel
  5d1e859c), and maybe config-specified patterns
* add yours here

This patch has been improved by the following contributions:
- Jonathan Nieder: better implementation of copy_dirname()
- Jonathan Nieder: portable implementation of memrchr() in another patch
- Junio C Hamano: split individual renames hiding under control of another flag
- Junio C Hamano: coding style issues
- Junio C Hamano: better examples
- Ævar Arnfjörð Bjarmason: Don't use C99 comments.
- Jonathan Nieder: just too many other helpful suggestions to list them all

Thanks-to: Jonathan Nieder <jrnieder@gmail.com>
Thanks-to: Junio C Hamano <gitster@pobox.com>
Thanks-to: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Yann Dirson <ydirson@free.fr>
---
 Documentation/diff-options.txt |    4 +
 Documentation/gitdiffcore.txt  |   12 ++
 diff-lib.c                     |    6 +-
 diff.c                         |    5 +
 diff.h                         |    3 +
 diffcore-rename.c              |  370 ++++++++++++++++++++++++++++++++++++++--
 diffcore.h                     |    1 +
 tree-diff.c                    |    4 +-
 8 files changed, 391 insertions(+), 14 deletions(-)

diff --git a/Documentation/diff-options.txt b/Documentation/diff-options.txt
index bfd0b57..887df4b 100644
--- a/Documentation/diff-options.txt
+++ b/Documentation/diff-options.txt
@@ -245,6 +245,10 @@ endif::git-log[]
 	delete/add pair to be a rename if more than 90% of the file
 	hasn't changed.
 
+--detect-bulk-moves::
+	Detect bulk move of all files of a directory into a
+	different one.
+
 -C[<n>]::
 --detect-copies[=<n>]::
 	Detect copies as well as renames.  See also `--find-copies-harder`.
diff --git a/Documentation/gitdiffcore.txt b/Documentation/gitdiffcore.txt
index 6af29a4..93111ac 100644
--- a/Documentation/gitdiffcore.txt
+++ b/Documentation/gitdiffcore.txt
@@ -175,6 +175,18 @@ the expense of making it slower.  Without `\--find-copies-harder`,
 'git diff-{asterisk}' commands can detect copies only if the file that was
 copied happened to have been modified in the same changeset.
 
+Bulk move of all files of a directory into a different one can get
+detected using the `\--detect-bulk-moves` option.  This adds an
+additional pass on top of the results of per-file rename detection.
+They are reported with NULL SHA1 id, in addition to the file renames:
+
+------------------------------------------------
+:040000 040000 0000000... 0000000... R100 foo/ bar/
+:100644 100644 0123456... 1234567... R090 foo/file0 bar/file3
+:100644 100644 2345678... 2345678... R100 foo/file1 bar/file1
+:100644 100644 3456789... 3456789... R100 foo/file2 bar/file2
+------------------------------------------------
+
 
 diffcore-merge-broken: For Putting "Complete Rewrites" Back Together
 --------------------------------------------------------------------
diff --git a/diff-lib.c b/diff-lib.c
index 392ce2b..5ec3ddc 100644
--- a/diff-lib.c
+++ b/diff-lib.c
@@ -208,7 +208,8 @@ int run_diff_files(struct rev_info *revs, unsigned int option)
 						    ce_option, &dirty_submodule);
 		if (!changed && !dirty_submodule) {
 			ce_mark_uptodate(ce);
-			if (!DIFF_OPT_TST(&revs->diffopt, FIND_COPIES_HARDER))
+			if (!DIFF_OPT_TST(&revs->diffopt, FIND_COPIES_HARDER) &&
+			    !DIFF_OPT_TST(&revs->diffopt, DETECT_BULK_MOVES))
 				continue;
 		}
 		oldmode = ce->ce_mode;
@@ -338,7 +339,8 @@ static int show_modified(struct rev_info *revs,
 
 	oldmode = old->ce_mode;
 	if (mode == oldmode && !hashcmp(sha1, old->sha1) && !dirty_submodule &&
-	    !DIFF_OPT_TST(&revs->diffopt, FIND_COPIES_HARDER))
+	    !DIFF_OPT_TST(&revs->diffopt, FIND_COPIES_HARDER) &&
+	    !DIFF_OPT_TST(&revs->diffopt, DETECT_BULK_MOVES))
 		return 0;
 
 	diff_change(&revs->diffopt, oldmode, mode,
diff --git a/diff.c b/diff.c
index d1c6b91..2eba335 100644
--- a/diff.c
+++ b/diff.c
@@ -3191,6 +3191,11 @@ int diff_opt_parse(struct diff_options *options, const char **av, int ac)
 		DIFF_OPT_SET(options, REVERSE_DIFF);
 	else if (!strcmp(arg, "--find-copies-harder"))
 		DIFF_OPT_SET(options, FIND_COPIES_HARDER);
+	else if (!strcmp(arg, "--detect-bulk-moves")) {
+		DIFF_OPT_SET(options, DETECT_BULK_MOVES);
+		if (!options->detect_rename)
+			options->detect_rename = DIFF_DETECT_RENAME;
+	}
 	else if (!strcmp(arg, "--follow"))
 		DIFF_OPT_SET(options, FOLLOW_RENAMES);
 	else if (!strcmp(arg, "--color"))
diff --git a/diff.h b/diff.h
index 0083d92..1e2506c 100644
--- a/diff.h
+++ b/diff.h
@@ -78,6 +78,7 @@ typedef struct strbuf *(*diff_prefix_fn_t)(struct diff_options *opt, void *data)
 #define DIFF_OPT_IGNORE_UNTRACKED_IN_SUBMODULES (1 << 25)
 #define DIFF_OPT_IGNORE_DIRTY_SUBMODULES (1 << 26)
 #define DIFF_OPT_OVERRIDE_SUBMODULE_CONFIG (1 << 27)
+#define DIFF_OPT_DETECT_BULK_MOVES  (1 << 28)
 
 #define DIFF_OPT_TST(opts, flag)    ((opts)->flags & DIFF_OPT_##flag)
 #define DIFF_OPT_SET(opts, flag)    ((opts)->flags |= DIFF_OPT_##flag)
@@ -265,6 +266,8 @@ extern void diffcore_fix_diff_index(struct diff_options *);
 "  -C            detect copies.\n" \
 "  --find-copies-harder\n" \
 "                try unchanged files as candidate for copy detection.\n" \
+"  --detect-bulk-moves\n" \
+"                detect moves of all files of a single directory.\n" \
 "  -l<n>         limit rename attempts up to <n> paths.\n" \
 "  -O<file>      reorder diffs according to the <file>.\n" \
 "  -S<string>    find filepair whose only one side contains the string.\n" \
diff --git a/diffcore-rename.c b/diffcore-rename.c
index df41be5..6afc662 100644
--- a/diffcore-rename.c
+++ b/diffcore-rename.c
@@ -6,14 +6,34 @@
 #include "diffcore.h"
 #include "hash.h"
 
+#define DEBUG_BULKMOVE 0
+
+#if DEBUG_BULKMOVE
+#define debug_bulkmove(args) __debug_bulkmove args
+void __debug_bulkmove(const char *fmt, ...)
+{
+	va_list ap;
+	va_start(ap, fmt);
+	fprintf(stderr, "[DBG] ");
+	vfprintf(stderr, fmt, ap);
+	va_end(ap);
+}
+#else
+#define debug_bulkmove(args) do { /*nothing */ } while (0)
+#endif
+
 /* Table of rename/copy destinations */
 
 static struct diff_rename_dst {
 	struct diff_filespec *two;
 	struct diff_filepair *pair;
+	unsigned i_am_not_single:1; /* does not look for a match, only here to be looked at */
 } *rename_dst;
 static int rename_dst_nr, rename_dst_alloc;
 
+/*
+ * Do a binary search of "two" in "rename_dst", inserting it if not found.
+ */
 static struct diff_rename_dst *locate_rename_dst(struct diff_filespec *two,
 						 int insert_ok)
 {
@@ -49,9 +69,36 @@ static struct diff_rename_dst *locate_rename_dst(struct diff_filespec *two,
 	rename_dst[first].two = alloc_filespec(two->path);
 	fill_filespec(rename_dst[first].two, two->sha1, two->mode);
 	rename_dst[first].pair = NULL;
+	rename_dst[first].i_am_not_single = 0;
 	return &(rename_dst[first]);
 }
 
+/*
+ * Do a binary search in "rename_dst" of any entry under "dirname".
+ */
+static struct diff_rename_dst *locate_rename_dst_dir(const char *dirname)
+{
+	int first, last;
+	int prefixlength = strlen(dirname);
+
+	first = 0;
+	last = rename_dst_nr;
+	while (last > first) {
+		int next = (last + first) >> 1;
+		struct diff_rename_dst *dst = &(rename_dst[next]);
+		int cmp = strncmp(dirname, dst->two->path, prefixlength);
+		if (!cmp)
+			return dst;
+		if (cmp < 0) {
+			last = next;
+			continue;
+		}
+		first = next+1;
+	}
+	/* not found */
+	return NULL;
+}
+
 /* Table of rename/copy src files */
 static struct diff_rename_src {
 	struct diff_filespec *one;
@@ -386,8 +433,11 @@ static int find_exact_renames(void)
 	for (i = 0; i < rename_src_nr; i++)
 		insert_file_table(&file_table, -1, i, rename_src[i].one);
 
-	for (i = 0; i < rename_dst_nr; i++)
+	for (i = 0; i < rename_dst_nr; i++) {
+		if (rename_dst[i].i_am_not_single)
+			continue;
 		insert_file_table(&file_table, 1, i, rename_dst[i].two);
+	}
 
 	/* Find the renames */
 	i = for_each_hash(&file_table, find_same_files);
@@ -414,6 +464,270 @@ static void record_if_better(struct diff_score m[], struct diff_score *o)
 		m[worst] = *o;
 }
 
+static struct diff_bulk_rename {
+	struct diff_filespec *one;
+	struct diff_filespec *two;
+	int discarded;
+} *bulkmove_candidates;
+static int bulkmove_candidates_nr, bulkmove_candidates_alloc;
+
+/*
+ * Do a binary search of "one" in "bulkmove_candidate", inserting it if not
+ * found.
+ * A good part was copy-pasted from locate_rename_dst().
+ */
+static struct diff_bulk_rename *locate_bulkmove_candidate(const char *one_path,
+							  const char *two_path)
+{
+	int first, last;
+
+	first = 0;
+	last = bulkmove_candidates_nr;
+	while (last > first) {
+		int next = (last + first) >> 1;
+		struct diff_bulk_rename *candidate = &(bulkmove_candidates[next]);
+		/* primary sort key on one_path, secondary on two_path */
+		int cmp = strcmp(one_path, candidate->one->path);
+		if (!cmp)
+			cmp = strcmp(two_path, candidate->two->path);
+		if (!cmp)
+			return candidate;
+		if (cmp < 0) {
+			last = next;
+			continue;
+		}
+		first = next+1;
+	}
+	/* not found */
+	/* insert to make it at "first" */
+	if (bulkmove_candidates_alloc <= bulkmove_candidates_nr) {
+		bulkmove_candidates_alloc = alloc_nr(bulkmove_candidates_alloc);
+		bulkmove_candidates = xrealloc(bulkmove_candidates,
+				      bulkmove_candidates_alloc * sizeof(*bulkmove_candidates));
+	}
+	bulkmove_candidates_nr++;
+	if (first < bulkmove_candidates_nr)
+		memmove(bulkmove_candidates + first + 1, bulkmove_candidates + first,
+			(bulkmove_candidates_nr - first - 1) * sizeof(*bulkmove_candidates));
+
+	bulkmove_candidates[first].one = alloc_filespec(one_path);
+	fill_filespec(bulkmove_candidates[first].one, null_sha1, S_IFDIR);
+	bulkmove_candidates[first].two = alloc_filespec(two_path);
+	fill_filespec(bulkmove_candidates[first].two, null_sha1, S_IFDIR);
+	bulkmove_candidates[first].discarded = 0;
+	return &(bulkmove_candidates[first]);
+}
+
+/*
+ * Copy dirname of src into dst, suitable to append a filename without
+ * an additional "/".
+ * Only handles relative paths since there is no absolute path in a git repo.
+ * Writes "" when there is no "/" in src.
+ * May overwrite more chars than really needed, if src ends with a "/".
+ * Supports in-place modification of src by passing dst == src.
+ */
+static const char *copy_dirname(char *dst, const char *src)
+{
+	size_t len = strlen(src);
+	const char *slash;
+	char *end;
+
+	if (len > 0 && src[len - 1] == '/')
+		/* Trailing slash.  Ignore it. */
+		len--;
+
+	slash = memrchr(src, '/', len);
+	if (!slash) {
+		*dst = '\0';
+		return dst;
+	}
+
+	end = mempcpy(dst, src, slash - src + 1);
+	*end = '\0';
+	return dst;
+}
+
+// FIXME: leaks like hell.
+/* See if the fact that one_leftover exists under one_parent_path in
+ * dst tree should disqualify one_parent_path from bulkmove eligibility.
+ * Return 1 if it disqualifies, 0 if it is OK.
+ */
+static int dispatched_to_different_dirs(const char *one_parent_path)
+{
+	/* this might be a dir split, or files added
+	 * after the bulk move, or just an isolated
+	 * rename */
+	int two_idx, j, onep_len, maybe_dir_rename;
+	struct diff_rename_dst *one_leftover =
+		one_leftover = locate_rename_dst_dir(one_parent_path);
+
+	if (!one_leftover)
+		return 0;
+
+	/* try to see if it is a file added after the bulk move */
+	two_idx = one_leftover - rename_dst;
+	onep_len = strlen(one_parent_path);
+	maybe_dir_rename = 1;
+
+	/* check no leftover file was already here before */
+	for (j = two_idx; j < rename_dst_nr; j++) {
+		if (strncmp(rename_dst[j].two->path,
+			    one_parent_path, onep_len))
+			break; /* exhausted directory in this direction */
+		debug_bulkmove(("leftover file %s in %s\n",
+				rename_dst[j].two->path, one_parent_path));
+		if (rename_dst[j].i_am_not_single || /* those were already here */
+		    (rename_dst[j].pair &&
+		     !strncmp(rename_dst[j].pair->one->path,
+			      one_parent_path, onep_len) && /* renamed from here */
+		     strncmp(rename_dst[j].two->path,
+			     one_parent_path, onep_len))) { /* not to a subdir */
+			maybe_dir_rename = 0;
+			debug_bulkmove(("... tells not a bulk move\n"));
+			break;
+		}
+		debug_bulkmove(("... not believed to prevent bulk move\n"));
+	}
+	if (!maybe_dir_rename)
+		return 1;
+	/* try the other direction (dup code) */
+	for (j = two_idx-1; j >= 0; j--) {
+		if (strncmp(rename_dst[j].two->path,
+			    one_parent_path, onep_len))
+			break; /* exhausted directory in this direction */
+		debug_bulkmove(("leftover file %s in '%s'\n",
+				rename_dst[j].two->path, one_parent_path));
+		if (rename_dst[j].i_am_not_single || /* those were already here */
+		    (rename_dst[j].pair &&
+		     !strncmp(rename_dst[j].pair->one->path,
+			      one_parent_path, onep_len) && /* renamed from here */
+		     strncmp(rename_dst[j].two->path,
+			     one_parent_path, onep_len))) { /* not to a subdir */
+			maybe_dir_rename = 0;
+			debug_bulkmove(("... tells not a bulk move\n"));
+			break;
+		}
+		debug_bulkmove(("... not believed to prevent bulk move\n"));
+	}
+	if (!maybe_dir_rename)
+		return 1;
+
+	/* Here we are in the case where a directory
+	 * content was completely moved, but files
+	 * were added to it afterwards.  Proceed as
+	 * for a simple bulk move. */
+	return 0;
+}
+
+/*
+ * Assumes candidate->one is a subdir of seen->one, mark 'seen' as
+ * discarded if candidate->two is outside seen->two.  Also mark
+ * 'candidate' itself as discarded if the conflict implies so.
+ *
+ * Return 1 if 'seen' was discarded
+ */
+static int discard_if_outside(struct diff_bulk_rename *candidate,
+			 struct diff_bulk_rename *seen) {
+	if (!prefixcmp(candidate->two->path, seen->two->path)) {
+		debug_bulkmove((" 'dstpair' conforts 'seen'\n"));
+		return 0;
+	} else {
+		debug_bulkmove(("discarding %s -> %s from bulk moves (split into %s and %s)\n",
+				seen->one->path, seen->two->path,
+				candidate->two->path, seen->two->path));
+		seen->discarded = 1;
+		/* Need to discard dstpair as well, unless moving from
+		 * a strict subdir of seen->one or to a strict subdir
+		 * of seen->two */
+		if (!strcmp(seen->one->path, candidate->one->path) &&
+		    prefixcmp(seen->two->path, candidate->two->path)) {
+			debug_bulkmove(("... and not adding self\n"));
+			candidate->discarded = 1;
+		}
+		return 1;
+	}
+}
+
+/*
+ * Check if the rename specified by "dstpair" could cause a
+ * bulk move to be detected, record it in bulkmove_candidates if yes.
+ */
+static void check_one_bulk_move(struct diff_filepair *dstpair)
+{
+	char one_parent_path[PATH_MAX], two_parent_path[PATH_MAX];
+
+	/* genuine new files (or believed to be so) */
+	if (!dstpair)
+		return;
+	/* dummy renames used by copy detection */
+	if (!strcmp(dstpair->one->path, dstpair->two->path))
+		return;
+
+	copy_dirname(one_parent_path, dstpair->one->path);
+	copy_dirname(two_parent_path, dstpair->two->path);
+
+	/* simple rename with no directory change */
+	if (!strcmp(one_parent_path, two_parent_path))
+		return;
+
+	debug_bulkmove(("[] %s -> %s ?\n", dstpair->one->path, dstpair->two->path));
+
+	/* loop up one_parent_path over successive parents */
+	// FIXME: also loop over two_parent_path prefixes
+	do {
+		struct diff_bulk_rename *seen;
+		int old_nr = bulkmove_candidates_nr;
+		struct diff_bulk_rename *candidate =
+			locate_bulkmove_candidate(one_parent_path, two_parent_path);
+		debug_bulkmove(("[[]] %s ...\n", one_parent_path));
+		if (old_nr == bulkmove_candidates_nr) {
+			debug_bulkmove((" already seen\n"));
+			return;
+		}
+
+		/* After this commit, are there any files still under one_parent_path ?
+		 * Any file left would disqualifies this dir for bulk move.
+		 */
+		if (dispatched_to_different_dirs(one_parent_path)) {
+			// FIXME: check overlap with discard_if_outside()
+			candidate->discarded = 1;
+			return;
+		}
+
+		/* walk up for one_parent_path prefixes */
+		for (seen = candidate-1; (seen >= bulkmove_candidates) &&
+			     !prefixcmp(one_parent_path, seen->one->path); seen--) {
+			debug_bulkmove((" ? %s -> %s\n", seen->one->path, seen->two->path));
+			/* subdir of "seen" dest dir ? */
+			if (discard_if_outside(candidate, seen))
+				continue;
+		}
+		/* look down for other moves from one_parent_path */
+		seen = candidate + 1;
+		if (seen != bulkmove_candidates + bulkmove_candidates_nr &&
+		    !strcmp(one_parent_path, seen->one->path)) {
+			debug_bulkmove((" ? %s -> %s (2)\n", seen->one->path, seen->two->path));
+			/* subdir of "seen" dest dir ? */
+			if (discard_if_outside(candidate, seen))
+				continue;
+		}
+
+		/* next parent if any */
+		copy_dirname(one_parent_path, one_parent_path);
+	} while (*one_parent_path);
+}
+
+/*
+ * Take all file renames recorded so far and check if they could cause
+ * a bulk move to be detected.
+ */
+static void diffcore_bulk_moves(void)
+{
+	int i;
+	for (i = 0; i < rename_dst_nr; i++)
+		check_one_bulk_move(rename_dst[i].pair);
+}
+
 void diffcore_rename(struct diff_options *options)
 {
 	int detect_rename = options->detect_rename;
@@ -424,6 +738,7 @@ void diffcore_rename(struct diff_options *options)
 	struct diff_score *mx;
 	int i, j, rename_count;
 	int num_create, num_src, dst_cnt;
+	struct diff_bulk_rename *candidate;
 
 	if (!minimum_score)
 		minimum_score = DEFAULT_RENAME_SCORE;
@@ -438,8 +753,7 @@ void diffcore_rename(struct diff_options *options)
 				continue; /* not interested */
 			else
 				locate_rename_dst(p->two, 1);
-		}
-		else if (!DIFF_FILE_VALID(p->two)) {
+		} else if (!DIFF_FILE_VALID(p->two)) {
 			/*
 			 * If the source is a broken "delete", and
 			 * they did not really want to get broken,
@@ -450,14 +764,23 @@ void diffcore_rename(struct diff_options *options)
 			if (p->broken_pair && !p->score)
 				p->one->rename_used++;
 			register_rename_src(p->one, p->score);
-		}
-		else if (detect_rename == DIFF_DETECT_COPY) {
-			/*
-			 * Increment the "rename_used" score by
-			 * one, to indicate ourselves as a user.
-			 */
-			p->one->rename_used++;
-			register_rename_src(p->one, p->score);
+		} else {
+			if (detect_rename == DIFF_DETECT_COPY) {
+				/*
+				 * Increment the "rename_used" score by
+				 * one, to indicate ourselves as a user.
+				 */
+				p->one->rename_used++;
+				register_rename_src(p->one, p->score);
+			}
+			if (DIFF_OPT_TST(options, DETECT_BULK_MOVES)) {
+				/* similarly, bulk move detection needs to
+				 * see all files from second tree, but we don't
+				 * want them to be matched against single sources.
+				 */
+				// FIXME: check interaction with --find-copies-harder
+				locate_rename_dst(p->two, 1)->i_am_not_single = 1;
+			}
 		}
 	}
 	if (rename_dst_nr == 0 || rename_src_nr == 0)
@@ -509,6 +832,8 @@ void diffcore_rename(struct diff_options *options)
 
 		if (rename_dst[i].pair)
 			continue; /* dealt with exact match already. */
+		if (rename_dst[i].i_am_not_single)
+			continue; /* not looking for a match. */
 
 		m = &mx[dst_cnt * NUM_CANDIDATE_PER_DST];
 		for (j = 0; j < NUM_CANDIDATE_PER_DST; j++)
@@ -569,7 +894,30 @@ void diffcore_rename(struct diff_options *options)
 	/* At this point, we have found some renames and copies and they
 	 * are recorded in rename_dst.  The original list is still in *q.
 	 */
+
+	/* Now possibly factorize those renames and copies. */
+	if (DIFF_OPT_TST(options, DETECT_BULK_MOVES))
+		diffcore_bulk_moves();
+
 	DIFF_QUEUE_CLEAR(&outq);
+
+	/* Now turn non-discarded bulkmove_candidates into real renames */
+	for (candidate = bulkmove_candidates;
+	     candidate < bulkmove_candidates + bulkmove_candidates_nr; candidate++) {
+		struct diff_filepair* pair;
+		if (candidate->discarded)
+			continue;
+		/* visualize toplevel dir if needed */
+		if (!*candidate->one->path)
+			candidate->one->path = "./";
+		if (!*candidate->two->path)
+			candidate->two->path = "./";
+		pair = diff_queue(&outq, candidate->one, candidate->two);
+		pair->score = MAX_SCORE;
+		pair->renamed_pair = 1;
+		pair->is_bulkmove = 1;
+	}
+
 	for (i = 0; i < q->nr; i++) {
 		struct diff_filepair *p = q->queue[i];
 		struct diff_filepair *pair_to_free = NULL;
diff --git a/diffcore.h b/diffcore.h
index b8f1fde..6dab95b 100644
--- a/diffcore.h
+++ b/diffcore.h
@@ -69,6 +69,7 @@ struct diff_filepair {
 	unsigned broken_pair : 1;
 	unsigned renamed_pair : 1;
 	unsigned is_unmerged : 1;
+	unsigned is_bulkmove : 1;
 };
 #define DIFF_PAIR_UNMERGED(p) ((p)->is_unmerged)
 
diff --git a/tree-diff.c b/tree-diff.c
index 12c9a88..89cedd4 100644
--- a/tree-diff.c
+++ b/tree-diff.c
@@ -49,7 +49,9 @@ static int compare_tree_entry(struct tree_desc *t1, struct tree_desc *t2, const
 		show_entry(opt, "+", t2, base, baselen);
 		return 1;
 	}
-	if (!DIFF_OPT_TST(opt, FIND_COPIES_HARDER) && !hashcmp(sha1, sha2) && mode1 == mode2)
+	if (!DIFF_OPT_TST(opt, FIND_COPIES_HARDER) &&
+	    !DIFF_OPT_TST(opt, DETECT_BULK_MOVES) &&
+	    !hashcmp(sha1, sha2) && mode1 == mode2)
 		return 0;
 
 	/*
-- 
1.7.2.3

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH v8 2/5] Raw diff output format for bulk moves.
  2010-10-28 22:08 [PATCH v8] Detection of directory renames Yann Dirson
  2010-10-28 22:08 ` [PATCH v8 1/5] Introduce bulk-move detection in diffcore Yann Dirson
@ 2010-10-28 22:08 ` Yann Dirson
  2010-10-28 22:08 ` [PATCH v8 3/5] Add testcases for the --detect-bulk-moves diffcore flag Yann Dirson
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 15+ messages in thread
From: Yann Dirson @ 2010-10-28 22:08 UTC (permalink / raw)
  To: git; +Cc: Yann Dirson, Yann Dirson

To distinguish the general bulk-move case (where the destination
directory was pre-existing) from the directory-rename case (where it
was not), the output of raw diff is displayed as "Rnnn a/* b/".  Those
cannot be confused with renames of files named "whatever/*" with a
literal star character, from the full-zero SHA1's.

Signed-off-by: Yann Dirson <ydirson@free.fr>
---
 Documentation/gitdiffcore.txt |    2 +-
 diff.c                        |    9 ++++++++-
 2 files changed, 9 insertions(+), 2 deletions(-)

diff --git a/Documentation/gitdiffcore.txt b/Documentation/gitdiffcore.txt
index 93111ac..2538dc0 100644
--- a/Documentation/gitdiffcore.txt
+++ b/Documentation/gitdiffcore.txt
@@ -181,7 +181,7 @@ additional pass on top of the results of per-file rename detection.
 They are reported with NULL SHA1 id, in addition to the file renames:
 
 ------------------------------------------------
-:040000 040000 0000000... 0000000... R100 foo/ bar/
+:040000 040000 0000000... 0000000... R100 foo/* bar/
 :100644 100644 0123456... 1234567... R090 foo/file0 bar/file3
 :100644 100644 2345678... 2345678... R100 foo/file1 bar/file1
 :100644 100644 3456789... 3456789... R100 foo/file2 bar/file2
diff --git a/diff.c b/diff.c
index 2eba335..948eb76 100644
--- a/diff.c
+++ b/diff.c
@@ -3490,7 +3490,14 @@ static void diff_flush_raw(struct diff_filepair *p, struct diff_options *opt)
 	if (p->status == DIFF_STATUS_COPIED ||
 	    p->status == DIFF_STATUS_RENAMED) {
 		const char *name_a, *name_b;
-		name_a = p->one->path;
+		if (p->is_bulkmove) {
+			/* append "*" to the first dirname */
+			char buf[PATH_MAX];
+			char *next = memccpy(buf, p->one->path, '\0', PATH_MAX);
+			next[-1] = '*'; *next = '\0';
+			name_a = buf;
+		} else
+			name_a = p->one->path;
 		name_b = p->two->path;
 		strip_prefix(opt->prefix_length, &name_a, &name_b);
 		write_name_quoted(name_a, opt->file, inter_name_termination);
-- 
1.7.2.3

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH v8 3/5] Add testcases for the --detect-bulk-moves diffcore flag.
  2010-10-28 22:08 [PATCH v8] Detection of directory renames Yann Dirson
  2010-10-28 22:08 ` [PATCH v8 1/5] Introduce bulk-move detection in diffcore Yann Dirson
  2010-10-28 22:08 ` [PATCH v8 2/5] Raw diff output format for bulk moves Yann Dirson
@ 2010-10-28 22:08 ` Yann Dirson
  2010-10-28 22:08 ` [PATCH v8 4/5] Unified diff output format for bulk moves Yann Dirson
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 15+ messages in thread
From: Yann Dirson @ 2010-10-28 22:08 UTC (permalink / raw)
  To: git; +Cc: Yann Dirson, Yann Dirson

This notably includes a couple of tests for cases known not to be
working correctly yet.

This patch has been improved by the following contributions:
- Jonathan Nieder: reworked style of test script
- Jonathan Nieder: use "git commit" in test instead of only plumbing,
  and use test_tick
- Sverre Rabbelier: anonymize hashes

Thanks-to: Jonathan Nieder <jrnieder@gmail.com>
Thanks-to: Sverre Rabbelier <srabbelier@gmail.com>
Signed-off-by: Yann Dirson <ydirson@free.fr>
---
 t/t4046-diff-bulk-move.sh |  296 +++++++++++++++++++++++++++++++++++++++++++++
 1 files changed, 296 insertions(+), 0 deletions(-)
 create mode 100755 t/t4046-diff-bulk-move.sh

diff --git a/t/t4046-diff-bulk-move.sh b/t/t4046-diff-bulk-move.sh
new file mode 100755
index 0000000..4b1c78e
--- /dev/null
+++ b/t/t4046-diff-bulk-move.sh
@@ -0,0 +1,296 @@
+#!/bin/sh
+#
+# Copyright (c) 2008,2010 Yann Dirson
+# Copyright (c) 2005 Junio C Hamano
+#
+
+# TODO for dir renames:
+# * two dirs or more moving all their files to a single dir
+# * simultaneous bulkmove and rename
+
+test_description='Test rename factorization in diff engine.
+
+'
+. ./test-lib.sh
+. "$TEST_DIRECTORY"/diff-lib.sh
+
+test_expect_success 'setup' '
+	git commit --allow-empty -m "original empty commit"
+
+	mkdir a &&
+	printf "Line %s\n" 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 >a/path0 &&
+	sed <a/path0 >a/path1 s/Line/Record/ &&
+	sed <a/path0 >a/path2 s/Line/Stuff/ &&
+	sed <a/path0 >a/path3 s/Line/Blurb/ &&
+
+	git update-index --add a/path* &&
+	test_tick &&
+	git commit -m "original set of files" &&
+
+	: rename the directory &&
+	git mv a b
+'
+test_expect_success 'diff-index --detect-bulk-moves after directory move.' '
+	cat >expected <<-EOF &&
+	:040000 040000 X X R#	a/*	b/
+	:100644 100644 X X R#	a/path0	b/path0
+	:100644 100644 X X R#	a/path1	b/path1
+	:100644 100644 X X R#	a/path2	b/path2
+	:100644 100644 X X R#	a/path3	b/path3
+	EOF
+	git diff-index --detect-bulk-moves HEAD >current &&
+	compare_diff_raw expected current
+'
+
+test_expect_success 'setup non-100% rename' '
+	echo "Line 16" >>b/path0 &&
+	git mv b/path2 b/2path &&
+	git rm -f b/path3 &&
+	echo anything >b/path100 &&
+	git add b/path100
+'
+test_expect_success 'diff-index --detect-bulk-moves after content changes.' '
+	cat >expected <<-EOF &&
+	:040000 040000 X X R#	a/*	b/
+	:100644 000000 X X D#	a/path3
+	:100644 100644 X X R#	a/path2	b/2path
+	:100644 100644 X X R#	a/path0	b/path0
+	:100644 100644 X X R#	a/path1	b/path1
+	:000000 100644 X X A#	b/path100
+	EOF
+	git diff-index --detect-bulk-moves HEAD >current &&
+	compare_diff_raw expected current
+'
+
+test_expect_success 'setup bulk move that is not directory move' '
+	git reset -q --hard &&
+
+	mkdir c &&
+	(
+		for i in 0 1 2; do
+			cp a/path$i c/apath$i || exit
+		done
+	) &&
+	git update-index --add c/apath* &&
+	test_tick &&
+	git commit -m "first set of changes" &&
+
+	git mv c/* a/
+'
+test_expect_success 'diff-index --detect-bulk-moves without full-dir rename.' '
+	cat >expected <<-EOF &&
+	:040000 040000 X X R#	c/*	a/
+	:100644 100644 X X R#	c/apath0	a/apath0
+	:100644 100644 X X R#	c/apath1	a/apath1
+	:100644 100644 X X R#	c/apath2	a/apath2
+	EOF
+	git diff-index --detect-bulk-moves HEAD >current &&
+	compare_diff_raw expected current
+'
+
+test_expect_success 'setup bulk move with new file in source dir' '
+	echo > c/anotherpath "How much wood?" &&
+	git update-index --add c/another*
+'
+test_expect_success 'diff-index --detect-bulk-moves with new file in source dir.' '
+	cat >expected <<-EOF &&
+	:040000 040000 X X R#	c/*	a/
+	:100644 100644 X X R#	c/apath0	a/apath0
+	:100644 100644 X X R#	c/apath1	a/apath1
+	:100644 100644 X X R#	c/apath2	a/apath2
+	:000000 100644 X X A#	c/anotherpath
+	EOF
+	git diff-index --detect-bulk-moves HEAD >current &&
+	compare_diff_raw expected current
+'
+
+test_expect_success 'setup bulk move with interfering copy' '
+	rm c/anotherpath &&
+	git update-index --remove c/anotherpath &&
+	mkdir b &&
+	cp a/apath0 b/apath9 &&
+	echo >> a/apath0 "more" &&
+	git update-index --add a/apath0 b/apath9
+'
+# scores select the "wrong" one as "moved" (only a suboptimal detection)
+test_expect_failure 'diff-index --detect-bulk-moves with interfering copy.' '
+	cat >expected <<-EOF &&
+	:040000 040000 X X R#	c/*	a/
+	:100644 100644 X X R#	c/apath0	a/apath0
+	:100644 100644 X X R#	c/apath1	a/apath1
+	:100644 100644 X X R#	c/apath2	a/apath2
+	:100644 100644 X X C#	c/apath0	b/apath9
+	EOF
+	git diff-index --detect-bulk-moves HEAD >current &&
+	compare_diff_raw expected current
+'
+
+test_expect_success 'setup bulk move to toplevel' '
+	git reset -q --hard &&
+	git mv c/* .
+'
+test_expect_success 'diff-index --detect-bulk-moves bulk move to toplevel.' '
+	cat >expected <<-EOF &&
+	:040000 040000 X X R#	c/*	./
+	:100644 100644 X X R#	c/apath0	apath0
+	:100644 100644 X X R#	c/apath1	apath1
+	:100644 100644 X X R#	c/apath2	apath2
+	EOF
+	git diff-index --detect-bulk-moves HEAD >current &&
+	compare_diff_raw expected current
+'
+
+test_expect_success 'setup move including a subdir, with some content changes' '
+	git reset -q --hard &&
+	mv c a/ &&
+	git update-index --add --remove a/c/* c/apath0 c/apath1 c/apath2 &&
+	test_tick &&
+	git commit -m "move as subdir" &&
+
+	git mv a b &&
+	echo foo >>b/c/apath0 &&
+	git update-index --add b/c/apath*
+'
+test_expect_success 'diff-index --detect-bulk-moves on a move including a subdir.' '
+	cat >expected <<-EOF &&
+	:040000 040000 X X R#	a/*	b/
+	:040000 040000 X X R#	a/c/*	b/c/
+	:100644 100644 X X R#	a/c/apath0	b/c/apath0
+	:100644 100644 X X R#	a/c/apath1	b/c/apath1
+	:100644 100644 X X R#	a/c/apath2	b/c/apath2
+	:100644 100644 X X R#	a/path0	b/path0
+	:100644 100644 X X R#	a/path1	b/path1
+	:100644 100644 X X R#	a/path2	b/path2
+	:100644 100644 X X R#	a/path3	b/path3
+	EOF
+	git diff-index --detect-bulk-moves HEAD >current &&
+	compare_diff_raw expected current
+'
+
+test_expect_success 'setup move of only a subdir' '
+	git reset -q --hard &&
+	: rename a subdirectory of a/. &&
+	git mv a/c a/d
+'
+test_expect_success 'moving a subdir only' '
+	cat >expected <<-EOF &&
+	:040000 040000 X X R#	a/c/*	a/d/
+	:100644 100644 X X R#	a/c/apath0	a/d/apath0
+	:100644 100644 X X R#	a/c/apath1	a/d/apath1
+	:100644 100644 X X R#	a/c/apath2	a/d/apath2
+	EOF
+	git diff-index --detect-bulk-moves HEAD >current &&
+	compare_diff_raw expected current
+'
+
+test_expect_success 'setup move without a subdir' '
+	git reset -q --hard &&
+	mkdir b &&
+	: rename files in the directory but not subdir. &&
+	git mv a/path* b/
+'
+test_expect_success 'moving files but not subdirs is not mistaken for dir move' '
+	cat >expected <<-EOF &&
+	:100644 100644 X X R#	a/path0	b/path0
+	:100644 100644 X X R#	a/path1	b/path1
+	:100644 100644 X X R#	a/path2	b/path2
+	:100644 100644 X X R#	a/path3	b/path3
+	EOF
+	git diff-index --detect-bulk-moves HEAD >current &&
+	compare_diff_raw expected current
+'
+
+test_expect_success 'setup move of files and subdirs to different places' '
+	git reset -q --hard &&
+	git mv a/c b &&
+	git mv a d
+'
+test_expect_success 'moving subdirs into one dir and files into another is not mistaken for dir move' '
+	cat >expected <<-EOF &&
+	:040000 040000 X X R#	a/c/*	b/
+	:100644 100644 X X R#	a/c/apath0	b/apath0
+	:100644 100644 X X R#	a/c/apath1	b/apath1
+	:100644 100644 X X R#	a/c/apath2	b/apath2
+	:100644 100644 X X R#	a/path0	d/path0
+	:100644 100644 X X R#	a/path1	d/path1
+	:100644 100644 X X R#	a/path2	d/path2
+	:100644 100644 X X R#	a/path3	d/path3
+	EOF
+	git diff-index --detect-bulk-moves HEAD >current &&
+	compare_diff_raw expected current
+'
+
+# the same with different ordering
+test_expect_success 'setup move of files and subdirs to different places' '
+	git mv d 0
+'
+test_expect_success 'moving subdirs into one dir and files into another is not mistaken for dir move' '
+	cat >expected <<-EOF &&
+	:040000 040000 X X R#	a/c/*	b/
+	:100644 100644 X X R#	a/path0	0/path0
+	:100644 100644 X X R#	a/path1	0/path1
+	:100644 100644 X X R#	a/path2	0/path2
+	:100644 100644 X X R#	a/path3	0/path3
+	:100644 100644 X X R#	a/c/apath0	b/apath0
+	:100644 100644 X X R#	a/c/apath1	b/apath1
+	:100644 100644 X X R#	a/c/apath2	b/apath2
+	EOF
+	git diff-index --detect-bulk-moves HEAD >current &&
+	compare_diff_raw expected current
+'
+
+test_expect_success 'setup move of dir with only subdirs' '
+	git reset -q --hard &&
+	mkdir a/b &&
+	mv a/path* a/b/ &&
+	git update-index --add --remove a/path0 a/path1 a/path2 a/path3 a/b/path* &&
+	test_tick &&
+	git commit -m "move all toplevel files down one level" &&
+
+	git mv a z
+'
+# TODO: only a suboptimal non-detection
+test_expect_failure 'moving a dir with no direct children files' '
+	cat >expected <<-EOF &&
+	:040000 040000 X X R#	a/*	z/
+	:040000 040000 X X R#	a/b/*	z/b/
+	:040000 040000 X X R#	a/c/*	z/c/
+	:100644 100644 X X R#	a/b/path0	z/b/path0
+	:100644 100644 X X R#	a/b/path1	z/b/path1
+	:100644 100644 X X R#	a/b/path2	z/b/path2
+	:100644 100644 X X R#	a/b/path3	z/b/path3
+	:100644 100644 X X R#	a/c/apath0	z/c/apath0
+	:100644 100644 X X R#	a/c/apath1	z/c/apath1
+	:100644 100644 X X R#	a/c/apath2	z/c/apath2
+	EOF
+	git diff-index --detect-bulk-moves HEAD >current &&
+	compare_diff_raw expected current
+'
+# now test moving all files from toplevel into subdir (does not hides file moves) (needs consensus on syntax)
+# Note: this is a special case of move of a dir into one of its own subdirs, which in
+# turn is a variant of new files/dirs being added into a dir after all its contents
+# are moved away
+
+test_expect_success 'setup move from toplevel to subdir' '
+	git reset -q --hard HEAD~3 &&
+	mv a/* . &&
+	git update-index --add --remove a/path0 a/path1 a/path2 a/path3 path* &&
+	test_tick &&
+	git commit -m "move all files to toplevel" &&
+
+	mkdir z &&
+	git mv path* z/
+'
+test_expect_success '--detect-bulk-moves everything from toplevel.' '
+	cat >expected <<-EOF &&
+	:040000 040000 X X R#	./*	z/
+	:100644 100644 X X R#	path0	z/path0
+	:100644 100644 X X R#	path1	z/path1
+	:100644 100644 X X R#	path2	z/path2
+	:100644 100644 X X R#	path3	z/path3
+	EOF
+	git diff-index --detect-bulk-moves HEAD >current &&
+	compare_diff_raw expected current
+'
+
+test_done
-- 
1.7.2.3

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH v8 4/5] Unified diff output format for bulk moves.
  2010-10-28 22:08 [PATCH v8] Detection of directory renames Yann Dirson
                   ` (2 preceding siblings ...)
  2010-10-28 22:08 ` [PATCH v8 3/5] Add testcases for the --detect-bulk-moves diffcore flag Yann Dirson
@ 2010-10-28 22:08 ` Yann Dirson
  2010-10-28 22:08 ` [PATCH v8 5/5] [WIP] Allow hiding renames of individual files involved in a directory rename Yann Dirson
  2010-10-29  1:05 ` [PATCH v8] Detection of directory renames Jonathan Nieder
  5 siblings, 0 replies; 15+ messages in thread
From: Yann Dirson @ 2010-10-28 22:08 UTC (permalink / raw)
  To: git; +Cc: Yann Dirson, Yann Dirson

The output produced is as shown below.

diff --git-detect-bulk-moves mips/nxp/pnx833x/common/ mips/pnx833x/common/
bulk move with similarity index 100%
bulk move from mips/nxp/pnx833x/common/
bulk move to mips/pnx833x/common/

Signed-off-by: Yann Dirson <ydirson@free.fr>
---
 Documentation/diff-generate-patch.txt |   19 +++++++++++++
 diff.c                                |   48 +++++++++++++++++++++++++-------
 2 files changed, 56 insertions(+), 11 deletions(-)

diff --git a/Documentation/diff-generate-patch.txt b/Documentation/diff-generate-patch.txt
index 3ac2bea..367e859 100644
--- a/Documentation/diff-generate-patch.txt
+++ b/Documentation/diff-generate-patch.txt
@@ -71,6 +71,25 @@ separate lines indicate the old and the new mode.
       rename to a
 
 
+bulk move entries
+-----------------
+
+When a bulk move is detected, a special block is output in addition to
+the renames that constitute the bulk move, looking like this:
+
+       diff --git-detect-bulk-moves a/dir1/ b/dir2/
+
+It is essentially similar to the standard diff header, with a special
+syntax showing we are describing a difference between two directories,
+and not a change to be applied as-is to a file.
+
+It is followed by three specific extended header lines:
+
+       bulk move with similarity index <number>
+       bulk move from <dir1>/
+       bulk move to <dir2>/
+
+
 combined diff format
 --------------------
 
diff --git a/diff.c b/diff.c
index 948eb76..ab2e201 100644
--- a/diff.c
+++ b/diff.c
@@ -2659,6 +2659,7 @@ static void run_diff_cmd(const char *pgm,
 	const char *xfrm_msg = NULL;
 	int complete_rewrite = (p->status == DIFF_STATUS_MODIFIED) && p->score;
 	int must_show_header = 0;
+	int use_color;
 
 	if (!DIFF_OPT_TST(o, ALLOW_EXTERNAL))
 		pgm = NULL;
@@ -2668,14 +2669,36 @@ static void run_diff_cmd(const char *pgm,
 			pgm = drv->external;
 	}
 
+	/*
+	 * don't use colors when the header is intended for an
+	 * external diff driver
+	 */
+	use_color = DIFF_OPT_TST(o, COLOR_DIFF) && !pgm;
+
+	if (p->is_bulkmove) {
+		const char *set = diff_get_color(use_color, DIFF_METAINFO);
+		const char *reset = diff_get_color(use_color, DIFF_RESET);
+		struct strbuf *msgbuf;
+		char *line_prefix = "";
+
+		if (o->output_prefix) {
+			msgbuf = o->output_prefix(o, o->output_prefix_data);
+			line_prefix = msgbuf->buf;
+		}
+		fprintf(o->file, "%s%sdiff --git-detect-bulk-moves %s %s%s\n",
+			line_prefix, set, one->path, two->path, reset);
+		fprintf(o->file, "%s%sbulk move with similarity index %d%%%s\n",
+			line_prefix, set, similarity_index(p), reset);
+		fprintf(o->file, "%s%sbulk move from %s%s\n",
+			line_prefix, set, one->path, reset);
+		fprintf(o->file, "%s%sbulk move to %s%s\n",
+			line_prefix, set, two->path, reset);
+		return;
+	}
+
 	if (msg) {
-		/*
-		 * don't use colors when the header is intended for an
-		 * external diff driver
-		 */
 		fill_metainfo(msg, name, other, one, two, o, p,
-			      &must_show_header,
-			      DIFF_OPT_TST(o, COLOR_DIFF) && !pgm);
+			      &must_show_header, use_color);
 		xfrm_msg = msg->len ? msg->buf : NULL;
 	}
 
@@ -2748,8 +2771,10 @@ static void run_diff(struct diff_filepair *p, struct diff_options *o)
 		return;
 	}
 
-	diff_fill_sha1_info(one);
-	diff_fill_sha1_info(two);
+	if (!p->is_bulkmove) {
+		diff_fill_sha1_info(one);
+		diff_fill_sha1_info(two);
+	}
 
 	if (!pgm &&
 	    DIFF_FILE_VALID(one) && DIFF_FILE_VALID(two) &&
@@ -3548,9 +3573,10 @@ static void diff_flush_patch(struct diff_filepair *p, struct diff_options *o)
 	if (diff_unmodified_pair(p))
 		return;
 
-	if ((DIFF_FILE_VALID(p->one) && S_ISDIR(p->one->mode)) ||
-	    (DIFF_FILE_VALID(p->two) && S_ISDIR(p->two->mode)))
-		return; /* no tree diffs in patch format */
+	if (!p->is_bulkmove &&
+	    ((DIFF_FILE_VALID(p->one) && S_ISDIR(p->one->mode)) ||
+	     (DIFF_FILE_VALID(p->two) && S_ISDIR(p->two->mode))))
+		return; /* no tree diffs in patch format, except for bulk moves */
 
 	run_diff(p, o);
 }
-- 
1.7.2.3

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH v8 5/5] [WIP] Allow hiding renames of individual files involved in a directory rename.
  2010-10-28 22:08 [PATCH v8] Detection of directory renames Yann Dirson
                   ` (3 preceding siblings ...)
  2010-10-28 22:08 ` [PATCH v8 4/5] Unified diff output format for bulk moves Yann Dirson
@ 2010-10-28 22:08 ` Yann Dirson
  2010-10-29  1:05 ` [PATCH v8] Detection of directory renames Jonathan Nieder
  5 siblings, 0 replies; 15+ messages in thread
From: Yann Dirson @ 2010-10-28 22:08 UTC (permalink / raw)
  To: git; +Cc: Yann Dirson, Yann Dirson

Once has identified groups of bulk-moved files, and then
the --hide-bulk-move-details flag hides those of the individual renames
which carry no other information (further name change, or content changes).

This makes it much easier to a human reader to spot content changes
in a commit that also moves a whole subtree.

Signed-off-by: Yann Dirson <ydirson@free.fr>
---
 Documentation/diff-options.txt |    5 +++
 diff.c                         |    7 ++++
 diff.h                         |    3 ++
 diffcore-rename.c              |   68 ++++++++++++++++++++++++++++++++++++++--
 diffcore.h                     |    1 +
 5 files changed, 81 insertions(+), 3 deletions(-)

diff --git a/Documentation/diff-options.txt b/Documentation/diff-options.txt
index 887df4b..f9a2f4e 100644
--- a/Documentation/diff-options.txt
+++ b/Documentation/diff-options.txt
@@ -249,6 +249,11 @@ endif::git-log[]
 	Detect bulk move of all files of a directory into a
 	different one.
 
+--hide-bulk-move-details::
+	Hide the individual files moves that make up a bulk move,
+	without hinding other changes to the involved files (contents
+	change, name change relative to the bulk move's destination).
+
 -C[<n>]::
 --detect-copies[=<n>]::
 	Detect copies as well as renames.  See also `--find-copies-harder`.
diff --git a/diff.c b/diff.c
index ab2e201..b920f2e 100644
--- a/diff.c
+++ b/diff.c
@@ -3221,6 +3221,13 @@ int diff_opt_parse(struct diff_options *options, const char **av, int ac)
 		if (!options->detect_rename)
 			options->detect_rename = DIFF_DETECT_RENAME;
 	}
+	else if (!strcmp(arg, "--hide-bulk-move-details")) {
+		DIFF_OPT_SET(options, HIDE_DIR_RENAME_DETAILS);
+		if (!DIFF_OPT_TST(options, DETECT_BULK_MOVES))
+			DIFF_OPT_SET(options, DETECT_BULK_MOVES);
+		if (!options->detect_rename)
+			options->detect_rename = DIFF_DETECT_RENAME;
+	}
 	else if (!strcmp(arg, "--follow"))
 		DIFF_OPT_SET(options, FOLLOW_RENAMES);
 	else if (!strcmp(arg, "--color"))
diff --git a/diff.h b/diff.h
index 1e2506c..f28fed5 100644
--- a/diff.h
+++ b/diff.h
@@ -79,6 +79,7 @@ typedef struct strbuf *(*diff_prefix_fn_t)(struct diff_options *opt, void *data)
 #define DIFF_OPT_IGNORE_DIRTY_SUBMODULES (1 << 26)
 #define DIFF_OPT_OVERRIDE_SUBMODULE_CONFIG (1 << 27)
 #define DIFF_OPT_DETECT_BULK_MOVES  (1 << 28)
+#define DIFF_OPT_HIDE_DIR_RENAME_DETAILS (1 << 29)
 
 #define DIFF_OPT_TST(opts, flag)    ((opts)->flags & DIFF_OPT_##flag)
 #define DIFF_OPT_SET(opts, flag)    ((opts)->flags |= DIFF_OPT_##flag)
@@ -268,6 +269,8 @@ extern void diffcore_fix_diff_index(struct diff_options *);
 "                try unchanged files as candidate for copy detection.\n" \
 "  --detect-bulk-moves\n" \
 "                detect moves of all files of a single directory.\n" \
+"  --hide-bulk-move-details\n" \
+"                hide individual files moves in a bulk move.\n" \
 "  -l<n>         limit rename attempts up to <n> paths.\n" \
 "  -O<file>      reorder diffs according to the <file>.\n" \
 "  -S<string>    find filepair whose only one side contains the string.\n" \
diff --git a/diffcore-rename.c b/diffcore-rename.c
index 6afc662..940cb7d 100644
--- a/diffcore-rename.c
+++ b/diffcore-rename.c
@@ -519,6 +519,34 @@ static struct diff_bulk_rename *locate_bulkmove_candidate(const char *one_path,
 }
 
 /*
+ * Marks as such file_rename if it is made uninteresting by dir_rename.
+ * Returns -1 if the file_rename is outside of the range in which given
+ * renames concerned by dir_rename are to be found (ie. end of loop),
+ * 0 otherwise.
+ */
+static int maybe_mark_uninteresting(struct diff_rename_dst *file_rename,
+				    struct diff_bulk_rename *dir_rename,
+				    int one_len, int two_len)
+{
+	if (!file_rename->pair) /* file add */
+		return 0;
+	if (strncmp(file_rename->two->path,
+		    dir_rename->two->path, two_len))
+		return -1;
+	if (strncmp(file_rename->pair->one->path,
+		    dir_rename->one->path, one_len) ||
+	    !basename_same(file_rename->pair->one, file_rename->two) ||
+	    file_rename->pair->score != MAX_SCORE)
+		return 0;
+
+	file_rename->pair->uninteresting_rename = 1;
+	debug_bulkmove(("%s* -> %s* makes %s -> %s uninteresting\n",
+			dir_rename->one->path, dir_rename->two->path,
+			file_rename->pair->one->path, file_rename->two->path));
+	return 0;
+}
+
+/*
  * Copy dirname of src into dst, suitable to append a filename without
  * an additional "/".
  * Only handles relative paths since there is no absolute path in a git repo.
@@ -721,11 +749,44 @@ static void check_one_bulk_move(struct diff_filepair *dstpair)
  * Take all file renames recorded so far and check if they could cause
  * a bulk move to be detected.
  */
-static void diffcore_bulk_moves(void)
+static void diffcore_bulk_moves(int opt_hide_renames)
 {
 	int i;
 	for (i = 0; i < rename_dst_nr; i++)
 		check_one_bulk_move(rename_dst[i].pair);
+
+	if (opt_hide_renames) {
+		/* flag as "uninteresting" those candidates hidden by dir move */
+		struct diff_bulk_rename *candidate;
+		for (candidate = bulkmove_candidates;
+		     candidate < bulkmove_candidates + bulkmove_candidates_nr;
+		     candidate++) {
+			int two_idx, i, one_len, two_len;
+			struct diff_rename_dst *two_sample;
+			if (candidate->discarded)
+				continue;
+
+			/* bisect to any entry within candidate dst dir */
+			two_sample = locate_rename_dst_dir(candidate->two->path);
+			if (!two_sample) {
+				die("PANIC: %s candidate of rename not in target tree (from %s)\n",
+				    candidate->two->path, candidate->one->path);
+			}
+			two_idx = two_sample - rename_dst;
+
+			/* now remove extraneous 100% files inside. */
+			one_len = strlen(candidate->one->path);
+			two_len = strlen(candidate->two->path);
+			for (i = two_idx; i < rename_dst_nr; i++)
+				if (maybe_mark_uninteresting(rename_dst+i, candidate,
+							     one_len, two_len) < 0)
+					break;
+			for (i = two_idx-1; i >= 0; i--)
+				if (maybe_mark_uninteresting(rename_dst+i, candidate,
+							     one_len, two_len) < 0)
+					break;
+		}
+	}
 }
 
 void diffcore_rename(struct diff_options *options)
@@ -897,7 +958,7 @@ void diffcore_rename(struct diff_options *options)
 
 	/* Now possibly factorize those renames and copies. */
 	if (DIFF_OPT_TST(options, DETECT_BULK_MOVES))
-		diffcore_bulk_moves();
+		diffcore_bulk_moves(DIFF_OPT_TST(options, HIDE_DIR_RENAME_DETAILS));
 
 	DIFF_QUEUE_CLEAR(&outq);
 
@@ -932,7 +993,8 @@ void diffcore_rename(struct diff_options *options)
 			struct diff_rename_dst *dst =
 				locate_rename_dst(p->two, 0);
 			if (dst && dst->pair) {
-				diff_q(&outq, dst->pair);
+				if (!dst->pair->uninteresting_rename)
+					diff_q(&outq, dst->pair);
 				pair_to_free = p;
 			}
 			else
diff --git a/diffcore.h b/diffcore.h
index 6dab95b..a4eb8e1 100644
--- a/diffcore.h
+++ b/diffcore.h
@@ -69,6 +69,7 @@ struct diff_filepair {
 	unsigned broken_pair : 1;
 	unsigned renamed_pair : 1;
 	unsigned is_unmerged : 1;
+	unsigned uninteresting_rename : 1;
 	unsigned is_bulkmove : 1;
 };
 #define DIFF_PAIR_UNMERGED(p) ((p)->is_unmerged)
-- 
1.7.2.3

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PATCH v8] Detection of directory renames
  2010-10-28 22:08 [PATCH v8] Detection of directory renames Yann Dirson
                   ` (4 preceding siblings ...)
  2010-10-28 22:08 ` [PATCH v8 5/5] [WIP] Allow hiding renames of individual files involved in a directory rename Yann Dirson
@ 2010-10-29  1:05 ` Jonathan Nieder
  5 siblings, 0 replies; 15+ messages in thread
From: Jonathan Nieder @ 2010-10-29  1:05 UTC (permalink / raw)
  To: Yann Dirson; +Cc: git

Yann Dirson wrote:

> Changes since v7:

More history for new readers:

 - v7: http://thread.gmane.org/gmane.comp.version-control.git/159825
 - v6: http://thread.gmane.org/gmane.comp.version-control.git/159081
 - v5: http://thread.gmane.org/gmane.comp.version-control.git/158623
 - v1-4: http://thread.gmane.org/gmane.comp.version-control.git/157917/focus=157981

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v8 1/5] Introduce bulk-move detection in diffcore.
  2010-10-28 22:08 ` [PATCH v8 1/5] Introduce bulk-move detection in diffcore Yann Dirson
@ 2010-10-29  1:45   ` Jonathan Nieder
  2010-10-29 21:18     ` Yann Dirson
  2010-11-25 10:08   ` [PATCH v8 1/5] " Ævar Arnfjörð Bjarmason
  1 sibling, 1 reply; 15+ messages in thread
From: Jonathan Nieder @ 2010-10-29  1:45 UTC (permalink / raw)
  To: Yann Dirson; +Cc: git, Yann Dirson

Yann Dirson wrote:

> Possible optimisations to this code include:
> * avoid use of i_am_not_single by using a separate list

I think that would help code clarity, too. It is tempting to try to
split this patch into micropatches:

1. introduce DETECT_DIRECTORY_RENAMES flag and hidden UI for it.

2. if detecting bulk relocations, do not simplify input for
   diffcore by omitting unchanged files (just like when
   finding copies harder).

3. introduce "rename target candidate" flag in rename_dsts.
    - if detecting copies, all diff targets are potential rename
      targets

    - if detecting bulk relocations but not copies, all diff
      targets are rename_dsts, but only file creation diff
      targets are potential rename targets

    - if detecting neither bulk reloc nor copies, only file
      creation diff targets are rename_dsts (and all are
      potential rename targets).

   Alternatively, rename_dsts that are not potential rename targets
   could be put in a different list (which is simpler and probably
   faster, while using a little more memory).

4. honor rename_target_candidate flag (not needed if the
   non-candidates get their own list).

5. introduce a list of potential directory relocations and functions
   to manipulate it.

6. pass the (empty) list of relocated directories back to diffcore.

7. populate the list of directory relocation candidates.  If a file
   has been renamed to go from directory a/ to directory b/, we have a
   directory rename candidate.

8. disqualify directories with stragglers left behind.

9. disqualify directories for which the contents are not unanimous
   about where to go.

10. add documentation and stop hiding the UI.

Trivial comments on the patch:

[...]
> +++ b/diffcore-rename.c
> @@ -6,14 +6,34 @@
>  #include "diffcore.h"
>  #include "hash.h"
>  
> +#define DEBUG_BULKMOVE 0
> +
> +#if DEBUG_BULKMOVE
> +#define debug_bulkmove(args) __debug_bulkmove args
> +void __debug_bulkmove(const char *fmt, ...)
> +{
> +	va_list ap;
> +	va_start(ap, fmt);
> +	fprintf(stderr, "[DBG] ");
> +	vfprintf(stderr, fmt, ap);
> +	va_end(ap);
> +}
> +#else
> +#define debug_bulkmove(args) do { /*nothing */ } while (0)
> +#endif

Is the debugging output infrequent enough to just use a function
unconditionally?

[...]
> + * Supports in-place modification of src by passing dst == src.
> + */
> +static const char *copy_dirname(char *dst, const char *src)
[...]
> +	end = mempcpy(dst, src, slash - src + 1);

I suppose this should read:

	if (dst != src)
		memcpy(dst, src, slash - src + 1);
	dst[slash - src + 1] = '\0';
	return dst;

[...]
> +static int discard_if_outside(struct diff_bulk_rename *candidate,
> +			 struct diff_bulk_rename *seen) {

Style: '{' for functions goes in column 0.

> +	if (!prefixcmp(candidate->two->path, seen->two->path)) {
> +		debug_bulkmove((" 'dstpair' conforts 'seen'\n"));
> +		return 0;
> +	} else {

Can get some depth reduction by dropping the else here (since in
the trivial case we have already returned).

> +static void diffcore_bulk_moves(void)
> +{
> +	int i;
> +	for (i = 0; i < rename_dst_nr; i++)
> +		check_one_bulk_move(rename_dst[i].pair);
> +}

Yay. :)

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v8 1/5] Introduce bulk-move detection in diffcore.
  2010-10-29  1:45   ` Jonathan Nieder
@ 2010-10-29 21:18     ` Yann Dirson
  2010-10-30  0:26       ` Jonathan Nieder
  0 siblings, 1 reply; 15+ messages in thread
From: Yann Dirson @ 2010-10-29 21:18 UTC (permalink / raw)
  To: Jonathan Nieder; +Cc: git

On Thu, Oct 28, 2010 at 08:45:40PM -0500, Jonathan Nieder wrote:
> Yann Dirson wrote:
> 
> > Possible optimisations to this code include:
> > * avoid use of i_am_not_single by using a separate list
> 
> I think that would help code clarity, too. It is tempting to try to
> split this patch into micropatches:

Hm, I fear too much granularity would become meaningless :)

A number of the steps you suggest (notably 8 and 9) would cause the
code to be plain wrong at some points, since those are part of what
makes the algorithm (appear to be) correct.  This would not be a
problem only because that code would not be reachable throught any
means, right ?  If the code needs to be easier to understand, I'd
rather add some more doc, than added commits that are basically
"useless for bisect".

I'm much more tempted to split into fully-functionnal patches that do
adds reachable code paths (eg. bulk removal - although it's much more
than just a split of the patch).

> 1. introduce DETECT_DIRECTORY_RENAMES flag and hidden UI for it.

What do you mean by "hidden UI" ?

[...]
> 8. disqualify directories with stragglers left behind.
> 
> 9. disqualify directories for which the contents are not unanimous
>    about where to go.

[...]

> Trivial comments on the patch:
> 
> [...]
> > +++ b/diffcore-rename.c
> > @@ -6,14 +6,34 @@
> >  #include "diffcore.h"
> >  #include "hash.h"
> >  
> > +#define DEBUG_BULKMOVE 0
> > +
> > +#if DEBUG_BULKMOVE
> > +#define debug_bulkmove(args) __debug_bulkmove args
> > +void __debug_bulkmove(const char *fmt, ...)
> > +{
> > +	va_list ap;
> > +	va_start(ap, fmt);
> > +	fprintf(stderr, "[DBG] ");
> > +	vfprintf(stderr, fmt, ap);
> > +	va_end(ap);
> > +}
> > +#else
> > +#define debug_bulkmove(args) do { /*nothing */ } while (0)
> > +#endif
> 
> Is the debugging output infrequent enough to just use a function
> unconditionally?

You mean, keep funccalls even with DEBUG_BULKMOVE is not set ?  No,
there are too many traces for that.

> [...]
> > + * Supports in-place modification of src by passing dst == src.
> > + */
> > +static const char *copy_dirname(char *dst, const char *src)
> [...]
> > +	end = mempcpy(dst, src, slash - src + 1);
> 
> I suppose this should read:
> 
> 	if (dst != src)
> 		memcpy(dst, src, slash - src + 1);
> 	dst[slash - src + 1] = '\0';
> 	return dst;

Ah, sure the dst==src case can be improved.  But I'm not sure
factorizing writing NUL is worth the cost of re-computing where to put
it when using mempcpy would avoid.  Wouldn't the following be more
adequate ?

	if (dst != src) {
		end = mempcpy(dst, src, slash - src + 1);
		*end = '\0';
	} else
		dst[slash - src + 1] = '\0';
	return dst;

> Style: '{' for functions goes in column 0.

> Can get some depth reduction by dropping the else here (since in
> the trivial case we have already returned).

Right, thanks.

> > +static void diffcore_bulk_moves(void)
> > +{
> > +	int i;
> > +	for (i = 0; i < rename_dst_nr; i++)
> > +		check_one_bulk_move(rename_dst[i].pair);
> > +}
> 
> Yay. :)

Isn't that nice and pretty :)

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v8 1/5] Introduce bulk-move detection in diffcore.
  2010-10-29 21:18     ` Yann Dirson
@ 2010-10-30  0:26       ` Jonathan Nieder
  2010-11-04 21:56         ` Yann Dirson
  2010-11-04 21:59         ` [PATCH] " Yann Dirson
  0 siblings, 2 replies; 15+ messages in thread
From: Jonathan Nieder @ 2010-10-30  0:26 UTC (permalink / raw)
  To: Yann Dirson; +Cc: git

Yann Dirson wrote:

> Hm, I fear too much granularity would become meaningless :)
[...]
>                 If the code needs to be easier to understand, I'd
> rather add some more doc, than added commits that are basically
> "useless for bisect".

Yes, the example list was probably a little overboard.  Even so, the
idea was to maintain bisectability by introducing a few tests at a
time.  That way, the fallout of a subfeature can easily be found with
"git bisect", among other benefits[1].

I might try it out some time soon; don't worry about it if you don't
like the idea. :)

> On Thu, Oct 28, 2010 at 08:45:40PM -0500, Jonathan Nieder wrote:

>> 1. introduce DETECT_DIRECTORY_RENAMES flag and hidden UI for it.
>
> What do you mean by "hidden UI" ?

PARSE_OPT_HIDDEN (i.e., a commandline option not tempting readers by
listing in -h).

>> Is the debugging output infrequent enough to just use a function
>> unconditionally?
>
> You mean, keep funccalls even with DEBUG_BULKMOVE is not set ?  No,
> there are too many traces for that.

Yep, that's what I meant.  Alas.

Will the debugging output still be useful once the code is known
to work reliably?  If not, I guess we can remove it at that point
(avoiding the need to worry about patches that may introduce

	debug_bulkmove(expression that does not even
		       typecheck);

down the line).

> Ah, sure the dst==src case can be improved.  But I'm not sure
> factorizing writing NUL is worth the cost of re-computing where to put
> it when using mempcpy would avoid.  Wouldn't the following be more
> adequate ?
> 
> 	if (dst != src) {
> 		end = mempcpy(dst, src, slash - src + 1);
> 		*end = '\0';
> 	} else
> 		dst[slash - src + 1] = '\0';
> 	return dst;

I doubt the difference is measurable.  But that looks fine (micronit:
I suppose one should put the dst == src case first in that case).

Jonathan

[1] The example that set me on this path to madness:
http://thread.gmane.org/gmane.comp.version-control.git/151086/focus=158913

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v8 1/5] Introduce bulk-move detection in diffcore.
  2010-10-30  0:26       ` Jonathan Nieder
@ 2010-11-04 21:56         ` Yann Dirson
  2010-11-04 21:59         ` [PATCH] " Yann Dirson
  1 sibling, 0 replies; 15+ messages in thread
From: Yann Dirson @ 2010-11-04 21:56 UTC (permalink / raw)
  To: Jonathan Nieder; +Cc: Yann Dirson, git

On Fri, Oct 29, 2010 at 07:26:00PM -0500, Jonathan Nieder wrote:
> Yann Dirson wrote:
> 
> > Hm, I fear too much granularity would become meaningless :)
> [...]
> >                 If the code needs to be easier to understand, I'd
> > rather add some more doc, than added commits that are basically
> > "useless for bisect".
> 
> Yes, the example list was probably a little overboard.  Even so, the
> idea was to maintain bisectability by introducing a few tests at a
> time.  That way, the fallout of a subfeature can easily be found with
> "git bisect", among other benefits[1].

I like the idea when successive commits get new functionality, but we
have to avoid introducing known-broken functionality and fixing them
in later patches - and I fear the list as you described it was going
that latter way.

> >> Is the debugging output infrequent enough to just use a function
> >> unconditionally?
> >
> > You mean, keep funccalls even with DEBUG_BULKMOVE is not set ?  No,
> > there are too many traces for that.
> 
> Yep, that's what I meant.  Alas.

Well, thinking twice, calls to empty function would be optimized out,
so yes, we don't need cpp magic at all.

-- 
Yann

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH] Introduce bulk-move detection in diffcore.
  2010-10-30  0:26       ` Jonathan Nieder
  2010-11-04 21:56         ` Yann Dirson
@ 2010-11-04 21:59         ` Yann Dirson
  1 sibling, 0 replies; 15+ messages in thread
From: Yann Dirson @ 2010-11-04 21:59 UTC (permalink / raw)
  To: git; +Cc: Yann Dirson, Yann Dirson

This feature tries to group together files moving from and to
identical directories - a common case being directory renames.

This only adds the detection logic.  The output of raw diff is displayed
as "Rnnn a/ b/", and unified diff does not display them at all.  Output
formats will be refined later in the series.

It is implemented as a new pass in diffcore-rename, occuring after the
file renames get detected, grouping those renames looking like a move
of a full directory into some other place. It is activated by the new
--detect-bulk-moves diffcore flag.

Possible optimisations to this code include:
* avoid use of i_am_not_single by using a separate list
* use a more informative prefixcmp to avoid strcmp calls
  eg. in discard_if_outside()
* optimize for bulk insertions (avoid useless successive memmove's)

Other future developements to be made on top of this include:
* detect bulk removals (well, that one is rather a subset than a layer above),
  and possibly bulk additions
* detect bulk copies
* detect inexact bulk-moves/copies (where some files were not moved, or were
  moved to a different place) - problem of computing a similarity score
* display as such the special case of directory move/rename
* application of such new diffs: issue a conflict, or just a warning ?
* teach git-svn (and others ?) to make use of that flag
* handle new conflict type "bulk-move/add"
* detect "directory splits" as well
* use inexact dir renames to bump score of below-threshold renames
  from/to same locations
* support other types of bluk-grouping, like prefixes (see eg. kernel
  5d1e859c), and maybe config-specified patterns
* add yours here

This patch has been improved by the following contributions:
- Jonathan Nieder: better implementation of copy_dirname()
- Jonathan Nieder: portable implementation of memrchr() in another patch
- Junio C Hamano: split individual renames hiding under control of another flag
- Junio C Hamano: coding style issues
- Junio C Hamano: better examples
- Ævar Arnfjörð Bjarmason: Don't use C99 comments.
- Jonathan Nieder: just too many other helpful suggestions to list them all

Thanks-to: Jonathan Nieder <jrnieder@gmail.com>
Thanks-to: Junio C Hamano <gitster@pobox.com>
Thanks-to: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Yann Dirson <ydirson@free.fr>
---

This version answers to most of Jonathan's remarks, except for the
debug_bulkmove() improvement.  Should notably make valgrind more happy.

 Documentation/diff-options.txt |    4 +
 Documentation/gitdiffcore.txt  |   12 ++
 diff-lib.c                     |    6 +-
 diff.c                         |    5 +
 diff.h                         |    3 +
 diffcore-rename.c              |  375 ++++++++++++++++++++++++++++++++++++++--
 diffcore.h                     |    1 +
 tree-diff.c                    |    4 +-
 8 files changed, 396 insertions(+), 14 deletions(-)

diff --git a/Documentation/diff-options.txt b/Documentation/diff-options.txt
index bfd0b57..887df4b 100644
--- a/Documentation/diff-options.txt
+++ b/Documentation/diff-options.txt
@@ -245,6 +245,10 @@ endif::git-log[]
 	delete/add pair to be a rename if more than 90% of the file
 	hasn't changed.
 
+--detect-bulk-moves::
+	Detect bulk move of all files of a directory into a
+	different one.
+
 -C[<n>]::
 --detect-copies[=<n>]::
 	Detect copies as well as renames.  See also `--find-copies-harder`.
diff --git a/Documentation/gitdiffcore.txt b/Documentation/gitdiffcore.txt
index 6af29a4..93111ac 100644
--- a/Documentation/gitdiffcore.txt
+++ b/Documentation/gitdiffcore.txt
@@ -175,6 +175,18 @@ the expense of making it slower.  Without `\--find-copies-harder`,
 'git diff-{asterisk}' commands can detect copies only if the file that was
 copied happened to have been modified in the same changeset.
 
+Bulk move of all files of a directory into a different one can get
+detected using the `\--detect-bulk-moves` option.  This adds an
+additional pass on top of the results of per-file rename detection.
+They are reported with NULL SHA1 id, in addition to the file renames:
+
+------------------------------------------------
+:040000 040000 0000000... 0000000... R100 foo/ bar/
+:100644 100644 0123456... 1234567... R090 foo/file0 bar/file3
+:100644 100644 2345678... 2345678... R100 foo/file1 bar/file1
+:100644 100644 3456789... 3456789... R100 foo/file2 bar/file2
+------------------------------------------------
+
 
 diffcore-merge-broken: For Putting "Complete Rewrites" Back Together
 --------------------------------------------------------------------
diff --git a/diff-lib.c b/diff-lib.c
index 392ce2b..5ec3ddc 100644
--- a/diff-lib.c
+++ b/diff-lib.c
@@ -208,7 +208,8 @@ int run_diff_files(struct rev_info *revs, unsigned int option)
 						    ce_option, &dirty_submodule);
 		if (!changed && !dirty_submodule) {
 			ce_mark_uptodate(ce);
-			if (!DIFF_OPT_TST(&revs->diffopt, FIND_COPIES_HARDER))
+			if (!DIFF_OPT_TST(&revs->diffopt, FIND_COPIES_HARDER) &&
+			    !DIFF_OPT_TST(&revs->diffopt, DETECT_BULK_MOVES))
 				continue;
 		}
 		oldmode = ce->ce_mode;
@@ -338,7 +339,8 @@ static int show_modified(struct rev_info *revs,
 
 	oldmode = old->ce_mode;
 	if (mode == oldmode && !hashcmp(sha1, old->sha1) && !dirty_submodule &&
-	    !DIFF_OPT_TST(&revs->diffopt, FIND_COPIES_HARDER))
+	    !DIFF_OPT_TST(&revs->diffopt, FIND_COPIES_HARDER) &&
+	    !DIFF_OPT_TST(&revs->diffopt, DETECT_BULK_MOVES))
 		return 0;
 
 	diff_change(&revs->diffopt, oldmode, mode,
diff --git a/diff.c b/diff.c
index d1c6b91..2eba335 100644
--- a/diff.c
+++ b/diff.c
@@ -3191,6 +3191,11 @@ int diff_opt_parse(struct diff_options *options, const char **av, int ac)
 		DIFF_OPT_SET(options, REVERSE_DIFF);
 	else if (!strcmp(arg, "--find-copies-harder"))
 		DIFF_OPT_SET(options, FIND_COPIES_HARDER);
+	else if (!strcmp(arg, "--detect-bulk-moves")) {
+		DIFF_OPT_SET(options, DETECT_BULK_MOVES);
+		if (!options->detect_rename)
+			options->detect_rename = DIFF_DETECT_RENAME;
+	}
 	else if (!strcmp(arg, "--follow"))
 		DIFF_OPT_SET(options, FOLLOW_RENAMES);
 	else if (!strcmp(arg, "--color"))
diff --git a/diff.h b/diff.h
index 0083d92..1e2506c 100644
--- a/diff.h
+++ b/diff.h
@@ -78,6 +78,7 @@ typedef struct strbuf *(*diff_prefix_fn_t)(struct diff_options *opt, void *data)
 #define DIFF_OPT_IGNORE_UNTRACKED_IN_SUBMODULES (1 << 25)
 #define DIFF_OPT_IGNORE_DIRTY_SUBMODULES (1 << 26)
 #define DIFF_OPT_OVERRIDE_SUBMODULE_CONFIG (1 << 27)
+#define DIFF_OPT_DETECT_BULK_MOVES  (1 << 28)
 
 #define DIFF_OPT_TST(opts, flag)    ((opts)->flags & DIFF_OPT_##flag)
 #define DIFF_OPT_SET(opts, flag)    ((opts)->flags |= DIFF_OPT_##flag)
@@ -265,6 +266,8 @@ extern void diffcore_fix_diff_index(struct diff_options *);
 "  -C            detect copies.\n" \
 "  --find-copies-harder\n" \
 "                try unchanged files as candidate for copy detection.\n" \
+"  --detect-bulk-moves\n" \
+"                detect moves of all files of a single directory.\n" \
 "  -l<n>         limit rename attempts up to <n> paths.\n" \
 "  -O<file>      reorder diffs according to the <file>.\n" \
 "  -S<string>    find filepair whose only one side contains the string.\n" \
diff --git a/diffcore-rename.c b/diffcore-rename.c
index df41be5..52f3083 100644
--- a/diffcore-rename.c
+++ b/diffcore-rename.c
@@ -6,14 +6,34 @@
 #include "diffcore.h"
 #include "hash.h"
 
+#define DEBUG_BULKMOVE 0
+
+#if DEBUG_BULKMOVE
+#define debug_bulkmove(args) __debug_bulkmove args
+void __debug_bulkmove(const char *fmt, ...)
+{
+	va_list ap;
+	va_start(ap, fmt);
+	fprintf(stderr, "[DBG] ");
+	vfprintf(stderr, fmt, ap);
+	va_end(ap);
+}
+#else
+#define debug_bulkmove(args) do { /*nothing */ } while (0)
+#endif
+
 /* Table of rename/copy destinations */
 
 static struct diff_rename_dst {
 	struct diff_filespec *two;
 	struct diff_filepair *pair;
+	unsigned i_am_not_single:1; /* does not look for a match, only here to be looked at */
 } *rename_dst;
 static int rename_dst_nr, rename_dst_alloc;
 
+/*
+ * Do a binary search of "two" in "rename_dst", inserting it if not found.
+ */
 static struct diff_rename_dst *locate_rename_dst(struct diff_filespec *two,
 						 int insert_ok)
 {
@@ -49,9 +69,36 @@ static struct diff_rename_dst *locate_rename_dst(struct diff_filespec *two,
 	rename_dst[first].two = alloc_filespec(two->path);
 	fill_filespec(rename_dst[first].two, two->sha1, two->mode);
 	rename_dst[first].pair = NULL;
+	rename_dst[first].i_am_not_single = 0;
 	return &(rename_dst[first]);
 }
 
+/*
+ * Do a binary search in "rename_dst" of any entry under "dirname".
+ */
+static struct diff_rename_dst *locate_rename_dst_dir(const char *dirname)
+{
+	int first, last;
+	int prefixlength = strlen(dirname);
+
+	first = 0;
+	last = rename_dst_nr;
+	while (last > first) {
+		int next = (last + first) >> 1;
+		struct diff_rename_dst *dst = &(rename_dst[next]);
+		int cmp = strncmp(dirname, dst->two->path, prefixlength);
+		if (!cmp)
+			return dst;
+		if (cmp < 0) {
+			last = next;
+			continue;
+		}
+		first = next+1;
+	}
+	/* not found */
+	return NULL;
+}
+
 /* Table of rename/copy src files */
 static struct diff_rename_src {
 	struct diff_filespec *one;
@@ -386,8 +433,11 @@ static int find_exact_renames(void)
 	for (i = 0; i < rename_src_nr; i++)
 		insert_file_table(&file_table, -1, i, rename_src[i].one);
 
-	for (i = 0; i < rename_dst_nr; i++)
+	for (i = 0; i < rename_dst_nr; i++) {
+		if (rename_dst[i].i_am_not_single)
+			continue;
 		insert_file_table(&file_table, 1, i, rename_dst[i].two);
+	}
 
 	/* Find the renames */
 	i = for_each_hash(&file_table, find_same_files);
@@ -414,6 +464,275 @@ static void record_if_better(struct diff_score m[], struct diff_score *o)
 		m[worst] = *o;
 }
 
+static struct diff_bulk_rename {
+	struct diff_filespec *one;
+	struct diff_filespec *two;
+	int discarded;
+} *bulkmove_candidates;
+static int bulkmove_candidates_nr, bulkmove_candidates_alloc;
+
+/*
+ * Do a binary search of "one" in "bulkmove_candidate", inserting it if not
+ * found.
+ * A good part was copy-pasted from locate_rename_dst().
+ */
+static struct diff_bulk_rename *locate_bulkmove_candidate(const char *one_path,
+							  const char *two_path)
+{
+	int first, last;
+
+	first = 0;
+	last = bulkmove_candidates_nr;
+	while (last > first) {
+		int next = (last + first) >> 1;
+		struct diff_bulk_rename *candidate = &(bulkmove_candidates[next]);
+		/* primary sort key on one_path, secondary on two_path */
+		int cmp = strcmp(one_path, candidate->one->path);
+		if (!cmp)
+			cmp = strcmp(two_path, candidate->two->path);
+		if (!cmp)
+			return candidate;
+		if (cmp < 0) {
+			last = next;
+			continue;
+		}
+		first = next+1;
+	}
+	/* not found */
+	/* insert to make it at "first" */
+	if (bulkmove_candidates_alloc <= bulkmove_candidates_nr) {
+		bulkmove_candidates_alloc = alloc_nr(bulkmove_candidates_alloc);
+		bulkmove_candidates = xrealloc(bulkmove_candidates,
+				      bulkmove_candidates_alloc * sizeof(*bulkmove_candidates));
+	}
+	bulkmove_candidates_nr++;
+	if (first < bulkmove_candidates_nr)
+		memmove(bulkmove_candidates + first + 1, bulkmove_candidates + first,
+			(bulkmove_candidates_nr - first - 1) * sizeof(*bulkmove_candidates));
+
+	bulkmove_candidates[first].one = alloc_filespec(one_path);
+	fill_filespec(bulkmove_candidates[first].one, null_sha1, S_IFDIR);
+	bulkmove_candidates[first].two = alloc_filespec(two_path);
+	fill_filespec(bulkmove_candidates[first].two, null_sha1, S_IFDIR);
+	bulkmove_candidates[first].discarded = 0;
+	return &(bulkmove_candidates[first]);
+}
+
+/*
+ * Copy dirname of src into dst, suitable to append a filename without
+ * an additional "/".
+ * Only handles relative paths since there is no absolute path in a git repo.
+ * Writes "" when there is no "/" in src.
+ * May overwrite more chars than really needed, if src ends with a "/".
+ * Supports in-place modification of src by passing dst == src.
+ */
+static const char *copy_dirname(char *dst, const char *src)
+{
+	size_t len = strlen(src);
+	const char *slash;
+	char *end;
+
+	if (len > 0 && src[len - 1] == '/')
+		/* Trailing slash.  Ignore it. */
+		len--;
+
+	slash = memrchr(src, '/', len);
+	if (!slash) {
+		*dst = '\0';
+		return dst;
+	}
+
+	if (dst == src)
+		dst[slash - src + 1] = '\0';
+	else {
+		end = mempcpy(dst, src, slash - src + 1);
+		*end = '\0';
+	}
+	return dst;
+}
+
+// FIXME: leaks like hell.
+/* See if the fact that one_leftover exists under one_parent_path in
+ * dst tree should disqualify one_parent_path from bulkmove eligibility.
+ * Return 1 if it disqualifies, 0 if it is OK.
+ */
+static int dispatched_to_different_dirs(const char *one_parent_path)
+{
+	/* this might be a dir split, or files added
+	 * after the bulk move, or just an isolated
+	 * rename */
+	int two_idx, j, onep_len, maybe_dir_rename;
+	struct diff_rename_dst *one_leftover =
+		one_leftover = locate_rename_dst_dir(one_parent_path);
+
+	if (!one_leftover)
+		return 0;
+
+	/* try to see if it is a file added after the bulk move */
+	two_idx = one_leftover - rename_dst;
+	onep_len = strlen(one_parent_path);
+	maybe_dir_rename = 1;
+
+	/* check no leftover file was already here before */
+	for (j = two_idx; j < rename_dst_nr; j++) {
+		if (strncmp(rename_dst[j].two->path,
+			    one_parent_path, onep_len))
+			break; /* exhausted directory in this direction */
+		debug_bulkmove(("leftover file %s in %s\n",
+				rename_dst[j].two->path, one_parent_path));
+		if (rename_dst[j].i_am_not_single || /* those were already here */
+		    (rename_dst[j].pair &&
+		     !strncmp(rename_dst[j].pair->one->path,
+			      one_parent_path, onep_len) && /* renamed from here */
+		     strncmp(rename_dst[j].two->path,
+			     one_parent_path, onep_len))) { /* not to a subdir */
+			maybe_dir_rename = 0;
+			debug_bulkmove(("... tells not a bulk move\n"));
+			break;
+		}
+		debug_bulkmove(("... not believed to prevent bulk move\n"));
+	}
+	if (!maybe_dir_rename)
+		return 1;
+	/* try the other direction (dup code) */
+	for (j = two_idx-1; j >= 0; j--) {
+		if (strncmp(rename_dst[j].two->path,
+			    one_parent_path, onep_len))
+			break; /* exhausted directory in this direction */
+		debug_bulkmove(("leftover file %s in '%s'\n",
+				rename_dst[j].two->path, one_parent_path));
+		if (rename_dst[j].i_am_not_single || /* those were already here */
+		    (rename_dst[j].pair &&
+		     !strncmp(rename_dst[j].pair->one->path,
+			      one_parent_path, onep_len) && /* renamed from here */
+		     strncmp(rename_dst[j].two->path,
+			     one_parent_path, onep_len))) { /* not to a subdir */
+			maybe_dir_rename = 0;
+			debug_bulkmove(("... tells not a bulk move\n"));
+			break;
+		}
+		debug_bulkmove(("... not believed to prevent bulk move\n"));
+	}
+	if (!maybe_dir_rename)
+		return 1;
+
+	/* Here we are in the case where a directory
+	 * content was completely moved, but files
+	 * were added to it afterwards.  Proceed as
+	 * for a simple bulk move. */
+	return 0;
+}
+
+/*
+ * Assumes candidate->one is a subdir of seen->one, mark 'seen' as
+ * discarded if candidate->two is outside seen->two.  Also mark
+ * 'candidate' itself as discarded if the conflict implies so.
+ *
+ * Return 1 if 'seen' was discarded
+ */
+static int discard_if_outside(struct diff_bulk_rename *candidate,
+			      struct diff_bulk_rename *seen)
+{
+	if (!prefixcmp(candidate->two->path, seen->two->path)) {
+		debug_bulkmove((" 'dstpair' conforts 'seen'\n"));
+		return 0;
+	}
+
+	debug_bulkmove(("discarding %s -> %s from bulk moves (split into %s and %s)\n",
+			seen->one->path, seen->two->path,
+			candidate->two->path, seen->two->path));
+	seen->discarded = 1;
+	/* Need to discard dstpair as well, unless moving from
+	 * a strict subdir of seen->one or to a strict subdir
+	 * of seen->two */
+	if (!strcmp(seen->one->path, candidate->one->path) &&
+	    prefixcmp(seen->two->path, candidate->two->path)) {
+		debug_bulkmove(("... and not adding self\n"));
+		candidate->discarded = 1;
+	}
+	return 1;
+}
+
+/*
+ * Check if the rename specified by "dstpair" could cause a
+ * bulk move to be detected, record it in bulkmove_candidates if yes.
+ */
+static void check_one_bulk_move(struct diff_filepair *dstpair)
+{
+	char one_parent_path[PATH_MAX], two_parent_path[PATH_MAX];
+
+	/* genuine new files (or believed to be so) */
+	if (!dstpair)
+		return;
+	/* dummy renames used by copy detection */
+	if (!strcmp(dstpair->one->path, dstpair->two->path))
+		return;
+
+	copy_dirname(one_parent_path, dstpair->one->path);
+	copy_dirname(two_parent_path, dstpair->two->path);
+
+	/* simple rename with no directory change */
+	if (!strcmp(one_parent_path, two_parent_path))
+		return;
+
+	debug_bulkmove(("[] %s -> %s ?\n", dstpair->one->path, dstpair->two->path));
+
+	/* loop up one_parent_path over successive parents */
+	// FIXME: also loop over two_parent_path prefixes
+	do {
+		struct diff_bulk_rename *seen;
+		int old_nr = bulkmove_candidates_nr;
+		struct diff_bulk_rename *candidate =
+			locate_bulkmove_candidate(one_parent_path, two_parent_path);
+		debug_bulkmove(("[[]] %s ...\n", one_parent_path));
+		if (old_nr == bulkmove_candidates_nr) {
+			debug_bulkmove((" already seen\n"));
+			return;
+		}
+
+		/* After this commit, are there any files still under one_parent_path ?
+		 * Any file left would disqualifies this dir for bulk move.
+		 */
+		if (dispatched_to_different_dirs(one_parent_path)) {
+			// FIXME: check overlap with discard_if_outside()
+			candidate->discarded = 1;
+			return;
+		}
+
+		/* walk up for one_parent_path prefixes */
+		for (seen = candidate-1; (seen >= bulkmove_candidates) &&
+			     !prefixcmp(one_parent_path, seen->one->path); seen--) {
+			debug_bulkmove((" ? %s -> %s\n", seen->one->path, seen->two->path));
+			/* subdir of "seen" dest dir ? */
+			if (discard_if_outside(candidate, seen))
+				continue;
+		}
+		/* look down for other moves from one_parent_path */
+		seen = candidate + 1;
+		if (seen != bulkmove_candidates + bulkmove_candidates_nr &&
+		    !strcmp(one_parent_path, seen->one->path)) {
+			debug_bulkmove((" ? %s -> %s (2)\n", seen->one->path, seen->two->path));
+			/* subdir of "seen" dest dir ? */
+			if (discard_if_outside(candidate, seen))
+				continue;
+		}
+
+		/* next parent if any */
+		copy_dirname(one_parent_path, one_parent_path);
+	} while (*one_parent_path);
+}
+
+/*
+ * Take all file renames recorded so far and check if they could cause
+ * a bulk move to be detected.
+ */
+static void diffcore_bulk_moves(void)
+{
+	int i;
+	for (i = 0; i < rename_dst_nr; i++)
+		check_one_bulk_move(rename_dst[i].pair);
+}
+
 void diffcore_rename(struct diff_options *options)
 {
 	int detect_rename = options->detect_rename;
@@ -424,6 +743,7 @@ void diffcore_rename(struct diff_options *options)
 	struct diff_score *mx;
 	int i, j, rename_count;
 	int num_create, num_src, dst_cnt;
+	struct diff_bulk_rename *candidate;
 
 	if (!minimum_score)
 		minimum_score = DEFAULT_RENAME_SCORE;
@@ -438,8 +758,7 @@ void diffcore_rename(struct diff_options *options)
 				continue; /* not interested */
 			else
 				locate_rename_dst(p->two, 1);
-		}
-		else if (!DIFF_FILE_VALID(p->two)) {
+		} else if (!DIFF_FILE_VALID(p->two)) {
 			/*
 			 * If the source is a broken "delete", and
 			 * they did not really want to get broken,
@@ -450,14 +769,23 @@ void diffcore_rename(struct diff_options *options)
 			if (p->broken_pair && !p->score)
 				p->one->rename_used++;
 			register_rename_src(p->one, p->score);
-		}
-		else if (detect_rename == DIFF_DETECT_COPY) {
-			/*
-			 * Increment the "rename_used" score by
-			 * one, to indicate ourselves as a user.
-			 */
-			p->one->rename_used++;
-			register_rename_src(p->one, p->score);
+		} else {
+			if (detect_rename == DIFF_DETECT_COPY) {
+				/*
+				 * Increment the "rename_used" score by
+				 * one, to indicate ourselves as a user.
+				 */
+				p->one->rename_used++;
+				register_rename_src(p->one, p->score);
+			}
+			if (DIFF_OPT_TST(options, DETECT_BULK_MOVES)) {
+				/* similarly, bulk move detection needs to
+				 * see all files from second tree, but we don't
+				 * want them to be matched against single sources.
+				 */
+				// FIXME: check interaction with --find-copies-harder
+				locate_rename_dst(p->two, 1)->i_am_not_single = 1;
+			}
 		}
 	}
 	if (rename_dst_nr == 0 || rename_src_nr == 0)
@@ -509,6 +837,8 @@ void diffcore_rename(struct diff_options *options)
 
 		if (rename_dst[i].pair)
 			continue; /* dealt with exact match already. */
+		if (rename_dst[i].i_am_not_single)
+			continue; /* not looking for a match. */
 
 		m = &mx[dst_cnt * NUM_CANDIDATE_PER_DST];
 		for (j = 0; j < NUM_CANDIDATE_PER_DST; j++)
@@ -569,7 +899,30 @@ void diffcore_rename(struct diff_options *options)
 	/* At this point, we have found some renames and copies and they
 	 * are recorded in rename_dst.  The original list is still in *q.
 	 */
+
+	/* Now possibly factorize those renames and copies. */
+	if (DIFF_OPT_TST(options, DETECT_BULK_MOVES))
+		diffcore_bulk_moves();
+
 	DIFF_QUEUE_CLEAR(&outq);
+
+	/* Now turn non-discarded bulkmove_candidates into real renames */
+	for (candidate = bulkmove_candidates;
+	     candidate < bulkmove_candidates + bulkmove_candidates_nr; candidate++) {
+		struct diff_filepair* pair;
+		if (candidate->discarded)
+			continue;
+		/* visualize toplevel dir if needed */
+		if (!*candidate->one->path)
+			candidate->one->path = "./";
+		if (!*candidate->two->path)
+			candidate->two->path = "./";
+		pair = diff_queue(&outq, candidate->one, candidate->two);
+		pair->score = MAX_SCORE;
+		pair->renamed_pair = 1;
+		pair->is_bulkmove = 1;
+	}
+
 	for (i = 0; i < q->nr; i++) {
 		struct diff_filepair *p = q->queue[i];
 		struct diff_filepair *pair_to_free = NULL;
diff --git a/diffcore.h b/diffcore.h
index b8f1fde..6dab95b 100644
--- a/diffcore.h
+++ b/diffcore.h
@@ -69,6 +69,7 @@ struct diff_filepair {
 	unsigned broken_pair : 1;
 	unsigned renamed_pair : 1;
 	unsigned is_unmerged : 1;
+	unsigned is_bulkmove : 1;
 };
 #define DIFF_PAIR_UNMERGED(p) ((p)->is_unmerged)
 
diff --git a/tree-diff.c b/tree-diff.c
index 12c9a88..89cedd4 100644
--- a/tree-diff.c
+++ b/tree-diff.c
@@ -49,7 +49,9 @@ static int compare_tree_entry(struct tree_desc *t1, struct tree_desc *t2, const
 		show_entry(opt, "+", t2, base, baselen);
 		return 1;
 	}
-	if (!DIFF_OPT_TST(opt, FIND_COPIES_HARDER) && !hashcmp(sha1, sha2) && mode1 == mode2)
+	if (!DIFF_OPT_TST(opt, FIND_COPIES_HARDER) &&
+	    !DIFF_OPT_TST(opt, DETECT_BULK_MOVES) &&
+	    !hashcmp(sha1, sha2) && mode1 == mode2)
 		return 0;
 
 	/*
-- 
1.7.2.3

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PATCH v8 1/5] Introduce bulk-move detection in diffcore.
  2010-10-28 22:08 ` [PATCH v8 1/5] Introduce bulk-move detection in diffcore Yann Dirson
  2010-10-29  1:45   ` Jonathan Nieder
@ 2010-11-25 10:08   ` Ævar Arnfjörð Bjarmason
  2010-11-25 15:02     ` Jonathan Nieder
  1 sibling, 1 reply; 15+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2010-11-25 10:08 UTC (permalink / raw)
  To: Yann Dirson; +Cc: git, Yann Dirson, Junio C Hamano

On Fri, Oct 29, 2010 at 00:08, Yann Dirson <ydirson@altern.org> wrote:

> +       slash = memrchr(src, '/', len);

I can't compile pu now on Solaris due to this bit. Are you planning on
picking up the patch / configure / Makefile detection discussed in the
"[PATCH] compat: add memrchr()" thread? Just wondering what the
progress was on that.

Thanks.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v8 1/5] Introduce bulk-move detection in diffcore.
  2010-11-25 10:08   ` [PATCH v8 1/5] " Ævar Arnfjörð Bjarmason
@ 2010-11-25 15:02     ` Jonathan Nieder
  2010-11-25 17:19       ` Ævar Arnfjörð Bjarmason
  0 siblings, 1 reply; 15+ messages in thread
From: Jonathan Nieder @ 2010-11-25 15:02 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: Yann Dirson, git, Yann Dirson, Junio C Hamano

Ævar Arnfjörð Bjarmason wrote:
> On Fri, Oct 29, 2010 at 00:08, Yann Dirson <ydirson@altern.org> wrote:

>> +       slash = memrchr(src, '/', len);
>
> I can't compile pu now on Solaris due to this bit. Are you planning on
> picking up the patch / configure / Makefile detection discussed in the
> "[PATCH] compat: add memrchr()" thread? Just wondering what the
> progress was on that.

BTW, remember that the configure / Makefile detection is only an extra
nicety.  [1] teaches git to rely on the system memrchr on glibc
systems only; on other systems (unless they define memrchr as a macro,
which would be a strange thing to do), gitmemrchr is used.

Upshot: it should be safe to use [1] without the configure / Makefile
support.  Later, an interested person can write a patch for improved
memrchr[2] on *BSD and Plan 9.

Jonathan

[1] http://thread.gmane.org/gmane.comp.version-control.git/159081/focus=159121
[2] Lower code size.  Maybe faster.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v8 1/5] Introduce bulk-move detection in diffcore.
  2010-11-25 15:02     ` Jonathan Nieder
@ 2010-11-25 17:19       ` Ævar Arnfjörð Bjarmason
  0 siblings, 0 replies; 15+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2010-11-25 17:19 UTC (permalink / raw)
  To: Jonathan Nieder; +Cc: Yann Dirson, git, Yann Dirson, Junio C Hamano

On Thu, Nov 25, 2010 at 16:02, Jonathan Nieder <jrnieder@gmail.com> wrote:
> Ævar Arnfjörð Bjarmason wrote:
>> On Fri, Oct 29, 2010 at 00:08, Yann Dirson <ydirson@altern.org> wrote:
>
>>> +       slash = memrchr(src, '/', len);
>>
>> I can't compile pu now on Solaris due to this bit. Are you planning on
>> picking up the patch / configure / Makefile detection discussed in the
>> "[PATCH] compat: add memrchr()" thread? Just wondering what the
>> progress was on that.
>
> BTW, remember that the configure / Makefile detection is only an extra
> nicety.  [1] teaches git to rely on the system memrchr on glibc
> systems only; on other systems (unless they define memrchr as a macro,
> which would be a strange thing to do), gitmemrchr is used.
>
> Upshot: it should be safe to use [1] without the configure / Makefile
> support.  Later, an interested person can write a patch for improved
> memrchr[2] on *BSD and Plan 9.

Sure. It's just a bit of a hack since we already have a standard-ish
way of doing this. Which is to optionally have configure.ac support,
then add platform-specific stuff to the Makefile & drop something in
compat/memrchr.c.

Doing it that way and actually detecting it means we use e.g. the
native memrchr on FreeBSD. I just don't see a reason to do it like the
[1] patch when it's just as easy to do it the "proper way", where it's
more extendable.

Anyway, it's a very minor issue. I'll probably do a patch for this
series like I did for that other thing that needed autoconf detection
recently. Thanks everyone.

^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2010-11-25 17:19 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2010-10-28 22:08 [PATCH v8] Detection of directory renames Yann Dirson
2010-10-28 22:08 ` [PATCH v8 1/5] Introduce bulk-move detection in diffcore Yann Dirson
2010-10-29  1:45   ` Jonathan Nieder
2010-10-29 21:18     ` Yann Dirson
2010-10-30  0:26       ` Jonathan Nieder
2010-11-04 21:56         ` Yann Dirson
2010-11-04 21:59         ` [PATCH] " Yann Dirson
2010-11-25 10:08   ` [PATCH v8 1/5] " Ævar Arnfjörð Bjarmason
2010-11-25 15:02     ` Jonathan Nieder
2010-11-25 17:19       ` Ævar Arnfjörð Bjarmason
2010-10-28 22:08 ` [PATCH v8 2/5] Raw diff output format for bulk moves Yann Dirson
2010-10-28 22:08 ` [PATCH v8 3/5] Add testcases for the --detect-bulk-moves diffcore flag Yann Dirson
2010-10-28 22:08 ` [PATCH v8 4/5] Unified diff output format for bulk moves Yann Dirson
2010-10-28 22:08 ` [PATCH v8 5/5] [WIP] Allow hiding renames of individual files involved in a directory rename Yann Dirson
2010-10-29  1:05 ` [PATCH v8] Detection of directory renames Jonathan Nieder

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.