All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] A new merge stragety 'subtree'.
@ 2007-02-17  1:49 Junio C Hamano
  2007-02-17  7:14 ` Shawn O. Pearce
  2007-02-17  8:45 ` Shawn O. Pearce
  0 siblings, 2 replies; 9+ messages in thread
From: Junio C Hamano @ 2007-02-17  1:49 UTC (permalink / raw)
  To: git

I've explained how I initially merged from Shawn's git-gui, but
some people might have wondered how I continue to pull from
him.  I published the secret on 'pu' last night, but here is an
update that I currently have in my tree (still in 'pu').

The new merge strategy 'subtree' is a slightly modified
'recursive'.  It merges current branch A with other branch
B using common merge base O (and if there are more than one
common merge bases, they are recursively merged), but it can 
"shift" trees while doing so.

I did the initial git-gui merge in commit b4d2b0.  Later, I
needed to get some updates so I created commit 67c7575.


                 b4d2b0  d63ea11
 git.git    ----o---*-------o---* 67c7575
                   /           /
 git-gui.git   ---o------o----o fdf6cfc
            0960f7d

The commit before the merge (d63ea11) has slightly older tree
contents from git-gui repository, but all renamed to be under
git-gui/ directory.

        $ git diff --numstat d63ea11:git-gui fdf6cfc^{tree}
        43	12	GIT-VERSION-GEN
        14	6	Makefile
        1	1	git-gui.sh

What merge-subtree does, when doing tree-level three-way merge
between A and B based on O (that is, merge-subtree O -- A B) is
to 'shift' B to match the tree shape of A.

The tree-shifting is a fun operation, as it sometimes needs to
shift up and sometimes shift down.  When B looks like a subtree
in A, we create a copy of A's tree, and replace the subtree that
corresponds to B with B's tree.  When A looks like a subtree in
B, then we just pick the subtree that corresponds to A's tree
out of B (discarding the other parts of it).

The same tree-shifting is done for the merge base to make it
look like A.  Then merge-recursive does its job just as usual.
The commit 67c7575 was made by shifting Shawn's tree down,
replacing the tree at git-gui in d63ea11 with fdf6cfc.

It works the other way as well.  Shawn can pull d63ea11 from me
to his tree with subtree strategy.  In this case, he will be
shifting my tree up, using only the tree that corresponds to
git-gui/ subdirectory.

What's interesting is that this makes merge an asymmetric
operation.  The resulting tree from such a pull will not look
like what I have in 67c7575; it will be only the git-gui subtree
of it.

For normal merge strategies, if you are on branch A and merge
branch B into it, what you will get is exactly the same as what
you would get by merging branch A while being on branch A
(modulo the direction of conflict markers).  With the subtree
merge, it is not the case anymore.  If I pull from git-gui.git,
the resulting tree look like git.git with Shawn's updates.  If
Shawn pulls from git.git, even after I did some changes on my
own to git-gui subdirectory, bypassing his tree, he will get a
merged result pertaining to git-gui.git repository
(i.e. git-gui/ subdirectory of what I have).  This way, we can
continue merging from each other.

Although I do not plan to commit anything in git-gui/ part of my
tree myself, bypassing Shawn, it is nice to know that it will
not introduce problems down the road.

The subtree merge strategy is inherently incompatible with
fast-forward, so the patch teaches git-merge about it.

-- >8 --

This merge strategy largely piggy-backs on git-merge-recursive.
When merging trees A and B, if B corresponds to a subtree of A,
B is first adjusted to match the tree structure of A, instead of
reading the trees at the same level.  This adjustment is also
done to the common ancestor tree.

If you are pulling updates from git-gui repository into git.git
repository, the root level of the former corresponds to git-gui/
subdirectory of the latter.  The tree object of git-gui's toplevel
is wrapped in a fake tree object, whose sole entry has name 'git-gui'
and records object name of the true tree, before being used by
the 3-way merge code.

If you are merging the other way, only the git-gui/ subtree of
git.git is extracted and merged into git-gui's toplevel.

The detection of corresponding subtree is done by comparing the
pathnames and types in the toplevel of the tree.

Heuristics galore!  That's the git way ;-).

Signed-off-by: Junio C Hamano <junkio@cox.net>
---
 .gitignore         |    2 +
 Makefile           |   10 ++-
 cache.h            |    3 +
 git-merge.sh       |    4 +-
 match-trees.c      |  301 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 merge-recursive.c  |   29 +++++
 test-match-trees.c |   24 ++++
 7 files changed, 370 insertions(+), 3 deletions(-)
 create mode 100644 match-trees.c
 create mode 100644 test-match-trees.c

diff --git a/.gitignore b/.gitignore
index f15155d..8f732b4 100644
--- a/.gitignore
+++ b/.gitignore
@@ -74,6 +74,7 @@ git-merge-ours
 git-merge-recursive
 git-merge-resolve
 git-merge-stupid
+git-merge-subtree
 git-mktag
 git-mktree
 git-name-rev
@@ -142,6 +143,7 @@ gitweb/gitweb.cgi
 test-date
 test-delta
 test-dump-cache-tree
+test-match-trees
 common-cmds.h
 *.tar.gz
 *.dsc
diff --git a/Makefile b/Makefile
index ebecbbd..e255a0d 100644
--- a/Makefile
+++ b/Makefile
@@ -221,6 +221,8 @@ BUILT_INS = \
 # what 'all' will build and 'install' will install, in gitexecdir
 ALL_PROGRAMS = $(PROGRAMS) $(SCRIPTS)
 
+ALL_PROGRAMS += git-merge-subtree$X
+
 # Backward compatibility -- to be removed after 1.0
 PROGRAMS += git-ssh-pull$X git-ssh-push$X
 
@@ -260,7 +262,7 @@ LIB_OBJS = \
 	server-info.o setup.o sha1_file.o sha1_name.o strbuf.o \
 	tag.o tree.o usage.o config.o environment.o ctype.o copy.o \
 	revision.o pager.o tree-walk.o xdiff-interface.o \
-	write_or_die.o trace.o list-objects.o grep.o \
+	write_or_die.o trace.o list-objects.o grep.o match-trees.o \
 	alloc.o merge-file.o path-list.o help.o unpack-trees.o $(DIFF_OBJS) \
 	color.o wt-status.o archive-zip.o archive-tar.o shallow.o utf8.o
 
@@ -627,6 +629,9 @@ git$X: git.c common-cmds.h $(BUILTIN_OBJS) $(GITLIBS) GIT-CFLAGS
 
 help.o: common-cmds.h
 
+git-merge-subtree$X: git-merge-recursive$X
+	rm -f $@ && ln git-merge-recursive$X $@
+
 $(BUILT_INS): git$X
 	rm -f $@ && ln git$X $@
 
@@ -827,6 +832,9 @@ test-dump-cache-tree$X: dump-cache-tree.o $(GITLIBS)
 test-sha1$X: test-sha1.o $(GITLIBS)
 	$(CC) $(ALL_CFLAGS) -o $@ $(ALL_LDFLAGS) $(filter %.o,$^) $(LIBS)
 
+test-match-trees$X: test-match-trees.o $(GITLIBS)
+	$(CC) $(ALL_CFLAGS) -o $@ $(ALL_LDFLAGS) $(filter %.o,$^) $(LIBS)
+
 check-sha1:: test-sha1$X
 	./test-sha1.sh
 
diff --git a/cache.h b/cache.h
index c62b0b0..d1e5035 100644
--- a/cache.h
+++ b/cache.h
@@ -468,4 +468,7 @@ extern int nfvasprintf(char **str, const char *fmt, va_list va);
 extern void trace_printf(const char *format, ...);
 extern void trace_argv_printf(const char **argv, int count, const char *format, ...);
 
+/* match-trees.c */
+void shift_tree(const unsigned char *, const unsigned char *, unsigned char *, int);
+
 #endif /* CACHE_H */
diff --git a/git-merge.sh b/git-merge.sh
index 498c938..f406795 100755
--- a/git-merge.sh
+++ b/git-merge.sh
@@ -16,10 +16,10 @@ test -z "$(git ls-files -u)" ||
 LF='
 '
 
-all_strategies='recur recursive octopus resolve stupid ours'
+all_strategies='recur recursive octopus resolve stupid ours subtree'
 default_twohead_strategies='recursive'
 default_octopus_strategies='octopus'
-no_trivial_merge_strategies='ours'
+no_trivial_merge_strategies='ours subtree'
 use_strategies=
 
 index_merge=t
diff --git a/match-trees.c b/match-trees.c
new file mode 100644
index 0000000..fe7813c
--- /dev/null
+++ b/match-trees.c
@@ -0,0 +1,301 @@
+#include "cache.h"
+#include "tree.h"
+#include "tree-walk.h"
+
+static int score_missing(unsigned mode, const char *path)
+{
+	int score;
+
+	if (S_ISDIR(mode))
+		score = -1000;
+	else if (S_ISLNK(mode))
+		score = -500;
+	else
+		score = -50;
+	return score;
+}
+
+static int score_differs(unsigned mode1, unsigned mode2, const char *path)
+{
+	int score;
+
+	if (S_ISDIR(mode1) != S_ISDIR(mode2))
+		score = -100;
+	else if (S_ISLNK(mode1) != S_ISLNK(mode2))
+		score = -50;
+	else
+		score = -5;
+	return score;
+}
+
+static int score_matches(unsigned mode1, unsigned mode2, const char *path)
+{
+	int score;
+
+	/* Heh, we found SHA-1 collisions between different kind of objects */
+	if (S_ISDIR(mode1) != S_ISDIR(mode2))
+		score = -100;
+	else if (S_ISLNK(mode1) != S_ISLNK(mode2))
+		score = -50;
+
+	else if (S_ISDIR(mode1))
+		score = 1000;
+	else if (S_ISLNK(mode1))
+		score = 500;
+	else
+		score = 250;
+	return score;
+}
+
+/*
+ * Inspect two trees, and give a score that tells how similar they are.
+ */
+static int score_trees(const unsigned char *hash1, const unsigned char *hash2)
+{
+	struct tree_desc one;
+	struct tree_desc two;
+	void *one_buf, *two_buf;
+	int score = 0;
+	char type[10];
+
+	one_buf = read_sha1_file(hash1, type, &one.size);
+	if (!one_buf)
+		die("unable to read tree (%s)", sha1_to_hex(hash1));
+	if (strcmp(type, tree_type))
+		die("%s is not a tree", sha1_to_hex(hash1));
+	two_buf = read_sha1_file(hash2, type, &two.size);
+	if (!two_buf)
+		die("unable to read tree (%s)", sha1_to_hex(hash2));
+	if (strcmp(type, tree_type))
+		die("%s is not a tree", sha1_to_hex(hash2));
+	one.buf = one_buf;
+	two.buf = two_buf;
+	while (one.size | two.size) {
+		const unsigned char *elem1 = elem1;
+		const unsigned char *elem2 = elem2;
+		const char *path1, *path2;
+		unsigned mode1, mode2;
+		int cmp;
+
+		if (one.size)
+			elem1 = tree_entry_extract(&one, &path1, &mode1);
+		if (two.size)
+			elem2 = tree_entry_extract(&two, &path2, &mode2);
+
+		if (!one.size) {
+			/* two has more entries */
+			score += score_missing(mode2, path2);
+			update_tree_entry(&two);
+			continue;
+		}
+		if (!two.size) {
+			/* two lacks this entry */
+			score += score_missing(mode1, path1);
+			update_tree_entry(&one);
+			continue;
+		}
+		cmp = base_name_compare(path1, strlen(path1), mode1,
+					path2, strlen(path2), mode2);
+		if (cmp < 0) {
+			/* path1 does not appear in two */
+			score += score_missing(mode1, path1);
+			update_tree_entry(&one);
+			continue;
+		}
+		else if (cmp > 0) {
+			/* path2 does not appear in one */
+			score += score_missing(mode2, path2);
+			update_tree_entry(&two);
+			continue;
+		}
+		else if (hashcmp(elem1, elem2))
+			/* they are different */
+			score += score_differs(mode1, mode2, path1);
+		else
+			/* same subtree or blob */
+			score += score_matches(mode1, mode2, path1);
+		update_tree_entry(&one);
+		update_tree_entry(&two);
+	}
+	free(one_buf);
+	free(two_buf);
+	return score;
+}
+
+/*
+ * Match one itself and its subtrees with two and pick the best match.
+ */
+static void match_trees(const unsigned char *hash1,
+			const unsigned char *hash2,
+			int *best_score,
+			char **best_match,
+			char *base,
+			int recurse_limit)
+{
+	struct tree_desc one;
+	void *one_buf;
+	char type[10];
+
+	one_buf = read_sha1_file(hash1, type, &one.size);
+	if (!one_buf)
+		die("unable to read tree (%s)", sha1_to_hex(hash1));
+	if (strcmp(type, tree_type))
+		die("%s is not a tree", sha1_to_hex(hash1));
+	one.buf = one_buf;
+
+	while (one.size) {
+		const char *path;
+		const unsigned char *elem;
+		unsigned mode;
+		int score;
+
+		elem = tree_entry_extract(&one, &path, &mode);
+		if (!S_ISDIR(mode))
+			goto next;
+		score = score_trees(elem, hash2);
+		if (*best_score < score) {
+			char *newpath;
+			newpath = xmalloc(strlen(base) + strlen(path) + 1);
+			sprintf(newpath, "%s%s", base, path);
+			free(*best_match);
+			*best_match = newpath;
+			*best_score = score;
+		}
+		if (recurse_limit) {
+			char *newbase;
+			newbase = xmalloc(strlen(base) + strlen(path) + 2);
+			sprintf(newbase, "%s%s/", base, path);
+			match_trees(elem, hash2, best_score, best_match,
+				    newbase, recurse_limit - 1);
+			free(newbase);
+		}
+
+	next:
+		update_tree_entry(&one);
+	}
+	free(one_buf);
+}
+
+/*
+ * A tree "hash1" has a subdirectory at "prefix".  Come up with a
+ * tree object by replacing it with another tree "hash2".
+ */
+static int splice_tree(const unsigned char *hash1,
+		       char *prefix,
+		       const unsigned char *hash2,
+		       unsigned char *result)
+{
+	char *subpath;
+	int toplen;
+	char *buf;
+	unsigned long sz;
+	struct tree_desc desc;
+	unsigned char *rewrite_here;
+	const unsigned char *rewrite_with;
+	unsigned char subtree[20];
+	char type[20];
+	int status;
+
+	subpath = strchr(prefix, '/');
+	if (!subpath)
+		toplen = strlen(prefix);
+	else {
+		toplen = subpath - prefix;
+		subpath++;
+	}
+
+	buf = read_sha1_file(hash1, type, &sz);
+	if (!buf)
+		die("cannot read tree %s", sha1_to_hex(hash1));
+	desc.buf = buf;
+	desc.size = sz;
+
+	rewrite_here = NULL;
+	while (desc.size) {
+		const char *name;
+		unsigned mode;
+		const unsigned char *sha1;
+
+		sha1 = tree_entry_extract(&desc, &name, &mode);
+		if (strlen(name) == toplen &&
+		    !memcmp(name, prefix, toplen)) {
+			if (!S_ISDIR(mode))
+				die("entry %s in tree %s is not a tree",
+				    name, sha1_to_hex(hash1));
+			rewrite_here = (unsigned char *) sha1;
+			break;
+		}
+		update_tree_entry(&desc);
+	}
+	if (!rewrite_here)
+		die("entry %.*s not found in tree %s",
+		    toplen, prefix, sha1_to_hex(hash1));
+	if (subpath) {
+		status = splice_tree(rewrite_here, subpath, hash2, subtree);
+		if (status)
+			return status;
+		rewrite_with = subtree;
+	}
+	else
+		rewrite_with = hash2;
+	hashcpy(rewrite_here, rewrite_with);
+	status = write_sha1_file(buf, sz, tree_type, result);
+	free(buf);
+	return status;
+}
+
+/*
+ * We are trying to come up with a merge between one and two that
+ * results in a tree shape similar to one.  The tree two might
+ * correspond to a subtree of one, in which case it needs to be
+ * shifted down by prefixing otherwise empty directories.  On the
+ * other hand, it could cover tree one and we might need to pick a
+ * subtree of it.
+ */
+void shift_tree(const unsigned char *hash1,
+		const unsigned char *hash2,
+		unsigned char *shifted,
+		int depth_limit)
+{
+	char *add_prefix;
+	char *del_prefix;
+	int add_score, del_score;
+
+	add_score = del_score = score_trees(hash1, hash2);
+	add_prefix = xcalloc(1, 1);
+	del_prefix = xcalloc(1, 1);
+
+	/*
+	 * See if one's subtree resembles two; if so we need to prefix
+	 * two with a few fake trees to match the prefix.
+	 */
+	match_trees(hash1, hash2, &add_score, &add_prefix, "", depth_limit);
+
+	/*
+	 * See if two's subtree resembles one; if so we need to
+	 * pick only subtree of two.
+	 */
+	match_trees(hash2, hash1, &del_score, &del_prefix, "", depth_limit);
+
+	/* Assume we do not have to do any shifting */
+	hashcpy(shifted, hash2);
+
+	if (add_score < del_score) {
+		/* We need to pick a subtree of two */
+		unsigned mode;
+
+		if (!*del_prefix)
+			return;
+
+		if (get_tree_entry(hash2, del_prefix, shifted, &mode))
+			die("cannot find path %s in tree %s",
+			    del_prefix, sha1_to_hex(hash2));
+		return;
+	}
+
+	if (!*add_prefix)
+		return;
+
+	splice_tree(hash1, add_prefix, hash2, shifted);
+}
+
diff --git a/merge-recursive.c b/merge-recursive.c
index 5898942..7b4e339 100644
--- a/merge-recursive.c
+++ b/merge-recursive.c
@@ -16,6 +16,22 @@
 #include "path-list.h"
 #include "xdiff-interface.h"
 
+static int subtree_merge;
+
+static struct tree *shift_tree_object(struct tree *one, struct tree *two)
+{
+	unsigned char shifted[20];
+
+	/*
+	 * NEEDSWORK: this limits the recursion depth to hardcoded
+	 * value '2' to avoid excessive overhead.
+	 */
+	shift_tree(one->object.sha1, two->object.sha1, shifted, 2);
+	if (!hashcmp(two->object.sha1, shifted))
+		return two;
+	return lookup_tree(shifted);
+}
+
 /*
  * A virtual commit has
  * - (const char *)commit->util set to the name, and
@@ -1111,6 +1127,12 @@ static int merge_trees(struct tree *head,
 		       struct tree **result)
 {
 	int code, clean;
+
+	if (subtree_merge) {
+		merge = shift_tree_object(head, merge);
+		common = shift_tree_object(head, common);
+	}
+
 	if (sha_eq(common->object.sha1, merge->object.sha1)) {
 		output(0, "Already uptodate!");
 		*result = head;
@@ -1316,6 +1338,13 @@ int main(int argc, char *argv[])
 	struct lock_file *lock = xcalloc(1, sizeof(struct lock_file));
 	int index_fd;
 
+	if (argv[0]) {
+		int namelen = strlen(argv[0]);
+		if (8 < namelen &&
+		    !strcmp(argv[0] + namelen - 8, "-subtree"))
+			subtree_merge = 1;
+	}
+
 	git_config(merge_config);
 	if (getenv("GIT_MERGE_VERBOSITY"))
 		verbosity = strtol(getenv("GIT_MERGE_VERBOSITY"), NULL, 10);
diff --git a/test-match-trees.c b/test-match-trees.c
new file mode 100644
index 0000000..a3c4688
--- /dev/null
+++ b/test-match-trees.c
@@ -0,0 +1,24 @@
+#include "cache.h"
+#include "tree.h"
+
+int main(int ac, char **av)
+{
+	unsigned char hash1[20], hash2[20], shifted[20];
+	struct tree *one, *two;
+
+	if (get_sha1(av[1], hash1))
+		die("cannot parse %s as an object name", av[1]);
+	if (get_sha1(av[2], hash2))
+		die("cannot parse %s as an object name", av[2]);
+	one = parse_tree_indirect(hash1);
+	if (!one)
+		die("not a treeish %s", av[1]);
+	two = parse_tree_indirect(hash2);
+	if (!two)
+		die("not a treeish %s", av[2]);
+
+	shift_tree(one->object.sha1, two->object.sha1, shifted, -1);
+	printf("shifted: %s\n", sha1_to_hex(shifted));
+
+	exit(0);
+}
-- 
1.5.0.552.ge1b1c

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH] A new merge stragety 'subtree'.
  2007-02-17  1:49 [PATCH] A new merge stragety 'subtree' Junio C Hamano
@ 2007-02-17  7:14 ` Shawn O. Pearce
  2007-02-17  8:29   ` Junio C Hamano
  2007-02-17  8:45 ` Shawn O. Pearce
  1 sibling, 1 reply; 9+ messages in thread
From: Shawn O. Pearce @ 2007-02-17  7:14 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

Junio C Hamano <junkio@cox.net> wrote:
> For normal merge strategies, if you are on branch A and merge
> branch B into it, what you will get is exactly the same as what
> you would get by merging branch A while being on branch A
> (modulo the direction of conflict markers).  With the subtree
> merge, it is not the case anymore.  If I pull from git-gui.git,
> the resulting tree look like git.git with Shawn's updates.  If
> Shawn pulls from git.git, even after I did some changes on my
> own to git-gui subdirectory, bypassing his tree, he will get a
> merged result pertaining to git-gui.git repository
> (i.e. git-gui/ subdirectory of what I have).  This way, we can
> continue merging from each other.
> 
> Although I do not plan to commit anything in git-gui/ part of my
> tree myself, bypassing Shawn, it is nice to know that it will
> not introduce problems down the road.

This does actually cause a problem if you merge a git.git commit
into git-gui.git (by stripping the git-gui/ part off).  The problem
is the entire git.git history would then become the second parent
of the git-gui.git merge commit, and suddenly the git-gui.git
repository increases by >11 MiB in size...  ;-)

With regards to maintaining git-gui: I'll apply all patches to my
tree and do testing there, then ask Junio to merge a tagged release
over to git.git for inclusion in the next git release.

To avoid pulling the entire git.git history into git-gui, I'd ask
that anyone bypassing me (e.g. if I'm being horribly unresponsive
one week) checkout the git-gui branch from git.git, apply the
change(s) there, then merge that branch into git.git using the
subtree strategy.  This way I can later fast-forward git-gui.git
to the fixed commit, without sucking in more than I bargained for.

For example:

	git log -n1 -- git-gui
	# copy the second parent...
	git checkout -b fixgg <secondparent>
	# do fixes...
	git checkout master
	git merge -s subtree fixgg

Then I can later obtain `fixgg` from the merge commit in git.git
and update git-gui.git, without sucking in git.git's objects.

-- 
Shawn.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] A new merge stragety 'subtree'.
  2007-02-17  7:14 ` Shawn O. Pearce
@ 2007-02-17  8:29   ` Junio C Hamano
  2007-02-17  8:53     ` Shawn O. Pearce
  0 siblings, 1 reply; 9+ messages in thread
From: Junio C Hamano @ 2007-02-17  8:29 UTC (permalink / raw)
  To: Shawn O. Pearce; +Cc: git

"Shawn O. Pearce" <spearce@spearce.org> writes:

> Junio C Hamano <junkio@cox.net> wrote:
>> ...
>> Although I do not plan to commit anything in git-gui/ part of my
>> tree myself, bypassing Shawn, it is nice to know that it will
>> not introduce problems down the road.
>
> This does actually cause a problem if you merge a git.git commit
> into git-gui.git (by stripping the git-gui/ part off).  The problem
> is the entire git.git history would then become the second parent
> of the git-gui.git merge commit, and suddenly the git-gui.git
> repository increases by >11 MiB in size...  ;-)

Don't worry.  I am not asking you to actually perform such a merge
and make the relut as a part of _official_ git-gui.git history.

> To avoid pulling the entire git.git history into git-gui, I'd ask
> that anyone bypassing me (e.g. if I'm being horribly unresponsive
> one week) checkout the git-gui branch from git.git, apply the
> change(s) there, then merge that branch into git.git using the
> subtree strategy.

Actually, I do not think you even need to ask them to do that.
I am not planning to apply patches to git.git that touch
git-gui/ subdirectory myself, but if you see such a patch on the
list, you could first apply it to your copy of git.git
repository, and run your private edition of cherry-pick that
uses merge-subtree instead of merge-recursive to pick it out
onto your 'master' branch of git-gui.git repository.  That way,
the next time I'll pull from your git-gui.git repository, I will
get the change through you.

And the procedure would actually work _even_ _if_ (repeat, I do
not plan to do this) I applied such a patch to my tree before I
pull from you --- it will just result in an accidental clean
merge.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] A new merge stragety 'subtree'.
  2007-02-17  1:49 [PATCH] A new merge stragety 'subtree' Junio C Hamano
  2007-02-17  7:14 ` Shawn O. Pearce
@ 2007-02-17  8:45 ` Shawn O. Pearce
  2007-02-17  8:51   ` Junio C Hamano
  1 sibling, 1 reply; 9+ messages in thread
From: Shawn O. Pearce @ 2007-02-17  8:45 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

Junio C Hamano <junkio@cox.net> wrote:
> The detection of corresponding subtree is done by comparing the
> pathnames and types in the toplevel of the tree.
> 
> Heuristics galore!  That's the git way ;-).

I have some concerns about the match-tree heuristic you are using here.

For example, it is very common for Java projects to have the same
tree "shape".  Just look at egit/jgit for an example, the three
top level directories are:

	org.spearce.egit.core/
		META-INF/
		build.properties
		plugin.xml
		src/

	org.spearce.egit.ui/
		META-INF/
		build.properties
		plugin.xml
		src/

	org.spearce.jgit
		META-INF/
		src/

If I were to treat the first two as subprojects this new subtree
merge strategy might fail here as it could easily match to the
wrong directory.


What about a different approach?

In a merge of commit#1 (parent project) and commit#2 (subroject)...

We have the set of merge bases readily available.  We just have
to find out in each merge base where the files went from commit#2,
then modify commit#2 to conform to that same shape.

Really that isn't too different from a rename detection.  In other
words do something like the following:

  a) Scan the parents of the merge base B for a commit that is
  in commit#2's ancestory but not commit#1's ancestory, except by
  the merge commit B.  Such a parent must be from the project that
  commit#2 is also from.  For sake of explaining this, lets call
  this parent B^2.

  b) Perform a partial rename-diff between B^2 and B.  The magic
  here is we need to discard any path in B that also appears in
  B^1 and B^2, and that has the same SHA-1 as in B^1, before we do
  the rename-diff.

  c) Find the most common prefix within the renamed files.

  d) Fit commit#2 to use that prefix, and merge.


Here's a real example.  In 67c75759 you merged git-gui.git.
67c75759^1 is from git.git, 67c75759^2 is from git-gui.git.

The stock rename-diff:

  $ git diff-tree --abbrev -r -M --diff-filter=MRD 67c75759^2 67c75759
  :100644 100644 c714d38... d99372a... M  .gitignore
  :100755 100755 8fac8cb... 7a10b60... M  GIT-VERSION-GEN
  :100644 100644 fd82d9d... 5d31e6d... M  Makefile
  :100644 100644 b95a137... b95a137... R100       TODO    git-gui/TODO
  :100755 100755 f5010dd... f5010dd... R100       git-gui.sh      git-gui/git-gui.sh

The problem here is both ^1 and ^2 defines the first three paths,
so we think we modified them in the merge rather than moved them.
But these three files match ^1, as we did not do an evil merge here.
That's why they are showing as modified in this diff.

Now take 67c7 and whack those three files (step b above), and rediff:

  $ C=$(git ls-tree 67c75759 | sed '
          /       .gitignore$/d
          /       GIT-VERSION-GEN$/d
          /       Makefile$/d' | git mktree)
  $ git diff-tree --abbrev -r -M --diff-filter=MRD 67c75759^2 $C
  :100644 100644 c714d38... c714d38... R100       .gitignore      git-gui/.gitignore
  :100755 100755 8fac8cb... 8fac8cb... R100       GIT-VERSION-GEN git-gui/GIT-VERSION-GEN
  :100644 100644 fd82d9d... fd82d9d... R100       Makefile        git-gui/Makefile
  :100644 100644 b95a137... b95a137... R100       TODO    git-gui/TODO
  :100755 100755 f5010dd... f5010dd... R100       git-gui.sh      git-gui/git-gui.sh

Wow, look at that, everything starts with 'git-gui/'!  ;-)

Then we just need to pick the most popular common prefix of all
renamed paths and fit commit#2 to conform to that structure.
Finally we can run the merge through.

The (now functional) pretend object stuff can be useful here,
such as to make $C above so we can pass it off to diffcore.


I think popping off the 'git-gui/' prefix would be the same deal,
only we'd be looking at the old names to determine the prefix to pop,
rather than the new names.

We already do rename detection in merge-recursive.  Slapping an extra
rename pass in front of things when it is invoked as merge-subtree
can't performance hurt that much.

Thoughts?

-- 
Shawn.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] A new merge stragety 'subtree'.
  2007-02-17  8:45 ` Shawn O. Pearce
@ 2007-02-17  8:51   ` Junio C Hamano
  2007-02-17  9:02     ` Shawn O. Pearce
  0 siblings, 1 reply; 9+ messages in thread
From: Junio C Hamano @ 2007-02-17  8:51 UTC (permalink / raw)
  To: Shawn O. Pearce; +Cc: git

"Shawn O. Pearce" <spearce@spearce.org> writes:

> Really that isn't too different from a rename detection.

I know, but I am just lazy.  The match scoring part is very well
isolated so you or anybody can plug different code there.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] A new merge stragety 'subtree'.
  2007-02-17  8:29   ` Junio C Hamano
@ 2007-02-17  8:53     ` Shawn O. Pearce
  2007-02-17 18:02       ` Junio C Hamano
  0 siblings, 1 reply; 9+ messages in thread
From: Shawn O. Pearce @ 2007-02-17  8:53 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

Junio C Hamano <junkio@cox.net> wrote:
> "Shawn O. Pearce" <spearce@spearce.org> writes:
> > To avoid pulling the entire git.git history into git-gui, I'd ask
> > that anyone bypassing me (e.g. if I'm being horribly unresponsive
> > one week) checkout the git-gui branch from git.git, apply the
> > change(s) there, then merge that branch into git.git using the
> > subtree strategy.
> 
> Actually, I do not think you even need to ask them to do that.
> I am not planning to apply patches to git.git that touch
> git-gui/ subdirectory myself, but if you see such a patch on the
> list, you could first apply it to your copy of git.git
> repository, and run your private edition of cherry-pick that
> uses merge-subtree instead of merge-recursive to pick it out
> onto your 'master' branch of git-gui.git repository.  That way,
> the next time I'll pull from your git-gui.git repository, I will
> get the change through you.

Or... I/we implement -p2 in git-am.  Then I can apply it right to
my git-gui repository.  :-)

Wait, isn't there a patch floating around for that?
Wasn't it sent in by Andy Parkins just before 1.5.0?

Just for the record: I am perfectly happy taking patches for git-gui
that were made against the git.git repository.  Obviously I can
easily apply it onto git-gui.git by stripping the path.  I'm also
happy directly pulling worthwhile commits, but I won't pull something
that is based on a git.git commit, for the reason stated above.
 
> And the procedure would actually work _even_ _if_ (repeat, I do
> not plan to do this) I applied such a patch to my tree before I
> pull from you --- it will just result in an accidental clean
> merge.

Yes, of course.  One of the insanely nice things about git. :)

-- 
Shawn.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] A new merge stragety 'subtree'.
  2007-02-17  8:51   ` Junio C Hamano
@ 2007-02-17  9:02     ` Shawn O. Pearce
  2007-02-17 18:04       ` Junio C Hamano
  0 siblings, 1 reply; 9+ messages in thread
From: Shawn O. Pearce @ 2007-02-17  9:02 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

Junio C Hamano <junkio@cox.net> wrote:
> "Shawn O. Pearce" <spearce@spearce.org> writes:
> 
> > Really that isn't too different from a rename detection.
> 
> I know, but I am just lazy.  The match scoring part is very well
> isolated so you or anybody can plug different code there.

Is that a hint that I should code something if I think its better?  ;-)

I'll probably try to code up a competing version of the match code
based on the fun stuff you did not quote, but likely won't get
around to it for at least week.

-- 
Shawn.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] A new merge stragety 'subtree'.
  2007-02-17  8:53     ` Shawn O. Pearce
@ 2007-02-17 18:02       ` Junio C Hamano
  0 siblings, 0 replies; 9+ messages in thread
From: Junio C Hamano @ 2007-02-17 18:02 UTC (permalink / raw)
  To: Shawn O. Pearce; +Cc: git

"Shawn O. Pearce" <spearce@spearce.org> writes:

> Or... I/we implement -p2 in git-am.  Then I can apply it right to
> my git-gui repository.  :-)

Which is 2092a1fe (v1.5.0~23), I think.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] A new merge stragety 'subtree'.
  2007-02-17  9:02     ` Shawn O. Pearce
@ 2007-02-17 18:04       ` Junio C Hamano
  0 siblings, 0 replies; 9+ messages in thread
From: Junio C Hamano @ 2007-02-17 18:04 UTC (permalink / raw)
  To: Shawn O. Pearce; +Cc: git

"Shawn O. Pearce" <spearce@spearce.org> writes:

> Junio C Hamano <junkio@cox.net> wrote:
>> "Shawn O. Pearce" <spearce@spearce.org> writes:
>> 
>> > Really that isn't too different from a rename detection.
>> 
>> I know, but I am just lazy.  The match scoring part is very well
>> isolated so you or anybody can plug different code there.
>
> Is that a hint that I should code something if I think its better?  ;-)

It is just a statement that scoring part is currently just a
stub, good enough only for what I needed, and any half-seriously
done implementation would be better than what I currently have
there.

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2007-02-17 18:05 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2007-02-17  1:49 [PATCH] A new merge stragety 'subtree' Junio C Hamano
2007-02-17  7:14 ` Shawn O. Pearce
2007-02-17  8:29   ` Junio C Hamano
2007-02-17  8:53     ` Shawn O. Pearce
2007-02-17 18:02       ` Junio C Hamano
2007-02-17  8:45 ` Shawn O. Pearce
2007-02-17  8:51   ` Junio C Hamano
2007-02-17  9:02     ` Shawn O. Pearce
2007-02-17 18:04       ` Junio C Hamano

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.