* [PATCH/RFC 1/2] Add 'git subtree' command for tracking history of subtrees separately.
@ 2009-04-26 22:29 Avery Pennarun
  2009-04-26 22:29 ` [PATCH/RFC 2/2] Automated test script for 'git subtree' Avery Pennarun
  2009-04-30  2:27 ` [PATCH/RFC 1/2] Add 'git subtree' command for tracking history of subtrees separately Avery Pennarun
  0 siblings, 2 replies; 13+ messages in thread
From: Avery Pennarun @ 2009-04-26 22:29 UTC
  To: git; +Cc: Avery Pennarun

Many projects are made of a combination of several subprojects/libraries and
some application-specific code.  In some cases, particularly when the
subprojects are all maintained independently, 'git submodule' is the best
way to deal with this situation.  But if you frequently change the
subprojects as part of developing your application, use multiple branches,
and sometimes want to push your subproject changes upstream, the overhead of
manually managing submodules can be excessive.

'git subtree' provides an alternative mechanism, based around the
'git merge -s subtree' merge strategy.  Instead of tracking a submodule
separately, you merge its history into your main project, and occasionally
extract a new "virtual history" from your mainline that can be easily merged
back into the upstream project.  The virtual history can be incrementally
expanded as you make more changes to the superproject.

You would normally then merge the virtual history back into your mainline
(the --rejoin option). This results in extra commits in your application
that appear to change the same files, but these extra commits will tend to
be ignored by git's merge simplification algorithm anyway.

For example, gitweb (commit 1130ef3) was merged into git as of commit
0a8f4f0, after which it was no longer maintained separately.  But imagine it
had been maintained separately, and we wanted to extract git's changes to
gitweb since that time, to share with the upstream.  You could do this:

git subtree split --prefix=gitweb --annotate='(split) ' \
	0a8f4f0^.. --onto=1130ef3 --rejoin

If gitweb had originally been merged using 'git subtree add' (or a previous
split had been done with --rejoin specified), then you could incrementally
produce the list of new changes without needing to remember any commit ids:

git subtree split --prefix=gitweb --annotate='(split) ' --rejoin
---
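
A minimal sketch, not part of the patch, of why no commit ids need to be
remembered after a --rejoin: the rejoin merge records git-subtree-mainline:
and git-subtree-split: lines in its commit message, and a later split can read
them back to pair each mainline commit with its split counterpart. The sample
message and its abbreviated ids below are made up for illustration, and the
helper names sample_log and pair_trailers are hypothetical; the script itself
reads 'git log' output rather than a here-document.

```shell
#!/bin/bash
# Hypothetical commit message, shaped like what a --rejoin merge records.
# The END line stands in for the marker appended after each log entry.
sample_log() {
	cat <<EOF
Split 'gitweb/' into commit 'bbbb222'

git-subtree-dir: gitweb
git-subtree-mainline: aaaa111
git-subtree-split: bbbb222
END
EOF
}

# Print "mainline -> split" for each complete pair of trailers on stdin.
# Any other line (here the END marker) flushes the pair collected so far.
pair_trailers() {
	main=
	sub=
	while read -r a b junk; do
		case "$a" in
			git-subtree-mainline:) main="$b" ;;
			git-subtree-split:) sub="$b" ;;
			*)
				if [ -n "$main" ] && [ -n "$sub" ]; then
					echo "$main -> $sub"
					main=
					sub=
				fi
				;;
		esac
	done
}

sample_log | pair_trailers
```

With the sample message above this prints "aaaa111 -> bbbb222", which is the
kind of old-to-new mapping an incremental split can start from.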
 Makefile         |    1 +
 command-list.txt |    1 +
 git-subtree.sh   |  435 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 437 insertions(+), 0 deletions(-)
 create mode 100755 git-subtree.sh

diff --git a/Makefile b/Makefile
index 5c8e83a..f14e11c 100644
--- a/Makefile
+++ b/Makefile
@@ -305,6 +305,7 @@ SCRIPT_SH += git-request-pull.sh
 SCRIPT_SH += git-sh-setup.sh
 SCRIPT_SH += git-stash.sh
 SCRIPT_SH += git-submodule.sh
+SCRIPT_SH += git-subtree.sh
 SCRIPT_SH += git-web--browse.sh
 
 SCRIPT_PERL += git-add--interactive.perl
diff --git a/command-list.txt b/command-list.txt
index fb03a2e..9be4774 100644
--- a/command-list.txt
+++ b/command-list.txt
@@ -113,6 +113,7 @@ git-stash                               mainporcelain
 git-status                              mainporcelain common
 git-stripspace                          purehelpers
 git-submodule                           mainporcelain
+git-subtree                             mainporcelain
 git-svn                                 foreignscminterface
 git-symbolic-ref                        plumbingmanipulators
 git-tag                                 mainporcelain common
diff --git a/git-subtree.sh b/git-subtree.sh
new file mode 100755
index 0000000..39c377c
--- /dev/null
+++ b/git-subtree.sh
@@ -0,0 +1,435 @@
+#!/bin/bash
+#
+# git-subtree.sh: split/join git repositories in subdirectories of this one
+#
+# Copyright (C) 2009 Avery Pennarun <apenwarr@gmail.com>
+#
+if [ $# -eq 0 ]; then
+    set -- -h
+fi
+OPTS_SPEC="\
+git subtree add --prefix=<prefix> <commit>
+git subtree split [options...] --prefix=<prefix> <commit...>
+git subtree merge --prefix=<prefix> <commit>
+git subtree pull  --prefix=<prefix> <repository> <refspec...>
+--
+h,help        show the help
+q             quiet
+d             show debug messages
+prefix=       the name of the subdir to split out
+ options for 'split'
+annotate=     add a prefix to commit message of new commits
+onto=         try connecting new tree to an existing one
+rejoin        merge the new branch back into HEAD
+ignore-joins  ignore prior --rejoin commits
+"
+eval $(echo "$OPTS_SPEC" | git rev-parse --parseopt -- "$@" || echo exit $?)
+. git-sh-setup
+require_work_tree
+
+quiet=
+debug=
+command=
+onto=
+rejoin=
+ignore_joins=
+annotate=
+
+debug()
+{
+	if [ -n "$debug" ]; then
+		echo "$@" >&2
+	fi
+}
+
+say()
+{
+	if [ -z "$quiet" ]; then
+		echo "$@" >&2
+	fi
+}
+
+assert()
+{
+	if "$@"; then
+		:
+	else
+		die "assertion failed: " "$@"
+	fi
+}
+
+
+#echo "Options: $*"
+
+while [ $# -gt 0 ]; do
+	opt="$1"
+	shift
+	case "$opt" in
+		-q) quiet=1 ;;
+		-d) debug=1 ;;
+		--annotate) annotate="$1"; shift ;;
+		--no-annotate) annotate= ;;
+		--prefix) prefix="$1"; shift ;;
+		--no-prefix) prefix= ;;
+		--onto) onto="$1"; shift ;;
+		--no-onto) onto= ;;
+		--rejoin) rejoin=1 ;;
+		--no-rejoin) rejoin= ;;
+		--ignore-joins) ignore_joins=1 ;;
+		--no-ignore-joins) ignore_joins= ;;
+		--) break ;;
+	esac
+done
+
+command="$1"
+shift
+case "$command" in
+	add|merge|pull) default= ;;
+	split) default="--default HEAD" ;;
+	*) die "Unknown command '$command'" ;;
+esac
+
+if [ -z "$prefix" ]; then
+	die "You must provide the --prefix option."
+fi
+dir="$prefix"
+
+if [ "$command" != "pull" ]; then
+	revs=$(git rev-parse $default --revs-only "$@") || exit $?
+	dirs="$(git rev-parse --no-revs --no-flags "$@")" || exit $?
+	if [ -n "$dirs" ]; then
+		die "Error: Use --prefix instead of bare filenames."
+	fi
+fi
+
+debug "command: {$command}"
+debug "quiet: {$quiet}"
+debug "revs: {$revs}"
+debug "dir: {$dir}"
+debug "opts: {$*}"
+debug
+
+cache_setup()
+{
+	cachedir="$GIT_DIR/subtree-cache/$$"
+	rm -rf "$cachedir" || die "Can't delete old cachedir: $cachedir"
+	mkdir -p "$cachedir" || die "Can't create new cachedir: $cachedir"
+	debug "Using cachedir: $cachedir" >&2
+}
+
+cache_get()
+{
+	for oldrev in $*; do
+		if [ -r "$cachedir/$oldrev" ]; then
+			read newrev <"$cachedir/$oldrev"
+			echo $newrev
+		fi
+	done
+}
+
+cache_set()
+{
+	oldrev="$1"
+	newrev="$2"
+	if [ "$oldrev" != "latest_old" \
+	     -a "$oldrev" != "latest_new" \
+	     -a -e "$cachedir/$oldrev" ]; then
+		die "cache for $oldrev already exists!"
+	fi
+	echo "$newrev" >"$cachedir/$oldrev"
+}
+
+# if a commit doesn't have a parent, this might not work.  But we only want
+# to remove the parent from the rev-list, and since it doesn't exist, it won't
+# be there anyway, so do nothing in that case.
+try_remove_previous()
+{
+	if git rev-parse "$1^" >/dev/null 2>&1; then
+		echo "^$1^"
+	fi
+}
+
+find_existing_splits()
+{
+	debug "Looking for prior splits..."
+	dir="$1"
+	revs="$2"
+	git log --grep="^git-subtree-dir: $dir\$" \
+		--pretty=format:'%s%n%n%b%nEND' $revs |
+	while read a b junk; do
+		case "$a" in
+			git-subtree-mainline:) main="$b" ;;
+			git-subtree-split:) sub="$b" ;;
+			*)
+				if [ -n "$main" -a -n "$sub" ]; then
+					debug "  Prior: $main -> $sub"
+					cache_set $main $sub
+					try_remove_previous "$main"
+					try_remove_previous "$sub"
+					main=
+					sub=
+				fi
+				;;
+		esac
+	done
+}
+
+copy_commit()
+{
+	# We're going to set some environment vars here, so
+	# do it in a subshell to get rid of them safely later
+	debug copy_commit "{$1}" "{$2}" "{$3}"
+	git log -1 --pretty=format:'%an%n%ae%n%ad%n%cn%n%ce%n%cd%n%s%n%n%b' "$1" |
+	(
+		read GIT_AUTHOR_NAME
+		read GIT_AUTHOR_EMAIL
+		read GIT_AUTHOR_DATE
+		read GIT_COMMITTER_NAME
+		read GIT_COMMITTER_EMAIL
+		read GIT_COMMITTER_DATE
+		export  GIT_AUTHOR_NAME \
+			GIT_AUTHOR_EMAIL \
+			GIT_AUTHOR_DATE \
+			GIT_COMMITTER_NAME \
+			GIT_COMMITTER_EMAIL \
+			GIT_COMMITTER_DATE
+		(echo -n "$annotate"; cat ) |
+		git commit-tree "$2" $3  # reads the rest of stdin
+	) || die "Can't copy commit $1"
+}
+
+add_msg()
+{
+	dir="$1"
+	latest_old="$2"
+	latest_new="$3"
+	cat <<-EOF
+		Add '$dir/' from commit '$latest_new'
+		
+		git-subtree-dir: $dir
+		git-subtree-mainline: $latest_old
+		git-subtree-split: $latest_new
+	EOF
+}
+
+merge_msg()
+{
+	dir="$1"
+	latest_old="$2"
+	latest_new="$3"
+	cat <<-EOF
+		Split '$dir/' into commit '$latest_new'
+		
+		git-subtree-dir: $dir
+		git-subtree-mainline: $latest_old
+		git-subtree-split: $latest_new
+	EOF
+}
+
+toptree_for_commit()
+{
+	commit="$1"
+	git log -1 --pretty=format:'%T' "$commit" -- || exit $?
+}
+
+subtree_for_commit()
+{
+	commit="$1"
+	dir="$2"
+	git ls-tree "$commit" -- "$dir" |
+	while read mode type tree name; do
+		assert [ "$name" = "$dir" ]
+		echo $tree
+		break
+	done
+}
+
+tree_changed()
+{
+	tree=$1
+	shift
+	if [ $# -ne 1 ]; then
+		return 0   # weird parents, consider it changed
+	else
+		ptree=$(toptree_for_commit $1)
+		if [ "$ptree" != "$tree" ]; then
+			return 0   # changed
+		else
+			return 1   # not changed
+		fi
+	fi
+}
+
+copy_or_skip()
+{
+	rev="$1"
+	tree="$2"
+	newparents="$3"
+	assert [ -n "$tree" ]
+
+	identical=
+	nonidentical=
+	p=
+	gotparents=
+	for parent in $newparents; do
+		ptree=$(toptree_for_commit $parent) || exit $?
+		[ -z "$ptree" ] && continue
+		if [ "$ptree" = "$tree" ]; then
+			# an identical parent could be used in place of this rev.
+			identical="$parent"
+		else
+			nonidentical="$parent"
+		fi
+		
+		# sometimes both old parents map to the same newparent;
+		# eliminate duplicates
+		is_new=1
+		for gp in $gotparents; do
+			if [ "$gp" = "$parent" ]; then
+				is_new=
+				break
+			fi
+		done
+		if [ -n "$is_new" ]; then
+			gotparents="$gotparents $parent"
+			p="$p -p $parent"
+		fi
+	done
+	
+	if [ -n "$identical" ]; then
+		echo $identical
+	else
+		copy_commit $rev $tree "$p" || exit $?
+	fi
+}
+
+ensure_clean()
+{
+	if ! git diff-index HEAD --exit-code --quiet; then
+		die "Working tree has modifications.  Cannot add."
+	fi
+	if ! git diff-index --cached HEAD --exit-code --quiet; then
+		die "Index has modifications.  Cannot add."
+	fi
+}
+
+cmd_add()
+{
+	if [ -e "$dir" ]; then
+		die "'$dir' already exists.  Cannot add."
+	fi
+	ensure_clean
+	
+	set -- $revs
+	if [ $# -ne 1 ]; then
+		die "You must provide exactly one revision.  Got: '$revs'"
+	fi
+	rev="$1"
+	
+	debug "Adding $dir as '$rev'..."
+	git read-tree --prefix="$dir" $rev || exit $?
+	git checkout "$dir" || exit $?
+	tree=$(git write-tree) || exit $?
+	
+	headrev=$(git rev-parse HEAD) || exit $?
+	if [ -n "$headrev" -a "$headrev" != "$rev" ]; then
+		headp="-p $headrev"
+	else
+		headp=
+	fi
+	commit=$(add_msg "$dir" "$headrev" "$rev" |
+		 git commit-tree $tree $headp -p "$rev") || exit $?
+	git reset "$commit" || exit $?
+}
+
+cmd_split()
+{
+	debug "Splitting $dir..."
+	cache_setup || exit $?
+	
+	if [ -n "$onto" ]; then
+		debug "Reading history for --onto=$onto..."
+		git rev-list $onto |
+		while read rev; do
+			# the 'onto' history is already just the subdir, so
+			# any parent we find there can be used verbatim
+			debug "  cache: $rev"
+			cache_set $rev $rev
+		done
+	fi
+	
+	if [ -n "$ignore_joins" ]; then
+		unrevs=
+	else
+		unrevs="$(find_existing_splits "$dir" "$revs")"
+	fi
+	
+	# We can't restrict rev-list to only $dir here, because some of our
+	# parents have the $dir contents at the root, and those won't match.
+	# (and rev-list --follow doesn't seem to solve this)
+	grl='git rev-list --reverse --parents $revs $unrevs'
+	revmax=$(eval "$grl" | wc -l)
+	revcount=0
+	createcount=0
+	eval "$grl" |
+	while read rev parents; do
+		revcount=$(($revcount + 1))
+		say -n "$revcount/$revmax ($createcount)"$'\r'
+		debug "Processing commit: $rev"
+		exists=$(cache_get $rev)
+		if [ -n "$exists" ]; then
+			debug "  prior: $exists"
+			continue
+		fi
+		createcount=$(($createcount + 1))
+		debug "  parents: $parents"
+		newparents=$(cache_get $parents)
+		debug "  newparents: $newparents"
+		
+		tree=$(subtree_for_commit $rev "$dir")
+		debug "  tree is: $tree"
+		[ -z "$tree" ] && continue
+
+		newrev=$(copy_or_skip "$rev" "$tree" "$newparents") || exit $?
+		debug "  newrev is: $newrev"
+		cache_set $rev $newrev
+		cache_set latest_new $newrev
+		cache_set latest_old $rev
+	done || exit $?
+	latest_new=$(cache_get latest_new)
+	if [ -z "$latest_new" ]; then
+		die "No new revisions were found"
+	fi
+	
+	if [ -n "$rejoin" ]; then
+		debug "Merging split branch into HEAD..."
+		latest_old=$(cache_get latest_old)
+		git merge -s ours \
+			-m "$(merge_msg $dir $latest_old $latest_new)" \
+			$latest_new >&2
+	fi
+	echo $latest_new
+	exit 0
+}
+
+cmd_merge()
+{
+	ensure_clean
+	
+	set -- $revs
+	if [ $# -ne 1 ]; then
+		die "You must provide exactly one revision.  Got: '$revs'"
+	fi
+	rev="$1"
+	
+	git merge -s subtree $rev
+}
+
+cmd_pull()
+{
+	ensure_clean
+	set -x
+	git pull -s subtree "$@"
+}
+
+"cmd_$command" "$@"
-- 
1.6.3.rc2.8.gbe66.dirty
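
For readers following the split loop, here is a minimal sketch of the
old-to-new commit cache it relies on: one small file per mainline commit,
holding the id of the rewritten commit. It is an illustration under
assumptions, not the patch's code: the ids are made up, the directory comes
from mktemp rather than $GIT_DIR/subtree-cache/$$, and the duplicate-entry
check in cache_set is left out.

```shell
#!/bin/bash
# Scratch directory standing in for the per-process cache directory
# (assumption: mktemp is available).
cachedir=$(mktemp -d) || exit 1

# Record that mainline commit $1 was rewritten as subtree commit $2.
cache_set() {
	echo "$2" >"$cachedir/$1"
}

# Print the rewritten id for each known commit; unknown commits print nothing.
cache_get() {
	for oldrev in "$@"; do
		if [ -r "$cachedir/$oldrev" ]; then
			read -r newrev <"$cachedir/$oldrev"
			echo "$newrev"
		fi
	done
}

# Made-up ids: two mainline commits have counterparts, main9 does not.
cache_set main1 sub1
cache_set main2 sub2

# Translating a parent list keeps only the parents that have a counterpart.
newparents=$(cache_get main1 main9 main2)
echo "new parents:" $newparents

rm -rf "$cachedir"
```

Parents with no cache entry simply drop out of the translated list, which is
how commits that never touched the subdirectory disappear from the new
history.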


* [PATCH/RFC 2/2] Automated test script for 'git subtree'.
  2009-04-26 22:29 [PATCH/RFC 1/2] Add 'git subtree' command for tracking history of subtrees separately Avery Pennarun
@ 2009-04-26 22:29 ` Avery Pennarun
  2009-04-30  2:27 ` [PATCH/RFC 1/2] Add 'git subtree' command for tracking history of subtrees separately Avery Pennarun
  1 sibling, 0 replies; 13+ messages in thread
From: Avery Pennarun @ 2009-04-26 22:29 UTC
  To: git; +Cc: Avery Pennarun

TEMPORARY: this script hasn't yet been integrated into the main git unit
tests; it runs standalone for the moment.
---
 subtree-test.sh |  206 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 files changed, 206 insertions(+), 0 deletions(-)
 create mode 100755 subtree-test.sh

diff --git a/subtree-test.sh b/subtree-test.sh
new file mode 100755
index 0000000..38dff7a
--- /dev/null
+++ b/subtree-test.sh
@@ -0,0 +1,206 @@
+#!/bin/bash
+. shellopts.sh
+set -e
+
+create()
+{
+	echo "$1" >"$1"
+	git add "$1"
+}
+
+check()
+{
+	echo
+	echo "check:" "$@"
+	if "$@"; then
+		echo ok
+		return 0
+	else
+		echo FAILED
+		exit 1
+	fi
+}
+
+check_equal()
+{
+	echo
+	echo "check a:" "{$1}"
+	echo "      b:" "{$2}"
+	if [ "$1" = "$2" ]; then
+		return 0
+	else
+		echo FAILED
+		exit 1
+	fi
+}
+
+fixnl()
+{	
+	t=""
+	while read x; do
+		t="$t$x "
+	done
+	echo $t
+}
+
+multiline()
+{
+	while read x; do
+		set -- $x
+		for d in "$@"; do
+			echo "$d"
+		done
+	done
+}
+
+rm -rf mainline subproj
+mkdir mainline subproj
+
+cd subproj
+git init
+
+create sub1
+git commit -m 'sub1'
+git branch sub1
+git branch -m master subproj
+check true
+
+create sub2
+git commit -m 'sub2'
+git branch sub2
+
+create sub3
+git commit -m 'sub3'
+git branch sub3
+
+cd ../mainline
+git init
+create main4
+git commit -m 'main4'
+git branch -m master mainline
+
+git fetch ../subproj sub1
+git branch sub1 FETCH_HEAD
+git subtree add --prefix=subdir FETCH_HEAD
+
+# this shouldn't actually do anything, since FETCH_HEAD is already a parent
+git merge -m 'merge -s -ours' -s ours FETCH_HEAD
+
+create subdir/main-sub5
+git commit -m 'main-sub5'
+
+create main6
+git commit -m 'main6 boring'
+
+create subdir/main-sub7
+git commit -m 'main-sub7'
+
+git fetch ../subproj sub2
+git branch sub2 FETCH_HEAD
+git subtree merge --prefix=subdir FETCH_HEAD
+git branch pre-split
+
+spl1=$(git subtree split --annotate='*' \
+		--prefix subdir --onto FETCH_HEAD --rejoin)
+echo "spl1={$spl1}"
+git branch spl1 "$spl1"
+
+create subdir/main-sub8
+git commit -m 'main-sub8'
+
+cd ../subproj
+git fetch ../mainline spl1
+git branch spl1 FETCH_HEAD
+git merge FETCH_HEAD
+
+create sub9
+git commit -m 'sub9'
+
+cd ../mainline
+split2=$(git subtree split --annotate='*' --prefix subdir --rejoin)
+git branch split2 "$split2"
+
+create subdir/main-sub10
+git commit -m 'main-sub10'
+
+spl3=$(git subtree split --annotate='*' --prefix subdir --rejoin)
+git branch spl3 "$spl3"
+
+cd ../subproj
+git fetch ../mainline spl3
+git branch spl3 FETCH_HEAD
+git merge FETCH_HEAD
+git branch subproj-merge-spl3
+
+chkm="main4 main6"
+chkms="main-sub10 main-sub5 main-sub7 main-sub8"
+chkms_sub=$(echo $chkms | multiline | sed 's,^,subdir/,' | fixnl)
+chks="sub1 sub2 sub3 sub9"
+chks_sub=$(echo $chks | multiline | sed 's,^,subdir/,' | fixnl)
+
+# make sure exactly the right set of files ends up in the subproj
+subfiles=$(git ls-files | fixnl)
+check_equal "$subfiles" "$chkms $chks"
+
+# make sure the subproj history *only* contains commits that affect the subdir.
+allchanges=$(git log --name-only --pretty=format:'' | sort | fixnl)
+check_equal "$allchanges" "$chkms $chks"
+
+cd ../mainline
+git fetch ../subproj subproj-merge-spl3
+git branch subproj-merge-spl3 FETCH_HEAD
+git subtree pull --prefix=subdir ../subproj subproj-merge-spl3
+
+# make sure exactly the right set of files ends up in the mainline
+mainfiles=$(git ls-files | fixnl)
+check_equal "$mainfiles" "$chkm $chkms_sub $chks_sub"
+
+# make sure each filename changed exactly once in the entire history.
+# 'main-sub??' and 'subdir/main-sub??' both change, because those are the
+# changes that were split into their own history.  And 'subdir/sub??' never
+# change, since they were *only* changed in the subtree branch.
+allchanges=$(git log --name-only --pretty=format:'' | sort | fixnl)
+check_equal "$allchanges" "$chkm $chkms $chks $chkms_sub"
+
+# make sure the --rejoin commits never make it into subproj
+check_equal "$(git log --pretty=format:'%s' HEAD^2 | grep -i split)" ""
+
+# make sure no 'git subtree' tagged commits make it into subproj. (They're
+# meaningless to subproj since one side of the merge refers to the mainline)
+check_equal "$(git log --pretty=format:'%s%n%b' HEAD^2 | grep 'git-subtree.*:')" ""
+
+# make sure no patch changes more than one file.  The original set of commits
+# changed only one file each.  A multi-file change would imply that we pruned
+# commits too aggressively.
+joincommits()
+{
+	commit=
+	all=
+	while read x y; do
+		echo "{$x}" >&2
+		if [ -z "$x" ]; then
+			continue
+		elif [ "$x" = "commit:" ]; then
+			if [ -n "$commit" ]; then
+				echo "$commit $all"
+				all=
+			fi
+			commit="$y"
+		else
+			all="$all $y"
+		fi
+	done
+	echo "$commit $all"
+}
+x=
+git log --pretty=format:'commit: %H' | joincommits |
+(	while read commit a b; do
+		echo "Verifying commit $commit"
+		check_equal "$b" ""
+		x=1
+	done
+	check_equal "$x" 1
+) || exit 1
+
+echo
+echo 'ok'
-- 
1.6.3.rc2.8.gbe66.dirty
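
The expected-file lists in the test are built by piping names through two
small helpers. The sketch below repeats those helpers on their own (with
'read -r' added, which is my change) to show what the pipeline for chks_sub
produces. It is an illustration only and does not touch any repository.

```shell
#!/bin/bash
# Join the lines on stdin into one space-separated line. The unquoted
# echo relies on word splitting to drop the trailing space.
fixnl() {
	t=""
	while read -r x; do
		t="$t$x "
	done
	echo $t
}

# Split each line on stdin into one word per line.
multiline() {
	while read -r x; do
		set -- $x
		for d in "$@"; do
			echo "$d"
		done
	done
}

# File names the subproject is expected to contain (taken from the test).
chks="sub1 sub2 sub3 sub9"

# The same names as they should appear under the mainline's subdir/.
chks_sub=$(echo $chks | multiline | sed 's,^,subdir/,' | fixnl)
echo "$chks_sub"
```

This prints "subdir/sub1 subdir/sub2 subdir/sub3 subdir/sub9", the form that
check_equal compares against the output of 'git ls-files | fixnl'.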


* Re: [PATCH/RFC 1/2] Add 'git subtree' command for tracking history of subtrees separately.
  2009-04-26 22:29 [PATCH/RFC 1/2] Add 'git subtree' command for tracking history of subtrees separately Avery Pennarun
  2009-04-26 22:29 ` [PATCH/RFC 2/2] Automated test script for 'git subtree' Avery Pennarun
@ 2009-04-30  2:27 ` Avery Pennarun
  2009-04-30  3:44   ` Ping Yin
  2009-04-30  8:58   ` Finn Arne Gangstad
  1 sibling, 2 replies; 13+ messages in thread
From: Avery Pennarun @ 2009-04-30  2:27 UTC
  To: Git Mailing List

Many projects are made of a combination of several subprojects/libraries and
some application-specific code.  In some cases, particularly when the
subprojects are all maintained independently, 'git submodule' is the best
way to deal with this situation.  But if you frequently change the
subprojects as part of developing your application, use multiple branches,
and sometimes want to push your subproject changes upstream, the overhead of
manually managing submodules can be excessive.

'git subtree' provides an alternative mechanism, based around the
'git merge -s subtree' merge strategy.  Instead of tracking a submodule
separately, you merge its history into your main project, and occasionally
extract a new "virtual history" from your mainline that can be easily merged
back into the upstream project.  The virtual history can be incrementally
expanded as you make more changes to the superproject.

You would normally then merge the virtual history back into your mainline
(the --rejoin option). This results in extra commits in your application
that appear to change the same files, but these extra commits will tend to
be ignored by git's merge simplification algorithm anyway.

For example, gitweb (commit 1130ef3) was merged into git as of commit
0a8f4f0, after which it was no longer maintained separately.  But imagine it
had been maintained separately, and we wanted to extract git's changes to
gitweb since that time, to share with the upstream.  You could do this:

git subtree split --prefix=gitweb --annotate='(split) ' \
       0a8f4f0^.. --onto=1130ef3 --rejoin

If gitweb had originally been merged using 'git subtree add' (or a previous
split had been done with --rejoin specified), then you could incrementally
produce the list of new changes without needing to remember any commit ids:

git subtree split --prefix=gitweb --annotate='(split) ' --rejoin
---

Resending just in case it got lost.

Don't suppose anyone has any comments on this?  It's just a first
draft, so please let loose.  I'm in desperate need of *some* kind of
solution like this, however.

- Avery


 Makefile         |    1 +
 command-list.txt |    1 +
 git-subtree.sh   |  435 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 437 insertions(+), 0 deletions(-)
 create mode 100755 git-subtree.sh

diff --git a/Makefile b/Makefile
index 5c8e83a..f14e11c 100644
--- a/Makefile
+++ b/Makefile
@@ -305,6 +305,7 @@ SCRIPT_SH += git-request-pull.sh
 SCRIPT_SH += git-sh-setup.sh
 SCRIPT_SH += git-stash.sh
 SCRIPT_SH += git-submodule.sh
+SCRIPT_SH += git-subtree.sh
 SCRIPT_SH += git-web--browse.sh

 SCRIPT_PERL += git-add--interactive.perl
diff --git a/command-list.txt b/command-list.txt
index fb03a2e..9be4774 100644
--- a/command-list.txt
+++ b/command-list.txt
@@ -113,6 +113,7 @@ git-stash                               mainporcelain
 git-status                              mainporcelain common
 git-stripspace                          purehelpers
 git-submodule                           mainporcelain
+git-subtree                             mainporcelain
 git-svn                                 foreignscminterface
 git-symbolic-ref                        plumbingmanipulators
 git-tag                                 mainporcelain common
diff --git a/git-subtree.sh b/git-subtree.sh
new file mode 100755
index 0000000..39c377c
--- /dev/null
+++ b/git-subtree.sh
@@ -0,0 +1,435 @@
+#!/bin/bash
+#
+# git-subtree.sh: split/join git repositories in subdirectories of this one
+#
+# Copyright (C) 2009 Avery Pennarun <apenwarr@gmail.com>
+#
+if [ $# -eq 0 ]; then
+    set -- -h
+fi
+OPTS_SPEC="\
+git subtree add --prefix=<prefix> <commit>
+git subtree split [options...] --prefix=<prefix> <commit...>
+git subtree merge --prefix=<prefix> <commit>
+git subtree pull  --prefix=<prefix> <repository> <refspec...>
+--
+h,help        show the help
+q             quiet
+d             show debug messages
+prefix=       the name of the subdir to split out
+ options for 'split'
+annotate=     add a prefix to commit message of new commits
+onto=         try connecting new tree to an existing one
+rejoin        merge the new branch back into HEAD
+ignore-joins  ignore prior --rejoin commits
+"
+eval $(echo "$OPTS_SPEC" | git rev-parse --parseopt -- "$@" || echo exit $?)
+. git-sh-setup
+require_work_tree
+
+quiet=
+debug=
+command=
+onto=
+rejoin=
+ignore_joins=
+annotate=
+
+debug()
+{
+       if [ -n "$debug" ]; then
+               echo "$@" >&2
+       fi
+}
+
+say()
+{
+       if [ -z "$quiet" ]; then
+               echo "$@" >&2
+       fi
+}
+
+assert()
+{
+       if "$@"; then
+               :
+       else
+               die "assertion failed: " "$@"
+       fi
+}
+
+
+#echo "Options: $*"
+
+while [ $# -gt 0 ]; do
+       opt="$1"
+       shift
+       case "$opt" in
+               -q) quiet=1 ;;
+               -d) debug=1 ;;
+               --annotate) annotate="$1"; shift ;;
+               --no-annotate) annotate= ;;
+               --prefix) prefix="$1"; shift ;;
+               --no-prefix) prefix= ;;
+               --onto) onto="$1"; shift ;;
+               --no-onto) onto= ;;
+               --rejoin) rejoin=1 ;;
+               --no-rejoin) rejoin= ;;
+               --ignore-joins) ignore_joins=1 ;;
+               --no-ignore-joins) ignore_joins= ;;
+               --) break ;;
+       esac
+done
+
+command="$1"
+shift
+case "$command" in
+       add|merge|pull) default= ;;
+       split) default="--default HEAD" ;;
+       *) die "Unknown command '$command'" ;;
+esac
+
+if [ -z "$prefix" ]; then
+       die "You must provide the --prefix option."
+fi
+dir="$prefix"
+
+if [ "$command" != "pull" ]; then
+       revs=$(git rev-parse $default --revs-only "$@") || exit $?
+       dirs="$(git rev-parse --no-revs --no-flags "$@")" || exit $?
+       if [ -n "$dirs" ]; then
+               die "Error: Use --prefix instead of bare filenames."
+       fi
+fi
+
+debug "command: {$command}"
+debug "quiet: {$quiet}"
+debug "revs: {$revs}"
+debug "dir: {$dir}"
+debug "opts: {$*}"
+debug
+
+cache_setup()
+{
+       cachedir="$GIT_DIR/subtree-cache/$$"
+       rm -rf "$cachedir" || die "Can't delete old cachedir: $cachedir"
+       mkdir -p "$cachedir" || die "Can't create new cachedir: $cachedir"
+       debug "Using cachedir: $cachedir" >&2
+}
+
+cache_get()
+{
+       for oldrev in $*; do
+               if [ -r "$cachedir/$oldrev" ]; then
+                       read newrev <"$cachedir/$oldrev"
+                       echo $newrev
+               fi
+       done
+}
+
+cache_set()
+{
+       oldrev="$1"
+       newrev="$2"
+       if [ "$oldrev" != "latest_old" \
+            -a "$oldrev" != "latest_new" \
+            -a -e "$cachedir/$oldrev" ]; then
+               die "cache for $oldrev already exists!"
+       fi
+       echo "$newrev" >"$cachedir/$oldrev"
+}
+
+# if a commit doesn't have a parent, this might not work.  But we only want
+# to remove the parent from the rev-list, and since it doesn't exist, it won't
+# be there anyway, so do nothing in that case.
+try_remove_previous()
+{
+       if git rev-parse "$1^" >/dev/null 2>&1; then
+               echo "^$1^"
+       fi
+}
+
+find_existing_splits()
+{
+       debug "Looking for prior splits..."
+       dir="$1"
+       revs="$2"
+       git log --grep="^git-subtree-dir: $dir\$" \
+               --pretty=format:'%s%n%n%b%nEND' $revs |
+       while read a b junk; do
+               case "$a" in
+                       git-subtree-mainline:) main="$b" ;;
+                       git-subtree-split:) sub="$b" ;;
+                       *)
+                               if [ -n "$main" -a -n "$sub" ]; then
+                                       debug "  Prior: $main -> $sub"
+                                       cache_set $main $sub
+                                       try_remove_previous "$main"
+                                       try_remove_previous "$sub"
+                                       main=
+                                       sub=
+                               fi
+                               ;;
+               esac
+       done
+}
+
+copy_commit()
+{
+       # We're going to set some environment vars here, so
+       # do it in a subshell to get rid of them safely later
+       debug copy_commit "{$1}" "{$2}" "{$3}"
+       git log -1 --pretty=format:'%an%n%ae%n%ad%n%cn%n%ce%n%cd%n%s%n%n%b' "$1" |
+       (
+               read GIT_AUTHOR_NAME
+               read GIT_AUTHOR_EMAIL
+               read GIT_AUTHOR_DATE
+               read GIT_COMMITTER_NAME
+               read GIT_COMMITTER_EMAIL
+               read GIT_COMMITTER_DATE
+               export  GIT_AUTHOR_NAME \
+                       GIT_AUTHOR_EMAIL \
+                       GIT_AUTHOR_DATE \
+                       GIT_COMMITTER_NAME \
+                       GIT_COMMITTER_EMAIL \
+                       GIT_COMMITTER_DATE
+               (echo -n "$annotate"; cat ) |
+               git commit-tree "$2" $3  # reads the rest of stdin
+       ) || die "Can't copy commit $1"
+}
+
+add_msg()
+{
+       dir="$1"
+       latest_old="$2"
+       latest_new="$3"
+       cat <<-EOF
+               Add '$dir/' from commit '$latest_new'
+
+               git-subtree-dir: $dir
+               git-subtree-mainline: $latest_old
+               git-subtree-split: $latest_new
+       EOF
+}
+
+merge_msg()
+{
+       dir="$1"
+       latest_old="$2"
+       latest_new="$3"
+       cat <<-EOF
+               Split '$dir/' into commit '$latest_new'
+
+               git-subtree-dir: $dir
+               git-subtree-mainline: $latest_old
+               git-subtree-split: $latest_new
+       EOF
+}
+
+toptree_for_commit()
+{
+       commit="$1"
+       git log -1 --pretty=format:'%T' "$commit" -- || exit $?
+}
+
+subtree_for_commit()
+{
+       commit="$1"
+       dir="$2"
+       git ls-tree "$commit" -- "$dir" |
+       while read mode type tree name; do
+               assert [ "$name" = "$dir" ]
+               echo $tree
+               break
+       done
+}
+
+tree_changed()
+{
+       tree=$1
+       shift
+       if [ $# -ne 1 ]; then
+               return 0   # weird parents, consider it changed
+       else
+               ptree=$(toptree_for_commit $1)
+               if [ "$ptree" != "$tree" ]; then
+                       return 0   # changed
+               else
+                       return 1   # not changed
+               fi
+       fi
+}
+
+copy_or_skip()
+{
+       rev="$1"
+       tree="$2"
+       newparents="$3"
+       assert [ -n "$tree" ]
+
+       identical=
+       nonidentical=
+       p=
+       gotparents=
+       for parent in $newparents; do
+               ptree=$(toptree_for_commit $parent) || exit $?
+               [ -z "$ptree" ] && continue
+               if [ "$ptree" = "$tree" ]; then
+                       # an identical parent could be used in place of this rev.
+                       identical="$parent"
+               else
+                       nonidentical="$parent"
+               fi
+
+               # sometimes both old parents map to the same newparent;
+               # eliminate duplicates
+               is_new=1
+               for gp in $gotparents; do
+                       if [ "$gp" = "$parent" ]; then
+                               is_new=
+                               break
+                       fi
+               done
+               if [ -n "$is_new" ]; then
+                       gotparents="$gotparents $parent"
+                       p="$p -p $parent"
+               fi
+       done
+
+       if [ -n "$identical" ]; then
+               echo $identical
+       else
+               copy_commit $rev $tree "$p" || exit $?
+       fi
+}
+
+ensure_clean()
+{
+       if ! git diff-index HEAD --exit-code --quiet; then
+               die "Working tree has modifications.  Cannot add."
+       fi
+       if ! git diff-index --cached HEAD --exit-code --quiet; then
+               die "Index has modifications.  Cannot add."
+       fi
+}
+
+cmd_add()
+{
+       if [ -e "$dir" ]; then
+               die "'$dir' already exists.  Cannot add."
+       fi
+       ensure_clean
+
+       set -- $revs
+       if [ $# -ne 1 ]; then
+               die "You must provide exactly one revision.  Got: '$revs'"
+       fi
+       rev="$1"
+
+       debug "Adding $dir as '$rev'..."
+       git read-tree --prefix="$dir" $rev || exit $?
+       git checkout "$dir" || exit $?
+       tree=$(git write-tree) || exit $?
+
+       headrev=$(git rev-parse HEAD) || exit $?
+       if [ -n "$headrev" -a "$headrev" != "$rev" ]; then
+               headp="-p $headrev"
+       else
+               headp=
+       fi
+       commit=$(add_msg "$dir" "$headrev" "$rev" |
+                git commit-tree $tree $headp -p "$rev") || exit $?
+       git reset "$commit" || exit $?
+}
+
+cmd_split()
+{
+       debug "Splitting $dir..."
+       cache_setup || exit $?
+
+       if [ -n "$onto" ]; then
+               debug "Reading history for --onto=$onto..."
+               git rev-list $onto |
+               while read rev; do
+                       # the 'onto' history is already just the subdir, so
+                       # any parent we find there can be used verbatim
+                       debug "  cache: $rev"
+                       cache_set $rev $rev
+               done
+       fi
+
+       if [ -n "$ignore_joins" ]; then
+               unrevs=
+       else
+               unrevs="$(find_existing_splits "$dir" "$revs")"
+       fi
+
+       # We can't restrict rev-list to only $dir here, because some of our
+       # parents have the $dir contents as the root, and those won't match.
+       # (and rev-list --follow doesn't seem to solve this)
+       grl='git rev-list --reverse --parents $revs $unrevs'
+       revmax=$(eval "$grl" | wc -l)
+       revcount=0
+       createcount=0
+       eval "$grl" |
+       while read rev parents; do
+               revcount=$(($revcount + 1))
+               say -n "$revcount/$revmax ($createcount)
"
+               debug "Processing commit: $rev"
+               exists=$(cache_get $rev)
+               if [ -n "$exists" ]; then
+                       debug "  prior: $exists"
+                       continue
+               fi
+               createcount=$(($createcount + 1))
+               debug "  parents: $parents"
+               newparents=$(cache_get $parents)
+               debug "  newparents: $newparents"
+
+               tree=$(subtree_for_commit $rev "$dir")
+               debug "  tree is: $tree"
+               [ -z "$tree" ] && continue
+
+               newrev=$(copy_or_skip "$rev" "$tree" "$newparents") || exit $?
+               debug "  newrev is: $newrev"
+               cache_set $rev $newrev
+               cache_set latest_new $newrev
+               cache_set latest_old $rev
+       done || exit $?
+       latest_new=$(cache_get latest_new)
+       if [ -z "$latest_new" ]; then
+               die "No new revisions were found"
+       fi
+
+       if [ -n "$rejoin" ]; then
+               debug "Merging split branch into HEAD..."
+               latest_old=$(cache_get latest_old)
+               git merge -s ours \
+                       -m "$(merge_msg $dir $latest_old $latest_new)" \
+                       $latest_new >&2
+       fi
+       echo $latest_new
+       exit 0
+}
+
+cmd_merge()
+{
+       ensure_clean
+
+       set -- $revs
+       if [ $# -ne 1 ]; then
+               die "You must provide exactly one revision.  Got: '$revs'"
+       fi
+       rev="$1"
+
+       git merge -s subtree $rev
+}
+
+cmd_pull()
+{
+       ensure_clean
+       set -x
+       git pull -s subtree "$@"
+}
+
+"cmd_$command" "$@"
--
1.6.3.rc2.8.gbe66.dirty
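
The duplicate-parent elimination inside copy_or_skip() in the patch above can be tried in isolation. The following is only a sketch: uniq_parents is a hypothetical name that does not appear in the patch, and it keeps just the de-duplication loop, leaving out the tree comparison and the call to copy_commit.

```shell
#!/bin/sh
# Sketch only: uniq_parents is a hypothetical name, not a function from the patch.
# Prints "-p <id>" once for each distinct id, preserving first-seen order.
uniq_parents()
{
	gotparents=
	p=
	for parent in "$@"; do
		is_new=1
		for gp in $gotparents; do
			if [ "$gp" = "$parent" ]; then
				# already recorded: two old parents mapped to the same new one
				is_new=
				break
			fi
		done
		if [ -n "$is_new" ]; then
			gotparents="$gotparents $parent"
			p="$p -p $parent"
		fi
	done
	# drop the leading space that the accumulation leaves behind
	printf '%s\n' "${p# }"
}

uniq_parents aaa bbb aaa   # prints: -p aaa -p bbb
```

The linear scan is quadratic in the number of parents, which should not matter here since commits rarely have more than a handful of parents.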

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH/RFC 1/2] Add 'git subtree' command for tracking history of  subtrees separately.
  2009-04-30  2:27 ` [PATCH/RFC 1/2] Add 'git subtree' command for tracking history of subtrees separately Avery Pennarun
@ 2009-04-30  3:44   ` Ping Yin
  2009-04-30  8:58   ` Finn Arne Gangstad
  1 sibling, 0 replies; 13+ messages in thread
From: Ping Yin @ 2009-04-30  3:44 UTC (permalink / raw)
  To: Avery Pennarun; +Cc: Git Mailing List

> Resending just in case it got lost.
>
> Don't suppose anyone has any comments on this?  It's just a first
> draft, so please let loose.  I'm in desperate need of *some* kind of
> solution like this, however.
>

I think it is a nice feature. Although there is a howto that introduces
merging as a subtree, and I have done it several times, I still can't
remember the first step. A command like this is helpful for me.
Thanks.

* Re: [PATCH/RFC 1/2] Add 'git subtree' command for tracking history of subtrees separately.
  2009-04-30  2:27 ` [PATCH/RFC 1/2] Add 'git subtree' command for tracking history of subtrees separately Avery Pennarun
  2009-04-30  3:44   ` Ping Yin
@ 2009-04-30  8:58   ` Finn Arne Gangstad
  2009-04-30 14:32     ` Avery Pennarun
  1 sibling, 1 reply; 13+ messages in thread
From: Finn Arne Gangstad @ 2009-04-30  8:58 UTC (permalink / raw)
  To: Avery Pennarun; +Cc: Git Mailing List

On Wed, Apr 29, 2009 at 10:27:44PM -0400, Avery Pennarun wrote:
> Many projects are made of a combination of several subprojects/libraries and
> some application-specific code.  In some cases, particularly when the
> subprojects are all maintained independently, 'git submodule' is the best
> way to deal with this situation.  But if you frequently change the
> subprojects as part of developing your application, use multiple branches,
> and sometimes want to push your subproject changes upstream, the overhead of
> manually managing submodules can be excessive.
> 
> 'git subtree' provides an alternative mechanism, based around the
> 'git merge -s subtree' merge strategy.  Instead of tracking a submodule
> separately, you merge its history into your main project, and occasionally
> extract a new "virtual history" from your mainline that can be easily merged
> back into the upstream project.  The virtual history can be incrementally
> expanded as you make more changes to the superproject.

We have the exact same situation. I wanted to attack this from the
other end though, make submodules useable also in this scenario. The
subtree solution seems to be much easier to do in git, so maybe this
is a better approach!

Let's say you have three different projects that all use some shared
modules. The following operations should all be easy and fully
supported:

a) Modify project + some shared modules (in your project) with single commit
b) Push project + shared modules (for your project)
c) Push modifications to shared modules
d) Merge upstream version of shared modules into your project.

My quick analysis:
Your subtrees: a & b are easy, c & d are painful
Current submodules: a & b are painful, c & d are tolerable (somewhat tedious
with many shared modules, easy with one)

Subtrees also have the advantage that all the existing local tools
will be a lot more useful without any modifications (gitk, git gui,
git diff/patch/am/log/...)

To make subtrees really useful, it would be good if you could improve c
& d, syncing with the shared modules!

- Finn Arne

* Re: [PATCH/RFC 1/2] Add 'git subtree' command for tracking history of  subtrees separately.
  2009-04-30  8:58   ` Finn Arne Gangstad
@ 2009-04-30 14:32     ` Avery Pennarun
  2009-07-16 18:04       ` Andrey Smirnov
  0 siblings, 1 reply; 13+ messages in thread
From: Avery Pennarun @ 2009-04-30 14:32 UTC (permalink / raw)
  To: Finn Arne Gangstad; +Cc: Git Mailing List

On Thu, Apr 30, 2009 at 4:58 AM, Finn Arne Gangstad <finnag@pvv.org> wrote:
> On Wed, Apr 29, 2009 at 10:27:44PM -0400, Avery Pennarun wrote:
>> 'git subtree' provides an alternative mechanism, based around the
>> 'git merge -s subtree' merge strategy.  Instead of tracking a submodule
>> separately, you merge its history into your main project, and occasionally
>> extract a new "virtual history" from your mainline that can be easily merged
>> back into the upstream project.  The virtual history can be incrementally
>> expanded as you make more changes to the superproject.
>
> We have the exact same situation. I wanted to attack this from the
> other end though, make submodules useable also in this scenario. The
> subtree solution seems to be much easier to do in git, so maybe this
> is a better approach!

Sounds like your thought process is similar to mine :)  I spent a lot
of time trying to figure out how to convince submodules to work the
way I wanted, until I eventually realized that subtrees were already a
lot closer.

> Let's say you have three different projects that all use some shared
> modules. The following operations should all be easy and fully
> supported:
>
> a) Modify project + some shared modules (in your project) with single commit
> b) Push project + shared modules (for your project)
> c) Push modifications to shared modules
> d) Merge upstream version of shared modules into your project.
>
> My quick analysis:
> Your subtrees: a & b are easy, c & d are painful

My *attempt* with git-subtree was to make all four operations easy.
It's up to you to decide whether I succeeded :)

a) Modify-and-commit: just git commit

b) Push project+shared: just git push

c) Push shared changes only:
      # Should we try to make a simpler single command for this?
      # The problem is: I suspect people will normally want to review the
      # git subtree split output before pushing it anywhere, so combining
      # the split/push operations may not be wise.
      git push shared-remote $(git subtree split --prefix=shared-dir):master

d) Merge upstream changes of shared module:
      git subtree pull --prefix=shared-dir shared-remote master
    or
      git fetch shared-remote master
      git subtree merge --prefix=shared-dir FETCH_HEAD
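
Since the split output should usually be reviewed before it is pushed, step (c) could be broken into separate steps rather than one combined command. The sketch below only prints the commands so they can be inspected first. The helper name print_shared_push_plan and the branch name shared-split are hypothetical, and it assumes a git-subtree version that accepts the -b option to 'split'.

```shell
#!/bin/sh
# Sketch only: print_shared_push_plan is a hypothetical helper name.
# It prints the commands instead of running them, so the split result
# can be reviewed before anything is pushed.
# Assumes prefix, remote and branch names contain no whitespace.
print_shared_push_plan()
{
	prefix="$1" remote="$2" branch="$3"
	echo "git subtree split --prefix=$prefix -b $branch"
	echo "git log --stat $branch"
	echo "git push $remote $branch:master"
}

print_shared_push_plan shared-dir shared-remote shared-split
```

Running the printed lines one at a time keeps the review step between the split and the push.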

Have fun,

Avery

* Re: [PATCH/RFC 1/2] Add 'git subtree' command for tracking history of  subtrees separately.
  2009-04-30 14:32     ` Avery Pennarun
@ 2009-07-16 18:04       ` Andrey Smirnov
  2009-07-16 18:34         ` Avery Pennarun
  0 siblings, 1 reply; 13+ messages in thread
From: Andrey Smirnov @ 2009-07-16 18:04 UTC (permalink / raw)
  To: git

Hello!

Avery Pennarun <apenwarr> writes:

> d) Merge upstream changes of shared module:
>       git subtree pull --prefix=shared-dir shared-remote master
>     or
>       git fetch shared-remote master
>       git subtree merge --prefix=shared-dir FETCH_HEAD

I found the git-subtree approach to handling sub-repositories very interesting
and useful to me. This is the second-to-last feature I've been waiting for from
the DVCS world since I got into it. <remark> The remaining feature I wish for
that isn't already there is the ability to automatically track the tree of repos
I work with and manage this tree as simply as filemanager-style clients
like Tortoise SVN allow. This would be a feature like submodules, but with
tracking refs, remotes and remote URIs stored in the repo itself rather than in
.git dirs, and with different commands for propagation and display of changesets.
I hope someone has time to try this approach (or at least the patience to
discuss it).
</remark>

I've just used git-subtree (latest github version) and it worked for me.
However, I encountered some difficulty using it for my purpose and wish
to share the solution I came to and ask if it is OK:

My goal was to rebase changes to the shared library of two similar projects
from one project to the other. The commits in the more recent project touched
both the lib/ directory with the shared library and the rest of the project.

When I did 
   git subtree split --prefix=lib NewProj -b test-split
 and
   git subtree split --prefix=lib OldProj -b test-split-old
I got the following two trees without a common root:

...X ----- Y ----- OldProj ----...---- Z ---- NewProj

X' ----- Y'==test-split-old ----- Z'==test-split

Problem:

When I did
   git subtree merge test-split --prefix=lib
it printed:
 Auto-merged lib/x.cgi
 CONFLICT (add/add): Merge conflict in lib/x.cgi
 Auto-merged lib/y.cgi
 CONFLICT (add/add): Merge conflict in lib/y.cgi
 Automatic merge failed; fix conflicts and then commit the result.

It's obvious that it should be that way, because logically the two trees don't
have the same root at the time of the merge. But I expected 'subtree merge
--prefix' to understand that X' is identical to the changes to 'lib/*' in X,
Y' to Y and Z' to Z.

Solution: 

    git rebase --onto OldProj test-split-old test-split
it printed:
 First, rewinding head to replay your work on top of it...
 Applying ZZZZZ
 error: x.cgi: does not exist in index
 error: y.cgi: does not exist in index
 Using index info to reconstruct a base tree...
 Falling back to patching base and 3-way merge...

I don't know what magic it used, but it rebased correctly. Furthermore, "-s
subtree" didn't work at all:
    git rebase --onto OldProj test-split-old test-split -s subtree
 First, rewinding head to replay your work on top of it...
 Fast-forwarded OldProj to OldProj.

And so I ask whether this behavior is the way git-subtree was meant to work.
It probably makes sense to add a 'rebase' command to the git-subtree script to
make such tasks simpler.

My best regards, Andrey Smirnov.

* Re: [PATCH/RFC 1/2] Add 'git subtree' command for tracking history of  subtrees separately.
  2009-07-16 18:04       ` Andrey Smirnov
@ 2009-07-16 18:34         ` Avery Pennarun
  2009-07-16 22:09           ` Andrey Smirnov
  0 siblings, 1 reply; 13+ messages in thread
From: Avery Pennarun @ 2009-07-16 18:34 UTC (permalink / raw)
  To: Andrey Smirnov; +Cc: git

On Thu, Jul 16, 2009 at 2:04 PM, Andrey Smirnov<allter@gmail.com> wrote:
> I found the git-subtree approach to handling sub-repositories very interesting
> and useful to me. This is the second-to-last feature I've been waiting for from
> the DVCS world since I got into it.

I'm glad you like it.

> My goal was to rebase changes to shared library of two similar projects from one
> project to another. The commits in the more recent project were touching both
> lib/ directory with shared library and the rest of the project.
>
> When I did
>   git subtree split --prefix=lib NewProj -b test-split
>  and
>   git subtree split --prefix=lib OldProj -b test-split-old
> I got the following two trees without a common root:
>
> ...X ----- Y ----- OldProj ----...---- Z ---- NewProj
>
> X' ----- Y'==test-split-old ----- Z'==test-split

So, why don't they have a common root?  This is, of course, the
primary cause of your problems.

How did this shared library get merged into OldProj and NewProj in the
first place?  Did you just copy the files, or did you use something
like 'git merge -s subtree'?  If the latter, you should be able to
convince git-subtree to produce two split repositories with identical
roots, and then merge smoothly between them.

If you just copied the files (or applied patches with git-rebase, etc)
then a common root is impossible, and you'll have to repair the
damage, as I'll explain below:

> Problem:
>
> When I did
>   git subtree merge test-split --prefix=lib
> it printed:
>  Auto-merged lib/x.cgi
>  CONFLICT (add/add): Merge conflict in lib/x.cgi
>  Auto-merged lib/y.cgi
>  CONFLICT (add/add): Merge conflict in lib/y.cgi
>  Automatic merge failed; fix conflicts and then commit the result.
>
> It's obvious that it should be that way because logically both trees don't have
> the same root at the time of merge. But I've expected subtree merge --prefix
> will understand that X' is identical to changes to 'lib/*' in X, Y' to Y and Z'
> to Z.

'git subtree merge' is just like 'git merge' - if you don't have a
shared merge-base commit, it won't know what to do, and will have to
guess.
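
Whether two split branches share any history can be checked up front with plain 'git merge-base', which exits non-zero when it finds no common ancestor. A minimal sketch, where have_merge_base is a hypothetical helper name rather than part of git-subtree:

```shell
#!/bin/sh
# Sketch only: have_merge_base is a hypothetical helper, not part of git-subtree.
# Succeeds only when the two revisions have at least one common ancestor.
have_merge_base()
{
	git merge-base "$1" "$2" >/dev/null 2>&1
}

# Example with the branch names from this thread:
#   have_merge_base test-split-old test-split ||
#       echo "no shared merge-base; expect add/add conflicts"
```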

> Solution:
>
>    git rebase --onto OldProj test-split-old test-split
> it printed:
>  First, rewinding head to replay your work on top of it...
>  Applying ZZZZZ
>  error: x.cgi: does not exist in index
>  error: y.cgi: does not exist in index
>  Using index info to reconstruct a base tree...
>  Falling back to patching base and 3-way merge...
>
> I don't know what magic it used but it did rebase right. Furthermore "-s
> subtree" didn't work at all:
>    git rebase --onto OldProj test-split-old test-split -s subtree
>  First, rewinding head to replay your work on top of it...
>  Fast-forwarded OldProj to OldProj.

Now you've just made a mess.  Instead, I might have tried something like this:

  git checkout -b test-merged test-split
  git rebase test-split-old

Now you've got a split branch test-merged that should contain all the
changes from both test-split and test-split-old, and its parentage is
test-split-old.  Let's make it so that it can merge conflict-free into
either branch:

  git merge -s ours test-split

Next, you merge it into one of your main projects:

  git checkout OldProj
    # you should have used split --rejoin earlier, and the next
    # command wouldn't be necessary here
  git subtree split --prefix=lib OldProj --rejoin
  git subtree merge --prefix=lib test-merged

And the other:

  git checkout NewProj
    # you should have used split --rejoin earlier, and the next
    # command wouldn't be necessary here
  git subtree split --prefix=lib NewProj --rejoin
  git subtree merge --prefix=lib test-merged

Now both projects are caught up with test-merged, and moreover, if you
split either of them in the future, they will share a common
merge-base (ie. today's value of test-merged).  Future merges will
therefore be easier.

> And so I ask if this behavior is the way git-subtree was meant to work.
> It probably makes sense to add a 'rebase' command to the git-subtree script
> to make such tasks simpler.

I don't think that's a good idea.  git-subtree is completely separate
from rebasing, and doesn't deal with patches at all.  Maybe there
should be some kind of "force-update" option that does what "git
subtree add" does, but wiping out everything in the subtree before it
starts.  That would have simplified the above commands a bit.

Avery

* Re: [PATCH/RFC 1/2] Add 'git subtree' command for tracking history of  subtrees separately.
  2009-07-16 18:34         ` Avery Pennarun
@ 2009-07-16 22:09           ` Andrey Smirnov
  2009-07-16 22:27             ` Avery Pennarun
  0 siblings, 1 reply; 13+ messages in thread
From: Andrey Smirnov @ 2009-07-16 22:09 UTC (permalink / raw)
  To: Avery Pennarun; +Cc: git

Hello!

On Thu, Jul 16, 2009 at 10:34 PM, Avery Pennarun<apenwarr> wrote:
>> When I did
>>   git subtree split --prefix=lib NewProj -b test-split
>>  and
>>   git subtree split --prefix=lib OldProj -b test-split-old
>> I got the following two trees without a common root:
>>
>> ...X ----- Y ----- OldProj ----...---- Z ---- NewProj
>>
>> X' ----- Y'==test-split-old ----- Z'==test-split
> So, why don't they have a common root?  This is, of course, the
> primary cause of your problems.

The line with OldProj and NewProj is the history of commits for the project
that contains both the library and other code. The line with test-split
and test-split-old is the history of commits of the shared library alone,
with test-split-old corresponding to OldProj and test-split
corresponding to NewProj.
And I needed to get the changes test-split-old..test-split into the
superproject (but without the other garbage commits that lead to NewProj).

> How did this shared library get merged into OldProj and NewProj in the
> first place?  Did you just copy the files, or did you use something
> like 'git merge -s subtree'?  If the latter, you should be able to
> convince git-subtree to produce two split repositories with identical
> roots, and then merge smoothly between them.

They don't share commits because the library was never developed on its own.
The library evolved from common code that was cut and pasted through about
a hundred web projects stored in SVN. Before I started to use git (mostly I
use it as a merge/rebase tool, because our primary VCS is still Subversion),
I transplanted changes in the library by manual svn merge, even on individual
files in some cases. While I was typing my previous message, I found that if
I added "--rejoin", I would have a situation that imitates the effect of an
"add test-split-old" command followed by "merge -s subtree test-merged":

...X ----- Y ----- OldProj ----- rejoined-merge ----...---- Z ---- NewProj
                                     /                \
X' ----- Y'==test-split-old                      Z''==test-merged
                                   \
                                     Z'==test-split

But Subversion and git-svn don't like git-ish merges, they need rebase. :(

>  git checkout -b test-merged test-split
>  git checkout OldProj
>  git subtree split --prefix=lib OldProj --rejoin
>  git subtree merge --prefix=lib test-merged

Yes, that's one of the ways I thought of, and the one I pictured above. But I
would like an approach that deals only with patches and not commit trees, due
to the git-svn restriction.

>> And so I ask if this behavior is the way git-subtree was meant to work.
>> It probably makes sense to add a 'rebase' command to the git-subtree script
>> to make such tasks simpler.
> I don't think that's a good idea.  git-subtree is completely separate
> from rebasing, and doesn't deal with patches at all.  Maybe there
> should be some kind of "force-update" option that does what "git
> subtree add" does, but wiping out everything in the subtree before it
> starts.  That would have simplified the above commands a bit.

The only thing that links git-subtree with git-rebase is the fact that
git-subtree "knows" the target commit for rebases dealing with subtrees.
So if one knows the commit of a subtree that he wishes to see in the
superproject (in my case "test-split"), he could issue:
    git subtree rebase --prefix=lib OldProject test-split

Though simple:
    git rebase --onto OldProject test-split-old test-split
worked for me, I think this was a lucky coincidence because of simplicity
of my library commits.

--
Sincerely yours, Andrey.

* Re: [PATCH/RFC 1/2] Add 'git subtree' command for tracking history of  subtrees separately.
  2009-07-16 22:09           ` Andrey Smirnov
@ 2009-07-16 22:27             ` Avery Pennarun
  2009-07-17  7:16               ` Andrey Smirnov
  0 siblings, 1 reply; 13+ messages in thread
From: Avery Pennarun @ 2009-07-16 22:27 UTC (permalink / raw)
  To: Andrey Smirnov; +Cc: git

On Thu, Jul 16, 2009 at 6:09 PM, Andrey Smirnov<allter@gmail.com> wrote:
> On Thu, Jul 16, 2009 at 10:34 PM, Avery Pennarun<apenwarr> wrote:
>> I don't think that's a good idea.  git-subtree is completely separate
>> from rebasing, and doesn't deal with patches at all.  Maybe there
>> should be some kind of "force-update" option that does what "git
>> subtree add" does, but wiping out everything in the subtree before it
>> starts.  That would have simplified the above commands a bit.
>
> The only thing that links git-subtree with git-rebase is the fact, that
> git-subtree "knows" the target commit for rebases dealing with subtrees.
> So if one knows commit of a subtree that he wishes to see in superproject
> (in my case "test-split") he could issue:
>    git subtree rebase --prefix=lib OldProject test-split
>
> Though simple:
>    git rebase --onto OldProject test-split-old test-split
> worked for me, I think this was a lucky coincidence because of simplicity
> of my library commits.

I don't really understand what you're asking for here.  rebase doesn't
have any parameters called a "target."  What does git-subtree know
that you don't know?

Have fun,

Avery

* Re: [PATCH/RFC 1/2] Add 'git subtree' command for tracking history of  subtrees separately.
  2009-07-16 22:27             ` Avery Pennarun
@ 2009-07-17  7:16               ` Andrey Smirnov
  2009-07-17 15:47                 ` Avery Pennarun
  0 siblings, 1 reply; 13+ messages in thread
From: Andrey Smirnov @ 2009-07-17  7:16 UTC (permalink / raw)
  To: Avery Pennarun; +Cc: git

On Fri, Jul 17, 2009 at 2:27 AM, Avery Pennarun<apenwarr> wrote:

>> The only thing that links git-subtree with git-rebase is the fact, that
>> git-subtree "knows" the target commit for rebases dealing with subtrees.
> rebase doesn't
> have any parameters called a "target."  What does git-subtree know
> that you don't know?

By "rebase target" I mean the mutual relation of the git-rebase <newbase>
and <upstream> parameters that defines where the rebased commits will go.
git-subtree can infer that NewProj contains the library up to test-split and
that OldProj contains the library up to test-split-old. The concept of the
whole git-subtree workflow is still blurry to me though, so I will report back
when I gather more usage statistics.

> I don't really understand what you're asking for here.

At most, I need a generic ability to shift the paths of a merged or rebased
repository or ref "left" (selecting some directory or file) and "right"
(prepending some directory to all paths) before the actual operation(s),
i.e. the opposite of 'split', but without the commit-tree-joining semantics
of 'add'. This could be implemented with some chaining/plumbing presets.

--
Sincerely yours, Andrey.

* Re: [PATCH/RFC 1/2] Add 'git subtree' command for tracking history of  subtrees separately.
  2009-07-17  7:16               ` Andrey Smirnov
@ 2009-07-17 15:47                 ` Avery Pennarun
  2009-07-17 17:46                   ` Andrey Smirnov
  0 siblings, 1 reply; 13+ messages in thread
From: Avery Pennarun @ 2009-07-17 15:47 UTC (permalink / raw)
  To: Andrey Smirnov; +Cc: git

On Fri, Jul 17, 2009 at 3:16 AM, Andrey Smirnov<allter@gmail.com> wrote:
> On Fri, Jul 17, 2009 at 2:27 AM, Avery Pennarun<apenwarr> wrote:
>>> The only thing that links git-subtree with git-rebase is the fact, that
>>> git-subtree "knows" the target commit for rebases dealing with subtrees.
>> rebase doesn't
>> have any parameters called a "target."  What does git-subtree know
>> that you don't know?
>
> By "rebase target" I mean the mutual relation of git-rebase <newbase>
> and <upstream> parameters
> that define where will be the rebased commits. git-subtree can infer
> that NewProj contains library up to
> test-split and that OldProj contains library up to test-split-old. The
> concept of the whole git-subtree workflow
> is still blurry to me though, so I will report when I gather more
> usage statistics.

The problem is that test-split and test-split-old are completely
unrelated trees that have similar-looking files but no common
ancestry.  All git-subtree knows is exactly that.  It can't simplify
anything (in your case) like you seem to think it can.

git-rebase tries to be cleverer, and starts comparing patches and file
similarities so it can graft one tree onto another, and for
convenience, it throws away redundant commits that do exactly what
some other commit did (basically).  This is actually really messy.  As
soon as you get into that situation, you have nothing but a mess.  My
advice would be to clean up the mess as soon as you can (which
appropriate use of git-subtree + git-rebase can help you do).

Then you'll have actual, valid merge history, and git-subtree will be
able to work smoothly using just that.

>> I don't really understand what you're asking for here.
>
> At most I need generic ability to shift merged and rebased
> repository's or ref's "left" (selecting some directory or file)
> and "right" (prepending some directory to all paths) before actual
> operation(s). I.e. the antonym of 'split'
> but without 'add' committree-joining semantics. This can be
> implemented with some chaining/plumbing presets.

I think that if you're having this problem, you should look for a less
ugly solution :)

What I think you're asking for is a way of turning all the commits in
a subdir into a patch stream (which git-subtree split can do,
essentially), but then to add a prefix to all the paths in all the
patches, so that you can then apply those patches on top of some other
repo where the files were in another location.  You can do that, I
guess, but you're not taking advantage of git's convenience.
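
For reference, a prefixed patch stream of that kind might be produced with stock git alone: 'git format-patch --stdout' emits the commits as patches, and the --directory option of 'git am' prepends a directory to every path while applying. This is only a sketch of the idea, not something git-subtree provides. replay_under_prefix is a hypothetical helper name, and it assumes the first argument is an ancestor of the second in the split-out history.

```shell
#!/bin/sh
# Sketch only: replay_under_prefix is a hypothetical helper, not a git-subtree command.
# Replays the commits in base..tip (a split-out history) onto the current
# branch, with every path in every patch moved under prefix/.
replay_under_prefix()
{
	base="$1" tip="$2" prefix="$3"
	git format-patch --stdout "$base..$tip" |
	git am --directory="$prefix"
}

# Example with the names from this thread (run from a checkout of OldProj):
#   replay_under_prefix test-split-old test-split lib
```

This produces linear, rebase-style history with no merge commits, so it gives up the shared merge-base that later 'git subtree merge' runs rely on.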

git-subtree encourages you to think of the files in the subtree as
their own separate project, and you can then merge that separate
project into yours.  That's actually a more accurate model of reality,
I think.

Have fun,

Avery

* Re: [PATCH/RFC 1/2] Add 'git subtree' command for tracking history of  subtrees separately.
  2009-07-17 15:47                 ` Avery Pennarun
@ 2009-07-17 17:46                   ` Andrey Smirnov
  0 siblings, 0 replies; 13+ messages in thread
From: Andrey Smirnov @ 2009-07-17 17:46 UTC (permalink / raw)
  To: Avery Pennarun; +Cc: git

On Fri, Jul 17, 2009 at 7:47 PM, Avery Pennarun<apenwarr@gmail.com> wrote:
> The problem is that test-split and test-split-old are completely
> unrelated trees that have similar-looking files but no common
> ancestry.

I understand this, but if two projects share the same commit history
for their subdir-lib, both test-split and test-split-old will have the
same root.

> git-rebase tries to be cleverer, and starts comparing patches and file
> similarities so it can graft one tree onto another,

That is clear to me too, and since I try to avoid committing
'reversion commits', it works for my workflow. I don't want to track the
library as a separate ref (at least for now).

Anyway, thank you for the great extension. I will now try using it on a
regular basis; there is another project that is more suitable
for tracking libraries as you suggest.

--
Sincerely yours, Andrey.

end of thread, other threads:[~2009-07-17 17:46 UTC | newest]

Thread overview: 13+ messages
2009-04-26 22:29 [PATCH/RFC 1/2] Add 'git subtree' command for tracking history of subtrees separately Avery Pennarun
2009-04-26 22:29 ` [PATCH/RFC 2/2] Automated test script for 'git subtree' Avery Pennarun
2009-04-30  2:27 ` [PATCH/RFC 1/2] Add 'git subtree' command for tracking history of subtrees separately Avery Pennarun
2009-04-30  3:44   ` Ping Yin
2009-04-30  8:58   ` Finn Arne Gangstad
2009-04-30 14:32     ` Avery Pennarun
2009-07-16 18:04       ` Andrey Smirnov
2009-07-16 18:34         ` Avery Pennarun
2009-07-16 22:09           ` Andrey Smirnov
2009-07-16 22:27             ` Avery Pennarun
2009-07-17  7:16               ` Andrey Smirnov
2009-07-17 15:47                 ` Avery Pennarun
2009-07-17 17:46                   ` Andrey Smirnov
