* [RFC] What's the best UI for 'git submodule split'?
@ 2009-02-12 21:50 Eric Kidd
  2009-02-14  2:24 ` [RFC/PATCHv2] git submodule split Eric Kidd
  0 siblings, 1 reply; 14+ messages in thread
From: Eric Kidd @ 2009-02-12 21:50 UTC (permalink / raw)
  To: git

The problem: At work, we're converting a large Subversion repository to
git. This repository contains an optional "streaming/media"
directory which we want to split into a submodule. Some constraints:

  1) We want 'git bisect' to work with the converted repository, so
the submodule should exist all throughout the repository's history,
and not just in the HEAD revision.
  2) The submodule has moved around the tree in the past, and it has
occasionally disappeared for a commit or two. For example, it used to
live in "Media", not "streaming/media". We want to hook up these
different historical locations into a single submodule.

The proposed solution: I'm working on a 'git submodule split' script
which works as follows:

  git submodule split streaming/media Media
  rm -rf .git/refs/original # or just use --force below
  git submodule split other-binaries

This will create two submodules, one at streaming/media and one at
other-binaries. It will rewrite the parent repository's history to
create correct submodule links, and update .gitmodules as necessary at
each point in the history. The new modules will be placed at their
most recent locations in the tree.

Some Q&A:

Q. Why not merge 'submodule split' into the existing 'filter-branch' loop?

A. Internally, 'submodule split' needs to make two separate passes
with 'filter-branch': One to create the new submodule, and one to
update the parent. If I were to merge 'submodule split' into the
existing filter-branch loop, filter-branch would need to keep track of
two repositories. Writing 'submodule split' as a wrapper around
filter-branch helps keep filter-branch simple.
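
For readers who want to picture the hand-off between the two passes, here
is a simplified sketch. It does not run filter-branch at all; it only shows
a per-commit map on disk that the first pass could write and the second
pass could read. The helper names record_mapping and gitlink_line are
hypothetical and are not part of the proposed script.

```sh
#!/bin/sh
# Simplified sketch of the hand-off between the two passes. It does not
# run filter-branch; the helper names are hypothetical.

# record_mapping MAPDIR OLD_COMMIT DIR NEW_SUB_COMMIT
# The first pass would call this once per commit that contains DIR.
record_mapping () {
	mkdir -p "$1" &&
	printf '%s' "$3" >"$1/$2-dir" &&
	printf '%s\n' "$4" >"$1/$2-submodule-commit"
}

# gitlink_line MAPDIR OLD_COMMIT
# The second pass would call this to build a line in the
# "mode SP sha1 TAB path" form accepted by 'git update-index --index-info'.
# Fails without output if the directory was absent from OLD_COMMIT.
gitlink_line () {
	test -f "$1/$2-dir" || return 1
	dir=$(cat "$1/$2-dir") &&
	subcommit=$(cat "$1/$2-submodule-commit") &&
	printf '160000 %s\t%s\n' "$subcommit" "$dir"
}
```

Mode 160000 is the mode git uses for a submodule link. Keeping the map on
disk means neither pass needs to know about the other repository directly.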

Q. Why only process one submodule at a time?

A. If there were multiple submodules, each with several different
historical locations, the data structures in sh would get too tricky
for me to implement well. But I'm happy to take patches and UI
suggestions.
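
To illustrate the kind of workaround plain sh forces here, the sketch below
emulates an array of directory names (which may contain spaces) by writing
each name to a numbered file. The function names store_dirs and nth_dir are
made up for this example; it is one possible approach, not a description of
the final script.

```sh
#!/bin/sh
# Sketch: emulate an array of directory names in plain sh by writing each
# name to a numbered file. The function names are hypothetical.

# store_dirs LISTDIR NAME...
# Write each NAME to LISTDIR/0, LISTDIR/1, ... and the total to LISTDIR/count.
store_dirs () {
	list_dir=$1
	shift
	mkdir -p "$list_dir" || return
	count=0
	for name in "$@"
	do
		printf '%s' "$name" >"$list_dir/$count" || return
		count=$(($count + 1))
	done
	printf '%s' "$count" >"$list_dir/count"
}

# nth_dir LISTDIR N
# Print the Nth stored name (counting from zero).
nth_dir () {
	cat "$1/$2"
}
```

A call such as 'store_dirs "$tmp/dirs" streaming/media "Media Files"'
followed by 'nth_dir "$tmp/dirs" 1' prints the second name with its space
intact. Tracking several submodules at once would need one such directory
per submodule, which is where the bookkeeping starts to get awkward.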

Q. Why operate on the current directory, and why output the new
submodule in place?

A. An earlier version of 'submodule split' took the arguments
'src-repo dst-repo sub-repo sub-repo-dir...'. This required the user
to do more typing, and it didn't feel very "git like". Johannes
Schindelin suggested the current interface. The new interface feels
more natural to me, and it's certainly easier to use in the common
cases.

Q. What's the status of the code?

A. I'll have a very basic implementation of this interface shortly--I
can already handle simple splits, but I want to add more test cases
and add support for directories which move around the tree.

Thank you very much for your feedback! I appreciate the time that the
reviewers spent helping to improve my filter-branch patch, and I'd
like to make this patch as good as possible.

Cheers,
Eric


* [RFC/PATCHv2] git submodule split
  2009-02-12 21:50 [RFC] What's the best UI for 'git submodule split'? Eric Kidd
@ 2009-02-14  2:24 ` Eric Kidd
  2009-02-14  4:37   ` Junio C Hamano
  0 siblings, 1 reply; 14+ messages in thread
From: Eric Kidd @ 2009-02-14  2:24 UTC (permalink / raw)
  To: git; +Cc: Eric Kidd, Junio C Hamano, Johannes Schindelin

Proposed usage:
    git submodule split [--url submodule_repo_url] submodule_dir \
        [alternate_dir...]

Replace submodule_dir with a newly-created submodule, keeping all the
history of submodule_dir.  This command also rewrites each commit in the
current repository's history to include the correct revision of
submodule_dir and the appropriate .gitmodules entries.

If the submodule has moved around the source tree, specify one or more
values for alternate_dir.  To specify the URL of the newly created
repository (for use in .gitmodules), use the --url parameter.

Johannes Schindelin provided extensive help with the UI and
implementation of this command (but has not yet reviewed the code).

Cc: Junio C Hamano <gitster@pobox.com>
Cc: Johannes Schindelin <johannes.schindelin@gmx.de>

---
Open questions:

  1) Right now, this command is actually git-submodule-split.sh.  Should
     I include this code directly into git-submodule.sh, or move it
     to git-submodule--split.sh and hook it into git-submodule.sh?

  2) Should I implement a --force flag based on filter-branch?  Johannes
     Schindelin has suggested that it might be better to remove the
     --force flag from filter-branch and just rely on the reflog to keep
     backups.

  3) It would be useful to have a version of this command which didn't
     rewrite the super-project's history.  In fact, the non-rewriting
     version should probably be the default, and the current behavior
     should probably be selected using --rewrite.  Thoughts?

  4) We're obviously going to need to support revision arguments other
     than --all (which is what we currently pass to filter-branch).  Should
     we default to the current branch only, or to --all?

Design Q&A (this appeared in an earlier e-mail)

  Q. Why not merge 'submodule split' into the existing 'filter-branch'
  loop?

  A. Internally, 'submodule split' needs to make two separate passes
  with 'filter-branch': One to create the new submodule, and one to
  update the parent. If I were to merge 'submodule split' into the
  existing filter-branch loop, filter-branch would need to keep track of
  two repositories. Writing 'submodule split' as a wrapper around
  filter-branch helps keep filter-branch simple.

  Q. Why only process one submodule at a time?

  A. If there were multiple submodules, each with several different
  historical locations, the data structures in sh would get too tricky
  for me to implement well. But I'm happy to take patches and UI
  suggestions.

  Q. Why operate on the current directory, and why output the new
  submodule in place?

  A. An earlier version of 'submodule split' took the arguments
  'src-repo dst-repo sub-repo sub-repo-dir...'. This required the user
  to do more typing, and it didn't feel very "git like". Johannes
  Schindelin suggested the current interface. The new interface feels
  more natural to me, and it's certainly easier to use in the common
  cases.

 .gitignore                 |    1 +
 Makefile                   |    1 +
 git-submodule-split.sh     |  190 ++++++++++++++++++++++++++++++++++++++++++++
 t/t7404-submodule-split.sh |  135 +++++++++++++++++++++++++++++++
 4 files changed, 327 insertions(+), 0 deletions(-)
 create mode 100644 git-submodule-split.sh
 create mode 100755 t/t7404-submodule-split.sh

diff --git a/.gitignore b/.gitignore
index 1c57d4c..603ad7a 100644
--- a/.gitignore
+++ b/.gitignore
@@ -118,6 +118,7 @@ git-show
 git-show-branch
 git-show-index
 git-show-ref
+git-submodule-split
 git-stage
 git-stash
 git-status
diff --git a/Makefile b/Makefile
index b040a96..96274f1 100644
--- a/Makefile
+++ b/Makefile
@@ -276,6 +276,7 @@ SCRIPT_SH += git-sh-setup.sh
 SCRIPT_SH += git-stash.sh
 SCRIPT_SH += git-submodule.sh
 SCRIPT_SH += git-web--browse.sh
+SCRIPT_SH += git-submodule-split.sh
 
 SCRIPT_PERL += git-add--interactive.perl
 SCRIPT_PERL += git-archimport.perl
diff --git a/git-submodule-split.sh b/git-submodule-split.sh
new file mode 100644
index 0000000..4582d2a
--- /dev/null
+++ b/git-submodule-split.sh
@@ -0,0 +1,190 @@
+#!/bin/sh
+# 
+# Split a repository into a submodule and main module, with history
+#
+# Copyright 2009 Trustees of Dartmouth College
+# License: GNU General Public License, version 2 or later
+
+USAGE="[--url submodule_repo_url] submodule_dir [alternate_dir...]"
+
+OPTIONS_SPEC=
+. git-sh-setup
+require_work_tree
+
+# Set up our temporary directory.  We export these variables because we
+# want to use them from scripts passed to 'git filter-branch'.  We can't
+# simply substitute these variable values into the text of the scripts we
+# pass to 'git filter-branch', because the filenames may contain spaces,
+# which would get mangled.  Thanks to Johannes Schindelin for this idea.
+export GIT_SPLIT_TEMP_DIR="$GIT_DIR/.git_split"
+export GIT_SPLIT_MAP_DIR="$GIT_SPLIT_TEMP_DIR/map"
+rm -rf "$GIT_SPLIT_TEMP_DIR" &&
+mkdir -p "$GIT_SPLIT_MAP_DIR" || exit
+
+# Parse our command-line arguments.
+mkdir "$GIT_SPLIT_TEMP_DIR/dirs" || exit
+dir_count=0
+while test $# -ne 0; do
+	case "$1" in
+	--)
+		shift
+		break
+		;;
+	--url)
+		shift
+		test $# -ne 0 || die "Must supply argument to --url"
+		sub_url="$1"
+		shift
+		;;
+	-*)
+		die "Unknown option: $1"
+		;;
+	*)
+		# Use the first specified directory as the subrepository
+		# name.
+		if test "$dir_count" -eq 0; then
+			sub_path="$1"
+		fi
+
+		# There's no good way to pass an array of filenames
+		# containing spaces to our subprocesses, so let's cheat
+		# shamelessly and create an "array" of files on disk.
+		printf '%s' "$1" > "$GIT_SPLIT_TEMP_DIR/dirs/$dir_count"
+		dir_count=$(($dir_count + 1))
+		shift
+		;;
+	esac
+done
+
+# We should have at least one directory listed on the command line.
+test "$dir_count" -ge 1 || usage
+
+# Default the repository URL to something based on the repository path.
+if test -z "$sub_url"; then
+	sub_url="../$sub_path"
+fi
+
+# TODO: Pass remaining arguments to rev-parse, defaulting to --all.
+revs="--all"
+
+# More variables for our subprocesses.
+export GIT_SPLIT_DIR_COUNT="$dir_count"
+export GIT_SPLIT_SUB_PATH="$sub_path"
+export GIT_SPLIT_SUB_URL="$sub_url"
+
+# Make sure our environment is sane.
+test "$(is_bare_repository)" = false ||
+	die "Cannot run submodule split in a bare repository"
+git diff-files --ignore-submodules --quiet &&
+	git diff-index --cached --quiet HEAD -- ||
+	die "Cannot split out a submodule with a dirty working directory."
+(cd "$sub_path" &&
+	test "$GIT_DIR" = "$(git rev-parse --git-dir)" ||
+	die "$sub_path is already in a submodule") || exit
+
+
+
+#--------------------------------------------------------------------------
+# Create the new submodule
+
+# Create a new repository at the last known address of our submodule.  We
+# initially share our objects with our parent repository.
+src_repo="$(pwd)"
+sub_repo_temp="$GIT_SPLIT_TEMP_DIR/s" && mkdir "$sub_repo_temp" &&
+(cd "$sub_repo_temp" &&
+	git init &&
+	git remote add origin --mirror "$src_repo" &&
+	echo "$src_repo/.git/objects" > .git/objects/info/alternates &&
+	git fetch --update-head-ok &&
+	git remote rm origin &&
+	git read-tree -u -m HEAD) || exit
+
+index_filter=$(cat << \EOF
+map_info="$GIT_SPLIT_MAP_DIR/$GIT_COMMIT"
+
+# Check for the submodule in all possible locations.
+i=0
+while test "$i" -lt "$GIT_SPLIT_DIR_COUNT"; do
+	candidate="$(cat "$GIT_SPLIT_TEMP_DIR/dirs/$i")" || exit
+	if git rev-parse -q --verify "$GIT_COMMIT:$candidate"; then
+		# Borrowed from git filter-branch.
+		err="$(git read-tree -i -m "$GIT_COMMIT:$candidate" 2>&1)" ||
+			die "$err"
+		printf '%s' "$candidate" > "$map_info-dir"
+		break
+	fi
+	i=$(($i + 1))
+done
+EOF
+)
+
+commit_filter=$(cat << \EOF
+map_info="$GIT_SPLIT_MAP_DIR/$GIT_COMMIT"
+if test -f "$map_info-dir"; then
+	new_commit="$(git_commit_non_empty_tree "$@")" || exit
+	echo "$new_commit"
+	echo "$new_commit" > "$map_info-submodule-commit" ||
+		die "Can't record the commit ID of the new commit"
+else
+	skip_commit "$@"
+fi
+EOF
+)
+
+# Run our filters, repack the results as a standalone repository with no
+# extra history, and check out HEAD.
+(cd "$sub_repo_temp" &&
+	git filter-branch --index-filter "$index_filter" \
+		--commit-filter "$commit_filter" -- "$revs" &&
+	rm -rf .git/refs/original &&
+	git reflog expire --expire="now" --all &&
+	git repack -a -d &&
+	rm .git/objects/info/alternates) || exit
+
+#--------------------------------------------------------------------------
+# Create the new superproject
+
+index_filter=$(cat << \EOF
+map_info="$GIT_SPLIT_MAP_DIR/$GIT_COMMIT"
+
+# Only update the index if the submodule is present in this revision.
+if test -f "$map_info-dir"; then
+	dir="$(cat "$map_info-dir")" || exit
+
+	# Splice the repo into the tree.
+	test -f "$map_info-submodule-commit" ||
+		die "Can't find map for $GIT_COMMIT"
+	git rm -q --cached -r "$dir" || exit
+	subcommit="$(cat "$map_info-submodule-commit")" || exit
+	echo "160000 $subcommit	$dir" |
+		git update-index --index-info || exit
+
+	# Either update the old .gitmodules file, or make a new one.
+	gitmodules="$GIT_SPLIT_TEMP_DIR/gitmodules"
+	if git rev-parse -q --verify "$GIT_COMMIT:.gitmodules"; then
+		git cat-file blob "$GIT_COMMIT:.gitmodules" > "$gitmodules" ||
+			exit
+	fi
+	subsection=submodule."$GIT_SPLIT_SUB_PATH"
+	git config -f "$gitmodules" "$subsection".path "$dir"
+	git config -f "$gitmodules" "$subsection".url "$GIT_SPLIT_SUB_URL"
+
+	# Write the new .gitmodules file into the tree.
+	new_obj="$(git hash-object -t blob -w "$gitmodules")" ||
+		die "Error adding new .gitmodules file to tree"
+	git update-index --add --cacheinfo 100644 "$new_obj" .gitmodules || exit
+fi
+EOF
+)
+
+# Run our filter.
+git filter-branch --index-filter "$index_filter" -- "$revs" || exit
+
+# Move our submodule into place.  This has to wait until last, because
+# we want to keep the tree clean until after the final git filter, and we
+# need to have a place to put the new submodule.
+rmdir "$sub_path"
+test -d "$sub_path" && die "submodule $sub_path was not actually deleted"
+mv "$sub_repo_temp" "$sub_path" || exit
+
+exit 0
diff --git a/t/t7404-submodule-split.sh b/t/t7404-submodule-split.sh
new file mode 100755
index 0000000..ecc167e
--- /dev/null
+++ b/t/t7404-submodule-split.sh
@@ -0,0 +1,135 @@
+#!/bin/sh
+#
+# Copyright 2009 Trustees of Dartmouth College
+
+test_description='git submodule split tests'
+. ./test-lib.sh
+
+# We use two main repositories: An "original" repository, which remains
+# unmodified, and a "working" repository, which we transform repeatedly.
+rm -rf .git
+test_create_repo original
+
+test_expect_success 'create original repository' '
+	(cd original &&
+		echo "In main project" > main-file &&
+		mkdir sub1 &&
+		echo "In sub1" > sub1/sub1-file &&
+		git add . &&
+		git commit -m "Original project and sub1" &&
+		git tag c1 &&
+		mkdir -p nested/sub2 &&
+		echo "In sub2" > nested/sub2/sub2-file &&
+		git add . &&
+		git commit -m "Add sub2" &&
+		git tag c2 &&
+		git rm -r nested &&
+		git commit -m "Removing nested temporarily" &&
+		git tag c3 &&
+		git checkout c2 -- nested &&
+		git add . &&
+		git commit -m "Putting nested back" &&
+		git tag c4 &&
+		git mv nested/sub2 other-sub2 &&
+		echo "Changed file" >> other-sub2/sub2-file &&
+		git add . &&
+		git commit -m "Moving sub2 and changing a file" &&
+		git tag c5 &&
+		git mv other-sub2 nested/sub2 &&
+		git commit -m "Moving sub2 back" &&
+		git tag c6
+	)
+'
+
+test_expect_success 'make a working repository' '
+	abs_src_path="$(pwd)/original" && mkdir working &&
+	(cd working &&
+		git init &&
+		git remote add origin --mirror "$abs_src_path" &&
+		git fetch --update-head-ok &&
+		git remote rm origin &&
+		git read-tree -u -m HEAD)
+'
+
+test_expect_success 'split out sub1' '
+	(cd working &&
+		git submodule-split --url ../sub1-repo sub1 &&
+		test -f main-file &&
+		test -d sub1/.git &&
+		test_must_fail git rev-parse -q --verify HEAD:sub1/sub1-file &&
+		(cd sub1 && git rev-parse -q --verify HEAD:sub1-file)
+	)
+'
+
+test_expect_success 'split out sub2' '
+	(cd working &&
+		rm -rf .git/refs/original &&
+		git submodule-split nested/sub2 other-sub2 &&
+		test -d nested/sub2/.git &&
+		test_must_fail git rev-parse -q --verify \
+			HEAD:nested/sub2/sub2-file &&
+		test_must_fail git rev-parse -q --verify \
+			c5:other-sub2/sub2-file &&
+		(cd nested/sub2 &&
+			git rev-parse -q --verify HEAD:sub2-file &&
+			git rev-parse -q --verify c5:sub2-file)
+	)
+'
+
+submodule_path() {
+	git config -f .gitmodules submodule."$1".path
+}
+
+submodule_url() {
+	git config -f .gitmodules submodule."$1".url
+}
+
+test_expect_success 'make sure .gitmodules knows about both submodules' '
+	(cd working &&
+		test "$(submodule_path sub1)" = sub1 &&
+		test "$(submodule_url  sub1)" = ../sub1-repo &&
+		test "$(submodule_path nested/sub2)" = nested/sub2 &&
+		test "$(submodule_url  nested/sub2)" = ../nested/sub2
+	)
+'
+
+test_expect_success 'compare each commit in split repository with original' '
+	rm -rf working/.git/refs/original &&
+	module_base="$(pwd)/original" &&
+	(cd working && git config remote.origin.url "$module_base") &&
+	mv working/sub1 sub1-repo &&
+	mkdir nested && mv working/nested/sub2 nested &&
+	original_revs="$(cd original && git rev-parse --all)" &&
+	working_revs="$(cd working && git rev-parse --all)" &&
+	while test -n "$original_revs"; do
+		original_commit="$(echo "$original_revs" | head -n 1)" &&
+		working_commit="$(echo "$working_revs" | head -n 1)" &&
+		original_revs="$(echo "$original_revs" | tail -n +2)" &&
+		working_revs="$(echo "$working_revs" | tail -n +2)" &&
+		(cd original && git checkout -f "$original_commit") &&
+		(cd working && git checkout -f "$working_commit" &&
+			git clean -fd &&
+			git submodule update --init) &&
+		diff -Nr -x .git -x .gitmodules original working ||
+			exit
+	done
+'
+
+test_expect_success 'verify that empty commits are skipped' '
+	(cd working/sub1 &&
+		test "$(git rev-parse c1)" = "$(git rev-parse c2)"
+	)
+'
+
+# Note that we should probably also drop the c3 tag here, because sub2
+# temporarily disappeared from the tree during that commit, but doing so
+# will require more work.  For now, we map c3 back to the last known state
+# of the directory when it was actually in-tree.
+test_expect_success 'verify that directories missing from rev are skipped' '
+	(cd working/nested/sub2 &&
+		test_must_fail git rev-parse -q --verify c1 &&
+		test "$(git rev-parse c2)" = "$(git rev-parse c4)"
+	)
+'
+
+test_done
-- 
1.6.2.rc0.59.g5c88.dirty


* Re: [RFC/PATCHv2] git submodule split
  2009-02-14  2:24 ` [RFC/PATCHv2] git submodule split Eric Kidd
@ 2009-02-14  4:37   ` Junio C Hamano
  2009-02-14  5:17     ` Eric Kidd
  2009-02-14 11:46     ` Johannes Schindelin
  0 siblings, 2 replies; 14+ messages in thread
From: Junio C Hamano @ 2009-02-14  4:37 UTC (permalink / raw)
  To: Eric Kidd; +Cc: git, Johannes Schindelin

Eric Kidd <git@randomhacks.net> writes:

> Proposed usage:
>     git submodule split [--url submodule_repo_url] submodule_dir \
>         [alternate_dir...]
>
> Replace submodule_dir with a newly-created submodule, keeping all the
> history of submodule_dir.  This command also rewrites each commit in the
> current repository's history to include the correct revision of
> submodule_dir and the appropriate .gitmodules entries.
>
> If the submodule has moved around the source tree, specify one or more
> values for alternate_dir.  To specify the URL of the newly created
> repository (for use in .gitmodules), use the --url parameter.

Unfortunately, I do not think we have designed fully (nor implemented at
all) behaviour to check out different points of history that has the same
submodule moved around in the superproject tree.

There were several unconcluded discussions done in the past (and I admit I
participated in a few of them), but it may be hard to use the resulting
repository out of this tool.

I am not saying the split-submodule-history tool is bad in any way, of
course.  I'm just saying that the "git submodule" side needs to be updated
to support such a history better; otherwise the tool's output won't be
usable effectively.  You may want to Cc "submodule" people in the
discussion.

> Johannes Schindelin provided extensive help with the UI and
> implementation of this command (but has not yet reviewed the code).
>
> Cc: Junio C Hamano <gitster@pobox.com>
> Cc: Johannes Schindelin <johannes.schindelin@gmx.de>

Please drop these two lines.  They belong to e-mail headers.

> Open questions:
>
>   1) Right now, this command is actually git-submodule-split.sh.  Should
>      I include this code directly into git-submodule.sh, or move it
>      to git-submodule--split.sh and hook it into git-submodule.sh?

How about in contrib/ somewhere?

>   2) Should I implement a --force flag based on filter-branch?  Johannes
>      Schindelin has suggested that it might be better to remove the
>      --force flag from filter-branch and just rely on the reflog to keep
>      backups.

Sounds sensible to me, but I do not have strong feeling about this either way.

>   4) We're obviously going to need to support revision arguments other
>      than --all (which is what we currently pass to filter-branch).  Should
>      we default to the current branch only, or to --all?

Matching what filter-branch defaults to would be the most natural,
wouldn't it?

I didn't look at the patch, though.


* Re: [RFC/PATCHv2] git submodule split
  2009-02-14  4:37   ` Junio C Hamano
@ 2009-02-14  5:17     ` Eric Kidd
  2009-02-14  9:03       ` Lars Hjemli
  2009-02-14 11:46     ` Johannes Schindelin
  1 sibling, 1 reply; 14+ messages in thread
From: Eric Kidd @ 2009-02-14  5:17 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: git, Johannes Schindelin, Mark Levedahl, Ping Yin, Lars Hjemli

On Fri, Feb 13, 2009 at 11:37 PM, Junio C Hamano <gitster@pobox.com> wrote:
> Eric Kidd <git@randomhacks.net> writes:
>> ...
>> If the submodule has moved around the source tree, specify one or more
>> values for alternate_dir.  To specify the URL of the newly created
>> repository (for use in .gitmodules), use the --url parameter.
>
> Unfortunately, I do not think we have designed fully (nor implemented at
> all) behaviour to check out different points of history that has the same
> submodule moved around in the superproject tree.
>
> There were several unconcluded discussions done in the past (and I admit I
> participated in a few of them), but it may be hard to use the resulting
> repository out of this tool.

Thank you for looking at this proposal!

I think that the resulting repository is usable (though it could
certainly be better). In particular, the following commands will
always give you a working checkout:

  git checkout any-version
  git submodule update --init

The unit tests for git-submodule-split.sh actually walk through the
entire history and run 'git submodule update --init' at each revision.
This works correctly because git-submodule-split creates the necessary
.gitmodules entries for each revision, and includes the
submodule.*.url value that you specify.

Unfortunately, this means that whenever the submodule moves to a new
location in the tree, 'git submodule update --init' will actually have
to clone it again. That's not a perfect situation, but it will work for
reasonably small submodules.

>>   1) Right now, this command is actually git-submodule-split.sh.  Should
>>      I include this code directly into git-submodule.sh, or move it
>>      to git-submodule--split.sh and hook it into git-submodule.sh?
>
> How about in contrib/ somewhere?

Sounds good to me! I'd like to include the unit tests and some
documentation, if that's OK.

I'll let you know when the patch has been reviewed, and submit it for
inclusion in contrib.

>>   2) Should I implement a --force flag based on filter-branch?  Johannes
>>      Schindelin has suggested that it might be better to remove the
>>      --force flag from filter-branch and just rely on the reflog to keep
>>      backups.
>
> Sounds sensible to me, but I do not have strong feeling about this either way.

I realized that there's a problem with removing --force from
filter-branch: The reflog doesn't contain backups of the rewritten
tags. So I'm afraid --force and the refs/original/ directory will need
to remain for now. (Any thoughts, Johannes?)

>>   4) We're obviously going to need to support revision arguments other
>>      than --all (which is what we currently pass to filter-branch).  Should
>>      we default to the current branch only, or to --all?
>
> Matching what filter-branch defaults to would be the most natural,
> wouldn't it?

I think so, although many (or most?) users will probably want to use '-- --all'.

Thank you very much for your suggestions!

Cheers,
Eric


* Re: [RFC/PATCHv2] git submodule split
  2009-02-14  5:17     ` Eric Kidd
@ 2009-02-14  9:03       ` Lars Hjemli
  2009-02-14 11:44         ` Johannes Schindelin
                           ` (2 more replies)
  0 siblings, 3 replies; 14+ messages in thread
From: Lars Hjemli @ 2009-02-14  9:03 UTC (permalink / raw)
  To: Eric Kidd
  Cc: Junio C Hamano, git, Johannes Schindelin, Mark Levedahl, Ping Yin

On Sat, Feb 14, 2009 at 06:17, Eric Kidd <git@randomhacks.net> wrote:
> On Fri, Feb 13, 2009 at 11:37 PM, Junio C Hamano <gitster@pobox.com> wrote:
>> Eric Kidd <git@randomhacks.net> writes:
>>> ...
>>> If the submodule has moved around the source tree, specify one or more
>>> values for alternate_dir.  To specify the URL of the newly created
>>> repository (for use in .gitmodules), use the --url parameter.
>>
>> Unfortunately, I do not think we have designed fully (nor implemented at
>> all) behaviour to check out different points of history that has the same
>> submodule moved around in the superproject tree.
>>
>> There were several unconcluded discussions done in the past (and I admit I
>> participated in a few of them), but it may be hard to use the resulting
>> repository out of this tool.
>
> Thank you for looking at this proposal!
>
> I think that the resulting repository is usable (though it could
> certainly be better). In particular, the following commands will
> always give you a working checkout:
>
>  git checkout any-version
>  git submodule update --init
>
> The unit tests for git-submodule-split.sh actually walk through the
> entire history and run 'git submodule update --init' at each revision.
> This works correctly because git-submodule-split creates the necessary
> .gitmodules entries for each revision, and includes the
> submodule.*.url value that you specify.
>
> Unfortunately, this means that whenever the submodule moves to a new
> location in the tree, 'git submodule update --init' will actually have
> to clone it again. That's not a perfect situation, but it will work for
> reasonably small submodules.

<hand-waving>
I didn't look at the patch, but if the submodule uses a single
module-name while moving around, the re-cloning problem would be
solved if the submodule git-dir was stored inside the git-dir of the
containing repository  (by using the git-file mechanism). Maybe I
should try to finally implement this...
</hand-waving>
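
A rough sketch of what the git-file mechanism looks like on disk may help
here: the submodule's working directory holds a one-line .git file of the
form "gitdir: <path>" instead of a .git directory. The helper name
write_gitfile and the .git/modules/ location used in the note below are
assumptions made for illustration only; nothing in this thread settles
where the real git-dir would live.

```sh
#!/bin/sh
# Sketch only. write_gitfile is a hypothetical helper, and any layout for
# the target git-dir is an assumption made for this example.

# write_gitfile WORKTREE GITDIR
# Make WORKTREE/.git a one-line file that points at GITDIR.
write_gitfile () {
	mkdir -p "$1" &&
	printf 'gitdir: %s\n' "$2" >"$1/.git"
}
```

For example, 'write_gitfile "$super/nested/sub2" "$super/.git/modules/sub2"'
(with $super an absolute path) would leave the object store inside the
superproject, so moving the working directory would only require writing a
new .git file at the new location.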

--
larsh


* Re: [RFC/PATCHv2] git submodule split
  2009-02-14  9:03       ` Lars Hjemli
@ 2009-02-14 11:44         ` Johannes Schindelin
  2009-02-17 10:17         ` Nanako Shiraishi
  2009-02-19 23:04         ` Kyle Moffett
  2 siblings, 0 replies; 14+ messages in thread
From: Johannes Schindelin @ 2009-02-14 11:44 UTC (permalink / raw)
  To: Lars Hjemli; +Cc: Eric Kidd, Junio C Hamano, git, Mark Levedahl, Ping Yin

Hi,

On Sat, 14 Feb 2009, Lars Hjemli wrote:

> On Sat, Feb 14, 2009 at 06:17, Eric Kidd <git@randomhacks.net> wrote:
> > On Fri, Feb 13, 2009 at 11:37 PM, Junio C Hamano <gitster@pobox.com> wrote:
> >> Eric Kidd <git@randomhacks.net> writes:
> >>> ...
> >>> If the submodule has moved around the source tree, specify one or more
> >>> values for alternate_dir.  To specify the URL of the newly created
> >>> repository (for use in .gitmodules), use the --url parameter.
> >>
> >> Unfortunately, I do not think we have designed fully (nor implemented at
> >> all) behaviour to check out different points of history that has the same
> >> submodule moved around in the superproject tree.
> >>
> >> There were several unconcluded discussions done in the past (and I admit I
> >> participated in a few of them), but it may be hard to use the resulting
> >> repository out of this tool.
> >
> > Thank you for looking at this proposal!
> >
> > I think that the resulting repository is usable (though it could
> > certainly be better). In particular, the following commands will
> > always give you a working checkout:
> >
> >  git checkout any-version
> >  git submodule update --init
> >
> > The unit tests for git-submodule-split.sh actually walk through the
> > entire history and run 'git submodule update --init' at each revision.
> > This works correctly because git-submodule-split creates the necessary
> > .gitmodules entries for each revision, and includes the
> > submodule.*.url value that you specify.
> >
> > Unfortunately, this means that whenever the submodule moves to a new
> > location in the tree, 'git submodule update --init' will actually have
> > to clone it again. That's not a perfect situation, but it will work for
> > reasonably small submodules.
> 
> <hand-waving>
> I didn't look at the patch, but if the submodule uses a single
> module-name while moving around, the re-cloning problem would be
> solved if the submodule git-dir was stored inside the git-dir of the
> containing repository  (by using the git-file mechanism). Maybe I
> should try to finally implement this...
> </hand-waving>

How should that help with the _working directory_ of the submodule?  After
all, _that_ is the part we are having problems with, as the untracked files
in that directory _are_ part of the submodule.

The real kicker is that when we want to "git submodule checkout" the 
(moved) submodule, we no longer know where it was found last time (and 
where it still is).  We need a sane semantic for that (and I think it 
involves the addition of submodule.<name>.path to the superproject's 
config, something we do not do yet).

Ciao,
Dscho


* Re: [RFC/PATCHv2] git submodule split
  2009-02-14  4:37   ` Junio C Hamano
  2009-02-14  5:17     ` Eric Kidd
@ 2009-02-14 11:46     ` Johannes Schindelin
  2009-02-14 14:11       ` Eric Kidd
  1 sibling, 1 reply; 14+ messages in thread
From: Johannes Schindelin @ 2009-02-14 11:46 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Eric Kidd, git

Hi,

On Fri, 13 Feb 2009, Junio C Hamano wrote:

> Eric Kidd <git@randomhacks.net> writes:
> 
> >   1) Right now, this command is actually git-submodule-split.sh.  
> >      Should I include this code directly into git-submodule.sh, or 
> >      move it to git-submodule--split.sh and hook it into 
> >      git-submodule.sh?
> 
> How about in contrib/ somewhere?

As I said to Eric already, I would like this to be part of git-submodule
proper, as I expect a lot of people to need it.

Ciao,
Dscho


* Re: [RFC/PATCHv2] git submodule split
  2009-02-14 11:46     ` Johannes Schindelin
@ 2009-02-14 14:11       ` Eric Kidd
  2009-02-14 23:01         ` Johannes Schindelin
  0 siblings, 1 reply; 14+ messages in thread
From: Eric Kidd @ 2009-02-14 14:11 UTC (permalink / raw)
  To: Johannes Schindelin
  Cc: Junio C Hamano, git, Mark Levedahl, Ping Yin, Lars Hjemli

On Sat, Feb 14, 2009 at 6:46 AM, Johannes Schindelin
<Johannes.Schindelin@gmx.de> wrote:
> As I said to Eric already, I would like this to be part of git-submodule
> proper, as I expect a lot of people to need it.

I'm happy to do whatever people want. :-) Even if this goes into
contrib/, I want to include everything a regular git command would
have: unit tests, a man page, portable sh code, etc. And I want to
make 'git-submodule-split' useful to as many people as possible.

Which brings me to a design question: Would 'git-submodule-split' be
useful to more people if it were actually two commands?

1) git submodule split: Create the new submodule, update the working
copy, but do not change any history in the super-project. This would
be useful for existing projects that don't want to rewrite existing
commits, but which want to spin off a submodule.

2) git submodule split --rewrite-history: Update the history of the
project to use the rewritten submodule. This would be most useful when
migrating repositories to git.

I've already implemented (2), but frankly, it feels like a special
case of a larger command. It would be very easy to implement (1) and
make it the default.
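
For illustration only, option (1) can be approximated by hand with existing git commands. This is a sketch under stated assumptions, not the implementation in the patch: the throwaway repositories under a temporary directory, the file names, and the commit messages are all made up, and `protocol.file.allow` is set only because the demo adds a submodule from a local path (newer git releases refuse that by default). The existing superproject commits are left untouched; the split shows up as one new commit.

```shell
#!/bin/sh
# Sketch only: a manual approximation of option (1) using existing git
# commands. Repository layout and file names are made up for the demo.
set -e
export FILTER_BRANCH_SQUELCH_WARNING=1   # skip filter-branch's startup warning
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
work=$(mktemp -d)

# Throwaway superproject containing the directory to split off.
git init -q "$work/super"
cd "$work/super"
mkdir -p streaming/media
echo clip >streaming/media/clip.txt
echo top >README
git add .
git commit -q -m "initial import"
old_head=$(git rev-parse HEAD)

# Step 1: extract the directory's history into a separate repository.
git clone -q "$work/super" "$work/media"
git -C "$work/media" filter-branch --subdirectory-filter streaming/media HEAD

# Step 2: replace the directory with a submodule in a new commit.
# No existing superproject commit is rewritten.
git rm -q -r streaming/media
# protocol.file.allow is needed only because this demo uses a local path as URL.
git -c protocol.file.allow=always submodule --quiet add "$work/media" streaming/media
git commit -q -m "Split streaming/media into a submodule"
```

After this runs, `git rev-parse HEAD^` in the superproject still names the original commit, which is the property that separates option (1) from the history-rewriting mode.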

Cheers,
Eric


* Re: [RFC/PATCHv2] git submodule split
  2009-02-14 14:11       ` Eric Kidd
@ 2009-02-14 23:01         ` Johannes Schindelin
  2009-02-14 23:13           ` Sverre Rabbelier
  0 siblings, 1 reply; 14+ messages in thread
From: Johannes Schindelin @ 2009-02-14 23:01 UTC (permalink / raw)
  To: Eric Kidd; +Cc: Junio C Hamano, git, Mark Levedahl, Ping Yin, Lars Hjemli

Hi,

On Sat, 14 Feb 2009, Eric Kidd wrote:

> On Sat, Feb 14, 2009 at 6:46 AM, Johannes Schindelin
> <Johannes.Schindelin@gmx.de> wrote:
> > As I said to Eric already, I would like this to be part of git-submodule
> > proper, as I expect a lot of people needing it.
> 
> I'm happy to do whatever people want. :-) Even if this goes into
> contrib/, I want to include everything a regular git command would
> have: unit tests, a man page, portable sh code, etc.

That is the problem.  IIRC we do not have a testing framework in contrib/.  
Indeed, we would not catch breakages in contrib/ at all.

Ciao,
Dscho


* Re: [RFC/PATCHv2] git submodule split
  2009-02-14 23:01         ` Johannes Schindelin
@ 2009-02-14 23:13           ` Sverre Rabbelier
  0 siblings, 0 replies; 14+ messages in thread
From: Sverre Rabbelier @ 2009-02-14 23:13 UTC (permalink / raw)
  To: Johannes Schindelin
  Cc: Eric Kidd, Junio C Hamano, git, Mark Levedahl, Ping Yin, Lars Hjemli

On Sun, Feb 15, 2009 at 00:01, Johannes Schindelin
<Johannes.Schindelin@gmx.de> wrote:
> That is the problem.  IIRC we do not have a testing framework in contrib/.
> Indeed, we would not catch breakages in contrib/ at all.

I recall a patch about getting 'make test' to work in /contrib?

-- 
Cheers,

Sverre Rabbelier


* Re: [RFC/PATCHv2] git submodule split
  2009-02-14  9:03       ` Lars Hjemli
  2009-02-14 11:44         ` Johannes Schindelin
@ 2009-02-17 10:17         ` Nanako Shiraishi
  2009-02-19 19:23           ` Lars Hjemli
  2009-02-19 23:04         ` Kyle Moffett
  2 siblings, 1 reply; 14+ messages in thread
From: Nanako Shiraishi @ 2009-02-17 10:17 UTC (permalink / raw)
  To: Lars Hjemli
  Cc: Eric Kidd, Junio C Hamano, git, Johannes Schindelin,
	Mark Levedahl, Ping Yin

Quoting Lars Hjemli <hjemli@gmail.com>:

> I didn't look at the patch, but if the submodule uses a single
> module-name while moving around, the re-cloning problem would be
> solved if the submodule git-dir was stored inside the git-dir of the
> containing repository  (by using the git-file mechanism). Maybe I
> should try to finally implement this...

Is it similar to what was discussed earlier in the thread http://thread.gmane.org/gmane.comp.version-control.git/47466/focus=47621 (I asked gmane for "submodule relocate")?

I think it is a good idea to resume your relocatable submodule directory design. Junio said he will keep 'next' open during the stabilization period before the release. I think he means that there is no need for you to be waiting.

I noticed that Junio hasn't taken very big changes to git-submodule, and I think it is mainly because he doesn't use submodules heavily himself, as he has said in the past. It is understandable if he feels uneasy about taking large changes that can potentially affect existing users when he hasn't developed enough first-hand experience with that part of the system himself.

Junio doesn't get involved in git-gui discussion very much either, and I think the reason is the same (see http://gitster.livejournal.com/24080.html for example, where he says he doesn't usually use GUI tools), but git-gui has Shawn, who takes both initiative and responsibility in the area. git-submodule support would benefit from somebody like Shawn who takes an active role in improving it.

-- 
Nanako Shiraishi
http://ivory.ap.teacup.com/nanako3/


* Re: [RFC/PATCHv2] git submodule split
  2009-02-17 10:17         ` Nanako Shiraishi
@ 2009-02-19 19:23           ` Lars Hjemli
  0 siblings, 0 replies; 14+ messages in thread
From: Lars Hjemli @ 2009-02-19 19:23 UTC (permalink / raw)
  To: Nanako Shiraishi
  Cc: Eric Kidd, Junio C Hamano, git, Johannes Schindelin,
	Mark Levedahl, Ping Yin

[* sorry for the late reply *]

On Tue, Feb 17, 2009 at 11:17, Nanako Shiraishi <nanako3@lavabit.com> wrote:
> Quoting Lars Hjemli <hjemli@gmail.com>:
>
>> I didn't look at the patch, but if the submodule uses a single
>> module-name while moving around, the re-cloning problem would be
>> solved if the submodule git-dir was stored inside the git-dir of the
>> containing repository  (by using the git-file mechanism). Maybe I
>> should try to finally implement this...
>
> Is it similar to what was discussed earlier in the thread
> http://thread.gmane.org/gmane.comp.version-control.git/47466/focus=47621
> (I asked gmane for "submodule relocate")?

Well, kind of. We ended up with a scheme similar to what Junio
described in the linked mail: every submodule in a repository is given
a canonical name by the .gitmodules file and this name is then linked
with a url in .git/config. The .gitmodules file also states at which
path(s) this submodule is located. So we can keep track of which
submodules we're interested in (and what urls to use for fetching
objects to those submodules) irrespective of the path used to check
out the submodules.

But when we're switching between branches in which a submodule is
checked out at different paths, we currently lose track of both the
old worktree and its gitdir. My gitfile proposal would salvage the
gitdir but, as Dscho mentioned, uncommitted+untracked data in the
worktree would not be handled.

So I currently think it's better to make `git submodule init` update
.git/config with information about the current submodule path (again,
as Dscho mentioned). Then, after switching branches, `git submodule
<some-verb>` could notice that the current path is different from the
one in .git/config and simply `mv oldpath newpath` before updating
.git/config with the new path.

Btw. this wouldn't work smoothly if a single submodule was checked out
at multiple paths in the same revision, but I don't see why anyone
would want to do something like that...
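
A minimal sketch of the path-tracking idea above, under loud assumptions: the `submodule.<name>.path` key in .git/config is hypothetical (it is the proposal, not something git records today), and `relocate_submodule` is an illustrative helper, not an existing `git submodule` verb. It compares the path recorded at init time with the path in the checked-out .gitmodules and moves the worktree when they differ.

```shell
#!/bin/sh
# Sketch only: 'submodule.<name>.path' in .git/config is a hypothetical key,
# and relocate_submodule is an illustrative helper, not a real git command.
relocate_submodule () {
	name=$1
	# Path recorded in .git/config at init time (the hypothetical key).
	oldpath=$(git config --get "submodule.$name.path") || return 0
	# Path the currently checked-out revision wants, read from .gitmodules.
	newpath=$(git config -f .gitmodules --get "submodule.$name.path") || return 0
	test "$oldpath" = "$newpath" && return 0
	if test -d "$oldpath"
	then
		mkdir -p "$(dirname "$newpath")" || return 1
		# A checkout may have left an empty placeholder directory at the
		# new path; remove it so mv renames instead of nesting.
		rmdir "$newpath" 2>/dev/null || :
		mv "$oldpath" "$newpath" || return 1
	fi
	# Remember the new location for the next branch switch.
	git config "submodule.$name.path" "$newpath"
}
```

Run from the top of the superproject's worktree after switching branches, for example `relocate_submodule media`. As noted above, this cannot cope with one submodule checked out at several paths in the same revision, since only a single path is recorded per name.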

--
larsh


* Re: [RFC/PATCHv2] git submodule split
  2009-02-14  9:03       ` Lars Hjemli
  2009-02-14 11:44         ` Johannes Schindelin
  2009-02-17 10:17         ` Nanako Shiraishi
@ 2009-02-19 23:04         ` Kyle Moffett
  2 siblings, 0 replies; 14+ messages in thread
From: Kyle Moffett @ 2009-02-19 23:04 UTC (permalink / raw)
  To: Lars Hjemli
  Cc: Eric Kidd, Junio C Hamano, git, Johannes Schindelin,
	Mark Levedahl, Ping Yin

On Sat, Feb 14, 2009 at 4:03 AM, Lars Hjemli <hjemli@gmail.com> wrote:
> On Sat, Feb 14, 2009 at 06:17, Eric Kidd <git@randomhacks.net> wrote:
>> On Fri, Feb 13, 2009 at 11:37 PM, Junio C Hamano <gitster@pobox.com> wrote:
>>> Eric Kidd <git@randomhacks.net> writes:
>>>> ...
>>>> If the submodule has moved around the source tree, specify one or more
>>>> values for alternate_dir.  To specify the URL of the newly created
>>>> repository (for use in .gitmodules), use the --url parameter.
>>>
>>> Unfortunately, I do not think we have designed fully (nor implemented at
>>> all) behaviour to check out different points of history that has the same
>>> submodule moved around in the superproject tree.
>>>
>>> There were several unconcluded discussions done in the past (and I admit I
>>> participated in a few of them), but it may be hard to use the resulting
>>> repository out of this tool.
>>
>> Thank you for looking at this proposal!
>>
>> I think that the resulting repository is usable (though it could
>> certainly be better). In particular, the following commands will
>> always give you a working checkout:
>>
>>  git checkout any-version
>>  git submodule update --init
>>
>> The unit tests for git-submodule-split.sh actually walk through the
>> entire history and run 'git submodule update --init' at each revision.
>> This works correctly because git-submodule-split creates the necessary
>> .gitmodules entries for each revision, and includes the
>> submodule.*.url value that you specify.
>>
>> Unfortunately, this means that whenever the submodule moves to a new
>> location in the tree, 'git submodule update --init' will actually have to
>> clone it again. That's not a perfect situation, but it will work for
>> reasonably small submodules.
>
> <hand-waving>
> I didn't look at the patch, but if the submodule uses a single
> module-name while moving around, the re-cloning problem would be
> solved if the submodule git-dir was stored inside the git-dir of the
> containing repository  (by using the git-file mechanism). Maybe I
> should try to finally implement this...
> </hand-waving>

We use submodules at my workplace to keep track of a variety of
closely-related projects (branched from each other).  On account of
some deficiencies in the interface of the version of Git we're using
(including a few that are still present), we have a bunch of custom
scripts to clone and check out the whole mess, but it goes something
like this:

In super/.git/config in a checkout:

    [remote "origin"]
        fetch = +refs/heads/*:refs/remotes/origin/*
        push = +refs/heads/*:refs/heads/kmoffett-branches/*
        fetch = +refs/projects/heads/*:refs/projects/remotes/origin/*
        push = +refs/projects/heads/*:refs/projects/heads/kmoffett-branches/*

    [submodule "projects/FOO"]
        url = ./projects/FOO/.git


In super/.gitmodules:

    [submodule "projects/FOO"]
        path = projects/FOO
        url = ./projects/FOO/.git


In sub/.git/config (i.e. super/projects/FOO/.git/config):

    [remote "origin"]
        url = ../..
        push = +refs/heads/*:refs/projects/heads/FOO/*
        fetch = +refs/projects/heads/*:refs/remotes/parent/*
        fetch = +refs/projects/remotes/*:refs/remotes/*

    [remote "parent"]
        ${same as remote.origin}


In sub/.git/objects/info/alternates (i.e.
super/projects/FOO/.git/objects/info/alternates):

    ../../../../.git/objects


In this environment, basically *all* objects are kept in the
"superproject".  When doing a local commit into a subproject, the new
objects are first stored there (is there any way to change that?), but
on the first "git push" in the subproject they will be pushed up to
the parent's objects directory and the next GC of the child project
will clean them up.  All of the child branches are stored in
"refs/projects", so they don't show up by default in various "git
branch", etc, commands, but it's trivial to ensure they get pushed and
pulled appropriately.
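
The object sharing described above can be tried out with a throwaway pair of repositories. This is only a sketch: the `super` and `projects/FOO` names follow the example, while the temporary directory and the test blob are assumptions made for the demo. A relative entry in objects/info/alternates is resolved against the objects directory that contains it, which is why four `..` components are needed to get from the subproject's objects directory back to the top of the superproject.

```shell
#!/bin/sh
# Sketch only: throwaway repositories; the blob content is made up.
set -e
work=$(mktemp -d)
git init -q "$work/super"
git init -q "$work/super/projects/FOO"

# Store a blob in the superproject's object database only.
blob=$(echo "shared content" | git -C "$work/super" hash-object -w --stdin)

# Point the subproject at the superproject's objects. The relative path is
# interpreted from super/projects/FOO/.git/objects.
mkdir -p "$work/super/projects/FOO/.git/objects/info"
echo "../../../../.git/objects" \
	>"$work/super/projects/FOO/.git/objects/info/alternates"

# The subproject can now read the object without holding its own copy.
git -C "$work/super/projects/FOO" cat-file -p "$blob"
```

New commits made in the subproject are still written to its own objects directory first; the alternates file only adds a second place to read from.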

Essentially the "superproject" consists of our project-management
environment, with the subprojects being each individual project, which
may be entirely independent.  There is a relatively tight feature
coupling between the per-project scripts and the version of the
management environment, so this works out relatively nicely for our
uses.

A clone by default will only get the superproject, if you want
subprojects you have to add the appropriate branch refs to the
.git/config file (as seen in the above example).  This is handy if
you're only working on one of the particular projects.  Having them
all as separate branches cloned from each other does make it very easy
to diff/merge/cherry-pick between them, even though they are
effectively independent.
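
As a rough illustration of the opt-in step, the following sketch builds a stand-in for the shared repository, clones it, and then adds the extra fetch refspec from the example configuration. The repository names and the `FOO` branch are assumptions made for the demo; only the refspec itself comes from the configuration shown earlier.

```shell
#!/bin/sh
# Sketch only: repository names and the FOO branch are stand-ins.
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
work=$(mktemp -d)

# Shared repository with one subproject branch under refs/projects/heads/.
git init -q "$work/origin"
git -C "$work/origin" commit -q --allow-empty -m "management environment"
git -C "$work/origin" update-ref refs/projects/heads/FOO HEAD

# A default clone fetches refs/heads/* only, so no project refs arrive.
git clone -q "$work/origin" "$work/clone"
before=$(git -C "$work/clone" for-each-ref refs/projects | wc -l | tr -d ' ')

# Opt in to the subproject branches by adding the second fetch refspec.
git -C "$work/clone" config --add remote.origin.fetch \
	'+refs/projects/heads/*:refs/projects/remotes/origin/*'
git -C "$work/clone" fetch -q origin
after=$(git -C "$work/clone" for-each-ref refs/projects | wc -l | tr -d ' ')

echo "project refs before: $before, after: $after"
```

Because the refs live outside refs/heads/ and refs/remotes/, they stay out of plain `git branch` listings, matching the behaviour described above.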

Cheers,
Kyle Moffett


* Re: [RFC] What's the best UI for 'git submodule split'?
       [not found] <431341160902121348n71df3185p2ec998c297d449fc@mail.gmail.com>
@ 2009-02-12 21:59 ` Johannes Schindelin
  0 siblings, 0 replies; 14+ messages in thread
From: Johannes Schindelin @ 2009-02-12 21:59 UTC (permalink / raw)
  To: Eric Kidd; +Cc: git

Hi,

On Thu, 12 Feb 2009, Eric Kidd wrote:

>   rm .git/refs/original # or just use --force below

BTW I wanted to get rid of this for a long time now, but I cannot seem to 
find the time to work on it.  The 'original' refs should not be needed, as 
that's a job for the reflogs (yeah, people tried to convince me back then, 
but I finally got it, okay?)

Ciao,
Dscho


Thread overview: 14+ messages
2009-02-12 21:50 [RFC] What's the best UI for 'git submodule split'? Eric Kidd
2009-02-14  2:24 ` [RFC/PATCHv2] git submodule split Eric Kidd
2009-02-14  4:37   ` Junio C Hamano
2009-02-14  5:17     ` Eric Kidd
2009-02-14  9:03       ` Lars Hjemli
2009-02-14 11:44         ` Johannes Schindelin
2009-02-17 10:17         ` Nanako Shiraishi
2009-02-19 19:23           ` Lars Hjemli
2009-02-19 23:04         ` Kyle Moffett
2009-02-14 11:46     ` Johannes Schindelin
2009-02-14 14:11       ` Eric Kidd
2009-02-14 23:01         ` Johannes Schindelin
2009-02-14 23:13           ` Sverre Rabbelier
     [not found] <431341160902121348n71df3185p2ec998c297d449fc@mail.gmail.com>
2009-02-12 21:59 ` [RFC] What's the best UI for 'git submodule split'? Johannes Schindelin
