All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v4 0/4] cache parent project's gitdir in submodules
@ 2021-10-14 20:34 Emily Shaffer
  2021-10-14 20:34 ` [PATCH v4 1/4] t7400-submodule-basic: modernize inspect() helper Emily Shaffer
                   ` (5 more replies)
  0 siblings, 6 replies; 31+ messages in thread
From: Emily Shaffer @ 2021-10-14 20:34 UTC (permalink / raw)
  To: git
  Cc: Emily Shaffer, Albert Cui, Phillip Wood, Johannes Schindelin,
	Ævar Arnfjörð Bjarmason, Junio C Hamano,
	Matheus Tavares Bernardino, Jonathan Nieder, Jacob Keller,
	Atharva Raykar, Derrick Stolee, Jonathan Tan

For the original cover letter, see
https://lore.kernel.org/git/20210611225428.1208973-1-emilyshaffer%40google.com.

The CI run is here:
https://github.com/nasamuffin/git/runs/3899227896

It shows some broken Windows tests, but those are broken in the
fork-point too:
https://github.com/nasamuffin/git/actions/runs/1343120990

Since v3, a pretty major change: the semantics of
submodule.superprojectGitDir has changed, to point from the submodule's
gitdir to the superproject's gitdir (in v3 and earlier, we kept a path
from the submodule's *worktree* to the superproject's gitdir instead).
This cleans up some of the confusions about the behavior when a
submodule worktree moves around in the superproject's tree, or in a
future when we support submodules having multiple worktrees.

I also tried to simplify the tests to use 'test-tool path-utils
relative_path' everywhere - I think that makes them much more clear for
a test reader, but if you're reviewing and it isn't obvious what we're
testing for, please speak up.

I think this is pretty mature and there was a lot of general agreement
that the gitdir->gitdir association was the way to go, so please be
brutal and look for nits, leaks, etc. this round ;)

 - Emily

Emily Shaffer (4):
  t7400-submodule-basic: modernize inspect() helper
  introduce submodule.superprojectGitDir record
  submodule: record superproject gitdir during absorbgitdirs
  submodule: record superproject gitdir during 'update'

 Documentation/config/submodule.txt | 12 +++++++
 builtin/submodule--helper.c        |  4 +++
 git-submodule.sh                   | 14 ++++++++
 submodule.c                        | 10 ++++++
 t/t7400-submodule-basic.sh         | 54 ++++++++++++++++--------------
 t/t7406-submodule-update.sh        | 12 +++++++
 t/t7412-submodule-absorbgitdirs.sh | 23 +++++++++++--
 7 files changed, 102 insertions(+), 27 deletions(-)

Range-diff against v3:
1:  f1236dc9d7 ! 1:  2ff151aaa2 t7400-submodule-basic: modernize inspect() helper
    @@ t/t7400-submodule-basic.sh: test_expect_success 'setup - repository to add submo
     -}
     -
      inspect() {
    - 	dir=$1 &&
    +-	dir=$1 &&
     -	dotdot="${2:-..}" &&
    - 
    +-
     -	(
     -		cd "$dir" &&
     -		listbranches >"$dotdot/heads" &&
    @@ t/t7400-submodule-basic.sh: test_expect_success 'setup - repository to add submo
     -		git diff-files --exit-code &&
     -		git clean -n -d -x >"$dotdot/untracked"
     -	)
    -+	git -C "$dir" for-each-ref --format='%(refname)' 'refs/heads/*' >heads &&
    -+	{ git -C "$dir" symbolic-ref HEAD || :; } >head &&
    -+	git -C "$dir" rev-parse HEAD >head-sha1 &&
    -+	git -C "$dir" update-index --refresh &&
    -+	git -C "$dir" diff-files --exit-code &&
    -+	git -C "$dir" clean -n -d -x >untracked
    ++	sub_dir=$1 &&
    ++
    ++	git -C "$sub_dir" for-each-ref --format='%(refname)' 'refs/heads/*' >heads &&
    ++	{ git -C "$sub_dir" symbolic-ref HEAD || :; } >head &&
    ++	git -C "$sub_dir" rev-parse HEAD >head-sha1 &&
    ++	git -C "$sub_dir" update-index --refresh &&
    ++	git -C "$sub_dir" diff-files --exit-code &&
    ++	git -C "$sub_dir" clean -n -d -x >untracked
      }
      
      test_expect_success 'submodule add' '
2:  2caf9eb8ee ! 2:  ed5479ad5d introduce submodule.superprojectGitDir record
    @@ Commit message
     
         By using a relative path instead of an absolute path, we can move the
         superproject directory around on the filesystem without breaking the
    -    submodule's cache.
    +    submodule's cache. And by using the path from gitdir to gitdir, we can
    +    move the submodule within the superproject's tree structure without
    +    breaking the submodule's cache, too.
     
         Since this hint value is only introduced during new submodule creation
         via `git submodule add`, though, there is more work to do to allow the
    @@ Documentation/config/submodule.txt: submodule.alternateErrorStrategy::
      	clone proceeds as if no alternate was specified.
     +
     +submodule.superprojectGitDir::
    -+	The relative path from the submodule's worktree to its superproject's
    ++	The relative path from the submodule's gitdir to its superproject's
     +	gitdir. When Git is run in a repository, it usually makes no difference
     +	whether this repository is standalone or a submodule, but if this
     +	configuration variable is present, additional behavior may be possible,
    @@ Documentation/config/submodule.txt: submodule.alternateErrorStrategy::
     +	to be present in every submodule, so only optional value-added behavior
     +	should be linked to it. It is set automatically during
     +	submodule creation.
    -++
    -+	Because of this configuration variable, it is forbidden to use the
    -+	same submodule worktree shared by multiple superprojects.
     
      ## builtin/submodule--helper.c ##
    -@@ builtin/submodule--helper.c: static int module_clone(int argc, const char **argv, const char *prefix)
    +@@ builtin/submodule--helper.c: static int clone_submodule(struct module_clone_data *clone_data)
      		git_config_set_in_file(p, "submodule.alternateErrorStrategy",
    - 					   error_strategy);
    + 				       error_strategy);
      
     +	git_config_set_in_file(p, "submodule.superprojectGitdir",
     +			       relative_path(absolute_path(get_git_dir()),
    -+					     path, &sb));
    ++					     sm_gitdir, &sb));
     +
      	free(sm_alternate);
      	free(error_strategy);
      
     
      ## t/t7400-submodule-basic.sh ##
    -@@ t/t7400-submodule-basic.sh: test_expect_success 'setup - repository to add submodules to' '
    - submodurl=$(pwd -P)
    +@@ t/t7400-submodule-basic.sh: submodurl=$(pwd -P)
      
      inspect() {
    --	dir=$1 &&
    --
    --	git -C "$dir" for-each-ref --format='%(refname)' 'refs/heads/*' >heads &&
    --	{ git -C "$dir" symbolic-ref HEAD || :; } >head &&
    --	git -C "$dir" rev-parse HEAD >head-sha1 &&
    --	git -C "$dir" update-index --refresh &&
    --	git -C "$dir" diff-files --exit-code &&
    --	git -C "$dir" clean -n -d -x >untracked
    -+	sub_dir=$1 &&
    + 	sub_dir=$1 &&
     +	super_dir=$2 &&
    + 
    + 	git -C "$sub_dir" for-each-ref --format='%(refname)' 'refs/heads/*' >heads &&
    + 	{ git -C "$sub_dir" symbolic-ref HEAD || :; } >head &&
    + 	git -C "$sub_dir" rev-parse HEAD >head-sha1 &&
    + 	git -C "$sub_dir" update-index --refresh &&
    + 	git -C "$sub_dir" diff-files --exit-code &&
    ++
    ++	# Ensure that submodule.superprojectGitDir contains the path from the
    ++	# submodule's gitdir to the superproject's gitdir.
    ++
    ++	super_abs_gitdir=$(git -C "$super_dir" rev-parse --absolute-git-dir) &&
    ++	sub_abs_gitdir=$(git -C "$sub_dir" rev-parse --absolute-git-dir) &&
    ++
    ++	[ "$(git -C "$sub_dir" config --get submodule.superprojectGitDir)" = \
    ++	  "$(test-tool path-utils relative_path "$super_abs_gitdir" \
    ++						"$sub_abs_gitdir")" ] &&
     +
    -+	git -C "$sub_dir" for-each-ref --format='%(refname)' 'refs/heads/*' >heads &&
    -+	{ git -C "$sub_dir" symbolic-ref HEAD || :; } >head &&
    -+	git -C "$sub_dir" rev-parse HEAD >head-sha1 &&
    -+	git -C "$sub_dir" update-index --refresh &&
    -+	git -C "$sub_dir" diff-files --exit-code &&
    -+	cached_super_dir="$(git -C "$sub_dir" config --get submodule.superprojectGitDir)" &&
    -+	[ "$(git -C "$super_dir" rev-parse --absolute-git-dir)" \
    -+		-ef "$sub_dir/$cached_super_dir" ] &&
    -+	git -C "$sub_dir" clean -n -d -x >untracked
    + 	git -C "$sub_dir" clean -n -d -x >untracked
      }
      
    - test_expect_success 'submodule add' '
     @@ t/t7400-submodule-basic.sh: test_expect_success 'submodule add' '
      	) &&
      
3:  d278568a8e ! 3:  60e6cf77c5 submodule: record superproject gitdir during absorbgitdirs
    @@ Commit message
      ## submodule.c ##
     @@ submodule.c: static void relocate_single_git_dir_into_superproject(const char *path)
      	char *old_git_dir = NULL, *real_old_git_dir = NULL, *real_new_git_dir = NULL;
    - 	char *new_git_dir;
    + 	struct strbuf new_gitdir = STRBUF_INIT;
      	const struct submodule *sub;
     +	struct strbuf config_path = STRBUF_INIT, sb = STRBUF_INIT;
      
    @@ submodule.c: static void relocate_single_git_dir_into_superproject(const char *p
     +	/* NEEDSWORK: this may differ if experimental.worktreeConfig is enabled */
     +	strbuf_addf(&config_path, "%s/config", real_new_git_dir);
     +	git_config_set_in_file(config_path.buf, "submodule.superprojectGitdir",
    -+			       relative_path(get_super_prefix_or_empty(),
    -+					     path, &sb));
    ++			       relative_path(absolute_path(get_git_dir()),
    ++					     real_new_git_dir, &sb));
     +
     +	strbuf_release(&config_path);
     +	strbuf_release(&sb);
    @@ t/t7412-submodule-absorbgitdirs.sh: test_expect_success 'absorb the git dir' '
     +	test_cmp expect.2 actual.2 &&
     +
     +	# make sure the submodule cached the superproject gitdir correctly
    -+	test-tool path-utils real_path . >expect &&
    -+	test-tool path-utils real_path \
    -+		"$(git -C sub1 config submodule.superprojectGitDir)" >actual &&
    ++	submodule_gitdir="$(git -C sub1 rev-parse --absolute-git-dir)" &&
    ++	superproject_gitdir="$(git rev-parse --absolute-git-dir)" &&
    ++
    ++	test-tool path-utils relative_path "$superproject_gitdir" \
    ++		"$submodule_gitdir" >expect &&
    ++	git -C sub1 config submodule.superprojectGitDir >actual &&
     +
     +	test_cmp expect actual
      '
      
      test_expect_success 'absorbing does not fail for deinitialized submodules' '
    +@@ t/t7412-submodule-absorbgitdirs.sh: test_expect_success 'absorb the git dir in a nested submodule' '
    + 	git status >actual.1 &&
    + 	git -C sub1/nested rev-parse HEAD >actual.2 &&
    + 	test_cmp expect.1 actual.1 &&
    +-	test_cmp expect.2 actual.2
    ++	test_cmp expect.2 actual.2 &&
    ++
    ++	sub1_gitdir="$(git -C sub1 rev-parse --absolute-git-dir)" &&
    ++	sub1_nested_gitdir="$(git -C sub1/nested rev-parse --absolute-git-dir)" &&
    ++
    ++	test-tool path-utils relative_path "$sub1_gitdir" "$sub1_nested_gitdir" \
    ++		>expect &&
    ++	git -C sub1/nested config submodule.superprojectGitDir >actual &&
    ++
    ++	test_cmp expect actual
    + '
    + 
    + test_expect_success 're-setup nested submodule' '
4:  6866c36620 ! 4:  df9879ff93 submodule: record superproject gitdir during 'update'
    @@ Commit message
     
      ## git-submodule.sh ##
     @@ git-submodule.sh: cmd_update()
    - 			fi
    - 		fi
    + 			;;
    + 		esac
      
     +		# Cache a pointer to the superproject's gitdir. This may have
    -+		# changed, so rewrite it unconditionally. Writes it to worktree
    ++		# changed, unless it's a fresh clone. Writes it to worktree
     +		# if applicable, otherwise to local.
    -+		relative_gitdir="$(git rev-parse --path-format=relative \
    -+						 --prefix "${sm_path}" \
    -+						 --git-dir)"
    ++		if test -z "$just_cloned"
    ++		then
    ++			sm_gitdir="$(git -C "$sm_path" rev-parse --absolute-git-dir)"
    ++			relative_gitdir="$(git rev-parse --path-format=relative \
    ++							 --prefix "${sm_gitdir}" \
    ++							 --git-dir)"
     +
    -+		git -C "$sm_path" config --worktree \
    -+			submodule.superprojectgitdir "$relative_gitdir"
    ++			git -C "$sm_path" config --worktree \
    ++				submodule.superprojectgitdir "$relative_gitdir"
    ++		fi
     +
      		if test -n "$recursive"
      		then
    @@ t/t7406-submodule-update.sh: test_expect_success 'submodule update --quiet passe
     +	(cd super &&
     +	 git -C submodule config --unset submodule.superprojectGitdir &&
     +	 git submodule update &&
    -+	 echo "../.git" >expect &&
    ++	 test-tool path-utils relative_path \
    ++		"$(git rev-parse --absolute-git-dir)" \
    ++		"$(git -C submodule rev-parse --absolute-git-dir)" >expect &&
     +	 git -C submodule config submodule.superprojectGitdir >actual &&
     +	 test_cmp expect actual
     +	)
-- 
2.33.0.1079.g6e70778dc9-goog


^ permalink raw reply	[flat|nested] 31+ messages in thread

* [PATCH v4 1/4] t7400-submodule-basic: modernize inspect() helper
  2021-10-14 20:34 [PATCH v4 0/4] cache parent project's gitdir in submodules Emily Shaffer
@ 2021-10-14 20:34 ` Emily Shaffer
  2021-10-14 20:34 ` [PATCH v4 2/4] introduce submodule.superprojectGitDir record Emily Shaffer
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 31+ messages in thread
From: Emily Shaffer @ 2021-10-14 20:34 UTC (permalink / raw)
  To: git; +Cc: Emily Shaffer

Since the inspect() helper in the submodule-basic test suite was
written, 'git -C <dir>' was added. By using -C, we no longer need a
reference to the base directory for the test. This simplifies callsites,
and will make the addition of other arguments in later patches more
readable.

Signed-off-by: Emily Shaffer <emilyshaffer@google.com>
---
 t/t7400-submodule-basic.sh | 40 +++++++++++++++-----------------------
 1 file changed, 16 insertions(+), 24 deletions(-)

diff --git a/t/t7400-submodule-basic.sh b/t/t7400-submodule-basic.sh
index cb1b8e35db..d69a5c0032 100755
--- a/t/t7400-submodule-basic.sh
+++ b/t/t7400-submodule-basic.sh
@@ -107,23 +107,15 @@ test_expect_success 'setup - repository to add submodules to' '
 # generates, which will expand symbolic links.
 submodurl=$(pwd -P)
 
-listbranches() {
-	git for-each-ref --format='%(refname)' 'refs/heads/*'
-}
-
 inspect() {
-	dir=$1 &&
-	dotdot="${2:-..}" &&
-
-	(
-		cd "$dir" &&
-		listbranches >"$dotdot/heads" &&
-		{ git symbolic-ref HEAD || :; } >"$dotdot/head" &&
-		git rev-parse HEAD >"$dotdot/head-sha1" &&
-		git update-index --refresh &&
-		git diff-files --exit-code &&
-		git clean -n -d -x >"$dotdot/untracked"
-	)
+	sub_dir=$1 &&
+
+	git -C "$sub_dir" for-each-ref --format='%(refname)' 'refs/heads/*' >heads &&
+	{ git -C "$sub_dir" symbolic-ref HEAD || :; } >head &&
+	git -C "$sub_dir" rev-parse HEAD >head-sha1 &&
+	git -C "$sub_dir" update-index --refresh &&
+	git -C "$sub_dir" diff-files --exit-code &&
+	git -C "$sub_dir" clean -n -d -x >untracked
 }
 
 test_expect_success 'submodule add' '
@@ -146,7 +138,7 @@ test_expect_success 'submodule add' '
 	) &&
 
 	rm -f heads head untracked &&
-	inspect addtest/submod ../.. &&
+	inspect addtest/submod &&
 	test_cmp expect heads &&
 	test_cmp expect head &&
 	test_must_be_empty untracked
@@ -248,7 +240,7 @@ test_expect_success 'submodule add --branch' '
 	) &&
 
 	rm -f heads head untracked &&
-	inspect addtest/submod-branch ../.. &&
+	inspect addtest/submod-branch &&
 	test_cmp expect-heads heads &&
 	test_cmp expect-head head &&
 	test_must_be_empty untracked
@@ -264,7 +256,7 @@ test_expect_success 'submodule add with ./ in path' '
 	) &&
 
 	rm -f heads head untracked &&
-	inspect addtest/dotsubmod/frotz ../../.. &&
+	inspect addtest/dotsubmod/frotz &&
 	test_cmp expect heads &&
 	test_cmp expect head &&
 	test_must_be_empty untracked
@@ -280,7 +272,7 @@ test_expect_success 'submodule add with /././ in path' '
 	) &&
 
 	rm -f heads head untracked &&
-	inspect addtest/dotslashdotsubmod/frotz ../../.. &&
+	inspect addtest/dotslashdotsubmod/frotz &&
 	test_cmp expect heads &&
 	test_cmp expect head &&
 	test_must_be_empty untracked
@@ -296,7 +288,7 @@ test_expect_success 'submodule add with // in path' '
 	) &&
 
 	rm -f heads head untracked &&
-	inspect addtest/slashslashsubmod/frotz ../../.. &&
+	inspect addtest/slashslashsubmod/frotz &&
 	test_cmp expect heads &&
 	test_cmp expect head &&
 	test_must_be_empty untracked
@@ -312,7 +304,7 @@ test_expect_success 'submodule add with /.. in path' '
 	) &&
 
 	rm -f heads head untracked &&
-	inspect addtest/realsubmod ../.. &&
+	inspect addtest/realsubmod &&
 	test_cmp expect heads &&
 	test_cmp expect head &&
 	test_must_be_empty untracked
@@ -328,7 +320,7 @@ test_expect_success 'submodule add with ./, /.. and // in path' '
 	) &&
 
 	rm -f heads head untracked &&
-	inspect addtest/realsubmod2 ../.. &&
+	inspect addtest/realsubmod2 &&
 	test_cmp expect heads &&
 	test_cmp expect head &&
 	test_must_be_empty untracked
@@ -359,7 +351,7 @@ test_expect_success 'submodule add in subdirectory' '
 	) &&
 
 	rm -f heads head untracked &&
-	inspect addtest/realsubmod3 ../.. &&
+	inspect addtest/realsubmod3 &&
 	test_cmp expect heads &&
 	test_cmp expect head &&
 	test_must_be_empty untracked
-- 
2.33.0.1079.g6e70778dc9-goog


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH v4 2/4] introduce submodule.superprojectGitDir record
  2021-10-14 20:34 [PATCH v4 0/4] cache parent project's gitdir in submodules Emily Shaffer
  2021-10-14 20:34 ` [PATCH v4 1/4] t7400-submodule-basic: modernize inspect() helper Emily Shaffer
@ 2021-10-14 20:34 ` Emily Shaffer
  2021-10-14 20:34 ` [PATCH v4 3/4] submodule: record superproject gitdir during absorbgitdirs Emily Shaffer
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 31+ messages in thread
From: Emily Shaffer @ 2021-10-14 20:34 UTC (permalink / raw)
  To: git; +Cc: Emily Shaffer, Junio C Hamano

Teach submodules a reference to their superproject's gitdir. This allows
us to A) know that we're running from a submodule, and B) have a
shortcut to the superproject's vitals, for example, configs.

By using a relative path instead of an absolute path, we can move the
superproject directory around on the filesystem without breaking the
submodule's cache. And by using the path from gitdir to gitdir, we can
move the submodule within the superproject's tree structure without
breaking the submodule's cache, too.

Since this hint value is only introduced during new submodule creation
via `git submodule add`, though, there is more work to do to allow the
record to be created at other times.

If the new config is present, we can do some optional value-added
behavior, like letting "git status" print additional info about the
submodule's status in relation to its superproject, or like letting the
superproject and submodule share an additional config file separate from
either one's local config.

Signed-off-by: Emily Shaffer <emilyshaffer@google.com>
Helped-by: Junio C Hamano <gitster@pobox.com>
---
 Documentation/config/submodule.txt | 12 +++++++++++
 builtin/submodule--helper.c        |  4 ++++
 t/t7400-submodule-basic.sh         | 32 ++++++++++++++++++++----------
 3 files changed, 38 insertions(+), 10 deletions(-)

diff --git a/Documentation/config/submodule.txt b/Documentation/config/submodule.txt
index ee454f8126..c10aeb7fdc 100644
--- a/Documentation/config/submodule.txt
+++ b/Documentation/config/submodule.txt
@@ -91,3 +91,15 @@ submodule.alternateErrorStrategy::
 	`ignore`, `info`, `die`. Default is `die`. Note that if set to `ignore`
 	or `info`, and if there is an error with the computed alternate, the
 	clone proceeds as if no alternate was specified.
+
+submodule.superprojectGitDir::
+	The relative path from the submodule's gitdir to its superproject's
+	gitdir. When Git is run in a repository, it usually makes no difference
+	whether this repository is standalone or a submodule, but if this
+	configuration variable is present, additional behavior may be possible,
+	such as "git status" printing additional information about this
+	submodule's status with respect to its superproject. This config should
+	only be present in projects which are submodules, but is not guaranteed
+	to be present in every submodule, so only optional value-added behavior
+	should be linked to it. It is set automatically during
+	submodule creation.
diff --git a/builtin/submodule--helper.c b/builtin/submodule--helper.c
index d574d650af..338dcbd364 100644
--- a/builtin/submodule--helper.c
+++ b/builtin/submodule--helper.c
@@ -1838,6 +1838,10 @@ static int clone_submodule(struct module_clone_data *clone_data)
 		git_config_set_in_file(p, "submodule.alternateErrorStrategy",
 				       error_strategy);
 
+	git_config_set_in_file(p, "submodule.superprojectGitdir",
+			       relative_path(absolute_path(get_git_dir()),
+					     sm_gitdir, &sb));
+
 	free(sm_alternate);
 	free(error_strategy);
 
diff --git a/t/t7400-submodule-basic.sh b/t/t7400-submodule-basic.sh
index d69a5c0032..3d146491df 100755
--- a/t/t7400-submodule-basic.sh
+++ b/t/t7400-submodule-basic.sh
@@ -109,12 +109,24 @@ submodurl=$(pwd -P)
 
 inspect() {
 	sub_dir=$1 &&
+	super_dir=$2 &&
 
 	git -C "$sub_dir" for-each-ref --format='%(refname)' 'refs/heads/*' >heads &&
 	{ git -C "$sub_dir" symbolic-ref HEAD || :; } >head &&
 	git -C "$sub_dir" rev-parse HEAD >head-sha1 &&
 	git -C "$sub_dir" update-index --refresh &&
 	git -C "$sub_dir" diff-files --exit-code &&
+
+	# Ensure that submodule.superprojectGitDir contains the path from the
+	# submodule's gitdir to the superproject's gitdir.
+
+	super_abs_gitdir=$(git -C "$super_dir" rev-parse --absolute-git-dir) &&
+	sub_abs_gitdir=$(git -C "$sub_dir" rev-parse --absolute-git-dir) &&
+
+	[ "$(git -C "$sub_dir" config --get submodule.superprojectGitDir)" = \
+	  "$(test-tool path-utils relative_path "$super_abs_gitdir" \
+						"$sub_abs_gitdir")" ] &&
+
 	git -C "$sub_dir" clean -n -d -x >untracked
 }
 
@@ -138,7 +150,7 @@ test_expect_success 'submodule add' '
 	) &&
 
 	rm -f heads head untracked &&
-	inspect addtest/submod &&
+	inspect addtest/submod addtest &&
 	test_cmp expect heads &&
 	test_cmp expect head &&
 	test_must_be_empty untracked
@@ -240,7 +252,7 @@ test_expect_success 'submodule add --branch' '
 	) &&
 
 	rm -f heads head untracked &&
-	inspect addtest/submod-branch &&
+	inspect addtest/submod-branch addtest &&
 	test_cmp expect-heads heads &&
 	test_cmp expect-head head &&
 	test_must_be_empty untracked
@@ -256,7 +268,7 @@ test_expect_success 'submodule add with ./ in path' '
 	) &&
 
 	rm -f heads head untracked &&
-	inspect addtest/dotsubmod/frotz &&
+	inspect addtest/dotsubmod/frotz addtest &&
 	test_cmp expect heads &&
 	test_cmp expect head &&
 	test_must_be_empty untracked
@@ -272,7 +284,7 @@ test_expect_success 'submodule add with /././ in path' '
 	) &&
 
 	rm -f heads head untracked &&
-	inspect addtest/dotslashdotsubmod/frotz &&
+	inspect addtest/dotslashdotsubmod/frotz addtest &&
 	test_cmp expect heads &&
 	test_cmp expect head &&
 	test_must_be_empty untracked
@@ -288,7 +300,7 @@ test_expect_success 'submodule add with // in path' '
 	) &&
 
 	rm -f heads head untracked &&
-	inspect addtest/slashslashsubmod/frotz &&
+	inspect addtest/slashslashsubmod/frotz addtest &&
 	test_cmp expect heads &&
 	test_cmp expect head &&
 	test_must_be_empty untracked
@@ -304,7 +316,7 @@ test_expect_success 'submodule add with /.. in path' '
 	) &&
 
 	rm -f heads head untracked &&
-	inspect addtest/realsubmod &&
+	inspect addtest/realsubmod addtest &&
 	test_cmp expect heads &&
 	test_cmp expect head &&
 	test_must_be_empty untracked
@@ -320,7 +332,7 @@ test_expect_success 'submodule add with ./, /.. and // in path' '
 	) &&
 
 	rm -f heads head untracked &&
-	inspect addtest/realsubmod2 &&
+	inspect addtest/realsubmod2 addtest &&
 	test_cmp expect heads &&
 	test_cmp expect head &&
 	test_must_be_empty untracked
@@ -351,7 +363,7 @@ test_expect_success 'submodule add in subdirectory' '
 	) &&
 
 	rm -f heads head untracked &&
-	inspect addtest/realsubmod3 &&
+	inspect addtest/realsubmod3 addtest &&
 	test_cmp expect heads &&
 	test_cmp expect head &&
 	test_must_be_empty untracked
@@ -492,7 +504,7 @@ test_expect_success 'update should work when path is an empty dir' '
 	git submodule update -q >update.out &&
 	test_must_be_empty update.out &&
 
-	inspect init &&
+	inspect init . &&
 	test_cmp expect head-sha1
 '
 
@@ -551,7 +563,7 @@ test_expect_success 'update should checkout rev1' '
 	echo "$rev1" >expect &&
 
 	git submodule update init &&
-	inspect init &&
+	inspect init . &&
 
 	test_cmp expect head-sha1
 '
-- 
2.33.0.1079.g6e70778dc9-goog


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH v4 3/4] submodule: record superproject gitdir during absorbgitdirs
  2021-10-14 20:34 [PATCH v4 0/4] cache parent project's gitdir in submodules Emily Shaffer
  2021-10-14 20:34 ` [PATCH v4 1/4] t7400-submodule-basic: modernize inspect() helper Emily Shaffer
  2021-10-14 20:34 ` [PATCH v4 2/4] introduce submodule.superprojectGitDir record Emily Shaffer
@ 2021-10-14 20:34 ` Emily Shaffer
  2021-10-18 23:18   ` Jonathan Tan
  2021-10-14 20:34 ` [PATCH v4 4/4] submodule: record superproject gitdir during 'update' Emily Shaffer
                   ` (2 subsequent siblings)
  5 siblings, 1 reply; 31+ messages in thread
From: Emily Shaffer @ 2021-10-14 20:34 UTC (permalink / raw)
  To: git; +Cc: Emily Shaffer

Already during 'git submodule add' we record a pointer to the
superproject's gitdir. However, this doesn't help brand-new
submodules created with 'git init' and later absorbed with 'git
submodule absorbgitdir'. Let's start adding that pointer during 'git
submodule absorbgitdir' too.

Signed-off-by: Emily Shaffer <emilyshaffer@google.com>
---
 submodule.c                        | 10 ++++++++++
 t/t7412-submodule-absorbgitdirs.sh | 23 +++++++++++++++++++++--
 2 files changed, 31 insertions(+), 2 deletions(-)

diff --git a/submodule.c b/submodule.c
index 96487517f9..571b68b66d 100644
--- a/submodule.c
+++ b/submodule.c
@@ -2084,6 +2084,7 @@ static void relocate_single_git_dir_into_superproject(const char *path)
 	char *old_git_dir = NULL, *real_old_git_dir = NULL, *real_new_git_dir = NULL;
 	struct strbuf new_gitdir = STRBUF_INIT;
 	const struct submodule *sub;
+	struct strbuf config_path = STRBUF_INIT, sb = STRBUF_INIT;
 
 	if (submodule_uses_worktrees(path))
 		die(_("relocate_gitdir for submodule '%s' with "
@@ -2114,6 +2115,15 @@ static void relocate_single_git_dir_into_superproject(const char *path)
 
 	relocate_gitdir(path, real_old_git_dir, real_new_git_dir);
 
+	/* cache pointer to superproject's gitdir */
+	/* NEEDSWORK: this may differ if experimental.worktreeConfig is enabled */
+	strbuf_addf(&config_path, "%s/config", real_new_git_dir);
+	git_config_set_in_file(config_path.buf, "submodule.superprojectGitdir",
+			       relative_path(absolute_path(get_git_dir()),
+					     real_new_git_dir, &sb));
+
+	strbuf_release(&config_path);
+	strbuf_release(&sb);
 	free(old_git_dir);
 	free(real_old_git_dir);
 	free(real_new_git_dir);
diff --git a/t/t7412-submodule-absorbgitdirs.sh b/t/t7412-submodule-absorbgitdirs.sh
index 1cfa150768..9ee5ccd660 100755
--- a/t/t7412-submodule-absorbgitdirs.sh
+++ b/t/t7412-submodule-absorbgitdirs.sh
@@ -30,7 +30,17 @@ test_expect_success 'absorb the git dir' '
 	git status >actual.1 &&
 	git -C sub1 rev-parse HEAD >actual.2 &&
 	test_cmp expect.1 actual.1 &&
-	test_cmp expect.2 actual.2
+	test_cmp expect.2 actual.2 &&
+
+	# make sure the submodule cached the superproject gitdir correctly
+	submodule_gitdir="$(git -C sub1 rev-parse --absolute-git-dir)" &&
+	superproject_gitdir="$(git rev-parse --absolute-git-dir)" &&
+
+	test-tool path-utils relative_path "$superproject_gitdir" \
+		"$submodule_gitdir" >expect &&
+	git -C sub1 config submodule.superprojectGitDir >actual &&
+
+	test_cmp expect actual
 '
 
 test_expect_success 'absorbing does not fail for deinitialized submodules' '
@@ -61,7 +71,16 @@ test_expect_success 'absorb the git dir in a nested submodule' '
 	git status >actual.1 &&
 	git -C sub1/nested rev-parse HEAD >actual.2 &&
 	test_cmp expect.1 actual.1 &&
-	test_cmp expect.2 actual.2
+	test_cmp expect.2 actual.2 &&
+
+	sub1_gitdir="$(git -C sub1 rev-parse --absolute-git-dir)" &&
+	sub1_nested_gitdir="$(git -C sub1/nested rev-parse --absolute-git-dir)" &&
+
+	test-tool path-utils relative_path "$sub1_gitdir" "$sub1_nested_gitdir" \
+		>expect &&
+	git -C sub1/nested config submodule.superprojectGitDir >actual &&
+
+	test_cmp expect actual
 '
 
 test_expect_success 're-setup nested submodule' '
-- 
2.33.0.1079.g6e70778dc9-goog


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH v4 4/4] submodule: record superproject gitdir during 'update'
  2021-10-14 20:34 [PATCH v4 0/4] cache parent project's gitdir in submodules Emily Shaffer
                   ` (2 preceding siblings ...)
  2021-10-14 20:34 ` [PATCH v4 3/4] submodule: record superproject gitdir during absorbgitdirs Emily Shaffer
@ 2021-10-14 20:34 ` Emily Shaffer
  2021-10-25 16:17   ` Derrick Stolee
  2021-10-25 16:19 ` [PATCH v4 0/4] cache parent project's gitdir in submodules Derrick Stolee
  2021-11-04 23:49 ` [PATCH v5 " Emily Shaffer
  5 siblings, 1 reply; 31+ messages in thread
From: Emily Shaffer @ 2021-10-14 20:34 UTC (permalink / raw)
  To: git; +Cc: Emily Shaffer

A recorded hint path to the superproject's gitdir might be added during
'git submodule add', but in some cases - like submodules which were
created before 'git submodule add' learned to record that info - it might
be useful to update the hint. Let's do it during 'git submodule
update', when we already have a handle to the superproject while calling
operations on the submodules.

Signed-off-by: Emily Shaffer <emilyshaffer@google.com>
---
 git-submodule.sh            | 14 ++++++++++++++
 t/t7406-submodule-update.sh | 12 ++++++++++++
 2 files changed, 26 insertions(+)

diff --git a/git-submodule.sh b/git-submodule.sh
index 652861aa66..ec02eb5971 100755
--- a/git-submodule.sh
+++ b/git-submodule.sh
@@ -449,6 +449,20 @@ cmd_update()
 			;;
 		esac
 
+		# Cache a pointer to the superproject's gitdir. This may have
+		# changed, unless it's a fresh clone. Writes it to worktree
+		# if applicable, otherwise to local.
+		if test -z "$just_cloned"
+		then
+			sm_gitdir="$(git -C "$sm_path" rev-parse --absolute-git-dir)"
+			relative_gitdir="$(git rev-parse --path-format=relative \
+							 --prefix "${sm_gitdir}" \
+							 --git-dir)"
+
+			git -C "$sm_path" config --worktree \
+				submodule.superprojectgitdir "$relative_gitdir"
+		fi
+
 		if test -n "$recursive"
 		then
 			(
diff --git a/t/t7406-submodule-update.sh b/t/t7406-submodule-update.sh
index 11cccbb333..7fd5741a5a 100755
--- a/t/t7406-submodule-update.sh
+++ b/t/t7406-submodule-update.sh
@@ -1061,4 +1061,16 @@ test_expect_success 'submodule update --quiet passes quietness to fetch with a s
 	)
 '
 
+test_expect_success 'submodule update adds superproject gitdir to older repos' '
+	(cd super &&
+	 git -C submodule config --unset submodule.superprojectGitdir &&
+	 git submodule update &&
+	 test-tool path-utils relative_path \
+		"$(git rev-parse --absolute-git-dir)" \
+		"$(git -C submodule rev-parse --absolute-git-dir)" >expect &&
+	 git -C submodule config submodule.superprojectGitdir >actual &&
+	 test_cmp expect actual
+	)
+'
+
 test_done
-- 
2.33.0.1079.g6e70778dc9-goog


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* Re: [PATCH v4 3/4] submodule: record superproject gitdir during absorbgitdirs
  2021-10-14 20:34 ` [PATCH v4 3/4] submodule: record superproject gitdir during absorbgitdirs Emily Shaffer
@ 2021-10-18 23:18   ` Jonathan Tan
  2021-10-25 16:14     ` Derrick Stolee
  2021-11-04 22:09     ` Emily Shaffer
  0 siblings, 2 replies; 31+ messages in thread
From: Jonathan Tan @ 2021-10-18 23:18 UTC (permalink / raw)
  To: emilyshaffer; +Cc: git, Jonathan Tan

> Already during 'git submodule add' we record a pointer to the
> superproject's gitdir. However, this doesn't help brand-new
> submodules created with 'git init' and later absorbed with 'git
> submodule absorbgitdir'. Let's start adding that pointer during 'git
> submodule absorbgitdir' too.

s/absorbgitdir/absorbgitdirs/ (note the "s" at the end)

> @@ -2114,6 +2115,15 @@ static void relocate_single_git_dir_into_superproject(const char *path)
>  
>  	relocate_gitdir(path, real_old_git_dir, real_new_git_dir);
>  
> +	/* cache pointer to superproject's gitdir */
> +	/* NEEDSWORK: this may differ if experimental.worktreeConfig is enabled */
> +	strbuf_addf(&config_path, "%s/config", real_new_git_dir);
> +	git_config_set_in_file(config_path.buf, "submodule.superprojectGitdir",
> +			       relative_path(absolute_path(get_git_dir()),
> +					     real_new_git_dir, &sb));
> +
> +	strbuf_release(&config_path);
> +	strbuf_release(&sb);
>  	free(old_git_dir);
>  	free(real_old_git_dir);
>  	free(real_new_git_dir);

Here [1] you mention that you'll delete the NEEDSWORK, but it's still
there.

Having said that, it might be better to make a test in which we call
this command while in a worktree of a superproject. The test might
reveal that (as pointed out to me internally) you might need to use the
common dir functions instead of the git dir functions to point to the
directory that you want (git-worktree.txt distinguishes the 2 if you
search for GIT_COMMON_DIR).

Besides that, all 4 patches look good (including the description of the
new config variable).

[1] https://lore.kernel.org/git/YWc2iJ7FQJYCnQ7w@google.com/

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v4 3/4] submodule: record superproject gitdir during absorbgitdirs
  2021-10-18 23:18   ` Jonathan Tan
@ 2021-10-25 16:14     ` Derrick Stolee
  2021-11-04 23:22       ` Emily Shaffer
  2021-11-04 22:09     ` Emily Shaffer
  1 sibling, 1 reply; 31+ messages in thread
From: Derrick Stolee @ 2021-10-25 16:14 UTC (permalink / raw)
  To: Jonathan Tan, emilyshaffer; +Cc: git

On 10/18/2021 7:18 PM, Jonathan Tan wrote:
>> Already during 'git submodule add' we record a pointer to the
>> superproject's gitdir. However, this doesn't help brand-new
>> submodules created with 'git init' and later absorbed with 'git
>> submodule absorbgitdir'. Let's start adding that pointer during 'git
>> submodule absorbgitdir' too.
> 
> s/absorbgitdir/absorbgitdirs/ (note the "s" at the end)
> 
>> @@ -2114,6 +2115,15 @@ static void relocate_single_git_dir_into_superproject(const char *path)
>>  
>>  	relocate_gitdir(path, real_old_git_dir, real_new_git_dir);
>>  
>> +	/* cache pointer to superproject's gitdir */
>> +	/* NEEDSWORK: this may differ if experimental.worktreeConfig is enabled */
>> +	strbuf_addf(&config_path, "%s/config", real_new_git_dir);
>> +	git_config_set_in_file(config_path.buf, "submodule.superprojectGitdir",
>> +			       relative_path(absolute_path(get_git_dir()),
>> +					     real_new_git_dir, &sb));
>> +
>> +	strbuf_release(&config_path);
>> +	strbuf_release(&sb);
>>  	free(old_git_dir);
>>  	free(real_old_git_dir);
>>  	free(real_new_git_dir);
> 
> Here [1] you mention that you'll delete the NEEDSWORK, but it's still
> there.
> 
> Having said that, it might be better to make a test in which we call
> this command while in a worktree of a superproject. The test might
> reveal that (as pointed out to me internally) you might need to use the
> common dir functions instead of the git dir functions to point to the
> directory that you want (git-worktree.txt distinguishes the 2 if you
> search for GIT_COMMON_DIR).

I came here to say the same thing. It's a bit too direct to compute
the location of a config file this way, so we should expose a method
that can create one for a given Git directory.

Since you're setting this config value inside the submodule's config,
what does it mean for a submodule to also be a worktree (and hence
require config.worktree)? What happens in this rough scenario?

  1. git init sub
  2. git init super
  3. git -C sub worktree add super/sub
  4. git -C super submodule absorbgitdir sub

I haven't actually tried running these things, but it seems unusual
and unexpected. This doesn't even account for cases where the repo
root and a worktree are both submodules within the superproject.

If we already have protections preventing these worktrees as
submodules, then perhaps there is no need for work here. I'm not
familiar enough with the area to make a claim one way or another.

Thanks,
-Stolee

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v4 4/4] submodule: record superproject gitdir during 'update'
  2021-10-14 20:34 ` [PATCH v4 4/4] submodule: record superproject gitdir during 'update' Emily Shaffer
@ 2021-10-25 16:17   ` Derrick Stolee
  0 siblings, 0 replies; 31+ messages in thread
From: Derrick Stolee @ 2021-10-25 16:17 UTC (permalink / raw)
  To: Emily Shaffer, git

On 10/14/2021 4:34 PM, Emily Shaffer wrote:

> +			sm_gitdir="$(git -C "$sm_path" rev-parse --absolute-git-dir)"
> +			relative_gitdir="$(git rev-parse --path-format=relative \
> +							 --prefix "${sm_gitdir}" \
> +							 --git-dir)"
> +
> +			git -C "$sm_path" config --worktree \
> +				submodule.superprojectgitdir "$relative_gitdir"

This use of `--worktree` is good because it acts the same as `--local` if
config.worktree doesn't exist. If we discover it is too challenging to find
the correct config file in-core, then we could call this command as a subprocess
in patch 3.

Thanks,
-Stolee

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v4 0/4] cache parent project's gitdir in submodules
  2021-10-14 20:34 [PATCH v4 0/4] cache parent project's gitdir in submodules Emily Shaffer
                   ` (3 preceding siblings ...)
  2021-10-14 20:34 ` [PATCH v4 4/4] submodule: record superproject gitdir during 'update' Emily Shaffer
@ 2021-10-25 16:19 ` Derrick Stolee
  2021-11-04 23:49 ` [PATCH v5 " Emily Shaffer
  5 siblings, 0 replies; 31+ messages in thread
From: Derrick Stolee @ 2021-10-25 16:19 UTC (permalink / raw)
  To: Emily Shaffer, git
  Cc: Albert Cui, Phillip Wood, Johannes Schindelin,
	Ævar Arnfjörð Bjarmason, Junio C Hamano,
	Matheus Tavares Bernardino, Jonathan Nieder, Jacob Keller,
	Atharva Raykar, Jonathan Tan

On 10/14/2021 4:34 PM, Emily Shaffer wrote:
> Since v3, a pretty major change: the semantics of
> submodule.superprojectGitDir has changed, to point from the submodule's
> gitdir to the superproject's gitdir (in v3 and earlier, we kept a path
> from the submodule's *worktree* to the superproject's gitdir instead).
> This cleans up some of the confusions about the behavior when a
> submodule worktree moves around in the superproject's tree, or in a
> future when we support submodules having multiple worktrees.

I like the new design!
 
> I also tried to simplify the tests to use 'test-tool path-utils
> relative_path' everywhere - I think that makes them much more clear for
> a test reader, but if you're reviewing and it isn't obvious what we're
> testing for, please speak up.

I found these changes to be well motivated.

> I think this is pretty mature and there was a lot of general agreement
> that the gitdir->gitdir association was the way to go, so please be
> brutal and look for nits, leaks, etc. this round ;)

My main concern was also brought up by Jonathan: the concerns around
worktrees are not fully fleshed out. We should avoid a NEEDSWORK, and
there is a way forward using a subprocess even if we find it is
difficult to find the correct config file in-core. In-core is preferred,
but it's good to have options.

Thanks,
-Stolee

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v4 3/4] submodule: record superproject gitdir during absorbgitdirs
  2021-10-18 23:18   ` Jonathan Tan
  2021-10-25 16:14     ` Derrick Stolee
@ 2021-11-04 22:09     ` Emily Shaffer
  1 sibling, 0 replies; 31+ messages in thread
From: Emily Shaffer @ 2021-11-04 22:09 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: git

On Mon, Oct 18, 2021 at 04:18:18PM -0700, Jonathan Tan wrote:
> 
> > Already during 'git submodule add' we record a pointer to the
> > superproject's gitdir. However, this doesn't help brand-new
> > submodules created with 'git init' and later absorbed with 'git
> > submodule absorbgitdir'. Let's start adding that pointer during 'git
> > submodule absorbgitdir' too.
> 
> s/absorbgitdir/absorbgitdirs/ (note the "s" at the end)
> 
> > @@ -2114,6 +2115,15 @@ static void relocate_single_git_dir_into_superproject(const char *path)
> >  
> >  	relocate_gitdir(path, real_old_git_dir, real_new_git_dir);
> >  
> > +	/* cache pointer to superproject's gitdir */
> > +	/* NEEDSWORK: this may differ if experimental.worktreeConfig is enabled */
> > +	strbuf_addf(&config_path, "%s/config", real_new_git_dir);
> > +	git_config_set_in_file(config_path.buf, "submodule.superprojectGitdir",
> > +			       relative_path(absolute_path(get_git_dir()),
> > +					     real_new_git_dir, &sb));
> > +
> > +	strbuf_release(&config_path);
> > +	strbuf_release(&sb);
> >  	free(old_git_dir);
> >  	free(real_old_git_dir);
> >  	free(real_new_git_dir);
> 
> Here [1] you mention that you'll delete the NEEDSWORK, but it's still
> there.
> 
> Having said that, it might be better to make a test in which we call
> this command while in a worktree of a superproject. The test might
> reveal that (as pointed out to me internally) you might need to use the
> common dir functions instead of the git dir functions to point to the
> directory that you want (git-worktree.txt distinguishes the 2 if you
> search for GIT_COMMON_DIR).

Huh, something interesting happened, actually.

I wrote a little test:

  test_expect_success 'absorbgitdirs works when called from a superproject worktree' '
          # set up a worktree of the superproject
          git worktree add wt &&
          (
          cd wt &&
 
          # create a new unembedded git dir
          git init sub4 &&
          test_commit -C sub4 first &&
          git submodule add ./sub4 &&
          test_tick &&
 
          # absorb the git dir
          git submodule absorbgitdirs sub4 &&
 
          # make sure the submodule cached the superproject gitdir correctly
          submodule_gitdir="$(git -C sub4 rev-parse --absolute-git-dir)" &&
          superproject_gitdir="$(git rev-parse --absolute-git-dir)" &&
 
          test-tool path-utils relative_path "$superproject_gitdir" \
                  "$submodule_gitdir" >expect &&
          git -C sub4 config submodule.superprojectGitDir >actual &&
 
          test_pause &&
          test_cmp expect actual
          )
  '

However, the `git submodule absorbgitdirs` command didn't do quite what
I expected.

When I made a new worktree, that worktree's gitdir showed up in
'$TEST_DIR/.git/worktrees/wt/'. That's not very surprising. But what did
surprise me was that when I called `git submodule absorbgitdirs sub4`,
sub4's gitdir ended up in '$TEST_DIR/.git/worktrees/wt/modules/sub4',
rather than in '$TEST_DIR/.git/modules/sub4'. That's a little
surprising!

Anyway, this test has a sort of pernicious mistake too - I'm checking
relative path between 'git rev-parse --absolute-git-dir's, but the
relative path from .git/worktrees/wt/ to .git/worktrees/wt/modules/sub4
is the same as the relative path from .git/ to .git/modules/sub4, so
this test actually passes.

I'll change the tests here and elsewhere to use 'git rev-parse
--path-format=absolute --git-common-dir', but I think that still leaves
a kind of dangerous state for people working a lot with worktrees and
submodules - if I like to make throwaway worktrees in my regular
workflow, and I create a new submodule in one, and then get rid of the
worktree when I'm done with the task that added the new submodule, I
think it will explode if I try to checkout the branch I was using in
that worktree.

I tried it out locally:

 # setup
 git init test && cd test
 git commit --allow-empty -m "first commit"
 git worktree add wt
 (
   cd wt
   git init sub
   git -C sub commit --allow-empty -m "first-commit"
   git submodule add ./sub
   git commit -m "add submodule (as initted)
   git submodule absorbgitdirs sub
   # This told me "Migrating git directory of 'sub' from
   # test/wt/sub/.git to test/.git/worktrees/wt/modules/sub, oops
 )

 # Make it possible to checkout wt branch somewhere else
 git -C wt checkout HEAD --detach

 # Try and delete the worktree; presumably my work is done
 git worktree remove wt

At this point, Git wouldn't let me remove the worktree:
  fatal: working trees containing submodules cannot be moved or removed

But wouldn't it be better to just absorb the submodule into the common
dir instead...?

By the way, when I tried checking out 'wt' branch from the original
worktree without deleting the new worktree, and then running 'git
submodule update', Git cloned 'sub' from .git/worktrees/wt/modules/sub
into .git/modules/sub. That was pretty surprising too!


Anyway, all of that is kind of a tangent. The takeaways I got are
twofold:

1) Yes, use common dir, not gitdir, in all the cached
path-to-superproject stuff.
2) Someone (me?) should send a patch keeping the "you can't delete a
submodule with a gitdir in it" die(), but *also* changing the behavior
to put the new submodule gitdir into get_common_dir() instead of
get_git_dir().

I'll try to send (2) separately - I think it will be a pretty small change,
and by keeping around the die() for deleting a worktree that already has
a submodule gitdir in it, we won't be risking deleting anybody's
existing work.

 - Emily

> 
> Besides that, all 4 patches look good (including the description of the
> new config variable).
> 
> [1] https://lore.kernel.org/git/YWc2iJ7FQJYCnQ7w@google.com/

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v4 3/4] submodule: record superproject gitdir during absorbgitdirs
  2021-10-25 16:14     ` Derrick Stolee
@ 2021-11-04 23:22       ` Emily Shaffer
  2021-11-08  1:07         ` Derrick Stolee
  0 siblings, 1 reply; 31+ messages in thread
From: Emily Shaffer @ 2021-11-04 23:22 UTC (permalink / raw)
  To: Derrick Stolee; +Cc: Jonathan Tan, git

On Mon, Oct 25, 2021 at 12:14:07PM -0400, Derrick Stolee wrote:
> 
> On 10/18/2021 7:18 PM, Jonathan Tan wrote:
> >> Already during 'git submodule add' we record a pointer to the
> >> superproject's gitdir. However, this doesn't help brand-new
> >> submodules created with 'git init' and later absorbed with 'git
> >> submodule absorbgitdir'. Let's start adding that pointer during 'git
> >> submodule absorbgitdir' too.
> > 
> > s/absorbgitdir/absorbgitdirs/ (note the "s" at the end)
> > 
> >> @@ -2114,6 +2115,15 @@ static void relocate_single_git_dir_into_superproject(const char *path)
> >>  
> >>  	relocate_gitdir(path, real_old_git_dir, real_new_git_dir);
> >>  
> >> +	/* cache pointer to superproject's gitdir */
> >> +	/* NEEDSWORK: this may differ if experimental.worktreeConfig is enabled */
> >> +	strbuf_addf(&config_path, "%s/config", real_new_git_dir);
> >> +	git_config_set_in_file(config_path.buf, "submodule.superprojectGitdir",
> >> +			       relative_path(absolute_path(get_git_dir()),
> >> +					     real_new_git_dir, &sb));
> >> +
> >> +	strbuf_release(&config_path);
> >> +	strbuf_release(&sb);
> >>  	free(old_git_dir);
> >>  	free(real_old_git_dir);
> >>  	free(real_new_git_dir);
> > 
> > Here [1] you mention that you'll delete the NEEDSWORK, but it's still
> > there.
> > 
> > Having said that, it might be better to make a test in which we call
> > this command while in a worktree of a superproject. The test might
> > reveal that (as pointed out to me internally) you might need to use the
> > common dir functions instead of the git dir functions to point to the
> > directory that you want (git-worktree.txt distinguishes the 2 if you
> > search for GIT_COMMON_DIR).
> 
> I came here to say the same thing. It's a bit too direct to compute
> the location of a config file this way, so we should expose a method
> that can create one for a given Git directory.
> 
> Since you're setting this config value inside the submodule's config,
> what does it mean for a submodule to also be a worktree (and hence
> require config.worktree)? What happens in this rough scenario?
> 
>   1. git init sub
>   2. git init super
>   3. git -C sub worktree add super/sub
>   4. git -C super submodule absorbgitdir sub
> 
> I haven't actually tried running these things, but it seems unusual
> and unexpected. This doesn't even account for cases where the repo
> root and a worktree are both submodules within the superproject.
> 
> If we already have protections preventing these worktrees as
> submodules, then perhaps there is no need for work here. I'm not
> familiar enough with the area to make a claim one way or another.

Yeah, I think there is actually a test case covering this in t7412:

137 test_expect_success 'setup a submodule with multiple worktrees' '
138         # first create another unembedded git dir in a new submodule
139         git init sub3 &&
140         test_commit -C sub3 first &&
141         git submodule add ./sub3 &&
142         test_tick &&
143         git commit -m "add another submodule" &&
144         git -C sub3 worktree add ../sub3_second_work_tree
145 '
146
147 test_expect_success 'absorbing fails for a submodule with multiple worktrees' '
148         test_must_fail git submodule absorbgitdirs sub3 2>error &&
149         test_i18ngrep "not supported" error
150 '

That is, I think because 'sub/' in your scenario above has multiple
worktrees, the absorbgitdirs will fail. So I won't do additional work
here. Thanks.

 - Emily

> 
> Thanks,
> -Stolee

^ permalink raw reply	[flat|nested] 31+ messages in thread

* [PATCH v5 0/4] cache parent project's gitdir in submodules
  2021-10-14 20:34 [PATCH v4 0/4] cache parent project's gitdir in submodules Emily Shaffer
                   ` (4 preceding siblings ...)
  2021-10-25 16:19 ` [PATCH v4 0/4] cache parent project's gitdir in submodules Derrick Stolee
@ 2021-11-04 23:49 ` Emily Shaffer
  2021-11-04 23:49   ` [PATCH v5 1/4] t7400-submodule-basic: modernize inspect() helper Emily Shaffer
                     ` (4 more replies)
  5 siblings, 5 replies; 31+ messages in thread
From: Emily Shaffer @ 2021-11-04 23:49 UTC (permalink / raw)
  To: git
  Cc: Emily Shaffer, Albert Cui, Phillip Wood, Johannes Schindelin,
	Ævar Arnfjörð Bjarmason, Junio C Hamano,
	Matheus Tavares Bernardino, Jonathan Nieder, Jacob Keller,
	Atharva Raykar, Derrick Stolee, Jonathan Tan

For the original cover letter, see
https://lore.kernel.org/git/20210611225428.1208973-1-emilyshaffer%40google.com.

The CI run is here:
https://github.com/nasamuffin/git/actions/runs/1423475715

It shows a broken (pedantic,fedora) test, but those are known broken:
https://lore.kernel.org/git/nycvar.QRO.7.76.6.2111040007170.56%40tvgsbejvaqbjf.bet

[v4 cover letter]
Since v3, a pretty major change: the semantics of
submodule.superprojectGitDir has changed, to point from the submodule's
gitdir to the superproject's gitdir (in v3 and earlier, we kept a path
from the submodule's *worktree* to the superproject's gitdir instead).
This cleans up some of the confusions about the behavior when a
submodule worktree moves around in the superproject's tree, or in a
future when we support submodules having multiple worktrees.

I also tried to simplify the tests to use 'test-tool path-utils
relative_path' everywhere - I think that makes them much more clear for
a test reader, but if you're reviewing and it isn't obvious what we're
testing for, please speak up.

I think this is pretty mature and there was a lot of general agreement
that the gitdir->gitdir association was the way to go, so please be
brutal and look for nits, leaks, etc. this round ;)
[/v4 cover letter]

Since v4:

The only real change here is a slight semantics change to map from
<submodule gitdir> to <superproject common git dir>. In every case
*except* for when the superproject has a worktree, this changes nothing.
For the case when the superproject has a worktree, this means that now
submodules will refer to the general superproject common dir (e.g. no
worktree-specific refs or configs or whatnot).

I *think* that because a submodule should exist in the context of the
common dir, not the worktree gitdir, that is ok. However, it does mean
it would be difficult to do something like sharing a config specific to
the worktree (the initial goal of this series).

$ROOT/.git
$ROOT/.git/config.superproject <- shared by $ROOT/.git/modules/sub
$ROOT/.git/modules/sub <- points to $ROOT/.git
$ROOT/.git/worktrees/wt
$ROOT/.git/worktrees/wt/config.superproject <- contains a certain config-based pre-commit hook

If the submodule only knows about the common dir, that is tough, because
the submodule would basically have to guess which worktree it's in from
its own path. There would be no way for '$WT/sub' to inherit
'$ROOT/.git/worktrees/wt/config.superproject'.

That said... right now, we don't support submodules in worktrees very
well at all. A submodule in a worktree will get a brand new gitdir in
$ROOT/.git/worktrees/modules/ (and that brand new gitdir would point to
the super's common dir). So I think we can punt on this entire question
until we teach submodules and worktrees to play more gracefully together
(it's on my long list...), and at that time we can probably introduce a
pointer from $ROOT/.git/modules/sub/worktrees/wt/ to
$ROOT/.git/worktrees/wt/....

Or, to summarize the long ramble above: "this is still kind of weird
with worktrees, but let's fix it later when we fix worktrees more
thoroughly".

(More rambling about worktree weirdness here:
https://lore.kernel.org/git/YYRaII8YWVxlBqsF%40google.com )

 - Emily

Emily Shaffer (4):
  t7400-submodule-basic: modernize inspect() helper
  introduce submodule.superprojectGitDir record
  submodule: record superproject gitdir during absorbgitdirs
  submodule: record superproject gitdir during 'update'

 Documentation/config/submodule.txt | 12 +++++++
 builtin/submodule--helper.c        |  4 +++
 git-submodule.sh                   | 14 ++++++++
 submodule.c                        |  9 +++++
 t/t7400-submodule-basic.sh         | 54 ++++++++++++++++--------------
 t/t7406-submodule-update.sh        | 12 +++++++
 t/t7412-submodule-absorbgitdirs.sh | 50 +++++++++++++++++++++++++--
 7 files changed, 128 insertions(+), 27 deletions(-)

Range-diff against v4:
1:  2ff151aaa2 = 1:  6ff10beaf2 t7400-submodule-basic: modernize inspect() helper
2:  ed5479ad5d ! 2:  d4f4627585 introduce submodule.superprojectGitDir record
    @@ Commit message
         superproject directory around on the filesystem without breaking the
         submodule's cache. And by using the path from gitdir to gitdir, we can
         move the submodule within the superproject's tree structure without
    -    breaking the submodule's cache, too.
    +    breaking the submodule's cache, too. Finally, by pointing at
    +    "get_git_common_dir()" instead of "get_git_dir()", we ensure the link
    +    will refer to the parent repo, not to a specific worktree.
     
         Since this hint value is only introduced during new submodule creation
         via `git submodule add`, though, there is more work to do to allow the
    @@ Documentation/config/submodule.txt: submodule.alternateErrorStrategy::
     +
     +submodule.superprojectGitDir::
     +	The relative path from the submodule's gitdir to its superproject's
    -+	gitdir. When Git is run in a repository, it usually makes no difference
    -+	whether this repository is standalone or a submodule, but if this
    -+	configuration variable is present, additional behavior may be possible,
    -+	such as "git status" printing additional information about this
    -+	submodule's status with respect to its superproject. This config should
    -+	only be present in projects which are submodules, but is not guaranteed
    -+	to be present in every submodule, so only optional value-added behavior
    -+	should be linked to it. It is set automatically during
    -+	submodule creation.
    ++	common dir. When Git is run in a repository, it usually makes no
    ++	difference whether this repository is standalone or a submodule, but if
    ++	this configuration variable is present, additional behavior may be
    ++	possible, such as "git status" printing additional information about
    ++	this submodule's status with respect to its superproject. This config
    ++	should only be present in projects which are submodules, but is not
    ++	guaranteed to be present in every submodule, so only optional
    ++	value-added behavior should be linked to it. It is set automatically
    ++	during submodule creation.
     
      ## builtin/submodule--helper.c ##
     @@ builtin/submodule--helper.c: static int clone_submodule(struct module_clone_data *clone_data)
    @@ builtin/submodule--helper.c: static int clone_submodule(struct module_clone_data
      				       error_strategy);
      
     +	git_config_set_in_file(p, "submodule.superprojectGitdir",
    -+			       relative_path(absolute_path(get_git_dir()),
    ++			       relative_path(absolute_path(get_git_common_dir()),
     +					     sm_gitdir, &sb));
     +
      	free(sm_alternate);
    @@ t/t7400-submodule-basic.sh: submodurl=$(pwd -P)
     +	# Ensure that submodule.superprojectGitDir contains the path from the
     +	# submodule's gitdir to the superproject's gitdir.
     +
    -+	super_abs_gitdir=$(git -C "$super_dir" rev-parse --absolute-git-dir) &&
    -+	sub_abs_gitdir=$(git -C "$sub_dir" rev-parse --absolute-git-dir) &&
    ++	super_abs_gitdir=$(git -C "$super_dir" rev-parse --path-format=absolute --git-common-dir) &&
    ++	sub_abs_gitdir=$(git -C "$sub_dir" rev-parse --path-format=absolute --git-common-dir) &&
     +
     +	[ "$(git -C "$sub_dir" config --get submodule.superprojectGitDir)" = \
     +	  "$(test-tool path-utils relative_path "$super_abs_gitdir" \
3:  60e6cf77c5 ! 3:  2dae297943 submodule: record superproject gitdir during absorbgitdirs
    @@ Commit message
         Already during 'git submodule add' we record a pointer to the
         superproject's gitdir. However, this doesn't help brand-new
         submodules created with 'git init' and later absorbed with 'git
    -    submodule absorbgitdir'. Let's start adding that pointer during 'git
    -    submodule absorbgitdir' too.
    +    submodule absorbgitdirs'. Let's start adding that pointer during 'git
    +    submodule absorbgitdirs' too.
     
         Signed-off-by: Emily Shaffer <emilyshaffer@google.com>
     
    @@ submodule.c: static void relocate_single_git_dir_into_superproject(const char *p
      	relocate_gitdir(path, real_old_git_dir, real_new_git_dir);
      
     +	/* cache pointer to superproject's gitdir */
    -+	/* NEEDSWORK: this may differ if experimental.worktreeConfig is enabled */
     +	strbuf_addf(&config_path, "%s/config", real_new_git_dir);
     +	git_config_set_in_file(config_path.buf, "submodule.superprojectGitdir",
    -+			       relative_path(absolute_path(get_git_dir()),
    ++			       relative_path(absolute_path(get_git_common_dir()),
     +					     real_new_git_dir, &sb));
     +
     +	strbuf_release(&config_path);
    @@ t/t7412-submodule-absorbgitdirs.sh: test_expect_success 'absorb the git dir' '
     +	test_cmp expect.2 actual.2 &&
     +
     +	# make sure the submodule cached the superproject gitdir correctly
    -+	submodule_gitdir="$(git -C sub1 rev-parse --absolute-git-dir)" &&
    -+	superproject_gitdir="$(git rev-parse --absolute-git-dir)" &&
    ++	submodule_gitdir="$(git -C sub1 rev-parse --path-format=absolute --git-common-dir)" &&
    ++	superproject_gitdir="$(git rev-parse --path-format=absolute --git-common-dir)" &&
     +
     +	test-tool path-utils relative_path "$superproject_gitdir" \
     +		"$submodule_gitdir" >expect &&
    @@ t/t7412-submodule-absorbgitdirs.sh: test_expect_success 'absorb the git dir in a
     -	test_cmp expect.2 actual.2
     +	test_cmp expect.2 actual.2 &&
     +
    -+	sub1_gitdir="$(git -C sub1 rev-parse --absolute-git-dir)" &&
    -+	sub1_nested_gitdir="$(git -C sub1/nested rev-parse --absolute-git-dir)" &&
    ++	sub1_gitdir="$(git -C sub1 rev-parse --path-format=absolute --git-common-dir)" &&
    ++	sub1_nested_gitdir="$(git -C sub1/nested rev-parse --path-format=absolute --git-common-dir)" &&
     +
     +	test-tool path-utils relative_path "$sub1_gitdir" "$sub1_nested_gitdir" \
     +		>expect &&
    @@ t/t7412-submodule-absorbgitdirs.sh: test_expect_success 'absorb the git dir in a
      '
      
      test_expect_success 're-setup nested submodule' '
    +@@ t/t7412-submodule-absorbgitdirs.sh: test_expect_success 'absorbing fails for a submodule with multiple worktrees' '
    + 	test_i18ngrep "not supported" error
    + '
    + 
    ++test_expect_success 'absorbgitdirs works when called from a superproject worktree' '
    ++	# set up a worktree of the superproject
    ++	git worktree add wt &&
    ++	(
    ++	cd wt &&
    ++
    ++	# create a new unembedded git dir
    ++	git init sub4 &&
    ++	test_commit -C sub4 first &&
    ++	git submodule add ./sub4 &&
    ++	test_tick &&
    ++
    ++	# absorb the git dir
    ++	git submodule absorbgitdirs sub4 &&
    ++
    ++	# make sure the submodule cached the superproject gitdir correctly
    ++	submodule_gitdir="$(git -C sub4 rev-parse --path-format=absolute --git-common-dir)" &&
    ++	superproject_gitdir="$(git rev-parse --path-format=absolute --git-common-dir)" &&
    ++
    ++	test-tool path-utils relative_path "$superproject_gitdir" \
    ++		"$submodule_gitdir" >expect &&
    ++	git -C sub4 config submodule.superprojectGitDir >actual &&
    ++
    ++	test_cmp expect actual
    ++	)
    ++'
    ++
    + test_done
4:  df9879ff93 ! 4:  f0412d6d34 submodule: record superproject gitdir during 'update'
    @@ git-submodule.sh: cmd_update()
      			;;
      		esac
      
    -+		# Cache a pointer to the superproject's gitdir. This may have
    ++		# Cache a pointer to the superproject's common dir. This may have
     +		# changed, unless it's a fresh clone. Writes it to worktree
     +		# if applicable, otherwise to local.
     +		if test -z "$just_cloned"
    @@ git-submodule.sh: cmd_update()
     +			sm_gitdir="$(git -C "$sm_path" rev-parse --absolute-git-dir)"
     +			relative_gitdir="$(git rev-parse --path-format=relative \
     +							 --prefix "${sm_gitdir}" \
    -+							 --git-dir)"
    ++							 --git-common-dir)"
     +
     +			git -C "$sm_path" config --worktree \
     +				submodule.superprojectgitdir "$relative_gitdir"
    @@ t/t7406-submodule-update.sh: test_expect_success 'submodule update --quiet passe
     +	 git -C submodule config --unset submodule.superprojectGitdir &&
     +	 git submodule update &&
     +	 test-tool path-utils relative_path \
    -+		"$(git rev-parse --absolute-git-dir)" \
    -+		"$(git -C submodule rev-parse --absolute-git-dir)" >expect &&
    ++		"$(git rev-parse --path-format=absolute --git-common-dir)" \
    ++		"$(git -C submodule rev-parse --path-format=absolute --git-common-dir)" >expect &&
     +	 git -C submodule config submodule.superprojectGitdir >actual &&
     +	 test_cmp expect actual
     +	)
-- 
2.34.0.rc0.344.g81b53c2807-goog


^ permalink raw reply	[flat|nested] 31+ messages in thread

* [PATCH v5 1/4] t7400-submodule-basic: modernize inspect() helper
  2021-11-04 23:49 ` [PATCH v5 " Emily Shaffer
@ 2021-11-04 23:49   ` Emily Shaffer
  2021-11-04 23:49   ` [PATCH v5 2/4] introduce submodule.superprojectGitDir record Emily Shaffer
                     ` (3 subsequent siblings)
  4 siblings, 0 replies; 31+ messages in thread
From: Emily Shaffer @ 2021-11-04 23:49 UTC (permalink / raw)
  To: git; +Cc: Emily Shaffer

Since the inspect() helper in the submodule-basic test suite was
written, 'git -C <dir>' was added. By using -C, we no longer need a
reference to the base directory for the test. This simplifies callsites,
and will make the addition of other arguments in later patches more
readable.

Signed-off-by: Emily Shaffer <emilyshaffer@google.com>
---
 t/t7400-submodule-basic.sh | 40 +++++++++++++++-----------------------
 1 file changed, 16 insertions(+), 24 deletions(-)

diff --git a/t/t7400-submodule-basic.sh b/t/t7400-submodule-basic.sh
index cb1b8e35db..d69a5c0032 100755
--- a/t/t7400-submodule-basic.sh
+++ b/t/t7400-submodule-basic.sh
@@ -107,23 +107,15 @@ test_expect_success 'setup - repository to add submodules to' '
 # generates, which will expand symbolic links.
 submodurl=$(pwd -P)
 
-listbranches() {
-	git for-each-ref --format='%(refname)' 'refs/heads/*'
-}
-
 inspect() {
-	dir=$1 &&
-	dotdot="${2:-..}" &&
-
-	(
-		cd "$dir" &&
-		listbranches >"$dotdot/heads" &&
-		{ git symbolic-ref HEAD || :; } >"$dotdot/head" &&
-		git rev-parse HEAD >"$dotdot/head-sha1" &&
-		git update-index --refresh &&
-		git diff-files --exit-code &&
-		git clean -n -d -x >"$dotdot/untracked"
-	)
+	sub_dir=$1 &&
+
+	git -C "$sub_dir" for-each-ref --format='%(refname)' 'refs/heads/*' >heads &&
+	{ git -C "$sub_dir" symbolic-ref HEAD || :; } >head &&
+	git -C "$sub_dir" rev-parse HEAD >head-sha1 &&
+	git -C "$sub_dir" update-index --refresh &&
+	git -C "$sub_dir" diff-files --exit-code &&
+	git -C "$sub_dir" clean -n -d -x >untracked
 }
 
 test_expect_success 'submodule add' '
@@ -146,7 +138,7 @@ test_expect_success 'submodule add' '
 	) &&
 
 	rm -f heads head untracked &&
-	inspect addtest/submod ../.. &&
+	inspect addtest/submod &&
 	test_cmp expect heads &&
 	test_cmp expect head &&
 	test_must_be_empty untracked
@@ -248,7 +240,7 @@ test_expect_success 'submodule add --branch' '
 	) &&
 
 	rm -f heads head untracked &&
-	inspect addtest/submod-branch ../.. &&
+	inspect addtest/submod-branch &&
 	test_cmp expect-heads heads &&
 	test_cmp expect-head head &&
 	test_must_be_empty untracked
@@ -264,7 +256,7 @@ test_expect_success 'submodule add with ./ in path' '
 	) &&
 
 	rm -f heads head untracked &&
-	inspect addtest/dotsubmod/frotz ../../.. &&
+	inspect addtest/dotsubmod/frotz &&
 	test_cmp expect heads &&
 	test_cmp expect head &&
 	test_must_be_empty untracked
@@ -280,7 +272,7 @@ test_expect_success 'submodule add with /././ in path' '
 	) &&
 
 	rm -f heads head untracked &&
-	inspect addtest/dotslashdotsubmod/frotz ../../.. &&
+	inspect addtest/dotslashdotsubmod/frotz &&
 	test_cmp expect heads &&
 	test_cmp expect head &&
 	test_must_be_empty untracked
@@ -296,7 +288,7 @@ test_expect_success 'submodule add with // in path' '
 	) &&
 
 	rm -f heads head untracked &&
-	inspect addtest/slashslashsubmod/frotz ../../.. &&
+	inspect addtest/slashslashsubmod/frotz &&
 	test_cmp expect heads &&
 	test_cmp expect head &&
 	test_must_be_empty untracked
@@ -312,7 +304,7 @@ test_expect_success 'submodule add with /.. in path' '
 	) &&
 
 	rm -f heads head untracked &&
-	inspect addtest/realsubmod ../.. &&
+	inspect addtest/realsubmod &&
 	test_cmp expect heads &&
 	test_cmp expect head &&
 	test_must_be_empty untracked
@@ -328,7 +320,7 @@ test_expect_success 'submodule add with ./, /.. and // in path' '
 	) &&
 
 	rm -f heads head untracked &&
-	inspect addtest/realsubmod2 ../.. &&
+	inspect addtest/realsubmod2 &&
 	test_cmp expect heads &&
 	test_cmp expect head &&
 	test_must_be_empty untracked
@@ -359,7 +351,7 @@ test_expect_success 'submodule add in subdirectory' '
 	) &&
 
 	rm -f heads head untracked &&
-	inspect addtest/realsubmod3 ../.. &&
+	inspect addtest/realsubmod3 &&
 	test_cmp expect heads &&
 	test_cmp expect head &&
 	test_must_be_empty untracked
-- 
2.34.0.rc0.344.g81b53c2807-goog


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH v5 2/4] introduce submodule.superprojectGitDir record
  2021-11-04 23:49 ` [PATCH v5 " Emily Shaffer
  2021-11-04 23:49   ` [PATCH v5 1/4] t7400-submodule-basic: modernize inspect() helper Emily Shaffer
@ 2021-11-04 23:49   ` Emily Shaffer
  2021-11-05  7:50     ` Junio C Hamano
  2021-11-04 23:49   ` [PATCH v5 3/4] submodule: record superproject gitdir during absorbgitdirs Emily Shaffer
                     ` (2 subsequent siblings)
  4 siblings, 1 reply; 31+ messages in thread
From: Emily Shaffer @ 2021-11-04 23:49 UTC (permalink / raw)
  To: git; +Cc: Emily Shaffer, Junio C Hamano

Teach submodules a reference to their superproject's gitdir. This allows
us to A) know that we're running from a submodule, and B) have a
shortcut to the superproject's vitals, for example, configs.

By using a relative path instead of an absolute path, we can move the
superproject directory around on the filesystem without breaking the
submodule's cache. And by using the path from gitdir to gitdir, we can
move the submodule within the superproject's tree structure without
breaking the submodule's cache, too. Finally, by pointing at
"get_git_common_dir()" instead of "get_git_dir()", we ensure the link
will refer to the parent repo, not to a specific worktree.

Since this hint value is only introduced during new submodule creation
via `git submodule add`, though, there is more work to do to allow the
record to be created at other times.

If the new config is present, we can do some optional value-added
behavior, like letting "git status" print additional info about the
submodule's status in relation to its superproject, or like letting the
superproject and submodule share an additional config file separate from
either one's local config.

Signed-off-by: Emily Shaffer <emilyshaffer@google.com>
Helped-by: Junio C Hamano <gitster@pobox.com>
---
 Documentation/config/submodule.txt | 12 +++++++++++
 builtin/submodule--helper.c        |  4 ++++
 t/t7400-submodule-basic.sh         | 32 ++++++++++++++++++++----------
 3 files changed, 38 insertions(+), 10 deletions(-)

diff --git a/Documentation/config/submodule.txt b/Documentation/config/submodule.txt
index ee454f8126..8cc57fe1c1 100644
--- a/Documentation/config/submodule.txt
+++ b/Documentation/config/submodule.txt
@@ -91,3 +91,15 @@ submodule.alternateErrorStrategy::
 	`ignore`, `info`, `die`. Default is `die`. Note that if set to `ignore`
 	or `info`, and if there is an error with the computed alternate, the
 	clone proceeds as if no alternate was specified.
+
+submodule.superprojectGitDir::
+	The relative path from the submodule's gitdir to its superproject's
+	common dir. When Git is run in a repository, it usually makes no
+	difference whether this repository is standalone or a submodule, but if
+	this configuration variable is present, additional behavior may be
+	possible, such as "git status" printing additional information about
+	this submodule's status with respect to its superproject. This config
+	should only be present in projects which are submodules, but is not
+	guaranteed to be present in every submodule, so only optional
+	value-added behavior should be linked to it. It is set automatically
+	during submodule creation.
diff --git a/builtin/submodule--helper.c b/builtin/submodule--helper.c
index 6298cbdd4e..f803812225 100644
--- a/builtin/submodule--helper.c
+++ b/builtin/submodule--helper.c
@@ -1838,6 +1838,10 @@ static int clone_submodule(struct module_clone_data *clone_data)
 		git_config_set_in_file(p, "submodule.alternateErrorStrategy",
 				       error_strategy);
 
+	git_config_set_in_file(p, "submodule.superprojectGitdir",
+			       relative_path(absolute_path(get_git_common_dir()),
+					     sm_gitdir, &sb));
+
 	free(sm_alternate);
 	free(error_strategy);
 
diff --git a/t/t7400-submodule-basic.sh b/t/t7400-submodule-basic.sh
index d69a5c0032..3c20d42fbe 100755
--- a/t/t7400-submodule-basic.sh
+++ b/t/t7400-submodule-basic.sh
@@ -109,12 +109,24 @@ submodurl=$(pwd -P)
 
 inspect() {
 	sub_dir=$1 &&
+	super_dir=$2 &&
 
 	git -C "$sub_dir" for-each-ref --format='%(refname)' 'refs/heads/*' >heads &&
 	{ git -C "$sub_dir" symbolic-ref HEAD || :; } >head &&
 	git -C "$sub_dir" rev-parse HEAD >head-sha1 &&
 	git -C "$sub_dir" update-index --refresh &&
 	git -C "$sub_dir" diff-files --exit-code &&
+
+	# Ensure that submodule.superprojectGitDir contains the path from the
+	# submodule's gitdir to the superproject's gitdir.
+
+	super_abs_gitdir=$(git -C "$super_dir" rev-parse --path-format=absolute --git-common-dir) &&
+	sub_abs_gitdir=$(git -C "$sub_dir" rev-parse --path-format=absolute --git-common-dir) &&
+
+	[ "$(git -C "$sub_dir" config --get submodule.superprojectGitDir)" = \
+	  "$(test-tool path-utils relative_path "$super_abs_gitdir" \
+						"$sub_abs_gitdir")" ] &&
+
 	git -C "$sub_dir" clean -n -d -x >untracked
 }
 
@@ -138,7 +150,7 @@ test_expect_success 'submodule add' '
 	) &&
 
 	rm -f heads head untracked &&
-	inspect addtest/submod &&
+	inspect addtest/submod addtest &&
 	test_cmp expect heads &&
 	test_cmp expect head &&
 	test_must_be_empty untracked
@@ -240,7 +252,7 @@ test_expect_success 'submodule add --branch' '
 	) &&
 
 	rm -f heads head untracked &&
-	inspect addtest/submod-branch &&
+	inspect addtest/submod-branch addtest &&
 	test_cmp expect-heads heads &&
 	test_cmp expect-head head &&
 	test_must_be_empty untracked
@@ -256,7 +268,7 @@ test_expect_success 'submodule add with ./ in path' '
 	) &&
 
 	rm -f heads head untracked &&
-	inspect addtest/dotsubmod/frotz &&
+	inspect addtest/dotsubmod/frotz addtest &&
 	test_cmp expect heads &&
 	test_cmp expect head &&
 	test_must_be_empty untracked
@@ -272,7 +284,7 @@ test_expect_success 'submodule add with /././ in path' '
 	) &&
 
 	rm -f heads head untracked &&
-	inspect addtest/dotslashdotsubmod/frotz &&
+	inspect addtest/dotslashdotsubmod/frotz addtest &&
 	test_cmp expect heads &&
 	test_cmp expect head &&
 	test_must_be_empty untracked
@@ -288,7 +300,7 @@ test_expect_success 'submodule add with // in path' '
 	) &&
 
 	rm -f heads head untracked &&
-	inspect addtest/slashslashsubmod/frotz &&
+	inspect addtest/slashslashsubmod/frotz addtest &&
 	test_cmp expect heads &&
 	test_cmp expect head &&
 	test_must_be_empty untracked
@@ -304,7 +316,7 @@ test_expect_success 'submodule add with /.. in path' '
 	) &&
 
 	rm -f heads head untracked &&
-	inspect addtest/realsubmod &&
+	inspect addtest/realsubmod addtest &&
 	test_cmp expect heads &&
 	test_cmp expect head &&
 	test_must_be_empty untracked
@@ -320,7 +332,7 @@ test_expect_success 'submodule add with ./, /.. and // in path' '
 	) &&
 
 	rm -f heads head untracked &&
-	inspect addtest/realsubmod2 &&
+	inspect addtest/realsubmod2 addtest &&
 	test_cmp expect heads &&
 	test_cmp expect head &&
 	test_must_be_empty untracked
@@ -351,7 +363,7 @@ test_expect_success 'submodule add in subdirectory' '
 	) &&
 
 	rm -f heads head untracked &&
-	inspect addtest/realsubmod3 &&
+	inspect addtest/realsubmod3 addtest &&
 	test_cmp expect heads &&
 	test_cmp expect head &&
 	test_must_be_empty untracked
@@ -492,7 +504,7 @@ test_expect_success 'update should work when path is an empty dir' '
 	git submodule update -q >update.out &&
 	test_must_be_empty update.out &&
 
-	inspect init &&
+	inspect init . &&
 	test_cmp expect head-sha1
 '
 
@@ -551,7 +563,7 @@ test_expect_success 'update should checkout rev1' '
 	echo "$rev1" >expect &&
 
 	git submodule update init &&
-	inspect init &&
+	inspect init . &&
 
 	test_cmp expect head-sha1
 '
-- 
2.34.0.rc0.344.g81b53c2807-goog


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH v5 3/4] submodule: record superproject gitdir during absorbgitdirs
  2021-11-04 23:49 ` [PATCH v5 " Emily Shaffer
  2021-11-04 23:49   ` [PATCH v5 1/4] t7400-submodule-basic: modernize inspect() helper Emily Shaffer
  2021-11-04 23:49   ` [PATCH v5 2/4] introduce submodule.superprojectGitDir record Emily Shaffer
@ 2021-11-04 23:49   ` Emily Shaffer
  2021-11-04 23:49   ` [PATCH v5 4/4] submodule: record superproject gitdir during 'update' Emily Shaffer
  2021-11-08  1:24   ` [PATCH v5 0/4] cache parent project's gitdir in submodules Derrick Stolee
  4 siblings, 0 replies; 31+ messages in thread
From: Emily Shaffer @ 2021-11-04 23:49 UTC (permalink / raw)
  To: git; +Cc: Emily Shaffer

Already during 'git submodule add' we record a pointer to the
superproject's gitdir. However, this doesn't help brand-new
submodules created with 'git init' and later absorbed with 'git
submodule absorbgitdirs'. Let's start adding that pointer during 'git
submodule absorbgitdirs' too.

Signed-off-by: Emily Shaffer <emilyshaffer@google.com>
---
 submodule.c                        |  9 ++++++
 t/t7412-submodule-absorbgitdirs.sh | 50 ++++++++++++++++++++++++++++--
 2 files changed, 57 insertions(+), 2 deletions(-)

diff --git a/submodule.c b/submodule.c
index c689070524..2e54dcf1a2 100644
--- a/submodule.c
+++ b/submodule.c
@@ -2097,6 +2097,7 @@ static void relocate_single_git_dir_into_superproject(const char *path)
 	char *old_git_dir = NULL, *real_old_git_dir = NULL, *real_new_git_dir = NULL;
 	struct strbuf new_gitdir = STRBUF_INIT;
 	const struct submodule *sub;
+	struct strbuf config_path = STRBUF_INIT, sb = STRBUF_INIT;
 
 	if (submodule_uses_worktrees(path))
 		die(_("relocate_gitdir for submodule '%s' with "
@@ -2127,6 +2128,14 @@ static void relocate_single_git_dir_into_superproject(const char *path)
 
 	relocate_gitdir(path, real_old_git_dir, real_new_git_dir);
 
+	/* cache pointer to superproject's gitdir */
+	strbuf_addf(&config_path, "%s/config", real_new_git_dir);
+	git_config_set_in_file(config_path.buf, "submodule.superprojectGitdir",
+			       relative_path(absolute_path(get_git_common_dir()),
+					     real_new_git_dir, &sb));
+
+	strbuf_release(&config_path);
+	strbuf_release(&sb);
 	free(old_git_dir);
 	free(real_old_git_dir);
 	free(real_new_git_dir);
diff --git a/t/t7412-submodule-absorbgitdirs.sh b/t/t7412-submodule-absorbgitdirs.sh
index 1cfa150768..b6f229043d 100755
--- a/t/t7412-submodule-absorbgitdirs.sh
+++ b/t/t7412-submodule-absorbgitdirs.sh
@@ -30,7 +30,17 @@ test_expect_success 'absorb the git dir' '
 	git status >actual.1 &&
 	git -C sub1 rev-parse HEAD >actual.2 &&
 	test_cmp expect.1 actual.1 &&
-	test_cmp expect.2 actual.2
+	test_cmp expect.2 actual.2 &&
+
+	# make sure the submodule cached the superproject gitdir correctly
+	submodule_gitdir="$(git -C sub1 rev-parse --path-format=absolute --git-common-dir)" &&
+	superproject_gitdir="$(git rev-parse --path-format=absolute --git-common-dir)" &&
+
+	test-tool path-utils relative_path "$superproject_gitdir" \
+		"$submodule_gitdir" >expect &&
+	git -C sub1 config submodule.superprojectGitDir >actual &&
+
+	test_cmp expect actual
 '
 
 test_expect_success 'absorbing does not fail for deinitialized submodules' '
@@ -61,7 +71,16 @@ test_expect_success 'absorb the git dir in a nested submodule' '
 	git status >actual.1 &&
 	git -C sub1/nested rev-parse HEAD >actual.2 &&
 	test_cmp expect.1 actual.1 &&
-	test_cmp expect.2 actual.2
+	test_cmp expect.2 actual.2 &&
+
+	sub1_gitdir="$(git -C sub1 rev-parse --path-format=absolute --git-common-dir)" &&
+	sub1_nested_gitdir="$(git -C sub1/nested rev-parse --path-format=absolute --git-common-dir)" &&
+
+	test-tool path-utils relative_path "$sub1_gitdir" "$sub1_nested_gitdir" \
+		>expect &&
+	git -C sub1/nested config submodule.superprojectGitDir >actual &&
+
+	test_cmp expect actual
 '
 
 test_expect_success 're-setup nested submodule' '
@@ -130,4 +149,31 @@ test_expect_success 'absorbing fails for a submodule with multiple worktrees' '
 	test_i18ngrep "not supported" error
 '
 
+test_expect_success 'absorbgitdirs works when called from a superproject worktree' '
+	# set up a worktree of the superproject
+	git worktree add wt &&
+	(
+	cd wt &&
+
+	# create a new unembedded git dir
+	git init sub4 &&
+	test_commit -C sub4 first &&
+	git submodule add ./sub4 &&
+	test_tick &&
+
+	# absorb the git dir
+	git submodule absorbgitdirs sub4 &&
+
+	# make sure the submodule cached the superproject gitdir correctly
+	submodule_gitdir="$(git -C sub4 rev-parse --path-format=absolute --git-common-dir)" &&
+	superproject_gitdir="$(git rev-parse --path-format=absolute --git-common-dir)" &&
+
+	test-tool path-utils relative_path "$superproject_gitdir" \
+		"$submodule_gitdir" >expect &&
+	git -C sub4 config submodule.superprojectGitDir >actual &&
+
+	test_cmp expect actual
+	)
+'
+
 test_done
-- 
2.34.0.rc0.344.g81b53c2807-goog


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH v5 4/4] submodule: record superproject gitdir during 'update'
  2021-11-04 23:49 ` [PATCH v5 " Emily Shaffer
                     ` (2 preceding siblings ...)
  2021-11-04 23:49   ` [PATCH v5 3/4] submodule: record superproject gitdir during absorbgitdirs Emily Shaffer
@ 2021-11-04 23:49   ` Emily Shaffer
  2021-11-05  4:49     ` Junio C Hamano
  2021-11-05  8:51     ` Ævar Arnfjörð Bjarmason
  2021-11-08  1:24   ` [PATCH v5 0/4] cache parent project's gitdir in submodules Derrick Stolee
  4 siblings, 2 replies; 31+ messages in thread
From: Emily Shaffer @ 2021-11-04 23:49 UTC (permalink / raw)
  To: git; +Cc: Emily Shaffer

A recorded hint path to the superproject's gitdir might be added during
'git submodule add', but in some cases - like submodules which were
created before 'git submodule add' learned to record that info - it might
be useful to update the hint. Let's do it during 'git submodule
update', when we already have a handle to the superproject while calling
operations on the submodules.

Signed-off-by: Emily Shaffer <emilyshaffer@google.com>
---
 git-submodule.sh            | 14 ++++++++++++++
 t/t7406-submodule-update.sh | 12 ++++++++++++
 2 files changed, 26 insertions(+)

diff --git a/git-submodule.sh b/git-submodule.sh
index 652861aa66..873d64eb99 100755
--- a/git-submodule.sh
+++ b/git-submodule.sh
@@ -449,6 +449,20 @@ cmd_update()
 			;;
 		esac
 
+		# Cache a pointer to the superproject's common dir. This may have
+		# changed, unless it's a fresh clone. Writes it to worktree
+		# if applicable, otherwise to local.
+		if test -z "$just_cloned"
+		then
+			sm_gitdir="$(git -C "$sm_path" rev-parse --absolute-git-dir)"
+			relative_gitdir="$(git rev-parse --path-format=relative \
+							 --prefix "${sm_gitdir}" \
+							 --git-common-dir)"
+
+			git -C "$sm_path" config --worktree \
+				submodule.superprojectgitdir "$relative_gitdir"
+		fi
+
 		if test -n "$recursive"
 		then
 			(
diff --git a/t/t7406-submodule-update.sh b/t/t7406-submodule-update.sh
index 11cccbb333..5146237abc 100755
--- a/t/t7406-submodule-update.sh
+++ b/t/t7406-submodule-update.sh
@@ -1061,4 +1061,16 @@ test_expect_success 'submodule update --quiet passes quietness to fetch with a s
 	)
 '
 
+test_expect_success 'submodule update adds superproject gitdir to older repos' '
+	(cd super &&
+	 git -C submodule config --unset submodule.superprojectGitdir &&
+	 git submodule update &&
+	 test-tool path-utils relative_path \
+		"$(git rev-parse --path-format=absolute --git-common-dir)" \
+		"$(git -C submodule rev-parse --path-format=absolute --git-common-dir)" >expect &&
+	 git -C submodule config submodule.superprojectGitdir >actual &&
+	 test_cmp expect actual
+	)
+'
+
 test_done
-- 
2.34.0.rc0.344.g81b53c2807-goog


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* Re: [PATCH v5 4/4] submodule: record superproject gitdir during 'update'
  2021-11-04 23:49   ` [PATCH v5 4/4] submodule: record superproject gitdir during 'update' Emily Shaffer
@ 2021-11-05  4:49     ` Junio C Hamano
  2021-11-05  8:43       ` Ævar Arnfjörð Bjarmason
  2021-11-05  8:51     ` Ævar Arnfjörð Bjarmason
  1 sibling, 1 reply; 31+ messages in thread
From: Junio C Hamano @ 2021-11-05  4:49 UTC (permalink / raw)
  To: Emily Shaffer; +Cc: git

Emily Shaffer <emilyshaffer@google.com> writes:

> A recorded hint path to the superproject's gitdir might be added during
> 'git submodule add', but in some cases - like submodules which were
> created before 'git submodule add' learned to record that info - it might
> be useful to update the hint. Let's do it during 'git submodule
> update', when we already have a handle to the superproject while calling
> operations on the submodules.

We are hearing repeated mention of "cache" and "hint".  Do we ever
invalidate it, or if we have such a record, do we blindly trust it
and use it without verifying if it is still fresh?

Also, this step and the previous step both say we record gitdir on
their title, but we instead record common dir.  Whichever is the
right choice to record, let's be consistent.

Thanks.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v5 2/4] introduce submodule.superprojectGitDir record
  2021-11-04 23:49   ` [PATCH v5 2/4] introduce submodule.superprojectGitDir record Emily Shaffer
@ 2021-11-05  7:50     ` Junio C Hamano
  2021-11-08 23:16       ` Emily Shaffer
  0 siblings, 1 reply; 31+ messages in thread
From: Junio C Hamano @ 2021-11-05  7:50 UTC (permalink / raw)
  To: Emily Shaffer; +Cc: git

Emily Shaffer <emilyshaffer@google.com> writes:

> By using a relative path instead of an absolute path, we can move the
> superproject directory around on the filesystem without breaking the
> submodule's cache. And by using the path from gitdir to gitdir, we can
> move the submodule within the superproject's tree structure without
> breaking the submodule's cache, too.

All of the above explains what the design choice is, and what
benefit the chosen design brings us.  But this last one ...

> Finally, by pointing at "get_git_common_dir()" instead of
> "get_git_dir()", we ensure the link will refer to the parent repo,
> not to a specific worktree.

... only says that we choose to point at the common one, not a
specific worktree (i.e. what behaviour was chosen by the design),
but it is left unclear what benefit it is trying to bring us.

Thinking aloud, imagine that there are two worktrees for the
superproject.  For simplicity, let's further imagine that these
worktrees have a clean check-out of the same commit, hence, these
two worktrees have the same commit from the same submodule checked
out at the same location relative to the project root.

The subdirectory in each of these two superproject worktrees that
checks out the submodule has .git file (as we "absorb" the gitdir in
the modern submodule layout) pointing at somewhere.  It probably is
OK if they point at the same place, but it might be more natural if
these two submodule checkouts are two worktrees for a single
submodule repository (this part I am not very clear, and that is why
I labeled the discussion "thinking aloud").

It seems to me that both design choices would have equally valid
arguments for them.  If both of these submodule worktrees point at
the "common" dir of the superproject, because the "common" one is
part of the primary worktree, which is harder to lose than secondary
worktrees, of the superproject, it is presumably harder to go stale
when "worktree rm" is done at superproject, which may be a plus.
But then from the "cached" pointer, each of these submodule
worktrees cannot tell which worktree of the superproject they are
checked out as a part of.  Of course we can go to the root level of
the submodule worktree and do the usual repository discovery to
learn where the root level of the superproject worktree it belongs
to, but it somehow feels that it defeats half the point of having
this "cache" information.

If we instead point at the git-dir, from there, it is just one level
of indirection to find out where the common dir of the superproject
is.

> If the new config is present, we can do some optional value-added
> behavior, like letting "git status" print additional info about the
> submodule's status in relation to its superproject, or like letting the
> superproject and submodule share an additional config file separate from
> either one's local config.

And one value-add that I would think of off the top of my head is to
show "we have commit X checked out, which is 4 commits on top of
what the superproject's index records for this submodule" when "git
status" is run in the submodule's worktree.  I do not know that is
an operation to specifically optimize for, but by choosing to
"cache" the common dir, not the one specific to the worktree of the
superporject our submodule checkout is a part of, the chosen design
seems to make it harder to implement?

Thanks.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v5 4/4] submodule: record superproject gitdir during 'update'
  2021-11-05  4:49     ` Junio C Hamano
@ 2021-11-05  8:43       ` Ævar Arnfjörð Bjarmason
  2021-11-08 23:21         ` Emily Shaffer
  0 siblings, 1 reply; 31+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-11-05  8:43 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Emily Shaffer, git


On Thu, Nov 04 2021, Junio C Hamano wrote:

> Emily Shaffer <emilyshaffer@google.com> writes:
>
>> A recorded hint path to the superproject's gitdir might be added during
>> 'git submodule add', but in some cases - like submodules which were
>> created before 'git submodule add' learned to record that info - it might
>> be useful to update the hint. Let's do it during 'git submodule
>> update', when we already have a handle to the superproject while calling
>> operations on the submodules.
>
> We are hearing repeated mention of "cache" and "hint".  Do we ever
> invalidate it, or if we have such a record, do we blindly trust it
> and use it without verifying if it is still fresh?
>
> Also, this step and the previous step both say we record gitdir on
> their title, but we instead record common dir.  Whichever is the
> right choice to record, let's be consistent.

I had similar (AFAICT still unaddressed) feedback on the v2[1]. I'd lost
track of this series, and see one reason is that the In-Reply-Chain was
broken between v3..v4.

I.e. it seems to me that this whole thing started as a way to avoid
shellscript overhead by calling git-rev-parse from git-submodule.sh, but
now that the relevant bits are moved to C we could just call some
slightly adjusted code in setup.c.

Maybe I'm entirely wrong, but I think if I am that this series has a
commit message gap between the "hint" and "cache" and this really *does*
need to be written at clone/init/update time in some way that I've
missed.

Otherwise I still don't really get it, sorry. I.e. maybe the relevant
code in setup.c really does need caching, or maybe it doesn't anymore,
and this cache-via-config is a hold-over from when figuring out the
parent repo implied needing to shell out somewhere for every operation.

1. https://lore.kernel.org/git/87r1cnfkqx.fsf@evledraar.gmail.com/

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v5 4/4] submodule: record superproject gitdir during 'update'
  2021-11-04 23:49   ` [PATCH v5 4/4] submodule: record superproject gitdir during 'update' Emily Shaffer
  2021-11-05  4:49     ` Junio C Hamano
@ 2021-11-05  8:51     ` Ævar Arnfjörð Bjarmason
  2021-11-08 23:22       ` Emily Shaffer
  1 sibling, 1 reply; 31+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-11-05  8:51 UTC (permalink / raw)
  To: Emily Shaffer; +Cc: git, Atharva Raykar


On Thu, Nov 04 2021, Emily Shaffer wrote:

> A recorded hint path to the superproject's gitdir might be added during
> 'git submodule add', but in some cases - like submodules which were
> created before 'git submodule add' learned to record that info - it might
> be useful to update the hint. Let's do it during 'git submodule
> update', when we already have a handle to the superproject while calling
> operations on the submodules.
>
> Signed-off-by: Emily Shaffer <emilyshaffer@google.com>
> ---
>  git-submodule.sh            | 14 ++++++++++++++
>  t/t7406-submodule-update.sh | 12 ++++++++++++
>  2 files changed, 26 insertions(+)
>
> diff --git a/git-submodule.sh b/git-submodule.sh
> index 652861aa66..873d64eb99 100755
> --- a/git-submodule.sh
> +++ b/git-submodule.sh
> @@ -449,6 +449,20 @@ cmd_update()
>  			;;
>  		esac
>  
> +		# Cache a pointer to the superproject's common dir. This may have
> +		# changed, unless it's a fresh clone. Writes it to worktree
> +		# if applicable, otherwise to local.
> +		if test -z "$just_cloned"
> +		then
> +			sm_gitdir="$(git -C "$sm_path" rev-parse --absolute-git-dir)"
> +			relative_gitdir="$(git rev-parse --path-format=relative \
> +							 --prefix "${sm_gitdir}" \
> +							 --git-common-dir)"
> +
> +			git -C "$sm_path" config --worktree \
> +				submodule.superprojectgitdir "$relative_gitdir"
> +		fi
> +

Aside from the "is this really needed anymore?" comment on this caching
strategy in general I had in [1] does this really need to be adding new
shell code to git-submodule.sh given that we're actively trying to get
rid of it entirely and move it over to C.

I.e. here we've just called "git submodule--helper
run-update-procedure", and we pass it "$sm_path" (but not "$recursive",
but could easily do so).

It needs to clone this new submodule, so presumably it already has a
"$sm_gitdir" equivalent, and we can turn that into the same relative
path, no?

Can't we call this with a git_config_set*() somewhere in that code?

*investigates a bit*

Okey, so I see that [2] is part of a series that Atharva Raykar had a
version of (including this new git-submodule.sh code above) [3] rebased
on top of an older version of this topic. I.e. this bit is that part of that patch:

+       git_config_set_in_file(config_path, "submodule.superprojectGitdir",
+                              relative_path(get_git_dir(),
+                                            update_data->sm_path, &sb));

I also vaguely recall that Junio ejected his topic to make room for this
topic first.

Maybe I've missed some update on this but is his [2][3] broken in
combination with your topic here? And re[1] is whatever "caching" is
being done here still beneficial once this is all moved to C?

In your [4] there seems to be an agreement to do it the other way
around, but as noted in the mail linked from [1] maybe the caching isn't
needed anymore then?

1. https://lore.kernel.org/git/211105.861r3vhtot.gmgdl@evledraar.gmail.com/
2. https://lore.kernel.org/git/20211013051805.45662-8-raykar.ath@gmail.com/
3. https://lore.kernel.org/git/20211013051805.45662-1-raykar.ath@gmail.com/
4. https://lore.kernel.org/git/YWiXL+plA7GHfuVv@google.com/

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v4 3/4] submodule: record superproject gitdir during absorbgitdirs
  2021-11-04 23:22       ` Emily Shaffer
@ 2021-11-08  1:07         ` Derrick Stolee
  0 siblings, 0 replies; 31+ messages in thread
From: Derrick Stolee @ 2021-11-08  1:07 UTC (permalink / raw)
  To: Emily Shaffer; +Cc: Jonathan Tan, git

On 11/4/2021 7:22 PM, Emily Shaffer wrote:
> On Mon, Oct 25, 2021 at 12:14:07PM -0400, Derrick Stolee wrote:
>> Since you're setting this config value inside the submodule's config,
>> what does it mean for a submodule to also be a worktree (and hence
>> require config.worktree)? What happens in this rough scenario?
>>
>>   1. git init sub
>>   2. git init super
>>   3. git -C sub worktree add super/sub
>>   4. git -C super submodule absorbgitdir sub
>>
>> I haven't actually tried running these things, but it seems unusual
>> and unexpected. This doesn't even account for cases where the repo
>> root and a worktree are both submodules within the superproject.
>>
>> If we already have protections preventing these worktrees as
>> submodules, then perhaps there is no need for work here. I'm not
>> familiar enough with the area to make a claim one way or another.
> 
> Yeah, I think there is actually a test case covering this in t7412:
...
> 147 test_expect_success 'absorbing fails for a submodule with multiple worktrees' '
> 148         test_must_fail git submodule absorbgitdirs sub3 2>error &&
> 149         test_i18ngrep "not supported" error
> 150 '
> 
> That is, I think because 'sub/' in your scenario above has multiple
> worktrees, the absorbgitdirs will fail. So I won't do additional work
> here. Thanks.

Good to hear. Thanks for tracking that down.

-Stolee

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v5 0/4] cache parent project's gitdir in submodules
  2021-11-04 23:49 ` [PATCH v5 " Emily Shaffer
                     ` (3 preceding siblings ...)
  2021-11-04 23:49   ` [PATCH v5 4/4] submodule: record superproject gitdir during 'update' Emily Shaffer
@ 2021-11-08  1:24   ` Derrick Stolee
  2021-11-08 23:11     ` Emily Shaffer
  4 siblings, 1 reply; 31+ messages in thread
From: Derrick Stolee @ 2021-11-08  1:24 UTC (permalink / raw)
  To: Emily Shaffer, git
  Cc: Albert Cui, Phillip Wood, Johannes Schindelin,
	Ævar Arnfjörð Bjarmason, Junio C Hamano,
	Matheus Tavares Bernardino, Jonathan Nieder, Jacob Keller,
	Atharva Raykar, Jonathan Tan

On 11/4/2021 7:49 PM, Emily Shaffer wrote:

> The only real change here is a slight semantics change to map from
> <submodule gitdir> to <superproject common git dir>. In every case
> *except* for when the superproject has a worktree, this changes nothing.
> For the case when the superproject has a worktree, this means that now
> submodules will refer to the general superproject common dir (e.g. no
> worktree-specific refs or configs or whatnot).
> 
> I *think* that because a submodule should exist in the context of the
> common dir, not the worktree gitdir, that is ok. However, it does mean
> it would be difficult to do something like sharing a config specific to
> the worktree (the initial goal of this series).
> 
> $ROOT/.git
> $ROOT/.git/config.superproject <- shared by $ROOT/.git/modules/sub
> $ROOT/.git/modules/sub <- points to $ROOT/.git
> $ROOT/.git/worktrees/wt
> $ROOT/.git/worktrees/wt/config.superproject <- contains a certain config-based pre-commit hook
> 
> If the submodule only knows about the common dir, that is tough, because
> the submodule would basically have to guess which worktree it's in from
> its own path. There would be no way for '$WT/sub' to inherit
> '$ROOT/.git/worktrees/wt/config.superproject'.
> 
> That said... right now, we don't support submodules in worktrees very
> well at all. A submodule in a worktree will get a brand new gitdir in
> $ROOT/.git/worktrees/modules/ (and that brand new gitdir would point to
> the super's common dir). So I think we can punt on this entire question
> until we teach submodules and worktrees to play more gracefully together
> (it's on my long list...),

(I omit a portion that will be discussed later.)

> Or, to summarize the long ramble above: "this is still kind of weird
> with worktrees, but let's fix it later when we fix worktrees more
> thoroughly".

I'm concerned about punting here, because making a messy situation worse
is unlikely to have a clean way out. Could we set up a design that works
with superproject worktrees?

You mentioned that submodules cannot have worktrees. At least, you said
that 'absorbgitdirs' does not allow them. Could those subprojects still
exist and be registered as submodules without using that command?

What I'm trying to hint at is that if the submodules can't have
worktrees, then maybe we could make their 'config.worktree' files be
relative to the superproject worktrees. Then, these submodules could
point to the commondir in their base config and _also_ to the worktree
gitdir in their config.worktree.

The issue that is immediately obvious here is that my definition is
circular: we need to know the superproject worktree in order to discover
the config.worktree which contains the information about the superproject
worktree.

> and at that time we can probably introduce a
> pointer from $ROOT/.git/modules/sub/worktrees/wt/ to
> $ROOT/.git/worktrees/wt/....

Your idea here appears to assume that if the superproject has worktrees,
then the submodule is divided into worktrees in an exact correspondence.
This would allow the submodule's config.worktree to point to the
superproject's worktree (or possibly it could be inferred from the
submodule's worktree relative to the submodule's commondir).

This seems like an interesting way forward, but requires changing how
'git absorbgitdirs' works, along with changes to 'git worktree' or other
submodule commands when the submodule first appears during a 'git checkout'
in a worktree. I imagine there are a lot of "gotchas" here. It is worth
spending some time imagining how to create this setup and/or enforce it
as submodules are added in the lifecycle of a repository, if only to
validate the config design presented by this series.

Thanks,
-Stolee

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v5 0/4] cache parent project's gitdir in submodules
  2021-11-08  1:24   ` [PATCH v5 0/4] cache parent project's gitdir in submodules Derrick Stolee
@ 2021-11-08 23:11     ` Emily Shaffer
  2021-11-09 15:58       ` Derrick Stolee
  0 siblings, 1 reply; 31+ messages in thread
From: Emily Shaffer @ 2021-11-08 23:11 UTC (permalink / raw)
  To: Derrick Stolee
  Cc: git, Albert Cui, Phillip Wood, Johannes Schindelin,
	Ævar Arnfjörð Bjarmason, Junio C Hamano,
	Matheus Tavares Bernardino, Jonathan Nieder, Jacob Keller,
	Atharva Raykar, Jonathan Tan

On Sun, Nov 07, 2021 at 08:24:43PM -0500, Derrick Stolee wrote:
> 
> On 11/4/2021 7:49 PM, Emily Shaffer wrote:
> 
> > The only real change here is a slight semantics change to map from
> > <submodule gitdir> to <superproject common git dir>. In every case
> > *except* for when the superproject has a worktree, this changes nothing.
> > For the case when the superproject has a worktree, this means that now
> > submodules will refer to the general superproject common dir (e.g. no
> > worktree-specific refs or configs or whatnot).
> > 
> > I *think* that because a submodule should exist in the context of the
> > common dir, not the worktree gitdir, that is ok. However, it does mean
> > it would be difficult to do something like sharing a config specific to
> > the worktree (the initial goal of this series).
> > 
> > $ROOT/.git
> > $ROOT/.git/config.superproject <- shared by $ROOT/.git/modules/sub
> > $ROOT/.git/modules/sub <- points to $ROOT/.git
> > $ROOT/.git/worktrees/wt
> > $ROOT/.git/worktrees/wt/config.superproject <- contains a certain config-based pre-commit hook
> > 
> > If the submodule only knows about the common dir, that is tough, because
> > the submodule would basically have to guess which worktree it's in from
> > its own path. There would be no way for '$WT/sub' to inherit
> > '$ROOT/.git/worktrees/wt/config.superproject'.
> > 
> > That said... right now, we don't support submodules in worktrees very
> > well at all. A submodule in a worktree will get a brand new gitdir in
> > $ROOT/.git/worktrees/modules/ (and that brand new gitdir would point to
> > the super's common dir). So I think we can punt on this entire question
> > until we teach submodules and worktrees to play more gracefully together
> > (it's on my long list...),
> 
> (I omit a portion that will be discussed later.)
> 
> > Or, to summarize the long ramble above: "this is still kind of weird
> > with worktrees, but let's fix it later when we fix worktrees more
> > thoroughly".
> 
> I'm concerned about punting here, because making a messy situation worse
> is unlikely to have a clean way out. Could we set up a design that works
> with superproject worktrees?
> 
> You mentioned that submodules cannot have worktrees. At least, you said
> that 'absorbgitdirs' does not allow them. Could those subprojects still
> exist and be registered as submodules without using that command?
> 
> What I'm trying to hint at is that if the submodules can't have
> worktrees, then maybe we could make their 'config.worktree' files be
> relative to the superproject worktrees. Then, these submodules could
> point to the commondir in their base config and _also_ to the worktree
> gitdir in their config.worktree.
> 
> The issue that is immediately obvious here is that my definition is
> circular: we need to know the superproject worktree in order to discover
> the config.worktree which contains the information about the superproject
> worktree.
> 
> > and at that time we can probably introduce a
> > pointer from $ROOT/.git/modules/sub/worktrees/wt/ to
> > $ROOT/.git/worktrees/wt/....
> 
> Your idea here appears to assume that if the superproject has worktrees,
> then the submodule is divided into worktrees in an exact correspondence.
> This would allow the submodule's config.worktree to point to the
> superproject's worktree (or possibly it could be inferred from the
> submodule's worktree relative to the submodule's commondir).
> 
> This seems like an interesting way forward, but requires changing how
> 'git absorbgitdirs' works, along with changes to 'git worktree' or other
> submodule commands when the submodule first appears during a 'git checkout'
> in a worktree. I imagine there are a lot of "gotchas" here. It is worth
> spending some time imagining how to create this setup and/or enforce it
> as submodules are added in the lifecycle of a repository, if only to
> validate the config design presented by this series.

Yeah, I think we may be overthinking it, especially with the concerns
about common dir vs. gitdir. More specifically - I think we accidentally
did the right thing in the previous iteration by using the gitdir :)

I think we can probably put it pretty simply:
submodule.superprojectGitDir should point from the most local gitdir of
the submodule to the most local gitdir of the superproject.

Luckily there are not so many permutations to worry about here.

Super doesn't have worktrees, sub doesn't have worktrees:
.git/
  modules/
    sub/
      config <- contains a pointer to ".git/"
  config

Super doesn't have worktrees, sub does have worktrees (and as you
suggest above, right now this would have to be created carefully and
manually, but later we probably want this to Just Work):
.git/
  modules/
    sub/
      config
      config.worktree <- contains a pointer to ".git/"
      worktrees/
        sub-wt/
	config <- contains a pointer to ".git/"
  config

Super has worktrees, sub doesn't have worktrees:
Actually, I think in the future this might not be possible, if we want
to make `git worktree add --recurse-submodules` work gracefully (and I
do want that). But in the interim, in practice it looks like this:
.git/
  modules/
    sub/
      config <- contains a pointer to ".git/"
  worktrees/
    super-wt/
      modules/
        sub/
	  config <- contains a pointer to ".git/worktrees/super-wt"
      config
  config
This case is pretty weird anyway, because in order for a submodule to be
present in multiple worktrees of the superproject, the submodule itself
needs to either have multiple worktrees or multiple repos. But on the
flip side, today it's impossible for a single submodule gitdir to need
to know about more superproject worktrees than the one it was
initted/whatever into.

Both super and sub have worktrees:
And this won't exist until we have graceful support of `git worktree add
--recurse-submodules` or with some manual effort, now.
.git/
  modules/
    sub/
      worktrees/
        sub-wt/
	  config <- contains a pointer to ".git/worktrees/super-wt/"
      config
      config.worktree <- contains a pointer to ".git/"
  worktrees/
    super-wt/
      config
  config
  config.worktree

I think this will give us access to both the worktree gitdir *and* the
common gitdir:

  ~/git/.git/worktrees/git-second [GIT_DIR!]$ git rev-parse --git-common-dir
  /usr/local/google/home/emilyshaffer/git/.git

So that means from any submodule, we can determine:
 - submodule's gitdir (from the .git link in the submodule wt)
 - submodule's common dir (from existing commands)
 - gitdir of superproject which submodule inhabits (from the config in
   the submodule's gitdir, or the submodule's config.worktree)
 - common dir of superproject (from existing commands + prior config)

The upshot to me, then, means that we should be 1) making sure to get
the path to the gitdir, not the common dir, of the superproject; and 2)
using helpers to write to the worktree config, not to the local config,
of the submodule. In other words, we want to avoid the following:

.git/
  modules/
    sub/
      worktrees/
        wt/
	  config
      config <- "submodule.superprojectGitDir = ../../../.." as written by the worktree

Will take a look at the rest of the comments too, but this sounds like a
reasonable approach to me.

 - Emily

> 
> Thanks,
> -Stolee

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v5 2/4] introduce submodule.superprojectGitDir record
  2021-11-05  7:50     ` Junio C Hamano
@ 2021-11-08 23:16       ` Emily Shaffer
  0 siblings, 0 replies; 31+ messages in thread
From: Emily Shaffer @ 2021-11-08 23:16 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

On Fri, Nov 05, 2021 at 12:50:22AM -0700, Junio C Hamano wrote:
> 
> Emily Shaffer <emilyshaffer@google.com> writes:
> 
> > By using a relative path instead of an absolute path, we can move the
> > superproject directory around on the filesystem without breaking the
> > submodule's cache. And by using the path from gitdir to gitdir, we can
> > move the submodule within the superproject's tree structure without
> > breaking the submodule's cache, too.
> 
> All of the above explains what the design choice is, and what
> benefit the chosen design brings us.  But this last one ...
> 
> > Finally, by pointing at "get_git_common_dir()" instead of
> > "get_git_dir()", we ensure the link will refer to the parent repo,
> > not to a specific worktree.
> 
> ... only says that we choose to point at the common one, not a
> specific worktree (i.e. what behaviour was chosen by the design),
> but it is left unclear what benefit it is trying to bring us.
> 
> Thinking aloud, imagine that there are two worktrees for the
> superproject.  For simplicity, let's further imagine that these
> worktrees have a clean check-out of the same commit, hence, these
> two worktrees have the same commit from the same submodule checked
> out at the same location relative to the project root.
> 
> The subdirectory in each of these two superproject worktrees that
> checks out the submodule has .git file (as we "absorb" the gitdir in
> the modern submodule layout) pointing at somewhere.  It probably is
> OK if they point at the same place, but it might be more natural if
> these two submodule checkouts are two worktrees for a single
> submodule repository (this part I am not very clear, and that is why
> I labeled the discussion "thinking aloud").
> 
> It seems to me that both design choices would have equally valid
> arguments for them.  If both of these submodule worktrees point at
> the "common" dir of the superproject, because the "common" one is
> part of the primary worktree, which is harder to lose than secondary
> worktrees, of the superproject, it is presumably harder to go stale
> when "worktree rm" is done at superproject, which may be a plus.
> But then from the "cached" pointer, each of these submodule
> worktrees cannot tell which worktree of the superproject they are
> checked out as a part of.  Of course we can go to the root level of
> the submodule worktree and do the usual repository discovery to
> learn where the root level of the superproject worktree it belongs
> to, but it somehow feels that it defeats half the point of having
> this "cache" information.
> 
> If we instead point at the git-dir, from there, it is just one level
> of indirection to find out where the common dir of the superproject
> is.
> 
> > If the new config is present, we can do some optional value-added
> > behavior, like letting "git status" print additional info about the
> > submodule's status in relation to its superproject, or like letting the
> > superproject and submodule share an additional config file separate from
> > either one's local config.
> 
> And one value-add that I would think of off the top of my head is to
> show "we have commit X checked out, which is 4 commits on top of
> what the superproject's index records for this submodule" when "git
> status" is run in the submodule's worktree.  I do not know that is
> an operation to specifically optimize for, but by choosing to
> "cache" the common dir, not the one specific to the worktree of the
> superporject our submodule checkout is a part of, the chosen design
> seems to make it harder to implement?

Yeah, that is compelling. Sorry not to reply in depth, but I am
convinced.

 - Emily

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v5 4/4] submodule: record superproject gitdir during 'update'
  2021-11-05  8:43       ` Ævar Arnfjörð Bjarmason
@ 2021-11-08 23:21         ` Emily Shaffer
  2021-11-09  0:42           ` Ævar Arnfjörð Bjarmason
  0 siblings, 1 reply; 31+ messages in thread
From: Emily Shaffer @ 2021-11-08 23:21 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason; +Cc: Junio C Hamano, git

On Fri, Nov 05, 2021 at 09:43:56AM +0100, Ævar Arnfjörð Bjarmason wrote:
> 
> 
> On Thu, Nov 04 2021, Junio C Hamano wrote:
> 
> > Emily Shaffer <emilyshaffer@google.com> writes:
> >
> >> A recorded hint path to the superproject's gitdir might be added during
> >> 'git submodule add', but in some cases - like submodules which were
> >> created before 'git submodule add' learned to record that info - it might
> >> be useful to update the hint. Let's do it during 'git submodule
> >> update', when we already have a handle to the superproject while calling
> >> operations on the submodules.
> >
> > We are hearing repeated mention of "cache" and "hint".  Do we ever
> > invalidate it, or if we have such a record, do we blindly trust it
> > and use it without verifying if it is still fresh?
> >
> > Also, this step and the previous step both say we record gitdir on
> > their title, but we instead record common dir.  Whichever is the
> > right choice to record, let's be consistent.
> 
> I had similar (AFAICT still unaddressed) feedback on the v2[1]. I'd lost
> track of this series, and see one reason is that the In-Reply-Chain was
> broken between v3..v4.
> 
> I.e. it seems to me that this whole thing started as a way to avoid
> shellscript overhead by calling git-rev-parse from git-submodule.sh, but
> now that the relevant bits are moved to C we could just call some
> slightly adjusted code in setup.c.

No, that is not the case. It is the case that `git -C .. rev-parse
--git-dir` is *very* expensive in the case where `../` is not, in fact,
a gitdir; when I attempted another series which relied on finding the
parent superproject's gitdir in this way, our testsuite took something
like 5x longer to run than before. In other words, the expensive part is
not the shelling out overhead - the expensive part is searching up the
entire filesystem directory structure in the worst-case ("we are not a
submodule") scenario. This is still needed, even with 'git-submodule.sh'
moving to C.

> 
> Maybe I'm entirely wrong, but I think if I am that this series has a
> commit message gap between the "hint" and "cache" and this really *does*
> need to be written at clone/init/update time in some way that I've
> missed.
> 
> Otherwise I still don't really get it, sorry. I.e. maybe the relevant
> code in setup.c really does need caching, or maybe it doesn't anymore,
> and this cache-via-config is a hold-over from when figuring out the
> parent repo implied needing to shell out somewhere for every operation.
> 
> 1. https://lore.kernel.org/git/87r1cnfkqx.fsf@evledraar.gmail.com/

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v5 4/4] submodule: record superproject gitdir during 'update'
  2021-11-05  8:51     ` Ævar Arnfjörð Bjarmason
@ 2021-11-08 23:22       ` Emily Shaffer
  2021-11-09  1:12         ` Ævar Arnfjörð Bjarmason
  0 siblings, 1 reply; 31+ messages in thread
From: Emily Shaffer @ 2021-11-08 23:22 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason; +Cc: git, Atharva Raykar

On Fri, Nov 05, 2021 at 09:51:12AM +0100, Ævar Arnfjörð Bjarmason wrote:
> 
> 
> On Thu, Nov 04 2021, Emily Shaffer wrote:
> 
> > A recorded hint path to the superproject's gitdir might be added during
> > 'git submodule add', but in some cases - like submodules which were
> > created before 'git submodule add' learned to record that info - it might
> > be useful to update the hint. Let's do it during 'git submodule
> > update', when we already have a handle to the superproject while calling
> > operations on the submodules.
> >
> > Signed-off-by: Emily Shaffer <emilyshaffer@google.com>
> > ---
> >  git-submodule.sh            | 14 ++++++++++++++
> >  t/t7406-submodule-update.sh | 12 ++++++++++++
> >  2 files changed, 26 insertions(+)
> >
> > diff --git a/git-submodule.sh b/git-submodule.sh
> > index 652861aa66..873d64eb99 100755
> > --- a/git-submodule.sh
> > +++ b/git-submodule.sh
> > @@ -449,6 +449,20 @@ cmd_update()
> >  			;;
> >  		esac
> >  
> > +		# Cache a pointer to the superproject's common dir. This may have
> > +		# changed, unless it's a fresh clone. Writes it to worktree
> > +		# if applicable, otherwise to local.
> > +		if test -z "$just_cloned"
> > +		then
> > +			sm_gitdir="$(git -C "$sm_path" rev-parse --absolute-git-dir)"
> > +			relative_gitdir="$(git rev-parse --path-format=relative \
> > +							 --prefix "${sm_gitdir}" \
> > +							 --git-common-dir)"
> > +
> > +			git -C "$sm_path" config --worktree \
> > +				submodule.superprojectgitdir "$relative_gitdir"
> > +		fi
> > +
> 
> Aside from the "is this really needed anymore?" comment on this caching
> strategy in general I had in [1] does this really need to be adding new
> shell code to git-submodule.sh given that we're actively trying to get
> rid of it entirely and move it over to C.
> 
> I.e. here we've just called "git submodule--helper
> run-update-procedure", and we pass it "$sm_path" (but not "$recursive",
> but could easily do so).
> 
> It needs to clone this new submodule, so presumably it already has a
> "$sm_gitdir" equivalent, and we can turn that into the same relative
> path, no?
> 
> Can't we call this with a git_config_set*() somewhere in that code?
> 
> *investigates a bit*
> 
> Okey, so I see that [2] is part of a series that Atharva Raykar had a
> version of (including this new git-submodule.sh code above) [3] rebased
> on top of an older version of this topic. I.e. this bit is that part of that patch:
> 
> +       git_config_set_in_file(config_path, "submodule.superprojectGitdir",
> +                              relative_path(get_git_dir(),
> +                                            update_data->sm_path, &sb));
> 
> I also vaguely recall that Junio ejected his topic to make room for this
> topic first.
> 
> Maybe I've missed some update on this but is his [2][3] broken in
> combination with your topic here? And re[1] is whatever "caching" is
> being done here still beneficial once this is all moved to C?
> 
> In your [4] there seems to be an agreement to do it the other way
> around, but as noted in the mail linked from [1] maybe the caching isn't
> needed anymore then?

I answered vs. your other mail; yes, it's still needed, and last I
checked Atharva's series had been kicked out to make room for this one.

> 
> 1. https://lore.kernel.org/git/211105.861r3vhtot.gmgdl@evledraar.gmail.com/
> 2. https://lore.kernel.org/git/20211013051805.45662-8-raykar.ath@gmail.com/
> 3. https://lore.kernel.org/git/20211013051805.45662-1-raykar.ath@gmail.com/
> 4. https://lore.kernel.org/git/YWiXL+plA7GHfuVv@google.com/

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v5 4/4] submodule: record superproject gitdir during 'update'
  2021-11-08 23:21         ` Emily Shaffer
@ 2021-11-09  0:42           ` Ævar Arnfjörð Bjarmason
  2021-11-09 20:36             ` Emily Shaffer
  0 siblings, 1 reply; 31+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-11-09  0:42 UTC (permalink / raw)
  To: Emily Shaffer; +Cc: Junio C Hamano, git


On Mon, Nov 08 2021, Emily Shaffer wrote:

> On Fri, Nov 05, 2021 at 09:43:56AM +0100, Ævar Arnfjörð Bjarmason wrote:
>> 
>> 
>> On Thu, Nov 04 2021, Junio C Hamano wrote:
>> 
>> > Emily Shaffer <emilyshaffer@google.com> writes:
>> >
>> >> A recorded hint path to the superproject's gitdir might be added during
>> >> 'git submodule add', but in some cases - like submodules which were
>> >> created before 'git submodule add' learned to record that info - it might
>> >> be useful to update the hint. Let's do it during 'git submodule
>> >> update', when we already have a handle to the superproject while calling
>> >> operations on the submodules.
>> >
>> > We are hearing repeated mention of "cache" and "hint".  Do we ever
>> > invalidate it, or if we have such a record, do we blindly trust it
>> > and use it without verifying if it is still fresh?
>> >
>> > Also, this step and the previous step both say we record gitdir on
>> > their title, but we instead record common dir.  Whichever is the
>> > right choice to record, let's be consistent.
>> 
>> I had similar (AFAICT still unaddressed) feedback on the v2[1]. I'd lost
>> track of this series, and see one reason is that the In-Reply-Chain was
>> broken between v3..v4.
>> 
>> I.e. it seems to me that this whole thing started as a way to avoid
>> shellscript overhead by calling git-rev-parse from git-submodule.sh, but
>> now that the relevant bits are moved to C we could just call some
>> slightly adjusted code in setup.c.
>
> No, that is not the case. It is the case that `git -C .. rev-parse
> --git-dir` is *very* expensive in the case where `../` is not, in fact,
> a gitdir; when I attempted another series which relied on finding the
> parent superproject's gitdir in this way, our testsuite took something
> like 5x longer to run than before. In other words, the expensive part is
> not the shelling out overhead - the expensive part is searching up the
> entire filesystem directory structure in the worst-case ("we are not a
> submodule") scenario. This is still needed, even with 'git-submodule.sh'
> moving to C.

Do you have that test code somewhere? I tried to reproduce this &
can't. I run my tests in /home/avar/*, and just created this:

    $ find /tmp/some/ -name '.git' -type d
    /tmp/some/dir/.git
    /tmp/some/dir/a/b/c/d/e/f/g/i/j/k/.git

I.e. a deeply nested structure in /tmp, if you ask for the git-dir in
/tmp/some/**/k you'll need to search several levels up.

Then with the patch below we'll instrument almost all git commands to
optionally do that search, i.e. anything that does setup_git_directory()
at all:

    $ GIT_TEST_SETUP=true GIT_TEST_SETUP_PRINT=true ~/g/git/git rev-parse HEAD
    warning: from '/tmp/some/dir/a/b/c/d/e/f/g/i/j' found '/tmp/some/dir' ('/tmp/some/dir/.git')
    <some hash here>

And as a quick test to run a few tests I tried:

    rm -rf test-results/; GIT_TEST_SETUP=true GIT_TEST_SETUP_PRINT=true prove -j8 t741*submod*.sh :: -V

Which runs quickly enough for a tight test loop, and does that work >600 times:
    
    $ cat test-results/*.out|grep -c warning.*from.*found
    662

I can't get that to show me any meaningful difference, just to pick on
one test (since it was easier to run repeatedly):
    
    $ hyperfine -L v true,false "GIT_TEST_SETUP={v} ./t7416-submodule-dash-url.sh --root=/dev/shm/git"
    Benchmark #1: GIT_TEST_SETUP=true ./t7416-submodule-dash-url.sh --root=/dev/shm/git
      Time (mean ± σ):     527.5 ms ±   7.2 ms    [User: 431.6 ms, System: 125.9 ms]
      Range (min … max):   522.6 ms … 542.5 ms    10 runs
     
    Benchmark #2: GIT_TEST_SETUP=false ./t7416-submodule-dash-url.sh --root=/dev/shm/git
      Time (mean ± σ):     526.7 ms ±  10.8 ms    [User: 421.1 ms, System: 131.6 ms]
      Range (min … max):   518.2 ms … 546.8 ms    10 runs
     
    Summary
      'GIT_TEST_SETUP=false ./t7416-submodule-dash-url.sh --root=/dev/shm/git' ran
        1.00 ± 0.02 times faster than 'GIT_TEST_SETUP=true ./t7416-submodule-dash-url.sh --root=/dev/shm/git'

I.e. it's all fuzzy and within the error margins.

Now if we do e.g.:

    GIT_TEST_SETUP=false strace -f -c -U calls,name,time -S calls ./t7416-submodule-dash-url.sh 2>&1 >/dev/null | grep -A9000 calls >a
    GIT_TEST_SETUP=true strace -f -c -U calls,name,time -S calls ./t7416-submodule-dash-url.sh 2>&1 >/dev/null | grep -A9000 calls >b

We'll see the syscall difference, in summary:

    -   110086 total            100.00
    +   114765 total            100.00

And some of the real big differences are:
    
    $ diff -u <(head -n 12 a) <(head -n 12 b)
    --- /dev/fd/63  2021-11-09 01:57:16.023991556 +0100
    +++ /dev/fd/62  2021-11-09 01:57:16.019991593 +0100
    @@ -1,12 +1,12 @@
         calls syscall          % time
     --------- ---------------- ------
    -    11504 openat             2.15
    -    11496 close              2.07
    -    10672 read               4.15
    -    10465 rt_sigaction       0.32
    -     9913 lstat              1.50
    -     9456 mmap               0.67
    -     7545 stat               0.81
    -     6349 fstat              0.62
    -     3896 access             0.61
    -     3490 mprotect           0.26
    +    11783 lstat              1.80
    +    11600 close              1.89
    +    11599 openat             2.19
    +    10887 read               4.36
    +    10465 rt_sigaction       0.33
    +     9742 stat               1.07
    +     9455 mmap               0.75
    +     6346 fstat              0.57
    +     4113 access             0.65
    +     3490 mprotect           0.28

But syscalls are fast, so it doesn't show up in real results.

Now, of course a real implementation could be less stupid, e.g. even if
we think we need *a cache* if these are the performance numbers why do
we need to risk the cache being incorrect v.s. say just writing "I am a
submodule" somewhere in the config (if we don't have that).

Then we'll only do that work for submodules, so not all git invocations
will pay the cost (and it this point we'll usually have read config
anyway).

But I really just don't think it's that expensive at all. I can see how
it would be for actually shelling out, but we don't need to do that.

This could also just be that I'm running this on a really fast FS, which
is true. So I went and tested on an AIX machine I have access to.

I/O on AIX is slow, *really slow*, so slow that if you "rm -rfv"
something you'll have time to read individual lines scrolling by.

That ~500ms t7416-submodule-dash-url.sh test runs in around 50s on that
AIX box (power-aix.osuosl.org), most of which is I/O overhead, I created
the same /tmp/ directory structure and tried with
GIT_TEST_SETUP=[false|true] and it's ~55s with/without the env variable,
with no clear winner.

I don't have access to hyperfine on that box, or the patience to wait
for AIX I/O to wait for meaningful results, but to a first approximation
that seems to indicate that it doesn't really matter there either.

diff --git a/setup.c b/setup.c
index 347d7181ae9..8453d397676 100644
--- a/setup.c
+++ b/setup.c
@@ -1209,6 +1209,8 @@ const char *setup_git_directory_gently(int *nongit_ok)
        struct strbuf dir = STRBUF_INIT, gitdir = STRBUF_INIT;
        const char *prefix = NULL;
        struct repository_format repo_fmt = REPOSITORY_FORMAT_INIT;
+       const char *str = "/tmp/some/dir/a/b/c/d/e/f/g/i/j";
+       struct strbuf a = STRBUF_INIT, b = STRBUF_INIT;
 
        /*
         * We may have read an incomplete configuration before
@@ -1231,6 +1233,14 @@ const char *setup_git_directory_gently(int *nongit_ok)
                die_errno(_("Unable to read current working directory"));
        strbuf_addbuf(&dir, &cwd);
 
+       if (git_env_bool("GIT_TEST_SETUP", 0)) {
+               strbuf_addstr(&a, str);
+               setup_git_directory_gently_1(&a, &b, 0);
+
+               if (strcmp(a.buf, str) && git_env_bool("GIT_TEST_SETUP_PRINT", 0))
+                       warning("from '%s' found '%s' ('%s/%s')", str, a.buf, a.buf, b.buf);
+       }
+
        switch (setup_git_directory_gently_1(&dir, &gitdir, 1)) {
        case GIT_DIR_EXPLICIT:
                prefix = setup_explicit_git_dir(gitdir.buf, &cwd, &repo_fmt, nongit_ok);


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* Re: [PATCH v5 4/4] submodule: record superproject gitdir during 'update'
  2021-11-08 23:22       ` Emily Shaffer
@ 2021-11-09  1:12         ` Ævar Arnfjörð Bjarmason
  0 siblings, 0 replies; 31+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-11-09  1:12 UTC (permalink / raw)
  To: Emily Shaffer; +Cc: git, Atharva Raykar


On Mon, Nov 08 2021, Emily Shaffer wrote:

> On Fri, Nov 05, 2021 at 09:51:12AM +0100, Ævar Arnfjörð Bjarmason wrote:
>> 
>> 
>> On Thu, Nov 04 2021, Emily Shaffer wrote:
>> 
>> > A recorded hint path to the superproject's gitdir might be added during
>> > 'git submodule add', but in some cases - like submodules which were
>> > created before 'git submodule add' learned to record that info - it might
>> > be useful to update the hint. Let's do it during 'git submodule
>> > update', when we already have a handle to the superproject while calling
>> > operations on the submodules.
>> >
>> > Signed-off-by: Emily Shaffer <emilyshaffer@google.com>
>> > ---
>> >  git-submodule.sh            | 14 ++++++++++++++
>> >  t/t7406-submodule-update.sh | 12 ++++++++++++
>> >  2 files changed, 26 insertions(+)
>> >
>> > diff --git a/git-submodule.sh b/git-submodule.sh
>> > index 652861aa66..873d64eb99 100755
>> > --- a/git-submodule.sh
>> > +++ b/git-submodule.sh
>> > @@ -449,6 +449,20 @@ cmd_update()
>> >  			;;
>> >  		esac
>> >  
>> > +		# Cache a pointer to the superproject's common dir. This may have
>> > +		# changed, unless it's a fresh clone. Writes it to worktree
>> > +		# if applicable, otherwise to local.
>> > +		if test -z "$just_cloned"
>> > +		then
>> > +			sm_gitdir="$(git -C "$sm_path" rev-parse --absolute-git-dir)"
>> > +			relative_gitdir="$(git rev-parse --path-format=relative \
>> > +							 --prefix "${sm_gitdir}" \
>> > +							 --git-common-dir)"
>> > +
>> > +			git -C "$sm_path" config --worktree \
>> > +				submodule.superprojectgitdir "$relative_gitdir"
>> > +		fi
>> > +
>> 
>> Aside from the "is this really needed anymore?" comment on this caching
>> strategy in general I had in [1] does this really need to be adding new
>> shell code to git-submodule.sh given that we're actively trying to get
>> rid of it entirely and move it over to C.
>> 
>> I.e. here we've just called "git submodule--helper
>> run-update-procedure", and we pass it "$sm_path" (but not "$recursive",
>> but could easily do so).
>> 
>> It needs to clone this new submodule, so presumably it already has a
>> "$sm_gitdir" equivalent, and we can turn that into the same relative
>> path, no?
>> 
>> Can't we call this with a git_config_set*() somewhere in that code?
>> 
>> *investigates a bit*
>> 
>> Okey, so I see that [2] is part of a series that Atharva Raykar had a
>> version of (including this new git-submodule.sh code above) [3] rebased
>> on top of an older version of this topic. I.e. this bit is that part of that patch:
>> 
>> +       git_config_set_in_file(config_path, "submodule.superprojectGitdir",
>> +                              relative_path(get_git_dir(),
>> +                                            update_data->sm_path, &sb));
>> 
>> I also vaguely recall that Junio ejected his topic to make room for this
>> topic first.
>> 
>> Maybe I've missed some update on this but is his [2][3] broken in
>> combination with your topic here? And re[1] is whatever "caching" is
>> being done here still beneficial once this is all moved to C?
>> 
>> In your [4] there seems to be an agreement to do it the other way
>> around, but as noted in the mail linked from [1] maybe the caching isn't
>> needed anymore then?
>
> I answered vs. your other mail; yes, it's still needed, [...]

I replied just now (in [1], but it's not on lore yet, maybe vger ate my
mail again).

tl;dr: Ran some quick performance numbers, and can't reproduce any
slowdown at all when instrumenting the test suite to run another
setup_git_directory() that'll do a nested lookup on pretty much every
git command, and running the test suite (you mentioned a ~5x overall
slowdown).

Anyway, I think the two replies you've got here only partially address
what I pointed out, in particular in [2]:
    
    But I'm a bit iffy on a series that's structured in a way as to not
    start by asserting that we should have given semantics without the
    cache, and then adds the cache later as an optional optimization.

I.e. even if it was 10x as slow I think it should still be structured in
such a way as to at least have some GIT_TEST_* knob to turn it off in
favor of a slow path.

E.g. we've got commit-graph paths that are probably 100x or 1000x faster
than the slow path, but having the cache-less ones means we can validate
their correctness.
    
> and last I checked Atharva's series had been kicked out to make room
> for this one.

Indeed, but as a comment on this proposed series that doesn't really
address this in my above-quoted ([3])

    I.e. here we've just called "git submodule--helper
    run-update-procedure", and we pass it "$sm_path" (but not "$recursive",
    but could easily do so).

I.e. my understanding is that Junio ejected Atharva's because this
seemed like the smaller change, but perhaps it wasn't realized that we'd
be adding shellscript only to remove it again?

But in any case, even if we're patching git-submodule.sh not having
Atharva's go first in its entirety doesn't answer why we can't extract
the bits he/you came up with in C for this series.

I.e. that git-submodule.sh wouldn't need that shelling out (since it
just called a helper, that helper could just return this data too),
that's just a few lines above the hunk in this series.

I.e. if some remaining performance issue I couldn't reproduce in [1] is
due to the shellscript version of this v.s. calling a C function in
libgit isn't it better to focus on closing that gap than having the
caching mechanism?

(Per the above & linked mails I'm still not 100% sure this even *is* a
caching mechanism, or if we store primary data in it, but I'm just
trying to go by your commit message descriptions).

1. https://lore.kernel.org/git/211109.86v912dtfw.gmgdl@evledraar.gmail.com/
2. https://lore.kernel.org/git/87r1cnfkqx.fsf@evledraar.gmail.com/
3. https://lore.kernel.org/git/211105.86wnlngebr.gmgdl@evledraar.gmail.com/

>> 
>> 1. https://lore.kernel.org/git/211105.861r3vhtot.gmgdl@evledraar.gmail.com/
>> 2. https://lore.kernel.org/git/20211013051805.45662-8-raykar.ath@gmail.com/
>> 3. https://lore.kernel.org/git/20211013051805.45662-1-raykar.ath@gmail.com/
>> 4. https://lore.kernel.org/git/YWiXL+plA7GHfuVv@google.com/


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v5 0/4] cache parent project's gitdir in submodules
  2021-11-08 23:11     ` Emily Shaffer
@ 2021-11-09 15:58       ` Derrick Stolee
  0 siblings, 0 replies; 31+ messages in thread
From: Derrick Stolee @ 2021-11-09 15:58 UTC (permalink / raw)
  To: Emily Shaffer
  Cc: git, Albert Cui, Phillip Wood, Johannes Schindelin,
	Ævar Arnfjörð Bjarmason, Junio C Hamano,
	Matheus Tavares Bernardino, Jonathan Nieder, Jacob Keller,
	Atharva Raykar, Jonathan Tan

On 11/8/2021 6:11 PM, Emily Shaffer wrote:

> Yeah, I think we may be overthinking it, especially with the concerns
> about common dir vs. gitdir. More specifically - I think we accidentally
> did the right thing in the previous iteration by using the gitdir :)
> 
> I think we can probably put it pretty simply:
> submodule.superprojectGitDir should point from the most local gitdir of
> the submodule to the most local gitdir of the superproject.
> 
> Luckily there are not so many permutations to worry about here.
> 
> Super doesn't have worktrees, sub doesn't have worktrees:

> Super doesn't have worktrees, sub does have worktrees (and as you
> suggest above, right now this would have to be created carefully and
> manually, but later we probably want this to Just Work):

> Super has worktrees, sub doesn't have worktrees:
> Actually, I think in the future this might not be possible, if we want
> to make `git worktree add --recurse-submodules` work gracefully (and I
> do want that). But in the interim, in practice it looks like this:

> Both super and sub have worktrees:
> And this won't exist until we have graceful support of `git worktree add
> --recurse-submodules` or with some manual effort, now.

> I think this will give us access to both the worktree gitdir *and* the
> common gitdir:
> 
>   ~/git/.git/worktrees/git-second [GIT_DIR!]$ git rev-parse --git-common-dir
>   /usr/local/google/home/emilyshaffer/git/.git
> 
> So that means from any submodule, we can determine:
>  - submodule's gitdir (from the .git link in the submodule wt)
>  - submodule's common dir (from existing commands)
>  - gitdir of superproject which submodule inhabits (from the config in
>    the submodule's gitdir, or the submodule's config.worktree)
>  - common dir of superproject (from existing commands + prior config)
> 
> The upshot to me, then, means that we should be 1) making sure to get
> the path to the gitdir, not the common dir, of the superproject; and 2)
> using helpers to write to the worktree config, not to the local config,
> of the submodule. In other words, we want to avoid the following:
> 
> .git/
>   modules/
>     sub/
>       worktrees/
>         wt/
> 	  config
>       config <- "submodule.superprojectGitDir = ../../../.." as written by the worktree
> 
> Will take a look at the rest of the comments too, but this sounds like a
> reasonable approach to me.

I agree, that this seems reasonable. Spelling it out carefully like this,
along with your list of possibilities, clarifies where the data is located
and how we can construct any information we need from that.

You point out that there are cases that can be a bit tricky to get into
with current features, but this config approach won't make that any worse
right now.

Thanks,
-Stolee

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v5 4/4] submodule: record superproject gitdir during 'update'
  2021-11-09  0:42           ` Ævar Arnfjörð Bjarmason
@ 2021-11-09 20:36             ` Emily Shaffer
  2021-11-09 21:46               ` Emily Shaffer
  0 siblings, 1 reply; 31+ messages in thread
From: Emily Shaffer @ 2021-11-09 20:36 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason; +Cc: Junio C Hamano, git

On Tue, Nov 09, 2021 at 01:42:03AM +0100, Ævar Arnfjörð Bjarmason wrote:
> 
> 
> On Mon, Nov 08 2021, Emily Shaffer wrote:
> 
> > On Fri, Nov 05, 2021 at 09:43:56AM +0100, Ævar Arnfjörð Bjarmason wrote:
> >> 
> >> 
> >> On Thu, Nov 04 2021, Junio C Hamano wrote:
> >> 
> >> > Emily Shaffer <emilyshaffer@google.com> writes:
> >> >
> >> >> A recorded hint path to the superproject's gitdir might be added during
> >> >> 'git submodule add', but in some cases - like submodules which were
> >> >> created before 'git submodule add' learned to record that info - it might
> >> >> be useful to update the hint. Let's do it during 'git submodule
> >> >> update', when we already have a handle to the superproject while calling
> >> >> operations on the submodules.
> >> >
> >> > We are hearing repeated mention of "cache" and "hint".  Do we ever
> >> > invalidate it, or if we have such a record, do we blindly trust it
> >> > and use it without verifying if it is still fresh?
> >> >
> >> > Also, this step and the previous step both say we record gitdir on
> >> > their title, but we instead record common dir.  Whichever is the
> >> > right choice to record, let's be consistent.
> >> 
> >> I had similar (AFAICT still unaddressed) feedback on the v2[1]. I'd lost
> >> track of this series, and see one reason is that the In-Reply-Chain was
> >> broken between v3..v4.
> >> 
> >> I.e. it seems to me that this whole thing started as a way to avoid
> >> shellscript overhead by calling git-rev-parse from git-submodule.sh, but
> >> now that the relevant bits are moved to C we could just call some
> >> slightly adjusted code in setup.c.
> >
> > No, that is not the case. It is the case that `git -C .. rev-parse
> > --git-dir` is *very* expensive in the case where `../` is not, in fact,
> > a gitdir; when I attempted another series which relied on finding the
> > parent superproject's gitdir in this way, our testsuite took something
> > like 5x longer to run than before. In other words, the expensive part is
> > not the shelling out overhead - the expensive part is searching up the
> > entire filesystem directory structure in the worst-case ("we are not a
> > submodule") scenario. This is still needed, even with 'git-submodule.sh'
> > moving to C.
> 
> Do you have that test code somewhere?

I messed around with it a little more, rebasing the no-caches-involved
older implementation and using an in-process lookup with
setup_git_directory_gently_1.
https://github.com/nasamuffin/git/tree/config-inheritance-no-cache

The recent experiments are in the tip commit, and the original series is
in the two commits prior if you're interested.

The upshot, though, is that I think there is still not a way around a
second subprocess. Before, we determined the superproject's gitdir like
so:

  # Does a git project at .. think I belong to it?
  git -C .. ls-files <args> -- path/to/submodule
  # Where does that git project's gitdir live?
  git -C .. rev-parse --absolute-git-dir

Even if we can do the second call in-process, we still will be
performing this ls-files call to ensure that the parent repo is actually
our superproject. (One good example of a time when the parent repo is
*not*: the entire Git test suite, where '/path/to/git/t/trash directory.t1234-abcd'
is not a submodule of '/path/to/git/.git'.)

We could reverse the checks, which will make this much less painful in
the real world, but will still slow down our test suites (and hopefully
you'll forgive me for combining C and bash so brazenly, but it's for
illustration purposes only):

  # Is there a git project above us?
  setup_git_directory_gently_1("..", out, 0);
  # Does it think we're its submodule?
  git -C $out rev-parse --absolute-git-dir

That will still result in an extra out-of-process call for every line in
the Git test suite, though, because of the trash directory layout.

I looked briefly at `git ls-files` // `cmd_ls_files()` and it's fairly
close to being callable on an arbitrary 'struct repository', but not
quite there. But I am pretty afraid of the rabbit hole ;)


And anyway, even with those possible changes, it turns out that
`setup_git_directory_gently_1` - which I had to munge a bit to make
non-private, anyway - wants to take shortcuts like looking at
getenv(GIT_DIR_ENVIRONMENT), which means it's will notice the
submodule's envvar before it will notice the path passed to it. I
actually am a little surprised that your experiment worked at all,
because of this wrinkle, but your printf lines show that it did somehow.


Taken as a whole, I'm not quite certain which is worse.

Approach "cache a pointer from the submodule":
- all the normal caching gotchas - creation, invalidation, staleness,
  etc
+ very easy lookup without need for significant refactoring
- since it's a config, if we do it wrong we're stuck supporting it
  forever anyway

Approach "do lots of in-process heuristics":
- need to refactor code areas unrelated to submodules or configs, like
  setup.c
- performance might differ based on filesystem speed
+ still pretty fast (compared to subprocess calls)
+ avoid all cache correctness issues
- still kind of based on heuristics; will someone envision a wonky way
  of organizing a submodule that breaks the heuristic?

I'll think on it more....

 - Emily

> I tried to reproduce this &
> can't. I run my tests in /home/avar/*, and just created this:
> 
>     $ find /tmp/some/ -name '.git' -type d
>     /tmp/some/dir/.git
>     /tmp/some/dir/a/b/c/d/e/f/g/i/j/k/.git
> 
> I.e. a deeply nested structure in /tmp, if you ask for the git-dir in
> /tmp/some/**/k you'll need to search several levels up.
> 
> Then with the patch below we'll instrument almost all git commands to
> optionally do that search, i.e. anything that does setup_git_directory()
> at all:
> 
>     $ GIT_TEST_SETUP=true GIT_TEST_SETUP_PRINT=true ~/g/git/git rev-parse HEAD
>     warning: from '/tmp/some/dir/a/b/c/d/e/f/g/i/j' found '/tmp/some/dir' ('/tmp/some/dir/.git')
>     <some hash here>
> 
> And as a quick test to run a few tests I tried:
> 
>     rm -rf test-results/; GIT_TEST_SETUP=true GIT_TEST_SETUP_PRINT=true prove -j8 t741*submod*.sh :: -V
> 
> Which runs quickly enough for a tight test loop, and does that work >600 times:
>     
>     $ cat test-results/*.out|grep -c warning.*from.*found
>     662
> 
> I can't get that to show me any meaningful difference, just to pick on
> one test (since it was easier to run repeatedly):
>     
>     $ hyperfine -L v true,false "GIT_TEST_SETUP={v} ./t7416-submodule-dash-url.sh --root=/dev/shm/git"
>     Benchmark #1: GIT_TEST_SETUP=true ./t7416-submodule-dash-url.sh --root=/dev/shm/git
>       Time (mean ± σ):     527.5 ms ±   7.2 ms    [User: 431.6 ms, System: 125.9 ms]
>       Range (min … max):   522.6 ms … 542.5 ms    10 runs
>      
>     Benchmark #2: GIT_TEST_SETUP=false ./t7416-submodule-dash-url.sh --root=/dev/shm/git
>       Time (mean ± σ):     526.7 ms ±  10.8 ms    [User: 421.1 ms, System: 131.6 ms]
>       Range (min … max):   518.2 ms … 546.8 ms    10 runs
>      
>     Summary
>       'GIT_TEST_SETUP=false ./t7416-submodule-dash-url.sh --root=/dev/shm/git' ran
>         1.00 ± 0.02 times faster than 'GIT_TEST_SETUP=true ./t7416-submodule-dash-url.sh --root=/dev/shm/git'
> 
> I.e. it's all fuzzy and within the error margins.
> 
> Now if we do e.g.:
> 
>     GIT_TEST_SETUP=false strace -f -c -U calls,name,time -S calls ./t7416-submodule-dash-url.sh 2>&1 >/dev/null | grep -A9000 calls >a
>     GIT_TEST_SETUP=true strace -f -c -U calls,name,time -S calls ./t7416-submodule-dash-url.sh 2>&1 >/dev/null | grep -A9000 calls >b
> 
> We'll see the syscall difference, in summary:
> 
>     -   110086 total            100.00
>     +   114765 total            100.00
> 
> And some of the real big differences are:
>     
>     $ diff -u <(head -n 12 a) <(head -n 12 b)
>     --- /dev/fd/63  2021-11-09 01:57:16.023991556 +0100
>     +++ /dev/fd/62  2021-11-09 01:57:16.019991593 +0100
>     @@ -1,12 +1,12 @@
>          calls syscall          % time
>      --------- ---------------- ------
>     -    11504 openat             2.15
>     -    11496 close              2.07
>     -    10672 read               4.15
>     -    10465 rt_sigaction       0.32
>     -     9913 lstat              1.50
>     -     9456 mmap               0.67
>     -     7545 stat               0.81
>     -     6349 fstat              0.62
>     -     3896 access             0.61
>     -     3490 mprotect           0.26
>     +    11783 lstat              1.80
>     +    11600 close              1.89
>     +    11599 openat             2.19
>     +    10887 read               4.36
>     +    10465 rt_sigaction       0.33
>     +     9742 stat               1.07
>     +     9455 mmap               0.75
>     +     6346 fstat              0.57
>     +     4113 access             0.65
>     +     3490 mprotect           0.28
> 
> But syscalls are fast, so it doesn't show up in real results.
> 
> Now, of course a real implementation could be less stupid, e.g. even if
> we think we need *a cache* if these are the performance numbers why do
> we need to risk the cache being incorrect v.s. say just writing "I am a
> submodule" somewhere in the config (if we don't have that).
> 
> Then we'll only do that work for submodules, so not all git invocations
> will pay the cost (and it this point we'll usually have read config
> anyway).
> 
> But I really just don't think it's that expensive at all. I can see how
> it would be for actually shelling out, but we don't need to do that.
> 
> This could also just be that I'm running this on a really fast FS, which
> is true. So I went and tested on an AIX machine I have access to.
> 
> I/O on AIX is slow, *really slow*, so slow that if you "rm -rfv"
> something you'll have time to read individual lines scrolling by.
> 
> That ~500ms t7416-submodule-dash-url.sh test runs in around 50s on that
> AIX box (power-aix.osuosl.org), most of which is I/O overhead, I created
> the same /tmp/ directory structure and tried with
> GIT_TEST_SETUP=[false|true] and it's ~55s with/without the env variable,
> with no clear winner.
> 
> I don't have access to hyperfine on that box, or the patience to wait
> for AIX I/O to wait for meaningful results, but to a first approximation
> that seems to indicate that it doesn't really matter there either.
> 
> diff --git a/setup.c b/setup.c
> index 347d7181ae9..8453d397676 100644
> --- a/setup.c
> +++ b/setup.c
> @@ -1209,6 +1209,8 @@ const char *setup_git_directory_gently(int *nongit_ok)
>         struct strbuf dir = STRBUF_INIT, gitdir = STRBUF_INIT;
>         const char *prefix = NULL;
>         struct repository_format repo_fmt = REPOSITORY_FORMAT_INIT;
> +       const char *str = "/tmp/some/dir/a/b/c/d/e/f/g/i/j";
> +       struct strbuf a = STRBUF_INIT, b = STRBUF_INIT;
>  
>         /*
>          * We may have read an incomplete configuration before
> @@ -1231,6 +1233,14 @@ const char *setup_git_directory_gently(int *nongit_ok)
>                 die_errno(_("Unable to read current working directory"));
>         strbuf_addbuf(&dir, &cwd);
>  
> +       if (git_env_bool("GIT_TEST_SETUP", 0)) {
> +               strbuf_addstr(&a, str);
> +               setup_git_directory_gently_1(&a, &b, 0);
> +
> +               if (strcmp(a.buf, str) && git_env_bool("GIT_TEST_SETUP_PRINT", 0))
> +                       warning("from '%s' found '%s' ('%s/%s')", str, a.buf, a.buf, b.buf);
> +       }
> +
>         switch (setup_git_directory_gently_1(&dir, &gitdir, 1)) {
>         case GIT_DIR_EXPLICIT:
>                 prefix = setup_explicit_git_dir(gitdir.buf, &cwd, &repo_fmt, nongit_ok);
> 

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v5 4/4] submodule: record superproject gitdir during 'update'
  2021-11-09 20:36             ` Emily Shaffer
@ 2021-11-09 21:46               ` Emily Shaffer
  0 siblings, 0 replies; 31+ messages in thread
From: Emily Shaffer @ 2021-11-09 21:46 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason; +Cc: Junio C Hamano, git

On Tue, Nov 09, 2021 at 12:36:50PM -0800, Emily Shaffer wrote:
> 
> On Tue, Nov 09, 2021 at 01:42:03AM +0100, Ævar Arnfjörð Bjarmason wrote:
> > 
> > 
> > On Mon, Nov 08 2021, Emily Shaffer wrote:
> > 
> > > On Fri, Nov 05, 2021 at 09:43:56AM +0100, Ævar Arnfjörð Bjarmason wrote:
> > >> 
> > >> 
> > >> On Thu, Nov 04 2021, Junio C Hamano wrote:
> > >> 
> > >> > Emily Shaffer <emilyshaffer@google.com> writes:
> > >> >
> > >> >> A recorded hint path to the superproject's gitdir might be added during
> > >> >> 'git submodule add', but in some cases - like submodules which were
> > >> >> created before 'git submodule add' learned to record that info - it might
> > >> >> be useful to update the hint. Let's do it during 'git submodule
> > >> >> update', when we already have a handle to the superproject while calling
> > >> >> operations on the submodules.
> > >> >
> > >> > We are hearing repeated mention of "cache" and "hint".  Do we ever
> > >> > invalidate it, or if we have such a record, do we blindly trust it
> > >> > and use it without verifying if it is still fresh?
> > >> >
> > >> > Also, this step and the previous step both say we record gitdir on
> > >> > their title, but we instead record common dir.  Whichever is the
> > >> > right choice to record, let's be consistent.
> > >> 
> > >> I had similar (AFAICT still unaddressed) feedback on the v2[1]. I'd lost
> > >> track of this series, and see one reason is that the In-Reply-Chain was
> > >> broken between v3..v4.
> > >> 
> > >> I.e. it seems to me that this whole thing started as a way to avoid
> > >> shellscript overhead by calling git-rev-parse from git-submodule.sh, but
> > >> now that the relevant bits are moved to C we could just call some
> > >> slightly adjusted code in setup.c.
> > >
> > > No, that is not the case. It is the case that `git -C .. rev-parse
> > > --git-dir` is *very* expensive in the case where `../` is not, in fact,
> > > a gitdir; when I attempted another series which relied on finding the
> > > parent superproject's gitdir in this way, our testsuite took something
> > > like 5x longer to run than before. In other words, the expensive part is
> > > not the shelling out overhead - the expensive part is searching up the
> > > entire filesystem directory structure in the worst-case ("we are not a
> > > submodule") scenario. This is still needed, even with 'git-submodule.sh'
> > > moving to C.
> > 
> > Do you have that test code somewhere?
> 
> I messed around with it a little more, rebasing the no-caches-involved
> older implementation and using an in-process lookup with
> setup_git_directory_gently_1.
> https://github.com/nasamuffin/git/tree/config-inheritance-no-cache
> 
> The recent experiments are in the tip commit, and the original series is
> in the two commits prior if you're interested.
> 
> The upshot, though, is that I think there is still not a way around a
> second subprocess. Before, we determined the superproject's gitdir like
> so:
> 
>   # Does a git project at .. think I belong to it?
>   git -C .. ls-files <args> -- path/to/submodule
>   # Where does that git project's gitdir live?
>   git -C .. rev-parse --absolute-git-dir
> 
> Even if we can do the second call in-process, we still will be
> performing this ls-files call to ensure that the parent repo is actually
> our superproject. (One good example of a time when the parent repo is
> *not*: the entire Git test suite, where '/path/to/git/t/trash directory.t1234-abcd'
> is not a submodule of '/path/to/git/.git'.)
> 
> We could reverse the checks, which will make this much less painful in
> the real world, but will still slow down our test suites (and hopefully
> you'll forgive me for combining C and bash so brazenly, but it's for
> illustration purposes only):
> 
>   # Is there a git project above us?
>   setup_git_directory_gently_1("..", out, 0);
>   # Does it think we're its submodule?
>   git -C $out rev-parse --absolute-git-dir
> 
> That will still result in an extra out-of-process call for every line in
> the Git test suite, though, because of the trash directory layout.
> 
> I looked briefly at `git ls-files` // `cmd_ls_files()` and it's fairly
> close to being callable on an arbitrary 'struct repository', but not
> quite there. But I am pretty afraid of the rabbit hole ;)

Jonathan Nieder and Glen Choo pointed out that I can read in the index
to an arbitrary struct index_state from a path, and then call
'index_dir_exists()', so this part of it is not as scary as I thought.

I'll mess around today and see if I can come up with an in-process
version of 'get_superproject_working_tree()' and
'get_superproject_gitdir()'.

Thanks for the lead - a fine example of how useful it is to receive
high-level input from someone removed from the problem ;)

 - Emily

^ permalink raw reply	[flat|nested] 31+ messages in thread

end of thread, other threads:[~2021-11-09 21:46 UTC | newest]

Thread overview: 31+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-10-14 20:34 [PATCH v4 0/4] cache parent project's gitdir in submodules Emily Shaffer
2021-10-14 20:34 ` [PATCH v4 1/4] t7400-submodule-basic: modernize inspect() helper Emily Shaffer
2021-10-14 20:34 ` [PATCH v4 2/4] introduce submodule.superprojectGitDir record Emily Shaffer
2021-10-14 20:34 ` [PATCH v4 3/4] submodule: record superproject gitdir during absorbgitdirs Emily Shaffer
2021-10-18 23:18   ` Jonathan Tan
2021-10-25 16:14     ` Derrick Stolee
2021-11-04 23:22       ` Emily Shaffer
2021-11-08  1:07         ` Derrick Stolee
2021-11-04 22:09     ` Emily Shaffer
2021-10-14 20:34 ` [PATCH v4 4/4] submodule: record superproject gitdir during 'update' Emily Shaffer
2021-10-25 16:17   ` Derrick Stolee
2021-10-25 16:19 ` [PATCH v4 0/4] cache parent project's gitdir in submodules Derrick Stolee
2021-11-04 23:49 ` [PATCH v5 " Emily Shaffer
2021-11-04 23:49   ` [PATCH v5 1/4] t7400-submodule-basic: modernize inspect() helper Emily Shaffer
2021-11-04 23:49   ` [PATCH v5 2/4] introduce submodule.superprojectGitDir record Emily Shaffer
2021-11-05  7:50     ` Junio C Hamano
2021-11-08 23:16       ` Emily Shaffer
2021-11-04 23:49   ` [PATCH v5 3/4] submodule: record superproject gitdir during absorbgitdirs Emily Shaffer
2021-11-04 23:49   ` [PATCH v5 4/4] submodule: record superproject gitdir during 'update' Emily Shaffer
2021-11-05  4:49     ` Junio C Hamano
2021-11-05  8:43       ` Ævar Arnfjörð Bjarmason
2021-11-08 23:21         ` Emily Shaffer
2021-11-09  0:42           ` Ævar Arnfjörð Bjarmason
2021-11-09 20:36             ` Emily Shaffer
2021-11-09 21:46               ` Emily Shaffer
2021-11-05  8:51     ` Ævar Arnfjörð Bjarmason
2021-11-08 23:22       ` Emily Shaffer
2021-11-09  1:12         ` Ævar Arnfjörð Bjarmason
2021-11-08  1:24   ` [PATCH v5 0/4] cache parent project's gitdir in submodules Derrick Stolee
2021-11-08 23:11     ` Emily Shaffer
2021-11-09 15:58       ` Derrick Stolee

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.