* [RFC PATCH 0/4] git-status reports relation to superproject
@ 2017-11-08 19:55 Stefan Beller
  2017-11-08 19:55 ` [PATCH 1/4] remote, revision: factor out exclusive counting between two commits Stefan Beller
                   ` (4 more replies)
  0 siblings, 5 replies; 16+ messages in thread
From: Stefan Beller @ 2017-11-08 19:55 UTC (permalink / raw)
  To: git; +Cc: Stefan Beller

  $ git -c status.superprojectinfo status
  HEAD detached at v2.15-rc2
  superproject is 6 commits behind HEAD 7070ce2..5e6d0fb
  nothing to commit, working tree clean

How cool is that?

This series sidesteps the questions raised in
https://public-inbox.org/git/xmqq4lq6hmp2.fsf_-_@gitster.mtv.corp.google.com/
which I am also putting together albeit slowly.

This series just reports the relationship between the superproject's gitlink
(if any) and HEAD. I think that is useful information in the current
world of submodules.
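
For illustration, roughly the same information can be gathered by hand
with existing plumbing. This is a minimal sketch, not what the patches
do internally; the helper names count_sides and superproject_relation
are made up for this example, and it assumes the submodule's working
tree sits inside the superproject's working tree.

```shell
# Count the commits on each side of `git rev-list --left-right A...B` output.
# Lines starting with "<" are reachable only from A, ">" only from B.
count_sides () {
	awk '/^</ { left++ } /^>/ { right++ } END { print left + 0, right + 0 }'
}

# Hypothetical helper: run from the top level of a submodule working tree.
# Prints "<commits only in HEAD> <commits only in the gitlink>".
superproject_relation () {
	name=$(basename "$PWD") &&
	# The gitlink entry has mode 160000; field 2 is the object name.
	gitlink=$(git -C .. ls-files --stage -- "$name" |
		awk '$1 == "160000" { print $2 }') &&
	test -n "$gitlink" &&
	revs=$(git rev-list --left-right "HEAD...$gitlink") &&
	printf '%s\n' "$revs" | count_sides
}
```

The first number corresponds to "superproject is N commits behind HEAD",
the second to "superproject is ahead of HEAD by N commits".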

Stefan

Stefan Beller (4):
  remote, revision: factor out exclusive counting between two commits
  submodule.c: factor start_ls_files_dot_dot out of
    get_superproject_working_tree
  submodule.c: get superprojects gitlink value
  git-status: report reference to superproject

 Documentation/config.txt |  5 +++
 builtin/commit.c         |  2 ++
 remote.c                 | 40 +-----------------------
 revision.c               | 45 +++++++++++++++++++++++++++
 revision.h               |  7 +++++
 submodule.c              | 80 ++++++++++++++++++++++++++++++++++++------------
 submodule.h              |  6 ++++
 t/t7519-superproject.sh  | 57 ++++++++++++++++++++++++++++++++++
 wt-status.c              | 37 ++++++++++++++++++++++
 wt-status.h              |  1 +
 10 files changed, 222 insertions(+), 58 deletions(-)
 create mode 100755 t/t7519-superproject.sh

-- 
2.15.0.128.g40905b34bf.dirty



* [PATCH 1/4] remote, revision: factor out exclusive counting between two commits
  2017-11-08 19:55 [RFC PATCH 0/4] git-status reports relation to superproject Stefan Beller
@ 2017-11-08 19:55 ` Stefan Beller
  2017-11-08 19:55 ` [PATCH 2/4] submodule.c: factor start_ls_files_dot_dot out of get_superproject_working_tree Stefan Beller
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 16+ messages in thread
From: Stefan Beller @ 2017-11-08 19:55 UTC (permalink / raw)
  To: git; +Cc: Stefan Beller

Signed-off-by: Stefan Beller <sbeller@google.com>
---
 remote.c   | 40 +---------------------------------------
 revision.c | 45 +++++++++++++++++++++++++++++++++++++++++++++
 revision.h |  7 +++++++
 3 files changed, 53 insertions(+), 39 deletions(-)

diff --git a/remote.c b/remote.c
index 685e776a65..60c689383a 100644
--- a/remote.c
+++ b/remote.c
@@ -1990,9 +1990,7 @@ int stat_tracking_info(struct branch *branch, int *num_ours, int *num_theirs,
 {
 	struct object_id oid;
 	struct commit *ours, *theirs;
-	struct rev_info revs;
 	const char *base;
-	struct argv_array argv = ARGV_ARRAY_INIT;
 
 	/* Cannot stat unless we are marked to build on top of somebody else. */
 	base = branch_get_upstream(branch, NULL);
@@ -2014,43 +2012,7 @@ int stat_tracking_info(struct branch *branch, int *num_ours, int *num_theirs,
 	if (!ours)
 		return -1;
 
-	/* are we the same? */
-	if (theirs == ours) {
-		*num_theirs = *num_ours = 0;
-		return 0;
-	}
-
-	/* Run "rev-list --left-right ours...theirs" internally... */
-	argv_array_push(&argv, ""); /* ignored */
-	argv_array_push(&argv, "--left-right");
-	argv_array_pushf(&argv, "%s...%s",
-			 oid_to_hex(&ours->object.oid),
-			 oid_to_hex(&theirs->object.oid));
-	argv_array_push(&argv, "--");
-
-	init_revisions(&revs, NULL);
-	setup_revisions(argv.argc, argv.argv, &revs, NULL);
-	if (prepare_revision_walk(&revs))
-		die("revision walk setup failed");
-
-	/* ... and count the commits on each side. */
-	*num_ours = 0;
-	*num_theirs = 0;
-	while (1) {
-		struct commit *c = get_revision(&revs);
-		if (!c)
-			break;
-		if (c->object.flags & SYMMETRIC_LEFT)
-			(*num_ours)++;
-		else
-			(*num_theirs)++;
-	}
-
-	/* clear object flags smudged by the above traversal */
-	clear_commit_marks(ours, ALL_REV_FLAGS);
-	clear_commit_marks(theirs, ALL_REV_FLAGS);
-
-	argv_array_clear(&argv);
+	compare_commits(ours, theirs, num_ours, num_theirs);
 	return 0;
 }
 
diff --git a/revision.c b/revision.c
index 99c95c19b0..fe1faf2628 100644
--- a/revision.c
+++ b/revision.c
@@ -1159,6 +1159,51 @@ int ref_excluded(struct string_list *ref_excludes, const char *path)
 	return 0;
 }
 
+void compare_commits(struct commit *ours, struct commit *theirs,
+		    int *num_ours, int *num_theirs)
+{
+	struct rev_info revs;
+	struct argv_array argv = ARGV_ARRAY_INIT;
+
+	/* are we the same? */
+	if (theirs == ours) {
+		*num_theirs = *num_ours = 0;
+		return;
+	}
+
+	/* Run "rev-list --left-right ours...theirs" internally... */
+	argv_array_push(&argv, ""); /* ignored */
+	argv_array_push(&argv, "--left-right");
+	argv_array_pushf(&argv, "%s...%s",
+			 oid_to_hex(&ours->object.oid),
+			 oid_to_hex(&theirs->object.oid));
+	argv_array_push(&argv, "--");
+
+	init_revisions(&revs, NULL);
+	setup_revisions(argv.argc, argv.argv, &revs, NULL);
+	if (prepare_revision_walk(&revs))
+		die("revision walk setup failed");
+
+	/* ... and count the commits on each side. */
+	*num_ours = 0;
+	*num_theirs = 0;
+	while (1) {
+		struct commit *c = get_revision(&revs);
+		if (!c)
+			break;
+		if (c->object.flags & SYMMETRIC_LEFT)
+			(*num_ours)++;
+		else
+			(*num_theirs)++;
+	}
+
+	/* clear object flags smudged by the above traversal */
+	clear_commit_marks(ours, ALL_REV_FLAGS);
+	clear_commit_marks(theirs, ALL_REV_FLAGS);
+
+	argv_array_clear(&argv);
+}
+
 static int handle_one_ref(const char *path, const struct object_id *oid,
 			  int flag, void *cb_data)
 {
diff --git a/revision.h b/revision.h
index 54761200ad..3ff6a5190b 100644
--- a/revision.h
+++ b/revision.h
@@ -324,4 +324,11 @@ extern int rewrite_parents(struct rev_info *revs, struct commit *commit,
  */
 extern struct commit_list *get_saved_parents(struct rev_info *revs, const struct commit *commit);
 
+/*
+ * Compute the number of commits between 'one' and 'two' storing the number
+ * of commits in their parent DAG included in each but not the other.
+ */
+extern void compare_commits(struct commit *one, struct commit *two,
+			    int *num_one, int *num_two);
+
 #endif
-- 
2.15.0.128.g40905b34bf.dirty



* [PATCH 2/4] submodule.c: factor start_ls_files_dot_dot out of get_superproject_working_tree
  2017-11-08 19:55 [RFC PATCH 0/4] git-status reports relation to superproject Stefan Beller
  2017-11-08 19:55 ` [PATCH 1/4] remote, revision: factor out exclusive counting between two commits Stefan Beller
@ 2017-11-08 19:55 ` Stefan Beller
  2017-11-08 19:55 ` [PATCH 3/4] submodule.c: get superprojects gitlink value Stefan Beller
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 16+ messages in thread
From: Stefan Beller @ 2017-11-08 19:55 UTC (permalink / raw)
  To: git; +Cc: Stefan Beller

We'll reuse the code of the factored out function shortly, when exploring
the superproject for another aspect. Instead of knowing the root of the
superproject we'll find out about the gitlink value.

Signed-off-by: Stefan Beller <sbeller@google.com>
---
 submodule.c | 53 ++++++++++++++++++++++++++++++++++-------------------
 1 file changed, 34 insertions(+), 19 deletions(-)

diff --git a/submodule.c b/submodule.c
index 239d94d539..4fcb64469e 100644
--- a/submodule.c
+++ b/submodule.c
@@ -1977,15 +1977,13 @@ void absorb_git_dir_into_superproject(const char *prefix,
 	}
 }
 
-const char *get_superproject_working_tree(void)
+/* Starts a child process `ls-files` one directory above the root of the repo. */
+static int start_ls_files_dot_dot(struct child_process *cp, struct strbuf *out)
 {
-	struct child_process cp = CHILD_PROCESS_INIT;
-	struct strbuf sb = STRBUF_INIT;
 	const char *one_up = real_path_if_valid("../");
-	const char *cwd = xgetcwd();
-	const char *ret = NULL;
 	const char *subpath;
-	int code;
+	char *cwd = xgetcwd();
+	struct strbuf sb = STRBUF_INIT;
 	ssize_t len;
 
 	if (!is_inside_work_tree())
@@ -1994,31 +1992,48 @@ const char *get_superproject_working_tree(void)
 		 * We might have a superproject, but it is harder
 		 * to determine.
 		 */
-		return NULL;
+		return -1;
 
 	if (!one_up)
-		return NULL;
+		return -1;
 
 	subpath = relative_path(cwd, one_up, &sb);
 
-	prepare_submodule_repo_env(&cp.env_array);
-	argv_array_pop(&cp.env_array);
+	prepare_submodule_repo_env(&cp->env_array);
+	argv_array_pop(&cp->env_array);
 
-	argv_array_pushl(&cp.args, "--literal-pathspecs", "-C", "..",
+	argv_array_pushl(&cp->args, "--literal-pathspecs", "-C", "..",
 			"ls-files", "-z", "--stage", "--full-name", "--",
 			subpath, NULL);
-	strbuf_reset(&sb);
 
-	cp.no_stdin = 1;
-	cp.no_stderr = 1;
-	cp.out = -1;
-	cp.git_cmd = 1;
+	cp->no_stdin = 1;
+	cp->no_stderr = 1;
+	cp->out = -1;
+	cp->git_cmd = 1;
 
-	if (start_command(&cp))
+	if (start_command(cp))
 		die(_("could not start ls-files in .."));
 
-	len = strbuf_read(&sb, cp.out, PATH_MAX);
-	close(cp.out);
+	len = strbuf_read(out, cp->out, PATH_MAX);
+	close(cp->out);
+
+	strbuf_release(&sb);
+	free(cwd);
+	return len;
+}
+
+const char *get_superproject_working_tree(void)
+{
+	struct child_process cp = CHILD_PROCESS_INIT;
+	struct strbuf sb = STRBUF_INIT;
+	const char *cwd = xgetcwd();
+	const char *ret = NULL;
+	int code;
+	ssize_t len;
+
+	len = start_ls_files_dot_dot(&cp, &sb);
+	if (len < 0)
+		return NULL;
 
 	if (starts_with(sb.buf, "160000")) {
 		int super_sub_len;
-- 
2.15.0.128.g40905b34bf.dirty



* [PATCH 3/4] submodule.c: get superprojects gitlink value
  2017-11-08 19:55 [RFC PATCH 0/4] git-status reports relation to superproject Stefan Beller
  2017-11-08 19:55 ` [PATCH 1/4] remote, revision: factor out exclusive counting between two commits Stefan Beller
  2017-11-08 19:55 ` [PATCH 2/4] submodule.c: factor start_ls_files_dot_dot out of get_superproject_working_tree Stefan Beller
@ 2017-11-08 19:55 ` Stefan Beller
  2017-11-08 19:55 ` [PATCH 4/4] git-status: report reference to superproject Stefan Beller
  2017-11-08 22:36 ` [RFC PATCH 0/4] git-status reports relation " Jonathan Tan
  4 siblings, 0 replies; 16+ messages in thread
From: Stefan Beller @ 2017-11-08 19:55 UTC (permalink / raw)
  To: git; +Cc: Stefan Beller

Signed-off-by: Stefan Beller <sbeller@google.com>
---
 submodule.c | 27 +++++++++++++++++++++++++++
 submodule.h |  6 ++++++
 2 files changed, 33 insertions(+)

diff --git a/submodule.c b/submodule.c
index 4fcb64469e..68b123eb13 100644
--- a/submodule.c
+++ b/submodule.c
@@ -2074,6 +2074,33 @@ const char *get_superproject_working_tree(void)
 	return ret;
 }
 
+/*
+ * Returns 0 when the gitlink is found in the superprojects index,
+ * the value will be found in `oid`. Otherwise return -1.
+ */
+int get_superproject_gitlink(struct object_id *oid)
+{
+	struct child_process cp = CHILD_PROCESS_INIT;
+	struct strbuf sb = STRBUF_INIT;
+	const char *hash;
+
+	if (start_ls_files_dot_dot(&cp, &sb) < 0)
+		return -1;
+
+	if (!skip_prefix(sb.buf, "160000 ", &hash))
+		/*
+		 * superproject doesn't have a gitlink at submodule position or
+		 * output is gibberish
+		 */
+		return -1;
+
+	if (get_oid_hex(hash, oid))
+		/* could not parse the object name */
+		return -1;
+
+	return 0;
+}
+
 /*
  * Put the gitdir for a submodule (given relative to the main
  * repository worktree) into `buf`, or return -1 on error.
diff --git a/submodule.h b/submodule.h
index f0da0277a4..5fc602f0c7 100644
--- a/submodule.h
+++ b/submodule.h
@@ -137,4 +137,10 @@ extern void absorb_git_dir_into_superproject(const char *prefix,
  */
 extern const char *get_superproject_working_tree(void);
 
+/*
+ * Returns 0 when the gitlink is found in the superprojects index,
+ * the value will be found in `oid`. Otherwise return -1.
+ */
+extern int get_superproject_gitlink(struct object_id *oid);
+
 #endif
-- 
2.15.0.128.g40905b34bf.dirty



* [PATCH 4/4] git-status: report reference to superproject
  2017-11-08 19:55 [RFC PATCH 0/4] git-status reports relation to superproject Stefan Beller
                   ` (2 preceding siblings ...)
  2017-11-08 19:55 ` [PATCH 3/4] submodule.c: get superprojects gitlink value Stefan Beller
@ 2017-11-08 19:55 ` Stefan Beller
  2017-11-08 22:36 ` [RFC PATCH 0/4] git-status reports relation " Jonathan Tan
  4 siblings, 0 replies; 16+ messages in thread
From: Stefan Beller @ 2017-11-08 19:55 UTC (permalink / raw)
  To: git; +Cc: Stefan Beller

In a submodule the position of HEAD in relation to the gitlink pointer
in the superproject may be of interest.

Introduce a config option `status.superprojectInfo` that when enabled
will report the relation between HEAD and the commit pointed to by the
gitlink in the index of the superproject.

Signed-off-by: Stefan Beller <sbeller@google.com>
---
 Documentation/config.txt |  5 +++++
 builtin/commit.c         |  2 ++
 t/t7519-superproject.sh  | 57 ++++++++++++++++++++++++++++++++++++++++++++++++
 wt-status.c              | 37 +++++++++++++++++++++++++++++++
 wt-status.h              |  1 +
 5 files changed, 102 insertions(+)
 create mode 100755 t/t7519-superproject.sh

diff --git a/Documentation/config.txt b/Documentation/config.txt
index 5f0d62753d..7825a1a7be 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -3097,6 +3097,11 @@ status.submoduleSummary::
 	submodule summary' command, which shows a similar output but does
 	not honor these settings.
 
+status.superprojectInfo::
+	Defaults to false.
+	This shows the relation of the current HEAD to the commit pointed
+	to by the gitlink entry in the superproject's index.
+
 stash.showPatch::
 	If this is set to true, the `git stash show` command without an
 	option will show the stash entry in patch form.  Defaults to false.
diff --git a/builtin/commit.c b/builtin/commit.c
index c38542ee46..f937f6c6cf 100644
--- a/builtin/commit.c
+++ b/builtin/commit.c
@@ -1286,6 +1286,8 @@ static int git_status_config(const char *k, const char *v, void *cb)
 			s->submodule_summary = -1;
 		return 0;
 	}
+	if (!strcmp(k, "status.superprojectinfo"))
+		s->superproject_info = git_config_bool(k, v);
 	if (!strcmp(k, "status.short")) {
 		if (git_config_bool(k, v))
 			status_deferred_config.status_format = STATUS_FORMAT_SHORT;
diff --git a/t/t7519-superproject.sh b/t/t7519-superproject.sh
new file mode 100755
index 0000000000..ade2379a59
--- /dev/null
+++ b/t/t7519-superproject.sh
@@ -0,0 +1,57 @@
+#!/bin/sh
+
+test_description='git status for superproject relations'
+
+. ./test-lib.sh
+
+test_expect_success 'setup' '
+	test_commit initial &&
+	test_create_repo sub &&
+	# this whole file tests superproject reporting, so set this config here
+	git -C sub config status.superprojectInfo true
+'
+
+test_expect_success 'repo on initial commit does not mention superproject' '
+	git -C sub status > actual &&
+	test_i18ngrep "No commits yet" actual &&
+	test_i18ngrep -v superproject actual
+'
+
+test_expect_success 'setup submodule' '
+	test_commit -C sub initial &&
+	git submodule add ./sub sub
+'
+
+test_expect_success 'submodule in sync with superproject index' '
+	git -C sub status >actual &&
+	test_i18ngrep "superproject points at HEAD" actual
+'
+
+test_expect_success 'submodule in sync with superproject' '
+	git commit -a -m "superproject adds submodule" &&
+	git -C sub status >actual &&
+	test_i18ngrep "superproject points at HEAD" actual
+'
+
+test_expect_success 'submodule advances two commits' '
+	git -C sub commit --allow-empty -m "test" &&
+	git -C sub commit --allow-empty -m "test2" &&
+	git -C sub status >actual &&
+	test_i18ngrep "superproject is 2 commits behind HEAD" actual
+'
+
+test_expect_success 'submodule behind superproject' '
+	git add sub &&
+	git commit -m "update sub" &&
+	git -C sub reset --hard HEAD^ &&
+	git -C sub status >actual &&
+	test_i18ngrep "superproject is ahead of HEAD by 1 commits" actual
+'
+
+test_expect_success 'submodule and superproject differ' '
+	git -C sub commit --allow-empty -m "test2b" &&
+	git -C sub status >actual &&
+	test_i18ngrep "superproject and HEAD differ by +1, -1 commits" actual
+'
+
+test_done
diff --git a/wt-status.c b/wt-status.c
index bedef256ce..3e8e27a550 100644
--- a/wt-status.c
+++ b/wt-status.c
@@ -1027,6 +1027,41 @@ static void wt_longstatus_print_tracking(struct wt_status *s)
 	strbuf_release(&sb);
 }
 
+static void wt_longstatus_print_superproject_relation(struct wt_status *s)
+{
+	struct object_id oid;
+	struct commit *in_gitlink, *head;
+	int head_nr, gitlink_nr;
+
+	if (get_superproject_gitlink(&oid))
+		return;
+
+	in_gitlink = lookup_commit(&oid);
+
+	read_ref("HEAD", &oid);
+	head = lookup_commit(&oid);
+
+	compare_commits(head, in_gitlink, &head_nr, &gitlink_nr);
+
+	if (!head_nr && !gitlink_nr)
+		printf(_("superproject points at HEAD\n"));
+	else if (!head_nr)
+		printf(_("superproject is ahead of HEAD by %d commits %s..%s\n"),
+			 gitlink_nr,
+			 find_unique_abbrev(head->object.oid.hash, DEFAULT_ABBREV),
+			 find_unique_abbrev(in_gitlink->object.oid.hash, DEFAULT_ABBREV));
+	else if (!gitlink_nr)
+		printf(_("superproject is %d commits behind HEAD %s..%s\n"),
+			 head_nr,
+			 find_unique_abbrev(in_gitlink->object.oid.hash, DEFAULT_ABBREV),
+			 find_unique_abbrev(head->object.oid.hash, DEFAULT_ABBREV));
+	else
+		printf(_("superproject and HEAD differ by +%d, -%d commits %s...%s\n"),
+			 gitlink_nr, head_nr,
+			 find_unique_abbrev(head->object.oid.hash, DEFAULT_ABBREV),
+			 find_unique_abbrev(in_gitlink->object.oid.hash, DEFAULT_ABBREV));
+}
+
 static int has_unmerged(struct wt_status *s)
 {
 	int i;
@@ -1593,6 +1628,8 @@ static void wt_longstatus_print(struct wt_status *s)
 		if (!s->is_initial)
 			wt_longstatus_print_tracking(s);
 	}
+	if (!s->is_initial && s->superproject_info)
+		wt_longstatus_print_superproject_relation(s);
 
 	wt_longstatus_print_state(s, &state);
 	free(state.branch);
diff --git a/wt-status.h b/wt-status.h
index 64f4d33ea1..75700980d2 100644
--- a/wt-status.h
+++ b/wt-status.h
@@ -70,6 +70,7 @@ struct wt_status {
 	int display_comment_prefix;
 	int relative_paths;
 	int submodule_summary;
+	int superproject_info;
 	int show_ignored_files;
 	enum untracked_status_type show_untracked_files;
 	const char *ignore_submodule_arg;
-- 
2.15.0.128.g40905b34bf.dirty



* Re: [RFC PATCH 0/4] git-status reports relation to superproject
  2017-11-08 19:55 [RFC PATCH 0/4] git-status reports relation to superproject Stefan Beller
                   ` (3 preceding siblings ...)
  2017-11-08 19:55 ` [PATCH 4/4] git-status: report reference to superproject Stefan Beller
@ 2017-11-08 22:36 ` Jonathan Tan
  2017-11-09  0:10   ` [RFD] Long term plan with submodule refs? Stefan Beller
  4 siblings, 1 reply; 16+ messages in thread
From: Jonathan Tan @ 2017-11-08 22:36 UTC (permalink / raw)
  To: Stefan Beller; +Cc: git

On Wed,  8 Nov 2017 11:55:05 -0800
Stefan Beller <sbeller@google.com> wrote:

>   $ git -c status.superprojectinfo status
>   HEAD detached at v2.15-rc2
>   superproject is 6 commits behind HEAD 7070ce2..5e6d0fb
>   nothing to commit, working tree clean
> 
> How cool is that?
> 
> This series sidesteps the questions raised in
> https://public-inbox.org/git/xmqq4lq6hmp2.fsf_-_@gitster.mtv.corp.google.com/
> which I am also putting together albeit slowly.
> 
> This series just reports the relationship between the superproject's gitlink
> (if any) and HEAD. I think that is useful information in the current
> world of submodules.

The relationship is indeed currently useful, but if the long term plan
is to strongly discourage detached submodule HEAD, then I would think
that these patches are in the wrong direction. (If the long term plan is
to end up supporting both detached and linked submodule HEAD, then these
patches are fine, of course.) So I think that the plan referenced in
Junio's email (that you linked above) still needs to be discussed.

About the patches themselves, they look OK to me. Some minor things off
the top of my head are to retain the "ours" and "theirs" (instead of
"one" and "two"), and to replicate the language in remote.c more closely
("This submodule is (ahead of/behind) the superproject by %d commit(s)")
instead of inventing your own.
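
As a rough illustration of the wording suggestion, here is a minimal
sketch of how the messages could be selected. The function names and
the exact strings (including the "up to date" and "diverged" cases) are
assumptions for this example, loosely following the phrasing quoted
above, and are not taken from remote.c.

```shell
# Pick the singular or plural noun for a commit count.
plural () {
	if [ "$1" -eq 1 ]; then echo commit; else echo commits; fi
}

# $1: commits only in the submodule HEAD, $2: commits only in the gitlink.
relation_message () {
	if [ "$1" -eq 0 ] && [ "$2" -eq 0 ]; then
		echo "This submodule is up to date with the superproject."
	elif [ "$2" -eq 0 ]; then
		echo "This submodule is ahead of the superproject by $1 $(plural "$1")."
	elif [ "$1" -eq 0 ]; then
		echo "This submodule is behind the superproject by $2 $(plural "$2")."
	else
		echo "This submodule and the superproject have diverged ($1 and $2 commits)."
	fi
}
```

Handling the singular case separately avoids output such as "1 commits".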


* [RFD] Long term plan with submodule refs?
  2017-11-08 22:36 ` [RFC PATCH 0/4] git-status reports relation " Jonathan Tan
@ 2017-11-09  0:10   ` Stefan Beller
  2017-11-09  1:29     ` Jonathan Tan
                       ` (2 more replies)
  0 siblings, 3 replies; 16+ messages in thread
From: Stefan Beller @ 2017-11-09  0:10 UTC (permalink / raw)
  To: jonathantanmy; +Cc: git, sbeller

> The relationship is indeed currently useful, but if the long term plan
> is to strongly discourage detached submodule HEAD, then I would think
> that these patches are in the wrong direction. (If the long term plan is
> to end up supporting both detached and linked submodule HEAD, then these
> patches are fine, of course.) So I think that the plan referenced in
> Junio's email (that you linked above) still needs to be discussed.

This email presents different approaches.

Objective
=========
This document summarizes the current state of Git submodules, starts a
discussion of where they could be headed long term, and shows different
ways in which submodule refs could evolve.

Background
==========
Submodules in Git are currently treated as independent repositories.
This is okay for current workflows, such as utilizing a library that is
rarely updated. Other workflows that require a tighter integration between
submodule and superproject are possible, but cumbersome as there is an
additional step that has to be performed, which is the update of the gitlink
pointer in the superproject.

Other discussions of the past:
"Re-attach HEAD?"
  https://public-inbox.org/git/20170501180058.8063-1-sbeller@google.com/
"Semantics of checkout --recursive for submodules on a branch"
  https://public-inbox.org/git/20170630003851.17288-1-sbeller@google.com/
"A new type of symref?"
  https://public-inbox.org/git/xmqqvamqg2fy.fsf@gitster.mtv.corp.google.com/

Workflows
=========
* Obtaining a copy of the Superproject tightly coupled with submodules
  solved via git clone --recurse-submodules=<pathspec>
* Changing the submodule selection
  solved via submodule.active flags
* Changing the remote / Interacting with a different remote for all submodules
  -> need to be solved, not core issue of this discussion
* Syncing to the latest upstream
  solved via git pull --recurse  
* Working on a local feature in one submodule
  -> How do refs work spanning superproject/submodule?
* Working on a feature spanning multiple submodules
  -> How do refs work spanning multiple repos?
* Working on a bug fix (Changing the feature that you currently work on, branches)
  -> How does switching branches in the superproject affect submodules?

This discussion should revolve around how refs are handled in submodules in
relation to a superproject.

Possible data models and workflow implications
==============================================
In the following, different data models are presented that aid a
submodule-heavy workflow, each with its pros and cons.

Keep everything as is, superproject and submodule have their own refs
---------------------------------------------------------------------
In this alternative we'd just make existing commands nicer, e.g.
git-status, git-log would give information about the superproject's
gitlink similar to how they give information about a remote branch.

We might want to introduce an option that triggers adding the submodule
to the superproject once a commit is done in the submodule.

Pros:
 * easiest to implement
 * easy to understand when having a git background already
 
Cons:
 * Current tools that manage multiple repositories (e.g. repo, git-slave)
   have "branches in parallel", i.e. each repo has a branch of the same
   name, instead of using a superproject to manage the state of all repos
   involved. So users of such tools may be confused by submodules.
 * when using a detached HEAD in the submodule, we may run into git-gc issues.
 

Replicate refs in submodules
----------------------------
This approach will replicate the superproject refs into the submodule
ref namespace, e.g. git-branch learns about --recurse-submodules, which
creates a branch of a given name in all submodules. These (topic) branches
should be kept in sync with the superproject.

Pros:
 * This seemed intuitive to Gerrit users
 * 'quick' to implement, most of the commands are already there,
   just git-branch is needed to have the workflows mentioned above complete.
Cons:
 * What does "git checkout -b A B" mean? (special case: B == HEAD)
   Is the branch name replicated as a string into the submodule operation,
   or do we dereference the superproject's gitlink and walk from there?
   When taking the superproject's gitlink, then why do we have the branches
   in the submodule in the first place? When taking the string as-is,
   then it might confuse users.
 * refs are not updated atomically across superproject and submodule, by design;
   this relies on superproject and submodule staying in sync via hope.

No submodule refstore at all
----------------------------
Use refs and commits in the superproject to stitch submodule changes
together. Disallow branches in the submodule. This is only restricted
to the working tree inside the superproject, such that the output of git-branch
changes depending on whether the working tree is inside or outside the superproject
working tree.

The messages of git-status inside the superproject working tree are changed,
as "detached HEAD"s are common in submodules and sound scary. Maybe use
"following the superproject" instead.

Pros:
 * solves the atomicity issue from the prior proposal
Cons:
 * In a submodule one must use a worktree outside the superproject
   to do upstream work.
 * As the detached HEAD is not referenced, we have git-gc issues.


New type of symbolic refs
=========================
A symbolic ref can currently only point at a ref or another symbolic ref.
This proposal showcases different scenarios on how this could change in the
future.

HEAD pointing at the superproject's index
-----------------------------------------
Introduce a new symbolic ref that points at the gitlink entry in the
superproject's index. The format is

  "repo:" <superprojects gitdir> '\0' <gitlink-path> '\0'

Just like existing symrefs, the content of the ref will be read and followed.
On reading "repo:", the sha1 will be obtained equivalent to:

    git -C <superproject> ls-files -s <gitlink-path> | awk '{ print $2}'
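
To make the proposed format concrete, here is a minimal sketch of how
such a ref could be split into its two fields before the lookup above.
The helper name parse_repo_symref is made up for this example, and the
sketch assumes neither field contains a newline.

```shell
# Read a proposed "repo:" symref from stdin and print the superproject
# gitdir and the gitlink path on separate lines.
# Exits non-zero if the input does not follow the proposed format.
parse_repo_symref () {
	tr '\000' '\n' | awk '
		NR == 1 {
			if (index($0, "repo:") != 1) { bad = 1; exit }
			print substr($0, 6)	# drop the "repo:" prefix
		}
		NR == 2 { print; seen = 1; exit }
		END { exit (bad || !seen) }'
}
```

A caller could then feed the second field to the ls-files command
shown above.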

Ref write operations driven by the submodule, affecting symrefs
  e.g. git checkout <other branch> (in the submodule)

In this scenario only the HEAD is optionally attached to the superproject,
so we can rewrite the HEAD to be anything else, such as a branch just fine.
Once the HEAD is not pointing at the superproject any more, we'll leave the
submodule alone in operations driven by the superproject.
To get back on the superproject branch, we’d need to invent new UX, such as
   git checkout --attach-superproject
as that is similar to --detach

Ref write operations driven by the submodule, affecting target ref
  e.g. git commit, reset --hard, update-ref (in the submodule)

The HEAD stays the same, pointing at the superproject.
The gitlink is changed to the target sha1, using

  git -C <superproject> update-index --add \
      --cacheinfo 160000,$SHA1,<gitlink-path>

This will affect the superproject's index, so a commit in
the superproject is then needed.

Ref write operations driven by the superproject, changing the gitlink
  e.g. git checkout <tree-ish>, git reset --hard (in the superproject)

This will change the gitlink in the superproject's index, such that the HEAD
in the submodule changes, which would trigger an update of the
submodule's working tree.

Superproject operations spanning index and worktree
  E.g. git reset --mixed
As the submodule's HEAD is defined in the index, we would reset it to the
version in the last commit. As --mixed promises to not touch the working tree,
the submodule's worktree would not be touched. git reset --mixed in the
superproject is the same as --soft in the submodule.

Consistency considerations (gc)
  e.g. git gc --aggressive --prune=now

The repacking logic is already aware of a detached HEAD, such that
using this new symref mechanism would not generate problems as long as
we keep the HEAD attached to the superproject. However when commits/objects
are created while the HEAD is attached to the superproject and then HEAD
switches to a local branch, there are problems with the created objects
as they seem unreachable now.

This problem is not new as a superproject may record submodule objects
that are not reachable from any of the submodule branches. Such objects
fall prey to overzealous packing in the submodule.

This proposal however exposes this problem a lot more, as the submodule
has fewer needs for branches.

Pros:
 * easy to tell if a submodule is attached to the superproject
 * no atomicity issues
 * once enough commands implement this behavior, it may be easier to understand
   than previous alternatives and feel more intuitive
Cons:
 * gc issues for now
 * lots of work as it revamps submodules a lot.
 
 This last proposal might be differentiated further, e.g. the submodule HEAD
 pointing at the superproject's gitlink in the index, in its HEAD or in another
 branch.

Any feedback welcome!
Stefan


* Re: [RFD] Long term plan with submodule refs?
  2017-11-09  0:10   ` [RFD] Long term plan with submodule refs? Stefan Beller
@ 2017-11-09  1:29     ` Jonathan Tan
  2017-11-09  5:47       ` Junio C Hamano
  2017-11-09  5:08     ` Junio C Hamano
  2017-11-09  6:54     ` Jacob Keller
  2 siblings, 1 reply; 16+ messages in thread
From: Jonathan Tan @ 2017-11-09  1:29 UTC (permalink / raw)
  To: Stefan Beller; +Cc: git

On Wed,  8 Nov 2017 16:10:07 -0800
Stefan Beller <sbeller@google.com> wrote:

I thought of a possible alternative and how it would work.

> Possible data models and workflow implications
> ==============================================
> In the following, different data models are presented that aid a
> submodule-heavy workflow, each with its pros and cons.

What if, in the submodule, we have a new ref backend that mirrors the
superproject? When initializing the submodule, its original refs are not
cloned at all, but instead virtual refs are used.

Creation of brand-new refs is forbidden in the submodule.

When reading a ref in the submodule, if that ref is the current branch
in the superproject, read the corresponding gitlink entry in the index
(which may be dirty); otherwise read the gitlink in the tree of the tip
commit.

When updating a ref in the submodule, if that ref is the current branch
in the superproject, update the index; otherwise, create a commit on top
of the tip and update the ref to point to the new tip.

No synchronicity is enforced between superproject and submodule in terms
of HEAD, though: If a submodule is currently checked out to a branch,
and the gitlink for that branch is updated through whatever means, that
is equivalent to a "git reset --soft" in the submodule.
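
To illustrate the read rule, here is a minimal sketch in terms of
existing plumbing. The helper names virtual_ref_source and
resolve_virtual_ref are made up for this example, refs are assumed to
be given by their full name (e.g. refs/heads/topic), and a real ref
backend would of course not shell out like this.

```shell
# Decide where a submodule ref would be read from under the rules above.
# $1: ref being read in the submodule, $2: superproject's current branch.
virtual_ref_source () {
	if [ "$1" = "$2" ]; then echo index; else echo tree; fi
}

# Resolve ref $1 for the submodule at path $2 in the superproject at $3.
resolve_virtual_ref () {
	ref=$1 path=$2 super=$3
	# Empty when the superproject itself is on a detached HEAD.
	current=$(git -C "$super" symbolic-ref -q HEAD) || current=
	case $(virtual_ref_source "$ref" "$current") in
	index)
		# Current branch: read the (possibly dirty) index entry.
		git -C "$super" ls-files --stage -- "$path" |
			awk '$1 == "160000" { print $2 }' ;;
	tree)
		# Any other branch: read the gitlink in the tip commit's tree.
		git -C "$super" ls-tree "$ref" -- "$path" |
			awk '$1 == "160000" { print $3 }' ;;
	esac
}
```

The update rule would mirror this split: an index update for the
current branch, and a new superproject commit for any other branch.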

These rules seem straightforward to me (although I have been working
with Git for a while, so perhaps I'm not the best judge), and I think
they lead to a good workflow, as discussed below.

> Workflows
> =========
> * Obtaining a copy of the Superproject tightly coupled with submodules
>   solved via git clone --recurse-submodules=<pathspec>
> * Changing the submodule selection
>   solved via submodule.active flags
> * Changing the remote / Interacting with a different remote for all submodules
>   -> need to be solved, not core issue of this discussion
> * Syncing to the latest upstream
>   solved via git pull --recurse  

(skipping the above, since they are either solved or not a core issue)

> * Working on a local feature in one submodule
>   -> How do refs work spanning superproject/submodule?

This is perhaps one weak point of my proposal - you can't work on a
submodule as if it were independent. You can checkout a branch and make
commits, but (i) they will automatically affect the superproject, and
(ii) the "origin/foo" etc. branches are those of the superproject. (But
if you checkout a detached HEAD, everything should still work.)

> * Working on a feature spanning multiple submodules
>   -> How do refs work spanning multiple repos?

The above rules allow the following workflow:
 - "checkout --recurse-submodules" the branch you want on the
   superproject
 - make whatever changes you want in each submodule
 - commit each individual submodule (which updates the index of the
   superproject), then commit the superproject (we can introduce a
   commit --recurse-submodules to make this more convenient)
 - a "push --recurse-submodules" can be implemented to push the
   superproject and its submodules independently (and the same refspec
   can be legitimately used both when pushing the superproject and when
   pushing a submodule, since the ref names are the same, and not by
   coincidence)

If the user insists on making changes on a non-current branch (i.e. by
creating commits in submodules then using "git update-ref" or
equivalent), possibly multiple commits would be created in the
superproject, but the user can still squash them later if desired.

> * Working on a bug fix (Changing the feature that you currently work on, branches)
>   -> How does switching branches in the superproject affect submodules

You will have to stash or commit your changes. (Which reminds me...GC in
the subproject will need to consult the reflog of the superproject too.)

> New type of symbolic refs
> =========================
> A symbolic ref can currently only point at a ref or another symbolic ref.
> This proposal showcases different scenarios on how this could change in the
> future.
> 
> HEAD pointing at the superprojects index
> ----------------------------------------

Assuming we don't need synchronicity, the existing HEAD format can be
retained. To clarify what happens during ref writes, I'll reuse the
scenarios Stefan described:

> Ref write operations driven by the submodule, affecting target ref
>   e.g. git commit, reset --hard, update-ref (in the submodule)
> 
> The HEAD stays the same, pointing at the superproject.
> The gitlink is changed to the target sha1, using
> 
>   git -C <superproject> update-index --add \
>       --cacheinfo 160000,$SHA1,<gitlink-path>
> 
> This will affect the superprojects index, such that then a commit in
> the superproject is needed.

In this proposal, the HEAD also stays the same (pointing at the branch).

Either the index is updated or a commit is needed. If a commit is
needed, it is automatically performed.
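
As a rough sketch of what "automatically performed" could mean, the update
rule can be emulated with existing plumbing. The helper name
virtual_ref_update and the commit message are invented for illustration;
the non-current-branch case uses a temporary index so that the
superproject's real index and working tree are left alone:

```shell
#!/bin/sh
# Sketch only: emulate the proposed "virtual ref" update rule.
# virtual_ref_update is a hypothetical helper, not a Git command.
# usage: virtual_ref_update <superproject> <gitlink-path> <branch> <new-oid>
virtual_ref_update () {
	super=$1 path=$2 branch=$3 oid=$4
	current=$(git -C "$super" symbolic-ref --quiet --short HEAD)
	if test "$branch" = "$current"
	then
		# Current branch: only the superproject's index changes.
		git -C "$super" update-index --add --cacheinfo "160000,$oid,$path"
		return
	fi
	# Other branch: build a commit on top of its tip via a scratch index.
	tmpdir=$(mktemp -d) || return
	(
		GIT_INDEX_FILE=$tmpdir/index
		export GIT_INDEX_FILE
		git -C "$super" read-tree "refs/heads/$branch" &&
		git -C "$super" update-index --add --cacheinfo "160000,$oid,$path" &&
		tree=$(git -C "$super" write-tree) &&
		commit=$(git -C "$super" commit-tree -p "refs/heads/$branch" \
			-m "Update $path" "$tree") &&
		git -C "$super" update-ref "refs/heads/$branch" "$commit"
	)
	status=$?
	rm -rf "$tmpdir"
	return $status
}
```

Each call on a non-current branch creates one superproject commit, which is
why several such updates would leave several commits to squash later.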

> Ref write operations driven by the superproject, changing the gitlink
>   e.g. git checkout <tree-ish>, git reset --hard (in the superproject)
> 
> This will change the gitlink in the superprojects index, such that the HEAD
> in the submodule changes, which would trigger an update of the
> submodules working tree.

The HEAD in the submodule is unchanged. If the value of a ref has
changed "from underneath", this is as if a "git reset --soft" was done.

> Superproject operations spanning index and worktree
>   E.g. git reset --mixed
> As the submodules HEAD is defined in the index, we would reset it to the
> version in the last commit. As --mixed promises to not touch the working tree,
> the submodules worktree would not be touched. git reset --mixed in the
> superproject is the same as --soft in the submodule.

Same.

> Consistency considerations (gc)
>   e.g. git gc --aggressive --prune=now
> 
> The repacking logic is already aware of a detached HEAD, such that
> using this new symref mechanism would not generate problems as long as
> we keep the HEAD attached to the superproject. However when commits/objects
> are created while the HEAD is attached to the superproject and then HEAD
> switches to a local branch, there are problems with the created objects
> as they seem unreachable now.
> 
> This problem is not new as a superproject may record submodule objects
> that are not reachable from any of the submodule branches. Such objects
> fall prey to overzealous packing in the submodule.

The scenario Stefan describes will work OK - if a commit is created
while the HEAD is pointing to a branch, then either the superproject's
index will be updated or commits will be created in the superproject.
When GC reads the list of refs in the submodule, the new submodule
commit will be included. (Remember that if the superproject's current
branch is "foo", "refs/heads/foo" in the submodule reflects the
superproject's index, so any changes to the index, though uncommitted,
will appear as a ref.)

The problem still exists (e.g. stashes in the superproject) but is
reduced, I think.


* Re: [RFD] Long term plan with submodule refs?
  2017-11-09  0:10   ` [RFD] Long term plan with submodule refs? Stefan Beller
  2017-11-09  1:29     ` Jonathan Tan
@ 2017-11-09  5:08     ` Junio C Hamano
  2017-11-09 19:57       ` Stefan Beller
  2017-11-09  6:54     ` Jacob Keller
  2 siblings, 1 reply; 16+ messages in thread
From: Junio C Hamano @ 2017-11-09  5:08 UTC (permalink / raw)
  To: Stefan Beller; +Cc: jonathantanmy, git

Stefan Beller <sbeller@google.com> writes:

>> The relationship is indeed currently useful, but if the long term plan
>> is to strongly discourage detached submodule HEAD, then I would think
>> that these patches are in the wrong direction. (If the long term plan is
>> to end up supporting both detached and linked submodule HEAD, then these
>> patches are fine, of course.) So I think that the plan referenced in
>> Junio's email (that you linked above) still needs to be discussed.
>
> This email presents different approaches.
>
> Objective
> =========
> This document should summarize the current situation of Git submodules
> and start a discussion of where it can be headed long term.
> Show different ways in which submodule refs could evolve.
>
> Background
> ==========
> Submodules in Git are considered as an independent repository currently.
> This is okay for current workflows, such as utilizing a library that is
> rarely updated. Other workflows that require a tighter integration between
> submodule and superproject are possible, but cumbersome as there is an
> additional step that has to be performed, which is the update of the gitlink
> pointer in the superproject.

I do not think "rarely updated" is an issue.

The problem is that we may want to make it easier to use a
superproject and its submodules as if the combined whole were a
single project, which currently is not easy, primarily because
submodules are separate entities with different set of branches that
can be checked out independently from what branch the superproject
is working on.

> Workflows
> =========
> * Obtaining a copy of the Superproject tightly coupled with submodules
>   solved via git clone --recurse-submodules=<pathspec>
> * Changing the submodule selection
>   solved via submodule.active flags
> * Changing the remote / Interacting with a different remote for all submodules
>   -> need to be solved, not core issue of this discussion
> * Syncing to the latest upstream
>   solved via git pull --recurse  
> * Working on a local feature in one submodule
>   -> How do refs work spanning superproject/submodule?
> * Working on a feature spanning multiple submodules
>   -> How do refs work spanning multiple repos?
> * Working on a bug fix (Changing the feature that you currently work on, branches)
>   -> How does switching branches in the superproject affect submodules

These are good starting points for copying such a combined whole to
your local machine and start working on it.  The more interesting,
important, and potentially difficult part is how the result of such
work is shared back to where you started from.  "push --recursive"
may be a simple phrase, but a sensible definition of how it should
work won't be that simple.

> Possible data models and workflow implications
> ==============================================
> In the following different data models are presented, which aid a submodule
> heavy workflow each giving pros and cons.
>
> Keep everything as is, superproject and submodule have their own refs
> ---------------------------------------------------------------------
> ...
> Cons:
>  * Current tools that manage multiple repositories (e.g. repo, git-slave)
>    have "branches in parallel", i.e. each repo has a branch of the same
>    name, instead of using a superproject to manage the state of all repos
>    involved. So users of such tools may be confused by submodules.
>  * when using a detached HEAD in the submodule, we may run into git-gc issues.

We should make detached HEAD safe against gc if it is not,
regardless of the use of submodules.  I thought it already was made
safe a long time ago.

> Use replicate refs in submodules
> --------------------------------
> This approach will replicate the superproject refs into the submodule
> ref namespace, e.g. git-branch learns about --recurse-submodules, which
> creates a branch of a given name in all submodules. These (topic) branches
> should be kept in sync with the superproject
>
> Pros:
>  * This seemed intuitive to Gerrit users
>  * 'quick' to implement, most of the commands are already there,
>    just git-branch is needed to have the workflows mentioned above complete.
> Cons:
>  * What does "git checkout -b A B" mean? (special case: B == HEAD)

The command run at which level?  In the superproject, or in a single
submodule?

>    Is the branch name replicated as a string into the submodule operation,
>    or do we dereference the superprojects gitlink and walk from there?

If they are "kept in sync with the superproject", then there should
be no difference between the two, so I do not see any room for
wondering about that.  In other words, if there is need to worry
about the differences between the above two, then it probably is
fundamentally impossible to keep these in sync, and a design that
assumes it is possible would have to expose glitches to the end-user
experience.

I do not know if glitches resulting from there would be so severe to
be show-stoppers, though.  It might be possible to paper them over.

> No submodule refstore at all
> ----------------------------
> Use refs and commits in the superproject to stitch submodule changes
> together. Disallow branches in the submodule. This is only restricted
> to the working tree inside the superproject, such that the output of git-branch
> changes depending whether the working tree is in- or outside the superproject
> working tree.

This would need enhancement for reachability code, but it feels the
cleanest from the philosophical standpoint---if you want to treat a
superproject and its submodules as if it were a single project,
ability to check out a branch in a submodule that does not match
that of the superproject would only get in the way of preserving the
illusion of "single project"-ness.

> New type of symbolic refs
> =========================
> A symbolic ref can currently only point at a ref or another symbolic ref.
> This proposal showcases different scenarios on how this could change in the
> future.
>
> HEAD pointing at the superprojects index
> ----------------------------------------

This looks to me a mere implementation detail for a (part of)
necessary component to realize the above "No submodule refstore".

> Superproject operations spanning index and worktree
>   E.g. git reset --mixed
> As the submodules HEAD is defined in the index, we would reset it to the
> version in the last commit. As --mixed promises to not touch the working tree,
> the submodules worktree would not be touched. git reset --mixed in the
> superproject is the same as --soft in the submodule.

I am not sure if you want to take these promises low-level "single
repository" plumbing operations make too literally.  "reset --mixed"
may promise not to touch the working tree, but it also promises not
to touch submodules at all.  If you are breaking the latter anyway,
it would make more sense not to be afraid of breaking the former if
it makes sense in the context of allowing the command to do more by
breaking the latter.



* Re: [RFD] Long term plan with submodule refs?
  2017-11-09  1:29     ` Jonathan Tan
@ 2017-11-09  5:47       ` Junio C Hamano
  0 siblings, 0 replies; 16+ messages in thread
From: Junio C Hamano @ 2017-11-09  5:47 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: Stefan Beller, git

Jonathan Tan <jonathantanmy@google.com> writes:

> What if, in the submodule, we have a new ref backend that mirrors the
> superproject? When initializing the submodule, its original refs are not
> cloned at all, but instead virtual refs are used.
> ...
> These rules seem straightforward to me (although I have been working
> with Git for a while, so perhaps I'm not the best judge), and I think
> they lead to a good workflow, as discussed below.

Indeed this is intriguing.

> The above rules allow the following workflow:
>  - "checkout --recurse-submodules" the branch you want on the
>    superproject
>  - make whatever changes you want in each submodule
>  - commit each individual submodule (which updates the index of the
>    superproject), then commit the superproject (we can introduce a
>    commit --recurse-submodules to make this more convenient)

The "recurse" option would also give users an extra atomicity, and
would not be merely for convenience; when a user wants to treat a
superproject and its two submodules as if the combined whole were a
single repository, there shouldn't be two separate commits in the
history of the superproject only because two submodules made one
commit each to work on a single theme that spans all of them.



* Re: [RFD] Long term plan with submodule refs?
  2017-11-09  0:10   ` [RFD] Long term plan with submodule refs? Stefan Beller
  2017-11-09  1:29     ` Jonathan Tan
  2017-11-09  5:08     ` Junio C Hamano
@ 2017-11-09  6:54     ` Jacob Keller
  2017-11-09 20:16       ` Stefan Beller
  2 siblings, 1 reply; 16+ messages in thread
From: Jacob Keller @ 2017-11-09  6:54 UTC (permalink / raw)
  To: Stefan Beller; +Cc: Jonathan Tan, Git mailing list

On Wed, Nov 8, 2017 at 4:10 PM, Stefan Beller <sbeller@google.com> wrote:
>> The relationship is indeed currently useful, but if the long term plan
>> is to strongly discourage detached submodule HEAD, then I would think
>> that these patches are in the wrong direction. (If the long term plan is
>> to end up supporting both detached and linked submodule HEAD, then these
>> patches are fine, of course.) So I think that the plan referenced in
>> Junio's email (that you linked above) still needs to be discussed.
>

> New type of symbolic refs
> =========================
> A symbolic ref can currently only point at a ref or another symbolic ref.
> This proposal showcases different scenarios on how this could change in the
> future.
>
> HEAD pointing at the superprojects index
> ----------------------------------------
> Introduce a new symbolic ref that points at the superprojects
> index of the gitlink. The format is
>
>   "repo:" <superprojects gitdir> '\0' <gitlink-path> '\0'
>
> Just like existing symrefs, the content of the ref will be read and followed.
> On reading "repo:", the sha1 will be obtained equivalent to:
>
>     git -C <superproject> ls-files -s <gitlink-path> | awk '{ print $2}'
>
> Ref write operations driven by the submodule, affecting symrefs
>   e.g. git checkout <other branch> (in the submodule)
>
> In this scenario only the HEAD is optionally attached to the superproject,
> so we can rewrite the HEAD to be anything else, such as a branch just fine.
> Once the HEAD is not pointing at the superproject any more, we'll leave the
> submodule alone in operations driven by the superproject.
> To get back on the superproject branch, we’d need to invent new UX, such as
>    git checkout --attach-superproject
> as that is similar to --detach
>

Some of the ideas trimmed for brevity, but I like this aspect the most.
Currently, I work on several projects which have multiple
repositories, which are essentially submodules.

However, historically, we kept them separate. 99% of the time, you can
use all 3 projects on "master" and everything works. But if you go
back in time, there's no record of what commit the parent project
wanted this "COMMON" folder to be at.

I started promoting using submodules for this, since it seemed quite natural.

The core problem is that several developers never quite understood or
grasped how submodules worked. There are problems like "but what if I
wanna work on master?" or people assume submodules need to be checked
out at master instead of in a detached HEAD state.

So we often get people who don't run git submodule update and thus are
confused about why their submodules are often out of date. (This can
be solved by recursive options to commands to more often recurse into
submodules and checkout and update them).

We also often get people who accidentally commit the old version of
the repository, or commit an update to the parent project pointing the
submodule at some commit which isn't yet in the upstream of the common
repository.

The proposal here seems to match the intuition about how submodules
should work, with the ability to "attach" or "detach" the submodule
when working on the submodule directly.

Ideally, I'd like for more ways to say "ignore what my submodule is
checked out at, since I will have something else checked out, and
don't intend to commit just yet."

Basically, a workflow where it's easier to have each submodule checked
out at master, and we can still keep track of historical relationship
of what commit was the submodule at some time ago, but without causing
some of these headaches.

I've often tried to use the "--skip-worktree" bit to have people set
their repository to ignore the submodule. Unfortunately, this is
pretty complex, and most of the time, developers never remember to do
this again on a fresh clone.
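
For reference, what I ask people to run boils down to something like the
sketch below. The helper names are made up for illustration; the
update-index and ls-files options are the stock ones. Since the bit lives
only in the local index, it is lost on a fresh clone, which is exactly the
part people forget:

```shell
#!/bin/sh
# Sketch only: hypothetical helpers around the skip-worktree bit.
# usage: ignore_submodule <superproject> <gitlink-path>
ignore_submodule () {
	# Set the skip-worktree bit on the gitlink entry in the index.
	git -C "$1" update-index --skip-worktree -- "$2"
}

# usage: track_submodule <superproject> <gitlink-path>
track_submodule () {
	# Clear the bit again.
	git -C "$1" update-index --no-skip-worktree -- "$2"
}

# usage: is_ignored <superproject> <gitlink-path>
is_ignored () {
	# "ls-files -v" tags skip-worktree entries with an "S".
	git -C "$1" ls-files -v -- "$2" | grep -q '^S '
}
```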

Thanks,
Jake


* Re: [RFD] Long term plan with submodule refs?
  2017-11-09  5:08     ` Junio C Hamano
@ 2017-11-09 19:57       ` Stefan Beller
  0 siblings, 0 replies; 16+ messages in thread
From: Stefan Beller @ 2017-11-09 19:57 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Jonathan Tan, git

On Wed, Nov 8, 2017 at 9:08 PM, Junio C Hamano <gitster@pobox.com> wrote:
> Stefan Beller <sbeller@google.com> writes:
>
>>> The relationship is indeed currently useful, but if the long term plan
>>> is to strongly discourage detached submodule HEAD, then I would think
>>> that these patches are in the wrong direction. (If the long term plan is
>>> to end up supporting both detached and linked submodule HEAD, then these
>>> patches are fine, of course.) So I think that the plan referenced in
>>> Junio's email (that you linked above) still needs to be discussed.
>>
>> This email presents different approaches.
>>
>> Objective
>> =========
>> This document should summarize the current situation of Git submodules
>> and start a discussion of where it can be headed long term.
>> Show different ways in which submodule refs could evolve.
>>
>> Background
>> ==========
>> Submodules in Git are considered as an independent repository currently.
>> This is okay for current workflows, such as utilizing a library that is
>> rarely updated. Other workflows that require a tighter integration between
>> submodule and superproject are possible, but cumbersome as there is an
>> additional step that has to be performed, which is the update of the gitlink
>> pointer in the superproject.
>
> I do not think "rarely updated" is an issue.
>
> The problem is that we may want to make it easier to use a
> superproject and its submodules as if the combined whole were a
> single project, which currently is not easy, primarily because
> submodules are separate entities with different set of branches that
> can be checked out independently from what branch the superproject
> is working on.

Well, this fact does not seem to be a problem in the current use of
submodules, precisely because the workflow either (a) is not too
cumbersome or (b) is not executed often enough to bother anyone.

> These are good starting points for copying such a combined whole to
> your local machine and start working on it.  The more interesting,
> important, and potentially difficult part is how the result of such
> work is shared back to where you started from.  "push --recursive"
> may be a simple phrase, but a sensible definition of how it should
> work won't be that simple.
...
>
> We should make detached HEAD safe against gc if it is not,
> regardless of the use of submodules.  I thought it already was made
> safe a long time ago.

The detached HEAD itself is protected via its reflog (whose entries are
kept for 90 days by default, or 30 days if they are no longer reachable
from the current tip).

If I were to develop using detached HEAD only in today's world of
submodules using different branches in the superproject, I run the risk
of losing some commits in the submodule, as they are not the detached
HEAD all the time, but might even be loose tips.

This combined with the previous paragraph brings in another important
concern:
Some projects would have a very different history when used as a
submodule compared to when used as a stand alone project.
Other projects may be closely aligned between their branches and
what the superproject records.

So the more we deviate from the traditional branch model, the easier
we make it to have the submodule tips be very different from the
standalone tips, which may overexpose us to the gc issues as well as
the general question how much these projects have in common.

>> Use replicate refs in submodules
>> --------------------------------
>> This approach will replicate the superproject refs into the submodule
>> ref namespace, e.g. git-branch learns about --recurse-submodules, which
>> creates a branch of a given name in all submodules. These (topic) branches
>> should be kept in sync with the superproject
>>
>> Pros:
>>  * This seemed intuitive to Gerrit users
>>  * 'quick' to implement, most of the commands are already there,
>>    just git-branch is needed to have the workflows mentioned above complete.
>> Cons:
>>  * What does "git checkout -b A B" mean? (special case: B == HEAD)
>
> The command run at which level?  In the superproject, or in a single
> submodule?

In the superproject, with --recurse-submodules, as the A and B would recurse
as strings, and not change meaning depending on the gitlink value.

>
>>    Is the branch name replicated as a string into the submodule operation,
>>    or do we dereference the superprojects gitlink and walk from there?
>
> If they are "kept in sync with the superproject", then there should
> be no difference between the two, so I do not see any room for
> wondering about that.

Except you can still break out by issuing commands in the submodule
to change the submodule refs to be different from the superproject.

This was also more along the lines of thinking about the (Gerrit) remote,
which does an okay, but not stellar, job in keeping the remote branches
for superproject and submodule in sync. I'd expect glitches there.

> In other words, if there is need to worry
> about the differences between the above two, then it probably is
> fundamentally impossible to keep these in sync, and a design that
> assumes it is possible would have to expose glitches to the end-user
> experience.

yup. And by exposing you probably mean a patch series as presented?
(git status/log/diff making noise about the glitch?)

> I do not know if glitches resulting from there would be so severe to
> be show-stoppers, though.  It might be possible to paper them over.

I think so, too, as long as the user is pointed at the glitch to correct them.

>
>> No submodule refstore at all
>> ----------------------------
>> Use refs and commits in the superproject to stitch submodule changes
>> together. Disallow branches in the submodule. This is only restricted
>> to the working tree inside the superproject, such that the output of git-branch
>> changes depending whether the working tree is in- or outside the superproject
>> working tree.
>
> This would need enhancement for reachability code, but it feels the
> cleanest from the philosophical standpoint---if you want to treat a
> superproject and its submodules as if it were a single project,
> ability to check out a branch in a submodule that does not match
> that of the superproject would only get in the way of preserving the
> illusion of "single project"-ness.

I wonder if we can combine this with the approach Jonathan gave above.
In the worktree (of the submodule inside the superproject) you are allowed
to use these "mirrored" refs, whereas in any other worktree you have full
access to the normal refs of the project.

>
>> New type of symbolic refs
>> =========================
>> A symbolic ref can currently only point at a ref or another symbolic ref.
>> This proposal showcases different scenarios on how this could change in the
>> future.
>>
>> HEAD pointing at the superprojects index
>> ----------------------------------------
>
> This looks to me a mere implementation detail for a (part of)
> necessary component to realize the above "No submodule refstore".

Ah ok.

If all branches would use this new symref type, the handling would
seem to be very similar to what Jonathan described with a new type
of refstore instead.

>> Superproject operations spanning index and worktree
>>   E.g. git reset --mixed
>> As the submodules HEAD is defined in the index, we would reset it to the
>> version in the last commit. As --mixed promises to not touch the working tree,
>> the submodules worktree would not be touched. git reset --mixed in the
>> superproject is the same as --soft in the submodule.
>
> I am not sure if you want to take these promises low-level "single
> repository" plumbing operations make too literally.  "reset --mixed"
> may promise not to touch the working tree, but it also promises not
> to touch submodules at all.  If you are breaking the latter anyway,
> it would make more sense not to be afraid of breaking the former if
> it makes sense in the context of allowing the command to do more by
> breaking the latter.

ok.

Thanks,
Stefan


* Re: [RFD] Long term plan with submodule refs?
  2017-11-09  6:54     ` Jacob Keller
@ 2017-11-09 20:16       ` Stefan Beller
  2017-11-10  3:37         ` Jacob Keller
  0 siblings, 1 reply; 16+ messages in thread
From: Stefan Beller @ 2017-11-09 20:16 UTC (permalink / raw)
  To: Jacob Keller; +Cc: Jonathan Tan, Git mailing list

On Wed, Nov 8, 2017 at 10:54 PM, Jacob Keller <jacob.keller@gmail.com> wrote:
> On Wed, Nov 8, 2017 at 4:10 PM, Stefan Beller <sbeller@google.com> wrote:
>>> The relationship is indeed currently useful, but if the long term plan
>>> is to strongly discourage detached submodule HEAD, then I would think
>>> that these patches are in the wrong direction. (If the long term plan is
>>> to end up supporting both detached and linked submodule HEAD, then these
>>> patches are fine, of course.) So I think that the plan referenced in
>>> Junio's email (that you linked above) still needs to be discussed.
>>
>
>> New type of symbolic refs
>> =========================
>> A symbolic ref can currently only point at a ref or another symbolic ref.
>> This proposal showcases different scenarios on how this could change in the
>> future.
>>
>> HEAD pointing at the superprojects index
>> ----------------------------------------
>> Introduce a new symbolic ref that points at the superprojects
>> index of the gitlink. The format is
>>
>>   "repo:" <superprojects gitdir> '\0' <gitlink-path> '\0'
>>
>> Just like existing symrefs, the content of the ref will be read and followed.
>> On reading "repo:", the sha1 will be obtained equivalent to:
>>
>>     git -C <superproject> ls-files -s <gitlink-path> | awk '{ print $2}'
>>
>> Ref write operations driven by the submodule, affecting symrefs
>>   e.g. git checkout <other branch> (in the submodule)
>>
>> In this scenario only the HEAD is optionally attached to the superproject,
>> so we can rewrite the HEAD to be anything else, such as a branch just fine.
>> Once the HEAD is not pointing at the superproject any more, we'll leave the
>> submodule alone in operations driven by the superproject.
>> To get back on the superproject branch, we’d need to invent new UX, such as
>>    git checkout --attach-superproject
>> as that is similar to --detach
>>
>
> Some of the ideas trimmed for brevity, but I like this aspect the most.
> Currently, I work on several projects which have multiple
> repositories, which are essentially submodules.
>
> However, historically, we kept them separate. 99% of the time, you can
> use all 3 projects on "master" and everything works. But if you go
> back in time, there's no record of what commit the parent project
> wanted this "COMMON" folder to be at.

So an environment where "git submodule update --remote" is not that
harmful, but rather brings the joy of being up to date in each project?

> I started promoting using submodules for this, since it seemed quite natural.
>
> The core problem is that several developers never quite understood or
> grasped how submodules worked. There are problems like "but what if I
> wanna work on master?" or people assume submodules need to be checked
> out at master instead of in a detached HEAD state.

So the documentation sucks?

It is intentional that from the superproject's perspective the gitlink
must be one exact value, and that it relies on the submodule to get to
and keep that state.

(I think we once discussed if setting the gitlink value to 00...00 or otherwise
signal that we actually want "the most recent tip of the X branch" would be
a good idea, but I do not think it is, as it misses the point of versioning)

> So we often get people who don't run git submodule update and thus are
> confused about why their submodules are often out of date. (This can
> be solved by recursive options to commands to more often recurse into
> submodules and checkout and update them).
>
> We also often get people who accidentally commit the old version of
> the repository, or commit an update to the parent project pointing the
> submodule at some commit which isn't yet in the upstream of the common
> repository.

Would an upstream prereceive hook (maybe even builtin and accessible via
'receive.denyUnreachableSubmodules') help? (It would require submodules
to be defined with relative URLs in the .gitmodules file and then the receive
command can check for the gitlink value present in this other repository)
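
A minimal sketch of the check such a hook could do, assuming the hook
already knows where the submodule's repository lives on the server. The
function name is invented, 'receive.denyUnreachableSubmodules' does not
exist today, and for brevity this checks every gitlink against a single
repository and only tests that the commit exists there, rather than
resolving each path through .gitmodules or testing reachability from a
branch:

```shell
#!/bin/sh
# Sketch only: gitlinks_present is a hypothetical helper.
# usage: gitlinks_present <superproject-repo> <rev> <submodule-repo>
# Succeeds only if every gitlink recorded in <rev> names a commit that
# exists in <submodule-repo>.
gitlinks_present () {
	super=$1 rev=$2 sub=$3
	missing=$(
		git -C "$super" ls-tree -r "$rev" |
		while read -r mode kind oid path
		do
			# Gitlinks are the tree entries with mode 160000.
			test "$mode" = 160000 || continue
			git -C "$sub" cat-file -e "$oid^{commit}" 2>/dev/null ||
				printf '%s\n' "$path"
		done
	)
	# Any path collected above points at a commit the submodule lacks.
	test -z "$missing"
}
```

A pre-receive hook would run this for each pushed commit and reject the
push when it fails.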

> The proposal here seems to match the intuition about how submodules
> should work, with the ability to "attach" or "detach" the submodule
> when working on the submodule directly.

Well, I think the big-picture discussion is how easy this attaching or
detaching should be: whether only the HEAD is attached or detached, or
whether we invent a new refstore that is a completely new submodule thing,
which cannot be detached from the superproject at all.

> Ideally, I'd like for more ways to say "ignore what my submodule is
> checked out at, since I will have something else checked out, and
> don't intend to commit just yet."

This is in the superproject, when doing a git add . ?

> Basically, a workflow where it's easier to have each submodule checked
> out at master, and we can still keep track of historical relationship
> of what commit was the submodule at some time ago, but without causing
> some of these headaches.

So essentially a repo or otherwise parallel workflow just with the versioning
happening magically behind your back?

> I've often tried to use the "--skip-worktree" bit to have people set
> their repository to ignore the submodule. Unfortunately, this is
> pretty complex, and most of the time, developers never remember to do
> this again on a fresh clone.

That sounds interesting.
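
For reference, a sketch of how the skip-worktree bit mentioned above could be
applied to every submodule in one go. skip_all_submodules is a made-up helper
name. The bit is stored in the local index only, which would explain why it
has to be set again after every fresh clone. Paths that git prints in quoted
form (unusual characters) are not handled here.

```shell
#!/bin/sh
# Hypothetical helper: set the skip-worktree bit on every gitlink
# entry in the index of <repo>.
# Usage: skip_all_submodules <repo>
skip_all_submodules () {
    # Split "ls-files -s" output on the TAB that precedes the path and
    # keep only entries whose mode is 160000 (gitlinks).
    git -C "$1" ls-files -s |
    awk -F '\t' '$1 ~ /^160000 / { print $2 }' |
    while IFS= read -r path
    do
        git -C "$1" update-index --skip-worktree -- "$path"
    done
}
```

"git ls-files -v" marks flagged entries with an "S", and
"git update-index --no-skip-worktree <path>" clears the bit again.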

Thanks,
Stefan

* Re: [RFD] Long term plan with submodule refs?
  2017-11-09 20:16       ` Stefan Beller
@ 2017-11-10  3:37         ` Jacob Keller
  2017-11-10 20:01           ` Stefan Beller
  0 siblings, 1 reply; 16+ messages in thread
From: Jacob Keller @ 2017-11-10  3:37 UTC (permalink / raw)
  To: Stefan Beller; +Cc: Jonathan Tan, Git mailing list

On Thu, Nov 9, 2017 at 12:16 PM, Stefan Beller <sbeller@google.com> wrote:
> On Wed, Nov 8, 2017 at 10:54 PM, Jacob Keller <jacob.keller@gmail.com> wrote:
>> On Wed, Nov 8, 2017 at 4:10 PM, Stefan Beller <sbeller@google.com> wrote:
>>>> The relationship is indeed currently useful, but if the long term plan
>>>> is to strongly discourage detached submodule HEAD, then I would think
>>>> that these patches are in the wrong direction. (If the long term plan is
>>>> to end up supporting both detached and linked submodule HEAD, then these
>>>> patches are fine, of course.) So I think that the plan referenced in
>>>> Junio's email (that you linked above) still needs to be discussed.
>>>
>>
>>> New type of symbolic refs
>>> =========================
>>> A symbolic ref can currently only point at a ref or another symbolic ref.
>>> This proposal showcases different scenarios on how this could change in the
>>> future.
>>>
>>> HEAD pointing at the superprojects index
>>> ----------------------------------------
>>> Introduce a new symbolic ref that points at the superprojects
>>> index of the gitlink. The format is
>>>
>>>   "repo:" <superprojects gitdir> '\0' <gitlink-path> '\0'
>>>
>>> Just like existing symrefs, the content of the ref will be read and followed.
>>> On reading "repo:", the sha1 will be obtained equivalent to:
>>>
>>>     git -C <superproject> ls-files -s <gitlink-path> | awk '{ print $2}'
>>>
>>> Ref write operations driven by the submodule, affecting symrefs
>>>   e.g. git checkout <other branch> (in the submodule)
>>>
>>> In this scenario only the HEAD is optionally attached to the superproject,
>>> so we can rewrite the HEAD to be anything else, such as a branch just fine.
>>> Once the HEAD is not pointing at the superproject any more, we'll leave the
>>> submodule alone in operations driven by the superproject.
>>> To get back on the superproject branch, we’d need to invent new UX, such as
>>>    git checkout --attach-superproject
>>> as that is similar to --detach
>>>
>>
>> Some of the idea trimmed for brevity, but I like this aspect the most.
>> Currently, I work on several projects which have multiple
>> repositories, which are essentially submodules.
>>
>> However, historically, we kept them separate. 99% of the time, you can
>> use all 3 projects on "master" and everything works. But if you go
>> back in time, there's no correlation to "what did the parent project
>> want this "COMMON" folder to be at?
>
> So an environment where "git submodule update --remote" is not that
> harmful, but rather brings the joy of being up to date in each project?
>
>> I started promoting using submodules for this, since it seemed quite natural.
>>
>> The core problem, is that several developers never quite understood or
>> grasped how submodules worked. There's problems like "but what if I
>> wanna work on master?" or people assume submodules need to be checked
>> out at master instead of in a detached HEAD state.
>
> So the documentation sucks?
>
> It is intentional that from the superprojects perspective the gitlink
> must be one
> exact value, and rely on the submodule to get to and keep that state.
>
> (I think we once discussed if setting the gitlink value to 00...00 or otherwise
> signal that we actually want "the most recent tip of the X branch" would be
> a good idea, but I do not think it as it misses the point of versioning)
>
>> So we often get people who don't run git submodule update and thus are
>> confused about why their submodules are often out of date. (This can
>> be solved by recursive options to commands to more often recurse into
>> submodules and checkout and update them).
>>
>> We also often get people who accidentally commit the old version of
>> the repository, or commit an update to the parent project pointing the
>> submodule at some commit which isn't yet in the upstream of the common
>> repository.
>
> Would an upstream prereceive hook (maybe even builtin and accessible via
> 'receive.denyUnreachableSubmodules') help? (It would require submodules
> to be defined with relative URLs in the .gitmodules file and then the receive
> command can check for the gitlink value present in this other repository)
>
>> The proposal here seems to match the intuition about how submodules
>> should work, with the ability to "attach" or "detach" the submodule
>> when working on the submodule directly.
>
> Well I think the big picture discussion is how easy this attaching or
> detaching is. Whether only the HEAD is attached or detached, or if we
> invent a new refstore that is a complete new submodule thing, which
> cannot be detached from the superproject at all.
>
>> Ideally, I'd like for more ways to say "ignore what my submodule is
>> checked out at, since I will have something else checked out, and
>> don't intend to commit just yet."
>
> This is in the superproject, when doing a git add . ?
>

Yes.

>> Basically, a workflow where it's easier to have each submodule checked
>> out at master, and we can still keep track of historical relationship
>> of what commit was the submodule at some time ago, but without causing
>> some of these headaches.
>
> So essentially a repo or otherwise parallel workflow just with the versioning
> happening magically behind your back?

Ideally, my developers would like to just have each submodule checked
out at master.

Ideally, I'd like to be able to check out an old version of the parent
project and have it record which version the shared submodule was at
at the time.

Ideally, my developers don't want to have to worry about knowing that
they shouldn't "git add -A" or "git commit -a" when they have a
submodule checked out at a different location from the parent project's
gitlink.

Thanks,
Jake

>
>> I've often tried to use the "--skip-worktree" bit to have people set
>> their repository to ignore the submodule. Unfortunately, this is
>> pretty complex, and most of the time, developers never remember to do
>> this again on a fresh clone.
>
> That sounds interesting.
>
> Thanks,
> Stefan

* Re: [RFD] Long term plan with submodule refs?
  2017-11-10  3:37         ` Jacob Keller
@ 2017-11-10 20:01           ` Stefan Beller
  2017-11-11  5:25             ` Jacob Keller
  0 siblings, 1 reply; 16+ messages in thread
From: Stefan Beller @ 2017-11-10 20:01 UTC (permalink / raw)
  To: Jacob Keller; +Cc: Jonathan Tan, Git mailing list

>
>>> Basically, a workflow where it's easier to have each submodule checked
>>> out at master, and we can still keep track of historical relationship
>>> of what commit was the submodule at some time ago, but without causing
>>> some of these headaches.
>>
>> So essentially a repo or otherwise parallel workflow just with the versioning
>> happening magically behind your back?
>
> Ideally, my developers would like to just have each submodule checked
> out at master.
>
> Ideally, I'd like to be able to checkout an old version of the parent
> project and have it recorded what version of the shared submodule was
> at at the time.

This sounds as if a "passive superproject" would work best for you, i.e.
each commit in a submodule is bubbled up into the superproject,
making a commit potentially even behind the scenes, such that the
user interaction with the superproject would be none.

However this approach also sounds careless, as there is no precondition
that e.g. the superproject builds with all the submodules as is; it is a mere
tracking of "at this time we have the submodules arranged as such",
whereas for the versioning aspect, you would want to have commit messages
in the superproject saying *why* you bumped up a specific submodule.
The user may not like to give such an explanation as they already wrote
a commit message for the individual project.

Also, this sounds like a local approach, as it is not clear to me
why you'd want to share the superproject history.
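
To illustrate what I mean by bubbling up, here is a minimal sketch of what
could run after each commit in a submodule, e.g. from a post-commit hook.
bubble_up is a hypothetical name and the commit message format is made up;
it shows exactly the weakness described above, as the superproject commit
says nothing about *why* the submodule moved.

```shell
#!/bin/sh
# Hypothetical helper: record the submodule's current HEAD in the
# superproject without any user interaction.
# Usage: bubble_up <superproject-dir> <submodule-path>
bubble_up () {
    super=$1 path=$2
    tip=$(git -C "$super/$path" rev-parse --short HEAD) || return 1
    # Stage the submodule's current HEAD as the new gitlink value.
    git -C "$super" add -- "$path" || return 1
    # Nothing staged for this path: the gitlink is already up to date.
    git -C "$super" diff --cached --quiet -- "$path" && return 0
    # Commit only the gitlink, leaving other staged changes alone.
    git -C "$super" commit -q -m "Update $path to $tip" -- "$path"
}
```

When called from a real hook, the GIT_DIR that git exports to hooks would
have to be unset first, so that "git -C" operates on the superproject.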

> Ideally, my developers don't want to have to worry about knowing that
> they shouldn't "git add -a" or "git commit -a" when they have a
> submodule checked out at a different location from the parent projects
> gitlink.
>
> Thanks,
> Jake
>

* Re: [RFD] Long term plan with submodule refs?
  2017-11-10 20:01           ` Stefan Beller
@ 2017-11-11  5:25             ` Jacob Keller
  0 siblings, 0 replies; 16+ messages in thread
From: Jacob Keller @ 2017-11-11  5:25 UTC (permalink / raw)
  To: Stefan Beller; +Cc: Jonathan Tan, Git mailing list

On Fri, Nov 10, 2017 at 12:01 PM, Stefan Beller <sbeller@google.com> wrote:
>>
>>>> Basically, a workflow where it's easier to have each submodule checked
>>>> out at master, and we can still keep track of historical relationship
>>>> of what commit was the submodule at some time ago, but without causing
>>>> some of these headaches.
>>>
>>> So essentially a repo or otherwise parallel workflow just with the versioning
>>> happening magically behind your back?
>>
>> Ideally, my developers would like to just have each submodule checked
>> out at master.
>>
>> Ideally, I'd like to be able to checkout an old version of the parent
>> project and have it recorded what version of the shared submodule was
>> at at the time.
>
> This sounds as if a "passive superproject" would work best for you, i.e.
> each commit in a submodule is bubbled up into the superproject,
> making a commit potentially even behind the scenes, such that the
> user interaction with the superproject would be none.
>
> However this approach also sounds careless, as there is no precondition
> that e.g. the superproject builds with all the submodules as is; it is a mere
> tracking of "at this time we have the submodules arranged as such",
> whereas for the versioning aspect, you would want to have commit messages
> in the superproject saying *why* you bumped up a specific submodule.
> The user may not like to give such an explanation as they already wrote
> a commit message for the individual project.
>
> Also this approach sounds like a local approach, as it is not clear to me,
> why you'd want to share the superproject history.
>
>> Ideally, my developers don't want to have to worry about knowing that
>> they shouldn't "git add -a" or "git commit -a" when they have a
>> submodule checked out at a different location from the parent projects
>> gitlink.
>>
>> Thanks,
>> Jake
>>


It doesn't need to be totally passive, in that some people (or one
maintainer) can manage when the submodule pointer is actually updated,
but ideally other users don't have to worry about that and can
"pretend" to always keep each submodule at master, as they have always
done in the past.

Thanks,
Jake

Thread overview: 16+ messages
2017-11-08 19:55 [RFC PATCH 0/4] git-status reports relation to superproject Stefan Beller
2017-11-08 19:55 ` [PATCH 1/4] remote, revision: factor out exclusive counting between two commits Stefan Beller
2017-11-08 19:55 ` [PATCH 2/4] submodule.c: factor start_ls_files_dot_dot out of get_superproject_working_tree Stefan Beller
2017-11-08 19:55 ` [PATCH 3/4] submodule.c: get superprojects gitlink value Stefan Beller
2017-11-08 19:55 ` [PATCH 4/4] git-status: report reference to superproject Stefan Beller
2017-11-08 22:36 ` [RFC PATCH 0/4] git-status reports relation " Jonathan Tan
2017-11-09  0:10   ` [RFD] Long term plan with submodule refs? Stefan Beller
2017-11-09  1:29     ` Jonathan Tan
2017-11-09  5:47       ` Junio C Hamano
2017-11-09  5:08     ` Junio C Hamano
2017-11-09 19:57       ` Stefan Beller
2017-11-09  6:54     ` Jacob Keller
2017-11-09 20:16       ` Stefan Beller
2017-11-10  3:37         ` Jacob Keller
2017-11-10 20:01           ` Stefan Beller
2017-11-11  5:25             ` Jacob Keller