All of lore.kernel.org
 help / color / mirror / Atom feed
* [GSoC] [PATCH] submodule--helper: run update procedures from C
@ 2021-07-22 13:40 Atharva Raykar
  2021-07-23  9:37 ` Ævar Arnfjörð Bjarmason
  2021-08-02 13:06 ` [GSoC] [PATCH v2] " Atharva Raykar
  0 siblings, 2 replies; 13+ messages in thread
From: Atharva Raykar @ 2021-07-22 13:40 UTC (permalink / raw)
  To: git
  Cc: Atharva Raykar, Emily Shaffer, Jonathan Nieder, Junio C Hamano,
	Christian Couder, Shourya Shukla, Kaartic Sivaraam,
	Prathamesh Chavan

Add a new submodule--helper subcommand `run-update-procedure` that runs
the update procedure if the SHA1 of the submodule does not match what
the superproject expects.

This is an intermediate change that works towards total conversion of
`submodule update` from shell to C.

Specific error codes are returned so that the shell script calling the
subcommand can take a decision on the control flow, and preserve the
error messages across subsequent recursive calls of `cmd_update`.

This patch could have been approached differently, by first changing the
`is_tip_reachable` and `fetch_in_submodule` shell functions to be
`submodule--helper` subcommands, and then following up with a patch that
introduces the `run-update-procedure` subcommand. We have not done it
like that because those functions are trivial enough to convert directly
along with these other changes. This lets us avoid the boilerplate and
the cleanup patches that will need to be introduced in following that
approach.

Signed-off-by: Atharva Raykar <raykar.ath@gmail.com>
Mentored-by: Christian Couder <christian.couder@gmail.com>
Mentored-by: Shourya Shukla <periperidip@gmail.com>
---

This patch depends on changes introduced in 559e49fe5c (submodule: prefix die
messages with 'fatal', 2021-07-08), which belongs to the ar/submodule-add
(2021-07-12) series[1]. Other than that commit, it is independent of my other
'submodule add' conversion series.

Opinions on the following would be appreciated:

* Currently there is a lot of special meaning for the exit code, of this
  subcommand, which was needed to handle the various failures of running the
  update mode in the shell code that follows (note the extra handling of exit
  code 128, because a die() in C returns that value). I felt this was okay to do
  because in a later series that converts whatever is left, the handling of exit
  codes will be simplified.

* Is there a way to check if a sha1 is unreachable from all the refs?
  Currently 'is_tip_reachable()' spawns a subprocess with an incantation of:
  'git rev-list -n 1 $sha1 --not --all'
  I suppose I could do this with the revision-walk API [2], but it felt like
  a lot of boilerplate for something that was really succinct in the original
  shell implementation. Maybe worth looking into it for a later patch?

* I added a 'NOTE' comment for `case SM_UPDATE_COMMAND` in
  submodule--helper.c:run_update_command(). I wonder if that comment is
  unnecessary noise or worth mentioning. Is there an edge case where the
  !command in the 'submodule.update' configuration can be improperly handled by
  strvec_split()?

[1] https://lore.kernel.org/git/20210710074801.19917-1-raykar.ath@gmail.com/

Fetch-it-via:
git fetch https://github.com/tfidfwastaken/git.git submodule-run-update-proc-list-1

 builtin/submodule--helper.c | 247 ++++++++++++++++++++++++++++++++++++
 git-submodule.sh            |  95 ++++----------
 2 files changed, 269 insertions(+), 73 deletions(-)

diff --git a/builtin/submodule--helper.c b/builtin/submodule--helper.c
index d55f6262e9..4e16561bf1 100644
--- a/builtin/submodule--helper.c
+++ b/builtin/submodule--helper.c
@@ -2029,6 +2029,20 @@ struct submodule_update_clone {
 	.max_jobs = 1, \
 }
 
+struct update_data {
+	const char *recursive_prefix;
+	const char *sm_path;
+	const char *displaypath;
+	struct object_id sha1;
+	struct object_id subsha1;
+	struct submodule_update_strategy update_strategy;
+	int depth;
+	unsigned int force: 1;
+	unsigned int quiet: 1;
+	unsigned int nofetch: 1;
+	unsigned int just_cloned: 1;
+};
+#define UPDATE_DATA_INIT { .update_strategy = SUBMODULE_UPDATE_STRATEGY_INIT }
 
 static void next_submodule_warn_missing(struct submodule_update_clone *suc,
 		struct strbuf *out, const char *displaypath)
@@ -2282,6 +2296,165 @@ static int git_update_clone_config(const char *var, const char *value,
 	return 0;
 }
 
+/* NEEDSWORK: try to do this without creating a new process */
+static int is_tip_reachable(const char *path, struct object_id *sha1)
+{
+	struct child_process cp = CHILD_PROCESS_INIT;
+	struct strbuf rev = STRBUF_INIT;
+	char *sha1_hex = oid_to_hex(sha1);
+
+	cp.git_cmd = 1;
+	cp.dir = xstrdup(path);
+	cp.no_stderr = 1;
+	strvec_pushl(&cp.args, "rev-list", "-n", "1", sha1_hex, "--not", "--all", NULL);
+
+	prepare_submodule_repo_env(&cp.env_array);
+
+	if (capture_command(&cp, &rev, GIT_MAX_HEXSZ + 1) || rev.len)
+		return 0;
+
+	return 1;
+}
+
+static int fetch_in_submodule(const char *module_path, int depth, int quiet, struct object_id *sha1)
+{
+	struct child_process cp = CHILD_PROCESS_INIT;
+
+	prepare_submodule_repo_env(&cp.env_array);
+	cp.git_cmd = 1;
+	cp.dir = xstrdup(module_path);
+
+	strvec_push(&cp.args, "fetch");
+	if (quiet)
+		strvec_push(&cp.args, "--quiet");
+	if (depth)
+		strvec_pushf(&cp.args, "--depth=%d", depth);
+	if (sha1) {
+		char *sha1_hex = oid_to_hex(sha1);
+		char *remote = get_default_remote();
+		strvec_pushl(&cp.args, remote, sha1_hex, NULL);
+	}
+
+	return run_command(&cp);
+}
+
+static int run_update_command(struct update_data *ud, int subforce)
+{
+	struct child_process cp = CHILD_PROCESS_INIT;
+	struct strbuf die_msg = STRBUF_INIT;
+	struct strbuf say_msg = STRBUF_INIT;
+	char *sha1 = oid_to_hex(&ud->sha1);
+	int retval, must_die_on_failure = 0;
+
+	cp.dir = xstrdup(ud->sm_path);
+	switch (ud->update_strategy.type) {
+	case SM_UPDATE_CHECKOUT:
+		cp.git_cmd = 1;
+		strvec_pushl(&cp.args, "checkout", "-q", NULL);
+		if (subforce)
+			strvec_push(&cp.args, "-f");
+		strbuf_addf(&die_msg, "fatal: Unable to checkout '%s' in submodule path '%s'\n",
+			    sha1, ud->displaypath);
+		strbuf_addf(&say_msg, "Submodule path '%s': checked out '%s'\n",
+			    ud->displaypath, sha1);
+		break;
+	case SM_UPDATE_REBASE:
+		cp.git_cmd = 1;
+		strvec_push(&cp.args, "rebase");
+		if (ud->quiet)
+			strvec_push(&cp.args, "--quiet");
+		strbuf_addf(&die_msg, "fatal: Unable to rebase '%s' in submodule path '%s'\n",
+			    sha1, ud->displaypath);
+		strbuf_addf(&say_msg, "Submodule path '%s': rebased into '%s'\n",
+			    ud->displaypath, sha1);
+		must_die_on_failure = 1;
+		break;
+	case SM_UPDATE_MERGE:
+		cp.git_cmd = 1;
+		strvec_push(&cp.args, "merge");
+		if (ud->quiet)
+			strvec_push(&cp.args, "--quiet");
+		strbuf_addf(&die_msg, "fatal: Unable to merge '%s' in submodule path '%s'\n",
+			    sha1, ud->displaypath);
+		strbuf_addf(&say_msg, "Submodule path '%s': merged in '%s'\n",
+			    ud->displaypath, sha1);
+		must_die_on_failure = 1;
+		break;
+	case SM_UPDATE_COMMAND:
+		/* NOTE: this does not handle quoted arguments */
+		strvec_split(&cp.args, ud->update_strategy.command);
+		strbuf_addf(&die_msg, "fatal: Execution of '%s %s' failed in submodule path '%s'\n",
+			    ud->update_strategy.command, sha1, ud->displaypath);
+		strbuf_addf(&say_msg, "Submodule path '%s': '%s %s'\n",
+			    ud->displaypath, ud->update_strategy.command, sha1);
+		must_die_on_failure = 1;
+		break;
+	case SM_UPDATE_UNSPECIFIED:
+	case SM_UPDATE_NONE:
+		BUG("update strategy should have been specified");
+	}
+
+	strvec_push(&cp.args, sha1);
+
+	prepare_submodule_repo_env(&cp.env_array);
+
+	if (run_command(&cp)) {
+		if (must_die_on_failure) {
+			retval = 2;
+			fputs(_(die_msg.buf), stderr);
+			goto cleanup;
+		}
+		/*
+		 * This signifies to the caller in shell that
+		 * the command failed without dying
+		 */
+		retval = 1;
+		goto cleanup;
+	}
+	retval = 0;
+	puts(_(say_msg.buf));
+
+cleanup:
+	strbuf_release(&die_msg);
+	strbuf_release(&say_msg);
+	return retval;
+}
+
+static int do_run_update_procedure(struct update_data *ud)
+{
+	if ((!is_null_oid(&ud->sha1) && !is_null_oid(&ud->subsha1) && !oideq(&ud->sha1, &ud->subsha1)) ||
+	    is_null_oid(&ud->subsha1) || ud->force) {
+		int subforce = is_null_oid(&ud->subsha1) || ud->force;
+
+		if (!ud->nofetch) {
+			/*
+			 * Run fetch only if `sha1` isn't present or it
+			 * is not reachable from a ref.
+			 */
+			if (!is_tip_reachable(ud->sm_path, &ud->sha1))
+				if (fetch_in_submodule(ud->sm_path, ud->depth, ud->quiet, NULL) &&
+				    !ud->quiet)
+					fprintf_ln(stderr,
+						   _("Unable to fetch in submodule path '%s'; "
+						     "trying to directly fetch %s:"),
+						   ud->displaypath, oid_to_hex(&ud->sha1));
+			/*
+			 * Now we tried the usual fetch, but `sha1` may
+			 * not be reachable from any of the refs.
+			 */
+			if (!is_tip_reachable(ud->sm_path, &ud->sha1))
+				if (fetch_in_submodule(ud->sm_path, ud->depth, ud->quiet, &ud->sha1))
+					die(_("Fetched in submodule path '%s', but it did not "
+					      "contain %s. Direct fetching of that commit failed."),
+					    ud->displaypath, oid_to_hex(&ud->sha1));
+		}
+
+		return run_update_command(ud, subforce);
+	}
+
+	return 3;
+}
+
 static void update_submodule(struct update_clone_data *ucd)
 {
 	fprintf(stdout, "dummy %s %d\t%s\n",
@@ -2379,6 +2552,79 @@ static int update_clone(int argc, const char **argv, const char *prefix)
 	return update_submodules(&suc);
 }
 
+static int run_update_procedure(int argc, const char **argv, const char *prefix)
+{
+	int force = 0, quiet = 0, nofetch = 0, just_cloned = 0;
+	char *prefixed_path, *update = NULL;
+	char *sha1 = NULL, *subsha1 = NULL;
+	struct update_data update_data = UPDATE_DATA_INIT;
+
+	struct option options[] = {
+		OPT__QUIET(&quiet, N_("suppress output for update by rebase or merge")),
+		OPT__FORCE(&force, N_("force checkout updates"), 0),
+		OPT_BOOL('N', "no-fetch", &nofetch,
+			 N_("don't fetch new objects from the remote site")),
+		OPT_BOOL(0, "just-cloned", &just_cloned,
+			 N_("overrides update mode in case the repository is a fresh clone")),
+		OPT_INTEGER(0, "depth", &update_data.depth, N_("depth for shallow fetch")),
+		OPT_STRING(0, "prefix", &prefix,
+			   N_("path"),
+			   N_("path into the working tree")),
+		OPT_STRING(0, "update", &update,
+			   N_("string"),
+			   N_("rebase, merge, checkout or none")),
+		OPT_STRING(0, "recursive-prefix", &update_data.recursive_prefix, N_("path"),
+			   N_("path into the working tree, across nested "
+			      "submodule boundaries")),
+		OPT_STRING(0, "sha1", &sha1, N_("string"),
+			   N_("SHA1 expected by superproject")),
+		OPT_STRING(0, "subsha1", &subsha1, N_("string"),
+			   N_("SHA1 of submodule's HEAD")),
+		OPT_END()
+	};
+
+	const char *const usage[] = {
+		N_("git submodule--helper run-update-procedure [<options>] <path>"),
+		NULL
+	};
+
+	argc = parse_options(argc, argv, prefix, options, usage, 0);
+
+	if (argc != 1)
+		usage_with_options(usage, options);
+
+	update_data.force = !!force;
+	update_data.quiet = !!quiet;
+	update_data.nofetch = !!nofetch;
+	update_data.just_cloned = !!just_cloned;
+	update_data.sm_path = argv[0];
+
+	if (sha1)
+		get_oid_hex(sha1, &update_data.sha1);
+	else
+		oidcpy(&update_data.sha1, null_oid());
+
+	if (subsha1)
+		get_oid_hex(subsha1, &update_data.subsha1);
+	else
+		oidcpy(&update_data.subsha1, null_oid());
+
+	if (update_data.recursive_prefix)
+		prefixed_path = xstrfmt("%s%s", update_data.recursive_prefix, update_data.sm_path);
+	else
+		prefixed_path = xstrdup(update_data.sm_path);
+
+	update_data.displaypath = get_submodule_displaypath(prefixed_path, prefix);
+
+	determine_submodule_update_strategy(the_repository, update_data.just_cloned,
+					    update_data.sm_path, update,
+					    &update_data.update_strategy);
+
+	free(prefixed_path);
+
+	return do_run_update_procedure(&update_data);
+}
+
 static int resolve_relative_path(int argc, const char **argv, const char *prefix)
 {
 	struct strbuf sb = STRBUF_INIT;
@@ -2759,6 +3005,7 @@ static struct cmd_struct commands[] = {
 	{"clone", module_clone, 0},
 	{"update-module-mode", module_update_module_mode, 0},
 	{"update-clone", update_clone, 0},
+	{"run-update-procedure", run_update_procedure, 0},
 	{"ensure-core-worktree", ensure_core_worktree, 0},
 	{"relative-path", resolve_relative_path, 0},
 	{"resolve-relative-url", resolve_relative_url, 0},
diff --git a/git-submodule.sh b/git-submodule.sh
index 1187c21260..4d5437f5c2 100755
--- a/git-submodule.sh
+++ b/git-submodule.sh
@@ -405,13 +405,6 @@ cmd_deinit()
 	git ${wt_prefix:+-C "$wt_prefix"} submodule--helper deinit ${GIT_QUIET:+--quiet} ${prefix:+--prefix "$prefix"} ${force:+--force} ${deinit_all:+--all} -- "$@"
 }
 
-is_tip_reachable () (
-	sanitize_submodule_env &&
-	cd "$1" &&
-	rev=$(git rev-list -n 1 "$2" --not --all 2>/dev/null) &&
-	test -z "$rev"
-)
-
 # usage: fetch_in_submodule <module_path> [<depth>] [<sha1>]
 # Because arguments are positional, use an empty string to omit <depth>
 # but include <sha1>.
@@ -555,14 +548,13 @@ cmd_update()
 
 		git submodule--helper ensure-core-worktree "$sm_path" || exit 1
 
-		update_module=$(git submodule--helper update-module-mode $just_cloned "$sm_path" $update)
-
 		displaypath=$(git submodule--helper relative-path "$prefix$sm_path" "$wt_prefix")
 
 		if test $just_cloned -eq 1
 		then
 			subsha1=
 		else
+			just_cloned=
 			subsha1=$(sanitize_submodule_env; cd "$sm_path" &&
 				git rev-parse --verify HEAD) ||
 			die "$(eval_gettext "fatal: Unable to find current revision in submodule path '\$displaypath'")"
@@ -583,70 +575,27 @@ cmd_update()
 			die "$(eval_gettext "fatal: Unable to find current \${remote_name}/\${branch} revision in submodule path '\$sm_path'")"
 		fi
 
-		if test "$subsha1" != "$sha1" || test -n "$force"
-		then
-			subforce=$force
-			# If we don't already have a -f flag and the submodule has never been checked out
-			if test -z "$subsha1" && test -z "$force"
-			then
-				subforce="-f"
-			fi
+		out=$(git submodule--helper run-update-procedure ${wt_prefix:+--prefix "$wt_prefix"} ${GIT_QUIET:+--quiet} ${force:+--force} ${just_cloned:+--just-cloned} ${nofetch:+--no-fetch} ${depth:+"$depth"} ${update:+--update "$update"} ${prefix:+--recursive-prefix "$prefix"} ${sha1:+--sha1 "$sha1"} ${subsha1:+--subsha1 "$subsha1"} "$sm_path")
 
-			if test -z "$nofetch"
-			then
-				# Run fetch only if $sha1 isn't present or it
-				# is not reachable from a ref.
-				is_tip_reachable "$sm_path" "$sha1" ||
-				fetch_in_submodule "$sm_path" $depth ||
-				say "$(eval_gettext "Unable to fetch in submodule path '\$displaypath'; trying to directly fetch \$sha1:")"
-
-				# Now we tried the usual fetch, but $sha1 may
-				# not be reachable from any of the refs
-				is_tip_reachable "$sm_path" "$sha1" ||
-				fetch_in_submodule "$sm_path" "$depth" "$sha1" ||
-				die "$(eval_gettext "Fetched in submodule path '\$displaypath', but it did not contain \$sha1. Direct fetching of that commit failed.")"
-			fi
-
-			must_die_on_failure=
-			case "$update_module" in
-			checkout)
-				command="git checkout $subforce -q"
-				die_msg="$(eval_gettext "Unable to checkout '\$sha1' in submodule path '\$displaypath'")"
-				say_msg="$(eval_gettext "Submodule path '\$displaypath': checked out '\$sha1'")"
-				;;
-			rebase)
-				command="git rebase ${GIT_QUIET:+--quiet}"
-				die_msg="$(eval_gettext "Unable to rebase '\$sha1' in submodule path '\$displaypath'")"
-				say_msg="$(eval_gettext "Submodule path '\$displaypath': rebased into '\$sha1'")"
-				must_die_on_failure=yes
-				;;
-			merge)
-				command="git merge ${GIT_QUIET:+--quiet}"
-				die_msg="$(eval_gettext "Unable to merge '\$sha1' in submodule path '\$displaypath'")"
-				say_msg="$(eval_gettext "Submodule path '\$displaypath': merged in '\$sha1'")"
-				must_die_on_failure=yes
-				;;
-			!*)
-				command="${update_module#!}"
-				die_msg="$(eval_gettext "Execution of '\$command \$sha1' failed in submodule path '\$displaypath'")"
-				say_msg="$(eval_gettext "Submodule path '\$displaypath': '\$command \$sha1'")"
-				must_die_on_failure=yes
-				;;
-			*)
-				die "$(eval_gettext "Invalid update mode '$update_module' for submodule path '$path'")"
-			esac
-
-			if (sanitize_submodule_env; cd "$sm_path" && $command "$sha1")
-			then
-				say "$say_msg"
-			elif test -n "$must_die_on_failure"
-			then
-				die_with_status 2 "$die_msg"
-			else
-				err="${err};$die_msg"
-				continue
-			fi
-		fi
+		# exit codes for run-update-procedure:
+		# 0: update was successful, say command output
+		# 128: subcommand died during execution
+		# 1: update procedure failed and must die
+		# 2: update procedure failed, but should not die
+		# 3: no update procedure was run
+		res="$?"
+		case $res in
+		0)
+			say "$out"
+			;;
+		2|128)
+			exit $res
+			;;
+		1)
+			err="${err};$out"
+			continue
+			;;
+		esac
 
 		if test -n "$recursive"
 		then
@@ -661,7 +610,7 @@ cmd_update()
 			if test $res -gt 0
 			then
 				die_msg="$(eval_gettext "fatal: Failed to recurse into submodule path '\$displaypath'")"
-				if test $res -ne 2
+				if test $res -ne 2 && test $res -ne 128
 				then
 					err="${err};$die_msg"
 					continue
-- 
2.32.0


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [GSoC] [PATCH] submodule--helper: run update procedures from C
  2021-07-22 13:40 [GSoC] [PATCH] submodule--helper: run update procedures from C Atharva Raykar
@ 2021-07-23  9:37 ` Ævar Arnfjörð Bjarmason
  2021-07-23 16:59   ` Atharva Raykar
  2021-08-02 13:06 ` [GSoC] [PATCH v2] " Atharva Raykar
  1 sibling, 1 reply; 13+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-07-23  9:37 UTC (permalink / raw)
  To: Atharva Raykar
  Cc: git, Emily Shaffer, Jonathan Nieder, Junio C Hamano,
	Christian Couder, Shourya Shukla, Kaartic Sivaraam,
	Prathamesh Chavan


On Thu, Jul 22 2021, Atharva Raykar wrote:

> +/* NEEDSWORK: try to do this without creating a new process */
> +static int is_tip_reachable(const char *path, struct object_id *sha1)
> +{
> +	struct child_process cp = CHILD_PROCESS_INIT;
> +	struct strbuf rev = STRBUF_INIT;
> +	char *sha1_hex = oid_to_hex(sha1);
> +
> +	cp.git_cmd = 1;
> +	cp.dir = xstrdup(path);
> +	cp.no_stderr = 1;
> +	strvec_pushl(&cp.args, "rev-list", "-n", "1", sha1_hex, "--not", "--all", NULL);
> +
> +	prepare_submodule_repo_env(&cp.env_array);
> +
> +	if (capture_command(&cp, &rev, GIT_MAX_HEXSZ + 1) || rev.len)
> +		return 0;
> +
> +	return 1;
> +}

I think it's fine to do this & leave out the NEEDSWORK commit, just
briefly noting in the commit-message that we're not bothering with
trying to reduce sub-command invocations. It can be done later if anyone
cares.

> [...]
> +		strbuf_addf(&die_msg, "fatal: Unable to checkout '%s' in submodule path '%s'\n",
> +			    sha1, ud->displaypath);
> +		strbuf_addf(&say_msg, "Submodule path '%s': checked out '%s'\n",
> +			    ud->displaypath, sha1);

For all of these you're removing the translation from a message like:

    die_msg="$(eval_gettext "Unable to checkout '\$sha1' in submodule path '\$displaypath'")"

Which is easy enough to fix, just use _(), i.e.:

    strbuf_addf(&die_msg, _("Unable to checkout '%s' in submodule path '%s'\n"), [...]

I removed the "fatal: " per a comment below...

> +		break;
> +	case SM_UPDATE_REBASE:
> +		cp.git_cmd = 1;
> +		strvec_push(&cp.args, "rebase");
> +		if (ud->quiet)
> +			strvec_push(&cp.args, "--quiet");
> +		strbuf_addf(&die_msg, "fatal: Unable to rebase '%s' in submodule path '%s'\n",
> +			    sha1, ud->displaypath);
> +		strbuf_addf(&say_msg, "Submodule path '%s': rebased into '%s'\n",
> +			    ud->displaypath, sha1);
> +		must_die_on_failure = 1;
> +		break;
> +	case SM_UPDATE_MERGE:
> +		cp.git_cmd = 1;
> +		strvec_push(&cp.args, "merge");
> +		if (ud->quiet)
> +			strvec_push(&cp.args, "--quiet");
> +		strbuf_addf(&die_msg, "fatal: Unable to merge '%s' in submodule path '%s'\n",
> +			    sha1, ud->displaypath);
> +		strbuf_addf(&say_msg, "Submodule path '%s': merged in '%s'\n",
> +			    ud->displaypath, sha1);
> +		must_die_on_failure = 1;
> +		break;
> +	case SM_UPDATE_COMMAND:
> +		/* NOTE: this does not handle quoted arguments */
> +		strvec_split(&cp.args, ud->update_strategy.command);
> +		strbuf_addf(&die_msg, "fatal: Execution of '%s %s' failed in submodule path '%s'\n",
> +			    ud->update_strategy.command, sha1, ud->displaypath);
> +		strbuf_addf(&say_msg, "Submodule path '%s': '%s %s'\n",
> +			    ud->displaypath, ud->update_strategy.command, sha1);
> +		must_die_on_failure = 1;
> +		break;
> +	case SM_UPDATE_UNSPECIFIED:
> +	case SM_UPDATE_NONE:
> +		BUG("update strategy should have been specified");
> +	}
> +
> +	strvec_push(&cp.args, sha1);
> +
> +	prepare_submodule_repo_env(&cp.env_array);
> +
> +	if (run_command(&cp)) {
> +		if (must_die_on_failure) {
> +			retval = 2;
> +			fputs(_(die_msg.buf), stderr);
> +			goto cleanup;

FWIW I'd find this clearer if we just kept track of what operation we
ran above, and just in this run_command() && must_die_on_failure case
started populating these die messages.

But even if not the reason I dropped the "fatal: " is shouldn't we just
call die() here directly? Why clean up when we're dying anyway?

Also since I see you used _() here that won't work, i.e. with gettet if
you happen to need to declare things earlier, you need to use N_() to
mark the message for translation.

The _() here won't find any message translated (unless the string
happened to exactly match a thing in the *.po file for other reasons,
not the case here).

But in this case we can just die(msg) here and have used the _() above,
or just call die() directly here not having made a die_msg we usually
won't use...

> +static int do_run_update_procedure(struct update_data *ud)
> +{
> +	if ((!is_null_oid(&ud->sha1) && !is_null_oid(&ud->subsha1) && !oideq(&ud->sha1, &ud->subsha1)) ||
> +	    is_null_oid(&ud->subsha1) || ud->force) {
> +		int subforce = is_null_oid(&ud->subsha1) || ud->force;
> +
> +		if (!ud->nofetch) {
> +			/*
> +			 * Run fetch only if `sha1` isn't present or it
> +			 * is not reachable from a ref.
> +			 */
> +			if (!is_tip_reachable(ud->sm_path, &ud->sha1))
> +				if (fetch_in_submodule(ud->sm_path, ud->depth, ud->quiet, NULL) &&
> +				    !ud->quiet)
> +					fprintf_ln(stderr,
> +						   _("Unable to fetch in submodule path '%s'; "
> +						     "trying to directly fetch %s:"),
> +						   ud->displaypath, oid_to_hex(&ud->sha1));
> +			/*
> +			 * Now we tried the usual fetch, but `sha1` may
> +			 * not be reachable from any of the refs.
> +			 */
> +			if (!is_tip_reachable(ud->sm_path, &ud->sha1))
> +				if (fetch_in_submodule(ud->sm_path, ud->depth, ud->quiet, &ud->sha1))
> +					die(_("Fetched in submodule path '%s', but it did not "
> +					      "contain %s. Direct fetching of that commit failed."),
> +					    ud->displaypath, oid_to_hex(&ud->sha1));
> +		}
> +
> +		return run_update_command(ud, subforce);
> +	}
> +
> +	return 3;
> +}

Since this has excatly one caller I think it's better for readability
(less indentation) and flow to just remove that "return 3" condition and
do the big "if" you have at the end, i.e. have this function start with
"int subforce =" and...

>  static void update_submodule(struct update_clone_data *ucd)
>  {
>  	fprintf(stdout, "dummy %s %d\t%s\n",
> @@ -2379,6 +2552,79 @@ static int update_clone(int argc, const char **argv, const char *prefix)
>  	return update_submodules(&suc);
>  }
>  
> +static int run_update_procedure(int argc, const char **argv, const char *prefix)
> +{
> +	int force = 0, quiet = 0, nofetch = 0, just_cloned = 0;
> +	char *prefixed_path, *update = NULL;
> +	char *sha1 = NULL, *subsha1 = NULL;
> +	struct update_data update_data = UPDATE_DATA_INIT;
> +
> +	struct option options[] = {
> +		OPT__QUIET(&quiet, N_("suppress output for update by rebase or merge")),
> +		OPT__FORCE(&force, N_("force checkout updates"), 0),
> +		OPT_BOOL('N', "no-fetch", &nofetch,
> +			 N_("don't fetch new objects from the remote site")),
> +		OPT_BOOL(0, "just-cloned", &just_cloned,
> +			 N_("overrides update mode in case the repository is a fresh clone")),
> +		OPT_INTEGER(0, "depth", &update_data.depth, N_("depth for shallow fetch")),
> +		OPT_STRING(0, "prefix", &prefix,
> +			   N_("path"),
> +			   N_("path into the working tree")),
> +		OPT_STRING(0, "update", &update,
> +			   N_("string"),
> +			   N_("rebase, merge, checkout or none")),
> +		OPT_STRING(0, "recursive-prefix", &update_data.recursive_prefix, N_("path"),
> +			   N_("path into the working tree, across nested "
> +			      "submodule boundaries")),
> +		OPT_STRING(0, "sha1", &sha1, N_("string"),
> +			   N_("SHA1 expected by superproject")),
> +		OPT_STRING(0, "subsha1", &subsha1, N_("string"),
> +			   N_("SHA1 of submodule's HEAD")),
> +		OPT_END()
> +	};
> +
> +	const char *const usage[] = {
> +		N_("git submodule--helper run-update-procedure [<options>] <path>"),
> +		NULL
> +	};
> +
> +	argc = parse_options(argc, argv, prefix, options, usage, 0);
> +
> +	if (argc != 1)
> +		usage_with_options(usage, options);
> +	update_data.force = !!force;
> +	update_data.quiet = !!quiet;
> +	update_data.nofetch = !!nofetch;
> +	update_data.just_cloned = !!just_cloned;

For all of these just pass the reference to the update_data variable
directly in the OPT_*(). No need to set an "int force", only to copy it
over to update_data.force. Let's just use the latter only.

> +
> +	if (sha1)
> +		get_oid_hex(sha1, &update_data.sha1);
> +	else
> +		oidcpy(&update_data.sha1, null_oid());

Nit: Even if a historical option forces us to support --sha1, let's use
"oid" for the variable etc. But in this case the --sha1 is new, no?
Let's use --object-id or --oid (whatever is more common, I didn't
check)>

> +
> +	if (subsha1)
> +		get_oid_hex(subsha1, &update_data.subsha1);
> +	else
> +		oidcpy(&update_data.subsha1, null_oid());

Ditto. Also I think for both of these you can re-use
parse_opt_object_id. See "squash-onto" and "upstream" in
builtin/rebase.c.

Then you just supply an oid variable directly and let that helper do all
the get_oid etc.

> +	if (update_data.recursive_prefix)
> +		prefixed_path = xstrfmt("%s%s", update_data.recursive_prefix, update_data.sm_path);
> +	else
> +		prefixed_path = xstrdup(update_data.sm_path);
> +
> +	update_data.displaypath = get_submodule_displaypath(prefixed_path, prefix);
> +
> +	determine_submodule_update_strategy(the_repository, update_data.just_cloned,
> +					    update_data.sm_path, update,
> +					    &update_data.update_strategy);
> +
> +	free(prefixed_path);
> +
> +	return do_run_update_procedure(&update_data);

....(continued from above) ...here just do:

    if (that big if condition)
        return do_run_update_procedure(&update_data);
    else
        return 3;

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [GSoC] [PATCH] submodule--helper: run update procedures from C
  2021-07-23  9:37 ` Ævar Arnfjörð Bjarmason
@ 2021-07-23 16:59   ` Atharva Raykar
  2021-08-04  8:35     ` Atharva Raykar
  0 siblings, 1 reply; 13+ messages in thread
From: Atharva Raykar @ 2021-07-23 16:59 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: Git List, Emily Shaffer, Jonathan Nieder, Junio C Hamano,
	Christian Couder, Shourya Shukla, Kaartic Sivaraam,
	Prathamesh Chavan

On 23-Jul-2021, at 15:07, Ævar Arnfjörð Bjarmason <avarab@gmail.com> wrote:
> 
> 
> On Thu, Jul 22 2021, Atharva Raykar wrote:
> 
>> +/* NEEDSWORK: try to do this without creating a new process */
>> +static int is_tip_reachable(const char *path, struct object_id *sha1)
>> +{
>> +	struct child_process cp = CHILD_PROCESS_INIT;
>> +	struct strbuf rev = STRBUF_INIT;
>> +	char *sha1_hex = oid_to_hex(sha1);
>> +
>> +	cp.git_cmd = 1;
>> +	cp.dir = xstrdup(path);
>> +	cp.no_stderr = 1;
>> +	strvec_pushl(&cp.args, "rev-list", "-n", "1", sha1_hex, "--not", "--all", NULL);
>> +
>> +	prepare_submodule_repo_env(&cp.env_array);
>> +
>> +	if (capture_command(&cp, &rev, GIT_MAX_HEXSZ + 1) || rev.len)
>> +		return 0;
>> +
>> +	return 1;
>> +}
> 
> I think it's fine to do this & leave out the NEEDSWORK commit, just
> briefly noting in the commit-message that we're not bothering with
> trying to reduce sub-command invocations. It can be done later if anyone
> cares.

Okay, fair enough. I realise I have not been consistent about leaving such
comments in all my conversion efforts.

>> [...]
>> +		strbuf_addf(&die_msg, "fatal: Unable to checkout '%s' in submodule path '%s'\n",
>> +			    sha1, ud->displaypath);
>> +		strbuf_addf(&say_msg, "Submodule path '%s': checked out '%s'\n",
>> +			    ud->displaypath, sha1);
> 
> For all of these you're removing the translation from a message like:
> 
>    die_msg="$(eval_gettext "Unable to checkout '\$sha1' in submodule path '\$displaypath'")"
> 
> Which is easy enough to fix, just use _(), i.e.:
> 
>    strbuf_addf(&die_msg, _("Unable to checkout '%s' in submodule path '%s'\n"), [...]

Thanks for catching this.

> I removed the "fatal: " per a comment below...
> 
>> +		break;
>> +	case SM_UPDATE_REBASE:
>> +		cp.git_cmd = 1;
>> +		strvec_push(&cp.args, "rebase");
>> +		if (ud->quiet)
>> +			strvec_push(&cp.args, "--quiet");
>> +		strbuf_addf(&die_msg, "fatal: Unable to rebase '%s' in submodule path '%s'\n",
>> +			    sha1, ud->displaypath);
>> +		strbuf_addf(&say_msg, "Submodule path '%s': rebased into '%s'\n",
>> +			    ud->displaypath, sha1);
>> +		must_die_on_failure = 1;
>> +		break;
>> +	case SM_UPDATE_MERGE:
>> +		cp.git_cmd = 1;
>> +		strvec_push(&cp.args, "merge");
>> +		if (ud->quiet)
>> +			strvec_push(&cp.args, "--quiet");
>> +		strbuf_addf(&die_msg, "fatal: Unable to merge '%s' in submodule path '%s'\n",
>> +			    sha1, ud->displaypath);
>> +		strbuf_addf(&say_msg, "Submodule path '%s': merged in '%s'\n",
>> +			    ud->displaypath, sha1);
>> +		must_die_on_failure = 1;
>> +		break;
>> +	case SM_UPDATE_COMMAND:
>> +		/* NOTE: this does not handle quoted arguments */
>> +		strvec_split(&cp.args, ud->update_strategy.command);
>> +		strbuf_addf(&die_msg, "fatal: Execution of '%s %s' failed in submodule path '%s'\n",
>> +			    ud->update_strategy.command, sha1, ud->displaypath);
>> +		strbuf_addf(&say_msg, "Submodule path '%s': '%s %s'\n",
>> +			    ud->displaypath, ud->update_strategy.command, sha1);
>> +		must_die_on_failure = 1;
>> +		break;
>> +	case SM_UPDATE_UNSPECIFIED:
>> +	case SM_UPDATE_NONE:
>> +		BUG("update strategy should have been specified");
>> +	}
>> +
>> +	strvec_push(&cp.args, sha1);
>> +
>> +	prepare_submodule_repo_env(&cp.env_array);
>> +
>> +	if (run_command(&cp)) {
>> +		if (must_die_on_failure) {
>> +			retval = 2;
>> +			fputs(_(die_msg.buf), stderr);
>> +			goto cleanup;
> 
> FWIW I'd find this clearer if we just kept track of what operation we
> ran above, and just in this run_command() && must_die_on_failure case
> started populating these die messages.

Could you clarify what you meant here?

I can't think of a way to do this at the moment without introducing one
or more redundant 'switch ()' statements. This is what I interpreted your
statement as:

...
	/* we added the arguments to 'cp' already in the switch () above... */

	if (run_command(&cp)) {
		if (must_die_on_failure) {
			switch (ud->update_strategy.type) {
			case SM_UPDATE_CHECKOUT:
				die(_("Execution of..."));
				break;
			case ...
			case ...
		}
		/*
		 * This signifies to the caller in shell that
		 * the command failed without dying
		 */
		retval = 1;
		goto cleanup;
	}
	
	/* ...another switch to figure out which say_msg() to use? */
...

Or did you mean that instead of storing the die_msg in entirety at the
first switch, I instead store just the action, and in the conditional
finally have something like...?

	die(_("Unable to %s '%s' in submodule path '%s'"),
	    action, sha1, ud->displaypath);

...where 'action' is a value like "merge", "checkout" etc, set in the
first 'switch'.

I am guessing the motivation for this is to make more clear which error
message will be shown for each case?

> But even if not the reason I dropped the "fatal: " is shouldn't we just
> call die() here directly? Why clean up when we're dying anyway?

The reason I did not call die() directly is because the original shell
version originally specifically exited with 2 in the 'must_die_on_error'
case while die() in C exits with 128. I think it would be more semantically
correct for me to have done something like an 'exit(2)' instead of cleaning
up and returning.

That said, it occured to me that the receiving end does not do anything
special with the different exit code, other than just pass it on to another
exit, ie, this bit:

+		2|128)
+			exit $res
+			;;

This code currently sits in a weird middle position where it is neither
fully matching the exit codes as before the conversion (where a failure
unrelated to the command execution should have exited with 1, not 128),
nor is it having a complete disregard for their exact value, which would
somewhat simplify the failure handling code.

I wonder if any script in a machine somewhere cares about the exact exit value
of 'submodule add'. I suspect it would be relatively harmless if I just follow
your suggestion and just die() on command execution failure...

Thoughts?

> Also since I see you used _() here that won't work, i.e. with gettet if
> you happen to need to declare things earlier, you need to use N_() to
> mark the message for translation.
> 
> The _() here won't find any message translated (unless the string
> happened to exactly match a thing in the *.po file for other reasons,
> not the case here).
> 
> But in this case we can just die(msg) here and have used the _() above,
> or just call die() directly here not having made a die_msg we usually
> won't use...

Okay, I'll ensure that the translations are marked properly when I reroll.

>> +static int do_run_update_procedure(struct update_data *ud)
>> +{
>> +	if ((!is_null_oid(&ud->sha1) && !is_null_oid(&ud->subsha1) && !oideq(&ud->sha1, &ud->subsha1)) ||
>> +	    is_null_oid(&ud->subsha1) || ud->force) {
>> +		int subforce = is_null_oid(&ud->subsha1) || ud->force;
>> +
>> +		if (!ud->nofetch) {
>> +			/*
>> +			 * Run fetch only if `sha1` isn't present or it
>> +			 * is not reachable from a ref.
>> +			 */
>> +			if (!is_tip_reachable(ud->sm_path, &ud->sha1))
>> +				if (fetch_in_submodule(ud->sm_path, ud->depth, ud->quiet, NULL) &&
>> +				    !ud->quiet)
>> +					fprintf_ln(stderr,
>> +						   _("Unable to fetch in submodule path '%s'; "
>> +						     "trying to directly fetch %s:"),
>> +						   ud->displaypath, oid_to_hex(&ud->sha1));
>> +			/*
>> +			 * Now we tried the usual fetch, but `sha1` may
>> +			 * not be reachable from any of the refs.
>> +			 */
>> +			if (!is_tip_reachable(ud->sm_path, &ud->sha1))
>> +				if (fetch_in_submodule(ud->sm_path, ud->depth, ud->quiet, &ud->sha1))
>> +					die(_("Fetched in submodule path '%s', but it did not "
>> +					      "contain %s. Direct fetching of that commit failed."),
>> +					    ud->displaypath, oid_to_hex(&ud->sha1));
>> +		}
>> +
>> +		return run_update_command(ud, subforce);
>> +	}
>> +
>> +	return 3;
>> +}
> 
> Since this has excatly one caller I think it's better for readability
> (less indentation) and flow to just remove that "return 3" condition and
> do the big "if" you have at the end, i.e. have this function start with
> "int subforce =" and...

Yeah, that would be better. I'll change that.

>> static void update_submodule(struct update_clone_data *ucd)
>> {
>> 	fprintf(stdout, "dummy %s %d\t%s\n",
>> @@ -2379,6 +2552,79 @@ static int update_clone(int argc, const char **argv, const char *prefix)
>> 	return update_submodules(&suc);
>> }
>> 
>> +static int run_update_procedure(int argc, const char **argv, const char *prefix)
>> +{
>> +	int force = 0, quiet = 0, nofetch = 0, just_cloned = 0;
>> +	char *prefixed_path, *update = NULL;
>> +	char *sha1 = NULL, *subsha1 = NULL;
>> +	struct update_data update_data = UPDATE_DATA_INIT;
>> +
>> +	struct option options[] = {
>> +		OPT__QUIET(&quiet, N_("suppress output for update by rebase or merge")),
>> +		OPT__FORCE(&force, N_("force checkout updates"), 0),
>> +		OPT_BOOL('N', "no-fetch", &nofetch,
>> +			 N_("don't fetch new objects from the remote site")),
>> +		OPT_BOOL(0, "just-cloned", &just_cloned,
>> +			 N_("overrides update mode in case the repository is a fresh clone")),
>> +		OPT_INTEGER(0, "depth", &update_data.depth, N_("depth for shallow fetch")),
>> +		OPT_STRING(0, "prefix", &prefix,
>> +			   N_("path"),
>> +			   N_("path into the working tree")),
>> +		OPT_STRING(0, "update", &update,
>> +			   N_("string"),
>> +			   N_("rebase, merge, checkout or none")),
>> +		OPT_STRING(0, "recursive-prefix", &update_data.recursive_prefix, N_("path"),
>> +			   N_("path into the working tree, across nested "
>> +			      "submodule boundaries")),
>> +		OPT_STRING(0, "sha1", &sha1, N_("string"),
>> +			   N_("SHA1 expected by superproject")),
>> +		OPT_STRING(0, "subsha1", &subsha1, N_("string"),
>> +			   N_("SHA1 of submodule's HEAD")),
>> +		OPT_END()
>> +	};
>> +
>> +	const char *const usage[] = {
>> +		N_("git submodule--helper run-update-procedure [<options>] <path>"),
>> +		NULL
>> +	};
>> +
>> +	argc = parse_options(argc, argv, prefix, options, usage, 0);
>> +
>> +	if (argc != 1)
>> +		usage_with_options(usage, options);
>> +	update_data.force = !!force;
>> +	update_data.quiet = !!quiet;
>> +	update_data.nofetch = !!nofetch;
>> +	update_data.just_cloned = !!just_cloned;
> 
> For all of these just pass the reference to the update_data variable
> directly in the OPT_*(). No need to set an "int force", only to copy it
> over to update_data.force. Let's just use the latter only.

Hmm, I'm trying to remember why the single bit values are treated this way
in this whole file...

...there seems to be no good reason for it. The API docs for parse options
state that OPT_BOOL() is guaranteed to return either zero or one, so that
double negation does look unnecessary.

>> +
>> +	if (sha1)
>> +		get_oid_hex(sha1, &update_data.sha1);
>> +	else
>> +		oidcpy(&update_data.sha1, null_oid());
> 
> Nit: Even if a historical option forces us to support --sha1, let's use
> "oid" for the variable etc. But in this case the --sha1 is new, no?
> Let's use --object-id or --oid (whatever is more common, I didn't
> check)>

Okay. I can see the confusion this may cause.

>> +
>> +	if (subsha1)
>> +		get_oid_hex(subsha1, &update_data.subsha1);
>> +	else
>> +		oidcpy(&update_data.subsha1, null_oid());
> 
> Ditto. Also I think for both of these you can re-use
> parse_opt_object_id. See "squash-onto" and "upstream" in
> builtin/rebase.c.
> 
> Then you just supply an oid variable directly and let that helper do all
> the get_oid etc.

Thanks for pointing me to this!

>> +	if (update_data.recursive_prefix)
>> +		prefixed_path = xstrfmt("%s%s", update_data.recursive_prefix, update_data.sm_path);
>> +	else
>> +		prefixed_path = xstrdup(update_data.sm_path);
>> +
>> +	update_data.displaypath = get_submodule_displaypath(prefixed_path, prefix);
>> +
>> +	determine_submodule_update_strategy(the_repository, update_data.just_cloned,
>> +					    update_data.sm_path, update,
>> +					    &update_data.update_strategy);
>> +
>> +	free(prefixed_path);
>> +
>> +	return do_run_update_procedure(&update_data);
> 
> ....(continued from above) ...here just do:
> 
>    if (that big if condition)
>        return do_run_update_procedure(&update_data);
>    else
>        return 3;

Okay.

Thanks for the review!


^ permalink raw reply	[flat|nested] 13+ messages in thread

* [GSoC] [PATCH v2] submodule--helper: run update procedures from C
  2021-07-22 13:40 [GSoC] [PATCH] submodule--helper: run update procedures from C Atharva Raykar
  2021-07-23  9:37 ` Ævar Arnfjörð Bjarmason
@ 2021-08-02 13:06 ` Atharva Raykar
  2021-08-02 18:50   ` Shourya Shukla
  2021-08-13  7:56   ` [GSoC] [PATCH v3] " Atharva Raykar
  1 sibling, 2 replies; 13+ messages in thread
From: Atharva Raykar @ 2021-08-02 13:06 UTC (permalink / raw)
  To: git
  Cc: Atharva Raykar, Emily Shaffer, Jonathan Nieder, Junio C Hamano,
	Christian Couder, Shourya Shukla, Kaartic Sivaraam,
	Prathamesh Chavan

Add a new submodule--helper subcommand `run-update-procedure` that runs
the update procedure if the SHA1 of the submodule does not match what
the superproject expects.

This is an intermediate change that works towards total conversion of
`submodule update` from shell to C.

Specific error codes are returned so that the shell script calling the
subcommand can take a decision on the control flow, and preserve the
error messages across subsequent recursive calls of `cmd_update`.

This patch could have been approached differently, by first changing the
`is_tip_reachable` and `fetch_in_submodule` shell functions to be
`submodule--helper` subcommands, and then following up with a patch that
introduces the `run-update-procedure` subcommand. We have not done it
like that because those functions are trivial enough to convert directly
along with these other changes. This lets us avoid the boilerplate and
the cleanup patches that will need to be introduced in following that
approach.

This change is more focused on doing a faithful conversion, so for now we
are not too concerned with trying to reduce subprocess spawns.

Signed-off-by: Atharva Raykar <raykar.ath@gmail.com>
Mentored-by: Christian Couder <christian.couder@gmail.com>
Mentored-by: Shourya Shukla <periperidip@gmail.com>
---

Notable changes since v1:

* Modified the code structure in
  submodule--helper.c:run_update_command(), while fixing problems with
  the translation marks.

* Renamed '--sha1' and '--subsha1' options to '--oid' and '--suboid' to
  since the argument is parsed into an object_id struct, not plain sha1
  data.

* Used option callbacks to parse the SHA1 arguments directly.

* Moved the conditional out of 'do_run_update_procedure()'.

Feedback required:

Ævar felt that it would be clearer to populate the 'fatal' messages
after the run_command() operation in 'run_update_command()', to make it
more readable [1]. I have attempted something like that here, and it has led
to a lot more duplicated 'switch' statements, which feels suboptimal.
I'd appreciate suggestions to make it more legible.

[1] https://lore.kernel.org/git/87r1fps63r.fsf@evledraar.gmail.com/

Fetch-it-Via:
git fetch https://github.com/tfidfwastaken/git submodule-run-update-proc-list-2

 builtin/submodule--helper.c | 253 ++++++++++++++++++++++++++++++++++++
 git-submodule.sh            | 106 +++++----------
 2 files changed, 286 insertions(+), 73 deletions(-)

diff --git a/builtin/submodule--helper.c b/builtin/submodule--helper.c
index d55f6262e9..b9c40324d0 100644
--- a/builtin/submodule--helper.c
+++ b/builtin/submodule--helper.c
@@ -2029,6 +2029,20 @@ struct submodule_update_clone {
 	.max_jobs = 1, \
 }
 
+struct update_data {
+	const char *recursive_prefix;
+	const char *sm_path;
+	const char *displaypath;
+	struct object_id oid;
+	struct object_id suboid;
+	struct submodule_update_strategy update_strategy;
+	int depth;
+	unsigned int force: 1;
+	unsigned int quiet: 1;
+	unsigned int nofetch: 1;
+	unsigned int just_cloned: 1;
+};
+#define UPDATE_DATA_INIT { .update_strategy = SUBMODULE_UPDATE_STRATEGY_INIT }
 
 static void next_submodule_warn_missing(struct submodule_update_clone *suc,
 		struct strbuf *out, const char *displaypath)
@@ -2282,6 +2296,175 @@ static int git_update_clone_config(const char *var, const char *value,
 	return 0;
 }
 
+static int is_tip_reachable(const char *path, struct object_id *oid)
+{
+	struct child_process cp = CHILD_PROCESS_INIT;
+	struct strbuf rev = STRBUF_INIT;
+	char *hex = oid_to_hex(oid);
+
+	cp.git_cmd = 1;
+	cp.dir = xstrdup(path);
+	cp.no_stderr = 1;
+	strvec_pushl(&cp.args, "rev-list", "-n", "1", hex, "--not", "--all", NULL);
+
+	prepare_submodule_repo_env(&cp.env_array);
+
+	if (capture_command(&cp, &rev, GIT_MAX_HEXSZ + 1) || rev.len)
+		return 0;
+
+	return 1;
+}
+
+static int fetch_in_submodule(const char *module_path, int depth, int quiet, struct object_id *oid)
+{
+	struct child_process cp = CHILD_PROCESS_INIT;
+
+	prepare_submodule_repo_env(&cp.env_array);
+	cp.git_cmd = 1;
+	cp.dir = xstrdup(module_path);
+
+	strvec_push(&cp.args, "fetch");
+	if (quiet)
+		strvec_push(&cp.args, "--quiet");
+	if (depth)
+		strvec_pushf(&cp.args, "--depth=%d", depth);
+	if (oid) {
+		char *hex = oid_to_hex(oid);
+		char *remote = get_default_remote();
+		strvec_pushl(&cp.args, remote, hex, NULL);
+	}
+
+	return run_command(&cp);
+}
+
+static int run_update_command(struct update_data *ud, int subforce)
+{
+	struct child_process cp = CHILD_PROCESS_INIT;
+	char *oid = oid_to_hex(&ud->oid);
+	int must_die_on_failure = 0;
+
+	cp.dir = xstrdup(ud->sm_path);
+	switch (ud->update_strategy.type) {
+	case SM_UPDATE_CHECKOUT:
+		cp.git_cmd = 1;
+		strvec_pushl(&cp.args, "checkout", "-q", NULL);
+		if (subforce)
+			strvec_push(&cp.args, "-f");
+		break;
+	case SM_UPDATE_REBASE:
+		cp.git_cmd = 1;
+		strvec_push(&cp.args, "rebase");
+		if (ud->quiet)
+			strvec_push(&cp.args, "--quiet");
+		must_die_on_failure = 1;
+		break;
+	case SM_UPDATE_MERGE:
+		cp.git_cmd = 1;
+		strvec_push(&cp.args, "merge");
+		if (ud->quiet)
+			strvec_push(&cp.args, "--quiet");
+		must_die_on_failure = 1;
+		break;
+	case SM_UPDATE_COMMAND:
+		/* NOTE: this does not handle quoted arguments */
+		strvec_split(&cp.args, ud->update_strategy.command);
+		must_die_on_failure = 1;
+		break;
+	case SM_UPDATE_UNSPECIFIED:
+	case SM_UPDATE_NONE:
+		BUG("update strategy should have been specified");
+	}
+
+	strvec_push(&cp.args, oid);
+
+	prepare_submodule_repo_env(&cp.env_array);
+
+	if (run_command(&cp)) {
+		if (must_die_on_failure) {
+			switch (ud->update_strategy.type) {
+			case SM_UPDATE_CHECKOUT:
+				die(_("Unable to checkout '%s' in submodule path '%s'"),
+				      oid, ud->displaypath);
+				break;
+			case SM_UPDATE_REBASE:
+				die(_("Unable to rebase '%s' in submodule path '%s'"),
+				      oid, ud->displaypath);
+				break;
+			case SM_UPDATE_MERGE:
+				die(_("Unable to merge '%s' in submodule path '%s'"),
+				      oid, ud->displaypath);
+				break;
+			case SM_UPDATE_COMMAND:
+				die(_("Execution of '%s %s' failed in submodule path '%s'"),
+				      ud->update_strategy.command, oid, ud->displaypath);
+				break;
+			case SM_UPDATE_UNSPECIFIED:
+			case SM_UPDATE_NONE:
+				BUG("update strategy should have been specified");
+			}
+		}
+		/*
+		 * This signifies to the caller in shell that
+		 * the command failed without dying
+		 */
+		return 1;
+	}
+
+	switch (ud->update_strategy.type) {
+	case SM_UPDATE_CHECKOUT:
+		printf(_("Submodule path '%s': checked out '%s'\n"),
+		       ud->displaypath, oid);
+		break;
+	case SM_UPDATE_REBASE:
+		printf(_("Submodule path '%s': rebased into '%s'\n"),
+		       ud->displaypath, oid);
+		break;
+	case SM_UPDATE_MERGE:
+		printf(_("Submodule path '%s': merged in '%s'\n"),
+		       ud->displaypath, oid);
+		break;
+	case SM_UPDATE_COMMAND:
+		printf(_("Submodule path '%s': '%s %s'\n"),
+		       ud->displaypath, ud->update_strategy.command, oid);
+		break;
+	case SM_UPDATE_UNSPECIFIED:
+	case SM_UPDATE_NONE:
+		BUG("update strategy should have been specified");
+	}
+
+	return 0;
+}
+
+static int do_run_update_procedure(struct update_data *ud)
+{
+	int subforce = is_null_oid(&ud->suboid) || ud->force;
+
+	if (!ud->nofetch) {
+		/*
+		 * Run fetch only if `oid` isn't present or it
+		 * is not reachable from a ref.
+		 */
+		if (!is_tip_reachable(ud->sm_path, &ud->oid))
+			if (fetch_in_submodule(ud->sm_path, ud->depth, ud->quiet, NULL) &&
+			    !ud->quiet)
+				fprintf_ln(stderr,
+					   _("Unable to fetch in submodule path '%s'; "
+					     "trying to directly fetch %s:"),
+					   ud->displaypath, oid_to_hex(&ud->oid));
+		/*
+		 * Now we tried the usual fetch, but `oid` may
+		 * not be reachable from any of the refs.
+		 */
+		if (!is_tip_reachable(ud->sm_path, &ud->oid))
+			if (fetch_in_submodule(ud->sm_path, ud->depth, ud->quiet, &ud->oid))
+				die(_("Fetched in submodule path '%s', but it did not "
+				      "contain %s. Direct fetching of that commit failed."),
+				    ud->displaypath, oid_to_hex(&ud->oid));
+	}
+
+	return run_update_command(ud, subforce);
+}
+
 static void update_submodule(struct update_clone_data *ucd)
 {
 	fprintf(stdout, "dummy %s %d\t%s\n",
@@ -2379,6 +2562,75 @@ static int update_clone(int argc, const char **argv, const char *prefix)
 	return update_submodules(&suc);
 }
 
+static int run_update_procedure(int argc, const char **argv, const char *prefix)
+{
+	int force = 0, quiet = 0, nofetch = 0, just_cloned = 0;
+	char *prefixed_path, *update = NULL;
+	struct update_data update_data = UPDATE_DATA_INIT;
+
+	struct option options[] = {
+		OPT__QUIET(&quiet, N_("suppress output for update by rebase or merge")),
+		OPT__FORCE(&force, N_("force checkout updates"), 0),
+		OPT_BOOL('N', "no-fetch", &nofetch,
+			 N_("don't fetch new objects from the remote site")),
+		OPT_BOOL(0, "just-cloned", &just_cloned,
+			 N_("overrides update mode in case the repository is a fresh clone")),
+		OPT_INTEGER(0, "depth", &update_data.depth, N_("depth for shallow fetch")),
+		OPT_STRING(0, "prefix", &prefix,
+			   N_("path"),
+			   N_("path into the working tree")),
+		OPT_STRING(0, "update", &update,
+			   N_("string"),
+			   N_("rebase, merge, checkout or none")),
+		OPT_STRING(0, "recursive-prefix", &update_data.recursive_prefix, N_("path"),
+			   N_("path into the working tree, across nested "
+			      "submodule boundaries")),
+		OPT_CALLBACK_F(0, "oid", &update_data.oid, N_("sha1"),
+			       N_("SHA1 expected by superproject"), PARSE_OPT_NONEG,
+			       parse_opt_object_id),
+		OPT_CALLBACK_F(0, "suboid", &update_data.suboid, N_("subsha1"),
+			       N_("SHA1 of submodule's HEAD"), PARSE_OPT_NONEG,
+			       parse_opt_object_id),
+		OPT_END()
+	};
+
+	const char *const usage[] = {
+		N_("git submodule--helper run-update-procedure [<options>] <path>"),
+		NULL
+	};
+
+	argc = parse_options(argc, argv, prefix, options, usage, 0);
+
+	if (argc != 1)
+		usage_with_options(usage, options);
+
+	update_data.force = !!force;
+	update_data.quiet = !!quiet;
+	update_data.nofetch = !!nofetch;
+	update_data.just_cloned = !!just_cloned;
+	update_data.sm_path = argv[0];
+
+	if (update_data.recursive_prefix)
+		prefixed_path = xstrfmt("%s%s", update_data.recursive_prefix, update_data.sm_path);
+	else
+		prefixed_path = xstrdup(update_data.sm_path);
+
+	update_data.displaypath = get_submodule_displaypath(prefixed_path, prefix);
+
+	determine_submodule_update_strategy(the_repository, update_data.just_cloned,
+					    update_data.sm_path, update,
+					    &update_data.update_strategy);
+
+	free(prefixed_path);
+
+	if ((!is_null_oid(&update_data.oid) && !is_null_oid(&update_data.suboid) &&
+	     !oideq(&update_data.oid, &update_data.suboid)) ||
+	    is_null_oid(&update_data.suboid) || update_data.force)
+		return do_run_update_procedure(&update_data);
+
+	return 3;
+}
+
 static int resolve_relative_path(int argc, const char **argv, const char *prefix)
 {
 	struct strbuf sb = STRBUF_INIT;
@@ -2759,6 +3011,7 @@ static struct cmd_struct commands[] = {
 	{"clone", module_clone, 0},
 	{"update-module-mode", module_update_module_mode, 0},
 	{"update-clone", update_clone, 0},
+	{"run-update-procedure", run_update_procedure, 0},
 	{"ensure-core-worktree", ensure_core_worktree, 0},
 	{"relative-path", resolve_relative_path, 0},
 	{"resolve-relative-url", resolve_relative_url, 0},
diff --git a/git-submodule.sh b/git-submodule.sh
index 1187c21260..0bb4514859 100755
--- a/git-submodule.sh
+++ b/git-submodule.sh
@@ -405,13 +405,6 @@ cmd_deinit()
 	git ${wt_prefix:+-C "$wt_prefix"} submodule--helper deinit ${GIT_QUIET:+--quiet} ${prefix:+--prefix "$prefix"} ${force:+--force} ${deinit_all:+--all} -- "$@"
 }
 
-is_tip_reachable () (
-	sanitize_submodule_env &&
-	cd "$1" &&
-	rev=$(git rev-list -n 1 "$2" --not --all 2>/dev/null) &&
-	test -z "$rev"
-)
-
 # usage: fetch_in_submodule <module_path> [<depth>] [<sha1>]
 # Because arguments are positional, use an empty string to omit <depth>
 # but include <sha1>.
@@ -555,14 +548,13 @@ cmd_update()
 
 		git submodule--helper ensure-core-worktree "$sm_path" || exit 1
 
-		update_module=$(git submodule--helper update-module-mode $just_cloned "$sm_path" $update)
-
 		displaypath=$(git submodule--helper relative-path "$prefix$sm_path" "$wt_prefix")
 
 		if test $just_cloned -eq 1
 		then
 			subsha1=
 		else
+			just_cloned=
 			subsha1=$(sanitize_submodule_env; cd "$sm_path" &&
 				git rev-parse --verify HEAD) ||
 			die "$(eval_gettext "fatal: Unable to find current revision in submodule path '\$displaypath'")"
@@ -583,70 +575,38 @@ cmd_update()
 			die "$(eval_gettext "fatal: Unable to find current \${remote_name}/\${branch} revision in submodule path '\$sm_path'")"
 		fi
 
-		if test "$subsha1" != "$sha1" || test -n "$force"
-		then
-			subforce=$force
-			# If we don't already have a -f flag and the submodule has never been checked out
-			if test -z "$subsha1" && test -z "$force"
-			then
-				subforce="-f"
-			fi
+		out=$(git submodule--helper run-update-procedure \
+			  ${wt_prefix:+--prefix "$wt_prefix"} \
+			  ${GIT_QUIET:+--quiet} \
+			  ${force:+--force} \
+			  ${just_cloned:+--just-cloned} \
+			  ${nofetch:+--no-fetch} \
+			  ${depth:+"$depth"} \
+			  ${update:+--update "$update"} \
+			  ${prefix:+--recursive-prefix "$prefix"} \
+			  ${sha1:+--oid "$sha1"} \
+			  ${subsha1:+--suboid "$subsha1"} \
+			  "--" \
+			  "$sm_path")
 
-			if test -z "$nofetch"
-			then
-				# Run fetch only if $sha1 isn't present or it
-				# is not reachable from a ref.
-				is_tip_reachable "$sm_path" "$sha1" ||
-				fetch_in_submodule "$sm_path" $depth ||
-				say "$(eval_gettext "Unable to fetch in submodule path '\$displaypath'; trying to directly fetch \$sha1:")"
-
-				# Now we tried the usual fetch, but $sha1 may
-				# not be reachable from any of the refs
-				is_tip_reachable "$sm_path" "$sha1" ||
-				fetch_in_submodule "$sm_path" "$depth" "$sha1" ||
-				die "$(eval_gettext "Fetched in submodule path '\$displaypath', but it did not contain \$sha1. Direct fetching of that commit failed.")"
-			fi
-
-			must_die_on_failure=
-			case "$update_module" in
-			checkout)
-				command="git checkout $subforce -q"
-				die_msg="$(eval_gettext "Unable to checkout '\$sha1' in submodule path '\$displaypath'")"
-				say_msg="$(eval_gettext "Submodule path '\$displaypath': checked out '\$sha1'")"
-				;;
-			rebase)
-				command="git rebase ${GIT_QUIET:+--quiet}"
-				die_msg="$(eval_gettext "Unable to rebase '\$sha1' in submodule path '\$displaypath'")"
-				say_msg="$(eval_gettext "Submodule path '\$displaypath': rebased into '\$sha1'")"
-				must_die_on_failure=yes
-				;;
-			merge)
-				command="git merge ${GIT_QUIET:+--quiet}"
-				die_msg="$(eval_gettext "Unable to merge '\$sha1' in submodule path '\$displaypath'")"
-				say_msg="$(eval_gettext "Submodule path '\$displaypath': merged in '\$sha1'")"
-				must_die_on_failure=yes
-				;;
-			!*)
-				command="${update_module#!}"
-				die_msg="$(eval_gettext "Execution of '\$command \$sha1' failed in submodule path '\$displaypath'")"
-				say_msg="$(eval_gettext "Submodule path '\$displaypath': '\$command \$sha1'")"
-				must_die_on_failure=yes
-				;;
-			*)
-				die "$(eval_gettext "Invalid update mode '$update_module' for submodule path '$path'")"
-			esac
-
-			if (sanitize_submodule_env; cd "$sm_path" && $command "$sha1")
-			then
-				say "$say_msg"
-			elif test -n "$must_die_on_failure"
-			then
-				die_with_status 2 "$die_msg"
-			else
-				err="${err};$die_msg"
-				continue
-			fi
-		fi
+		# exit codes for run-update-procedure:
+		# 0: update was successful, say command output
+		# 128: subcommand died during execution
+		# 1: update procedure failed, but should not die
+		# 3: no update procedure was run
+		res="$?"
+		case $res in
+		0)
+			say "$out"
+			;;
+		128)
+			exit $res
+			;;
+		1)
+			err="${err};$out"
+			continue
+			;;
+		esac
 
 		if test -n "$recursive"
 		then
@@ -661,7 +621,7 @@ cmd_update()
 			if test $res -gt 0
 			then
 				die_msg="$(eval_gettext "fatal: Failed to recurse into submodule path '\$displaypath'")"
-				if test $res -ne 2
+				if test $res -ne 2 && test $res -ne 128
 				then
 					err="${err};$die_msg"
 					continue
-- 
2.32.0


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [GSoC] [PATCH v2] submodule--helper: run update procedures from C
  2021-08-02 13:06 ` [GSoC] [PATCH v2] " Atharva Raykar
@ 2021-08-02 18:50   ` Shourya Shukla
  2021-08-03  8:46     ` Atharva Raykar
  2021-08-03 10:07     ` Atharva Raykar
  2021-08-13  7:56   ` [GSoC] [PATCH v3] " Atharva Raykar
  1 sibling, 2 replies; 13+ messages in thread
From: Shourya Shukla @ 2021-08-02 18:50 UTC (permalink / raw)
  To: Atharva Raykar
  Cc: Git List, Emily Shaffer, Jonathan Nieder, Junio C Hamano,
	Christian Couder, Kaartic Sivaraam, Prathamesh Chavan

Le lun. 2 août 2021 à 18:36, Atharva Raykar <raykar.ath@gmail.com> a écrit :
>
> Add a new submodule--helper subcommand `run-update-procedure` that runs
> the update procedure if the SHA1 of the submodule does not match what
> the superproject expects.
>
> This is an intermediate change that works towards total conversion of
> `submodule update` from shell to C.
>
> Specific error codes are returned so that the shell script calling the
> subcommand can take a decision on the control flow, and preserve the
> error messages across subsequent recursive calls of `cmd_update`.
>
> This patch could have been approached differently, by first changing the
> `is_tip_reachable` and `fetch_in_submodule` shell functions to be
> `submodule--helper` subcommands, and then following up with a patch that
> introduces the `run-update-procedure` subcommand. We have not done it
> like that because those functions are trivial enough to convert directly
> along with these other changes. This lets us avoid the boilerplate and
> the cleanup patches that will need to be introduced in following that
> approach.

I feel that this part is more suitable for a cover letter rather than the commit
message itself. It is a useful piece of info though.

> This change is more focused on doing a faithful conversion, so for now we
> are not too concerned with trying to reduce subprocess spawns.
>
> Signed-off-by: Atharva Raykar <raykar.ath@gmail.com>
> Mentored-by: Christian Couder <christian.couder@gmail.com>
> Mentored-by: Shourya Shukla <periperidip@gmail.com>
> ---
>
> Notable changes since v1:
>
> * Modified the code structure in
>   submodule--helper.c:run_update_command(), while fixing problems with
>   the translation marks.
>
> * Renamed '--sha1' and '--subsha1' options to '--oid' and '--suboid' to
>   since the argument is parsed into an object_id struct, not plain sha1
>   data.
>
> * Used option callbacks to parse the SHA1 arguments directly.
>
> * Moved the conditional out of 'do_run_update_procedure()'.
>
> Feedback required:
>
> Ævar felt that it would be clearer to populate the 'fatal' messages
> after the run_command() operation in 'run_update_command()', to make it
> more readable [1]. I have attempted something like that here, and it has led
> to a lot more duplicated 'switch' statements, which feels suboptimal.
> I'd appreciate suggestions to make it more legible.
>
> [1] https://lore.kernel.org/git/87r1fps63r.fsf@evledraar.gmail.com/
>
> Fetch-it-Via:
> git fetch https://github.com/tfidfwastaken/git submodule-run-update-proc-list-2
>
>  builtin/submodule--helper.c | 253 ++++++++++++++++++++++++++++++++++++
>  git-submodule.sh            | 106 +++++----------
>  2 files changed, 286 insertions(+), 73 deletions(-)
>
> diff --git a/builtin/submodule--helper.c b/builtin/submodule--helper.c
> index d55f6262e9..b9c40324d0 100644
> --- a/builtin/submodule--helper.c
> +++ b/builtin/submodule--helper.c
> @@ -2029,6 +2029,20 @@ struct submodule_update_clone {
>         .max_jobs = 1, \
>  }
>
> +struct update_data {
> +       const char *recursive_prefix;
> +       const char *sm_path;
> +       const char *displaypath;
> +       struct object_id oid;
> +       struct object_id suboid;
> +       struct submodule_update_strategy update_strategy;
> +       int depth;
> +       unsigned int force: 1;
> +       unsigned int quiet: 1;
> +       unsigned int nofetch: 1;
> +       unsigned int just_cloned: 1;
> +};
> +#define UPDATE_DATA_INIT { .update_strategy = SUBMODULE_UPDATE_STRATEGY_INIT }
>
>  static void next_submodule_warn_missing(struct submodule_update_clone *suc,
>                 struct strbuf *out, const char *displaypath)
> @@ -2282,6 +2296,175 @@ static int git_update_clone_config(const char *var, const char *value,
>         return 0;
>  }
> +
> +static int run_update_command(struct update_data *ud, int subforce)
> +{
> +       struct child_process cp = CHILD_PROCESS_INIT;
> +       char *oid = oid_to_hex(&ud->oid);
> +       int must_die_on_failure = 0;
> +
> +       cp.dir = xstrdup(ud->sm_path);
> +       switch (ud->update_strategy.type) {
> +       case SM_UPDATE_CHECKOUT:
> +               cp.git_cmd = 1;
> +               strvec_pushl(&cp.args, "checkout", "-q", NULL);

Would it be possible to add the 'if' statement above just before the
'switch' (or after,
whichever seems okay) since this is common amongst (almost) all the cases?

> +               if (subforce)
> +                       strvec_push(&cp.args, "-f");
> +               break;
> +       case SM_UPDATE_REBASE:
> +               cp.git_cmd = 1;
> +               strvec_push(&cp.args, "rebase");
> +               if (ud->quiet)
> +                       strvec_push(&cp.args, "--quiet");
> +               must_die_on_failure = 1;
> +               break;
> +       case SM_UPDATE_MERGE:
> +               cp.git_cmd = 1;
> +               strvec_push(&cp.args, "merge");
> +               if (ud->quiet)
> +                       strvec_push(&cp.args, "--quiet");
> +               must_die_on_failure = 1;
> +               break;
> +       case SM_UPDATE_COMMAND:
> +               /* NOTE: this does not handle quoted arguments */
> +               strvec_split(&cp.args, ud->update_strategy.command);
> +               must_die_on_failure = 1;
> +               break;
> +       case SM_UPDATE_UNSPECIFIED:
> +       case SM_UPDATE_NONE:
> +               BUG("update strategy should have been specified");
> +       }

If the original did not bug out, do we need to? The documentation does
not mention
this as well:
https://git-scm.com/docs/git-submodule#Documentation/git-submodule.txt-none

> +
> +       strvec_push(&cp.args, oid);
> +
> +       prepare_submodule_repo_env(&cp.env_array);
> +
> +       if (run_command(&cp)) {
> +               if (must_die_on_failure) {
> +                       switch (ud->update_strategy.type) {
> +                       case SM_UPDATE_CHECKOUT:
> +                               die(_("Unable to checkout '%s' in submodule path '%s'"),
> +                                     oid, ud->displaypath);
> +                               break;
> +                       case SM_UPDATE_REBASE:
> +                               die(_("Unable to rebase '%s' in submodule path '%s'"),
> +                                     oid, ud->displaypath);
> +                               break;
> +                       case SM_UPDATE_MERGE:
> +                               die(_("Unable to merge '%s' in submodule path '%s'"),
> +                                     oid, ud->displaypath);
> +                               break;
> +                       case SM_UPDATE_COMMAND:
> +                               die(_("Execution of '%s %s' failed in submodule path '%s'"),
> +                                     ud->update_strategy.command, oid, ud->displaypath);
> +                               break;
> +                       case SM_UPDATE_UNSPECIFIED:
> +                       case SM_UPDATE_NONE:
> +                               BUG("update strategy should have been specified");
> +                       }
> +               }
> +               /*
> +                * This signifies to the caller in shell that
> +                * the command failed without dying
> +                */
> +               return 1;
> +       }
> +
> +       switch (ud->update_strategy.type) {
> +       case SM_UPDATE_CHECKOUT:
> +               printf(_("Submodule path '%s': checked out '%s'\n"),
> +                      ud->displaypath, oid);
> +               break;
> +       case SM_UPDATE_REBASE:
> +               printf(_("Submodule path '%s': rebased into '%s'\n"),
> +                      ud->displaypath, oid);
> +               break;
> +       case SM_UPDATE_MERGE:
> +               printf(_("Submodule path '%s': merged in '%s'\n"),
> +                      ud->displaypath, oid);
> +               break;
> +       case SM_UPDATE_COMMAND:
> +               printf(_("Submodule path '%s': '%s %s'\n"),
> +                      ud->displaypath, ud->update_strategy.command, oid);
> +               break;
> +       case SM_UPDATE_UNSPECIFIED:
> +       case SM_UPDATE_NONE:
> +               BUG("update strategy should have been specified");
> +       }
> +
> +       return 0;
> +}
> +
> +static int do_run_update_procedure(struct update_data *ud)
> +{
> +       int subforce = is_null_oid(&ud->suboid) || ud->force;
> +
> +       if (!ud->nofetch) {
> +               /*
> +                * Run fetch only if `oid` isn't present or it
> +                * is not reachable from a ref.
> +                */
> +               if (!is_tip_reachable(ud->sm_path, &ud->oid))
> +                       if (fetch_in_submodule(ud->sm_path, ud->depth, ud->quiet, NULL) &&
> +                           !ud->quiet)
> +                               fprintf_ln(stderr,
> +                                          _("Unable to fetch in submodule path '%s'; "
> +                                            "trying to directly fetch %s:"),
> +                                          ud->displaypath, oid_to_hex(&ud->oid));

I was wondering if an OID is invalid, will it be counted as
unreachable and vice-versa?
If that is the case then that would simplify the work.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [GSoC] [PATCH v2] submodule--helper: run update procedures from C
  2021-08-02 18:50   ` Shourya Shukla
@ 2021-08-03  8:46     ` Atharva Raykar
  2021-08-03 10:07     ` Atharva Raykar
  1 sibling, 0 replies; 13+ messages in thread
From: Atharva Raykar @ 2021-08-03  8:46 UTC (permalink / raw)
  To: Shourya Shukla
  Cc: Git List, Emily Shaffer, Jonathan Nieder, Junio C Hamano,
	Christian Couder, Kaartic Sivaraam, Prathamesh Chavan


Shourya Shukla <periperidip@gmail.com> writes:

> Le lun. 2 août 2021 à 18:36, Atharva Raykar 
> <raykar.ath@gmail.com> a écrit :
>>
>> Add a new submodule--helper subcommand `run-update-procedure` 
>> that runs
>> the update procedure if the SHA1 of the submodule does not 
>> match what
>> the superproject expects.
>>
>> This is an intermediate change that works towards total 
>> conversion of
>> `submodule update` from shell to C.
>>
>> Specific error codes are returned so that the shell script 
>> calling the
>> subcommand can take a decision on the control flow, and 
>> preserve the
>> error messages across subsequent recursive calls of 
>> `cmd_update`.
>>
>> This patch could have been approached differently, by first 
>> changing the
>> `is_tip_reachable` and `fetch_in_submodule` shell functions to 
>> be
>> `submodule--helper` subcommands, and then following up with a 
>> patch that
>> introduces the `run-update-procedure` subcommand. We have not 
>> done it
>> like that because those functions are trivial enough to convert 
>> directly
>> along with these other changes. This lets us avoid the 
>> boilerplate and
>> the cleanup patches that will need to be introduced in 
>> following that
>> approach.
>
> I feel that this part is more suitable for a cover letter rather 
> than the commit
> message itself. It is a useful piece of info though.

Okay, that seems right, the message does seem a bit too
context-sensitive.

>> This change is more focused on doing a faithful conversion, so 
>> for now we
>> are not too concerned with trying to reduce subprocess spawns.
>>
>> Signed-off-by: Atharva Raykar <raykar.ath@gmail.com>
>> Mentored-by: Christian Couder <christian.couder@gmail.com>
>> Mentored-by: Shourya Shukla <periperidip@gmail.com>
>> ---
>>
>> Notable changes since v1:
>>
>> * Modified the code structure in
>>   submodule--helper.c:run_update_command(), while fixing 
>>   problems with
>>   the translation marks.
>>
>> * Renamed '--sha1' and '--subsha1' options to '--oid' and 
>> '--suboid' to
>>   since the argument is parsed into an object_id struct, not 
>>   plain sha1
>>   data.
>>
>> * Used option callbacks to parse the SHA1 arguments directly.
>>
>> * Moved the conditional out of 'do_run_update_procedure()'.
>>
>> Feedback required:
>>
>> Ævar felt that it would be clearer to populate the 'fatal' 
>> messages
>> after the run_command() operation in 'run_update_command()', to 
>> make it
>> more readable [1]. I have attempted something like that here, 
>> and it has led
>> to a lot more duplicated 'switch' statements, which feels 
>> suboptimal.
>> I'd appreciate suggestions to make it more legible.
>>
>> [1] 
>> https://lore.kernel.org/git/87r1fps63r.fsf@evledraar.gmail.com/
>>
>> Fetch-it-Via:
>> git fetch https://github.com/tfidfwastaken/git 
>> submodule-run-update-proc-list-2
>>
>>  builtin/submodule--helper.c | 253 
>>  ++++++++++++++++++++++++++++++++++++
>>  git-submodule.sh            | 106 +++++----------
>>  2 files changed, 286 insertions(+), 73 deletions(-)
>>
>> diff --git a/builtin/submodule--helper.c 
>> b/builtin/submodule--helper.c
>> index d55f6262e9..b9c40324d0 100644
>> --- a/builtin/submodule--helper.c
>> +++ b/builtin/submodule--helper.c
>> @@ -2029,6 +2029,20 @@ struct submodule_update_clone {
>>         .max_jobs = 1, \
>>  }
>>
>> +struct update_data {
>> +       const char *recursive_prefix;
>> +       const char *sm_path;
>> +       const char *displaypath;
>> +       struct object_id oid;
>> +       struct object_id suboid;
>> +       struct submodule_update_strategy update_strategy;
>> +       int depth;
>> +       unsigned int force: 1;
>> +       unsigned int quiet: 1;
>> +       unsigned int nofetch: 1;
>> +       unsigned int just_cloned: 1;
>> +};
>> +#define UPDATE_DATA_INIT { .update_strategy = 
>> SUBMODULE_UPDATE_STRATEGY_INIT }
>>
>>  static void next_submodule_warn_missing(struct 
>>  submodule_update_clone *suc,
>>                 struct strbuf *out, const char *displaypath)
>> @@ -2282,6 +2296,175 @@ static int 
>> git_update_clone_config(const char *var, const char *value,
>>         return 0;
>>  }
>> +
>> +static int run_update_command(struct update_data *ud, int 
>> subforce)
>> +{
>> +       struct child_process cp = CHILD_PROCESS_INIT;
>> +       char *oid = oid_to_hex(&ud->oid);
>> +       int must_die_on_failure = 0;
>> +
>> +       cp.dir = xstrdup(ud->sm_path);
>> +       switch (ud->update_strategy.type) {
>> +       case SM_UPDATE_CHECKOUT:
>> +               cp.git_cmd = 1;
>> +               strvec_pushl(&cp.args, "checkout", "-q", NULL);
>
> Would it be possible to add the 'if' statement above just before 
> the
> 'switch' (or after,
> whichever seems okay) since this is common amongst (almost) all 
> the cases?

I'll try it on once, if it makes the code more readable, I'll 
include it
in the reroll.

>> +               if (subforce)
>> +                       strvec_push(&cp.args, "-f");
>> +               break;
>> +       case SM_UPDATE_REBASE:
>> +               cp.git_cmd = 1;
>> +               strvec_push(&cp.args, "rebase");
>> +               if (ud->quiet)
>> +                       strvec_push(&cp.args, "--quiet");
>> +               must_die_on_failure = 1;
>> +               break;
>> +       case SM_UPDATE_MERGE:
>> +               cp.git_cmd = 1;
>> +               strvec_push(&cp.args, "merge");
>> +               if (ud->quiet)
>> +                       strvec_push(&cp.args, "--quiet");
>> +               must_die_on_failure = 1;
>> +               break;
>> +       case SM_UPDATE_COMMAND:
>> +               /* NOTE: this does not handle quoted arguments 
>> */
>> +               strvec_split(&cp.args, 
>> ud->update_strategy.command);
>> +               must_die_on_failure = 1;
>> +               break;
>> +       case SM_UPDATE_UNSPECIFIED:
>> +       case SM_UPDATE_NONE:
>> +               BUG("update strategy should have been 
>> specified");
>> +       }
>
> If the original did not bug out, do we need to? The 
> documentation does
> not mention
> this as well:
> https://git-scm.com/docs/git-submodule#Documentation/git-submodule.txt-none

This was how the original shell porcelain did it:
case "$update_module" in
checkout)
	command="git checkout $subforce -q"
	die_msg="$(eval_gettext "Unable to checkout '\$sha1' in 
	submodule path '\$displaypath'")"
	say_msg="$(eval_gettext "Submodule path '\$displaypath': 
	checked out '\$sha1'")"
	;;
rebase)
	command="git rebase ${GIT_QUIET:+--quiet}"
	die_msg="$(eval_gettext "Unable to rebase '\$sha1' in 
	submodule path '\$displaypath'")"
	say_msg="$(eval_gettext "Submodule path '\$displaypath': 
	rebased into '\$sha1'")"
	must_die_on_failure=yes
	;;
merge)
	command="git merge ${GIT_QUIET:+--quiet}"
	die_msg="$(eval_gettext "Unable to merge '\$sha1' in submodule 
	path '\$displaypath'")"
	say_msg="$(eval_gettext "Submodule path '\$displaypath': 
	merged in '\$sha1'")"
	must_die_on_failure=yes
	;;
!*)
	command="${update_module#!}"
	die_msg="$(eval_gettext "Execution of '\$command \$sha1' 
	failed in submodule path '\$displaypath'")"
	say_msg="$(eval_gettext "Submodule path '\$displaypath': 
	'\$command \$sha1'")"
	must_die_on_failure=yes
	;;
*)
	die "$(eval_gettext "Invalid update mode '$update_module' for 
	submodule path '$path'")"
esac

The fallthrough case used to die, but I noticed that this branch 
will
never get activated. This is because the 'update-clone' helper 
will not
output any entry that has the update mode set to 'none', and thus 
the
`while` loop that contains this code would never run.

Which is why I decided to BUG out on that case, because that state
should never be reached. But I see the source of confusion, and 
maybe I
should have different BUG messages for SM_UPDATE_UNSPECIFIED and
SM_UPDATE_NONE. The latter should probably say "should have been 
handled
by update-clone".

>> +
>> +       strvec_push(&cp.args, oid);
>> +
>> +       prepare_submodule_repo_env(&cp.env_array);
>> +
>> +       if (run_command(&cp)) {
>> +               if (must_die_on_failure) {
>> +                       switch (ud->update_strategy.type) {
>> +                       case SM_UPDATE_CHECKOUT:
>> +                               die(_("Unable to checkout '%s' 
>> in submodule path '%s'"),
>> +                                     oid, ud->displaypath);
>> +                               break;
>> +                       case SM_UPDATE_REBASE:
>> +                               die(_("Unable to rebase '%s' in 
>> submodule path '%s'"),
>> +                                     oid, ud->displaypath);
>> +                               break;
>> +                       case SM_UPDATE_MERGE:
>> +                               die(_("Unable to merge '%s' in 
>> submodule path '%s'"),
>> +                                     oid, ud->displaypath);
>> +                               break;
>> +                       case SM_UPDATE_COMMAND:
>> +                               die(_("Execution of '%s %s' 
>> failed in submodule path '%s'"),
>> + 
>> ud->update_strategy.command, oid, ud->displaypath);
>> +                               break;
>> +                       case SM_UPDATE_UNSPECIFIED:
>> +                       case SM_UPDATE_NONE:
>> +                               BUG("update strategy should 
>> have been specified");
>> +                       }
>> +               }
>> +               /*
>> +                * This signifies to the caller in shell that
>> +                * the command failed without dying
>> +                */
>> +               return 1;
>> +       }
>> +
>> +       switch (ud->update_strategy.type) {
>> +       case SM_UPDATE_CHECKOUT:
>> +               printf(_("Submodule path '%s': checked out 
>> '%s'\n"),
>> +                      ud->displaypath, oid);
>> +               break;
>> +       case SM_UPDATE_REBASE:
>> +               printf(_("Submodule path '%s': rebased into 
>> '%s'\n"),
>> +                      ud->displaypath, oid);
>> +               break;
>> +       case SM_UPDATE_MERGE:
>> +               printf(_("Submodule path '%s': merged in 
>> '%s'\n"),
>> +                      ud->displaypath, oid);
>> +               break;
>> +       case SM_UPDATE_COMMAND:
>> +               printf(_("Submodule path '%s': '%s %s'\n"),
>> +                      ud->displaypath, 
>> ud->update_strategy.command, oid);
>> +               break;
>> +       case SM_UPDATE_UNSPECIFIED:
>> +       case SM_UPDATE_NONE:
>> +               BUG("update strategy should have been 
>> specified");
>> +       }
>> +
>> +       return 0;
>> +}
>> +
>> +static int do_run_update_procedure(struct update_data *ud)
>> +{
>> +       int subforce = is_null_oid(&ud->suboid) || ud->force;
>> +
>> +       if (!ud->nofetch) {
>> +               /*
>> +                * Run fetch only if `oid` isn't present or it
>> +                * is not reachable from a ref.
>> +                */
>> +               if (!is_tip_reachable(ud->sm_path, &ud->oid))
>> +                       if (fetch_in_submodule(ud->sm_path, 
>> ud->depth, ud->quiet, NULL) &&
>> +                           !ud->quiet)
>> +                               fprintf_ln(stderr,
>> +                                          _("Unable to fetch 
>> in submodule path '%s'; "
>> +                                            "trying to 
>> directly fetch %s:"),
>> +                                          ud->displaypath, 
>> oid_to_hex(&ud->oid));
>
> I was wondering if an OID is invalid, will it be counted as
> unreachable and vice-versa?
> If that is the case then that would simplify the work.

Could you elaborate? I'm not sure what you mean by 'invalid' in 
this
context. I don't think this code will receive any kind of 
malformed
oid--they come from 'update-clone' which handles it correctly.

As far as I can tell, the only way to check if a particular OID is
unreachable is when we check if all the refs cannot find it.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [GSoC] [PATCH v2] submodule--helper: run update procedures from C
  2021-08-02 18:50   ` Shourya Shukla
  2021-08-03  8:46     ` Atharva Raykar
@ 2021-08-03 10:07     ` Atharva Raykar
  1 sibling, 0 replies; 13+ messages in thread
From: Atharva Raykar @ 2021-08-03 10:07 UTC (permalink / raw)
  To: Shourya Shukla
  Cc: Git List, Emily Shaffer, Jonathan Nieder, Junio C Hamano,
	Christian Couder, Kaartic Sivaraam, Prathamesh Chavan


(I am resending this email, because my client mangled the whitespaces
due to a misconfiguration. Please ignore the my previous message.)

Shourya Shukla <periperidip@gmail.com> writes:

> Le lun. 2 août 2021 à 18:36, Atharva Raykar <raykar.ath@gmail.com> a écrit :
>>
>> Add a new submodule--helper subcommand `run-update-procedure` that runs
>> the update procedure if the SHA1 of the submodule does not match what
>> the superproject expects.
>>
>> This is an intermediate change that works towards total conversion of
>> `submodule update` from shell to C.
>>
>> Specific error codes are returned so that the shell script calling the
>> subcommand can take a decision on the control flow, and preserve the
>> error messages across subsequent recursive calls of `cmd_update`.
>>
>> This patch could have been approached differently, by first changing the
>> `is_tip_reachable` and `fetch_in_submodule` shell functions to be
>> `submodule--helper` subcommands, and then following up with a patch that
>> introduces the `run-update-procedure` subcommand. We have not done it
>> like that because those functions are trivial enough to convert directly
>> along with these other changes. This lets us avoid the boilerplate and
>> the cleanup patches that will need to be introduced in following that
>> approach.
>
> I feel that this part is more suitable for a cover letter rather than the commit
> message itself. It is a useful piece of info though.

Okay, that seems right, the message does seem a bit too context-sensitive.

>> This change is more focused on doing a faithful conversion, so for now we
>> are not too concerned with trying to reduce subprocess spawns.
>>
>> Signed-off-by: Atharva Raykar <raykar.ath@gmail.com>
>> Mentored-by: Christian Couder <christian.couder@gmail.com>
>> Mentored-by: Shourya Shukla <periperidip@gmail.com>
>> ---
>>
>> Notable changes since v1:
>>
>> * Modified the code structure in
>>   submodule--helper.c:run_update_command(), while fixing problems with
>>   the translation marks.
>>
>> * Renamed '--sha1' and '--subsha1' options to '--oid' and '--suboid' to
>>   since the argument is parsed into an object_id struct, not plain sha1
>>   data.
>>
>> * Used option callbacks to parse the SHA1 arguments directly.
>>
>> * Moved the conditional out of 'do_run_update_procedure()'.
>>
>> Feedback required:
>>
>> Ævar felt that it would be clearer to populate the 'fatal' messages
>> after the run_command() operation in 'run_update_command()', to make it
>> more readable [1]. I have attempted something like that here, and it has led
>> to a lot more duplicated 'switch' statements, which feels suboptimal.
>> I'd appreciate suggestions to make it more legible.
>>
>> [1] https://lore.kernel.org/git/87r1fps63r.fsf@evledraar.gmail.com/
>>
>> Fetch-it-Via:
>> git fetch https://github.com/tfidfwastaken/git submodule-run-update-proc-list-2
>>
>>  builtin/submodule--helper.c | 253 ++++++++++++++++++++++++++++++++++++
>>  git-submodule.sh            | 106 +++++----------
>>  2 files changed, 286 insertions(+), 73 deletions(-)
>>
>> diff --git a/builtin/submodule--helper.c b/builtin/submodule--helper.c
>> index d55f6262e9..b9c40324d0 100644
>> --- a/builtin/submodule--helper.c
>> +++ b/builtin/submodule--helper.c
>> @@ -2029,6 +2029,20 @@ struct submodule_update_clone {
>>         .max_jobs = 1, \
>>  }
>>
>> +struct update_data {
>> +       const char *recursive_prefix;
>> +       const char *sm_path;
>> +       const char *displaypath;
>> +       struct object_id oid;
>> +       struct object_id suboid;
>> +       struct submodule_update_strategy update_strategy;
>> +       int depth;
>> +       unsigned int force: 1;
>> +       unsigned int quiet: 1;
>> +       unsigned int nofetch: 1;
>> +       unsigned int just_cloned: 1;
>> +};
>> +#define UPDATE_DATA_INIT { .update_strategy = SUBMODULE_UPDATE_STRATEGY_INIT }
>>
>>  static void next_submodule_warn_missing(struct submodule_update_clone *suc,
>>                 struct strbuf *out, const char *displaypath)
>> @@ -2282,6 +2296,175 @@ static int git_update_clone_config(const char *var, const char *value,
>>         return 0;
>>  }
>> +
>> +static int run_update_command(struct update_data *ud, int subforce)
>> +{
>> +       struct child_process cp = CHILD_PROCESS_INIT;
>> +       char *oid = oid_to_hex(&ud->oid);
>> +       int must_die_on_failure = 0;
>> +
>> +       cp.dir = xstrdup(ud->sm_path);
>> +       switch (ud->update_strategy.type) {
>> +       case SM_UPDATE_CHECKOUT:
>> +               cp.git_cmd = 1;
>> +               strvec_pushl(&cp.args, "checkout", "-q", NULL);
>
> Would it be possible to add the 'if' statement above just before the
> 'switch' (or after,
> whichever seems okay) since this is common amongst (almost) all the cases?

I'll try it on once, if it makes the code more readable, I'll include it in the
reroll.

>> +               if (subforce)
>> +                       strvec_push(&cp.args, "-f");
>> +               break;
>> +       case SM_UPDATE_REBASE:
>> +               cp.git_cmd = 1;
>> +               strvec_push(&cp.args, "rebase");
>> +               if (ud->quiet)
>> +                       strvec_push(&cp.args, "--quiet");
>> +               must_die_on_failure = 1;
>> +               break;
>> +       case SM_UPDATE_MERGE:
>> +               cp.git_cmd = 1;
>> +               strvec_push(&cp.args, "merge");
>> +               if (ud->quiet)
>> +                       strvec_push(&cp.args, "--quiet");
>> +               must_die_on_failure = 1;
>> +               break;
>> +       case SM_UPDATE_COMMAND:
>> +               /* NOTE: this does not handle quoted arguments */
>> +               strvec_split(&cp.args, ud->update_strategy.command);
>> +               must_die_on_failure = 1;
>> +               break;
>> +       case SM_UPDATE_UNSPECIFIED:
>> +       case SM_UPDATE_NONE:
>> +               BUG("update strategy should have been specified");
>> +       }
>
> If the original did not bug out, do we need to? The documentation does
> not mention
> this as well:
> https://git-scm.com/docs/git-submodule#Documentation/git-submodule.txt-none

This was how the original shell porcelain did it:

case "$update_module" in
checkout)
	command="git checkout $subforce -q"
	die_msg="$(eval_gettext "Unable to checkout '\$sha1' in submodule path '\$displaypath'")"
	say_msg="$(eval_gettext "Submodule path '\$displaypath': checked out '\$sha1'")"
	;;
rebase)
	command="git rebase ${GIT_QUIET:+--quiet}"
	die_msg="$(eval_gettext "Unable to rebase '\$sha1' in submodule path '\$displaypath'")"
	say_msg="$(eval_gettext "Submodule path '\$displaypath': rebased into '\$sha1'")"
	must_die_on_failure=yes
	;;
merge)
	command="git merge ${GIT_QUIET:+--quiet}"
	die_msg="$(eval_gettext "Unable to merge '\$sha1' in submodule path '\$displaypath'")"
	say_msg="$(eval_gettext "Submodule path '\$displaypath': merged in '\$sha1'")"
	must_die_on_failure=yes
	;;
!*)
	command="${update_module#!}"
	die_msg="$(eval_gettext "Execution of '\$command \$sha1' failed in submodule path '\$displaypath'")"
	say_msg="$(eval_gettext "Submodule path '\$displaypath': '\$command \$sha1'")"
	must_die_on_failure=yes
	;;
*)
	die "$(eval_gettext "Invalid update mode '$update_module' for submodule path '$path'")"
esac

The fallthrough case used to die, but I noticed that this branch will never get
activated. This is because the 'update-clone' helper will not output any entry
that has the update mode set to 'none', and thus the `while` loop that contains
this code would never run.

Which is why I decided to BUG out on that case, because that state should never
be reached. But I see the source of confusion, and maybe I should have different
BUG messages for SM_UPDATE_UNSPECIFIED and SM_UPDATE_NONE. The latter should
probably say "should have been handled by update-clone".

>> +
>> +       strvec_push(&cp.args, oid);
>> +
>> +       prepare_submodule_repo_env(&cp.env_array);
>> +
>> +       if (run_command(&cp)) {
>> +               if (must_die_on_failure) {
>> +                       switch (ud->update_strategy.type) {
>> +                       case SM_UPDATE_CHECKOUT:
>> +                               die(_("Unable to checkout '%s' in submodule path '%s'"),
>> +                                     oid, ud->displaypath);
>> +                               break;
>> +                       case SM_UPDATE_REBASE:
>> +                               die(_("Unable to rebase '%s' in submodule path '%s'"),
>> +                                     oid, ud->displaypath);
>> +                               break;
>> +                       case SM_UPDATE_MERGE:
>> +                               die(_("Unable to merge '%s' in submodule path '%s'"),
>> +                                     oid, ud->displaypath);
>> +                               break;
>> +                       case SM_UPDATE_COMMAND:
>> +                               die(_("Execution of '%s %s' failed in submodule path '%s'"),
>> +                                     ud->update_strategy.command, oid, ud->displaypath);
>> +                               break;
>> +                       case SM_UPDATE_UNSPECIFIED:
>> +                       case SM_UPDATE_NONE:
>> +                               BUG("update strategy should have been specified");
>> +                       }
>> +               }
>> +               /*
>> +                * This signifies to the caller in shell that
>> +                * the command failed without dying
>> +                */
>> +               return 1;
>> +       }
>> +
>> +       switch (ud->update_strategy.type) {
>> +       case SM_UPDATE_CHECKOUT:
>> +               printf(_("Submodule path '%s': checked out '%s'\n"),
>> +                      ud->displaypath, oid);
>> +               break;
>> +       case SM_UPDATE_REBASE:
>> +               printf(_("Submodule path '%s': rebased into '%s'\n"),
>> +                      ud->displaypath, oid);
>> +               break;
>> +       case SM_UPDATE_MERGE:
>> +               printf(_("Submodule path '%s': merged in '%s'\n"),
>> +                      ud->displaypath, oid);
>> +               break;
>> +       case SM_UPDATE_COMMAND:
>> +               printf(_("Submodule path '%s': '%s %s'\n"),
>> +                      ud->displaypath, ud->update_strategy.command, oid);
>> +               break;
>> +       case SM_UPDATE_UNSPECIFIED:
>> +       case SM_UPDATE_NONE:
>> +               BUG("update strategy should have been specified");
>> +       }
>> +
>> +       return 0;
>> +}
>> +
>> +static int do_run_update_procedure(struct update_data *ud)
>> +{
>> +       int subforce = is_null_oid(&ud->suboid) || ud->force;
>> +
>> +       if (!ud->nofetch) {
>> +               /*
>> +                * Run fetch only if `oid` isn't present or it
>> +                * is not reachable from a ref.
>> +                */
>> +               if (!is_tip_reachable(ud->sm_path, &ud->oid))
>> +                       if (fetch_in_submodule(ud->sm_path, ud->depth, ud->quiet, NULL) &&
>> +                           !ud->quiet)
>> +                               fprintf_ln(stderr,
>> +                                          _("Unable to fetch in submodule path '%s'; "
>> +                                            "trying to directly fetch %s:"),
>> +                                          ud->displaypath, oid_to_hex(&ud->oid));
>
> I was wondering if an OID is invalid, will it be counted as
> unreachable and vice-versa?
> If that is the case then that would simplify the work.

Could you elaborate? I'm not sure what you mean by 'invalid' in this context. I
don't think this code will receive any kind of malformed oid--they come from
'update-clone' which handles it correctly.

As far as I can tell, the only way to check if a particular OID is unreachable
is when we check if all the refs cannot find it.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [GSoC] [PATCH] submodule--helper: run update procedures from C
  2021-07-23 16:59   ` Atharva Raykar
@ 2021-08-04  8:35     ` Atharva Raykar
  0 siblings, 0 replies; 13+ messages in thread
From: Atharva Raykar @ 2021-08-04  8:35 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: Git List, Emily Shaffer, Jonathan Nieder, Junio C Hamano,
	Christian Couder, Shourya Shukla, Kaartic Sivaraam,
	Prathamesh Chavan


Atharva Raykar <raykar.ath@gmail.com> writes:
>> Ævar Arnfjörð Bjarmason <avarab@gmail.com> wrote:
>>> static void update_submodule(struct update_clone_data *ucd)
>>> {
>>> 	fprintf(stdout, "dummy %s %d\t%s\n",
>>> @@ -2379,6 +2552,79 @@ static int update_clone(int argc, const char **argv, const char *prefix)
>>> 	return update_submodules(&suc);
>>> }
>>>
>>> +static int run_update_procedure(int argc, const char **argv, const char *prefix)
>>> +{
>>> +	int force = 0, quiet = 0, nofetch = 0, just_cloned = 0;
>>> +	char *prefixed_path, *update = NULL;
>>> +	char *sha1 = NULL, *subsha1 = NULL;
>>> +	struct update_data update_data = UPDATE_DATA_INIT;
>>> +
>>> +	struct option options[] = {
>>> +		OPT__QUIET(&quiet, N_("suppress output for update by rebase or merge")),
>>> +		OPT__FORCE(&force, N_("force checkout updates"), 0),
>>> +		OPT_BOOL('N', "no-fetch", &nofetch,
>>> +			 N_("don't fetch new objects from the remote site")),
>>> +		OPT_BOOL(0, "just-cloned", &just_cloned,
>>> +			 N_("overrides update mode in case the repository is a fresh clone")),
>>> +		OPT_INTEGER(0, "depth", &update_data.depth, N_("depth for shallow fetch")),
>>> +		OPT_STRING(0, "prefix", &prefix,
>>> +			   N_("path"),
>>> +			   N_("path into the working tree")),
>>> +		OPT_STRING(0, "update", &update,
>>> +			   N_("string"),
>>> +			   N_("rebase, merge, checkout or none")),
>>> +		OPT_STRING(0, "recursive-prefix", &update_data.recursive_prefix, N_("path"),
>>> +			   N_("path into the working tree, across nested "
>>> +			      "submodule boundaries")),
>>> +		OPT_STRING(0, "sha1", &sha1, N_("string"),
>>> +			   N_("SHA1 expected by superproject")),
>>> +		OPT_STRING(0, "subsha1", &subsha1, N_("string"),
>>> +			   N_("SHA1 of submodule's HEAD")),
>>> +		OPT_END()
>>> +	};
>>> +
>>> +	const char *const usage[] = {
>>> +		N_("git submodule--helper run-update-procedure [<options>] <path>"),
>>> +		NULL
>>> +	};
>>> +
>>> +	argc = parse_options(argc, argv, prefix, options, usage, 0);
>>> +
>>> +	if (argc != 1)
>>> +		usage_with_options(usage, options);
>>> +	update_data.force = !!force;
>>> +	update_data.quiet = !!quiet;
>>> +	update_data.nofetch = !!nofetch;
>>> +	update_data.just_cloned = !!just_cloned;
>>
>> For all of these just pass the reference to the update_data variable
>> directly in the OPT_*(). No need to set an "int force", only to copy it
>> over to update_data.force. Let's just use the latter only.
>
> Hmm, I'm trying to remember why the single bit values are treated this way
> in this whole file...
>
> ...there seems to be no good reason for it. The API docs for parse options
> state that OPT_BOOL() is guaranteed to return either zero or one, so that
> double negation does look unnecessary.

I forgot to mention why I did not address this change in my v3 patch.
The reason why we are handling boolean values this way is because they
are declared as bitfields in the 'update_data' struct. Since we cannot
take the address of bitfields, we have to use a different variable to
store when using 'parse_options()'.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [GSoC] [PATCH v3] submodule--helper: run update procedures from C
  2021-08-02 13:06 ` [GSoC] [PATCH v2] " Atharva Raykar
  2021-08-02 18:50   ` Shourya Shukla
@ 2021-08-13  7:56   ` Atharva Raykar
  2021-08-13 18:32     ` Junio C Hamano
  2021-08-24 14:06     ` [PATCH v4] " Atharva Raykar
  1 sibling, 2 replies; 13+ messages in thread
From: Atharva Raykar @ 2021-08-13  7:56 UTC (permalink / raw)
  To: git
  Cc: Atharva Raykar, Emily Shaffer, Jonathan Nieder, Junio C Hamano,
	Christian Couder, Shourya Shukla, Kaartic Sivaraam,
	Prathamesh Chavan

Add a new submodule--helper subcommand `run-update-procedure` that runs
the update procedure if the SHA1 of the submodule does not match what
the superproject expects.

This is an intermediate change that works towards total conversion of
`submodule update` from shell to C.

Specific error codes are returned so that the shell script calling the
subcommand can take a decision on the control flow, and preserve the
error messages across subsequent recursive calls of `cmd_update`.

This change is more focused on doing a faithful conversion, so for now we
are not too concerned with trying to reduce subprocess spawns.

Signed-off-by: Atharva Raykar <raykar.ath@gmail.com>
Mentored-by: Christian Couder <christian.couder@gmail.com>
Mentored-by: Shourya Shukla <periperidip@gmail.com>
---

This patch could have been approached differently, by first changing the
`is_tip_reachable` and `fetch_in_submodule` shell functions to be
`submodule--helper` subcommands, and then following up with a patch that
introduces the `run-update-procedure` subcommand. We have not done it
like that because those functions are trivial enough to convert directly
along with these other changes. This lets us avoid the boilerplate and
the cleanup patches that will need to be introduced in following that
approach.

Since v2:
* Different BUG messages in run_update_command() for the "Unspecified" and
  "None" update modes.
* Move the information about how the patch was approached out of the commit
  message.
* Rebase this patch on top of master (the previous one was based on a stale,
  unmerged topic branch). This patch no longer depends on a topic branch.

Fetch-it-Via:
git fetch https://github.com/tfidfwastaken/git submodule-run-update-proc-list-4

 builtin/submodule--helper.c | 259 ++++++++++++++++++++++++++++++++++++
 git-submodule.sh            | 104 +++++----------
 2 files changed, 291 insertions(+), 72 deletions(-)

diff --git a/builtin/submodule--helper.c b/builtin/submodule--helper.c
index ef2776a9e4..9b34b29ce2 100644
--- a/builtin/submodule--helper.c
+++ b/builtin/submodule--helper.c
@@ -2045,6 +2045,20 @@ struct submodule_update_clone {
 	.max_jobs = 1, \
 }
 
+struct update_data {
+	const char *recursive_prefix;
+	const char *sm_path;
+	const char *displaypath;
+	struct object_id oid;
+	struct object_id suboid;
+	struct submodule_update_strategy update_strategy;
+	int depth;
+	unsigned int force: 1;
+	unsigned int quiet: 1;
+	unsigned int nofetch: 1;
+	unsigned int just_cloned: 1;
+};
+#define UPDATE_DATA_INIT { .update_strategy = SUBMODULE_UPDATE_STRATEGY_INIT }
 
 static void next_submodule_warn_missing(struct submodule_update_clone *suc,
 		struct strbuf *out, const char *displaypath)
@@ -2298,6 +2312,181 @@ static int git_update_clone_config(const char *var, const char *value,
 	return 0;
 }
 
+static int is_tip_reachable(const char *path, struct object_id *oid)
+{
+	struct child_process cp = CHILD_PROCESS_INIT;
+	struct strbuf rev = STRBUF_INIT;
+	char *hex = oid_to_hex(oid);
+
+	cp.git_cmd = 1;
+	cp.dir = xstrdup(path);
+	cp.no_stderr = 1;
+	strvec_pushl(&cp.args, "rev-list", "-n", "1", hex, "--not", "--all", NULL);
+
+	prepare_submodule_repo_env(&cp.env_array);
+
+	if (capture_command(&cp, &rev, GIT_MAX_HEXSZ + 1) || rev.len)
+		return 0;
+
+	return 1;
+}
+
+static int fetch_in_submodule(const char *module_path, int depth, int quiet, struct object_id *oid)
+{
+	struct child_process cp = CHILD_PROCESS_INIT;
+
+	prepare_submodule_repo_env(&cp.env_array);
+	cp.git_cmd = 1;
+	cp.dir = xstrdup(module_path);
+
+	strvec_push(&cp.args, "fetch");
+	if (quiet)
+		strvec_push(&cp.args, "--quiet");
+	if (depth)
+		strvec_pushf(&cp.args, "--depth=%d", depth);
+	if (oid) {
+		char *hex = oid_to_hex(oid);
+		char *remote = get_default_remote();
+		strvec_pushl(&cp.args, remote, hex, NULL);
+	}
+
+	return run_command(&cp);
+}
+
+static int run_update_command(struct update_data *ud, int subforce)
+{
+	struct child_process cp = CHILD_PROCESS_INIT;
+	char *oid = oid_to_hex(&ud->oid);
+	int must_die_on_failure = 0;
+
+	cp.dir = xstrdup(ud->sm_path);
+	switch (ud->update_strategy.type) {
+	case SM_UPDATE_CHECKOUT:
+		cp.git_cmd = 1;
+		strvec_pushl(&cp.args, "checkout", "-q", NULL);
+		if (subforce)
+			strvec_push(&cp.args, "-f");
+		break;
+	case SM_UPDATE_REBASE:
+		cp.git_cmd = 1;
+		strvec_push(&cp.args, "rebase");
+		if (ud->quiet)
+			strvec_push(&cp.args, "--quiet");
+		must_die_on_failure = 1;
+		break;
+	case SM_UPDATE_MERGE:
+		cp.git_cmd = 1;
+		strvec_push(&cp.args, "merge");
+		if (ud->quiet)
+			strvec_push(&cp.args, "--quiet");
+		must_die_on_failure = 1;
+		break;
+	case SM_UPDATE_COMMAND:
+		/* NOTE: this does not handle quoted arguments */
+		strvec_split(&cp.args, ud->update_strategy.command);
+		must_die_on_failure = 1;
+		break;
+	case SM_UPDATE_NONE:
+		BUG("this should have been handled before. How did we reach here?");
+		break;
+	case SM_UPDATE_UNSPECIFIED:
+		BUG("update strategy should have been specified");
+	}
+
+	strvec_push(&cp.args, oid);
+
+	prepare_submodule_repo_env(&cp.env_array);
+
+	if (run_command(&cp)) {
+		if (must_die_on_failure) {
+			switch (ud->update_strategy.type) {
+			case SM_UPDATE_CHECKOUT:
+				die(_("Unable to checkout '%s' in submodule path '%s'"),
+				      oid, ud->displaypath);
+				break;
+			case SM_UPDATE_REBASE:
+				die(_("Unable to rebase '%s' in submodule path '%s'"),
+				      oid, ud->displaypath);
+				break;
+			case SM_UPDATE_MERGE:
+				die(_("Unable to merge '%s' in submodule path '%s'"),
+				      oid, ud->displaypath);
+				break;
+			case SM_UPDATE_COMMAND:
+				die(_("Execution of '%s %s' failed in submodule path '%s'"),
+				      ud->update_strategy.command, oid, ud->displaypath);
+				break;
+			case SM_UPDATE_NONE:
+				BUG("this should have been handled before. How did we reach here?");
+				break;
+			case SM_UPDATE_UNSPECIFIED:
+				BUG("update strategy should have been specified");
+			}
+		}
+		/*
+		 * This signifies to the caller in shell that
+		 * the command failed without dying
+		 */
+		return 1;
+	}
+
+	switch (ud->update_strategy.type) {
+	case SM_UPDATE_CHECKOUT:
+		printf(_("Submodule path '%s': checked out '%s'\n"),
+		       ud->displaypath, oid);
+		break;
+	case SM_UPDATE_REBASE:
+		printf(_("Submodule path '%s': rebased into '%s'\n"),
+		       ud->displaypath, oid);
+		break;
+	case SM_UPDATE_MERGE:
+		printf(_("Submodule path '%s': merged in '%s'\n"),
+		       ud->displaypath, oid);
+		break;
+	case SM_UPDATE_COMMAND:
+		printf(_("Submodule path '%s': '%s %s'\n"),
+		       ud->displaypath, ud->update_strategy.command, oid);
+		break;
+	case SM_UPDATE_NONE:
+		BUG("this should have been handled before. How did we reach here?");
+		break;
+	case SM_UPDATE_UNSPECIFIED:
+		BUG("update strategy should have been specified");
+	}
+
+	return 0;
+}
+
+static int do_run_update_procedure(struct update_data *ud)
+{
+	int subforce = is_null_oid(&ud->suboid) || ud->force;
+
+	if (!ud->nofetch) {
+		/*
+		 * Run fetch only if `oid` isn't present or it
+		 * is not reachable from a ref.
+		 */
+		if (!is_tip_reachable(ud->sm_path, &ud->oid))
+			if (fetch_in_submodule(ud->sm_path, ud->depth, ud->quiet, NULL) &&
+			    !ud->quiet)
+				fprintf_ln(stderr,
+					   _("Unable to fetch in submodule path '%s'; "
+					     "trying to directly fetch %s:"),
+					   ud->displaypath, oid_to_hex(&ud->oid));
+		/*
+		 * Now we tried the usual fetch, but `oid` may
+		 * not be reachable from any of the refs.
+		 */
+		if (!is_tip_reachable(ud->sm_path, &ud->oid))
+			if (fetch_in_submodule(ud->sm_path, ud->depth, ud->quiet, &ud->oid))
+				die(_("Fetched in submodule path '%s', but it did not "
+				      "contain %s. Direct fetching of that commit failed."),
+				    ud->displaypath, oid_to_hex(&ud->oid));
+	}
+
+	return run_update_command(ud, subforce);
+}
+
 static void update_submodule(struct update_clone_data *ucd)
 {
 	fprintf(stdout, "dummy %s %d\t%s\n",
@@ -2395,6 +2584,75 @@ static int update_clone(int argc, const char **argv, const char *prefix)
 	return update_submodules(&suc);
 }
 
+static int run_update_procedure(int argc, const char **argv, const char *prefix)
+{
+	int force = 0, quiet = 0, nofetch = 0, just_cloned = 0;
+	char *prefixed_path, *update = NULL;
+	struct update_data update_data = UPDATE_DATA_INIT;
+
+	struct option options[] = {
+		OPT__QUIET(&quiet, N_("suppress output for update by rebase or merge")),
+		OPT__FORCE(&force, N_("force checkout updates"), 0),
+		OPT_BOOL('N', "no-fetch", &nofetch,
+			 N_("don't fetch new objects from the remote site")),
+		OPT_BOOL(0, "just-cloned", &just_cloned,
+			 N_("overrides update mode in case the repository is a fresh clone")),
+		OPT_INTEGER(0, "depth", &update_data.depth, N_("depth for shallow fetch")),
+		OPT_STRING(0, "prefix", &prefix,
+			   N_("path"),
+			   N_("path into the working tree")),
+		OPT_STRING(0, "update", &update,
+			   N_("string"),
+			   N_("rebase, merge, checkout or none")),
+		OPT_STRING(0, "recursive-prefix", &update_data.recursive_prefix, N_("path"),
+			   N_("path into the working tree, across nested "
+			      "submodule boundaries")),
+		OPT_CALLBACK_F(0, "oid", &update_data.oid, N_("sha1"),
+			       N_("SHA1 expected by superproject"), PARSE_OPT_NONEG,
+			       parse_opt_object_id),
+		OPT_CALLBACK_F(0, "suboid", &update_data.suboid, N_("subsha1"),
+			       N_("SHA1 of submodule's HEAD"), PARSE_OPT_NONEG,
+			       parse_opt_object_id),
+		OPT_END()
+	};
+
+	const char *const usage[] = {
+		N_("git submodule--helper run-update-procedure [<options>] <path>"),
+		NULL
+	};
+
+	argc = parse_options(argc, argv, prefix, options, usage, 0);
+
+	if (argc != 1)
+		usage_with_options(usage, options);
+
+	update_data.force = !!force;
+	update_data.quiet = !!quiet;
+	update_data.nofetch = !!nofetch;
+	update_data.just_cloned = !!just_cloned;
+	update_data.sm_path = argv[0];
+
+	if (update_data.recursive_prefix)
+		prefixed_path = xstrfmt("%s%s", update_data.recursive_prefix, update_data.sm_path);
+	else
+		prefixed_path = xstrdup(update_data.sm_path);
+
+	update_data.displaypath = get_submodule_displaypath(prefixed_path, prefix);
+
+	determine_submodule_update_strategy(the_repository, update_data.just_cloned,
+					    update_data.sm_path, update,
+					    &update_data.update_strategy);
+
+	free(prefixed_path);
+
+	if ((!is_null_oid(&update_data.oid) && !is_null_oid(&update_data.suboid) &&
+	     !oideq(&update_data.oid, &update_data.suboid)) ||
+	    is_null_oid(&update_data.suboid) || update_data.force)
+		return do_run_update_procedure(&update_data);
+
+	return 3;
+}
+
 static int resolve_relative_path(int argc, const char **argv, const char *prefix)
 {
 	struct strbuf sb = STRBUF_INIT;
@@ -2951,6 +3209,7 @@ static struct cmd_struct commands[] = {
 	{"add-clone", add_clone, 0},
 	{"update-module-mode", module_update_module_mode, 0},
 	{"update-clone", update_clone, 0},
+	{"run-update-procedure", run_update_procedure, 0},
 	{"ensure-core-worktree", ensure_core_worktree, 0},
 	{"relative-path", resolve_relative_path, 0},
 	{"resolve-relative-url", resolve_relative_url, 0},
diff --git a/git-submodule.sh b/git-submodule.sh
index dbd2ec2050..d8e30d1afa 100755
--- a/git-submodule.sh
+++ b/git-submodule.sh
@@ -369,13 +369,6 @@ cmd_deinit()
 	git ${wt_prefix:+-C "$wt_prefix"} submodule--helper deinit ${GIT_QUIET:+--quiet} ${force:+--force} ${deinit_all:+--all} -- "$@"
 }
 
-is_tip_reachable () (
-	sanitize_submodule_env &&
-	cd "$1" &&
-	rev=$(git rev-list -n 1 "$2" --not --all 2>/dev/null) &&
-	test -z "$rev"
-)
-
 # usage: fetch_in_submodule <module_path> [<depth>] [<sha1>]
 # Because arguments are positional, use an empty string to omit <depth>
 # but include <sha1>.
@@ -519,14 +512,13 @@ cmd_update()
 
 		git submodule--helper ensure-core-worktree "$sm_path" || exit 1
 
-		update_module=$(git submodule--helper update-module-mode $just_cloned "$sm_path" $update)
-
 		displaypath=$(git submodule--helper relative-path "$prefix$sm_path" "$wt_prefix")
 
 		if test $just_cloned -eq 1
 		then
 			subsha1=
 		else
+			just_cloned=
 			subsha1=$(sanitize_submodule_env; cd "$sm_path" &&
 				git rev-parse --verify HEAD) ||
 			die "fatal: $(eval_gettext "Unable to find current revision in submodule path '\$displaypath'")"
@@ -547,70 +539,38 @@ cmd_update()
 			die "fatal: $(eval_gettext "Unable to find current \${remote_name}/\${branch} revision in submodule path '\$sm_path'")"
 		fi
 
-		if test "$subsha1" != "$sha1" || test -n "$force"
-		then
-			subforce=$force
-			# If we don't already have a -f flag and the submodule has never been checked out
-			if test -z "$subsha1" && test -z "$force"
-			then
-				subforce="-f"
-			fi
+		out=$(git submodule--helper run-update-procedure \
+			  ${wt_prefix:+--prefix "$wt_prefix"} \
+			  ${GIT_QUIET:+--quiet} \
+			  ${force:+--force} \
+			  ${just_cloned:+--just-cloned} \
+			  ${nofetch:+--no-fetch} \
+			  ${depth:+"$depth"} \
+			  ${update:+--update "$update"} \
+			  ${prefix:+--recursive-prefix "$prefix"} \
+			  ${sha1:+--oid "$sha1"} \
+			  ${subsha1:+--suboid "$subsha1"} \
+			  "--" \
+			  "$sm_path")
 
-			if test -z "$nofetch"
-			then
-				# Run fetch only if $sha1 isn't present or it
-				# is not reachable from a ref.
-				is_tip_reachable "$sm_path" "$sha1" ||
-				fetch_in_submodule "$sm_path" $depth ||
-				say "$(eval_gettext "Unable to fetch in submodule path '\$displaypath'; trying to directly fetch \$sha1:")"
-
-				# Now we tried the usual fetch, but $sha1 may
-				# not be reachable from any of the refs
-				is_tip_reachable "$sm_path" "$sha1" ||
-				fetch_in_submodule "$sm_path" "$depth" "$sha1" ||
-				die "fatal: $(eval_gettext "Fetched in submodule path '\$displaypath', but it did not contain \$sha1. Direct fetching of that commit failed.")"
-			fi
-
-			must_die_on_failure=
-			case "$update_module" in
-			checkout)
-				command="git checkout $subforce -q"
-				die_msg="fatal: $(eval_gettext "Unable to checkout '\$sha1' in submodule path '\$displaypath'")"
-				say_msg="$(eval_gettext "Submodule path '\$displaypath': checked out '\$sha1'")"
-				;;
-			rebase)
-				command="git rebase ${GIT_QUIET:+--quiet}"
-				die_msg="fatal: $(eval_gettext "Unable to rebase '\$sha1' in submodule path '\$displaypath'")"
-				say_msg="$(eval_gettext "Submodule path '\$displaypath': rebased into '\$sha1'")"
-				must_die_on_failure=yes
-				;;
-			merge)
-				command="git merge ${GIT_QUIET:+--quiet}"
-				die_msg="fatal: $(eval_gettext "Unable to merge '\$sha1' in submodule path '\$displaypath'")"
-				say_msg="$(eval_gettext "Submodule path '\$displaypath': merged in '\$sha1'")"
-				must_die_on_failure=yes
-				;;
-			!*)
-				command="${update_module#!}"
-				die_msg="fatal: $(eval_gettext "Execution of '\$command \$sha1' failed in submodule path '\$displaypath'")"
-				say_msg="$(eval_gettext "Submodule path '\$displaypath': '\$command \$sha1'")"
-				must_die_on_failure=yes
-				;;
-			*)
-				die "fatal: $(eval_gettext "Invalid update mode '$update_module' for submodule path '$path'")"
-			esac
-
-			if (sanitize_submodule_env; cd "$sm_path" && $command "$sha1")
-			then
-				say "$say_msg"
-			elif test -n "$must_die_on_failure"
-			then
-				die_with_status 2 "$die_msg"
-			else
-				err="${err};$die_msg"
-				continue
-			fi
-		fi
+		# exit codes for run-update-procedure:
+		# 0: update was successful, say command output
+		# 128: subcommand died during execution
+		# 1: update procedure failed, but should not die
+		# 3: no update procedure was run
+		res="$?"
+		case $res in
+		0)
+			say "$out"
+			;;
+		128)
+			exit $res
+			;;
+		1)
+			err="${err};$out"
+			continue
+			;;
+		esac
 
 		if test -n "$recursive"
 		then
-- 
2.32.0


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [GSoC] [PATCH v3] submodule--helper: run update procedures from C
  2021-08-13  7:56   ` [GSoC] [PATCH v3] " Atharva Raykar
@ 2021-08-13 18:32     ` Junio C Hamano
  2021-08-24  8:58       ` Atharva Raykar
  2021-08-24 14:06     ` [PATCH v4] " Atharva Raykar
  1 sibling, 1 reply; 13+ messages in thread
From: Junio C Hamano @ 2021-08-13 18:32 UTC (permalink / raw)
  To: Atharva Raykar
  Cc: git, Emily Shaffer, Jonathan Nieder, Christian Couder,
	Shourya Shukla, Kaartic Sivaraam, Prathamesh Chavan

Atharva Raykar <raykar.ath@gmail.com> writes:

> Add a new submodule--helper subcommand `run-update-procedure` that runs
> the update procedure if the SHA1 of the submodule does not match what
> the superproject expects.
>
> This is an intermediate change that works towards total conversion of
> `submodule update` from shell to C.

OK.  Various things can happen depending on the setting during the
update procedure, and it is complex enough to split it out to a
single step, I guess.

> Specific error codes are returned so that the shell script calling the
> subcommand can take a decision on the control flow, and preserve the
> error messages across subsequent recursive calls of `cmd_update`.
>
> This change is more focused on doing a faithful conversion, so for now we
> are not too concerned with trying to reduce subprocess spawns.
>
> Signed-off-by: Atharva Raykar <raykar.ath@gmail.com>
> Mentored-by: Christian Couder <christian.couder@gmail.com>
> Mentored-by: Shourya Shukla <periperidip@gmail.com>

Keep trailer lines in chronological order.  The mentors mentored,
the patch was written and finally you signed it off.

> +static int run_update_command(struct update_data *ud, int subforce)
> +{
> +	struct child_process cp = CHILD_PROCESS_INIT;
> +	char *oid = oid_to_hex(&ud->oid);
> +	int must_die_on_failure = 0;
> +
> +	cp.dir = xstrdup(ud->sm_path);
> +	switch (ud->update_strategy.type) {
> +	case SM_UPDATE_CHECKOUT:
> +		cp.git_cmd = 1;
> +		strvec_pushl(&cp.args, "checkout", "-q", NULL);
> +		if (subforce)
> +			strvec_push(&cp.args, "-f");
> +		break;
> +	case SM_UPDATE_REBASE:
> +		cp.git_cmd = 1;
> +		strvec_push(&cp.args, "rebase");
> +		if (ud->quiet)
> +			strvec_push(&cp.args, "--quiet");
> +		must_die_on_failure = 1;
> +		break;
> +	case SM_UPDATE_MERGE:
> +		cp.git_cmd = 1;
> +		strvec_push(&cp.args, "merge");
> +		if (ud->quiet)
> +			strvec_push(&cp.args, "--quiet");
> +		must_die_on_failure = 1;
> +		break;
> +	case SM_UPDATE_COMMAND:
> +		/* NOTE: this does not handle quoted arguments */
> +		strvec_split(&cp.args, ud->update_strategy.command);

Indeed this doesn't.  I think cp.use_shell would be the right way to
run this.

Study what happens before run_command_v_opt_cd_env() with
RUN_USING_SHELL calls run_command() and make something similar
happen here, instead of doing a manual command line splitting.

	Side note: run_command_v_opt_cd_env() with RUN_USING_SHELL
	is used in places like diff.c::run_external_diff() to invoke
	an external diff command, ll-merge.c::ll_ext_merge() to
	invoke a user-defined low level merge driver,
	sequencer.c::do_exec() to invoke 'x cmd' you write in the
	todo list during an "rebase -i" session.

> +		must_die_on_failure = 1;
> +		break;
> +	case SM_UPDATE_NONE:
> +		BUG("this should have been handled before. How did we reach here?");
> +		break;
> +	case SM_UPDATE_UNSPECIFIED:
> +		BUG("update strategy should have been specified");

These two case arms are not a faithful conversion from the original,
but because you do not carry around a random string from the caller
and instead have parsed enums, it cannot be ;-)  But it makes me
wonder why we want these two cases separate.  Isn't it a BUG() if
anything other than what we handled (i.e. prepared cp.args for)
already is in ud->update_strategy.type?  IOW, wouldn't it be more
forward looking to do

	default:
		BUG("unexpected ud->update_strategy.type (%d)",
		    ud->update_strategy.type (%d)");

or something?  That way, if we ever come up with a new update
strategy and forget to update this part, we will catch such a bug
fairly quickly.

> +	}
> +
> +	strvec_push(&cp.args, oid);
> +
> +	prepare_submodule_repo_env(&cp.env_array);
> +
> +	if (run_command(&cp)) {
> +		if (must_die_on_failure) {
> +			switch (ud->update_strategy.type) {
> +			case SM_UPDATE_CHECKOUT:
> +				die(_("Unable to checkout '%s' in submodule path '%s'"),
> +				      oid, ud->displaypath);
> +				break;
> +			case SM_UPDATE_REBASE:
> +				die(_("Unable to rebase '%s' in submodule path '%s'"),
> +				      oid, ud->displaypath);
> +				break;
> +			case SM_UPDATE_MERGE:
> +				die(_("Unable to merge '%s' in submodule path '%s'"),
> +				      oid, ud->displaypath);
> +				break;
> +			case SM_UPDATE_COMMAND:
> +				die(_("Execution of '%s %s' failed in submodule path '%s'"),
> +				      ud->update_strategy.command, oid, ud->displaypath);
> +				break;

The messages here correspond to what is assigned to $die_message in
the original.  Note that they are emitted to the standard error
stream.

I suspect that these should be "printf()" followed by a call to
exit() with some non-zero value (see below).

> +			case SM_UPDATE_NONE:
> +				BUG("this should have been handled before. How did we reach here?");
> +				break;
> +			case SM_UPDATE_UNSPECIFIED:
> +				BUG("update strategy should have been specified");
> +			}

The same comment applies to the last two case arms of this switch
statement and the next one, too.  I think we just should catch
"everything else" with a simple "default:" label.

Also, don't omit the "break;" from the last case arm in a switch
statement.  It harms the long-term help of the code---the last case
arm may not forever stay to be the last one.

> +		}
> +		/*
> +		 * This signifies to the caller in shell that
> +		 * the command failed without dying
> +		 */
> +		return 1;
> +	}
> +
> +	switch (ud->update_strategy.type) {
> +	case SM_UPDATE_CHECKOUT:
> +		printf(_("Submodule path '%s': checked out '%s'\n"),
> +		       ud->displaypath, oid);
> +		break;
> +	case SM_UPDATE_REBASE:
> +		printf(_("Submodule path '%s': rebased into '%s'\n"),
> +		       ud->displaypath, oid);
> +		break;
> +	case SM_UPDATE_MERGE:
> +		printf(_("Submodule path '%s': merged in '%s'\n"),
> +		       ud->displaypath, oid);
> +		break;
> +	case SM_UPDATE_COMMAND:
> +		printf(_("Submodule path '%s': '%s %s'\n"),
> +		       ud->displaypath, ud->update_strategy.command, oid);
> +		break;
> +	case SM_UPDATE_NONE:
> +		BUG("this should have been handled before. How did we reach here?");
> +		break;
> +	case SM_UPDATE_UNSPECIFIED:
> +		BUG("update strategy should have been specified");

Likewise here.

> +	}
> +
> +	return 0;
> +}
> +
> +static int do_run_update_procedure(struct update_data *ud)
> +{
> +	int subforce = is_null_oid(&ud->suboid) || ud->force;
> +
> +	if (!ud->nofetch) {
> +		/*
> +		 * Run fetch only if `oid` isn't present or it
> +		 * is not reachable from a ref.
> +		 */
> +		if (!is_tip_reachable(ud->sm_path, &ud->oid))
> +			if (fetch_in_submodule(ud->sm_path, ud->depth, ud->quiet, NULL) &&
> +			    !ud->quiet)
> +				fprintf_ln(stderr,

OK.  Combining these into a single statement like

		if (!is_tip_reachable(...) &&
		    fetch_in_submodule(...) &&
		    !ud->quiet)
			fprintf_ln(...

would reduce the indentation level, but the way the conditional is
structured may convey the flow of the thought better, i.e.

	if we need to fetch, 
	    try to fetch and if that fails,
		report failure.

On the other hand, if we take that line of thought to the extreme,
the check for !ud->quiet should belong to another level of if
statement so perhaps the more concise version I showed above might
be an overall win.  I dunno.

> +					   _("Unable to fetch in submodule path '%s'; "
> +					     "trying to directly fetch %s:"),
> +					   ud->displaypath, oid_to_hex(&ud->oid));
> +		/*
> +		 * Now we tried the usual fetch, but `oid` may
> +		 * not be reachable from any of the refs.
> +		 */
> +		if (!is_tip_reachable(ud->sm_path, &ud->oid))
> +			if (fetch_in_submodule(ud->sm_path, ud->depth, ud->quiet, &ud->oid))
> +				die(_("Fetched in submodule path '%s', but it did not "
> +				      "contain %s. Direct fetching of that commit failed."),
> +				    ud->displaypath, oid_to_hex(&ud->oid));

Likewise.

> +	}
> +
> +	return run_update_command(ud, subforce);
> +}
> +
>  static void update_submodule(struct update_clone_data *ucd)
>  {
>  	fprintf(stdout, "dummy %s %d\t%s\n",
> @@ -2395,6 +2584,75 @@ static int update_clone(int argc, const char **argv, const char *prefix)
>  	return update_submodules(&suc);
>  }
>  
> +static int run_update_procedure(int argc, const char **argv, const char *prefix)
> +{
> +	int force = 0, quiet = 0, nofetch = 0, just_cloned = 0;
> +	char *prefixed_path, *update = NULL;
> +	struct update_data update_data = UPDATE_DATA_INIT;
> +
> +	struct option options[] = {
> ...
> +		OPT_END()
> +	};
> ...
> +	update_data.displaypath = get_submodule_displaypath(prefixed_path, prefix);
> +
> +	determine_submodule_update_strategy(the_repository, update_data.just_cloned,
> +					    update_data.sm_path, update,
> +					    &update_data.update_strategy);
> +
> +	free(prefixed_path);
> +
> +	if ((!is_null_oid(&update_data.oid) && !is_null_oid(&update_data.suboid) &&
> +	     !oideq(&update_data.oid, &update_data.suboid)) ||
> +	    is_null_oid(&update_data.suboid) || update_data.force)
> +		return do_run_update_procedure(&update_data);

The original does the update procedure if $sha1 and $subsha1 are
different or if $force option is given.  The rewritten seems to skip
the update when .oid is NULL and .suboid is not NULL; intended?

I understand that the division of labour between this function and
do_run_update_procedure() is for the former to only exist to
interface with the script side by populating the update_data
structure, and the latter implements the logic to run update
procedure.  I was a bit surprised that this conditional is
here, not at the very beginning of the callee.

> +	return 3;
> +}
> +
>  static int resolve_relative_path(int argc, const char **argv, const char *prefix)
>  {
>  	struct strbuf sb = STRBUF_INIT;
> @@ -2951,6 +3209,7 @@ static struct cmd_struct commands[] = {
>  	{"add-clone", add_clone, 0},
>  	{"update-module-mode", module_update_module_mode, 0},
>  	{"update-clone", update_clone, 0},
> +	{"run-update-procedure", run_update_procedure, 0},
>  	{"ensure-core-worktree", ensure_core_worktree, 0},
>  	{"relative-path", resolve_relative_path, 0},
>  	{"resolve-relative-url", resolve_relative_url, 0},
> diff --git a/git-submodule.sh b/git-submodule.sh
> index dbd2ec2050..d8e30d1afa 100755
> --- a/git-submodule.sh
> +++ b/git-submodule.sh
> @@ -519,14 +512,13 @@ cmd_update()
>  
>  		git submodule--helper ensure-core-worktree "$sm_path" || exit 1
>  
> -		update_module=$(git submodule--helper update-module-mode $just_cloned "$sm_path" $update)
> -
>  		displaypath=$(git submodule--helper relative-path "$prefix$sm_path" "$wt_prefix")

On the other side of the API boundary, update_data.displaypath is
populated by value computed in C.  It is a bit unfortunate that we
still need to compute it here and risk the two to drift apart.

>  		if test $just_cloned -eq 1
>  		then
>  			subsha1=
>  		else
> +			just_cloned=
>  			subsha1=$(sanitize_submodule_env; cd "$sm_path" &&
>  				git rev-parse --verify HEAD) ||
>  			die "fatal: $(eval_gettext "Unable to find current revision in submodule path '\$displaypath'")"
> @@ -547,70 +539,38 @@ cmd_update()
>  			die "fatal: $(eval_gettext "Unable to find current \${remote_name}/\${branch} revision in submodule path '\$sm_path'")"
>  		fi
>  
> -		if test "$subsha1" != "$sha1" || test -n "$force"
> -		then
> -			subforce=$force
> -			# If we don't already have a -f flag and the submodule has never been checked out
> -			if test -z "$subsha1" && test -z "$force"
> -			then
> -				subforce="-f"
> -			fi
> +		out=$(git submodule--helper run-update-procedure \
> +			  ${wt_prefix:+--prefix "$wt_prefix"} \
> +			  ${GIT_QUIET:+--quiet} \
> +			  ${force:+--force} \
> +			  ${just_cloned:+--just-cloned} \
> +			  ${nofetch:+--no-fetch} \
> +			  ${depth:+"$depth"} \
> +			  ${update:+--update "$update"} \
> +			  ${prefix:+--recursive-prefix "$prefix"} \
> +			  ${sha1:+--oid "$sha1"} \
> +			  ${subsha1:+--suboid "$subsha1"} \
> +			  "--" \
> +			  "$sm_path")

We'd just show errors directly to the standard error stream from
submodule--helper, but what comes from the printf in the switch
statement at the end of run_update_command() is captured in $out
variable.  Notably, the messages from die()s in the second switch
statement in run_update_command() are not captured in $out here.

> -			if test -z "$nofetch"
> -			then
> -				# Run fetch only if $sha1 isn't present or it
> -				# is not reachable from a ref.
> -				is_tip_reachable "$sm_path" "$sha1" ||
> -				fetch_in_submodule "$sm_path" $depth ||
> -				say "$(eval_gettext "Unable to fetch in submodule path '\$displaypath'; trying to directly fetch \$sha1:")"
> -
> -				# Now we tried the usual fetch, but $sha1 may
> -				# not be reachable from any of the refs
> -				is_tip_reachable "$sm_path" "$sha1" ||
> -				fetch_in_submodule "$sm_path" "$depth" "$sha1" ||
> -				die "fatal: $(eval_gettext "Fetched in submodule path '\$displaypath', but it did not contain \$sha1. Direct fetching of that commit failed.")"
> -			fi
> -
> -			must_die_on_failure=
> -			case "$update_module" in
> -			checkout)
> -				command="git checkout $subforce -q"
> -				die_msg="fatal: $(eval_gettext "Unable to checkout '\$sha1' in submodule path '\$displaypath'")"
> -				say_msg="$(eval_gettext "Submodule path '\$displaypath': checked out '\$sha1'")"
> -				;;
> -			rebase)
> -				command="git rebase ${GIT_QUIET:+--quiet}"
> -				die_msg="fatal: $(eval_gettext "Unable to rebase '\$sha1' in submodule path '\$displaypath'")"
> -				say_msg="$(eval_gettext "Submodule path '\$displaypath': rebased into '\$sha1'")"
> -				must_die_on_failure=yes
> -				;;
> -			merge)
> -				command="git merge ${GIT_QUIET:+--quiet}"
> -				die_msg="fatal: $(eval_gettext "Unable to merge '\$sha1' in submodule path '\$displaypath'")"
> -				say_msg="$(eval_gettext "Submodule path '\$displaypath': merged in '\$sha1'")"
> -				must_die_on_failure=yes
> -				;;
> -			!*)
> -				command="${update_module#!}"
> -				die_msg="fatal: $(eval_gettext "Execution of '\$command \$sha1' failed in submodule path '\$displaypath'")"
> -				say_msg="$(eval_gettext "Submodule path '\$displaypath': '\$command \$sha1'")"
> -				must_die_on_failure=yes
> -				;;
> -			*)
> -				die "fatal: $(eval_gettext "Invalid update mode '$update_module' for submodule path '$path'")"
> -			esac
> -
> -			if (sanitize_submodule_env; cd "$sm_path" && $command "$sha1")
> -			then
> -				say "$say_msg"
> -			elif test -n "$must_die_on_failure"
> -			then
> -				die_with_status 2 "$die_msg"
> -			else
> -				err="${err};$die_msg"
> -				continue
> -			fi
> -		fi
> +		# exit codes for run-update-procedure:
> +		# 0: update was successful, say command output
> +		# 128: subcommand died during execution
> +		# 1: update procedure failed, but should not die
> +		# 3: no update procedure was run
> +		res="$?"
> +		case $res in
> +		0)
> +			say "$out"
> +			;;

And the case where there is no error is quite straight-forward.  We
just emit what we saw in the standard output stream of the helper.

> +		128)
> +			exit $res
> +			;;
> +		1)
> +			err="${err};$out"

This part is dubious.  In the original, $err accumulates what is in
$die_msg, which are things like "fatal: Unable to rebase ...", but
with this patch, what used to be the contents of $die_msg are given
to die() after we see run_command() fail, and would have sent to the
standard error stream, not captured in $out here, no?

> +			continue
> +			;;
> +		esac
>  
>  		if test -n "$recursive"
>  		then


Thanks.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [GSoC] [PATCH v3] submodule--helper: run update procedures from C
  2021-08-13 18:32     ` Junio C Hamano
@ 2021-08-24  8:58       ` Atharva Raykar
  0 siblings, 0 replies; 13+ messages in thread
From: Atharva Raykar @ 2021-08-24  8:58 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: git, Emily Shaffer, Jonathan Nieder, Christian Couder,
	Shourya Shukla, Kaartic Sivaraam, Prathamesh Chavan

Pardon my late response, I had been occupied with other things.

Junio C Hamano <gitster@pobox.com> writes:

> Atharva Raykar <raykar.ath@gmail.com> writes:
> [...]
>> Signed-off-by: Atharva Raykar <raykar.ath@gmail.com>
>> Mentored-by: Christian Couder <christian.couder@gmail.com>
>> Mentored-by: Shourya Shukla <periperidip@gmail.com>
>
> Keep trailer lines in chronological order.  The mentors mentored,
> the patch was written and finally you signed it off.

Okay.

>> +static int run_update_command(struct update_data *ud, int subforce)
>> +{
>> +	struct child_process cp = CHILD_PROCESS_INIT;
>> +	char *oid = oid_to_hex(&ud->oid);
>> +	int must_die_on_failure = 0;
>> +
>> +	cp.dir = xstrdup(ud->sm_path);
>> +	switch (ud->update_strategy.type) {
>> +	case SM_UPDATE_CHECKOUT:
>> +		cp.git_cmd = 1;
>> +		strvec_pushl(&cp.args, "checkout", "-q", NULL);
>> +		if (subforce)
>> +			strvec_push(&cp.args, "-f");
>> +		break;
>> +	case SM_UPDATE_REBASE:
>> +		cp.git_cmd = 1;
>> +		strvec_push(&cp.args, "rebase");
>> +		if (ud->quiet)
>> +			strvec_push(&cp.args, "--quiet");
>> +		must_die_on_failure = 1;
>> +		break;
>> +	case SM_UPDATE_MERGE:
>> +		cp.git_cmd = 1;
>> +		strvec_push(&cp.args, "merge");
>> +		if (ud->quiet)
>> +			strvec_push(&cp.args, "--quiet");
>> +		must_die_on_failure = 1;
>> +		break;
>> +	case SM_UPDATE_COMMAND:
>> +		/* NOTE: this does not handle quoted arguments */
>> +		strvec_split(&cp.args, ud->update_strategy.command);
>
> Indeed this doesn't.  I think cp.use_shell would be the right way to
> run this.
>
> Study what happens before run_command_v_opt_cd_env() with
> RUN_USING_SHELL calls run_command() and make something similar
> happen here, instead of doing a manual command line splitting.
>
> 	Side note: run_command_v_opt_cd_env() with RUN_USING_SHELL
> 	is used in places like diff.c::run_external_diff() to invoke
> 	an external diff command, ll-merge.c::ll_ext_merge() to
> 	invoke a user-defined low level merge driver,
> 	sequencer.c::do_exec() to invoke 'x cmd' you write in the
> 	todo list during an "rebase -i" session.

Thanks for the pointers, the details helped. I'll handle this more
correctly in the next version.

>> +		must_die_on_failure = 1;
>> +		break;
>> +	case SM_UPDATE_NONE:
>> +		BUG("this should have been handled before. How did we reach here?");
>> +		break;
>> +	case SM_UPDATE_UNSPECIFIED:
>> +		BUG("update strategy should have been specified");
>
> These two case arms are not a faithful conversion from the original,
> but because you do not carry around a random string from the caller
> and instead have parsed enums, it cannot be ;-)  But it makes me
> wonder why we want these two cases separate.  Isn't it a BUG() if
> anything other than what we handled (i.e. prepared cp.args for)
> already is in ud->update_strategy.type?  IOW, wouldn't it be more
> forward looking to do
>
> 	default:
> 		BUG("unexpected ud->update_strategy.type (%d)",
> 		    ud->update_strategy.type (%d)");
>
> or something?  That way, if we ever come up with a new update
> strategy and forget to update this part, we will catch such a bug
> fairly quickly.

The original intention for separating the cases was to differentiate the
cause for the invalid state, but your proposed suggestion is a lot
better. I'll address this.

>> +	}
>> +
>> +	strvec_push(&cp.args, oid);
>> +
>> +	prepare_submodule_repo_env(&cp.env_array);
>> +
>> +	if (run_command(&cp)) {
>> +		if (must_die_on_failure) {
>> +			switch (ud->update_strategy.type) {
>> +			case SM_UPDATE_CHECKOUT:
>> +				die(_("Unable to checkout '%s' in submodule path '%s'"),
>> +				      oid, ud->displaypath);
>> +				break;
>> +			case SM_UPDATE_REBASE:
>> +				die(_("Unable to rebase '%s' in submodule path '%s'"),
>> +				      oid, ud->displaypath);
>> +				break;
>> +			case SM_UPDATE_MERGE:
>> +				die(_("Unable to merge '%s' in submodule path '%s'"),
>> +				      oid, ud->displaypath);
>> +				break;
>> +			case SM_UPDATE_COMMAND:
>> +				die(_("Execution of '%s %s' failed in submodule path '%s'"),
>> +				      ud->update_strategy.command, oid, ud->displaypath);
>> +				break;
>
> The messages here correspond to what is assigned to $die_message in
> the original.  Note that they are emitted to the standard error
> stream.
>
> I suspect that these should be "printf()" followed by a call to
> exit() with some non-zero value (see below).

I also notice another major lapse in conversion. In the shell porcelain,
the "checkout" mode should not die out at all, instead it should print
out the error message.

My code tries to die() on the checkout mode (in a case arm that will
never be activated), and does not ever print the checkout failure
message at all. I will fix this in the re-roll.

(Will address more of this below...)

>> +			case SM_UPDATE_NONE:
>> +				BUG("this should have been handled before. How did we reach here?");
>> +				break;
>> +			case SM_UPDATE_UNSPECIFIED:
>> +				BUG("update strategy should have been specified");
>> +			}
>
> The same comment applies to the last two case arms of this switch
> statement and the next one, too.  I think we just should catch
> "everything else" with a simple "default:" label.
>
> Also, don't omit the "break;" from the last case arm in a switch
> statement.  It harms the long-term help of the code---the last case
> arm may not forever stay to be the last one.
>
>> +		}
>> +		/*
>> +		 * This signifies to the caller in shell that
>> +		 * the command failed without dying
>> +		 */
>> +		return 1;
>> +	}
>> +
>> +	switch (ud->update_strategy.type) {
>> +	case SM_UPDATE_CHECKOUT:
>> +		printf(_("Submodule path '%s': checked out '%s'\n"),
>> +		       ud->displaypath, oid);
>> +		break;
>> +	case SM_UPDATE_REBASE:
>> +		printf(_("Submodule path '%s': rebased into '%s'\n"),
>> +		       ud->displaypath, oid);
>> +		break;
>> +	case SM_UPDATE_MERGE:
>> +		printf(_("Submodule path '%s': merged in '%s'\n"),
>> +		       ud->displaypath, oid);
>> +		break;
>> +	case SM_UPDATE_COMMAND:
>> +		printf(_("Submodule path '%s': '%s %s'\n"),
>> +		       ud->displaypath, ud->update_strategy.command, oid);
>> +		break;
>> +	case SM_UPDATE_NONE:
>> +		BUG("this should have been handled before. How did we reach here?");
>> +		break;
>> +	case SM_UPDATE_UNSPECIFIED:
>> +		BUG("update strategy should have been specified");
>
> Likewise here.

Okay.

>> +	}
>> +
>> +	return 0;
>> +}
>> +
>> +static int do_run_update_procedure(struct update_data *ud)
>> +{
>> +	int subforce = is_null_oid(&ud->suboid) || ud->force;
>> +
>> +	if (!ud->nofetch) {
>> +		/*
>> +		 * Run fetch only if `oid` isn't present or it
>> +		 * is not reachable from a ref.
>> +		 */
>> +		if (!is_tip_reachable(ud->sm_path, &ud->oid))
>> +			if (fetch_in_submodule(ud->sm_path, ud->depth, ud->quiet, NULL) &&
>> +			    !ud->quiet)
>> +				fprintf_ln(stderr,
>
> OK.  Combining these into a single statement like
>
> 		if (!is_tip_reachable(...) &&
> 		    fetch_in_submodule(...) &&
> 		    !ud->quiet)
> 			fprintf_ln(...
>
> would reduce the indentation level, but the way the conditional is
> structured may convey the flow of the thought better, i.e.
>
> 	if we need to fetch,
> 	    try to fetch and if that fails,
> 		report failure.
>
> On the other hand, if we take that line of thought to the extreme,
> the check for !ud->quiet should belong to another level of if
> statement so perhaps the more concise version I showed above might
> be an overall win.  I dunno.

Right. I agree with you, mainly because there are many other predicates
in my previous conversions that were done with short-circuited &&'s, so
might as well stick to what has been my convention.

>> +					   _("Unable to fetch in submodule path '%s'; "
>> +					     "trying to directly fetch %s:"),
>> +					   ud->displaypath, oid_to_hex(&ud->oid));
>> +		/*
>> +		 * Now we tried the usual fetch, but `oid` may
>> +		 * not be reachable from any of the refs.
>> +		 */
>> +		if (!is_tip_reachable(ud->sm_path, &ud->oid))
>> +			if (fetch_in_submodule(ud->sm_path, ud->depth, ud->quiet, &ud->oid))
>> +				die(_("Fetched in submodule path '%s', but it did not "
>> +				      "contain %s. Direct fetching of that commit failed."),
>> +				    ud->displaypath, oid_to_hex(&ud->oid));
>
> Likewise.
>
>> +	}
>> [...]
>> +
>> +	if ((!is_null_oid(&update_data.oid) && !is_null_oid(&update_data.suboid) &&
>> +	     !oideq(&update_data.oid, &update_data.suboid)) ||
>> +	    is_null_oid(&update_data.suboid) || update_data.force)
>> +		return do_run_update_procedure(&update_data);
>
> The original does the update procedure if $sha1 and $subsha1 are
> different or if $force option is given.  The rewritten seems to skip
> the update when .oid is NULL and .suboid is not NULL; intended?

Unintended. I initially implemented this with raw chars until I
discovered the object_id API. So this was the result of me
indiscriminately substituting NULL checks with the OID equivalents. I
realise this is not needed anymore, and we can simplify that to:

    if (!oideq(&update_data.oid, &update_data.suboid) || update_data.force)

> I understand that the division of labour between this function and
> do_run_update_procedure() is for the former to only exist to
> interface with the script side by populating the update_data
> structure, and the latter implements the logic to run update
> procedure.  I was a bit surprised that this conditional is
> here, not at the very beginning of the callee.

Ævar pointed out that since this function just had one caller, I could
move the whole 'if' outside it for now, which would save me one level of
indentation within that function, and make it easier to parse. With the
conditional being simplified to what I showed above, I think it can
still be justified?

Even in the series that will follow this, we would still have only one
caller for this function.

>> +	return 3;
>> +}
>> +
>>  static int resolve_relative_path(int argc, const char **argv, const char *prefix)
>>  {
>>  	struct strbuf sb = STRBUF_INIT;
>> @@ -2951,6 +3209,7 @@ static struct cmd_struct commands[] = {
>>  	{"add-clone", add_clone, 0},
>>  	{"update-module-mode", module_update_module_mode, 0},
>>  	{"update-clone", update_clone, 0},
>> +	{"run-update-procedure", run_update_procedure, 0},
>>  	{"ensure-core-worktree", ensure_core_worktree, 0},
>>  	{"relative-path", resolve_relative_path, 0},
>>  	{"resolve-relative-url", resolve_relative_url, 0},
>> diff --git a/git-submodule.sh b/git-submodule.sh
>> index dbd2ec2050..d8e30d1afa 100755
>> --- a/git-submodule.sh
>> +++ b/git-submodule.sh
>> @@ -519,14 +512,13 @@ cmd_update()
>>
>>  		git submodule--helper ensure-core-worktree "$sm_path" || exit 1
>>
>> -		update_module=$(git submodule--helper update-module-mode $just_cloned "$sm_path" $update)
>> -
>>  		displaypath=$(git submodule--helper relative-path "$prefix$sm_path" "$wt_prefix")
>
> On the other side of the API boundary, update_data.displaypath is
> populated by value computed in C.  It is a bit unfortunate that we
> still need to compute it here and risk the two to drift apart.

Yes, and I had some trouble figuring out a clean separation boundary,
and this compromise felt like the best one. The best I can do is assure
you that the patch series following this change solves this issue
entirely, as it moves all of the shell code you see here into the C
helper, and thus we only compute this value once.

So there should be no worry about drift, as I will not give any chance
to introduce it at all :)

>>  		if test $just_cloned -eq 1
>>  		then
>>  			subsha1=
>>  		else
>> +			just_cloned=
>>  			subsha1=$(sanitize_submodule_env; cd "$sm_path" &&
>>  				git rev-parse --verify HEAD) ||
>>  			die "fatal: $(eval_gettext "Unable to find current revision in submodule path '\$displaypath'")"
>> @@ -547,70 +539,38 @@ cmd_update()
>>  			die "fatal: $(eval_gettext "Unable to find current \${remote_name}/\${branch} revision in submodule path '\$sm_path'")"
>>  		fi
>>
>> -		if test "$subsha1" != "$sha1" || test -n "$force"
>> -		then
>> -			subforce=$force
>> -			# If we don't already have a -f flag and the submodule has never been checked out
>> -			if test -z "$subsha1" && test -z "$force"
>> -			then
>> -				subforce="-f"
>> -			fi
>> +		out=$(git submodule--helper run-update-procedure \
>> +			  ${wt_prefix:+--prefix "$wt_prefix"} \
>> +			  ${GIT_QUIET:+--quiet} \
>> +			  ${force:+--force} \
>> +			  ${just_cloned:+--just-cloned} \
>> +			  ${nofetch:+--no-fetch} \
>> +			  ${depth:+"$depth"} \
>> +			  ${update:+--update "$update"} \
>> +			  ${prefix:+--recursive-prefix "$prefix"} \
>> +			  ${sha1:+--oid "$sha1"} \
>> +			  ${subsha1:+--suboid "$subsha1"} \
>> +			  "--" \
>> +			  "$sm_path")
>
> We'd just show errors directly to the standard error stream from
> submodule--helper, but what comes from the printf in the switch
> statement at the end of run_update_command() is captured in $out
> variable.  Notably, the messages from die()s in the second switch
> statement in run_update_command() are not captured in $out here.
>
>> -			if test -z "$nofetch"
>> -			then
>> -				# Run fetch only if $sha1 isn't present or it
>> -				# is not reachable from a ref.
>> -				is_tip_reachable "$sm_path" "$sha1" ||
>> -				fetch_in_submodule "$sm_path" $depth ||
>> -				say "$(eval_gettext "Unable to fetch in submodule path '\$displaypath'; trying to directly fetch \$sha1:")"
>> -
>> -				# Now we tried the usual fetch, but $sha1 may
>> -				# not be reachable from any of the refs
>> -				is_tip_reachable "$sm_path" "$sha1" ||
>> -				fetch_in_submodule "$sm_path" "$depth" "$sha1" ||
>> -				die "fatal: $(eval_gettext "Fetched in submodule path '\$displaypath', but it did not contain \$sha1. Direct fetching of that commit failed.")"
>> -			fi
>> -
>> -			must_die_on_failure=
>> -			case "$update_module" in
>> -			checkout)
>> -				command="git checkout $subforce -q"
>> -				die_msg="fatal: $(eval_gettext "Unable to checkout '\$sha1' in submodule path '\$displaypath'")"
>> -				say_msg="$(eval_gettext "Submodule path '\$displaypath': checked out '\$sha1'")"
>> -				;;
>> -			rebase)
>> -				command="git rebase ${GIT_QUIET:+--quiet}"
>> -				die_msg="fatal: $(eval_gettext "Unable to rebase '\$sha1' in submodule path '\$displaypath'")"
>> -				say_msg="$(eval_gettext "Submodule path '\$displaypath': rebased into '\$sha1'")"
>> -				must_die_on_failure=yes
>> -				;;
>> -			merge)
>> -				command="git merge ${GIT_QUIET:+--quiet}"
>> -				die_msg="fatal: $(eval_gettext "Unable to merge '\$sha1' in submodule path '\$displaypath'")"
>> -				say_msg="$(eval_gettext "Submodule path '\$displaypath': merged in '\$sha1'")"
>> -				must_die_on_failure=yes
>> -				;;
>> -			!*)
>> -				command="${update_module#!}"
>> -				die_msg="fatal: $(eval_gettext "Execution of '\$command \$sha1' failed in submodule path '\$displaypath'")"
>> -				say_msg="$(eval_gettext "Submodule path '\$displaypath': '\$command \$sha1'")"
>> -				must_die_on_failure=yes
>> -				;;
>> -			*)
>> -				die "fatal: $(eval_gettext "Invalid update mode '$update_module' for submodule path '$path'")"
>> -			esac
>> -
>> -			if (sanitize_submodule_env; cd "$sm_path" && $command "$sha1")
>> -			then
>> -				say "$say_msg"
>> -			elif test -n "$must_die_on_failure"
>> -			then
>> -				die_with_status 2 "$die_msg"
>> -			else
>> -				err="${err};$die_msg"
>> -				continue
>> -			fi
>> -		fi
>> +		# exit codes for run-update-procedure:
>> +		# 0: update was successful, say command output
>> +		# 128: subcommand died during execution
>> +		# 1: update procedure failed, but should not die
>> +		# 3: no update procedure was run
>> +		res="$?"
>> +		case $res in
>> +		0)
>> +			say "$out"
>> +			;;
>
> And the case where there is no error is quite straight-forward.  We
> just emit what we saw in the standard output stream of the helper.
>
>> +		128)
>> +			exit $res
>> +			;;
>> +		1)
>> +			err="${err};$out"
>
> This part is dubious.  In the original, $err accumulates what is in
> $die_msg, which are things like "fatal: Unable to rebase ...", but
> with this patch, what used to be the contents of $die_msg are given
> to die() after we see run_command() fail, and would have sent to the
> standard error stream, not captured in $out here, no?

Yes, this is bad. This error slipped past the test suite that I was too
reliant on. Even if it did work, it would not have printed the error
messages for the "checkout" mode. I will fix all of these issues.
Printing to stdout with error return ought to fix it, as you suggested.

>> +			continue
>> +			;;
>> +		esac
>>
>>  		if test -n "$recursive"
>>  		then
>
>
> Thanks.

Thanks for the thorough review.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH v4] submodule--helper: run update procedures from C
  2021-08-13  7:56   ` [GSoC] [PATCH v3] " Atharva Raykar
  2021-08-13 18:32     ` Junio C Hamano
@ 2021-08-24 14:06     ` Atharva Raykar
  2021-09-08  0:14       ` Junio C Hamano
  1 sibling, 1 reply; 13+ messages in thread
From: Atharva Raykar @ 2021-08-24 14:06 UTC (permalink / raw)
  To: git
  Cc: Atharva Raykar, Emily Shaffer, Jonathan Nieder, Junio C Hamano,
	Christian Couder, Shourya Shukla, Kaartic Sivaraam,
	Prathamesh Chavan

Add a new submodule--helper subcommand `run-update-procedure` that runs
the update procedure if the SHA1 of the submodule does not match what
the superproject expects.

This is an intermediate change that works towards total conversion of
`submodule update` from shell to C.

Specific error codes are returned so that the shell script calling the
subcommand can take a decision on the control flow, and preserve the
error messages across subsequent recursive calls of `cmd_update`.

This change is more focused on doing a faithful conversion, so for now we
are not too concerned with trying to reduce subprocess spawns.

Mentored-by: Christian Couder <christian.couder@gmail.com>
Mentored-by: Shourya Shukla <periperidip@gmail.com>
Signed-off-by: Atharva Raykar <raykar.ath@gmail.com>
---

This patch addresses the concerns raised by Junio. The important changes are:

* Fix error message handling of the output of 'run-update-procedure'. While at
  it, ensure the "checkout" mode error message is stored and printed
  appropriately.

* In 'run_update_command()' switch from 'run_command()' to
  'run_command_v_opt_cd_env()' to ensure quoted command update modes are handled
  correctly.

* Code style and hygiene changes.

* Introduce a NEEDSWORK comment, because the printf() and error return is
  correct only because the shell caller in the other end redirects it to the
  correct output stream. Once we switch this completely to C (ie, in the
  follow-up series), I need to remember to die() instead (or print to stderr) to
  reproduce the original behaviour.

Range-diff against v3:
1:  2ff48f8790 ! 1:  2729834e43 submodule--helper: run update procedures from C
    @@ Commit message
         This change is more focused on doing a faithful conversion, so for now we
         are not too concerned with trying to reduce subprocess spawns.
     
    -    Signed-off-by: Atharva Raykar <raykar.ath@gmail.com>
         Mentored-by: Christian Couder <christian.couder@gmail.com>
         Mentored-by: Shourya Shukla <periperidip@gmail.com>
    +    Signed-off-by: Atharva Raykar <raykar.ath@gmail.com>
     
      ## builtin/submodule--helper.c ##
     @@ builtin/submodule--helper.c: struct submodule_update_clone {
    @@ builtin/submodule--helper.c: static int git_update_clone_config(const char *var,
     +
     +static int run_update_command(struct update_data *ud, int subforce)
     +{
    -+	struct child_process cp = CHILD_PROCESS_INIT;
    ++	struct strvec args = STRVEC_INIT;
    ++	struct strvec child_env = STRVEC_INIT;
     +	char *oid = oid_to_hex(&ud->oid);
     +	int must_die_on_failure = 0;
    ++	int git_cmd;
     +
    -+	cp.dir = xstrdup(ud->sm_path);
     +	switch (ud->update_strategy.type) {
     +	case SM_UPDATE_CHECKOUT:
    -+		cp.git_cmd = 1;
    -+		strvec_pushl(&cp.args, "checkout", "-q", NULL);
    ++		git_cmd = 1;
    ++		strvec_pushl(&args, "checkout", "-q", NULL);
     +		if (subforce)
    -+			strvec_push(&cp.args, "-f");
    ++			strvec_push(&args, "-f");
     +		break;
     +	case SM_UPDATE_REBASE:
    -+		cp.git_cmd = 1;
    -+		strvec_push(&cp.args, "rebase");
    ++		git_cmd = 1;
    ++		strvec_push(&args, "rebase");
     +		if (ud->quiet)
    -+			strvec_push(&cp.args, "--quiet");
    ++			strvec_push(&args, "--quiet");
     +		must_die_on_failure = 1;
     +		break;
     +	case SM_UPDATE_MERGE:
    -+		cp.git_cmd = 1;
    -+		strvec_push(&cp.args, "merge");
    ++		git_cmd = 1;
    ++		strvec_push(&args, "merge");
     +		if (ud->quiet)
    -+			strvec_push(&cp.args, "--quiet");
    ++			strvec_push(&args, "--quiet");
     +		must_die_on_failure = 1;
     +		break;
     +	case SM_UPDATE_COMMAND:
    -+		/* NOTE: this does not handle quoted arguments */
    -+		strvec_split(&cp.args, ud->update_strategy.command);
    ++		git_cmd = 0;
    ++		strvec_push(&args, ud->update_strategy.command);
     +		must_die_on_failure = 1;
     +		break;
    -+	case SM_UPDATE_NONE:
    -+		BUG("this should have been handled before. How did we reach here?");
    -+		break;
    -+	case SM_UPDATE_UNSPECIFIED:
    -+		BUG("update strategy should have been specified");
    ++	default:
    ++		BUG("unexpected update strategy type: %s",
    ++		    submodule_strategy_to_string(&ud->update_strategy));
     +	}
    -+
    -+	strvec_push(&cp.args, oid);
    -+
    -+	prepare_submodule_repo_env(&cp.env_array);
    -+
    -+	if (run_command(&cp)) {
    -+		if (must_die_on_failure) {
    -+			switch (ud->update_strategy.type) {
    -+			case SM_UPDATE_CHECKOUT:
    -+				die(_("Unable to checkout '%s' in submodule path '%s'"),
    -+				      oid, ud->displaypath);
    -+				break;
    -+			case SM_UPDATE_REBASE:
    -+				die(_("Unable to rebase '%s' in submodule path '%s'"),
    -+				      oid, ud->displaypath);
    -+				break;
    -+			case SM_UPDATE_MERGE:
    -+				die(_("Unable to merge '%s' in submodule path '%s'"),
    -+				      oid, ud->displaypath);
    -+				break;
    -+			case SM_UPDATE_COMMAND:
    -+				die(_("Execution of '%s %s' failed in submodule path '%s'"),
    -+				      ud->update_strategy.command, oid, ud->displaypath);
    -+				break;
    -+			case SM_UPDATE_NONE:
    -+				BUG("this should have been handled before. How did we reach here?");
    -+				break;
    -+			case SM_UPDATE_UNSPECIFIED:
    -+				BUG("update strategy should have been specified");
    -+			}
    ++	strvec_push(&args, oid);
    ++
    ++	prepare_submodule_repo_env(&child_env);
    ++	if (run_command_v_opt_cd_env(args.v, git_cmd ? RUN_GIT_CMD : RUN_USING_SHELL,
    ++				     ud->sm_path, child_env.v)) {
    ++		switch (ud->update_strategy.type) {
    ++		case SM_UPDATE_CHECKOUT:
    ++			printf(_("Unable to checkout '%s' in submodule path '%s'"),
    ++			       oid, ud->displaypath);
    ++			break;
    ++		case SM_UPDATE_REBASE:
    ++			printf(_("Unable to rebase '%s' in submodule path '%s'"),
    ++			       oid, ud->displaypath);
    ++			break;
    ++		case SM_UPDATE_MERGE:
    ++			printf(_("Unable to merge '%s' in submodule path '%s'"),
    ++			       oid, ud->displaypath);
    ++			break;
    ++		case SM_UPDATE_COMMAND:
    ++			printf(_("Execution of '%s %s' failed in submodule path '%s'"),
    ++			       ud->update_strategy.command, oid, ud->displaypath);
    ++			break;
    ++		default:
    ++			BUG("unexpected update strategy type: %s",
    ++			    submodule_strategy_to_string(&ud->update_strategy));
     +		}
     +		/*
    -+		 * This signifies to the caller in shell that
    -+		 * the command failed without dying
    ++		 * NEEDSWORK: We are currently printing to stdout with error
    ++		 * return so that the shell caller handles the error output
    ++		 * properly. Once we start handling the error messages within
    ++		 * C, we should use die() instead.
    ++		 */
    ++		if (must_die_on_failure)
    ++			return 2;
    ++		/*
    ++		 * This signifies to the caller in shell that the command
    ++		 * failed without dying
     +		 */
     +		return 1;
     +	}
    @@ builtin/submodule--helper.c: static int git_update_clone_config(const char *var,
     +		printf(_("Submodule path '%s': '%s %s'\n"),
     +		       ud->displaypath, ud->update_strategy.command, oid);
     +		break;
    -+	case SM_UPDATE_NONE:
    -+		BUG("this should have been handled before. How did we reach here?");
    -+		break;
    -+	case SM_UPDATE_UNSPECIFIED:
    -+		BUG("update strategy should have been specified");
    ++	default:
    ++		BUG("unexpected update strategy type: %s",
    ++		    submodule_strategy_to_string(&ud->update_strategy));
     +	}
     +
     +	return 0;
    @@ builtin/submodule--helper.c: static int git_update_clone_config(const char *var,
     +		 * Run fetch only if `oid` isn't present or it
     +		 * is not reachable from a ref.
     +		 */
    -+		if (!is_tip_reachable(ud->sm_path, &ud->oid))
    -+			if (fetch_in_submodule(ud->sm_path, ud->depth, ud->quiet, NULL) &&
    -+			    !ud->quiet)
    -+				fprintf_ln(stderr,
    -+					   _("Unable to fetch in submodule path '%s'; "
    -+					     "trying to directly fetch %s:"),
    -+					   ud->displaypath, oid_to_hex(&ud->oid));
    ++		if (!is_tip_reachable(ud->sm_path, &ud->oid) &&
    ++		    fetch_in_submodule(ud->sm_path, ud->depth, ud->quiet, NULL) &&
    ++		    !ud->quiet)
    ++			fprintf_ln(stderr,
    ++				   _("Unable to fetch in submodule path '%s'; "
    ++				     "trying to directly fetch %s:"),
    ++				   ud->displaypath, oid_to_hex(&ud->oid));
     +		/*
     +		 * Now we tried the usual fetch, but `oid` may
     +		 * not be reachable from any of the refs.
     +		 */
    -+		if (!is_tip_reachable(ud->sm_path, &ud->oid))
    -+			if (fetch_in_submodule(ud->sm_path, ud->depth, ud->quiet, &ud->oid))
    -+				die(_("Fetched in submodule path '%s', but it did not "
    -+				      "contain %s. Direct fetching of that commit failed."),
    -+				    ud->displaypath, oid_to_hex(&ud->oid));
    ++		if (!is_tip_reachable(ud->sm_path, &ud->oid) &&
    ++		    fetch_in_submodule(ud->sm_path, ud->depth, ud->quiet, &ud->oid))
    ++			die(_("Fetched in submodule path '%s', but it did not "
    ++			      "contain %s. Direct fetching of that commit failed."),
    ++			    ud->displaypath, oid_to_hex(&ud->oid));
     +	}
     +
     +	return run_update_command(ud, subforce);
    @@ builtin/submodule--helper.c: static int update_clone(int argc, const char **argv
     +
     +	free(prefixed_path);
     +
    -+	if ((!is_null_oid(&update_data.oid) && !is_null_oid(&update_data.suboid) &&
    -+	     !oideq(&update_data.oid, &update_data.suboid)) ||
    -+	    is_null_oid(&update_data.suboid) || update_data.force)
    ++	if (!oideq(&update_data.oid, &update_data.suboid) || update_data.force)
     +		return do_run_update_procedure(&update_data);
     +
     +	return 3;
    @@ git-submodule.sh: cmd_update()
     +
     +		# exit codes for run-update-procedure:
     +		# 0: update was successful, say command output
    -+		# 128: subcommand died during execution
     +		# 1: update procedure failed, but should not die
    ++		# 2 or 128: subcommand died during execution
     +		# 3: no update procedure was run
     +		res="$?"
     +		case $res in
     +		0)
     +			say "$out"
     +			;;
    -+		128)
    -+			exit $res
    -+			;;
     +		1)
    -+			err="${err};$out"
    ++			err="${err};fatal: $out"
     +			continue
     +			;;
    ++		2|128)
    ++			die_with_status $res "fatal: $out"
    ++			;;
     +		esac
      
      		if test -n "$recursive"

 builtin/submodule--helper.c | 257 ++++++++++++++++++++++++++++++++++++
 git-submodule.sh            | 104 +++++----------
 2 files changed, 289 insertions(+), 72 deletions(-)

diff --git a/builtin/submodule--helper.c b/builtin/submodule--helper.c
index ef2776a9e4..80619361fc 100644
--- a/builtin/submodule--helper.c
+++ b/builtin/submodule--helper.c
@@ -2045,6 +2045,20 @@ struct submodule_update_clone {
 	.max_jobs = 1, \
 }
 
+struct update_data {
+	const char *recursive_prefix;
+	const char *sm_path;
+	const char *displaypath;
+	struct object_id oid;
+	struct object_id suboid;
+	struct submodule_update_strategy update_strategy;
+	int depth;
+	unsigned int force: 1;
+	unsigned int quiet: 1;
+	unsigned int nofetch: 1;
+	unsigned int just_cloned: 1;
+};
+#define UPDATE_DATA_INIT { .update_strategy = SUBMODULE_UPDATE_STRATEGY_INIT }
 
 static void next_submodule_warn_missing(struct submodule_update_clone *suc,
 		struct strbuf *out, const char *displaypath)
@@ -2298,6 +2312,181 @@ static int git_update_clone_config(const char *var, const char *value,
 	return 0;
 }
 
+static int is_tip_reachable(const char *path, struct object_id *oid)
+{
+	struct child_process cp = CHILD_PROCESS_INIT;
+	struct strbuf rev = STRBUF_INIT;
+	char *hex = oid_to_hex(oid);
+
+	cp.git_cmd = 1;
+	cp.dir = xstrdup(path);
+	cp.no_stderr = 1;
+	strvec_pushl(&cp.args, "rev-list", "-n", "1", hex, "--not", "--all", NULL);
+
+	prepare_submodule_repo_env(&cp.env_array);
+
+	if (capture_command(&cp, &rev, GIT_MAX_HEXSZ + 1) || rev.len)
+		return 0;
+
+	return 1;
+}
+
+static int fetch_in_submodule(const char *module_path, int depth, int quiet, struct object_id *oid)
+{
+	struct child_process cp = CHILD_PROCESS_INIT;
+
+	prepare_submodule_repo_env(&cp.env_array);
+	cp.git_cmd = 1;
+	cp.dir = xstrdup(module_path);
+
+	strvec_push(&cp.args, "fetch");
+	if (quiet)
+		strvec_push(&cp.args, "--quiet");
+	if (depth)
+		strvec_pushf(&cp.args, "--depth=%d", depth);
+	if (oid) {
+		char *hex = oid_to_hex(oid);
+		char *remote = get_default_remote();
+		strvec_pushl(&cp.args, remote, hex, NULL);
+	}
+
+	return run_command(&cp);
+}
+
+static int run_update_command(struct update_data *ud, int subforce)
+{
+	struct strvec args = STRVEC_INIT;
+	struct strvec child_env = STRVEC_INIT;
+	char *oid = oid_to_hex(&ud->oid);
+	int must_die_on_failure = 0;
+	int git_cmd;
+
+	switch (ud->update_strategy.type) {
+	case SM_UPDATE_CHECKOUT:
+		git_cmd = 1;
+		strvec_pushl(&args, "checkout", "-q", NULL);
+		if (subforce)
+			strvec_push(&args, "-f");
+		break;
+	case SM_UPDATE_REBASE:
+		git_cmd = 1;
+		strvec_push(&args, "rebase");
+		if (ud->quiet)
+			strvec_push(&args, "--quiet");
+		must_die_on_failure = 1;
+		break;
+	case SM_UPDATE_MERGE:
+		git_cmd = 1;
+		strvec_push(&args, "merge");
+		if (ud->quiet)
+			strvec_push(&args, "--quiet");
+		must_die_on_failure = 1;
+		break;
+	case SM_UPDATE_COMMAND:
+		git_cmd = 0;
+		strvec_push(&args, ud->update_strategy.command);
+		must_die_on_failure = 1;
+		break;
+	default:
+		BUG("unexpected update strategy type: %s",
+		    submodule_strategy_to_string(&ud->update_strategy));
+	}
+	strvec_push(&args, oid);
+
+	prepare_submodule_repo_env(&child_env);
+	if (run_command_v_opt_cd_env(args.v, git_cmd ? RUN_GIT_CMD : RUN_USING_SHELL,
+				     ud->sm_path, child_env.v)) {
+		switch (ud->update_strategy.type) {
+		case SM_UPDATE_CHECKOUT:
+			printf(_("Unable to checkout '%s' in submodule path '%s'"),
+			       oid, ud->displaypath);
+			break;
+		case SM_UPDATE_REBASE:
+			printf(_("Unable to rebase '%s' in submodule path '%s'"),
+			       oid, ud->displaypath);
+			break;
+		case SM_UPDATE_MERGE:
+			printf(_("Unable to merge '%s' in submodule path '%s'"),
+			       oid, ud->displaypath);
+			break;
+		case SM_UPDATE_COMMAND:
+			printf(_("Execution of '%s %s' failed in submodule path '%s'"),
+			       ud->update_strategy.command, oid, ud->displaypath);
+			break;
+		default:
+			BUG("unexpected update strategy type: %s",
+			    submodule_strategy_to_string(&ud->update_strategy));
+		}
+		/*
+		 * NEEDSWORK: We are currently printing to stdout with error
+		 * return so that the shell caller handles the error output
+		 * properly. Once we start handling the error messages within
+		 * C, we should use die() instead.
+		 */
+		if (must_die_on_failure)
+			return 2;
+		/*
+		 * This signifies to the caller in shell that the command
+		 * failed without dying
+		 */
+		return 1;
+	}
+
+	switch (ud->update_strategy.type) {
+	case SM_UPDATE_CHECKOUT:
+		printf(_("Submodule path '%s': checked out '%s'\n"),
+		       ud->displaypath, oid);
+		break;
+	case SM_UPDATE_REBASE:
+		printf(_("Submodule path '%s': rebased into '%s'\n"),
+		       ud->displaypath, oid);
+		break;
+	case SM_UPDATE_MERGE:
+		printf(_("Submodule path '%s': merged in '%s'\n"),
+		       ud->displaypath, oid);
+		break;
+	case SM_UPDATE_COMMAND:
+		printf(_("Submodule path '%s': '%s %s'\n"),
+		       ud->displaypath, ud->update_strategy.command, oid);
+		break;
+	default:
+		BUG("unexpected update strategy type: %s",
+		    submodule_strategy_to_string(&ud->update_strategy));
+	}
+
+	return 0;
+}
+
+static int do_run_update_procedure(struct update_data *ud)
+{
+	int subforce = is_null_oid(&ud->suboid) || ud->force;
+
+	if (!ud->nofetch) {
+		/*
+		 * Run fetch only if `oid` isn't present or it
+		 * is not reachable from a ref.
+		 */
+		if (!is_tip_reachable(ud->sm_path, &ud->oid) &&
+		    fetch_in_submodule(ud->sm_path, ud->depth, ud->quiet, NULL) &&
+		    !ud->quiet)
+			fprintf_ln(stderr,
+				   _("Unable to fetch in submodule path '%s'; "
+				     "trying to directly fetch %s:"),
+				   ud->displaypath, oid_to_hex(&ud->oid));
+		/*
+		 * Now we tried the usual fetch, but `oid` may
+		 * not be reachable from any of the refs.
+		 */
+		if (!is_tip_reachable(ud->sm_path, &ud->oid) &&
+		    fetch_in_submodule(ud->sm_path, ud->depth, ud->quiet, &ud->oid))
+			die(_("Fetched in submodule path '%s', but it did not "
+			      "contain %s. Direct fetching of that commit failed."),
+			    ud->displaypath, oid_to_hex(&ud->oid));
+	}
+
+	return run_update_command(ud, subforce);
+}
+
 static void update_submodule(struct update_clone_data *ucd)
 {
 	fprintf(stdout, "dummy %s %d\t%s\n",
@@ -2395,6 +2584,73 @@ static int update_clone(int argc, const char **argv, const char *prefix)
 	return update_submodules(&suc);
 }
 
+static int run_update_procedure(int argc, const char **argv, const char *prefix)
+{
+	int force = 0, quiet = 0, nofetch = 0, just_cloned = 0;
+	char *prefixed_path, *update = NULL;
+	struct update_data update_data = UPDATE_DATA_INIT;
+
+	struct option options[] = {
+		OPT__QUIET(&quiet, N_("suppress output for update by rebase or merge")),
+		OPT__FORCE(&force, N_("force checkout updates"), 0),
+		OPT_BOOL('N', "no-fetch", &nofetch,
+			 N_("don't fetch new objects from the remote site")),
+		OPT_BOOL(0, "just-cloned", &just_cloned,
+			 N_("overrides update mode in case the repository is a fresh clone")),
+		OPT_INTEGER(0, "depth", &update_data.depth, N_("depth for shallow fetch")),
+		OPT_STRING(0, "prefix", &prefix,
+			   N_("path"),
+			   N_("path into the working tree")),
+		OPT_STRING(0, "update", &update,
+			   N_("string"),
+			   N_("rebase, merge, checkout or none")),
+		OPT_STRING(0, "recursive-prefix", &update_data.recursive_prefix, N_("path"),
+			   N_("path into the working tree, across nested "
+			      "submodule boundaries")),
+		OPT_CALLBACK_F(0, "oid", &update_data.oid, N_("sha1"),
+			       N_("SHA1 expected by superproject"), PARSE_OPT_NONEG,
+			       parse_opt_object_id),
+		OPT_CALLBACK_F(0, "suboid", &update_data.suboid, N_("subsha1"),
+			       N_("SHA1 of submodule's HEAD"), PARSE_OPT_NONEG,
+			       parse_opt_object_id),
+		OPT_END()
+	};
+
+	const char *const usage[] = {
+		N_("git submodule--helper run-update-procedure [<options>] <path>"),
+		NULL
+	};
+
+	argc = parse_options(argc, argv, prefix, options, usage, 0);
+
+	if (argc != 1)
+		usage_with_options(usage, options);
+
+	update_data.force = !!force;
+	update_data.quiet = !!quiet;
+	update_data.nofetch = !!nofetch;
+	update_data.just_cloned = !!just_cloned;
+	update_data.sm_path = argv[0];
+
+	if (update_data.recursive_prefix)
+		prefixed_path = xstrfmt("%s%s", update_data.recursive_prefix, update_data.sm_path);
+	else
+		prefixed_path = xstrdup(update_data.sm_path);
+
+	update_data.displaypath = get_submodule_displaypath(prefixed_path, prefix);
+
+	determine_submodule_update_strategy(the_repository, update_data.just_cloned,
+					    update_data.sm_path, update,
+					    &update_data.update_strategy);
+
+	free(prefixed_path);
+
+	if (!oideq(&update_data.oid, &update_data.suboid) || update_data.force)
+		return do_run_update_procedure(&update_data);
+
+	return 3;
+}
+
 static int resolve_relative_path(int argc, const char **argv, const char *prefix)
 {
 	struct strbuf sb = STRBUF_INIT;
@@ -2951,6 +3207,7 @@ static struct cmd_struct commands[] = {
 	{"add-clone", add_clone, 0},
 	{"update-module-mode", module_update_module_mode, 0},
 	{"update-clone", update_clone, 0},
+	{"run-update-procedure", run_update_procedure, 0},
 	{"ensure-core-worktree", ensure_core_worktree, 0},
 	{"relative-path", resolve_relative_path, 0},
 	{"resolve-relative-url", resolve_relative_url, 0},
diff --git a/git-submodule.sh b/git-submodule.sh
index dbd2ec2050..f703cddce8 100755
--- a/git-submodule.sh
+++ b/git-submodule.sh
@@ -369,13 +369,6 @@ cmd_deinit()
 	git ${wt_prefix:+-C "$wt_prefix"} submodule--helper deinit ${GIT_QUIET:+--quiet} ${force:+--force} ${deinit_all:+--all} -- "$@"
 }
 
-is_tip_reachable () (
-	sanitize_submodule_env &&
-	cd "$1" &&
-	rev=$(git rev-list -n 1 "$2" --not --all 2>/dev/null) &&
-	test -z "$rev"
-)
-
 # usage: fetch_in_submodule <module_path> [<depth>] [<sha1>]
 # Because arguments are positional, use an empty string to omit <depth>
 # but include <sha1>.
@@ -519,14 +512,13 @@ cmd_update()
 
 		git submodule--helper ensure-core-worktree "$sm_path" || exit 1
 
-		update_module=$(git submodule--helper update-module-mode $just_cloned "$sm_path" $update)
-
 		displaypath=$(git submodule--helper relative-path "$prefix$sm_path" "$wt_prefix")
 
 		if test $just_cloned -eq 1
 		then
 			subsha1=
 		else
+			just_cloned=
 			subsha1=$(sanitize_submodule_env; cd "$sm_path" &&
 				git rev-parse --verify HEAD) ||
 			die "fatal: $(eval_gettext "Unable to find current revision in submodule path '\$displaypath'")"
@@ -547,70 +539,38 @@ cmd_update()
 			die "fatal: $(eval_gettext "Unable to find current \${remote_name}/\${branch} revision in submodule path '\$sm_path'")"
 		fi
 
-		if test "$subsha1" != "$sha1" || test -n "$force"
-		then
-			subforce=$force
-			# If we don't already have a -f flag and the submodule has never been checked out
-			if test -z "$subsha1" && test -z "$force"
-			then
-				subforce="-f"
-			fi
+		out=$(git submodule--helper run-update-procedure \
+			  ${wt_prefix:+--prefix "$wt_prefix"} \
+			  ${GIT_QUIET:+--quiet} \
+			  ${force:+--force} \
+			  ${just_cloned:+--just-cloned} \
+			  ${nofetch:+--no-fetch} \
+			  ${depth:+"$depth"} \
+			  ${update:+--update "$update"} \
+			  ${prefix:+--recursive-prefix "$prefix"} \
+			  ${sha1:+--oid "$sha1"} \
+			  ${subsha1:+--suboid "$subsha1"} \
+			  "--" \
+			  "$sm_path")
 
-			if test -z "$nofetch"
-			then
-				# Run fetch only if $sha1 isn't present or it
-				# is not reachable from a ref.
-				is_tip_reachable "$sm_path" "$sha1" ||
-				fetch_in_submodule "$sm_path" $depth ||
-				say "$(eval_gettext "Unable to fetch in submodule path '\$displaypath'; trying to directly fetch \$sha1:")"
-
-				# Now we tried the usual fetch, but $sha1 may
-				# not be reachable from any of the refs
-				is_tip_reachable "$sm_path" "$sha1" ||
-				fetch_in_submodule "$sm_path" "$depth" "$sha1" ||
-				die "fatal: $(eval_gettext "Fetched in submodule path '\$displaypath', but it did not contain \$sha1. Direct fetching of that commit failed.")"
-			fi
-
-			must_die_on_failure=
-			case "$update_module" in
-			checkout)
-				command="git checkout $subforce -q"
-				die_msg="fatal: $(eval_gettext "Unable to checkout '\$sha1' in submodule path '\$displaypath'")"
-				say_msg="$(eval_gettext "Submodule path '\$displaypath': checked out '\$sha1'")"
-				;;
-			rebase)
-				command="git rebase ${GIT_QUIET:+--quiet}"
-				die_msg="fatal: $(eval_gettext "Unable to rebase '\$sha1' in submodule path '\$displaypath'")"
-				say_msg="$(eval_gettext "Submodule path '\$displaypath': rebased into '\$sha1'")"
-				must_die_on_failure=yes
-				;;
-			merge)
-				command="git merge ${GIT_QUIET:+--quiet}"
-				die_msg="fatal: $(eval_gettext "Unable to merge '\$sha1' in submodule path '\$displaypath'")"
-				say_msg="$(eval_gettext "Submodule path '\$displaypath': merged in '\$sha1'")"
-				must_die_on_failure=yes
-				;;
-			!*)
-				command="${update_module#!}"
-				die_msg="fatal: $(eval_gettext "Execution of '\$command \$sha1' failed in submodule path '\$displaypath'")"
-				say_msg="$(eval_gettext "Submodule path '\$displaypath': '\$command \$sha1'")"
-				must_die_on_failure=yes
-				;;
-			*)
-				die "fatal: $(eval_gettext "Invalid update mode '$update_module' for submodule path '$path'")"
-			esac
-
-			if (sanitize_submodule_env; cd "$sm_path" && $command "$sha1")
-			then
-				say "$say_msg"
-			elif test -n "$must_die_on_failure"
-			then
-				die_with_status 2 "$die_msg"
-			else
-				err="${err};$die_msg"
-				continue
-			fi
-		fi
+		# exit codes for run-update-procedure:
+		# 0: update was successful, say command output
+		# 1: update procedure failed, but should not die
+		# 2 or 128: subcommand died during execution
+		# 3: no update procedure was run
+		res="$?"
+		case $res in
+		0)
+			say "$out"
+			;;
+		1)
+			err="${err};fatal: $out"
+			continue
+			;;
+		2|128)
+			die_with_status $res "fatal: $out"
+			;;
+		esac
 
 		if test -n "$recursive"
 		then
-- 
2.32.0


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH v4] submodule--helper: run update procedures from C
  2021-08-24 14:06     ` [PATCH v4] " Atharva Raykar
@ 2021-09-08  0:14       ` Junio C Hamano
  0 siblings, 0 replies; 13+ messages in thread
From: Junio C Hamano @ 2021-09-08  0:14 UTC (permalink / raw)
  To: git
  Cc: Atharva Raykar, Emily Shaffer, Jonathan Nieder, Christian Couder,
	Shourya Shukla, Kaartic Sivaraam, Prathamesh Chavan

Atharva Raykar <raykar.ath@gmail.com> writes:

> * Fix error message handling of the output of 'run-update-procedure'. While at
>   it, ensure the "checkout" mode error message is stored and printed
>   appropriately.
>
> * In 'run_update_command()' switch from 'run_command()' to
>   'run_command_v_opt_cd_env()' to ensure quoted command update modes are handled
>   correctly.
>
> * Code style and hygiene changes.
>
> * Introduce a NEEDSWORK comment, because the printf() and error return is
>   correct only because the shell caller in the other end redirects it to the
>   correct output stream. Once we switch this completely to C (ie, in the
>   follow-up series), I need to remember to die() instead (or print to stderr) to
>   reproduce the original behaviour.

I didn't see anybody comment on this round (and do not think I saw
anything glaringly wrong).

Is everybody happy with this version?  I am about to mark it for
'next' in the next issue of "What's cooking" report, so please
holler if I should wait.

Thanks.


^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2021-09-08  0:14 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-07-22 13:40 [GSoC] [PATCH] submodule--helper: run update procedures from C Atharva Raykar
2021-07-23  9:37 ` Ævar Arnfjörð Bjarmason
2021-07-23 16:59   ` Atharva Raykar
2021-08-04  8:35     ` Atharva Raykar
2021-08-02 13:06 ` [GSoC] [PATCH v2] " Atharva Raykar
2021-08-02 18:50   ` Shourya Shukla
2021-08-03  8:46     ` Atharva Raykar
2021-08-03 10:07     ` Atharva Raykar
2021-08-13  7:56   ` [GSoC] [PATCH v3] " Atharva Raykar
2021-08-13 18:32     ` Junio C Hamano
2021-08-24  8:58       ` Atharva Raykar
2021-08-24 14:06     ` [PATCH v4] " Atharva Raykar
2021-09-08  0:14       ` Junio C Hamano

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.