All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/5] rebase -i: a couple of small improvements
@ 2021-09-08 13:41 Phillip Wood via GitGitGadget
  2021-09-08 13:41 ` [PATCH 1/5] sequencer.c: factor out a function Phillip Wood via GitGitGadget
                   ` (5 more replies)
  0 siblings, 6 replies; 31+ messages in thread
From: Phillip Wood via GitGitGadget @ 2021-09-08 13:41 UTC (permalink / raw)
  To: git; +Cc: Phillip Wood

Fix the re-reading of the todo list after an exec or reword command and stop
forking "git checkout" when checking out "onto"

Phillip Wood (5):
  sequencer.c: factor out a function
  rebase: fix todo-list rereading
  reset_head(): mark oid parameter as const
  rebase -i: don't fork git checkout
  rebase: remove unused parameter

 builtin/rebase.c |  3 +-
 reset.c          |  4 +--
 reset.h          |  4 +--
 sequencer.c      | 88 ++++++++++++++++++++----------------------------
 sequencer.h      |  6 ++--
 5 files changed, 43 insertions(+), 62 deletions(-)


base-commit: 66262451ec94d30ac4b80eb3123549cf7a788afd
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1034%2Fphillipwood%2Fwip%2Frebase-reread-todo-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1034/phillipwood/wip/rebase-reread-todo-v1
Pull-Request: https://github.com/gitgitgadget/git/pull/1034
-- 
gitgitgadget

^ permalink raw reply	[flat|nested] 31+ messages in thread

* [PATCH 1/5] sequencer.c: factor out a function
  2021-09-08 13:41 [PATCH 0/5] rebase -i: a couple of small improvements Phillip Wood via GitGitGadget
@ 2021-09-08 13:41 ` Phillip Wood via GitGitGadget
  2021-09-08 17:51   ` Eric Sunshine
  2021-09-09 10:44   ` Johannes Schindelin
  2021-09-08 13:41 ` [PATCH 2/5] rebase: fix todo-list rereading Phillip Wood via GitGitGadget
                   ` (4 subsequent siblings)
  5 siblings, 2 replies; 31+ messages in thread
From: Phillip Wood via GitGitGadget @ 2021-09-08 13:41 UTC (permalink / raw)
  To: git; +Cc: Phillip Wood, Phillip Wood

From: Phillip Wood <phillip.wood@dunelm.org.uk>

This code is heavily indented and obscures the high level logic within
the loop. Lets move it to its own function before modifying it in the
next commit.

Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
---
 sequencer.c | 38 ++++++++++++++++++++++++--------------
 1 file changed, 24 insertions(+), 14 deletions(-)

diff --git a/sequencer.c b/sequencer.c
index 7f07cd00f3f..a248c886c27 100644
--- a/sequencer.c
+++ b/sequencer.c
@@ -4254,6 +4254,27 @@ static int stopped_at_head(struct repository *r)
 
 }
 
+static int reread_todo_if_changed(struct repository *r,
+				  struct todo_list *todo_list,
+				  struct replay_opts *opts)
+{
+	struct stat st;
+
+	if (stat(get_todo_path(opts), &st)) {
+		return error_errno(_("could not stat '%s'"),
+				   get_todo_path(opts));
+	} else if (match_stat_data(&todo_list->stat, &st)) {
+		/* Reread the todo file if it has changed. */
+		todo_list_release(todo_list);
+		if (read_populate_todo(r, todo_list, opts))
+			return -1; /* message was printed */
+		/* `current` will be incremented on return */
+		todo_list->current = -1;
+	}
+
+	return 0;
+}
+
 static const char rescheduled_advice[] =
 N_("Could not execute the todo command\n"
 "\n"
@@ -4433,20 +4454,9 @@ static int pick_commits(struct repository *r,
 							item->commit,
 							arg, item->arg_len,
 							opts, res, 0);
-		} else if (is_rebase_i(opts) && check_todo && !res) {
-			struct stat st;
-
-			if (stat(get_todo_path(opts), &st)) {
-				res = error_errno(_("could not stat '%s'"),
-						  get_todo_path(opts));
-			} else if (match_stat_data(&todo_list->stat, &st)) {
-				/* Reread the todo file if it has changed. */
-				todo_list_release(todo_list);
-				if (read_populate_todo(r, todo_list, opts))
-					res = -1; /* message was printed */
-				/* `current` will be incremented below */
-				todo_list->current = -1;
-			}
+		} else if (is_rebase_i(opts) && check_todo && !res &&
+			   reread_todo_if_changed(r, todo_list, opts)) {
+			return -1;
 		}
 
 		todo_list->current++;
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH 2/5] rebase: fix todo-list rereading
  2021-09-08 13:41 [PATCH 0/5] rebase -i: a couple of small improvements Phillip Wood via GitGitGadget
  2021-09-08 13:41 ` [PATCH 1/5] sequencer.c: factor out a function Phillip Wood via GitGitGadget
@ 2021-09-08 13:41 ` Phillip Wood via GitGitGadget
  2021-09-09 10:48   ` Johannes Schindelin
  2021-09-08 13:41 ` [PATCH 3/5] reset_head(): mark oid parameter as const Phillip Wood via GitGitGadget
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 31+ messages in thread
From: Phillip Wood via GitGitGadget @ 2021-09-08 13:41 UTC (permalink / raw)
  To: git; +Cc: Phillip Wood, Phillip Wood

From: Phillip Wood <phillip.wood@dunelm.org.uk>

54fd3243da ("rebase -i: reread the todo list if `exec` touched it",
2017-04-26) sought to reread the todo list after running an exec
command only if it had been changed. To accomplish this it checks the
stat data of the todo list after running an exec command to see if it
has changed. Unfortunately there are two problems, firstly the
implementation is buggy we actually reread the list after each exec
which is quadratic in the number of commit lookups and secondly the
design is predicated on using nanosecond time stamps which are not the
default.

The implementation bug stems from the fact that we write a new todo
list to disk before running each command but do not update the stat
data to reflect this[1].

The design problem is that it is possible for the user to edit the
todo list without changing its size or inode which means we have to
rely on the mtime to tell us if it has changed. Unfortunately unless
git is built with USE_NSEC it is possible for the original and edited
list to share the same mtime.

Ideally "git rebase --edit-todo" would set a flag that we would then
check in sequencer.c. Unfortunately this is approach will not work as
there are scripts in the wild that write to the todo list directly
without running "git rebase --edit-todo". Instead of relying on stat
data this patch simply reads the possibly edited todo list and
compares it to the original with memcmp(). This is much faster than
reparsing the todo list each time. This patch reduces the time to run

   git rebase -r -xtrue v2.32.0~100 v2.32.0

which runs 419 exec commands by 6.6%. For comparison fixing the
implementation bug in stat based approach reduces the time by a
further 1.4% and is indistinguishable from never rereading the todo
list.

[1] https://lore.kernel.org/git/20191125131833.GD23183@szeder.dev/

Reported-by: SZEDER Gábor <szeder.dev@gmail.com>
Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
---
 sequencer.c | 19 ++++++++-----------
 sequencer.h |  1 -
 2 files changed, 8 insertions(+), 12 deletions(-)

diff --git a/sequencer.c b/sequencer.c
index a248c886c27..d51440ddcd9 100644
--- a/sequencer.c
+++ b/sequencer.c
@@ -2671,7 +2671,6 @@ static int read_populate_todo(struct repository *r,
 			      struct todo_list *todo_list,
 			      struct replay_opts *opts)
 {
-	struct stat st;
 	const char *todo_file = get_todo_path(opts);
 	int res;
 
@@ -2679,11 +2678,6 @@ static int read_populate_todo(struct repository *r,
 	if (strbuf_read_file_or_whine(&todo_list->buf, todo_file) < 0)
 		return -1;
 
-	res = stat(todo_file, &st);
-	if (res)
-		return error(_("could not stat '%s'"), todo_file);
-	fill_stat_data(&todo_list->stat, &st);
-
 	res = todo_list_parse_insn_buffer(r, todo_list->buf.buf, todo_list);
 	if (res) {
 		if (is_rebase_i(opts))
@@ -4258,12 +4252,14 @@ static int reread_todo_if_changed(struct repository *r,
 				  struct todo_list *todo_list,
 				  struct replay_opts *opts)
 {
-	struct stat st;
+	int offset;
+	struct strbuf buf = STRBUF_INIT;
 
-	if (stat(get_todo_path(opts), &st)) {
-		return error_errno(_("could not stat '%s'"),
-				   get_todo_path(opts));
-	} else if (match_stat_data(&todo_list->stat, &st)) {
+	if (strbuf_read_file_or_whine(&buf, get_todo_path(opts)) < 0)
+		return -1;
+	offset = get_item_line_offset(todo_list, todo_list->current + 1);
+	if (buf.len != todo_list->buf.len - offset ||
+	    memcmp(buf.buf, todo_list->buf.buf + offset, buf.len)) {
 		/* Reread the todo file if it has changed. */
 		todo_list_release(todo_list);
 		if (read_populate_todo(r, todo_list, opts))
@@ -4271,6 +4267,7 @@ static int reread_todo_if_changed(struct repository *r,
 		/* `current` will be incremented on return */
 		todo_list->current = -1;
 	}
+	strbuf_release(&buf);
 
 	return 0;
 }
diff --git a/sequencer.h b/sequencer.h
index d57d8ea23d7..cdeb0c6be47 100644
--- a/sequencer.h
+++ b/sequencer.h
@@ -116,7 +116,6 @@ struct todo_list {
 	struct todo_item *items;
 	int nr, alloc, current;
 	int done_nr, total_nr;
-	struct stat_data stat;
 };
 
 #define TODO_LIST_INIT { STRBUF_INIT }
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH 3/5] reset_head(): mark oid parameter as const
  2021-09-08 13:41 [PATCH 0/5] rebase -i: a couple of small improvements Phillip Wood via GitGitGadget
  2021-09-08 13:41 ` [PATCH 1/5] sequencer.c: factor out a function Phillip Wood via GitGitGadget
  2021-09-08 13:41 ` [PATCH 2/5] rebase: fix todo-list rereading Phillip Wood via GitGitGadget
@ 2021-09-08 13:41 ` Phillip Wood via GitGitGadget
  2021-09-08 13:41 ` [PATCH 4/5] rebase -i: don't fork git checkout Phillip Wood via GitGitGadget
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 31+ messages in thread
From: Phillip Wood via GitGitGadget @ 2021-09-08 13:41 UTC (permalink / raw)
  To: git; +Cc: Phillip Wood, Phillip Wood

From: Phillip Wood <phillip.wood@dunelm.org.uk>

We do not write to oid so mark it as const in preparation for the next
commit.

Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
---
 reset.c | 4 ++--
 reset.h | 4 ++--
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/reset.c b/reset.c
index 4bea758053b..5b578d6bb68 100644
--- a/reset.c
+++ b/reset.c
@@ -8,8 +8,8 @@
 #include "tree.h"
 #include "unpack-trees.h"
 
-int reset_head(struct repository *r, struct object_id *oid, const char *action,
-	       const char *switch_to_branch, unsigned flags,
+int reset_head(struct repository *r, const struct object_id *oid,
+	       const char *action, const char *switch_to_branch, unsigned flags,
 	       const char *reflog_orig_head, const char *reflog_head,
 	       const char *default_reflog_action)
 {
diff --git a/reset.h b/reset.h
index 12f83c78e28..6d2be511b8c 100644
--- a/reset.h
+++ b/reset.h
@@ -12,8 +12,8 @@
 #define RESET_HEAD_REFS_ONLY (1<<3)
 #define RESET_ORIG_HEAD (1<<4)
 
-int reset_head(struct repository *r, struct object_id *oid, const char *action,
-	       const char *switch_to_branch, unsigned flags,
+int reset_head(struct repository *r, const struct object_id *oid,
+	       const char *action, const char *switch_to_branch, unsigned flags,
 	       const char *reflog_orig_head, const char *reflog_head,
 	       const char *default_reflog_action);
 
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH 4/5] rebase -i: don't fork git checkout
  2021-09-08 13:41 [PATCH 0/5] rebase -i: a couple of small improvements Phillip Wood via GitGitGadget
                   ` (2 preceding siblings ...)
  2021-09-08 13:41 ` [PATCH 3/5] reset_head(): mark oid parameter as const Phillip Wood via GitGitGadget
@ 2021-09-08 13:41 ` Phillip Wood via GitGitGadget
  2021-09-08 18:14   ` Philippe Blain
  2021-09-09 15:03   ` Elijah Newren
  2021-09-08 13:41 ` [PATCH 5/5] rebase: remove unused parameter Phillip Wood via GitGitGadget
  2021-09-23 15:26 ` [PATCH v2 0/2] rebase -i: a couple of small improvements Phillip Wood via GitGitGadget
  5 siblings, 2 replies; 31+ messages in thread
From: Phillip Wood via GitGitGadget @ 2021-09-08 13:41 UTC (permalink / raw)
  To: git; +Cc: Phillip Wood, Phillip Wood

From: Phillip Wood <phillip.wood@dunelm.org.uk>

The "apply" based rebase has avoided forking git checkout since
ac7f467fef ("builtin/rebase: support running "git rebase <upstream>"",
2018-08-07). The code that handles the checkout was moved into libgit
by b309a97108 ("reset: extract reset_head() from rebase", 2020-04-07)
so lets start using it for the "merge" based rebase as well. This
opens the way for us to stop calling the post-checkout hook in the
future.

Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
---
 sequencer.c | 29 ++++-------------------------
 1 file changed, 4 insertions(+), 25 deletions(-)

diff --git a/sequencer.c b/sequencer.c
index d51440ddcd9..1a9dbc70d3c 100644
--- a/sequencer.c
+++ b/sequencer.c
@@ -4192,42 +4192,21 @@ int apply_autostash_oid(const char *stash_oid)
 	return apply_save_autostash_oid(stash_oid, 1);
 }
 
-static int run_git_checkout(struct repository *r, struct replay_opts *opts,
-			    const char *commit, const char *action)
-{
-	struct child_process cmd = CHILD_PROCESS_INIT;
-	int ret;
-
-	cmd.git_cmd = 1;
-
-	strvec_push(&cmd.args, "checkout");
-	strvec_push(&cmd.args, commit);
-	strvec_pushf(&cmd.env_array, GIT_REFLOG_ACTION "=%s", action);
-
-	if (opts->verbose)
-		ret = run_command(&cmd);
-	else
-		ret = run_command_silent_on_success(&cmd);
-
-	if (!ret)
-		discard_index(r->index);
-
-	return ret;
-}
-
 static int checkout_onto(struct repository *r, struct replay_opts *opts,
 			 const char *onto_name, const struct object_id *onto,
 			 const struct object_id *orig_head)
 {
 	const char *action = reflog_message(opts, "start", "checkout %s", onto_name);
 
-	if (run_git_checkout(r, opts, oid_to_hex(onto), action)) {
+	if (reset_head(r, onto, "checkout", NULL, RESET_HEAD_DETACH |
+		       RESET_ORIG_HEAD | RESET_HEAD_RUN_POST_CHECKOUT_HOOK,
+		       NULL, action, "rebase")) {
 		apply_autostash(rebase_path_autostash());
 		sequencer_remove_state(opts);
 		return error(_("could not detach HEAD"));
 	}
 
-	return update_ref(NULL, "ORIG_HEAD", orig_head, NULL, 0, UPDATE_REFS_MSG_ON_ERR);
+	return 0;
 }
 
 static int stopped_at_head(struct repository *r)
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH 5/5] rebase: remove unused parameter
  2021-09-08 13:41 [PATCH 0/5] rebase -i: a couple of small improvements Phillip Wood via GitGitGadget
                   ` (3 preceding siblings ...)
  2021-09-08 13:41 ` [PATCH 4/5] rebase -i: don't fork git checkout Phillip Wood via GitGitGadget
@ 2021-09-08 13:41 ` Phillip Wood via GitGitGadget
  2021-09-09 10:54   ` Johannes Schindelin
  2021-09-23 15:26 ` [PATCH v2 0/2] rebase -i: a couple of small improvements Phillip Wood via GitGitGadget
  5 siblings, 1 reply; 31+ messages in thread
From: Phillip Wood via GitGitGadget @ 2021-09-08 13:41 UTC (permalink / raw)
  To: git; +Cc: Phillip Wood, Phillip Wood

From: Phillip Wood <phillip.wood@dunelm.org.uk>

Now that we use reset_head() we don't need to pass orig_head around.

Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
---
 builtin/rebase.c |  3 +--
 sequencer.c      | 12 +++++-------
 sequencer.h      |  5 ++---
 3 files changed, 8 insertions(+), 12 deletions(-)

diff --git a/builtin/rebase.c b/builtin/rebase.c
index 12f093121d9..1520f75a491 100644
--- a/builtin/rebase.c
+++ b/builtin/rebase.c
@@ -377,8 +377,7 @@ static int do_interactive_rebase(struct rebase_options *opts, unsigned flags)
 		split_exec_commands(opts->cmd, &commands);
 		ret = complete_action(the_repository, &replay, flags,
 			shortrevisions, opts->onto_name, opts->onto,
-			&opts->orig_head, &commands, opts->autosquash,
-			&todo_list);
+			&commands, opts->autosquash, &todo_list);
 	}
 
 	string_list_clear(&commands, 0);
diff --git a/sequencer.c b/sequencer.c
index 1a9dbc70d3c..a7ea60a7a82 100644
--- a/sequencer.c
+++ b/sequencer.c
@@ -4193,8 +4193,7 @@ int apply_autostash_oid(const char *stash_oid)
 }
 
 static int checkout_onto(struct repository *r, struct replay_opts *opts,
-			 const char *onto_name, const struct object_id *onto,
-			 const struct object_id *orig_head)
+			 const char *onto_name, const struct object_id *onto)
 {
 	const char *action = reflog_message(opts, "start", "checkout %s", onto_name);
 
@@ -5567,9 +5566,8 @@ static int skip_unnecessary_picks(struct repository *r,
 
 int complete_action(struct repository *r, struct replay_opts *opts, unsigned flags,
 		    const char *shortrevisions, const char *onto_name,
-		    struct commit *onto, const struct object_id *orig_head,
-		    struct string_list *commands, unsigned autosquash,
-		    struct todo_list *todo_list)
+		    struct commit *onto, struct string_list *commands,
+		    unsigned autosquash, struct todo_list *todo_list)
 {
 	char shortonto[GIT_MAX_HEXSZ + 1];
 	const char *todo_file = rebase_path_todo();
@@ -5616,7 +5614,7 @@ int complete_action(struct repository *r, struct replay_opts *opts, unsigned fla
 
 		return error(_("nothing to do"));
 	} else if (res == -4) {
-		checkout_onto(r, opts, onto_name, &onto->object.oid, orig_head);
+		checkout_onto(r, opts, onto_name, &onto->object.oid);
 		todo_list_release(&new_todo);
 
 		return -1;
@@ -5644,7 +5642,7 @@ int complete_action(struct repository *r, struct replay_opts *opts, unsigned fla
 
 	res = -1;
 
-	if (checkout_onto(r, opts, onto_name, &oid, orig_head))
+	if (checkout_onto(r, opts, onto_name, &oid))
 		goto cleanup;
 
 	if (require_clean_work_tree(r, "rebase", "", 1, 1))
diff --git a/sequencer.h b/sequencer.h
index cdeb0c6be47..352b24014bb 100644
--- a/sequencer.h
+++ b/sequencer.h
@@ -163,9 +163,8 @@ void todo_list_add_exec_commands(struct todo_list *todo_list,
 				 struct string_list *commands);
 int complete_action(struct repository *r, struct replay_opts *opts, unsigned flags,
 		    const char *shortrevisions, const char *onto_name,
-		    struct commit *onto, const struct object_id *orig_head,
-		    struct string_list *commands, unsigned autosquash,
-		    struct todo_list *todo_list);
+		    struct commit *onto, struct string_list *commands,
+		    unsigned autosquash, struct todo_list *todo_list);
 int todo_list_rearrange_squash(struct todo_list *todo_list);
 
 /*
-- 
gitgitgadget

^ permalink raw reply related	[flat|nested] 31+ messages in thread

* Re: [PATCH 1/5] sequencer.c: factor out a function
  2021-09-08 13:41 ` [PATCH 1/5] sequencer.c: factor out a function Phillip Wood via GitGitGadget
@ 2021-09-08 17:51   ` Eric Sunshine
  2021-09-09 10:10     ` Phillip Wood
  2021-09-09 10:44   ` Johannes Schindelin
  1 sibling, 1 reply; 31+ messages in thread
From: Eric Sunshine @ 2021-09-08 17:51 UTC (permalink / raw)
  To: Phillip Wood via GitGitGadget; +Cc: Git List, Phillip Wood

On Wed, Sep 8, 2021 at 9:41 AM Phillip Wood via GitGitGadget
<gitgitgadget@gmail.com> wrote:
> This code is heavily indented and obscures the high level logic within
> the loop. Lets move it to its own function before modifying it in the
> next commit.

s/Lets/Let's/ ... or just drop "Let's" altogether and start with "Move".

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 4/5] rebase -i: don't fork git checkout
  2021-09-08 13:41 ` [PATCH 4/5] rebase -i: don't fork git checkout Phillip Wood via GitGitGadget
@ 2021-09-08 18:14   ` Philippe Blain
  2021-09-09 10:09     ` Phillip Wood
  2021-09-09 10:53     ` Johannes Schindelin
  2021-09-09 15:03   ` Elijah Newren
  1 sibling, 2 replies; 31+ messages in thread
From: Philippe Blain @ 2021-09-08 18:14 UTC (permalink / raw)
  To: Phillip Wood via GitGitGadget, git; +Cc: Phillip Wood, Emily Shaffer

Hi Phillip,

Le 2021-09-08 à 09:41, Phillip Wood via GitGitGadget a écrit :
> From: Phillip Wood <phillip.wood@dunelm.org.uk>
> 
> The "apply" based rebase has avoided forking git checkout since
> ac7f467fef ("builtin/rebase: support running "git rebase <upstream>"",
> 2018-08-07). The code that handles the checkout was moved into libgit
> by b309a97108 ("reset: extract reset_head() from rebase", 2020-04-07)
> so lets start using it for the "merge" based rebase as well. This
> opens the way for us to stop calling the post-checkout hook in the
> future.
> 

While in general I think it's a good thing to avoid forking, this change
might result in behavioral differences. Any config that affects
'git checkout' but not the internal 'reset.c::reset_head' function might
play a role in the rebase UX.

One that immediately came to mind is 'submodule.recurse'. This initial 'onto'
checkout was pretty much the only part of 'git rebase' that did something useful
for submodules, so it's kind of sad to see it regress. [That is, until someone
takes the time to implement 'git rebase --recurse-submodules' and makes sure *all*
code paths that touch the working tree pay attention to this flag, and that will
probably necessitate 'git merge --recurse-submodules' first because of the 'merge'
backend... as far as I'm aware it's on Emily's list [1], it's also on mine but
I don't know when I'll get the time.]

Anyway, I'm not saying that we should not do what this patch is proposing, but
I think caveats such as that should be documented in the commit message, and maybe
an audit of other configs that might results in behavioural differences should be done.

Thanks,

Philippe.

[1] https://lore.kernel.org/git/YHofmWcIAidkvJiD@google.com/t/#m0229af9183a84c2367f21e82adfbd21f08aa4437

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 4/5] rebase -i: don't fork git checkout
  2021-09-08 18:14   ` Philippe Blain
@ 2021-09-09 10:09     ` Phillip Wood
  2021-09-09 12:40       ` Philippe Blain
  2021-09-09 10:53     ` Johannes Schindelin
  1 sibling, 1 reply; 31+ messages in thread
From: Phillip Wood @ 2021-09-09 10:09 UTC (permalink / raw)
  To: Philippe Blain, Phillip Wood via GitGitGadget, git
  Cc: Phillip Wood, Emily Shaffer

Hi Philippe

On 08/09/2021 19:14, Philippe Blain wrote:
> Hi Phillip,
> 
> Le 2021-09-08 à 09:41, Phillip Wood via GitGitGadget a écrit :
>> From: Phillip Wood <phillip.wood@dunelm.org.uk>
>>
>> The "apply" based rebase has avoided forking git checkout since
>> ac7f467fef ("builtin/rebase: support running "git rebase <upstream>"",
>> 2018-08-07). The code that handles the checkout was moved into libgit
>> by b309a97108 ("reset: extract reset_head() from rebase", 2020-04-07)
>> so lets start using it for the "merge" based rebase as well. This
>> opens the way for us to stop calling the post-checkout hook in the
>> future.
>>
> 
> While in general I think it's a good thing to avoid forking, this change
> might result in behavioral differences. Any config that affects
> 'git checkout' but not the internal 'reset.c::reset_head' function might
> play a role in the rebase UX.
> 
> One that immediately came to mind is 'submodule.recurse'. This initial 
> 'onto'
> checkout was pretty much the only part of 'git rebase' that did 
> something useful
> for submodules, so it's kind of sad to see it regress. 

Thanks for pointing that out. As a non-submodule user my question would 
be is it actually useful for the initial checkout to work that way if 
the rest of rebase (and the checkout for the am backend) ignores 
submodules? reset.c::reset_head() just uses unpack trees like checkout 
so if rebase read 'submodule.recurse' then reset_head() would work like 
'git checkout' and also 'git rebase --abort' and the "reset" command in 
the todo list would start checking out submodules. I'm reluctant to do 
that until the merge backend also handles submodules unless there is a 
good reason that such partial submodule support would help submodule users.

[That is, until
> someone
> takes the time to implement 'git rebase --recurse-submodules' and makes 
> sure *all*
> code paths that touch the working tree pay attention to this flag, and 
> that will
> probably necessitate 'git merge --recurse-submodules' first because of 
> the 'merge'
> backend... as far as I'm aware it's on Emily's list [1], it's also on 
> mine but
> I don't know when I'll get the time.]
> 
> Anyway, I'm not saying that we should not do what this patch is 
> proposing, but
> I think caveats such as that should be documented in the commit message, 

Yes I'll update the commit message. It should also mention that this 
fixes the bug reported in [1] where a failing post-checkout hook aborts 
a rebase.

Best Wishes

Phillip

[1] 
https://lore.kernel.org/git/CAHr-Uu_KeAJZrd+GzymNP47iFi+dZkvVYsWQPtzT_FQrVnWTDg@mail.gmail.com/

> and maybe
> an audit of other configs that might results in behavioural differences 
> should be done.
> 
> Thanks,
> 
> Philippe.
> 
> [1] 
> https://lore.kernel.org/git/YHofmWcIAidkvJiD@google.com/t/#m0229af9183a84c2367f21e82adfbd21f08aa4437 
> 


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 1/5] sequencer.c: factor out a function
  2021-09-08 17:51   ` Eric Sunshine
@ 2021-09-09 10:10     ` Phillip Wood
  0 siblings, 0 replies; 31+ messages in thread
From: Phillip Wood @ 2021-09-09 10:10 UTC (permalink / raw)
  To: Eric Sunshine, Phillip Wood via GitGitGadget; +Cc: Git List, Phillip Wood

On 08/09/2021 18:51, Eric Sunshine wrote:
> On Wed, Sep 8, 2021 at 9:41 AM Phillip Wood via GitGitGadget
> <gitgitgadget@gmail.com> wrote:
>> This code is heavily indented and obscures the high level logic within
>> the loop. Lets move it to its own function before modifying it in the
>> next commit.
> 
> s/Lets/Let's/ ... or just drop "Let's" altogether and start with "Move".

Thanks, well spotted as always

Phillip

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 1/5] sequencer.c: factor out a function
  2021-09-08 13:41 ` [PATCH 1/5] sequencer.c: factor out a function Phillip Wood via GitGitGadget
  2021-09-08 17:51   ` Eric Sunshine
@ 2021-09-09 10:44   ` Johannes Schindelin
  1 sibling, 0 replies; 31+ messages in thread
From: Johannes Schindelin @ 2021-09-09 10:44 UTC (permalink / raw)
  To: Phillip Wood via GitGitGadget; +Cc: git, Phillip Wood, Phillip Wood

Hi Phillip,

On Wed, 8 Sep 2021, Phillip Wood via GitGitGadget wrote:

> From: Phillip Wood <phillip.wood@dunelm.org.uk>
>
> This code is heavily indented and obscures the high level logic within
> the loop. Lets move it to its own function before modifying it in the
> next commit.
>
> Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
> ---
>  sequencer.c | 38 ++++++++++++++++++++++++--------------
>  1 file changed, 24 insertions(+), 14 deletions(-)
>
> diff --git a/sequencer.c b/sequencer.c
> index 7f07cd00f3f..a248c886c27 100644
> --- a/sequencer.c
> +++ b/sequencer.c
> @@ -4254,6 +4254,27 @@ static int stopped_at_head(struct repository *r)
>
>  }
>
> +static int reread_todo_if_changed(struct repository *r,
> +				  struct todo_list *todo_list,
> +				  struct replay_opts *opts)
> +{
> +	struct stat st;
> +
> +	if (stat(get_todo_path(opts), &st)) {
> +		return error_errno(_("could not stat '%s'"),
> +				   get_todo_path(opts));
> +	} else if (match_stat_data(&todo_list->stat, &st)) {
> +		/* Reread the todo file if it has changed. */
> +		todo_list_release(todo_list);
> +		if (read_populate_todo(r, todo_list, opts))
> +			return -1; /* message was printed */
> +		/* `current` will be incremented on return */
> +		todo_list->current = -1;
> +	}
> +
> +	return 0;
> +}
> +
>  static const char rescheduled_advice[] =
>  N_("Could not execute the todo command\n"
>  "\n"
> @@ -4433,20 +4454,9 @@ static int pick_commits(struct repository *r,
>  							item->commit,
>  							arg, item->arg_len,
>  							opts, res, 0);
> -		} else if (is_rebase_i(opts) && check_todo && !res) {
> -			struct stat st;
> -
> -			if (stat(get_todo_path(opts), &st)) {
> -				res = error_errno(_("could not stat '%s'"),
> -						  get_todo_path(opts));
> -			} else if (match_stat_data(&todo_list->stat, &st)) {
> -				/* Reread the todo file if it has changed. */
> -				todo_list_release(todo_list);
> -				if (read_populate_todo(r, todo_list, opts))
> -					res = -1; /* message was printed */
> -				/* `current` will be incremented below */
> -				todo_list->current = -1;
> -			}
> +		} else if (is_rebase_i(opts) && check_todo && !res &&
> +			   reread_todo_if_changed(r, todo_list, opts)) {
> +			return -1;

It definitely looks like a good refactoring, but it does subtly change the
behavior. If the `todo` file could not be found, we previously continued
after printing out an error, but now we return with an error.

Likewise, if the `todo` file existed but had a parse error, we would
print that error _but continue_. This is definitely a bug, if you ask me.

Now, this might be a desirable change in the first place: if an `exec`
command removed the `todo` file, changes are that the user wanted to
abort, right? And the bug needs to be fixed, too.

It does need to be called out in the commit message, though.

Thank you,
Dscho

>  		}
>
>  		todo_list->current++;
> --
> gitgitgadget
>
>

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 2/5] rebase: fix todo-list rereading
  2021-09-08 13:41 ` [PATCH 2/5] rebase: fix todo-list rereading Phillip Wood via GitGitGadget
@ 2021-09-09 10:48   ` Johannes Schindelin
  0 siblings, 0 replies; 31+ messages in thread
From: Johannes Schindelin @ 2021-09-09 10:48 UTC (permalink / raw)
  To: Phillip Wood via GitGitGadget; +Cc: git, Phillip Wood, Phillip Wood

[-- Attachment #1: Type: text/plain, Size: 4598 bytes --]

Hi Phillip,

On Wed, 8 Sep 2021, Phillip Wood via GitGitGadget wrote:

> From: Phillip Wood <phillip.wood@dunelm.org.uk>
>
> 54fd3243da ("rebase -i: reread the todo list if `exec` touched it",
> 2017-04-26) sought to reread the todo list after running an exec
> command only if it had been changed. To accomplish this it checks the
> stat data of the todo list after running an exec command to see if it
> has changed. Unfortunately there are two problems, firstly the
> implementation is buggy we actually reread the list after each exec
> which is quadratic in the number of commit lookups and secondly the
> design is predicated on using nanosecond time stamps which are not the
> default.
>
> The implementation bug stems from the fact that we write a new todo
> list to disk before running each command but do not update the stat
> data to reflect this[1].
>
> The design problem is that it is possible for the user to edit the
> todo list without changing its size or inode which means we have to
> rely on the mtime to tell us if it has changed. Unfortunately unless
> git is built with USE_NSEC it is possible for the original and edited
> list to share the same mtime.
>
> Ideally "git rebase --edit-todo" would set a flag that we would then
> check in sequencer.c. Unfortunately this is approach will not work as
> there are scripts in the wild that write to the todo list directly
> without running "git rebase --edit-todo". Instead of relying on stat
> data this patch simply reads the possibly edited todo list and
> compares it to the original with memcmp(). This is much faster than
> reparsing the todo list each time. This patch reduces the time to run
>
>    git rebase -r -xtrue v2.32.0~100 v2.32.0
>
> which runs 419 exec commands by 6.6%. For comparison fixing the
> implementation bug in stat based approach reduces the time by a
> further 1.4% and is indistinguishable from never rereading the todo
> list.

Thank you for fixing my bug _and_ for improving performance,
Dscho

>
> [1] https://lore.kernel.org/git/20191125131833.GD23183@szeder.dev/
>
> Reported-by: SZEDER Gábor <szeder.dev@gmail.com>
> Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
> ---
>  sequencer.c | 19 ++++++++-----------
>  sequencer.h |  1 -
>  2 files changed, 8 insertions(+), 12 deletions(-)
>
> diff --git a/sequencer.c b/sequencer.c
> index a248c886c27..d51440ddcd9 100644
> --- a/sequencer.c
> +++ b/sequencer.c
> @@ -2671,7 +2671,6 @@ static int read_populate_todo(struct repository *r,
>  			      struct todo_list *todo_list,
>  			      struct replay_opts *opts)
>  {
> -	struct stat st;
>  	const char *todo_file = get_todo_path(opts);
>  	int res;
>
> @@ -2679,11 +2678,6 @@ static int read_populate_todo(struct repository *r,
>  	if (strbuf_read_file_or_whine(&todo_list->buf, todo_file) < 0)
>  		return -1;
>
> -	res = stat(todo_file, &st);
> -	if (res)
> -		return error(_("could not stat '%s'"), todo_file);
> -	fill_stat_data(&todo_list->stat, &st);
> -
>  	res = todo_list_parse_insn_buffer(r, todo_list->buf.buf, todo_list);
>  	if (res) {
>  		if (is_rebase_i(opts))
> @@ -4258,12 +4252,14 @@ static int reread_todo_if_changed(struct repository *r,
>  				  struct todo_list *todo_list,
>  				  struct replay_opts *opts)
>  {
> -	struct stat st;
> +	int offset;
> +	struct strbuf buf = STRBUF_INIT;
>
> -	if (stat(get_todo_path(opts), &st)) {
> -		return error_errno(_("could not stat '%s'"),
> -				   get_todo_path(opts));
> -	} else if (match_stat_data(&todo_list->stat, &st)) {
> +	if (strbuf_read_file_or_whine(&buf, get_todo_path(opts)) < 0)
> +		return -1;
> +	offset = get_item_line_offset(todo_list, todo_list->current + 1);
> +	if (buf.len != todo_list->buf.len - offset ||
> +	    memcmp(buf.buf, todo_list->buf.buf + offset, buf.len)) {
>  		/* Reread the todo file if it has changed. */
>  		todo_list_release(todo_list);
>  		if (read_populate_todo(r, todo_list, opts))
> @@ -4271,6 +4267,7 @@ static int reread_todo_if_changed(struct repository *r,
>  		/* `current` will be incremented on return */
>  		todo_list->current = -1;
>  	}
> +	strbuf_release(&buf);
>
>  	return 0;
>  }
> diff --git a/sequencer.h b/sequencer.h
> index d57d8ea23d7..cdeb0c6be47 100644
> --- a/sequencer.h
> +++ b/sequencer.h
> @@ -116,7 +116,6 @@ struct todo_list {
>  	struct todo_item *items;
>  	int nr, alloc, current;
>  	int done_nr, total_nr;
> -	struct stat_data stat;
>  };
>
>  #define TODO_LIST_INIT { STRBUF_INIT }
> --
> gitgitgadget
>
>

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 4/5] rebase -i: don't fork git checkout
  2021-09-08 18:14   ` Philippe Blain
  2021-09-09 10:09     ` Phillip Wood
@ 2021-09-09 10:53     ` Johannes Schindelin
  2021-09-09 12:44       ` Philippe Blain
  1 sibling, 1 reply; 31+ messages in thread
From: Johannes Schindelin @ 2021-09-09 10:53 UTC (permalink / raw)
  To: Philippe Blain
  Cc: Phillip Wood via GitGitGadget, git, Phillip Wood, Emily Shaffer

[-- Attachment #1: Type: text/plain, Size: 2237 bytes --]

Hi Philippe,

On Wed, 8 Sep 2021, Philippe Blain wrote:

> Hi Phillip,
>
> Le 2021-09-08 à 09:41, Phillip Wood via GitGitGadget a écrit :
> > From: Phillip Wood <phillip.wood@dunelm.org.uk>
> >
> > The "apply" based rebase has avoided forking git checkout since
> > ac7f467fef ("builtin/rebase: support running "git rebase <upstream>"",
> > 2018-08-07). The code that handles the checkout was moved into libgit
> > by b309a97108 ("reset: extract reset_head() from rebase", 2020-04-07)
> > so lets start using it for the "merge" based rebase as well. This
> > opens the way for us to stop calling the post-checkout hook in the
> > future.
> >
>
> While in general I think it's a good thing to avoid forking, this change
> might result in behavioral differences. Any config that affects
> 'git checkout' but not the internal 'reset.c::reset_head' function might
> play a role in the rebase UX.
>
> One that immediately came to mind is 'submodule.recurse'. This initial
> 'onto' checkout was pretty much the only part of 'git rebase' that did
> something useful for submodules, so it's kind of sad to see it regress.
> [That is, until someone takes the time to implement 'git rebase
> --recurse-submodules' and makes sure *all* code paths that touch the
> working tree pay attention to this flag, and that will probably
> necessitate 'git merge --recurse-submodules' first because of the
> 'merge' backend... as far as I'm aware it's on Emily's list [1], it's
> also on mine but I don't know when I'll get the time.]

Good point, there is more in the `git_checkout_config()` function (which
is `static` in `builtin/checkout.c`, and could easily be moved to
`checkout.[ch]`). We should probably fall back to calling that function
rather than `git_default_config()` in `rebase_config()`.

> Anyway, I'm not saying that we should not do what this patch is
> proposing, but I think caveats such as that should be documented in the
> commit message, and maybe an audit of other configs that might results
> in behavioural differences should be done.

Since this is already a bug in the `apply` backend, it would be even
better to follow-up with a fix, hint, hint, nudge, nudge ;-)

Ciao,
Dscho

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 5/5] rebase: remove unused parameter
  2021-09-08 13:41 ` [PATCH 5/5] rebase: remove unused parameter Phillip Wood via GitGitGadget
@ 2021-09-09 10:54   ` Johannes Schindelin
  2021-09-09 14:04     ` Phillip Wood
  0 siblings, 1 reply; 31+ messages in thread
From: Johannes Schindelin @ 2021-09-09 10:54 UTC (permalink / raw)
  To: Phillip Wood via GitGitGadget; +Cc: git, Phillip Wood, Phillip Wood

Hi Phillip,

On Wed, 8 Sep 2021, Phillip Wood via GitGitGadget wrote:

> From: Phillip Wood <phillip.wood@dunelm.org.uk>
>
> Now that we use reset_head() we don't need to pass orig_head around.

Does this indicate a change in behavior? When we call `update_ref()` with
the original `HEAD`, we get some version of safety in that it will fail if
anything changed the ref in an unexpected way in the meantime.

Ciao,
Dscho

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 4/5] rebase -i: don't fork git checkout
  2021-09-09 10:09     ` Phillip Wood
@ 2021-09-09 12:40       ` Philippe Blain
  2021-09-09 13:57         ` Phillip Wood
  0 siblings, 1 reply; 31+ messages in thread
From: Philippe Blain @ 2021-09-09 12:40 UTC (permalink / raw)
  To: phillip.wood, Phillip Wood via GitGitGadget, git; +Cc: Emily Shaffer

Hi Phillip,

Le 2021-09-09 à 06:09, Phillip Wood a écrit :
> Hi Philippe
> 
> On 08/09/2021 19:14, Philippe Blain wrote:
>> Hi Phillip,
>> 
>> Le 2021-09-08 à 09:41, Phillip Wood via GitGitGadget a écrit :
>>> From: Phillip Wood <phillip.wood@dunelm.org.uk>
>>> 
>>> The "apply" based rebase has avoided forking git checkout since 
>>> ac7f467fef ("builtin/rebase: support running "git rebase
>>> <upstream>"", 2018-08-07). The code that handles the checkout was
>>> moved into libgit by b309a97108 ("reset: extract reset_head()
>>> from rebase", 2020-04-07) so lets start using it for the "merge"
>>> based rebase as well. This opens the way for us to stop calling
>>> the post-checkout hook in the future.
>>> 
>> 
>> While in general I think it's a good thing to avoid forking, this
>> change might result in behavioral differences. Any config that
>> affects 'git checkout' but not the internal 'reset.c::reset_head'
>> function might play a role in the rebase UX.
>> 
>> One that immediately came to mind is 'submodule.recurse'. This
>> initial 'onto' checkout was pretty much the only part of 'git
>> rebase' that did something useful for submodules, so it's kind of
>> sad to see it regress.
> 
> Thanks for pointing that out. As a non-submodule user my question
> would be is it actually useful for the initial checkout to work that
> way if the rest of rebase (and the checkout for the am backend)
> ignores submodules? reset.c::reset_head() just uses unpack trees like
> checkout so if rebase read 'submodule.recurse' then reset_head()
> would work like 'git checkout' and also 'git rebase --abort' and the
> "reset" command in the todo list would start checking out submodules.
> I'm reluctant to do that until the merge backend also handles
> submodules unless there is a good reason that such partial submodule
> support would help submodule users.

Yeah, it's not that useful, I have to admit; it can also be very confusing
since some parts of rebase are affected, and some not. For example, any time
the rebase stops, like for 'edit', 'break', and when there are conflicts, the
submodules are not updated. So I think a full solution is better than a partial
solution; in the meantime I'm thinking the change you are proposing would actually
be less confusing, even if it slightly changes behaviour...

As an aside, I *think* reading submodule.recurse in rebase like it's done in checkout
et al., i.e. something like this:

diff --git a/builtin/rebase.c b/builtin/rebase.c
index 33e0961900..125ec907e4 100644
--- a/builtin/rebase.c
+++ b/builtin/rebase.c
@@ -26,6 +26,7 @@
  #include "rerere.h"
  #include "branch.h"
  #include "sequencer.h"
+#include "submodule.h"
  #include "rebase-interactive.h"
  #include "reset.h"
  
@@ -1106,6 +1107,9 @@ static int rebase_config(const char *var, const char *value, void *data)
  		return git_config_string(&opts->default_backend, var, value);
  	}
  
+	if (!strcmp(var, "submodule.recurse"))
+		return git_default_submodule_config(var, value, data);
+
  	return git_default_config(var, value, data);
  }
  

would actually also affect the merges
performed during the rebase, since that would affect the "global" state in submodule.c.
I hacked exactly that the other day but did not test extensively...

Cheers,
Philippe.

^ permalink raw reply related	[flat|nested] 31+ messages in thread

* Re: [PATCH 4/5] rebase -i: don't fork git checkout
  2021-09-09 10:53     ` Johannes Schindelin
@ 2021-09-09 12:44       ` Philippe Blain
  2021-09-09 21:43         ` Johannes Schindelin
  2021-09-10 10:46         ` Johannes Schindelin
  0 siblings, 2 replies; 31+ messages in thread
From: Philippe Blain @ 2021-09-09 12:44 UTC (permalink / raw)
  To: Johannes Schindelin
  Cc: Phillip Wood via GitGitGadget, git, Phillip Wood, Emily Shaffer

Hi Dscho,

Le 2021-09-09 à 06:53, Johannes Schindelin a écrit :
> Hi Philippe,
> 
> On Wed, 8 Sep 2021, Philippe Blain wrote:
> 
>> Anyway, I'm not saying that we should not do what this patch is
>> proposing, but I think caveats such as that should be documented in the
>> commit message, and maybe an audit of other configs that might results
>> in behavioural differences should be done.
> 
> Since this is already a bug in the `apply` backend, it would be even
> better to follow-up with a fix, hint, hint, nudge, nudge ;-)
> 

I'm not sure I understand what you are saying. The fact that 'rebase'
does not pay attention to 'submodule.recurse' is not a bug in my opinion,
it's just a limitation of the current code... Or do you mean something else?

Thanks,
Philippe.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 4/5] rebase -i: don't fork git checkout
  2021-09-09 12:40       ` Philippe Blain
@ 2021-09-09 13:57         ` Phillip Wood
  2021-09-09 15:01           ` Elijah Newren
  0 siblings, 1 reply; 31+ messages in thread
From: Phillip Wood @ 2021-09-09 13:57 UTC (permalink / raw)
  To: Philippe Blain, phillip.wood, Phillip Wood via GitGitGadget, git
  Cc: Emily Shaffer, Johannes Schindelin, Elijah Newren

Hi Philippe

On 09/09/2021 13:40, Philippe Blain wrote:
>>> While in general I think it's a good thing to avoid forking, this
>>> change might result in behavioral differences. Any config that
>>> affects 'git checkout' but not the internal 'reset.c::reset_head'
>>> function might play a role in the rebase UX.
>>>
>>> One that immediately came to mind is 'submodule.recurse'. This
>>> initial 'onto' checkout was pretty much the only part of 'git
>>> rebase' that did something useful for submodules, so it's kind of
>>> sad to see it regress.
>>
>> Thanks for pointing that out. As a non-submodule user my question
>> would be is it actually useful for the initial checkout to work that
>> way if the rest of rebase (and the checkout for the am backend)
>> ignores submodules? reset.c::reset_head() just uses unpack trees like
>> checkout so if rebase read 'submodule.recurse' then reset_head()
>> would work like 'git checkout' and also 'git rebase --abort' and the
>> "reset" command in the todo list would start checking out submodules.

it would also affect fast-forwards

>> I'm reluctant to do that until the merge backend also handles
>> submodules unless there is a good reason that such partial submodule
>> support would help submodule users.
> 
> Yeah, it's not that useful, I have to admit; it can also be very confusing
> since some parts of rebase are affected, and some not. For example, any 
> time
> the rebase stops, like for 'edit', 'break', and when there are 
> conflicts, the
> submodules are not updated. So I think a full solution is better than a 
> partial
> solution; in the meantime I'm thinking the change you are proposing 
> would actually
> be less confusing, even if it slightly changes behaviour...
> 
> As an aside, I *think* reading submodule.recurse in rebase like it's 
> done in checkout
> et al., i.e. something like this:
> 
> diff --git a/builtin/rebase.c b/builtin/rebase.c
> index 33e0961900..125ec907e4 100644
> --- a/builtin/rebase.c
> +++ b/builtin/rebase.c
> @@ -26,6 +26,7 @@
>   #include "rerere.h"
>   #include "branch.h"
>   #include "sequencer.h"
> +#include "submodule.h"
>   #include "rebase-interactive.h"
>   #include "reset.h"
> 
> @@ -1106,6 +1107,9 @@ static int rebase_config(const char *var, const 
> char *value, void *data)
>           return git_config_string(&opts->default_backend, var, value);
>       }
> 
> +    if (!strcmp(var, "submodule.recurse"))
> +        return git_default_submodule_config(var, value, data);

That looks about right to me though I think it would be safer to call 
git_default_submodule_config() for submodule.* rather than just 
submodule.recurse

>       return git_default_config(var, value, data);
>   }
> 
> 
> would actually also affect the merges
> performed during the rebase, since that would affect the "global" state 
> in submodule.c.
> I hacked exactly that the other day but did not test extensively...

merge-ort.c:checkout() which is used by merge_switch_to_result() uses 
unpack_trees() so it will pick up the global state and hopefully should 
just work (cc'ing Elijah to check as I didn't look what happens when 
there are conflicts). merge-recursive.c:update_file_flags() does this 
when updating the work tree

        if (S_ISGITLINK(contents->mode)) { 

                 /* 

                  * We may later decide to recursively descend into 

                  * the submodule directory and update its index 

                  * and/or work tree, but we do not do that now. 

                  */ 

                 update_wd = 0; 

                 goto update_index; 

        } 

 

so it looks like it does not update the submodule directory. Given 
merge-ort is now the default perhaps it's time for rebase (and 
cherry-pick/revert) to start reading the submodule config settings (we 
parse the config before we know if we'll be using merge-ort so I don't 
know how easy it would be to only parse the submodule settings if we are 
using merge-ort).

Best Wishes

Phillip

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 5/5] rebase: remove unused parameter
  2021-09-09 10:54   ` Johannes Schindelin
@ 2021-09-09 14:04     ` Phillip Wood
  0 siblings, 0 replies; 31+ messages in thread
From: Phillip Wood @ 2021-09-09 14:04 UTC (permalink / raw)
  To: Johannes Schindelin, Phillip Wood via GitGitGadget; +Cc: git, Phillip Wood

Hi Dscho

On 09/09/2021 11:54, Johannes Schindelin wrote:
> Hi Phillip,
> 
> On Wed, 8 Sep 2021, Phillip Wood via GitGitGadget wrote:
> 
>> From: Phillip Wood <phillip.wood@dunelm.org.uk>
>>
>> Now that we use reset_head() we don't need to pass orig_head around.
> 
> Does this indicate a change in behavior? When we call `update_ref()` with
> the original `HEAD`, we get some version of safety in that it will fail if
> anything changed the ref in an unexpected way in the meantime.

Good point, it looks like that was overlooked when the am rebase 
starting using reset_head(). There are already too many arguments to 
reset_head(), I'll look at changing it (in a separate commit) to take a 
struct of options instead.

Best Wishes

Phillip

> Ciao,
> Dscho
> 

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 4/5] rebase -i: don't fork git checkout
  2021-09-09 13:57         ` Phillip Wood
@ 2021-09-09 15:01           ` Elijah Newren
  2021-09-10 12:07             ` Philippe Blain
  2021-09-15 15:44             ` Phillip Wood
  0 siblings, 2 replies; 31+ messages in thread
From: Elijah Newren @ 2021-09-09 15:01 UTC (permalink / raw)
  To: Phillip Wood
  Cc: Philippe Blain, Phillip Wood via GitGitGadget, Git Mailing List,
	Emily Shaffer, Johannes Schindelin

Hi,

On Thu, Sep 9, 2021 at 6:57 AM Phillip Wood <phillip.wood123@gmail.com> wrote:
>
> Hi Philippe
>
> On 09/09/2021 13:40, Philippe Blain wrote:
> >>> While in general I think it's a good thing to avoid forking, this
> >>> change might result in behavioral differences. Any config that
> >>> affects 'git checkout' but not the internal 'reset.c::reset_head'
> >>> function might play a role in the rebase UX.
> >>>
> >>> One that immediately came to mind is 'submodule.recurse'. This
> >>> initial 'onto' checkout was pretty much the only part of 'git
> >>> rebase' that did something useful for submodules, so it's kind of
> >>> sad to see it regress.
> >>
> >> Thanks for pointing that out. As a non-submodule user my question
> >> would be is it actually useful for the initial checkout to work that
> >> way if the rest of rebase (and the checkout for the am backend)
> >> ignores submodules? reset.c::reset_head() just uses unpack trees like
> >> checkout so if rebase read 'submodule.recurse' then reset_head()
> >> would work like 'git checkout' and also 'git rebase --abort' and the
> >> "reset" command in the todo list would start checking out submodules.
>
> it would also affect fast-forwards
>
> >> I'm reluctant to do that until the merge backend also handles
> >> submodules unless there is a good reason that such partial submodule
> >> support would help submodule users.
> >
> > Yeah, it's not that useful, I have to admit; it can also be very confusing
> > since some parts of rebase are affected, and some not. For example, any
> > time
> > the rebase stops, like for 'edit', 'break', and when there are
> > conflicts, the
> > submodules are not updated. So I think a full solution is better than a
> > partial
> > solution; in the meantime I'm thinking the change you are proposing
> > would actually
> > be less confusing, even if it slightly changes behaviour...
> >
> > As an aside, I *think* reading submodule.recurse in rebase like it's
> > done in checkout
> > et al., i.e. something like this:
> >
> > diff --git a/builtin/rebase.c b/builtin/rebase.c
> > index 33e0961900..125ec907e4 100644
> > --- a/builtin/rebase.c
> > +++ b/builtin/rebase.c
> > @@ -26,6 +26,7 @@
> >   #include "rerere.h"
> >   #include "branch.h"
> >   #include "sequencer.h"
> > +#include "submodule.h"
> >   #include "rebase-interactive.h"
> >   #include "reset.h"
> >
> > @@ -1106,6 +1107,9 @@ static int rebase_config(const char *var, const
> > char *value, void *data)
> >           return git_config_string(&opts->default_backend, var, value);
> >       }
> >
> > +    if (!strcmp(var, "submodule.recurse"))
> > +        return git_default_submodule_config(var, value, data);
>
> That looks about right to me though I think it would be safer to call
> git_default_submodule_config() for submodule.* rather than just
> submodule.recurse
>
> >       return git_default_config(var, value, data);
> >   }
> >
> >
> > would actually also affect the merges
> > performed during the rebase, since that would affect the "global" state
> > in submodule.c.
> > I hacked exactly that the other day but did not test extensively...
>
> merge-ort.c:checkout() which is used by merge_switch_to_result() uses
> unpack_trees() so it will pick up the global state and hopefully should
> just work (cc'ing Elijah to check as I didn't look what happens when
> there are conflicts).

Yep, merge-ort was designed to just piggy back on checkout code.  The
checkout() function was basically just code lifted from
builtin/checkout.c.  Using that code means that merges now also
benefit from all the special working tree handling that is encoded
into git-checkout -- whether that's parallel checkout, submodule
handling, tricky D/F switches or symlink handling, etc.  In contrast
to merge-recursive, it does not need hundreds and hundreds of lines of
special worktree updating code sprayed all over the codebase.

Conflicts are not special in this regard; merge-ort creates a tree
which has files that include conflict markers, and then merge-ort
calls checkout() to switch the working copy over to that tree.

The only issue conflicts present for merge-ort, is that AFTER it has
checked out that special tree with conflict markers, it then has to go
and touch up the index afterwards to replace the entries for
conflicted files with multiple higher order stages.  (You could say
that merge-recursive is "index-first", since its design focuses on the
index -- updating it first and then figuring out everything else like
updating the working tree with special code afterwards.  In contrast,
merge-ort ignores the index entirely until the very end -- after a new
merge tree is created and after the working tree is updated.)

> merge-recursive.c:update_file_flags() does this
> when updating the work tree
>
>         if (S_ISGITLINK(contents->mode)) {
>
>                  /*
>
>                   * We may later decide to recursively descend into
>
>                   * the submodule directory and update its index
>
>                   * and/or work tree, but we do not do that now.
>
>                   */
>
>                  update_wd = 0;
>
>                  goto update_index;
>
>         }
>
>
>
> so it looks like it does not update the submodule directory. Given
> merge-ort is now the default perhaps it's time for rebase (and
> cherry-pick/revert) to start reading the submodule config settings (we
> parse the config before we know if we'll be using merge-ort so I don't
> know how easy it would be to only parse the submodule settings if we are
> using merge-ort).

I'd just parse any needed config in all cases.  The submodule settings
aren't going to hurt merge-recursive; it'll just ignore them.  (Or are
you worried about a mix-and-match of rebase calling both checkout and
merge code doing weird things, and you'd rather not have the checkout
bits update submodules if the merges won't?)

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 4/5] rebase -i: don't fork git checkout
  2021-09-08 13:41 ` [PATCH 4/5] rebase -i: don't fork git checkout Phillip Wood via GitGitGadget
  2021-09-08 18:14   ` Philippe Blain
@ 2021-09-09 15:03   ` Elijah Newren
  1 sibling, 0 replies; 31+ messages in thread
From: Elijah Newren @ 2021-09-09 15:03 UTC (permalink / raw)
  To: Phillip Wood via GitGitGadget; +Cc: Git Mailing List, Phillip Wood

On Wed, Sep 8, 2021 at 6:44 AM Phillip Wood via GitGitGadget
<gitgitgadget@gmail.com> wrote:
>
> From: Phillip Wood <phillip.wood@dunelm.org.uk>
>
> The "apply" based rebase has avoided forking git checkout since
> ac7f467fef ("builtin/rebase: support running "git rebase <upstream>"",
> 2018-08-07). The code that handles the checkout was moved into libgit
> by b309a97108 ("reset: extract reset_head() from rebase", 2020-04-07)
> so lets start using it for the "merge" based rebase as well. This
> opens the way for us to stop calling the post-checkout hook in the
> future.

Wahoo!  So exciting to see this, and for the future plans mentioned here.  :-)

> Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
> ---
>  sequencer.c | 29 ++++-------------------------
>  1 file changed, 4 insertions(+), 25 deletions(-)
>
> diff --git a/sequencer.c b/sequencer.c
> index d51440ddcd9..1a9dbc70d3c 100644
> --- a/sequencer.c
> +++ b/sequencer.c
> @@ -4192,42 +4192,21 @@ int apply_autostash_oid(const char *stash_oid)
>         return apply_save_autostash_oid(stash_oid, 1);
>  }
>
> -static int run_git_checkout(struct repository *r, struct replay_opts *opts,
> -                           const char *commit, const char *action)
> -{
> -       struct child_process cmd = CHILD_PROCESS_INIT;
> -       int ret;
> -
> -       cmd.git_cmd = 1;
> -
> -       strvec_push(&cmd.args, "checkout");
> -       strvec_push(&cmd.args, commit);
> -       strvec_pushf(&cmd.env_array, GIT_REFLOG_ACTION "=%s", action);
> -
> -       if (opts->verbose)
> -               ret = run_command(&cmd);
> -       else
> -               ret = run_command_silent_on_success(&cmd);
> -
> -       if (!ret)
> -               discard_index(r->index);
> -
> -       return ret;
> -}
> -
>  static int checkout_onto(struct repository *r, struct replay_opts *opts,
>                          const char *onto_name, const struct object_id *onto,
>                          const struct object_id *orig_head)
>  {
>         const char *action = reflog_message(opts, "start", "checkout %s", onto_name);
>
> -       if (run_git_checkout(r, opts, oid_to_hex(onto), action)) {
> +       if (reset_head(r, onto, "checkout", NULL, RESET_HEAD_DETACH |
> +                      RESET_ORIG_HEAD | RESET_HEAD_RUN_POST_CHECKOUT_HOOK,
> +                      NULL, action, "rebase")) {
>                 apply_autostash(rebase_path_autostash());
>                 sequencer_remove_state(opts);
>                 return error(_("could not detach HEAD"));
>         }
>
> -       return update_ref(NULL, "ORIG_HEAD", orig_head, NULL, 0, UPDATE_REFS_MSG_ON_ERR);
> +       return 0;
>  }
>
>  static int stopped_at_head(struct repository *r)
> --
> gitgitgadget

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 4/5] rebase -i: don't fork git checkout
  2021-09-09 12:44       ` Philippe Blain
@ 2021-09-09 21:43         ` Johannes Schindelin
  2021-09-10 10:46         ` Johannes Schindelin
  1 sibling, 0 replies; 31+ messages in thread
From: Johannes Schindelin @ 2021-09-09 21:43 UTC (permalink / raw)
  To: Philippe Blain
  Cc: Phillip Wood via GitGitGadget, git, Phillip Wood, Emily Shaffer

[-- Attachment #1: Type: text/plain, Size: 1166 bytes --]

Hi Philippe,

On Thu, 9 Sep 2021, Philippe Blain wrote:

> Le 2021-09-09 à 06:53, Johannes Schindelin a écrit :
> >
> > On Wed, 8 Sep 2021, Philippe Blain wrote:
> >
> > > Anyway, I'm not saying that we should not do what this patch is
> > > proposing, but I think caveats such as that should be documented in the
> > > commit message, and maybe an audit of other configs that might results
> > > in behavioural differences should be done.
> >
> > Since this is already a bug in the `apply` backend, it would be even
> > better to follow-up with a fix, hint, hint, nudge, nudge ;-)
>
> I'm not sure I understand what you are saying.

I am saying that indeed, you found what I consider a bug, but it is
already present in the `apply` backend. And then I am hoping that you
could find the time to fix it ;-)

> The fact that 'rebase' does not pay attention to 'submodule.recurse' is
> not a bug in my opinion, it's just a limitation of the current code...

But the code that spawned `git checkout` _did_ pay attention to the
`submodule.*` settings, no?

Ciao,
Dscho

> Or do you mean something else?
>
> Thanks,
> Philippe.
>
>

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 4/5] rebase -i: don't fork git checkout
  2021-09-09 12:44       ` Philippe Blain
  2021-09-09 21:43         ` Johannes Schindelin
@ 2021-09-10 10:46         ` Johannes Schindelin
  2021-09-10 11:58           ` Philippe Blain
  1 sibling, 1 reply; 31+ messages in thread
From: Johannes Schindelin @ 2021-09-10 10:46 UTC (permalink / raw)
  To: Philippe Blain
  Cc: Phillip Wood via GitGitGadget, git, Phillip Wood, Emily Shaffer

[-- Attachment #1: Type: text/plain, Size: 1013 bytes --]

Hi Philippe,

On Thu, 9 Sep 2021, Philippe Blain wrote:

> Le 2021-09-09 à 06:53, Johannes Schindelin a écrit :
> >
> > On Wed, 8 Sep 2021, Philippe Blain wrote:
> >
> > > Anyway, I'm not saying that we should not do what this patch is
> > > proposing, but I think caveats such as that should be documented in the
> > > commit message, and maybe an audit of other configs that might results
> > > in behavioural differences should be done.
> >
> > Since this is already a bug in the `apply` backend, it would be even
> > better to follow-up with a fix, hint, hint, nudge, nudge ;-)
> >
>
> I'm not sure I understand what you are saying. The fact that 'rebase'
> does not pay attention to 'submodule.recurse' is not a bug in my opinion,
> it's just a limitation of the current code... Or do you mean something else?

I must have misunderstood you. I thought you were saying that Phillip's
patch introduces the regression where `submodule.recurse` is no longer
respected.

Ciao,
Dscho

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 4/5] rebase -i: don't fork git checkout
  2021-09-10 10:46         ` Johannes Schindelin
@ 2021-09-10 11:58           ` Philippe Blain
  0 siblings, 0 replies; 31+ messages in thread
From: Philippe Blain @ 2021-09-10 11:58 UTC (permalink / raw)
  To: Johannes Schindelin
  Cc: Phillip Wood via GitGitGadget, git, Phillip Wood, Emily Shaffer

Hi Dscho,

Le 2021-09-10 à 06:46, Johannes Schindelin a écrit :
> Hi Philippe,
> 
> On Thu, 9 Sep 2021, Philippe Blain wrote:
> 
>> Le 2021-09-09 à 06:53, Johannes Schindelin a écrit :
>>>
>>> On Wed, 8 Sep 2021, Philippe Blain wrote:
>>>
>>>> Anyway, I'm not saying that we should not do what this patch is
>>>> proposing, but I think caveats such as that should be documented in the
>>>> commit message, and maybe an audit of other configs that might results
>>>> in behavioural differences should be done.
>>>
>>> Since this is already a bug in the `apply` backend, it would be even
>>> better to follow-up with a fix, hint, hint, nudge, nudge ;-)
>>>
>>
>> I'm not sure I understand what you are saying. The fact that 'rebase'
>> does not pay attention to 'submodule.recurse' is not a bug in my opinion,
>> it's just a limitation of the current code... Or do you mean something else?
> 
> I must have misunderstood you. I thought you were saying that Phillip's
> patch introduces the regression where `submodule.recurse` is no longer
> respected.
> 
> Ciao,
> Dscho
> 

Well it does, but only for the initial checkout of 'onto'. But as I wrote in
[1], I think that half respecting 'submodule.recurse' is confusing UX. So
I would think that it is not *that* bad to keep Phillip's patch as-is in the meantime
i.e. while we wait for 'git rebase --recurse-submodules' to materialize. I think
that it would not be good UI either to have rebase respect 'submodule.recurse',
but not having a --recurse-submodules flag like all other commands that honor
that config.

Thanks,
Philippe.

P.S. I did not CC you in [1], should I have? What's the etiquette around this,
i.e. should I manually add CC's for people involved in "parallel" threads in
addition to everyone I get by doing "reply-all" ?

1. https://lore.kernel.org/git/pull.1034.git.1631108472.gitgitgadget@gmail.com/T/#m182b8d5f24b41c2ff8e919819229974d71258cd9

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 4/5] rebase -i: don't fork git checkout
  2021-09-09 15:01           ` Elijah Newren
@ 2021-09-10 12:07             ` Philippe Blain
  2021-09-15 15:44             ` Phillip Wood
  1 sibling, 0 replies; 31+ messages in thread
From: Philippe Blain @ 2021-09-10 12:07 UTC (permalink / raw)
  To: Elijah Newren, Phillip Wood
  Cc: Phillip Wood via GitGitGadget, Git Mailing List, Emily Shaffer,
	Johannes Schindelin

Hi Elijah,

Le 2021-09-09 à 11:01, Elijah Newren a écrit :
> Hi,
> 
> On Thu, Sep 9, 2021 at 6:57 AM Phillip Wood <phillip.wood123@gmail.com> wrote:
>>
<snip>
>> merge-recursive.c:update_file_flags() does this
>> when updating the work tree
>>
>>          if (S_ISGITLINK(contents->mode)) {
>>
>>                   /*
>>
>>                    * We may later decide to recursively descend into
>>
>>                    * the submodule directory and update its index
>>
>>                    * and/or work tree, but we do not do that now.
>>
>>                    */
>>
>>                   update_wd = 0;
>>
>>                   goto update_index;
>>
>>          }
>>
>>
>>
>> so it looks like it does not update the submodule directory. Given
>> merge-ort is now the default perhaps it's time for rebase (and
>> cherry-pick/revert) to start reading the submodule config settings (we
>> parse the config before we know if we'll be using merge-ort so I don't
>> know how easy it would be to only parse the submodule settings if we are
>> using merge-ort).
> 
> I'd just parse any needed config in all cases.  The submodule settings
> aren't going to hurt merge-recursive; it'll just ignore them.  (Or are
> you worried about a mix-and-match of rebase calling both checkout and
> merge code doing weird things, and you'd rather not have the checkout
> bits update submodules if the merges won't?)
> 

Thanks for your input. I agree that reading the config in all cases would
be simpler. We could even decide that since ort is the new default, the
submodule support will not be "backported" to merge recursive (that would
be way simpler to implement, I think) This way we can just document it as
such and be done with it. But anyway, I think this is kind of
orthogonal to this here series and should be done separately.

Cheers,
Philippe.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 4/5] rebase -i: don't fork git checkout
  2021-09-09 15:01           ` Elijah Newren
  2021-09-10 12:07             ` Philippe Blain
@ 2021-09-15 15:44             ` Phillip Wood
  1 sibling, 0 replies; 31+ messages in thread
From: Phillip Wood @ 2021-09-15 15:44 UTC (permalink / raw)
  To: Elijah Newren, Phillip Wood
  Cc: Philippe Blain, Phillip Wood via GitGitGadget, Git Mailing List,
	Emily Shaffer, Johannes Schindelin

Hi Elijah

On 09/09/2021 16:01, Elijah Newren wrote:
> [...]
>> merge-ort.c:checkout() which is used by merge_switch_to_result() uses
>> unpack_trees() so it will pick up the global state and hopefully should
>> just work (cc'ing Elijah to check as I didn't look what happens when
>> there are conflicts).
> 
> Yep, merge-ort was designed to just piggy back on checkout code.  The
> checkout() function was basically just code lifted from
> builtin/checkout.c.  Using that code means that merges now also
> benefit from all the special working tree handling that is encoded
> into git-checkout -- whether that's parallel checkout, submodule
> handling, tricky D/F switches or symlink handling, etc.  In contrast
> to merge-recursive, it does not need hundreds and hundreds of lines of
> special worktree updating code sprayed all over the codebase.
> 
> Conflicts are not special in this regard; merge-ort creates a tree
> which has files that include conflict markers, and then merge-ort
> calls checkout() to switch the working copy over to that tree.
> 
> The only issue conflicts present for merge-ort, is that AFTER it has
> checked out that special tree with conflict markers, it then has to go
> and touch up the index afterwards to replace the entries for
> conflicted files with multiple higher order stages.  (You could say
> that merge-recursive is "index-first", since its design focuses on the
> index -- updating it first and then figuring out everything else like
> updating the working tree with special code afterwards.  In contrast,
> merge-ort ignores the index entirely until the very end -- after a new
> merge tree is created and after the working tree is updated.)

Thanks for explaining, it's a nice design feature that you can just reuse
the checkout code to update the working copy with the merge result

>> merge-recursive.c:update_file_flags() does this
>> when updating the work tree
>>
>>          if (S_ISGITLINK(contents->mode)) {
>>                   /*
>>                    * We may later decide to recursively descend into
>>                    * the submodule directory and update its index
>>                    * and/or work tree, but we do not do that now.
>>                    */
>>                   update_wd = 0;
>>                   goto update_index;
>>          }
>>
>> so it looks like it does not update the submodule directory. Given
>> merge-ort is now the default perhaps it's time for rebase (and
>> cherry-pick/revert) to start reading the submodule config settings (we
>> parse the config before we know if we'll be using merge-ort so I don't
>> know how easy it would be to only parse the submodule settings if we are
>> using merge-ort).
> 
> I'd just parse any needed config in all cases.  The submodule settings
> aren't going to hurt merge-recursive; it'll just ignore them.  (Or are
> you worried about a mix-and-match of rebase calling both checkout and
> merge code doing weird things, and you'd rather not have the checkout
> bits update submodules if the merges won't?)

I'd rather just parse the config when we know submodules are going to be
rebased, I think it's confusing if some bit work and others don't. I've
tried the diff below locally, but t7402-submodule-rebase.sh does not show
any change (I was hoping some text_expect_failure would be fixed) so I'm
not sure if it's working or not and I ran out of time.

Best Wishes

Phillip

--- >8 ---
diff --git a/builtin/rebase.c b/builtin/rebase.c
index eed70168df..a35a9e3460 100644
--- a/builtin/rebase.c
+++ b/builtin/rebase.c
@@ -26,6 +26,7 @@
  #include "rerere.h"
  #include "branch.h"
  #include "sequencer.h"
+#include "submodule.h"
  #include "rebase-interactive.h"
  #include "reset.h"
  
@@ -1114,6 +1115,10 @@ static int rebase_config(const char *var, const char *value, void *data)
                 return git_config_string(&opts->default_backend, var, value);
         }
  
+       if (starts_with(var, "submodule.")) {
+               return git_default_submodule_config(var, value, NULL);
+       }
+
         return git_default_config(var, value, data);
  }
  
@@ -1820,6 +1825,12 @@ int cmd_rebase(int argc, const char **argv, const char *prefix)
             getenv("GIT_TEST_MERGE_ALGORITHM"))
                 options.strategy = xstrdup(getenv("GIT_TEST_MERGE_ALGORITHM"));
  
+       /* only the "ort" merge strategy handles submodules correctly */
+       if (!is_merge(&options) ||
+           (options.strategy && strcmp(options.strategy, "ort")))
+               git_default_submodule_config("submodule.recurse", "false",
+                                            NULL);
+
         switch (options.type) {
         case REBASE_MERGE:
         case REBASE_PRESERVE_MERGES:
  

^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH v2 0/2] rebase -i: a couple of small improvements
  2021-09-08 13:41 [PATCH 0/5] rebase -i: a couple of small improvements Phillip Wood via GitGitGadget
                   ` (4 preceding siblings ...)
  2021-09-08 13:41 ` [PATCH 5/5] rebase: remove unused parameter Phillip Wood via GitGitGadget
@ 2021-09-23 15:26 ` Phillip Wood via GitGitGadget
  2021-09-23 15:26   ` [PATCH v2 1/2] sequencer.c: factor out a function Phillip Wood via GitGitGadget
                     ` (2 more replies)
  5 siblings, 3 replies; 31+ messages in thread
From: Phillip Wood via GitGitGadget @ 2021-09-23 15:26 UTC (permalink / raw)
  To: git
  Cc: Eric Sunshine, Philippe Blain, Phillip Wood, Johannes Schindelin,
	Elijah Newren, Phillip Wood

Thanks for the feedback on V1. I have decided to split this series so I'm
just posting a re-roll of the first two patches here. The only change is to
reword the commit message of the first patch as suggested by Eric and Dscho

Cover letter for V1: Fix the re-reading of the todo list after an exec or
reword command and stop forking "git checkout" when checking out "onto"

Phillip Wood (2):
  sequencer.c: factor out a function
  rebase: fix todo-list rereading

 sequencer.c | 47 +++++++++++++++++++++++++++--------------------
 sequencer.h |  1 -
 2 files changed, 27 insertions(+), 21 deletions(-)


base-commit: 66262451ec94d30ac4b80eb3123549cf7a788afd
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1034%2Fphillipwood%2Fwip%2Frebase-reread-todo-v2
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1034/phillipwood/wip/rebase-reread-todo-v2
Pull-Request: https://github.com/gitgitgadget/git/pull/1034

Range-diff vs v1:

 1:  53cde4825b4 ! 1:  98ebefc140e sequencer.c: factor out a function
     @@ Commit message
          sequencer.c: factor out a function
      
          This code is heavily indented and obscures the high level logic within
     -    the loop. Lets move it to its own function before modifying it in the
     -    next commit.
     +    the loop. Let's move it to its own function before modifying it in the
     +    next commit. Note that there is a subtle change in behavior if the
     +    todo list cannot be reread. Previously todo_list->current was
     +    incremented before returning, now it returns immediately.
      
          Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
      
 2:  3b17a4e3d3f = 2:  0d89c506192 rebase: fix todo-list rereading
 3:  614555fc10f < -:  ----------- reset_head(): mark oid parameter as const
 4:  39ad40c9297 < -:  ----------- rebase -i: don't fork git checkout
 5:  c8a92d4242b < -:  ----------- rebase: remove unused parameter

-- 
gitgitgadget

^ permalink raw reply	[flat|nested] 31+ messages in thread

* [PATCH v2 1/2] sequencer.c: factor out a function
  2021-09-23 15:26 ` [PATCH v2 0/2] rebase -i: a couple of small improvements Phillip Wood via GitGitGadget
@ 2021-09-23 15:26   ` Phillip Wood via GitGitGadget
  2021-09-23 15:26   ` [PATCH v2 2/2] rebase: fix todo-list rereading Phillip Wood via GitGitGadget
  2021-09-24 19:24   ` [PATCH v2 0/2] rebase -i: a couple of small improvements Junio C Hamano
  2 siblings, 0 replies; 31+ messages in thread
From: Phillip Wood via GitGitGadget @ 2021-09-23 15:26 UTC (permalink / raw)
  To: git
  Cc: Eric Sunshine, Philippe Blain, Phillip Wood, Johannes Schindelin,
	Elijah Newren, Phillip Wood, Phillip Wood

From: Phillip Wood <phillip.wood@dunelm.org.uk>

This code is heavily indented and obscures the high level logic within
the loop. Let's move it to its own function before modifying it in the
next commit. Note that there is a subtle change in behavior if the
todo list cannot be reread. Previously todo_list->current was
incremented before returning, now it returns immediately.

Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
---
 sequencer.c | 38 ++++++++++++++++++++++++--------------
 1 file changed, 24 insertions(+), 14 deletions(-)

diff --git a/sequencer.c b/sequencer.c
index 7f07cd00f3f..a248c886c27 100644
--- a/sequencer.c
+++ b/sequencer.c
@@ -4254,6 +4254,27 @@ static int stopped_at_head(struct repository *r)
 
 }
 
+static int reread_todo_if_changed(struct repository *r,
+				  struct todo_list *todo_list,
+				  struct replay_opts *opts)
+{
+	struct stat st;
+
+	if (stat(get_todo_path(opts), &st)) {
+		return error_errno(_("could not stat '%s'"),
+				   get_todo_path(opts));
+	} else if (match_stat_data(&todo_list->stat, &st)) {
+		/* Reread the todo file if it has changed. */
+		todo_list_release(todo_list);
+		if (read_populate_todo(r, todo_list, opts))
+			return -1; /* message was printed */
+		/* `current` will be incremented on return */
+		todo_list->current = -1;
+	}
+
+	return 0;
+}
+
 static const char rescheduled_advice[] =
 N_("Could not execute the todo command\n"
 "\n"
@@ -4433,20 +4454,9 @@ static int pick_commits(struct repository *r,
 							item->commit,
 							arg, item->arg_len,
 							opts, res, 0);
-		} else if (is_rebase_i(opts) && check_todo && !res) {
-			struct stat st;
-
-			if (stat(get_todo_path(opts), &st)) {
-				res = error_errno(_("could not stat '%s'"),
-						  get_todo_path(opts));
-			} else if (match_stat_data(&todo_list->stat, &st)) {
-				/* Reread the todo file if it has changed. */
-				todo_list_release(todo_list);
-				if (read_populate_todo(r, todo_list, opts))
-					res = -1; /* message was printed */
-				/* `current` will be incremented below */
-				todo_list->current = -1;
-			}
+		} else if (is_rebase_i(opts) && check_todo && !res &&
+			   reread_todo_if_changed(r, todo_list, opts)) {
+			return -1;
 		}
 
 		todo_list->current++;
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH v2 2/2] rebase: fix todo-list rereading
  2021-09-23 15:26 ` [PATCH v2 0/2] rebase -i: a couple of small improvements Phillip Wood via GitGitGadget
  2021-09-23 15:26   ` [PATCH v2 1/2] sequencer.c: factor out a function Phillip Wood via GitGitGadget
@ 2021-09-23 15:26   ` Phillip Wood via GitGitGadget
  2021-09-24 16:13     ` Junio C Hamano
  2021-09-24 19:24   ` [PATCH v2 0/2] rebase -i: a couple of small improvements Junio C Hamano
  2 siblings, 1 reply; 31+ messages in thread
From: Phillip Wood via GitGitGadget @ 2021-09-23 15:26 UTC (permalink / raw)
  To: git
  Cc: Eric Sunshine, Philippe Blain, Phillip Wood, Johannes Schindelin,
	Elijah Newren, Phillip Wood, Phillip Wood

From: Phillip Wood <phillip.wood@dunelm.org.uk>

54fd3243da ("rebase -i: reread the todo list if `exec` touched it",
2017-04-26) sought to reread the todo list after running an exec
command only if it had been changed. To accomplish this it checks the
stat data of the todo list after running an exec command to see if it
has changed. Unfortunately there are two problems, firstly the
implementation is buggy we actually reread the list after each exec
which is quadratic in the number of commit lookups and secondly the
design is predicated on using nanosecond time stamps which are not the
default.

The implementation bug stems from the fact that we write a new todo
list to disk before running each command but do not update the stat
data to reflect this[1].

The design problem is that it is possible for the user to edit the
todo list without changing its size or inode which means we have to
rely on the mtime to tell us if it has changed. Unfortunately unless
git is built with USE_NSEC it is possible for the original and edited
list to share the same mtime.

Ideally "git rebase --edit-todo" would set a flag that we would then
check in sequencer.c. Unfortunately this is approach will not work as
there are scripts in the wild that write to the todo list directly
without running "git rebase --edit-todo". Instead of relying on stat
data this patch simply reads the possibly edited todo list and
compares it to the original with memcmp(). This is much faster than
reparsing the todo list each time. This patch reduces the time to run

   git rebase -r -xtrue v2.32.0~100 v2.32.0

which runs 419 exec commands by 6.6%. For comparison fixing the
implementation bug in stat based approach reduces the time by a
further 1.4% and is indistinguishable from never rereading the todo
list.

[1] https://lore.kernel.org/git/20191125131833.GD23183@szeder.dev/

Reported-by: SZEDER Gábor <szeder.dev@gmail.com>
Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
---
 sequencer.c | 19 ++++++++-----------
 sequencer.h |  1 -
 2 files changed, 8 insertions(+), 12 deletions(-)

diff --git a/sequencer.c b/sequencer.c
index a248c886c27..d51440ddcd9 100644
--- a/sequencer.c
+++ b/sequencer.c
@@ -2671,7 +2671,6 @@ static int read_populate_todo(struct repository *r,
 			      struct todo_list *todo_list,
 			      struct replay_opts *opts)
 {
-	struct stat st;
 	const char *todo_file = get_todo_path(opts);
 	int res;
 
@@ -2679,11 +2678,6 @@ static int read_populate_todo(struct repository *r,
 	if (strbuf_read_file_or_whine(&todo_list->buf, todo_file) < 0)
 		return -1;
 
-	res = stat(todo_file, &st);
-	if (res)
-		return error(_("could not stat '%s'"), todo_file);
-	fill_stat_data(&todo_list->stat, &st);
-
 	res = todo_list_parse_insn_buffer(r, todo_list->buf.buf, todo_list);
 	if (res) {
 		if (is_rebase_i(opts))
@@ -4258,12 +4252,14 @@ static int reread_todo_if_changed(struct repository *r,
 				  struct todo_list *todo_list,
 				  struct replay_opts *opts)
 {
-	struct stat st;
+	int offset;
+	struct strbuf buf = STRBUF_INIT;
 
-	if (stat(get_todo_path(opts), &st)) {
-		return error_errno(_("could not stat '%s'"),
-				   get_todo_path(opts));
-	} else if (match_stat_data(&todo_list->stat, &st)) {
+	if (strbuf_read_file_or_whine(&buf, get_todo_path(opts)) < 0)
+		return -1;
+	offset = get_item_line_offset(todo_list, todo_list->current + 1);
+	if (buf.len != todo_list->buf.len - offset ||
+	    memcmp(buf.buf, todo_list->buf.buf + offset, buf.len)) {
 		/* Reread the todo file if it has changed. */
 		todo_list_release(todo_list);
 		if (read_populate_todo(r, todo_list, opts))
@@ -4271,6 +4267,7 @@ static int reread_todo_if_changed(struct repository *r,
 		/* `current` will be incremented on return */
 		todo_list->current = -1;
 	}
+	strbuf_release(&buf);
 
 	return 0;
 }
diff --git a/sequencer.h b/sequencer.h
index d57d8ea23d7..cdeb0c6be47 100644
--- a/sequencer.h
+++ b/sequencer.h
@@ -116,7 +116,6 @@ struct todo_list {
 	struct todo_item *items;
 	int nr, alloc, current;
 	int done_nr, total_nr;
-	struct stat_data stat;
 };
 
 #define TODO_LIST_INIT { STRBUF_INIT }
-- 
gitgitgadget

^ permalink raw reply related	[flat|nested] 31+ messages in thread

* Re: [PATCH v2 2/2] rebase: fix todo-list rereading
  2021-09-23 15:26   ` [PATCH v2 2/2] rebase: fix todo-list rereading Phillip Wood via GitGitGadget
@ 2021-09-24 16:13     ` Junio C Hamano
  2021-09-28 10:20       ` Phillip Wood
  0 siblings, 1 reply; 31+ messages in thread
From: Junio C Hamano @ 2021-09-24 16:13 UTC (permalink / raw)
  To: Phillip Wood via GitGitGadget
  Cc: git, Eric Sunshine, Philippe Blain, Phillip Wood,
	Johannes Schindelin, Elijah Newren, Phillip Wood

"Phillip Wood via GitGitGadget" <gitgitgadget@gmail.com> writes:

> .... Instead of relying on stat
> data this patch simply reads the possibly edited todo list and
> compares it to the original with memcmp(). This is much faster than
> reparsing the todo list each time.

Nice.  Is that an egg of Columbus or what ;-)

> +	if (strbuf_read_file_or_whine(&buf, get_todo_path(opts)) < 0)
> +		return -1;
> +	offset = get_item_line_offset(todo_list, todo_list->current + 1);
> +	if (buf.len != todo_list->buf.len - offset ||
> +	    memcmp(buf.buf, todo_list->buf.buf + offset, buf.len)) {
>  		/* Reread the todo file if it has changed. */
>  		todo_list_release(todo_list);
>  		if (read_populate_todo(r, todo_list, opts))

As we already have the contents of hte file in the buffer, we could
further refactor the code around read_populate_todo() to tell it not
to reopen and reread the rebase-todo file (which risks toctou race),
but that is OK for now, I would think.

> @@ -4271,6 +4267,7 @@ static int reread_todo_if_changed(struct repository *r,
>  		/* `current` will be incremented on return */
>  		todo_list->current = -1;
>  	}
> +	strbuf_release(&buf);
>  
>  	return 0;
>  }
> diff --git a/sequencer.h b/sequencer.h
> index d57d8ea23d7..cdeb0c6be47 100644
> --- a/sequencer.h
> +++ b/sequencer.h
> @@ -116,7 +116,6 @@ struct todo_list {
>  	struct todo_item *items;
>  	int nr, alloc, current;
>  	int done_nr, total_nr;
> -	struct stat_data stat;

Good riddance ;-)

Will queue.  Thanks.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v2 0/2] rebase -i: a couple of small improvements
  2021-09-23 15:26 ` [PATCH v2 0/2] rebase -i: a couple of small improvements Phillip Wood via GitGitGadget
  2021-09-23 15:26   ` [PATCH v2 1/2] sequencer.c: factor out a function Phillip Wood via GitGitGadget
  2021-09-23 15:26   ` [PATCH v2 2/2] rebase: fix todo-list rereading Phillip Wood via GitGitGadget
@ 2021-09-24 19:24   ` Junio C Hamano
  2 siblings, 0 replies; 31+ messages in thread
From: Junio C Hamano @ 2021-09-24 19:24 UTC (permalink / raw)
  To: Phillip Wood via GitGitGadget
  Cc: git, Eric Sunshine, Philippe Blain, Phillip Wood,
	Johannes Schindelin, Elijah Newren, Phillip Wood

"Phillip Wood via GitGitGadget" <gitgitgadget@gmail.com> writes:

> Thanks for the feedback on V1. I have decided to split this series so I'm
> just posting a re-roll of the first two patches here. The only change is to
> reword the commit message of the first patch as suggested by Eric and Dscho

OK.  Prioritizing the fix and leaving the add-on part that is not
yet solid for later makes perfect sense.


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v2 2/2] rebase: fix todo-list rereading
  2021-09-24 16:13     ` Junio C Hamano
@ 2021-09-28 10:20       ` Phillip Wood
  0 siblings, 0 replies; 31+ messages in thread
From: Phillip Wood @ 2021-09-28 10:20 UTC (permalink / raw)
  To: Junio C Hamano, Phillip Wood via GitGitGadget
  Cc: git, Eric Sunshine, Philippe Blain, Johannes Schindelin,
	Elijah Newren, Phillip Wood

On 24/09/2021 17:13, Junio C Hamano wrote:
> "Phillip Wood via GitGitGadget" <gitgitgadget@gmail.com> writes:
> 
>> .... Instead of relying on stat
>> data this patch simply reads the possibly edited todo list and
>> compares it to the original with memcmp(). This is much faster than
>> reparsing the todo list each time.
> 
> Nice.  Is that an egg of Columbus or what ;-)
> 
>> +	if (strbuf_read_file_or_whine(&buf, get_todo_path(opts)) < 0)
>> +		return -1;
>> +	offset = get_item_line_offset(todo_list, todo_list->current + 1);
>> +	if (buf.len != todo_list->buf.len - offset ||
>> +	    memcmp(buf.buf, todo_list->buf.buf + offset, buf.len)) {
>>   		/* Reread the todo file if it has changed. */
>>   		todo_list_release(todo_list);
>>   		if (read_populate_todo(r, todo_list, opts))
> 
> As we already have the contents of hte file in the buffer, we could
> further refactor the code around read_populate_todo() to tell it not
> to reopen and reread the rebase-todo file (which risks toctou race),
> but that is OK for now, I would think.

I did wonder about doing that but decided to punt it for now, I don't 
think the race is a concern - if another process is writing to the todo 
list while rebase is picking commits it is asking for trouble already. I 
suspect doing this will be easier once git-rebase--preserve-merges.sh is 
gone from master.

>> @@ -4271,6 +4267,7 @@ static int reread_todo_if_changed(struct repository *r,
>>   		/* `current` will be incremented on return */
>>   		todo_list->current = -1;
>>   	}
>> +	strbuf_release(&buf);
>>   
>>   	return 0;
>>   }
>> diff --git a/sequencer.h b/sequencer.h
>> index d57d8ea23d7..cdeb0c6be47 100644
>> --- a/sequencer.h
>> +++ b/sequencer.h
>> @@ -116,7 +116,6 @@ struct todo_list {
>>   	struct todo_item *items;
>>   	int nr, alloc, current;
>>   	int done_nr, total_nr;
>> -	struct stat_data stat;
> 
> Good riddance ;-)

Hear, hear!

> Will queue.  Thanks.
> 


^ permalink raw reply	[flat|nested] 31+ messages in thread

end of thread, other threads:[~2021-09-28 10:21 UTC | newest]

Thread overview: 31+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-09-08 13:41 [PATCH 0/5] rebase -i: a couple of small improvements Phillip Wood via GitGitGadget
2021-09-08 13:41 ` [PATCH 1/5] sequencer.c: factor out a function Phillip Wood via GitGitGadget
2021-09-08 17:51   ` Eric Sunshine
2021-09-09 10:10     ` Phillip Wood
2021-09-09 10:44   ` Johannes Schindelin
2021-09-08 13:41 ` [PATCH 2/5] rebase: fix todo-list rereading Phillip Wood via GitGitGadget
2021-09-09 10:48   ` Johannes Schindelin
2021-09-08 13:41 ` [PATCH 3/5] reset_head(): mark oid parameter as const Phillip Wood via GitGitGadget
2021-09-08 13:41 ` [PATCH 4/5] rebase -i: don't fork git checkout Phillip Wood via GitGitGadget
2021-09-08 18:14   ` Philippe Blain
2021-09-09 10:09     ` Phillip Wood
2021-09-09 12:40       ` Philippe Blain
2021-09-09 13:57         ` Phillip Wood
2021-09-09 15:01           ` Elijah Newren
2021-09-10 12:07             ` Philippe Blain
2021-09-15 15:44             ` Phillip Wood
2021-09-09 10:53     ` Johannes Schindelin
2021-09-09 12:44       ` Philippe Blain
2021-09-09 21:43         ` Johannes Schindelin
2021-09-10 10:46         ` Johannes Schindelin
2021-09-10 11:58           ` Philippe Blain
2021-09-09 15:03   ` Elijah Newren
2021-09-08 13:41 ` [PATCH 5/5] rebase: remove unused parameter Phillip Wood via GitGitGadget
2021-09-09 10:54   ` Johannes Schindelin
2021-09-09 14:04     ` Phillip Wood
2021-09-23 15:26 ` [PATCH v2 0/2] rebase -i: a couple of small improvements Phillip Wood via GitGitGadget
2021-09-23 15:26   ` [PATCH v2 1/2] sequencer.c: factor out a function Phillip Wood via GitGitGadget
2021-09-23 15:26   ` [PATCH v2 2/2] rebase: fix todo-list rereading Phillip Wood via GitGitGadget
2021-09-24 16:13     ` Junio C Hamano
2021-09-28 10:20       ` Phillip Wood
2021-09-24 19:24   ` [PATCH v2 0/2] rebase -i: a couple of small improvements Junio C Hamano

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.