All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Phillip Wood via GitGitGadget" <gitgitgadget@gmail.com>
To: git@vger.kernel.org
Cc: Eric Sunshine <sunshine@sunshineco.com>,
	Philippe Blain <levraiphilippeblain@gmail.com>,
	Phillip Wood <phillip.wood123@gmail.com>,
	Johannes Schindelin <Johannes.Schindelin@gmx.de>,
	Elijah Newren <newren@gmail.com>,
	Phillip Wood <phillip.wood@dunelm.org.uk>,
	Phillip Wood <phillip.wood@dunelm.org.uk>
Subject: [PATCH v2 2/2] rebase: fix todo-list rereading
Date: Thu, 23 Sep 2021 15:26:21 +0000	[thread overview]
Message-ID: <0d89c506192a84822a3fbd6c76befac187129ad4.1632410782.git.gitgitgadget@gmail.com> (raw)
In-Reply-To: <pull.1034.v2.git.1632410782.gitgitgadget@gmail.com>

From: Phillip Wood <phillip.wood@dunelm.org.uk>

54fd3243da ("rebase -i: reread the todo list if `exec` touched it",
2017-04-26) sought to reread the todo list after running an exec
command only if it had been changed. To accomplish this it checks the
stat data of the todo list after running an exec command to see if it
has changed. Unfortunately there are two problems, firstly the
implementation is buggy we actually reread the list after each exec
which is quadratic in the number of commit lookups and secondly the
design is predicated on using nanosecond time stamps which are not the
default.

The implementation bug stems from the fact that we write a new todo
list to disk before running each command but do not update the stat
data to reflect this[1].

The design problem is that it is possible for the user to edit the
todo list without changing its size or inode which means we have to
rely on the mtime to tell us if it has changed. Unfortunately unless
git is built with USE_NSEC it is possible for the original and edited
list to share the same mtime.

Ideally "git rebase --edit-todo" would set a flag that we would then
check in sequencer.c. Unfortunately this is approach will not work as
there are scripts in the wild that write to the todo list directly
without running "git rebase --edit-todo". Instead of relying on stat
data this patch simply reads the possibly edited todo list and
compares it to the original with memcmp(). This is much faster than
reparsing the todo list each time. This patch reduces the time to run

   git rebase -r -xtrue v2.32.0~100 v2.32.0

which runs 419 exec commands by 6.6%. For comparison fixing the
implementation bug in stat based approach reduces the time by a
further 1.4% and is indistinguishable from never rereading the todo
list.

[1] https://lore.kernel.org/git/20191125131833.GD23183@szeder.dev/

Reported-by: SZEDER Gábor <szeder.dev@gmail.com>
Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
---
 sequencer.c | 19 ++++++++-----------
 sequencer.h |  1 -
 2 files changed, 8 insertions(+), 12 deletions(-)

diff --git a/sequencer.c b/sequencer.c
index a248c886c27..d51440ddcd9 100644
--- a/sequencer.c
+++ b/sequencer.c
@@ -2671,7 +2671,6 @@ static int read_populate_todo(struct repository *r,
 			      struct todo_list *todo_list,
 			      struct replay_opts *opts)
 {
-	struct stat st;
 	const char *todo_file = get_todo_path(opts);
 	int res;
 
@@ -2679,11 +2678,6 @@ static int read_populate_todo(struct repository *r,
 	if (strbuf_read_file_or_whine(&todo_list->buf, todo_file) < 0)
 		return -1;
 
-	res = stat(todo_file, &st);
-	if (res)
-		return error(_("could not stat '%s'"), todo_file);
-	fill_stat_data(&todo_list->stat, &st);
-
 	res = todo_list_parse_insn_buffer(r, todo_list->buf.buf, todo_list);
 	if (res) {
 		if (is_rebase_i(opts))
@@ -4258,12 +4252,14 @@ static int reread_todo_if_changed(struct repository *r,
 				  struct todo_list *todo_list,
 				  struct replay_opts *opts)
 {
-	struct stat st;
+	int offset;
+	struct strbuf buf = STRBUF_INIT;
 
-	if (stat(get_todo_path(opts), &st)) {
-		return error_errno(_("could not stat '%s'"),
-				   get_todo_path(opts));
-	} else if (match_stat_data(&todo_list->stat, &st)) {
+	if (strbuf_read_file_or_whine(&buf, get_todo_path(opts)) < 0)
+		return -1;
+	offset = get_item_line_offset(todo_list, todo_list->current + 1);
+	if (buf.len != todo_list->buf.len - offset ||
+	    memcmp(buf.buf, todo_list->buf.buf + offset, buf.len)) {
 		/* Reread the todo file if it has changed. */
 		todo_list_release(todo_list);
 		if (read_populate_todo(r, todo_list, opts))
@@ -4271,6 +4267,7 @@ static int reread_todo_if_changed(struct repository *r,
 		/* `current` will be incremented on return */
 		todo_list->current = -1;
 	}
+	strbuf_release(&buf);
 
 	return 0;
 }
diff --git a/sequencer.h b/sequencer.h
index d57d8ea23d7..cdeb0c6be47 100644
--- a/sequencer.h
+++ b/sequencer.h
@@ -116,7 +116,6 @@ struct todo_list {
 	struct todo_item *items;
 	int nr, alloc, current;
 	int done_nr, total_nr;
-	struct stat_data stat;
 };
 
 #define TODO_LIST_INIT { STRBUF_INIT }
-- 
gitgitgadget

  parent reply	other threads:[~2021-09-23 15:26 UTC|newest]

Thread overview: 31+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-09-08 13:41 [PATCH 0/5] rebase -i: a couple of small improvements Phillip Wood via GitGitGadget
2021-09-08 13:41 ` [PATCH 1/5] sequencer.c: factor out a function Phillip Wood via GitGitGadget
2021-09-08 17:51   ` Eric Sunshine
2021-09-09 10:10     ` Phillip Wood
2021-09-09 10:44   ` Johannes Schindelin
2021-09-08 13:41 ` [PATCH 2/5] rebase: fix todo-list rereading Phillip Wood via GitGitGadget
2021-09-09 10:48   ` Johannes Schindelin
2021-09-08 13:41 ` [PATCH 3/5] reset_head(): mark oid parameter as const Phillip Wood via GitGitGadget
2021-09-08 13:41 ` [PATCH 4/5] rebase -i: don't fork git checkout Phillip Wood via GitGitGadget
2021-09-08 18:14   ` Philippe Blain
2021-09-09 10:09     ` Phillip Wood
2021-09-09 12:40       ` Philippe Blain
2021-09-09 13:57         ` Phillip Wood
2021-09-09 15:01           ` Elijah Newren
2021-09-10 12:07             ` Philippe Blain
2021-09-15 15:44             ` Phillip Wood
2021-09-09 10:53     ` Johannes Schindelin
2021-09-09 12:44       ` Philippe Blain
2021-09-09 21:43         ` Johannes Schindelin
2021-09-10 10:46         ` Johannes Schindelin
2021-09-10 11:58           ` Philippe Blain
2021-09-09 15:03   ` Elijah Newren
2021-09-08 13:41 ` [PATCH 5/5] rebase: remove unused parameter Phillip Wood via GitGitGadget
2021-09-09 10:54   ` Johannes Schindelin
2021-09-09 14:04     ` Phillip Wood
2021-09-23 15:26 ` [PATCH v2 0/2] rebase -i: a couple of small improvements Phillip Wood via GitGitGadget
2021-09-23 15:26   ` [PATCH v2 1/2] sequencer.c: factor out a function Phillip Wood via GitGitGadget
2021-09-23 15:26   ` Phillip Wood via GitGitGadget [this message]
2021-09-24 16:13     ` [PATCH v2 2/2] rebase: fix todo-list rereading Junio C Hamano
2021-09-28 10:20       ` Phillip Wood
2021-09-24 19:24   ` [PATCH v2 0/2] rebase -i: a couple of small improvements Junio C Hamano

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=0d89c506192a84822a3fbd6c76befac187129ad4.1632410782.git.gitgitgadget@gmail.com \
    --to=gitgitgadget@gmail.com \
    --cc=Johannes.Schindelin@gmx.de \
    --cc=git@vger.kernel.org \
    --cc=levraiphilippeblain@gmail.com \
    --cc=newren@gmail.com \
    --cc=phillip.wood123@gmail.com \
    --cc=phillip.wood@dunelm.org.uk \
    --cc=sunshine@sunshineco.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.