git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [RFC PATCH v1 0/6] stash: drop usage of a second index
@ 2020-05-05 10:48 Alban Gruin
  2020-05-05 10:48 ` [RFC PATCH v1 1/6] stash: mark `i_tree' in reset_tree() const Alban Gruin
                   ` (9 more replies)
  0 siblings, 10 replies; 44+ messages in thread
From: Alban Gruin @ 2020-05-05 10:48 UTC (permalink / raw)
  To: git; +Cc: Thomas Gummerer, Johannes Schindelin, Junio C Hamano, Alban Gruin

The old scripted `git stash' used to create a second index to save
modified and untracked files, and restore untracked files, without
affecting the main index.  This behaviour was carried on when it was
rewritten in C, and here, most operations performed on the second index
are done by forked commands (ie. `read-tree' instead of reset_tree(),
etc.).  This works most of the time, except in some edge case with the
split-index when the split file has expired and is deleted by a forked
command: the main index may still contain a reference to the now-deleted
file, and subsequent operations on the index will fail [0].

The goal of this series is to modernise (a bit) builtin/stash.c, and to
fix the aforementionned edge case.

I have to admit that I don't really know how to test this.
GIT_TEST_SPLIT_INDEX failed on me (gdb showed me that it does not enable
the split-index at all, at least in `git stash' and its forks), and I'm
reluctant to add explicits tests on `git stash' about the split-index,
when nothing in its code explicitly does unusual things with the index
once this series is applied.  If anyone wants to share opinions about
this, I would be happy to read them.

This series is based on b34789c0b0 ("The sixth batch", 2020-05-01).

The tip of this series is tagged as "stash-remove-second-index-v1" at
https://github.com/agrn/git.

[0] https://lore.kernel.org/git/EED2CFF1-5BEF-429D-AB99-AD148A867614@gmail.com/

Alban Gruin (6):
  stash: mark `i_tree' in reset_tree() const
  stash: remove the second index in stash_working_tree()
  stash: remove the second index in stash_patch()
  stash: remove the second index in save_untracked_files()
  stash: remove the second index in restore_untracked()
  stash: remove `stash_index_path'

 builtin/stash.c | 151 +++++++++++++++---------------------------------
 1 file changed, 46 insertions(+), 105 deletions(-)

-- 
2.26.2


^ permalink raw reply	[flat|nested] 44+ messages in thread

* [RFC PATCH v1 1/6] stash: mark `i_tree' in reset_tree() const
  2020-05-05 10:48 [RFC PATCH v1 0/6] stash: drop usage of a second index Alban Gruin
@ 2020-05-05 10:48 ` Alban Gruin
  2020-06-13  8:09   ` Christian Couder
  2020-05-05 10:48 ` [RFC PATCH v1 2/6] stash: remove the second index in stash_working_tree() Alban Gruin
                   ` (8 subsequent siblings)
  9 siblings, 1 reply; 44+ messages in thread
From: Alban Gruin @ 2020-05-05 10:48 UTC (permalink / raw)
  To: git; +Cc: Thomas Gummerer, Johannes Schindelin, Junio C Hamano, Alban Gruin

As reset_tree() does not change the value pointed by `i_tree', and that
it will be provided with `the_hash_algo->empty_tree' which is a
constant, it is changed to be a pointer to a constant.

Signed-off-by: Alban Gruin <alban.gruin@gmail.com>
---
 builtin/stash.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/builtin/stash.c b/builtin/stash.c
index 0c52a3b849..9baa8b379e 100644
--- a/builtin/stash.c
+++ b/builtin/stash.c
@@ -228,7 +228,7 @@ static int clear_stash(int argc, const char **argv, const char *prefix)
 	return do_clear_stash();
 }
 
-static int reset_tree(struct object_id *i_tree, int update, int reset)
+static int reset_tree(const struct object_id *i_tree, int update, int reset)
 {
 	int nr_trees = 1;
 	struct unpack_trees_options opts;
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [RFC PATCH v1 2/6] stash: remove the second index in stash_working_tree()
  2020-05-05 10:48 [RFC PATCH v1 0/6] stash: drop usage of a second index Alban Gruin
  2020-05-05 10:48 ` [RFC PATCH v1 1/6] stash: mark `i_tree' in reset_tree() const Alban Gruin
@ 2020-05-05 10:48 ` Alban Gruin
  2020-06-13  8:52   ` Christian Couder
  2020-05-05 10:48 ` [RFC PATCH v1 3/6] stash: remove the second index in stash_patch() Alban Gruin
                   ` (7 subsequent siblings)
  9 siblings, 1 reply; 44+ messages in thread
From: Alban Gruin @ 2020-05-05 10:48 UTC (permalink / raw)
  To: git; +Cc: Thomas Gummerer, Johannes Schindelin, Junio C Hamano, Alban Gruin

This removes the second index used in stash_working_tree() to simplify
the code.  It also help to avoid issues with the split-index: when
stash_working_tree() is called, the index is at `i_tree', and this tree
is extracted in a second index for use in a subcommand.  This is not a
problem in the non-split-index case, but in the split-index case, if the
shared index file has expired and is removed by a subcommand, the main
index contains a reference to a file that no longer exists.

The calls to set_alternative_index_output() are dropped to extract
`i_tree' to the main index, and `GIT_INDEX_FILE' is no longer set before
starting `update-index'.  When it exits, the index has changed, and must
be discarded.

The call to reset_tree() becomes useless: the only caller of
stash_working_tree() is do_create_stash(), which creates `i_tree' from
its index, calls save_untracked_files() if requested (but as it also
works on a second index, it is unaffected), then calls
stash_working_tree().  But when save_untracked_files() will be modified
to stop using another index, it won't reset the tree, because
stash_patch() wants to work on a different tree (`b_tree') than
stash_working_tree().

At the end of the function, the tree is reset to `i_tree'.

Signed-off-by: Alban Gruin <alban.gruin@gmail.com>
---
 builtin/stash.c | 14 +++-----------
 1 file changed, 3 insertions(+), 11 deletions(-)

diff --git a/builtin/stash.c b/builtin/stash.c
index 9baa8b379e..2535335275 100644
--- a/builtin/stash.c
+++ b/builtin/stash.c
@@ -1059,17 +1059,14 @@ static int stash_working_tree(struct stash_info *info, const struct pathspec *ps
 	struct rev_info rev;
 	struct child_process cp_upd_index = CHILD_PROCESS_INIT;
 	struct strbuf diff_output = STRBUF_INIT;
-	struct index_state istate = { NULL };
 
 	init_revisions(&rev, NULL);
 	copy_pathspec(&rev.prune_data, ps);
 
-	set_alternate_index_output(stash_index_path.buf);
 	if (reset_tree(&info->i_tree, 0, 0)) {
 		ret = -1;
 		goto done;
 	}
-	set_alternate_index_output(NULL);
 
 	rev.diffopt.output_format = DIFF_FORMAT_CALLBACK;
 	rev.diffopt.format_callback = add_diff_to_buf;
@@ -1091,8 +1088,6 @@ static int stash_working_tree(struct stash_info *info, const struct pathspec *ps
 	argv_array_pushl(&cp_upd_index.args, "update-index",
 			 "--ignore-skip-worktree-entries",
 			 "-z", "--add", "--remove", "--stdin", NULL);
-	argv_array_pushf(&cp_upd_index.env_array, "GIT_INDEX_FILE=%s",
-			 stash_index_path.buf);
 
 	if (pipe_command(&cp_upd_index, diff_output.buf, diff_output.len,
 			 NULL, 0, NULL, 0)) {
@@ -1100,19 +1095,16 @@ static int stash_working_tree(struct stash_info *info, const struct pathspec *ps
 		goto done;
 	}
 
-	if (write_index_as_tree(&info->w_tree, &istate, stash_index_path.buf, 0,
-				NULL)) {
+	discard_cache();
+	if (write_cache_as_tree(&info->w_tree, 0, NULL) ||
+	    reset_tree(&info->i_tree, 0, 1))
 		ret = -1;
-		goto done;
-	}
 
 done:
-	discard_index(&istate);
 	UNLEAK(rev);
 	object_array_clear(&rev.pending);
 	clear_pathspec(&rev.prune_data);
 	strbuf_release(&diff_output);
-	remove_path(stash_index_path.buf);
 	return ret;
 }
 
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [RFC PATCH v1 3/6] stash: remove the second index in stash_patch()
  2020-05-05 10:48 [RFC PATCH v1 0/6] stash: drop usage of a second index Alban Gruin
  2020-05-05 10:48 ` [RFC PATCH v1 1/6] stash: mark `i_tree' in reset_tree() const Alban Gruin
  2020-05-05 10:48 ` [RFC PATCH v1 2/6] stash: remove the second index in stash_working_tree() Alban Gruin
@ 2020-05-05 10:48 ` Alban Gruin
  2020-06-13  9:38   ` Christian Couder
  2020-05-05 10:48 ` [RFC PATCH v1 4/6] stash: remove the second index in save_untracked_files() Alban Gruin
                   ` (6 subsequent siblings)
  9 siblings, 1 reply; 44+ messages in thread
From: Alban Gruin @ 2020-05-05 10:48 UTC (permalink / raw)
  To: git; +Cc: Thomas Gummerer, Johannes Schindelin, Junio C Hamano, Alban Gruin

This removes the second index used in stash_patch().

This function starts by resetting the index (which is set at `i_tree')
to HEAD, which has been stored in `b_commit' by do_create_stash(), and
the call to `read-tree' is replaced by reset_tree().  The index is
discarded after run_add_interactive(), but not `diff-tree' as this
command should not change it.

Since the index has been changed, and subsequent code might be sensitive
to this, it is reset to `i_tree' at the end of the function.

Signed-off-by: Alban Gruin <alban.gruin@gmail.com>
---
 builtin/stash.c | 47 ++++++++++-------------------------------------
 1 file changed, 10 insertions(+), 37 deletions(-)

diff --git a/builtin/stash.c b/builtin/stash.c
index 2535335275..eaeb7bc8c4 100644
--- a/builtin/stash.c
+++ b/builtin/stash.c
@@ -995,51 +995,24 @@ static int stash_patch(struct stash_info *info, const struct pathspec *ps,
 		       struct strbuf *out_patch, int quiet)
 {
 	int ret = 0;
-	struct child_process cp_read_tree = CHILD_PROCESS_INIT;
 	struct child_process cp_diff_tree = CHILD_PROCESS_INIT;
-	struct index_state istate = { NULL };
-	char *old_index_env = NULL, *old_repo_index_file;
-
-	remove_path(stash_index_path.buf);
 
-	cp_read_tree.git_cmd = 1;
-	argv_array_pushl(&cp_read_tree.args, "read-tree", "HEAD", NULL);
-	argv_array_pushf(&cp_read_tree.env_array, "GIT_INDEX_FILE=%s",
-			 stash_index_path.buf);
-	if (run_command(&cp_read_tree)) {
-		ret = -1;
-		goto done;
-	}
+	if (reset_tree(&info->b_commit, 0, 1))
+		return -1;
 
 	/* Find out what the user wants. */
-	old_repo_index_file = the_repository->index_file;
-	the_repository->index_file = stash_index_path.buf;
-	old_index_env = xstrdup_or_null(getenv(INDEX_ENVIRONMENT));
-	setenv(INDEX_ENVIRONMENT, the_repository->index_file, 1);
-
 	ret = run_add_interactive(NULL, "--patch=stash", ps);
 
-	the_repository->index_file = old_repo_index_file;
-	if (old_index_env && *old_index_env)
-		setenv(INDEX_ENVIRONMENT, old_index_env, 1);
-	else
-		unsetenv(INDEX_ENVIRONMENT);
-	FREE_AND_NULL(old_index_env);
-
 	/* State of the working tree. */
-	if (write_index_as_tree(&info->w_tree, &istate, stash_index_path.buf, 0,
-				NULL)) {
-		ret = -1;
-		goto done;
-	}
+	discard_cache();
+	if (write_cache_as_tree(&info->w_tree, 0, NULL))
+		return -1;
 
 	cp_diff_tree.git_cmd = 1;
 	argv_array_pushl(&cp_diff_tree.args, "diff-tree", "-p", "-U1", "HEAD",
 			 oid_to_hex(&info->w_tree), "--", NULL);
-	if (pipe_command(&cp_diff_tree, NULL, 0, out_patch, 0, NULL, 0)) {
-		ret = -1;
-		goto done;
-	}
+	if (pipe_command(&cp_diff_tree, NULL, 0, out_patch, 0, NULL, 0))
+		return -1;
 
 	if (!out_patch->len) {
 		if (!quiet)
@@ -1047,9 +1020,9 @@ static int stash_patch(struct stash_info *info, const struct pathspec *ps,
 		ret = 1;
 	}
 
-done:
-	discard_index(&istate);
-	remove_path(stash_index_path.buf);
+	if (reset_tree(&info->i_tree, 0, 1))
+		return -1;
+
 	return ret;
 }
 
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [RFC PATCH v1 4/6] stash: remove the second index in save_untracked_files()
  2020-05-05 10:48 [RFC PATCH v1 0/6] stash: drop usage of a second index Alban Gruin
                   ` (2 preceding siblings ...)
  2020-05-05 10:48 ` [RFC PATCH v1 3/6] stash: remove the second index in stash_patch() Alban Gruin
@ 2020-05-05 10:48 ` Alban Gruin
  2020-06-13 18:51   ` Christian Couder
  2020-05-05 10:48 ` [RFC PATCH v1 5/6] stash: remove the second index in restore_untracked() Alban Gruin
                   ` (5 subsequent siblings)
  9 siblings, 1 reply; 44+ messages in thread
From: Alban Gruin @ 2020-05-05 10:48 UTC (permalink / raw)
  To: git; +Cc: Thomas Gummerer, Johannes Schindelin, Junio C Hamano, Alban Gruin

This removes the second index used in save_untracked_files().

This functions creates a new commit with no parents, and a tree
containing only untracked files, so the tree is reset to the empty tree
at the beginning (hence the need for reset_tree() to accept constant
trees).  The environment of `update-index' is no longer updated, and the
index is discarded after this command exited.

As the only caller of this function is do_create_stash(), and the next
user of the index is either save_untracked_files() or stash_patch(),
which both want a different tree, the index is left as-is at the end.

Signed-off-by: Alban Gruin <alban.gruin@gmail.com>
---
 builtin/stash.c | 35 +++++++++++++++--------------------
 1 file changed, 15 insertions(+), 20 deletions(-)

diff --git a/builtin/stash.c b/builtin/stash.c
index eaeb7bc8c4..cbe37cd24b 100644
--- a/builtin/stash.c
+++ b/builtin/stash.c
@@ -954,41 +954,36 @@ static int check_changes(const struct pathspec *ps, int include_untracked,
 static int save_untracked_files(struct stash_info *info, struct strbuf *msg,
 				struct strbuf files)
 {
-	int ret = 0;
 	struct strbuf untracked_msg = STRBUF_INIT;
 	struct child_process cp_upd_index = CHILD_PROCESS_INIT;
-	struct index_state istate = { NULL };
+
+	if (reset_tree(the_hash_algo->empty_tree, 0, 1))
+		return -1;
 
 	cp_upd_index.git_cmd = 1;
 	argv_array_pushl(&cp_upd_index.args, "update-index", "-z", "--add",
 			 "--remove", "--stdin", NULL);
-	argv_array_pushf(&cp_upd_index.env_array, "GIT_INDEX_FILE=%s",
-			 stash_index_path.buf);
 
-	strbuf_addf(&untracked_msg, "untracked files on %s\n", msg->buf);
 	if (pipe_command(&cp_upd_index, files.buf, files.len, NULL, 0,
-			 NULL, 0)) {
-		ret = -1;
-		goto done;
-	}
+			 NULL, 0))
+		return -1;
 
-	if (write_index_as_tree(&info->u_tree, &istate, stash_index_path.buf, 0,
-				NULL)) {
-		ret = -1;
-		goto done;
-	}
+	discard_cache();
+	if (write_cache_as_tree(&info->u_tree, 0, NULL))
+		return -1;
 
+	strbuf_addf(&untracked_msg, "untracked files on %s\n", msg->buf);
 	if (commit_tree(untracked_msg.buf, untracked_msg.len,
 			&info->u_tree, NULL, &info->u_commit, NULL, NULL)) {
-		ret = -1;
-		goto done;
+		strbuf_release(&untracked_msg);
+		return -1;
 	}
 
-done:
-	discard_index(&istate);
+	/* Do not reset the tree, as either stash_patch() or
+	 * stash_working_tree() will do it. */
+
 	strbuf_release(&untracked_msg);
-	remove_path(stash_index_path.buf);
-	return ret;
+	return 0;
 }
 
 static int stash_patch(struct stash_info *info, const struct pathspec *ps,
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [RFC PATCH v1 5/6] stash: remove the second index in restore_untracked()
  2020-05-05 10:48 [RFC PATCH v1 0/6] stash: drop usage of a second index Alban Gruin
                   ` (3 preceding siblings ...)
  2020-05-05 10:48 ` [RFC PATCH v1 4/6] stash: remove the second index in save_untracked_files() Alban Gruin
@ 2020-05-05 10:48 ` Alban Gruin
  2020-06-13 19:41   ` Christian Couder
  2020-05-05 10:48 ` [RFC PATCH v1 6/6] stash: remove `stash_index_path' Alban Gruin
                   ` (4 subsequent siblings)
  9 siblings, 1 reply; 44+ messages in thread
From: Alban Gruin @ 2020-05-05 10:48 UTC (permalink / raw)
  To: git; +Cc: Thomas Gummerer, Johannes Schindelin, Junio C Hamano, Alban Gruin

This removes the second index used in restore_untracked().

The call to `read-tree' is replaced by reset_tree() with the appropriate
parameters (no update, no reset).  The environment of `checkout-index'
is no longer modified, and the cache is discarded when it exists.

In do_apply_stash(), the changes are a bit more involved: to avoid
conflicts with the merged index, restore_untracked() is moved after
merge_recursive_generic().

This introduces another problem: the files that were untracked once are
now added to the index, and update_index() would add back those files in
the index.  To avoid this, get_newly_staged() is moved before
restore_untracked().

Signed-off-by: Alban Gruin <alban.gruin@gmail.com>
---
 builtin/stash.c | 46 +++++++++++++++++-----------------------------
 1 file changed, 17 insertions(+), 29 deletions(-)

diff --git a/builtin/stash.c b/builtin/stash.c
index cbe37cd24b..1eafc1fe8d 100644
--- a/builtin/stash.c
+++ b/builtin/stash.c
@@ -359,29 +359,16 @@ static int restore_untracked(struct object_id *u_tree)
 	int res;
 	struct child_process cp = CHILD_PROCESS_INIT;
 
-	/*
-	 * We need to run restore files from a given index, but without
-	 * affecting the current index, so we use GIT_INDEX_FILE with
-	 * run_command to fork processes that will not interfere.
-	 */
-	cp.git_cmd = 1;
-	argv_array_push(&cp.args, "read-tree");
-	argv_array_push(&cp.args, oid_to_hex(u_tree));
-	argv_array_pushf(&cp.env_array, "GIT_INDEX_FILE=%s",
-			 stash_index_path.buf);
-	if (run_command(&cp)) {
-		remove_path(stash_index_path.buf);
+	if (reset_tree(u_tree, 0, 0))
 		return -1;
-	}
 
 	child_process_init(&cp);
 	cp.git_cmd = 1;
 	argv_array_pushl(&cp.args, "checkout-index", "--all", NULL);
-	argv_array_pushf(&cp.env_array, "GIT_INDEX_FILE=%s",
-			 stash_index_path.buf);
 
 	res = run_command(&cp);
-	remove_path(stash_index_path.buf);
+	discard_cache();
+
 	return res;
 }
 
@@ -395,6 +382,7 @@ static int do_apply_stash(const char *prefix, struct stash_info *info,
 	struct object_id index_tree;
 	struct commit *result;
 	const struct object_id *bases[1];
+	struct strbuf newly_staged = STRBUF_INIT;
 
 	read_cache_preload(NULL);
 	if (refresh_and_write_cache(REFRESH_QUIET, 0, 0))
@@ -433,9 +421,6 @@ static int do_apply_stash(const char *prefix, struct stash_info *info,
 		}
 	}
 
-	if (info->has_u && restore_untracked(&info->u_tree))
-		return error(_("could not restore untracked files from stash"));
-
 	init_merge_options(&o, the_repository);
 
 	o.branch1 = "Updated upstream";
@@ -463,24 +448,27 @@ static int do_apply_stash(const char *prefix, struct stash_info *info,
 		return ret;
 	}
 
+	if (!has_index && get_newly_staged(&newly_staged, &c_tree)) {
+		strbuf_release(&newly_staged);
+		return -1;
+	}
+
+	if (info->has_u && restore_untracked(&info->u_tree)) {
+		strbuf_release(&newly_staged);
+		return error(_("could not restore untracked files from stash"));
+	}
+
 	if (has_index) {
 		if (reset_tree(&index_tree, 0, 0))
 			return -1;
 	} else {
-		struct strbuf out = STRBUF_INIT;
-
-		if (get_newly_staged(&out, &c_tree)) {
-			strbuf_release(&out);
-			return -1;
-		}
-
 		if (reset_tree(&c_tree, 0, 1)) {
-			strbuf_release(&out);
+			strbuf_release(&newly_staged);
 			return -1;
 		}
 
-		ret = update_index(&out);
-		strbuf_release(&out);
+		ret = update_index(&newly_staged);
+		strbuf_release(&newly_staged);
 		if (ret)
 			return -1;
 
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [RFC PATCH v1 6/6] stash: remove `stash_index_path'
  2020-05-05 10:48 [RFC PATCH v1 0/6] stash: drop usage of a second index Alban Gruin
                   ` (4 preceding siblings ...)
  2020-05-05 10:48 ` [RFC PATCH v1 5/6] stash: remove the second index in restore_untracked() Alban Gruin
@ 2020-05-05 10:48 ` Alban Gruin
  2020-06-04 12:07 ` [RFC PATCH v1 0/6] stash: drop usage of a second index Alban Gruin
                   ` (3 subsequent siblings)
  9 siblings, 0 replies; 44+ messages in thread
From: Alban Gruin @ 2020-05-05 10:48 UTC (permalink / raw)
  To: git; +Cc: Thomas Gummerer, Johannes Schindelin, Junio C Hamano, Alban Gruin

Since stash no longer uses a second index, `stash_index_path' is now
unused, and can be dropped.

Signed-off-by: Alban Gruin <alban.gruin@gmail.com>
---
 builtin/stash.c | 7 -------
 1 file changed, 7 deletions(-)

diff --git a/builtin/stash.c b/builtin/stash.c
index 1eafc1fe8d..709fd5ca34 100644
--- a/builtin/stash.c
+++ b/builtin/stash.c
@@ -88,7 +88,6 @@ static const char * const git_stash_save_usage[] = {
 };
 
 static const char *ref_stash = "refs/stash";
-static struct strbuf stash_index_path = STRBUF_INIT;
 
 /*
  * w_commit is set to the commit containing the working tree
@@ -1506,8 +1505,6 @@ static int save_stash(int argc, const char **argv, const char *prefix)
 
 int cmd_stash(int argc, const char **argv, const char *prefix)
 {
-	pid_t pid = getpid();
-	const char *index_file;
 	struct argv_array args = ARGV_ARRAY_INIT;
 
 	struct option options[] = {
@@ -1524,10 +1521,6 @@ int cmd_stash(int argc, const char **argv, const char *prefix)
 	argc = parse_options(argc, argv, prefix, options, git_stash_usage,
 			     PARSE_OPT_KEEP_UNKNOWN | PARSE_OPT_KEEP_DASHDASH);
 
-	index_file = get_index_file();
-	strbuf_addf(&stash_index_path, "%s.stash.%" PRIuMAX, index_file,
-		    (uintmax_t)pid);
-
 	if (!argc)
 		return !!push_stash(0, NULL, prefix, 0);
 	else if (!strcmp(argv[0], "apply"))
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 44+ messages in thread

* Re: [RFC PATCH v1 0/6] stash: drop usage of a second index
  2020-05-05 10:48 [RFC PATCH v1 0/6] stash: drop usage of a second index Alban Gruin
                   ` (5 preceding siblings ...)
  2020-05-05 10:48 ` [RFC PATCH v1 6/6] stash: remove `stash_index_path' Alban Gruin
@ 2020-06-04 12:07 ` Alban Gruin
  2020-06-13  7:52 ` Christian Couder
                   ` (2 subsequent siblings)
  9 siblings, 0 replies; 44+ messages in thread
From: Alban Gruin @ 2020-06-04 12:07 UTC (permalink / raw)
  To: git; +Cc: Thomas Gummerer, Johannes Schindelin, Junio C Hamano

Hi,

Le 05/05/2020 à 12:48, Alban Gruin a écrit :
> The old scripted `git stash' used to create a second index to save
> modified and untracked files, and restore untracked files, without
> affecting the main index.  This behaviour was carried on when it was
> rewritten in C, and here, most operations performed on the second index
> are done by forked commands (ie. `read-tree' instead of reset_tree(),
> etc.).  This works most of the time, except in some edge case with the
> split-index when the split file has expired and is deleted by a forked
> command: the main index may still contain a reference to the now-deleted
> file, and subsequent operations on the index will fail [0].
> 
> The goal of this series is to modernise (a bit) builtin/stash.c, and to
> fix the aforementionned edge case.
> 
> I have to admit that I don't really know how to test this.
> GIT_TEST_SPLIT_INDEX failed on me (gdb showed me that it does not enable
> the split-index at all, at least in `git stash' and its forks), and I'm
> reluctant to add explicits tests on `git stash' about the split-index,
> when nothing in its code explicitly does unusual things with the index
> once this series is applied.  If anyone wants to share opinions about
> this, I would be happy to read them.
> 
> This series is based on b34789c0b0 ("The sixth batch", 2020-05-01).
> 
> The tip of this series is tagged as "stash-remove-second-index-v1" at
> https://github.com/agrn/git.
> 
> [0] https://lore.kernel.org/git/EED2CFF1-5BEF-429D-AB99-AD148A867614@gmail.com/
> 
> Alban Gruin (6):
>   stash: mark `i_tree' in reset_tree() const
>   stash: remove the second index in stash_working_tree()
>   stash: remove the second index in stash_patch()
>   stash: remove the second index in save_untracked_files()
>   stash: remove the second index in restore_untracked()
>   stash: remove `stash_index_path'
> 
>  builtin/stash.c | 151 +++++++++++++++---------------------------------
>  1 file changed, 46 insertions(+), 105 deletions(-)
> 

I’m afraid this series fell through the cracks.

Alban


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [RFC PATCH v1 0/6] stash: drop usage of a second index
  2020-05-05 10:48 [RFC PATCH v1 0/6] stash: drop usage of a second index Alban Gruin
                   ` (6 preceding siblings ...)
  2020-06-04 12:07 ` [RFC PATCH v1 0/6] stash: drop usage of a second index Alban Gruin
@ 2020-06-13  7:52 ` Christian Couder
  2020-06-25 12:35   ` Alban Gruin
  2020-06-15 15:27 ` SZEDER Gábor
  2020-06-30 15:15 ` [PATCH v2 " Alban Gruin
  9 siblings, 1 reply; 44+ messages in thread
From: Christian Couder @ 2020-06-13  7:52 UTC (permalink / raw)
  To: Alban Gruin
  Cc: git, Thomas Gummerer, Johannes Schindelin, Junio C Hamano,
	Son Luong Ngoc

Hi,

On Tue, May 5, 2020 at 12:56 PM Alban Gruin <alban.gruin@gmail.com> wrote:
>
> The old scripted `git stash' used to create a second index to save
> modified and untracked files, and restore untracked files, without
> affecting the main index.  This behaviour was carried on when it was
> rewritten in C, and here, most operations performed on the second index
> are done by forked commands (ie. `read-tree' instead of reset_tree(),
> etc.).  This works most of the time, except in some edge case with the
> split-index when the split file has expired and is deleted by a forked
> command: the main index may still contain a reference to the now-deleted
> file, and subsequent operations on the index will fail [0].

Thanks for working on this! I agree that it would be nice to fix split
index issues as it could help for sure with huge repositories. Sorry
also that this patch series fell through the cracks.

I am adding Son Luong Ngoc in Cc as he reported the issue that this
series fixes.

> The goal of this series is to modernise (a bit) builtin/stash.c, and to
> fix the aforementionned edge case.
>
> I have to admit that I don't really know how to test this.
> GIT_TEST_SPLIT_INDEX failed on me (gdb showed me that it does not enable
> the split-index at all, at least in `git stash' and its forks),

It should have worked when it was introduced, though maybe not for `git stash`.

> and I'm
> reluctant to add explicits tests on `git stash' about the split-index,
> when nothing in its code explicitly does unusual things with the index
> once this series is applied.  If anyone wants to share opinions about
> this, I would be happy to read them.

I understand. I think the good way forward would be to fix
GIT_TEST_SPLIT_INDEX and find a way to ensure that it keeps working in
the future.

Thanks,
Christian.

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [RFC PATCH v1 1/6] stash: mark `i_tree' in reset_tree() const
  2020-05-05 10:48 ` [RFC PATCH v1 1/6] stash: mark `i_tree' in reset_tree() const Alban Gruin
@ 2020-06-13  8:09   ` Christian Couder
  0 siblings, 0 replies; 44+ messages in thread
From: Christian Couder @ 2020-06-13  8:09 UTC (permalink / raw)
  To: Alban Gruin; +Cc: git, Thomas Gummerer, Johannes Schindelin, Junio C Hamano

On Tue, May 5, 2020 at 12:56 PM Alban Gruin <alban.gruin@gmail.com> wrote:
>
> As reset_tree() does not change the value pointed by `i_tree', and that
> it will be provided with `the_hash_algo->empty_tree' which is a
> constant, it is changed to be a pointer to a constant.

[...]

> -static int reset_tree(struct object_id *i_tree, int update, int reset)
> +static int reset_tree(const struct object_id *i_tree, int update, int reset)

Yeah, in rest_tree() it is only used like this:

        tree = parse_tree_indirect(i_tree);

and parse_tree_indirect() takes a 'const struct object_id *'.



> --
> 2.26.2
>

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [RFC PATCH v1 2/6] stash: remove the second index in stash_working_tree()
  2020-05-05 10:48 ` [RFC PATCH v1 2/6] stash: remove the second index in stash_working_tree() Alban Gruin
@ 2020-06-13  8:52   ` Christian Couder
  2020-06-13 18:00     ` Alban Gruin
  0 siblings, 1 reply; 44+ messages in thread
From: Christian Couder @ 2020-06-13  8:52 UTC (permalink / raw)
  To: Alban Gruin; +Cc: git, Thomas Gummerer, Johannes Schindelin, Junio C Hamano

On Tue, May 5, 2020 at 12:56 PM Alban Gruin <alban.gruin@gmail.com> wrote:
>
> This removes the second index used in stash_working_tree() to simplify
> the code.  It also help to avoid issues with the split-index: when

s/help/helps/

> stash_working_tree() is called, the index is at `i_tree', and this tree
> is extracted in a second index for use in a subcommand.  This is not a
> problem in the non-split-index case, but in the split-index case, if the
> shared index file has expired and is removed by a subcommand, the main
> index contains a reference to a file that no longer exists.

As this is fixing a bug and there is no test, it might help if you can
at least give an example of something that used to fail before this
patch and doesn't after it. You are talking about stash subcommands
but it is not very clear which one for example can trigger the bug.

> The calls to set_alternative_index_output() are dropped to extract
> `i_tree' to the main index, and `GIT_INDEX_FILE' is no longer set before
> starting `update-index'.  When it exits, the index has changed, and must
> be discarded.

That makes sense.

> The call to reset_tree() becomes useless:

Your patch doesn't remove any call to reset_tree(), but actually adds
one. So the above is difficult to understand.

Do you want to say that in a later patch it will be possible to remove
the call to reset_tree()? Or do you want to say that the call to
write_index_as_tree() becomes useless?

> the only caller of
> stash_working_tree() is do_create_stash(), which creates `i_tree' from
> its index, calls save_untracked_files() if requested (but as it also
> works on a second index, it is unaffected), then calls
> stash_working_tree().  But when save_untracked_files() will be modified
> to stop using another index, it won't reset the tree, because
> stash_patch() wants to work on a different tree (`b_tree') than
> stash_working_tree().
>
> At the end of the function, the tree is reset to `i_tree'.

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [RFC PATCH v1 3/6] stash: remove the second index in stash_patch()
  2020-05-05 10:48 ` [RFC PATCH v1 3/6] stash: remove the second index in stash_patch() Alban Gruin
@ 2020-06-13  9:38   ` Christian Couder
  2020-06-13 10:04     ` Christian Couder
  0 siblings, 1 reply; 44+ messages in thread
From: Christian Couder @ 2020-06-13  9:38 UTC (permalink / raw)
  To: Alban Gruin; +Cc: git, Thomas Gummerer, Johannes Schindelin, Junio C Hamano

On Tue, May 5, 2020 at 12:56 PM Alban Gruin <alban.gruin@gmail.com> wrote:
>
> This removes the second index used in stash_patch().
>
> This function starts by resetting the index (which is set at `i_tree')
> to HEAD, which has been stored in `b_commit' by do_create_stash(), and
> the call to `read-tree' is replaced by reset_tree().  The index is
> discarded after run_add_interactive(), but not `diff-tree' as this
> command should not change it.
>
> Since the index has been changed, and subsequent code might be sensitive
> to this, it is reset to `i_tree' at the end of the function.

[...]

>         /* State of the working tree. */
> -       if (write_index_as_tree(&info->w_tree, &istate, stash_index_path.buf, 0,
> -                               NULL)) {
> -               ret = -1;
> -               goto done;
> -       }
> +       discard_cache();
> +       if (write_cache_as_tree(&info->w_tree, 0, NULL))
> +               return -1;

In the previous patch you use the following:

+       if (write_cache_as_tree(&info->w_tree, 0, NULL) ||
+           reset_tree(&info->i_tree, 0, 1))

Here the reset_tree(&info->i_tree, 0, 1) call is in the "done:" part
of the code. Is there a reason for that? Can't
reset_tree(&info->i_tree, 0, 1) be tried here before returning an
error? Or was the previous patch wrong? I guess the reason why the
reset_tree(&info->i_tree, 0, 1) call here is in the "done:" part of
the call is because it should be after the diff tree command is
launched. I see that the commit message tries to explain this a bit,
but it is still not very clear to me.

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [RFC PATCH v1 3/6] stash: remove the second index in stash_patch()
  2020-06-13  9:38   ` Christian Couder
@ 2020-06-13 10:04     ` Christian Couder
  0 siblings, 0 replies; 44+ messages in thread
From: Christian Couder @ 2020-06-13 10:04 UTC (permalink / raw)
  To: Alban Gruin; +Cc: git, Thomas Gummerer, Johannes Schindelin, Junio C Hamano

On Sat, Jun 13, 2020 at 11:38 AM Christian Couder
<christian.couder@gmail.com> wrote:
>
> On Tue, May 5, 2020 at 12:56 PM Alban Gruin <alban.gruin@gmail.com> wrote:
> >
> > This removes the second index used in stash_patch().
> >
> > This function starts by resetting the index (which is set at `i_tree')
> > to HEAD, which has been stored in `b_commit' by do_create_stash(), and
> > the call to `read-tree' is replaced by reset_tree().  The index is
> > discarded after run_add_interactive(), but not `diff-tree' as this
> > command should not change it.
> >
> > Since the index has been changed, and subsequent code might be sensitive
> > to this, it is reset to `i_tree' at the end of the function.
>
> [...]
>
> >         /* State of the working tree. */
> > -       if (write_index_as_tree(&info->w_tree, &istate, stash_index_path.buf, 0,
> > -                               NULL)) {
> > -               ret = -1;
> > -               goto done;
> > -       }
> > +       discard_cache();
> > +       if (write_cache_as_tree(&info->w_tree, 0, NULL))
> > +               return -1;
>
> In the previous patch you use the following:
>
> +       if (write_cache_as_tree(&info->w_tree, 0, NULL) ||
> +           reset_tree(&info->i_tree, 0, 1))
>
> Here the reset_tree(&info->i_tree, 0, 1) call is in the "done:" part
> of the code. Is there a reason for that? Can't
> reset_tree(&info->i_tree, 0, 1) be tried here before returning an
> error? Or was the previous patch wrong?

Sorry I think I misunderstood the code in the previous patch. It
actually doesn't try reset_tree(&info->i_tree, 0, 1) when
write_cache_as_tree(&info->w_tree, 0, NULL) fails.

> I guess the reason why the
> reset_tree(&info->i_tree, 0, 1) call here is in the "done:" part of
> the call is because it should be after the diff tree command is
> launched. I see that the commit message tries to explain this a bit,
> but it is still not very clear to me.

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [RFC PATCH v1 2/6] stash: remove the second index in stash_working_tree()
  2020-06-13  8:52   ` Christian Couder
@ 2020-06-13 18:00     ` Alban Gruin
  2020-06-15 12:02       ` Christian Couder
  0 siblings, 1 reply; 44+ messages in thread
From: Alban Gruin @ 2020-06-13 18:00 UTC (permalink / raw)
  To: Christian Couder
  Cc: git, Thomas Gummerer, Johannes Schindelin, Junio C Hamano

Hi Christian,

Le 13/06/2020 à 10:52, Christian Couder a écrit :
> On Tue, May 5, 2020 at 12:56 PM Alban Gruin <alban.gruin@gmail.com> wrote:
>>
>> This removes the second index used in stash_working_tree() to simplify
>> the code.  It also help to avoid issues with the split-index: when
> 
> s/help/helps/
> 
>> stash_working_tree() is called, the index is at `i_tree', and this tree
>> is extracted in a second index for use in a subcommand.  This is not a
>> problem in the non-split-index case, but in the split-index case, if the
>> shared index file has expired and is removed by a subcommand, the main
>> index contains a reference to a file that no longer exists.
> 
> As this is fixing a bug and there is no test, it might help if you can
> at least give an example of something that used to fail before this
> patch and doesn't after it. You are talking about stash subcommands
> but it is not very clear which one for example can trigger the bug.
> 
>> The calls to set_alternative_index_output() are dropped to extract
>> `i_tree' to the main index, and `GIT_INDEX_FILE' is no longer set before
>> starting `update-index'.  When it exits, the index has changed, and must
>> be discarded.
> 
> That makes sense.
> 
>> The call to reset_tree() becomes useless:
> 
> Your patch doesn't remove any call to reset_tree(), but actually adds
> one. So the above is difficult to understand.
> 
> Do you want to say that in a later patch it will be possible to remove
> the call to reset_tree()? Or do you want to say that the call to
> write_index_as_tree() becomes useless?
> 

No, I meant that with this commit, reset_tree() does not need to be
called at the beginning of stash_working_tree(), because it is only
called by do_create_stash(), which sets the index at `i_tree', and
save_untracked_files() does not change the main index.  But it will
become useful again down the line, when save_untracked_file() will be
rewritten to use the "main" index, so I did not remove it.

I hope it makes more sense now.

>> the only caller of
>> stash_working_tree() is do_create_stash(), which creates `i_tree' from
>> its index, calls save_untracked_files() if requested (but as it also
>> works on a second index, it is unaffected), then calls
>> stash_working_tree().  But when save_untracked_files() will be modified
>> to stop using another index, it won't reset the tree, because
>> stash_patch() wants to work on a different tree (`b_tree') than
>> stash_working_tree().
>>
>> At the end of the function, the tree is reset to `i_tree'.

Alban


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [RFC PATCH v1 4/6] stash: remove the second index in save_untracked_files()
  2020-05-05 10:48 ` [RFC PATCH v1 4/6] stash: remove the second index in save_untracked_files() Alban Gruin
@ 2020-06-13 18:51   ` Christian Couder
  0 siblings, 0 replies; 44+ messages in thread
From: Christian Couder @ 2020-06-13 18:51 UTC (permalink / raw)
  To: Alban Gruin; +Cc: git, Thomas Gummerer, Johannes Schindelin, Junio C Hamano

On Tue, May 5, 2020 at 12:56 PM Alban Gruin <alban.gruin@gmail.com> wrote:
>
> This removes the second index used in save_untracked_files().
>
> This functions creates a new commit with no parents, and a tree
> containing only untracked files, so the tree is reset to the empty tree
> at the beginning (hence the need for reset_tree() to accept constant
> trees).  The environment of `update-index' is no longer updated, and the
> index is discarded after this command exited.

Maybe s/exited/exits/.

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [RFC PATCH v1 5/6] stash: remove the second index in restore_untracked()
  2020-05-05 10:48 ` [RFC PATCH v1 5/6] stash: remove the second index in restore_untracked() Alban Gruin
@ 2020-06-13 19:41   ` Christian Couder
  0 siblings, 0 replies; 44+ messages in thread
From: Christian Couder @ 2020-06-13 19:41 UTC (permalink / raw)
  To: Alban Gruin; +Cc: git, Thomas Gummerer, Johannes Schindelin, Junio C Hamano

On Tue, May 5, 2020 at 12:56 PM Alban Gruin <alban.gruin@gmail.com> wrote:

> diff --git a/builtin/stash.c b/builtin/stash.c
> index cbe37cd24b..1eafc1fe8d 100644
> --- a/builtin/stash.c
> +++ b/builtin/stash.c
> @@ -359,29 +359,16 @@ static int restore_untracked(struct object_id *u_tree)
>         int res;
>         struct child_process cp = CHILD_PROCESS_INIT;
>
> -       /*
> -        * We need to run restore files from a given index, but without
> -        * affecting the current index, so we use GIT_INDEX_FILE with
> -        * run_command to fork processes that will not interfere.
> -        */
> -       cp.git_cmd = 1;
> -       argv_array_push(&cp.args, "read-tree");
> -       argv_array_push(&cp.args, oid_to_hex(u_tree));
> -       argv_array_pushf(&cp.env_array, "GIT_INDEX_FILE=%s",
> -                        stash_index_path.buf);
> -       if (run_command(&cp)) {
> -               remove_path(stash_index_path.buf);
> +       if (reset_tree(u_tree, 0, 0))
>                 return -1;
> -       }
>
>         child_process_init(&cp);

Is this still necessary?

>         cp.git_cmd = 1;
>         argv_array_pushl(&cp.args, "checkout-index", "--all", NULL);
> -       argv_array_pushf(&cp.env_array, "GIT_INDEX_FILE=%s",
> -                        stash_index_path.buf);
>
>         res = run_command(&cp);
> -       remove_path(stash_index_path.buf);
> +       discard_cache();
> +
>         return res;
>  }

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [RFC PATCH v1 2/6] stash: remove the second index in stash_working_tree()
  2020-06-13 18:00     ` Alban Gruin
@ 2020-06-15 12:02       ` Christian Couder
  0 siblings, 0 replies; 44+ messages in thread
From: Christian Couder @ 2020-06-15 12:02 UTC (permalink / raw)
  To: Alban Gruin; +Cc: git, Thomas Gummerer, Johannes Schindelin, Junio C Hamano

On Sat, Jun 13, 2020 at 8:00 PM Alban Gruin <alban.gruin@gmail.com> wrote:
>
> Le 13/06/2020 à 10:52, Christian Couder a écrit :
> > On Tue, May 5, 2020 at 12:56 PM Alban Gruin <alban.gruin@gmail.com> wrote:

> >> The call to reset_tree() becomes useless:
> >
> > Your patch doesn't remove any call to reset_tree(), but actually adds
> > one. So the above is difficult to understand.
> >
> > Do you want to say that in a later patch it will be possible to remove
> > the call to reset_tree()? Or do you want to say that the call to
> > write_index_as_tree() becomes useless?
> >
>
> No, I meant that with this commit, reset_tree() does not need to be
> called at the beginning of stash_working_tree(), because it is only
> called by do_create_stash(), which sets the index at `i_tree', and
> save_untracked_files() does not change the main index.  But it will
> become useful again down the line, when save_untracked_file() will be
> rewritten to use the "main" index, so I did not remove it.
>
> I hope it makes more sense now.

Yeah, it makes more sense with an explanation like the above. Instead
of "down the line" though, you might want to say something like "in a
later commit", as I think it will make it clearer for reviewers that
the commit when it will become useful again should be part of the same
series.

Thanks,
Christian.

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [RFC PATCH v1 0/6] stash: drop usage of a second index
  2020-05-05 10:48 [RFC PATCH v1 0/6] stash: drop usage of a second index Alban Gruin
                   ` (7 preceding siblings ...)
  2020-06-13  7:52 ` Christian Couder
@ 2020-06-15 15:27 ` SZEDER Gábor
  2020-06-15 21:50   ` SZEDER Gábor
  2020-06-30 15:15 ` [PATCH v2 " Alban Gruin
  9 siblings, 1 reply; 44+ messages in thread
From: SZEDER Gábor @ 2020-06-15 15:27 UTC (permalink / raw)
  To: Alban Gruin; +Cc: git, Thomas Gummerer, Johannes Schindelin, Junio C Hamano

On Tue, May 05, 2020 at 12:48:43PM +0200, Alban Gruin wrote:
> The old scripted `git stash' used to create a second index to save
> modified and untracked files, and restore untracked files, without
> affecting the main index.  This behaviour was carried on when it was
> rewritten in C, and here, most operations performed on the second index
> are done by forked commands (ie. `read-tree' instead of reset_tree(),
> etc.).  This works most of the time, except in some edge case with the
> split-index when the split file has expired and is deleted by a forked
> command: the main index may still contain a reference to the now-deleted
> file, and subsequent operations on the index will fail [0].
> 
> The goal of this series is to modernise (a bit) builtin/stash.c, and to
> fix the aforementionned edge case.

While this patch series does fix this edge case, it doesn't fix the
root cause, which is not specific to 'git stash', but is more general
issue when using split index with an alternate index.  More on this in
the notes attached to the patch below.

Having said that, and without actually looking at your patches, I
think removing the second index from the builtin stash is a good idea
nonetheless.

> I have to admit that I don't really know how to test this.
> GIT_TEST_SPLIT_INDEX failed on me (gdb showed me that it does not enable
> the split-index at all, at least in `git stash' and its forks)

You are right, as it turns out GIT_TEST_SPLIT_INDEX=1 has been broken
for a while now.  The patch below fixes this, but there is a bit of a
fallout of test failures, which it doesn't fix, because I'm not quite
sure how to fix one of them.  Again, more on this in the attached
notes.

  ---  >8  ---

Subject: [PATCH] [RFH] read-cache: fix GIT_TEST_SPLIT_INDEX

Setting GIT_TEST_SPLIT_INDEX=1 is supposed to turn on the split-index
feature in the test suite and trigger index splitting (mostly)
randomly.  Alas, this has been broken since 6e37c8ed3c (read-cache.c:
fix writing "link" index ext with null base oid, 2019-02-13), and no
index splitting has been performed at all since then.

There are two places where we check the value of GIT_TEST_SPLIT_INDEX,
and before 6e37c8ed3c they worked like this:

  1) In the lower-level do_write_index(), where, if
     GIT_TEST_SPLIT_INDEX is enabled, we call init_split_index().
     This call merely allocates and zero-initializes
     'istate->split_index', but does nothing else (i.e. doesn't fill
     the base/shared index with cache entries, doesn't actually write
     a shared index file, etc.).  Pertinent to this issue, the hash of
     the base index remains all zeroed out.

  2) In the higher-level write_locked_index(), where, if
     GIT_TEST_SPLIT_INDEX is enabled _and_ 'istate->split_index' has
     already been initialized, it randomly sets the flag that triggers
     index splitting later in this function.  This randomness comes
     from the first byte of the hash of the base index via an 'if
     ((first_byte & 15) < 6)' condition.

     However, if 'istate->split_index' hasn't been initialized (i.e.
     still NULL), then it calls do_write_locked_index(), which
     internally calls do_write_index() mentioned above, and then a
     'goto out' skips right over this second GIT_TEST_SPLIT_INDEX
     check.

This means that while GIT_TEST_SPLIT_INDEX=1 usually triggered index
splitting randomly, the first two index writes were always
deterministic, though I suspect this was unintentional (this is still
before 6e37c8ed3c):

  - The initial index write never splits the index.
    During the first index write write_locked_index() is called with
    'istate->split_index' still uninitialized, so the check in 2) is
    not executed.  It still calls do_write_index(), though, which then
    executes the check in 1).  The resulting all zero base index hash
    then leads to the 'link' extension being written to '.git/index',
    though a shared index file is not written:

      $ rm .git/index
      $ GIT_TEST_SPLIT_INDEX=1 git update-index --add file
      $ test-tool dump-split-index .git/index
      own c6ef71168597caec8553c83d9d0048f1ef416170
      base 0000000000000000000000000000000000000000
      100644 d00491fd7e5bb6fa28c517a0bb32b8b506539d4d 0 file
      replacements:
      deletions:
      $ ls -l .git/sharedindex.*
      ls: cannot access '.git/sharedindex.*': No such file or directory

  - The second index write always splits the index.
    When the index written in the previous point is read,
    'istate->split_index' is initialized because of the presence of
    the 'link' extension.  So during the second write
    write_locked_index() does run the check in 2), and the first byte
    of the all zero base index hash always fulfills the randomness
    condition, which in turn always triggers the index splitting.

  - Subsequent index writes will find the 'link' extension with a real
    non-zero base index hash, so from then on the check in 2) is
    executed and the first byte of the base index hash is as random as
    it gets (coming from the SHA-1 of index data including timestamps
    and inodes...).

All this worked until 6e37c8ed3c came along, and stopped writing the
'link' extension if the hash of the base index was all zero:

  $ rm .git/index
  $ GIT_TEST_SPLIT_INDEX=1 git update-index --add file
  $ test-tool dump-split-index .git/index
  own abbd6f6458d5dee73ae8e210ca15a68a390c6fd7
  not a split index
  $ ls -l .git/sharedindex.*
  ls: cannot access '.git/sharedindex.*': No such file or directory

Since there is no 'link' extension, in the second index write
'istate->split_index' remains uninitialized, and the check in 2) is
not executed, and ultimately the index is never split.

Fix this by modifying write_locked_index() to make sure to check
GIT_TEST_SPLIT_INDEX even if 'istate->split_index' is still
uninitialized, and initialize it if necessary.  The check for
GIT_TEST_SPLIT_INDEX and separate init_split_index() call in
do_write_index() thus becomes unnecessary, so remove it.  Furthermore,
add a test to 't1700-split-index.sh' to make sure that
GIT_TEST_SPLIT_INDEX=1 will keep working (though only check the
index splitting on the first index write, because after that it will
be random).

Note that this change does not restore the pre-6e37c8ed3c behaviour,
as it will deterministically split the index already on the first
index write.  Since GIT_TEST_SPLIT_INDEX is purely a developer aid,
there is no backwards compatibility issue here.  The new behaviour
does trigger test failures in 't0003-attributes.sh' and
't1600-index.sh', though, which should be fixed in preparatory patches
first.
---

Notes:
    The failures in 't1600-index.sh' are easy: those tests check how bogus
    values in $GIT_INDEX_VERSION or 'index.version' are handled, and all
    failures look like this:
    
      --- expect.err  2020-06-14 17:28:21.043527044 +0000
      +++ actual.err  2020-06-14 17:28:21.043527044 +0000
      @@ -1,2 +1,4 @@
       warning: index.version set, but the value is invalid.
       Using version
      +warning: index.version set, but the value is invalid.
      +Using version
    
    We get one such warning when writing any index file, and since we now
    split the index on the first write, we get one warning when writing
    the shared index and a second warning when writing '.git/index'.  A
    simple 'sane_unset GIT_TEST_SPLIT_INDEX' fixes these failures, and
    since this test script is about some intricacies of the index, I think
    this is the right thing to do in this case.
    
    The failure in 't0003-attributes.sh' is tricky, because it does this:
    
      # The previous tests wrote the split and shared shared index files
      # in '$TRASH_DIRECTORY/.git'.
      (
            cd bare.git &&
            GIT_INDEX_FILE=../.git/index \
            git check-attr --cached ....
      )
    
    which fails with:
    
      fatal: ./sharedindex.3e9b03a8e3e1533d2bb2e22c77d4784ce2e7b108: index file open failed: No such file or directory
    
    Unsetting GIT_TEST_SPLIT_INDEX in the test script fixes this failure
    as well, of course, though I don't think this is the right thing to do
    in this case.  Worse, this failure raises some serious questions:
    
      - Where to look for and where to write the shared index file when
        using an alternate index location?
    
        Currently it's read from / written to gitdir.  This might be fine
        as long as both cwd and the alternate index are in the same
        repository while writing and reading.  However, as this failing
        test shows, if the shared index was written in one repository,
        then an attempt to read it while in another repository using
        GIT_INDEX_FILE will fail.
    
        I think it would be more sensible to write the shared index next
        to the (alternate) index file in the same directory, wherever that
        might be.  But what about backwards compatibility?  IOW can there
        be a sensible use case that relies on GIT_INDEX_FILE pointing to
        somewhere, but its shared index file is still being written to the
        current repository?
    
      - What about expiring shared index files while using an alternate
        index?
    
        We must be careful not to expire any sharedindex.<sha1> files that
        are used by '.git/index' or any existing alternate index files.
        Unfortunately, we are not that careful, and I think this is the
        root cause of the issue that Alban is trying to fix.  The failing
        test below demonstrates this using basically only 'git
        update-index', without all the complexity of 'git stash'.
    
        test_expect_failure 'split index expiration vs. alternate index' '
            >file1 &&
            >file2 &&
    
            git update-index --split-index --add file1 &&
            # debug: this should print .git/index and .git/sharedindex.<sha1>.
            echo .git/*index* &&
    
            GIT_INDEX_FILE=.git/otherindex \
            git -c splitIndex.sharedIndexExpire=now \
            update-index --split-index --add file2 &&
            # debug: this should print .git/index, .git/otherindex and _two_
            # .git/sharedindex.<sha1> files, but, alas, it prints only one
            # shared index, because the other got expired.
            echo .git/*index* &&
    
            git diff --cached --name-only >actual1 &&
            echo file1 >expect1 &&
            test_cmp expect1 actual1 &&
    
            GIT_INDEX_FILE=.git/otherindex git diff --cached --name-only >actual2 &&
            echo file2 >expect2 &&
            test_cmp expect2 actual2
        '
    
        Now, not expiring the shared index used by '.git/index' while
        writing to an alternate split index is easy, because we do know
        about '.git/index', of course, and thus can check it to exclude
        its shared index.  But how could a git process expiring shared
        indexes know whether a shared index file belongs to an alternate
        index?!  I'm afraid it can't, which leads to the next question...
    
      - Should we even allow writing a split index when using an alternate
        index file?
    
        If we don't do that, then there is no risk of expiring the shared
        index of an alternate index, because there will be no such shared
        index to begin with.  Furthermore, if we don't write a shared
        index while using an alternate index, then we won't trigger
        expiration while using an alternate index, so we won't
        accidentally expire the shared index used by '.git/index'.
    
        Note, however, that we should still support _reading_ the shared
        index of an alternate index, as that alternate index might be the
        "real" '.git/index' of a different repository, like in the failing
        test in t0003.
    
      - Should we even allow 'splitIndex.sharedIndexExpire=now'?
    
        I believe, though haven't confirmed, that it can cause trouble
        even without using an alternate index.  Consider the following
        sequence of events:
    
          - Git process A reads '.git/index', finds the 'link' extension,
            and reads the SHA1 recorded there that determines the filename
            of its shared index.
    
          - The scheduler steps in, and puts process A to sleep.
    
          - Git process B updates the index, decides that it's time to
            write a new shared index, does so, and then because of
            'splitIndex.sharedIndexExpire=now' it removes all other shared
            index files.
    
          - The scheduler wakes process A, which now tries to open the
            shared index file it just learned about, but fails because
            that file has just been removed by process B.
    
        This is similar to the issue we have with 'git gc --prune=now',
        except that 'git gc's documentation explicitly warns about the
        risks of using '--prune=now', while the description of
        'splitIndex.sharedIndexExpire' doesn't have any such warning.
    
        I think that 'splitIndex.sharedIndexExpire=now' should be allowed,
        for those who hopefully know what they are doing, just as we allow
        'git gc --prune=now', but the documentation should clearly warn
        against its potential pitfalls.

 read-cache.c           | 22 +++++++++++++---------
 t/t1700-split-index.sh | 10 ++++++++++
 2 files changed, 23 insertions(+), 9 deletions(-)

diff --git a/read-cache.c b/read-cache.c
index aa427c5c17..b888c5df44 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -2801,11 +2801,8 @@ static int do_write_index(struct index_state *istate, struct tempfile *tempfile,
 		}
 	}
 
-	if (!istate->version) {
+	if (!istate->version)
 		istate->version = get_index_format_default(the_repository);
-		if (git_env_bool("GIT_TEST_SPLIT_INDEX", 0))
-			init_split_index(istate);
-	}
 
 	/* demote version 3 to version 2 when the latter suffices */
 	if (istate->version == 3 || istate->version == 2)
@@ -3222,7 +3219,7 @@ static int too_many_not_shared_entries(struct index_state *istate)
 int write_locked_index(struct index_state *istate, struct lock_file *lock,
 		       unsigned flags)
 {
-	int new_shared_index, ret;
+	int new_shared_index, ret, test_split_index_env;
 	struct split_index *si = istate->split_index;
 
 	if (git_env_bool("GIT_TEST_CHECK_CACHE_TREE", 0))
@@ -3237,7 +3234,9 @@ int write_locked_index(struct index_state *istate, struct lock_file *lock,
 	if (istate->fsmonitor_last_update)
 		fill_fsmonitor_bitmap(istate);
 
-	if (!si || alternate_index_output ||
+	test_split_index_env = git_env_bool("GIT_TEST_SPLIT_INDEX", 0);
+
+	if ((!si && !test_split_index_env) || alternate_index_output ||
 	    (istate->cache_changed & ~EXTMASK)) {
 		if (si)
 			oidclr(&si->base_oid);
@@ -3245,10 +3244,15 @@ int write_locked_index(struct index_state *istate, struct lock_file *lock,
 		goto out;
 	}
 
-	if (git_env_bool("GIT_TEST_SPLIT_INDEX", 0)) {
-		int v = si->base_oid.hash[0];
-		if ((v & 15) < 6)
+	if (test_split_index_env) {
+		if (!si) {
+			si = init_split_index(istate);
 			istate->cache_changed |= SPLIT_INDEX_ORDERED;
+		} else {
+			int v = si->base_oid.hash[0];
+			if ((v & 15) < 6)
+				istate->cache_changed |= SPLIT_INDEX_ORDERED;
+		}
 	}
 	if (too_many_not_shared_entries(istate))
 		istate->cache_changed |= SPLIT_INDEX_ORDERED;
diff --git a/t/t1700-split-index.sh b/t/t1700-split-index.sh
index 12a5568844..e0c9e16a8a 100755
--- a/t/t1700-split-index.sh
+++ b/t/t1700-split-index.sh
@@ -507,4 +507,14 @@ test_expect_success 'do not refresh null base index' '
 	)
 '
 
+test_expect_success 'GIT_TEST_SPLIT_INDEX works' '
+	test_create_repo git-test-split-index &&
+	(
+		cd git-test-split-index &&
+		>file &&
+		GIT_TEST_SPLIT_INDEX=1 git update-index --add file &&
+		verbose test $(ls -l .git/sharedindex.* |wc -l) = 1
+	)
+'
+
 test_done
-- 
2.27.0.278.g0bed8b425d


^ permalink raw reply related	[flat|nested] 44+ messages in thread

* Re: [RFC PATCH v1 0/6] stash: drop usage of a second index
  2020-06-15 15:27 ` SZEDER Gábor
@ 2020-06-15 21:50   ` SZEDER Gábor
  2020-06-16  7:06     ` SZEDER Gábor
  0 siblings, 1 reply; 44+ messages in thread
From: SZEDER Gábor @ 2020-06-15 21:50 UTC (permalink / raw)
  To: Alban Gruin
  Cc: git, Thomas Gummerer, Johannes Schindelin, Junio C Hamano,
	Son Luong Ngoc

On Mon, Jun 15, 2020 at 05:27:15PM +0200, SZEDER Gábor wrote:
>       - Should we even allow 'splitIndex.sharedIndexExpire=now'?
>     
>         I believe, though haven't confirmed, that it can cause trouble
>         even without using an alternate index.  Consider the following
>         sequence of events:
>     
>           - Git process A reads '.git/index', finds the 'link' extension,
>             and reads the SHA1 recorded there that determines the filename
>             of its shared index.
>     
>           - The scheduler steps in, and puts process A to sleep.
>     
>           - Git process B updates the index, decides that it's time to
>             write a new shared index, does so, and then because of
>             'splitIndex.sharedIndexExpire=now' it removes all other shared
>             index files.
>     
>           - The scheduler wakes process A, which now tries to open the
>             shared index file it just learned about, but fails because
>             that file has just been removed by process B.

Confirmed.

To help reproduce the issue, this diff adds a strategically-placed
controllable delay between reading '.git/index' and reading its
shared/base index:

diff --git a/read-cache.c b/read-cache.c
index b888c5df44..5a66e9bf4b 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -2319,6 +2319,9 @@ int read_index_from(struct index_state *istate, const char *path,
 	else
 		split_index->base = xcalloc(1, sizeof(*split_index->base));
 
+	if (git_env_bool("GIT_TEST_WAIT", 0))
+		sleep(3);
+
 	base_oid_hex = oid_to_hex(&split_index->base_oid);
 	base_path = xstrfmt("%s/sharedindex.%s", gitdir, base_oid_hex);
 	trace2_region_enter_printf("index", "shared/do_read_index",


Then this test creates the above described sequence of events:

test_expect_failure 'splitIndex.sharedIndexExpire=now can be harmful' '
	>file1 &&
	>file2 &&
	git update-index --split-index --add file1 &&

	{
		sleep 1 &&
		# "process B"
		git -c splitIndex.sharedIndexExpire=now \
		    update-index --split-index --add file2 &
	} &&

	# "process A"
	GIT_TEST_WAIT=1 git diff --cached --name-only
'

... and fails reliably with:

  [...]
  + GIT_TEST_WAIT=1 git diff --cached --name-only
  [ ... trace from background commands removed ...]
  fatal: .git/sharedindex.818f65852e7402f236aeaadd32efdbb62291aa75: index file open failed: No such file or directory


>         This is similar to the issue we have with 'git gc --prune=now',
>         except that 'git gc's documentation explicitly warns about the
>         risks of using '--prune=now', while the description of
>         'splitIndex.sharedIndexExpire' doesn't have any such warning.
>     
>         I think that 'splitIndex.sharedIndexExpire=now' should be allowed,
>         for those who hopefully know what they are doing, just as we allow
>         'git gc --prune=now', but the documentation should clearly warn
>         against its potential pitfalls.

^ permalink raw reply related	[flat|nested] 44+ messages in thread

* Re: [RFC PATCH v1 0/6] stash: drop usage of a second index
  2020-06-15 21:50   ` SZEDER Gábor
@ 2020-06-16  7:06     ` SZEDER Gábor
  2020-06-17 20:04       ` Junio C Hamano
  0 siblings, 1 reply; 44+ messages in thread
From: SZEDER Gábor @ 2020-06-16  7:06 UTC (permalink / raw)
  To: Alban Gruin
  Cc: git, Thomas Gummerer, Johannes Schindelin, Junio C Hamano,
	Son Luong Ngoc

On Mon, Jun 15, 2020 at 11:50:20PM +0200, SZEDER Gábor wrote:
> On Mon, Jun 15, 2020 at 05:27:15PM +0200, SZEDER Gábor wrote:
> >       - Should we even allow 'splitIndex.sharedIndexExpire=now'?
> >     
> >         I believe, though haven't confirmed, that it can cause trouble
> >         even without using an alternate index.  Consider the following
> >         sequence of events:
> >     
> >           - Git process A reads '.git/index', finds the 'link' extension,
> >             and reads the SHA1 recorded there that determines the filename
> >             of its shared index.
> >     
> >           - The scheduler steps in, and puts process A to sleep.
> >     
> >           - Git process B updates the index, decides that it's time to
> >             write a new shared index, does so, and then because of
> >             'splitIndex.sharedIndexExpire=now' it removes all other shared
> >             index files.
> >     
> >           - The scheduler wakes process A, which now tries to open the
> >             shared index file it just learned about, but fails because
> >             that file has just been removed by process B.
> 
> Confirmed.
> 
> To help reproduce the issue, this diff adds a strategically-placed
> controllable delay between reading '.git/index' and reading its
> shared/base index:
> 
> diff --git a/read-cache.c b/read-cache.c
> index b888c5df44..5a66e9bf4b 100644
> --- a/read-cache.c
> +++ b/read-cache.c
> @@ -2319,6 +2319,9 @@ int read_index_from(struct index_state *istate, const char *path,
>  	else
>  		split_index->base = xcalloc(1, sizeof(*split_index->base));
>  
> +	if (git_env_bool("GIT_TEST_WAIT", 0))
> +		sleep(3);

We don't even need this sleep() call ...

>  	base_oid_hex = oid_to_hex(&split_index->base_oid);
>  	base_path = xstrfmt("%s/sharedindex.%s", gitdir, base_oid_hex);
>  	trace2_region_enter_printf("index", "shared/do_read_index",
> 
> 
> Then this test creates the above described sequence of events:
> 
> test_expect_failure 'splitIndex.sharedIndexExpire=now can be harmful' '
> 	>file1 &&
> 	>file2 &&
> 	git update-index --split-index --add file1 &&
> 
> 	{
> 		sleep 1 &&

... and this 'sleep 1' command.

> 		# "process B"
> 		git -c splitIndex.sharedIndexExpire=now \
> 		    update-index --split-index --add file2 &
> 	} &&
> 
> 	# "process A"
> 	GIT_TEST_WAIT=1 git diff --cached --name-only
> '
> 
> ... and fails reliably with:
> 
>   [...]
>   + GIT_TEST_WAIT=1 git diff --cached --name-only
>   [ ... trace from background commands removed ...]
>   fatal: .git/sharedindex.818f65852e7402f236aeaadd32efdbb62291aa75: index file open failed: No such file or directory

Flipping the test to 'test_expect_success' and running it with
'--stress' almost always fails within 2 seconds.


> >         This is similar to the issue we have with 'git gc --prune=now',
> >         except that 'git gc's documentation explicitly warns about the
> >         risks of using '--prune=now', while the description of
> >         'splitIndex.sharedIndexExpire' doesn't have any such warning.
> >     
> >         I think that 'splitIndex.sharedIndexExpire=now' should be allowed,
> >         for those who hopefully know what they are doing, just as we allow
> >         'git gc --prune=now', but the documentation should clearly warn
> >         against its potential pitfalls.

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [RFC PATCH v1 0/6] stash: drop usage of a second index
  2020-06-16  7:06     ` SZEDER Gábor
@ 2020-06-17 20:04       ` Junio C Hamano
  2020-06-17 21:31         ` Alban Gruin
  0 siblings, 1 reply; 44+ messages in thread
From: Junio C Hamano @ 2020-06-17 20:04 UTC (permalink / raw)
  To: SZEDER Gábor
  Cc: Alban Gruin, git, Thomas Gummerer, Johannes Schindelin, Son Luong Ngoc

SZEDER Gábor <szeder.dev@gmail.com> writes:

> On Mon, Jun 15, 2020 at 11:50:20PM +0200, SZEDER Gábor wrote:
>> On Mon, Jun 15, 2020 at 05:27:15PM +0200, SZEDER Gábor wrote:
>> >       - Should we even allow 'splitIndex.sharedIndexExpire=now'?

Good analysis.  The most conservative might end up to be to disable
splitindex altogether but perhaps we can first set a reasonable
minimum to the expiration to say 10min?

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [RFC PATCH v1 0/6] stash: drop usage of a second index
  2020-06-17 20:04       ` Junio C Hamano
@ 2020-06-17 21:31         ` Alban Gruin
  0 siblings, 0 replies; 44+ messages in thread
From: Alban Gruin @ 2020-06-17 21:31 UTC (permalink / raw)
  To: Junio C Hamano, SZEDER Gábor
  Cc: git, Thomas Gummerer, Johannes Schindelin, Son Luong Ngoc

Hi,

Le 17/06/2020 à 22:04, Junio C Hamano a écrit :
> SZEDER Gábor <szeder.dev@gmail.com> writes:
> 
>> On Mon, Jun 15, 2020 at 11:50:20PM +0200, SZEDER Gábor wrote:
>>> On Mon, Jun 15, 2020 at 05:27:15PM +0200, SZEDER Gábor wrote:
>>>>       - Should we even allow 'splitIndex.sharedIndexExpire=now'?
> 
> Good analysis.  The most conservative might end up to be to disable
> splitindex altogether but perhaps we can first set a reasonable
> minimum to the expiration to say 10min?
> 

I tried to understand what write_locked_index() and its callees does,
and I think there is an issue regarding how split-index files are handled:

 1. The new "main" index is written in a temporary file.  This temporary
file is either renamed to "index" if the COMMIT_LOCK flag is set, or
left as-is (cf. do_write_locked_index()).

 2. The split-index files are removed if they are too old, without
checking if the COMMIT_LOCK flag is set (cf. write_shared_index()).

This could lead to a situation where a split-index file is removed
because it is deemed too old, but the main index is left as-is, still
pointing to this file.  I am afraid that this can be an issue, even when
`splitIndex.sharedIndexExpire' is set to a "sane" value (which could
obviously be exacerbated by `now').

Alban


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [RFC PATCH v1 0/6] stash: drop usage of a second index
  2020-06-13  7:52 ` Christian Couder
@ 2020-06-25 12:35   ` Alban Gruin
  0 siblings, 0 replies; 44+ messages in thread
From: Alban Gruin @ 2020-06-25 12:35 UTC (permalink / raw)
  To: Christian Couder
  Cc: git, Thomas Gummerer, Johannes Schindelin, Junio C Hamano,
	Son Luong Ngoc

Hi Christian,

Le 13/06/2020 à 09:52, Christian Couder a écrit :
> Hi,
> 
> On Tue, May 5, 2020 at 12:56 PM Alban Gruin <alban.gruin@gmail.com> wrote:
>>
>> The old scripted `git stash' used to create a second index to save
>> modified and untracked files, and restore untracked files, without
>> affecting the main index.  This behaviour was carried on when it was
>> rewritten in C, and here, most operations performed on the second index
>> are done by forked commands (ie. `read-tree' instead of reset_tree(),
>> etc.).  This works most of the time, except in some edge case with the
>> split-index when the split file has expired and is deleted by a forked
>> command: the main index may still contain a reference to the now-deleted
>> file, and subsequent operations on the index will fail [0].
> 
> Thanks for working on this! I agree that it would be nice to fix split
> index issues as it could help for sure with huge repositories. Sorry
> also that this patch series fell through the cracks.
> 
> I am adding Son Luong Ngoc in Cc as he reported the issue that this
> series fixes.
> 
>> The goal of this series is to modernise (a bit) builtin/stash.c, and to
>> fix the aforementionned edge case.
>>
>> I have to admit that I don't really know how to test this.
>> GIT_TEST_SPLIT_INDEX failed on me (gdb showed me that it does not enable
>> the split-index at all, at least in `git stash' and its forks),
> 
> It should have worked when it was introduced, though maybe not for `git stash`.
> 
>> and I'm
>> reluctant to add explicits tests on `git stash' about the split-index,
>> when nothing in its code explicitly does unusual things with the index
>> once this series is applied.  If anyone wants to share opinions about
>> this, I would be happy to read them.
> 
> I understand. I think the good way forward would be to fix
> GIT_TEST_SPLIT_INDEX and find a way to ensure that it keeps working in
> the future.
> 
> Thanks,
> Christian.
> 

Thank you for your feedback.

I'll resend this series with the changes you and Son suggested, but I
think I'm going to remove references to bugs in the commit messages, to
turn it into another cleanup series.  For the index, I will to try to
implement Gábor's suggestions in another series.

Thanks,
Alban


^ permalink raw reply	[flat|nested] 44+ messages in thread

* [PATCH v2 0/6] stash: drop usage of a second index
  2020-05-05 10:48 [RFC PATCH v1 0/6] stash: drop usage of a second index Alban Gruin
                   ` (8 preceding siblings ...)
  2020-06-15 15:27 ` SZEDER Gábor
@ 2020-06-30 15:15 ` Alban Gruin
  2020-06-30 15:15   ` [PATCH v2 1/6] stash: mark `i_tree' in reset_tree() const Alban Gruin
                     ` (7 more replies)
  9 siblings, 8 replies; 44+ messages in thread
From: Alban Gruin @ 2020-06-30 15:15 UTC (permalink / raw)
  To: git
  Cc: Christian Couder, Thomas Gummerer, Junio C Hamano,
	Son Luong Ngoc, Alban Gruin

The old scripted `git stash' used to create a second index to save
modified and untracked files, and restore untracked files, without
affecting the main index.  This behaviour was carried on when it was
rewritten in C, and here, most operations performed on the second index
are done by forked commands (ie. `read-tree' instead of reset_tree(),
etc.).

The goal of this series is to modernise (a bit) builtin/stash.c.

Originally, this series was also meant to fix a bug reported by Son
Luong Ngoc [0], but as emphasized by Gábor [1], the issue is not limited
to `git stash', so this series is not a good fix for this particular
issue.

This series is based on a08a83db2b (The sixth batch, 2020-06-29).

The tip of this series is tagged as "stash-remove-second-index-v2" at
https://github.com/agrn/git.

[0] https://lore.kernel.org/git/EED2CFF1-5BEF-429D-AB99-AD148A867614@gmail.com/
[1] https://lore.kernel.org/git/20200615152715.GD2898@szeder.dev/

Changes since v1:

 - Lots of rewording, following comments from Christian Couder and Son
   Luong Ngoc.

 - Removed a useless function call.

Alban Gruin (6):
  stash: mark `i_tree' in reset_tree() const
  stash: remove the second index in stash_working_tree()
  stash: remove the second index in stash_patch()
  stash: remove the second index in save_untracked_files()
  stash: remove the second index in restore_untracked()
  stash: remove `stash_index_path'

 builtin/stash.c | 156 +++++++++++++++---------------------------------
 1 file changed, 48 insertions(+), 108 deletions(-)

Range-diff against v1:
1:  6101e8ce64 ! 1:  598f03e76e stash: mark `i_tree' in reset_tree() const
    @@ -2,9 +2,9 @@
     
         stash: mark `i_tree' in reset_tree() const
     
    -    As reset_tree() does not change the value pointed by `i_tree', and that
    -    it will be provided with `the_hash_algo->empty_tree' which is a
    -    constant, it is changed to be a pointer to a constant.
    +    In reset_tree(), the value pointed by `i_tree' is not modified.  In a
    +    latter commit, it will be provided with `the_hash_algo->empty_tree',
    +    which is a constant.  Hence, this changes `i_tree' to be a constant.
     
         Signed-off-by: Alban Gruin <alban.gruin@gmail.com>
     
2:  26dceead41 ! 2:  7265cfe65c stash: remove the second index in stash_working_tree()
    @@ -3,26 +3,19 @@
         stash: remove the second index in stash_working_tree()
     
         This removes the second index used in stash_working_tree() to simplify
    -    the code.  It also help to avoid issues with the split-index: when
    -    stash_working_tree() is called, the index is at `i_tree', and this tree
    -    is extracted in a second index for use in a subcommand.  This is not a
    -    problem in the non-split-index case, but in the split-index case, if the
    -    shared index file has expired and is removed by a subcommand, the main
    -    index contains a reference to a file that no longer exists.
    +    the code.
     
         The calls to set_alternative_index_output() are dropped to extract
         `i_tree' to the main index, and `GIT_INDEX_FILE' is no longer set before
         starting `update-index'.  When it exits, the index has changed, and must
         be discarded.
     
    -    The call to reset_tree() becomes useless: the only caller of
    -    stash_working_tree() is do_create_stash(), which creates `i_tree' from
    -    its index, calls save_untracked_files() if requested (but as it also
    -    works on a second index, it is unaffected), then calls
    -    stash_working_tree().  But when save_untracked_files() will be modified
    -    to stop using another index, it won't reset the tree, because
    -    stash_patch() wants to work on a different tree (`b_tree') than
    -    stash_working_tree().
    +    With this commit, reset_tree() does not need to be called at the
    +    beginning of stash_working_tree(), because it is only called by
    +    do_create_stash(), which sets the index at `i_tree', and
    +    save_untracked_files() does not change the main index.  But it will
    +    become useful again in a later commit, when save_untracked_file() will
    +    be rewritten to use the "main" index, so I did not remove it.
     
         At the end of the function, the tree is reset to `i_tree'.
     
3:  1ae236daf4 = 3:  b5587c0d56 stash: remove the second index in stash_patch()
4:  8c901ddf9d ! 4:  5fbcddf899 stash: remove the second index in save_untracked_files()
    @@ -8,7 +8,7 @@
         containing only untracked files, so the tree is reset to the empty tree
         at the beginning (hence the need for reset_tree() to accept constant
         trees).  The environment of `update-index' is no longer updated, and the
    -    index is discarded after this command exited.
    +    index is discarded after this command exits.
     
         As the only caller of this function is do_create_stash(), and the next
         user of the index is either save_untracked_files() or stash_patch(),
5:  413be5b5f9 ! 5:  0338986339 stash: remove the second index in restore_untracked()
    @@ -42,7 +42,7 @@
      		return -1;
     -	}
      
    - 	child_process_init(&cp);
    +-	child_process_init(&cp);
      	cp.git_cmd = 1;
      	argv_array_pushl(&cp.args, "checkout-index", "--all", NULL);
     -	argv_array_pushf(&cp.env_array, "GIT_INDEX_FILE=%s",
6:  b2c08c189e = 6:  4f151f4600 stash: remove `stash_index_path'
-- 
2.20.1


^ permalink raw reply	[flat|nested] 44+ messages in thread

* [PATCH v2 1/6] stash: mark `i_tree' in reset_tree() const
  2020-06-30 15:15 ` [PATCH v2 " Alban Gruin
@ 2020-06-30 15:15   ` Alban Gruin
  2020-06-30 15:15   ` [PATCH v2 2/6] stash: remove the second index in stash_working_tree() Alban Gruin
                     ` (6 subsequent siblings)
  7 siblings, 0 replies; 44+ messages in thread
From: Alban Gruin @ 2020-06-30 15:15 UTC (permalink / raw)
  To: git
  Cc: Christian Couder, Thomas Gummerer, Junio C Hamano,
	Son Luong Ngoc, Alban Gruin

In reset_tree(), the value pointed by `i_tree' is not modified.  In a
latter commit, it will be provided with `the_hash_algo->empty_tree',
which is a constant.  Hence, this changes `i_tree' to be a constant.

Signed-off-by: Alban Gruin <alban.gruin@gmail.com>
---
 builtin/stash.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/builtin/stash.c b/builtin/stash.c
index 0c52a3b849..9baa8b379e 100644
--- a/builtin/stash.c
+++ b/builtin/stash.c
@@ -228,7 +228,7 @@ static int clear_stash(int argc, const char **argv, const char *prefix)
 	return do_clear_stash();
 }
 
-static int reset_tree(struct object_id *i_tree, int update, int reset)
+static int reset_tree(const struct object_id *i_tree, int update, int reset)
 {
 	int nr_trees = 1;
 	struct unpack_trees_options opts;
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH v2 2/6] stash: remove the second index in stash_working_tree()
  2020-06-30 15:15 ` [PATCH v2 " Alban Gruin
  2020-06-30 15:15   ` [PATCH v2 1/6] stash: mark `i_tree' in reset_tree() const Alban Gruin
@ 2020-06-30 15:15   ` Alban Gruin
  2020-06-30 15:15   ` [PATCH v2 3/6] stash: remove the second index in stash_patch() Alban Gruin
                     ` (5 subsequent siblings)
  7 siblings, 0 replies; 44+ messages in thread
From: Alban Gruin @ 2020-06-30 15:15 UTC (permalink / raw)
  To: git
  Cc: Christian Couder, Thomas Gummerer, Junio C Hamano,
	Son Luong Ngoc, Alban Gruin

This removes the second index used in stash_working_tree() to simplify
the code.

The calls to set_alternative_index_output() are dropped to extract
`i_tree' to the main index, and `GIT_INDEX_FILE' is no longer set before
starting `update-index'.  When it exits, the index has changed, and must
be discarded.

With this commit, reset_tree() does not need to be called at the
beginning of stash_working_tree(), because it is only called by
do_create_stash(), which sets the index at `i_tree', and
save_untracked_files() does not change the main index.  But it will
become useful again in a later commit, when save_untracked_file() will
be rewritten to use the "main" index, so I did not remove it.

At the end of the function, the tree is reset to `i_tree'.

Signed-off-by: Alban Gruin <alban.gruin@gmail.com>
---
 builtin/stash.c | 14 +++-----------
 1 file changed, 3 insertions(+), 11 deletions(-)

diff --git a/builtin/stash.c b/builtin/stash.c
index 9baa8b379e..2535335275 100644
--- a/builtin/stash.c
+++ b/builtin/stash.c
@@ -1059,17 +1059,14 @@ static int stash_working_tree(struct stash_info *info, const struct pathspec *ps
 	struct rev_info rev;
 	struct child_process cp_upd_index = CHILD_PROCESS_INIT;
 	struct strbuf diff_output = STRBUF_INIT;
-	struct index_state istate = { NULL };
 
 	init_revisions(&rev, NULL);
 	copy_pathspec(&rev.prune_data, ps);
 
-	set_alternate_index_output(stash_index_path.buf);
 	if (reset_tree(&info->i_tree, 0, 0)) {
 		ret = -1;
 		goto done;
 	}
-	set_alternate_index_output(NULL);
 
 	rev.diffopt.output_format = DIFF_FORMAT_CALLBACK;
 	rev.diffopt.format_callback = add_diff_to_buf;
@@ -1091,8 +1088,6 @@ static int stash_working_tree(struct stash_info *info, const struct pathspec *ps
 	argv_array_pushl(&cp_upd_index.args, "update-index",
 			 "--ignore-skip-worktree-entries",
 			 "-z", "--add", "--remove", "--stdin", NULL);
-	argv_array_pushf(&cp_upd_index.env_array, "GIT_INDEX_FILE=%s",
-			 stash_index_path.buf);
 
 	if (pipe_command(&cp_upd_index, diff_output.buf, diff_output.len,
 			 NULL, 0, NULL, 0)) {
@@ -1100,19 +1095,16 @@ static int stash_working_tree(struct stash_info *info, const struct pathspec *ps
 		goto done;
 	}
 
-	if (write_index_as_tree(&info->w_tree, &istate, stash_index_path.buf, 0,
-				NULL)) {
+	discard_cache();
+	if (write_cache_as_tree(&info->w_tree, 0, NULL) ||
+	    reset_tree(&info->i_tree, 0, 1))
 		ret = -1;
-		goto done;
-	}
 
 done:
-	discard_index(&istate);
 	UNLEAK(rev);
 	object_array_clear(&rev.pending);
 	clear_pathspec(&rev.prune_data);
 	strbuf_release(&diff_output);
-	remove_path(stash_index_path.buf);
 	return ret;
 }
 
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH v2 3/6] stash: remove the second index in stash_patch()
  2020-06-30 15:15 ` [PATCH v2 " Alban Gruin
  2020-06-30 15:15   ` [PATCH v2 1/6] stash: mark `i_tree' in reset_tree() const Alban Gruin
  2020-06-30 15:15   ` [PATCH v2 2/6] stash: remove the second index in stash_working_tree() Alban Gruin
@ 2020-06-30 15:15   ` Alban Gruin
  2020-06-30 15:15   ` [PATCH v2 4/6] stash: remove the second index in save_untracked_files() Alban Gruin
                     ` (4 subsequent siblings)
  7 siblings, 0 replies; 44+ messages in thread
From: Alban Gruin @ 2020-06-30 15:15 UTC (permalink / raw)
  To: git
  Cc: Christian Couder, Thomas Gummerer, Junio C Hamano,
	Son Luong Ngoc, Alban Gruin

This removes the second index used in stash_patch().

This function starts by resetting the index (which is set at `i_tree')
to HEAD, which has been stored in `b_commit' by do_create_stash(), and
the call to `read-tree' is replaced by reset_tree().  The index is
discarded after run_add_interactive(), but not `diff-tree' as this
command should not change it.

Since the index has been changed, and subsequent code might be sensitive
to this, it is reset to `i_tree' at the end of the function.

Signed-off-by: Alban Gruin <alban.gruin@gmail.com>
---
 builtin/stash.c | 47 ++++++++++-------------------------------------
 1 file changed, 10 insertions(+), 37 deletions(-)

diff --git a/builtin/stash.c b/builtin/stash.c
index 2535335275..eaeb7bc8c4 100644
--- a/builtin/stash.c
+++ b/builtin/stash.c
@@ -995,51 +995,24 @@ static int stash_patch(struct stash_info *info, const struct pathspec *ps,
 		       struct strbuf *out_patch, int quiet)
 {
 	int ret = 0;
-	struct child_process cp_read_tree = CHILD_PROCESS_INIT;
 	struct child_process cp_diff_tree = CHILD_PROCESS_INIT;
-	struct index_state istate = { NULL };
-	char *old_index_env = NULL, *old_repo_index_file;
 
-	remove_path(stash_index_path.buf);
-
-	cp_read_tree.git_cmd = 1;
-	argv_array_pushl(&cp_read_tree.args, "read-tree", "HEAD", NULL);
-	argv_array_pushf(&cp_read_tree.env_array, "GIT_INDEX_FILE=%s",
-			 stash_index_path.buf);
-	if (run_command(&cp_read_tree)) {
-		ret = -1;
-		goto done;
-	}
+	if (reset_tree(&info->b_commit, 0, 1))
+		return -1;
 
 	/* Find out what the user wants. */
-	old_repo_index_file = the_repository->index_file;
-	the_repository->index_file = stash_index_path.buf;
-	old_index_env = xstrdup_or_null(getenv(INDEX_ENVIRONMENT));
-	setenv(INDEX_ENVIRONMENT, the_repository->index_file, 1);
-
 	ret = run_add_interactive(NULL, "--patch=stash", ps);
 
-	the_repository->index_file = old_repo_index_file;
-	if (old_index_env && *old_index_env)
-		setenv(INDEX_ENVIRONMENT, old_index_env, 1);
-	else
-		unsetenv(INDEX_ENVIRONMENT);
-	FREE_AND_NULL(old_index_env);
-
 	/* State of the working tree. */
-	if (write_index_as_tree(&info->w_tree, &istate, stash_index_path.buf, 0,
-				NULL)) {
-		ret = -1;
-		goto done;
-	}
+	discard_cache();
+	if (write_cache_as_tree(&info->w_tree, 0, NULL))
+		return -1;
 
 	cp_diff_tree.git_cmd = 1;
 	argv_array_pushl(&cp_diff_tree.args, "diff-tree", "-p", "-U1", "HEAD",
 			 oid_to_hex(&info->w_tree), "--", NULL);
-	if (pipe_command(&cp_diff_tree, NULL, 0, out_patch, 0, NULL, 0)) {
-		ret = -1;
-		goto done;
-	}
+	if (pipe_command(&cp_diff_tree, NULL, 0, out_patch, 0, NULL, 0))
+		return -1;
 
 	if (!out_patch->len) {
 		if (!quiet)
@@ -1047,9 +1020,9 @@ static int stash_patch(struct stash_info *info, const struct pathspec *ps,
 		ret = 1;
 	}
 
-done:
-	discard_index(&istate);
-	remove_path(stash_index_path.buf);
+	if (reset_tree(&info->i_tree, 0, 1))
+		return -1;
+
 	return ret;
 }
 
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH v2 4/6] stash: remove the second index in save_untracked_files()
  2020-06-30 15:15 ` [PATCH v2 " Alban Gruin
                     ` (2 preceding siblings ...)
  2020-06-30 15:15   ` [PATCH v2 3/6] stash: remove the second index in stash_patch() Alban Gruin
@ 2020-06-30 15:15   ` Alban Gruin
  2020-06-30 15:15   ` [PATCH v2 5/6] stash: remove the second index in restore_untracked() Alban Gruin
                     ` (3 subsequent siblings)
  7 siblings, 0 replies; 44+ messages in thread
From: Alban Gruin @ 2020-06-30 15:15 UTC (permalink / raw)
  To: git
  Cc: Christian Couder, Thomas Gummerer, Junio C Hamano,
	Son Luong Ngoc, Alban Gruin

This removes the second index used in save_untracked_files().

This functions creates a new commit with no parents, and a tree
containing only untracked files, so the tree is reset to the empty tree
at the beginning (hence the need for reset_tree() to accept constant
trees).  The environment of `update-index' is no longer updated, and the
index is discarded after this command exits.

As the only caller of this function is do_create_stash(), and the next
user of the index is either save_untracked_files() or stash_patch(),
which both want a different tree, the index is left as-is at the end.

Signed-off-by: Alban Gruin <alban.gruin@gmail.com>
---
 builtin/stash.c | 39 +++++++++++++++++----------------------
 1 file changed, 17 insertions(+), 22 deletions(-)

diff --git a/builtin/stash.c b/builtin/stash.c
index eaeb7bc8c4..cbe37cd24b 100644
--- a/builtin/stash.c
+++ b/builtin/stash.c
@@ -954,41 +954,36 @@ static int check_changes(const struct pathspec *ps, int include_untracked,
 static int save_untracked_files(struct stash_info *info, struct strbuf *msg,
 				struct strbuf files)
 {
-	int ret = 0;
 	struct strbuf untracked_msg = STRBUF_INIT;
 	struct child_process cp_upd_index = CHILD_PROCESS_INIT;
-	struct index_state istate = { NULL };
+
+	if (reset_tree(the_hash_algo->empty_tree, 0, 1))
+		return -1;
 
 	cp_upd_index.git_cmd = 1;
 	argv_array_pushl(&cp_upd_index.args, "update-index", "-z", "--add",
 			 "--remove", "--stdin", NULL);
-	argv_array_pushf(&cp_upd_index.env_array, "GIT_INDEX_FILE=%s",
-			 stash_index_path.buf);
+
+	if (pipe_command(&cp_upd_index, files.buf, files.len, NULL, 0,
+			 NULL, 0))
+		return -1;
+
+	discard_cache();
+	if (write_cache_as_tree(&info->u_tree, 0, NULL))
+		return -1;
 
 	strbuf_addf(&untracked_msg, "untracked files on %s\n", msg->buf);
-	if (pipe_command(&cp_upd_index, files.buf, files.len, NULL, 0,
-			 NULL, 0)) {
-		ret = -1;
-		goto done;
-	}
-
-	if (write_index_as_tree(&info->u_tree, &istate, stash_index_path.buf, 0,
-				NULL)) {
-		ret = -1;
-		goto done;
-	}
-
 	if (commit_tree(untracked_msg.buf, untracked_msg.len,
 			&info->u_tree, NULL, &info->u_commit, NULL, NULL)) {
-		ret = -1;
-		goto done;
+		strbuf_release(&untracked_msg);
+		return -1;
 	}
 
-done:
-	discard_index(&istate);
+	/* Do not reset the tree, as either stash_patch() or
+	 * stash_working_tree() will do it. */
+
 	strbuf_release(&untracked_msg);
-	remove_path(stash_index_path.buf);
-	return ret;
+	return 0;
 }
 
 static int stash_patch(struct stash_info *info, const struct pathspec *ps,
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH v2 5/6] stash: remove the second index in restore_untracked()
  2020-06-30 15:15 ` [PATCH v2 " Alban Gruin
                     ` (3 preceding siblings ...)
  2020-06-30 15:15   ` [PATCH v2 4/6] stash: remove the second index in save_untracked_files() Alban Gruin
@ 2020-06-30 15:15   ` Alban Gruin
  2020-07-31 13:45     ` Christian Couder
  2020-06-30 15:15   ` [PATCH v2 6/6] stash: remove `stash_index_path' Alban Gruin
                     ` (2 subsequent siblings)
  7 siblings, 1 reply; 44+ messages in thread
From: Alban Gruin @ 2020-06-30 15:15 UTC (permalink / raw)
  To: git
  Cc: Christian Couder, Thomas Gummerer, Junio C Hamano,
	Son Luong Ngoc, Alban Gruin

This removes the second index used in restore_untracked().

The call to `read-tree' is replaced by reset_tree() with the appropriate
parameters (no update, no reset).  The environment of `checkout-index'
is no longer modified, and the cache is discarded when it exists.

In do_apply_stash(), the changes are a bit more involved: to avoid
conflicts with the merged index, restore_untracked() is moved after
merge_recursive_generic().

This introduces another problem: the files that were untracked once are
now added to the index, and update_index() would add back those files in
the index.  To avoid this, get_newly_staged() is moved before
restore_untracked().

Signed-off-by: Alban Gruin <alban.gruin@gmail.com>
---
 builtin/stash.c | 47 +++++++++++++++++------------------------------
 1 file changed, 17 insertions(+), 30 deletions(-)

diff --git a/builtin/stash.c b/builtin/stash.c
index cbe37cd24b..d5077a27d9 100644
--- a/builtin/stash.c
+++ b/builtin/stash.c
@@ -359,29 +359,15 @@ static int restore_untracked(struct object_id *u_tree)
 	int res;
 	struct child_process cp = CHILD_PROCESS_INIT;
 
-	/*
-	 * We need to run restore files from a given index, but without
-	 * affecting the current index, so we use GIT_INDEX_FILE with
-	 * run_command to fork processes that will not interfere.
-	 */
-	cp.git_cmd = 1;
-	argv_array_push(&cp.args, "read-tree");
-	argv_array_push(&cp.args, oid_to_hex(u_tree));
-	argv_array_pushf(&cp.env_array, "GIT_INDEX_FILE=%s",
-			 stash_index_path.buf);
-	if (run_command(&cp)) {
-		remove_path(stash_index_path.buf);
+	if (reset_tree(u_tree, 0, 0))
 		return -1;
-	}
 
-	child_process_init(&cp);
 	cp.git_cmd = 1;
 	argv_array_pushl(&cp.args, "checkout-index", "--all", NULL);
-	argv_array_pushf(&cp.env_array, "GIT_INDEX_FILE=%s",
-			 stash_index_path.buf);
 
 	res = run_command(&cp);
-	remove_path(stash_index_path.buf);
+	discard_cache();
+
 	return res;
 }
 
@@ -395,6 +381,7 @@ static int do_apply_stash(const char *prefix, struct stash_info *info,
 	struct object_id index_tree;
 	struct commit *result;
 	const struct object_id *bases[1];
+	struct strbuf newly_staged = STRBUF_INIT;
 
 	read_cache_preload(NULL);
 	if (refresh_and_write_cache(REFRESH_QUIET, 0, 0))
@@ -433,9 +420,6 @@ static int do_apply_stash(const char *prefix, struct stash_info *info,
 		}
 	}
 
-	if (info->has_u && restore_untracked(&info->u_tree))
-		return error(_("could not restore untracked files from stash"));
-
 	init_merge_options(&o, the_repository);
 
 	o.branch1 = "Updated upstream";
@@ -463,24 +447,27 @@ static int do_apply_stash(const char *prefix, struct stash_info *info,
 		return ret;
 	}
 
+	if (!has_index && get_newly_staged(&newly_staged, &c_tree)) {
+		strbuf_release(&newly_staged);
+		return -1;
+	}
+
+	if (info->has_u && restore_untracked(&info->u_tree)) {
+		strbuf_release(&newly_staged);
+		return error(_("could not restore untracked files from stash"));
+	}
+
 	if (has_index) {
 		if (reset_tree(&index_tree, 0, 0))
 			return -1;
 	} else {
-		struct strbuf out = STRBUF_INIT;
-
-		if (get_newly_staged(&out, &c_tree)) {
-			strbuf_release(&out);
-			return -1;
-		}
-
 		if (reset_tree(&c_tree, 0, 1)) {
-			strbuf_release(&out);
+			strbuf_release(&newly_staged);
 			return -1;
 		}
 
-		ret = update_index(&out);
-		strbuf_release(&out);
+		ret = update_index(&newly_staged);
+		strbuf_release(&newly_staged);
 		if (ret)
 			return -1;
 
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH v2 6/6] stash: remove `stash_index_path'
  2020-06-30 15:15 ` [PATCH v2 " Alban Gruin
                     ` (4 preceding siblings ...)
  2020-06-30 15:15   ` [PATCH v2 5/6] stash: remove the second index in restore_untracked() Alban Gruin
@ 2020-06-30 15:15   ` Alban Gruin
  2020-07-31 13:53   ` [PATCH v2 0/6] stash: drop usage of a second index Christian Couder
  2020-07-31 16:51   ` [PATCH v3 " Alban Gruin
  7 siblings, 0 replies; 44+ messages in thread
From: Alban Gruin @ 2020-06-30 15:15 UTC (permalink / raw)
  To: git
  Cc: Christian Couder, Thomas Gummerer, Junio C Hamano,
	Son Luong Ngoc, Alban Gruin

Since stash no longer uses a second index, `stash_index_path' is now
unused, and can be dropped.

Signed-off-by: Alban Gruin <alban.gruin@gmail.com>
---
 builtin/stash.c | 7 -------
 1 file changed, 7 deletions(-)

diff --git a/builtin/stash.c b/builtin/stash.c
index d5077a27d9..f1b3c0d2f8 100644
--- a/builtin/stash.c
+++ b/builtin/stash.c
@@ -88,7 +88,6 @@ static const char * const git_stash_save_usage[] = {
 };
 
 static const char *ref_stash = "refs/stash";
-static struct strbuf stash_index_path = STRBUF_INIT;
 
 /*
  * w_commit is set to the commit containing the working tree
@@ -1505,8 +1504,6 @@ static int save_stash(int argc, const char **argv, const char *prefix)
 
 int cmd_stash(int argc, const char **argv, const char *prefix)
 {
-	pid_t pid = getpid();
-	const char *index_file;
 	struct argv_array args = ARGV_ARRAY_INIT;
 
 	struct option options[] = {
@@ -1523,10 +1520,6 @@ int cmd_stash(int argc, const char **argv, const char *prefix)
 	argc = parse_options(argc, argv, prefix, options, git_stash_usage,
 			     PARSE_OPT_KEEP_UNKNOWN | PARSE_OPT_KEEP_DASHDASH);
 
-	index_file = get_index_file();
-	strbuf_addf(&stash_index_path, "%s.stash.%" PRIuMAX, index_file,
-		    (uintmax_t)pid);
-
 	if (!argc)
 		return !!push_stash(0, NULL, prefix, 0);
 	else if (!strcmp(argv[0], "apply"))
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 44+ messages in thread

* Re: [PATCH v2 5/6] stash: remove the second index in restore_untracked()
  2020-06-30 15:15   ` [PATCH v2 5/6] stash: remove the second index in restore_untracked() Alban Gruin
@ 2020-07-31 13:45     ` Christian Couder
  2020-07-31 16:16       ` Alban Gruin
  0 siblings, 1 reply; 44+ messages in thread
From: Christian Couder @ 2020-07-31 13:45 UTC (permalink / raw)
  To: Alban Gruin; +Cc: git, Thomas Gummerer, Junio C Hamano, Son Luong Ngoc

On Tue, Jun 30, 2020 at 5:16 PM Alban Gruin <alban.gruin@gmail.com> wrote:
>
> This removes the second index used in restore_untracked().
>
> The call to `read-tree' is replaced by reset_tree() with the appropriate
> parameters (no update, no reset).  The environment of `checkout-index'
> is no longer modified, and the cache is discarded when it exists.

Do you mean "when it exits" or "when it exists"? It could be confusing
as you use "exit" not "exist" in 2 previous commits.

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v2 0/6] stash: drop usage of a second index
  2020-06-30 15:15 ` [PATCH v2 " Alban Gruin
                     ` (5 preceding siblings ...)
  2020-06-30 15:15   ` [PATCH v2 6/6] stash: remove `stash_index_path' Alban Gruin
@ 2020-07-31 13:53   ` Christian Couder
  2020-07-31 16:51   ` [PATCH v3 " Alban Gruin
  7 siblings, 0 replies; 44+ messages in thread
From: Christian Couder @ 2020-07-31 13:53 UTC (permalink / raw)
  To: Alban Gruin; +Cc: git, Thomas Gummerer, Junio C Hamano, Son Luong Ngoc

On Tue, Jun 30, 2020 at 5:16 PM Alban Gruin <alban.gruin@gmail.com> wrote:
>
> The old scripted `git stash' used to create a second index to save
> modified and untracked files, and restore untracked files, without
> affecting the main index.  This behaviour was carried on when it was
> rewritten in C, and here, most operations performed on the second index
> are done by forked commands (ie. `read-tree' instead of reset_tree(),
> etc.).
>
> The goal of this series is to modernise (a bit) builtin/stash.c.

This patch series looks good to me. I found only a small nit or typo
in the commit message of patch 5/6.

> Originally, this series was also meant to fix a bug reported by Son
> Luong Ngoc [0], but as emphasized by Gábor [1], the issue is not limited
> to `git stash', so this series is not a good fix for this particular
> issue.
>
> This series is based on a08a83db2b (The sixth batch, 2020-06-29).

It seems to apply without conflicts on top of current master.

> Changes since v1:
>
>  - Lots of rewording, following comments from Christian Couder and Son
>    Luong Ngoc.
>
>  - Removed a useless function call.

[...]

>  builtin/stash.c | 156 +++++++++++++++---------------------------------
>  1 file changed, 48 insertions(+), 108 deletions(-)

I like very much how it simplifies a lot of things.

Thanks,
Christian.

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v2 5/6] stash: remove the second index in restore_untracked()
  2020-07-31 13:45     ` Christian Couder
@ 2020-07-31 16:16       ` Alban Gruin
  0 siblings, 0 replies; 44+ messages in thread
From: Alban Gruin @ 2020-07-31 16:16 UTC (permalink / raw)
  To: Christian Couder; +Cc: git, Thomas Gummerer, Junio C Hamano, Son Luong Ngoc

Hi Christian,

Le 31/07/2020 à 15:45, Christian Couder a écrit :
> On Tue, Jun 30, 2020 at 5:16 PM Alban Gruin <alban.gruin@gmail.com> wrote:
>>
>> This removes the second index used in restore_untracked().
>>
>> The call to `read-tree' is replaced by reset_tree() with the appropriate
>> parameters (no update, no reset).  The environment of `checkout-index'
>> is no longer modified, and the cache is discarded when it exists.
> 
> Do you mean "when it exits" or "when it exists"? It could be confusing
> as you use "exit" not "exist" in 2 previous commits.
> 

I mean "when it exits", as in the previous commits.

Alban


^ permalink raw reply	[flat|nested] 44+ messages in thread

* [PATCH v3 0/6] stash: drop usage of a second index
  2020-06-30 15:15 ` [PATCH v2 " Alban Gruin
                     ` (6 preceding siblings ...)
  2020-07-31 13:53   ` [PATCH v2 0/6] stash: drop usage of a second index Christian Couder
@ 2020-07-31 16:51   ` Alban Gruin
  2020-07-31 16:51     ` [PATCH v3 1/6] stash: mark `i_tree' in reset_tree() const Alban Gruin
                       ` (6 more replies)
  7 siblings, 7 replies; 44+ messages in thread
From: Alban Gruin @ 2020-07-31 16:51 UTC (permalink / raw)
  To: git, Christian Couder, Junio C Hamano
  Cc: Thomas Gummerer, Son Luong Ngoc, Alban Gruin

The old scripted `git stash' used to create a second index to save
modified and untracked files, and restore untracked files, without
affecting the main index.  This behaviour was carried on when it was
rewritten in C, and here, most operations performed on the second index
are done by forked commands (ie. `read-tree' instead of reset_tree(),
etc.).

The goal of this series is to modernise (a bit) builtin/stash.c.

Originally, this series was also meant to fix a bug reported by Son
Luong Ngoc [0], but as emphasized by Gábor [1], the issue is not limited
to `git stash', so this series is not a good fix for this particular
issue.

This series is based on a08a83db2b (The sixth batch, 2020-06-29), and it
applies cleanly on top of the current master (e8ab941b67 (The second
batch -- mostly minor typofixes, 2020-07-30)).

The tip of this series is tagged as "stash-remove-second-index-v3" at
https://github.com/agrn/git.

[0] https://lore.kernel.org/git/EED2CFF1-5BEF-429D-AB99-AD148A867614@gmail.com/
[1] https://lore.kernel.org/git/20200615152715.GD2898@szeder.dev/

Changes since v2:

 - Typofix in the fifth patch, noticed by Christian Couder.

Alban Gruin (6):
  stash: mark `i_tree' in reset_tree() const
  stash: remove the second index in stash_working_tree()
  stash: remove the second index in stash_patch()
  stash: remove the second index in save_untracked_files()
  stash: remove the second index in restore_untracked()
  stash: remove `stash_index_path'

 builtin/stash.c | 156 +++++++++++++++---------------------------------
 1 file changed, 48 insertions(+), 108 deletions(-)

Range-diff against v2:
1:  598f03e76e = 1:  598f03e76e stash: mark `i_tree' in reset_tree() const
2:  7265cfe65c = 2:  7265cfe65c stash: remove the second index in stash_working_tree()
3:  b5587c0d56 = 3:  b5587c0d56 stash: remove the second index in stash_patch()
4:  5fbcddf899 = 4:  5fbcddf899 stash: remove the second index in save_untracked_files()
5:  0338986339 ! 5:  a12a923014 stash: remove the second index in restore_untracked()
    @@ -6,7 +6,7 @@
     
         The call to `read-tree' is replaced by reset_tree() with the appropriate
         parameters (no update, no reset).  The environment of `checkout-index'
    -    is no longer modified, and the cache is discarded when it exists.
    +    is no longer modified, and the cache is discarded when it exits.
     
         In do_apply_stash(), the changes are a bit more involved: to avoid
         conflicts with the merged index, restore_untracked() is moved after
6:  4f151f4600 = 6:  dd25e71f35 stash: remove `stash_index_path'
-- 
2.20.1


^ permalink raw reply	[flat|nested] 44+ messages in thread

* [PATCH v3 1/6] stash: mark `i_tree' in reset_tree() const
  2020-07-31 16:51   ` [PATCH v3 " Alban Gruin
@ 2020-07-31 16:51     ` Alban Gruin
  2020-07-31 18:28       ` Junio C Hamano
  2020-07-31 16:51     ` [PATCH v3 2/6] stash: remove the second index in stash_working_tree() Alban Gruin
                       ` (5 subsequent siblings)
  6 siblings, 1 reply; 44+ messages in thread
From: Alban Gruin @ 2020-07-31 16:51 UTC (permalink / raw)
  To: git, Christian Couder, Junio C Hamano
  Cc: Thomas Gummerer, Son Luong Ngoc, Alban Gruin

In reset_tree(), the value pointed by `i_tree' is not modified.  In a
latter commit, it will be provided with `the_hash_algo->empty_tree',
which is a constant.  Hence, this changes `i_tree' to be a constant.

Signed-off-by: Alban Gruin <alban.gruin@gmail.com>
---
 builtin/stash.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/builtin/stash.c b/builtin/stash.c
index 0c52a3b849..9baa8b379e 100644
--- a/builtin/stash.c
+++ b/builtin/stash.c
@@ -228,7 +228,7 @@ static int clear_stash(int argc, const char **argv, const char *prefix)
 	return do_clear_stash();
 }
 
-static int reset_tree(struct object_id *i_tree, int update, int reset)
+static int reset_tree(const struct object_id *i_tree, int update, int reset)
 {
 	int nr_trees = 1;
 	struct unpack_trees_options opts;
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH v3 2/6] stash: remove the second index in stash_working_tree()
  2020-07-31 16:51   ` [PATCH v3 " Alban Gruin
  2020-07-31 16:51     ` [PATCH v3 1/6] stash: mark `i_tree' in reset_tree() const Alban Gruin
@ 2020-07-31 16:51     ` Alban Gruin
  2020-07-31 18:26       ` Junio C Hamano
  2020-07-31 16:51     ` [PATCH v3 3/6] stash: remove the second index in stash_patch() Alban Gruin
                       ` (4 subsequent siblings)
  6 siblings, 1 reply; 44+ messages in thread
From: Alban Gruin @ 2020-07-31 16:51 UTC (permalink / raw)
  To: git, Christian Couder, Junio C Hamano
  Cc: Thomas Gummerer, Son Luong Ngoc, Alban Gruin

This removes the second index used in stash_working_tree() to simplify
the code.

The calls to set_alternative_index_output() are dropped to extract
`i_tree' to the main index, and `GIT_INDEX_FILE' is no longer set before
starting `update-index'.  When it exits, the index has changed, and must
be discarded.

With this commit, reset_tree() does not need to be called at the
beginning of stash_working_tree(), because it is only called by
do_create_stash(), which sets the index at `i_tree', and
save_untracked_files() does not change the main index.  But it will
become useful again in a later commit, when save_untracked_file() will
be rewritten to use the "main" index, so I did not remove it.

At the end of the function, the tree is reset to `i_tree'.

Signed-off-by: Alban Gruin <alban.gruin@gmail.com>
---
 builtin/stash.c | 14 +++-----------
 1 file changed, 3 insertions(+), 11 deletions(-)

diff --git a/builtin/stash.c b/builtin/stash.c
index 9baa8b379e..2535335275 100644
--- a/builtin/stash.c
+++ b/builtin/stash.c
@@ -1059,17 +1059,14 @@ static int stash_working_tree(struct stash_info *info, const struct pathspec *ps
 	struct rev_info rev;
 	struct child_process cp_upd_index = CHILD_PROCESS_INIT;
 	struct strbuf diff_output = STRBUF_INIT;
-	struct index_state istate = { NULL };
 
 	init_revisions(&rev, NULL);
 	copy_pathspec(&rev.prune_data, ps);
 
-	set_alternate_index_output(stash_index_path.buf);
 	if (reset_tree(&info->i_tree, 0, 0)) {
 		ret = -1;
 		goto done;
 	}
-	set_alternate_index_output(NULL);
 
 	rev.diffopt.output_format = DIFF_FORMAT_CALLBACK;
 	rev.diffopt.format_callback = add_diff_to_buf;
@@ -1091,8 +1088,6 @@ static int stash_working_tree(struct stash_info *info, const struct pathspec *ps
 	argv_array_pushl(&cp_upd_index.args, "update-index",
 			 "--ignore-skip-worktree-entries",
 			 "-z", "--add", "--remove", "--stdin", NULL);
-	argv_array_pushf(&cp_upd_index.env_array, "GIT_INDEX_FILE=%s",
-			 stash_index_path.buf);
 
 	if (pipe_command(&cp_upd_index, diff_output.buf, diff_output.len,
 			 NULL, 0, NULL, 0)) {
@@ -1100,19 +1095,16 @@ static int stash_working_tree(struct stash_info *info, const struct pathspec *ps
 		goto done;
 	}
 
-	if (write_index_as_tree(&info->w_tree, &istate, stash_index_path.buf, 0,
-				NULL)) {
+	discard_cache();
+	if (write_cache_as_tree(&info->w_tree, 0, NULL) ||
+	    reset_tree(&info->i_tree, 0, 1))
 		ret = -1;
-		goto done;
-	}
 
 done:
-	discard_index(&istate);
 	UNLEAK(rev);
 	object_array_clear(&rev.pending);
 	clear_pathspec(&rev.prune_data);
 	strbuf_release(&diff_output);
-	remove_path(stash_index_path.buf);
 	return ret;
 }
 
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH v3 3/6] stash: remove the second index in stash_patch()
  2020-07-31 16:51   ` [PATCH v3 " Alban Gruin
  2020-07-31 16:51     ` [PATCH v3 1/6] stash: mark `i_tree' in reset_tree() const Alban Gruin
  2020-07-31 16:51     ` [PATCH v3 2/6] stash: remove the second index in stash_working_tree() Alban Gruin
@ 2020-07-31 16:51     ` Alban Gruin
  2020-07-31 16:51     ` [PATCH v3 4/6] stash: remove the second index in save_untracked_files() Alban Gruin
                       ` (3 subsequent siblings)
  6 siblings, 0 replies; 44+ messages in thread
From: Alban Gruin @ 2020-07-31 16:51 UTC (permalink / raw)
  To: git, Christian Couder, Junio C Hamano
  Cc: Thomas Gummerer, Son Luong Ngoc, Alban Gruin

This removes the second index used in stash_patch().

This function starts by resetting the index (which is set at `i_tree')
to HEAD, which has been stored in `b_commit' by do_create_stash(), and
the call to `read-tree' is replaced by reset_tree().  The index is
discarded after run_add_interactive(), but not `diff-tree' as this
command should not change it.

Since the index has been changed, and subsequent code might be sensitive
to this, it is reset to `i_tree' at the end of the function.

Signed-off-by: Alban Gruin <alban.gruin@gmail.com>
---
 builtin/stash.c | 47 ++++++++++-------------------------------------
 1 file changed, 10 insertions(+), 37 deletions(-)

diff --git a/builtin/stash.c b/builtin/stash.c
index 2535335275..eaeb7bc8c4 100644
--- a/builtin/stash.c
+++ b/builtin/stash.c
@@ -995,51 +995,24 @@ static int stash_patch(struct stash_info *info, const struct pathspec *ps,
 		       struct strbuf *out_patch, int quiet)
 {
 	int ret = 0;
-	struct child_process cp_read_tree = CHILD_PROCESS_INIT;
 	struct child_process cp_diff_tree = CHILD_PROCESS_INIT;
-	struct index_state istate = { NULL };
-	char *old_index_env = NULL, *old_repo_index_file;
 
-	remove_path(stash_index_path.buf);
-
-	cp_read_tree.git_cmd = 1;
-	argv_array_pushl(&cp_read_tree.args, "read-tree", "HEAD", NULL);
-	argv_array_pushf(&cp_read_tree.env_array, "GIT_INDEX_FILE=%s",
-			 stash_index_path.buf);
-	if (run_command(&cp_read_tree)) {
-		ret = -1;
-		goto done;
-	}
+	if (reset_tree(&info->b_commit, 0, 1))
+		return -1;
 
 	/* Find out what the user wants. */
-	old_repo_index_file = the_repository->index_file;
-	the_repository->index_file = stash_index_path.buf;
-	old_index_env = xstrdup_or_null(getenv(INDEX_ENVIRONMENT));
-	setenv(INDEX_ENVIRONMENT, the_repository->index_file, 1);
-
 	ret = run_add_interactive(NULL, "--patch=stash", ps);
 
-	the_repository->index_file = old_repo_index_file;
-	if (old_index_env && *old_index_env)
-		setenv(INDEX_ENVIRONMENT, old_index_env, 1);
-	else
-		unsetenv(INDEX_ENVIRONMENT);
-	FREE_AND_NULL(old_index_env);
-
 	/* State of the working tree. */
-	if (write_index_as_tree(&info->w_tree, &istate, stash_index_path.buf, 0,
-				NULL)) {
-		ret = -1;
-		goto done;
-	}
+	discard_cache();
+	if (write_cache_as_tree(&info->w_tree, 0, NULL))
+		return -1;
 
 	cp_diff_tree.git_cmd = 1;
 	argv_array_pushl(&cp_diff_tree.args, "diff-tree", "-p", "-U1", "HEAD",
 			 oid_to_hex(&info->w_tree), "--", NULL);
-	if (pipe_command(&cp_diff_tree, NULL, 0, out_patch, 0, NULL, 0)) {
-		ret = -1;
-		goto done;
-	}
+	if (pipe_command(&cp_diff_tree, NULL, 0, out_patch, 0, NULL, 0))
+		return -1;
 
 	if (!out_patch->len) {
 		if (!quiet)
@@ -1047,9 +1020,9 @@ static int stash_patch(struct stash_info *info, const struct pathspec *ps,
 		ret = 1;
 	}
 
-done:
-	discard_index(&istate);
-	remove_path(stash_index_path.buf);
+	if (reset_tree(&info->i_tree, 0, 1))
+		return -1;
+
 	return ret;
 }
 
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH v3 4/6] stash: remove the second index in save_untracked_files()
  2020-07-31 16:51   ` [PATCH v3 " Alban Gruin
                       ` (2 preceding siblings ...)
  2020-07-31 16:51     ` [PATCH v3 3/6] stash: remove the second index in stash_patch() Alban Gruin
@ 2020-07-31 16:51     ` Alban Gruin
  2020-07-31 16:51     ` [PATCH v3 5/6] stash: remove the second index in restore_untracked() Alban Gruin
                       ` (2 subsequent siblings)
  6 siblings, 0 replies; 44+ messages in thread
From: Alban Gruin @ 2020-07-31 16:51 UTC (permalink / raw)
  To: git, Christian Couder, Junio C Hamano
  Cc: Thomas Gummerer, Son Luong Ngoc, Alban Gruin

This removes the second index used in save_untracked_files().

This functions creates a new commit with no parents, and a tree
containing only untracked files, so the tree is reset to the empty tree
at the beginning (hence the need for reset_tree() to accept constant
trees).  The environment of `update-index' is no longer updated, and the
index is discarded after this command exits.

As the only caller of this function is do_create_stash(), and the next
user of the index is either save_untracked_files() or stash_patch(),
which both want a different tree, the index is left as-is at the end.

Signed-off-by: Alban Gruin <alban.gruin@gmail.com>
---
 builtin/stash.c | 39 +++++++++++++++++----------------------
 1 file changed, 17 insertions(+), 22 deletions(-)

diff --git a/builtin/stash.c b/builtin/stash.c
index eaeb7bc8c4..cbe37cd24b 100644
--- a/builtin/stash.c
+++ b/builtin/stash.c
@@ -954,41 +954,36 @@ static int check_changes(const struct pathspec *ps, int include_untracked,
 static int save_untracked_files(struct stash_info *info, struct strbuf *msg,
 				struct strbuf files)
 {
-	int ret = 0;
 	struct strbuf untracked_msg = STRBUF_INIT;
 	struct child_process cp_upd_index = CHILD_PROCESS_INIT;
-	struct index_state istate = { NULL };
+
+	if (reset_tree(the_hash_algo->empty_tree, 0, 1))
+		return -1;
 
 	cp_upd_index.git_cmd = 1;
 	argv_array_pushl(&cp_upd_index.args, "update-index", "-z", "--add",
 			 "--remove", "--stdin", NULL);
-	argv_array_pushf(&cp_upd_index.env_array, "GIT_INDEX_FILE=%s",
-			 stash_index_path.buf);
+
+	if (pipe_command(&cp_upd_index, files.buf, files.len, NULL, 0,
+			 NULL, 0))
+		return -1;
+
+	discard_cache();
+	if (write_cache_as_tree(&info->u_tree, 0, NULL))
+		return -1;
 
 	strbuf_addf(&untracked_msg, "untracked files on %s\n", msg->buf);
-	if (pipe_command(&cp_upd_index, files.buf, files.len, NULL, 0,
-			 NULL, 0)) {
-		ret = -1;
-		goto done;
-	}
-
-	if (write_index_as_tree(&info->u_tree, &istate, stash_index_path.buf, 0,
-				NULL)) {
-		ret = -1;
-		goto done;
-	}
-
 	if (commit_tree(untracked_msg.buf, untracked_msg.len,
 			&info->u_tree, NULL, &info->u_commit, NULL, NULL)) {
-		ret = -1;
-		goto done;
+		strbuf_release(&untracked_msg);
+		return -1;
 	}
 
-done:
-	discard_index(&istate);
+	/* Do not reset the tree, as either stash_patch() or
+	 * stash_working_tree() will do it. */
+
 	strbuf_release(&untracked_msg);
-	remove_path(stash_index_path.buf);
-	return ret;
+	return 0;
 }
 
 static int stash_patch(struct stash_info *info, const struct pathspec *ps,
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH v3 5/6] stash: remove the second index in restore_untracked()
  2020-07-31 16:51   ` [PATCH v3 " Alban Gruin
                       ` (3 preceding siblings ...)
  2020-07-31 16:51     ` [PATCH v3 4/6] stash: remove the second index in save_untracked_files() Alban Gruin
@ 2020-07-31 16:51     ` Alban Gruin
  2020-07-31 16:51     ` [PATCH v3 6/6] stash: remove `stash_index_path' Alban Gruin
  2020-07-31 17:48     ` [PATCH v3 0/6] stash: drop usage of a second index Junio C Hamano
  6 siblings, 0 replies; 44+ messages in thread
From: Alban Gruin @ 2020-07-31 16:51 UTC (permalink / raw)
  To: git, Christian Couder, Junio C Hamano
  Cc: Thomas Gummerer, Son Luong Ngoc, Alban Gruin

This removes the second index used in restore_untracked().

The call to `read-tree' is replaced by reset_tree() with the appropriate
parameters (no update, no reset).  The environment of `checkout-index'
is no longer modified, and the cache is discarded when it exits.

In do_apply_stash(), the changes are a bit more involved: to avoid
conflicts with the merged index, restore_untracked() is moved after
merge_recursive_generic().

This introduces another problem: the files that were untracked once are
now added to the index, and update_index() would add back those files in
the index.  To avoid this, get_newly_staged() is moved before
restore_untracked().

Signed-off-by: Alban Gruin <alban.gruin@gmail.com>
---
 builtin/stash.c | 47 +++++++++++++++++------------------------------
 1 file changed, 17 insertions(+), 30 deletions(-)

diff --git a/builtin/stash.c b/builtin/stash.c
index cbe37cd24b..d5077a27d9 100644
--- a/builtin/stash.c
+++ b/builtin/stash.c
@@ -359,29 +359,15 @@ static int restore_untracked(struct object_id *u_tree)
 	int res;
 	struct child_process cp = CHILD_PROCESS_INIT;
 
-	/*
-	 * We need to run restore files from a given index, but without
-	 * affecting the current index, so we use GIT_INDEX_FILE with
-	 * run_command to fork processes that will not interfere.
-	 */
-	cp.git_cmd = 1;
-	argv_array_push(&cp.args, "read-tree");
-	argv_array_push(&cp.args, oid_to_hex(u_tree));
-	argv_array_pushf(&cp.env_array, "GIT_INDEX_FILE=%s",
-			 stash_index_path.buf);
-	if (run_command(&cp)) {
-		remove_path(stash_index_path.buf);
+	if (reset_tree(u_tree, 0, 0))
 		return -1;
-	}
 
-	child_process_init(&cp);
 	cp.git_cmd = 1;
 	argv_array_pushl(&cp.args, "checkout-index", "--all", NULL);
-	argv_array_pushf(&cp.env_array, "GIT_INDEX_FILE=%s",
-			 stash_index_path.buf);
 
 	res = run_command(&cp);
-	remove_path(stash_index_path.buf);
+	discard_cache();
+
 	return res;
 }
 
@@ -395,6 +381,7 @@ static int do_apply_stash(const char *prefix, struct stash_info *info,
 	struct object_id index_tree;
 	struct commit *result;
 	const struct object_id *bases[1];
+	struct strbuf newly_staged = STRBUF_INIT;
 
 	read_cache_preload(NULL);
 	if (refresh_and_write_cache(REFRESH_QUIET, 0, 0))
@@ -433,9 +420,6 @@ static int do_apply_stash(const char *prefix, struct stash_info *info,
 		}
 	}
 
-	if (info->has_u && restore_untracked(&info->u_tree))
-		return error(_("could not restore untracked files from stash"));
-
 	init_merge_options(&o, the_repository);
 
 	o.branch1 = "Updated upstream";
@@ -463,24 +447,27 @@ static int do_apply_stash(const char *prefix, struct stash_info *info,
 		return ret;
 	}
 
+	if (!has_index && get_newly_staged(&newly_staged, &c_tree)) {
+		strbuf_release(&newly_staged);
+		return -1;
+	}
+
+	if (info->has_u && restore_untracked(&info->u_tree)) {
+		strbuf_release(&newly_staged);
+		return error(_("could not restore untracked files from stash"));
+	}
+
 	if (has_index) {
 		if (reset_tree(&index_tree, 0, 0))
 			return -1;
 	} else {
-		struct strbuf out = STRBUF_INIT;
-
-		if (get_newly_staged(&out, &c_tree)) {
-			strbuf_release(&out);
-			return -1;
-		}
-
 		if (reset_tree(&c_tree, 0, 1)) {
-			strbuf_release(&out);
+			strbuf_release(&newly_staged);
 			return -1;
 		}
 
-		ret = update_index(&out);
-		strbuf_release(&out);
+		ret = update_index(&newly_staged);
+		strbuf_release(&newly_staged);
 		if (ret)
 			return -1;
 
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH v3 6/6] stash: remove `stash_index_path'
  2020-07-31 16:51   ` [PATCH v3 " Alban Gruin
                       ` (4 preceding siblings ...)
  2020-07-31 16:51     ` [PATCH v3 5/6] stash: remove the second index in restore_untracked() Alban Gruin
@ 2020-07-31 16:51     ` Alban Gruin
  2020-07-31 17:48     ` [PATCH v3 0/6] stash: drop usage of a second index Junio C Hamano
  6 siblings, 0 replies; 44+ messages in thread
From: Alban Gruin @ 2020-07-31 16:51 UTC (permalink / raw)
  To: git, Christian Couder, Junio C Hamano
  Cc: Thomas Gummerer, Son Luong Ngoc, Alban Gruin

Since stash no longer uses a second index, `stash_index_path' is now
unused, and can be dropped.

Signed-off-by: Alban Gruin <alban.gruin@gmail.com>
---
 builtin/stash.c | 7 -------
 1 file changed, 7 deletions(-)

diff --git a/builtin/stash.c b/builtin/stash.c
index d5077a27d9..f1b3c0d2f8 100644
--- a/builtin/stash.c
+++ b/builtin/stash.c
@@ -88,7 +88,6 @@ static const char * const git_stash_save_usage[] = {
 };
 
 static const char *ref_stash = "refs/stash";
-static struct strbuf stash_index_path = STRBUF_INIT;
 
 /*
  * w_commit is set to the commit containing the working tree
@@ -1505,8 +1504,6 @@ static int save_stash(int argc, const char **argv, const char *prefix)
 
 int cmd_stash(int argc, const char **argv, const char *prefix)
 {
-	pid_t pid = getpid();
-	const char *index_file;
 	struct argv_array args = ARGV_ARRAY_INIT;
 
 	struct option options[] = {
@@ -1523,10 +1520,6 @@ int cmd_stash(int argc, const char **argv, const char *prefix)
 	argc = parse_options(argc, argv, prefix, options, git_stash_usage,
 			     PARSE_OPT_KEEP_UNKNOWN | PARSE_OPT_KEEP_DASHDASH);
 
-	index_file = get_index_file();
-	strbuf_addf(&stash_index_path, "%s.stash.%" PRIuMAX, index_file,
-		    (uintmax_t)pid);
-
 	if (!argc)
 		return !!push_stash(0, NULL, prefix, 0);
 	else if (!strcmp(argv[0], "apply"))
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 44+ messages in thread

* Re: [PATCH v3 0/6] stash: drop usage of a second index
  2020-07-31 16:51   ` [PATCH v3 " Alban Gruin
                       ` (5 preceding siblings ...)
  2020-07-31 16:51     ` [PATCH v3 6/6] stash: remove `stash_index_path' Alban Gruin
@ 2020-07-31 17:48     ` Junio C Hamano
  6 siblings, 0 replies; 44+ messages in thread
From: Junio C Hamano @ 2020-07-31 17:48 UTC (permalink / raw)
  To: Alban Gruin; +Cc: git, Christian Couder, Thomas Gummerer, Son Luong Ngoc

Alban Gruin <alban.gruin@gmail.com> writes:

> The old scripted `git stash' used to create a second index to save
> modified and untracked files, and restore untracked files, without
> affecting the main index.  This behaviour was carried on when it was
> rewritten in C, and here, most operations performed on the second index
> are done by forked commands (ie. `read-tree' instead of reset_tree(),
> etc.).

Does the "second index" in the title refer to the on-disk $TMPindex
in https://github.com/git/git/blob/ffac537e6c/git-stash.sh#L147 that
is used to create a tree object $u_tree (and similarly for the
working tree files $w_tree)?

> The goal of this series is to modernise (a bit) builtin/stash.c.

Modernise in what way is quite unclear.  With the internal API we
have available from C code, we can create a tree object from an
in-core index without writing the in-core index out to an on-disk
file that is different from the main on-disk index file, and I
suspect, from the "drop usage of" in the title, that it is what this
series is trying to do, but the description could have been written
in a way that is more helpful to readers to understand it without
having to guess.  It made me wonder if you are not even using the
secondary in-core index and no longer writing the tree to record the
untracked paths and their contents, but obviously such a patch would
not work well ;-)


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v3 2/6] stash: remove the second index in stash_working_tree()
  2020-07-31 16:51     ` [PATCH v3 2/6] stash: remove the second index in stash_working_tree() Alban Gruin
@ 2020-07-31 18:26       ` Junio C Hamano
  2020-08-02  2:20         ` Junio C Hamano
  0 siblings, 1 reply; 44+ messages in thread
From: Junio C Hamano @ 2020-07-31 18:26 UTC (permalink / raw)
  To: Alban Gruin; +Cc: git, Christian Couder, Thomas Gummerer, Son Luong Ngoc

Alban Gruin <alban.gruin@gmail.com> writes:

> This removes the second index used in stash_working_tree() to simplify
> the code.

Continuing what I said for [0/6], say "the second index file" to
clarify the distinction between the in-core index and on-disk index
files to avoid confusion.

>  	init_revisions(&rev, NULL);
>  	copy_pathspec(&rev.prune_data, ps);
>  
> -	set_alternate_index_output(stash_index_path.buf);
>  	if (reset_tree(&info->i_tree, 0, 0)) {
>  		ret = -1;
>  		goto done;
>  	}
> -	set_alternate_index_output(NULL);

Hmph.  So at this point i_tree is what is in the index, we reset the
working tree to it with reset_tree(), and instead of writing to
$TMPindex, we let reset_tree() clobber the main on-disk index but we
do not care because the result is supposed to be the same as what
was originally in the index anyway?

> @@ -1091,8 +1088,6 @@ static int stash_working_tree(struct stash_info *info, const struct pathspec *ps
>  	argv_array_pushl(&cp_upd_index.args, "update-index",
>  			 "--ignore-skip-worktree-entries",
>  			 "-z", "--add", "--remove", "--stdin", NULL);
> -	argv_array_pushf(&cp_upd_index.env_array, "GIT_INDEX_FILE=%s",
> -			 stash_index_path.buf);
>  
>  	if (pipe_command(&cp_upd_index, diff_output.buf, diff_output.len,
>  			 NULL, 0, NULL, 0)) {

And then the new code now lets "update-index" work directly on the
main index (which does make an observable difference to the outside
world, but we are not letting any hook to look at this intermediate
state, so it might be OK---I cannot tell at this point in the code).

> @@ -1100,19 +1095,16 @@ static int stash_working_tree(struct stash_info *info, const struct pathspec *ps
>  		goto done;
>  	}
>  
> -	if (write_index_as_tree(&info->w_tree, &istate, stash_index_path.buf, 0,
> -				NULL)) {
> +	discard_cache();
> +	if (write_cache_as_tree(&info->w_tree, 0, NULL) ||
> +	    reset_tree(&info->i_tree, 0, 1))

We used to read from $TMPindex, which has been updated with the
contents of files modified in the working tree, and write its
content out as a tree object, grabbing its object name into w_tree.

The new code instead writes out the same tree from the main index,
which has been clobbered with the working tree changes that the user
hasn't added to the index.  Because of that, we need to discard
these changes and re-read the in-core index out of i_tree with
reset_tree().

So, this change makes one new call to reset_tree() that scans and
updates working tree files that are modified?  How expensive is
that?  

It's not like this change trades cost to write out a temporary index
file with the cost of having to call reset_tree() one more time---as
far as I can see, the new code writes to on-disk index file the same
time as the original code.

How is the use of second on-disk index hurting us?  I must be
missing something obvious, but at this point, it is not clear what
we are gaining---I only see downsides.


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v3 1/6] stash: mark `i_tree' in reset_tree() const
  2020-07-31 16:51     ` [PATCH v3 1/6] stash: mark `i_tree' in reset_tree() const Alban Gruin
@ 2020-07-31 18:28       ` Junio C Hamano
  0 siblings, 0 replies; 44+ messages in thread
From: Junio C Hamano @ 2020-07-31 18:28 UTC (permalink / raw)
  To: Alban Gruin; +Cc: git, Christian Couder, Thomas Gummerer, Son Luong Ngoc

Alban Gruin <alban.gruin@gmail.com> writes:

> In reset_tree(), the value pointed by `i_tree' is not modified.  In a

This helper function is about "reset the working tree and the index
to match this given tree object" and it is clear that it is an
input-only parameter, so that alone is a good justification for code
hygiene.  Good.

> latter commit, it will be provided with `the_hash_algo->empty_tree',
> which is a constant.  Hence, this changes `i_tree' to be a constant.
>
> Signed-off-by: Alban Gruin <alban.gruin@gmail.com>
> ---
>  builtin/stash.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/builtin/stash.c b/builtin/stash.c
> index 0c52a3b849..9baa8b379e 100644
> --- a/builtin/stash.c
> +++ b/builtin/stash.c
> @@ -228,7 +228,7 @@ static int clear_stash(int argc, const char **argv, const char *prefix)
>  	return do_clear_stash();
>  }
>  
> -static int reset_tree(struct object_id *i_tree, int update, int reset)
> +static int reset_tree(const struct object_id *i_tree, int update, int reset)
>  {
>  	int nr_trees = 1;
>  	struct unpack_trees_options opts;

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v3 2/6] stash: remove the second index in stash_working_tree()
  2020-07-31 18:26       ` Junio C Hamano
@ 2020-08-02  2:20         ` Junio C Hamano
  0 siblings, 0 replies; 44+ messages in thread
From: Junio C Hamano @ 2020-08-02  2:20 UTC (permalink / raw)
  To: Alban Gruin; +Cc: git, Christian Couder, Thomas Gummerer, Son Luong Ngoc

Junio C Hamano <gitster@pobox.com> writes:

>> @@ -1091,8 +1088,6 @@ static int stash_working_tree(struct stash_info *info, const struct pathspec *ps
>>  	argv_array_pushl(&cp_upd_index.args, "update-index",
>>  			 "--ignore-skip-worktree-entries",
>>  			 "-z", "--add", "--remove", "--stdin", NULL);
>> -	argv_array_pushf(&cp_upd_index.env_array, "GIT_INDEX_FILE=%s",
>> -			 stash_index_path.buf);
>>  
>>  	if (pipe_command(&cp_upd_index, diff_output.buf, diff_output.len,
>>  			 NULL, 0, NULL, 0)) {
>
> And then the new code now lets "update-index" work directly on the
> main index (which does make an observable difference to the outside
> world, but we are not letting any hook to look at this intermediate
> state, so it might be OK---I cannot tell at this point in the code).

This changes the behaviour of the command a lot when a fatal happens
anywhere after this point til we reset the index to match the HEAD,
doesn't it, I wonder.  The original code refrained from touching the
index so after such a "fatal:" error message, the user would have
seen the original state (with previous "git add" etc. the user did)
in the index.  New code now added all the working tree changes to
the index, so the user would lose the work made so far in the main
index, no?  With later patch in the series applied, we will end up
having even untracked paths in the main index---I am not sure to
what extent that step would make things even worse, but it cannot be
harmless.

> How is the use of second on-disk index hurting us?  I must be
> missing something obvious, but at this point, it is not clear what
> we are gaining---I only see downsides.

^ permalink raw reply	[flat|nested] 44+ messages in thread

end of thread, other threads:[~2020-08-02  2:33 UTC | newest]

Thread overview: 44+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-05-05 10:48 [RFC PATCH v1 0/6] stash: drop usage of a second index Alban Gruin
2020-05-05 10:48 ` [RFC PATCH v1 1/6] stash: mark `i_tree' in reset_tree() const Alban Gruin
2020-06-13  8:09   ` Christian Couder
2020-05-05 10:48 ` [RFC PATCH v1 2/6] stash: remove the second index in stash_working_tree() Alban Gruin
2020-06-13  8:52   ` Christian Couder
2020-06-13 18:00     ` Alban Gruin
2020-06-15 12:02       ` Christian Couder
2020-05-05 10:48 ` [RFC PATCH v1 3/6] stash: remove the second index in stash_patch() Alban Gruin
2020-06-13  9:38   ` Christian Couder
2020-06-13 10:04     ` Christian Couder
2020-05-05 10:48 ` [RFC PATCH v1 4/6] stash: remove the second index in save_untracked_files() Alban Gruin
2020-06-13 18:51   ` Christian Couder
2020-05-05 10:48 ` [RFC PATCH v1 5/6] stash: remove the second index in restore_untracked() Alban Gruin
2020-06-13 19:41   ` Christian Couder
2020-05-05 10:48 ` [RFC PATCH v1 6/6] stash: remove `stash_index_path' Alban Gruin
2020-06-04 12:07 ` [RFC PATCH v1 0/6] stash: drop usage of a second index Alban Gruin
2020-06-13  7:52 ` Christian Couder
2020-06-25 12:35   ` Alban Gruin
2020-06-15 15:27 ` SZEDER Gábor
2020-06-15 21:50   ` SZEDER Gábor
2020-06-16  7:06     ` SZEDER Gábor
2020-06-17 20:04       ` Junio C Hamano
2020-06-17 21:31         ` Alban Gruin
2020-06-30 15:15 ` [PATCH v2 " Alban Gruin
2020-06-30 15:15   ` [PATCH v2 1/6] stash: mark `i_tree' in reset_tree() const Alban Gruin
2020-06-30 15:15   ` [PATCH v2 2/6] stash: remove the second index in stash_working_tree() Alban Gruin
2020-06-30 15:15   ` [PATCH v2 3/6] stash: remove the second index in stash_patch() Alban Gruin
2020-06-30 15:15   ` [PATCH v2 4/6] stash: remove the second index in save_untracked_files() Alban Gruin
2020-06-30 15:15   ` [PATCH v2 5/6] stash: remove the second index in restore_untracked() Alban Gruin
2020-07-31 13:45     ` Christian Couder
2020-07-31 16:16       ` Alban Gruin
2020-06-30 15:15   ` [PATCH v2 6/6] stash: remove `stash_index_path' Alban Gruin
2020-07-31 13:53   ` [PATCH v2 0/6] stash: drop usage of a second index Christian Couder
2020-07-31 16:51   ` [PATCH v3 " Alban Gruin
2020-07-31 16:51     ` [PATCH v3 1/6] stash: mark `i_tree' in reset_tree() const Alban Gruin
2020-07-31 18:28       ` Junio C Hamano
2020-07-31 16:51     ` [PATCH v3 2/6] stash: remove the second index in stash_working_tree() Alban Gruin
2020-07-31 18:26       ` Junio C Hamano
2020-08-02  2:20         ` Junio C Hamano
2020-07-31 16:51     ` [PATCH v3 3/6] stash: remove the second index in stash_patch() Alban Gruin
2020-07-31 16:51     ` [PATCH v3 4/6] stash: remove the second index in save_untracked_files() Alban Gruin
2020-07-31 16:51     ` [PATCH v3 5/6] stash: remove the second index in restore_untracked() Alban Gruin
2020-07-31 16:51     ` [PATCH v3 6/6] stash: remove `stash_index_path' Alban Gruin
2020-07-31 17:48     ` [PATCH v3 0/6] stash: drop usage of a second index Junio C Hamano

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).