The old scripted `git stash' used to create a second index to save modified and untracked files, and restore untracked files, without affecting the main index. This behaviour was carried on when it was rewritten in C, and here, most operations performed on the second index are done by forked commands (ie. `read-tree' instead of reset_tree(), etc.). This works most of the time, except in some edge case with the split-index when the split file has expired and is deleted by a forked command: the main index may still contain a reference to the now-deleted file, and subsequent operations on the index will fail [0]. The goal of this series is to modernise (a bit) builtin/stash.c, and to fix the aforementionned edge case. I have to admit that I don't really know how to test this. GIT_TEST_SPLIT_INDEX failed on me (gdb showed me that it does not enable the split-index at all, at least in `git stash' and its forks), and I'm reluctant to add explicits tests on `git stash' about the split-index, when nothing in its code explicitly does unusual things with the index once this series is applied. If anyone wants to share opinions about this, I would be happy to read them. This series is based on b34789c0b0 ("The sixth batch", 2020-05-01). The tip of this series is tagged as "stash-remove-second-index-v1" at https://github.com/agrn/git. [0] https://lore.kernel.org/git/EED2CFF1-5BEF-429D-AB99-AD148A867614@gmail.com/ Alban Gruin (6): stash: mark `i_tree' in reset_tree() const stash: remove the second index in stash_working_tree() stash: remove the second index in stash_patch() stash: remove the second index in save_untracked_files() stash: remove the second index in restore_untracked() stash: remove `stash_index_path' builtin/stash.c | 151 +++++++++++++++--------------------------------- 1 file changed, 46 insertions(+), 105 deletions(-) -- 2.26.2
As reset_tree() does not change the value pointed by `i_tree', and that it will be provided with `the_hash_algo->empty_tree' which is a constant, it is changed to be a pointer to a constant. Signed-off-by: Alban Gruin <alban.gruin@gmail.com> --- builtin/stash.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/builtin/stash.c b/builtin/stash.c index 0c52a3b849..9baa8b379e 100644 --- a/builtin/stash.c +++ b/builtin/stash.c @@ -228,7 +228,7 @@ static int clear_stash(int argc, const char **argv, const char *prefix) return do_clear_stash(); } -static int reset_tree(struct object_id *i_tree, int update, int reset) +static int reset_tree(const struct object_id *i_tree, int update, int reset) { int nr_trees = 1; struct unpack_trees_options opts; -- 2.26.2
This removes the second index used in stash_working_tree() to simplify the code. It also help to avoid issues with the split-index: when stash_working_tree() is called, the index is at `i_tree', and this tree is extracted in a second index for use in a subcommand. This is not a problem in the non-split-index case, but in the split-index case, if the shared index file has expired and is removed by a subcommand, the main index contains a reference to a file that no longer exists. The calls to set_alternative_index_output() are dropped to extract `i_tree' to the main index, and `GIT_INDEX_FILE' is no longer set before starting `update-index'. When it exits, the index has changed, and must be discarded. The call to reset_tree() becomes useless: the only caller of stash_working_tree() is do_create_stash(), which creates `i_tree' from its index, calls save_untracked_files() if requested (but as it also works on a second index, it is unaffected), then calls stash_working_tree(). But when save_untracked_files() will be modified to stop using another index, it won't reset the tree, because stash_patch() wants to work on a different tree (`b_tree') than stash_working_tree(). At the end of the function, the tree is reset to `i_tree'. Signed-off-by: Alban Gruin <alban.gruin@gmail.com> --- builtin/stash.c | 14 +++----------- 1 file changed, 3 insertions(+), 11 deletions(-) diff --git a/builtin/stash.c b/builtin/stash.c index 9baa8b379e..2535335275 100644 --- a/builtin/stash.c +++ b/builtin/stash.c @@ -1059,17 +1059,14 @@ static int stash_working_tree(struct stash_info *info, const struct pathspec *ps struct rev_info rev; struct child_process cp_upd_index = CHILD_PROCESS_INIT; struct strbuf diff_output = STRBUF_INIT; - struct index_state istate = { NULL }; init_revisions(&rev, NULL); copy_pathspec(&rev.prune_data, ps); - set_alternate_index_output(stash_index_path.buf); if (reset_tree(&info->i_tree, 0, 0)) { ret = -1; goto done; } - set_alternate_index_output(NULL); rev.diffopt.output_format = DIFF_FORMAT_CALLBACK; rev.diffopt.format_callback = add_diff_to_buf; @@ -1091,8 +1088,6 @@ static int stash_working_tree(struct stash_info *info, const struct pathspec *ps argv_array_pushl(&cp_upd_index.args, "update-index", "--ignore-skip-worktree-entries", "-z", "--add", "--remove", "--stdin", NULL); - argv_array_pushf(&cp_upd_index.env_array, "GIT_INDEX_FILE=%s", - stash_index_path.buf); if (pipe_command(&cp_upd_index, diff_output.buf, diff_output.len, NULL, 0, NULL, 0)) { @@ -1100,19 +1095,16 @@ static int stash_working_tree(struct stash_info *info, const struct pathspec *ps goto done; } - if (write_index_as_tree(&info->w_tree, &istate, stash_index_path.buf, 0, - NULL)) { + discard_cache(); + if (write_cache_as_tree(&info->w_tree, 0, NULL) || + reset_tree(&info->i_tree, 0, 1)) ret = -1; - goto done; - } done: - discard_index(&istate); UNLEAK(rev); object_array_clear(&rev.pending); clear_pathspec(&rev.prune_data); strbuf_release(&diff_output); - remove_path(stash_index_path.buf); return ret; } -- 2.26.2
This removes the second index used in stash_patch(). This function starts by resetting the index (which is set at `i_tree') to HEAD, which has been stored in `b_commit' by do_create_stash(), and the call to `read-tree' is replaced by reset_tree(). The index is discarded after run_add_interactive(), but not `diff-tree' as this command should not change it. Since the index has been changed, and subsequent code might be sensitive to this, it is reset to `i_tree' at the end of the function. Signed-off-by: Alban Gruin <alban.gruin@gmail.com> --- builtin/stash.c | 47 ++++++++++------------------------------------- 1 file changed, 10 insertions(+), 37 deletions(-) diff --git a/builtin/stash.c b/builtin/stash.c index 2535335275..eaeb7bc8c4 100644 --- a/builtin/stash.c +++ b/builtin/stash.c @@ -995,51 +995,24 @@ static int stash_patch(struct stash_info *info, const struct pathspec *ps, struct strbuf *out_patch, int quiet) { int ret = 0; - struct child_process cp_read_tree = CHILD_PROCESS_INIT; struct child_process cp_diff_tree = CHILD_PROCESS_INIT; - struct index_state istate = { NULL }; - char *old_index_env = NULL, *old_repo_index_file; - - remove_path(stash_index_path.buf); - cp_read_tree.git_cmd = 1; - argv_array_pushl(&cp_read_tree.args, "read-tree", "HEAD", NULL); - argv_array_pushf(&cp_read_tree.env_array, "GIT_INDEX_FILE=%s", - stash_index_path.buf); - if (run_command(&cp_read_tree)) { - ret = -1; - goto done; - } + if (reset_tree(&info->b_commit, 0, 1)) + return -1; /* Find out what the user wants. */ - old_repo_index_file = the_repository->index_file; - the_repository->index_file = stash_index_path.buf; - old_index_env = xstrdup_or_null(getenv(INDEX_ENVIRONMENT)); - setenv(INDEX_ENVIRONMENT, the_repository->index_file, 1); - ret = run_add_interactive(NULL, "--patch=stash", ps); - the_repository->index_file = old_repo_index_file; - if (old_index_env && *old_index_env) - setenv(INDEX_ENVIRONMENT, old_index_env, 1); - else - unsetenv(INDEX_ENVIRONMENT); - FREE_AND_NULL(old_index_env); - /* State of the working tree. */ - if (write_index_as_tree(&info->w_tree, &istate, stash_index_path.buf, 0, - NULL)) { - ret = -1; - goto done; - } + discard_cache(); + if (write_cache_as_tree(&info->w_tree, 0, NULL)) + return -1; cp_diff_tree.git_cmd = 1; argv_array_pushl(&cp_diff_tree.args, "diff-tree", "-p", "-U1", "HEAD", oid_to_hex(&info->w_tree), "--", NULL); - if (pipe_command(&cp_diff_tree, NULL, 0, out_patch, 0, NULL, 0)) { - ret = -1; - goto done; - } + if (pipe_command(&cp_diff_tree, NULL, 0, out_patch, 0, NULL, 0)) + return -1; if (!out_patch->len) { if (!quiet) @@ -1047,9 +1020,9 @@ static int stash_patch(struct stash_info *info, const struct pathspec *ps, ret = 1; } -done: - discard_index(&istate); - remove_path(stash_index_path.buf); + if (reset_tree(&info->i_tree, 0, 1)) + return -1; + return ret; } -- 2.26.2
This removes the second index used in save_untracked_files(). This functions creates a new commit with no parents, and a tree containing only untracked files, so the tree is reset to the empty tree at the beginning (hence the need for reset_tree() to accept constant trees). The environment of `update-index' is no longer updated, and the index is discarded after this command exited. As the only caller of this function is do_create_stash(), and the next user of the index is either save_untracked_files() or stash_patch(), which both want a different tree, the index is left as-is at the end. Signed-off-by: Alban Gruin <alban.gruin@gmail.com> --- builtin/stash.c | 35 +++++++++++++++-------------------- 1 file changed, 15 insertions(+), 20 deletions(-) diff --git a/builtin/stash.c b/builtin/stash.c index eaeb7bc8c4..cbe37cd24b 100644 --- a/builtin/stash.c +++ b/builtin/stash.c @@ -954,41 +954,36 @@ static int check_changes(const struct pathspec *ps, int include_untracked, static int save_untracked_files(struct stash_info *info, struct strbuf *msg, struct strbuf files) { - int ret = 0; struct strbuf untracked_msg = STRBUF_INIT; struct child_process cp_upd_index = CHILD_PROCESS_INIT; - struct index_state istate = { NULL }; + + if (reset_tree(the_hash_algo->empty_tree, 0, 1)) + return -1; cp_upd_index.git_cmd = 1; argv_array_pushl(&cp_upd_index.args, "update-index", "-z", "--add", "--remove", "--stdin", NULL); - argv_array_pushf(&cp_upd_index.env_array, "GIT_INDEX_FILE=%s", - stash_index_path.buf); - strbuf_addf(&untracked_msg, "untracked files on %s\n", msg->buf); if (pipe_command(&cp_upd_index, files.buf, files.len, NULL, 0, - NULL, 0)) { - ret = -1; - goto done; - } + NULL, 0)) + return -1; - if (write_index_as_tree(&info->u_tree, &istate, stash_index_path.buf, 0, - NULL)) { - ret = -1; - goto done; - } + discard_cache(); + if (write_cache_as_tree(&info->u_tree, 0, NULL)) + return -1; + strbuf_addf(&untracked_msg, "untracked files on %s\n", msg->buf); if (commit_tree(untracked_msg.buf, untracked_msg.len, &info->u_tree, NULL, &info->u_commit, NULL, NULL)) { - ret = -1; - goto done; + strbuf_release(&untracked_msg); + return -1; } -done: - discard_index(&istate); + /* Do not reset the tree, as either stash_patch() or + * stash_working_tree() will do it. */ + strbuf_release(&untracked_msg); - remove_path(stash_index_path.buf); - return ret; + return 0; } static int stash_patch(struct stash_info *info, const struct pathspec *ps, -- 2.26.2
This removes the second index used in restore_untracked(). The call to `read-tree' is replaced by reset_tree() with the appropriate parameters (no update, no reset). The environment of `checkout-index' is no longer modified, and the cache is discarded when it exists. In do_apply_stash(), the changes are a bit more involved: to avoid conflicts with the merged index, restore_untracked() is moved after merge_recursive_generic(). This introduces another problem: the files that were untracked once are now added to the index, and update_index() would add back those files in the index. To avoid this, get_newly_staged() is moved before restore_untracked(). Signed-off-by: Alban Gruin <alban.gruin@gmail.com> --- builtin/stash.c | 46 +++++++++++++++++----------------------------- 1 file changed, 17 insertions(+), 29 deletions(-) diff --git a/builtin/stash.c b/builtin/stash.c index cbe37cd24b..1eafc1fe8d 100644 --- a/builtin/stash.c +++ b/builtin/stash.c @@ -359,29 +359,16 @@ static int restore_untracked(struct object_id *u_tree) int res; struct child_process cp = CHILD_PROCESS_INIT; - /* - * We need to run restore files from a given index, but without - * affecting the current index, so we use GIT_INDEX_FILE with - * run_command to fork processes that will not interfere. - */ - cp.git_cmd = 1; - argv_array_push(&cp.args, "read-tree"); - argv_array_push(&cp.args, oid_to_hex(u_tree)); - argv_array_pushf(&cp.env_array, "GIT_INDEX_FILE=%s", - stash_index_path.buf); - if (run_command(&cp)) { - remove_path(stash_index_path.buf); + if (reset_tree(u_tree, 0, 0)) return -1; - } child_process_init(&cp); cp.git_cmd = 1; argv_array_pushl(&cp.args, "checkout-index", "--all", NULL); - argv_array_pushf(&cp.env_array, "GIT_INDEX_FILE=%s", - stash_index_path.buf); res = run_command(&cp); - remove_path(stash_index_path.buf); + discard_cache(); + return res; } @@ -395,6 +382,7 @@ static int do_apply_stash(const char *prefix, struct stash_info *info, struct object_id index_tree; struct commit *result; const struct object_id *bases[1]; + struct strbuf newly_staged = STRBUF_INIT; read_cache_preload(NULL); if (refresh_and_write_cache(REFRESH_QUIET, 0, 0)) @@ -433,9 +421,6 @@ static int do_apply_stash(const char *prefix, struct stash_info *info, } } - if (info->has_u && restore_untracked(&info->u_tree)) - return error(_("could not restore untracked files from stash")); - init_merge_options(&o, the_repository); o.branch1 = "Updated upstream"; @@ -463,24 +448,27 @@ static int do_apply_stash(const char *prefix, struct stash_info *info, return ret; } + if (!has_index && get_newly_staged(&newly_staged, &c_tree)) { + strbuf_release(&newly_staged); + return -1; + } + + if (info->has_u && restore_untracked(&info->u_tree)) { + strbuf_release(&newly_staged); + return error(_("could not restore untracked files from stash")); + } + if (has_index) { if (reset_tree(&index_tree, 0, 0)) return -1; } else { - struct strbuf out = STRBUF_INIT; - - if (get_newly_staged(&out, &c_tree)) { - strbuf_release(&out); - return -1; - } - if (reset_tree(&c_tree, 0, 1)) { - strbuf_release(&out); + strbuf_release(&newly_staged); return -1; } - ret = update_index(&out); - strbuf_release(&out); + ret = update_index(&newly_staged); + strbuf_release(&newly_staged); if (ret) return -1; -- 2.26.2
Since stash no longer uses a second index, `stash_index_path' is now unused, and can be dropped. Signed-off-by: Alban Gruin <alban.gruin@gmail.com> --- builtin/stash.c | 7 ------- 1 file changed, 7 deletions(-) diff --git a/builtin/stash.c b/builtin/stash.c index 1eafc1fe8d..709fd5ca34 100644 --- a/builtin/stash.c +++ b/builtin/stash.c @@ -88,7 +88,6 @@ static const char * const git_stash_save_usage[] = { }; static const char *ref_stash = "refs/stash"; -static struct strbuf stash_index_path = STRBUF_INIT; /* * w_commit is set to the commit containing the working tree @@ -1506,8 +1505,6 @@ static int save_stash(int argc, const char **argv, const char *prefix) int cmd_stash(int argc, const char **argv, const char *prefix) { - pid_t pid = getpid(); - const char *index_file; struct argv_array args = ARGV_ARRAY_INIT; struct option options[] = { @@ -1524,10 +1521,6 @@ int cmd_stash(int argc, const char **argv, const char *prefix) argc = parse_options(argc, argv, prefix, options, git_stash_usage, PARSE_OPT_KEEP_UNKNOWN | PARSE_OPT_KEEP_DASHDASH); - index_file = get_index_file(); - strbuf_addf(&stash_index_path, "%s.stash.%" PRIuMAX, index_file, - (uintmax_t)pid); - if (!argc) return !!push_stash(0, NULL, prefix, 0); else if (!strcmp(argv[0], "apply")) -- 2.26.2
Hi,
Le 05/05/2020 à 12:48, Alban Gruin a écrit :
> The old scripted `git stash' used to create a second index to save
> modified and untracked files, and restore untracked files, without
> affecting the main index. This behaviour was carried on when it was
> rewritten in C, and here, most operations performed on the second index
> are done by forked commands (ie. `read-tree' instead of reset_tree(),
> etc.). This works most of the time, except in some edge case with the
> split-index when the split file has expired and is deleted by a forked
> command: the main index may still contain a reference to the now-deleted
> file, and subsequent operations on the index will fail [0].
>
> The goal of this series is to modernise (a bit) builtin/stash.c, and to
> fix the aforementionned edge case.
>
> I have to admit that I don't really know how to test this.
> GIT_TEST_SPLIT_INDEX failed on me (gdb showed me that it does not enable
> the split-index at all, at least in `git stash' and its forks), and I'm
> reluctant to add explicits tests on `git stash' about the split-index,
> when nothing in its code explicitly does unusual things with the index
> once this series is applied. If anyone wants to share opinions about
> this, I would be happy to read them.
>
> This series is based on b34789c0b0 ("The sixth batch", 2020-05-01).
>
> The tip of this series is tagged as "stash-remove-second-index-v1" at
> https://github.com/agrn/git.
>
> [0] https://lore.kernel.org/git/EED2CFF1-5BEF-429D-AB99-AD148A867614@gmail.com/
>
> Alban Gruin (6):
> stash: mark `i_tree' in reset_tree() const
> stash: remove the second index in stash_working_tree()
> stash: remove the second index in stash_patch()
> stash: remove the second index in save_untracked_files()
> stash: remove the second index in restore_untracked()
> stash: remove `stash_index_path'
>
> builtin/stash.c | 151 +++++++++++++++---------------------------------
> 1 file changed, 46 insertions(+), 105 deletions(-)
>
I’m afraid this series fell through the cracks.
Alban
Hi, On Tue, May 5, 2020 at 12:56 PM Alban Gruin <alban.gruin@gmail.com> wrote: > > The old scripted `git stash' used to create a second index to save > modified and untracked files, and restore untracked files, without > affecting the main index. This behaviour was carried on when it was > rewritten in C, and here, most operations performed on the second index > are done by forked commands (ie. `read-tree' instead of reset_tree(), > etc.). This works most of the time, except in some edge case with the > split-index when the split file has expired and is deleted by a forked > command: the main index may still contain a reference to the now-deleted > file, and subsequent operations on the index will fail [0]. Thanks for working on this! I agree that it would be nice to fix split index issues as it could help for sure with huge repositories. Sorry also that this patch series fell through the cracks. I am adding Son Luong Ngoc in Cc as he reported the issue that this series fixes. > The goal of this series is to modernise (a bit) builtin/stash.c, and to > fix the aforementionned edge case. > > I have to admit that I don't really know how to test this. > GIT_TEST_SPLIT_INDEX failed on me (gdb showed me that it does not enable > the split-index at all, at least in `git stash' and its forks), It should have worked when it was introduced, though maybe not for `git stash`. > and I'm > reluctant to add explicits tests on `git stash' about the split-index, > when nothing in its code explicitly does unusual things with the index > once this series is applied. If anyone wants to share opinions about > this, I would be happy to read them. I understand. I think the good way forward would be to fix GIT_TEST_SPLIT_INDEX and find a way to ensure that it keeps working in the future. Thanks, Christian.
On Tue, May 5, 2020 at 12:56 PM Alban Gruin <alban.gruin@gmail.com> wrote: > > As reset_tree() does not change the value pointed by `i_tree', and that > it will be provided with `the_hash_algo->empty_tree' which is a > constant, it is changed to be a pointer to a constant. [...] > -static int reset_tree(struct object_id *i_tree, int update, int reset) > +static int reset_tree(const struct object_id *i_tree, int update, int reset) Yeah, in rest_tree() it is only used like this: tree = parse_tree_indirect(i_tree); and parse_tree_indirect() takes a 'const struct object_id *'. > -- > 2.26.2 >
On Tue, May 5, 2020 at 12:56 PM Alban Gruin <alban.gruin@gmail.com> wrote: > > This removes the second index used in stash_working_tree() to simplify > the code. It also help to avoid issues with the split-index: when s/help/helps/ > stash_working_tree() is called, the index is at `i_tree', and this tree > is extracted in a second index for use in a subcommand. This is not a > problem in the non-split-index case, but in the split-index case, if the > shared index file has expired and is removed by a subcommand, the main > index contains a reference to a file that no longer exists. As this is fixing a bug and there is no test, it might help if you can at least give an example of something that used to fail before this patch and doesn't after it. You are talking about stash subcommands but it is not very clear which one for example can trigger the bug. > The calls to set_alternative_index_output() are dropped to extract > `i_tree' to the main index, and `GIT_INDEX_FILE' is no longer set before > starting `update-index'. When it exits, the index has changed, and must > be discarded. That makes sense. > The call to reset_tree() becomes useless: Your patch doesn't remove any call to reset_tree(), but actually adds one. So the above is difficult to understand. Do you want to say that in a later patch it will be possible to remove the call to reset_tree()? Or do you want to say that the call to write_index_as_tree() becomes useless? > the only caller of > stash_working_tree() is do_create_stash(), which creates `i_tree' from > its index, calls save_untracked_files() if requested (but as it also > works on a second index, it is unaffected), then calls > stash_working_tree(). But when save_untracked_files() will be modified > to stop using another index, it won't reset the tree, because > stash_patch() wants to work on a different tree (`b_tree') than > stash_working_tree(). > > At the end of the function, the tree is reset to `i_tree'.
On Tue, May 5, 2020 at 12:56 PM Alban Gruin <alban.gruin@gmail.com> wrote: > > This removes the second index used in stash_patch(). > > This function starts by resetting the index (which is set at `i_tree') > to HEAD, which has been stored in `b_commit' by do_create_stash(), and > the call to `read-tree' is replaced by reset_tree(). The index is > discarded after run_add_interactive(), but not `diff-tree' as this > command should not change it. > > Since the index has been changed, and subsequent code might be sensitive > to this, it is reset to `i_tree' at the end of the function. [...] > /* State of the working tree. */ > - if (write_index_as_tree(&info->w_tree, &istate, stash_index_path.buf, 0, > - NULL)) { > - ret = -1; > - goto done; > - } > + discard_cache(); > + if (write_cache_as_tree(&info->w_tree, 0, NULL)) > + return -1; In the previous patch you use the following: + if (write_cache_as_tree(&info->w_tree, 0, NULL) || + reset_tree(&info->i_tree, 0, 1)) Here the reset_tree(&info->i_tree, 0, 1) call is in the "done:" part of the code. Is there a reason for that? Can't reset_tree(&info->i_tree, 0, 1) be tried here before returning an error? Or was the previous patch wrong? I guess the reason why the reset_tree(&info->i_tree, 0, 1) call here is in the "done:" part of the call is because it should be after the diff tree command is launched. I see that the commit message tries to explain this a bit, but it is still not very clear to me.
On Sat, Jun 13, 2020 at 11:38 AM Christian Couder <christian.couder@gmail.com> wrote: > > On Tue, May 5, 2020 at 12:56 PM Alban Gruin <alban.gruin@gmail.com> wrote: > > > > This removes the second index used in stash_patch(). > > > > This function starts by resetting the index (which is set at `i_tree') > > to HEAD, which has been stored in `b_commit' by do_create_stash(), and > > the call to `read-tree' is replaced by reset_tree(). The index is > > discarded after run_add_interactive(), but not `diff-tree' as this > > command should not change it. > > > > Since the index has been changed, and subsequent code might be sensitive > > to this, it is reset to `i_tree' at the end of the function. > > [...] > > > /* State of the working tree. */ > > - if (write_index_as_tree(&info->w_tree, &istate, stash_index_path.buf, 0, > > - NULL)) { > > - ret = -1; > > - goto done; > > - } > > + discard_cache(); > > + if (write_cache_as_tree(&info->w_tree, 0, NULL)) > > + return -1; > > In the previous patch you use the following: > > + if (write_cache_as_tree(&info->w_tree, 0, NULL) || > + reset_tree(&info->i_tree, 0, 1)) > > Here the reset_tree(&info->i_tree, 0, 1) call is in the "done:" part > of the code. Is there a reason for that? Can't > reset_tree(&info->i_tree, 0, 1) be tried here before returning an > error? Or was the previous patch wrong? Sorry I think I misunderstood the code in the previous patch. It actually doesn't try reset_tree(&info->i_tree, 0, 1) when write_cache_as_tree(&info->w_tree, 0, NULL) fails. > I guess the reason why the > reset_tree(&info->i_tree, 0, 1) call here is in the "done:" part of > the call is because it should be after the diff tree command is > launched. I see that the commit message tries to explain this a bit, > but it is still not very clear to me.
Hi Christian, Le 13/06/2020 à 10:52, Christian Couder a écrit : > On Tue, May 5, 2020 at 12:56 PM Alban Gruin <alban.gruin@gmail.com> wrote: >> >> This removes the second index used in stash_working_tree() to simplify >> the code. It also help to avoid issues with the split-index: when > > s/help/helps/ > >> stash_working_tree() is called, the index is at `i_tree', and this tree >> is extracted in a second index for use in a subcommand. This is not a >> problem in the non-split-index case, but in the split-index case, if the >> shared index file has expired and is removed by a subcommand, the main >> index contains a reference to a file that no longer exists. > > As this is fixing a bug and there is no test, it might help if you can > at least give an example of something that used to fail before this > patch and doesn't after it. You are talking about stash subcommands > but it is not very clear which one for example can trigger the bug. > >> The calls to set_alternative_index_output() are dropped to extract >> `i_tree' to the main index, and `GIT_INDEX_FILE' is no longer set before >> starting `update-index'. When it exits, the index has changed, and must >> be discarded. > > That makes sense. > >> The call to reset_tree() becomes useless: > > Your patch doesn't remove any call to reset_tree(), but actually adds > one. So the above is difficult to understand. > > Do you want to say that in a later patch it will be possible to remove > the call to reset_tree()? Or do you want to say that the call to > write_index_as_tree() becomes useless? > No, I meant that with this commit, reset_tree() does not need to be called at the beginning of stash_working_tree(), because it is only called by do_create_stash(), which sets the index at `i_tree', and save_untracked_files() does not change the main index. But it will become useful again down the line, when save_untracked_file() will be rewritten to use the "main" index, so I did not remove it. I hope it makes more sense now. >> the only caller of >> stash_working_tree() is do_create_stash(), which creates `i_tree' from >> its index, calls save_untracked_files() if requested (but as it also >> works on a second index, it is unaffected), then calls >> stash_working_tree(). But when save_untracked_files() will be modified >> to stop using another index, it won't reset the tree, because >> stash_patch() wants to work on a different tree (`b_tree') than >> stash_working_tree(). >> >> At the end of the function, the tree is reset to `i_tree'. Alban
On Tue, May 5, 2020 at 12:56 PM Alban Gruin <alban.gruin@gmail.com> wrote:
>
> This removes the second index used in save_untracked_files().
>
> This functions creates a new commit with no parents, and a tree
> containing only untracked files, so the tree is reset to the empty tree
> at the beginning (hence the need for reset_tree() to accept constant
> trees). The environment of `update-index' is no longer updated, and the
> index is discarded after this command exited.
Maybe s/exited/exits/.
On Tue, May 5, 2020 at 12:56 PM Alban Gruin <alban.gruin@gmail.com> wrote: > diff --git a/builtin/stash.c b/builtin/stash.c > index cbe37cd24b..1eafc1fe8d 100644 > --- a/builtin/stash.c > +++ b/builtin/stash.c > @@ -359,29 +359,16 @@ static int restore_untracked(struct object_id *u_tree) > int res; > struct child_process cp = CHILD_PROCESS_INIT; > > - /* > - * We need to run restore files from a given index, but without > - * affecting the current index, so we use GIT_INDEX_FILE with > - * run_command to fork processes that will not interfere. > - */ > - cp.git_cmd = 1; > - argv_array_push(&cp.args, "read-tree"); > - argv_array_push(&cp.args, oid_to_hex(u_tree)); > - argv_array_pushf(&cp.env_array, "GIT_INDEX_FILE=%s", > - stash_index_path.buf); > - if (run_command(&cp)) { > - remove_path(stash_index_path.buf); > + if (reset_tree(u_tree, 0, 0)) > return -1; > - } > > child_process_init(&cp); Is this still necessary? > cp.git_cmd = 1; > argv_array_pushl(&cp.args, "checkout-index", "--all", NULL); > - argv_array_pushf(&cp.env_array, "GIT_INDEX_FILE=%s", > - stash_index_path.buf); > > res = run_command(&cp); > - remove_path(stash_index_path.buf); > + discard_cache(); > + > return res; > }
On Sat, Jun 13, 2020 at 8:00 PM Alban Gruin <alban.gruin@gmail.com> wrote: > > Le 13/06/2020 à 10:52, Christian Couder a écrit : > > On Tue, May 5, 2020 at 12:56 PM Alban Gruin <alban.gruin@gmail.com> wrote: > >> The call to reset_tree() becomes useless: > > > > Your patch doesn't remove any call to reset_tree(), but actually adds > > one. So the above is difficult to understand. > > > > Do you want to say that in a later patch it will be possible to remove > > the call to reset_tree()? Or do you want to say that the call to > > write_index_as_tree() becomes useless? > > > > No, I meant that with this commit, reset_tree() does not need to be > called at the beginning of stash_working_tree(), because it is only > called by do_create_stash(), which sets the index at `i_tree', and > save_untracked_files() does not change the main index. But it will > become useful again down the line, when save_untracked_file() will be > rewritten to use the "main" index, so I did not remove it. > > I hope it makes more sense now. Yeah, it makes more sense with an explanation like the above. Instead of "down the line" though, you might want to say something like "in a later commit", as I think it will make it clearer for reviewers that the commit when it will become useful again should be part of the same series. Thanks, Christian.
On Tue, May 05, 2020 at 12:48:43PM +0200, Alban Gruin wrote: > The old scripted `git stash' used to create a second index to save > modified and untracked files, and restore untracked files, without > affecting the main index. This behaviour was carried on when it was > rewritten in C, and here, most operations performed on the second index > are done by forked commands (ie. `read-tree' instead of reset_tree(), > etc.). This works most of the time, except in some edge case with the > split-index when the split file has expired and is deleted by a forked > command: the main index may still contain a reference to the now-deleted > file, and subsequent operations on the index will fail [0]. > > The goal of this series is to modernise (a bit) builtin/stash.c, and to > fix the aforementionned edge case. While this patch series does fix this edge case, it doesn't fix the root cause, which is not specific to 'git stash', but is more general issue when using split index with an alternate index. More on this in the notes attached to the patch below. Having said that, and without actually looking at your patches, I think removing the second index from the builtin stash is a good idea nonetheless. > I have to admit that I don't really know how to test this. > GIT_TEST_SPLIT_INDEX failed on me (gdb showed me that it does not enable > the split-index at all, at least in `git stash' and its forks) You are right, as it turns out GIT_TEST_SPLIT_INDEX=1 has been broken for a while now. The patch below fixes this, but there is a bit of a fallout of test failures, which it doesn't fix, because I'm not quite sure how to fix one of them. Again, more on this in the attached notes. --- >8 --- Subject: [PATCH] [RFH] read-cache: fix GIT_TEST_SPLIT_INDEX Setting GIT_TEST_SPLIT_INDEX=1 is supposed to turn on the split-index feature in the test suite and trigger index splitting (mostly) randomly. Alas, this has been broken since 6e37c8ed3c (read-cache.c: fix writing "link" index ext with null base oid, 2019-02-13), and no index splitting has been performed at all since then. There are two places where we check the value of GIT_TEST_SPLIT_INDEX, and before 6e37c8ed3c they worked like this: 1) In the lower-level do_write_index(), where, if GIT_TEST_SPLIT_INDEX is enabled, we call init_split_index(). This call merely allocates and zero-initializes 'istate->split_index', but does nothing else (i.e. doesn't fill the base/shared index with cache entries, doesn't actually write a shared index file, etc.). Pertinent to this issue, the hash of the base index remains all zeroed out. 2) In the higher-level write_locked_index(), where, if GIT_TEST_SPLIT_INDEX is enabled _and_ 'istate->split_index' has already been initialized, it randomly sets the flag that triggers index splitting later in this function. This randomness comes from the first byte of the hash of the base index via an 'if ((first_byte & 15) < 6)' condition. However, if 'istate->split_index' hasn't been initialized (i.e. still NULL), then it calls do_write_locked_index(), which internally calls do_write_index() mentioned above, and then a 'goto out' skips right over this second GIT_TEST_SPLIT_INDEX check. This means that while GIT_TEST_SPLIT_INDEX=1 usually triggered index splitting randomly, the first two index writes were always deterministic, though I suspect this was unintentional (this is still before 6e37c8ed3c): - The initial index write never splits the index. During the first index write write_locked_index() is called with 'istate->split_index' still uninitialized, so the check in 2) is not executed. It still calls do_write_index(), though, which then executes the check in 1). The resulting all zero base index hash then leads to the 'link' extension being written to '.git/index', though a shared index file is not written: $ rm .git/index $ GIT_TEST_SPLIT_INDEX=1 git update-index --add file $ test-tool dump-split-index .git/index own c6ef71168597caec8553c83d9d0048f1ef416170 base 0000000000000000000000000000000000000000 100644 d00491fd7e5bb6fa28c517a0bb32b8b506539d4d 0 file replacements: deletions: $ ls -l .git/sharedindex.* ls: cannot access '.git/sharedindex.*': No such file or directory - The second index write always splits the index. When the index written in the previous point is read, 'istate->split_index' is initialized because of the presence of the 'link' extension. So during the second write write_locked_index() does run the check in 2), and the first byte of the all zero base index hash always fulfills the randomness condition, which in turn always triggers the index splitting. - Subsequent index writes will find the 'link' extension with a real non-zero base index hash, so from then on the check in 2) is executed and the first byte of the base index hash is as random as it gets (coming from the SHA-1 of index data including timestamps and inodes...). All this worked until 6e37c8ed3c came along, and stopped writing the 'link' extension if the hash of the base index was all zero: $ rm .git/index $ GIT_TEST_SPLIT_INDEX=1 git update-index --add file $ test-tool dump-split-index .git/index own abbd6f6458d5dee73ae8e210ca15a68a390c6fd7 not a split index $ ls -l .git/sharedindex.* ls: cannot access '.git/sharedindex.*': No such file or directory Since there is no 'link' extension, in the second index write 'istate->split_index' remains uninitialized, and the check in 2) is not executed, and ultimately the index is never split. Fix this by modifying write_locked_index() to make sure to check GIT_TEST_SPLIT_INDEX even if 'istate->split_index' is still uninitialized, and initialize it if necessary. The check for GIT_TEST_SPLIT_INDEX and separate init_split_index() call in do_write_index() thus becomes unnecessary, so remove it. Furthermore, add a test to 't1700-split-index.sh' to make sure that GIT_TEST_SPLIT_INDEX=1 will keep working (though only check the index splitting on the first index write, because after that it will be random). Note that this change does not restore the pre-6e37c8ed3c behaviour, as it will deterministically split the index already on the first index write. Since GIT_TEST_SPLIT_INDEX is purely a developer aid, there is no backwards compatibility issue here. The new behaviour does trigger test failures in 't0003-attributes.sh' and 't1600-index.sh', though, which should be fixed in preparatory patches first. --- Notes: The failures in 't1600-index.sh' are easy: those tests check how bogus values in $GIT_INDEX_VERSION or 'index.version' are handled, and all failures look like this: --- expect.err 2020-06-14 17:28:21.043527044 +0000 +++ actual.err 2020-06-14 17:28:21.043527044 +0000 @@ -1,2 +1,4 @@ warning: index.version set, but the value is invalid. Using version +warning: index.version set, but the value is invalid. +Using version We get one such warning when writing any index file, and since we now split the index on the first write, we get one warning when writing the shared index and a second warning when writing '.git/index'. A simple 'sane_unset GIT_TEST_SPLIT_INDEX' fixes these failures, and since this test script is about some intricacies of the index, I think this is the right thing to do in this case. The failure in 't0003-attributes.sh' is tricky, because it does this: # The previous tests wrote the split and shared shared index files # in '$TRASH_DIRECTORY/.git'. ( cd bare.git && GIT_INDEX_FILE=../.git/index \ git check-attr --cached .... ) which fails with: fatal: ./sharedindex.3e9b03a8e3e1533d2bb2e22c77d4784ce2e7b108: index file open failed: No such file or directory Unsetting GIT_TEST_SPLIT_INDEX in the test script fixes this failure as well, of course, though I don't think this is the right thing to do in this case. Worse, this failure raises some serious questions: - Where to look for and where to write the shared index file when using an alternate index location? Currently it's read from / written to gitdir. This might be fine as long as both cwd and the alternate index are in the same repository while writing and reading. However, as this failing test shows, if the shared index was written in one repository, then an attempt to read it while in another repository using GIT_INDEX_FILE will fail. I think it would be more sensible to write the shared index next to the (alternate) index file in the same directory, wherever that might be. But what about backwards compatibility? IOW can there be a sensible use case that relies on GIT_INDEX_FILE pointing to somewhere, but its shared index file is still being written to the current repository? - What about expiring shared index files while using an alternate index? We must be careful not to expire any sharedindex.<sha1> files that are used by '.git/index' or any existing alternate index files. Unfortunately, we are not that careful, and I think this is the root cause of the issue that Alban is trying to fix. The failing test below demonstrates this using basically only 'git update-index', without all the complexity of 'git stash'. test_expect_failure 'split index expiration vs. alternate index' ' >file1 && >file2 && git update-index --split-index --add file1 && # debug: this should print .git/index and .git/sharedindex.<sha1>. echo .git/*index* && GIT_INDEX_FILE=.git/otherindex \ git -c splitIndex.sharedIndexExpire=now \ update-index --split-index --add file2 && # debug: this should print .git/index, .git/otherindex and _two_ # .git/sharedindex.<sha1> files, but, alas, it prints only one # shared index, because the other got expired. echo .git/*index* && git diff --cached --name-only >actual1 && echo file1 >expect1 && test_cmp expect1 actual1 && GIT_INDEX_FILE=.git/otherindex git diff --cached --name-only >actual2 && echo file2 >expect2 && test_cmp expect2 actual2 ' Now, not expiring the shared index used by '.git/index' while writing to an alternate split index is easy, because we do know about '.git/index', of course, and thus can check it to exclude its shared index. But how could a git process expiring shared indexes know whether a shared index file belongs to an alternate index?! I'm afraid it can't, which leads to the next question... - Should we even allow writing a split index when using an alternate index file? If we don't do that, then there is no risk of expiring the shared index of an alternate index, because there will be no such shared index to begin with. Furthermore, if we don't write a shared index while using an alternate index, then we won't trigger expiration while using an alternate index, so we won't accidentally expire the shared index used by '.git/index'. Note, however, that we should still support _reading_ the shared index of an alternate index, as that alternate index might be the "real" '.git/index' of a different repository, like in the failing test in t0003. - Should we even allow 'splitIndex.sharedIndexExpire=now'? I believe, though haven't confirmed, that it can cause trouble even without using an alternate index. Consider the following sequence of events: - Git process A reads '.git/index', finds the 'link' extension, and reads the SHA1 recorded there that determines the filename of its shared index. - The scheduler steps in, and puts process A to sleep. - Git process B updates the index, decides that it's time to write a new shared index, does so, and then because of 'splitIndex.sharedIndexExpire=now' it removes all other shared index files. - The scheduler wakes process A, which now tries to open the shared index file it just learned about, but fails because that file has just been removed by process B. This is similar to the issue we have with 'git gc --prune=now', except that 'git gc's documentation explicitly warns about the risks of using '--prune=now', while the description of 'splitIndex.sharedIndexExpire' doesn't have any such warning. I think that 'splitIndex.sharedIndexExpire=now' should be allowed, for those who hopefully know what they are doing, just as we allow 'git gc --prune=now', but the documentation should clearly warn against its potential pitfalls. read-cache.c | 22 +++++++++++++--------- t/t1700-split-index.sh | 10 ++++++++++ 2 files changed, 23 insertions(+), 9 deletions(-) diff --git a/read-cache.c b/read-cache.c index aa427c5c17..b888c5df44 100644 --- a/read-cache.c +++ b/read-cache.c @@ -2801,11 +2801,8 @@ static int do_write_index(struct index_state *istate, struct tempfile *tempfile, } } - if (!istate->version) { + if (!istate->version) istate->version = get_index_format_default(the_repository); - if (git_env_bool("GIT_TEST_SPLIT_INDEX", 0)) - init_split_index(istate); - } /* demote version 3 to version 2 when the latter suffices */ if (istate->version == 3 || istate->version == 2) @@ -3222,7 +3219,7 @@ static int too_many_not_shared_entries(struct index_state *istate) int write_locked_index(struct index_state *istate, struct lock_file *lock, unsigned flags) { - int new_shared_index, ret; + int new_shared_index, ret, test_split_index_env; struct split_index *si = istate->split_index; if (git_env_bool("GIT_TEST_CHECK_CACHE_TREE", 0)) @@ -3237,7 +3234,9 @@ int write_locked_index(struct index_state *istate, struct lock_file *lock, if (istate->fsmonitor_last_update) fill_fsmonitor_bitmap(istate); - if (!si || alternate_index_output || + test_split_index_env = git_env_bool("GIT_TEST_SPLIT_INDEX", 0); + + if ((!si && !test_split_index_env) || alternate_index_output || (istate->cache_changed & ~EXTMASK)) { if (si) oidclr(&si->base_oid); @@ -3245,10 +3244,15 @@ int write_locked_index(struct index_state *istate, struct lock_file *lock, goto out; } - if (git_env_bool("GIT_TEST_SPLIT_INDEX", 0)) { - int v = si->base_oid.hash[0]; - if ((v & 15) < 6) + if (test_split_index_env) { + if (!si) { + si = init_split_index(istate); istate->cache_changed |= SPLIT_INDEX_ORDERED; + } else { + int v = si->base_oid.hash[0]; + if ((v & 15) < 6) + istate->cache_changed |= SPLIT_INDEX_ORDERED; + } } if (too_many_not_shared_entries(istate)) istate->cache_changed |= SPLIT_INDEX_ORDERED; diff --git a/t/t1700-split-index.sh b/t/t1700-split-index.sh index 12a5568844..e0c9e16a8a 100755 --- a/t/t1700-split-index.sh +++ b/t/t1700-split-index.sh @@ -507,4 +507,14 @@ test_expect_success 'do not refresh null base index' ' ) ' +test_expect_success 'GIT_TEST_SPLIT_INDEX works' ' + test_create_repo git-test-split-index && + ( + cd git-test-split-index && + >file && + GIT_TEST_SPLIT_INDEX=1 git update-index --add file && + verbose test $(ls -l .git/sharedindex.* |wc -l) = 1 + ) +' + test_done -- 2.27.0.278.g0bed8b425d
On Mon, Jun 15, 2020 at 05:27:15PM +0200, SZEDER Gábor wrote: > - Should we even allow 'splitIndex.sharedIndexExpire=now'? > > I believe, though haven't confirmed, that it can cause trouble > even without using an alternate index. Consider the following > sequence of events: > > - Git process A reads '.git/index', finds the 'link' extension, > and reads the SHA1 recorded there that determines the filename > of its shared index. > > - The scheduler steps in, and puts process A to sleep. > > - Git process B updates the index, decides that it's time to > write a new shared index, does so, and then because of > 'splitIndex.sharedIndexExpire=now' it removes all other shared > index files. > > - The scheduler wakes process A, which now tries to open the > shared index file it just learned about, but fails because > that file has just been removed by process B. Confirmed. To help reproduce the issue, this diff adds a strategically-placed controllable delay between reading '.git/index' and reading its shared/base index: diff --git a/read-cache.c b/read-cache.c index b888c5df44..5a66e9bf4b 100644 --- a/read-cache.c +++ b/read-cache.c @@ -2319,6 +2319,9 @@ int read_index_from(struct index_state *istate, const char *path, else split_index->base = xcalloc(1, sizeof(*split_index->base)); + if (git_env_bool("GIT_TEST_WAIT", 0)) + sleep(3); + base_oid_hex = oid_to_hex(&split_index->base_oid); base_path = xstrfmt("%s/sharedindex.%s", gitdir, base_oid_hex); trace2_region_enter_printf("index", "shared/do_read_index", Then this test creates the above described sequence of events: test_expect_failure 'splitIndex.sharedIndexExpire=now can be harmful' ' >file1 && >file2 && git update-index --split-index --add file1 && { sleep 1 && # "process B" git -c splitIndex.sharedIndexExpire=now \ update-index --split-index --add file2 & } && # "process A" GIT_TEST_WAIT=1 git diff --cached --name-only ' ... and fails reliably with: [...] + GIT_TEST_WAIT=1 git diff --cached --name-only [ ... trace from background commands removed ...] fatal: .git/sharedindex.818f65852e7402f236aeaadd32efdbb62291aa75: index file open failed: No such file or directory > This is similar to the issue we have with 'git gc --prune=now', > except that 'git gc's documentation explicitly warns about the > risks of using '--prune=now', while the description of > 'splitIndex.sharedIndexExpire' doesn't have any such warning. > > I think that 'splitIndex.sharedIndexExpire=now' should be allowed, > for those who hopefully know what they are doing, just as we allow > 'git gc --prune=now', but the documentation should clearly warn > against its potential pitfalls.
On Mon, Jun 15, 2020 at 11:50:20PM +0200, SZEDER Gábor wrote: > On Mon, Jun 15, 2020 at 05:27:15PM +0200, SZEDER Gábor wrote: > > - Should we even allow 'splitIndex.sharedIndexExpire=now'? > > > > I believe, though haven't confirmed, that it can cause trouble > > even without using an alternate index. Consider the following > > sequence of events: > > > > - Git process A reads '.git/index', finds the 'link' extension, > > and reads the SHA1 recorded there that determines the filename > > of its shared index. > > > > - The scheduler steps in, and puts process A to sleep. > > > > - Git process B updates the index, decides that it's time to > > write a new shared index, does so, and then because of > > 'splitIndex.sharedIndexExpire=now' it removes all other shared > > index files. > > > > - The scheduler wakes process A, which now tries to open the > > shared index file it just learned about, but fails because > > that file has just been removed by process B. > > Confirmed. > > To help reproduce the issue, this diff adds a strategically-placed > controllable delay between reading '.git/index' and reading its > shared/base index: > > diff --git a/read-cache.c b/read-cache.c > index b888c5df44..5a66e9bf4b 100644 > --- a/read-cache.c > +++ b/read-cache.c > @@ -2319,6 +2319,9 @@ int read_index_from(struct index_state *istate, const char *path, > else > split_index->base = xcalloc(1, sizeof(*split_index->base)); > > + if (git_env_bool("GIT_TEST_WAIT", 0)) > + sleep(3); We don't even need this sleep() call ... > base_oid_hex = oid_to_hex(&split_index->base_oid); > base_path = xstrfmt("%s/sharedindex.%s", gitdir, base_oid_hex); > trace2_region_enter_printf("index", "shared/do_read_index", > > > Then this test creates the above described sequence of events: > > test_expect_failure 'splitIndex.sharedIndexExpire=now can be harmful' ' > >file1 && > >file2 && > git update-index --split-index --add file1 && > > { > sleep 1 && ... and this 'sleep 1' command. > # "process B" > git -c splitIndex.sharedIndexExpire=now \ > update-index --split-index --add file2 & > } && > > # "process A" > GIT_TEST_WAIT=1 git diff --cached --name-only > ' > > ... and fails reliably with: > > [...] > + GIT_TEST_WAIT=1 git diff --cached --name-only > [ ... trace from background commands removed ...] > fatal: .git/sharedindex.818f65852e7402f236aeaadd32efdbb62291aa75: index file open failed: No such file or directory Flipping the test to 'test_expect_success' and running it with '--stress' almost always fails within 2 seconds. > > This is similar to the issue we have with 'git gc --prune=now', > > except that 'git gc's documentation explicitly warns about the > > risks of using '--prune=now', while the description of > > 'splitIndex.sharedIndexExpire' doesn't have any such warning. > > > > I think that 'splitIndex.sharedIndexExpire=now' should be allowed, > > for those who hopefully know what they are doing, just as we allow > > 'git gc --prune=now', but the documentation should clearly warn > > against its potential pitfalls.
SZEDER Gábor <szeder.dev@gmail.com> writes:
> On Mon, Jun 15, 2020 at 11:50:20PM +0200, SZEDER Gábor wrote:
>> On Mon, Jun 15, 2020 at 05:27:15PM +0200, SZEDER Gábor wrote:
>> > - Should we even allow 'splitIndex.sharedIndexExpire=now'?
Good analysis. The most conservative might end up to be to disable
splitindex altogether but perhaps we can first set a reasonable
minimum to the expiration to say 10min?
Hi,
Le 17/06/2020 à 22:04, Junio C Hamano a écrit :
> SZEDER Gábor <szeder.dev@gmail.com> writes:
>
>> On Mon, Jun 15, 2020 at 11:50:20PM +0200, SZEDER Gábor wrote:
>>> On Mon, Jun 15, 2020 at 05:27:15PM +0200, SZEDER Gábor wrote:
>>>> - Should we even allow 'splitIndex.sharedIndexExpire=now'?
>
> Good analysis. The most conservative might end up to be to disable
> splitindex altogether but perhaps we can first set a reasonable
> minimum to the expiration to say 10min?
>
I tried to understand what write_locked_index() and its callees does,
and I think there is an issue regarding how split-index files are handled:
1. The new "main" index is written in a temporary file. This temporary
file is either renamed to "index" if the COMMIT_LOCK flag is set, or
left as-is (cf. do_write_locked_index()).
2. The split-index files are removed if they are too old, without
checking if the COMMIT_LOCK flag is set (cf. write_shared_index()).
This could lead to a situation where a split-index file is removed
because it is deemed too old, but the main index is left as-is, still
pointing to this file. I am afraid that this can be an issue, even when
`splitIndex.sharedIndexExpire' is set to a "sane" value (which could
obviously be exacerbated by `now').
Alban
Hi Christian,
Le 13/06/2020 à 09:52, Christian Couder a écrit :
> Hi,
>
> On Tue, May 5, 2020 at 12:56 PM Alban Gruin <alban.gruin@gmail.com> wrote:
>>
>> The old scripted `git stash' used to create a second index to save
>> modified and untracked files, and restore untracked files, without
>> affecting the main index. This behaviour was carried on when it was
>> rewritten in C, and here, most operations performed on the second index
>> are done by forked commands (ie. `read-tree' instead of reset_tree(),
>> etc.). This works most of the time, except in some edge case with the
>> split-index when the split file has expired and is deleted by a forked
>> command: the main index may still contain a reference to the now-deleted
>> file, and subsequent operations on the index will fail [0].
>
> Thanks for working on this! I agree that it would be nice to fix split
> index issues as it could help for sure with huge repositories. Sorry
> also that this patch series fell through the cracks.
>
> I am adding Son Luong Ngoc in Cc as he reported the issue that this
> series fixes.
>
>> The goal of this series is to modernise (a bit) builtin/stash.c, and to
>> fix the aforementionned edge case.
>>
>> I have to admit that I don't really know how to test this.
>> GIT_TEST_SPLIT_INDEX failed on me (gdb showed me that it does not enable
>> the split-index at all, at least in `git stash' and its forks),
>
> It should have worked when it was introduced, though maybe not for `git stash`.
>
>> and I'm
>> reluctant to add explicits tests on `git stash' about the split-index,
>> when nothing in its code explicitly does unusual things with the index
>> once this series is applied. If anyone wants to share opinions about
>> this, I would be happy to read them.
>
> I understand. I think the good way forward would be to fix
> GIT_TEST_SPLIT_INDEX and find a way to ensure that it keeps working in
> the future.
>
> Thanks,
> Christian.
>
Thank you for your feedback.
I'll resend this series with the changes you and Son suggested, but I
think I'm going to remove references to bugs in the commit messages, to
turn it into another cleanup series. For the index, I will to try to
implement Gábor's suggestions in another series.
Thanks,
Alban
The old scripted `git stash' used to create a second index to save modified and untracked files, and restore untracked files, without affecting the main index. This behaviour was carried on when it was rewritten in C, and here, most operations performed on the second index are done by forked commands (ie. `read-tree' instead of reset_tree(), etc.). The goal of this series is to modernise (a bit) builtin/stash.c. Originally, this series was also meant to fix a bug reported by Son Luong Ngoc [0], but as emphasized by Gábor [1], the issue is not limited to `git stash', so this series is not a good fix for this particular issue. This series is based on a08a83db2b (The sixth batch, 2020-06-29). The tip of this series is tagged as "stash-remove-second-index-v2" at https://github.com/agrn/git. [0] https://lore.kernel.org/git/EED2CFF1-5BEF-429D-AB99-AD148A867614@gmail.com/ [1] https://lore.kernel.org/git/20200615152715.GD2898@szeder.dev/ Changes since v1: - Lots of rewording, following comments from Christian Couder and Son Luong Ngoc. - Removed a useless function call. Alban Gruin (6): stash: mark `i_tree' in reset_tree() const stash: remove the second index in stash_working_tree() stash: remove the second index in stash_patch() stash: remove the second index in save_untracked_files() stash: remove the second index in restore_untracked() stash: remove `stash_index_path' builtin/stash.c | 156 +++++++++++++++--------------------------------- 1 file changed, 48 insertions(+), 108 deletions(-) Range-diff against v1: 1: 6101e8ce64 ! 1: 598f03e76e stash: mark `i_tree' in reset_tree() const @@ -2,9 +2,9 @@ stash: mark `i_tree' in reset_tree() const - As reset_tree() does not change the value pointed by `i_tree', and that - it will be provided with `the_hash_algo->empty_tree' which is a - constant, it is changed to be a pointer to a constant. + In reset_tree(), the value pointed by `i_tree' is not modified. In a + latter commit, it will be provided with `the_hash_algo->empty_tree', + which is a constant. Hence, this changes `i_tree' to be a constant. Signed-off-by: Alban Gruin <alban.gruin@gmail.com> 2: 26dceead41 ! 2: 7265cfe65c stash: remove the second index in stash_working_tree() @@ -3,26 +3,19 @@ stash: remove the second index in stash_working_tree() This removes the second index used in stash_working_tree() to simplify - the code. It also help to avoid issues with the split-index: when - stash_working_tree() is called, the index is at `i_tree', and this tree - is extracted in a second index for use in a subcommand. This is not a - problem in the non-split-index case, but in the split-index case, if the - shared index file has expired and is removed by a subcommand, the main - index contains a reference to a file that no longer exists. + the code. The calls to set_alternative_index_output() are dropped to extract `i_tree' to the main index, and `GIT_INDEX_FILE' is no longer set before starting `update-index'. When it exits, the index has changed, and must be discarded. - The call to reset_tree() becomes useless: the only caller of - stash_working_tree() is do_create_stash(), which creates `i_tree' from - its index, calls save_untracked_files() if requested (but as it also - works on a second index, it is unaffected), then calls - stash_working_tree(). But when save_untracked_files() will be modified - to stop using another index, it won't reset the tree, because - stash_patch() wants to work on a different tree (`b_tree') than - stash_working_tree(). + With this commit, reset_tree() does not need to be called at the + beginning of stash_working_tree(), because it is only called by + do_create_stash(), which sets the index at `i_tree', and + save_untracked_files() does not change the main index. But it will + become useful again in a later commit, when save_untracked_file() will + be rewritten to use the "main" index, so I did not remove it. At the end of the function, the tree is reset to `i_tree'. 3: 1ae236daf4 = 3: b5587c0d56 stash: remove the second index in stash_patch() 4: 8c901ddf9d ! 4: 5fbcddf899 stash: remove the second index in save_untracked_files() @@ -8,7 +8,7 @@ containing only untracked files, so the tree is reset to the empty tree at the beginning (hence the need for reset_tree() to accept constant trees). The environment of `update-index' is no longer updated, and the - index is discarded after this command exited. + index is discarded after this command exits. As the only caller of this function is do_create_stash(), and the next user of the index is either save_untracked_files() or stash_patch(), 5: 413be5b5f9 ! 5: 0338986339 stash: remove the second index in restore_untracked() @@ -42,7 +42,7 @@ return -1; - } - child_process_init(&cp); +- child_process_init(&cp); cp.git_cmd = 1; argv_array_pushl(&cp.args, "checkout-index", "--all", NULL); - argv_array_pushf(&cp.env_array, "GIT_INDEX_FILE=%s", 6: b2c08c189e = 6: 4f151f4600 stash: remove `stash_index_path' -- 2.20.1
In reset_tree(), the value pointed by `i_tree' is not modified. In a latter commit, it will be provided with `the_hash_algo->empty_tree', which is a constant. Hence, this changes `i_tree' to be a constant. Signed-off-by: Alban Gruin <alban.gruin@gmail.com> --- builtin/stash.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/builtin/stash.c b/builtin/stash.c index 0c52a3b849..9baa8b379e 100644 --- a/builtin/stash.c +++ b/builtin/stash.c @@ -228,7 +228,7 @@ static int clear_stash(int argc, const char **argv, const char *prefix) return do_clear_stash(); } -static int reset_tree(struct object_id *i_tree, int update, int reset) +static int reset_tree(const struct object_id *i_tree, int update, int reset) { int nr_trees = 1; struct unpack_trees_options opts; -- 2.20.1
This removes the second index used in stash_working_tree() to simplify the code. The calls to set_alternative_index_output() are dropped to extract `i_tree' to the main index, and `GIT_INDEX_FILE' is no longer set before starting `update-index'. When it exits, the index has changed, and must be discarded. With this commit, reset_tree() does not need to be called at the beginning of stash_working_tree(), because it is only called by do_create_stash(), which sets the index at `i_tree', and save_untracked_files() does not change the main index. But it will become useful again in a later commit, when save_untracked_file() will be rewritten to use the "main" index, so I did not remove it. At the end of the function, the tree is reset to `i_tree'. Signed-off-by: Alban Gruin <alban.gruin@gmail.com> --- builtin/stash.c | 14 +++----------- 1 file changed, 3 insertions(+), 11 deletions(-) diff --git a/builtin/stash.c b/builtin/stash.c index 9baa8b379e..2535335275 100644 --- a/builtin/stash.c +++ b/builtin/stash.c @@ -1059,17 +1059,14 @@ static int stash_working_tree(struct stash_info *info, const struct pathspec *ps struct rev_info rev; struct child_process cp_upd_index = CHILD_PROCESS_INIT; struct strbuf diff_output = STRBUF_INIT; - struct index_state istate = { NULL }; init_revisions(&rev, NULL); copy_pathspec(&rev.prune_data, ps); - set_alternate_index_output(stash_index_path.buf); if (reset_tree(&info->i_tree, 0, 0)) { ret = -1; goto done; } - set_alternate_index_output(NULL); rev.diffopt.output_format = DIFF_FORMAT_CALLBACK; rev.diffopt.format_callback = add_diff_to_buf; @@ -1091,8 +1088,6 @@ static int stash_working_tree(struct stash_info *info, const struct pathspec *ps argv_array_pushl(&cp_upd_index.args, "update-index", "--ignore-skip-worktree-entries", "-z", "--add", "--remove", "--stdin", NULL); - argv_array_pushf(&cp_upd_index.env_array, "GIT_INDEX_FILE=%s", - stash_index_path.buf); if (pipe_command(&cp_upd_index, diff_output.buf, diff_output.len, NULL, 0, NULL, 0)) { @@ -1100,19 +1095,16 @@ static int stash_working_tree(struct stash_info *info, const struct pathspec *ps goto done; } - if (write_index_as_tree(&info->w_tree, &istate, stash_index_path.buf, 0, - NULL)) { + discard_cache(); + if (write_cache_as_tree(&info->w_tree, 0, NULL) || + reset_tree(&info->i_tree, 0, 1)) ret = -1; - goto done; - } done: - discard_index(&istate); UNLEAK(rev); object_array_clear(&rev.pending); clear_pathspec(&rev.prune_data); strbuf_release(&diff_output); - remove_path(stash_index_path.buf); return ret; } -- 2.20.1
This removes the second index used in stash_patch(). This function starts by resetting the index (which is set at `i_tree') to HEAD, which has been stored in `b_commit' by do_create_stash(), and the call to `read-tree' is replaced by reset_tree(). The index is discarded after run_add_interactive(), but not `diff-tree' as this command should not change it. Since the index has been changed, and subsequent code might be sensitive to this, it is reset to `i_tree' at the end of the function. Signed-off-by: Alban Gruin <alban.gruin@gmail.com> --- builtin/stash.c | 47 ++++++++++------------------------------------- 1 file changed, 10 insertions(+), 37 deletions(-) diff --git a/builtin/stash.c b/builtin/stash.c index 2535335275..eaeb7bc8c4 100644 --- a/builtin/stash.c +++ b/builtin/stash.c @@ -995,51 +995,24 @@ static int stash_patch(struct stash_info *info, const struct pathspec *ps, struct strbuf *out_patch, int quiet) { int ret = 0; - struct child_process cp_read_tree = CHILD_PROCESS_INIT; struct child_process cp_diff_tree = CHILD_PROCESS_INIT; - struct index_state istate = { NULL }; - char *old_index_env = NULL, *old_repo_index_file; - remove_path(stash_index_path.buf); - - cp_read_tree.git_cmd = 1; - argv_array_pushl(&cp_read_tree.args, "read-tree", "HEAD", NULL); - argv_array_pushf(&cp_read_tree.env_array, "GIT_INDEX_FILE=%s", - stash_index_path.buf); - if (run_command(&cp_read_tree)) { - ret = -1; - goto done; - } + if (reset_tree(&info->b_commit, 0, 1)) + return -1; /* Find out what the user wants. */ - old_repo_index_file = the_repository->index_file; - the_repository->index_file = stash_index_path.buf; - old_index_env = xstrdup_or_null(getenv(INDEX_ENVIRONMENT)); - setenv(INDEX_ENVIRONMENT, the_repository->index_file, 1); - ret = run_add_interactive(NULL, "--patch=stash", ps); - the_repository->index_file = old_repo_index_file; - if (old_index_env && *old_index_env) - setenv(INDEX_ENVIRONMENT, old_index_env, 1); - else - unsetenv(INDEX_ENVIRONMENT); - FREE_AND_NULL(old_index_env); - /* State of the working tree. */ - if (write_index_as_tree(&info->w_tree, &istate, stash_index_path.buf, 0, - NULL)) { - ret = -1; - goto done; - } + discard_cache(); + if (write_cache_as_tree(&info->w_tree, 0, NULL)) + return -1; cp_diff_tree.git_cmd = 1; argv_array_pushl(&cp_diff_tree.args, "diff-tree", "-p", "-U1", "HEAD", oid_to_hex(&info->w_tree), "--", NULL); - if (pipe_command(&cp_diff_tree, NULL, 0, out_patch, 0, NULL, 0)) { - ret = -1; - goto done; - } + if (pipe_command(&cp_diff_tree, NULL, 0, out_patch, 0, NULL, 0)) + return -1; if (!out_patch->len) { if (!quiet) @@ -1047,9 +1020,9 @@ static int stash_patch(struct stash_info *info, const struct pathspec *ps, ret = 1; } -done: - discard_index(&istate); - remove_path(stash_index_path.buf); + if (reset_tree(&info->i_tree, 0, 1)) + return -1; + return ret; } -- 2.20.1
This removes the second index used in save_untracked_files(). This functions creates a new commit with no parents, and a tree containing only untracked files, so the tree is reset to the empty tree at the beginning (hence the need for reset_tree() to accept constant trees). The environment of `update-index' is no longer updated, and the index is discarded after this command exits. As the only caller of this function is do_create_stash(), and the next user of the index is either save_untracked_files() or stash_patch(), which both want a different tree, the index is left as-is at the end. Signed-off-by: Alban Gruin <alban.gruin@gmail.com> --- builtin/stash.c | 39 +++++++++++++++++---------------------- 1 file changed, 17 insertions(+), 22 deletions(-) diff --git a/builtin/stash.c b/builtin/stash.c index eaeb7bc8c4..cbe37cd24b 100644 --- a/builtin/stash.c +++ b/builtin/stash.c @@ -954,41 +954,36 @@ static int check_changes(const struct pathspec *ps, int include_untracked, static int save_untracked_files(struct stash_info *info, struct strbuf *msg, struct strbuf files) { - int ret = 0; struct strbuf untracked_msg = STRBUF_INIT; struct child_process cp_upd_index = CHILD_PROCESS_INIT; - struct index_state istate = { NULL }; + + if (reset_tree(the_hash_algo->empty_tree, 0, 1)) + return -1; cp_upd_index.git_cmd = 1; argv_array_pushl(&cp_upd_index.args, "update-index", "-z", "--add", "--remove", "--stdin", NULL); - argv_array_pushf(&cp_upd_index.env_array, "GIT_INDEX_FILE=%s", - stash_index_path.buf); + + if (pipe_command(&cp_upd_index, files.buf, files.len, NULL, 0, + NULL, 0)) + return -1; + + discard_cache(); + if (write_cache_as_tree(&info->u_tree, 0, NULL)) + return -1; strbuf_addf(&untracked_msg, "untracked files on %s\n", msg->buf); - if (pipe_command(&cp_upd_index, files.buf, files.len, NULL, 0, - NULL, 0)) { - ret = -1; - goto done; - } - - if (write_index_as_tree(&info->u_tree, &istate, stash_index_path.buf, 0, - NULL)) { - ret = -1; - goto done; - } - if (commit_tree(untracked_msg.buf, untracked_msg.len, &info->u_tree, NULL, &info->u_commit, NULL, NULL)) { - ret = -1; - goto done; + strbuf_release(&untracked_msg); + return -1; } -done: - discard_index(&istate); + /* Do not reset the tree, as either stash_patch() or + * stash_working_tree() will do it. */ + strbuf_release(&untracked_msg); - remove_path(stash_index_path.buf); - return ret; + return 0; } static int stash_patch(struct stash_info *info, const struct pathspec *ps, -- 2.20.1
This removes the second index used in restore_untracked(). The call to `read-tree' is replaced by reset_tree() with the appropriate parameters (no update, no reset). The environment of `checkout-index' is no longer modified, and the cache is discarded when it exists. In do_apply_stash(), the changes are a bit more involved: to avoid conflicts with the merged index, restore_untracked() is moved after merge_recursive_generic(). This introduces another problem: the files that were untracked once are now added to the index, and update_index() would add back those files in the index. To avoid this, get_newly_staged() is moved before restore_untracked(). Signed-off-by: Alban Gruin <alban.gruin@gmail.com> --- builtin/stash.c | 47 +++++++++++++++++------------------------------ 1 file changed, 17 insertions(+), 30 deletions(-) diff --git a/builtin/stash.c b/builtin/stash.c index cbe37cd24b..d5077a27d9 100644 --- a/builtin/stash.c +++ b/builtin/stash.c @@ -359,29 +359,15 @@ static int restore_untracked(struct object_id *u_tree) int res; struct child_process cp = CHILD_PROCESS_INIT; - /* - * We need to run restore files from a given index, but without - * affecting the current index, so we use GIT_INDEX_FILE with - * run_command to fork processes that will not interfere. - */ - cp.git_cmd = 1; - argv_array_push(&cp.args, "read-tree"); - argv_array_push(&cp.args, oid_to_hex(u_tree)); - argv_array_pushf(&cp.env_array, "GIT_INDEX_FILE=%s", - stash_index_path.buf); - if (run_command(&cp)) { - remove_path(stash_index_path.buf); + if (reset_tree(u_tree, 0, 0)) return -1; - } - child_process_init(&cp); cp.git_cmd = 1; argv_array_pushl(&cp.args, "checkout-index", "--all", NULL); - argv_array_pushf(&cp.env_array, "GIT_INDEX_FILE=%s", - stash_index_path.buf); res = run_command(&cp); - remove_path(stash_index_path.buf); + discard_cache(); + return res; } @@ -395,6 +381,7 @@ static int do_apply_stash(const char *prefix, struct stash_info *info, struct object_id index_tree; struct commit *result; const struct object_id *bases[1]; + struct strbuf newly_staged = STRBUF_INIT; read_cache_preload(NULL); if (refresh_and_write_cache(REFRESH_QUIET, 0, 0)) @@ -433,9 +420,6 @@ static int do_apply_stash(const char *prefix, struct stash_info *info, } } - if (info->has_u && restore_untracked(&info->u_tree)) - return error(_("could not restore untracked files from stash")); - init_merge_options(&o, the_repository); o.branch1 = "Updated upstream"; @@ -463,24 +447,27 @@ static int do_apply_stash(const char *prefix, struct stash_info *info, return ret; } + if (!has_index && get_newly_staged(&newly_staged, &c_tree)) { + strbuf_release(&newly_staged); + return -1; + } + + if (info->has_u && restore_untracked(&info->u_tree)) { + strbuf_release(&newly_staged); + return error(_("could not restore untracked files from stash")); + } + if (has_index) { if (reset_tree(&index_tree, 0, 0)) return -1; } else { - struct strbuf out = STRBUF_INIT; - - if (get_newly_staged(&out, &c_tree)) { - strbuf_release(&out); - return -1; - } - if (reset_tree(&c_tree, 0, 1)) { - strbuf_release(&out); + strbuf_release(&newly_staged); return -1; } - ret = update_index(&out); - strbuf_release(&out); + ret = update_index(&newly_staged); + strbuf_release(&newly_staged); if (ret) return -1; -- 2.20.1
Since stash no longer uses a second index, `stash_index_path' is now unused, and can be dropped. Signed-off-by: Alban Gruin <alban.gruin@gmail.com> --- builtin/stash.c | 7 ------- 1 file changed, 7 deletions(-) diff --git a/builtin/stash.c b/builtin/stash.c index d5077a27d9..f1b3c0d2f8 100644 --- a/builtin/stash.c +++ b/builtin/stash.c @@ -88,7 +88,6 @@ static const char * const git_stash_save_usage[] = { }; static const char *ref_stash = "refs/stash"; -static struct strbuf stash_index_path = STRBUF_INIT; /* * w_commit is set to the commit containing the working tree @@ -1505,8 +1504,6 @@ static int save_stash(int argc, const char **argv, const char *prefix) int cmd_stash(int argc, const char **argv, const char *prefix) { - pid_t pid = getpid(); - const char *index_file; struct argv_array args = ARGV_ARRAY_INIT; struct option options[] = { @@ -1523,10 +1520,6 @@ int cmd_stash(int argc, const char **argv, const char *prefix) argc = parse_options(argc, argv, prefix, options, git_stash_usage, PARSE_OPT_KEEP_UNKNOWN | PARSE_OPT_KEEP_DASHDASH); - index_file = get_index_file(); - strbuf_addf(&stash_index_path, "%s.stash.%" PRIuMAX, index_file, - (uintmax_t)pid); - if (!argc) return !!push_stash(0, NULL, prefix, 0); else if (!strcmp(argv[0], "apply")) -- 2.20.1
On Tue, Jun 30, 2020 at 5:16 PM Alban Gruin <alban.gruin@gmail.com> wrote:
>
> This removes the second index used in restore_untracked().
>
> The call to `read-tree' is replaced by reset_tree() with the appropriate
> parameters (no update, no reset). The environment of `checkout-index'
> is no longer modified, and the cache is discarded when it exists.
Do you mean "when it exits" or "when it exists"? It could be confusing
as you use "exit" not "exist" in 2 previous commits.
On Tue, Jun 30, 2020 at 5:16 PM Alban Gruin <alban.gruin@gmail.com> wrote: > > The old scripted `git stash' used to create a second index to save > modified and untracked files, and restore untracked files, without > affecting the main index. This behaviour was carried on when it was > rewritten in C, and here, most operations performed on the second index > are done by forked commands (ie. `read-tree' instead of reset_tree(), > etc.). > > The goal of this series is to modernise (a bit) builtin/stash.c. This patch series looks good to me. I found only a small nit or typo in the commit message of patch 5/6. > Originally, this series was also meant to fix a bug reported by Son > Luong Ngoc [0], but as emphasized by Gábor [1], the issue is not limited > to `git stash', so this series is not a good fix for this particular > issue. > > This series is based on a08a83db2b (The sixth batch, 2020-06-29). It seems to apply without conflicts on top of current master. > Changes since v1: > > - Lots of rewording, following comments from Christian Couder and Son > Luong Ngoc. > > - Removed a useless function call. [...] > builtin/stash.c | 156 +++++++++++++++--------------------------------- > 1 file changed, 48 insertions(+), 108 deletions(-) I like very much how it simplifies a lot of things. Thanks, Christian.
Hi Christian,
Le 31/07/2020 à 15:45, Christian Couder a écrit :
> On Tue, Jun 30, 2020 at 5:16 PM Alban Gruin <alban.gruin@gmail.com> wrote:
>>
>> This removes the second index used in restore_untracked().
>>
>> The call to `read-tree' is replaced by reset_tree() with the appropriate
>> parameters (no update, no reset). The environment of `checkout-index'
>> is no longer modified, and the cache is discarded when it exists.
>
> Do you mean "when it exits" or "when it exists"? It could be confusing
> as you use "exit" not "exist" in 2 previous commits.
>
I mean "when it exits", as in the previous commits.
Alban
The old scripted `git stash' used to create a second index to save modified and untracked files, and restore untracked files, without affecting the main index. This behaviour was carried on when it was rewritten in C, and here, most operations performed on the second index are done by forked commands (ie. `read-tree' instead of reset_tree(), etc.). The goal of this series is to modernise (a bit) builtin/stash.c. Originally, this series was also meant to fix a bug reported by Son Luong Ngoc [0], but as emphasized by Gábor [1], the issue is not limited to `git stash', so this series is not a good fix for this particular issue. This series is based on a08a83db2b (The sixth batch, 2020-06-29), and it applies cleanly on top of the current master (e8ab941b67 (The second batch -- mostly minor typofixes, 2020-07-30)). The tip of this series is tagged as "stash-remove-second-index-v3" at https://github.com/agrn/git. [0] https://lore.kernel.org/git/EED2CFF1-5BEF-429D-AB99-AD148A867614@gmail.com/ [1] https://lore.kernel.org/git/20200615152715.GD2898@szeder.dev/ Changes since v2: - Typofix in the fifth patch, noticed by Christian Couder. Alban Gruin (6): stash: mark `i_tree' in reset_tree() const stash: remove the second index in stash_working_tree() stash: remove the second index in stash_patch() stash: remove the second index in save_untracked_files() stash: remove the second index in restore_untracked() stash: remove `stash_index_path' builtin/stash.c | 156 +++++++++++++++--------------------------------- 1 file changed, 48 insertions(+), 108 deletions(-) Range-diff against v2: 1: 598f03e76e = 1: 598f03e76e stash: mark `i_tree' in reset_tree() const 2: 7265cfe65c = 2: 7265cfe65c stash: remove the second index in stash_working_tree() 3: b5587c0d56 = 3: b5587c0d56 stash: remove the second index in stash_patch() 4: 5fbcddf899 = 4: 5fbcddf899 stash: remove the second index in save_untracked_files() 5: 0338986339 ! 5: a12a923014 stash: remove the second index in restore_untracked() @@ -6,7 +6,7 @@ The call to `read-tree' is replaced by reset_tree() with the appropriate parameters (no update, no reset). The environment of `checkout-index' - is no longer modified, and the cache is discarded when it exists. + is no longer modified, and the cache is discarded when it exits. In do_apply_stash(), the changes are a bit more involved: to avoid conflicts with the merged index, restore_untracked() is moved after 6: 4f151f4600 = 6: dd25e71f35 stash: remove `stash_index_path' -- 2.20.1
In reset_tree(), the value pointed by `i_tree' is not modified. In a latter commit, it will be provided with `the_hash_algo->empty_tree', which is a constant. Hence, this changes `i_tree' to be a constant. Signed-off-by: Alban Gruin <alban.gruin@gmail.com> --- builtin/stash.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/builtin/stash.c b/builtin/stash.c index 0c52a3b849..9baa8b379e 100644 --- a/builtin/stash.c +++ b/builtin/stash.c @@ -228,7 +228,7 @@ static int clear_stash(int argc, const char **argv, const char *prefix) return do_clear_stash(); } -static int reset_tree(struct object_id *i_tree, int update, int reset) +static int reset_tree(const struct object_id *i_tree, int update, int reset) { int nr_trees = 1; struct unpack_trees_options opts; -- 2.20.1
This removes the second index used in stash_working_tree() to simplify the code. The calls to set_alternative_index_output() are dropped to extract `i_tree' to the main index, and `GIT_INDEX_FILE' is no longer set before starting `update-index'. When it exits, the index has changed, and must be discarded. With this commit, reset_tree() does not need to be called at the beginning of stash_working_tree(), because it is only called by do_create_stash(), which sets the index at `i_tree', and save_untracked_files() does not change the main index. But it will become useful again in a later commit, when save_untracked_file() will be rewritten to use the "main" index, so I did not remove it. At the end of the function, the tree is reset to `i_tree'. Signed-off-by: Alban Gruin <alban.gruin@gmail.com> --- builtin/stash.c | 14 +++----------- 1 file changed, 3 insertions(+), 11 deletions(-) diff --git a/builtin/stash.c b/builtin/stash.c index 9baa8b379e..2535335275 100644 --- a/builtin/stash.c +++ b/builtin/stash.c @@ -1059,17 +1059,14 @@ static int stash_working_tree(struct stash_info *info, const struct pathspec *ps struct rev_info rev; struct child_process cp_upd_index = CHILD_PROCESS_INIT; struct strbuf diff_output = STRBUF_INIT; - struct index_state istate = { NULL }; init_revisions(&rev, NULL); copy_pathspec(&rev.prune_data, ps); - set_alternate_index_output(stash_index_path.buf); if (reset_tree(&info->i_tree, 0, 0)) { ret = -1; goto done; } - set_alternate_index_output(NULL); rev.diffopt.output_format = DIFF_FORMAT_CALLBACK; rev.diffopt.format_callback = add_diff_to_buf; @@ -1091,8 +1088,6 @@ static int stash_working_tree(struct stash_info *info, const struct pathspec *ps argv_array_pushl(&cp_upd_index.args, "update-index", "--ignore-skip-worktree-entries", "-z", "--add", "--remove", "--stdin", NULL); - argv_array_pushf(&cp_upd_index.env_array, "GIT_INDEX_FILE=%s", - stash_index_path.buf); if (pipe_command(&cp_upd_index, diff_output.buf, diff_output.len, NULL, 0, NULL, 0)) { @@ -1100,19 +1095,16 @@ static int stash_working_tree(struct stash_info *info, const struct pathspec *ps goto done; } - if (write_index_as_tree(&info->w_tree, &istate, stash_index_path.buf, 0, - NULL)) { + discard_cache(); + if (write_cache_as_tree(&info->w_tree, 0, NULL) || + reset_tree(&info->i_tree, 0, 1)) ret = -1; - goto done; - } done: - discard_index(&istate); UNLEAK(rev); object_array_clear(&rev.pending); clear_pathspec(&rev.prune_data); strbuf_release(&diff_output); - remove_path(stash_index_path.buf); return ret; } -- 2.20.1
This removes the second index used in stash_patch(). This function starts by resetting the index (which is set at `i_tree') to HEAD, which has been stored in `b_commit' by do_create_stash(), and the call to `read-tree' is replaced by reset_tree(). The index is discarded after run_add_interactive(), but not `diff-tree' as this command should not change it. Since the index has been changed, and subsequent code might be sensitive to this, it is reset to `i_tree' at the end of the function. Signed-off-by: Alban Gruin <alban.gruin@gmail.com> --- builtin/stash.c | 47 ++++++++++------------------------------------- 1 file changed, 10 insertions(+), 37 deletions(-) diff --git a/builtin/stash.c b/builtin/stash.c index 2535335275..eaeb7bc8c4 100644 --- a/builtin/stash.c +++ b/builtin/stash.c @@ -995,51 +995,24 @@ static int stash_patch(struct stash_info *info, const struct pathspec *ps, struct strbuf *out_patch, int quiet) { int ret = 0; - struct child_process cp_read_tree = CHILD_PROCESS_INIT; struct child_process cp_diff_tree = CHILD_PROCESS_INIT; - struct index_state istate = { NULL }; - char *old_index_env = NULL, *old_repo_index_file; - remove_path(stash_index_path.buf); - - cp_read_tree.git_cmd = 1; - argv_array_pushl(&cp_read_tree.args, "read-tree", "HEAD", NULL); - argv_array_pushf(&cp_read_tree.env_array, "GIT_INDEX_FILE=%s", - stash_index_path.buf); - if (run_command(&cp_read_tree)) { - ret = -1; - goto done; - } + if (reset_tree(&info->b_commit, 0, 1)) + return -1; /* Find out what the user wants. */ - old_repo_index_file = the_repository->index_file; - the_repository->index_file = stash_index_path.buf; - old_index_env = xstrdup_or_null(getenv(INDEX_ENVIRONMENT)); - setenv(INDEX_ENVIRONMENT, the_repository->index_file, 1); - ret = run_add_interactive(NULL, "--patch=stash", ps); - the_repository->index_file = old_repo_index_file; - if (old_index_env && *old_index_env) - setenv(INDEX_ENVIRONMENT, old_index_env, 1); - else - unsetenv(INDEX_ENVIRONMENT); - FREE_AND_NULL(old_index_env); - /* State of the working tree. */ - if (write_index_as_tree(&info->w_tree, &istate, stash_index_path.buf, 0, - NULL)) { - ret = -1; - goto done; - } + discard_cache(); + if (write_cache_as_tree(&info->w_tree, 0, NULL)) + return -1; cp_diff_tree.git_cmd = 1; argv_array_pushl(&cp_diff_tree.args, "diff-tree", "-p", "-U1", "HEAD", oid_to_hex(&info->w_tree), "--", NULL); - if (pipe_command(&cp_diff_tree, NULL, 0, out_patch, 0, NULL, 0)) { - ret = -1; - goto done; - } + if (pipe_command(&cp_diff_tree, NULL, 0, out_patch, 0, NULL, 0)) + return -1; if (!out_patch->len) { if (!quiet) @@ -1047,9 +1020,9 @@ static int stash_patch(struct stash_info *info, const struct pathspec *ps, ret = 1; } -done: - discard_index(&istate); - remove_path(stash_index_path.buf); + if (reset_tree(&info->i_tree, 0, 1)) + return -1; + return ret; } -- 2.20.1
This removes the second index used in save_untracked_files(). This functions creates a new commit with no parents, and a tree containing only untracked files, so the tree is reset to the empty tree at the beginning (hence the need for reset_tree() to accept constant trees). The environment of `update-index' is no longer updated, and the index is discarded after this command exits. As the only caller of this function is do_create_stash(), and the next user of the index is either save_untracked_files() or stash_patch(), which both want a different tree, the index is left as-is at the end. Signed-off-by: Alban Gruin <alban.gruin@gmail.com> --- builtin/stash.c | 39 +++++++++++++++++---------------------- 1 file changed, 17 insertions(+), 22 deletions(-) diff --git a/builtin/stash.c b/builtin/stash.c index eaeb7bc8c4..cbe37cd24b 100644 --- a/builtin/stash.c +++ b/builtin/stash.c @@ -954,41 +954,36 @@ static int check_changes(const struct pathspec *ps, int include_untracked, static int save_untracked_files(struct stash_info *info, struct strbuf *msg, struct strbuf files) { - int ret = 0; struct strbuf untracked_msg = STRBUF_INIT; struct child_process cp_upd_index = CHILD_PROCESS_INIT; - struct index_state istate = { NULL }; + + if (reset_tree(the_hash_algo->empty_tree, 0, 1)) + return -1; cp_upd_index.git_cmd = 1; argv_array_pushl(&cp_upd_index.args, "update-index", "-z", "--add", "--remove", "--stdin", NULL); - argv_array_pushf(&cp_upd_index.env_array, "GIT_INDEX_FILE=%s", - stash_index_path.buf); + + if (pipe_command(&cp_upd_index, files.buf, files.len, NULL, 0, + NULL, 0)) + return -1; + + discard_cache(); + if (write_cache_as_tree(&info->u_tree, 0, NULL)) + return -1; strbuf_addf(&untracked_msg, "untracked files on %s\n", msg->buf); - if (pipe_command(&cp_upd_index, files.buf, files.len, NULL, 0, - NULL, 0)) { - ret = -1; - goto done; - } - - if (write_index_as_tree(&info->u_tree, &istate, stash_index_path.buf, 0, - NULL)) { - ret = -1; - goto done; - } - if (commit_tree(untracked_msg.buf, untracked_msg.len, &info->u_tree, NULL, &info->u_commit, NULL, NULL)) { - ret = -1; - goto done; + strbuf_release(&untracked_msg); + return -1; } -done: - discard_index(&istate); + /* Do not reset the tree, as either stash_patch() or + * stash_working_tree() will do it. */ + strbuf_release(&untracked_msg); - remove_path(stash_index_path.buf); - return ret; + return 0; } static int stash_patch(struct stash_info *info, const struct pathspec *ps, -- 2.20.1
This removes the second index used in restore_untracked(). The call to `read-tree' is replaced by reset_tree() with the appropriate parameters (no update, no reset). The environment of `checkout-index' is no longer modified, and the cache is discarded when it exits. In do_apply_stash(), the changes are a bit more involved: to avoid conflicts with the merged index, restore_untracked() is moved after merge_recursive_generic(). This introduces another problem: the files that were untracked once are now added to the index, and update_index() would add back those files in the index. To avoid this, get_newly_staged() is moved before restore_untracked(). Signed-off-by: Alban Gruin <alban.gruin@gmail.com> --- builtin/stash.c | 47 +++++++++++++++++------------------------------ 1 file changed, 17 insertions(+), 30 deletions(-) diff --git a/builtin/stash.c b/builtin/stash.c index cbe37cd24b..d5077a27d9 100644 --- a/builtin/stash.c +++ b/builtin/stash.c @@ -359,29 +359,15 @@ static int restore_untracked(struct object_id *u_tree) int res; struct child_process cp = CHILD_PROCESS_INIT; - /* - * We need to run restore files from a given index, but without - * affecting the current index, so we use GIT_INDEX_FILE with - * run_command to fork processes that will not interfere. - */ - cp.git_cmd = 1; - argv_array_push(&cp.args, "read-tree"); - argv_array_push(&cp.args, oid_to_hex(u_tree)); - argv_array_pushf(&cp.env_array, "GIT_INDEX_FILE=%s", - stash_index_path.buf); - if (run_command(&cp)) { - remove_path(stash_index_path.buf); + if (reset_tree(u_tree, 0, 0)) return -1; - } - child_process_init(&cp); cp.git_cmd = 1; argv_array_pushl(&cp.args, "checkout-index", "--all", NULL); - argv_array_pushf(&cp.env_array, "GIT_INDEX_FILE=%s", - stash_index_path.buf); res = run_command(&cp); - remove_path(stash_index_path.buf); + discard_cache(); + return res; } @@ -395,6 +381,7 @@ static int do_apply_stash(const char *prefix, struct stash_info *info, struct object_id index_tree; struct commit *result; const struct object_id *bases[1]; + struct strbuf newly_staged = STRBUF_INIT; read_cache_preload(NULL); if (refresh_and_write_cache(REFRESH_QUIET, 0, 0)) @@ -433,9 +420,6 @@ static int do_apply_stash(const char *prefix, struct stash_info *info, } } - if (info->has_u && restore_untracked(&info->u_tree)) - return error(_("could not restore untracked files from stash")); - init_merge_options(&o, the_repository); o.branch1 = "Updated upstream"; @@ -463,24 +447,27 @@ static int do_apply_stash(const char *prefix, struct stash_info *info, return ret; } + if (!has_index && get_newly_staged(&newly_staged, &c_tree)) { + strbuf_release(&newly_staged); + return -1; + } + + if (info->has_u && restore_untracked(&info->u_tree)) { + strbuf_release(&newly_staged); + return error(_("could not restore untracked files from stash")); + } + if (has_index) { if (reset_tree(&index_tree, 0, 0)) return -1; } else { - struct strbuf out = STRBUF_INIT; - - if (get_newly_staged(&out, &c_tree)) { - strbuf_release(&out); - return -1; - } - if (reset_tree(&c_tree, 0, 1)) { - strbuf_release(&out); + strbuf_release(&newly_staged); return -1; } - ret = update_index(&out); - strbuf_release(&out); + ret = update_index(&newly_staged); + strbuf_release(&newly_staged); if (ret) return -1; -- 2.20.1
Since stash no longer uses a second index, `stash_index_path' is now unused, and can be dropped. Signed-off-by: Alban Gruin <alban.gruin@gmail.com> --- builtin/stash.c | 7 ------- 1 file changed, 7 deletions(-) diff --git a/builtin/stash.c b/builtin/stash.c index d5077a27d9..f1b3c0d2f8 100644 --- a/builtin/stash.c +++ b/builtin/stash.c @@ -88,7 +88,6 @@ static const char * const git_stash_save_usage[] = { }; static const char *ref_stash = "refs/stash"; -static struct strbuf stash_index_path = STRBUF_INIT; /* * w_commit is set to the commit containing the working tree @@ -1505,8 +1504,6 @@ static int save_stash(int argc, const char **argv, const char *prefix) int cmd_stash(int argc, const char **argv, const char *prefix) { - pid_t pid = getpid(); - const char *index_file; struct argv_array args = ARGV_ARRAY_INIT; struct option options[] = { @@ -1523,10 +1520,6 @@ int cmd_stash(int argc, const char **argv, const char *prefix) argc = parse_options(argc, argv, prefix, options, git_stash_usage, PARSE_OPT_KEEP_UNKNOWN | PARSE_OPT_KEEP_DASHDASH); - index_file = get_index_file(); - strbuf_addf(&stash_index_path, "%s.stash.%" PRIuMAX, index_file, - (uintmax_t)pid); - if (!argc) return !!push_stash(0, NULL, prefix, 0); else if (!strcmp(argv[0], "apply")) -- 2.20.1
Alban Gruin <alban.gruin@gmail.com> writes: > The old scripted `git stash' used to create a second index to save > modified and untracked files, and restore untracked files, without > affecting the main index. This behaviour was carried on when it was > rewritten in C, and here, most operations performed on the second index > are done by forked commands (ie. `read-tree' instead of reset_tree(), > etc.). Does the "second index" in the title refer to the on-disk $TMPindex in https://github.com/git/git/blob/ffac537e6c/git-stash.sh#L147 that is used to create a tree object $u_tree (and similarly for the working tree files $w_tree)? > The goal of this series is to modernise (a bit) builtin/stash.c. Modernise in what way is quite unclear. With the internal API we have available from C code, we can create a tree object from an in-core index without writing the in-core index out to an on-disk file that is different from the main on-disk index file, and I suspect, from the "drop usage of" in the title, that it is what this series is trying to do, but the description could have been written in a way that is more helpful to readers to understand it without having to guess. It made me wonder if you are not even using the secondary in-core index and no longer writing the tree to record the untracked paths and their contents, but obviously such a patch would not work well ;-)
Alban Gruin <alban.gruin@gmail.com> writes: > This removes the second index used in stash_working_tree() to simplify > the code. Continuing what I said for [0/6], say "the second index file" to clarify the distinction between the in-core index and on-disk index files to avoid confusion. > init_revisions(&rev, NULL); > copy_pathspec(&rev.prune_data, ps); > > - set_alternate_index_output(stash_index_path.buf); > if (reset_tree(&info->i_tree, 0, 0)) { > ret = -1; > goto done; > } > - set_alternate_index_output(NULL); Hmph. So at this point i_tree is what is in the index, we reset the working tree to it with reset_tree(), and instead of writing to $TMPindex, we let reset_tree() clobber the main on-disk index but we do not care because the result is supposed to be the same as what was originally in the index anyway? > @@ -1091,8 +1088,6 @@ static int stash_working_tree(struct stash_info *info, const struct pathspec *ps > argv_array_pushl(&cp_upd_index.args, "update-index", > "--ignore-skip-worktree-entries", > "-z", "--add", "--remove", "--stdin", NULL); > - argv_array_pushf(&cp_upd_index.env_array, "GIT_INDEX_FILE=%s", > - stash_index_path.buf); > > if (pipe_command(&cp_upd_index, diff_output.buf, diff_output.len, > NULL, 0, NULL, 0)) { And then the new code now lets "update-index" work directly on the main index (which does make an observable difference to the outside world, but we are not letting any hook to look at this intermediate state, so it might be OK---I cannot tell at this point in the code). > @@ -1100,19 +1095,16 @@ static int stash_working_tree(struct stash_info *info, const struct pathspec *ps > goto done; > } > > - if (write_index_as_tree(&info->w_tree, &istate, stash_index_path.buf, 0, > - NULL)) { > + discard_cache(); > + if (write_cache_as_tree(&info->w_tree, 0, NULL) || > + reset_tree(&info->i_tree, 0, 1)) We used to read from $TMPindex, which has been updated with the contents of files modified in the working tree, and write its content out as a tree object, grabbing its object name into w_tree. The new code instead writes out the same tree from the main index, which has been clobbered with the working tree changes that the user hasn't added to the index. Because of that, we need to discard these changes and re-read the in-core index out of i_tree with reset_tree(). So, this change makes one new call to reset_tree() that scans and updates working tree files that are modified? How expensive is that? It's not like this change trades cost to write out a temporary index file with the cost of having to call reset_tree() one more time---as far as I can see, the new code writes to on-disk index file the same time as the original code. How is the use of second on-disk index hurting us? I must be missing something obvious, but at this point, it is not clear what we are gaining---I only see downsides.
Alban Gruin <alban.gruin@gmail.com> writes: > In reset_tree(), the value pointed by `i_tree' is not modified. In a This helper function is about "reset the working tree and the index to match this given tree object" and it is clear that it is an input-only parameter, so that alone is a good justification for code hygiene. Good. > latter commit, it will be provided with `the_hash_algo->empty_tree', > which is a constant. Hence, this changes `i_tree' to be a constant. > > Signed-off-by: Alban Gruin <alban.gruin@gmail.com> > --- > builtin/stash.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/builtin/stash.c b/builtin/stash.c > index 0c52a3b849..9baa8b379e 100644 > --- a/builtin/stash.c > +++ b/builtin/stash.c > @@ -228,7 +228,7 @@ static int clear_stash(int argc, const char **argv, const char *prefix) > return do_clear_stash(); > } > > -static int reset_tree(struct object_id *i_tree, int update, int reset) > +static int reset_tree(const struct object_id *i_tree, int update, int reset) > { > int nr_trees = 1; > struct unpack_trees_options opts;
Junio C Hamano <gitster@pobox.com> writes: >> @@ -1091,8 +1088,6 @@ static int stash_working_tree(struct stash_info *info, const struct pathspec *ps >> argv_array_pushl(&cp_upd_index.args, "update-index", >> "--ignore-skip-worktree-entries", >> "-z", "--add", "--remove", "--stdin", NULL); >> - argv_array_pushf(&cp_upd_index.env_array, "GIT_INDEX_FILE=%s", >> - stash_index_path.buf); >> >> if (pipe_command(&cp_upd_index, diff_output.buf, diff_output.len, >> NULL, 0, NULL, 0)) { > > And then the new code now lets "update-index" work directly on the > main index (which does make an observable difference to the outside > world, but we are not letting any hook to look at this intermediate > state, so it might be OK---I cannot tell at this point in the code). This changes the behaviour of the command a lot when a fatal happens anywhere after this point til we reset the index to match the HEAD, doesn't it, I wonder. The original code refrained from touching the index so after such a "fatal:" error message, the user would have seen the original state (with previous "git add" etc. the user did) in the index. New code now added all the working tree changes to the index, so the user would lose the work made so far in the main index, no? With later patch in the series applied, we will end up having even untracked paths in the main index---I am not sure to what extent that step would make things even worse, but it cannot be harmless. > How is the use of second on-disk index hurting us? I must be > missing something obvious, but at this point, it is not clear what > we are gaining---I only see downsides.