This series integrates the sparse index with git reset and provides
miscellaneous fixes and improvements to the command in sparse checkouts.
This includes:

 1. tests added to t1092 and p2000 to establish the baseline functionality
    of the command
 2. repository settings to enable the sparse index with ensure_full_index
    guarding any code paths that break tests without other compatibility
    updates.
 3. modifications to remove or reduce the scope in which ensure_full_index
    must be called.

The sparse index updates are predicated on a fix originating from the
microsoft/git fork [1], correcting how git reset --mixed handles resetting
entries outside the sparse checkout definition. Additionally, a performance
"bug" in next_cache_entry with sparse index is corrected, preventing
repeatedly looping over already-searched entries.

The p2000 tests demonstrate an overall ~70% execution time reduction across
all tested usages of git reset using a sparse index:

Test                                               before   after
------------------------------------------------------------------------
2000.22: git reset (full-v3)                       0.48     0.51 +6.3%
2000.23: git reset (full-v4)                       0.47     0.50 +6.4%
2000.24: git reset (sparse-v3)                     0.93     0.30 -67.7%
2000.25: git reset (sparse-v4)                     0.94     0.29 -69.1%
2000.26: git reset --hard (full-v3)                0.69     0.68 -1.4%
2000.27: git reset --hard (full-v4)                0.75     0.68 -9.3%
2000.28: git reset --hard (sparse-v3)              1.29     0.34 -73.6%
2000.29: git reset --hard (sparse-v4)              1.31     0.34 -74.0%
2000.30: git reset -- does-not-exist (full-v3)     0.54     0.51 -5.6%
2000.31: git reset -- does-not-exist (full-v4)     0.54     0.52 -3.7%
2000.32: git reset -- does-not-exist (sparse-v3)   1.02     0.31 -69.6%
2000.33: git reset -- does-not-exist (sparse-v4)   1.07     0.30 -72.0%

[1] microsoft@6b8a074

Thanks!
-Victoria

Kevin Willford (1):
  reset: behave correctly with sparse-checkout

Victoria Dye (6):
  sparse-index: update command for expand/collapse test
  reset: expand test coverage for sparse checkouts
  reset: integrate with sparse index
  reset: make sparse-aware (except --mixed)
  reset: make --mixed sparse-aware
  unpack-trees: improve performance of next_cache_entry

 builtin/reset.c                          |  77 ++++++++++++-
 cache-tree.c                             |  43 ++++++-
 cache.h                                  |  10 ++
 read-cache.c                             |  22 ++--
 t/perf/p2000-sparse-operations.sh        |   3 +
 t/t1092-sparse-checkout-compatibility.sh | 139 ++++++++++++++++++++++-
 t/t7114-reset-sparse-checkout.sh         |  61 ++++++++++
 unpack-trees.c                           |  23 +++-
 8 files changed, 353 insertions(+), 25 deletions(-)
 create mode 100755 t/t7114-reset-sparse-checkout.sh

base-commit: cefe983a320c03d7843ac78e73bd513a27806845
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1048%2Fvdye%2Fvdye%2Fsparse-index-part1-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1048/vdye/vdye/sparse-index-part1-v1
Pull-Request: https://github.com/gitgitgadget/git/pull/1048
-- 
gitgitgadget
From: Kevin Willford <kewillf@microsoft.com>

When using the sparse checkout feature, 'git reset' will add entries to
the index that have the skip-worktree bit off, but will leave the
working directory empty. File data is lost because the index version of
the files has been changed while nothing exists in the working
directory. This causes the next 'git status' call to show files that
were modified or deleted as deleted, and to show nothing at all for
files that were added. The added files should be shown as untracked and
the modified files should be shown as modified.

To fix this, while the reset is running, if a file is absent from the
working directory and it either will be missing with the new index entry
or was not missing in the previous version, we create the previous index
version of the file in the working directory so that status will report
correctly and the files will be available for the user to deal with.

This fixes a documented failure from t1092 that was created in 19a0acc
(t1092: test interesting sparse-checkout scenarios, 2021-01-23).

Signed-off-by: Kevin Willford <kewillf@microsoft.com> Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Victoria Dye <vdye@github.com> --- builtin/reset.c | 39 +++++++++++++++ t/t1092-sparse-checkout-compatibility.sh | 4 +- t/t7114-reset-sparse-checkout.sh | 61 ++++++++++++++++++++++++ 3 files changed, 101 insertions(+), 3 deletions(-) create mode 100755 t/t7114-reset-sparse-checkout.sh diff --git a/builtin/reset.c b/builtin/reset.c index 51c9e2f43ff..8ffcd713720 100644 --- a/builtin/reset.c +++ b/builtin/reset.c @@ -25,6 +25,8 @@ #include "cache-tree.h" #include "submodule.h" #include "submodule-config.h" +#include "dir.h" +#include "entry.h" #define REFRESH_INDEX_DELAY_WARNING_IN_MS (2 * 1000) @@ -127,12 +129,49 @@ static void update_index_from_diff(struct diff_queue_struct *q, struct diff_options *opt, void *data) { int i; + int pos; int intent_to_add = *(int *)data; for (i = 0; i < q->nr; i++) { struct diff_filespec *one = q->queue[i]->one; + struct diff_filespec *two = q->queue[i]->two; int is_missing = !(one->mode && !is_null_oid(&one->oid)); + int was_missing = !two->mode && is_null_oid(&two->oid); struct cache_entry *ce; + struct cache_entry *ce_before; + struct checkout state = CHECKOUT_INIT; + + /* + * When using the sparse-checkout feature the cache entries + * that are added here will not have the skip-worktree bit + * set. Without this code there is data that is lost because + * the files that would normally be in the working directory + * are not there and show as deleted for the next status. + * In the case of added files, they just disappear. + * + * We need to create the previous version of the files in + * the working directory so that they will have the right + * content and the next status call will show modified or + * untracked files correctly. 
+ */ + if (core_apply_sparse_checkout && !file_exists(two->path)) { + pos = cache_name_pos(two->path, strlen(two->path)); + if ((pos >= 0 && ce_skip_worktree(active_cache[pos])) && + (is_missing || !was_missing)) { + state.force = 1; + state.refresh_cache = 1; + state.istate = &the_index; + + ce_before = make_cache_entry(&the_index, two->mode, + &two->oid, two->path, + 0, 0); + if (!ce_before) + die(_("make_cache_entry failed for path '%s'"), + two->path); + + checkout_entry(ce_before, &state, NULL, NULL); + } + } if (is_missing && !intent_to_add) { remove_file_from_cache(one->path); diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh index 886e78715fe..c5977152661 100755 --- a/t/t1092-sparse-checkout-compatibility.sh +++ b/t/t1092-sparse-checkout-compatibility.sh @@ -459,9 +459,7 @@ test_expect_failure 'blame with pathspec outside sparse definition' ' test_all_match git blame deep/deeper2/deepest/a ' -# NEEDSWORK: a sparse-checkout behaves differently from a full checkout -# in this scenario, but it shouldn't. -test_expect_failure 'checkout and reset (mixed)' ' +test_expect_success 'checkout and reset (mixed)' ' init_repos && test_all_match git checkout -b reset-test update-deep && diff --git a/t/t7114-reset-sparse-checkout.sh b/t/t7114-reset-sparse-checkout.sh new file mode 100755 index 00000000000..a8029707fb1 --- /dev/null +++ b/t/t7114-reset-sparse-checkout.sh @@ -0,0 +1,61 @@ +#!/bin/sh + +test_description='reset when using a sparse-checkout' + +. ./test-lib.sh + +test_expect_success 'setup' ' + test_tick && + echo "checkout file" >c && + echo "modify file" >m && + echo "delete file" >d && + git add . && + git commit -m "initial commit" && + echo "added file" >a && + echo "modification of a file" >m && + git rm d && + git add . 
&& + git commit -m "second commit" && + git checkout -b endCommit +' + +test_expect_success 'reset when there is a sparse-checkout' ' + echo "/c" >.git/info/sparse-checkout && + test_config core.sparsecheckout true && + git checkout -B resetBranch && + test_path_is_missing m && + test_path_is_missing a && + test_path_is_missing d && + git reset HEAD~1 && + echo "checkout file" >expect && + test_cmp expect c && + echo "added file" >expect && + test_cmp expect a && + echo "modification of a file" >expect && + test_cmp expect m && + test_path_is_missing d +' + +test_expect_success 'reset after deleting file without skip-worktree bit' ' + git checkout -f endCommit && + git clean -xdf && + cat >.git/info/sparse-checkout <<-\EOF && + /c + /m + EOF + test_config core.sparsecheckout true && + git checkout -B resetAfterDelete && + test_path_is_file m && + test_path_is_missing a && + test_path_is_missing d && + rm -f m && + git reset HEAD~1 && + echo "checkout file" >expect && + test_cmp expect c && + echo "added file" >expect && + test_cmp expect a && + test_path_is_missing m && + test_path_is_missing d +' + +test_done -- gitgitgadget
From: Victoria Dye <vdye@github.com>

In anticipation of multiple commands being fully integrated with sparse
index, update the test for index expand and collapse for non-sparse index
integrated commands to use `mv`.

Signed-off-by: Victoria Dye <vdye@github.com>
---
 t/t1092-sparse-checkout-compatibility.sh | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
index c5977152661..aed8683e629 100755
--- a/t/t1092-sparse-checkout-compatibility.sh
+++ b/t/t1092-sparse-checkout-compatibility.sh
@@ -642,7 +642,7 @@ test_expect_success 'sparse-index is expanded and converted back' '
 	init_repos &&
 
 	GIT_TRACE2_EVENT="$(pwd)/trace2.txt" GIT_TRACE2_EVENT_NESTING=10 \
-		git -C sparse-index -c core.fsmonitor="" reset --hard &&
+		git -C sparse-index -c core.fsmonitor="" mv a b &&
 	test_region index convert_to_sparse trace2.txt &&
 	test_region index ensure_full_index trace2.txt
 '
-- 
gitgitgadget
From: Victoria Dye <vdye@github.com> Add new tests for `--merge` and `--keep` modes, as well as mixed reset with pathspecs both inside and outside of the sparse checkout definition. New performance test cases exercise various execution paths for `reset`. Co-authored-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Victoria Dye <vdye@github.com> --- t/perf/p2000-sparse-operations.sh | 3 + t/t1092-sparse-checkout-compatibility.sh | 107 +++++++++++++++++++++++ 2 files changed, 110 insertions(+) diff --git a/t/perf/p2000-sparse-operations.sh b/t/perf/p2000-sparse-operations.sh index 597626276fb..bfd332120c8 100755 --- a/t/perf/p2000-sparse-operations.sh +++ b/t/perf/p2000-sparse-operations.sh @@ -110,5 +110,8 @@ test_perf_on_all git add -A test_perf_on_all git add . test_perf_on_all git commit -a -m A test_perf_on_all git checkout -f - +test_perf_on_all git reset +test_perf_on_all git reset --hard +test_perf_on_all git reset -- does-not-exist test_done diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh index aed8683e629..e36fb18098d 100755 --- a/t/t1092-sparse-checkout-compatibility.sh +++ b/t/t1092-sparse-checkout-compatibility.sh @@ -479,6 +479,113 @@ test_expect_success 'checkout and reset (mixed) [sparse]' ' test_sparse_match git reset update-folder2 ' +# NEEDSWORK: with mixed reset, files with differences between HEAD and <commit> +# will be added to the work tree even if outside the sparse checkout +# definition, and even if the file is modified to a state of having no local +# changes. The file is "re-ignored" if a hard reset is executed. We may want to +# change this behavior in the future and enforce that files are not written +# outside of the sparse checkout definition. 
+test_expect_success 'checkout and mixed reset file tracking [sparse]' ' + init_repos && + + test_all_match git checkout -b reset-test update-deep && + test_all_match git reset update-folder1 && + test_all_match git reset update-deep && + + # At this point, there are no changes in the working tree. However, + # folder1/a now exists locally (even though it is outside of the sparse + # paths). + run_on_sparse test_path_exists folder1 && + + run_on_all rm folder1/a && + test_all_match git status --porcelain=v2 && + + test_all_match git reset --hard update-deep && + run_on_sparse test_path_is_missing folder1 && + test_path_exists full-checkout/folder1 +' + +test_expect_success 'checkout and reset (merge)' ' + init_repos && + + write_script edit-contents <<-\EOF && + echo text >>$1 + EOF + + test_all_match git checkout -b reset-test update-deep && + run_on_all ../edit-contents a && + test_all_match git reset --merge deepest && + test_all_match git status --porcelain=v2 && + + test_all_match git reset --hard update-deep && + run_on_all ../edit-contents deep/a && + test_all_match test_must_fail git reset --merge deepest +' + +test_expect_success 'checkout and reset (keep)' ' + init_repos && + + write_script edit-contents <<-\EOF && + echo text >>$1 + EOF + + test_all_match git checkout -b reset-test update-deep && + run_on_all ../edit-contents a && + test_all_match git reset --keep deepest && + test_all_match git status --porcelain=v2 && + + test_all_match git reset --hard update-deep && + run_on_all ../edit-contents deep/a && + test_all_match test_must_fail git reset --keep deepest +' + +test_expect_success 'reset with pathspecs inside sparse definition' ' + init_repos && + + write_script edit-contents <<-\EOF && + echo text >>$1 + EOF + + test_all_match git checkout -b reset-test update-deep && + run_on_all ../edit-contents deep/a && + + test_all_match git reset base -- deep/a && + test_all_match git status --porcelain=v2 && + + test_all_match git reset base -- 
nonexistent-file && + test_all_match git status --porcelain=v2 && + + test_all_match git reset deepest -- deep && + test_all_match git status --porcelain=v2 +' + +test_expect_success 'reset with sparse directory pathspec outside definition' ' + init_repos && + + test_all_match git checkout -b reset-test update-deep && + test_all_match git reset --hard update-folder1 && + test_all_match git reset base -- folder1 && + test_all_match git status --porcelain=v2 +' + +test_expect_success 'reset with pathspec match in sparse directory' ' + init_repos && + + test_all_match git checkout -b reset-test update-deep && + test_all_match git reset --hard update-folder1 && + test_all_match git reset base -- folder1/a && + test_all_match git status --porcelain=v2 +' + +test_expect_success 'reset with wildcard pathspec' ' + init_repos && + + test_all_match git checkout -b reset-test update-deep && + test_all_match git reset --hard update-folder1 && + test_all_match git reset base -- \*/a && + test_all_match git status --porcelain=v2 +' + test_expect_success 'merge, cherry-pick, and rebase' ' init_repos && -- gitgitgadget
From: Victoria Dye <vdye@github.com> `reset --soft` does not modify the index, so no compatibility changes are needed for it to function without expanding the index. For all other reset modes (`--mixed`, `--hard`, `--keep`, `--merge`), the full index is explicitly expanded with `ensure_full_index` to maintain current behavior. Additionally, the `read_cache()` check verifying an uncorrupted index is moved after argument parsing and preparing the repo settings. The index is not used by the preceding argument handling, but `read_cache()` does need to be run after enabling sparse index for the command and before resetting. Signed-off-by: Victoria Dye <vdye@github.com> --- builtin/reset.c | 10 +++++++--- cache-tree.c | 1 + 2 files changed, 8 insertions(+), 3 deletions(-) diff --git a/builtin/reset.c b/builtin/reset.c index 8ffcd713720..92b9a3815c7 100644 --- a/builtin/reset.c +++ b/builtin/reset.c @@ -205,6 +205,7 @@ static int read_from_tree(const struct pathspec *pathspec, opt.flags.override_submodule_config = 1; opt.repo = the_repository; + ensure_full_index(&the_index); if (do_diff_cache(tree_oid, &opt)) return 1; diffcore_std(&opt); @@ -282,9 +283,6 @@ static void parse_args(struct pathspec *pathspec, } *rev_ret = rev; - if (read_cache() < 0) - die(_("index file corrupt")); - parse_pathspec(pathspec, 0, PATHSPEC_PREFER_FULL | (patch_mode ? PATHSPEC_PREFIX_ORIGIN : 0), @@ -430,6 +428,12 @@ int cmd_reset(int argc, const char **argv, const char *prefix) if (intent_to_add && reset_type != MIXED) die(_("-N can only be used with --mixed")); + prepare_repo_settings(the_repository); + the_repository->settings.command_requires_full_index = 0; + + if (read_cache() < 0) + die(_("index file corrupt")); + /* Soft reset does not touch the index file nor the working tree * at all, but requires them in a good order. Other resets reset * the index file to the tree object we are switching to. 
*/ diff --git a/cache-tree.c b/cache-tree.c index 90919f9e345..9be19c85b66 100644 --- a/cache-tree.c +++ b/cache-tree.c @@ -776,6 +776,7 @@ void prime_cache_tree(struct repository *r, cache_tree_free(&istate->cache_tree); istate->cache_tree = cache_tree(); + ensure_full_index(istate); prime_cache_tree_rec(r, istate->cache_tree, tree); istate->cache_changed |= CACHE_TREE_CHANGED; trace2_region_leave("cache-tree", "prime_cache_tree", the_repository); -- gitgitgadget
From: Victoria Dye <vdye@github.com> In order to accurately reconstruct the cache tree in `prime_cache_tree_rec`, the function must determine whether the currently-processing directory in the tree is sparse or not. If it is not sparse, the tree is parsed and subtree recursively constructed. If it is sparse, no subtrees are added to the tree and the entry count is set to 1 (representing the sparse directory itself). Signed-off-by: Victoria Dye <vdye@github.com> --- cache-tree.c | 44 +++++++++++++++++++++--- cache.h | 10 ++++++ read-cache.c | 22 ++++++++---- t/t1092-sparse-checkout-compatibility.sh | 15 ++++++-- 4 files changed, 78 insertions(+), 13 deletions(-) diff --git a/cache-tree.c b/cache-tree.c index 9be19c85b66..9021669d682 100644 --- a/cache-tree.c +++ b/cache-tree.c @@ -740,15 +740,29 @@ out: return ret; } +static void prime_cache_tree_sparse_dir(struct repository *r, + struct cache_tree *it, + struct tree *tree, + struct strbuf *tree_path) +{ + + oidcpy(&it->oid, &tree->object.oid); + it->entry_count = 1; + return; +} + static void prime_cache_tree_rec(struct repository *r, struct cache_tree *it, - struct tree *tree) + struct tree *tree, + struct strbuf *tree_path) { + struct strbuf subtree_path = STRBUF_INIT; struct tree_desc desc; struct name_entry entry; int cnt; oidcpy(&it->oid, &tree->object.oid); + init_tree_desc(&desc, tree->buffer, tree->size); cnt = 0; while (tree_entry(&desc, &entry)) { @@ -757,27 +771,49 @@ static void prime_cache_tree_rec(struct repository *r, else { struct cache_tree_sub *sub; struct tree *subtree = lookup_tree(r, &entry.oid); + if (!subtree->object.parsed) parse_tree(subtree); sub = cache_tree_sub(it, entry.path); sub->cache_tree = cache_tree(); - prime_cache_tree_rec(r, sub->cache_tree, subtree); + strbuf_reset(&subtree_path); + strbuf_grow(&subtree_path, tree_path->len + entry.pathlen + 1); + strbuf_addbuf(&subtree_path, tree_path); + strbuf_add(&subtree_path, entry.path, entry.pathlen); + strbuf_addch(&subtree_path, '/'); 
+ + /* + * If a sparse index is in use, the directory being processed may be + * sparse. To confirm that, we can check whether an entry with that + * exact name exists in the index. If it does, the created subtree + * should be sparse. Otherwise, cache tree expansion should continue + * as normal. + */ + if (r->index->sparse_index && + index_entry_exists(r->index, subtree_path.buf, subtree_path.len)) + prime_cache_tree_sparse_dir(r, sub->cache_tree, subtree, &subtree_path); + else + prime_cache_tree_rec(r, sub->cache_tree, subtree, &subtree_path); cnt += sub->cache_tree->entry_count; } } it->entry_count = cnt; + + strbuf_release(&subtree_path); } void prime_cache_tree(struct repository *r, struct index_state *istate, struct tree *tree) { + struct strbuf tree_path = STRBUF_INIT; + trace2_region_enter("cache-tree", "prime_cache_tree", the_repository); cache_tree_free(&istate->cache_tree); istate->cache_tree = cache_tree(); - ensure_full_index(istate); - prime_cache_tree_rec(r, istate->cache_tree, tree); + prime_cache_tree_rec(r, istate->cache_tree, tree, &tree_path); + strbuf_release(&tree_path); istate->cache_changed |= CACHE_TREE_CHANGED; trace2_region_leave("cache-tree", "prime_cache_tree", the_repository); } diff --git a/cache.h b/cache.h index f6295f3b048..1d3e4665562 100644 --- a/cache.h +++ b/cache.h @@ -816,6 +816,16 @@ struct cache_entry *index_file_exists(struct index_state *istate, const char *na */ int index_name_pos(struct index_state *, const char *name, int namelen); +/* + * Determines whether an entry with the given name exists within the + * given index. The return value is 1 if an exact match is found, otherwise + * it is 0. Note that, unlike index_name_pos, this function does not expand + * the index if it is sparse. If an item exists within the full index but it + * is contained within a sparse directory (and not in the sparse index), 0 is + * returned. 
+ */ +int index_entry_exists(struct index_state *, const char *name, int namelen); + /* * Some functions return the negative complement of an insert position when a * precise match was not found but a position was found where the entry would diff --git a/read-cache.c b/read-cache.c index f5d4385c408..ea1166895f8 100644 --- a/read-cache.c +++ b/read-cache.c @@ -551,7 +551,10 @@ int cache_name_stage_compare(const char *name1, int len1, int stage1, const char return 0; } -static int index_name_stage_pos(struct index_state *istate, const char *name, int namelen, int stage) +static int index_name_stage_pos(struct index_state *istate, + const char *name, int namelen, + int stage, + int search_sparse) { int first, last; @@ -570,7 +573,7 @@ static int index_name_stage_pos(struct index_state *istate, const char *name, in first = next+1; } - if (istate->sparse_index && + if (search_sparse && istate->sparse_index && first > 0) { /* Note: first <= istate->cache_nr */ struct cache_entry *ce = istate->cache[first - 1]; @@ -586,7 +589,7 @@ static int index_name_stage_pos(struct index_state *istate, const char *name, in ce_namelen(ce) < namelen && !strncmp(name, ce->name, ce_namelen(ce))) { ensure_full_index(istate); - return index_name_stage_pos(istate, name, namelen, stage); + return index_name_stage_pos(istate, name, namelen, stage, search_sparse); } } @@ -595,7 +598,12 @@ static int index_name_stage_pos(struct index_state *istate, const char *name, in int index_name_pos(struct index_state *istate, const char *name, int namelen) { - return index_name_stage_pos(istate, name, namelen, 0); + return index_name_stage_pos(istate, name, namelen, 0, 1); +} + +int index_entry_exists(struct index_state *istate, const char *name, int namelen) +{ + return index_name_stage_pos(istate, name, namelen, 0, 0) >= 0; } int remove_index_entry_at(struct index_state *istate, int pos) @@ -1222,7 +1230,7 @@ static int has_dir_name(struct index_state *istate, */ } - pos = index_name_stage_pos(istate, 
name, len, stage); + pos = index_name_stage_pos(istate, name, len, stage, 1); if (pos >= 0) { /* * Found one, but not so fast. This could @@ -1322,7 +1330,7 @@ static int add_index_entry_with_check(struct index_state *istate, struct cache_e strcmp(ce->name, istate->cache[istate->cache_nr - 1]->name) > 0) pos = index_pos_to_insert_pos(istate->cache_nr); else - pos = index_name_stage_pos(istate, ce->name, ce_namelen(ce), ce_stage(ce)); + pos = index_name_stage_pos(istate, ce->name, ce_namelen(ce), ce_stage(ce), 1); /* existing match? Just replace it. */ if (pos >= 0) { @@ -1357,7 +1365,7 @@ static int add_index_entry_with_check(struct index_state *istate, struct cache_e if (!ok_to_replace) return error(_("'%s' appears as both a file and as a directory"), ce->name); - pos = index_name_stage_pos(istate, ce->name, ce_namelen(ce), ce_stage(ce)); + pos = index_name_stage_pos(istate, ce->name, ce_namelen(ce), ce_stage(ce), 1); pos = -pos-1; } return pos + 1; diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh index e36fb18098d..0b6ff0de17d 100755 --- a/t/t1092-sparse-checkout-compatibility.sh +++ b/t/t1092-sparse-checkout-compatibility.sh @@ -786,9 +786,9 @@ test_expect_success 'sparse-index is not expanded' ' ensure_not_expanded checkout - && ensure_not_expanded switch rename-out-to-out && ensure_not_expanded switch - && - git -C sparse-index reset --hard && + ensure_not_expanded reset --hard && ensure_not_expanded checkout rename-out-to-out -- deep/deeper1 && - git -C sparse-index reset --hard && + ensure_not_expanded reset --hard && ensure_not_expanded restore -s rename-out-to-out -- deep/deeper1 && echo >>sparse-index/README.md && @@ -798,6 +798,17 @@ test_expect_success 'sparse-index is not expanded' ' echo >>sparse-index/untracked.txt && ensure_not_expanded add . 
&& + for ref in update-deep update-folder1 update-folder2 update-deep + do + echo >>sparse-index/README.md && + ensure_not_expanded reset --hard $ref || return 1 + done && + + ensure_not_expanded reset --hard update-deep && + ensure_not_expanded reset --keep base && + ensure_not_expanded reset --merge update-deep && + ensure_not_expanded reset --hard && + ensure_not_expanded checkout -f update-deep && test_config -C sparse-index pull.twohead ort && ( -- gitgitgadget
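
A note on index_entry_exists() from the patch above: the lookup reports only
exact matches and, unlike index_name_pos(), never expands a sparse index. The
standalone sketch below illustrates that idea; it is not git's implementation.
The names name_pos and entry_exists, and the use of a bare sorted array of
strings compared with strcmp(), are assumptions made for illustration (git
itself compares cache entries with cache_name_stage_compare).

```c
#include <assert.h>
#include <string.h>

/*
 * Illustrative binary search over a sorted array of entry names.
 * Returns the index of an exact match, or -(insert position) - 1 when
 * there is none, mirroring the convention described in cache.h.
 */
static int name_pos(const char **names, int nr, const char *name)
{
	int first = 0, last = nr;

	assert(names != NULL || nr == 0);
	while (last > first) {
		int next = first + ((last - first) >> 1);
		int cmp = strcmp(name, names[next]);

		if (!cmp)
			return next;
		if (cmp < 0)
			last = next;
		else
			first = next + 1;
	}
	return -first - 1;
}

/*
 * Exact-match test only: a path that lives inside a sparse directory
 * entry (e.g. "folder1/a" when only "folder1/" is present) is reported
 * as absent, and nothing is expanded to look for it.
 */
static int entry_exists(const char **names, int nr, const char *name)
{
	return name_pos(names, nr, name) >= 0;
}
```

With a sparse index holding "a", "deep/a", "folder1/" and "folder2/", a lookup
of "folder1/" succeeds, so that subtree would be primed as a sparse directory,
while "deep/" is not found and the recursion would continue into it.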
From: Victoria Dye <vdye@github.com> Sparse directory entries are "diffed" as trees in `diff_cache` (used internally by `reset --mixed`), following a code path separate from individual file handling. The use of `diff_tree_oid` there requires setting explicit `change` and `add_remove` functions to process the internal contents of a sparse directory. Additionally, the `recursive` diff option handles cases in which `reset --mixed` must diff/merge files that are nested multiple levels deep in a sparse directory. Signed-off-by: Victoria Dye <vdye@github.com> --- builtin/reset.c | 30 +++++++++++++++++++++++- t/t1092-sparse-checkout-compatibility.sh | 13 +++++++++- 2 files changed, 41 insertions(+), 2 deletions(-) diff --git a/builtin/reset.c b/builtin/reset.c index 92b9a3815c7..2d95ce76f20 100644 --- a/builtin/reset.c +++ b/builtin/reset.c @@ -196,6 +196,8 @@ static int read_from_tree(const struct pathspec *pathspec, int intent_to_add) { struct diff_options opt; + unsigned int i; + char *skip_worktree_seen = NULL; memset(&opt, 0, sizeof(opt)); copy_pathspec(&opt.pathspec, pathspec); @@ -203,9 +205,35 @@ static int read_from_tree(const struct pathspec *pathspec, opt.format_callback = update_index_from_diff; opt.format_callback_data = &intent_to_add; opt.flags.override_submodule_config = 1; + opt.flags.recursive = 1; opt.repo = the_repository; + opt.change = diff_change; + opt.add_remove = diff_addremove; + + /* + * When pathspec is given for resetting a cone-mode sparse checkout, it may + * identify entries that are nested in sparse directories, in which case the + * index should be expanded. For the sake of efficiency, this check is + * overly-cautious: anything with a wildcard or a magic prefix requires + * expansion, as well as literal paths that aren't in the sparse checkout + * definition AND don't match any directory in the index. 
+ */ + if (pathspec->nr && the_index.sparse_index) { + if (pathspec->magic || pathspec->has_wildcard) { + ensure_full_index(&the_index); + } else { + for (i = 0; i < pathspec->nr; i++) { + if (!path_in_cone_mode_sparse_checkout(pathspec->items[i].original, &the_index) && + !matches_skip_worktree(pathspec, i, &skip_worktree_seen)) { + ensure_full_index(&the_index); + break; + } + } + } + } + + free(skip_worktree_seen); - ensure_full_index(&the_index); if (do_diff_cache(tree_oid, &opt)) return 1; diffcore_std(&opt); diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh index 0b6ff0de17d..c9b9ef4992c 100755 --- a/t/t1092-sparse-checkout-compatibility.sh +++ b/t/t1092-sparse-checkout-compatibility.sh @@ -801,14 +801,25 @@ test_expect_success 'sparse-index is not expanded' ' for ref in update-deep update-folder1 update-folder2 update-deep do echo >>sparse-index/README.md && + ensure_not_expanded reset --mixed $ref ensure_not_expanded reset --hard $ref || return 1 done && ensure_not_expanded reset --hard update-deep && ensure_not_expanded reset --keep base && ensure_not_expanded reset --merge update-deep && - ensure_not_expanded reset --hard && + ensure_not_expanded reset base -- deep/a && + ensure_not_expanded reset base -- nonexistent-file && + ensure_not_expanded reset deepest -- deep && + + # Although folder1 is outside the sparse definition, it exists as a + # directory entry in the index, so it will be reset without needing to + # expand the full index. + ensure_not_expanded reset --hard update-folder1 && + ensure_not_expanded reset base -- folder1 && + + ensure_not_expanded reset --hard update-deep && ensure_not_expanded checkout -f update-deep && test_config -C sparse-index pull.twohead ort && ( -- gitgitgadget
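
The expansion decision that the patch above adds to read_from_tree() can be
restated as a small pure function. This is only a sketch of the control flow:
the struct spec_info and its two precomputed flags are hypothetical stand-ins
for the results of path_in_cone_mode_sparse_checkout() and
matches_skip_worktree(), and needs_full_index is not a function in git.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-pathspec facts, assumed to be computed elsewhere. */
struct spec_info {
	int in_sparse_definition;	/* literal path is inside the cone */
	int matches_skip_worktree;	/* matches a skip-worktree index entry */
};

/*
 * Returns 1 when the sparse index would have to be expanded before the
 * diff, 0 when the reset can proceed on the sparse index as-is.
 */
static int needs_full_index(int index_is_sparse, int has_magic_or_wildcard,
			    const struct spec_info *specs, int nr)
{
	int i;

	assert(specs != NULL || nr == 0);

	/* No pathspec, or the index is already full: nothing to decide. */
	if (!nr || !index_is_sparse)
		return 0;

	/* Overly cautious on purpose: wildcards and magic always expand. */
	if (has_magic_or_wildcard)
		return 1;

	/* A literal path outside the cone that matches no entry forces expansion. */
	for (i = 0; i < nr; i++)
		if (!specs[i].in_sparse_definition &&
		    !specs[i].matches_skip_worktree)
			return 1;

	return 0;
}
```

Under these assumptions, a pathspec naming a sparse directory entry itself
(the "reset base -- folder1" case in the test) leaves the index sparse, while
a wildcard pathspec always expands it.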
From: Victoria Dye <vdye@github.com>

To find the first non-unpacked cache entry, `next_cache_entry` iterates
through index, starting at `cache_bottom`. The performance of this in full
indexes is helped by `cache_bottom` advancing with each invocation of
`mark_ce_used` (called by `unpack_index_entry`). However, the presence of
sparse directories can prevent the `cache_bottom` from advancing in a sparse
index case, effectively forcing `next_cache_entry` to search from the
beginning of the index each time it is called.

The `cache_bottom` must be preserved for the sparse index (see 17a1bb570b
(unpack-trees: preserve cache_bottom, 2021-07-14)). Therefore, to retain
the benefit `cache_bottom` provides in non-sparse index cases, a separate
`hint` position indicates the first position `next_cache_entry` should
search, updated each execution with a new position.

The performance of `git reset -- does-not-exist` (testing the "worst case"
in which all entries in the index are unpacked with `next_cache_entry`) is
significantly improved for the sparse index case:

Test          before            after
------------------------------------------------------
(full-v3)     0.79(0.38+0.30)   0.91(0.43+0.34) +15.2%
(full-v4)     0.80(0.38+0.29)   0.85(0.40+0.35) +6.2%
(sparse-v3)   0.76(0.43+0.69)   0.44(0.08+0.67) -42.1%
(sparse-v4)   0.71(0.40+0.65)   0.41(0.09+0.65) -42.3%

Signed-off-by: Victoria Dye <vdye@github.com>
---
 unpack-trees.c | 23 +++++++++++++++++------
 1 file changed, 17 insertions(+), 6 deletions(-)

diff --git a/unpack-trees.c b/unpack-trees.c
index 8ea0a542da8..b94733de6be 100644
--- a/unpack-trees.c
+++ b/unpack-trees.c
@@ -645,17 +645,24 @@ static void mark_ce_used_same_name(struct cache_entry *ce,
 	}
 }
 
-static struct cache_entry *next_cache_entry(struct unpack_trees_options *o)
+static struct cache_entry *next_cache_entry(struct unpack_trees_options *o, int *hint)
 {
 	const struct index_state *index = o->src_index;
 	int pos = o->cache_bottom;
 
+	if (*hint > pos)
+		pos = *hint;
+
 	while (pos < index->cache_nr) {
 		struct cache_entry *ce = index->cache[pos];
-		if (!(ce->ce_flags & CE_UNPACKED))
+		if (!(ce->ce_flags & CE_UNPACKED)) {
+			*hint = pos + 1;
 			return ce;
+		}
 		pos++;
 	}
+
+	*hint = pos;
 	return NULL;
 }
 
@@ -1365,12 +1372,13 @@ static int unpack_callback(int n, unsigned long mask, unsigned long dirmask, str
 
 	/* Are we supposed to look at the index too? */
 	if (o->merge) {
+		int hint = -1;
 		while (1) {
 			int cmp;
 			struct cache_entry *ce;
 
 			if (o->diff_index_cached)
-				ce = next_cache_entry(o);
+				ce = next_cache_entry(o, &hint);
 			else
 				ce = find_cache_entry(info, p);
 
@@ -1690,7 +1698,7 @@ static int verify_absent(const struct cache_entry *,
 int unpack_trees(unsigned len, struct tree_desc *t, struct unpack_trees_options *o)
 {
 	struct repository *repo = the_repository;
-	int i, ret;
+	int i, hint, ret;
 	static struct cache_entry *dfc;
 	struct pattern_list pl;
 	int free_pattern_list = 0;
@@ -1763,13 +1771,15 @@ int unpack_trees(unsigned len, struct tree_desc *t, struct unpack_trees_options
 		info.pathspec = o->pathspec;
 
 		if (o->prefix) {
+			hint = -1;
+
 			/*
 			 * Unpack existing index entries that sort before the
 			 * prefix the tree is spliced into. Note that o->merge
 			 * is always true in this case.
 			 */
 			while (1) {
-				struct cache_entry *ce = next_cache_entry(o);
+				struct cache_entry *ce = next_cache_entry(o, &hint);
 				if (!ce)
 					break;
 				if (ce_in_traverse_path(ce, &info))
@@ -1790,8 +1800,9 @@ int unpack_trees(unsigned len, struct tree_desc *t, struct unpack_trees_options
 
 	/* Any left-over entries in the index? */
 	if (o->merge) {
+		hint = -1;
 		while (1) {
-			struct cache_entry *ce = next_cache_entry(o);
+			struct cache_entry *ce = next_cache_entry(o, &hint);
 			if (!ce)
 				break;
 			if (unpack_index_entry(ce, o) < 0)
-- 
gitgitgadget
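
To illustrate why the hint in the patch above matters when cache_bottom cannot
advance, here is a minimal standalone model of the scan. It is a sketch under
stated assumptions, not git's code: struct entry, struct scan_state, next_entry
and drain are hypothetical names, the `unpacked` field plays the role of the
CE_UNPACKED flag, and the `visited` counter exists only to measure work.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for a cache entry. */
struct entry {
	int unpacked;		/* plays the role of CE_UNPACKED */
};

struct scan_state {
	struct entry *cache;
	int cache_nr;
	int cache_bottom;	/* stays put here, as it may in a sparse index */
	long visited;		/* instrumentation: entries examined so far */
};

/*
 * Return the first entry that is not yet unpacked, starting from the
 * later of cache_bottom and *hint. On return, *hint holds the position
 * from which the next call should resume.
 */
static struct entry *next_entry(struct scan_state *s, int *hint)
{
	int pos = s->cache_bottom;

	assert(hint != NULL);
	if (*hint > pos)
		pos = *hint;

	while (pos < s->cache_nr) {
		struct entry *e = &s->cache[pos];

		s->visited++;
		if (!e->unpacked) {
			*hint = pos + 1;
			return e;
		}
		pos++;
	}

	*hint = pos;
	return NULL;
}

/*
 * Unpack every entry, in the style of the "left-over entries" loop.
 * With use_hint == 0 the hint is reset before each call, which emulates
 * the old behavior of always restarting at cache_bottom. Returns the
 * total number of entries examined.
 */
static long drain(struct entry *cache, int n, int use_hint)
{
	struct scan_state s = { cache, n, 0, 0 };
	int hint = -1;
	struct entry *e;

	for (;;) {
		if (!use_hint)
			hint = -1;
		e = next_entry(&s, &hint);
		if (!e)
			break;
		e->unpacked = 1;	/* cache_bottom is deliberately not advanced */
	}
	return s.visited;
}
```

In this model, draining 100 entries examines 100 entries with the hint and
5150 without it, which is the linear versus quadratic behavior the commit
message describes for the sparse index case.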
"Kevin Willford via GitGitGadget" <gitgitgadget@gmail.com> writes: > @@ -127,12 +129,49 @@ static void update_index_from_diff(struct diff_queue_struct *q, > struct diff_options *opt, void *data) > { > int i; > + int pos; > int intent_to_add = *(int *)data; > > for (i = 0; i < q->nr; i++) { > struct diff_filespec *one = q->queue[i]->one; > + struct diff_filespec *two = q->queue[i]->two; > int is_missing = !(one->mode && !is_null_oid(&one->oid)); > + int was_missing = !two->mode && is_null_oid(&two->oid); Not a problem introduced by this patch per-se, but is_missing is a counter-intuitive name for what the boolean wants to represent, I think, which was brought in by b4b313f9 (reset: support "--mixed --intent-to-add" mode, 2014-02-04). Before the commit, we used to say for (i = 0; i < q->nr; i++) { struct diff_filespec *one = q->queue[i]->one; if (one->mode && !is_null_sha1(one->sha1)) { ... create ce out of one and add to the index ... } else remove_file_from_cache(one->path); ... i.e. "if one is not missing, create a ce and add it, otherwise remove the path". It should have been called "one_is_missing" if we wanted to literally express the condition the code checked, but an even better name would have been given after the intent of what the code wants to do with the information. If the resetted-to tree (that is what 'one' side of the comparison in diff_cache() is) has a valid blob, we want it to be in the index, and otherwise, we do not want it in the index. Now, the patch makes things worse and I had to do the above digging to see why the new code is even more confusing. The 'two' side of the comparison is what is in the to-be-corrected-by-reset index. "was_missing" in contrast to "is_missing" makes it sound as if it was the state before whatever "is_missing" tries to represent, but that is not what is happening. 
"is_missing" does not mean "the entry is currently not there in the index", but "was_missing" does mean exactly that: "the entry is currently not there in the index". There isn't any "was" missing about it. It "is" missing in the index. Instead of renaming, I wonder if we can do without this new variable. Let's read on the patch. Also, now the code uses both sides of the filepair, we must double check that our do_diff_cache() is *not* doing any rename detection. It might be even prudent to ensure that if (strcmp(one->path, two->path)) BUG("reset drove diff-cache with rename detection"); but it might be with too much paranoia. I dunno. > struct cache_entry *ce; > + struct cache_entry *ce_before; > + struct checkout state = CHECKOUT_INIT; These two new variables do not need this wide a scope, I would think. Shouldn't it be inside the body of the new "if" statement this patch adds? > + /* > + * When using the sparse-checkout feature the cache entries > + * that are added here will not have the skip-worktree bit > + * set. Without this code there is data that is lost because > + * the files that would normally be in the working directory > + * are not there and show as deleted for the next status. > + * In the case of added files, they just disappear. > + * > + * We need to create the previous version of the files in > + * the working directory so that they will have the right > + * content and the next status call will show modified or > + * untracked files correctly. > + */ > + if (core_apply_sparse_checkout && !file_exists(two->path)) { In a sparsely checked out working tree, there is nothing in the working tree at the path. It may be because it is sparse and we didn't want to have anything there, or it may be because the user wanted to get rid of it and said "rm path" (not "git rm path") and this part of the tree were of interest even if the sparse checkout feature was used to hide other parts of the tree. With the above two checks alone, we cannot tell which. 
Let's read on. > + pos = cache_name_pos(two->path, strlen(two->path)); We check the index to see if there is an entry for it. I suspect that because we need to do this check anyway, we shouldn't even have to look at 'two' (and add a new 'was_missing' variable), because 'one' and 'two' came from a comparison between the resetted-to tree object and the current index, and if cache_name_pos() for the path (we can use 'one->path') says there is an entry in the index, by definition, 'two' would not be showing a "removed" state (i.e. "the resetted-to tree had it, the index does not" is what "was_missing" wants to say). So I wonder if it is better to - use one->path for !file_exists() above and cache_name_pos() here instead of two->path. - drop the confusingly named 'was_missing', because (pos < 0) is equivalent to it after this point, and we didn't need it up to this point. > + if ((pos >= 0 && ce_skip_worktree(active_cache[pos])) && And we do find an entry for it. So this path is not something sparse cone specifies not to check out (otherwise we would have a tree-like entry that covers this path in the index and not an entry for this specific path)? Anyway, if it is marked with the skip-worktree bit, does that mean there is no risk that the reason why two->path does not exist in the working tree is because we earlier gave it in the working tree but it was later removed by the user? Just making sure that we are not breaking the end-user's wish that the path should be removed by resurrecting it in the working tree with a new call to checkout_entry(). > + (is_missing || !was_missing)) { And in such a case, if the resetted-to tree says we shouldn't have the path in the resulting index, or if the original state in the index had this path (but because (0 <= pos) must be true for us to reach this point, I am not sure if "was_missing" can ever be true here), then do the following, which is ... 
> + state.force = 1; > + state.refresh_cache = 1; > + state.istate = &the_index; > + > + ce_before = make_cache_entry(&the_index, two->mode, > + &two->oid, two->path, > + 0, 0); > + if (!ce_before) > + die(_("make_cache_entry failed for path '%s'"), > + two->path); > + > + checkout_entry(ce_before, &state, NULL, NULL); ... to resurrect the last "git add"ed state from the index and write it out to the working tree. As I suspected, ce_before and state should be scoped inside this block and not visible outside, no? I am not sure why this behaviour is desirable. The "mixed" reset should not have to touch the working tree in the first place. The large comment before this block says "... will not have the skip-worktree bit set", but we are dealing with a case where the original index had a cache entry there with skip-worktree bit set, so isn't the more desirable outcome that the cache entry added back to the index has the skip-worktree bit still set and there is no working tree file that the user did not desire to have? And isn't it the matter of preserving the skip-worktree bit when the code in the post context of this hunk this patch did not touch adds the entry back to the index? > + } > + } > > if (is_missing && !intent_to_add) { > remove_file_from_cache(one->path); If we look at the code after this point, we do use "is_missing" information to tweak ce->ce_flags with the intent-to-add bit. Perhaps we can do a similar tweak to the cache entry to mark it with skip-worktree bit if the index had a cache entry at the path with the bit set? The code that needs to do so would only have to remember if the one->path is in the current index and the cache entry for the path has the skip-worktree bit in the body of the new if() statement about (core_apply_sparse_checkout && !file_exists()) added by this patch (I am not sure if !file_exists() even matters, though, as the approach I am suggesting is to preserve the skip bit and not disturb the working tree files at all). Thanks.
On Thu, Sep 30, 2021 at 02:50:56PM +0000, Victoria Dye via GitGitGadget wrote:
> From: Victoria Dye <vdye@github.com>
>
> In anticipation of multiple commands being fully integrated with sparse
> index, update the test for index expand and collapse for non-sparse index
> integrated commands to use `mv`.
>
> Signed-off-by: Victoria Dye <vdye@github.com>
> ---
> t/t1092-sparse-checkout-compatibility.sh | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
> index c5977152661..aed8683e629 100755
> --- a/t/t1092-sparse-checkout-compatibility.sh
> +++ b/t/t1092-sparse-checkout-compatibility.sh
> @@ -642,7 +642,7 @@ test_expect_success 'sparse-index is expanded and converted back' '
> init_repos &&
>
> GIT_TRACE2_EVENT="$(pwd)/trace2.txt" GIT_TRACE2_EVENT_NESTING=10 \
> - git -C sparse-index -c core.fsmonitor="" reset --hard &&
> + git -C sparse-index -c core.fsmonitor="" mv a b &&
Double-checking my understanding as somebody who is not so familiar with
t1092: this test is to ensure that commands which don't yet understand
the sparse index can temporarily expand it in order to do their work?
If so, makes sense to me. And renaming 'a' to 'b' is arbitrary and fine
to do since we end up recreating the sparse-index repository each time
via init_repos.
Looks good to me.
Thanks,
Taylor
Taylor Blau wrote:
> On Thu, Sep 30, 2021 at 02:50:56PM +0000, Victoria Dye via GitGitGadget wrote:
>> From: Victoria Dye <vdye@github.com>
>>
>> In anticipation of multiple commands being fully integrated with sparse
>> index, update the test for index expand and collapse for non-sparse index
>> integrated commands to use `mv`.
>>
>> Signed-off-by: Victoria Dye <vdye@github.com>
>> ---
>> t/t1092-sparse-checkout-compatibility.sh | 2 +-
>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
>> index c5977152661..aed8683e629 100755
>> --- a/t/t1092-sparse-checkout-compatibility.sh
>> +++ b/t/t1092-sparse-checkout-compatibility.sh
>> @@ -642,7 +642,7 @@ test_expect_success 'sparse-index is expanded and converted back' '
>> init_repos &&
>>
>> GIT_TRACE2_EVENT="$(pwd)/trace2.txt" GIT_TRACE2_EVENT_NESTING=10 \
>> - git -C sparse-index -c core.fsmonitor="" reset --hard &&
>> + git -C sparse-index -c core.fsmonitor="" mv a b &&
>
> Double-checking my understanding as somebody who is not so familiar with
> t1092: this test is to ensure that commands which don't yet understand
> the sparse index can temporarily expand it in order to do their work?
Exactly - if a command doesn't explicitly enable use of the sparse index by
setting `command_requires_full_index` to 0, the index is expanded if/when it
is first read during the command's execution and collapsed if/when it is
written to disk. This test makes sure that mechanism works as intended.
-Victoria
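The mechanism described above can be modeled in a few lines. This is a toy sketch, assuming a repository that is configured to use the sparse index; `struct index_model` and the `*_model` functions are hypothetical names, and only `command_requires_full_index` comes from the discussion.

```c
struct index_model {
	int sparse;     /* 1 while sparse directory entries are present */
	int expansions; /* how often the index was expanded */
	int collapses;  /* how often it was collapsed again */
};

/* Reading: expand only if the command did not opt in to the sparse index. */
static void read_index_model(struct index_model *idx,
			     int command_requires_full_index)
{
	if (idx->sparse && command_requires_full_index) {
		idx->sparse = 0;
		idx->expansions++;
	}
}

/* Writing: a full in-memory index is collapsed again before it hits disk. */
static void write_index_model(struct index_model *idx)
{
	if (!idx->sparse) {
		idx->sparse = 1;
		idx->collapses++;
	}
}

/* One command run: read the index, then write it back. */
static void run_command_model(struct index_model *idx,
			      int command_requires_full_index)
{
	read_index_model(idx, command_requires_full_index);
	write_index_model(idx);
}
```

A command that leaves `command_requires_full_index` at 1 produces one expansion and one collapse per run in this model; a command that sets it to 0 produces neither, which is the difference the t1092 test observes through its trace output.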
Victoria Dye <vdye@github.com> writes:
> Taylor Blau wrote:
>> On Thu, Sep 30, 2021 at 02:50:56PM +0000, Victoria Dye via GitGitGadget wrote:
>>> From: Victoria Dye <vdye@github.com>
>>>
>>> In anticipation of multiple commands being fully integrated with sparse
>>> index, update the test for index expand and collapse for non-sparse index
>>> integrated commands to use `mv`.
>>> ...
>>> GIT_TRACE2_EVENT="$(pwd)/trace2.txt" GIT_TRACE2_EVENT_NESTING=10 \
>>> - git -C sparse-index -c core.fsmonitor="" reset --hard &&
>>> + git -C sparse-index -c core.fsmonitor="" mv a b &&
>>
>> Double-checking my understanding as somebody who is not so familiar with
>> t1092: this test is to ensure that commands which don't yet understand
>> the sparse index can temporarily expand it in order to do their work?
>
> Exactly - if a command doesn't explicitly enable use of the sparse index by
> setting `command_requires_full_index` to 0, the index is expanded if/when it
> is first read during the command's execution and collapsed if/when it is
> written to disk. This test makes sure that mechanism works as intended.
Sorry, I do not quite follow.
So is this "before this series of patches, 'reset --hard' can be
used as a sample of a command that expands and then collapses,
but because it is no longer a good sample of such a command, we
replace it with 'mv a b'"? Do we need to update this further when
"mv a b" learns to expand and then collapse?
Junio C Hamano wrote:
> Victoria Dye <vdye@github.com> writes:
>
>> Taylor Blau wrote:
>>> On Thu, Sep 30, 2021 at 02:50:56PM +0000, Victoria Dye via GitGitGadget wrote:
>>>> From: Victoria Dye <vdye@github.com>
>>>>
>>>> In anticipation of multiple commands being fully integrated with sparse
>>>> index, update the test for index expand and collapse for non-sparse index
>>>> integrated commands to use `mv`.
>>>> ...
>>>> GIT_TRACE2_EVENT="$(pwd)/trace2.txt" GIT_TRACE2_EVENT_NESTING=10 \
>>>> - git -C sparse-index -c core.fsmonitor="" reset --hard &&
>>>> + git -C sparse-index -c core.fsmonitor="" mv a b &&
>>>
>>> Double-checking my understanding as somebody who is not so familiar with
>>> t1092: this test is to ensure that commands which don't yet understand
>>> the sparse index can temporarily expand it in order to do their work?
>>
>> Exactly - if a command doesn't explicitly enable use of the sparse index by
>> setting `command_requires_full_index` to 0, the index is expanded if/when it
>> is first read during the command's execution and collapsed if/when it is
>> written to disk. This test makes sure that mechanism works as intended.
>
> Sorry, I do not quite follow.
>
> So is this "before this series of patches, 'reset --hard' can be
> used as a sample of a command that expands and then collapses,
> but because it is no longer a good sample of such a command, we
> replace it with 'mv a b'"?

Yes, because this series enables sparse index integration in `git reset`,
the test no longer applies to that command (but it does apply to `git mv`).

> Do we need to update this further when "mv a b"
> learns to expand and then collapse?

Unfortunately, yes. `git mv` was picked more-or-less at random from the set
of commands that read the index and don't already have sparse index
integrations (excluding those I know are planned for sparse index
integration in the near future). If `git mv` were to be updated to disable
`command_requires_full_index`, the command in the test would need to change
again.

For what it's worth, I do think the test itself is valuable, since it makes
sure a command's capability to use the sparse index is always the result of
an intentional update to (and review of) the code.
Victoria Dye <vdye@github.com> writes:
>> Do we need to update this further when "mv a b"
>> learns to expand and then collapse?
>
> Unfortunately, yes. `git mv` was picked more-or-less at random from the set
> of commands that read the index and don't already have sparse index
> integrations (excluding those I know are planned for sparse index
> integration in the near future). If `git mv` were to be updated to disable
> `command_requires_full_index`, the command in the test would need to change
> again.
>
> For what it's worth, I do think the test itself is valuable, since it makes
> sure a command's capability to use the sparse index is always the result of
> an intentional update to (and review of) the code.
Oh, of course.
I was actually wondering if it would be a good idea to leave a
command that will never be "converted" so that we can keep using it
for testing.
Perhaps a new option that is invented exactly for the purpose added
to a plumbing e.g. "git update-index --expand-collapse"?
On 30/09/21 21.50, Victoria Dye via GitGitGadget wrote:
> From: Victoria Dye <vdye@github.com>
>
> In anticipation of multiple commands being fully integrated with sparse
> index, update the test for index expand and collapse for non-sparse index
> integrated commands to use `mv`.
>
We can say "use git sparse-index mv instead of git sparse-index reset".
Why is mv used for this case?
Junio C Hamano wrote:
> "Kevin Willford via GitGitGadget" <gitgitgadget@gmail.com> writes:
>
>> @@ -127,12 +129,49 @@ static void update_index_from_diff(struct diff_queue_struct *q,
>>  		struct diff_options *opt, void *data)
>>  {
>>  	int i;
>> +	int pos;
>>  	int intent_to_add = *(int *)data;
>>
>>  	for (i = 0; i < q->nr; i++) {
>>  		struct diff_filespec *one = q->queue[i]->one;
>> +		struct diff_filespec *two = q->queue[i]->two;
>>  		int is_missing = !(one->mode && !is_null_oid(&one->oid));
>> +		int was_missing = !two->mode && is_null_oid(&two->oid);
>
> Not a problem introduced by this patch per-se, but is_missing is a
> counter-intuitive name for what the boolean wants to represent, I
> think, which was brought in by b4b313f9 (reset: support "--mixed
> --intent-to-add" mode, 2014-02-04). Before the commit, we used to
> say
>
> 	for (i = 0; i < q->nr; i++) {
> 		struct diff_filespec *one = q->queue[i]->one;
> 		if (one->mode && !is_null_sha1(one->sha1)) {
> 			... create ce out of one and add to the index ...
> 		} else
> 			remove_file_from_cache(one->path);
> 		...
>
> i.e. "if one is not missing, create a ce and add it, otherwise
> remove the path".
>
> It should have been called "one_is_missing" if we wanted to
> literally express the condition the code checked, but an even better
> name would have been given after the intent of what the code wants
> to do with the information. If the resetted-to tree (that is what
> 'one' side of the comparison in diff_cache() is) has a valid blob,
> we want it to be in the index, and otherwise, we do not want it in
> the index.
>
> Now, the patch makes things worse and I had to do the above digging
> to see why the new code is even more confusing. The 'two' side of
> the comparison is what is in the to-be-corrected-by-reset index.
> "was_missing" in contrast to "is_missing" makes it sound as if it
> was the state before whatever "is_missing" tries to represent, but
> that is not what is happening. "is_missing" does not mean "the
> entry is currently not there in the index", but "was_missing" does
> mean exactly that: "the entry is currently not there in the index".
>
> There isn't any "was" missing about it. It "is" missing in the
> index. Instead of renaming, I wonder if we can do without this new
> variable. Let's read on the patch.

The new variable can most likely be refactored away, but based on this it's
probably worth renaming "is_missing" to "is_missing_in_reset_tree" (or
inverting the boolean and using "is_in_reset_tree").

> Also, now the code uses both sides of the filepair, we must double
> check that our do_diff_cache() is *not* doing any rename detection.
> It might be even prudent to ensure that
>
> 	if (strcmp(one->path, two->path))
> 		BUG("reset drove diff-cache with rename detection");
>
> but it might be with too much paranoia. I dunno.

I don't think a rename would break what this change intends to do (although
it does break some of the current assumptions in the patch). I'll make sure
to verify the rename case works before submitting a new version, just in
case.

>>  		struct cache_entry *ce;
>> +		struct cache_entry *ce_before;
>> +		struct checkout state = CHECKOUT_INIT;
>
> These two new variables do not need this wide a scope, I would
> think. Shouldn't it be inside the body of the new "if" statement
> this patch adds?

I will likely need to make other changes to this patch and re-roll, so I'll
fix the scoping of all of the variables added here when I do.

>> +		/*
>> +		 * When using the sparse-checkout feature the cache entries
>> +		 * that are added here will not have the skip-worktree bit
>> +		 * set. Without this code there is data that is lost because
>> +		 * the files that would normally be in the working directory
>> +		 * are not there and show as deleted for the next status.
>> +		 * In the case of added files, they just disappear.
>> +		 *
>> +		 * We need to create the previous version of the files in
>> +		 * the working directory so that they will have the right
>> +		 * content and the next status call will show modified or
>> +		 * untracked files correctly.
>> +		 */
>> +		if (core_apply_sparse_checkout && !file_exists(two->path)) {
>
> In a sparsely checked out working tree, there is nothing in the
> working tree at the path. It may be because it is sparse and we
> didn't want to have anything there, or it may be because the user
> wanted to get rid of it and said "rm path" (not "git rm path") and
> this part of the tree were of interest even if the sparse checkout
> feature was used to hide other parts of the tree. With the above
> two checks alone, we cannot tell which. Let's read on.
>
>> +			pos = cache_name_pos(two->path, strlen(two->path));
>
> We check the index to see if there is an entry for it. I suspect
> that because we need to do this check anyway, we shouldn't even have
> to look at 'two' (and add a new 'was_missing' variable), because
> 'one' and 'two' came from a comparison between the resetted-to tree
> object and the current index, and if cache_name_pos() for the path
> (we can use 'one->path') says there is an entry in the index, by
> definition, 'two' would not be showing a "removed" state (i.e. "the
> resetted-to tree had it, the index does not" is what "was_missing"
> wants to say).
>
> So I wonder if it is better to
>
>  - use one->path for !file_exists() above and cache_name_pos() here
>    instead of two->path.
>
>  - drop the confusingly named 'was_missing', because (pos < 0) is
>    equivalent to it after this point, and we didn't need it up to
>    this point.
>
>> +			if ((pos >= 0 && ce_skip_worktree(active_cache[pos])) &&
>
> And we do find an entry for it. So this path is not something
> sparse cone specifies not to check out (otherwise we would have a
> tree-like entry that covers this path in the index and not an entry
> for this specific path)?
>
> Anyway, if it is marked with the skip-worktree bit, does that mean
> there is no risk that the reason why two->path does not exist in the
> working tree is because we earlier gave it in the working tree but
> it was later removed by the user? Just making sure that we are not
> breaking the end-user's wish that the path should be removed by
> resurrecting it in the working tree with a new call to
> checkout_entry().
>
>> +			    (is_missing || !was_missing)) {
>
> And in such a case, if the resetted-to tree says we shouldn't have
> the path in the resulting index, or if the original state in the
> index had this path (but because (0 <= pos) must be true for us to
> reach this point, I am not sure if "was_missing" can ever be true
> here), then do the following, which is ...
>
>> +				state.force = 1;
>> +				state.refresh_cache = 1;
>> +				state.istate = &the_index;
>> +
>> +				ce_before = make_cache_entry(&the_index, two->mode,
>> +							     &two->oid, two->path,
>> +							     0, 0);
>> +				if (!ce_before)
>> +					die(_("make_cache_entry failed for path '%s'"),
>> +					    two->path);
>> +
>> +				checkout_entry(ce_before, &state, NULL, NULL);
>
> ... to resurrect the last "git add"ed state from the index and write
> it out to the working tree. As I suspected, ce_before and state
> should be scoped inside this block and not visible outside, no?
>
> I am not sure why this behaviour is desirable. The "mixed" reset
> should not have to touch the working tree in the first place.
>
> The large comment before this block says "... will not have the
> skip-worktree bit set", but we are dealing with a case where the
> original index had a cache entry there with skip-worktree bit set,
> so isn't the more desirable outcome that the cache entry added back
> to the index has the skip-worktree bit still set and there is no
> working tree file that the user did not desire to have?
>
> And isn't it the matter of preserving the skip-worktree bit when the
> code in the post context of this hunk this patch did not touch adds
> the entry back to the index?
>
>> +			}
>> +		}
>>
>>  		if (is_missing && !intent_to_add) {
>>  			remove_file_from_cache(one->path);
>
> If we look at the code after this point, we do use "is_missing"
> information to tweak ce->ce_flags with the intent-to-add bit.
>
> Perhaps we can do a similar tweak to the cache entry to mark it with
> skip-worktree bit if the index had a cache entry at the path with
> the bit set? The code that needs to do so would only have to
> remember if the one->path is in the current index and the cache
> entry for the path has the skip-worktree bit in the body of the new
> if() statement about (core_apply_sparse_checkout && !file_exists())
> added by this patch (I am not sure if !file_exists() even matters,
> though, as the approach I am suggesting is to preserve the skip bit
> and not disturb the working tree files at all).

I think it might be easier to address these points as a whole rather than
inline. The problem this patch is attempting to solve is that, while (as you
noted) `git reset --mixed` should not touch the working tree, it is *also*
expected to preserve the files of the pre-reset state (both statements
paraphrased from the `--mixed` option doc). Normally these statements don't
conflict, but if `skip-worktree` is respected and nothing is done to the
working tree before resetting the index, `skip-worktree` files will
effectively be `reset --hard`. So, to force preservation of the pre-reset
state, the files are checked out.

Based on that high-level intent, the implementation here can be simplified
(and clarified). The condition on checking out a file (to avoid the
`reset --hard`) would be "if the path exists in the current index and the
entry in the index has `skip-worktree` enabled".

* "if the path exists in the current index" - if it does not exist in the
  index, there's nothing to preserve.
* "if the entry in the index has `skip-worktree` enabled" - if it does not,
  it's already in the working tree so we don't need to checkout.

Then, `checkout_entry()` can be run on the index entry found (rather than a
"fake" one created with `make_cache_entry`). This eliminates a lot of
unnecessary usage of `one` and `two`, which hopefully addresses some of your
concerns about them. After that, the index reset proceeds as normal (without
manual changes to the `skip-worktree` bit).

As for the issue of ignoring `skip-worktree`: all of this could be
conditioned on a "--ignore-skip-worktree-bits" flag (or something like it)
if you'd prefer the default behavior is "don't touch the working tree".
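The simplified condition proposed above (the path is in the current index and its entry has `skip-worktree` set) can be sketched as a standalone predicate. This is an illustration only, not the code in builtin/reset.c: `struct entry_model`, `find_entry`, and `needs_checkout_before_reset` are hypothetical names, and a linear search stands in for the index lookup.

```c
#include <string.h>

/* Hypothetical stand-in for an index entry. */
struct entry_model {
	const char *path;
	int skip_worktree;
};

/* Return the position of path in entries, or -1 if it is not present. */
static int find_entry(const struct entry_model *entries, int nr,
		      const char *path)
{
	int i;

	for (i = 0; i < nr; i++)
		if (!strcmp(entries[i].path, path))
			return i;
	return -1;
}

/*
 * A path is checked out before the index is reset only when it exists
 * in the current index and its entry has skip-worktree set. A path that
 * is not in the index has nothing to preserve, and one without the bit
 * is already present in the working tree.
 */
static int needs_checkout_before_reset(const struct entry_model *entries,
				       int nr, const char *path)
{
	int pos = find_entry(entries, nr, path);

	return pos >= 0 && entries[pos].skip_worktree;
}
```

With an index holding an in-cone entry without the bit and an out-of-cone entry with it, only the out-of-cone path satisfies the predicate; a path that is absent from the index never does.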
Victoria Dye via GitGitGadget wrote:
> diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
> index 0b6ff0de17d..c9b9ef4992c 100755
> --- a/t/t1092-sparse-checkout-compatibility.sh
> +++ b/t/t1092-sparse-checkout-compatibility.sh
> @@ -801,14 +801,25 @@ test_expect_success 'sparse-index is not expanded' '
> for ref in update-deep update-folder1 update-folder2 update-deep
> do
> echo >>sparse-index/README.md &&
> + ensure_not_expanded reset --mixed $ref
> ensure_not_expanded reset --hard $ref || return 1
> done &&
This is a bug - it's missing `&&` at the end of the line (and adding it will
cause the test to fail). The index is expanded if a mixed reset modifies an
entry outside the sparse cone, so I'll update the test in V2 to verify reset
between two refs with only in-cone files changed between them.
Junio C Hamano wrote:
> Victoria Dye <vdye@github.com> writes:
>
>>> Do we need to update this further when "mv a b"
>>> learns to expand and then collapse?
>>
>> Unfortunately, yes. `git mv` was picked more-or-less at random from the set
>> of commands that read the index and don't already have sparse index
>> integrations (excluding those I know are planned for sparse index
>> integration in the near future). If `git mv` were to be updated to disable
>> `command_requires_full_index`, the command in the test would need to change
>> again.
>>
>> For what it's worth, I do think the test itself is valuable, since it makes
>> sure a command's capability to use the sparse index is always the result of
>> an intentional update to (and review of) the code.
>
> Oh, of course.
>
> I was actually wondering if it would be a good idea to leave a
> command that will never be "converted" so that we can keep using it
> for testing.
>
> Perhaps a new option that is invented exactly for the purpose added
> to a plumbing e.g. "git update-index --expand-collapse"?
>
That sounds good to me! I'll add an `update-index --expand-collapse`
implementation and update the test in v2 of this series.
This series integrates the sparse index with git reset and provides
miscellaneous fixes and improvements to the command in sparse checkouts.
This includes:

 1. tests added to t1092 and p2000 to establish the baseline functionality
    of the command
 2. repository settings to enable the sparse index with ensure_full_index
    guarding any code paths that break tests without other compatibility
    updates.
 3. modifications to remove or reduce the scope in which ensure_full_index
    must be called.

The sparse index updates are predicated on a fix originating from the
microsoft/git fork [1], correcting how git reset --mixed handles resetting
entries outside the sparse checkout definition. Additionally, a performance
"bug" in next_cache_entry with sparse index is corrected, preventing
repeatedly looping over already-searched entries.

The p2000 tests demonstrate an overall ~70% execution time reduction across
all tested usages of git reset using a sparse index:

Test                                               before   after
------------------------------------------------------------------------
2000.22: git reset (full-v3)                       0.48     0.51 +6.3%
2000.23: git reset (full-v4)                       0.47     0.50 +6.4%
2000.24: git reset (sparse-v3)                     0.93     0.30 -67.7%
2000.25: git reset (sparse-v4)                     0.94     0.29 -69.1%
2000.26: git reset --hard (full-v3)                0.69     0.68 -1.4%
2000.27: git reset --hard (full-v4)                0.75     0.68 -9.3%
2000.28: git reset --hard (sparse-v3)              1.29     0.34 -73.6%
2000.29: git reset --hard (sparse-v4)              1.31     0.34 -74.0%
2000.30: git reset -- does-not-exist (full-v3)     0.54     0.51 -5.6%
2000.31: git reset -- does-not-exist (full-v4)     0.54     0.52 -3.7%
2000.32: git reset -- does-not-exist (sparse-v3)   1.02     0.31 -69.6%
2000.33: git reset -- does-not-exist (sparse-v4)   1.07     0.30 -72.0%

Changes since V1
================

 * Add --force-full-index option to update-index. The option is used to
   circumvent changing command_requires_full_index from its default value -
   right now this is effectively a no-op, but will change once update-index
   is integrated with sparse index. By using this option in the t1092
   expand/collapse test, the command used to test will not need to be
   updated with subsequent sparse index integrations.
 * Update implementation of mixed reset for entries outside sparse checkout
   definition. The condition in which a file should be checked out before
   index reset is simplified to "if it has skip-worktree enabled and a reset
   would change the file, check it out".
 * After checking the behavior of update_index_from_diff with renames,
   found that the diff used by reset does not produce diff queue entries
   with different pathnames for one and two. Because of this, and that
   nothing in the implementation seems to rely on identical path names, no
   BUG check is added.
 * Correct a bug in the "sparse-index is not expanded" tests in t1092 where
   failure of a git reset --mixed test was not being reported. Test now
   verifies an appropriate scenario with corrected failure-checking.

Thanks!
-Victoria

[1] microsoft@6b8a074

Kevin Willford (1):
  reset: behave correctly with sparse-checkout

Victoria Dye (6):
  update-index: add --force-full-index option for expand/collapse test
  reset: expand test coverage for sparse checkouts
  reset: integrate with sparse index
  reset: make sparse-aware (except --mixed)
  reset: make --mixed sparse-aware
  unpack-trees: improve performance of next_cache_entry

 Documentation/git-update-index.txt       |   5 +
 builtin/reset.c                          |  62 +++++++++-
 builtin/update-index.c                   |  11 ++
 cache-tree.c                             |  43 ++++++-
 cache.h                                  |  10 ++
 read-cache.c                             |  22 ++--
 t/perf/p2000-sparse-operations.sh        |   3 +
 t/t1092-sparse-checkout-compatibility.sh | 139 ++++++++++++++++++++++-
 t/t7114-reset-sparse-checkout.sh         |  61 ++++++++++
 unpack-trees.c                           |  23 +++-
 10 files changed, 351 insertions(+), 28 deletions(-)
 create mode 100755 t/t7114-reset-sparse-checkout.sh

base-commit: cefe983a320c03d7843ac78e73bd513a27806845
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1048%2Fvdye%2Fvdye%2Fsparse-index-part1-v2
Fetch-It-Via: git fetch
https://github.com/gitgitgadget/git pr-1048/vdye/vdye/sparse-index-part1-v2 Pull-Request: https://github.com/gitgitgadget/git/pull/1048 Range-diff vs v1: 1: 65905bf4e00 ! 1: 22c69bc6030 reset: behave correctly with sparse-checkout @@ builtin/reset.c #define REFRESH_INDEX_DELAY_WARNING_IN_MS (2 * 1000) @@ builtin/reset.c: static void update_index_from_diff(struct diff_queue_struct *q, - struct diff_options *opt, void *data) - { - int i; -+ int pos; int intent_to_add = *(int *)data; for (i = 0; i < q->nr; i++) { ++ int pos; struct diff_filespec *one = q->queue[i]->one; +- int is_missing = !(one->mode && !is_null_oid(&one->oid)); + struct diff_filespec *two = q->queue[i]->two; - int is_missing = !(one->mode && !is_null_oid(&one->oid)); -+ int was_missing = !two->mode && is_null_oid(&two->oid); ++ int is_in_reset_tree = one->mode && !is_null_oid(&one->oid); struct cache_entry *ce; -+ struct cache_entry *ce_before; -+ struct checkout state = CHECKOUT_INIT; -+ + +- if (is_missing && !intent_to_add) { + /* -+ * When using the sparse-checkout feature the cache entries -+ * that are added here will not have the skip-worktree bit -+ * set. Without this code there is data that is lost because -+ * the files that would normally be in the working directory -+ * are not there and show as deleted for the next status. -+ * In the case of added files, they just disappear. -+ * -+ * We need to create the previous version of the files in -+ * the working directory so that they will have the right -+ * content and the next status call will show modified or -+ * untracked files correctly. ++ * If the file being reset has `skip-worktree` enabled, we need ++ * to check it out to prevent the file from being hard reset. 
+ */ -+ if (core_apply_sparse_checkout && !file_exists(two->path)) { -+ pos = cache_name_pos(two->path, strlen(two->path)); -+ if ((pos >= 0 && ce_skip_worktree(active_cache[pos])) && -+ (is_missing || !was_missing)) { -+ state.force = 1; -+ state.refresh_cache = 1; -+ state.istate = &the_index; -+ -+ ce_before = make_cache_entry(&the_index, two->mode, -+ &two->oid, two->path, -+ 0, 0); -+ if (!ce_before) -+ die(_("make_cache_entry failed for path '%s'"), -+ two->path); ++ pos = cache_name_pos(two->path, strlen(two->path)); ++ if (pos >= 0 && ce_skip_worktree(active_cache[pos])) { ++ struct checkout state = CHECKOUT_INIT; ++ state.force = 1; ++ state.refresh_cache = 1; ++ state.istate = &the_index; + -+ checkout_entry(ce_before, &state, NULL, NULL); -+ } ++ checkout_entry(active_cache[pos], &state, NULL, NULL); + } - - if (is_missing && !intent_to_add) { ++ ++ if (!is_in_reset_tree && !intent_to_add) { remove_file_from_cache(one->path); + continue; + } +@@ builtin/reset.c: static void update_index_from_diff(struct diff_queue_struct *q, + if (!ce) + die(_("make_cache_entry failed for path '%s'"), + one->path); +- if (is_missing) { ++ if (!is_in_reset_tree) { + ce->ce_flags |= CE_INTENT_TO_ADD; + set_object_name_for_intent_to_add_entry(ce); + } ## t/t1092-sparse-checkout-compatibility.sh ## @@ t/t1092-sparse-checkout-compatibility.sh: test_expect_failure 'blame with pathspec outside sparse definition' ' 2: a1fa7c080ae ! 2: f7cb9013d46 sparse-index: update command for expand/collapse test @@ Metadata Author: Victoria Dye <vdye@github.com> ## Commit message ## - sparse-index: update command for expand/collapse test + update-index: add --force-full-index option for expand/collapse test - In anticipation of multiple commands being fully integrated with sparse - index, update the test for index expand and collapse for non-sparse index - integrated commands to use `mv`. 
+ Add a new `--force-full-index` option to `git update-index`, which skips + explicitly setting `command_requires_full_index`. This lets `git + update-index --force-full-index` run as a command without sparse index + compatibility implemented, even after it receives sparse index compatibility + updates. + + By using `git update-index --force-full-index` in the `t1092` test + `sparse-index is expanded and converted back`, commands can continue to + integrate with the sparse index without the need to keep modifying the + command used in the test. Signed-off-by: Victoria Dye <vdye@github.com> + ## Documentation/git-update-index.txt ## +@@ Documentation/git-update-index.txt: SYNOPSIS + [--[no-]fsmonitor] + [--really-refresh] [--unresolve] [--again | -g] + [--info-only] [--index-info] ++ [--force-full-index] + [-z] [--stdin] [--index-version <n>] + [--verbose] + [--] [<file>...] +@@ Documentation/git-update-index.txt: time. Version 4 is relatively young (first released in 1.8.0 in + October 2012). Other Git implementations such as JGit and libgit2 + may not support it yet. + ++--force-full-index:: ++ Force the command to operate on a full index, expanding a sparse ++ index if necessary. ++ + -z:: + Only meaningful with `--stdin` or `--index-info`; paths are + separated with NUL character instead of LF. 
+ + ## builtin/update-index.c ## +@@ builtin/update-index.c: int cmd_update_index(int argc, const char **argv, const char *prefix) + int split_index = -1; + int force_write = 0; + int fsmonitor = -1; ++ int use_default_full_index = 0; + struct lock_file lock_file = LOCK_INIT; + struct parse_opt_ctx_t ctx; + strbuf_getline_fn getline_fn; +@@ builtin/update-index.c: int cmd_update_index(int argc, const char **argv, const char *prefix) + {OPTION_SET_INT, 0, "no-fsmonitor-valid", &mark_fsmonitor_only, NULL, + N_("clear fsmonitor valid bit"), + PARSE_OPT_NOARG | PARSE_OPT_NONEG, NULL, UNMARK_FLAG}, ++ OPT_SET_INT(0, "force-full-index", &use_default_full_index, ++ N_("run with full index explicitly required"), 1), + OPT_END() + }; + +@@ builtin/update-index.c: int cmd_update_index(int argc, const char **argv, const char *prefix) + if (newfd < 0) + lock_error = errno; + ++ /* ++ * If --force-full-index is set, the command should skip manually ++ * setting `command_requires_full_index`. ++ */ ++ prepare_repo_settings(r); ++ if (!use_default_full_index) ++ r->settings.command_requires_full_index = 1; ++ + entries = read_cache(); + if (entries < 0) + die("cache corrupted"); + ## t/t1092-sparse-checkout-compatibility.sh ## @@ t/t1092-sparse-checkout-compatibility.sh: test_expect_success 'sparse-index is expanded and converted back' ' init_repos && GIT_TRACE2_EVENT="$(pwd)/trace2.txt" GIT_TRACE2_EVENT_NESTING=10 \ - git -C sparse-index -c core.fsmonitor="" reset --hard && -+ git -C sparse-index -c core.fsmonitor="" mv a b && ++ git -C sparse-index -c core.fsmonitor="" update-index --force-full-index && test_region index convert_to_sparse trace2.txt && test_region index ensure_full_index trace2.txt ' 3: d033c5e365f = 3: c7e9d9f4e03 reset: expand test coverage for sparse checkouts 4: 2d63a250637 = 4: 49813c8d9ed reset: integrate with sparse index 5: e919e6d3270 = 5: 78cd85d8dcc reset: make sparse-aware (except --mixed) 6: e7cda32efb6 ! 
6: 5eaae0825af reset: make --mixed sparse-aware @@ builtin/reset.c: static int read_from_tree(const struct pathspec *pathspec, ## t/t1092-sparse-checkout-compatibility.sh ## @@ t/t1092-sparse-checkout-compatibility.sh: test_expect_success 'sparse-index is not expanded' ' - for ref in update-deep update-folder1 update-folder2 update-deep - do - echo >>sparse-index/README.md && -+ ensure_not_expanded reset --mixed $ref ensure_not_expanded reset --hard $ref || return 1 done && ++ ensure_not_expanded reset --mixed base && ensure_not_expanded reset --hard update-deep && ensure_not_expanded reset --keep base && ensure_not_expanded reset --merge update-deep && 7: 8637ec1660e = 7: aa963eefae7 unpack-trees: improve performance of next_cache_entry -- gitgitgadget
From: Kevin Willford <kewillf@microsoft.com>

When using the sparse checkout feature, 'git reset' will add entries to
the index that have the skip-worktree bit off but will leave the working
directory empty. File data is lost because the index version of the files
has been changed but there is nothing in the working directory. This
causes the next 'git status' call to show modified or deleted files as
deleted and to show nothing for added files. The added files should be
shown as untracked and modified files should be shown as modified.

To fix this, when the reset is running, if there is no file in the
working directory and it will be missing with the new index entry or was
not missing in the previous version, we create the previous index version
of the file in the working directory so that status will report correctly
and the files will be available for the user to deal with.

This fixes a documented failure from t1092 that was created in 19a0acc
(t1092: test interesting sparse-checkout scenarios, 2021-01-23).

Signed-off-by: Kevin Willford <kewillf@microsoft.com> Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Victoria Dye <vdye@github.com> --- builtin/reset.c | 24 ++++++++-- t/t1092-sparse-checkout-compatibility.sh | 4 +- t/t7114-reset-sparse-checkout.sh | 61 ++++++++++++++++++++++++ 3 files changed, 83 insertions(+), 6 deletions(-) create mode 100755 t/t7114-reset-sparse-checkout.sh diff --git a/builtin/reset.c b/builtin/reset.c index 51c9e2f43ff..3b75d3b2f20 100644 --- a/builtin/reset.c +++ b/builtin/reset.c @@ -25,6 +25,8 @@ #include "cache-tree.h" #include "submodule.h" #include "submodule-config.h" +#include "dir.h" +#include "entry.h" #define REFRESH_INDEX_DELAY_WARNING_IN_MS (2 * 1000) @@ -130,11 +132,27 @@ static void update_index_from_diff(struct diff_queue_struct *q, int intent_to_add = *(int *)data; for (i = 0; i < q->nr; i++) { + int pos; struct diff_filespec *one = q->queue[i]->one; - int is_missing = !(one->mode && !is_null_oid(&one->oid)); + struct diff_filespec *two = q->queue[i]->two; + int is_in_reset_tree = one->mode && !is_null_oid(&one->oid); struct cache_entry *ce; - if (is_missing && !intent_to_add) { + /* + * If the file being reset has `skip-worktree` enabled, we need + * to check it out to prevent the file from being hard reset. 
+ */ + pos = cache_name_pos(two->path, strlen(two->path)); + if (pos >= 0 && ce_skip_worktree(active_cache[pos])) { + struct checkout state = CHECKOUT_INIT; + state.force = 1; + state.refresh_cache = 1; + state.istate = &the_index; + + checkout_entry(active_cache[pos], &state, NULL, NULL); + } + + if (!is_in_reset_tree && !intent_to_add) { remove_file_from_cache(one->path); continue; } @@ -144,7 +162,7 @@ static void update_index_from_diff(struct diff_queue_struct *q, if (!ce) die(_("make_cache_entry failed for path '%s'"), one->path); - if (is_missing) { + if (!is_in_reset_tree) { ce->ce_flags |= CE_INTENT_TO_ADD; set_object_name_for_intent_to_add_entry(ce); } diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh index 886e78715fe..c5977152661 100755 --- a/t/t1092-sparse-checkout-compatibility.sh +++ b/t/t1092-sparse-checkout-compatibility.sh @@ -459,9 +459,7 @@ test_expect_failure 'blame with pathspec outside sparse definition' ' test_all_match git blame deep/deeper2/deepest/a ' -# NEEDSWORK: a sparse-checkout behaves differently from a full checkout -# in this scenario, but it shouldn't. -test_expect_failure 'checkout and reset (mixed)' ' +test_expect_success 'checkout and reset (mixed)' ' init_repos && test_all_match git checkout -b reset-test update-deep && diff --git a/t/t7114-reset-sparse-checkout.sh b/t/t7114-reset-sparse-checkout.sh new file mode 100755 index 00000000000..a8029707fb1 --- /dev/null +++ b/t/t7114-reset-sparse-checkout.sh @@ -0,0 +1,61 @@ +#!/bin/sh + +test_description='reset when using a sparse-checkout' + +. ./test-lib.sh + +test_expect_success 'setup' ' + test_tick && + echo "checkout file" >c && + echo "modify file" >m && + echo "delete file" >d && + git add . && + git commit -m "initial commit" && + echo "added file" >a && + echo "modification of a file" >m && + git rm d && + git add . 
&& + git commit -m "second commit" && + git checkout -b endCommit +' + +test_expect_success 'reset when there is a sparse-checkout' ' + echo "/c" >.git/info/sparse-checkout && + test_config core.sparsecheckout true && + git checkout -B resetBranch && + test_path_is_missing m && + test_path_is_missing a && + test_path_is_missing d && + git reset HEAD~1 && + echo "checkout file" >expect && + test_cmp expect c && + echo "added file" >expect && + test_cmp expect a && + echo "modification of a file" >expect && + test_cmp expect m && + test_path_is_missing d +' + +test_expect_success 'reset after deleting file without skip-worktree bit' ' + git checkout -f endCommit && + git clean -xdf && + cat >.git/info/sparse-checkout <<-\EOF && + /c + /m + EOF + test_config core.sparsecheckout true && + git checkout -B resetAfterDelete && + test_path_is_file m && + test_path_is_missing a && + test_path_is_missing d && + rm -f m && + git reset HEAD~1 && + echo "checkout file" >expect && + test_cmp expect c && + echo "added file" >expect && + test_cmp expect a && + test_path_is_missing m && + test_path_is_missing d +' + +test_done -- gitgitgadget
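For readers following the per-entry logic in update_index_from_diff above, here is a minimal standalone sketch of the decision it appears to make for each diff queue entry. It is a toy model, not git code: the names toy_action, toy_decision and toy_reset_entry are hypothetical, and the four booleans stand in for is_in_reset_tree, `pos >= 0`, ce_skip_worktree() and intent_to_add from the patch.

```c
#include <stdbool.h>

/* Hypothetical action names, for illustration only (not from git). */
enum toy_action {
	TOY_REMOVE_FROM_INDEX,   /* path is dropped from the index */
	TOY_ADD_INTENT_TO_ADD,   /* path is kept as an intent-to-add entry */
	TOY_ADD_FROM_RESET_TREE  /* path takes the reset tree's mode and oid */
};

struct toy_decision {
	bool checkout_first;     /* write the current index version to disk first */
	enum toy_action action;  /* what then happens to the index entry */
};

/*
 * Mirrors the order of checks in the patched loop body:
 * 1. an existing skip-worktree entry is checked out before the reset;
 * 2. a path absent from the reset tree is removed, unless -N was given;
 * 3. otherwise an entry is added, flagged intent-to-add when the path
 *    is absent from the reset tree.
 */
struct toy_decision toy_reset_entry(bool in_reset_tree, bool in_index,
				    bool skip_worktree, bool intent_to_add)
{
	struct toy_decision d;

	d.checkout_first = in_index && skip_worktree;

	if (!in_reset_tree && !intent_to_add)
		d.action = TOY_REMOVE_FROM_INDEX;
	else if (!in_reset_tree)
		d.action = TOY_ADD_INTENT_TO_ADD;
	else
		d.action = TOY_ADD_FROM_RESET_TREE;

	return d;
}
```

Note that in this sketch, as in the patch, the checkout step is independent of the index action: a skip-worktree file is written out even when its index entry is about to be removed.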
From: Victoria Dye <vdye@github.com> Add a new `--force-full-index` option to `git update-index`, which skips explicitly setting `command_requires_full_index`. This lets `git update-index --force-full-index` run as a command without sparse index compatibility implemented, even after it receives sparse index compatibility updates. By using `git update-index --force-full-index` in the `t1092` test `sparse-index is expanded and converted back`, commands can continue to integrate with the sparse index without the need to keep modifying the command used in the test. Signed-off-by: Victoria Dye <vdye@github.com> --- Documentation/git-update-index.txt | 5 +++++ builtin/update-index.c | 11 +++++++++++ t/t1092-sparse-checkout-compatibility.sh | 2 +- 3 files changed, 17 insertions(+), 1 deletion(-) diff --git a/Documentation/git-update-index.txt b/Documentation/git-update-index.txt index 2853f168d97..06255e321a3 100644 --- a/Documentation/git-update-index.txt +++ b/Documentation/git-update-index.txt @@ -24,6 +24,7 @@ SYNOPSIS [--[no-]fsmonitor] [--really-refresh] [--unresolve] [--again | -g] [--info-only] [--index-info] + [--force-full-index] [-z] [--stdin] [--index-version <n>] [--verbose] [--] [<file>...] @@ -170,6 +171,10 @@ time. Version 4 is relatively young (first released in 1.8.0 in October 2012). Other Git implementations such as JGit and libgit2 may not support it yet. +--force-full-index:: + Force the command to operate on a full index, expanding a sparse + index if necessary. + -z:: Only meaningful with `--stdin` or `--index-info`; paths are separated with NUL character instead of LF. 
diff --git a/builtin/update-index.c b/builtin/update-index.c index 187203e8bb5..32ada3ead77 100644 --- a/builtin/update-index.c +++ b/builtin/update-index.c @@ -964,6 +964,7 @@ int cmd_update_index(int argc, const char **argv, const char *prefix) int split_index = -1; int force_write = 0; int fsmonitor = -1; + int use_default_full_index = 0; struct lock_file lock_file = LOCK_INIT; struct parse_opt_ctx_t ctx; strbuf_getline_fn getline_fn; @@ -1069,6 +1070,8 @@ int cmd_update_index(int argc, const char **argv, const char *prefix) {OPTION_SET_INT, 0, "no-fsmonitor-valid", &mark_fsmonitor_only, NULL, N_("clear fsmonitor valid bit"), PARSE_OPT_NOARG | PARSE_OPT_NONEG, NULL, UNMARK_FLAG}, + OPT_SET_INT(0, "force-full-index", &use_default_full_index, + N_("run with full index explicitly required"), 1), OPT_END() }; @@ -1082,6 +1085,14 @@ int cmd_update_index(int argc, const char **argv, const char *prefix) if (newfd < 0) lock_error = errno; + /* + * If --force-full-index is set, the command should skip manually + * setting `command_requires_full_index`. + */ + prepare_repo_settings(r); + if (!use_default_full_index) + r->settings.command_requires_full_index = 1; + entries = read_cache(); if (entries < 0) die("cache corrupted"); diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh index c5977152661..b3c0d3b98ee 100755 --- a/t/t1092-sparse-checkout-compatibility.sh +++ b/t/t1092-sparse-checkout-compatibility.sh @@ -642,7 +642,7 @@ test_expect_success 'sparse-index is expanded and converted back' ' init_repos && GIT_TRACE2_EVENT="$(pwd)/trace2.txt" GIT_TRACE2_EVENT_NESTING=10 \ - git -C sparse-index -c core.fsmonitor="" reset --hard && + git -C sparse-index -c core.fsmonitor="" update-index --force-full-index && test_region index convert_to_sparse trace2.txt && test_region index ensure_full_index trace2.txt ' -- gitgitgadget
From: Victoria Dye <vdye@github.com> Add new tests for `--merge` and `--keep` modes, as well as mixed reset with pathspecs both inside and outside of the sparse checkout definition. New performance test cases exercise various execution paths for `reset`. Co-authored-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Victoria Dye <vdye@github.com> --- t/perf/p2000-sparse-operations.sh | 3 + t/t1092-sparse-checkout-compatibility.sh | 107 +++++++++++++++++++++++ 2 files changed, 110 insertions(+) diff --git a/t/perf/p2000-sparse-operations.sh b/t/perf/p2000-sparse-operations.sh index 597626276fb..bfd332120c8 100755 --- a/t/perf/p2000-sparse-operations.sh +++ b/t/perf/p2000-sparse-operations.sh @@ -110,5 +110,8 @@ test_perf_on_all git add -A test_perf_on_all git add . test_perf_on_all git commit -a -m A test_perf_on_all git checkout -f - +test_perf_on_all git reset +test_perf_on_all git reset --hard +test_perf_on_all git reset -- does-not-exist test_done diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh index b3c0d3b98ee..f0723a6ac97 100755 --- a/t/t1092-sparse-checkout-compatibility.sh +++ b/t/t1092-sparse-checkout-compatibility.sh @@ -479,6 +479,113 @@ test_expect_success 'checkout and reset (mixed) [sparse]' ' test_sparse_match git reset update-folder2 ' +# NEEDSWORK: with mixed reset, files with differences between HEAD and <commit> +# will be added to the work tree even if outside the sparse checkout +# definition, and even if the file is modified to a state of having no local +# changes. The file is "re-ignored" if a hard reset is executed. We may want to +# change this behavior in the future and enforce that files are not written +# outside of the sparse checkout definition. 
+test_expect_success 'checkout and mixed reset file tracking [sparse]' ' + init_repos && + + test_all_match git checkout -b reset-test update-deep && + test_all_match git reset update-folder1 && + test_all_match git reset update-deep && + + # At this point, there are no changes in the working tree. However, + # folder1/a now exists locally (even though it is outside of the sparse + # paths). + run_on_sparse test_path_exists folder1 && + + run_on_all rm folder1/a && + test_all_match git status --porcelain=v2 && + + test_all_match git reset --hard update-deep && + run_on_sparse test_path_is_missing folder1 && + test_path_exists full-checkout/folder1 +' + +test_expect_success 'checkout and reset (merge)' ' + init_repos && + + write_script edit-contents <<-\EOF && + echo text >>$1 + EOF + + test_all_match git checkout -b reset-test update-deep && + run_on_all ../edit-contents a && + test_all_match git reset --merge deepest && + test_all_match git status --porcelain=v2 && + + test_all_match git reset --hard update-deep && + run_on_all ../edit-contents deep/a && + test_all_match test_must_fail git reset --merge deepest +' + +test_expect_success 'checkout and reset (keep)' ' + init_repos && + + write_script edit-contents <<-\EOF && + echo text >>$1 + EOF + + test_all_match git checkout -b reset-test update-deep && + run_on_all ../edit-contents a && + test_all_match git reset --keep deepest && + test_all_match git status --porcelain=v2 && + + test_all_match git reset --hard update-deep && + run_on_all ../edit-contents deep/a && + test_all_match test_must_fail git reset --keep deepest +' + +test_expect_success 'reset with pathspecs inside sparse definition' ' + init_repos && + + write_script edit-contents <<-\EOF && + echo text >>$1 + EOF + + test_all_match git checkout -b reset-test update-deep && + run_on_all ../edit-contents deep/a && + + test_all_match git reset base -- deep/a && + test_all_match git status --porcelain=v2 && + + test_all_match git reset base -- 
nonexistent-file && + test_all_match git status --porcelain=v2 && + + test_all_match git reset deepest -- deep && + test_all_match git status --porcelain=v2 +' + +test_expect_success 'reset with sparse directory pathspec outside definition' ' + init_repos && + + test_all_match git checkout -b reset-test update-deep && + test_all_match git reset --hard update-folder1 && + test_all_match git reset base -- folder1 && + test_all_match git status --porcelain=v2 +' + +test_expect_success 'reset with pathspec match in sparse directory' ' + init_repos && + + test_all_match git checkout -b reset-test update-deep && + test_all_match git reset --hard update-folder1 && + test_all_match git reset base -- folder1/a && + test_all_match git status --porcelain=v2 +' + +test_expect_success 'reset with wildcard pathspec' ' + init_repos && + + test_all_match git checkout -b reset-test update-deep && + test_all_match git reset --hard update-folder1 && + test_all_match git reset base -- \*/a && + test_all_match git status --porcelain=v2 +' + test_expect_success 'merge, cherry-pick, and rebase' ' init_repos && -- gitgitgadget
From: Victoria Dye <vdye@github.com> `reset --soft` does not modify the index, so no compatibility changes are needed for it to function without expanding the index. For all other reset modes (`--mixed`, `--hard`, `--keep`, `--merge`), the full index is explicitly expanded with `ensure_full_index` to maintain current behavior. Additionally, the `read_cache()` check verifying an uncorrupted index is moved after argument parsing and preparing the repo settings. The index is not used by the preceding argument handling, but `read_cache()` does need to be run after enabling sparse index for the command and before resetting. Signed-off-by: Victoria Dye <vdye@github.com> --- builtin/reset.c | 10 +++++++--- cache-tree.c | 1 + 2 files changed, 8 insertions(+), 3 deletions(-) diff --git a/builtin/reset.c b/builtin/reset.c index 3b75d3b2f20..e1f2a2bb2c4 100644 --- a/builtin/reset.c +++ b/builtin/reset.c @@ -184,6 +184,7 @@ static int read_from_tree(const struct pathspec *pathspec, opt.flags.override_submodule_config = 1; opt.repo = the_repository; + ensure_full_index(&the_index); if (do_diff_cache(tree_oid, &opt)) return 1; diffcore_std(&opt); @@ -261,9 +262,6 @@ static void parse_args(struct pathspec *pathspec, } *rev_ret = rev; - if (read_cache() < 0) - die(_("index file corrupt")); - parse_pathspec(pathspec, 0, PATHSPEC_PREFER_FULL | (patch_mode ? PATHSPEC_PREFIX_ORIGIN : 0), @@ -409,6 +407,12 @@ int cmd_reset(int argc, const char **argv, const char *prefix) if (intent_to_add && reset_type != MIXED) die(_("-N can only be used with --mixed")); + prepare_repo_settings(the_repository); + the_repository->settings.command_requires_full_index = 0; + + if (read_cache() < 0) + die(_("index file corrupt")); + /* Soft reset does not touch the index file nor the working tree * at all, but requires them in a good order. Other resets reset * the index file to the tree object we are switching to. 
*/ diff --git a/cache-tree.c b/cache-tree.c index 90919f9e345..9be19c85b66 100644 --- a/cache-tree.c +++ b/cache-tree.c @@ -776,6 +776,7 @@ void prime_cache_tree(struct repository *r, cache_tree_free(&istate->cache_tree); istate->cache_tree = cache_tree(); + ensure_full_index(istate); prime_cache_tree_rec(r, istate->cache_tree, tree); istate->cache_changed |= CACHE_TREE_CHANGED; trace2_region_leave("cache-tree", "prime_cache_tree", the_repository); -- gitgitgadget
From: Victoria Dye <vdye@github.com> In order to accurately reconstruct the cache tree in `prime_cache_tree_rec`, the function must determine whether the currently-processing directory in the tree is sparse or not. If it is not sparse, the tree is parsed and subtree recursively constructed. If it is sparse, no subtrees are added to the tree and the entry count is set to 1 (representing the sparse directory itself). Signed-off-by: Victoria Dye <vdye@github.com> --- cache-tree.c | 44 +++++++++++++++++++++--- cache.h | 10 ++++++ read-cache.c | 22 ++++++++---- t/t1092-sparse-checkout-compatibility.sh | 15 ++++++-- 4 files changed, 78 insertions(+), 13 deletions(-) diff --git a/cache-tree.c b/cache-tree.c index 9be19c85b66..9021669d682 100644 --- a/cache-tree.c +++ b/cache-tree.c @@ -740,15 +740,29 @@ out: return ret; } +static void prime_cache_tree_sparse_dir(struct repository *r, + struct cache_tree *it, + struct tree *tree, + struct strbuf *tree_path) +{ + + oidcpy(&it->oid, &tree->object.oid); + it->entry_count = 1; + return; +} + static void prime_cache_tree_rec(struct repository *r, struct cache_tree *it, - struct tree *tree) + struct tree *tree, + struct strbuf *tree_path) { + struct strbuf subtree_path = STRBUF_INIT; struct tree_desc desc; struct name_entry entry; int cnt; oidcpy(&it->oid, &tree->object.oid); + init_tree_desc(&desc, tree->buffer, tree->size); cnt = 0; while (tree_entry(&desc, &entry)) { @@ -757,27 +771,49 @@ static void prime_cache_tree_rec(struct repository *r, else { struct cache_tree_sub *sub; struct tree *subtree = lookup_tree(r, &entry.oid); + if (!subtree->object.parsed) parse_tree(subtree); sub = cache_tree_sub(it, entry.path); sub->cache_tree = cache_tree(); - prime_cache_tree_rec(r, sub->cache_tree, subtree); + strbuf_reset(&subtree_path); + strbuf_grow(&subtree_path, tree_path->len + entry.pathlen + 1); + strbuf_addbuf(&subtree_path, tree_path); + strbuf_add(&subtree_path, entry.path, entry.pathlen); + strbuf_addch(&subtree_path, '/'); 
+ + /* + * If a sparse index is in use, the directory being processed may be + * sparse. To confirm that, we can check whether an entry with that + * exact name exists in the index. If it does, the created subtree + * should be sparse. Otherwise, cache tree expansion should continue + * as normal. + */ + if (r->index->sparse_index && + index_entry_exists(r->index, subtree_path.buf, subtree_path.len)) + prime_cache_tree_sparse_dir(r, sub->cache_tree, subtree, &subtree_path); + else + prime_cache_tree_rec(r, sub->cache_tree, subtree, &subtree_path); cnt += sub->cache_tree->entry_count; } } it->entry_count = cnt; + + strbuf_release(&subtree_path); } void prime_cache_tree(struct repository *r, struct index_state *istate, struct tree *tree) { + struct strbuf tree_path = STRBUF_INIT; + trace2_region_enter("cache-tree", "prime_cache_tree", the_repository); cache_tree_free(&istate->cache_tree); istate->cache_tree = cache_tree(); - ensure_full_index(istate); - prime_cache_tree_rec(r, istate->cache_tree, tree); + prime_cache_tree_rec(r, istate->cache_tree, tree, &tree_path); + strbuf_release(&tree_path); istate->cache_changed |= CACHE_TREE_CHANGED; trace2_region_leave("cache-tree", "prime_cache_tree", the_repository); } diff --git a/cache.h b/cache.h index f6295f3b048..1d3e4665562 100644 --- a/cache.h +++ b/cache.h @@ -816,6 +816,16 @@ struct cache_entry *index_file_exists(struct index_state *istate, const char *na */ int index_name_pos(struct index_state *, const char *name, int namelen); +/* + * Determines whether an entry with the given name exists within the + * given index. The return value is 1 if an exact match is found, otherwise + * it is 0. Note that, unlike index_name_pos, this function does not expand + * the index if it is sparse. If an item exists within the full index but it + * is contained within a sparse directory (and not in the sparse index), 0 is + * returned. 
+ */ +int index_entry_exists(struct index_state *, const char *name, int namelen); + /* * Some functions return the negative complement of an insert position when a * precise match was not found but a position was found where the entry would diff --git a/read-cache.c b/read-cache.c index f5d4385c408..ea1166895f8 100644 --- a/read-cache.c +++ b/read-cache.c @@ -551,7 +551,10 @@ int cache_name_stage_compare(const char *name1, int len1, int stage1, const char return 0; } -static int index_name_stage_pos(struct index_state *istate, const char *name, int namelen, int stage) +static int index_name_stage_pos(struct index_state *istate, + const char *name, int namelen, + int stage, + int search_sparse) { int first, last; @@ -570,7 +573,7 @@ static int index_name_stage_pos(struct index_state *istate, const char *name, in first = next+1; } - if (istate->sparse_index && + if (search_sparse && istate->sparse_index && first > 0) { /* Note: first <= istate->cache_nr */ struct cache_entry *ce = istate->cache[first - 1]; @@ -586,7 +589,7 @@ static int index_name_stage_pos(struct index_state *istate, const char *name, in ce_namelen(ce) < namelen && !strncmp(name, ce->name, ce_namelen(ce))) { ensure_full_index(istate); - return index_name_stage_pos(istate, name, namelen, stage); + return index_name_stage_pos(istate, name, namelen, stage, search_sparse); } } @@ -595,7 +598,12 @@ static int index_name_stage_pos(struct index_state *istate, const char *name, in int index_name_pos(struct index_state *istate, const char *name, int namelen) { - return index_name_stage_pos(istate, name, namelen, 0); + return index_name_stage_pos(istate, name, namelen, 0, 1); +} + +int index_entry_exists(struct index_state *istate, const char *name, int namelen) +{ + return index_name_stage_pos(istate, name, namelen, 0, 0) >= 0; } int remove_index_entry_at(struct index_state *istate, int pos) @@ -1222,7 +1230,7 @@ static int has_dir_name(struct index_state *istate, */ } - pos = index_name_stage_pos(istate, 
name, len, stage); + pos = index_name_stage_pos(istate, name, len, stage, 1); if (pos >= 0) { /* * Found one, but not so fast. This could @@ -1322,7 +1330,7 @@ static int add_index_entry_with_check(struct index_state *istate, struct cache_e strcmp(ce->name, istate->cache[istate->cache_nr - 1]->name) > 0) pos = index_pos_to_insert_pos(istate->cache_nr); else - pos = index_name_stage_pos(istate, ce->name, ce_namelen(ce), ce_stage(ce)); + pos = index_name_stage_pos(istate, ce->name, ce_namelen(ce), ce_stage(ce), 1); /* existing match? Just replace it. */ if (pos >= 0) { @@ -1357,7 +1365,7 @@ static int add_index_entry_with_check(struct index_state *istate, struct cache_e if (!ok_to_replace) return error(_("'%s' appears as both a file and as a directory"), ce->name); - pos = index_name_stage_pos(istate, ce->name, ce_namelen(ce), ce_stage(ce)); + pos = index_name_stage_pos(istate, ce->name, ce_namelen(ce), ce_stage(ce), 1); pos = -pos-1; } return pos + 1; diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh index f0723a6ac97..e301ef5633a 100755 --- a/t/t1092-sparse-checkout-compatibility.sh +++ b/t/t1092-sparse-checkout-compatibility.sh @@ -786,9 +786,9 @@ test_expect_success 'sparse-index is not expanded' ' ensure_not_expanded checkout - && ensure_not_expanded switch rename-out-to-out && ensure_not_expanded switch - && - git -C sparse-index reset --hard && + ensure_not_expanded reset --hard && ensure_not_expanded checkout rename-out-to-out -- deep/deeper1 && - git -C sparse-index reset --hard && + ensure_not_expanded reset --hard && ensure_not_expanded restore -s rename-out-to-out -- deep/deeper1 && echo >>sparse-index/README.md && @@ -798,6 +798,17 @@ test_expect_success 'sparse-index is not expanded' ' echo >>sparse-index/untracked.txt && ensure_not_expanded add . 
&& + for ref in update-deep update-folder1 update-folder2 update-deep + do + echo >>sparse-index/README.md && + ensure_not_expanded reset --hard $ref || return 1 + done && + + ensure_not_expanded reset --hard update-deep && + ensure_not_expanded reset --keep base && + ensure_not_expanded reset --merge update-deep && + ensure_not_expanded reset --hard && + ensure_not_expanded checkout -f update-deep && test_config -C sparse-index pull.twohead ort && ( -- gitgitgadget
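The difference between index_name_pos and the new index_entry_exists can be sketched with a toy index, assuming (as a simplification) that the index is a sorted array of path strings and that a sparse directory entry is any name ending in '/'. All toy_* names are hypothetical; the real code compares lengths and stages and checks S_ISSPARSEDIR rather than a trailing slash.

```c
#include <string.h>

/* A toy "index": sorted path names; sparse directories end in '/'. */
struct toy_index {
	const char **names;
	int nr;
};

/*
 * Binary search for an exact name. Returns the position if found,
 * otherwise -(insert position) - 1.
 */
int toy_name_pos(const struct toy_index *idx, const char *name)
{
	int first = 0, last = idx->nr;

	while (last > first) {
		int next = first + ((last - first) >> 1);
		int cmp = strcmp(name, idx->names[next]);

		if (!cmp)
			return next;
		if (cmp < 0)
			last = next;
		else
			first = next + 1;
	}
	return -first - 1;
}

/* Exact-match check only; it never asks for the index to be expanded. */
int toy_entry_exists(const struct toy_index *idx, const char *name)
{
	return toy_name_pos(idx, name) >= 0;
}

/*
 * Would a position lookup of `name` have to expand the index? True when
 * the entry just before the insert position is a sparse directory whose
 * name is a strict prefix of `name`.
 */
int toy_needs_expansion(const struct toy_index *idx, const char *name)
{
	int pos = toy_name_pos(idx, name);
	const char *prev;
	size_t len;

	if (pos >= 0 || pos == -1)
		return 0; /* exact match, or nothing sorts before `name` */

	prev = idx->names[-pos - 2]; /* entry at (insert position - 1) */
	len = strlen(prev);
	return len > 0 && prev[len - 1] == '/' &&
	       len < strlen(name) && !strncmp(name, prev, len);
}
```

With an index holding "deep/a" and a sparse "folder1/", a lookup of "folder1/" is an exact hit, while "folder1/a" is absent from the sparse index and is the case where a position lookup would expand. prime_cache_tree_rec only needs the first kind of answer, which is why it can use the non-expanding check.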
From: Victoria Dye <vdye@github.com> Sparse directory entries are "diffed" as trees in `diff_cache` (used internally by `reset --mixed`), following a code path separate from individual file handling. The use of `diff_tree_oid` there requires setting explicit `change` and `add_remove` functions to process the internal contents of a sparse directory. Additionally, the `recursive` diff option handles cases in which `reset --mixed` must diff/merge files that are nested multiple levels deep in a sparse directory. Signed-off-by: Victoria Dye <vdye@github.com> --- builtin/reset.c | 30 +++++++++++++++++++++++- t/t1092-sparse-checkout-compatibility.sh | 13 +++++++++- 2 files changed, 41 insertions(+), 2 deletions(-) diff --git a/builtin/reset.c b/builtin/reset.c index e1f2a2bb2c4..ceb9b122897 100644 --- a/builtin/reset.c +++ b/builtin/reset.c @@ -175,6 +175,8 @@ static int read_from_tree(const struct pathspec *pathspec, int intent_to_add) { struct diff_options opt; + unsigned int i; + char *skip_worktree_seen = NULL; memset(&opt, 0, sizeof(opt)); copy_pathspec(&opt.pathspec, pathspec); @@ -182,9 +184,35 @@ static int read_from_tree(const struct pathspec *pathspec, opt.format_callback = update_index_from_diff; opt.format_callback_data = &intent_to_add; opt.flags.override_submodule_config = 1; + opt.flags.recursive = 1; opt.repo = the_repository; + opt.change = diff_change; + opt.add_remove = diff_addremove; + + /* + * When pathspec is given for resetting a cone-mode sparse checkout, it may + * identify entries that are nested in sparse directories, in which case the + * index should be expanded. For the sake of efficiency, this check is + * overly-cautious: anything with a wildcard or a magic prefix requires + * expansion, as well as literal paths that aren't in the sparse checkout + * definition AND don't match any directory in the index. 
+ */ + if (pathspec->nr && the_index.sparse_index) { + if (pathspec->magic || pathspec->has_wildcard) { + ensure_full_index(&the_index); + } else { + for (i = 0; i < pathspec->nr; i++) { + if (!path_in_cone_mode_sparse_checkout(pathspec->items[i].original, &the_index) && + !matches_skip_worktree(pathspec, i, &skip_worktree_seen)) { + ensure_full_index(&the_index); + break; + } + } + } + } + + free(skip_worktree_seen); - ensure_full_index(&the_index); if (do_diff_cache(tree_oid, &opt)) return 1; diffcore_std(&opt); diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh index e301ef5633a..4afcbc2d673 100755 --- a/t/t1092-sparse-checkout-compatibility.sh +++ b/t/t1092-sparse-checkout-compatibility.sh @@ -804,11 +804,22 @@ test_expect_success 'sparse-index is not expanded' ' ensure_not_expanded reset --hard $ref || return 1 done && + ensure_not_expanded reset --mixed base && ensure_not_expanded reset --hard update-deep && ensure_not_expanded reset --keep base && ensure_not_expanded reset --merge update-deep && - ensure_not_expanded reset --hard && + ensure_not_expanded reset base -- deep/a && + ensure_not_expanded reset base -- nonexistent-file && + ensure_not_expanded reset deepest -- deep && + + # Although folder1 is outside the sparse definition, it exists as a + # directory entry in the index, so it will be reset without needing to + # expand the full index. + ensure_not_expanded reset --hard update-folder1 && + ensure_not_expanded reset base -- folder1 && + + ensure_not_expanded reset --hard update-deep && ensure_not_expanded checkout -f update-deep && test_config -C sparse-index pull.twohead ort && ( -- gitgitgadget
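For readers following along, the expansion decision that the comment in read_from_tree() describes can be sketched in isolation. This is only a simplified model, not the code from the patch: `pathspec_model`, `sparse_model`, `path_in_cone()`, `matches_sparse_dir()` and `need_full_index()` are hypothetical names invented for illustration, and the rule that slash-free paths count as "in the cone" is an assumption made to keep the sketch short.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified model; none of these types exist in git. */
struct pathspec_model {
	const char **items;   /* literal pathspec strings */
	size_t nr;
	bool has_magic;       /* e.g. ":(exclude)" style prefixes */
	bool has_wildcard;
};

struct sparse_model {
	const char **cone_dirs;   /* directories inside the cone, e.g. "deep/" */
	size_t cone_nr;
	const char **sparse_dirs; /* sparse directory entries in the index, e.g. "folder1/" */
	size_t sparse_nr;
};

/* Assumption: a path without '/' is a top-level path and counts as in-cone. */
static bool path_in_cone(const struct sparse_model *sm, const char *path)
{
	size_t i;

	if (!strchr(path, '/'))
		return true;
	for (i = 0; i < sm->cone_nr; i++)
		if (!strncmp(path, sm->cone_dirs[i], strlen(sm->cone_dirs[i])))
			return true;
	return false;
}

/* True if path names a sparse directory entry, ignoring a trailing '/'. */
static bool matches_sparse_dir(const struct sparse_model *sm, const char *path)
{
	size_t i, len = strlen(path);

	if (len && path[len - 1] == '/')
		len--;
	for (i = 0; i < sm->sparse_nr; i++) {
		const char *dir = sm->sparse_dirs[i];
		size_t dlen = strlen(dir);

		if (dlen && dir[dlen - 1] == '/')
			dlen--;
		if (dlen == len && !strncmp(dir, path, len))
			return true;
	}
	return false;
}

/*
 * Overly cautious check: expand for any magic or wildcard pathspec, and for
 * any literal path that is outside the cone and is not itself a sparse
 * directory entry.
 */
static bool need_full_index(const struct pathspec_model *ps,
			    const struct sparse_model *sm)
{
	size_t i;

	assert(ps != NULL && sm != NULL);
	if (!ps->nr)
		return false;
	if (ps->has_magic || ps->has_wildcard)
		return true;
	for (i = 0; i < ps->nr; i++)
		if (!path_in_cone(sm, ps->items[i]) &&
		    !matches_sparse_dir(sm, ps->items[i]))
			return true;
	return false;
}
```

With a cone of "deep/" and a sparse directory entry "folder1/", this model keeps the index sparse for `deep/a`, `nonexistent-file` and `folder1`, and asks for expansion for `folder1/a` or any wildcard, which mirrors the cases exercised in the t1092 additions above.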
From: Victoria Dye <vdye@github.com> To find the first non-unpacked cache entry, `next_cache_entry` iterates through index, starting at `cache_bottom`. The performance of this in full indexes is helped by `cache_bottom` advancing with each invocation of `mark_ce_used` (called by `unpack_index_entry`). However, the presence of sparse directories can prevent the `cache_bottom` from advancing in a sparse index case, effectively forcing `next_cache_entry` to search from the beginning of the index each time it is called. The `cache_bottom` must be preserved for the sparse index (see 17a1bb570b (unpack-trees: preserve cache_bottom, 2021-07-14)). Therefore, to retain the benefit `cache_bottom` provides in non-sparse index cases, a separate `hint` position indicates the first position `next_cache_entry` should search, updated each execution with a new position. The performance of `git reset -- does-not-exist` (testing the "worst case" in which all entries in the index are unpacked with `next_cache_entry`) is significantly improved for the sparse index case: Test before after ------------------------------------------------------ (full-v3) 0.79(0.38+0.30) 0.91(0.43+0.34) +15.2% (full-v4) 0.80(0.38+0.29) 0.85(0.40+0.35) +6.2% (sparse-v3) 0.76(0.43+0.69) 0.44(0.08+0.67) -42.1% (sparse-v4) 0.71(0.40+0.65) 0.41(0.09+0.65) -42.3% Signed-off-by: Victoria Dye <vdye@github.com> --- unpack-trees.c | 23 +++++++++++++++++------ 1 file changed, 17 insertions(+), 6 deletions(-) diff --git a/unpack-trees.c b/unpack-trees.c index 8ea0a542da8..b94733de6be 100644 --- a/unpack-trees.c +++ b/unpack-trees.c @@ -645,17 +645,24 @@ static void mark_ce_used_same_name(struct cache_entry *ce, } } -static struct cache_entry *next_cache_entry(struct unpack_trees_options *o) +static struct cache_entry *next_cache_entry(struct unpack_trees_options *o, int *hint) { const struct index_state *index = o->src_index; int pos = o->cache_bottom; + if (*hint > pos) + pos = *hint; + while (pos < index->cache_nr) 
{ struct cache_entry *ce = index->cache[pos]; - if (!(ce->ce_flags & CE_UNPACKED)) + if (!(ce->ce_flags & CE_UNPACKED)) { + *hint = pos + 1; return ce; + } pos++; } + + *hint = pos; return NULL; } @@ -1365,12 +1372,13 @@ static int unpack_callback(int n, unsigned long mask, unsigned long dirmask, str /* Are we supposed to look at the index too? */ if (o->merge) { + int hint = -1; while (1) { int cmp; struct cache_entry *ce; if (o->diff_index_cached) - ce = next_cache_entry(o); + ce = next_cache_entry(o, &hint); else ce = find_cache_entry(info, p); @@ -1690,7 +1698,7 @@ static int verify_absent(const struct cache_entry *, int unpack_trees(unsigned len, struct tree_desc *t, struct unpack_trees_options *o) { struct repository *repo = the_repository; - int i, ret; + int i, hint, ret; static struct cache_entry *dfc; struct pattern_list pl; int free_pattern_list = 0; @@ -1763,13 +1771,15 @@ int unpack_trees(unsigned len, struct tree_desc *t, struct unpack_trees_options info.pathspec = o->pathspec; if (o->prefix) { + hint = -1; + /* * Unpack existing index entries that sort before the * prefix the tree is spliced into. Note that o->merge * is always true in this case. */ while (1) { - struct cache_entry *ce = next_cache_entry(o); + struct cache_entry *ce = next_cache_entry(o, &hint); if (!ce) break; if (ce_in_traverse_path(ce, &info)) @@ -1790,8 +1800,9 @@ int unpack_trees(unsigned len, struct tree_desc *t, struct unpack_trees_options /* Any left-over entries in the index? */ if (o->merge) { + hint = -1; while (1) { - struct cache_entry *ce = next_cache_entry(o); + struct cache_entry *ce = next_cache_entry(o, &hint); if (!ce) break; if (unpack_index_entry(ce, o) < 0) -- gitgitgadget
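To make the effect of the `hint` parameter easier to see outside of unpack-trees.c, here is a minimal standalone sketch of the same search. It is a model under stated assumptions rather than the patched function: `struct entry`, `struct index_model` and `next_entry()` are hypothetical stand-ins for `struct cache_entry`, `struct index_state` and `next_cache_entry()`, and `CE_UNPACKED` is redefined locally with an arbitrary value.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for git's cache_entry and index_state. */
#define CE_UNPACKED 0x1u

struct entry {
	const char *name;
	unsigned int flags;
};

struct index_model {
	struct entry *cache;
	int cache_nr;
	int cache_bottom; /* may stay put when sparse directories are present */
};

/*
 * Return the first entry at or after max(cache_bottom, *hint) that is not
 * marked CE_UNPACKED, and store in *hint the position at which the next
 * call should start searching.
 */
static struct entry *next_entry(const struct index_model *index, int *hint)
{
	int pos = index->cache_bottom;

	assert(hint != NULL);
	if (*hint > pos)
		pos = *hint;

	while (pos < index->cache_nr) {
		struct entry *ce = &index->cache[pos];

		if (!(ce->flags & CE_UNPACKED)) {
			*hint = pos + 1;
			return ce;
		}
		pos++;
	}

	*hint = pos;
	return NULL;
}
```

A caller initializes `hint` to -1 before its loop, as the patch does. Even if `cache_bottom` never advances, each call then resumes where the previous one stopped instead of rescanning from the start of the index.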
On Tue, Oct 05 2021, Victoria Dye via GitGitGadget wrote:

> The p2000 tests demonstrate an overall ~70% execution time reduction across
> all tested usages of git reset using a sparse index:

[...]

> Test                                               before   after
> ------------------------------------------------------------------------
> 2000.22: git reset (full-v3)                       0.48     0.51 +6.3%
> 2000.23: git reset (full-v4)                       0.47     0.50 +6.4%
> 2000.24: git reset (sparse-v3)                     0.93     0.30 -67.7%
> 2000.25: git reset (sparse-v4)                     0.94     0.29 -69.1%
> 2000.26: git reset --hard (full-v3)                0.69     0.68 -1.4%
> 2000.27: git reset --hard (full-v4)                0.75     0.68 -9.3%
> 2000.28: git reset --hard (sparse-v3)              1.29     0.34 -73.6%
> 2000.29: git reset --hard (sparse-v4)              1.31     0.34 -74.0%
> 2000.30: git reset -- does-not-exist (full-v3)     0.54     0.51 -5.6%
> 2000.31: git reset -- does-not-exist (full-v4)     0.54     0.52 -3.7%
> 2000.32: git reset -- does-not-exist (sparse-v3)   1.02     0.31 -69.6%
> 2000.33: git reset -- does-not-exist (sparse-v4)   1.07     0.30 -72.0%

This series looks like it really improves some cases, but at the cost of
that -70% improvement we've got a ~5% regression in 7/7 for the full-v3
`-- does-not-exist` cases. As noted in your 7/7 (which improves all other
cases):

    (full-v3)      0.79(0.38+0.30)   0.91(0.43+0.34) +15.2%
    (full-v4)      0.80(0.38+0.29)   0.85(0.40+0.35) +6.2%

Which b.t.w. I had to read a couple of times before realizing that its
quoted table:

    Test           before            after
    ------------------------------------------------------
    (full-v3)      0.79(0.38+0.30)   0.91(0.43+0.34) +15.2%
    (full-v4)      0.80(0.38+0.29)   0.85(0.40+0.35) +6.2%
    (sparse-v3)    0.76(0.43+0.69)   0.44(0.08+0.67) -42.1%
    (sparse-v4)    0.71(0.40+0.65)   0.41(0.09+0.65) -42.3%

is just the does-not-exist part of this bigger table. Are the other
cases all ~0% changed, or ...?

Anyway, until 7/7 the v3 had been sped up, but a ~10% increase landed us
at ~+6%, and full-v4 had been ~0% but got ~6% worse?

Is there a way we can get those improvements in performance without
regressing on the full-* cases?

Also, these tests only check sparse performance, but isn't some of the
code being modified here general enough to not be used exclusively by
the sparse mode, full checkout cone or not?

It looks fairly easy to extend p2000-sparse-operations.sh to run the
same tests but just pretend that it's running in a "full" mode without
actually setting up anything sparse-specific (the meat of those tests
just runs "git status" etc.). How does that look with this series?

Since only the CL and 7/7 quote numbers from p2000, and 7/7 is at least
a partial regression, it would be nice to have perf numbers on each
commit (if only as a one-off for ML consumption). Are there any more
improvements followed by regressions followed by improvements as we go
along? Would be useful to know...
"Kevin Willford via GitGitGadget" <gitgitgadget@gmail.com> writes: > When using the sparse checkout feature, 'git reset' will add entries to > the index that will have the skip-worktree bit off but will leave the > working directory empty. File data is lost because the index version of > the files has been changed but there is nothing that is in the working > directory. This will cause the next 'git status' call to show either > deleted for files modified or deleting or nothing for files added. The > added files should be shown as untracked and modified files should be > shown as modified. I am on vacation today, so let me be brief. Let me see if I am understanding the situation correctly. We have the index, with a path that records a blob, but the path is marked with skip-wortree bit. $ rm -fr test && mkdir test && cd test $ git init . $ date >no-skip $ date >skip $ git add no-skip skip $ git commit -m initial 2 files changed, 2 insertions(+) create mode 100644 no-skip create mode 100644 skip $ date >no-skip $ date >skip $ git add no-skip skip $ git update-index --skip-wortree skip $ rm skip $ git commit -m second [master e9088ad] second 2 files changed, 2 insertions(+), 2 deletions(-) $ ls *skip no-skip $ git ls-files -t H no-skip S skip $ git status On branch master nothing to commit, working tree clean Note. There is no 'reset' done yet so far. The user is happy with the state because (1) The user marked the path "skip" with skip-worktree bit, and thanks to that, even though "skip" is absent in the working tree, the "git status" does not complain. (2) The user marked the path "skip" with skip-worktree bit because the user did not want to see such a file in the working tree. And "git commit -m second", "git ls-files -t", or "git status" that were done to get here did not make it materialize in the working tree all of sudden. And then the user says "git reset HEAD^" to switch to a different commit. 

    $ git reset HEAD^
    $ ls *skip
    no-skip
    $ git ls-files -t
    M no-skip
    D skip
    $ git status -suno
    M no-skip
    D skip

The user is unhappy with the state because "skip" is shown as lost.

Do I understand the situation you are trying to deal with correctly?

> To fix this when the reset is running if there is not a file in the
> working directory and if it will be missing with the new index entry or
> was not missing in the previous version, we create the previous index
> version of the file in the working directory so that status will report
> correctly and the files will be availble for the user to deal with.

Assuming I read the problem description correctly, I am highly
skeptical that the above is a correct approach to keep the user
happy. Yes, if you created a working tree file with contents that
match the blob recorded for the path in the initial commit when
"reset HEAD^" is done, you may keep "git status" quiet, so (1) above
will be kept, but what about (2)? The user marked the path "skip"
precisely because the path should not appear in the working tree.
The "fix" is countermanding that wish by the user, isn't it?

Wouldn't a fix to the situation be to

 * Add the blob for "skip" taken from the initial commit to the
   index, just like the entry for "no-skip" is updated;

 * But remember that "skip" was marked with "skip-worktree" bit
   immediately before "git reset" was asked to do its thing, and
   re-add the bit to the path in the index before "git reset" gives
   the control back to the user;

 * And keep the working tree untouched, without writing anything out
   to "skip". If the user had a (possibly unrelated) file there, it
   will not be overwritten, and if the user left the path absent, it
   will still be absent.

so that the last three diagnostic commands in the above sample
sequence would instead read:

    $ ls *skip
    no-skip
    $ git ls-files -t
    M no-skip
    S skip
    $ git status -suno
    M no-skip

i.e. skip gets updated in the index only, nothing changes in the
working tree for "skip" or "no-skip", and status reports that
"no-skip" is different from the index but "skip" hasn't changed in
the working tree since the index (thanks to its skip-worktree bit).

Then the user will be happy in the same way as the user was happy
immediately after the state marked with "There is no 'reset' done
yet so far." above, on both counts, not just for "status does not
report something got changed" part but also "user didn't want to see
'skip' in the working tree, and 'skip' did not materialize" part.

Thanks.
Ævar Arnfjörð Bjarmason wrote: > > On Tue, Oct 05 2021, Victoria Dye via GitGitGadget wrote: > >> The p2000 tests demonstrate an overall ~70% execution time reduction across >> all tested usages of git reset using a sparse index: > > [...] > >> Test before after >> ------------------------------------------------------------------------ >> 2000.22: git reset (full-v3) 0.48 0.51 +6.3% >> 2000.23: git reset (full-v4) 0.47 0.50 +6.4% >> 2000.24: git reset (sparse-v3) 0.93 0.30 -67.7% >> 2000.25: git reset (sparse-v4) 0.94 0.29 -69.1% >> 2000.26: git reset --hard (full-v3) 0.69 0.68 -1.4% >> 2000.27: git reset --hard (full-v4) 0.75 0.68 -9.3% >> 2000.28: git reset --hard (sparse-v3) 1.29 0.34 -73.6% >> 2000.29: git reset --hard (sparse-v4) 1.31 0.34 -74.0% >> 2000.30: git reset -- does-not-exist (full-v3) 0.54 0.51 -5.6% >> 2000.31: git reset -- does-not-exist (full-v4) 0.54 0.52 -3.7% >> 2000.32: git reset -- does-not-exist (sparse-v3) 1.02 0.31 -69.6% >> 2000.33: git reset -- does-not-exist (sparse-v4) 1.07 0.30 -72.0% > > This series looks like it really improves some cases, but at the cost of > that -70% improvement we've got a ~5% regression in 7/7 for the full-v3 > --does-not-exist cases. As noted in your 7/7 (which improves all other > cases): > > (full-v3) 0.79(0.38+0.30) 0.91(0.43+0.34) +15.2% > (full-v4) 0.80(0.38+0.29) 0.85(0.40+0.35) +6.2% > New performance numbers at the end - I think I have an explanation for this. > Which b.t.w. I had to read a couple of times before realizig that its > quoted: > > Test before after > ------------------------------------------------------ > (full-v3) 0.79(0.38+0.30) 0.91(0.43+0.34) +15.2% > (full-v4) 0.80(0.38+0.29) 0.85(0.40+0.35) +6.2% > (sparse-v3) 0.76(0.43+0.69) 0.44(0.08+0.67) -42.1% > (sparse-v4) 0.71(0.40+0.65) 0.41(0.09+0.65) -42.3% > > Is just the does-not-exist part of this bigger table, are the other > cases all ~0% changed, or ...? > These numbers were for the `git reset -- does-not-exist` case only. 
If I end up needing to send a V3, though, I'll probably remove the
performance numbers from 7/7 altogether - looking at them now, they make
the commit message somewhat cluttered. That said, performance numbers *are*
helpful for reviews on the mailing list, so I'd keep the information in the
cover letter at the very least.

> Anyway, until 7/7 the v3 had been sped up, but a ~10% increase landed us
> at ~+6%, and full-v4 had been ~0% but got ~6% worse?
>
> Is there a way we can get those improvements in performance without
> regressing on the full-* cases?
>
> Also, these tests only check sparse performance, but isn't some of the
> code being modified here general enough to not be used exclusively by
> the sparse mode, full checkout cone or not?
>
> It looks fairly easy to extend p2000-sparse-operations.sh to run the
> same tests but just pretend that it's running in a "full" mode without
> actually setting up anything sparse-specific (the meat of those tests
> just runs "git status" etc.). How does that look with this series?
>

I updated `p2000` locally to do this but the setup was substantially slower
for the full checkout, to the point that it was infeasible to run the
complete test for all relevant commits. Looking at the changes in this
series, nothing appears to affect the full checkout case differently than
the sparse checkout/full index case, so I'm fairly confident there won't be
a regression specific to full checkouts.

> Since only the CL and 7/7 quote numbers from p2000, and 7/7 is at least
> a partial regression, it would be nice to have perf numbers on each
> commit (if only as a one-off for ML consumption). Are there any more
> improvements followed by regressions followed by improvements as we go
> along? Would be useful to know...
>

I don't think any of the apparent slowdowns seen in these results represent
real regressions.
After re-running the performance tests, I saw variability of up to ~20% execution time across changes with commands that should see no effect on their execution time (e.g. sparse-v* from 1/7 to 4/7). Additionally, I saw different increases & decreases each time for each end-to-end run of the tests. The most reliable, noticeable changes across the test executions were: 1. When each variant of `git reset` was integrated with sparse index, a 65-75% execution time reduction in relevant sparse-v* tests. 2. `git reset -- does-not-exist` slower than `git reset` in 6/7, then matching its speed after 7/7. 3. As of 7/7, full-v* to sparse-v* showing a 50% execution time reduction. My guess is that the variability comes from general "uncontrolled" factors when running the tests (e.g., background processes on my system). The good news is, when the tests are re-run with more trials (and the recent bugfix to `t/perf/perf-lib.sh` [1]), the execution times look a lot less worrisome (apologies for the table width, but I'd like to err on the side of providing more complete information): Test base [1/7] [4/7] [5/7] [6/7] [7/7] ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- 2000.22: git reset (full-v3) 0.44(0.16+0.19) 0.44(0.17+0.18) +0.0% 0.44(0.17+0.19) +0.0% 0.45(0.17+0.18) +2.3% 0.44(0.17+0.19) +0.0% 0.45(0.17+0.18) +2.3% 2000.23: git reset (full-v4) 0.43(0.16+0.18) 0.43(0.16+0.19) +0.0% 0.45(0.17+0.18) +4.7% 0.44(0.17+0.18) +2.3% 0.44(0.17+0.18) +2.3% 0.44(0.18+0.18) +2.3% 2000.24: git reset (sparse-v3) 0.82(0.54+0.19) 0.84(0.56+0.19) +2.4% 0.81(0.54+0.19) -1.2% 0.88(0.60+0.19) +7.3% 0.27(0.03+0.45) -67.1% 0.27(0.03+0.47) -67.1% 2000.25: git reset (sparse-v4) 0.82(0.55+0.18) 0.82(0.53+0.20) +0.0% 0.83(0.55+0.19) +1.2% 0.82(0.54+0.19) +0.0% 0.27(0.03+0.50) -67.1% 0.27(0.03+0.48) -67.1% 2000.26: git reset --hard (full-v3) 0.71(0.38+0.24) 
0.69(0.37+0.23) -2.8% 0.70(0.37+0.24) -1.4% 0.78(0.41+0.27) +9.9% 0.71(0.38+0.25) +0.0% 0.70(0.37+0.23) -1.4% 2000.27: git reset --hard (full-v4) 0.71(0.38+0.23) 0.77(0.42+0.25) +8.5% 0.76(0.41+0.26) +7.0% 0.72(0.40+0.24) +1.4% 0.68(0.37+0.23) -4.2% 0.67(0.36+0.22) -5.6% 2000.28: git reset --hard (sparse-v3) 1.29(0.93+0.26) 1.33(0.95+0.27) +3.1% 1.11(0.76+0.25) -14.0% 0.38(0.05+0.25) -70.5% 0.36(0.04+0.22) -72.1% 0.34(0.04+0.21) -73.6% 2000.29: git reset --hard (sparse-v4) 1.17(0.84+0.24) 1.10(0.79+0.23) -6.0% 1.01(0.69+0.24) -13.7% 0.42(0.05+0.26) -64.1% 0.39(0.05+0.25) -66.7% 0.38(0.05+0.23) -67.5% 2000.30: git reset -- does-not-exist (full-v3) 0.50(0.19+0.20) 0.50(0.19+0.20) +0.0% 0.53(0.21+0.22) +6.0% 0.47(0.18+0.19) -6.0% 0.45(0.18+0.18) -10.0% 0.45(0.18+0.19) -10.0% 2000.31: git reset -- does-not-exist (full-v4) 0.45(0.18+0.18) 0.46(0.18+0.19) +2.2% 0.47(0.19+0.19) +4.4% 0.45(0.18+0.19) +0.0% 0.45(0.18+0.18) +0.0% 0.45(0.18+0.18) +0.0% 2000.32: git reset -- does-not-exist (sparse-v3) 1.01(0.70+0.21) 0.91(0.62+0.20) -9.9% 0.93(0.64+0.20) -7.9% 0.89(0.61+0.20) -11.9% 0.48(0.23+0.46) -52.5% 0.27(0.03+0.49) -73.3% 2000.33: git reset -- does-not-exist (sparse-v4) 0.99(0.67+0.21) 1.02(0.70+0.22) +3.0% 1.04(0.70+0.22) +5.1% 0.83(0.55+0.19) -16.2% 0.48(0.24+0.48) -51.5% 0.27(0.03+0.49) -72.7% Note that some commits in this series are not included because they don't touch any code used by `git reset`. [1] https://lore.kernel.org/git/pull.1051.git.1633386543759.gitgitgadget@gmail.com/
Junio C Hamano wrote:
> Wouldn't a fix to the situation be to
>
>  * Add the blob for "skip" taken from the initial commit to the
>    index, just like the entry for "no-skip" is updated;
>
>  * But remember that "skip" was marked with "skip-worktree" bit
>    immediately before "git reset" was asked to do its thing, and
>    re-add the bit to the path in the index before "git reset" gives
>    the control back to the user;
>
>  * And keep the working tree untouched, without writing anything out
>    to "skip". If the user had a (possibly unrelated) file there, it
>    will not be overwritten, and if the user left the path absent, it
>    will still be absent.
>
> so that the last three diagnostic commands in the above sample
> sequence would instead read:
>
>     $ ls *skip
>     no-skip
>     $ git ls-files -t
>     M no-skip
>     S skip
>     $ git status -suno
>     M no-skip
>
> i.e. skip gets updated in the index only, nothing changes in the
> working tree for "skip" or "no-skip", and status reports that
> "no-skip" is different from the index but "skip" hasn't changed in
> the working tree since the index (thanks to its skip-worktree bit).
>
> Then the user will be happy in the same way as the user was happy
> immediately after the state marked with "There is no 'reset' done
> yet so far." above, on both counts, not just for "status does not
> report something got changed" part but also "user didn't want to see
> 'skip' in the working tree, and 'skip' did not materialize" part.
>
> Thanks.
>

Thanks for the thorough explanation, I'm on-board with your approach (and
will re-roll the series with that implemented). A lot of my thought process
(and confusion) came from a comment in e5ca291076 (t1092: document bad
sparse-checkout behavior, 2021-07-14) suggesting that full and sparse
checkouts should have the same result in scenarios like the one you
outlined above. The problem is, as noted earlier, it's impossible to tell
whether (using your example):

1. the user deleted `skip` because they intentionally want to remove it from
   the worktree, and it should continue to be deleted after a reset.
2. `skip` doesn't exist in the worktree because it's excluded from the
   sparse checkout definition and the user does not want its current state
   "deleted" after a reset.

As a result, there's no way `git reset --mixed` could be expected to behave
the same way in full checkouts as it does in sparse, and the most consistent
solution is that the worktree should remain untouched with `skip-worktree`
preserved.
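
As a rough illustration of the behavior agreed on above (update the index entry, carry the `skip-worktree` bit over, leave the worktree alone), here is a toy model. It is a sketch only: `struct entry_model`, `SKIP_WORKTREE` and `reset_entry_mixed()` are hypothetical names, the blob is modeled as a plain string, and none of this is taken from builtin/reset.c.

```c
#include <assert.h>
#include <stddef.h>

#define SKIP_WORKTREE 0x1u /* hypothetical flag bit for this sketch */

struct entry_model {
	const char *path;
	const char *blob;   /* stand-in for the object ID recorded in the index */
	unsigned int flags;
};

/*
 * Model of a mixed reset for one path: replace the index contents with the
 * target blob, and re-apply skip-worktree if the entry had it beforehand.
 */
static void reset_entry_mixed(struct entry_model *ce, const char *target_blob)
{
	unsigned int was_skipped;

	assert(ce != NULL);
	was_skipped = ce->flags & SKIP_WORKTREE;

	ce->blob = target_blob;   /* index-only update */
	ce->flags = 0;            /* model a freshly created entry */
	ce->flags |= was_skipped; /* remember the bit across the reset */

	/* Deliberately no worktree write: absent files stay absent. */
}
```

In terms of the example, `skip` would come out of the reset with the initial commit's blob and still show as `S` in `git ls-files -t`, while `no-skip` would have no bit to restore and show as modified.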
Hi!

It appears Junio has already commented on this patch and in more
detail, but since I already typed up some comments I'll send them
along in case they are useful.

On Tue, Oct 5, 2021 at 6:20 AM Kevin Willford via GitGitGadget
<gitgitgadget@gmail.com> wrote:
>
> From: Kevin Willford <kewillf@microsoft.com>
>
> When using the sparse checkout feature, 'git reset' will add entries to
> the index that will have the skip-worktree bit off but will leave the
> working directory empty.

Yes, that seems like a problem.

> File data is lost because the index version of
> the files has been changed but there is nothing that is in the working
> directory. This will cause the next 'git status' call to show either
> deleted for files modified or deleting or nothing for files added. The
> added files should be shown as untracked and modified files should be
> shown as modified.

Why is the solution to add the files to the working tree rather than
to make sure the files have the skip-worktree bit set? That's not at
all what I would have expected.

> To fix this when the reset is running if there is not a file in the
> working directory and if it will be missing with the new index entry or
> was not missing in the previous version, we create the previous index
> version of the file in the working directory so that status will report
> correctly and the files will be availble for the user to deal with.

s/availble/available/

>
> This fixes a documented failure from t1092 that was created in 19a0acc
> (t1092: test interesting sparse-checkout scenarios, 2021-01-23).
> > Signed-off-by: Kevin Willford <kewillf@microsoft.com> > Signed-off-by: Derrick Stolee <dstolee@microsoft.com> > Signed-off-by: Victoria Dye <vdye@github.com> > --- > builtin/reset.c | 24 ++++++++-- > t/t1092-sparse-checkout-compatibility.sh | 4 +- > t/t7114-reset-sparse-checkout.sh | 61 ++++++++++++++++++++++++ > 3 files changed, 83 insertions(+), 6 deletions(-) > create mode 100755 t/t7114-reset-sparse-checkout.sh > > diff --git a/builtin/reset.c b/builtin/reset.c > index 51c9e2f43ff..3b75d3b2f20 100644 > --- a/builtin/reset.c > +++ b/builtin/reset.c > @@ -25,6 +25,8 @@ > #include "cache-tree.h" > #include "submodule.h" > #include "submodule-config.h" > +#include "dir.h" > +#include "entry.h" > > #define REFRESH_INDEX_DELAY_WARNING_IN_MS (2 * 1000) > > @@ -130,11 +132,27 @@ static void update_index_from_diff(struct diff_queue_struct *q, > int intent_to_add = *(int *)data; > > for (i = 0; i < q->nr; i++) { > + int pos; > struct diff_filespec *one = q->queue[i]->one; > - int is_missing = !(one->mode && !is_null_oid(&one->oid)); > + struct diff_filespec *two = q->queue[i]->two; > + int is_in_reset_tree = one->mode && !is_null_oid(&one->oid); Isn't !is_null_oid(&one->oid) redundant to checking one->mode? When does the diff machinery ever give you a non-zero mode with a null oid? Also, is_in_reset_tree == !is_missing; I'll note that below. > struct cache_entry *ce; > > + /* > + * If the file being reset has `skip-worktree` enabled, we need > + * to check it out to prevent the file from being hard reset. I don't understand this comment. If the file wasn't originally in the index (is_missing), and is being added to it, and is correctly marked as skip_worktree, and the file isn't in the working tree, then it sounds like everything is already in a good state. Files outside the sparse checkout are meant to have the skip_worktree bit set and be missing from the working tree. Also, I don't know what you mean by 'hard reset' here. 
> + */ > + pos = cache_name_pos(two->path, strlen(two->path)); > + if (pos >= 0 && ce_skip_worktree(active_cache[pos])) { > + struct checkout state = CHECKOUT_INIT; > + state.force = 1; > + state.refresh_cache = 1; > + state.istate = &the_index; > + > + checkout_entry(active_cache[pos], &state, NULL, NULL); Does this introduce an error in the opposite direction from the one stated in the commit message? Namely we have two things that should be in sync: the skip_worktree flag stating whether the file should be present in the working directory (skip_worktree), and the question of whether the file is actually in the working directory. In the commit message, you pointed out a case where the y were out of sync one way: the skip_worktree flag was not set but the file was missing. Here you say the skip_worktree flag is set, but you add it to the working tree anyway. Or am I misunderstanding the code? > + } > + [I did some slight editing to the diff to make the next two parts appear next to each other] > - if (is_missing && !intent_to_add) { > + if (!is_in_reset_tree && !intent_to_add) { I thought this was some subtle bugfix or something, and spent a while trying to figure it out, before realizing that is_in_reset_tree was simply defined as !is_missing (for some reason I was assuming it was dealing with two->mode while is_missing was looking at one->mode). So this is a simple variable renaming, which I think is probably good, but I'd prefer if this was separated into a different patch to make it easier to review. > remove_file_from_cache(one->path); > continue; > } > @@ -144,7 +162,7 @@ static void update_index_from_diff(struct diff_queue_struct *q, > if (!ce) > die(_("make_cache_entry failed for path '%s'"), > one->path); > - if (is_missing) { > + if (!is_in_reset_tree) { same note as above; the variable rename is good, but should be a separate patch. 
> ce->ce_flags |= CE_INTENT_TO_ADD; > set_object_name_for_intent_to_add_entry(ce); > } > diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh > index 886e78715fe..c5977152661 100755 > --- a/t/t1092-sparse-checkout-compatibility.sh > +++ b/t/t1092-sparse-checkout-compatibility.sh > @@ -459,9 +459,7 @@ test_expect_failure 'blame with pathspec outside sparse definition' ' > test_all_match git blame deep/deeper2/deepest/a > ' > > -# NEEDSWORK: a sparse-checkout behaves differently from a full checkout > -# in this scenario, but it shouldn't. > -test_expect_failure 'checkout and reset (mixed)' ' > +test_expect_success 'checkout and reset (mixed)' ' > init_repos && > > test_all_match git checkout -b reset-test update-deep && > diff --git a/t/t7114-reset-sparse-checkout.sh b/t/t7114-reset-sparse-checkout.sh > new file mode 100755 > index 00000000000..a8029707fb1 > --- /dev/null > +++ b/t/t7114-reset-sparse-checkout.sh > @@ -0,0 +1,61 @@ > +#!/bin/sh > + > +test_description='reset when using a sparse-checkout' > + > +. ./test-lib.sh > + > +test_expect_success 'setup' ' > + test_tick && > + echo "checkout file" >c && > + echo "modify file" >m && > + echo "delete file" >d && > + git add . && > + git commit -m "initial commit" && > + echo "added file" >a && > + echo "modification of a file" >m && > + git rm d && > + git add . 
&& > + git commit -m "second commit" && > + git checkout -b endCommit > +' > + > +test_expect_success 'reset when there is a sparse-checkout' ' > + echo "/c" >.git/info/sparse-checkout && > + test_config core.sparsecheckout true && > + git checkout -B resetBranch && > + test_path_is_missing m && > + test_path_is_missing a && > + test_path_is_missing d && > + git reset HEAD~1 && > + echo "checkout file" >expect && > + test_cmp expect c && > + echo "added file" >expect && > + test_cmp expect a && > + echo "modification of a file" >expect && > + test_cmp expect m && > + test_path_is_missing d > +' > + > +test_expect_success 'reset after deleting file without skip-worktree bit' ' > + git checkout -f endCommit && > + git clean -xdf && > + cat >.git/info/sparse-checkout <<-\EOF && > + /c > + /m > + EOF > + test_config core.sparsecheckout true && > + git checkout -B resetAfterDelete && > + test_path_is_file m && > + test_path_is_missing a && > + test_path_is_missing d && > + rm -f m && > + git reset HEAD~1 && > + echo "checkout file" >expect && > + test_cmp expect c && > + echo "added file" >expect && > + test_cmp expect a && > + test_path_is_missing m && > + test_path_is_missing d > +' > + > +test_done > -- > gitgitgadget >
On Tue, Oct 5, 2021 at 6:20 AM Victoria Dye via GitGitGadget
<gitgitgadget@gmail.com> wrote:
>
> From: Victoria Dye <vdye@github.com>
>
> Add a new `--force-full-index` option to `git update-index`, which skips
> explicitly setting `command_requires_full_index`. This lets `git
> update-index --force-full-index` run as a command without sparse index
> compatibility implemented, even after it receives sparse index compatibility
> updates.
>
> By using `git update-index --force-full-index` in the `t1092` test
> `sparse-index is expanded and converted back`, commands can continue to
> integrate with the sparse index without the need to keep modifying the
> command used in the test.

So...we're adding a permanent user-facing command line flag, whose
purpose is just to help us with the transition work of implementing
sparse indexes everywhere? Am I reading that right, or is that just
the reason for t1092 and there are more reasons for it elsewhere?

Also, I'm curious if update-index is the right place to add this. If
you don't want a sparse index anymore, wouldn't a user want to run

    git sparse-checkout disable

? Or is the point that you do want to keep the sparse checkout, but
you just don't want the index to also be sparse? Still, even in that
case, it seems like adding a subcommand or flag to an existing
sparse-checkout subcommand would feel more natural, since
sparse-checkout is the command the user uses to request to get into a
sparse-checkout and sparse index.

> Signed-off-by: Victoria Dye <vdye@github.com> > --- > Documentation/git-update-index.txt | 5 +++++ > builtin/update-index.c | 11 +++++++++++ > t/t1092-sparse-checkout-compatibility.sh | 2 +- > 3 files changed, 17 insertions(+), 1 deletion(-) > > diff --git a/Documentation/git-update-index.txt b/Documentation/git-update-index.txt > index 2853f168d97..06255e321a3 100644 > --- a/Documentation/git-update-index.txt > +++ b/Documentation/git-update-index.txt > @@ -24,6 +24,7 @@ SYNOPSIS > [--[no-]fsmonitor] > [--really-refresh] [--unresolve] [--again | -g] > [--info-only] [--index-info] > + [--force-full-index] > [-z] [--stdin] [--index-version <n>] > [--verbose] > [--] [<file>...] > @@ -170,6 +171,10 @@ time. Version 4 is relatively young (first released in 1.8.0 in > October 2012). Other Git implementations such as JGit and libgit2 > may not support it yet. > > +--force-full-index:: > + Force the command to operate on a full index, expanding a sparse > + index if necessary. > + > -z:: > Only meaningful with `--stdin` or `--index-info`; paths are > separated with NUL character instead of LF. 
> diff --git a/builtin/update-index.c b/builtin/update-index.c > index 187203e8bb5..32ada3ead77 100644 > --- a/builtin/update-index.c > +++ b/builtin/update-index.c > @@ -964,6 +964,7 @@ int cmd_update_index(int argc, const char **argv, const char *prefix) > int split_index = -1; > int force_write = 0; > int fsmonitor = -1; > + int use_default_full_index = 0; > struct lock_file lock_file = LOCK_INIT; > struct parse_opt_ctx_t ctx; > strbuf_getline_fn getline_fn; > @@ -1069,6 +1070,8 @@ int cmd_update_index(int argc, const char **argv, const char *prefix) > {OPTION_SET_INT, 0, "no-fsmonitor-valid", &mark_fsmonitor_only, NULL, > N_("clear fsmonitor valid bit"), > PARSE_OPT_NOARG | PARSE_OPT_NONEG, NULL, UNMARK_FLAG}, > + OPT_SET_INT(0, "force-full-index", &use_default_full_index, > + N_("run with full index explicitly required"), 1), > OPT_END() > }; > > @@ -1082,6 +1085,14 @@ int cmd_update_index(int argc, const char **argv, const char *prefix) > if (newfd < 0) > lock_error = errno; > > + /* > + * If --force-full-index is set, the command should skip manually > + * setting `command_requires_full_index`. > + */ > + prepare_repo_settings(r); > + if (!use_default_full_index) > + r->settings.command_requires_full_index = 1; > + > entries = read_cache(); > if (entries < 0) > die("cache corrupted"); > diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh > index c5977152661..b3c0d3b98ee 100755 > --- a/t/t1092-sparse-checkout-compatibility.sh > +++ b/t/t1092-sparse-checkout-compatibility.sh > @@ -642,7 +642,7 @@ test_expect_success 'sparse-index is expanded and converted back' ' > init_repos && > > GIT_TRACE2_EVENT="$(pwd)/trace2.txt" GIT_TRACE2_EVENT_NESTING=10 \ > - git -C sparse-index -c core.fsmonitor="" reset --hard && > + git -C sparse-index -c core.fsmonitor="" update-index --force-full-index && > test_region index convert_to_sparse trace2.txt && > test_region index ensure_full_index trace2.txt > ' > -- > gitgitgadget
On Tue, Oct 5, 2021 at 6:21 AM Victoria Dye via GitGitGadget <gitgitgadget@gmail.com> wrote: > > From: Victoria Dye <vdye@github.com> > > Add new tests for `--merge` and `--keep` modes, as well as mixed reset with > pathspecs both inside and outside of the sparse checkout definition. New > performance test cases exercise various execution paths for `reset`. > > Co-authored-by: Derrick Stolee <dstolee@microsoft.com> > Signed-off-by: Derrick Stolee <dstolee@microsoft.com> > Signed-off-by: Victoria Dye <vdye@github.com> > --- > t/perf/p2000-sparse-operations.sh | 3 + > t/t1092-sparse-checkout-compatibility.sh | 107 +++++++++++++++++++++++ > 2 files changed, 110 insertions(+) > > diff --git a/t/perf/p2000-sparse-operations.sh b/t/perf/p2000-sparse-operations.sh > index 597626276fb..bfd332120c8 100755 > --- a/t/perf/p2000-sparse-operations.sh > +++ b/t/perf/p2000-sparse-operations.sh > @@ -110,5 +110,8 @@ test_perf_on_all git add -A > test_perf_on_all git add . > test_perf_on_all git commit -a -m A > test_perf_on_all git checkout -f - > +test_perf_on_all git reset > +test_perf_on_all git reset --hard > +test_perf_on_all git reset -- does-not-exist > > test_done > diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh > index b3c0d3b98ee..f0723a6ac97 100755 > --- a/t/t1092-sparse-checkout-compatibility.sh > +++ b/t/t1092-sparse-checkout-compatibility.sh > @@ -479,6 +479,113 @@ test_expect_success 'checkout and reset (mixed) [sparse]' ' > test_sparse_match git reset update-folder2 > ' > > +# NEEDSWORK: with mixed reset, files with differences between HEAD and <commit> > +# will be added to the work tree even if outside the sparse checkout > +# definition, and even if the file is modified to a state of having no local > +# changes. The file is "re-ignored" if a hard reset is executed. We may want to > +# change this behavior in the future and enforce that files are not written > +# outside of the sparse checkout definition. 
Yeah, I think this comment highlights some of the reasons that writing the file to the working directory for those files isn't the way I'd prefer to resolve the inconsistency between the skip-worktree bit and the presence of the file in the working directory. > +test_expect_success 'checkout and mixed reset file tracking [sparse]' ' > + init_repos && > + > + test_all_match git checkout -b reset-test update-deep && > + test_all_match git reset update-folder1 && > + test_all_match git reset update-deep && > + > + # At this point, there are no changes in the working tree. However, > + # folder1/a now exists locally (even though it is outside of the sparse > + # paths). > + run_on_sparse test_path_exists folder1 && > + > + run_on_all rm folder1/a && > + test_all_match git status --porcelain=v2 && > + > + test_all_match git reset --hard update-deep && > + run_on_sparse test_path_is_missing folder1 && > + test_path_exists full-checkout/folder1 > +' > + > +test_expect_success 'checkout and reset (merge)' ' > + init_repos && > + > + write_script edit-contents <<-\EOF && > + echo text >>$1 > + EOF > + > + test_all_match git checkout -b reset-test update-deep && > + run_on_all ../edit-contents a && > + test_all_match git reset --merge deepest && > + test_all_match git status --porcelain=v2 && > + > + test_all_match git reset --hard update-deep && > + run_on_all ../edit-contents deep/a && > + test_all_match test_must_fail git reset --merge deepest > +' > + > +test_expect_success 'checkout and reset (keep)' ' > + init_repos && > + > + write_script edit-contents <<-\EOF && > + echo text >>$1 > + EOF > + > + test_all_match git checkout -b reset-test update-deep && > + run_on_all ../edit-contents a && > + test_all_match git reset --keep deepest && > + test_all_match git status --porcelain=v2 && > + > + test_all_match git reset --hard update-deep && > + run_on_all ../edit-contents deep/a && > + test_all_match test_must_fail git reset --keep deepest > +' > + > +test_expect_success 
'reset with pathspecs inside sparse definition' ' > + init_repos && > + > + write_script edit-contents <<-\EOF && > + echo text >>$1 > + EOF > + > + test_all_match git checkout -b reset-test update-deep && > + run_on_all ../edit-contents deep/a && > + > + test_all_match git reset base -- deep/a && > + test_all_match git status --porcelain=v2 && > + > + test_all_match git reset base -- nonexistent-file && > + test_all_match git status --porcelain=v2 && > + > + test_all_match git reset deepest -- deep && > + test_all_match git status --porcelain=v2 > +' > + > +test_expect_success 'reset with sparse directory pathspec outside definition' ' > + init_repos && > + > + test_all_match git checkout -b reset-test update-deep && > + test_all_match git reset --hard update-folder1 && > + test_all_match git reset base -- folder1 && > + test_all_match git status --porcelain=v2 > +' > + > +test_expect_success 'reset with pathspec match in sparse directory' ' > + init_repos && > + > + test_all_match git checkout -b reset-test update-deep && > + test_all_match git reset --hard update-folder1 && > + test_all_match git reset base -- folder1/a && > + test_all_match git status --porcelain=v2 > +' > + > +test_expect_success 'reset with wildcard pathspec' ' > + init_repos && > + > + test_all_match git checkout -b reset-test update-deep && > + test_all_match git reset --hard update-folder1 && > + test_all_match git reset base -- \*/a && > + test_all_match git status --porcelain=v2 > +' > + > test_expect_success 'merge, cherry-pick, and rebase' ' > init_repos && > > -- > gitgitgadget >
On Tue, Oct 5, 2021 at 6:21 AM Victoria Dye via GitGitGadget
<gitgitgadget@gmail.com> wrote:
>
> From: Victoria Dye <vdye@github.com>
>
> `reset --soft` does not modify the index, so no compatibility changes are
> needed for it to function without expanding the index. For all other reset
> modes (`--mixed`, `--hard`, `--keep`, `--merge`), the full index is
> explicitly expanded with `ensure_full_index` to maintain current behavior.

"to maintain current behavior"? You are changing code here, which
suggests some kind of behavior is changing, but that description seems
to be claiming the opposite. Is it some kind of preventative change
to add ensure_full_index calls in an additional place, with a later
patch in the series intending to remove the other one(s), so you're
making sure that later changes won't cause unwanted behavioral
changes? Or was something else meant here?

If the above wasn't what you meant, but you're adding
ensure_full_index calls, does that suggest that we had some important
code paths that were not protected by such calls? I thought Stolee
said we had them all covered (at least to the best of our knowledge),
so I'm curious if we just discovered we missed some. If so, are there
other codepaths like this one where we missed protective
ensure_full_index calls?

> Additionally, the `read_cache()` check verifying an uncorrupted index is
> moved after argument parsing and preparing the repo settings. The index is
> not used by the preceding argument handling, but `read_cache()` does need to
> be run after enabling sparse index for the command and before resetting.

This seems to be discussing what code changes are being made, but not
why. I'm guessing at the reasoning, but is it something along the
lines of:

"""
Also, make sure to read_cache() after setting
command_requires_full_index = 0, so that we don't unnecessarily expand
the index as part of our early index-corruption check.
"""
?

> > Signed-off-by: Victoria Dye <vdye@github.com> > --- > builtin/reset.c | 10 +++++++--- > cache-tree.c | 1 + > 2 files changed, 8 insertions(+), 3 deletions(-) > > diff --git a/builtin/reset.c b/builtin/reset.c > index 3b75d3b2f20..e1f2a2bb2c4 100644 > --- a/builtin/reset.c > +++ b/builtin/reset.c > @@ -184,6 +184,7 @@ static int read_from_tree(const struct pathspec *pathspec, > opt.flags.override_submodule_config = 1; > opt.repo = the_repository; > > + ensure_full_index(&the_index); > if (do_diff_cache(tree_oid, &opt)) > return 1; > diffcore_std(&opt); > @@ -261,9 +262,6 @@ static void parse_args(struct pathspec *pathspec, > } > *rev_ret = rev; > > - if (read_cache() < 0) > - die(_("index file corrupt")); > - > parse_pathspec(pathspec, 0, > PATHSPEC_PREFER_FULL | > (patch_mode ? PATHSPEC_PREFIX_ORIGIN : 0), > @@ -409,6 +407,12 @@ int cmd_reset(int argc, const char **argv, const char *prefix) > if (intent_to_add && reset_type != MIXED) > die(_("-N can only be used with --mixed")); > > + prepare_repo_settings(the_repository); > + the_repository->settings.command_requires_full_index = 0; > + > + if (read_cache() < 0) > + die(_("index file corrupt")); > + > /* Soft reset does not touch the index file nor the working tree > * at all, but requires them in a good order. Other resets reset > * the index file to the tree object we are switching to. */ > diff --git a/cache-tree.c b/cache-tree.c > index 90919f9e345..9be19c85b66 100644 > --- a/cache-tree.c > +++ b/cache-tree.c > @@ -776,6 +776,7 @@ void prime_cache_tree(struct repository *r, > cache_tree_free(&istate->cache_tree); > istate->cache_tree = cache_tree(); > > + ensure_full_index(istate); > prime_cache_tree_rec(r, istate->cache_tree, tree); > istate->cache_changed |= CACHE_TREE_CHANGED; > trace2_region_leave("cache-tree", "prime_cache_tree", the_repository); > -- > gitgitgadget >
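
The ordering question raised in the review above (opt in to the sparse
index before reading it, or after) can be illustrated with a toy model.
This is only a sketch of the reviewer's guessed reasoning, assuming that
reading the index expands it whenever the "requires full index" flag is
still set. Every name below (toy_settings, toy_index, toy_read_index and
so on) is hypothetical and is not git's API.

```c
#include <stdbool.h>

/* Toy stand-ins for illustration only; these are not git's real types. */
struct toy_settings {
	/* Assumed to default to true for commands not yet integrated. */
	bool command_requires_full_index;
};

struct toy_index {
	bool sparse;      /* true while sparse directories are collapsed */
	int expand_count; /* number of expansions performed so far */
};

/* Models the (expensive) expansion of a sparse index to a full one. */
void toy_expand(struct toy_index *idx)
{
	if (idx->sparse) {
		idx->sparse = false;
		idx->expand_count++;
	}
}

/*
 * Models reading the index: in this sketch the read itself expands the
 * index if the flag is still set. Returns 0 to mean "not corrupt".
 */
int toy_read_index(const struct toy_settings *s, struct toy_index *idx)
{
	if (s->command_requires_full_index)
		toy_expand(idx);
	return 0;
}

/* Old ordering: read (and check) the index first, opt in afterwards. */
int read_then_opt_in(void)
{
	struct toy_settings s = { .command_requires_full_index = true };
	struct toy_index idx = { .sparse = true, .expand_count = 0 };

	toy_read_index(&s, &idx);
	s.command_requires_full_index = false; /* too late to help */
	return idx.expand_count;
}

/* New ordering: opt in first, then read (and check) the index. */
int opt_in_then_read(void)
{
	struct toy_settings s = { .command_requires_full_index = true };
	struct toy_index idx = { .sparse = true, .expand_count = 0 };

	s.command_requires_full_index = false;
	toy_read_index(&s, &idx);
	return idx.expand_count;
}
```

In this model the first ordering expands the index once during the
early corruption check, while the second never expands it, which is the
effect the suggested commit message wording describes.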
On Tue, Oct 5, 2021 at 6:21 AM Victoria Dye via GitGitGadget <gitgitgadget@gmail.com> wrote: > > From: Victoria Dye <vdye@github.com> > > In order to accurately reconstruct the cache tree in `prime_cache_tree_rec`, > the function must determine whether the currently-processing directory in > the tree is sparse or not. If it is not sparse, the tree is parsed and > subtree recursively constructed. If it is sparse, no subtrees are added to > the tree and the entry count is set to 1 (representing the sparse directory > itself). > > Signed-off-by: Victoria Dye <vdye@github.com> > --- > cache-tree.c | 44 +++++++++++++++++++++--- > cache.h | 10 ++++++ > read-cache.c | 22 ++++++++---- > t/t1092-sparse-checkout-compatibility.sh | 15 ++++++-- > 4 files changed, 78 insertions(+), 13 deletions(-) > > diff --git a/cache-tree.c b/cache-tree.c > index 9be19c85b66..9021669d682 100644 > --- a/cache-tree.c > +++ b/cache-tree.c > @@ -740,15 +740,29 @@ out: > return ret; > } > > +static void prime_cache_tree_sparse_dir(struct repository *r, > + struct cache_tree *it, > + struct tree *tree, > + struct strbuf *tree_path) > +{ > + > + oidcpy(&it->oid, &tree->object.oid); > + it->entry_count = 1; > + return; Why are 'r' and 'tree_path' passed to this function? > +} > + > static void prime_cache_tree_rec(struct repository *r, > struct cache_tree *it, > - struct tree *tree) > + struct tree *tree, > + struct strbuf *tree_path) > { > + struct strbuf subtree_path = STRBUF_INIT; > struct tree_desc desc; > struct name_entry entry; > int cnt; > > oidcpy(&it->oid, &tree->object.oid); > + Why the blank line addition here? 
> init_tree_desc(&desc, tree->buffer, tree->size); > cnt = 0; > while (tree_entry(&desc, &entry)) { > @@ -757,27 +771,49 @@ static void prime_cache_tree_rec(struct repository *r, > else { > struct cache_tree_sub *sub; > struct tree *subtree = lookup_tree(r, &entry.oid); > + > if (!subtree->object.parsed) > parse_tree(subtree); > sub = cache_tree_sub(it, entry.path); > sub->cache_tree = cache_tree(); > - prime_cache_tree_rec(r, sub->cache_tree, subtree); > + strbuf_reset(&subtree_path); > + strbuf_grow(&subtree_path, tree_path->len + entry.pathlen + 1); > + strbuf_addbuf(&subtree_path, tree_path); > + strbuf_add(&subtree_path, entry.path, entry.pathlen); > + strbuf_addch(&subtree_path, '/'); Reconstructing the full path each time? And despite only being useful for the sparse-index case? Would it be better to drop subtree_path from this function, then append entry.path + '/' here to tree_path, and then after the if-block below, call strbuf_setlen to remove the part that this function call added? That way, we don't need subtree_path, and don't have to copy the leading path every time. Also, maybe it'd be better to only do this strbuf manipulation if r->index->sparse_index, since it's not ever used otherwise? > + > + /* > + * If a sparse index is in use, the directory being processed may be > + * sparse. To confirm that, we can check whether an entry with that > + * exact name exists in the index. If it does, the created subtree > + * should be sparse. Otherwise, cache tree expansion should continue > + * as normal. 
> + */ > + if (r->index->sparse_index && > + index_entry_exists(r->index, subtree_path.buf, subtree_path.len)) > + prime_cache_tree_sparse_dir(r, sub->cache_tree, subtree, &subtree_path); > + else > + prime_cache_tree_rec(r, sub->cache_tree, subtree, &subtree_path); > cnt += sub->cache_tree->entry_count; > } > } > it->entry_count = cnt; > + > + strbuf_release(&subtree_path); > } > > void prime_cache_tree(struct repository *r, > struct index_state *istate, > struct tree *tree) > { > + struct strbuf tree_path = STRBUF_INIT; > + > trace2_region_enter("cache-tree", "prime_cache_tree", the_repository); > cache_tree_free(&istate->cache_tree); > istate->cache_tree = cache_tree(); > > - ensure_full_index(istate); > - prime_cache_tree_rec(r, istate->cache_tree, tree); > + prime_cache_tree_rec(r, istate->cache_tree, tree, &tree_path); > + strbuf_release(&tree_path); > istate->cache_changed |= CACHE_TREE_CHANGED; > trace2_region_leave("cache-tree", "prime_cache_tree", the_repository); > } > diff --git a/cache.h b/cache.h > index f6295f3b048..1d3e4665562 100644 > --- a/cache.h > +++ b/cache.h > @@ -816,6 +816,16 @@ struct cache_entry *index_file_exists(struct index_state *istate, const char *na > */ > int index_name_pos(struct index_state *, const char *name, int namelen); > > +/* > + * Determines whether an entry with the given name exists within the > + * given index. The return value is 1 if an exact match is found, otherwise > + * it is 0. Note that, unlike index_name_pos, this function does not expand > + * the index if it is sparse. If an item exists within the full index but it > + * is contained within a sparse directory (and not in the sparse index), 0 is > + * returned. 
> + */ > +int index_entry_exists(struct index_state *, const char *name, int namelen); > + > /* > * Some functions return the negative complement of an insert position when a > * precise match was not found but a position was found where the entry would > diff --git a/read-cache.c b/read-cache.c > index f5d4385c408..ea1166895f8 100644 > --- a/read-cache.c > +++ b/read-cache.c > @@ -551,7 +551,10 @@ int cache_name_stage_compare(const char *name1, int len1, int stage1, const char > return 0; > } > > -static int index_name_stage_pos(struct index_state *istate, const char *name, int namelen, int stage) > +static int index_name_stage_pos(struct index_state *istate, > + const char *name, int namelen, > + int stage, > + int search_sparse) It'd be nicer to make search_sparse an enum defined within this file, so that... > { > int first, last; > > @@ -570,7 +573,7 @@ static int index_name_stage_pos(struct index_state *istate, const char *name, in > first = next+1; > } > > - if (istate->sparse_index && > + if (search_sparse && istate->sparse_index && > first > 0) { > /* Note: first <= istate->cache_nr */ > struct cache_entry *ce = istate->cache[first - 1]; > @@ -586,7 +589,7 @@ static int index_name_stage_pos(struct index_state *istate, const char *name, in > ce_namelen(ce) < namelen && > !strncmp(name, ce->name, ce_namelen(ce))) { > ensure_full_index(istate); > - return index_name_stage_pos(istate, name, namelen, stage); > + return index_name_stage_pos(istate, name, namelen, stage, search_sparse); > } > } > > @@ -595,7 +598,12 @@ static int index_name_stage_pos(struct index_state *istate, const char *name, in > > int index_name_pos(struct index_state *istate, const char *name, int namelen) > { > - return index_name_stage_pos(istate, name, namelen, 0); > + return index_name_stage_pos(istate, name, namelen, 0, 1); ...this could use SEARCH_SPARSE or some name like that which is more meaningful than "1" here. 
> +} > + > +int index_entry_exists(struct index_state *istate, const char *name, int namelen) > +{ > + return index_name_stage_pos(istate, name, namelen, 0, 0) >= 0; ...and likewise this spot could use SEARCH_FULL or some name like that, which is more meaningful than the second "0". Similarly for multiple call sites below... > } > > int remove_index_entry_at(struct index_state *istate, int pos) > @@ -1222,7 +1230,7 @@ static int has_dir_name(struct index_state *istate, > */ > } > > - pos = index_name_stage_pos(istate, name, len, stage); > + pos = index_name_stage_pos(istate, name, len, stage, 1); > if (pos >= 0) { > /* > * Found one, but not so fast. This could > @@ -1322,7 +1330,7 @@ static int add_index_entry_with_check(struct index_state *istate, struct cache_e > strcmp(ce->name, istate->cache[istate->cache_nr - 1]->name) > 0) > pos = index_pos_to_insert_pos(istate->cache_nr); > else > - pos = index_name_stage_pos(istate, ce->name, ce_namelen(ce), ce_stage(ce)); > + pos = index_name_stage_pos(istate, ce->name, ce_namelen(ce), ce_stage(ce), 1); > > /* existing match? Just replace it. 
*/ > if (pos >= 0) { > @@ -1357,7 +1365,7 @@ static int add_index_entry_with_check(struct index_state *istate, struct cache_e > if (!ok_to_replace) > return error(_("'%s' appears as both a file and as a directory"), > ce->name); > - pos = index_name_stage_pos(istate, ce->name, ce_namelen(ce), ce_stage(ce)); > + pos = index_name_stage_pos(istate, ce->name, ce_namelen(ce), ce_stage(ce), 1); > pos = -pos-1; > } > return pos + 1; > diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh > index f0723a6ac97..e301ef5633a 100755 > --- a/t/t1092-sparse-checkout-compatibility.sh > +++ b/t/t1092-sparse-checkout-compatibility.sh > @@ -786,9 +786,9 @@ test_expect_success 'sparse-index is not expanded' ' > ensure_not_expanded checkout - && > ensure_not_expanded switch rename-out-to-out && > ensure_not_expanded switch - && > - git -C sparse-index reset --hard && > + ensure_not_expanded reset --hard && > ensure_not_expanded checkout rename-out-to-out -- deep/deeper1 && > - git -C sparse-index reset --hard && > + ensure_not_expanded reset --hard && > ensure_not_expanded restore -s rename-out-to-out -- deep/deeper1 && > > echo >>sparse-index/README.md && > @@ -798,6 +798,17 @@ test_expect_success 'sparse-index is not expanded' ' > echo >>sparse-index/untracked.txt && > ensure_not_expanded add . && > > + for ref in update-deep update-folder1 update-folder2 update-deep > + do > + echo >>sparse-index/README.md && > + ensure_not_expanded reset --hard $ref || return 1 > + done && > + > + ensure_not_expanded reset --hard update-deep && > + ensure_not_expanded reset --keep base && > + ensure_not_expanded reset --merge update-deep && > + ensure_not_expanded reset --hard && > + > ensure_not_expanded checkout -f update-deep && > test_config -C sparse-index pull.twohead ort && > ( > -- > gitgitgadget
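
The suggestion above to drop subtree_path, append to tree_path in place,
and truncate it again after recursing can be sketched outside of git as
follows. This is a minimal sketch under stated assumptions: struct
pathbuf is a hypothetical stand-in for strbuf (not its real API), and
struct node, walk and collect_path are invented for illustration.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical minimal path buffer; a stand-in for strbuf, not its API. */
struct pathbuf {
	char *buf;
	size_t len;
	size_t alloc;
};

/* Append n bytes of s, growing the buffer and keeping it NUL-terminated. */
void pathbuf_add(struct pathbuf *pb, const char *s, size_t n)
{
	if (pb->len + n + 1 > pb->alloc) {
		pb->alloc = (pb->len + n + 1) * 2;
		pb->buf = realloc(pb->buf, pb->alloc);
		if (!pb->buf)
			abort();
	}
	memcpy(pb->buf + pb->len, s, n);
	pb->len += n;
	pb->buf[pb->len] = '\0';
}

/* Truncate to len bytes; plays the role strbuf_setlen has in the review. */
void pathbuf_setlen(struct pathbuf *pb, size_t len)
{
	pb->len = len;
	if (pb->buf)
		pb->buf[len] = '\0';
}

/* Hypothetical directory tree node, invented for this example. */
struct node {
	const char *name;
	const struct node *children;
	size_t nr_children;
};

typedef void (*visit_fn)(const char *path, void *data);

/*
 * Walk every subdirectory using ONE shared buffer: append "name/",
 * visit, recurse, then truncate back to the length seen on entry.
 * The leading path is never copied into a second buffer.
 */
void walk(const struct node *dir, struct pathbuf *path,
	  visit_fn fn, void *data)
{
	size_t i;

	for (i = 0; i < dir->nr_children; i++) {
		const struct node *child = &dir->children[i];
		size_t base_len = path->len;

		pathbuf_add(path, child->name, strlen(child->name));
		pathbuf_add(path, "/", 1);
		fn(path->buf, data);
		walk(child, path, fn, data);
		pathbuf_setlen(path, base_len);
	}
}

/* Example visitor: records each visited path, one per line. */
void collect_path(const char *path, void *data)
{
	struct pathbuf *out = data;

	pathbuf_add(out, path, strlen(path));
	pathbuf_add(out, "\n", 1);
}
```

Whether this is a net win for prime_cache_tree_rec depends on details
not shown here, such as only doing the path bookkeeping when a sparse
index is actually in use, as the review also suggests.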
On Tue, Oct 5, 2021 at 6:21 AM Victoria Dye via GitGitGadget
<gitgitgadget@gmail.com> wrote:
>
> From: Victoria Dye <vdye@github.com>
>
> Sparse directory entries are "diffed" as trees in `diff_cache` (used
> internally by `reset --mixed`), following a code path separate from
> individual file handling. The use of `diff_tree_oid` there requires setting
> explicit `change` and `add_remove` functions to process the internal
> contents of a sparse directory.
>
> Additionally, the `recursive` diff option handles cases in which `reset
> --mixed` must diff/merge files that are nested multiple levels deep in a
> sparse directory.
>
> Signed-off-by: Victoria Dye <vdye@github.com>
> ---
>  builtin/reset.c                          | 30 +++++++++++++++++++++++-
>  t/t1092-sparse-checkout-compatibility.sh | 13 +++++++++-
>  2 files changed, 41 insertions(+), 2 deletions(-)
>
> diff --git a/builtin/reset.c b/builtin/reset.c
> index e1f2a2bb2c4..ceb9b122897 100644
> --- a/builtin/reset.c
> +++ b/builtin/reset.c
> @@ -175,6 +175,8 @@ static int read_from_tree(const struct pathspec *pathspec,
>  			  int intent_to_add)
>  {
>  	struct diff_options opt;
> +	unsigned int i;
> +	char *skip_worktree_seen = NULL;
>
>  	memset(&opt, 0, sizeof(opt));
>  	copy_pathspec(&opt.pathspec, pathspec);
> @@ -182,9 +184,35 @@ static int read_from_tree(const struct pathspec *pathspec,
>  	opt.format_callback = update_index_from_diff;
>  	opt.format_callback_data = &intent_to_add;
>  	opt.flags.override_submodule_config = 1;
> +	opt.flags.recursive = 1;
>  	opt.repo = the_repository;
> +	opt.change = diff_change;
> +	opt.add_remove = diff_addremove;
> +
> +	/*
> +	 * When pathspec is given for resetting a cone-mode sparse checkout, it may
> +	 * identify entries that are nested in sparse directories, in which case the
> +	 * index should be expanded. For the sake of efficiency, this check is
> +	 * overly-cautious: anything with a wildcard or a magic prefix requires
> +	 * expansion, as well as literal paths that aren't in the sparse checkout
> +	 * definition AND don't match any directory in the index.

s/efficiency/efficiency of checking/ ? Being overly-cautious suggests
you'll expand to a full index more than is needed, and full indexes
are more expensive. But perhaps the checking would be expensive too
so you have a tradeoff?

Or maybe s/efficiency/simplicity/?

> +	 */
> +	if (pathspec->nr && the_index.sparse_index) {
> +		if (pathspec->magic || pathspec->has_wildcard) {
> +			ensure_full_index(&the_index);

dir.c has the notion of matching the characters preceding the wildcard
characters; look for "no_wildcard_len". If the pathspec doesn't match
a path up to no_wildcard_len, then the wildcard character(s) later in
the pathspec can't make the pathspec match that path.

It might at least be worth mentioning this as a possible future optimization.

> +		} else {
> +			for (i = 0; i < pathspec->nr; i++) {
> +				if (!path_in_cone_mode_sparse_checkout(pathspec->items[i].original, &the_index) &&
> +				    !matches_skip_worktree(pathspec, i, &skip_worktree_seen)) {

What if the pathspec corresponds to a sparse-directory in the index,
but possibly without the trailing '/' character? e.g.:

    git reset HEAD~1 -- sparse-directory

One should be able to reset that directory without recursing into
it...does this code handle that? Does it handle it if we add the
trailing slash on the path for the reset command line?

> +					ensure_full_index(&the_index);
> +					break;
> +				}
> +			}
> +		}
> +	}
> +
> +	free(skip_worktree_seen);
>
> -	ensure_full_index(&the_index);
>  	if (do_diff_cache(tree_oid, &opt))
>  		return 1;
>  	diffcore_std(&opt);
> diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
> index e301ef5633a..4afcbc2d673 100755
> --- a/t/t1092-sparse-checkout-compatibility.sh
> +++ b/t/t1092-sparse-checkout-compatibility.sh
> @@ -804,11 +804,22 @@ test_expect_success 'sparse-index is not expanded' '
>  		ensure_not_expanded reset --hard $ref || return 1
>  	done &&
>
> +	ensure_not_expanded reset --mixed base &&
>  	ensure_not_expanded reset --hard update-deep &&
>  	ensure_not_expanded reset --keep base &&
>  	ensure_not_expanded reset --merge update-deep &&
> -	ensure_not_expanded reset --hard &&

This commit was only touching the --mixed case; why is it removing one
of the tests for --hard?

>
> +	ensure_not_expanded reset base -- deep/a &&
> +	ensure_not_expanded reset base -- nonexistent-file &&
> +	ensure_not_expanded reset deepest -- deep &&
> +
> +	# Although folder1 is outside the sparse definition, it exists as a
> +	# directory entry in the index, so it will be reset without needing to
> +	# expand the full index.

Ah, I think this answers one of my earlier questions. Does it also
work with 'folder1/' as well as 'folder1'?

> +	ensure_not_expanded reset --hard update-folder1 &&

Wait...is update-folder1 a branch or a path? And if this commit is
about --mixed, why are --hard testcases being added?

> +	ensure_not_expanded reset base -- folder1 &&
> +
> +	ensure_not_expanded reset --hard update-deep &&

another --hard testcase...was this an accidental squash by chance?

>  	ensure_not_expanded checkout -f update-deep &&
>  	test_config -C sparse-index pull.twohead ort &&
>  	(
> --
> gitgitgadget
>
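
The "no_wildcard_len" idea mentioned in the review above (the literal
characters before the first wildcard must already agree with a path for
the pathspec to have any chance of matching it) might be sketched like
this. The helper names are hypothetical, the set of glob characters is
an assumption, and this is not the code in dir.c; it only shows how such
a check could rule out sparse directories without expanding the index.

```c
#include <string.h>

/*
 * Length of the leading part of a pathspec that contains no glob
 * characters. Hypothetical helper; the character set is an assumption.
 */
size_t literal_prefix_len(const char *pathspec)
{
	return strcspn(pathspec, "*?[\\");
}

/*
 * Returns 1 if a pathspec could match something inside sparse_dir
 * (given with a trailing '/'), 0 if it provably cannot. Only the
 * literal prefix is compared, so 1 means "maybe", never "definitely".
 */
int sparse_dir_may_match(const char *pathspec, const char *sparse_dir)
{
	size_t prefix = literal_prefix_len(pathspec);
	size_t dirlen = strlen(sparse_dir);
	size_t n = prefix < dirlen ? prefix : dirlen;

	return strncmp(pathspec, sparse_dir, n) == 0;
}
```

With a check along these lines, a pathspec such as "deep/*.c" could be
seen not to reach into a sparse "folder1/" entry, whereas "*/a" has an
empty literal prefix and would still require the cautious path.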
Hi Victoria,
On Tue, Oct 5, 2021 at 6:20 AM Victoria Dye via GitGitGadget
<gitgitgadget@gmail.com> wrote:
>
> This series integrates the sparse index with git reset and provides
> miscellaneous fixes and improvements to the command in sparse checkouts.
> This includes:
>
> 1. tests added to t1092 and p2000 to establish the baseline functionality
> of the command
> 2. repository settings to enable the sparse index with ensure_full_index
> guarding any code paths that break tests without other compatibility
> updates.
> 3. modifications to remove or reduce the scope in which ensure_full_index
> must be called.
>
> The sparse index updates are predicated on a fix originating from the
> microsoft/git fork [1], correcting how git reset --mixed handles resetting
> entries outside the sparse checkout definition. Additionally, a performance
> "bug" in next_cache_entry with sparse index is corrected, preventing
> repeatedly looping over already-searched entries.
>
> The p2000 tests demonstrate an overall ~70% execution time reduction across
> all tested usages of git reset using a sparse index:
>
> Test before after
> ------------------------------------------------------------------------
> 2000.22: git reset (full-v3) 0.48 0.51 +6.3%
> 2000.23: git reset (full-v4) 0.47 0.50 +6.4%
> 2000.24: git reset (sparse-v3) 0.93 0.30 -67.7%
> 2000.25: git reset (sparse-v4) 0.94 0.29 -69.1%
> 2000.26: git reset --hard (full-v3) 0.69 0.68 -1.4%
> 2000.27: git reset --hard (full-v4) 0.75 0.68 -9.3%
> 2000.28: git reset --hard (sparse-v3) 1.29 0.34 -73.6%
> 2000.29: git reset --hard (sparse-v4) 1.31 0.34 -74.0%
> 2000.30: git reset -- does-not-exist (full-v3) 0.54 0.51 -5.6%
> 2000.31: git reset -- does-not-exist (full-v4) 0.54 0.52 -3.7%
> 2000.32: git reset -- does-not-exist (sparse-v3) 1.02 0.31 -69.6%
> 2000.33: git reset -- does-not-exist (sparse-v4) 1.07 0.30 -72.0%
>
>
>
> Changes since V1
> ================
>
> * Add --force-full-index option to update-index. The option is used to
> circumvent changing command_requires_full_index from its default value -
> right now this is effectively a no-op, but will change once update-index
> is integrated with sparse index. By using this option in the t1092
> expand/collapse test, the command used to test will not need to be
> updated with subsequent sparse index integrations.
> * Update implementation of mixed reset for entries outside sparse checkout
> definition. The condition in which a file should be checked out before
> index reset is simplified to "if it has skip-worktree enabled and a reset
> would change the file, check it out".
> * After checking the behavior of update_index_from_diff with renames,
> found that the diff used by reset does not produce diff queue entries
> with different pathnames for one and two. Because of this, and that
> nothing in the implementation seems to rely on identical path names, no
> BUG check is added.
> * Correct a bug in the sparse index is not expanded tests in t1092 where
> failure of a git reset --mixed test was not being reported. Test now
> verifies an appropriate scenario with corrected failure-checking.
I read over the first six patches. I tried to read over the seventh,
but I've never figured out cache_bottom for some reason and I did
nothing beyond spot checking when Stolee touched that area either.
Anyway, I had lots of little comments, tweaks to the way to fix the
inconsistency in patch 1, various questions, etc. It probably adds up
to a lot, but it's all small fixable stuff; overall it looks like you
(and Kevin) are making a solid contribution on the sparse-checkout
stuff; I look forward to reading the next round.
On 05/10/21 20.20, Kevin Willford via GitGitGadget wrote:
> When using the sparse checkout feature, 'git reset' will add entries to
> the index that will have the skip-worktree bit off but will leave the
> working directory empty. File data is lost because the index version of
> the files has been changed but there is nothing that is in the working
> directory. This will cause the next 'git status' call to show either
> deleted for files modified or deleting or nothing for files added. The
> added files should be shown as untracked and modified files should be
> shown as modified.
>

Better say `... but there is nothing in the working directory`.

> To fix this when the reset is running if there is not a file in the
> working directory and if it will be missing with the new index entry or
> was not missing in the previous version, we create the previous index
> version of the file in the working directory so that status will report
> correctly and the files will be availble for the user to deal with.
>

s/availble/available

--
An old man doll... just what I always wanted! - Clara
On 05/10/21 20.20, Victoria Dye via GitGitGadget wrote:
> From: Victoria Dye <vdye@github.com>
>
> Add a new `--force-full-index` option to `git update-index`, which skips
> explicitly setting `command_requires_full_index`. This lets `git
> update-index --force-full-index` run as a command without sparse index
> compatibility implemented, even after it receives sparse index compatibility
> updates.
`... explicitly setting ...` or `... explicitly set ...`? I thought of
the latter.
--
An old man doll... just what I always wanted! - Clara
On 05/10/21 20.20, Victoria Dye via GitGitGadget wrote:
> From: Victoria Dye <vdye@github.com>
>
> In order to accurately reconstruct the cache tree in `prime_cache_tree_rec`,
> the function must determine whether the currently-processing directory in
> the tree is sparse or not. If it is not sparse, the tree is parsed and
> subtree recursively constructed. If it is sparse, no subtrees are added to
> the tree and the entry count is set to 1 (representing the sparse directory
> itself).
>
Better say `If it is sparse, no subtrees ..., else the tree ...`
--
An old man doll... just what I always wanted! - Clara
On 05/10/21 20.20, Victoria Dye via GitGitGadget wrote:
> The `cache_bottom` must be preserved for the sparse index (see 17a1bb570b
> (unpack-trees: preserve cache_bottom, 2021-07-14)). Therefore, to retain
> the benefit `cache_bottom` provides in non-sparse index cases, a separate
> `hint` position indicates the first position `next_cache_entry` should
> search, updated each execution with a new position. The performance of `git
> reset -- does-not-exist` (testing the "worst case" in which all entries in
> the index are unpacked with `next_cache_entry`) is significantly improved
> for the sparse index case:
Did you mean `a separate `hint` ... should be searched`?
--
An old man doll... just what I always wanted! - Clara
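
The role of the separate `hint` position in the quoted commit message can be illustrated with a small, self-contained sketch. This is not the unpack-trees.c code: `struct entry`, `struct search_state`, and `next_unpacked` are hypothetical stand-ins, and the sketch assumes that an entry, once marked as unpacked, is never unmarked.

```c
#include <stddef.h>

/* Hypothetical stand-in for a cache entry: only tracks "already unpacked". */
struct entry {
	int unpacked;
};

/*
 * Hypothetical search state. `bottom` is never advanced by the search,
 * mirroring the requirement that cache_bottom be preserved for a sparse
 * index; `hint` is the first position worth examining on the next call.
 */
struct search_state {
	size_t bottom;
	size_t hint;
};

/*
 * Return the position of the first not-yet-unpacked entry, or `nr` if
 * there is none. The scan starts at max(bottom, hint) and the hint is
 * moved past every entry seen to be unpacked, so repeated calls do not
 * loop over already-searched entries again.
 */
static size_t next_unpacked(const struct entry *entries, size_t nr,
			    struct search_state *st)
{
	size_t pos = st->hint > st->bottom ? st->hint : st->bottom;

	while (pos < nr && entries[pos].unpacked)
		pos++;

	st->hint = pos;
	return pos;
}
```

Without the hint, every call would restart at `bottom`, which is the repeated looping the cover letter describes for the `git reset -- does-not-exist` case.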
Victoria Dye <vdye@github.com> writes:
> Thanks for the thorough explanation, I'm on-board with your approach (and
> will re-roll the series with that implemented). A lot of my thought process
> (and confusion) came from a comment in e5ca291076 (t1092: document bad
> sparse-checkout behavior, 2021-07-14) suggesting that full and sparse
> checkouts should have the same result in scenarios like the one you
> outlined above.

Thanks for bringing this up.  I agree that it is crucial to clarify
what use case we are aiming for.  If the objective were to make a
sparse checkout behave just like full checkout, the desired
behaviour would be very different from a system whose objective is
to allow users to pretend as if the hidden parts of sparse checkout
do not even exist, which was the model my example was after.  I
agree with you that the "comment" in an earlier commit may have been
unhelpful in that it stopped at "should behave the same but they
shouldn't" without saying "why they should behave the same".

If the goal were to make sparse behave like full, continuing with
the previous example, after a

    $ git reset --mixed HEAD^

the user should be able to say

    $ git commit -a --amend

to replace the original two-commit history with a single commit
history that records the same resulting tree.  If the path "skip"
were to be reset to the blob from the first commit, just like the
path "no-skip" is, for such a "commit -a --amend" to work, we would
need to have a working tree file for "skip" magically materialized
with the contents from the *second* commit.  After all, the whole
point of mixed (and soft) reset is that they do not (logically)
change the files in the working tree, so if you are resetting from
the second commit to the first, if you were to have a working tree
file, it should come from the second commit, so that both "skip"
and "no-skip" should show "changed in the working tree relative to
the index", i.e.

    $ git reset --mixed HEAD^
    $ git ls-files -t
    M no-skip
    M skip

While such a "make sparse behave the same way as full" can be made
internally consistent, however, as the above example shows, it would
make the resulting "sparse checkout" practically unusable.

By stepping back a bit and realizing that the reason why the user
wanted to mark some path as "skip-worktree" was because the user had
no intention to make any change to them, we can make it usable again,
by not insisting that sparse should behave the same way as full.

When we redesign these patches, I would like to see the part where we
fell short the last time improved.  Instead of saying "skip-worktree
entries should stay so" and stopping there, we should leave a note
for later readers to explain why they should.

Thanks.
Elijah Newren <newren@gmail.com> writes:

> On Tue, Oct 5, 2021 at 6:21 AM Victoria Dye via GitGitGadget
> <gitgitgadget@gmail.com> wrote:
>>
>> From: Victoria Dye <vdye@github.com>
>>
>> `reset --soft` does not modify the index, so no compatibility changes are
>> needed for it to function without expanding the index. For all other reset
>> modes (`--mixed`, `--hard`, `--keep`, `--merge`), the full index is
>> explicitly expanded with `ensure_full_index` to maintain current behavior.
>
> "to maintain current behavior"?  You are changing code here, which
> suggests some kind of behavior is changing, but that description seems
> to be claiming the opposite.  Is it some kind of preventative change
> to add ensure_full_index calls in an additional place, with a later
> patch in the series intending to remove the other one(s), so you're
> making sure that later changes won't cause unwanted behavioral
> changes?  Or was something else meant here?
>
> If the above wasn't what you meant, but you're adding
> ensure_full_index calls, does that suggest that we had some important
> code paths that were not protected by such calls?

The original called read_cache() before we know which mode we
operate in, near the end of parse_args(), which resulted in an
unconditional call to ensure_full_index() in repo_read_index().
This patch delays the call to read_cache().

If parse_pathspec() and everything the original called after the
point where it called read_cache() needed to have a populated
in-core index, the change can break things---I didn't check
thoroughly, but I am guessing it is OK.

>> Additionally, the `read_cache()` check verifying an uncorrupted index is
>> moved after argument parsing and preparing the repo settings. The index is
>> not used by the preceding argument handling, but `read_cache()` does need to
>> be run after enabling sparse index for the command and before resetting.
>
> This seems to be discussing what code changes are being made, but not
> why.  I'm guessing at the reasoning, but is it something along the
> lines of:
>
> """
> Also, make sure to read_cache() after setting
> command_requires_full_index = 0, so that we don't unnecessarily expand
> the index as part of our early index-corruption check.
> """

I think it is more like "we used to expand very early for all modes,
but with this change we move the read_cache() call to much later,
and force it not to expand.  The modes that call read_from_tree()
need the in-core index fully expanded, so we do so there, but the
soft reset does not call it and would stop expanding."
Elijah Newren wrote:
>> -		int is_missing = !(one->mode && !is_null_oid(&one->oid));
>> +		struct diff_filespec *two = q->queue[i]->two;
>> +		int is_in_reset_tree = one->mode && !is_null_oid(&one->oid);
>
> Isn't !is_null_oid(&one->oid) redundant to checking one->mode?  When
> does the diff machinery ever give you a non-zero mode with a null oid?
>

It looks like this originally only checked the mode, and the extra OID check
was introduced in ff00b682f2 (reset [<commit>] paths...: do not mishandle
unmerged paths, 2011-07-13). I was able to remove `!is_null_oid(&one->oid)`
from the condition and run the `t71*` tests without any failures, but I'm
hesitant to remove it on the off chance that this handles a case I'm not
thinking of.

> Also, is_in_reset_tree == !is_missing; I'll note that below.
>
>>  		struct cache_entry *ce;
>>
>> +		/*
>> +		 * If the file being reset has `skip-worktree` enabled, we need
>> +		 * to check it out to prevent the file from being hard reset.
>
> I don't understand this comment.  If the file wasn't originally in the
> index (is_missing), and is being added to it, and is correctly marked
> as skip_worktree, and the file isn't in the working tree, then it
> sounds like everything is already in a good state.  Files outside the
> sparse checkout are meant to have the skip_worktree bit set and be
> missing from the working tree.
>
> Also, I don't know what you mean by 'hard reset' here.
>
>> +		 */
>> +		pos = cache_name_pos(two->path, strlen(two->path));
>> +		if (pos >= 0 && ce_skip_worktree(active_cache[pos])) {
>> +			struct checkout state = CHECKOUT_INIT;
>> +			state.force = 1;
>> +			state.refresh_cache = 1;
>> +			state.istate = &the_index;
>> +
>> +			checkout_entry(active_cache[pos], &state, NULL, NULL);
>
> Does this introduce an error in the opposite direction from the one
> stated in the commit message?  Namely we have two things that should
> be in sync: the skip_worktree flag stating whether the file should be
> present in the working directory (skip_worktree), and the question of
> whether the file is actually in the working directory.  In the commit
> message, you pointed out a case where they were out of sync one way:
> the skip_worktree flag was not set but the file was missing.  Here you
> say the skip_worktree flag is set, but you add it to the working tree
> anyway.
>
> Or am I misunderstanding the code?
>

Most of this is addressed in [1], and you're right that what's in this patch
isn't the right fix for the problem. This patch tried to solve the issue of
"skip-worktree is being ignored and reset files are showing up deleted" by
continuing to ignore `skip-worktree`, but now checking out the
`skip-worktree` files based on their pre-reset state in the index (unless
they, for some reason, were already present in the worktree). However, that
completely disregards the reasoning for having `skip-worktree` in the first
place (the user wants the file *ignored* in the worktree) and violates the
premise of `git reset --mixed` not modifying the worktree, so the better
solution is to set `skip-worktree` in the resulting index entry and not
check out anything.

[1] https://lore.kernel.org/git/9b99e856-24cc-03fd-7871-de92dc6e39b6@github.com/

>> +		}
>> +
>
> [I did some slight editing to the diff to make the next two parts
> appear next to each other]
>
>> -		if (is_missing && !intent_to_add) {
>> +		if (!is_in_reset_tree && !intent_to_add) {
>
> I thought this was some subtle bugfix or something, and spent a while
> trying to figure it out, before realizing that is_in_reset_tree was
> simply defined as !is_missing (for some reason I was assuming it was
> dealing with two->mode while is_missing was looking at one->mode).  So
> this is a simple variable renaming, which I think is probably good,
> but I'd prefer if this was separated into a different patch to make it
> easier to review.
>

Good call, I'll include this in V3.
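
The resolution described in this reply (set `skip-worktree` on the resulting index entry instead of checking anything out) comes down to a two-case decision, which the V3 range-diff later in the thread spells out. A minimal sketch of that decision follows; `should_skip_worktree` and its boolean parameters are hypothetical stand-ins for the index lookup and the sparse-checkout pattern match, not the builtin/reset.c code.

```c
#include <stdbool.h>

/*
 * Hypothetical helper: decide whether the index entry created by a mixed
 * reset should carry the skip-worktree bit.
 *
 *  in_index            - the path already has an entry in the current index
 *  entry_skip_worktree - that existing entry has skip-worktree set
 *  in_sparse_checkout  - the path matches the sparse-checkout definition
 */
static bool should_skip_worktree(bool in_index, bool entry_skip_worktree,
				 bool in_sparse_checkout)
{
	/* Case 1: an existing entry keeps whatever bit it already had. */
	if (in_index)
		return entry_skip_worktree;

	/* Case 2: a new entry outside the sparse definition gets the bit. */
	return !in_sparse_checkout;
}
```

In neither case is the working tree touched, which keeps the premise that `git reset --mixed` does not modify the worktree.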
Elijah Newren wrote:
> On Tue, Oct 5, 2021 at 6:20 AM Victoria Dye via GitGitGadget
> <gitgitgadget@gmail.com> wrote:
>>
>> From: Victoria Dye <vdye@github.com>
>>
>> Add a new `--force-full-index` option to `git update-index`, which skips
>> explicitly setting `command_requires_full_index`. This lets `git
>> update-index --force-full-index` run as a command without sparse index
>> compatibility implemented, even after it receives sparse index compatibility
>> updates.
>>
>> By using `git update-index --force-full-index` in the `t1092` test
>> `sparse-index is expanded and converted back`, commands can continue to
>> integrate with the sparse index without the need to keep modifying the
>> command used in the test.
>
> So...we're adding a permanent user-facing command line flag, whose
> purpose is just to help us with the transition work of implementing
> sparse indexes everywhere?  Am I reading that right, or is that just
> the reason for t1092 and there are more reasons for it elsewhere?
>
> Also, I'm curious if update-index is the right place to add this.  If
> you don't want a sparse index anymore, wouldn't a user want to run
>   git sparse-checkout disable
> ?  Or is the point that you do want to keep the sparse checkout, but
> you just don't want the index to also be sparse?  Still, even in that
> case, it seems like adding a subcommand or flag to an existing
> sparse-checkout subcommand would feel more natural, since
> sparse-checkout is the command the user uses to request to get into a
> sparse-checkout and sparse index.
>

This came out of a conversation [1] on an earlier version of this patch.
Because the `t1092 - sparse-index is expanded and converted back` test
verifies sparse index compatibility (i.e., expand the index when reading,
collapse back to sparse when writing) on commands that don't have any sparse
index integration, it needed to be changed from `git reset` to something
else. However, as we keep integrating commands with sparse index we'd need
to keep changing the command in the test, creating a bunch of patches doing
effectively the same thing for no long-term benefit.

The `--force-full-index` flag isn't meant to be used externally or modify
the index in any "new" way - it's really just a "test" version of `git
update-index` that we guarantee will accurately represent a command using
the default settings. Right now, it does exactly what `git update-index`
(without the flag) does, and will only behave differently once `git
update-index` is integrated with sparse index. Using `--force-full-index`,
the test won't need to be regularly updated and will continue to catch
errors like:

1. Changing the default value of `command_requires_full_index` to 0
2. Not expanding a sparse index to full when `command_requires_full_index`
   is 1
3. Not collapsing the index back to sparse if sparse index is enabled

I see the issue of introducing a test-only option (when sparse index is
integrated everywhere, shouldn't it be deprecated?). If there's a way to
make this more obviously internal/temporary, I'm happy to modify it. Or, if
semi-frequent updates of the command in the test aren't a huge issue, I can
revert to V1.

[1] https://lore.kernel.org/git/xmqqr1d58v9x.fsf@gitster.g/
Elijah Newren wrote:
>> +static void prime_cache_tree_sparse_dir(struct repository *r,
>> +					struct cache_tree *it,
>> +					struct tree *tree,
>> +					struct strbuf *tree_path)
>> +{
>> +
>> +	oidcpy(&it->oid, &tree->object.oid);
>> +	it->entry_count = 1;
>> +	return;
>
> Why are 'r' and 'tree_path' passed to this function?
>

I mindlessly copied the function signature of `prime_cache_tree_rec` and
didn't notice those variables weren't needed (I'll remove them in V3).

>> +}
>> +
>>  static void prime_cache_tree_rec(struct repository *r,
>>  				 struct cache_tree *it,
>> -				 struct tree *tree)
>> +				 struct tree *tree,
>> +				 struct strbuf *tree_path)
>>  {
>> +	struct strbuf subtree_path = STRBUF_INIT;
>>  	struct tree_desc desc;
>>  	struct name_entry entry;
>>  	int cnt;
>>
>>  	oidcpy(&it->oid, &tree->object.oid);
>> +
>
> Why the blank line addition here?
>

My goal was to visually separate the parts of `prime_cache_tree_rec` that
update the properties of the `tree` itself and the parts that deal with its
entries. For me, it was helpful when reading and understanding what this
function does and seemed like a good (minor) readability change.

>>  	init_tree_desc(&desc, tree->buffer, tree->size);
>>  	cnt = 0;
>>  	while (tree_entry(&desc, &entry)) {
>> @@ -757,27 +771,49 @@ static void prime_cache_tree_rec(struct repository *r,
>>  		else {
>>  			struct cache_tree_sub *sub;
>>  			struct tree *subtree = lookup_tree(r, &entry.oid);
>> +
>>  			if (!subtree->object.parsed)
>>  				parse_tree(subtree);
>>  			sub = cache_tree_sub(it, entry.path);
>>  			sub->cache_tree = cache_tree();
>> -			prime_cache_tree_rec(r, sub->cache_tree, subtree);
>
>> +			strbuf_reset(&subtree_path);
>> +			strbuf_grow(&subtree_path, tree_path->len + entry.pathlen + 1);
>> +			strbuf_addbuf(&subtree_path, tree_path);
>> +			strbuf_add(&subtree_path, entry.path, entry.pathlen);
>> +			strbuf_addch(&subtree_path, '/');
>
> Reconstructing the full path each time?  And despite only being useful
> for the sparse-index case?
>
> Would it be better to drop subtree_path from this function, then
> append entry.path + '/' here to tree_path, and then after the if-block
> below, call strbuf_setlen to remove the part that this function call
> added?  That way, we don't need subtree_path, and don't have to copy
> the leading path every time.
>
> Also, maybe it'd be better to only do this strbuf manipulation if
> r->index->sparse_index, since it's not ever used otherwise?
>

[...]

>> -static int index_name_stage_pos(struct index_state *istate, const char *name, int namelen, int stage)
>> +static int index_name_stage_pos(struct index_state *istate,
>> +				const char *name, int namelen,
>> +				int stage,
>> +				int search_sparse)
>
> It'd be nicer to make search_sparse an enum defined within this file, so that...
>
>>  {
>>  	int first, last;
>>
>> @@ -570,7 +573,7 @@ static int index_name_stage_pos(struct index_state *istate, const char *name, in
>>  		first = next+1;
>>  	}
>>
>> -	if (istate->sparse_index &&
>> +	if (search_sparse && istate->sparse_index &&
>>  	    first > 0) {
>>  		/* Note: first <= istate->cache_nr */
>>  		struct cache_entry *ce = istate->cache[first - 1];
>> @@ -586,7 +589,7 @@ static int index_name_stage_pos(struct index_state *istate, const char *name, in
>>  		    ce_namelen(ce) < namelen &&
>>  		    !strncmp(name, ce->name, ce_namelen(ce))) {
>>  			ensure_full_index(istate);
>> -			return index_name_stage_pos(istate, name, namelen, stage);
>> +			return index_name_stage_pos(istate, name, namelen, stage, search_sparse);
>>  		}
>>  	}
>>
>> @@ -595,7 +598,12 @@ static int index_name_stage_pos(struct index_state *istate, const char *name, in
>>
>>  int index_name_pos(struct index_state *istate, const char *name, int namelen)
>>  {
>> -	return index_name_stage_pos(istate, name, namelen, 0);
>> +	return index_name_stage_pos(istate, name, namelen, 0, 1);
>
> ...this could use SEARCH_SPARSE or some name like that which is more
> meaningful than "1" here.
>
>> +}
>> +
>> +int index_entry_exists(struct index_state *istate, const char *name, int namelen)
>> +{
>> +	return index_name_stage_pos(istate, name, namelen, 0, 0) >= 0;
>
> ...and likewise this spot could use SEARCH_FULL or some name like
> that, which is more meaningful than the second "0".
>
> Similarly for multiple call sites below...
>

I like all of these suggestions and will include them in the next version.
Thanks!
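
The append-then-truncate approach suggested in the review (extend the shared path, recurse, then cut it back, as `strbuf_setlen` would) can be sketched without git's `strbuf` API. Everything here is an assumption made for illustration: `struct node`, `walk`, and the caller-supplied fixed-size buffer are hypothetical, and the walk only counts directories instead of priming a cache tree.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical directory node used only for this illustration. */
struct node {
	const char *name;
	const struct node *children; /* array of subdirectories, may be NULL */
	size_t nr_children;
};

/*
 * Visit `dir` and everything below it while keeping the current path in
 * one shared buffer. `len` is the length of the path already in `path`
 * and `cap` is the buffer size. Returns the number of directories
 * visited, or -1 if a path would not fit in the buffer.
 */
static int walk(const struct node *dir, char *path, size_t len, size_t cap)
{
	int visited = 1;
	size_t i;

	for (i = 0; i < dir->nr_children; i++) {
		const struct node *child = &dir->children[i];
		size_t add = strlen(child->name) + 1; /* name plus '/' */
		int sub;

		if (len + add + 1 > cap)
			return -1;

		/* Append "name/" in place; the leading path is not copied. */
		memcpy(path + len, child->name, add - 1);
		path[len + add - 1] = '/';
		path[len + add] = '\0';

		sub = walk(child, path, len + add, cap);

		/* Truncate back to the parent's path before the next child. */
		path[len] = '\0';

		if (sub < 0)
			return -1;
		visited += sub;
	}
	return visited;
}
```

Compared with rebuilding a separate `subtree_path` for each entry, this copies only the new path component on each step, which is the saving the review comment points at.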
Elijah Newren wrote:
>> +	/*
>> +	 * When pathspec is given for resetting a cone-mode sparse checkout, it may
>> +	 * identify entries that are nested in sparse directories, in which case the
>> +	 * index should be expanded. For the sake of efficiency, this check is
>> +	 * overly-cautious: anything with a wildcard or a magic prefix requires
>> +	 * expansion, as well as literal paths that aren't in the sparse checkout
>> +	 * definition AND don't match any directory in the index.
>
> s/efficiency/efficiency of checking/ ?  Being overly-cautious suggests
> you'll expand to a full index more than is needed, and full indexes
> are more expensive.  But perhaps the checking would be expensive too
> so you have a tradeoff?
>
> Or maybe s/efficiency/simplicity/?
>

"Simplicity" is probably more appropriate, although the original intent was
"efficiency of checking". I wanted to avoid repeated iteration over the
index (for example, matching the `no_wildcard_len` of each wildcard pathspec
item against each sparse directory in the index). However, to your point,
expanding the index is a more expensive operation anyway, so it's probably
worth the more involved checks.

>> +	 */
>> +	if (pathspec->nr && the_index.sparse_index) {
>> +		if (pathspec->magic || pathspec->has_wildcard) {
>> +			ensure_full_index(&the_index);
>
> dir.c has the notion of matching the characters preceding the wildcard
> characters; look for "no_wildcard_len".  If the pathspec doesn't match
> a path up to no_wildcard_len, then the wildcard character(s) later in
> the pathspec can't make the pathspec match that path.
>
> It might at least be worth mentioning this as a possible future optimization.
>

I'll incorporate something like this into the next version.

>> +		} else {
>> +			for (i = 0; i < pathspec->nr; i++) {
>> +				if (!path_in_cone_mode_sparse_checkout(pathspec->items[i].original, &the_index) &&
>> +				    !matches_skip_worktree(pathspec, i, &skip_worktree_seen)) {
>
> What if the pathspec corresponds to a sparse-directory in the index,
> but possibly without the trailing '/' character?  e.g.:
>
>   git reset HEAD~1 -- sparse-directory
>
> One should be able to reset that directory without recursing into
> it...does this code handle that?  Does it handle it if we add the
> trailing slash on the path for the reset command line?
>

It handles both cases (with and without trailing slash), the former due to
`!matches_skip_worktree(...)` and the latter due to
`!path_in_cone_mode_sparse_checkout(...)`.

>> +					ensure_full_index(&the_index);
>> +					break;
>> +				}
>> +			}
>> +		}
>> +	}
>> +
>> +	free(skip_worktree_seen);
>>
>> -	ensure_full_index(&the_index);
>>  	if (do_diff_cache(tree_oid, &opt))
>>  		return 1;
>>  	diffcore_std(&opt);
>> diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
>> index e301ef5633a..4afcbc2d673 100755
>> --- a/t/t1092-sparse-checkout-compatibility.sh
>> +++ b/t/t1092-sparse-checkout-compatibility.sh
>> @@ -804,11 +804,22 @@ test_expect_success 'sparse-index is not expanded' '
>>  		ensure_not_expanded reset --hard $ref || return 1
>>  	done &&
>>
>> +	ensure_not_expanded reset --mixed base &&
>>  	ensure_not_expanded reset --hard update-deep &&
>>  	ensure_not_expanded reset --keep base &&
>>  	ensure_not_expanded reset --merge update-deep &&
>> -	ensure_not_expanded reset --hard &&
>
> This commit was only touching the --mixed case; why is it removing one
> of the tests for --hard?
>

[...]

>> +	ensure_not_expanded reset --hard update-folder1 &&
>
> Wait...is update-folder1 a branch or a path?  And if this commit is
> about --mixed, why are --hard testcases being added?
>
>> +	ensure_not_expanded reset base -- folder1 &&
>> +
>> +	ensure_not_expanded reset --hard update-deep &&
>
> another --hard testcase...was this an accidental squash by chance?
>

I included `git reset --hard` between the "actual" test cases so that the
`git reset --mixed` tests would start in a "clean" state (clear out any
modified files), but it's unnecessary in most cases so I'll remove them in
V3. To answer your other question, `update-folder1` is a branch.
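
The `no_wildcard_len` idea raised in the review can be sketched as a conservative prefix test: if the literal part of a pathspec already disagrees with a sparse directory's name, no later wildcard can make it match inside that directory. This is a hypothetical, self-contained illustration rather than dir.c code; `literal_prefix_len` and `may_match_inside` are made-up names, and treating `*`, `?`, `[` and `\` as the glob-special characters is an assumption of the sketch.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Length of the leading part of `pattern` with no glob-special character. */
static size_t literal_prefix_len(const char *pattern)
{
	return strcspn(pattern, "*?[\\");
}

/*
 * Conservative check: could `pattern` match some path inside the sparse
 * directory `dir` (given with its trailing '/')? Only the common length of
 * the literal prefix and the directory name is compared; a mismatch there
 * rules the directory out, while a match means "maybe".
 */
static bool may_match_inside(const char *pattern, const char *dir)
{
	size_t plen = literal_prefix_len(pattern);
	size_t dlen = strlen(dir);
	size_t n = plen < dlen ? plen : dlen;

	return strncmp(pattern, dir, n) == 0;
}
```

Under this sketch, a reset with pathspec `deep/*.c` would not need to expand a sparse `folder1/` directory, whereas `*.c` (empty literal prefix) still would; whether such a check pays for its extra iteration over the index is the tradeoff discussed above.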
This series integrates the sparse index with git reset and provides
miscellaneous fixes and improvements to the command in sparse checkouts.
This includes:

 1. tests added to t1092 and p2000 to establish the baseline functionality
    of the command
 2. repository settings to enable the sparse index with ensure_full_index
    guarding any code paths that break tests without other compatibility
    updates.
 3. modifications to remove or reduce the scope in which ensure_full_index
    must be called.

The sparse index updates are predicated on a fix originating from the
microsoft/git fork [1], correcting how git reset --mixed handles resetting
entries outside the sparse checkout definition. Additionally, a performance
"bug" in next_cache_entry with sparse index is corrected, preventing
repeatedly looping over already-searched entries.

The p2000 tests demonstrate a ~70% execution time reduction in git reset
using a sparse index, and no change (within expected variability [2]) using
a full index. Results summarized below [3, 4]:

Test                              base              [5/8]
-----------------------------------------------------------------------
git reset --hard (full-v3)        1.00(0.50+0.39)   0.97(0.50+0.37) -3.0%
git reset --hard (full-v4)        1.00(0.51+0.38)   0.96(0.50+0.36) -4.0%
git reset --hard (sparse-v3)      1.68(1.17+0.39)   1.37(0.91+0.35) -18.5%
git reset --hard (sparse-v4)      1.70(1.18+0.40)   1.41(0.94+0.35) -17.1%

Test                              base              [6/8]
-----------------------------------------------------------------------
git reset --hard (full-v3)        1.00(0.50+0.39)   0.94(0.48+0.34) -6.0%
git reset --hard (full-v4)        1.00(0.51+0.38)   0.95(0.51+0.34) -5.0%
git reset --hard (sparse-v3)      1.68(1.17+0.39)   0.46(0.05+0.29) -72.6%
git reset --hard (sparse-v4)      1.70(1.18+0.40)   0.46(0.06+0.29) -72.9%

Test                              base              [7/8]
---------------------------------------------------------------------------
git reset (full-v3)               0.77(0.27+0.37)   0.72(0.26+0.32) -6.5%
git reset (full-v4)               0.75(0.27+0.34)   0.73(0.26+0.32) -2.7%
git reset (sparse-v3)             1.44(0.96+0.36)   0.43(0.04+0.96) -70.1%
git reset (sparse-v4)             1.46(0.97+0.36)   0.43(0.05+0.79) -70.5%
git reset -- missing (full-v3)    0.72(0.26+0.32)   0.69(0.26+0.30) -4.2%
git reset -- missing (full-v4)    0.74(0.28+0.33)   0.71(0.27+0.32) -4.1%
git reset -- missing (sparse-v3)  1.45(0.97+0.35)   0.81(0.42+0.90) -44.1%
git reset -- missing (sparse-v4)  1.41(0.94+0.34)   0.79(0.42+0.76) -44.0%

Test                              base              [8/8]
---------------------------------------------------------------------------
git reset -- missing (full-v3)    0.72(0.26+0.32)   0.73(0.26+0.33) +1.4%
git reset -- missing (full-v4)    0.74(0.28+0.33)   0.74(0.27+0.32) +0.0%
git reset -- missing (sparse-v3)  1.45(0.97+0.35)   0.43(0.05+0.80) -70.3%
git reset -- missing (sparse-v4)  1.41(0.94+0.34)   0.44(0.05+0.76) -68.8%


Changes since V1
================

 * Add --force-full-index option to update-index. The option is used to
   circumvent changing command_requires_full_index from its default value -
   right now this is effectively a no-op, but will change once update-index
   is integrated with sparse index. By using this option in the t1092
   expand/collapse test, the command used to test will not need to be
   updated with subsequent sparse index integrations.
 * Update implementation of mixed reset for entries outside sparse checkout
   definition. The condition in which a file should be checked out before
   index reset is simplified to "if it has skip-worktree enabled and a reset
   would change the file, check it out".
 * After checking the behavior of update_index_from_diff with renames,
   found that the diff used by reset does not produce diff queue entries
   with different pathnames for one and two. Because of this, and that
   nothing in the implementation seems to rely on identical path names, no
   BUG check is added.
 * Correct a bug in the 'sparse-index is not expanded' tests in t1092 where
   failure of a git reset --mixed test was not being reported. Test now
   verifies an appropriate scenario with corrected failure-checking.


Changes since V2
================

 * Replace patch adding checkouts for git reset --mixed with sparse checkout
   with preserving the skip-worktree flag (including a new test for git
   reset --mixed and update to t1092 - checkout and reset (mixed))
 * Move rename of is_missing into its own patch
 * Further extend t1092 tests and remove unnecessary commands/tests where
   possible
 * Refine logic determining which pathspecs require ensure_full_index in
   git reset --mixed, add related ensure_not_expanded tests
 * Add index_search_mode enum to index_name_stage_pos
 * Clean up variable usage & remove unnecessary subtree_path in
   prime_cache_tree_rec
 * Update cover letter performance data
 * More thoroughly explain changes in each commit message

Thanks!
-Victoria

[1] microsoft@6b8a074
[2] https://lore.kernel.org/git/8b9fe3f8-f0e3-4567-b20b-17c92bd1a5c5@github.com/
[3] If a test and/or commit is not mentioned, there is no significant change
    to performance
[4] Pathspec "does-not-exist" is changed to "missing" to save space in
    performance report

Kevin Willford (1):
  reset: preserve skip-worktree bit in mixed reset

Victoria Dye (7):
  reset: rename is_missing to !is_in_reset_tree
  update-index: add --force-full-index option for expand/collapse test
  reset: expand test coverage for sparse checkouts
  reset: integrate with sparse index
  reset: make sparse-aware (except --mixed)
  reset: make --mixed sparse-aware
  unpack-trees: improve performance of next_cache_entry

 Documentation/git-update-index.txt       |   5 +
 builtin/reset.c                          | 104 +++++++++++++++++-
 builtin/update-index.c                   |  11 ++
 cache-tree.c                             |  46 +++++++-
 cache.h                                  |  10 ++
 read-cache.c                             |  27 +++--
 t/perf/p2000-sparse-operations.sh        |   3 +
 t/t1092-sparse-checkout-compatibility.sh | 133 ++++++++++++++++++++---
 t/t7102-reset.sh                         |  17 +++
 unpack-trees.c                           |  23 +++-
 10 files changed, 342 insertions(+), 37 deletions(-)


base-commit: cefe983a320c03d7843ac78e73bd513a27806845
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1048%2Fvdye%2Fvdye%2Fsparse-index-part1-v3
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1048/vdye/vdye/sparse-index-part1-v3
Pull-Request: https://github.com/gitgitgadget/git/pull/1048

Range-diff vs v2:

 -:  ----------- > 1:  ad7013a31aa reset: rename is_missing to !is_in_reset_tree
 1:  22c69bc6030 ! 2:  1f6da84830b reset: behave correctly with sparse-checkout
    @@ Metadata
     Author: Kevin Willford <kewillf@microsoft.com>

      ## Commit message ##
    -    reset: behave correctly with sparse-checkout
    +    reset: preserve skip-worktree bit in mixed reset

    -    When using the sparse checkout feature, 'git reset' will add entries to
    -    the index that will have the skip-worktree bit off but will leave the
    -    working directory empty. File data is lost because the index version of
    -    the files has been changed but there is nothing that is in the working
    -    directory. This will cause the next 'git status' call to show either
    -    deleted for files modified or deleting or nothing for files added. The
    -    added files should be shown as untracked and modified files should be
    -    shown as modified.
    +    Change `update_index_from_diff` to set `skip-worktree` when applicable for
    +    new index entries. When `git reset --mixed <tree-ish>` is run, entries in
    +    the index with differences between the pre-reset HEAD and reset <tree-ish>
    +    are identified and handled with `update_index_from_diff`. For each file, a
    +    new cache entry in inserted into the index, created from the <tree-ish> side
    +    of the reset (without changing the working tree). However, the newly-created
    +    entry must have `skip-worktree` explicitly set in either of the following
    +    scenarios:

    -    To fix this when the reset is running if there is not a file in the
    -    working directory and if it will be missing with the new index entry or
    -    was not missing in the previous version, we create the previous index
    -    version of the file in the working directory so that status will report
    -    correctly and the files will be availble for the user to deal with.
    +    1. the file is in the current index and has `skip-worktree` set
    +    2. the file is not in the current index but is outside of a defined sparse
    +       checkout definition

    -    This fixes a documented failure from t1092 that was created in 19a0acc
    -    (t1092: test interesting sparse-checkout scenarios, 2021-01-23).
    +    Not setting the `skip-worktree` bit leads to likely-undesirable results for
    +    a user. It causes `skip-worktree` settings to disappear on the
    +    "diff"-containing files (but *only* the diff-containing files), leading to
    +    those files now showing modifications in `git status`. For example, when
    +    running `git reset --mixed` in a sparse checkout, some file entries outside
    +    of sparse checkout could show up as deleted, despite the user never deleting
    +    anything (and not wanting them on-disk anyway).

    -    Signed-off-by: Kevin Willford <kewillf@microsoft.com>
    -    Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
    +    Additionally, add a test to `t7102` to ensure `skip-worktree` is preserved
    +    in a basic `git reset --mixed` scenario and update a failure-documenting
    +    test from 19a0acc (t1092: test interesting sparse-checkout scenarios,
    +    2021-01-23) with new expected behavior.
    +
    +    Helped-by: Junio C Hamano <gitster@pobox.com>
         Signed-off-by: Victoria Dye <vdye@github.com>

      ## builtin/reset.c ##
     @@ builtin/reset.c
       #include "submodule.h"
       #include "submodule-config.h"
      +#include "dir.h"
    -+#include "entry.h"

       #define REFRESH_INDEX_DELAY_WARNING_IN_MS (2 * 1000)

     @@ builtin/reset.c: static void update_index_from_diff(struct diff_queue_struct *q,
       	for (i = 0; i < q->nr; i++) {
      +		int pos;
       		struct diff_filespec *one = q->queue[i]->one;
    --		int is_missing = !(one->mode && !is_null_oid(&one->oid));
    -+		struct diff_filespec *two = q->queue[i]->two;
    -+		int is_in_reset_tree = one->mode && !is_null_oid(&one->oid);
       		int is_in_reset_tree = one->mode && !is_null_oid(&one->oid);
       		struct cache_entry *ce;

    +@@ builtin/reset.c: static void update_index_from_diff(struct diff_queue_struct *q,

    --		if (is_missing && !intent_to_add) {
    + 		ce = make_cache_entry(&the_index, one->mode, &one->oid, one->path,
    + 				      0, 0);
    ++
      +		/*
    -+		 * If the file being reset has `skip-worktree` enabled, we need
    -+		 * to check it out to prevent the file from being hard reset.
    ++		 * If the file 1) corresponds to an existing index entry with
    ++		 * skip-worktree set, or 2) does not exist in the index but is
    ++		 * outside the sparse checkout definition, add a skip-worktree bit
    ++		 * to the new index entry.
      +		 */
    -+		pos = cache_name_pos(two->path, strlen(two->path));
    -+		if (pos >= 0 && ce_skip_worktree(active_cache[pos])) {
    -+			struct checkout state = CHECKOUT_INIT;
    -+			state.force = 1;
    -+			state.refresh_cache = 1;
    -+			state.istate = &the_index;
    ++		pos = cache_name_pos(one->path, strlen(one->path));
    ++		if ((pos >= 0 && ce_skip_worktree(active_cache[pos])) ||
    ++		    (pos < 0 && !path_in_sparse_checkout(one->path, &the_index)))
    ++			ce->ce_flags |= CE_SKIP_WORKTREE;
      +
    -+			checkout_entry(active_cache[pos], &state, NULL, NULL);
    -+		}
    -+
    -+		if (!is_in_reset_tree && !intent_to_add) {
    - 			remove_file_from_cache(one->path);
    - 			continue;
    - 		}
    -@@ builtin/reset.c: static void update_index_from_diff(struct diff_queue_struct *q,
       		if (!ce)
       			die(_("make_cache_entry failed for path '%s'"),
       			    one->path);
    --		if (is_missing) {
    -+		if (!is_in_reset_tree) {
    - 			ce->ce_flags |= CE_INTENT_TO_ADD;
    - 			set_object_name_for_intent_to_add_entry(ce);
    - 		}

      ## t/t1092-sparse-checkout-compatibility.sh ##
     @@ t/t1092-sparse-checkout-compatibility.sh: test_expect_failure 'blame with pathspec outside sparse definition' '
    @@ t/t1092-sparse-checkout-compatibility.sh: test_expect_failure 'blame with pathsp
       	init_repos &&

       	test_all_match git checkout -b reset-test update-deep &&
    + 	test_all_match git reset deepest &&
    +-	test_all_match git reset update-folder1 &&
    +-	test_all_match git reset update-folder2
    +-'
    +-
    +-# NEEDSWORK: a sparse-checkout behaves differently from a full checkout
    +-# in this scenario, but it shouldn't.
    +-test_expect_success 'checkout and reset (mixed) [sparse]' '
    +-	init_repos &&
    +
    +-	test_sparse_match git checkout -b reset-test update-deep &&
    +-	test_sparse_match git reset deepest &&
    ++	# Because skip-worktree is preserved, resetting to update-folder1
    ++	# will show worktree changes for full-checkout that are not present
    ++	# in sparse-checkout or sparse-index.
+ test_sparse_match git reset update-folder1 && +- test_sparse_match git reset update-folder2 ++ run_on_sparse test_path_is_missing folder1 + ' + + test_expect_success 'merge, cherry-pick, and rebase' ' - ## t/t7114-reset-sparse-checkout.sh (new) ## -@@ -+#!/bin/sh -+ -+test_description='reset when using a sparse-checkout' -+ -+. ./test-lib.sh -+ -+test_expect_success 'setup' ' -+ test_tick && -+ echo "checkout file" >c && -+ echo "modify file" >m && -+ echo "delete file" >d && -+ git add . && -+ git commit -m "initial commit" && -+ echo "added file" >a && -+ echo "modification of a file" >m && -+ git rm d && -+ git add . && -+ git commit -m "second commit" && -+ git checkout -b endCommit -+' -+ -+test_expect_success 'reset when there is a sparse-checkout' ' -+ echo "/c" >.git/info/sparse-checkout && -+ test_config core.sparsecheckout true && -+ git checkout -B resetBranch && -+ test_path_is_missing m && -+ test_path_is_missing a && -+ test_path_is_missing d && -+ git reset HEAD~1 && -+ echo "checkout file" >expect && -+ test_cmp expect c && -+ echo "added file" >expect && -+ test_cmp expect a && -+ echo "modification of a file" >expect && -+ test_cmp expect m && -+ test_path_is_missing d -+' + ## t/t7102-reset.sh ## +@@ t/t7102-reset.sh: test_expect_success '--mixed refreshes the index' ' + test_cmp expect output + ' + ++test_expect_success '--mixed preserves skip-worktree' ' ++ echo 123 >>file2 && ++ git add file2 && ++ git update-index --skip-worktree file2 && ++ git reset --mixed HEAD >output && ++ test_must_be_empty output && + -+test_expect_success 'reset after deleting file without skip-worktree bit' ' -+ git checkout -f endCommit && -+ git clean -xdf && -+ cat >.git/info/sparse-checkout <<-\EOF && -+ /c -+ /m ++ cat >expect <<-\EOF && ++ Unstaged changes after reset: ++ M file2 + EOF -+ test_config core.sparsecheckout true && -+ git checkout -B resetAfterDelete && -+ test_path_is_file m && -+ test_path_is_missing a && -+ test_path_is_missing d && -+ rm -f m 
&& -+ git reset HEAD~1 && -+ echo "checkout file" >expect && -+ test_cmp expect c && -+ echo "added file" >expect && -+ test_cmp expect a && -+ test_path_is_missing m && -+ test_path_is_missing d ++ git update-index --no-skip-worktree file2 && ++ git add file2 && ++ git reset --mixed HEAD >output && ++ test_cmp expect output +' + -+test_done + test_expect_success 'resetting specific path that is unmerged' ' + git rm --cached file2 && + F1=$(git rev-parse HEAD:file1) && 2: f7cb9013d46 ! 3: 014a408ea5d update-index: add --force-full-index option for expand/collapse test @@ Commit message update-index: add --force-full-index option for expand/collapse test Add a new `--force-full-index` option to `git update-index`, which skips - explicitly setting `command_requires_full_index`. This lets `git - update-index --force-full-index` run as a command without sparse index - compatibility implemented, even after it receives sparse index compatibility - updates. + explicitly setting `command_requires_full_index`. This option, intended for + use in internal testing purposes only, lets `git update-index` run as a + command without sparse index compatibility implemented, even after it + receives updates to otherwise use the sparse index. - By using `git update-index --force-full-index` in the `t1092` test - `sparse-index is expanded and converted back`, commands can continue to - integrate with the sparse index without the need to keep modifying the - command used in the test. + The specific test `--force-full-index` is intended for - `t1092 - + sparse-index is expanded and converted back` - verifies index compatibility + in commands that do not change the default (enabled) + `command_requires_full_index` repo setting. In the past, the test used `git + reset`. However, as `reset` and other commands are integrated with the + sparse index, the command used in the test would need to keep changing. 
+ Conversely, the `--force-full-index` option makes `git update-index` behave + like a not-yet-sparse-aware command, and can be used in the test + indefinitely without interfering with future sparse index integrations. + Helped-by: Junio C Hamano <gitster@pobox.com> Signed-off-by: Victoria Dye <vdye@github.com> ## Documentation/git-update-index.txt ## 3: c7e9d9f4e03 ! 4: 7f21cf53e9d reset: expand test coverage for sparse checkouts @@ Commit message reset: expand test coverage for sparse checkouts Add new tests for `--merge` and `--keep` modes, as well as mixed reset with - pathspecs both inside and outside of the sparse checkout definition. New - performance test cases exercise various execution paths for `reset`. + pathspecs. New performance test cases exercise various execution paths for + `reset`. Co-authored-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Derrick Stolee <dstolee@microsoft.com> @@ t/perf/p2000-sparse-operations.sh: test_perf_on_all git add -A test_done ## t/t1092-sparse-checkout-compatibility.sh ## -@@ t/t1092-sparse-checkout-compatibility.sh: test_expect_success 'checkout and reset (mixed) [sparse]' ' - test_sparse_match git reset update-folder2 +@@ t/t1092-sparse-checkout-compatibility.sh: test_expect_success 'checkout and reset (mixed)' ' + run_on_sparse test_path_is_missing folder1 ' -+# NEEDSWORK: with mixed reset, files with differences between HEAD and <commit> -+# will be added to the work tree even if outside the sparse checkout -+# definition, and even if the file is modified to a state of having no local -+# changes. The file is "re-ignored" if a hard reset is executed. We may want to -+# change this behavior in the future and enforce that files are not written -+# outside of the sparse checkout definition. 
-+test_expect_success 'checkout and mixed reset file tracking [sparse]' ' -+ init_repos && -+ -+ test_all_match git checkout -b reset-test update-deep && -+ test_all_match git reset update-folder1 && -+ test_all_match git reset update-deep && -+ -+ # At this point, there are no changes in the working tree. However, -+ # folder1/a now exists locally (even though it is outside of the sparse -+ # paths). -+ run_on_sparse test_path_exists folder1 && -+ -+ run_on_all rm folder1/a && -+ test_all_match git status --porcelain=v2 && -+ -+ test_all_match git reset --hard update-deep && -+ run_on_sparse test_path_is_missing folder1 && -+ test_path_exists full-checkout/folder1 -+' -+ +test_expect_success 'checkout and reset (merge)' ' + init_repos && + @@ t/t1092-sparse-checkout-compatibility.sh: test_expect_success 'checkout and rese + test_all_match git status --porcelain=v2 +' + -+test_expect_success 'reset with sparse directory pathspec outside definition' ' ++# Although the working tree differs between full and sparse checkouts after ++# reset, the state of the index is the same. 
++test_expect_success 'reset with pathspecs outside sparse definition' ' + init_repos && ++ test_all_match git checkout -b reset-test base && + -+ test_all_match git checkout -b reset-test update-deep && -+ test_all_match git reset --hard update-folder1 && -+ test_all_match git reset base -- folder1 && -+ test_all_match git status --porcelain=v2 -+' ++ test_sparse_match git reset update-folder1 -- folder1 && ++ git -C full-checkout reset update-folder1 -- folder1 && ++ test_sparse_match git status --porcelain=v2 && ++ test_all_match git rev-parse HEAD:folder1 && + -+test_expect_success 'reset with pathspec match in sparse directory' ' -+ init_repos && -+ -+ test_all_match git checkout -b reset-test update-deep && -+ test_all_match git reset --hard update-folder1 && -+ test_all_match git reset base -- folder1/a && -+ test_all_match git status --porcelain=v2 ++ test_sparse_match git reset update-folder2 -- folder2/a && ++ git -C full-checkout reset update-folder2 -- folder2/a && ++ test_sparse_match git status --porcelain=v2 && ++ test_all_match git rev-parse HEAD:folder2/a +' + +test_expect_success 'reset with wildcard pathspec' ' + init_repos && + + test_all_match git checkout -b reset-test update-deep && -+ test_all_match git reset --hard update-folder1 && + test_all_match git reset base -- \*/a && -+ test_all_match git status --porcelain=v2 ++ test_all_match git status --porcelain=v2 && ++ test_all_match git rev-parse HEAD:folder1/a && ++ ++ test_all_match git reset base -- folder\* && ++ test_all_match git status --porcelain=v2 && ++ test_all_match git rev-parse HEAD:folder2 +' + test_expect_success 'merge, cherry-pick, and rebase' ' 4: 49813c8d9ed ! 5: a2d6212e287 reset: integrate with sparse index @@ Metadata ## Commit message ## reset: integrate with sparse index - `reset --soft` does not modify the index, so no compatibility changes are - needed for it to function without expanding the index. 
For all other reset - modes (`--mixed`, `--hard`, `--keep`, `--merge`), the full index is - explicitly expanded with `ensure_full_index` to maintain current behavior. + Disable `command_requires_full_index` repo setting and add + `ensure_full_index` guards around code paths that cannot yet use sparse + directory index entries. `reset --soft` does not modify the index, so no + compatibility changes are needed for it to function without expanding the + index. For all other reset modes (`--mixed`, `--hard`, `--keep`, `--merge`), + the full index is expanded to prevent cache tree corruption and invalid + variable accesses. Additionally, the `read_cache()` check verifying an uncorrupted index is moved after argument parsing and preparing the repo settings. The index is - not used by the preceding argument handling, but `read_cache()` does need to - be run after enabling sparse index for the command and before resetting. + not used by the preceding argument handling, but `read_cache()` must be run + *after* enabling sparse index for the command (so that the index is not + expanded unnecessarily) and *before* using the index for reset (so that it + is verified as uncorrupted). Signed-off-by: Victoria Dye <vdye@github.com> 5: 78cd85d8dcc ! 6: 330e0c09774 reset: make sparse-aware (except --mixed) @@ Metadata ## Commit message ## reset: make sparse-aware (except --mixed) - In order to accurately reconstruct the cache tree in `prime_cache_tree_rec`, - the function must determine whether the currently-processing directory in - the tree is sparse or not. If it is not sparse, the tree is parsed and - subtree recursively constructed. If it is sparse, no subtrees are added to - the tree and the entry count is set to 1 (representing the sparse directory - itself). + Remove `ensure_full_index` guard on `prime_cache_tree` and update + `prime_cache_tree_rec` to correctly reconstruct sparse directory entries in + the cache tree. 
While processing a tree's entries, `prime_cache_tree_rec` + must determine whether a directory entry is sparse or not by searching for + it in the index (*without* expanding the index). If a matching sparse + directory index entry is found, no subtrees are added to the cache tree + entry and the entry count is set to 1 (representing the sparse directory + itself). Otherwise, the tree is assumed to not be sparse and its subtrees + are recursively added to the cache tree. + Helped-by: Elijah Newren <newren@gmail.com> Signed-off-by: Victoria Dye <vdye@github.com> ## cache-tree.c ## @@ cache-tree.c: out: return ret; } -+static void prime_cache_tree_sparse_dir(struct repository *r, -+ struct cache_tree *it, -+ struct tree *tree, -+ struct strbuf *tree_path) ++static void prime_cache_tree_sparse_dir(struct cache_tree *it, ++ struct tree *tree) +{ + + oidcpy(&it->oid, &tree->object.oid); + it->entry_count = 1; -+ return; +} + static void prime_cache_tree_rec(struct repository *r, @@ cache-tree.c: out: + struct tree *tree, + struct strbuf *tree_path) { -+ struct strbuf subtree_path = STRBUF_INIT; struct tree_desc desc; struct name_entry entry; int cnt; ++ int base_path_len = tree_path->len; oidcpy(&it->oid, &tree->object.oid); + @@ cache-tree.c: static void prime_cache_tree_rec(struct repository *r, sub = cache_tree_sub(it, entry.path); sub->cache_tree = cache_tree(); - prime_cache_tree_rec(r, sub->cache_tree, subtree); -+ strbuf_reset(&subtree_path); -+ strbuf_grow(&subtree_path, tree_path->len + entry.pathlen + 1); -+ strbuf_addbuf(&subtree_path, tree_path); -+ strbuf_add(&subtree_path, entry.path, entry.pathlen); -+ strbuf_addch(&subtree_path, '/'); ++ ++ /* ++ * Recursively-constructed subtree path is only needed when working ++ * in a sparse index (where it's used to determine whether the ++ * subtree is a sparse directory in the index). 
++ */ ++ if (r->index->sparse_index) { ++ strbuf_setlen(tree_path, base_path_len); ++ strbuf_grow(tree_path, base_path_len + entry.pathlen + 1); ++ strbuf_add(tree_path, entry.path, entry.pathlen); ++ strbuf_addch(tree_path, '/'); ++ } + + /* + * If a sparse index is in use, the directory being processed may be @@ cache-tree.c: static void prime_cache_tree_rec(struct repository *r, + * as normal. + */ + if (r->index->sparse_index && -+ index_entry_exists(r->index, subtree_path.buf, subtree_path.len)) -+ prime_cache_tree_sparse_dir(r, sub->cache_tree, subtree, &subtree_path); ++ index_entry_exists(r->index, tree_path->buf, tree_path->len)) ++ prime_cache_tree_sparse_dir(sub->cache_tree, subtree); + else -+ prime_cache_tree_rec(r, sub->cache_tree, subtree, &subtree_path); ++ prime_cache_tree_rec(r, sub->cache_tree, subtree, tree_path); cnt += sub->cache_tree->entry_count; } } - it->entry_count = cnt; + -+ strbuf_release(&subtree_path); + it->entry_count = cnt; } - void prime_cache_tree(struct repository *r, +@@ cache-tree.c: void prime_cache_tree(struct repository *r, struct index_state *istate, struct tree *tree) { @@ cache.h: struct cache_entry *index_file_exists(struct index_state *istate, const * precise match was not found but a position was found where the entry would ## read-cache.c ## +@@ + */ + #define CACHE_ENTRY_PATH_LENGTH 80 + ++enum index_search_mode { ++ NO_EXPAND_SPARSE = 0, ++ EXPAND_SPARSE = 1 ++}; ++ + static inline struct cache_entry *mem_pool__ce_alloc(struct mem_pool *mem_pool, size_t len) + { + struct cache_entry *ce; @@ read-cache.c: int cache_name_stage_compare(const char *name1, int len1, int stage1, const char return 0; } @@ read-cache.c: int cache_name_stage_compare(const char *name1, int len1, int stag +static int index_name_stage_pos(struct index_state *istate, + const char *name, int namelen, + int stage, -+ int search_sparse) ++ enum index_search_mode search_mode) { int first, last; @@ read-cache.c: static int 
index_name_stage_pos(struct index_state *istate, const } - if (istate->sparse_index && -+ if (search_sparse && istate->sparse_index && ++ if (search_mode == EXPAND_SPARSE && istate->sparse_index && first > 0) { /* Note: first <= istate->cache_nr */ struct cache_entry *ce = istate->cache[first - 1]; @@ read-cache.c: static int index_name_stage_pos(struct index_state *istate, const !strncmp(name, ce->name, ce_namelen(ce))) { ensure_full_index(istate); - return index_name_stage_pos(istate, name, namelen, stage); -+ return index_name_stage_pos(istate, name, namelen, stage, search_sparse); ++ return index_name_stage_pos(istate, name, namelen, stage, search_mode); } } @@ read-cache.c: static int index_name_stage_pos(struct index_state *istate, const int index_name_pos(struct index_state *istate, const char *name, int namelen) { - return index_name_stage_pos(istate, name, namelen, 0); -+ return index_name_stage_pos(istate, name, namelen, 0, 1); ++ return index_name_stage_pos(istate, name, namelen, 0, EXPAND_SPARSE); +} + +int index_entry_exists(struct index_state *istate, const char *name, int namelen) +{ -+ return index_name_stage_pos(istate, name, namelen, 0, 0) >= 0; ++ return index_name_stage_pos(istate, name, namelen, 0, NO_EXPAND_SPARSE) >= 0; } int remove_index_entry_at(struct index_state *istate, int pos) @@ read-cache.c: static int has_dir_name(struct index_state *istate, } - pos = index_name_stage_pos(istate, name, len, stage); -+ pos = index_name_stage_pos(istate, name, len, stage, 1); ++ pos = index_name_stage_pos(istate, name, len, stage, EXPAND_SPARSE); if (pos >= 0) { /* * Found one, but not so fast. 
This could @@ read-cache.c: static int add_index_entry_with_check(struct index_state *istate, pos = index_pos_to_insert_pos(istate->cache_nr); else - pos = index_name_stage_pos(istate, ce->name, ce_namelen(ce), ce_stage(ce)); -+ pos = index_name_stage_pos(istate, ce->name, ce_namelen(ce), ce_stage(ce), 1); ++ pos = index_name_stage_pos(istate, ce->name, ce_namelen(ce), ce_stage(ce), EXPAND_SPARSE); /* existing match? Just replace it. */ if (pos >= 0) { @@ read-cache.c: static int add_index_entry_with_check(struct index_state *istate, return error(_("'%s' appears as both a file and as a directory"), ce->name); - pos = index_name_stage_pos(istate, ce->name, ce_namelen(ce), ce_stage(ce)); -+ pos = index_name_stage_pos(istate, ce->name, ce_namelen(ce), ce_stage(ce), 1); ++ pos = index_name_stage_pos(istate, ce->name, ce_namelen(ce), ce_stage(ce), EXPAND_SPARSE); pos = -pos-1; } return pos + 1; 6: 5eaae0825af ! 7: 6ef8e4e31d3 reset: make --mixed sparse-aware @@ Metadata ## Commit message ## reset: make --mixed sparse-aware - Sparse directory entries are "diffed" as trees in `diff_cache` (used - internally by `reset --mixed`), following a code path separate from - individual file handling. The use of `diff_tree_oid` there requires setting - explicit `change` and `add_remove` functions to process the internal - contents of a sparse directory. + Remove the `ensure_full_index` guard on `read_from_tree` and update `git + reset --mixed` to ensure it can use sparse directory index entries wherever + possible. Sparse directory entries are reset use `diff_tree_oid`, which + requires `change` and `add_remove` functions to process the internal + contents of the sparse directory. The `recursive` diff option handles cases + in which `reset --mixed` must diff/merge files that are nested multiple + levels deep in a sparse directory. 
- Additionally, the `recursive` diff option handles cases in which `reset - --mixed` must diff/merge files that are nested multiple levels deep in a - sparse directory. + The use of pathspecs with `git reset --mixed` introduces scenarios in which + internal contents of sparse directories may be matched by the pathspec. In + order to reset *all* files in the repo that may match the pathspec, the + following conditions on the pathspec require index expansion before + performing the reset: + * "magic" pathspecs + * wildcard pathspecs that do not match only in-cone files or entire sparse + directories + * literal pathspecs matching something outside the sparse checkout + definition + + Helped-by: Elijah Newren <newren@gmail.com> Signed-off-by: Victoria Dye <vdye@github.com> ## builtin/reset.c ## -@@ builtin/reset.c: static int read_from_tree(const struct pathspec *pathspec, - int intent_to_add) - { - struct diff_options opt; -+ unsigned int i; -+ char *skip_worktree_seen = NULL; +@@ builtin/reset.c: static void update_index_from_diff(struct diff_queue_struct *q, + * If the file 1) corresponds to an existing index entry with + * skip-worktree set, or 2) does not exist in the index but is + * outside the sparse checkout definition, add a skip-worktree bit +- * to the new index entry. ++ * to the new index entry. Note that a sparse index will be expanded ++ * if this entry is outside the sparse cone - this is necessary ++ * to properly construct the reset sparse directory. 
+ */ + pos = cache_name_pos(one->path, strlen(one->path)); + if ((pos >= 0 && ce_skip_worktree(active_cache[pos])) || +@@ builtin/reset.c: static void update_index_from_diff(struct diff_queue_struct *q, + } + } - memset(&opt, 0, sizeof(opt)); - copy_pathspec(&opt.pathspec, pathspec); -@@ builtin/reset.c: static int read_from_tree(const struct pathspec *pathspec, - opt.format_callback = update_index_from_diff; - opt.format_callback_data = &intent_to_add; - opt.flags.override_submodule_config = 1; -+ opt.flags.recursive = 1; - opt.repo = the_repository; -+ opt.change = diff_change; -+ opt.add_remove = diff_addremove; ++static int pathspec_needs_expanded_index(const struct pathspec *pathspec) ++{ ++ unsigned int i, pos; ++ int res = 0; ++ char *skip_worktree_seen = NULL; + + /* -+ * When pathspec is given for resetting a cone-mode sparse checkout, it may -+ * identify entries that are nested in sparse directories, in which case the -+ * index should be expanded. For the sake of efficiency, this check is -+ * overly-cautious: anything with a wildcard or a magic prefix requires -+ * expansion, as well as literal paths that aren't in the sparse checkout -+ * definition AND don't match any directory in the index. ++ * When using a magic pathspec, assume for the sake of simplicity that ++ * the index needs to be expanded to match all matchable files. 
+ */ -+ if (pathspec->nr && the_index.sparse_index) { -+ if (pathspec->magic || pathspec->has_wildcard) { -+ ensure_full_index(&the_index); -+ } else { -+ for (i = 0; i < pathspec->nr; i++) { -+ if (!path_in_cone_mode_sparse_checkout(pathspec->items[i].original, &the_index) && -+ !matches_skip_worktree(pathspec, i, &skip_worktree_seen)) { -+ ensure_full_index(&the_index); ++ if (pathspec->magic) ++ return 1; ++ ++ for (i = 0; i < pathspec->nr; i++) { ++ struct pathspec_item item = pathspec->items[i]; ++ ++ /* ++ * If the pathspec item has a wildcard, the index should be expanded ++ * if the pathspec has the possibility of matching a subset of entries inside ++ * of a sparse directory (but not the entire directory). ++ * ++ * If the pathspec item is a literal path, the index only needs to be expanded ++ * if a) the pathspec isn't in the sparse checkout cone (to make sure we don't ++ * expand for in-cone files) and b) it doesn't match any sparse directories ++ * (since we can reset whole sparse directories without expanding them). ++ */ ++ if (item.nowildcard_len < item.len) { ++ for (pos = 0; pos < active_nr; pos++) { ++ struct cache_entry *ce = active_cache[pos]; ++ ++ if (!S_ISSPARSEDIR(ce->ce_mode)) ++ continue; ++ ++ /* ++ * If the pre-wildcard length is longer than the sparse ++ * directory name and the sparse directory is the first ++ * component of the pathspec, need to expand the index. ++ */ ++ if (item.nowildcard_len > ce_namelen(ce) && ++ !strncmp(item.original, ce->name, ce_namelen(ce))) { ++ res = 1; ++ break; ++ } ++ ++ /* ++ * If the pre-wildcard length is shorter than the sparse ++ * directory and the pathspec does not match the whole ++ * directory, need to expand the index. 
++ */ ++ if (!strncmp(item.original, ce->name, item.nowildcard_len) && ++ wildmatch(item.original, ce->name, 0)) { ++ res = 1; + break; + } + } -+ } ++ } else if (!path_in_cone_mode_sparse_checkout(item.original, &the_index) && ++ !matches_skip_worktree(pathspec, i, &skip_worktree_seen)) ++ res = 1; ++ ++ if (res > 0) ++ break; + } + + free(skip_worktree_seen); ++ return res; ++} ++ + static int read_from_tree(const struct pathspec *pathspec, + struct object_id *tree_oid, + int intent_to_add) +@@ builtin/reset.c: static int read_from_tree(const struct pathspec *pathspec, + opt.format_callback = update_index_from_diff; + opt.format_callback_data = &intent_to_add; + opt.flags.override_submodule_config = 1; ++ opt.flags.recursive = 1; + opt.repo = the_repository; ++ opt.change = diff_change; ++ opt.add_remove = diff_addremove; ++ ++ if (pathspec->nr && the_index.sparse_index && pathspec_needs_expanded_index(pathspec)) ++ ensure_full_index(&the_index); - ensure_full_index(&the_index); if (do_diff_cache(tree_oid, &opt)) @@ t/t1092-sparse-checkout-compatibility.sh: test_expect_success 'sparse-index is n ensure_not_expanded reset --hard update-deep && ensure_not_expanded reset --keep base && ensure_not_expanded reset --merge update-deep && -- ensure_not_expanded reset --hard && + ensure_not_expanded reset --hard && + ensure_not_expanded reset base -- deep/a && + ensure_not_expanded reset base -- nonexistent-file && + ensure_not_expanded reset deepest -- deep && + + # Although folder1 is outside the sparse definition, it exists as a -+ # directory entry in the index, so it will be reset without needing to -+ # expand the full index. -+ ensure_not_expanded reset --hard update-folder1 && -+ ensure_not_expanded reset base -- folder1 && ++ # directory entry in the index, so the pathspec will not force the ++ # index to be expanded. 
++ ensure_not_expanded reset deepest -- folder1 && ++ ensure_not_expanded reset deepest -- folder1/ && ++ ++ # Wildcard identifies only in-cone files, no index expansion ++ ensure_not_expanded reset deepest -- deep/\* && ++ ++ # Wildcard identifies only full sparse directories, no index expansion ++ ensure_not_expanded reset deepest -- folder\* && + -+ ensure_not_expanded reset --hard update-deep && ensure_not_expanded checkout -f update-deep && test_config -C sparse-index pull.twohead ort && ( 7: aa963eefae7 ! 8: c7145e039f3 unpack-trees: improve performance of next_cache_entry @@ Commit message beginning of the index each time it is called. The `cache_bottom` must be preserved for the sparse index (see 17a1bb570b - (unpack-trees: preserve cache_bottom, 2021-07-14)). Therefore, to retain - the benefit `cache_bottom` provides in non-sparse index cases, a separate - `hint` position indicates the first position `next_cache_entry` should - search, updated each execution with a new position. The performance of `git - reset -- does-not-exist` (testing the "worst case" in which all entries in - the index are unpacked with `next_cache_entry`) is significantly improved - for the sparse index case: - - Test before after - ------------------------------------------------------ - (full-v3) 0.79(0.38+0.30) 0.91(0.43+0.34) +15.2% - (full-v4) 0.80(0.38+0.29) 0.85(0.40+0.35) +6.2% - (sparse-v3) 0.76(0.43+0.69) 0.44(0.08+0.67) -42.1% - (sparse-v4) 0.71(0.40+0.65) 0.41(0.09+0.65) -42.3% + (unpack-trees: preserve cache_bottom, 2021-07-14)). Therefore, to retain the + benefit `cache_bottom` provides in non-sparse index cases, a separate `hint` + position indicates the first position `next_cache_entry` should search, + updated each execution with a new position. Signed-off-by: Victoria Dye <vdye@github.com> -- gitgitgadget
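As a rough illustration of the `hint` idea summarized in the last range-diff entry above, here is a minimal, self-contained sketch. It is not the code from the patch: `struct entry`, `struct index`, the `bottom` field, and `next_unprocessed` are hypothetical stand-ins for git's index structures, `cache_bottom`, and `next_cache_entry`. The sketch only assumes what the commit message states, namely that the bottom marker must be left alone while a separate hint records where the next search should start.

```c
#include <stddef.h>

/* Hypothetical, simplified stand-ins for git's index structures. */
struct entry {
	int unpacked;		/* nonzero once the entry has been processed */
};

struct index {
	struct entry *entries;
	size_t nr;
	size_t bottom;		/* plays the role of cache_bottom; not modified here */
};

/*
 * Return the next entry that has not been processed yet, or NULL if none
 * remains. The search starts at whichever of `bottom` and `*hint` is larger,
 * so entries already visited by earlier calls are not scanned again.
 */
static struct entry *next_unprocessed(struct index *idx, size_t *hint)
{
	size_t pos = idx->bottom;

	if (*hint > pos)
		pos = *hint;

	for (; pos < idx->nr; pos++) {
		if (!idx->entries[pos].unpacked) {
			*hint = pos + 1;	/* resume after this entry next time */
			return &idx->entries[pos];
		}
	}

	*hint = pos;	/* nothing left; later calls return immediately */
	return NULL;
}
```

Without the hint, every call in this sketch would restart at `bottom` and walk over the same already-processed prefix, which is the repeated looping the series describes for the sparse index case.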
From: Victoria Dye <vdye@github.com>

Rename and invert value of `is_missing` to `is_in_reset_tree` to make the
variable more descriptive of what it represents.

Signed-off-by: Victoria Dye <vdye@github.com>
---
 builtin/reset.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/builtin/reset.c b/builtin/reset.c
index 51c9e2f43ff..d3695ce43c4 100644
--- a/builtin/reset.c
+++ b/builtin/reset.c
@@ -131,10 +131,10 @@ static void update_index_from_diff(struct diff_queue_struct *q,
 
 	for (i = 0; i < q->nr; i++) {
 		struct diff_filespec *one = q->queue[i]->one;
-		int is_missing = !(one->mode && !is_null_oid(&one->oid));
+		int is_in_reset_tree = one->mode && !is_null_oid(&one->oid);
 		struct cache_entry *ce;
 
-		if (is_missing && !intent_to_add) {
+		if (!is_in_reset_tree && !intent_to_add) {
 			remove_file_from_cache(one->path);
 			continue;
 		}
@@ -144,7 +144,7 @@ static void update_index_from_diff(struct diff_queue_struct *q,
 		if (!ce)
 			die(_("make_cache_entry failed for path '%s'"),
 			    one->path);
-		if (is_missing) {
+		if (!is_in_reset_tree) {
 			ce->ce_flags |= CE_INTENT_TO_ADD;
 			set_object_name_for_intent_to_add_entry(ce);
 		}
--
gitgitgadget
From: Kevin Willford <kewillf@microsoft.com>

Change `update_index_from_diff` to set `skip-worktree` when applicable for
new index entries. When `git reset --mixed <tree-ish>` is run, entries in
the index with differences between the pre-reset HEAD and reset <tree-ish>
are identified and handled with `update_index_from_diff`. For each file, a
new cache entry is inserted into the index, created from the <tree-ish> side
of the reset (without changing the working tree). However, the newly-created
entry must have `skip-worktree` explicitly set in either of the following
scenarios:

1. the file is in the current index and has `skip-worktree` set
2. the file is not in the current index but is outside of the sparse
   checkout definition

Not setting the `skip-worktree` bit leads to likely-undesirable results for
a user. It causes `skip-worktree` settings to disappear on the
"diff"-containing files (but *only* the diff-containing files), leading to
those files now showing modifications in `git status`. For example, when
running `git reset --mixed` in a sparse checkout, some file entries outside
of the sparse checkout could show up as deleted, despite the user never
deleting anything (and not wanting them on-disk anyway).

Additionally, add a test to `t7102` to ensure `skip-worktree` is preserved
in a basic `git reset --mixed` scenario and update a failure-documenting
test from 19a0acc (t1092: test interesting sparse-checkout scenarios,
2021-01-23) with new expected behavior.

Helped-by: Junio C Hamano <gitster@pobox.com> Signed-off-by: Victoria Dye <vdye@github.com> --- builtin/reset.c | 14 ++++++++++++++ t/t1092-sparse-checkout-compatibility.sh | 19 +++++-------------- t/t7102-reset.sh | 17 +++++++++++++++++ 3 files changed, 36 insertions(+), 14 deletions(-) diff --git a/builtin/reset.c b/builtin/reset.c index d3695ce43c4..e441b6601b9 100644 --- a/builtin/reset.c +++ b/builtin/reset.c @@ -25,6 +25,7 @@ #include "cache-tree.h" #include "submodule.h" #include "submodule-config.h" +#include "dir.h" #define REFRESH_INDEX_DELAY_WARNING_IN_MS (2 * 1000) @@ -130,6 +131,7 @@ static void update_index_from_diff(struct diff_queue_struct *q, int intent_to_add = *(int *)data; for (i = 0; i < q->nr; i++) { + int pos; struct diff_filespec *one = q->queue[i]->one; int is_in_reset_tree = one->mode && !is_null_oid(&one->oid); struct cache_entry *ce; @@ -141,6 +143,18 @@ static void update_index_from_diff(struct diff_queue_struct *q, ce = make_cache_entry(&the_index, one->mode, &one->oid, one->path, 0, 0); + + /* + * If the file 1) corresponds to an existing index entry with + * skip-worktree set, or 2) does not exist in the index but is + * outside the sparse checkout definition, add a skip-worktree bit + * to the new index entry. 
+ */ + pos = cache_name_pos(one->path, strlen(one->path)); + if ((pos >= 0 && ce_skip_worktree(active_cache[pos])) || + (pos < 0 && !path_in_sparse_checkout(one->path, &the_index))) + ce->ce_flags |= CE_SKIP_WORKTREE; + if (!ce) die(_("make_cache_entry failed for path '%s'"), one->path); diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh index 886e78715fe..889079f55b8 100755 --- a/t/t1092-sparse-checkout-compatibility.sh +++ b/t/t1092-sparse-checkout-compatibility.sh @@ -459,26 +459,17 @@ test_expect_failure 'blame with pathspec outside sparse definition' ' test_all_match git blame deep/deeper2/deepest/a ' -# NEEDSWORK: a sparse-checkout behaves differently from a full checkout -# in this scenario, but it shouldn't. -test_expect_failure 'checkout and reset (mixed)' ' +test_expect_success 'checkout and reset (mixed)' ' init_repos && test_all_match git checkout -b reset-test update-deep && test_all_match git reset deepest && - test_all_match git reset update-folder1 && - test_all_match git reset update-folder2 -' - -# NEEDSWORK: a sparse-checkout behaves differently from a full checkout -# in this scenario, but it shouldn't. -test_expect_success 'checkout and reset (mixed) [sparse]' ' - init_repos && - test_sparse_match git checkout -b reset-test update-deep && - test_sparse_match git reset deepest && + # Because skip-worktree is preserved, resetting to update-folder1 + # will show worktree changes for full-checkout that are not present + # in sparse-checkout or sparse-index. 
test_sparse_match git reset update-folder1 && - test_sparse_match git reset update-folder2 + run_on_sparse test_path_is_missing folder1 ' test_expect_success 'merge, cherry-pick, and rebase' ' diff --git a/t/t7102-reset.sh b/t/t7102-reset.sh index 601b2bf97f0..d05426062ec 100755 --- a/t/t7102-reset.sh +++ b/t/t7102-reset.sh @@ -472,6 +472,23 @@ test_expect_success '--mixed refreshes the index' ' test_cmp expect output ' +test_expect_success '--mixed preserves skip-worktree' ' + echo 123 >>file2 && + git add file2 && + git update-index --skip-worktree file2 && + git reset --mixed HEAD >output && + test_must_be_empty output && + + cat >expect <<-\EOF && + Unstaged changes after reset: + M file2 + EOF + git update-index --no-skip-worktree file2 && + git add file2 && + git reset --mixed HEAD >output && + test_cmp expect output +' + test_expect_success 'resetting specific path that is unmerged' ' git rm --cached file2 && F1=$(git rev-parse HEAD:file1) && -- gitgitgadget
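The two scenarios listed in the commit message above reduce to a small decision function. The following is a minimal sketch, not the code from the patch: `struct toy_entry` and `toy_needs_skip_worktree` are hypothetical stand-ins for git's `struct cache_entry` and the condition added to `update_index_from_diff`, and the boolean parameter stands in for the result of `path_in_sparse_checkout`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for an index entry; not git's struct cache_entry. */
struct toy_entry {
	const char *path;
	bool skip_worktree;
};

/*
 * Decide whether the entry recreated by a mixed reset should carry
 * skip-worktree.
 *
 * existing:                the entry currently in the index for this path,
 *                          or NULL if the path is not in the index
 * path_in_sparse_checkout: whether the path matches the sparse-checkout
 *                          definition (assumed to be computed by the caller)
 */
static bool toy_needs_skip_worktree(const struct toy_entry *existing,
				    bool path_in_sparse_checkout)
{
	if (existing) {
		assert(existing->path != NULL);
		/* Scenario 1: preserve the bit from the existing entry. */
		return existing->skip_worktree;
	}
	/* Scenario 2: new path, outside the sparse-checkout definition. */
	return !path_in_sparse_checkout;
}
```

An entry that already exists keeps whatever bit it had, so an in-index file without `skip-worktree` is never marked, even if it happens to sit outside the sparse-checkout patterns.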
From: Victoria Dye <vdye@github.com> Add a new `--force-full-index` option to `git update-index`, which skips explicitly setting `command_requires_full_index`. This option, intended for use in internal testing purposes only, lets `git update-index` run as a command without sparse index compatibility implemented, even after it receives updates to otherwise use the sparse index. The specific test `--force-full-index` is intended for - `t1092 - sparse-index is expanded and converted back` - verifies index compatibility in commands that do not change the default (enabled) `command_requires_full_index` repo setting. In the past, the test used `git reset`. However, as `reset` and other commands are integrated with the sparse index, the command used in the test would need to keep changing. Conversely, the `--force-full-index` option makes `git update-index` behave like a not-yet-sparse-aware command, and can be used in the test indefinitely without interfering with future sparse index integrations. Helped-by: Junio C Hamano <gitster@pobox.com> Signed-off-by: Victoria Dye <vdye@github.com> --- Documentation/git-update-index.txt | 5 +++++ builtin/update-index.c | 11 +++++++++++ t/t1092-sparse-checkout-compatibility.sh | 2 +- 3 files changed, 17 insertions(+), 1 deletion(-) diff --git a/Documentation/git-update-index.txt b/Documentation/git-update-index.txt index 2853f168d97..06255e321a3 100644 --- a/Documentation/git-update-index.txt +++ b/Documentation/git-update-index.txt @@ -24,6 +24,7 @@ SYNOPSIS [--[no-]fsmonitor] [--really-refresh] [--unresolve] [--again | -g] [--info-only] [--index-info] + [--force-full-index] [-z] [--stdin] [--index-version <n>] [--verbose] [--] [<file>...] @@ -170,6 +171,10 @@ time. Version 4 is relatively young (first released in 1.8.0 in October 2012). Other Git implementations such as JGit and libgit2 may not support it yet. +--force-full-index:: + Force the command to operate on a full index, expanding a sparse + index if necessary. 
+ -z:: Only meaningful with `--stdin` or `--index-info`; paths are separated with NUL character instead of LF. diff --git a/builtin/update-index.c b/builtin/update-index.c index 187203e8bb5..32ada3ead77 100644 --- a/builtin/update-index.c +++ b/builtin/update-index.c @@ -964,6 +964,7 @@ int cmd_update_index(int argc, const char **argv, const char *prefix) int split_index = -1; int force_write = 0; int fsmonitor = -1; + int use_default_full_index = 0; struct lock_file lock_file = LOCK_INIT; struct parse_opt_ctx_t ctx; strbuf_getline_fn getline_fn; @@ -1069,6 +1070,8 @@ int cmd_update_index(int argc, const char **argv, const char *prefix) {OPTION_SET_INT, 0, "no-fsmonitor-valid", &mark_fsmonitor_only, NULL, N_("clear fsmonitor valid bit"), PARSE_OPT_NOARG | PARSE_OPT_NONEG, NULL, UNMARK_FLAG}, + OPT_SET_INT(0, "force-full-index", &use_default_full_index, + N_("run with full index explicitly required"), 1), OPT_END() }; @@ -1082,6 +1085,14 @@ int cmd_update_index(int argc, const char **argv, const char *prefix) if (newfd < 0) lock_error = errno; + /* + * If --force-full-index is set, the command should skip manually + * setting `command_requires_full_index`. 
+ */ + prepare_repo_settings(r); + if (!use_default_full_index) + r->settings.command_requires_full_index = 1; + entries = read_cache(); if (entries < 0) die("cache corrupted"); diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh index 889079f55b8..4aa4fef7b4f 100755 --- a/t/t1092-sparse-checkout-compatibility.sh +++ b/t/t1092-sparse-checkout-compatibility.sh @@ -635,7 +635,7 @@ test_expect_success 'sparse-index is expanded and converted back' ' init_repos && GIT_TRACE2_EVENT="$(pwd)/trace2.txt" GIT_TRACE2_EVENT_NESTING=10 \ - git -C sparse-index -c core.fsmonitor="" reset --hard && + git -C sparse-index -c core.fsmonitor="" update-index --force-full-index && test_region index convert_to_sparse trace2.txt && test_region index ensure_full_index trace2.txt ' -- gitgitgadget
From: Victoria Dye <vdye@github.com> Add new tests for `--merge` and `--keep` modes, as well as mixed reset with pathspecs. New performance test cases exercise various execution paths for `reset`. Co-authored-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Victoria Dye <vdye@github.com> --- t/perf/p2000-sparse-operations.sh | 3 + t/t1092-sparse-checkout-compatibility.sh | 84 ++++++++++++++++++++++++ 2 files changed, 87 insertions(+) diff --git a/t/perf/p2000-sparse-operations.sh b/t/perf/p2000-sparse-operations.sh index 597626276fb..bfd332120c8 100755 --- a/t/perf/p2000-sparse-operations.sh +++ b/t/perf/p2000-sparse-operations.sh @@ -110,5 +110,8 @@ test_perf_on_all git add -A test_perf_on_all git add . test_perf_on_all git commit -a -m A test_perf_on_all git checkout -f - +test_perf_on_all git reset +test_perf_on_all git reset --hard +test_perf_on_all git reset -- does-not-exist test_done diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh index 4aa4fef7b4f..875cdcb0495 100755 --- a/t/t1092-sparse-checkout-compatibility.sh +++ b/t/t1092-sparse-checkout-compatibility.sh @@ -472,6 +472,90 @@ test_expect_success 'checkout and reset (mixed)' ' run_on_sparse test_path_is_missing folder1 ' +test_expect_success 'checkout and reset (merge)' ' + init_repos && + + write_script edit-contents <<-\EOF && + echo text >>$1 + EOF + + test_all_match git checkout -b reset-test update-deep && + run_on_all ../edit-contents a && + test_all_match git reset --merge deepest && + test_all_match git status --porcelain=v2 && + + test_all_match git reset --hard update-deep && + run_on_all ../edit-contents deep/a && + test_all_match test_must_fail git reset --merge deepest +' + +test_expect_success 'checkout and reset (keep)' ' + init_repos && + + write_script edit-contents <<-\EOF && + echo text >>$1 + EOF + + test_all_match git checkout -b reset-test update-deep && + run_on_all 
../edit-contents a && + test_all_match git reset --keep deepest && + test_all_match git status --porcelain=v2 && + + test_all_match git reset --hard update-deep && + run_on_all ../edit-contents deep/a && + test_all_match test_must_fail git reset --keep deepest +' + +test_expect_success 'reset with pathspecs inside sparse definition' ' + init_repos && + + write_script edit-contents <<-\EOF && + echo text >>$1 + EOF + + test_all_match git checkout -b reset-test update-deep && + run_on_all ../edit-contents deep/a && + + test_all_match git reset base -- deep/a && + test_all_match git status --porcelain=v2 && + + test_all_match git reset base -- nonexistent-file && + test_all_match git status --porcelain=v2 && + + test_all_match git reset deepest -- deep && + test_all_match git status --porcelain=v2 +' + +# Although the working tree differs between full and sparse checkouts after +# reset, the state of the index is the same. +test_expect_success 'reset with pathspecs outside sparse definition' ' + init_repos && + test_all_match git checkout -b reset-test base && + + test_sparse_match git reset update-folder1 -- folder1 && + git -C full-checkout reset update-folder1 -- folder1 && + test_sparse_match git status --porcelain=v2 && + test_all_match git rev-parse HEAD:folder1 && + + test_sparse_match git reset update-folder2 -- folder2/a && + git -C full-checkout reset update-folder2 -- folder2/a && + test_sparse_match git status --porcelain=v2 && + test_all_match git rev-parse HEAD:folder2/a +' + +test_expect_success 'reset with wildcard pathspec' ' + init_repos && + + test_all_match git checkout -b reset-test update-deep && + test_all_match git reset base -- \*/a && + test_all_match git status --porcelain=v2 && + test_all_match git rev-parse HEAD:folder1/a && + + test_all_match git reset base -- folder\* && + test_all_match git status --porcelain=v2 && + test_all_match git rev-parse HEAD:folder2 +' + test_expect_success 'merge, cherry-pick, and rebase' ' init_repos && -- 
gitgitgadget
From: Victoria Dye <vdye@github.com> Disable `command_requires_full_index` repo setting and add `ensure_full_index` guards around code paths that cannot yet use sparse directory index entries. `reset --soft` does not modify the index, so no compatibility changes are needed for it to function without expanding the index. For all other reset modes (`--mixed`, `--hard`, `--keep`, `--merge`), the full index is expanded to prevent cache tree corruption and invalid variable accesses. Additionally, the `read_cache()` check verifying an uncorrupted index is moved after argument parsing and preparing the repo settings. The index is not used by the preceding argument handling, but `read_cache()` must be run *after* enabling sparse index for the command (so that the index is not expanded unnecessarily) and *before* using the index for reset (so that it is verified as uncorrupted). Signed-off-by: Victoria Dye <vdye@github.com> --- builtin/reset.c | 10 +++++++--- cache-tree.c | 1 + 2 files changed, 8 insertions(+), 3 deletions(-) diff --git a/builtin/reset.c b/builtin/reset.c index e441b6601b9..0ac0de7dc97 100644 --- a/builtin/reset.c +++ b/builtin/reset.c @@ -180,6 +180,7 @@ static int read_from_tree(const struct pathspec *pathspec, opt.flags.override_submodule_config = 1; opt.repo = the_repository; + ensure_full_index(&the_index); if (do_diff_cache(tree_oid, &opt)) return 1; diffcore_std(&opt); @@ -257,9 +258,6 @@ static void parse_args(struct pathspec *pathspec, } *rev_ret = rev; - if (read_cache() < 0) - die(_("index file corrupt")); - parse_pathspec(pathspec, 0, PATHSPEC_PREFER_FULL | (patch_mode ? 
PATHSPEC_PREFIX_ORIGIN : 0), @@ -405,6 +403,12 @@ int cmd_reset(int argc, const char **argv, const char *prefix) if (intent_to_add && reset_type != MIXED) die(_("-N can only be used with --mixed")); + prepare_repo_settings(the_repository); + the_repository->settings.command_requires_full_index = 0; + + if (read_cache() < 0) + die(_("index file corrupt")); + /* Soft reset does not touch the index file nor the working tree * at all, but requires them in a good order. Other resets reset * the index file to the tree object we are switching to. */ diff --git a/cache-tree.c b/cache-tree.c index 90919f9e345..9be19c85b66 100644 --- a/cache-tree.c +++ b/cache-tree.c @@ -776,6 +776,7 @@ void prime_cache_tree(struct repository *r, cache_tree_free(&istate->cache_tree); istate->cache_tree = cache_tree(); + ensure_full_index(istate); prime_cache_tree_rec(r, istate->cache_tree, tree); istate->cache_changed |= CACHE_TREE_CHANGED; trace2_region_leave("cache-tree", "prime_cache_tree", the_repository); -- gitgitgadget
From: Victoria Dye <vdye@github.com> Remove `ensure_full_index` guard on `prime_cache_tree` and update `prime_cache_tree_rec` to correctly reconstruct sparse directory entries in the cache tree. While processing a tree's entries, `prime_cache_tree_rec` must determine whether a directory entry is sparse or not by searching for it in the index (*without* expanding the index). If a matching sparse directory index entry is found, no subtrees are added to the cache tree entry and the entry count is set to 1 (representing the sparse directory itself). Otherwise, the tree is assumed to not be sparse and its subtrees are recursively added to the cache tree. Helped-by: Elijah Newren <newren@gmail.com> Signed-off-by: Victoria Dye <vdye@github.com> --- cache-tree.c | 47 ++++++++++++++++++++++-- cache.h | 10 +++++ read-cache.c | 27 ++++++++++---- t/t1092-sparse-checkout-compatibility.sh | 15 +++++++- 4 files changed, 86 insertions(+), 13 deletions(-) diff --git a/cache-tree.c b/cache-tree.c index 9be19c85b66..2866101052c 100644 --- a/cache-tree.c +++ b/cache-tree.c @@ -740,15 +740,26 @@ out: return ret; } +static void prime_cache_tree_sparse_dir(struct cache_tree *it, + struct tree *tree) +{ + + oidcpy(&it->oid, &tree->object.oid); + it->entry_count = 1; +} + static void prime_cache_tree_rec(struct repository *r, struct cache_tree *it, - struct tree *tree) + struct tree *tree, + struct strbuf *tree_path) { struct tree_desc desc; struct name_entry entry; int cnt; + int base_path_len = tree_path->len; oidcpy(&it->oid, &tree->object.oid); + init_tree_desc(&desc, tree->buffer, tree->size); cnt = 0; while (tree_entry(&desc, &entry)) { @@ -757,14 +768,40 @@ static void prime_cache_tree_rec(struct repository *r, else { struct cache_tree_sub *sub; struct tree *subtree = lookup_tree(r, &entry.oid); + if (!subtree->object.parsed) parse_tree(subtree); sub = cache_tree_sub(it, entry.path); sub->cache_tree = cache_tree(); - prime_cache_tree_rec(r, sub->cache_tree, subtree); + + /* + * 
Recursively-constructed subtree path is only needed when working + * in a sparse index (where it's used to determine whether the + * subtree is a sparse directory in the index). + */ + if (r->index->sparse_index) { + strbuf_setlen(tree_path, base_path_len); + strbuf_grow(tree_path, base_path_len + entry.pathlen + 1); + strbuf_add(tree_path, entry.path, entry.pathlen); + strbuf_addch(tree_path, '/'); + } + + /* + * If a sparse index is in use, the directory being processed may be + * sparse. To confirm that, we can check whether an entry with that + * exact name exists in the index. If it does, the created subtree + * should be sparse. Otherwise, cache tree expansion should continue + * as normal. + */ + if (r->index->sparse_index && + index_entry_exists(r->index, tree_path->buf, tree_path->len)) + prime_cache_tree_sparse_dir(sub->cache_tree, subtree); + else + prime_cache_tree_rec(r, sub->cache_tree, subtree, tree_path); cnt += sub->cache_tree->entry_count; } } + it->entry_count = cnt; } @@ -772,12 +809,14 @@ void prime_cache_tree(struct repository *r, struct index_state *istate, struct tree *tree) { + struct strbuf tree_path = STRBUF_INIT; + trace2_region_enter("cache-tree", "prime_cache_tree", the_repository); cache_tree_free(&istate->cache_tree); istate->cache_tree = cache_tree(); - ensure_full_index(istate); - prime_cache_tree_rec(r, istate->cache_tree, tree); + prime_cache_tree_rec(r, istate->cache_tree, tree, &tree_path); + strbuf_release(&tree_path); istate->cache_changed |= CACHE_TREE_CHANGED; trace2_region_leave("cache-tree", "prime_cache_tree", the_repository); } diff --git a/cache.h b/cache.h index f6295f3b048..1d3e4665562 100644 --- a/cache.h +++ b/cache.h @@ -816,6 +816,16 @@ struct cache_entry *index_file_exists(struct index_state *istate, const char *na */ int index_name_pos(struct index_state *, const char *name, int namelen); +/* + * Determines whether an entry with the given name exists within the + * given index. 
The return value is 1 if an exact match is found, otherwise + * it is 0. Note that, unlike index_name_pos, this function does not expand + * the index if it is sparse. If an item exists within the full index but it + * is contained within a sparse directory (and not in the sparse index), 0 is + * returned. + */ +int index_entry_exists(struct index_state *, const char *name, int namelen); + /* * Some functions return the negative complement of an insert position when a * precise match was not found but a position was found where the entry would diff --git a/read-cache.c b/read-cache.c index f5d4385c408..c079ece981a 100644 --- a/read-cache.c +++ b/read-cache.c @@ -68,6 +68,11 @@ */ #define CACHE_ENTRY_PATH_LENGTH 80 +enum index_search_mode { + NO_EXPAND_SPARSE = 0, + EXPAND_SPARSE = 1 +}; + static inline struct cache_entry *mem_pool__ce_alloc(struct mem_pool *mem_pool, size_t len) { struct cache_entry *ce; @@ -551,7 +556,10 @@ int cache_name_stage_compare(const char *name1, int len1, int stage1, const char return 0; } -static int index_name_stage_pos(struct index_state *istate, const char *name, int namelen, int stage) +static int index_name_stage_pos(struct index_state *istate, + const char *name, int namelen, + int stage, + enum index_search_mode search_mode) { int first, last; @@ -570,7 +578,7 @@ static int index_name_stage_pos(struct index_state *istate, const char *name, in first = next+1; } - if (istate->sparse_index && + if (search_mode == EXPAND_SPARSE && istate->sparse_index && first > 0) { /* Note: first <= istate->cache_nr */ struct cache_entry *ce = istate->cache[first - 1]; @@ -586,7 +594,7 @@ static int index_name_stage_pos(struct index_state *istate, const char *name, in ce_namelen(ce) < namelen && !strncmp(name, ce->name, ce_namelen(ce))) { ensure_full_index(istate); - return index_name_stage_pos(istate, name, namelen, stage); + return index_name_stage_pos(istate, name, namelen, stage, search_mode); } } @@ -595,7 +603,12 @@ static int 
index_name_stage_pos(struct index_state *istate, const char *name, in int index_name_pos(struct index_state *istate, const char *name, int namelen) { - return index_name_stage_pos(istate, name, namelen, 0); + return index_name_stage_pos(istate, name, namelen, 0, EXPAND_SPARSE); +} + +int index_entry_exists(struct index_state *istate, const char *name, int namelen) +{ + return index_name_stage_pos(istate, name, namelen, 0, NO_EXPAND_SPARSE) >= 0; } int remove_index_entry_at(struct index_state *istate, int pos) @@ -1222,7 +1235,7 @@ static int has_dir_name(struct index_state *istate, */ } - pos = index_name_stage_pos(istate, name, len, stage); + pos = index_name_stage_pos(istate, name, len, stage, EXPAND_SPARSE); if (pos >= 0) { /* * Found one, but not so fast. This could @@ -1322,7 +1335,7 @@ static int add_index_entry_with_check(struct index_state *istate, struct cache_e strcmp(ce->name, istate->cache[istate->cache_nr - 1]->name) > 0) pos = index_pos_to_insert_pos(istate->cache_nr); else - pos = index_name_stage_pos(istate, ce->name, ce_namelen(ce), ce_stage(ce)); + pos = index_name_stage_pos(istate, ce->name, ce_namelen(ce), ce_stage(ce), EXPAND_SPARSE); /* existing match? Just replace it. 
*/ if (pos >= 0) { @@ -1357,7 +1370,7 @@ static int add_index_entry_with_check(struct index_state *istate, struct cache_e if (!ok_to_replace) return error(_("'%s' appears as both a file and as a directory"), ce->name); - pos = index_name_stage_pos(istate, ce->name, ce_namelen(ce), ce_stage(ce)); + pos = index_name_stage_pos(istate, ce->name, ce_namelen(ce), ce_stage(ce), EXPAND_SPARSE); pos = -pos-1; } return pos + 1; diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh index 875cdcb0495..4ac93874cb2 100755 --- a/t/t1092-sparse-checkout-compatibility.sh +++ b/t/t1092-sparse-checkout-compatibility.sh @@ -756,9 +756,9 @@ test_expect_success 'sparse-index is not expanded' ' ensure_not_expanded checkout - && ensure_not_expanded switch rename-out-to-out && ensure_not_expanded switch - && - git -C sparse-index reset --hard && + ensure_not_expanded reset --hard && ensure_not_expanded checkout rename-out-to-out -- deep/deeper1 && - git -C sparse-index reset --hard && + ensure_not_expanded reset --hard && ensure_not_expanded restore -s rename-out-to-out -- deep/deeper1 && echo >>sparse-index/README.md && @@ -768,6 +768,17 @@ test_expect_success 'sparse-index is not expanded' ' echo >>sparse-index/untracked.txt && ensure_not_expanded add . && + for ref in update-deep update-folder1 update-folder2 update-deep + do + echo >>sparse-index/README.md && + ensure_not_expanded reset --hard $ref || return 1 + done && + + ensure_not_expanded reset --hard update-deep && + ensure_not_expanded reset --keep base && + ensure_not_expanded reset --merge update-deep && + ensure_not_expanded reset --hard && + ensure_not_expanded checkout -f update-deep && test_config -C sparse-index pull.twohead ort && ( -- gitgitgadget
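The sparse-directory check described in the commit message above (build the subtree path with a trailing slash, then look for an exact match without expanding the index) can be illustrated on a toy index. This is a sketch under stated assumptions: the index is modeled as a sorted array of path strings, `toy_entry_exists` and `toy_is_sparse_dir` are hypothetical helpers, and `strcmp` ordering is used as an approximation of git's index ordering.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Binary search for an exact match in a strcmp-sorted array of names.
 * Returns 1 if found, 0 otherwise. Nothing is "expanded" on a miss.
 */
static int toy_entry_exists(const char *const *names, int nr, const char *name)
{
	int first = 0, last = nr;

	while (first < last) {
		int mid = first + (last - first) / 2;
		int cmp = strcmp(name, names[mid]);

		if (!cmp)
			return 1;
		if (cmp < 0)
			last = mid;
		else
			first = mid + 1;
	}
	return 0;
}

/*
 * Returns 1 if directory `dir` under `base` (either empty or ending in '/')
 * is present in the toy index as a single entry "<base><dir>/", which is
 * how a sparse directory appears in this model.
 */
static int toy_is_sparse_dir(const char *const *names, int nr,
			     const char *base, const char *dir)
{
	char path[256];
	int len = snprintf(path, sizeof(path), "%s%s/", base, dir);

	assert(len > 0 && (size_t)len < sizeof(path));
	return toy_entry_exists(names, nr, path);
}
```

In this model, a directory found by `toy_is_sparse_dir` would get an entry count of 1 and no subtrees, while any other directory would be recursed into as before.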
From: Victoria Dye <vdye@github.com> Remove the `ensure_full_index` guard on `read_from_tree` and update `git reset --mixed` to ensure it can use sparse directory index entries wherever possible. Sparse directory entries are reset using `diff_tree_oid`, which requires `change` and `add_remove` functions to process the internal contents of the sparse directory. The `recursive` diff option handles cases in which `reset --mixed` must diff/merge files that are nested multiple levels deep in a sparse directory. The use of pathspecs with `git reset --mixed` introduces scenarios in which internal contents of sparse directories may be matched by the pathspec. In order to reset *all* files in the repo that may match the pathspec, the following conditions on the pathspec require index expansion before performing the reset: * "magic" pathspecs * wildcard pathspecs that do not match only in-cone files or entire sparse directories * literal pathspecs matching something outside the sparse checkout definition Helped-by: Elijah Newren <newren@gmail.com> Signed-off-by: Victoria Dye <vdye@github.com> --- builtin/reset.c | 78 +++++++++++++++++++++++- t/t1092-sparse-checkout-compatibility.sh | 17 ++++++ 2 files changed, 93 insertions(+), 2 deletions(-) diff --git a/builtin/reset.c b/builtin/reset.c index 0ac0de7dc97..60517e7e1d6 100644 --- a/builtin/reset.c +++ b/builtin/reset.c @@ -148,7 +148,9 @@ static void update_index_from_diff(struct diff_queue_struct *q, * If the file 1) corresponds to an existing index entry with * skip-worktree set, or 2) does not exist in the index but is * outside the sparse checkout definition, add a skip-worktree bit - * to the new index entry. + * to the new index entry. Note that a sparse index will be expanded + * if this entry is outside the sparse cone - this is necessary + * to properly construct the reset sparse directory. 
*/ pos = cache_name_pos(one->path, strlen(one->path)); if ((pos >= 0 && ce_skip_worktree(active_cache[pos])) || @@ -166,6 +168,73 @@ static void update_index_from_diff(struct diff_queue_struct *q, } } +static int pathspec_needs_expanded_index(const struct pathspec *pathspec) +{ + unsigned int i, pos; + int res = 0; + char *skip_worktree_seen = NULL; + + /* + * When using a magic pathspec, assume for the sake of simplicity that + * the index needs to be expanded to match all matchable files. + */ + if (pathspec->magic) + return 1; + + for (i = 0; i < pathspec->nr; i++) { + struct pathspec_item item = pathspec->items[i]; + + /* + * If the pathspec item has a wildcard, the index should be expanded + * if the pathspec has the possibility of matching a subset of entries inside + * of a sparse directory (but not the entire directory). + * + * If the pathspec item is a literal path, the index only needs to be expanded + * if a) the pathspec isn't in the sparse checkout cone (to make sure we don't + * expand for in-cone files) and b) it doesn't match any sparse directories + * (since we can reset whole sparse directories without expanding them). + */ + if (item.nowildcard_len < item.len) { + for (pos = 0; pos < active_nr; pos++) { + struct cache_entry *ce = active_cache[pos]; + + if (!S_ISSPARSEDIR(ce->ce_mode)) + continue; + + /* + * If the pre-wildcard length is longer than the sparse + * directory name and the sparse directory is the first + * component of the pathspec, need to expand the index. + */ + if (item.nowildcard_len > ce_namelen(ce) && + !strncmp(item.original, ce->name, ce_namelen(ce))) { + res = 1; + break; + } + + /* + * If the pre-wildcard length is shorter than the sparse + * directory and the pathspec does not match the whole + * directory, need to expand the index. 
+ */ + if (!strncmp(item.original, ce->name, item.nowildcard_len) && + wildmatch(item.original, ce->name, 0)) { + res = 1; + break; + } + } + } else if (!path_in_cone_mode_sparse_checkout(item.original, &the_index) && + !matches_skip_worktree(pathspec, i, &skip_worktree_seen)) + res = 1; + + if (res > 0) + break; + } + + free(skip_worktree_seen); + return res; +} + static int read_from_tree(const struct pathspec *pathspec, struct object_id *tree_oid, int intent_to_add) @@ -178,9 +247,14 @@ static int read_from_tree(const struct pathspec *pathspec, opt.format_callback = update_index_from_diff; opt.format_callback_data = &intent_to_add; opt.flags.override_submodule_config = 1; + opt.flags.recursive = 1; opt.repo = the_repository; + opt.change = diff_change; + opt.add_remove = diff_addremove; + + if (pathspec->nr && the_index.sparse_index && pathspec_needs_expanded_index(pathspec)) + ensure_full_index(&the_index); - ensure_full_index(&the_index); if (do_diff_cache(tree_oid, &opt)) return 1; diffcore_std(&opt); diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh index 4ac93874cb2..c9343ff5b9c 100755 --- a/t/t1092-sparse-checkout-compatibility.sh +++ b/t/t1092-sparse-checkout-compatibility.sh @@ -774,11 +774,28 @@ test_expect_success 'sparse-index is not expanded' ' ensure_not_expanded reset --hard $ref || return 1 done && + ensure_not_expanded reset --mixed base && ensure_not_expanded reset --hard update-deep && ensure_not_expanded reset --keep base && ensure_not_expanded reset --merge update-deep && ensure_not_expanded reset --hard && + ensure_not_expanded reset base -- deep/a && + ensure_not_expanded reset base -- nonexistent-file && + ensure_not_expanded reset deepest -- deep && + + # Although folder1 is outside the sparse definition, it exists as a + # directory entry in the index, so the pathspec will not force the + # index to be expanded. 
+ ensure_not_expanded reset deepest -- folder1 && + ensure_not_expanded reset deepest -- folder1/ && + + # Wildcard identifies only in-cone files, no index expansion + ensure_not_expanded reset deepest -- deep/\* && + + # Wildcard identifies only full sparse directories, no index expansion + ensure_not_expanded reset deepest -- folder\* && + ensure_not_expanded checkout -f update-deep && test_config -C sparse-index pull.twohead ort && ( -- gitgitgadget
From: Victoria Dye <vdye@github.com> To find the first non-unpacked cache entry, `next_cache_entry` iterates through the index, starting at `cache_bottom`. The performance of this in full indexes is helped by `cache_bottom` advancing with each invocation of `mark_ce_used` (called by `unpack_index_entry`). However, the presence of sparse directories can prevent the `cache_bottom` from advancing in a sparse index case, effectively forcing `next_cache_entry` to search from the beginning of the index each time it is called. The `cache_bottom` must be preserved for the sparse index (see 17a1bb570b (unpack-trees: preserve cache_bottom, 2021-07-14)). Therefore, to retain the benefit `cache_bottom` provides in non-sparse index cases, a separate `hint` position indicates the first position `next_cache_entry` should search, and is updated on each call with a new position. Signed-off-by: Victoria Dye <vdye@github.com> --- unpack-trees.c | 23 +++++++++++++++++------ 1 file changed, 17 insertions(+), 6 deletions(-) diff --git a/unpack-trees.c b/unpack-trees.c index 8ea0a542da8..b94733de6be 100644 --- a/unpack-trees.c +++ b/unpack-trees.c @@ -645,17 +645,24 @@ static void mark_ce_used_same_name(struct cache_entry *ce, } } -static struct cache_entry *next_cache_entry(struct unpack_trees_options *o) +static struct cache_entry *next_cache_entry(struct unpack_trees_options *o, int *hint) { const struct index_state *index = o->src_index; int pos = o->cache_bottom; + if (*hint > pos) + pos = *hint; + while (pos < index->cache_nr) { struct cache_entry *ce = index->cache[pos]; - if (!(ce->ce_flags & CE_UNPACKED)) + if (!(ce->ce_flags & CE_UNPACKED)) { + *hint = pos + 1; return ce; + } pos++; } + + *hint = pos; return NULL; } @@ -1365,12 +1372,13 @@ static int unpack_callback(int n, unsigned long mask, unsigned long dirmask, str /* Are we supposed to look at the index too? 
*/ if (o->merge) { + int hint = -1; while (1) { int cmp; struct cache_entry *ce; if (o->diff_index_cached) - ce = next_cache_entry(o); + ce = next_cache_entry(o, &hint); else ce = find_cache_entry(info, p); @@ -1690,7 +1698,7 @@ static int verify_absent(const struct cache_entry *, int unpack_trees(unsigned len, struct tree_desc *t, struct unpack_trees_options *o) { struct repository *repo = the_repository; - int i, ret; + int i, hint, ret; static struct cache_entry *dfc; struct pattern_list pl; int free_pattern_list = 0; @@ -1763,13 +1771,15 @@ int unpack_trees(unsigned len, struct tree_desc *t, struct unpack_trees_options info.pathspec = o->pathspec; if (o->prefix) { + hint = -1; + /* * Unpack existing index entries that sort before the * prefix the tree is spliced into. Note that o->merge * is always true in this case. */ while (1) { - struct cache_entry *ce = next_cache_entry(o); + struct cache_entry *ce = next_cache_entry(o, &hint); if (!ce) break; if (ce_in_traverse_path(ce, &info)) @@ -1790,8 +1800,9 @@ int unpack_trees(unsigned len, struct tree_desc *t, struct unpack_trees_options /* Any left-over entries in the index? */ if (o->merge) { + hint = -1; while (1) { - struct cache_entry *ce = next_cache_entry(o); + struct cache_entry *ce = next_cache_entry(o, &hint); if (!ce) break; if (unpack_index_entry(ce, o) < 0) -- gitgitgadget
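The effect of the `hint` parameter can be seen in isolation with a toy model of the index. This is a sketch only: `struct toy_index`, `struct toy_entry`, `TOY_UNPACKED`, and `toy_next_entry` are hypothetical names that mirror the shape of `next_cache_entry` in the patch above, with the index reduced to an array of flag words.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_UNPACKED 1u

/* Hypothetical stand-ins for git's cache entry and index state. */
struct toy_entry {
	unsigned flags;
};

struct toy_index {
	struct toy_entry *cache;
	int cache_nr;
	int cache_bottom; /* may stay fixed when sparse directories are present */
};

/*
 * Return the first entry at or after max(cache_bottom, *hint) that is not
 * marked TOY_UNPACKED, or NULL if none remains. *hint is advanced so the
 * next call does not rescan positions that were already examined.
 */
static struct toy_entry *toy_next_entry(struct toy_index *idx, int *hint)
{
	int pos = idx->cache_bottom;

	assert(hint != NULL);
	if (*hint > pos)
		pos = *hint;

	while (pos < idx->cache_nr) {
		struct toy_entry *e = &idx->cache[pos];

		if (!(e->flags & TOY_UNPACKED)) {
			*hint = pos + 1;
			return e;
		}
		pos++;
	}

	*hint = pos;
	return NULL;
}
```

Because `*hint` moves past the returned entry, a caller in this model is expected to finish with that entry before asking for the next one. Starting each loop with `hint = -1`, as the callers in the patch do, makes the first search begin at `cache_bottom`.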
On 08/10/21 04.15, Victoria Dye via GitGitGadget wrote:
> From: Victoria Dye <vdye@github.com>
>
> Add a new `--force-full-index` option to `git update-index`, which skips
> explicitly setting `command_requires_full_index`. This option, intended for
> use in internal testing purposes only, lets `git update-index` run as a
> command without sparse index compatibility implemented, even after it
> receives updates to otherwise use the sparse index.
>
> The specific test `--force-full-index` is intended for - `t1092 -
> sparse-index is expanded and converted back` - verifies index compatibility
> in commands that do not change the default (enabled)
> `command_requires_full_index` repo setting. In the past, the test used `git
> reset`. However, as `reset` and other commands are integrated with the
> sparse index, the command used in the test would need to keep changing.
> Conversely, the `--force-full-index` option makes `git update-index` behave
> like a not-yet-sparse-aware command, and can be used in the test
> indefinitely without interfering with future sparse index integrations.
>
> Helped-by: Junio C Hamano <gitster@pobox.com>
> Signed-off-by: Victoria Dye <vdye@github.com>
Grammar looks OK.
Reviewed-by: Bagas Sanjaya <bagasdotme@gmail.com>
--
An old man doll... just what I always wanted! - Clara
On Wed, Oct 6, 2021 at 1:40 PM Victoria Dye <vdye@github.com> wrote:
>
> Elijah Newren wrote:
> > On Tue, Oct 5, 2021 at 6:20 AM Victoria Dye via GitGitGadget
> > <gitgitgadget@gmail.com> wrote:
> >>
> >> From: Victoria Dye <vdye@github.com>
> >>
> >> Add a new `--force-full-index` option to `git update-index`, which skips
> >> explicitly setting `command_requires_full_index`. This lets `git
> >> update-index --force-full-index` run as a command without sparse index
> >> compatibility implemented, even after it receives sparse index compatibility
> >> updates.
> >>
> >> By using `git update-index --force-full-index` in the `t1092` test
> >> `sparse-index is expanded and converted back`, commands can continue to
> >> integrate with the sparse index without the need to keep modifying the
> >> command used in the test.
> >
> > So...we're adding a permanent user-facing command line flag, whose
> > purpose is just to help us with the transition work of implementing
> > sparse indexes everywhere? Am I reading that right, or is that just
> > the reason for t1092 and there are more reasons for it elsewhere?
> >
> > Also, I'm curious if update-index is the right place to add this. If
> > you don't want a sparse index anymore, wouldn't a user want to run
> > git sparse-checkout disable
> > ? Or is the point that you do want to keep the sparse checkout, but
> > you just don't want the index to also be sparse? Still, even in that
> > case, it seems like adding a subcommand or flag to an existing
> > sparse-checkout subcommand would feel more natural, since
> > sparse-checkout is the command the user uses to request to get into a
> > sparse-checkout and sparse index.
> >
>
> This came out of a conversation [1] on an earlier version of this patch.
> Because the `t1092 - sparse-index is expanded and converted back` test
> verifies sparse index compatibility (i.e., expand the index when reading,
> collapse back to sparse when writing) on commands that don't have any sparse
> index integration, it needed to be changed from `git reset` to something
> else. However, as we keep integrating commands with sparse index we'd need
> to keep changing the command in the test, creating a bunch of patches doing
> effectively the same thing for no long-term benefit.
>
> The `--force-full-index` flag isn't meant to be used externally or modify
> the index in any "new" way - it's really just a "test" version of `git
> update-index` that we guarantee will accurately represent a command using
> the default settings. Right now, it does exactly what `git update-index`
> (without the flag) does, and will only behave differently once `git
> update-index` is integrated with sparse index. Using `--force-full-index`,
> the test won't need to be regularly updated and will continue to catch
> errors like:
>
> 1. Changing the default value of `command_requires_full_index` to 0
> 2. Not expanding a sparse index to full when `command_requires_full_index`
> is 1
> 3. Not collapsing the index back to sparse if sparse index is enabled
>
> I see the issue of introducing a test-only option (when sparse index is
> integrated everywhere, shouldn't it be deprecated?). If there's a way to
> make this more obviously internal/temporary, I'm happy to modify it. Or, if
> semi-frequent updates of the command in the test aren't a huge issue, I can
> revert to V1.
If it's a test-only capability you need, I'd say add it under
t/helper/ somewhere, either a new flag for an existing subcommand of
test-tool, or a new subcommand for test-tool.
"Victoria Dye via GitGitGadget" <gitgitgadget@gmail.com> writes:

> +	/*
> +	 * If --force-full-index is set, the command should skip manually
> +	 * setting `command_requires_full_index`.
> +	 */

Hmph, doesn't that feel unnaturally backwards, though?

The settings.command_requires_full_index bit forces read-cache to
call ensure_full_index() immediately after the in-core index is read
from the disk.  If we are forcing operating on the full index, I'd
imagine that we'd be making sure that ensure_full_index() is
called.

I do not see anything in the code that ensures active_cache_changed
to be flipped on.  So the new test that says

    git -C sparse-index -c core.fsmonitor="" update-index --force-full-index

may not call ensure_full_index(), but because nothing marks
the_index as changed, I think we won't call write_locked_index() at
the end of cmd_update_index().  IOW, what we have in the test patch
may be an expensive noop, no?

Or perhaps I am reading the patch completely incorrectly.  I dunno.

> +	prepare_repo_settings(r);
> +	if (!use_default_full_index)
> +		r->settings.command_requires_full_index = 1;
> +
>  	entries = read_cache();
>  	if (entries < 0)
>  		die("cache corrupted");
> diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
> index 889079f55b8..4aa4fef7b4f 100755
> --- a/t/t1092-sparse-checkout-compatibility.sh
> +++ b/t/t1092-sparse-checkout-compatibility.sh
> @@ -635,7 +635,7 @@ test_expect_success 'sparse-index is expanded and converted back' '
>  	init_repos &&
>  
>  	GIT_TRACE2_EVENT="$(pwd)/trace2.txt" GIT_TRACE2_EVENT_NESTING=10 \
> -		git -C sparse-index -c core.fsmonitor="" reset --hard &&
> +		git -C sparse-index -c core.fsmonitor="" update-index --force-full-index &&
>  	test_region index convert_to_sparse trace2.txt &&
>  	test_region index ensure_full_index trace2.txt
>  '
"Kevin Willford via GitGitGadget" <gitgitgadget@gmail.com> writes:
> @@ -141,6 +143,18 @@ static void update_index_from_diff(struct diff_queue_struct *q,
>
> ce = make_cache_entry(&the_index, one->mode, &one->oid, one->path,
> 0, 0);
> +
> + /*
> + * If the file 1) corresponds to an existing index entry with
> + * skip-worktree set, or 2) does not exist in the index but is
> + * outside the sparse checkout definition, add a skip-worktree bit
> + * to the new index entry.
> + */
> + pos = cache_name_pos(one->path, strlen(one->path));
> + if ((pos >= 0 && ce_skip_worktree(active_cache[pos])) ||
> + (pos < 0 && !path_in_sparse_checkout(one->path, &the_index)))
> + ce->ce_flags |= CE_SKIP_WORKTREE;
OK. Nicely explained.
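The two-case rule in the quoted comment can be restated as a small standalone predicate. This is only a sketch: `struct sketch_lookup` and its fields are hypothetical stand-ins for the result of `cache_name_pos()`, `ce_skip_worktree()` and `path_in_sparse_checkout()`, not git's API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical summary of what the index lookup found for one path. */
struct sketch_lookup {
	bool in_index;		/* corresponds to pos >= 0 */
	bool skip_worktree;	/* only meaningful when in_index is true */
	bool in_sparse_checkout;	/* stand-in for path_in_sparse_checkout() */
};

/*
 * Returns true when the new index entry should get the skip-worktree bit:
 * 1) an existing entry already has it, or
 * 2) there is no entry and the path is outside the sparse checkout.
 */
static bool sketch_needs_skip_worktree(const struct sketch_lookup *l)
{
	assert(l != NULL);
	if (l->in_index)
		return l->skip_worktree;
	return !l->in_sparse_checkout;
}
```

Note that an existing entry without skip-worktree never gains the bit, even when its path is outside the sparse checkout definition; that matches the condition in the patch.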
Hi Victoria

On 07/10/2021 22:15, Victoria Dye via GitGitGadget wrote:
> From: Victoria Dye <vdye@github.com>
>
> Remove `ensure_full_index` guard on `prime_cache_tree` and update
> `prime_cache_tree_rec` to correctly reconstruct sparse directory entries in
> the cache tree. While processing a tree's entries, `prime_cache_tree_rec`
> must determine whether a directory entry is sparse or not by searching for
> it in the index (*without* expanding the index). If a matching sparse
> directory index entry is found, no subtrees are added to the cache tree
> entry and the entry count is set to 1 (representing the sparse directory
> itself). Otherwise, the tree is assumed to not be sparse and its subtrees
> are recursively added to the cache tree.

I was looking at the callers to prime_cache_tree() this morning and would
like to suggest an alternative approach - just delete prime_cache_tree()
and all of its callers! As far as I can see it is only ever called after a
successful call to unpack_trees() and since 52fca2184d ("unpack-trees:
populate cache-tree on successful merge", 2015-07-28) unpack_trees() updates
the cache tree for the caller. All the call sites are pretty obvious apart
from the one in t/helper/test-fast-rebase.c where unpack_trees() is called
by merge_switch_to_result()

Best Wishes

Phillip

> Helped-by: Elijah Newren <newren@gmail.com>
> Signed-off-by: Victoria Dye <vdye@github.com>
> ---
>  cache-tree.c                             | 47 ++++++++++++++++++++++--
>  cache.h                                  | 10 +++++
>  read-cache.c                             | 27 ++++++++++----
>  t/t1092-sparse-checkout-compatibility.sh | 15 +++++++-
>  4 files changed, 86 insertions(+), 13 deletions(-)
>
> diff --git a/cache-tree.c b/cache-tree.c
> index 9be19c85b66..2866101052c 100644
> --- a/cache-tree.c
> +++ b/cache-tree.c
> @@ -740,15 +740,26 @@ out:
>  	return ret;
>  }
>  
> +static void prime_cache_tree_sparse_dir(struct cache_tree *it,
> +					struct tree *tree)
> +{
> +
> +	oidcpy(&it->oid, &tree->object.oid);
> +	it->entry_count = 1;
> +}
> +
>  static void prime_cache_tree_rec(struct repository *r,
>  				 struct cache_tree *it,
> -				 struct tree *tree)
> +				 struct tree *tree,
> +				 struct strbuf *tree_path)
>  {
>  	struct tree_desc desc;
>  	struct name_entry entry;
>  	int cnt;
> +	int base_path_len = tree_path->len;
>  
>  	oidcpy(&it->oid, &tree->object.oid);
> +
>  	init_tree_desc(&desc, tree->buffer, tree->size);
>  	cnt = 0;
>  	while (tree_entry(&desc, &entry)) {
> @@ -757,14 +768,40 @@ static void prime_cache_tree_rec(struct repository *r,
>  		else {
>  			struct cache_tree_sub *sub;
>  			struct tree *subtree = lookup_tree(r, &entry.oid);
> +
>  			if (!subtree->object.parsed)
>  				parse_tree(subtree);
>  			sub = cache_tree_sub(it, entry.path);
>  			sub->cache_tree = cache_tree();
> -			prime_cache_tree_rec(r, sub->cache_tree, subtree);
> +
> +			/*
> +			 * Recursively-constructed subtree path is only needed when working
> +			 * in a sparse index (where it's used to determine whether the
> +			 * subtree is a sparse directory in the index).
> +			 */
> +			if (r->index->sparse_index) {
> +				strbuf_setlen(tree_path, base_path_len);
> +				strbuf_grow(tree_path, base_path_len + entry.pathlen + 1);
> +				strbuf_add(tree_path, entry.path, entry.pathlen);
> +				strbuf_addch(tree_path, '/');
> +			}
> +
> +			/*
> +			 * If a sparse index is in use, the directory being processed may be
> +			 * sparse. To confirm that, we can check whether an entry with that
> +			 * exact name exists in the index. If it does, the created subtree
> +			 * should be sparse. Otherwise, cache tree expansion should continue
> +			 * as normal.
> +			 */
> +			if (r->index->sparse_index &&
> +			    index_entry_exists(r->index, tree_path->buf, tree_path->len))
> +				prime_cache_tree_sparse_dir(sub->cache_tree, subtree);
> +			else
> +				prime_cache_tree_rec(r, sub->cache_tree, subtree, tree_path);
>  			cnt += sub->cache_tree->entry_count;
>  		}
>  	}
> +
>  	it->entry_count = cnt;
>  }
>  
> @@ -772,12 +809,14 @@ void prime_cache_tree(struct repository *r,
>  		      struct index_state *istate,
>  		      struct tree *tree)
>  {
> +	struct strbuf tree_path = STRBUF_INIT;
> +
>  	trace2_region_enter("cache-tree", "prime_cache_tree", the_repository);
>  	cache_tree_free(&istate->cache_tree);
>  	istate->cache_tree = cache_tree();
>  
> -	ensure_full_index(istate);
> -	prime_cache_tree_rec(r, istate->cache_tree, tree);
> +	prime_cache_tree_rec(r, istate->cache_tree, tree, &tree_path);
> +	strbuf_release(&tree_path);
>  	istate->cache_changed |= CACHE_TREE_CHANGED;
>  	trace2_region_leave("cache-tree", "prime_cache_tree", the_repository);
>  }
> diff --git a/cache.h b/cache.h
> index f6295f3b048..1d3e4665562 100644
> --- a/cache.h
> +++ b/cache.h
> @@ -816,6 +816,16 @@ struct cache_entry *index_file_exists(struct index_state *istate, const char *na
>   */
>  int index_name_pos(struct index_state *, const char *name, int namelen);
>  
> +/*
> + * Determines whether an entry with the given name exists within the
> + * given index. The return value is 1 if an exact match is found, otherwise
> + * it is 0. Note that, unlike index_name_pos, this function does not expand
> + * the index if it is sparse. If an item exists within the full index but it
> + * is contained within a sparse directory (and not in the sparse index), 0 is
> + * returned.
> + */
> +int index_entry_exists(struct index_state *, const char *name, int namelen);
> +
>  /*
>   * Some functions return the negative complement of an insert position when a
>   * precise match was not found but a position was found where the entry would
> diff --git a/read-cache.c b/read-cache.c
> index f5d4385c408..c079ece981a 100644
> --- a/read-cache.c
> +++ b/read-cache.c
> @@ -68,6 +68,11 @@
>   */
>  #define CACHE_ENTRY_PATH_LENGTH 80
>  
> +enum index_search_mode {
> +	NO_EXPAND_SPARSE = 0,
> +	EXPAND_SPARSE = 1
> +};
> +
>  static inline struct cache_entry *mem_pool__ce_alloc(struct mem_pool *mem_pool, size_t len)
>  {
>  	struct cache_entry *ce;
> @@ -551,7 +556,10 @@ int cache_name_stage_compare(const char *name1, int len1, int stage1, const char
>  	return 0;
>  }
>  
> -static int index_name_stage_pos(struct index_state *istate, const char *name, int namelen, int stage)
> +static int index_name_stage_pos(struct index_state *istate,
> +				const char *name, int namelen,
> +				int stage,
> +				enum index_search_mode search_mode)
>  {
>  	int first, last;
>  
> @@ -570,7 +578,7 @@ static int index_name_stage_pos(struct index_state *istate, const char *name, in
>  		first = next+1;
>  	}
>  
> -	if (istate->sparse_index &&
> +	if (search_mode == EXPAND_SPARSE && istate->sparse_index &&
>  	    first > 0) {
>  		/* Note: first <= istate->cache_nr */
>  		struct cache_entry *ce = istate->cache[first - 1];
> @@ -586,7 +594,7 @@ static int index_name_stage_pos(struct index_state *istate, const char *name, in
>  		    ce_namelen(ce) < namelen &&
>  		    !strncmp(name, ce->name, ce_namelen(ce))) {
>  			ensure_full_index(istate);
> -			return index_name_stage_pos(istate, name, namelen, stage);
> +			return index_name_stage_pos(istate, name, namelen, stage, search_mode);
>  		}
>  	}
>  
> @@ -595,7 +603,12 @@ static int index_name_stage_pos(struct index_state *istate, const char *name, in
>  
>  int index_name_pos(struct index_state *istate, const char *name, int namelen)
>  {
> -	return index_name_stage_pos(istate, name, namelen, 0);
> +	return index_name_stage_pos(istate, name, namelen, 0, EXPAND_SPARSE);
> +}
> +
> +int index_entry_exists(struct index_state *istate, const char *name, int namelen)
> +{
> +	return index_name_stage_pos(istate, name, namelen, 0, NO_EXPAND_SPARSE) >= 0;
>  }
>  
>  int remove_index_entry_at(struct index_state *istate, int pos)
> @@ -1222,7 +1235,7 @@ static int has_dir_name(struct index_state *istate,
>  			 */
>  		}
>  
> -		pos = index_name_stage_pos(istate, name, len, stage);
> +		pos = index_name_stage_pos(istate, name, len, stage, EXPAND_SPARSE);
>  		if (pos >= 0) {
>  			/*
>  			 * Found one, but not so fast.  This could
> @@ -1322,7 +1335,7 @@ static int add_index_entry_with_check(struct index_state *istate, struct cache_e
>  		 strcmp(ce->name, istate->cache[istate->cache_nr - 1]->name) > 0)
>  		pos = index_pos_to_insert_pos(istate->cache_nr);
>  	else
> -		pos = index_name_stage_pos(istate, ce->name, ce_namelen(ce), ce_stage(ce));
> +		pos = index_name_stage_pos(istate, ce->name, ce_namelen(ce), ce_stage(ce), EXPAND_SPARSE);
>  
>  	/* existing match? Just replace it. */
>  	if (pos >= 0) {
> @@ -1357,7 +1370,7 @@ static int add_index_entry_with_check(struct index_state *istate, struct cache_e
>  		if (!ok_to_replace)
>  			return error(_("'%s' appears as both a file and as a directory"),
>  				     ce->name);
> -		pos = index_name_stage_pos(istate, ce->name, ce_namelen(ce), ce_stage(ce));
> +		pos = index_name_stage_pos(istate, ce->name, ce_namelen(ce), ce_stage(ce), EXPAND_SPARSE);
>  		pos = -pos-1;
>  	}
>  	return pos + 1;
> diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
> index 875cdcb0495..4ac93874cb2 100755
> --- a/t/t1092-sparse-checkout-compatibility.sh
> +++ b/t/t1092-sparse-checkout-compatibility.sh
> @@ -756,9 +756,9 @@ test_expect_success 'sparse-index is not expanded' '
>  	ensure_not_expanded checkout - &&
>  	ensure_not_expanded switch rename-out-to-out &&
>  	ensure_not_expanded switch - &&
> -	git -C sparse-index reset --hard &&
> +	ensure_not_expanded reset --hard &&
>  	ensure_not_expanded checkout rename-out-to-out -- deep/deeper1 &&
> -	git -C sparse-index reset --hard &&
> +	ensure_not_expanded reset --hard &&
>  	ensure_not_expanded restore -s rename-out-to-out -- deep/deeper1 &&
>  
>  	echo >>sparse-index/README.md &&
> @@ -768,6 +768,17 @@ test_expect_success 'sparse-index is not expanded' '
>  	echo >>sparse-index/untracked.txt &&
>  	ensure_not_expanded add . &&
>  
> +	for ref in update-deep update-folder1 update-folder2 update-deep
> +	do
> +		echo >>sparse-index/README.md &&
> +		ensure_not_expanded reset --hard $ref || return 1
> +	done &&
> +
> +	ensure_not_expanded reset --hard update-deep &&
> +	ensure_not_expanded reset --keep base &&
> +	ensure_not_expanded reset --merge update-deep &&
> +	ensure_not_expanded reset --hard &&
> +
>  	ensure_not_expanded checkout -f update-deep &&
>  	test_config -C sparse-index pull.twohead ort &&
>  	(
>
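The idea behind `index_entry_exists` in the quoted patch (an exact-name lookup that never expands the index) can be illustrated with a standalone sketch. The names below are hypothetical, the entry list is made up, and plain `strcmp` ordering is assumed in place of git's own name comparison. A sparse directory is modelled as a single entry whose name ends in `/`.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Exact-match binary search over sorted entry names; nothing gets expanded. */
static int sketch_entry_exists(const char *const *names, size_t nr, const char *name)
{
	size_t first = 0, last = nr;

	assert(names != NULL && name != NULL);
	while (first < last) {
		size_t mid = first + (last - first) / 2;
		int cmp = strcmp(name, names[mid]);

		if (!cmp)
			return 1;
		if (cmp < 0)
			last = mid;
		else
			first = mid + 1;
	}
	return 0;
}

/* Hypothetical sparse index: "folder1/" is collapsed, "deep/" is fully listed. */
static const char *const sketch_names[] = {
	"README.md",
	"deep/a",
	"deep/deeper1/a",
	"folder1/",
};
```

Looking up `folder1/` finds the sparse directory entry, so the cache tree node for it would get no subtrees. Looking up `deep/` finds nothing, so recursion would continue. A file such as `folder1/a` is also reported as absent, which mirrors the caveat in the `cache.h` comment about items contained within a sparse directory.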
Junio C Hamano wrote:
> "Victoria Dye via GitGitGadget" <gitgitgadget@gmail.com> writes:
>
>> +	/*
>> +	 * If --force-full-index is set, the command should skip manually
>> +	 * setting `command_requires_full_index`.
>> +	 */
>
> Hmph, doesn't that feel unnaturally backwards, though?
>
> The settings.command_requires_full_index bit forces read-cache to
> call ensure_full_index() immediately after the in-core index is read
> from the disk.  If we are forcing operating on the full index, I'd
> imagine that we'd be making sure that ensure_full_index() is
> called.
>

I tried coming up with a user-facing name that wasn't too focused on the
internal implementation, but it ends up being misleading. The intention was
to have this be a variation of `git update-index` that uses the default
setting for `command_requires_full_index` but then proceeds to read and
write the index as `update-index` normally would. Something like
`--use-default-index-sparsity` might have been more accurate?

> I do not see anything in the code that ensures active_cache_changed
> to be flipped on.  So the new test that says
>
>     git -C sparse-index -c core.fsmonitor="" update-index --force-full-index
>
> may not call ensure_full_index(), but because nothing marks
> the_index as changed, I think we won't call write_locked_index() at
> the end of cmd_update_index().  IOW, what we have in the test patch
> may be an expensive noop, no?
>

In the test's use-case, `active_cache_changed` ends up set to
`CACHE_TREE_CHANGED`, which forces writing the index. It is still
effectively a no-op, but it serves the needs of the test.

In any case, Elijah suggested using a `test-tool` subcommand for this
purpose [1], which I think is more appropriate overall. Something like
`test-tool read-write-cache` can be implemented to make no mention of
`command_requires_full_index` (therefore using its default value) and force
a basic read & write of the index. It also eliminates the issue of having a
user-facing name at all, and can easily be removed once all sparse index
integrations are done.

[1] https://lore.kernel.org/git/CABPp-BF+bEUcyE0N79uRCkpCayJx_NMqOpnMSHHrpJM5a9hAWw@mail.gmail.com/
Elijah Newren <newren@gmail.com> writes:
>> I see the issue of introducing a test-only option (when sparse index is
>> integrated everywhere, shouldn't it be deprecated?). If there's a way to
>> make this more obviously internal/temporary, I'm happy to modify it. Or, if
>> semi-frequent updates of the command in the test aren't a huge issue, I can
>> revert to V1.
>
> If it's a test-only capability you need, I'd say add it under
> t/helper/ somewhere, either a new flag for an existing subcommand of
> test-tool, or a new subcommand for test-tool.
Is the ability to force expanding to full index completely useless
in the field? For diagnosing breakage the end-users may see in the
wild, or perhaps in a specialist usecase for whatever reason working
on full index is preferable and the user may want to force it once
to correct an earlier mistake to enable sparse-index before toggling
the configuration off, or something?
If we do not foresee any such reason, I'd agree it is good to move
that to t/helper/; otherwise, I think update-index is as good as
any other place, and the option will sit well next to other options
like "--[no-]skip-worktree", "--[no-]assume-unchanged".  It would
most likely need to be used together with "--force-write-index" (or
be made to imply the latter) to be useful, I suspect.
Thanks.
Phillip Wood wrote:
> Hi Victoria
>
> On 07/10/2021 22:15, Victoria Dye via GitGitGadget wrote:
>> From: Victoria Dye <vdye@github.com>
>>
>> Remove `ensure_full_index` guard on `prime_cache_tree` and update
>> `prime_cache_tree_rec` to correctly reconstruct sparse directory entries in
>> the cache tree. While processing a tree's entries, `prime_cache_tree_rec`
>> must determine whether a directory entry is sparse or not by searching for
>> it in the index (*without* expanding the index). If a matching sparse
>> directory index entry is found, no subtrees are added to the cache tree
>> entry and the entry count is set to 1 (representing the sparse directory
>> itself). Otherwise, the tree is assumed to not be sparse and its subtrees
>> are recursively added to the cache tree.
>
> I was looking at the callers to prime_cache_tree() this morning and would
> like to suggest an alternative approach - just delete prime_cache_tree()
> and all of its callers! As far as I can see it is only ever called after a
> successful call to unpack_trees() and since 52fca2184d ("unpack-trees:
> populate cache-tree on successful merge", 2015-07-28) unpack_trees() updates
> the cache tree for the caller. All the call sites are pretty obvious apart
> from the one in t/helper/test-fast-rebase.c where unpack_trees() is called
> by merge_switch_to_result()
>
It looks like `prime_cache_tree` can be removed mostly without issue, but
it causes the two last tests in `t4058-diff-duplicates.sh` to fail. Those
tests document failure cases when dealing with duplicate tree entries [1],
and it looks like `prime_cache_tree` was creating the appearance of a
fully-reset index but was still leaving it in a state where subsequent
operations could fail.
I'm inclined to say the solution here would be to update the tests to
document the "new" failure behavior and proceed with removing
`prime_cache_tree`, because:
* the test using `git reset --hard` disables `GIT_TEST_CHECK_CACHE_TREE`,
indicating that `prime_cache_tree` already wasn't behaving correctly
* attempting to fix the overarching issues with duplicate tree entries will
substantially delay this patch series
* a duplicate entry fix is largely unrelated to the intended scope of the
series
Another option would be to leave `prime_cache_tree` as it is, but with it
being apparently useless outside of mostly-broken use cases in `t4058`, it
seems like a waste to keep it around.
[1] ac14de13b2 (t4058: explore duplicate tree entry handling in a bit more detail, 2020-12-11)
Victoria Dye <vdye@github.com> writes:

> I tried coming up with a user-facing name that wasn't too focused on the
> internal implementation, but it ends up being misleading. The intention was
> to have this be a variation of `git update-index` that uses the default
> setting for `command_requires_full_index` but then proceeds to read and
> write the index as `update-index` normally would. Something like
> `--use-default-index-sparsity` might have been more accurate?

The option name in the reviewed patch does imply "we force expanding
to full" and not "use the default", so it probably needs renaming,
if we want the "use the default" semantics.  But is that useful in
the context of the test you are using it in place of "reset" or "mv"?
Even if the default is somehow flipped to use sparse always, wouldn't
the particular test want the index expanded?  I dunno.

> In the test's use-case, `active_cache_changed` ends up set to
> `CACHE_TREE_CHANGED`, which forces writing the index. It is still
> effectively a no-op, but it serves the needs of the test.

Ah, cache-tree is updated, then it's OK.

As to test-tool vs end-user-accessible-command, I do not have a
strong opinion, but use your imagination and ask Derrick or somebody
else for their imagination to see if such a "force expand" feature
may be something the end-users might need an access to in order to
dig themselves out of a hole (in which case, it may be better to
make it end-user-accessible) or not (in which case, test-tool is
more appropriate).

Thanks.
Victoria Dye <vdye@github.com> writes:

> Phillip Wood wrote:

>> I was looking at the callers to prime_cache_tree() this morning
>> and would like to suggest an alternative approach - just delete
>> prime_cache_tree() and all of its callers!

Do you mean the calls added by new patches without understanding
what they are doing, or all calls to it?

Every time you update a path in the index from the working tree
(e.g. "git add") and other sources, the directory in the cache-tree
that includes the path is invalidated, and the surviving subtrees of
cache-tree are used to speed up writing the index as a tree object,
doing "diff-index --cached" (hence "git status"), etc.  So over
time, the cache-tree "degrades" as you muck with the index entries.

When you write out the index as a tree, we by definition have to
know the object names of all the tree objects that correspond to
each directory in the index.  A fully valid cache-tree is saved when
it happens, so the above process can start over.

There are cases other than "git write-tree" that we can cheaply
learn the object names of all the tree objects that correspond to
each directory in the index.  When we read the index from an
existing tree object, we know which tree (and its subtrees) we
populated the index from, so we can salvage a degraded cache-tree.

"reset --hard" and "reset --mixed" may be good opportunities, so is
"checkout <branch>" that starts from a clean index.  And cache tree
priming is a mechanism to take advantage of such an opportunity.

The cache-tree does not have to be primed and all you lose is
performance, so priming can be removed mostly "without an issue", if
you are not paying attention to cache-tree degradation.  Priming
with incorrect data, however, would leave permanent damage by
writing a wrong tree via "git write-tree" (hence "git commit") and
showing a wrong diff via "git diff-index [--cached]" (hence "git
status" and probably "git add -- <pathspec>"), so not priming is
safer than priming incorrectly.

HTH.
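The "degradation" described above can be pictured with a toy model. The sketch below is not git's `struct cache_tree`; it only assumes the convention that a negative `entry_count` marks a directory whose tree object name is no longer known, and it uses a hypothetical parent pointer to walk upward.

```c
#include <assert.h>
#include <stddef.h>

/* Toy node, one per directory; entry_count < 0 means "needs recomputing". */
struct sketch_tree {
	struct sketch_tree *parent;
	int entry_count;
};

/* Updating a path invalidates its directory and every ancestor directory. */
static void sketch_invalidate(struct sketch_tree *dir)
{
	for (; dir; dir = dir->parent)
		dir->entry_count = -1;
}

/* A still-valid node can be reused as-is when writing the index as a tree. */
static int sketch_is_valid(const struct sketch_tree *dir)
{
	assert(dir != NULL);
	return dir->entry_count >= 0;
}
```

After a change under one directory, sibling directories stay valid and can be reused, while the changed directory and the root have to be recomputed. Priming (or a full update) is what brings every node back to a valid state.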
On 08/10/2021 19:31, Junio C Hamano wrote:
> Victoria Dye <vdye@github.com> writes:
>
>> Phillip Wood wrote:
>
>>> I was looking at the callers to prime_cache_tree() this morning
>>> and would like to suggest an alternative approach - just delete
>>> prime_cache_tree() and all of its callers!
>
> Do you mean the calls added by new patches without understanding
> what they are doing, or all calls to it?

I mean all calls to prime_cache_tree() after having understood (or at
least thinking that I understand) what they are doing. As I tried to
explain in the part of my message that you have cut

(a) a successful call to unpack_trees() updates the cache tree

(b) all the existing calls to prime_cache_tree() follow a successful
call to unpack_trees() and nothing touches the index in between the
call to unpack_trees() and prime_cache_tree().

Maybe I've misunderstood something but that leads me to believe those
calls can be removed without degrading performance.

Best Wishes

Phillip

> Every time you update a path in the index from the working tree
> (e.g. "git add") and other sources, the directory in the cache-tree
> that includes the path is invalidated, and the surviving subtrees of
> cache-tree are used to speed up writing the index as a tree object,
> doing "diff-index --cached" (hence "git status"), etc.  So over
> time, the cache-tree "degrades" as you muck with the index entries.
>
> When you write out the index as a tree, we by definition have to
> know the object names of all the tree objects that correspond to
> each directory in the index.  A fully valid cache-tree is saved when
> it happens, so the above process can start over.
>
> There are cases other than "git write-tree" that we can cheaply
> learn the object names of all the tree objects that correspond to
> each directory in the index.  When we read the index from an
> existing tree object, we know which tree (and its subtrees) we
> populated the index from, so we can salvage a degraded cache-tree.
>
> "reset --hard" and "reset --mixed" may be good opportunities, so is
> "checkout <branch>" that starts from a clean index.  And cache tree
> priming is a mechanism to take advantage of such an opportunity.
>
> The cache-tree does not have to be primed and all you lose is
> performance, so priming can be removed mostly "without an issue", if
> you are not paying attention to cache-tree degradation.  Priming
> with incorrect data, however, would leave permanent damage by
> writing a wrong tree via "git write-tree" (hence "git commit") and
> showing a wrong diff via "git diff-index [--cached]" (hence "git
> status" and probably "git add -- <pathspec>"), so not priming is
> safer than priming incorrectly.
>
> HTH.
>
Phillip Wood <phillip.wood123@gmail.com> writes:

> On 08/10/2021 19:31, Junio C Hamano wrote:
>> Victoria Dye <vdye@github.com> writes:
>>
>>> Phillip Wood wrote:
>>
>>>> I was looking at the callers to prime_cache_tree() this morning
>>>> and would like to suggest an alternative approach - just delete
>>>> prime_cache_tree() and all of its callers!
>> Do you mean the calls added by new patches without understanding
>> what they are doing, or all calls to it?
>
> I mean all calls to prime_cache_tree() after having understood (or at
> least thinking that I understand) what they are doing.

Sorry, my statement was confusingly written.  I meant "calls added
by new patches, written by those who do not understand what
prime_cache_tree() calls are doing", but after re-reading it, I
think it could be taken to be referring to "you may be commenting
without understanding what prime_cache_tree() calls are doing",
which wasn't my intention.

> (a) a successful call to unpack_trees() updates the cache tree
>
> (b) all the existing calls to prime_cache_tree() follow a successful
> call to unpack_trees() and nothing touches the index in between the
> call to unpack_trees() and prime_cache_tree().

Ahh, OK.

I think we originally avoided calling cache_tree_update() lightly
(because it is essentially a "write-tree", a fairly heavy-weight
operation, without I/O) and instead relied on prime_cache_tree() to
get degraded cache-tree back into freshness.

What I forgot was that 52fca218 (unpack-trees: populate cache-tree
on successful merge, 2015-07-28) added cache_tree_update() there at
the end of unpack_trees().  The commit covers quite a wide range of
operations---the log message says "merge", but in fact anything that
uses unpack_trees() including branch switching and the resetting of
the index are affected, and they cause a full reconstruction of the
cache tree by calling cache_tree_update().

For most callers of prime_cache_tree(), like the ones in "git
read-tree" and "git reset", it is immediately obvious that we just
read from the same tree, and we should have everything from the tree
and nothing else in the resulting index, so it is clear that the
prime_cache_tree() call is recreating the same cache-tree information
that we already should have computed ourselves, and these calls can
go (or if "prime" is still cheaper than "update", these callers can
pass an option to tell unpack_trees() to skip the cache_tree_update()
call, because they will call "prime" immediately after).

For other callers it is not immediately obvious, but I trust you are
correctly reading the code ;-)

Thanks.
On 10/8/21 1:19 PM, Junio C Hamano wrote:
> Victoria Dye <vdye@github.com> writes:
>
>> I tried coming up with a user-facing name that wasn't too focused on the
>> internal implementation, but it ends up being misleading. The intention was
>> to have this be a variation of `git update-index` that uses the default
>> setting for `command_requires_full_index` but then proceeds to read and
>> write the index as `update-index` normally would. Something like
>> `--use-default-index-sparsity` might have been more accurate?
>
> The option name in the reviewed patch does imply "we force expanding
> to full" and not "use the default", so it probably needs renaming,
> if we want the "use the default" semantics. But is that useful in
> the context of the test you are using it in place of "reset" or "mv"?
> Even if the default is somehow flipped to use sparse always, wouldn't
> the particular test want the index expanded? I dunno.
>
>> In the test's use-case, `active_cache_changed` ends up set to
>> `CACHE_TREE_CHANGED`, which forces writing the index. It is still
>> effectively a no-op, but it serves the needs of the test.
>
> Ah, cache-tree is updated, then it's OK.
>
> As to test-tool vs end-user-accessible-command, I do not have a
> strong opinion, but use your imagination and ask Derrick or somebody
> else for their imagination to see if such a "force expand" feature
> may be something the end-users might need an access to in order to
> dig themselves out of a hole (in which case, it may be better to
> make it end-user-accessible) or not (in which case, test-tool is
> more appropriate).

I think there is something to be said about the name being confusing,
because the current implementation focuses on "expand a sparse index
upon read" but it also allows the index to be written as sparse.

Conversely, if the user runs

	git -c index.sparse=false update-index ...

then the index.sparse config setting forbids conversion from full to
sparse, but does not say anything about expanding to full.

Perhaps this should be corrected: the index.sparse=false setting
should expand a sparse index to a full one, then prevent it from
being converted to a sparse one on write.

This diff should do it:

diff --git a/read-cache.c b/read-cache.c
index 564283c7e7e..04df1051e18 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -2376,7 +2376,8 @@ int do_read_index(struct index_state *istate, const char *path, int must_exist)
 	if (!istate->repo)
 		istate->repo = the_repository;
 	prepare_repo_settings(istate->repo);
-	if (istate->repo->settings.command_requires_full_index)
+	if (!istate->repo->settings.sparse_index ||
+	    istate->repo->settings.command_requires_full_index)
 		ensure_full_index(istate);
 
 	return istate->cache_nr;

Victoria, what are your thoughts about including such a change?

Junio, would it be better to change the config setting, and then
update this test to use the config setting over a command-line flag?
This would allow us to punt on the --force-full-index flag until we
have time to focus on the 'git update-index' command and interactions
like this.

Thanks,
-Stolee
Derrick Stolee wrote:
> On 10/8/21 1:19 PM, Junio C Hamano wrote:
>> Victoria Dye <vdye@github.com> writes:
>>
>>> I tried coming up with a user-facing name that wasn't too focused on the
>>> internal implementation, but it ends up being misleading. The intention was
>>> to have this be a variation of `git update-index` that uses the default
>>> setting for `command_requires_full_index` but then proceeds to read and
>>> write the index as `update-index` normally would. Something like
>>> `--use-default-index-sparsity` might have been more accurate?
>>
>> The option name in the reviewed patch does imply "we force expanding
>> to full" and not "use the default", so it probably needs renaming,
>> if we want the "use the default" semantics. But is that useful in
>> the context of the test you are using it in place of "reset" or "mv"?
>> Even if the default is somehow flipped to use sparse always, wouldn't
>> the particular test want the index expanded? I dunno.
>>
>>> In the test's use-case, `active_cache_changed` ends up set to
>>> `CACHE_TREE_CHANGED`, which forces writing the index. It is still
>>> effectively a no-op, but it serves the needs of the test.
>>
>> Ah, cache-tree is updated, then it's OK.
>>
>> As to test-tool vs end-user-accessible-command, I do not have a
>> strong opinion, but use your imagination and ask Derrick or somebody
>> else for their imagination to see if such a "force expand" feature
>> may be something the end-users might need an access to in order to
>> dig themselves out of a hole (in which case, it may be better to
>> make it end-user-accessible) or not (in which case, test-tool is
>> more appropriate).
>
> I think there is something to be said about the name being confusing,
> because the current implementation focuses on "expand a sparse index
> upon read" but it also allows the index to be written as sparse.
>

This helps clarify what I was misinterpreting in the test. It isn't looking
for "default" behavior, it's verifying whether trace2 logs capture index
expansion and collapse when those operations are expected to happen,
regardless of whether that's because `command_requires_full_index` is 1 or
because the command needs to use entries inside of sparse directories. With
that interpretation, I can replace the command with `git reset
update-folder1 -- folder1/a` and get the same result (without needing to
change the test in the future *or* add a new `git` command option /
`test-tool` subcommand).

> Conversely, if the user runs
>
> 	git -c index.sparse=false update-index ...
>
> then the index.sparse config setting forbids conversion from full to
> sparse, but does not say anything about expanding to full.
>
> Perhaps this should be corrected: the index.sparse=false setting
> should expand a sparse index to a full one, then prevent it from
> being converted to a sparse one on write.
>
> This diff should do it:
>
> diff --git a/read-cache.c b/read-cache.c
> index 564283c7e7e..04df1051e18 100644
> --- a/read-cache.c
> +++ b/read-cache.c
> @@ -2376,7 +2376,8 @@ int do_read_index(struct index_state *istate, const char *path, int must_exist)
>  	if (!istate->repo)
>  		istate->repo = the_repository;
>  	prepare_repo_settings(istate->repo);
> -	if (istate->repo->settings.command_requires_full_index)
> +	if (!istate->repo->settings.sparse_index ||
> +	    istate->repo->settings.command_requires_full_index)
>  		ensure_full_index(istate);
>
>  	return istate->cache_nr;
>
> Victoria, what are your thoughts about including such a change?
>

I think this is a worthwhile change, but I'd prefer submitting it separately
(either in an upcoming sparse index integration or on its own). It's not
directly needed by anything in this series, and I'd like to avoid adding
features to the scope if possible.
Derrick Stolee <stolee@gmail.com> writes:
> Junio, would it be better to change the config setting, and then
> update this test to use the config setting over a command-line flag?
> This would allow us to punt on the --force-full-index flag until we
> have time to focus on the 'git update-index' command and interactions
> like this.
I do not have a strong opinion on where we add the feature; as long
as we have a way to let us avoid having to unnecessarily change this
particular test, that's perfectly fine, and if we can reuse it as a
way for end-users to help those who are debugging their issues, that
would be an added bonus.
Thanks.
Junio C Hamano wrote:
> For most callers of prime_cache_tree(), like the ones in "git
> read-tree" and "git reset", it is immediately obvious that we just
> read from the same tree, and we should have everything from the tree
> and nothing else in the resulting index, so it is clear that the
> prime_cache_tree() call is recreating the same cache-tree
> information that we already should have computed ourselves, and
> these calls can go (or if "prime" is still cheaper than "update",
> these callers can pass an option to tell unpack_trees() to skip the
> cache_tree_update() call, because they will call "prime" immediately
> after).
>
After some basic performance testing of `git reset [--hard]`, it's not clear
whether `cache_tree_update` is definitively faster or slower than
`prime_cache_tree`; more conclusive results would indicate which of the two
could be skipped. I'd like to defer this to a future patch (tracking it with
an internal issue so I don't forget) where I can perform a more thorough
analysis across all of the commands currently using `prime_cache_tree` and
update its usage accordingly.
Victoria Dye <vdye@github.com> writes:
> After some basic performance testing of `git reset [--hard]`, it's not clear
> whether `cache_tree_update` is definitively faster or slower than
> `prime_cache_tree`; more conclusive results would indicate which of the two
> could be skipped. I'd like to defer this to a future patch (tracking it with
> an internal issue so I don't forget) where I can perform a more thorough
> analysis across all of the commands currently using `prime_cache_tree` and
> update its usage accordingly.
Yup. That sounds sensible. Concentrating on correctness first is a
good direction to go.
This series integrates the sparse index with git reset and provides
miscellaneous fixes and improvements to the command in sparse checkouts.
This includes:

 1. tests added to t1092 and p2000 to establish the baseline functionality of
    the command
 2. repository settings to enable the sparse index with ensure_full_index
    guarding any code paths that break tests without other compatibility
    updates.
 3. modifications to remove or reduce the scope in which ensure_full_index
    must be called.

The sparse index updates are predicated on a fix originating from the
microsoft/git fork [1], correcting how git reset --mixed handles resetting
entries outside the sparse checkout definition. Additionally, a performance
"bug" in next_cache_entry with sparse index is corrected, preventing
repeatedly looping over already-searched entries.

The p2000 tests demonstrate a ~70% execution time reduction in git reset
using a sparse index, and no change (within expected variability [2]) using a
full index. Results summarized below [3, 4]:

Test                          base              [5/8]
-----------------------------------------------------------------------
git reset --hard (full-v3)    1.00(0.50+0.39)   0.97(0.50+0.37) -3.0%
git reset --hard (full-v4)    1.00(0.51+0.38)   0.96(0.50+0.36) -4.0%
git reset --hard (sparse-v3)  1.68(1.17+0.39)   1.37(0.91+0.35) -18.5%
git reset --hard (sparse-v4)  1.70(1.18+0.40)   1.41(0.94+0.35) -17.1%

Test                          base              [6/8]
-----------------------------------------------------------------------
git reset --hard (full-v3)    1.00(0.50+0.39)   0.94(0.48+0.34) -6.0%
git reset --hard (full-v4)    1.00(0.51+0.38)   0.95(0.51+0.34) -5.0%
git reset --hard (sparse-v3)  1.68(1.17+0.39)   0.46(0.05+0.29) -72.6%
git reset --hard (sparse-v4)  1.70(1.18+0.40)   0.46(0.06+0.29) -72.9%

Test                              base              [7/8]
---------------------------------------------------------------------------
git reset (full-v3)               0.77(0.27+0.37)   0.72(0.26+0.32) -6.5%
git reset (full-v4)               0.75(0.27+0.34)   0.73(0.26+0.32) -2.7%
git reset (sparse-v3)             1.44(0.96+0.36)   0.43(0.04+0.96) -70.1%
git reset (sparse-v4)             1.46(0.97+0.36)   0.43(0.05+0.79) -70.5%
git reset -- missing (full-v3)    0.72(0.26+0.32)   0.69(0.26+0.30) -4.2%
git reset -- missing (full-v4)    0.74(0.28+0.33)   0.71(0.27+0.32) -4.1%
git reset -- missing (sparse-v3)  1.45(0.97+0.35)   0.81(0.42+0.90) -44.1%
git reset -- missing (sparse-v4)  1.41(0.94+0.34)   0.79(0.42+0.76) -44.0%

Test                              base              [8/8]
---------------------------------------------------------------------------
git reset -- missing (full-v3)    0.72(0.26+0.32)   0.73(0.26+0.33) +1.4%
git reset -- missing (full-v4)    0.74(0.28+0.33)   0.74(0.27+0.32) +0.0%
git reset -- missing (sparse-v3)  1.45(0.97+0.35)   0.43(0.05+0.80) -70.3%
git reset -- missing (sparse-v4)  1.41(0.94+0.34)   0.44(0.05+0.76) -68.8%


Changes since V1
================

 * Add --force-full-index option to update-index. The option is used to
   circumvent changing command_requires_full_index from its default value -
   right now this is effectively a no-op, but will change once update-index
   is integrated with sparse index. By using this option in the t1092
   expand/collapse test, the command used to test will not need to be
   updated with subsequent sparse index integrations.
 * Update implementation of mixed reset for entries outside sparse checkout
   definition. The condition in which a file should be checked out before
   index reset is simplified to "if it has skip-worktree enabled and a reset
   would change the file, check it out".
 * After checking the behavior of update_index_from_diff with renames, found
   that the diff used by reset does not produce diff queue entries with
   different pathnames for one and two. Because of this, and that nothing in
   the implementation seems to rely on identical path names, no BUG check is
   added.
 * Correct a bug in the sparse index is not expanded tests in t1092 where
   failure of a git reset --mixed test was not being reported. Test now
   verifies an appropriate scenario with corrected failure-checking.


Changes since V2
================

 * Replace patch adding checkouts for git reset --mixed with sparse checkout
   with preserving the skip-worktree flag (including a new test for git
   reset --mixed and update to t1092 - checkout and reset (mixed))
 * Move rename of is_missing into its own patch
 * Further extend t1092 tests and remove unnecessary commands/tests where
   possible
 * Refine logic determining which pathspecs require ensure_full_index in git
   reset --mixed, add related ensure_not_expanded tests
 * Add index_search_mode enum to index_name_stage_pos
 * Clean up variable usage & remove unnecessary subtree_path in
   prime_cache_tree_rec
 * Update cover letter performance data
 * More thoroughly explain changes in each commit message


Changes since V3
================

 * Replace git update-index --force-full-index with git reset update-folder1
   -- folder1/a, remove introduction of new --force-full-index option
   entirely, and add comment clarifying the intent of sparse-index is
   expanded and converted back test
 * Fix authorship on reset: preserve skip-worktree bit in mixed reset
   (current patch fully replaces original patch, but metadata of the
   original wasn't properly replaced)

Thanks!
-Victoria

[1] microsoft@6b8a074
[2] https://lore.kernel.org/git/8b9fe3f8-f0e3-4567-b20b-17c92bd1a5c5@github.com/
[3] If a test and/or commit is not mentioned, there is no significant change
    to performance
[4] Pathspec "does-not-exist" is changed to "missing" to save space in
    performance report

Victoria Dye (8):
  reset: rename is_missing to !is_in_reset_tree
  reset: preserve skip-worktree bit in mixed reset
  sparse-index: update command for expand/collapse test
  reset: expand test coverage for sparse checkouts
  reset: integrate with sparse index
  reset: make sparse-aware (except --mixed)
  reset: make --mixed sparse-aware
  unpack-trees: improve performance of next_cache_entry

 builtin/reset.c                          | 104 ++++++++++++++++-
 cache-tree.c                             |  46 +++++++-
 cache.h                                  |  10 ++
 read-cache.c                             |  27 +++--
 t/perf/p2000-sparse-operations.sh        |   3 +
 t/t1092-sparse-checkout-compatibility.sh | 137 ++++++++++++++++++++---
 t/t7102-reset.sh                         |  17 +++
 unpack-trees.c                           |  23 +++-
 8 files changed, 330 insertions(+), 37 deletions(-)


base-commit: cefe983a320c03d7843ac78e73bd513a27806845
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1048%2Fvdye%2Fvdye%2Fsparse-index-part1-v4
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1048/vdye/vdye/sparse-index-part1-v4
Pull-Request: https://github.com/gitgitgadget/git/pull/1048

Range-diff vs v3:

 1:  ad7013a31aa = 1:  ad7013a31aa reset: rename is_missing to !is_in_reset_tree
 2:  1f6da84830b ! 2:  bd72bd175da reset: preserve skip-worktree bit in mixed reset
     @@
       ## Metadata ##
      -Author: Kevin Willford <kewillf@microsoft.com>
      +Author: Victoria Dye <vdye@github.com>

       ## Commit message ##
          reset: preserve skip-worktree bit in mixed reset
 3:  014a408ea5d < -:  ----------- update-index: add --force-full-index option for expand/collapse test
 -:  ----------- > 3:  c4df0d6b136 sparse-index: update command for expand/collapse test
 4:  7f21cf53e9d = 4:  cfbb23e9fe2 reset: expand test coverage for sparse checkouts
 5:  a2d6212e287 = 5:  62fdbf2ad26 reset: integrate with sparse index
 6:  330e0c09774 = 6:  b0d437207e7 reset: make sparse-aware (except --mixed)
 7:  6ef8e4e31d3 = 7:  00d14fb60bd reset: make --mixed sparse-aware
 8:  c7145e039f3 = 8:  e523dadb8bf unpack-trees: improve performance of next_cache_entry

--
gitgitgadget
From: Victoria Dye <vdye@github.com>

Rename and invert value of `is_missing` to `is_in_reset_tree` to make the
variable more descriptive of what it represents.

Signed-off-by: Victoria Dye <vdye@github.com>
---
 builtin/reset.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/builtin/reset.c b/builtin/reset.c
index 51c9e2f43ff..d3695ce43c4 100644
--- a/builtin/reset.c
+++ b/builtin/reset.c
@@ -131,10 +131,10 @@ static void update_index_from_diff(struct diff_queue_struct *q,
 
 	for (i = 0; i < q->nr; i++) {
 		struct diff_filespec *one = q->queue[i]->one;
-		int is_missing = !(one->mode && !is_null_oid(&one->oid));
+		int is_in_reset_tree = one->mode && !is_null_oid(&one->oid);
 		struct cache_entry *ce;
 
-		if (is_missing && !intent_to_add) {
+		if (!is_in_reset_tree && !intent_to_add) {
 			remove_file_from_cache(one->path);
 			continue;
 		}
@@ -144,7 +144,7 @@ static void update_index_from_diff(struct diff_queue_struct *q,
 		if (!ce)
 			die(_("make_cache_entry failed for path '%s'"),
 			    one->path);
-		if (is_missing) {
+		if (!is_in_reset_tree) {
 			ce->ce_flags |= CE_INTENT_TO_ADD;
 			set_object_name_for_intent_to_add_entry(ce);
 		}
--
gitgitgadget
From: Victoria Dye <vdye@github.com>

Change `update_index_from_diff` to set `skip-worktree` when applicable for
new index entries. When `git reset --mixed <tree-ish>` is run, entries in
the index with differences between the pre-reset HEAD and reset <tree-ish>
are identified and handled with `update_index_from_diff`. For each file, a
new cache entry is inserted into the index, created from the <tree-ish> side
of the reset (without changing the working tree). However, the newly-created
entry must have `skip-worktree` explicitly set in either of the following
scenarios:

1. the file is in the current index and has `skip-worktree` set
2. the file is not in the current index but is outside of a defined sparse
   checkout definition

Not setting the `skip-worktree` bit leads to likely-undesirable results for
a user. It causes `skip-worktree` settings to disappear on the
"diff"-containing files (but *only* the diff-containing files), leading to
those files now showing modifications in `git status`. For example, when
running `git reset --mixed` in a sparse checkout, some file entries outside
of sparse checkout could show up as deleted, despite the user never deleting
anything (and not wanting them on-disk anyway).

Additionally, add a test to `t7102` to ensure `skip-worktree` is preserved in
a basic `git reset --mixed` scenario and update a failure-documenting test
from 19a0acc (t1092: test interesting sparse-checkout scenarios, 2021-01-23)
with new expected behavior.

Helped-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Victoria Dye <vdye@github.com>
---
 builtin/reset.c                          | 14 ++++++++++++++
 t/t1092-sparse-checkout-compatibility.sh | 19 +++++--------------
 t/t7102-reset.sh                         | 17 +++++++++++++++++
 3 files changed, 36 insertions(+), 14 deletions(-)

diff --git a/builtin/reset.c b/builtin/reset.c
index d3695ce43c4..e441b6601b9 100644
--- a/builtin/reset.c
+++ b/builtin/reset.c
@@ -25,6 +25,7 @@
 #include "cache-tree.h"
 #include "submodule.h"
 #include "submodule-config.h"
+#include "dir.h"
 
 #define REFRESH_INDEX_DELAY_WARNING_IN_MS (2 * 1000)
 
@@ -130,6 +131,7 @@ static void update_index_from_diff(struct diff_queue_struct *q,
 	int intent_to_add = *(int *)data;
 
 	for (i = 0; i < q->nr; i++) {
+		int pos;
 		struct diff_filespec *one = q->queue[i]->one;
 		int is_in_reset_tree = one->mode && !is_null_oid(&one->oid);
 		struct cache_entry *ce;
@@ -141,6 +143,18 @@ static void update_index_from_diff(struct diff_queue_struct *q,
 
 		ce = make_cache_entry(&the_index, one->mode, &one->oid, one->path,
 				      0, 0);
+
+		/*
+		 * If the file 1) corresponds to an existing index entry with
+		 * skip-worktree set, or 2) does not exist in the index but is
+		 * outside the sparse checkout definition, add a skip-worktree bit
+		 * to the new index entry.
+		 */
+		pos = cache_name_pos(one->path, strlen(one->path));
+		if ((pos >= 0 && ce_skip_worktree(active_cache[pos])) ||
+		    (pos < 0 && !path_in_sparse_checkout(one->path, &the_index)))
+			ce->ce_flags |= CE_SKIP_WORKTREE;
+
 		if (!ce)
 			die(_("make_cache_entry failed for path '%s'"),
 			    one->path);
diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
index 886e78715fe..889079f55b8 100755
--- a/t/t1092-sparse-checkout-compatibility.sh
+++ b/t/t1092-sparse-checkout-compatibility.sh
@@ -459,26 +459,17 @@ test_expect_failure 'blame with pathspec outside sparse definition' '
 	test_all_match git blame deep/deeper2/deepest/a
 '
 
-# NEEDSWORK: a sparse-checkout behaves differently from a full checkout
-# in this scenario, but it shouldn't.
-test_expect_failure 'checkout and reset (mixed)' '
+test_expect_success 'checkout and reset (mixed)' '
 	init_repos &&
 
 	test_all_match git checkout -b reset-test update-deep &&
 	test_all_match git reset deepest &&
-	test_all_match git reset update-folder1 &&
-	test_all_match git reset update-folder2
-'
-
-# NEEDSWORK: a sparse-checkout behaves differently from a full checkout
-# in this scenario, but it shouldn't.
-test_expect_success 'checkout and reset (mixed) [sparse]' '
-	init_repos &&
 
-	test_sparse_match git checkout -b reset-test update-deep &&
-	test_sparse_match git reset deepest &&
+	# Because skip-worktree is preserved, resetting to update-folder1
+	# will show worktree changes for full-checkout that are not present
+	# in sparse-checkout or sparse-index.
 	test_sparse_match git reset update-folder1 &&
-	test_sparse_match git reset update-folder2
+	run_on_sparse test_path_is_missing folder1
 '
 
 test_expect_success 'merge, cherry-pick, and rebase' '
diff --git a/t/t7102-reset.sh b/t/t7102-reset.sh
index 601b2bf97f0..d05426062ec 100755
--- a/t/t7102-reset.sh
+++ b/t/t7102-reset.sh
@@ -472,6 +472,23 @@ test_expect_success '--mixed refreshes the index' '
 	test_cmp expect output
 '
 
+test_expect_success '--mixed preserves skip-worktree' '
+	echo 123 >>file2 &&
+	git add file2 &&
+	git update-index --skip-worktree file2 &&
+	git reset --mixed HEAD >output &&
+	test_must_be_empty output &&
+
+	cat >expect <<-\EOF &&
+	Unstaged changes after reset:
+	M	file2
+	EOF
+	git update-index --no-skip-worktree file2 &&
+	git add file2 &&
+	git reset --mixed HEAD >output &&
+	test_cmp expect output
+'
+
 test_expect_success 'resetting specific path that is unmerged' '
 	git rm --cached file2 &&
 	F1=$(git rev-parse HEAD:file1) &&
--
gitgitgadget
From: Victoria Dye <vdye@github.com>

In anticipation of `git reset --hard` being able to use the sparse index
without expanding it, replace the command in `sparse-index is expanded and
converted back` with `git reset -- folder1/a`. This command will need to
expand the index to work properly, even after integrating the rest of
`reset` with sparse index.

Helped-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Victoria Dye <vdye@github.com>
---
 t/t1092-sparse-checkout-compatibility.sh | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
index 889079f55b8..e1422797013 100755
--- a/t/t1092-sparse-checkout-compatibility.sh
+++ b/t/t1092-sparse-checkout-compatibility.sh
@@ -631,11 +631,15 @@ test_expect_success 'submodule handling' '
 	grep "160000 commit $(git -C initial-repo rev-parse HEAD) modules/sub" cache
 '
 
+# When working with a sparse index, some commands will need to expand the
+# index to operate properly. If those commands also write the index back
+# to disk, they need to convert the index to sparse before writing.
+# This test verifies that both of these events are logged in trace2 logs.
 test_expect_success 'sparse-index is expanded and converted back' '
 	init_repos &&
 
 	GIT_TRACE2_EVENT="$(pwd)/trace2.txt" GIT_TRACE2_EVENT_NESTING=10 \
-		git -C sparse-index -c core.fsmonitor="" reset --hard &&
+		git -C sparse-index reset -- folder1/a &&
 	test_region index convert_to_sparse trace2.txt &&
 	test_region index ensure_full_index trace2.txt
 '
--
gitgitgadget
From: Victoria Dye <vdye@github.com>

Add new tests for `--merge` and `--keep` modes, as well as mixed reset with
pathspecs. New performance test cases exercise various execution paths for
`reset`.

Co-authored-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Victoria Dye <vdye@github.com>
---
 t/perf/p2000-sparse-operations.sh        |  3 +
 t/t1092-sparse-checkout-compatibility.sh | 84 ++++++++++++++++++++++++
 2 files changed, 87 insertions(+)

diff --git a/t/perf/p2000-sparse-operations.sh b/t/perf/p2000-sparse-operations.sh
index 597626276fb..bfd332120c8 100755
--- a/t/perf/p2000-sparse-operations.sh
+++ b/t/perf/p2000-sparse-operations.sh
@@ -110,5 +110,8 @@ test_perf_on_all git add -A
 test_perf_on_all git add .
 test_perf_on_all git commit -a -m A
 test_perf_on_all git checkout -f -
+test_perf_on_all git reset
+test_perf_on_all git reset --hard
+test_perf_on_all git reset -- does-not-exist
 
 test_done
diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
index e1422797013..535686a2954 100755
--- a/t/t1092-sparse-checkout-compatibility.sh
+++ b/t/t1092-sparse-checkout-compatibility.sh
@@ -472,6 +472,90 @@ test_expect_success 'checkout and reset (mixed)' '
 	run_on_sparse test_path_is_missing folder1
 '
 
+test_expect_success 'checkout and reset (merge)' '
+	init_repos &&
+
+	write_script edit-contents <<-\EOF &&
+	echo text >>$1
+	EOF
+
+	test_all_match git checkout -b reset-test update-deep &&
+	run_on_all ../edit-contents a &&
+	test_all_match git reset --merge deepest &&
+	test_all_match git status --porcelain=v2 &&
+
+	test_all_match git reset --hard update-deep &&
+	run_on_all ../edit-contents deep/a &&
+	test_all_match test_must_fail git reset --merge deepest
+'
+
+test_expect_success 'checkout and reset (keep)' '
+	init_repos &&
+
+	write_script edit-contents <<-\EOF &&
+	echo text >>$1
+	EOF
+
+	test_all_match git checkout -b reset-test update-deep &&
+	run_on_all ../edit-contents a &&
+	test_all_match git reset --keep deepest &&
+	test_all_match git status --porcelain=v2 &&
+
+	test_all_match git reset --hard update-deep &&
+	run_on_all ../edit-contents deep/a &&
+	test_all_match test_must_fail git reset --keep deepest
+'
+
+test_expect_success 'reset with pathspecs inside sparse definition' '
+	init_repos &&
+
+	write_script edit-contents <<-\EOF &&
+	echo text >>$1
+	EOF
+
+	test_all_match git checkout -b reset-test update-deep &&
+	run_on_all ../edit-contents deep/a &&
+
+	test_all_match git reset base -- deep/a &&
+	test_all_match git status --porcelain=v2 &&
+
+	test_all_match git reset base -- nonexistent-file &&
+	test_all_match git status --porcelain=v2 &&
+
+	test_all_match git reset deepest -- deep &&
+	test_all_match git status --porcelain=v2
+'
+
+# Although the working tree differs between full and sparse checkouts after
+# reset, the state of the index is the same.
+test_expect_success 'reset with pathspecs outside sparse definition' '
+	init_repos &&
+	test_all_match git checkout -b reset-test base &&
+
+	test_sparse_match git reset update-folder1 -- folder1 &&
+	git -C full-checkout reset update-folder1 -- folder1 &&
+	test_sparse_match git status --porcelain=v2 &&
+	test_all_match git rev-parse HEAD:folder1 &&
+
+	test_sparse_match git reset update-folder2 -- folder2/a &&
+	git -C full-checkout reset update-folder2 -- folder2/a &&
+	test_sparse_match git status --porcelain=v2 &&
+	test_all_match git rev-parse HEAD:folder2/a
+'
+
+test_expect_success 'reset with wildcard pathspec' '
+	init_repos &&
+
+	test_all_match git checkout -b reset-test update-deep &&
+	test_all_match git reset base -- \*/a &&
+	test_all_match git status --porcelain=v2 &&
+	test_all_match git rev-parse HEAD:folder1/a &&
+
+	test_all_match git reset base -- folder\* &&
+	test_all_match git status --porcelain=v2 &&
+	test_all_match git rev-parse HEAD:folder2
+'
+
 test_expect_success 'merge, cherry-pick, and rebase' '
 	init_repos &&
 
--
gitgitgadget
From: Victoria Dye <vdye@github.com>

Disable `command_requires_full_index` repo setting and add
`ensure_full_index` guards around code paths that cannot yet use sparse
directory index entries. `reset --soft` does not modify the index, so no
compatibility changes are needed for it to function without expanding the
index. For all other reset modes (`--mixed`, `--hard`, `--keep`, `--merge`),
the full index is expanded to prevent cache tree corruption and invalid
variable accesses.

Additionally, the `read_cache()` check verifying an uncorrupted index is
moved after argument parsing and preparing the repo settings. The index is
not used by the preceding argument handling, but `read_cache()` must be run
*after* enabling sparse index for the command (so that the index is not
expanded unnecessarily) and *before* using the index for reset (so that it
is verified as uncorrupted).

Signed-off-by: Victoria Dye <vdye@github.com>
---
 builtin/reset.c | 10 +++++++---
 cache-tree.c    |  1 +
 2 files changed, 8 insertions(+), 3 deletions(-)

diff --git a/builtin/reset.c b/builtin/reset.c
index e441b6601b9..0ac0de7dc97 100644
--- a/builtin/reset.c
+++ b/builtin/reset.c
@@ -180,6 +180,7 @@ static int read_from_tree(const struct pathspec *pathspec,
 	opt.flags.override_submodule_config = 1;
 	opt.repo = the_repository;
 
+	ensure_full_index(&the_index);
 	if (do_diff_cache(tree_oid, &opt))
 		return 1;
 	diffcore_std(&opt);
@@ -257,9 +258,6 @@ static void parse_args(struct pathspec *pathspec,
 	}
 	*rev_ret = rev;
 
-	if (read_cache() < 0)
-		die(_("index file corrupt"));
-
 	parse_pathspec(pathspec, 0,
 		       PATHSPEC_PREFER_FULL |
 		       (patch_mode ? PATHSPEC_PREFIX_ORIGIN : 0),
@@ -405,6 +403,12 @@ int cmd_reset(int argc, const char **argv, const char *prefix)
 	if (intent_to_add && reset_type != MIXED)
 		die(_("-N can only be used with --mixed"));
 
+	prepare_repo_settings(the_repository);
+	the_repository->settings.command_requires_full_index = 0;
+
+	if (read_cache() < 0)
+		die(_("index file corrupt"));
+
 	/* Soft reset does not touch the index file nor the working tree
 	 * at all, but requires them in a good order. Other resets reset
 	 * the index file to the tree object we are switching to. */
diff --git a/cache-tree.c b/cache-tree.c
index 90919f9e345..9be19c85b66 100644
--- a/cache-tree.c
+++ b/cache-tree.c
@@ -776,6 +776,7 @@ void prime_cache_tree(struct repository *r,
 	cache_tree_free(&istate->cache_tree);
 	istate->cache_tree = cache_tree();
 
+	ensure_full_index(istate);
 	prime_cache_tree_rec(r, istate->cache_tree, tree);
 	istate->cache_changed |= CACHE_TREE_CHANGED;
 	trace2_region_leave("cache-tree", "prime_cache_tree", the_repository);
--
gitgitgadget
From: Victoria Dye <vdye@github.com>

Remove `ensure_full_index` guard on `prime_cache_tree` and update
`prime_cache_tree_rec` to correctly reconstruct sparse directory entries in
the cache tree. While processing a tree's entries, `prime_cache_tree_rec`
must determine whether a directory entry is sparse or not by searching for
it in the index (*without* expanding the index). If a matching sparse
directory index entry is found, no subtrees are added to the cache tree
entry and the entry count is set to 1 (representing the sparse directory
itself). Otherwise, the tree is assumed to not be sparse and its subtrees
are recursively added to the cache tree.

Helped-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Victoria Dye <vdye@github.com>
---
 cache-tree.c                             | 47 ++++++++++++++++++++++--
 cache.h                                  | 10 +++++
 read-cache.c                             | 27 ++++++++++----
 t/t1092-sparse-checkout-compatibility.sh | 15 +++++++-
 4 files changed, 86 insertions(+), 13 deletions(-)

diff --git a/cache-tree.c b/cache-tree.c
index 9be19c85b66..2866101052c 100644
--- a/cache-tree.c
+++ b/cache-tree.c
@@ -740,15 +740,26 @@ out:
 	return ret;
 }
 
+static void prime_cache_tree_sparse_dir(struct cache_tree *it,
+					struct tree *tree)
+{
+
+	oidcpy(&it->oid, &tree->object.oid);
+	it->entry_count = 1;
+}
+
 static void prime_cache_tree_rec(struct repository *r,
 				 struct cache_tree *it,
-				 struct tree *tree)
+				 struct tree *tree,
+				 struct strbuf *tree_path)
 {
 	struct tree_desc desc;
 	struct name_entry entry;
 	int cnt;
+	int base_path_len = tree_path->len;
 
 	oidcpy(&it->oid, &tree->object.oid);
+
 	init_tree_desc(&desc, tree->buffer, tree->size);
 	cnt = 0;
 	while (tree_entry(&desc, &entry)) {
@@ -757,14 +768,40 @@ static void prime_cache_tree_rec(struct repository *r,
 		else {
 			struct cache_tree_sub *sub;
 			struct tree *subtree = lookup_tree(r, &entry.oid);
+
 			if (!subtree->object.parsed)
 				parse_tree(subtree);
 			sub = cache_tree_sub(it, entry.path);
 			sub->cache_tree = cache_tree();
-			prime_cache_tree_rec(r, sub->cache_tree, subtree);
+
+			/*
+			 * Recursively-constructed subtree path is only needed when working
+			 * in a sparse index (where it's used to determine whether the
+			 * subtree is a sparse directory in the index).
+			 */
+			if (r->index->sparse_index) {
+				strbuf_setlen(tree_path, base_path_len);
+				strbuf_grow(tree_path, base_path_len + entry.pathlen + 1);
+				strbuf_add(tree_path, entry.path, entry.pathlen);
+				strbuf_addch(tree_path, '/');
+			}
+
+			/*
+			 * If a sparse index is in use, the directory being processed may be
+			 * sparse. To confirm that, we can check whether an entry with that
+			 * exact name exists in the index. If it does, the created subtree
+			 * should be sparse. Otherwise, cache tree expansion should continue
+			 * as normal.
+			 */
+			if (r->index->sparse_index &&
+			    index_entry_exists(r->index, tree_path->buf, tree_path->len))
+				prime_cache_tree_sparse_dir(sub->cache_tree, subtree);
+			else
+				prime_cache_tree_rec(r, sub->cache_tree, subtree, tree_path);
 			cnt += sub->cache_tree->entry_count;
 		}
 	}
+
 	it->entry_count = cnt;
 }
 
@@ -772,12 +809,14 @@ void prime_cache_tree(struct repository *r,
 			  struct index_state *istate,
 			  struct tree *tree)
 {
+	struct strbuf tree_path = STRBUF_INIT;
+
 	trace2_region_enter("cache-tree", "prime_cache_tree", the_repository);
 	cache_tree_free(&istate->cache_tree);
 	istate->cache_tree = cache_tree();
 
-	ensure_full_index(istate);
-	prime_cache_tree_rec(r, istate->cache_tree, tree);
+	prime_cache_tree_rec(r, istate->cache_tree, tree, &tree_path);
+	strbuf_release(&tree_path);
 	istate->cache_changed |= CACHE_TREE_CHANGED;
 	trace2_region_leave("cache-tree", "prime_cache_tree", the_repository);
 }
diff --git a/cache.h b/cache.h
index f6295f3b048..1d3e4665562 100644
--- a/cache.h
+++ b/cache.h
@@ -816,6 +816,16 @@ struct cache_entry *index_file_exists(struct index_state *istate, const char *na
  */
 int index_name_pos(struct index_state *, const char *name, int namelen);
 
+/*
+ * Determines whether an entry with the given name exists within the
+ * given index. The return value is 1 if an exact match is found, otherwise
+ * it is 0. Note that, unlike index_name_pos, this function does not expand
+ * the index if it is sparse. If an item exists within the full index but it
+ * is contained within a sparse directory (and not in the sparse index), 0 is
+ * returned.
+ */
+int index_entry_exists(struct index_state *, const char *name, int namelen);
+
 /*
  * Some functions return the negative complement of an insert position when a
  * precise match was not found but a position was found where the entry would
diff --git a/read-cache.c b/read-cache.c
index f5d4385c408..c079ece981a 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -68,6 +68,11 @@
  */
 #define CACHE_ENTRY_PATH_LENGTH 80
 
+enum index_search_mode {
+	NO_EXPAND_SPARSE = 0,
+	EXPAND_SPARSE = 1
+};
+
 static inline struct cache_entry *mem_pool__ce_alloc(struct mem_pool *mem_pool, size_t len)
 {
 	struct cache_entry *ce;
@@ -551,7 +556,10 @@ int cache_name_stage_compare(const char *name1, int len1, int stage1, const char
 	return 0;
 }
 
-static int index_name_stage_pos(struct index_state *istate, const char *name, int namelen, int stage)
+static int index_name_stage_pos(struct index_state *istate,
+				const char *name, int namelen,
+				int stage,
+				enum index_search_mode search_mode)
 {
 	int first, last;
 
@@ -570,7 +578,7 @@ static int index_name_stage_pos(struct index_state *istate, const char *name, in
 		first = next+1;
 	}
 
-	if (istate->sparse_index &&
+	if (search_mode == EXPAND_SPARSE && istate->sparse_index &&
 	    first > 0) {
 		/* Note: first <= istate->cache_nr */
 		struct cache_entry *ce = istate->cache[first - 1];
@@ -586,7 +594,7 @@ static int index_name_stage_pos(struct index_state *istate, const char *name, in
 		    ce_namelen(ce) < namelen &&
 		    !strncmp(name, ce->name, ce_namelen(ce))) {
 			ensure_full_index(istate);
-			return index_name_stage_pos(istate, name, namelen, stage);
+			return index_name_stage_pos(istate, name, namelen, stage, search_mode);
 		}
 	}
 
@@ -595,7 +603,12 @@ static int
index_name_stage_pos(struct index_state *istate, const char *name, in int index_name_pos(struct index_state *istate, const char *name, int namelen) { - return index_name_stage_pos(istate, name, namelen, 0); + return index_name_stage_pos(istate, name, namelen, 0, EXPAND_SPARSE); +} + +int index_entry_exists(struct index_state *istate, const char *name, int namelen) +{ + return index_name_stage_pos(istate, name, namelen, 0, NO_EXPAND_SPARSE) >= 0; } int remove_index_entry_at(struct index_state *istate, int pos) @@ -1222,7 +1235,7 @@ static int has_dir_name(struct index_state *istate, */ } - pos = index_name_stage_pos(istate, name, len, stage); + pos = index_name_stage_pos(istate, name, len, stage, EXPAND_SPARSE); if (pos >= 0) { /* * Found one, but not so fast. This could @@ -1322,7 +1335,7 @@ static int add_index_entry_with_check(struct index_state *istate, struct cache_e strcmp(ce->name, istate->cache[istate->cache_nr - 1]->name) > 0) pos = index_pos_to_insert_pos(istate->cache_nr); else - pos = index_name_stage_pos(istate, ce->name, ce_namelen(ce), ce_stage(ce)); + pos = index_name_stage_pos(istate, ce->name, ce_namelen(ce), ce_stage(ce), EXPAND_SPARSE); /* existing match? Just replace it. 
*/ if (pos >= 0) { @@ -1357,7 +1370,7 @@ static int add_index_entry_with_check(struct index_state *istate, struct cache_e if (!ok_to_replace) return error(_("'%s' appears as both a file and as a directory"), ce->name); - pos = index_name_stage_pos(istate, ce->name, ce_namelen(ce), ce_stage(ce)); + pos = index_name_stage_pos(istate, ce->name, ce_namelen(ce), ce_stage(ce), EXPAND_SPARSE); pos = -pos-1; } return pos + 1; diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh index 535686a2954..78476de18ea 100755 --- a/t/t1092-sparse-checkout-compatibility.sh +++ b/t/t1092-sparse-checkout-compatibility.sh @@ -760,9 +760,9 @@ test_expect_success 'sparse-index is not expanded' ' ensure_not_expanded checkout - && ensure_not_expanded switch rename-out-to-out && ensure_not_expanded switch - && - git -C sparse-index reset --hard && + ensure_not_expanded reset --hard && ensure_not_expanded checkout rename-out-to-out -- deep/deeper1 && - git -C sparse-index reset --hard && + ensure_not_expanded reset --hard && ensure_not_expanded restore -s rename-out-to-out -- deep/deeper1 && echo >>sparse-index/README.md && @@ -772,6 +772,17 @@ test_expect_success 'sparse-index is not expanded' ' echo >>sparse-index/untracked.txt && ensure_not_expanded add . && + for ref in update-deep update-folder1 update-folder2 update-deep + do + echo >>sparse-index/README.md && + ensure_not_expanded reset --hard $ref || return 1 + done && + + ensure_not_expanded reset --hard update-deep && + ensure_not_expanded reset --keep base && + ensure_not_expanded reset --merge update-deep && + ensure_not_expanded reset --hard && + ensure_not_expanded checkout -f update-deep && test_config -C sparse-index pull.twohead ort && ( -- gitgitgadget
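
The non-expanding lookup that this patch relies on can be pictured with a small standalone sketch. This is not git's code: `toy_index`, `toy_entry_exists`, and `toy_subtree_is_sparse` are hypothetical names, and the index is modeled (as an assumption) as a bytewise-sorted array of names in which a trailing '/' marks a sparse directory entry.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Toy stand-in for an index (hypothetical, not git's struct index_state):
 * entry names sorted bytewise; a name ending in '/' models a sparse
 * directory entry.
 */
struct toy_index {
	const char **names;
	size_t nr;
};

/* Return 1 only on an exact name match; nothing is ever expanded. */
static int toy_entry_exists(const struct toy_index *idx, const char *name)
{
	size_t lo, hi;

	assert(idx && name);
	lo = 0;
	hi = idx->nr;
	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;
		int cmp = strcmp(name, idx->names[mid]);

		if (!cmp)
			return 1;
		if (cmp < 0)
			hi = mid;
		else
			lo = mid + 1;
	}
	return 0;
}

/*
 * Decide how a subtree would be primed: 1 means "record it as a single
 * sparse directory", 0 means "recurse into it as usual".
 */
static int toy_subtree_is_sparse(const struct toy_index *idx,
				 int sparse_index, const char *dir_path)
{
	return sparse_index && toy_entry_exists(idx, dir_path);
}
```

With the entries `README.md`, `deep/a`, `folder1/`, and `folder2/`, a query for `folder1/` is an exact match, while `folder1/a` is reported as absent because the sketch never expands the sparse directory.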
From: Victoria Dye <vdye@github.com>

Remove the `ensure_full_index` guard on `read_from_tree` and update `git
reset --mixed` to ensure it can use sparse directory index entries wherever
possible. Sparse directory entries are reset using `diff_tree_oid`, which
requires `change` and `add_remove` functions to process the internal
contents of the sparse directory. The `recursive` diff option handles cases
in which `reset --mixed` must diff/merge files that are nested multiple
levels deep in a sparse directory.

The use of pathspecs with `git reset --mixed` introduces scenarios in which
internal contents of sparse directories may be matched by the pathspec. In
order to reset *all* files in the repo that may match the pathspec, the
following conditions on the pathspec require index expansion before
performing the reset:

* "magic" pathspecs
* wildcard pathspecs that do not match only in-cone files or entire sparse
  directories
* literal pathspecs matching something outside the sparse checkout
  definition

Helped-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Victoria Dye <vdye@github.com>
---
 builtin/reset.c                          | 78 +++++++++++++++++++++++-
 t/t1092-sparse-checkout-compatibility.sh | 17 ++++++
 2 files changed, 93 insertions(+), 2 deletions(-)

diff --git a/builtin/reset.c b/builtin/reset.c
index 0ac0de7dc97..60517e7e1d6 100644
--- a/builtin/reset.c
+++ b/builtin/reset.c
@@ -148,7 +148,9 @@ static void update_index_from_diff(struct diff_queue_struct *q,
 		 * If the file 1) corresponds to an existing index entry with
 		 * skip-worktree set, or 2) does not exist in the index but is
 		 * outside the sparse checkout definition, add a skip-worktree bit
-		 * to the new index entry.
+		 * to the new index entry. Note that a sparse index will be expanded
+		 * if this entry is outside the sparse cone - this is necessary
+		 * to properly construct the reset sparse directory.
*/ pos = cache_name_pos(one->path, strlen(one->path)); if ((pos >= 0 && ce_skip_worktree(active_cache[pos])) || @@ -166,6 +168,73 @@ static void update_index_from_diff(struct diff_queue_struct *q, } } +static int pathspec_needs_expanded_index(const struct pathspec *pathspec) +{ + unsigned int i, pos; + int res = 0; + char *skip_worktree_seen = NULL; + + /* + * When using a magic pathspec, assume for the sake of simplicity that + * the index needs to be expanded to match all matchable files. + */ + if (pathspec->magic) + return 1; + + for (i = 0; i < pathspec->nr; i++) { + struct pathspec_item item = pathspec->items[i]; + + /* + * If the pathspec item has a wildcard, the index should be expanded + * if the pathspec has the possibility of matching a subset of entries inside + * of a sparse directory (but not the entire directory). + * + * If the pathspec item is a literal path, the index only needs to be expanded + * if a) the pathspec isn't in the sparse checkout cone (to make sure we don't + * expand for in-cone files) and b) it doesn't match any sparse directories + * (since we can reset whole sparse directories without expanding them). + */ + if (item.nowildcard_len < item.len) { + for (pos = 0; pos < active_nr; pos++) { + struct cache_entry *ce = active_cache[pos]; + + if (!S_ISSPARSEDIR(ce->ce_mode)) + continue; + + /* + * If the pre-wildcard length is longer than the sparse + * directory name and the sparse directory is the first + * component of the pathspec, need to expand the index. + */ + if (item.nowildcard_len > ce_namelen(ce) && + !strncmp(item.original, ce->name, ce_namelen(ce))) { + res = 1; + break; + } + + /* + * If the pre-wildcard length is shorter than the sparse + * directory and the pathspec does not match the whole + * directory, need to expand the index. 
+ */ + if (!strncmp(item.original, ce->name, item.nowildcard_len) && + wildmatch(item.original, ce->name, 0)) { + res = 1; + break; + } + } + } else if (!path_in_cone_mode_sparse_checkout(item.original, &the_index) && + !matches_skip_worktree(pathspec, i, &skip_worktree_seen)) + res = 1; + + if (res > 0) + break; + } + + free(skip_worktree_seen); + return res; +} + static int read_from_tree(const struct pathspec *pathspec, struct object_id *tree_oid, int intent_to_add) @@ -178,9 +247,14 @@ static int read_from_tree(const struct pathspec *pathspec, opt.format_callback = update_index_from_diff; opt.format_callback_data = &intent_to_add; opt.flags.override_submodule_config = 1; + opt.flags.recursive = 1; opt.repo = the_repository; + opt.change = diff_change; + opt.add_remove = diff_addremove; + + if (pathspec->nr && the_index.sparse_index && pathspec_needs_expanded_index(pathspec)) + ensure_full_index(&the_index); - ensure_full_index(&the_index); if (do_diff_cache(tree_oid, &opt)) return 1; diffcore_std(&opt); diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh index 78476de18ea..f19c1b3e2eb 100755 --- a/t/t1092-sparse-checkout-compatibility.sh +++ b/t/t1092-sparse-checkout-compatibility.sh @@ -778,11 +778,28 @@ test_expect_success 'sparse-index is not expanded' ' ensure_not_expanded reset --hard $ref || return 1 done && + ensure_not_expanded reset --mixed base && ensure_not_expanded reset --hard update-deep && ensure_not_expanded reset --keep base && ensure_not_expanded reset --merge update-deep && ensure_not_expanded reset --hard && + ensure_not_expanded reset base -- deep/a && + ensure_not_expanded reset base -- nonexistent-file && + ensure_not_expanded reset deepest -- deep && + + # Although folder1 is outside the sparse definition, it exists as a + # directory entry in the index, so the pathspec will not force the + # index to be expanded. 
+ ensure_not_expanded reset deepest -- folder1 && + ensure_not_expanded reset deepest -- folder1/ && + + # Wildcard identifies only in-cone files, no index expansion + ensure_not_expanded reset deepest -- deep/\* && + + # Wildcard identifies only full sparse directories, no index expansion + ensure_not_expanded reset deepest -- folder\* && + ensure_not_expanded checkout -f update-deep && test_config -C sparse-index pull.twohead ort && ( -- gitgitgadget
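
The two wildcard conditions from `pathspec_needs_expanded_index` can be tried out in isolation with the sketch below. It is only an approximation under stated assumptions: `toy_wildcard_needs_expansion` is a hypothetical name, POSIX `fnmatch()` stands in for git's `wildmatch()` (both return 0 on a match), and the pre-wildcard length is approximated with `strcspn()` rather than taken from a parsed pathspec item.

```c
#include <assert.h>
#include <fnmatch.h>
#include <stddef.h>
#include <string.h>

/*
 * Toy version of the wildcard check. `sparse_dirs` lists sparse directory
 * entry names, each ending in '/'. Returns 1 if the pattern could match
 * only part of a sparse directory, which would require a full index.
 */
static int toy_wildcard_needs_expansion(const char *item,
					const char **sparse_dirs, size_t nr)
{
	size_t nowildcard_len, i;

	assert(item && (sparse_dirs || !nr));
	/* Assumption: the literal prefix ends at the first glob character. */
	nowildcard_len = strcspn(item, "*?[\\");

	for (i = 0; i < nr; i++) {
		const char *dir = sparse_dirs[i];
		size_t dir_len = strlen(dir);

		/* The literal prefix reaches inside the sparse directory. */
		if (nowildcard_len > dir_len && !strncmp(item, dir, dir_len))
			return 1;

		/*
		 * The literal prefix is compatible with the directory name,
		 * but the pattern does not match the whole directory entry.
		 */
		if (!strncmp(item, dir, nowildcard_len) &&
		    fnmatch(item, dir, 0) != 0)
			return 1;
	}
	return 0;
}
```

With sparse directories `folder1/` and `folder2/`, the patterns `deep/*` and `folder*` need no expansion in this model, which lines up with the two wildcard test cases in the patch, whereas `folder1/a*` does.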
From: Victoria Dye <vdye@github.com>

To find the first non-unpacked cache entry, `next_cache_entry` iterates
through the index, starting at `cache_bottom`. The performance of this in
full indexes is helped by `cache_bottom` advancing with each invocation of
`mark_ce_used` (called by `unpack_index_entry`). However, the presence of
sparse directories can prevent the `cache_bottom` from advancing in a sparse
index case, effectively forcing `next_cache_entry` to search from the
beginning of the index each time it is called.

The `cache_bottom` must be preserved for the sparse index (see 17a1bb570b
(unpack-trees: preserve cache_bottom, 2021-07-14)). Therefore, to retain
the benefit `cache_bottom` provides in non-sparse index cases, a separate
`hint` position indicates the first position `next_cache_entry` should
search, updated each execution with a new position.

Signed-off-by: Victoria Dye <vdye@github.com>
---
 unpack-trees.c | 23 +++++++++++++++++------
 1 file changed, 17 insertions(+), 6 deletions(-)

diff --git a/unpack-trees.c b/unpack-trees.c
index 8ea0a542da8..b94733de6be 100644
--- a/unpack-trees.c
+++ b/unpack-trees.c
@@ -645,17 +645,24 @@ static void mark_ce_used_same_name(struct cache_entry *ce,
 	}
 }
 
-static struct cache_entry *next_cache_entry(struct unpack_trees_options *o)
+static struct cache_entry *next_cache_entry(struct unpack_trees_options *o, int *hint)
 {
 	const struct index_state *index = o->src_index;
 	int pos = o->cache_bottom;
 
+	if (*hint > pos)
+		pos = *hint;
+
 	while (pos < index->cache_nr) {
 		struct cache_entry *ce = index->cache[pos];
-		if (!(ce->ce_flags & CE_UNPACKED))
+		if (!(ce->ce_flags & CE_UNPACKED)) {
+			*hint = pos + 1;
 			return ce;
+		}
 		pos++;
 	}
+
+	*hint = pos;
 	return NULL;
 }
 
@@ -1365,12 +1372,13 @@ static int unpack_callback(int n, unsigned long mask, unsigned long dirmask, str
 
 	/* Are we supposed to look at the index too? */
 	if (o->merge) {
+		int hint = -1;
 		while (1) {
 			int cmp;
 			struct cache_entry *ce;
 
 			if (o->diff_index_cached)
-				ce = next_cache_entry(o);
+				ce = next_cache_entry(o, &hint);
 			else
 				ce = find_cache_entry(info, p);
 
@@ -1690,7 +1698,7 @@ static int verify_absent(const struct cache_entry *,
 int unpack_trees(unsigned len, struct tree_desc *t, struct unpack_trees_options *o)
 {
 	struct repository *repo = the_repository;
-	int i, ret;
+	int i, hint, ret;
 	static struct cache_entry *dfc;
 	struct pattern_list pl;
 	int free_pattern_list = 0;
@@ -1763,13 +1771,15 @@ int unpack_trees(unsigned len, struct tree_desc *t, struct unpack_trees_options
 		info.pathspec = o->pathspec;
 
 		if (o->prefix) {
+			hint = -1;
+
 			/*
 			 * Unpack existing index entries that sort before the
 			 * prefix the tree is spliced into.  Note that o->merge
 			 * is always true in this case.
 			 */
 			while (1) {
-				struct cache_entry *ce = next_cache_entry(o);
+				struct cache_entry *ce = next_cache_entry(o, &hint);
 				if (!ce)
 					break;
 				if (ce_in_traverse_path(ce, &info))
@@ -1790,8 +1800,9 @@ int unpack_trees(unsigned len, struct tree_desc *t, struct unpack_trees_options
 
 	/* Any left-over entries in the index? */
 	if (o->merge) {
+		hint = -1;
 		while (1) {
-			struct cache_entry *ce = next_cache_entry(o);
+			struct cache_entry *ce = next_cache_entry(o, &hint);
 			if (!ce)
 				break;
 			if (unpack_index_entry(ce, o) < 0)
-- 
gitgitgadget
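
To see why the `hint` matters when `cache_bottom` cannot advance, the scan can be modeled outside of git. The sketch below is an illustration only: the `toy_*` names, the `visited` counter, and the `use_hint` switch are hypothetical instrumentation and not part of `unpack-trees.c`.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_UNPACKED 1u

struct toy_entry {
	unsigned flags;
};

struct toy_state {
	struct toy_entry *cache;
	int cache_nr;
	int cache_bottom;	/* stays put here, as it may in a sparse index */
	long visited;		/* instrumentation: entries examined so far */
};

/* Same shape as the patched next_cache_entry(), on the toy types. */
static struct toy_entry *toy_next_entry(struct toy_state *s, int *hint)
{
	int pos;

	assert(s && hint);
	pos = s->cache_bottom;
	if (*hint > pos)
		pos = *hint;

	while (pos < s->cache_nr) {
		struct toy_entry *e = &s->cache[pos];

		s->visited++;
		if (!(e->flags & TOY_UNPACKED)) {
			*hint = pos + 1;
			return e;
		}
		pos++;
	}

	*hint = pos;
	return NULL;
}

/*
 * Mark every entry as unpacked, one at a time, and report how many
 * entries were examined. With use_hint == 0 the hint is discarded after
 * each call, which models the pre-patch behavior.
 */
static long toy_drain(struct toy_state *s, int use_hint)
{
	int hint = -1;
	struct toy_entry *e;

	s->visited = 0;
	while ((e = toy_next_entry(s, &hint)) != NULL) {
		e->flags |= TOY_UNPACKED;
		if (!use_hint)
			hint = -1;
	}
	return s->visited;
}
```

For four entries and a `cache_bottom` stuck at 0, draining with the hint examines 4 entries in total, while discarding the hint examines 1 + 2 + 3 + 4 + 4 = 14, so the work grows quadratically with the number of entries.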
On 10/10/2021 23:03, Junio C Hamano wrote:
> Phillip Wood <phillip.wood123@gmail.com> writes:
>
>> On 08/10/2021 19:31, Junio C Hamano wrote:
>>> Victoria Dye <vdye@github.com> writes:
>>>
>>>> Phillip Wood wrote:
>>>
>>>>> I was looking at the callers to prime_cache_tree() this morning
>>>>> and would like to suggest an alternative approach - just delete
>>>>> prime_cache_tree() and all of its callers!
>>> Do you mean the calls added by new patches without understanding
>>> what they are doing, or all calls to it?
>>
>> I mean all calls to prime_cache_tree() after having understood (or at
>> least thinking that I understand) what they are doing.
>
> Sorry, my statement was confusingly written.  I meant "calls added
> by new patches, written by those who do not understand what
> prime_cache_tree() calls are doing", but after re-reading it, I
> think it could be taken to be referring to "you may be commenting
> without understanding what prime_cache_tree() calls are doing",
> which wasn't my intention.

Thanks for clarifying that, I had misunderstood what you had written.

>> (a) a successful call to unpack_trees() updates the cache tree
>>
>> (b) all the existing calls to prime_cache_tree() follow a successful
>> call to unpack_trees() and nothing touches in index in between the
>> call to unpack_trees() and prime_cache_tree().
>
> Ahh, OK.
>
> I think we originally avoided calling cache_tree_update() lightly
> (because it is essentially a "write-tree", a fairly heavy-weight
> operation, without I/O) and instead relied on prime_cache_tree() to
> get degraded cache-tree back into freshness.
>
> What I forgot was that 52fca218 (unpack-trees: populate cache-tree
> on successful merge, 2015-07-28) added cache_tree_update() there at
> the end of unpack_trees().  The commit covers quite a wide range of
> operations---the log message says "merge", but in fact anything that
> uses unpack_trees() including branch switching and the resetting of
> the index are affected, and they cause a full reconstruction of the
> cache tree by calling cache_tree_update().
>
> For most callers of prime_cache_tree(), like the ones in "git
> read-tree" and "git reset", it is immediately obvious that we just
> read from the same tree, and we should have everything from the tree
> and nothing else in the resulting index, so it is clear that the
> prime_cache_tree() call is recreating the same cache-tree
> information that we already should have computed ourselves, and
> these calls can go (or if "prime" is still cheaper than "update",
> these callers can pass an option to tell unpack_trees() to skip the
> cache_tree_update() call, because they will call "prime" immediately
> after).

I haven't really thought this through but could we teach
unpack_trees() to call prime_cache_tree() rather than
cache_tree_update() when that would be safe? For callers that use
oneway_merge() merge it should always be safe I think and it might be
possible to modify twoway_merge() to signal if the final tree in the
index matches the second one passed to it. We could have a more
general mechanism for the callback to signal if it is safe to prime
the tree but I suspect the callers that are using custom callbacks are
not updating the whole tree.

Best Wishes

Phillip

> For other callers it is not immediately obvious, but I trust you are
> correctly reading the code ;-)
>
> Thanks.
>
On 08/10/2021 18:14, Victoria Dye wrote:
> Phillip Wood wrote:
>> Hi Victoria
>>
>> On 07/10/2021 22:15, Victoria Dye via GitGitGadget wrote:
>>> From: Victoria Dye <vdye@github.com>
>>>
>>> Remove `ensure_full_index` guard on `prime_cache_tree` and update
>>> `prime_cache_tree_rec` to correctly reconstruct sparse directory entries in
>>> the cache tree. While processing a tree's entries, `prime_cache_tree_rec`
>>> must determine whether a directory entry is sparse or not by searching for
>>> it in the index (*without* expanding the index). If a matching sparse
>>> directory index entry is found, no subtrees are added to the cache tree
>>> entry and the entry count is set to 1 (representing the sparse directory
>>> itself). Otherwise, the tree is assumed to not be sparse and its subtrees
>>> are recursively added to the cache tree.
>>
>> I was looking at the callers to prime_cache_tree() this morning and
>> would like to suggest an alternative approach - just delete
>> prime_cache_tree() and all of its callers! As far as I can see it is
>> only ever called after a successful call to unpack_trees() and since
>> 52fca2184d ("unpack-trees: populate cache-tree on successful merge",
>> 2015-07-28) unpack_trees() updates the cache tree for the caller. All
>> the call sites are pretty obvious apart from the one in
>> t/help/test-fast-rebase.c where unpack_trees() is called by
>> merge_switch_to_result()
>>
>
> It looks like `prime_cache_tree` can be removed mostly without issue, but
> it causes the two last tests in `t4058-diff-duplicates.sh` to fail. Those
> tests document failure cases when dealing with duplicate tree entries [1],
> and it looks like `prime_cache_tree` was creating the appearance of a
> fully-reset index but was still leaving it in a state where subsequent
> operations could fail.
>
> I'm inclined to say the solution here would be to update the tests to
> document the "new" failure behavior and proceed with removing
> `prime_cache_tree`, because:
>
> * the test using `git reset --hard` disables `GIT_TEST_CHECK_CACHE_TREE`,
>   indicating that `prime_cache_tree` already wasn't behaving correctly
> * attempting to fix the overarching issues with duplicate tree entries will
>   substantially delay this patch series
> * a duplicate entry fix is largely unrelated to the intended scope of the
>   series

That sounds like a good way forward

Best Wishes

Phillip

> Another option would be to leave `prime_cache_tree` as it is, but with it
> being apparently useless outside of mostly-broken use cases in `t4058`, it
> seems like a waste to keep it around.
>
> [1] ac14de13b2 (t4058: explore duplicate tree entry handling in a bit more detail, 2020-12-11)
>
Phillip Wood <phillip.wood123@gmail.com> writes:

> I haven't really thought this through but could we teach
> unpack_trees() to call prime_cache_tree() rather than
> cache_tree_update() when that would be safe? For callers that use
> oneway_merge() merge it should always be safe I think and it might be
> possible to modify twoway_merge() to signal if the final tree in the
> index matches the second one passed to it. We could have a more
> general mechanism for the callback to signal if it is safe to prime
> the tree but I suspect the callers that are using custom callbacks are
> not updating the whole tree.

Before going in any direction, other than doing nothing ;-), we'd
need to see how expensive "prime" and "update" are.

Having said that.

 * Your idea is quite beneficial for callers of unpack_trees() as
   they no longer have to decide whether they want to make a
   separate call to "prime".

 * Right now we do not seem to have a codepath that

   (1) populates the index entries from existing trees (not
       necessarily making the index in complete sync with the trees)
       without unpack_trees() and

   (2) does "prime" to fix the cache tree

   but such a codepath may want to do either "prime" or "update", or
   neither.  When it knows that it damages cache-tree so badly, and
   that it is often expected that the user would make many other
   changes to the index before writing it out as a tree, it may
   choose not to do either.

Thanks.
On Mon, Oct 11, 2021 at 08:30:16PM +0000, Victoria Dye via GitGitGadget wrote: > diff --git a/builtin/reset.c b/builtin/reset.c > index d3695ce43c4..e441b6601b9 100644 > --- a/builtin/reset.c > +++ b/builtin/reset.c > @@ -25,6 +25,7 @@ > #include "cache-tree.h" > #include "submodule.h" > #include "submodule-config.h" > +#include "dir.h" > > #define REFRESH_INDEX_DELAY_WARNING_IN_MS (2 * 1000) > > @@ -141,6 +143,18 @@ static void update_index_from_diff(struct diff_queue_struct *q, > > ce = make_cache_entry(&the_index, one->mode, &one->oid, one->path, > 0, 0); > + > + /* > + * If the file 1) corresponds to an existing index entry with > + * skip-worktree set, or 2) does not exist in the index but is > + * outside the sparse checkout definition, add a skip-worktree bit > + * to the new index entry. > + */ > + pos = cache_name_pos(one->path, strlen(one->path)); > + if ((pos >= 0 && ce_skip_worktree(active_cache[pos])) || > + (pos < 0 && !path_in_sparse_checkout(one->path, &the_index))) > + ce->ce_flags |= CE_SKIP_WORKTREE; To put it another way and check my understanding (because I'm not familiar with the sparse-index yet): if the file exists in the index but we didn't care about the worktree anyway, then skip it; if the file doesn't exist in the index but it also isn't in the sparse-checkout cone, then also skip it, because we don't care about the file anyway. I was going to ask if we could check ce_skip_worktree() without checking pos first, but I suppose a negative pos would make the array deref pretty unhappy. Ok. 
> diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh > index 886e78715fe..889079f55b8 100755 > --- a/t/t1092-sparse-checkout-compatibility.sh > +++ b/t/t1092-sparse-checkout-compatibility.sh > @@ -459,26 +459,17 @@ test_expect_failure 'blame with pathspec outside sparse definition' ' > test_all_match git blame deep/deeper2/deepest/a > ' > > -# NEEDSWORK: a sparse-checkout behaves differently from a full checkout > -# in this scenario, but it shouldn't. > -test_expect_failure 'checkout and reset (mixed)' ' > +test_expect_success 'checkout and reset (mixed)' ' Ooh ooh, we can start using these tests :) Always exciting. > init_repos && > > test_all_match git checkout -b reset-test update-deep && > test_all_match git reset deepest && > - test_all_match git reset update-folder1 && > - test_all_match git reset update-folder2 > -' > - > -# NEEDSWORK: a sparse-checkout behaves differently from a full checkout > -# in this scenario, but it shouldn't. > -test_expect_success 'checkout and reset (mixed) [sparse]' ' > - init_repos && > > - test_sparse_match git checkout -b reset-test update-deep && > - test_sparse_match git reset deepest && > + # Because skip-worktree is preserved, resetting to update-folder1 > + # will show worktree changes for full-checkout that are not present > + # in sparse-checkout or sparse-index. This doesn't really have anything to do with your patch. But I'm having a very hard time understanding what each branch you're switching between and basing on is for; this entire test suite is a little miserly with comments. I *think* your comment is saying that you're not bothering to check test_all_match because you know that the full-checkout tree won't match? But I also don't see that being asserted; test_sparse_match looks to compare sparse-checkout and sparse-index trees but doesn't say anything at all about the full-checkout tree, right? 
> test_sparse_match git reset update-folder1 && > - test_sparse_match git reset update-folder2 > + run_on_sparse test_path_is_missing folder1 > ' > > test_expect_success 'merge, cherry-pick, and rebase' ' > diff --git a/t/t7102-reset.sh b/t/t7102-reset.sh > index 601b2bf97f0..d05426062ec 100755 > --- a/t/t7102-reset.sh > +++ b/t/t7102-reset.sh > @@ -472,6 +472,23 @@ test_expect_success '--mixed refreshes the index' ' > test_cmp expect output > ' > > +test_expect_success '--mixed preserves skip-worktree' ' > + echo 123 >>file2 && file2 is just in the worktree... > + git add file2 && ...and now it's in the index... > + git update-index --skip-worktree file2 && ...and now we're asking Git to ignore worktree changes to file2... > + git reset --mixed HEAD >output && But now I'm a little confused, maybe because of 'git reset' syntax. I'd expect this to say "ah yes, the index is different from HEAD, it's got this file2 thingie" and still reset the index; I'm surprised that --skip-worktree, which sounds like it's saying only "don't consider what's going on in the worktree". So I would expect this to still delete file2 from the index. But instead I guess it is keeping file2 in the index with the "who cares what happened in the wt" marker? > + test_must_be_empty output && > + > + cat >expect <<-\EOF && > + Unstaged changes after reset: > + M file2 > + EOF > + git update-index --no-skip-worktree file2 && > + git add file2 && > + git reset --mixed HEAD >output && > + test_cmp expect output > +' > + > test_expect_success 'resetting specific path that is unmerged' ' > git rm --cached file2 && > F1=$(git rev-parse HEAD:file1) && > -- > gitgitgadget >
Emily Shaffer wrote: > On Mon, Oct 11, 2021 at 08:30:16PM +0000, Victoria Dye via GitGitGadget wrote: >> diff --git a/builtin/reset.c b/builtin/reset.c >> index d3695ce43c4..e441b6601b9 100644 >> --- a/builtin/reset.c >> +++ b/builtin/reset.c >> @@ -25,6 +25,7 @@ >> #include "cache-tree.h" >> #include "submodule.h" >> #include "submodule-config.h" >> +#include "dir.h" >> >> #define REFRESH_INDEX_DELAY_WARNING_IN_MS (2 * 1000) >> >> @@ -141,6 +143,18 @@ static void update_index_from_diff(struct diff_queue_struct *q, >> >> ce = make_cache_entry(&the_index, one->mode, &one->oid, one->path, >> 0, 0); >> + >> + /* >> + * If the file 1) corresponds to an existing index entry with >> + * skip-worktree set, or 2) does not exist in the index but is >> + * outside the sparse checkout definition, add a skip-worktree bit >> + * to the new index entry. >> + */ >> + pos = cache_name_pos(one->path, strlen(one->path)); >> + if ((pos >= 0 && ce_skip_worktree(active_cache[pos])) || >> + (pos < 0 && !path_in_sparse_checkout(one->path, &the_index))) >> + ce->ce_flags |= CE_SKIP_WORKTREE; > > To put it another way and check my understanding (because I'm not > familiar with the sparse-index yet): if the file exists in the index but > we didn't care about the worktree anyway, then skip it; if the file > doesn't exist in the index but it also isn't in the sparse-checkout > cone, then also skip it, because we don't care about the file anyway. > > I was going to ask if we could check ce_skip_worktree() without checking > pos first, but I suppose a negative pos would make the array deref > pretty unhappy. Ok. > Exactly! Generally the current skip-worktree flag is the "source of truth" on whether to add the flag to the new entry, but if the file isn't in the index pre-reset, the sparse checkout patterns are used to determine if it should have skip-worktree applied. 
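
The rule described above reduces to a small decision function. The sketch below is a simplified restatement, not the code in `builtin/reset.c`: `toy_should_skip_worktree` is a hypothetical name, and its three integer parameters stand in for the results of `cache_name_pos()`, `ce_skip_worktree()`, and `path_in_sparse_checkout()`.

```c
#include <assert.h>

/*
 * pos:                     index position of the path before the reset,
 *                          negative if the path is not in the index
 * existing_skip_worktree:  skip-worktree flag of the existing entry
 *                          (only meaningful when pos >= 0)
 * path_in_sparse_checkout: nonzero if the sparse-checkout patterns
 *                          include the path
 */
static int toy_should_skip_worktree(int pos, int existing_skip_worktree,
				    int path_in_sparse_checkout)
{
	/* An entry that is absent from the index cannot carry a flag. */
	assert(pos >= 0 || !existing_skip_worktree);

	/* Existing entry: its current flag is the source of truth. */
	if (pos >= 0)
		return existing_skip_worktree != 0;

	/* New entry: fall back to the sparse-checkout patterns. */
	return !path_in_sparse_checkout;
}
```

Note that for an existing entry the sparse-checkout patterns are not consulted at all, so an in-cone file that was manually flagged with `git update-index --skip-worktree` keeps its flag across the reset.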
>> diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh >> index 886e78715fe..889079f55b8 100755 >> --- a/t/t1092-sparse-checkout-compatibility.sh >> +++ b/t/t1092-sparse-checkout-compatibility.sh >> @@ -459,26 +459,17 @@ test_expect_failure 'blame with pathspec outside sparse definition' ' >> test_all_match git blame deep/deeper2/deepest/a >> ' >> >> -# NEEDSWORK: a sparse-checkout behaves differently from a full checkout >> -# in this scenario, but it shouldn't. >> -test_expect_failure 'checkout and reset (mixed)' ' >> +test_expect_success 'checkout and reset (mixed)' ' > > Ooh ooh, we can start using these tests :) Always exciting. > >> init_repos && >> >> test_all_match git checkout -b reset-test update-deep && >> test_all_match git reset deepest && >> - test_all_match git reset update-folder1 && >> - test_all_match git reset update-folder2 >> -' >> - >> -# NEEDSWORK: a sparse-checkout behaves differently from a full checkout >> -# in this scenario, but it shouldn't. >> -test_expect_success 'checkout and reset (mixed) [sparse]' ' >> - init_repos && >> >> - test_sparse_match git checkout -b reset-test update-deep && >> - test_sparse_match git reset deepest && >> + # Because skip-worktree is preserved, resetting to update-folder1 >> + # will show worktree changes for full-checkout that are not present >> + # in sparse-checkout or sparse-index. > > This doesn't really have anything to do with your patch. But I'm having > a very hard time understanding what each branch you're switching between > and basing on is for; this entire test suite is a little miserly with > comments. 
The branches used in this test are:

* `base` is the initial branch; all branches in the list here contain one
  commit on top of `base`
* `update-deep` modifies a file inside the sparse checkout cone (`deep/a`)
* `deepest` modifies a file deep inside the sparse checkout cone
  (`deep/deeper1/deepest/a`)
* `update-folder1` and `update-folder2` each modify a file outside the
  sparse checkout cone (`folder1/a` and `folder2/a`, respectively)

There are other branches used throughout the file, but they deal with more
complicated conflict scenarios not used in this particular test case.

> I *think* your comment is saying that you're not bothering to
> check test_all_match because you know that the full-checkout tree won't
> match? But I also don't see that being asserted; test_sparse_match looks
> to compare sparse-checkout and sparse-index trees but doesn't say
> anything at all about the full-checkout tree, right?
>

Your understanding is correct (it's attempting to explain why we're not
using `test_all_match`, unlike earlier assertions in the test). That said,
it probably _should_ assert on the differences from `full-checkout`, namely
that "M folder1/a" would appear in `full-checkout` but not in
`sparse-checkout`/`sparse-index`. I can add that in my next version.

>>  	test_sparse_match git reset update-folder1 &&
>> -	test_sparse_match git reset update-folder2
>> +	run_on_sparse test_path_is_missing folder1
>>  '
>>
>>  test_expect_success 'merge, cherry-pick, and rebase' '
>> diff --git a/t/t7102-reset.sh b/t/t7102-reset.sh
>> index 601b2bf97f0..d05426062ec 100755
>> --- a/t/t7102-reset.sh
>> +++ b/t/t7102-reset.sh
>> @@ -472,6 +472,23 @@ test_expect_success '--mixed refreshes the index' '
>>  	test_cmp expect output
>>  '
>>
>> +test_expect_success '--mixed preserves skip-worktree' '
>> +	echo 123 >>file2 &&
>
> file2 is just in the worktree...
>
>> +	git add file2 &&
>
> ...and now it's in the index...
> >> + git update-index --skip-worktree file2 && > > ...and now we're asking Git to ignore worktree changes to file2... > >> + git reset --mixed HEAD >output && > > But now I'm a little confused, maybe because of 'git reset' syntax. I'd > expect this to say "ah yes, the index is different from HEAD, it's got > this file2 thingie" and still reset the index; I'm surprised that > --skip-worktree, which sounds like it's saying only "don't consider > what's going on in the worktree". So I would expect this to still delete > file2 from the index. But instead I guess it is keeping file2 in the > index with the "who cares what happened in the wt" marker? > Yes - `git update-index --skip-worktree` sets the skip-worktree flag on the index entry of the specified file(s) (in this case, `file2`). So, because `file2` is in the index but ignored in the worktree, the file isn't identified as "modified" after `git reset --mixed HEAD`. Once the skip-worktree flag is removed (with `git update-index --no-skip-worktree`), the reset results in it showing up as "modified". >> + test_must_be_empty output && >> + >> + cat >expect <<-\EOF && >> + Unstaged changes after reset: >> + M file2 >> + EOF >> + git update-index --no-skip-worktree file2 && >> + git add file2 && >> + git reset --mixed HEAD >output && >> + test_cmp expect output >> +' >> + >> test_expect_success 'resetting specific path that is unmerged' ' >> git rm --cached file2 && >> F1=$(git rev-parse HEAD:file1) && >> -- >> gitgitgadget >>
This series integrates the sparse index with git reset and provides
miscellaneous fixes and improvements to the command in sparse checkouts.
This includes:

 1. tests added to t1092 and p2000 to establish the baseline functionality
    of the command
 2. repository settings to enable the sparse index with ensure_full_index
    guarding any code paths that break tests without other compatibility
    updates.
 3. modifications to remove or reduce the scope in which ensure_full_index
    must be called.

The sparse index updates are predicated on a fix originating from the
microsoft/git fork [1], correcting how git reset --mixed handles resetting
entries outside the sparse checkout definition. Additionally, a performance
"bug" in next_cache_entry with sparse index is corrected, preventing
repeatedly looping over already-searched entries.

The p2000 tests demonstrate a ~70% execution time reduction in git reset
using a sparse index, and no change (within expected variability [2]) using
a full index. Results summarized below [3, 4]:

Test                          base              [5/8]
-----------------------------------------------------------------------
git reset --hard (full-v3)    1.00(0.50+0.39)   0.97(0.50+0.37) -3.0%
git reset --hard (full-v4)    1.00(0.51+0.38)   0.96(0.50+0.36) -4.0%
git reset --hard (sparse-v3)  1.68(1.17+0.39)   1.37(0.91+0.35) -18.5%
git reset --hard (sparse-v4)  1.70(1.18+0.40)   1.41(0.94+0.35) -17.1%

Test                          base              [6/8]
-----------------------------------------------------------------------
git reset --hard (full-v3)    1.00(0.50+0.39)   0.94(0.48+0.34) -6.0%
git reset --hard (full-v4)    1.00(0.51+0.38)   0.95(0.51+0.34) -5.0%
git reset --hard (sparse-v3)  1.68(1.17+0.39)   0.46(0.05+0.29) -72.6%
git reset --hard (sparse-v4)  1.70(1.18+0.40)   0.46(0.06+0.29) -72.9%

Test                              base              [7/8]
---------------------------------------------------------------------------
git reset (full-v3)               0.77(0.27+0.37)   0.72(0.26+0.32) -6.5%
git reset (full-v4)               0.75(0.27+0.34)   0.73(0.26+0.32) -2.7%
git reset (sparse-v3)             1.44(0.96+0.36)   0.43(0.04+0.96) -70.1%
git reset (sparse-v4)             1.46(0.97+0.36)   0.43(0.05+0.79) -70.5%
git reset -- missing (full-v3)    0.72(0.26+0.32)   0.69(0.26+0.30) -4.2%
git reset -- missing (full-v4)    0.74(0.28+0.33)   0.71(0.27+0.32) -4.1%
git reset -- missing (sparse-v3)  1.45(0.97+0.35)   0.81(0.42+0.90) -44.1%
git reset -- missing (sparse-v4)  1.41(0.94+0.34)   0.79(0.42+0.76) -44.0%

Test                              base              [8/8]
---------------------------------------------------------------------------
git reset -- missing (full-v3)    0.72(0.26+0.32)   0.73(0.26+0.33) +1.4%
git reset -- missing (full-v4)    0.74(0.28+0.33)   0.74(0.27+0.32) +0.0%
git reset -- missing (sparse-v3)  1.45(0.97+0.35)   0.43(0.05+0.80) -70.3%
git reset -- missing (sparse-v4)  1.41(0.94+0.34)   0.44(0.05+0.76) -68.8%


Changes since V1
================

 * Add --force-full-index option to update-index. The option is used to
   circumvent changing command_requires_full_index from its default value -
   right now this is effectively a no-op, but will change once update-index
   is integrated with sparse index. By using this option in the t1092
   expand/collapse test, the command used to test will not need to be
   updated with subsequent sparse index integrations.
 * Update implementation of mixed reset for entries outside sparse checkout
   definition. The condition in which a file should be checked out before
   index reset is simplified to "if it has skip-worktree enabled and a
   reset would change the file, check it out".
 * After checking the behavior of update_index_from_diff with renames,
   found that the diff used by reset does not produce diff queue entries
   with different pathnames for one and two. Because of this, and because
   nothing in the implementation seems to rely on identical path names, no
   BUG check is added.
 * Correct a bug in the "sparse index is not expanded" tests in t1092 where
   failure of a git reset --mixed test was not being reported. Test now
   verifies an appropriate scenario with corrected failure-checking.

Changes since V2
================

 * Replace patch adding checkouts for git reset --mixed with sparse
   checkout with preserving the skip-worktree flag (including a new test
   for git reset --mixed and update to t1092 - checkout and reset (mixed))
 * Move rename of is_missing into its own patch
 * Further extend t1092 tests and remove unnecessary commands/tests where
   possible
 * Refine logic determining which pathspecs require ensure_full_index in
   git reset --mixed, add related ensure_not_expanded tests
 * Add index_search_mode enum to index_name_stage_pos
 * Clean up variable usage & remove unnecessary subtree_path in
   prime_cache_tree_rec
 * Update cover letter performance data
 * More thoroughly explain changes in each commit message


Changes since V3
================

 * Replace git update-index --force-full-index with git reset
   update-folder1 -- folder1/a, remove introduction of new
   --force-full-index option entirely, and add comment clarifying the
   intent of sparse-index is expanded and converted back test
 * Fix authorship on reset: preserve skip-worktree bit in mixed reset
   (current patch fully replaces original patch, but metadata of the
   original wasn't properly replaced)


Changes since V4
================

 * Update t1092 test 'checkout and reset (mixed)' to explicitly verify
   differences between sparse and full checkouts

Thanks!
-Victoria

[1] microsoft@6b8a074
[2] https://lore.kernel.org/git/8b9fe3f8-f0e3-4567-b20b-17c92bd1a5c5@github.com/
[3] If a test and/or commit is not mentioned, there is no significant
    change to performance
[4] Pathspec "does-not-exist" is changed to "missing" to save space in
    performance report

Victoria Dye (8):
  reset: rename is_missing to !is_in_reset_tree
  reset: preserve skip-worktree bit in mixed reset
  sparse-index: update command for expand/collapse test
  reset: expand test coverage for sparse checkouts
  reset: integrate with sparse index
  reset: make sparse-aware (except --mixed)
  reset: make --mixed sparse-aware
  unpack-trees: improve performance of next_cache_entry

 builtin/reset.c                          | 104 ++++++++++++++++-
 cache-tree.c                             |  46 +++++++-
 cache.h                                  |  10 ++
 read-cache.c                             |  27 +++--
 t/perf/p2000-sparse-operations.sh        |   3 +
 t/t1092-sparse-checkout-compatibility.sh | 140 ++++++++++++++++++++---
 t/t7102-reset.sh                         |  17 +++
 unpack-trees.c                           |  23 +++-
 8 files changed, 333 insertions(+), 37 deletions(-)


base-commit: cefe983a320c03d7843ac78e73bd513a27806845
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1048%2Fvdye%2Fvdye%2Fsparse-index-part1-v5
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1048/vdye/vdye/sparse-index-part1-v5
Pull-Request: https://github.com/gitgitgadget/git/pull/1048

Range-diff vs v4:

 1:  ad7013a31aa = 1:  ad7013a31aa reset: rename is_missing to !is_in_reset_tree
 2:  bd72bd175da ! 2:  b221b00b7e0 reset: preserve skip-worktree bit in mixed reset
     @@ t/t1092-sparse-checkout-compatibility.sh: test_expect_failure 'blame with pathsp
      -	test_sparse_match git checkout -b reset-test update-deep &&
      -	test_sparse_match git reset deepest &&
      +	# Because skip-worktree is preserved, resetting to update-folder1
     -+	# will show worktree changes for full-checkout that are not present
     ++	# will show worktree changes for folder1/a in full-checkout, but not
      +	# in sparse-checkout or sparse-index.
     ++	git -C full-checkout reset update-folder1 >full-checkout-out &&
       	test_sparse_match git reset update-folder1 &&
      -	test_sparse_match git reset update-folder2
     ++	grep "M	folder1/a" full-checkout-out &&
     ++	! grep "M	folder1/a" sparse-checkout-out &&
      +	run_on_sparse test_path_is_missing folder1
       '
       
 3:  c4df0d6b136 = 3:  1bb2ca92c60 sparse-index: update command for expand/collapse test
 4:  cfbb23e9fe2 = 4:  cc76c694647 reset: expand test coverage for sparse checkouts
 5:  62fdbf2ad26 = 5:  217ae445418 reset: integrate with sparse index
 6:  b0d437207e7 = 6:  a3e2fd59867 reset: make sparse-aware (except --mixed)
 7:  00d14fb60bd = 7:  a9135a5ed64 reset: make --mixed sparse-aware
 8:  e523dadb8bf = 8:  f91d1dcf024 unpack-trees: improve performance of next_cache_entry

--
gitgitgadget
From: Victoria Dye <vdye@github.com>

Rename and invert value of `is_missing` to `is_in_reset_tree` to make the
variable more descriptive of what it represents.

Signed-off-by: Victoria Dye <vdye@github.com>
---
 builtin/reset.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/builtin/reset.c b/builtin/reset.c
index 51c9e2f43ff..d3695ce43c4 100644
--- a/builtin/reset.c
+++ b/builtin/reset.c
@@ -131,10 +131,10 @@ static void update_index_from_diff(struct diff_queue_struct *q,
 
 	for (i = 0; i < q->nr; i++) {
 		struct diff_filespec *one = q->queue[i]->one;
-		int is_missing = !(one->mode && !is_null_oid(&one->oid));
+		int is_in_reset_tree = one->mode && !is_null_oid(&one->oid);
 		struct cache_entry *ce;
 
-		if (is_missing && !intent_to_add) {
+		if (!is_in_reset_tree && !intent_to_add) {
 			remove_file_from_cache(one->path);
 			continue;
 		}
@@ -144,7 +144,7 @@ static void update_index_from_diff(struct diff_queue_struct *q,
 		if (!ce)
 			die(_("make_cache_entry failed for path '%s'"),
 			    one->path);
-		if (is_missing) {
+		if (!is_in_reset_tree) {
 			ce->ce_flags |= CE_INTENT_TO_ADD;
 			set_object_name_for_intent_to_add_entry(ce);
 		}
--
gitgitgadget
From: Victoria Dye <vdye@github.com>

Change `update_index_from_diff` to set `skip-worktree` when applicable for
new index entries. When `git reset --mixed <tree-ish>` is run, entries in
the index with differences between the pre-reset HEAD and reset <tree-ish>
are identified and handled with `update_index_from_diff`. For each file, a
new cache entry is inserted into the index, created from the <tree-ish> side
of the reset (without changing the working tree). However, the newly-created
entry must have `skip-worktree` explicitly set in either of the following
scenarios:

1. the file is in the current index and has `skip-worktree` set
2. the file is not in the current index but is outside the sparse checkout
   definition

Not setting the `skip-worktree` bit leads to likely-undesirable results for
a user. It causes `skip-worktree` settings to disappear on the
"diff"-containing files (but *only* the diff-containing files), leading to
those files now showing modifications in `git status`. For example, when
running `git reset --mixed` in a sparse checkout, some file entries outside
of sparse checkout could show up as deleted, despite the user never deleting
anything (and not wanting them on-disk anyway).

Additionally, add a test to `t7102` to ensure `skip-worktree` is preserved
in a basic `git reset --mixed` scenario and update a failure-documenting
test from 19a0acc (t1092: test interesting sparse-checkout scenarios,
2021-01-23) with new expected behavior.

Helped-by: Junio C Hamano <gitster@pobox.com> Signed-off-by: Victoria Dye <vdye@github.com> --- builtin/reset.c | 14 ++++++++++++++ t/t1092-sparse-checkout-compatibility.sh | 22 ++++++++-------------- t/t7102-reset.sh | 17 +++++++++++++++++ 3 files changed, 39 insertions(+), 14 deletions(-) diff --git a/builtin/reset.c b/builtin/reset.c index d3695ce43c4..e441b6601b9 100644 --- a/builtin/reset.c +++ b/builtin/reset.c @@ -25,6 +25,7 @@ #include "cache-tree.h" #include "submodule.h" #include "submodule-config.h" +#include "dir.h" #define REFRESH_INDEX_DELAY_WARNING_IN_MS (2 * 1000) @@ -130,6 +131,7 @@ static void update_index_from_diff(struct diff_queue_struct *q, int intent_to_add = *(int *)data; for (i = 0; i < q->nr; i++) { + int pos; struct diff_filespec *one = q->queue[i]->one; int is_in_reset_tree = one->mode && !is_null_oid(&one->oid); struct cache_entry *ce; @@ -141,6 +143,18 @@ static void update_index_from_diff(struct diff_queue_struct *q, ce = make_cache_entry(&the_index, one->mode, &one->oid, one->path, 0, 0); + + /* + * If the file 1) corresponds to an existing index entry with + * skip-worktree set, or 2) does not exist in the index but is + * outside the sparse checkout definition, add a skip-worktree bit + * to the new index entry. 
+ */ + pos = cache_name_pos(one->path, strlen(one->path)); + if ((pos >= 0 && ce_skip_worktree(active_cache[pos])) || + (pos < 0 && !path_in_sparse_checkout(one->path, &the_index))) + ce->ce_flags |= CE_SKIP_WORKTREE; + if (!ce) die(_("make_cache_entry failed for path '%s'"), one->path); diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh index 886e78715fe..c7449afe965 100755 --- a/t/t1092-sparse-checkout-compatibility.sh +++ b/t/t1092-sparse-checkout-compatibility.sh @@ -459,26 +459,20 @@ test_expect_failure 'blame with pathspec outside sparse definition' ' test_all_match git blame deep/deeper2/deepest/a ' -# NEEDSWORK: a sparse-checkout behaves differently from a full checkout -# in this scenario, but it shouldn't. -test_expect_failure 'checkout and reset (mixed)' ' +test_expect_success 'checkout and reset (mixed)' ' init_repos && test_all_match git checkout -b reset-test update-deep && test_all_match git reset deepest && - test_all_match git reset update-folder1 && - test_all_match git reset update-folder2 -' - -# NEEDSWORK: a sparse-checkout behaves differently from a full checkout -# in this scenario, but it shouldn't. -test_expect_success 'checkout and reset (mixed) [sparse]' ' - init_repos && - test_sparse_match git checkout -b reset-test update-deep && - test_sparse_match git reset deepest && + # Because skip-worktree is preserved, resetting to update-folder1 + # will show worktree changes for folder1/a in full-checkout, but not + # in sparse-checkout or sparse-index. + git -C full-checkout reset update-folder1 >full-checkout-out && test_sparse_match git reset update-folder1 && - test_sparse_match git reset update-folder2 + grep "M folder1/a" full-checkout-out && + ! 
grep "M folder1/a" sparse-checkout-out && + run_on_sparse test_path_is_missing folder1 ' test_expect_success 'merge, cherry-pick, and rebase' ' diff --git a/t/t7102-reset.sh b/t/t7102-reset.sh index 601b2bf97f0..d05426062ec 100755 --- a/t/t7102-reset.sh +++ b/t/t7102-reset.sh @@ -472,6 +472,23 @@ test_expect_success '--mixed refreshes the index' ' test_cmp expect output ' +test_expect_success '--mixed preserves skip-worktree' ' + echo 123 >>file2 && + git add file2 && + git update-index --skip-worktree file2 && + git reset --mixed HEAD >output && + test_must_be_empty output && + + cat >expect <<-\EOF && + Unstaged changes after reset: + M file2 + EOF + git update-index --no-skip-worktree file2 && + git add file2 && + git reset --mixed HEAD >output && + test_cmp expect output +' + test_expect_success 'resetting specific path that is unmerged' ' git rm --cached file2 && F1=$(git rev-parse HEAD:file1) && -- gitgitgadget
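
For readers who want the skip-worktree rule from this patch in isolation, here is a minimal standalone sketch of the two-case decision. It is not the code in builtin/reset.c: `struct lookup_result` and `new_entry_needs_skip_worktree` are hypothetical names invented for illustration, and the three fields simply stand in for the results of `cache_name_pos`, `ce_skip_worktree` and `path_in_sparse_checkout`.

```c
#include <stdbool.h>

/*
 * Hypothetical, simplified summary of what the patch looks up for one
 * path before the mixed reset replaces its index entry. None of these
 * names exist in the Git source tree.
 */
struct lookup_result {
	int pos;                 /* >= 0: path found in index; < 0: not present */
	bool skip_worktree;      /* flag on the existing entry (only if pos >= 0) */
	bool in_sparse_checkout; /* does the path match the sparse patterns? */
};

/*
 * Decide whether the entry created by the reset should carry the
 * skip-worktree bit.
 */
bool new_entry_needs_skip_worktree(const struct lookup_result *r)
{
	/* Case 1: an entry already exists, so its flag is the source of truth. */
	if (r->pos >= 0)
		return r->skip_worktree;

	/* Case 2: no existing entry, so fall back to the sparse patterns. */
	return !r->in_sparse_checkout;
}
```

Note that the sparse-checkout patterns are consulted only when the path is absent from the index, which mirrors the `pos >= 0` / `pos < 0` split in the condition added by the patch.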
From: Victoria Dye <vdye@github.com>

In anticipation of `git reset --hard` being able to use the sparse index
without expanding it, replace the command in `sparse-index is expanded and
converted back` with `git reset -- folder1/a`. This command will need to
expand the index to work properly, even after integrating the rest of
`reset` with sparse index.

Helped-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Victoria Dye <vdye@github.com>
---
 t/t1092-sparse-checkout-compatibility.sh | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
index c7449afe965..cab6340a9d0 100755
--- a/t/t1092-sparse-checkout-compatibility.sh
+++ b/t/t1092-sparse-checkout-compatibility.sh
@@ -634,11 +634,15 @@ test_expect_success 'submodule handling' '
 	grep "160000 commit $(git -C initial-repo rev-parse HEAD)	modules/sub" cache
 '
 
+# When working with a sparse index, some commands will need to expand the
+# index to operate properly. If those commands also write the index back
+# to disk, they need to convert the index to sparse before writing.
+# This test verifies that both of these events are logged in trace2 logs.
 test_expect_success 'sparse-index is expanded and converted back' '
 	init_repos &&
 
 	GIT_TRACE2_EVENT="$(pwd)/trace2.txt" GIT_TRACE2_EVENT_NESTING=10 \
-		git -C sparse-index -c core.fsmonitor="" reset --hard &&
+		git -C sparse-index reset -- folder1/a &&
 	test_region index convert_to_sparse trace2.txt &&
 	test_region index ensure_full_index trace2.txt
 '
--
gitgitgadget
From: Victoria Dye <vdye@github.com> Add new tests for `--merge` and `--keep` modes, as well as mixed reset with pathspecs. New performance test cases exercise various execution paths for `reset`. Co-authored-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Victoria Dye <vdye@github.com> --- t/perf/p2000-sparse-operations.sh | 3 + t/t1092-sparse-checkout-compatibility.sh | 84 ++++++++++++++++++++++++ 2 files changed, 87 insertions(+) diff --git a/t/perf/p2000-sparse-operations.sh b/t/perf/p2000-sparse-operations.sh index 597626276fb..bfd332120c8 100755 --- a/t/perf/p2000-sparse-operations.sh +++ b/t/perf/p2000-sparse-operations.sh @@ -110,5 +110,8 @@ test_perf_on_all git add -A test_perf_on_all git add . test_perf_on_all git commit -a -m A test_perf_on_all git checkout -f - +test_perf_on_all git reset +test_perf_on_all git reset --hard +test_perf_on_all git reset -- does-not-exist test_done diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh index cab6340a9d0..a8583030b38 100755 --- a/t/t1092-sparse-checkout-compatibility.sh +++ b/t/t1092-sparse-checkout-compatibility.sh @@ -475,6 +475,90 @@ test_expect_success 'checkout and reset (mixed)' ' run_on_sparse test_path_is_missing folder1 ' +test_expect_success 'checkout and reset (merge)' ' + init_repos && + + write_script edit-contents <<-\EOF && + echo text >>$1 + EOF + + test_all_match git checkout -b reset-test update-deep && + run_on_all ../edit-contents a && + test_all_match git reset --merge deepest && + test_all_match git status --porcelain=v2 && + + test_all_match git reset --hard update-deep && + run_on_all ../edit-contents deep/a && + test_all_match test_must_fail git reset --merge deepest +' + +test_expect_success 'checkout and reset (keep)' ' + init_repos && + + write_script edit-contents <<-\EOF && + echo text >>$1 + EOF + + test_all_match git checkout -b reset-test update-deep && + run_on_all 
../edit-contents a && + test_all_match git reset --keep deepest && + test_all_match git status --porcelain=v2 && + + test_all_match git reset --hard update-deep && + run_on_all ../edit-contents deep/a && + test_all_match test_must_fail git reset --keep deepest +' + +test_expect_success 'reset with pathspecs inside sparse definition' ' + init_repos && + + write_script edit-contents <<-\EOF && + echo text >>$1 + EOF + + test_all_match git checkout -b reset-test update-deep && + run_on_all ../edit-contents deep/a && + + test_all_match git reset base -- deep/a && + test_all_match git status --porcelain=v2 && + + test_all_match git reset base -- nonexistent-file && + test_all_match git status --porcelain=v2 && + + test_all_match git reset deepest -- deep && + test_all_match git status --porcelain=v2 +' + +# Although the working tree differs between full and sparse checkouts after +# reset, the state of the index is the same. +test_expect_success 'reset with pathspecs outside sparse definition' ' + init_repos && + test_all_match git checkout -b reset-test base && + + test_sparse_match git reset update-folder1 -- folder1 && + git -C full-checkout reset update-folder1 -- folder1 && + test_sparse_match git status --porcelain=v2 && + test_all_match git rev-parse HEAD:folder1 && + + test_sparse_match git reset update-folder2 -- folder2/a && + git -C full-checkout reset update-folder2 -- folder2/a && + test_sparse_match git status --porcelain=v2 && + test_all_match git rev-parse HEAD:folder2/a +' + +test_expect_success 'reset with wildcard pathspec' ' + init_repos && + + test_all_match git checkout -b reset-test update-deep && + test_all_match git reset base -- \*/a && + test_all_match git status --porcelain=v2 && + test_all_match git rev-parse HEAD:folder1/a && + + test_all_match git reset base -- folder\* && + test_all_match git status --porcelain=v2 && + test_all_match git rev-parse HEAD:folder2 +' + test_expect_success 'merge, cherry-pick, and rebase' ' init_repos && -- 
gitgitgadget
From: Victoria Dye <vdye@github.com> Disable `command_requires_full_index` repo setting and add `ensure_full_index` guards around code paths that cannot yet use sparse directory index entries. `reset --soft` does not modify the index, so no compatibility changes are needed for it to function without expanding the index. For all other reset modes (`--mixed`, `--hard`, `--keep`, `--merge`), the full index is expanded to prevent cache tree corruption and invalid variable accesses. Additionally, the `read_cache()` check verifying an uncorrupted index is moved after argument parsing and preparing the repo settings. The index is not used by the preceding argument handling, but `read_cache()` must be run *after* enabling sparse index for the command (so that the index is not expanded unnecessarily) and *before* using the index for reset (so that it is verified as uncorrupted). Signed-off-by: Victoria Dye <vdye@github.com> --- builtin/reset.c | 10 +++++++--- cache-tree.c | 1 + 2 files changed, 8 insertions(+), 3 deletions(-) diff --git a/builtin/reset.c b/builtin/reset.c index e441b6601b9..0ac0de7dc97 100644 --- a/builtin/reset.c +++ b/builtin/reset.c @@ -180,6 +180,7 @@ static int read_from_tree(const struct pathspec *pathspec, opt.flags.override_submodule_config = 1; opt.repo = the_repository; + ensure_full_index(&the_index); if (do_diff_cache(tree_oid, &opt)) return 1; diffcore_std(&opt); @@ -257,9 +258,6 @@ static void parse_args(struct pathspec *pathspec, } *rev_ret = rev; - if (read_cache() < 0) - die(_("index file corrupt")); - parse_pathspec(pathspec, 0, PATHSPEC_PREFER_FULL | (patch_mode ? 
PATHSPEC_PREFIX_ORIGIN : 0), @@ -405,6 +403,12 @@ int cmd_reset(int argc, const char **argv, const char *prefix) if (intent_to_add && reset_type != MIXED) die(_("-N can only be used with --mixed")); + prepare_repo_settings(the_repository); + the_repository->settings.command_requires_full_index = 0; + + if (read_cache() < 0) + die(_("index file corrupt")); + /* Soft reset does not touch the index file nor the working tree * at all, but requires them in a good order. Other resets reset * the index file to the tree object we are switching to. */ diff --git a/cache-tree.c b/cache-tree.c index 90919f9e345..9be19c85b66 100644 --- a/cache-tree.c +++ b/cache-tree.c @@ -776,6 +776,7 @@ void prime_cache_tree(struct repository *r, cache_tree_free(&istate->cache_tree); istate->cache_tree = cache_tree(); + ensure_full_index(istate); prime_cache_tree_rec(r, istate->cache_tree, tree); istate->cache_changed |= CACHE_TREE_CHANGED; trace2_region_leave("cache-tree", "prime_cache_tree", the_repository); -- gitgitgadget
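
The ordering constraint described in this commit message (opt in to the sparse index first, read the index second) can be illustrated with a toy model. This is only a sketch under stated assumptions: `struct toy_repo`, `toy_read_index` and `toy_cmd_reset` are hypothetical stand-ins, and the single rule modelled here is that reading the index expands it whenever `command_requires_full_index` is still set.

```c
#include <stdbool.h>

/* Hypothetical toy model of the repository state; not Git's structs. */
struct toy_repo {
	bool command_requires_full_index; /* the default is "true" */
	bool index_is_sparse_on_disk;
	bool index_loaded;
	bool index_expanded;
};

/*
 * Toy stand-in for reading the index: a sparse index is expanded at
 * read time if the command has not opted in to sparse index support.
 */
void toy_read_index(struct toy_repo *r)
{
	r->index_loaded = true;
	if (r->index_is_sparse_on_disk && r->command_requires_full_index)
		r->index_expanded = true;
}

/*
 * Ordering used by the patch: clear the setting first, then read the
 * index, so that the read does not expand it unnecessarily.
 */
void toy_cmd_reset(struct toy_repo *r)
{
	r->command_requires_full_index = false;
	toy_read_index(r);
}
```

In this model, calling `toy_read_index` before clearing the setting (the old position of the `read_cache()` call, inside argument parsing) leaves `index_expanded` set, which is the unnecessary expansion the patch avoids.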
From: Victoria Dye <vdye@github.com> Remove `ensure_full_index` guard on `prime_cache_tree` and update `prime_cache_tree_rec` to correctly reconstruct sparse directory entries in the cache tree. While processing a tree's entries, `prime_cache_tree_rec` must determine whether a directory entry is sparse or not by searching for it in the index (*without* expanding the index). If a matching sparse directory index entry is found, no subtrees are added to the cache tree entry and the entry count is set to 1 (representing the sparse directory itself). Otherwise, the tree is assumed to not be sparse and its subtrees are recursively added to the cache tree. Helped-by: Elijah Newren <newren@gmail.com> Signed-off-by: Victoria Dye <vdye@github.com> --- cache-tree.c | 47 ++++++++++++++++++++++-- cache.h | 10 +++++ read-cache.c | 27 ++++++++++---- t/t1092-sparse-checkout-compatibility.sh | 15 +++++++- 4 files changed, 86 insertions(+), 13 deletions(-) diff --git a/cache-tree.c b/cache-tree.c index 9be19c85b66..2866101052c 100644 --- a/cache-tree.c +++ b/cache-tree.c @@ -740,15 +740,26 @@ out: return ret; } +static void prime_cache_tree_sparse_dir(struct cache_tree *it, + struct tree *tree) +{ + + oidcpy(&it->oid, &tree->object.oid); + it->entry_count = 1; +} + static void prime_cache_tree_rec(struct repository *r, struct cache_tree *it, - struct tree *tree) + struct tree *tree, + struct strbuf *tree_path) { struct tree_desc desc; struct name_entry entry; int cnt; + int base_path_len = tree_path->len; oidcpy(&it->oid, &tree->object.oid); + init_tree_desc(&desc, tree->buffer, tree->size); cnt = 0; while (tree_entry(&desc, &entry)) { @@ -757,14 +768,40 @@ static void prime_cache_tree_rec(struct repository *r, else { struct cache_tree_sub *sub; struct tree *subtree = lookup_tree(r, &entry.oid); + if (!subtree->object.parsed) parse_tree(subtree); sub = cache_tree_sub(it, entry.path); sub->cache_tree = cache_tree(); - prime_cache_tree_rec(r, sub->cache_tree, subtree); + + /* + * 
Recursively-constructed subtree path is only needed when working
+			 * in a sparse index (where it's used to determine whether the
+			 * subtree is a sparse directory in the index).
+			 */
+			if (r->index->sparse_index) {
+				strbuf_setlen(tree_path, base_path_len);
+				strbuf_grow(tree_path, base_path_len + entry.pathlen + 1);
+				strbuf_add(tree_path, entry.path, entry.pathlen);
+				strbuf_addch(tree_path, '/');
+			}
+
+			/*
+			 * If a sparse index is in use, the directory being processed may be
+			 * sparse. To confirm that, we can check whether an entry with that
+			 * exact name exists in the index. If it does, the created subtree
+			 * should be sparse. Otherwise, cache tree expansion should continue
+			 * as normal.
+			 */
+			if (r->index->sparse_index &&
+			    index_entry_exists(r->index, tree_path->buf, tree_path->len))
+				prime_cache_tree_sparse_dir(sub->cache_tree, subtree);
+			else
+				prime_cache_tree_rec(r, sub->cache_tree, subtree, tree_path);
 			cnt += sub->cache_tree->entry_count;
 		}
 	}
+
 	it->entry_count = cnt;
 }
@@ -772,12 +809,14 @@ void prime_cache_tree(struct repository *r,
 		      struct index_state *istate,
 		      struct tree *tree)
 {
+	struct strbuf tree_path = STRBUF_INIT;
+
 	trace2_region_enter("cache-tree", "prime_cache_tree", the_repository);
 	cache_tree_free(&istate->cache_tree);
 	istate->cache_tree = cache_tree();

-	ensure_full_index(istate);
-	prime_cache_tree_rec(r, istate->cache_tree, tree);
+	prime_cache_tree_rec(r, istate->cache_tree, tree, &tree_path);
+	strbuf_release(&tree_path);
 	istate->cache_changed |= CACHE_TREE_CHANGED;
 	trace2_region_leave("cache-tree", "prime_cache_tree", the_repository);
 }
diff --git a/cache.h b/cache.h
index f6295f3b048..1d3e4665562 100644
--- a/cache.h
+++ b/cache.h
@@ -816,6 +816,16 @@ struct cache_entry *index_file_exists(struct index_state *istate, const char *na
  */
 int index_name_pos(struct index_state *, const char *name, int namelen);

+/*
+ * Determines whether an entry with the given name exists within the
+ * given index. The return value is 1 if an exact match is found, otherwise
+ * it is 0. Note that, unlike index_name_pos, this function does not expand
+ * the index if it is sparse. If an item exists within the full index but it
+ * is contained within a sparse directory (and not in the sparse index), 0 is
+ * returned.
+ */
+int index_entry_exists(struct index_state *, const char *name, int namelen);
+
 /*
  * Some functions return the negative complement of an insert position when a
  * precise match was not found but a position was found where the entry would
diff --git a/read-cache.c b/read-cache.c
index f5d4385c408..c079ece981a 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -68,6 +68,11 @@
  */
 #define CACHE_ENTRY_PATH_LENGTH 80

+enum index_search_mode {
+	NO_EXPAND_SPARSE = 0,
+	EXPAND_SPARSE = 1
+};
+
 static inline struct cache_entry *mem_pool__ce_alloc(struct mem_pool *mem_pool, size_t len)
 {
 	struct cache_entry *ce;
@@ -551,7 +556,10 @@ int cache_name_stage_compare(const char *name1, int len1, int stage1, const char
 	return 0;
 }

-static int index_name_stage_pos(struct index_state *istate, const char *name, int namelen, int stage)
+static int index_name_stage_pos(struct index_state *istate,
+				const char *name, int namelen,
+				int stage,
+				enum index_search_mode search_mode)
 {
 	int first, last;

@@ -570,7 +578,7 @@ static int index_name_stage_pos(struct index_state *istate, const char *name, in
 		first = next+1;
 	}

-	if (istate->sparse_index &&
+	if (search_mode == EXPAND_SPARSE && istate->sparse_index &&
 	    first > 0) {
 		/* Note: first <= istate->cache_nr */
 		struct cache_entry *ce = istate->cache[first - 1];
@@ -586,7 +594,7 @@ static int index_name_stage_pos(struct index_state *istate, const char *name, in
 		    ce_namelen(ce) < namelen &&
 		    !strncmp(name, ce->name, ce_namelen(ce))) {
 			ensure_full_index(istate);
-			return index_name_stage_pos(istate, name, namelen, stage);
+			return index_name_stage_pos(istate, name, namelen, stage, search_mode);
 		}
 	}

@@ -595,7 +603,12 @@ static int index_name_stage_pos(struct index_state *istate, const char *name, in

 int index_name_pos(struct index_state *istate, const char *name, int namelen)
 {
-	return index_name_stage_pos(istate, name, namelen, 0);
+	return index_name_stage_pos(istate, name, namelen, 0, EXPAND_SPARSE);
+}
+
+int index_entry_exists(struct index_state *istate, const char *name, int namelen)
+{
+	return index_name_stage_pos(istate, name, namelen, 0, NO_EXPAND_SPARSE) >= 0;
 }

 int remove_index_entry_at(struct index_state *istate, int pos)
@@ -1222,7 +1235,7 @@ static int has_dir_name(struct index_state *istate,
 			 */
 		}

-		pos = index_name_stage_pos(istate, name, len, stage);
+		pos = index_name_stage_pos(istate, name, len, stage, EXPAND_SPARSE);
 		if (pos >= 0) {
 			/*
 			 * Found one, but not so fast. This could
@@ -1322,7 +1335,7 @@ static int add_index_entry_with_check(struct index_state *istate, struct cache_e
 	    strcmp(ce->name, istate->cache[istate->cache_nr - 1]->name) > 0)
 		pos = index_pos_to_insert_pos(istate->cache_nr);
 	else
-		pos = index_name_stage_pos(istate, ce->name, ce_namelen(ce), ce_stage(ce));
+		pos = index_name_stage_pos(istate, ce->name, ce_namelen(ce), ce_stage(ce), EXPAND_SPARSE);

 	/* existing match? Just replace it. */
 	if (pos >= 0) {
@@ -1357,7 +1370,7 @@ static int add_index_entry_with_check(struct index_state *istate, struct cache_e
 		if (!ok_to_replace)
 			return error(_("'%s' appears as both a file and as a directory"),
 				     ce->name);
-		pos = index_name_stage_pos(istate, ce->name, ce_namelen(ce), ce_stage(ce));
+		pos = index_name_stage_pos(istate, ce->name, ce_namelen(ce), ce_stage(ce), EXPAND_SPARSE);
 		pos = -pos-1;
 	}
 	return pos + 1;
diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
index a8583030b38..5664ff8f039 100755
--- a/t/t1092-sparse-checkout-compatibility.sh
+++ b/t/t1092-sparse-checkout-compatibility.sh
@@ -763,9 +763,9 @@ test_expect_success 'sparse-index is not expanded' '
 	ensure_not_expanded checkout - &&
 	ensure_not_expanded switch rename-out-to-out &&
 	ensure_not_expanded switch - &&
-	git -C sparse-index reset --hard &&
+	ensure_not_expanded reset --hard &&
 	ensure_not_expanded checkout rename-out-to-out -- deep/deeper1 &&
-	git -C sparse-index reset --hard &&
+	ensure_not_expanded reset --hard &&
 	ensure_not_expanded restore -s rename-out-to-out -- deep/deeper1 &&

 	echo >>sparse-index/README.md &&
@@ -775,6 +775,17 @@ test_expect_success 'sparse-index is not expanded' '
 	echo >>sparse-index/untracked.txt &&
 	ensure_not_expanded add . &&

+	for ref in update-deep update-folder1 update-folder2 update-deep
+	do
+		echo >>sparse-index/README.md &&
+		ensure_not_expanded reset --hard $ref || return 1
+	done &&
+
+	ensure_not_expanded reset --hard update-deep &&
+	ensure_not_expanded reset --keep base &&
+	ensure_not_expanded reset --merge update-deep &&
+	ensure_not_expanded reset --hard &&
+
 	ensure_not_expanded checkout -f update-deep &&
 	test_config -C sparse-index pull.twohead ort &&
 	(
-- 
gitgitgadget
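
The difference between the expanding and non-expanding lookups in the
read-cache.c hunks above can be illustrated outside of Git. The following
is a minimal sketch under stated assumptions, not the patch's code: the
toy_* names, the string-array "index", and the convention that a name
ending in '/' models a sparse directory entry are all hypothetical. It
only shows why an exact-match search can answer "does this entry exist?"
without expanding, while a position lookup for a path inside a sparse
directory has to signal expansion.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the index: a sorted array of entry names.
 * A name ending in '/' models a sparse directory entry. */
struct toy_index {
	const char **names;
	int nr;
	int expanded; /* set when a lookup would have expanded the index */
};

enum toy_search_mode { TOY_NO_EXPAND = 0, TOY_EXPAND = 1 };

/* Returns the position on an exact match, else -(insert position) - 1. */
static int toy_name_pos(struct toy_index *idx, const char *name,
			enum toy_search_mode mode)
{
	int first = 0, last;
	size_t len;

	assert(idx && name);
	last = idx->nr;
	len = strlen(name);

	while (first < last) {
		int mid = first + (last - first) / 2;
		int cmp = strcmp(name, idx->names[mid]);

		if (!cmp)
			return mid;
		if (cmp < 0)
			last = mid;
		else
			first = mid + 1;
	}

	if (mode == TOY_EXPAND && first > 0) {
		/* Is the preceding entry a sparse directory containing name? */
		const char *prev = idx->names[first - 1];
		size_t plen = strlen(prev);

		if (plen && prev[plen - 1] == '/' && plen < len &&
		    !strncmp(name, prev, plen))
			idx->expanded = 1; /* the real code expands and retries */
	}
	return -first - 1;
}

/* Exact-match existence check that never triggers expansion. */
static int toy_entry_exists(struct toy_index *idx, const char *name)
{
	return toy_name_pos(idx, name, TOY_NO_EXPAND) >= 0;
}
```

With the entries "README.md", "deep/a", "folder1/" and "folder2/", asking
whether "folder1/" exists succeeds without touching the expansion flag,
and asking for "folder1/a" reports 0, which mirrors the documented
behaviour of index_entry_exists() for paths inside a sparse directory.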
From: Victoria Dye <vdye@github.com>

Remove the `ensure_full_index` guard on `read_from_tree` and update `git
reset --mixed` to ensure it can use sparse directory index entries wherever
possible. Sparse directory entries are reset use `diff_tree_oid`, which
requires `change` and `add_remove` functions to process the internal
contents of the sparse directory. The `recursive` diff option handles cases
in which `reset --mixed` must diff/merge files that are nested multiple
levels deep in a sparse directory.

The use of pathspecs with `git reset --mixed` introduces scenarios in which
internal contents of sparse directories may be matched by the pathspec. In
order to reset *all* files in the repo that may match the pathspec, the
following conditions on the pathspec require index expansion before
performing the reset:

* "magic" pathspecs
* wildcard pathspecs that do not match only in-cone files or entire sparse
  directories
* literal pathspecs matching something outside the sparse checkout
  definition

Helped-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Victoria Dye <vdye@github.com>
---
 builtin/reset.c                          | 78 +++++++++++++++++++++++-
 t/t1092-sparse-checkout-compatibility.sh | 17 ++++++
 2 files changed, 93 insertions(+), 2 deletions(-)

diff --git a/builtin/reset.c b/builtin/reset.c
index 0ac0de7dc97..60517e7e1d6 100644
--- a/builtin/reset.c
+++ b/builtin/reset.c
@@ -148,7 +148,9 @@ static void update_index_from_diff(struct diff_queue_struct *q,
 		 * If the file 1) corresponds to an existing index entry with
 		 * skip-worktree set, or 2) does not exist in the index but is
 		 * outside the sparse checkout definition, add a skip-worktree bit
-		 * to the new index entry.
+		 * to the new index entry. Note that a sparse index will be expanded
+		 * if this entry is outside the sparse cone - this is necessary
+		 * to properly construct the reset sparse directory.
 		 */
 		pos = cache_name_pos(one->path, strlen(one->path));
 		if ((pos >= 0 && ce_skip_worktree(active_cache[pos])) ||
@@ -166,6 +168,73 @@ static void update_index_from_diff(struct diff_queue_struct *q,
 	}
 }

+static int pathspec_needs_expanded_index(const struct pathspec *pathspec)
+{
+	unsigned int i, pos;
+	int res = 0;
+	char *skip_worktree_seen = NULL;
+
+	/*
+	 * When using a magic pathspec, assume for the sake of simplicity that
+	 * the index needs to be expanded to match all matchable files.
+	 */
+	if (pathspec->magic)
+		return 1;
+
+	for (i = 0; i < pathspec->nr; i++) {
+		struct pathspec_item item = pathspec->items[i];
+
+		/*
+		 * If the pathspec item has a wildcard, the index should be expanded
+		 * if the pathspec has the possibility of matching a subset of entries inside
+		 * of a sparse directory (but not the entire directory).
+		 *
+		 * If the pathspec item is a literal path, the index only needs to be expanded
+		 * if a) the pathspec isn't in the sparse checkout cone (to make sure we don't
+		 * expand for in-cone files) and b) it doesn't match any sparse directories
+		 * (since we can reset whole sparse directories without expanding them).
+		 */
+		if (item.nowildcard_len < item.len) {
+			for (pos = 0; pos < active_nr; pos++) {
+				struct cache_entry *ce = active_cache[pos];
+
+				if (!S_ISSPARSEDIR(ce->ce_mode))
+					continue;
+
+				/*
+				 * If the pre-wildcard length is longer than the sparse
+				 * directory name and the sparse directory is the first
+				 * component of the pathspec, need to expand the index.
+				 */
+				if (item.nowildcard_len > ce_namelen(ce) &&
+				    !strncmp(item.original, ce->name, ce_namelen(ce))) {
+					res = 1;
+					break;
+				}
+
+				/*
+				 * If the pre-wildcard length is shorter than the sparse
+				 * directory and the pathspec does not match the whole
+				 * directory, need to expand the index.
+				 */
+				if (!strncmp(item.original, ce->name, item.nowildcard_len) &&
+				    wildmatch(item.original, ce->name, 0)) {
+					res = 1;
+					break;
+				}
+			}
+		} else if (!path_in_cone_mode_sparse_checkout(item.original, &the_index) &&
+			   !matches_skip_worktree(pathspec, i, &skip_worktree_seen))
+			res = 1;
+
+		if (res > 0)
+			break;
+	}
+
+	free(skip_worktree_seen);
+	return res;
+}
+
 static int read_from_tree(const struct pathspec *pathspec,
 			  struct object_id *tree_oid,
 			  int intent_to_add)
@@ -178,9 +247,14 @@ static int read_from_tree(const struct pathspec *pathspec,
 	opt.format_callback = update_index_from_diff;
 	opt.format_callback_data = &intent_to_add;
 	opt.flags.override_submodule_config = 1;
+	opt.flags.recursive = 1;
 	opt.repo = the_repository;
+	opt.change = diff_change;
+	opt.add_remove = diff_addremove;
+
+	if (pathspec->nr && the_index.sparse_index && pathspec_needs_expanded_index(pathspec))
+		ensure_full_index(&the_index);

-	ensure_full_index(&the_index);
 	if (do_diff_cache(tree_oid, &opt))
 		return 1;
 	diffcore_std(&opt);
diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
index 5664ff8f039..44d5e11c762 100755
--- a/t/t1092-sparse-checkout-compatibility.sh
+++ b/t/t1092-sparse-checkout-compatibility.sh
@@ -781,11 +781,28 @@ test_expect_success 'sparse-index is not expanded' '
 		ensure_not_expanded reset --hard $ref || return 1
 	done &&

+	ensure_not_expanded reset --mixed base &&
 	ensure_not_expanded reset --hard update-deep &&
 	ensure_not_expanded reset --keep base &&
 	ensure_not_expanded reset --merge update-deep &&
 	ensure_not_expanded reset --hard &&

+	ensure_not_expanded reset base -- deep/a &&
+	ensure_not_expanded reset base -- nonexistent-file &&
+	ensure_not_expanded reset deepest -- deep &&
+
+	# Although folder1 is outside the sparse definition, it exists as a
+	# directory entry in the index, so the pathspec will not force the
+	# index to be expanded.
+	ensure_not_expanded reset deepest -- folder1 &&
+	ensure_not_expanded reset deepest -- folder1/ &&
+
+	# Wildcard identifies only in-cone files, no index expansion
+	ensure_not_expanded reset deepest -- deep/\* &&
+
+	# Wildcard identifies only full sparse directories, no index expansion
+	ensure_not_expanded reset deepest -- folder\* &&
+
 	ensure_not_expanded checkout -f update-deep &&
 	test_config -C sparse-index pull.twohead ort &&
 	(
-- 
gitgitgadget
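
The wildcard branch of pathspec_needs_expanded_index() can be tried in
isolation. The sketch below is an approximation under stated assumptions:
toy_wildcard_needs_expansion() is a hypothetical helper, the list of
sparse directory names replaces the loop over index entries, POSIX
fnmatch(3) stands in for Git's wildmatch() (both return 0 on a match),
and the pre-wildcard length is computed with strcspn() rather than taken
from a parsed pathspec item.

```c
#include <assert.h>
#include <fnmatch.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: returns 1 if the wildcard pathspec "spec" could
 * match only part of one of the given sparse directories (each name ends
 * in '/'), 0 otherwise.
 */
static int toy_wildcard_needs_expansion(const char *spec,
					const char **sparse_dirs, int nr)
{
	/* Length of the literal prefix before the first glob character. */
	size_t nowildcard_len;
	int i;

	assert(spec && sparse_dirs);
	nowildcard_len = strcspn(spec, "*?[\\");

	for (i = 0; i < nr; i++) {
		const char *dir = sparse_dirs[i];
		size_t dirlen = strlen(dir);

		/* Literal prefix reaches inside the sparse directory. */
		if (nowildcard_len > dirlen && !strncmp(spec, dir, dirlen))
			return 1;

		/*
		 * Literal prefix is compatible with the directory name, but
		 * the pattern does not match the directory as a whole.
		 */
		if (!strncmp(spec, dir, nowildcard_len) &&
		    fnmatch(spec, dir, 0) != 0)
			return 1;
	}
	return 0;
}
```

Given sparse directories "folder1/" and "folder2/", the pathspecs
"deep/*" and "folder*" from the tests above report no expansion, while
"*.c", "folder1/*.c" and "folder1/0/*" report that expansion is needed.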
From: Victoria Dye <vdye@github.com>

To find the first non-unpacked cache entry, `next_cache_entry` iterates
through index, starting at `cache_bottom`. The performance of this in full
indexes is helped by `cache_bottom` advancing with each invocation of
`mark_ce_used` (called by `unpack_index_entry`). However, the presence of
sparse directories can prevent the `cache_bottom` from advancing in a sparse
index case, effectively forcing `next_cache_entry` to search from the
beginning of the index each time it is called.

The `cache_bottom` must be preserved for the sparse index (see 17a1bb570b
(unpack-trees: preserve cache_bottom, 2021-07-14)). Therefore, to retain
the benefit `cache_bottom` provides in non-sparse index cases, a separate
`hint` position indicates the first position `next_cache_entry` should
search, updated each execution with a new position.

Signed-off-by: Victoria Dye <vdye@github.com>
---
 unpack-trees.c | 23 +++++++++++++++++------
 1 file changed, 17 insertions(+), 6 deletions(-)

diff --git a/unpack-trees.c b/unpack-trees.c
index 8ea0a542da8..b94733de6be 100644
--- a/unpack-trees.c
+++ b/unpack-trees.c
@@ -645,17 +645,24 @@ static void mark_ce_used_same_name(struct cache_entry *ce,
 	}
 }

-static struct cache_entry *next_cache_entry(struct unpack_trees_options *o)
+static struct cache_entry *next_cache_entry(struct unpack_trees_options *o, int *hint)
 {
 	const struct index_state *index = o->src_index;
 	int pos = o->cache_bottom;

+	if (*hint > pos)
+		pos = *hint;
+
 	while (pos < index->cache_nr) {
 		struct cache_entry *ce = index->cache[pos];
-		if (!(ce->ce_flags & CE_UNPACKED))
+		if (!(ce->ce_flags & CE_UNPACKED)) {
+			*hint = pos + 1;
 			return ce;
+		}
 		pos++;
 	}
+
+	*hint = pos;
 	return NULL;
 }

@@ -1365,12 +1372,13 @@ static int unpack_callback(int n, unsigned long mask, unsigned long dirmask, str

 	/* Are we supposed to look at the index too? */
 	if (o->merge) {
+		int hint = -1;
 		while (1) {
 			int cmp;
 			struct cache_entry *ce;

 			if (o->diff_index_cached)
-				ce = next_cache_entry(o);
+				ce = next_cache_entry(o, &hint);
 			else
 				ce = find_cache_entry(info, p);

@@ -1690,7 +1698,7 @@ static int verify_absent(const struct cache_entry *,
 int unpack_trees(unsigned len, struct tree_desc *t, struct unpack_trees_options *o)
 {
 	struct repository *repo = the_repository;
-	int i, ret;
+	int i, hint, ret;
 	static struct cache_entry *dfc;
 	struct pattern_list pl;
 	int free_pattern_list = 0;
@@ -1763,13 +1771,15 @@ int unpack_trees(unsigned len, struct tree_desc *t, struct unpack_trees_options
 		info.pathspec = o->pathspec;

 		if (o->prefix) {
+			hint = -1;
+
 			/*
 			 * Unpack existing index entries that sort before the
 			 * prefix the tree is spliced into. Note that o->merge
 			 * is always true in this case.
 			 */
 			while (1) {
-				struct cache_entry *ce = next_cache_entry(o);
+				struct cache_entry *ce = next_cache_entry(o, &hint);
 				if (!ce)
 					break;
 				if (ce_in_traverse_path(ce, &info))
@@ -1790,8 +1800,9 @@ int unpack_trees(unsigned len, struct tree_desc *t, struct unpack_trees_options

 	/* Any left-over entries in the index? */
 	if (o->merge) {
+		hint = -1;
 		while (1) {
-			struct cache_entry *ce = next_cache_entry(o);
+			struct cache_entry *ce = next_cache_entry(o, &hint);
 			if (!ce)
 				break;
 			if (unpack_index_entry(ce, o) < 0)
-- 
gitgitgadget
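
The effect of the `hint` position can be seen in a small model. This is a
sketch under stated assumptions rather than the unpack-trees code: the
toy_* names and the `visited` counter are hypothetical, entries are
reduced to a flags array, and `cache_bottom` is pinned at 0 to imitate
the case where sparse directories keep it from advancing.

```c
#include <assert.h>

#define TOY_UNPACKED 1

struct toy_state {
	int *flags;       /* TOY_UNPACKED marks entries already processed */
	int nr;
	int cache_bottom; /* stays at 0 in this model */
	long visited;     /* entries inspected, for illustration only */
};

/* Returns the position of the next unprocessed entry, or -1 if none. */
static int toy_next_entry(struct toy_state *s, int *hint)
{
	int pos = s->cache_bottom;

	if (*hint > pos)
		pos = *hint;

	while (pos < s->nr) {
		s->visited++;
		if (!(s->flags[pos] & TOY_UNPACKED)) {
			*hint = pos + 1;
			return pos;
		}
		pos++;
	}

	*hint = pos;
	return -1;
}

/*
 * Processes every entry in order and returns how many entries were
 * inspected. With use_hint == 0 the hint is reset before each call,
 * which models the behaviour before the patch.
 */
static long toy_drain(struct toy_state *s, int use_hint)
{
	int hint = -1, pos;

	assert(s && s->nr >= 0);
	s->visited = 0;
	for (;;) {
		if (!use_hint)
			hint = -1;
		pos = toy_next_entry(s, &hint);
		if (pos < 0)
			break;
		s->flags[pos] |= TOY_UNPACKED;
	}
	return s->visited;
}
```

For 100 entries, draining with the hint inspects each entry once, while
draining without it rescans the already-processed prefix on every call,
so the number of inspected entries grows quadratically with the index
size.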
On Thu, Oct 7, 2021 at 2:15 PM Victoria Dye via GitGitGadget
<gitgitgadget@gmail.com> wrote:
>
> From: Victoria Dye <vdye@github.com>
>
> Remove the `ensure_full_index` guard on `read_from_tree` and update `git
> reset --mixed` to ensure it can use sparse directory index entries wherever
> possible. Sparse directory entries are reset use `diff_tree_oid`, which

I am having trouble parsing this second sentence. Was this meant to
be 'Sparse directory entries _which_ are reset use...'?

> requires `change` and `add_remove` functions to process the internal
> contents of the sparse directory. The `recursive` diff option handles cases
> in which `reset --mixed` must diff/merge files that are nested multiple
> levels deep in a sparse directory.
>
> The use of pathspecs with `git reset --mixed` introduces scenarios in which
> internal contents of sparse directories may be matched by the pathspec. In
> order to reset *all* files in the repo that may match the pathspec, the
> following conditions on the pathspec require index expansion before
> performing the reset:
>
> * "magic" pathspecs
> * wildcard pathspecs that do not match only in-cone files or entire sparse
>   directories
> * literal pathspecs matching something outside the sparse checkout
>   definition
>
> Helped-by: Elijah Newren <newren@gmail.com>
> Signed-off-by: Victoria Dye <vdye@github.com>
> ---
>  builtin/reset.c                          | 78 +++++++++++++++++++++++-
>  t/t1092-sparse-checkout-compatibility.sh | 17 ++++++
>  2 files changed, 93 insertions(+), 2 deletions(-)
>
> diff --git a/builtin/reset.c b/builtin/reset.c
> index 0ac0de7dc97..60517e7e1d6 100644
> --- a/builtin/reset.c
> +++ b/builtin/reset.c
> @@ -148,7 +148,9 @@ static void update_index_from_diff(struct diff_queue_struct *q,
>  		 * If the file 1) corresponds to an existing index entry with
>  		 * skip-worktree set, or 2) does not exist in the index but is
>  		 * outside the sparse checkout definition, add a skip-worktree bit
> -		 * to the new index entry.
> +		 * to the new index entry. Note that a sparse index will be expanded
> +		 * if this entry is outside the sparse cone - this is necessary
> +		 * to properly construct the reset sparse directory.
>  		 */
>  		pos = cache_name_pos(one->path, strlen(one->path));
>  		if ((pos >= 0 && ce_skip_worktree(active_cache[pos])) ||
> @@ -166,6 +168,73 @@ static void update_index_from_diff(struct diff_queue_struct *q,
>  	}
>  }
>
> +static int pathspec_needs_expanded_index(const struct pathspec *pathspec)
> +{
> +	unsigned int i, pos;
> +	int res = 0;
> +	char *skip_worktree_seen = NULL;
> +
> +	/*
> +	 * When using a magic pathspec, assume for the sake of simplicity that
> +	 * the index needs to be expanded to match all matchable files.
> +	 */
> +	if (pathspec->magic)
> +		return 1;
> +
> +	for (i = 0; i < pathspec->nr; i++) {
> +		struct pathspec_item item = pathspec->items[i];
> +
> +		/*
> +		 * If the pathspec item has a wildcard, the index should be expanded
> +		 * if the pathspec has the possibility of matching a subset of entries inside
> +		 * of a sparse directory (but not the entire directory).
> +		 *
> +		 * If the pathspec item is a literal path, the index only needs to be expanded
> +		 * if a) the pathspec isn't in the sparse checkout cone (to make sure we don't
> +		 * expand for in-cone files) and b) it doesn't match any sparse directories
> +		 * (since we can reset whole sparse directories without expanding them).
> +		 */
> +		if (item.nowildcard_len < item.len) {
> +			for (pos = 0; pos < active_nr; pos++) {
> +				struct cache_entry *ce = active_cache[pos];
> +
> +				if (!S_ISSPARSEDIR(ce->ce_mode))
> +					continue;

This double loop over all pathspecs and over all index entries reminds
me of the original non-cone mode sparsity patterns. Stolee introduced
cone mode patterns specifically to avoid the expensiveness of such
double loops (cf.
https://lore.kernel.org/git/19d664a5dada87a9a8dcf18d7548582275593f10.1566313865.git.gitgitgadget@gmail.com/).
Can one of the functions he added allow us to avoid this double loop,
or are there complications that don't allow this (e.g. the actually
SKIP_WORKTREE paths don't quite match the requested sparsity paths in
some cases, or here we are faced with just a leading path of multiple
index entries)?

> +				/*
> +				 * If the pre-wildcard length is longer than the sparse
> +				 * directory name and the sparse directory is the first
> +				 * component of the pathspec, need to expand the index.
> +				 */
> +				if (item.nowildcard_len > ce_namelen(ce) &&
> +				    !strncmp(item.original, ce->name, ce_namelen(ce))) {
> +					res = 1;
> +					break;
> +				}
> +
> +				/*
> +				 * If the pre-wildcard length is shorter than the sparse
> +				 * directory and the pathspec does not match the whole
> +				 * directory, need to expand the index.
> +				 */
> +				if (!strncmp(item.original, ce->name, item.nowildcard_len) &&
> +				    wildmatch(item.original, ce->name, 0)) {
> +					res = 1;
> +					break;
> +				}
> +			}
> +		} else if (!path_in_cone_mode_sparse_checkout(item.original, &the_index) &&
> +			   !matches_skip_worktree(pathspec, i, &skip_worktree_seen))

Oh, so you can at least generally avoid the double loop. That's good.
So is this just a case of wildcards are special and there isn't a way,
even in cone-mode, to avoid the double loop?

(Given that I'm so tardy in reviewing this, even if the answer is that
the double loop is avoidable, or if we just don't know, I'd be totally
fine with a 'TODO: consider whether this double loop could be avoided
in cone mode using some kind of variant of
path_in_cone_mode_sparse_checkout()')

> +			res = 1;
> +
> +		if (res > 0)
> +			break;
> +	}
> +
> +	free(skip_worktree_seen);
> +	return res;
> +}
> +
>  static int read_from_tree(const struct pathspec *pathspec,
>  			  struct object_id *tree_oid,
>  			  int intent_to_add)
> @@ -178,9 +247,14 @@ static int read_from_tree(const struct pathspec *pathspec,
>  	opt.format_callback = update_index_from_diff;
>  	opt.format_callback_data = &intent_to_add;
>  	opt.flags.override_submodule_config = 1;
> +	opt.flags.recursive = 1;
>  	opt.repo = the_repository;
> +	opt.change = diff_change;
> +	opt.add_remove = diff_addremove;
> +
> +	if (pathspec->nr && the_index.sparse_index && pathspec_needs_expanded_index(pathspec))
> +		ensure_full_index(&the_index);
>
> -	ensure_full_index(&the_index);
>  	if (do_diff_cache(tree_oid, &opt))
>  		return 1;
>  	diffcore_std(&opt);
> diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
> index 4ac93874cb2..c9343ff5b9c 100755
> --- a/t/t1092-sparse-checkout-compatibility.sh
> +++ b/t/t1092-sparse-checkout-compatibility.sh
> @@ -774,11 +774,28 @@ test_expect_success 'sparse-index is not expanded' '
>  		ensure_not_expanded reset --hard $ref || return 1
>  	done &&
>
> +	ensure_not_expanded reset --mixed base &&
>  	ensure_not_expanded reset --hard update-deep &&
>  	ensure_not_expanded reset --keep base &&
>  	ensure_not_expanded reset --merge update-deep &&
>  	ensure_not_expanded reset --hard &&
>
> +	ensure_not_expanded reset base -- deep/a &&
> +	ensure_not_expanded reset base -- nonexistent-file &&
> +	ensure_not_expanded reset deepest -- deep &&
> +
> +	# Although folder1 is outside the sparse definition, it exists as a
> +	# directory entry in the index, so the pathspec will not force the
> +	# index to be expanded.
> +	ensure_not_expanded reset deepest -- folder1 &&
> +	ensure_not_expanded reset deepest -- folder1/ &&
> +
> +	# Wildcard identifies only in-cone files, no index expansion
> +	ensure_not_expanded reset deepest -- deep/\* &&
> +
> +	# Wildcard identifies only full sparse directories, no index expansion
> +	ensure_not_expanded reset deepest -- folder\* &&
> +
>  	ensure_not_expanded checkout -f update-deep &&
>  	test_config -C sparse-index pull.twohead ort &&
>  	(
> --
> gitgitgadget
Sorry, one more thing...
On Wed, Oct 27, 2021 at 7:39 AM Victoria Dye via GitGitGadget
<gitgitgadget@gmail.com> wrote:
>
> From: Victoria Dye <vdye@github.com>
>
> Remove the `ensure_full_index` guard on `read_from_tree` and update `git
> reset --mixed` to ensure it can use sparse directory index entries wherever
> possible. Sparse directory entries are reset use `diff_tree_oid`, which
> requires `change` and `add_remove` functions to process the internal
> contents of the sparse directory. The `recursive` diff option handles cases
> in which `reset --mixed` must diff/merge files that are nested multiple
> levels deep in a sparse directory.
>
> The use of pathspecs with `git reset --mixed` introduces scenarios in which
> internal contents of sparse directories may be matched by the pathspec. In
> order to reset *all* files in the repo that may match the pathspec, the
> following conditions on the pathspec require index expansion before
> performing the reset:
>
> * "magic" pathspecs
> * wildcard pathspecs that do not match only in-cone files or entire sparse
> directories
> * literal pathspecs matching something outside the sparse checkout
> definition
>
> Helped-by: Elijah Newren <newren@gmail.com>
> Signed-off-by: Victoria Dye <vdye@github.com>
> ---
> builtin/reset.c | 78 +++++++++++++++++++++++-
> t/t1092-sparse-checkout-compatibility.sh | 17 ++++++
> 2 files changed, 93 insertions(+), 2 deletions(-)
>
> diff --git a/builtin/reset.c b/builtin/reset.c
> index 0ac0de7dc97..60517e7e1d6 100644
> --- a/builtin/reset.c
> +++ b/builtin/reset.c
> @@ -148,7 +148,9 @@ static void update_index_from_diff(struct diff_queue_struct *q,
> * If the file 1) corresponds to an existing index entry with
> * skip-worktree set, or 2) does not exist in the index but is
> * outside the sparse checkout definition, add a skip-worktree bit
> - * to the new index entry.
> + * to the new index entry. Note that a sparse index will be expanded
> + * if this entry is outside the sparse cone - this is necessary
> + * to properly construct the reset sparse directory.
> */
> pos = cache_name_pos(one->path, strlen(one->path));
> if ((pos >= 0 && ce_skip_worktree(active_cache[pos])) ||
> @@ -166,6 +168,73 @@ static void update_index_from_diff(struct diff_queue_struct *q,
> }
> }
>
> +static int pathspec_needs_expanded_index(const struct pathspec *pathspec)
> +{
> + unsigned int i, pos;
> + int res = 0;
> + char *skip_worktree_seen = NULL;
> +
> + /*
> + * When using a magic pathspec, assume for the sake of simplicity that
> + * the index needs to be expanded to match all matchable files.
> + */
> + if (pathspec->magic)
> + return 1;
> +
> + for (i = 0; i < pathspec->nr; i++) {
> + struct pathspec_item item = pathspec->items[i];
> +
> + /*
> + * If the pathspec item has a wildcard, the index should be expanded
> + * if the pathspec has the possibility of matching a subset of entries inside
> + * of a sparse directory (but not the entire directory).
> + *
> + * If the pathspec item is a literal path, the index only needs to be expanded
> + * if a) the pathspec isn't in the sparse checkout cone (to make sure we don't
> + * expand for in-cone files) and b) it doesn't match any sparse directories
> + * (since we can reset whole sparse directories without expanding them).
> + */
> + if (item.nowildcard_len < item.len) {
> + for (pos = 0; pos < active_nr; pos++) {
> + struct cache_entry *ce = active_cache[pos];
> +
> + if (!S_ISSPARSEDIR(ce->ce_mode))
> + continue;
> +
> + /*
> + * If the pre-wildcard length is longer than the sparse
> + * directory name and the sparse directory is the first
> + * component of the pathspec, need to expand the index.
> + */
> + if (item.nowildcard_len > ce_namelen(ce) &&
> + !strncmp(item.original, ce->name, ce_namelen(ce))) {
> + res = 1;
> + break;
> + }
> +
> + /*
> + * If the pre-wildcard length is shorter than the sparse
> + * directory and the pathspec does not match the whole
> + * directory, need to expand the index.
> + */
> + if (!strncmp(item.original, ce->name, item.nowildcard_len) &&
> + wildmatch(item.original, ce->name, 0)) {
> + res = 1;
> + break;
> + }
> + }
> + } else if (!path_in_cone_mode_sparse_checkout(item.original, &the_index) &&
> + !matches_skip_worktree(pathspec, i, &skip_worktree_seen))
> + res = 1;
> +
> + if (res > 0)
> + break;
> + }
> +
> + free(skip_worktree_seen);
> + return res;
> +}
> +
> static int read_from_tree(const struct pathspec *pathspec,
> struct object_id *tree_oid,
> int intent_to_add)
> @@ -178,9 +247,14 @@ static int read_from_tree(const struct pathspec *pathspec,
> opt.format_callback = update_index_from_diff;
> opt.format_callback_data = &intent_to_add;
> opt.flags.override_submodule_config = 1;
> + opt.flags.recursive = 1;
> opt.repo = the_repository;
> + opt.change = diff_change;
> + opt.add_remove = diff_addremove;
> +
> + if (pathspec->nr && the_index.sparse_index && pathspec_needs_expanded_index(pathspec))
> + ensure_full_index(&the_index);
>
> - ensure_full_index(&the_index);
> if (do_diff_cache(tree_oid, &opt))
> return 1;
> diffcore_std(&opt);
> diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
> index 5664ff8f039..44d5e11c762 100755
> --- a/t/t1092-sparse-checkout-compatibility.sh
> +++ b/t/t1092-sparse-checkout-compatibility.sh
> @@ -781,11 +781,28 @@ test_expect_success 'sparse-index is not expanded' '
> ensure_not_expanded reset --hard $ref || return 1
> done &&
>
> + ensure_not_expanded reset --mixed base &&
> ensure_not_expanded reset --hard update-deep &&
> ensure_not_expanded reset --keep base &&
> ensure_not_expanded reset --merge update-deep &&
> ensure_not_expanded reset --hard &&
>
> + ensure_not_expanded reset base -- deep/a &&
> + ensure_not_expanded reset base -- nonexistent-file &&
> + ensure_not_expanded reset deepest -- deep &&
> +
> + # Although folder1 is outside the sparse definition, it exists as a
> + # directory entry in the index, so the pathspec will not force the
> + # index to be expanded.
> + ensure_not_expanded reset deepest -- folder1 &&
> + ensure_not_expanded reset deepest -- folder1/ &&
> +
> + # Wildcard identifies only in-cone files, no index expansion
> + ensure_not_expanded reset deepest -- deep/\* &&
> +
> + # Wildcard identifies only full sparse directories, no index expansion
> + ensure_not_expanded reset deepest -- folder\* &&
> +
You've added two testcases where a wildcard results in no index
expansion; should there also be a test where a wildcard results in
index expansion for completeness?
Hi!

On Wed, Oct 27, 2021 at 7:39 AM Victoria Dye via GitGitGadget
<gitgitgadget@gmail.com> wrote:
>
> This series integrates the sparse index with git reset and provides
> miscellaneous fixes and improvements to the command in sparse checkouts.
...
> Changes since V3
> ================
>
>  * Replace git update-index --force-full-index with git reset update-folder1
>    -- folder1/a, remove introduction of new --force-full-index option
>    entirely, and add comment clarifying the intent of sparse-index is
>    expanded and converted back test
>  * Fix authorship on reset: preserve skip-worktree bit in mixed reset
>    (current patch fully replaces original patch, but metadata of the
>    original wasn't properly replaced)
>
>
> Changes since V4
> ================
>
>  * Update t1092 test 'checkout and reset (mixed)' to explicitly verify
>    differences between sparse and full checkouts

I apologize for my tardiness in reviewing your updated series. You
have addressed all my feedback from v2 and things look really good. I
had a couple small questions on patch 7.

As with my previous review, I kinda skipped over the last patch
because I never figured out the cache_bottom stuff. I read it to see
if there were any obvious mistakes to someone unfamiliar with that
mechanism (i.e. me) but didn't see anything. I read over the earlier
patches much more carefully.

Anyway, other than patch 7 -- where I only had a minor nit on the
commit message plus two questions (which might result in no changes),
the series looks good.
Elijah Newren wrote:
> On Thu, Oct 7, 2021 at 2:15 PM Victoria Dye via GitGitGadget
> <gitgitgadget@gmail.com> wrote:
>>
>> From: Victoria Dye <vdye@github.com>
>>
>> Remove the `ensure_full_index` guard on `read_from_tree` and update `git
>> reset --mixed` to ensure it can use sparse directory index entries wherever
>> possible. Sparse directory entries are reset use `diff_tree_oid`, which
>
> I am having trouble parsing this second sentence. Was this meant to
> be 'Sparse directory entries _which_ are reset use...'?
>

It should be "Sparse directory entries are reset _using_ `diff_tree_oid`...".

>> requires `change` and `add_remove` functions to process the internal
>> contents of the sparse directory. The `recursive` diff option handles cases
>> in which `reset --mixed` must diff/merge files that are nested multiple
>> levels deep in a sparse directory.
>>
>> The use of pathspecs with `git reset --mixed` introduces scenarios in which
>> internal contents of sparse directories may be matched by the pathspec. In
>> order to reset *all* files in the repo that may match the pathspec, the
>> following conditions on the pathspec require index expansion before
>> performing the reset:
>>
>> * "magic" pathspecs
>> * wildcard pathspecs that do not match only in-cone files or entire sparse
>>   directories
>> * literal pathspecs matching something outside the sparse checkout
>>   definition
>>
>> Helped-by: Elijah Newren <newren@gmail.com>
>> Signed-off-by: Victoria Dye <vdye@github.com>
>> ---
>>  builtin/reset.c                          | 78 +++++++++++++++++++++++-
>>  t/t1092-sparse-checkout-compatibility.sh | 17 ++++++
>>  2 files changed, 93 insertions(+), 2 deletions(-)
>>
>> diff --git a/builtin/reset.c b/builtin/reset.c
>> index 0ac0de7dc97..60517e7e1d6 100644
>> --- a/builtin/reset.c
>> +++ b/builtin/reset.c
>> @@ -148,7 +148,9 @@ static void update_index_from_diff(struct diff_queue_struct *q,
>>  		 * If the file 1) corresponds to an existing index entry with
>>  		 * skip-worktree set, or 2) does not exist in the index but is
>>  		 * outside the sparse checkout definition, add a skip-worktree bit
>> -		 * to the new index entry.
>> +		 * to the new index entry. Note that a sparse index will be expanded
>> +		 * if this entry is outside the sparse cone - this is necessary
>> +		 * to properly construct the reset sparse directory.
>>  		 */
>>  		pos = cache_name_pos(one->path, strlen(one->path));
>>  		if ((pos >= 0 && ce_skip_worktree(active_cache[pos])) ||
>> @@ -166,6 +168,73 @@ static void update_index_from_diff(struct diff_queue_struct *q,
>>  	}
>>  }
>>
>> +static int pathspec_needs_expanded_index(const struct pathspec *pathspec)
>> +{
>> +	unsigned int i, pos;
>> +	int res = 0;
>> +	char *skip_worktree_seen = NULL;
>> +
>> +	/*
>> +	 * When using a magic pathspec, assume for the sake of simplicity that
>> +	 * the index needs to be expanded to match all matchable files.
>> +	 */
>> +	if (pathspec->magic)
>> +		return 1;
>> +
>> +	for (i = 0; i < pathspec->nr; i++) {
>> +		struct pathspec_item item = pathspec->items[i];
>> +
>> +		/*
>> +		 * If the pathspec item has a wildcard, the index should be expanded
>> +		 * if the pathspec has the possibility of matching a subset of entries inside
>> +		 * of a sparse directory (but not the entire directory).
>> +		 *
>> +		 * If the pathspec item is a literal path, the index only needs to be expanded
>> +		 * if a) the pathspec isn't in the sparse checkout cone (to make sure we don't
>> +		 * expand for in-cone files) and b) it doesn't match any sparse directories
>> +		 * (since we can reset whole sparse directories without expanding them).
>> +		 */
>> +		if (item.nowildcard_len < item.len) {
>> +			for (pos = 0; pos < active_nr; pos++) {
>> +				struct cache_entry *ce = active_cache[pos];
>> +
>> +				if (!S_ISSPARSEDIR(ce->ce_mode))
>> +					continue;
>
> This double loop over all pathspecs and over all index entries reminds
> me of the original non-cone mode sparsity patterns. Stolee introduced
> cone mode patterns specifically to avoid the expensiveness of such
> double loops (cf.
> https://lore.kernel.org/git/19d664a5dada87a9a8dcf18d7548582275593f10.1566313865.git.gitgitgadget@gmail.com/).
> Can one of the functions he added allow us to avoid this double loop,
> or are there complications that don't allow this (e.g. the actually
> SKIP_WORKTREE paths don't quite match the requested sparsity paths in
> some cases, or here we are faced with just a leading path of multiple
> index entries)?
>
>> +				/*
>> +				 * If the pre-wildcard length is longer than the sparse
>> +				 * directory name and the sparse directory is the first
>> +				 * component of the pathspec, need to expand the index.
>> +				 */
>> +				if (item.nowildcard_len > ce_namelen(ce) &&
>> +				    !strncmp(item.original, ce->name, ce_namelen(ce))) {
>> +					res = 1;
>> +					break;
>> +				}
>> +
>> +				/*
>> +				 * If the pre-wildcard length is shorter than the sparse
>> +				 * directory and the pathspec does not match the whole
>> +				 * directory, need to expand the index.
>> +				 */
>> +				if (!strncmp(item.original, ce->name, item.nowildcard_len) &&
>> +				    wildmatch(item.original, ce->name, 0)) {
>> +					res = 1;
>> +					break;
>> +				}
>> +			}
>> +		} else if (!path_in_cone_mode_sparse_checkout(item.original, &the_index) &&
>> +			   !matches_skip_worktree(pathspec, i, &skip_worktree_seen))
>
> Oh, so you can at least generally avoid the double loop. That's good.
> So is this just a case of wildcards are special and there isn't a way,
> even in cone-mode, to avoid the double loop?
>

The wildcard pathspecs are difficult to handle as "cleanly" as the
non-wildcard pathspecs, since the condition for expanding the index is
whether the pathspec has the potential to match some, but not all, contents
of any given sparse directory. Luckily, the double loop is constrained to
only sparse directories in the index and wildcard pathspecs, and exits as
soon as there is any indication that the index needs to be expanded.

> (Given that I'm so tardy in reviewing this, even if the answer is that
> the double loop is avoidable, or if we just don't know, I'd be totally
> fine with a 'TODO: consider whether this double loop could be avoided
> in cone mode using some kind of variant of
> path_in_cone_mode_sparse_checkout()')
>

I had originally tried something like this, but even if a wildcard pathspec
is in-cone, it could match something outside of it. For example, '*.c' is
in-cone, but has the potential to match only some of the files in a sparse
directory containing both *.c and *.h files.

Looking at it again, though, if you constrain
`path_in_cone_mode_sparse_checkout()` pathspecs further and require that the
only post-`nowildcard_len` characters are '*', there's no risk of matching
partial subsets of files in sparse directories. I'll add that in a re-roll.

>> +			res = 1;
>> +
>> +		if (res > 0)
>> +			break;
>> +	}
>> +
>> +	free(skip_worktree_seen);
>> +	return res;
>> +}
>> +
>>  static int read_from_tree(const struct pathspec *pathspec,
>>  			  struct object_id *tree_oid,
>>  			  int intent_to_add)
>> @@ -178,9 +247,14 @@ static int read_from_tree(const struct pathspec *pathspec,
>>  	opt.format_callback = update_index_from_diff;
>>  	opt.format_callback_data = &intent_to_add;
>>  	opt.flags.override_submodule_config = 1;
>> +	opt.flags.recursive = 1;
>>  	opt.repo = the_repository;
>> +	opt.change = diff_change;
>> +	opt.add_remove = diff_addremove;
>> +
>> +	if (pathspec->nr && the_index.sparse_index && pathspec_needs_expanded_index(pathspec))
>> +		ensure_full_index(&the_index);
>>
>> -	ensure_full_index(&the_index);
>>  	if (do_diff_cache(tree_oid, &opt))
>>  		return 1;
>>  	diffcore_std(&opt);
>> diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
>> index 4ac93874cb2..c9343ff5b9c 100755
>> --- a/t/t1092-sparse-checkout-compatibility.sh
>> +++ b/t/t1092-sparse-checkout-compatibility.sh
>> @@ -774,11 +774,28 @@ test_expect_success 'sparse-index is not expanded' '
>>  		ensure_not_expanded reset --hard $ref || return 1
>>  	done &&
>>
>> +	ensure_not_expanded reset --mixed base &&
>>  	ensure_not_expanded reset --hard update-deep &&
>>  	ensure_not_expanded reset --keep base &&
>>  	ensure_not_expanded reset --merge update-deep &&
>>  	ensure_not_expanded reset --hard &&
>>
>> +	ensure_not_expanded reset base -- deep/a &&
>> +	ensure_not_expanded reset base -- nonexistent-file &&
>> +	ensure_not_expanded reset deepest -- deep &&
>> +
>> +	# Although folder1 is outside the sparse definition, it exists as a
>> +	# directory entry in the
index, so the pathspec will not force the >> + # index to be expanded. >> + ensure_not_expanded reset deepest -- folder1 && >> + ensure_not_expanded reset deepest -- folder1/ && >> + >> + # Wildcard identifies only in-cone files, no index expansion >> + ensure_not_expanded reset deepest -- deep/\* && >> + >> + # Wildcard identifies only full sparse directories, no index expansion >> + ensure_not_expanded reset deepest -- folder\* && >> + >> ensure_not_expanded checkout -f update-deep && >> test_config -C sparse-index pull.twohead ort && >> ( >> -- >> gitgitgadget
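
The "only trailing '*'" refinement mentioned in the reply above can be
illustrated in isolation. This is a minimal sketch rather than code from the
series: `only_trailing_stars()` is a hypothetical helper name, and it assumes
the caller has separately confirmed that the literal prefix lies inside the
sparse-checkout cone (the `path_in_cone_mode_sparse_checkout()` half of the
condition is deliberately left out).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not part of git): report whether every character
 * after the literal prefix of a pathspec is '*'.
 *
 * pattern        - the pathspec as typed, e.g. "deep/*"
 * nowildcard_len - length of the leading part that contains no wildcard
 *
 * Returns 1 when the wildcard tail consists solely of '*', 0 otherwise
 * (including when there is no wildcard tail at all).
 */
int only_trailing_stars(const char *pattern, size_t nowildcard_len)
{
	size_t len;

	assert(pattern != NULL);
	len = strlen(pattern);

	/* No wildcard portion: this is a literal pathspec. */
	if (nowildcard_len >= len)
		return 0;

	/* strspn() counts the leading run of '*' in the tail. */
	return strspn(pattern + nowildcard_len, "*") == len - nowildcard_len;
}
```

Under these assumptions, an in-cone pattern such as 'deep/*' would return 1
and could skip the loop over sparse directory entries, while 'deep/*.c'
returns 0 and still needs the per-directory checks.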
Elijah Newren wrote:
> Sorry, one more thing...
>
> On Wed, Oct 27, 2021 at 7:39 AM Victoria Dye via GitGitGadget
> <gitgitgadget@gmail.com> wrote:
>>
>> From: Victoria Dye <vdye@github.com>
>>
>> Remove the `ensure_full_index` guard on `read_from_tree` and update `git
>> reset --mixed` to ensure it can use sparse directory index entries wherever
>> possible. Sparse directory entries are reset use `diff_tree_oid`, which
>> requires `change` and `add_remove` functions to process the internal
>> contents of the sparse directory. The `recursive` diff option handles cases
>> in which `reset --mixed` must diff/merge files that are nested multiple
>> levels deep in a sparse directory.
>>
>> The use of pathspecs with `git reset --mixed` introduces scenarios in which
>> internal contents of sparse directories may be matched by the pathspec. In
>> order to reset *all* files in the repo that may match the pathspec, the
>> following conditions on the pathspec require index expansion before
>> performing the reset:
>>
>> * "magic" pathspecs
>> * wildcard pathspecs that do not match only in-cone files or entire sparse
>> directories
>> * literal pathspecs matching something outside the sparse checkout
>> definition
>>
>> Helped-by: Elijah Newren <newren@gmail.com>
>> Signed-off-by: Victoria Dye <vdye@github.com>
>> ---
>> builtin/reset.c | 78 +++++++++++++++++++++++-
>> t/t1092-sparse-checkout-compatibility.sh | 17 ++++++
>> 2 files changed, 93 insertions(+), 2 deletions(-)
>>
>> diff --git a/builtin/reset.c b/builtin/reset.c
>> index 0ac0de7dc97..60517e7e1d6 100644
>> --- a/builtin/reset.c
>> +++ b/builtin/reset.c
>> @@ -148,7 +148,9 @@ static void update_index_from_diff(struct diff_queue_struct *q,
>> * If the file 1) corresponds to an existing index entry with
>> * skip-worktree set, or 2) does not exist in the index but is
>> * outside the sparse checkout definition, add a skip-worktree bit
>> - * to the new index entry.
>> + * to the new index entry. Note that a sparse index will be expanded
>> + * if this entry is outside the sparse cone - this is necessary
>> + * to properly construct the reset sparse directory.
>> */
>> pos = cache_name_pos(one->path, strlen(one->path));
>> if ((pos >= 0 && ce_skip_worktree(active_cache[pos])) ||
>> @@ -166,6 +168,73 @@ static void update_index_from_diff(struct diff_queue_struct *q,
>> }
>> }
>>
>> +static int pathspec_needs_expanded_index(const struct pathspec *pathspec)
>> +{
>> + unsigned int i, pos;
>> + int res = 0;
>> + char *skip_worktree_seen = NULL;
>> +
>> + /*
>> + * When using a magic pathspec, assume for the sake of simplicity that
>> + * the index needs to be expanded to match all matchable files.
>> + */
>> + if (pathspec->magic)
>> + return 1;
>> +
>> + for (i = 0; i < pathspec->nr; i++) {
>> + struct pathspec_item item = pathspec->items[i];
>> +
>> + /*
>> + * If the pathspec item has a wildcard, the index should be expanded
>> + * if the pathspec has the possibility of matching a subset of entries inside
>> + * of a sparse directory (but not the entire directory).
>> + *
>> + * If the pathspec item is a literal path, the index only needs to be expanded
>> + * if a) the pathspec isn't in the sparse checkout cone (to make sure we don't
>> + * expand for in-cone files) and b) it doesn't match any sparse directories
>> + * (since we can reset whole sparse directories without expanding them).
>> + */
>> + if (item.nowildcard_len < item.len) {
>> + for (pos = 0; pos < active_nr; pos++) {
>> + struct cache_entry *ce = active_cache[pos];
>> +
>> + if (!S_ISSPARSEDIR(ce->ce_mode))
>> + continue;
>> +
>> + /*
>> + * If the pre-wildcard length is longer than the sparse
>> + * directory name and the sparse directory is the first
>> + * component of the pathspec, need to expand the index.
>> + */
>> + if (item.nowildcard_len > ce_namelen(ce) &&
>> + !strncmp(item.original, ce->name, ce_namelen(ce))) {
>> + res = 1;
>> + break;
>> + }
>> +
>> + /*
>> + * If the pre-wildcard length is shorter than the sparse
>> + * directory and the pathspec does not match the whole
>> + * directory, need to expand the index.
>> + */
>> + if (!strncmp(item.original, ce->name, item.nowildcard_len) &&
>> + wildmatch(item.original, ce->name, 0)) {
>> + res = 1;
>> + break;
>> + }
>> + }
>> + } else if (!path_in_cone_mode_sparse_checkout(item.original, &the_index) &&
>> + !matches_skip_worktree(pathspec, i, &skip_worktree_seen))
>> + res = 1;
>> +
>> + if (res > 0)
>> + break;
>> + }
>> +
>> + free(skip_worktree_seen);
>> + return res;
>> +}
>> +
>> static int read_from_tree(const struct pathspec *pathspec,
>> struct object_id *tree_oid,
>> int intent_to_add)
>> @@ -178,9 +247,14 @@ static int read_from_tree(const struct pathspec *pathspec,
>> opt.format_callback = update_index_from_diff;
>> opt.format_callback_data = &intent_to_add;
>> opt.flags.override_submodule_config = 1;
>> + opt.flags.recursive = 1;
>> opt.repo = the_repository;
>> + opt.change = diff_change;
>> + opt.add_remove = diff_addremove;
>> +
>> + if (pathspec->nr && the_index.sparse_index && pathspec_needs_expanded_index(pathspec))
>> + ensure_full_index(&the_index);
>>
>> - ensure_full_index(&the_index);
>> if (do_diff_cache(tree_oid, &opt))
>> return 1;
>> diffcore_std(&opt);
>> diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
>> index 5664ff8f039..44d5e11c762 100755
>> --- a/t/t1092-sparse-checkout-compatibility.sh
>> +++ b/t/t1092-sparse-checkout-compatibility.sh
>> @@ -781,11 +781,28 @@ test_expect_success 'sparse-index is not expanded' '
>> ensure_not_expanded reset --hard $ref || return 1
>> done &&
>>
>> + ensure_not_expanded reset --mixed base &&
>> ensure_not_expanded reset --hard update-deep &&
>> ensure_not_expanded reset --keep base &&
>> ensure_not_expanded reset --merge update-deep &&
>> ensure_not_expanded reset --hard &&
>>
>> + ensure_not_expanded reset base -- deep/a &&
>> + ensure_not_expanded reset base -- nonexistent-file &&
>> + ensure_not_expanded reset deepest -- deep &&
>> +
>> + # Although folder1 is outside the sparse definition, it exists as a
>> + # directory entry in the index, so the pathspec will not force the
>> + # index to be expanded.
>> + ensure_not_expanded reset deepest -- folder1 &&
>> + ensure_not_expanded reset deepest -- folder1/ &&
>> +
>> + # Wildcard identifies only in-cone files, no index expansion
>> + ensure_not_expanded reset deepest -- deep/\* &&
>> +
>> + # Wildcard identifies only full sparse directories, no index expansion
>> + ensure_not_expanded reset deepest -- folder\* &&
>> +
>
> You've added two testcases where a wildcard results in no index
> expansion; should there also be a test where a wildcard results in
> index expansion for completeness?
>
The tests haven't verified when the index *is* expanded for any of the
commands implemented so far (save for the one ensuring that, when the index
is expanded, the expansion is logged via trace2). I see the value in it
(e.g. using the tests to demonstrate what can trigger index expansion).
Conversely, if the tests are intended to confirm "successful" sparse index
support (where index expansion is equivalent to "the sparse index is not
supported"), then verifying index expansion doesn't necessarily fit that
purpose.
I'm not sure which interpretation is "correct", but if it does make sense to
test expansion cases I'm happy to add them here (or, if not in this series,
add a TODO to include them in the future).
For what it's worth, the test 'reset with wildcard pathspec' in [4/8] is
intended to cover a broader set of wildcard scenarios (verifying
correctness, rather than index expansion). Given the other updates I intend
to make to wildcard handling, I'm planning on adding cases to that test in
my next re-roll.
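
For anyone following the wildcard discussion, the two per-directory checks
in the quoted `pathspec_needs_expanded_index()` hunk can be tried outside of
git. The sketch below is an approximation under stated assumptions:
`wildcard_may_split_sparse_dir()` is a hypothetical name, POSIX `fnmatch()`
stands in for git's internal `wildmatch()` (both return 0 on a match), and
the sparse directory name is passed with its trailing slash, as in
"folder1/".

```c
#include <assert.h>
#include <fnmatch.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical stand-alone model of the per-sparse-directory checks.
 * Returns 1 if the wildcard pathspec could match only part of the given
 * sparse directory (so the index would need expanding), 0 otherwise.
 */
int wildcard_may_split_sparse_dir(const char *pattern,
				  size_t nowildcard_len,
				  const char *sparse_dir)
{
	size_t dir_len;

	assert(pattern != NULL && sparse_dir != NULL);
	dir_len = strlen(sparse_dir);

	/*
	 * The literal prefix is longer than the directory name and begins
	 * with it: the pattern reaches inside the sparse directory.
	 */
	if (nowildcard_len > dir_len &&
	    !strncmp(pattern, sparse_dir, dir_len))
		return 1;

	/*
	 * The literal prefix agrees with the start of the directory name,
	 * but the pattern does not match the directory name as a whole.
	 * fnmatch() is an assumption standing in for wildmatch().
	 */
	if (!strncmp(pattern, sparse_dir, nowildcard_len) &&
	    fnmatch(pattern, sparse_dir, 0) != 0)
		return 1;

	return 0;
}
```

With "folder1/" as the sparse directory, 'folder*' matches the whole entry
and yields 0, whereas '*.c' and 'folder1/*.c' yield 1. That lines up with
the expansion and no-expansion cases discussed in this thread.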
This series integrates the sparse index with git reset and provides
miscellaneous fixes and improvements to the command in sparse checkouts.
This includes:

 1. tests added to t1092 and p2000 to establish the baseline functionality of
    the command
 2. repository settings to enable the sparse index with ensure_full_index
    guarding any code paths that break tests without other compatibility
    updates.
 3. modifications to remove or reduce the scope in which ensure_full_index
    must be called.

The sparse index updates are predicated on a fix originating from the
microsoft/git fork [1], correcting how git reset --mixed handles resetting
entries outside the sparse checkout definition. Additionally, a performance
"bug" in next_cache_entry with sparse index is corrected, preventing
repeatedly looping over already-searched entries.

The p2000 tests demonstrate a ~70% execution time reduction in git reset
using a sparse index, and no change (within expected variability [2]) using a
full index. Results summarized below [3, 4]:

Test                              base              [5/8]
-----------------------------------------------------------------------
git reset --hard (full-v3)        1.00(0.50+0.39)   0.97(0.50+0.37) -3.0%
git reset --hard (full-v4)        1.00(0.51+0.38)   0.96(0.50+0.36) -4.0%
git reset --hard (sparse-v3)      1.68(1.17+0.39)   1.37(0.91+0.35) -18.5%
git reset --hard (sparse-v4)      1.70(1.18+0.40)   1.41(0.94+0.35) -17.1%

Test                              base              [6/8]
-----------------------------------------------------------------------
git reset --hard (full-v3)        1.00(0.50+0.39)   0.94(0.48+0.34) -6.0%
git reset --hard (full-v4)        1.00(0.51+0.38)   0.95(0.51+0.34) -5.0%
git reset --hard (sparse-v3)      1.68(1.17+0.39)   0.46(0.05+0.29) -72.6%
git reset --hard (sparse-v4)      1.70(1.18+0.40)   0.46(0.06+0.29) -72.9%

Test                              base              [7/8]
---------------------------------------------------------------------------
git reset (full-v3)               0.77(0.27+0.37)   0.72(0.26+0.32) -6.5%
git reset (full-v4)               0.75(0.27+0.34)   0.73(0.26+0.32) -2.7%
git reset (sparse-v3)             1.44(0.96+0.36)   0.43(0.04+0.96) -70.1%
git reset (sparse-v4)             1.46(0.97+0.36)   0.43(0.05+0.79) -70.5%
git reset -- missing (full-v3)    0.72(0.26+0.32)   0.69(0.26+0.30) -4.2%
git reset -- missing (full-v4)    0.74(0.28+0.33)   0.71(0.27+0.32) -4.1%
git reset -- missing (sparse-v3)  1.45(0.97+0.35)   0.81(0.42+0.90) -44.1%
git reset -- missing (sparse-v4)  1.41(0.94+0.34)   0.79(0.42+0.76) -44.0%

Test                              base              [8/8]
---------------------------------------------------------------------------
git reset -- missing (full-v3)    0.72(0.26+0.32)   0.73(0.26+0.33) +1.4%
git reset -- missing (full-v4)    0.74(0.28+0.33)   0.74(0.27+0.32) +0.0%
git reset -- missing (sparse-v3)  1.45(0.97+0.35)   0.43(0.05+0.80) -70.3%
git reset -- missing (sparse-v4)  1.41(0.94+0.34)   0.44(0.05+0.76) -68.8%

Changes since V1
================

 * Add --force-full-index option to update-index. The option is used to
   circumvent changing command_requires_full_index from its default value -
   right now this is effectively a no-op, but will change once update-index
   is integrated with sparse index. By using this option in the t1092
   expand/collapse test, the command used to test will not need to be
   updated with subsequent sparse index integrations.
 * Update implementation of mixed reset for entries outside sparse checkout
   definition. The condition in which a file should be checked out before
   index reset is simplified to "if it has skip-worktree enabled and a reset
   would change the file, check it out".
 * After checking the behavior of update_index_from_diff with renames, found
   that the diff used by reset does not produce diff queue entries with
   different pathnames for one and two. Because of this, and because nothing
   in the implementation seems to rely on identical path names, no BUG check
   is added.
 * Correct a bug in the sparse index is not expanded tests in t1092 where
   failure of a git reset --mixed test was not being reported. Test now
   verifies an appropriate scenario with corrected failure-checking.

Changes since V2
================

 * Replace patch adding checkouts for git reset --mixed with sparse checkout
   with preserving the skip-worktree flag (including a new test for git
   reset --mixed and update to t1092 - checkout and reset (mixed))
 * Move rename of is_missing into its own patch
 * Further extend t1092 tests and remove unnecessary commands/tests where
   possible
 * Refine logic determining which pathspecs require ensure_full_index in git
   reset --mixed, add related ensure_not_expanded tests
 * Add index_search_mode enum to index_name_stage_pos
 * Clean up variable usage & remove unnecessary subtree_path in
   prime_cache_tree_rec
 * Update cover letter performance data
 * More thoroughly explain changes in each commit message

Changes since V3
================

 * Replace git update-index --force-full-index with git reset update-folder1
   -- folder1/a, remove introduction of new --force-full-index option
   entirely, and add comment clarifying the intent of sparse-index is
   expanded and converted back test
 * Fix authorship on reset: preserve skip-worktree bit in mixed reset
   (current patch fully replaces original patch, but metadata of the
   original wasn't properly replaced)

Changes since V4
================

 * Update t1092 test 'checkout and reset (mixed)' to explicitly verify
   differences between sparse and full checkouts

Changes since V5
================

 * Update t1092 test 'reset with wildcard pathspec' with more cases and
   better checks
 * Add "special case" wildcard pathspec check when determining whether to
   expand the index (avoids double-loop over pathspecs & index entries)

Thanks!
-Victoria

[1] microsoft@6b8a074
[2] https://lore.kernel.org/git/8b9fe3f8-f0e3-4567-b20b-17c92bd1a5c5@github.com/
[3] If a test and/or commit is not mentioned, there is no significant change
    to performance
[4] Pathspec "does-not-exist" is changed to "missing" to save space in
    performance report

Victoria Dye (8):
  reset: rename is_missing to !is_in_reset_tree
  reset: preserve skip-worktree bit in mixed reset
  sparse-index: update command for expand/collapse test
  reset: expand test coverage for sparse checkouts
  reset: integrate with sparse index
  reset: make sparse-aware (except --mixed)
  reset: make --mixed sparse-aware
  unpack-trees: improve performance of next_cache_entry

 builtin/reset.c                          | 113 ++++++++++++++++-
 cache-tree.c                             |  46 ++++++-
 cache.h                                  |  10 ++
 read-cache.c                             |  27 ++--
 t/perf/p2000-sparse-operations.sh        |   3 +
 t/t1092-sparse-checkout-compatibility.sh | 154 ++++++++++++++++++++---
 t/t7102-reset.sh                         |  17 +++
 unpack-trees.c                           |  23 +++-
 8 files changed, 356 insertions(+), 37 deletions(-)

base-commit: cefe983a320c03d7843ac78e73bd513a27806845
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1048%2Fvdye%2Fvdye%2Fsparse-index-part1-v6
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1048/vdye/vdye/sparse-index-part1-v6
Pull-Request: https://github.com/gitgitgadget/git/pull/1048

Range-diff vs v5:

 1: ad7013a31aa = 1: ad7013a31aa reset: rename is_missing to !is_in_reset_tree
 2: b221b00b7e0 = 2: b221b00b7e0 reset: preserve skip-worktree bit in mixed reset
 3: 1bb2ca92c60 = 3: 1bb2ca92c60 sparse-index: update command for expand/collapse test
 4: cc76c694647 ! 4: 741a2c9ffaa reset: expand test coverage for sparse checkouts
    @@ t/t1092-sparse-checkout-compatibility.sh: test_expect_success 'checkout and rese
     +test_expect_success 'reset with wildcard pathspec' '
     + init_repos &&
     +
    -+ test_all_match git checkout -b reset-test update-deep &&
    -+ test_all_match git reset base -- \*/a &&
    -+ test_all_match git status --porcelain=v2 &&
    -+ test_all_match git rev-parse HEAD:folder1/a &&
    ++ test_all_match git reset update-deep -- deep\* &&
    ++ test_all_match git ls-files -s -- deep &&
     +
    -+ test_all_match git reset base -- folder\* &&
    -+ test_all_match git status --porcelain=v2 &&
    -+ test_all_match git rev-parse HEAD:folder2
    ++ test_all_match git reset deepest -- deep\*\*\* &&
    ++ test_all_match git ls-files -s -- deep &&
    ++
    ++ # The following `git reset`s result in updating the index on files with
    ++ # `skip-worktree` enabled. To avoid failing due to discrepancies in reported
    ++ # "modified" files, `test_sparse_match` reset is performed separately from
    ++ # "full-checkout" reset, then the index contents of all repos are verified.
    ++
    ++ test_sparse_match git reset update-folder1 -- \*/a &&
    ++ git -C full-checkout reset update-folder1 -- \*/a &&
    ++ test_all_match git ls-files -s -- deep/a folder1/a &&
    ++
    ++ test_sparse_match git reset update-folder2 -- folder\* &&
    ++ git -C full-checkout reset update-folder2 -- folder\* &&
    ++ test_all_match git ls-files -s -- folder10 folder1 folder2 &&
    ++
    ++ test_sparse_match git reset base -- folder1/\* &&
    ++ git -C full-checkout reset base -- folder1/\* &&
    ++ test_all_match git ls-files -s -- folder1
     +'
     +
      test_expect_success 'merge, cherry-pick, and rebase' '
 5: 217ae445418 = 5: 65b0eafd27c reset: integrate with sparse index
 6: a3e2fd59867 = 6: 908c84005b9 reset: make sparse-aware (except --mixed)
 7: a9135a5ed64 ! 7: 822d7344587 reset: make --mixed sparse-aware
    @@ Commit message
         Remove the `ensure_full_index` guard on `read_from_tree` and update `git
         reset --mixed` to ensure it can use sparse directory index entries wherever
    -    possible. Sparse directory entries are reset use `diff_tree_oid`, which
    +    possible. Sparse directory entries are reset using `diff_tree_oid`, which
         requires `change` and `add_remove` functions to process the internal
         contents of the sparse directory. The `recursive` diff option handles cases
         in which `reset --mixed` must diff/merge files that are nested multiple
    @@ builtin/reset.c: static void update_index_from_diff(struct diff_queue_struct *q,
     + * (since we can reset whole sparse directories without expanding them).
     + */
     + if (item.nowildcard_len < item.len) {
    ++ /*
    ++ * Special case: if the pattern is a path inside the cone
    ++ * followed by only wildcards, the pattern cannot match
    ++ * partial sparse directories, so we don't expand the index.
    ++ */
    ++ if (path_in_cone_mode_sparse_checkout(item.original, &the_index) &&
    ++ strspn(item.original + item.nowildcard_len, "*") == item.len - item.nowildcard_len)
    ++ continue;
    ++
     + for (pos = 0; pos < active_nr; pos++) {
     + struct cache_entry *ce = active_cache[pos];
     +
 8: f91d1dcf024 = 8: ddd97fb2837 unpack-trees: improve performance of next_cache_entry

--
gitgitgadget
From: Victoria Dye <vdye@github.com>

Rename and invert value of `is_missing` to `is_in_reset_tree` to make the
variable more descriptive of what it represents.

Signed-off-by: Victoria Dye <vdye@github.com>
---
 builtin/reset.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/builtin/reset.c b/builtin/reset.c
index 51c9e2f43ff..d3695ce43c4 100644
--- a/builtin/reset.c
+++ b/builtin/reset.c
@@ -131,10 +131,10 @@ static void update_index_from_diff(struct diff_queue_struct *q,
 for (i = 0; i < q->nr; i++) {
 struct diff_filespec *one = q->queue[i]->one;
- int is_missing = !(one->mode && !is_null_oid(&one->oid));
+ int is_in_reset_tree = one->mode && !is_null_oid(&one->oid);
 struct cache_entry *ce;

- if (is_missing && !intent_to_add) {
+ if (!is_in_reset_tree && !intent_to_add) {
 remove_file_from_cache(one->path);
 continue;
 }
@@ -144,7 +144,7 @@ static void update_index_from_diff(struct diff_queue_struct *q,
 if (!ce)
 die(_("make_cache_entry failed for path '%s'"),
 one->path);
- if (is_missing) {
+ if (!is_in_reset_tree) {
 ce->ce_flags |= CE_INTENT_TO_ADD;
 set_object_name_for_intent_to_add_entry(ce);
 }
--
gitgitgadget
From: Victoria Dye <vdye@github.com>

Change `update_index_from_diff` to set `skip-worktree` when applicable for
new index entries. When `git reset --mixed <tree-ish>` is run, entries in
the index with differences between the pre-reset HEAD and reset <tree-ish>
are identified and handled with `update_index_from_diff`. For each file, a
new cache entry is inserted into the index, created from the <tree-ish> side
of the reset (without changing the working tree). However, the newly-created
entry must have `skip-worktree` explicitly set in either of the following
scenarios:

1. the file is in the current index and has `skip-worktree` set
2. the file is not in the current index but is outside of a defined sparse
   checkout definition

Not setting the `skip-worktree` bit leads to likely-undesirable results for
a user. It causes `skip-worktree` settings to disappear on the
"diff"-containing files (but *only* the diff-containing files), leading to
those files now showing modifications in `git status`. For example, when
running `git reset --mixed` in a sparse checkout, some file entries outside
of sparse checkout could show up as deleted, despite the user never deleting
anything (and not wanting them on-disk anyway).

Additionally, add a test to `t7102` to ensure `skip-worktree` is preserved
in a basic `git reset --mixed` scenario and update a failure-documenting
test from 19a0acc (t1092: test interesting sparse-checkout scenarios,
2021-01-23) with new expected behavior.

Helped-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Victoria Dye <vdye@github.com>
---
 builtin/reset.c                          | 14 ++++++++++++++
 t/t1092-sparse-checkout-compatibility.sh | 22 ++++++++--------------
 t/t7102-reset.sh                         | 17 +++++++++++++++++
 3 files changed, 39 insertions(+), 14 deletions(-)

diff --git a/builtin/reset.c b/builtin/reset.c
index d3695ce43c4..e441b6601b9 100644
--- a/builtin/reset.c
+++ b/builtin/reset.c
@@ -25,6 +25,7 @@
 #include "cache-tree.h"
 #include "submodule.h"
 #include "submodule-config.h"
+#include "dir.h"

 #define REFRESH_INDEX_DELAY_WARNING_IN_MS (2 * 1000)

@@ -130,6 +131,7 @@ static void update_index_from_diff(struct diff_queue_struct *q,
 int intent_to_add = *(int *)data;

 for (i = 0; i < q->nr; i++) {
+ int pos;
 struct diff_filespec *one = q->queue[i]->one;
 int is_in_reset_tree = one->mode && !is_null_oid(&one->oid);
 struct cache_entry *ce;
@@ -141,6 +143,18 @@ static void update_index_from_diff(struct diff_queue_struct *q,

 ce = make_cache_entry(&the_index, one->mode, &one->oid, one->path,
 0, 0);
+
+ /*
+ * If the file 1) corresponds to an existing index entry with
+ * skip-worktree set, or 2) does not exist in the index but is
+ * outside the sparse checkout definition, add a skip-worktree bit
+ * to the new index entry.
+ */
+ pos = cache_name_pos(one->path, strlen(one->path));
+ if ((pos >= 0 && ce_skip_worktree(active_cache[pos])) ||
+ (pos < 0 && !path_in_sparse_checkout(one->path, &the_index)))
+ ce->ce_flags |= CE_SKIP_WORKTREE;
+
 if (!ce)
 die(_("make_cache_entry failed for path '%s'"),
 one->path);
diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
index 886e78715fe..c7449afe965 100755
--- a/t/t1092-sparse-checkout-compatibility.sh
+++ b/t/t1092-sparse-checkout-compatibility.sh
@@ -459,26 +459,20 @@ test_expect_failure 'blame with pathspec outside sparse definition' '
 test_all_match git blame deep/deeper2/deepest/a
 '

-# NEEDSWORK: a sparse-checkout behaves differently from a full checkout
-# in this scenario, but it shouldn't.
-test_expect_failure 'checkout and reset (mixed)' '
+test_expect_success 'checkout and reset (mixed)' '
 init_repos &&

 test_all_match git checkout -b reset-test update-deep &&
 test_all_match git reset deepest &&
- test_all_match git reset update-folder1 &&
- test_all_match git reset update-folder2
-'
-
-# NEEDSWORK: a sparse-checkout behaves differently from a full checkout
-# in this scenario, but it shouldn't.
-test_expect_success 'checkout and reset (mixed) [sparse]' '
- init_repos &&

- test_sparse_match git checkout -b reset-test update-deep &&
- test_sparse_match git reset deepest &&
+ # Because skip-worktree is preserved, resetting to update-folder1
+ # will show worktree changes for folder1/a in full-checkout, but not
+ # in sparse-checkout or sparse-index.
+ git -C full-checkout reset update-folder1 >full-checkout-out &&
 test_sparse_match git reset update-folder1 &&
- test_sparse_match git reset update-folder2
+ grep "M folder1/a" full-checkout-out &&
+ ! grep "M folder1/a" sparse-checkout-out &&
+ run_on_sparse test_path_is_missing folder1
 '

 test_expect_success 'merge, cherry-pick, and rebase' '
diff --git a/t/t7102-reset.sh b/t/t7102-reset.sh
index 601b2bf97f0..d05426062ec 100755
--- a/t/t7102-reset.sh
+++ b/t/t7102-reset.sh
@@ -472,6 +472,23 @@ test_expect_success '--mixed refreshes the index' '
 test_cmp expect output
 '

+test_expect_success '--mixed preserves skip-worktree' '
+ echo 123 >>file2 &&
+ git add file2 &&
+ git update-index --skip-worktree file2 &&
+ git reset --mixed HEAD >output &&
+ test_must_be_empty output &&
+
+ cat >expect <<-\EOF &&
+ Unstaged changes after reset:
+ M file2
+ EOF
+ git update-index --no-skip-worktree file2 &&
+ git add file2 &&
+ git reset --mixed HEAD >output &&
+ test_cmp expect output
+'
+
 test_expect_success 'resetting specific path that is unmerged' '
 git rm --cached file2 &&
 F1=$(git rev-parse HEAD:file1) &&
--
gitgitgadget
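
The two scenarios listed in the commit message above reduce to a small
truth table, which may be easier to see outside of the index code. The
following is a minimal sketch, not the patch itself: `needs_skip_worktree()`
and its plain-int parameters are hypothetical stand-ins for the results of
`cache_name_pos()`, `ce_skip_worktree()` and `path_in_sparse_checkout()`.

```c
#include <assert.h>

/*
 * Hypothetical model of the condition added to update_index_from_diff().
 *
 * pos                    - index lookup result; negative when the path
 *                          has no existing entry
 * existing_skip_worktree - nonzero if the existing entry (pos >= 0) has
 *                          the skip-worktree bit set
 * in_sparse_checkout     - nonzero if the path is inside the sparse
 *                          checkout definition
 *
 * Returns 1 when the new entry should get skip-worktree, else 0.
 */
int needs_skip_worktree(int pos, int existing_skip_worktree,
			int in_sparse_checkout)
{
	/* Scenario 1: keep the bit that the existing entry already has. */
	if (pos >= 0)
		return existing_skip_worktree != 0;

	/* Scenario 2: no existing entry, path is outside the definition. */
	return !in_sparse_checkout;
}
```

In this model an existing entry never gains the bit merely for being outside
the sparse definition; only paths that are absent from the index do.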
From: Victoria Dye <vdye@github.com>

In anticipation of `git reset --hard` being able to use the sparse index
without expanding it, replace the command in `sparse-index is expanded and
converted back` with `git reset -- folder1/a`. This command will need to
expand the index to work properly, even after integrating the rest of
`reset` with sparse index.

Helped-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Victoria Dye <vdye@github.com>
---
 t/t1092-sparse-checkout-compatibility.sh | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
index c7449afe965..cab6340a9d0 100755
--- a/t/t1092-sparse-checkout-compatibility.sh
+++ b/t/t1092-sparse-checkout-compatibility.sh
@@ -634,11 +634,15 @@ test_expect_success 'submodule handling' '
 grep "160000 commit $(git -C initial-repo rev-parse HEAD) modules/sub" cache
 '

+# When working with a sparse index, some commands will need to expand the
+# index to operate properly. If those commands also write the index back
+# to disk, they need to convert the index to sparse before writing.
+# This test verifies that both of these events are logged in trace2 logs.
 test_expect_success 'sparse-index is expanded and converted back' '
 init_repos &&

 GIT_TRACE2_EVENT="$(pwd)/trace2.txt" GIT_TRACE2_EVENT_NESTING=10 \
- git -C sparse-index -c core.fsmonitor="" reset --hard &&
+ git -C sparse-index reset -- folder1/a &&
 test_region index convert_to_sparse trace2.txt &&
 test_region index ensure_full_index trace2.txt
 '
--
gitgitgadget
From: Victoria Dye <vdye@github.com> Add new tests for `--merge` and `--keep` modes, as well as mixed reset with pathspecs. New performance test cases exercise various execution paths for `reset`. Co-authored-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Victoria Dye <vdye@github.com> --- t/perf/p2000-sparse-operations.sh | 3 + t/t1092-sparse-checkout-compatibility.sh | 98 ++++++++++++++++++++++++ 2 files changed, 101 insertions(+) diff --git a/t/perf/p2000-sparse-operations.sh b/t/perf/p2000-sparse-operations.sh index 597626276fb..bfd332120c8 100755 --- a/t/perf/p2000-sparse-operations.sh +++ b/t/perf/p2000-sparse-operations.sh @@ -110,5 +110,8 @@ test_perf_on_all git add -A test_perf_on_all git add . test_perf_on_all git commit -a -m A test_perf_on_all git checkout -f - +test_perf_on_all git reset +test_perf_on_all git reset --hard +test_perf_on_all git reset -- does-not-exist test_done diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh index cab6340a9d0..4125525ab86 100755 --- a/t/t1092-sparse-checkout-compatibility.sh +++ b/t/t1092-sparse-checkout-compatibility.sh @@ -475,6 +475,104 @@ test_expect_success 'checkout and reset (mixed)' ' run_on_sparse test_path_is_missing folder1 ' +test_expect_success 'checkout and reset (merge)' ' + init_repos && + + write_script edit-contents <<-\EOF && + echo text >>$1 + EOF + + test_all_match git checkout -b reset-test update-deep && + run_on_all ../edit-contents a && + test_all_match git reset --merge deepest && + test_all_match git status --porcelain=v2 && + + test_all_match git reset --hard update-deep && + run_on_all ../edit-contents deep/a && + test_all_match test_must_fail git reset --merge deepest +' + +test_expect_success 'checkout and reset (keep)' ' + init_repos && + + write_script edit-contents <<-\EOF && + echo text >>$1 + EOF + + test_all_match git checkout -b reset-test update-deep && + run_on_all 
../edit-contents a && + test_all_match git reset --keep deepest && + test_all_match git status --porcelain=v2 && + + test_all_match git reset --hard update-deep && + run_on_all ../edit-contents deep/a && + test_all_match test_must_fail git reset --keep deepest +' + +test_expect_success 'reset with pathspecs inside sparse definition' ' + init_repos && + + write_script edit-contents <<-\EOF && + echo text >>$1 + EOF + + test_all_match git checkout -b reset-test update-deep && + run_on_all ../edit-contents deep/a && + + test_all_match git reset base -- deep/a && + test_all_match git status --porcelain=v2 && + + test_all_match git reset base -- nonexistent-file && + test_all_match git status --porcelain=v2 && + + test_all_match git reset deepest -- deep && + test_all_match git status --porcelain=v2 +' + +# Although the working tree differs between full and sparse checkouts after +# reset, the state of the index is the same. +test_expect_success 'reset with pathspecs outside sparse definition' ' + init_repos && + test_all_match git checkout -b reset-test base && + + test_sparse_match git reset update-folder1 -- folder1 && + git -C full-checkout reset update-folder1 -- folder1 && + test_sparse_match git status --porcelain=v2 && + test_all_match git rev-parse HEAD:folder1 && + + test_sparse_match git reset update-folder2 -- folder2/a && + git -C full-checkout reset update-folder2 -- folder2/a && + test_sparse_match git status --porcelain=v2 && + test_all_match git rev-parse HEAD:folder2/a +' + +test_expect_success 'reset with wildcard pathspec' ' + init_repos && + + test_all_match git reset update-deep -- deep\* && + test_all_match git ls-files -s -- deep && + + test_all_match git reset deepest -- deep\*\*\* && + test_all_match git ls-files -s -- deep && + + # The following `git reset`s result in updating the index on files with + # `skip-worktree` enabled. 
To avoid failing due to discrepancies in reported + # "modified" files, `test_sparse_match` reset is performed separately from + # "full-checkout" reset, then the index contents of all repos are verified. + + test_sparse_match git reset update-folder1 -- \*/a && + git -C full-checkout reset update-folder1 -- \*/a && + test_all_match git ls-files -s -- deep/a folder1/a && + + test_sparse_match git reset update-folder2 -- folder\* && + git -C full-checkout reset update-folder2 -- folder\* && + test_all_match git ls-files -s -- folder10 folder1 folder2 && + + test_sparse_match git reset base -- folder1/\* && + git -C full-checkout reset base -- folder1/\* && + test_all_match git ls-files -s -- folder1 +' + test_expect_success 'merge, cherry-pick, and rebase' ' init_repos && -- gitgitgadget
From: Victoria Dye <vdye@github.com> Disable `command_requires_full_index` repo setting and add `ensure_full_index` guards around code paths that cannot yet use sparse directory index entries. `reset --soft` does not modify the index, so no compatibility changes are needed for it to function without expanding the index. For all other reset modes (`--mixed`, `--hard`, `--keep`, `--merge`), the full index is expanded to prevent cache tree corruption and invalid variable accesses. Additionally, the `read_cache()` check verifying an uncorrupted index is moved after argument parsing and preparing the repo settings. The index is not used by the preceding argument handling, but `read_cache()` must be run *after* enabling sparse index for the command (so that the index is not expanded unnecessarily) and *before* using the index for reset (so that it is verified as uncorrupted). Signed-off-by: Victoria Dye <vdye@github.com> --- builtin/reset.c | 10 +++++++--- cache-tree.c | 1 + 2 files changed, 8 insertions(+), 3 deletions(-) diff --git a/builtin/reset.c b/builtin/reset.c index e441b6601b9..0ac0de7dc97 100644 --- a/builtin/reset.c +++ b/builtin/reset.c @@ -180,6 +180,7 @@ static int read_from_tree(const struct pathspec *pathspec, opt.flags.override_submodule_config = 1; opt.repo = the_repository; + ensure_full_index(&the_index); if (do_diff_cache(tree_oid, &opt)) return 1; diffcore_std(&opt); @@ -257,9 +258,6 @@ static void parse_args(struct pathspec *pathspec, } *rev_ret = rev; - if (read_cache() < 0) - die(_("index file corrupt")); - parse_pathspec(pathspec, 0, PATHSPEC_PREFER_FULL | (patch_mode ? 
PATHSPEC_PREFIX_ORIGIN : 0), @@ -405,6 +403,12 @@ int cmd_reset(int argc, const char **argv, const char *prefix) if (intent_to_add && reset_type != MIXED) die(_("-N can only be used with --mixed")); + prepare_repo_settings(the_repository); + the_repository->settings.command_requires_full_index = 0; + + if (read_cache() < 0) + die(_("index file corrupt")); + /* Soft reset does not touch the index file nor the working tree * at all, but requires them in a good order. Other resets reset * the index file to the tree object we are switching to. */ diff --git a/cache-tree.c b/cache-tree.c index 90919f9e345..9be19c85b66 100644 --- a/cache-tree.c +++ b/cache-tree.c @@ -776,6 +776,7 @@ void prime_cache_tree(struct repository *r, cache_tree_free(&istate->cache_tree); istate->cache_tree = cache_tree(); + ensure_full_index(istate); prime_cache_tree_rec(r, istate->cache_tree, tree); istate->cache_changed |= CACHE_TREE_CHANGED; trace2_region_leave("cache-tree", "prime_cache_tree", the_repository); -- gitgitgadget
From: Victoria Dye <vdye@github.com> Remove `ensure_full_index` guard on `prime_cache_tree` and update `prime_cache_tree_rec` to correctly reconstruct sparse directory entries in the cache tree. While processing a tree's entries, `prime_cache_tree_rec` must determine whether a directory entry is sparse or not by searching for it in the index (*without* expanding the index). If a matching sparse directory index entry is found, no subtrees are added to the cache tree entry and the entry count is set to 1 (representing the sparse directory itself). Otherwise, the tree is assumed to not be sparse and its subtrees are recursively added to the cache tree. Helped-by: Elijah Newren <newren@gmail.com> Signed-off-by: Victoria Dye <vdye@github.com> --- cache-tree.c | 47 ++++++++++++++++++++++-- cache.h | 10 +++++ read-cache.c | 27 ++++++++++---- t/t1092-sparse-checkout-compatibility.sh | 15 +++++++- 4 files changed, 86 insertions(+), 13 deletions(-) diff --git a/cache-tree.c b/cache-tree.c index 9be19c85b66..2866101052c 100644 --- a/cache-tree.c +++ b/cache-tree.c @@ -740,15 +740,26 @@ out: return ret; } +static void prime_cache_tree_sparse_dir(struct cache_tree *it, + struct tree *tree) +{ + + oidcpy(&it->oid, &tree->object.oid); + it->entry_count = 1; +} + static void prime_cache_tree_rec(struct repository *r, struct cache_tree *it, - struct tree *tree) + struct tree *tree, + struct strbuf *tree_path) { struct tree_desc desc; struct name_entry entry; int cnt; + int base_path_len = tree_path->len; oidcpy(&it->oid, &tree->object.oid); + init_tree_desc(&desc, tree->buffer, tree->size); cnt = 0; while (tree_entry(&desc, &entry)) { @@ -757,14 +768,40 @@ static void prime_cache_tree_rec(struct repository *r, else { struct cache_tree_sub *sub; struct tree *subtree = lookup_tree(r, &entry.oid); + if (!subtree->object.parsed) parse_tree(subtree); sub = cache_tree_sub(it, entry.path); sub->cache_tree = cache_tree(); - prime_cache_tree_rec(r, sub->cache_tree, subtree); + + /* + * 
Recursively-constructed subtree path is only needed when working + * in a sparse index (where it's used to determine whether the + * subtree is a sparse directory in the index). + */ + if (r->index->sparse_index) { + strbuf_setlen(tree_path, base_path_len); + strbuf_grow(tree_path, base_path_len + entry.pathlen + 1); + strbuf_add(tree_path, entry.path, entry.pathlen); + strbuf_addch(tree_path, '/'); + } + + /* + * If a sparse index is in use, the directory being processed may be + * sparse. To confirm that, we can check whether an entry with that + * exact name exists in the index. If it does, the created subtree + * should be sparse. Otherwise, cache tree expansion should continue + * as normal. + */ + if (r->index->sparse_index && + index_entry_exists(r->index, tree_path->buf, tree_path->len)) + prime_cache_tree_sparse_dir(sub->cache_tree, subtree); + else + prime_cache_tree_rec(r, sub->cache_tree, subtree, tree_path); cnt += sub->cache_tree->entry_count; } } + it->entry_count = cnt; } @@ -772,12 +809,14 @@ void prime_cache_tree(struct repository *r, struct index_state *istate, struct tree *tree) { + struct strbuf tree_path = STRBUF_INIT; + trace2_region_enter("cache-tree", "prime_cache_tree", the_repository); cache_tree_free(&istate->cache_tree); istate->cache_tree = cache_tree(); - ensure_full_index(istate); - prime_cache_tree_rec(r, istate->cache_tree, tree); + prime_cache_tree_rec(r, istate->cache_tree, tree, &tree_path); + strbuf_release(&tree_path); istate->cache_changed |= CACHE_TREE_CHANGED; trace2_region_leave("cache-tree", "prime_cache_tree", the_repository); } diff --git a/cache.h b/cache.h index f6295f3b048..1d3e4665562 100644 --- a/cache.h +++ b/cache.h @@ -816,6 +816,16 @@ struct cache_entry *index_file_exists(struct index_state *istate, const char *na */ int index_name_pos(struct index_state *, const char *name, int namelen); +/* + * Determines whether an entry with the given name exists within the + * given index. 
The return value is 1 if an exact match is found, otherwise + * it is 0. Note that, unlike index_name_pos, this function does not expand + * the index if it is sparse. If an item exists within the full index but it + * is contained within a sparse directory (and not in the sparse index), 0 is + * returned. + */ +int index_entry_exists(struct index_state *, const char *name, int namelen); + /* * Some functions return the negative complement of an insert position when a * precise match was not found but a position was found where the entry would diff --git a/read-cache.c b/read-cache.c index f5d4385c408..c079ece981a 100644 --- a/read-cache.c +++ b/read-cache.c @@ -68,6 +68,11 @@ */ #define CACHE_ENTRY_PATH_LENGTH 80 +enum index_search_mode { + NO_EXPAND_SPARSE = 0, + EXPAND_SPARSE = 1 +}; + static inline struct cache_entry *mem_pool__ce_alloc(struct mem_pool *mem_pool, size_t len) { struct cache_entry *ce; @@ -551,7 +556,10 @@ int cache_name_stage_compare(const char *name1, int len1, int stage1, const char return 0; } -static int index_name_stage_pos(struct index_state *istate, const char *name, int namelen, int stage) +static int index_name_stage_pos(struct index_state *istate, + const char *name, int namelen, + int stage, + enum index_search_mode search_mode) { int first, last; @@ -570,7 +578,7 @@ static int index_name_stage_pos(struct index_state *istate, const char *name, in first = next+1; } - if (istate->sparse_index && + if (search_mode == EXPAND_SPARSE && istate->sparse_index && first > 0) { /* Note: first <= istate->cache_nr */ struct cache_entry *ce = istate->cache[first - 1]; @@ -586,7 +594,7 @@ static int index_name_stage_pos(struct index_state *istate, const char *name, in ce_namelen(ce) < namelen && !strncmp(name, ce->name, ce_namelen(ce))) { ensure_full_index(istate); - return index_name_stage_pos(istate, name, namelen, stage); + return index_name_stage_pos(istate, name, namelen, stage, search_mode); } } @@ -595,7 +603,12 @@ static int 
index_name_stage_pos(struct index_state *istate, const char *name, in int index_name_pos(struct index_state *istate, const char *name, int namelen) { - return index_name_stage_pos(istate, name, namelen, 0); + return index_name_stage_pos(istate, name, namelen, 0, EXPAND_SPARSE); +} + +int index_entry_exists(struct index_state *istate, const char *name, int namelen) +{ + return index_name_stage_pos(istate, name, namelen, 0, NO_EXPAND_SPARSE) >= 0; } int remove_index_entry_at(struct index_state *istate, int pos) @@ -1222,7 +1235,7 @@ static int has_dir_name(struct index_state *istate, */ } - pos = index_name_stage_pos(istate, name, len, stage); + pos = index_name_stage_pos(istate, name, len, stage, EXPAND_SPARSE); if (pos >= 0) { /* * Found one, but not so fast. This could @@ -1322,7 +1335,7 @@ static int add_index_entry_with_check(struct index_state *istate, struct cache_e strcmp(ce->name, istate->cache[istate->cache_nr - 1]->name) > 0) pos = index_pos_to_insert_pos(istate->cache_nr); else - pos = index_name_stage_pos(istate, ce->name, ce_namelen(ce), ce_stage(ce)); + pos = index_name_stage_pos(istate, ce->name, ce_namelen(ce), ce_stage(ce), EXPAND_SPARSE); /* existing match? Just replace it. 
*/ if (pos >= 0) { @@ -1357,7 +1370,7 @@ static int add_index_entry_with_check(struct index_state *istate, struct cache_e if (!ok_to_replace) return error(_("'%s' appears as both a file and as a directory"), ce->name); - pos = index_name_stage_pos(istate, ce->name, ce_namelen(ce), ce_stage(ce)); + pos = index_name_stage_pos(istate, ce->name, ce_namelen(ce), ce_stage(ce), EXPAND_SPARSE); pos = -pos-1; } return pos + 1; diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh index 4125525ab86..871cc3fcb8d 100755 --- a/t/t1092-sparse-checkout-compatibility.sh +++ b/t/t1092-sparse-checkout-compatibility.sh @@ -777,9 +777,9 @@ test_expect_success 'sparse-index is not expanded' ' ensure_not_expanded checkout - && ensure_not_expanded switch rename-out-to-out && ensure_not_expanded switch - && - git -C sparse-index reset --hard && + ensure_not_expanded reset --hard && ensure_not_expanded checkout rename-out-to-out -- deep/deeper1 && - git -C sparse-index reset --hard && + ensure_not_expanded reset --hard && ensure_not_expanded restore -s rename-out-to-out -- deep/deeper1 && echo >>sparse-index/README.md && @@ -789,6 +789,17 @@ test_expect_success 'sparse-index is not expanded' ' echo >>sparse-index/untracked.txt && ensure_not_expanded add . && + for ref in update-deep update-folder1 update-folder2 update-deep + do + echo >>sparse-index/README.md && + ensure_not_expanded reset --hard $ref || return 1 + done && + + ensure_not_expanded reset --hard update-deep && + ensure_not_expanded reset --keep base && + ensure_not_expanded reset --merge update-deep && + ensure_not_expanded reset --hard && + ensure_not_expanded checkout -f update-deep && test_config -C sparse-index pull.twohead ort && ( -- gitgitgadget
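As a rough illustration of the lookup this patch relies on: `prime_cache_tree_rec` only needs to know whether a path such as `folder1/` is present as an exact entry in the (possibly sparse) index, without expanding anything. The sketch below is not git's `index_entry_exists`; `name_exists` and the plain array of sorted names are hypothetical stand-ins for the index, and the comparison is a simple `strcmp` rather than git's name/stage comparison.

```c
#include <string.h>

/*
 * Hypothetical sketch: binary-search a sorted array of entry names for an
 * exact match. Returns 1 if found, 0 otherwise. Nothing is expanded or
 * modified, which is the property the patch wants from its lookup.
 */
static int name_exists(const char *const *names, int nr, const char *name)
{
	int first = 0, last = nr;

	while (first < last) {
		int mid = first + (last - first) / 2;
		int cmp = strcmp(name, names[mid]);

		if (!cmp)
			return 1;	/* exact match, e.g. a sparse directory entry */
		if (cmp < 0)
			last = mid;
		else
			first = mid + 1;
	}
	return 0;		/* not present as its own entry */
}
```

With names `{"README.md", "deep/a", "folder1/", "folder2/"}`, the sparse directory `folder1/` is found, while `deep/` (an in-cone directory with no entry of its own) and `folder1/a` (hidden inside the sparse directory) are not, so only the former would be treated as a sparse cache-tree node.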
From: Victoria Dye <vdye@github.com> Remove the `ensure_full_index` guard on `read_from_tree` and update `git reset --mixed` to ensure it can use sparse directory index entries wherever possible. Sparse directory entries are reset using `diff_tree_oid`, which requires `change` and `add_remove` functions to process the internal contents of the sparse directory. The `recursive` diff option handles cases in which `reset --mixed` must diff/merge files that are nested multiple levels deep in a sparse directory. The use of pathspecs with `git reset --mixed` introduces scenarios in which internal contents of sparse directories may be matched by the pathspec. In order to reset *all* files in the repo that may match the pathspec, the following conditions on the pathspec require index expansion before performing the reset: * "magic" pathspecs * wildcard pathspecs that do not match only in-cone files or entire sparse directories * literal pathspecs matching something outside the sparse checkout definition Helped-by: Elijah Newren <newren@gmail.com> Signed-off-by: Victoria Dye <vdye@github.com> --- builtin/reset.c | 87 +++++++++++++++++++++++- t/t1092-sparse-checkout-compatibility.sh | 17 +++++ 2 files changed, 102 insertions(+), 2 deletions(-) diff --git a/builtin/reset.c b/builtin/reset.c index 0ac0de7dc97..dcb79fb43a3 100644 --- a/builtin/reset.c +++ b/builtin/reset.c @@ -148,7 +148,9 @@ static void update_index_from_diff(struct diff_queue_struct *q, * If the file 1) corresponds to an existing index entry with * skip-worktree set, or 2) does not exist in the index but is * outside the sparse checkout definition, add a skip-worktree bit - * to the new index entry. + * to the new index entry. Note that a sparse index will be expanded + * if this entry is outside the sparse cone - this is necessary + * to properly construct the reset sparse directory. 
*/ pos = cache_name_pos(one->path, strlen(one->path)); if ((pos >= 0 && ce_skip_worktree(active_cache[pos])) || @@ -166,6 +168,82 @@ static void update_index_from_diff(struct diff_queue_struct *q, } } +static int pathspec_needs_expanded_index(const struct pathspec *pathspec) +{ + unsigned int i, pos; + int res = 0; + char *skip_worktree_seen = NULL; + + /* + * When using a magic pathspec, assume for the sake of simplicity that + * the index needs to be expanded to match all matchable files. + */ + if (pathspec->magic) + return 1; + + for (i = 0; i < pathspec->nr; i++) { + struct pathspec_item item = pathspec->items[i]; + + /* + * If the pathspec item has a wildcard, the index should be expanded + * if the pathspec has the possibility of matching a subset of entries inside + * of a sparse directory (but not the entire directory). + * + * If the pathspec item is a literal path, the index only needs to be expanded + * if a) the pathspec isn't in the sparse checkout cone (to make sure we don't + * expand for in-cone files) and b) it doesn't match any sparse directories + * (since we can reset whole sparse directories without expanding them). + */ + if (item.nowildcard_len < item.len) { + /* + * Special case: if the pattern is a path inside the cone + * followed by only wildcards, the pattern cannot match + * partial sparse directories, so we don't expand the index. + */ + if (path_in_cone_mode_sparse_checkout(item.original, &the_index) && + strspn(item.original + item.nowildcard_len, "*") == item.len - item.nowildcard_len) + continue; + + for (pos = 0; pos < active_nr; pos++) { + struct cache_entry *ce = active_cache[pos]; + + if (!S_ISSPARSEDIR(ce->ce_mode)) + continue; + + /* + * If the pre-wildcard length is longer than the sparse + * directory name and the sparse directory is the first + * component of the pathspec, need to expand the index. 
+ */ + if (item.nowildcard_len > ce_namelen(ce) && + !strncmp(item.original, ce->name, ce_namelen(ce))) { + res = 1; + break; + } + + /* + * If the pre-wildcard length is shorter than the sparse + * directory and the pathspec does not match the whole + * directory, need to expand the index. + */ + if (!strncmp(item.original, ce->name, item.nowildcard_len) && + wildmatch(item.original, ce->name, 0)) { + res = 1; + break; + } + } + } else if (!path_in_cone_mode_sparse_checkout(item.original, &the_index) && + !matches_skip_worktree(pathspec, i, &skip_worktree_seen)) + res = 1; + + if (res > 0) + break; + } + + free(skip_worktree_seen); + return res; +} + static int read_from_tree(const struct pathspec *pathspec, struct object_id *tree_oid, int intent_to_add) @@ -178,9 +256,14 @@ static int read_from_tree(const struct pathspec *pathspec, opt.format_callback = update_index_from_diff; opt.format_callback_data = &intent_to_add; opt.flags.override_submodule_config = 1; + opt.flags.recursive = 1; opt.repo = the_repository; + opt.change = diff_change; + opt.add_remove = diff_addremove; + + if (pathspec->nr && the_index.sparse_index && pathspec_needs_expanded_index(pathspec)) + ensure_full_index(&the_index); - ensure_full_index(&the_index); if (do_diff_cache(tree_oid, &opt)) return 1; diffcore_std(&opt); diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh index 871cc3fcb8d..77e302a0ef3 100755 --- a/t/t1092-sparse-checkout-compatibility.sh +++ b/t/t1092-sparse-checkout-compatibility.sh @@ -795,11 +795,28 @@ test_expect_success 'sparse-index is not expanded' ' ensure_not_expanded reset --hard $ref || return 1 done && + ensure_not_expanded reset --mixed base && ensure_not_expanded reset --hard update-deep && ensure_not_expanded reset --keep base && ensure_not_expanded reset --merge update-deep && ensure_not_expanded reset --hard && + ensure_not_expanded reset base -- deep/a && + ensure_not_expanded reset base -- nonexistent-file && + 
ensure_not_expanded reset deepest -- deep && + + # Although folder1 is outside the sparse definition, it exists as a + # directory entry in the index, so the pathspec will not force the + # index to be expanded. + ensure_not_expanded reset deepest -- folder1 && + ensure_not_expanded reset deepest -- folder1/ && + + # Wildcard identifies only in-cone files, no index expansion + ensure_not_expanded reset deepest -- deep/\* && + + # Wildcard identifies only full sparse directories, no index expansion + ensure_not_expanded reset deepest -- folder\* && + ensure_not_expanded checkout -f update-deep && test_config -C sparse-index pull.twohead ort && ( -- gitgitgadget
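The "special case" in `pathspec_needs_expanded_index` hinges on whether everything after the literal prefix of a pattern is made up of `*` characters. A minimal sketch of just that comparison follows, assuming the caller already knows the length of the wildcard-free prefix (git stores it in `nowildcard_len`); the helper name `tail_is_only_stars` is hypothetical and the in-cone check from the patch is deliberately left out.

```c
#include <string.h>

/*
 * Hypothetical helper: return 1 if the part of `pattern` after its first
 * `nowildcard_len` bytes is non-empty and consists solely of '*'
 * characters, mirroring the strspn() comparison used in the patch.
 */
static int tail_is_only_stars(const char *pattern, size_t nowildcard_len)
{
	size_t len = strlen(pattern);

	if (nowildcard_len >= len)
		return 0;	/* no wildcard portion at all */

	return strspn(pattern + nowildcard_len, "*") == len - nowildcard_len;
}
```

Under these assumptions `deep/*` and `folder*` qualify, while `deep/*/a` and `f*o` do not, since literal characters follow the wildcard and the pattern could match only part of a sparse directory.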
From: Victoria Dye <vdye@github.com> To find the first non-unpacked cache entry, `next_cache_entry` iterates through the index, starting at `cache_bottom`. The performance of this in full indexes is helped by `cache_bottom` advancing with each invocation of `mark_ce_used` (called by `unpack_index_entry`). However, the presence of sparse directories can prevent `cache_bottom` from advancing in the sparse index case, effectively forcing `next_cache_entry` to search from the beginning of the index each time it is called. The `cache_bottom` must be preserved for the sparse index (see 17a1bb570b (unpack-trees: preserve cache_bottom, 2021-07-14)). Therefore, to retain the benefit `cache_bottom` provides in non-sparse index cases, a separate `hint` position indicates the first position `next_cache_entry` should search, and is updated with a new position on each call. Signed-off-by: Victoria Dye <vdye@github.com> --- unpack-trees.c | 23 +++++++++++++++++------ 1 file changed, 17 insertions(+), 6 deletions(-) diff --git a/unpack-trees.c b/unpack-trees.c index 8ea0a542da8..b94733de6be 100644 --- a/unpack-trees.c +++ b/unpack-trees.c @@ -645,17 +645,24 @@ static void mark_ce_used_same_name(struct cache_entry *ce, } } -static struct cache_entry *next_cache_entry(struct unpack_trees_options *o) +static struct cache_entry *next_cache_entry(struct unpack_trees_options *o, int *hint) { const struct index_state *index = o->src_index; int pos = o->cache_bottom; + if (*hint > pos) + pos = *hint; + while (pos < index->cache_nr) { struct cache_entry *ce = index->cache[pos]; - if (!(ce->ce_flags & CE_UNPACKED)) + if (!(ce->ce_flags & CE_UNPACKED)) { + *hint = pos + 1; return ce; + } pos++; } + + *hint = pos; return NULL; } @@ -1365,12 +1372,13 @@ static int unpack_callback(int n, unsigned long mask, unsigned long dirmask, str /* Are we supposed to look at the index too?
*/ if (o->merge) { + int hint = -1; while (1) { int cmp; struct cache_entry *ce; if (o->diff_index_cached) - ce = next_cache_entry(o); + ce = next_cache_entry(o, &hint); else ce = find_cache_entry(info, p); @@ -1690,7 +1698,7 @@ static int verify_absent(const struct cache_entry *, int unpack_trees(unsigned len, struct tree_desc *t, struct unpack_trees_options *o) { struct repository *repo = the_repository; - int i, ret; + int i, hint, ret; static struct cache_entry *dfc; struct pattern_list pl; int free_pattern_list = 0; @@ -1763,13 +1771,15 @@ int unpack_trees(unsigned len, struct tree_desc *t, struct unpack_trees_options info.pathspec = o->pathspec; if (o->prefix) { + hint = -1; + /* * Unpack existing index entries that sort before the * prefix the tree is spliced into. Note that o->merge * is always true in this case. */ while (1) { - struct cache_entry *ce = next_cache_entry(o); + struct cache_entry *ce = next_cache_entry(o, &hint); if (!ce) break; if (ce_in_traverse_path(ce, &info)) @@ -1790,8 +1800,9 @@ int unpack_trees(unsigned len, struct tree_desc *t, struct unpack_trees_options /* Any left-over entries in the index? */ if (o->merge) { + hint = -1; while (1) { - struct cache_entry *ce = next_cache_entry(o); + struct cache_entry *ce = next_cache_entry(o, &hint); if (!ce) break; if (unpack_index_entry(ce, o) < 0) -- gitgitgadget
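The effect of the `hint` parameter can be seen in isolation with a simplified model. The sketch below is not the `unpack-trees.c` code: `struct entry`, its `unpacked` flag, and `next_unpacked` are hypothetical stand-ins for `struct cache_entry`, `CE_UNPACKED`, and `next_cache_entry`, and the function returns a position rather than a pointer.

```c
/* Hypothetical stand-in for a cache entry; only the "unpacked" flag matters. */
struct entry {
	int unpacked;
};

/*
 * Return the position of the first entry at or after max(bottom, *hint)
 * that has not been unpacked, or -1 if there is none. *hint is moved past
 * the returned position so a later call does not rescan the same entries,
 * even when `bottom` itself never advances.
 */
static int next_unpacked(const struct entry *entries, int nr, int bottom, int *hint)
{
	int pos = bottom;

	if (*hint > pos)
		pos = *hint;

	while (pos < nr) {
		if (!entries[pos].unpacked) {
			*hint = pos + 1;
			return pos;
		}
		pos++;
	}

	*hint = pos;
	return -1;
}
```

Starting from `hint = -1` with `bottom` pinned at 0, successive calls walk forward through the array once instead of restarting at the beginning each time, which is the behavior the commit message describes for the sparse index case.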
Hi, On Mon, Nov 29, 2021 at 7:52 AM Victoria Dye via GitGitGadget <gitgitgadget@gmail.com> wrote: > > Changes since V5 > ================ > > * Update t1092 test 'reset with wildcard pathspec' with more cases and > better checks > * Add "special case" wildcard pathspec check when determining whether to > expand the index (avoids double-loop over pathspecs & index entries) Looks pretty good. However, I'm worried this special case you added at my prodding might be problematic, and that I may have been wrong to prod you into it... > Thanks! -Victoria > > Range-diff vs v5: > > 7: a9135a5ed64 ! 7: 822d7344587 reset: make --mixed sparse-aware > @@ Commit message > > Remove the `ensure_full_index` guard on `read_from_tree` and update `git > reset --mixed` to ensure it can use sparse directory index entries wherever > - possible. Sparse directory entries are reset use `diff_tree_oid`, which > + possible. Sparse directory entries are reset using `diff_tree_oid`, which > requires `change` and `add_remove` functions to process the internal > contents of the sparse directory. The `recursive` diff option handles cases > in which `reset --mixed` must diff/merge files that are nested multiple > @@ builtin/reset.c: static void update_index_from_diff(struct diff_queue_struct *q, > + * (since we can reset whole sparse directories without expanding them). > + */ > + if (item.nowildcard_len < item.len) { > ++ /* > ++ * Special case: if the pattern is a path inside the cone > ++ * followed by only wildcards, the pattern cannot match > ++ * partial sparse directories, so we don't expand the index. > ++ */ > ++ if (path_in_cone_mode_sparse_checkout(item.original, &the_index) && > ++ strspn(item.original + item.nowildcard_len, "*") == item.len - item.nowildcard_len) I usually expect in an &&-chain to see the cheaper function call first (because that ordering often avoids the need to call the second function), and I would presume that strspn() would be the cheaper of the two. 
Did you switch the order because you expect the strspn call to nearly always return true, though? Could the strspn() call be replaced by a `item.len == item.nowildcard_len + 1`? I mean, sure, folks could list multiple asterisks in a row in their pathspec, but that seems super unlikely and even if it does happen the code will just fall back to the slower codepath and still give them the right answer. And the simpler check feels a lot easier to parse for human readers. But I'm worried there's a deeper issue here: Is the wildcard character (or characters) in path treated as a literal by path_in_cone_mode_sparse_checkout()? I think it is...and I'm worried that may be incorrect. For example, if the path is foo/* and the user has done a git sparse-checkout set foo/bar/ Then 'foo/baz/file' is not in the sparse checkout. However, 'foo/*' should match 'foo/baz/file' and yet 'foo/*' when treated as a literal path would be considered in the sparse checkout by path_in_cone_mode_sparse_checkout. Does this result in the code returning an incorrect answer? (Or did I misunderstand something so far?) I'm wondering if I misled you earlier in my musings about whether we could avoid the slow codepath for pathspecs with wildcard characters. Maybe there's no safe optimization here and wildcard characters should always go through the slower codepath. > ++ continue; > ++ > + for (pos = 0; pos < active_nr; pos++) { > + struct cache_entry *ce = active_cache[pos]; > + > 8: f91d1dcf024 = 8: ddd97fb2837 unpack-trees: improve performance of next_cache_entry > > -- > gitgitgadget
Elijah Newren wrote: > Hi, > > On Mon, Nov 29, 2021 at 7:52 AM Victoria Dye via GitGitGadget > <gitgitgadget@gmail.com> wrote: >> >> Changes since V5 >> ================ >> >> * Update t1092 test 'reset with wildcard pathspec' with more cases and >> better checks >> * Add "special case" wildcard pathspec check when determining whether to >> expand the index (avoids double-loop over pathspecs & index entries) > > Looks pretty good. However, I'm worried this special case you added > at my prodding might be problematic, and that I may have been wrong to > prod you into it... > >> Thanks! -Victoria >> >> Range-diff vs v5: >> >> 7: a9135a5ed64 ! 7: 822d7344587 reset: make --mixed sparse-aware >> @@ Commit message >> >> Remove the `ensure_full_index` guard on `read_from_tree` and update `git >> reset --mixed` to ensure it can use sparse directory index entries wherever >> - possible. Sparse directory entries are reset use `diff_tree_oid`, which >> + possible. Sparse directory entries are reset using `diff_tree_oid`, which >> requires `change` and `add_remove` functions to process the internal >> contents of the sparse directory. The `recursive` diff option handles cases >> in which `reset --mixed` must diff/merge files that are nested multiple >> @@ builtin/reset.c: static void update_index_from_diff(struct diff_queue_struct *q, >> + * (since we can reset whole sparse directories without expanding them). >> + */ >> + if (item.nowildcard_len < item.len) { >> ++ /* >> ++ * Special case: if the pattern is a path inside the cone >> ++ * followed by only wildcards, the pattern cannot match >> ++ * partial sparse directories, so we don't expand the index. 
>> ++ */ >> ++ if (path_in_cone_mode_sparse_checkout(item.original, &the_index) && >> ++ strspn(item.original + item.nowildcard_len, "*") == item.len - item.nowildcard_len) > > I usually expect in an &&-chain to see the cheaper function call first > (because that ordering often avoids the need to call the second > function), and I would presume that strspn() would be the cheaper of > the two. Did you switch the order because you expect the strspn call > to nearly always return true, though? > This is a miss on my part, the `strspn()` check is probably less expensive and should be first. > Could the strspn() call be replaced by a `item.len == > item.nowildcard_len + 1`? I mean, sure, folks could list multiple > asterisks in a row in their pathspec, but that seems super unlikely > and even if it does happen the code will just fall back to the slower > codepath and still give them the right answer. And the simpler check > feels a lot easier to parse for human readers. > Agreed on wanting better readability - if the multiple-wildcard case is unlikely, the `PATHSPEC_ONESTAR` flag would indicate whether the pathspec ends in a single wildcard character. If that flag is still too obscure, though, I can stick with the length comparison. > But I'm worried there's a deeper issue here: > > > Is the wildcard character (or characters) in path treated as a literal > by path_in_cone_mode_sparse_checkout()? I think it is...and I'm > worried that may be incorrect. For example, if the path is > > foo/* > > and the user has done a > > git sparse-checkout set foo/bar/ > > Then 'foo/baz/file' is not in the sparse checkout. However, 'foo/*' > should match 'foo/baz/file' and yet 'foo/*' when treated as a literal > path would be considered in the sparse checkout by > path_in_cone_mode_sparse_checkout. Does this result in the code > returning an incorrect answer? (Or did I misunderstand something so > far?) 
> Correct: `path_in_cone_mode_sparse_checkout` interprets the wildcard literally, and the checks here take that into account. The goal of `pathspec_needs_expanded_index` is to determine if the pathspec *may* match only partial contents of a sparse directory (like '*.c', or 'f*le'). For a `git reset --mixed`, only this scenario requires expansion; if an entire sparse directory is matched by a pathspec, the entire sparse directory is reset. Using your example, 'foo/*' does match 'foo/baz/file', but it also matches 'foo/' itself; as a result, the `foo/` sparse directory index entry is reset (rather than some individual files contained within it). The same goes for a pathspec like 'fo*' ("in-cone" and ending in a wildcard). Conversely, a pathspec like 'foo/ba*' would _not_ work (it wouldn't match something like 'foo/test-file'), and neither would 'f*o' (it would match all of 'foo', but would only match files ending in "o" in a directory 'f/'). Hope that helps! > I'm wondering if I misled you earlier in my musings about whether we > could avoid the slow codepath for pathspecs with wildcard characters. > Maybe there's no safe optimization here and wildcard characters should > always go through the slower codepath. > >> ++ continue; >> ++ >> + for (pos = 0; pos < active_nr; pos++) { >> + struct cache_entry *ce = active_cache[pos]; >> + >> 8: f91d1dcf024 = 8: ddd97fb2837 unpack-trees: improve performance of next_cache_entry >> >> -- >> gitgitgadget
On Mon, Nov 29, 2021 at 11:44 AM Victoria Dye <vdye@github.com> wrote:
>
> Elijah Newren wrote:
> > Hi,
> >
> > On Mon, Nov 29, 2021 at 7:52 AM Victoria Dye via GitGitGadget
> > <gitgitgadget@gmail.com> wrote:
> >>
> >> Changes since V5
> >> ================
> >>
> >> * Update t1092 test 'reset with wildcard pathspec' with more cases and
> >>   better checks
> >> * Add "special case" wildcard pathspec check when determining whether to
> >>   expand the index (avoids double-loop over pathspecs & index entries)
> >
> > Looks pretty good. However, I'm worried this special case you added
> > at my prodding might be problematic, and that I may have been wrong to
> > prod you into it...
> >
> >> Thanks! -Victoria
> >>
> >> Range-diff vs v5:
> >>
> >> 7: a9135a5ed64 ! 7: 822d7344587 reset: make --mixed sparse-aware
> >> @@ Commit message
> >>
> >> Remove the `ensure_full_index` guard on `read_from_tree` and update `git
> >> reset --mixed` to ensure it can use sparse directory index entries wherever
> >> - possible. Sparse directory entries are reset use `diff_tree_oid`, which
> >> + possible. Sparse directory entries are reset using `diff_tree_oid`, which
> >> requires `change` and `add_remove` functions to process the internal
> >> contents of the sparse directory. The `recursive` diff option handles cases
> >> in which `reset --mixed` must diff/merge files that are nested multiple
> >> @@ builtin/reset.c: static void update_index_from_diff(struct diff_queue_struct *q,
> >> + * (since we can reset whole sparse directories without expanding them).
> >> + */
> >> + if (item.nowildcard_len < item.len) {
> >> ++ /*
> >> ++ * Special case: if the pattern is a path inside the cone
> >> ++ * followed by only wildcards, the pattern cannot match
> >> ++ * partial sparse directories, so we don't expand the index.
> >> ++ */
> >> ++ if (path_in_cone_mode_sparse_checkout(item.original, &the_index) &&
> >> ++     strspn(item.original + item.nowildcard_len, "*") == item.len - item.nowildcard_len)
> >
> > I usually expect in an &&-chain to see the cheaper function call first
> > (because that ordering often avoids the need to call the second
> > function), and I would presume that strspn() would be the cheaper of
> > the two. Did you switch the order because you expect the strspn call
> > to nearly always return true, though?
> >
>
> This is a miss on my part, the `strspn()` check is probably less expensive
> and should be first.
>
> > Could the strspn() call be replaced by a `item.len ==
> > item.nowildcard_len + 1`? I mean, sure, folks could list multiple
> > asterisks in a row in their pathspec, but that seems super unlikely
> > and even if it does happen the code will just fall back to the slower
> > codepath and still give them the right answer. And the simpler check
> > feels a lot easier to parse for human readers.
> >
>
> Agreed on wanting better readability - if the multiple-wildcard case is
> unlikely, the `PATHSPEC_ONESTAR` flag would indicate whether the pathspec
> ends in a single wildcard character. If that flag is still too obscure,
> though, I can stick with the length comparison.
>
> > But I'm worried there's a deeper issue here:
> >
> >
> > Is the wildcard character (or characters) in path treated as a literal
> > by path_in_cone_mode_sparse_checkout()? I think it is...and I'm
> > worried that may be incorrect. For example, if the path is
> >
> >    foo/*
> >
> > and the user has done a
> >
> >    git sparse-checkout set foo/bar/
> >
> > Then 'foo/baz/file' is not in the sparse checkout. However, 'foo/*'
> > should match 'foo/baz/file' and yet 'foo/*' when treated as a literal
> > path would be considered in the sparse checkout by
> > path_in_cone_mode_sparse_checkout. Does this result in the code
> > returning an incorrect answer? (Or did I misunderstand something so
> > far?)
> >
>
> Correct: `path_in_cone_mode_sparse_checkout` interprets the wildcard
> literally, and the checks here take that into account. The goal of
> `pathspec_needs_expanded_index` is to determine if the pathspec *may* match
> only partial contents of a sparse directory (like '*.c', or 'f*le'). For a
> `git reset --mixed`, only this scenario requires expansion; if an entire
> sparse directory is matched by a pathspec, the entire sparse directory is
> reset.
>
> Using your example, 'foo/*' does match 'foo/baz/file', but it also matches
> 'foo/' itself; as a result, the `foo/` sparse directory index entry is reset
> (rather than some individual files contained within it). The same goes for a
> pathspec like 'fo*' ("in-cone" and ending in a wildcard). Conversely, a
> pathspec like 'foo/ba*' would _not_ work (it wouldn't match something like
> 'foo/test-file'), and neither would 'f*o' (it would match all of 'foo', but
> would only match files ending in "o" in a directory 'f/').
>
> Hope that helps!

Ah, yes, thanks for the explanation. :-)

> > I'm wondering if I misled you earlier in my musings about whether we
> > could avoid the slow codepath for pathspecs with wildcard characters.
> > Maybe there's no safe optimization here and wildcard characters should
> > always go through the slower codepath.
> >
> >> ++ continue;
> >> ++
> >> + for (pos = 0; pos < active_nr; pos++) {
> >> + struct cache_entry *ce = active_cache[pos];
> >> +
> >> 8: f91d1dcf024 = 8: ddd97fb2837 unpack-trees: improve performance of next_cache_entry
> >>
> >> --
> >> gitgitgadget
>
On Mon, Nov 29 2021, Victoria Dye wrote:

> Elijah Newren wrote:
>> Hi,
>>
>> On Mon, Nov 29, 2021 at 7:52 AM Victoria Dye via GitGitGadget
>> <gitgitgadget@gmail.com> wrote:
>>>
>>> Changes since V5
>>> ================
>>>
>>> * Update t1092 test 'reset with wildcard pathspec' with more cases and
>>>   better checks
>>> * Add "special case" wildcard pathspec check when determining whether to
>>>   expand the index (avoids double-loop over pathspecs & index entries)
>>
>> Looks pretty good. However, I'm worried this special case you added
>> at my prodding might be problematic, and that I may have been wrong to
>> prod you into it...
>>
>>> Thanks! -Victoria
>>>
>>> Range-diff vs v5:
>>>
>>> 7: a9135a5ed64 ! 7: 822d7344587 reset: make --mixed sparse-aware
>>> @@ Commit message
>>>
>>> Remove the `ensure_full_index` guard on `read_from_tree` and update `git
>>> reset --mixed` to ensure it can use sparse directory index entries wherever
>>> - possible. Sparse directory entries are reset use `diff_tree_oid`, which
>>> + possible. Sparse directory entries are reset using `diff_tree_oid`, which
>>> requires `change` and `add_remove` functions to process the internal
>>> contents of the sparse directory. The `recursive` diff option handles cases
>>> in which `reset --mixed` must diff/merge files that are nested multiple
>>> @@ builtin/reset.c: static void update_index_from_diff(struct diff_queue_struct *q,
>>> + * (since we can reset whole sparse directories without expanding them).
>>> + */
>>> + if (item.nowildcard_len < item.len) {
>>> ++ /*
>>> ++ * Special case: if the pattern is a path inside the cone
>>> ++ * followed by only wildcards, the pattern cannot match
>>> ++ * partial sparse directories, so we don't expand the index.
>>> ++ */
>>> ++ if (path_in_cone_mode_sparse_checkout(item.original, &the_index) &&
>>> ++     strspn(item.original + item.nowildcard_len, "*") == item.len - item.nowildcard_len)
>>
>> I usually expect in an &&-chain to see the cheaper function call first
>> (because that ordering often avoids the need to call the second
>> function), and I would presume that strspn() would be the cheaper of
>> the two. Did you switch the order because you expect the strspn call
>> to nearly always return true, though?
>>
>
> This is a miss on my part, the `strspn()` check is probably less expensive
> and should be first.

I doubt it matters either way, and I didn't look into this to any degree
of carefulness.

But having followed the breadcrumb trail from the "What's Cooking"
discussion & looked at the code, one thing that stuck out for me was that
path_in_cone_mode_sparse_checkout() appears to return 1 unconditionally in
some cases based on global state:

	/*
	 * We default to accepting a path if there are no patterns or
	 * they are of the wrong type.
	 */
	if (init_sparse_checkout_patterns(istate) ||
	    (require_cone_mode &&
	     !istate->sparse_checkout_patterns->use_cone_patterns))
		return 1;

So more so than the nano-optimization of strspn()
vs. path_in_cone_mode_sparse_checkout() I found it a bit odd that we're
calling something in a loop where presumably we can punt out a lot
earlier, and at least make that "continue" a "break" or "return" in that
case.

I.e. something in this direction (this patch obviously doesn't even
compile, but should clarify what I'm blathering about :); but again, I
really haven't looked at this properly, so just food for thought:

diff --git a/builtin/reset.c b/builtin/reset.c
index b1ff699b43a..cefdabb09c2 100644
--- a/builtin/reset.c
+++ b/builtin/reset.c
@@ -187,6 +187,9 @@ static int pathspec_needs_expanded_index(const struct pathspec *pathspec)
 	if (pathspec->magic)
 		return 1;
 
+	if (cant_possibly_have_path_in_cone_mode_blah_blah(&the_index))
+		return 1;
+
 	for (i = 0; i < pathspec->nr; i++) {
 		struct pathspec_item item = pathspec->items[i];
 
diff --git a/dir.c b/dir.c
index 5aa6fbad0b7..19f2d989dd3 100644
--- a/dir.c
+++ b/dir.c
@@ -1456,14 +1456,8 @@ int init_sparse_checkout_patterns(struct index_state *istate)
 	return 0;
 }
 
-static int path_in_sparse_checkout_1(const char *path,
-				     struct index_state *istate,
-				     int require_cone_mode)
+int cant_possibly_have_path_in_cone_mode_blah_blah(...)
 {
-	int dtype = DT_REG;
-	enum pattern_match_result match = UNDECIDED;
-	const char *end, *slash;
-
 	/*
 	 * We default to accepting a path if there are no patterns or
 	 * they are of the wrong type.
@@ -1472,6 +1466,16 @@ static int path_in_sparse_checkout_1(const char *path,
 	    (require_cone_mode &&
 	     !istate->sparse_checkout_patterns->use_cone_patterns))
 		return 1;
+}
+
+
+static int path_in_sparse_checkout_1(const char *path,
+				     struct index_state *istate,
+				     int require_cone_mode)
+{
+	int dtype = DT_REG;
+	enum pattern_match_result match = UNDECIDED;
+	const char *end, *slash;
 
 	/*
 	 * If UNDECIDED, use the match from the parent dir (recursively), or