* [RFC PATCH 0/3] grep: honor sparse checkout and add option to ignore it
@ 2020-03-24  6:04 Matheus Tavares
  2020-03-24  6:11 ` [RFC PATCH 1/3] doc: grep: unify info on configuration variables Matheus Tavares
                   ` (3 more replies)
  0 siblings, 4 replies; 123+ messages in thread
From: Matheus Tavares @ 2020-03-24  6:04 UTC (permalink / raw)
  To: git; +Cc: dstolee, newren

This series is based on the discussions we had some months ago[1], about
git-grep not currently honoring the sparsity patterns. To summarize, the
idea is that, since a sparse checkout is used to limit the set of files
in which users are interested, git-grep should, by default, only search
within this boundary. But it would be good to also have an
'--ignore-sparsity' option, to restore the old behavior when needed, as
there are also valid use cases for it. The following patches seek to
address these suggestions.

The first patch is not really related, it is a cleanup, used by the
third one.

[1]: https://lore.kernel.org/git/CAHd-oW7e5qCuxZLBeVDq+Th3E+E4+P8=WzJfK8WcG2yz=n_nag@mail.gmail.com/t/#u

Matheus Tavares (3):
  doc: grep: unify info on configuration variables
  grep: honor sparse checkout patterns
  grep: add option to ignore sparsity patterns

 Documentation/config/grep.txt    |  10 ++-
 Documentation/git-grep.txt       |  40 +++-------
 builtin/grep.c                   |  36 ++++++++-
 t/t7011-skip-worktree-reading.sh |   9 ---
 t/t7817-grep-sparse-checkout.sh  | 130 +++++++++++++++++++++++++++++++
 5 files changed, 180 insertions(+), 45 deletions(-)
 create mode 100755 t/t7817-grep-sparse-checkout.sh

--
2.25.1

^ permalink raw reply	[flat|nested] 123+ messages in thread
* [RFC PATCH 1/3] doc: grep: unify info on configuration variables 2020-03-24 6:04 [RFC PATCH 0/3] grep: honor sparse checkout and add option to ignore it Matheus Tavares @ 2020-03-24 6:11 ` Matheus Tavares 2020-03-24 7:57 ` Elijah Newren 2020-03-24 6:12 ` [RFC PATCH 2/3] grep: honor sparse checkout patterns Matheus Tavares ` (2 subsequent siblings) 3 siblings, 1 reply; 123+ messages in thread From: Matheus Tavares @ 2020-03-24 6:11 UTC (permalink / raw) To: git; +Cc: dstolee, newren, sandals Explanations about the configuration variables for git-grep are duplicated in "Documentation/git-grep.txt" and "Documentation/config/grep.txt". Let's unify the information in the second file and include it in the first. Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br> --- Documentation/config/grep.txt | 7 +++++-- Documentation/git-grep.txt | 35 +++++------------------------------ 2 files changed, 10 insertions(+), 32 deletions(-) diff --git a/Documentation/config/grep.txt b/Documentation/config/grep.txt index 44abe45a7c..76689771aa 100644 --- a/Documentation/config/grep.txt +++ b/Documentation/config/grep.txt @@ -16,8 +16,11 @@ grep.extendedRegexp:: other than 'default'. grep.threads:: - Number of grep worker threads to use. - See `grep.threads` in linkgit:git-grep[1] for more information. + Number of grep worker threads to use. See `--threads` in + linkgit:git-grep[1] for more information. + +grep.fullName:: + If set to true, enable `--full-name` option by default. grep.fallbackToNoIndex:: If set to true, fall back to git grep --no-index if git grep diff --git a/Documentation/git-grep.txt b/Documentation/git-grep.txt index ddb6acc025..97e25d7b1b 100644 --- a/Documentation/git-grep.txt +++ b/Documentation/git-grep.txt @@ -41,34 +41,7 @@ characters. An empty string as search expression matches all lines. CONFIGURATION ------------- -grep.lineNumber:: - If set to true, enable `-n` option by default. 
- -grep.column:: - If set to true, enable the `--column` option by default. - -grep.patternType:: - Set the default matching behavior. Using a value of 'basic', 'extended', - 'fixed', or 'perl' will enable the `--basic-regexp`, `--extended-regexp`, - `--fixed-strings`, or `--perl-regexp` option accordingly, while the - value 'default' will return to the default matching behavior. - -grep.extendedRegexp:: - If set to true, enable `--extended-regexp` option by default. This - option is ignored when the `grep.patternType` option is set to a value - other than 'default'. - -grep.threads:: - Number of grep worker threads to use. If unset (or set to 0), Git will - use as many threads as the number of logical cores available. - -grep.fullName:: - If set to true, enable `--full-name` option by default. - -grep.fallbackToNoIndex:: - If set to true, fall back to git grep --no-index if git grep - is executed outside of a git repository. Defaults to false. - +include::config/grep.txt[] OPTIONS ------- @@ -267,8 +240,10 @@ providing this option will cause it to die. found. --threads <num>:: - Number of grep worker threads to use. - See `grep.threads` in 'CONFIGURATION' for more information. + Number of grep worker threads to use. If not provided (or set to + 0), Git will use as many worker threads as the number of logical + cores available. The default value can also be set with the + `grep.threads` configuration (see linkgit:git-config[1]). -f <file>:: Read patterns from <file>, one per line. -- 2.25.1 ^ permalink raw reply related [flat|nested] 123+ messages in thread
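
As a rough illustration of the `--threads` wording in the patch above ("if not provided (or set to 0), Git will use as many worker threads as the number of logical cores available"), the fallback could be sketched as follows. Both helper names are hypothetical, and the use of `sysconf(_SC_NPROCESSORS_ONLN)` is only an assumption about a common Unix interface; it is not claimed to be how Git itself counts cores.

```c
#include <unistd.h>

/*
 * Hypothetical helper: number of logical cores currently online,
 * never less than 1. _SC_NPROCESSORS_ONLN is a widely available
 * extension, assumed here for illustration only.
 */
static int toy_online_cpus(void)
{
	long n = sysconf(_SC_NPROCESSORS_ONLN);
	return n > 0 ? (int)n : 1;
}

/*
 * Hypothetical helper: a positive request wins; 0 (the "unset" value)
 * falls back to the core count. Negative values are treated like 0
 * here purely for brevity.
 */
static int toy_resolve_threads(int requested, int online_cpus)
{
	if (requested > 0)
		return requested;
	return online_cpus > 0 ? online_cpus : 1;
}
```

A caller would combine the two as `toy_resolve_threads(requested, toy_online_cpus())`, where `requested` comes from `--threads` or, failing that, from `grep.threads`.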
* Re: [RFC PATCH 1/3] doc: grep: unify info on configuration variables 2020-03-24 6:11 ` [RFC PATCH 1/3] doc: grep: unify info on configuration variables Matheus Tavares @ 2020-03-24 7:57 ` Elijah Newren 2020-03-24 21:26 ` Junio C Hamano 0 siblings, 1 reply; 123+ messages in thread From: Elijah Newren @ 2020-03-24 7:57 UTC (permalink / raw) To: Matheus Tavares; +Cc: Git Mailing List, Derrick Stolee, brian m. carlson On Mon, Mar 23, 2020 at 11:11 PM Matheus Tavares <matheus.bernardino@usp.br> wrote: > > Explanations about the configuration variables for git-grep are > duplicated in "Documentation/git-grep.txt" and > "Documentation/config/grep.txt". Let's unify the information in the > second file and include it in the first. > > Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br> > --- > Documentation/config/grep.txt | 7 +++++-- > Documentation/git-grep.txt | 35 +++++------------------------------ > 2 files changed, 10 insertions(+), 32 deletions(-) > > diff --git a/Documentation/config/grep.txt b/Documentation/config/grep.txt > index 44abe45a7c..76689771aa 100644 > --- a/Documentation/config/grep.txt > +++ b/Documentation/config/grep.txt > @@ -16,8 +16,11 @@ grep.extendedRegexp:: > other than 'default'. > > grep.threads:: > - Number of grep worker threads to use. > - See `grep.threads` in linkgit:git-grep[1] for more information. > + Number of grep worker threads to use. See `--threads` in > + linkgit:git-grep[1] for more information. > + > +grep.fullName:: > + If set to true, enable `--full-name` option by default. > > grep.fallbackToNoIndex:: > If set to true, fall back to git grep --no-index if git grep > diff --git a/Documentation/git-grep.txt b/Documentation/git-grep.txt > index ddb6acc025..97e25d7b1b 100644 > --- a/Documentation/git-grep.txt > +++ b/Documentation/git-grep.txt > @@ -41,34 +41,7 @@ characters. An empty string as search expression matches all lines. 
> CONFIGURATION > ------------- > > -grep.lineNumber:: > - If set to true, enable `-n` option by default. > - > -grep.column:: > - If set to true, enable the `--column` option by default. > - > -grep.patternType:: > - Set the default matching behavior. Using a value of 'basic', 'extended', > - 'fixed', or 'perl' will enable the `--basic-regexp`, `--extended-regexp`, > - `--fixed-strings`, or `--perl-regexp` option accordingly, while the > - value 'default' will return to the default matching behavior. > - > -grep.extendedRegexp:: > - If set to true, enable `--extended-regexp` option by default. This > - option is ignored when the `grep.patternType` option is set to a value > - other than 'default'. > - > -grep.threads:: > - Number of grep worker threads to use. If unset (or set to 0), Git will > - use as many threads as the number of logical cores available. > - > -grep.fullName:: > - If set to true, enable `--full-name` option by default. > - > -grep.fallbackToNoIndex:: > - If set to true, fall back to git grep --no-index if git grep > - is executed outside of a git repository. Defaults to false. > - > +include::config/grep.txt[] > > OPTIONS > ------- > @@ -267,8 +240,10 @@ providing this option will cause it to die. > found. > > --threads <num>:: > - Number of grep worker threads to use. > - See `grep.threads` in 'CONFIGURATION' for more information. > + Number of grep worker threads to use. If not provided (or set to > + 0), Git will use as many worker threads as the number of logical > + cores available. The default value can also be set with the > + `grep.threads` configuration (see linkgit:git-config[1]). I'm possibly showing my ignorance here, but doesn't the "include::config/grep.txt[]" you added above mean that the user doesn't have to see an external manpage but can see the definition earlier within this same manpage? > > -f <file>:: > Read patterns from <file>, one per line. > -- > 2.25.1 > ^ permalink raw reply [flat|nested] 123+ messages in thread
* Re: [RFC PATCH 1/3] doc: grep: unify info on configuration variables 2020-03-24 7:57 ` Elijah Newren @ 2020-03-24 21:26 ` Junio C Hamano 2020-03-24 23:38 ` Matheus Tavares 0 siblings, 1 reply; 123+ messages in thread From: Junio C Hamano @ 2020-03-24 21:26 UTC (permalink / raw) To: Elijah Newren Cc: Matheus Tavares, Git Mailing List, Derrick Stolee, brian m. carlson Elijah Newren <newren@gmail.com> writes: >> diff --git a/Documentation/config/grep.txt b/Documentation/config/grep.txt >> index 44abe45a7c..76689771aa 100644 >> --- a/Documentation/config/grep.txt >> +++ b/Documentation/config/grep.txt >> @@ -16,8 +16,11 @@ grep.extendedRegexp:: >> ... >> + Number of grep worker threads to use. See `--threads` in >> + linkgit:git-grep[1] for more information. >> ... >> diff --git a/Documentation/git-grep.txt b/Documentation/git-grep.txt >> index ddb6acc025..97e25d7b1b 100644 >> --- a/Documentation/git-grep.txt >> +++ b/Documentation/git-grep.txt >> @@ -41,34 +41,7 @@ characters. An empty string as search expression matches all lines. >> ... >> +include::config/grep.txt[] >> ... >> --threads <num>:: >> - Number of grep worker threads to use. >> - See `grep.threads` in 'CONFIGURATION' for more information. >> + Number of grep worker threads to use. If not provided (or set to >> + 0), Git will use as many worker threads as the number of logical >> + cores available. The default value can also be set with the >> + `grep.threads` configuration (see linkgit:git-config[1]). > > I'm possibly showing my ignorance here, but doesn't the > "include::config/grep.txt[]" you added above mean that the user > doesn't have to see an external manpage but can see the definition > earlier within this same manpage? I think so. Also, the new reference "See `--threads` in git-grep" added to grep.threads to config/grep.txt would become somewhat redundant in the context of "git grep --help" (only "See --threads" is relevant when it appears in this same manual page). 
Readers who find the reference in "git config --help" still need to see that --threads is an option to git-grep, though. ^ permalink raw reply [flat|nested] 123+ messages in thread
* Re: [RFC PATCH 1/3] doc: grep: unify info on configuration variables 2020-03-24 21:26 ` Junio C Hamano @ 2020-03-24 23:38 ` Matheus Tavares 0 siblings, 0 replies; 123+ messages in thread From: Matheus Tavares @ 2020-03-24 23:38 UTC (permalink / raw) To: gitster; +Cc: dstolee, git, matheus.bernardino, newren, sandals On Tue, Mar 24, 2020 at 6:26 PM Junio C Hamano <gitster@pobox.com> wrote: > > Elijah Newren <newren@gmail.com> writes: > > >> diff --git a/Documentation/config/grep.txt b/Documentation/config/grep.txt > >> index 44abe45a7c..76689771aa 100644 > >> --- a/Documentation/config/grep.txt > >> +++ b/Documentation/config/grep.txt > >> @@ -16,8 +16,11 @@ grep.extendedRegexp:: > >> ... > >> + Number of grep worker threads to use. See `--threads` in > >> + linkgit:git-grep[1] for more information. > >> ... > >> diff --git a/Documentation/git-grep.txt b/Documentation/git-grep.txt > >> index ddb6acc025..97e25d7b1b 100644 > >> --- a/Documentation/git-grep.txt > >> +++ b/Documentation/git-grep.txt > >> @@ -41,34 +41,7 @@ characters. An empty string as search expression matches all lines. > >> ... > >> +include::config/grep.txt[] > >> ... > >> --threads <num>:: > >> - Number of grep worker threads to use. > >> - See `grep.threads` in 'CONFIGURATION' for more information. > >> + Number of grep worker threads to use. If not provided (or set to > >> + 0), Git will use as many worker threads as the number of logical > >> + cores available. The default value can also be set with the > >> + `grep.threads` configuration (see linkgit:git-config[1]). > > > > I'm possibly showing my ignorance here, but doesn't the > > "include::config/grep.txt[]" you added above mean that the user > > doesn't have to see an external manpage but can see the definition > > earlier within this same manpage? You are right. I added the "(see linkgit:git-config[1])" here more as a reference to the config system itself (for a user that is possibly not familiar with git-config). 
But if this is not necessary, we can remove the reference. > I think so. Also, the new reference "See `--threads` in git-grep" > added to grep.threads to config/grep.txt would become somewhat > redundant in the context of "git grep --help" (only "See --threads" > is relevant when it appears in this same manual page). Thanks for pointing that out. I think we can solve this issue with the following: diff --git a/Documentation/config/grep.txt b/Documentation/config/grep.txt index c1d49484c8..ac06db4206 100644 --- a/Documentation/config/grep.txt +++ b/Documentation/config/grep.txt @@ -16,8 +16,11 @@ grep.extendedRegexp:: other than 'default'. grep.threads:: - Number of grep worker threads to use. See `--threads` in - linkgit:git-grep[1] for more information. + Number of grep worker threads to use. See `--threads` +ifndef::git-grep[] + in linkgit:git-grep[1] +endif::git-grep[] + for more information. grep.fullName:: If set to true, enable `--full-name` option by default. diff --git a/Documentation/git-grep.txt b/Documentation/git-grep.txt index 5c5c66c056..192aab4cba 100644 --- a/Documentation/git-grep.txt +++ b/Documentation/git-grep.txt @@ -41,6 +41,7 @@ characters. An empty string as search expression matches all lines. CONFIGURATION ------------- +:git-grep: 1 include::config/grep.txt[] OPTIONS I will add these changes in v2. ^ permalink raw reply related [flat|nested] 123+ messages in thread
* [RFC PATCH 2/3] grep: honor sparse checkout patterns
  2020-03-24  6:04 [RFC PATCH 0/3] grep: honor sparse checkout and add option to ignore it Matheus Tavares
  2020-03-24  6:11 ` [RFC PATCH 1/3] doc: grep: unify info on configuration variables Matheus Tavares
@ 2020-03-24  6:12 ` Matheus Tavares
  2020-03-24  7:15   ` Elijah Newren
  2020-03-24  6:13 ` [RFC PATCH 3/3] grep: add option to ignore sparsity patterns Matheus Tavares
  2020-05-10  0:41 ` [RFC PATCH v2 0/4] grep: honor sparse checkout and add option to ignore it Matheus Tavares
  3 siblings, 1 reply; 123+ messages in thread
From: Matheus Tavares @ 2020-03-24  6:12 UTC (permalink / raw)
  To: git; +Cc: dstolee, newren, sandals, stefanbeller

One of the main uses for a sparse checkout is to allow users to focus on
the subset of files in a repository in which they are interested. But
git-grep currently ignores the sparsity patterns and reports all matches
found outside this subset, which kind of goes in the opposite direction.
Let's fix that, making it honor the sparsity boundaries for every
grepping case:

- git grep in worktree
- git grep --cached
- git grep $REVISION
- git grep --untracked and git grep --no-index (which already respect
  sparse checkout boundaries)

This is also what some users reported[1] they would want as the default
behavior.

Note: for `git grep $REVISION`, we will choose to honor the sparsity
patterns only when $REVISION is a commit-ish object. The reason is that,
for a tree, we don't know whether it represents the root of a
repository or a subtree. So we wouldn't be able to correctly match it
against the sparsity patterns. E.g. suppose we have a repository with
these two sparsity rules: "/*" and "!/a"; and the following structure:

/
| - a (file)
| - d (dir)
    | - a (file)

If `git grep $REVISION` were to honor the sparsity patterns for every
object type, when grepping the /d tree, we would wrongly ignore the /d/a
file. This happens because we wouldn't know it resides in /d and
therefore it would wrongly match the pattern "!/a". Furthermore, for a
search in a blob object, we wouldn't even have a path to check the
patterns against. So, let's ignore the sparsity patterns when grepping
non-commit-ish objects (tags to commits should be fine).

Finally, the old behavior is still desirable for some use cases. So the
next patch will add an option to allow restoring it when needed.

[1]: https://lore.kernel.org/git/CABPp-BGuFhDwWZBRaD3nA8ui46wor-4=Ha1G1oApsfF8KNpfGQ@mail.gmail.com/

Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
---

Something I'm not entirely sure about in this patch is how we implement
the mechanism to honor sparsity for the `git grep <commit-ish>` case
(which is treated in the grep_tree() function). Currently, the patch
looks for an index entry that matches the path, and then checks its
skip_worktree bit. But this operation is performed in O(log(N)); N being
the number of index entries. If there are many entries (and not so many
sparsity patterns), maybe a better approach would be to try matching the
path directly against the sparsity patterns. This would be O(M) in the
number of patterns, and it could be done, in builtin/grep.c, with a
function like the following:

static struct pattern_list sparsity_patterns;
static int sparsity_patterns_initialized = 0;
static enum pattern_match_result path_matches_sparsity_patterns(
					const char *path, int pathlen,
					const char *basename,
					struct repository *repo)
{
	int dtype = DT_UNKNOWN;

	if (!sparsity_patterns_initialized) {
		char *sparse_file = git_pathdup("info/sparse-checkout");
		int ret;

		memset(&sparsity_patterns, 0, sizeof(sparsity_patterns));
		sparsity_patterns.use_cone_patterns = core_sparse_checkout_cone;
		ret = add_patterns_from_file_to_list(sparse_file, "", 0,
						     &sparsity_patterns, NULL);
		free(sparse_file);

		if (ret < 0)
			die(_("failed to load sparse-checkout patterns"));
		sparsity_patterns_initialized = 1;
	}

	return path_matches_pattern_list(path, pathlen, basename, &dtype,
					 &sparsity_patterns, repo->index);
}

Also, if I understand correctly, the index doesn't hold paths to dirs,
right? So even if a complete dir is excluded from sparse checkout, we
still have to check all its subentries, only to discover that they
should all be skipped from the search. However, if we were to check
against the sparsity patterns directly (e.g. with the function above),
we could skip such directories together with all their entries.

Oh, and there is also the case of a commit whose tree paths are not in
the index (maybe manually created objects?). For such commits, with the
index lookup approach, we would have to fall back on ignoring the
sparsity rules. I'm not sure if that would be OK, though.

Any thoughts on these two approaches (looking up the skip_worktree bit
in the index or directly matching against sparsity patterns) will be
highly appreciated. (Note that it only concerns the `git grep
<commit-ish>` case. The other cases already iterate through the index,
so there is no O(log(N)) extra complexity).

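
To make the cost discussion in the notes above a little more concrete, here is a small stand-alone sketch of the index-lookup approach: a binary search over a name-sorted array, O(log N) per path, returning the entry's skip-worktree bit. `toy_entry` and `toy_path_is_skipped` are hypothetical stand-ins invented for this illustration; they are not Git's `struct cache_entry` or `index_name_pos()`.

```c
#include <string.h>

/* Hypothetical stand-in for an index entry: a path plus its skip bit. */
struct toy_entry {
	const char *name;
	int skip_worktree;
};

/*
 * Binary search over entries sorted by name. Returns the skip bit of
 * the matching entry, or 0 when the path is not present at all, which
 * mirrors the "pos >= 0 &&" fallback in the patch: unknown paths are
 * searched rather than skipped.
 */
static int toy_path_is_skipped(const struct toy_entry *entries, int nr,
			       const char *path)
{
	int lo = 0, hi = nr;

	while (lo < hi) {
		int mid = lo + (hi - lo) / 2;
		int cmp = strcmp(path, entries[mid].name);

		if (!cmp)
			return entries[mid].skip_worktree;
		if (cmp < 0)
			hi = mid;
		else
			lo = mid + 1;
	}
	return 0;
}
```

With entries "a", "b" and "dir/c", a lookup of "dir" finds nothing, because only file paths are stored. This is the limitation raised above: an excluded directory cannot be pruned by this lookup alone, so each of its entries has to be checked individually.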
builtin/grep.c | 29 ++++++++--- t/t7011-skip-worktree-reading.sh | 9 ---- t/t7817-grep-sparse-checkout.sh | 88 ++++++++++++++++++++++++++++++++ 3 files changed, 111 insertions(+), 15 deletions(-) create mode 100755 t/t7817-grep-sparse-checkout.sh diff --git a/builtin/grep.c b/builtin/grep.c index 99e2685090..52ec72a036 100644 --- a/builtin/grep.c +++ b/builtin/grep.c @@ -388,7 +388,7 @@ static int grep_cache(struct grep_opt *opt, const struct pathspec *pathspec, int cached); static int grep_tree(struct grep_opt *opt, const struct pathspec *pathspec, struct tree_desc *tree, struct strbuf *base, int tn_len, - int check_attr); + int from_commit); static int grep_submodule(struct grep_opt *opt, const struct pathspec *pathspec, @@ -486,6 +486,10 @@ static int grep_cache(struct grep_opt *opt, for (nr = 0; nr < repo->index->cache_nr; nr++) { const struct cache_entry *ce = repo->index->cache[nr]; + + if (ce_skip_worktree(ce)) + continue; + strbuf_setlen(&name, name_base_len); strbuf_addstr(&name, ce->name); @@ -498,8 +502,7 @@ static int grep_cache(struct grep_opt *opt, * cache entry are identical, even if worktree file has * been modified, so use cache version instead */ - if (cached || (ce->ce_flags & CE_VALID) || - ce_skip_worktree(ce)) { + if (cached || (ce->ce_flags & CE_VALID)) { if (ce_stage(ce) || ce_intent_to_add(ce)) continue; hit |= grep_oid(opt, &ce->oid, name.buf, @@ -532,7 +535,7 @@ static int grep_cache(struct grep_opt *opt, static int grep_tree(struct grep_opt *opt, const struct pathspec *pathspec, struct tree_desc *tree, struct strbuf *base, int tn_len, - int check_attr) + int from_commit) { struct repository *repo = opt->repo; int hit = 0; @@ -546,6 +549,9 @@ static int grep_tree(struct grep_opt *opt, const struct pathspec *pathspec, name_base_len = name.len; } + if (from_commit && repo_read_index(repo) < 0) + die(_("index file corrupt")); + while (tree_entry(tree, &entry)) { int te_len = tree_entry_len(&entry); @@ -564,9 +570,20 @@ static int 
grep_tree(struct grep_opt *opt, const struct pathspec *pathspec, strbuf_add(base, entry.path, te_len); + if (from_commit) { + int pos = index_name_pos(repo->index, + base->buf + tn_len, + base->len - tn_len); + if (pos >= 0 && + ce_skip_worktree(repo->index->cache[pos])) { + strbuf_setlen(base, old_baselen); + continue; + } + } + if (S_ISREG(entry.mode)) { hit |= grep_oid(opt, &entry.oid, base->buf, tn_len, - check_attr ? base->buf + tn_len : NULL); + from_commit ? base->buf + tn_len : NULL); } else if (S_ISDIR(entry.mode)) { enum object_type type; struct tree_desc sub; @@ -581,7 +598,7 @@ static int grep_tree(struct grep_opt *opt, const struct pathspec *pathspec, strbuf_addch(base, '/'); init_tree_desc(&sub, data, size); hit |= grep_tree(opt, pathspec, &sub, base, tn_len, - check_attr); + from_commit); free(data); } else if (recurse_submodules && S_ISGITLINK(entry.mode)) { hit |= grep_submodule(opt, pathspec, &entry.oid, diff --git a/t/t7011-skip-worktree-reading.sh b/t/t7011-skip-worktree-reading.sh index 37525cae3a..26852586ac 100755 --- a/t/t7011-skip-worktree-reading.sh +++ b/t/t7011-skip-worktree-reading.sh @@ -109,15 +109,6 @@ test_expect_success 'ls-files --modified' ' test -z "$(git ls-files -m)" ' -test_expect_success 'grep with skip-worktree file' ' - git update-index --no-skip-worktree 1 && - echo test > 1 && - git update-index 1 && - git update-index --skip-worktree 1 && - rm 1 && - test "$(git grep --no-ext-grep test)" = "1:test" -' - echo ":000000 100644 $ZERO_OID $EMPTY_BLOB A 1" > expected test_expect_success 'diff-index does not examine skip-worktree absent entries' ' setup_absent && diff --git a/t/t7817-grep-sparse-checkout.sh b/t/t7817-grep-sparse-checkout.sh new file mode 100755 index 0000000000..fccf44e829 --- /dev/null +++ b/t/t7817-grep-sparse-checkout.sh @@ -0,0 +1,88 @@ +#!/bin/sh + +test_description='grep in sparse checkout + +This test creates the following dir structure: +. 
+| - a +| - b +| - dir + | - c + +Only "a" should be present due to the sparse checkout patterns: +"/*", "!/b" and "!/dir". +' + +. ./test-lib.sh + +test_expect_success 'setup' ' + echo "text" >a && + echo "text" >b && + mkdir dir && + echo "text" >dir/c && + git add a b dir && + git commit -m "initial commit" && + git tag -am t-commit t-commit HEAD && + tree=$(git rev-parse HEAD^{tree}) && + git tag -am t-tree t-tree $tree && + cat >.git/info/sparse-checkout <<-EOF && + /* + !/b + !/dir + EOF + git sparse-checkout init && + test_path_is_missing b && + test_path_is_missing dir && + test_path_is_file a +' + +test_expect_success 'grep in working tree should honor sparse checkout' ' + cat >expect <<-EOF && + a:text + EOF + git grep "text" >actual && + test_cmp expect actual +' + +test_expect_success 'grep --cached should honor sparse checkout' ' + cat >expect <<-EOF && + a:text + EOF + git grep --cached "text" >actual && + test_cmp expect actual +' + +test_expect_success 'grep <commit-ish> should honor sparse checkout' ' + commit=$(git rev-parse HEAD) && + cat >expect_commit <<-EOF && + $commit:a:text + EOF + cat >expect_t-commit <<-EOF && + t-commit:a:text + EOF + git grep "text" $commit >actual_commit && + test_cmp expect_commit actual_commit && + git grep "text" t-commit >actual_t-commit && + test_cmp expect_t-commit actual_t-commit +' + +test_expect_success 'grep <tree-ish> should search outside sparse checkout' ' + commit=$(git rev-parse HEAD) && + tree=$(git rev-parse HEAD^{tree}) && + cat >expect_tree <<-EOF && + $tree:a:text + $tree:b:text + $tree:dir/c:text + EOF + cat >expect_t-tree <<-EOF && + t-tree:a:text + t-tree:b:text + t-tree:dir/c:text + EOF + git grep "text" $tree >actual_tree && + test_cmp expect_tree actual_tree && + git grep "text" t-tree >actual_t-tree && + test_cmp expect_t-tree actual_t-tree +' + +test_done -- 2.25.1 ^ permalink raw reply related [flat|nested] 123+ messages in thread
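
The commit message of patch 2/3 argues that a bare tree cannot be matched against rooted sparsity patterns, because its position in the repository is unknown. The toy matcher below is a deliberately simplified sketch of that argument. It understands only the two pattern shapes used in this thread ("/*" and rooted names such as "/a", optionally negated with "!"), and it lets the last matching pattern decide, as in gitignore-style pattern lists. The function names are hypothetical and this is not Git's pattern-matching code.

```c
#include <string.h>

/*
 * Toy check: does the rooted pattern `pat` (leading '!' already
 * removed, leading '/' assumed) cover `path`? Paths are relative to
 * the root that the caller believes it is in.
 */
static int toy_rooted_match(const char *pat, const char *path)
{
	size_t n;

	if (!strcmp(pat, "/*"))
		return 1;	/* every path lives under some top-level entry */
	pat++;			/* skip the leading '/' */
	n = strlen(pat);
	/* match the entry itself or anything below it */
	return !strncmp(path, pat, n) && (path[n] == '\0' || path[n] == '/');
}

/* The last matching pattern wins; '!' turns a match into an exclusion. */
static int toy_in_sparse_checkout(const char **patterns, int nr,
				  const char *path)
{
	int included = 0;
	int i;

	for (i = 0; i < nr; i++) {
		const char *pat = patterns[i];
		int negated = pat[0] == '!';

		if (negated)
			pat++;
		if (toy_rooted_match(pat, path))
			included = !negated;
	}
	return included;
}
```

With the rules "/*" and "!/a", the full path "d/a" is kept. If the /d tree is grepped on its own, the only path available is "a", which the same rules reject; that is the false exclusion the commit message describes. With the test's rules "/*", "!/b" and "!/dir", only "a" is kept, matching the expected output of the commit-ish tests.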
* Re: [RFC PATCH 2/3] grep: honor sparse checkout patterns
  2020-03-24  6:12 ` [RFC PATCH 2/3] grep: honor sparse checkout patterns Matheus Tavares
@ 2020-03-24  7:15   ` Elijah Newren
  2020-03-24 15:12     ` Derrick Stolee
  2020-03-24 22:55     ` Matheus Tavares Bernardino
  0 siblings, 2 replies; 123+ messages in thread
From: Elijah Newren @ 2020-03-24  7:15 UTC (permalink / raw)
  To: Matheus Tavares
  Cc: Git Mailing List, Derrick Stolee, brian m. carlson, Stefan Beller

Hi Matheus,

On Mon, Mar 23, 2020 at 11:12 PM Matheus Tavares
<matheus.bernardino@usp.br> wrote:
>
> One of the main uses for a sparse checkout is to allow users to focus on
> the subset of files in a repository in which they are interested. But
> git-grep currently ignores the sparsity patterns and reports all matches
> found outside this subset, which kind of goes in the opposite direction.
> Let's fix that, making it honor the sparsity boundaries for every
> grepping case:
>
> - git grep in worktree
> - git grep --cached
> - git grep $REVISION

Wahoo! This is great.

> - git grep --untracked and git grep --no-index (which already respect
>   sparse checkout boundaries)
>
> This is also what some users reported[1] they would want as the default
> behavior.
>
> Note: for `git grep $REVISION`, we will choose to honor the sparsity
> patterns only when $REVISION is a commit-ish object. The reason is that,

Makes sense.

> for a tree, we don't know whether it represents the root of a
> repository or a subtree. So we wouldn't be able to correctly match it
> against the sparsity patterns. E.g. suppose we have a repository with
> these two sparsity rules: "/*" and "!/a"; and the following structure:
>
> /
> | - a (file)
> | - d (dir)
>     | - a (file)
>
> If `git grep $REVISION` were to honor the sparsity patterns for every
> object type, when grepping the /d tree, we would wrongly ignore the /d/a
> file. This happens because we wouldn't know it resides in /d and
> therefore it would wrongly match the pattern "!/a". Furthermore, for a
> search in a blob object, we wouldn't even have a path to check the
> patterns against. So, let's ignore the sparsity patterns when grepping
> non-commit-ish objects (tags to commits should be fine).
>
> Finally, the old behavior is still desirable for some use cases. So the
> next patch will add an option to allow restoring it when needed.
>
> [1]: https://lore.kernel.org/git/CABPp-BGuFhDwWZBRaD3nA8ui46wor-4=Ha1G1oApsfF8KNpfGQ@mail.gmail.com/
>
> Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
> ---
>
> Something I'm not entirely sure about in this patch is how we implement
> the mechanism to honor sparsity for the `git grep <commit-ish>` case
> (which is treated in the grep_tree() function). Currently, the patch
> looks for an index entry that matches the path, and then checks its
> skip_worktree

As you discuss below, checking the index is both wrong _and_ costly.
You should use the sparsity patterns; Stolee did a lot of work to make
those correspond to simple hashes you could check to determine whether
to even walk into a subdirectory. So, O(1). Yeah, that's "only" cone
mode but the non-cone sparsity patterns were a performance nightmare
waiting to rear its ugly head. We should just try to encourage
everyone to move to cone mode, or accept the slowness they get without
it.

> bit. But this operation is performed in O(log(N)); N being the number
> of index entries. If there are many entries (and not so many sparsity
> patterns), maybe a better approach would be to try matching the path
> directly against the sparsity patterns. This would be O(M) in the number
> of patterns, and it could be done, in builtin/grep.c, with a function
> like the following:
>
> static struct pattern_list sparsity_patterns;
> static int sparsity_patterns_initialized = 0;
> static enum pattern_match_result path_matches_sparsity_patterns(
>                                         const char *path, int pathlen,
>                                         const char *basename,
>                                         struct repository *repo)
> {
>         int dtype = DT_UNKNOWN;
>
>         if (!sparsity_patterns_initialized) {
>                 char *sparse_file = git_pathdup("info/sparse-checkout");
>                 int ret;
>
>                 memset(&sparsity_patterns, 0, sizeof(sparsity_patterns));
>                 sparsity_patterns.use_cone_patterns = core_sparse_checkout_cone;
>                 ret = add_patterns_from_file_to_list(sparse_file, "", 0,
>                                                      &sparsity_patterns, NULL);
>                 free(sparse_file);
>
>                 if (ret < 0)
>                         die(_("failed to load sparse-checkout patterns"));
>                 sparsity_patterns_initialized = 1;
>         }
>
>         return path_matches_pattern_list(path, pathlen, basename, &dtype,
>                                          &sparsity_patterns, repo->index);
> }
>
> Also, if I understand correctly, the index doesn't hold paths to dirs,
> right? So even if a complete dir is excluded from sparse checkout, we
> still have to check all its subentries, only to discover that they
> should all be skipped from the search. However, if we were to check
> against the sparsity patterns directly (e.g. with the function above),
> we could skip such directories together with all their entries.
>
> Oh, and there is also the case of a commit whose tree paths are not in
> the index (maybe manually created objects?). For such commits, with the
> index lookup approach, we would have to fall back on ignoring the
> sparsity rules. I'm not sure if that would be OK, though.
>
> Any thoughts on these two approaches (looking up the skip_worktree bit
> in the index or directly matching against sparsity patterns) will be
> highly appreciated. (Note that it only concerns the `git grep
> <commit-ish>` case. The other cases already iterate through the index,
> so there is no O(log(N)) extra complexity).
>
>  builtin/grep.c                   | 29 ++++++++---
>  t/t7011-skip-worktree-reading.sh |  9 ----
>  t/t7817-grep-sparse-checkout.sh  | 88 ++++++++++++++++++++++++++++++++
>  3 files changed, 111 insertions(+), 15 deletions(-)
>  create mode 100755 t/t7817-grep-sparse-checkout.sh
>
> diff --git a/builtin/grep.c b/builtin/grep.c
> index 99e2685090..52ec72a036 100644
> --- a/builtin/grep.c
> +++ b/builtin/grep.c
> @@ -388,7 +388,7 @@ static int grep_cache(struct grep_opt *opt,
>                       const struct pathspec *pathspec, int cached);
>  static int grep_tree(struct grep_opt *opt, const struct pathspec *pathspec,
>                      struct tree_desc *tree, struct strbuf *base, int tn_len,
> -                    int check_attr);
> +                    int from_commit);

I'm not familiar with grep.c and have to admit I don't know what
"check_attr" means. Slightly surprised to see you replace it, but
maybe reading the rest will explain...

>
>  static int grep_submodule(struct grep_opt *opt,
>                           const struct pathspec *pathspec,
> @@ -486,6 +486,10 @@ static int grep_cache(struct grep_opt *opt,
>
>         for (nr = 0; nr < repo->index->cache_nr; nr++) {
>                 const struct cache_entry *ce = repo->index->cache[nr];
> +
> +               if (ce_skip_worktree(ce))
> +                       continue;
> +

Looks good for the case where we are grepping through what's cached.

>                 strbuf_setlen(&name, name_base_len);
>                 strbuf_addstr(&name, ce->name);
>
> @@ -498,8 +502,7 @@ static int grep_cache(struct grep_opt *opt,
>                          * cache entry are identical, even if worktree file has
>                          * been modified, so use cache version instead
>                          */
> -                       if (cached || (ce->ce_flags & CE_VALID) ||
> -                           ce_skip_worktree(ce)) {
> +                       if (cached || (ce->ce_flags & CE_VALID)) {

I had the same change when I was trying to hack something like this
patch into place, but I only handled the worktree case before I
realized it was a somewhat bigger job.

> if (ce_stage(ce) || ce_intent_to_add(ce)) > continue; > hit |= grep_oid(opt, &ce->oid, name.buf, > @@ -532,7 +535,7 @@ static int grep_cache(struct grep_opt *opt, > > static int grep_tree(struct grep_opt *opt, const struct pathspec *pathspec, > struct tree_desc *tree, struct strbuf *base, int tn_len, > - int check_attr) > + int from_commit) > { > struct repository *repo = opt->repo; > int hit = 0; > @@ -546,6 +549,9 @@ static int grep_tree(struct grep_opt *opt, const struct pathspec *pathspec, > name_base_len = name.len; > } > > + if (from_commit && repo_read_index(repo) < 0) > + die(_("index file corrupt")); > + As above, I don't think we should need to read the index. We should compare to sparsity patterns, which in the important case (cone mode) simplifies to a hash lookup as we walk directories. > while (tree_entry(tree, &entry)) { > int te_len = tree_entry_len(&entry); > > @@ -564,9 +570,20 @@ static int grep_tree(struct grep_opt *opt, const struct pathspec *pathspec, > > strbuf_add(base, entry.path, te_len); > > + if (from_commit) { > + int pos = index_name_pos(repo->index, > + base->buf + tn_len, > + base->len - tn_len); > + if (pos >= 0 && > + ce_skip_worktree(repo->index->cache[pos])) { > + strbuf_setlen(base, old_baselen); > + continue; > + } > + } > + > if (S_ISREG(entry.mode)) { > hit |= grep_oid(opt, &entry.oid, base->buf, tn_len, > - check_attr ? base->buf + tn_len : NULL); > + from_commit ? base->buf + tn_len : NULL); Sadly, this doesn't help me understand check_attr or from_commit. Could you clue me in a bit? > } else if (S_ISDIR(entry.mode)) { > enum object_type type; > struct tree_desc sub; > @@ -581,7 +598,7 @@ static int grep_tree(struct grep_opt *opt, const struct pathspec *pathspec, > strbuf_addch(base, '/'); > init_tree_desc(&sub, data, size); > hit |= grep_tree(opt, pathspec, &sub, base, tn_len, > - check_attr); > + from_commit); Same. 
> free(data); > } else if (recurse_submodules && S_ISGITLINK(entry.mode)) { > hit |= grep_submodule(opt, pathspec, &entry.oid, > diff --git a/t/t7011-skip-worktree-reading.sh b/t/t7011-skip-worktree-reading.sh > index 37525cae3a..26852586ac 100755 > --- a/t/t7011-skip-worktree-reading.sh > +++ b/t/t7011-skip-worktree-reading.sh > @@ -109,15 +109,6 @@ test_expect_success 'ls-files --modified' ' > test -z "$(git ls-files -m)" > ' > > -test_expect_success 'grep with skip-worktree file' ' > - git update-index --no-skip-worktree 1 && > - echo test > 1 && > - git update-index 1 && > - git update-index --skip-worktree 1 && > - rm 1 && > - test "$(git grep --no-ext-grep test)" = "1:test" > -' > - > echo ":000000 100644 $ZERO_OID $EMPTY_BLOB A 1" > expected > test_expect_success 'diff-index does not examine skip-worktree absent entries' ' > setup_absent && > diff --git a/t/t7817-grep-sparse-checkout.sh b/t/t7817-grep-sparse-checkout.sh > new file mode 100755 > index 0000000000..fccf44e829 > --- /dev/null > +++ b/t/t7817-grep-sparse-checkout.sh > @@ -0,0 +1,88 @@ > +#!/bin/sh > + > +test_description='grep in sparse checkout > + > +This test creates the following dir structure: > +. > +| - a > +| - b > +| - dir > + | - c > + > +Only "a" should be present due to the sparse checkout patterns: > +"/*", "!/b" and "!/dir". > +' > + > +. ./test-lib.sh > + > +test_expect_success 'setup' ' > + echo "text" >a && > + echo "text" >b && > + mkdir dir && > + echo "text" >dir/c && > + git add a b dir && > + git commit -m "initial commit" && > + git tag -am t-commit t-commit HEAD && > + tree=$(git rev-parse HEAD^{tree}) && > + git tag -am t-tree t-tree $tree && > + cat >.git/info/sparse-checkout <<-EOF && > + /* > + !/b > + !/dir > + EOF > + git sparse-checkout init && Using `git sparse-checkout init` but then manually writing to .git/info/sparse-checkout? Seems like it'd make more sense to use `git sparse-checkout set` than writing the patterns directly yourself. 
Also, would prefer to have the examples use cone mode (even if you have to add subdirectories), as it makes the testcase a bit easier to read and more performant, though neither is a big deal. > + test_path_is_missing b && > + test_path_is_missing dir && > + test_path_is_file a > +' > + > +test_expect_success 'grep in working tree should honor sparse checkout' ' > + cat >expect <<-EOF && > + a:text > + EOF > + git grep "text" >actual && > + test_cmp expect actual > +' > + > +test_expect_success 'grep --cached should honor sparse checkout' ' > + cat >expect <<-EOF && > + a:text > + EOF > + git grep --cached "text" >actual && > + test_cmp expect actual > +' > + > +test_expect_success 'grep <commit-ish> should honor sparse checkout' ' > + commit=$(git rev-parse HEAD) && > + cat >expect_commit <<-EOF && > + $commit:a:text > + EOF > + cat >expect_t-commit <<-EOF && > + t-commit:a:text > + EOF > + git grep "text" $commit >actual_commit && > + test_cmp expect_commit actual_commit && > + git grep "text" t-commit >actual_t-commit && > + test_cmp expect_t-commit actual_t-commit > +' > + > +test_expect_success 'grep <tree-ish> should search outside sparse checkout' ' I think the test is fine but the title seems misleading. "outside" and "inside" aren't defined because <tree-ish> isn't known to be rooted, meaning we have no way to apply the sparsity patterns. So perhaps just 'grep <tree-ish> should ignore sparsity patterns'? > + commit=$(git rev-parse HEAD) && > + tree=$(git rev-parse HEAD^{tree}) && > + cat >expect_tree <<-EOF && > + $tree:a:text > + $tree:b:text > + $tree:dir/c:text > + EOF > + cat >expect_t-tree <<-EOF && > + t-tree:a:text > + t-tree:b:text > + t-tree:dir/c:text > + EOF > + git grep "text" $tree >actual_tree && > + test_cmp expect_tree actual_tree && > + git grep "text" t-tree >actual_t-tree && > + test_cmp expect_t-tree actual_t-tree > +' > + > +test_done > -- > 2.25.1 ^ permalink raw reply [flat|nested] 123+ messages in thread
* Re: [RFC PATCH 2/3] grep: honor sparse checkout patterns 2020-03-24 7:15 ` Elijah Newren @ 2020-03-24 15:12 ` Derrick Stolee 2020-03-24 16:16 ` Elijah Newren 2020-03-24 23:01 ` Matheus Tavares Bernardino 2020-03-24 22:55 ` Matheus Tavares Bernardino 1 sibling, 2 replies; 123+ messages in thread From: Derrick Stolee @ 2020-03-24 15:12 UTC (permalink / raw) To: Elijah Newren, Matheus Tavares Cc: Git Mailing List, Derrick Stolee, brian m. carlson, Stefan Beller On 3/24/2020 3:15 AM, Elijah Newren wrote: > Hi Matheus, > > On Mon, Mar 23, 2020 at 11:12 PM Matheus Tavares > <matheus.bernardino@usp.br> wrote: >> >> One of the main uses for a sparse checkout is to allow users to focus on >> the subset of files in a repository in which they are interested. But >> git-grep currently ignores the sparsity patterns and report all matches >> found outside this subset, which kind of goes in the oposity direction. >> Let's fix that, making it honor the sparsity boundaries for every >> grepping case: >> >> - git grep in worktree >> - git grep --cached >> - git grep $REVISION > > Wahoo! This is great. I am also excited. Also thrilled to see the option to get the old behavior in the next patch. >> Something I'm not entirely sure in this patch is how we implement the >> mechanism to honor sparsity for the `git grep <commit-ish>` case (which >> is treated in the grep_tree() function). Currently, the patch looks for >> an index entry that matches the path, and then checks its skip_worktree > > As you discuss below, checking the index is both wrong _and_ costly. I'm not sure why checking the index is _wrong_, but I agree about the performance cost. > You should use the sparsity patterns; Stolee did a lot of work to make > those correspond to simple hashes you could check to determine whether > to even walk into a subdirectory. So, O(1). Yeah, that's "only" cone > mode but the non-cone sparsity patterns were a performance nightmare > waiting to rear its ugly head. 
We should just try to encourage > everyone to move to cone mode, or accept the slowness they get without > it. > >> bit. But this operation is perfomed in O(log(N)); N being the number of >> index entries. If there are many entries (and no so many sparsity >> patterns), maybe a better approach would be to try matching the path >> directly against the sparsity patterns. This would be O(M) in the number >> of patterns, and it could be done, in builtin/grep.c, with a function >> like the following: >> >> static struct pattern_list sparsity_patterns; >> static int sparsity_patterns_initialized = 0; >> static enum pattern_match_result path_matches_sparsity_patterns( >> const char *path, int pathlen, >> const char *basename, >> struct repository *repo) >> { >> int dtype = DT_UNKNOWN; >> >> if (!sparsity_patterns_initialized) { >> char *sparse_file = git_pathdup("info/sparse-checkout"); >> int ret; >> >> memset(&sparsity_patterns, 0, sizeof(sparsity_patterns)); >> sparsity_patterns.use_cone_patterns = core_sparse_checkout_cone; >> ret = add_patterns_from_file_to_list(sparse_file, "", 0, >> &sparsity_patterns, NULL); >> free(sparse_file); >> >> if (ret < 0) >> die(_("failed to load sparse-checkout patterns")); >> sparsity_patterns_initialized = 1; >> } >> >> return path_matches_pattern_list(path, pathlen, basename, &dtype, >> &sparsity_patterns, repo->index); >> } >> >> Also, if I understand correctly, the index doesn't hold paths to dirs, >> right? So even if a complete dir is excluded from sparse checkout, we >> still have to check all its subentries, only to discover that they >> should all be skipped from the search. However, if we were to check >> against the sparsity patterns directly (e.g. with the function above), >> we could skip such directories together with all their entries. When in cone mode, we can check if a directory is one of these three modes: 1. Completely contained in the cone (recursive match) 2. Completely outside the cone 3. Neither. 
Keep matching subdirectories. (parent match) The clear_ce_flags() code in dir.c includes the matching algorithms for this. Hopefully you can re-use a lot of it. You may need to extract some methods to use them from the grep code. >> Oh, and there is also the case of a commit whose tree paths are not in >> the index (maybe manually created objects?). For such commits, with the >> index lookup approach, we would have to fall back on ignoring the >> sparsity rules. I'm not sure if that would be OK, though. >> >> Any thoughts on these two approaches (looking up the skip_worktree bit >> in the index or directly matching against sparsity patterns), will be >> highly appreciated. (Note that it only concerns the `git grep >> <commit-ish>` case. The other cases already iterate thought the index, so >> there is no O(log(N)) extra complexity). >> >> builtin/grep.c | 29 ++++++++--- >> t/t7011-skip-worktree-reading.sh | 9 ---- >> t/t7817-grep-sparse-checkout.sh | 88 ++++++++++++++++++++++++++++++++ >> 3 files changed, 111 insertions(+), 15 deletions(-) >> create mode 100755 t/t7817-grep-sparse-checkout.sh >> >> diff --git a/builtin/grep.c b/builtin/grep.c >> index 99e2685090..52ec72a036 100644 >> --- a/builtin/grep.c >> +++ b/builtin/grep.c >> @@ -388,7 +388,7 @@ static int grep_cache(struct grep_opt *opt, >> const struct pathspec *pathspec, int cached); >> static int grep_tree(struct grep_opt *opt, const struct pathspec *pathspec, >> struct tree_desc *tree, struct strbuf *base, int tn_len, >> - int check_attr); >> + int from_commit); > > I'm not familiar with grep.c and have to admit I don't know what > "check_attr" means. Slightly surprised to see you replace it, but > maybe reading the rest will explain... 
> >> >> static int grep_submodule(struct grep_opt *opt, >> const struct pathspec *pathspec, >> @@ -486,6 +486,10 @@ static int grep_cache(struct grep_opt *opt, >> >> for (nr = 0; nr < repo->index->cache_nr; nr++) { >> const struct cache_entry *ce = repo->index->cache[nr]; >> + >> + if (ce_skip_worktree(ce)) >> + continue; >> + > > Looks good for the case where we are grepping through what's cached. > >> strbuf_setlen(&name, name_base_len); >> strbuf_addstr(&name, ce->name); >> >> @@ -498,8 +502,7 @@ static int grep_cache(struct grep_opt *opt, >> * cache entry are identical, even if worktree file has >> * been modified, so use cache version instead >> */ >> - if (cached || (ce->ce_flags & CE_VALID) || >> - ce_skip_worktree(ce)) { >> + if (cached || (ce->ce_flags & CE_VALID)) { > > I had the same change when I was trying to hack something like this > patch into place but only handled the worktree case before realized it > was a bit bigger job. > >> if (ce_stage(ce) || ce_intent_to_add(ce)) >> continue; >> hit |= grep_oid(opt, &ce->oid, name.buf, >> @@ -532,7 +535,7 @@ static int grep_cache(struct grep_opt *opt, >> >> static int grep_tree(struct grep_opt *opt, const struct pathspec *pathspec, >> struct tree_desc *tree, struct strbuf *base, int tn_len, >> - int check_attr) >> + int from_commit) >> { >> struct repository *repo = opt->repo; >> int hit = 0; >> @@ -546,6 +549,9 @@ static int grep_tree(struct grep_opt *opt, const struct pathspec *pathspec, >> name_base_len = name.len; >> } >> >> + if (from_commit && repo_read_index(repo) < 0) >> + die(_("index file corrupt")); >> + > > As above, I don't think we should need to read the index. We should > compare to sparsity patterns, which in the important case (cone mode) > simplifies to a hash lookup as we walk directories. 
> >> while (tree_entry(tree, &entry)) { >> int te_len = tree_entry_len(&entry); >> >> @@ -564,9 +570,20 @@ static int grep_tree(struct grep_opt *opt, const struct pathspec *pathspec, >> >> strbuf_add(base, entry.path, te_len); >> >> + if (from_commit) { >> + int pos = index_name_pos(repo->index, >> + base->buf + tn_len, >> + base->len - tn_len); >> + if (pos >= 0 && >> + ce_skip_worktree(repo->index->cache[pos])) { >> + strbuf_setlen(base, old_baselen); >> + continue; >> + } >> + } >> + >> if (S_ISREG(entry.mode)) { >> hit |= grep_oid(opt, &entry.oid, base->buf, tn_len, >> - check_attr ? base->buf + tn_len : NULL); >> + from_commit ? base->buf + tn_len : NULL); > > Sadly, this doesn't help me understand check_attr or from_commit. > Could you clue me in a bit? Yeah, Elijah and I know the sparse-checkout code quite well, but are unfamiliar with grep. Let's all expand our knowledge! >> } else if (S_ISDIR(entry.mode)) { >> enum object_type type; >> struct tree_desc sub; >> @@ -581,7 +598,7 @@ static int grep_tree(struct grep_opt *opt, const struct pathspec *pathspec, >> strbuf_addch(base, '/'); >> init_tree_desc(&sub, data, size); >> hit |= grep_tree(opt, pathspec, &sub, base, tn_len, >> - check_attr); >> + from_commit); > > Same. 
> >> free(data); >> } else if (recurse_submodules && S_ISGITLINK(entry.mode)) { >> hit |= grep_submodule(opt, pathspec, &entry.oid, >> diff --git a/t/t7011-skip-worktree-reading.sh b/t/t7011-skip-worktree-reading.sh >> index 37525cae3a..26852586ac 100755 >> --- a/t/t7011-skip-worktree-reading.sh >> +++ b/t/t7011-skip-worktree-reading.sh >> @@ -109,15 +109,6 @@ test_expect_success 'ls-files --modified' ' >> test -z "$(git ls-files -m)" >> ' >> >> -test_expect_success 'grep with skip-worktree file' ' >> - git update-index --no-skip-worktree 1 && >> - echo test > 1 && >> - git update-index 1 && >> - git update-index --skip-worktree 1 && >> - rm 1 && >> - test "$(git grep --no-ext-grep test)" = "1:test" >> -' >> - >> echo ":000000 100644 $ZERO_OID $EMPTY_BLOB A 1" > expected >> test_expect_success 'diff-index does not examine skip-worktree absent entries' ' >> setup_absent && >> diff --git a/t/t7817-grep-sparse-checkout.sh b/t/t7817-grep-sparse-checkout.sh >> new file mode 100755 >> index 0000000000..fccf44e829 >> --- /dev/null >> +++ b/t/t7817-grep-sparse-checkout.sh >> @@ -0,0 +1,88 @@ >> +#!/bin/sh >> + >> +test_description='grep in sparse checkout >> + >> +This test creates the following dir structure: >> +. >> +| - a >> +| - b >> +| - dir >> + | - c >> + >> +Only "a" should be present due to the sparse checkout patterns: >> +"/*", "!/b" and "!/dir". >> +' >> + >> +. ./test-lib.sh >> + >> +test_expect_success 'setup' ' >> + echo "text" >a && >> + echo "text" >b && >> + mkdir dir && >> + echo "text" >dir/c && >> + git add a b dir && >> + git commit -m "initial commit" && >> + git tag -am t-commit t-commit HEAD && >> + tree=$(git rev-parse HEAD^{tree}) && >> + git tag -am t-tree t-tree $tree && >> + cat >.git/info/sparse-checkout <<-EOF && >> + /* >> + !/b >> + !/dir >> + EOF >> + git sparse-checkout init && > > Using `git sparse-checkout init` but then manually writing to > .git/info/sparse-checkout? 
Seems like it'd make more sense to use > `git sparse-checkout set` than writing the patterns directly yourself. > Also, would prefer to have the examples use cone mode (even if you > have to add subdirectories), as it makes the testcase a bit easier to > read and more performant, though neither is a big deal. I agree that we should use the builtin so your test script is less brittle to potential back-end changes to sparse-checkout (none planned). I do recommend having at least one test with non-cone mode patterns, especially if you are checking the pattern-matching yourself instead of relying on the index. Thanks, -Stolee ^ permalink raw reply [flat|nested] 123+ messages in thread
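[Editorial sketch] The three cone-mode cases listed in the message above (recursive match, completely outside, parent match) can be illustrated with a small standalone C function. This is only a sketch under stated assumptions: `classify_dir()`, `is_under()` and `enum cone_match` are hypothetical names, not git's API; the cone is modeled as a plain array of directory names scanned linearly, whereas the thread describes hash lookups; and the cone-mode rule about files sitting directly inside parent directories is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical names for illustration only; this is not git's API. */
enum cone_match {
	CONE_OUTSIDE,   /* completely outside the cone: skip the whole subtree */
	CONE_RECURSIVE, /* completely inside the cone: search everything below */
	CONE_PARENT     /* ancestor of a cone directory: keep matching subdirs */
};

/* Return 1 if `dir` equals `prefix` or lies below it ("a/b" is under "a"). */
static int is_under(const char *dir, const char *prefix)
{
	size_t n = strlen(prefix);

	return !strncmp(dir, prefix, n) && (dir[n] == '\0' || dir[n] == '/');
}

/* Classify one directory against the `nr` directories that form the cone. */
static enum cone_match classify_dir(const char *dir,
				    const char *const *cone, size_t nr)
{
	enum cone_match result = CONE_OUTSIDE;
	size_t i;

	assert(dir != NULL);
	for (i = 0; i < nr; i++) {
		if (is_under(dir, cone[i]))
			return CONE_RECURSIVE; /* at or below a cone directory */
		if (is_under(cone[i], dir))
			result = CONE_PARENT;  /* a cone directory lives below us */
	}
	return result;
}
```

With a cone of "src/lib" and "docs", a tree walk would descend into "src" (parent match), search everything under "src/lib", and skip "test" without reading any of its entries, which is the directory-level pruning discussed above.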
* Re: [RFC PATCH 2/3] grep: honor sparse checkout patterns 2020-03-24 15:12 ` Derrick Stolee @ 2020-03-24 16:16 ` Elijah Newren 2020-03-24 17:02 ` Derrick Stolee 2020-03-24 23:01 ` Matheus Tavares Bernardino 1 sibling, 1 reply; 123+ messages in thread From: Elijah Newren @ 2020-03-24 16:16 UTC (permalink / raw) To: Derrick Stolee Cc: Matheus Tavares, Git Mailing List, Derrick Stolee, brian m. carlson, Stefan Beller On Tue, Mar 24, 2020 at 8:12 AM Derrick Stolee <stolee@gmail.com> wrote: > > On 3/24/2020 3:15 AM, Elijah Newren wrote: > > Hi Matheus, > > > > On Mon, Mar 23, 2020 at 11:12 PM Matheus Tavares ... > >> Something I'm not entirely sure in this patch is how we implement the > >> mechanism to honor sparsity for the `git grep <commit-ish>` case (which > >> is treated in the grep_tree() function). Currently, the patch looks for > >> an index entry that matches the path, and then checks its skip_worktree > > > > As you discuss below, checking the index is both wrong _and_ costly. > > I'm not sure why checking the index is _wrong_, but I agree about the > performance cost. Let's say there are two directories, dir1 and dir2. Over time, there have existed a total of six files: dir1/{a,b,c} dir2/{d,e,f} At the current time, there are only four files in the index: dir1/{a,b} dir2/{d,e} And the user has done a `git sparse-checkout set dir2` and then at some point later run `git grep foobar OTHERCOMMIT`. What happens? Well, since we're in a sparse checkout, we should only search the relevant paths within OTHERCOMMIT for "foobar". Let's say we attempt to figure out the "relevant paths" using the index. We can tell that dir1/a and dir1/b are marked as SKIP_WORKTREE so we don't search them. dir1/c is untracked -- what do we do with it? Include it? Exclude it? Carrying on with the other files, dir2/d and dir2/e are tracked and !SKIP_WORKTREE so we search them. dir2/f is untracked -- what do we do with it? Include it? Exclude it? 
We're left without the necessary information to tell whether we should search OTHERCOMMIT's dir1/c and dir2/f if we consult the index. Any decision we make is going to be wrong for one of the two paths. If we instead do not attempt to consult the index (which corresponds to a version close to HEAD) in order to ask questions about the completely different OTHERCOMMIT, but instead use the sparsity patterns to query whether those files/directories are interesting, then we get the right answer. The index can only be consulted for the right answer in the case of --cached; in all other cases (including OTHERCOMMIT == HEAD), we should use the sparsity patterns. In fact, we could also use the sparsity patterns in the case of --cached, it's just that for that one particular case consulting the index will also give the right answer. ^ permalink raw reply [flat|nested] 123+ messages in thread
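[Editorial sketch] The dir1/dir2 example in the message above can be modeled in a few lines of C to show why the index cannot answer the question for paths it does not track. This is a toy model under stated assumptions, not git code: `struct toy_entry`, `ask_index()` and `ask_patterns()` are hypothetical names, the "index" is a plain array, and the sparsity patterns are reduced to a list of cone directories.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy stand-in for an index entry; hypothetical, not git's cache_entry. */
struct toy_entry {
	const char *path;
	int skip_worktree;
};

enum verdict { SEARCH, SKIP, UNKNOWN };

/* Index-based answer: only defined for paths the index currently tracks. */
static enum verdict ask_index(const struct toy_entry *idx, size_t nr,
			      const char *path)
{
	size_t i;

	assert(path != NULL);
	for (i = 0; i < nr; i++)
		if (!strcmp(idx[i].path, path))
			return idx[i].skip_worktree ? SKIP : SEARCH;
	return UNKNOWN; /* path exists only in the other commit */
}

/* Pattern-based answer: defined for any path, tracked or not. */
static enum verdict ask_patterns(const char *const *cone, size_t nr,
				 const char *path)
{
	size_t i;

	assert(path != NULL);
	for (i = 0; i < nr; i++) {
		size_t n = strlen(cone[i]);

		/* path lies below the cone directory cone[i] */
		if (!strncmp(path, cone[i], n) && path[n] == '/')
			return SEARCH;
	}
	return SKIP;
}
```

Feeding the example into this model, with dir1/{a,b} marked skip-worktree, dir2/{d,e} not, and a cone of "dir2", the index lookup yields UNKNOWN for both dir1/c and dir2/f, while the pattern check skips dir1/c and searches dir2/f.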
* Re: [RFC PATCH 2/3] grep: honor sparse checkout patterns 2020-03-24 16:16 ` Elijah Newren @ 2020-03-24 17:02 ` Derrick Stolee 0 siblings, 0 replies; 123+ messages in thread From: Derrick Stolee @ 2020-03-24 17:02 UTC (permalink / raw) To: Elijah Newren Cc: Matheus Tavares, Git Mailing List, Derrick Stolee, brian m. carlson, Stefan Beller On 3/24/2020 12:16 PM, Elijah Newren wrote: > On Tue, Mar 24, 2020 at 8:12 AM Derrick Stolee <stolee@gmail.com> wrote: >> >> On 3/24/2020 3:15 AM, Elijah Newren wrote: >>> Hi Matheus, >>> >>> On Mon, Mar 23, 2020 at 11:12 PM Matheus Tavares > ... >>>> Something I'm not entirely sure in this patch is how we implement the >>>> mechanism to honor sparsity for the `git grep <commit-ish>` case (which >>>> is treated in the grep_tree() function). Currently, the patch looks for >>>> an index entry that matches the path, and then checks its skip_worktree >>> >>> As you discuss below, checking the index is both wrong _and_ costly. >> >> I'm not sure why checking the index is _wrong_, but I agree about the >> performance cost. > > Let's say there are two directories, dir1 and dir2. Over time, there > have existed a total of six files: > dir1/{a,b,c} > dir2/{d,e,f} > At the current time, there are only four files in the index: > dir1/{a,b} > dir2/{d,e} > And the user has done a `git sparse-checkout set dir2` and then at > some point later run `git grep foobar OTHERCOMMIT`. What happens? > > Well, since we're in a sparse checkout, we should only search the > relevant paths within OTHERCOMMIT for "foobar". Let's say we attempt > to figure out the "relevant paths" using the index. We can tell that > dir1/a and dir1/b are marked as SKIP_WORKTREE so we don't search them. > dir1/c is untracked -- what do we do with it? Include it? Exclude > it? Carrying on with the other files, dir2/d and dir2/e are tracked > and !SKIP_WORKTREE so we search them. dir2/f is untracked -- what do > we do with it? Include it? Exclude it? 
> > We're left without the necessary information to tell whether we should > search OTHERCOMMIT's dir1/c and dir2/f if we consult the index. Any > decision we make is going to be wrong for one of the two paths. > > If we instead do not attempt to consult the index (which corresponds > to a version close to HEAD) in order to ask questions about the > completely different OTHERCOMMIT, but instead use the sparsity > patterns to query whether those files/directories are interesting, > then we get the right answer. The index can only be consulted for the > right answer in the case of --cached; in all other cases (including > OTHERCOMMIT == HEAD), we should use the sparsity patterns. In fact, > we could also use the sparsity patterns in the case of --cached, it's > just that for that one particular case consulting the index will also > give the right answer. Thanks! This helps a lot. -Stolee ^ permalink raw reply [flat|nested] 123+ messages in thread
* Re: [RFC PATCH 2/3] grep: honor sparse checkout patterns 2020-03-24 15:12 ` Derrick Stolee 2020-03-24 16:16 ` Elijah Newren @ 2020-03-24 23:01 ` Matheus Tavares Bernardino 1 sibling, 0 replies; 123+ messages in thread From: Matheus Tavares Bernardino @ 2020-03-24 23:01 UTC (permalink / raw) To: Derrick Stolee Cc: Elijah Newren, Git Mailing List, Derrick Stolee, brian m. carlson, Stefan Beller On Tue, Mar 24, 2020 at 12:12 PM Derrick Stolee <stolee@gmail.com> wrote: > > On 3/24/2020 3:15 AM, Elijah Newren wrote: > > > > On Mon, Mar 23, 2020 at 11:12 PM Matheus Tavares > > <matheus.bernardino@usp.br> wrote: > >> > >> Also, if I understand correctly, the index doesn't hold paths to dirs, > >> right? So even if a complete dir is excluded from sparse checkout, we > >> still have to check all its subentries, only to discover that they > >> should all be skipped from the search. However, if we were to check > >> against the sparsity patterns directly (e.g. with the function above), > >> we could skip such directories together with all their entries. > > When in cone mode, we can check if a directory is one of these three > modes: > > 1. Completely contained in the cone (recursive match) > 2. Completely outside the cone > 3. Neither. Keep matching subdirectories. (parent match) > > The clear_ce_flags() code in dir.c includes the matching algorithms > for this. Hopefully you can re-use a lot of it. You may need to extract > some methods to use them from the grep code. Thanks for the pointer! I will take a look at the code in dir.c. > >> diff --git a/t/t7817-grep-sparse-checkout.sh b/t/t7817-grep-sparse-checkout.sh > >> new file mode 100755 > >> index 0000000000..fccf44e829 ... > >> + cat >.git/info/sparse-checkout <<-EOF && > >> + /* > >> + !/b > >> + !/dir > >> + EOF > >> + git sparse-checkout init && > > > > Using `git sparse-checkout init` but then manually writing to > > .git/info/sparse-checkout? 
Seems like it'd make more sense to use > > `git sparse-checkout set` than writing the patterns directly yourself. > > Also, would prefer to have the examples use cone mode (even if you > > have to add subdirectories), as it makes the testcase a bit easier to > > read and more performant, though neither is a big deal. > > I agree that we should use the builtin so your test script is less > brittle to potential back-end changes to sparse-checkout (none planned). Makes sense! > I do recommend having at least one test with non-cone mode patterns, > especially if you are checking the pattern-matching yourself instead of > relying on the index. OK, I will leave at least one test with non-cone patterns then. Thanks for the comments! ^ permalink raw reply [flat|nested] 123+ messages in thread
* Re: [RFC PATCH 2/3] grep: honor sparse checkout patterns 2020-03-24 7:15 ` Elijah Newren 2020-03-24 15:12 ` Derrick Stolee @ 2020-03-24 22:55 ` Matheus Tavares Bernardino 2020-04-21 2:10 ` Matheus Tavares Bernardino 1 sibling, 1 reply; 123+ messages in thread From: Matheus Tavares Bernardino @ 2020-03-24 22:55 UTC (permalink / raw) To: Elijah Newren Cc: Git Mailing List, Derrick Stolee, brian m. carlson, Stefan Beller On Tue, Mar 24, 2020 at 4:15 AM Elijah Newren <newren@gmail.com> wrote: > > On Mon, Mar 23, 2020 at 11:12 PM Matheus Tavares > <matheus.bernardino@usp.br> wrote: > > > > Something I'm not entirely sure in this patch is how we implement the > > mechanism to honor sparsity for the `git grep <commit-ish>` case (which > > is treated in the grep_tree() function). Currently, the patch looks for > > an index entry that matches the path, and then checks its skip_worktree > > As you discuss below, checking the index is both wrong _and_ costly. > You should use the sparsity patterns; Stolee did a lot of work to make > those correspond to simple hashes you could check to determine whether > to even walk into a subdirectory. So, O(1). Yeah, that's "only" cone > mode but the non-cone sparsity patterns were a performance nightmare > waiting to rear its ugly head. We should just try to encourage > everyone to move to cone mode, or accept the slowness they get without > it. OK, makes sense. And your reply to Stolee, later in this thread, made it clearer for me why checking the index is not only costly but also wrong. Thanks for the great explanation! I will use the sparsity patterns directly, in the next iteration. 
> > diff --git a/builtin/grep.c b/builtin/grep.c > > index 99e2685090..52ec72a036 100644 > > --- a/builtin/grep.c > > +++ b/builtin/grep.c > > @@ -388,7 +388,7 @@ static int grep_cache(struct grep_opt *opt, > > const struct pathspec *pathspec, int cached); > > static int grep_tree(struct grep_opt *opt, const struct pathspec *pathspec, > > struct tree_desc *tree, struct strbuf *base, int tn_len, > > - int check_attr); > > + int from_commit); > > I'm not familiar with grep.c and have to admit I don't know what > "check_attr" means. Slightly surprised to see you replace it, but > maybe reading the rest will explain... ... >> if (S_ISREG(entry.mode)) { >> hit |= grep_oid(opt, &entry.oid, base->buf, tn_len, >> - check_attr ? base->buf + tn_len : NULL); >> + from_commit ? base->buf + tn_len : NULL); > > Sadly, this doesn't help me understand check_attr or from_commit. > Could you clue me in a bit? Sure! The grep machinery can optionally look the .gitattributes file, to see if a given path has a "diff" attribute assigned to it. This attribute points to a diff driver in .gitconfig, which can specify many things, such as whether the path should be treated as a binary or not. The "check_attr" flag passed to grep_tree() tells the grep machinery if it should perform this attribute lookup for the paths in the given tree. I decided to replace it with "from_commit" because the only times we want an attribute lookup when grepping a tree, is when it comes from a commit. I.e., when the tree is the root. (The reasoning goes in the same lines as for why we only check sparsity patterns in git-grep for commit-ish objects: we cannot check pattern matching for trees which we are not sure to be rooted). Since "knowing if the tree is a root or not" is useful in grep_tree() for both sparsity checks and attribute checks, I thought we could use a single "from_commit" variable instead of "check_attr" and "check_sparsity", which would always have matching values. 
But on second thought, I could maybe rename the variable to something like "is_root_tree" or add a comment explaining the usage of "from_commit". (I'm not a big fan of "is_root_tree", though, because we could give a root tree to grep_tree() but not really know it.) > > diff --git a/t/t7817-grep-sparse-checkout.sh b/t/t7817-grep-sparse-checkout.sh > > new file mode 100755 > > index 0000000000..fccf44e829 > > --- /dev/null > > +++ b/t/t7817-grep-sparse-checkout.sh ... > > +test_expect_success 'setup' ' > > + echo "text" >a && > > + echo "text" >b && > > + mkdir dir && > > + echo "text" >dir/c && > > + git add a b dir && > > + git commit -m "initial commit" && > > + git tag -am t-commit t-commit HEAD && > > + tree=$(git rev-parse HEAD^{tree}) && > > + git tag -am t-tree t-tree $tree && > > + cat >.git/info/sparse-checkout <<-EOF && > > + /* > > + !/b > > + !/dir > > + EOF > > + git sparse-checkout init && > > Using `git sparse-checkout init` but then manually writing to > .git/info/sparse-checkout? Seems like it'd make more sense to use > `git sparse-checkout set` than writing the patterns directly yourself. > Also, would prefer to have the examples use cone mode (even if you > have to add subdirectories), as it makes the testcase a bit easier to > read and more performant, though neither is a big deal. OK, I will make use of the builtin here. I will also use the cone mode (and leave one test without it, as Stolee suggested later in this thread). > > +test_expect_success 'grep <tree-ish> should search outside sparse checkout' ' > > I think the test is fine but the title seems misleading. "outside" > and "inside" aren't defined because <tree-ish> isn't known to be > rooted, meaning we have no way to apply the sparsity patterns. So > perhaps just 'grep <tree-ish> should ignore sparsity patterns'? Right! "should ignore sparsity patterns" is a much better name, thanks. Thanks a lot for the thoughtful review and comments! 
^ permalink raw reply [flat|nested] 123+ messages in thread
* Re: [RFC PATCH 2/3] grep: honor sparse checkout patterns 2020-03-24 22:55 ` Matheus Tavares Bernardino @ 2020-04-21 2:10 ` Matheus Tavares Bernardino 2020-04-21 3:08 ` Elijah Newren 0 siblings, 1 reply; 123+ messages in thread From: Matheus Tavares Bernardino @ 2020-04-21 2:10 UTC (permalink / raw) To: Elijah Newren Cc: Git Mailing List, Derrick Stolee, brian m. carlson, Stefan Beller Hi, Elijah, Stolee and others On Tue, Mar 24, 2020 at 7:55 PM Matheus Tavares Bernardino <matheus.bernardino@usp.br> wrote: > > On Tue, Mar 24, 2020 at 4:15 AM Elijah Newren <newren@gmail.com> wrote: > > > > On Mon, Mar 23, 2020 at 11:12 PM Matheus Tavares > > <matheus.bernardino@usp.br> wrote: > > > > > > Something I'm not entirely sure in this patch is how we implement the > > > mechanism to honor sparsity for the `git grep <commit-ish>` case (which > > > is treated in the grep_tree() function). Currently, the patch looks for > > > an index entry that matches the path, and then checks its skip_worktree > > > > As you discuss below, checking the index is both wrong _and_ costly. > > You should use the sparsity patterns; Stolee did a lot of work to make > > those correspond to simple hashes you could check to determine whether > > to even walk into a subdirectory. [...] > OK, makes sense. I've been working on the file skipping mechanism using the sparsity patterns directly. But I'm uncertain about some implementation details. So I wanted to share my current plan with you, to get some feedback before going deeper. The first idea was to load the sparsity patterns a priori and pass them to grep_tree(), which recursively greps the entries of a given tree object. If --recurse-submodules is given, however, we would also need to load each subrepo's sparse-checkout file on the fly (as the subrepos are lazily initialized in grep_tree()'s call chain). That's not a problem on its own. 
But in the most naive implementation, this means unnecessarily re-loading the sparse-checkout files of the submodules for each tree given to git-grep (as grep_tree() is called separately for each one of them). So my next idea was to implement a cache, mapping 'struct repository's to 'struct pattern_list'. Well, not 'struct repository' itself, but repo->gitdir. This way we could load each file once, store the pattern list, and quickly retrieve the one that affects the repository currently being grepped, whether it is a submodule or not. But, is gitdir unique per repository? If not, could we use repo_git_path(repo, "info/sparse-checkout") as the key? I already have a prototype implementation of the last idea (using repo_git_path()). But I wanted to make sure, does this seem like a good path? Or should we avoid the work of having this hashmap here and do something else, such as adding a 'struct pattern_list' to 'struct repository', directly? Thanks, Matheus ^ permalink raw reply [flat|nested] 123+ messages in thread
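For readers following along, the cache idea described above can be sketched roughly as below. This is only an illustration under stated assumptions, not the prototype mentioned in the message: `sketch_patterns` is a hypothetical stand-in for git's 'struct pattern_list', `sketch_load_patterns()` is a dummy loader that does no file I/O, and a linear list is swapped in for the hashmap to keep the example short. The key is the path of the sparse-checkout file, as suggested in the message.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for git's 'struct pattern_list'. */
struct sketch_patterns {
	size_t nr; /* placeholder for the parsed patterns */
};

/* One cache entry: sparse-checkout file path -> loaded patterns. */
struct sketch_cache_entry {
	char *key; /* e.g. "<gitdir>/info/sparse-checkout" */
	struct sketch_patterns patterns;
	struct sketch_cache_entry *next;
};

struct sketch_cache {
	struct sketch_cache_entry *head;
	int loads; /* how many times the loader actually ran */
};

/* Hypothetical loader: a real one would read and parse the file at 'path'. */
void sketch_load_patterns(const char *path, struct sketch_patterns *out)
{
	out->nr = strlen(path); /* dummy value, no file I/O in this sketch */
}

/* Return the patterns for 'path', loading them only on the first request. */
struct sketch_patterns *sketch_cache_get(struct sketch_cache *cache,
					 const char *path)
{
	struct sketch_cache_entry *e;
	size_t len;

	for (e = cache->head; e; e = e->next)
		if (!strcmp(e->key, path))
			return &e->patterns; /* cache hit: no reload */

	e = malloc(sizeof(*e));
	if (!e)
		return NULL;
	len = strlen(path) + 1;
	e->key = malloc(len);
	if (!e->key) {
		free(e);
		return NULL;
	}
	memcpy(e->key, path, len);
	sketch_load_patterns(path, &e->patterns);
	cache->loads++;
	e->next = cache->head;
	cache->head = e;
	return &e->patterns;
}

/* Free every entry and leave the cache empty. */
void sketch_cache_clear(struct sketch_cache *cache)
{
	while (cache->head) {
		struct sketch_cache_entry *e = cache->head;
		cache->head = e->next;
		free(e->key);
		free(e);
	}
}
```

Asking twice for the same path returns the same entry and bumps `loads` only once, which is the whole point of the proposal: each submodule's file would be parsed once no matter how many trees are given on the command line.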
* Re: [RFC PATCH 2/3] grep: honor sparse checkout patterns 2020-04-21 2:10 ` Matheus Tavares Bernardino @ 2020-04-21 3:08 ` Elijah Newren 2020-04-22 12:08 ` Derrick Stolee 2020-04-23 6:09 ` Matheus Tavares Bernardino 0 siblings, 2 replies; 123+ messages in thread From: Elijah Newren @ 2020-04-21 3:08 UTC (permalink / raw) To: Matheus Tavares Bernardino Cc: Git Mailing List, Derrick Stolee, brian m. carlson, Stefan Beller, Jonathan Nieder On Mon, Apr 20, 2020 at 7:11 PM Matheus Tavares Bernardino <matheus.bernardino@usp.br> wrote: > > Hi, Elijah, Stolee and others > > On Tue, Mar 24, 2020 at 7:55 PM Matheus Tavares Bernardino > <matheus.bernardino@usp.br> wrote: > > > > On Tue, Mar 24, 2020 at 4:15 AM Elijah Newren <newren@gmail.com> wrote: > > > > > > On Mon, Mar 23, 2020 at 11:12 PM Matheus Tavares > > > <matheus.bernardino@usp.br> wrote: > > > > > > > > Something I'm not entirely sure in this patch is how we implement the > > > > mechanism to honor sparsity for the `git grep <commit-ish>` case (which > > > > is treated in the grep_tree() function). Currently, the patch looks for > > > > an index entry that matches the path, and then checks its skip_worktree > > > > > > As you discuss below, checking the index is both wrong _and_ costly. > > > You should use the sparsity patterns; Stolee did a lot of work to make > > > those correspond to simple hashes you could check to determine whether > > > to even walk into a subdirectory. > [...] > > OK, makes sense. > > I've been working on the file skipping mechanism using the sparsity > patterns directly. But I'm uncertain about some implementation > details. So I wanted to share my current plan with you, to get some > feedback before going deeper. > > The first idea was to load the sparsity patterns a priori and pass > them to grep_tree(), which recursively greps the entries of a given > tree object. 
If --recurse-submodules is given, however, we would also > need to load each subrepo's sparse-checkout file on the fly (as the > subrepos are lazily initialized in grep_tree()'s call chain). That's > not a problem on its own. But in the most naive implementation, this > means unnecessarily re-loading the sparse-checkout files of the > submodules for each tree given to git-grep (as grep_tree() is called > separately for each one of them). Wouldn't loading the sparse-checkout files be fast compared to grepping a submodule for matching strings? And not just fast, but essentially in the noise and hard to even measure? I have a hard time fathoming parsing the sparse-checkout file for a submodule somehow appreciably affecting the cost of grepping through that submodule. If the submodule has a huge number of sparse-checkout patterns, that'll be because it has a ginormous number of files and grepping through them all would be way, way longer. If the submodule only has a few files, then the sparse-checkout file is only going to be a few lines at most. Also, from another angle: I think the original intent of submodules was an alternate form of sparse-checkout/partial-clone, letting people deal with just their piece of the repo. As such, do we really even expect people to use sparse-checkouts and submodules together, let alone use them very heavily together? Sure, someone will use them, but I have a hard time imagining the scale of use of both features heavily enough for this to matter, especially since it also requires specifying multiple trees to grep (which is slightly unusual) in addition to the combination of these other features before your optimization here could kick in and be worthwhile. 
I'd be very tempted to just implement the most naive implementation and maybe leave a TODO note in the code for some future person to come along and optimize if it really matters, but I'd like to see numbers before we spend the development and maintenance effort on it because I'm having a hard time imagining any scale where it could matter. > So my next idea was to implement a cache, mapping 'struct repository's > to 'struct pattern_list'. Well, not 'struct repository' itself, but > repo->gitdir. This way we could load each file once, store the pattern > list, and quickly retrieve the one that affects the repository > currently being grepped, whether it is a submodule or not. But, is > gitdir unique per repository? If not, could we use > repo_git_path(repo, "info/sparse-checkout") as the key? > > I already have a prototype implementation of the last idea (using > repo_git_path()). But I wanted to make sure, does this seem like a > good path? Or should we avoid the work of having this hashmap here and > do something else, such as adding a 'struct pattern_list' to 'struct > repository', directly? Honestly, it sounds a bit like premature optimization to me. Sorry if that's disappointing since you've apparently already put some effort into this, and it sounds like you're on a good track for optimizing this if it were necessary, but I'm just having a hard time figuring out whether it'd really help and be worth the code complexity. ^ permalink raw reply [flat|nested] 123+ messages in thread
* Re: [RFC PATCH 2/3] grep: honor sparse checkout patterns 2020-04-21 3:08 ` Elijah Newren @ 2020-04-22 12:08 ` Derrick Stolee 2020-04-23 6:09 ` Matheus Tavares Bernardino 1 sibling, 0 replies; 123+ messages in thread From: Derrick Stolee @ 2020-04-22 12:08 UTC (permalink / raw) To: Elijah Newren, Matheus Tavares Bernardino Cc: Git Mailing List, Derrick Stolee, brian m. carlson, Stefan Beller, Jonathan Nieder On 4/20/2020 11:08 PM, Elijah Newren wrote: > On Mon, Apr 20, 2020 at 7:11 PM Matheus Tavares Bernardino > <matheus.bernardino@usp.br> wrote: >> >> Hi, Elijah, Stolee and others >> >> On Tue, Mar 24, 2020 at 7:55 PM Matheus Tavares Bernardino >> <matheus.bernardino@usp.br> wrote: >>> >>> On Tue, Mar 24, 2020 at 4:15 AM Elijah Newren <newren@gmail.com> wrote: >>>> >>>> On Mon, Mar 23, 2020 at 11:12 PM Matheus Tavares >>>> <matheus.bernardino@usp.br> wrote: >>>>> >>>>> Something I'm not entirely sure in this patch is how we implement the >>>>> mechanism to honor sparsity for the `git grep <commit-ish>` case (which >>>>> is treated in the grep_tree() function). Currently, the patch looks for >>>>> an index entry that matches the path, and then checks its skip_worktree >>>> >>>> As you discuss below, checking the index is both wrong _and_ costly. >>>> You should use the sparsity patterns; Stolee did a lot of work to make >>>> those correspond to simple hashes you could check to determine whether >>>> to even walk into a subdirectory. >> [...] >>> OK, makes sense. >> >> I've been working on the file skipping mechanism using the sparsity >> patterns directly. But I'm uncertain about some implementation >> details. So I wanted to share my current plan with you, to get some >> feedback before going deeper. >> >> The first idea was to load the sparsity patterns a priori and pass >> them to grep_tree(), which recursively greps the entries of a given >> tree object. 
If --recurse-submodules is given, however, we would also >> need to load each subrepo's sparse-checkout file on the fly (as the >> subrepos are lazily initialized in grep_tree()'s call chain). That's >> not a problem on its own. But in the most naive implementation, this >> means unnecessarily re-loading the sparse-checkout files of the >> submodules for each tree given to git-grep (as grep_tree() is called >> separately for each one of them). > > Wouldn't loading the sparse-checkout files be fast compared to > grepping a submodule for matching strings? And not just fast, but > essentially in the noise and hard to even measure? I have a hard time > fathoming parsing the sparse-checkout file for a submodule somehow > appreciably affecting the cost of grepping through that submodule. If > the submodule has a huge number of sparse-checkout patterns, that'll > be because it has a ginormous number of files and grepping through > them all would be way, way longer. If the submodule only has a few > files, then the sparse-checkout file is only going to be a few lines > at most. > > Also, from another angle: I think the original intent of submodules > was an alternate form of sparse-checkout/partial-clone, letting people > deal with just their piece of the repo. As such, do we really even > expect people to use sparse-checkouts and submodules together, let > alone use them very heavily together? Sure, someone will use them, > but I have a hard time imagining the scale of use of both features > heavily enough for this to matter, especially since it also requires > specifying multiple trees to grep (which is slightly unusual) in > addition to the combination of these other features before your > optimization here could kick in and be worthwhile. 
> > I'd be very tempted to just implement the most naive implementation > and maybe leave a TODO note in the code for some future person to come > along and optimize if it really matters, but I'd like to see numbers > before we spend the development and maintenance effort on it because > I'm having a hard time imagining any scale where it could matter. > >> So my next idea was to implement a cache, mapping 'struct repository's >> to 'struct pattern_list'. Well, not 'struct repository' itself, but >> repo->gitdir. This way we could load each file once, store the pattern >> list, and quickly retrieve the one that affects the repository >> currently being grepped, whether it is a submodule or not. But, is >> gitdir unique per repository? If not, could we use >> repo_git_path(repo, "info/sparse-checkout") as the key? >> >> I already have a prototype implementation of the last idea (using >> repo_git_path()). But I wanted to make sure, does this seem like a >> good path? Or should we avoid the work of having this hashmap here and >> do something else, such as adding a 'struct pattern_list' to 'struct >> repository', directly? > > Honestly, it sounds a bit like premature optimization to me. Sorry if > that's disappointing since you've apparently already put some effort > into this, and it sounds like you're on a good track for optimizing > this if it were necessary, but I'm just having a hard time figuring > out whether it'd really help and be worth the code complexity. My initial thought was to use a stack or queue. It depends on how git-grep treats submodules. Imagine directories A, B, C where B is a submodule. If results from 'B' are output between results from 'A' and 'C', then use a stack to "push" the latest sparse-checkout patterns as you deepen into a submodule, then "pop" the patterns as you leave a submodule. If results from 'B' are output after results from 'C', then you could possibly use a queue instead. 
I find this unlikely, and it would behave strangely for nested submodules. Since "struct pattern_list" has most of the information you require, then it should not be challenging to create a list of them. Hopefully that provides some ideas. Thanks, -Stolee ^ permalink raw reply [flat|nested] 123+ messages in thread
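The push/pop scheme suggested above might look something like the following. It is a minimal sketch under assumptions: `sketch_pl` is a hypothetical placeholder for 'struct pattern_list', the fixed depth limit is arbitrary, and nothing here is taken from git's actual grep code.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_MAX_DEPTH 16 /* arbitrary nesting limit for this sketch */

/* Hypothetical stand-in for git's 'struct pattern_list'. */
struct sketch_pl {
	const char *owner; /* which repository these patterns belong to */
};

struct sketch_pl_stack {
	const struct sketch_pl *items[SKETCH_MAX_DEPTH];
	size_t depth;
};

/* Entering a submodule: its patterns become the active ones. */
int sketch_pl_push(struct sketch_pl_stack *s, const struct sketch_pl *pl)
{
	if (s->depth == SKETCH_MAX_DEPTH)
		return -1; /* nested too deeply for this sketch */
	s->items[s->depth++] = pl;
	return 0;
}

/* Leaving a submodule: the enclosing repository's patterns apply again. */
const struct sketch_pl *sketch_pl_pop(struct sketch_pl_stack *s)
{
	if (!s->depth)
		return NULL;
	return s->items[--s->depth];
}

/* Patterns for the repository currently being grepped, or NULL if none. */
const struct sketch_pl *sketch_pl_top(const struct sketch_pl_stack *s)
{
	return s->depth ? s->items[s->depth - 1] : NULL;
}
```

With directories A, B, C as in the example, the superproject's patterns sit at the bottom of the stack while A is searched, B's patterns are pushed on entry to the submodule and popped on exit, and C is again matched against the superproject's patterns. Nested submodules simply push one more level.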
* Re: [RFC PATCH 2/3] grep: honor sparse checkout patterns 2020-04-21 3:08 ` Elijah Newren 2020-04-22 12:08 ` Derrick Stolee @ 2020-04-23 6:09 ` Matheus Tavares Bernardino 1 sibling, 0 replies; 123+ messages in thread From: Matheus Tavares Bernardino @ 2020-04-23 6:09 UTC (permalink / raw) To: Elijah Newren Cc: Git Mailing List, Derrick Stolee, brian m. carlson, Stefan Beller, Jonathan Nieder On Tue, Apr 21, 2020 at 12:08 AM Elijah Newren <newren@gmail.com> wrote: > > On Mon, Apr 20, 2020 at 7:11 PM Matheus Tavares Bernardino > <matheus.bernardino@usp.br> wrote: > > > > I've been working on the file skipping mechanism using the sparsity > > patterns directly. But I'm uncertain about some implementation > > details. So I wanted to share my current plan with you, to get some > > feedback before going deeper. > > > > The first idea was to load the sparsity patterns a priori and pass > > them to grep_tree(), which recursively greps the entries of a given > > tree object. If --recurse-submodules is given, however, we would also > > need to load each subrepo's sparse-checkout file on the fly (as the > > subrepos are lazily initialized in grep_tree()'s call chain). That's > > not a problem on its own. But in the most naive implementation, this > > means unnecessarily re-loading the sparse-checkout files of the > > submodules for each tree given to git-grep (as grep_tree() is called > > separately for each one of them). > > Wouldn't loading the sparse-checkout files be fast compared to > grepping a submodule for matching strings? And not just fast, but > essentially in the noise and hard to even measure? I have a hard time > fathoming parsing the sparse-checkout file for a submodule somehow > appreciably affecting the cost of grepping through that submodule. If > the submodule has a huge number of sparse-checkout patterns, that'll > be because it has a ginormous number of files and grepping through > them all would be way, way longer. 
If the submodule only has a few > files, then the sparse-checkout file is only going to be a few lines > at most. Yeah, makes sense. > Also, from another angle: I think the original intent of submodules > was an alternate form of sparse-checkout/partial-clone, letting people > deal with just their piece of the repo. As such, do we really even > expect people to use sparse-checkouts and submodules together, let > alone use them very heavily together? Sure, someone will use them, > but I have a hard time imagining the scale of use of both features > heavily enough for this to matter, especially since it also requires > specifying multiple trees to grep (which is slightly unusual) in > addition to the combination of these other features before your > optimization here could kick in and be worthwhile. > > I'd be very tempted to just implement the most naive implementation > and maybe leave a TODO note in the code for some future person to come > along and optimize if it really matters, but I'd like to see numbers > before we spend the development and maintenance effort on it because > I'm having a hard time imagining any scale where it could matter. You're right. I guess I got a little too excited about the optimization possibilities and neglected the fact that they might not even be needed here. Just to take a look at some numbers, I prototyped the naive implementation and downloaded a testing repository[1] containing 8 submodules (or 14 counting the nested ones). For each of the non-nested submodules, I added its .gitignore rules to the sparse-checkout file (of course this doesn't make any sense for a real-world usage, but I just wanted to populate the file with a large quantity of valid rules, to test the parsing time). I also added the rule '/*'. Then I ran: git-grep --threads=1 --recurse-submodules -E "config_[a-z]+\(" $(cat /tmp/trees) Where /tmp/trees contained about 120 trees in the said repository (again, probably an unrealistic case, for testing purposes only). 
Then, measuring the time spent only inside the function I created to load a sparse-checkout file for a given 'struct repository', I got the following numbers: Number of calls: 1531 (makes sense: ~120 trees and 14 submodules) Percentage over the total time: 0.015% Number of matches: 300897 And using 8 threads, I got the same numbers except for the percentage, which was a little higher: 0.05%. So, indeed, the overhead of re-loading the files is too insignificant. And my cache idea was a premature and unnecessary optimization. > > So my next idea was to implement a cache, mapping 'struct repository's > > to 'struct pattern_list'. Well, not 'struct repository' itself, but > > repo->gitdir. This way we could load each file once, store the pattern > > list, and quickly retrieve the one that affects the repository > > currently being grepped, whether it is a submodule or not. But, is > > gitdir unique per repository? If not, could we use > > repo_git_path(repo, "info/sparse-checkout") as the key? > > > > I already have a prototype implementation of the last idea (using > > repo_git_path()). But I wanted to make sure, does this seem like a > > good path? Or should we avoid the work of having this hashmap here and > > do something else, such as adding a 'struct pattern_list' to 'struct > > repository', directly? > > Honestly, it sounds a bit like premature optimization to me. Sorry if > that's disappointing since you've apparently already put some effort > into this, and it sounds like you're on a good track for optimizing > this if it were necessary, but I'm just having a hard time figuring > out whether it'd really help and be worth the code complexity. No problem! I'm glad to have this feedback now, while I'm still working on v2 :) Now I can focus on what's really relevant. So thanks again! [1]: https://github.com/surevine/Metre ^ permalink raw reply [flat|nested] 123+ messages in thread
* [RFC PATCH 3/3] grep: add option to ignore sparsity patterns 2020-03-24 6:04 [RFC PATCH 0/3] grep: honor sparse checkout and add option to ignore it Matheus Tavares 2020-03-24 6:11 ` [RFC PATCH 1/3] doc: grep: unify info on configuration variables Matheus Tavares 2020-03-24 6:12 ` [RFC PATCH 2/3] grep: honor sparse checkout patterns Matheus Tavares @ 2020-03-24 6:13 ` Matheus Tavares 2020-03-24 7:54 ` Elijah Newren 2020-05-10 0:41 ` [RFC PATCH v2 0/4] grep: honor sparse checkout and add option to ignore it Matheus Tavares 3 siblings, 1 reply; 123+ messages in thread From: Matheus Tavares @ 2020-03-24 6:13 UTC (permalink / raw) To: git; +Cc: dstolee, newren, pclouds In the last commit, git-grep learned to honor sparsity patterns. For some use cases, however, it may be desirable to search outside the sparse checkout. So add the '--ignore-sparsity' option, which restores the old behavior. Also add the grep.ignoreSparsity configuration, to allow setting this behavior by default. Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br> --- Note: I still have to make --ignore-sparsity be able to work together with --untracked. Unfortunately, this won't be as simple because the codeflow taken by --untracked goes to grep_directory() which just iterates the working tree, without looking at the index entries. So I will have to either: make --untracked use grep_cache(), and grep the untracked files later; or try matching the working tree paths against the sparsity patterns, without looking for the skip_worktree bit in the index (as I mentioned in the previous patch's comments). Any preferences regarding these two approaches? (or other suggestions?) 
Documentation/config/grep.txt | 3 +++ Documentation/git-grep.txt | 5 ++++ builtin/grep.c | 19 +++++++++++---- t/t7817-grep-sparse-checkout.sh | 42 +++++++++++++++++++++++++++++++++ 4 files changed, 65 insertions(+), 4 deletions(-) diff --git a/Documentation/config/grep.txt b/Documentation/config/grep.txt index 76689771aa..c1d49484c8 100644 --- a/Documentation/config/grep.txt +++ b/Documentation/config/grep.txt @@ -25,3 +25,6 @@ grep.fullName:: grep.fallbackToNoIndex:: If set to true, fall back to git grep --no-index if git grep is executed outside of a git repository. Defaults to false. + +grep.ignoreSparsity:: + If set to true, enable `--ignore-sparsity` by default. diff --git a/Documentation/git-grep.txt b/Documentation/git-grep.txt index 97e25d7b1b..5c5c66c056 100644 --- a/Documentation/git-grep.txt +++ b/Documentation/git-grep.txt @@ -65,6 +65,11 @@ OPTIONS mechanism. Only useful when searching files in the current directory with `--no-index`. +--ignore-sparsity:: + In a sparse checked out repository (see linkgit:git-sparse-checkout[1]), + also search in files that are outside the sparse checkout. This option + cannot be used with --no-index or --untracked. + --recurse-submodules:: Recursively search in each submodule that has been initialized and checked out in the repository. 
When used in combination with the diff --git a/builtin/grep.c b/builtin/grep.c index 52ec72a036..17eae3edd6 100644 --- a/builtin/grep.c +++ b/builtin/grep.c @@ -33,6 +33,8 @@ static char const * const grep_usage[] = { static int recurse_submodules; +static int ignore_sparsity = 0; + static int num_threads; static pthread_t *threads; @@ -292,6 +294,9 @@ static int grep_cmd_config(const char *var, const char *value, void *cb) if (!strcmp(var, "submodule.recurse")) recurse_submodules = git_config_bool(var, value); + if (!strcmp(var, "grep.ignoresparsity")) + ignore_sparsity = git_config_bool(var, value); + return st; } @@ -487,7 +492,7 @@ static int grep_cache(struct grep_opt *opt, for (nr = 0; nr < repo->index->cache_nr; nr++) { const struct cache_entry *ce = repo->index->cache[nr]; - if (ce_skip_worktree(ce)) + if (!ignore_sparsity && ce_skip_worktree(ce)) continue; strbuf_setlen(&name, name_base_len); @@ -502,7 +507,8 @@ static int grep_cache(struct grep_opt *opt, * cache entry are identical, even if worktree file has * been modified, so use cache version instead */ - if (cached || (ce->ce_flags & CE_VALID)) { + if (cached || (ce->ce_flags & CE_VALID) || + ce_skip_worktree(ce)) { if (ce_stage(ce) || ce_intent_to_add(ce)) continue; hit |= grep_oid(opt, &ce->oid, name.buf, @@ -549,7 +555,7 @@ static int grep_tree(struct grep_opt *opt, const struct pathspec *pathspec, name_base_len = name.len; } - if (from_commit && repo_read_index(repo) < 0) + if (!ignore_sparsity && from_commit && repo_read_index(repo) < 0) die(_("index file corrupt")); while (tree_entry(tree, &entry)) { @@ -570,7 +576,7 @@ static int grep_tree(struct grep_opt *opt, const struct pathspec *pathspec, strbuf_add(base, entry.path, te_len); - if (from_commit) { + if (!ignore_sparsity && from_commit) { int pos = index_name_pos(repo->index, base->buf + tn_len, base->len - tn_len); @@ -932,6 +938,8 @@ int cmd_grep(int argc, const char **argv, const char *prefix) OPT_BOOL_F(0, "ext-grep", 
&external_grep_allowed__ignored, N_("allow calling of grep(1) (ignored by this build)"), PARSE_OPT_NOCOMPLETE), + OPT_BOOL(0, "ignore-sparsity", &ignore_sparsity, + N_("also search in files outside the sparse checkout")), OPT_END() }; @@ -1073,6 +1081,9 @@ int cmd_grep(int argc, const char **argv, const char *prefix) if (recurse_submodules && untracked) die(_("--untracked not supported with --recurse-submodules")); + if (ignore_sparsity && (!use_index || untracked)) + die(_("--no-index or --untracked cannot be used with --ignore-sparsity")); + if (show_in_pager) { if (num_threads > 1) warning(_("invalid option combination, ignoring --threads")); diff --git a/t/t7817-grep-sparse-checkout.sh b/t/t7817-grep-sparse-checkout.sh index fccf44e829..1891ddea57 100755 --- a/t/t7817-grep-sparse-checkout.sh +++ b/t/t7817-grep-sparse-checkout.sh @@ -85,4 +85,46 @@ test_expect_success 'grep <tree-ish> should search outside sparse checkout' ' test_cmp expect_t-tree actual_t-tree ' +for cmd in 'git grep --ignore-sparsity' 'git -c grep.ignoreSparsity grep' \ + 'git -c grep.ignoreSparsity=false grep --ignore-sparsity' +do + test_expect_success "$cmd should search outside sparse checkout" ' + cat >expect <<-EOF && + a:text + b:text + dir/c:text + EOF + $cmd "text" >actual && + test_cmp expect actual + ' + + test_expect_success "$cmd --cached should search outside sparse checkout" ' + cat >expect <<-EOF && + a:text + b:text + dir/c:text + EOF + $cmd --cached "text" >actual && + test_cmp expect actual + ' + + test_expect_success "$cmd <commit-ish> should search outside sparse checkout" ' + commit=$(git rev-parse HEAD) && + cat >expect_commit <<-EOF && + $commit:a:text + $commit:b:text + $commit:dir/c:text + EOF + cat >expect_t-commit <<-EOF && + t-commit:a:text + t-commit:b:text + t-commit:dir/c:text + EOF + $cmd "text" $commit >actual_commit && + test_cmp expect_commit actual_commit && + $cmd "text" t-commit >actual_t-commit && + test_cmp expect_t-commit actual_t-commit + ' +done + 
test_done -- 2.25.1 ^ permalink raw reply related [flat|nested] 123+ messages in thread
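To summarize what the grep_cache() hunks in the patch above do for each index entry, here is a small standalone sketch of the decision. The names (`sketch_entry`, `sketch_decide`, the enum values) are hypothetical and exist only for this illustration. The real code works on 'struct cache_entry' and additionally skips unmerged and intent-to-add entries before grepping the blob, which is left out here.

```c
#include <assert.h>

/* What to do with one index entry (names are hypothetical). */
enum sketch_action {
	SKETCH_SKIP,      /* entry is not searched at all */
	SKETCH_GREP_BLOB, /* search the blob recorded in the index */
	SKETCH_GREP_FILE  /* search the file in the working tree */
};

/* Minimal stand-in for the bits of a cache entry the patch looks at. */
struct sketch_entry {
	int skip_worktree; /* models ce_skip_worktree(ce) */
	int valid;         /* models ce->ce_flags & CE_VALID */
};

/*
 * Mirrors the order of checks in the patched grep_cache():
 * sparse entries are skipped unless sparsity is ignored, and a
 * sparse entry that is searched must come from the index because
 * its file is not present in the working tree.
 */
enum sketch_action sketch_decide(const struct sketch_entry *e,
				 int cached, int ignore_sparsity)
{
	if (!ignore_sparsity && e->skip_worktree)
		return SKETCH_SKIP;
	if (cached || e->valid || e->skip_worktree)
		return SKETCH_GREP_BLOB;
	return SKETCH_GREP_FILE;
}
```

Under these assumptions, an entry with the skip_worktree bit is skipped by default (even with `--cached`), and with `--ignore-sparsity` it is searched through its index blob rather than through a working tree file.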
* Re: [RFC PATCH 3/3] grep: add option to ignore sparsity patterns 2020-03-24 6:13 ` [RFC PATCH 3/3] grep: add option to ignore sparsity patterns Matheus Tavares @ 2020-03-24 7:54 ` Elijah Newren 2020-03-24 18:30 ` Junio C Hamano 2020-03-25 23:15 ` Matheus Tavares Bernardino 0 siblings, 2 replies; 123+ messages in thread From: Elijah Newren @ 2020-03-24 7:54 UTC (permalink / raw) To: Matheus Tavares Cc: Git Mailing List, Derrick Stolee, Nguyễn Thái Ngọc On Mon, Mar 23, 2020 at 11:13 PM Matheus Tavares <matheus.bernardino@usp.br> wrote: > > In the last commit, git-grep learned to honor sparsity patterns. For > some use cases, however, it may be desirable to search outside the > sparse checkout. So add the '--ignore-sparsity' option, which restores > the old behavior. Also add the grep.ignoreSparsity configuration, to > allow setting this behavior by default. Should `--ignore-sparsity` be a global git option rather than a grep-specific one? Also, should grep.ignoreSparsity rather be core.ignoreSparsity or core.searchOutsideSparsePaths or something? In particular, I want a world where: * Someone can do a "sparse" clone that is NOT just about sparse-checkout but also about partial clone. In particular, it makes use of partial clones to download only the history for the sparsity paths, and does a sparse-checkout --cone to get those checked out. (Or, perhaps, defaults to just downloading history for the toplevel dir, much like `sparse-checkout init --cone`, and then when the user runs `sparse-checkout set $dir1 $dir2 ...` then it downloads the extra bits). * grep, diff, log, shortlog, blame, bisect (and maybe others) all by default make use of the sparsity patterns to limit their output (but can all use whatever flag(s) are added here to search outside the sparsity pattern cones). This helps users feel they are in a smaller repo and searching just their area of interest, and it avoids partial clones downloading blobs unnecessarily. 
Nice for the user, and nice for the system. * worktrees behave nicer; when creating a new one it inherits the sparsity patterns of the parent (again to avoid partial clones having to download everything, and let users continue working on their area of interest, though they can disable sparse checkouts at any time, of course). Still would like Junio's feedback on this one. * rebase, merge, cherry-pick, etc. (all via the merge machinery) have smarter tree-merging logic such that when trees are unchanged on one or both sides of history, we take advantage of the subset of those cases where we can avoid traversing into subtrees but can resolve the merge at the tree level. This is a performance optimization even when you have all trees and blobs available, but an even more important one if you don't want partial clones to suddenly have to download unnecessary objects. I have ideas and am working on this as part of merge-ort. > Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br> > --- > > Note: I still have to make --ignore-sparsity be able to work together > with --untracked. Unfortunately, this won't be as simple because the > codeflow taken by --untracked goes to grep_directory() which just > iterates the working tree, without looking at the index entries. So I will > have to either: make --untracked use grep_cache(), and grep the > untracked files later; or try matching the working tree paths against > the sparsity patterns, without looking for the skip_worktree bit in > the index (as I mentioned in the previous patch's comments). Any > preferences regarding these two approaches? (or other suggestions?) Hmm. So, 'tracked' in git is the idea that we are keeping information about specific files. 'sparse-checkout' is the idea that we have a subset of those that we can work with without materializing all the other tracked files; it's clearly a subset of the realm of 'tracked'. 
'untracked' is about getting everything outside the set of 'tracked' files, which to me means it is clearly outside the set of sparsity paths too (and thus you could take --untracked as implying --ignore-sparsity, though whether you do might not matter in practice because of the items I'll discuss next). Of course, I am also assuming `--untracked` is incompatible with --cached or specifying revisions or trees (based on its definition of "In addition to searching in the tracked files in the *working tree*, search also in untracked files." -- emphasis added.) If the incompatibility of --untracked and --cached/REVISIONS/TREES is not enforced, we may want to look into erroring out if they are given together. Once we do, we don't have to worry about grep_cache() at all in the case of --untracked and shouldn't. Files with the skip_worktree bit won't exist in the working directory, and thus won't be searched (this is what makes --untracked imply --ignore-sparsity not really matter). In short: With --untracked you are grepping ALL (non-ignored) files in the working directory -- either because they are both tracked and in the sparsity paths (anything tracked that isn't in the sparsity paths has the skip_worktree bit and thus isn't present), or because it is an untracked file. [And this may be what grep_directory() already does.] Does that make sense? > Documentation/config/grep.txt | 3 +++ > Documentation/git-grep.txt | 5 ++++ > builtin/grep.c | 19 +++++++++++---- > t/t7817-grep-sparse-checkout.sh | 42 +++++++++++++++++++++++++++++++++ > 4 files changed, 65 insertions(+), 4 deletions(-) > > diff --git a/Documentation/config/grep.txt b/Documentation/config/grep.txt > index 76689771aa..c1d49484c8 100644 > --- a/Documentation/config/grep.txt > +++ b/Documentation/config/grep.txt > @@ -25,3 +25,6 @@ grep.fullName:: > grep.fallbackToNoIndex:: > If set to true, fall back to git grep --no-index if git grep > is executed outside of a git repository. Defaults to false. 
> + > +grep.ignoreSparsity:: > + If set to true, enable `--ignore-sparsity` by default. > diff --git a/Documentation/git-grep.txt b/Documentation/git-grep.txt > index 97e25d7b1b..5c5c66c056 100644 > --- a/Documentation/git-grep.txt > +++ b/Documentation/git-grep.txt > @@ -65,6 +65,11 @@ OPTIONS > mechanism. Only useful when searching files in the current directory > with `--no-index`. > > +--ignore-sparsity:: > + In a sparse checked out repository (see linkgit:git-sparse-checkout[1]), > + also search in files that are outside the sparse checkout. This option > + cannot be used with --no-index or --untracked. If they are outside the sparse checkout, then they are not present on disk -- so what is this outside stuff that is being searched? Perhaps clarify that this is only useful in combination with --cached/REVISION/TREE, where there do exist paths outside the sparsity patterns that become relevant? > --recurse-submodules:: > Recursively search in each submodule that has been initialized and > checked out in the repository. 
When used in combination with the > diff --git a/builtin/grep.c b/builtin/grep.c > index 52ec72a036..17eae3edd6 100644 > --- a/builtin/grep.c > +++ b/builtin/grep.c > @@ -33,6 +33,8 @@ static char const * const grep_usage[] = { > > static int recurse_submodules; > > +static int ignore_sparsity = 0; > + > static int num_threads; > > static pthread_t *threads; > @@ -292,6 +294,9 @@ static int grep_cmd_config(const char *var, const char *value, void *cb) > if (!strcmp(var, "submodule.recurse")) > recurse_submodules = git_config_bool(var, value); > > + if (!strcmp(var, "grep.ignoresparsity")) > + ignore_sparsity = git_config_bool(var, value); > + > return st; > } > > @@ -487,7 +492,7 @@ static int grep_cache(struct grep_opt *opt, > for (nr = 0; nr < repo->index->cache_nr; nr++) { > const struct cache_entry *ce = repo->index->cache[nr]; > > - if (ce_skip_worktree(ce)) > + if (!ignore_sparsity && ce_skip_worktree(ce)) Oh boy on the double negatives...maybe we want to rename this flag somehow? 
> continue; > > strbuf_setlen(&name, name_base_len); > @@ -502,7 +507,8 @@ static int grep_cache(struct grep_opt *opt, > * cache entry are identical, even if worktree file has > * been modified, so use cache version instead > */ > - if (cached || (ce->ce_flags & CE_VALID)) { > + if (cached || (ce->ce_flags & CE_VALID) || > + ce_skip_worktree(ce)) { > if (ce_stage(ce) || ce_intent_to_add(ce)) > continue; > hit |= grep_oid(opt, &ce->oid, name.buf, > @@ -549,7 +555,7 @@ static int grep_tree(struct grep_opt *opt, const struct pathspec *pathspec, > name_base_len = name.len; > } > > - if (from_commit && repo_read_index(repo) < 0) > + if (!ignore_sparsity && from_commit && repo_read_index(repo) < 0) > die(_("index file corrupt")); > > while (tree_entry(tree, &entry)) { > @@ -570,7 +576,7 @@ static int grep_tree(struct grep_opt *opt, const struct pathspec *pathspec, > > strbuf_add(base, entry.path, te_len); > > - if (from_commit) { > + if (!ignore_sparsity && from_commit) { > int pos = index_name_pos(repo->index, > base->buf + tn_len, > base->len - tn_len); > @@ -932,6 +938,8 @@ int cmd_grep(int argc, const char **argv, const char *prefix) > OPT_BOOL_F(0, "ext-grep", &external_grep_allowed__ignored, > N_("allow calling of grep(1) (ignored by this build)"), > PARSE_OPT_NOCOMPLETE), > + OPT_BOOL(0, "ignore-sparsity", &ignore_sparsity, > + N_("also search in files outside the sparse checkout")), > OPT_END() > }; > > @@ -1073,6 +1081,9 @@ int cmd_grep(int argc, const char **argv, const char *prefix) > if (recurse_submodules && untracked) > die(_("--untracked not supported with --recurse-submodules")); > > + if (ignore_sparsity && (!use_index || untracked)) > + die(_("--no-index or --untracked cannot be used with --ignore-sparsity")); > + > if (show_in_pager) { > if (num_threads > 1) > warning(_("invalid option combination, ignoring --threads")); > diff --git a/t/t7817-grep-sparse-checkout.sh b/t/t7817-grep-sparse-checkout.sh > index fccf44e829..1891ddea57 100755 > --- 
a/t/t7817-grep-sparse-checkout.sh > +++ b/t/t7817-grep-sparse-checkout.sh > @@ -85,4 +85,46 @@ test_expect_success 'grep <tree-ish> should search outside sparse checkout' ' > test_cmp expect_t-tree actual_t-tree > ' > > +for cmd in 'git grep --ignore-sparsity' 'git -c grep.ignoreSparsity grep' \ > + 'git -c grep.ignoreSparsity=false grep --ignore-sparsity' > +do > + test_expect_success "$cmd should search outside sparse checkout" ' > + cat >expect <<-EOF && > + a:text > + b:text > + dir/c:text > + EOF > + $cmd "text" >actual && > + test_cmp expect actual > + ' > + > + test_expect_success "$cmd --cached should search outside sparse checkout" ' > + cat >expect <<-EOF && > + a:text > + b:text > + dir/c:text > + EOF > + $cmd --cached "text" >actual && > + test_cmp expect actual > + ' > + > + test_expect_success "$cmd <commit-ish> should search outside sparse checkout" ' > + commit=$(git rev-parse HEAD) && > + cat >expect_commit <<-EOF && > + $commit:a:text > + $commit:b:text > + $commit:dir/c:text > + EOF > + cat >expect_t-commit <<-EOF && > + t-commit:a:text > + t-commit:b:text > + t-commit:dir/c:text > + EOF > + $cmd "text" $commit >actual_commit && > + test_cmp expect_commit actual_commit && > + $cmd "text" t-commit >actual_t-commit && > + test_cmp expect_t-commit actual_t-commit > + ' > +done > + > test_done > -- > 2.25.1 I think there are several things that we need to straighten out first and will affect a lot of this patch quite a bit: * The feedback from the previous patch that the revision handling should use sparsity patterns rather than ce_skip_worktree() is going to affect this patch a fair amount. * I think the fact that --ignore-sparsity is meaningless without --cached or a REVISION or TREE may also affect things. * The decision about how to globally name and set the "ignore-sparsity" bit without requiring users to set it for each and every subcommand will change this patch a bit too. I'm super excited to see work in this area. 
I hope I'm not discouraging you by attempting to provide what I think is the bigger picture I'd like us to work towards. ^ permalink raw reply [flat|nested] 123+ messages in thread
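Editorial sketch (not part of the patch): one way around the double negative raised in the review is to carry a positively named flag. The C below is a minimal, self-contained model under stated assumptions: `struct entry` is a hypothetical stand-in for git's `struct cache_entry`, and `restrict_to_sparse` is a hypothetical flag name (the inverse of the patch's `ignore_sparsity`). The three checks mirror the conditions in the quoted diff; they are not git's actual code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for git's struct cache_entry. */
struct entry {
	const char *path;
	bool skip_worktree;	/* models ce_skip_worktree(ce) */
	bool assume_valid;	/* models ce->ce_flags & CE_VALID */
};

/* Positive form of the patch's "!ignore_sparsity && ce_skip_worktree(ce)" skip. */
bool should_search_entry(const struct entry *e, bool restrict_to_sparse)
{
	assert(e != NULL);
	return !(restrict_to_sparse && e->skip_worktree);
}

/* Use the index blob when the worktree copy is absent or assumed unchanged. */
bool must_read_from_index(const struct entry *e, bool cached)
{
	assert(e != NULL);
	return cached || e->assume_valid || e->skip_worktree;
}

/* Mirrors the patch's option check; returns NULL when the combination is fine. */
const char *check_options(bool ignore_sparsity, bool use_index, bool untracked)
{
	if (ignore_sparsity && (!use_index || untracked))
		return "--no-index or --untracked cannot be used with --ignore-sparsity";
	return NULL;
}
```

With a positive flag, the loop condition reads "skip when restricting and the entry is outside the checkout", which avoids negating an "ignore" name.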
* Re: [RFC PATCH 3/3] grep: add option to ignore sparsity patterns 2020-03-24 7:54 ` Elijah Newren @ 2020-03-24 18:30 ` Junio C Hamano 2020-03-24 19:07 ` Elijah Newren 2020-03-30 3:23 ` Matheus Tavares Bernardino 2020-03-25 23:15 ` Matheus Tavares Bernardino 1 sibling, 2 replies; 123+ messages in thread From: Junio C Hamano @ 2020-03-24 18:30 UTC (permalink / raw) To: Elijah Newren Cc: Matheus Tavares, Git Mailing List, Derrick Stolee, Nguyễn Thái Ngọc Elijah Newren <newren@gmail.com> writes: > On Mon, Mar 23, 2020 at 11:13 PM Matheus Tavares > <matheus.bernardino@usp.br> wrote: >> >> In the last commit, git-grep learned to honor sparsity patterns. For >> some use cases, however, it may be desirable to search outside the >> sparse checkout. So add the '--ignore-sparsity' option, which restores >> the old behavior. Also add the grep.ignoreSparsity configuration, to >> allow setting this behavior by default. > > Should `--ignore-sparsity` be a global git option rather than a > grep-specific one? Also, should grep.ignoreSparsity rather be > core.ignoreSparsity or core.searchOutsideSparsePaths or something? Great question. I think "git diff" with various options would also want to optionally be able to be confined within the sparse cone, or checking the entire world by lazily fetching outside the sparsity. > * grep, diff, log, shortlog, blame, bisect (and maybe others) all by > default make use of the sparsity patterns to limit their output (but > can all use whatever flag(s) are added here to search outside the > sparsity pattern cones). This helps users feel they are in a smaller > repo and searching just their area of interest, and it avoids partial > clones downloading blobs unnecessarily. Nice for the user, and nice > for the system. I am not sure which one should be the default. 
From historical point of view that sparse stuff was done as an optimization to omit initial work and lazily give the whole world, I may have slight preference to the "we pretend that you have everything, just some parts may be slower to come to you" world view to be the default, with an option to limit the view to whatever sparsity you initially set up. Regardless of the choice of the default, it would be a good idea to make the subcommands consistently offer the same default and allow the non-default views with the same UI. ^ permalink raw reply [flat|nested] 123+ messages in thread
* Re: [RFC PATCH 3/3] grep: add option to ignore sparsity patterns 2020-03-24 18:30 ` Junio C Hamano @ 2020-03-24 19:07 ` Elijah Newren 2020-03-25 20:18 ` Junio C Hamano 2020-03-30 3:23 ` Matheus Tavares Bernardino 1 sibling, 1 reply; 123+ messages in thread From: Elijah Newren @ 2020-03-24 19:07 UTC (permalink / raw) To: Junio C Hamano Cc: Matheus Tavares, Git Mailing List, Derrick Stolee, Nguyễn Thái Ngọc On Tue, Mar 24, 2020 at 11:30 AM Junio C Hamano <gitster@pobox.com> wrote: > > Elijah Newren <newren@gmail.com> writes: > > > On Mon, Mar 23, 2020 at 11:13 PM Matheus Tavares > > <matheus.bernardino@usp.br> wrote: > >> > >> In the last commit, git-grep learned to honor sparsity patterns. For > >> some use cases, however, it may be desirable to search outside the > >> sparse checkout. So add the '--ignore-sparsity' option, which restores > >> the old behavior. Also add the grep.ignoreSparsity configuration, to > >> allow setting this behavior by default. > > > > Should `--ignore-sparsity` be a global git option rather than a > > grep-specific one? Also, should grep.ignoreSparsity rather be > > core.ignoreSparsity or core.searchOutsideSparsePaths or something? > > Great question. I think "git diff" with various options would also > want to optionally be able to be confined within the sparse cone, or > checking the entire world by lazily fetching outside the sparsity. > > > * grep, diff, log, shortlog, blame, bisect (and maybe others) all by > > default make use of the sparsity patterns to limit their output (but > > can all use whatever flag(s) are added here to search outside the > > sparsity pattern cones). This helps users feel they are in a smaller > > repo and searching just their area of interest, and it avoids partial > > clones downloading blobs unnecessarily. Nice for the user, and nice > > for the system. > > I am not sure which one should be the default. 
From historical > point of view that sparse stuff was done as an optimization to omit > initial work and lazily give the whole world, I may have slight > preference to the "we pretend that you have everything, just some > parts may be slower to come to you" world view to be the default, > with an option to limit the view to whatever sparsity you initially > set up. It sounds like you are describing partial clone rather than sparse checkout? Or perhaps you're trying to blur the distinction, suggesting the two should be used together, with the partial clone machinery learning to download history within the specified sparse cones? > Regardless of the choice of the default, it would be a good > idea to make the subcommands consistently offer the same default and > allow the non-default views with the same UI. Agreed. ^ permalink raw reply [flat|nested] 123+ messages in thread
* Re: [RFC PATCH 3/3] grep: add option to ignore sparsity patterns 2020-03-24 19:07 ` Elijah Newren @ 2020-03-25 20:18 ` Junio C Hamano 0 siblings, 0 replies; 123+ messages in thread From: Junio C Hamano @ 2020-03-25 20:18 UTC (permalink / raw) To: Elijah Newren Cc: Matheus Tavares, Git Mailing List, Derrick Stolee, Nguyễn Thái Ngọc Elijah Newren <newren@gmail.com> writes: > It sounds like you are describing partial clone rather than sparse > checkout? Or perhaps you're trying to blur the distinction, > suggesting the two should be used together, with the partial clone > machinery learning to download history within the specified sparse > cones? Yeah, I guess it is a little bit of both ;-) >> Regardless of the choice of the default, it would be a good >> idea to make the subcommands consistently offer the same default and >> allow the non-default views with the same UI. > > Agreed. Yup, thanks. ^ permalink raw reply [flat|nested] 123+ messages in thread
* Re: [RFC PATCH 3/3] grep: add option to ignore sparsity patterns 2020-03-24 18:30 ` Junio C Hamano 2020-03-24 19:07 ` Elijah Newren @ 2020-03-30 3:23 ` Matheus Tavares Bernardino 2020-03-31 19:12 ` Elijah Newren 1 sibling, 1 reply; 123+ messages in thread From: Matheus Tavares Bernardino @ 2020-03-30 3:23 UTC (permalink / raw) To: Junio C Hamano Cc: Elijah Newren, Git Mailing List, Derrick Stolee, Nguyễn Thái Ngọc On Tue, Mar 24, 2020 at 3:30 PM Junio C Hamano <gitster@pobox.com> wrote: > > Elijah Newren <newren@gmail.com> writes: > > > On Mon, Mar 23, 2020 at 11:13 PM Matheus Tavares > > <matheus.bernardino@usp.br> wrote: > >> > >> In the last commit, git-grep learned to honor sparsity patterns. For > >> some use cases, however, it may be desirable to search outside the > >> sparse checkout. So add the '--ignore-sparsity' option, which restores > >> the old behavior. Also add the grep.ignoreSparsity configuration, to > >> allow setting this behavior by default. > > > > Should `--ignore-sparsity` be a global git option rather than a > > grep-specific one? Also, should grep.ignoreSparsity rather be > > core.ignoreSparsity or core.searchOutsideSparsePaths or something? > > Great question. I think "git diff" with various options would also > want to optionally be able to be confined within the sparse cone, or > checking the entire world by lazily fetching outside the sparsity. [...] > Regardless of the choice of the default, it would be a good > idea to make the subcommands consistently offer the same default and > allow the non-default views with the same UI. Yeah, it seems like a sensible path. Regarding implementation, there is the question that Elijah raised, of whether to use a global git option or separate but consistent options for each subcommand. I don't have much experience with sparse checkout to argue for one or the other, so I would like to hear what others have to say about it. 
A question that comes to my mind regarding the global git option is: will --ignore-sparsity (or whichever name we choose for it [1]) be sufficient for all subcommands? Or may some of them require additional options for command-specific behaviors concerning sparsity patterns? Also, would it be OK if we just ignored the option in commands that do not operate differently in sparse checkouts (maybe, fetch, branch and send-email, for example)? And would it make sense to allow constructions such as `git --ignore-sparsity checkout` or even `git --ignore-sparsity sparse-checkout ...`? [1]: Does anyone have suggestions for the option/config name? The best I could come up with so far (without being too verbose) is --no-sparsity-constraints. But I fear this might sound generic. As Elijah already mentioned, --ignore-sparsity is not good either, as it introduces double negatives in code... ^ permalink raw reply [flat|nested] 123+ messages in thread
* Re: [RFC PATCH 3/3] grep: add option to ignore sparsity patterns 2020-03-30 3:23 ` Matheus Tavares Bernardino @ 2020-03-31 19:12 ` Elijah Newren 2020-03-31 20:02 ` Derrick Stolee 0 siblings, 1 reply; 123+ messages in thread From: Elijah Newren @ 2020-03-31 19:12 UTC (permalink / raw) To: Matheus Tavares Bernardino Cc: Junio C Hamano, Git Mailing List, Derrick Stolee, Nguyễn Thái Ngọc, Jonathan Tan // adding Jonathan Tan to cc based on the fact that we keep bringing up partial clones and how it relates... On Sun, Mar 29, 2020 at 8:23 PM Matheus Tavares Bernardino <matheus.bernardino@usp.br> wrote: > > On Tue, Mar 24, 2020 at 3:30 PM Junio C Hamano <gitster@pobox.com> wrote: > > > > Elijah Newren <newren@gmail.com> writes: > > > > > On Mon, Mar 23, 2020 at 11:13 PM Matheus Tavares > > > <matheus.bernardino@usp.br> wrote: > > >> > > >> In the last commit, git-grep learned to honor sparsity patterns. For > > >> some use cases, however, it may be desirable to search outside the > > >> sparse checkout. So add the '--ignore-sparsity' option, which restores > > >> the old behavior. Also add the grep.ignoreSparsity configuration, to > > >> allow setting this behavior by default. > > > > > > Should `--ignore-sparsity` be a global git option rather than a > > > grep-specific one? Also, should grep.ignoreSparsity rather be > > > core.ignoreSparsity or core.searchOutsideSparsePaths or something? > > > > Great question. I think "git diff" with various options would also > > want to optionally be able to be confined within the sparse cone, or > > checking the entire world by lazily fetching outside the sparsity. > [...] > > Regardless of the choice of the default, it would be a good > > idea to make the subcommands consistently offer the same default and > > allow the non-default views with the same UI. > > Yeah, it seems like a sensible path. 
Regarding implementation, there > is the question that Elijah raised, of whether to use a global git > option or separate but consistent options for each subcommand. I don't > have much experience with sparse checkout to argue for one or > the other, so I would like to hear what others have to say about it. > > A question that comes to my mind regarding the global git option is: > will --ignore-sparsity (or whichever name we choose for it [1]) be > sufficient for all subcommands? Or may some of them require additional > options for command-specific behaviors concerning sparsity patterns? > Also, would it be OK if we just ignored the option in commands that do > not operate differently in sparse checkouts (maybe, fetch, branch and > send-email, for example)? And would it make sense to allow > constructions such as `git --ignore-sparsity checkout` or even `git > --ignore-sparsity sparse-checkout ...`? I think the same option would probably be sufficient for all subcommands, though I have a minor question about the merge machinery (below). And generally, I think it would be unusual for people to pass the command line flag; I suspect most would set a config option for most cases and then only occasionally override it on the command line. Since that config option would always be set, I'd expect commands that are unaffected to just ignore it (much like both "git -c merge.detectRenames=true fetch" and "git --work-tree=othertree fetch" will both ignore the irrelevant options rather than trying to detect that they were specified and error out). > [1]: Does anyone have suggestions for the option/config name? The best > I could come up with so far (without being too verbose) is > --no-sparsity-constraints. But I fear this might sound generic. As > Elijah already mentioned, --ignore-sparsity is not good either, as it > introduces double negatives in code... Does verbosity matter that much? 
I think people would set it in config, and tab completion would make it pretty easy to complete in any event. Anyway, maybe it will help if I provide a very rough first draft of what changes we could introduce to Documentation/config/core.txt, and then ask a bunch of my own questions about it below: """ core.restrictToSparsePaths:: Only meaningful in conjunction with core.sparseCheckoutCone. This option extends sparse checkouts (which limit which paths are written to the worktree), so that output and operations are also limited to the sparsity paths where possible and implemented. The purpose of this option is to (1) focus output for the user on the portion of the repository that is of interest to them, and (2) enable potentially dramatic performance improvements, especially in conjunction with partial clones. + When this option is true, git commands such as log, diff, and grep may limit their output to the directories specified by the sparse cone, or to the intersection of those paths and any pathspecs (like `*.c`) that the user might also specify on the command line. (Note that this limit for diff and grep only becomes relevant with --cached or when specifying a REVISION, since a search of the working tree will automatically be limited to the sparse paths that are present.) Also, commands like bisect may only select commits which modify paths within the sparsity cone. The merge machinery may use the sparse paths as a heuristic to avoid trying to detect renames from within the sparsity cone to outside the sparsity cone when at least one side of history only touches paths within the sparsity cone (this can make the merge machinery faster, but may risk modify/delete conflicts since upstream can rename a file within the sparsity paths to a location outside them). Commands which export, integrity check, or create history will always operate on full trees (e.g. fast-export, format-patch, fsck, commit, etc.), unaffected by any sparsity patterns. 
""" Several questions here, of course: * do people like or hate the name? indifferent? have alternate ideas? * should we restrict this to core.sparseCheckoutCone as I suggested above or also allow people to do it with core.sparseCheckout without the cone mode? I think attempting to weld partial clones together with core.sparseCheckout is crazy, so I'm tempted to just make it be specific to cone mode and to push people to use it. But I'm interested in thoughts on the matter. * should worktrees be affected? (I've been an advocate of new worktrees inheriting the sparse patterns of the worktree in use at the time the new worktree was created. Junio once suggested he didn't like that and that worktrees should start out dense. That seems problematic to me in big repos with partial clones and sparse checkouts in use. Perhaps dense new worktrees is the behavior you get when core.restrictToSparsePaths is false?) * does my idea for the merge machinery make folks uncomfortable? Should that be a different option? Being able to do trivial *tree* merges for the huge portion of the tree outside the sparsity paths would be a huge win, especially with partial clones, but it certainly is different. Then again, Microsoft has disabled rename detection entirely based on it being too expensive, so perhaps the idea of rename-detection-within-your-cone-if-you-really-didn't-modify-anything-outside-the-cone-on-your-side-of-history is a reasonable middle ground between off and on for rename detection. * what should the default be? Junio suggested elsewhere[1] that sparse-checkouts and partial clones should probably be welded together (with partial clones downloading just history in the sparsity paths by default), in which case having this option be true would be useful. But it may also be slightly weird because it'll probably take us a while to implement this; while the big warning in git-sparse-checkout.txt certainly allows this: THIS COMMAND IS EXPERIMENTAL. 
ITS BEHAVIOR, AND THE BEHAVIOR OF OTHER COMMANDS IN THE PRESENCE OF SPARSE-CHECKOUTS, WILL LIKELY CHANGE IN THE FUTURE. It may still be slightly weird that the default behavior of commands in the presence of sparse-checkouts changes release to release until we get it all implemented. [1] https://lore.kernel.org/git/xmqqh7ycw5lc.fsf@gitster.c.googlers.com/ ^ permalink raw reply [flat|nested] 123+ messages in thread
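Editorial sketch (not from the thread): the draft above limits output to "the directories specified by the sparse cone". As a rough illustration of what that membership test could look like, the C below models cone mode as documented for git-sparse-checkout: top-level files are always included, everything under a cone directory is included recursively, and files directly inside any ancestor of a cone directory are included. The function names and the flat array of cone directories are assumptions made for this example; git's real implementation uses its own pattern list structures.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* True if path lies anywhere below directory dir, i.e. starts with "dir/". */
bool is_under(const char *path, const char *dir)
{
	size_t n = strlen(dir);
	return strncmp(path, dir, n) == 0 && path[n] == '/';
}

/* True if path is a file directly inside a proper ancestor directory of dir. */
bool is_in_ancestor_of(const char *path, const char *dir)
{
	const char *slash = strrchr(path, '/');
	size_t parent_len;

	if (!slash)
		return true;	/* top-level file */
	parent_len = (size_t)(slash - path);
	return strlen(dir) > parent_len &&
	       strncmp(path, dir, parent_len) == 0 &&
	       dir[parent_len] == '/';
}

/* Simplified cone-mode membership test over a list of cone directories. */
bool path_in_cone(const char *path, const char *const *cone_dirs, size_t nr)
{
	size_t i;

	assert(path != NULL);
	if (!strchr(path, '/'))
		return true;	/* files at the root are always included */
	for (i = 0; i < nr; i++)
		if (is_under(path, cone_dirs[i]) ||
		    is_in_ancestor_of(path, cone_dirs[i]))
			return true;
	return false;
}
```

For a cone containing only `A/B/C`, this accepts `README`, `A/x.c` and `A/B/C/deep/z.c`, and rejects `A/D/w.c`. Intersecting with a user pathspec would be an additional filter applied to the paths that pass this test.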
* Re: [RFC PATCH 3/3] grep: add option to ignore sparsity patterns 2020-03-31 19:12 ` Elijah Newren @ 2020-03-31 20:02 ` Derrick Stolee 2020-04-27 17:15 ` Matheus Tavares Bernardino 2020-04-29 17:21 ` Elijah Newren 0 siblings, 2 replies; 123+ messages in thread From: Derrick Stolee @ 2020-03-31 20:02 UTC (permalink / raw) To: Elijah Newren, Matheus Tavares Bernardino Cc: Junio C Hamano, Git Mailing List, Derrick Stolee, Nguyễn Thái Ngọc, Jonathan Tan On 3/31/2020 3:12 PM, Elijah Newren wrote: > // adding Jonathan Tan to cc based on the fact that we keep bringing > up partial clones and how it relates... > > On Sun, Mar 29, 2020 at 8:23 PM Matheus Tavares Bernardino > <matheus.bernardino@usp.br> wrote: >> >> On Tue, Mar 24, 2020 at 3:30 PM Junio C Hamano <gitster@pobox.com> wrote: >>> >>> Elijah Newren <newren@gmail.com> writes: >>> >>>> On Mon, Mar 23, 2020 at 11:13 PM Matheus Tavares >>>> <matheus.bernardino@usp.br> wrote: >>>>> >>>>> In the last commit, git-grep learned to honor sparsity patterns. For >>>>> some use cases, however, it may be desirable to search outside the >>>>> sparse checkout. So add the '--ignore-sparsity' option, which restores >>>>> the old behavior. Also add the grep.ignoreSparsity configuration, to >>>>> allow setting this behavior by default. >>>> >>>> Should `--ignore-sparsity` be a global git option rather than a >>>> grep-specific one? Also, should grep.ignoreSparsity rather be >>>> core.ignoreSparsity or core.searchOutsideSparsePaths or something? >>> >>> Great question. I think "git diff" with various options would also >>> want to optionally be able to be confined within the sparse cone, or >>> checking the entire world by lazily fetching outside the sparsity. >> [...] >>> Regardless of the choice of the default, it would be a good >>> idea to make the subcommands consistently offer the same default and >>> allow the non-default views with the same UI. >> >> Yeah, it seems like a sensible path. 
Regarding implementation, there >> is the question that Elijah raised, of whether to use a global git >> option or separate but consistent options for each subcommand. I don't >> have much experience with sparse checkout to argue for one or >> the other, so I would like to hear what others have to say about it. >> >> A question that comes to my mind regarding the global git option is: >> will --ignore-sparsity (or whichever name we choose for it [1]) be >> sufficient for all subcommands? Or may some of them require additional >> options for command-specific behaviors concerning sparsity patterns? >> Also, would it be OK if we just ignored the option in commands that do >> not operate differently in sparse checkouts (maybe, fetch, branch and >> send-email, for example)? And would it make sense to allow >> constructions such as `git --ignore-sparsity checkout` or even `git >> --ignore-sparsity sparse-checkout ...`? > > I think the same option would probably be sufficient for all > subcommands, though I have a minor question about the merge machinery > (below). And generally, I think it would be unusual for people to > pass the command line flag; I suspect most would set a config option > for most cases and then only occasionally override it on the command > line. Since that config option would always be set, I'd expect > commands that are unaffected to just ignore it (much like both "git -c > merge.detectRenames=true fetch" and "git --work-tree=othertree fetch" > will both ignore the irrelevant options rather than trying to detect > that they were specified and error out). > >> [1]: Does anyone have suggestions for the option/config name? The best >> I could come up with so far (without being too verbose) is >> --no-sparsity-constraints. But I fear this might sound generic. As >> Elijah already mentioned, --ignore-sparsity is not good either, as it >> introduces double negatives in code... > > Does verbosity matter that much? 
I think people would set it in > config, and tab completion would make it pretty easy to complete in > any event. > > Anyway, maybe it will help if I provide a very rough first draft of > what changes we could introduce to Documentation/config/core.txt, and > then ask a bunch of my own questions about it below: > > """ > core.restrictToSparsePaths:: > Only meaningful in conjunction with core.sparseCheckoutCone. > This option extends sparse checkouts (which limit which paths > are written to the worktree), so that output and operations > are also limited to the sparsity paths where possible and > implemented. The purpose of this option is to (1) focus > output for the user on the portion of the repository that is > of interest to them, and (2) enable potentially dramatic > performance improvements, especially in conjunction with > partial clones. > + > When this option is true, git commands such as log, diff, and grep may > limit their output to the directories specified by the sparse cone, or > to the intersection of those paths and any pathspecs (like `*.c`) that the user > might also specify on the command line. (Note that this limit for > diff and grep only becomes relevant with --cached or when specifying a > REVISION, since a search of the working tree will automatically be > limited to the sparse paths that are present.) Also, commands like > bisect may only select commits which modify paths within the sparsity > cone. The merge machinery may use the sparse paths as a heuristic to > avoid trying to detect renames from within the sparsity cone to > outside the sparsity cone when at least one side of history only > touches paths within the sparsity cone (this can make the merge > machinery faster, but may risk modify/delete conflicts since upstream > can rename a file within the sparsity paths to a location outside > them). Commands which export, integrity check, or create history will > always operate on full trees (e.g. 
fast-export, format-patch, fsck, > commit, etc.), unaffected by any sparsity patterns. > """ > > Several questions here, of course: > > * do people like or hate the name? indifferent? have alternate ideas? It's probably time to create a 'sparse-checkout' config space. That would allow sparse-checkout.restrictGrep = true as an option. Or a more general sparse-checkout.restrictCommands = true to make it clear that it affects multiple commands. > * should we restrict this to core.sparseCheckoutCone as I suggested > above or also allow people to do it with core.sparseCheckout without > the cone mode? I think attempting to weld partial clones together > with core.sparseCheckout is crazy, so I'm tempted to just make it be > specific to cone mode and to push people to use it. But I'm > interested in thoughts on the matter. Personally, I prefer cone mode and think it covers 99% of cases. However, there are some who are using a big directory full of large binaries and relying on file-prefix matches to get only the big binaries they need. Until they restructure their repositories to take advantage of cone mode, we should be considerate of the full sparse-checkout specification when possible. > * should worktrees be affected? (I've been an advocate of new > worktrees inheriting the sparse patterns of the worktree in use at the > time the new worktree was created. Junio once suggested he didn't > like that and that worktrees should start out dense. That seems > problematic to me in big repos with partial clones and sparse checkouts > in use. Perhaps dense new worktrees is the behavior you get when > core.restrictToSparsePaths is false?) We should probably consider a `--sparse` option for `git worktree add` so we can allow interested users to add worktrees that initialize to a sparse-checkout. Optionally create a config option that would copy the sparse-checkout file from the current repo to the worktree. > * does my idea for the merge machinery make folks uncomfortable? 
> Should that be a different option? Being able to do trivial *tree* > merges for the huge portion of the tree outside the sparsity paths > would be a huge win, especially with partial clones, but it certainly > is different. Then again, Microsoft has disabled rename detection > entirely based on it being too expensive, so perhaps the idea of > rename-detection-within-your-cone-if-you-really-didn't-modify-anything-outside-the-cone-on-your-side-of-history > is a reasonable middle ground between off and on for rename detection. The part where you say "when at least one side of history only touches paths within the sparsity cone" makes me want to entertain the idea if it can be done cleanly. I'm more concerned about the "git bisect" logic being restricted to the cone, since that is such an open-ended command for what is considered "good" or "bad". > * what should the default be? Junio suggested elsewhere[1] that > sparse-checkouts and partial clones should probably be welded together > (with partial clones downloading just history in the sparsity paths by > default), in which case having this option be true would be useful. My opinion on this is as follows: filtering blobs based on sparse-checkout patterns does not filter enough, and filtering trees based on sparse-checkout patterns filters too much. The costs are just flipped: having extra trees is not a huge problem but recovering from a "tree miss" is problematic. Having extra blobs is painful, but recovering from a "blob miss" is not a big deal. > But it may also be slightly weird because it'll probably take us a > while to implement this; while the big warning in > git-sparse-checkout.txt certainly allows this: > THIS COMMAND IS EXPERIMENTAL. ITS BEHAVIOR, AND THE BEHAVIOR OF OTHER > COMMANDS IN THE PRESENCE OF SPARSE-CHECKOUTS, WILL LIKELY CHANGE IN > THE FUTURE. 
> It may still be slightly weird that the default behavior of commands > in the presence of sparse-checkouts changes release to release until > we get it all implemented. I appreciate that we put that warning at the top. We will be able to do more experimental things with the feature because of it. The idea I'm toying with is to have "git clone --sparse" set core.sparseCheckoutCone = true. Also, if we are creating the "sparse-checkout.*" config space, we should "rename" core.sparseCheckoutCone to sparse-checkout.coneMode or something. We would need to support both for a while, for sure. Thanks, -Stolee ^ permalink raw reply [flat|nested] 123+ messages in thread
* Re: [RFC PATCH 3/3] grep: add option to ignore sparsity patterns 2020-03-31 20:02 ` Derrick Stolee @ 2020-04-27 17:15 ` Matheus Tavares Bernardino 2020-04-29 16:46 ` Elijah Newren 2020-04-29 17:21 ` Elijah Newren 1 sibling, 1 reply; 123+ messages in thread From: Matheus Tavares Bernardino @ 2020-04-27 17:15 UTC (permalink / raw) To: Derrick Stolee Cc: Elijah Newren, Junio C Hamano, Git Mailing List, Derrick Stolee, Nguyễn Thái Ngọc, Jonathan Tan Hi, Stolee and Elijah I think I just finished addressing the comments on patch 2/3 [1]. And I'm now looking at the ones in 3/3 (this one). Below are some questions, just to make sure I'm going in the right direction with this one. On Tue, Mar 31, 2020 at 5:02 PM Derrick Stolee <stolee@gmail.com> wrote: > > On 3/31/2020 3:12 PM, Elijah Newren wrote: > > > > Anyway, maybe it will help if I provide a very rough first draft of > > what changes we could introduce to Documentation/config/core.txt, and > > then ask a bunch of my own questions about it below: > > > > """ > > core.restrictToSparsePaths:: > > Only meaningful in conjunction with core.sparseCheckoutCone. > > This option extends sparse checkouts (which limit which paths > > are written to the worktree), so that output and operations > > are also limited to the sparsity paths where possible and > > implemented. The purpose of this option is to (1) focus > > output for the user on the portion of the repository that is > > of interest to them, and (2) enable potentially dramatic > > performance improvements, especially in conjunction with > > partial clones. ... > > """ > > > > Several questions here, of course: > > > > * do people like or hate the name? indifferent? have alternate ideas? > > It's probably time to create a 'sparse-checkout' config space. That > would allow > > sparse-checkout.restrictGrep = true > > as an option. Or a more general > > sparse-checkout.restrictCommands = true > > to make it clear that it affects multiple commands. 
If we are creating the new namespace, 'core.sparseCheckout' should also be renamed to something like 'sparse-checkout.enabled', right? And maybe we could use 'sparsecheckout.*', instead? That seems to be the convention for settings on hyphenated commands (as in sendemail.*, uploadpack.* and gitgui.*). As for compatibility, when running `git sparse-checkout init`, if the config file already has the core.sparseCheckout setting, should we remove it? Or just add the new sparsecheckout.enabled config, which will always be read first? Also, should we emit a warning about the former being deprecated? The good thing about deprecation warnings, IMO, is that users will know the name change faster. But, at least for `git grep <tree>`, where we read core.sparseCheckout and core.sparseCheckoutCone for each submodule and each tree, there would be too much pollution in the output... Finally, about restrictCommands, the idea is to have both sparsecheckout.restrictCommands and `git --restrict-to-sparse-paths`, right? For now, the option/setting would only affect grep, but support would be added gradually to other commands in the future. I noticed git-read-tree already has a --no-sparse-checkout option. Should we remove this option in favor of the global --[no]-restrict-to-sparse-paths? Sorry for too many questions. I just wanted to make sure that I understood the plan before diving into the implementation, to avoid going in the wrong direction. [1]: Here is a sneak peek for v2 of patch 2/3, in case you might want to take a look: https://github.com/matheustavares/git/commit/970ef529f1e8f719c4427bd9fea8205ada69d913 ^ permalink raw reply [flat|nested] 123+ messages in thread
* Re: [RFC PATCH 3/3] grep: add option to ignore sparsity patterns 2020-04-27 17:15 ` Matheus Tavares Bernardino @ 2020-04-29 16:46 ` Elijah Newren 0 siblings, 0 replies; 123+ messages in thread From: Elijah Newren @ 2020-04-29 16:46 UTC (permalink / raw) To: Matheus Tavares Bernardino Cc: Derrick Stolee, Junio C Hamano, Git Mailing List, Derrick Stolee, Nguyễn Thái Ngọc, Jonathan Tan On Mon, Apr 27, 2020 at 10:15 AM Matheus Tavares Bernardino <matheus.bernardino@usp.br> wrote: > > Hi, Stolee and Elijah > > I think I just finished addressing the comments on patch 2/3 [1]. And > I'm now looking at the ones in 3/3 (this one). Below are some > questions, just to make sure I'm going in the right direction with > this one. > > On Tue, Mar 31, 2020 at 5:02 PM Derrick Stolee <stolee@gmail.com> wrote: > > > > On 3/31/2020 3:12 PM, Elijah Newren wrote: > > > > > > Anyway, maybe it will help if I provide a very rough first draft of > > > what changes we could introduce to Documentation/config/core.txt, and > > > then ask a bunch of my own questions about it below: > > > > > > """ > > > core.restrictToSparsePaths:: > > > Only meaningful in conjuntion with core.sparseCheckoutCone. > > > This option extends sparse checkouts (which limit which paths > > > are written to the worktree), so that output and operations > > > are also limited to the sparsity paths where possible and > > > implemented. The purpose of this option is to (1) focus > > > output for the user on the portion of the repository that is > > > of interest to them, and (2) enable potentially dramatic > > > performance improvements, especially in conjunction with > > > partial clones. > ... > > > """ > > > > > > Several questions here, of course: > > > > > > * do people like or hate the name? indifferent? have alternate ideas? > > > > It's probably time to create a 'sparse-checkout' config space. That > > would allow > > > > sparse-checkout.restrictGrep = true > > > > as an option. 
Or a more general > > > > sparse-checkout.restrictCommands = true > > > > to make it clear that it affects multiple commands. > > If we are creating the new namespace, 'core.sparseCheckout' should > also be renamed to something like 'sparse-checkout.enabled', right? > And maybe we could use 'sparsecheckout.*', instead? That seems to be > the convention for settings on hyphenated commands (as in sendemail.*, > uploadpack.* and gitgui.*). Or maybe just call the namespace 'sparse.*' if we're going that route? > As for compatibility, when running `git sparse-checkout init`, if the > config file already has the core.sparseCheckout setting, should we > remove it? Or just add the new sparsecheckout.enabled config, which > will always be read first? We seem to have two competing issues: * If you remove the core.sparseCheckout setting in favor of sparse.enabled, then people can't use the repo with an older version of git. (This may be acceptable, but we've generally been somewhat careful with index extensions and such to avoid such a state, with slow transitions with index and pack versions and such.) * If you leave the core.sparseCheckout setting around as well as having sparse.enabled, then we have two different settings that we can keep in sync with newer git but which older git will only update one of. What do we do if we detect they are out of sync? Throw an error? Pretend that one overrules? If the older one overrules, what do we accomplish with the new name? If the newer name overrules, doesn't that also potentially break using an older git version? I'm not sure what to do here. Maybe people who have worked on index version and pack version transitions have some good suggestions for us? > Also, should we emit a warning about the former being deprecated? The > good thing about deprecation warnings, IMO, is that users will know > the name change faster. 
But, at least for `git grep <tree>`, where we > read core.sparseCheckout and core.sparseCheckoutCone for each > submodule and each tree, there would be too much pollution in the > output... We've already started to steer away from users setting these values and just have them get set/updated/unset by sparse-checkout init and sparse-checkout disable. Since users won't be setting these directly, I don't think deprecation warnings make sense. > Finally, about restrictCommands, the idea is to have both > sparsecheckout.restrictCommands and `git --restrict-to-sparse-paths`, > right? For now, the option/setting would only affect grep, but support > would be added gradually to other commands in the future. I noticed There should be both a config option and a global command line flag, yes. We might need the flag to default to not-restricting-to-sparse-paths for now because that's consistent with the only thing the current implementation of these commands can do. But I'm really worried that this will remain the default and we'll force users in the future to jump through a bunch of hoops to do a simple thing: $ git clone --sparse-paths $WANTED_DIRECTORIES user@server.name:path/to/repo.git $ cd repo <Enjoy their small view of the repo without every command suddenly requiring a network connection and downloading huge reams of data they don't even care about.> > git-read-tree already has a --no-sparse-checkout option. Should we > remove this option in favor of the global > --[no]-restrict-to-sparse-paths? read-tree is plumbing; we can't break backward compatibility. We'll have to leave that option there and just document that the two options do the same thing. > Sorry for too many questions. I just wanted to make sure that I > understood the plan before diving into the implementation, to avoid > going in the wrong direction. Nah, these are all good questions. Sorry for the delay in getting back to you. ^ permalink raw reply [flat|nested] 123+ messages in thread
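The compatibility question raised above (which setting wins if both the old and a renamed key exist, and how to notice that they disagree) can be made concrete with a small model. This is only a sketch of one possible policy, namely "the new key wins, and a mismatch is reported". The thread does not settle on any policy, `sparse.enabled` is just one of the names floated here, and `resolve_sparse_enabled` is a hypothetical helper, not a git function.

```python
from typing import Mapping, NamedTuple, Optional

OLD_KEY = "core.sparseCheckout"  # existing setting
NEW_KEY = "sparse.enabled"       # hypothetical name floated in the thread


class SparseSetting(NamedTuple):
    enabled: bool
    source: Optional[str]  # key that supplied the value, None if neither is set
    out_of_sync: bool      # True when both keys are set and disagree


def resolve_sparse_enabled(config: Mapping[str, bool]) -> SparseSetting:
    """Resolve the effective value, preferring the new key when it is set.

    Assumed policy for illustration only: new key overrules the old one.
    """
    old = config.get(OLD_KEY)
    new = config.get(NEW_KEY)
    if new is not None:
        # An older git would only have updated OLD_KEY, so flag a mismatch.
        return SparseSetting(new, NEW_KEY, old is not None and old != new)
    if old is not None:
        return SparseSetting(old, OLD_KEY, False)
    return SparseSetting(False, None, False)


if __name__ == "__main__":
    # Repo last touched by an older git: only the old key was flipped.
    print(resolve_sparse_enabled({OLD_KEY: True, NEW_KEY: False}))
```

Under this policy the mismatch case silently follows the new key, which is exactly the "doesn't that also potentially break using an older git version?" concern; returning `out_of_sync` at least lets a caller decide whether to warn or error out.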
* Re: [RFC PATCH 3/3] grep: add option to ignore sparsity patterns 2020-03-31 20:02 ` Derrick Stolee 2020-04-27 17:15 ` Matheus Tavares Bernardino @ 2020-04-29 17:21 ` Elijah Newren 1 sibling, 0 replies; 123+ messages in thread From: Elijah Newren @ 2020-04-29 17:21 UTC (permalink / raw) To: Derrick Stolee Cc: Matheus Tavares Bernardino, Junio C Hamano, Git Mailing List, Derrick Stolee, Nguyễn Thái Ngọc, Jonathan Tan Sorry for the super late reply... On Tue, Mar 31, 2020 at 1:02 PM Derrick Stolee <stolee@gmail.com> wrote: > > On 3/31/2020 3:12 PM, Elijah Newren wrote: > > // adding Jonathan Tan to cc based on the fact that we keep bringing > > up partial clones and how it relates... > > > > On Sun, Mar 29, 2020 at 8:23 PM Matheus Tavares Bernardino > > <matheus.bernardino@usp.br> wrote: > >> > >> On Tue, Mar 24, 2020 at 3:30 PM Junio C Hamano <gitster@pobox.com> wrote: > >>> > >>> Elijah Newren <newren@gmail.com> writes: > >>> > >>>> On Mon, Mar 23, 2020 at 11:13 PM Matheus Tavares > >>>> <matheus.bernardino@usp.br> wrote: > >>>>> > >>>>> In the last commit, git-grep learned to honor sparsity patterns. For > >>>>> some use cases, however, it may be desirable to search outside the > >>>>> sparse checkout. So add the '--ignore-sparsity' option, which restores > >>>>> the old behavior. Also add the grep.ignoreSparsity configuration, to > >>>>> allow setting this behavior by default. > >>>> > >>>> Should `--ignore-sparsity` be a global git option rather than a > >>>> grep-specific one? Also, should grep.ignoreSparsity rather be > >>>> core.ignoreSparsity or core.searchOutsideSparsePaths or something? > >>> > >>> Great question. I think "git diff" with various options would also > >>> want to optionally be able to be confined within the sparse cone, or > >>> checking the entire world by lazily fetching outside the sparsity. > >> [...] 
> >>> Regardless of the choice of the default, it would be a good > >>> idea to make the subcommands consistently offer the same default and > >>> allow the non-default views with the same UI. > >> > >> Yeah, it seems like a sensible path. Regarding implementation, there > >> is the question that Elijah raised, of whether to use a global git > >> option or separate but consistent options for each subcommand. I don't > >> have much experience with sparse checkout to argument for one or > >> another, so I would like to hear what others have to say about it. > >> > >> A question that comes to my mind regarding the global git option is: > >> will --ignore-sparsity (or whichever name we choose for it [1]) be > >> sufficient for all subcommands? Or may some of them require additional > >> options for command-specific behaviors concerning sparsity patterns? > >> Also, would it be OK if we just ignored the option in commands that do > >> not operate differently in sparse checkouts (maybe, fetch, branch and > >> send-email, for example)? And would it make sense to allow > >> constructions such as `git --ignore-sparsity checkout` or even `git > >> --ignore-sparsity sparse-checkout ...`? > > > > I think the same option would probably be sufficient for all > > subcommands, though I have a minor question about the merge machinery > > (below). And generally, I think it would be unusual for people to > > pass the command line flag; I suspect most would set a config option > > for most cases and then only occasionally override it on the command > > line. Since that config option would always be set, I'd expect > > commands that are unaffected to just ignore it (much like both "git -c > > merge.detectRenames=true fetch" and "git --work-tree=othertree fetch" > > will both ignore the irrelevant options rather than trying to detect > > that they were specified and error out). > > > >> [1]: Does anyone have suggestions for the option/config name? 
The best > >> I could come up with so far (without being too verbose) is > >> --no-sparsity-constraints. But I fear this might sound generic. As > >> Elijah already mentioned, --ignore-sparsity is not good either, as it > >> introduces double negatives in code... > > > > Does verbosity matter that much? I think people would set it in > > config, and tab completion would make it pretty easy to complete in > > any event. > > > > Anyway, maybe it will help if I provide a very rough first draft of > > what changes we could introduce to Documentation/config/core.txt, and > > then ask a bunch of my own questions about it below: > > > > """ > > core.restrictToSparsePaths:: > > Only meaningful in conjuntion with core.sparseCheckoutCone. > > This option extends sparse checkouts (which limit which paths > > are written to the worktree), so that output and operations > > are also limited to the sparsity paths where possible and > > implemented. The purpose of this option is to (1) focus > > output for the user on the portion of the repository that is > > of interest to them, and (2) enable potentially dramatic > > performance improvements, especially in conjunction with > > partial clones. > > + > > When this option is true, git commands such as log, diff, and grep may > > limit their output to the directories specified by the sparse cone, or > > to the intersection of those paths and any (like `*.c) that the user > > might also specify on the command line. (Note that this limit for > > diff and grep only becomes relevant with --cached or when specifying a > > REVISION, since a search of the working tree will automatically be > > limited to the sparse paths that are present.) Also, commands like > > bisect may only select commits which modify paths within the sparsity > > cone. 
The merge machinery may use the sparse paths as a heuristic to > > avoid trying to detect renames from within the sparsity cone to > > outside the sparsity cone when at least one side of history only > > touches paths within the sparsity cone (this can make the merge > > machinery faster, but may risk modify/delete conflicts since upstream > > can rename a file within the sparsity paths to a location outside > > them). Commands which export, integrity check, or create history will > > always operate on full trees (e.g. fast-export, format-patch, fsck, > > commit, etc.), unaffected by any sparsity patterns. > > """ > > > > Several questions here, of course: > > > > * do people like or hate the name? indifferent? have alternate ideas? > > It's probably time to create a 'sparse-checkout' config space. That > would allow > > sparse-checkout.restrictGrep = true > > as an option. Or a more general > > sparse-checkout.restrictCommands = true > > to make it clear that it affects multiple commands. As I mentioned to Matheus, would a "sparse" config space be nicer? > > * should we restrict this to core.sparseCheckoutCone as I suggested > > above or also allow people to do it with core.sparseCheckout without > > the cone mode? I think attempting to weld partial clones together > > with core.sparseCheckout is crazy, so I'm tempted to just make it be > > specific to cone mode and to push people to use it. But I'm > > interested in thoughts on the matter. > > Personally, I prefer cone mode and think it covers 99% of cases. > However, there are some who are using a big directory full of large > binaries and relying on file-prefix matches to get only the big > binaries they need. Until they restructure their repositories to > take advantage of cone mode, we should be considerate of the full > sparse-checkout specification when possible. I agree with everything you say here except the last word; if you replaced "possible" with "practical" then I'd agree. 
In particular, I like the idea of a partial clone that defaults to grabbing all the blobs in the sparse path specification; I think it'd be reasonable to transfer the sparseCone specification to the server and have it use that to walk history and make a packfile. Transfering a sparse specification that does not match the cone mode requirements to a server and making it use that as it walks over all of history sounds like a good way to overload the server. > > * should worktrees be affected? (I've been an advocate of new > > worktrees inheriting the sparse patterns of the worktree in use at the > > time the new worktree was created. Junio once suggested he didn't > > like that and that worktrees should start out dense. That seems > > problematic to me in big repos with partial clones and sparse chckouts > > in use. Perhaps dense new worktrees is the behavior you get when > > core.restrictToSparsePaths is false?) > > We should probably consider a `--sparse` option for `git worktree add` > so we can allow interested users to add worktrees that initialize to > a sparse-checkout. Optionally create a config option that would copy > the sparse-checkout file from the current repo to the worktree. Okay, but if someone runs a future $ git clone --sparse $RELEVANT_DIRECTORIES user@server.name:path/to/repo.git $ cd repo <Blissfully work in their smaller repo without commands suddenly downloading reams of unwanted data> should the clone command automatically set this option for the user? I don't like the idea of users having to remember to set this option (and the restrictToSparsePaths option, and whatever other options are needed to work in their smaller environment). I'd really like there to be a single flag, in the form of some clone option, that sets all of this up. > > * does my idea for the merge machinery make folks uncomfortable? > > Should that be a different option? 
Being able to do trivial *tree* > > merges for the huge portion of the tree outside the sparsity paths > > would be a huge win, especially with partial clones, but it certainly > > is different. Then again, microsoft has disabled rename detection > > entirely based on it being too expensive, so perhaps the idea of > > rename-detection-within-your-cone-if-you-really-didn't-modify-anything-outside-the-cone-on-your-side-of-history > > is a reasonable middle ground between off and on for rename detection. > > The part where you say " when at least one side of history only > touches paths within the sparsity cone" makes me want to entertain > the idea if it can be done cleanly. Yeah, I still have to dig in and verify that this really works. > I'm more concerned about the "git bisect" logic being restricted to > the cone, since that is such an open-ended command for what is > considered "good" or "bad". If the sparse checkout has sufficient information for them to build and test whatever predicate they are interested in, then surely bisecting in a way that restricts to the cone would be a nice optimization, right? And if the cone doesn't have enough information for them to build and test commits, then they would need to leave the sparse checkout in order to bisect anyway. > > * what should the default be? Junio suggested elsewhere[1] that > > sparse-checkouts and partial clones should probably be welded together > > (with partial clones downloading just history in the sparsity paths by > > default), in which case having this option be true would be useful. > > My opinion on this is as follows: filtering blobs based on sparse- > checkout patterns does not filter enough, and filtering trees based > on sparse-checkout patterns filters too much. The costs are just > flipped: having extra trees is not a huge problem but recovering from > a "tree miss" is problematic. Having extra blobs is painful, but > recovering from a "blob miss" is not a big deal. 
Sounds like --filter=blob:none already solves your issues. It doesn't make me happy; I really want the history within the sparse cone to be downloaded as part of the initial clone. (I can see various ways that downloading all trees would be easier, so if we end up downloading all commits and all trees and just the blobs within the sparse cone, that sounds fine to me.) > > But it may also be slightly weird because it'll probably take us a > > while to implement this; while the big warning in > > git-sparse-checkout.txt certainly allows this: > > THIS COMMAND IS EXPERIMENTAL. ITS BEHAVIOR, AND THE BEHAVIOR OF OTHER > > COMMANDS IN THE PRESENCE OF SPARSE-CHECKOUTS, WILL LIKELY CHANGE IN > > THE FUTURE. > > It may still be slightly weird that the default behavior of commands > > in the presence of sparse-checkouts changes release to release until > > we get it all implemented. > > I appreciate that we put that warning at the top. We will be > able to do more experimental things with the feature because > of it. The idea I'm toying with is to have "git clone --sparse" > set core.sparseCheckoutCone = true. Sounds good to me. We might also want to set worktrees.copySparsity and sparse.restrictToCone (or whatever these end up being named) as well. > Also, if we are creating the "sparse-checkout.*" config space, > we should "rename" core.sparseCheckoutCone to sparse-checkout.coneMode > or something. We would need to support both for a while, for sure. And, if we automatically migrate the setting and delete the old one, do we prevent someone from successfully using an older git version with the repo? Or, if we don't automatically unset the old one, do we risk the two values getting out of sync if they do switch to an older git version? ^ permalink raw reply [flat|nested] 123+ messages in thread
* Re: [RFC PATCH 3/3] grep: add option to ignore sparsity patterns 2020-03-24 7:54 ` Elijah Newren 2020-03-24 18:30 ` Junio C Hamano @ 2020-03-25 23:15 ` Matheus Tavares Bernardino 2020-03-26 6:02 ` Elijah Newren 1 sibling, 1 reply; 123+ messages in thread From: Matheus Tavares Bernardino @ 2020-03-25 23:15 UTC (permalink / raw) To: Elijah Newren Cc: Git Mailing List, Derrick Stolee, Nguyễn Thái Ngọc, Junio C Hamano On Tue, Mar 24, 2020 at 4:55 AM Elijah Newren <newren@gmail.com> wrote: > > On Mon, Mar 23, 2020 at 11:13 PM Matheus Tavares > <matheus.bernardino@usp.br> wrote: > > > Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br> > > --- > > > > Note: I still have to make --ignore-sparsity be able to work together > > with --untracked. Unfortunatelly, this won't be as simple because the > > codeflow taken by --untracked goes to grep_directory() which just > > iterates the working tree, without looking the index entries. So I will > > have to either: make --untracked use grep_cache(), and grep the > > untracked files later; or try matching the working tree paths against > > the sparsity patterns, without looking for the skip_worktree bit in > > the index (as I mentioned in the previous patch's comments). Any > > preferences regarding these two approaches? (or other suggestions?) > > Hmm. So, 'tracked' in git is the idea that we are keeping information > about specific files. 'sparse-checkout' is the idea that we have a > subset of those that we can work with without materializing all the > other tracked files; it's clearly a subset of the realm of 'tracked'. > 'untracked' is about getting everything outside the set of 'tracked' > files, which to me means it is clearly outside the set of sparsity > paths too (and thus you could take --untracked as implying > --ignore-sparsity, though whether you do might not matter in practice > because of the items I'll discuss next). 
Of course, I am also > assuming `--untracked` is incompatible with --cached or specifying > revisions or trees (based on it's definiton of "In addition to > searching in the tracked files in the *working tree*, search also in > untracked files." -- emphasis added.) Hm, I see the point now, but I'm still a little confused: The "in the working tree" section of the definition would exclude non checked out files, right? However, git-grep's description says "Look for specified patterns in the tracked files *in the work tree*", and it still searches non checked out files (loading them from the cache, even when --cache is not given). I know that's exactly what we are trying to change with this patchset, but we will still give the --ignore-sparsity option to allow the old behavior when needed (unless we prohibit using --ignore-sparsity without --cached or $REV). I guess my doubt is whether the problem is in the implementation of the working tree grep, which considers non checked out files, or in the docs, which say "tracked files *in the work tree*". I tend to go with the latter, since using `git grep --ignore-sparsity` in a sparse checked out working tree, to grep not present files as well, kind of makes sense to me. And if the problem is indeed in the docs, then I think we should also allow --ignore-sparsity when grepping with --untracked, since it's an analogous case. > If the incompatibility of > --untracked and --cached/REVSIONS/TREES is not enforced, we may want > to look into erroring out if they are given together. Once we do, we > don't have to worry about grep_cache() at all in the case of > --untracked and shouldn't. Files with the skip_worktree bit won't > exist in the working directory, and thus won't be searched (this is > what makes --untracked imply --ignore-sparsity not really matter). 
>
> In short: With --untracked you are grepping ALL (non-ignored) files in
> the working directory -- either because they are both tracked and in
> the sparsity paths (anything tracked that isn't in the sparsity paths
> has the skip_worktree bit and thus isn't present), or because it is an
> untracked file. [And this may be what grep_directory() already does.]
>
> Does that make sense?

It does, and thanks for a very detailed explanation. But as I
mentioned before, I'm a little uncertain about --untracked implying
--ignore-sparsity. The commit that added --untracked (0a93fb8) says:

    "grep --untracked" would find the specified patterns from files in
    untracked files in addition to its usual behaviour of finding them in
    the tracked files

So, in my mind, it feels like --untracked wasn't meant to limit the
search to "all non-ignored files in the working directory", but to add
untracked files to the search (which could also include tracked files
that are not checked out). Wouldn't the "all non-ignored files in the
working directory" case be the use of --no-index?

> > diff --git a/builtin/grep.c b/builtin/grep.c
> > index 52ec72a036..17eae3edd6 100644
> > --- a/builtin/grep.c
> > +++ b/builtin/grep.c
...
> >
> > @@ -487,7 +492,7 @@ static int grep_cache(struct grep_opt *opt,
> >         for (nr = 0; nr < repo->index->cache_nr; nr++) {
> >                 const struct cache_entry *ce = repo->index->cache[nr];
> >
> > -               if (ce_skip_worktree(ce))
> > +               if (!ignore_sparsity && ce_skip_worktree(ce))
>
> Oh boy on the double negatives...maybe we want to rename this flag somehow?

Yeah, I also thought about that, but couldn't come up with a better
name myself... My alternatives were all too verbose.

...
> I'm super excited to see work in this area. I hope I'm not
> discouraging you by attempting to provide what I think is the bigger
> picture I'd like us to work towards.

Not at all! :) Thanks a lot for the bigger picture and other explanations.
They help me understand the long-term goals and make better decisions now.
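The one-line change to grep_cache() quoted above, and the double-negative complaint about it, can be illustrated with a small model. This is a sketch, not the code in builtin/grep.c: `CacheEntry` and `entries_to_grep` are made-up names, `restrict_to_sparse_paths` is just one possible positive spelling of the flag, and it assumes the statement following the quoted condition skips the entry.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class CacheEntry:
    path: str
    skip_worktree: bool  # stands in for ce_skip_worktree(ce)


def entries_to_grep(index: Iterable[CacheEntry],
                    restrict_to_sparse_paths: bool = True) -> List[str]:
    """Return the paths a cached grep would visit.

    restrict_to_sparse_paths=True  models the proposed default.
    restrict_to_sparse_paths=False models --ignore-sparsity.
    """
    selected = []
    for ce in index:
        # Same truth table as `!ignore_sparsity && ce_skip_worktree(ce)`,
        # but with a positively named flag there is no negation to untangle.
        if restrict_to_sparse_paths and ce.skip_worktree:
            continue
        selected.append(ce.path)
    return selected


if __name__ == "__main__":
    index = [CacheEntry("src/a.c", False), CacheEntry("docs/b.txt", True)]
    print(entries_to_grep(index))
    print(entries_to_grep(index, restrict_to_sparse_paths=False))
```

Flipping the sense of the flag changes only the spelling of the condition, not which entries are visited.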
* Re: [RFC PATCH 3/3] grep: add option to ignore sparsity patterns 2020-03-25 23:15 ` Matheus Tavares Bernardino @ 2020-03-26 6:02 ` Elijah Newren 2020-03-27 15:51 ` Junio C Hamano 2020-03-30 1:12 ` Matheus Tavares Bernardino 0 siblings, 2 replies; 123+ messages in thread From: Elijah Newren @ 2020-03-26 6:02 UTC (permalink / raw) To: Matheus Tavares Bernardino Cc: Git Mailing List, Derrick Stolee, Nguyễn Thái Ngọc, Junio C Hamano Hi Matheus! On Wed, Mar 25, 2020 at 4:15 PM Matheus Tavares Bernardino <matheus.bernardino@usp.br> wrote: > > On Tue, Mar 24, 2020 at 4:55 AM Elijah Newren <newren@gmail.com> wrote: > > > > On Mon, Mar 23, 2020 at 11:13 PM Matheus Tavares > > <matheus.bernardino@usp.br> wrote: > > > > > Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br> > > > --- > > > > > > Note: I still have to make --ignore-sparsity be able to work together > > > with --untracked. Unfortunatelly, this won't be as simple because the > > > codeflow taken by --untracked goes to grep_directory() which just > > > iterates the working tree, without looking the index entries. So I will > > > have to either: make --untracked use grep_cache(), and grep the > > > untracked files later; or try matching the working tree paths against > > > the sparsity patterns, without looking for the skip_worktree bit in > > > the index (as I mentioned in the previous patch's comments). Any > > > preferences regarding these two approaches? (or other suggestions?) > > > > Hmm. So, 'tracked' in git is the idea that we are keeping information > > about specific files. 'sparse-checkout' is the idea that we have a > > subset of those that we can work with without materializing all the > > other tracked files; it's clearly a subset of the realm of 'tracked'. 
> > 'untracked' is about getting everything outside the set of 'tracked' > > files, which to me means it is clearly outside the set of sparsity > > paths too (and thus you could take --untracked as implying > > --ignore-sparsity, though whether you do might not matter in practice > > because of the items I'll discuss next). Of course, I am also > > assuming `--untracked` is incompatible with --cached or specifying > > revisions or trees (based on it's definiton of "In addition to > > searching in the tracked files in the *working tree*, search also in > > untracked files." -- emphasis added.) > > Hm, I see the point now, but I'm still a little confused: The "in the > working tree" section of the definition would exclude non checked out > files, right? However, git-grep's description says "Look for specified > patterns in the tracked files *in the work tree*", and it still > searches non checked out files (loading them from the cache, even when > --cache is not given). I know that's exactly what we are trying to I really respect Duy and he does some amazing work and I wish he were still active in git, but the SKIP_WORKTREE stuff wasn't his best work and even he downplayed it: "In my defense it was one of my first contribution when I was naiver...I'd love to hear how sparse checkout could be improved, or even replaced."[0] I've seen enough egregiously confusing cases and enough difficult-to-recover-from cases with the implementation of the SKIP_WORKTREE handling that I think it is dangerous to assume behavior you see with it is intended design. A year and a half ago, I read all available docs to figure out how to sparsify and de-sparsify, and read them several times but was still confused. If I could only figure it out with great difficulty, a lot of google searching, and even trying to look at the code, what chance did "normal" users stand? 
To add more flavor to that argument, let me cite [1] (the three paragraphs starting with "Playing with sparse-checkout, it feels to me like a half-baked feature"), [2], as well as good chunks of [3], [4], and [5]. [0] https://lore.kernel.org/git/CACsJy8ArUXD0cF2vQAVnzM_AGto2k2yQTFuTO7PhP4ffHM8dVQ@mail.gmail.com/ [1] https://lore.kernel.org/git/CABPp-BFKf2N6TYzCCneRwWUektMzRMnHLZ8JT64q=MGj5WQZkA@mail.gmail.com/ [2] https://lore.kernel.org/git/CABPp-BGE-m_UFfUt_moXG-YR=ZW8hMzMwraD7fkFV-+sEHw36w@mail.gmail.com/ [3] https://lore.kernel.org/git/pull.316.git.gitgitgadget@gmail.com/ [4] https://lore.kernel.org/git/pull.513.git.1579029962.gitgitgadget@gmail.com/ [5] https://lore.kernel.org/git/a46439c8536f912ad4a1e1751852cf477d3d7dc7.1584813609.git.gitgitgadget@gmail.com/ But let me try to explain it all below from first principles in a way that will hopefully make sense why falling back to loading from the cache when --cached is not given is just flat wrong. The explanation from first principles should also help explain --untracked a bit better, and when there are decisions about whether to use sparsity patterns. > change with this patchset, but we will still give the > --ignore-sparsity option to allow the old behavior when needed (unless > we prohibit using --ignore-sparsity without --cached or $REV). I guess > my doubt is whether the problem is in the implementation of the > working tree grep, which considers non checked out files, or in the > docs, which say "tracked files *in the work tree*". > > I tend to go with the latter, since using `git grep --ignore-sparsity` > in a sparse checked out working tree, to grep not present files as > well, kind of makes sense to me. And if the problem is indeed in the > docs, then I think we should also allow --ignore-sparsity when > grepping with --untracked, since it's an analogous case. It's probably not a surprise to you given what I've already said above to hear me say that the docs are correct in this case. 
But not only are the docs correct, I'll go even further and claim that
falling back to the cache when --cached is not passed is indefensible
and leads to surprises and contradictions. Instead of just claiming
that, though, let me try to spell out from first principles why I
believe it:

There were previously three types of files for git:
  * tracked
  * ignored
  * untracked
where:
  * tracked was defined as "recorded in index"
  * ignored was defined as "a file which is not tracked and which
    matches an ignore rule (.gitignore, .git/info/exclude, etc.)"
  * untracked was defined as "all other files present in the working
    directory".

With the SKIP_WORKTREE bit and sparse-checkouts, we actually have four
types because we split the "tracked" category into two:
  * tracked and matches the sparsity patterns (implies it will be
    present in the working directory, as the SKIP_WORKTREE bit is not
    set)
  * tracked and does not match the sparsity patterns (implies it will
    be missing from the working directory, as the SKIP_WORKTREE bit is
    set)

But let's ignore the splitting of the tracked type for a minute as
well as everything else related to sparseness. Let's just look at how
grep was designed.

git grep has traditionally been about searching "tracked files in the
work tree" as you highlighted (and note that sparsity bits came four
years later in 2009, so cannot undercut that claim). If the user has
made edits to files and hasn't staged them, grep would search those
working tree files with their edits, not old cached versions of those
files. People were told that git grep was a great way to just search
relevant stuff (instead of normal grep which would look through build
results and random big files in your working directory that you
weren't even tracking).
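As an aside for readers following along: the three file categories
above, and the claim that a plain `git grep` reads working tree
contents (unstaged edits included), can be checked in a throwaway
repository. This is only a sketch; the file names, the scratch
directory, and the demo identity are made up for illustration.

```shell
#!/bin/sh
# Sketch: list tracked / untracked / ignored files, then compare a working
# tree search with an index search. All names below are illustrative.
set -eu

tmp=$(mktemp -d)
# Keep user and system configuration out of the demo.
export HOME="$tmp" GIT_CONFIG_NOSYSTEM=1
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

git init -q "$tmp/repo"
cd "$tmp/repo"

echo '*.log' >.gitignore
echo 'old text' >tracked.txt
git add .gitignore tracked.txt
git commit -q -m 'initial'

echo 'scratch' >notes.txt      # untracked: present, not in index, not ignored
echo 'noise' >build.log        # ignored: matches the *.log rule
echo 'new text' >tracked.txt   # unstaged edit to a tracked file

tracked=$(git ls-files)
untracked=$(git ls-files --others --exclude-standard)
ignored=$(git ls-files --others --ignored --exclude-standard)

# A plain search reads the working tree file; --cached reads the index,
# which still holds 'old text'. git grep exits 1 when nothing matches.
worktree_hits=$(git grep -l 'new text' || true)
index_hits=$(git grep --cached -l 'new text' || true)

echo "tracked:";   echo "$tracked"
echo "untracked:"; echo "$untracked"
echo "ignored:";   echo "$ignored"
echo "worktree hits: $worktree_hits"
echo "index hits: $index_hits"
```

Adding --untracked to the plain search would bring notes.txt into
scope, and adding --no-exclude-standard on top of that would bring in
build.log as well.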
Then in 2011 grep gained options like --untracked to extend the search in the working tree to also include untracked files, and added --no-exclude-standard (which is "only useful with --untracked") so that people had a way to search *all* files in the working tree (tracked, untracked, and ignored files). (Note: no mechanism was provided for searching tracked and ignored files without untracked as far as I can tell, though I don't see why that would make sense.) git-grep also gained options like --no-index so that it could be used in a directory that wasn't tracked by git at all -- it turns out people liked git-grep better than normal grep (I think it got colorization first?), even for things that weren't being tracked by git. But again, all these cases were about searching files that existed in the working tree. Of course, people sometimes wanted to search a version other than what existed in the working tree. And thus options like --cached or specifying a REVISION were added early on. Sometimes, code that wasn't meant to be used together accidentally is used together or the docs suggest they can be used together. In 2010, someone had to clarify that --cached was incompatible with <tree>; not sure why someone would attempt to use them together, but that's the type of accident that is easy to have in the implementation or docs because it doesn't even occur to people who understand the design and the data structures why anyone would attempt that. Inevitably, someone comes along who doesn't understand the underlying data structures or design or terminology and tries incompatible options together...and then gets surprised. (Side note: I think this kind of issues occurs fairly frequently, so I'm unlikely to assume options were meant to be supported together based solely on a lack of logic that would throw an error when both are specified. 
We could probably add a bunch of useful microprojects around checking for flags that should be incompatible and making sure git throws errors when both are specified. We had lots of cases in rebase, for example, where if users happened to specify two flags then one would just be silently ignored.) REVISION and --cached are not just incompatible with each other; each is incompatible with all three of --untracked, --no-index, and --no-exclude-standard. This is because REVISION and --cached are about picking some version other than what exists in the working tree to search through, while those other options are all intended for when we are searching through files in the working tree (and in particular, exist to extend how many files in the working tree we look through). One more useful case to consider before we start adding SKIP_WORKTREE into the mix. Let's say that you have three files: fileA fileB fileC and all of them are tracked. You have made edits to fileA and fileB, and ran 'rm fileC' (NOT 'git rm fileC', i.e. the deletion is not staged). Now, you run 'git grep mystring'. Quick question: Which files are searched for 'mystring'? Well... * REVISION and --cached were left out of the git grep command, so working tree files should be searched, not staged versions or versions from other commits * No flags like --untracked or --no-exclude-standard were included, so only tracked files in the working tree should be searched * There are two files in the working tree, both tracked: fileA and fileB. So, this searches fileA and fileB. In particular: NO VERSION of fileC is searched. fileC may be tracked/cached, but we don't search any version of that file, because this particular command line is about searching the working directory and fileC is not in the working directory. To the best of my knowledge, git grep has always behaved that way. Users understand the idea of searching the working copy vs. the index vs. "old" (or different) versions of the repository. 
They also understand that when searching the working copy, by default
only a subset of the files is searched. Tell me: given all this
information here, what possible explanation is there for SKIP_WORKTREE
entries to be translated into searches of the cache when --cached is
not specified? Please square that away with the fact that 'rm fileC'
results in fileC NOT being searched.

It's just completely, utterly wrong.

Also, hopefully this helps answer your question about --untracked and
skip_worktree. --untracked is only useful when searching through the
working tree, and is entirely about adding the "untracked" category to
the things we search. The skip_worktree bit is about adding more
granularity to the "tracked" category. The two are thus entirely
orthogonal and --untracked shouldn't change behavior at all in the
face of sparse checkouts.

And I also think it explains more when the sparsity patterns and the
--ignore-sparsity flag even matter. The division of working tree files
which were tracked into two subsets (those that match sparsity
patterns and those that don't) didn't matter because only one of those
two sets existed and could be searched. So the question is, when can
the sparsity pattern divide a set of files into two subsets where both
are non-empty? And the answer is when --cached or REVISION is
specified. This is the case Junio recently brought up and said that
there are good reasons users might want to limit to just the paths
that match the sparsity patterns, and other reasons when users might
want to search everything[6]. So, both cases need to be supported
fairly easily, and this will be true for several commands besides just
grep.

[6] https://lore.kernel.org/git/xmqq7dz938sc.fsf@gitster.c.googlers.com/

> > If the incompatibility of
> > --untracked and --cached/REVISIONS/TREES is not enforced, we may want
> > to look into erroring out if they are given together.
Once we do, we > > don't have to worry about grep_cache() at all in the case of > > --untracked and shouldn't. Files with the skip_worktree bit won't > > exist in the working directory, and thus won't be searched (this is > > what makes --untracked imply --ignore-sparsity not really matter). > > > > In short: With --untracked you are grepping ALL (non-ignored) files in > > the working directory -- either because they are both tracked and in > > the sparsity paths (anything tracked that isn't in the sparsity paths > > has the skip_worktree bit and thus isn't present), or because it is an > > untracked file. [And this may be what grep_directory() already does.] > > > > Does that make sense? > > It does, and thanks for a very detailed explanation. But as I > mentioned before, I'm a little uncertain about --untracked implying > --ignore-sparsity. The commit that added --untracked (0a93fb8) says: > > "grep --untracked" would find the specified patterns from files in > untracked files in addition to its usual behaviour of finding them in > the tracked files > > So, in my mind, it feels like --untracked wasn't meant to limit the > search to "all non-ignored files in the working directory", but to add > untracked files to the search (which could also contain tracked but > non checked out files). Wouldn't the "all non-ignored files in the > working directory" case be the use of --no-index? --no-index is specifically designed for when the directory isn't tracked by git at all. It would be equivalent, though, to saying we wanted to search all files in the working copy regardless of whether they are tracked, untracked, or ignored, i.e. equivalent to specifying both --untracked and --no-exclude-standard. And you were right to be uncertain about --untracked implying --ignore-sparsity; --untracked is completely orthogonal to sparsity. 
(However, it wouldn't much matter if it did imply that option or if it implied its opposite: --untracked implies we are only looking at the working directory files, and thus we aren't even going to check the sparsity patterns, we'll just check which files exist in the working directory. `git sparse-checkout reapply` will care about the sparsity patterns and possibly add files to the working copy or remove some, but grep certainly shouldn't be having a side effect like that; it should just search the directory as it exists.) > > > diff --git a/builtin/grep.c b/builtin/grep.c > > > index 52ec72a036..17eae3edd6 100644 > > > --- a/builtin/grep.c > > > +++ b/builtin/grep.c > ... > > > > > > @@ -487,7 +492,7 @@ static int grep_cache(struct grep_opt *opt, > > > for (nr = 0; nr < repo->index->cache_nr; nr++) { > > > const struct cache_entry *ce = repo->index->cache[nr]; > > > > > > - if (ce_skip_worktree(ce)) > > > + if (!ignore_sparsity && ce_skip_worktree(ce)) > > > > Oh boy on the double negatives...maybe we want to rename this flag somehow? > > Yeah, I also thought about that, but couldn't come up with a better > name myself... My alternatives were all too verbose. > > ... > > I'm super excited to see work in this area. I hope I'm not > > discouraging you by attempting to provide what I think is the bigger > > picture I'd like us to work towards. > > Not at all! :) Thanks a lot for the bigger picture and other > explanations. They help me understand the long-term goals and make > better decisions now. Hope this email helps too. I've composed it over about 4 different sessions with various interruptions, so there's a good chance all my edits and loss of train of thought might have made something murky. Let me know which part(s) are confusing and I'll try to clarify. Elijah ^ permalink raw reply [flat|nested] 123+ messages in thread
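The fileA/fileB/fileC scenario from the message above can be
reproduced with a few commands. The following is a minimal sketch
under assumed names (scratch directory, demo identity, file contents);
it does not involve sparse checkouts at all, only an unstaged
deletion.

```shell
#!/bin/sh
# Sketch: three tracked files, two edited, one removed with plain 'rm'.
# All names and contents are illustrative.
set -eu

tmp=$(mktemp -d)
# Keep user and system configuration out of the demo.
export HOME="$tmp" GIT_CONFIG_NOSYSTEM=1
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

git init -q "$tmp/repo"
cd "$tmp/repo"

for f in fileA fileB fileC; do
	echo 'mystring (committed)' >"$f"
done
git add fileA fileB fileC
git commit -q -m 'add three files'

echo 'mystring (edited)' >fileA
echo 'mystring (edited)' >fileB
rm fileC    # NOT 'git rm': the deletion is not staged

# Working tree search: only files that exist on disk are read.
worktree_hits=$(git grep -l mystring || true)
# Index search: fileC is still recorded in the index.
index_hits=$(git grep --cached -l mystring || true)

echo "working tree:"; echo "$worktree_hits"
echo "index:";        echo "$index_hits"
```

The working tree search lists fileA and fileB only, while the --cached
search also lists fileC, which is the asymmetry the argument relies
on.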
* Re: [RFC PATCH 3/3] grep: add option to ignore sparsity patterns 2020-03-26 6:02 ` Elijah Newren @ 2020-03-27 15:51 ` Junio C Hamano 2020-03-27 19:01 ` Elijah Newren 2020-03-30 1:12 ` Matheus Tavares Bernardino 1 sibling, 1 reply; 123+ messages in thread From: Junio C Hamano @ 2020-03-27 15:51 UTC (permalink / raw) To: Elijah Newren Cc: Matheus Tavares Bernardino, Git Mailing List, Derrick Stolee, Nguyễn Thái Ngọc

Elijah Newren <newren@gmail.com> writes:

> Sometimes, code that wasn't meant to be used together accidentally is
> used together or the docs suggest they can be used together. ...
> ... but that's the
> type of accident that is easy to have in the implementation or docs
> because it doesn't even occur to people who understand the design and
> the data structures why anyone would attempt that.

The above is not limited to "git grep", but you said so clearly what
I have felt, without being able to express myself in a satisfactory
manner, for the last 10 years.

> ... (Side note: I think this kind of
> issues occurs fairly frequently, so I'm unlikely to assume options
> were meant to be supported together based solely on a lack of logic
> that would throw an error when both are specified.

Amen to that.

By the way, and I am so sorry for making the main issue of the
discussion into a mere "by the way" point, but if I understand your
message correctly, the primary conclusion in there is that a file
that is not in the working tree, if the sparsity pattern tells us
that it should not be checked out to the working tree, should not be
sought in the index instead. I think I agree with that conclusion.

I do have some disagreement on a minor point, though.

"git grep -e '<pattern>' master" looks for the pattern in the commit
at the tip of the master branch. "git grep -e '<pattern>' master
pu" does so in these two commits. I do not think it is conceptually
wrong to allow "git grep -e '<pattern>' --cached master pu" to look
for three "commits", i.e. those two commits that already exist, plus
the one you would be creating if you were to "git commit" right now.
Similarly, I do not see a reason why we should forbid looking for
the same pattern in the tracked files in the working tree at the
same time we check tree object(s) and/or the index.

At least in principle.

There are two practical issues that make these combinations
problematic, but I do not think they are insurmountable.

 - Once you give an object on the command line, there is no syntax
   to let you say "oh, by the way, I want the working tree as well".
   If you are looking in the index, the working tree, and optionally
   in some objects, "--index" instead of "--cached" would be the
   standard way to tell the command "I want to affect both the index
   and the working tree", but there is no way to say "I want only
   tracked files in the working tree and these objects searched".
   We'd need a new syntax to express it if we wanted to allow the
   combination.

 - The lines found in the working tree and in the index are prefixed
   by the filename, while lines found in a tree object are prefixed
   by the tree's name and a colon. When output for the working tree
   and the index are combined, we cannot tell where each hit came
   from. We need to change the output to allow us to tell them
   apart, by e.g. prefixing "<worktree>:" and "<index>:" in a way
   similar to how we use "<revision>:".

Thanks.

^ permalink raw reply [flat|nested] 123+ messages in thread
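The second practical issue in the message above can be observed
directly: hits from a tree object carry the tree's name as a prefix,
while hits from the index and from the working tree look identical.
The sketch below is illustrative only; it uses HEAD rather than a
branch name such as master so that it does not depend on the default
branch, and the file name and contents are made up.

```shell
#!/bin/sh
# Sketch: compare the output prefix of git grep for a revision, the index,
# and the working tree. All names and contents are illustrative.
set -eu

tmp=$(mktemp -d)
# Keep user and system configuration out of the demo.
export HOME="$tmp" GIT_CONFIG_NOSYSTEM=1
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

git init -q "$tmp/repo"
cd "$tmp/repo"

echo 'needle' >file.txt
git add file.txt
git commit -q -m 'add file'

rev_hit=$(git grep -e needle HEAD)          # searched in a tree object
index_hit=$(git grep --cached -e needle)    # searched in the index
worktree_hit=$(git grep -e needle)          # searched in the working tree

echo "revision:     $rev_hit"
echo "index:        $index_hit"
echo "working tree: $worktree_hit"
```

Since the last two lines come out the same, merged output from an
index search and a working tree search would need an extra marker,
which is what the suggested "<worktree>:" and "<index>:" prefixes
would provide.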
* Re: [RFC PATCH 3/3] grep: add option to ignore sparsity patterns 2020-03-27 15:51 ` Junio C Hamano @ 2020-03-27 19:01 ` Elijah Newren 0 siblings, 0 replies; 123+ messages in thread From: Elijah Newren @ 2020-03-27 19:01 UTC (permalink / raw) To: Junio C Hamano Cc: Matheus Tavares Bernardino, Git Mailing List, Derrick Stolee, Nguyễn Thái Ngọc On Fri, Mar 27, 2020 at 8:51 AM Junio C Hamano <gitster@pobox.com> wrote: > > Elijah Newren <newren@gmail.com> writes: > > > Sometimes, code that wasn't meant to be used together accidentally is > > used together or the docs suggest they can be used together. ... > > ... but that's the > > type of accident that is easy to have in the implementation or docs > > because it doesn't even occur to people who understand the design and > > the data structures why anyone would attempt that. > > The above is not limited to "git grep", but you said so clearly what > I have felt, without being able to express myself in a satisfactory > manner, for the last 10 years. > > > ... (Side note: I think this kind of > > issues occurs fairly frequently, so I'm unlikely to assume options > > were meant to be supported together based solely on a lack of logic > > that would throw an error when both are specified. > > Amen to that. > > By the way, and I am so sorry to making the main issue of the > discussion into a mere "by the way" point, but if I understand your > message correctly, the primary conclusion in there is that a file > that is not in the working tree, if the sparsity pattern tells us > that it should not be checked out to the working tree, should not be > sought in the index instead. I think I agree with that conclusion. Cool. > I however have some disagreement on a minor point, though. > > "git grep -e '<pattern>' master" looks for the pattern in the commit > at the tip of the master branch. "git grep -e '<pattern>' master > pu" does so in these two commits. 
I do not think it is conceptually > wrong to allow "git grep -e '<pattern>' --cached master pu" to look > for three "commits", i.e. those two commits that already exist, plus > the one you would be creating if you were to "git commit" right now. > Similarly, I do not see a reason why we should forbid looking for > the same pattern in the tracked files in the working tree at the > same time we check tree object(s) and/or the index. > > At least in principle. > > There are two practical issues that makes these combinations > problematic, but I do not think they are insurmountable. > > - Once you give an object on the command line, there is no syntax > to let you say "oh, by the way, I want the working tree as well". > If you are looking in the index, the working tree, and optionally > in some objects, "--index" instead of "--cached" would be the > standard way to tell the command "I want to affect both the index > and the working tree", but there is no way to say "I want only > tracked files in the working tree and these objects searched". > We'd need a new syntax to express it if we wanted to allow the > combination. > > - The lines found in the working tree and in the index are prefixed > by the filename, while they are prefixed by the tree's name and a > colon. When output for the working tree and the index are > combined, we cannot tell where each hit came from. We need to > change the output to allow us to tell them apart, by > e.g. prefixing "<worktree>:" and "<index>:" in a way similar to > we use "<revision>:". > > Thanks. Ah, so you're saying that even though --cached and REVISION are incompatible today, that's not fundamental and we could conceivably let them or even more options be used together in the future and you even highlight how it could be made to sensibly work. 
I agree with what you say here: _if_ there is a way for users to explicitly specify that they want to search multiple versions (whether that is revisions or the index or the working tree), _and_ we have a way to distinguish which version we found the results from, then (and only then) it'd make sense to search the complete set of files from each of those versions and show the results for the matches we found. That differs in multiple important ways from the SKIP_WORKTREE behavior I was railing against, and I think what you propose as a possibility in contrast would make sense. ^ permalink raw reply [flat|nested] 123+ messages in thread
* Re: [RFC PATCH 3/3] grep: add option to ignore sparsity patterns 2020-03-26 6:02 ` Elijah Newren 2020-03-27 15:51 ` Junio C Hamano @ 2020-03-30 1:12 ` Matheus Tavares Bernardino 2020-03-31 16:48 ` Elijah Newren 1 sibling, 1 reply; 123+ messages in thread From: Matheus Tavares Bernardino @ 2020-03-30 1:12 UTC (permalink / raw) To: Elijah Newren Cc: Git Mailing List, Derrick Stolee, Nguyễn Thái Ngọc, Junio C Hamano On Thu, Mar 26, 2020 at 3:02 AM Elijah Newren <newren@gmail.com> wrote: > > Hi Matheus! Hi, Elijah. First of all, thanks for taking the time to go over these topics in great detail. I must say it's much clearer for me now. > On Wed, Mar 25, 2020 at 4:15 PM Matheus Tavares Bernardino > <matheus.bernardino@usp.br> wrote: > > [...] > One more useful case to consider before we start adding SKIP_WORKTREE > into the mix. Let's say that you have three files: > fileA > fileB > fileC > and all of them are tracked. You have made edits to fileA and fileB, > and ran 'rm fileC' (NOT 'git rm fileC', i.e. the deletion is not > staged). Now, you run 'git grep mystring'. Quick question: Which > files are searched for 'mystring'? Well... > * REVISION and --cached were left out of the git grep command, so > working tree files should be searched, not staged versions or versions > from other commits > * No flags like --untracked or --no-exclude-standard were included, > so only tracked files in the working tree should be searched > * There are two files in the working tree, both tracked: fileA and fileB. > So, this searches fileA and fileB. In particular: NO VERSION of fileC > is searched. fileC may be tracked/cached, but we don't search any > version of that file, because this particular command line is about > searching the working directory and fileC is not in the working > directory. To the best of my knowledge, git grep has always behaved > that way. > > Users understand the idea of searching the working copy vs. the index > vs. 
"old" (or different) versions of the repository. They also > understand that when searching the working copy, by default a subset > of the files are searched. Tell me: given all this information here, > what possible explanation is there for SKIP_WORKTREE entries to be > translated into searches of the cache when --cached is not specified? > Please square that away with the fact that 'rm fileC' results in fileC > NOT being searched. > > It's just completely, utterly wrong. Makes sense, thanks. I agree that we shouldn't fall back to the cache when searching the working tree. > Also, hopefully this helps answer your question about --untracked and > skip_worktree. --untracked is only useful when searching through the > working tree, and is entirely about adding the "untracked" category to > the things we search. The skip_worktree bit is about adding more > granularity to the "tracked" category. The two are thus entirely > orthogonal and --untracked shouldn't change behavior at all in the > face of sparse checkouts. Thanks, your explanation clarified the issue I had. I see now why --untracked and --ignore-sparsity don't make sense together. It also made me think about the combination of --cached and --untracked which, IIUC, should be prohibited. I will add a patch in v2, making git-grep error out in this case. > And I also think it explains more when the sparsity patterns and > --ignore-sparsity-patterns flags even matter. The division of working > tree files which were tracked into two subsets (those that match > sparsity patterns and those that don't) didn't matter because only one > of those two sets existed and could be searched. So the question is, > when can the sparsity pattern divide a set of files into two subsets > where both are non-empty? And the answer is when --cached or REVISION > is specified. Makes sense. I will add in --ignore-sparsity's description that it is only relevant with --cached or REVISION, as you previously suggested. 
When it is used outside of these cases, though, I think we could just warn that --ignore-sparsity will be discarded (to avoid erroring out when users have grep.ignoreSparsity enabled). ^ permalink raw reply [flat|nested] 123+ messages in thread
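To see which tracked files a sparse checkout actually sets aside, and
therefore where an option like --ignore-sparsity could make a
difference, one can inspect the index flags. This is a hedged sketch:
it assumes a Git recent enough to ship the `git sparse-checkout`
builtin, uses cone mode explicitly, and all paths are invented for the
example. It deliberately does not show grep output for the sparse
case, since that depends on the Git version in use.

```shell
#!/bin/sh
# Sketch: show which index entries carry the skip-worktree bit after
# narrowing a checkout to one directory. All paths are illustrative.
set -eu

tmp=$(mktemp -d)
# Keep user and system configuration out of the demo.
export HOME="$tmp" GIT_CONFIG_NOSYSTEM=1
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

git init -q "$tmp/repo"
cd "$tmp/repo"

mkdir dir1 dir2
echo 'top level' >README
echo 'needle' >dir1/a
echo 'needle' >dir2/b
git add README dir1/a dir2/b
git commit -q -m 'add files'

# Cone mode keeps top-level files plus the listed directories.
git sparse-checkout init --cone
git sparse-checkout set dir1

# 'H' marks an ordinary cached entry, 'S' one with the skip-worktree bit.
flags=$(git ls-files -t)
echo "$flags"

if [ -e dir2/b ]; then
	present=yes
else
	present=no
fi
echo "dir2/b present in working tree: $present"
```

Entries marked 'S' are still in the index and in every commit that
contains them, which is why the sparsity question only has two
non-empty sides when --cached or a revision is searched.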
* Re: [RFC PATCH 3/3] grep: add option to ignore sparsity patterns 2020-03-30 1:12 ` Matheus Tavares Bernardino @ 2020-03-31 16:48 ` Elijah Newren 0 siblings, 0 replies; 123+ messages in thread From: Elijah Newren @ 2020-03-31 16:48 UTC (permalink / raw) To: Matheus Tavares Bernardino Cc: Git Mailing List, Derrick Stolee, Nguyễn Thái Ngọc, Junio C Hamano On Sun, Mar 29, 2020 at 6:13 PM Matheus Tavares Bernardino <matheus.bernardino@usp.br> wrote: > > On Thu, Mar 26, 2020 at 3:02 AM Elijah Newren <newren@gmail.com> wrote: > > > > Hi Matheus! > > Hi, Elijah. > > First of all, thanks for taking the time to go over these topics in > great detail. I must say it's much clearer for me now. > > > On Wed, Mar 25, 2020 at 4:15 PM Matheus Tavares Bernardino > > <matheus.bernardino@usp.br> wrote: > > > > [...] > > One more useful case to consider before we start adding SKIP_WORKTREE > > into the mix. Let's say that you have three files: > > fileA > > fileB > > fileC > > and all of them are tracked. You have made edits to fileA and fileB, > > and ran 'rm fileC' (NOT 'git rm fileC', i.e. the deletion is not > > staged). Now, you run 'git grep mystring'. Quick question: Which > > files are searched for 'mystring'? Well... > > * REVISION and --cached were left out of the git grep command, so > > working tree files should be searched, not staged versions or versions > > from other commits > > * No flags like --untracked or --no-exclude-standard were included, > > so only tracked files in the working tree should be searched > > * There are two files in the working tree, both tracked: fileA and fileB. > > So, this searches fileA and fileB. In particular: NO VERSION of fileC > > is searched. fileC may be tracked/cached, but we don't search any > > version of that file, because this particular command line is about > > searching the working directory and fileC is not in the working > > directory. To the best of my knowledge, git grep has always behaved > > that way. 
> > > > Users understand the idea of searching the working copy vs. the index > > vs. "old" (or different) versions of the repository. They also > > understand that when searching the working copy, by default a subset > > of the files are searched. Tell me: given all this information here, > > what possible explanation is there for SKIP_WORKTREE entries to be > > translated into searches of the cache when --cached is not specified? > > Please square that away with the fact that 'rm fileC' results in fileC > > NOT being searched. > > > > It's just completely, utterly wrong. > > Makes sense, thanks. I agree that we shouldn't fall back to the cache > when searching the working tree. > > > Also, hopefully this helps answer your question about --untracked and > > skip_worktree. --untracked is only useful when searching through the > > working tree, and is entirely about adding the "untracked" category to > > the things we search. The skip_worktree bit is about adding more > > granularity to the "tracked" category. The two are thus entirely > > orthogonal and --untracked shouldn't change behavior at all in the > > face of sparse checkouts. > > Thanks, your explanation clarified the issue I had. I see now why > --untracked and --ignore-sparsity don't make sense together. > > It also made me think about the combination of --cached and > --untracked which, IIUC, should be prohibited. I will add a patch in > v2, making git-grep error out in this case. > > > And I also think it explains more when the sparsity patterns and > > --ignore-sparsity-patterns flags even matter. The division of working > > tree files which were tracked into two subsets (those that match > > sparsity patterns and those that don't) didn't matter because only one > > of those two sets existed and could be searched. So the question is, > > when can the sparsity pattern divide a set of files into two subsets > > where both are non-empty? And the answer is when --cached or REVISION > > is specified. 
> > Makes sense. I will add in --ignore-sparsity's description that it is > only relevant with --cached or REVISION, as you previously suggested. > When it is used outside of these cases, though, I think we could just > warn that --ignore-sparsity will be discarded (to avoid erroring out > when users have grep.ignoreSparsity enabled). Not grep.ignoreSparsity but core.ignoreSparsity or core.$WHATEVER ;-) ^ permalink raw reply [flat|nested] 123+ messages in thread
* [RFC PATCH v2 0/4] grep: honor sparse checkout and add option to ignore it 2020-03-24 6:04 [RFC PATCH 0/3] grep: honor sparse checkout and add option to ignore it Matheus Tavares ` (2 preceding siblings ...) 2020-03-24 6:13 ` [RFC PATCH 3/3] grep: add option to ignore sparsity patterns Matheus Tavares @ 2020-05-10 0:41 ` Matheus Tavares 2020-05-10 0:41 ` [RFC PATCH v2 1/4] doc: grep: unify info on configuration variables Matheus Tavares ` (4 more replies) 3 siblings, 5 replies; 123+ messages in thread From: Matheus Tavares @ 2020-05-10 0:41 UTC (permalink / raw) To: git; +Cc: gitster, stolee, newren, jonathantanmy

This series is based on the discussions in [1]. The idea is to make
git-grep (and other commands, in the future) be able to restrict their
output to the sparsity patterns, when requested by the user.

Main changes since v1:

In patch 1:
- Remove two unnecessary references in git-grep.txt, as they are in the
  same document.

Added patch 2.

In patch 3:
- Match paths directly against the sparsity patterns, when grepping a
  given tree, instead of checking the index.
- Better handle searches with --recurse-submodules when the superproject
  and/or the submodule have sparse checkout enabled. And add tests for
  cases like these.
- In tests, use the builtin git-sparse-checkout instead of manually
  writing to the sparse-checkout file.
- Add tests for grepping in cone mode.
- Rename the previous 'from_commit' parameter and a test name, to be
  more meaningful.

Note: it was suggested to change some of the tests in this patch to use
cone mode. I ended up using both cone mode and full patterns, so that we
could check that grep behaves correctly when submodules have different
pattern rules than the superproject. I tried to leave the testing repo's
structure simple, though, so that the tests remain readable.

In patch 4:
- Move the configuration that restricts commands' behavior based on the
  sparse checkout from the 'grep' namespace to 'sparse', as the idea is
  to have the same setting affecting multiple commands.
- Add the --[no-]restrict-to-sparse-paths global option.
- Add more tests for the setting and CLI option in grep.
- Add tests to ensure the submodules' values for the setting are
  respected when running grep with --recurse-submodules.

Note: in this patch, I used the 'sparse' namespace, instead of 'core',
following the idea we discussed in [2], to have the sparse checkout
settings in their own namespace. We also talked about moving
core.sparseCheckout and core.sparseCheckoutCone to the new namespace. I
tried implementing this change in this same patchset (although, on
second thought, it is probably better to do it in another one), but I
still haven't managed to come up with a rename implementation that keeps
good compatibility. The problems are the ones Elijah listed in [3]. So,
for now, sparse.restrictCmds is the only setting in the 'sparse'
namespace. But it won't be the only one for too long, as Stolee is
already implementing other ones [4].
[1]: https://lore.kernel.org/git/CAHd-oW7e5qCuxZLBeVDq+Th3E+E4+P8=WzJfK8WcG2yz=n_nag@mail.gmail.com/t/#u [2]: https://lore.kernel.org/git/49c1e9a5-b234-1696-03cc-95bf95f4663c@gmail.com/ [3]: https://lore.kernel.org/git/CABPp-BGytfCugK0S99nLPH4_VXmcYPHWdVyLO59BZc4__4CT9w@mail.gmail.com/ [4]: https://lore.kernel.org/git/2188577cd848d7cee77f06f1ad2b181864e5e36d.1588857462.git.gitgitgadget@gmail.com/ Matheus Tavares (4): doc: grep: unify info on configuration variables config: load the correct config.worktree file grep: honor sparse checkout patterns config: add setting to ignore sparsity patterns in some cmds Documentation/config.txt | 2 + Documentation/config/grep.txt | 10 +- Documentation/config/sparse.txt | 22 +++ Documentation/git-grep.txt | 37 +---- Documentation/git.txt | 4 + Makefile | 1 + builtin/grep.c | 137 +++++++++++++++- config.c | 5 +- contrib/completion/git-completion.bash | 2 + git.c | 6 + sparse-checkout.c | 16 ++ sparse-checkout.h | 11 ++ t/t7011-skip-worktree-reading.sh | 9 -- t/t7817-grep-sparse-checkout.sh | 216 +++++++++++++++++++++++++ t/t9902-completion.sh | 4 +- 15 files changed, 431 insertions(+), 51 deletions(-) create mode 100644 Documentation/config/sparse.txt create mode 100644 sparse-checkout.c create mode 100644 sparse-checkout.h create mode 100755 t/t7817-grep-sparse-checkout.sh Range-diff against v1: 1: 7ba5caf10d ! 1: c344d22313 doc: grep: unify info on configuration variables @@ Commit message Explanations about the configuration variables for git-grep are duplicated in "Documentation/git-grep.txt" and - "Documentation/config/grep.txt". Let's unify the information in the - second file and include it in the first. + "Documentation/config/grep.txt", which can make maintenance difficult. + The first also contains a definition not present in the latter + (grep.fullName). To avoid problems like this, let's unify the + information in the second file and include it in the first. 
Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br> @@ Documentation/config/grep.txt: grep.extendedRegexp:: grep.threads:: - Number of grep worker threads to use. - See `grep.threads` in linkgit:git-grep[1] for more information. -+ Number of grep worker threads to use. See `--threads` in -+ linkgit:git-grep[1] for more information. ++ Number of grep worker threads to use. See `--threads` ++ifndef::git-grep[] ++ in linkgit:git-grep[1] ++endif::git-grep[] ++ for more information. + +grep.fullName:: + If set to true, enable `--full-name` option by default. @@ Documentation/git-grep.txt: characters. An empty string as search expression ma - If set to true, fall back to git grep --no-index if git grep - is executed outside of a git repository. Defaults to false. - ++:git-grep: 1 +include::config/grep.txt[] OPTIONS @@ Documentation/git-grep.txt: providing this option will cause it to die. + Number of grep worker threads to use. If not provided (or set to + 0), Git will use as many worker threads as the number of logical + cores available. The default value can also be set with the -+ `grep.threads` configuration (see linkgit:git-config[1]). ++ `grep.threads` configuration. -f <file>:: Read patterns from <file>, one per line. -: ---------- > 2: 882310b69f config: load the correct config.worktree file 2: 0b9b4c4b41 ! 3: e00674c727 grep: honor sparse checkout patterns @@ Commit message One of the main uses for a sparse checkout is to allow users to focus on the subset of files in a repository in which they are interested. But git-grep currently ignores the sparsity patterns and report all matches - found outside this subset, which kind of goes in the oposity direction. + found outside this subset, which kind of goes in the opposite direction. 
Let's fix that, making it honor the sparsity boundaries for every grepping case: @@ builtin/grep.c: static int grep_cache(struct grep_opt *opt, static int grep_tree(struct grep_opt *opt, const struct pathspec *pathspec, struct tree_desc *tree, struct strbuf *base, int tn_len, - int check_attr); -+ int from_commit); ++ int is_root_tree); static int grep_submodule(struct grep_opt *opt, const struct pathspec *pathspec, @@ builtin/grep.c: static int grep_cache(struct grep_opt *opt, for (nr = 0; nr < repo->index->cache_nr; nr++) { const struct cache_entry *ce = repo->index->cache[nr]; + -+ if (ce_skip_worktree(ce)) ++ if (ce_skip_worktree(ce) && !S_ISGITLINK(ce->ce_mode)) + continue; + strbuf_setlen(&name, name_base_len); @@ builtin/grep.c: static int grep_cache(struct grep_opt *opt, continue; hit |= grep_oid(opt, &ce->oid, name.buf, @@ builtin/grep.c: static int grep_cache(struct grep_opt *opt, + return hit; + } - static int grep_tree(struct grep_opt *opt, const struct pathspec *pathspec, - struct tree_desc *tree, struct strbuf *base, int tn_len, +-static int grep_tree(struct grep_opt *opt, const struct pathspec *pathspec, +- struct tree_desc *tree, struct strbuf *base, int tn_len, - int check_attr) -+ int from_commit) ++static struct pattern_list *get_sparsity_patterns(struct repository *repo) ++{ ++ struct pattern_list *patterns; ++ char *sparse_file; ++ int sparse_config, cone_config; ++ ++ if (repo_config_get_bool(repo, "core.sparsecheckout", &sparse_config) || ++ !sparse_config) { ++ return NULL; ++ } ++ ++ sparse_file = repo_git_path(repo, "info/sparse-checkout"); ++ patterns = xcalloc(1, sizeof(*patterns)); ++ ++ if (repo_config_get_bool(repo, "core.sparsecheckoutcone", &cone_config)) ++ cone_config = 0; ++ patterns->use_cone_patterns = cone_config; ++ ++ if (add_patterns_from_file_to_list(sparse_file, "", 0, patterns, NULL)) { ++ if (file_exists(sparse_file)) { ++ warning(_("failed to load sparse-checkout file: '%s'"), ++ sparse_file); ++ } ++ 
free(sparse_file); ++ free(patterns); ++ return NULL; ++ } ++ ++ free(sparse_file); ++ return patterns; ++} ++ ++static int in_sparse_checkout(struct strbuf *path, int prefix_len, ++ unsigned int entry_mode, ++ struct index_state *istate, ++ struct pattern_list *sparsity, ++ enum pattern_match_result parent_match, ++ enum pattern_match_result *match) ++{ ++ int dtype = DT_UNKNOWN; ++ ++ if (S_ISGITLINK(entry_mode)) ++ return 1; ++ ++ if (parent_match == MATCHED_RECURSIVE) { ++ *match = parent_match; ++ return 1; ++ } ++ ++ if (S_ISDIR(entry_mode) && !is_dir_sep(path->buf[path->len - 1])) ++ strbuf_addch(path, '/'); ++ ++ *match = path_matches_pattern_list(path->buf, path->len, ++ path->buf + prefix_len, &dtype, ++ sparsity, istate); ++ if (*match == UNDECIDED) ++ *match = parent_match; ++ ++ if (S_ISDIR(entry_mode)) ++ strbuf_trim_trailing_dir_sep(path); ++ ++ if (*match == NOT_MATCHED && (S_ISREG(entry_mode) || ++ (S_ISDIR(entry_mode) && sparsity->use_cone_patterns))) ++ return 0; ++ ++ return 1; ++} ++ ++static int do_grep_tree(struct grep_opt *opt, const struct pathspec *pathspec, ++ struct tree_desc *tree, struct strbuf *base, int tn_len, ++ int check_attr, struct pattern_list *sparsity, ++ enum pattern_match_result default_sparsity_match) { struct repository *repo = opt->repo; int hit = 0; @@ builtin/grep.c: static int grep_tree(struct grep_opt *opt, const struct pathspec *pathspec, - name_base_len = name.len; - } -+ if (from_commit && repo_read_index(repo) < 0) -+ die(_("index file corrupt")); -+ while (tree_entry(tree, &entry)) { int te_len = tree_entry_len(&entry); ++ enum pattern_match_result sparsity_match = 0; + if (match != all_entries_interesting) { + strbuf_addstr(&name, base->buf + tn_len); @@ builtin/grep.c: static int grep_tree(struct grep_opt *opt, const struct pathspec *pathspec, strbuf_add(base, entry.path, te_len); -+ if (from_commit) { -+ int pos = index_name_pos(repo->index, -+ base->buf + tn_len, -+ base->len - tn_len); -+ if (pos >= 0 && -+ 
ce_skip_worktree(repo->index->cache[pos])) { ++ if (sparsity) { ++ struct strbuf path = STRBUF_INIT; ++ strbuf_addstr(&path, base->buf + tn_len); ++ ++ if (!in_sparse_checkout(&path, old_baselen - tn_len, ++ entry.mode, repo->index, ++ sparsity, default_sparsity_match, ++ &sparsity_match)) { + strbuf_setlen(base, old_baselen); + continue; + } @@ builtin/grep.c: static int grep_tree(struct grep_opt *opt, const struct pathspec + if (S_ISREG(entry.mode)) { hit |= grep_oid(opt, &entry.oid, base->buf, tn_len, -- check_attr ? base->buf + tn_len : NULL); -+ from_commit ? base->buf + tn_len : NULL); - } else if (S_ISDIR(entry.mode)) { - enum object_type type; - struct tree_desc sub; + check_attr ? base->buf + tn_len : NULL); @@ builtin/grep.c: static int grep_tree(struct grep_opt *opt, const struct pathspec *pathspec, + strbuf_addch(base, '/'); init_tree_desc(&sub, data, size); - hit |= grep_tree(opt, pathspec, &sub, base, tn_len, +- hit |= grep_tree(opt, pathspec, &sub, base, tn_len, - check_attr); -+ from_commit); ++ hit |= do_grep_tree(opt, pathspec, &sub, base, tn_len, ++ check_attr, sparsity, sparsity_match); free(data); } else if (recurse_submodules && S_ISGITLINK(entry.mode)) { hit |= grep_submodule(opt, pathspec, &entry.oid, +@@ builtin/grep.c: static int grep_tree(struct grep_opt *opt, const struct pathspec *pathspec, + return hit; + } + ++/* ++ * Note: sparsity patterns and paths' attributes will only be considered if ++ * is_root_tree has true value. (Otherwise, we cannot properly perform pattern ++ * matching on paths.) 
++ */ ++static int grep_tree(struct grep_opt *opt, const struct pathspec *pathspec, ++ struct tree_desc *tree, struct strbuf *base, int tn_len, ++ int is_root_tree) ++{ ++ struct pattern_list *patterns = NULL; ++ int ret; ++ ++ if (is_root_tree) ++ patterns = get_sparsity_patterns(opt->repo); ++ ++ ret = do_grep_tree(opt, pathspec, tree, base, tn_len, is_root_tree, ++ patterns, 0); ++ ++ if (patterns) { ++ clear_pattern_list(patterns); ++ free(patterns); ++ } ++ return ret; ++} ++ + static int grep_object(struct grep_opt *opt, const struct pathspec *pathspec, + struct object *obj, const char *name, const char *path) + { ## t/t7011-skip-worktree-reading.sh ## @@ t/t7011-skip-worktree-reading.sh: test_expect_success 'ls-files --modified' ' @@ t/t7817-grep-sparse-checkout.sh (new) + +test_description='grep in sparse checkout + -+This test creates the following dir structure: ++This test creates a repo with the following structure: ++ +. -+| - a -+| - b -+| - dir -+ | - c ++|-- a ++|-- b ++|-- dir ++| `-- c ++`-- sub ++ |-- A ++ | `-- a ++ `-- B ++ `-- b + -+Only "a" should be present due to the sparse checkout patterns: -+"/*", "!/b" and "!/dir". ++Where . has non-cone mode sparsity patterns and sub is a submodule with cone ++mode sparsity patterns. The resulting sparse-checkout should leave the following ++structure: ++ ++. ++|-- a ++`-- sub ++ `-- B ++ `-- b +' + +. 
./test-lib.sh @@ t/t7817-grep-sparse-checkout.sh (new) + echo "text" >b && + mkdir dir && + echo "text" >dir/c && ++ ++ git init sub && ++ ( ++ cd sub && ++ mkdir A B && ++ echo "text" >A/a && ++ echo "text" >B/b && ++ git add A B && ++ git commit -m sub && ++ git sparse-checkout init --cone && ++ git sparse-checkout set B ++ ) && ++ ++ git submodule add ./sub && + git add a b dir && -+ git commit -m "initial commit" && ++ git commit -m super && ++ git sparse-checkout init --no-cone && ++ git sparse-checkout set "/*" "!b" "!/*/" && ++ + git tag -am t-commit t-commit HEAD && + tree=$(git rev-parse HEAD^{tree}) && + git tag -am t-tree t-tree $tree && -+ cat >.git/info/sparse-checkout <<-EOF && -+ /* -+ !/b -+ !/dir -+ EOF -+ git sparse-checkout init && ++ + test_path_is_missing b && + test_path_is_missing dir && -+ test_path_is_file a ++ test_path_is_missing sub/A && ++ test_path_is_file a && ++ test_path_is_file sub/B/b +' + +test_expect_success 'grep in working tree should honor sparse checkout' ' @@ t/t7817-grep-sparse-checkout.sh (new) + test_cmp expect_t-commit actual_t-commit +' + -+test_expect_success 'grep <tree-ish> should search outside sparse checkout' ' ++test_expect_success 'grep <tree-ish> should ignore sparsity patterns' ' + commit=$(git rev-parse HEAD) && + tree=$(git rev-parse HEAD^{tree}) && + cat >expect_tree <<-EOF && @@ t/t7817-grep-sparse-checkout.sh (new) + test_cmp expect_t-tree actual_t-tree +' + ++test_expect_success 'grep --recurse-submodules --cached should honor sparse checkout in submodule' ' ++ cat >expect <<-EOF && ++ a:text ++ sub/B/b:text ++ EOF ++ git grep --recurse-submodules --cached "text" >actual && ++ test_cmp expect actual ++' ++ ++test_expect_success 'grep --recurse-submodules <commit-ish> should honor sparse checkout in submodule' ' ++ commit=$(git rev-parse HEAD) && ++ cat >expect_commit <<-EOF && ++ $commit:a:text ++ $commit:sub/B/b:text ++ EOF ++ cat >expect_t-commit <<-EOF && ++ t-commit:a:text ++ t-commit:sub/B/b:text 
++ EOF ++ git grep --recurse-submodules "text" $commit >actual_commit && ++ test_cmp expect_commit actual_commit && ++ git grep --recurse-submodules "text" t-commit >actual_t-commit && ++ test_cmp expect_t-commit actual_t-commit ++' ++ +test_done 3: a76242ecfa < -: ---------- grep: add option to ignore sparsity patterns -: ---------- > 4: 3e9e906249 config: add setting to ignore sparsity patterns in some cmds -- 2.26.2 ^ permalink raw reply [flat|nested] 123+ messages in thread
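The range-diff above introduces in_sparse_checkout(), whose decision rule is easy to lose in the flattened diff. What follows is a minimal standalone sketch of that rule, not Git's code: the enum values are redeclared locally, `entry_in_sparse_checkout`, `enum entry_kind`, and the two stub matchers are hypothetical names, and the callback stands in for path_matches_pattern_list(). Unlike the patch, the sketch always assigns `*match`, so the caller does not have to pre-initialize it.

```c
#include <stdbool.h>

/* Local copy of the result names used in the patch (not Git's dir.h). */
enum match_result {
	UNDECIDED = -1,
	NOT_MATCHED = 0,
	MATCHED = 1,
	MATCHED_RECURSIVE = 2,
};

/* Hypothetical stand-in for the S_ISREG/S_ISDIR/S_ISGITLINK mode checks. */
enum entry_kind { ENTRY_FILE, ENTRY_DIR, ENTRY_SUBMODULE };

/* Hypothetical callback standing in for path_matches_pattern_list(). */
typedef enum match_result (*match_fn)(const char *path);

/* Returns true when the tree entry should still be visited by grep. */
bool entry_in_sparse_checkout(const char *path, enum entry_kind kind,
			      bool cone_mode, enum match_result parent_match,
			      match_fn match_path, enum match_result *match)
{
	*match = parent_match;

	/* Submodules carry their own sparsity settings, so never prune them. */
	if (kind == ENTRY_SUBMODULE)
		return true;

	/* A recursively matched parent covers everything below it. */
	if (parent_match == MATCHED_RECURSIVE)
		return true;

	*match = match_path(path);
	if (*match == UNDECIDED)
		*match = parent_match; /* no pattern applied: inherit */

	/*
	 * Files are dropped as soon as they do not match. Directories are
	 * pruned only in cone mode; otherwise a later pattern could still
	 * re-include something inside them.
	 */
	if (*match == NOT_MATCHED &&
	    (kind == ENTRY_FILE || (kind == ENTRY_DIR && cone_mode)))
		return false;

	return true;
}

/* Trivial stub matchers, useful only for experimenting with the rule. */
enum match_result always_undecided(const char *path)
{
	(void)path;
	return UNDECIDED;
}

enum match_result always_not_matched(const char *path)
{
	(void)path;
	return NOT_MATCHED;
}
```

With these stubs, a file whose own match is undecided follows its parent directory, and a non-matching directory is skipped in cone mode but still descended into with full patterns.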
* [RFC PATCH v2 1/4] doc: grep: unify info on configuration variables 2020-05-10 0:41 ` [RFC PATCH v2 0/4] grep: honor sparse checkout and add option to ignore it Matheus Tavares @ 2020-05-10 0:41 ` Matheus Tavares 2020-05-10 0:41 ` [RFC PATCH v2 2/4] config: load the correct config.worktree file Matheus Tavares ` (3 subsequent siblings) 4 siblings, 0 replies; 123+ messages in thread From: Matheus Tavares @ 2020-05-10 0:41 UTC (permalink / raw) To: git; +Cc: gitster, stolee, newren, jonathantanmy Explanations about the configuration variables for git-grep are duplicated in "Documentation/git-grep.txt" and "Documentation/config/grep.txt", which can make maintenance difficult. The first also contains a definition not present in the latter (grep.fullName). To avoid problems like this, let's unify the information in the second file and include it in the first. Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br> --- Documentation/config/grep.txt | 10 ++++++++-- Documentation/git-grep.txt | 36 ++++++----------------------------- 2 files changed, 14 insertions(+), 32 deletions(-) diff --git a/Documentation/config/grep.txt b/Documentation/config/grep.txt index 44abe45a7c..dd51db38e1 100644 --- a/Documentation/config/grep.txt +++ b/Documentation/config/grep.txt @@ -16,8 +16,14 @@ grep.extendedRegexp:: other than 'default'. grep.threads:: - Number of grep worker threads to use. - See `grep.threads` in linkgit:git-grep[1] for more information. + Number of grep worker threads to use. See `--threads` +ifndef::git-grep[] + in linkgit:git-grep[1] +endif::git-grep[] + for more information. + +grep.fullName:: + If set to true, enable `--full-name` option by default. grep.fallbackToNoIndex:: If set to true, fall back to git grep --no-index if git grep diff --git a/Documentation/git-grep.txt b/Documentation/git-grep.txt index a7f9bc99ea..9bdf807584 100644 --- a/Documentation/git-grep.txt +++ b/Documentation/git-grep.txt @@ -41,34 +41,8 @@ characters. 
An empty string as search expression matches all lines. CONFIGURATION ------------- -grep.lineNumber:: - If set to true, enable `-n` option by default. - -grep.column:: - If set to true, enable the `--column` option by default. - -grep.patternType:: - Set the default matching behavior. Using a value of 'basic', 'extended', - 'fixed', or 'perl' will enable the `--basic-regexp`, `--extended-regexp`, - `--fixed-strings`, or `--perl-regexp` option accordingly, while the - value 'default' will return to the default matching behavior. - -grep.extendedRegexp:: - If set to true, enable `--extended-regexp` option by default. This - option is ignored when the `grep.patternType` option is set to a value - other than 'default'. - -grep.threads:: - Number of grep worker threads to use. If unset (or set to 0), Git will - use as many threads as the number of logical cores available. - -grep.fullName:: - If set to true, enable `--full-name` option by default. - -grep.fallbackToNoIndex:: - If set to true, fall back to git grep --no-index if git grep - is executed outside of a git repository. Defaults to false. - +:git-grep: 1 +include::config/grep.txt[] OPTIONS ------- @@ -269,8 +243,10 @@ providing this option will cause it to die. found. --threads <num>:: - Number of grep worker threads to use. - See `grep.threads` in 'CONFIGURATION' for more information. + Number of grep worker threads to use. If not provided (or set to + 0), Git will use as many worker threads as the number of logical + cores available. The default value can also be set with the + `grep.threads` configuration. -f <file>:: Read patterns from <file>, one per line. -- 2.26.2 ^ permalink raw reply related [flat|nested] 123+ messages in thread
* [RFC PATCH v2 2/4] config: load the correct config.worktree file 2020-05-10 0:41 ` [RFC PATCH v2 0/4] grep: honor sparse checkout and add option to ignore it Matheus Tavares 2020-05-10 0:41 ` [RFC PATCH v2 1/4] doc: grep: unify info on configuration variables Matheus Tavares @ 2020-05-10 0:41 ` Matheus Tavares 2020-05-11 19:10 ` Junio C Hamano 2020-05-10 0:41 ` [RFC PATCH v2 3/4] grep: honor sparse checkout patterns Matheus Tavares ` (2 subsequent siblings) 4 siblings, 1 reply; 123+ messages in thread
From: Matheus Tavares @ 2020-05-10 0:41 UTC (permalink / raw)
To: git; +Cc: gitster, stolee, newren, jonathantanmy

One of the steps in do_git_config_sequence() is to load the
worktree-specific config file. Although the function receives a git_dir
string, it relies on git_pathdup(), which uses the_repository->git_dir,
to make the path to the file. Thus, when a submodule has a worktree
setting, a command executed in the superproject that recurses into the
submodule won't find the said setting.

Such a scenario might not be needed now, but it will be in the following
patch. git-grep will learn to honor sparse checkouts and, when running
with --recurse-submodules, the submodule's sparse checkout settings must
be loaded. As these settings are stored in the config.worktree file, they
would be ignored without this patch.

The fix is simple: we replace git_pathdup() with mkpathdup(), to format
the path with the given git_dir. This is the same idea used to make the
config.worktree path in setup.c:check_repository_format_gently().

Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br> --- config.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/config.c b/config.c index 8db9c77098..a3d0a0d266 100644 --- a/config.c +++ b/config.c @@ -1747,8 +1747,9 @@ static int do_git_config_sequence(const struct config_options *opts, ret += git_config_from_file(fn, repo_config, data); current_parsing_scope = CONFIG_SCOPE_WORKTREE; - if (!opts->ignore_worktree && repository_format_worktree_config) { - char *path = git_pathdup("config.worktree"); + if (!opts->ignore_worktree && repository_format_worktree_config && + opts->git_dir) { + char *path = mkpathdup("%s/config.worktree", opts->git_dir); if (!access_or_die(path, R_OK, 0)) ret += git_config_from_file(fn, path, data); free(path); -- 2.26.2 ^ permalink raw reply related [flat|nested] 123+ messages in thread
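The core of the fix above is that the config.worktree path must be derived from the git_dir that was passed in, not from process-wide state. A minimal standalone sketch of that idea follows. It uses plain libc instead of Git's mkpathdup(), and `worktree_config_path` is a hypothetical helper name, not a function from the patch. Returning NULL for a missing git_dir loosely mirrors the new `opts->git_dir` check in the hunk.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: build "<git_dir>/config.worktree" from an explicit
 * git_dir argument, so the result does not depend on which repository the
 * process happens to treat as the current one. The caller frees the result.
 */
char *worktree_config_path(const char *git_dir)
{
	static const char suffix[] = "/config.worktree";
	size_t len;
	char *path;

	if (!git_dir)
		return NULL; /* nothing to load without a git_dir */

	len = strlen(git_dir) + sizeof(suffix); /* sizeof counts the NUL */
	path = malloc(len);
	if (!path)
		return NULL;

	snprintf(path, len, "%s%s", git_dir, suffix);
	return path;
}
```

Called with a submodule's git directory, for example ".git/modules/sub", the helper yields ".git/modules/sub/config.worktree", which is the file a recursing command in the superproject needs to read.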
* Re: [RFC PATCH v2 2/4] config: load the correct config.worktree file 2020-05-10 0:41 ` [RFC PATCH v2 2/4] config: load the correct config.worktree file Matheus Tavares @ 2020-05-11 19:10 ` Junio C Hamano 2020-05-12 22:55 ` Matheus Tavares Bernardino 0 siblings, 1 reply; 123+ messages in thread From: Junio C Hamano @ 2020-05-11 19:10 UTC (permalink / raw) To: Matheus Tavares; +Cc: git, stolee, newren, jonathantanmy Matheus Tavares <matheus.bernardino@usp.br> writes: > One of the steps in do_git_config_sequence() is to load the > worktree-specific config file. Although the function receives a git_dir > string, it relies on git_pathdup(), which uses the_repository->git_dir, > to make the path to the file. Thus, when a submodule has a worktree > setting, a command executed in the superproject that recurses into the > submodule won't find the said setting. This has far wider ramifications than just "git grep" and it may be an important fix. Anything that wants to read from a per-worktree configuration is not working as expected when run from a secondary worktree, right? Can we add a test or two to protect this fix from future breakages? > current_parsing_scope = CONFIG_SCOPE_WORKTREE; > - if (!opts->ignore_worktree && repository_format_worktree_config) { > - char *path = git_pathdup("config.worktree"); > + if (!opts->ignore_worktree && repository_format_worktree_config && > + opts->git_dir) { > + char *path = mkpathdup("%s/config.worktree", opts->git_dir); > if (!access_or_die(path, R_OK, 0)) > ret += git_config_from_file(fn, path, data); > free(path); ^ permalink raw reply [flat|nested] 123+ messages in thread
* Re: [RFC PATCH v2 2/4] config: load the correct config.worktree file 2020-05-11 19:10 ` Junio C Hamano @ 2020-05-12 22:55 ` Matheus Tavares Bernardino 2020-05-12 23:22 ` Junio C Hamano 0 siblings, 1 reply; 123+ messages in thread From: Matheus Tavares Bernardino @ 2020-05-12 22:55 UTC (permalink / raw) To: Junio C Hamano; +Cc: git, Derrick Stolee, Elijah Newren, Jonathan Tan On Mon, May 11, 2020 at 4:10 PM Junio C Hamano <gitster@pobox.com> wrote: > > Matheus Tavares <matheus.bernardino@usp.br> writes: > > > One of the steps in do_git_config_sequence() is to load the > > worktree-specific config file. Although the function receives a git_dir > > string, it relies on git_pathdup(), which uses the_repository->git_dir, > > to make the path to the file. Thus, when a submodule has a worktree > > setting, a command executed in the superproject that recurses into the > > submodule won't find the said setting. > > This has far wider ramifications than just "git grep" and it may be > an important fix. Anything that wants to read from a per-worktree > configuration is not working as expected when run from a secondary > worktree, right? Hmm, I think the code should be able to retrieve the per-worktree configuration, in this case, as the_repository->gitdir will be pointing to the secondary worktree's gitdir. But when we want to read a per-worktree configuration from a repo other than the_repository, then the code doesn't find the setting (even if it is in the main worktree of the subrepo). > Can we add a test or two to protect this fix from future breakages? Sure! There are already a couple tests, in the following patch, that check this behavior *indirectly*. As we recurse into submodules, in grep, we try to retrieve the core.sparseCheckout setting for each submodule (which is stored in the subrepo's config.worktree file). The said tests make sure we can get this setting, and they indeed fail without this patch. 
But would it be better to also add a more direct test, in this patch? I think we could do so by adding a new test helper that prints submodules' configs, from the superproject, and then testing the presence of per-worktree configs in the output. ^ permalink raw reply [flat|nested] 123+ messages in thread
* Re: [RFC PATCH v2 2/4] config: load the correct config.worktree file 2020-05-12 22:55 ` Matheus Tavares Bernardino @ 2020-05-12 23:22 ` Junio C Hamano 0 siblings, 0 replies; 123+ messages in thread From: Junio C Hamano @ 2020-05-12 23:22 UTC (permalink / raw) To: Matheus Tavares Bernardino Cc: git, Derrick Stolee, Elijah Newren, Jonathan Tan Matheus Tavares Bernardino <matheus.bernardino@usp.br> writes: >> Can we add a test or two to protect this fix from future breakages? > > Sure! There are already a couple tests, in the following patch, that > check this behavior *indirectly*. As we recurse into submodules, in > grep, we try to retrieve the core.sparseCheckout setting for each > submodule (which is stored in the subrepo's config.worktree file). The > said tests make sure we can get this setting, and they indeed fail > without this patch. But would it be better to also add a more direct > test, in this patch? I think we could do so by adding a new test > helper that prints submodules' configs, from the superproject, and > then testing the presence of per-worktree configs in the output. Sounds like a plan. Yes, checking by observing how grep that recurses into submodules behave is doable but is indirect, and if any other subcommand that may want to do the recursion will have the same issue that gets fixed by this patch, it's better to ensure that the fix applies to any subcommand in a more direct way. Thanks. ^ permalink raw reply [flat|nested] 123+ messages in thread
* [RFC PATCH v2 3/4] grep: honor sparse checkout patterns 2020-05-10 0:41 ` [RFC PATCH v2 0/4] grep: honor sparse checkout and add option to ignore it Matheus Tavares 2020-05-10 0:41 ` [RFC PATCH v2 1/4] doc: grep: unify info on configuration variables Matheus Tavares 2020-05-10 0:41 ` [RFC PATCH v2 2/4] config: load the correct config.worktree file Matheus Tavares @ 2020-05-10 0:41 ` Matheus Tavares 2020-05-11 19:35 ` Junio C Hamano 2020-05-10 0:41 ` [RFC PATCH v2 4/4] config: add setting to ignore sparsity patterns in some cmds Matheus Tavares 2020-05-28 1:12 ` [PATCH v3 0/5] grep: honor sparse checkout and add option to ignore it Matheus Tavares 4 siblings, 1 reply; 123+ messages in thread
From: Matheus Tavares @ 2020-05-10 0:41 UTC (permalink / raw)
To: git; +Cc: gitster, stolee, newren, jonathantanmy

One of the main uses for a sparse checkout is to allow users to focus on
the subset of files in a repository in which they are interested. But
git-grep currently ignores the sparsity patterns and reports all matches
found outside this subset, which kind of goes in the opposite direction.
Let's fix that, making it honor the sparsity boundaries for every
grepping case:

- git grep in worktree
- git grep --cached
- git grep $REVISION
- git grep --untracked and git grep --no-index (which already respect
  sparse checkout boundaries)

This is also what some users reported[1] they would want as the default
behavior.

Note: for `git grep $REVISION`, we will choose to honor the sparsity
patterns only when $REVISION is a commit-ish object. The reason is that,
for a tree, we don't know whether it represents the root of a
repository or a subtree. So we wouldn't be able to correctly match it
against the sparsity patterns. E.g.
suppose we have a repository with these two sparsity rules: "/*" and
"!/a"; and the following structure:

/
| - a (file)
| - d (dir)
    | - a (file)

If `git grep $REVISION` were to honor the sparsity patterns for every
object type, when grepping the /d tree, we would wrongly ignore the /d/a
file. This happens because we wouldn't know it resides in /d and
therefore it would wrongly match the pattern "!/a". Furthermore, for a
search in a blob object, we wouldn't even have a path to check the
patterns against. So, let's ignore the sparsity patterns when grepping
non-commit-ish objects (tags to commits should be fine).

Finally, the old behavior may still be desirable for some use cases. So
the next patch will add an option to allow restoring it when needed.

[1]: https://lore.kernel.org/git/CABPp-BGuFhDwWZBRaD3nA8ui46wor-4=Ha1G1oApsfF8KNpfGQ@mail.gmail.com/

Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
---

Note: as I mentioned in the cover letter, the tests in this patch now
contain both cone mode and full pattern sparse checkouts. This was done
for two reasons: To test grep's behavior when searching with
--recurse-submodules and having submodules with different pattern sets
than the superproject (which was incorrect in my first implementation).
And to test the direct pattern matching in grep_tree(), using both modes.

builtin/grep.c | 127 ++++++++++++++++++++++++++-- t/t7011-skip-worktree-reading.sh | 9 -- t/t7817-grep-sparse-checkout.sh | 140 +++++++++++++++++++++++++++++++ 3 files changed, 259 insertions(+), 17 deletions(-) create mode 100755 t/t7817-grep-sparse-checkout.sh diff --git a/builtin/grep.c b/builtin/grep.c index a5056f395a..91ee0b2734 100644 --- a/builtin/grep.c +++ b/builtin/grep.c @@ -410,7 +410,7 @@ static int grep_cache(struct grep_opt *opt, const struct pathspec *pathspec, int cached); static int grep_tree(struct grep_opt *opt, const struct pathspec *pathspec, struct tree_desc *tree, struct strbuf *base, int tn_len, - int check_attr); + int is_root_tree); static int grep_submodule(struct grep_opt *opt, const struct pathspec *pathspec, @@ -508,6 +508,10 @@ static int grep_cache(struct grep_opt *opt, for (nr = 0; nr < repo->index->cache_nr; nr++) { const struct cache_entry *ce = repo->index->cache[nr]; + + if (ce_skip_worktree(ce) && !S_ISGITLINK(ce->ce_mode)) + continue; + strbuf_setlen(&name, name_base_len); strbuf_addstr(&name, ce->name); @@ -520,8 +524,7 @@ static int grep_cache(struct grep_opt *opt, * cache entry are identical, even if worktree file has * been modified, so use cache version instead */ - if (cached || (ce->ce_flags & CE_VALID) || - ce_skip_worktree(ce)) { + if (cached || (ce->ce_flags & CE_VALID)) { if (ce_stage(ce) || ce_intent_to_add(ce)) continue; hit |= grep_oid(opt, &ce->oid, name.buf, @@ -552,9 +555,78 @@ static int grep_cache(struct grep_opt *opt, return hit; } -static int grep_tree(struct grep_opt *opt, const struct pathspec *pathspec, - struct tree_desc *tree, struct strbuf *base, int tn_len, - int check_attr) +static struct pattern_list *get_sparsity_patterns(struct repository *repo) +{ + struct pattern_list *patterns; + char *sparse_file; + int sparse_config, cone_config; + + if (repo_config_get_bool(repo, "core.sparsecheckout", &sparse_config) || + !sparse_config) { + return NULL; + } + + sparse_file = repo_git_path(repo, 
"info/sparse-checkout"); + patterns = xcalloc(1, sizeof(*patterns)); + + if (repo_config_get_bool(repo, "core.sparsecheckoutcone", &cone_config)) + cone_config = 0; + patterns->use_cone_patterns = cone_config; + + if (add_patterns_from_file_to_list(sparse_file, "", 0, patterns, NULL)) { + if (file_exists(sparse_file)) { + warning(_("failed to load sparse-checkout file: '%s'"), + sparse_file); + } + free(sparse_file); + free(patterns); + return NULL; + } + + free(sparse_file); + return patterns; +} + +static int in_sparse_checkout(struct strbuf *path, int prefix_len, + unsigned int entry_mode, + struct index_state *istate, + struct pattern_list *sparsity, + enum pattern_match_result parent_match, + enum pattern_match_result *match) +{ + int dtype = DT_UNKNOWN; + + if (S_ISGITLINK(entry_mode)) + return 1; + + if (parent_match == MATCHED_RECURSIVE) { + *match = parent_match; + return 1; + } + + if (S_ISDIR(entry_mode) && !is_dir_sep(path->buf[path->len - 1])) + strbuf_addch(path, '/'); + + *match = path_matches_pattern_list(path->buf, path->len, + path->buf + prefix_len, &dtype, + sparsity, istate); + if (*match == UNDECIDED) + *match = parent_match; + + if (S_ISDIR(entry_mode)) + strbuf_trim_trailing_dir_sep(path); + + if (*match == NOT_MATCHED && (S_ISREG(entry_mode) || + (S_ISDIR(entry_mode) && sparsity->use_cone_patterns))) + return 0; + + return 1; +} + +static int do_grep_tree(struct grep_opt *opt, const struct pathspec *pathspec, + struct tree_desc *tree, struct strbuf *base, int tn_len, + int check_attr, struct pattern_list *sparsity, + enum pattern_match_result default_sparsity_match) { struct repository *repo = opt->repo; int hit = 0; @@ -570,6 +642,7 @@ static int grep_tree(struct grep_opt *opt, const struct pathspec *pathspec, while (tree_entry(tree, &entry)) { int te_len = tree_entry_len(&entry); + enum pattern_match_result sparsity_match = 0; if (match != all_entries_interesting) { strbuf_addstr(&name, base->buf + tn_len); @@ -586,6 +659,19 @@ static int 
grep_tree(struct grep_opt *opt, const struct pathspec *pathspec, strbuf_add(base, entry.path, te_len); + if (sparsity) { + struct strbuf path = STRBUF_INIT; + strbuf_addstr(&path, base->buf + tn_len); + + if (!in_sparse_checkout(&path, old_baselen - tn_len, + entry.mode, repo->index, + sparsity, default_sparsity_match, + &sparsity_match)) { + strbuf_setlen(base, old_baselen); + continue; + } + } + if (S_ISREG(entry.mode)) { hit |= grep_oid(opt, &entry.oid, base->buf, tn_len, check_attr ? base->buf + tn_len : NULL); @@ -602,8 +688,8 @@ static int grep_tree(struct grep_opt *opt, const struct pathspec *pathspec, strbuf_addch(base, '/'); init_tree_desc(&sub, data, size); - hit |= grep_tree(opt, pathspec, &sub, base, tn_len, - check_attr); + hit |= do_grep_tree(opt, pathspec, &sub, base, tn_len, + check_attr, sparsity, sparsity_match); free(data); } else if (recurse_submodules && S_ISGITLINK(entry.mode)) { hit |= grep_submodule(opt, pathspec, &entry.oid, @@ -621,6 +707,31 @@ static int grep_tree(struct grep_opt *opt, const struct pathspec *pathspec, return hit; } +/* + * Note: sparsity patterns and paths' attributes will only be considered if + * is_root_tree has true value. (Otherwise, we cannot properly perform pattern + * matching on paths.) 
+ */ +static int grep_tree(struct grep_opt *opt, const struct pathspec *pathspec, + struct tree_desc *tree, struct strbuf *base, int tn_len, + int is_root_tree) +{ + struct pattern_list *patterns = NULL; + int ret; + + if (is_root_tree) + patterns = get_sparsity_patterns(opt->repo); + + ret = do_grep_tree(opt, pathspec, tree, base, tn_len, is_root_tree, + patterns, 0); + + if (patterns) { + clear_pattern_list(patterns); + free(patterns); + } + return ret; +} + static int grep_object(struct grep_opt *opt, const struct pathspec *pathspec, struct object *obj, const char *name, const char *path) { diff --git a/t/t7011-skip-worktree-reading.sh b/t/t7011-skip-worktree-reading.sh index 37525cae3a..26852586ac 100755 --- a/t/t7011-skip-worktree-reading.sh +++ b/t/t7011-skip-worktree-reading.sh @@ -109,15 +109,6 @@ test_expect_success 'ls-files --modified' ' test -z "$(git ls-files -m)" ' -test_expect_success 'grep with skip-worktree file' ' - git update-index --no-skip-worktree 1 && - echo test > 1 && - git update-index 1 && - git update-index --skip-worktree 1 && - rm 1 && - test "$(git grep --no-ext-grep test)" = "1:test" -' - echo ":000000 100644 $ZERO_OID $EMPTY_BLOB A 1" > expected test_expect_success 'diff-index does not examine skip-worktree absent entries' ' setup_absent && diff --git a/t/t7817-grep-sparse-checkout.sh b/t/t7817-grep-sparse-checkout.sh new file mode 100755 index 0000000000..3bd67082eb --- /dev/null +++ b/t/t7817-grep-sparse-checkout.sh @@ -0,0 +1,140 @@ +#!/bin/sh + +test_description='grep in sparse checkout + +This test creates a repo with the following structure: + +. +|-- a +|-- b +|-- dir +| `-- c +`-- sub + |-- A + | `-- a + `-- B + `-- b + +Where . has non-cone mode sparsity patterns and sub is a submodule with cone +mode sparsity patterns. The resulting sparse-checkout should leave the following +structure: + +. +|-- a +`-- sub + `-- B + `-- b +' + +. 
./test-lib.sh + +test_expect_success 'setup' ' + echo "text" >a && + echo "text" >b && + mkdir dir && + echo "text" >dir/c && + + git init sub && + ( + cd sub && + mkdir A B && + echo "text" >A/a && + echo "text" >B/b && + git add A B && + git commit -m sub && + git sparse-checkout init --cone && + git sparse-checkout set B + ) && + + git submodule add ./sub && + git add a b dir && + git commit -m super && + git sparse-checkout init --no-cone && + git sparse-checkout set "/*" "!b" "!/*/" && + + git tag -am t-commit t-commit HEAD && + tree=$(git rev-parse HEAD^{tree}) && + git tag -am t-tree t-tree $tree && + + test_path_is_missing b && + test_path_is_missing dir && + test_path_is_missing sub/A && + test_path_is_file a && + test_path_is_file sub/B/b +' + +test_expect_success 'grep in working tree should honor sparse checkout' ' + cat >expect <<-EOF && + a:text + EOF + git grep "text" >actual && + test_cmp expect actual +' + +test_expect_success 'grep --cached should honor sparse checkout' ' + cat >expect <<-EOF && + a:text + EOF + git grep --cached "text" >actual && + test_cmp expect actual +' + +test_expect_success 'grep <commit-ish> should honor sparse checkout' ' + commit=$(git rev-parse HEAD) && + cat >expect_commit <<-EOF && + $commit:a:text + EOF + cat >expect_t-commit <<-EOF && + t-commit:a:text + EOF + git grep "text" $commit >actual_commit && + test_cmp expect_commit actual_commit && + git grep "text" t-commit >actual_t-commit && + test_cmp expect_t-commit actual_t-commit +' + +test_expect_success 'grep <tree-ish> should ignore sparsity patterns' ' + commit=$(git rev-parse HEAD) && + tree=$(git rev-parse HEAD^{tree}) && + cat >expect_tree <<-EOF && + $tree:a:text + $tree:b:text + $tree:dir/c:text + EOF + cat >expect_t-tree <<-EOF && + t-tree:a:text + t-tree:b:text + t-tree:dir/c:text + EOF + git grep "text" $tree >actual_tree && + test_cmp expect_tree actual_tree && + git grep "text" t-tree >actual_t-tree && + test_cmp expect_t-tree actual_t-tree +' + 
+test_expect_success 'grep --recurse-submodules --cached should honor sparse checkout in submodule' ' + cat >expect <<-EOF && + a:text + sub/B/b:text + EOF + git grep --recurse-submodules --cached "text" >actual && + test_cmp expect actual +' + +test_expect_success 'grep --recurse-submodules <commit-ish> should honor sparse checkout in submodule' ' + commit=$(git rev-parse HEAD) && + cat >expect_commit <<-EOF && + $commit:a:text + $commit:sub/B/b:text + EOF + cat >expect_t-commit <<-EOF && + t-commit:a:text + t-commit:sub/B/b:text + EOF + git grep --recurse-submodules "text" $commit >actual_commit && + test_cmp expect_commit actual_commit && + git grep --recurse-submodules "text" t-commit >actual_t-commit && + test_cmp expect_t-commit actual_t-commit +' + +test_done -- 2.26.2 ^ permalink raw reply related [flat|nested] 123+ messages in thread
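The decision that in_sparse_checkout() makes in the patch above can be restated without the strbuf handling. This is only a minimal sketch under stated assumptions: the `sp_*` names are hypothetical stand-ins rather than git API, and the `own` argument stands for whatever path_matches_pattern_list() would return for the entry.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical mirror of the match results used in the patch. */
enum sp_match {
	SP_UNDECIDED = -1,
	SP_NOT_MATCHED = 0,
	SP_MATCHED = 1,
	SP_MATCHED_RECURSIVE = 2
};

/* Hypothetical stand-in for the S_ISREG/S_ISDIR/S_ISGITLINK checks. */
enum sp_type { SP_REG, SP_DIR, SP_GITLINK };

/*
 * parent: result inherited from the containing directory
 * own:    result of matching this entry against the patterns (assumed given)
 * Returns 1 if the entry should be grepped or descended into, 0 to skip it.
 */
static int sp_in_sparse_checkout(enum sp_type type, int cone_mode,
				 enum sp_match parent, enum sp_match own,
				 enum sp_match *result)
{
	assert(result != NULL);

	/* Gitlinks are never filtered; *result is left untouched. */
	if (type == SP_GITLINK)
		return 1;

	/* Everything below a recursively matched directory is included. */
	if (parent == SP_MATCHED_RECURSIVE) {
		*result = parent;
		return 1;
	}

	/* An undecided entry inherits the parent's result. */
	*result = (own == SP_UNDECIDED) ? parent : own;

	/*
	 * Unmatched files are always skipped. Unmatched directories are
	 * skipped only in cone mode; otherwise a later pattern may still
	 * match something inside them.
	 */
	if (*result == SP_NOT_MATCHED &&
	    (type == SP_REG || (type == SP_DIR && cone_mode)))
		return 0;

	return 1;
}
```

Note that an unmatched directory is still descended into when cone mode is off, which is why the two modes can behave differently for entries nested below an excluded directory.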
* Re: [RFC PATCH v2 3/4] grep: honor sparse checkout patterns
  2020-05-10  0:41 ` [RFC PATCH v2 3/4] grep: honor sparse checkout patterns Matheus Tavares
@ 2020-05-11 19:35   ` Junio C Hamano
  2020-05-13  0:05     ` Matheus Tavares Bernardino
  2020-05-21  7:36     ` Elijah Newren
  0 siblings, 2 replies; 123+ messages in thread
From: Junio C Hamano @ 2020-05-11 19:35 UTC
  To: Matheus Tavares; +Cc: git, stolee, newren, jonathantanmy

Matheus Tavares <matheus.bernardino@usp.br> writes:

> One of the main uses for a sparse checkout is to allow users to focus on
> the subset of files in a repository in which they are interested. But
> git-grep currently ignores the sparsity patterns and reports all matches
> found outside this subset, which kind of goes in the opposite direction.
> Let's fix that, making it honor the sparsity boundaries for every
> grepping case:
>
> - git grep in worktree
> - git grep --cached
> - git grep $REVISION

It makes sense for these to be limited within the "sparse" area.

> - git grep --untracked and git grep --no-index (which already respect
>   sparse checkout boundaries)

I can understand the former; those untracked files are what _could_
be brought into attention by "git add", so limiting to the same
"sparse" area may make sense.

I am not sure about the latter, though, as "--no-index" is an
explicit request to pretend that we are dealing with a random
collection of files, not managed in a git repository. But perhaps
there is a similar justification like how "--untracked" is
justifiable. I dunno.

> diff --git a/builtin/grep.c b/builtin/grep.c > index a5056f395a..91ee0b2734 100644 > --- a/builtin/grep.c > +++ b/builtin/grep.c > @@ -410,7 +410,7 @@ static int grep_cache(struct grep_opt *opt, > const struct pathspec *pathspec, int cached); > static int grep_tree(struct grep_opt *opt, const struct pathspec *pathspec, > struct tree_desc *tree, struct strbuf *base, int tn_len, > - int check_attr); > + int is_root_tree); > > static int grep_submodule(struct grep_opt *opt, > const struct pathspec *pathspec, > @@ -508,6 +508,10 @@ static int grep_cache(struct grep_opt *opt, > > for (nr = 0; nr < repo->index->cache_nr; nr++) { > const struct cache_entry *ce = repo->index->cache[nr]; > + > + if (ce_skip_worktree(ce) && !S_ISGITLINK(ce->ce_mode)) > + continue; Hmph. Why exclude gitlink from this rule? If a submodule sits at a path that is excluded by the sparse pattern, should we still recurse into it? > strbuf_setlen(&name, name_base_len); > strbuf_addstr(&name, ce->name); > > @@ -520,8 +524,7 @@ static int grep_cache(struct grep_opt *opt, > * cache entry are identical, even if worktree file has > * been modified, so use cache version instead > */ > - if (cached || (ce->ce_flags & CE_VALID) || > - ce_skip_worktree(ce)) { > + if (cached || (ce->ce_flags & CE_VALID)) { > if (ce_stage(ce) || ce_intent_to_add(ce)) > continue; > hit |= grep_oid(opt, &ce->oid, name.buf, > @@ -552,9 +555,78 @@ static int grep_cache(struct grep_opt *opt, > return hit; > } > > -static int grep_tree(struct grep_opt *opt, const struct pathspec *pathspec, > - struct tree_desc *tree, struct strbuf *base, int tn_len, > - int check_attr) > +static struct pattern_list *get_sparsity_patterns(struct repository *repo) > +{ > + struct pattern_list *patterns; > + char *sparse_file; > + int sparse_config, cone_config; > + > + if (repo_config_get_bool(repo, "core.sparsecheckout", &sparse_config) || > + !sparse_config) { > + return NULL; > + } > + > + sparse_file = repo_git_path(repo, 
"info/sparse-checkout"); > + patterns = xcalloc(1, sizeof(*patterns)); > + > + if (repo_config_get_bool(repo, "core.sparsecheckoutcone", &cone_config)) > + cone_config = 0; > + patterns->use_cone_patterns = cone_config; > + > + if (add_patterns_from_file_to_list(sparse_file, "", 0, patterns, NULL)) { > + if (file_exists(sparse_file)) { > + warning(_("failed to load sparse-checkout file: '%s'"), > + sparse_file); > + } > + free(sparse_file); > + free(patterns); > + return NULL; > + } > + > + free(sparse_file); > + return patterns; > +} > + > +static int in_sparse_checkout(struct strbuf *path, int prefix_len, > + unsigned int entry_mode, > + struct index_state *istate, > + struct pattern_list *sparsity, > + enum pattern_match_result parent_match, > + enum pattern_match_result *match) > +{ > + int dtype = DT_UNKNOWN; > + > + if (S_ISGITLINK(entry_mode)) > + return 1; This is consistent with the "we do not care where a gitlink appears---submodules are always descended into, regardless of the sparse definition" decision we saw earlier, I think. I am not sure if that is a good design in the first place, though. 
> + if (parent_match == MATCHED_RECURSIVE) { > + *match = parent_match; > + return 1; > + } > + > + if (S_ISDIR(entry_mode) && !is_dir_sep(path->buf[path->len - 1])) > + strbuf_addch(path, '/'); > + > + *match = path_matches_pattern_list(path->buf, path->len, > + path->buf + prefix_len, &dtype, > + sparsity, istate); > + if (*match == UNDECIDED) > + *match = parent_match; > + > + if (S_ISDIR(entry_mode)) > + strbuf_trim_trailing_dir_sep(path); > + > + if (*match == NOT_MATCHED && (S_ISREG(entry_mode) || > + (S_ISDIR(entry_mode) && sparsity->use_cone_patterns))) > + return 0; > + > + return 1; > +} > +static int do_grep_tree(struct grep_opt *opt, const struct pathspec *pathspec, > + struct tree_desc *tree, struct strbuf *base, int tn_len, > + int check_attr, struct pattern_list *sparsity, > + enum pattern_match_result default_sparsity_match) > { > struct repository *repo = opt->repo; > int hit = 0; > @@ -570,6 +642,7 @@ static int grep_tree(struct grep_opt *opt, const struct pathspec *pathspec, > > while (tree_entry(tree, &entry)) { > int te_len = tree_entry_len(&entry); > + enum pattern_match_result sparsity_match = 0; > > if (match != all_entries_interesting) { > strbuf_addstr(&name, base->buf + tn_len); > @@ -586,6 +659,19 @@ static int grep_tree(struct grep_opt *opt, const struct pathspec *pathspec, > > strbuf_add(base, entry.path, te_len); > > + if (sparsity) { > + struct strbuf path = STRBUF_INIT; > + strbuf_addstr(&path, base->buf + tn_len); > + > + if (!in_sparse_checkout(&path, old_baselen - tn_len, > + entry.mode, repo->index, > + sparsity, default_sparsity_match, > + &sparsity_match)) { > + strbuf_setlen(base, old_baselen); > + continue; > + } > + } OK. > if (S_ISREG(entry.mode)) { > hit |= grep_oid(opt, &entry.oid, base->buf, tn_len, > check_attr ? 
base->buf + tn_len : NULL); > @@ -602,8 +688,8 @@ static int grep_tree(struct grep_opt *opt, const struct pathspec *pathspec, > > strbuf_addch(base, '/'); > init_tree_desc(&sub, data, size); > - hit |= grep_tree(opt, pathspec, &sub, base, tn_len, > - check_attr); > + hit |= do_grep_tree(opt, pathspec, &sub, base, tn_len, > + check_attr, sparsity, sparsity_match); > free(data); > } else if (recurse_submodules && S_ISGITLINK(entry.mode)) { > hit |= grep_submodule(opt, pathspec, &entry.oid, > @@ -621,6 +707,31 @@ static int grep_tree(struct grep_opt *opt, const struct pathspec *pathspec, > return hit; > } > > +/* > + * Note: sparsity patterns and paths' attributes will only be considered if > + * is_root_tree has true value. (Otherwise, we cannot properly perform pattern > + * matching on paths.) > + */ > +static int grep_tree(struct grep_opt *opt, const struct pathspec *pathspec, > + struct tree_desc *tree, struct strbuf *base, int tn_len, > + int is_root_tree) > +{ > + struct pattern_list *patterns = NULL; > + int ret; > + > + if (is_root_tree) > + patterns = get_sparsity_patterns(opt->repo); > + > + ret = do_grep_tree(opt, pathspec, tree, base, tn_len, is_root_tree, > + patterns, 0); > + > + if (patterns) { > + clear_pattern_list(patterns); > + free(patterns); > + } OK, it is not like this codepath is driven by "git log" to grep from top-level tree objects of many commits, so it is OK to grab the sparsity patterns once before do_grep_tree() and discard it when we are done. 
> + return ret; > +} > + > static int grep_object(struct grep_opt *opt, const struct pathspec *pathspec, > struct object *obj, const char *name, const char *path) > { > diff --git a/t/t7011-skip-worktree-reading.sh b/t/t7011-skip-worktree-reading.sh > index 37525cae3a..26852586ac 100755 > --- a/t/t7011-skip-worktree-reading.sh > +++ b/t/t7011-skip-worktree-reading.sh > @@ -109,15 +109,6 @@ test_expect_success 'ls-files --modified' ' > test -z "$(git ls-files -m)" > ' > > -test_expect_success 'grep with skip-worktree file' ' > - git update-index --no-skip-worktree 1 && > - echo test > 1 && > - git update-index 1 && > - git update-index --skip-worktree 1 && > - rm 1 && > - test "$(git grep --no-ext-grep test)" = "1:test" > -' > - > echo ":000000 100644 $ZERO_OID $EMPTY_BLOB A 1" > expected > test_expect_success 'diff-index does not examine skip-worktree absent entries' ' > setup_absent && > diff --git a/t/t7817-grep-sparse-checkout.sh b/t/t7817-grep-sparse-checkout.sh > new file mode 100755 > index 0000000000..3bd67082eb > --- /dev/null > +++ b/t/t7817-grep-sparse-checkout.sh > @@ -0,0 +1,140 @@ > +#!/bin/sh > + > +test_description='grep in sparse checkout > + > +This test creates a repo with the following structure: > + > +. > +|-- a > +|-- b > +|-- dir > +| `-- c > +`-- sub > + |-- A > + | `-- a > + `-- B > + `-- b > + > +Where . has non-cone mode sparsity patterns and sub is a submodule with cone > +mode sparsity patterns. The resulting sparse-checkout should leave the following > +structure: > + > +. > +|-- a > +`-- sub > + `-- B > + `-- b > +' > + > +. 
./test-lib.sh > + > +test_expect_success 'setup' ' > + echo "text" >a && > + echo "text" >b && > + mkdir dir && > + echo "text" >dir/c && > + > + git init sub && > + ( > + cd sub && > + mkdir A B && > + echo "text" >A/a && > + echo "text" >B/b && > + git add A B && > + git commit -m sub && > + git sparse-checkout init --cone && > + git sparse-checkout set B > + ) && > + > + git submodule add ./sub && > + git add a b dir && > + git commit -m super && > + git sparse-checkout init --no-cone && > + git sparse-checkout set "/*" "!b" "!/*/" && > + > + git tag -am t-commit t-commit HEAD && > + tree=$(git rev-parse HEAD^{tree}) && > + git tag -am t-tree t-tree $tree && > + > + test_path_is_missing b && > + test_path_is_missing dir && > + test_path_is_missing sub/A && > + test_path_is_file a && > + test_path_is_file sub/B/b > +' > + > +test_expect_success 'grep in working tree should honor sparse checkout' ' > + cat >expect <<-EOF && > + a:text > + EOF > + git grep "text" >actual && > + test_cmp expect actual > +' > + > +test_expect_success 'grep --cached should honor sparse checkout' ' > + cat >expect <<-EOF && > + a:text > + EOF > + git grep --cached "text" >actual && > + test_cmp expect actual > +' > + > +test_expect_success 'grep <commit-ish> should honor sparse checkout' ' > + commit=$(git rev-parse HEAD) && > + cat >expect_commit <<-EOF && > + $commit:a:text > + EOF > + cat >expect_t-commit <<-EOF && > + t-commit:a:text > + EOF > + git grep "text" $commit >actual_commit && > + test_cmp expect_commit actual_commit && > + git grep "text" t-commit >actual_t-commit && > + test_cmp expect_t-commit actual_t-commit > +' > + > +test_expect_success 'grep <tree-ish> should ignore sparsity patterns' ' > + commit=$(git rev-parse HEAD) && > + tree=$(git rev-parse HEAD^{tree}) && > + cat >expect_tree <<-EOF && > + $tree:a:text > + $tree:b:text > + $tree:dir/c:text > + EOF > + cat >expect_t-tree <<-EOF && > + t-tree:a:text > + t-tree:b:text > + t-tree:dir/c:text > + EOF > + git grep 
"text" $tree >actual_tree && > + test_cmp expect_tree actual_tree && > + git grep "text" t-tree >actual_t-tree && > + test_cmp expect_t-tree actual_t-tree > +' > + > +test_expect_success 'grep --recurse-submodules --cached should honor sparse checkout in submodule' ' > + cat >expect <<-EOF && > + a:text > + sub/B/b:text > + EOF > + git grep --recurse-submodules --cached "text" >actual && > + test_cmp expect actual > +' > + > +test_expect_success 'grep --recurse-submodules <commit-ish> should honor sparse checkout in submodule' ' > + commit=$(git rev-parse HEAD) && > + cat >expect_commit <<-EOF && > + $commit:a:text > + $commit:sub/B/b:text > + EOF > + cat >expect_t-commit <<-EOF && > + t-commit:a:text > + t-commit:sub/B/b:text > + EOF > + git grep --recurse-submodules "text" $commit >actual_commit && > + test_cmp expect_commit actual_commit && > + git grep --recurse-submodules "text" t-commit >actual_t-commit && > + test_cmp expect_t-commit actual_t-commit > +' > + > +test_done ^ permalink raw reply [flat|nested] 123+ messages in thread
* Re: [RFC PATCH v2 3/4] grep: honor sparse checkout patterns
  2020-05-11 19:35 ` Junio C Hamano
@ 2020-05-13  0:05   ` Matheus Tavares Bernardino
  2020-05-13  0:17     ` Junio C Hamano
  2020-05-21  7:36   ` Elijah Newren
  1 sibling, 1 reply; 123+ messages in thread
From: Matheus Tavares Bernardino @ 2020-05-13  0:05 UTC
  To: Junio C Hamano; +Cc: git, Derrick Stolee, Elijah Newren, Jonathan Tan

On Mon, May 11, 2020 at 4:35 PM Junio C Hamano <gitster@pobox.com> wrote:
>
> Matheus Tavares <matheus.bernardino@usp.br> writes:
>
> > One of the main uses for a sparse checkout is to allow users to focus on
> > the subset of files in a repository in which they are interested. But
> > git-grep currently ignores the sparsity patterns and reports all matches
> > found outside this subset, which kind of goes in the opposite direction.
> > Let's fix that, making it honor the sparsity boundaries for every
> > grepping case:
> >
> > - git grep in worktree
> > - git grep --cached
> > - git grep $REVISION
>
> It makes sense for these to be limited within the "sparse" area.
>
> > - git grep --untracked and git grep --no-index (which already respect
> >   sparse checkout boundaries)
>
> I can understand the former; those untracked files are what _could_
> be brought into attention by "git add", so limiting to the same
> "sparse" area may make sense.
>
> I am not sure about the latter, though, as "--no-index" is an
> explicit request to pretend that we are dealing with a random
> collection of files, not managed in a git repository. But perhaps
> there is a similar justification like how "--untracked" is
> justifiable. I dunno.

Yeah, I think there was no need to mention those two cases here.
My intention was to say that, in these cases, we should stick to the files that are present in the working tree (which should match the sparsity patterns + untracked {and ignored, in --no-index}), as opposed to how the worktree grep used to behave until now, falling back to the cache on files excluded by the sparse checkout. > > diff --git a/builtin/grep.c b/builtin/grep.c > > index a5056f395a..91ee0b2734 100644 > > --- a/builtin/grep.c > > +++ b/builtin/grep.c > > @@ -410,7 +410,7 @@ static int grep_cache(struct grep_opt *opt, > > const struct pathspec *pathspec, int cached); > > static int grep_tree(struct grep_opt *opt, const struct pathspec *pathspec, > > struct tree_desc *tree, struct strbuf *base, int tn_len, > > - int check_attr); > > + int is_root_tree); > > > > static int grep_submodule(struct grep_opt *opt, > > const struct pathspec *pathspec, > > @@ -508,6 +508,10 @@ static int grep_cache(struct grep_opt *opt, > > > > for (nr = 0; nr < repo->index->cache_nr; nr++) { > > const struct cache_entry *ce = repo->index->cache[nr]; > > + > > + if (ce_skip_worktree(ce) && !S_ISGITLINK(ce->ce_mode)) > > + continue; > > Hmph. Why exclude gitlink from this rule? If a submodule sits at a > path that is excluded by the sparse pattern, should we still recurse > into it? The idea behind not skipping gitlinks here was to be compliant with what we have in the working tree. In 4fd683b ("sparse-checkout: document interactions with submodules"), we decided that, if the sparse-checkout patterns exclude a submodule, the submodule would still appear in the working tree. The purpose was to keep these features (submodules and sparse-checkout) independent. Along the same lines, I think we should always recurse into initialized submodules in grep, and then load their own sparsity patterns, to decide what should be grepped within. [...] 
> > +static int grep_tree(struct grep_opt *opt, const struct pathspec *pathspec, > > + struct tree_desc *tree, struct strbuf *base, int tn_len, > > + int is_root_tree) > > +{ > > + struct pattern_list *patterns = NULL; > > + int ret; > > + > > + if (is_root_tree) > > + patterns = get_sparsity_patterns(opt->repo); > > + > > + ret = do_grep_tree(opt, pathspec, tree, base, tn_len, is_root_tree, > > + patterns, 0); > > + > > + if (patterns) { > > + clear_pattern_list(patterns); > > + free(patterns); > > + } > > OK, it is not like this codepath is driven by "git log" to grep from > top-level tree objects of many commits, so it is OK to grab the > sparsity patterns once before do_grep_tree() and discard it when we > are done. Yeah. A possible performance problem here would be when users pass many trees to git-grep (since we are reloading the pattern lists, from both the_repository and submodules, for each tree). But, as Elijah pointed out [1], the cases where this overhead might be somewhat noticeable should be very rare. [1]: https://lore.kernel.org/git/CABPp-BGUf-4exGW23xka1twf2D=nFOz1CkD_f-rDX_AGdVEeDA@mail.gmail.com/ ^ permalink raw reply [flat|nested] 123+ messages in thread
* Re: [RFC PATCH v2 3/4] grep: honor sparse checkout patterns 2020-05-13 0:05 ` Matheus Tavares Bernardino @ 2020-05-13 0:17 ` Junio C Hamano 2020-05-21 7:26 ` Elijah Newren 0 siblings, 1 reply; 123+ messages in thread From: Junio C Hamano @ 2020-05-13 0:17 UTC (permalink / raw) To: Matheus Tavares Bernardino Cc: git, Derrick Stolee, Elijah Newren, Jonathan Tan Matheus Tavares Bernardino <matheus.bernardino@usp.br> writes: > The idea behind not skipping gitlinks here was to be compliant with > what we have in the working tree. In 4fd683b ("sparse-checkout: > document interactions with submodules"), we decided that, if the > sparse-checkout patterns exclude a submodule, the submodule would > still appear in the working tree. The purpose was to keep these > features (submodules and sparse-checkout) independent. Along the same > lines, I think we should always recurse into initialized submodules in > grep, and then load their own sparsity patterns, to decide what should > be grepped within. OK. I do not necessarily agree with the justification described in 4fd683b (e.g. "would easily cause problems." that is not substantiated is merely an opinion), but I do agree with you that the new code in "git grep" we are discussing here does behave in line with that design. Thanks. ^ permalink raw reply [flat|nested] 123+ messages in thread
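The rule agreed on above for grep_cache() (skip entries excluded by the sparsity patterns, but never skip gitlinks) can be illustrated in isolation. This is a rough sketch, not the patch itself: `cache_sketch_entry` and the two helpers are hypothetical stand-ins for `struct cache_entry`, `ce_skip_worktree()` and `S_ISGITLINK()`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for git's struct cache_entry. */
struct cache_sketch_entry {
	const char *name;
	bool skip_worktree;	/* excluded by the sparsity patterns */
	bool is_gitlink;	/* the entry is a submodule */
};

/*
 * Mirrors the condition added to grep_cache() in the patch:
 * sparse-excluded entries are skipped, except for gitlinks, which are
 * handed to the submodule code so that the submodule's own sparsity
 * patterns can be applied inside it.
 */
static bool cache_sketch_should_grep(const struct cache_sketch_entry *e)
{
	assert(e != NULL);
	return !(e->skip_worktree && !e->is_gitlink);
}

/* Counts how many index entries would be visited. */
static size_t cache_sketch_count(const struct cache_sketch_entry *entries,
				 size_t nr)
{
	size_t i, visited = 0;

	for (i = 0; i < nr; i++)
		if (cache_sketch_should_grep(&entries[i]))
			visited++;
	return visited;
}
```

With an index holding `a` (included), `b` (excluded) and `sub` (an excluded gitlink), only `b` is dropped; whether `sub` is actually searched still depends on the initialization check in grep_submodule() mentioned later in the thread.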
* Re: [RFC PATCH v2 3/4] grep: honor sparse checkout patterns 2020-05-13 0:17 ` Junio C Hamano @ 2020-05-21 7:26 ` Elijah Newren 2020-05-21 17:35 ` Matheus Tavares Bernardino 0 siblings, 1 reply; 123+ messages in thread From: Elijah Newren @ 2020-05-21 7:26 UTC (permalink / raw) To: Junio C Hamano Cc: Matheus Tavares Bernardino, git, Derrick Stolee, Jonathan Tan On Tue, May 12, 2020 at 5:17 PM Junio C Hamano <gitster@pobox.com> wrote: > > Matheus Tavares Bernardino <matheus.bernardino@usp.br> writes: > > > The idea behind not skipping gitlinks here was to be compliant with > > what we have in the working tree. In 4fd683b ("sparse-checkout: > > document interactions with submodules"), we decided that, if the > > sparse-checkout patterns exclude a submodule, the submodule would > > still appear in the working tree. The purpose was to keep these > > features (submodules and sparse-checkout) independent. Along the same > > lines, I think we should always recurse into initialized submodules in Sorry if I missed it in the code, but do you check whether the submodule is initialized before descending into it, or do you descend into it based on it just being a submodule? > > grep, and then load their own sparsity patterns, to decide what should > > be grepped within. > > OK. > > I do not necessarily agree with the justification described in > 4fd683b (e.g. "would easily cause problems." that is not > substantiated is merely an opinion), but I do agree with you that > the new code in "git grep" we are discussing here does behave in > line with that design. > > Thanks. I'm also a little worried by 4fd683b; are we headed towards a circular reasoning of some sort? In particular, sparse-checkout was written assuming submodules might already be checked out. I can see how un-checking-out an existing submodule could raise fears of losing untracked or ignored files within it, or stuff stored on other branches, etc. But that's not the only relevant case. 
What if someone runs:

  git clone --recurse-submodules --sparse=moduleA git.hosting.site:my/repo.git

In such a case, we don't have already checked out submodules.
Obviously, we should clone submodules that are within our sparsity
paths. But should we automatically clone the submodules outside our
sparsity paths? The logic presented in 4fd683b makes this completely
ambiguous. ("It will appear if it's initialized." Okay, but do we
initialize it?)

You may say that clone doesn't have a --sparse= flag right now. So
let me change the example slightly. What if someone runs

  git checkout --recurse-submodules $otherBranch

and $otherBranch adds a new submodule somewhere deep under a directory
excluded by the sparsity patterns (i.e. deep within a directory we
aren't interested in and don't have checked out). Should the
submodule be checked out, i.e. should it be initialized? Commit
4fd683b only says it will appear if it's initialized, but my whole
question is should we initialize it?
* Re: [RFC PATCH v2 3/4] grep: honor sparse checkout patterns 2020-05-21 7:26 ` Elijah Newren @ 2020-05-21 17:35 ` Matheus Tavares Bernardino 2020-05-21 17:52 ` Elijah Newren 0 siblings, 1 reply; 123+ messages in thread From: Matheus Tavares Bernardino @ 2020-05-21 17:35 UTC (permalink / raw) To: Elijah Newren; +Cc: Junio C Hamano, git, Derrick Stolee, Jonathan Tan On Thu, May 21, 2020 at 4:26 AM Elijah Newren <newren@gmail.com> wrote: > > On Tue, May 12, 2020 at 5:17 PM Junio C Hamano <gitster@pobox.com> wrote: > > > > Matheus Tavares Bernardino <matheus.bernardino@usp.br> writes: > > > > > The idea behind not skipping gitlinks here was to be compliant with > > > what we have in the working tree. In 4fd683b ("sparse-checkout: > > > document interactions with submodules"), we decided that, if the > > > sparse-checkout patterns exclude a submodule, the submodule would > > > still appear in the working tree. The purpose was to keep these > > > features (submodules and sparse-checkout) independent. Along the same > > > lines, I think we should always recurse into initialized submodules in > > Sorry if I missed it in the code, but do you check whether the > submodule is initialized before descending into it, or do you descend > into it based on it just being a submodule? We only descend if the submodule is initialized. The new code in this patch doesn't do this check, but it is already implemented in grep_submodule() (which is called by grep_tree() and grep_cache() when a submodule is found). ^ permalink raw reply [flat|nested] 123+ messages in thread
* Re: [RFC PATCH v2 3/4] grep: honor sparse checkout patterns
  2020-05-21 17:35 ` Matheus Tavares Bernardino
@ 2020-05-21 17:52   ` Elijah Newren
  2020-05-22  5:49     ` Matheus Tavares Bernardino
  0 siblings, 1 reply; 123+ messages in thread
From: Elijah Newren @ 2020-05-21 17:52 UTC
  To: Matheus Tavares Bernardino
  Cc: Junio C Hamano, git, Derrick Stolee, Jonathan Tan

On Thu, May 21, 2020 at 10:36 AM Matheus Tavares Bernardino
<matheus.bernardino@usp.br> wrote:
>
> On Thu, May 21, 2020 at 4:26 AM Elijah Newren <newren@gmail.com> wrote:
> >
> > On Tue, May 12, 2020 at 5:17 PM Junio C Hamano <gitster@pobox.com> wrote:
> > >
> > > Matheus Tavares Bernardino <matheus.bernardino@usp.br> writes:
> > >
> > > > The idea behind not skipping gitlinks here was to be compliant with
> > > > what we have in the working tree. In 4fd683b ("sparse-checkout:
> > > > document interactions with submodules"), we decided that, if the
> > > > sparse-checkout patterns exclude a submodule, the submodule would
> > > > still appear in the working tree. The purpose was to keep these
> > > > features (submodules and sparse-checkout) independent. Along the same
> > > > lines, I think we should always recurse into initialized submodules in
> >
> > Sorry if I missed it in the code, but do you check whether the
> > submodule is initialized before descending into it, or do you descend
> > into it based on it just being a submodule?
>
> We only descend if the submodule is initialized. The new code in this
> patch doesn't do this check, but it is already implemented in
> grep_submodule() (which is called by grep_tree() and grep_cache() when
> a submodule is found).

Good to know. To up the ante a bit: What if another branch has a
directory that doesn't exist in HEAD or the current checkout, and
within that directory is a submodule. Would it be recursed into?
What if it matched the sparsity paths? (Is it even possible to
recurse into it?)
* Re: [RFC PATCH v2 3/4] grep: honor sparse checkout patterns 2020-05-21 17:52 ` Elijah Newren @ 2020-05-22 5:49 ` Matheus Tavares Bernardino 2020-05-22 14:26 ` Elijah Newren 0 siblings, 1 reply; 123+ messages in thread From: Matheus Tavares Bernardino @ 2020-05-22 5:49 UTC (permalink / raw) To: Elijah Newren; +Cc: Junio C Hamano, git, Derrick Stolee, Jonathan Tan On Thu, May 21, 2020 at 2:52 PM Elijah Newren <newren@gmail.com> wrote: > > On Thu, May 21, 2020 at 10:36 AM Matheus Tavares Bernardino > <matheus.bernardino@usp.br> wrote: > > > > On Thu, May 21, 2020 at 4:26 AM Elijah Newren <newren@gmail.com> wrote: > > > > > > On Tue, May 12, 2020 at 5:17 PM Junio C Hamano <gitster@pobox.com> wrote: > > > > > > > > Matheus Tavares Bernardino <matheus.bernardino@usp.br> writes: > > > > > > > > > The idea behind not skipping gitlinks here was to be compliant with > > > > > what we have in the working tree. In 4fd683b ("sparse-checkout: > > > > > document interactions with submodules"), we decided that, if the > > > > > sparse-checkout patterns exclude a submodule, the submodule would > > > > > still appear in the working tree. The purpose was to keep these > > > > > features (submodules and sparse-checkout) independent. Along the same > > > > > lines, I think we should always recurse into initialized submodules in > > > > > > Sorry if I missed it in the code, but do you check whether the > > > submodule is initialized before descending into it, or do you descend > > > into it based on it just being a submodule? > > > > We only descend if the submodule is initialized. The new code in this > > patch doesn't do this check, but it is already implemented in > > grep_submodule() (which is called by grep_tree() and grep_cache() when > > a submodule is found). > > Good to know. To up the ante a bit: What if another branch has > directory that doesn't exist in HEAD or the current checkout, and > within that directory is a submodule. Would it be recursed into? 

In this case, `git grep --recurse-submodules <pattern> $branch` will
recurse into the submodule, but only if it has already been
initialized. I.e., if we had checked out $branch, run `git
submodule init`, and then checked out the previous branch again.

> What if it matched the sparsity paths? (Is it even possible to
> recurse into it?)

That's a great question. The idea that I tried to implement is to
always recurse into _initialized_ submodules (even the ones excluded
by the superproject's sparsity patterns) and then follow their own
sparsity patterns inside. I'm not necessarily in favor of (or against)
this behavior, but it seemed to be the approach most compatible with
the design we describe in our docs:

"If your sparse-checkout patterns exclude an initialized submodule,
then that submodule will still appear in your working directory." (in
git-sparse-checkout.txt)

So, back to the original question, if you run `git grep
--recurse-submodules <pattern> $branch` and $branch contains a
submodule which was previously initialized, git-grep _would_ recurse
into it, even if it (or its parent dir) was excluded.

However, your question helped me notice an inconsistency in my patch:
the behavior I just described works for the full pattern set, but not
in cone mode. That's because, in cone mode, we can mark the
submodule's whole parent dir as excluded. Then,
path_matches_pattern_list() will return NOT_MATCHED for the parent dir
and we won't recurse into it, so we won't even get to the submodule's
path to discover that it refers to a gitlink.

Therefore, if we decide to keep the behavior of always recursing into
submodules, we will need some extra work for cone mode. I.e.,
grep_tree() will have to check whether NOT_MATCHED directories contain
submodules before discarding them and, if so, recurse only into the
submodules.

As for the implementation, the first idea that came to my mind was to
list the submodules' pathnames and do prefix matching for each
submodule and NOT_MATCHED dir.
But the places I've seen such submodule listings in the code base so far [1] seem to work only in the current branch. My second idea was to continue the tree walk when we hit NOT_MATCHED dir entries, but not doing any work, just looking for possible gitlinks to recurse into. I'm not sure if that could negatively affect the execution time, though. Does this seem like a good approach? Or is there another solution that I have not considered? Or even further, should we choose to skip the submodules in excluded paths? My only concern in this case is that it would be contrary to the design in git-sparse-checkout.txt. And the working tree grep and cached grep would differ even on a clean working tree. [1]: builtin/submodule--helper.c:module_list_compute() and submodule-config.c:config_from_gitmodules() ^ permalink raw reply [flat|nested] 123+ messages in thread
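The cone-mode problem described in the message above can be illustrated with a toy model. This is only a sketch in Python, not git's C code: the nested-dict tree, the GITLINK marker, and the dir_matches() and walk() helpers are hypothetical stand-ins for what grep_tree() and path_matches_pattern_list() do. It shows that discarding a NOT_MATCHED directory hides a gitlink below it, and that the "second idea" (keep walking excluded directories, but only to look for gitlinks) finds it at the cost of visiting more tree entries.

```python
GITLINK = object()  # hypothetical marker for a submodule entry in the toy tree

# Toy tree: dicts are directories, strings are blob contents.
TREE = {
    "src": {"main.c": "int main(void)"},
    "vendor": {
        "README": "third party code",
        "libfoo": GITLINK,
    },
}


def dir_matches(path, cone_dirs):
    """Simplified cone-mode check: keep a directory if it lies inside a
    cone directory or is an ancestor of one."""
    for cone in cone_dirs:
        if path == cone or path.startswith(cone + "/") or cone.startswith(path + "/"):
            return True
    return False


def walk(tree, cone_dirs, scan_excluded, prefix="", matched=True):
    """Return (gitlinks reached, files searched).

    With scan_excluded=False, a NOT_MATCHED directory is discarded outright.
    With scan_excluded=True, the walk still descends into it, but searches
    no files there and only collects gitlinks.
    """
    gitlinks, searched = [], []
    for name in sorted(tree):
        entry = tree[name]
        path = prefix + name
        if entry is GITLINK:
            gitlinks.append(path)
        elif isinstance(entry, dict):
            sub_matched = matched and dir_matches(path, cone_dirs)
            if sub_matched or scan_excluded:
                g, s = walk(entry, cone_dirs, scan_excluded, path + "/", sub_matched)
                gitlinks += g
                searched += s
        elif matched:
            searched.append(path)
    return gitlinks, searched


print(walk(TREE, ["src"], scan_excluded=False))
print(walk(TREE, ["src"], scan_excluded=True))
```

With the cone set to src, the pruning walk never sees vendor/libfoo, while the scanning walk reports it without searching vendor/README. Whether the extra traversal is affordable on large trees is exactly the open question raised in the message.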
* Re: [RFC PATCH v2 3/4] grep: honor sparse checkout patterns 2020-05-22 5:49 ` Matheus Tavares Bernardino @ 2020-05-22 14:26 ` Elijah Newren 2020-05-22 15:36 ` Elijah Newren 2020-06-10 11:40 ` Derrick Stolee 0 siblings, 2 replies; 123+ messages in thread From: Elijah Newren @ 2020-05-22 14:26 UTC (permalink / raw) To: Matheus Tavares Bernardino Cc: git, Junio C Hamano, Derrick Stolee, Jonathan Tan, Elijah Newren Hi Matheus, On Thu, May 21, 2020 at 10:49 PM Matheus Tavares Bernardino <matheus.bernardino@usp.br> wrote: > > On Thu, May 21, 2020 at 2:52 PM Elijah Newren <newren@gmail.com> wrote: > > <snip> > > Good to know. To up the ante a bit: What if another branch has > > directory that doesn't exist in HEAD or the current checkout, and > > within that directory is a submodule. Would it be recursed into? > > In this case, `git grep --recurse-submodules <pattern> $branch` will > recurse into the submodule, but only if it has already been > initialized. I.e. if we have checked out to $branch, ran `git > submodule init` and then checked out back. > > > What if it matched the sparsity paths? (Is it even possible to > > recurse into it?) > > That's a great question. The idea that I tried to implement is to > always recurse into _initialized_ submodules (even the ones excluded > by the superproject's sparsity patterns) and, then, follow their own > sparsity patterns inside. I'm not necessarily in favor (or against) > this behavior, but this seemed to be the most compatible way with the > design we describe in our docs: > > "If your sparse-checkout patterns exclude an initialized submodule, > then that submodule will still appear in your working directory." (in > git-sparse-checkout.txt) > > So, back to the original question, if you run `git grep > --recurse-submodules <pattern> $branch` and $branch contains a > submodule which was previously initialized, git-grep _would_ recurse > into it, even if it (or its parent dir) was excluded. 
However, your > question helped me notice an inconsistency in my patch: the behavior I > just described is working for the full pattern set, but not in cone > mode. That's because, in cone mode, we can mark the whole submodule's > parent dir as excluded. Then, path_matches_pattern_list() will return > NOT_MATCHED for the parent dir and we won't recurse into it, so we > won't even get to the submodule's path to discover that it refers to a > gitlink. > > Therefore, if we decide to keep the behavior of always recursing into > submodules, we will need some extra work for the cone mode. I.e. > grep_tree() will have to check if NOT_MATCHED directories contain > submodules before discarding them, and recurse only into the > submodules if so. As for the implementation, the first idea that came > to my mind was to list the submodules' pathnames and do prefix > matching for each submodule and NOT_MATCHED dir. But the places I've > seen such submodule listings in the code base so far [1] seem to work > only in the current branch. My second idea was to continue the tree > walk when we hit NOT_MATCHED dir entries, but not doing any work, just > looking for possible gitlinks to recurse into. I'm not sure if that > could negatively affect the execution time, though. > > Does this seem like a good approach? Or is there another solution that > I have not considered? Or even further, should we choose to skip the > submodules in excluded paths? My only concern in this case is that it > would be contrary to the design in git-sparse-checkout.txt. And the > working tree grep and cached grep would differ even on a clean working > tree. To be honest, I think it sounds insane. What you propose does make sense if you take what was written in git-sparse-checkout.txt very literally and as though it was a core design principle meant to cover all cases but I do not think it merits such a standing at all. 
I think it should be treated as a first draft attempt to explain interactions that was written solely with the 'checkout' case in mind, especially since it was written at the same approximate time that this was written earlier in the same file: THIS COMMAND IS EXPERIMENTAL. ITS BEHAVIOR, AND THE BEHAVIOR OF OTHER COMMANDS IN THE PRESENCE OF SPARSE-CHECKOUTS, WILL LIKELY CHANGE IN THE FUTURE. Anyway, the wording in that file seems to be really important, so let's fix it. -- >8 -- Subject: [PATCH] git-sparse-checkout: clarify interactions with submodules Ignoring the sparse-checkout feature momentarily, if one has a submodule and creates local branches within it with unpushed changes and maybe adds some untracked files to it, then we would want to avoid accidentally removing such a submodule. So, for example with git.git, if you run git checkout v2.13.0 then the sha1collisiondetection/ submodule is NOT removed even though it did not exist as a submodule until v2.14.0. Similarly, if you only had v2.13.0 checked out previously and ran git checkout v2.14.0 the sha1collisiondetection/ submodule would NOT be automatically initialized despite being part of v2.14.0. In both cases, git requires submodules to be initialized or deinitialized separately. Further, we also have special handling for submodules in other commands such as clean, which requires two --force flags to delete untracked submodules, and some commands have a --recurse-submodules flag. sparse-checkout is very similar to checkout, as evidenced by the similar name -- it adds and removes files from the working copy. However, for the same avoid-data-loss reasons we do not want to remove a submodule from the working copy with checkout, we do not want to do it with sparse-checkout either. So submodules need to be separately initialized or deinitialized; changing sparse-checkout rules should not automatically trigger the removal or vivification of submodules. 
I believe the previous wording in git-sparse-checkout.txt about submodules was only about this particular issue. Unfortunately, the previous wording could be interpreted to imply that submodules should be considered active regardless of sparsity patterns. Update the wording to avoid making such an implication. It may be helpful to consider two example situations where the differences in wording become important: In the future, we want users to be able to run commands like git clone --sparse=moduleA --recurse-submodules $REPO_URL and have sparsity paths automatically set up and have submodules *within the sparsity paths* be automatically initialized. We do not want all submodules in any path to be automatically initialized with that command. Similarly, we want to be able to do things like git -c sparse.restrictCmds grep --recurse-submodules $PATTERN $REV and search through $REV for $PATTERN within the recorded sparsity patterns. We want it to recurse into submodules within those sparsity patterns, but do not want to recurse into directories that do not match the sparsity patterns in search of a possible submodule. Signed-off-by: Elijah Newren <newren@gmail.com> --- Documentation/git-sparse-checkout.txt | 21 +++++++++++++++++---- 1 file changed, 17 insertions(+), 4 deletions(-) diff --git a/Documentation/git-sparse-checkout.txt b/Documentation/git-sparse-checkout.txt index c0342e5393..7dde2d330c 100644 --- a/Documentation/git-sparse-checkout.txt +++ b/Documentation/git-sparse-checkout.txt @@ -190,10 +190,23 @@ directory. SUBMODULES ---------- -If your repository contains one or more submodules, then those submodules will -appear based on which you initialized with the `git submodule` command. If -your sparse-checkout patterns exclude an initialized submodule, then that -submodule will still appear in your working directory.
+If your repository contains one or more submodules, then those submodules +will appear based on which you initialized with the `git submodule` +command. Submodules may have additional untracked files or code stored on +other branches, so to avoid data loss, changing sparse inclusion/exclusion +rules will not cause an already checked out submodule to be removed from +the working copy. Said another way, just as `checkout` will not cause +submodules to be automatically removed or initialized even when switching +between branches that remove or add submodules, using `sparse-checkout` to +reduce or expand the scope of "interesting" files will not cause submodules +to be automatically deinitialized or initialized either. Adding or +removing them must be done as a separate step with `git submodule init` or +`git submodule deinit`. + +This may mean that even if your sparsity patterns include or exclude +submodules, until you manually initialize or deinitialize them, commands +like grep that work on tracked files in the working copy will ignore "not +yet initialized" submodules and pay attention to "left behind" ones. SEE ALSO -- 2.26.1.250.g8bb771e84c ^ permalink raw reply related [flat|nested] 123+ messages in thread
* Re: [RFC PATCH v2 3/4] grep: honor sparse checkout patterns 2020-05-22 14:26 ` Elijah Newren @ 2020-05-22 15:36 ` Elijah Newren 2020-05-22 20:54 ` Matheus Tavares Bernardino 2020-06-10 11:40 ` Derrick Stolee 1 sibling, 1 reply; 123+ messages in thread From: Elijah Newren @ 2020-05-22 15:36 UTC (permalink / raw) To: Matheus Tavares Bernardino Cc: Git Mailing List, Junio C Hamano, Derrick Stolee, Jonathan Tan On Fri, May 22, 2020 at 7:26 AM Elijah Newren <newren@gmail.com> wrote: > > Hi Matheus, > > On Thu, May 21, 2020 at 10:49 PM Matheus Tavares Bernardino <matheus.bernardino@usp.br> wrote: > > > > On Thu, May 21, 2020 at 2:52 PM Elijah Newren <newren@gmail.com> wrote: > > > <snip> > > Does this seem like a good approach? Or is there another solution that > > I have not considered? Or even further, should we choose to skip the > > submodules in excluded paths? My only concern in this case is that it > > would be contrary to the design in git-sparse-checkout.txt. And the > > working tree grep and cached grep would differ even on a clean working > > tree. > <snip> > Anyway, the wording in that file seems to be really important, so > let's fix it. > Let me also try to give a concrete proposal for grep behavior for the edge cases we've discussed: git -c sparse.restrictCmds=true grep --recurse-submodules $PATTERN This goes through all the files in the index (i.e. all tracked files) which do not have the SKIP_WORKTREE bit set. For each of these: If the file is a symlink, ignore it (like grep currently does). If the file is a regular file and is present in the working copy, search it. If the file is a submodule and it is initialized, recurse into it. git -c sparse.restrictCmds=true grep --recurse-submodules --cached $PATTERN This goes through all the files in the index (i.e. all tracked files) which do not have the SKIP_WORKTREE bit set. For each of these: Skip symlinks. Search regular files. Recurse into submodules if they are initialized. 
git -c sparse.restrictCmds=true grep --recurse-submodules $PATTERN $REVISION This goes through all the files in the given revision (i.e. all tracked files) which match the sparsity patterns (i.e. that would not have the SKIP_WORKTREE bit set if we were to check out that commit). For each of these: Skip symlinks. Search regular files. Recurse into submodules if they are initialized. Further, for any of these, when recursing into submodules, make sure to load that submodule's core.sparseCheckout setting (and related settings) and the submodule's sparsity patterns, if any. Sound good? I think this addresses the edge cases we've discussed so far: interaction between submodules and sparsity patterns, and handling of files that are still present despite not matching the sparsity patterns. (Also note that files which are present-despite-the-rules are prone to be removed by the next `git sparse-checkout reapply` or anything that triggers a call to unpack_trees(); there are already multiple things that do and Stolee's proposed patches would add more). If I've missed edge cases, let me know. Elijah ^ permalink raw reply [flat|nested] 123+ messages in thread
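The per-entry rules in the proposal above, for the working-tree and --cached cases, can be written down as a small decision function. This is a hedged sketch, not git's implementation: the Entry record and grep_action() are hypothetical names, sparse.restrictCmds is the setting proposed in this thread, and the function only models the restrictCmds=true behavior as worded in the message.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    """Hypothetical stand-in for one index entry."""
    path: str
    kind: str                   # "file", "symlink", or "submodule"
    skip_worktree: bool = False
    present: bool = True        # exists in the working tree
    initialized: bool = False   # only meaningful for submodules


def grep_action(entry, cached=False):
    """Return "search", "recurse", or "skip" for one index entry,
    following the proposal with sparse.restrictCmds=true."""
    if entry.skip_worktree:
        return "skip"           # outside the sparse checkout
    if entry.kind == "symlink":
        return "skip"           # symlinks are ignored
    if entry.kind == "submodule":
        return "recurse" if entry.initialized else "skip"
    # Regular file: the working-tree grep needs the file on disk,
    # while --cached reads the indexed content instead.
    if cached or entry.present:
        return "search"
    return "skip"


entries = [
    Entry("src/main.c", "file"),
    Entry("docs/big.txt", "file", skip_worktree=True, present=False),
    Entry("link", "symlink"),
    Entry("vendor/libfoo", "submodule", initialized=True),
    Entry("vendor/libbar", "submodule"),
]
for e in entries:
    print(e.path, grep_action(e))
```

The sketch leaves out what happens after "recurse": per the proposal, the submodule's own core.sparseCheckout setting and sparsity patterns would be loaded and the same rules applied inside it.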
* Re: [RFC PATCH v2 3/4] grep: honor sparse checkout patterns 2020-05-22 15:36 ` Elijah Newren @ 2020-05-22 20:54 ` Matheus Tavares Bernardino 2020-05-22 21:06 ` Elijah Newren 0 siblings, 1 reply; 123+ messages in thread From: Matheus Tavares Bernardino @ 2020-05-22 20:54 UTC (permalink / raw) To: Elijah Newren Cc: Git Mailing List, Junio C Hamano, Derrick Stolee, Jonathan Tan Hi, Elijah On Fri, May 22, 2020 at 12:36 PM Elijah Newren <newren@gmail.com> wrote: > > On Fri, May 22, 2020 at 7:26 AM Elijah Newren <newren@gmail.com> wrote: > > > > Hi Matheus, > > > > On Thu, May 21, 2020 at 10:49 PM Matheus Tavares Bernardino <matheus.bernardino@usp.br> wrote: > > > > > > On Thu, May 21, 2020 at 2:52 PM Elijah Newren <newren@gmail.com> wrote: > > > > > <snip> > > > Does this seem like a good approach? Or is there another solution that > > > I have not considered? Or even further, should we choose to skip the > > > submodules in excluded paths? My only concern in this case is that it > > > would be contrary to the design in git-sparse-checkout.txt. And the > > > working tree grep and cached grep would differ even on a clean working > > > tree. > > > <snip> > > Anyway, the wording in that file seems to be really important, so > > let's fix it. > > > > Let me also try to give a concrete proposal for grep behavior for the > edge cases we've discussed: Thank you for this proposal and for the previous comments as well. > git -c sparse.restrictCmds=true grep --recurse-submodules $PATTERN > > This goes through all the files in the index (i.e. all tracked files) > which do not have the SKIP_WORKTREE bit set. For each of these: If > the file is a symlink, ignore it (like grep currently does). If the > file is a regular file and is present in the working copy, search it. > If the file is a submodule and it is initialized, recurse into it. Sounds good. 
And when sparse.restrictCmds=false, we also search the present files and present initialized submodules that have the SKIP_WORKTREE set, right? > git -c sparse.restrictCmds=true grep --recurse-submodules --cached $PATTERN > > This goes through all the files in the index (i.e. all tracked files) > which do not have the SKIP_WORKTREE bit set. For each of these: Skip > symlinks. Search regular files. Recurse into submodules if they are > initialized. OK. > git -c sparse.restrictCmds=true grep --recurse-submodules $REVISION $PATTERN > > This goes through all the files in the given revision (i.e. all > tracked files) which match the sparsity patterns (i.e. that would not > have the SKIP_WORKTREE bit set if were we to checkout that commit). > For each of these: Skip symlinks. Search regular files. Recurse into > submodules if they are initialized. OK. > Further, for any of these, when recursing into submodules, make sure > to load that submodules' core.sparseCheckout setting (and related > settings) and the submodules' sparsity patterns, if any. > > Sound good? > > I think this addresses the edge cases we've discussed so far: > interaction between submodules and sparsity patterns, and handling of > files that are still present despite not matching the sparsity > patterns. (Also note that files which are present-despite-the-rules > are prone to be removed by the next `git sparse-checkout reapply` or > anything that triggers a call to unpack_trees(); there's already > multiple things that do and Stolee's proposed patches would add more). > If I've missed edge cases, let me know. Sounds great. This addresses all the edge cases we've mentioned before. Thanks again for the detailed proposal, and for considering case by case. ^ permalink raw reply [flat|nested] 123+ messages in thread
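The sparse.restrictCmds=false question above, together with the $REVISION case from the quoted proposal, can be modelled as a path filter. This is a hedged sketch: sparse.restrictCmds is the setting proposed in this thread, path_in_cone() and select_paths() are hypothetical names, and the matching rule is a simplified reading of cone mode (top-level files, files directly inside an ancestor of a cone directory, and everything below a cone directory). The restrict_cmds=False branch follows the reading confirmed later in the thread, namely that the sparsity patterns are then ignored entirely.

```python
def path_in_cone(path, cone_dirs):
    """Simplified cone-mode match for a file path (an assumption of this
    sketch, not git's pattern matcher)."""
    parent = path.rpartition("/")[0]   # "" for top-level files
    if parent == "":
        return True                    # top-level files are always included
    for cone in cone_dirs:
        if parent == cone or parent.startswith(cone + "/"):
            return True                # anywhere below a cone directory
        if cone.startswith(parent + "/"):
            return True                # directly inside an ancestor of a cone directory
    return False


def select_paths(paths, cone_dirs, restrict_cmds=True):
    """Pick the paths of a revision that grep would consider."""
    if not restrict_cmds:
        return list(paths)             # sparsity patterns ignored entirely
    return [p for p in paths if path_in_cone(p, cone_dirs)]


paths = ["Makefile", "a/x.c", "a/b/y.c", "a/b/c/z.c", "d/w.c"]
print(select_paths(paths, ["a/b"]))
print(select_paths(paths, ["a/b"], restrict_cmds=False))
```

With a cone of a/b, only d/w.c is filtered out in the restricted case, and nothing is filtered when restriction is turned off.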
* Re: [RFC PATCH v2 3/4] grep: honor sparse checkout patterns 2020-05-22 20:54 ` Matheus Tavares Bernardino @ 2020-05-22 21:06 ` Elijah Newren 0 siblings, 0 replies; 123+ messages in thread From: Elijah Newren @ 2020-05-22 21:06 UTC (permalink / raw) To: Matheus Tavares Bernardino Cc: Git Mailing List, Junio C Hamano, Derrick Stolee, Jonathan Tan On Fri, May 22, 2020 at 1:54 PM Matheus Tavares Bernardino <matheus.bernardino@usp.br> wrote: > > Hi, Elijah > > On Fri, May 22, 2020 at 12:36 PM Elijah Newren <newren@gmail.com> wrote: > > > > On Fri, May 22, 2020 at 7:26 AM Elijah Newren <newren@gmail.com> wrote: > > > > > > Hi Matheus, > > > > > > On Thu, May 21, 2020 at 10:49 PM Matheus Tavares Bernardino <matheus.bernardino@usp.br> wrote: > > > > > > > > On Thu, May 21, 2020 at 2:52 PM Elijah Newren <newren@gmail.com> wrote: > > > > > > > <snip> > > > > Does this seem like a good approach? Or is there another solution that > > > > I have not considered? Or even further, should we choose to skip the > > > > submodules in excluded paths? My only concern in this case is that it > > > > would be contrary to the design in git-sparse-checkout.txt. And the > > > > working tree grep and cached grep would differ even on a clean working > > > > tree. > > > > > <snip> > > > Anyway, the wording in that file seems to be really important, so > > > let's fix it. > > > > > > > Let me also try to give a concrete proposal for grep behavior for the > > edge cases we've discussed: > > Thank you for this proposal and for the previous comments as well. > > > git -c sparse.restrictCmds=true grep --recurse-submodules $PATTERN > > > > This goes through all the files in the index (i.e. all tracked files) > > which do not have the SKIP_WORKTREE bit set. For each of these: If > > the file is a symlink, ignore it (like grep currently does). If the > > file is a regular file and is present in the working copy, search it. > > If the file is a submodule and it is initialized, recurse into it. 
> > Sounds good. And when sparse.restrictCmds=false, we also search the > present files and present initialized submodules that have the > SKIP_WORKTREE set, right? You're really pushing those corner cases, I love it. :-) SKIP_WORKTREE is supposed to mean we have removed it from the working tree, i.e. it shouldn't be present (if we decide we're not going to remove it from the working tree, e.g. because the file is unmerged or something, then we don't mark it as SKIP_WORKTREE even if it doesn't match sparsity patterns). Therefore, the set of files that satisfy this condition you have given should generally be empty. But presuming we hit this corner case, I'd say you are right. sparse.restrictCmds=false means we ignore the SKIP_WORKTREE bit entirely (and in the case of grepping a $REVISION, we ignore the sparsity patterns entirely). > > git -c sparse.restrictCmds=true grep --recurse-submodules --cached $PATTERN > > > > This goes through all the files in the index (i.e. all tracked files) > > which do not have the SKIP_WORKTREE bit set. For each of these: Skip > > symlinks. Search regular files. Recurse into submodules if they are > > initialized. > > OK. > > > git -c sparse.restrictCmds=true grep --recurse-submodules $REVISION $PATTERN > > > > This goes through all the files in the given revision (i.e. all > > tracked files) which match the sparsity patterns (i.e. that would not > > have the SKIP_WORKTREE bit set if were we to checkout that commit). > > For each of these: Skip symlinks. Search regular files. Recurse into > > submodules if they are initialized. > > OK. > > > Further, for any of these, when recursing into submodules, make sure > > to load that submodules' core.sparseCheckout setting (and related > > settings) and the submodules' sparsity patterns, if any. > > > > Sound good? 
> > > > I think this addresses the edge cases we've discussed so far: > > interaction between submodules and sparsity patterns, and handling of > > files that are still present despite not matching the sparsity > > patterns. (Also note that files which are present-despite-the-rules > > are prone to be removed by the next `git sparse-checkout reapply` or > > anything that triggers a call to unpack_trees(); there's already > > multiple things that do and Stolee's proposed patches would add more). > > If I've missed edge cases, let me know. > > Sounds great. This addresses all the edge cases we've mentioned > before. Thanks again for the detailed proposal, and for considering > case by case. And thank you for working on this. :-) ^ permalink raw reply [flat|nested] 123+ messages in thread
* Re: [RFC PATCH v2 3/4] grep: honor sparse checkout patterns 2020-05-22 14:26 ` Elijah Newren 2020-05-22 15:36 ` Elijah Newren @ 2020-06-10 11:40 ` Derrick Stolee 2020-06-10 16:22 ` Matheus Tavares Bernardino 2020-06-10 19:58 ` Elijah Newren 1 sibling, 2 replies; 123+ messages in thread From: Derrick Stolee @ 2020-06-10 11:40 UTC (permalink / raw) To: Elijah Newren, Matheus Tavares Bernardino Cc: git, Junio C Hamano, Jonathan Tan On 5/22/2020 10:26 AM, Elijah Newren wrote: Sorry I missed this patch. I was searching all over for patches with "sparse" or "submodule" in the _subject_. Thanks for calling out the need for review, Junio! > Subject: [PATCH] git-sparse-checkout: clarify interactions with submodules > > Ignoring the sparse-checkout feature momentarily, if one has a submodule and > creates local branches within it with unpushed changes and maybe adds some > untracked files to it, then we would want to avoid accidentally removing such > a submodule. So, for example with git.git, if you run > git checkout v2.13.0 > then the sha1collisiondetection/ submodule is NOT removed even though it > did not exist as a submodule until v2.14.0. Similarly, if you only had > v2.13.0 checked out previously and ran > git checkout v2.14.0 > the sha1collisiondetection/ submodule would NOT be automatically > initialized despite being part of v2.14.0. In both cases, git requires > submodules to be initialized or deinitialized separately. Further, we > also have special handling for submodules in other commands such as > clean, which requires two --force flags to delete untracked submodules, > and some commands have a --recurse-submodules flag. > > sparse-checkout is very similar to checkout, as evidenced by the similar > name -- it adds and removes files from the working copy. However, for > the same avoid-data-loss reasons we do not want to remove a submodule > from the working copy with checkout, we do not want to do it with > sparse-checkout either. 
So submodules need to be separately initialized > or deinitialized; changing sparse-checkout rules should not > automatically trigger the removal or vivification of submodules. This is a good summary of how submodules decide to be present or not. > I believe the previous wording in git-sparse-checkout.txt about > submodules was only about this particular issue. Unfortunately, the > previous wording could be interpreted to imply that submodules should be > considered active regardless of sparsity patterns. Update the wording > to avoid making such an implication. It may be helpful to consider two > example situations where the differences in wording become important: You are correct, the wording was unclear. Worth fixing. > In the future, we want users to be able to run commands like > git clone --sparse=moduleA --recurse-submodules $REPO_URL > and have sparsity paths automatically set up and have submodules *within > the sparsity paths* be automatically initialized. We do not want all > submodules in any path to be automatically initialized with that > command. INTERESTING. You are correct that it would be nice to have one feature that describes "what should be present or not". The in-tree sparse-checkout feature (still in infancy) would benefit from a redesign with that in mind. I am interested as well in the idea that combining "--sparse[=X]" with "--recurse-submodules" might want to imply that the submodules themselves are initialized with sparse-checkout patterns. These ramblings are of course off-topic for the current patch. > Similarly, we want to be able to do things like > git -c sparse.restrictCmds grep --recurse-submodules $REV $PATTERN > and search through $REV for $PATTERN within the recorded sparsity > patterns. We want it to recurse into submodules within those sparsity > patterns, but do not want to recurse into directories that do not match > the sparsity patterns in search of a possible submodule. 
(snipping away the old paragraph and focusing on the new text) > +If your repository contains one or more submodules, then those submodules > +will appear based on which you initialized with the `git submodule` > +command. This sentence is awkward. Here is a potential replacement: If your repository contains one or more submodules, then submodules are populated based on interactions with the `git submodule` command. Specifically, `git submodule init -- <path>` will ensure the submodule at `<path>` is present while `git submodule deinit -- <path>` will remove the files for the submodule at `<path>`. Similar to sparse-checkout, the deinitialized submodules still exist in the index, but are not present in the working directory. That got a lot longer as I was working on it. Perhaps add a paragraph break before the next bit. > Submodules may have additional untracked files or code stored on To emphasize the importance of the following "to avoid data loss" statement, you could mention that when a submodule is removed from the working directory, then so is all of its Git data such as objects and branches. If that data was not pushed to another repository, then deinitializing a submodule can result in loss of important data. (Also: maybe I'm wrong about that?) > +other branches, so to avoid data loss, changing sparse inclusion/exclusion Edit: other branches. To avoid data loss, ... > +rules will not cause an already checked out submodule to be removed from > +the working copy. Said another way, just as `checkout` will not cause > +submodules to be automatically removed or initialized even when switching > +between branches that remove or add submodules, using `sparse-checkout` to > +reduce or expand the scope of "interesting" files will not cause submodules > +to be automatically deinitialized or initialized either. Adding or > +removing them must be done as a separate step with `git submodule init` or > +`git submodule deinit`.
This final sentence may be redundant if you include reference to init/deinit earlier in the section. > +This may mean that even if your sparsity patterns include or exclude > +submodules, until you manually initialize or deinitialize them, commands > +like grep that work on tracked files in the working copy will ignore "not > +yet initialized" submodules and pay attention to "left behind" ones. I don't think that "left behind" is a good phrase here. It feels like they've been _dropped_ instead of _persisted despite sparse-checkout changes_. Perhaps: commands like `git grep` that work on tracked files in the working copy will pay attention only to initialized submodules, regardless of the sparse-checkout definition. Thanks for pointing out how complicated this scenario is! It certainly demands a careful update like this one. -Stolee ^ permalink raw reply [flat|nested] 123+ messages in thread
* Re: [RFC PATCH v2 3/4] grep: honor sparse checkout patterns 2020-06-10 11:40 ` Derrick Stolee @ 2020-06-10 16:22 ` Matheus Tavares Bernardino 2020-06-10 17:42 ` Derrick Stolee 2020-06-10 20:12 ` Elijah Newren 2020-06-10 19:58 ` Elijah Newren 1 sibling, 2 replies; 123+ messages in thread From: Matheus Tavares Bernardino @ 2020-06-10 16:22 UTC (permalink / raw) To: Derrick Stolee; +Cc: Elijah Newren, git, Junio C Hamano, Jonathan Tan On Wed, Jun 10, 2020 at 8:41 AM Derrick Stolee <stolee@gmail.com> wrote: > > On 5/22/2020 10:26 AM, Elijah Newren wrote: > > +This may mean that even if your sparsity patterns include or exclude > > +submodules, until you manually initialize or deinitialize them, commands > > +like grep that work on tracked files in the working copy will ignore "not > > +yet initialized" submodules and pay attention to "left behind" ones. > > I don't think that "left behind" is a good phrase here. It feels like > they've been _dropped_ instead of _persisted despite sparse-checkout > changes_. > > Perhaps: > > commands like `git grep` that work on tracked files in the working copy > will pay attention only to initialized submodules, regardless of the > sparse-checkout definition. Hmm, I'm a little confused by the "regardless of the sparse-checkout definition". The plan we discussed for grep was to not recurse into submodules if they have the SKIP_WORKTREE bit set [1], wasn't it? [1]: https://lore.kernel.org/git/CABPp-BE6M9ATDYuQh8f_r3S00dM2Cv9vM3T5j5W_odbVzhC-5A@mail.gmail.com/ ^ permalink raw reply [flat|nested] 123+ messages in thread
* Re: [RFC PATCH v2 3/4] grep: honor sparse checkout patterns 2020-06-10 16:22 ` Matheus Tavares Bernardino @ 2020-06-10 17:42 ` Derrick Stolee 2020-06-10 18:14 ` Matheus Tavares Bernardino 2020-06-10 20:12 ` Elijah Newren 1 sibling, 1 reply; 123+ messages in thread From: Derrick Stolee @ 2020-06-10 17:42 UTC (permalink / raw) To: Matheus Tavares Bernardino Cc: Elijah Newren, git, Junio C Hamano, Jonathan Tan On 6/10/2020 12:22 PM, Matheus Tavares Bernardino wrote: > On Wed, Jun 10, 2020 at 8:41 AM Derrick Stolee <stolee@gmail.com> wrote: >> >> On 5/22/2020 10:26 AM, Elijah Newren wrote: >>> +This may mean that even if your sparsity patterns include or exclude >>> +submodules, until you manually initialize or deinitialize them, commands >>> +like grep that work on tracked files in the working copy will ignore "not >>> +yet initialized" submodules and pay attention to "left behind" ones. >> >> I don't think that "left behind" is a good phrase here. It feels like >> they've been _dropped_ instead of _persisted despite sparse-checkout >> changes_. >> >> Perhaps: >> >> commands like `git grep` that work on tracked files in the working copy >> will pay attention only to initialized submodules, regardless of the >> sparse-checkout definition. > > Hmm, I'm a little confused by the "regardless of the sparse-checkout > definition". The plan we discussed for grep was to not recurse into > submodules if they have the SKIP_WORKTREE bit set [1], wasn't it? > > [1]: https://lore.kernel.org/git/CABPp-BE6M9ATDYuQh8f_r3S00dM2Cv9vM3T5j5W_odbVzhC-5A@mail.gmail.com/ Thanks for correcting my misunderstanding. By introducing `git grep` into this documentation, I have also made it co-dependent on your series. Instead, Elijah was probably purposeful in his use of "grep" over "git grep". If we revert that part of my change to use `grep` instead of `git grep`, then is my statement correct? Thanks, -Stolee ^ permalink raw reply [flat|nested] 123+ messages in thread
* Re: [RFC PATCH v2 3/4] grep: honor sparse checkout patterns 2020-06-10 17:42 ` Derrick Stolee @ 2020-06-10 18:14 ` Matheus Tavares Bernardino 0 siblings, 0 replies; 123+ messages in thread From: Matheus Tavares Bernardino @ 2020-06-10 18:14 UTC (permalink / raw) To: Derrick Stolee; +Cc: Elijah Newren, git, Junio C Hamano, Jonathan Tan On Wed, Jun 10, 2020 at 2:42 PM Derrick Stolee <stolee@gmail.com> wrote: > > On 6/10/2020 12:22 PM, Matheus Tavares Bernardino wrote: > > On Wed, Jun 10, 2020 at 8:41 AM Derrick Stolee <stolee@gmail.com> wrote: > >> > >> On 5/22/2020 10:26 AM, Elijah Newren wrote: > >>> +This may mean that even if your sparsity patterns include or exclude > >>> +submodules, until you manually initialize or deinitialize them, commands > >>> +like grep that work on tracked files in the working copy will ignore "not > >>> +yet initialized" submodules and pay attention to "left behind" ones. > >> > >> I don't think that "left behind" is a good phrase here. It feels like > >> they've been _dropped_ instead of _persisted despite sparse-checkout > >> changes_. > >> > >> Perhaps: > >> > >> commands like `git grep` that work on tracked files in the working copy > >> will pay attention only to initialized submodules, regardless of the > >> sparse-checkout definition. > > > > Hmm, I'm a little confused by the "regardless of the sparse-checkout > > definition". The plan we discussed for grep was to not recurse into > > submodules if they have the SKIP_WORKTREE bit set [1], wasn't it? > > > > [1]: https://lore.kernel.org/git/CABPp-BE6M9ATDYuQh8f_r3S00dM2Cv9vM3T5j5W_odbVzhC-5A@mail.gmail.com/ > > Thanks for correcting my misunderstanding. By introducing > `git grep` into this documentation, I have also made it > co-dependent on your series. Instead, Elijah was probably > purposeful in his use of "grep" over "git grep". I think he used grep referring to git-grep as he mentioned "tracked files in the working copy". 
Maybe he wanted to describe the current state of git-grep, which does recurse into initialized submodules even when they don't match the sparsity patterns. Was that it, Elijah? If so, since this behavior is changed in mt/grep-sparse-checkout, I think I should also change this doc section within my series. Or we change the doc in this patch and make it dependent on the series. ^ permalink raw reply [flat|nested] 123+ messages in thread
* Re: [RFC PATCH v2 3/4] grep: honor sparse checkout patterns 2020-06-10 16:22 ` Matheus Tavares Bernardino 2020-06-10 17:42 ` Derrick Stolee @ 2020-06-10 20:12 ` Elijah Newren 1 sibling, 0 replies; 123+ messages in thread From: Elijah Newren @ 2020-06-10 20:12 UTC (permalink / raw) To: Matheus Tavares Bernardino Cc: Derrick Stolee, git, Junio C Hamano, Jonathan Tan On Wed, Jun 10, 2020 at 9:23 AM Matheus Tavares Bernardino <matheus.bernardino@usp.br> wrote: > > On Wed, Jun 10, 2020 at 8:41 AM Derrick Stolee <stolee@gmail.com> wrote: > > > > On 5/22/2020 10:26 AM, Elijah Newren wrote: > > > +This may mean that even if your sparsity patterns include or exclude > > > +submodules, until you manually initialize or deinitialize them, commands > > > +like grep that work on tracked files in the working copy will ignore "not > > > +yet initialized" submodules and pay attention to "left behind" ones. > > > > I don't think that "left behind" is a good phrase here. It feels like > > they've been _dropped_ instead of _persisted despite sparse-checkout > > changes_. > > > > Perhaps: > > > > commands like `git grep` that work on tracked files in the working copy > > will pay attention only to initialized submodules, regardless of the > > sparse-checkout definition. > > Hmm, I'm a little confused by the "regardless of the sparse-checkout > definition". The plan we discussed for grep was to not recurse into > submodules if they have the SKIP_WORKTREE bit set [1], wasn't it? > > [1]: https://lore.kernel.org/git/CABPp-BE6M9ATDYuQh8f_r3S00dM2Cv9vM3T5j5W_odbVzhC-5A@mail.gmail.com/ I flagged some issues with that sentence...and an additional issue in my original sentence besides the one Stolee flagged. It seems to be easy to mess up a simple summary here. :-) But I do want a simple summary of some sort; I want Documentation/git-sparse-checkout.txt to be an end-user guide and not an implementation spec. 
Perhaps I can bring up a simpler example that will make it easier to see my distinction between the two -- let's consider the case of unmerged files. I think all of the following statements are true, but some are meant strictly as implementation details of relevant subcommands, while others are deduced overall behavior observed by end-users:

  * If you just ran merge or rebase and have some files with conflicts, 'git grep searchstring' will search the conflicted files for the searchstring
  * When searching the working tree, git grep should not do any special checking for whether files are in a conflicted state
  * sparse-checkout will never set the SKIP_WORKTREE bit on an unmerged file (despite sparsity patterns)
  * sparse-checkout will delete all (regular and symlink) files from the working tree when it sets the SKIP_WORKTREE bit for them
  * sparse-checkout will not delete files from the working copy if it doesn't set the SKIP_WORKTREE bit on it
  * When merging, if the merge machinery notices a conflict, it must clear the SKIP_WORKTREE bit and write the (conflicted version of the) file out to the working tree. (It is also allowed to clear the SKIP_WORKTREE bit for files that are not conflicted, though we'd rather it didn't do that so much.)

These statements above are not incompatible, because some deal with the implementation of git grep (the second item), others deal with implementation details of other commands or machinery (all items after the second), and the first item deals with the combination of behaviors between sparse-checkout + merge machinery + grep. So, even though the first bullet point says "git grep...will search the conflicted files" that does NOT mean git grep should check for whether files are conflicted.

My proposed update in v2 that I'll send out (once I come up with one) might use similar broad brushes.

Hope that helps,
Elijah

^ permalink raw reply [flat|nested] 123+ messages in thread
* Re: [RFC PATCH v2 3/4] grep: honor sparse checkout patterns 2020-06-10 11:40 ` Derrick Stolee 2020-06-10 16:22 ` Matheus Tavares Bernardino @ 2020-06-10 19:58 ` Elijah Newren 1 sibling, 0 replies; 123+ messages in thread From: Elijah Newren @ 2020-06-10 19:58 UTC (permalink / raw) To: Derrick Stolee Cc: Matheus Tavares Bernardino, Git Mailing List, Junio C Hamano, Jonathan Tan On Wed, Jun 10, 2020 at 4:41 AM Derrick Stolee <stolee@gmail.com> wrote: > > On 5/22/2020 10:26 AM, Elijah Newren wrote: > > Sorry I missed this patch. I was searching all over for patches with > "sparse" or "submodule" in the _subject_. Thanks for calling out the > need for review, Junio! > > > Subject: [PATCH] git-sparse-checkout: clarify interactions with submodules > > > > Ignoring the sparse-checkout feature momentarily, if one has a submodule and > > creates local branches within it with unpushed changes and maybe adds some > > untracked files to it, then we would want to avoid accidentally removing such > > a submodule. So, for example with git.git, if you run > > git checkout v2.13.0 > > then the sha1collisiondetection/ submodule is NOT removed even though it > > did not exist as a submodule until v2.14.0. Similarly, if you only had > > v2.13.0 checked out previously and ran > > git checkout v2.14.0 > > the sha1collisiondetection/ submodule would NOT be automatically > > initialized despite being part of v2.14.0. In both cases, git requires > > submodules to be initialized or deinitialized separately. Further, we > > also have special handling for submodules in other commands such as > > clean, which requires two --force flags to delete untracked submodules, > > and some commands have a --recurse-submodules flag. > > > > sparse-checkout is very similar to checkout, as evidenced by the similar > > name -- it adds and removes files from the working copy. 
However, for > > the same avoid-data-loss reasons we do not want to remove a submodule > > from the working copy with checkout, we do not want to do it with > > sparse-checkout either. So submodules need to be separately initialized > > or deinitialized; changing sparse-checkout rules should not > > automatically trigger the removal or vivification of submodules. > > This is a good summary of how submodules decide to be present or not. > > > I believe the previous wording in git-sparse-checkout.txt about > > submodules was only about this particular issue. Unfortunately, the > > previous wording could be interpreted to imply that submodules should be > > considered active regardless of sparsity patterns. Update the wording > > to avoid making such an implication. It may be helpful to consider two > > example situations where the differences in wording become important: > > You are correct, the wording was unclear. Worth fixing. > > > In the future, we want users to be able to run commands like > > git clone --sparse=moduleA --recurse-submodules $REPO_URL > > and have sparsity paths automatically set up and have submodules *within > > the sparsity paths* be automatically initialized. We do not want all > > submodules in any path to be automatically initialized with that > > command. > > INTERESTING. You are correct that it would be nice to have one > feature that describes "what should be present or not". The in-tree > sparse-checkout feature (still in infancy) would benefit from a > redesign with that in mind. > > I am interested as well in the idea that combining "--sparse[=X]" > with "--recurse-submodules" might want to imply that the submodules > themselves are initialized with sparse-checkout patterns. > > These ramblings are of course off-topic for the current patch. Yeah, it might get complicated too; we'd almost certainly want to limit to cone mode (globs could get super hairy). 
It's also the case we might want some submodules to have sparse-checkouts and others have full checkouts, depending on whether the --sparse=X specification listed some path that traversed from the toplevel outer repo down into a submodule. (But if --sparse is given with no specification, do all submodules become sparse or do all remain full?) Anyway, lots of complications there and we should start a different thread to discuss that when we feel it's time to tackle it. > > Similarly, we want to be able to do things like > > git -c sparse.restrictCmds grep --recurse-submodules $REV $PATTERN > > and search through $REV for $PATTERN within the recorded sparsity > > patterns. We want it to recurse into submodules within those sparsity > > patterns, but do not want to recurse into directories that do not match > > the sparsity patterns in search of a possible submodule. > > (snipping away the old paragraph and focusing on the new text) > > > +If your repository contains one or more submodules, then those submodules > > +will appear based on which you initialized with the `git submodule` > > +command. > > This sentence is awkward. Here is a potential replacement: > > If your repository contains one or more submodules, then submodules are > populated based on interactions with the `git submodule` command. > Specifically, `git submodule init -- <path>` will ensure the submodule at > `<path>` is present while `git submodule deinit -- <path>` will remove the > files for the submodule at `<path>`. Similar to sparse-checkout, the > deinitialized submodules still exist in the index, but are not present in > the working directory. > > That got a lot longer as I was working on it. Perhaps add a paragraph break > before the next bit. Sounds good, thanks. 
> > Submodules may have additional untracked files or code stored on > > To emphasize the importance of the following "to avoid data loss" statement, > you could mention that when a submodule is removed from the working directory, > then so is all of its Git data such as objects and branches. If that data was > not pushed to another repository, then deinitializing a submodule can result > in loss of important data. (Also: maybe I'm wrong about that?) > > > +other branches, so to avoid data loss, changing sparse inclusion/exclusion I thought that was what I covered with the "code stored on other branches" but I guess that wasn't clear enough. So yeah, I can try extending it a bit. > Edit: other branches. To avoid data loss, ... Sounds good. > > +rules will not cause an already checked out submodule to be removed from > > +the working copy. Said another way, just as `checkout` will not cause > > +submodules to be automatically removed or initialized even when switching > > +between branches that remove or add submodules, using `sparse-checkout` to > > +reduce or expand the scope of "interesting" files will not cause submodules > > +to be automatically deinitialized or initialized either. Adding or > > +removing them must be done as a separate step with `git submodule init` or > > +`git submodule deinit`. > > This final sentence may be redundant if you include reference to init/deinit > earlier in the section. Yep, I'll strike it. > > +This may mean that even if your sparsity patterns include or exclude > > +submodules, until you manually initialize or deinitialize them, commands > > +like grep that work on tracked files in the working copy will ignore "not > > +yet initialized" submodules and pay attention to "left behind" ones. > > I don't think that "left behind" is a good phrase here. It feels like > they've been _dropped_ instead of _persisted despite sparse-checkout > changes_. 
I think in addition to the "left behind" wording being bad, my paragraph left another funny gray area and might be inconsistent with what Matheus and I wrote elsewhere: If sparsity patterns would exclude a submodule that is initialized, sparse-checkout clearly can't remove the submodule. However, should it set the SKIP_WORKTREE bit for that submodule if it's not going to remove it? I'm not sure of the answer, yet. I think Matheus had the right idea for how to make grep handle an initialized submodule in the different sparse.restrictCmds settings, and if we do go ahead and clear the SKIP_WORKTREE bit, then I think the wording of this paragraph needs to change. So, let's discuss your alternative: > Perhaps: > > commands like `git grep` that work on tracked files in the working copy > will pay attention only to initialized submodules, regardless of the > sparse-checkout definition. I think this is easy to misconstrue in an entirely new way: if there are initialized submodules (and maybe a sparse checkout), then your wording implies normal files would be ignored by grep (even files that aren't removed by the sparse checkout)! While that sounds like crazy behavior, this whole thread started because of suggested behaviors being proposed to carefully follow what was already written in this document even though the end user result seemed somewhat crazy to me. So, we might want to avoid a repeat. :-) Also, your suggested wording is different than the behavior we came up with before, and is also inconsistent with how we'd work with normal files. For example, what if a user: * uses sparse-checkout to remove a bunch of files/directories they don't care about * creates a new file that happens to have the same name as an (unfortunately) generically worded filename that exists in the index (but is marked SKIP_WORKTREE and had previously been removed) Is this new file related to the tracked file? Is the new file considered tracked? 
Should the new file be considered part of the sparse cone (i.e. should it be considered part of the set of tracked working tree files relevant to the user for commands that operate on that subset)? It's a bit of a thorny case.

Here's the behavior Matheus and I came to previously:

git -c sparse.restrictCmds=true grep --recurse-submodules <pattern>:

    This goes through all the files in the index (i.e. all tracked files) which do NOT have the SKIP_WORKTREE bit set. For each of these:
      If the file is a symlink, ignore it (like git-grep currently does).
      If the file is a regular file and is present in the working copy, search it.
      If the file is a submodule and it is initialized, recurse into it.

git -c sparse.restrictCmds=false grep --recurse-submodules <pattern>:

    This goes through all the files in the index (i.e. all tracked files) regardless of SKIP_WORKTREE bit setting. For each of these:
      If the file is a symlink, ignore it (like git-grep currently does).
      If the file is a regular file and is present in the working copy, search it.
      If the file is a submodule and it is initialized, recurse into it.

The only difference between these two sparse.restrictCmds settings is the handling of the SKIP_WORKTREE bit. I think that makes them nice and orthogonal. They also generalize nicely to the cases of searching --cached or $REVISION with a few obvious changes (check if data is available in git object store rather than if file is present in working tree, and for $REVISION check sparsity patterns rather than SKIP_WORKTREE bit). If we start ignoring the SKIP_WORKTREE bit for some types of files even when sparse.restrictCmds=true, I think we start getting a number of inconsistencies and user surprises.

So, these formal definitions seem like a good high-level design. I think our attempts to summarize behavior in short sentences for users sometimes ignore some cases due to the desire to summarize.
However, taking the summary literally can suggest behaviors that'd be inconsistent if not downright crazy for some of the ignored cases. I'll see if I can clean it up somehow. > Thanks for pointing out how complicated this scenario is! It certainly > demands a careful update like this one. Thanks for the thoughtful review! ^ permalink raw reply [flat|nested] 123+ messages in thread
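The two sparse.restrictCmds walks described in the message above differ only in how they treat the SKIP_WORKTREE bit, which can be sketched as a small self-contained C model. This is only an illustration under stated assumptions: `toy_entry`, `toy_grep_action()` and `toy_count_visited()` are hypothetical names invented here, not git's real `struct cache_entry` or anything in builtin/grep.c, and the `present` flag stands in for "file exists in the working copy" or "submodule is initialized".

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for an index entry (not git's type). */
enum toy_kind { TOY_REGULAR, TOY_SYMLINK, TOY_SUBMODULE };

struct toy_entry {
	const char *path;
	enum toy_kind kind;
	bool skip_worktree; /* models the SKIP_WORKTREE bit */
	bool present;       /* file is in the worktree / submodule is initialized */
};

enum toy_action { TOY_SKIP, TOY_SEARCH, TOY_RECURSE };

/*
 * Decide what a worktree grep would do with one entry.
 * restrict_cmds models sparse.restrictCmds: when true, entries with
 * the SKIP_WORKTREE bit set are never visited, whatever their type.
 */
enum toy_action toy_grep_action(const struct toy_entry *e, bool restrict_cmds)
{
	if (restrict_cmds && e->skip_worktree)
		return TOY_SKIP;

	switch (e->kind) {
	case TOY_SYMLINK:
		return TOY_SKIP; /* symlinks are ignored in both modes */
	case TOY_REGULAR:
		return e->present ? TOY_SEARCH : TOY_SKIP;
	case TOY_SUBMODULE:
		return e->present ? TOY_RECURSE : TOY_SKIP;
	}
	return TOY_SKIP;
}

/* Count how many entries would be searched or recursed into. */
size_t toy_count_visited(const struct toy_entry *entries, size_t n,
			 bool restrict_cmds)
{
	size_t visited = 0;

	for (size_t i = 0; i < n; i++)
		if (toy_grep_action(&entries[i], restrict_cmds) != TOY_SKIP)
			visited++;
	return visited;
}
```

Keeping the SKIP_WORKTREE test as the only place where `restrict_cmds` is consulted mirrors the "nice and orthogonal" property claimed above: everything after that check is identical in both modes.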
* Re: [RFC PATCH v2 3/4] grep: honor sparse checkout patterns 2020-05-11 19:35 ` Junio C Hamano 2020-05-13 0:05 ` Matheus Tavares Bernardino @ 2020-05-21 7:36 ` Elijah Newren 1 sibling, 0 replies; 123+ messages in thread From: Elijah Newren @ 2020-05-21 7:36 UTC (permalink / raw) To: Junio C Hamano Cc: Matheus Tavares, Git Mailing List, Derrick Stolee, Jonathan Tan On Mon, May 11, 2020 at 12:35 PM Junio C Hamano <gitster@pobox.com> wrote: > > Matheus Tavares <matheus.bernardino@usp.br> writes: > > > One of the main uses for a sparse checkout is to allow users to focus on > > the subset of files in a repository in which they are interested. But > > git-grep currently ignores the sparsity patterns and reports all matches > > found outside this subset, which kind of goes in the opposite direction. > > Let's fix that, making it honor the sparsity boundaries for every > > grepping case: > > > > - git grep in worktree > > - git grep --cached > > - git grep $REVISION > > It makes sense for these to be limited within the "sparse" area. > > > - git grep --untracked and git grep --no-index (which already respect > > sparse checkout boundaries) > > I can understand the former; those untracked files are what _could_ > be brought into attention by "git add", so limiting to the same > "sparse" area may make sense. > > I am not sure about the latter, though, as "--no-index" is an > explicit request to pretend that we are dealing with a random > collection of files, not managed in a git repository. But perhaps > there is a similar justification like how "--untracked" is > justified. I dunno. I don't think it makes sense for sparsity patterns to affect either. Sparsity patterns are a way of splitting "tracked" files into two subsets (those matching the sparsity paths and those that don't). Therefore, flags that are about searching things that aren't tracked, clearly don't have anything to do with sparsity patterns. 
However, I think this was just a wording issue; in the subsequent commit Matheus made it clear that he's not modifying the behavior of grep --untracked or grep --no-index based on the presence or absence of sparsity patterns. > > diff --git a/builtin/grep.c b/builtin/grep.c > > index a5056f395a..91ee0b2734 100644 > > --- a/builtin/grep.c > > +++ b/builtin/grep.c > > @@ -410,7 +410,7 @@ static int grep_cache(struct grep_opt *opt, > > const struct pathspec *pathspec, int cached); > > static int grep_tree(struct grep_opt *opt, const struct pathspec *pathspec, > > struct tree_desc *tree, struct strbuf *base, int tn_len, > > - int check_attr); > > + int is_root_tree); > > > > static int grep_submodule(struct grep_opt *opt, > > const struct pathspec *pathspec, > > @@ -508,6 +508,10 @@ static int grep_cache(struct grep_opt *opt, > > > > for (nr = 0; nr < repo->index->cache_nr; nr++) { > > const struct cache_entry *ce = repo->index->cache[nr]; > > + > > + if (ce_skip_worktree(ce) && !S_ISGITLINK(ce->ce_mode)) > > + continue; > > Hmph. Why exclude gitlink from this rule? If a submodule sits at a > path that is excluded by the sparse pattern, should we still recurse > into it? That bothers me too. 
> > strbuf_setlen(&name, name_base_len); > > strbuf_addstr(&name, ce->name); > > > > @@ -520,8 +524,7 @@ static int grep_cache(struct grep_opt *opt, > > * cache entry are identical, even if worktree file has > > * been modified, so use cache version instead > > */ > > - if (cached || (ce->ce_flags & CE_VALID) || > > - ce_skip_worktree(ce)) { > > + if (cached || (ce->ce_flags & CE_VALID)) { > > if (ce_stage(ce) || ce_intent_to_add(ce)) > > continue; > > hit |= grep_oid(opt, &ce->oid, name.buf, > > @@ -552,9 +555,78 @@ static int grep_cache(struct grep_opt *opt, > > return hit; > > } > > > > -static int grep_tree(struct grep_opt *opt, const struct pathspec *pathspec, > > - struct tree_desc *tree, struct strbuf *base, int tn_len, > > - int check_attr) > > +static struct pattern_list *get_sparsity_patterns(struct repository *repo) > > +{ > > + struct pattern_list *patterns; > > + char *sparse_file; > > + int sparse_config, cone_config; > > + > > + if (repo_config_get_bool(repo, "core.sparsecheckout", &sparse_config) || > > + !sparse_config) { > > + return NULL; > > + } > > + > > + sparse_file = repo_git_path(repo, "info/sparse-checkout"); > > + patterns = xcalloc(1, sizeof(*patterns)); > > + > > + if (repo_config_get_bool(repo, "core.sparsecheckoutcone", &cone_config)) > > + cone_config = 0; > > + patterns->use_cone_patterns = cone_config; > > + > > + if (add_patterns_from_file_to_list(sparse_file, "", 0, patterns, NULL)) { > > + if (file_exists(sparse_file)) { > > + warning(_("failed to load sparse-checkout file: '%s'"), > > + sparse_file); > > + } > > + free(sparse_file); > > + free(patterns); > > + return NULL; > > + } > > + > > + free(sparse_file); > > + return patterns; > > +} > > + > > +static int in_sparse_checkout(struct strbuf *path, int prefix_len, > > + unsigned int entry_mode, > > + struct index_state *istate, > > + struct pattern_list *sparsity, > > + enum pattern_match_result parent_match, > > + enum pattern_match_result *match) > > +{ > > + int dtype = 
DT_UNKNOWN; > > + > > + if (S_ISGITLINK(entry_mode)) > > + return 1; > > This is consistent with the "we do not care where a gitlink > appears---submodules are always descended into, regardless of the > sparse definition" decision we saw earlier, I think. I am not sure > if that is a good design in the first place, though. > > > + if (parent_match == MATCHED_RECURSIVE) { > > + *match = parent_match; > > + return 1; > > + } > > + > > + if (S_ISDIR(entry_mode) && !is_dir_sep(path->buf[path->len - 1])) > > + strbuf_addch(path, '/'); > > + > > + *match = path_matches_pattern_list(path->buf, path->len, > > + path->buf + prefix_len, &dtype, > > + sparsity, istate); > > + if (*match == UNDECIDED) > > + *match = parent_match; > > + > > + if (S_ISDIR(entry_mode)) > > + strbuf_trim_trailing_dir_sep(path); > > + > > + if (*match == NOT_MATCHED && (S_ISREG(entry_mode) || > > + (S_ISDIR(entry_mode) && sparsity->use_cone_patterns))) > > + return 0; > > + > > + return 1; > > +} > > > > > +static int do_grep_tree(struct grep_opt *opt, const struct pathspec *pathspec, > > + struct tree_desc *tree, struct strbuf *base, int tn_len, > > + int check_attr, struct pattern_list *sparsity, > > + enum pattern_match_result default_sparsity_match) > > { > > struct repository *repo = opt->repo; > > int hit = 0; > > @@ -570,6 +642,7 @@ static int grep_tree(struct grep_opt *opt, const struct pathspec *pathspec, > > > > while (tree_entry(tree, &entry)) { > > int te_len = tree_entry_len(&entry); > > + enum pattern_match_result sparsity_match = 0; > > > > if (match != all_entries_interesting) { > > strbuf_addstr(&name, base->buf + tn_len); > > @@ -586,6 +659,19 @@ static int grep_tree(struct grep_opt *opt, const struct pathspec *pathspec, > > > > strbuf_add(base, entry.path, te_len); > > > > + if (sparsity) { > > + struct strbuf path = STRBUF_INIT; > > + strbuf_addstr(&path, base->buf + tn_len); > > + > > + if (!in_sparse_checkout(&path, old_baselen - tn_len, > > + entry.mode, repo->index, > > + 
sparsity, default_sparsity_match, > > + &sparsity_match)) { > > + strbuf_setlen(base, old_baselen); > > + continue; > > + } > > + } > > OK. > > > if (S_ISREG(entry.mode)) { > > hit |= grep_oid(opt, &entry.oid, base->buf, tn_len, > > check_attr ? base->buf + tn_len : NULL); > > @@ -602,8 +688,8 @@ static int grep_tree(struct grep_opt *opt, const struct pathspec *pathspec, > > > > strbuf_addch(base, '/'); > > init_tree_desc(&sub, data, size); > > - hit |= grep_tree(opt, pathspec, &sub, base, tn_len, > > - check_attr); > > + hit |= do_grep_tree(opt, pathspec, &sub, base, tn_len, > > + check_attr, sparsity, sparsity_match); > > free(data); > > } else if (recurse_submodules && S_ISGITLINK(entry.mode)) { > > hit |= grep_submodule(opt, pathspec, &entry.oid, > > @@ -621,6 +707,31 @@ static int grep_tree(struct grep_opt *opt, const struct pathspec *pathspec, > > return hit; > > } > > > > +/* > > + * Note: sparsity patterns and paths' attributes will only be considered if > > + * is_root_tree has true value. (Otherwise, we cannot properly perform pattern > > + * matching on paths.) > > + */ > > +static int grep_tree(struct grep_opt *opt, const struct pathspec *pathspec, > > + struct tree_desc *tree, struct strbuf *base, int tn_len, > > + int is_root_tree) > > +{ > > + struct pattern_list *patterns = NULL; > > + int ret; > > + > > + if (is_root_tree) > > + patterns = get_sparsity_patterns(opt->repo); > > + > > + ret = do_grep_tree(opt, pathspec, tree, base, tn_len, is_root_tree, > > + patterns, 0); > > + > > + if (patterns) { > > + clear_pattern_list(patterns); > > + free(patterns); > > + } > > OK, it is not like this codepath is driven by "git log" to grep from > top-level tree objects of many commits, so it is OK to grab the > sparsity patterns once before do_grep_tree() and discard it when we > are done. 
> > > + return ret; > > +} > > + > > > static int grep_object(struct grep_opt *opt, const struct pathspec *pathspec, > > struct object *obj, const char *name, const char *path) > > { > > diff --git a/t/t7011-skip-worktree-reading.sh b/t/t7011-skip-worktree-reading.sh > > index 37525cae3a..26852586ac 100755 > > --- a/t/t7011-skip-worktree-reading.sh > > +++ b/t/t7011-skip-worktree-reading.sh > > @@ -109,15 +109,6 @@ test_expect_success 'ls-files --modified' ' > > test -z "$(git ls-files -m)" > > ' > > > > -test_expect_success 'grep with skip-worktree file' ' > > - git update-index --no-skip-worktree 1 && > > - echo test > 1 && > > - git update-index 1 && > > - git update-index --skip-worktree 1 && > > - rm 1 && > > - test "$(git grep --no-ext-grep test)" = "1:test" > > -' > > - > > echo ":000000 100644 $ZERO_OID $EMPTY_BLOB A 1" > expected > > test_expect_success 'diff-index does not examine skip-worktree absent entries' ' > > setup_absent && > > diff --git a/t/t7817-grep-sparse-checkout.sh b/t/t7817-grep-sparse-checkout.sh > > new file mode 100755 > > index 0000000000..3bd67082eb > > --- /dev/null > > +++ b/t/t7817-grep-sparse-checkout.sh > > @@ -0,0 +1,140 @@ > > +#!/bin/sh > > + > > +test_description='grep in sparse checkout > > + > > +This test creates a repo with the following structure: > > + > > +. > > +|-- a > > +|-- b > > +|-- dir > > +| `-- c > > +`-- sub > > + |-- A > > + | `-- a > > + `-- B > > + `-- b > > + > > +Where . has non-cone mode sparsity patterns and sub is a submodule with cone > > +mode sparsity patterns. The resulting sparse-checkout should leave the following > > +structure: > > + > > +. > > +|-- a > > +`-- sub > > + `-- B > > + `-- b > > +' > > + > > +. 
./test-lib.sh > > + > > +test_expect_success 'setup' ' > > + echo "text" >a && > > + echo "text" >b && > > + mkdir dir && > > + echo "text" >dir/c && > > + > > + git init sub && > > + ( > > + cd sub && > > + mkdir A B && > > + echo "text" >A/a && > > + echo "text" >B/b && > > + git add A B && > > + git commit -m sub && > > + git sparse-checkout init --cone && > > + git sparse-checkout set B > > + ) && > > + > > + git submodule add ./sub && > > + git add a b dir && > > + git commit -m super && > > + git sparse-checkout init --no-cone && > > + git sparse-checkout set "/*" "!b" "!/*/" && > > + > > + git tag -am t-commit t-commit HEAD && > > + tree=$(git rev-parse HEAD^{tree}) && > > + git tag -am t-tree t-tree $tree && > > + > > + test_path_is_missing b && > > + test_path_is_missing dir && > > + test_path_is_missing sub/A && > > + test_path_is_file a && > > + test_path_is_file sub/B/b > > +' > > + > > +test_expect_success 'grep in working tree should honor sparse checkout' ' > > + cat >expect <<-EOF && > > + a:text > > + EOF > > + git grep "text" >actual && > > + test_cmp expect actual > > +' > > + > > +test_expect_success 'grep --cached should honor sparse checkout' ' > > + cat >expect <<-EOF && > > + a:text > > + EOF > > + git grep --cached "text" >actual && > > + test_cmp expect actual > > +' > > + > > +test_expect_success 'grep <commit-ish> should honor sparse checkout' ' > > + commit=$(git rev-parse HEAD) && > > + cat >expect_commit <<-EOF && > > + $commit:a:text > > + EOF > > + cat >expect_t-commit <<-EOF && > > + t-commit:a:text > > + EOF > > + git grep "text" $commit >actual_commit && > > + test_cmp expect_commit actual_commit && > > + git grep "text" t-commit >actual_t-commit && > > + test_cmp expect_t-commit actual_t-commit > > +' > > + > > +test_expect_success 'grep <tree-ish> should ignore sparsity patterns' ' > > + commit=$(git rev-parse HEAD) && > > + tree=$(git rev-parse HEAD^{tree}) && > > + cat >expect_tree <<-EOF && > > + $tree:a:text > > + 
$tree:b:text > > + $tree:dir/c:text > > + EOF > > + cat >expect_t-tree <<-EOF && > > + t-tree:a:text > > + t-tree:b:text > > + t-tree:dir/c:text > > + EOF > > + git grep "text" $tree >actual_tree && > > + test_cmp expect_tree actual_tree && > > + git grep "text" t-tree >actual_t-tree && > > + test_cmp expect_t-tree actual_t-tree > > +' > > + > > +test_expect_success 'grep --recurse-submodules --cached should honor sparse checkout in submodule' ' > > + cat >expect <<-EOF && > > + a:text > > + sub/B/b:text > > + EOF > > + git grep --recurse-submodules --cached "text" >actual && > > + test_cmp expect actual > > +' > > + > > +test_expect_success 'grep --recurse-submodules <commit-ish> should honor sparse checkout in submodule' ' > > + commit=$(git rev-parse HEAD) && > > + cat >expect_commit <<-EOF && > > + $commit:a:text > > + $commit:sub/B/b:text > > + EOF > > + cat >expect_t-commit <<-EOF && > > + t-commit:a:text > > + t-commit:sub/B/b:text > > + EOF > > + git grep --recurse-submodules "text" $commit >actual_commit && > > + test_cmp expect_commit actual_commit && > > + git grep --recurse-submodules "text" t-commit >actual_t-commit && > > + test_cmp expect_t-commit actual_t-commit > > +' > > + > > +test_done ^ permalink raw reply [flat|nested] 123+ messages in thread
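The per-entry decision that in_sparse_checkout() makes in the patch quoted above can be modeled in isolation to make the inheritance rule easier to follow. The sketch below is a simplified stand-in, not the patch's code: the `sk_` names are hypothetical, and the `own` parameter is an assumption that replaces the call to path_matches_pattern_list() with a precomputed result for the entry itself.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the match results and entry modes. */
enum sk_match { SK_UNDECIDED, SK_NOT_MATCHED, SK_MATCHED, SK_MATCHED_RECURSIVE };
enum sk_mode { SK_FILE, SK_DIR, SK_GITLINK };

/*
 * Return true if a tree entry should be visited.
 *   own:    what the sparsity patterns say about this entry alone
 *   parent: the result already computed for the containing directory
 *   result: receives the value to pass down to this entry's children
 */
bool sk_in_sparse_checkout(enum sk_mode mode, enum sk_match own,
			   enum sk_match parent, bool cone_mode,
			   enum sk_match *result)
{
	/* Gitlinks are always visited here; *result is left untouched. */
	if (mode == SK_GITLINK)
		return true;

	/* A recursively matched parent covers everything beneath it. */
	if (parent == SK_MATCHED_RECURSIVE) {
		*result = parent;
		return true;
	}

	/* An entry the patterns say nothing about inherits from its parent. */
	*result = (own == SK_UNDECIDED) ? parent : own;

	/*
	 * Unmatched files are skipped. Unmatched directories are skipped
	 * only in cone mode; in non-cone mode a later pattern may still
	 * match something nested inside, so the walk descends anyway.
	 */
	if (*result == SK_NOT_MATCHED &&
	    (mode == SK_FILE || (mode == SK_DIR && cone_mode)))
		return false;

	return true;
}
```

The unconditional early return for gitlinks is the behavior Junio questions in the review, so it is reproduced here only to reflect the patch as posted.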
* [RFC PATCH v2 4/4] config: add setting to ignore sparsity patterns in some cmds 2020-05-10 0:41 ` [RFC PATCH v2 0/4] grep: honor sparse checkout and add option to ignore it Matheus Tavares ` (2 preceding siblings ...) 2020-05-10 0:41 ` [RFC PATCH v2 3/4] grep: honor sparse checkout patterns Matheus Tavares @ 2020-05-10 0:41 ` Matheus Tavares 2020-05-10 4:23 ` Matheus Tavares Bernardino 2020-05-21 7:09 ` Elijah Newren 2020-05-28 1:12 ` [PATCH v3 0/5] grep: honor sparse checkout and add option to ignore it Matheus Tavares 4 siblings, 2 replies; 123+ messages in thread From: Matheus Tavares @ 2020-05-10 0:41 UTC (permalink / raw) To: git; +Cc: gitster, stolee, newren, jonathantanmy When sparse checkout is enabled, some users expect the output of certain commands (such as grep, diff, and log) to be also restricted within the sparsity patterns. This would allow them to effectively work only on the subset of files in which they are interested; and allow some commands to possibly perform better, by not considering uninteresting paths. For this reason, we taught grep to honor the sparsity patterns, in the previous commit. But, on the other hand, allowing grep and the other commands mentioned to optionally ignore the patterns also makes for some interesting use cases. E.g. using grep to search for a function definition that resides outside the sparse checkout. In any case, there is no current way for users to configure the behavior they want for these commands. Aiming to provide this flexibility, let's introduce the sparse.restrictCmds setting (and the analogous --[no-]restrict-to-sparse-paths global option). The default value is true. For now, grep is the only one affected by this setting, but the goal is to have support for more commands, in the future. 
Helped-by: Elijah Newren <newren@gmail.com> Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br> --- Some notes/questions about this one: - I guess having the additional sparse-checkout.o only for the restrict_to_sparse_paths() function is not very justifiable. Especially since builtin/grep.c is currently its only caller. But since Stolee is already moving some code out of the sparse-checkout builtin and into sparse-checkout.o [1], I thought it would be better to place this function here from the start, as it will likely be needed by other cmds when they start honoring sparse.restrictCmds. (Side note: I think I will also be able to use the populate_sparse_checkout_patterns() function added by Stolee in the same patchset [2], to avoid code duplication in the get_sparsity_patterns() function added in this patch). [1]: https://lore.kernel.org/git/0181a134bfb6986dc0e54ae624c478446a1324a9.1588857462.git.gitgitgadget@gmail.com/ [2]: https://lore.kernel.org/git/444a6b5f894f28e96f713e5caccba18e1ea3b3eb.1588857462.git.gitgitgadget@gmail.com/ - With that said, the only reason we need restrict_to_sparse_paths() to begin with, is so that commands which recurse into submodules may respect the value set in each submodule for the sparse.restrictCmds config. This is already being done for grep, in this patch. But, should we do like this or should we use the value set at the superproject, for all submodules as well, when recursing (ignoring the value set on them)? - It's possible to also make read-tree respect the new setting/option, using --no-restrict-to-sparse-paths as a synonym for its --no-sparse-checkout option (with lower precedence). However, as this command can change the sparse checked out paths, I thought it kind of falls under a different category. Also, `git read-tree -mu --sparse-checkout` doesn't have the effect of *restricting* the command's behavior to the sparsity patterns, but of applying them to the working tree, right? 
So maybe it could be confusing to make this command honor the new setting. Does that make sense, or should we do it? - Finally, if we decide to make read-tree be affected by sparse.restrictCmds, there is also the case of whether the config should be honored for submodules or just propagate the superproject's value. I think the latter would be as simple as adding this line, before calling parse_options() in builtin/read-tree.c: opts.skip_sparse_checkout = !restrict_to_sparse_paths(the_repository); As for the former, I'm not very familiar with the code in unpack_trees(), so I'm not sure how complicated that would be. Documentation/config.txt | 2 + Documentation/config/sparse.txt | 22 ++++++++ Documentation/git-grep.txt | 3 + Documentation/git.txt | 4 ++ Makefile | 1 + builtin/grep.c | 14 ++++- contrib/completion/git-completion.bash | 2 + git.c | 6 ++ sparse-checkout.c | 16 ++++++ sparse-checkout.h | 11 ++++ t/t7817-grep-sparse-checkout.sh | 78 +++++++++++++++++++++++++- t/t9902-completion.sh | 4 +- 12 files changed, 159 insertions(+), 4 deletions(-) create mode 100644 Documentation/config/sparse.txt create mode 100644 sparse-checkout.c create mode 100644 sparse-checkout.h diff --git a/Documentation/config.txt b/Documentation/config.txt index ef0768b91a..fd74b80302 100644 --- a/Documentation/config.txt +++ b/Documentation/config.txt @@ -436,6 +436,8 @@ include::config/sequencer.txt[] include::config/showbranch.txt[] +include::config/sparse.txt[] + include::config/splitindex.txt[] include::config/ssh.txt[] diff --git a/Documentation/config/sparse.txt b/Documentation/config/sparse.txt new file mode 100644 index 0000000000..83a4e0018f --- /dev/null +++ b/Documentation/config/sparse.txt @@ -0,0 +1,22 @@ +sparse.restrictCmds:: + Only meaningful in conjunction with core.sparseCheckout. 
This option + extends sparse checkouts (which limit which paths are written to the + working tree), so that output and operations are also limited to the + sparsity paths where possible and implemented. The purpose of this + option is to (1) focus output for the user on the portion of the + repository that is of interest to them, and (2) enable potentially + dramatic performance improvements, especially in conjunction with + partial clones. ++ +When this option is true (default), some git commands may limit their behavior +to the paths specified by the sparsity patterns, or to the intersection of +those paths and any (like `*.c) that the user might also specify on the command +line. When false, the affected commands will work on full trees, ignoring the +sparsity patterns. For now, only git-grep honors this setting. In this command, +the restriction becomes relevant in one of these three cases: with --cached; +when a commit-ish is given; when searching a working tree that contains paths +previously excluded by the sparsity patterns. ++ +Note: commands which export, integrity check, or create history will always +operate on full trees (e.g. fast-export, format-patch, fsck, commit, etc.), +unaffected by any sparsity patterns. diff --git a/Documentation/git-grep.txt b/Documentation/git-grep.txt index 9bdf807584..abbf100109 100644 --- a/Documentation/git-grep.txt +++ b/Documentation/git-grep.txt @@ -41,6 +41,9 @@ characters. An empty string as search expression matches all lines. CONFIGURATION ------------- +git-grep honors the sparse.restrictCmds setting. See its definition in +linkgit:git-config[1]. + :git-grep: 1 include::config/grep.txt[] diff --git a/Documentation/git.txt b/Documentation/git.txt index 9d6769e95a..5e107c6246 100644 --- a/Documentation/git.txt +++ b/Documentation/git.txt @@ -180,6 +180,10 @@ If you just want to run git as if it was started in `<path>` then use Do not perform optional operations that require locks. 
This is equivalent to setting the `GIT_OPTIONAL_LOCKS` to `0`. +--[no-]restrict-to-sparse-paths:: + Overrides the sparse.restrictCmds configuration (see + linkgit:git-config[1]) for this execution. + --list-cmds=group[,group...]:: List commands by group. This is an internal/experimental option and may change or be removed in the future. Supported diff --git a/Makefile b/Makefile index 3d3a39fc19..67580c691b 100644 --- a/Makefile +++ b/Makefile @@ -986,6 +986,7 @@ LIB_OBJS += sha1-name.o LIB_OBJS += shallow.o LIB_OBJS += sideband.o LIB_OBJS += sigchain.o +LIB_OBJS += sparse-checkout.o LIB_OBJS += split-index.o LIB_OBJS += stable-qsort.o LIB_OBJS += strbuf.o diff --git a/builtin/grep.c b/builtin/grep.c index 91ee0b2734..3f92e7fd6c 100644 --- a/builtin/grep.c +++ b/builtin/grep.c @@ -25,6 +25,7 @@ #include "submodule-config.h" #include "object-store.h" #include "packfile.h" +#include "sparse-checkout.h" static char const * const grep_usage[] = { N_("git grep [<options>] [-e] <pattern> [<rev>...] 
[[--] <path>...]"), @@ -498,6 +499,7 @@ static int grep_cache(struct grep_opt *opt, int nr; struct strbuf name = STRBUF_INIT; int name_base_len = 0; + int sparse_paths_only = restrict_to_sparse_paths(repo); if (repo->submodule_prefix) { name_base_len = strlen(repo->submodule_prefix); strbuf_addstr(&name, repo->submodule_prefix); @@ -509,7 +511,8 @@ static int grep_cache(struct grep_opt *opt, for (nr = 0; nr < repo->index->cache_nr; nr++) { const struct cache_entry *ce = repo->index->cache[nr]; - if (ce_skip_worktree(ce) && !S_ISGITLINK(ce->ce_mode)) + if (sparse_paths_only && ce_skip_worktree(ce) && + !S_ISGITLINK(ce->ce_mode)) continue; strbuf_setlen(&name, name_base_len); @@ -717,9 +720,10 @@ static int grep_tree(struct grep_opt *opt, const struct pathspec *pathspec, int is_root_tree) { struct pattern_list *patterns = NULL; + int sparse_paths_only = restrict_to_sparse_paths(opt->repo); int ret; - if (is_root_tree) + if (is_root_tree && sparse_paths_only) patterns = get_sparsity_patterns(opt->repo); ret = do_grep_tree(opt, pathspec, tree, base, tn_len, is_root_tree, @@ -1259,6 +1263,12 @@ int cmd_grep(int argc, const char **argv, const char *prefix) if (!use_index || untracked) { int use_exclude = (opt_exclude < 0) ? 
use_index : !!opt_exclude; + + if (opt_restrict_to_sparse_paths >= 0) { + warning(_("--[no-]restrict-to-sparse-paths is ignored" + " with --no-index or --untracked")); + } + hit = grep_directory(&opt, &pathspec, use_exclude, use_index); } else if (0 <= opt_exclude) { die(_("--[no-]exclude-standard cannot be used for tracked contents")); diff --git a/contrib/completion/git-completion.bash b/contrib/completion/git-completion.bash index b1d6e5ebed..cba0f9166c 100644 --- a/contrib/completion/git-completion.bash +++ b/contrib/completion/git-completion.bash @@ -3207,6 +3207,8 @@ __git_main () --namespace= --no-replace-objects --help + --restrict-to-sparse-paths + --no-restrict-to-sparse-paths " ;; *) diff --git a/git.c b/git.c index 2e4efb4ff0..f967c75d9c 100644 --- a/git.c +++ b/git.c @@ -37,6 +37,7 @@ const char git_more_info_string[] = "See 'git help git' for an overview of the system."); static int use_pager = -1; +int opt_restrict_to_sparse_paths = -1; static void list_builtins(struct string_list *list, unsigned int exclude_option); @@ -310,6 +311,10 @@ static int handle_options(const char ***argv, int *argc, int *envchanged) } else { exit(list_cmds(cmd)); } + } else if (!strcmp(cmd, "--restrict-to-sparse-paths")) { + opt_restrict_to_sparse_paths = 1; + } else if (!strcmp(cmd, "--no-restrict-to-sparse-paths")) { + opt_restrict_to_sparse_paths = 0; } else { fprintf(stderr, _("unknown option: %s\n"), cmd); usage(git_usage_string); @@ -318,6 +323,7 @@ static int handle_options(const char ***argv, int *argc, int *envchanged) (*argv)++; (*argc)--; } + return (*argv) - orig_argv; } diff --git a/sparse-checkout.c b/sparse-checkout.c new file mode 100644 index 0000000000..9a9e50fd29 --- /dev/null +++ b/sparse-checkout.c @@ -0,0 +1,16 @@ +#include "cache.h" +#include "config.h" +#include "sparse-checkout.h" + +int restrict_to_sparse_paths(struct repository *repo) +{ + int ret; + + if (opt_restrict_to_sparse_paths >= 0) + return opt_restrict_to_sparse_paths; + + if 
(repo_config_get_bool(repo, "sparse.restrictcmds", &ret)) + ret = 1; + + return ret; +} diff --git a/sparse-checkout.h b/sparse-checkout.h new file mode 100644 index 0000000000..1de3b588d8 --- /dev/null +++ b/sparse-checkout.h @@ -0,0 +1,11 @@ +#ifndef SPARSE_CHECKOUT_H +#define SPARSE_CHECKOUT_H + +struct repository; + +extern int opt_restrict_to_sparse_paths; /* from git.c */ + +/* Whether or not cmds should restrict behavior on sparse paths, in this repo */ +int restrict_to_sparse_paths(struct repository *repo); + +#endif /* SPARSE_CHECKOUT_H */ diff --git a/t/t7817-grep-sparse-checkout.sh b/t/t7817-grep-sparse-checkout.sh index 3bd67082eb..8509694bf1 100755 --- a/t/t7817-grep-sparse-checkout.sh +++ b/t/t7817-grep-sparse-checkout.sh @@ -63,12 +63,28 @@ test_expect_success 'setup' ' test_path_is_file sub/B/b ' +# The two tests bellow check a special case: the sparsity patterns exclude '/b' +# and sparse checkout is enable, but the path exists on the working tree (e.g. +# manually created after `git sparse-checkout init`). In this case, grep should +# honor --restrict-to-sparse-paths. 
test_expect_success 'grep in working tree should honor sparse checkout' ' cat >expect <<-EOF && a:text EOF + echo newtext >b && git grep "text" >actual && - test_cmp expect actual + test_cmp expect actual && + rm b +' +test_expect_success 'grep w/ --no-restrict-to-sparse-paths for sparsely excluded but present paths' ' + cat >expect <<-EOF && + a:text + b:newtext + EOF + echo newtext >b && + git --no-restrict-to-sparse-paths grep "text" >actual && + test_cmp expect actual && + rm b ' test_expect_success 'grep --cached should honor sparse checkout' ' @@ -137,4 +153,64 @@ test_expect_success 'grep --recurse-submodules <commit-ish> should honor sparse test_cmp expect_t-commit actual_t-commit ' +for cmd in 'git --no-restrict-to-sparse-paths grep' \ + 'git -c sparse.restrictCmds=false grep' \ + 'git -c sparse.restrictCmds=true --no-restrict-to-sparse-paths grep' +do + + test_expect_success "$cmd --cached should ignore sparsity patterns" ' + cat >expect <<-EOF && + a:text + b:text + dir/c:text + EOF + $cmd --cached "text" >actual && + test_cmp expect actual + ' + + test_expect_success "$cmd <commit-ish> should ignore sparsity patterns" ' + commit=$(git rev-parse HEAD) && + cat >expect_commit <<-EOF && + $commit:a:text + $commit:b:text + $commit:dir/c:text + EOF + cat >expect_t-commit <<-EOF && + t-commit:a:text + t-commit:b:text + t-commit:dir/c:text + EOF + $cmd "text" $commit >actual_commit && + test_cmp expect_commit actual_commit && + $cmd "text" t-commit >actual_t-commit && + test_cmp expect_t-commit actual_t-commit + ' +done + +test_expect_success 'should respect the sparse.restrictCmds values from submodules' ' + cat >expect <<-EOF && + a:text + sub/A/a:text + sub/B/b:text + EOF + git -C sub config sparse.restrictCmds false && + git grep --cached --recurse-submodules "text" >actual && + test_cmp expect actual && + git -C sub config --unset sparse.restrictCmds +' + +test_expect_success 'should propagate --[no]-restrict-to-sparse-paths to submodules' ' + cat >expect 
<<-EOF && + a:text + b:text + dir/c:text + sub/A/a:text + sub/B/b:text + EOF + git -C sub config sparse.restrictCmds true && + git --no-restrict-to-sparse-paths grep --cached --recurse-submodules "text" >actual && + test_cmp expect actual && + git -C sub config --unset sparse.restrictCmds +' + test_done diff --git a/t/t9902-completion.sh b/t/t9902-completion.sh index 3c44af6940..a4a7767e06 100755 --- a/t/t9902-completion.sh +++ b/t/t9902-completion.sh @@ -1473,6 +1473,8 @@ test_expect_success 'double dash "git" itself' ' --namespace= --no-replace-objects Z --help Z + --restrict-to-sparse-paths Z + --no-restrict-to-sparse-paths Z EOF ' @@ -1515,7 +1517,7 @@ test_expect_success 'general options' ' test_completion "git --nam" "--namespace=" && test_completion "git --bar" "--bare " && test_completion "git --inf" "--info-path " && - test_completion "git --no-r" "--no-replace-objects " + test_completion "git --no-rep" "--no-replace-objects " ' test_expect_success 'general options plus command' ' -- 2.26.2 ^ permalink raw reply related [flat|nested] 123+ messages in thread
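Editor's note: the precedence that this patch gives to the global option over the config can be restated in isolation. The following is only a minimal sketch, not the code from the patch: `struct fake_repo` and `fake_config_get_bool()` are hypothetical stand-ins for git's `struct repository` and `repo_config_get_bool()`, assumed here to follow the same convention of returning zero when the key is found.

```c
#include <stdio.h>

/* Stand-in for the global from git.c: -1 means "not given on the command line". */
static int opt_restrict_to_sparse_paths = -1;

/* Hypothetical stand-in for a repository's config (not git's struct repository). */
struct fake_repo {
	int has_restrict_cmds; /* is sparse.restrictCmds set in this repo? */
	int restrict_cmds;     /* its boolean value, when set */
};

/*
 * Hypothetical lookup, assumed to mirror repo_config_get_bool():
 * returns 0 and fills *dest when the key is set, nonzero otherwise.
 */
static int fake_config_get_bool(const struct fake_repo *repo, int *dest)
{
	if (!repo->has_restrict_cmds)
		return 1;
	*dest = repo->restrict_cmds;
	return 0;
}

/* Command-line option first, then the per-repo config, then the default (true). */
static int restrict_to_sparse_paths(const struct fake_repo *repo)
{
	int ret;

	if (opt_restrict_to_sparse_paths >= 0)
		return opt_restrict_to_sparse_paths;

	if (fake_config_get_bool(repo, &ret))
		ret = 1;

	return ret;
}
```

Because the lookup takes the repository as an argument, a submodule with its own sparse.restrictCmds value is consulted separately when grep recurses into it, while the command-line option, being global, applies to the superproject and all submodules alike. This matches what the last two tests in the patch check.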
* Re: [RFC PATCH v2 4/4] config: add setting to ignore sparsity patterns in some cmds
  2020-05-10  0:41 ` [RFC PATCH v2 4/4] config: add setting to ignore sparsity patterns in some cmds Matheus Tavares
@ 2020-05-10  4:23 ` Matheus Tavares Bernardino
  2020-05-21 17:18   ` Elijah Newren
  2020-05-21  7:09 ` Elijah Newren
  1 sibling, 1 reply; 123+ messages in thread
From: Matheus Tavares Bernardino @ 2020-05-10  4:23 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Derrick Stolee, Elijah Newren, Jonathan Tan

On Sat, May 9, 2020 at 9:42 PM Matheus Tavares
<matheus.bernardino@usp.br> wrote:
>
> diff --git a/t/t7817-grep-sparse-checkout.sh b/t/t7817-grep-sparse-checkout.sh
> index 3bd67082eb..8509694bf1 100755
> --- a/t/t7817-grep-sparse-checkout.sh
> +++ b/t/t7817-grep-sparse-checkout.sh
> @@ -63,12 +63,28 @@ test_expect_success 'setup' '
>         test_path_is_file sub/B/b
>  '
>
> +# The two tests bellow check a special case: the sparsity patterns exclude '/b'
> +# and sparse checkout is enable, but the path exists on the working tree (e.g.
> +# manually created after `git sparse-checkout init`). In this case, grep should
> +# honor --restrict-to-sparse-paths.

I just want to highlight a small thing that I forgot to comment on:
Elijah and I had already discussed --restrict-to-sparse-paths being
relevant in grep only with --cached or when a commit-ish is given. But
the special case mentioned above had not occurred to me before, i.e.
searching in the working tree when a path that should be excluded by
the sparsity patterns is present. In this patch, I let
--restrict-to-sparse-paths control the desired behavior for grep in
this case too. But please, let me know if that doesn't seem like a
good idea.

^ permalink raw reply	[flat|nested] 123+ messages in thread
* Re: [RFC PATCH v2 4/4] config: add setting to ignore sparsity patterns in some cmds
  2020-05-10  4:23 ` Matheus Tavares Bernardino
@ 2020-05-21 17:18 ` Elijah Newren
  0 siblings, 0 replies; 123+ messages in thread
From: Elijah Newren @ 2020-05-21 17:18 UTC (permalink / raw)
  To: Matheus Tavares Bernardino
  Cc: git, Junio C Hamano, Derrick Stolee, Jonathan Tan

On Sat, May 9, 2020 at 9:23 PM Matheus Tavares Bernardino
<matheus.bernardino@usp.br> wrote:
>
> On Sat, May 9, 2020 at 9:42 PM Matheus Tavares
> <matheus.bernardino@usp.br> wrote:
> >
> > diff --git a/t/t7817-grep-sparse-checkout.sh b/t/t7817-grep-sparse-checkout.sh
> > index 3bd67082eb..8509694bf1 100755
> > --- a/t/t7817-grep-sparse-checkout.sh
> > +++ b/t/t7817-grep-sparse-checkout.sh
> > @@ -63,12 +63,28 @@ test_expect_success 'setup' '
> >         test_path_is_file sub/B/b
> >  '
> >
> > +# The two tests bellow check a special case: the sparsity patterns exclude '/b'
> > +# and sparse checkout is enable, but the path exists on the working tree (e.g.
> > +# manually created after `git sparse-checkout init`). In this case, grep should
> > +# honor --restrict-to-sparse-paths.
>
> I just want to highlight a small thing that I forgot to comment on:
> Elijah and I had already discussed about --restrict-to-sparse-paths
> being relevant in grep only with --cached or when a commit-ish is
> given. But it had not occurred to me, before, the possibility of the
> special case mentioned above. I.e. when searching in the working tree
> and a path that should be excluded by the sparsity patterns is
> present. In this patch, I let --restrict-to-sparse-paths control the
> desired behavior for grep in this case too. But please, let me know if
> that doesn't seem like a good idea.

Wow, that is an interesting edge case. But it can come up during a
merge or rebase or checkout -m, could be manually changed by various
plumbing commands, and might just not be enforced well in various
areas of the system (see e.g. [1]). Perhaps the most interesting
case, given recent discussion, is submodules -- those might be left in
the working tree despite not matching sparsity paths.

So, should `git -c sparse.restrictCmds=true grep PATTERN` look at
these paths or not? Currently, you've chosen contradictory answers --
yes to submodules, and no to other entries. I'm not certain here, but
I've given it a little thought and think there are a few things to
take into consideration:

Users are used to the fact that
    grep -r PATTERN *
searches existing files for PATTERN. If you delete a file, then a
subsequent grep isn't going to search through it. Similarly, git grep
is billed as a grep which limits searches to tracked files, thus they
expect
    git grep PATTERN
to search for files in their working copy but limiting it to files
which are tracked. From this angle, I think users would be surprised
if `git grep` searched through deleted files, and they would also be
surprised if it ignored tracked and present files.

That is a basic answer, but let's go a bit further. Since git grep
also has history at its disposal, it has more options. For example:
    git grep REVISION PATTERN
means to search through all tracked files (those are the only kinds
that are recorded in revisions anyway) as of REVISION for the given
PATTERN, without checking it out. Users probably expect this to
behave the same as:
    git checkout REVISION
    git grep PATTERN
and since checkout pays attention to sparsity rules, this is why we'd
want to have both "git grep PATTERN" and "git grep REVISION PATTERN"
pay attention to sparsity rules.

When we think in terms of "git grep REVISION PATTERN" as an optimized
version of "git checkout REVISION && git grep PATTERN" it puts us in
the frame of mind of asking the following question:
    For each path, would it be marked as SKIP_WORKTREE if we were to
    check it out right now? If so, we should skip it for the grepping.

Usually, the SKIP_WORKTREE bit is set for files if and only if they
don't match the sparsity patterns. Also, we can't use the
SKIP_WORKTREE bit of the current index to decide whether to grep
through an old REVISION, because there are paths that exist in the old
revision that don't exist in the current index. The sparsity rules
are the only things that can tell us whether such a path would be
marked as SKIP_WORKTREE if we were to check it out. So it makes sense
to use the sparsity patterns when looking at REVISIONS.

When dealing with the current worktree, we can check SKIP_WORKTREE
directly. Usually that'll give the same answer as asking the sparsity
rules, but as per [1] the two aren't always identical. Rather than
asking "Would we mark this as SKIP_WORKTREE if we were to checkout
this version right now?", perhaps we should ask "Since we have this
version checked out right now, let's just check the path directly. Is
it marked as SKIP_WORKTREE?".

Does that sound reasonable?

[1] https://lore.kernel.org/git/xmqqbmb1a7ga.fsf@gitster-ct.c.googlers.com/

^ permalink raw reply	[flat|nested] 123+ messages in thread
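Editor's note: the rule proposed in the message above can be sketched as a single predicate. This is only an illustration under stated assumptions, not code from the series: `enum grep_source`, `struct path_info` and `should_grep_path()` are hypothetical names, and submodule (gitlink) entries are deliberately left out because the thread has not settled how they should be treated.

```c
#include <stdbool.h>

/* Hypothetical: where the path being considered comes from. */
enum grep_source {
	GREP_WORKTREE_OR_INDEX, /* plain `git grep` or `git grep --cached` */
	GREP_REVISION           /* a commit-ish was given */
};

/* Hypothetical summary of what is known about one path. */
struct path_info {
	bool skip_worktree;    /* SKIP_WORKTREE bit in the current index */
	bool matches_patterns; /* do the sparsity patterns include this path? */
};

/*
 * With restriction enabled:
 *  - for the current checkout, trust the SKIP_WORKTREE bit directly;
 *  - for an old revision, the current index cannot answer, so ask the
 *    sparsity patterns whether the path would be checked out.
 * With restriction disabled, every tracked path is searched.
 */
static bool should_grep_path(enum grep_source src,
			     const struct path_info *p,
			     bool restrict_to_sparse)
{
	if (!restrict_to_sparse)
		return true;
	if (src == GREP_REVISION)
		return p->matches_patterns;
	return !p->skip_worktree;
}
```

The two sources can disagree for the same path: an entry whose SKIP_WORKTREE bit is set even though the patterns include it would be skipped in the current checkout but searched in a revision.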
* Re: [RFC PATCH v2 4/4] config: add setting to ignore sparsity patterns in some cmds
  2020-05-10  0:41 ` [RFC PATCH v2 4/4] config: add setting to ignore sparsity patterns in some cmds Matheus Tavares
  2020-05-10  4:23   ` Matheus Tavares Bernardino
@ 2020-05-21  7:09   ` Elijah Newren
  1 sibling, 0 replies; 123+ messages in thread
From: Elijah Newren @ 2020-05-21  7:09 UTC (permalink / raw)
  To: Matheus Tavares
  Cc: Git Mailing List, Junio C Hamano, Derrick Stolee, Jonathan Tan

Sorry for the late reply...and for responding in backwards order.
Great to see these newer patches!

On Sat, May 9, 2020 at 5:42 PM Matheus Tavares
<matheus.bernardino@usp.br> wrote:
>
> When sparse checkout is enabled, some users expect the output of certain
> commands (such as grep, diff, and log) to be also restricted within the
> sparsity patterns. This would allow them to effectively work only on the
> subset of files in which they are interested; and allow some commands to
> possibly perform better, by not considering uninteresting paths. For
> this reason, we taught grep to honor the sparsity patterns, in the
> previous commit. But, on the other hand, allowing grep and the other
> commands mentioned to optionally ignore the patterns also make for some
> interesting use cases. E.g. using grep to search for a function
> definition that resides outside the sparse checkout.
>
> In any case, there is no current way for users to configure the behavior
> they want for these commands. Aiming to provide this flexibility, let's
> introduce the sparse.restrictCmds setting (and the analogous
> --[no]-restrict-to-sparse-paths global option). The default value is
> true. For now, grep is the only one affected by this setting, but the
> goal is to have support for more commands, in the future.
>
> Helped-by: Elijah Newren <newren@gmail.com>
> Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
> ---
>
> Some notes/questions about this one:
>
> - I guess having the additional sparse-checkout.o only for the
>   restrict_to_sparse_paths() function is not very justifiable.
>   Especially since builtin/grep.c is currently its only caller. But
>   since Stolee is already moving some code out of the sparse-checkout
>   builtin and into sparse-checkout.o [1], I thought it would be better
>   to place this function here from the start, as it will likely be
>   needed by other cmds when they start honoring sparse.restrictCmds.
>   (Side note: I think I will also be able to use the
>   populate_sparse_checkout_patterns() function added by Stolee in the
>   same patchset [2], to avoid code duplication in the
>   get_sparsity_patterns() function added in this patch).
>
>   [1]: https://lore.kernel.org/git/0181a134bfb6986dc0e54ae624c478446a1324a9.1588857462.git.gitgitgadget@gmail.com/
>   [2]: https://lore.kernel.org/git/444a6b5f894f28e96f713e5caccba18e1ea3b3eb.1588857462.git.gitgitgadget@gmail.com/

Seems reasonable to me.

> - With that said, the only reason we need restrict_to_sparse_paths() to
>   begin with, is so that commands which recurse into submodules may
>   respect the value set in each submodule for the sparse.restrictCmds
>   config. This is already being done for grep, in this patch. But,
>   should we do like this or should we use the value set at the
>   superproject, for all submodules as well, when recursing (ignoring the
>   value set on them)?

We have a few different types of files in git: tracked, untracked,
and ignored (though it's sometimes not clear if people are using
untracked to mean everything that isn't tracked, or if they are using
it to mean everything that is both not tracked and not ignored; it
seems to depend on the context). The point of the sparsity patterns
is to break the "tracked" category into two subsets: those tracked
files matching the sparsity patterns and the tracked files that
don't. The reason for this subsetting is it allows us to work with a
smaller subset of a much larger repository.

The thing about submodules is that the parent repository doesn't know
what the submodule tracks, it only has a commit id. The submodule
itself knows which individual files it tracks in its own index. If
the parent module doesn't even know which files the submodule tracks,
how is it supposed to be responsible for defining a subset of the
submodules' tracked files? It seems like a layering violation to me.

So, I think you are right with grep to not override the submodules'
sparse.restrictCmds config. For other commands that recurse into
submodules, if there are any relevant ones, I think they'd want to do
the same as you did for grep here.

But what other commands recurse into submodules? I can't think of
any right now. log doesn't, diff doesn't, status doesn't. The only
ones I can think of right now are clone and pull. In the case of
clone, the submodule doesn't exist yet so can't have any setting yet.
In the case of pull, what would it do with the setting anyway? Do a
partial fetch that ignores blobs outside the sparse cone? I think
that'd be great...but wouldn't that behavior of fetch be controlled by
whether the user was in a partial clone rather than any
sparse-checkout setting? (I have to admit I'm not familiar with how
partial clones work yet.) [Later edit:] Also, pull seems like more of
a write operation, so see below.

> - It's possible to also make read-tree respect the new setting/option,
>   using --no-restrict-to-sparse-paths as a synonym for its
>   --no-sparse-checkout option (with lower precedence). However, as this
>   command can change the sparse checked out paths, I thought it kind
>   of falls under a different category. Also, `git read-tree -mu
>   --sparse-checkout` doesn't have the effect of *restricting* the
>   command's behavior to the sparsity patterns, but of applying them to
>   the working tree, right? So maybe it could be confusing to make this
>   command honor the new setting. Does that make sense, or should we do
>   it?

That's a good question; I hadn't considered read-tree before. My gut
reaction is that these flags only affect read operations, not write
ones. (And they don't affect all read operations; e.g. fsck is about
integrity checking, so fsck by default would check everything that was
downloaded and would only be limited in e.g. a partial clone -- but
that's a different kind of limit.)

For example, if we said these flags affected write operations, then
as soon as someone sets sparse.restrictCmds=false and then runs 'git
checkout $branch', then we would be forced to interpret
sparse.restrictCmds=false to mean we shouldn't pay attention to
sparsity patterns and thus should check out ALL files. The user would
end up with a non-sparse tree really fast and would have to constantly
re-sparsify. I think that's pretty clearly not the intention.

As such, I think these flags are for controlling read operations like
grep/diff/log, and that neither read-tree nor checkout should be
affected by these flags.

> - Finally, if we decide to make read-tree be affected by
>   sparse.restrictCmds, there is also the case of whether the config
>   should be honored for submodules or just propagate the superproject's
>   value. I think the latter would be as simple as adding this line,
>   before calling parse_options() in builtin/read-tree.c:
>
>         opts.skip_sparse_checkout = !restrict_to_sparse_paths(the_repository);
>
>   As for the former, I'm not very familiar with the code in
>   unpack_trees(), so I'm not sure how complicated that would be.

As before, I don't think propagating the superproject's value makes any sense.
However, I don't think making read-tree be affected by sparse.restrictCmds makes sense either so it shouldn't matter. > Documentation/config.txt | 2 + > Documentation/config/sparse.txt | 22 ++++++++ > Documentation/git-grep.txt | 3 + > Documentation/git.txt | 4 ++ > Makefile | 1 + > builtin/grep.c | 14 ++++- > contrib/completion/git-completion.bash | 2 + > git.c | 6 ++ > sparse-checkout.c | 16 ++++++ > sparse-checkout.h | 11 ++++ > t/t7817-grep-sparse-checkout.sh | 78 +++++++++++++++++++++++++- > t/t9902-completion.sh | 4 +- > 12 files changed, 159 insertions(+), 4 deletions(-) > create mode 100644 Documentation/config/sparse.txt > create mode 100644 sparse-checkout.c > create mode 100644 sparse-checkout.h > > diff --git a/Documentation/config.txt b/Documentation/config.txt > index ef0768b91a..fd74b80302 100644 > --- a/Documentation/config.txt > +++ b/Documentation/config.txt > @@ -436,6 +436,8 @@ include::config/sequencer.txt[] > > include::config/showbranch.txt[] > > +include::config/sparse.txt[] > + > include::config/splitindex.txt[] > > include::config/ssh.txt[] > diff --git a/Documentation/config/sparse.txt b/Documentation/config/sparse.txt > new file mode 100644 > index 0000000000..83a4e0018f > --- /dev/null > +++ b/Documentation/config/sparse.txt > @@ -0,0 +1,22 @@ > +sparse.restrictCmds:: > + Only meaningful in conjunction with core.sparseCheckout. This option > + extends sparse checkouts (which limit which paths are written to the > + working tree), so that output and operations are also limited to the > + sparsity paths where possible and implemented. The purpose of this > + option is to (1) focus output for the user on the portion of the > + repository that is of interest to them, and (2) enable potentially > + dramatic performance improvements, especially in conjunction with > + partial clones. 
> ++ > +When this option is true (default), some git commands may limit their behavior > +to the paths specified by the sparsity patterns, or to the intersection of > +those paths and any (like `*.c) that the user might also specify on the command > +line. When false, the affected commands will work on full trees, ignoring the > +sparsity patterns. For now, only git-grep honors this setting. In this command, > +the restriction becomes relevant in one of these three cases: with --cached; > +when a commit-ish is given; when searching a working tree that contains paths > +previously excluded by the sparsity patterns. > ++ > +Note: commands which export, integrity check, or create history will always > +operate on full trees (e.g. fast-export, format-patch, fsck, commit, etc.), > +unaffected by any sparsity patterns. > diff --git a/Documentation/git-grep.txt b/Documentation/git-grep.txt > index 9bdf807584..abbf100109 100644 > --- a/Documentation/git-grep.txt > +++ b/Documentation/git-grep.txt > @@ -41,6 +41,9 @@ characters. An empty string as search expression matches all lines. > CONFIGURATION > ------------- > > +git-grep honors the sparse.restrictCmds setting. See its definition in > +linkgit:git-config[1]. > + > :git-grep: 1 > include::config/grep.txt[] > > diff --git a/Documentation/git.txt b/Documentation/git.txt > index 9d6769e95a..5e107c6246 100644 > --- a/Documentation/git.txt > +++ b/Documentation/git.txt > @@ -180,6 +180,10 @@ If you just want to run git as if it was started in `<path>` then use > Do not perform optional operations that require locks. This is > equivalent to setting the `GIT_OPTIONAL_LOCKS` to `0`. > > +--[no-]restrict-to-sparse-paths:: > + Overrides the sparse.restrictCmds configuration (see > + linkgit:git-config[1]) for this execution. > + > --list-cmds=group[,group...]:: > List commands by group. This is an internal/experimental > option and may change or be removed in the future. 
Supported > diff --git a/Makefile b/Makefile > index 3d3a39fc19..67580c691b 100644 > --- a/Makefile > +++ b/Makefile > @@ -986,6 +986,7 @@ LIB_OBJS += sha1-name.o > LIB_OBJS += shallow.o > LIB_OBJS += sideband.o > LIB_OBJS += sigchain.o > +LIB_OBJS += sparse-checkout.o > LIB_OBJS += split-index.o > LIB_OBJS += stable-qsort.o > LIB_OBJS += strbuf.o > diff --git a/builtin/grep.c b/builtin/grep.c > index 91ee0b2734..3f92e7fd6c 100644 > --- a/builtin/grep.c > +++ b/builtin/grep.c > @@ -25,6 +25,7 @@ > #include "submodule-config.h" > #include "object-store.h" > #include "packfile.h" > +#include "sparse-checkout.h" > > static char const * const grep_usage[] = { > N_("git grep [<options>] [-e] <pattern> [<rev>...] [[--] <path>...]"), > @@ -498,6 +499,7 @@ static int grep_cache(struct grep_opt *opt, > int nr; > struct strbuf name = STRBUF_INIT; > int name_base_len = 0; > + int sparse_paths_only = restrict_to_sparse_paths(repo); > if (repo->submodule_prefix) { > name_base_len = strlen(repo->submodule_prefix); > strbuf_addstr(&name, repo->submodule_prefix); > @@ -509,7 +511,8 @@ static int grep_cache(struct grep_opt *opt, > for (nr = 0; nr < repo->index->cache_nr; nr++) { > const struct cache_entry *ce = repo->index->cache[nr]; > > - if (ce_skip_worktree(ce) && !S_ISGITLINK(ce->ce_mode)) > + if (sparse_paths_only && ce_skip_worktree(ce) && > + !S_ISGITLINK(ce->ce_mode)) > continue; > > strbuf_setlen(&name, name_base_len); > @@ -717,9 +720,10 @@ static int grep_tree(struct grep_opt *opt, const struct pathspec *pathspec, > int is_root_tree) > { > struct pattern_list *patterns = NULL; > + int sparse_paths_only = restrict_to_sparse_paths(opt->repo); > int ret; > > - if (is_root_tree) > + if (is_root_tree && sparse_paths_only) > patterns = get_sparsity_patterns(opt->repo); > > ret = do_grep_tree(opt, pathspec, tree, base, tn_len, is_root_tree, > @@ -1259,6 +1263,12 @@ int cmd_grep(int argc, const char **argv, const char *prefix) > > if (!use_index || untracked) { > int 
use_exclude = (opt_exclude < 0) ? use_index : !!opt_exclude; > + > + if (opt_restrict_to_sparse_paths >= 0) { > + warning(_("--[no-]restrict-to-sparse-paths is ignored" > + " with --no-index or --untracked")); I think this should instead be die(_("--[no-]restrict-to-sparse-paths is incompatible with --no-index and --untracked")) Restricting to sparse paths (or not) is about working with subsets of tracked files (or all tracked files). --no-index and --untracked are about working with files that aren't tracked. They just don't make sense to combine. > + } > + > hit = grep_directory(&opt, &pathspec, use_exclude, use_index); > } else if (0 <= opt_exclude) { > die(_("--[no-]exclude-standard cannot be used for tracked contents")); > diff --git a/contrib/completion/git-completion.bash b/contrib/completion/git-completion.bash > index b1d6e5ebed..cba0f9166c 100644 > --- a/contrib/completion/git-completion.bash > +++ b/contrib/completion/git-completion.bash > @@ -3207,6 +3207,8 @@ __git_main () > --namespace= > --no-replace-objects > --help > + --restrict-to-sparse-paths > + --no-restrict-to-sparse-paths > " > ;; > *) > diff --git a/git.c b/git.c > index 2e4efb4ff0..f967c75d9c 100644 > --- a/git.c > +++ b/git.c > @@ -37,6 +37,7 @@ const char git_more_info_string[] = > "See 'git help git' for an overview of the system."); > > static int use_pager = -1; > +int opt_restrict_to_sparse_paths = -1; > > static void list_builtins(struct string_list *list, unsigned int exclude_option); > > @@ -310,6 +311,10 @@ static int handle_options(const char ***argv, int *argc, int *envchanged) > } else { > exit(list_cmds(cmd)); > } > + } else if (!strcmp(cmd, "--restrict-to-sparse-paths")) { > + opt_restrict_to_sparse_paths = 1; > + } else if (!strcmp(cmd, "--no-restrict-to-sparse-paths")) { > + opt_restrict_to_sparse_paths = 0; > } else { > fprintf(stderr, _("unknown option: %s\n"), cmd); > usage(git_usage_string); > @@ -318,6 +323,7 @@ static int handle_options(const char ***argv, int *argc, 
int *envchanged) > (*argv)++; > (*argc)--; > } > + > return (*argv) - orig_argv; > } > > diff --git a/sparse-checkout.c b/sparse-checkout.c > new file mode 100644 > index 0000000000..9a9e50fd29 > --- /dev/null > +++ b/sparse-checkout.c > @@ -0,0 +1,16 @@ > +#include "cache.h" > +#include "config.h" > +#include "sparse-checkout.h" > + > +int restrict_to_sparse_paths(struct repository *repo) > +{ > + int ret; > + > + if (opt_restrict_to_sparse_paths >= 0) > + return opt_restrict_to_sparse_paths; > + > + if (repo_config_get_bool(repo, "sparse.restrictcmds", &ret)) > + ret = 1; > + > + return ret; > +} > diff --git a/sparse-checkout.h b/sparse-checkout.h > new file mode 100644 > index 0000000000..1de3b588d8 > --- /dev/null > +++ b/sparse-checkout.h > @@ -0,0 +1,11 @@ > +#ifndef SPARSE_CHECKOUT_H > +#define SPARSE_CHECKOUT_H > + > +struct repository; > + > +extern int opt_restrict_to_sparse_paths; /* from git.c */ > + > +/* Whether or not cmds should restrict behavior on sparse paths, in this repo */ > +int restrict_to_sparse_paths(struct repository *repo); > + > +#endif /* SPARSE_CHECKOUT_H */ > diff --git a/t/t7817-grep-sparse-checkout.sh b/t/t7817-grep-sparse-checkout.sh > index 3bd67082eb..8509694bf1 100755 > --- a/t/t7817-grep-sparse-checkout.sh > +++ b/t/t7817-grep-sparse-checkout.sh > @@ -63,12 +63,28 @@ test_expect_success 'setup' ' > test_path_is_file sub/B/b > ' > > +# The two tests bellow check a special case: the sparsity patterns exclude '/b' > +# and sparse checkout is enable, but the path exists on the working tree (e.g. > +# manually created after `git sparse-checkout init`). In this case, grep should > +# honor --restrict-to-sparse-paths. 
> test_expect_success 'grep in working tree should honor sparse checkout' ' > cat >expect <<-EOF && > a:text > EOF > + echo newtext >b && > git grep "text" >actual && > - test_cmp expect actual > + test_cmp expect actual && > + rm b > +' > +test_expect_success 'grep w/ --no-restrict-to-sparse-paths for sparsely excluded but present paths' ' > + cat >expect <<-EOF && > + a:text > + b:newtext > + EOF > + echo newtext >b && > + git --no-restrict-to-sparse-paths grep "text" >actual && > + test_cmp expect actual && > + rm b > ' > > test_expect_success 'grep --cached should honor sparse checkout' ' > @@ -137,4 +153,64 @@ test_expect_success 'grep --recurse-submodules <commit-ish> should honor sparse > test_cmp expect_t-commit actual_t-commit > ' > > +for cmd in 'git --no-restrict-to-sparse-paths grep' \ > + 'git -c sparse.restrictCmds=false grep' \ > + 'git -c sparse.restrictCmds=true --no-restrict-to-sparse-paths grep' > +do > + > + test_expect_success "$cmd --cached should ignore sparsity patterns" ' > + cat >expect <<-EOF && > + a:text > + b:text > + dir/c:text > + EOF > + $cmd --cached "text" >actual && > + test_cmp expect actual > + ' > + > + test_expect_success "$cmd <commit-ish> should ignore sparsity patterns" ' > + commit=$(git rev-parse HEAD) && > + cat >expect_commit <<-EOF && > + $commit:a:text > + $commit:b:text > + $commit:dir/c:text > + EOF > + cat >expect_t-commit <<-EOF && > + t-commit:a:text > + t-commit:b:text > + t-commit:dir/c:text > + EOF > + $cmd "text" $commit >actual_commit && > + test_cmp expect_commit actual_commit && > + $cmd "text" t-commit >actual_t-commit && > + test_cmp expect_t-commit actual_t-commit > + ' > +done > + > +test_expect_success 'should respect the sparse.restrictCmds values from submodules' ' > + cat >expect <<-EOF && > + a:text > + sub/A/a:text > + sub/B/b:text > + EOF > + git -C sub config sparse.restrictCmds false && > + git grep --cached --recurse-submodules "text" >actual && > + test_cmp expect actual && > + git -C sub 
config --unset sparse.restrictCmds > +' > + > +test_expect_success 'should propagate --[no]-restrict-to-sparse-paths to submodules' ' > + cat >expect <<-EOF && > + a:text > + b:text > + dir/c:text > + sub/A/a:text > + sub/B/b:text > + EOF > + git -C sub config sparse.restrictCmds true && > + git --no-restrict-to-sparse-paths grep --cached --recurse-submodules "text" >actual && > + test_cmp expect actual && > + git -C sub config --unset sparse.restrictCmds > +' > + > test_done > diff --git a/t/t9902-completion.sh b/t/t9902-completion.sh > index 3c44af6940..a4a7767e06 100755 > --- a/t/t9902-completion.sh > +++ b/t/t9902-completion.sh > @@ -1473,6 +1473,8 @@ test_expect_success 'double dash "git" itself' ' > --namespace= > --no-replace-objects Z > --help Z > + --restrict-to-sparse-paths Z > + --no-restrict-to-sparse-paths Z > EOF > ' > > @@ -1515,7 +1517,7 @@ test_expect_success 'general options' ' > test_completion "git --nam" "--namespace=" && > test_completion "git --bar" "--bare " && > test_completion "git --inf" "--info-path " && > - test_completion "git --no-r" "--no-replace-objects " > + test_completion "git --no-rep" "--no-replace-objects " > ' > > test_expect_success 'general options plus command' ' > -- > 2.26.2 ^ permalink raw reply [flat|nested] 123+ messages in thread
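For readers following the thread, the precedence that the quoted restrict_to_sparse_paths() and the tests above rely on can be sketched on its own. This is only a minimal illustration, not the code from the patch: the names `tristate` and `resolve_restrict` are made up here, and the config lookup is replaced by a plain integer argument. The assumption is that -1 means "not given", and that an explicit --[no-]restrict-to-sparse-paths on the command line beats sparse.restrictCmds, which in turn beats the default of true.

```c
#include <assert.h>

/*
 * Hypothetical stand-in for the tri-state values used in the discussion:
 * -1 means "not given", 0 means false, 1 means true.
 */
enum tristate { TRI_UNSET = -1, TRI_FALSE = 0, TRI_TRUE = 1 };

/*
 * Resolve the effective "restrict to sparse paths" setting.
 * cli_opt:    value from the command-line flag, or TRI_UNSET
 * config_val: value from the configuration, or TRI_UNSET
 * The command line wins when given; otherwise the config value is used;
 * when neither is set, default to restricting (true).
 */
int resolve_restrict(int cli_opt, int config_val)
{
	assert(cli_opt >= TRI_UNSET && cli_opt <= TRI_TRUE);
	assert(config_val >= TRI_UNSET && config_val <= TRI_TRUE);

	if (cli_opt != TRI_UNSET)
		return cli_opt;
	if (config_val != TRI_UNSET)
		return config_val;
	return TRI_TRUE;
}
```

Under these assumptions, `git -c sparse.restrictCmds=true --no-restrict-to-sparse-paths grep` corresponds to resolve_restrict(0, 1), which yields 0, matching the third command exercised by the test loop above.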
* [PATCH v3 0/5] grep: honor sparse checkout and add option to ignore it 2020-05-10 0:41 ` [RFC PATCH v2 0/4] grep: honor sparse checkout and add option to ignore it Matheus Tavares ` (3 preceding siblings ...) 2020-05-10 0:41 ` [RFC PATCH v2 4/4] config: add setting to ignore sparsity patterns in some cmds Matheus Tavares @ 2020-05-28 1:12 ` Matheus Tavares 2020-05-28 1:12 ` [PATCH v3 1/5] doc: grep: unify info on configuration variables Matheus Tavares ` (5 more replies) 4 siblings, 6 replies; 123+ messages in thread From: Matheus Tavares @ 2020-05-28 1:12 UTC (permalink / raw) To: git; +Cc: gitster, stolee, newren, jonathantanmy This series is based on the discussions in [1]. The idea is to make git-grep (and other commands, in the future) be able to restrict their output to the sparsity patterns, when requested by the user. [1]: https://lore.kernel.org/git/CAHd-oW7e5qCuxZLBeVDq+Th3E+E4+P8=WzJfK8WcG2yz=n_nag@mail.gmail.com/t/#u Note on tests: In the previous iteration, the setup test in t7817 (patch 4), used the following command to set sparse-checkout up: git sparse-checkout set "/*" "!b" "!/*/" "/sub" In this iteration, though, I had to change "/sub" to "sub" (which is not the same, but should produce the same results in t7817). Using the previous format, the test failed on Windows because git grep --recurse-submodules did not recurse into "/sub" (a submodule). I used [2] to investigate and noticed that sub indeed had the SKIP_WORKTREE bit set on the repo created by the test on Windows. And the sparse-checkout file contained the following: /* !b !/*/ C:/Users/<path_to_the_Git_for_Windows_SDK_installation>/sub I wasn't expecting the conversion from "/sub" to the path above. But I'm not very familiar with the Git for Windows SDK, so is this conversion expected? Furthermore, `pwd` would output: /usr/src/git/t/trash directory.t7817-grep-sparse-checkout So I think that would explain why the converted path for the "/sub" rule didn't match sub. 
Could this be a bug in `git sparse-checkout set`? Or am I missing something? [2]: https://github.com/sbp/gin Main changes since v2: Added patch 2. Patch 3: - Fix reading of extensions.worktreeConfig value in do_git_config_sequence(), to get the one from the given git_dir, not the_repository. - Add --submodule option to test-config helper and regression test for the fixes in do_git_config_sequence(). Patch 4: - Reword commit message to remove snippet about --untracked and --no-index respecting the sparsity patterns. - Don't grep submodules that are excluded by the sparsity patterns. - Add tests to ensure that submodules (and other paths) that are excluded by the sparsity patterns, but present in the working tree, are not grepped. - Some minor variable renames in tests, for better readability. Patch 5: - Mention in sparse config docs that --[no-]restrict-to-sparse-paths won't affect writing commands. - die() in grep when --[no-]restrict-to-sparse-paths is used with --no-index or --untracked, and add test for this behavior. - Use test_when_finished and test_config, when possible, to avoid breaking the next test cases on a test error. - Adjust the behavior of --[no-]restrict-to-sparse-paths to follow the ideas proposed by Elijah in [3] and [4]. Also add more tests for the different cases where this option is relevant and improve docs at Documentation/config/sparse.txt. 
[3]: https://lore.kernel.org/git/CABPp-BE6M9ATDYuQh8f_r3S00dM2Cv9vM3T5j5W_odbVzhC-5A@mail.gmail.com/ [4]: https://lore.kernel.org/git/CABPp-BGEPU49yRN2FRtwhYn6Uh+scGKEFYP4G2GH6=uBTN1SCw@mail.gmail.com/ CI: https://github.com/matheustavares/git/actions/runs/117388742 Matheus Tavares (5): doc: grep: unify info on configuration variables t/helper/test-config: return exit codes consistently config: correctly read worktree configs in submodules grep: honor sparse checkout patterns config: add setting to ignore sparsity patterns in some cmds Documentation/config.txt | 2 + Documentation/config/grep.txt | 10 +- Documentation/config/sparse.txt | 24 ++ Documentation/git-grep.txt | 37 +-- Documentation/git.txt | 4 + Makefile | 1 + builtin/grep.c | 134 ++++++++++- config.c | 21 +- contrib/completion/git-completion.bash | 2 + git.c | 6 + sparse-checkout.c | 16 ++ sparse-checkout.h | 11 + t/helper/test-config.c | 183 +++++++++------ t/t2404-worktree-config.sh | 16 ++ t/t7011-skip-worktree-reading.sh | 9 - t/t7817-grep-sparse-checkout.sh | 300 +++++++++++++++++++++++++ t/t9902-completion.sh | 4 +- 17 files changed, 663 insertions(+), 117 deletions(-) create mode 100644 Documentation/config/sparse.txt create mode 100644 sparse-checkout.c create mode 100644 sparse-checkout.h create mode 100755 t/t7817-grep-sparse-checkout.sh Range-diff against v2: 1: c344d22313 = 1: 63c195d737 doc: grep: unify info on configuration variables 2: 882310b69f < -: ---------- config: load the correct config.worktree file -: ---------- > 2: 43402007ad t/helper/test-config: return exit codes consistently -: ---------- > 3: 448e0efffd config: correctly read worktree configs in submodules 3: e00674c727 ! 4: 5ddac81818 grep: honor sparse checkout patterns @@ Commit message git-grep currently ignores the sparsity patterns and report all matches found outside this subset, which kind of goes in the opposite direction. 
Let's fix that, making it honor the sparsity boundaries for every - grepping case: + grepping case where this is relevant: - git grep in worktree - git grep --cached - git grep $REVISION - - git grep --untracked and git grep --no-index (which already respect - sparse checkout boundaries) - This is also what some users reported[1] they would want as the default - behavior. + For the worktree case, we will not grep paths that have the + SKIP_WORKTREE bit set, even if they are present for some reason (e.g. + manually created after `git sparse-checkout init`). But the next patch + will add an option to do so. (See 'Note' below.) - Note: for `git grep $REVISION`, we will choose to honor the sparsity - patterns only when $REVISION is a commit-ish object. The reason is that, - for a tree, we don't know whether it represents the root of a - repository or a subtree. So we wouldn't be able to correctly match it - against the sparsity patterns. E.g. suppose we have a repository with - these two sparsity rules: "/*" and "!/a"; and the following structure: + For `git grep $REVISION`, we will choose to honor the sparsity patterns + only when $REVISION is a commit-ish object. The reason is that, for a + tree, we don't know whether it represents the root of a repository or a + subtree. So we wouldn't be able to correctly match it against the + sparsity patterns. E.g. suppose we have a repository with these two + sparsity rules: "/*" and "!/a"; and the following structure: / | - a (file) @@ Commit message therefore it would wrongly match the pattern "!/a". Furthermore, for a search in a blob object, we wouldn't even have a path to check the patterns against. So, let's ignore the sparsity patterns when grepping - non-commit-ish objects (tags to commits should be fine). + non-commit-ish objects. - Finally, the old behavior may still be desirable for some use cases. So - the next patch will add an option to allow restoring it when needed. 
+ Note: The behavior introduced in this patch is what some users have + reported[1] that they would like by default. But the old behavior is + still desirable for some use cases. Therefore, the next patch will add + an option to allow restoring it when needed. [1]: https://lore.kernel.org/git/CABPp-BGuFhDwWZBRaD3nA8ui46wor-4=Ha1G1oApsfF8KNpfGQ@mail.gmail.com/ @@ builtin/grep.c: static int grep_cache(struct grep_opt *opt, for (nr = 0; nr < repo->index->cache_nr; nr++) { const struct cache_entry *ce = repo->index->cache[nr]; + -+ if (ce_skip_worktree(ce) && !S_ISGITLINK(ce->ce_mode)) ++ if (ce_skip_worktree(ce)) + continue; + strbuf_setlen(&name, name_base_len); @@ builtin/grep.c: static int grep_cache(struct grep_opt *opt, + enum pattern_match_result *match) +{ + int dtype = DT_UNKNOWN; -+ -+ if (S_ISGITLINK(entry_mode)) -+ return 1; ++ int is_dir = S_ISDIR(entry_mode); + + if (parent_match == MATCHED_RECURSIVE) { + *match = parent_match; + return 1; + } + -+ if (S_ISDIR(entry_mode) && !is_dir_sep(path->buf[path->len - 1])) ++ if (is_dir && !is_dir_sep(path->buf[path->len - 1])) + strbuf_addch(path, '/'); + + *match = path_matches_pattern_list(path->buf, path->len, @@ builtin/grep.c: static int grep_cache(struct grep_opt *opt, + if (*match == UNDECIDED) + *match = parent_match; + -+ if (S_ISDIR(entry_mode)) ++ if (is_dir) + strbuf_trim_trailing_dir_sep(path); + -+ if (*match == NOT_MATCHED && (S_ISREG(entry_mode) || -+ (S_ISDIR(entry_mode) && sparsity->use_cone_patterns))) -+ return 0; ++ if (*match == NOT_MATCHED && ++ (!is_dir || (is_dir && sparsity->use_cone_patterns))) ++ return 0; + + return 1; +} @@ t/t7817-grep-sparse-checkout.sh (new) +|-- b +|-- dir +| `-- c -+`-- sub -+ |-- A -+ | `-- a -+ `-- B -+ `-- b -+ -+Where . has non-cone mode sparsity patterns and sub is a submodule with cone -+mode sparsity patterns. 
The resulting sparse-checkout should leave the following -+structure: ++|-- sub ++| |-- A ++| | `-- a ++| `-- B ++| `-- b ++`-- sub2 ++ `-- a ++ ++Where . has non-cone mode sparsity patterns, sub is a submodule with cone mode ++sparsity patterns and sub2 is a submodule that is excluded by the superproject ++sparsity patterns. The resulting sparse checkout should leave the following ++structure on the working tree: + +. +|-- a -+`-- sub -+ `-- B -+ `-- b ++|-- sub ++| `-- B ++| `-- b ++`-- sub2 ++ `-- a ++ ++But note that sub2 should have the SKIP_WORKTREE bit set. +' + +. ./test-lib.sh @@ t/t7817-grep-sparse-checkout.sh (new) + git sparse-checkout set B + ) && + ++ git init sub2 && ++ ( ++ cd sub2 && ++ echo "text" >a && ++ git add a && ++ git commit -m sub2 ++ ) && ++ + git submodule add ./sub && ++ git submodule add ./sub2 && + git add a b dir && + git commit -m super && + git sparse-checkout init --no-cone && -+ git sparse-checkout set "/*" "!b" "!/*/" && ++ git sparse-checkout set "/*" "!b" "!/*/" "sub" && + -+ git tag -am t-commit t-commit HEAD && ++ git tag -am tag-to-commit tag-to-commit HEAD && + tree=$(git rev-parse HEAD^{tree}) && -+ git tag -am t-tree t-tree $tree && ++ git tag -am tag-to-tree tag-to-tree $tree && + + test_path_is_missing b && + test_path_is_missing dir && + test_path_is_missing sub/A && + test_path_is_file a && -+ test_path_is_file sub/B/b ++ test_path_is_file sub/B/b && ++ test_path_is_file sub2/a +' + ++# The test bellow checks a special case: the sparsity patterns exclude '/b' ++# and sparse checkout is enable, but the path exists on the working tree (e.g. ++# manually created after `git sparse-checkout init`). In this case, grep should ++# skip it. 
+test_expect_success 'grep in working tree should honor sparse checkout' ' + cat >expect <<-EOF && + a:text + EOF ++ echo "new-text" >b && ++ test_when_finished "rm b" && + git grep "text" >actual && + test_cmp expect actual +' @@ t/t7817-grep-sparse-checkout.sh (new) + cat >expect_commit <<-EOF && + $commit:a:text + EOF -+ cat >expect_t-commit <<-EOF && -+ t-commit:a:text ++ cat >expect_tag-to-commit <<-EOF && ++ tag-to-commit:a:text + EOF + git grep "text" $commit >actual_commit && + test_cmp expect_commit actual_commit && -+ git grep "text" t-commit >actual_t-commit && -+ test_cmp expect_t-commit actual_t-commit ++ git grep "text" tag-to-commit >actual_tag-to-commit && ++ test_cmp expect_tag-to-commit actual_tag-to-commit +' + +test_expect_success 'grep <tree-ish> should ignore sparsity patterns' ' @@ t/t7817-grep-sparse-checkout.sh (new) + $tree:b:text + $tree:dir/c:text + EOF -+ cat >expect_t-tree <<-EOF && -+ t-tree:a:text -+ t-tree:b:text -+ t-tree:dir/c:text ++ cat >expect_tag-to-tree <<-EOF && ++ tag-to-tree:a:text ++ tag-to-tree:b:text ++ tag-to-tree:dir/c:text + EOF + git grep "text" $tree >actual_tree && + test_cmp expect_tree actual_tree && -+ git grep "text" t-tree >actual_t-tree && -+ test_cmp expect_t-tree actual_t-tree ++ git grep "text" tag-to-tree >actual_tag-to-tree && ++ test_cmp expect_tag-to-tree actual_tag-to-tree ++' ++ ++# Note that sub2/ is present in the worktree but it is excluded by the sparsity ++# patterns, so grep should not recurse into it. 
++test_expect_success 'grep --recurse-submodules should honor sparse checkout in submodule' ' ++ cat >expect <<-EOF && ++ a:text ++ sub/B/b:text ++ EOF ++ git grep --recurse-submodules "text" >actual && ++ test_cmp expect actual +' + +test_expect_success 'grep --recurse-submodules --cached should honor sparse checkout in submodule' ' @@ t/t7817-grep-sparse-checkout.sh (new) + $commit:a:text + $commit:sub/B/b:text + EOF -+ cat >expect_t-commit <<-EOF && -+ t-commit:a:text -+ t-commit:sub/B/b:text ++ cat >expect_tag-to-commit <<-EOF && ++ tag-to-commit:a:text ++ tag-to-commit:sub/B/b:text + EOF + git grep --recurse-submodules "text" $commit >actual_commit && + test_cmp expect_commit actual_commit && -+ git grep --recurse-submodules "text" t-commit >actual_t-commit && -+ test_cmp expect_t-commit actual_t-commit ++ git grep --recurse-submodules "text" tag-to-commit >actual_tag-to-commit && ++ test_cmp expect_tag-to-commit actual_tag-to-commit +' + +test_done 4: 3e9e906249 ! 5: 748b1e955c config: add setting to ignore sparsity patterns in some cmds @@ Commit message subset of files in which they are interested; and allow some commands to possibly perform better, by not considering uninteresting paths. For this reason, we taught grep to honor the sparsity patterns, in the - previous commit. But, on the other hand, allowing grep and the other + previous patch. But, on the other hand, allowing grep and the other commands mentioned to optionally ignore the patterns also make for some interesting use cases. E.g. using grep to search for a function - definition that resides outside the sparse checkout. + documentation that resides outside the sparse checkout. In any case, there is no current way for users to configure the behavior they want for these commands. 
Aiming to provide this flexibility, let's @@ Documentation/config/sparse.txt (new) ++ +When this option is true (default), some git commands may limit their behavior +to the paths specified by the sparsity patterns, or to the intersection of -+those paths and any (like `*.c) that the user might also specify on the command -+line. When false, the affected commands will work on full trees, ignoring the -+sparsity patterns. For now, only git-grep honors this setting. In this command, -+the restriction becomes relevant in one of these three cases: with --cached; -+when a commit-ish is given; when searching a working tree that contains paths -+previously excluded by the sparsity patterns. ++those paths and any (like `*.c`) that the user might also specify on the ++command line. When false, the affected commands will work on full trees, ++ignoring the sparsity patterns. For now, only git-grep honors this setting. In ++this command, the restriction takes effect in three cases: with --cached; when ++a commit-ish is given; when searching a working tree where some paths excluded ++by the sparsity patterns are present (e.g. manually created paths or not ++removed submodules). ++ +Note: commands which export, integrity check, or create history will always +operate on full trees (e.g. fast-export, format-patch, fsck, commit, etc.), -+unaffected by any sparsity patterns. ++unaffected by any sparsity patterns. Also, writting commands such as ++sparse-checkout and read-tree will not be affected by this configuration. ## Documentation/git-grep.txt ## @@ Documentation/git-grep.txt: characters. An empty string as search expression matches all lines. 
@@ builtin/grep.c: static int grep_cache(struct grep_opt *opt, for (nr = 0; nr < repo->index->cache_nr; nr++) { const struct cache_entry *ce = repo->index->cache[nr]; -- if (ce_skip_worktree(ce) && !S_ISGITLINK(ce->ce_mode)) -+ if (sparse_paths_only && ce_skip_worktree(ce) && -+ !S_ISGITLINK(ce->ce_mode)) +- if (ce_skip_worktree(ce)) ++ if (sparse_paths_only && ce_skip_worktree(ce)) continue; strbuf_setlen(&name, name_base_len); @@ builtin/grep.c: int cmd_grep(int argc, const char **argv, const char *prefix) int use_exclude = (opt_exclude < 0) ? use_index : !!opt_exclude; + + if (opt_restrict_to_sparse_paths >= 0) { -+ warning(_("--[no-]restrict-to-sparse-paths is ignored" -+ " with --no-index or --untracked")); ++ die(_("--[no-]restrict-to-sparse-paths is incompatible" ++ " with --no-index and --untracked")); + } + hit = grep_directory(&opt, &pathspec, use_exclude, use_index); @@ sparse-checkout.h (new) ## t/t7817-grep-sparse-checkout.sh ## @@ t/t7817-grep-sparse-checkout.sh: test_expect_success 'setup' ' - test_path_is_file sub/B/b + test_path_is_file sub2/a ' +-# The test bellow checks a special case: the sparsity patterns exclude '/b' +# The two tests bellow check a special case: the sparsity patterns exclude '/b' -+# and sparse checkout is enable, but the path exists on the working tree (e.g. -+# manually created after `git sparse-checkout init`). In this case, grep should -+# honor --restrict-to-sparse-paths. + # and sparse checkout is enable, but the path exists on the working tree (e.g. + # manually created after `git sparse-checkout init`). In this case, grep should +-# skip it. ++# skip the file by default, but not with --no-restrict-to-sparse-paths. 
test_expect_success 'grep in working tree should honor sparse checkout' ' cat >expect <<-EOF && a:text - EOF -+ echo newtext >b && +@@ t/t7817-grep-sparse-checkout.sh: test_expect_success 'grep in working tree should honor sparse checkout' ' git grep "text" >actual && -- test_cmp expect actual -+ test_cmp expect actual && -+ rm b -+' + test_cmp expect actual + ' +test_expect_success 'grep w/ --no-restrict-to-sparse-paths for sparsely excluded but present paths' ' + cat >expect <<-EOF && + a:text -+ b:newtext ++ b:new-text + EOF -+ echo newtext >b && ++ echo "new-text" >b && ++ test_when_finished "rm b" && + git --no-restrict-to-sparse-paths grep "text" >actual && -+ test_cmp expect actual && -+ rm b - ' ++ test_cmp expect actual ++' test_expect_success 'grep --cached should honor sparse checkout' ' + cat >expect <<-EOF && +@@ t/t7817-grep-sparse-checkout.sh: test_expect_success 'grep <tree-ish> should ignore sparsity patterns' ' + ' + + # Note that sub2/ is present in the worktree but it is excluded by the sparsity +-# patterns, so grep should not recurse into it. ++# patterns, so grep should only recurse into it with --no-restrict-to-sparse-paths. 
+ test_expect_success 'grep --recurse-submodules should honor sparse checkout in submodule' ' + cat >expect <<-EOF && + a:text +@@ t/t7817-grep-sparse-checkout.sh: test_expect_success 'grep --recurse-submodules should honor sparse checkout in s + git grep --recurse-submodules "text" >actual && + test_cmp expect actual + ' ++test_expect_success 'grep --recurse-submodules should search in excluded submodules w/ --no-restrict-to-sparse-paths' ' ++ cat >expect <<-EOF && ++ a:text ++ sub/B/b:text ++ sub2/a:text ++ EOF ++ git --no-restrict-to-sparse-paths grep --recurse-submodules "text" >actual && ++ test_cmp expect actual ++' + + test_expect_success 'grep --recurse-submodules --cached should honor sparse checkout in submodule' ' + cat >expect <<-EOF && @@ t/t7817-grep-sparse-checkout.sh: test_expect_success 'grep --recurse-submodules <commit-ish> should honor sparse - test_cmp expect_t-commit actual_t-commit + test_cmp expect_tag-to-commit actual_tag-to-commit ' +for cmd in 'git --no-restrict-to-sparse-paths grep' \ @@ t/t7817-grep-sparse-checkout.sh: test_expect_success 'grep --recurse-submodules + $commit:b:text + $commit:dir/c:text + EOF -+ cat >expect_t-commit <<-EOF && -+ t-commit:a:text -+ t-commit:b:text -+ t-commit:dir/c:text ++ cat >expect_tag-to-commit <<-EOF && ++ tag-to-commit:a:text ++ tag-to-commit:b:text ++ tag-to-commit:dir/c:text + EOF + $cmd "text" $commit >actual_commit && + test_cmp expect_commit actual_commit && -+ $cmd "text" t-commit >actual_t-commit && -+ test_cmp expect_t-commit actual_t-commit ++ $cmd "text" tag-to-commit >actual_tag-to-commit && ++ test_cmp expect_tag-to-commit actual_tag-to-commit + ' +done + ++test_expect_success 'grep --recurse-submodules --cached \w --no-restrict-to-sparse-paths' ' ++ cat >expect <<-EOF && ++ a:text ++ b:text ++ dir/c:text ++ sub/A/a:text ++ sub/B/b:text ++ sub2/a:text ++ EOF ++ git --no-restrict-to-sparse-paths grep --recurse-submodules --cached \ ++ "text" >actual && ++ test_cmp expect actual ++' ++ 
++test_expect_success 'grep --recurse-submodules <commit-ish> \w --no-restrict-to-sparse-paths' ' ++ commit=$(git rev-parse HEAD) && ++ cat >expect_commit <<-EOF && ++ $commit:a:text ++ $commit:b:text ++ $commit:dir/c:text ++ $commit:sub/A/a:text ++ $commit:sub/B/b:text ++ $commit:sub2/a:text ++ EOF ++ cat >expect_tag-to-commit <<-EOF && ++ tag-to-commit:a:text ++ tag-to-commit:b:text ++ tag-to-commit:dir/c:text ++ tag-to-commit:sub/A/a:text ++ tag-to-commit:sub/B/b:text ++ tag-to-commit:sub2/a:text ++ EOF ++ git --no-restrict-to-sparse-paths grep --recurse-submodules "text" \ ++ $commit >actual_commit && ++ test_cmp expect_commit actual_commit && ++ git --no-restrict-to-sparse-paths grep --recurse-submodules "text" \ ++ tag-to-commit >actual_tag-to-commit && ++ test_cmp expect_tag-to-commit actual_tag-to-commit ++' ++ +test_expect_success 'should respect the sparse.restrictCmds values from submodules' ' + cat >expect <<-EOF && + a:text + sub/A/a:text + sub/B/b:text + EOF -+ git -C sub config sparse.restrictCmds false && ++ test_config -C sub sparse.restrictCmds false && + git grep --cached --recurse-submodules "text" >actual && -+ test_cmp expect actual && -+ git -C sub config --unset sparse.restrictCmds ++ test_cmp expect actual +' + +test_expect_success 'should propagate --[no]-restrict-to-sparse-paths to submodules' ' @@ t/t7817-grep-sparse-checkout.sh: test_expect_success 'grep --recurse-submodules + dir/c:text + sub/A/a:text + sub/B/b:text ++ sub2/a:text + EOF -+ git -C sub config sparse.restrictCmds true && ++ test_config -C sub sparse.restrictCmds true && + git --no-restrict-to-sparse-paths grep --cached --recurse-submodules "text" >actual && -+ test_cmp expect actual && -+ git -C sub config --unset sparse.restrictCmds ++ test_cmp expect actual +' ++ ++for opt in '--untracked' '--no-index' ++do ++ test_expect_success "--[no]-restrict-to-sparse-paths and $opt are incompatible" " ++ test_must_fail git --restrict-to-sparse-paths grep $opt . 
2>actual && ++ test_i18ngrep 'restrict-to-sparse-paths is incompatible with' actual ++ " ++done + test_done -- 2.26.2 ^ permalink raw reply [flat|nested] 123+ messages in thread
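The per-entry decision shown in the range-diff for patch 4 (inherit the parent's result when a path is undecided, skip unmatched files, and skip unmatched directories only in cone mode) may be easier to follow in isolation. The following is a simplified sketch, not the function from builtin/grep.c: `should_visit` is a hypothetical name, the enum is a local copy whose numeric values are an assumption, and the pattern match result is passed in by the caller instead of being computed with path_matches_pattern_list().

```c
#include <assert.h>
#include <stddef.h>

/* Local, hypothetical copy of the match results named in the range-diff. */
enum match_result {
	UNDECIDED = -1,
	NOT_MATCHED = 0,
	MATCHED = 1,
	MATCHED_RECURSIVE = 2
};

/*
 * Decide whether a tree entry should be visited.
 * parent_match: result already computed for the parent directory
 * own_match:    result of matching this entry's path (supplied by the caller)
 * is_dir:       non-zero when the entry is a directory
 * cone_mode:    non-zero when cone-mode patterns are in use
 * out:          receives the result to hand down to the entry's children
 * Returns 1 to visit the entry, 0 to skip it.
 */
int should_visit(enum match_result parent_match, enum match_result own_match,
		 int is_dir, int cone_mode, enum match_result *out)
{
	assert(out != NULL);

	/* Everything below a recursively matched directory is included. */
	if (parent_match == MATCHED_RECURSIVE) {
		*out = parent_match;
		return 1;
	}

	/* An undecided path inherits whatever its parent got. */
	*out = own_match;
	if (*out == UNDECIDED)
		*out = parent_match;

	/*
	 * Unmatched files are always skipped. Unmatched directories are only
	 * skipped in cone mode; otherwise a deeper pattern could still match.
	 */
	if (*out == NOT_MATCHED && (!is_dir || cone_mode))
		return 0;

	return 1;
}
```

Note that the condition `(!is_dir || (is_dir && cone))` in the range-diff reduces to `(!is_dir || cone)`, which is the form used in this sketch.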
* [PATCH v3 1/5] doc: grep: unify info on configuration variables 2020-05-28 1:12 ` [PATCH v3 0/5] grep: honor sparse checkout and add option to ignore it Matheus Tavares @ 2020-05-28 1:12 ` Matheus Tavares 2020-05-28 1:13 ` [PATCH v3 2/5] t/helper/test-config: return exit codes consistently Matheus Tavares ` (4 subsequent siblings) 5 siblings, 0 replies; 123+ messages in thread From: Matheus Tavares @ 2020-05-28 1:12 UTC (permalink / raw) To: git; +Cc: gitster, stolee, newren, jonathantanmy Explanations about the configuration variables for git-grep are duplicated in "Documentation/git-grep.txt" and "Documentation/config/grep.txt", which can make maintenance difficult. The first also contains a definition not present in the latter (grep.fullName). To avoid problems like this, let's unify the information in the second file and include it in the first. Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br> --- Documentation/config/grep.txt | 10 ++++++++-- Documentation/git-grep.txt | 36 ++++++----------------------------- 2 files changed, 14 insertions(+), 32 deletions(-) diff --git a/Documentation/config/grep.txt b/Documentation/config/grep.txt index 44abe45a7c..dd51db38e1 100644 --- a/Documentation/config/grep.txt +++ b/Documentation/config/grep.txt @@ -16,8 +16,14 @@ grep.extendedRegexp:: other than 'default'. grep.threads:: - Number of grep worker threads to use. - See `grep.threads` in linkgit:git-grep[1] for more information. + Number of grep worker threads to use. See `--threads` +ifndef::git-grep[] + in linkgit:git-grep[1] +endif::git-grep[] + for more information. + +grep.fullName:: + If set to true, enable `--full-name` option by default. grep.fallbackToNoIndex:: If set to true, fall back to git grep --no-index if git grep diff --git a/Documentation/git-grep.txt b/Documentation/git-grep.txt index a7f9bc99ea..9bdf807584 100644 --- a/Documentation/git-grep.txt +++ b/Documentation/git-grep.txt @@ -41,34 +41,8 @@ characters. 
An empty string as search expression matches all lines. CONFIGURATION ------------- -grep.lineNumber:: - If set to true, enable `-n` option by default. - -grep.column:: - If set to true, enable the `--column` option by default. - -grep.patternType:: - Set the default matching behavior. Using a value of 'basic', 'extended', - 'fixed', or 'perl' will enable the `--basic-regexp`, `--extended-regexp`, - `--fixed-strings`, or `--perl-regexp` option accordingly, while the - value 'default' will return to the default matching behavior. - -grep.extendedRegexp:: - If set to true, enable `--extended-regexp` option by default. This - option is ignored when the `grep.patternType` option is set to a value - other than 'default'. - -grep.threads:: - Number of grep worker threads to use. If unset (or set to 0), Git will - use as many threads as the number of logical cores available. - -grep.fullName:: - If set to true, enable `--full-name` option by default. - -grep.fallbackToNoIndex:: - If set to true, fall back to git grep --no-index if git grep - is executed outside of a git repository. Defaults to false. - +:git-grep: 1 +include::config/grep.txt[] OPTIONS ------- @@ -269,8 +243,10 @@ providing this option will cause it to die. found. --threads <num>:: - Number of grep worker threads to use. - See `grep.threads` in 'CONFIGURATION' for more information. + Number of grep worker threads to use. If not provided (or set to + 0), Git will use as many worker threads as the number of logical + cores available. The default value can also be set with the + `grep.threads` configuration. -f <file>:: Read patterns from <file>, one per line. -- 2.26.2 ^ permalink raw reply related [flat|nested] 123+ messages in thread
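As an aside on the reworded `--threads` text in the patch above, the documented rule ("if not provided, or set to 0, use as many worker threads as logical cores") could be sketched as below. This is an illustration only and makes no claim about how Git itself counts cores: `effective_threads` is a hypothetical helper, and it assumes a Unix-like system where sysconf() accepts `_SC_NPROCESSORS_ONLN`.

```c
#include <assert.h>
#include <unistd.h>

/*
 * Hypothetical helper: pick the number of worker threads.
 * requested > 0  -> use exactly that many
 * requested == 0 -> fall back to the number of online logical cores
 * Assumes sysconf(_SC_NPROCESSORS_ONLN) is available (Linux and most
 * Unix-like systems); falls back to 1 if the query fails.
 */
int effective_threads(int requested)
{
	long cores;

	assert(requested >= 0);

	if (requested > 0)
		return requested;

	cores = sysconf(_SC_NPROCESSORS_ONLN);
	return cores > 0 ? (int)cores : 1;
}
```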
* [PATCH v3 2/5] t/helper/test-config: return exit codes consistently 2020-05-28 1:12 ` [PATCH v3 0/5] grep: honor sparse checkout and add option to ignore it Matheus Tavares 2020-05-28 1:12 ` [PATCH v3 1/5] doc: grep: unify info on configuration variables Matheus Tavares @ 2020-05-28 1:13 ` Matheus Tavares 2020-05-30 14:29 ` Elijah Newren 2020-05-28 1:13 ` [PATCH v3 3/5] config: correctly read worktree configs in submodules Matheus Tavares ` (3 subsequent siblings) 5 siblings, 1 reply; 123+ messages in thread From: Matheus Tavares @ 2020-05-28 1:13 UTC (permalink / raw) To: git; +Cc: gitster, stolee, newren, jonathantanmy The test-config helper may exit with a variety of at least four different codes, to reflect the status of the requested operations. These codes are sometimes checked in the tests, but not all of the codes are returned consistently by the helper: 1 will usually refer to a "value not found", but usage errors can also return 1 or 128. The latter is also expected on errors within the configset functions. These inconsistent uses of the exit codes can lead to false positives in the tests. Although all tests that currently check the helper's exit code, on errors, do also check the output, it's still better to standardize the exit codes and avoid future problems in new tests. While we are here, let's also check that we have the expected argc for configset_get_value and configset_get_value_multi, before trying to use argv. Note: this change is implemented with the unification of the exit labels. This might seem unnecessary, for now, but it will benefit the next patch, which will increase the cleanup section. 
Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br> --- t/helper/test-config.c | 76 ++++++++++++++++++++++-------------------- 1 file changed, 40 insertions(+), 36 deletions(-) diff --git a/t/helper/test-config.c b/t/helper/test-config.c index 234c722b48..1c8e965840 100644 --- a/t/helper/test-config.c +++ b/t/helper/test-config.c @@ -30,6 +30,14 @@ * iterate -> iterate over all values using git_config(), and print some * data for each * + * Exit codes: + * 0: success + * 1: value not found for the given config key + * 2: config file path given as argument is inaccessible or doesn't exist + * 129: test-config usage error + * + * Note: tests may also expect 128 for die() calls in the config machinery. + * * Examples: * * To print the value with highest priority for key "foo.bAr Baz.rock": @@ -64,35 +72,42 @@ static int early_config_cb(const char *var, const char *value, void *vdata) return 0; } +enum test_config_exit_code { + TC_SUCCESS = 0, + TC_VALUE_NOT_FOUND = 1, + TC_CONFIG_FILE_ERROR = 2, + TC_USAGE_ERROR = 129, +}; + int cmd__config(int argc, const char **argv) { int i, val; const char *v; const struct string_list *strptr; struct config_set cs; + enum test_config_exit_code ret = TC_SUCCESS; if (argc == 3 && !strcmp(argv[1], "read_early_config")) { read_early_config(early_config_cb, (void *)argv[2]); - return 0; + return TC_SUCCESS; } setup_git_directory(); git_configset_init(&cs); - if (argc < 2) { - fprintf(stderr, "Please, provide a command name on the command-line\n"); - goto exit1; - } else if (argc == 3 && !strcmp(argv[1], "get_value")) { + if (argc < 2) + goto print_usage_error; + + if (argc == 3 && !strcmp(argv[1], "get_value")) { if (!git_config_get_value(argv[2], &v)) { if (!v) printf("(NULL)\n"); else printf("%s\n", v); - goto exit0; } else { printf("Value not found for \"%s\"\n", argv[2]); - goto exit1; + ret = TC_VALUE_NOT_FOUND; } } else if (argc == 3 && !strcmp(argv[1], "get_value_multi")) { strptr = git_config_get_value_multi(argv[2]); @@ 
-104,41 +119,38 @@ int cmd__config(int argc, const char **argv) else printf("%s\n", v); } - goto exit0; } else { printf("Value not found for \"%s\"\n", argv[2]); - goto exit1; + ret = TC_VALUE_NOT_FOUND; } } else if (argc == 3 && !strcmp(argv[1], "get_int")) { if (!git_config_get_int(argv[2], &val)) { printf("%d\n", val); - goto exit0; } else { printf("Value not found for \"%s\"\n", argv[2]); - goto exit1; + ret = TC_VALUE_NOT_FOUND; } } else if (argc == 3 && !strcmp(argv[1], "get_bool")) { if (!git_config_get_bool(argv[2], &val)) { printf("%d\n", val); - goto exit0; } else { printf("Value not found for \"%s\"\n", argv[2]); - goto exit1; + ret = TC_VALUE_NOT_FOUND; } } else if (argc == 3 && !strcmp(argv[1], "get_string")) { if (!git_config_get_string_const(argv[2], &v)) { printf("%s\n", v); - goto exit0; } else { printf("Value not found for \"%s\"\n", argv[2]); - goto exit1; + ret = TC_VALUE_NOT_FOUND; } - } else if (!strcmp(argv[1], "configset_get_value")) { + } else if (argc >= 3 && !strcmp(argv[1], "configset_get_value")) { for (i = 3; i < argc; i++) { int err; if ((err = git_configset_add_file(&cs, argv[i]))) { fprintf(stderr, "Error (%d) reading configuration file %s.\n", err, argv[i]); - goto exit2; + ret = TC_CONFIG_FILE_ERROR; + goto out; } } if (!git_configset_get_value(&cs, argv[2], &v)) { @@ -146,17 +158,17 @@ int cmd__config(int argc, const char **argv) printf("(NULL)\n"); else printf("%s\n", v); - goto exit0; } else { printf("Value not found for \"%s\"\n", argv[2]); - goto exit1; + ret = TC_VALUE_NOT_FOUND; } - } else if (!strcmp(argv[1], "configset_get_value_multi")) { + } else if (argc >= 3 && !strcmp(argv[1], "configset_get_value_multi")) { for (i = 3; i < argc; i++) { int err; if ((err = git_configset_add_file(&cs, argv[i]))) { fprintf(stderr, "Error (%d) reading configuration file %s.\n", err, argv[i]); - goto exit2; + ret = TC_CONFIG_FILE_ERROR; + goto out; } } strptr = git_configset_get_value_multi(&cs, argv[2]); @@ -168,27 +180,19 @@ int 
cmd__config(int argc, const char **argv) else printf("%s\n", v); } - goto exit0; } else { printf("Value not found for \"%s\"\n", argv[2]); - goto exit1; + ret = TC_VALUE_NOT_FOUND; } } else if (!strcmp(argv[1], "iterate")) { git_config(iterate_cb, NULL); - goto exit0; + } else { +print_usage_error: + fprintf(stderr, "Invalid syntax. Usage: test-tool config <cmd> [args]\n"); + ret = TC_USAGE_ERROR; } - die("%s: Please check the syntax and the function name", argv[0]); - -exit0: - git_configset_clear(&cs); - return 0; - -exit1: - git_configset_clear(&cs); - return 1; - -exit2: +out: git_configset_clear(&cs); - return 2; + return ret; } -- 2.26.2 ^ permalink raw reply related [flat|nested] 123+ messages in thread
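Editorial illustration (not part of the patch): the commit message above argues for one exit code per failure class and a single cleanup path. A minimal sketch of that pattern follows. Only the enum values come from the patch; lookup(), run_cmd() and the "core.editor" key are hypothetical stand-ins, and the calloc()'d buffer merely plays the role of the config_set that must be released on every path.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Exit codes as documented in the patch's header comment. */
enum tc_exit_code {
	TC_SUCCESS = 0,
	TC_VALUE_NOT_FOUND = 1,
	TC_CONFIG_FILE_ERROR = 2,
	TC_USAGE_ERROR = 129,
};

/* Hypothetical lookup: the only known key is "core.editor". */
static const char *lookup(const char *key)
{
	return strcmp(key, "core.editor") ? NULL : "vim";
}

/* Hypothetical dispatcher: every path records a code in ret and
 * falls through to one shared cleanup, instead of exit0/exit1/exit2. */
static int run_cmd(int argc, const char **argv)
{
	enum tc_exit_code ret = TC_SUCCESS;
	char *scratch = calloc(1, 64); /* stands in for the config_set */

	if (argc < 2)
		goto usage;

	if (argc == 3 && !strcmp(argv[1], "get_value")) {
		const char *v = lookup(argv[2]);
		if (v) {
			printf("%s\n", v);
		} else {
			printf("Value not found for \"%s\"\n", argv[2]);
			ret = TC_VALUE_NOT_FOUND;
		}
	} else {
usage:
		fprintf(stderr, "usage: demo get_value <key>\n");
		ret = TC_USAGE_ERROR;
	}

	free(scratch); /* single cleanup point; free(NULL) is a no-op */
	return ret;
}
```

With this shape, a test can tell "value not found" (1) apart from a malformed invocation (129) by the exit code alone, which is the false-positive risk the commit message describes.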
* Re: [PATCH v3 2/5] t/helper/test-config: return exit codes consistently 2020-05-28 1:13 ` [PATCH v3 2/5] t/helper/test-config: return exit codes consistently Matheus Tavares @ 2020-05-30 14:29 ` Elijah Newren 2020-06-01 4:36 ` Matheus Tavares Bernardino 0 siblings, 1 reply; 123+ messages in thread From: Elijah Newren @ 2020-05-30 14:29 UTC (permalink / raw) To: Matheus Tavares Cc: Git Mailing List, Junio C Hamano, Derrick Stolee, Jonathan Tan On Wed, May 27, 2020 at 6:13 PM Matheus Tavares <matheus.bernardino@usp.br> wrote: > > The test-config helper may exit with a variety of at least four > different codes, to reflect the status of the requested operations. > These codes are sometimes checked in the tests, but not all of the codes > are returned consistently by the helper: 1 will usually refer to a > "value not found", but usage errors can also return 1 or 128. The latter I'm not sure what "The latter" refers to here. > is also expected on errors within the configset functions. These > inconsistent uses of the exit codes can lead to false positives in the > tests. Although all tests that currently check the helper's exit code, > on errors, do also check the output, it's still better to standardize > the exit codes and avoid future problems in new tests. While we are That last sentence was slightly hard for me to parse. Maybe something like: ...Although all tests which expect errors and check the helper's exit code currently also check the output, it's still better... > here, let's also check that we have the expected argc for > configset_get_value and configset_get_value_multi, before trying to use > argv. > > Note: this change is implemented with the unification of the exit > labels. This might seem unnecessary, for now, but it will benefit the > next patch, which will increase the cleanup section. 
> > Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br> > --- > t/helper/test-config.c | 76 ++++++++++++++++++++++-------------------- > 1 file changed, 40 insertions(+), 36 deletions(-) > > diff --git a/t/helper/test-config.c b/t/helper/test-config.c > index 234c722b48..1c8e965840 100644 > --- a/t/helper/test-config.c > +++ b/t/helper/test-config.c > @@ -30,6 +30,14 @@ > * iterate -> iterate over all values using git_config(), and print some > * data for each > * > + * Exit codes: > + * 0: success > + * 1: value not found for the given config key > + * 2: config file path given as argument is inaccessible or doesn't exist > + * 129: test-config usage error > + * > + * Note: tests may also expect 128 for die() calls in the config machinery. > + * > * Examples: > * > * To print the value with highest priority for key "foo.bAr Baz.rock": > @@ -64,35 +72,42 @@ static int early_config_cb(const char *var, const char *value, void *vdata) > return 0; > } > > +enum test_config_exit_code { > + TC_SUCCESS = 0, > + TC_VALUE_NOT_FOUND = 1, > + TC_CONFIG_FILE_ERROR = 2, > + TC_USAGE_ERROR = 129, > +}; > + > int cmd__config(int argc, const char **argv) > { > int i, val; > const char *v; > const struct string_list *strptr; > struct config_set cs; > + enum test_config_exit_code ret = TC_SUCCESS; > > if (argc == 3 && !strcmp(argv[1], "read_early_config")) { > read_early_config(early_config_cb, (void *)argv[2]); > - return 0; > + return TC_SUCCESS; > } > > setup_git_directory(); > > git_configset_init(&cs); > > - if (argc < 2) { > - fprintf(stderr, "Please, provide a command name on the command-line\n"); > - goto exit1; > - } else if (argc == 3 && !strcmp(argv[1], "get_value")) { > + if (argc < 2) > + goto print_usage_error; > + > + if (argc == 3 && !strcmp(argv[1], "get_value")) { > if (!git_config_get_value(argv[2], &v)) { > if (!v) > printf("(NULL)\n"); > else > printf("%s\n", v); > - goto exit0; > } else { > printf("Value not found for \"%s\"\n", argv[2]); > - goto exit1; 
> + ret = TC_VALUE_NOT_FOUND; > } > } else if (argc == 3 && !strcmp(argv[1], "get_value_multi")) { > strptr = git_config_get_value_multi(argv[2]); > @@ -104,41 +119,38 @@ int cmd__config(int argc, const char **argv) > else > printf("%s\n", v); > } > - goto exit0; > } else { > printf("Value not found for \"%s\"\n", argv[2]); > - goto exit1; > + ret = TC_VALUE_NOT_FOUND; > } > } else if (argc == 3 && !strcmp(argv[1], "get_int")) { > if (!git_config_get_int(argv[2], &val)) { > printf("%d\n", val); > - goto exit0; > } else { > printf("Value not found for \"%s\"\n", argv[2]); > - goto exit1; > + ret = TC_VALUE_NOT_FOUND; > } > } else if (argc == 3 && !strcmp(argv[1], "get_bool")) { > if (!git_config_get_bool(argv[2], &val)) { > printf("%d\n", val); > - goto exit0; > } else { > printf("Value not found for \"%s\"\n", argv[2]); > - goto exit1; > + ret = TC_VALUE_NOT_FOUND; > } > } else if (argc == 3 && !strcmp(argv[1], "get_string")) { > if (!git_config_get_string_const(argv[2], &v)) { > printf("%s\n", v); > - goto exit0; > } else { > printf("Value not found for \"%s\"\n", argv[2]); > - goto exit1; > + ret = TC_VALUE_NOT_FOUND; > } > - } else if (!strcmp(argv[1], "configset_get_value")) { > + } else if (argc >= 3 && !strcmp(argv[1], "configset_get_value")) { > for (i = 3; i < argc; i++) { > int err; > if ((err = git_configset_add_file(&cs, argv[i]))) { > fprintf(stderr, "Error (%d) reading configuration file %s.\n", err, argv[i]); > - goto exit2; > + ret = TC_CONFIG_FILE_ERROR; > + goto out; > } > } > if (!git_configset_get_value(&cs, argv[2], &v)) { > @@ -146,17 +158,17 @@ int cmd__config(int argc, const char **argv) > printf("(NULL)\n"); > else > printf("%s\n", v); > - goto exit0; > } else { > printf("Value not found for \"%s\"\n", argv[2]); > - goto exit1; > + ret = TC_VALUE_NOT_FOUND; > } > - } else if (!strcmp(argv[1], "configset_get_value_multi")) { > + } else if (argc >= 3 && !strcmp(argv[1], "configset_get_value_multi")) { > for (i = 3; i < argc; i++) { > int err; 
> if ((err = git_configset_add_file(&cs, argv[i]))) { > fprintf(stderr, "Error (%d) reading configuration file %s.\n", err, argv[i]); > - goto exit2; > + ret = TC_CONFIG_FILE_ERROR; > + goto out; > } > } > strptr = git_configset_get_value_multi(&cs, argv[2]); > @@ -168,27 +180,19 @@ int cmd__config(int argc, const char **argv) > else > printf("%s\n", v); > } > - goto exit0; > } else { > printf("Value not found for \"%s\"\n", argv[2]); > - goto exit1; > + ret = TC_VALUE_NOT_FOUND; > } > } else if (!strcmp(argv[1], "iterate")) { > git_config(iterate_cb, NULL); > - goto exit0; > + } else { > +print_usage_error: > + fprintf(stderr, "Invalid syntax. Usage: test-tool config <cmd> [args]\n"); > + ret = TC_USAGE_ERROR; > } > > - die("%s: Please check the syntax and the function name", argv[0]); > - > -exit0: > - git_configset_clear(&cs); > - return 0; > - > -exit1: > - git_configset_clear(&cs); > - return 1; > - > -exit2: > +out: > git_configset_clear(&cs); > - return 2; > + return ret; > } > -- > 2.26.2 So, the primary purpose of the commit is making the return status clearer, but most of the code changes actually center around reducing the gotos and unification of the exit labels. Might have been slightly easier to read if those two issues had been split, but the patch is small enough that it's not a big deal. Makes sense. ^ permalink raw reply [flat|nested] 123+ messages in thread
* Re: [PATCH v3 2/5] t/helper/test-config: return exit codes consistently 2020-05-30 14:29 ` Elijah Newren @ 2020-06-01 4:36 ` Matheus Tavares Bernardino 0 siblings, 0 replies; 123+ messages in thread From: Matheus Tavares Bernardino @ 2020-06-01 4:36 UTC (permalink / raw) To: Elijah Newren Cc: Git Mailing List, Junio C Hamano, Derrick Stolee, Jonathan Tan On Sat, May 30, 2020 at 11:29 AM Elijah Newren <newren@gmail.com> wrote: > > On Wed, May 27, 2020 at 6:13 PM Matheus Tavares > <matheus.bernardino@usp.br> wrote: > > > > The test-config helper may exit with a variety of at least four > > different codes, to reflect the status of the requested operations. > > These codes are sometimes checked in the tests, but not all of the codes > > are returned consistently by the helper: 1 will usually refer to a > > "value not found", but usage errors can also return 1 or 128. The latter > > I'm not sure what "The latter" refers to here. It would be the 128 exit code. I'll try to reword that for clarity. > > is also expected on errors within the configset functions. These > > inconsistent uses of the exit codes can lead to false positives in the > > tests. Although all tests that currently check the helper's exit code, > > on errors, do also check the output, it's still better to standardize > > the exit codes and avoid future problems in new tests. While we are > > That last sentence was slightly hard for me to parse. Maybe something like: > > ...Although all tests which expect errors and check the helper's exit > code currently also check the output, it's still better... Sounds better, I will use that for the next version. Thanks. ^ permalink raw reply [flat|nested] 123+ messages in thread
* [PATCH v3 3/5] config: correctly read worktree configs in submodules 2020-05-28 1:12 ` [PATCH v3 0/5] grep: honor sparse checkout and add option to ignore it Matheus Tavares 2020-05-28 1:12 ` [PATCH v3 1/5] doc: grep: unify info on configuration variables Matheus Tavares 2020-05-28 1:13 ` [PATCH v3 2/5] t/helper/test-config: return exit codes consistently Matheus Tavares @ 2020-05-28 1:13 ` Matheus Tavares 2020-05-30 14:49 ` Elijah Newren 2020-05-28 1:13 ` [PATCH v3 4/5] grep: honor sparse checkout patterns Matheus Tavares ` (2 subsequent siblings) 5 siblings, 1 reply; 123+ messages in thread From: Matheus Tavares @ 2020-05-28 1:13 UTC (permalink / raw) To: git; +Cc: gitster, stolee, newren, jonathantanmy One of the steps in do_git_config_sequence() is to load the worktree-specific config file. Although the function receives a git_dir string, it relies on git_pathdup(), which uses the_repository->git_dir, to make the path to the file. Furthermore, it also checks that extensions.worktreeConfig is set through the repository_format_worktree_config variable, which refers to the_repository only. Thus, when a submodule has worktree settings, a command executed in the superproject that recurses into the submodule won't find the said settings. Such a scenario might not be needed now, but it will be in the following patch. git-grep will learn to honor sparse checkouts and, when running with --recurse-submodules, the submodule's sparse checkout settings must be loaded. As these settings are stored in the config.worktree file, they would be ignored without this patch. So let's fix this by reading the right config.worktree file and extensions.worktreeConfig setting, based on the git_dir and commondir paths given to do_git_config_sequence(). Also add a test to avoid any regressions. 
Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br> --- config.c | 21 +++++-- t/helper/test-config.c | 119 +++++++++++++++++++++++++++---------- t/t2404-worktree-config.sh | 16 +++++ 3 files changed, 118 insertions(+), 38 deletions(-) diff --git a/config.c b/config.c index 8db9c77098..c2d56309dc 100644 --- a/config.c +++ b/config.c @@ -1747,11 +1747,22 @@ static int do_git_config_sequence(const struct config_options *opts, ret += git_config_from_file(fn, repo_config, data); current_parsing_scope = CONFIG_SCOPE_WORKTREE; - if (!opts->ignore_worktree && repository_format_worktree_config) { - char *path = git_pathdup("config.worktree"); - if (!access_or_die(path, R_OK, 0)) - ret += git_config_from_file(fn, path, data); - free(path); + if (!opts->ignore_worktree && repo_config && opts->git_dir) { + struct repository_format repo_fmt = REPOSITORY_FORMAT_INIT; + struct strbuf buf = STRBUF_INIT; + + read_repository_format(&repo_fmt, repo_config); + + if (!verify_repository_format(&repo_fmt, &buf) && + repo_fmt.worktree_config) { + char *path = mkpathdup("%s/config.worktree", opts->git_dir); + if (!access_or_die(path, R_OK, 0)) + ret += git_config_from_file(fn, path, data); + free(path); + } + + strbuf_release(&buf); + clear_repository_format(&repo_fmt); } current_parsing_scope = CONFIG_SCOPE_COMMAND; diff --git a/t/helper/test-config.c b/t/helper/test-config.c index 1c8e965840..284f83a921 100644 --- a/t/helper/test-config.c +++ b/t/helper/test-config.c @@ -2,12 +2,19 @@ #include "cache.h" #include "config.h" #include "string-list.h" +#include "submodule-config.h" /* * This program exposes the C API of the configuration mechanism * as a set of simple commands in order to facilitate testing. * - * Reads stdin and prints result of command to stdout: + * Usage: test-tool config [--submodule=<path>] <cmd> [<args>] + * + * If --submodule=<path> is given, <cmd> will operate on the submodule at the + * given <path>. 
This option is not valid for the commands: read_early_config, + * configset_get_value and configset_get_value_multi. + * + * Possible cmds are: * * get_value -> prints the value with highest priority for the entered key * @@ -84,33 +91,63 @@ int cmd__config(int argc, const char **argv) int i, val; const char *v; const struct string_list *strptr; - struct config_set cs; + struct config_set cs = { .hash_initialized = 0 }; enum test_config_exit_code ret = TC_SUCCESS; + struct repository *repo = the_repository; + const char *subrepo_path = NULL; + + argc--; /* skip over "config" */ + argv++; + + if (argc == 0) + goto print_usage_error; + + if (skip_prefix(*argv, "--submodule=", &subrepo_path)) { + argc--; + argv++; + if (argc == 0) + goto print_usage_error; + } - if (argc == 3 && !strcmp(argv[1], "read_early_config")) { - read_early_config(early_config_cb, (void *)argv[2]); + if (argc == 2 && !strcmp(argv[0], "read_early_config")) { + if (subrepo_path) { + fprintf(stderr, "Cannot use --submodule with read_early_config\n"); + return TC_USAGE_ERROR; + } + read_early_config(early_config_cb, (void *)argv[1]); return TC_SUCCESS; } setup_git_directory(); - git_configset_init(&cs); - if (argc < 2) - goto print_usage_error; + if (subrepo_path) { + const struct submodule *sub; + struct repository *subrepo = xcalloc(1, sizeof(*repo)); + + sub = submodule_from_path(the_repository, &null_oid, subrepo_path); + if (!sub || repo_submodule_init(subrepo, the_repository, sub)) { + fprintf(stderr, "Invalid argument to --submodule: '%s'\n", + subrepo_path); + free(subrepo); + ret = TC_USAGE_ERROR; + goto out; + } + repo = subrepo; + } - if (argc == 3 && !strcmp(argv[1], "get_value")) { - if (!git_config_get_value(argv[2], &v)) { + if (argc == 2 && !strcmp(argv[0], "get_value")) { + if (!repo_config_get_value(repo, argv[1], &v)) { if (!v) printf("(NULL)\n"); else printf("%s\n", v); } else { - printf("Value not found for \"%s\"\n", argv[2]); + printf("Value not found for \"%s\"\n", 
argv[1]); ret = TC_VALUE_NOT_FOUND; } - } else if (argc == 3 && !strcmp(argv[1], "get_value_multi")) { - strptr = git_config_get_value_multi(argv[2]); + } else if (argc == 2 && !strcmp(argv[0], "get_value_multi")) { + strptr = repo_config_get_value_multi(repo, argv[1]); if (strptr) { for (i = 0; i < strptr->nr; i++) { v = strptr->items[i].string; @@ -120,32 +157,38 @@ int cmd__config(int argc, const char **argv) printf("%s\n", v); } } else { - printf("Value not found for \"%s\"\n", argv[2]); + printf("Value not found for \"%s\"\n", argv[1]); ret = TC_VALUE_NOT_FOUND; } - } else if (argc == 3 && !strcmp(argv[1], "get_int")) { - if (!git_config_get_int(argv[2], &val)) { + } else if (argc == 2 && !strcmp(argv[0], "get_int")) { + if (!repo_config_get_int(repo, argv[1], &val)) { printf("%d\n", val); } else { - printf("Value not found for \"%s\"\n", argv[2]); + printf("Value not found for \"%s\"\n", argv[1]); ret = TC_VALUE_NOT_FOUND; } - } else if (argc == 3 && !strcmp(argv[1], "get_bool")) { - if (!git_config_get_bool(argv[2], &val)) { + } else if (argc == 2 && !strcmp(argv[0], "get_bool")) { + if (!repo_config_get_bool(repo, argv[1], &val)) { printf("%d\n", val); } else { - printf("Value not found for \"%s\"\n", argv[2]); + + printf("Value not found for \"%s\"\n", argv[1]); ret = TC_VALUE_NOT_FOUND; } - } else if (argc == 3 && !strcmp(argv[1], "get_string")) { - if (!git_config_get_string_const(argv[2], &v)) { + } else if (argc == 2 && !strcmp(argv[0], "get_string")) { + if (!repo_config_get_string_const(repo, argv[1], &v)) { printf("%s\n", v); } else { - printf("Value not found for \"%s\"\n", argv[2]); + printf("Value not found for \"%s\"\n", argv[1]); ret = TC_VALUE_NOT_FOUND; } - } else if (argc >= 3 && !strcmp(argv[1], "configset_get_value")) { - for (i = 3; i < argc; i++) { + } else if (argc >= 2 && !strcmp(argv[0], "configset_get_value")) { + if (subrepo_path) { + fprintf(stderr, "Cannot use --submodule with configset_get_value\n"); + ret = TC_USAGE_ERROR; + 
goto out; + } + for (i = 2; i < argc; i++) { int err; if ((err = git_configset_add_file(&cs, argv[i]))) { fprintf(stderr, "Error (%d) reading configuration file %s.\n", err, argv[i]); @@ -153,17 +196,22 @@ int cmd__config(int argc, const char **argv) goto out; } } - if (!git_configset_get_value(&cs, argv[2], &v)) { + if (!git_configset_get_value(&cs, argv[1], &v)) { if (!v) printf("(NULL)\n"); else printf("%s\n", v); } else { - printf("Value not found for \"%s\"\n", argv[2]); + printf("Value not found for \"%s\"\n", argv[1]); ret = TC_VALUE_NOT_FOUND; } - } else if (argc >= 3 && !strcmp(argv[1], "configset_get_value_multi")) { - for (i = 3; i < argc; i++) { + } else if (argc >= 2 && !strcmp(argv[0], "configset_get_value_multi")) { + if (subrepo_path) { + fprintf(stderr, "Cannot use --submodule with configset_get_value_multi\n"); + ret = TC_USAGE_ERROR; + goto out; + } + for (i = 2; i < argc; i++) { int err; if ((err = git_configset_add_file(&cs, argv[i]))) { fprintf(stderr, "Error (%d) reading configuration file %s.\n", err, argv[i]); @@ -171,7 +219,7 @@ int cmd__config(int argc, const char **argv) goto out; } } - strptr = git_configset_get_value_multi(&cs, argv[2]); + strptr = git_configset_get_value_multi(&cs, argv[1]); if (strptr) { for (i = 0; i < strptr->nr; i++) { v = strptr->items[i].string; @@ -181,18 +229,23 @@ int cmd__config(int argc, const char **argv) printf("%s\n", v); } } else { - printf("Value not found for \"%s\"\n", argv[2]); + printf("Value not found for \"%s\"\n", argv[1]); ret = TC_VALUE_NOT_FOUND; } - } else if (!strcmp(argv[1], "iterate")) { - git_config(iterate_cb, NULL); + } else if (!strcmp(argv[0], "iterate")) { + repo_config(repo, iterate_cb, NULL); } else { print_usage_error: - fprintf(stderr, "Invalid syntax. Usage: test-tool config <cmd> [args]\n"); + fprintf(stderr, "Invalid syntax. 
Usage: test-tool config" + " [--submodule=<path>] <cmd> [args]\n"); ret = TC_USAGE_ERROR; } out: git_configset_clear(&cs); + if (repo != the_repository) { + repo_clear(repo); + free(repo); + } return ret; } diff --git a/t/t2404-worktree-config.sh b/t/t2404-worktree-config.sh index 286121d8de..b6ab793203 100755 --- a/t/t2404-worktree-config.sh +++ b/t/t2404-worktree-config.sh @@ -76,4 +76,20 @@ test_expect_success 'config.worktree no longer read without extension' ' test_cmp_config -C wt2 shared this.is ' +test_expect_success 'correctly read config.worktree from submodules' ' + test_unconfig extensions.worktreeConfig && + git init sub && + ( + cd sub && + test_commit A && + git config extensions.worktreeConfig true && + git config --worktree wtconfig.sub test-value + ) && + git submodule add ./sub && + git commit -m "add sub" && + echo test-value >expect && + test-tool config --submodule=sub get_value wtconfig.sub >actual && + test_cmp expect actual +' + test_done -- 2.26.2 ^ permalink raw reply related [flat|nested] 123+ messages in thread
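Editorial illustration (not part of the patch): the core idea of the config.c hunk above is to derive the config.worktree path from the git_dir handed to the function, and to gate it on that repository's own extensions.worktreeConfig setting, rather than consulting process-wide state that only describes the superproject. The sketch below shows that idea in isolation. struct repo_opts and worktree_config_path() are hypothetical names, not Git's API, and the worktree_config flag stands in for the value the patch obtains by reading the repository format.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical per-repository options; in the patch, the equivalent
 * data comes from opts->git_dir and the parsed repository format. */
struct repo_opts {
	const char *git_dir;   /* e.g. ".git/modules/sub" for a submodule */
	int worktree_config;   /* this repo's extensions.worktreeConfig */
};

/* Returns a malloc()'d "<git_dir>/config.worktree", or NULL when the
 * repository has no git_dir or has not enabled the extension (or when
 * allocation fails). The caller frees the result. */
static char *worktree_config_path(const struct repo_opts *opts)
{
	static const char suffix[] = "/config.worktree";
	size_t len;
	char *path;

	if (!opts->git_dir || !opts->worktree_config)
		return NULL;

	len = strlen(opts->git_dir) + sizeof(suffix); /* sizeof counts the NUL */
	path = malloc(len);
	if (path)
		snprintf(path, len, "%s%s", opts->git_dir, suffix);
	return path;
}
```

Because the decision depends only on the options passed in, the same function gives the right answer for a superproject and for each submodule it recurses into, which is what a recursing git-grep needs in order to find a submodule's sparse-checkout settings.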
* Re: [PATCH v3 3/5] config: correctly read worktree configs in submodules 2020-05-28 1:13 ` [PATCH v3 3/5] config: correctly read worktree configs in submodules Matheus Tavares @ 2020-05-30 14:49 ` Elijah Newren 2020-06-01 4:38 ` Matheus Tavares Bernardino 0 siblings, 1 reply; 123+ messages in thread From: Elijah Newren @ 2020-05-30 14:49 UTC (permalink / raw) To: Matheus Tavares Cc: Git Mailing List, Junio C Hamano, Derrick Stolee, Jonathan Tan On Wed, May 27, 2020 at 6:13 PM Matheus Tavares <matheus.bernardino@usp.br> wrote: > > One of the steps in do_git_config_sequence() is to load the > worktree-specific config file. Although the function receives a git_dir > string, it relies on git_pathdup(), which uses the_repository->git_dir, > to make the path to the file. Furthermore, it also checks that > extensions.worktreeConfig is set through the > repository_format_worktree_config variable, which refers to > the_repository only. Thus, when a submodule has worktree settings, a > command executed in the superproject that recurses into the submodule > won't find the said settings. > > Such a scenario might not be needed now, but it will be in the following It's not needed? Are there not other config values that affect grep's behavior, such as smudge filters of the submodule that might be important if doing a 'git grep --recurse-submodules $REVISION'? Also, is there a similar issue here for .gitattributes? (e.g. if the submodule declares certain files to be binary?) (I don't actually know if these are issues but I'm just surprised to hear that this would be the first case that would need to look at submodule-specific configuration. If the current code handles these other scenarios I bring up, then you just need to correct your commit message. If these aren't issues, then I'd appreciate a quick explanation of why I'm off base. 
If these are current issues and the current code isn't handling them, I'm not saying you need to address them in this patch series, but you might need to reword the commit message to mention that was already an issue that has previously been overlooked and we're starting by fixing one case.) > patch. git-grep will learn to honor sparse checkouts and, when running > with --recurse-submodules, the submodule's sparse checkout settings must > be loaded. As these settings are stored in the config.worktree file, > they would be ignored without this patch. So let's fix this by reading > the right config.worktree file and extensions.worktreeConfig setting, > based on the git_dir and commondir paths given to > do_git_config_sequence(). Also add a test to avoid any regressions. > > Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br> > --- > config.c | 21 +++++-- > t/helper/test-config.c | 119 +++++++++++++++++++++++++++---------- > t/t2404-worktree-config.sh | 16 +++++ > 3 files changed, 118 insertions(+), 38 deletions(-) > > diff --git a/config.c b/config.c > index 8db9c77098..c2d56309dc 100644 > --- a/config.c > +++ b/config.c > @@ -1747,11 +1747,22 @@ static int do_git_config_sequence(const struct config_options *opts, > ret += git_config_from_file(fn, repo_config, data); > > current_parsing_scope = CONFIG_SCOPE_WORKTREE; > - if (!opts->ignore_worktree && repository_format_worktree_config) { > - char *path = git_pathdup("config.worktree"); > - if (!access_or_die(path, R_OK, 0)) > - ret += git_config_from_file(fn, path, data); > - free(path); > + if (!opts->ignore_worktree && repo_config && opts->git_dir) { > + struct repository_format repo_fmt = REPOSITORY_FORMAT_INIT; > + struct strbuf buf = STRBUF_INIT; > + > + read_repository_format(&repo_fmt, repo_config); > + > + if (!verify_repository_format(&repo_fmt, &buf) && > + repo_fmt.worktree_config) { > + char *path = mkpathdup("%s/config.worktree", opts->git_dir); > + if (!access_or_die(path, R_OK, 0)) > + ret += 
git_config_from_file(fn, path, data); > + free(path); > + } > + > + strbuf_release(&buf); > + clear_repository_format(&repo_fmt); > } > > current_parsing_scope = CONFIG_SCOPE_COMMAND; > diff --git a/t/helper/test-config.c b/t/helper/test-config.c > index 1c8e965840..284f83a921 100644 > --- a/t/helper/test-config.c > +++ b/t/helper/test-config.c > @@ -2,12 +2,19 @@ > #include "cache.h" > #include "config.h" > #include "string-list.h" > +#include "submodule-config.h" > > /* > * This program exposes the C API of the configuration mechanism > * as a set of simple commands in order to facilitate testing. > * > - * Reads stdin and prints result of command to stdout: > + * Usage: test-tool config [--submodule=<path>] <cmd> [<args>] > + * > + * If --submodule=<path> is given, <cmd> will operate on the submodule at the > + * given <path>. This option is not valid for the commands: read_early_config, > + * configset_get_value and configset_get_value_multi. > + * > + * Possible cmds are: > * > * get_value -> prints the value with highest priority for the entered key > * > @@ -84,33 +91,63 @@ int cmd__config(int argc, const char **argv) > int i, val; > const char *v; > const struct string_list *strptr; > - struct config_set cs; > + struct config_set cs = { .hash_initialized = 0 }; > enum test_config_exit_code ret = TC_SUCCESS; > + struct repository *repo = the_repository; > + const char *subrepo_path = NULL; > + > + argc--; /* skip over "config" */ This line alone is responsible for a fairly big set of changes throughout this file, just decrementing indices everywhere. It might be nice for review purposes if this and the other changes it caused were pulled out into a separate step, so we can more easily concentrate on the primary additions and changes you are making to this file. 
In particular, being so unfamiliar with submodules I'd really like to try to find someone who knows them a bit better to review all the subrepo_path related portions of this change to this file plus the config.c changes, but I think that'd be easier if the change were more focused. > + argv++; > + > + if (argc == 0) > + goto print_usage_error; > + > + if (skip_prefix(*argv, "--submodule=", &subrepo_path)) { > + argc--; > + argv++; > + if (argc == 0) > + goto print_usage_error; > + } > > - if (argc == 3 && !strcmp(argv[1], "read_early_config")) { > - read_early_config(early_config_cb, (void *)argv[2]); > + if (argc == 2 && !strcmp(argv[0], "read_early_config")) { > + if (subrepo_path) { > + fprintf(stderr, "Cannot use --submodule with read_early_config\n"); > + return TC_USAGE_ERROR; > + } > + read_early_config(early_config_cb, (void *)argv[1]); > return TC_SUCCESS; > } > > setup_git_directory(); > - > git_configset_init(&cs); > > - if (argc < 2) > - goto print_usage_error; > + if (subrepo_path) { > + const struct submodule *sub; > + struct repository *subrepo = xcalloc(1, sizeof(*repo)); > + > + sub = submodule_from_path(the_repository, &null_oid, subrepo_path); > + if (!sub || repo_submodule_init(subrepo, the_repository, sub)) { > + fprintf(stderr, "Invalid argument to --submodule: '%s'\n", > + subrepo_path); > + free(subrepo); > + ret = TC_USAGE_ERROR; > + goto out; > + } > + repo = subrepo; > + } > > - if (argc == 3 && !strcmp(argv[1], "get_value")) { > - if (!git_config_get_value(argv[2], &v)) { > + if (argc == 2 && !strcmp(argv[0], "get_value")) { > + if (!repo_config_get_value(repo, argv[1], &v)) { > if (!v) > printf("(NULL)\n"); > else > printf("%s\n", v); > } else { > - printf("Value not found for \"%s\"\n", argv[2]); > + printf("Value not found for \"%s\"\n", argv[1]); > ret = TC_VALUE_NOT_FOUND; > } > - } else if (argc == 3 && !strcmp(argv[1], "get_value_multi")) { > - strptr = git_config_get_value_multi(argv[2]); > + } else if (argc == 2 && 
!strcmp(argv[0], "get_value_multi")) { > + strptr = repo_config_get_value_multi(repo, argv[1]); > if (strptr) { > for (i = 0; i < strptr->nr; i++) { > v = strptr->items[i].string; > @@ -120,32 +157,38 @@ int cmd__config(int argc, const char **argv) > printf("%s\n", v); > } > } else { > - printf("Value not found for \"%s\"\n", argv[2]); > + printf("Value not found for \"%s\"\n", argv[1]); > ret = TC_VALUE_NOT_FOUND; > } > - } else if (argc == 3 && !strcmp(argv[1], "get_int")) { > - if (!git_config_get_int(argv[2], &val)) { > + } else if (argc == 2 && !strcmp(argv[0], "get_int")) { > + if (!repo_config_get_int(repo, argv[1], &val)) { > printf("%d\n", val); > } else { > - printf("Value not found for \"%s\"\n", argv[2]); > + printf("Value not found for \"%s\"\n", argv[1]); > ret = TC_VALUE_NOT_FOUND; > } > - } else if (argc == 3 && !strcmp(argv[1], "get_bool")) { > - if (!git_config_get_bool(argv[2], &val)) { > + } else if (argc == 2 && !strcmp(argv[0], "get_bool")) { > + if (!repo_config_get_bool(repo, argv[1], &val)) { > printf("%d\n", val); > } else { > - printf("Value not found for \"%s\"\n", argv[2]); > + > + printf("Value not found for \"%s\"\n", argv[1]); > ret = TC_VALUE_NOT_FOUND; > } > - } else if (argc == 3 && !strcmp(argv[1], "get_string")) { > - if (!git_config_get_string_const(argv[2], &v)) { > + } else if (argc == 2 && !strcmp(argv[0], "get_string")) { > + if (!repo_config_get_string_const(repo, argv[1], &v)) { > printf("%s\n", v); > } else { > - printf("Value not found for \"%s\"\n", argv[2]); > + printf("Value not found for \"%s\"\n", argv[1]); > ret = TC_VALUE_NOT_FOUND; > } > - } else if (argc >= 3 && !strcmp(argv[1], "configset_get_value")) { > - for (i = 3; i < argc; i++) { > + } else if (argc >= 2 && !strcmp(argv[0], "configset_get_value")) { > + if (subrepo_path) { > + fprintf(stderr, "Cannot use --submodule with configset_get_value\n"); > + ret = TC_USAGE_ERROR; > + goto out; > + } > + for (i = 2; i < argc; i++) { > int err; > if ((err = 
git_configset_add_file(&cs, argv[i]))) { > fprintf(stderr, "Error (%d) reading configuration file %s.\n", err, argv[i]); > @@ -153,17 +196,22 @@ int cmd__config(int argc, const char **argv) > goto out; > } > } > - if (!git_configset_get_value(&cs, argv[2], &v)) { > + if (!git_configset_get_value(&cs, argv[1], &v)) { > if (!v) > printf("(NULL)\n"); > else > printf("%s\n", v); > } else { > - printf("Value not found for \"%s\"\n", argv[2]); > + printf("Value not found for \"%s\"\n", argv[1]); > ret = TC_VALUE_NOT_FOUND; > } > - } else if (argc >= 3 && !strcmp(argv[1], "configset_get_value_multi")) { > - for (i = 3; i < argc; i++) { > + } else if (argc >= 2 && !strcmp(argv[0], "configset_get_value_multi")) { > + if (subrepo_path) { > + fprintf(stderr, "Cannot use --submodule with configset_get_value_multi\n"); > + ret = TC_USAGE_ERROR; > + goto out; > + } > + for (i = 2; i < argc; i++) { > int err; > if ((err = git_configset_add_file(&cs, argv[i]))) { > fprintf(stderr, "Error (%d) reading configuration file %s.\n", err, argv[i]); > @@ -171,7 +219,7 @@ int cmd__config(int argc, const char **argv) > goto out; > } > } > - strptr = git_configset_get_value_multi(&cs, argv[2]); > + strptr = git_configset_get_value_multi(&cs, argv[1]); > if (strptr) { > for (i = 0; i < strptr->nr; i++) { > v = strptr->items[i].string; > @@ -181,18 +229,23 @@ int cmd__config(int argc, const char **argv) > printf("%s\n", v); > } > } else { > - printf("Value not found for \"%s\"\n", argv[2]); > + printf("Value not found for \"%s\"\n", argv[1]); > ret = TC_VALUE_NOT_FOUND; > } > - } else if (!strcmp(argv[1], "iterate")) { > - git_config(iterate_cb, NULL); > + } else if (!strcmp(argv[0], "iterate")) { > + repo_config(repo, iterate_cb, NULL); > } else { > print_usage_error: > - fprintf(stderr, "Invalid syntax. Usage: test-tool config <cmd> [args]\n"); > + fprintf(stderr, "Invalid syntax. 
Usage: test-tool config" > + " [--submodule=<path>] <cmd> [args]\n"); > ret = TC_USAGE_ERROR; > } > > out: > git_configset_clear(&cs); > + if (repo != the_repository) { > + repo_clear(repo); > + free(repo); > + } > return ret; > } > diff --git a/t/t2404-worktree-config.sh b/t/t2404-worktree-config.sh > index 286121d8de..b6ab793203 100755 > --- a/t/t2404-worktree-config.sh > +++ b/t/t2404-worktree-config.sh > @@ -76,4 +76,20 @@ test_expect_success 'config.worktree no longer read without extension' ' > test_cmp_config -C wt2 shared this.is > ' > > +test_expect_success 'correctly read config.worktree from submodules' ' > + test_unconfig extensions.worktreeConfig && > + git init sub && > + ( > + cd sub && > + test_commit A && > + git config extensions.worktreeConfig true && > + git config --worktree wtconfig.sub test-value > + ) && > + git submodule add ./sub && > + git commit -m "add sub" && > + echo test-value >expect && > + test-tool config --submodule=sub get_value wtconfig.sub >actual && > + test_cmp expect actual > +' > + > test_done > -- > 2.26.2 The index updates seem fine, and I like the test, and I tried to look at the submodule and config bits but I'm quite unfamiliar with that area of the code and I'd like to see if we can find someone who knows submodules and/or config a bit better to review those pieces. A split of this patch into two in your next roll of this series would be nice so we can ask someone to look at just the relevant bits. ^ permalink raw reply [flat|nested] 123+ messages in thread
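As an illustration of the argument handling reviewed above, here is a minimal, self-contained sketch of the "skip the subcommand name, then an optional `--submodule=<path>`" parsing. It is not the code from the patch: `skip_prefix_sketch()` is a simplified stand-in for git's `skip_prefix()`, and `struct parsed_args` and `parse_config_args()` are hypothetical names introduced only for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Simplified stand-in for git's skip_prefix() (an assumption of this
 * sketch): if str starts with prefix, store a pointer to the remainder
 * in *out and return 1; otherwise leave *out alone and return 0.
 */
static int skip_prefix_sketch(const char *str, const char *prefix,
			      const char **out)
{
	size_t len = strlen(prefix);

	if (strncmp(str, prefix, len))
		return 0;
	*out = str + len;
	return 1;
}

/* Hypothetical container for the parsed command line. */
struct parsed_args {
	const char *subrepo_path;	/* NULL if --submodule was not given */
	int argc;			/* number of entries left in argv */
	const char **argv;		/* argv[0] is <cmd> */
};

/*
 * Parse "config [--submodule=<path>] <cmd> [<args>]".
 * Returns 0 on success and -1 on a usage error.
 */
static int parse_config_args(int argc, const char **argv,
			     struct parsed_args *out)
{
	assert(argc >= 1 && argv && out);

	out->subrepo_path = NULL;

	argc--;		/* skip over "config" */
	argv++;
	if (argc == 0)
		return -1;

	if (skip_prefix_sketch(*argv, "--submodule=", &out->subrepo_path)) {
		argc--;
		argv++;
		if (argc == 0)
			return -1;
	}

	out->argc = argc;
	out->argv = argv;
	return 0;
}
```

After a successful call, the command name is at `argv[0]` and its first argument at `argv[1]`, which is why the patch changes every `argv[1]`/`argv[2]` pair to `argv[0]`/`argv[1]` and every `argc == 3` check to `argc == 2`.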
* Re: [PATCH v3 3/5] config: correctly read worktree configs in submodules 2020-05-30 14:49 ` Elijah Newren @ 2020-06-01 4:38 ` Matheus Tavares Bernardino 0 siblings, 0 replies; 123+ messages in thread From: Matheus Tavares Bernardino @ 2020-06-01 4:38 UTC (permalink / raw) To: Elijah Newren Cc: Git Mailing List, Junio C Hamano, Derrick Stolee, Jonathan Tan On Sat, May 30, 2020 at 11:49 AM Elijah Newren <newren@gmail.com> wrote: > > On Wed, May 27, 2020 at 6:13 PM Matheus Tavares > <matheus.bernardino@usp.br> wrote: > > > > One of the steps in do_git_config_sequence() is to load the > > worktree-specific config file. Although the function receives a git_dir > > string, it relies on git_pathdup(), which uses the_repository->git_dir, > > to make the path to the file. Furthermore, it also checks that > > extensions.worktreeConfig is set through the > > repository_format_worktree_config variable, which refers to > > the_repository only. Thus, when a submodule has worktree settings, a > > command executed in the superproject that recurses into the submodule > > won't find the said settings. > > > > Such a scenario might not be needed now, but it will be in the following > > It's not needed? Are there not other config values that affect grep's > behavior, such as smudge filters of the submodule that might be > important if doing a 'git grep --recurse-submodules $REVISION'? Hmm, I haven't used smudge filters before, but it seems to me that `git grep $REVISION` does not honor them. > Also, is there a similar issue here for .gitattributes? (e.g. if the > submodule declares certain files to be binary?) Declaring files as binary in the submodule works fine. But I noticed that textconv filter specifications in the submodule's config are currently ignored. To be honest, I wasn't aware of this issue before. > I don't actually know if these are issues but I'm just surprised to > hear that this would be the first case that would need to look at > submodule-specific configuration. 
Hmm, not to submodule-specific configuration but to worktree-specific configuration of a submodule, right? I.e. a config.worktree file from within a submodule. Reconsidering this now, we could indeed have a diff.<driver>.textconv or core.quotePath settings specified in the worktree scope of a submodule. And we should honor them when recursing in grep. I guess I thought the "most natural" place for these settings, in a submodule, would be in the standard .git/config file (as opposed to the sparse-checkout ones, which are normally at config.worktree). That's probably why I wrote "Such scenario might not be needed now". But we should indeed support reading diff.<driver>.textconv from config.worktree as well (although grep currently ignores this setting in submodules, both in the local and worktree scopes). So the said sentence doesn't make much sense, indeed. I will remove it. Thanks! > > diff --git a/t/helper/test-config.c b/t/helper/test-config.c > > index 1c8e965840..284f83a921 100644 > > --- a/t/helper/test-config.c > > +++ b/t/helper/test-config.c > > @@ -84,33 +91,63 @@ int cmd__config(int argc, const char **argv) > > int i, val; > > const char *v; > > const struct string_list *strptr; > > - struct config_set cs; > > + struct config_set cs = { .hash_initialized = 0 }; > > enum test_config_exit_code ret = TC_SUCCESS; > > + struct repository *repo = the_repository; > > + const char *subrepo_path = NULL; > > + > > + argc--; /* skip over "config" */ > > This line alone is responsible for a fairly big set of changes > throughout this file, just decrementing indices everywhere. It might > be nice for review purposes if this and the other changes it caused > were pulled out into a separate step, so we can more easily > concentrate on the primary additions and changes you are making to > this file. OK, will do. ^ permalink raw reply [flat|nested] 123+ messages in thread
* [PATCH v3 4/5] grep: honor sparse checkout patterns 2020-05-28 1:12 ` [PATCH v3 0/5] grep: honor sparse checkout and add option to ignore it Matheus Tavares ` (2 preceding siblings ...) 2020-05-28 1:13 ` [PATCH v3 3/5] config: correctly read worktree configs in submodules Matheus Tavares @ 2020-05-28 1:13 ` Matheus Tavares 2020-05-30 15:48 ` Elijah Newren 2020-05-28 1:13 ` [PATCH v3 5/5] config: add setting to ignore sparsity patterns in some cmds Matheus Tavares 2020-06-12 15:44 ` [PATCH v4 0/6] grep: honor sparse checkout and add option to ignore it Matheus Tavares 5 siblings, 1 reply; 123+ messages in thread From: Matheus Tavares @ 2020-05-28 1:13 UTC (permalink / raw) To: git; +Cc: gitster, stolee, newren, jonathantanmy One of the main uses for a sparse checkout is to allow users to focus on the subset of files in a repository in which they are interested. But git-grep currently ignores the sparsity patterns and report all matches found outside this subset, which kind of goes in the opposite direction. Let's fix that, making it honor the sparsity boundaries for every grepping case where this is relevant: - git grep in worktree - git grep --cached - git grep $REVISION For the worktree case, we will not grep paths that have the SKIP_WORKTREE bit set, even if they are present for some reason (e.g. manually created after `git sparse-checkout init`). But the next patch will add an option to do so. (See 'Note' below.) For `git grep $REVISION`, we will choose to honor the sparsity patterns only when $REVISION is a commit-ish object. The reason is that, for a tree, we don't know whether it represents the root of a repository or a subtree. So we wouldn't be able to correctly match it against the sparsity patterns. E.g. 
suppose we have a repository with these two sparsity rules: "/*" and "!/a"; and the following structure: / | - a (file) | - d (dir) | - a (file) If `git grep $REVISION` were to honor the sparsity patterns for every object type, when grepping the /d tree, we would wrongly ignore the /d/a file. This happens because we wouldn't know it resides in /d and therefore it would wrongly match the pattern "!/a". Furthermore, for a search in a blob object, we wouldn't even have a path to check the patterns against. So, let's ignore the sparsity patterns when grepping non-commit-ish objects. Note: The behavior introduced in this patch is what some users have reported[1] that they would like by default. But the old behavior is still desirable for some use cases. Therefore, the next patch will add an option to allow restoring it when needed. [1]: https://lore.kernel.org/git/CABPp-BGuFhDwWZBRaD3nA8ui46wor-4=Ha1G1oApsfF8KNpfGQ@mail.gmail.com/ Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br> --- builtin/grep.c | 125 ++++++++++++++++++++-- t/t7011-skip-worktree-reading.sh | 9 -- t/t7817-grep-sparse-checkout.sh | 174 +++++++++++++++++++++++++++++++ 3 files changed, 291 insertions(+), 17 deletions(-) create mode 100755 t/t7817-grep-sparse-checkout.sh diff --git a/builtin/grep.c b/builtin/grep.c index a5056f395a..11e33b8aee 100644 --- a/builtin/grep.c +++ b/builtin/grep.c @@ -410,7 +410,7 @@ static int grep_cache(struct grep_opt *opt, const struct pathspec *pathspec, int cached); static int grep_tree(struct grep_opt *opt, const struct pathspec *pathspec, struct tree_desc *tree, struct strbuf *base, int tn_len, - int check_attr); + int is_root_tree); static int grep_submodule(struct grep_opt *opt, const struct pathspec *pathspec, @@ -508,6 +508,10 @@ static int grep_cache(struct grep_opt *opt, for (nr = 0; nr < repo->index->cache_nr; nr++) { const struct cache_entry *ce = repo->index->cache[nr]; + + if (ce_skip_worktree(ce)) + continue; + strbuf_setlen(&name, name_base_len); 
strbuf_addstr(&name, ce->name); @@ -520,8 +524,7 @@ static int grep_cache(struct grep_opt *opt, * cache entry are identical, even if worktree file has * been modified, so use cache version instead */ - if (cached || (ce->ce_flags & CE_VALID) || - ce_skip_worktree(ce)) { + if (cached || (ce->ce_flags & CE_VALID)) { if (ce_stage(ce) || ce_intent_to_add(ce)) continue; hit |= grep_oid(opt, &ce->oid, name.buf, @@ -552,9 +555,76 @@ static int grep_cache(struct grep_opt *opt, return hit; } -static int grep_tree(struct grep_opt *opt, const struct pathspec *pathspec, - struct tree_desc *tree, struct strbuf *base, int tn_len, - int check_attr) +static struct pattern_list *get_sparsity_patterns(struct repository *repo) +{ + struct pattern_list *patterns; + char *sparse_file; + int sparse_config, cone_config; + + if (repo_config_get_bool(repo, "core.sparsecheckout", &sparse_config) || + !sparse_config) { + return NULL; + } + + sparse_file = repo_git_path(repo, "info/sparse-checkout"); + patterns = xcalloc(1, sizeof(*patterns)); + + if (repo_config_get_bool(repo, "core.sparsecheckoutcone", &cone_config)) + cone_config = 0; + patterns->use_cone_patterns = cone_config; + + if (add_patterns_from_file_to_list(sparse_file, "", 0, patterns, NULL)) { + if (file_exists(sparse_file)) { + warning(_("failed to load sparse-checkout file: '%s'"), + sparse_file); + } + free(sparse_file); + free(patterns); + return NULL; + } + + free(sparse_file); + return patterns; +} + +static int in_sparse_checkout(struct strbuf *path, int prefix_len, + unsigned int entry_mode, + struct index_state *istate, + struct pattern_list *sparsity, + enum pattern_match_result parent_match, + enum pattern_match_result *match) +{ + int dtype = DT_UNKNOWN; + int is_dir = S_ISDIR(entry_mode); + + if (parent_match == MATCHED_RECURSIVE) { + *match = parent_match; + return 1; + } + + if (is_dir && !is_dir_sep(path->buf[path->len - 1])) + strbuf_addch(path, '/'); + + *match = path_matches_pattern_list(path->buf, path->len, 
+ path->buf + prefix_len, &dtype, + sparsity, istate); + if (*match == UNDECIDED) + *match = parent_match; + + if (is_dir) + strbuf_trim_trailing_dir_sep(path); + + if (*match == NOT_MATCHED && + (!is_dir || (is_dir && sparsity->use_cone_patterns))) + return 0; + + return 1; +} + +static int do_grep_tree(struct grep_opt *opt, const struct pathspec *pathspec, + struct tree_desc *tree, struct strbuf *base, int tn_len, + int check_attr, struct pattern_list *sparsity, + enum pattern_match_result default_sparsity_match) { struct repository *repo = opt->repo; int hit = 0; @@ -570,6 +640,7 @@ static int grep_tree(struct grep_opt *opt, const struct pathspec *pathspec, while (tree_entry(tree, &entry)) { int te_len = tree_entry_len(&entry); + enum pattern_match_result sparsity_match = 0; if (match != all_entries_interesting) { strbuf_addstr(&name, base->buf + tn_len); @@ -586,6 +657,19 @@ static int grep_tree(struct grep_opt *opt, const struct pathspec *pathspec, strbuf_add(base, entry.path, te_len); + if (sparsity) { + struct strbuf path = STRBUF_INIT; + strbuf_addstr(&path, base->buf + tn_len); + + if (!in_sparse_checkout(&path, old_baselen - tn_len, + entry.mode, repo->index, + sparsity, default_sparsity_match, + &sparsity_match)) { + strbuf_setlen(base, old_baselen); + continue; + } + } + if (S_ISREG(entry.mode)) { hit |= grep_oid(opt, &entry.oid, base->buf, tn_len, check_attr ? 
base->buf + tn_len : NULL); @@ -602,8 +686,8 @@ static int grep_tree(struct grep_opt *opt, const struct pathspec *pathspec, strbuf_addch(base, '/'); init_tree_desc(&sub, data, size); - hit |= grep_tree(opt, pathspec, &sub, base, tn_len, - check_attr); + hit |= do_grep_tree(opt, pathspec, &sub, base, tn_len, + check_attr, sparsity, sparsity_match); free(data); } else if (recurse_submodules && S_ISGITLINK(entry.mode)) { hit |= grep_submodule(opt, pathspec, &entry.oid, @@ -621,6 +705,31 @@ static int grep_tree(struct grep_opt *opt, const struct pathspec *pathspec, return hit; } +/* + * Note: sparsity patterns and paths' attributes will only be considered if + * is_root_tree has true value. (Otherwise, we cannot properly perform pattern + * matching on paths.) + */ +static int grep_tree(struct grep_opt *opt, const struct pathspec *pathspec, + struct tree_desc *tree, struct strbuf *base, int tn_len, + int is_root_tree) +{ + struct pattern_list *patterns = NULL; + int ret; + + if (is_root_tree) + patterns = get_sparsity_patterns(opt->repo); + + ret = do_grep_tree(opt, pathspec, tree, base, tn_len, is_root_tree, + patterns, 0); + + if (patterns) { + clear_pattern_list(patterns); + free(patterns); + } + return ret; +} + static int grep_object(struct grep_opt *opt, const struct pathspec *pathspec, struct object *obj, const char *name, const char *path) { diff --git a/t/t7011-skip-worktree-reading.sh b/t/t7011-skip-worktree-reading.sh index 37525cae3a..26852586ac 100755 --- a/t/t7011-skip-worktree-reading.sh +++ b/t/t7011-skip-worktree-reading.sh @@ -109,15 +109,6 @@ test_expect_success 'ls-files --modified' ' test -z "$(git ls-files -m)" ' -test_expect_success 'grep with skip-worktree file' ' - git update-index --no-skip-worktree 1 && - echo test > 1 && - git update-index 1 && - git update-index --skip-worktree 1 && - rm 1 && - test "$(git grep --no-ext-grep test)" = "1:test" -' - echo ":000000 100644 $ZERO_OID $EMPTY_BLOB A 1" > expected test_expect_success 'diff-index 
does not examine skip-worktree absent entries' ' setup_absent && diff --git a/t/t7817-grep-sparse-checkout.sh b/t/t7817-grep-sparse-checkout.sh new file mode 100755 index 0000000000..ce080cf572 --- /dev/null +++ b/t/t7817-grep-sparse-checkout.sh @@ -0,0 +1,174 @@ +#!/bin/sh + +test_description='grep in sparse checkout + +This test creates a repo with the following structure: + +. +|-- a +|-- b +|-- dir +| `-- c +|-- sub +| |-- A +| | `-- a +| `-- B +| `-- b +`-- sub2 + `-- a + +Where . has non-cone mode sparsity patterns, sub is a submodule with cone mode +sparsity patterns and sub2 is a submodule that is excluded by the superproject +sparsity patterns. The resulting sparse checkout should leave the following +structure on the working tree: + +. +|-- a +|-- sub +| `-- B +| `-- b +`-- sub2 + `-- a + +But note that sub2 should have the SKIP_WORKTREE bit set. +' + +. ./test-lib.sh + +test_expect_success 'setup' ' + echo "text" >a && + echo "text" >b && + mkdir dir && + echo "text" >dir/c && + + git init sub && + ( + cd sub && + mkdir A B && + echo "text" >A/a && + echo "text" >B/b && + git add A B && + git commit -m sub && + git sparse-checkout init --cone && + git sparse-checkout set B + ) && + + git init sub2 && + ( + cd sub2 && + echo "text" >a && + git add a && + git commit -m sub2 + ) && + + git submodule add ./sub && + git submodule add ./sub2 && + git add a b dir && + git commit -m super && + git sparse-checkout init --no-cone && + git sparse-checkout set "/*" "!b" "!/*/" "sub" && + + git tag -am tag-to-commit tag-to-commit HEAD && + tree=$(git rev-parse HEAD^{tree}) && + git tag -am tag-to-tree tag-to-tree $tree && + + test_path_is_missing b && + test_path_is_missing dir && + test_path_is_missing sub/A && + test_path_is_file a && + test_path_is_file sub/B/b && + test_path_is_file sub2/a +' + +# The test bellow checks a special case: the sparsity patterns exclude '/b' +# and sparse checkout is enable, but the path exists on the working tree (e.g. 
+# manually created after `git sparse-checkout init`). In this case, grep should +# skip it. +test_expect_success 'grep in working tree should honor sparse checkout' ' + cat >expect <<-EOF && + a:text + EOF + echo "new-text" >b && + test_when_finished "rm b" && + git grep "text" >actual && + test_cmp expect actual +' + +test_expect_success 'grep --cached should honor sparse checkout' ' + cat >expect <<-EOF && + a:text + EOF + git grep --cached "text" >actual && + test_cmp expect actual +' + +test_expect_success 'grep <commit-ish> should honor sparse checkout' ' + commit=$(git rev-parse HEAD) && + cat >expect_commit <<-EOF && + $commit:a:text + EOF + cat >expect_tag-to-commit <<-EOF && + tag-to-commit:a:text + EOF + git grep "text" $commit >actual_commit && + test_cmp expect_commit actual_commit && + git grep "text" tag-to-commit >actual_tag-to-commit && + test_cmp expect_tag-to-commit actual_tag-to-commit +' + +test_expect_success 'grep <tree-ish> should ignore sparsity patterns' ' + commit=$(git rev-parse HEAD) && + tree=$(git rev-parse HEAD^{tree}) && + cat >expect_tree <<-EOF && + $tree:a:text + $tree:b:text + $tree:dir/c:text + EOF + cat >expect_tag-to-tree <<-EOF && + tag-to-tree:a:text + tag-to-tree:b:text + tag-to-tree:dir/c:text + EOF + git grep "text" $tree >actual_tree && + test_cmp expect_tree actual_tree && + git grep "text" tag-to-tree >actual_tag-to-tree && + test_cmp expect_tag-to-tree actual_tag-to-tree +' + +# Note that sub2/ is present in the worktree but it is excluded by the sparsity +# patterns, so grep should not recurse into it. 
+test_expect_success 'grep --recurse-submodules should honor sparse checkout in submodule' ' + cat >expect <<-EOF && + a:text + sub/B/b:text + EOF + git grep --recurse-submodules "text" >actual && + test_cmp expect actual +' + +test_expect_success 'grep --recurse-submodules --cached should honor sparse checkout in submodule' ' + cat >expect <<-EOF && + a:text + sub/B/b:text + EOF + git grep --recurse-submodules --cached "text" >actual && + test_cmp expect actual +' + +test_expect_success 'grep --recurse-submodules <commit-ish> should honor sparse checkout in submodule' ' + commit=$(git rev-parse HEAD) && + cat >expect_commit <<-EOF && + $commit:a:text + $commit:sub/B/b:text + EOF + cat >expect_tag-to-commit <<-EOF && + tag-to-commit:a:text + tag-to-commit:sub/B/b:text + EOF + git grep --recurse-submodules "text" $commit >actual_commit && + test_cmp expect_commit actual_commit && + git grep --recurse-submodules "text" tag-to-commit >actual_tag-to-commit && + test_cmp expect_tag-to-commit actual_tag-to-commit +' + +test_done -- 2.26.2 ^ permalink raw reply related [flat|nested] 123+ messages in thread
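The visit-or-skip decision made by in_sparse_checkout() in the patch above can be isolated as a small pure function. The following is only a sketch of that decision under stated assumptions: `enum sk_match` and `should_visit()` are hypothetical names, the enum values are arbitrary, and the actual pattern matching (done by path_matches_pattern_list() in the patch) is replaced by an `own` argument supplied by the caller.

```c
#include <assert.h>

/* Hypothetical counterpart of the match results used in the patch. */
enum sk_match {
	SK_UNDECIDED,
	SK_NOT_MATCHED,
	SK_MATCHED,
	SK_MATCHED_RECURSIVE
};

/*
 * own:    result of matching this entry's path against the patterns
 * parent: effective result of the containing directory
 *
 * Stores the effective result for this entry in *effective and returns
 * 1 if the entry should be visited, 0 if it should be skipped.
 */
static int should_visit(enum sk_match own, enum sk_match parent,
			int is_dir, int cone_mode,
			enum sk_match *effective)
{
	assert(effective);

	/* Everything below a recursively matched directory is included. */
	if (parent == SK_MATCHED_RECURSIVE) {
		*effective = parent;
		return 1;
	}

	/* No pattern decided this entry: inherit the parent's result. */
	*effective = (own == SK_UNDECIDED) ? parent : own;

	/*
	 * A non-matching file is always skipped. A non-matching directory
	 * is skipped only in cone mode; otherwise the walk still descends
	 * into it and lets the entries below it be matched on their own.
	 */
	if (*effective == SK_NOT_MATCHED && (!is_dir || cone_mode))
		return 0;

	return 1;
}
```

The condition `(!is_dir || cone_mode)` is equivalent to the patch's `(!is_dir || (is_dir && sparsity->use_cone_patterns))`. The value written to `*effective` is what the patch passes down as `default_sparsity_match` when recursing into a subdirectory.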
* Re: [PATCH v3 4/5] grep: honor sparse checkout patterns
  2020-05-28 1:13 ` [PATCH v3 4/5] grep: honor sparse checkout patterns Matheus Tavares
@ 2020-05-30 15:48 ` Elijah Newren
  2020-06-01 4:44 ` Matheus Tavares Bernardino
  0 siblings, 1 reply; 123+ messages in thread
From: Elijah Newren @ 2020-05-30 15:48 UTC (permalink / raw)
To: Matheus Tavares
Cc: Git Mailing List, Junio C Hamano, Derrick Stolee, Jonathan Tan

On Wed, May 27, 2020 at 6:13 PM Matheus Tavares <matheus.bernardino@usp.br> wrote:
>
> One of the main uses for a sparse checkout is to allow users to focus on
> the subset of files in a repository in which they are interested. But
> git-grep currently ignores the sparsity patterns and report all matches
> found outside this subset, which kind of goes in the opposite direction.
> Let's fix that, making it honor the sparsity boundaries for every
> grepping case where this is relevant:
>
> - git grep in worktree
> - git grep --cached
> - git grep $REVISION
>
> For the worktree case, we will not grep paths that have the
> SKIP_WORKTREE bit set, even if they are present for some reason (e.g.
> manually created after `git sparse-checkout init`).

This seems worded to raise alarm bells and make users suspect
implementation difficulties or regrets rather than desired behavior.
It would be much better to word this simply as something like:

  For the worktree and cached cases, we iterate over paths without the
  SKIP_WORKTREE bit set, and limit our searches to these paths.

> But the next patch
> will add an option to do so. (See 'Note' below.)

Because this was in the same paragraph as the previous sentence, it
made it sound like you were going to provide a special worktree-only
option to search outside the SKIP_WORKTREE bits. Very confusing. I
think I'd combine this sentence into the very first paragraph of the
commit message and massage the wording a little. Perhaps something
like:

  ...goes in the opposite direction. There are some use cases for
  ignoring the sparsity patterns and the next commit will add an
  option to obtain this behavior, but here we start by making grep
  honor the sparsity boundaries for every...

> For `git grep $REVISION`, we will choose to honor the sparsity patterns
> only when $REVISION is a commit-ish object. The reason is that, for a
> tree, we don't know whether it represents the root of a repository or a
> subtree. So we wouldn't be able to correctly match it against the
> sparsity patterns. E.g. suppose we have a repository with these two
> sparsity rules: "/*" and "!/a"; and the following structure:
>
> /
> | - a (file)
> | - d (dir)
>     | - a (file)
>
> If `git grep $REVISION` were to honor the sparsity patterns for every
> object type, when grepping the /d tree, we would wrongly ignore the /d/a
> file. This happens because we wouldn't know it resides in /d and
> therefore it would wrongly match the pattern "!/a". Furthermore, for a
> search in a blob object, we wouldn't even have a path to check the
> patterns against. So, let's ignore the sparsity patterns when grepping
> non-commit-ish objects.

This doesn't actually make it clear how you handle $REVISION which is
a commit object; you focus so much on when $REVISION is just a tree
and contrasting that case that you omit the behavior for the case of
interest. Also, $REVISION to my mind implies "commit"; if you want to
imply that a commit or tree could be used, you'd use $TREE or
$TREE_ISH or something else. I think it'd make sense to cover all
three relevant cases in a single paragraph (thus combining with the
previous paragraph), and then add a second paragraph about the $TREE
case that streamlines the last two paragraphs above. So, perhaps we
can replace your paragraphs from "For the worktree case, we will not
grep paths..." all the way to "So, let's ignore the sparsity patterns
when grepping non-commit-ish objects" (after first moving the comment
about adding an option in the next commit to some other area of the
commit message, as discussed above) with something like the following:

  For the worktree and cached cases, we iterate over paths without the
  SKIP_WORKTREE bit set, and limit our searches to these paths. For
  the $REVISION case, we limit the paths we search to those that match
  the sparsity patterns. (We do not check the SKIP_WORKTREE bit for
  the $REVISION case, because $REVISION may contain paths that do not
  exist in HEAD and thus for which we have no SKIP_WORKTREE bit to
  consult. The sparsity patterns tell us how the SKIP_WORKTREE bit
  would be set if we were to check out $REVISION, so we consult those.
  Also, we don't use the sparsity patterns with the worktree or cached
  cases, both because we have a bit we can check directly and more
  efficiently, and because unmerged entries from a merge or a rebase
  could cause more files to temporarily be present than the sparsity
  patterns would normally select.)

  Note that there is a special case here: `git grep $TREE`. In this
  case we cannot know whether $TREE corresponds to the root of the
  repository or some sub-tree, and thus there is no way for us to know
  which sparsity patterns, if any, apply. So the $TREE case will not
  use sparsity patterns or any SKIP_WORKTREE bits and will instead
  always search all files within the $TREE.

>
> Note: The behavior introduced in this patch is what some users have
> reported[1] that they would like by default. But the old behavior is
> still desirable for some use cases. Therefore, the next patch will add
> an option to allow restoring it when needed.

This paragraph duplicates information you already stated previously.
It's much clearer than what you stated before, but if you just reword
the previous comments and combine them into the first paragraph, then
we can drop this final note.
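The root-versus-subtree ambiguity in the `git grep $TREE` case can be shown with a deliberately tiny model of the two rules from the commit message. This is a sketch only: `matches_anchored()` and `in_sparse_set()` are hypothetical helpers that handle nothing beyond a literal root-anchored pattern, whereas real sparsity patterns follow the much richer gitignore-style syntax.

```c
#include <assert.h>
#include <string.h>

/*
 * Toy matcher for a literal, root-anchored pattern such as "/a": it
 * matches only when the path, taken relative to the repository root,
 * equals the pattern without its leading slash.
 */
static int matches_anchored(const char *pattern, const char *path)
{
	assert(pattern && pattern[0] == '/' && path);
	return !strcmp(pattern + 1, path);
}

/*
 * Model of the rule pair from the commit message: include every
 * root-level entry, then exclude the root-level "a" via "!/a".
 * Returns 1 if the path belongs to the sparse set, 0 otherwise.
 */
static int in_sparse_set(const char *path)
{
	return !matches_anchored("/a", path);
}
```

Given the root-relative path, `in_sparse_set("d/a")` keeps the file. A walk that starts at the `d` tree without knowing its prefix only sees the entry name and would call `in_sparse_set("a")`, which excludes it. That is the wrong answer the commit message describes.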
> [1]: https://lore.kernel.org/git/CABPp-BGuFhDwWZBRaD3nA8ui46wor-4=Ha1G1oApsfF8KNpfGQ@mail.gmail.com/ > > Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br> > --- > builtin/grep.c | 125 ++++++++++++++++++++-- > t/t7011-skip-worktree-reading.sh | 9 -- > t/t7817-grep-sparse-checkout.sh | 174 +++++++++++++++++++++++++++++++ > 3 files changed, 291 insertions(+), 17 deletions(-) > create mode 100755 t/t7817-grep-sparse-checkout.sh > > diff --git a/builtin/grep.c b/builtin/grep.c > index a5056f395a..11e33b8aee 100644 > --- a/builtin/grep.c > +++ b/builtin/grep.c > @@ -410,7 +410,7 @@ static int grep_cache(struct grep_opt *opt, > const struct pathspec *pathspec, int cached); > static int grep_tree(struct grep_opt *opt, const struct pathspec *pathspec, > struct tree_desc *tree, struct strbuf *base, int tn_len, > - int check_attr); > + int is_root_tree); So you modified the forward declaration of grep_tree()... > > static int grep_submodule(struct grep_opt *opt, > const struct pathspec *pathspec, > @@ -508,6 +508,10 @@ static int grep_cache(struct grep_opt *opt, > > for (nr = 0; nr < repo->index->cache_nr; nr++) { > const struct cache_entry *ce = repo->index->cache[nr]; > + > + if (ce_skip_worktree(ce)) > + continue; > + > strbuf_setlen(&name, name_base_len); > strbuf_addstr(&name, ce->name); > > @@ -520,8 +524,7 @@ static int grep_cache(struct grep_opt *opt, > * cache entry are identical, even if worktree file has > * been modified, so use cache version instead > */ > - if (cached || (ce->ce_flags & CE_VALID) || > - ce_skip_worktree(ce)) { > + if (cached || (ce->ce_flags & CE_VALID)) { > if (ce_stage(ce) || ce_intent_to_add(ce)) > continue; > hit |= grep_oid(opt, &ce->oid, name.buf, > @@ -552,9 +555,76 @@ static int grep_cache(struct grep_opt *opt, > return hit; > } > > -static int grep_tree(struct grep_opt *opt, const struct pathspec *pathspec, > - struct tree_desc *tree, struct strbuf *base, int tn_len, > - int check_attr) Here the patch splits your handling of 
grep_tree()... > +static struct pattern_list *get_sparsity_patterns(struct repository *repo) > +{ > + struct pattern_list *patterns; > + char *sparse_file; > + int sparse_config, cone_config; > + > + if (repo_config_get_bool(repo, "core.sparsecheckout", &sparse_config) || > + !sparse_config) { > + return NULL; > + } Is core_apply_sparse_checkout not initialized for some reason? > + > + sparse_file = repo_git_path(repo, "info/sparse-checkout"); > + patterns = xcalloc(1, sizeof(*patterns)); > + > + if (repo_config_get_bool(repo, "core.sparsecheckoutcone", &cone_config)) > + cone_config = 0; > + patterns->use_cone_patterns = cone_config; Similarly, is core_sparse_checkout_cone not intialized? > + > + if (add_patterns_from_file_to_list(sparse_file, "", 0, patterns, NULL)) { > + if (file_exists(sparse_file)) { > + warning(_("failed to load sparse-checkout file: '%s'"), > + sparse_file); > + } > + free(sparse_file); > + free(patterns); > + return NULL; > + } > + > + free(sparse_file); > + return patterns; > +} > + > +static int in_sparse_checkout(struct strbuf *path, int prefix_len, This function name in_sparse_checkout() makes me think "Does the working tree represent a sparse checkout?" Perhaps we could rename it to path_matches_sparsity_patterns() ? Also, is there a reason we can't use dir.c's path_matches_pattern_list() here? How does this new function differ in behavior from that function? 
> + unsigned int entry_mode, > + struct index_state *istate, > + struct pattern_list *sparsity, > + enum pattern_match_result parent_match, > + enum pattern_match_result *match) > +{ > + int dtype = DT_UNKNOWN; > + int is_dir = S_ISDIR(entry_mode); > + > + if (parent_match == MATCHED_RECURSIVE) { > + *match = parent_match; > + return 1; > + } > + > + if (is_dir && !is_dir_sep(path->buf[path->len - 1])) > + strbuf_addch(path, '/'); > + > + *match = path_matches_pattern_list(path->buf, path->len, > + path->buf + prefix_len, &dtype, > + sparsity, istate); > + if (*match == UNDECIDED) > + *match = parent_match; > + > + if (is_dir) > + strbuf_trim_trailing_dir_sep(path); > + > + if (*match == NOT_MATCHED && > + (!is_dir || (is_dir && sparsity->use_cone_patterns))) > + return 0; > + > + return 1; > +} > + > +static int do_grep_tree(struct grep_opt *opt, const struct pathspec *pathspec, I thought this meant you were renaming grep_tree() to do_grep_tree(), but it's a new function that happens to have most of the logic from the old grep_tree() and which the new grep_tree() will call to do most of its work. 
> + struct tree_desc *tree, struct strbuf *base, int tn_len, > + int check_attr, struct pattern_list *sparsity, > + enum pattern_match_result default_sparsity_match) > { > struct repository *repo = opt->repo; > int hit = 0; > @@ -570,6 +640,7 @@ static int grep_tree(struct grep_opt *opt, const struct pathspec *pathspec, > > while (tree_entry(tree, &entry)) { > int te_len = tree_entry_len(&entry); > + enum pattern_match_result sparsity_match = 0; > > if (match != all_entries_interesting) { > strbuf_addstr(&name, base->buf + tn_len); > @@ -586,6 +657,19 @@ static int grep_tree(struct grep_opt *opt, const struct pathspec *pathspec, > > strbuf_add(base, entry.path, te_len); > > + if (sparsity) { > + struct strbuf path = STRBUF_INIT; > + strbuf_addstr(&path, base->buf + tn_len); > + > + if (!in_sparse_checkout(&path, old_baselen - tn_len, > + entry.mode, repo->index, > + sparsity, default_sparsity_match, > + &sparsity_match)) { > + strbuf_setlen(base, old_baselen); > + continue; > + } > + } > + > if (S_ISREG(entry.mode)) { > hit |= grep_oid(opt, &entry.oid, base->buf, tn_len, > check_attr ? base->buf + tn_len : NULL); > @@ -602,8 +686,8 @@ static int grep_tree(struct grep_opt *opt, const struct pathspec *pathspec, > > strbuf_addch(base, '/'); > init_tree_desc(&sub, data, size); > - hit |= grep_tree(opt, pathspec, &sub, base, tn_len, > - check_attr); > + hit |= do_grep_tree(opt, pathspec, &sub, base, tn_len, > + check_attr, sparsity, sparsity_match); > free(data); > } else if (recurse_submodules && S_ISGITLINK(entry.mode)) { > hit |= grep_submodule(opt, pathspec, &entry.oid, > @@ -621,6 +705,31 @@ static int grep_tree(struct grep_opt *opt, const struct pathspec *pathspec, > return hit; > } > > +/* > + * Note: sparsity patterns and paths' attributes will only be considered if > + * is_root_tree has true value. (Otherwise, we cannot properly perform pattern > + * matching on paths.) 
> + */ > +static int grep_tree(struct grep_opt *opt, const struct pathspec *pathspec, > + struct tree_desc *tree, struct strbuf *base, int tn_len, > + int is_root_tree) > +{ > + struct pattern_list *patterns = NULL; > + int ret; > + > + if (is_root_tree) > + patterns = get_sparsity_patterns(opt->repo); > + > + ret = do_grep_tree(opt, pathspec, tree, base, tn_len, is_root_tree, > + patterns, 0); > + > + if (patterns) { > + clear_pattern_list(patterns); > + free(patterns); > + } > + return ret; > +} Once I figured out grep_tree() was just becoming a wrapper around do_grep_tree(), the patch made more sense; I should have scrolled to the end quicker. ;-) > + > static int grep_object(struct grep_opt *opt, const struct pathspec *pathspec, > struct object *obj, const char *name, const char *path) > { > diff --git a/t/t7011-skip-worktree-reading.sh b/t/t7011-skip-worktree-reading.sh > index 37525cae3a..26852586ac 100755 > --- a/t/t7011-skip-worktree-reading.sh > +++ b/t/t7011-skip-worktree-reading.sh > @@ -109,15 +109,6 @@ test_expect_success 'ls-files --modified' ' > test -z "$(git ls-files -m)" > ' > > -test_expect_success 'grep with skip-worktree file' ' > - git update-index --no-skip-worktree 1 && > - echo test > 1 && > - git update-index 1 && > - git update-index --skip-worktree 1 && > - rm 1 && > - test "$(git grep --no-ext-grep test)" = "1:test" > -' Yaay! > - > echo ":000000 100644 $ZERO_OID $EMPTY_BLOB A 1" > expected > test_expect_success 'diff-index does not examine skip-worktree absent entries' ' > setup_absent && > diff --git a/t/t7817-grep-sparse-checkout.sh b/t/t7817-grep-sparse-checkout.sh > new file mode 100755 > index 0000000000..ce080cf572 > --- /dev/null > +++ b/t/t7817-grep-sparse-checkout.sh > @@ -0,0 +1,174 @@ > +#!/bin/sh > + > +test_description='grep in sparse checkout > + > +This test creates a repo with the following structure: > + > +. 
> +|-- a > +|-- b > +|-- dir > +| `-- c > +|-- sub > +| |-- A > +| | `-- a > +| `-- B > +| `-- b > +`-- sub2 > + `-- a > + > +Where . has non-cone mode sparsity patterns, sub is a submodule with cone mode Maybe "Where the outer repository has non-code mode..."? The use of '.' threw me for a bit. > +sparsity patterns and sub2 is a submodule that is excluded by the superproject > +sparsity patterns. The resulting sparse checkout should leave the following > +structure on the working tree: s/on the/in the/? > + > +. > +|-- a > +|-- sub > +| `-- B > +| `-- b > +`-- sub2 > + `-- a > + > +But note that sub2 should have the SKIP_WORKTREE bit set. > +' > + > +. ./test-lib.sh > + > +test_expect_success 'setup' ' > + echo "text" >a && > + echo "text" >b && > + mkdir dir && > + echo "text" >dir/c && > + > + git init sub && > + ( > + cd sub && > + mkdir A B && > + echo "text" >A/a && > + echo "text" >B/b && > + git add A B && > + git commit -m sub && > + git sparse-checkout init --cone && > + git sparse-checkout set B > + ) && > + > + git init sub2 && > + ( > + cd sub2 && > + echo "text" >a && > + git add a && > + git commit -m sub2 > + ) && > + > + git submodule add ./sub && > + git submodule add ./sub2 && > + git add a b dir && > + git commit -m super && > + git sparse-checkout init --no-cone && > + git sparse-checkout set "/*" "!b" "!/*/" "sub" && > + > + git tag -am tag-to-commit tag-to-commit HEAD && > + tree=$(git rev-parse HEAD^{tree}) && > + git tag -am tag-to-tree tag-to-tree $tree && > + > + test_path_is_missing b && > + test_path_is_missing dir && > + test_path_is_missing sub/A && > + test_path_is_file a && > + test_path_is_file sub/B/b && > + test_path_is_file sub2/a > +' > + > +# The test bellow checks a special case: the sparsity patterns exclude '/b' s/bellow/below/ > +# and sparse checkout is enable, but the path exists on the working tree (e.g. s/enable/enabled/, s/on/in/ > +# manually created after `git sparse-checkout init`). 
In this case, grep should > +# skip it. > +test_expect_success 'grep in working tree should honor sparse checkout' ' > + cat >expect <<-EOF && > + a:text > + EOF > + echo "new-text" >b && > + test_when_finished "rm b" && > + git grep "text" >actual && > + test_cmp expect actual > +' > + > +test_expect_success 'grep --cached should honor sparse checkout' ' > + cat >expect <<-EOF && > + a:text > + EOF > + git grep --cached "text" >actual && > + test_cmp expect actual > +' > + > +test_expect_success 'grep <commit-ish> should honor sparse checkout' ' > + commit=$(git rev-parse HEAD) && > + cat >expect_commit <<-EOF && > + $commit:a:text > + EOF > + cat >expect_tag-to-commit <<-EOF && > + tag-to-commit:a:text > + EOF > + git grep "text" $commit >actual_commit && > + test_cmp expect_commit actual_commit && > + git grep "text" tag-to-commit >actual_tag-to-commit && > + test_cmp expect_tag-to-commit actual_tag-to-commit > +' > + > +test_expect_success 'grep <tree-ish> should ignore sparsity patterns' ' > + commit=$(git rev-parse HEAD) && > + tree=$(git rev-parse HEAD^{tree}) && > + cat >expect_tree <<-EOF && > + $tree:a:text > + $tree:b:text > + $tree:dir/c:text > + EOF > + cat >expect_tag-to-tree <<-EOF && > + tag-to-tree:a:text > + tag-to-tree:b:text > + tag-to-tree:dir/c:text > + EOF > + git grep "text" $tree >actual_tree && > + test_cmp expect_tree actual_tree && > + git grep "text" tag-to-tree >actual_tag-to-tree && > + test_cmp expect_tag-to-tree actual_tag-to-tree > +' > + > +# Note that sub2/ is present in the worktree but it is excluded by the sparsity > +# patterns, so grep should not recurse into it. 
> +test_expect_success 'grep --recurse-submodules should honor sparse checkout in submodule' ' > + cat >expect <<-EOF && > + a:text > + sub/B/b:text > + EOF > + git grep --recurse-submodules "text" >actual && > + test_cmp expect actual > +' > + > +test_expect_success 'grep --recurse-submodules --cached should honor sparse checkout in submodule' ' > + cat >expect <<-EOF && > + a:text > + sub/B/b:text > + EOF > + git grep --recurse-submodules --cached "text" >actual && > + test_cmp expect actual > +' > + > +test_expect_success 'grep --recurse-submodules <commit-ish> should honor sparse checkout in submodule' ' > + commit=$(git rev-parse HEAD) && > + cat >expect_commit <<-EOF && > + $commit:a:text > + $commit:sub/B/b:text > + EOF > + cat >expect_tag-to-commit <<-EOF && > + tag-to-commit:a:text > + tag-to-commit:sub/B/b:text > + EOF > + git grep --recurse-submodules "text" $commit >actual_commit && > + test_cmp expect_commit actual_commit && > + git grep --recurse-submodules "text" tag-to-commit >actual_tag-to-commit && > + test_cmp expect_tag-to-commit actual_tag-to-commit > +' > + > +test_done > -- > 2.26.2 Looks good. Do we want to add a testcase where a file is unmerged and present in the working copy despite not matching the sparsity patterns (i.e. to emulate being in the middle of a merge/rebase/cherry-pick)? ^ permalink raw reply [flat|nested] 123+ messages in thread
* Re: [PATCH v3 4/5] grep: honor sparse checkout patterns 2020-05-30 15:48 ` Elijah Newren @ 2020-06-01 4:44 ` Matheus Tavares Bernardino 2020-06-03 2:38 ` Elijah Newren 0 siblings, 1 reply; 123+ messages in thread From: Matheus Tavares Bernardino @ 2020-06-01 4:44 UTC (permalink / raw) To: Elijah Newren Cc: Git Mailing List, Junio C Hamano, Derrick Stolee, Jonathan Tan On Sat, May 30, 2020 at 12:48 PM Elijah Newren <newren@gmail.com> wrote: > > On Wed, May 27, 2020 at 6:13 PM Matheus Tavares > <matheus.bernardino@usp.br> wrote: > > > > One of the main uses for a sparse checkout is to allow users to focus on > > the subset of files in a repository in which they are interested. But > > git-grep currently ignores the sparsity patterns and report all matches > > found outside this subset, which kind of goes in the opposite direction. > > Let's fix that, making it honor the sparsity boundaries for every > > grepping case where this is relevant: > > > > - git grep in worktree > > - git grep --cached > > - git grep $REVISION > > > > For the worktree case, we will not grep paths that have the > > SKIP_WORKTREE bit set, even if they are present for some reason (e.g. > > manually created after `git sparse-checkout init`). > > This seems worded to rise alarm bells and make users suspect > implementation difficulties or regrets rather than desired behavior. > It would be much better to word this simply as something like: > > For the worktree and cached cases, we iterate over paths without > the SKIP_WORKTREE bit set, and limit our searches to these paths. > > > But the next patch > > will add an option to do so. (See 'Note' below.) > > Because this was in the same paragraph as the previous sentence, it > made it sound like you were going to provide a special worktree-only > option to search outside the SKIP_WORKTREE bits. Very confusing. I > think I'd combine this sentence into the very first paragraph of the > commit message and massage the wording a little. 
Perhaps something > like: ...goes in the opposite direction. There are some use cases for > ignoring the sparsity patterns and the next commit will add an option > to obtain this behavior, but here we start by making grep honor the > sparsity boundaries for every... > > > For `git grep $REVISION`, we will choose to honor the sparsity patterns > > only when $REVISION is a commit-ish object. The reason is that, for a > > tree, we don't know whether it represents the root of a repository or a > > subtree. So we wouldn't be able to correctly match it against the > > sparsity patterns. E.g. suppose we have a repository with these two > > sparsity rules: "/*" and "!/a"; and the following structure: > > > > / > > | - a (file) > > | - d (dir) > > | - a (file) > > > > If `git grep $REVISION` were to honor the sparsity patterns for every > > object type, when grepping the /d tree, we would wrongly ignore the /d/a > > file. This happens because we wouldn't know it resides in /d and > > therefore it would wrongly match the pattern "!/a". Furthermore, for a > > search in a blob object, we wouldn't even have a path to check the > > patterns against. So, let's ignore the sparsity patterns when grepping > > non-commit-ish objects. > > This doesn't actually make it clear how you handle $REVISION which is > a commit object; you focus so much on when $REVISION is just a tree > and contrasting that case that you omit the behavior for the case of > interest. Also, $REVISION to my mind implies "commit"; if you want to > imply that a commit or tree could be used, you'd use $TREE or > $TREE_ISH or something else. I think it'd make sense to cover all > three relevant cases in a single paragraph (thus combining with the > previous paragraph), and then add a second paragraph about the $TREE > case that streamlines the last two paragraphs above. So, perhaps we > can replace your paragraphs from "For the worktree case, we will not grep > paths..."
all the way to "So, let's ignore the sparsity patterns when > grepping non-commit-ish objects" (after first moving the comment about > adding an option in the next commit to some other area of the commit > message, as discussed above) with something like the following: > > For the worktree and cached cases, we iterate over paths without > the SKIP_WORKTREE bit set, and limit our searches to these paths. For > the $REVISION case, we limit the paths we search to those that match > the sparsity patterns. (We do not check the SKIP_WORKTREE bit for the > $REVISION case, because $REVISION may contain paths that do not exist > in HEAD and thus for which we have no SKIP_WORKTREE bit to consult. > The sparsity patterns tell us how the SKIP_WORKTREE bit would be set > if we were to check out $REVISION, so we consult those. Also, we > don't use the sparsity paths with the worktree or cached cases, both > because we have a bit we can check directly and more efficiently, and > because unmerged entries from a merge or a rebase could cause more > files to temporarily be present than the sparsity patterns would > normally select.) > > Note that there is a special case here: `git grep $TREE`. In this > case we cannot know whether $TREE corresponds to the root of the > repository or some sub-tree, and thus there is no way for us to know > which sparsity patterns, if any, apply. So the $TREE case will not > use sparsity patterns or any SKIP_WORKTREE bits and will instead > always search all files within the $TREE. > > > > > Note: The behavior introduced in this patch is what some users have > > reported[1] that they would like by default. But the old behavior is > > still desirable for some use cases. Therefore, the next patch will add > > an option to allow restoring it when needed. > > This paragraph duplicates information you already stated previously.
> It's much clearer than what you stated before, but if you just reword > the previous comments and combine them into the first paragraph, then > we can drop this final note. All great suggestions! I will amend the commit message using your proposed paragraphs. Thanks! > > > > Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br> > > --- > > builtin/grep.c | 125 ++++++++++++++++++++-- > > t/t7011-skip-worktree-reading.sh | 9 -- > > t/t7817-grep-sparse-checkout.sh | 174 +++++++++++++++++++++++++++++++ [...] > > +static struct pattern_list *get_sparsity_patterns(struct repository *repo) > > +{ > > + struct pattern_list *patterns; > > + char *sparse_file; > > + int sparse_config, cone_config; > > + > > + if (repo_config_get_bool(repo, "core.sparsecheckout", &sparse_config) || > > + !sparse_config) { > > + return NULL; > > + } > > Is core_apply_sparse_checkout not initialized for some reason? It should be already initialized, yes. But we cannot rely on that as `repo` might be a submodule, and core_apply_sparse_checkout holds the configuration's value for `the_repository`. > > +static int in_sparse_checkout(struct strbuf *path, int prefix_len, > > This function name in_sparse_checkout() makes me think "Does the > working tree represent a sparse checkout?" Perhaps we could rename it > to path_matches_sparsity_patterns() ? > > Also, is there a reason we can't use dir.c's > path_matches_pattern_list() here? Oh, we do use path_matches_pattern_list() inside: > > + *match = path_matches_pattern_list(path->buf, path->len, > > + path->buf + prefix_len, &dtype, > > + sparsity, istate); > > + if (*match == UNDECIDED) > > + *match = parent_match; > How does this new function differ > in behavior from that function? The idea of in_sparse_checkout() is to implement a logic closer to what we have in clear_ce_flags_1(). 
Here, it is effectively a wrapper to path_matches_pattern_list() but with some extra logic to decide whether grep should search in a given entry, based on its mode, the match result against the sparsity patterns, and the result from the parent dir. > > diff --git a/t/t7817-grep-sparse-checkout.sh b/t/t7817-grep-sparse-checkout.sh > > new file mode 100755 > > index 0000000000..ce080cf572 > > --- /dev/null > > +++ b/t/t7817-grep-sparse-checkout.sh > > @@ -0,0 +1,174 @@ > > +#!/bin/sh > > + > > +test_description='grep in sparse checkout > > + > > +This test creates a repo with the following structure: > > + > > +. > > +|-- a > > +|-- b > > +|-- dir > > +| `-- c > > +|-- sub > > +| |-- A > > +| | `-- a > > +| `-- B > > +| `-- b > > +`-- sub2 > > + `-- a > > + > > +Where . has non-cone mode sparsity patterns, sub is a submodule with cone mode > > Maybe "Where the outer repository has non-code mode..."? The use of > '.' threw me for a bit. Sure! > > +test_done > > -- > > 2.26.2 > > Looks good. Do we want to add a testcase where a file is unmerged and > present in the working copy despite not matching the sparsity patterns > (i.e. to emulate being in the middle of a merge/rebase/cherry-pick)? Sure, I can add that. But after a quick test here, it seems that the unmerged path doesn't have the SKIP_WORKTREE bit set. Is this how it should be? ^ permalink raw reply [flat|nested] 123+ messages in thread
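Editorial illustration of the root-tree problem described in the quoted commit message above: a minimal, self-contained C sketch, not Git code. It assumes a toy stand-in for the two sparsity rules from the example (include everything, then exclude the top-level file "a"); the names toy_in_sparse_set() and toy_entry_is_searched() are hypothetical and exist only for this sketch. The point is that the same entry name gives a different answer depending on whether the tree's location relative to the repository root is known.

```c
#include <stdio.h>
#include <string.h>

/*
 * Toy stand-in for the example rules in the commit message: everything
 * is included except the top-level file "a" (the "!/a" rule). This is
 * NOT Git's pattern matcher. `path` must be relative to the repository
 * root and must not start with a slash.
 */
static int toy_in_sparse_set(const char *path)
{
	return strcmp(path, "a") != 0;
}

/*
 * Hypothetical helper: join the tree's location (`prefix`, "" for the
 * root tree) with the entry name and check the result against the toy
 * rules. Returns 1 if the entry would be searched, 0 if it is skipped.
 */
static int toy_entry_is_searched(const char *prefix, const char *name)
{
	char full[256];

	snprintf(full, sizeof(full), "%s%s", prefix, name);
	return toy_in_sparse_set(full);
}
```

With the prefix "d/" known, the entry "a" resolves to "d/a" and is searched. If the /d tree is passed on its own and treated as a root (prefix ""), the same entry looks like the top-level "a" and is wrongly skipped, which is why the patch only applies sparsity patterns when it starts from a commit-ish.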
* Re: [PATCH v3 4/5] grep: honor sparse checkout patterns 2020-06-01 4:44 ` Matheus Tavares Bernardino @ 2020-06-03 2:38 ` Elijah Newren 2020-06-10 17:08 ` Matheus Tavares Bernardino 0 siblings, 1 reply; 123+ messages in thread From: Elijah Newren @ 2020-06-03 2:38 UTC (permalink / raw) To: Matheus Tavares Bernardino Cc: Git Mailing List, Junio C Hamano, Derrick Stolee, Jonathan Tan On Sun, May 31, 2020 at 9:44 PM Matheus Tavares Bernardino <matheus.bernardino@usp.br> wrote: > > On Sat, May 30, 2020 at 12:48 PM Elijah Newren <newren@gmail.com> wrote: > > > > On Wed, May 27, 2020 at 6:13 PM Matheus Tavares > > <matheus.bernardino@usp.br> wrote: > > > [...] > > > +static struct pattern_list *get_sparsity_patterns(struct repository *repo) > > > +{ > > > + struct pattern_list *patterns; > > > + char *sparse_file; > > > + int sparse_config, cone_config; > > > + > > > + if (repo_config_get_bool(repo, "core.sparsecheckout", &sparse_config) || > > > + !sparse_config) { > > > + return NULL; > > > + } > > > > Is core_apply_sparse_checkout not initialized for some reason? > > It should be already initialized, yes. But we cannot rely on that as > `repo` might be a submodule, and core_apply_sparse_checkout holds the > configuration's value for `the_repository`. Ah, gotcha. Thanks for straightening me out. > > > +static int in_sparse_checkout(struct strbuf *path, int prefix_len, > > > > This function name in_sparse_checkout() makes me think "Does the > > working tree represent a sparse checkout?" Perhaps we could rename it > > to path_matches_sparsity_patterns() ? > > > > Also, is there a reason we can't use dir.c's > > path_matches_pattern_list() here? > > Oh, we do use path_matches_pattern_list() inside: > > > > + *match = path_matches_pattern_list(path->buf, path->len, > > > + path->buf + prefix_len, &dtype, > > > + sparsity, istate); > > > + if (*match == UNDECIDED) > > > + *match = parent_match; > > > How does this new function differ > > in behavior from that function? 
> > The idea of in_sparse_checkout() is to implement a logic closer to > what we have in clear_ce_flags_1(). Here, it is effectively a wrapper > to path_matches_pattern_list() but with some extra logic to decide > whether grep should search in a given entry, based on its mode, the > match result against the sparsity patterns, and the result from the > parent dir. I've had this response and one to 5/5 sitting in my draft folder for over a day because I was hoping to go read clear_ce_flags_1() and find out what it is. I have no idea, so your answer doesn't answer my question... ;-) I'll try to find some time and maybe respond further after I do. > > > > diff --git a/t/t7817-grep-sparse-checkout.sh b/t/t7817-grep-sparse-checkout.sh > > > new file mode 100755 > > > index 0000000000..ce080cf572 > > > --- /dev/null > > > +++ b/t/t7817-grep-sparse-checkout.sh > > > @@ -0,0 +1,174 @@ > > > +#!/bin/sh > > > + > > > +test_description='grep in sparse checkout > > > + > > > +This test creates a repo with the following structure: > > > + > > > +. > > > +|-- a > > > +|-- b > > > +|-- dir > > > +| `-- c > > > +|-- sub > > > +| |-- A > > > +| | `-- a > > > +| `-- B > > > +| `-- b > > > +`-- sub2 > > > + `-- a > > > + > > > +Where . has non-cone mode sparsity patterns, sub is a submodule with cone mode > > > > Maybe "Where the outer repository has non-code mode..."? The use of > > '.' threw me for a bit. > > Sure! > > > > +test_done > > > -- > > > 2.26.2 > > > > Looks good. Do we want to add a testcase where a file is unmerged and > > present in the working copy despite not matching the sparsity patterns > > (i.e. to emulate being in the middle of a merge/rebase/cherry-pick)? > > Sure, I can add that. But after a quick test here, it seems that the > unmerged path doesn't have the SKIP_WORKTREE bit set. Is this how it > should be? Right, the merge machinery will clear the SKIP_WORKTREE bit when it writes out conflicted files. 
Also, any future 'git sparse-checkout' commands will see the unmerged entry and avoid marking it as SKIP_WORKTREE even though it doesn't match the sparsity patterns. Thus, grep doesn't have to do any special checking for whether the files are merged or not, and from your current implementation probably doesn't look like a special case at all -- you just check the SKIP_WORKTREE bit. However, I think the test still has value because the test enforces that other areas of the code (merge, sparse-checkout) don't break the invariants that grep is relying on. (I could see someone making a merge change that keeps the SKIP_WORKTREE bit accidentally set even though it writes the file out to the working tree, for example.) Sure, merge has some tests around that, so it might be viewed as slightly duplicative, but I see it as an interesting edge case that exercises whether the SKIP_WORKTREE bit should really be set and since grep expects a certain invariant about how that is handled, the testcase will help make sure our expectations aren't violated. ^ permalink raw reply [flat|nested] 123+ messages in thread
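Editorial illustration of the invariant discussed in this message: a minimal, self-contained C sketch that models the per-entry decision in the patched grep_cache() loop as it appears in the quoted diff (skip entries with the SKIP_WORKTREE bit; otherwise search the blob for --cached or CE_VALID entries unless they are unmerged or intent-to-add; otherwise search the working tree file). It is not Git code: struct toy_entry, enum toy_action, and toy_grep_cache_action() are hypothetical names, and the flags are plain ints rather than Git's cache entry flags.

```c
/* What the modeled loop does with one index entry. */
enum toy_action {
	TOY_SKIP,      /* entry is not searched */
	TOY_GREP_BLOB, /* search the index (cached) version */
	TOY_GREP_FILE  /* search the file in the working tree */
};

/* Simplified index entry; each field models one bit the loop checks. */
struct toy_entry {
	const char *name;
	int stage;         /* non-zero for an unmerged entry */
	int skip_worktree; /* models the SKIP_WORKTREE bit */
	int assume_valid;  /* models CE_VALID */
	int intent_to_add; /* models an intent-to-add entry */
};

/*
 * Mirrors the order of checks in the quoted diff: the SKIP_WORKTREE
 * check comes first, so an entry that wrongly keeps the bit is never
 * searched, even if its file exists in the working tree.
 */
static enum toy_action toy_grep_cache_action(const struct toy_entry *e,
					     int cached)
{
	if (e->skip_worktree)
		return TOY_SKIP;
	if (cached || e->assume_valid) {
		if (e->stage || e->intent_to_add)
			return TOY_SKIP;
		return TOY_GREP_BLOB;
	}
	return TOY_GREP_FILE;
}
```

In this model, a conflicted entry whose SKIP_WORKTREE bit was cleared by the merge machinery is searched in the working tree, while the same entry with the bit accidentally left set would be silently skipped. That second case is the regression the proposed test would guard against.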
* Re: [PATCH v3 4/5] grep: honor sparse checkout patterns 2020-06-03 2:38 ` Elijah Newren @ 2020-06-10 17:08 ` Matheus Tavares Bernardino 0 siblings, 0 replies; 123+ messages in thread From: Matheus Tavares Bernardino @ 2020-06-10 17:08 UTC (permalink / raw) To: Elijah Newren Cc: Git Mailing List, Junio C Hamano, Derrick Stolee, Jonathan Tan On Tue, Jun 2, 2020 at 11:38 PM Elijah Newren <newren@gmail.com> wrote: > > On Sun, May 31, 2020 at 9:44 PM Matheus Tavares Bernardino > <matheus.bernardino@usp.br> wrote: > > > > On Sat, May 30, 2020 at 12:48 PM Elijah Newren <newren@gmail.com> wrote: > > > > > > On Wed, May 27, 2020 at 6:13 PM Matheus Tavares > > > <matheus.bernardino@usp.br> wrote: > > > > > > > > +static int in_sparse_checkout(struct strbuf *path, int prefix_len, > > > > > > This function name in_sparse_checkout() makes me think "Does the > > > working tree represent a sparse checkout?" Perhaps we could rename it > > > to path_matches_sparsity_patterns() ? > > > > > > Also, is there a reason we can't use dir.c's > > > path_matches_pattern_list() here? > > > > Oh, we do use path_matches_pattern_list() inside: > > > > > > + *match = path_matches_pattern_list(path->buf, path->len, > > > > + path->buf + prefix_len, &dtype, > > > > + sparsity, istate); > > > > + if (*match == UNDECIDED) > > > > + *match = parent_match; > > > > > How does this new function differ > > > in behavior from that function? > > > > The idea of in_sparse_checkout() is to implement a logic closer to > > what we have in clear_ce_flags_1(). Here, it is effectively a wrapper > > to path_matches_pattern_list() but with some extra logic to decide > > whether grep should search in a given entry, based on its mode, the > > match result against the sparsity patterns, and the result from the > > parent dir. > > I've had this response and one to 5/5 sitting in my draft folder for > over a day because I was hoping to go read clear_ce_flags_1() and find > out what it is. 
I have no idea, so your answer doesn't answer my > question... ;-) I'll try to find some time and maybe respond further > after I do. Oops, sorry for the incomplete answer. clear_ce_flags() recursively traverses the index entries, unsetting the bits specified in a given mask when the entry matches a given pattern list. (It is used in unpack-trees.c:mark_new_skip_worktree() to clear the CE_NEW_SKIP_WORKTREE bit for the matched entries.) clear_ce_flags() does use path_matches_pattern_list() but it also has to check some additional rules for cone mode (as there might be recursive matches/non-matches). These rules are implemented in clear_ce_flags_dir(). in_sparse_checkout() is a small wrapper around path_matches_pattern_list() with (1) the additional checks for cone mode, similar to what clear_ce_flags_dir() implements, and (2) the usage of the parent dir's match_result when undecided about the current path. We could just implement this directly in grep_tree(), but I thought that isolating this logic into its own static function would make grep_tree() more readable. > > > > diff --git a/t/t7817-grep-sparse-checkout.sh b/t/t7817-grep-sparse-checkout.sh > > > > new file mode 100755 > > > > index 0000000000..ce080cf572 > > > > --- /dev/null > > > > +++ b/t/t7817-grep-sparse-checkout.sh > > > > > > Looks good. Do we want to add a testcase where a file is unmerged and > > > present in the working copy despite not matching the sparsity patterns > > > (i.e. to emulate being in the middle of a merge/rebase/cherry-pick)? > > > > Sure, I can add that. But after a quick test here, it seems that the > > unmerged path doesn't have the SKIP_WORKTREE bit set. Is this how it > > should be? > > Right, the merge machinery will clear the SKIP_WORKTREE bit when it > writes out conflicted files. Also, any future 'git sparse-checkout' > commands will see the unmerged entry and avoid marking it as > SKIP_WORKTREE even though it doesn't match the sparsity patterns. 
> Thus, grep doesn't have to do any special checking for whether the > files are merged or not, and from your current implementation probably > doesn't look like a special case at all -- you just check the > SKIP_WORKTREE bit. > > However, I think the test still has value because the test enforces > that other areas of the code (merge, sparse-checkout) don't break the > invariants that grep is relying on. (I could see someone making a > merge change that keeps the SKIP_WORKTREE bit accidentally set even > though it writes the file out to the working tree, for example.) > Sure, merge has some tests around that, so it might be viewed as > slightly duplicative, but I see it as an interesting edge case that > exercises whether the SKIP_WORKTREE bit should really be set and since > grep expects a certain invariant about how that is handled, the > testcase will help make sure our expectations aren't violated. OK. I will add this test for the next version. ^ permalink raw reply [flat|nested] 123+ messages in thread
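Editorial illustration of the in_sparse_checkout() logic explained in this message: a minimal, self-contained C sketch of the decision taken from the quoted diff. It is not Git code. The result that path_matches_pattern_list() would return is passed in as `own_match` instead of being computed, and enum toy_match and toy_in_sparse_checkout() are hypothetical names whose values are not meant to match Git's enum pattern_match_result.

```c
/* Possible results of matching a path against the sparsity patterns. */
enum toy_match {
	TOY_UNDECIDED,
	TOY_NOT_MATCHED,
	TOY_MATCHED,
	TOY_MATCHED_RECURSIVE
};

/*
 * Returns 1 if grep should search (or descend into) the entry and 0 if
 * it should skip it. `*result` receives the match value that would be
 * handed down to the entry's children as their `parent_match`.
 */
static int toy_in_sparse_checkout(enum toy_match parent_match,
				  enum toy_match own_match,
				  int is_dir, int cone_mode,
				  enum toy_match *result)
{
	/* A recursive match on the parent covers everything below it. */
	if (parent_match == TOY_MATCHED_RECURSIVE) {
		*result = parent_match;
		return 1;
	}

	/* No pattern decided this path: inherit the parent's result. */
	*result = own_match;
	if (*result == TOY_UNDECIDED)
		*result = parent_match;

	/*
	 * Unmatched files are always skipped. Unmatched directories are
	 * skipped only in cone mode; in non-cone mode a deeper path may
	 * still match, so the walk has to descend.
	 */
	if (*result == TOY_NOT_MATCHED && (!is_dir || cone_mode))
		return 0;

	return 1;
}
```

The two extras on top of the plain pattern lookup are visible here: the parent directory's result is used when the path itself is undecided, and the cone-mode flag decides whether an unmatched directory can be pruned without descending into it.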
* [PATCH v3 5/5] config: add setting to ignore sparsity patterns in some cmds 2020-05-28 1:12 ` [PATCH v3 0/5] grep: honor sparse checkout and add option to ignore it Matheus Tavares ` (3 preceding siblings ...) 2020-05-28 1:13 ` [PATCH v3 4/5] grep: honor sparse checkout patterns Matheus Tavares @ 2020-05-28 1:13 ` Matheus Tavares 2020-05-30 16:18 ` Elijah Newren 2020-06-12 15:44 ` [PATCH v4 0/6] grep: honor sparse checkout and add option to ignore it Matheus Tavares 5 siblings, 1 reply; 123+ messages in thread From: Matheus Tavares @ 2020-05-28 1:13 UTC (permalink / raw) To: git; +Cc: gitster, stolee, newren, jonathantanmy When sparse checkout is enabled, some users expect the output of certain commands (such as grep, diff, and log) to be also restricted within the sparsity patterns. This would allow them to effectively work only on the subset of files in which they are interested; and allow some commands to possibly perform better, by not considering uninteresting paths. For this reason, we taught grep to honor the sparsity patterns, in the previous patch. But, on the other hand, allowing grep and the other commands mentioned to optionally ignore the patterns also makes for some interesting use cases. E.g. using grep to search for a function's documentation that resides outside the sparse checkout. In any case, there is no current way for users to configure the behavior they want for these commands. Aiming to provide this flexibility, let's introduce the sparse.restrictCmds setting (and the analogous --[no-]restrict-to-sparse-paths global option). The default value is true. For now, grep is the only one affected by this setting, but the goal is to have support for more commands, in the future.
Helped-by: Elijah Newren <newren@gmail.com> Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br> --- Documentation/config.txt | 2 + Documentation/config/sparse.txt | 24 +++++ Documentation/git-grep.txt | 3 + Documentation/git.txt | 4 + Makefile | 1 + builtin/grep.c | 13 ++- contrib/completion/git-completion.bash | 2 + git.c | 6 ++ sparse-checkout.c | 16 +++ sparse-checkout.h | 11 +++ t/t7817-grep-sparse-checkout.sh | 132 ++++++++++++++++++++++++- t/t9902-completion.sh | 4 +- 12 files changed, 212 insertions(+), 6 deletions(-) create mode 100644 Documentation/config/sparse.txt create mode 100644 sparse-checkout.c create mode 100644 sparse-checkout.h diff --git a/Documentation/config.txt b/Documentation/config.txt index ef0768b91a..fd74b80302 100644 --- a/Documentation/config.txt +++ b/Documentation/config.txt @@ -436,6 +436,8 @@ include::config/sequencer.txt[] include::config/showbranch.txt[] +include::config/sparse.txt[] + include::config/splitindex.txt[] include::config/ssh.txt[] diff --git a/Documentation/config/sparse.txt b/Documentation/config/sparse.txt new file mode 100644 index 0000000000..2a25b4b8ef --- /dev/null +++ b/Documentation/config/sparse.txt @@ -0,0 +1,24 @@ +sparse.restrictCmds:: + Only meaningful in conjunction with core.sparseCheckout. This option + extends sparse checkouts (which limit which paths are written to the + working tree), so that output and operations are also limited to the + sparsity paths where possible and implemented. The purpose of this + option is to (1) focus output for the user on the portion of the + repository that is of interest to them, and (2) enable potentially + dramatic performance improvements, especially in conjunction with + partial clones. ++ +When this option is true (default), some git commands may limit their behavior +to the paths specified by the sparsity patterns, or to the intersection of +those paths and any (like `*.c`) that the user might also specify on the +command line. 
When false, the affected commands will work on full trees, +ignoring the sparsity patterns. For now, only git-grep honors this setting. In +this command, the restriction takes effect in three cases: with --cached; when +a commit-ish is given; when searching a working tree where some paths excluded +by the sparsity patterns are present (e.g. manually created paths or not +removed submodules). ++ +Note: commands which export, integrity check, or create history will always +operate on full trees (e.g. fast-export, format-patch, fsck, commit, etc.), +unaffected by any sparsity patterns. Also, writting commands such as +sparse-checkout and read-tree will not be affected by this configuration. diff --git a/Documentation/git-grep.txt b/Documentation/git-grep.txt index 9bdf807584..abbf100109 100644 --- a/Documentation/git-grep.txt +++ b/Documentation/git-grep.txt @@ -41,6 +41,9 @@ characters. An empty string as search expression matches all lines. CONFIGURATION ------------- +git-grep honors the sparse.restrictCmds setting. See its definition in +linkgit:git-config[1]. + :git-grep: 1 include::config/grep.txt[] diff --git a/Documentation/git.txt b/Documentation/git.txt index 9d6769e95a..5e107c6246 100644 --- a/Documentation/git.txt +++ b/Documentation/git.txt @@ -180,6 +180,10 @@ If you just want to run git as if it was started in `<path>` then use Do not perform optional operations that require locks. This is equivalent to setting the `GIT_OPTIONAL_LOCKS` to `0`. +--[no-]restrict-to-sparse-paths:: + Overrides the sparse.restrictCmds configuration (see + linkgit:git-config[1]) for this execution. + --list-cmds=group[,group...]:: List commands by group. This is an internal/experimental option and may change or be removed in the future. 
Supported diff --git a/Makefile b/Makefile index 90aa329eb7..0c0013b32c 100644 --- a/Makefile +++ b/Makefile @@ -983,6 +983,7 @@ LIB_OBJS += sha1-name.o LIB_OBJS += shallow.o LIB_OBJS += sideband.o LIB_OBJS += sigchain.o +LIB_OBJS += sparse-checkout.o LIB_OBJS += split-index.o LIB_OBJS += stable-qsort.o LIB_OBJS += strbuf.o diff --git a/builtin/grep.c b/builtin/grep.c index 11e33b8aee..cc696dab4a 100644 --- a/builtin/grep.c +++ b/builtin/grep.c @@ -25,6 +25,7 @@ #include "submodule-config.h" #include "object-store.h" #include "packfile.h" +#include "sparse-checkout.h" static char const * const grep_usage[] = { N_("git grep [<options>] [-e] <pattern> [<rev>...] [[--] <path>...]"), @@ -498,6 +499,7 @@ static int grep_cache(struct grep_opt *opt, int nr; struct strbuf name = STRBUF_INIT; int name_base_len = 0; + int sparse_paths_only = restrict_to_sparse_paths(repo); if (repo->submodule_prefix) { name_base_len = strlen(repo->submodule_prefix); strbuf_addstr(&name, repo->submodule_prefix); @@ -509,7 +511,7 @@ static int grep_cache(struct grep_opt *opt, for (nr = 0; nr < repo->index->cache_nr; nr++) { const struct cache_entry *ce = repo->index->cache[nr]; - if (ce_skip_worktree(ce)) + if (sparse_paths_only && ce_skip_worktree(ce)) continue; strbuf_setlen(&name, name_base_len); @@ -715,9 +717,10 @@ static int grep_tree(struct grep_opt *opt, const struct pathspec *pathspec, int is_root_tree) { struct pattern_list *patterns = NULL; + int sparse_paths_only = restrict_to_sparse_paths(opt->repo); int ret; - if (is_root_tree) + if (is_root_tree && sparse_paths_only) patterns = get_sparsity_patterns(opt->repo); ret = do_grep_tree(opt, pathspec, tree, base, tn_len, is_root_tree, @@ -1257,6 +1260,12 @@ int cmd_grep(int argc, const char **argv, const char *prefix) if (!use_index || untracked) { int use_exclude = (opt_exclude < 0) ? 
use_index : !!opt_exclude; + + if (opt_restrict_to_sparse_paths >= 0) { + die(_("--[no-]restrict-to-sparse-paths is incompatible" + " with --no-index and --untracked")); + } + hit = grep_directory(&opt, &pathspec, use_exclude, use_index); } else if (0 <= opt_exclude) { die(_("--[no-]exclude-standard cannot be used for tracked contents")); diff --git a/contrib/completion/git-completion.bash b/contrib/completion/git-completion.bash index 70ad04e1b2..71956f7313 100644 --- a/contrib/completion/git-completion.bash +++ b/contrib/completion/git-completion.bash @@ -3208,6 +3208,8 @@ __git_main () --namespace= --no-replace-objects --help + --restrict-to-sparse-paths + --no-restrict-to-sparse-paths " ;; *) diff --git a/git.c b/git.c index a2d337eed7..6db1382ae4 100644 --- a/git.c +++ b/git.c @@ -38,6 +38,7 @@ const char git_more_info_string[] = "See 'git help git' for an overview of the system."); static int use_pager = -1; +int opt_restrict_to_sparse_paths = -1; static void list_builtins(struct string_list *list, unsigned int exclude_option); @@ -311,6 +312,10 @@ static int handle_options(const char ***argv, int *argc, int *envchanged) } else { exit(list_cmds(cmd)); } + } else if (!strcmp(cmd, "--restrict-to-sparse-paths")) { + opt_restrict_to_sparse_paths = 1; + } else if (!strcmp(cmd, "--no-restrict-to-sparse-paths")) { + opt_restrict_to_sparse_paths = 0; } else { fprintf(stderr, _("unknown option: %s\n"), cmd); usage(git_usage_string); @@ -319,6 +324,7 @@ static int handle_options(const char ***argv, int *argc, int *envchanged) (*argv)++; (*argc)--; } + return (*argv) - orig_argv; } diff --git a/sparse-checkout.c b/sparse-checkout.c new file mode 100644 index 0000000000..9a9e50fd29 --- /dev/null +++ b/sparse-checkout.c @@ -0,0 +1,16 @@ +#include "cache.h" +#include "config.h" +#include "sparse-checkout.h" + +int restrict_to_sparse_paths(struct repository *repo) +{ + int ret; + + if (opt_restrict_to_sparse_paths >= 0) + return opt_restrict_to_sparse_paths; + + if 
(repo_config_get_bool(repo, "sparse.restrictcmds", &ret)) + ret = 1; + + return ret; +} diff --git a/sparse-checkout.h b/sparse-checkout.h new file mode 100644 index 0000000000..1de3b588d8 --- /dev/null +++ b/sparse-checkout.h @@ -0,0 +1,11 @@ +#ifndef SPARSE_CHECKOUT_H +#define SPARSE_CHECKOUT_H + +struct repository; + +extern int opt_restrict_to_sparse_paths; /* from git.c */ + +/* Whether or not cmds should restrict behavior on sparse paths, in this repo */ +int restrict_to_sparse_paths(struct repository *repo); + +#endif /* SPARSE_CHECKOUT_H */ diff --git a/t/t7817-grep-sparse-checkout.sh b/t/t7817-grep-sparse-checkout.sh index ce080cf572..1aef084186 100755 --- a/t/t7817-grep-sparse-checkout.sh +++ b/t/t7817-grep-sparse-checkout.sh @@ -80,10 +80,10 @@ test_expect_success 'setup' ' test_path_is_file sub2/a ' -# The test bellow checks a special case: the sparsity patterns exclude '/b' +# The two tests bellow check a special case: the sparsity patterns exclude '/b' # and sparse checkout is enable, but the path exists on the working tree (e.g. # manually created after `git sparse-checkout init`). In this case, grep should -# skip it. +# skip the file by default, but not with --no-restrict-to-sparse-paths. 
test_expect_success 'grep in working tree should honor sparse checkout' ' cat >expect <<-EOF && a:text @@ -93,6 +93,16 @@ test_expect_success 'grep in working tree should honor sparse checkout' ' git grep "text" >actual && test_cmp expect actual ' +test_expect_success 'grep w/ --no-restrict-to-sparse-paths for sparsely excluded but present paths' ' + cat >expect <<-EOF && + a:text + b:new-text + EOF + echo "new-text" >b && + test_when_finished "rm b" && + git --no-restrict-to-sparse-paths grep "text" >actual && + test_cmp expect actual +' test_expect_success 'grep --cached should honor sparse checkout' ' cat >expect <<-EOF && @@ -136,7 +146,7 @@ test_expect_success 'grep <tree-ish> should ignore sparsity patterns' ' ' # Note that sub2/ is present in the worktree but it is excluded by the sparsity -# patterns, so grep should not recurse into it. +# patterns, so grep should only recurse into it with --no-restrict-to-sparse-paths. test_expect_success 'grep --recurse-submodules should honor sparse checkout in submodule' ' cat >expect <<-EOF && a:text @@ -145,6 +155,15 @@ test_expect_success 'grep --recurse-submodules should honor sparse checkout in s git grep --recurse-submodules "text" >actual && test_cmp expect actual ' +test_expect_success 'grep --recurse-submodules should search in excluded submodules w/ --no-restrict-to-sparse-paths' ' + cat >expect <<-EOF && + a:text + sub/B/b:text + sub2/a:text + EOF + git --no-restrict-to-sparse-paths grep --recurse-submodules "text" >actual && + test_cmp expect actual +' test_expect_success 'grep --recurse-submodules --cached should honor sparse checkout in submodule' ' cat >expect <<-EOF && @@ -171,4 +190,111 @@ test_expect_success 'grep --recurse-submodules <commit-ish> should honor sparse test_cmp expect_tag-to-commit actual_tag-to-commit ' +for cmd in 'git --no-restrict-to-sparse-paths grep' \ + 'git -c sparse.restrictCmds=false grep' \ + 'git -c sparse.restrictCmds=true --no-restrict-to-sparse-paths grep' +do + + 
test_expect_success "$cmd --cached should ignore sparsity patterns" ' + cat >expect <<-EOF && + a:text + b:text + dir/c:text + EOF + $cmd --cached "text" >actual && + test_cmp expect actual + ' + + test_expect_success "$cmd <commit-ish> should ignore sparsity patterns" ' + commit=$(git rev-parse HEAD) && + cat >expect_commit <<-EOF && + $commit:a:text + $commit:b:text + $commit:dir/c:text + EOF + cat >expect_tag-to-commit <<-EOF && + tag-to-commit:a:text + tag-to-commit:b:text + tag-to-commit:dir/c:text + EOF + $cmd "text" $commit >actual_commit && + test_cmp expect_commit actual_commit && + $cmd "text" tag-to-commit >actual_tag-to-commit && + test_cmp expect_tag-to-commit actual_tag-to-commit + ' +done + +test_expect_success 'grep --recurse-submodules --cached \w --no-restrict-to-sparse-paths' ' + cat >expect <<-EOF && + a:text + b:text + dir/c:text + sub/A/a:text + sub/B/b:text + sub2/a:text + EOF + git --no-restrict-to-sparse-paths grep --recurse-submodules --cached \ + "text" >actual && + test_cmp expect actual +' + +test_expect_success 'grep --recurse-submodules <commit-ish> \w --no-restrict-to-sparse-paths' ' + commit=$(git rev-parse HEAD) && + cat >expect_commit <<-EOF && + $commit:a:text + $commit:b:text + $commit:dir/c:text + $commit:sub/A/a:text + $commit:sub/B/b:text + $commit:sub2/a:text + EOF + cat >expect_tag-to-commit <<-EOF && + tag-to-commit:a:text + tag-to-commit:b:text + tag-to-commit:dir/c:text + tag-to-commit:sub/A/a:text + tag-to-commit:sub/B/b:text + tag-to-commit:sub2/a:text + EOF + git --no-restrict-to-sparse-paths grep --recurse-submodules "text" \ + $commit >actual_commit && + test_cmp expect_commit actual_commit && + git --no-restrict-to-sparse-paths grep --recurse-submodules "text" \ + tag-to-commit >actual_tag-to-commit && + test_cmp expect_tag-to-commit actual_tag-to-commit +' + +test_expect_success 'should respect the sparse.restrictCmds values from submodules' ' + cat >expect <<-EOF && + a:text + sub/A/a:text + sub/B/b:text + EOF + 
test_config -C sub sparse.restrictCmds false && + git grep --cached --recurse-submodules "text" >actual && + test_cmp expect actual +' + +test_expect_success 'should propagate --[no]-restrict-to-sparse-paths to submodules' ' + cat >expect <<-EOF && + a:text + b:text + dir/c:text + sub/A/a:text + sub/B/b:text + sub2/a:text + EOF + test_config -C sub sparse.restrictCmds true && + git --no-restrict-to-sparse-paths grep --cached --recurse-submodules "text" >actual && + test_cmp expect actual +' + +for opt in '--untracked' '--no-index' +do + test_expect_success "--[no]-restrict-to-sparse-paths and $opt are incompatible" " + test_must_fail git --restrict-to-sparse-paths grep $opt . 2>actual && + test_i18ngrep 'restrict-to-sparse-paths is incompatible with' actual + " +done + test_done diff --git a/t/t9902-completion.sh b/t/t9902-completion.sh index 3c44af6940..a4a7767e06 100755 --- a/t/t9902-completion.sh +++ b/t/t9902-completion.sh @@ -1473,6 +1473,8 @@ test_expect_success 'double dash "git" itself' ' --namespace= --no-replace-objects Z --help Z + --restrict-to-sparse-paths Z + --no-restrict-to-sparse-paths Z EOF ' @@ -1515,7 +1517,7 @@ test_expect_success 'general options' ' test_completion "git --nam" "--namespace=" && test_completion "git --bar" "--bare " && test_completion "git --inf" "--info-path " && - test_completion "git --no-r" "--no-replace-objects " + test_completion "git --no-rep" "--no-replace-objects " ' test_expect_success 'general options plus command' ' -- 2.26.2 ^ permalink raw reply related [flat|nested] 123+ messages in thread
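The precedence introduced by the patch above (the global command-line option wins over sparse.restrictCmds, which in turn defaults to true) can be sketched in isolation. This is only a minimal illustration, not the patch's code: `resolve_restrict` and `UNSET` are hypothetical names, and the config lookup is replaced by a plain tri-state argument.

```c
/*
 * Hypothetical stand-ins, not git's real API. Both inputs are
 * tri-states: UNSET (-1) means "not given", 0 means false, 1 means true.
 */
enum { UNSET = -1 };

/* Resolve the effective setting: command line > config > default (true). */
int resolve_restrict(int cmdline_opt, int config_val)
{
	if (cmdline_opt != UNSET)
		return cmdline_opt; /* --[no-]restrict-to-sparse-paths was given */
	if (config_val != UNSET)
		return config_val;  /* sparse.restrictCmds was set */
	return 1;                   /* neither given: restrict by default */
}
```

Under these assumptions, `resolve_restrict(0, 1)` yields 0, which mirrors the test that runs `git -c sparse.restrictCmds=true --no-restrict-to-sparse-paths grep`.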
* Re: [PATCH v3 5/5] config: add setting to ignore sparsity patterns in some cmds 2020-05-28 1:13 ` [PATCH v3 5/5] config: add setting to ignore sparsity patterns in some cmds Matheus Tavares @ 2020-05-30 16:18 ` Elijah Newren 2020-06-01 4:45 ` Matheus Tavares Bernardino 0 siblings, 1 reply; 123+ messages in thread From: Elijah Newren @ 2020-05-30 16:18 UTC (permalink / raw) To: Matheus Tavares Cc: Git Mailing List, Junio C Hamano, Derrick Stolee, Jonathan Tan On Wed, May 27, 2020 at 6:14 PM Matheus Tavares <matheus.bernardino@usp.br> wrote: > > When sparse checkout is enabled, some users expect the output of certain > commands (such as grep, diff, and log) to be also restricted within the > sparsity patterns. This would allow them to effectively work only on the > subset of files in which they are interested; and allow some commands to > possibly perform better, by not considering uninteresting paths. For > this reason, we taught grep to honor the sparsity patterns, in the > previous patch. But, on the other hand, allowing grep and the other > commands mentioned to optionally ignore the patterns also make for some > interesting use cases. E.g. using grep to search for a function > documentation that resides outside the sparse checkout. > > In any case, there is no current way for users to configure the behavior > they want for these commands. Aiming to provide this flexibility, let's > introduce the sparse.restrictCmds setting (and the analogous > --[no]-restrict-to-sparse-paths global option). The default value is > true. For now, grep is the only one affected by this setting, but the > goal is to have support for more commands, in the future. 
> > Helped-by: Elijah Newren <newren@gmail.com> > Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br> > --- > Documentation/config.txt | 2 + > Documentation/config/sparse.txt | 24 +++++ > Documentation/git-grep.txt | 3 + > Documentation/git.txt | 4 + > Makefile | 1 + > builtin/grep.c | 13 ++- > contrib/completion/git-completion.bash | 2 + > git.c | 6 ++ > sparse-checkout.c | 16 +++ > sparse-checkout.h | 11 +++ > t/t7817-grep-sparse-checkout.sh | 132 ++++++++++++++++++++++++- > t/t9902-completion.sh | 4 +- > 12 files changed, 212 insertions(+), 6 deletions(-) > create mode 100644 Documentation/config/sparse.txt > create mode 100644 sparse-checkout.c > create mode 100644 sparse-checkout.h > > diff --git a/Documentation/config.txt b/Documentation/config.txt > index ef0768b91a..fd74b80302 100644 > --- a/Documentation/config.txt > +++ b/Documentation/config.txt > @@ -436,6 +436,8 @@ include::config/sequencer.txt[] > > include::config/showbranch.txt[] > > +include::config/sparse.txt[] > + > include::config/splitindex.txt[] > > include::config/ssh.txt[] > diff --git a/Documentation/config/sparse.txt b/Documentation/config/sparse.txt > new file mode 100644 > index 0000000000..2a25b4b8ef > --- /dev/null > +++ b/Documentation/config/sparse.txt > @@ -0,0 +1,24 @@ > +sparse.restrictCmds:: > + Only meaningful in conjunction with core.sparseCheckout. This option > + extends sparse checkouts (which limit which paths are written to the > + working tree), so that output and operations are also limited to the > + sparsity paths where possible and implemented. The purpose of this > + option is to (1) focus output for the user on the portion of the > + repository that is of interest to them, and (2) enable potentially > + dramatic performance improvements, especially in conjunction with > + partial clones. 
> ++ > +When this option is true (default), some git commands may limit their behavior > +to the paths specified by the sparsity patterns, or to the intersection of > +those paths and any (like `*.c`) that the user might also specify on the > +command line. When false, the affected commands will work on full trees, > +ignoring the sparsity patterns. For now, only git-grep honors this setting. In > +this command, the restriction takes effect in three cases: with --cached; when > +a commit-ish is given; when searching a working tree where some paths excluded > +by the sparsity patterns are present (e.g. manually created paths or not > +removed submodules). I think "In this command, the restriction takes effect..." to the end of the paragraph should be removed. I don't want every subcommand's behavior to be specified here; it'll grow unreadably long and be more likely to eventually go stale. > ++ > +Note: commands which export, integrity check, or create history will always > +operate on full trees (e.g. fast-export, format-patch, fsck, commit, etc.), > +unaffected by any sparsity patterns. Also, writting commands such as > +sparse-checkout and read-tree will not be affected by this configuration. s/writting/writing/ > diff --git a/Documentation/git-grep.txt b/Documentation/git-grep.txt > index 9bdf807584..abbf100109 100644 > --- a/Documentation/git-grep.txt > +++ b/Documentation/git-grep.txt > @@ -41,6 +41,9 @@ characters. An empty string as search expression matches all lines. > CONFIGURATION > ------------- > > +git-grep honors the sparse.restrictCmds setting. See its definition in > +linkgit:git-config[1]. > + > :git-grep: 1 > include::config/grep.txt[] > > diff --git a/Documentation/git.txt b/Documentation/git.txt > index 9d6769e95a..5e107c6246 100644 > --- a/Documentation/git.txt > +++ b/Documentation/git.txt > @@ -180,6 +180,10 @@ If you just want to run git as if it was started in `<path>` then use > Do not perform optional operations that require locks. 
This is > equivalent to setting the `GIT_OPTIONAL_LOCKS` to `0`. > > +--[no-]restrict-to-sparse-paths:: > + Overrides the sparse.restrictCmds configuration (see > + linkgit:git-config[1]) for this execution. > + > --list-cmds=group[,group...]:: > List commands by group. This is an internal/experimental > option and may change or be removed in the future. Supported > diff --git a/Makefile b/Makefile > index 90aa329eb7..0c0013b32c 100644 > --- a/Makefile > +++ b/Makefile > @@ -983,6 +983,7 @@ LIB_OBJS += sha1-name.o > LIB_OBJS += shallow.o > LIB_OBJS += sideband.o > LIB_OBJS += sigchain.o > +LIB_OBJS += sparse-checkout.o > LIB_OBJS += split-index.o > LIB_OBJS += stable-qsort.o > LIB_OBJS += strbuf.o > diff --git a/builtin/grep.c b/builtin/grep.c > index 11e33b8aee..cc696dab4a 100644 > --- a/builtin/grep.c > +++ b/builtin/grep.c > @@ -25,6 +25,7 @@ > #include "submodule-config.h" > #include "object-store.h" > #include "packfile.h" > +#include "sparse-checkout.h" > > static char const * const grep_usage[] = { > N_("git grep [<options>] [-e] <pattern> [<rev>...] 
[[--] <path>...]"), > @@ -498,6 +499,7 @@ static int grep_cache(struct grep_opt *opt, > int nr; > struct strbuf name = STRBUF_INIT; > int name_base_len = 0; > + int sparse_paths_only = restrict_to_sparse_paths(repo); > if (repo->submodule_prefix) { > name_base_len = strlen(repo->submodule_prefix); > strbuf_addstr(&name, repo->submodule_prefix); > @@ -509,7 +511,7 @@ static int grep_cache(struct grep_opt *opt, > for (nr = 0; nr < repo->index->cache_nr; nr++) { > const struct cache_entry *ce = repo->index->cache[nr]; > > - if (ce_skip_worktree(ce)) > + if (sparse_paths_only && ce_skip_worktree(ce)) > continue; > > strbuf_setlen(&name, name_base_len); > @@ -715,9 +717,10 @@ static int grep_tree(struct grep_opt *opt, const struct pathspec *pathspec, > int is_root_tree) > { > struct pattern_list *patterns = NULL; > + int sparse_paths_only = restrict_to_sparse_paths(opt->repo); > int ret; > > - if (is_root_tree) > + if (is_root_tree && sparse_paths_only) > patterns = get_sparsity_patterns(opt->repo); > > ret = do_grep_tree(opt, pathspec, tree, base, tn_len, is_root_tree, It's kinda nice how clean and easy it is to insert this new option after the previous patch. > @@ -1257,6 +1260,12 @@ int cmd_grep(int argc, const char **argv, const char *prefix) > > if (!use_index || untracked) { > int use_exclude = (opt_exclude < 0) ? 
use_index : !!opt_exclude; > + > + if (opt_restrict_to_sparse_paths >= 0) { > + die(_("--[no-]restrict-to-sparse-paths is incompatible" > + " with --no-index and --untracked")); > + } > + > hit = grep_directory(&opt, &pathspec, use_exclude, use_index); > } else if (0 <= opt_exclude) { > die(_("--[no-]exclude-standard cannot be used for tracked contents")); > diff --git a/contrib/completion/git-completion.bash b/contrib/completion/git-completion.bash > index 70ad04e1b2..71956f7313 100644 > --- a/contrib/completion/git-completion.bash > +++ b/contrib/completion/git-completion.bash > @@ -3208,6 +3208,8 @@ __git_main () > --namespace= > --no-replace-objects > --help > + --restrict-to-sparse-paths > + --no-restrict-to-sparse-paths > " > ;; > *) > diff --git a/git.c b/git.c > index a2d337eed7..6db1382ae4 100644 > --- a/git.c > +++ b/git.c > @@ -38,6 +38,7 @@ const char git_more_info_string[] = > "See 'git help git' for an overview of the system."); > > static int use_pager = -1; > +int opt_restrict_to_sparse_paths = -1; > > static void list_builtins(struct string_list *list, unsigned int exclude_option); > > @@ -311,6 +312,10 @@ static int handle_options(const char ***argv, int *argc, int *envchanged) > } else { > exit(list_cmds(cmd)); > } > + } else if (!strcmp(cmd, "--restrict-to-sparse-paths")) { > + opt_restrict_to_sparse_paths = 1; > + } else if (!strcmp(cmd, "--no-restrict-to-sparse-paths")) { > + opt_restrict_to_sparse_paths = 0; > } else { > fprintf(stderr, _("unknown option: %s\n"), cmd); > usage(git_usage_string); > @@ -319,6 +324,7 @@ static int handle_options(const char ***argv, int *argc, int *envchanged) > (*argv)++; > (*argc)--; > } > + > return (*argv) - orig_argv; > } > Why the stray whitespace change? 
> diff --git a/sparse-checkout.c b/sparse-checkout.c > new file mode 100644 > index 0000000000..9a9e50fd29 > --- /dev/null > +++ b/sparse-checkout.c > @@ -0,0 +1,16 @@ > +#include "cache.h" > +#include "config.h" > +#include "sparse-checkout.h" > + > +int restrict_to_sparse_paths(struct repository *repo) > +{ > + int ret; > + > + if (opt_restrict_to_sparse_paths >= 0) > + return opt_restrict_to_sparse_paths; > + > + if (repo_config_get_bool(repo, "sparse.restrictcmds", &ret)) > + ret = 1; > + > + return ret; > +} Do we want to considering renaming this file to sparse.c, since it's for sparse grep and sparse diff and etc., not just for the checkout piece? It would also go along well with our toplevel related config being in the "sparse" namespace. > diff --git a/sparse-checkout.h b/sparse-checkout.h > new file mode 100644 > index 0000000000..1de3b588d8 > --- /dev/null > +++ b/sparse-checkout.h > @@ -0,0 +1,11 @@ > +#ifndef SPARSE_CHECKOUT_H > +#define SPARSE_CHECKOUT_H > + > +struct repository; > + > +extern int opt_restrict_to_sparse_paths; /* from git.c */ > + > +/* Whether or not cmds should restrict behavior on sparse paths, in this repo */ > +int restrict_to_sparse_paths(struct repository *repo); > + > +#endif /* SPARSE_CHECKOUT_H */ > diff --git a/t/t7817-grep-sparse-checkout.sh b/t/t7817-grep-sparse-checkout.sh > index ce080cf572..1aef084186 100755 > --- a/t/t7817-grep-sparse-checkout.sh > +++ b/t/t7817-grep-sparse-checkout.sh > @@ -80,10 +80,10 @@ test_expect_success 'setup' ' > test_path_is_file sub2/a > ' > > -# The test bellow checks a special case: the sparsity patterns exclude '/b' > +# The two tests bellow check a special case: the sparsity patterns exclude '/b' > # and sparse checkout is enable, but the path exists on the working tree (e.g. > # manually created after `git sparse-checkout init`). In this case, grep should > -# skip it. > +# skip the file by default, but not with --no-restrict-to-sparse-paths. 
> test_expect_success 'grep in working tree should honor sparse checkout' ' > cat >expect <<-EOF && > a:text > @@ -93,6 +93,16 @@ test_expect_success 'grep in working tree should honor sparse checkout' ' > git grep "text" >actual && > test_cmp expect actual > ' > +test_expect_success 'grep w/ --no-restrict-to-sparse-paths for sparsely excluded but present paths' ' > + cat >expect <<-EOF && > + a:text > + b:new-text > + EOF > + echo "new-text" >b && > + test_when_finished "rm b" && > + git --no-restrict-to-sparse-paths grep "text" >actual && > + test_cmp expect actual > +' > > test_expect_success 'grep --cached should honor sparse checkout' ' > cat >expect <<-EOF && > @@ -136,7 +146,7 @@ test_expect_success 'grep <tree-ish> should ignore sparsity patterns' ' > ' > > # Note that sub2/ is present in the worktree but it is excluded by the sparsity > -# patterns, so grep should not recurse into it. > +# patterns, so grep should only recurse into it with --no-restrict-to-sparse-paths. > test_expect_success 'grep --recurse-submodules should honor sparse checkout in submodule' ' > cat >expect <<-EOF && > a:text > @@ -145,6 +155,15 @@ test_expect_success 'grep --recurse-submodules should honor sparse checkout in s > git grep --recurse-submodules "text" >actual && > test_cmp expect actual > ' > +test_expect_success 'grep --recurse-submodules should search in excluded submodules w/ --no-restrict-to-sparse-paths' ' > + cat >expect <<-EOF && > + a:text > + sub/B/b:text > + sub2/a:text > + EOF > + git --no-restrict-to-sparse-paths grep --recurse-submodules "text" >actual && > + test_cmp expect actual > +' > > test_expect_success 'grep --recurse-submodules --cached should honor sparse checkout in submodule' ' > cat >expect <<-EOF && > @@ -171,4 +190,111 @@ test_expect_success 'grep --recurse-submodules <commit-ish> should honor sparse > test_cmp expect_tag-to-commit actual_tag-to-commit > ' > > +for cmd in 'git --no-restrict-to-sparse-paths grep' \ > + 'git -c 
sparse.restrictCmds=false grep' \ > + 'git -c sparse.restrictCmds=true --no-restrict-to-sparse-paths grep' > +do > + > + test_expect_success "$cmd --cached should ignore sparsity patterns" ' > + cat >expect <<-EOF && > + a:text > + b:text > + dir/c:text > + EOF > + $cmd --cached "text" >actual && > + test_cmp expect actual > + ' > + > + test_expect_success "$cmd <commit-ish> should ignore sparsity patterns" ' > + commit=$(git rev-parse HEAD) && > + cat >expect_commit <<-EOF && > + $commit:a:text > + $commit:b:text > + $commit:dir/c:text > + EOF > + cat >expect_tag-to-commit <<-EOF && > + tag-to-commit:a:text > + tag-to-commit:b:text > + tag-to-commit:dir/c:text > + EOF > + $cmd "text" $commit >actual_commit && > + test_cmp expect_commit actual_commit && > + $cmd "text" tag-to-commit >actual_tag-to-commit && > + test_cmp expect_tag-to-commit actual_tag-to-commit > + ' > +done > + > +test_expect_success 'grep --recurse-submodules --cached \w --no-restrict-to-sparse-paths' ' s%\w%w/%, or s%\w%with%? Same issue below too. 
> + cat >expect <<-EOF && > + a:text > + b:text > + dir/c:text > + sub/A/a:text > + sub/B/b:text > + sub2/a:text > + EOF > + git --no-restrict-to-sparse-paths grep --recurse-submodules --cached \ > + "text" >actual && > + test_cmp expect actual > +' > + > +test_expect_success 'grep --recurse-submodules <commit-ish> \w --no-restrict-to-sparse-paths' ' > + commit=$(git rev-parse HEAD) && > + cat >expect_commit <<-EOF && > + $commit:a:text > + $commit:b:text > + $commit:dir/c:text > + $commit:sub/A/a:text > + $commit:sub/B/b:text > + $commit:sub2/a:text > + EOF > + cat >expect_tag-to-commit <<-EOF && > + tag-to-commit:a:text > + tag-to-commit:b:text > + tag-to-commit:dir/c:text > + tag-to-commit:sub/A/a:text > + tag-to-commit:sub/B/b:text > + tag-to-commit:sub2/a:text > + EOF > + git --no-restrict-to-sparse-paths grep --recurse-submodules "text" \ > + $commit >actual_commit && > + test_cmp expect_commit actual_commit && > + git --no-restrict-to-sparse-paths grep --recurse-submodules "text" \ > + tag-to-commit >actual_tag-to-commit && > + test_cmp expect_tag-to-commit actual_tag-to-commit > +' > + > +test_expect_success 'should respect the sparse.restrictCmds values from submodules' ' > + cat >expect <<-EOF && > + a:text > + sub/A/a:text > + sub/B/b:text > + EOF > + test_config -C sub sparse.restrictCmds false && > + git grep --cached --recurse-submodules "text" >actual && > + test_cmp expect actual > +' > + > +test_expect_success 'should propagate --[no]-restrict-to-sparse-paths to submodules' ' > + cat >expect <<-EOF && > + a:text > + b:text > + dir/c:text > + sub/A/a:text > + sub/B/b:text > + sub2/a:text > + EOF > + test_config -C sub sparse.restrictCmds true && > + git --no-restrict-to-sparse-paths grep --cached --recurse-submodules "text" >actual && > + test_cmp expect actual > +' > + > +for opt in '--untracked' '--no-index' > +do > + test_expect_success "--[no]-restrict-to-sparse-paths and $opt are incompatible" " > + test_must_fail git --restrict-to-sparse-paths 
grep $opt . 2>actual && > + test_i18ngrep 'restrict-to-sparse-paths is incompatible with' actual > + " > +done > + > test_done > diff --git a/t/t9902-completion.sh b/t/t9902-completion.sh > index 3c44af6940..a4a7767e06 100755 > --- a/t/t9902-completion.sh > +++ b/t/t9902-completion.sh > @@ -1473,6 +1473,8 @@ test_expect_success 'double dash "git" itself' ' > --namespace= > --no-replace-objects Z > --help Z > + --restrict-to-sparse-paths Z > + --no-restrict-to-sparse-paths Z > EOF > ' > > @@ -1515,7 +1517,7 @@ test_expect_success 'general options' ' > test_completion "git --nam" "--namespace=" && > test_completion "git --bar" "--bare " && > test_completion "git --inf" "--info-path " && > - test_completion "git --no-r" "--no-replace-objects " > + test_completion "git --no-rep" "--no-replace-objects " > ' All these testcases look great (modulo the small typo I pointed out earlier); I kept thinking "but what about case <x>?" and then I kept reading and saw you covered it. You even added some I wasn't thinking about and might have overlooked but seem important. ^ permalink raw reply [flat|nested] 123+ messages in thread
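The incompatibility check that the quoted patch adds to cmd_grep() can be read as a small predicate. The following is a hedged sketch only: `options_conflict` is a hypothetical helper that does not exist in git, and its parameters simply mirror the variables named in the diff (`opt_restrict_to_sparse_paths`, `use_index`, `untracked`).

```c
/*
 * Hypothetical helper, not part of git: returns 1 when the option
 * combination should be rejected. restrict_opt is -1 when neither
 * --restrict-to-sparse-paths nor --no-restrict-to-sparse-paths was given.
 */
int options_conflict(int restrict_opt, int use_index, int untracked)
{
	/* --no-index and --untracked both take the directory-walking path. */
	int walks_directory = !use_index || untracked;

	/* Giving either form of the option explicitly is an error there. */
	return walks_directory && restrict_opt >= 0;
}
```

Note that the check looks only at the command-line option, so in this sketch a value coming from the configuration alone never triggers the conflict.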
* Re: [PATCH v3 5/5] config: add setting to ignore sparsity patterns in some cmds
  2020-05-30 16:18 ` Elijah Newren
@ 2020-06-01  4:45 ` Matheus Tavares Bernardino
  2020-06-03  2:39 ` Elijah Newren
  0 siblings, 1 reply; 123+ messages in thread
From: Matheus Tavares Bernardino @ 2020-06-01 4:45 UTC (permalink / raw)
  To: Elijah Newren
  Cc: Git Mailing List, Junio C Hamano, Derrick Stolee, Jonathan Tan

On Sat, May 30, 2020 at 1:18 PM Elijah Newren <newren@gmail.com> wrote:
>
> On Wed, May 27, 2020 at 6:14 PM Matheus Tavares
> <matheus.bernardino@usp.br> wrote:
> > diff --git a/Documentation/config/sparse.txt b/Documentation/config/sparse.txt
> > new file mode 100644
> > index 0000000000..2a25b4b8ef
> > --- /dev/null
> > +++ b/Documentation/config/sparse.txt
> > @@ -0,0 +1,24 @@
> > +sparse.restrictCmds::
> > +	Only meaningful in conjunction with core.sparseCheckout. This option
> > +	extends sparse checkouts (which limit which paths are written to the
> > +	working tree), so that output and operations are also limited to the
> > +	sparsity paths where possible and implemented. The purpose of this
> > +	option is to (1) focus output for the user on the portion of the
> > +	repository that is of interest to them, and (2) enable potentially
> > +	dramatic performance improvements, especially in conjunction with
> > +	partial clones.
> > ++
> > +When this option is true (default), some git commands may limit their behavior
> > +to the paths specified by the sparsity patterns, or to the intersection of
> > +those paths and any (like `*.c`) that the user might also specify on the
> > +command line. When false, the affected commands will work on full trees,
> > +ignoring the sparsity patterns. For now, only git-grep honors this setting. In
> > +this command, the restriction takes effect in three cases: with --cached; when
> > +a commit-ish is given; when searching a working tree where some paths excluded
> > +by the sparsity patterns are present (e.g. manually created paths or not
> > +removed submodules).
>
> I think "In this command, the restriction takes effect..." to the end
> of the paragraph should be removed. I don't want every subcommand's
> behavior to be specified here; it'll grow unreadably long and be more
> likely to eventually go stale.

Yeah, I was also concerned about that. But wouldn't it be important to
inform the users how the setting takes place in grep (especially with
the corner cases)? And maybe others, in the future?

What if we move the information that is only relevant to a single
command into its own man page? I.e. git-grep.txt would have something
like:

sparse.restrictCmds::
	See complete definition in linkgit:git-config[1]. In grep, the
	restriction takes effect in three cases: with --cached; when a
	commit-ish is given; when searching a working tree where some paths
	excluded by the sparsity patterns are present (e.g. manually created
	paths or not removed submodules).

The only problem then is that the information would be a little
scattered... But I think it shouldn't be a big deal, as a person
interested in knowing how foo behaves with sparse.restrictCmds would
only need to look into foo's man page, anyway.

> > diff --git a/git.c b/git.c
> > index a2d337eed7..6db1382ae4 100644
> > --- a/git.c
> > +++ b/git.c
> > @@ -319,6 +324,7 @@ static int handle_options(const char ***argv, int *argc, int *envchanged)
> >  		(*argv)++;
> >  		(*argc)--;
> >  	}
> > +
> >  	return (*argv) - orig_argv;
> >  }
> >
>
> Why the stray whitespace change?

Oops, that shouldn't be there. Thanks!

>
> > diff --git a/sparse-checkout.c b/sparse-checkout.c
> > new file mode 100644
> > index 0000000000..9a9e50fd29
> > --- /dev/null
> > +++ b/sparse-checkout.c
> > @@ -0,0 +1,16 @@
> > +#include "cache.h"
> > +#include "config.h"
> > +#include "sparse-checkout.h"
> > +
> > +int restrict_to_sparse_paths(struct repository *repo)
> > +{
> > +	int ret;
> > +
> > +	if (opt_restrict_to_sparse_paths >= 0)
> > +		return opt_restrict_to_sparse_paths;
> > +
> > +	if (repo_config_get_bool(repo, "sparse.restrictcmds", &ret))
> > +		ret = 1;
> > +
> > +	return ret;
> > +}
>
> Do we want to consider renaming this file to sparse.c, since it's
> for sparse grep and sparse diff and etc., not just for the checkout
> piece? It would also go along well with our toplevel related config
> being in the "sparse" namespace.

Makes sense. But since Stolee is already working on
"sparse-checkout.c" [1], if we use "sparse.c" in this series we will
end up with two extra files. And as "sparse.c" is quite small, I think
we could unify into the "sparse-checkout.c".

[1]: https://lore.kernel.org/git/0181a134bfb6986dc0e54ae624c478446a1324a9.1588857462.git.gitgitgadget@gmail.com/

> > diff --git a/t/t7817-grep-sparse-checkout.sh b/t/t7817-grep-sparse-checkout.sh
> > index ce080cf572..1aef084186 100755
> > --- a/t/t7817-grep-sparse-checkout.sh
> > +++ b/t/t7817-grep-sparse-checkout.sh
>
> All these testcases look great (modulo the small typo I pointed out
> earlier); I kept thinking "but what about case <x>?" and then I kept
> reading and saw you covered it. You even added some I wasn't thinking
> about and might have overlooked but seem important.

Thanks :)

^ permalink raw reply [flat|nested] 123+ messages in thread
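[Editor's sketch, not part of the thread] The quoted restrict_to_sparse_paths() resolves its answer in three steps: an explicit command-line override wins, otherwise the sparse.restrictCmds config value is used, and if neither is set the default is "true". The standalone C model below illustrates only that precedence. It does not use git's internals: enum cli_override, lookup_config_bool() and resolve_restrict_to_sparse_paths() are hypothetical names introduced here, and the config lookup is simulated with a nullable pointer.

```c
#include <assert.h>
#include <stddef.h>

/* Tri-state for the command-line override; CLI_UNSET means "option not given". */
enum cli_override { CLI_UNSET = -1, CLI_FALSE = 0, CLI_TRUE = 1 };

/*
 * Hypothetical stand-in for a config lookup (NOT git's API).
 * Returns 0 and stores the value in *out when the key is set,
 * non-zero when the key is missing (configured == NULL).
 */
static int lookup_config_bool(const int *configured, int *out)
{
	assert(out != NULL);
	if (!configured)
		return 1;
	*out = !!*configured;
	return 0;
}

/* Precedence: command-line override, then config value, then default (1). */
static int resolve_restrict_to_sparse_paths(enum cli_override cli,
					    const int *configured)
{
	int ret;

	if (cli >= 0)
		return cli;

	if (lookup_config_bool(configured, &ret))
		ret = 1; /* key not set: restriction is on by default */

	return ret;
}
```

Under these assumptions, an unset override with no config entry resolves to 1, while an explicit override beats whatever the config says.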
* Re: [PATCH v3 5/5] config: add setting to ignore sparsity patterns in some cmds 2020-06-01 4:45 ` Matheus Tavares Bernardino @ 2020-06-03 2:39 ` Elijah Newren 2020-06-10 21:15 ` Matheus Tavares Bernardino 0 siblings, 1 reply; 123+ messages in thread From: Elijah Newren @ 2020-06-03 2:39 UTC (permalink / raw) To: Matheus Tavares Bernardino Cc: Git Mailing List, Junio C Hamano, Derrick Stolee, Jonathan Tan On Sun, May 31, 2020 at 9:46 PM Matheus Tavares Bernardino <matheus.bernardino@usp.br> wrote: > > On Sat, May 30, 2020 at 1:18 PM Elijah Newren <newren@gmail.com> wrote: > > > > On Wed, May 27, 2020 at 6:14 PM Matheus Tavares > > <matheus.bernardino@usp.br> wrote: > > > diff --git a/Documentation/config/sparse.txt b/Documentation/config/sparse.txt > > > new file mode 100644 > > > index 0000000000..2a25b4b8ef > > > --- /dev/null > > > +++ b/Documentation/config/sparse.txt > > > @@ -0,0 +1,24 @@ > > > +sparse.restrictCmds:: > > > + Only meaningful in conjunction with core.sparseCheckout. This option > > > + extends sparse checkouts (which limit which paths are written to the > > > + working tree), so that output and operations are also limited to the > > > + sparsity paths where possible and implemented. The purpose of this > > > + option is to (1) focus output for the user on the portion of the > > > + repository that is of interest to them, and (2) enable potentially > > > + dramatic performance improvements, especially in conjunction with > > > + partial clones. > > > ++ > > > +When this option is true (default), some git commands may limit their behavior > > > +to the paths specified by the sparsity patterns, or to the intersection of > > > +those paths and any (like `*.c`) that the user might also specify on the > > > +command line. When false, the affected commands will work on full trees, > > > +ignoring the sparsity patterns. For now, only git-grep honors this setting. 
In > > > +this command, the restriction takes effect in three cases: with --cached; when > > > +a commit-ish is given; when searching a working tree where some paths excluded > > > +by the sparsity patterns are present (e.g. manually created paths or not > > > +removed submodules). > > > > I think "In this command, the restriction takes effect..." to the end > > of the paragraph should be removed. I don't want every subcommand's > > behavior to be specified here; it'll grow unreadably long and be more > > likely to eventually go stale. > > Yeah, I was also concerned about that. But wouldn't it be important to > inform the users how the setting takes place in grep (specially with > the corner cases)? And maybe others, in the future? > > What if we move the information that is only relevant to a single > command into its own man page? I.e. git-grep.txt would have something > like: Moving it to grep's manpage seems ideal to me. grep's behavior should be defined in grep's manual. > sparse.restrictCmds:: > See complete definition in linkgit:git-config[1]. In grep, the > restriction takes effect in three cases: with --cached; when a > commit-ish is given; when searching a working tree where some paths > excluded by the sparsity patterns are present (e.g. manually created > paths or not removed submodules). That looks more than a little confusing. Could this definition be something more like "See base definition in linkgit:git-config[1]. grep honors sparse.restrictCmds by limiting searches to the sparsity paths in three cases: when searching the working tree, when searching the index with --cached, or when searching a specified commit" > The only problem then is that the information would be a little > scattered... But I think it shouldn't be a big deal, as a person > interested in knowing how foo behaves with sparse.restrictCmds would > only need to look into foo's man page, anyway. 
> > > > diff --git a/git.c b/git.c > > > index a2d337eed7..6db1382ae4 100644 > > > --- a/git.c > > > +++ b/git.c > > > @@ -319,6 +324,7 @@ static int handle_options(const char ***argv, int *argc, int *envchanged) > > > (*argv)++; > > > (*argc)--; > > > } > > > + > > > return (*argv) - orig_argv; > > > } > > > > > > > Why the stray whitespace change? > > Oops, that shouldn't be there. Thanks! > > > > > > diff --git a/sparse-checkout.c b/sparse-checkout.c > > > new file mode 100644 > > > index 0000000000..9a9e50fd29 > > > --- /dev/null > > > +++ b/sparse-checkout.c > > > @@ -0,0 +1,16 @@ > > > +#include "cache.h" > > > +#include "config.h" > > > +#include "sparse-checkout.h" > > > + > > > +int restrict_to_sparse_paths(struct repository *repo) > > > +{ > > > + int ret; > > > + > > > + if (opt_restrict_to_sparse_paths >= 0) > > > + return opt_restrict_to_sparse_paths; > > > + > > > + if (repo_config_get_bool(repo, "sparse.restrictcmds", &ret)) > > > + ret = 1; > > > + > > > + return ret; > > > +} > > > > Do we want to considering renaming this file to sparse.c, since it's > > for sparse grep and sparse diff and etc., not just for the checkout > > piece? It would also go along well with our toplevel related config > > being in the "sparse" namespace. > > Makes sense. But since Stolee is already working on > "sparse-checkout.c" [1], if we use "sparse.c" in this series we will > end up with two extra files. And as "sparse.c" is quite small, I think > we could unify into the "sparse-checkout.c". > > [1]: https://lore.kernel.org/git/0181a134bfb6986dc0e54ae624c478446a1324a9.1588857462.git.gitgitgadget@gmail.com/ Or we could just suggest he use sparse.c too. :-) Stolee? 
> > > diff --git a/t/t7817-grep-sparse-checkout.sh b/t/t7817-grep-sparse-checkout.sh > > > index ce080cf572..1aef084186 100755 > > > --- a/t/t7817-grep-sparse-checkout.sh > > > +++ b/t/t7817-grep-sparse-checkout.sh > > > > All these testcases look great (modulo the small typo I pointed out > > earlier); I kept thinking "but what about case <x>?" and then I kept > > reading and saw you covered it. You even added some I wasn't thinking > > about and might have overlooked but seem important. > > Thanks :) ^ permalink raw reply [flat|nested] 123+ messages in thread
* Re: [PATCH v3 5/5] config: add setting to ignore sparsity patterns in some cmds
  2020-06-03  2:39 ` Elijah Newren
@ 2020-06-10 21:15 ` Matheus Tavares Bernardino
  2020-06-11  0:35 ` Elijah Newren
  0 siblings, 1 reply; 123+ messages in thread
From: Matheus Tavares Bernardino @ 2020-06-10 21:15 UTC (permalink / raw)
  To: Elijah Newren
  Cc: Git Mailing List, Junio C Hamano, Derrick Stolee, Jonathan Tan

On Tue, Jun 2, 2020 at 11:40 PM Elijah Newren <newren@gmail.com> wrote:
>
> On Sun, May 31, 2020 at 9:46 PM Matheus Tavares Bernardino
> <matheus.bernardino@usp.br> wrote:
> >
>
> Moving it to grep's manpage seems ideal to me. grep's behavior should
> be defined in grep's manual.
>
> > sparse.restrictCmds::
> >         See complete definition in linkgit:git-config[1]. In grep, the
> > restriction takes effect in three cases: with --cached; when a
> > commit-ish is given; when searching a working tree where some paths
> > excluded by the sparsity patterns are present (e.g. manually created
> > paths or not removed submodules).
>
> That looks more than a little confusing. Could this definition be
> something more like "See base definition in linkgit:git-config[1].
> grep honors sparse.restrictCmds by limiting searches to the sparsity
> paths in three cases: when searching the working tree, when searching
> the index with --cached, or when searching a specified commit"

Yes, this looks better, thanks. I would only add a brief explanation
on what we mean by limiting the search in the working tree case. Since
the working tree should already contain only the sparse paths (in most
cases), I think this sentence may sound a little confusing without
some explanation. Even further, some users might expect that `git -c
sparse.restrictCmds=false grep $pattern` would restore the previous
behavior of falling back to the cache for non-present entries, which
is not true.

In particular, I would like to emphasize that the use for
`sparse.restrictCmds=false` in the working tree case, is for
situations like the one you described in [1]:

* uses sparse-checkout to remove a bunch of files/directories they
  don't care about
* creates a new file that happens to have the same name as an
  (unfortunately) generically worded filename that exists in the index
  (but is marked SKIP_WORKTREE and had previously been removed)

In this situation, grep would ignore the said file by default, but
search it with `sparse.restrictCmds=false`.

So what do you think of the following:

sparse.restrictCmds::
	See base definition in linkgit:git-config[1]. grep honors
	sparse.restrictCmds by limiting searches to the sparsity paths in
	three cases: when searching the working tree, when searching the index
	with --cached, and when searching a specified commit. Note: when this
	option is set to true (default), the working tree search will ignore
	paths that are present despite not matching the sparsity patterns.
	This can happen, for example, if you create a new file in a path that
	was previously removed by git-sparse-checkout. Or if you don't
	deinitialize a submodule that is excluded by the sparsity patterns
	(thus remaining in the working copy, anyway).

[1]: https://lore.kernel.org/git/CABPp-BE+BL3Nq=Co=-kNB_wr=6gqX8zcGwa0ega_pGBpk6xYsg@mail.gmail.com/

^ permalink raw reply [flat|nested] 123+ messages in thread
* Re: [PATCH v3 5/5] config: add setting to ignore sparsity patterns in some cmds 2020-06-10 21:15 ` Matheus Tavares Bernardino @ 2020-06-11 0:35 ` Elijah Newren 0 siblings, 0 replies; 123+ messages in thread From: Elijah Newren @ 2020-06-11 0:35 UTC (permalink / raw) To: Matheus Tavares Bernardino Cc: Git Mailing List, Junio C Hamano, Derrick Stolee, Jonathan Tan Hi Matheus, On Wed, Jun 10, 2020 at 2:15 PM Matheus Tavares Bernardino <matheus.bernardino@usp.br> wrote: > > On Tue, Jun 2, 2020 at 11:40 PM Elijah Newren <newren@gmail.com> wrote: > > > > On Sun, May 31, 2020 at 9:46 PM Matheus Tavares Bernardino > > <matheus.bernardino@usp.br> wrote: > > > > > > > Moving it to grep's manpage seems ideal to me. grep's behavior should > > be defined in grep's manual. > > > > > sparse.restrictCmds:: > > > See complete definition in linkgit:git-config[1]. In grep, the > > > restriction takes effect in three cases: with --cached; when a > > > commit-ish is given; when searching a working tree where some paths > > > excluded by the sparsity patterns are present (e.g. manually created > > > paths or not removed submodules). > > > > That looks more than a little confusing. Could this definition be > > something more like "See base definition in linkgit:git-config[1]. > > grep honors sparse.restrictCmds by limiting searches to the sparsity > > paths in three cases: when searching the working tree, when searching > > the index with --cached, or when searching a specified commit" > > Yes, this looks better, thanks. I would only add a brief explanation > on what we mean by limiting the search in the working tree case. Possibly, but I think it would be easy to go overboard here. > Since > the working tree should already contain only the sparse paths (in most > cases), I think this sentence may sound a little confusing without > some explanation. That's an interesting flag. 
I'm curious, though, would they be confused by it, or would it just seem immediately obvious and almost not worth mentioning? In other words, would they think "Well, if you use sparse-checkout to get just a subset of files checked out, it totally makes sense that grep would be limited in that case. Why do they even need to mention it -- just for completeness, I guess?" And even if not all users think that way, would a large percentage of users around them think that way and point out the obviousness of the docs? If not, maybe we just add a "(obviously)" comment right after "working tree"? > Even further, some users might expect that `git -c > sparse.restrictCmds=false grep $pattern` would restore the previous > behavior of falling back to the cache for non-present entries, which > is not true. 10 years from now, I don't want our docs to consist of a long explanation of all the bugs that existed in various ancient versions of git and how modern behavior differs from each previous iteration. There are times when it's worth calling out bugs in prior versions to bring it to the attention of our users, but I don't see how this is one of them. The previous behavior was just outright buggy and inconsistent, and from my viewpoint, was also a regression. I think it should have been reverted regardless of your series, though skip_worktree stuff was dormant and went unused for a really long time. Also, this is a special area of git where focusing too much on backward compatibility might actually be detrimental. Backward compatibility is a really good goal to keep in mind in general, but the SKIP_WORKTREE usability was traditionally really, really bad -- so much so that outright replacing was contemplated by its author[A], and we placed a HUGE ALL CAPS DISCLAIMER in the documentation of sparse-checkout about how users should expect the behavior of commands to change[B]. 
So, unlike other areas of git, we should focus on getting sparse-checkout behavior right more than on bug compatibility with previous code and long migration stories. Given the context of such disclaimers and changes, the idea of trying to document those changes makes me think that in the not too distant future we would have the equivalent of the following humorous driving directions from the era before smartphones: "To get to Joe's place, you turn right on the first road after where Billy's Barn burned down 5 years ago..." (when the burned Barn was cleared out 4 years ago and there's no indication of where it once was) [A] https://lore.kernel.org/git/CABPp-BGE-m_UFfUt_moXG-YR=ZW8hMzMwraD7fkFV-+sEHw36w@mail.gmail.com/ [B] https://git-scm.com/docs/git-sparse-checkout#_description > In particular, I would like to emphasize that the use for > `sparse.restrictCmds=false` in the working tree case, is for > situations like the one you described in [1]: > > * uses sparse-checkout to remove a bunch of files/directories they > don't care about > * creates a new file that happens to have the same name as an > (unfortunately) generically worded filename that exists in the index > (but is marked SKIP_WORKTREE and had previously been removed) > > In this situation, grep would ignore the said file by default, but > search it with `sparse.restrictCmds=false`. I think this is such a weird and unusual case that I'm not sure it merits mentioning in the docs. But if others disagree and think this case is worth mentioning in the docs, then it shouldn't just be mentioned in "git grep". All affected manpages should be updated to discuss how they handle this obscure corner case. For example, `git diff` and `git status` just ignore these files and do not print out any information about them. So it's kind of like these files are ignored...but even `git status --ignored` won't show anything about such files. 
Anyway, I think this is a pretty obscure case whose discussion would dilute the value of the manual in teaching people the basics of commands. > So what do you think of the following: > > sparse.restrictCmds:: > See base definition in linkgit:git-config[1]. grep honors > sparse.restrictCmds by limiting searches to the sparsity paths in > three cases: when searching the working tree, when searching the index > with --cached, and when searching a specified commit. Good up to here. I think I'd like to use just this text as-is (or maybe with the "(obviously)" addition) and then see if we get feedback that we need clarifications, because I'm worried our attempts at clarifying might backfire. For example... > Note: when this > option is set to true (default), the working tree search will ignore > paths that are present despite not matching the sparsity patterns. You've run into the same problem Stolee and I did by trying to provide details about one case, but overlooking others. ;-) This "Note:" statement is not correct; there's a couple cases it gets wrong: merge/rebase/cherry-pick can unset the SKIP_WORKTREE bit even for paths that do not match the sparsity patterns in order to be able to materialize a file and show conflicts. In fact, they are allowed to unset the bit for other files and materialize them too (see https://lore.kernel.org/git/xmqqbmb1a7ga.fsf@gitster-ct.c.googlers.com/). Such paths, despite not matching the sparsity patterns, will not have the SKIP_WORKTREE bit set. And it is the SKIP_WORKTREE bit, rather than the sparsity patterns, that git-grep uses for deciding which files in the working tree to search. Also, if someone runs sparse-checkout init/set, and sparse-checkout would normally remove some file but notices that the file has local modifications, then sparse-checkout will avoid removing the file AND will avoid setting the SKIP_WORKTREE bit on that file. 
See commit 681c637b4a ("unpack-trees: failure to set SKIP_WORKTREE bits always just a warning", 2020-03-27) > This can happen, for example, if you create a new file in a path that > was previously removed by git-sparse-checkout. This is that obscure corner case discussed above. > Or if you don't > deinitialize a submodule that is excluded by the sparsity patterns > (thus remaining in the working copy, anyway). This case requires more thought. If a submodule doesn't match the sparsity patterns, we already said elsewhere that sparse-checkout should not remove the submodule (since doing so would risk data loss). But do we set the SKIP_WORKTREE bit for it? Generally, sparse-checkout avoids removing files with modifications, and if it doesn't remove them it also doesn't set the SKIP_WORKTREE bit. For consistency, should sparse-checkout not set SKIP_WORKTREE for initialized submodules? If we don't set the SKIP_WORKTREE bit for initialized submodules, then we don't actually have a second different case to mention here. Granted, that's more an issue for `sparse-checkout` than `grep`. Hope that all helps. Let me know if it doesn't, if you disagree with any parts, or some parts aren't clear. Elijah ^ permalink raw reply [flat|nested] 123+ messages in thread
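[Editor's sketch, not part of the thread] Elijah's point above is that, for working-tree searches, git-grep looks at the SKIP_WORKTREE bit of each index entry rather than re-evaluating the sparsity patterns, so a path whose bit was cleared (for example to show a merge conflict) is searched even though it does not match the patterns. The standalone C model below illustrates only that selection rule. struct entry, should_search() and count_searched() are hypothetical names, not git's data structures, and whether a file actually exists on disk is deliberately left out of the model.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for an index entry (NOT git's cache_entry). */
struct entry {
	const char *path;
	bool skip_worktree; /* models the SKIP_WORKTREE bit */
};

/*
 * Decide whether an entry is eligible for the search.
 * Only the bit is consulted; the sparsity patterns are never re-evaluated.
 * With the restriction disabled, every entry is eligible.
 */
static bool should_search(const struct entry *e, bool restrict_to_sparse)
{
	assert(e != NULL);
	if (!restrict_to_sparse)
		return true;
	return !e->skip_worktree;
}

/* Count how many of the nr entries would be eligible for the search. */
static size_t count_searched(const struct entry *entries, size_t nr,
			     bool restrict_to_sparse)
{
	size_t i, n = 0;

	for (i = 0; i < nr; i++)
		if (should_search(&entries[i], restrict_to_sparse))
			n++;
	return n;
}
```

In this model, an entry outside the sparsity patterns whose bit was cleared by a merge is treated exactly like any other entry with the bit unset, which is the behavior the proposed "Note:" wording failed to capture.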
* [PATCH v4 0/6] grep: honor sparse checkout and add option to ignore it 2020-05-28 1:12 ` [PATCH v3 0/5] grep: honor sparse checkout and add option to ignore it Matheus Tavares ` (4 preceding siblings ...) 2020-05-28 1:13 ` [PATCH v3 5/5] config: add setting to ignore sparsity patterns in some cmds Matheus Tavares @ 2020-06-12 15:44 ` Matheus Tavares 2020-06-12 15:44 ` [PATCH v4 1/6] doc: grep: unify info on configuration variables Matheus Tavares ` (7 more replies) 5 siblings, 8 replies; 123+ messages in thread From: Matheus Tavares @ 2020-06-12 15:44 UTC (permalink / raw) To: git; +Cc: gitster, stolee, newren, jonathantanmy This series makes git-grep restrict its output to the present sparsity patterns. A new global option is added to toggle this behavior in grep and hopefully more commands in the future. Main changes since v3: Patch 2: - Reworded commit message for clarity. Patch 3 and 4: - Split into two patches. The first one contains the changes to easily accommodate new options in t/helper/test-config; the second adds --submodule=path. Patch 4: - Removed the section "Such scenario might not be needed now..." from the commit message. This was not true as we already have other submodule configs, even in git-grep itself, which should be considered when recursing into submodules. And this should happen for both the local scope and worktree scope of the submodules configs. Patch 5: - Reworded commit message as suggested by Elijah [1]. - Fixed spelling errors in t7817 (as also pointed in [1]). - Added test to ensure grep searches unmerged files despite not matching the sparsity patterns. - Renamed builtin/grep.c:in_sparse_checkout() to path_in_sparse_checkout() for clarity. Patch 6: - Fixed typos and spelling errors. - Removed unnecessary new line in git.c. - Included "sparse-checkout.h" in git.c to avoid Sparse error as Ramsay Jones pointed out. And moved opt_restrict_to_sparse_paths to sparse-checkout.c. 
- Moved information about how grep honors sparse.restrictCmds to grep's man page. [1]: https://lore.kernel.org/git/CABPp-BFsCPPNOZ92JQRJeGyNd0e-TCW-LcLyr0i_+VSQJP+GCg@mail.gmail.com/ CI: https://github.com/matheustavares/git/actions/runs/133459296 Matheus Tavares (6): doc: grep: unify info on configuration variables t/helper/test-config: return exit codes consistently t/helper/test-config: facilitate addition of new cli options config: correctly read worktree configs in submodules grep: honor sparse checkout patterns config: add setting to ignore sparsity patterns in some cmds Documentation/config.txt | 2 + Documentation/config/grep.txt | 18 +- Documentation/config/sparse.txt | 20 ++ Documentation/git-grep.txt | 36 +-- Documentation/git.txt | 4 + Makefile | 1 + builtin/grep.c | 134 ++++++++++- config.c | 21 +- contrib/completion/git-completion.bash | 2 + git.c | 5 + sparse-checkout.c | 18 ++ sparse-checkout.h | 11 + t/helper/test-config.c | 183 +++++++++----- t/t2404-worktree-config.sh | 16 ++ t/t7011-skip-worktree-reading.sh | 9 - t/t7817-grep-sparse-checkout.sh | 321 +++++++++++++++++++++++++ t/t9902-completion.sh | 4 +- 17 files changed, 687 insertions(+), 118 deletions(-) create mode 100644 Documentation/config/sparse.txt create mode 100644 sparse-checkout.c create mode 100644 sparse-checkout.h create mode 100755 t/t7817-grep-sparse-checkout.sh Range-diff against v3: 1: 86602034c1 = 1: 99cf2124f3 doc: grep: unify info on configuration variables 2: e5b689aaad ! 2: 85c429ac69 t/helper/test-config: return exit codes consistently @@ Commit message different codes, to reflect the status of the requested operations. These codes are sometimes checked in the tests, but not all of the codes are returned consistently by the helper: 1 will usually refer to a - "value not found", but usage errors can also return 1 or 128. The latter - is also expected on errors within the configset functions. These + "value not found", but usage errors can also return 1 or 128. 
Moreover, + 128 is also expected on errors within the configset functions. These inconsistent uses of the exit codes can lead to false positives in the - tests. Although all tests that currently check the helper's exit code, - on errors, do also check the output, it's still better to standardize - the exit codes and avoid future problems in new tests. While we are - here, let's also check that we have the expected argc for + tests. Although all tests which expect errors and check the helper's + exit code currently also check the output, it's still better to + standardize the exit codes and avoid future problems in new tests. + While we are here, let's also check that we have the expected argc for configset_get_value and configset_get_value_multi, before trying to use argv. -: ---------- > 3: e9eaaecccc t/helper/test-config: facilitate addition of new cli options 3: 0d2fd01305 ! 4: 6402c96807 config: correctly read worktree configs in submodules @@ Commit message to make the path to the file. Furthermore, it also checks that extensions.worktreeConfig is set through the repository_format_worktree_config variable, which refers to - the_repository only. Thus, when a submodule has worktree settings, a - command executed in the superproject that recurses into the submodule - won't find the said settings. + the_repository only. Thus, when a submodule has worktree-specific + settings, a command executed in the superproject that recurses into the + submodule won't find the said settings. - Such a scenario might not be needed now, but it will be in the following - patch. git-grep will learn to honor sparse checkouts and, when running - with --recurse-submodules, the submodule's sparse checkout settings must - be loaded. As these settings are stored in the config.worktree file, - they would be ignored without this patch. 
So let's fix this by reading - the right config.worktree file and extensions.worktreeConfig setting, - based on the git_dir and commondir paths given to - do_git_config_sequence(). Also add a test to avoid any regressions. + This will be especially important in the next patch: git-grep will learn + to honor sparse checkouts and, when running with --recurse-submodules, + the submodule's sparse checkout settings must be loaded. As these + settings are stored in the config.worktree file, they would be ignored + without this patch. So let's fix this by reading the right + config.worktree file and extensions.worktreeConfig setting, based on the + git_dir and commondir paths given to do_git_config_sequence(). Also + add a test to avoid any regressions. Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br> @@ t/helper/test-config.c * get_value -> prints the value with highest priority for the entered key * @@ t/helper/test-config.c: int cmd__config(int argc, const char **argv) - int i, val; - const char *v; const struct string_list *strptr; -- struct config_set cs; -+ struct config_set cs = { .hash_initialized = 0 }; + struct config_set cs = { .hash_initialized = 0 }; enum test_config_exit_code ret = TC_SUCCESS; + struct repository *repo = the_repository; + const char *subrepo_path = NULL; -+ -+ argc--; /* skip over "config" */ -+ argv++; -+ -+ if (argc == 0) -+ goto print_usage_error; -+ + + argc--; /* skip over "config" */ + argv++; +@@ t/helper/test-config.c: int cmd__config(int argc, const char **argv) + if (argc == 0) + goto print_usage_error; + + if (skip_prefix(*argv, "--submodule=", &subrepo_path)) { + argc--; + argv++; + if (argc == 0) + goto print_usage_error; + } - -- if (argc == 3 && !strcmp(argv[1], "read_early_config")) { -- read_early_config(early_config_cb, (void *)argv[2]); -+ if (argc == 2 && !strcmp(argv[0], "read_early_config")) { ++ + if (argc == 2 && !strcmp(argv[0], "read_early_config")) { + if (subrepo_path) { + fprintf(stderr, "Cannot use 
--submodule with read_early_config\n"); + return TC_USAGE_ERROR; + } -+ read_early_config(early_config_cb, (void *)argv[1]); + read_early_config(early_config_cb, (void *)argv[1]); return TC_SUCCESS; } - +@@ t/helper/test-config.c: int cmd__config(int argc, const char **argv) setup_git_directory(); -- git_configset_init(&cs); -- if (argc < 2) -- goto print_usage_error; + if (subrepo_path) { + const struct submodule *sub; + struct repository *subrepo = xcalloc(1, sizeof(*repo)); @@ t/helper/test-config.c: int cmd__config(int argc, const char **argv) + } + repo = subrepo; + } - -- if (argc == 3 && !strcmp(argv[1], "get_value")) { -- if (!git_config_get_value(argv[2], &v)) { -+ if (argc == 2 && !strcmp(argv[0], "get_value")) { ++ + if (argc == 2 && !strcmp(argv[0], "get_value")) { +- if (!git_config_get_value(argv[1], &v)) { + if (!repo_config_get_value(repo, argv[1], &v)) { if (!v) printf("(NULL)\n"); else - printf("%s\n", v); - } else { -- printf("Value not found for \"%s\"\n", argv[2]); -+ printf("Value not found for \"%s\"\n", argv[1]); +@@ t/helper/test-config.c: int cmd__config(int argc, const char **argv) ret = TC_VALUE_NOT_FOUND; } -- } else if (argc == 3 && !strcmp(argv[1], "get_value_multi")) { -- strptr = git_config_get_value_multi(argv[2]); -+ } else if (argc == 2 && !strcmp(argv[0], "get_value_multi")) { + } else if (argc == 2 && !strcmp(argv[0], "get_value_multi")) { +- strptr = git_config_get_value_multi(argv[1]); + strptr = repo_config_get_value_multi(repo, argv[1]); if (strptr) { for (i = 0; i < strptr->nr; i++) { v = strptr->items[i].string; @@ t/helper/test-config.c: int cmd__config(int argc, const char **argv) - printf("%s\n", v); - } - } else { -- printf("Value not found for \"%s\"\n", argv[2]); -+ printf("Value not found for \"%s\"\n", argv[1]); ret = TC_VALUE_NOT_FOUND; } -- } else if (argc == 3 && !strcmp(argv[1], "get_int")) { -- if (!git_config_get_int(argv[2], &val)) { -+ } else if (argc == 2 && !strcmp(argv[0], "get_int")) { + } else if 
(argc == 2 && !strcmp(argv[0], "get_int")) { +- if (!git_config_get_int(argv[1], &val)) { + if (!repo_config_get_int(repo, argv[1], &val)) { printf("%d\n", val); } else { -- printf("Value not found for \"%s\"\n", argv[2]); -+ printf("Value not found for \"%s\"\n", argv[1]); + printf("Value not found for \"%s\"\n", argv[1]); ret = TC_VALUE_NOT_FOUND; } -- } else if (argc == 3 && !strcmp(argv[1], "get_bool")) { -- if (!git_config_get_bool(argv[2], &val)) { -+ } else if (argc == 2 && !strcmp(argv[0], "get_bool")) { + } else if (argc == 2 && !strcmp(argv[0], "get_bool")) { +- if (!git_config_get_bool(argv[1], &val)) { + if (!repo_config_get_bool(repo, argv[1], &val)) { printf("%d\n", val); } else { -- printf("Value not found for \"%s\"\n", argv[2]); + -+ printf("Value not found for \"%s\"\n", argv[1]); + printf("Value not found for \"%s\"\n", argv[1]); ret = TC_VALUE_NOT_FOUND; } -- } else if (argc == 3 && !strcmp(argv[1], "get_string")) { -- if (!git_config_get_string_const(argv[2], &v)) { -+ } else if (argc == 2 && !strcmp(argv[0], "get_string")) { + } else if (argc == 2 && !strcmp(argv[0], "get_string")) { +- if (!git_config_get_string_const(argv[1], &v)) { + if (!repo_config_get_string_const(repo, argv[1], &v)) { printf("%s\n", v); } else { -- printf("Value not found for \"%s\"\n", argv[2]); -+ printf("Value not found for \"%s\"\n", argv[1]); + printf("Value not found for \"%s\"\n", argv[1]); ret = TC_VALUE_NOT_FOUND; } -- } else if (argc >= 3 && !strcmp(argv[1], "configset_get_value")) { -- for (i = 3; i < argc; i++) { -+ } else if (argc >= 2 && !strcmp(argv[0], "configset_get_value")) { + } else if (argc >= 2 && !strcmp(argv[0], "configset_get_value")) { + if (subrepo_path) { + fprintf(stderr, "Cannot use --submodule with configset_get_value\n"); + ret = TC_USAGE_ERROR; + goto out; + } -+ for (i = 2; i < argc; i++) { + for (i = 2; i < argc; i++) { int err; if ((err = git_configset_add_file(&cs, argv[i]))) { - fprintf(stderr, "Error (%d) reading configuration file 
%s.\n", err, argv[i]); @@ t/helper/test-config.c: int cmd__config(int argc, const char **argv) - goto out; - } - } -- if (!git_configset_get_value(&cs, argv[2], &v)) { -+ if (!git_configset_get_value(&cs, argv[1], &v)) { - if (!v) - printf("(NULL)\n"); - else - printf("%s\n", v); - } else { -- printf("Value not found for \"%s\"\n", argv[2]); -+ printf("Value not found for \"%s\"\n", argv[1]); ret = TC_VALUE_NOT_FOUND; } -- } else if (argc >= 3 && !strcmp(argv[1], "configset_get_value_multi")) { -- for (i = 3; i < argc; i++) { -+ } else if (argc >= 2 && !strcmp(argv[0], "configset_get_value_multi")) { + } else if (argc >= 2 && !strcmp(argv[0], "configset_get_value_multi")) { + if (subrepo_path) { + fprintf(stderr, "Cannot use --submodule with configset_get_value_multi\n"); + ret = TC_USAGE_ERROR; + goto out; + } -+ for (i = 2; i < argc; i++) { + for (i = 2; i < argc; i++) { int err; if ((err = git_configset_add_file(&cs, argv[i]))) { - fprintf(stderr, "Error (%d) reading configuration file %s.\n", err, argv[i]); @@ t/helper/test-config.c: int cmd__config(int argc, const char **argv) - goto out; - } - } -- strptr = git_configset_get_value_multi(&cs, argv[2]); -+ strptr = git_configset_get_value_multi(&cs, argv[1]); - if (strptr) { - for (i = 0; i < strptr->nr; i++) { - v = strptr->items[i].string; -@@ t/helper/test-config.c: int cmd__config(int argc, const char **argv) - printf("%s\n", v); - } - } else { -- printf("Value not found for \"%s\"\n", argv[2]); -+ printf("Value not found for \"%s\"\n", argv[1]); ret = TC_VALUE_NOT_FOUND; } -- } else if (!strcmp(argv[1], "iterate")) { + } else if (!strcmp(argv[0], "iterate")) { - git_config(iterate_cb, NULL); -+ } else if (!strcmp(argv[0], "iterate")) { + repo_config(repo, iterate_cb, NULL); } else { print_usage_error: 4: 3b819a8d52 ! 
5: 4d2916eb99 grep: honor sparse checkout patterns @@ Commit message One of the main uses for a sparse checkout is to allow users to focus on the subset of files in a repository in which they are interested. But - git-grep currently ignores the sparsity patterns and report all matches + git-grep currently ignores the sparsity patterns and reports all matches found outside this subset, which kind of goes in the opposite direction. - Let's fix that, making it honor the sparsity boundaries for every - grepping case where this is relevant: + There are some use cases for ignoring the sparsity patterns and the next + commit will add an option to obtain this behavior, but here we start by + making grep honor the sparsity boundaries in every case where this is + relevant: - git grep in worktree - git grep --cached - git grep $REVISION - For the worktree case, we will not grep paths that have the - SKIP_WORKTREE bit set, even if they are present for some reason (e.g. - manually created after `git sparse-checkout init`). But the next patch - will add an option to do so. (See 'Note' below.) + For the worktree and cached cases, we iterate over paths without the + SKIP_WORKTREE bit set, and limit our searches to these paths. For the + $REVISION case, we limit the paths we search to those that match the + sparsity patterns. (We do not check the SKIP_WORKTREE bit for the + $REVISION case, because $REVISION may contain paths that do not exist in + HEAD and thus for which we have no SKIP_WORKTREE bit to consult. The + sparsity patterns tell us how the SKIP_WORKTREE bit would be set if we + were to check out $REVISION, so we consult those. Also, we don't use the + sparsity patterns with the worktree or cached cases, both because we + have a bit we can check directly and more efficiently, and because + unmerged entries from a merge or a rebase could cause more files to + temporarily be present than the sparsity patterns would normally + select.) 
- For `git grep $REVISION`, we will choose to honor the sparsity patterns - only when $REVISION is a commit-ish object. The reason is that, for a - tree, we don't know whether it represents the root of a repository or a - subtree. So we wouldn't be able to correctly match it against the - sparsity patterns. E.g. suppose we have a repository with these two - sparsity rules: "/*" and "!/a"; and the following structure: - - / - | - a (file) - | - d (dir) - | - a (file) - - If `git grep $REVISION` were to honor the sparsity patterns for every - object type, when grepping the /d tree, we would wrongly ignore the /d/a - file. This happens because we wouldn't know it resides in /d and - therefore it would wrongly match the pattern "!/a". Furthermore, for a - search in a blob object, we wouldn't even have a path to check the - patterns against. So, let's ignore the sparsity patterns when grepping - non-commit-ish objects. - - Note: The behavior introduced in this patch is what some users have - reported[1] that they would like by default. But the old behavior is - still desirable for some use cases. Therefore, the next patch will add - an option to allow restoring it when needed. - - [1]: https://lore.kernel.org/git/CABPp-BGuFhDwWZBRaD3nA8ui46wor-4=Ha1G1oApsfF8KNpfGQ@mail.gmail.com/ + Note that there is a special case here: `git grep $TREE`. In this case, + we cannot know whether $TREE corresponds to the root of the repository + or some sub-tree, and thus there is no way for us to know which sparsity + patterns, if any, apply. So the $TREE case will not use sparsity + patterns or any SKIP_WORKTREE bits and will instead always search all + files within the $TREE. 
Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br> @@ builtin/grep.c: static int grep_cache(struct grep_opt *opt, + return patterns; +} + -+static int in_sparse_checkout(struct strbuf *path, int prefix_len, -+ unsigned int entry_mode, -+ struct index_state *istate, -+ struct pattern_list *sparsity, -+ enum pattern_match_result parent_match, -+ enum pattern_match_result *match) ++static int path_in_sparse_checkout(struct strbuf *path, int prefix_len, ++ unsigned int entry_mode, ++ struct index_state *istate, ++ struct pattern_list *sparsity, ++ enum pattern_match_result parent_match, ++ enum pattern_match_result *match) +{ + int dtype = DT_UNKNOWN; + int is_dir = S_ISDIR(entry_mode); @@ builtin/grep.c: static int grep_tree(struct grep_opt *opt, const struct pathspec + struct strbuf path = STRBUF_INIT; + strbuf_addstr(&path, base->buf + tn_len); + -+ if (!in_sparse_checkout(&path, old_baselen - tn_len, -+ entry.mode, repo->index, -+ sparsity, default_sparsity_match, -+ &sparsity_match)) { ++ if (!path_in_sparse_checkout(&path, old_baselen - tn_len, ++ entry.mode, repo->index, ++ sparsity, default_sparsity_match, ++ &sparsity_match)) { + strbuf_setlen(base, old_baselen); + continue; + } @@ t/t7817-grep-sparse-checkout.sh (new) +`-- sub2 + `-- a + -+Where . has non-cone mode sparsity patterns, sub is a submodule with cone mode -+sparsity patterns and sub2 is a submodule that is excluded by the superproject -+sparsity patterns. The resulting sparse checkout should leave the following -+structure on the working tree: ++Where the outer repository has non-cone mode sparsity patterns, sub is a ++submodule with cone mode sparsity patterns and sub2 is a submodule that is ++excluded by the superproject sparsity patterns. The resulting sparse checkout ++should leave the following structure in the working tree: + +. 
+|-- a @@ t/t7817-grep-sparse-checkout.sh (new) + test_path_is_file sub2/a +' + -+# The test bellow checks a special case: the sparsity patterns exclude '/b' -+# and sparse checkout is enable, but the path exists on the working tree (e.g. ++# The test below checks a special case: the sparsity patterns exclude '/b' ++# and sparse checkout is enabled, but the path exists in the working tree (e.g. +# manually created after `git sparse-checkout init`). In this case, grep should +# skip it. +test_expect_success 'grep in working tree should honor sparse checkout' ' @@ t/t7817-grep-sparse-checkout.sh (new) + test_cmp expect actual +' + ++test_expect_success 'grep unmerged file despite not matching sparsity patterns' ' ++ cat >expect <<-EOF && ++ b:modified-b-in-branchX ++ b:modified-b-in-branchY ++ EOF ++ test_when_finished "test_might_fail git merge --abort && \ ++ git checkout master" && ++ ++ git sparse-checkout disable && ++ git checkout -b branchY master && ++ test_commit modified-b-in-branchY b && ++ git checkout -b branchX master && ++ test_commit modified-b-in-branchX b && ++ ++ git sparse-checkout init && ++ test_path_is_missing b && ++ test_must_fail git merge branchY && ++ git grep "modified-b" >actual && ++ test_cmp expect actual ++' ++ +test_expect_success 'grep --cached should honor sparse checkout' ' + cat >expect <<-EOF && + a:text 5: 02990a6fa1 ! 6: 4547718b60 config: add setting to ignore sparsity patterns in some cmds @@ Documentation/config.txt: include::config/sequencer.txt[] include::config/ssh.txt[] + ## Documentation/config/grep.txt ## +@@ Documentation/config/grep.txt: grep.fullName:: + grep.fallbackToNoIndex:: + If set to true, fall back to git grep --no-index if git grep + is executed outside of a git repository. Defaults to false. ++ ++ifdef::git-grep[] ++sparse.restrictCmds:: ++ See base definition in linkgit:git-config[1]. 
grep honors ++ sparse.restrictCmds by limiting searches to the sparsity paths in three ++ cases: when searching the working tree, when searching the index with ++ --cached, and when searching a specified commit. ++endif::git-grep[] + ## Documentation/config/sparse.txt (new) ## @@ +sparse.restrictCmds:: @@ Documentation/config/sparse.txt (new) +to the paths specified by the sparsity patterns, or to the intersection of +those paths and any (like `*.c`) that the user might also specify on the +command line. When false, the affected commands will work on full trees, -+ignoring the sparsity patterns. For now, only git-grep honors this setting. In -+this command, the restriction takes effect in three cases: with --cached; when -+a commit-ish is given; when searching a working tree where some paths excluded -+by the sparsity patterns are present (e.g. manually created paths or not -+removed submodules). ++ignoring the sparsity patterns. For now, only git-grep honors this setting. ++ +Note: commands which export, integrity check, or create history will always +operate on full trees (e.g. fast-export, format-patch, fsck, commit, etc.), -+unaffected by any sparsity patterns. Also, writting commands such as ++unaffected by any sparsity patterns. Also, writing commands such as +sparse-checkout and read-tree will not be affected by this configuration. - ## Documentation/git-grep.txt ## -@@ Documentation/git-grep.txt: characters. An empty string as search expression matches all lines. - CONFIGURATION - ------------- - -+git-grep honors the sparse.restrictCmds setting. See its definition in -+linkgit:git-config[1]. -+ - :git-grep: 1 - include::config/grep.txt[] - - ## Documentation/git.txt ## @@ Documentation/git.txt: If you just want to run git as if it was started in `<path>` then use Do not perform optional operations that require locks. 
This is @@ contrib/completion/git-completion.bash: __git_main () *) ## git.c ## -@@ git.c: const char git_more_info_string[] = - "See 'git help git' for an overview of the system."); - - static int use_pager = -1; -+int opt_restrict_to_sparse_paths = -1; - - static void list_builtins(struct string_list *list, unsigned int exclude_option); +@@ + #include "run-command.h" + #include "alias.h" + #include "shallow.h" ++#include "sparse-checkout.h" + #define RUN_SETUP (1<<0) + #define RUN_SETUP_GENTLY (1<<1) @@ git.c: static int handle_options(const char ***argv, int *argc, int *envchanged) } else { exit(list_cmds(cmd)); @@ git.c: static int handle_options(const char ***argv, int *argc, int *envchanged) } else { fprintf(stderr, _("unknown option: %s\n"), cmd); usage(git_usage_string); -@@ git.c: static int handle_options(const char ***argv, int *argc, int *envchanged) - (*argv)++; - (*argc)--; - } -+ - return (*argv) - orig_argv; - } - ## sparse-checkout.c (new) ## @@ @@ sparse-checkout.c (new) +#include "config.h" +#include "sparse-checkout.h" + ++int opt_restrict_to_sparse_paths = -1; ++ +int restrict_to_sparse_paths(struct repository *repo) +{ + int ret; @@ sparse-checkout.h (new) + +struct repository; + -+extern int opt_restrict_to_sparse_paths; /* from git.c */ ++extern int opt_restrict_to_sparse_paths; + +/* Whether or not cmds should restrict behavior on sparse paths, in this repo */ +int restrict_to_sparse_paths(struct repository *repo); @@ t/t7817-grep-sparse-checkout.sh: test_expect_success 'setup' ' test_path_is_file sub2/a ' --# The test bellow checks a special case: the sparsity patterns exclude '/b' -+# The two tests bellow check a special case: the sparsity patterns exclude '/b' - # and sparse checkout is enable, but the path exists on the working tree (e.g. 
+-# The test below checks a special case: the sparsity patterns exclude '/b' ++# The two tests below check a special case: the sparsity patterns exclude '/b' + # and sparse checkout is enabled, but the path exists in the working tree (e.g. # manually created after `git sparse-checkout init`). In this case, grep should -# skip it. +# skip the file by default, but not with --no-restrict-to-sparse-paths. @@ t/t7817-grep-sparse-checkout.sh: test_expect_success 'grep in working tree shoul + test_cmp expect actual +' - test_expect_success 'grep --cached should honor sparse checkout' ' + test_expect_success 'grep unmerged file despite not matching sparsity patterns' ' cat >expect <<-EOF && @@ t/t7817-grep-sparse-checkout.sh: test_expect_success 'grep <tree-ish> should ignore sparsity patterns' ' ' @@ t/t7817-grep-sparse-checkout.sh: test_expect_success 'grep --recurse-submodules + ' +done + -+test_expect_success 'grep --recurse-submodules --cached \w --no-restrict-to-sparse-paths' ' ++test_expect_success 'grep --recurse-submodules --cached w/ --no-restrict-to-sparse-paths' ' + cat >expect <<-EOF && + a:text + b:text @@ t/t7817-grep-sparse-checkout.sh: test_expect_success 'grep --recurse-submodules + test_cmp expect actual +' + -+test_expect_success 'grep --recurse-submodules <commit-ish> \w --no-restrict-to-sparse-paths' ' ++test_expect_success 'grep --recurse-submodules <commit-ish> w/ --no-restrict-to-sparse-paths' ' + commit=$(git rev-parse HEAD) && + cat >expect_commit <<-EOF && + $commit:a:text -- 2.26.2 ^ permalink raw reply [flat|nested] 123+ messages in thread
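The commit message for "grep: honor sparse checkout patterns" above explains why `git grep $TREE` cannot apply the sparsity patterns: without knowing where $TREE sits in the repository, an anchored pattern such as "!/a" would be compared against the wrong path. The sketch below illustrates only that reasoning, using the "/a" versus "/d/a" layout from the earlier version of the message. It is not git's matcher: `matches_anchored()` and `excluded_by_not_slash_a()` are hypothetical helpers that handle a single literal, anchored pattern.

```c
#include <assert.h>
#include <string.h>

/*
 * Simplified stand-in for sparsity matching (NOT git's real matcher):
 * an anchored pattern such as "/a" matches only the path "a" counted
 * from the repository root.
 */
static int matches_anchored(const char *pattern, const char *path_from_root)
{
	if (pattern[0] != '/')
		return 0; /* this sketch handles anchored patterns only */
	return !strcmp(pattern + 1, path_from_root);
}

/*
 * Returns 1 if the exclusion rule "!/a" would hide the entry. The caller
 * passes the entry's path relative to whatever it believes the root is.
 */
int excluded_by_not_slash_a(const char *path)
{
	return matches_anchored("/a", path);
}

/* Usage: the same file, seen with and without knowledge of its parent. */
void demo_tree_case(void)
{
	/* Full path known (grep on a commit): /d/a is kept. */
	assert(excluded_by_not_slash_a("d/a") == 0);
	/* Only the tree for /d is given, so the entry looks like "a": */
	/* it would be skipped by mistake. */
	assert(excluded_by_not_slash_a("a") == 1);
}
```

The second assertion is the false match the commit message describes, which is why the $TREE case searches every file in the tree instead.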
* [PATCH v4 1/6] doc: grep: unify info on configuration variables 2020-06-12 15:44 ` [PATCH v4 0/6] grep: honor sparse checkout and add option to ignore it Matheus Tavares @ 2020-06-12 15:44 ` Matheus Tavares 2020-06-12 15:45 ` [PATCH v4 2/6] t/helper/test-config: return exit codes consistently Matheus Tavares ` (6 subsequent siblings) 7 siblings, 0 replies; 123+ messages in thread From: Matheus Tavares @ 2020-06-12 15:44 UTC (permalink / raw) To: git; +Cc: gitster, stolee, newren, jonathantanmy Explanations about the configuration variables for git-grep are duplicated in "Documentation/git-grep.txt" and "Documentation/config/grep.txt", which can make maintenance difficult. The first also contains a definition not present in the latter (grep.fullName). To avoid problems like this, let's unify the information in the second file and include it in the first. Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br> --- Documentation/config/grep.txt | 10 ++++++++-- Documentation/git-grep.txt | 36 ++++++----------------------------- 2 files changed, 14 insertions(+), 32 deletions(-) diff --git a/Documentation/config/grep.txt b/Documentation/config/grep.txt index 44abe45a7c..dd51db38e1 100644 --- a/Documentation/config/grep.txt +++ b/Documentation/config/grep.txt @@ -16,8 +16,14 @@ grep.extendedRegexp:: other than 'default'. grep.threads:: - Number of grep worker threads to use. - See `grep.threads` in linkgit:git-grep[1] for more information. + Number of grep worker threads to use. See `--threads` +ifndef::git-grep[] + in linkgit:git-grep[1] +endif::git-grep[] + for more information. + +grep.fullName:: + If set to true, enable `--full-name` option by default. grep.fallbackToNoIndex:: If set to true, fall back to git grep --no-index if git grep diff --git a/Documentation/git-grep.txt b/Documentation/git-grep.txt index a7f9bc99ea..9bdf807584 100644 --- a/Documentation/git-grep.txt +++ b/Documentation/git-grep.txt @@ -41,34 +41,8 @@ characters. 
An empty string as search expression matches all lines. CONFIGURATION ------------- -grep.lineNumber:: - If set to true, enable `-n` option by default. - -grep.column:: - If set to true, enable the `--column` option by default. - -grep.patternType:: - Set the default matching behavior. Using a value of 'basic', 'extended', - 'fixed', or 'perl' will enable the `--basic-regexp`, `--extended-regexp`, - `--fixed-strings`, or `--perl-regexp` option accordingly, while the - value 'default' will return to the default matching behavior. - -grep.extendedRegexp:: - If set to true, enable `--extended-regexp` option by default. This - option is ignored when the `grep.patternType` option is set to a value - other than 'default'. - -grep.threads:: - Number of grep worker threads to use. If unset (or set to 0), Git will - use as many threads as the number of logical cores available. - -grep.fullName:: - If set to true, enable `--full-name` option by default. - -grep.fallbackToNoIndex:: - If set to true, fall back to git grep --no-index if git grep - is executed outside of a git repository. Defaults to false. - +:git-grep: 1 +include::config/grep.txt[] OPTIONS ------- @@ -269,8 +243,10 @@ providing this option will cause it to die. found. --threads <num>:: - Number of grep worker threads to use. - See `grep.threads` in 'CONFIGURATION' for more information. + Number of grep worker threads to use. If not provided (or set to + 0), Git will use as many worker threads as the number of logical + cores available. The default value can also be set with the + `grep.threads` configuration. -f <file>:: Read patterns from <file>, one per line. -- 2.26.2 ^ permalink raw reply related [flat|nested] 123+ messages in thread
* [PATCH v4 2/6] t/helper/test-config: return exit codes consistently 2020-06-12 15:44 ` [PATCH v4 0/6] grep: honor sparse checkout and add option to ignore it Matheus Tavares 2020-06-12 15:44 ` [PATCH v4 1/6] doc: grep: unify info on configuration variables Matheus Tavares @ 2020-06-12 15:45 ` Matheus Tavares 2020-06-12 15:45 ` [PATCH v4 3/6] t/helper/test-config: facilitate addition of new cli options Matheus Tavares ` (5 subsequent siblings) 7 siblings, 0 replies; 123+ messages in thread From: Matheus Tavares @ 2020-06-12 15:45 UTC (permalink / raw) To: git; +Cc: gitster, stolee, newren, jonathantanmy The test-config helper may exit with a variety of at least four different codes, to reflect the status of the requested operations. These codes are sometimes checked in the tests, but not all of the codes are returned consistently by the helper: 1 will usually refer to a "value not found", but usage errors can also return 1 or 128. Moreover, 128 is also expected on errors within the configset functions. These inconsistent uses of the exit codes can lead to false positives in the tests. Although all tests which expect errors and check the helper's exit code currently also check the output, it's still better to standardize the exit codes and avoid future problems in new tests. While we are here, let's also check that we have the expected argc for configset_get_value and configset_get_value_multi, before trying to use argv. Note: this change is implemented with the unification of the exit labels. This might seem unnecessary, for now, but it will benefit the next patch, which will increase the cleanup section. 
Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br> --- t/helper/test-config.c | 76 ++++++++++++++++++++++-------------------- 1 file changed, 40 insertions(+), 36 deletions(-) diff --git a/t/helper/test-config.c b/t/helper/test-config.c index 234c722b48..1c8e965840 100644 --- a/t/helper/test-config.c +++ b/t/helper/test-config.c @@ -30,6 +30,14 @@ * iterate -> iterate over all values using git_config(), and print some * data for each * + * Exit codes: + * 0: success + * 1: value not found for the given config key + * 2: config file path given as argument is inaccessible or doesn't exist + * 129: test-config usage error + * + * Note: tests may also expect 128 for die() calls in the config machinery. + * * Examples: * * To print the value with highest priority for key "foo.bAr Baz.rock": @@ -64,35 +72,42 @@ static int early_config_cb(const char *var, const char *value, void *vdata) return 0; } +enum test_config_exit_code { + TC_SUCCESS = 0, + TC_VALUE_NOT_FOUND = 1, + TC_CONFIG_FILE_ERROR = 2, + TC_USAGE_ERROR = 129, +}; + int cmd__config(int argc, const char **argv) { int i, val; const char *v; const struct string_list *strptr; struct config_set cs; + enum test_config_exit_code ret = TC_SUCCESS; if (argc == 3 && !strcmp(argv[1], "read_early_config")) { read_early_config(early_config_cb, (void *)argv[2]); - return 0; + return TC_SUCCESS; } setup_git_directory(); git_configset_init(&cs); - if (argc < 2) { - fprintf(stderr, "Please, provide a command name on the command-line\n"); - goto exit1; - } else if (argc == 3 && !strcmp(argv[1], "get_value")) { + if (argc < 2) + goto print_usage_error; + + if (argc == 3 && !strcmp(argv[1], "get_value")) { if (!git_config_get_value(argv[2], &v)) { if (!v) printf("(NULL)\n"); else printf("%s\n", v); - goto exit0; } else { printf("Value not found for \"%s\"\n", argv[2]); - goto exit1; + ret = TC_VALUE_NOT_FOUND; } } else if (argc == 3 && !strcmp(argv[1], "get_value_multi")) { strptr = git_config_get_value_multi(argv[2]); @@ 
-104,41 +119,38 @@ int cmd__config(int argc, const char **argv) else printf("%s\n", v); } - goto exit0; } else { printf("Value not found for \"%s\"\n", argv[2]); - goto exit1; + ret = TC_VALUE_NOT_FOUND; } } else if (argc == 3 && !strcmp(argv[1], "get_int")) { if (!git_config_get_int(argv[2], &val)) { printf("%d\n", val); - goto exit0; } else { printf("Value not found for \"%s\"\n", argv[2]); - goto exit1; + ret = TC_VALUE_NOT_FOUND; } } else if (argc == 3 && !strcmp(argv[1], "get_bool")) { if (!git_config_get_bool(argv[2], &val)) { printf("%d\n", val); - goto exit0; } else { printf("Value not found for \"%s\"\n", argv[2]); - goto exit1; + ret = TC_VALUE_NOT_FOUND; } } else if (argc == 3 && !strcmp(argv[1], "get_string")) { if (!git_config_get_string_const(argv[2], &v)) { printf("%s\n", v); - goto exit0; } else { printf("Value not found for \"%s\"\n", argv[2]); - goto exit1; + ret = TC_VALUE_NOT_FOUND; } - } else if (!strcmp(argv[1], "configset_get_value")) { + } else if (argc >= 3 && !strcmp(argv[1], "configset_get_value")) { for (i = 3; i < argc; i++) { int err; if ((err = git_configset_add_file(&cs, argv[i]))) { fprintf(stderr, "Error (%d) reading configuration file %s.\n", err, argv[i]); - goto exit2; + ret = TC_CONFIG_FILE_ERROR; + goto out; } } if (!git_configset_get_value(&cs, argv[2], &v)) { @@ -146,17 +158,17 @@ int cmd__config(int argc, const char **argv) printf("(NULL)\n"); else printf("%s\n", v); - goto exit0; } else { printf("Value not found for \"%s\"\n", argv[2]); - goto exit1; + ret = TC_VALUE_NOT_FOUND; } - } else if (!strcmp(argv[1], "configset_get_value_multi")) { + } else if (argc >= 3 && !strcmp(argv[1], "configset_get_value_multi")) { for (i = 3; i < argc; i++) { int err; if ((err = git_configset_add_file(&cs, argv[i]))) { fprintf(stderr, "Error (%d) reading configuration file %s.\n", err, argv[i]); - goto exit2; + ret = TC_CONFIG_FILE_ERROR; + goto out; } } strptr = git_configset_get_value_multi(&cs, argv[2]); @@ -168,27 +180,19 @@ int 
cmd__config(int argc, const char **argv) else printf("%s\n", v); } - goto exit0; } else { printf("Value not found for \"%s\"\n", argv[2]); - goto exit1; + ret = TC_VALUE_NOT_FOUND; } } else if (!strcmp(argv[1], "iterate")) { git_config(iterate_cb, NULL); - goto exit0; + } else { +print_usage_error: + fprintf(stderr, "Invalid syntax. Usage: test-tool config <cmd> [args]\n"); + ret = TC_USAGE_ERROR; } - die("%s: Please check the syntax and the function name", argv[0]); - -exit0: - git_configset_clear(&cs); - return 0; - -exit1: - git_configset_clear(&cs); - return 1; - -exit2: +out: git_configset_clear(&cs); - return 2; + return ret; } -- 2.26.2 ^ permalink raw reply related [flat|nested] 123+ messages in thread
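The exit-code scheme and the single `out:` label introduced by the patch above can be sketched in isolation. This is a minimal illustration, not the helper itself: the enum values are the ones from the patch, while `key_exists()`, `file_readable()`, `run_cmd()` and the `cleanup_calls` counter are hypothetical stand-ins for the config machinery and for git_configset_clear().

```c
#include <assert.h>
#include <string.h>

enum test_config_exit_code {
	TC_SUCCESS = 0,
	TC_VALUE_NOT_FOUND = 1,
	TC_CONFIG_FILE_ERROR = 2,
	TC_USAGE_ERROR = 129,
};

static int cleanup_calls; /* counts how often the shared cleanup ran */

/* Hypothetical lookup: only the key "core.found" exists. */
static int key_exists(const char *key)
{
	return !strcmp(key, "core.found");
}

/* Hypothetical file check: only "missing.cfg" is unreadable. */
static int file_readable(const char *path)
{
	return strcmp(path, "missing.cfg") != 0;
}

enum test_config_exit_code run_cmd(int argc, const char **argv)
{
	enum test_config_exit_code ret = TC_SUCCESS;
	int i;

	if (argc < 2)
		goto usage_error;

	if (argc == 3 && !strcmp(argv[1], "get_value")) {
		if (!key_exists(argv[2]))
			ret = TC_VALUE_NOT_FOUND;
	} else if (argc >= 3 && !strcmp(argv[1], "configset_get_value")) {
		for (i = 3; i < argc; i++) {
			if (!file_readable(argv[i])) {
				ret = TC_CONFIG_FILE_ERROR;
				goto out;
			}
		}
		if (!key_exists(argv[2]))
			ret = TC_VALUE_NOT_FOUND;
	} else {
usage_error:
		ret = TC_USAGE_ERROR;
	}

out:
	cleanup_calls++; /* every path leaves through the same cleanup */
	return ret;
}

/* Usage: each kind of failure maps to exactly one exit code. */
void demo_exit_codes(void)
{
	const char *found[] = { "config", "get_value", "core.found" };
	const char *absent[] = { "config", "get_value", "core.absent" };
	const char *badfile[] = { "config", "configset_get_value",
				  "core.found", "missing.cfg" };
	const char *nocmd[] = { "config" };

	assert(run_cmd(3, found) == TC_SUCCESS);
	assert(run_cmd(3, absent) == TC_VALUE_NOT_FOUND);
	assert(run_cmd(4, badfile) == TC_CONFIG_FILE_ERROR);
	assert(run_cmd(1, nocmd) == TC_USAGE_ERROR);
	assert(cleanup_calls == 4);
}
```

Because a usage error can no longer return 1, a test that expects "value not found" cannot pass by accident when the helper was simply invoked incorrectly.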
* [PATCH v4 3/6] t/helper/test-config: facilitate addition of new cli options 2020-06-12 15:44 ` [PATCH v4 0/6] grep: honor sparse checkout and add option to ignore it Matheus Tavares 2020-06-12 15:44 ` [PATCH v4 1/6] doc: grep: unify info on configuration variables Matheus Tavares 2020-06-12 15:45 ` [PATCH v4 2/6] t/helper/test-config: return exit codes consistently Matheus Tavares @ 2020-06-12 15:45 ` Matheus Tavares 2020-06-12 15:45 ` [PATCH v4 4/6] config: correctly read worktree configs in submodules Matheus Tavares ` (4 subsequent siblings) 7 siblings, 0 replies; 123+ messages in thread From: Matheus Tavares @ 2020-06-12 15:45 UTC (permalink / raw) To: git; +Cc: gitster, stolee, newren, jonathantanmy test-config parses its arguments in an if-else chain, with one arm for each available subcommand. Every arm expects (and checks) that argv corresponds to something like "config <subcommand> [<subcommand args>]". This means that whenever we want to change the syntax to accommodate a new argument before <subcommand> (as we will do in the next patch), we also need to increment the indexes accessing argv everywhere in the if-else chain. This makes patches adding new options much noisier than they need to be, besides being error-prone. So let's skip the "config" argument in argv and argc to take the extra complexity out of such patches (as the following one). 
Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br> --- t/helper/test-config.c | 64 ++++++++++++++++++++++-------------------- 1 file changed, 33 insertions(+), 31 deletions(-) diff --git a/t/helper/test-config.c b/t/helper/test-config.c index 1c8e965840..61da2574c5 100644 --- a/t/helper/test-config.c +++ b/t/helper/test-config.c @@ -84,33 +84,35 @@ int cmd__config(int argc, const char **argv) int i, val; const char *v; const struct string_list *strptr; - struct config_set cs; + struct config_set cs = { .hash_initialized = 0 }; enum test_config_exit_code ret = TC_SUCCESS; - if (argc == 3 && !strcmp(argv[1], "read_early_config")) { - read_early_config(early_config_cb, (void *)argv[2]); + argc--; /* skip over "config" */ + argv++; + + if (argc == 0) + goto print_usage_error; + + if (argc == 2 && !strcmp(argv[0], "read_early_config")) { + read_early_config(early_config_cb, (void *)argv[1]); return TC_SUCCESS; } setup_git_directory(); - git_configset_init(&cs); - if (argc < 2) - goto print_usage_error; - - if (argc == 3 && !strcmp(argv[1], "get_value")) { - if (!git_config_get_value(argv[2], &v)) { + if (argc == 2 && !strcmp(argv[0], "get_value")) { + if (!git_config_get_value(argv[1], &v)) { if (!v) printf("(NULL)\n"); else printf("%s\n", v); } else { - printf("Value not found for \"%s\"\n", argv[2]); + printf("Value not found for \"%s\"\n", argv[1]); ret = TC_VALUE_NOT_FOUND; } - } else if (argc == 3 && !strcmp(argv[1], "get_value_multi")) { - strptr = git_config_get_value_multi(argv[2]); + } else if (argc == 2 && !strcmp(argv[0], "get_value_multi")) { + strptr = git_config_get_value_multi(argv[1]); if (strptr) { for (i = 0; i < strptr->nr; i++) { v = strptr->items[i].string; @@ -120,32 +122,32 @@ int cmd__config(int argc, const char **argv) printf("%s\n", v); } } else { - printf("Value not found for \"%s\"\n", argv[2]); + printf("Value not found for \"%s\"\n", argv[1]); ret = TC_VALUE_NOT_FOUND; } - } else if (argc == 3 && !strcmp(argv[1], "get_int")) { - if 
(!git_config_get_int(argv[2], &val)) { + } else if (argc == 2 && !strcmp(argv[0], "get_int")) { + if (!git_config_get_int(argv[1], &val)) { printf("%d\n", val); } else { - printf("Value not found for \"%s\"\n", argv[2]); + printf("Value not found for \"%s\"\n", argv[1]); ret = TC_VALUE_NOT_FOUND; } - } else if (argc == 3 && !strcmp(argv[1], "get_bool")) { - if (!git_config_get_bool(argv[2], &val)) { + } else if (argc == 2 && !strcmp(argv[0], "get_bool")) { + if (!git_config_get_bool(argv[1], &val)) { printf("%d\n", val); } else { - printf("Value not found for \"%s\"\n", argv[2]); + printf("Value not found for \"%s\"\n", argv[1]); ret = TC_VALUE_NOT_FOUND; } - } else if (argc == 3 && !strcmp(argv[1], "get_string")) { - if (!git_config_get_string_const(argv[2], &v)) { + } else if (argc == 2 && !strcmp(argv[0], "get_string")) { + if (!git_config_get_string_const(argv[1], &v)) { printf("%s\n", v); } else { - printf("Value not found for \"%s\"\n", argv[2]); + printf("Value not found for \"%s\"\n", argv[1]); ret = TC_VALUE_NOT_FOUND; } - } else if (argc >= 3 && !strcmp(argv[1], "configset_get_value")) { - for (i = 3; i < argc; i++) { + } else if (argc >= 2 && !strcmp(argv[0], "configset_get_value")) { + for (i = 2; i < argc; i++) { int err; if ((err = git_configset_add_file(&cs, argv[i]))) { fprintf(stderr, "Error (%d) reading configuration file %s.\n", err, argv[i]); @@ -153,17 +155,17 @@ int cmd__config(int argc, const char **argv) goto out; } } - if (!git_configset_get_value(&cs, argv[2], &v)) { + if (!git_configset_get_value(&cs, argv[1], &v)) { if (!v) printf("(NULL)\n"); else printf("%s\n", v); } else { - printf("Value not found for \"%s\"\n", argv[2]); + printf("Value not found for \"%s\"\n", argv[1]); ret = TC_VALUE_NOT_FOUND; } - } else if (argc >= 3 && !strcmp(argv[1], "configset_get_value_multi")) { - for (i = 3; i < argc; i++) { + } else if (argc >= 2 && !strcmp(argv[0], "configset_get_value_multi")) { + for (i = 2; i < argc; i++) { int err; if ((err = 
git_configset_add_file(&cs, argv[i]))) { fprintf(stderr, "Error (%d) reading configuration file %s.\n", err, argv[i]); @@ -171,7 +173,7 @@ int cmd__config(int argc, const char **argv) goto out; } } - strptr = git_configset_get_value_multi(&cs, argv[2]); + strptr = git_configset_get_value_multi(&cs, argv[1]); if (strptr) { for (i = 0; i < strptr->nr; i++) { v = strptr->items[i].string; @@ -181,10 +183,10 @@ int cmd__config(int argc, const char **argv) printf("%s\n", v); } } else { - printf("Value not found for \"%s\"\n", argv[2]); + printf("Value not found for \"%s\"\n", argv[1]); ret = TC_VALUE_NOT_FOUND; } - } else if (!strcmp(argv[1], "iterate")) { + } else if (!strcmp(argv[0], "iterate")) { git_config(iterate_cb, NULL); } else { print_usage_error: -- 2.26.2 ^ permalink raw reply related [flat|nested] 123+ messages in thread
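The benefit of skipping "config" up front can be seen in a small, self-contained sketch. After the shift, consuming an optional leading argument is one more `argc--; argv++` and the subcommand arms keep using `argv[0]` and `argv[1]` unchanged. The "--submodule=" prefix is the option the next patch introduces; `first_arg_of()` is a hypothetical helper written for this illustration and is not part of test-config.

```c
#include <assert.h>
#include <string.h>

/*
 * Returns the first argument of <subcmd>, or NULL on a usage error.
 * Expects argv to look like: config [--submodule=<path>] <subcmd> <arg>
 */
const char *first_arg_of(int argc, const char **argv, const char *subcmd)
{
	static const char opt[] = "--submodule=";

	argc--; /* skip over "config" */
	argv++;

	if (argc == 0)
		return NULL;

	/* An optional leading option only needs one more shift. */
	if (!strncmp(argv[0], opt, strlen(opt))) {
		argc--;
		argv++;
		if (argc == 0)
			return NULL;
	}

	/* The subcommand arm uses the same indexes in both cases. */
	if (argc == 2 && !strcmp(argv[0], subcmd))
		return argv[1];
	return NULL;
}

/* Usage: the arm for "get_int" is identical with and without the option. */
void demo_argv_shift(void)
{
	const char *plain[] = { "config", "get_int", "core.x" };
	const char *sub[] = { "config", "--submodule=sub", "get_int", "core.x" };
	const char *none[] = { "config" };

	assert(!strcmp(first_arg_of(3, plain, "get_int"), "core.x"));
	assert(!strcmp(first_arg_of(4, sub, "get_int"), "core.x"));
	assert(first_arg_of(1, none, "get_int") == NULL);
}
```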
* [PATCH v4 4/6] config: correctly read worktree configs in submodules 2020-06-12 15:44 ` [PATCH v4 0/6] grep: honor sparse checkout and add option to ignore it Matheus Tavares ` (2 preceding siblings ...) 2020-06-12 15:45 ` [PATCH v4 3/6] t/helper/test-config: facilitate addition of new cli options Matheus Tavares @ 2020-06-12 15:45 ` Matheus Tavares 2020-06-16 19:13 ` Elijah Newren 2020-09-01 2:41 ` Jonathan Nieder 2020-06-12 15:45 ` [PATCH v4 5/6] grep: honor sparse checkout patterns Matheus Tavares ` (3 subsequent siblings) 7 siblings, 2 replies; 123+ messages in thread From: Matheus Tavares @ 2020-06-12 15:45 UTC (permalink / raw) To: git; +Cc: gitster, stolee, newren, jonathantanmy One of the steps in do_git_config_sequence() is to load the worktree-specific config file. Although the function receives a git_dir string, it relies on git_pathdup(), which uses the_repository->git_dir, to make the path to the file. Furthermore, it also checks that extensions.worktreeConfig is set through the repository_format_worktree_config variable, which refers to the_repository only. Thus, when a submodule has worktree-specific settings, a command executed in the superproject that recurses into the submodule won't find the said settings. This will be especially important in the next patch: git-grep will learn to honor sparse checkouts and, when running with --recurse-submodules, the submodule's sparse checkout settings must be loaded. As these settings are stored in the config.worktree file, they would be ignored without this patch. So let's fix this by reading the right config.worktree file and extensions.worktreeConfig setting, based on the git_dir and commondir paths given to do_git_config_sequence(). Also add a test to avoid any regressions. 
Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br> --- config.c | 21 +++++++++--- t/helper/test-config.c | 67 +++++++++++++++++++++++++++++++++----- t/t2404-worktree-config.sh | 16 +++++++++ 3 files changed, 91 insertions(+), 13 deletions(-) diff --git a/config.c b/config.c index 8db9c77098..c2d56309dc 100644 --- a/config.c +++ b/config.c @@ -1747,11 +1747,22 @@ static int do_git_config_sequence(const struct config_options *opts, ret += git_config_from_file(fn, repo_config, data); current_parsing_scope = CONFIG_SCOPE_WORKTREE; - if (!opts->ignore_worktree && repository_format_worktree_config) { - char *path = git_pathdup("config.worktree"); - if (!access_or_die(path, R_OK, 0)) - ret += git_config_from_file(fn, path, data); - free(path); + if (!opts->ignore_worktree && repo_config && opts->git_dir) { + struct repository_format repo_fmt = REPOSITORY_FORMAT_INIT; + struct strbuf buf = STRBUF_INIT; + + read_repository_format(&repo_fmt, repo_config); + + if (!verify_repository_format(&repo_fmt, &buf) && + repo_fmt.worktree_config) { + char *path = mkpathdup("%s/config.worktree", opts->git_dir); + if (!access_or_die(path, R_OK, 0)) + ret += git_config_from_file(fn, path, data); + free(path); + } + + strbuf_release(&buf); + clear_repository_format(&repo_fmt); } current_parsing_scope = CONFIG_SCOPE_COMMAND; diff --git a/t/helper/test-config.c b/t/helper/test-config.c index 61da2574c5..284f83a921 100644 --- a/t/helper/test-config.c +++ b/t/helper/test-config.c @@ -2,12 +2,19 @@ #include "cache.h" #include "config.h" #include "string-list.h" +#include "submodule-config.h" /* * This program exposes the C API of the configuration mechanism * as a set of simple commands in order to facilitate testing. * - * Reads stdin and prints result of command to stdout: + * Usage: test-tool config [--submodule=<path>] <cmd> [<args>] + * + * If --submodule=<path> is given, <cmd> will operate on the submodule at the + * given <path>. 
This option is not valid for the commands: read_early_config, + * configset_get_value and configset_get_value_multi. + * + * Possible cmds are: * * get_value -> prints the value with highest priority for the entered key * @@ -86,6 +93,8 @@ int cmd__config(int argc, const char **argv) const struct string_list *strptr; struct config_set cs = { .hash_initialized = 0 }; enum test_config_exit_code ret = TC_SUCCESS; + struct repository *repo = the_repository; + const char *subrepo_path = NULL; argc--; /* skip over "config" */ argv++; @@ -93,7 +102,18 @@ int cmd__config(int argc, const char **argv) if (argc == 0) goto print_usage_error; + if (skip_prefix(*argv, "--submodule=", &subrepo_path)) { + argc--; + argv++; + if (argc == 0) + goto print_usage_error; + } + if (argc == 2 && !strcmp(argv[0], "read_early_config")) { + if (subrepo_path) { + fprintf(stderr, "Cannot use --submodule with read_early_config\n"); + return TC_USAGE_ERROR; + } read_early_config(early_config_cb, (void *)argv[1]); return TC_SUCCESS; } @@ -101,8 +121,23 @@ int cmd__config(int argc, const char **argv) setup_git_directory(); git_configset_init(&cs); + if (subrepo_path) { + const struct submodule *sub; + struct repository *subrepo = xcalloc(1, sizeof(*repo)); + + sub = submodule_from_path(the_repository, &null_oid, subrepo_path); + if (!sub || repo_submodule_init(subrepo, the_repository, sub)) { + fprintf(stderr, "Invalid argument to --submodule: '%s'\n", + subrepo_path); + free(subrepo); + ret = TC_USAGE_ERROR; + goto out; + } + repo = subrepo; + } + if (argc == 2 && !strcmp(argv[0], "get_value")) { - if (!git_config_get_value(argv[1], &v)) { + if (!repo_config_get_value(repo, argv[1], &v)) { if (!v) printf("(NULL)\n"); else @@ -112,7 +147,7 @@ int cmd__config(int argc, const char **argv) ret = TC_VALUE_NOT_FOUND; } } else if (argc == 2 && !strcmp(argv[0], "get_value_multi")) { - strptr = git_config_get_value_multi(argv[1]); + strptr = repo_config_get_value_multi(repo, argv[1]); if (strptr) { for (i 
= 0; i < strptr->nr; i++) { v = strptr->items[i].string; @@ -126,27 +161,33 @@ int cmd__config(int argc, const char **argv) ret = TC_VALUE_NOT_FOUND; } } else if (argc == 2 && !strcmp(argv[0], "get_int")) { - if (!git_config_get_int(argv[1], &val)) { + if (!repo_config_get_int(repo, argv[1], &val)) { printf("%d\n", val); } else { printf("Value not found for \"%s\"\n", argv[1]); ret = TC_VALUE_NOT_FOUND; } } else if (argc == 2 && !strcmp(argv[0], "get_bool")) { - if (!git_config_get_bool(argv[1], &val)) { + if (!repo_config_get_bool(repo, argv[1], &val)) { printf("%d\n", val); } else { + printf("Value not found for \"%s\"\n", argv[1]); ret = TC_VALUE_NOT_FOUND; } } else if (argc == 2 && !strcmp(argv[0], "get_string")) { - if (!git_config_get_string_const(argv[1], &v)) { + if (!repo_config_get_string_const(repo, argv[1], &v)) { printf("%s\n", v); } else { printf("Value not found for \"%s\"\n", argv[1]); ret = TC_VALUE_NOT_FOUND; } } else if (argc >= 2 && !strcmp(argv[0], "configset_get_value")) { + if (subrepo_path) { + fprintf(stderr, "Cannot use --submodule with configset_get_value\n"); + ret = TC_USAGE_ERROR; + goto out; + } for (i = 2; i < argc; i++) { int err; if ((err = git_configset_add_file(&cs, argv[i]))) { @@ -165,6 +206,11 @@ int cmd__config(int argc, const char **argv) ret = TC_VALUE_NOT_FOUND; } } else if (argc >= 2 && !strcmp(argv[0], "configset_get_value_multi")) { + if (subrepo_path) { + fprintf(stderr, "Cannot use --submodule with configset_get_value_multi\n"); + ret = TC_USAGE_ERROR; + goto out; + } for (i = 2; i < argc; i++) { int err; if ((err = git_configset_add_file(&cs, argv[i]))) { @@ -187,14 +233,19 @@ int cmd__config(int argc, const char **argv) ret = TC_VALUE_NOT_FOUND; } } else if (!strcmp(argv[0], "iterate")) { - git_config(iterate_cb, NULL); + repo_config(repo, iterate_cb, NULL); } else { print_usage_error: - fprintf(stderr, "Invalid syntax. Usage: test-tool config <cmd> [args]\n"); + fprintf(stderr, "Invalid syntax. 
Usage: test-tool config" + " [--submodule=<path>] <cmd> [args]\n"); ret = TC_USAGE_ERROR; } out: git_configset_clear(&cs); + if (repo != the_repository) { + repo_clear(repo); + free(repo); + } return ret; } diff --git a/t/t2404-worktree-config.sh b/t/t2404-worktree-config.sh index 286121d8de..b6ab793203 100755 --- a/t/t2404-worktree-config.sh +++ b/t/t2404-worktree-config.sh @@ -76,4 +76,20 @@ test_expect_success 'config.worktree no longer read without extension' ' test_cmp_config -C wt2 shared this.is ' +test_expect_success 'correctly read config.worktree from submodules' ' + test_unconfig extensions.worktreeConfig && + git init sub && + ( + cd sub && + test_commit A && + git config extensions.worktreeConfig true && + git config --worktree wtconfig.sub test-value + ) && + git submodule add ./sub && + git commit -m "add sub" && + echo test-value >expect && + test-tool config --submodule=sub get_value wtconfig.sub >actual && + test_cmp expect actual +' + test_done -- 2.26.2 ^ permalink raw reply related [flat|nested] 123+ messages in thread
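For readers following along, the core of the config.c change can be sketched outside of git as follows. This is only a minimal standalone model, not the code from the patch: struct repo_format_sketch and worktree_config_path() are hypothetical names invented for illustration, and plain malloc()/snprintf() stand in for git's mkpathdup(). The point it tries to show is that both the path and the extensions.worktreeConfig check come from the repository being read, not from process-wide state.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the per-repository format data. */
struct repo_format_sketch {
	int worktree_config; /* nonzero if extensions.worktreeConfig is set */
};

/*
 * Return a newly allocated "<git_dir>/config.worktree", or NULL when the
 * worktree config should not be read at all. The caller frees the result.
 */
char *worktree_config_path(const char *git_dir,
			   const struct repo_format_sketch *fmt,
			   int ignore_worktree)
{
	size_t len;
	char *path;

	/* No repository, or the extension is off for *this* repository. */
	if (ignore_worktree || !git_dir || !fmt || !fmt->worktree_config)
		return NULL;

	len = strlen(git_dir) + strlen("/config.worktree") + 1;
	path = malloc(len);
	if (!path)
		return NULL;
	snprintf(path, len, "%s/config.worktree", git_dir);
	return path;
}
```

Under these assumptions, a submodule whose git dir is ".git/modules/sub" would resolve to ".git/modules/sub/config.worktree" only when its own format data has the extension enabled, regardless of what the superproject sets.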
* Re: [PATCH v4 4/6] config: correctly read worktree configs in submodules 2020-06-12 15:45 ` [PATCH v4 4/6] config: correctly read worktree configs in submodules Matheus Tavares @ 2020-06-16 19:13 ` Elijah Newren 2020-06-21 16:05 ` Matheus Tavares Bernardino 2020-09-01 2:41 ` Jonathan Nieder 1 sibling, 1 reply; 123+ messages in thread From: Elijah Newren @ 2020-06-16 19:13 UTC (permalink / raw) To: Matheus Tavares Cc: Git Mailing List, Junio C Hamano, Derrick Stolee, Jonathan Tan, Jeff King, Jonathan Nieder Hi, On Fri, Jun 12, 2020 at 8:45 AM Matheus Tavares <matheus.bernardino@usp.br> wrote: > > One of the steps in do_git_config_sequence() is to load the > worktree-specific config file. Although the function receives a git_dir > string, it relies on git_pathdup(), which uses the_repository->git_dir, > to make the path to the file. Furthermore, it also checks that > extensions.worktreeConfig is set through the > repository_format_worktree_config variable, which refers to > the_repository only. Thus, when a submodule has worktree-specific > settings, a command executed in the superproject that recurses into the > submodule won't find the said settings. > > This will be especially important in the next patch: git-grep will learn > to honor sparse checkouts and, when running with --recurse-submodules, > the submodule's sparse checkout settings must be loaded. As these > settings are stored in the config.worktree file, they would be ignored > without this patch. So let's fix this by reading the right > config.worktree file and extensions.worktreeConfig setting, based on the > git_dir and commondir paths given to do_git_config_sequence(). Also > add a test to avoid any regressions. Thanks for splitting this part of the change out from the previous patch. 
> > Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br> > --- > config.c | 21 +++++++++--- > t/helper/test-config.c | 67 +++++++++++++++++++++++++++++++++----- > t/t2404-worktree-config.sh | 16 +++++++++ > 3 files changed, 91 insertions(+), 13 deletions(-) > > diff --git a/config.c b/config.c > index 8db9c77098..c2d56309dc 100644 > --- a/config.c > +++ b/config.c > @@ -1747,11 +1747,22 @@ static int do_git_config_sequence(const struct config_options *opts, > ret += git_config_from_file(fn, repo_config, data); > > current_parsing_scope = CONFIG_SCOPE_WORKTREE; > - if (!opts->ignore_worktree && repository_format_worktree_config) { > - char *path = git_pathdup("config.worktree"); > - if (!access_or_die(path, R_OK, 0)) > - ret += git_config_from_file(fn, path, data); > - free(path); > + if (!opts->ignore_worktree && repo_config && opts->git_dir) { What happens when opts->git_dir is NULL? (Does that ever even happen?) Should it fall back to the old code path in that case? > + struct repository_format repo_fmt = REPOSITORY_FORMAT_INIT; > + struct strbuf buf = STRBUF_INIT; > + > + read_repository_format(&repo_fmt, repo_config); > + > + if (!verify_repository_format(&repo_fmt, &buf) && > + repo_fmt.worktree_config) { > + char *path = mkpathdup("%s/config.worktree", opts->git_dir); > + if (!access_or_die(path, R_OK, 0)) > + ret += git_config_from_file(fn, path, data); > + free(path); > + } > + > + strbuf_release(&buf); > + clear_repository_format(&repo_fmt); I've tried to poke around a little at this block, but as with the previous series, I still feel like it should be reviewed by someone who knows submodules and/or config handling better. That's easier now that the patch has been split up. Unfortunately, trying to figure out who the submodule expert(s) are (looking at authors of submodule*.[ch]) seems to lead me to a who's who of people who are no longer active in the project. :-( Maybe Peff or jrnieder would have suggestions; cc'ing them. 
> } > > current_parsing_scope = CONFIG_SCOPE_COMMAND; > diff --git a/t/helper/test-config.c b/t/helper/test-config.c > index 61da2574c5..284f83a921 100644 > --- a/t/helper/test-config.c > +++ b/t/helper/test-config.c > @@ -2,12 +2,19 @@ > #include "cache.h" > #include "config.h" > #include "string-list.h" > +#include "submodule-config.h" > > /* > * This program exposes the C API of the configuration mechanism > * as a set of simple commands in order to facilitate testing. > * > - * Reads stdin and prints result of command to stdout: > + * Usage: test-tool config [--submodule=<path>] <cmd> [<args>] > + * > + * If --submodule=<path> is given, <cmd> will operate on the submodule at the > + * given <path>. This option is not valid for the commands: read_early_config, > + * configset_get_value and configset_get_value_multi. > + * > + * Possible cmds are: > * > * get_value -> prints the value with highest priority for the entered key > * > @@ -86,6 +93,8 @@ int cmd__config(int argc, const char **argv) > const struct string_list *strptr; > struct config_set cs = { .hash_initialized = 0 }; > enum test_config_exit_code ret = TC_SUCCESS; > + struct repository *repo = the_repository; > + const char *subrepo_path = NULL; > > argc--; /* skip over "config" */ > argv++; > @@ -93,7 +102,18 @@ int cmd__config(int argc, const char **argv) > if (argc == 0) > goto print_usage_error; > > + if (skip_prefix(*argv, "--submodule=", &subrepo_path)) { > + argc--; > + argv++; > + if (argc == 0) > + goto print_usage_error; > + } > + > if (argc == 2 && !strcmp(argv[0], "read_early_config")) { > + if (subrepo_path) { > + fprintf(stderr, "Cannot use --submodule with read_early_config\n"); > + return TC_USAGE_ERROR; > + } > read_early_config(early_config_cb, (void *)argv[1]); > return TC_SUCCESS; > } > @@ -101,8 +121,23 @@ int cmd__config(int argc, const char **argv) > setup_git_directory(); > git_configset_init(&cs); > > + if (subrepo_path) { > + const struct submodule *sub; > + struct repository 
*subrepo = xcalloc(1, sizeof(*repo)); > + > + sub = submodule_from_path(the_repository, &null_oid, subrepo_path); > + if (!sub || repo_submodule_init(subrepo, the_repository, sub)) { > + fprintf(stderr, "Invalid argument to --submodule: '%s'\n", > + subrepo_path); > + free(subrepo); > + ret = TC_USAGE_ERROR; > + goto out; > + } > + repo = subrepo; > + } > + > if (argc == 2 && !strcmp(argv[0], "get_value")) { > - if (!git_config_get_value(argv[1], &v)) { > + if (!repo_config_get_value(repo, argv[1], &v)) { > if (!v) > printf("(NULL)\n"); > else > @@ -112,7 +147,7 @@ int cmd__config(int argc, const char **argv) > ret = TC_VALUE_NOT_FOUND; > } > } else if (argc == 2 && !strcmp(argv[0], "get_value_multi")) { > - strptr = git_config_get_value_multi(argv[1]); > + strptr = repo_config_get_value_multi(repo, argv[1]); > if (strptr) { > for (i = 0; i < strptr->nr; i++) { > v = strptr->items[i].string; > @@ -126,27 +161,33 @@ int cmd__config(int argc, const char **argv) > ret = TC_VALUE_NOT_FOUND; > } > } else if (argc == 2 && !strcmp(argv[0], "get_int")) { > - if (!git_config_get_int(argv[1], &val)) { > + if (!repo_config_get_int(repo, argv[1], &val)) { > printf("%d\n", val); > } else { > printf("Value not found for \"%s\"\n", argv[1]); > ret = TC_VALUE_NOT_FOUND; > } > } else if (argc == 2 && !strcmp(argv[0], "get_bool")) { > - if (!git_config_get_bool(argv[1], &val)) { > + if (!repo_config_get_bool(repo, argv[1], &val)) { > printf("%d\n", val); > } else { > + > printf("Value not found for \"%s\"\n", argv[1]); > ret = TC_VALUE_NOT_FOUND; > } > } else if (argc == 2 && !strcmp(argv[0], "get_string")) { > - if (!git_config_get_string_const(argv[1], &v)) { > + if (!repo_config_get_string_const(repo, argv[1], &v)) { > printf("%s\n", v); > } else { > printf("Value not found for \"%s\"\n", argv[1]); > ret = TC_VALUE_NOT_FOUND; > } > } else if (argc >= 2 && !strcmp(argv[0], "configset_get_value")) { > + if (subrepo_path) { > + fprintf(stderr, "Cannot use --submodule with 
configset_get_value\n"); > + ret = TC_USAGE_ERROR; > + goto out; > + } > for (i = 2; i < argc; i++) { > int err; > if ((err = git_configset_add_file(&cs, argv[i]))) { > @@ -165,6 +206,11 @@ int cmd__config(int argc, const char **argv) > ret = TC_VALUE_NOT_FOUND; > } > } else if (argc >= 2 && !strcmp(argv[0], "configset_get_value_multi")) { > + if (subrepo_path) { > + fprintf(stderr, "Cannot use --submodule with configset_get_value_multi\n"); > + ret = TC_USAGE_ERROR; > + goto out; > + } > for (i = 2; i < argc; i++) { > int err; > if ((err = git_configset_add_file(&cs, argv[i]))) { > @@ -187,14 +233,19 @@ int cmd__config(int argc, const char **argv) > ret = TC_VALUE_NOT_FOUND; > } > } else if (!strcmp(argv[0], "iterate")) { > - git_config(iterate_cb, NULL); > + repo_config(repo, iterate_cb, NULL); > } else { > print_usage_error: > - fprintf(stderr, "Invalid syntax. Usage: test-tool config <cmd> [args]\n"); > + fprintf(stderr, "Invalid syntax. Usage: test-tool config" > + " [--submodule=<path>] <cmd> [args]\n"); > ret = TC_USAGE_ERROR; > } > > out: > git_configset_clear(&cs); > + if (repo != the_repository) { > + repo_clear(repo); > + free(repo); > + } > return ret; > } > diff --git a/t/t2404-worktree-config.sh b/t/t2404-worktree-config.sh > index 286121d8de..b6ab793203 100755 > --- a/t/t2404-worktree-config.sh > +++ b/t/t2404-worktree-config.sh > @@ -76,4 +76,20 @@ test_expect_success 'config.worktree no longer read without extension' ' > test_cmp_config -C wt2 shared this.is > ' > > +test_expect_success 'correctly read config.worktree from submodules' ' > + test_unconfig extensions.worktreeConfig && > + git init sub && > + ( > + cd sub && > + test_commit A && > + git config extensions.worktreeConfig true && > + git config --worktree wtconfig.sub test-value > + ) && > + git submodule add ./sub && > + git commit -m "add sub" && > + echo test-value >expect && > + test-tool config --submodule=sub get_value wtconfig.sub >actual && > + test_cmp expect actual > +' > + > 
test_done > -- > 2.26.2
* Re: [PATCH v4 4/6] config: correctly read worktree configs in submodules 2020-06-16 19:13 ` Elijah Newren @ 2020-06-21 16:05 ` Matheus Tavares Bernardino 0 siblings, 0 replies; 123+ messages in thread From: Matheus Tavares Bernardino @ 2020-06-21 16:05 UTC (permalink / raw) To: Elijah Newren Cc: Git Mailing List, Junio C Hamano, Derrick Stolee, Jonathan Tan, Jeff King, Jonathan Nieder On Tue, Jun 16, 2020 at 4:13 PM Elijah Newren <newren@gmail.com> wrote: > > On Fri, Jun 12, 2020 at 8:45 AM Matheus Tavares > <matheus.bernardino@usp.br> wrote: > > > > config.c | 21 +++++++++--- > > t/helper/test-config.c | 67 +++++++++++++++++++++++++++++++++----- > > t/t2404-worktree-config.sh | 16 +++++++++ > > 3 files changed, 91 insertions(+), 13 deletions(-) > > > > diff --git a/config.c b/config.c > > index 8db9c77098..c2d56309dc 100644 > > --- a/config.c > > +++ b/config.c > > @@ -1747,11 +1747,22 @@ static int do_git_config_sequence(const struct config_options *opts, > > ret += git_config_from_file(fn, repo_config, data); > > > > current_parsing_scope = CONFIG_SCOPE_WORKTREE; > > - if (!opts->ignore_worktree && repository_format_worktree_config) { > > - char *path = git_pathdup("config.worktree"); > > - if (!access_or_die(path, R_OK, 0)) > > - ret += git_config_from_file(fn, path, data); > > - free(path); > > + if (!opts->ignore_worktree && repo_config && opts->git_dir) { > > What happens when opts->git_dir is NULL? (Does that ever even > happen?) Should it fall back to the old code path in that case? Sorry for not replying earlier. Yes, opts->git_dir might be NULL in some cases. I did a quick grep search, though, and it seems that this only happens in two circumstances: (1) in builtin/config.c when startup_info->have_repository is false; and (2) in read_early_config(), if have_git_dir() returns false and discover_git_directory() fails. 
For (2), I think it is right to ignore the worktree config file when opts->git_dir is NULL because we indeed don't have a repo to read the file from. I'm tempted to say the same for (1), but I'm not very familiar with setup.c. By the definition of have_git_dir() it seems possible to have the_repository->git_dir set up even when startup_info->have_repository == false:

    int have_git_dir(void)
    {
        return startup_info->have_repository || the_repository->gitdir;
    }

Nevertheless, the current calls to config_with_options() set either both opts->git_dir and opts->commondir or neither. So if we were to fall back to the_repository->git_dir for the worktree config when startup_info->have_repository == false, the local config file would still be ignored during the config sequence in that case. I think it wouldn't make much sense to ignore the local config file but try to load the worktree-specific one, which also depends on having a repo and is even more specific.

So I think we shouldn't fall back to the old code path. But I would appreciate hearing from others more familiar with this code.

^ permalink raw reply [flat|nested] 123+ messages in thread
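The argument above (no local config without a repo, hence no worktree config either) can be modeled with a small standalone sketch. It is a simplification under stated assumptions, not git's do_git_config_sequence(): struct config_options_sketch, enum scope_sketch and repo_scopes_read() are hypothetical names, and the model only decides which repository-level files would be considered, without touching the filesystem.

```c
#include <stddef.h>

/* Hypothetical bit flags for the two repository-level config files. */
enum scope_sketch {
	SCOPE_LOCAL    = 1 << 0, /* <commondir>/config */
	SCOPE_WORKTREE = 1 << 1  /* <git_dir>/config.worktree */
};

/* Hypothetical subset of the options discussed in this thread. */
struct config_options_sketch {
	const char *commondir;
	const char *git_dir;
	int ignore_repo;
	int ignore_worktree;
};

/*
 * Decide which repository-level files would be considered.
 * worktree_config_enabled models extensions.worktreeConfig as read from
 * the repository's own config file.
 */
unsigned int repo_scopes_read(const struct config_options_sketch *opts,
			      int worktree_config_enabled)
{
	unsigned int scopes = 0;
	int have_repo_config = opts->commondir != NULL;

	if (!opts->ignore_repo && have_repo_config)
		scopes |= SCOPE_LOCAL;

	/* No fallback to global state: without git_dir, nothing is read. */
	if (!opts->ignore_worktree && have_repo_config && opts->git_dir &&
	    worktree_config_enabled)
		scopes |= SCOPE_WORKTREE;

	return scopes;
}
```

In this model, when both paths are NULL (the no-repository cases (1) and (2) above) neither file is considered, which matches the position that the worktree config should not be loaded on its own.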
* Re: [PATCH v4 4/6] config: correctly read worktree configs in submodules 2020-06-12 15:45 ` [PATCH v4 4/6] config: correctly read worktree configs in submodules Matheus Tavares 2020-06-16 19:13 ` Elijah Newren @ 2020-09-01 2:41 ` Jonathan Nieder 2020-09-01 21:44 ` Matheus Tavares Bernardino 1 sibling, 1 reply; 123+ messages in thread From: Jonathan Nieder @ 2020-09-01 2:41 UTC (permalink / raw) To: Matheus Tavares; +Cc: git, gitster, stolee, newren, jonathantanmy Hi, Matheus Tavares wrote: > One of the steps in do_git_config_sequence() is to load the > worktree-specific config file. Although the function receives a git_dir > string, it relies on git_pathdup(), which uses the_repository->git_dir, > to make the path to the file. Furthermore, it also checks that > extensions.worktreeConfig is set through the > repository_format_worktree_config variable, which refers to > the_repository only. Thus, when a submodule has worktree-specific > settings, a command executed in the superproject that recurses into the > submodule won't find the said settings. I think the above goes out of order: it states the "how" before the "what". Instead, a commit message should lead with the problem the change aims to solve. Is the idea here that until this patch, we're only able to read worktree config from a repository when extensions.worktreeConfig is set in the_repository, meaning that - when examining submodule config in a process where the_repository represents the superproject, we do not read the submodule's worktree config even if extensions.worktreeConfig is set in the submodule, unless the superproject has extensions.worktreeConfig set, and - when examining submodule config in a process where the_repository represents the superproject, we *do* read the submodule's worktree config even if extensions.worktreeConfig is not set in the submodule, if the superproject has extensions.worktreeConfig set, and ? That sounds like a serious problem indeed. Thanks for fixing it. 
> This will be especially important in the next patch: git-grep will learn > to honor sparse checkouts and, when running with --recurse-submodules, > the submodule's sparse checkout settings must be loaded. As these > settings are stored in the config.worktree file, they would be ignored > without this patch. So let's fix this by reading the right > config.worktree file and extensions.worktreeConfig setting, based on the > git_dir and commondir paths given to do_git_config_sequence(). Also > add a test to avoid any regressions. I see. I'm not sure that's more important than other cases, but I can understand if the problem was noticed in this circumstance. :) > Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br> > --- > config.c | 21 +++++++++--- > t/helper/test-config.c | 67 +++++++++++++++++++++++++++++++++----- > t/t2404-worktree-config.sh | 16 +++++++++ > 3 files changed, 91 insertions(+), 13 deletions(-) > > diff --git a/config.c b/config.c > index 8db9c77098..c2d56309dc 100644 > --- a/config.c > +++ b/config.c > @@ -1747,11 +1747,22 @@ static int do_git_config_sequence(const struct config_options *opts, > ret += git_config_from_file(fn, repo_config, data); > > current_parsing_scope = CONFIG_SCOPE_WORKTREE; > - if (!opts->ignore_worktree && repository_format_worktree_config) { > + if (!opts->ignore_worktree && repo_config && opts->git_dir) { Can we eliminate the repository_format_worktree_config global to save the next caller from the same problem? > + struct repository_format repo_fmt = REPOSITORY_FORMAT_INIT; > + struct strbuf buf = STRBUF_INIT; > + > + read_repository_format(&repo_fmt, repo_config); > + > + if (!verify_repository_format(&repo_fmt, &buf) && > + repo_fmt.worktree_config) { This undoes the caching the repository_format_worktree_config means to do. Can we cache the value in "struct repository" instead? That way, in the common case where we're reading the_repository, we wouldn't experience a slowdown. 
> - char *path = git_pathdup("config.worktree"); > + char *path = mkpathdup("%s/config.worktree", opts->git_dir); Can this use a helper like repo_git_path or strbuf_repo_git_path (preferably one using strbuf like the latter)? [...] > + strbuf_release(&buf); > + clear_repository_format(&repo_fmt); > } > > current_parsing_scope = CONFIG_SCOPE_COMMAND; > diff --git a/t/helper/test-config.c b/t/helper/test-config.c > index 61da2574c5..284f83a921 100644 > --- a/t/helper/test-config.c > +++ b/t/helper/test-config.c > @@ -2,12 +2,19 @@ > #include "cache.h" > #include "config.h" > #include "string-list.h" > +#include "submodule-config.h" > > /* > * This program exposes the C API of the configuration mechanism > * as a set of simple commands in order to facilitate testing. > * > - * Reads stdin and prints result of command to stdout: > + * Usage: test-tool config [--submodule=<path>] <cmd> [<args>] > + * > + * If --submodule=<path> is given, <cmd> will operate on the submodule at the > + * given <path>. This option is not valid for the commands: read_early_config, > + * configset_get_value and configset_get_value_multi. Nice! [...] > @@ -93,7 +102,18 @@ int cmd__config(int argc, const char **argv) > if (argc == 0) > goto print_usage_error; > > + if (skip_prefix(*argv, "--submodule=", &subrepo_path)) { > + argc--; > + argv++; > + if (argc == 0) > + goto print_usage_error; > + } Can this use the parse_options API? > + > if (argc == 2 && !strcmp(argv[0], "read_early_config")) { > + if (subrepo_path) { > + fprintf(stderr, "Cannot use --submodule with read_early_config\n"); > + return TC_USAGE_ERROR; Should this use die() or BUG()? 
> + } > read_early_config(early_config_cb, (void *)argv[1]); > return TC_SUCCESS; > } > @@ -101,8 +121,23 @@ int cmd__config(int argc, const char **argv) > setup_git_directory(); > git_configset_init(&cs); > > + if (subrepo_path) { > + const struct submodule *sub; > + struct repository *subrepo = xcalloc(1, sizeof(*repo)); nit: this could be scoped to cmd__config: struct repository subrepo = {0}; > + > + sub = submodule_from_path(the_repository, &null_oid, subrepo_path); > + if (!sub || repo_submodule_init(subrepo, the_repository, sub)) { > + fprintf(stderr, "Invalid argument to --submodule: '%s'\n", > + subrepo_path); > + free(subrepo); > + ret = TC_USAGE_ERROR; Likewise: I think we may want to use die() or BUG() (and likewise for other USAGE_ERROR cases). Thanks and hope that helps, Jonathan ^ permalink raw reply [flat|nested] 123+ messages in thread
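The caching suggestion in this review (keep the extensions.worktreeConfig value per repository instead of in a global) could look roughly like the standalone sketch below. Everything here is hypothetical: struct repo_sketch, repo_worktree_config_enabled() and sample_probe() are illustration-only names, and the probe callback merely stands in for whatever would actually read and verify the repository format.

```c
#include <stddef.h>

/* Hypothetical per-repository state; not git's struct repository. */
struct repo_sketch {
	const char *git_dir;
	int worktree_config; /* -1: not probed yet, 0: off, 1: on */
};

/* Counts how often the (expensive) probe runs, for demonstration only. */
int probe_calls;

/* Hypothetical probe: pretends the extension is enabled. */
int sample_probe(const char *git_dir)
{
	(void)git_dir;
	probe_calls++;
	return 1;
}

/*
 * Return whether worktree config is enabled for this repository,
 * running the probe at most once and caching the answer in the struct.
 */
int repo_worktree_config_enabled(struct repo_sketch *repo,
				 int (*probe)(const char *git_dir))
{
	if (repo->worktree_config < 0)
		repo->worktree_config = probe(repo->git_dir) ? 1 : 0;
	return repo->worktree_config;
}
```

With a cache like this, repeated config reads for the same repository would pay for the format check only once, while each submodule would still get its own answer.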
* Re: [PATCH v4 4/6] config: correctly read worktree configs in submodules 2020-09-01 2:41 ` Jonathan Nieder @ 2020-09-01 21:44 ` Matheus Tavares Bernardino 0 siblings, 0 replies; 123+ messages in thread From: Matheus Tavares Bernardino @ 2020-09-01 21:44 UTC (permalink / raw) To: Jonathan Nieder Cc: git, Junio C Hamano, Derrick Stolee, Elijah Newren, Jonathan Tan Hi, Jonathan On Mon, Aug 31, 2020 at 11:41 PM Jonathan Nieder <jrnieder@gmail.com> wrote: > > Hi, > > Matheus Tavares wrote: > > > One of the steps in do_git_config_sequence() is to load the > > worktree-specific config file. Although the function receives a git_dir > > string, it relies on git_pathdup(), which uses the_repository->git_dir, > > to make the path to the file. Furthermore, it also checks that > > extensions.worktreeConfig is set through the > > repository_format_worktree_config variable, which refers to > > the_repository only. Thus, when a submodule has worktree-specific > > settings, a command executed in the superproject that recurses into the > > submodule won't find the said settings. > > I think the above goes out of order: it states the "how" before the > "what". Instead, a commit message should lead with the problem the > change aims to solve. Thanks. I will reorder these two sections in the commit message. > Is the idea here that until this patch, we're only able to read > worktree config from a repository when extensions.worktreeConfig is > set in the_repository, meaning that > > - when examining submodule config in a process where the_repository > represents the superproject, we do not read the submodule's worktree > config even if extensions.worktreeConfig is set in the submodule, > unless the superproject has extensions.worktreeConfig set, and Right. 
> - when examining submodule config in a process where the_repository > represents the superproject, we *do* read the submodule's worktree > config even if extensions.worktreeConfig is not set in the submodule, > if the superproject has extensions.worktreeConfig set, and > > ? Right, but with one change: if extensions.worktreeConfig is not set in the submodule and it is set in the superproject, the *superproject's* worktree config is read (independently of which git_dir was given as argument). > That sounds like a serious problem indeed. Thanks for fixing it. > > > This will be especially important in the next patch: git-grep will learn > > to honor sparse checkouts and, when running with --recurse-submodules, > > the submodule's sparse checkout settings must be loaded. As these > > settings are stored in the config.worktree file, they would be ignored > > without this patch. So let's fix this by reading the right > > config.worktree file and extensions.worktreeConfig setting, based on the > > git_dir and commondir paths given to do_git_config_sequence(). Also > > add a test to avoid any regressions. > > I see. I'm not sure that's more important than other cases, but I > can understand if the problem was noticed in this circumstance. 
:) > > > Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br> > > --- > > config.c | 21 +++++++++--- > > t/helper/test-config.c | 67 +++++++++++++++++++++++++++++++++----- > > t/t2404-worktree-config.sh | 16 +++++++++ > > 3 files changed, 91 insertions(+), 13 deletions(-) > > > > diff --git a/config.c b/config.c > > index 8db9c77098..c2d56309dc 100644 > > --- a/config.c > > +++ b/config.c > > @@ -1747,11 +1747,22 @@ static int do_git_config_sequence(const struct config_options *opts, > > ret += git_config_from_file(fn, repo_config, data); > > > > current_parsing_scope = CONFIG_SCOPE_WORKTREE; > > - if (!opts->ignore_worktree && repository_format_worktree_config) { > > + if (!opts->ignore_worktree && repo_config && opts->git_dir) { > > Can we eliminate the repository_format_worktree_config global to save > the next caller from the same problem? Hmm, I think it's possible, I will investigate it further. > > + struct repository_format repo_fmt = REPOSITORY_FORMAT_INIT; > > + struct strbuf buf = STRBUF_INIT; > > + > > + read_repository_format(&repo_fmt, repo_config); > > + > > + if (!verify_repository_format(&repo_fmt, &buf) && > > + repo_fmt.worktree_config) { > > This undoes the caching the repository_format_worktree_config means to > do. Can we cache the value in "struct repository" instead? That way, > in the common case where we're reading the_repository, we wouldn't > experience a slowdown. Yeah, that would be the best solution. But, unfortunately, do_git_config_sequence() doesn't receive a complete repository struct, just the 'commondir' and 'git_dir' strings. > > - char *path = git_pathdup("config.worktree"); > > + char *path = mkpathdup("%s/config.worktree", opts->git_dir); > > Can this use a helper like repo_git_path or strbuf_repo_git_path > (preferably one using strbuf like the latter)? Hmm, here we would have the same problem of not having a 'struct repository' to pass to those functions :( > [...] 
> > + strbuf_release(&buf); > > + clear_repository_format(&repo_fmt); > > } > > > > current_parsing_scope = CONFIG_SCOPE_COMMAND; > > diff --git a/t/helper/test-config.c b/t/helper/test-config.c > > index 61da2574c5..284f83a921 100644 > > --- a/t/helper/test-config.c > > +++ b/t/helper/test-config.c > > @@ -2,12 +2,19 @@ > > #include "cache.h" > > #include "config.h" > > #include "string-list.h" > > +#include "submodule-config.h" > > > > /* > > * This program exposes the C API of the configuration mechanism > > * as a set of simple commands in order to facilitate testing. > > * > > - * Reads stdin and prints result of command to stdout: > > + * Usage: test-tool config [--submodule=<path>] <cmd> [<args>] > > + * > > + * If --submodule=<path> is given, <cmd> will operate on the submodule at the > > + * given <path>. This option is not valid for the commands: read_early_config, > > + * configset_get_value and configset_get_value_multi. > > Nice! > > [...] > > @@ -93,7 +102,18 @@ int cmd__config(int argc, const char **argv) > > if (argc == 0) > > goto print_usage_error; > > > > + if (skip_prefix(*argv, "--submodule=", &subrepo_path)) { > > + argc--; > > + argv++; > > + if (argc == 0) > > + goto print_usage_error; > > + } > > Can this use the parse_options API? Right, it would make it easier to add more options in the future. There is only one consideration, though, about parse_options()'s exit codes on error, but more on that below... > > + > > if (argc == 2 && !strcmp(argv[0], "read_early_config")) { > > + if (subrepo_path) { > > + fprintf(stderr, "Cannot use --submodule with read_early_config\n"); > > + return TC_USAGE_ERROR; > > Should this use die() or BUG()? The idea of using TC_USAGE_ERROR (129) here and not die() (128), was that some users of the test-config helper want to detect die() errors from the config machinery itself. So by using a different exit code, we can avoid false positives in these tests. 
Of course they should also be checking stderr/stdout, but there is at least one test which only checks the exit code. Thinking about this again, instead of using different exit codes in test-config.c, should we adjust the tests to use `test_must_fail` and only check stderr/stdout? Then we could use die() (or BUG()) here, as you suggested, as well as the parse_options API in the snippet above. Does that sound reasonable? > > + } > > read_early_config(early_config_cb, (void *)argv[1]); > > return TC_SUCCESS; > > } > > @@ -101,8 +121,23 @@ int cmd__config(int argc, const char **argv) > > setup_git_directory(); > > git_configset_init(&cs); > > > > + if (subrepo_path) { > > + const struct submodule *sub; > > + struct repository *subrepo = xcalloc(1, sizeof(*repo)); > > nit: this could be scoped to cmd__config: > > struct repository subrepo = {0}; OK, will do. Thanks. > > + > > + sub = submodule_from_path(the_repository, &null_oid, subrepo_path); > > + if (!sub || repo_submodule_init(subrepo, the_repository, sub)) { > > + fprintf(stderr, "Invalid argument to --submodule: '%s'\n", > > + subrepo_path); > > + free(subrepo); > > + ret = TC_USAGE_ERROR; > > Likewise: I think we may want to use die() or BUG() (and likewise for other > USAGE_ERROR cases). > > Thanks and hope that helps, > Jonathan It did :) Thanks a lot for the comments! ^ permalink raw reply [flat|nested] 123+ messages in thread
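Editorial sketch of the argument handling discussed in the exchange above. It is not the test-tool code: `sketch_skip_prefix()` is a local stand-in that only mirrors how the quoted patch calls `skip_prefix()`, `sketch_parse_args()` is a hypothetical helper, and the `SKETCH_*` names are invented here. The only values taken from the thread are the two exit codes, 128 for die() and 129 for the usage error.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Exit codes as described in the reply above (names are hypothetical). */
enum { SKETCH_OK = 0, SKETCH_DIE = 128, SKETCH_USAGE_ERROR = 129 };

/*
 * Local stand-in, assumed to behave like the skip_prefix() call in the
 * quoted patch: if str starts with prefix, point *out just past the
 * prefix and return 1; otherwise leave *out alone and return 0.
 */
static int sketch_skip_prefix(const char *str, const char *prefix,
			      const char **out)
{
	size_t len = strlen(prefix);

	if (strncmp(str, prefix, len))
		return 0;
	*out = str + len;
	return 1;
}

/*
 * Hypothetical helper: consume an optional leading "--submodule=<path>"
 * and report a usage error when no command is left afterwards, or when
 * the option is combined with read_early_config.
 */
static int sketch_parse_args(int argc, const char **argv,
			     const char **subrepo_path, const char **cmd)
{
	assert(subrepo_path && cmd);
	*subrepo_path = NULL;
	*cmd = NULL;

	if (argc == 0)
		return SKETCH_USAGE_ERROR;

	if (sketch_skip_prefix(argv[0], "--submodule=", subrepo_path)) {
		argc--;
		argv++;
		if (argc == 0)
			return SKETCH_USAGE_ERROR;
	}

	/* The patch rejects --submodule together with read_early_config. */
	if (*subrepo_path && !strcmp(argv[0], "read_early_config"))
		return SKETCH_USAGE_ERROR;

	*cmd = argv[0];
	return SKETCH_OK;
}
```

Keeping the usage error (129) distinct from die() (128) is what lets a test that inspects only the exit status tell the two apart. The alternative raised in the reply, checking stderr/stdout in the tests, would make that distinction unnecessary.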
* [PATCH v4 5/6] grep: honor sparse checkout patterns 2020-06-12 15:44 ` [PATCH v4 0/6] grep: honor sparse checkout and add option to ignore it Matheus Tavares ` (3 preceding siblings ...) 2020-06-12 15:45 ` [PATCH v4 4/6] config: correctly read worktree configs in submodules Matheus Tavares @ 2020-06-12 15:45 ` Matheus Tavares 2020-06-12 15:45 ` [PATCH v4 6/6] config: add setting to ignore sparsity patterns in some cmds Matheus Tavares ` (2 subsequent siblings) 7 siblings, 0 replies; 123+ messages in thread From: Matheus Tavares @ 2020-06-12 15:45 UTC (permalink / raw) To: git; +Cc: gitster, stolee, newren, jonathantanmy One of the main uses for a sparse checkout is to allow users to focus on the subset of files in a repository in which they are interested. But git-grep currently ignores the sparsity patterns and reports all matches found outside this subset, which kind of goes in the opposite direction. There are some use cases for ignoring the sparsity patterns and the next commit will add an option to obtain this behavior, but here we start by making grep honor the sparsity boundaries in every case where this is relevant: - git grep in worktree - git grep --cached - git grep $REVISION For the worktree and cached cases, we iterate over paths without the SKIP_WORKTREE bit set, and limit our searches to these paths. For the $REVISION case, we limit the paths we search to those that match the sparsity patterns. (We do not check the SKIP_WORKTREE bit for the $REVISION case, because $REVISION may contain paths that do not exist in HEAD and thus for which we have no SKIP_WORKTREE bit to consult. The sparsity patterns tell us how the SKIP_WORKTREE bit would be set if we were to check out $REVISION, so we consult those. 
Also, we don't use the sparsity patterns with the worktree or cached cases, both because we have a bit we can check directly and more efficiently, and because unmerged entries from a merge or a rebase could cause more files to temporarily be present than the sparsity patterns would normally select.) Note that there is a special case here: `git grep $TREE`. In this case, we cannot know whether $TREE corresponds to the root of the repository or some sub-tree, and thus there is no way for us to know which sparsity patterns, if any, apply. So the $TREE case will not use sparsity patterns or any SKIP_WORKTREE bits and will instead always search all files within the $TREE. Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br> --- builtin/grep.c | 125 ++++++++++++++++++-- t/t7011-skip-worktree-reading.sh | 9 -- t/t7817-grep-sparse-checkout.sh | 195 +++++++++++++++++++++++++++++++ 3 files changed, 312 insertions(+), 17 deletions(-) create mode 100755 t/t7817-grep-sparse-checkout.sh diff --git a/builtin/grep.c b/builtin/grep.c index a5056f395a..bee0681393 100644 --- a/builtin/grep.c +++ b/builtin/grep.c @@ -410,7 +410,7 @@ static int grep_cache(struct grep_opt *opt, const struct pathspec *pathspec, int cached); static int grep_tree(struct grep_opt *opt, const struct pathspec *pathspec, struct tree_desc *tree, struct strbuf *base, int tn_len, - int check_attr); + int is_root_tree); static int grep_submodule(struct grep_opt *opt, const struct pathspec *pathspec, @@ -508,6 +508,10 @@ static int grep_cache(struct grep_opt *opt, for (nr = 0; nr < repo->index->cache_nr; nr++) { const struct cache_entry *ce = repo->index->cache[nr]; + + if (ce_skip_worktree(ce)) + continue; + strbuf_setlen(&name, name_base_len); strbuf_addstr(&name, ce->name); @@ -520,8 +524,7 @@ static int grep_cache(struct grep_opt *opt, * cache entry are identical, even if worktree file has * been modified, so use cache version instead */ - if (cached || (ce->ce_flags & CE_VALID) || - ce_skip_worktree(ce)) 
{ + if (cached || (ce->ce_flags & CE_VALID)) { if (ce_stage(ce) || ce_intent_to_add(ce)) continue; hit |= grep_oid(opt, &ce->oid, name.buf, @@ -552,9 +555,76 @@ static int grep_cache(struct grep_opt *opt, return hit; } -static int grep_tree(struct grep_opt *opt, const struct pathspec *pathspec, - struct tree_desc *tree, struct strbuf *base, int tn_len, - int check_attr) +static struct pattern_list *get_sparsity_patterns(struct repository *repo) +{ + struct pattern_list *patterns; + char *sparse_file; + int sparse_config, cone_config; + + if (repo_config_get_bool(repo, "core.sparsecheckout", &sparse_config) || + !sparse_config) { + return NULL; + } + + sparse_file = repo_git_path(repo, "info/sparse-checkout"); + patterns = xcalloc(1, sizeof(*patterns)); + + if (repo_config_get_bool(repo, "core.sparsecheckoutcone", &cone_config)) + cone_config = 0; + patterns->use_cone_patterns = cone_config; + + if (add_patterns_from_file_to_list(sparse_file, "", 0, patterns, NULL)) { + if (file_exists(sparse_file)) { + warning(_("failed to load sparse-checkout file: '%s'"), + sparse_file); + } + free(sparse_file); + free(patterns); + return NULL; + } + + free(sparse_file); + return patterns; +} + +static int path_in_sparse_checkout(struct strbuf *path, int prefix_len, + unsigned int entry_mode, + struct index_state *istate, + struct pattern_list *sparsity, + enum pattern_match_result parent_match, + enum pattern_match_result *match) +{ + int dtype = DT_UNKNOWN; + int is_dir = S_ISDIR(entry_mode); + + if (parent_match == MATCHED_RECURSIVE) { + *match = parent_match; + return 1; + } + + if (is_dir && !is_dir_sep(path->buf[path->len - 1])) + strbuf_addch(path, '/'); + + *match = path_matches_pattern_list(path->buf, path->len, + path->buf + prefix_len, &dtype, + sparsity, istate); + if (*match == UNDECIDED) + *match = parent_match; + + if (is_dir) + strbuf_trim_trailing_dir_sep(path); + + if (*match == NOT_MATCHED && + (!is_dir || (is_dir && sparsity->use_cone_patterns))) + return 0; + 
+ return 1; +} + +static int do_grep_tree(struct grep_opt *opt, const struct pathspec *pathspec, + struct tree_desc *tree, struct strbuf *base, int tn_len, + int check_attr, struct pattern_list *sparsity, + enum pattern_match_result default_sparsity_match) { struct repository *repo = opt->repo; int hit = 0; @@ -570,6 +640,7 @@ static int grep_tree(struct grep_opt *opt, const struct pathspec *pathspec, while (tree_entry(tree, &entry)) { int te_len = tree_entry_len(&entry); + enum pattern_match_result sparsity_match = 0; if (match != all_entries_interesting) { strbuf_addstr(&name, base->buf + tn_len); @@ -586,6 +657,19 @@ static int grep_tree(struct grep_opt *opt, const struct pathspec *pathspec, strbuf_add(base, entry.path, te_len); + if (sparsity) { + struct strbuf path = STRBUF_INIT; + strbuf_addstr(&path, base->buf + tn_len); + + if (!path_in_sparse_checkout(&path, old_baselen - tn_len, + entry.mode, repo->index, + sparsity, default_sparsity_match, + &sparsity_match)) { + strbuf_setlen(base, old_baselen); + continue; + } + } + if (S_ISREG(entry.mode)) { hit |= grep_oid(opt, &entry.oid, base->buf, tn_len, check_attr ? base->buf + tn_len : NULL); @@ -602,8 +686,8 @@ static int grep_tree(struct grep_opt *opt, const struct pathspec *pathspec, strbuf_addch(base, '/'); init_tree_desc(&sub, data, size); - hit |= grep_tree(opt, pathspec, &sub, base, tn_len, - check_attr); + hit |= do_grep_tree(opt, pathspec, &sub, base, tn_len, + check_attr, sparsity, sparsity_match); free(data); } else if (recurse_submodules && S_ISGITLINK(entry.mode)) { hit |= grep_submodule(opt, pathspec, &entry.oid, @@ -621,6 +705,31 @@ static int grep_tree(struct grep_opt *opt, const struct pathspec *pathspec, return hit; } +/* + * Note: sparsity patterns and paths' attributes will only be considered if + * is_root_tree has true value. (Otherwise, we cannot properly perform pattern + * matching on paths.) 
+ */ +static int grep_tree(struct grep_opt *opt, const struct pathspec *pathspec, + struct tree_desc *tree, struct strbuf *base, int tn_len, + int is_root_tree) +{ + struct pattern_list *patterns = NULL; + int ret; + + if (is_root_tree) + patterns = get_sparsity_patterns(opt->repo); + + ret = do_grep_tree(opt, pathspec, tree, base, tn_len, is_root_tree, + patterns, 0); + + if (patterns) { + clear_pattern_list(patterns); + free(patterns); + } + return ret; +} + static int grep_object(struct grep_opt *opt, const struct pathspec *pathspec, struct object *obj, const char *name, const char *path) { diff --git a/t/t7011-skip-worktree-reading.sh b/t/t7011-skip-worktree-reading.sh index 37525cae3a..26852586ac 100755 --- a/t/t7011-skip-worktree-reading.sh +++ b/t/t7011-skip-worktree-reading.sh @@ -109,15 +109,6 @@ test_expect_success 'ls-files --modified' ' test -z "$(git ls-files -m)" ' -test_expect_success 'grep with skip-worktree file' ' - git update-index --no-skip-worktree 1 && - echo test > 1 && - git update-index 1 && - git update-index --skip-worktree 1 && - rm 1 && - test "$(git grep --no-ext-grep test)" = "1:test" -' - echo ":000000 100644 $ZERO_OID $EMPTY_BLOB A 1" > expected test_expect_success 'diff-index does not examine skip-worktree absent entries' ' setup_absent && diff --git a/t/t7817-grep-sparse-checkout.sh b/t/t7817-grep-sparse-checkout.sh new file mode 100755 index 0000000000..b3109e3479 --- /dev/null +++ b/t/t7817-grep-sparse-checkout.sh @@ -0,0 +1,195 @@ +#!/bin/sh + +test_description='grep in sparse checkout + +This test creates a repo with the following structure: + +. +|-- a +|-- b +|-- dir +| `-- c +|-- sub +| |-- A +| | `-- a +| `-- B +| `-- b +`-- sub2 + `-- a + +Where the outer repository has non-cone mode sparsity patterns, sub is a +submodule with cone mode sparsity patterns and sub2 is a submodule that is +excluded by the superproject sparsity patterns. 
The resulting sparse checkout +should leave the following structure in the working tree: + +. +|-- a +|-- sub +| `-- B +| `-- b +`-- sub2 + `-- a + +But note that sub2 should have the SKIP_WORKTREE bit set. +' + +. ./test-lib.sh + +test_expect_success 'setup' ' + echo "text" >a && + echo "text" >b && + mkdir dir && + echo "text" >dir/c && + + git init sub && + ( + cd sub && + mkdir A B && + echo "text" >A/a && + echo "text" >B/b && + git add A B && + git commit -m sub && + git sparse-checkout init --cone && + git sparse-checkout set B + ) && + + git init sub2 && + ( + cd sub2 && + echo "text" >a && + git add a && + git commit -m sub2 + ) && + + git submodule add ./sub && + git submodule add ./sub2 && + git add a b dir && + git commit -m super && + git sparse-checkout init --no-cone && + git sparse-checkout set "/*" "!b" "!/*/" "sub" && + + git tag -am tag-to-commit tag-to-commit HEAD && + tree=$(git rev-parse HEAD^{tree}) && + git tag -am tag-to-tree tag-to-tree $tree && + + test_path_is_missing b && + test_path_is_missing dir && + test_path_is_missing sub/A && + test_path_is_file a && + test_path_is_file sub/B/b && + test_path_is_file sub2/a +' + +# The test below checks a special case: the sparsity patterns exclude '/b' +# and sparse checkout is enabled, but the path exists in the working tree (e.g. +# manually created after `git sparse-checkout init`). In this case, grep should +# skip it. 
+test_expect_success 'grep in working tree should honor sparse checkout' ' + cat >expect <<-EOF && + a:text + EOF + echo "new-text" >b && + test_when_finished "rm b" && + git grep "text" >actual && + test_cmp expect actual +' + +test_expect_success 'grep unmerged file despite not matching sparsity patterns' ' + cat >expect <<-EOF && + b:modified-b-in-branchX + b:modified-b-in-branchY + EOF + test_when_finished "test_might_fail git merge --abort && \ + git checkout master" && + + git sparse-checkout disable && + git checkout -b branchY master && + test_commit modified-b-in-branchY b && + git checkout -b branchX master && + test_commit modified-b-in-branchX b && + + git sparse-checkout init && + test_path_is_missing b && + test_must_fail git merge branchY && + git grep "modified-b" >actual && + test_cmp expect actual +' + +test_expect_success 'grep --cached should honor sparse checkout' ' + cat >expect <<-EOF && + a:text + EOF + git grep --cached "text" >actual && + test_cmp expect actual +' + +test_expect_success 'grep <commit-ish> should honor sparse checkout' ' + commit=$(git rev-parse HEAD) && + cat >expect_commit <<-EOF && + $commit:a:text + EOF + cat >expect_tag-to-commit <<-EOF && + tag-to-commit:a:text + EOF + git grep "text" $commit >actual_commit && + test_cmp expect_commit actual_commit && + git grep "text" tag-to-commit >actual_tag-to-commit && + test_cmp expect_tag-to-commit actual_tag-to-commit +' + +test_expect_success 'grep <tree-ish> should ignore sparsity patterns' ' + commit=$(git rev-parse HEAD) && + tree=$(git rev-parse HEAD^{tree}) && + cat >expect_tree <<-EOF && + $tree:a:text + $tree:b:text + $tree:dir/c:text + EOF + cat >expect_tag-to-tree <<-EOF && + tag-to-tree:a:text + tag-to-tree:b:text + tag-to-tree:dir/c:text + EOF + git grep "text" $tree >actual_tree && + test_cmp expect_tree actual_tree && + git grep "text" tag-to-tree >actual_tag-to-tree && + test_cmp expect_tag-to-tree actual_tag-to-tree +' + +# Note that sub2/ is present in the 
worktree but it is excluded by the sparsity +# patterns, so grep should not recurse into it. +test_expect_success 'grep --recurse-submodules should honor sparse checkout in submodule' ' + cat >expect <<-EOF && + a:text + sub/B/b:text + EOF + git grep --recurse-submodules "text" >actual && + test_cmp expect actual +' + +test_expect_success 'grep --recurse-submodules --cached should honor sparse checkout in submodule' ' + cat >expect <<-EOF && + a:text + sub/B/b:text + EOF + git grep --recurse-submodules --cached "text" >actual && + test_cmp expect actual +' + +test_expect_success 'grep --recurse-submodules <commit-ish> should honor sparse checkout in submodule' ' + commit=$(git rev-parse HEAD) && + cat >expect_commit <<-EOF && + $commit:a:text + $commit:sub/B/b:text + EOF + cat >expect_tag-to-commit <<-EOF && + tag-to-commit:a:text + tag-to-commit:sub/B/b:text + EOF + git grep --recurse-submodules "text" $commit >actual_commit && + test_cmp expect_commit actual_commit && + git grep --recurse-submodules "text" tag-to-commit >actual_tag-to-commit && + test_cmp expect_tag-to-commit actual_tag-to-commit +' + +test_done -- 2.26.2 ^ permalink raw reply related [flat|nested] 123+ messages in thread
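Editorial sketch of the decision made by path_in_sparse_checkout() in the patch above, reduced to a pure function so the cases are easy to follow. Everything named `sketch_*` or `SK_*` is invented for illustration. The `own_match` parameter stands in for whatever path_matches_pattern_list() would return for the entry, and the enum is a local stand-in whose numeric values are an assumption, not taken from git's headers.

```c
#include <assert.h>
#include <stddef.h>

/* Local stand-in for the pattern match result (values are assumptions). */
enum sketch_match {
	SK_UNDECIDED = -1,
	SK_NOT_MATCHED = 0,
	SK_MATCHED = 1,
	SK_MATCHED_RECURSIVE = 2
};

/*
 * Returns 1 if the tree entry should be visited, 0 if it can be skipped.
 * *match receives the result that would be handed down to the entry's
 * children as their parent_match.
 */
static int sketch_path_in_sparse_checkout(enum sketch_match own_match,
					  int is_dir, int use_cone_patterns,
					  enum sketch_match parent_match,
					  enum sketch_match *match)
{
	assert(match != NULL);

	/* A recursively matched parent covers everything beneath it. */
	if (parent_match == SK_MATCHED_RECURSIVE) {
		*match = parent_match;
		return 1;
	}

	/* No pattern decided this entry: inherit the parent's result. */
	*match = own_match;
	if (*match == SK_UNDECIDED)
		*match = parent_match;

	/*
	 * An unmatched file is always skipped. An unmatched directory is
	 * skipped only in cone mode; otherwise it is still descended into.
	 */
	if (*match == SK_NOT_MATCHED && (!is_dir || use_cone_patterns))
		return 0;

	return 1;
}
```

As I read the patch, the non-cone branch keeps descending into an unmatched directory because a pattern may still match a path deeper inside it, whereas in cone mode an unmatched directory excludes its whole subtree. That reading is an interpretation of the code, not something the commit message states.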
* [PATCH v4 6/6] config: add setting to ignore sparsity patterns in some cmds 2020-06-12 15:44 ` [PATCH v4 0/6] grep: honor sparse checkout and add option to ignore it Matheus Tavares ` (4 preceding siblings ...) 2020-06-12 15:45 ` [PATCH v4 5/6] grep: honor sparse checkout patterns Matheus Tavares @ 2020-06-12 15:45 ` Matheus Tavares 2020-06-16 22:31 ` [PATCH v4 0/6] grep: honor sparse checkout and add option to ignore it Elijah Newren 2020-09-02 6:17 ` [PATCH v5 0/8] " Matheus Tavares 7 siblings, 0 replies; 123+ messages in thread From: Matheus Tavares @ 2020-06-12 15:45 UTC (permalink / raw) To: git; +Cc: gitster, stolee, newren, jonathantanmy When sparse checkout is enabled, some users expect the output of certain commands (such as grep, diff, and log) to also be restricted to the sparsity patterns. This would allow them to effectively work only on the subset of files in which they are interested; and allow some commands to possibly perform better, by not considering uninteresting paths. For this reason, we taught grep to honor the sparsity patterns, in the previous patch. But, on the other hand, allowing grep and the other commands mentioned to optionally ignore the patterns also makes for some interesting use cases. E.g. using grep to search for a function's documentation that resides outside the sparse checkout. In any case, there is currently no way for users to configure the behavior they want for these commands. Aiming to provide this flexibility, let's introduce the sparse.restrictCmds setting (and the analogous --[no-]restrict-to-sparse-paths global option). The default value is true. For now, grep is the only one affected by this setting, but the goal is to have support for more commands in the future. 
Helped-by: Elijah Newren <newren@gmail.com> Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br> --- Documentation/config.txt | 2 + Documentation/config/grep.txt | 8 ++ Documentation/config/sparse.txt | 20 ++++ Documentation/git.txt | 4 + Makefile | 1 + builtin/grep.c | 13 ++- contrib/completion/git-completion.bash | 2 + git.c | 5 + sparse-checkout.c | 18 ++++ sparse-checkout.h | 11 +++ t/t7817-grep-sparse-checkout.sh | 132 ++++++++++++++++++++++++- t/t9902-completion.sh | 4 +- 12 files changed, 214 insertions(+), 6 deletions(-) create mode 100644 Documentation/config/sparse.txt create mode 100644 sparse-checkout.c create mode 100644 sparse-checkout.h diff --git a/Documentation/config.txt b/Documentation/config.txt index ef0768b91a..fd74b80302 100644 --- a/Documentation/config.txt +++ b/Documentation/config.txt @@ -436,6 +436,8 @@ include::config/sequencer.txt[] include::config/showbranch.txt[] +include::config/sparse.txt[] + include::config/splitindex.txt[] include::config/ssh.txt[] diff --git a/Documentation/config/grep.txt b/Documentation/config/grep.txt index dd51db38e1..a3275ab4b7 100644 --- a/Documentation/config/grep.txt +++ b/Documentation/config/grep.txt @@ -28,3 +28,11 @@ grep.fullName:: grep.fallbackToNoIndex:: If set to true, fall back to git grep --no-index if git grep is executed outside of a git repository. Defaults to false. + +ifdef::git-grep[] +sparse.restrictCmds:: + See base definition in linkgit:git-config[1]. grep honors + sparse.restrictCmds by limiting searches to the sparsity paths in three + cases: when searching the working tree, when searching the index with + --cached, and when searching a specified commit. +endif::git-grep[] diff --git a/Documentation/config/sparse.txt b/Documentation/config/sparse.txt new file mode 100644 index 0000000000..494761526e --- /dev/null +++ b/Documentation/config/sparse.txt @@ -0,0 +1,20 @@ +sparse.restrictCmds:: + Only meaningful in conjunction with core.sparseCheckout. 
This option + extends sparse checkouts (which limit which paths are written to the + working tree), so that output and operations are also limited to the + sparsity paths where possible and implemented. The purpose of this + option is to (1) focus output for the user on the portion of the + repository that is of interest to them, and (2) enable potentially + dramatic performance improvements, especially in conjunction with + partial clones. ++ +When this option is true (default), some git commands may limit their behavior +to the paths specified by the sparsity patterns, or to the intersection of +those paths and any (like `*.c`) that the user might also specify on the +command line. When false, the affected commands will work on full trees, +ignoring the sparsity patterns. For now, only git-grep honors this setting. ++ +Note: commands which export, integrity check, or create history will always +operate on full trees (e.g. fast-export, format-patch, fsck, commit, etc.), +unaffected by any sparsity patterns. Also, writing commands such as +sparse-checkout and read-tree will not be affected by this configuration. diff --git a/Documentation/git.txt b/Documentation/git.txt index 40bd32f590..89604b6648 100644 --- a/Documentation/git.txt +++ b/Documentation/git.txt @@ -180,6 +180,10 @@ If you just want to run git as if it was started in `<path>` then use Do not perform optional operations that require locks. This is equivalent to setting the `GIT_OPTIONAL_LOCKS` to `0`. +--[no-]restrict-to-sparse-paths:: + Overrides the sparse.restrictCmds configuration (see + linkgit:git-config[1]) for this execution. + --list-cmds=group[,group...]:: List commands by group. This is an internal/experimental option and may change or be removed in the future. 
Supported diff --git a/Makefile b/Makefile index 372139f1f2..9c8a6f19cd 100644 --- a/Makefile +++ b/Makefile @@ -983,6 +983,7 @@ LIB_OBJS += sha1-name.o LIB_OBJS += shallow.o LIB_OBJS += sideband.o LIB_OBJS += sigchain.o +LIB_OBJS += sparse-checkout.o LIB_OBJS += split-index.o LIB_OBJS += stable-qsort.o LIB_OBJS += strbuf.o diff --git a/builtin/grep.c b/builtin/grep.c index bee0681393..7f485ea732 100644 --- a/builtin/grep.c +++ b/builtin/grep.c @@ -25,6 +25,7 @@ #include "submodule-config.h" #include "object-store.h" #include "packfile.h" +#include "sparse-checkout.h" static char const * const grep_usage[] = { N_("git grep [<options>] [-e] <pattern> [<rev>...] [[--] <path>...]"), @@ -498,6 +499,7 @@ static int grep_cache(struct grep_opt *opt, int nr; struct strbuf name = STRBUF_INIT; int name_base_len = 0; + int sparse_paths_only = restrict_to_sparse_paths(repo); if (repo->submodule_prefix) { name_base_len = strlen(repo->submodule_prefix); strbuf_addstr(&name, repo->submodule_prefix); @@ -509,7 +511,7 @@ static int grep_cache(struct grep_opt *opt, for (nr = 0; nr < repo->index->cache_nr; nr++) { const struct cache_entry *ce = repo->index->cache[nr]; - if (ce_skip_worktree(ce)) + if (sparse_paths_only && ce_skip_worktree(ce)) continue; strbuf_setlen(&name, name_base_len); @@ -715,9 +717,10 @@ static int grep_tree(struct grep_opt *opt, const struct pathspec *pathspec, int is_root_tree) { struct pattern_list *patterns = NULL; + int sparse_paths_only = restrict_to_sparse_paths(opt->repo); int ret; - if (is_root_tree) + if (is_root_tree && sparse_paths_only) patterns = get_sparsity_patterns(opt->repo); ret = do_grep_tree(opt, pathspec, tree, base, tn_len, is_root_tree, @@ -1257,6 +1260,12 @@ int cmd_grep(int argc, const char **argv, const char *prefix) if (!use_index || untracked) { int use_exclude = (opt_exclude < 0) ? 
use_index : !!opt_exclude; + + if (opt_restrict_to_sparse_paths >= 0) { + die(_("--[no-]restrict-to-sparse-paths is incompatible" + " with --no-index and --untracked")); + } + hit = grep_directory(&opt, &pathspec, use_exclude, use_index); } else if (0 <= opt_exclude) { die(_("--[no-]exclude-standard cannot be used for tracked contents")); diff --git a/contrib/completion/git-completion.bash b/contrib/completion/git-completion.bash index 4b59004847..3f15fd5275 100644 --- a/contrib/completion/git-completion.bash +++ b/contrib/completion/git-completion.bash @@ -3208,6 +3208,8 @@ __git_main () --namespace= --no-replace-objects --help + --restrict-to-sparse-paths + --no-restrict-to-sparse-paths " ;; *) diff --git a/git.c b/git.c index a2d337eed7..99bac1d26d 100644 --- a/git.c +++ b/git.c @@ -5,6 +5,7 @@ #include "run-command.h" #include "alias.h" #include "shallow.h" +#include "sparse-checkout.h" #define RUN_SETUP (1<<0) #define RUN_SETUP_GENTLY (1<<1) @@ -311,6 +312,10 @@ static int handle_options(const char ***argv, int *argc, int *envchanged) } else { exit(list_cmds(cmd)); } + } else if (!strcmp(cmd, "--restrict-to-sparse-paths")) { + opt_restrict_to_sparse_paths = 1; + } else if (!strcmp(cmd, "--no-restrict-to-sparse-paths")) { + opt_restrict_to_sparse_paths = 0; } else { fprintf(stderr, _("unknown option: %s\n"), cmd); usage(git_usage_string); diff --git a/sparse-checkout.c b/sparse-checkout.c new file mode 100644 index 0000000000..96c5ed5446 --- /dev/null +++ b/sparse-checkout.c @@ -0,0 +1,18 @@ +#include "cache.h" +#include "config.h" +#include "sparse-checkout.h" + +int opt_restrict_to_sparse_paths = -1; + +int restrict_to_sparse_paths(struct repository *repo) +{ + int ret; + + if (opt_restrict_to_sparse_paths >= 0) + return opt_restrict_to_sparse_paths; + + if (repo_config_get_bool(repo, "sparse.restrictcmds", &ret)) + ret = 1; + + return ret; +} diff --git a/sparse-checkout.h b/sparse-checkout.h new file mode 100644 index 0000000000..a4805e443a --- /dev/null 
+++ b/sparse-checkout.h @@ -0,0 +1,11 @@ +#ifndef SPARSE_CHECKOUT_H +#define SPARSE_CHECKOUT_H + +struct repository; + +extern int opt_restrict_to_sparse_paths; + +/* Whether or not cmds should restrict behavior on sparse paths, in this repo */ +int restrict_to_sparse_paths(struct repository *repo); + +#endif /* SPARSE_CHECKOUT_H */ diff --git a/t/t7817-grep-sparse-checkout.sh b/t/t7817-grep-sparse-checkout.sh index b3109e3479..f93a4f71d1 100755 --- a/t/t7817-grep-sparse-checkout.sh +++ b/t/t7817-grep-sparse-checkout.sh @@ -80,10 +80,10 @@ test_expect_success 'setup' ' test_path_is_file sub2/a ' -# The test below checks a special case: the sparsity patterns exclude '/b' +# The two tests below check a special case: the sparsity patterns exclude '/b' # and sparse checkout is enabled, but the path exists in the working tree (e.g. # manually created after `git sparse-checkout init`). In this case, grep should -# skip it. +# skip the file by default, but not with --no-restrict-to-sparse-paths. test_expect_success 'grep in working tree should honor sparse checkout' ' cat >expect <<-EOF && a:text @@ -93,6 +93,16 @@ test_expect_success 'grep in working tree should honor sparse checkout' ' git grep "text" >actual && test_cmp expect actual ' +test_expect_success 'grep w/ --no-restrict-to-sparse-paths for sparsely excluded but present paths' ' + cat >expect <<-EOF && + a:text + b:new-text + EOF + echo "new-text" >b && + test_when_finished "rm b" && + git --no-restrict-to-sparse-paths grep "text" >actual && + test_cmp expect actual +' test_expect_success 'grep unmerged file despite not matching sparsity patterns' ' cat >expect <<-EOF && @@ -157,7 +167,7 @@ test_expect_success 'grep <tree-ish> should ignore sparsity patterns' ' ' # Note that sub2/ is present in the worktree but it is excluded by the sparsity -# patterns, so grep should not recurse into it. +# patterns, so grep should only recurse into it with --no-restrict-to-sparse-paths. 
test_expect_success 'grep --recurse-submodules should honor sparse checkout in submodule' ' cat >expect <<-EOF && a:text @@ -166,6 +176,15 @@ test_expect_success 'grep --recurse-submodules should honor sparse checkout in s git grep --recurse-submodules "text" >actual && test_cmp expect actual ' +test_expect_success 'grep --recurse-submodules should search in excluded submodules w/ --no-restrict-to-sparse-paths' ' + cat >expect <<-EOF && + a:text + sub/B/b:text + sub2/a:text + EOF + git --no-restrict-to-sparse-paths grep --recurse-submodules "text" >actual && + test_cmp expect actual +' test_expect_success 'grep --recurse-submodules --cached should honor sparse checkout in submodule' ' cat >expect <<-EOF && @@ -192,4 +211,111 @@ test_expect_success 'grep --recurse-submodules <commit-ish> should honor sparse test_cmp expect_tag-to-commit actual_tag-to-commit ' +for cmd in 'git --no-restrict-to-sparse-paths grep' \ + 'git -c sparse.restrictCmds=false grep' \ + 'git -c sparse.restrictCmds=true --no-restrict-to-sparse-paths grep' +do + + test_expect_success "$cmd --cached should ignore sparsity patterns" ' + cat >expect <<-EOF && + a:text + b:text + dir/c:text + EOF + $cmd --cached "text" >actual && + test_cmp expect actual + ' + + test_expect_success "$cmd <commit-ish> should ignore sparsity patterns" ' + commit=$(git rev-parse HEAD) && + cat >expect_commit <<-EOF && + $commit:a:text + $commit:b:text + $commit:dir/c:text + EOF + cat >expect_tag-to-commit <<-EOF && + tag-to-commit:a:text + tag-to-commit:b:text + tag-to-commit:dir/c:text + EOF + $cmd "text" $commit >actual_commit && + test_cmp expect_commit actual_commit && + $cmd "text" tag-to-commit >actual_tag-to-commit && + test_cmp expect_tag-to-commit actual_tag-to-commit + ' +done + +test_expect_success 'grep --recurse-submodules --cached w/ --no-restrict-to-sparse-paths' ' + cat >expect <<-EOF && + a:text + b:text + dir/c:text + sub/A/a:text + sub/B/b:text + sub2/a:text + EOF + git --no-restrict-to-sparse-paths 
grep --recurse-submodules --cached \ + "text" >actual && + test_cmp expect actual +' + +test_expect_success 'grep --recurse-submodules <commit-ish> w/ --no-restrict-to-sparse-paths' ' + commit=$(git rev-parse HEAD) && + cat >expect_commit <<-EOF && + $commit:a:text + $commit:b:text + $commit:dir/c:text + $commit:sub/A/a:text + $commit:sub/B/b:text + $commit:sub2/a:text + EOF + cat >expect_tag-to-commit <<-EOF && + tag-to-commit:a:text + tag-to-commit:b:text + tag-to-commit:dir/c:text + tag-to-commit:sub/A/a:text + tag-to-commit:sub/B/b:text + tag-to-commit:sub2/a:text + EOF + git --no-restrict-to-sparse-paths grep --recurse-submodules "text" \ + $commit >actual_commit && + test_cmp expect_commit actual_commit && + git --no-restrict-to-sparse-paths grep --recurse-submodules "text" \ + tag-to-commit >actual_tag-to-commit && + test_cmp expect_tag-to-commit actual_tag-to-commit +' + +test_expect_success 'should respect the sparse.restrictCmds values from submodules' ' + cat >expect <<-EOF && + a:text + sub/A/a:text + sub/B/b:text + EOF + test_config -C sub sparse.restrictCmds false && + git grep --cached --recurse-submodules "text" >actual && + test_cmp expect actual +' + +test_expect_success 'should propagate --[no]-restrict-to-sparse-paths to submodules' ' + cat >expect <<-EOF && + a:text + b:text + dir/c:text + sub/A/a:text + sub/B/b:text + sub2/a:text + EOF + test_config -C sub sparse.restrictCmds true && + git --no-restrict-to-sparse-paths grep --cached --recurse-submodules "text" >actual && + test_cmp expect actual +' + +for opt in '--untracked' '--no-index' +do + test_expect_success "--[no]-restrict-to-sparse-paths and $opt are incompatible" " + test_must_fail git --restrict-to-sparse-paths grep $opt . 
2>actual && + test_i18ngrep 'restrict-to-sparse-paths is incompatible with' actual + " +done + test_done diff --git a/t/t9902-completion.sh b/t/t9902-completion.sh index 3c44af6940..a4a7767e06 100755 --- a/t/t9902-completion.sh +++ b/t/t9902-completion.sh @@ -1473,6 +1473,8 @@ test_expect_success 'double dash "git" itself' ' --namespace= --no-replace-objects Z --help Z + --restrict-to-sparse-paths Z + --no-restrict-to-sparse-paths Z EOF ' @@ -1515,7 +1517,7 @@ test_expect_success 'general options' ' test_completion "git --nam" "--namespace=" && test_completion "git --bar" "--bare " && test_completion "git --inf" "--info-path " && - test_completion "git --no-r" "--no-replace-objects " + test_completion "git --no-rep" "--no-replace-objects " ' test_expect_success 'general options plus command' ' -- 2.26.2 ^ permalink raw reply related [flat|nested] 123+ messages in thread
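Editorial sketch of the precedence that restrict_to_sparse_paths() in the patch above implements: the global command-line option wins, then the sparse.restrictCmds configuration, then the default of true. The `sketch_*` functions are hypothetical and take plain values instead of a repository, so the config lookup is modelled as a pointer that is NULL when the setting is absent.

```c
#include <assert.h>
#include <stddef.h>

/*
 * cli_opt: -1 when neither global option was given, 0 for
 * --no-restrict-to-sparse-paths, 1 for --restrict-to-sparse-paths.
 * config_value: NULL when sparse.restrictCmds is unset, otherwise a
 * pointer to its boolean value.
 */
static int sketch_restrict_to_sparse_paths(int cli_opt,
					   const int *config_value)
{
	assert(cli_opt >= -1 && cli_opt <= 1);

	/* The command-line option overrides the configuration. */
	if (cli_opt >= 0)
		return cli_opt;

	/* Unset configuration falls back to the default, which is true. */
	if (!config_value)
		return 1;

	return *config_value;
}

/*
 * Mirrors the check added to grep_cache(): an index entry is skipped
 * only when restriction is on and the entry has SKIP_WORKTREE set.
 */
static int sketch_skip_entry(int sparse_paths_only, int skip_worktree_bit)
{
	return sparse_paths_only && skip_worktree_bit;
}
```

This matches the combinations exercised by the test loop in the patch: `git -c sparse.restrictCmds=true --no-restrict-to-sparse-paths grep` is expected to ignore the sparsity patterns because the option takes precedence over the config value.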
* Re: [PATCH v4 0/6] grep: honor sparse checkout and add option to ignore it 2020-06-12 15:44 ` [PATCH v4 0/6] grep: honor sparse checkout and add option to ignore it Matheus Tavares ` (5 preceding siblings ...) 2020-06-12 15:45 ` [PATCH v4 6/6] config: add setting to ignore sparsity patterns in some cmds Matheus Tavares @ 2020-06-16 22:31 ` Elijah Newren 2020-09-02 6:17 ` [PATCH v5 0/8] " Matheus Tavares 7 siblings, 0 replies; 123+ messages in thread From: Elijah Newren @ 2020-06-16 22:31 UTC (permalink / raw) To: Matheus Tavares Cc: Git Mailing List, Junio C Hamano, Derrick Stolee, Jonathan Tan On Fri, Jun 12, 2020 at 8:45 AM Matheus Tavares <matheus.bernardino@usp.br> wrote: > > This series makes git-grep restrict its output to the present sparsity > patterns. A new global option is added to toggle this behavior in grep > and hopefully more commands in the future. You've cleaned up all the issues (or corrected my understanding) from my comments in the previous iterations of this series; I didn't spot any additional issues in reading over this latest version of the series. However, I would like someone more familiar with submodules and/or config to take a look at the changes to do_git_config_sequence() in patch 4, as I commented on there, if we can find someone to do so. Thanks for working on this; nice work! Elijah ^ permalink raw reply [flat|nested] 123+ messages in thread
* [PATCH v5 0/8] grep: honor sparse checkout and add option to ignore it 2020-06-12 15:44 ` [PATCH v4 0/6] grep: honor sparse checkout and add option to ignore it Matheus Tavares ` (6 preceding siblings ...) 2020-06-16 22:31 ` [PATCH v4 0/6] grep: honor sparse checkout and add option to ignore it Elijah Newren @ 2020-09-02 6:17 ` Matheus Tavares 2020-09-02 6:17 ` [PATCH v5 1/8] doc: grep: unify info on configuration variables Matheus Tavares ` (8 more replies) 7 siblings, 9 replies; 123+ messages in thread From: Matheus Tavares @ 2020-09-02 6:17 UTC (permalink / raw) To: git; +Cc: gitster, stolee, newren, jonathantanmy, jrnieder

This series makes git-grep restrict its output to the sparsity patterns when requested by the user. A new global option is added to control this behavior in grep and hopefully more commands in the future. There are also a couple fixes in t/helper/test-config and in a test that uses it.

Changes since v4:

- Rebased on top of master to use repo_config_get_string_tmp(), added in jk/leakfix, in t/helper/test-config (patch 6).
- Added patch 2, to make sure a test that relies on test-config checks its output in addition to the exit code, to avoid false positives.
- Split patch "t/helper/test-config: return exit codes consistently" into three separate ones, as these are in fact three unrelated changes:
    "t/helper/test-config: unify exit labels"
    "t/helper/test-config: check argc before accessing argv"
    "t/helper/test-config: be consistent with exit codes"
- Removed TC_USAGE_ERROR in favor of calling die(). Also removed the test_config_exit_code enum.
- On "config: correctly read worktree configs in submodules":
    * Improved commit message to focus on the problem instead of the implementation and remove section about the grep example.
    * Made use of the parse_options API
    * Allocated subrepo struct on the stack instead of malloc()'ing.

Matheus Tavares (8): doc: grep: unify info on configuration variables t1308-config-set: avoid false positives when using test-config t/helper/test-config: be consistent with exit codes t/helper/test-config: check argc before accessing argv t/helper/test-config: unify exit labels config: correctly read worktree configs in submodules grep: honor sparse checkout patterns config: add setting to ignore sparsity patterns in some cmds Documentation/config.txt | 2 + Documentation/config/grep.txt | 18 +- Documentation/config/sparse.txt | 20 ++ Documentation/git-grep.txt | 36 +-- Documentation/git.txt | 4 + Makefile | 1 + builtin/grep.c | 134 ++++++++++- config.c | 21 +- contrib/completion/git-completion.bash | 2 + git.c | 5 + sparse-checkout.c | 18 ++ sparse-checkout.h | 11 + t/helper/test-config.c | 126 ++++++---- t/t1308-config-set.sh | 8 +- t/t2404-worktree-config.sh | 16 ++ t/t7011-skip-worktree-reading.sh | 9 - t/t7817-grep-sparse-checkout.sh | 321 +++++++++++++++++++++++++ t/t9902-completion.sh | 4 +- 18 files changed, 652 insertions(+), 104 deletions(-) create mode 100644 Documentation/config/sparse.txt create mode 100644 sparse-checkout.c create mode 100644 sparse-checkout.h create mode 100755 t/t7817-grep-sparse-checkout.sh Range-diff against v4: 1: fc47a96bfa = 1: 70c9a4e741 doc: grep: unify info on configuration variables -: ---------- > 2: f53782f14c t1308-config-set: avoid false positives when using test-config -: ---------- > 3: 85e1588d6c t/helper/test-config: be consistent with exit codes -: ---------- > 4: 0750191342 t/helper/test-config: check argc before accessing argv 2: 874aab36dd ! 
5: 56535b0e36 t/helper/test-config: return exit codes consistently @@ Metadata Author: Matheus Tavares <matheus.bernardino@usp.br> ## Commit message ## - t/helper/test-config: return exit codes consistently + t/helper/test-config: unify exit labels - The test-config helper may exit with a variety of at least four - different codes, to reflect the status of the requested operations. - These codes are sometimes checked in the tests, but not all of the codes - are returned consistently by the helper: 1 will usually refer to a - "value not found", but usage errors can also return 1 or 128. Moreover, - 128 is also expected on errors within the configset functions. These - inconsistent uses of the exit codes can lead to false positives in the - tests. Although all tests which expect errors and check the helper's - exit code currently also check the output, it's still better to - standardize the exit codes and avoid future problems in new tests. - While we are here, let's also check that we have the expected argc for - configset_get_value and configset_get_value_multi, before trying to use - argv. - - Note: this change is implemented with the unification of the exit - labels. This might seem unnecessary, for now, but it will benefit the - next patch, which will increase the cleanup section. + test-config's main function has three different exit labels, all of + which have to perform the same cleanup code before returning. Unify the + labels in preparation for a future patch which will increase the cleanup + section. Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br> ## t/helper/test-config.c ## -@@ - * iterate -> iterate over all values using git_config(), and print some - * data for each - * -+ * Exit codes: -+ * 0: success -+ * 1: value not found for the given config key -+ * 2: config file path given as argument is inaccessible or doesn't exist -+ * 129: test-config usage error -+ * -+ * Note: tests may also expect 128 for die() calls in the config machinery. 
-+ * - * Examples: - * - * To print the value with highest priority for key "foo.bAr Baz.rock": @@ t/helper/test-config.c: static int early_config_cb(const char *var, const char *value, void *vdata) return 0; } -+enum test_config_exit_code { -+ TC_SUCCESS = 0, -+ TC_VALUE_NOT_FOUND = 1, -+ TC_CONFIG_FILE_ERROR = 2, -+ TC_USAGE_ERROR = 129, -+}; ++#define TC_VALUE_NOT_FOUND 1 ++#define TC_CONFIG_FILE_ERROR 2 + int cmd__config(int argc, const char **argv) { - int i, val; +- int i, val; ++ int i, val, ret = 0; const char *v; const struct string_list *strptr; struct config_set cs; -+ enum test_config_exit_code ret = TC_SUCCESS; if (argc == 3 && !strcmp(argv[1], "read_early_config")) { read_early_config(early_config_cb, (void *)argv[2]); - return 0; -+ return TC_SUCCESS; ++ return ret; } setup_git_directory(); - - git_configset_init(&cs); - -- if (argc < 2) { -- fprintf(stderr, "Please, provide a command name on the command-line\n"); -- goto exit1; -- } else if (argc == 3 && !strcmp(argv[1], "get_value")) { -+ if (argc < 2) -+ goto print_usage_error; -+ -+ if (argc == 3 && !strcmp(argv[1], "get_value")) { - if (!git_config_get_value(argv[2], &v)) { - if (!v) +@@ t/helper/test-config.c: int cmd__config(int argc, const char **argv) printf("(NULL)\n"); else printf("%s\n", v); @@ t/helper/test-config.c: int cmd__config(int argc, const char **argv) - goto exit1; + ret = TC_VALUE_NOT_FOUND; } -- } else if (!strcmp(argv[1], "configset_get_value")) { -+ } else if (argc >= 3 && !strcmp(argv[1], "configset_get_value")) { + } else if (argc >= 3 && !strcmp(argv[1], "configset_get_value")) { for (i = 3; i < argc; i++) { int err; if ((err = git_configset_add_file(&cs, argv[i]))) { @@ t/helper/test-config.c: int cmd__config(int argc, const char **argv) - goto exit1; + ret = TC_VALUE_NOT_FOUND; } -- } else if (!strcmp(argv[1], "configset_get_value_multi")) { -+ } else if (argc >= 3 && !strcmp(argv[1], "configset_get_value_multi")) { + } else if (argc >= 3 && !strcmp(argv[1], 
"configset_get_value_multi")) { for (i = 3; i < argc; i++) { int err; if ((err = git_configset_add_file(&cs, argv[i]))) { @@ t/helper/test-config.c: int cmd__config(int argc, const char **argv) git_config(iterate_cb, NULL); - goto exit0; + } else { -+print_usage_error: -+ fprintf(stderr, "Invalid syntax. Usage: test-tool config <cmd> [args]\n"); -+ ret = TC_USAGE_ERROR; ++ die("%s: Please check the syntax and the function name", argv[0]); } - die("%s: Please check the syntax and the function name", argv[0]); 3: c5093099f3 < -: ---------- t/helper/test-config: facilitate addition of new cli options 4: b510de0de0 ! 6: 3e02e1bd24 config: correctly read worktree configs in submodules @@ Metadata ## Commit message ## config: correctly read worktree configs in submodules - One of the steps in do_git_config_sequence() is to load the - worktree-specific config file. Although the function receives a git_dir - string, it relies on git_pathdup(), which uses the_repository->git_dir, - to make the path to the file. Furthermore, it also checks that - extensions.worktreeConfig is set through the - repository_format_worktree_config variable, which refers to - the_repository only. Thus, when a submodule has worktree-specific - settings, a command executed in the superproject that recurses into the - submodule won't find the said settings. + The config machinery is not able to read worktree configs from a + submodule in a process where the_repository represents the superproject. + Furthermore, when extensions.worktreeConfig is set on the superproject, + querying for a worktree config in a submodule will, instead, return + the value set at the superproject. - This will be especially important in the next patch: git-grep will learn - to honor sparse checkouts and, when running with --recurse-submodules, - the submodule's sparse checkout settings must be loaded. As these - settings are stored in the config.worktree file, they would be ignored - without this patch. 
So let's fix this by reading the right - config.worktree file and extensions.worktreeConfig setting, based on the - git_dir and commondir paths given to do_git_config_sequence(). Also - add a test to avoid any regressions. + The problem resides in do_git_config_sequence(). Although the function + receives a git_dir string, it uses the_repository->git_dir when making + the path to the worktree config file. And when checking if + extensions.worktreeConfig is set, it uses the global + repository_format_worktree_config variable, which refers to + the_repository only. So let's fix this by using the git_dir given to the + function and reading the extension value from the right place. Also add + a test to avoid any regressions. Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br> @@ t/helper/test-config.c #include "config.h" #include "string-list.h" +#include "submodule-config.h" ++#include "parse-options.h" /* * This program exposes the C API of the configuration mechanism @@ t/helper/test-config.c * * get_value -> prints the value with highest priority for the entered key * -@@ t/helper/test-config.c: int cmd__config(int argc, const char **argv) +@@ t/helper/test-config.c: static int early_config_cb(const char *var, const char *value, void *vdata) + #define TC_VALUE_NOT_FOUND 1 + #define TC_CONFIG_FILE_ERROR 2 + ++static const char *test_config_usage[] = { ++ "test-tool config [--submodule=<path>] <cmd> [<args>]", ++ NULL ++}; ++ + int cmd__config(int argc, const char **argv) + { + int i, val, ret = 0; + const char *v; const struct string_list *strptr; - struct config_set cs = { .hash_initialized = 0 }; - enum test_config_exit_code ret = TC_SUCCESS; -+ struct repository *repo = the_repository; + struct config_set cs; ++ struct repository subrepo, *repo = the_repository; + const char *subrepo_path = NULL; - - argc--; /* skip over "config" */ - argv++; -@@ t/helper/test-config.c: int cmd__config(int argc, const char **argv) - if (argc == 0) - goto print_usage_error; 
- -+ if (skip_prefix(*argv, "--submodule=", &subrepo_path)) { -+ argc--; -+ argv++; -+ if (argc == 0) -+ goto print_usage_error; -+ } + - if (argc == 2 && !strcmp(argv[0], "read_early_config")) { -+ if (subrepo_path) { -+ fprintf(stderr, "Cannot use --submodule with read_early_config\n"); -+ return TC_USAGE_ERROR; -+ } - read_early_config(early_config_cb, (void *)argv[1]); - return TC_SUCCESS; ++ struct option options[] = { ++ OPT_STRING(0, "submodule", &subrepo_path, "path", ++ "run <cmd> on the submodule at <path>"), ++ OPT_END() ++ }; ++ ++ argc = parse_options(argc, argv, NULL, options, test_config_usage, ++ PARSE_OPT_KEEP_ARGV0 | PARSE_OPT_STOP_AT_NON_OPTION); ++ if (argc < 2) ++ die("Please, provide a command name on the command-line"); + + if (argc == 3 && !strcmp(argv[1], "read_early_config")) { ++ if (subrepo_path) ++ die("cannot use --submodule with read_early_config"); + read_early_config(early_config_cb, (void *)argv[2]); + return ret; } @@ t/helper/test-config.c: int cmd__config(int argc, const char **argv) - setup_git_directory(); + git_configset_init(&cs); +- if (argc < 2) +- die("Please, provide a command name on the command-line"); + if (subrepo_path) { + const struct submodule *sub; -+ struct repository *subrepo = xcalloc(1, sizeof(*repo)); + + sub = submodule_from_path(the_repository, &null_oid, subrepo_path); -+ if (!sub || repo_submodule_init(subrepo, the_repository, sub)) { -+ fprintf(stderr, "Invalid argument to --submodule: '%s'\n", -+ subrepo_path); -+ free(subrepo); -+ ret = TC_USAGE_ERROR; -+ goto out; -+ } -+ repo = subrepo; -+ } ++ if (!sub || repo_submodule_init(&subrepo, the_repository, sub)) ++ die("invalid argument to --submodule: '%s'", subrepo_path); + - if (argc == 2 && !strcmp(argv[0], "get_value")) { -- if (!git_config_get_value(argv[1], &v)) { -+ if (!repo_config_get_value(repo, argv[1], &v)) { ++ repo = &subrepo; ++ } + + if (argc == 3 && !strcmp(argv[1], "get_value")) { +- if (!git_config_get_value(argv[2], &v)) { ++ if 
(!repo_config_get_value(repo, argv[2], &v)) { if (!v) printf("(NULL)\n"); else @@ t/helper/test-config.c: int cmd__config(int argc, const char **argv) ret = TC_VALUE_NOT_FOUND; } - } else if (argc == 2 && !strcmp(argv[0], "get_value_multi")) { -- strptr = git_config_get_value_multi(argv[1]); -+ strptr = repo_config_get_value_multi(repo, argv[1]); + } else if (argc == 3 && !strcmp(argv[1], "get_value_multi")) { +- strptr = git_config_get_value_multi(argv[2]); ++ strptr = repo_config_get_value_multi(repo, argv[2]); if (strptr) { for (i = 0; i < strptr->nr; i++) { v = strptr->items[i].string; @@ t/helper/test-config.c: int cmd__config(int argc, const char **argv) ret = TC_VALUE_NOT_FOUND; } - } else if (argc == 2 && !strcmp(argv[0], "get_int")) { -- if (!git_config_get_int(argv[1], &val)) { -+ if (!repo_config_get_int(repo, argv[1], &val)) { + } else if (argc == 3 && !strcmp(argv[1], "get_int")) { +- if (!git_config_get_int(argv[2], &val)) { ++ if (!repo_config_get_int(repo, argv[2], &val)) { printf("%d\n", val); } else { - printf("Value not found for \"%s\"\n", argv[1]); + printf("Value not found for \"%s\"\n", argv[2]); ret = TC_VALUE_NOT_FOUND; } - } else if (argc == 2 && !strcmp(argv[0], "get_bool")) { -- if (!git_config_get_bool(argv[1], &val)) { -+ if (!repo_config_get_bool(repo, argv[1], &val)) { + } else if (argc == 3 && !strcmp(argv[1], "get_bool")) { +- if (!git_config_get_bool(argv[2], &val)) { ++ if (!repo_config_get_bool(repo, argv[2], &val)) { printf("%d\n", val); } else { + - printf("Value not found for \"%s\"\n", argv[1]); + printf("Value not found for \"%s\"\n", argv[2]); ret = TC_VALUE_NOT_FOUND; } - } else if (argc == 2 && !strcmp(argv[0], "get_string")) { -- if (!git_config_get_string_tmp(argv[1], &v)) { -+ if (!repo_config_get_string_tmp(repo, argv[1], &v)) { + } else if (argc == 3 && !strcmp(argv[1], "get_string")) { +- if (!git_config_get_string_tmp(argv[2], &v)) { ++ if (!repo_config_get_string_tmp(repo, argv[2], &v)) { printf("%s\n", v); } 
else { - printf("Value not found for \"%s\"\n", argv[1]); + printf("Value not found for \"%s\"\n", argv[2]); ret = TC_VALUE_NOT_FOUND; } - } else if (argc >= 2 && !strcmp(argv[0], "configset_get_value")) { -+ if (subrepo_path) { -+ fprintf(stderr, "Cannot use --submodule with configset_get_value\n"); -+ ret = TC_USAGE_ERROR; -+ goto out; -+ } - for (i = 2; i < argc; i++) { + } else if (argc >= 3 && !strcmp(argv[1], "configset_get_value")) { ++ if (subrepo_path) ++ die("cannot use --submodule with configset_get_value"); ++ + for (i = 3; i < argc; i++) { int err; if ((err = git_configset_add_file(&cs, argv[i]))) { @@ t/helper/test-config.c: int cmd__config(int argc, const char **argv) ret = TC_VALUE_NOT_FOUND; } - } else if (argc >= 2 && !strcmp(argv[0], "configset_get_value_multi")) { -+ if (subrepo_path) { -+ fprintf(stderr, "Cannot use --submodule with configset_get_value_multi\n"); -+ ret = TC_USAGE_ERROR; -+ goto out; -+ } - for (i = 2; i < argc; i++) { + } else if (argc >= 3 && !strcmp(argv[1], "configset_get_value_multi")) { ++ if (subrepo_path) ++ die("cannot use --submodule with configset_get_value_multi"); ++ + for (i = 3; i < argc; i++) { int err; if ((err = git_configset_add_file(&cs, argv[i]))) { @@ t/helper/test-config.c: int cmd__config(int argc, const char **argv) ret = TC_VALUE_NOT_FOUND; } - } else if (!strcmp(argv[0], "iterate")) { + } else if (!strcmp(argv[1], "iterate")) { - git_config(iterate_cb, NULL); + repo_config(repo, iterate_cb, NULL); } else { - print_usage_error: -- fprintf(stderr, "Invalid syntax. Usage: test-tool config <cmd> [args]\n"); -+ fprintf(stderr, "Invalid syntax. 
Usage: test-tool config" -+ " [--submodule=<path>] <cmd> [args]\n"); - ret = TC_USAGE_ERROR; + die("%s: Please check the syntax and the function name", argv[0]); } out: git_configset_clear(&cs); -+ if (repo != the_repository) { ++ if (repo != the_repository) + repo_clear(repo); -+ free(repo); -+ } return ret; } 5: 6d9720abf5 = 7: 902556a7b6 grep: honor sparse checkout patterns 6: affb931d35 = 8: 70e7d7b90c config: add setting to ignore sparsity patterns in some cmds -- 2.28.0 ^ permalink raw reply [flat|nested] 123+ messages in thread
* [PATCH v5 1/8] doc: grep: unify info on configuration variables 2020-09-02 6:17 ` [PATCH v5 0/8] " Matheus Tavares @ 2020-09-02 6:17 ` Matheus Tavares 2020-09-02 6:17 ` [PATCH v5 2/8] t1308-config-set: avoid false positives when using test-config Matheus Tavares ` (7 subsequent siblings) 8 siblings, 0 replies; 123+ messages in thread From: Matheus Tavares @ 2020-09-02 6:17 UTC (permalink / raw) To: git; +Cc: gitster, stolee, newren, jonathantanmy, jrnieder Explanations about the configuration variables for git-grep are duplicated in "Documentation/git-grep.txt" and "Documentation/config/grep.txt", which can make maintenance difficult. The first also contains a definition not present in the latter (grep.fullName). To avoid problems like this, let's unify the information in the second file and include it in the first. Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br> --- Documentation/config/grep.txt | 10 ++++++++-- Documentation/git-grep.txt | 36 ++++++----------------------------- 2 files changed, 14 insertions(+), 32 deletions(-) diff --git a/Documentation/config/grep.txt b/Documentation/config/grep.txt index 44abe45a7c..dd51db38e1 100644 --- a/Documentation/config/grep.txt +++ b/Documentation/config/grep.txt @@ -16,8 +16,14 @@ grep.extendedRegexp:: other than 'default'. grep.threads:: - Number of grep worker threads to use. - See `grep.threads` in linkgit:git-grep[1] for more information. + Number of grep worker threads to use. See `--threads` +ifndef::git-grep[] + in linkgit:git-grep[1] +endif::git-grep[] + for more information. + +grep.fullName:: + If set to true, enable `--full-name` option by default. grep.fallbackToNoIndex:: If set to true, fall back to git grep --no-index if git grep diff --git a/Documentation/git-grep.txt b/Documentation/git-grep.txt index a7f9bc99ea..9bdf807584 100644 --- a/Documentation/git-grep.txt +++ b/Documentation/git-grep.txt @@ -41,34 +41,8 @@ characters. An empty string as search expression matches all lines. 
CONFIGURATION ------------- -grep.lineNumber:: - If set to true, enable `-n` option by default. - -grep.column:: - If set to true, enable the `--column` option by default. - -grep.patternType:: - Set the default matching behavior. Using a value of 'basic', 'extended', - 'fixed', or 'perl' will enable the `--basic-regexp`, `--extended-regexp`, - `--fixed-strings`, or `--perl-regexp` option accordingly, while the - value 'default' will return to the default matching behavior. - -grep.extendedRegexp:: - If set to true, enable `--extended-regexp` option by default. This - option is ignored when the `grep.patternType` option is set to a value - other than 'default'. - -grep.threads:: - Number of grep worker threads to use. If unset (or set to 0), Git will - use as many threads as the number of logical cores available. - -grep.fullName:: - If set to true, enable `--full-name` option by default. - -grep.fallbackToNoIndex:: - If set to true, fall back to git grep --no-index if git grep - is executed outside of a git repository. Defaults to false. - +:git-grep: 1 +include::config/grep.txt[] OPTIONS ------- @@ -269,8 +243,10 @@ providing this option will cause it to die. found. --threads <num>:: - Number of grep worker threads to use. - See `grep.threads` in 'CONFIGURATION' for more information. + Number of grep worker threads to use. If not provided (or set to + 0), Git will use as many worker threads as the number of logical + cores available. The default value can also be set with the + `grep.threads` configuration. -f <file>:: Read patterns from <file>, one per line. -- 2.28.0 ^ permalink raw reply related [flat|nested] 123+ messages in thread
* [PATCH v5 2/8] t1308-config-set: avoid false positives when using test-config 2020-09-02 6:17 ` [PATCH v5 0/8] " Matheus Tavares 2020-09-02 6:17 ` [PATCH v5 1/8] doc: grep: unify info on configuration variables Matheus Tavares @ 2020-09-02 6:17 ` Matheus Tavares 2020-09-02 6:57 ` Eric Sunshine 2020-09-02 6:17 ` [PATCH v5 3/8] t/helper/test-config: be consistent with exit codes Matheus Tavares ` (6 subsequent siblings) 8 siblings, 1 reply; 123+ messages in thread From: Matheus Tavares @ 2020-09-02 6:17 UTC (permalink / raw) To: git; +Cc: gitster, stolee, newren, jonathantanmy, jrnieder One test in t1308 expects test-config to fail with exit code 128 due to a parsing error in the config machinery. But test-config might also exit with 128 for any other reason that leads it to call die(). Therefore the test can potentially succeed for the wrong reason. To avoid false positives, let's check test-config's output, in addition to the exit code, and make sure that the cause of the error is the one we expect in this test. Moreover, the test was using the auxiliary function check_config which optionally takes a string to compare the test-config stdout against. Because this string is optional, there is a risk that future callers may also check only the exit code and not the output. To avoid that, make the string parameter of this function mandatory. 
Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br> --- t/t1308-config-set.sh | 8 +++----- 1 file changed, 3 insertions(+), 5 deletions(-) diff --git a/t/t1308-config-set.sh b/t/t1308-config-set.sh index 3a527e3a84..cff17120dc 100755 --- a/t/t1308-config-set.sh +++ b/t/t1308-config-set.sh @@ -14,10 +14,7 @@ check_config () { expect_code=0 fi && op=$1 key=$2 && shift && shift && - if test $# != 0 - then - printf "%s\n" "$@" - fi >expect && + printf "%s\n" "$@" >expect && test_expect_code $expect_code test-tool config "$op" "$key" >actual && test_cmp expect actual } @@ -130,7 +127,8 @@ test_expect_success 'check line error when NULL string is queried' ' ' test_expect_success 'find integer if value is non parse-able' ' - check_config expect_code 128 get_int lamb.head + test_expect_code 128 test-tool config get_int lamb.head 2>result && + test_i18ngrep "fatal: bad numeric config value '\'none\'' for '\'lamb.head\''" result ' test_expect_success 'find bool value for the entered key' ' -- 2.28.0 ^ permalink raw reply related [flat|nested] 123+ messages in thread
* Re: [PATCH v5 2/8] t1308-config-set: avoid false positives when using test-config 2020-09-02 6:17 ` [PATCH v5 2/8] t1308-config-set: avoid false positives when using test-config Matheus Tavares @ 2020-09-02 6:57 ` Eric Sunshine 2020-09-02 16:16 ` Matheus Tavares Bernardino 0 siblings, 1 reply; 123+ messages in thread From: Eric Sunshine @ 2020-09-02 6:57 UTC (permalink / raw) To: Matheus Tavares Cc: Git List, Junio C Hamano, Derrick Stolee, Elijah Newren, Jonathan Tan, Jonathan Nieder On Wed, Sep 2, 2020 at 2:18 AM Matheus Tavares <matheus.bernardino@usp.br> wrote: > One test in t1308 expects test-config to fail with exit code 128 due to > a parsing error in the config machinery. But test-config might also exit > with 128 for any other reason that leads it to call die(). Therefore the > test can potentially succeed for the wrong reason. To avoid false > positives, let's check test-config's output, in addition to the exit > code, and make sure that the cause of the error is the one we expect in > this test. > > Moreover, the test was using the auxiliary function check_config which > optionally takes a string to compare the test-config stdout against. > Because this string is optional, there is a risk that future callers may > also check only the exit code and not the output. To avoid that, make > the string parameter of this function mandatory. > > Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br> > --- > diff --git a/t/t1308-config-set.sh b/t/t1308-config-set.sh > @@ -14,10 +14,7 @@ check_config () { > expect_code=0 > fi && > op=$1 key=$2 && shift && shift && > - if test $# != 0 > - then > - printf "%s\n" "$@" > - fi >expect && > + printf "%s\n" "$@" >expect && This change in behavior is quite subtle. With the original code, "expect" will be entirely empty if no argument is provided, whereas with the revised code, "expect" will contain a single newline. This could be improved by making the argument genuinely mandatory as stated in the commit message. 
Perhaps something like this: if test $# -eq 0 then BUG "check_config 'value' argument missing" fi && printf "%s\n" "$@" >expect && > @@ -130,7 +127,8 @@ test_expect_success 'check line error when NULL string is queried' ' > test_expect_success 'find integer if value is non parse-able' ' > - check_config expect_code 128 get_int lamb.head > + test_expect_code 128 test-tool config get_int lamb.head 2>result && > + test_i18ngrep "fatal: bad numeric config value '\'none\'' for '\'lamb.head\''" result > ' The complex '\'quoting\'' magic leaves and re-enters the single-quote context of the test body and makes it difficult to reason about. Since this is a pattern argument to grep, a simpler alternative would be: test_i18ngrep "fatal: bad numeric config value .none. for .lamb.head." result Aside from that, do I understand correctly that all other callers which expect a non-zero exit code will find the error message on stdout, but this case will find it on stderr? That makes one wonder if, rather than dropping use of check_config() here, instead check_config() should be enhanced to accept an additional option, such as 'stderr' which causes it to check stderr rather than stdout (similar to how 'expect_code' allows the caller to override the expected exit code). But perhaps that would be overengineered if this case is not expected to come up again as more callers are added in the future? ^ permalink raw reply [flat|nested] 123+ messages in thread
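The difference between the two variants of the "expect" setup discussed above can be reproduced outside the test suite. The following is a minimal sketch in plain POSIX shell; the function names old_expect and new_expect are made up for this illustration and do not exist in t1308.

```shell
# Hypothetical helpers (not in t1308): each prints what the corresponding
# version of check_config would have written to "expect".

old_expect () {
	# Original form: printf is skipped when no values are given,
	# so the output is empty.
	if test $# != 0
	then
		printf "%s\n" "$@"
	fi
}

new_expect () {
	# Revised form: with no arguments, printf still applies its format
	# once, using an empty string for %s, and emits a lone newline.
	printf "%s\n" "$@"
}

# Byte counts with no arguments; tr strips padding some wc versions add.
old_bytes=$(old_expect | wc -c | tr -d ' ')
new_bytes=$(new_expect | wc -c | tr -d ' ')
echo "old: $old_bytes byte(s), new: $new_bytes byte(s)"
```

With no arguments the old form writes nothing and the new form writes one newline, which is why a guard such as the suggested BUG check makes the "mandatory" claim in the commit message hold.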
* Re: [PATCH v5 2/8] t1308-config-set: avoid false positives when using test-config 2020-09-02 6:57 ` Eric Sunshine @ 2020-09-02 16:16 ` Matheus Tavares Bernardino 2020-09-02 16:38 ` Eric Sunshine 0 siblings, 1 reply; 123+ messages in thread From: Matheus Tavares Bernardino @ 2020-09-02 16:16 UTC (permalink / raw) To: Eric Sunshine Cc: Git List, Junio C Hamano, Derrick Stolee, Elijah Newren, Jonathan Tan, Jonathan Nieder On Wed, Sep 2, 2020 at 3:57 AM Eric Sunshine <sunshine@sunshineco.com> wrote: > > On Wed, Sep 2, 2020 at 2:18 AM Matheus Tavares > <matheus.bernardino@usp.br> wrote: > > One test in t1308 expects test-config to fail with exit code 128 due to > > a parsing error in the config machinery. But test-config might also exit > > with 128 for any other reason that leads it to call die(). Therefore the > > test can potentially succeed for the wrong reason. To avoid false > > positives, let's check test-config's output, in addition to the exit > > code, and make sure that the cause of the error is the one we expect in > > this test. > > > > Moreover, the test was using the auxiliary function check_config which > > optionally takes a string to compare the test-config stdout against. > > Because this string is optional, there is a risk that future callers may > > also check only the exit code and not the output. To avoid that, make > > the string parameter of this function mandatory. > > > > Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br> > > --- > > diff --git a/t/t1308-config-set.sh b/t/t1308-config-set.sh > > @@ -14,10 +14,7 @@ check_config () { > > expect_code=0 > > fi && > > op=$1 key=$2 && shift && shift && > > - if test $# != 0 > > - then > > - printf "%s\n" "$@" > > - fi >expect && > > + printf "%s\n" "$@" >expect && > > This change in behavior is quite subtle. With the original code, > "expect" will be entirely empty if no argument is provided, whereas > with the revised code, "expect" will contain a single newline. 
This > could be improved by making the argument genuinely mandatory as stated > in the commit message. Perhaps something like this: > > if test $# -eq 0 > then > BUG "check_config 'value' argument missing" > fi && > printf "%s\n" "$@" >expect && Thanks for catching this. I will add the check. > > @@ -130,7 +127,8 @@ test_expect_success 'check line error when NULL string is queried' ' > > test_expect_success 'find integer if value is non parse-able' ' > > - check_config expect_code 128 get_int lamb.head > > + test_expect_code 128 test-tool config get_int lamb.head 2>result && > > + test_i18ngrep "fatal: bad numeric config value '\'none\'' for '\'lamb.head\''" result > > ' > > The complex '\'quoting\'' magic leaves and re-enters the single-quote > context of the test body and makes it difficult to reason about. Since > this is a pattern argument to grep, a simpler alternative would be: > > test_i18ngrep "fatal: bad numeric config value .none. for > .lamb.head." result Will do, thanks. > Aside from that, do I understand correctly that all other callers > which expect a non-zero exit code will find the error message on > stdout, but this case will find it on stderr? Right. This happens because, for a "value not found" error, test-config will exit with code 1 and print to stdout. This is the only case where it exits with a non-zero code and prints to stdout instead of stderr. With that said, I'm wondering now whether we should change the function's signature from: `check_config [expect_code <code>] <cmd> <key> <expected_value>` to: `check_config <cmd> <key> <expected_value>` `check_config expect_not_found <cmd> <key> <value>` The second form would then automatically expect exit code 1 and check stdout for the message 'Value not found for "<value>"'. With this we can avoid wrong uses of check_config to check an arbitrary error code without also checking stderr.
> That makes one wonder > if, rather than dropping use of check_config() here, instead > check_config() should be enhanced to accept an additional option, such > as 'stderr' which causes it to check stderr rather than stdout > (similar to how 'expect_code' allows the caller to override the > expected exit code). But perhaps that would be overengineered if this > case is not expected to come up again as more callers are added in the > future? That's an interesting idea. However, because some callers may want to use test_i18ngrep instead of test_cmp, I think the required logic would become too complex. ^ permalink raw reply [flat|nested] 123+ messages in thread
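The quoting point raised in this exchange can be checked in isolation. This is a small sketch, assuming only the part of the error message that the patch greps for; the variable names msg and pattern are illustrative.

```shell
# The part of the message the patch greps for (taken from the patch).
msg="fatal: bad numeric config value 'none' for 'lamb.head'"

# Inside a single-quoted test body, a literal quote needs the '\'' dance.
# Using "." (any single character) in the grep pattern avoids that.
pattern="fatal: bad numeric config value .none. for .lamb.head."

if printf "%s\n" "$msg" | grep -q "$pattern"
then
	echo "pattern matches"
fi
```

The trade-off is that each dot also matches characters other than a quote, so the pattern is slightly looser than the literal message. For a test that only needs to pin down which error occurred, that is usually acceptable.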
* Re: [PATCH v5 2/8] t1308-config-set: avoid false positives when using test-config 2020-09-02 16:16 ` Matheus Tavares Bernardino @ 2020-09-02 16:38 ` Eric Sunshine 0 siblings, 0 replies; 123+ messages in thread From: Eric Sunshine @ 2020-09-02 16:38 UTC (permalink / raw) To: Matheus Tavares Bernardino Cc: Git List, Junio C Hamano, Derrick Stolee, Elijah Newren, Jonathan Tan, Jonathan Nieder

On Wed, Sep 2, 2020 at 12:16 PM Matheus Tavares Bernardino <matheus.bernardino@usp.br> wrote:
> With that said, I'm wondering now whether we should change the
> function's signature from:
>
> `check_config [expect_code <code>] <cmd> <key> <expected_value>`
>
> to:
>
> `check_config <cmd> <key> <expected_value>`
> `check_config expect_not_found <cmd> <key> <value>`
>
> The second form would then automatically expect exit code 1 and check
> stdout for the message 'Value not found for "<value>"'. With this we
> can avoid wrong uses of check_config to check an arbitrary error code
> without also checking stderr.

Yes, that seems more straightforward. In fact, at this point, you could just have two distinct functions and eliminate the ugly complexity of the existing check_config() implementation. Perhaps something like this (typed in email):

    check_config () {
        test-tool config "$1" "$2" >actual &&
        shift && shift &&
        printf "%s\n" "$@" >expect &&
        test_cmp expect actual
    }

    check_not_found () {
        test_expect_code 1 test-tool config "$1" "$2" >actual &&
        echo "Value not found for \"$2\"" >expect &&
        test_cmp expect actual
    }

^ permalink raw reply [flat|nested] 123+ messages in thread
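The two-function split suggested above can be tried out without Git's test library. The sketch below rests on stated assumptions: fake_config is a hypothetical stand-in for "test-tool config" that knows a single key, and plain "test" comparisons on shell variables replace test_expect_code and test_cmp.

```shell
# Hypothetical stand-in for "test-tool config": knows only "case.key".
# $1 is the sub-command (ignored here), $2 is the key.
fake_config () {
	if test "$2" = "case.key"
	then
		echo "value"
	else
		echo "Value not found for \"$2\""
		return 1
	fi
}

# Succeeds only if the lookup succeeds and prints the expected value(s).
check_config () {
	actual=$(fake_config "$1" "$2") &&
	shift && shift &&
	expect=$(printf "%s\n" "$@") &&
	test "$expect" = "$actual"
}

# Succeeds only if the lookup exits with 1 and prints the
# "Value not found" message on stdout.
check_not_found () {
	code=0
	actual=$(fake_config "$1" "$2") || code=$?
	test "$code" -eq 1 &&
	test "$actual" = "Value not found for \"$2\""
}

check_config get_value case.key value && echo "found: ok"
check_not_found get_value missing.key && echo "not found: ok"
```

Keeping the two cases in separate functions means neither one accepts an arbitrary exit code, which is the misuse the thread is trying to rule out.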
* [PATCH v5 3/8] t/helper/test-config: be consistent with exit codes 2020-09-02 6:17 ` [PATCH v5 0/8] " Matheus Tavares 2020-09-02 6:17 ` [PATCH v5 1/8] doc: grep: unify info on configuration variables Matheus Tavares 2020-09-02 6:17 ` [PATCH v5 2/8] t1308-config-set: avoid false positives when using test-config Matheus Tavares @ 2020-09-02 6:17 ` Matheus Tavares 2020-09-02 6:17 ` [PATCH v5 4/8] t/helper/test-config: check argc before accessing argv Matheus Tavares ` (5 subsequent siblings) 8 siblings, 0 replies; 123+ messages in thread From: Matheus Tavares @ 2020-09-02 6:17 UTC (permalink / raw) To: git; +Cc: gitster, stolee, newren, jonathantanmy, jrnieder The test-config helper can return at least three different exit codes to reflect the status of the requested operation. And these codes are checked in some of the tests. But there is an inconsistent place in the helper where a usage error returns the same code as a "value not found" error. Let's fix that and, while we are here, document the meaning of each exit code in the file's header. 
Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br> --- t/helper/test-config.c | 13 +++++++++---- 1 file changed, 9 insertions(+), 4 deletions(-) diff --git a/t/helper/test-config.c b/t/helper/test-config.c index a6e936721f..9e9d50099a 100644 --- a/t/helper/test-config.c +++ b/t/helper/test-config.c @@ -30,6 +30,11 @@ * iterate -> iterate over all values using git_config(), and print some * data for each * + * Exit codes: + * 0: success + * 1: value not found for the given config key + * 2: config file path given as argument is inaccessible or doesn't exist + * * Examples: * * To print the value with highest priority for key "foo.bAr Baz.rock": @@ -80,10 +85,10 @@ int cmd__config(int argc, const char **argv) git_configset_init(&cs); - if (argc < 2) { - fprintf(stderr, "Please, provide a command name on the command-line\n"); - goto exit1; - } else if (argc == 3 && !strcmp(argv[1], "get_value")) { + if (argc < 2) + die("Please, provide a command name on the command-line"); + + if (argc == 3 && !strcmp(argv[1], "get_value")) { if (!git_config_get_value(argv[2], &v)) { if (!v) printf("(NULL)\n"); -- 2.28.0 ^ permalink raw reply related [flat|nested] 123+ messages in thread
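Editor's sketch, not part of the patch: the point of 3/8 is that each failure class gets its own exit status, so a test can tell them apart. The snippet below illustrates that idea on a tiny in-memory table. All names (helper_status, struct kv, lookup) are hypothetical and do not exist in git; the value 128 for usage errors assumes the status that git's die() exits with.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical names; the patch only documents the numeric codes 0, 1 and 2. */
enum helper_status {
	HELPER_OK = 0,              /* success */
	HELPER_VALUE_NOT_FOUND = 1, /* no value for the given key */
	HELPER_FILE_ERROR = 2,      /* config file missing or unreadable */
	HELPER_USAGE_ERROR = 128    /* assumption: same status as die() */
};

struct kv {
	const char *key;
	const char *value;
};

/*
 * Look up `key` in a small table standing in for a config set.
 * `file_ok` simulates whether the config file could be read.
 * Every failure class maps to a distinct status.
 */
static enum helper_status lookup(const struct kv *table, size_t nr,
				 int file_ok, const char *key,
				 const char **value)
{
	size_t i;

	if (!key)
		return HELPER_USAGE_ERROR; /* caller error, not "not found" */
	if (!file_ok)
		return HELPER_FILE_ERROR;
	for (i = 0; i < nr; i++) {
		if (!strcmp(table[i].key, key)) {
			*value = table[i].value;
			return HELPER_OK;
		}
	}
	return HELPER_VALUE_NOT_FOUND;
}
```

With this split, a test that expects status 1 cannot pass by accident when the helper was merely invoked with the wrong arguments.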
* [PATCH v5 4/8] t/helper/test-config: check argc before accessing argv 2020-09-02 6:17 ` [PATCH v5 0/8] " Matheus Tavares ` (2 preceding siblings ...) 2020-09-02 6:17 ` [PATCH v5 3/8] t/helper/test-config: be consistent with exit codes Matheus Tavares @ 2020-09-02 6:17 ` Matheus Tavares 2020-09-02 7:18 ` Eric Sunshine 2020-09-02 6:17 ` [PATCH v5 5/8] t/helper/test-config: unify exit labels Matheus Tavares ` (4 subsequent siblings) 8 siblings, 1 reply; 123+ messages in thread From: Matheus Tavares @ 2020-09-02 6:17 UTC (permalink / raw) To: git; +Cc: gitster, stolee, newren, jonathantanmy, jrnieder Check that we have the expected argc in 'configset_get_value' and 'configset_get_value_multi' before trying to access argv elements. Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br> --- t/helper/test-config.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/t/helper/test-config.c b/t/helper/test-config.c index 9e9d50099a..26d9c2ac4c 100644 --- a/t/helper/test-config.c +++ b/t/helper/test-config.c @@ -138,7 +138,7 @@ int cmd__config(int argc, const char **argv) printf("Value not found for \"%s\"\n", argv[2]); goto exit1; } - } else if (!strcmp(argv[1], "configset_get_value")) { + } else if (argc >= 3 && !strcmp(argv[1], "configset_get_value")) { for (i = 3; i < argc; i++) { int err; if ((err = git_configset_add_file(&cs, argv[i]))) { @@ -156,7 +156,7 @@ int cmd__config(int argc, const char **argv) printf("Value not found for \"%s\"\n", argv[2]); goto exit1; } - } else if (!strcmp(argv[1], "configset_get_value_multi")) { + } else if (argc >= 3 && !strcmp(argv[1], "configset_get_value_multi")) { for (i = 3; i < argc; i++) { int err; if ((err = git_configset_add_file(&cs, argv[i]))) { -- 2.28.0 ^ permalink raw reply related [flat|nested] 123+ messages in thread
* Re: [PATCH v5 4/8] t/helper/test-config: check argc before accessing argv 2020-09-02 6:17 ` [PATCH v5 4/8] t/helper/test-config: check argc before accessing argv Matheus Tavares @ 2020-09-02 7:18 ` Eric Sunshine 0 siblings, 0 replies; 123+ messages in thread From: Eric Sunshine @ 2020-09-02 7:18 UTC (permalink / raw) To: Matheus Tavares Cc: Git List, Junio C Hamano, Derrick Stolee, Elijah Newren, Jonathan Tan, Jonathan Nieder On Wed, Sep 2, 2020 at 2:18 AM Matheus Tavares <matheus.bernardino@usp.br> wrote: > Check that we have the expected argc in 'configset_get_value' and > 'configset_get_value_multi' before trying to access argv elements. > > Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br> > --- > diff --git a/t/helper/test-config.c b/t/helper/test-config.c > @@ -138,7 +138,7 @@ int cmd__config(int argc, const char **argv) > - } else if (!strcmp(argv[1], "configset_get_value")) { > + } else if (argc >= 3 && !strcmp(argv[1], "configset_get_value")) { > for (i = 3; i < argc; i++) { > @@ -156,7 +156,7 @@ int cmd__config(int argc, const char **argv) > printf("Value not found for \"%s\"\n", argv[2]); This is certainly a bug fix since it was accessing argv[2] without checking that that element was even present, but the more significant outcome of this change is that it now correctly diagnoses when these two commands are called with the wrong number of arguments (just like all the other commands -- except "iterate" -- diagnose incorrect number of arguments). It might make sense, therefore, for the commit message to focus on that improvement and mention the out-of-bounds array access fix as a side-effect. However, that itself is not worth a re-roll. ^ permalink raw reply [flat|nested] 123+ messages in thread
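Editor's sketch, not part of the thread: the fix in 4/8 relies on && short-circuiting, so argv is only dereferenced after argc has been checked. The helper below is hypothetical (is_command is not a git function) and only shows the ordering of the checks.

```c
#include <string.h>

/*
 * Hypothetical helper: return 1 when argv[1] names `cmd` and at least
 * `min_argc` entries are present. Both argc tests run before strcmp(),
 * so argv[1] is never read when it is absent, and a caller passing
 * min_argc >= 3 may safely use argv[2] after a successful match.
 */
static int is_command(int argc, const char **argv, const char *cmd,
		      int min_argc)
{
	return argc >= 2 && argc >= min_argc && !strcmp(argv[1], cmd);
}
```

A call with too few arguments then falls through to the final "check the syntax" branch instead of reading past the end of argv, which is the diagnostic improvement Eric points out above.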
* [PATCH v5 5/8] t/helper/test-config: unify exit labels 2020-09-02 6:17 ` [PATCH v5 0/8] " Matheus Tavares ` (3 preceding siblings ...) 2020-09-02 6:17 ` [PATCH v5 4/8] t/helper/test-config: check argc before accessing argv Matheus Tavares @ 2020-09-02 6:17 ` Matheus Tavares 2020-09-02 7:30 ` Eric Sunshine 2020-09-02 6:17 ` [PATCH v5 6/8] config: correctly read worktree configs in submodules Matheus Tavares ` (3 subsequent siblings) 8 siblings, 1 reply; 123+ messages in thread From: Matheus Tavares @ 2020-09-02 6:17 UTC (permalink / raw) To: git; +Cc: gitster, stolee, newren, jonathantanmy, jrnieder test-config's main function has three different exit labels, all of which have to perform the same cleanup code before returning. Unify the labels in preparation for the next patch which will increase the cleanup section. Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br> --- t/helper/test-config.c | 51 +++++++++++++++++------------------------- 1 file changed, 20 insertions(+), 31 deletions(-) diff --git a/t/helper/test-config.c b/t/helper/test-config.c index 26d9c2ac4c..8fe43e9775 100644 --- a/t/helper/test-config.c +++ b/t/helper/test-config.c @@ -69,16 +69,19 @@ static int early_config_cb(const char *var, const char *value, void *vdata) return 0; } +#define TC_VALUE_NOT_FOUND 1 +#define TC_CONFIG_FILE_ERROR 2 + int cmd__config(int argc, const char **argv) { - int i, val; + int i, val, ret = 0; const char *v; const struct string_list *strptr; struct config_set cs; if (argc == 3 && !strcmp(argv[1], "read_early_config")) { read_early_config(early_config_cb, (void *)argv[2]); - return 0; + return ret; } setup_git_directory(); @@ -94,10 +97,9 @@ int cmd__config(int argc, const char **argv) printf("(NULL)\n"); else printf("%s\n", v); - goto exit0; } else { printf("Value not found for \"%s\"\n", argv[2]); - goto exit1; + ret = TC_VALUE_NOT_FOUND; } } else if (argc == 3 && !strcmp(argv[1], "get_value_multi")) { strptr = git_config_get_value_multi(argv[2]); @@ 
-109,41 +111,38 @@ int cmd__config(int argc, const char **argv) else printf("%s\n", v); } - goto exit0; } else { printf("Value not found for \"%s\"\n", argv[2]); - goto exit1; + ret = TC_VALUE_NOT_FOUND; } } else if (argc == 3 && !strcmp(argv[1], "get_int")) { if (!git_config_get_int(argv[2], &val)) { printf("%d\n", val); - goto exit0; } else { printf("Value not found for \"%s\"\n", argv[2]); - goto exit1; + ret = TC_VALUE_NOT_FOUND; } } else if (argc == 3 && !strcmp(argv[1], "get_bool")) { if (!git_config_get_bool(argv[2], &val)) { printf("%d\n", val); - goto exit0; } else { printf("Value not found for \"%s\"\n", argv[2]); - goto exit1; + ret = TC_VALUE_NOT_FOUND; } } else if (argc == 3 && !strcmp(argv[1], "get_string")) { if (!git_config_get_string_tmp(argv[2], &v)) { printf("%s\n", v); - goto exit0; } else { printf("Value not found for \"%s\"\n", argv[2]); - goto exit1; + ret = TC_VALUE_NOT_FOUND; } } else if (argc >= 3 && !strcmp(argv[1], "configset_get_value")) { for (i = 3; i < argc; i++) { int err; if ((err = git_configset_add_file(&cs, argv[i]))) { fprintf(stderr, "Error (%d) reading configuration file %s.\n", err, argv[i]); - goto exit2; + ret = TC_CONFIG_FILE_ERROR; + goto out; } } if (!git_configset_get_value(&cs, argv[2], &v)) { @@ -151,17 +150,17 @@ int cmd__config(int argc, const char **argv) printf("(NULL)\n"); else printf("%s\n", v); - goto exit0; } else { printf("Value not found for \"%s\"\n", argv[2]); - goto exit1; + ret = TC_VALUE_NOT_FOUND; } } else if (argc >= 3 && !strcmp(argv[1], "configset_get_value_multi")) { for (i = 3; i < argc; i++) { int err; if ((err = git_configset_add_file(&cs, argv[i]))) { fprintf(stderr, "Error (%d) reading configuration file %s.\n", err, argv[i]); - goto exit2; + ret = TC_CONFIG_FILE_ERROR; + goto out; } } strptr = git_configset_get_value_multi(&cs, argv[2]); @@ -173,27 +172,17 @@ int cmd__config(int argc, const char **argv) else printf("%s\n", v); } - goto exit0; } else { printf("Value not found for \"%s\"\n", 
argv[2]); - goto exit1; + ret = TC_VALUE_NOT_FOUND; } } else if (!strcmp(argv[1], "iterate")) { git_config(iterate_cb, NULL); - goto exit0; + } else { + die("%s: Please check the syntax and the function name", argv[0]); } - die("%s: Please check the syntax and the function name", argv[0]); - -exit0: - git_configset_clear(&cs); - return 0; - -exit1: - git_configset_clear(&cs); - return 1; - -exit2: +out: git_configset_clear(&cs); - return 2; + return ret; } -- 2.28.0 ^ permalink raw reply related [flat|nested] 123+ messages in thread
* Re: [PATCH v5 5/8] t/helper/test-config: unify exit labels 2020-09-02 6:17 ` [PATCH v5 5/8] t/helper/test-config: unify exit labels Matheus Tavares @ 2020-09-02 7:30 ` Eric Sunshine 0 siblings, 0 replies; 123+ messages in thread From: Eric Sunshine @ 2020-09-02 7:30 UTC (permalink / raw) To: Matheus Tavares Cc: Git List, Junio C Hamano, Derrick Stolee, Elijah Newren, Jonathan Tan, Jonathan Nieder On Wed, Sep 2, 2020 at 2:18 AM Matheus Tavares <matheus.bernardino@usp.br> wrote: > test-config's main function has three different exit labels, all of > which have to perform the same cleanup code before returning. Unify the > labels in preparation for the next patch which will increase the cleanup > section. > > Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br> > --- > diff --git a/t/helper/test-config.c b/t/helper/test-config.c > @@ -69,16 +69,19 @@ static int early_config_cb(const char *var, const char *value, void *vdata) > +#define TC_VALUE_NOT_FOUND 1 > +#define TC_CONFIG_FILE_ERROR 2 > + > int cmd__config(int argc, const char **argv) > { > + int i, val, ret = 0; > > if (argc == 3 && !strcmp(argv[1], "read_early_config")) { > read_early_config(early_config_cb, (void *)argv[2]); > - return 0; > + return ret; > } This one change feels both fragile and increases cognitive load, thus does not seem desirable. It feels fragile because someone could come along and insert code above this conditional which assigns some value other than 0 to 'ret' (not realizing that this conditional wants to return 0), thus breaking it. It increases cognitive load because it forces the reader to scan all the code leading up to this point to determine what value this conditional really wants to return. Nevertheless, this is a minor objection, not necessarily worth a re-roll. 
> @@ -94,10 +97,9 @@ int cmd__config(int argc, const char **argv) > printf("Value not found for \"%s\"\n", argv[2]); > - goto exit1; > + ret = TC_VALUE_NOT_FOUND; This sort of change, on the other hand, does not increase cognitive load because it's quite obvious what return value this conditional wants to return (because it's assigning it explicitly). ^ permalink raw reply [flat|nested] 123+ messages in thread
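Editor's sketch, not part of the thread: the pattern adopted in 5/8 is a single cleanup label plus a status variable that each branch assigns explicitly. The code below is a reduced illustration; struct resource, demo_lookup and the DEMO_* constants are hypothetical names, and "known.key" is an arbitrary stand-in for a key that exists.

```c
#include <string.h>

#define DEMO_VALUE_NOT_FOUND 1
#define DEMO_FILE_ERROR 2

/* Stand-in for state that must be released on every path. */
struct resource {
	int open;     /* 1 while acquired */
	int releases; /* how many times the cleanup ran */
};

static int demo_lookup(struct resource *res, const char *key, int file_ok)
{
	int ret = 0;

	res->open = 1; /* acquire */

	if (!file_ok) {
		ret = DEMO_FILE_ERROR;
		goto out; /* early exit still reaches the cleanup */
	}
	if (strcmp(key, "known.key"))
		ret = DEMO_VALUE_NOT_FOUND;
	/* success and "not found" both fall through to the label */
out:
	res->open = 0; /* the only cleanup site */
	res->releases++;
	return ret;
}
```

Each branch states its own result, which is the property Eric describes as keeping cognitive load low. By the same reasoning, a return that happens before anything is acquired can keep a literal 0 rather than relying on the initial value of ret.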
* [PATCH v5 6/8] config: correctly read worktree configs in submodules 2020-09-02 6:17 ` [PATCH v5 0/8] " Matheus Tavares ` (4 preceding siblings ...) 2020-09-02 6:17 ` [PATCH v5 5/8] t/helper/test-config: unify exit labels Matheus Tavares @ 2020-09-02 6:17 ` Matheus Tavares 2020-09-02 20:15 ` Jonathan Nieder 2020-09-02 6:17 ` [PATCH v5 7/8] grep: honor sparse checkout patterns Matheus Tavares ` (2 subsequent siblings) 8 siblings, 1 reply; 123+ messages in thread From: Matheus Tavares @ 2020-09-02 6:17 UTC (permalink / raw) To: git; +Cc: gitster, stolee, newren, jonathantanmy, jrnieder The config machinery is not able to read worktree configs from a submodule in a process where the_repository represents the superproject. Furthermore, when extensions.worktreeConfig is set on the superproject, querying for a worktree config in a submodule will, instead, return the value set at the superproject. The problem resides in do_git_config_sequence(). Although the function receives a git_dir string, it uses the_repository->git_dir when making the path to the worktree config file. And when checking if extensions.worktreeConfig is set, it uses the global repository_format_worktree_config variable, which refers to the_repository only. So let's fix this by using the git_dir given to the function and reading the extension value from the right place. Also add a test to avoid any regressions. 
Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br> --- config.c | 21 ++++++++++--- t/helper/test-config.c | 62 ++++++++++++++++++++++++++++++++------ t/t2404-worktree-config.sh | 16 ++++++++++ 3 files changed, 85 insertions(+), 14 deletions(-) diff --git a/config.c b/config.c index 2bdff4457b..e1e7fab6dc 100644 --- a/config.c +++ b/config.c @@ -1747,11 +1747,22 @@ static int do_git_config_sequence(const struct config_options *opts, ret += git_config_from_file(fn, repo_config, data); current_parsing_scope = CONFIG_SCOPE_WORKTREE; - if (!opts->ignore_worktree && repository_format_worktree_config) { - char *path = git_pathdup("config.worktree"); - if (!access_or_die(path, R_OK, 0)) - ret += git_config_from_file(fn, path, data); - free(path); + if (!opts->ignore_worktree && repo_config && opts->git_dir) { + struct repository_format repo_fmt = REPOSITORY_FORMAT_INIT; + struct strbuf buf = STRBUF_INIT; + + read_repository_format(&repo_fmt, repo_config); + + if (!verify_repository_format(&repo_fmt, &buf) && + repo_fmt.worktree_config) { + char *path = mkpathdup("%s/config.worktree", opts->git_dir); + if (!access_or_die(path, R_OK, 0)) + ret += git_config_from_file(fn, path, data); + free(path); + } + + strbuf_release(&buf); + clear_repository_format(&repo_fmt); } current_parsing_scope = CONFIG_SCOPE_COMMAND; diff --git a/t/helper/test-config.c b/t/helper/test-config.c index 8fe43e9775..2924c09c21 100644 --- a/t/helper/test-config.c +++ b/t/helper/test-config.c @@ -2,12 +2,20 @@ #include "cache.h" #include "config.h" #include "string-list.h" +#include "submodule-config.h" +#include "parse-options.h" /* * This program exposes the C API of the configuration mechanism * as a set of simple commands in order to facilitate testing. * - * Reads stdin and prints result of command to stdout: + * Usage: test-tool config [--submodule=<path>] <cmd> [<args>] + * + * If --submodule=<path> is given, <cmd> will operate on the submodule at the + * given <path>. 
This option is not valid for the commands: read_early_config, + * configset_get_value and configset_get_value_multi. + * + * Possible cmds are: * * get_value -> prints the value with highest priority for the entered key * @@ -72,14 +80,34 @@ static int early_config_cb(const char *var, const char *value, void *vdata) #define TC_VALUE_NOT_FOUND 1 #define TC_CONFIG_FILE_ERROR 2 +static const char *test_config_usage[] = { + "test-tool config [--submodule=<path>] <cmd> [<args>]", + NULL +}; + int cmd__config(int argc, const char **argv) { int i, val, ret = 0; const char *v; const struct string_list *strptr; struct config_set cs; + struct repository subrepo, *repo = the_repository; + const char *subrepo_path = NULL; + + struct option options[] = { + OPT_STRING(0, "submodule", &subrepo_path, "path", + "run <cmd> on the submodule at <path>"), + OPT_END() + }; + + argc = parse_options(argc, argv, NULL, options, test_config_usage, + PARSE_OPT_KEEP_ARGV0 | PARSE_OPT_STOP_AT_NON_OPTION); + if (argc < 2) + die("Please, provide a command name on the command-line"); if (argc == 3 && !strcmp(argv[1], "read_early_config")) { + if (subrepo_path) + die("cannot use --submodule with read_early_config"); read_early_config(early_config_cb, (void *)argv[2]); return ret; } @@ -88,11 +116,18 @@ int cmd__config(int argc, const char **argv) git_configset_init(&cs); - if (argc < 2) - die("Please, provide a command name on the command-line"); + if (subrepo_path) { + const struct submodule *sub; + + sub = submodule_from_path(the_repository, &null_oid, subrepo_path); + if (!sub || repo_submodule_init(&subrepo, the_repository, sub)) + die("invalid argument to --submodule: '%s'", subrepo_path); + + repo = &subrepo; + } if (argc == 3 && !strcmp(argv[1], "get_value")) { - if (!git_config_get_value(argv[2], &v)) { + if (!repo_config_get_value(repo, argv[2], &v)) { if (!v) printf("(NULL)\n"); else @@ -102,7 +137,7 @@ int cmd__config(int argc, const char **argv) ret = TC_VALUE_NOT_FOUND; } } else if 
(argc == 3 && !strcmp(argv[1], "get_value_multi")) { - strptr = git_config_get_value_multi(argv[2]); + strptr = repo_config_get_value_multi(repo, argv[2]); if (strptr) { for (i = 0; i < strptr->nr; i++) { v = strptr->items[i].string; @@ -116,27 +151,31 @@ int cmd__config(int argc, const char **argv) ret = TC_VALUE_NOT_FOUND; } } else if (argc == 3 && !strcmp(argv[1], "get_int")) { - if (!git_config_get_int(argv[2], &val)) { + if (!repo_config_get_int(repo, argv[2], &val)) { printf("%d\n", val); } else { printf("Value not found for \"%s\"\n", argv[2]); ret = TC_VALUE_NOT_FOUND; } } else if (argc == 3 && !strcmp(argv[1], "get_bool")) { - if (!git_config_get_bool(argv[2], &val)) { + if (!repo_config_get_bool(repo, argv[2], &val)) { printf("%d\n", val); } else { + printf("Value not found for \"%s\"\n", argv[2]); ret = TC_VALUE_NOT_FOUND; } } else if (argc == 3 && !strcmp(argv[1], "get_string")) { - if (!git_config_get_string_tmp(argv[2], &v)) { + if (!repo_config_get_string_tmp(repo, argv[2], &v)) { printf("%s\n", v); } else { printf("Value not found for \"%s\"\n", argv[2]); ret = TC_VALUE_NOT_FOUND; } } else if (argc >= 3 && !strcmp(argv[1], "configset_get_value")) { + if (subrepo_path) + die("cannot use --submodule with configset_get_value"); + for (i = 3; i < argc; i++) { int err; if ((err = git_configset_add_file(&cs, argv[i]))) { @@ -155,6 +194,9 @@ int cmd__config(int argc, const char **argv) ret = TC_VALUE_NOT_FOUND; } } else if (argc >= 3 && !strcmp(argv[1], "configset_get_value_multi")) { + if (subrepo_path) + die("cannot use --submodule with configset_get_value_multi"); + for (i = 3; i < argc; i++) { int err; if ((err = git_configset_add_file(&cs, argv[i]))) { @@ -177,12 +219,14 @@ int cmd__config(int argc, const char **argv) ret = TC_VALUE_NOT_FOUND; } } else if (!strcmp(argv[1], "iterate")) { - git_config(iterate_cb, NULL); + repo_config(repo, iterate_cb, NULL); } else { die("%s: Please check the syntax and the function name", argv[0]); } out: 
git_configset_clear(&cs); + if (repo != the_repository) + repo_clear(repo); return ret; } diff --git a/t/t2404-worktree-config.sh b/t/t2404-worktree-config.sh index 9536d10919..1e32c93735 100755 --- a/t/t2404-worktree-config.sh +++ b/t/t2404-worktree-config.sh @@ -78,4 +78,20 @@ test_expect_success 'config.worktree no longer read without extension' ' test_cmp_config -C wt2 shared this.is ' +test_expect_success 'correctly read config.worktree from submodules' ' + test_unconfig extensions.worktreeConfig && + git init sub && + ( + cd sub && + test_commit A && + git config extensions.worktreeConfig true && + git config --worktree wtconfig.sub test-value + ) && + git submodule add ./sub && + git commit -m "add sub" && + echo test-value >expect && + test-tool config --submodule=sub get_value wtconfig.sub >actual && + test_cmp expect actual +' + test_done -- 2.28.0 ^ permalink raw reply related [flat|nested] 123+ messages in thread
* Re: [PATCH v5 6/8] config: correctly read worktree configs in submodules 2020-09-02 6:17 ` [PATCH v5 6/8] config: correctly read worktree configs in submodules Matheus Tavares @ 2020-09-02 20:15 ` Jonathan Nieder 2020-09-09 13:04 ` Matheus Tavares Bernardino 0 siblings, 1 reply; 123+ messages in thread From: Jonathan Nieder @ 2020-09-02 20:15 UTC (permalink / raw) To: Matheus Tavares; +Cc: git, gitster, stolee, newren, jonathantanmy Matheus Tavares wrote: > The config machinery is not able to read worktree configs from a > submodule in a process where the_repository represents the superproject. ... where the_repository represents the superproject and extensions.worktreeConfig is not set there, right? > Furthermore, when extensions.worktreeConfig is set on the superproject, > querying for a worktree config in a submodule will, instead, return > the value set at the superproject. > > The problem resides in do_git_config_sequence(). Although the function > receives a git_dir string, it uses the_repository->git_dir when making This part of the commit message seems to be rephrasing what the patch says; for that kind of thing, it seems better to let the patch speak for itself. Can we describe what is happening at a higher level (in other words the intent instead of the details of how that is manifested in code)? For example, The relevant code is in do_git_config_sequence. Although it is designed to act on an arbitrary repository, specified by the passed-in git_dir string, it accidentally depends on the_repository in two places: - it reads the global variable `repository_format_worktree_config`, which is set based on the content of the_repository, to determine whether extensions.worktreeConfig is set - it uses the git_pathdup helper to find the config.worktree file, instead of making a path using the passed-in git_dir value Sever these dependencies. [...] 
> --- a/config.c > +++ b/config.c > @@ -1747,11 +1747,22 @@ static int do_git_config_sequence(const struct config_options *opts, > ret += git_config_from_file(fn, repo_config, data); > > current_parsing_scope = CONFIG_SCOPE_WORKTREE; > - if (!opts->ignore_worktree && repository_format_worktree_config) { > + if (!opts->ignore_worktree && repo_config && opts->git_dir) { repo_config is non-NULL if and only if commondir is non-NULL and commondir is always non-NULL if git_dir is non-NULL (as checked higher in the function), right? I think that means this condition could be written more simply as if (!opts->ignore_worktree && opts->git_dir) { which I think should be easier for the reader to understand. > + struct repository_format repo_fmt = REPOSITORY_FORMAT_INIT; > + struct strbuf buf = STRBUF_INIT; > + > + read_repository_format(&repo_fmt, repo_config); > + > + if (!verify_repository_format(&repo_fmt, &buf) && > + repo_fmt.worktree_config) { In the common case where we are acting on the_repository, this adds extra complexity and slows the routine down. Would passing in the 'struct repository *' to allow distinguishing that case help? 
Something like this: diff --git i/builtin/config.c w/builtin/config.c index 5e39f618854..ca4caedf33a 100644 --- i/builtin/config.c +++ w/builtin/config.c @@ -699,10 +699,8 @@ int cmd_config(int argc, const char **argv, const char *prefix) config_options.respect_includes = !given_config_source.file; else config_options.respect_includes = respect_includes_opt; - if (!nongit) { - config_options.commondir = get_git_common_dir(); - config_options.git_dir = get_git_dir(); - } + if (!nongit) + config_options.repo = the_repository; if (end_nul) { term = '\0'; diff --git i/config.c w/config.c index 2bdff4457be..70a1dd0ad3f 100644 --- i/config.c +++ w/config.c @@ -222,8 +222,8 @@ static int include_by_gitdir(const struct config_options *opts, const char *git_dir; int already_tried_absolute = 0; - if (opts->git_dir) - git_dir = opts->git_dir; + if (opts->repo && opts->repo->gitdir) + git_dir = opts->repo->gitdir; else goto done; @@ -1720,10 +1720,10 @@ static int do_git_config_sequence(const struct config_options *opts, char *repo_config; enum config_scope prev_parsing_scope = current_parsing_scope; - if (opts->commondir) - repo_config = mkpathdup("%s/config", opts->commondir); - else if (opts->git_dir) - BUG("git_dir without commondir"); + if (opts->repo && opts->repo->commondir) + repo_config = mkpathdup("%s/config", opts->repo->commondir); + else if (opts->repo && opts->repo->gitdir) + BUG("gitdir without commondir"); else repo_config = NULL; @@ -1824,27 +1824,33 @@ void read_early_config(config_fn_t cb, void *data) struct config_options opts = {0}; struct strbuf commondir = STRBUF_INIT; struct strbuf gitdir = STRBUF_INIT; + struct repository the_early_repo = {0}; opts.respect_includes = 1; if (have_git_dir()) { - opts.commondir = get_git_common_dir(); - opts.git_dir = get_git_dir(); + opts.repo = the_repository; /* * When setup_git_directory() was not yet asked to discover the * GIT_DIR, we ask discover_git_directory() to figure out whether there * is any repository 
config we should use (but unlike - * setup_git_directory_gently(), no global state is changed, most + * setup_git_directory_gently(), no global state is changed; most * notably, the current working directory is still the same after the * call). + * + * NEEDSWORK: There is some duplicate work between + * discover_git_directory and repo_init. Update to use a variant of + * repo_init that does its own repository discovery once available. */ } else if (!discover_git_directory(&commondir, &gitdir)) { - opts.commondir = commondir.buf; - opts.git_dir = gitdir.buf; + repo_init(&the_early_repo, gitdir.buf, NULL); + opts.repo = &the_early_repo; } config_with_options(cb, data, NULL, &opts); + if (the_early_repo.settings.initialized) + repo_clear(&the_early_repo); strbuf_release(&commondir); strbuf_release(&gitdir); } @@ -2097,8 +2103,7 @@ static void repo_read_config(struct repository *repo) struct config_options opts = { 0 }; opts.respect_includes = 1; - opts.commondir = repo->commondir; - opts.git_dir = repo->gitdir; + opts.repo = repo; if (!repo->config) repo->config = xcalloc(1, sizeof(struct config_set)); diff --git i/config.h w/config.h index 91cdfbfb414..e56293fb29f 100644 --- i/config.h +++ w/config.h @@ -21,6 +21,7 @@ */ struct object_id; +struct repository; /* git_config_parse_key() returns these negated: */ #define CONFIG_INVALID_KEY 1 @@ -87,8 +88,7 @@ struct config_options { unsigned int ignore_worktree : 1; unsigned int ignore_cmdline : 1; unsigned int system_gently : 1; - const char *commondir; - const char *git_dir; + struct repository *repo; config_parser_event_fn_t event_fn; void *event_fn_data; enum config_error_action { ==== >8 ==== [...] > --- a/t/helper/test-config.c > +++ b/t/helper/test-config.c [...] 
> @@ -72,14 +80,34 @@ static int early_config_cb(const char *var, const char *value, void *vdata) > #define TC_VALUE_NOT_FOUND 1 > #define TC_CONFIG_FILE_ERROR 2 > > +static const char *test_config_usage[] = { > + "test-tool config [--submodule=<path>] <cmd> [<args>]", > + NULL > +}; > + > int cmd__config(int argc, const char **argv) > { > int i, val, ret = 0; > const char *v; > const struct string_list *strptr; > struct config_set cs; > + struct repository subrepo, *repo = the_repository; > + const char *subrepo_path = NULL; > + > + struct option options[] = { > + OPT_STRING(0, "submodule", &subrepo_path, "path", > + "run <cmd> on the submodule at <path>"), > + OPT_END() > + }; Nice. > + > + argc = parse_options(argc, argv, NULL, options, test_config_usage, > + PARSE_OPT_KEEP_ARGV0 | PARSE_OPT_STOP_AT_NON_OPTION); > + if (argc < 2) > + die("Please, provide a command name on the command-line"); optional nit: can use usage_with_options here. It produces a better error message than any other I can think of (all I can think of are things like "need a <cmd>"). This is from existing code, but the use of parse_options opens up the possibility of taking advantage of the parse-options generated message. :) [...] > --- a/t/t2404-worktree-config.sh > +++ b/t/t2404-worktree-config.sh > @@ -78,4 +78,20 @@ test_expect_success 'config.worktree no longer read without extension' ' > test_cmp_config -C wt2 shared this.is > ' > > +test_expect_success 'correctly read config.worktree from submodules' ' > + test_unconfig extensions.worktreeConfig && > + git init sub && > + ( > + cd sub && > + test_commit A && > + git config extensions.worktreeConfig true && > + git config --worktree wtconfig.sub test-value > + ) && > + git submodule add ./sub && > + git commit -m "add sub" && > + echo test-value >expect && > + test-tool config --submodule=sub get_value wtconfig.sub >actual && > + test_cmp expect actual > +' Lovely. Summary: I like the direction this change goes in. 
I think we can do it without repeating repository format discovery in the the_repository case and without duplicating repository format discovery code in the submodule case. If it proves too fussy, then a NEEDSWORK comment would be helpful to help the reader see what is going on. Thanks and hope that helps, Jonathan ^ permalink raw reply related [flat|nested] 123+ messages in thread
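Editor's sketch, not part of the thread: Jonathan's suggestion is to hand the config code a repository handle instead of loose path strings, so the callee works from per-repository state rather than globals. The code below is a simplified illustration under that assumption. struct demo_repo, struct demo_opts and worktree_config_path are hypothetical stand-ins, not git's struct repository or struct config_options, and the worktree_config field (the extension flag cached on the handle) is an assumption that the diff above does not show.

```c
#include <stdio.h>

/* Hypothetical stand-in for a repository handle. */
struct demo_repo {
	const char *gitdir;
	int worktree_config; /* assumed: parsed once when the handle was set up */
};

/* Hypothetical stand-in for the options passed to the config reader. */
struct demo_opts {
	const struct demo_repo *repo; /* NULL when outside a repository */
	unsigned int ignore_worktree : 1;
};

/*
 * Write "<gitdir>/config.worktree" into buf and return 1 when the
 * worktree-level file should be read for opts->repo; return 0 otherwise.
 * Nothing here consults process-wide state, so a submodule handle and
 * the superproject handle are treated the same way.
 */
static int worktree_config_path(const struct demo_opts *opts, char *buf,
				size_t size)
{
	int n;

	if (opts->ignore_worktree || !opts->repo || !opts->repo->gitdir)
		return 0;
	if (!opts->repo->worktree_config)
		return 0;
	n = snprintf(buf, size, "%s/config.worktree", opts->repo->gitdir);
	return n > 0 && (size_t)n < size; /* reject truncated paths */
}
```

Because the flag travels with the handle, the common the_repository case needs no second parse of the repository format, which is the cost Jonathan wants to avoid.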
* Re: [PATCH v5 6/8] config: correctly read worktree configs in submodules 2020-09-02 20:15 ` Jonathan Nieder @ 2020-09-09 13:04 ` Matheus Tavares Bernardino 2020-09-09 23:32 ` Jonathan Nieder 0 siblings, 1 reply; 123+ messages in thread From: Matheus Tavares Bernardino @ 2020-09-09 13:04 UTC (permalink / raw) To: Jonathan Nieder Cc: git, Junio C Hamano, Derrick Stolee, Elijah Newren, Jonathan Tan Hi, Jonathan Sorry for the late reply, last week was quite busy. On Wed, Sep 2, 2020 at 5:15 PM Jonathan Nieder <jrnieder@gmail.com> wrote: > > Matheus Tavares wrote: > > > The config machinery is not able to read worktree configs from a > > submodule in a process where the_repository represents the superproject. > > ... where the_repository represents the superproject and > extensions.worktreeConfig is not set there, right? > > > Furthermore, when extensions.worktreeConfig is set on the superproject, > > querying for a worktree config in a submodule will, instead, return > > the value set at the superproject. > > > > The problem resides in do_git_config_sequence(). Although the function > > receives a git_dir string, it uses the_repository->git_dir when making > > This part of the commit message seems to be rephrasing what the patch > says; for that kind of thing, it seems better to let the patch speak > for itself. Can we describe what is happening at a higher level (in > other words the intent instead of the details of how that is > manifested in code)? For example, > > The relevant code is in do_git_config_sequence. 
Although it is designed > to act on an arbitrary repository, specified by the passed-in git_dir > string, it accidentally depends on the_repository in two places: > > - it reads the global variable `repository_format_worktree_config`, > which is set based on the content of the_repository, to determine > whether extensions.worktreeConfig is set > > - it uses the git_pathdup helper to find the config.worktree file, > instead of making a path using the passed-in git_dir value > > Sever these dependencies. Yeah, much better, thanks! :) > [...] > > --- a/config.c > > +++ b/config.c > > @@ -1747,11 +1747,22 @@ static int do_git_config_sequence(const struct config_options *opts, > > ret += git_config_from_file(fn, repo_config, data); > > > > current_parsing_scope = CONFIG_SCOPE_WORKTREE; > > - if (!opts->ignore_worktree && repository_format_worktree_config) { > > + if (!opts->ignore_worktree && repo_config && opts->git_dir) { > > repo_config is non-NULL if and only if commondir is non-NULL and > commondir is always non-NULL if git_dir is non-NULL (as checked higher > in the function), right? I think that means this condition could be > written more simply as > > if (!opts->ignore_worktree && opts->git_dir) { > > which I think should be easier for the reader to understand. Nice, thanks. > > + struct repository_format repo_fmt = REPOSITORY_FORMAT_INIT; > > + struct strbuf buf = STRBUF_INIT; > > + > > + read_repository_format(&repo_fmt, repo_config); > > + > > + if (!verify_repository_format(&repo_fmt, &buf) && > > + repo_fmt.worktree_config) { > > In the common case where we are acting on the_repository, this adds > extra complexity and slows the routine down. > > Would passing in the 'struct repository *' to allow distinguishing > that case help? 
Something like this: > > diff --git i/builtin/config.c w/builtin/config.c > index 5e39f618854..ca4caedf33a 100644 > --- i/builtin/config.c > +++ w/builtin/config.c > @@ -699,10 +699,8 @@ int cmd_config(int argc, const char **argv, const char *prefix) > config_options.respect_includes = !given_config_source.file; > else > config_options.respect_includes = respect_includes_opt; > - if (!nongit) { > - config_options.commondir = get_git_common_dir(); > - config_options.git_dir = get_git_dir(); > - } > + if (!nongit) > + config_options.repo = the_repository; > > if (end_nul) { > term = '\0'; > diff --git i/config.c w/config.c > index 2bdff4457be..70a1dd0ad3f 100644 > --- i/config.c > +++ w/config.c > @@ -222,8 +222,8 @@ static int include_by_gitdir(const struct config_options *opts, > const char *git_dir; > int already_tried_absolute = 0; > > - if (opts->git_dir) > - git_dir = opts->git_dir; > + if (opts->repo && opts->repo->gitdir) > + git_dir = opts->repo->gitdir; > else > goto done; > > @@ -1720,10 +1720,10 @@ static int do_git_config_sequence(const struct config_options *opts, > char *repo_config; > enum config_scope prev_parsing_scope = current_parsing_scope; > > - if (opts->commondir) > - repo_config = mkpathdup("%s/config", opts->commondir); > - else if (opts->git_dir) > - BUG("git_dir without commondir"); > + if (opts->repo && opts->repo->commondir) > + repo_config = mkpathdup("%s/config", opts->repo->commondir); > + else if (opts->repo && opts->repo->gitdir) > + BUG("gitdir without commondir"); > else > repo_config = NULL; > > @@ -1824,27 +1824,33 @@ void read_early_config(config_fn_t cb, void *data) > struct config_options opts = {0}; > struct strbuf commondir = STRBUF_INIT; > struct strbuf gitdir = STRBUF_INIT; > + struct repository the_early_repo = {0}; > > opts.respect_includes = 1; > > if (have_git_dir()) { > - opts.commondir = get_git_common_dir(); > - opts.git_dir = get_git_dir(); > + opts.repo = the_repository; I'm not very familiar with the code in 
setup.c so I apologize for the noob question: have_git_dir() returns `startup_info->have_repository || the_repository->gitdir`; so is it possible that it returns true but the_repository->gitdir is not initialized yet? If so, should we also check the_repository->gitdir here (before assigning opts.repo), and call BUG() when it is NULL, like get_git_dir() does? Hmm, nevertheless, I see that you already check `opts.repo && opts.repo->gitdir` before trying to use it in do_git_config_sequence(). So it should already cover this case, right? > /* > * When setup_git_directory() was not yet asked to discover the > * GIT_DIR, we ask discover_git_directory() to figure out whether there > * is any repository config we should use (but unlike > - * setup_git_directory_gently(), no global state is changed, most > + * setup_git_directory_gently(), no global state is changed; most > * notably, the current working directory is still the same after the > * call). > + * > + * NEEDSWORK: There is some duplicate work between > + * discover_git_directory and repo_init. Update to use a variant of > + * repo_init that does its own repository discovery once available. 
> */ > } else if (!discover_git_directory(&commondir, &gitdir)) { > - opts.commondir = commondir.buf; > - opts.git_dir = gitdir.buf; > + repo_init(&the_early_repo, gitdir.buf, NULL); > + opts.repo = &the_early_repo; > } > > config_with_options(cb, data, NULL, &opts); > > + if (the_early_repo.settings.initialized) > + repo_clear(&the_early_repo); > > strbuf_release(&commondir); > strbuf_release(&gitdir); > } > @@ -2097,8 +2103,7 @@ static void repo_read_config(struct repository *repo) > struct config_options opts = { 0 }; > > opts.respect_includes = 1; > - opts.commondir = repo->commondir; > - opts.git_dir = repo->gitdir; > + opts.repo = repo; > > if (!repo->config) > repo->config = xcalloc(1, sizeof(struct config_set)); > diff --git i/config.h w/config.h > index 91cdfbfb414..e56293fb29f 100644 > --- i/config.h > +++ w/config.h > @@ -21,6 +21,7 @@ > */ > > struct object_id; > +struct repository; > > /* git_config_parse_key() returns these negated: */ > #define CONFIG_INVALID_KEY 1 > @@ -87,8 +88,7 @@ struct config_options { > unsigned int ignore_worktree : 1; > unsigned int ignore_cmdline : 1; > unsigned int system_gently : 1; > - const char *commondir; > - const char *git_dir; > + struct repository *repo; > config_parser_event_fn_t event_fn; > void *event_fn_data; > enum config_error_action { > ==== >8 ==== Thanks a lot for this :) I was thinking of adding it as a preparatory patch before the fix itself. May I have your S-o-B as the author? Best, Matheus ^ permalink raw reply [flat|nested] 123+ messages in thread
* Re: [PATCH v5 6/8] config: correctly read worktree configs in submodules 2020-09-09 13:04 ` Matheus Tavares Bernardino @ 2020-09-09 23:32 ` Jonathan Nieder 0 siblings, 0 replies; 123+ messages in thread From: Jonathan Nieder @ 2020-09-09 23:32 UTC (permalink / raw) To: Matheus Tavares Bernardino Cc: git, Junio C Hamano, Derrick Stolee, Elijah Newren, Jonathan Tan Hi, Matheus Tavares Bernardino wrote: > Sorry for the late reply, last week was quite busy. No problem. It's an unusual time for everyone. [...] > On Wed, Sep 2, 2020 at 5:15 PM Jonathan Nieder <jrnieder@gmail.com> wrote: >> @@ -1824,27 +1824,33 @@ void read_early_config(config_fn_t cb, void *data) >> struct config_options opts = {0}; >> struct strbuf commondir = STRBUF_INIT; >> struct strbuf gitdir = STRBUF_INIT; >> + struct repository the_early_repo = {0}; >> >> opts.respect_includes = 1; >> >> if (have_git_dir()) { >> - opts.commondir = get_git_common_dir(); >> - opts.git_dir = get_git_dir(); >> + opts.repo = the_repository; > > I'm not very familiar with the code in setup.c so I apologize for the > noob question: have_git_dir() returns `startup_info->have_repository > || the_repository->gitdir`; so is it possible that it returns true but > the_repository->gitdir is not initialized yet? If so, should we also > check the_repository->gitdir here (before assigning opts.repo), and > call BUG() when it is NULL, like get_git_dir() does? > > Hmm, nevertheless, I see that you already check `opts.repo && > opts.repo->gitdir` before trying to use it in > do_git_config_sequence(). So it should already cover this case, right? Right --- the main point is that a BUG() call represents "this can't happen", or in other words, it's an assertion failure. As a matter of defensive coding functions like get_git_dir() guard against such cases to make debugging a little easier and exploitation a little more difficult when the impossible happens. [...] 
> Thanks a lot for this :) I was thinking of adding it as a preparatory > patch before the fix itself. May I have your S-o-B as the author? Sure! Signed-off-by: Jonathan Nieder <jrnieder@gmail.com> Thanks, Jonathan ^ permalink raw reply [flat|nested] 123+ messages in thread
* [PATCH v5 7/8] grep: honor sparse checkout patterns 2020-09-02 6:17 ` [PATCH v5 0/8] " Matheus Tavares ` (5 preceding siblings ...) 2020-09-02 6:17 ` [PATCH v5 6/8] config: correctly read worktree configs in submodules Matheus Tavares @ 2020-09-02 6:17 ` Matheus Tavares 2020-09-02 6:17 ` [PATCH v5 8/8] config: add setting to ignore sparsity patterns in some cmds Matheus Tavares 2020-09-10 17:21 ` [PATCH v6 0/9] grep: honor sparse checkout and add option to ignore it Matheus Tavares 8 siblings, 0 replies; 123+ messages in thread From: Matheus Tavares @ 2020-09-02 6:17 UTC (permalink / raw) To: git; +Cc: gitster, stolee, newren, jonathantanmy, jrnieder One of the main uses for a sparse checkout is to allow users to focus on the subset of files in a repository in which they are interested. But git-grep currently ignores the sparsity patterns and reports all matches found outside this subset, which kind of goes in the opposite direction. There are some use cases for ignoring the sparsity patterns and the next commit will add an option to obtain this behavior, but here we start by making grep honor the sparsity boundaries in every case where this is relevant: - git grep in worktree - git grep --cached - git grep $REVISION For the worktree and cached cases, we iterate over paths without the SKIP_WORKTREE bit set, and limit our searches to these paths. For the $REVISION case, we limit the paths we search to those that match the sparsity patterns. (We do not check the SKIP_WORKTREE bit for the $REVISION case, because $REVISION may contain paths that do not exist in HEAD and thus for which we have no SKIP_WORKTREE bit to consult. The sparsity patterns tell us how the SKIP_WORKTREE bit would be set if we were to check out $REVISION, so we consult those. 
Also, we don't use the sparsity patterns with the worktree or cached cases, both because we have a bit we can check directly and more efficiently, and because unmerged entries from a merge or a rebase could cause more files to temporarily be present than the sparsity patterns would normally select.) Note that there is a special case here: `git grep $TREE`. In this case, we cannot know whether $TREE corresponds to the root of the repository or some sub-tree, and thus there is no way for us to know which sparsity patterns, if any, apply. So the $TREE case will not use sparsity patterns or any SKIP_WORKTREE bits and will instead always search all files within the $TREE. Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br> --- builtin/grep.c | 125 ++++++++++++++++++-- t/t7011-skip-worktree-reading.sh | 9 -- t/t7817-grep-sparse-checkout.sh | 195 +++++++++++++++++++++++++++++++ 3 files changed, 312 insertions(+), 17 deletions(-) create mode 100755 t/t7817-grep-sparse-checkout.sh diff --git a/builtin/grep.c b/builtin/grep.c index f58979bc3f..a32815de0a 100644 --- a/builtin/grep.c +++ b/builtin/grep.c @@ -410,7 +410,7 @@ static int grep_cache(struct grep_opt *opt, const struct pathspec *pathspec, int cached); static int grep_tree(struct grep_opt *opt, const struct pathspec *pathspec, struct tree_desc *tree, struct strbuf *base, int tn_len, - int check_attr); + int is_root_tree); static int grep_submodule(struct grep_opt *opt, const struct pathspec *pathspec, @@ -508,6 +508,10 @@ static int grep_cache(struct grep_opt *opt, for (nr = 0; nr < repo->index->cache_nr; nr++) { const struct cache_entry *ce = repo->index->cache[nr]; + + if (ce_skip_worktree(ce)) + continue; + strbuf_setlen(&name, name_base_len); strbuf_addstr(&name, ce->name); @@ -520,8 +524,7 @@ static int grep_cache(struct grep_opt *opt, * cache entry are identical, even if worktree file has * been modified, so use cache version instead */ - if (cached || (ce->ce_flags & CE_VALID) || - ce_skip_worktree(ce)) 
{ + if (cached || (ce->ce_flags & CE_VALID)) { if (ce_stage(ce) || ce_intent_to_add(ce)) continue; hit |= grep_oid(opt, &ce->oid, name.buf, @@ -552,9 +555,76 @@ static int grep_cache(struct grep_opt *opt, return hit; } -static int grep_tree(struct grep_opt *opt, const struct pathspec *pathspec, - struct tree_desc *tree, struct strbuf *base, int tn_len, - int check_attr) +static struct pattern_list *get_sparsity_patterns(struct repository *repo) +{ + struct pattern_list *patterns; + char *sparse_file; + int sparse_config, cone_config; + + if (repo_config_get_bool(repo, "core.sparsecheckout", &sparse_config) || + !sparse_config) { + return NULL; + } + + sparse_file = repo_git_path(repo, "info/sparse-checkout"); + patterns = xcalloc(1, sizeof(*patterns)); + + if (repo_config_get_bool(repo, "core.sparsecheckoutcone", &cone_config)) + cone_config = 0; + patterns->use_cone_patterns = cone_config; + + if (add_patterns_from_file_to_list(sparse_file, "", 0, patterns, NULL)) { + if (file_exists(sparse_file)) { + warning(_("failed to load sparse-checkout file: '%s'"), + sparse_file); + } + free(sparse_file); + free(patterns); + return NULL; + } + + free(sparse_file); + return patterns; +} + +static int path_in_sparse_checkout(struct strbuf *path, int prefix_len, + unsigned int entry_mode, + struct index_state *istate, + struct pattern_list *sparsity, + enum pattern_match_result parent_match, + enum pattern_match_result *match) +{ + int dtype = DT_UNKNOWN; + int is_dir = S_ISDIR(entry_mode); + + if (parent_match == MATCHED_RECURSIVE) { + *match = parent_match; + return 1; + } + + if (is_dir && !is_dir_sep(path->buf[path->len - 1])) + strbuf_addch(path, '/'); + + *match = path_matches_pattern_list(path->buf, path->len, + path->buf + prefix_len, &dtype, + sparsity, istate); + if (*match == UNDECIDED) + *match = parent_match; + + if (is_dir) + strbuf_trim_trailing_dir_sep(path); + + if (*match == NOT_MATCHED && + (!is_dir || (is_dir && sparsity->use_cone_patterns))) + return 0; + 
+ return 1; +} + +static int do_grep_tree(struct grep_opt *opt, const struct pathspec *pathspec, + struct tree_desc *tree, struct strbuf *base, int tn_len, + int check_attr, struct pattern_list *sparsity, + enum pattern_match_result default_sparsity_match) { struct repository *repo = opt->repo; int hit = 0; @@ -570,6 +640,7 @@ static int grep_tree(struct grep_opt *opt, const struct pathspec *pathspec, while (tree_entry(tree, &entry)) { int te_len = tree_entry_len(&entry); + enum pattern_match_result sparsity_match = 0; if (match != all_entries_interesting) { strbuf_addstr(&name, base->buf + tn_len); @@ -586,6 +657,19 @@ static int grep_tree(struct grep_opt *opt, const struct pathspec *pathspec, strbuf_add(base, entry.path, te_len); + if (sparsity) { + struct strbuf path = STRBUF_INIT; + strbuf_addstr(&path, base->buf + tn_len); + + if (!path_in_sparse_checkout(&path, old_baselen - tn_len, + entry.mode, repo->index, + sparsity, default_sparsity_match, + &sparsity_match)) { + strbuf_setlen(base, old_baselen); + continue; + } + } + if (S_ISREG(entry.mode)) { hit |= grep_oid(opt, &entry.oid, base->buf, tn_len, check_attr ? base->buf + tn_len : NULL); @@ -602,8 +686,8 @@ static int grep_tree(struct grep_opt *opt, const struct pathspec *pathspec, strbuf_addch(base, '/'); init_tree_desc(&sub, data, size); - hit |= grep_tree(opt, pathspec, &sub, base, tn_len, - check_attr); + hit |= do_grep_tree(opt, pathspec, &sub, base, tn_len, + check_attr, sparsity, sparsity_match); free(data); } else if (recurse_submodules && S_ISGITLINK(entry.mode)) { hit |= grep_submodule(opt, pathspec, &entry.oid, @@ -621,6 +705,31 @@ static int grep_tree(struct grep_opt *opt, const struct pathspec *pathspec, return hit; } +/* + * Note: sparsity patterns and paths' attributes will only be considered if + * is_root_tree has true value. (Otherwise, we cannot properly perform pattern + * matching on paths.) 
+ */ +static int grep_tree(struct grep_opt *opt, const struct pathspec *pathspec, + struct tree_desc *tree, struct strbuf *base, int tn_len, + int is_root_tree) +{ + struct pattern_list *patterns = NULL; + int ret; + + if (is_root_tree) + patterns = get_sparsity_patterns(opt->repo); + + ret = do_grep_tree(opt, pathspec, tree, base, tn_len, is_root_tree, + patterns, 0); + + if (patterns) { + clear_pattern_list(patterns); + free(patterns); + } + return ret; +} + static int grep_object(struct grep_opt *opt, const struct pathspec *pathspec, struct object *obj, const char *name, const char *path) { diff --git a/t/t7011-skip-worktree-reading.sh b/t/t7011-skip-worktree-reading.sh index 37525cae3a..26852586ac 100755 --- a/t/t7011-skip-worktree-reading.sh +++ b/t/t7011-skip-worktree-reading.sh @@ -109,15 +109,6 @@ test_expect_success 'ls-files --modified' ' test -z "$(git ls-files -m)" ' -test_expect_success 'grep with skip-worktree file' ' - git update-index --no-skip-worktree 1 && - echo test > 1 && - git update-index 1 && - git update-index --skip-worktree 1 && - rm 1 && - test "$(git grep --no-ext-grep test)" = "1:test" -' - echo ":000000 100644 $ZERO_OID $EMPTY_BLOB A 1" > expected test_expect_success 'diff-index does not examine skip-worktree absent entries' ' setup_absent && diff --git a/t/t7817-grep-sparse-checkout.sh b/t/t7817-grep-sparse-checkout.sh new file mode 100755 index 0000000000..b3109e3479 --- /dev/null +++ b/t/t7817-grep-sparse-checkout.sh @@ -0,0 +1,195 @@ +#!/bin/sh + +test_description='grep in sparse checkout + +This test creates a repo with the following structure: + +. +|-- a +|-- b +|-- dir +| `-- c +|-- sub +| |-- A +| | `-- a +| `-- B +| `-- b +`-- sub2 + `-- a + +Where the outer repository has non-cone mode sparsity patterns, sub is a +submodule with cone mode sparsity patterns and sub2 is a submodule that is +excluded by the superproject sparsity patterns. 
The resulting sparse checkout +should leave the following structure in the working tree: + +. +|-- a +|-- sub +| `-- B +| `-- b +`-- sub2 + `-- a + +But note that sub2 should have the SKIP_WORKTREE bit set. +' + +. ./test-lib.sh + +test_expect_success 'setup' ' + echo "text" >a && + echo "text" >b && + mkdir dir && + echo "text" >dir/c && + + git init sub && + ( + cd sub && + mkdir A B && + echo "text" >A/a && + echo "text" >B/b && + git add A B && + git commit -m sub && + git sparse-checkout init --cone && + git sparse-checkout set B + ) && + + git init sub2 && + ( + cd sub2 && + echo "text" >a && + git add a && + git commit -m sub2 + ) && + + git submodule add ./sub && + git submodule add ./sub2 && + git add a b dir && + git commit -m super && + git sparse-checkout init --no-cone && + git sparse-checkout set "/*" "!b" "!/*/" "sub" && + + git tag -am tag-to-commit tag-to-commit HEAD && + tree=$(git rev-parse HEAD^{tree}) && + git tag -am tag-to-tree tag-to-tree $tree && + + test_path_is_missing b && + test_path_is_missing dir && + test_path_is_missing sub/A && + test_path_is_file a && + test_path_is_file sub/B/b && + test_path_is_file sub2/a +' + +# The test below checks a special case: the sparsity patterns exclude '/b' +# and sparse checkout is enabled, but the path exists in the working tree (e.g. +# manually created after `git sparse-checkout init`). In this case, grep should +# skip it. 
+test_expect_success 'grep in working tree should honor sparse checkout' ' + cat >expect <<-EOF && + a:text + EOF + echo "new-text" >b && + test_when_finished "rm b" && + git grep "text" >actual && + test_cmp expect actual +' + +test_expect_success 'grep unmerged file despite not matching sparsity patterns' ' + cat >expect <<-EOF && + b:modified-b-in-branchX + b:modified-b-in-branchY + EOF + test_when_finished "test_might_fail git merge --abort && \ + git checkout master" && + + git sparse-checkout disable && + git checkout -b branchY master && + test_commit modified-b-in-branchY b && + git checkout -b branchX master && + test_commit modified-b-in-branchX b && + + git sparse-checkout init && + test_path_is_missing b && + test_must_fail git merge branchY && + git grep "modified-b" >actual && + test_cmp expect actual +' + +test_expect_success 'grep --cached should honor sparse checkout' ' + cat >expect <<-EOF && + a:text + EOF + git grep --cached "text" >actual && + test_cmp expect actual +' + +test_expect_success 'grep <commit-ish> should honor sparse checkout' ' + commit=$(git rev-parse HEAD) && + cat >expect_commit <<-EOF && + $commit:a:text + EOF + cat >expect_tag-to-commit <<-EOF && + tag-to-commit:a:text + EOF + git grep "text" $commit >actual_commit && + test_cmp expect_commit actual_commit && + git grep "text" tag-to-commit >actual_tag-to-commit && + test_cmp expect_tag-to-commit actual_tag-to-commit +' + +test_expect_success 'grep <tree-ish> should ignore sparsity patterns' ' + commit=$(git rev-parse HEAD) && + tree=$(git rev-parse HEAD^{tree}) && + cat >expect_tree <<-EOF && + $tree:a:text + $tree:b:text + $tree:dir/c:text + EOF + cat >expect_tag-to-tree <<-EOF && + tag-to-tree:a:text + tag-to-tree:b:text + tag-to-tree:dir/c:text + EOF + git grep "text" $tree >actual_tree && + test_cmp expect_tree actual_tree && + git grep "text" tag-to-tree >actual_tag-to-tree && + test_cmp expect_tag-to-tree actual_tag-to-tree +' + +# Note that sub2/ is present in the 
worktree but it is excluded by the sparsity +# patterns, so grep should not recurse into it. +test_expect_success 'grep --recurse-submodules should honor sparse checkout in submodule' ' + cat >expect <<-EOF && + a:text + sub/B/b:text + EOF + git grep --recurse-submodules "text" >actual && + test_cmp expect actual +' + +test_expect_success 'grep --recurse-submodules --cached should honor sparse checkout in submodule' ' + cat >expect <<-EOF && + a:text + sub/B/b:text + EOF + git grep --recurse-submodules --cached "text" >actual && + test_cmp expect actual +' + +test_expect_success 'grep --recurse-submodules <commit-ish> should honor sparse checkout in submodule' ' + commit=$(git rev-parse HEAD) && + cat >expect_commit <<-EOF && + $commit:a:text + $commit:sub/B/b:text + EOF + cat >expect_tag-to-commit <<-EOF && + tag-to-commit:a:text + tag-to-commit:sub/B/b:text + EOF + git grep --recurse-submodules "text" $commit >actual_commit && + test_cmp expect_commit actual_commit && + git grep --recurse-submodules "text" tag-to-commit >actual_tag-to-commit && + test_cmp expect_tag-to-commit actual_tag-to-commit +' + +test_done -- 2.28.0 ^ permalink raw reply related [flat|nested] 123+ messages in thread
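Editorial sketch (not part of the thread): the visit-or-skip decision that path_in_sparse_checkout() makes in the patch above can be modeled on its own, roughly as below. This is a simplified stand-in under stated assumptions, not the code from the series: `visit_entry` and its parameters are hypothetical names, the real function calls path_matches_pattern_list() on a path while this model takes the pattern result as the `own` argument, and the enum values are assumed to mirror git's `enum pattern_match_result`.

```c
#include <assert.h>

/* Assumed to mirror git's enum pattern_match_result (values are an assumption). */
enum match_result {
	UNDECIDED = -1,
	NOT_MATCHED = 0,
	MATCHED = 1,
	MATCHED_RECURSIVE = 2
};

/*
 * Hypothetical model of the decision in path_in_sparse_checkout().
 *
 * parent:    result already computed for the containing directory
 * own:       what the sparsity patterns say about this entry by itself
 * is_dir:    non-zero if the entry is a tree
 * cone_mode: non-zero if cone-mode patterns are in use
 * result:    out parameter, the effective match handed down to children
 *
 * Returns 1 if the entry should be searched or descended into, 0 to skip.
 */
static int visit_entry(enum match_result parent, enum match_result own,
		       int is_dir, int cone_mode, enum match_result *result)
{
	assert(result != 0);

	/* Everything below a recursively matched directory is included. */
	if (parent == MATCHED_RECURSIVE) {
		*result = parent;
		return 1;
	}

	/* An entry the patterns say nothing about inherits from its parent. */
	*result = (own == UNDECIDED) ? parent : own;

	/*
	 * Excluded files are always skipped. Excluded directories are only
	 * skipped in cone mode; in non-cone mode a later pattern may still
	 * re-include something deeper, so the walk has to continue.
	 */
	if (*result == NOT_MATCHED && (!is_dir || cone_mode))
		return 0;

	return 1;
}
```

With this model, an excluded directory is still descended into under non-cone patterns (`visit_entry(NOT_MATCHED, NOT_MATCHED, 1, 0, &r)` returns 1) but is pruned in cone mode, which matches the condition `!is_dir || (is_dir && sparsity->use_cone_patterns)` in the diff.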
* [PATCH v5 8/8] config: add setting to ignore sparsity patterns in some cmds 2020-09-02 6:17 ` [PATCH v5 0/8] " Matheus Tavares ` (6 preceding siblings ...) 2020-09-02 6:17 ` [PATCH v5 7/8] grep: honor sparse checkout patterns Matheus Tavares @ 2020-09-02 6:17 ` Matheus Tavares 2020-09-10 17:21 ` [PATCH v6 0/9] grep: honor sparse checkout and add option to ignore it Matheus Tavares 8 siblings, 0 replies; 123+ messages in thread From: Matheus Tavares @ 2020-09-02 6:17 UTC (permalink / raw) To: git; +Cc: gitster, stolee, newren, jonathantanmy, jrnieder When sparse checkout is enabled, some users expect the output of certain commands (such as grep, diff, and log) to also be restricted to the sparsity patterns. This would allow them to effectively work only on the subset of files in which they are interested; and allow some commands to possibly perform better, by not considering uninteresting paths. For this reason, we taught grep to honor the sparsity patterns, in the previous patch. But, on the other hand, allowing grep and the other commands mentioned to optionally ignore the patterns also makes for some interesting use cases. E.g. using grep to search for a function's documentation that resides outside the sparse checkout. In any case, there is no current way for users to configure the behavior they want for these commands. Aiming to provide this flexibility, let's introduce the sparse.restrictCmds setting (and the analogous --[no-]restrict-to-sparse-paths global option). The default value is true. For now, grep is the only one affected by this setting, but the goal is to have support for more commands, in the future. 
Helped-by: Elijah Newren <newren@gmail.com> Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br> --- Documentation/config.txt | 2 + Documentation/config/grep.txt | 8 ++ Documentation/config/sparse.txt | 20 ++++ Documentation/git.txt | 4 + Makefile | 1 + builtin/grep.c | 13 ++- contrib/completion/git-completion.bash | 2 + git.c | 5 + sparse-checkout.c | 18 ++++ sparse-checkout.h | 11 +++ t/t7817-grep-sparse-checkout.sh | 132 ++++++++++++++++++++++++- t/t9902-completion.sh | 4 +- 12 files changed, 214 insertions(+), 6 deletions(-) create mode 100644 Documentation/config/sparse.txt create mode 100644 sparse-checkout.c create mode 100644 sparse-checkout.h diff --git a/Documentation/config.txt b/Documentation/config.txt index 3042d80978..3b6e0901b8 100644 --- a/Documentation/config.txt +++ b/Documentation/config.txt @@ -438,6 +438,8 @@ include::config/sequencer.txt[] include::config/showbranch.txt[] +include::config/sparse.txt[] + include::config/splitindex.txt[] include::config/ssh.txt[] diff --git a/Documentation/config/grep.txt b/Documentation/config/grep.txt index dd51db38e1..a3275ab4b7 100644 --- a/Documentation/config/grep.txt +++ b/Documentation/config/grep.txt @@ -28,3 +28,11 @@ grep.fullName:: grep.fallbackToNoIndex:: If set to true, fall back to git grep --no-index if git grep is executed outside of a git repository. Defaults to false. + +ifdef::git-grep[] +sparse.restrictCmds:: + See base definition in linkgit:git-config[1]. grep honors + sparse.restrictCmds by limiting searches to the sparsity paths in three + cases: when searching the working tree, when searching the index with + --cached, and when searching a specified commit. +endif::git-grep[] diff --git a/Documentation/config/sparse.txt b/Documentation/config/sparse.txt new file mode 100644 index 0000000000..494761526e --- /dev/null +++ b/Documentation/config/sparse.txt @@ -0,0 +1,20 @@ +sparse.restrictCmds:: + Only meaningful in conjunction with core.sparseCheckout. 
This option + extends sparse checkouts (which limit which paths are written to the + working tree), so that output and operations are also limited to the + sparsity paths where possible and implemented. The purpose of this + option is to (1) focus output for the user on the portion of the + repository that is of interest to them, and (2) enable potentially + dramatic performance improvements, especially in conjunction with + partial clones. ++ +When this option is true (default), some git commands may limit their behavior +to the paths specified by the sparsity patterns, or to the intersection of +those paths and any (like `*.c`) that the user might also specify on the +command line. When false, the affected commands will work on full trees, +ignoring the sparsity patterns. For now, only git-grep honors this setting. ++ +Note: commands which export, integrity check, or create history will always +operate on full trees (e.g. fast-export, format-patch, fsck, commit, etc.), +unaffected by any sparsity patterns. Also, writing commands such as +sparse-checkout and read-tree will not be affected by this configuration. diff --git a/Documentation/git.txt b/Documentation/git.txt index 2fc92586b5..d857509573 100644 --- a/Documentation/git.txt +++ b/Documentation/git.txt @@ -180,6 +180,10 @@ If you just want to run git as if it was started in `<path>` then use Do not perform optional operations that require locks. This is equivalent to setting the `GIT_OPTIONAL_LOCKS` to `0`. +--[no-]restrict-to-sparse-paths:: + Overrides the sparse.restrictCmds configuration (see + linkgit:git-config[1]) for this execution. + --list-cmds=group[,group...]:: List commands by group. This is an internal/experimental option and may change or be removed in the future. 
Supported diff --git a/Makefile b/Makefile index 65f8cfb236..778c9e499e 100644 --- a/Makefile +++ b/Makefile @@ -982,6 +982,7 @@ LIB_OBJS += sha1-name.o LIB_OBJS += shallow.o LIB_OBJS += sideband.o LIB_OBJS += sigchain.o +LIB_OBJS += sparse-checkout.o LIB_OBJS += split-index.o LIB_OBJS += stable-qsort.o LIB_OBJS += strbuf.o diff --git a/builtin/grep.c b/builtin/grep.c index a32815de0a..3fa364e91c 100644 --- a/builtin/grep.c +++ b/builtin/grep.c @@ -25,6 +25,7 @@ #include "submodule-config.h" #include "object-store.h" #include "packfile.h" +#include "sparse-checkout.h" static char const * const grep_usage[] = { N_("git grep [<options>] [-e] <pattern> [<rev>...] [[--] <path>...]"), @@ -498,6 +499,7 @@ static int grep_cache(struct grep_opt *opt, int nr; struct strbuf name = STRBUF_INIT; int name_base_len = 0; + int sparse_paths_only = restrict_to_sparse_paths(repo); if (repo->submodule_prefix) { name_base_len = strlen(repo->submodule_prefix); strbuf_addstr(&name, repo->submodule_prefix); @@ -509,7 +511,7 @@ static int grep_cache(struct grep_opt *opt, for (nr = 0; nr < repo->index->cache_nr; nr++) { const struct cache_entry *ce = repo->index->cache[nr]; - if (ce_skip_worktree(ce)) + if (sparse_paths_only && ce_skip_worktree(ce)) continue; strbuf_setlen(&name, name_base_len); @@ -715,9 +717,10 @@ static int grep_tree(struct grep_opt *opt, const struct pathspec *pathspec, int is_root_tree) { struct pattern_list *patterns = NULL; + int sparse_paths_only = restrict_to_sparse_paths(opt->repo); int ret; - if (is_root_tree) + if (is_root_tree && sparse_paths_only) patterns = get_sparsity_patterns(opt->repo); ret = do_grep_tree(opt, pathspec, tree, base, tn_len, is_root_tree, @@ -1258,6 +1261,12 @@ int cmd_grep(int argc, const char **argv, const char *prefix) if (!use_index || untracked) { int use_exclude = (opt_exclude < 0) ? 
use_index : !!opt_exclude; + + if (opt_restrict_to_sparse_paths >= 0) { + die(_("--[no-]restrict-to-sparse-paths is incompatible" + " with --no-index and --untracked")); + } + hit = grep_directory(&opt, &pathspec, use_exclude, use_index); } else if (0 <= opt_exclude) { die(_("--[no-]exclude-standard cannot be used for tracked contents")); diff --git a/contrib/completion/git-completion.bash b/contrib/completion/git-completion.bash index 9147fba3d5..de12766a70 100644 --- a/contrib/completion/git-completion.bash +++ b/contrib/completion/git-completion.bash @@ -3402,6 +3402,8 @@ __git_main () --namespace= --no-replace-objects --help + --restrict-to-sparse-paths + --no-restrict-to-sparse-paths " ;; *) diff --git a/git.c b/git.c index 8bd1d7551d..81206b424c 100644 --- a/git.c +++ b/git.c @@ -5,6 +5,7 @@ #include "run-command.h" #include "alias.h" #include "shallow.h" +#include "sparse-checkout.h" #define RUN_SETUP (1<<0) #define RUN_SETUP_GENTLY (1<<1) @@ -311,6 +312,10 @@ static int handle_options(const char ***argv, int *argc, int *envchanged) } else { exit(list_cmds(cmd)); } + } else if (!strcmp(cmd, "--restrict-to-sparse-paths")) { + opt_restrict_to_sparse_paths = 1; + } else if (!strcmp(cmd, "--no-restrict-to-sparse-paths")) { + opt_restrict_to_sparse_paths = 0; } else { fprintf(stderr, _("unknown option: %s\n"), cmd); usage(git_usage_string); diff --git a/sparse-checkout.c b/sparse-checkout.c new file mode 100644 index 0000000000..96c5ed5446 --- /dev/null +++ b/sparse-checkout.c @@ -0,0 +1,18 @@ +#include "cache.h" +#include "config.h" +#include "sparse-checkout.h" + +int opt_restrict_to_sparse_paths = -1; + +int restrict_to_sparse_paths(struct repository *repo) +{ + int ret; + + if (opt_restrict_to_sparse_paths >= 0) + return opt_restrict_to_sparse_paths; + + if (repo_config_get_bool(repo, "sparse.restrictcmds", &ret)) + ret = 1; + + return ret; +} diff --git a/sparse-checkout.h b/sparse-checkout.h new file mode 100644 index 0000000000..a4805e443a --- /dev/null 
+++ b/sparse-checkout.h @@ -0,0 +1,11 @@ +#ifndef SPARSE_CHECKOUT_H +#define SPARSE_CHECKOUT_H + +struct repository; + +extern int opt_restrict_to_sparse_paths; + +/* Whether or not cmds should restrict behavior on sparse paths, in this repo */ +int restrict_to_sparse_paths(struct repository *repo); + +#endif /* SPARSE_CHECKOUT_H */ diff --git a/t/t7817-grep-sparse-checkout.sh b/t/t7817-grep-sparse-checkout.sh index b3109e3479..f93a4f71d1 100755 --- a/t/t7817-grep-sparse-checkout.sh +++ b/t/t7817-grep-sparse-checkout.sh @@ -80,10 +80,10 @@ test_expect_success 'setup' ' test_path_is_file sub2/a ' -# The test below checks a special case: the sparsity patterns exclude '/b' +# The two tests below check a special case: the sparsity patterns exclude '/b' # and sparse checkout is enabled, but the path exists in the working tree (e.g. # manually created after `git sparse-checkout init`). In this case, grep should -# skip it. +# skip the file by default, but not with --no-restrict-to-sparse-paths. test_expect_success 'grep in working tree should honor sparse checkout' ' cat >expect <<-EOF && a:text @@ -93,6 +93,16 @@ test_expect_success 'grep in working tree should honor sparse checkout' ' git grep "text" >actual && test_cmp expect actual ' +test_expect_success 'grep w/ --no-restrict-to-sparse-paths for sparsely excluded but present paths' ' + cat >expect <<-EOF && + a:text + b:new-text + EOF + echo "new-text" >b && + test_when_finished "rm b" && + git --no-restrict-to-sparse-paths grep "text" >actual && + test_cmp expect actual +' test_expect_success 'grep unmerged file despite not matching sparsity patterns' ' cat >expect <<-EOF && @@ -157,7 +167,7 @@ test_expect_success 'grep <tree-ish> should ignore sparsity patterns' ' ' # Note that sub2/ is present in the worktree but it is excluded by the sparsity -# patterns, so grep should not recurse into it. +# patterns, so grep should only recurse into it with --no-restrict-to-sparse-paths. 
test_expect_success 'grep --recurse-submodules should honor sparse checkout in submodule' ' cat >expect <<-EOF && a:text @@ -166,6 +176,15 @@ test_expect_success 'grep --recurse-submodules should honor sparse checkout in s git grep --recurse-submodules "text" >actual && test_cmp expect actual ' +test_expect_success 'grep --recurse-submodules should search in excluded submodules w/ --no-restrict-to-sparse-paths' ' + cat >expect <<-EOF && + a:text + sub/B/b:text + sub2/a:text + EOF + git --no-restrict-to-sparse-paths grep --recurse-submodules "text" >actual && + test_cmp expect actual +' test_expect_success 'grep --recurse-submodules --cached should honor sparse checkout in submodule' ' cat >expect <<-EOF && @@ -192,4 +211,111 @@ test_expect_success 'grep --recurse-submodules <commit-ish> should honor sparse test_cmp expect_tag-to-commit actual_tag-to-commit ' +for cmd in 'git --no-restrict-to-sparse-paths grep' \ + 'git -c sparse.restrictCmds=false grep' \ + 'git -c sparse.restrictCmds=true --no-restrict-to-sparse-paths grep' +do + + test_expect_success "$cmd --cached should ignore sparsity patterns" ' + cat >expect <<-EOF && + a:text + b:text + dir/c:text + EOF + $cmd --cached "text" >actual && + test_cmp expect actual + ' + + test_expect_success "$cmd <commit-ish> should ignore sparsity patterns" ' + commit=$(git rev-parse HEAD) && + cat >expect_commit <<-EOF && + $commit:a:text + $commit:b:text + $commit:dir/c:text + EOF + cat >expect_tag-to-commit <<-EOF && + tag-to-commit:a:text + tag-to-commit:b:text + tag-to-commit:dir/c:text + EOF + $cmd "text" $commit >actual_commit && + test_cmp expect_commit actual_commit && + $cmd "text" tag-to-commit >actual_tag-to-commit && + test_cmp expect_tag-to-commit actual_tag-to-commit + ' +done + +test_expect_success 'grep --recurse-submodules --cached w/ --no-restrict-to-sparse-paths' ' + cat >expect <<-EOF && + a:text + b:text + dir/c:text + sub/A/a:text + sub/B/b:text + sub2/a:text + EOF + git --no-restrict-to-sparse-paths 
grep --recurse-submodules --cached \ + "text" >actual && + test_cmp expect actual +' + +test_expect_success 'grep --recurse-submodules <commit-ish> w/ --no-restrict-to-sparse-paths' ' + commit=$(git rev-parse HEAD) && + cat >expect_commit <<-EOF && + $commit:a:text + $commit:b:text + $commit:dir/c:text + $commit:sub/A/a:text + $commit:sub/B/b:text + $commit:sub2/a:text + EOF + cat >expect_tag-to-commit <<-EOF && + tag-to-commit:a:text + tag-to-commit:b:text + tag-to-commit:dir/c:text + tag-to-commit:sub/A/a:text + tag-to-commit:sub/B/b:text + tag-to-commit:sub2/a:text + EOF + git --no-restrict-to-sparse-paths grep --recurse-submodules "text" \ + $commit >actual_commit && + test_cmp expect_commit actual_commit && + git --no-restrict-to-sparse-paths grep --recurse-submodules "text" \ + tag-to-commit >actual_tag-to-commit && + test_cmp expect_tag-to-commit actual_tag-to-commit +' + +test_expect_success 'should respect the sparse.restrictCmds values from submodules' ' + cat >expect <<-EOF && + a:text + sub/A/a:text + sub/B/b:text + EOF + test_config -C sub sparse.restrictCmds false && + git grep --cached --recurse-submodules "text" >actual && + test_cmp expect actual +' + +test_expect_success 'should propagate --[no]-restrict-to-sparse-paths to submodules' ' + cat >expect <<-EOF && + a:text + b:text + dir/c:text + sub/A/a:text + sub/B/b:text + sub2/a:text + EOF + test_config -C sub sparse.restrictCmds true && + git --no-restrict-to-sparse-paths grep --cached --recurse-submodules "text" >actual && + test_cmp expect actual +' + +for opt in '--untracked' '--no-index' +do + test_expect_success "--[no]-restrict-to-sparse-paths and $opt are incompatible" " + test_must_fail git --restrict-to-sparse-paths grep $opt . 
2>actual && + test_i18ngrep 'restrict-to-sparse-paths is incompatible with' actual + " +done + test_done diff --git a/t/t9902-completion.sh b/t/t9902-completion.sh index 8425b9a531..a8c2ac9d70 100755 --- a/t/t9902-completion.sh +++ b/t/t9902-completion.sh @@ -1928,6 +1928,8 @@ test_expect_success 'double dash "git" itself' ' --namespace= --no-replace-objects Z --help Z + --restrict-to-sparse-paths Z + --no-restrict-to-sparse-paths Z EOF ' @@ -1970,7 +1972,7 @@ test_expect_success 'general options' ' test_completion "git --nam" "--namespace=" && test_completion "git --bar" "--bare " && test_completion "git --inf" "--info-path " && - test_completion "git --no-r" "--no-replace-objects " + test_completion "git --no-rep" "--no-replace-objects " ' test_expect_success 'general options plus command' ' -- 2.28.0 ^ permalink raw reply related [flat|nested] 123+ messages in thread
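The precedence implemented by restrict_to_sparse_paths() in the patch above (command-line option first, then sparse.restrictCmds, then a default of "restrict") can be sketched in isolation. This is only an illustration, not the code from the series: resolve_restrict() and its tri-state config_value parameter are hypothetical stand-ins for the real repo_config_get_bool() lookup.

```c
#include <assert.h>

/*
 * Hypothetical, self-contained model of the lookup order.
 * Both inputs are tri-state: -1 = unset, 0 = false, 1 = true.
 *   cli_opt      models opt_restrict_to_sparse_paths
 *   config_value models the result of reading sparse.restrictCmds
 */
static int resolve_restrict(int cli_opt, int config_value)
{
	assert(cli_opt >= -1 && cli_opt <= 1);
	assert(config_value >= -1 && config_value <= 1);

	/* An explicit --[no-]restrict-to-sparse-paths always wins. */
	if (cli_opt >= 0)
		return cli_opt;

	/* Otherwise use the config value, if any was set. */
	if (config_value >= 0)
		return config_value;

	/* Nothing set: restrict to the sparse paths by default. */
	return 1;
}
```

With this ordering, `git -c sparse.restrictCmds=true --no-restrict-to-sparse-paths grep` behaves like the unrestricted case, which is what the loop in t7817 exercises.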
* [PATCH v6 0/9] grep: honor sparse checkout and add option to ignore it 2020-09-02 6:17 ` [PATCH v5 0/8] " Matheus Tavares ` (7 preceding siblings ...) 2020-09-02 6:17 ` [PATCH v5 8/8] config: add setting to ignore sparsity patterns in some cmds Matheus Tavares @ 2020-09-10 17:21 ` Matheus Tavares 2020-09-10 17:21 ` [PATCH v6 1/9] doc: grep: unify info on configuration variables Matheus Tavares ` (9 more replies) 8 siblings, 10 replies; 123+ messages in thread
From: Matheus Tavares @ 2020-09-10 17:21 UTC (permalink / raw)
To: git; +Cc: gitster, stolee, newren, jonathantanmy, jrnieder, sunshine

This series makes git-grep restrict its output to the sparsity patterns when requested by the user. A new global option is added to control this behavior in grep and hopefully more commands in the future. There is also a fix in config.c, to correctly read worktree-specific settings on submodules.

Changes since v5:

Patch 2:
- Avoid complex '\'quoting\'' by using '.', as the string is going to be used in a grep search.
- Split the check_config auxiliary function into two, to remove some unnecessary logic and future-proof them against wrong usages of test-config.

Patch 4: reword commit message to focus on correctly diagnosing missing arguments.

Patch 5: don't replace explicit `return 0` with `return ret`.

Added patch 6, passing a `struct repository` to do_git_config_sequence().

Patch 6:
- Removed global `repository_format_worktree_config` and cached extensions.worktreeConfig value in `struct repository`, to avoid repeating repository format discovery.
- Improved commit message as suggested by Jonathan.
- Use usage_with_options() Jonathan Nieder (1): config: make do_git_config_sequence receive a 'struct repository' Matheus Tavares (8): doc: grep: unify info on configuration variables t1308-config-set: avoid false positives when using test-config t/helper/test-config: be consistent with exit codes t/helper/test-config: diagnose missing arguments t/helper/test-config: unify exit labels config: correctly read worktree configs in submodules grep: honor sparse checkout patterns config: add setting to ignore sparsity patterns in some cmds Documentation/config.txt | 2 + Documentation/config/grep.txt | 18 +- Documentation/config/sparse.txt | 20 ++ Documentation/git-grep.txt | 36 +-- Documentation/git.txt | 4 + Makefile | 1 + builtin/config.c | 8 +- builtin/grep.c | 134 ++++++++++- cache.h | 1 - config.c | 47 ++-- config.h | 4 +- contrib/completion/git-completion.bash | 2 + environment.c | 1 - git.c | 5 + repository.c | 1 + repository.h | 1 + setup.c | 4 +- sparse-checkout.c | 18 ++ sparse-checkout.h | 11 + t/helper/test-config.c | 124 ++++++---- t/t1308-config-set.sh | 28 +-- t/t2404-worktree-config.sh | 16 ++ t/t7011-skip-worktree-reading.sh | 9 - t/t7817-grep-sparse-checkout.sh | 321 +++++++++++++++++++++++++ t/t9902-completion.sh | 4 +- 25 files changed, 682 insertions(+), 138 deletions(-) create mode 100644 Documentation/config/sparse.txt create mode 100644 sparse-checkout.c create mode 100644 sparse-checkout.h create mode 100755 t/t7817-grep-sparse-checkout.sh Range-diff against v5: 1: 70c9a4e741 = 1: 70c9a4e741 doc: grep: unify info on configuration variables 2: f53782f14c < -: ---------- t1308-config-set: avoid false positives when using test-config -: ---------- > 2: 3c2d722152 t1308-config-set: avoid false positives when using test-config 3: 85e1588d6c = 3: 45d13744b7 t/helper/test-config: be consistent with exit codes 4: 0750191342 ! 
4: 51656e43c3 t/helper/test-config: check argc before accessing argv @@ Metadata Author: Matheus Tavares <matheus.bernardino@usp.br> ## Commit message ## - t/helper/test-config: check argc before accessing argv + t/helper/test-config: diagnose missing arguments - Check that we have the expected argc in 'configset_get_value' and - 'configset_get_value_multi' before trying to access argv elements. + test-config verifies that the correct number of arguments was given for + all of its commands except for 'configset_get_value' and + 'configset_get_value_multi'. Add the check to these two, so that we + properly report missing arguments and prevent out-of-bounds access to + argv[]. Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br> 5: 56535b0e36 ! 5: 924d2e8ceb t/helper/test-config: unify exit labels @@ t/helper/test-config.c: static int early_config_cb(const char *var, const char * const char *v; const struct string_list *strptr; struct config_set cs; - - if (argc == 3 && !strcmp(argv[1], "read_early_config")) { - read_early_config(early_config_cb, (void *)argv[2]); -- return 0; -+ return ret; - } - - setup_git_directory(); @@ t/helper/test-config.c: int cmd__config(int argc, const char **argv) printf("(NULL)\n"); else -: ---------- > 6: f5c0fc3336 config: make do_git_config_sequence receive a 'struct repository' 6: 3e02e1bd24 ! 7: 3a28b8e608 config: correctly read worktree configs in submodules @@ Commit message config: correctly read worktree configs in submodules The config machinery is not able to read worktree configs from a - submodule in a process where the_repository represents the superproject. - Furthermore, when extensions.worktreeConfig is set on the superproject, - querying for a worktree config in a submodule will, instead, return - the value set at the superproject. - - The problem resides in do_git_config_sequence(). 
Although the function - receives a git_dir string, it uses the_repository->git_dir when making - the path to the worktree config file. And when checking if - extensions.worktreeConfig is set, it uses the global - repository_format_worktree_config variable, which refers to - the_repository only. So let's fix this by using the git_dir given to the - function and reading the extension value from the right place. Also add - a test to avoid any regressions. + submodule in a process where the_repository represents the superproject + and extensions.worktreeConfig is not set there. Furthermore, when + extensions.worktreeConfig is set on the superproject, querying for a + worktree config in a submodule will, instead, return the value set at + the superproject. + + The relevant code is in do_git_config_sequence(). Although it is + designed to act on an arbitrary repository, specified in the passed-in + `struct config_options`, it accidentally depends on the_repository in + two places: + + - it reads the global variable `repository_format_worktree_config`, + which is set based on the content of the_repository, to determine + whether extensions.worktreeConfig is set. + + - it uses the git_pathdup() helper to find the config.worktree file, + instead of making a path using the passed-in repository. + + Sever these dependencies and add a regression test. Also, to avoid + future misuses of `repository_format_worktree_config` like this one, + remove this global variable and store the config value on + `struct repository` itself. 
Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br> + ## builtin/config.c ## +@@ builtin/config.c: int cmd_config(int argc, const char **argv, const char *prefix) + given_config_source.scope = CONFIG_SCOPE_LOCAL; + } else if (use_worktree_config) { + struct worktree **worktrees = get_worktrees(); +- if (repository_format_worktree_config) ++ if (!nongit && the_repository->worktree_config_extension) + given_config_source.file = git_pathdup("config.worktree"); + else if (worktrees[0] && worktrees[1]) + die(_("--worktree cannot be used with multiple " + + ## cache.h ## +@@ cache.h: extern int grafts_replace_parents; + #define GIT_REPO_VERSION 0 + #define GIT_REPO_VERSION_READ 1 + extern int repository_format_precious_objects; +-extern int repository_format_worktree_config; + + /* + * You _have_ to initialize a `struct repository_format` using + ## config.c ## @@ config.c: static int do_git_config_sequence(const struct config_options *opts, ret += git_config_from_file(fn, repo_config, data); @@ config.c: static int do_git_config_sequence(const struct config_options *opts, - if (!access_or_die(path, R_OK, 0)) - ret += git_config_from_file(fn, path, data); - free(path); -+ if (!opts->ignore_worktree && repo_config && opts->git_dir) { -+ struct repository_format repo_fmt = REPOSITORY_FORMAT_INIT; -+ struct strbuf buf = STRBUF_INIT; -+ -+ read_repository_format(&repo_fmt, repo_config); -+ -+ if (!verify_repository_format(&repo_fmt, &buf) && -+ repo_fmt.worktree_config) { -+ char *path = mkpathdup("%s/config.worktree", opts->git_dir); -+ if (!access_or_die(path, R_OK, 0)) -+ ret += git_config_from_file(fn, path, data); -+ free(path); -+ } -+ -+ strbuf_release(&buf); -+ clear_repository_format(&repo_fmt); ++ if (!opts->ignore_worktree && opts->repo && opts->repo->gitdir && ++ opts->repo->worktree_config_extension) { ++ struct strbuf path = STRBUF_INIT; ++ strbuf_repo_git_path(&path, opts->repo, "config.worktree"); ++ if (!access_or_die(path.buf, R_OK, 0)) ++ ret += 
git_config_from_file(fn, path.buf, data); ++ strbuf_release(&path); } current_parsing_scope = CONFIG_SCOPE_COMMAND; + ## environment.c ## +@@ environment.c: int warn_ambiguous_refs = 1; + int warn_on_object_refname_ambiguity = 1; + int ref_paranoia = -1; + int repository_format_precious_objects; +-int repository_format_worktree_config; + const char *git_commit_encoding; + const char *git_log_output_encoding; + char *apply_default_whitespace; + + ## repository.c ## +@@ repository.c: int repo_init(struct repository *repo, + goto error; + + repo_set_hash_algo(repo, format.hash_algo); ++ repo->worktree_config_extension = format.worktree_config; + + if (worktree) + repo_set_worktree(repo, worktree); + + ## repository.h ## +@@ repository.h: struct repository { + + /* Indicate if a repository has a different 'commondir' from 'gitdir' */ + unsigned different_commondir:1; ++ unsigned worktree_config_extension:1; + }; + + extern struct repository *the_repository; + + ## setup.c ## +@@ setup.c: static int check_repository_format_gently(const char *gitdir, struct repository_ + + repository_format_precious_objects = candidate->precious_objects; + set_repository_format_partial_clone(candidate->partial_clone); +- repository_format_worktree_config = candidate->worktree_config; ++ the_repository->worktree_config_extension = candidate->worktree_config; + string_list_clear(&candidate->unknown_extensions, 0); + string_list_clear(&candidate->v1_only_extensions, 0); + +- if (repository_format_worktree_config) { ++ if (the_repository->worktree_config_extension) { + /* + * pick up core.bare and core.worktree from per-worktree + * config if present + ## t/helper/test-config.c ## @@ #include "cache.h" @@ t/helper/test-config.c: static int early_config_cb(const char *var, const char * + argc = parse_options(argc, argv, NULL, options, test_config_usage, + PARSE_OPT_KEEP_ARGV0 | PARSE_OPT_STOP_AT_NON_OPTION); + if (argc < 2) -+ die("Please, provide a command name on the command-line"); ++ 
usage_with_options(test_config_usage, options); if (argc == 3 && !strcmp(argv[1], "read_early_config")) { + if (subrepo_path) + die("cannot use --submodule with read_early_config"); read_early_config(early_config_cb, (void *)argv[2]); - return ret; + return 0; } @@ t/helper/test-config.c: int cmd__config(int argc, const char **argv) 7: 902556a7b6 = 8: 2fc889c9c2 grep: honor sparse checkout patterns 8: 70e7d7b90c = 9: 92bc5351cf config: add setting to ignore sparsity patterns in some cmds -- 2.28.0 ^ permalink raw reply [flat|nested] 123+ messages in thread
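The range-diff for patch 7 above moves the extensions.worktreeConfig flag from a global variable onto `struct repository`, so the check before reading config.worktree depends only on the repository being queried. A reduced sketch of that condition follows; struct repo_stub and should_read_worktree_config() are hypothetical names used for illustration, and only the shape of the test is taken from the patch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the two fields of struct repository used here. */
struct repo_stub {
	const char *gitdir;
	unsigned worktree_config_extension:1;
};

/*
 * Mirrors the shape of the condition in the reworked
 * do_git_config_sequence(): every input comes from the repository
 * passed in, never from a process-wide global.
 */
static int should_read_worktree_config(const struct repo_stub *repo,
				       int ignore_worktree)
{
	assert(ignore_worktree == 0 || ignore_worktree == 1);

	return !ignore_worktree && repo && repo->gitdir &&
	       repo->worktree_config_extension;
}
```

Because the flag is per repository, a submodule with the extension enabled is handled correctly even when the superproject does not set it, and the reverse holds as well.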
* [PATCH v6 1/9] doc: grep: unify info on configuration variables 2020-09-10 17:21 ` [PATCH v6 0/9] grep: honor sparse checkout and add option to ignore it Matheus Tavares @ 2020-09-10 17:21 ` Matheus Tavares 2020-09-10 17:21 ` [PATCH v6 2/9] t1308-config-set: avoid false positives when using test-config Matheus Tavares ` (8 subsequent siblings) 9 siblings, 0 replies; 123+ messages in thread From: Matheus Tavares @ 2020-09-10 17:21 UTC (permalink / raw) To: git; +Cc: gitster, stolee, newren, jonathantanmy, jrnieder, sunshine Explanations about the configuration variables for git-grep are duplicated in "Documentation/git-grep.txt" and "Documentation/config/grep.txt", which can make maintenance difficult. The first also contains a definition not present in the latter (grep.fullName). To avoid problems like this, let's unify the information in the second file and include it in the first. Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br> --- Documentation/config/grep.txt | 10 ++++++++-- Documentation/git-grep.txt | 36 ++++++----------------------------- 2 files changed, 14 insertions(+), 32 deletions(-) diff --git a/Documentation/config/grep.txt b/Documentation/config/grep.txt index 44abe45a7c..dd51db38e1 100644 --- a/Documentation/config/grep.txt +++ b/Documentation/config/grep.txt @@ -16,8 +16,14 @@ grep.extendedRegexp:: other than 'default'. grep.threads:: - Number of grep worker threads to use. - See `grep.threads` in linkgit:git-grep[1] for more information. + Number of grep worker threads to use. See `--threads` +ifndef::git-grep[] + in linkgit:git-grep[1] +endif::git-grep[] + for more information. + +grep.fullName:: + If set to true, enable `--full-name` option by default. 
grep.fallbackToNoIndex:: If set to true, fall back to git grep --no-index if git grep diff --git a/Documentation/git-grep.txt b/Documentation/git-grep.txt index a7f9bc99ea..9bdf807584 100644 --- a/Documentation/git-grep.txt +++ b/Documentation/git-grep.txt @@ -41,34 +41,8 @@ characters. An empty string as search expression matches all lines. CONFIGURATION ------------- -grep.lineNumber:: - If set to true, enable `-n` option by default. - -grep.column:: - If set to true, enable the `--column` option by default. - -grep.patternType:: - Set the default matching behavior. Using a value of 'basic', 'extended', - 'fixed', or 'perl' will enable the `--basic-regexp`, `--extended-regexp`, - `--fixed-strings`, or `--perl-regexp` option accordingly, while the - value 'default' will return to the default matching behavior. - -grep.extendedRegexp:: - If set to true, enable `--extended-regexp` option by default. This - option is ignored when the `grep.patternType` option is set to a value - other than 'default'. - -grep.threads:: - Number of grep worker threads to use. If unset (or set to 0), Git will - use as many threads as the number of logical cores available. - -grep.fullName:: - If set to true, enable `--full-name` option by default. - -grep.fallbackToNoIndex:: - If set to true, fall back to git grep --no-index if git grep - is executed outside of a git repository. Defaults to false. - +:git-grep: 1 +include::config/grep.txt[] OPTIONS ------- @@ -269,8 +243,10 @@ providing this option will cause it to die. found. --threads <num>:: - Number of grep worker threads to use. - See `grep.threads` in 'CONFIGURATION' for more information. + Number of grep worker threads to use. If not provided (or set to + 0), Git will use as many worker threads as the number of logical + cores available. The default value can also be set with the + `grep.threads` configuration. -f <file>:: Read patterns from <file>, one per line. 
-- 2.28.0 ^ permalink raw reply related [flat|nested] 123+ messages in thread
* [PATCH v6 2/9] t1308-config-set: avoid false positives when using test-config 2020-09-10 17:21 ` [PATCH v6 0/9] grep: honor sparse checkout and add option to ignore it Matheus Tavares 2020-09-10 17:21 ` [PATCH v6 1/9] doc: grep: unify info on configuration variables Matheus Tavares @ 2020-09-10 17:21 ` Matheus Tavares 2020-09-10 17:21 ` [PATCH v6 3/9] t/helper/test-config: be consistent with exit codes Matheus Tavares ` (7 subsequent siblings) 9 siblings, 0 replies; 123+ messages in thread From: Matheus Tavares @ 2020-09-10 17:21 UTC (permalink / raw) To: git; +Cc: gitster, stolee, newren, jonathantanmy, jrnieder, sunshine One test in t1308 expects test-config to fail with exit code 128 due to a parsing error in the config machinery. But test-config might also exit with 128 for any other reason that leads it to call die(). Therefore the test can potentially succeed for the wrong reason. To avoid false positives, let's check test-config's stderr, in addition to the exit code, and make sure that the cause of the error is the one we expect in this test. Moreover, the test was using the auxiliary function check_config, which optionally takes a number to compare with test-config's exit code, and a string to compare with its stdout. Because the function does not check stderr, it can induce improper uses, like the one corrected in this patch. To avoid this, remove the optional expect_code parameter, disallowing tests that expect an error from test-config to use this helper function. There is one error, though, which is printed to stdout despite returning a non-zero code: "value not found" (exit code 1). For this one, let's add another function which properly checks stdout and the code. 
Helped-by: Eric Sunshine <sunshine@sunshineco.com> Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br> --- t/t1308-config-set.sh | 28 +++++++++++++--------------- 1 file changed, 13 insertions(+), 15 deletions(-) diff --git a/t/t1308-config-set.sh b/t/t1308-config-set.sh index 3a527e3a84..66c6363080 100755 --- a/t/t1308-config-set.sh +++ b/t/t1308-config-set.sh @@ -7,18 +7,15 @@ test_description='Test git config-set API in different settings' # 'check_config get_* section.key value' verifies that the entry for # section.key is 'value' check_config () { - if test "$1" = expect_code - then - expect_code="$2" && shift && shift - else - expect_code=0 - fi && - op=$1 key=$2 && shift && shift && - if test $# != 0 - then - printf "%s\n" "$@" - fi >expect && - test_expect_code $expect_code test-tool config "$op" "$key" >actual && + test-tool config "$1" "$2" >actual && + shift && shift && + printf "%s\n" "$@" >expect && + test_cmp expect actual +} + +check_not_found () { + test_expect_code 1 test-tool config "$1" "$2" >actual && + echo "Value not found for \"$2\"" >expect && test_cmp expect actual } @@ -108,7 +105,7 @@ test_expect_success 'key with case insensitive section header & variable' ' ' test_expect_success 'find value with misspelled key' ' - check_config expect_code 1 get_value "my.fOo Bar.hi" "Value not found for \"my.fOo Bar.hi\"" + check_not_found get_value "my.fOo Bar.hi" ' test_expect_success 'find value with the highest priority' ' @@ -121,7 +118,7 @@ test_expect_success 'find integer value for a key' ' test_expect_success 'find string value for a key' ' check_config get_string case.baz hask && - check_config expect_code 1 get_string case.ba "Value not found for \"case.ba\"" + check_not_found get_string case.ba ' test_expect_success 'check line error when NULL string is queried' ' @@ -130,7 +127,8 @@ test_expect_success 'check line error when NULL string is queried' ' ' test_expect_success 'find integer if value is non parse-able' ' - check_config 
expect_code 128 get_int lamb.head + test_expect_code 128 test-tool config get_int lamb.head 2>result && + test_i18ngrep "fatal: bad numeric config value .none. for .lamb\.head." result ' test_expect_success 'find bool value for the entered key' ' -- 2.28.0 ^ permalink raw reply related [flat|nested] 123+ messages in thread
* [PATCH v6 3/9] t/helper/test-config: be consistent with exit codes 2020-09-10 17:21 ` [PATCH v6 0/9] grep: honor sparse checkout and add option to ignore it Matheus Tavares 2020-09-10 17:21 ` [PATCH v6 1/9] doc: grep: unify info on configuration variables Matheus Tavares 2020-09-10 17:21 ` [PATCH v6 2/9] t1308-config-set: avoid false positives when using test-config Matheus Tavares @ 2020-09-10 17:21 ` Matheus Tavares 2020-09-10 17:21 ` [PATCH v6 4/9] t/helper/test-config: diagnose missing arguments Matheus Tavares ` (6 subsequent siblings) 9 siblings, 0 replies; 123+ messages in thread
From: Matheus Tavares @ 2020-09-10 17:21 UTC (permalink / raw)
To: git; +Cc: gitster, stolee, newren, jonathantanmy, jrnieder, sunshine

The test-config helper can return at least three different exit codes to reflect the status of the requested operation. And these codes are checked in some of the tests. But there is an inconsistent place in the helper where a usage error returns the same code as a "value not found" error. Let's fix that and, while we are here, document the meaning of each exit code in the file's header.
Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br> --- t/helper/test-config.c | 13 +++++++++---- 1 file changed, 9 insertions(+), 4 deletions(-) diff --git a/t/helper/test-config.c b/t/helper/test-config.c index a6e936721f..9e9d50099a 100644 --- a/t/helper/test-config.c +++ b/t/helper/test-config.c @@ -30,6 +30,11 @@ * iterate -> iterate over all values using git_config(), and print some * data for each * + * Exit codes: + * 0: success + * 1: value not found for the given config key + * 2: config file path given as argument is inaccessible or doesn't exist + * * Examples: * * To print the value with highest priority for key "foo.bAr Baz.rock": @@ -80,10 +85,10 @@ int cmd__config(int argc, const char **argv) git_configset_init(&cs); - if (argc < 2) { - fprintf(stderr, "Please, provide a command name on the command-line\n"); - goto exit1; - } else if (argc == 3 && !strcmp(argv[1], "get_value")) { + if (argc < 2) + die("Please, provide a command name on the command-line"); + + if (argc == 3 && !strcmp(argv[1], "get_value")) { if (!git_config_get_value(argv[2], &v)) { if (!v) printf("(NULL)\n"); -- 2.28.0 ^ permalink raw reply related [flat|nested] 123+ messages in thread
* [PATCH v6 4/9] t/helper/test-config: diagnose missing arguments 2020-09-10 17:21 ` [PATCH v6 0/9] grep: honor sparse checkout and add option to ignore it Matheus Tavares ` (2 preceding siblings ...) 2020-09-10 17:21 ` [PATCH v6 3/9] t/helper/test-config: be consistent with exit codes Matheus Tavares @ 2020-09-10 17:21 ` Matheus Tavares 2020-09-10 17:21 ` [PATCH v6 5/9] t/helper/test-config: unify exit labels Matheus Tavares ` (5 subsequent siblings) 9 siblings, 0 replies; 123+ messages in thread From: Matheus Tavares @ 2020-09-10 17:21 UTC (permalink / raw) To: git; +Cc: gitster, stolee, newren, jonathantanmy, jrnieder, sunshine test-config verifies that the correct number of arguments was given for all of its commands except for 'configset_get_value' and 'configset_get_value_multi'. Add the check to these two, so that we properly report missing arguments and prevent out-of-bounds access to argv[]. Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br> --- t/helper/test-config.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/t/helper/test-config.c b/t/helper/test-config.c index 9e9d50099a..26d9c2ac4c 100644 --- a/t/helper/test-config.c +++ b/t/helper/test-config.c @@ -138,7 +138,7 @@ int cmd__config(int argc, const char **argv) printf("Value not found for \"%s\"\n", argv[2]); goto exit1; } - } else if (!strcmp(argv[1], "configset_get_value")) { + } else if (argc >= 3 && !strcmp(argv[1], "configset_get_value")) { for (i = 3; i < argc; i++) { int err; if ((err = git_configset_add_file(&cs, argv[i]))) { @@ -156,7 +156,7 @@ int cmd__config(int argc, const char **argv) printf("Value not found for \"%s\"\n", argv[2]); goto exit1; } - } else if (!strcmp(argv[1], "configset_get_value_multi")) { + } else if (argc >= 3 && !strcmp(argv[1], "configset_get_value_multi")) { for (i = 3; i < argc; i++) { int err; if ((err = git_configset_add_file(&cs, argv[i]))) { -- 2.28.0 ^ permalink raw reply related [flat|nested] 123+ messages in thread
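The fix in this patch relies on `&&` short-circuiting: argc is checked before any argv[] element that might not exist is touched. A minimal standalone sketch of the same guard is below. key_for_command() is a hypothetical helper, not part of test-config, and like the real code it assumes the caller has already ensured argc >= 2.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: return the key argument (argv[2]) when argv[1]
 * names `cmd` and a key was actually supplied, NULL otherwise.
 * The argc test comes first, so argv[2] is never read when missing.
 */
static const char *key_for_command(int argc, const char **argv,
				   const char *cmd)
{
	assert(argv != NULL && argc >= 2);

	if (argc >= 3 && !strcmp(argv[1], cmd))
		return argv[2];
	return NULL;
}
```

In the helper itself, a NULL result here corresponds to falling through to the final "Please check the syntax and the function name" error rather than reading past the end of argv[].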
* [PATCH v6 5/9] t/helper/test-config: unify exit labels 2020-09-10 17:21 ` [PATCH v6 0/9] grep: honor sparse checkout and add option to ignore it Matheus Tavares ` (3 preceding siblings ...) 2020-09-10 17:21 ` [PATCH v6 4/9] t/helper/test-config: diagnose missing arguments Matheus Tavares @ 2020-09-10 17:21 ` Matheus Tavares 2020-09-10 17:21 ` [PATCH v6 6/9] config: make do_git_config_sequence receive a 'struct repository' Matheus Tavares ` (4 subsequent siblings) 9 siblings, 0 replies; 123+ messages in thread From: Matheus Tavares @ 2020-09-10 17:21 UTC (permalink / raw) To: git; +Cc: gitster, stolee, newren, jonathantanmy, jrnieder, sunshine test-config's main function has three different exit labels, all of which have to perform the same cleanup code before returning. Unify the labels in preparation for a future patch which will increase the cleanup section. Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br> --- t/helper/test-config.c | 49 ++++++++++++++++-------------------------- 1 file changed, 19 insertions(+), 30 deletions(-) diff --git a/t/helper/test-config.c b/t/helper/test-config.c index 26d9c2ac4c..06c61d91e1 100644 --- a/t/helper/test-config.c +++ b/t/helper/test-config.c @@ -69,9 +69,12 @@ static int early_config_cb(const char *var, const char *value, void *vdata) return 0; } +#define TC_VALUE_NOT_FOUND 1 +#define TC_CONFIG_FILE_ERROR 2 + int cmd__config(int argc, const char **argv) { - int i, val; + int i, val, ret = 0; const char *v; const struct string_list *strptr; struct config_set cs; @@ -94,10 +97,9 @@ int cmd__config(int argc, const char **argv) printf("(NULL)\n"); else printf("%s\n", v); - goto exit0; } else { printf("Value not found for \"%s\"\n", argv[2]); - goto exit1; + ret = TC_VALUE_NOT_FOUND; } } else if (argc == 3 && !strcmp(argv[1], "get_value_multi")) { strptr = git_config_get_value_multi(argv[2]); @@ -109,41 +111,38 @@ int cmd__config(int argc, const char **argv) else printf("%s\n", v); } - goto exit0; } else { 
printf("Value not found for \"%s\"\n", argv[2]); - goto exit1; + ret = TC_VALUE_NOT_FOUND; } } else if (argc == 3 && !strcmp(argv[1], "get_int")) { if (!git_config_get_int(argv[2], &val)) { printf("%d\n", val); - goto exit0; } else { printf("Value not found for \"%s\"\n", argv[2]); - goto exit1; + ret = TC_VALUE_NOT_FOUND; } } else if (argc == 3 && !strcmp(argv[1], "get_bool")) { if (!git_config_get_bool(argv[2], &val)) { printf("%d\n", val); - goto exit0; } else { printf("Value not found for \"%s\"\n", argv[2]); - goto exit1; + ret = TC_VALUE_NOT_FOUND; } } else if (argc == 3 && !strcmp(argv[1], "get_string")) { if (!git_config_get_string_tmp(argv[2], &v)) { printf("%s\n", v); - goto exit0; } else { printf("Value not found for \"%s\"\n", argv[2]); - goto exit1; + ret = TC_VALUE_NOT_FOUND; } } else if (argc >= 3 && !strcmp(argv[1], "configset_get_value")) { for (i = 3; i < argc; i++) { int err; if ((err = git_configset_add_file(&cs, argv[i]))) { fprintf(stderr, "Error (%d) reading configuration file %s.\n", err, argv[i]); - goto exit2; + ret = TC_CONFIG_FILE_ERROR; + goto out; } } if (!git_configset_get_value(&cs, argv[2], &v)) { @@ -151,17 +150,17 @@ int cmd__config(int argc, const char **argv) printf("(NULL)\n"); else printf("%s\n", v); - goto exit0; } else { printf("Value not found for \"%s\"\n", argv[2]); - goto exit1; + ret = TC_VALUE_NOT_FOUND; } } else if (argc >= 3 && !strcmp(argv[1], "configset_get_value_multi")) { for (i = 3; i < argc; i++) { int err; if ((err = git_configset_add_file(&cs, argv[i]))) { fprintf(stderr, "Error (%d) reading configuration file %s.\n", err, argv[i]); - goto exit2; + ret = TC_CONFIG_FILE_ERROR; + goto out; } } strptr = git_configset_get_value_multi(&cs, argv[2]); @@ -173,27 +172,17 @@ int cmd__config(int argc, const char **argv) else printf("%s\n", v); } - goto exit0; } else { printf("Value not found for \"%s\"\n", argv[2]); - goto exit1; + ret = TC_VALUE_NOT_FOUND; } } else if (!strcmp(argv[1], "iterate")) { 
git_config(iterate_cb, NULL); - goto exit0; + } else { + die("%s: Please check the syntax and the function name", argv[0]); } - die("%s: Please check the syntax and the function name", argv[0]); - -exit0: - git_configset_clear(&cs); - return 0; - -exit1: - git_configset_clear(&cs); - return 1; - -exit2: +out: git_configset_clear(&cs); - return 2; + return ret; } -- 2.28.0
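The single-exit-label pattern adopted by patch 5/9 can be sketched in isolation. This is only an illustration, not code from t/helper/test-config.c: the function lookup(), the EX_* codes and the cleanup_calls counter are hypothetical names made up for the example. Each branch records its result in ret and falls through (or jumps) to one "out" label, so the cleanup is written once.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical exit codes, playing the role of the TC_* constants. */
#define EX_VALUE_NOT_FOUND 1
#define EX_BAD_INPUT 2

/* Hypothetical counter: lets a caller observe that cleanup ran once per call. */
static int cleanup_calls;

static void release_scratch(char *scratch)
{
	free(scratch);	/* free(NULL) is a no-op, so this is always safe */
	cleanup_calls++;
}

/*
 * Search "key=value" strings in entries[0..nr) for key.
 * On success, *value points at the value part and 0 is returned.
 * Every path leaves through the single "out" label.
 */
int lookup(const char *const *entries, size_t nr, const char *key,
	   const char **value)
{
	int ret = 0;
	size_t i, key_len = strlen(key);
	char *scratch = malloc(key_len + 2);

	*value = NULL;
	if (!scratch) {
		ret = EX_BAD_INPUT;
		goto out;
	}
	/* Build "key=" once so each entry needs a single prefix comparison. */
	memcpy(scratch, key, key_len);
	scratch[key_len] = '=';
	scratch[key_len + 1] = '\0';

	for (i = 0; i < nr; i++) {
		if (!entries[i]) {
			ret = EX_BAD_INPUT;
			goto out;
		}
		if (!strncmp(entries[i], scratch, key_len + 1)) {
			*value = entries[i] + key_len + 1;
			goto out;
		}
	}
	ret = EX_VALUE_NOT_FOUND;
out:
	release_scratch(scratch);
	return ret;
}
```

If a later change needs more cleanup (as patch 7/9 does with repo_clear()), only the code under "out" has to grow.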
* [PATCH v6 6/9] config: make do_git_config_sequence receive a 'struct repository' 2020-09-10 17:21 ` [PATCH v6 0/9] grep: honor sparse checkout and add option to ignore it Matheus Tavares ` (4 preceding siblings ...) 2020-09-10 17:21 ` [PATCH v6 5/9] t/helper/test-config: unify exit labels Matheus Tavares @ 2020-09-10 17:21 ` Matheus Tavares 2020-09-10 17:21 ` [PATCH v6 7/9] config: correctly read worktree configs in submodules Matheus Tavares ` (3 subsequent siblings) 9 siblings, 0 replies; 123+ messages in thread From: Matheus Tavares @ 2020-09-10 17:21 UTC (permalink / raw) To: git; +Cc: gitster, stolee, newren, jonathantanmy, jrnieder, sunshine From: Jonathan Nieder <jrnieder@gmail.com> The following patch will fix a bug in do_git_config_sequence, which makes it ignore worktree-specific configurations on submodules when the_repository represents the superproject. To do so, the function will need access to the 'struct repository' instance of the submodule. But it currently only receives the 'git_dir' and 'commondir' paths through 'struct config_options'. So change the struct to hold a repository pointer instead of the two strings, and adjust its users. Signed-off-by: Jonathan Nieder <jrnieder@gmail.com> Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br> --- Hi, Jonathan. I just made a small change in this patch, in read_early_repo(): when running some test cases in t0001, I noticed that `the_early_repo.settings.initialized` was 0 even though the repo was populated. So I added a flag to track the repo state for the later cleanup. 
builtin/config.c | 6 ++---- config.c | 35 +++++++++++++++++++++-------------- config.h | 4 ++-- 3 files changed, 25 insertions(+), 20 deletions(-) diff --git a/builtin/config.c b/builtin/config.c index 5e39f61885..ca4caedf33 100644 --- a/builtin/config.c +++ b/builtin/config.c @@ -699,10 +699,8 @@ int cmd_config(int argc, const char **argv, const char *prefix) config_options.respect_includes = !given_config_source.file; else config_options.respect_includes = respect_includes_opt; - if (!nongit) { - config_options.commondir = get_git_common_dir(); - config_options.git_dir = get_git_dir(); - } + if (!nongit) + config_options.repo = the_repository; if (end_nul) { term = '\0'; diff --git a/config.c b/config.c index 2bdff4457b..97f3022c92 100644 --- a/config.c +++ b/config.c @@ -222,8 +222,8 @@ static int include_by_gitdir(const struct config_options *opts, const char *git_dir; int already_tried_absolute = 0; - if (opts->git_dir) - git_dir = opts->git_dir; + if (opts->repo && opts->repo->gitdir) + git_dir = opts->repo->gitdir; else goto done; @@ -1720,10 +1720,10 @@ static int do_git_config_sequence(const struct config_options *opts, char *repo_config; enum config_scope prev_parsing_scope = current_parsing_scope; - if (opts->commondir) - repo_config = mkpathdup("%s/config", opts->commondir); - else if (opts->git_dir) - BUG("git_dir without commondir"); + if (opts->repo && opts->repo->commondir) + repo_config = mkpathdup("%s/config", opts->repo->commondir); + else if (opts->repo && opts->repo->gitdir) + BUG("gitdir without commondir"); else repo_config = NULL; @@ -1824,27 +1824,35 @@ void read_early_config(config_fn_t cb, void *data) struct config_options opts = {0}; struct strbuf commondir = STRBUF_INIT; struct strbuf gitdir = STRBUF_INIT; + struct repository the_early_repo = {0}; + int early_repo_initialized = 0; opts.respect_includes = 1; if (have_git_dir()) { - opts.commondir = get_git_common_dir(); - opts.git_dir = get_git_dir(); + opts.repo = the_repository; /* * 
When setup_git_directory() was not yet asked to discover the * GIT_DIR, we ask discover_git_directory() to figure out whether there * is any repository config we should use (but unlike - * setup_git_directory_gently(), no global state is changed, most + * setup_git_directory_gently(), no global state is changed; most * notably, the current working directory is still the same after the * call). + * + * NEEDSWORK: There is some duplicate work between + * discover_git_directory and repo_init. Update to use a variant of + * repo_init that does its own repository discovery once available. */ - } else if (!discover_git_directory(&commondir, &gitdir)) { - opts.commondir = commondir.buf; - opts.git_dir = gitdir.buf; + } else if (!discover_git_directory(&commondir, &gitdir) && + !repo_init(&the_early_repo, gitdir.buf, NULL)) { + opts.repo = &the_early_repo; + early_repo_initialized = 1; } config_with_options(cb, data, NULL, &opts); + if (early_repo_initialized) + repo_clear(&the_early_repo); strbuf_release(&commondir); strbuf_release(&gitdir); } @@ -2097,8 +2105,7 @@ static void repo_read_config(struct repository *repo) struct config_options opts = { 0 }; opts.respect_includes = 1; - opts.commondir = repo->commondir; - opts.git_dir = repo->gitdir; + opts.repo = repo; if (!repo->config) repo->config = xcalloc(1, sizeof(struct config_set)); diff --git a/config.h b/config.h index 91cdfbfb41..e56293fb29 100644 --- a/config.h +++ b/config.h @@ -21,6 +21,7 @@ */ struct object_id; +struct repository; /* git_config_parse_key() returns these negated: */ #define CONFIG_INVALID_KEY 1 @@ -87,8 +88,7 @@ struct config_options { unsigned int ignore_worktree : 1; unsigned int ignore_cmdline : 1; unsigned int system_gently : 1; - const char *commondir; - const char *git_dir; + struct repository *repo; config_parser_event_fn_t event_fn; void *event_fn_data; enum config_error_action { -- 2.28.0 ^ permalink raw reply related [flat|nested] 123+ messages in thread
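As a rough illustration of the interface change in patch 6/9, the sketch below uses toy stand-ins (toy_repo, toy_config_options and toy_repo_config_path() are hypothetical, not git's real types or helpers). It shows the options struct carrying one repository pointer instead of two path strings, and a consumer deriving the config path from it. Unlike the real do_git_config_sequence(), it does not BUG() on a gitdir without a commondir.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Toy stand-in for 'struct repository': only the two paths used here. */
struct toy_repo {
	const char *gitdir;
	const char *commondir;
};

/* Toy stand-in for 'struct config_options' after the change. */
struct toy_config_options {
	unsigned int respect_includes : 1;
	struct toy_repo *repo;	/* replaces the separate commondir/git_dir strings */
};

/*
 * Return a malloc'd "<commondir>/config", or NULL when no repository
 * (or no commondir) is attached to the options. The caller frees it.
 */
char *toy_repo_config_path(const struct toy_config_options *opts)
{
	size_t len;
	char *path;

	if (!opts->repo || !opts->repo->commondir)
		return NULL;

	len = strlen(opts->repo->commondir) + strlen("/config") + 1;
	path = malloc(len);
	if (!path)
		return NULL;
	snprintf(path, len, "%s/config", opts->repo->commondir);
	return path;
}
```

Because the consumer sees the whole repository object, a later patch can read further per-repository state from it without widening the options struct again.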
* [PATCH v6 7/9] config: correctly read worktree configs in submodules 2020-09-10 17:21 ` [PATCH v6 0/9] grep: honor sparse checkout and add option to ignore it Matheus Tavares ` (5 preceding siblings ...) 2020-09-10 17:21 ` [PATCH v6 6/9] config: make do_git_config_sequence receive a 'struct repository' Matheus Tavares @ 2020-09-10 17:21 ` Matheus Tavares 2020-09-10 17:21 ` [PATCH v6 8/9] grep: honor sparse checkout patterns Matheus Tavares ` (2 subsequent siblings) 9 siblings, 0 replies; 123+ messages in thread From: Matheus Tavares @ 2020-09-10 17:21 UTC (permalink / raw) To: git; +Cc: gitster, stolee, newren, jonathantanmy, jrnieder, sunshine The config machinery is not able to read worktree configs from a submodule in a process where the_repository represents the superproject and extensions.worktreeConfig is not set there. Furthermore, when extensions.worktreeConfig is set on the superproject, querying for a worktree config in a submodule will, instead, return the value set at the superproject. The relevant code is in do_git_config_sequence(). Although it is designed to act on an arbitrary repository, specified in the passed-in `struct config_options`, it accidentally depends on the_repository in two places: - it reads the global variable `repository_format_worktree_config`, which is set based on the content of the_repository, to determine whether extensions.worktreeConfig is set. - it uses the git_pathdup() helper to find the config.worktree file, instead of making a path using the passed-in repository. Sever these dependencies and add a regression test. Also, to avoid future misuses of `repository_format_worktree_config` like this one, remove this global variable and store the config value on `struct repository` itself. 
Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br> --- builtin/config.c | 2 +- cache.h | 1 - config.c | 12 +++++--- environment.c | 1 - repository.c | 1 + repository.h | 1 + setup.c | 4 +-- t/helper/test-config.c | 62 ++++++++++++++++++++++++++++++++------ t/t2404-worktree-config.sh | 16 ++++++++++ 9 files changed, 81 insertions(+), 19 deletions(-) diff --git a/builtin/config.c b/builtin/config.c index ca4caedf33..586faad359 100644 --- a/builtin/config.c +++ b/builtin/config.c @@ -673,7 +673,7 @@ int cmd_config(int argc, const char **argv, const char *prefix) given_config_source.scope = CONFIG_SCOPE_LOCAL; } else if (use_worktree_config) { struct worktree **worktrees = get_worktrees(); - if (repository_format_worktree_config) + if (!nongit && the_repository->worktree_config_extension) given_config_source.file = git_pathdup("config.worktree"); else if (worktrees[0] && worktrees[1]) die(_("--worktree cannot be used with multiple " diff --git a/cache.h b/cache.h index 4cad61ffa4..4cdce7b68f 100644 --- a/cache.h +++ b/cache.h @@ -1029,7 +1029,6 @@ extern int grafts_replace_parents; #define GIT_REPO_VERSION 0 #define GIT_REPO_VERSION_READ 1 extern int repository_format_precious_objects; -extern int repository_format_worktree_config; /* * You _have_ to initialize a `struct repository_format` using diff --git a/config.c b/config.c index 97f3022c92..38c177d1e5 100644 --- a/config.c +++ b/config.c @@ -1747,11 +1747,13 @@ static int do_git_config_sequence(const struct config_options *opts, ret += git_config_from_file(fn, repo_config, data); current_parsing_scope = CONFIG_SCOPE_WORKTREE; - if (!opts->ignore_worktree && repository_format_worktree_config) { - char *path = git_pathdup("config.worktree"); - if (!access_or_die(path, R_OK, 0)) - ret += git_config_from_file(fn, path, data); - free(path); + if (!opts->ignore_worktree && opts->repo && opts->repo->gitdir && + opts->repo->worktree_config_extension) { + struct strbuf path = STRBUF_INIT; + 
strbuf_repo_git_path(&path, opts->repo, "config.worktree"); + if (!access_or_die(path.buf, R_OK, 0)) + ret += git_config_from_file(fn, path.buf, data); + strbuf_release(&path); } current_parsing_scope = CONFIG_SCOPE_COMMAND; diff --git a/environment.c b/environment.c index bb518c61cd..5bfb8cf9c2 100644 --- a/environment.c +++ b/environment.c @@ -32,7 +32,6 @@ int warn_ambiguous_refs = 1; int warn_on_object_refname_ambiguity = 1; int ref_paranoia = -1; int repository_format_precious_objects; -int repository_format_worktree_config; const char *git_commit_encoding; const char *git_log_output_encoding; char *apply_default_whitespace; diff --git a/repository.c b/repository.c index a4174ddb06..8d18d9d3f2 100644 --- a/repository.c +++ b/repository.c @@ -170,6 +170,7 @@ int repo_init(struct repository *repo, goto error; repo_set_hash_algo(repo, format.hash_algo); + repo->worktree_config_extension = format.worktree_config; if (worktree) repo_set_worktree(repo, worktree); diff --git a/repository.h b/repository.h index 3c1f7d54bd..3d060bc3e6 100644 --- a/repository.h +++ b/repository.h @@ -136,6 +136,7 @@ struct repository { /* Indicate if a repository has a different 'commondir' from 'gitdir' */ unsigned different_commondir:1; + unsigned worktree_config_extension:1; }; extern struct repository *the_repository; diff --git a/setup.c b/setup.c index c04cd25a30..73cd42dbbe 100644 --- a/setup.c +++ b/setup.c @@ -567,11 +567,11 @@ static int check_repository_format_gently(const char *gitdir, struct repository_ repository_format_precious_objects = candidate->precious_objects; set_repository_format_partial_clone(candidate->partial_clone); - repository_format_worktree_config = candidate->worktree_config; + the_repository->worktree_config_extension = candidate->worktree_config; string_list_clear(&candidate->unknown_extensions, 0); string_list_clear(&candidate->v1_only_extensions, 0); - if (repository_format_worktree_config) { + if (the_repository->worktree_config_extension) { /* * 
pick up core.bare and core.worktree from per-worktree * config if present diff --git a/t/helper/test-config.c b/t/helper/test-config.c index 06c61d91e1..87488aab6b 100644 --- a/t/helper/test-config.c +++ b/t/helper/test-config.c @@ -2,12 +2,20 @@ #include "cache.h" #include "config.h" #include "string-list.h" +#include "submodule-config.h" +#include "parse-options.h" /* * This program exposes the C API of the configuration mechanism * as a set of simple commands in order to facilitate testing. * - * Reads stdin and prints result of command to stdout: + * Usage: test-tool config [--submodule=<path>] <cmd> [<args>] + * + * If --submodule=<path> is given, <cmd> will operate on the submodule at the + * given <path>. This option is not valid for the commands: read_early_config, + * configset_get_value and configset_get_value_multi. + * + * Possible cmds are: * * get_value -> prints the value with highest priority for the entered key * @@ -72,14 +80,34 @@ static int early_config_cb(const char *var, const char *value, void *vdata) #define TC_VALUE_NOT_FOUND 1 #define TC_CONFIG_FILE_ERROR 2 +static const char *test_config_usage[] = { + "test-tool config [--submodule=<path>] <cmd> [<args>]", + NULL +}; + int cmd__config(int argc, const char **argv) { int i, val, ret = 0; const char *v; const struct string_list *strptr; struct config_set cs; + struct repository subrepo, *repo = the_repository; + const char *subrepo_path = NULL; + + struct option options[] = { + OPT_STRING(0, "submodule", &subrepo_path, "path", + "run <cmd> on the submodule at <path>"), + OPT_END() + }; + + argc = parse_options(argc, argv, NULL, options, test_config_usage, + PARSE_OPT_KEEP_ARGV0 | PARSE_OPT_STOP_AT_NON_OPTION); + if (argc < 2) + usage_with_options(test_config_usage, options); if (argc == 3 && !strcmp(argv[1], "read_early_config")) { + if (subrepo_path) + die("cannot use --submodule with read_early_config"); read_early_config(early_config_cb, (void *)argv[2]); return 0; } @@ -88,11 +116,18 @@ 
int cmd__config(int argc, const char **argv) git_configset_init(&cs); - if (argc < 2) - die("Please, provide a command name on the command-line"); + if (subrepo_path) { + const struct submodule *sub; + + sub = submodule_from_path(the_repository, &null_oid, subrepo_path); + if (!sub || repo_submodule_init(&subrepo, the_repository, sub)) + die("invalid argument to --submodule: '%s'", subrepo_path); + + repo = &subrepo; + } if (argc == 3 && !strcmp(argv[1], "get_value")) { - if (!git_config_get_value(argv[2], &v)) { + if (!repo_config_get_value(repo, argv[2], &v)) { if (!v) printf("(NULL)\n"); else @@ -102,7 +137,7 @@ int cmd__config(int argc, const char **argv) ret = TC_VALUE_NOT_FOUND; } } else if (argc == 3 && !strcmp(argv[1], "get_value_multi")) { - strptr = git_config_get_value_multi(argv[2]); + strptr = repo_config_get_value_multi(repo, argv[2]); if (strptr) { for (i = 0; i < strptr->nr; i++) { v = strptr->items[i].string; @@ -116,27 +151,31 @@ int cmd__config(int argc, const char **argv) ret = TC_VALUE_NOT_FOUND; } } else if (argc == 3 && !strcmp(argv[1], "get_int")) { - if (!git_config_get_int(argv[2], &val)) { + if (!repo_config_get_int(repo, argv[2], &val)) { printf("%d\n", val); } else { printf("Value not found for \"%s\"\n", argv[2]); ret = TC_VALUE_NOT_FOUND; } } else if (argc == 3 && !strcmp(argv[1], "get_bool")) { - if (!git_config_get_bool(argv[2], &val)) { + if (!repo_config_get_bool(repo, argv[2], &val)) { printf("%d\n", val); } else { + printf("Value not found for \"%s\"\n", argv[2]); ret = TC_VALUE_NOT_FOUND; } } else if (argc == 3 && !strcmp(argv[1], "get_string")) { - if (!git_config_get_string_tmp(argv[2], &v)) { + if (!repo_config_get_string_tmp(repo, argv[2], &v)) { printf("%s\n", v); } else { printf("Value not found for \"%s\"\n", argv[2]); ret = TC_VALUE_NOT_FOUND; } } else if (argc >= 3 && !strcmp(argv[1], "configset_get_value")) { + if (subrepo_path) + die("cannot use --submodule with configset_get_value"); + for (i = 3; i < argc; i++) { 
int err; if ((err = git_configset_add_file(&cs, argv[i]))) { @@ -155,6 +194,9 @@ int cmd__config(int argc, const char **argv) ret = TC_VALUE_NOT_FOUND; } } else if (argc >= 3 && !strcmp(argv[1], "configset_get_value_multi")) { + if (subrepo_path) + die("cannot use --submodule with configset_get_value_multi"); + for (i = 3; i < argc; i++) { int err; if ((err = git_configset_add_file(&cs, argv[i]))) { @@ -177,12 +219,14 @@ int cmd__config(int argc, const char **argv) ret = TC_VALUE_NOT_FOUND; } } else if (!strcmp(argv[1], "iterate")) { - git_config(iterate_cb, NULL); + repo_config(repo, iterate_cb, NULL); } else { die("%s: Please check the syntax and the function name", argv[0]); } out: git_configset_clear(&cs); + if (repo != the_repository) + repo_clear(repo); return ret; } diff --git a/t/t2404-worktree-config.sh b/t/t2404-worktree-config.sh index 9536d10919..1e32c93735 100755 --- a/t/t2404-worktree-config.sh +++ b/t/t2404-worktree-config.sh @@ -78,4 +78,20 @@ test_expect_success 'config.worktree no longer read without extension' ' test_cmp_config -C wt2 shared this.is ' +test_expect_success 'correctly read config.worktree from submodules' ' + test_unconfig extensions.worktreeConfig && + git init sub && + ( + cd sub && + test_commit A && + git config extensions.worktreeConfig true && + git config --worktree wtconfig.sub test-value + ) && + git submodule add ./sub && + git commit -m "add sub" && + echo test-value >expect && + test-tool config --submodule=sub get_value wtconfig.sub >actual && + test_cmp expect actual +' + test_done -- 2.28.0 ^ permalink raw reply related [flat|nested] 123+ messages in thread
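The bug fixed by patch 7/9 can be reduced to a small model. The sketch below is an assumption-laden toy, not git code: toy_repo, toy_global_worktree_config and both reader functions are hypothetical. It contrasts consulting a process-wide flag, which only ever describes the main repository, with consulting a flag stored on the repository being queried.

```c
#include <stddef.h>

/* Toy stand-in for 'struct repository' with the new per-repo bit. */
struct toy_repo {
	const char *name;
	unsigned int worktree_config_extension : 1;
};

/*
 * Stands in for the removed global repository_format_worktree_config:
 * it is filled in from the main repository only.
 */
static int toy_global_worktree_config;

/* Old behavior: the answer ignores which repository is being asked about. */
int reads_worktree_config_buggy(const struct toy_repo *repo)
{
	(void)repo;
	return toy_global_worktree_config;
}

/* New behavior: the answer comes from the repository passed in. */
int reads_worktree_config_fixed(const struct toy_repo *repo)
{
	return repo && repo->worktree_config_extension;
}
```

With a superproject that lacks the extension and a submodule that has it, the first function reports 0 for the submodule while the second reports 1, which is the situation the new t2404 test exercises.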
* [PATCH v6 8/9] grep: honor sparse checkout patterns 2020-09-10 17:21 ` [PATCH v6 0/9] grep: honor sparse checkout and add option to ignore it Matheus Tavares ` (6 preceding siblings ...) 2020-09-10 17:21 ` [PATCH v6 7/9] config: correctly read worktree configs in submodules Matheus Tavares @ 2020-09-10 17:21 ` Matheus Tavares 2020-09-10 17:21 ` [PATCH v6 9/9] config: add setting to ignore sparsity patterns in some cmds Matheus Tavares 2021-02-09 21:33 ` [PATCH v7] grep: honor sparse-checkout on working tree searches Matheus Tavares 9 siblings, 0 replies; 123+ messages in thread From: Matheus Tavares @ 2020-09-10 17:21 UTC (permalink / raw) To: git; +Cc: gitster, stolee, newren, jonathantanmy, jrnieder, sunshine One of the main uses for a sparse checkout is to allow users to focus on the subset of files in a repository in which they are interested. But git-grep currently ignores the sparsity patterns and also reports matches found outside this subset, which works against that purpose. There are some use cases for ignoring the sparsity patterns, and the next commit will add an option to obtain this behavior, but here we start by making grep honor the sparsity boundaries in every case where this is relevant: - git grep in worktree - git grep --cached - git grep $REVISION For the worktree and cached cases, we iterate over paths without the SKIP_WORKTREE bit set, and limit our searches to these paths. For the $REVISION case, we limit the paths we search to those that match the sparsity patterns. (We do not check the SKIP_WORKTREE bit for the $REVISION case, because $REVISION may contain paths that do not exist in HEAD and for which we thus have no SKIP_WORKTREE bit to consult. The sparsity patterns tell us how the SKIP_WORKTREE bit would be set if we were to check out $REVISION, so we consult those. 
Also, we don't use the sparsity patterns with the worktree or cached cases, both because we have a bit we can check directly and more efficiently, and because unmerged entries from a merge or a rebase could cause more files to temporarily be present than the sparsity patterns would normally select.) Note that there is a special case here: `git grep $TREE`. In this case, we cannot know whether $TREE corresponds to the root of the repository or some sub-tree, and thus there is no way for us to know which sparsity patterns, if any, apply. So the $TREE case will not use sparsity patterns or any SKIP_WORKTREE bits and will instead always search all files within the $TREE. Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br> --- builtin/grep.c | 125 ++++++++++++++++++-- t/t7011-skip-worktree-reading.sh | 9 -- t/t7817-grep-sparse-checkout.sh | 195 +++++++++++++++++++++++++++++++ 3 files changed, 312 insertions(+), 17 deletions(-) create mode 100755 t/t7817-grep-sparse-checkout.sh diff --git a/builtin/grep.c b/builtin/grep.c index f58979bc3f..a32815de0a 100644 --- a/builtin/grep.c +++ b/builtin/grep.c @@ -410,7 +410,7 @@ static int grep_cache(struct grep_opt *opt, const struct pathspec *pathspec, int cached); static int grep_tree(struct grep_opt *opt, const struct pathspec *pathspec, struct tree_desc *tree, struct strbuf *base, int tn_len, - int check_attr); + int is_root_tree); static int grep_submodule(struct grep_opt *opt, const struct pathspec *pathspec, @@ -508,6 +508,10 @@ static int grep_cache(struct grep_opt *opt, for (nr = 0; nr < repo->index->cache_nr; nr++) { const struct cache_entry *ce = repo->index->cache[nr]; + + if (ce_skip_worktree(ce)) + continue; + strbuf_setlen(&name, name_base_len); strbuf_addstr(&name, ce->name); @@ -520,8 +524,7 @@ static int grep_cache(struct grep_opt *opt, * cache entry are identical, even if worktree file has * been modified, so use cache version instead */ - if (cached || (ce->ce_flags & CE_VALID) || - ce_skip_worktree(ce)) 
{ + if (cached || (ce->ce_flags & CE_VALID)) { if (ce_stage(ce) || ce_intent_to_add(ce)) continue; hit |= grep_oid(opt, &ce->oid, name.buf, @@ -552,9 +555,76 @@ static int grep_cache(struct grep_opt *opt, return hit; } -static int grep_tree(struct grep_opt *opt, const struct pathspec *pathspec, - struct tree_desc *tree, struct strbuf *base, int tn_len, - int check_attr) +static struct pattern_list *get_sparsity_patterns(struct repository *repo) +{ + struct pattern_list *patterns; + char *sparse_file; + int sparse_config, cone_config; + + if (repo_config_get_bool(repo, "core.sparsecheckout", &sparse_config) || + !sparse_config) { + return NULL; + } + + sparse_file = repo_git_path(repo, "info/sparse-checkout"); + patterns = xcalloc(1, sizeof(*patterns)); + + if (repo_config_get_bool(repo, "core.sparsecheckoutcone", &cone_config)) + cone_config = 0; + patterns->use_cone_patterns = cone_config; + + if (add_patterns_from_file_to_list(sparse_file, "", 0, patterns, NULL)) { + if (file_exists(sparse_file)) { + warning(_("failed to load sparse-checkout file: '%s'"), + sparse_file); + } + free(sparse_file); + free(patterns); + return NULL; + } + + free(sparse_file); + return patterns; +} + +static int path_in_sparse_checkout(struct strbuf *path, int prefix_len, + unsigned int entry_mode, + struct index_state *istate, + struct pattern_list *sparsity, + enum pattern_match_result parent_match, + enum pattern_match_result *match) +{ + int dtype = DT_UNKNOWN; + int is_dir = S_ISDIR(entry_mode); + + if (parent_match == MATCHED_RECURSIVE) { + *match = parent_match; + return 1; + } + + if (is_dir && !is_dir_sep(path->buf[path->len - 1])) + strbuf_addch(path, '/'); + + *match = path_matches_pattern_list(path->buf, path->len, + path->buf + prefix_len, &dtype, + sparsity, istate); + if (*match == UNDECIDED) + *match = parent_match; + + if (is_dir) + strbuf_trim_trailing_dir_sep(path); + + if (*match == NOT_MATCHED && + (!is_dir || (is_dir && sparsity->use_cone_patterns))) + return 0; + 
+ return 1; +} + +static int do_grep_tree(struct grep_opt *opt, const struct pathspec *pathspec, + struct tree_desc *tree, struct strbuf *base, int tn_len, + int check_attr, struct pattern_list *sparsity, + enum pattern_match_result default_sparsity_match) { struct repository *repo = opt->repo; int hit = 0; @@ -570,6 +640,7 @@ static int grep_tree(struct grep_opt *opt, const struct pathspec *pathspec, while (tree_entry(tree, &entry)) { int te_len = tree_entry_len(&entry); + enum pattern_match_result sparsity_match = 0; if (match != all_entries_interesting) { strbuf_addstr(&name, base->buf + tn_len); @@ -586,6 +657,19 @@ static int grep_tree(struct grep_opt *opt, const struct pathspec *pathspec, strbuf_add(base, entry.path, te_len); + if (sparsity) { + struct strbuf path = STRBUF_INIT; + strbuf_addstr(&path, base->buf + tn_len); + + if (!path_in_sparse_checkout(&path, old_baselen - tn_len, + entry.mode, repo->index, + sparsity, default_sparsity_match, + &sparsity_match)) { + strbuf_setlen(base, old_baselen); + continue; + } + } + if (S_ISREG(entry.mode)) { hit |= grep_oid(opt, &entry.oid, base->buf, tn_len, check_attr ? base->buf + tn_len : NULL); @@ -602,8 +686,8 @@ static int grep_tree(struct grep_opt *opt, const struct pathspec *pathspec, strbuf_addch(base, '/'); init_tree_desc(&sub, data, size); - hit |= grep_tree(opt, pathspec, &sub, base, tn_len, - check_attr); + hit |= do_grep_tree(opt, pathspec, &sub, base, tn_len, + check_attr, sparsity, sparsity_match); free(data); } else if (recurse_submodules && S_ISGITLINK(entry.mode)) { hit |= grep_submodule(opt, pathspec, &entry.oid, @@ -621,6 +705,31 @@ static int grep_tree(struct grep_opt *opt, const struct pathspec *pathspec, return hit; } +/* + * Note: sparsity patterns and paths' attributes will only be considered if + * is_root_tree has true value. (Otherwise, we cannot properly perform pattern + * matching on paths.) 
+ */ +static int grep_tree(struct grep_opt *opt, const struct pathspec *pathspec, + struct tree_desc *tree, struct strbuf *base, int tn_len, + int is_root_tree) +{ + struct pattern_list *patterns = NULL; + int ret; + + if (is_root_tree) + patterns = get_sparsity_patterns(opt->repo); + + ret = do_grep_tree(opt, pathspec, tree, base, tn_len, is_root_tree, + patterns, 0); + + if (patterns) { + clear_pattern_list(patterns); + free(patterns); + } + return ret; +} + static int grep_object(struct grep_opt *opt, const struct pathspec *pathspec, struct object *obj, const char *name, const char *path) { diff --git a/t/t7011-skip-worktree-reading.sh b/t/t7011-skip-worktree-reading.sh index 37525cae3a..26852586ac 100755 --- a/t/t7011-skip-worktree-reading.sh +++ b/t/t7011-skip-worktree-reading.sh @@ -109,15 +109,6 @@ test_expect_success 'ls-files --modified' ' test -z "$(git ls-files -m)" ' -test_expect_success 'grep with skip-worktree file' ' - git update-index --no-skip-worktree 1 && - echo test > 1 && - git update-index 1 && - git update-index --skip-worktree 1 && - rm 1 && - test "$(git grep --no-ext-grep test)" = "1:test" -' - echo ":000000 100644 $ZERO_OID $EMPTY_BLOB A 1" > expected test_expect_success 'diff-index does not examine skip-worktree absent entries' ' setup_absent && diff --git a/t/t7817-grep-sparse-checkout.sh b/t/t7817-grep-sparse-checkout.sh new file mode 100755 index 0000000000..b3109e3479 --- /dev/null +++ b/t/t7817-grep-sparse-checkout.sh @@ -0,0 +1,195 @@ +#!/bin/sh + +test_description='grep in sparse checkout + +This test creates a repo with the following structure: + +. +|-- a +|-- b +|-- dir +| `-- c +|-- sub +| |-- A +| | `-- a +| `-- B +| `-- b +`-- sub2 + `-- a + +Where the outer repository has non-cone mode sparsity patterns, sub is a +submodule with cone mode sparsity patterns and sub2 is a submodule that is +excluded by the superproject sparsity patterns. 
The resulting sparse checkout +should leave the following structure in the working tree: + +. +|-- a +|-- sub +| `-- B +| `-- b +`-- sub2 + `-- a + +But note that sub2 should have the SKIP_WORKTREE bit set. +' + +. ./test-lib.sh + +test_expect_success 'setup' ' + echo "text" >a && + echo "text" >b && + mkdir dir && + echo "text" >dir/c && + + git init sub && + ( + cd sub && + mkdir A B && + echo "text" >A/a && + echo "text" >B/b && + git add A B && + git commit -m sub && + git sparse-checkout init --cone && + git sparse-checkout set B + ) && + + git init sub2 && + ( + cd sub2 && + echo "text" >a && + git add a && + git commit -m sub2 + ) && + + git submodule add ./sub && + git submodule add ./sub2 && + git add a b dir && + git commit -m super && + git sparse-checkout init --no-cone && + git sparse-checkout set "/*" "!b" "!/*/" "sub" && + + git tag -am tag-to-commit tag-to-commit HEAD && + tree=$(git rev-parse HEAD^{tree}) && + git tag -am tag-to-tree tag-to-tree $tree && + + test_path_is_missing b && + test_path_is_missing dir && + test_path_is_missing sub/A && + test_path_is_file a && + test_path_is_file sub/B/b && + test_path_is_file sub2/a +' + +# The test below checks a special case: the sparsity patterns exclude '/b' +# and sparse checkout is enabled, but the path exists in the working tree (e.g. +# manually created after `git sparse-checkout init`). In this case, grep should +# skip it. 
+test_expect_success 'grep in working tree should honor sparse checkout' ' + cat >expect <<-EOF && + a:text + EOF + echo "new-text" >b && + test_when_finished "rm b" && + git grep "text" >actual && + test_cmp expect actual +' + +test_expect_success 'grep unmerged file despite not matching sparsity patterns' ' + cat >expect <<-EOF && + b:modified-b-in-branchX + b:modified-b-in-branchY + EOF + test_when_finished "test_might_fail git merge --abort && \ + git checkout master" && + + git sparse-checkout disable && + git checkout -b branchY master && + test_commit modified-b-in-branchY b && + git checkout -b branchX master && + test_commit modified-b-in-branchX b && + + git sparse-checkout init && + test_path_is_missing b && + test_must_fail git merge branchY && + git grep "modified-b" >actual && + test_cmp expect actual +' + +test_expect_success 'grep --cached should honor sparse checkout' ' + cat >expect <<-EOF && + a:text + EOF + git grep --cached "text" >actual && + test_cmp expect actual +' + +test_expect_success 'grep <commit-ish> should honor sparse checkout' ' + commit=$(git rev-parse HEAD) && + cat >expect_commit <<-EOF && + $commit:a:text + EOF + cat >expect_tag-to-commit <<-EOF && + tag-to-commit:a:text + EOF + git grep "text" $commit >actual_commit && + test_cmp expect_commit actual_commit && + git grep "text" tag-to-commit >actual_tag-to-commit && + test_cmp expect_tag-to-commit actual_tag-to-commit +' + +test_expect_success 'grep <tree-ish> should ignore sparsity patterns' ' + commit=$(git rev-parse HEAD) && + tree=$(git rev-parse HEAD^{tree}) && + cat >expect_tree <<-EOF && + $tree:a:text + $tree:b:text + $tree:dir/c:text + EOF + cat >expect_tag-to-tree <<-EOF && + tag-to-tree:a:text + tag-to-tree:b:text + tag-to-tree:dir/c:text + EOF + git grep "text" $tree >actual_tree && + test_cmp expect_tree actual_tree && + git grep "text" tag-to-tree >actual_tag-to-tree && + test_cmp expect_tag-to-tree actual_tag-to-tree +' + +# Note that sub2/ is present in the 
worktree but it is excluded by the sparsity +# patterns, so grep should not recurse into it. +test_expect_success 'grep --recurse-submodules should honor sparse checkout in submodule' ' + cat >expect <<-EOF && + a:text + sub/B/b:text + EOF + git grep --recurse-submodules "text" >actual && + test_cmp expect actual +' + +test_expect_success 'grep --recurse-submodules --cached should honor sparse checkout in submodule' ' + cat >expect <<-EOF && + a:text + sub/B/b:text + EOF + git grep --recurse-submodules --cached "text" >actual && + test_cmp expect actual +' + +test_expect_success 'grep --recurse-submodules <commit-ish> should honor sparse checkout in submodule' ' + commit=$(git rev-parse HEAD) && + cat >expect_commit <<-EOF && + $commit:a:text + $commit:sub/B/b:text + EOF + cat >expect_tag-to-commit <<-EOF && + tag-to-commit:a:text + tag-to-commit:sub/B/b:text + EOF + git grep --recurse-submodules "text" $commit >actual_commit && + test_cmp expect_commit actual_commit && + git grep --recurse-submodules "text" tag-to-commit >actual_tag-to-commit && + test_cmp expect_tag-to-commit actual_tag-to-commit +' + +test_done -- 2.28.0 ^ permalink raw reply related [flat|nested] 123+ messages in thread
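The per-entry decision made by path_in_sparse_checkout() in patch 8/9 can be sketched without the pattern-matching machinery. The code below is a simplified model, not the patch itself: the enum values and the function toy_entry_in_sparse_checkout() are hypothetical, and the result of matching the entry against the patterns is passed in as own_match rather than computed by path_matches_pattern_list().

```c
/* Toy copy of the match results; the numeric values are assumptions. */
enum toy_match {
	TOY_UNDECIDED = -1,
	TOY_NOT_MATCHED = 0,
	TOY_MATCHED = 1,
	TOY_MATCHED_RECURSIVE = 2
};

/*
 * Decide whether a tree entry should be searched (files) or descended
 * into (directories).
 *
 * own_match:    result of matching this entry alone against the patterns
 * parent_match: result already computed for the containing directory
 * is_dir:       non-zero if the entry is a directory
 * cone_mode:    non-zero if the patterns are cone-mode patterns
 * result:       receives the match result to hand down to children
 *
 * Returns 1 to keep the entry, 0 to skip it.
 */
int toy_entry_in_sparse_checkout(enum toy_match own_match,
				 enum toy_match parent_match,
				 int is_dir, int cone_mode,
				 enum toy_match *result)
{
	/* A recursively matched parent covers everything beneath it. */
	if (parent_match == TOY_MATCHED_RECURSIVE) {
		*result = parent_match;
		return 1;
	}

	/* With no opinion of its own, the entry inherits from its parent. */
	*result = (own_match == TOY_UNDECIDED) ? parent_match : own_match;

	/*
	 * An unmatched file is skipped. An unmatched directory is skipped
	 * only in cone mode; in non-cone mode a pattern may still re-include
	 * something deeper, so the directory has to be walked.
	 */
	if (*result == TOY_NOT_MATCHED && (!is_dir || cone_mode))
		return 0;

	return 1;
}
```

The asymmetry between files and directories is the reason do_grep_tree() passes the computed match down as default_sparsity_match when it recurses.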
* [PATCH v6 9/9] config: add setting to ignore sparsity patterns in some cmds 2020-09-10 17:21 ` [PATCH v6 0/9] grep: honor sparse checkout and add option to ignore it Matheus Tavares ` (7 preceding siblings ...) 2020-09-10 17:21 ` [PATCH v6 8/9] grep: honor sparse checkout patterns Matheus Tavares @ 2020-09-10 17:21 ` Matheus Tavares 2021-02-09 21:33 ` [PATCH v7] grep: honor sparse-checkout on working tree searches Matheus Tavares 9 siblings, 0 replies; 123+ messages in thread From: Matheus Tavares @ 2020-09-10 17:21 UTC (permalink / raw) To: git; +Cc: gitster, stolee, newren, jonathantanmy, jrnieder, sunshine When sparse checkout is enabled, some users expect the output of certain commands (such as grep, diff, and log) to also be restricted to the sparsity patterns. This would allow them to effectively work only on the subset of files in which they are interested, and would allow some commands to possibly perform better by not considering uninteresting paths. For this reason, we taught grep to honor the sparsity patterns in the previous patch. On the other hand, allowing grep and the other commands mentioned to optionally ignore the patterns also makes for some interesting use cases, for example using grep to search for a function's documentation that resides outside the sparse checkout. In any case, there is currently no way for users to configure the behavior they want for these commands. Aiming to provide this flexibility, let's introduce the sparse.restrictCmds setting (and the analogous --[no-]restrict-to-sparse-paths global option). The default value is true. For now, grep is the only command affected by this setting, but the goal is to support more commands in the future. 
Helped-by: Elijah Newren <newren@gmail.com> Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br> --- Documentation/config.txt | 2 + Documentation/config/grep.txt | 8 ++ Documentation/config/sparse.txt | 20 ++++ Documentation/git.txt | 4 + Makefile | 1 + builtin/grep.c | 13 ++- contrib/completion/git-completion.bash | 2 + git.c | 5 + sparse-checkout.c | 18 ++++ sparse-checkout.h | 11 +++ t/t7817-grep-sparse-checkout.sh | 132 ++++++++++++++++++++++++- t/t9902-completion.sh | 4 +- 12 files changed, 214 insertions(+), 6 deletions(-) create mode 100644 Documentation/config/sparse.txt create mode 100644 sparse-checkout.c create mode 100644 sparse-checkout.h diff --git a/Documentation/config.txt b/Documentation/config.txt index 3042d80978..3b6e0901b8 100644 --- a/Documentation/config.txt +++ b/Documentation/config.txt @@ -438,6 +438,8 @@ include::config/sequencer.txt[] include::config/showbranch.txt[] +include::config/sparse.txt[] + include::config/splitindex.txt[] include::config/ssh.txt[] diff --git a/Documentation/config/grep.txt b/Documentation/config/grep.txt index dd51db38e1..a3275ab4b7 100644 --- a/Documentation/config/grep.txt +++ b/Documentation/config/grep.txt @@ -28,3 +28,11 @@ grep.fullName:: grep.fallbackToNoIndex:: If set to true, fall back to git grep --no-index if git grep is executed outside of a git repository. Defaults to false. + +ifdef::git-grep[] +sparse.restrictCmds:: + See base definition in linkgit:git-config[1]. grep honors + sparse.restrictCmds by limiting searches to the sparsity paths in three + cases: when searching the working tree, when searching the index with + --cached, and when searching a specified commit. +endif::git-grep[] diff --git a/Documentation/config/sparse.txt b/Documentation/config/sparse.txt new file mode 100644 index 0000000000..494761526e --- /dev/null +++ b/Documentation/config/sparse.txt @@ -0,0 +1,20 @@ +sparse.restrictCmds:: + Only meaningful in conjunction with core.sparseCheckout. 
This option + extends sparse checkouts (which limit which paths are written to the + working tree), so that output and operations are also limited to the + sparsity paths where possible and implemented. The purpose of this + option is to (1) focus output for the user on the portion of the + repository that is of interest to them, and (2) enable potentially + dramatic performance improvements, especially in conjunction with + partial clones. ++ +When this option is true (default), some git commands may limit their behavior +to the paths specified by the sparsity patterns, or to the intersection of +those paths and any (like `*.c`) that the user might also specify on the +command line. When false, the affected commands will work on full trees, +ignoring the sparsity patterns. For now, only git-grep honors this setting. ++ +Note: commands which export, integrity check, or create history will always +operate on full trees (e.g. fast-export, format-patch, fsck, commit, etc.), +unaffected by any sparsity patterns. Also, writing commands such as +sparse-checkout and read-tree will not be affected by this configuration. diff --git a/Documentation/git.txt b/Documentation/git.txt index 2fc92586b5..d857509573 100644 --- a/Documentation/git.txt +++ b/Documentation/git.txt @@ -180,6 +180,10 @@ If you just want to run git as if it was started in `<path>` then use Do not perform optional operations that require locks. This is equivalent to setting the `GIT_OPTIONAL_LOCKS` to `0`. +--[no-]restrict-to-sparse-paths:: + Overrides the sparse.restrictCmds configuration (see + linkgit:git-config[1]) for this execution. + --list-cmds=group[,group...]:: List commands by group. This is an internal/experimental option and may change or be removed in the future. 
Supported diff --git a/Makefile b/Makefile index 65f8cfb236..778c9e499e 100644 --- a/Makefile +++ b/Makefile @@ -982,6 +982,7 @@ LIB_OBJS += sha1-name.o LIB_OBJS += shallow.o LIB_OBJS += sideband.o LIB_OBJS += sigchain.o +LIB_OBJS += sparse-checkout.o LIB_OBJS += split-index.o LIB_OBJS += stable-qsort.o LIB_OBJS += strbuf.o diff --git a/builtin/grep.c b/builtin/grep.c index a32815de0a..3fa364e91c 100644 --- a/builtin/grep.c +++ b/builtin/grep.c @@ -25,6 +25,7 @@ #include "submodule-config.h" #include "object-store.h" #include "packfile.h" +#include "sparse-checkout.h" static char const * const grep_usage[] = { N_("git grep [<options>] [-e] <pattern> [<rev>...] [[--] <path>...]"), @@ -498,6 +499,7 @@ static int grep_cache(struct grep_opt *opt, int nr; struct strbuf name = STRBUF_INIT; int name_base_len = 0; + int sparse_paths_only = restrict_to_sparse_paths(repo); if (repo->submodule_prefix) { name_base_len = strlen(repo->submodule_prefix); strbuf_addstr(&name, repo->submodule_prefix); @@ -509,7 +511,7 @@ static int grep_cache(struct grep_opt *opt, for (nr = 0; nr < repo->index->cache_nr; nr++) { const struct cache_entry *ce = repo->index->cache[nr]; - if (ce_skip_worktree(ce)) + if (sparse_paths_only && ce_skip_worktree(ce)) continue; strbuf_setlen(&name, name_base_len); @@ -715,9 +717,10 @@ static int grep_tree(struct grep_opt *opt, const struct pathspec *pathspec, int is_root_tree) { struct pattern_list *patterns = NULL; + int sparse_paths_only = restrict_to_sparse_paths(opt->repo); int ret; - if (is_root_tree) + if (is_root_tree && sparse_paths_only) patterns = get_sparsity_patterns(opt->repo); ret = do_grep_tree(opt, pathspec, tree, base, tn_len, is_root_tree, @@ -1258,6 +1261,12 @@ int cmd_grep(int argc, const char **argv, const char *prefix) if (!use_index || untracked) { int use_exclude = (opt_exclude < 0) ? 
use_index : !!opt_exclude; + + if (opt_restrict_to_sparse_paths >= 0) { + die(_("--[no-]restrict-to-sparse-paths is incompatible" + " with --no-index and --untracked")); + } + hit = grep_directory(&opt, &pathspec, use_exclude, use_index); } else if (0 <= opt_exclude) { die(_("--[no-]exclude-standard cannot be used for tracked contents")); diff --git a/contrib/completion/git-completion.bash b/contrib/completion/git-completion.bash index 9147fba3d5..de12766a70 100644 --- a/contrib/completion/git-completion.bash +++ b/contrib/completion/git-completion.bash @@ -3402,6 +3402,8 @@ __git_main () --namespace= --no-replace-objects --help + --restrict-to-sparse-paths + --no-restrict-to-sparse-paths " ;; *) diff --git a/git.c b/git.c index 8bd1d7551d..81206b424c 100644 --- a/git.c +++ b/git.c @@ -5,6 +5,7 @@ #include "run-command.h" #include "alias.h" #include "shallow.h" +#include "sparse-checkout.h" #define RUN_SETUP (1<<0) #define RUN_SETUP_GENTLY (1<<1) @@ -311,6 +312,10 @@ static int handle_options(const char ***argv, int *argc, int *envchanged) } else { exit(list_cmds(cmd)); } + } else if (!strcmp(cmd, "--restrict-to-sparse-paths")) { + opt_restrict_to_sparse_paths = 1; + } else if (!strcmp(cmd, "--no-restrict-to-sparse-paths")) { + opt_restrict_to_sparse_paths = 0; } else { fprintf(stderr, _("unknown option: %s\n"), cmd); usage(git_usage_string); diff --git a/sparse-checkout.c b/sparse-checkout.c new file mode 100644 index 0000000000..96c5ed5446 --- /dev/null +++ b/sparse-checkout.c @@ -0,0 +1,18 @@ +#include "cache.h" +#include "config.h" +#include "sparse-checkout.h" + +int opt_restrict_to_sparse_paths = -1; + +int restrict_to_sparse_paths(struct repository *repo) +{ + int ret; + + if (opt_restrict_to_sparse_paths >= 0) + return opt_restrict_to_sparse_paths; + + if (repo_config_get_bool(repo, "sparse.restrictcmds", &ret)) + ret = 1; + + return ret; +} diff --git a/sparse-checkout.h b/sparse-checkout.h new file mode 100644 index 0000000000..a4805e443a --- /dev/null 
+++ b/sparse-checkout.h @@ -0,0 +1,11 @@ +#ifndef SPARSE_CHECKOUT_H +#define SPARSE_CHECKOUT_H + +struct repository; + +extern int opt_restrict_to_sparse_paths; + +/* Whether or not cmds should restrict behavior on sparse paths, in this repo */ +int restrict_to_sparse_paths(struct repository *repo); + +#endif /* SPARSE_CHECKOUT_H */ diff --git a/t/t7817-grep-sparse-checkout.sh b/t/t7817-grep-sparse-checkout.sh index b3109e3479..f93a4f71d1 100755 --- a/t/t7817-grep-sparse-checkout.sh +++ b/t/t7817-grep-sparse-checkout.sh @@ -80,10 +80,10 @@ test_expect_success 'setup' ' test_path_is_file sub2/a ' -# The test below checks a special case: the sparsity patterns exclude '/b' +# The two tests below check a special case: the sparsity patterns exclude '/b' # and sparse checkout is enabled, but the path exists in the working tree (e.g. # manually created after `git sparse-checkout init`). In this case, grep should -# skip it. +# skip the file by default, but not with --no-restrict-to-sparse-paths. test_expect_success 'grep in working tree should honor sparse checkout' ' cat >expect <<-EOF && a:text @@ -93,6 +93,16 @@ test_expect_success 'grep in working tree should honor sparse checkout' ' git grep "text" >actual && test_cmp expect actual ' +test_expect_success 'grep w/ --no-restrict-to-sparse-paths for sparsely excluded but present paths' ' + cat >expect <<-EOF && + a:text + b:new-text + EOF + echo "new-text" >b && + test_when_finished "rm b" && + git --no-restrict-to-sparse-paths grep "text" >actual && + test_cmp expect actual +' test_expect_success 'grep unmerged file despite not matching sparsity patterns' ' cat >expect <<-EOF && @@ -157,7 +167,7 @@ test_expect_success 'grep <tree-ish> should ignore sparsity patterns' ' ' # Note that sub2/ is present in the worktree but it is excluded by the sparsity -# patterns, so grep should not recurse into it. +# patterns, so grep should only recurse into it with --no-restrict-to-sparse-paths. 
test_expect_success 'grep --recurse-submodules should honor sparse checkout in submodule' ' cat >expect <<-EOF && a:text @@ -166,6 +176,15 @@ test_expect_success 'grep --recurse-submodules should honor sparse checkout in s git grep --recurse-submodules "text" >actual && test_cmp expect actual ' +test_expect_success 'grep --recurse-submodules should search in excluded submodules w/ --no-restrict-to-sparse-paths' ' + cat >expect <<-EOF && + a:text + sub/B/b:text + sub2/a:text + EOF + git --no-restrict-to-sparse-paths grep --recurse-submodules "text" >actual && + test_cmp expect actual +' test_expect_success 'grep --recurse-submodules --cached should honor sparse checkout in submodule' ' cat >expect <<-EOF && @@ -192,4 +211,111 @@ test_expect_success 'grep --recurse-submodules <commit-ish> should honor sparse test_cmp expect_tag-to-commit actual_tag-to-commit ' +for cmd in 'git --no-restrict-to-sparse-paths grep' \ + 'git -c sparse.restrictCmds=false grep' \ + 'git -c sparse.restrictCmds=true --no-restrict-to-sparse-paths grep' +do + + test_expect_success "$cmd --cached should ignore sparsity patterns" ' + cat >expect <<-EOF && + a:text + b:text + dir/c:text + EOF + $cmd --cached "text" >actual && + test_cmp expect actual + ' + + test_expect_success "$cmd <commit-ish> should ignore sparsity patterns" ' + commit=$(git rev-parse HEAD) && + cat >expect_commit <<-EOF && + $commit:a:text + $commit:b:text + $commit:dir/c:text + EOF + cat >expect_tag-to-commit <<-EOF && + tag-to-commit:a:text + tag-to-commit:b:text + tag-to-commit:dir/c:text + EOF + $cmd "text" $commit >actual_commit && + test_cmp expect_commit actual_commit && + $cmd "text" tag-to-commit >actual_tag-to-commit && + test_cmp expect_tag-to-commit actual_tag-to-commit + ' +done + +test_expect_success 'grep --recurse-submodules --cached w/ --no-restrict-to-sparse-paths' ' + cat >expect <<-EOF && + a:text + b:text + dir/c:text + sub/A/a:text + sub/B/b:text + sub2/a:text + EOF + git --no-restrict-to-sparse-paths 
grep --recurse-submodules --cached \ + "text" >actual && + test_cmp expect actual +' + +test_expect_success 'grep --recurse-submodules <commit-ish> w/ --no-restrict-to-sparse-paths' ' + commit=$(git rev-parse HEAD) && + cat >expect_commit <<-EOF && + $commit:a:text + $commit:b:text + $commit:dir/c:text + $commit:sub/A/a:text + $commit:sub/B/b:text + $commit:sub2/a:text + EOF + cat >expect_tag-to-commit <<-EOF && + tag-to-commit:a:text + tag-to-commit:b:text + tag-to-commit:dir/c:text + tag-to-commit:sub/A/a:text + tag-to-commit:sub/B/b:text + tag-to-commit:sub2/a:text + EOF + git --no-restrict-to-sparse-paths grep --recurse-submodules "text" \ + $commit >actual_commit && + test_cmp expect_commit actual_commit && + git --no-restrict-to-sparse-paths grep --recurse-submodules "text" \ + tag-to-commit >actual_tag-to-commit && + test_cmp expect_tag-to-commit actual_tag-to-commit +' + +test_expect_success 'should respect the sparse.restrictCmds values from submodules' ' + cat >expect <<-EOF && + a:text + sub/A/a:text + sub/B/b:text + EOF + test_config -C sub sparse.restrictCmds false && + git grep --cached --recurse-submodules "text" >actual && + test_cmp expect actual +' + +test_expect_success 'should propagate --[no]-restrict-to-sparse-paths to submodules' ' + cat >expect <<-EOF && + a:text + b:text + dir/c:text + sub/A/a:text + sub/B/b:text + sub2/a:text + EOF + test_config -C sub sparse.restrictCmds true && + git --no-restrict-to-sparse-paths grep --cached --recurse-submodules "text" >actual && + test_cmp expect actual +' + +for opt in '--untracked' '--no-index' +do + test_expect_success "--[no]-restrict-to-sparse-paths and $opt are incompatible" " + test_must_fail git --restrict-to-sparse-paths grep $opt . 
2>actual && + test_i18ngrep 'restrict-to-sparse-paths is incompatible with' actual + " +done + test_done diff --git a/t/t9902-completion.sh b/t/t9902-completion.sh index 8425b9a531..a8c2ac9d70 100755 --- a/t/t9902-completion.sh +++ b/t/t9902-completion.sh @@ -1928,6 +1928,8 @@ test_expect_success 'double dash "git" itself' ' --namespace= --no-replace-objects Z --help Z + --restrict-to-sparse-paths Z + --no-restrict-to-sparse-paths Z EOF ' @@ -1970,7 +1972,7 @@ test_expect_success 'general options' ' test_completion "git --nam" "--namespace=" && test_completion "git --bar" "--bare " && test_completion "git --inf" "--info-path " && - test_completion "git --no-r" "--no-replace-objects " + test_completion "git --no-rep" "--no-replace-objects " ' test_expect_success 'general options plus command' ' -- 2.28.0 ^ permalink raw reply related [flat|nested] 123+ messages in thread
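The precedence that patch 9/9 sets up in sparse-checkout.c (command-line option first, then the sparse.restrictCmds configuration, then a default of true) can be sketched in isolation. This is only a minimal illustration, not the patch itself: `struct sketch_config` and `sketch_config_get_bool()` are hypothetical stand-ins for the repository configuration and for `repo_config_get_bool()`, assumed here to return nonzero when the key is unset.

```c
/*
 * Tri-state command-line override, as in the patch:
 *   -1 = option not given
 *    0 = --no-restrict-to-sparse-paths
 *    1 = --restrict-to-sparse-paths
 */
static int opt_restrict_to_sparse_paths = -1;

/* Hypothetical stand-in for a repository's sparse.restrictCmds setting. */
struct sketch_config {
	int is_set; /* nonzero when sparse.restrictCmds is configured */
	int value;  /* the configured boolean, meaningful only if is_set */
};

/*
 * Hypothetical stand-in for repo_config_get_bool(): returns 0 and fills
 * *out when the key is set, and nonzero when it is not.
 */
int sketch_config_get_bool(const struct sketch_config *cfg, int *out)
{
	if (!cfg->is_set)
		return 1;
	*out = cfg->value;
	return 0;
}

/* Same decision order as restrict_to_sparse_paths() in the patch. */
int restrict_to_sparse_paths(const struct sketch_config *cfg)
{
	int ret;

	/* 1. An explicit command-line option always wins. */
	if (opt_restrict_to_sparse_paths >= 0)
		return opt_restrict_to_sparse_paths;

	/* 2. Otherwise use the config value, 3. defaulting to true. */
	if (sketch_config_get_bool(cfg, &ret))
		ret = 1;

	return ret;
}
```

Because the override lives in a single global while the configuration is looked up per repository, a submodule with its own sparse.restrictCmds value is consulted only when no command-line option was given, which is the behavior the last two submodule tests in the patch check.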
* [PATCH v7] grep: honor sparse-checkout on working tree searches 2020-09-10 17:21 ` [PATCH v6 0/9] grep: honor sparse checkout and add option to ignore it Matheus Tavares ` (8 preceding siblings ...) 2020-09-10 17:21 ` [PATCH v6 9/9] config: add setting to ignore sparsity patterns in some cmds Matheus Tavares @ 2021-02-09 21:33 ` Matheus Tavares 2021-02-09 23:23 ` Junio C Hamano 9 siblings, 1 reply; 123+ messages in thread From: Matheus Tavares @ 2021-02-09 21:33 UTC (permalink / raw) To: git; +Cc: gitster, stolee, newren In a sparsely checked out repository, `git grep` (without --cached) ends up searching the cache when an entry matches the search pathspec and has the SKIP_WORKTREE bit set. This is confusing both because the sparse paths are not expected to be in a working tree search (as they are not checked out), and because the output mixes working tree and cache results without distinguishing them. (Note that grep also resorts to the cache on working tree searches that include --assume-unchanged paths. But the whole point in that case is to assume that the contents of the index entry and the file are the same. This does not apply to the case of sparse paths, where the file isn't even expected to be present.) Fix that by teaching grep to honor the sparse-checkout rules for working tree searches. If the user wants to grep paths outside the current sparse-checkout definition, they may either update the sparsity rules to materialize the files, or use --cached to search all blobs registered in the index. Note: it might also be interesting to add a configuration option that allows users to search paths that are present despite having the SKIP_WORKTREE bit set, and/or to restrict searches in the index and past revisions too. These ideas are left as future improvements to avoid conflicting with other sparse-checkout topics currently in flight. 
Suggested-by: Elijah Newren <newren@gmail.com> Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br> --- This new version includes only the bug fix for the working tree grep, as discussed in [1]. I think there are a couple other patches that could be extracted from the previous v6 [2] and sent as standalone topics, without the risk of conflicting with the sparse-index work. E.g. the unification of the git-grep.txt and config/grep.txt doc files. I'll look into that tomorrow. [1]: https://lore.kernel.org/git/CABPp-BHwNoVnooqDFPAsZxBT9aR5Dwk5D9sDRCvYSb8akxAJgA@mail.gmail.com/ [2]: https://lore.kernel.org/git/cover.1599758167.git.matheus.bernardino@usp.br/ builtin/grep.c | 7 +- t/t7011-skip-worktree-reading.sh | 9 -- t/t7817-grep-sparse-checkout.sh | 174 +++++++++++++++++++++++++++++++ 3 files changed, 179 insertions(+), 11 deletions(-) create mode 100755 t/t7817-grep-sparse-checkout.sh diff --git a/builtin/grep.c b/builtin/grep.c index ca259af441..bff214c755 100644 --- a/builtin/grep.c +++ b/builtin/grep.c @@ -508,6 +508,10 @@ static int grep_cache(struct grep_opt *opt, for (nr = 0; nr < repo->index->cache_nr; nr++) { const struct cache_entry *ce = repo->index->cache[nr]; + + if (!cached && ce_skip_worktree(ce)) + continue; + strbuf_setlen(&name, name_base_len); strbuf_addstr(&name, ce->name); @@ -520,8 +524,7 @@ static int grep_cache(struct grep_opt *opt, * cache entry are identical, even if worktree file has * been modified, so use cache version instead */ - if (cached || (ce->ce_flags & CE_VALID) || - ce_skip_worktree(ce)) { + if (cached || (ce->ce_flags & CE_VALID)) { if (ce_stage(ce) || ce_intent_to_add(ce)) continue; hit |= grep_oid(opt, &ce->oid, name.buf, diff --git a/t/t7011-skip-worktree-reading.sh b/t/t7011-skip-worktree-reading.sh index 37525cae3a..26852586ac 100755 --- a/t/t7011-skip-worktree-reading.sh +++ b/t/t7011-skip-worktree-reading.sh @@ -109,15 +109,6 @@ test_expect_success 'ls-files --modified' ' test -z "$(git ls-files -m)" ' 
-test_expect_success 'grep with skip-worktree file' ' - git update-index --no-skip-worktree 1 && - echo test > 1 && - git update-index 1 && - git update-index --skip-worktree 1 && - rm 1 && - test "$(git grep --no-ext-grep test)" = "1:test" -' - echo ":000000 100644 $ZERO_OID $EMPTY_BLOB A 1" > expected test_expect_success 'diff-index does not examine skip-worktree absent entries' ' setup_absent && diff --git a/t/t7817-grep-sparse-checkout.sh b/t/t7817-grep-sparse-checkout.sh new file mode 100755 index 0000000000..590b99bbb6 --- /dev/null +++ b/t/t7817-grep-sparse-checkout.sh @@ -0,0 +1,174 @@ +#!/bin/sh + +test_description='grep in sparse checkout + +This test creates a repo with the following structure: + +. +|-- a +|-- b +|-- dir +| `-- c +|-- sub +| |-- A +| | `-- a +| `-- B +| `-- b +`-- sub2 + `-- a + +Where the outer repository has non-cone mode sparsity patterns, sub is a +submodule with cone mode sparsity patterns and sub2 is a submodule that is +excluded by the superproject sparsity patterns. The resulting sparse checkout +should leave the following structure in the working tree: + +. +|-- a +|-- sub +| `-- B +| `-- b +`-- sub2 + `-- a + +But note that sub2 should have the SKIP_WORKTREE bit set. +' + +. 
./test-lib.sh + +test_expect_success 'setup' ' + echo "text" >a && + echo "text" >b && + mkdir dir && + echo "text" >dir/c && + + git init sub && + ( + cd sub && + mkdir A B && + echo "text" >A/a && + echo "text" >B/b && + git add A B && + git commit -m sub && + git sparse-checkout init --cone && + git sparse-checkout set B + ) && + + git init sub2 && + ( + cd sub2 && + echo "text" >a && + git add a && + git commit -m sub2 + ) && + + git submodule add ./sub && + git submodule add ./sub2 && + git add a b dir && + git commit -m super && + git sparse-checkout init --no-cone && + git sparse-checkout set "/*" "!b" "!/*/" "sub" && + + git tag -am tag-to-commit tag-to-commit HEAD && + tree=$(git rev-parse HEAD^{tree}) && + git tag -am tag-to-tree tag-to-tree $tree && + + test_path_is_missing b && + test_path_is_missing dir && + test_path_is_missing sub/A && + test_path_is_file a && + test_path_is_file sub/B/b && + test_path_is_file sub2/a && + git branch -m main +' + +# The test below covers a special case: the sparsity patterns exclude '/b' and +# sparse checkout is enabled, but the path exists in the working tree (e.g. +# manually created after `git sparse-checkout init`). git grep should skip it. 
+test_expect_success 'working tree grep honors sparse checkout' ' + cat >expect <<-EOF && + a:text + EOF + test_when_finished "rm -f b" && + echo "new-text" >b && + git grep "text" >actual && + test_cmp expect actual +' + +test_expect_success 'grep searches unmerged file despite not matching sparsity patterns' ' + cat >expect <<-EOF && + b:modified-b-in-branchX + b:modified-b-in-branchY + EOF + test_when_finished "test_might_fail git merge --abort && \ + git checkout main && git sparse-checkout init" && + + git sparse-checkout disable && + git checkout -b branchY main && + test_commit modified-b-in-branchY b && + git checkout -b branchX main && + test_commit modified-b-in-branchX b && + + git sparse-checkout init && + test_path_is_missing b && + test_must_fail git merge branchY && + git grep "modified-b" >actual && + test_cmp expect actual +' + +test_expect_success 'grep --cached searches entries with the SKIP_WORKTREE bit' ' + cat >expect <<-EOF && + a:text + b:text + dir/c:text + EOF + git grep --cached "text" >actual && + test_cmp expect actual +' + +# Note that sub2/ is present in the worktree but it is excluded by the sparsity +# patterns, so grep should not recurse into it. 
+test_expect_success 'grep --recurse-submodules honors sparse checkout in submodule' ' + cat >expect <<-EOF && + a:text + sub/B/b:text + EOF + git grep --recurse-submodules "text" >actual && + test_cmp expect actual +' + +test_expect_success 'grep --recurse-submodules --cached searches entries with the SKIP_WORKTREE bit' ' + cat >expect <<-EOF && + a:text + b:text + dir/c:text + sub/A/a:text + sub/B/b:text + sub2/a:text + EOF + git grep --recurse-submodules --cached "text" >actual && + test_cmp expect actual +' + +test_expect_success 'working tree grep does not search the index with CE_VALID and SKIP_WORKTREE' ' + cat >expect <<-EOF && + a:text + EOF + test_when_finished "git update-index --no-assume-unchanged b" && + git update-index --assume-unchanged b && + git grep text >actual && + test_cmp expect actual +' + +test_expect_success 'grep --cached searches index entries with both CE_VALID and SKIP_WORKTREE' ' + cat >expect <<-EOF && + a:text + b:text + dir/c:text + EOF + test_when_finished "git update-index --no-assume-unchanged b" && + git update-index --assume-unchanged b && + git grep --cached text >actual && + test_cmp expect actual +' + +test_done -- 2.29.2 ^ permalink raw reply related [flat|nested] 123+ messages in thread
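The per-entry decision that the v7 change to grep_cache() makes can be sketched as a small standalone function. This is a simplified illustration under stated assumptions, not the real code: `struct entry_flags`, `enum grep_source_kind` and `pick_source()` are hypothetical names, the `assume_unchanged` field stands in for the CE_VALID flag, and the handling of unmerged and intent-to-add entries is left out.

```c
/* Where a single index entry's contents are read from (hypothetical). */
enum grep_source_kind {
	GREP_SKIP,          /* entry is not searched at all */
	GREP_INDEX_BLOB,    /* search the blob registered in the index */
	GREP_WORKTREE_FILE  /* search the file in the working tree */
};

/* Hypothetical, reduced view of the cache-entry flags that matter here. */
struct entry_flags {
	int skip_worktree;    /* SKIP_WORKTREE bit (sparse path) */
	int assume_unchanged; /* CE_VALID bit (--assume-unchanged) */
};

/*
 * Mirrors the order of checks after the patch: a working tree search
 * (cached == 0) skips sparse entries before any other flag is looked at,
 * so SKIP_WORKTREE takes priority over CE_VALID.
 */
enum grep_source_kind pick_source(int cached, const struct entry_flags *e)
{
	if (!cached && e->skip_worktree)
		return GREP_SKIP;

	/* --cached, or an entry assumed identical to its worktree file. */
	if (cached || e->assume_unchanged)
		return GREP_INDEX_BLOB;

	return GREP_WORKTREE_FILE;
}
```

Before the patch, the skip-worktree check was part of the second condition, so a sparse entry fell back to the index blob during a working tree search. The last two tests in t7817 cover the case where both bits are set.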
* Re: [PATCH v7] grep: honor sparse-checkout on working tree searches 2021-02-09 21:33 ` [PATCH v7] grep: honor sparse-checkout on working tree searches Matheus Tavares @ 2021-02-09 23:23 ` Junio C Hamano 2021-02-10 6:12 ` Elijah Newren 0 siblings, 1 reply; 123+ messages in thread From: Junio C Hamano @ 2021-02-09 23:23 UTC (permalink / raw) To: Matheus Tavares; +Cc: git, stolee, newren Matheus Tavares <matheus.bernardino@usp.br> writes: > This new version includes only the bug fix for the working tree grep, as > discussed in [1]. I think there are a couple other patches that could be > extracted from the previous v6 [2] and sent as standalone topics, > without the risk of conflicting with the sparse-index work. E.g. the > unification of the git-grep.txt and config/grep.txt doc files. I'll look > into that tomorrow. As mt/rm-sparse-checkout depends on the v6 that you are ejecting with this patch, I'll stop merging that topic to 'seen' for now. Thanks for keeping an eye on this one. ^ permalink raw reply [flat|nested] 123+ messages in thread
* Re: [PATCH v7] grep: honor sparse-checkout on working tree searches 2021-02-09 23:23 ` Junio C Hamano @ 2021-02-10 6:12 ` Elijah Newren 0 siblings, 0 replies; 123+ messages in thread From: Elijah Newren @ 2021-02-10 6:12 UTC (permalink / raw) To: Junio C Hamano; +Cc: Matheus Tavares, Git Mailing List, Derrick Stolee On Tue, Feb 9, 2021 at 3:23 PM Junio C Hamano <gitster@pobox.com> wrote: > > Matheus Tavares <matheus.bernardino@usp.br> writes: > > > This new version includes only the bug fix for the working tree grep, as > > discussed in [1]. I think there are a couple other patches that could be > > extracted from the previous v6 [2] and sent as standalone topics, > > without the risk of conflicting with the sparse-index work. E.g. the > > unification of the git-grep.txt and config/grep.txt doc files. I'll look > > into that tomorrow. > > As mt/rm-sparse-checkout depends on the v6 that you are ejecting > with this patch, I'll stop merging that topic to 'seen' for now. > > Thanks for keeping an eye on this one. Yeah, Matheus and I have been talking about both topics off-list a bit; I hadn't yet responded to his final revision of a newer mt/rm-sparse-checkout but he has a good replacement for that topic now that he'll send to the list soon. Anyway, for the patch submitted here you can add a Reviewed-by: Elijah Newren <newren@gmail.com> ^ permalink raw reply [flat|nested] 123+ messages in thread