From: Shaoxuan Yuan <shaoxuan.yuan02@gmail.com>
To: Derrick Stolee <derrickstolee@github.com>, git@vger.kernel.org
Cc: vdye@github.com
Subject: Re: [PATCH v1 1/2] builtin/grep.c: add --sparse option
Date: Thu, 25 Aug 2022 02:20:27 +0800 [thread overview]
Message-ID: <74ff1a97-fa98-7280-9d84-35dafaf3cb3d@gmail.com> (raw)
In-Reply-To: <80f24382-1188-d450-d1e2-42f68c08e60b@github.com>
Hi reviewrs,
I came back from busying with relocation :)
On 8/17/2022 10:12 PM, Derrick Stolee wrote:
> On 8/17/2022 3:56 AM, Shaoxuan Yuan wrote:
>> Add a --sparse option to `git-grep`. This option is mainly used to:
>>
>> If searching in the index (using --cached):
>>
>> With --sparse, proceed the action when the current cache_entry is
>
> This phrasing is awkward. It might be better to reframe to describe the
> _why_ before the _what_
>
> When the '--cached' option is used with the 'git grep' command, the
> search is limited to the blobs found in the index, not in the worktree.
> If the user has enabled sparse-checkout, this might present more
results
> than they would like, since the files outside of the
sparse-checkout are
> unlikely to be important to them.
>
> Change the default behavior of 'git grep' to focus on the files within
> the sparse-checkout definition. To enable the previous behavior, add a
> '--sparse' option to 'git grep' that triggers the old behavior that
> inspects paths outside of the sparse-checkout definition when paired
> with the '--cached' option.
Good suggestion!
> Or something like that. The documentation updates will also help clarify
> what happens when '--cached' is not included. I assume '--sparse' is
> ignored, but perhaps it _could_ allow looking at the cached files outside
> the sparse-checkout definition, this could make the simpler invocation of
> 'git grep --sparse <pattern>' be the way that users can search after
their
> attempt to search the worktree failed.
This simpler version was in my earlier local branch, but later I
decided not to go with it. I found the difference between these two
approaches, is that "--cached --sparse" is more correct in terms of
how Git actually works (because sparsity is a concept in the index);
and "--sparse" is more comfortable for the end user.
I found the former one better here, because it is more self-explanatory,
and thus more info for the user, i.e. "you are now looking at the
index, and Git will also consider files outside of sparse definition."
To be honest, I don't know which one is "better", but I think I'll
keep the current implementation unless something more convincing shows
up later.
>> marked with SKIP_WORKTREE bit (the default is to skip this kind of
>> entry). Before this patch, --cached itself can realize this action.
>> Adding --sparse here grants the user finer control over sparse
>> entries. If the user only wants to peak into the index without
>
> s/peak/peek/
>
>> caring about sparse entries, --cached should suffice; if the user
>> wants to peak into the index _and_ cares about sparse entries,
>> combining --sparse with --cached can address this need.
>>
>> Suggested-by: Victoria Dye <vdye@github.com>
>> Signed-off-by: Shaoxuan Yuan <shaoxuan.yuan02@gmail.com>
>> ---
>> builtin/grep.c | 10 +++++++++-
>> t/t7817-grep-sparse-checkout.sh | 12 ++++++------
>> 2 files changed, 15 insertions(+), 7 deletions(-)
>
> You mentioned in Slack that you missed the documentation of the --sparse
> option. Just pointing it out here so we don't forget.
Will do.
>>
>> diff --git a/builtin/grep.c b/builtin/grep.c
>> index e6bcdf860c..61402e8084 100644
>> --- a/builtin/grep.c
>> +++ b/builtin/grep.c
>> @@ -96,6 +96,8 @@ static pthread_cond_t cond_result;
>>
>> static int skip_first_line;
>>
>> +static int grep_sparse = 0;
>> +
>
> I initially thought it might be good to not define an additional global,
> but there are many defined in this file outside of the context and they
> are spread out with extra whitespace like this.
>
>> static void add_work(struct grep_opt *opt, struct grep_source *gs)
>> {
>> if (opt->binary != GREP_BINARY_TEXT)
>> @@ -525,7 +527,11 @@ static int grep_cache(struct grep_opt *opt,
>> for (nr = 0; nr < repo->index->cache_nr; nr++) {
>> const struct cache_entry *ce = repo->index->cache[nr];
>>
>> - if (!cached && ce_skip_worktree(ce))
>
> This logic would skip files marked with SKIP_WORKTREE _unless_ --cached
> was provided.
>
>> + /*
>> + * If ce is a SKIP_WORKTREE entry, look into it when both
>> + * --sparse and --cached are given.
>> + */
>> + if (!(grep_sparse && cached) && ce_skip_worktree(ce))
>> continue;
>
> The logic of this if statement is backwards from the comment because a
> true statement means "skip the entry" _not_ "look into it".
>
> /*
> * Skip entries with SKIP_WORKTREE unless both --sparse and
> * --cached are given.
> */
Got it.
> But again, we might want to consider this alternative:
>
> /*
> * Skip entries with SKIP_WORKTREE unless --sparse is given.
> */
> if (!grep_sparse && ce_skip_worktree(ce))
> continue;
>
> This will require further changes below, specifically this bit:
>
> /*
> * If CE_VALID is on, we assume worktree file and its
> * cache entry are identical, even if worktree file has
> * been modified, so use cache version instead
> */
> if (cached || (ce->ce_flags & CE_VALID)) {
> if (ce_stage(ce) || ce_intent_to_add(ce))
> continue;
> hit |= grep_oid(opt, &ce->oid, name.buf,
> 0, name.buf);
> } else {
>
> We need to activate this grep_oid() call also when ce_skip_worktree(c) is
> true. That is, if we want 'git grep --sparse' to extend the search beyond
> the worktree and into the sparse entries.
>
>>
>> strbuf_setlen(&name, name_base_len);
>> @@ -963,6 +969,8 @@ int cmd_grep(int argc, const char **argv, const
char *prefix)
>> PARSE_OPT_NOCOMPLETE),
>> OPT_INTEGER('m', "max-count", &opt.max_count,
>> N_("maximum number of results per file")),
>> + OPT_BOOL(0, "sparse", &grep_sparse,
>> + N_("search sparse contents and expand sparse index")),
>
> This "and expand sparse index" is an internal implementation detail,
not a
> heplful item for the help text. Instead, perhaps:
>
> "search the contents of files outside the sparse-checkout definition"
Sounds good!
> (Also, while the sparse index is being expanded right now, I would expect
> to not expand the sparse index by the end of the series.)
>
>> -test_expect_success 'grep --cached searches entries with the
SKIP_WORKTREE bit' '
>> +test_expect_success 'grep --cached and --sparse searches entries
with the SKIP_WORKTREE bit' '
>> cat >expect <<-EOF &&
>> a:text
>> b:text
>> dir/c:text
>> EOF
>> - git grep --cached "text" >actual &&
>> + git grep --cached --sparse "text" >actual &&
>> test_cmp expect actual
>> '
>
> Please add a test that demonstrates the change of behavior when only
--cached
> is provided, not --sparse.
Sure!
> (If you take my suggestion to allow 'git grep --sparse' to do something
> different, then also add a test for that case.)
>
>>
>> @@ -143,7 +143,7 @@ test_expect_success 'grep --recurse-submodules
honors sparse checkout in submodu
>> test_cmp expect actual
>> '
>>
>> -test_expect_success 'grep --recurse-submodules --cached searches
entries with the SKIP_WORKTREE bit' '
>> +test_expect_success 'grep --recurse-submodules --cached and
--sparse searches entries with the SKIP_WORKTREE bit' '
>> cat >expect <<-EOF &&
>> a:text
>> b:text
>> @@ -152,7 +152,7 @@ test_expect_success 'grep --recurse-submodules
--cached searches entries with th
>> sub/B/b:text
>> sub2/a:text
>> EOF
>> - git grep --recurse-submodules --cached "text" >actual &&
>> + git grep --recurse-submodules --cached --sparse "text" >actual &&
>> test_cmp expect actual
>> '
>> @@ -166,7 +166,7 @@ test_expect_success 'working tree grep does not
search the index with CE_VALID a
>> test_cmp expect actual
>> '
>>
>> -test_expect_success 'grep --cached searches index entries with both
CE_VALID and SKIP_WORKTREE' '
>> +test_expect_success 'grep --cached and --sparse searches index
entries with both CE_VALID and SKIP_WORKTREE' '
>> cat >expect <<-EOF &&
>> a:text
>> b:text
>> @@ -174,7 +174,7 @@ test_expect_success 'grep --cached searches
index entries with both CE_VALID and
>> EOF
>> test_when_finished "git update-index --no-assume-unchanged b" &&
>> git update-index --assume-unchanged b &&
>> - git grep --cached text >actual &&
>> + git grep --cached --sparse text >actual &&
>> test_cmp expect actual
>> '
>
> Same with these two tests. Add additional commands that show the
change of
> behavior when only using '--cached'.
--
Thanks,
Shaoxuan
next prev parent reply other threads:[~2022-08-24 18:20 UTC|newest]
Thread overview: 69+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-08-17 7:56 [PATCH v1 0/2] grep: integrate with sparse index Shaoxuan Yuan
2022-08-17 7:56 ` [PATCH v1 1/2] builtin/grep.c: add --sparse option Shaoxuan Yuan
2022-08-17 14:12 ` Derrick Stolee
2022-08-17 17:13 ` Junio C Hamano
2022-08-17 17:34 ` Victoria Dye
2022-08-17 17:43 ` Derrick Stolee
2022-08-17 18:47 ` Junio C Hamano
2022-08-17 17:37 ` Elijah Newren
2022-08-24 18:20 ` Shaoxuan Yuan [this message]
2022-08-24 19:08 ` Derrick Stolee
2022-08-17 7:56 ` [PATCH v1 2/2] builtin/grep.c: integrate with sparse index Shaoxuan Yuan
2022-08-17 14:23 ` Derrick Stolee
2022-08-24 21:06 ` Shaoxuan Yuan
2022-08-25 0:39 ` Derrick Stolee
2022-08-17 13:46 ` [PATCH v1 0/2] grep: " Derrick Stolee
2022-08-29 23:28 ` [PATCH v2 " Shaoxuan Yuan
2022-08-29 23:28 ` [PATCH v2 1/2] builtin/grep.c: add --sparse option Shaoxuan Yuan
2022-08-29 23:28 ` [PATCH v2 2/2] builtin/grep.c: integrate with sparse index Shaoxuan Yuan
2022-08-30 13:45 ` Derrick Stolee
2022-09-01 4:57 ` [PATCH v3 0/3] grep: " Shaoxuan Yuan
2022-09-01 4:57 ` [PATCH v3 1/3] builtin/grep.c: add --sparse option Shaoxuan Yuan
2022-09-01 4:57 ` [PATCH v3 2/3] builtin/grep.c: integrate with sparse index Shaoxuan Yuan
2022-09-01 4:57 ` [PATCH v3 3/3] builtin/grep.c: walking tree instead of expanding index with --sparse Shaoxuan Yuan
2022-09-01 17:03 ` Derrick Stolee
2022-09-01 18:31 ` Shaoxuan Yuan
2022-09-01 17:17 ` Junio C Hamano
2022-09-01 17:27 ` Junio C Hamano
2022-09-01 22:49 ` Shaoxuan Yuan
2022-09-01 22:36 ` Shaoxuan Yuan
2022-09-02 3:28 ` Victoria Dye
2022-09-02 18:47 ` Shaoxuan Yuan
2022-09-03 0:36 ` [PATCH v4 0/3] grep: integrate with sparse index Shaoxuan Yuan
2022-09-03 0:36 ` [PATCH v4 1/3] builtin/grep.c: add --sparse option Shaoxuan Yuan
2022-09-03 0:36 ` [PATCH v4 2/3] builtin/grep.c: integrate with sparse index Shaoxuan Yuan
2022-09-03 0:36 ` [PATCH v4 3/3] builtin/grep.c: walking tree instead of expanding index with --sparse Shaoxuan Yuan
2022-09-03 4:39 ` Junio C Hamano
2022-09-08 0:24 ` Shaoxuan Yuan
2022-09-08 0:18 ` [PATCH v5 0/3] grep: integrate with sparse index Shaoxuan Yuan
2022-09-08 0:18 ` [PATCH v5 1/3] builtin/grep.c: add --sparse option Shaoxuan Yuan
2022-09-10 1:07 ` Victoria Dye
2022-09-14 6:08 ` Elijah Newren
2022-09-15 2:57 ` Junio C Hamano
2022-09-18 2:14 ` Elijah Newren
2022-09-18 19:52 ` Victoria Dye
2022-09-19 1:23 ` Junio C Hamano
2022-09-19 4:27 ` Shaoxuan Yuan
2022-09-19 11:03 ` Ævar Arnfjörð Bjarmason
2022-09-20 7:13 ` Elijah Newren
2022-09-17 3:34 ` Shaoxuan Yuan
2022-09-18 4:24 ` Elijah Newren
2022-09-19 4:13 ` Shaoxuan Yuan
2022-09-17 3:45 ` Shaoxuan Yuan
2022-09-08 0:18 ` [PATCH v5 2/3] builtin/grep.c: integrate with sparse index Shaoxuan Yuan
2022-09-08 0:18 ` [PATCH v5 3/3] builtin/grep.c: walking tree instead of expanding index with --sparse Shaoxuan Yuan
2022-09-08 17:59 ` Junio C Hamano
2022-09-08 20:46 ` Derrick Stolee
2022-09-08 20:56 ` Junio C Hamano
2022-09-08 21:06 ` Shaoxuan Yuan
2022-09-09 12:49 ` Derrick Stolee
2022-09-13 17:23 ` Junio C Hamano
2022-09-10 2:04 ` Victoria Dye
2022-09-23 4:18 ` [PATCH v6 0/1] grep: integrate with sparse index Shaoxuan Yuan
2022-09-23 4:18 ` [PATCH v6 1/1] builtin/grep.c: " Shaoxuan Yuan
2022-09-23 16:40 ` Junio C Hamano
2022-09-23 16:58 ` Junio C Hamano
2022-09-26 17:28 ` Junio C Hamano
2022-09-23 14:13 ` [PATCH v6 0/1] grep: " Derrick Stolee
2022-09-23 16:01 ` Victoria Dye
2022-09-23 17:08 ` Junio C Hamano
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=74ff1a97-fa98-7280-9d84-35dafaf3cb3d@gmail.com \
--to=shaoxuan.yuan02@gmail.com \
--cc=derrickstolee@github.com \
--cc=git@vger.kernel.org \
--cc=vdye@github.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).