All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
To: Matheus Tavares <matheus.bernardino@usp.br>
Cc: git@vger.kernel.org, gitster@pobox.com, jonathantanmy@google.com
Subject: Re: [RFC PATCH 0/3] grep: don'\''t add subrepos to in-memory alternates
Date: Mon, 27 Sep 2021 19:30:56 +0200	[thread overview]
Message-ID: <87zgryylfx.fsf@evledraar.gmail.com> (raw)
In-Reply-To: <296230071f2cabda213b2a5f6f435f9308718569.1632754555.git.matheus.bernardino@usp.br>


On Mon, Sep 27 2021, Matheus Tavares wrote:

> On Mon, Sep 27, 2021 at 9:09 AM Ævar Arnfjörð Bjarmason <avarab@gmail.com> wrote:
>>
>>
>> On Mon, Aug 16 2021, Jonathan Tan wrote:
>>
>> > Record the repository whenever an OID grep source is created, and teach
>> > the worker threads to explicitly provide the repository when accessing
>> > objects.
>> > [...]
>> > diff --git a/grep.h b/grep.h
>> > index 480b3f5bba..128007db65 100644
>> > --- a/grep.h
>> > +++ b/grep.h
>> > @@ -120,7 +120,20 @@ struct grep_opt {
>> >       struct grep_pat *header_list;
>> >       struct grep_pat **header_tail;
>> >       struct grep_expr *pattern_expression;
>> > +
>> > +     /*
>> > +      * NEEDSWORK: See if we can remove this field, because the repository
>> > +      * should probably be per-source. That is, grep.c functions using this
>> > +      * field should probably start using "repo" in "struct grep_source"
>> > +      * instead.
>> > +      *
>> > +      * This is potentially the cause of at least one bug - "git grep"
>> > +      * ignoring the textconv attributes from submodules. See [1] for more
>> > +      * information.
>> > +      * [1] https://lore.kernel.org/git/CAHd-oW5iEQarYVxEXoTG-ua2zdoybTrSjCBKtO0YT292fm0NQQ@mail.gmail.com/
>> > +      */
>> >       struct repository *repo;
>> > +
>>
>> I ran into this comment and read the linked E-Mail, and then the
>> downthread
>> https://lore.kernel.org/git/CAHd-oW6uG1fap-T4UF17bJmjoHAqWCDq9KbY+_8a3cEnnfATxg@mail.gmail.com/;
>>
>> Given Matheus's "I've somehow missed this guard and the..." there I'm
>> not quite sure what/if we should be doing here & what this comment is
>> recommending? I.e. do we still need to adjust the call chains as noted
>> in the E-Mail the comment links to, or not?
>
> I think we should still adjust the call chains, yes. The downthread
> message you mentioned is kind of a tangent about performance, where
> Junio helped me understand something I had previously missed in the
> code, regarding the persistence of the attributes stack.
>
> But the issue that started the thread was about a correctness problem:
> the superproject textconv attributes are being used on submodules'
> files when running `git grep` with `--recurse-submodules --textconv`.
> The three cases to consider are:
>
> - .gitattributes from the working tree
> - .gitattributes from the index
> - .git/info/attributes
>
> On all these cases, the superproject attributes are being used on the
> submodule. Additionally, if the superproject does not define any
> attribute, the submodule attributes are being ignored in all cases
> except by the first one (but that is only because the code sees the
> .gitattributes file on the submodule as if it were a "regular"
> subdirectory of the surperproject. So the submodule's .gitattribures
> takes higher precedence when evaluating the attributes for files in
> that directory).
>
> Another issue is that the textconv cache is always saved to (and read
> from) the superproject gitdir, even for submodules' files.
>
> Here are some test cases that demonstrate these issues:
>
> -- snipsnap --
> diff --git a/t/t7814-grep-recurse-submodules.sh b/t/t7814-grep-recurse-submodules.sh
> index 3172f5b936..d01a3bc5d8 100755
> --- a/t/t7814-grep-recurse-submodules.sh
> +++ b/t/t7814-grep-recurse-submodules.sh
> @@ -441,4 +441,104 @@ test_expect_success 'grep --recurse-submodules with --cached ignores worktree mo
>  	test_must_fail git grep --recurse-submodules --cached "A modified line in submodule" >actual 2>&1 &&
>  	test_must_be_empty actual
>  '
> +
> +test_expect_failure 'grep --textconv: superproject .gitattributes does not affect submodules' '
> +	reset_and_clean &&
> +	test_config_global diff.d2x.textconv "sed -e \"s/d/x/\"" &&
> +	echo "a diff=d2x" >.gitattributes &&
> +
> +	cat >expect <<-\EOF &&
> +	a:(1|2)x(3|4)
> +	EOF
> +	git grep --textconv --recurse-submodules x >actual &&
> +	test_cmp expect actual
> +'
> +
> +test_expect_failure 'grep --textconv: superproject .gitattributes (from index) does not affect submodules' '
> +	reset_and_clean &&
> +	test_config_global diff.d2x.textconv "sed -e \"s/d/x/\"" &&
> +	echo "a diff=d2x" >.gitattributes &&
> +	git add .gitattributes &&
> +	rm .gitattributes &&
> +
> +	cat >expect <<-\EOF &&
> +	a:(1|2)x(3|4)
> +	EOF
> +	git grep --textconv --recurse-submodules x >actual &&
> +	test_cmp expect actual
> +'
> +
> +test_expect_failure 'grep --textconv: superproject .git/info/attributes does not affect submodules' '
> +	reset_and_clean &&
> +	test_config_global diff.d2x.textconv "sed -e \"s/d/x/\"" &&
> +	super_attr="$(git rev-parse --path-format=relative --git-path info/attributes)" &&
> +	test_when_finished rm -f "$super_attr" &&
> +	echo "a diff=d2x" >"$super_attr" &&
> +
> +	cat >expect <<-\EOF &&
> +	a:(1|2)x(3|4)
> +	EOF
> +	git grep --textconv --recurse-submodules x >actual &&
> +	test_cmp expect actual
> +'
> +
> +test_expect_success 'grep --textconv corectly reads submodule .gitattributes' '
> +	reset_and_clean &&
> +	test_config_global diff.d2x.textconv "sed -e \"s/d/x/\"" &&
> +	echo "a diff=d2x" >submodule/.gitattributes &&
> +
> +	cat >expect <<-\EOF &&
> +	submodule/a:(1|2)x(3|4)
> +	EOF
> +	git grep --textconv --recurse-submodules x >actual &&
> +	test_cmp expect actual
> +'
> +
> +test_expect_failure 'grep --textconv corectly reads submodule .gitattributes (from index)' '
> +	reset_and_clean &&
> +	test_config_global diff.d2x.textconv "sed -e \"s/d/x/\"" &&
> +	echo "a diff=d2x" >submodule/.gitattributes &&
> +	git -C submodule add .gitattributes &&
> +	rm submodule/.gitattributes &&
> +
> +	cat >expect <<-\EOF &&
> +	submodule/a:(1|2)x(3|4)
> +	EOF
> +	git grep --textconv --recurse-submodules x >actual &&
> +	test_cmp expect actual
> +'
> +
> +test_expect_failure 'grep --textconv corectly reads submodule .git/info/attributes' '
> +	reset_and_clean &&
> +	test_config_global diff.d2x.textconv "sed -e \"s/d/x/\"" &&
> +
> +	# Workaround: we use --path-format=relative because the absolute path
> +	# contains whitespaces and that seems to confuse test_when_finished
> +	#
> +	submodule_attr="submodule/$(git -C submodule rev-parse --path-format=relative --git-path info/attributes)" &&
> +	test_when_finished rm -f "$submodule_attr" &&
> +	echo "a diff=d2x" >"$submodule_attr" &&
> +
> +	cat >expect <<-\EOF &&
> +	submodule/a:(1|2)x(3|4)
> +	EOF
> +	git grep --textconv --recurse-submodules x >actual &&
> +	test_cmp expect actual
> +'
> +
> +test_expect_failure 'grep saves textconv cache in the appropriated repository' '
> +	reset_and_clean &&
> +	test_config_global diff.d2x_cached.textconv "sed -e \"s/d/x/\"" &&
> +	test_config_global diff.d2x_cached.cachetextconv true &&
> +	echo "a diff=d2x_cached" >submodule/.gitattributes &&
> +
> +	# Note: we only read/write to the textconv cache when grepping from an
> +	# OID as the working tree file might have modifications. That is why
> +	# we use --cached here.
> +	#
> +	git grep --textconv --cached --recurse-submodules x &&
> +	test_path_is_missing "$(git rev-parse --git-path refs/notes/textconv/d2x_cached)" &&
> +	test_path_is_file "$(git -C submodule rev-parse --git-path refs/notes/textconv/d2x_cached)"
> +'
> +
>  test_done

Thanks! I think it would be very good to have these tests in-tree along
with an updated comment pointing to them.


  reply	other threads:[~2021-09-27 17:46 UTC|newest]

Thread overview: 78+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-08-10 18:28 [PATCH 0/7] In grep, no adding submodule ODB as alternates Jonathan Tan
2021-08-10 18:28 ` [PATCH 1/7] submodule: lazily add submodule ODBs " Jonathan Tan
2021-08-10 21:13   ` Junio C Hamano
2021-08-13 16:53     ` Jonathan Tan
2021-08-11 21:33   ` Emily Shaffer
2021-08-13 16:23     ` Jonathan Tan
2021-08-10 18:28 ` [PATCH 2/7] grep: use submodule-ODB-as-alternate lazy-addition Jonathan Tan
2021-08-11 21:36   ` Emily Shaffer
2021-08-13 16:31     ` Jonathan Tan
2021-08-10 18:28 ` [PATCH 3/7] grep: typesafe versions of grep_source_init Jonathan Tan
2021-08-10 21:38   ` Junio C Hamano
2021-08-11 21:42   ` Emily Shaffer
2021-08-11 23:07     ` Ramsay Jones
2021-08-13 16:32       ` Jonathan Tan
2021-08-11 22:45   ` Matheus Tavares Bernardino
2021-08-12 16:49     ` Junio C Hamano
2021-08-13 16:33       ` Jonathan Tan
2021-08-10 18:28 ` [PATCH 4/7] grep: read submodule entry with explicit repo Jonathan Tan
2021-08-11 21:44   ` Emily Shaffer
2021-08-13 16:39     ` Jonathan Tan
2021-08-10 18:28 ` [PATCH 5/7] grep: allocate subrepos on heap Jonathan Tan
2021-08-11 21:50   ` Emily Shaffer
2021-08-13 16:42     ` Jonathan Tan
2021-08-10 18:28 ` [PATCH 6/7] grep: add repository to OID grep sources Jonathan Tan
2021-08-11 21:52   ` Emily Shaffer
2021-08-13 16:44     ` Jonathan Tan
2021-08-11 23:28   ` Matheus Tavares Bernardino
2021-08-13 16:47     ` Jonathan Tan
2021-08-10 18:28 ` [PATCH 7/7] t7814: show lack of alternate ODB-adding Jonathan Tan
2021-08-11 21:55   ` Emily Shaffer
2021-08-11 22:22     ` Matheus Tavares Bernardino
2021-08-13 16:50       ` Jonathan Tan
2021-08-11 21:29 ` [PATCH 0/7] In grep, no adding submodule ODB as alternates Emily Shaffer
2021-08-11 22:49 ` Josh Steadmon
2021-08-13 21:05 ` [PATCH v2 0/8] " Jonathan Tan
2021-08-13 21:05   ` [PATCH v2 1/8] submodule: lazily add submodule ODBs " Jonathan Tan
2021-08-13 21:05   ` [PATCH v2 2/8] grep: use submodule-ODB-as-alternate lazy-addition Jonathan Tan
2021-08-13 21:05   ` [PATCH v2 3/8] grep: typesafe versions of grep_source_init Jonathan Tan
2021-08-16 15:06     ` Matheus Tavares Bernardino
2021-08-13 21:05   ` [PATCH v2 4/8] grep: read submodule entry with explicit repo Jonathan Tan
2021-08-13 21:05   ` [PATCH v2 5/8] grep: allocate subrepos on heap Jonathan Tan
2021-08-13 21:44     ` Junio C Hamano
2021-08-16 19:42       ` Jonathan Tan
2021-08-13 21:05   ` [PATCH v2 6/8] grep: add repository to OID grep sources Jonathan Tan
2021-08-16 14:48     ` Matheus Tavares Bernardino
2021-08-16 19:44       ` Jonathan Tan
2021-08-13 21:05   ` [PATCH v2 7/8] submodule-config: pass repo upon blob config read Jonathan Tan
2021-08-16 14:32     ` Matheus Tavares Bernardino
2021-08-16 19:57       ` Matheus Tavares Bernardino
2021-08-16 20:02       ` Jonathan Tan
2021-08-16 15:48     ` Matheus Tavares Bernardino
2021-08-16 20:09       ` Jonathan Tan
2021-08-16 20:57         ` Jonathan Tan
2021-08-13 21:05   ` [PATCH v2 8/8] t7814: show lack of alternate ODB-adding Jonathan Tan
2021-08-16 15:14   ` [PATCH v2 0/8] In grep, no adding submodule ODB as alternates Matheus Tavares Bernardino
2021-08-16 21:09 ` [PATCH v3 " Jonathan Tan
2021-08-16 21:09   ` [PATCH v3 1/8] submodule: lazily add submodule ODBs " Jonathan Tan
2021-08-16 21:09   ` [PATCH v3 2/8] grep: use submodule-ODB-as-alternate lazy-addition Jonathan Tan
2021-08-16 21:09   ` [PATCH v3 3/8] grep: typesafe versions of grep_source_init Jonathan Tan
2021-08-16 21:09   ` [PATCH v3 4/8] grep: read submodule entry with explicit repo Jonathan Tan
2021-08-16 21:09   ` [PATCH v3 5/8] grep: allocate subrepos on heap Jonathan Tan
2021-08-16 21:09   ` [PATCH v3 6/8] grep: add repository to OID grep sources Jonathan Tan
2021-09-27 12:08     ` Ævar Arnfjörð Bjarmason
2021-09-27 16:45       ` [RFC PATCH 0/3] grep: don'\''t add subrepos to in-memory alternates Matheus Tavares
2021-09-27 17:30         ` Ævar Arnfjörð Bjarmason [this message]
2021-08-16 21:09   ` [PATCH v3 7/8] submodule-config: pass repo upon blob config read Jonathan Tan
2021-08-16 21:09   ` [PATCH v3 8/8] t7814: show lack of alternate ODB-adding Jonathan Tan
2021-08-17 19:29   ` [PATCH v3 0/8] In grep, no adding submodule ODB as alternates Matheus Tavares Bernardino
2021-09-08  0:26   ` Junio C Hamano
2021-09-08 15:31     ` Matheus Tavares Bernardino
2021-09-08 18:45       ` Junio C Hamano
  -- strict thread matches above, loose matches on Subject: below --
2019-09-18  1:56 [RFC PATCH 0/3] grep: don't add subrepos to in-memory alternates Matheus Tavares
2019-09-18 19:55 ` Junio C Hamano
2019-09-19  5:18   ` Matheus Tavares Bernardino
2019-09-20 16:26     ` Junio C Hamano
2019-09-21 20:34       ` Matheus Tavares Bernardino
2019-09-28  3:24         ` Junio C Hamano
2019-09-28  4:20           ` Matheus Tavares Bernardino

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87zgryylfx.fsf@evledraar.gmail.com \
    --to=avarab@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=jonathantanmy@google.com \
    --cc=matheus.bernardino@usp.br \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.