git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Matheus Tavares Bernardino <matheus.bernardino@usp.br>
To: Junio C Hamano <gitster@pobox.com>
Cc: git <git@vger.kernel.org>, Jonathan Nieder <jrnieder@gmail.com>
Subject: Re: [RFC PATCH 0/3] grep: don't add subrepos to in-memory alternates
Date: Sat, 21 Sep 2019 17:34:35 -0300	[thread overview]
Message-ID: <CAHd-oW5iEQarYVxEXoTG-ua2zdoybTrSjCBKtO0YT292fm0NQQ@mail.gmail.com> (raw)
In-Reply-To: <xmqqh857vsqz.fsf@gitster-ct.c.googlers.com>

On Fri, Sep 20, 2019 at 1:26 PM Junio C Hamano <gitster@pobox.com> wrote:
>
> Matheus Tavares Bernardino <matheus.bernardino@usp.br> writes:
>
> > Hmm, I may have gotten a little confused here. Are you talking
> > about the attributes stack (which contains .gitattributes and
> > info/attributes)?  If so, isn't this stack already rebuild for
> > every path? I mean, by the previous call chain it seems to me that
> > at least these two files are reread for every path.
>
> Yes, but for the switch that happens when coming out of a normal
> directory and then descending into another normal directory is just
> to pop the entries from the directory hierarchy we are getting out
> of, and then pushing the entries from the new directory hierarchy.
> We would not be discarding and rereading $GIT_DIR/info/ or the
> .gitattribute file from the top-level of the working tree.

Right, this would be the best way of doing it. However, I think this
is not how it's currently implemented. I if correctly understood the
code in this call chain:

grep_source_load_driver() >  userdiff_find_by_path() >
git_check_attr() > collect_some_attrs() > prepare_attr_stack() >
bootstrap_attr_stack()

it seems that the whole stack is being rebuild for every path (even
for paths descending in the same superproject or submodule). So
$GIT_DIR/info/ and .gitattributes are being discarded and reread every
time :(

> Descending into a submodule is fundamentally and completely
> different.  None of the attributes defined in the superproject
> should affect the paths in the submodule, as it is a totally
> separate project, oblivious to the existence of enclosing the
> superproject.

I think we currently have a bug here as the attributes from the
superproject *are* affecting the paths in the submodule. Here is a
small script I wrote to test this: https://gitlab.com/snippets/1896951

The cause of this problem is that boostrap_attr_stack() doesn't read
"<subrepo_prefix>/.gitattributes" but just ".gitattributes", always
getting the superproject's file not the suprepo's. Yet another problem
is that when this file is not present and we need to retrieve it from
the index, this function calls read_attr() > read_attr_from_index() >
read_blob_data_from_index(). The last one always reads from
the_repository's odb, so it won't ever find the subrepo's
".gitattributes".

And a third bug is that when reading attributes of paths inside
subrepo's directories, from index, we call read_attr_from_index() with
a path such as "<subrepo_prefix>/<subdir>/.gitattributes". However,
the subrepo_prefix should be stripped when looking in the subrepo's
index, otherwise there will be no matches.

To fix these three problems, I think we would need to pass on a struct
repository in these call chains. But this would require a very big
modification as there are many places that can lead to one of them...
And there're corner cases such as index_stream_convert_blob() which
would need to receive a struct repository but it always writes to
the_repo, which would be kind of inconsistent. Do you think this would
be a good solution or should I try something else?

  reply	other threads:[~2019-09-21 20:34 UTC|newest]

Thread overview: 13+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-09-18  1:56 [RFC PATCH 0/3] grep: don't add subrepos to in-memory alternates Matheus Tavares
2019-09-18  1:56 ` [RFC PATCH 1/3] diff: use the given repo at diff_populate_filespec() Matheus Tavares
2019-09-18  1:56 ` [RFC PATCH 2/3] object: allow parse_object_or_die() to handle any repo Matheus Tavares
2019-09-18  1:56 ` [RFC PATCH 3/3] grep: don't add submodules to the alternates list Matheus Tavares
2019-09-21 21:58   ` Brandon Williams
2019-09-18 19:55 ` [RFC PATCH 0/3] grep: don't add subrepos to in-memory alternates Junio C Hamano
2019-09-19  5:18   ` Matheus Tavares Bernardino
2019-09-20 16:26     ` Junio C Hamano
2019-09-21 20:34       ` Matheus Tavares Bernardino [this message]
2019-09-28  3:24         ` Junio C Hamano
2019-09-28  4:20           ` Matheus Tavares Bernardino
2021-09-27 12:08 [PATCH v3 6/8] grep: add repository to OID grep sources Ævar Arnfjörð Bjarmason
2021-09-27 16:45 ` [RFC PATCH 0/3] grep: don'\''t add subrepos to in-memory alternates Matheus Tavares
2021-09-27 17:30   ` Ævar Arnfjörð Bjarmason

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CAHd-oW5iEQarYVxEXoTG-ua2zdoybTrSjCBKtO0YT292fm0NQQ@mail.gmail.com \
    --to=matheus.bernardino@usp.br \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=jrnieder@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).