All of lore.kernel.org
 help / color / mirror / Atom feed
From: Matheus Tavares Bernardino <matheus.bernardino@usp.br>
To: Junio C Hamano <gitster@pobox.com>
Cc: git <git@vger.kernel.org>, Jonathan Nieder <jrnieder@gmail.com>
Subject: Re: [RFC PATCH 0/3] grep: don't add subrepos to in-memory alternates
Date: Thu, 19 Sep 2019 02:18:57 -0300	[thread overview]
Message-ID: <CAHd-oW4u+iPFMvSGNvSJxfPLE34dQQ3x5_aQ-Y4Pd99FXR1p7Q@mail.gmail.com> (raw)
In-Reply-To: <xmqq36gt5qhr.fsf@gitster-ct.c.googlers.com>

On Wed, Sep 18, 2019 at 4:55 PM Junio C Hamano <gitster@pobox.com> wrote:
>
> Matheus Tavares <matheus.bernardino@usp.br> writes:
>
[...]
> > - textconv cache is written to the_repository's object database even for
> >   submodules. Should it perhaps be written to submodules' odb instead?
>
> You mention "is written", but that is what happens upon a cache
> miss.  Before the code notices a cache miss, it must be checking if
> a cached result is available.  In which odb is it done?  Writing
> that follow the miss should happen to the same one, or the cache is
> not very effective.

Right. Both writing and reading operations on the textconv cache
always happen in the_repository's odb.

> Because you would want the cache to be effective, after running "git
> grep --recurse-submodules" from the top-level, when you chdir down
> to the submodule and say "git grep" to dig further, the answer to
> your question is most likely "yes".

OK, makes sense. To keep this series simple, I think I'll try to work
on this in a following patchset.

> > - Considering the following call chain: grep_source_load_driver() >
> >   userdiff_find_by_path() > git_check_attr() > collect_some_attrs() >
> >   prepare_attr_stack() > bootstrap_attr_stack():
> >
> >   * The last function tries to read the attributes from the
> >     .gitattributes and .git/info/attributes files of the_repository.
> >     However, for paths inside the submodule, shouldn't it try to read
> >     these files from the submodule?
>
> Yes, I think all of those would have worked correctly if we forked a
> separate "git grep" inside submodule repository, but in the rush to
> "do everything in process", many things like these are not done
> correctly.  As there is only one attribute cache IIUC, invalidating
> the whole cache for the top-level and replacing it with the one for
> a submodule, every time we cross the module boundary, would probably
> have a negative effect on the performance, and I am not sure what
> would happen if you run more than one threads working in different
> repositories (i.e. top-level and submodules).

Hmm, I may have gotten a little confused here. Are you talking about
the attributes stack (which contains .gitattributes and
info/attributes)? If so, isn't this stack already rebuild for every
path? I mean, by the previous call chain it seems to me that at least
these two files are reread for every path.

> >   * This function will also call: read_attr() > read_attr_from_index() >
> >     read_blob_data_from_index() which might, in turn, call
> >     read_object_file(). Shouldn't we pass the subrepo to it so that it
> >     can call repo_read_object_file()? (Again, for paths inside the
> >     submodule, read_object_file() won't be able to find the object as
> >     we won't be adding to alternates anymore.)

Just as a side note, I noticed another problem with this: in a
submodule with a non-checked-out .gitattributes, `git grep --textconv`
will successfully retrieve the file from the index. But running `git
grep --recurse-submodules --textconv` from the superproject won't. The
problem is that read_blob_data_from_index() doesn't have the
repository struct and thus cannot strip the subrepo_prefix from the
path (so it won't find it in the subrepo's index). I'll try to fix
this in this series as well.

  reply	other threads:[~2019-09-19  5:19 UTC|newest]

Thread overview: 13+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-09-18  1:56 [RFC PATCH 0/3] grep: don't add subrepos to in-memory alternates Matheus Tavares
2019-09-18  1:56 ` [RFC PATCH 1/3] diff: use the given repo at diff_populate_filespec() Matheus Tavares
2019-09-18  1:56 ` [RFC PATCH 2/3] object: allow parse_object_or_die() to handle any repo Matheus Tavares
2019-09-18  1:56 ` [RFC PATCH 3/3] grep: don't add submodules to the alternates list Matheus Tavares
2019-09-21 21:58   ` Brandon Williams
2019-09-18 19:55 ` [RFC PATCH 0/3] grep: don't add subrepos to in-memory alternates Junio C Hamano
2019-09-19  5:18   ` Matheus Tavares Bernardino [this message]
2019-09-20 16:26     ` Junio C Hamano
2019-09-21 20:34       ` Matheus Tavares Bernardino
2019-09-28  3:24         ` Junio C Hamano
2019-09-28  4:20           ` Matheus Tavares Bernardino
2021-09-27 12:08 [PATCH v3 6/8] grep: add repository to OID grep sources Ævar Arnfjörð Bjarmason
2021-09-27 16:45 ` [RFC PATCH 0/3] grep: don'\''t add subrepos to in-memory alternates Matheus Tavares
2021-09-27 17:30   ` Ævar Arnfjörð Bjarmason

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CAHd-oW4u+iPFMvSGNvSJxfPLE34dQQ3x5_aQ-Y4Pd99FXR1p7Q@mail.gmail.com \
    --to=matheus.bernardino@usp.br \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=jrnieder@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.