All of lore.kernel.org
 help / color / mirror / Atom feed
From: Johannes Schindelin <Johannes.Schindelin@gmx.de>
To: Jonathan Tan <jonathantanmy@google.com>
Cc: git@vger.kernel.org
Subject: Re: [PATCH] cache-tree: avoid needless promisor object fetch
Date: Fri, 23 Apr 2021 14:28:13 +0200 (CEST)	[thread overview]
Message-ID: <nycvar.QRO.7.76.6.2104231409500.54@tvgsbejvaqbjf.bet> (raw)
In-Reply-To: <20210423031659.2362659-1-jonathantanmy@google.com>

Hi Jonathan,

On Thu, 22 Apr 2021, Jonathan Tan wrote:

> In update_one() (used only by cache_tree_update()), there is an object
> existence check that, if it fails, will automatically trigger a lazy
> fetch in a partial clone. But the fetch is not necessary - the object is
> not actually being used.

I find it curious, though, that the `ce_missing_ok` variable is defined
thusly (sadly, the context of your diff is too small to show it):

                ce_missing_ok = mode == S_IFGITLINK || missing_ok ||
                        (has_promisor_remote() &&
                         ce_skip_worktree(ce));

Which means that the `has_object_file()` function is only called if the
entry is not marked with the `skip-worktree` bit, i.e. if it is _not_
excluded from the sparse checkout.

Wouldn't that mean that the object _should_ be there?

I guess what I am saying is that while the commit message focuses on the
"What?" of the patch, I would love to hear more about the "Why?". And
maybe the "When?" as in: when does this actually matter?

And since the bug was critical enough for you to spend time on crafting
it, maybe it would make sense to add a regression test to ensure that this
bug does not creep in again?

>
> Replace that check with two checks: an object existence check that does
> not fetch, and then a check that that object is a promisor object.

This essentially repeats what the diff says, but it might make more sense
to explain why the post-image of this diff is more correct (and maybe
discuss performance implications).

>
> Doing this avoids multiple lazy fetches when merging two trees in a
> partial clone, as noticed at $DAYJOB.

Ah. But where are those trees fetched, then?

Maybe lead with the description of the bug?

> Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
> ---
> Another alternative is to think about whether the object existence check
> here is needed in the first place.
>
> There might also be other places we can make a similar change in
> update_one(), but I limited myself to what's needed to solve the
> specific case we discovered at $DAYJOB.

I only see another `has_object_file()` call site at the very beginning,
and I think this needs to fetch. Or maybe it is more efficient to
construct the cache tree from scratch than fetch it?

There is also `cache_tree_fully_valid_1()`, where I think the same
handling could potentially make sense. (Or, if you target `seen`,
`cache_tree_fully_valid()`.

Ciao,
Johannes

> ---
>  cache-tree.c | 5 ++++-
>  1 file changed, 4 insertions(+), 1 deletion(-)
>
> diff --git a/cache-tree.c b/cache-tree.c
> index add1f07713..6728722597 100644
> --- a/cache-tree.c
> +++ b/cache-tree.c
> @@ -6,6 +6,7 @@
>  #include "object-store.h"
>  #include "replace-object.h"
>  #include "promisor-remote.h"
> +#include "packfile.h"
>
>  #ifndef DEBUG_CACHE_TREE
>  #define DEBUG_CACHE_TREE 0
> @@ -362,7 +363,9 @@ static int update_one(struct cache_tree *it,
>  			(has_promisor_remote() &&
>  			 ce_skip_worktree(ce));
>  		if (is_null_oid(oid) ||
> -		    (!ce_missing_ok && !has_object_file(oid))) {
> +		    (!ce_missing_ok &&
> +		     !has_object_file_with_flags(oid, OBJECT_INFO_SKIP_FETCH_OBJECT) &&
> +		     !is_promisor_object(oid))) {
>  			strbuf_release(&buffer);
>  			if (expected_missing)
>  				return -1;
> --
> 2.31.1.498.g6c1eba8ee3d-goog
>
>

  reply	other threads:[~2021-04-23 12:28 UTC|newest]

Thread overview: 3+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-04-23  3:16 [PATCH] cache-tree: avoid needless promisor object fetch Jonathan Tan
2021-04-23 12:28 ` Johannes Schindelin [this message]
2021-04-26 19:49   ` Jonathan Tan

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=nycvar.QRO.7.76.6.2104231409500.54@tvgsbejvaqbjf.bet \
    --to=johannes.schindelin@gmx.de \
    --cc=git@vger.kernel.org \
    --cc=jonathantanmy@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.