From: Elijah Newren <newren@gmail.com>
To: Jonathan Tan <jonathantanmy@google.com>
Cc: Git Mailing List <git@vger.kernel.org>
Subject: Re: [PATCH 2/2] cache-tree: prefetch in partial clone read-tree
Date: Fri, 23 Jul 2021 14:34:34 -0700
Message-ID: <CABPp-BHtJPp3qi3ww3EcxxzyCUU8DjS1ZWnEQfd4A2rqXyGUXg@mail.gmail.com>
In-Reply-To: <f4881b7455b9d33c8a53a91eda7fbdfc5d11382c.1627066238.git.jonathantanmy@google.com>

On Fri, Jul 23, 2021 at 11:55 AM Jonathan Tan <jonathantanmy@google.com> wrote:
>
> "git read-tree" checks the existence of the blobs referenced by the
> given tree, but does not bulk prefetch them. Add a bulk prefetch.
>
> The lack of prefetch here was noticed at $DAYJOB during a merge
> involving some specific commits, but I couldn't find a minimal merge
> that didn't also trigger the prefetch in check_updates() in
> unpack-trees.c (and in all these cases, the lack of prefetch in
> cache-tree.c didn't matter because all the relevant blobs would have
> already been prefetched by then). This is why I used read-tree here to
> exercise this code path.

Okay, you have me stumped; I can't figure out what kind of merge
would bypass the prefetch in check_updates() in unpack-trees.c either.
I wondered about octopus merges or merge.autostash, but I can't
trigger it with either.

Using read-tree to trigger the case makes perfect sense, though.

> Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
> ---
>  cache-tree.c                       | 11 ++++++++--
>  t/t1022-read-tree-partial-clone.sh | 33 ++++++++++++++++++++++++++++++
>  2 files changed, 42 insertions(+), 2 deletions(-)
>  create mode 100755 t/t1022-read-tree-partial-clone.sh
>
> diff --git a/cache-tree.c b/cache-tree.c
> index 45e58666af..9ba2c7c6b2 100644
> --- a/cache-tree.c
> +++ b/cache-tree.c
> @@ -237,6 +237,11 @@ int cache_tree_fully_valid(struct cache_tree *it)
>         return 1;
>  }
>
> +static int must_check_existence(const struct cache_entry *ce)
> +{
> +       return !(has_promisor_remote() && ce_skip_worktree(ce));
> +}
> +
>  static int update_one(struct cache_tree *it,
>                       struct cache_entry **cache,
>                       int entries,
> @@ -378,8 +383,7 @@ static int update_one(struct cache_tree *it,
>                 }
>
>                 ce_missing_ok = mode == S_IFGITLINK || missing_ok ||
> -                       (has_promisor_remote() &&
> -                        ce_skip_worktree(ce));
> +                       !must_check_existence(ce);
>                 if (is_null_oid(oid) ||
>                     (!ce_missing_ok && !has_object_file(oid))) {
>                         strbuf_release(&buffer);
> @@ -466,6 +470,9 @@ int cache_tree_update(struct index_state *istate, int flags)
>         if (!istate->cache_tree)
>                 istate->cache_tree = cache_tree();
>
> +       if (!(flags & WRITE_TREE_MISSING_OK) && has_promisor_remote())
> +               prefetch_cache_entries(istate, must_check_existence);
> +

Nice that the fix is so simple.

>         trace_performance_enter();
>         trace2_region_enter("cache_tree", "update", the_repository);
>         i = update_one(istate->cache_tree, istate->cache, istate->cache_nr,
> diff --git a/t/t1022-read-tree-partial-clone.sh b/t/t1022-read-tree-partial-clone.sh
> new file mode 100755
> index 0000000000..a763e27c7d
> --- /dev/null
> +++ b/t/t1022-read-tree-partial-clone.sh
> @@ -0,0 +1,33 @@
> +#!/bin/sh
> +
> +test_description='git read-tree in partial clones'
> +
> +TEST_NO_CREATE_REPO=1
> +
> +GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME=main
> +export GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME
> +
> +. ./test-lib.sh
> +
> +test_expect_success 'read-tree in partial clone prefetches in one batch' '
> +       test_when_finished "rm -rf server client trace" &&
> +
> +       git init server &&
> +       echo foo >server/one &&
> +       echo bar >server/two &&
> +       git -C server add one two &&
> +       git -C server commit -m "initial commit" &&
> +       TREE=$(git -C server rev-parse HEAD^{tree}) &&
> +
> +       git -C server config uploadpack.allowfilter 1 &&
> +       git -C server config uploadpack.allowanysha1inwant 1 &&
> +       git clone --bare --filter=blob:none "file://$(pwd)/server" client &&
> +       GIT_TRACE_PACKET="$(pwd)/trace" git -C client read-tree $TREE &&
> +
> +       # "done" marks the end of negotiation (once per fetch). Expect that
> +       # only one fetch occurs.
> +       grep "fetch> done" trace >donelines &&
> +       test_line_count = 1 donelines
> +'
> +
> +test_done
> --
> 2.32.0.432.gabb21c7263-goog

Any reason for preferring GIT_TRACE_PACKET over using GIT_TRACE2_PERF
and checking the reported fetch_count (or even just counting the
fetch_count lines)?  Just curious.
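
To make the question concrete, here is a rough, untested sketch of the
trace2 variant.  It assumes each batched fetch from the promisor remote
writes exactly one line containing "fetch_count" to the perf trace;
count_fetches is a made-up helper name, not something that exists in
the test library.

```shell
# Hypothetical helper: print how many lines of the trace2 perf log
# given as $1 mention "fetch_count".  Assumption: one such line is
# written per batched fetch from the promisor remote.
# Note that grep -c still prints 0 when nothing matches, but exits
# non-zero in that case.
count_fetches () {
	grep -c "fetch_count" "$1"
}

# Possible use in the test body, replacing the GIT_TRACE_PACKET check:
#
#	GIT_TRACE2_PERF="$(pwd)/trace" git -C client read-tree $TREE &&
#	test "$(count_fetches trace)" = 1
```

Counting lines only shows that a single fetch happened; checking the
reported value as well would also confirm that both blobs arrived in
that one batch.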

Anyway, looks good to me.
