From: Derrick Stolee <derrickstolee@github.com>
To: Jeff King <peff@peff.net>
Cc: Richard Oliver <roliver@roku.com>, Taylor Blau <me@ttaylorr.com>,
	git@vger.kernel.org, jonathantanmy@google.com
Subject: Re: [PATCH] mktree: learn about promised objects
Date: Thu, 16 Jun 2022 09:59:57 -0400
Message-ID: <b559b9a3-b97a-f394-5845-5c810425f8a4@github.com>
In-Reply-To: <YqrIrYHKUP6b/EtN@coredump.intra.peff.net>

On 6/16/2022 2:07 AM, Jeff King wrote:
> On Wed, Jun 15, 2022 at 02:17:58PM -0400, Derrick Stolee wrote:
> 
>> On 6/15/2022 1:40 PM, Richard Oliver wrote:
>>> On 15/06/2022 05:00, Jeff King wrote:
>>
>>>> So it is not just lookup, but actual tree walking that is expensive. The
>>>> flip side is that you don't have to store a complete separate list of
>>>> the promised objects. Whether that's a win depends on how many local
>>>> objects you have, versus how many are promised.
>>
>> This is also why blobless (or blob-size filters) are the recommended way
>> to use partial clone. It's just too expensive to have tree misses.
> 
> I agree that tree misses are awful, but I'm actually talking about
> something different: traversing the local trees we _do_ have in order to
> find the set of promised objects. Which is worse for blob:none, because
> it means you have more trees locally. :)

Ah, I misread your email. I agree that walking trees is far too
expensive to do just to find an object type.

> Try this with a big repo like linux.git:
> 
>   git clone --no-local --filter=blob:none linux.git repo
>   cd repo
> 
>   # this is fast; we mark the promisor trees as UNINTERESTING, so we do
>   # not look at them as part of the traversal, and never call
>   # is_promisor_object().
>   time git rev-list --count --objects --all --exclude-promisor-objects
> 
>   # but imagine we had a fixed mktree[1] that did not fault in the blobs
>   # unnecessarily, and we made a new tree that references a promised
>   # blob.
>   tree=$(git ls-tree HEAD~1000 | grep Makefile | git mktree --missing)
>   commit=$(echo foo | git commit-tree -p HEAD $tree)
>   git update-ref refs/heads/foo $commit
> 
>   # this is now slow; even though we only call is_promisor_object()
>   # once, we have to open every single tree in the pack to find it!
>   time git rev-list --count --objects --all --exclude-promisor-objects
> 
> Those rev-lists run in 1.7s and 224s respectively. Ouch!

This is exactly why I expected that asking for the objects directly
would be faster than scanning all the packs. Thanks for giving concrete
numbers that support that assumption.
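
A toy model may make the cost concrete. The sketch below is not git's
code: it assumes a made-up listing format (one packed object per line)
and a hypothetical `scan_is_promised` shell function. It only illustrates
the strategy described in this thread, where the first promisor check
builds the whole set by opening every object in the promisor packs, so
answering even one query means opening every tree.

```shell
#!/bin/sh
# Toy model only (hypothetical; not git's implementation) of the
# "build the whole promised set first" strategy discussed in this thread.
#
# Assumed input format, one packed object per line:
#   <type> <oid> [<referenced-oid> ...]
# where a tree lists its entries and a commit lists its tree and parents.
# Tags are left out to keep the model small.

# scan_is_promised OID
# Reads the promisor-pack listing on stdin and prints whether OID is
# promised, followed by how many trees had to be opened to answer.
scan_is_promised() {
    awk -v want="$1" '
        {
            # Every object stored in a promisor pack is itself promised.
            seen[$2] = 1
            # Trees and commits are parsed, and everything they reference
            # goes into the set, no matter which single oid was asked for.
            if ($1 == "tree") trees++
            if ($1 == "tree" || $1 == "commit")
                for (i = 3; i <= NF; i++)
                    seen[$i] = 1
        }
        END {
            status = (want in seen) ? "promised" : "not-promised"
            printf "%s trees_opened=%d\n", status, trees
        }'
}

# Usage: a single query, yet all three trees are opened.
printf '%s\n' 'commit c1 t1' 'tree t1 b1 t2' 'tree t2 b2' 'tree t3 b3' |
    scan_is_promised b2
# prints: promised trees_opened=3
```

In this model the trees_opened count depends only on how many trees sit
in the promisor packs, not on how many objects are checked, which is
consistent with the jump from 1.7s to 224s on linux.git above. Asking for
the one object directly skips that walk entirely.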

Thanks,
-Stolee


Thread overview: 20+ messages
2022-06-14 13:36 [PATCH] mktree: learn about promised objects Richard Oliver
2022-06-14 14:14 ` Derrick Stolee
2022-06-14 16:33   ` Richard Oliver
2022-06-14 17:27     ` Derrick Stolee
2022-06-15  0:35       ` Taylor Blau
2022-06-15  4:00         ` Jeff King
2022-06-15 17:40           ` Richard Oliver
2022-06-15 18:17             ` Derrick Stolee
2022-06-16  6:07               ` Jeff King
2022-06-16  6:54                 ` [PATCH] is_promisor_object(): walk promisor packs in pack-order Jeff King
2022-06-16 14:00                   ` Derrick Stolee
2022-06-17 19:50                   ` Jonathan Tan
2022-06-16 13:59                 ` Derrick Stolee [this message]
2022-06-15 21:01             ` [PATCH] mktree: learn about promised objects Junio C Hamano
2022-06-16  5:02               ` Jeff King
2022-06-16 15:46               ` [PATCH] mktree: Make '--missing' behave as documented Richard Oliver
2022-06-16 17:44                 ` Junio C Hamano
2022-06-21 13:59                   ` [PATCH] mktree: do not check type of remote objects Richard Oliver
2022-06-21 16:51                     ` Junio C Hamano
2022-06-21 17:48                     ` Junio C Hamano
