All of lore.kernel.org
 help / color / mirror / Atom feed
From: Jeff King <peff@peff.net>
To: Derrick Stolee <derrickstolee@github.com>
Cc: Richard Oliver <roliver@roku.com>, Taylor Blau <me@ttaylorr.com>,
	git@vger.kernel.org, jonathantanmy@google.com
Subject: Re: [PATCH] mktree: learn about promised objects
Date: Thu, 16 Jun 2022 02:07:41 -0400	[thread overview]
Message-ID: <YqrIrYHKUP6b/EtN@coredump.intra.peff.net> (raw)
In-Reply-To: <1fe6c00a-806c-89de-cb67-d063dc4a5279@github.com>

On Wed, Jun 15, 2022 at 02:17:58PM -0400, Derrick Stolee wrote:

> On 6/15/2022 1:40 PM, Richard Oliver wrote:
> > On 15/06/2022 05:00, Jeff King wrote:
> 
> >> So it is not just lookup, but actual tree walking that is expensive. The
> >> flip side is that you don't have to store a complete separate list of
> >> the promised objects. Whether that's a win depends on how many local
> >> objects you have, versus how many are promised.
> 
> This is also why blobless (or blob-size filters) are the recommended way
> to use partial clone. It's just too expensive to have tree misses.

I agree that tree misses are awful, but I'm actually talking about
something different: traversing the local trees we _do_ have in order to
find the set of promised objects. Which is worse for blob:none, because
it means you have more trees locally. :)

Try this with a big repo like linux.git:

  git clone --no-local --filter=blob:none linux.git repo
  cd repo

  # this is fast; we mark the promisor trees as UNINTERESTING, so we do
  # not look at them as part of the traversal, and never call
  # is_promisor_object().
  time git rev-list --count --objects --all --exclude-promisor-objects

  # but imagine we had a fixed mktree[1] that did not fault in the blobs
  # unnecessarily, and we made a new tree that references a promised
  # blob.
  tree=$(git ls-tree HEAD~1000 | grep Makefile | git mktree --missing)
  commit=$(echo foo | git commit-tree -p HEAD $tree)
  git update-ref refs/heads/foo $commit

  # this is now slow; even though we only call is_promisor_object()
  # once, we have to open every single tree in the pack to find it!
  time git rev-list --count --objects --all --exclude-promisor-objects

Those rev-lists run in 1.7s and 224s respectively. Ouch!

Now the setup there is kind of contrived, and it's actually not trivial
to convince rev-list to actually call is_promisor_object() these days.
That's because promisor-ness (promisosity?) tends to be one-way
transitive. A promised blob is usually either only referenced by
promised trees (which we have in this case), or is faulted in as part of
making a new tree.

But it's not guaranteed, and certainly a faulted-in object could later
be deleted from the local repo, since it's promised. I suspect there are
less convoluted workflows to get to a similar state, but I don't know
them offhand. There's more discussion of this issue in this thread from
last summer:

  https://lore.kernel.org/git/20210403090412.GH2271@szeder.dev/

-Peff

[1] The mktree I used was the fix discussed elsewhere in the thread:

diff --git a/builtin/mktree.c b/builtin/mktree.c
index 902edba6d2..42ae82230c 100644
--- a/builtin/mktree.c
+++ b/builtin/mktree.c
@@ -117,15 +117,11 @@ static void mktree_line(char *buf, int nul_term_line, int allow_missing)
 	}
 
 	/* Check the type of object identified by sha1 */
-	obj_type = oid_object_info(the_repository, &oid, NULL);
-	if (obj_type < 0) {
-		if (allow_missing) {
-			; /* no problem - missing objects are presumed to be of the right type */
-		} else {
+	if (!allow_missing) {
+		obj_type = oid_object_info(the_repository, &oid, NULL);
+		if (obj_type < 0)
 			die("entry '%s' object %s is unavailable", path, oid_to_hex(&oid));
-		}
-	} else {
-		if (obj_type != mode_type) {
+		else if (obj_type != mode_type) {
 			/*
 			 * The object exists but is of the wrong type.
 			 * This is a problem regardless of allow_missing

  reply	other threads:[~2022-06-16  6:08 UTC|newest]

Thread overview: 20+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-06-14 13:36 [PATCH] mktree: learn about promised objects Richard Oliver
2022-06-14 14:14 ` Derrick Stolee
2022-06-14 16:33   ` Richard Oliver
2022-06-14 17:27     ` Derrick Stolee
2022-06-15  0:35       ` Taylor Blau
2022-06-15  4:00         ` Jeff King
2022-06-15 17:40           ` Richard Oliver
2022-06-15 18:17             ` Derrick Stolee
2022-06-16  6:07               ` Jeff King [this message]
2022-06-16  6:54                 ` [PATCH] is_promisor_object(): walk promisor packs in pack-order Jeff King
2022-06-16 14:00                   ` Derrick Stolee
2022-06-17 19:50                   ` Jonathan Tan
2022-06-16 13:59                 ` [PATCH] mktree: learn about promised objects Derrick Stolee
2022-06-15 21:01             ` Junio C Hamano
2022-06-16  5:02               ` Jeff King
2022-06-16 15:46               ` [PATCH] mktree: Make '--missing' behave as documented Richard Oliver
2022-06-16 17:44                 ` Junio C Hamano
2022-06-21 13:59                   ` [PATCH] mktree: do not check type of remote objects Richard Oliver
2022-06-21 16:51                     ` Junio C Hamano
2022-06-21 17:48                     ` Junio C Hamano

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=YqrIrYHKUP6b/EtN@coredump.intra.peff.net \
    --to=peff@peff.net \
    --cc=derrickstolee@github.com \
    --cc=git@vger.kernel.org \
    --cc=jonathantanmy@google.com \
    --cc=me@ttaylorr.com \
    --cc=roliver@roku.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.