All of lore.kernel.org
 help / color / mirror / Atom feed
From: Jeff King <peff@peff.net>
To: Andrew Olsen <andrew.olsen@koordinates.com>
Cc: git@vger.kernel.org
Subject: Re: [BUG] git rev-list --missing=allow-promisor
Date: Thu, 11 Aug 2022 04:12:01 -0400	[thread overview]
Message-ID: <YvS50W6wku5Y/NC7@coredump.intra.peff.net> (raw)
In-Reply-To: <CAPJmHpVssKshapGYDF-ifU1fts-jFTC-HqxnjN8meSMP3weB4g@mail.gmail.com>

On Thu, Aug 11, 2022 at 09:33:54AM +1200, Andrew Olsen wrote:

> Steps to reproduce:
> 1. Create a git repository with missing+promised blobs.
> For example, I did this:
> git clone git@github.com:git/git.git git-no-blobs --filter=blob:none
> --depth=1 --bare
> 
> --filter=blob:none - don't fetch any blobs
> --depth=1 - we don't need git's entire history, just one commit is fine
> --bare - this avoids git fetching any blobs as part of the checkout operation
> 
> 2. In this repository, run:
> git rev-list HEAD --objects --missing=allow-promisor

Thanks for a clear report. I can reproduce this easily.

> Expected outcome:
> git rev-list prints the paths and OIDs of all the objects
> 
> Actual outcome:
> Any of the following:
> a) git rev-list prints the paths and OIDs of all the objects
> b) git rev-list fails with "fatal: malformed mode in tree entry"
> c) git rev-list fails with "fatal: too-short tree object"

Even better is to compile with SANITIZE=address, which reliably shows
that this is indeed a use-after-free.

> Diagnosis:
> With missing=allow-promisor, when git encounters a missing object, it
> calls is_promisor on it, which in turn calls add_promisor_object on
> lots of things.
> In this line in add_promisor_object, the buffer of the tree that we
> are currently traversing is freed:
> https://github.com/git/git/blob/master/packfile.c#L2242

Ugh, yes. This is due to my fcc07e980b (is_promisor_object(): free tree
buffer after parsing, 2021-04-13). In most cases we'll have just
allocated memory for the tree buffer in parse_object(), but occasionally
we'll jump into is_promisor_object() while looking at an existing tree,
and parse_object() will just reuse the buffer cached in the struct.

It would be really nice to have some kind of reference-counting for
these cached buffers, but doing it everywhere may be rather complicated.
An easier hack to fix this area is for it to check ahead of time whether
there's a cached buffer we'll reuse. That only fixes this spot, but this
function is rather unlike most others in that it starts looking at new
random trees in the middle of an existing tree traversal (this would be
possible in a regular tree walk if you could have circular references,
but you can't easily because of sha1).

So something like this makes the bug go away, though it does feel a
little dirty:

---
diff --git a/packfile.c b/packfile.c
index 6b0eb9048e..c3186c1a02 100644
--- a/packfile.c
+++ b/packfile.c
@@ -2217,7 +2217,14 @@ static int add_promisor_object(const struct object_id *oid,
 			       void *set_)
 {
 	struct oidset *set = set_;
-	struct object *obj = parse_object(the_repository, oid);
+	struct object *obj;
+	int save_tree_buffer = 0;
+
+	obj = lookup_object(the_repository, oid);
+	if (obj && obj->type == OBJ_TREE && obj->parsed)
+		save_tree_buffer = 1;
+
+	obj = parse_object(the_repository, oid);
 	if (!obj)
 		return 1;
 
@@ -2239,7 +2246,8 @@ static int add_promisor_object(const struct object_id *oid,
 			return 0;
 		while (tree_entry_gently(&desc, &entry))
 			oidset_insert(set, &entry.oid);
-		free_tree_buffer(tree);
+		if (!save_tree_buffer)
+			free_tree_buffer(tree);
 	} else if (obj->type == OBJ_COMMIT) {
 		struct commit *commit = (struct commit *) obj;
 		struct commit_list *parents = commit->parents;

-Peff

  reply	other threads:[~2022-08-11  8:12 UTC|newest]

Thread overview: 5+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-08-10 21:33 [BUG] git rev-list --missing=allow-promisor Andrew Olsen
2022-08-11  8:12 ` Jeff King [this message]
2022-08-14  6:29   ` [PATCH] is_promisor_object(): fix use-after-free of tree buffer Jeff King
2022-08-15  5:32     ` Junio C Hamano
2022-08-15 22:53       ` Jeff King

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=YvS50W6wku5Y/NC7@coredump.intra.peff.net \
    --to=peff@peff.net \
    --cc=andrew.olsen@koordinates.com \
    --cc=git@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.