From: Jeff King <peff@peff.net>
To: Jonathan Tan <jonathantanmy@google.com>
Cc: git@vger.kernel.org
Subject: Re: [PATCH 4/4] commit: don't lazy-fetch commits
Date: Wed, 30 Nov 2022 16:04:42 -0500 [thread overview]
Message-ID: <Y4fFaoRFro2hNDdv@coredump.intra.peff.net> (raw)
In-Reply-To: <6af8dcebd14d803fc8d2a01fbcc7f42ff380719d.1669839849.git.jonathantanmy@google.com>
On Wed, Nov 30, 2022 at 12:30:49PM -0800, Jonathan Tan wrote:
> When parsing commits, fail fast when the commit is missing or
> corrupt, instead of attempting to fetch them. This is done by inlining
> repo_read_object_file() and setting the flag that prevents fetching.
>
> This is motivated by a situation in which through a bug (not necessarily
> through Git), there was corruption in the object store of a partial
> clone. In this particular case, the problem was exposed when "git gc"
> tried to expire reflogs, which calls repo_parse_commit(), which triggers
> fetches of the missing commits.
Makes sense.
> diff --git a/commit.c b/commit.c
> index 572301b80a..17e71f5be4 100644
> --- a/commit.c
> +++ b/commit.c
> @@ -508,6 +508,13 @@ int repo_parse_commit_internal(struct repository *r,
> enum object_type type;
> void *buffer;
> unsigned long size;
> + const struct object_id *real_oid;
> + struct object_info oi = {
> + .typep = &type,
> + .sizep = &size,
> + .contentp = &buffer,
> + .real_oidp = &real_oid,
> + };
> int ret;
>
> if (!item)
> @@ -516,11 +523,18 @@ int repo_parse_commit_internal(struct repository *r,
> return 0;
> if (use_commit_graph && parse_commit_in_graph(r, item))
> return 0;
> - buffer = repo_read_object_file(r, &item->object.oid, &type, &size);
> - if (!buffer)
> +
> + /*
> + * Git does not support partial clones that exclude commits, so set
> + * OBJECT_INFO_SKIP_FETCH_OBJECT to fail fast when an object is missing.
> + */
> + if (oid_object_info_extended(r, &item->object.oid, &oi,
> + OBJECT_INFO_LOOKUP_REPLACE | OBJECT_INFO_SKIP_FETCH_OBJECT) < 0) {
> + die_if_corrupt(r, &item->object.oid, real_oid);
> return quiet_on_missing ? -1 :
> error("Could not read %s",
> oid_to_hex(&item->object.oid));
> + }
OK, so we know we want a commit object because we're in the
commit-parsing function, so we just ask to disable fetching.
Two devil's advocate thoughts:
1. What if we're wrong that it's a commit? If somebody references a
blob in a commit "parent" header, the "right" outcome is for us to
say "oops, the type is wrong" when we try to parse it as a commit.
But now in a partial clone, we might avoid fetching that supposed
commit and say "you don't have this object", even though we could
get it.
I think I'm OK with that. Either way, the repo is corrupt, and
we'll have informed the user of that. The fact that this bizarre
and specific sequence of corruptions might not go as far as it can
to deduce the root cause is probably fine.
2. Are there other places where we'd want to do the same thing? E.g.,
in parse_object() we might ask for an object (not knowing its type)
only to find out that it is a commit. But we have no idea if we
lazy-fetched it or not!
I had somehow imagined your series would be hooking in at the level
of the lazy-fetch code, and complaining about fetching commits. But
that may be tricky to do, because we really don't know the type
until after we fetch it, and selectively removing an object from
the odb is quite hard. We'd probably have to tell the other side
"please, don't send me any commits", which requires a protocol
extension, and...yuck.
By comparison, your approach is an easy win that may catch problems
in practice (and is certainly better than the status quo).
-Peff
next prev parent reply other threads:[~2022-11-30 21:04 UTC|newest]
Thread overview: 85+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-11-30 20:30 [PATCH 0/4] Don't lazy-fetch commits when parsing them Jonathan Tan
2022-11-30 20:30 ` [PATCH 1/4] object-file: reread object with exact same args Jonathan Tan
2022-11-30 20:30 ` [PATCH 2/4] object-file: refactor corrupt object diagnosis Jonathan Tan
2022-11-30 20:47 ` Jeff King
2022-11-30 23:42 ` Junio C Hamano
2022-12-01 19:06 ` Jonathan Tan
2022-11-30 20:30 ` [PATCH 3/4] object-file: refactor replace object lookup Jonathan Tan
2022-11-30 20:54 ` Jeff King
2022-11-30 20:30 ` [PATCH 4/4] commit: don't lazy-fetch commits Jonathan Tan
2022-11-30 21:04 ` Jeff King [this message]
2022-12-01 19:11 ` Jonathan Tan
2022-12-01 19:33 ` Jeff King
2022-11-30 23:56 ` Junio C Hamano
2022-11-30 21:06 ` [PATCH 0/4] Don't lazy-fetch commits when parsing them Jeff King
2022-12-01 19:27 ` [PATCH v2 " Jonathan Tan
2022-12-01 19:27 ` [PATCH v2 1/4] object-file: reread object with exact same args Jonathan Tan
2022-12-01 19:27 ` [PATCH v2 2/4] object-file: refactor corrupt object diagnosis Jonathan Tan
2022-12-01 19:27 ` [PATCH v2 3/4] object-file: refactor replace object lookup Jonathan Tan
2022-12-01 19:27 ` [PATCH v2 4/4] commit: don't lazy-fetch commits Jonathan Tan
2022-12-01 19:54 ` [PATCH v2 0/4] Don't lazy-fetch commits when parsing them Jeff King
2022-12-01 21:26 ` Jonathan Tan
2022-12-02 0:23 ` Jeff King
2022-12-06 0:49 ` Jonathan Tan
2022-12-06 2:03 ` Jeff King
2022-12-01 23:09 ` Junio C Hamano
2022-12-07 0:40 ` [PATCH v2 0/3] " Jonathan Tan
2022-12-07 0:40 ` [PATCH v2 1/3] object-file: don't exit early if skipping loose Jonathan Tan
2022-12-07 1:12 ` Junio C Hamano
2022-12-07 6:14 ` Jeff King
2022-12-07 6:43 ` Junio C Hamano
2022-12-07 23:20 ` Jonathan Tan
2022-12-07 0:40 ` [PATCH v2 2/3] object-file: emit corruption errors when detected Jonathan Tan
2022-12-07 1:16 ` Junio C Hamano
2022-12-07 4:05 ` Ævar Arnfjörð Bjarmason
2022-12-07 7:07 ` Jeff King
2022-12-07 10:33 ` Ævar Arnfjörð Bjarmason
2022-12-07 23:26 ` Jonathan Tan
2022-12-07 23:50 ` Ævar Arnfjörð Bjarmason
2022-12-08 6:33 ` Jeff King
2022-12-07 6:42 ` Jeff King
2022-12-07 0:40 ` [PATCH v2 3/3] commit: don't lazy-fetch commits Jonathan Tan
2022-12-07 1:17 ` Junio C Hamano
2022-12-07 6:47 ` Jeff King
2022-12-08 20:57 ` [PATCH v3 0/4] Don't lazy-fetch commits when parsing them Jonathan Tan
2022-12-08 20:57 ` [PATCH v3 1/4] object-file: remove OBJECT_INFO_IGNORE_LOOSE Jonathan Tan
2022-12-08 20:57 ` [PATCH v3 2/4] object-file: refactor map_loose_object_1() Jonathan Tan
2022-12-09 2:00 ` Jeff King
2022-12-09 18:17 ` Jonathan Tan
2022-12-09 20:27 ` Jeff King
2022-12-09 20:27 ` Jeff King
2022-12-08 20:57 ` [PATCH v3 3/4] object-file: emit corruption errors when detected Jonathan Tan
2022-12-09 1:56 ` Jeff King
2022-12-09 18:26 ` Jonathan Tan
2022-12-09 14:19 ` Ævar Arnfjörð Bjarmason
2022-12-09 18:33 ` Jonathan Tan
2022-12-08 20:57 ` [PATCH v3 4/4] commit: don't lazy-fetch commits Jonathan Tan
2022-12-09 14:14 ` Ævar Arnfjörð Bjarmason
2022-12-09 21:44 ` [PATCH v4 0/4] Don't lazy-fetch commits when parsing them Jonathan Tan
2022-12-09 21:44 ` [PATCH v4 1/4] object-file: remove OBJECT_INFO_IGNORE_LOOSE Jonathan Tan
2022-12-09 21:44 ` [PATCH v4 2/4] object-file: refactor map_loose_object_1() Jonathan Tan
2022-12-09 21:44 ` [PATCH v4 3/4] object-file: emit corruption errors when detected Jonathan Tan
2022-12-10 0:16 ` Junio C Hamano
2022-12-12 20:38 ` Jonathan Tan
2022-12-12 20:49 ` Jeff King
2022-12-12 20:59 ` Jonathan Tan
2022-12-12 21:20 ` Jeff King
2022-12-12 21:29 ` Jonathan Tan
2022-12-12 22:17 ` Jeff King
2022-12-12 22:52 ` Jonathan Tan
2022-12-13 10:37 ` Jeff King
2022-12-09 21:44 ` [PATCH v4 4/4] commit: don't lazy-fetch commits Jonathan Tan
2022-12-12 22:48 ` [PATCH v5 0/4] Don't lazy-fetch commits when parsing them Jonathan Tan
2022-12-12 22:48 ` [PATCH v5 1/4] object-file: remove OBJECT_INFO_IGNORE_LOOSE Jonathan Tan
2022-12-12 22:48 ` [PATCH v5 2/4] object-file: refactor map_loose_object_1() Jonathan Tan
2022-12-12 22:48 ` [PATCH v5 3/4] object-file: emit corruption errors when detected Jonathan Tan
2022-12-13 1:51 ` Junio C Hamano
2022-12-13 10:38 ` Jeff King
2022-12-12 22:48 ` [PATCH v5 4/4] commit: don't lazy-fetch commits Jonathan Tan
2022-12-14 19:17 ` [PATCH v6 0/4] Don't lazy-fetch commits when parsing them Jonathan Tan
2022-12-14 19:17 ` [PATCH v6 1/4] object-file: remove OBJECT_INFO_IGNORE_LOOSE Jonathan Tan
2022-12-14 19:17 ` [PATCH v6 2/4] object-file: refactor map_loose_object_1() Jonathan Tan
2022-12-14 19:17 ` [PATCH v6 3/4] object-file: emit corruption errors when detected Jonathan Tan
2022-12-14 19:17 ` [PATCH v6 4/4] commit: don't lazy-fetch commits Jonathan Tan
2022-12-14 20:43 ` [PATCH v6 0/4] Don't lazy-fetch commits when parsing them Jeff King
2022-12-15 0:07 ` Junio C Hamano
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=Y4fFaoRFro2hNDdv@coredump.intra.peff.net \
--to=peff@peff.net \
--cc=git@vger.kernel.org \
--cc=jonathantanmy@google.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).