git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Jeff King <peff@peff.net>
To: Junio C Hamano <gitster@pobox.com>
Cc: 程洋 <chengyang@xiaomi.com>,
	"Derrick Stolee" <derrickstolee@github.com>,
	"git@vger.kernel.org" <git@vger.kernel.org>,
	何浩 <hehao@xiaomi.com>, "Xin7 Ma 马鑫" <maxin7@xiaomi.com>,
	石奉兵 <shifengbing@xiaomi.com>, 凡军辉 <fanjunhui@xiaomi.com>,
	王汉基 <wanghanji@xiaomi.com>
Subject: Re: [PATCH 2/3] upload-pack: skip parse-object re-hashing of "want" objects
Date: Wed, 7 Sep 2022 16:36:22 -0400	[thread overview]
Message-ID: <YxkAxutS+B8//0WF@coredump.intra.peff.net> (raw)
In-Reply-To: <xmqq1qsnugsu.fsf@gitster.g>

On Wed, Sep 07, 2022 at 12:26:41PM -0700, Junio C Hamano wrote:

> Jeff King <peff@peff.net> writes:
> 
> > The exception for both is if --verify-objects is used. In that case,
> > we'll skip this optimization, and the new test makes sure we do this
> > correctly.
> 
> I wondered if we want to test the change on the "upload-pack" side
> by going in to the swapped-commits repository, running upload-pack
> manually and seeing that it spews unusable output without failing,
> but it probably is not worth the effort.  We have plenty of tests
> that exercises upload-pack in "good" cases.  What might be a good
> test is to try fetching from swapped-commits repository and make
> sure that index-pack on the receiving end notices, but I suspect we
> already have such a "fetch/clone from a corrupt repository" test,
> in which case we do not have to add one.

There are some tests in t1060 that cover transfer of corrupted objects.
But one of the tricky things about corruption is that it can be somewhat
arbitrary where and when we notice. I.e., whether upload-pack notices
the problem and bails or whether it spews an invalid result, we're OK
with either outcome, and a test for it can end up rather fragile.

What we do care about is that _somebody_ notices the problem. t1060
covers that for some cases, though of course there's no partial clone
there. If we add one like this:

diff --git a/t/t1060-object-corruption.sh b/t/t1060-object-corruption.sh
index 5b8e47e346..fd18a8b29e 100755
--- a/t/t1060-object-corruption.sh
+++ b/t/t1060-object-corruption.sh
@@ -139,4 +139,11 @@ test_expect_success 'internal tree objects are not "missing"' '
 	)
 '
 
+test_expect_success 'partial clone of corrupted reository' '
+	test_config -C bit-error uploadpack.allowFilter true &&
+	git clone --no-local --no-checkout --filter=blob:none \
+		bit-error corrupt-partial && \
+	test_must_fail git -C corrupt-partial checkout --force
+'
+
 test_done

we can see that the initial blob:none clone is OK (since neither side
looks at the corrupted blob at all). And then the checkout does barf as
expected. But it looks like we catch the error in upload-pack, probably
because the loose object is so corrupted that we cannot even access its
type header. I tried moving the corruption to further in the file, but I
think it's simply so small that zlib will read the whole input and
complain about the bogus crc.

Hmm. Looks like that script has another corrupted repo where an object
is misnamed. So if we s/bit-error/misnamed/ in the test above, then we
do trigger the new code. Before we get:

  error: hash mismatch d95f3ad14dee633a758d2e331151e950dd13e4ed
  fatal: git upload-pack: not our ref d95f3ad14dee633a758d2e331151e950dd13e4ed
  fatal: remote error: upload-pack: not our ref d95f3ad14dee633a758d2e331151e950dd13e4ed

(the error is a bit misleading, but I guess the inability to create the
object struct means our "is it our ref" code gets confused). After my
patches, we get:

  remote: Enumerating objects: 1, done.
  remote: Counting objects: 100% (1/1), done.
  remote: Total 1 (delta 0), reused 0 (delta 0), pack-reused 0
  Receiving objects: 100% (1/1), 49 bytes | 49.00 KiB/s, done.
  fatal: bad revision 'd95f3ad14dee633a758d2e331151e950dd13e4ed'
  error: [...]/misnamed did not send all necessary objects

Is that worth having? I dunno. It's kind of brittle in that a later
change could mean we're finding the corruption elsewhere, and not
checking exactly what we think we are. OTOH, it probably doesn't hurt to
cover more cases here, and it's not a very expensive test.

-Peff

  reply	other threads:[~2022-09-07 20:36 UTC|newest]

Thread overview: 40+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-08-11  8:09 Partial-clone cause big performance impact on server 程洋
2022-08-11 17:22 ` Jonathan Tan
2022-08-13  7:55   ` 回复: [External Mail]Re: " 程洋
2022-08-13 11:41     ` 程洋
2022-08-15  5:16     ` ZheNing Hu
2022-08-15 13:15       ` 程洋
2022-08-12 12:21 ` Derrick Stolee
2022-08-14  6:48 ` Jeff King
2022-08-15 13:18   ` Derrick Stolee
2022-08-15 14:50     ` [External Mail]Re: " 程洋
2022-08-17 10:22     ` 程洋
2022-08-17 13:41       ` Derrick Stolee
2022-08-18  5:49         ` Jeff King
2022-09-01  6:53   ` 程洋
2022-09-01 16:19     ` Jeff King
2022-09-05 11:17       ` 程洋
2022-09-06 18:38         ` Jeff King
2022-09-06 22:58           ` [PATCH 0/3] speeding up on-demand fetch for blobs in partial clone Jeff King
2022-09-06 23:01             ` [PATCH 1/3] parse_object(): allow skipping hash check Jeff King
2022-09-07 14:15               ` Derrick Stolee
2022-09-07 20:44                 ` Jeff King
2022-09-06 23:05             ` [PATCH 2/3] upload-pack: skip parse-object re-hashing of "want" objects Jeff King
2022-09-07 14:36               ` Derrick Stolee
2022-09-07 14:45                 ` Derrick Stolee
2022-09-07 20:50                   ` Jeff King
2022-09-07 19:26               ` Junio C Hamano
2022-09-07 20:36                 ` Jeff King [this message]
2022-09-07 20:48                   ` [BUG] t1800: Fails for error text comparison rsbecker
2022-09-07 21:55                     ` Junio C Hamano
2022-09-07 22:23                       ` rsbecker
2022-09-07 21:02                   ` [PATCH 2/3] upload-pack: skip parse-object re-hashing of "want" objects Jeff King
2022-09-07 22:07                     ` Junio C Hamano
2022-09-08  5:04                       ` Jeff King
2022-09-08 16:41                         ` Junio C Hamano
2022-09-06 23:06             ` [PATCH 3/3] parse_object(): check commit-graph when skip_hash set Jeff King
2022-09-07 14:46               ` Derrick Stolee
2022-09-07 19:31               ` Junio C Hamano
2022-09-08 10:39                 ` [External Mail]Re: " 程洋
2022-09-08 18:42                   ` Jeff King
2022-09-07 14:48             ` [PATCH 0/3] speeding up on-demand fetch for blobs in partial clone Derrick Stolee

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=YxkAxutS+B8//0WF@coredump.intra.peff.net \
    --to=peff@peff.net \
    --cc=chengyang@xiaomi.com \
    --cc=derrickstolee@github.com \
    --cc=fanjunhui@xiaomi.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=hehao@xiaomi.com \
    --cc=maxin7@xiaomi.com \
    --cc=shifengbing@xiaomi.com \
    --cc=wanghanji@xiaomi.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).