All of lore.kernel.org
 help / color / mirror / Atom feed
From: Jeff King <peff@peff.net>
To: git@vger.kernel.org
Subject: [PATCH 3/5] cat-file: disable refs/replace with --batch-all-objects
Date: Tue, 5 Oct 2021 16:36:07 -0400	[thread overview]
Message-ID: <YVy3N7ZX+s6Mi93y@coredump.intra.peff.net> (raw)
In-Reply-To: <YVy1sx8Xb1xMLFQT@coredump.intra.peff.net>

When we're enumerating all objects in the object database, it doesn't
make sense to respect refs/replace. The point of this option is to
enumerate all of the objects in the database at a low level. By
definition we'd already show the replacement object's contents (under
its real oid), and showing those contents under another oid is almost
certainly working against what the user is trying to do.

Note that you could make the same argument for something like:

  git show-index <foo.idx |
  awk '{print $2}' |
  git cat-file --batch

but there we can't know in cat-file exactly what the user intended,
because we don't know the source of the input. They could be trying to
do low-level debugging, or they could be doing something more high-level
(e.g., imagine a porcelain built around cat-file for its object
accesses). So in those cases, we'll have to rely on the user specifying
"git --no-replace-objects" to tell us what to do.

One _could_ make an argument that "cat-file --batch" is sufficiently
low-level plumbing that it should not respect replace-objects at all
(and the caller should do any replacement if they want it).  But we have
been doing so for some time. The history is a little tangled:

  - looking back as far as v1.6.6, we would not respect replace refs for
    --batch-check, but would for --batch (because the former used
    sha1_object_info(), and the replace mechanism only affected actual
    object reads)

  - this discrepancy was made even weirder by 98e2092b50 (cat-file:
    teach --batch to stream blob objects, 2013-07-10), where we always
    output the header using the --batch-check code, and then printed the
    object separately. This could lead to "cat-file --batch" dying (when
    it notices the size or type changed for a non-blob object) or even
    producing bogus output (in streaming mode, we didn't notice that we
    wrote the wrong number of bytes).

  - that persisted until 1f7117ef7a (sha1_file: perform object
    replacement in sha1_object_info_extended(), 2013-12-11), which then
    respected replace refs for both forms.

So it has worked reliably this way for over 7 years, and we should make
sure it continues to do so. That could also be an argument that
--batch-all-objects should not change behavior (which this patch is
doing), but I really consider the current behavior to be an unintended
bug. It's a side effect of how the code is implemented (feeding the oids
back into oid_object_info() rather than looking at what we found while
reading the loose and packed object storage).

The implementation is straight-forward: we just disable the global
read_replace_refs flag when we're in --batch-all-objects mode. It would
perhaps be a little cleaner to change the flag we pass to
oid_object_info_extended(), but that's not enough. We also read objects
via read_object_file() and stream_blob_to_fd(). The former could switch
to its _extended() form, but the streaming code has no mechanism for
disabling replace refs. Setting the global flag works, and as a bonus,
it's impossible to have any "oops, we're sometimes replacing the object
and sometimes not" bugs in the output (like the ones caused by
98e2092b50 above).

The tests here cover the regular-input and --batch-all-objects cases,
for both --batch-check and --batch. There is a test in t6050 that covers
the regular-input case with --batch already, but this new one goes much
further in actually verifying the output (plus covering --batch-check
explicitly). This is perhaps a little overkill and the tests would be
simpler just covering --batch-check, but I wanted to make sure we're
checking that --batch output is consistent between the header and the
content. The global-flag technique used here makes that easy to get
right, but this is future-proofing us against regressions.

Signed-off-by: Jeff King <peff@peff.net>
---
Reading between the lines, you might guess that I also introduced a
"whoops, our size/content do not match" bug while trying the other
approach. ;)

 Documentation/git-cat-file.txt |  3 +-
 builtin/cat-file.c             |  2 ++
 t/t1006-cat-file.sh            | 66 ++++++++++++++++++++++++++++++++++
 3 files changed, 70 insertions(+), 1 deletion(-)

diff --git a/Documentation/git-cat-file.txt b/Documentation/git-cat-file.txt
index 6467707c6e..27b27e2b30 100644
--- a/Documentation/git-cat-file.txt
+++ b/Documentation/git-cat-file.txt
@@ -96,7 +96,8 @@ OPTIONS
 	any alternate object stores (not just reachable objects).
 	Requires `--batch` or `--batch-check` be specified. By default,
 	the objects are visited in order sorted by their hashes; see
-	also `--unordered` below.
+	also `--unordered` below. Objects are presented as-is, without
+	respecting the "replace" mechanism of linkgit:git-replace[1].
 
 --buffer::
 	Normally batch output is flushed after each object is output, so
diff --git a/builtin/cat-file.c b/builtin/cat-file.c
index 243fe6844b..b713be545e 100644
--- a/builtin/cat-file.c
+++ b/builtin/cat-file.c
@@ -529,6 +529,8 @@ static int batch_objects(struct batch_options *opt)
 		if (has_promisor_remote())
 			warning("This repository uses promisor remotes. Some objects may not be loaded.");
 
+		read_replace_refs = 0;
+
 		cb.opt = opt;
 		cb.expand = &data;
 		cb.scratch = &output;
diff --git a/t/t1006-cat-file.sh b/t/t1006-cat-file.sh
index c77db35728..4a753705ec 100755
--- a/t/t1006-cat-file.sh
+++ b/t/t1006-cat-file.sh
@@ -617,4 +617,70 @@ test_expect_success 'cat-file --batch="batman" with --batch-all-objects will wor
 	cmp expect actual
 '
 
+test_expect_success 'set up replacement object' '
+	orig=$(git rev-parse HEAD) &&
+	git cat-file commit $orig >orig &&
+	{
+		cat orig &&
+		echo extra
+	} >fake &&
+	fake=$(git hash-object -t commit -w fake) &&
+	orig_size=$(git cat-file -s $orig) &&
+	fake_size=$(git cat-file -s $fake) &&
+	git replace $orig $fake
+'
+
+test_expect_success 'cat-file --batch respects replace objects' '
+	git cat-file --batch >actual <<-EOF &&
+	$orig
+	EOF
+	{
+		echo "$orig commit $fake_size" &&
+		cat fake &&
+		echo
+	} >expect &&
+	test_cmp expect actual
+'
+
+test_expect_success 'cat-file --batch-check respects replace objects' '
+	git cat-file --batch-check >actual <<-EOF &&
+	$orig
+	EOF
+	echo "$orig commit $fake_size" >expect &&
+	test_cmp expect actual
+'
+
+# Pull the entry for object with oid "$1" out of the output of
+# "cat-file --batch", including its object content (which requires
+# parsing and reading a set amount of bytes, hence perl).
+extract_batch_output () {
+    perl -ne '
+	BEGIN { $oid = shift }
+	if (/^$oid \S+ (\d+)$/) {
+	    print;
+	    read STDIN, my $buf, $1;
+	    print $buf;
+	    print "\n";
+	}
+    ' "$@"
+}
+
+test_expect_success 'cat-file --batch-all-objects --batch ignores replace' '
+	git cat-file --batch-all-objects --batch >actual.raw &&
+	extract_batch_output $orig <actual.raw >actual &&
+	{
+		echo "$orig commit $orig_size" &&
+		cat orig &&
+		echo
+	} >expect &&
+	test_cmp expect actual
+'
+
+test_expect_success 'cat-file --batch-all-objects --batch-check ignores replace' '
+	git cat-file --batch-all-objects --batch-check >actual.raw &&
+	grep ^$orig actual.raw >actual &&
+	echo "$orig commit $orig_size" >expect &&
+	test_cmp expect actual
+'
+
 test_done
-- 
2.33.0.1231.g45ae28b974


  parent reply	other threads:[~2021-10-05 20:36 UTC|newest]

Thread overview: 101+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-10-05 20:29 [PATCH 0/5] cat-file replace handling and optimization Jeff King
2021-10-05 20:30 ` [PATCH 1/5] t1006: clean up broken objects Jeff King
2021-10-05 20:31 ` [PATCH 2/5] cat-file: mention --unordered along with --batch-all-objects Jeff King
2021-10-05 21:02   ` Ævar Arnfjörð Bjarmason
2021-10-05 21:41     ` Jeff King
2021-10-06  9:02       ` Ævar Arnfjörð Bjarmason
2021-10-06 16:15         ` Jeff King
2021-10-07 10:18           ` Ævar Arnfjörð Bjarmason
2021-10-08  2:30             ` Jeff King
2021-10-08  7:54               ` Ævar Arnfjörð Bjarmason
2021-10-08 20:34                 ` Junio C Hamano
2021-10-08 21:44                   ` Jeff King
2021-10-08 22:04                     ` Junio C Hamano
2021-11-06 21:46                       ` [PATCH 00/10] cat-file: better usage UX & error messages Ævar Arnfjörð Bjarmason
2021-11-06 21:46                         ` [PATCH 01/10] cat-file tests: test bad usage Ævar Arnfjörð Bjarmason
2021-11-07  1:07                           ` Eric Sunshine
2021-11-06 21:46                         ` [PATCH 02/10] cat-file tests: test messaging on bad objects/paths Ævar Arnfjörð Bjarmason
2021-11-06 21:46                         ` [PATCH 03/10] parse-options API: add a usage_msg_optf() Ævar Arnfjörð Bjarmason
2021-11-06 21:46                         ` [PATCH 04/10] cat-file docs: fix SYNOPSIS and "-h" output Ævar Arnfjörð Bjarmason
2021-11-06 21:46                         ` [PATCH 05/10] cat-file: move "usage" variable to cmd_cat_file() Ævar Arnfjörð Bjarmason
2021-11-06 21:46                         ` [PATCH 06/10] cat-file: make --batch-all-objects a CMDMODE Ævar Arnfjörð Bjarmason
2021-11-07  3:00                           ` Eric Sunshine
2021-11-06 21:46                         ` [PATCH 07/10] cat-file: fix remaining usage bugs Ævar Arnfjörð Bjarmason
2021-11-06 21:47                         ` [PATCH 08/10] cat-file: correct and improve usage information Ævar Arnfjörð Bjarmason
2021-11-06 21:47                         ` [PATCH 09/10] object-name.c: don't have GET_OID_ONLY_TO_DIE imply *_QUIETLY Ævar Arnfjörð Bjarmason
2021-11-07  3:05                           ` Eric Sunshine
2021-11-06 21:47                         ` [PATCH 10/10] cat-file: improve --(textconv|filters) disambiguation Ævar Arnfjörð Bjarmason
2021-11-12 22:19                         ` [PATCH v2 00/10] cat-file: better usage UX & error messages Ævar Arnfjörð Bjarmason
2021-11-12 22:19                           ` [PATCH v2 01/10] cat-file tests: test bad usage Ævar Arnfjörð Bjarmason
2021-11-12 22:20                           ` [PATCH v2 02/10] cat-file tests: test messaging on bad objects/paths Ævar Arnfjörð Bjarmason
2021-11-12 22:20                           ` [PATCH v2 03/10] parse-options API: add a usage_msg_optf() Ævar Arnfjörð Bjarmason
2021-11-12 22:20                           ` [PATCH v2 04/10] cat-file docs: fix SYNOPSIS and "-h" output Ævar Arnfjörð Bjarmason
2021-11-12 22:20                           ` [PATCH v2 05/10] cat-file: move "usage" variable to cmd_cat_file() Ævar Arnfjörð Bjarmason
2021-11-12 22:20                           ` [PATCH v2 06/10] cat-file: make --batch-all-objects a CMDMODE Ævar Arnfjörð Bjarmason
2021-11-12 22:20                           ` [PATCH v2 07/10] cat-file: fix remaining usage bugs Ævar Arnfjörð Bjarmason
2021-11-12 22:20                           ` [PATCH v2 08/10] cat-file: correct and improve usage information Ævar Arnfjörð Bjarmason
2021-11-12 22:20                           ` [PATCH v2 09/10] object-name.c: don't have GET_OID_ONLY_TO_DIE imply *_QUIETLY Ævar Arnfjörð Bjarmason
2021-11-12 22:20                           ` [PATCH v2 10/10] cat-file: improve --(textconv|filters) disambiguation Ævar Arnfjörð Bjarmason
2021-11-29 19:57                           ` [PATCH v3 00/10] cat-file: better usage UX & error messages Ævar Arnfjörð Bjarmason
2021-11-29 19:57                             ` [PATCH v3 01/10] cat-file tests: test bad usage Ævar Arnfjörð Bjarmason
2021-11-29 19:57                             ` [PATCH v3 02/10] cat-file tests: test messaging on bad objects/paths Ævar Arnfjörð Bjarmason
2021-11-29 19:57                             ` [PATCH v3 03/10] parse-options API: add a usage_msg_optf() Ævar Arnfjörð Bjarmason
2021-11-29 19:57                             ` [PATCH v3 04/10] cat-file docs: fix SYNOPSIS and "-h" output Ævar Arnfjörð Bjarmason
2021-11-29 19:57                             ` [PATCH v3 05/10] cat-file: move "usage" variable to cmd_cat_file() Ævar Arnfjörð Bjarmason
2021-11-29 19:57                             ` [PATCH v3 06/10] cat-file: make --batch-all-objects a CMDMODE Ævar Arnfjörð Bjarmason
2021-11-29 19:57                             ` [PATCH v3 07/10] cat-file: fix remaining usage bugs Ævar Arnfjörð Bjarmason
2021-12-06  1:19                               ` Jiang Xin
2021-11-29 19:57                             ` [PATCH v3 08/10] cat-file: correct and improve usage information Ævar Arnfjörð Bjarmason
2021-11-29 19:57                             ` [PATCH v3 09/10] object-name.c: don't have GET_OID_ONLY_TO_DIE imply *_QUIETLY Ævar Arnfjörð Bjarmason
2021-11-29 19:57                             ` [PATCH v3 10/10] cat-file: use GET_OID_ONLY_TO_DIE in --(textconv|filters) Ævar Arnfjörð Bjarmason
2021-12-08 12:34                             ` [PATCH v4 00/10] cat-file: better usage UX & error messages Ævar Arnfjörð Bjarmason
2021-12-08 12:34                               ` [PATCH v4 01/10] cat-file tests: test bad usage Ævar Arnfjörð Bjarmason
2021-12-08 12:34                               ` [PATCH v4 02/10] cat-file tests: test messaging on bad objects/paths Ævar Arnfjörð Bjarmason
2021-12-08 12:34                               ` [PATCH v4 03/10] parse-options API: add a usage_msg_optf() Ævar Arnfjörð Bjarmason
2021-12-08 12:34                               ` [PATCH v4 04/10] cat-file docs: fix SYNOPSIS and "-h" output Ævar Arnfjörð Bjarmason
2021-12-08 12:34                               ` [PATCH v4 05/10] cat-file: move "usage" variable to cmd_cat_file() Ævar Arnfjörð Bjarmason
2021-12-08 12:34                               ` [PATCH v4 06/10] cat-file: make --batch-all-objects a CMDMODE Ævar Arnfjörð Bjarmason
2021-12-08 12:34                               ` [PATCH v4 07/10] cat-file: fix remaining usage bugs Ævar Arnfjörð Bjarmason
2021-12-20 16:00                                 ` John Cai
2021-12-08 12:34                               ` [PATCH v4 08/10] cat-file: correct and improve usage information Ævar Arnfjörð Bjarmason
2021-12-08 12:34                               ` [PATCH v4 09/10] object-name.c: don't have GET_OID_ONLY_TO_DIE imply *_QUIETLY Ævar Arnfjörð Bjarmason
2021-12-08 12:34                               ` [PATCH v4 10/10] cat-file: use GET_OID_ONLY_TO_DIE in --(textconv|filters) Ævar Arnfjörð Bjarmason
2021-12-22  4:12                               ` [PATCH v5 00/10] cat-file: better usage UX & error messages Ævar Arnfjörð Bjarmason
2021-12-22  4:12                                 ` [PATCH v5 01/10] cat-file tests: test bad usage Ævar Arnfjörð Bjarmason
2021-12-22  4:12                                 ` [PATCH v5 02/10] cat-file tests: test messaging on bad objects/paths Ævar Arnfjörð Bjarmason
2021-12-22  4:12                                 ` [PATCH v5 03/10] parse-options API: add a usage_msg_optf() Ævar Arnfjörð Bjarmason
2021-12-22  4:12                                 ` [PATCH v5 04/10] cat-file docs: fix SYNOPSIS and "-h" output Ævar Arnfjörð Bjarmason
2021-12-22  4:12                                 ` [PATCH v5 05/10] cat-file: move "usage" variable to cmd_cat_file() Ævar Arnfjörð Bjarmason
2021-12-22  4:12                                 ` [PATCH v5 06/10] cat-file: make --batch-all-objects a CMDMODE Ævar Arnfjörð Bjarmason
2021-12-22  4:13                                 ` [PATCH v5 07/10] cat-file: fix remaining usage bugs Ævar Arnfjörð Bjarmason
2021-12-26  0:31                                   ` Junio C Hamano
2021-12-22  4:13                                 ` [PATCH v5 08/10] cat-file: correct and improve usage information Ævar Arnfjörð Bjarmason
2021-12-22  4:13                                 ` [PATCH v5 09/10] object-name.c: don't have GET_OID_ONLY_TO_DIE imply *_QUIETLY Ævar Arnfjörð Bjarmason
2021-12-22  4:13                                 ` [PATCH v5 10/10] cat-file: use GET_OID_ONLY_TO_DIE in --(textconv|filters) Ævar Arnfjörð Bjarmason
2021-12-28 13:28                                 ` [PATCH v6 00/10] cat-file: better usage UX & error messages Ævar Arnfjörð Bjarmason
2021-12-28 13:28                                   ` [PATCH v6 01/10] cat-file tests: test bad usage Ævar Arnfjörð Bjarmason
2021-12-28 13:28                                   ` [PATCH v6 02/10] cat-file tests: test messaging on bad objects/paths Ævar Arnfjörð Bjarmason
2021-12-28 13:28                                   ` [PATCH v6 03/10] parse-options API: add a usage_msg_optf() Ævar Arnfjörð Bjarmason
2021-12-28 13:28                                   ` [PATCH v6 04/10] cat-file docs: fix SYNOPSIS and "-h" output Ævar Arnfjörð Bjarmason
2021-12-28 13:28                                   ` [PATCH v6 05/10] cat-file: move "usage" variable to cmd_cat_file() Ævar Arnfjörð Bjarmason
2021-12-28 13:28                                   ` [PATCH v6 06/10] cat-file: make --batch-all-objects a CMDMODE Ævar Arnfjörð Bjarmason
2021-12-28 13:28                                   ` [PATCH v6 07/10] cat-file: fix remaining usage bugs Ævar Arnfjörð Bjarmason
2021-12-28 13:28                                   ` [PATCH v6 08/10] cat-file: correct and improve usage information Ævar Arnfjörð Bjarmason
2022-01-08  2:58                                     ` Jiang Xin
2022-01-10 22:08                                       ` [PATCH 0/2] fixups for issues in next-merged ab/cat-file Ævar Arnfjörð Bjarmason
2022-01-10 22:08                                         ` [PATCH 1/2] cat-file: don't whitespace-pad "(...)" in SYNOPSIS and usage output Ævar Arnfjörð Bjarmason
2022-01-10 22:08                                         ` [PATCH 2/2] cat-file: s/_/-/ in typo'd usage_msg_optf() message Ævar Arnfjörð Bjarmason
2022-01-10 22:20                                         ` [PATCH 0/2] fixups for issues in next-merged ab/cat-file Junio C Hamano
2022-01-11 15:48                                         ` Taylor Blau
2022-01-12 18:11                                           ` Junio C Hamano
2021-12-28 13:28                                   ` [PATCH v6 09/10] object-name.c: don't have GET_OID_ONLY_TO_DIE imply *_QUIETLY Ævar Arnfjörð Bjarmason
2021-12-28 13:28                                   ` [PATCH v6 10/10] cat-file: use GET_OID_ONLY_TO_DIE in --(textconv|filters) Ævar Arnfjörð Bjarmason
2021-10-05 20:36 ` Jeff King [this message]
2021-10-06 20:33   ` [PATCH 3/5] cat-file: disable refs/replace with --batch-all-objects Derrick Stolee
2021-10-07 20:48   ` Junio C Hamano
2021-10-05 20:36 ` [PATCH 4/5] cat-file: split ordered/unordered batch-all-objects callbacks Jeff King
2021-10-05 20:38 ` [PATCH 5/5] cat-file: use packed_object_info() for --batch-all-objects Jeff King
2021-10-07 20:56   ` Junio C Hamano
2021-10-08  2:35     ` Jeff King
2021-10-06 20:41 ` [PATCH 0/5] cat-file replace handling and optimization Derrick Stolee
2021-10-07  0:32   ` Jeff King

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=YVy3N7ZX+s6Mi93y@coredump.intra.peff.net \
    --to=peff@peff.net \
    --cc=git@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.