All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/16] enabling GIT_REF_PARANOIA by default
@ 2021-09-24 18:30 Jeff King
  2021-09-24 18:32 ` [PATCH 01/16] t7900: clean up some more broken refs Jeff King
                   ` (16 more replies)
  0 siblings, 17 replies; 22+ messages in thread
From: Jeff King @ 2021-09-24 18:30 UTC (permalink / raw)
  To: git; +Cc: Jonathan Tan

I recently ran into a situation where dealing with a corrupted
repository was more confusing than necessary, because Git by default
ignores corrupted refs in many commands.

A while ago we introduced GIT_REF_PARANOIA, which works by including
broken refs in iteration, which then typically causes later operations
to fail (e.g., during repacking, you'd prefer to barf loudly when trying
to access the missing object rather than incorrectly assume the objects
from the broken ref aren't reachable).

I think this is a better default for Git to have in general, not just
for a few select operations (we turn it on by default for pruning and
some repacks). We shouldn't see corruptions in general, and complaining
loudly when we do is the safest option. The reason we held back when the
knob was introduced was mostly out of deference to the historical
behavior.

So this series started as a patch to just flip that default, but I found
some interesting things:

 - there are a couple of tests that get confused. IMHO this is
   vindicating the idea of flipping the default, beacuse in each case
   these tests were poorly written (either corruptions they didn't
   realize they had, or doing questionable operations on an incomplete
   set of refs)

 - the existing GIT_REF_PARANOIA is over-eager to complain about
   dangling symrefs, even though they're perfectly fine

 - as usual, there was some obvious cleanup along the way. ;)

Even if you don't buy the argument that we should flip the default, I
think everything up through patch 11 is a worthwhile cleanup on its own.

Note that this conflicts with jt/no-abuse-alternate-odb-for-submodules,
since it is touching the innards of DO_FOR_EACH_REF_INCLUDE_BROKEN, too.
I left a note on that series about how I think that could be reconciled
(i.e., the conflict is just around how the code is written, and not
inherent to the goals).

In the end I left GIT_REF_PARANOIA as a knob, just defaulting to "1". I
think it's possibly useful as an escape hatch when dealing with a
corrupt repo. But we _could_ go all the way and basically drop
DO_FOR_EACH_REF_INCLUDE_BROKEN's do-we-have-the-object check entirely.
That would totally sever the relationship between the ref store and the
object store, which would make things conceptually a lot simpler (and I
saw was discussed in some of those earlier threads).

Just a breakdown of the series:

  [01/16]: t7900: clean up some more broken refs
  [02/16]: t5516: don't use HEAD ref for invalid ref-deletion tests
  [03/16]: t5600: provide detached HEAD for corruption failures
  [04/16]: t5312: drop "verbose" helper
  [05/16]: t5312: create bogus ref as necessary
  [06/16]: t5312: test non-destructive repack
  [07/16]: t5312: be more assertive about command failure

     Test cleanups. Necessary for the default flip, but I think each
     stands on its own.

  [08/16]: refs-internal.h: move DO_FOR_EACH_* flags next to each other
  [09/16]: refs-internal.h: reorganize DO_FOR_EACH_* flag documentation

     Cleanup of existing features.

  [10/16]: refs: add DO_FOR_EACH_OMIT_DANGLING_SYMREFS flag
  [11/16]: refs: omit dangling symrefs when using GIT_REF_PARANOIA

     Fixing the current over-eager behavior of GIT_REF_PARANOIA.

  [12/16]: refs: turn on GIT_REF_PARANOIA by default

     The actual flip.

  [13/16]: repack, prune: drop GIT_REF_PARANOIA settings
  [14/16]: ref-filter: stop setting FILTER_REFS_INCLUDE_BROKEN
  [15/16]: ref-filter: drop broken-ref code entirely
  [16/16]: refs: drop "broken" flag from for_each_fullref_in()

     Some small cleanups we can do as a result.

 Documentation/git.txt         | 19 ++++++------
 builtin/branch.c              |  2 +-
 builtin/for-each-ref.c        |  2 +-
 builtin/prune.c               |  1 -
 builtin/repack.c              |  3 --
 builtin/rev-parse.c           |  4 +--
 cache.h                       |  8 -----
 environment.c                 |  1 -
 ls-refs.c                     |  2 +-
 ref-filter.c                  | 22 ++++++--------
 ref-filter.h                  |  1 -
 refs.c                        | 42 +++++++++++++-------------
 refs.h                        |  9 ++----
 refs/files-backend.c          |  5 ++++
 refs/refs-internal.h          | 56 ++++++++++++++++++++++-------------
 revision.c                    |  2 +-
 t/t1430-bad-ref-name.sh       |  2 +-
 t/t5312-prune-corruption.sh   | 48 ++++++++++++++++++++++--------
 t/t5516-fetch-push.sh         | 19 ++++++------
 t/t5600-clone-fail-cleanup.sh |  4 ++-
 t/t7900-maintenance.sh        |  6 +++-
 21 files changed, 142 insertions(+), 116 deletions(-)

-Peff

^ permalink raw reply	[flat|nested] 22+ messages in thread

* [PATCH 01/16] t7900: clean up some more broken refs
  2021-09-24 18:30 [PATCH 0/16] enabling GIT_REF_PARANOIA by default Jeff King
@ 2021-09-24 18:32 ` Jeff King
  2021-09-27 17:38   ` Jonathan Tan
  2021-09-24 18:33 ` [PATCH 02/16] t5516: don't use HEAD ref for invalid ref-deletion tests Jeff King
                   ` (15 subsequent siblings)
  16 siblings, 1 reply; 22+ messages in thread
From: Jeff King @ 2021-09-24 18:32 UTC (permalink / raw)
  To: git; +Cc: Jonathan Tan, Derrick Stolee

The "incremental-repack task" test replaces the object directory with a
known state. As a result, some of our refs point to objects that are not
included in that state.

Commit 3cf5f221be (t7900: clean up some broken refs, 2021-01-19) cleaned
up some of those (that were causing warnings to stderr from the
maintenance process). But there are a few more that were missed. These
aren't hurting anything for now, but it's certainly an unexpected state
to leave the test repository in, and it will become a problem if repack
ever gets more picky about broken refs.

Let's clean up those additional refs (which are all in refs/remotes,
with nothing there that isn't broken), and add an extra "for-each-ref"
call to assert that we've got everything.

Signed-off-by: Jeff King <peff@peff.net>
---
 t/t7900-maintenance.sh | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/t/t7900-maintenance.sh b/t/t7900-maintenance.sh
index 36a4218745..31245f6276 100755
--- a/t/t7900-maintenance.sh
+++ b/t/t7900-maintenance.sh
@@ -277,7 +277,7 @@ test_expect_success 'incremental-repack task' '
 
 	# Delete refs that have not been repacked in these packs.
 	git for-each-ref --format="delete %(refname)" \
-		refs/prefetch refs/tags >refs &&
+		refs/prefetch refs/tags refs/remotes >refs &&
 	git update-ref --stdin <refs &&
 
 	# Replace the object directory with this pack layout.
@@ -286,6 +286,10 @@ test_expect_success 'incremental-repack task' '
 	ls $packDir/*.pack >packs-before &&
 	test_line_count = 3 packs-before &&
 
+	# make sure we do not have any broken refs that were
+	# missed in the deletion above
+	git for-each-ref &&
+
 	# the job repacks the two into a new pack, but does not
 	# delete the old ones.
 	git maintenance run --task=incremental-repack &&
-- 
2.33.0.1071.gb37e412355


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH 02/16] t5516: don't use HEAD ref for invalid ref-deletion tests
  2021-09-24 18:30 [PATCH 0/16] enabling GIT_REF_PARANOIA by default Jeff King
  2021-09-24 18:32 ` [PATCH 01/16] t7900: clean up some more broken refs Jeff King
@ 2021-09-24 18:33 ` Jeff King
  2021-09-24 18:34 ` [PATCH 03/16] t5600: provide detached HEAD for corruption failures Jeff King
                   ` (14 subsequent siblings)
  16 siblings, 0 replies; 22+ messages in thread
From: Jeff King @ 2021-09-24 18:33 UTC (permalink / raw)
  To: git; +Cc: Jonathan Tan

A few tests in t5516 want to assert that we can delete a corrupted ref
whose pointed-to object is missing. They do so by using the "main"
branch, which is also pointed to by HEAD.

This does work, but only because of a subtle assumption about the
implementation. We do not block the deletion because of the invalid ref,
but we _also_ do not notice that the deleted branch is pointed to by
HEAD. And so the safety rule of "do not allow HEAD to be deleted in a
non-bare repository" does not kick in, and the test passes.

Let's instead use a non-HEAD branch. That still tests what we care about
here (deleting a corrupt ref), but without implicitly depending on our
failure to notice that we're deleting HEAD. That will future proof the
test against that behavior changing.

Signed-off-by: Jeff King <peff@peff.net>
---
 t/t5516-fetch-push.sh | 18 +++++++++---------
 1 file changed, 9 insertions(+), 9 deletions(-)

diff --git a/t/t5516-fetch-push.sh b/t/t5516-fetch-push.sh
index 4db8edd9c8..b13553ecf4 100755
--- a/t/t5516-fetch-push.sh
+++ b/t/t5516-fetch-push.sh
@@ -662,10 +662,10 @@ test_expect_success 'push does not update local refs on failure' '
 
 test_expect_success 'allow deleting an invalid remote ref' '
 
-	mk_test testrepo heads/main &&
+	mk_test testrepo heads/branch &&
 	rm -f testrepo/.git/objects/??/* &&
-	git push testrepo :refs/heads/main &&
-	(cd testrepo && test_must_fail git rev-parse --verify refs/heads/main)
+	git push testrepo :refs/heads/branch &&
+	(cd testrepo && test_must_fail git rev-parse --verify refs/heads/branch)
 
 '
 
@@ -706,25 +706,25 @@ test_expect_success 'pushing valid refs triggers post-receive and post-update ho
 '
 
 test_expect_success 'deleting dangling ref triggers hooks with correct args' '
-	mk_test_with_hooks testrepo heads/main &&
+	mk_test_with_hooks testrepo heads/branch &&
 	rm -f testrepo/.git/objects/??/* &&
-	git push testrepo :refs/heads/main &&
+	git push testrepo :refs/heads/branch &&
 	(
 		cd testrepo/.git &&
 		cat >pre-receive.expect <<-EOF &&
-		$ZERO_OID $ZERO_OID refs/heads/main
+		$ZERO_OID $ZERO_OID refs/heads/branch
 		EOF
 
 		cat >update.expect <<-EOF &&
-		refs/heads/main $ZERO_OID $ZERO_OID
+		refs/heads/branch $ZERO_OID $ZERO_OID
 		EOF
 
 		cat >post-receive.expect <<-EOF &&
-		$ZERO_OID $ZERO_OID refs/heads/main
+		$ZERO_OID $ZERO_OID refs/heads/branch
 		EOF
 
 		cat >post-update.expect <<-EOF &&
-		refs/heads/main
+		refs/heads/branch
 		EOF
 
 		test_cmp pre-receive.expect pre-receive.actual &&
-- 
2.33.0.1071.gb37e412355


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH 03/16] t5600: provide detached HEAD for corruption failures
  2021-09-24 18:30 [PATCH 0/16] enabling GIT_REF_PARANOIA by default Jeff King
  2021-09-24 18:32 ` [PATCH 01/16] t7900: clean up some more broken refs Jeff King
  2021-09-24 18:33 ` [PATCH 02/16] t5516: don't use HEAD ref for invalid ref-deletion tests Jeff King
@ 2021-09-24 18:34 ` Jeff King
  2021-09-24 18:35 ` [PATCH 04/16] t5312: drop "verbose" helper Jeff King
                   ` (13 subsequent siblings)
  16 siblings, 0 replies; 22+ messages in thread
From: Jeff King @ 2021-09-24 18:34 UTC (permalink / raw)
  To: git; +Cc: Jonathan Tan

When checking how git-clone behaves when it fails, we stimulate some
failures by trying to do a clone from a local repository whose objects
have been removed. Because these clones use local optimizations, there's
a subtle dependency in how the corruption is handled on the sending
side.

If upload-pack does not show us the broken refs (which it does not
currently), then we see only HEAD (which is itself broken), and clone
that as a detached HEAD. When we try to write the ref, we notice that we
never got the object and bail.

But if upload-pack _does_ show us the broken refs (which it may in a
future patch), then we'll realize that HEAD is a symref and just write
that. You'd think we'd fail when writing out the refs themselves, but we
don't; we do a bulk write and skip the connectivity check because of our
--local optimizations. For the non-bare case, we do notice the problem
when we try to checkout. But for a bare repository, we unexpectedly
complete the clone successfully!

At first glance this may seem like a bug. But the whole point of those
local optimizations is to give up some safety for speed. If you want to
be careful, you should be using "--no-local", which would notice that
the pack did not transfer sufficient objects. We could do that in these
tests, but part of the point is for them to fail at specific moments
(and indeed, we have a later test that checks for transport failure).

However, we can make this less subtle and future-proof it against
changes on the upload-pack side by just having an explicit detached
HEAD in the corrupted repo. Now we'll fail as expected during the ref
write if any ref _or_ HEAD is corrupt, whether we're --bare or not.

Signed-off-by: Jeff King <peff@peff.net>
---
 t/t5600-clone-fail-cleanup.sh | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/t/t5600-clone-fail-cleanup.sh b/t/t5600-clone-fail-cleanup.sh
index 5bf10261d3..34b3df4027 100755
--- a/t/t5600-clone-fail-cleanup.sh
+++ b/t/t5600-clone-fail-cleanup.sh
@@ -35,7 +35,9 @@ test_expect_success 'create a repo to clone' '
 '
 
 test_expect_success 'create objects in repo for later corruption' '
-	test_commit -C foo file
+	test_commit -C foo file &&
+	git -C foo checkout --detach &&
+	test_commit -C foo detached
 '
 
 # source repository given to git clone should be relative to the
-- 
2.33.0.1071.gb37e412355


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH 04/16] t5312: drop "verbose" helper
  2021-09-24 18:30 [PATCH 0/16] enabling GIT_REF_PARANOIA by default Jeff King
                   ` (2 preceding siblings ...)
  2021-09-24 18:34 ` [PATCH 03/16] t5600: provide detached HEAD for corruption failures Jeff King
@ 2021-09-24 18:35 ` Jeff King
  2021-09-24 18:36 ` [PATCH 05/16] t5312: create bogus ref as necessary Jeff King
                   ` (12 subsequent siblings)
  16 siblings, 0 replies; 22+ messages in thread
From: Jeff King @ 2021-09-24 18:35 UTC (permalink / raw)
  To: git; +Cc: Jonathan Tan

t5312 has several uses of the "verbose" helper, as described in
8ad1652418 (t5304: use helper to report failure of "test foo = bar",
2014-10-10). Back then the "-x" trace option for tests was new, and was
not as pleasant to use (e.g., some tests failed under "-x", we did not
support BASH_XTRACEFD, etc).

These days it is clear that "-x" is the preferred way to get extra
output, and we don't need to mark up individual tests. Let's get rid of
the uses of "verbose" here, as one step toward eradicating it totally.

Signed-off-by: Jeff King <peff@peff.net>
---
I've been tempted to do this tree-wide for a while, and I don't mind
doing that separately. But as I was touching these tests, it seemed like
a good time to do it here.

 t/t5312-prune-corruption.sh | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/t/t5312-prune-corruption.sh b/t/t5312-prune-corruption.sh
index 11423b3cb2..165cc80fa4 100755
--- a/t/t5312-prune-corruption.sh
+++ b/t/t5312-prune-corruption.sh
@@ -31,21 +31,21 @@ test_expect_success 'create history reachable only from a bogus-named ref' '
 test_expect_success 'pruning does not drop bogus object' '
 	test_when_finished "git hash-object -w -t commit saved" &&
 	test_might_fail git prune --expire=now &&
-	verbose git cat-file -e $bogus
+	git cat-file -e $bogus
 '
 
 test_expect_success 'put bogus object into pack' '
 	git tag reachable $bogus &&
 	git repack -ad &&
 	git tag -d reachable &&
-	verbose git cat-file -e $bogus
+	git cat-file -e $bogus
 '
 
 test_expect_success 'destructive repack keeps packed object' '
 	test_might_fail git repack -Ad --unpack-unreachable=now &&
-	verbose git cat-file -e $bogus &&
+	git cat-file -e $bogus &&
 	test_might_fail git repack -ad &&
-	verbose git cat-file -e $bogus
+	git cat-file -e $bogus
 '
 
 # subsequent tests will have different corruptions
@@ -78,7 +78,7 @@ test_expect_success 'create history with missing tip commit' '
 test_expect_success 'pruning with a corrupted tip does not drop history' '
 	test_when_finished "git hash-object -w -t commit saved" &&
 	test_might_fail git prune --expire=now &&
-	verbose git cat-file -e $recoverable
+	git cat-file -e $recoverable
 '
 
 test_expect_success 'pack-refs does not silently delete broken loose ref' '
-- 
2.33.0.1071.gb37e412355


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH 05/16] t5312: create bogus ref as necessary
  2021-09-24 18:30 [PATCH 0/16] enabling GIT_REF_PARANOIA by default Jeff King
                   ` (3 preceding siblings ...)
  2021-09-24 18:35 ` [PATCH 04/16] t5312: drop "verbose" helper Jeff King
@ 2021-09-24 18:36 ` Jeff King
  2021-09-24 18:36 ` [PATCH 06/16] t5312: test non-destructive repack Jeff King
                   ` (11 subsequent siblings)
  16 siblings, 0 replies; 22+ messages in thread
From: Jeff King @ 2021-09-24 18:36 UTC (permalink / raw)
  To: git; +Cc: Jonathan Tan

Some tests in t5312 create an illegally-named ref, and then see how
various operations handle it. But between those operations, we also do
some more setup (e.g., repacking), and we are subtly depending on how
those setup steps react to the illegal ref.

To future-proof us against those behaviors changing, let's instead
create and clean up our bogus ref on demand in the tests that need it.

This has two small extra advantages:

 - the tests are more stand-alone; we do not need an extra test to clean
   up the ref before moving on to other parts of the script

 - the creation and cleanup is together in one helper function. Because
   these depend on touching the refs in the filesystem directly, they
   may need to be tweaked for a world with alternate backends (they have
   not been noticed so far in the reftable work because with a non-file
   backend the tests don't fail; they simply become uninteresting noops
   because the broken ref isn't read at all).

Signed-off-by: Jeff King <peff@peff.net>
---
 t/t5312-prune-corruption.sh | 13 +++++++------
 1 file changed, 7 insertions(+), 6 deletions(-)

diff --git a/t/t5312-prune-corruption.sh b/t/t5312-prune-corruption.sh
index 165cc80fa4..0704f3e930 100755
--- a/t/t5312-prune-corruption.sh
+++ b/t/t5312-prune-corruption.sh
@@ -18,18 +18,23 @@ test_expect_success 'disable reflogs' '
 	git reflog expire --expire=all --all
 '
 
+create_bogus_ref () {
+	test_when_finished 'rm -f .git/refs/heads/bogus..name' &&
+	echo $bogus >.git/refs/heads/bogus..name
+}
+
 test_expect_success 'create history reachable only from a bogus-named ref' '
 	test_tick && git commit --allow-empty -m main &&
 	base=$(git rev-parse HEAD) &&
 	test_tick && git commit --allow-empty -m bogus &&
 	bogus=$(git rev-parse HEAD) &&
 	git cat-file commit $bogus >saved &&
-	echo $bogus >.git/refs/heads/bogus..name &&
 	git reset --hard HEAD^
 '
 
 test_expect_success 'pruning does not drop bogus object' '
 	test_when_finished "git hash-object -w -t commit saved" &&
+	create_bogus_ref &&
 	test_might_fail git prune --expire=now &&
 	git cat-file -e $bogus
 '
@@ -42,17 +47,13 @@ test_expect_success 'put bogus object into pack' '
 '
 
 test_expect_success 'destructive repack keeps packed object' '
+	create_bogus_ref &&
 	test_might_fail git repack -Ad --unpack-unreachable=now &&
 	git cat-file -e $bogus &&
 	test_might_fail git repack -ad &&
 	git cat-file -e $bogus
 '
 
-# subsequent tests will have different corruptions
-test_expect_success 'clean up bogus ref' '
-	rm .git/refs/heads/bogus..name
-'
-
 # We create two new objects here, "one" and "two". Our
 # main branch points to "two", which is deleted,
 # corrupting the repository. But we'd like to make sure
-- 
2.33.0.1071.gb37e412355


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH 06/16] t5312: test non-destructive repack
  2021-09-24 18:30 [PATCH 0/16] enabling GIT_REF_PARANOIA by default Jeff King
                   ` (4 preceding siblings ...)
  2021-09-24 18:36 ` [PATCH 05/16] t5312: create bogus ref as necessary Jeff King
@ 2021-09-24 18:36 ` Jeff King
  2021-09-24 18:37 ` [PATCH 07/16] t5312: be more assertive about command failure Jeff King
                   ` (10 subsequent siblings)
  16 siblings, 0 replies; 22+ messages in thread
From: Jeff King @ 2021-09-24 18:36 UTC (permalink / raw)
  To: git; +Cc: Jonathan Tan

In t5312, we create a state with a broken ref, and then make sure that
destructive repacks don't silently ignore the breakage (where a
destructive repack is one that might drop objects). But we don't check
the behavior of non-destructive repacks at all (i.e., ones where we'd
keep unreachable objects).

So let's add a test to confirm the current behavior, which is that
they are allowed (i.e., ignoring the breakage and considering any
objects it points to as unreachable). This may change in the future, but
we'd like for the test suite to alert us to that fact.

Signed-off-by: Jeff King <peff@peff.net>
---
 t/t5312-prune-corruption.sh | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/t/t5312-prune-corruption.sh b/t/t5312-prune-corruption.sh
index 0704f3e930..c7010fb754 100755
--- a/t/t5312-prune-corruption.sh
+++ b/t/t5312-prune-corruption.sh
@@ -46,6 +46,11 @@ test_expect_success 'put bogus object into pack' '
 	git cat-file -e $bogus
 '
 
+test_expect_success 'non-destructive repack ignores bogus name' '
+	create_bogus_ref &&
+	git repack -adk
+'
+
 test_expect_success 'destructive repack keeps packed object' '
 	create_bogus_ref &&
 	test_might_fail git repack -Ad --unpack-unreachable=now &&
-- 
2.33.0.1071.gb37e412355


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH 07/16] t5312: be more assertive about command failure
  2021-09-24 18:30 [PATCH 0/16] enabling GIT_REF_PARANOIA by default Jeff King
                   ` (5 preceding siblings ...)
  2021-09-24 18:36 ` [PATCH 06/16] t5312: test non-destructive repack Jeff King
@ 2021-09-24 18:37 ` Jeff King
  2021-09-24 18:37 ` [PATCH 08/16] refs-internal.h: move DO_FOR_EACH_* flags next to each other Jeff King
                   ` (9 subsequent siblings)
  16 siblings, 0 replies; 22+ messages in thread
From: Jeff King @ 2021-09-24 18:37 UTC (permalink / raw)
  To: git; +Cc: Jonathan Tan

When repacking or pruning in a corrupted repository, our tests in t5312
argue that it is OK to complete the operation or bail, as long as we
don't actually delete the objects pointed to by the corruption.

This isn't a wrong line of reasoning, but the tests are a bit permissive
by using test_might_fail. The fact is that we _do_ bail currently, and
if we ever stopped doing so, that would be worthy of a human
investigating. So let's switch these to test_must_fail.

Signed-off-by: Jeff King <peff@peff.net>
---
 t/t5312-prune-corruption.sh | 11 +++++++----
 1 file changed, 7 insertions(+), 4 deletions(-)

diff --git a/t/t5312-prune-corruption.sh b/t/t5312-prune-corruption.sh
index c7010fb754..1522a4ba8e 100755
--- a/t/t5312-prune-corruption.sh
+++ b/t/t5312-prune-corruption.sh
@@ -7,6 +7,9 @@ if we see, for example, a ref with a bogus name, it is OK either to
 bail out or to proceed using it as a reachable tip, but it is _not_
 OK to proceed as if it did not exist. Otherwise we might silently
 delete objects that cannot be recovered.
+
+Note that we do assert command failure in these cases, because that is
+what currently happens. If that changes, these tests should be revisited.
 '
 GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME=main
 export GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME
@@ -35,7 +38,7 @@ test_expect_success 'create history reachable only from a bogus-named ref' '
 test_expect_success 'pruning does not drop bogus object' '
 	test_when_finished "git hash-object -w -t commit saved" &&
 	create_bogus_ref &&
-	test_might_fail git prune --expire=now &&
+	test_must_fail git prune --expire=now &&
 	git cat-file -e $bogus
 '
 
@@ -53,9 +56,9 @@ test_expect_success 'non-destructive repack ignores bogus name' '
 
 test_expect_success 'destructive repack keeps packed object' '
 	create_bogus_ref &&
-	test_might_fail git repack -Ad --unpack-unreachable=now &&
+	test_must_fail git repack -Ad --unpack-unreachable=now &&
 	git cat-file -e $bogus &&
-	test_might_fail git repack -ad &&
+	test_must_fail git repack -ad &&
 	git cat-file -e $bogus
 '
 
@@ -83,7 +86,7 @@ test_expect_success 'create history with missing tip commit' '
 
 test_expect_success 'pruning with a corrupted tip does not drop history' '
 	test_when_finished "git hash-object -w -t commit saved" &&
-	test_might_fail git prune --expire=now &&
+	test_must_fail git prune --expire=now &&
 	git cat-file -e $recoverable
 '
 
-- 
2.33.0.1071.gb37e412355


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH 08/16] refs-internal.h: move DO_FOR_EACH_* flags next to each other
  2021-09-24 18:30 [PATCH 0/16] enabling GIT_REF_PARANOIA by default Jeff King
                   ` (6 preceding siblings ...)
  2021-09-24 18:37 ` [PATCH 07/16] t5312: be more assertive about command failure Jeff King
@ 2021-09-24 18:37 ` Jeff King
  2021-09-24 18:39 ` [PATCH 09/16] refs-internal.h: reorganize DO_FOR_EACH_* flag documentation Jeff King
                   ` (8 subsequent siblings)
  16 siblings, 0 replies; 22+ messages in thread
From: Jeff King @ 2021-09-24 18:37 UTC (permalink / raw)
  To: git; +Cc: Jonathan Tan

There are currently two DO_FOR_EACH_* flags, which must not have their
bits overlap. Yet they're defined hundreds of lines apart. Let's move
them next to each other to make it clear that they are related and are a
complete set (which matters if you are adding a new flag and would like
to know what the next available bit is).

Signed-off-by: Jeff King <peff@peff.net>
---
You can probably guess how I found this problem. :)

 refs/refs-internal.h | 16 ++++++++--------
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/refs/refs-internal.h b/refs/refs-internal.h
index 3155708345..7b30910974 100644
--- a/refs/refs-internal.h
+++ b/refs/refs-internal.h
@@ -248,6 +248,14 @@ int refs_rename_ref_available(struct ref_store *refs,
 /* Include broken references in a do_for_each_ref*() iteration: */
 #define DO_FOR_EACH_INCLUDE_BROKEN 0x01
 
+/*
+ * Only include per-worktree refs in a do_for_each_ref*() iteration.
+ * Normally this will be used with a files ref_store, since that's
+ * where all reference backends will presumably store their
+ * per-worktree refs.
+ */
+#define DO_FOR_EACH_PER_WORKTREE_ONLY 0x02
+
 /*
  * Reference iterators
  *
@@ -498,14 +506,6 @@ int do_for_each_repo_ref_iterator(struct repository *r,
 				  struct ref_iterator *iter,
 				  each_repo_ref_fn fn, void *cb_data);
 
-/*
- * Only include per-worktree refs in a do_for_each_ref*() iteration.
- * Normally this will be used with a files ref_store, since that's
- * where all reference backends will presumably store their
- * per-worktree refs.
- */
-#define DO_FOR_EACH_PER_WORKTREE_ONLY 0x02
-
 struct ref_store;
 
 /* refs backends */
-- 
2.33.0.1071.gb37e412355


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH 09/16] refs-internal.h: reorganize DO_FOR_EACH_* flag documentation
  2021-09-24 18:30 [PATCH 0/16] enabling GIT_REF_PARANOIA by default Jeff King
                   ` (7 preceding siblings ...)
  2021-09-24 18:37 ` [PATCH 08/16] refs-internal.h: move DO_FOR_EACH_* flags next to each other Jeff King
@ 2021-09-24 18:39 ` Jeff King
  2021-09-24 18:41 ` [PATCH 10/16] refs: add DO_FOR_EACH_OMIT_DANGLING_SYMREFS flag Jeff King
                   ` (7 subsequent siblings)
  16 siblings, 0 replies; 22+ messages in thread
From: Jeff King @ 2021-09-24 18:39 UTC (permalink / raw)
  To: git; +Cc: Jonathan Tan

The documentation for the DO_FOR_EACH_* flags is sprinkled over the
refs-internal.h file. We define the two flags in one spot, and then
describe them in more detail far away from there, in the definitions of
refs_ref_iterator_begin() and ref_iterator_advance_fn().

Let's try to organize this a bit better:

  - convert the #defines to an enum. This makes it clear that they are
    related, and that the enum shows the complete set of flags.

  - combine all descriptions for each flag in a single spot, next to the
    flag's definition

  - use the enum rather than a bare int for functions which take the
    flags. This helps readers realize which flags can be used.

  - clarify the mention of flags for ref_iterator_advance_fn(). It does
    not take flags itself, but is meant to depend on ones set up
    earlier.

Signed-off-by: Jeff King <peff@peff.net>
---
 refs.c               | 10 ++++++----
 refs/refs-internal.h | 46 ++++++++++++++++++++++++++------------------
 2 files changed, 33 insertions(+), 23 deletions(-)

diff --git a/refs.c b/refs.c
index 8b9f7c3a80..c28bd6a818 100644
--- a/refs.c
+++ b/refs.c
@@ -1413,7 +1413,8 @@ int head_ref(each_ref_fn fn, void *cb_data)
 
 struct ref_iterator *refs_ref_iterator_begin(
 		struct ref_store *refs,
-		const char *prefix, int trim, int flags)
+		const char *prefix, int trim,
+		enum do_for_each_ref_flags flags)
 {
 	struct ref_iterator *iter;
 
@@ -1479,7 +1480,8 @@ static int do_for_each_ref_helper(struct repository *r,
 }
 
 static int do_for_each_ref(struct ref_store *refs, const char *prefix,
-			   each_ref_fn fn, int trim, int flags, void *cb_data)
+			   each_ref_fn fn, int trim,
+			   enum do_for_each_ref_flags flags, void *cb_data)
 {
 	struct ref_iterator *iter;
 	struct do_for_each_ref_help hp = { fn, cb_data };
@@ -1516,7 +1518,7 @@ int for_each_ref_in(const char *prefix, each_ref_fn fn, void *cb_data)
 
 int for_each_fullref_in(const char *prefix, each_ref_fn fn, void *cb_data, unsigned int broken)
 {
-	unsigned int flag = 0;
+	enum do_for_each_ref_flags flag = 0;
 
 	if (broken)
 		flag = DO_FOR_EACH_INCLUDE_BROKEN;
@@ -1528,7 +1530,7 @@ int refs_for_each_fullref_in(struct ref_store *refs, const char *prefix,
 			     each_ref_fn fn, void *cb_data,
 			     unsigned int broken)
 {
-	unsigned int flag = 0;
+	enum do_for_each_ref_flags flag = 0;
 
 	if (broken)
 		flag = DO_FOR_EACH_INCLUDE_BROKEN;
diff --git a/refs/refs-internal.h b/refs/refs-internal.h
index 7b30910974..2c4e1739f2 100644
--- a/refs/refs-internal.h
+++ b/refs/refs-internal.h
@@ -245,16 +245,30 @@ int refs_rename_ref_available(struct ref_store *refs,
 /* We allow "recursive" symbolic refs. Only within reason, though */
 #define SYMREF_MAXDEPTH 5
 
-/* Include broken references in a do_for_each_ref*() iteration: */
-#define DO_FOR_EACH_INCLUDE_BROKEN 0x01
-
 /*
- * Only include per-worktree refs in a do_for_each_ref*() iteration.
- * Normally this will be used with a files ref_store, since that's
- * where all reference backends will presumably store their
- * per-worktree refs.
+ * These flags are passed to refs_ref_iterator_begin() (and do_for_each_ref(),
+ * which feeds it).
  */
-#define DO_FOR_EACH_PER_WORKTREE_ONLY 0x02
+enum do_for_each_ref_flags {
+	/*
+	 * Include broken references in a do_for_each_ref*() iteration, which
+	 * would normally be omitted. This includes both refs that point to
+	 * missing objects (a true repository corruption), ones with illegal
+	 * names (which we prefer not to expose to callers), as well as
+	 * dangling symbolic refs (i.e., those that point to a non-existent
+	 * ref; this is not a corruption, but as they have no valid oid, we
+	 * omit them from normal iteration results).
+	 */
+	DO_FOR_EACH_INCLUDE_BROKEN = (1 << 0),
+
+	/*
+	 * Only include per-worktree refs in a do_for_each_ref*() iteration.
+	 * Normally this will be used with a files ref_store, since that's
+	 * where all reference backends will presumably store their
+	 * per-worktree refs.
+	 */
+	DO_FOR_EACH_PER_WORKTREE_ONLY = (1 << 1),
+};
 
 /*
  * Reference iterators
@@ -357,16 +371,12 @@ int is_empty_ref_iterator(struct ref_iterator *ref_iterator);
  * Return an iterator that goes over each reference in `refs` for
  * which the refname begins with prefix. If trim is non-zero, then
  * trim that many characters off the beginning of each refname.
- * The output is ordered by refname. The following flags are supported:
- *
- * DO_FOR_EACH_INCLUDE_BROKEN: include broken references in
- *         the iteration.
- *
- * DO_FOR_EACH_PER_WORKTREE_ONLY: only produce REF_TYPE_PER_WORKTREE refs.
+ * The output is ordered by refname.
  */
 struct ref_iterator *refs_ref_iterator_begin(
 		struct ref_store *refs,
-		const char *prefix, int trim, int flags);
+		const char *prefix, int trim,
+		enum do_for_each_ref_flags flags);
 
 /*
  * A callback function used to instruct merge_ref_iterator how to
@@ -454,10 +464,8 @@ void base_ref_iterator_free(struct ref_iterator *iter);
 /*
  * backend-specific implementation of ref_iterator_advance. For symrefs, the
  * function should set REF_ISSYMREF, and it should also dereference the symref
- * to provide the OID referent. If DO_FOR_EACH_INCLUDE_BROKEN is set, symrefs
- * with non-existent referents and refs pointing to non-existent object names
- * should also be returned. If DO_FOR_EACH_PER_WORKTREE_ONLY, only
- * REF_TYPE_PER_WORKTREE refs should be returned.
+ * to provide the OID referent. It should respect do_for_each_ref_flags
+ * that were passed to refs_ref_iterator_begin().
  */
 typedef int ref_iterator_advance_fn(struct ref_iterator *ref_iterator);
 
-- 
2.33.0.1071.gb37e412355


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH 10/16] refs: add DO_FOR_EACH_OMIT_DANGLING_SYMREFS flag
  2021-09-24 18:30 [PATCH 0/16] enabling GIT_REF_PARANOIA by default Jeff King
                   ` (8 preceding siblings ...)
  2021-09-24 18:39 ` [PATCH 09/16] refs-internal.h: reorganize DO_FOR_EACH_* flag documentation Jeff King
@ 2021-09-24 18:41 ` Jeff King
  2021-09-24 18:42 ` [PATCH 11/16] refs: omit dangling symrefs when using GIT_REF_PARANOIA Jeff King
                   ` (6 subsequent siblings)
  16 siblings, 0 replies; 22+ messages in thread
From: Jeff King @ 2021-09-24 18:41 UTC (permalink / raw)
  To: git; +Cc: Jonathan Tan

When the DO_FOR_EACH_INCLUDE_BROKEN flag is used, we include both actual
corrupt refs (illegal names, missing objects), but also symrefs that
point to nothing. This latter is not really a corruption, but just
something that may happen normally. For example, the symref at
refs/remotes/origin/HEAD may point to a tracking branch which is later
deleted. (The local HEAD may also be unborn, of course, but we do not
access it through ref iteration).

Most callers of for_each_ref() etc, do not care. They don't pass
INCLUDE_BROKEN, so don't see it at all. But for those which do pass it,
this somewhat-normal state causes extra warnings (e.g., from
for-each-ref) or even aborts operations (destructive repacks with
GIT_REF_PARANOIA set).

This patch just introduces the flag and the mechanism; there are no
callers yet (and hence no tests). Two things to note on the
implementation:

  - we actually skip any symref that does not resolve to a ref. This
    includes ones which point to an invalidly-named ref. You could argue
    this is a more serious breakage than simple dangling. But the
    overall effect is the same (we could not follow the symref), as well
    as the impact on things like REF_PARANOIA (either way, a symref we
    can't follow won't impact reachability, because we'll see the ref
    itself during iteration). The underlying resolution function doesn't
    distinguish these two cases (they both get REF_ISBROKEN).

  - we change the iterator in refs/files-backend.c where we check
    INCLUDE_BROKEN. There's a matching spot in refs/packed-backend.c,
    but we don't know need to do anything there. The packed backend does
    not support symrefs at all.

The resulting set of flags might be a bit easier to follow if we broke
this down into "INCLUDE_CORRUPT_REFS" and "INCLUDE_DANGLING_SYMREFS".
But there are a few reasons not do so:

  - adding a new OMIT_DANGLING_SYMREFS flag lets us leave existing
    callers intact, without changing their behavior (and some of them
    really do want to see the dangling symrefs; e.g., t5505 has a test
    which expects us to report when a symref becomes dangling)

  - they're not actually independent. You cannot say "include dangling
    symrefs" without also including refs whose objects are not
    reachable, because dangling symrefs by definition do not have an
    object. We could tweak the implementation to distinguish this, but
    in practice nobody wants to ask for that. Adding the OMIT flag keeps
    the implementation simple and makes sure we don't regress the
    current behavior.

Signed-off-by: Jeff King <peff@peff.net>
---
 refs/files-backend.c | 5 +++++
 refs/refs-internal.h | 6 ++++++
 2 files changed, 11 insertions(+)

diff --git a/refs/files-backend.c b/refs/files-backend.c
index 74c0385873..1148c0cf09 100644
--- a/refs/files-backend.c
+++ b/refs/files-backend.c
@@ -744,6 +744,11 @@ static int files_ref_iterator_advance(struct ref_iterator *ref_iterator)
 		    ref_type(iter->iter0->refname) != REF_TYPE_PER_WORKTREE)
 			continue;
 
+		if ((iter->flags & DO_FOR_EACH_OMIT_DANGLING_SYMREFS) &&
+		    (iter->iter0->flags & REF_ISSYMREF) &&
+		    (iter->iter0->flags & REF_ISBROKEN))
+			continue;
+
 		if (!(iter->flags & DO_FOR_EACH_INCLUDE_BROKEN) &&
 		    !ref_resolves_to_object(iter->iter0->refname,
 					    iter->iter0->oid,
diff --git a/refs/refs-internal.h b/refs/refs-internal.h
index 2c4e1739f2..96911fb26e 100644
--- a/refs/refs-internal.h
+++ b/refs/refs-internal.h
@@ -268,6 +268,12 @@ enum do_for_each_ref_flags {
 	 * per-worktree refs.
 	 */
 	DO_FOR_EACH_PER_WORKTREE_ONLY = (1 << 1),
+
+	/*
+	 * Omit dangling symrefs from output; this only has an effect with
+	 * INCLUDE_BROKEN, since they are otherwise not included at all.
+	 */
+	DO_FOR_EACH_OMIT_DANGLING_SYMREFS = (1 << 2),
 };
 
 /*
-- 
2.33.0.1071.gb37e412355


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH 11/16] refs: omit dangling symrefs when using GIT_REF_PARANOIA
  2021-09-24 18:30 [PATCH 0/16] enabling GIT_REF_PARANOIA by default Jeff King
                   ` (9 preceding siblings ...)
  2021-09-24 18:41 ` [PATCH 10/16] refs: add DO_FOR_EACH_OMIT_DANGLING_SYMREFS flag Jeff King
@ 2021-09-24 18:42 ` Jeff King
  2021-09-24 18:46 ` [PATCH 12/16] refs: turn on GIT_REF_PARANOIA by default Jeff King
                   ` (5 subsequent siblings)
  16 siblings, 0 replies; 22+ messages in thread
From: Jeff King @ 2021-09-24 18:42 UTC (permalink / raw)
  To: git; +Cc: Jonathan Tan

Dangling symrefs aren't actually a corruption problem. It's perfectly
fine for refs/remotes/origin/HEAD to point to an unborn branch. And in
particular, if you are trying to establish reachability, a symref that
points nowhere doesn't matter either way. Any ref it could point to will
be examined during the rest of the traversal.

It's possible that a symref pointing nowhere _could_ be a sign that the
ref it was meant to point to was deleted accidentally (e.g., via
corruption). But there is no particular reason to think that is true for
any given case, and in the meantime, GIT_REF_PARANOIA kicking in
automatically for some operations means they'll fail unnecessarily.

So let's loosen it just a bit. The new test in t5312 shows off an
example that is safe, but currently fails (and no longer does after this
patch).

Note that we don't do anything if the caller explicitly asked for
DO_FOR_EACH_INCLUDE_BROKEN. In that case they may be looking for
dangling symrefs themselves, and setting GIT_REF_PARANOIA should not
_loosen_ things from what the caller asked for.

Signed-off-by: Jeff King <peff@peff.net>
---
 refs.c                      | 12 ++++++++----
 t/t5312-prune-corruption.sh |  7 +++++++
 2 files changed, 15 insertions(+), 4 deletions(-)

diff --git a/refs.c b/refs.c
index c28bd6a818..e72510813e 100644
--- a/refs.c
+++ b/refs.c
@@ -1418,10 +1418,14 @@ struct ref_iterator *refs_ref_iterator_begin(
 {
 	struct ref_iterator *iter;
 
-	if (ref_paranoia < 0)
-		ref_paranoia = git_env_bool("GIT_REF_PARANOIA", 0);
-	if (ref_paranoia)
-		flags |= DO_FOR_EACH_INCLUDE_BROKEN;
+	if (!(flags & DO_FOR_EACH_INCLUDE_BROKEN)) {
+		if (ref_paranoia < 0)
+			ref_paranoia = git_env_bool("GIT_REF_PARANOIA", 0);
+		if (ref_paranoia) {
+			flags |= DO_FOR_EACH_INCLUDE_BROKEN;
+			flags |= DO_FOR_EACH_OMIT_DANGLING_SYMREFS;
+		}
+	}
 
 	iter = refs->be->iterator_begin(refs, prefix, flags);
 
diff --git a/t/t5312-prune-corruption.sh b/t/t5312-prune-corruption.sh
index 1522a4ba8e..d8ec5a7462 100755
--- a/t/t5312-prune-corruption.sh
+++ b/t/t5312-prune-corruption.sh
@@ -62,6 +62,13 @@ test_expect_success 'destructive repack keeps packed object' '
 	git cat-file -e $bogus
 '
 
+test_expect_success 'destructive repack not confused by dangling symref' '
+	test_when_finished "git symbolic-ref -d refs/heads/dangling" &&
+	git symbolic-ref refs/heads/dangling refs/heads/does-not-exist &&
+	git repack -ad &&
+	test_must_fail git cat-file -e $bogus
+'
+
 # We create two new objects here, "one" and "two". Our
 # main branch points to "two", which is deleted,
 # corrupting the repository. But we'd like to make sure
-- 
2.33.0.1071.gb37e412355


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH 12/16] refs: turn on GIT_REF_PARANOIA by default
  2021-09-24 18:30 [PATCH 0/16] enabling GIT_REF_PARANOIA by default Jeff King
                   ` (10 preceding siblings ...)
  2021-09-24 18:42 ` [PATCH 11/16] refs: omit dangling symrefs when using GIT_REF_PARANOIA Jeff King
@ 2021-09-24 18:46 ` Jeff King
  2021-09-27 17:42   ` Jonathan Tan
  2021-09-24 18:46 ` [PATCH 13/16] repack, prune: drop GIT_REF_PARANOIA settings Jeff King
                   ` (4 subsequent siblings)
  16 siblings, 1 reply; 22+ messages in thread
From: Jeff King @ 2021-09-24 18:46 UTC (permalink / raw)
  To: git; +Cc: Jonathan Tan

The original point of the GIT_REF_PARANOIA flag was to include broken
refs in iterations, so that possibly-destructive operations would not
silently ignore them (and would generally instead try to operate on the
oids and fail when the objects could not be accessed).

We already turned this on by default for some dangerous operations, like
"repack -ad" (where missing a reachability tip would mean dropping the
associated history). But it was not on for general use, even though it
could easily result in the spreading of corruption (e.g., imagine
cloning a repository which simply omits some of its refs because
their objects are missing; the result quietly succeeds even though you
did not clone everything!).

This patch turns on GIT_REF_PARANOIA by default. So a clone as mentioned
above would actually fail (upload-pack tells us about the broken ref,
and when we ask for the objects, pack-objects fails to deliver them).
This may be inconvenient when working with a corrupted repository, but:

  - we are better off to err on the side of complaining about
    corruption, and then provide mechanisms for explicitly loosening
    safety.

  - this is only one type of corruption anyway. If we are missing any
    other objects in the history that _aren't_ ref tips, then we'd
    behave similarly (happily show the ref, but then barf when we
    started traversing).

We retain the GIT_REF_PARANOIA variable, but simply default it to "1"
instead of "0". That gives the user an escape hatch for loosening this
when working with a corrupt repository. It won't work across a remote
connection to upload-pack (because we can't necessarily set environment
variables on the remote), but there the client has other options (e.g.,
choosing which refs to fetch).

As a bonus, this also makes ref iteration faster in general (because we
don't have to call has_object_file() for each ref), though probably not
noticeably so in the general case. In a repo with a million refs, it
shaved a few hundred milliseconds off of upload-pack's advertisement;
that's noticeable, but most repos are not nearly that large.

The possible downside here is that any operation which iterates refs but
doesn't ever open their objects may now quietly claim to have X when the
object is corrupted (e.g., "git rev-list new-branch --not --all" will
treat a broken ref as uninteresting). But again, that's not really any
different than corruption below the ref level. We might have
refs/heads/old-branch as non-corrupt, but we are not actively checking
that we have the entire reachable history. Or the pointed-to object
could even be corrupted on-disk (but our "do we have it" check would
still succeed). In that sense, this is merely bringing ref-corruption in
line with general object corruption.

One alternative implementation would be to actually check for broken
refs, and then _immediately die_ if we see any. That would cause the
"rev-list --not --all" case above to abort immediately. But in many ways
that's the worst of all worlds:

  - it still spends time looking up the objects an extra time

  - it still doesn't catch corruption below the ref level

  - it's even more inconvenient; with the current implementation of
    GIT_REF_PARANOIA for something like upload-pack, we can make
    the advertisement and let the client choose a non-broken piece of
    history. If we bail as soon as we see a broken ref, they cannot even
    see the advertisement.

The test changes here show some of the fallout. A non-destructive "git
repack -adk" now fails by default (but we can override it). Deleting a
broken ref now actually tells the hooks the correct "before" state,
rather than a confusing null oid.

Signed-off-by: Jeff King <peff@peff.net>
---
 Documentation/git.txt       | 19 ++++++++++---------
 refs.c                      |  2 +-
 t/t5312-prune-corruption.sh | 10 ++++++++--
 t/t5516-fetch-push.sh       |  7 ++++---
 4 files changed, 23 insertions(+), 15 deletions(-)

diff --git a/Documentation/git.txt b/Documentation/git.txt
index abace9eac2..d63c65e67d 100644
--- a/Documentation/git.txt
+++ b/Documentation/git.txt
@@ -867,15 +867,16 @@ for full details.
 	end user, to be recorded in the body of the reflog.
 
 `GIT_REF_PARANOIA`::
-	If set to `1`, include broken or badly named refs when iterating
-	over lists of refs. In a normal, non-corrupted repository, this
-	does nothing. However, enabling it may help git to detect and
-	abort some operations in the presence of broken refs. Git sets
-	this variable automatically when performing destructive
-	operations like linkgit:git-prune[1]. You should not need to set
-	it yourself unless you want to be paranoid about making sure
-	an operation has touched every ref (e.g., because you are
-	cloning a repository to make a backup).
+	If set to `0`, ignore broken or badly named refs when iterating
+	over lists of refs. Normally Git will try to include any such
+	refs, which may cause some operations to fail. This is usually
+	preferable, as potentially destructive operations (e.g.,
+	linkgit:git-prune[1]) are better off aborting rather than
+	ignoring broken refs (and thus considering the history they
+	point to as not worth saving). The default value is `1` (i.e.,
+	be paranoid about detecting and aborting all operations). You
+	should not normally need to set this to `0`, but it may be
+	useful when trying to salvage data from a corrupted repository.
 
 `GIT_ALLOW_PROTOCOL`::
 	If set to a colon-separated list of protocols, behave as if
diff --git a/refs.c b/refs.c
index e72510813e..ac19c689fa 100644
--- a/refs.c
+++ b/refs.c
@@ -1420,7 +1420,7 @@ struct ref_iterator *refs_ref_iterator_begin(
 
 	if (!(flags & DO_FOR_EACH_INCLUDE_BROKEN)) {
 		if (ref_paranoia < 0)
-			ref_paranoia = git_env_bool("GIT_REF_PARANOIA", 0);
+			ref_paranoia = git_env_bool("GIT_REF_PARANOIA", 1);
 		if (ref_paranoia) {
 			flags |= DO_FOR_EACH_INCLUDE_BROKEN;
 			flags |= DO_FOR_EACH_OMIT_DANGLING_SYMREFS;
diff --git a/t/t5312-prune-corruption.sh b/t/t5312-prune-corruption.sh
index d8ec5a7462..ea889c088a 100755
--- a/t/t5312-prune-corruption.sh
+++ b/t/t5312-prune-corruption.sh
@@ -49,11 +49,17 @@ test_expect_success 'put bogus object into pack' '
 	git cat-file -e $bogus
 '
 
-test_expect_success 'non-destructive repack ignores bogus name' '
+test_expect_success 'non-destructive repack bails on bogus ref' '
 	create_bogus_ref &&
-	git repack -adk
+	test_must_fail git repack -adk
 '
 
+test_expect_success 'GIT_REF_PARANOIA=0 overrides safety' '
+	create_bogus_ref &&
+	GIT_REF_PARANOIA=0 git repack -adk
+'
+
+
 test_expect_success 'destructive repack keeps packed object' '
 	create_bogus_ref &&
 	test_must_fail git repack -Ad --unpack-unreachable=now &&
diff --git a/t/t5516-fetch-push.sh b/t/t5516-fetch-push.sh
index b13553ecf4..8212ca56dc 100755
--- a/t/t5516-fetch-push.sh
+++ b/t/t5516-fetch-push.sh
@@ -707,20 +707,21 @@ test_expect_success 'pushing valid refs triggers post-receive and post-update ho
 
 test_expect_success 'deleting dangling ref triggers hooks with correct args' '
 	mk_test_with_hooks testrepo heads/branch &&
+	orig=$(git -C testrepo rev-parse refs/heads/branch) &&
 	rm -f testrepo/.git/objects/??/* &&
 	git push testrepo :refs/heads/branch &&
 	(
 		cd testrepo/.git &&
 		cat >pre-receive.expect <<-EOF &&
-		$ZERO_OID $ZERO_OID refs/heads/branch
+		$orig $ZERO_OID refs/heads/branch
 		EOF
 
 		cat >update.expect <<-EOF &&
-		refs/heads/branch $ZERO_OID $ZERO_OID
+		refs/heads/branch $orig $ZERO_OID
 		EOF
 
 		cat >post-receive.expect <<-EOF &&
-		$ZERO_OID $ZERO_OID refs/heads/branch
+		$orig $ZERO_OID refs/heads/branch
 		EOF
 
 		cat >post-update.expect <<-EOF &&
-- 
2.33.0.1071.gb37e412355


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH 13/16] repack, prune: drop GIT_REF_PARANOIA settings
  2021-09-24 18:30 [PATCH 0/16] enabling GIT_REF_PARANOIA by default Jeff King
                   ` (11 preceding siblings ...)
  2021-09-24 18:46 ` [PATCH 12/16] refs: turn on GIT_REF_PARANOIA by default Jeff King
@ 2021-09-24 18:46 ` Jeff King
  2021-09-24 18:48 ` [PATCH 14/16] ref-filter: stop setting FILTER_REFS_INCLUDE_BROKEN Jeff King
                   ` (3 subsequent siblings)
  16 siblings, 0 replies; 22+ messages in thread
From: Jeff King @ 2021-09-24 18:46 UTC (permalink / raw)
  To: git; +Cc: Jonathan Tan

Now that GIT_REF_PARANOIA is the default, we don't need to selectively
enable it for destructive operations. In fact, it's harmful to do so,
because it overrides any GIT_REF_PARANOIA=0 setting that the user may
have provided (because they're trying to work around some corruption).

With these uses gone, we can further clean up the ref_paranoia global,
and make it a static variable inside the refs code.

Signed-off-by: Jeff King <peff@peff.net>
---
 builtin/prune.c  | 1 -
 builtin/repack.c | 3 ---
 cache.h          | 8 --------
 environment.c    | 1 -
 refs.c           | 2 ++
 5 files changed, 2 insertions(+), 13 deletions(-)

diff --git a/builtin/prune.c b/builtin/prune.c
index 02c6ab7cba..485c9a3c56 100644
--- a/builtin/prune.c
+++ b/builtin/prune.c
@@ -143,7 +143,6 @@ int cmd_prune(int argc, const char **argv, const char *prefix)
 	expire = TIME_MAX;
 	save_commit_buffer = 0;
 	read_replace_refs = 0;
-	ref_paranoia = 1;
 	repo_init_revisions(the_repository, &revs, prefix);
 
 	argc = parse_options(argc, argv, prefix, options, prune_usage, 0);
diff --git a/builtin/repack.c b/builtin/repack.c
index c1a209013b..cb9f4bfed3 100644
--- a/builtin/repack.c
+++ b/builtin/repack.c
@@ -586,15 +586,12 @@ int cmd_repack(int argc, const char **argv, const char *prefix)
 				strvec_pushf(&cmd.args,
 					     "--unpack-unreachable=%s",
 					     unpack_unreachable);
-				strvec_push(&cmd.env_array, "GIT_REF_PARANOIA=1");
 			} else if (pack_everything & LOOSEN_UNREACHABLE) {
 				strvec_push(&cmd.args,
 					    "--unpack-unreachable");
 			} else if (keep_unreachable) {
 				strvec_push(&cmd.args, "--keep-unreachable");
 				strvec_push(&cmd.args, "--pack-loose-unreachable");
-			} else {
-				strvec_push(&cmd.env_array, "GIT_REF_PARANOIA=1");
 			}
 		}
 	} else if (geometry) {
diff --git a/cache.h b/cache.h
index f6295f3b04..f445008895 100644
--- a/cache.h
+++ b/cache.h
@@ -994,14 +994,6 @@ extern const char *core_fsmonitor;
 extern int core_apply_sparse_checkout;
 extern int core_sparse_checkout_cone;
 
-/*
- * Include broken refs in all ref iterations, which will
- * generally choke dangerous operations rather than letting
- * them silently proceed without taking the broken ref into
- * account.
- */
-extern int ref_paranoia;
-
 /*
  * Returns the boolean value of $GIT_OPTIONAL_LOCKS (or the default value).
  */
diff --git a/environment.c b/environment.c
index b4ba4fa22d..7923ab21dd 100644
--- a/environment.c
+++ b/environment.c
@@ -31,7 +31,6 @@ int prefer_symlink_refs;
 int is_bare_repository_cfg = -1; /* unspecified */
 int warn_ambiguous_refs = 1;
 int warn_on_object_refname_ambiguity = 1;
-int ref_paranoia = -1;
 int repository_format_precious_objects;
 int repository_format_worktree_config;
 const char *git_commit_encoding;
diff --git a/refs.c b/refs.c
index ac19c689fa..d0f4e8726b 100644
--- a/refs.c
+++ b/refs.c
@@ -1419,6 +1419,8 @@ struct ref_iterator *refs_ref_iterator_begin(
 	struct ref_iterator *iter;
 
 	if (!(flags & DO_FOR_EACH_INCLUDE_BROKEN)) {
+		static int ref_paranoia = -1;
+
 		if (ref_paranoia < 0)
 			ref_paranoia = git_env_bool("GIT_REF_PARANOIA", 1);
 		if (ref_paranoia) {
-- 
2.33.0.1071.gb37e412355


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH 14/16] ref-filter: stop setting FILTER_REFS_INCLUDE_BROKEN
  2021-09-24 18:30 [PATCH 0/16] enabling GIT_REF_PARANOIA by default Jeff King
                   ` (12 preceding siblings ...)
  2021-09-24 18:46 ` [PATCH 13/16] repack, prune: drop GIT_REF_PARANOIA settings Jeff King
@ 2021-09-24 18:48 ` Jeff King
  2021-09-24 18:48 ` [PATCH 15/16] ref-filter: drop broken-ref code entirely Jeff King
                   ` (2 subsequent siblings)
  16 siblings, 0 replies; 22+ messages in thread
From: Jeff King @ 2021-09-24 18:48 UTC (permalink / raw)
  To: git; +Cc: Jonathan Tan

Of the ref-filter callers, for-each-ref and git-branch both set the
INCLUDE_BROKEN flag (but git-tag does not, which is a weird
inconsistency).  But now that GIT_REF_PARANOIA is on by default, that
produces almost the same outcome for all three.

The one exception is that GIT_REF_PARANOIA will omit dangling symrefs.
That's a better behavior for these tools, as they would never include
such a symref in the main output anyway (they can't, as it doesn't point
to an object). Instead they issue a warning to stderr. But that warning
is somewhat useless; a dangling symref is a perfectly reasonable thing
to have in your repository, and is not a sign of corruption. It's much
friendlier to just quietly ignore it.

And in terms of robustness, the warning gains us little. It does not
impact the exit code of either tool. So while the warning _might_ clue
in a user that they have an unexpected broken symref, it would not help
any kind of scripted use.

This patch converts for-each-ref and git-branch to stop using the
INCLUDE_BROKEN flag. That gives them more reasonable behavior, and
harmonizes them with git-tag.

We have to change one test to adapt to the situation. t1430 tries to
trigger all of the REF_ISBROKEN behaviors from the underlying ref code.
It uses for-each-ref to do so (because there isn't any other mechanism).
That will no longer issue a warning about the symref which points to an
invalid name, as it's considered dangling (and we can instead be sure
that it's _not_ mentioned on stderr). Note that we do still complain
about the illegally named "broken..symref"; its problem is not that it's
dangling, but the name of the symref itself is illegal.

Signed-off-by: Jeff King <peff@peff.net>
---
 builtin/branch.c        | 2 +-
 builtin/for-each-ref.c  | 2 +-
 t/t1430-bad-ref-name.sh | 2 +-
 3 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/builtin/branch.c b/builtin/branch.c
index 03c7b7253a..0b7ed82654 100644
--- a/builtin/branch.c
+++ b/builtin/branch.c
@@ -427,7 +427,7 @@ static void print_ref_list(struct ref_filter *filter, struct ref_sorting *sortin
 
 	memset(&array, 0, sizeof(array));
 
-	filter_refs(&array, filter, filter->kind | FILTER_REFS_INCLUDE_BROKEN);
+	filter_refs(&array, filter, filter->kind);
 
 	if (filter->verbose)
 		maxwidth = calc_maxwidth(&array, strlen(remote_prefix));
diff --git a/builtin/for-each-ref.c b/builtin/for-each-ref.c
index 89cb6307d4..642b4b888f 100644
--- a/builtin/for-each-ref.c
+++ b/builtin/for-each-ref.c
@@ -77,7 +77,7 @@ int cmd_for_each_ref(int argc, const char **argv, const char *prefix)
 
 	filter.name_patterns = argv;
 	filter.match_as_path = 1;
-	filter_refs(&array, &filter, FILTER_REFS_ALL | FILTER_REFS_INCLUDE_BROKEN);
+	filter_refs(&array, &filter, FILTER_REFS_ALL);
 	ref_array_sort(sorting, &array);
 
 	if (!maxcount || array.nr < maxcount)
diff --git a/t/t1430-bad-ref-name.sh b/t/t1430-bad-ref-name.sh
index b1839e0877..fa3aeb80f2 100755
--- a/t/t1430-bad-ref-name.sh
+++ b/t/t1430-bad-ref-name.sh
@@ -170,7 +170,7 @@ test_expect_success 'for-each-ref emits warnings for broken names' '
 	! grep -e "badname" output &&
 	! grep -e "broken\.\.\.symref" output &&
 	test_i18ngrep "ignoring ref with broken name refs/heads/broken\.\.\.ref" error &&
-	test_i18ngrep "ignoring broken ref refs/heads/badname" error &&
+	test_i18ngrep ! "ignoring broken ref refs/heads/badname" error &&
 	test_i18ngrep "ignoring ref with broken name refs/heads/broken\.\.\.symref" error
 '
 
-- 
2.33.0.1071.gb37e412355


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH 15/16] ref-filter: drop broken-ref code entirely
  2021-09-24 18:30 [PATCH 0/16] enabling GIT_REF_PARANOIA by default Jeff King
                   ` (13 preceding siblings ...)
  2021-09-24 18:48 ` [PATCH 14/16] ref-filter: stop setting FILTER_REFS_INCLUDE_BROKEN Jeff King
@ 2021-09-24 18:48 ` Jeff King
  2021-09-24 18:48 ` [PATCH 16/16] refs: drop "broken" flag from for_each_fullref_in() Jeff King
  2021-09-24 20:22 ` [PATCH 0/16] enabling GIT_REF_PARANOIA by default Junio C Hamano
  16 siblings, 0 replies; 22+ messages in thread
From: Jeff King @ 2021-09-24 18:48 UTC (permalink / raw)
  To: git; +Cc: Jonathan Tan

Now that none of our callers passes the INCLUDE_BROKEN flag, we can drop
it entirely, along with the code to plumb it through to the
for_each_fullref_in() functions.

Signed-off-by: Jeff King <peff@peff.net>
---
 ref-filter.c | 11 ++++-------
 ref-filter.h |  1 -
 2 files changed, 4 insertions(+), 8 deletions(-)

diff --git a/ref-filter.c b/ref-filter.c
index 93ce2a6ef2..e59bb4cf9c 100644
--- a/ref-filter.c
+++ b/ref-filter.c
@@ -2405,13 +2405,10 @@ int filter_refs(struct ref_array *array, struct ref_filter *filter, unsigned int
 {
 	struct ref_filter_cbdata ref_cbdata;
 	int ret = 0;
-	unsigned int broken = 0;
 
 	ref_cbdata.array = array;
 	ref_cbdata.filter = filter;
 
-	if (type & FILTER_REFS_INCLUDE_BROKEN)
-		broken = 1;
 	filter->kind = type & FILTER_REFS_KIND_MASK;
 
 	init_contains_cache(&ref_cbdata.contains_cache);
@@ -2428,13 +2425,13 @@ int filter_refs(struct ref_array *array, struct ref_filter *filter, unsigned int
 		 * of filter_ref_kind().
 		 */
 		if (filter->kind == FILTER_REFS_BRANCHES)
-			ret = for_each_fullref_in("refs/heads/", ref_filter_handler, &ref_cbdata, broken);
+			ret = for_each_fullref_in("refs/heads/", ref_filter_handler, &ref_cbdata, 0);
 		else if (filter->kind == FILTER_REFS_REMOTES)
-			ret = for_each_fullref_in("refs/remotes/", ref_filter_handler, &ref_cbdata, broken);
+			ret = for_each_fullref_in("refs/remotes/", ref_filter_handler, &ref_cbdata, 0);
 		else if (filter->kind == FILTER_REFS_TAGS)
-			ret = for_each_fullref_in("refs/tags/", ref_filter_handler, &ref_cbdata, broken);
+			ret = for_each_fullref_in("refs/tags/", ref_filter_handler, &ref_cbdata, 0);
 		else if (filter->kind & FILTER_REFS_ALL)
-			ret = for_each_fullref_in_pattern(filter, ref_filter_handler, &ref_cbdata, broken);
+			ret = for_each_fullref_in_pattern(filter, ref_filter_handler, &ref_cbdata, 0);
 		if (!ret && (filter->kind & FILTER_REFS_DETACHED_HEAD))
 			head_ref(ref_filter_handler, &ref_cbdata);
 	}
diff --git a/ref-filter.h b/ref-filter.h
index c15dee8d6b..b636f4389d 100644
--- a/ref-filter.h
+++ b/ref-filter.h
@@ -13,7 +13,6 @@
 #define QUOTE_PYTHON 4
 #define QUOTE_TCL 8
 
-#define FILTER_REFS_INCLUDE_BROKEN 0x0001
 #define FILTER_REFS_TAGS           0x0002
 #define FILTER_REFS_BRANCHES       0x0004
 #define FILTER_REFS_REMOTES        0x0008
-- 
2.33.0.1071.gb37e412355


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH 16/16] refs: drop "broken" flag from for_each_fullref_in()
  2021-09-24 18:30 [PATCH 0/16] enabling GIT_REF_PARANOIA by default Jeff King
                   ` (14 preceding siblings ...)
  2021-09-24 18:48 ` [PATCH 15/16] ref-filter: drop broken-ref code entirely Jeff King
@ 2021-09-24 18:48 ` Jeff King
  2021-09-27 17:47   ` Jonathan Tan
  2021-09-24 20:22 ` [PATCH 0/16] enabling GIT_REF_PARANOIA by default Junio C Hamano
  16 siblings, 1 reply; 22+ messages in thread
From: Jeff King @ 2021-09-24 18:48 UTC (permalink / raw)
  To: git; +Cc: Jonathan Tan

No callers pass in anything but "0" here. Likewise to our sibling
functions. Note that some of them ferry along the flag, but none of
their callers pass anything but "0" either.

Nor is anybody likely to change that. Callers which really want to see
all of the raw refs use for_each_rawref(). And anybody interested in
iterating a subset of the refs will likely be happy to use the
now-default behavior of showing broken refs, but omitting dangling
symlinks.

So we can get rid of this whole feature.

Signed-off-by: Jeff King <peff@peff.net>
---
 builtin/rev-parse.c |  4 ++--
 ls-refs.c           |  2 +-
 ref-filter.c        | 19 +++++++++----------
 refs.c              | 22 ++++++----------------
 refs.h              |  9 +++------
 revision.c          |  2 +-
 6 files changed, 22 insertions(+), 36 deletions(-)

diff --git a/builtin/rev-parse.c b/builtin/rev-parse.c
index 22c4e1a4ff..8480a59f57 100644
--- a/builtin/rev-parse.c
+++ b/builtin/rev-parse.c
@@ -863,8 +863,8 @@ int cmd_rev_parse(int argc, const char **argv, const char *prefix)
 				continue;
 			}
 			if (!strcmp(arg, "--bisect")) {
-				for_each_fullref_in("refs/bisect/bad", show_reference, NULL, 0);
-				for_each_fullref_in("refs/bisect/good", anti_reference, NULL, 0);
+				for_each_fullref_in("refs/bisect/bad", show_reference, NULL);
+				for_each_fullref_in("refs/bisect/good", anti_reference, NULL);
 				continue;
 			}
 			if (opt_with_value(arg, "--branches", &arg)) {
diff --git a/ls-refs.c b/ls-refs.c
index be09568910..7fe9675d3c 100644
--- a/ls-refs.c
+++ b/ls-refs.c
@@ -171,7 +171,7 @@ int ls_refs(struct repository *r, struct packet_reader *request)
 	if (!data.prefixes.nr)
 		strvec_push(&data.prefixes, "");
 	for_each_fullref_in_prefixes(get_git_namespace(), data.prefixes.v,
-				     send_ref, &data, 0);
+				     send_ref, &data);
 	packet_fflush(stdout);
 	strvec_clear(&data.prefixes);
 	strbuf_release(&data.buf);
diff --git a/ref-filter.c b/ref-filter.c
index e59bb4cf9c..e15f0c4bac 100644
--- a/ref-filter.c
+++ b/ref-filter.c
@@ -2100,16 +2100,15 @@ static int filter_pattern_match(struct ref_filter *filter, const char *refname)
  */
 static int for_each_fullref_in_pattern(struct ref_filter *filter,
 				       each_ref_fn cb,
-				       void *cb_data,
-				       int broken)
+				       void *cb_data)
 {
 	if (!filter->match_as_path) {
 		/*
 		 * in this case, the patterns are applied after
 		 * prefixes like "refs/heads/" etc. are stripped off,
 		 * so we have to look at everything:
 		 */
-		return for_each_fullref_in("", cb, cb_data, broken);
+		return for_each_fullref_in("", cb, cb_data);
 	}
 
 	if (filter->ignore_case) {
@@ -2118,16 +2117,16 @@ static int for_each_fullref_in_pattern(struct ref_filter *filter,
 		 * so just return everything and let the caller
 		 * sort it out.
 		 */
-		return for_each_fullref_in("", cb, cb_data, broken);
+		return for_each_fullref_in("", cb, cb_data);
 	}
 
 	if (!filter->name_patterns[0]) {
 		/* no patterns; we have to look at everything */
-		return for_each_fullref_in("", cb, cb_data, broken);
+		return for_each_fullref_in("", cb, cb_data);
 	}
 
 	return for_each_fullref_in_prefixes(NULL, filter->name_patterns,
-					    cb, cb_data, broken);
+					    cb, cb_data);
 }
 
 /*
@@ -2425,13 +2424,13 @@ int filter_refs(struct ref_array *array, struct ref_filter *filter, unsigned int
 		 * of filter_ref_kind().
 		 */
 		if (filter->kind == FILTER_REFS_BRANCHES)
-			ret = for_each_fullref_in("refs/heads/", ref_filter_handler, &ref_cbdata, 0);
+			ret = for_each_fullref_in("refs/heads/", ref_filter_handler, &ref_cbdata);
 		else if (filter->kind == FILTER_REFS_REMOTES)
-			ret = for_each_fullref_in("refs/remotes/", ref_filter_handler, &ref_cbdata, 0);
+			ret = for_each_fullref_in("refs/remotes/", ref_filter_handler, &ref_cbdata);
 		else if (filter->kind == FILTER_REFS_TAGS)
-			ret = for_each_fullref_in("refs/tags/", ref_filter_handler, &ref_cbdata, 0);
+			ret = for_each_fullref_in("refs/tags/", ref_filter_handler, &ref_cbdata);
 		else if (filter->kind & FILTER_REFS_ALL)
-			ret = for_each_fullref_in_pattern(filter, ref_filter_handler, &ref_cbdata, 0);
+			ret = for_each_fullref_in_pattern(filter, ref_filter_handler, &ref_cbdata);
 		if (!ret && (filter->kind & FILTER_REFS_DETACHED_HEAD))
 			head_ref(ref_filter_handler, &ref_cbdata);
 	}
diff --git a/refs.c b/refs.c
index d0f4e8726b..2be0d0f057 100644
--- a/refs.c
+++ b/refs.c
@@ -1522,25 +1522,16 @@ int for_each_ref_in(const char *prefix, each_ref_fn fn, void *cb_data)
 	return refs_for_each_ref_in(get_main_ref_store(the_repository), prefix, fn, cb_data);
 }
 
-int for_each_fullref_in(const char *prefix, each_ref_fn fn, void *cb_data, unsigned int broken)
+int for_each_fullref_in(const char *prefix, each_ref_fn fn, void *cb_data)
 {
-	enum do_for_each_ref_flags flag = 0;
-
-	if (broken)
-		flag = DO_FOR_EACH_INCLUDE_BROKEN;
 	return do_for_each_ref(get_main_ref_store(the_repository),
-			       prefix, fn, 0, flag, cb_data);
+			       prefix, fn, 0, 0, cb_data);
 }
 
 int refs_for_each_fullref_in(struct ref_store *refs, const char *prefix,
-			     each_ref_fn fn, void *cb_data,
-			     unsigned int broken)
+			     each_ref_fn fn, void *cb_data)
 {
-	enum do_for_each_ref_flags flag = 0;
-
-	if (broken)
-		flag = DO_FOR_EACH_INCLUDE_BROKEN;
-	return do_for_each_ref(refs, prefix, fn, 0, flag, cb_data);
+	return do_for_each_ref(refs, prefix, fn, 0, 0, cb_data);
 }
 
 int for_each_replace_ref(struct repository *r, each_repo_ref_fn fn, void *cb_data)
@@ -1632,8 +1623,7 @@ static void find_longest_prefixes(struct string_list *out,
 
 int for_each_fullref_in_prefixes(const char *namespace,
 				 const char **patterns,
-				 each_ref_fn fn, void *cb_data,
-				 unsigned int broken)
+				 each_ref_fn fn, void *cb_data)
 {
 	struct string_list prefixes = STRING_LIST_INIT_DUP;
 	struct string_list_item *prefix;
@@ -1648,7 +1638,7 @@ int for_each_fullref_in_prefixes(const char *namespace,
 
 	for_each_string_list_item(prefix, &prefixes) {
 		strbuf_addstr(&buf, prefix->string);
-		ret = for_each_fullref_in(buf.buf, fn, cb_data, broken);
+		ret = for_each_fullref_in(buf.buf, fn, cb_data);
 		if (ret)
 			break;
 		strbuf_setlen(&buf, namespace_len);
diff --git a/refs.h b/refs.h
index 48970dfc7e..10e7696a64 100644
--- a/refs.h
+++ b/refs.h
@@ -342,10 +342,8 @@ int for_each_ref(each_ref_fn fn, void *cb_data);
 int for_each_ref_in(const char *prefix, each_ref_fn fn, void *cb_data);
 
 int refs_for_each_fullref_in(struct ref_store *refs, const char *prefix,
-			     each_ref_fn fn, void *cb_data,
-			     unsigned int broken);
-int for_each_fullref_in(const char *prefix, each_ref_fn fn, void *cb_data,
-			unsigned int broken);
+			     each_ref_fn fn, void *cb_data);
+int for_each_fullref_in(const char *prefix, each_ref_fn fn, void *cb_data);
 
 /**
  * iterate all refs in "patterns" by partitioning patterns into disjoint sets
@@ -354,8 +352,7 @@ int for_each_fullref_in(const char *prefix, each_ref_fn fn, void *cb_data,
  * callers should be prepared to ignore references that they did not ask for.
  */
 int for_each_fullref_in_prefixes(const char *namespace, const char **patterns,
-				 each_ref_fn fn, void *cb_data,
-				 unsigned int broken);
+				 each_ref_fn fn, void *cb_data);
 /**
  * iterate refs from the respective area.
  */
diff --git a/revision.c b/revision.c
index 0dabb5a0bc..b7a2baad0e 100644
--- a/revision.c
+++ b/revision.c
@@ -2548,7 +2548,7 @@ static int for_each_bisect_ref(struct ref_store *refs, each_ref_fn fn,
 	struct strbuf bisect_refs = STRBUF_INIT;
 	int status;
 	strbuf_addf(&bisect_refs, "refs/bisect/%s", term);
-	status = refs_for_each_fullref_in(refs, bisect_refs.buf, fn, cb_data, 0);
+	status = refs_for_each_fullref_in(refs, bisect_refs.buf, fn, cb_data);
 	strbuf_release(&bisect_refs);
 	return status;
 }
-- 
2.33.0.1071.gb37e412355

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: [PATCH 0/16] enabling GIT_REF_PARANOIA by default
  2021-09-24 18:30 [PATCH 0/16] enabling GIT_REF_PARANOIA by default Jeff King
                   ` (15 preceding siblings ...)
  2021-09-24 18:48 ` [PATCH 16/16] refs: drop "broken" flag from for_each_fullref_in() Jeff King
@ 2021-09-24 20:22 ` Junio C Hamano
  16 siblings, 0 replies; 22+ messages in thread
From: Junio C Hamano @ 2021-09-24 20:22 UTC (permalink / raw)
  To: Jeff King; +Cc: git, Jonathan Tan

Jeff King <peff@peff.net> writes:

> I recently ran into a situation where dealing with a corrupted
> repository was more confusing than necessary, because Git by default
> ignores corrupted refs in many commands.
>
> A while ago we introduced GIT_REF_PARANOIA, which works by including
> broken refs in iteration, which then typically causes later operations
> to fail (e.g., during repacking, you'd prefer to barf loudly when trying
> to access the missing object rather than incorrectly assume the objects
> from the broken ref aren't reachable).
>
> I think this is a better default for Git to have in general, not just
> for a few select operations (we turn it on by default for pruning and
> some repacks). We shouldn't see corruptions in general, and complaining
> loudly when we do is the safest option. The reason we held back when the
> knob was introduced was mostly out of deference to the historical
> behavior.

Yeah, having an escape hatch to serve as a tool to deal with and
repair a corrupt repository might be worth considering, but I tend
to agree that it is a better default to notice and loudly report a
corruption.

The series was a quite pleasant read.

Thanks.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 01/16] t7900: clean up some more broken refs
  2021-09-24 18:32 ` [PATCH 01/16] t7900: clean up some more broken refs Jeff King
@ 2021-09-27 17:38   ` Jonathan Tan
  2021-09-27 19:49     ` Jeff King
  0 siblings, 1 reply; 22+ messages in thread
From: Jonathan Tan @ 2021-09-27 17:38 UTC (permalink / raw)
  To: peff; +Cc: git, jonathantanmy, dstolee

> @@ -277,7 +277,7 @@ test_expect_success 'incremental-repack task' '
>  
>  	# Delete refs that have not been repacked in these packs.
>  	git for-each-ref --format="delete %(refname)" \
> -		refs/prefetch refs/tags >refs &&
> +		refs/prefetch refs/tags refs/remotes >refs &&
>  	git update-ref --stdin <refs &&
>  
>  	# Replace the object directory with this pack layout.
> @@ -286,6 +286,10 @@ test_expect_success 'incremental-repack task' '
>  	ls $packDir/*.pack >packs-before &&
>  	test_line_count = 3 packs-before &&
>  
> +	# make sure we do not have any broken refs that were
> +	# missed in the deletion above
> +	git for-each-ref &&

For what it's worth, I verified that a fatal error is indeed caused in
"git for-each-ref" if refs/remotes was not deleted. This patch looks
good.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 12/16] refs: turn on GIT_REF_PARANOIA by default
  2021-09-24 18:46 ` [PATCH 12/16] refs: turn on GIT_REF_PARANOIA by default Jeff King
@ 2021-09-27 17:42   ` Jonathan Tan
  0 siblings, 0 replies; 22+ messages in thread
From: Jonathan Tan @ 2021-09-27 17:42 UTC (permalink / raw)
  To: peff; +Cc: git, jonathantanmy

> Deleting a
> broken ref now actually tells the hooks the correct "before" state,
> rather than a confusing null oid.

I believe that this is due to write_head_info() in
builtin/receive-pack.c now advertising the broken ref, so that the
client knows what the old OID is when pushing in order to delete the ref
(verified by running the relevant "git push" in the test with
GIT_TRACE_PACKET=1, with and without GIT_REF_PARANOIA=0). If rerolling,
maybe this is worth adding in parentheses (e.g. "(because such a ref
would be advertised to the client and the client would thus send the
"before" OID when deleting it)").

All the patches up to this look good.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 16/16] refs: drop "broken" flag from for_each_fullref_in()
  2021-09-24 18:48 ` [PATCH 16/16] refs: drop "broken" flag from for_each_fullref_in() Jeff King
@ 2021-09-27 17:47   ` Jonathan Tan
  0 siblings, 0 replies; 22+ messages in thread
From: Jonathan Tan @ 2021-09-27 17:47 UTC (permalink / raw)
  To: peff; +Cc: git, jonathantanmy

> No callers pass in anything but "0" here. Likewise to our sibling
> functions. Note that some of them ferry along the flag, but none of
> their callers pass anything but "0" either.
> 
> Nor is anybody likely to change that. Callers which really want to see
> all of the raw refs use for_each_rawref(). And anybody interested in
> iterating a subset of the refs will likely be happy to use the
> now-default behavior of showing broken refs, but omitting dangling
> symlinks.
> 
> So we can get rid of this whole feature.
> 
> Signed-off-by: Jeff King <peff@peff.net>

All the patches look good.
Reviewed-by: Jonathan Tan <jonathantanmy@google.com>

As you have mentioned before, this clashes with my series [1]. But since
I'll need to update my series anyway to reinclude the
DO_FOR_EACH_INCLUDE_BROKEN flag, the obvious thing is for me to rebase
mine on top of yours so that yours can go in first, so I don't mind
doing that.

[1] https://lore.kernel.org/git/cover.1632242495.git.jonathantanmy@google.com/

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 01/16] t7900: clean up some more broken refs
  2021-09-27 17:38   ` Jonathan Tan
@ 2021-09-27 19:49     ` Jeff King
  0 siblings, 0 replies; 22+ messages in thread
From: Jeff King @ 2021-09-27 19:49 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: git, dstolee

On Mon, Sep 27, 2021 at 10:38:18AM -0700, Jonathan Tan wrote:

> > @@ -277,7 +277,7 @@ test_expect_success 'incremental-repack task' '
> >  
> >  	# Delete refs that have not been repacked in these packs.
> >  	git for-each-ref --format="delete %(refname)" \
> > -		refs/prefetch refs/tags >refs &&
> > +		refs/prefetch refs/tags refs/remotes >refs &&
> >  	git update-ref --stdin <refs &&
> >  
> >  	# Replace the object directory with this pack layout.
> > @@ -286,6 +286,10 @@ test_expect_success 'incremental-repack task' '
> >  	ls $packDir/*.pack >packs-before &&
> >  	test_line_count = 3 packs-before &&
> >  
> > +	# make sure we do not have any broken refs that were
> > +	# missed in the deletion above
> > +	git for-each-ref &&
> 
> For what it's worth, I verified that a fatal error is indeed caused in
> "git for-each-ref" if refs/remotes was not deleted. This patch looks
> good.

Me too. :) It works because for-each-ref's default format will try to
show the type of the object, so it complains when the object is missing.

We could make this "git fsck" if that's less subtle, but this is a bit
cheaper.

-Peff

^ permalink raw reply	[flat|nested] 22+ messages in thread

end of thread, other threads:[~2021-09-27 19:49 UTC | newest]

Thread overview: 22+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-09-24 18:30 [PATCH 0/16] enabling GIT_REF_PARANOIA by default Jeff King
2021-09-24 18:32 ` [PATCH 01/16] t7900: clean up some more broken refs Jeff King
2021-09-27 17:38   ` Jonathan Tan
2021-09-27 19:49     ` Jeff King
2021-09-24 18:33 ` [PATCH 02/16] t5516: don't use HEAD ref for invalid ref-deletion tests Jeff King
2021-09-24 18:34 ` [PATCH 03/16] t5600: provide detached HEAD for corruption failures Jeff King
2021-09-24 18:35 ` [PATCH 04/16] t5312: drop "verbose" helper Jeff King
2021-09-24 18:36 ` [PATCH 05/16] t5312: create bogus ref as necessary Jeff King
2021-09-24 18:36 ` [PATCH 06/16] t5312: test non-destructive repack Jeff King
2021-09-24 18:37 ` [PATCH 07/16] t5312: be more assertive about command failure Jeff King
2021-09-24 18:37 ` [PATCH 08/16] refs-internal.h: move DO_FOR_EACH_* flags next to each other Jeff King
2021-09-24 18:39 ` [PATCH 09/16] refs-internal.h: reorganize DO_FOR_EACH_* flag documentation Jeff King
2021-09-24 18:41 ` [PATCH 10/16] refs: add DO_FOR_EACH_OMIT_DANGLING_SYMREFS flag Jeff King
2021-09-24 18:42 ` [PATCH 11/16] refs: omit dangling symrefs when using GIT_REF_PARANOIA Jeff King
2021-09-24 18:46 ` [PATCH 12/16] refs: turn on GIT_REF_PARANOIA by default Jeff King
2021-09-27 17:42   ` Jonathan Tan
2021-09-24 18:46 ` [PATCH 13/16] repack, prune: drop GIT_REF_PARANOIA settings Jeff King
2021-09-24 18:48 ` [PATCH 14/16] ref-filter: stop setting FILTER_REFS_INCLUDE_BROKEN Jeff King
2021-09-24 18:48 ` [PATCH 15/16] ref-filter: drop broken-ref code entirely Jeff King
2021-09-24 18:48 ` [PATCH 16/16] refs: drop "broken" flag from for_each_fullref_in() Jeff King
2021-09-27 17:47   ` Jonathan Tan
2021-09-24 20:22 ` [PATCH 0/16] enabling GIT_REF_PARANOIA by default Junio C Hamano

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.