* Repository corruption if objects pushed in the middle of repack
From: Konstantin Ryabitsev @ 2022-06-13 20:31 UTC
  To: git

Hi, all:

I'm trying to figure out the cause of repository corruption in a very specific
case. Here's the setup:

1. the repository is several GB in size, full of automatically generated pushes
   (https://git.yoctoproject.org/poky-buildhistory/)
2. this repository has no alternates or other clever things -- just your old
   boring repository
3. the builders check out this repository with --depth 1 during the build
   stage, then add new logs to the repository, commit and push

Admittedly, this is a bad use of git, but let's leave that outside the scope.

Every weekend we run a set of maintenance tasks and if we find that there are
lots of new loose objects (which there usually are), we fire off a routine repack:

1. first, repack runs with the following flags (-f if deemed necessary):

        git repack -n --window-memory=1g -a -b --unpack-unreachable=yesterday -f --pack-kept-objects -d

   Since the repository is large, this usually takes a long time (3+ hours)

2. next, we generate a fresh commit-graph:

        git commit-graph write

3. next, we run pack-refs:

        git pack-refs --all

4. after that, we run prune:

        git prune --expire=yesterday
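
As a rough sketch, the four steps amount to the sequence below. The function
name `run_maintenance` and the `GIT` override variable are hypothetical and
only serve the illustration; the actual maintenance tooling is not shown in
this message. The flags are the ones listed above.

```shell
#!/bin/sh
# Sketch only: run the four maintenance steps in order and stop at the
# first failure. GIT and run_maintenance are hypothetical names.
GIT=${GIT:-git}

run_maintenance() {
    repo=$1
    # 1. full repack; unreachable objects newer than a day are written loose
    "$GIT" -C "$repo" repack -n --window-memory=1g -a -b \
        --unpack-unreachable=yesterday -f --pack-kept-objects -d || return 1
    # 2. fresh commit-graph
    "$GIT" -C "$repo" commit-graph write || return 1
    # 3. pack all refs
    "$GIT" -C "$repo" pack-refs --all || return 1
    # 4. drop loose unreachable objects older than a day
    "$GIT" -C "$repo" prune --expire=yesterday || return 1
}

# Usage (path taken from the log below):
#   run_maintenance /var/lib/gitolite3/repositories/poky-buildhistory.git
```

Nothing in this sequence holds a lock against incoming pushes, so pushes can
land at any point between the start of step 1 and the end of step 4.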

In the case of this particular repository, we regularly run into repository
corruption, reported during the prune stage:

    fsck[10362] 2022-05-09 01:00:06,378 - INFO - /var/lib/gitolite3/repositories/poky-buildhistory.git:
    fsck[10362] 2022-05-09 01:00:06,700 - INFO -    repack: performing a full repack for optimal deltas
    fsck[10362] 2022-05-09 01:00:06,701 - INFO -    repack: repacking with "-n --window-memory=1g -a -b --unpack-unreachable=yesterday -f --pack-kept-objects -d"
    fsck[10362] 2022-05-09 03:19:15,825 - INFO -     graph: generating commit-graph
    fsck[10362] 2022-05-09 03:19:20,830 - INFO -  packrefs: repacking all refs
    fsck[10362] 2022-05-09 03:19:20,842 - INFO -     prune: pruning
    fsck[10362] 2022-05-09 03:19:21,622 - CRITICAL - /var/lib/gitolite3/repositories/poky-buildhistory.git reports errors:
    fsck[10362] 2022-05-09 03:19:21,625 - CRITICAL -        fatal: bad tree object ace77888c63e5c4e545f1bd7a3ee5934e35f56e9
    fsck[10362] 2022-05-09 03:19:21,626 - WARNING - Repacking /var/lib/gitolite3/repositories/poky-buildhistory.git was unsuccessful

The tree object in question came in during the repack stage:

    2022-05-09.02:36:33     11129   update  poky-buildhistory buildhistory    W       refs/heads/poky/master/qemuppc  5aad6c8130370bf22f5639162bbbfeaefd0fcd5e ea4e65d72a6161fece5734f7b111e31af77c7578        refs/.*

As far as I know, the maintenance steps we are running shouldn't result in any
missing objects, so I'm curious if it's something we're doing wrong (using
unsafe flags) or if git isn't properly accounting for some objects that come
in during the repack stage. We're seeing this happen fairly routinely, so it's
not just a random fluke.

Git version 2.36.1 (and earlier).

Thanks,
Konstantin


* Re: Repository corruption if objects pushed in the middle of repack
From: Taylor Blau @ 2022-06-13 21:18 UTC
  To: Konstantin Ryabitsev; +Cc: git

Hi Konstantin,

On Mon, Jun 13, 2022 at 04:31:45PM -0400, Konstantin Ryabitsev wrote:
> As far as I know, the maintenance steps we are running shouldn't result in any
> missing objects, so I'm curious if it's something we're doing wrong (using
> unsafe flags) or if git isn't properly accounting for some objects that come
> in during the repack stage. We're seeing this happen fairly routinely, so it's
> not just a random fluke.

Interesting. From what you described, it does suggest that `repack` is
deleting things too eagerly.

But I would be quite surprised if that were the case, since `repack` is
*very* careful to only delete packs which it knew about at the start of
the repack. Likewise, when it cleans up loose objects, it only unlinks
objects which can be found in some (non-deleted) pack that will remain
in the repository.

So I doubt that repack is doing something weird here, though it would be
extremely interesting if you were able to pause the `repack` process at
a specific point, push new objects into the repository, and reliably
demonstrate the corruption.

Thanks,
Taylor


* Re: Repository corruption if objects pushed in the middle of repack
From: Taylor Blau @ 2022-06-13 21:24 UTC
  To: Konstantin Ryabitsev; +Cc: git

On Mon, Jun 13, 2022 at 05:18:25PM -0400, Taylor Blau wrote:
> Hi Konstantin,
>
> On Mon, Jun 13, 2022 at 04:31:45PM -0400, Konstantin Ryabitsev wrote:
> > As far as I know, the maintenance steps we are running shouldn't result in any
> > missing objects, so I'm curious if it's something we're doing wrong (using
> > unsafe flags) or if git isn't properly accounting for some objects that come
> > in during the repack stage. We're seeing this happen fairly routinely, so it's
> > not just a random fluke.
>
> [...]
>
> So I doubt that repack is doing something weird here, though it would be
> extremely interesting if you were able to pause the `repack` process at
> a specific point, push new objects into the repository, and reliably
> demonstrate the corruption.

A much more likely explanation for what is going on has to do with the
`--unpack-unreachable` option you're using.

In your example, any unreachable object written within the last day is
written loose, and anything else older than that is simply discarded. If
the following happens, in order:

  - an unreachable object is detected, and marked for deletion
  - that object then becomes reachable via some ref-update
  - then the object becomes an ancestor of some push which depends on it
  - _then_ the object is deleted by repack

...then the repository will be missing some objects which are in its
reachable set, and thus corrupt. IOW, the `--unpack-unreachable` option
(and its successor, cruft packs) are both racy with respect to
ref-updates.

Are you able to find evidence of that race in your logging? I would bet
that is likely what is going on here.

Thanks,
Taylor


* Re: Repository corruption if objects pushed in the middle of repack
From: Konstantin Ryabitsev @ 2022-06-13 21:32 UTC
  To: Taylor Blau; +Cc: git

On Mon, Jun 13, 2022 at 05:24:27PM -0400, Taylor Blau wrote:
> A much more likely explanation for what is going on has to do with the
> `--unpack-unreachable` option you're using.
> 
> In your example, any unreachable object written within the last day is
> written loose, and anything else older than that is simply discarded. If
> the following happens, in order:
> 
>   - an unreachable object is detected, and marked for deletion
>   - that object then becomes reachable via some ref-update
>   - then the object becomes an ancestor of some push which depends on it
>   - _then_ the object is deleted by repack
> 
> ...then the repository will be missing some objects which are in its
> reachable set, and thus corrupt. IOW, the `--unpack-unreachable` option
> (and its successor, cruft packs) are both racy with respect to
> ref-updates.
> 
> Are you able to find evidence of that race in your logging? I would bet
> that is likely what is going on here.

I'm not sure that's the case, because the object that is missing is the one
that didn't exist before the repack started. In the scenario you describe, the
pre-existing unreachable ancestor of it would be missing, not the newly
incoming object. Right?

-K


* Re: Repository corruption if objects pushed in the middle of repack
From: Taylor Blau @ 2022-06-13 21:36 UTC
  To: Konstantin Ryabitsev; +Cc: git

On Mon, Jun 13, 2022 at 05:32:21PM -0400, Konstantin Ryabitsev wrote:
> On Mon, Jun 13, 2022 at 05:24:27PM -0400, Taylor Blau wrote:
> > A much more likely explanation for what is going on has to do with the
> > `--unpack-unreachable` option you're using.
> >
> > In your example, any unreachable object written within the last day is
> > written loose, and anything else older than that is simply discarded. If
> > the following happens, in order:
> >
> >   - an unreachable object is detected, and marked for deletion
> >   - that object then becomes reachable via some ref-update
> >   - then the object becomes an ancestor of some push which depends on it
> >   - _then_ the object is deleted by repack
> >
> > ...then the repository will be missing some objects which are in its
> > reachable set, and thus corrupt. IOW, the `--unpack-unreachable` option
> > (and its successor, cruft packs) are both racy with respect to
> > ref-updates.
> >
> > Are you able to find evidence of that race in your logging? I would bet
> > that is likely what is going on here.
>
> I'm not sure that's the case, because the object that is missing is the one
> that didn't exist before the repack started. In the scenario you describe, the
> pre-existing unreachable ancestor of it would be missing, not the newly
> incoming object. Right?

Aren't we reporting that the newly pushed tree was broken _because_ it
had some links to sub-trees that no longer existed?
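
One way to tell the two cases apart the next time this happens is to check
whether the tree object itself is present and, if it is, whether everything it
points at is present too. A minimal sketch follows; the function name
`list_missing_tree_entries` and the `GIT` override variable are hypothetical,
while `cat-file -e` and `ls-tree` are standard git commands.

```shell
#!/bin/sh
# Sketch only: report whether a tree is missing outright, or which of its
# direct entries are missing. GIT and the function name are hypothetical.
GIT=${GIT:-git}

list_missing_tree_entries() {
    repo=$1
    tree=$2
    # cat-file -e exits non-zero when the object cannot be found
    if ! "$GIT" -C "$repo" cat-file -e "$tree" 2>/dev/null; then
        echo "tree $tree itself is missing"
        return 1
    fi
    # ls-tree prints: <mode> <type> <object id> TAB <name>
    "$GIT" -C "$repo" ls-tree "$tree" |
    while read -r mode type oid name; do
        # submodule entries point at commits that live in another repository
        [ "$type" = commit ] && continue
        "$GIT" -C "$repo" cat-file -e "$oid" 2>/dev/null ||
            printf 'missing %s %s %s\n' "$type" "$oid" "$name"
    done
}

# Usage, with the object id from the log:
#   list_missing_tree_entries /path/to/poky-buildhistory.git \
#       ace77888c63e5c4e545f1bd7a3ee5934e35f56e9
```

This only looks one level deep; checking nested sub-trees would need
`ls-tree -r -t` or a recursive call.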

Thanks,
Taylor


* Re: Repository corruption if objects pushed in the middle of repack
From: Konstantin Ryabitsev @ 2022-06-13 21:45 UTC
  To: Taylor Blau; +Cc: git

On Mon, Jun 13, 2022 at 05:36:43PM -0400, Taylor Blau wrote:
> > I'm not sure that's the case, because the object that is missing is the one
> > that didn't exist before the repack started. In the scenario you describe, the
> > pre-existing unreachable ancestor of it would be missing, not the newly
> > incoming object. Right?
> 
> Aren't we reporting that the newly pushed tree was broken _because_ it
> had some links to sub-trees that no longer existed?

Hmm... now I'm not sure, and I don't have the broken repo in front of me any
more. :(

Well, the upside of this happening on a routine basis is that I can make
a copy of it next time so I can be more helpful in troubleshooting this
situation. Let me sit on this and make some copies next time this happens
(even if it's super annoying that it happens to such a large repo), and then
perhaps I can give you a better answer.
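
A minimal sketch of how such a copy could be taken automatically when a
maintenance step fails. The function name `snapshot_on_failure` and the
snapshot directory layout are hypothetical, and a copy taken while pushes are
still arriving may itself be inconsistent.

```shell
#!/bin/sh
# Sketch only: run a command and keep a copy of the repository if it fails.
# snapshot_on_failure and the destination naming are hypothetical.
snapshot_on_failure() {
    repo=$1
    snapdir=$2
    shift 2
    # run the remaining arguments as the command; nothing to do on success
    "$@" && return 0
    status=$?
    dest="$snapdir/$(basename "$repo").$(date +%Y%m%d-%H%M%S)"
    mkdir -p "$snapdir" &&
        cp -Rp "$repo" "$dest" &&
        echo "snapshot saved to $dest" >&2
    # hand the original failure status back to the caller
    return "$status"
}

# Usage:
#   snapshot_on_failure /path/to/repo.git /var/tmp/snapshots \
#       git -C /path/to/repo.git prune --expire=yesterday
```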

It's just strange that we've been doing something like this to tens of
thousands of repositories (e.g. those on codeaurora.org), and it's the first
time that I see such consistent corruption manifest itself. If I were to go
with my gut instinct, I would blame the shallow checkout on the client, but I
don't have any good way of explaining why that would be the culprit either.

-K


* Re: Repository corruption if objects pushed in the middle of repack
From: Chris Torek @ 2022-06-13 22:26 UTC
  To: Konstantin Ryabitsev; +Cc: Taylor Blau, Git List

On Mon, Jun 13, 2022 at 3:07 PM Konstantin Ryabitsev
<konstantin@linuxfoundation.org> wrote:
> It's just strange that we've been doing something like this to tens of
> thousands of repositories (e.g. those on codeaurora.org), and it's the first
> time that I see such consistent corruption manifest itself. If I were to go
> with my gut instinct, I would blame the shallow checkout on the client, but I
> don't have any good way of explaining why that would be the culprit either.

One thing that *is* different with a shallow clone followed by push is
that the `git push` pushes a lot of objects unnecessarily because the
client doesn't have the commits to prove that they're unnecessary. So
the delivered pack file has a *lot* of redundant objects.

(A `--depth 2` clone usually omits most redundant objects, which is a
reason to use `--depth 2` instead of `--depth 1`.)

Chris
