From: Konstantin Ryabitsev <konstantin@linuxfoundation.org>
To: git@vger.kernel.org
Subject: Repository corruption if objects pushed in the middle of repack
Date: Mon, 13 Jun 2022 16:31:45 -0400
Message-ID: <20220613203145.wbpi2m3ys3hchw6c@meerkat.local>

Hi, all:

I'm trying to figure out the cause of repository corruption in a very specific
case. Here's the setup:

1. the repository is several GB in size, full of automatically generated pushes
   (https://git.yoctoproject.org/poky-buildhistory/)
2. this repository has no alternates or other clever things -- just your old
   boring repository
3. the builders check out this repository with --depth 1 during the build
   stage, then add new logs to the repository, commit and push
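
For illustration, the builder-side workflow in the last item might look
roughly like the sketch below. This is a guess at the shape of it, not the
actual builder code: builder_push, DRY_RUN, the commit message, and all of the
arguments are hypothetical names introduced here.

```shell
# Hypothetical builder-side sketch; builder_push, DRY_RUN, and all
# arguments are illustrative names, not taken from the real builders.
step() {
    # In dry-run mode print the command, otherwise execute it.
    if [ "${DRY_RUN:-0}" = "1" ]; then echo "$*"; else "$@"; fi
}

builder_push() {
    url=$1 branch=$2 workdir=$3 logdir=$4
    # Shallow checkout of a single branch, as described above.
    step git clone --depth 1 --branch "$branch" "$url" "$workdir" || return 1
    # Copy the freshly generated logs into the working tree.
    step cp -R "$logdir/." "$workdir/" || return 1
    step git -C "$workdir" add -A || return 1
    step git -C "$workdir" commit -m "Add build logs" || return 1
    step git -C "$workdir" push origin "HEAD:refs/heads/$branch" || return 1
}
```

Setting DRY_RUN=1 before calling builder_push prints the five commands
without running them.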

Admittedly, this is a bad use of git, but let's leave that outside the scope
of this discussion.

Every weekend we run a set of maintenance tasks and if we find that there are
lots of new loose objects (which there usually are), we fire off a routine repack:

1. first, repack runs with the following flags (-f if deemed necessary):

        git repack -n --window-memory=1g -a -b --unpack-unreachable=yesterday -f --pack-kept-objects -d

   Since the repository is large, this usually takes a long time (3+ hours)

2. next, we generate a fresh commit-graph:

        git commit-graph write

3. next, we run pack-refs:

        git pack-refs --all

4. after that, we run prune:

        git prune --expire=yesterday
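
Put together, the four steps above amount to something like the following.
This is a minimal sketch that assumes the steps run strictly in sequence and
stop at the first failure; run_maintenance, run_step, and DRY_RUN are
hypothetical names and not the actual maintenance tooling. The git commands
and flags are the ones listed above.

```shell
# Sketch of the weekly maintenance sequence described above.
# run_maintenance, run_step and DRY_RUN are hypothetical names.
run_step() {
    # In dry-run mode print the command, otherwise execute it.
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

run_maintenance() {
    repo=$1
    # 1. full repack (this is the step that takes 3+ hours)
    run_step git -C "$repo" repack -n --window-memory=1g -a -b \
        --unpack-unreachable=yesterday -f --pack-kept-objects -d || return 1
    # 2. fresh commit-graph
    run_step git -C "$repo" commit-graph write || return 1
    # 3. pack all refs
    run_step git -C "$repo" pack-refs --all || return 1
    # 4. prune loose objects older than a day
    run_step git -C "$repo" prune --expire=yesterday || return 1
}
```

With DRY_RUN=1 set, run_maintenance /path/to/repo.git only prints the four
commands, which is handy for checking the flags before a long run.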

In the case of this particular repository, we regularly run into repository
corruption, reported during the prune stage:

    fsck[10362] 2022-05-09 01:00:06,378 - INFO - /var/lib/gitolite3/repositories/poky-buildhistory.git:
    fsck[10362] 2022-05-09 01:00:06,700 - INFO -    repack: performing a full repack for optimal deltas
    fsck[10362] 2022-05-09 01:00:06,701 - INFO -    repack: repacking with "-n --window-memory=1g -a -b --unpack-unreachable=yesterday -f --pack-kept-objects -d"
    fsck[10362] 2022-05-09 03:19:15,825 - INFO -     graph: generating commit-graph
    fsck[10362] 2022-05-09 03:19:20,830 - INFO -  packrefs: repacking all refs
    fsck[10362] 2022-05-09 03:19:20,842 - INFO -     prune: pruning
    fsck[10362] 2022-05-09 03:19:21,622 - CRITICAL - /var/lib/gitolite3/repositories/poky-buildhistory.git reports errors:
    fsck[10362] 2022-05-09 03:19:21,625 - CRITICAL -        fatal: bad tree object ace77888c63e5c4e545f1bd7a3ee5934e35f56e9
    fsck[10362] 2022-05-09 03:19:21,626 - WARNING - Repacking /var/lib/gitolite3/repositories/poky-buildhistory.git was unsuccessful

The tree object in question came in during the repack stage:

    2022-05-09.02:36:33     11129   update  poky-buildhistory buildhistory    W       refs/heads/poky/master/qemuppc  5aad6c8130370bf22f5639162bbbfeaefd0fcd5e ea4e65d72a6161fece5734f7b111e31af77c7578        refs/.*
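
To make the timing explicit: the repack started at 01:00:06 and the next
stage began at 03:19:15, while the push was logged at 02:36:33. A small
helper along these lines can check that a push timestamp falls inside the
repack window. It is only a sketch; in_window is a hypothetical name, and it
assumes both logs use the same timezone and that the timestamps have been
normalized to "YYYY-MM-DD HH:MM:SS" first.

```shell
# in_window START END TS: succeed if TS falls within [START, END].
# Assumes all three are "YYYY-MM-DD HH:MM:SS" strings in the same
# timezone, so plain string ordering matches chronological ordering.
in_window() {
    start=$1 end=$2 ts=$3
    # The earlier of START and TS must be START ...
    first=$(printf '%s\n' "$start" "$ts" | LC_ALL=C sort | head -n 1)
    # ... and the later of TS and END must be END.
    last=$(printf '%s\n' "$ts" "$end" | LC_ALL=C sort | tail -n 1)
    [ "$first" = "$start" ] && [ "$last" = "$end" ]
}

# Usage with the timestamps from the logs above:
if in_window "2022-05-09 01:00:06" "2022-05-09 03:19:15" "2022-05-09 02:36:33"; then
    echo "push landed during repack"
fi
```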

As far as I know, the maintenance steps we are running shouldn't result in any
missing objects, so I'm curious if it's something we're doing wrong (using
unsafe flags) or if git isn't properly accounting for some objects that come
in during the repack stage. We're seeing this happen fairly routinely, so it's
not just a random fluke.

Git version 2.36.1 (and earlier).

Thanks,
Konstantin
