From: Patrick Steinhardt <ps@pks.im>
To: "R. Diez" <rdiez-temp3@rd10.de>
Cc: git@vger.kernel.org
Subject: Re: git fsck does not check the packed-refs file
Date: Thu, 18 Jan 2024 12:15:08 +0100
Message-ID: <ZakIPEytlxHGCB9Y@tanuki>
In-Reply-To: <6cfee0e4-3285-4f18-91ff-d097da9de737@rd10.de>


On Thu, Jan 18, 2024 at 09:02:30AM +0100, R. Diez wrote:
> Hi all:
> 
> I have been hit by an unfortunate system problem, and as a result, a
> few files in my Git repository got corrupted on my last git push. Some
> random blocks of bytes were overwritten with binary zeros, so I
> started getting weird unpacking errors etc.
> 
> It took a while to realise what the problem was. During my
> investigation, I ran "git fsck", which reported no problems, and then
> "git push" failed.
> 
> One of the very few corrupted files was packed-refs. This is a text
> file, so it was easy to compare it and see the corrupting binary
> zeros. But that made me wonder what "git fsck" checks.

Can you maybe expand a bit on how you arrived at this bug? Was this a
hard crash of the system that corrupted the repository or rather
something like actual disk corruption?

I'm mostly asking because I have been fixing some sources of refdb
corruption:

  - bc22d845c4 (core.fsync: new option to harden references, 2022-03-11)
    started to fsync loose refs to disk before renaming them into place,
    released with Git v2.36.

  - ce54672f9b (refs: fix corruption by not correctly syncing
    packed-refs to disk, 2022-12-20) started to sync packed-refs to disk
    before renaming it into place, released with Git v2.40 and
    backported to Git v2.39.3.

So if:

  - you use a journaling filesystem,

  - you didn't disable `core.fsync`,

  - you use Git v2.40 or newer,

then you should in theory not run into any refdb corruption anymore. At
least we have not seen any such corruption at GitLab.com since, whereas
before we encountered it every so often.
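
To illustrate the version condition above, here is a minimal Python
sketch. The function name has_packed_refs_sync_fix is hypothetical, and
it only encodes the release numbers mentioned in this mail (v2.40, plus
the v2.39.3 backport). It does not look at the filesystem or at
`core.fsync`.

```python
import re


def has_packed_refs_sync_fix(version_output: str) -> bool:
    """Return True if the reported Git version includes the packed-refs
    sync fix, i.e. v2.40 or newer, or a v2.39.x release with x >= 3."""
    # Take the first dotted number in the string, e.g. "2.43.0" from
    # "git version 2.43.0".
    match = re.search(r"(\d+)\.(\d+)(?:\.(\d+))?", version_output)
    if match is None:
        raise ValueError(f"no version number in {version_output!r}")

    major, minor = int(match.group(1)), int(match.group(2))
    patch = int(match.group(3) or 0)

    if (major, minor) >= (2, 40):
        return True
    # The fix was backported to the 2.39 maintenance series in 2.39.3.
    return (major, minor) == (2, 39) and patch >= 3
```

The function expects the output of `git version` as its argument, for
example the stdout captured by
subprocess.run(["git", "version"], capture_output=True, text=True).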

> I am guessing that "git fsck" does not check file packed-refs at all.
> I mean, it does not even attempt to parse it, in order to check
> whether at least the format makes any sense. Only "git push" does it.

Indeed it doesn't. While the issue is comparatively easy to spot by
manually inspecting the `packed-refs` file, I agree that it would be
great if git-fsck(1) knew how to check the refdb for consistency. This
problem is only going to get worse once the upcoming reftable backend
lands -- it is a binary format, and just opening it with a text editor
to check whether it looks sane-ish stops being a viable option here.
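
For illustration, the kind of format check discussed above could look
roughly like this Python sketch. It is not what git-fsck(1) does. The
function name check_packed_refs is hypothetical, and the rules are
assumptions based on the text format of `packed-refs`: an optional "#"
header on the first line, "<oid> <refname>" records with a 40- or
64-digit hex object ID, and optional "^<oid>" peeled lines that follow
a record.

```python
import re

# Assumption: object IDs are 40 (SHA-1) or 64 (SHA-256) hex digits.
OID = rb"(?:[0-9a-fA-F]{40}|[0-9a-fA-F]{64})"
REF_LINE = re.compile(OID + rb" \S+")
PEELED_LINE = re.compile(rb"\^" + OID)


def check_packed_refs(data: bytes) -> list[str]:
    """Return a list of human-readable problems found in the raw
    contents of a packed-refs file; an empty list means none found."""
    problems = []
    if b"\0" in data:
        problems.append("file contains NUL bytes")
    if data and not data.endswith(b"\n"):
        problems.append("last line is not newline-terminated")

    lines = data.split(b"\n")
    if lines and lines[-1] == b"":
        lines.pop()  # drop the empty piece after the final newline

    prev_was_ref = False
    for lineno, line in enumerate(lines, start=1):
        if line.startswith(b"#"):
            # The header is only expected on the very first line.
            if lineno != 1:
                problems.append(f"line {lineno}: header not at start of file")
            prev_was_ref = False
        elif line.startswith(b"^"):
            # A peeled line must directly follow a ref record.
            if not prev_was_ref:
                problems.append(f"line {lineno}: peeled line without a ref")
            elif not PEELED_LINE.fullmatch(line):
                problems.append(f"line {lineno}: malformed peeled line")
            prev_was_ref = False
        elif REF_LINE.fullmatch(line):
            prev_was_ref = True
        else:
            problems.append(f"line {lineno}: malformed ref line")
            prev_was_ref = False
    return problems
```

It takes the file contents as bytes, for example from
open(".git/packed-refs", "rb").read(); that path assumes a regular
non-bare repository. The sketch checks syntax only and does not verify
that the referenced objects exist.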

In fact, I already planned to introduce such consistency checks for the
refdb soonish. Once the reftable backend is upstream I will focus more
on additional tooling to support it, and extending our consistency
checks is one of the first items on my todo list here.

> What other parts of the repository does "git fsck" not check then?

There may be some metadata and cache-like data structures that we don't
check, but the object database is checked by default.

> The repository check is suspiciously fast. Is there a slow way to
> check that a repository is fine? I mean, something along the lines of
> checking whether every commit can be checked out without problems.

Other than running `git fsck --full --strict`: not that I'm aware of.
And `--full` isn't even needed because it's the default.

Patrick
