git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Gregory Szorc <gregory.szorc@gmail.com>
To: Taylor Blau <me@ttaylorr.com>
Cc: git@vger.kernel.org, stolee@gmail.com
Subject: Re: Race condition between repack and loose-objects maintenance task
Date: Wed, 29 Jun 2022 17:44:25 -0700	[thread overview]
Message-ID: <CAKQoGanPBec6wRO6uWrETaoJXdszpjRWytXaJwx6jw0mrrj-gQ@mail.gmail.com> (raw)
In-Reply-To: <YryKCl5J1Em89d3e@nand.local>

On Wed, Jun 29, 2022 at 10:21 AM Taylor Blau <me@ttaylorr.com> wrote:
>
> On Wed, Jun 29, 2022 at 10:16:38AM -0700, Gregory Szorc wrote:
> > > In either case, my recommendation would be to keep those unreachable
> > > objects which haven't yet aged out of the repository around for longer,
> > > which will decrease the likelihood of seeing the race.
> >
> > We had to lower gc.pruneExpire from its default of 1 hour because
> > otherwise this would result in the creation of thousands of loose
> > objects. This is normally acceptable. However, on NFS, the churn from
> > lots of file creation and deletion resulted in acceptable performance
> > degradation. We had to lower gc.pruneExpire to minimize the damage.
>
> Makes sense. Having a shorter gc.pruneExpire makes the race much more
> likely to occur in practice, so I'm glad that we have a plausible
> confirmation that that's what's going on in your repository.
>
> > > See Documentation/technical/cruft-packs.txt for more information about
> > > cruft packs.
> >
> > Yes, I saw this new feature in the Git release this week and am hoping
> > to roll it out to better mitigate this general problem! With cruft
> > packs, I anticipate being able to restore a longer gc.pruneExpire
> > value as they should mitigate the NFS performance badness we're
> > working around. Thank you for confirming it should help with the
> > problem.
>
> That should definitely help. Let me know if you have any questions, and
> thanks for trying it out!

Revising my initial post, not running `loose-objects` is insufficient:
we also need to prevent `incremental-repack` from running while a `git
gc` / `git repack` is in progress. If an `incremental-repack` runs
concurrently with `git repack`, `repack` can error after bitmap
generation but before the temp packfile is renamed with "could not
find 'pack-*.pack'." I suspect this has to do with
`incremental-repack` deleting packs that were initially in place when
`git repack` started running. But I haven't looked into where exactly
repack is failing.

I deployed Git 2.37 and enabled cruft packs via `gc.cruftPacks=true`
in the global gitconfig. I simultaneously increased `gc.pruneExpire`
to 1 day (as we're no longer concerned about each unreferenced object
turning into a single file). And I changed the frequently-invoked `git
maintenance` code to not run `loose-objects` or `incremental-repack`
if a gc.pid file is present (this is still racy but should hopefully
work enough of the time). This appeared to work: `git gc` created a
cruft pack and didn't complain about anything disappearing out from
under it, despite `git maintenance` running `commit-graph` and
`pack-refs` tasks during the long-running `git gc` / `git repack`.

However, I think there is yet another bug at play: running
`incremental-repack` appears to be able to repack the cruft packfile.
In doing so, it deletes its .mtimes file and the metadata inside. This
behavior seems wrong to me: I was expecting the cruft packfile to
linger until the next `git gc` and/or for its mtimes metadata to be
preserved across future repacks. Otherwise, what is the point in
retaining granular mtime metadata? Is this the behavior expected? Or
do cruft packfiles just not work as intended alongside use of the
`incremental-task` maintenance task / multi-pack index in Git 2.37?

  reply	other threads:[~2022-06-30  0:44 UTC|newest]

Thread overview: 16+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-06-29 16:55 Race condition between repack and loose-objects maintenance task Gregory Szorc
2022-06-29 17:03 ` Taylor Blau
2022-06-29 17:10   ` Taylor Blau
2022-06-29 17:16   ` Gregory Szorc
2022-06-29 17:21     ` Taylor Blau
2022-06-30  0:44       ` Gregory Szorc [this message]
2022-06-30  3:19         ` Taylor Blau
2022-06-30  3:23           ` Taylor Blau
2022-06-30 20:12             ` Junio C Hamano
2022-07-05 18:43               ` Gregory Szorc
2022-07-06  8:50                 ` Ævar Arnfjörð Bjarmason
2022-07-20  1:40             ` Gregory Szorc
2022-07-20  9:52               ` Ævar Arnfjörð Bjarmason
2022-07-26 16:22                 ` Gregory Szorc
2022-07-26 18:11                   ` Ævar Arnfjörð Bjarmason
2022-07-20  1:41             ` Gregory Szorc

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CAKQoGanPBec6wRO6uWrETaoJXdszpjRWytXaJwx6jw0mrrj-gQ@mail.gmail.com \
    --to=gregory.szorc@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=me@ttaylorr.com \
    --cc=stolee@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).