git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Jonathan Tan <jonathantanmy@google.com>
To: Taylor Blau <me@ttaylorr.com>
Cc: Jonathan Tan <jonathantanmy@google.com>,
	git@vger.kernel.org, derrickstolee@github.com, gitster@pobox.com
Subject: Re: [RFC PATCH 0/4] move pruned objects to a separate repository
Date: Thu, 30 Jun 2022 14:15:04 -0700	[thread overview]
Message-ID: <20220630211504.2833463-1-jonathantanmy@google.com> (raw)
In-Reply-To: <Yr0OuwCyDot0wJjs@nand.local>

Taylor Blau <me@ttaylorr.com> writes:
> We don't usually do pruning GC's except during manual intervention or
> upon request through a support ticket. But when we do it is often
> infeasible to lock the entire network's push traffic and reference
> updates. So it is not an unheard of event to encounter the race that I
> described above.
> 
> The idea is that, at least for non-sensitive pruning, we would move the
> pruned objects to a separate repository and hold them there until we
> could run `git fsck` on the repository after pruning and verify that the
> repository is intact. If it is, then the expired.git repository can be
> emptied, too, permanently removing the pruned objects. If not, the
> expired.git repository then becomes a donor for the missing objects,
> which are used to heal the corrupt main repository. Once *that* is done,
> and fsck comes back clean, then the expired.git repository can be
> removed.

Thanks for the information. Presumably, during such pruning GCs (because
manual intervention was needed or because of a support ticket), you
would use a cruft expiration of "now", not something like "14 days ago".
If so, more on this specific case later...

> > I think there is at least one more alternative that should be
> > considered, though: since the cruft pack is unlikely to have its objects
> > "resurrected" (since the reason why they're there is because they are
> > unreachable), it is likely that the objects that are pruned are exactly
> > the same as those in the craft pack. So it would be more efficient to
> > just unconditionally rename the cruft pack to the backup destination.
> 
> This isn't quite right. The contents that are written into the
> expired.git repository is everything that *didn't* end up in the cruft
> pack.

I reread what I wrote and realized that I didn't give a good description
of what I was thinking. So hopefully this is a better one: I was
thinking of the case in which GC is regularly run on a repo, say, every
14 days with a 14-day expiration time. So on the first run you would
have:

  a1.pack      a2.pack (+ .mtime)
  Reachable    Unreachable (cruft pack)

and on the second run, assuming this patch set is in effect:

  b1.pack      b2.pack (+ .mtime)          expired.git/b3.pack
  Reachable    Unreachable (cruft pack)    In expired.git

and my idea was that it is very likely that a2.pack and
expired.git/b3.pack are the same, so I thought that a feature in which
cruft packs could be moved instead of deleted would be more efficient.

Going back to the specific case at the start of this email. I now see
that my idea wouldn't work in that specific case, because you would want
to generate expired.git packs from a repository that is unlikely to have
cruft packs (or, at least, it is unlikely that the existing cruft packs
match exactly what you would write in expired.git).

If it is likely that cruft pack(s) would only be written in one place
(whether in the same repository or in expired.git, but not both),
another design option would be to be able to tell Git where to write
cruft packs, but not give Git the ability to write cruft packs both to
the same repository and expired.git. (Unless in your use case, Taylor,
you envision that you would occasionally need to write cruft packs in
both places.) That would simplify the UI and the code a bit, but not by
much (this patch set, as written, is already elegantly written), so I
don't feel too strongly about it and I would be happy with the current
pattern.



  reply	other threads:[~2022-06-30 21:15 UTC|newest]

Thread overview: 9+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-06-29 18:45 [RFC PATCH 0/4] move pruned objects to a separate repository Taylor Blau
2022-06-29 18:45 ` [RFC PATCH 1/4] builtin/repack.c: pass "out" to `prepare_pack_objects` Taylor Blau
2022-06-29 18:47 ` [RFC PATCH 2/4] builtin/repack.c: pass "cruft_expiration" to `write_cruft_pack` Taylor Blau
2022-06-29 18:47 ` [RFC PATCH 3/4] builtin/repack.c: write cruft packs to arbitrary locations Taylor Blau
2022-06-29 18:47 ` [RFC PATCH 4/4] builtin/repack.c: implement `--expire-to` for storing pruned objects Taylor Blau
2022-06-29 22:54 ` [RFC PATCH 0/4] move pruned objects to a separate repository Jonathan Tan
2022-06-30  2:47   ` Taylor Blau
2022-06-30 21:15     ` Jonathan Tan [this message]
2022-06-30  8:00 ` Ævar Arnfjörð Bjarmason

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20220630211504.2833463-1-jonathantanmy@google.com \
    --to=jonathantanmy@google.com \
    --cc=derrickstolee@github.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=me@ttaylorr.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).