From: "Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
To: Taylor Blau <me@ttaylorr.com>
Cc: git@vger.kernel.org, derrickstolee@github.com,
	jonathantanmy@google.com, gitster@pobox.com,
	Jeff King <peff@peff.net>
Subject: Re: [RFC PATCH 0/4] move pruned objects to a separate repository
Date: Thu, 30 Jun 2022 10:00:27 +0200
Message-ID: <220630.86y1xeeeik.gmgdl@evledraar.gmail.com>
In-Reply-To: <cover.1656528343.git.me@ttaylorr.com>


On Wed, Jun 29 2022, Taylor Blau wrote:

> Now that cruft packs are available in v2.37.0, here is an interesting
> application of that new feature to enable a two-phase object pruning
> approach.
>
> This came out of a discussion within GitHub about ways we could support
> storing a set of pruned objects in "limbo" so that they were not
> accessible from the repository which pruned them, but instead stored in
> a cruft pack in a separate repository which lists the original one as an
> alternate.
>
> This makes it possible to take the collection of all pruned objects and
> store them in a cruft pack in a separate repository. This repository
> (which I have been referring to as the "expired.git") can then be used
> as a donor repository for any missing objects (like the ones described
> by the race in [1]).
> [...]
> [1]: https://lore.kernel.org/git/YryF+vkosJOXf+mQ@nand.local/

I think the best description of that race on-list is this one by Jeff
King. If so, it would be nice to work it into a commit message (for
4/4):

	https://public-inbox.org/git/20190319001829.GL29661@sigill.intra.peff.net/

Downthread of that, starting at:

	https://public-inbox.org/git/878svjj4t5.fsf@evledraar.gmail.com/

I describe a proposed mechanism to address the race condition, which
seems to me to be functionally the same as parts of what you're proposing
here. I.e. the "limbo" here being the same as the proposed "gc
quarantine".

The main difference is one that this RFC leaves on the table, namely
how you'd get these objects back into the non-cruft repository once
they're erroneously/racily expired. I imagined that we'd add it as a
special alternate, read it last, and make the object reading code aware
that any object needed from such an alternate is one that we'd need to
COR over to our primary repository:

	https://public-inbox.org/git/8736lnxlig.fsf@evledraar.gmail.com/
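To make that read path concrete, here is a minimal Python sketch of the
lookup order described above. It is a toy model and not git's object
code: the dict-based stores, the name read_object() and its parameters
are all hypothetical, chosen only to show "read the special alternate
last, and copy over anything found there".

```python
from typing import Dict, List, Optional

# Hypothetical stand-in for an object directory: object ID -> contents.
ObjectStore = Dict[str, bytes]


def read_object(
    oid: str,
    primary: ObjectStore,
    alternates: List[ObjectStore],
    quarantine: ObjectStore,
) -> Optional[bytes]:
    """Look up oid, consulting the quarantine last and copying on a hit."""
    # Normal lookup: the primary store first, then ordinary alternates.
    for store in [primary, *alternates]:
        if oid in store:
            return store[oid]
    # Only now fall back to the special alternate holding pruned objects.
    if oid in quarantine:
        data = quarantine[oid]
        # Needing an object from here means it was expired too eagerly,
        # so copy it back into the primary store (the "COR" step).
        primary[oid] = data
        return data
    return None  # genuinely missing


primary = {"aaa": b"live"}
quarantine = {"bbb": b"pruned but still needed"}
assert read_object("bbb", primary, [], quarantine) == b"pruned but still needed"
assert "bbb" in primary  # healed as a side effect of the read
assert read_object("ccc", primary, [], quarantine) is None
```

Objects found in an ordinary alternate are deliberately not copied; only
a hit in the quarantine is treated as a sign that something was pruned
while still needed.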

Whereas it seems like you're imagining just having the "cruft pack"
repository around so that a support engineer can manually recover from
corruption, or have some other out-of-tree mechanism not part of this
RFC to (semi-?)automate that step.

If you haven't, it would be nice if you could read that thread & see if
what I'm describing there is essentially a superset of what you have
here, and if any of the concerns Jeff King brought up there are ones you
think apply here.

Particularly as there was a reference to an off-list (presumably at
GitHub) discussion with Michael Haggerty about these sorts of races. I
don't know if either Jeff or Michael were involved in the discussions
you had.

I think that the mechanism I proposed there was subtly different from
what Jeff was concerned about being racy, but that thread was left
hanging as the last reply is from me trying to clarify that point.

So maybe I'm wrong, but I think if that was the case you'd also be wrong
about this approach being viable, so it would be nice to clear that up
:)

I'd also be very interested to know if you have anything like my
proposed auto-healing via a special alternate planned.  I think that
would allow aggressive pruning of live repositories not by fixing our
underlying race conditions, but by "leaning into" them as it were.

I.e. we'd race even more, but as we could always auto-heal by "no, I'll
actually need that" COR-ing the relevant object(s) back from the "gc
quarantine" (or your "cruft repository") that would be OK.
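As an illustration of "leaning into" the race, the following Python
sketch walks through one possible timeline. Everything in it is an
assumption made for the example: the two dicts, prune() and read() are
hypothetical names, and the interleaving is simplified to four steps.
The only point is that a prune which moves objects rather than deleting
them lets a later read recover.

```python
from typing import Dict, Set

# Hypothetical toy stores: object ID -> contents.
primary: Dict[str, bytes] = {"tip": b"commit", "old": b"unreachable blob"}
quarantine: Dict[str, bytes] = {}


def prune(reachable: Set[str]) -> None:
    """Move (not delete) everything outside `reachable` to the quarantine."""
    for oid in [o for o in primary if o not in reachable]:
        quarantine[oid] = primary.pop(oid)


def read(oid: str) -> bytes:
    """Read from primary, healing from the quarantine when needed."""
    if oid not in primary:
        primary[oid] = quarantine[oid]  # KeyError if the object is truly gone
    return primary[oid]


# 1. A writer sees that "old" already exists and decides to reuse it.
writer_saw_old = "old" in primary
# 2. gc computed reachability before the writer's new reference existed.
prune(reachable={"tip"})
# 3. The writer's reference lands; "old" is now needed but was pruned.
assert writer_saw_old and "old" not in primary
# 4. The next read heals instead of reporting corruption.
assert read("old") == b"unreachable blob"
assert "old" in primary
```

The sketch leaves the healed object in the quarantine as well; whether
it should be dropped from there afterwards is a policy question the
sketch does not try to answer.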
