git.vger.kernel.org archive mirror
From: Taylor Blau <me@ttaylorr.com>
To: git@vger.kernel.org
Cc: derrickstolee@github.com, jonathantanmy@google.com, gitster@pobox.com
Subject: [RFC PATCH 0/4] move pruned objects to a separate repository
Date: Wed, 29 Jun 2022 14:45:49 -0400
Message-ID: <cover.1656528343.git.me@ttaylorr.com>

Now that cruft packs are available in v2.37.0, here is an interesting
application of that new feature to enable a two-phase object pruning
approach.

This came out of a discussion within GitHub about ways we could support
storing a set of pruned objects in "limbo" so that they were not
accessible from the repository which pruned them, but instead stored in
a cruft pack in a separate repository which lists the original one as an
alternate.

This makes it possible to take the collection of all pruned objects and
store them in a cruft pack in a separate repository. This repository
(which I have been referring to as the "expired.git") can then be used
as a donor repository for any missing objects (like the ones described
by the race in [1]).
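For anyone who wants to experiment with the alternates relationship described above, it could be set up by hand roughly as follows. This is only a sketch: the helper name `add_alternate` is hypothetical and not part of this series, and it assumes both arguments are existing bare repositories (or `.git` directories). It relies on the standard `objects/info/alternates` file, which lists one object directory per line.

```shell
# Hypothetical helper (not part of this series): make the repository at $1
# borrow objects from the repository at $2 by listing $2's object
# directory in $1's objects/info/alternates file.
add_alternate () {
	borrower=$1
	donor=$2

	# Record an absolute path so the entry does not depend on the
	# directory Git is later invoked from.
	donor_objects=$(cd "$donor/objects" && pwd) || return 1

	mkdir -p "$borrower/objects/info" || return 1
	alternates="$borrower/objects/info/alternates"

	# Do not add the same entry twice.
	if test -f "$alternates" && grep -qxF "$donor_objects" "$alternates"
	then
		return 0
	fi
	printf '%s\n' "$donor_objects" >>"$alternates"
}
```

Run from the top of a non-bare original repository, `add_alternate ../expired.git .git` would make "expired.git" list the original as an alternate.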

The first few patches are preparatory. The final one implements writing
the pruned objects separately. The trick is to write another cruft pack
to a separate repository, with two tweaks:

  - the `--cruft-expiration` value is set to "never", since we want to
    keep around all of the objects we expired in the previous step, and

  - the original cruft pack appears as a pack that we are going to keep,
    meaning all unreachable objects that are stored in the original
    cruft pack are excluded from the one we write to the "expired.git"
    repository.

You can try this out yourself by doing something like:

    $ git init --bare ../expired.git
    $ git repack --cruft --cruft-expiration=1.day.ago -d \
        --expire-to=../expired.git/objects/pack/pack

which will create two cruft packs:

  - one in the repository which ran `git repack` containing all
    unreachable objects written within the last day, and
  - another in the "expired.git" repository which contains all
    unreachable objects written prior to the last day.
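
One way to check the result is sketched below. It assumes a cruft pack can be recognized by the `.mtimes` file written next to its `.pack` and `.idx` files; `list_cruft_packs` is a hypothetical helper, not something this series adds.

```shell
# Hypothetical helper: print the .pack file of every cruft pack found in
# the pack directory $1. A pack is treated as a cruft pack when a
# matching .mtimes file exists beside it.
list_cruft_packs () {
	packdir=$1
	for mtimes in "$packdir"/pack-*.mtimes
	do
		# An unmatched glob is left as a literal string; skip it.
		test -f "$mtimes" || continue
		printf '%s\n' "${mtimes%.mtimes}.pack"
	done
}
```

After the `git repack` invocation above, `list_cruft_packs .git/objects/pack` and `list_cruft_packs ../expired.git/objects/pack` should each print a cruft pack, provided there were unreachable objects on both sides of the one-day cutoff.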

This series is an RFC for now since I'm interested in discussing whether
this is a feature that people would actually want to use.
But if it is, I'm happy to polish this up and turn it into a
non-RFC-quality series ;-).

In the meantime, thanks for your review!

[1]: https://lore.kernel.org/git/YryF+vkosJOXf+mQ@nand.local/

Taylor Blau (4):
  builtin/repack.c: pass "out" to `prepare_pack_objects`
  builtin/repack.c: pass "cruft_expiration" to `write_cruft_pack`
  builtin/repack.c: write cruft packs to arbitrary locations
  builtin/repack.c: implement `--expire-to` for storing pruned objects

 Documentation/git-repack.txt |   6 ++
 builtin/repack.c             |  67 ++++++++++++++++---
 t/t7700-repack.sh            | 121 +++++++++++++++++++++++++++++++++++
 3 files changed, 186 insertions(+), 8 deletions(-)

-- 
2.37.0.1.g1379af2e9d

Thread overview: 9+ messages
2022-06-29 18:45 Taylor Blau [this message]
2022-06-29 18:45 ` [RFC PATCH 1/4] builtin/repack.c: pass "out" to `prepare_pack_objects` Taylor Blau
2022-06-29 18:47 ` [RFC PATCH 2/4] builtin/repack.c: pass "cruft_expiration" to `write_cruft_pack` Taylor Blau
2022-06-29 18:47 ` [RFC PATCH 3/4] builtin/repack.c: write cruft packs to arbitrary locations Taylor Blau
2022-06-29 18:47 ` [RFC PATCH 4/4] builtin/repack.c: implement `--expire-to` for storing pruned objects Taylor Blau
2022-06-29 22:54 ` [RFC PATCH 0/4] move pruned objects to a separate repository Jonathan Tan
2022-06-30  2:47   ` Taylor Blau
2022-06-30 21:15     ` Jonathan Tan
2022-06-30  8:00 ` Ævar Arnfjörð Bjarmason
