All of lore.kernel.org
 help / color / mirror / Atom feed
From: Taylor Blau <me@ttaylorr.com>
To: Junio C Hamano <gitster@pobox.com>
Cc: git@vger.kernel.org, larsxschneider@gmail.com, peff@peff.net,
	tytso@mit.edu
Subject: Re: [PATCH 00/17] cruft packs
Date: Fri, 3 Dec 2021 15:08:00 -0500	[thread overview]
Message-ID: <Yap5INmX2ACfjoda@nand.local> (raw)
In-Reply-To: <xmqq5ys5sbzc.fsf@gitster.g>

On Fri, Dec 03, 2021 at 11:51:51AM -0800, Junio C Hamano wrote:
> Taylor Blau <me@ttaylorr.com> writes:
>
> > This series implements "cruft packs", a pack which stores accumulated
> > unreachable objects, along with a new ".mtimes" file which tracks each
> > object's last known modification time.
>
> Let me rephrase the above to test my understanding, since I need to
> write a summary for the  "What's cooking" report.
>
>  Instead of leaving unreachable objects in loose form when packing,
>  or ejecting them into loose form when repacking, gather them in a
>  packfile with an auxiliary file that records the last-use time of
>  these objects.

Exactly. Thanks for such a concise and accurate description of the
topic.

> That way, we do not have to waste so many inodes for loose objects
> that is not likely to be used, which feels like a win.

Yes. This had historically been a problem for GitHub. We don't
automatically prune unreachable objects during repacking, but sometimes
customers will ask us to do it on their behalf (if, for example, they
accidentally pushed sensitive information to us, and then force-pushed
over it).

But occasionally we'd get bitten by exploding many years of loose
objects (because we used to freshen packfiles too aggressively when
moving them around).

We've been running this series in production for the past few months,
and it's been a huge relief on the folks who typically run these pruning
GCs.

> >   - The final patch handles object freshening for objects stored in a
> >     cruft pack.
>
> I am not going to read it today, but I think this is the most
> interesting part of the series.  Instead of using mtime of an
> individual loose object file, we'd need to record the time of
> last use for each object in a pack.
>
> Stepping back a bit, I do not see how we can get away without doing
> the same .mtimes file for non-cruft packs.  An object that is in a
> non-cruft pack may be referenced immediately after the repack that
> created the pack, but the ref that was referencing the object may
> have gone away and now the pack is a month old.  If we were to
> repack the object, we do not know when was the last time the object
> was reachable from any of the refs and index entries (collectively
> known as anchor points).

In that situation, we would use the mtime of the pack which contains
that object itself as a proxy (or the mtime of a loose copy of the
object, if it is more recent).

That isn't perfect, as you note, since if the pack isn't otherwise
freshened, we'd consider that object to be a month old, even if the
reference pointing at it was deleted a mere second ago.

I can't recall if Peff and I talked about this off-list, but I have a
vague sense we probably did (and I forgot the details).

> Of course, recording all mtimes for all
> packed objects all the time would involve quite a lot of overhead.
> I am guessing (I will not spend time today to figure it out myself)
> that .mtimes update at runtime will happen in-place (i.e. via
> seek(2)+write(2), or pwrite()), and I wonder what the safety concern
> would be (which is the primary reason why we tend not to do in-place
> updates but recreate-and-rename updates).

Yeah, this series avoids doing an in-place update, and similarly avoids
recreating the entire .mtimes file before moving into place. Instead,
freshening an object stored in a cruft pack takes place by rewriting a
copy of the object loose, since we consider an object's mtime to be the
most recent of (a) what's in the .mtimes file, (b) the mtime of the
containing pack, and (c) the mtime of a loose copy (if one exists).

It can be wasteful, but in practice "resurrecting" an object in a cruft
pack is pretty rare, so on balance it ends up costing less work to do.

> Thanks for working on such an interesting topic.

I'm glad to have piqued your interest.

Taylor

  reply	other threads:[~2021-12-03 20:08 UTC|newest]

Thread overview: 201+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-11-29 22:25 [PATCH 00/17] cruft packs Taylor Blau
2021-11-29 22:25 ` [PATCH 01/17] Documentation/technical: add cruft-packs.txt Taylor Blau
2021-12-02 14:33   ` Derrick Stolee
2021-12-03 21:53     ` Taylor Blau
2021-12-04 22:20   ` Elijah Newren
2021-12-04 23:32     ` Taylor Blau
2021-11-29 22:25 ` [PATCH 02/17] pack-mtimes: support reading .mtimes files Taylor Blau
2021-12-02 15:06   ` Derrick Stolee
2021-12-02 22:32     ` brian m. carlson
2021-12-03 22:24     ` Taylor Blau
2022-01-07 19:41       ` Taylor Blau
2021-11-29 22:25 ` [PATCH 03/17] pack-write: pass 'struct packing_data' to 'stage_tmp_packfiles' Taylor Blau
2021-11-29 22:25 ` [PATCH 04/17] chunk-format.h: extract oid_version() Taylor Blau
2021-12-02 15:22   ` Derrick Stolee
2021-12-03 22:40     ` Taylor Blau
2021-12-06 17:33       ` Derrick Stolee
2021-11-29 22:25 ` [PATCH 05/17] pack-mtimes: support writing pack .mtimes files Taylor Blau
2021-12-02 15:36   ` Derrick Stolee
2021-12-03 23:04     ` Taylor Blau
2021-11-29 22:25 ` [PATCH 06/17] t/helper: add 'pack-mtimes' test-tool Taylor Blau
2021-12-06 21:16   ` Derrick Stolee
2022-02-23 22:24     ` Taylor Blau
2021-11-29 22:25 ` [PATCH 07/17] builtin/pack-objects.c: return from create_object_entry() Taylor Blau
2021-11-29 22:25 ` [PATCH 08/17] builtin/pack-objects.c: --cruft without expiration Taylor Blau
2021-12-06 21:44   ` Derrick Stolee
2022-03-01  2:48     ` Taylor Blau
2021-12-07 15:17   ` Derrick Stolee
2022-02-23 23:34     ` Taylor Blau
2021-11-29 22:25 ` [PATCH 09/17] reachable: add options to add_unseen_recent_objects_to_traversal Taylor Blau
2021-11-29 22:25 ` [PATCH 10/17] reachable: report precise timestamps from objects in cruft packs Taylor Blau
2021-11-29 22:25 ` [PATCH 11/17] builtin/pack-objects.c: --cruft with expiration Taylor Blau
2021-12-07 15:30   ` Derrick Stolee
2022-02-23 23:35     ` Taylor Blau
2021-11-29 22:25 ` [PATCH 12/17] builtin/repack.c: support generating a cruft pack Taylor Blau
2021-12-05 20:46   ` Junio C Hamano
2022-03-01  2:00     ` Taylor Blau
2021-12-07 15:38   ` Derrick Stolee
2022-02-23 23:37     ` Taylor Blau
2021-11-29 22:25 ` [PATCH 13/17] builtin/repack.c: allow configuring cruft pack generation Taylor Blau
2021-11-29 22:25 ` [PATCH 14/17] builtin/repack.c: use named flags for existing_packs Taylor Blau
2021-11-29 22:25 ` [PATCH 15/17] builtin/repack.c: add cruft packs to MIDX during geometric repack Taylor Blau
2021-11-29 22:25 ` [PATCH 16/17] builtin/gc.c: conditionally avoid pruning objects via loose Taylor Blau
2021-11-29 22:25 ` [PATCH 17/17] sha1-file.c: don't freshen cruft packs Taylor Blau
2021-12-03 19:51 ` [PATCH 00/17] " Junio C Hamano
2021-12-03 20:08   ` Taylor Blau [this message]
2021-12-03 20:47     ` Taylor Blau
2022-03-02  0:57 ` [PATCH v2 " Taylor Blau
2022-03-02  0:58   ` [PATCH v2 01/17] Documentation/technical: add cruft-packs.txt Taylor Blau
2022-03-02  0:58   ` [PATCH v2 02/17] pack-mtimes: support reading .mtimes files Taylor Blau
2022-03-02 20:22     ` Derrick Stolee
2022-03-02 21:33       ` Taylor Blau
2022-03-02  0:58   ` [PATCH v2 03/17] pack-write: pass 'struct packing_data' to 'stage_tmp_packfiles' Taylor Blau
2022-03-02  0:58   ` [PATCH v2 04/17] chunk-format.h: extract oid_version() Taylor Blau
2022-03-02  0:58   ` [PATCH v2 05/17] pack-mtimes: support writing pack .mtimes files Taylor Blau
2022-03-02  0:58   ` [PATCH v2 06/17] t/helper: add 'pack-mtimes' test-tool Taylor Blau
2022-03-02  0:58   ` [PATCH v2 07/17] builtin/pack-objects.c: return from create_object_entry() Taylor Blau
2022-03-02  0:58   ` [PATCH v2 08/17] builtin/pack-objects.c: --cruft without expiration Taylor Blau
2022-03-02  0:58   ` [PATCH v2 09/17] reachable: add options to add_unseen_recent_objects_to_traversal Taylor Blau
2022-03-02 20:19     ` Derrick Stolee
2022-03-02 21:28       ` Taylor Blau
2022-03-02  0:58   ` [PATCH v2 10/17] reachable: report precise timestamps from objects in cruft packs Taylor Blau
2022-03-02  0:58   ` [PATCH v2 11/17] builtin/pack-objects.c: --cruft with expiration Taylor Blau
2022-03-02  7:42     ` Junio C Hamano
2022-03-02 15:54       ` Taylor Blau
2022-03-02 19:57         ` Derrick Stolee
2022-03-02  0:58   ` [PATCH v2 12/17] builtin/repack.c: support generating a cruft pack Taylor Blau
2022-03-02  0:58   ` [PATCH v2 13/17] builtin/repack.c: allow configuring cruft pack generation Taylor Blau
2022-03-02  0:58   ` [PATCH v2 14/17] builtin/repack.c: use named flags for existing_packs Taylor Blau
2022-03-02  0:58   ` [PATCH v2 15/17] builtin/repack.c: add cruft packs to MIDX during geometric repack Taylor Blau
2022-03-02  0:58   ` [PATCH v2 16/17] builtin/gc.c: conditionally avoid pruning objects via loose Taylor Blau
2022-03-02  0:58   ` [PATCH v2 17/17] sha1-file.c: don't freshen cruft packs Taylor Blau
2022-03-02 20:23   ` [PATCH v2 00/17] " Derrick Stolee
2022-03-02 21:36     ` Taylor Blau
2022-03-03  0:20 ` [PATCH v3 " Taylor Blau
2022-03-03  0:20   ` [PATCH v3 01/17] Documentation/technical: add cruft-packs.txt Taylor Blau
2022-03-07 18:03     ` Jonathan Nieder
2022-03-22  1:16       ` Taylor Blau
2022-03-22 21:45         ` Jonathan Nieder
2022-03-22 22:02           ` Taylor Blau
2022-03-22 23:04             ` Jonathan Nieder
2022-03-23  1:01               ` Taylor Blau
2022-03-28 18:46                 ` Taylor Blau
2022-03-28 20:55                   ` Junio C Hamano
2022-03-28 21:21                     ` Taylor Blau
2022-03-29 15:59                       ` Junio C Hamano
2022-03-30  2:23                         ` Taylor Blau
2022-03-30 13:37                           ` Junio C Hamano
2022-03-30 17:30                             ` Taylor Blau
2022-03-03  0:20   ` [PATCH v3 02/17] pack-mtimes: support reading .mtimes files Taylor Blau
2022-03-03  0:20   ` [PATCH v3 03/17] pack-write: pass 'struct packing_data' to 'stage_tmp_packfiles' Taylor Blau
2022-03-03  0:20   ` [PATCH v3 04/17] chunk-format.h: extract oid_version() Taylor Blau
2022-03-03 16:30     ` Ævar Arnfjörð Bjarmason
2022-03-03 23:32       ` Taylor Blau
2022-03-04  0:16         ` Junio C Hamano
2022-03-03  0:20   ` [PATCH v3 05/17] pack-mtimes: support writing pack .mtimes files Taylor Blau
2022-03-03 16:45     ` Ævar Arnfjörð Bjarmason
2022-03-03 23:35       ` Taylor Blau
2022-03-04 10:40         ` Ævar Arnfjörð Bjarmason
2022-03-03  0:20   ` [PATCH v3 06/17] t/helper: add 'pack-mtimes' test-tool Taylor Blau
2022-03-03  0:21   ` [PATCH v3 07/17] builtin/pack-objects.c: return from create_object_entry() Taylor Blau
2022-03-03  0:21   ` [PATCH v3 08/17] builtin/pack-objects.c: --cruft without expiration Taylor Blau
2022-03-03  0:21   ` [PATCH v3 09/17] reachable: add options to add_unseen_recent_objects_to_traversal Taylor Blau
2022-03-03  0:21   ` [PATCH v3 10/17] reachable: report precise timestamps from objects in cruft packs Taylor Blau
2022-03-03  0:21   ` [PATCH v3 11/17] builtin/pack-objects.c: --cruft with expiration Taylor Blau
2022-03-03  0:21   ` [PATCH v3 12/17] builtin/repack.c: support generating a cruft pack Taylor Blau
2022-03-03  0:21   ` [PATCH v3 13/17] builtin/repack.c: allow configuring cruft pack generation Taylor Blau
2022-03-03  0:21   ` [PATCH v3 14/17] builtin/repack.c: use named flags for existing_packs Taylor Blau
2022-03-03  0:21   ` [PATCH v3 15/17] builtin/repack.c: add cruft packs to MIDX during geometric repack Taylor Blau
2022-03-03  0:21   ` [PATCH v3 16/17] builtin/gc.c: conditionally avoid pruning objects via loose Taylor Blau
2022-03-03  0:21   ` [PATCH v3 17/17] sha1-file.c: don't freshen cruft packs Taylor Blau
2022-03-03  1:29   ` [PATCH v3 00/17] " Derrick Stolee
2022-05-18 23:10 ` [PATCH v4 " Taylor Blau
2022-05-18 23:10   ` [PATCH v4 01/17] Documentation/technical: add cruft-packs.txt Taylor Blau
2022-05-19 14:04     ` Junio C Hamano
2022-05-18 23:10   ` [PATCH v4 02/17] pack-mtimes: support reading .mtimes files Taylor Blau
2022-05-19 10:40     ` Ævar Arnfjörð Bjarmason
2022-05-19 15:21       ` Junio C Hamano
2022-05-20  7:32         ` Ævar Arnfjörð Bjarmason
2022-05-20 22:37           ` Taylor Blau
2022-05-18 23:10   ` [PATCH v4 03/17] pack-write: pass 'struct packing_data' to 'stage_tmp_packfiles' Taylor Blau
2022-05-18 23:11   ` [PATCH v4 04/17] chunk-format.h: extract oid_version() Taylor Blau
2022-05-19 11:44     ` Ævar Arnfjörð Bjarmason
2022-05-18 23:11   ` [PATCH v4 05/17] pack-mtimes: support writing pack .mtimes files Taylor Blau
2022-05-18 23:11   ` [PATCH v4 06/17] t/helper: add 'pack-mtimes' test-tool Taylor Blau
2022-05-18 23:11   ` [PATCH v4 07/17] builtin/pack-objects.c: return from create_object_entry() Taylor Blau
2022-05-18 23:11   ` [PATCH v4 08/17] builtin/pack-objects.c: --cruft without expiration Taylor Blau
2022-05-19 10:04     ` Junio C Hamano
2022-05-19 15:16       ` Junio C Hamano
2022-05-20 22:52         ` Taylor Blau
2022-05-18 23:11   ` [PATCH v4 09/17] reachable: add options to add_unseen_recent_objects_to_traversal Taylor Blau
2022-05-18 23:11   ` [PATCH v4 10/17] reachable: report precise timestamps from objects in cruft packs Taylor Blau
2022-05-18 23:11   ` [PATCH v4 11/17] builtin/pack-objects.c: --cruft with expiration Taylor Blau
2022-05-18 23:11   ` [PATCH v4 12/17] builtin/repack.c: support generating a cruft pack Taylor Blau
2022-05-19 11:29     ` Ævar Arnfjörð Bjarmason
2022-05-20 22:39       ` Taylor Blau
2022-05-18 23:11   ` [PATCH v4 13/17] builtin/repack.c: allow configuring cruft pack generation Taylor Blau
2022-05-18 23:11   ` [PATCH v4 14/17] builtin/repack.c: use named flags for existing_packs Taylor Blau
2022-05-18 23:11   ` [PATCH v4 15/17] builtin/repack.c: add cruft packs to MIDX during geometric repack Taylor Blau
2022-05-19 11:32     ` Ævar Arnfjörð Bjarmason
2022-05-20 22:42       ` Taylor Blau
2022-05-18 23:11   ` [PATCH v4 16/17] builtin/gc.c: conditionally avoid pruning objects via loose Taylor Blau
2022-05-18 23:11   ` [PATCH v4 17/17] sha1-file.c: don't freshen cruft packs Taylor Blau
2022-05-18 23:48   ` [PATCH v4 00/17] " Derrick Stolee
2022-05-20 23:19     ` Junio C Hamano
2022-05-20 23:30       ` Taylor Blau
2022-05-19 11:42   ` [RFC PATCH 0/2] Utility functions for duplicated pack(write) code Ævar Arnfjörð Bjarmason
2022-05-19 11:42     ` [RFC PATCH 1/2] packfile API: add and use a pack_name_to_ext() utility function Ævar Arnfjörð Bjarmason
2022-05-19 15:40       ` Junio C Hamano
2022-05-19 11:42     ` [RFC PATCH 2/2] hash API: add and use a hash_short_id_by_algo() function Ævar Arnfjörð Bjarmason
2022-05-19 15:50       ` Junio C Hamano
2022-05-19 19:07         ` Ævar Arnfjörð Bjarmason
2022-05-19 15:31     ` [RFC PATCH 0/2] Utility functions for duplicated pack(write) code Junio C Hamano
2022-05-19 11:54   ` [PATCH v4 00/17] cruft packs Ævar Arnfjörð Bjarmason
2022-05-20 23:17 ` [PATCH v5 " Taylor Blau
2022-05-20 23:17   ` [PATCH v5 01/17] Documentation/technical: add cruft-packs.txt Taylor Blau
2022-05-20 23:17   ` [PATCH v5 02/17] pack-mtimes: support reading .mtimes files Taylor Blau
2022-05-24 19:32     ` Jonathan Nieder
2022-05-24 19:44       ` rsbecker
2022-05-24 22:25         ` Taylor Blau
2022-05-24 23:24           ` rsbecker
2022-05-25  0:07             ` Taylor Blau
2022-05-25  0:20               ` rsbecker
2022-05-25  9:11               ` adding new 32-bit on-disk (unsigned) timestamp formats (was: [PATCH v5 02/17] pack-mtimes: support reading .mtimes files) Ævar Arnfjörð Bjarmason
2022-05-25 13:30                 ` Derrick Stolee
2022-05-25 21:13                   ` Taylor Blau
2022-05-26  0:02                     ` Ævar Arnfjörð Bjarmason
2022-05-26  0:12                       ` Taylor Blau
2022-05-24 22:21       ` [PATCH v5 02/17] pack-mtimes: support reading .mtimes files Taylor Blau
2022-05-25  7:48         ` Jonathan Nieder
2022-05-25 21:36           ` Taylor Blau
2022-05-25 21:58             ` rsbecker
2022-05-25 22:59               ` Taylor Blau
2022-05-25 23:02     ` Taylor Blau
2022-05-26  0:30       ` Junio C Hamano
2023-06-01 13:01     ` Andreas Schwab
2022-05-20 23:17   ` [PATCH v5 03/17] pack-write: pass 'struct packing_data' to 'stage_tmp_packfiles' Taylor Blau
2022-05-20 23:17   ` [PATCH v5 04/17] chunk-format.h: extract oid_version() Taylor Blau
2022-05-20 23:17   ` [PATCH v5 05/17] pack-mtimes: support writing pack .mtimes files Taylor Blau
2022-05-20 23:17   ` [PATCH v5 06/17] t/helper: add 'pack-mtimes' test-tool Taylor Blau
2022-05-20 23:17   ` [PATCH v5 07/17] builtin/pack-objects.c: return from create_object_entry() Taylor Blau
2022-05-20 23:17   ` [PATCH v5 08/17] builtin/pack-objects.c: --cruft without expiration Taylor Blau
2022-05-20 23:17   ` [PATCH v5 09/17] reachable: add options to add_unseen_recent_objects_to_traversal Taylor Blau
2022-05-20 23:17   ` [PATCH v5 10/17] reachable: report precise timestamps from objects in cruft packs Taylor Blau
2022-05-20 23:18   ` [PATCH v5 11/17] builtin/pack-objects.c: --cruft with expiration Taylor Blau
2022-05-20 23:18   ` [PATCH v5 12/17] builtin/repack.c: support generating a cruft pack Taylor Blau
2022-05-20 23:18   ` [PATCH v5 13/17] builtin/repack.c: allow configuring cruft pack generation Taylor Blau
2022-05-20 23:18   ` [PATCH v5 14/17] builtin/repack.c: use named flags for existing_packs Taylor Blau
2022-05-20 23:18   ` [PATCH v5 15/17] builtin/repack.c: add cruft packs to MIDX during geometric repack Taylor Blau
2022-05-20 23:18   ` [PATCH v5 16/17] builtin/gc.c: conditionally avoid pruning objects via loose Taylor Blau
2022-06-19  5:38     ` René Scharfe
2022-06-21 15:58       ` Junio C Hamano
2022-05-20 23:18   ` [PATCH v5 17/17] sha1-file.c: don't freshen cruft packs Taylor Blau
2022-05-21 11:17   ` [PATCH v5 00/17] " Ævar Arnfjörð Bjarmason
2022-05-24 19:39     ` Jonathan Nieder
2022-05-24 21:50       ` Taylor Blau
2022-05-24 21:55         ` Ævar Arnfjörð Bjarmason
2022-05-24 22:12           ` Taylor Blau
2022-05-25  7:53             ` Jonathan Nieder
2022-05-25 19:59               ` Derrick Stolee
2022-05-25 21:09                 ` Taylor Blau
2022-05-26  0:06                   ` Ævar Arnfjörð Bjarmason

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=Yap5INmX2ACfjoda@nand.local \
    --to=me@ttaylorr.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=larsxschneider@gmail.com \
    --cc=peff@peff.net \
    --cc=tytso@mit.edu \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.