git.vger.kernel.org archive mirror
From: Taylor Blau <me@ttaylorr.com>
To: Josh Steadmon <steadmon@google.com>,
	Taylor Blau <me@ttaylorr.com>,
	git@vger.kernel.org, peff@peff.net, gitster@pobox.com,
	avarab@gmail.com
Subject: Re: [PATCH v2 2/8] builtin/multi-pack-index.c: support `--stdin-packs` mode
Date: Wed, 22 Sep 2021 22:03:27 -0400
Message-ID: <YUvgbxRmUwZF87b1@nand.local>
In-Reply-To: <YUuuYW3RktlDzskH@google.com>

On Wed, Sep 22, 2021 at 03:29:53PM -0700, Josh Steadmon wrote:
> Thanks for the series! I have a couple of questions:
>
>
> On 2021.09.15 14:24, Taylor Blau wrote:
> > To power a new `--write-midx` mode, `git repack` will want to write a
> > multi-pack index containing a certain set of packs in the repository.
> >
> > This new option will be used by `git repack` to write a MIDX which
> > contains only the packs which will survive after the repack (that is, it
> > will exclude any packs which are about to be deleted).
> >
> > This patch effectively exposes the function implemented in the previous
> > commit via the `git multi-pack-index` builtin. An alternative approach
> > would have been to call that function from the `git repack` builtin
> > directly, but this introduces awkward problems around closing and
> > reopening the object store, so the MIDX will be written out-of-process.
>
> Could you elaborate a bit on the "awkward problems" here? I'm afraid I'm
> missing the context here.

A variety of things can go wrong when the object store is closed and
re-opened in the same process. Many of the symptoms are described
beginning at this message:

  https://lore.kernel.org/git/YPf1m01mcdJ3HNBt@coredump.intra.peff.net/

and further down in the sub-thread. Many of those problems have been
resolved, but I suspect that others are still lurking.

> > +static void read_packs_from_stdin(struct string_list *to)
> > +{
> > +	struct strbuf buf = STRBUF_INIT;
> > +	while (strbuf_getline(&buf, stdin) != EOF)
> > +		string_list_append(to, buf.buf);
> > +	string_list_sort(to);
> > +
> > +	strbuf_release(&buf);
> > +}
> > +
>
> I'm presuming that the packfile list is going to be generated
> automatically, but what happens if that becomes corrupt somehow, and we
> skip a packfile that should have been included? Will that cause
> incorrect behavior, or will we just miss out on some of the bitmap
> performance benefits?

A multi-pack bitmap can only refer to objects that are in a pack which
the repository's MIDX includes. So if we left off a pack from this list,
we'd be unable to cover that pack's objects in the resulting bitmap.
We'd also be unable to cover any objects which are reachable from the
missing pack's objects, since the set of objects in a bitmap must be
closed under reachability.

If, on the other hand, we read a line which does not correspond to any
pack, we'll simply ignore it. That's because we loop over the results of
get_all_packs() and try to find a match in this list instead of the
other way around.
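
To illustrate the direction of that loop, here is a minimal standalone
sketch. It is not the code in midx.c: select_packs() and cmp_names() are
hypothetical names, "known" stands in for the results of get_all_packs(),
and "wanted" stands in for the sorted list read from stdin.

```c
#include <stdlib.h>
#include <string.h>

/* Comparator for qsort()/bsearch() over an array of C strings. */
static int cmp_names(const void *a, const void *b)
{
	return strcmp(*(const char *const *)a, *(const char *const *)b);
}

/*
 * Hypothetical stand-in for the real loop. Walk the packs the
 * repository knows about and keep each one that also appears in
 * "wanted". A name in "wanted" that matches no known pack is never
 * looked at, so it is silently ignored.
 *
 * "out" must have room for nr_known entries. Returns the number of
 * packs written to "out".
 */
static size_t select_packs(const char **known, size_t nr_known,
			   const char **wanted, size_t nr_wanted,
			   const char **out)
{
	size_t i, nr_out = 0;

	/* Sort once so that each lookup can use a binary search. */
	qsort(wanted, nr_wanted, sizeof(*wanted), cmp_names);

	for (i = 0; i < nr_known; i++) {
		if (bsearch(&known[i], wanted, nr_wanted,
			    sizeof(*wanted), cmp_names))
			out[nr_out++] = known[i];
	}
	return nr_out;
}
```

With known = { "pack-a.idx", "pack-b.idx", "pack-c.idx" } and
wanted = { "pack-c.idx", "pack-zzz.idx", "pack-a.idx" }, only pack-a and
pack-c are selected, and the stray "pack-zzz.idx" line has no effect.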

We could detect such stray lines by abusing the string_list_item's util
pointer to mark each entry we match, but it's probably not worth it
since this is mostly an internal interface.
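
For reference, the marking idea could look roughly like the sketch
below. Everything here is an assumption made for illustration: "struct
item" is a miniature imitation of string_list_item (a string plus a spare
pointer), count_unmatched() is a hypothetical helper, and a linear scan
replaces the sorted lookup to keep the example short.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical miniature of string_list_item: a string plus a spare pointer. */
struct item {
	const char *string;
	void *util;
};

/*
 * Mark every entry of "wanted" that names a known pack by setting its
 * util pointer to a non-NULL sentinel, then count the entries that
 * were never marked. Those are the input lines that correspond to no
 * pack.
 */
static size_t count_unmatched(const char **known, size_t nr_known,
			      struct item *wanted, size_t nr_wanted)
{
	size_t i, j, unmatched = 0;

	for (j = 0; j < nr_wanted; j++)
		wanted[j].util = NULL;

	for (i = 0; i < nr_known; i++)
		for (j = 0; j < nr_wanted; j++)
			if (!strcmp(known[i], wanted[j].string))
				wanted[j].util = &wanted[j]; /* any non-NULL value works */

	for (j = 0; j < nr_wanted; j++)
		if (!wanted[j].util)
			unmatched++;

	return unmatched;
}
```

A caller could then warn or die when the return value is non-zero
instead of ignoring the extra lines.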

Thanks,
Taylor
