git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Jeff King <peff@peff.net>
To: Taylor Blau <me@ttaylorr.com>
Cc: git@vger.kernel.org, dstolee@microsoft.com, gitster@pobox.com,
	jrnieder@gmail.com
Subject: Re: [PATCH v2 2/8] pack-write.c: prepare to write 'pack-*.rev' files
Date: Tue, 26 Jan 2021 16:43:25 -0500	[thread overview]
Message-ID: <YBCM/fdZZwoPVsrI@coredump.intra.peff.net> (raw)
In-Reply-To: <YA8YxtUUwE0lLxQt@nand.local>

On Mon, Jan 25, 2021 at 02:15:18PM -0500, Taylor Blau wrote:

> On Fri, Jan 22, 2021 at 06:24:59PM -0500, Jeff King wrote:
> > On Wed, Jan 13, 2021 at 05:28:11PM -0500, Taylor Blau wrote:
> >
> > [...]
> >
> > qsort? Don't we have a perfectly good radix sort for this exact purpose? :)
> 
> Yeah, I think your argument against it (which I agree with is
> basically):
> 
>   We could, and it would be faster, but it would (1) require allocating
>   twice as much data, and (2) the speed increase is a drop in the bucket
>   on any repository with enough objects to even notice it.
> 
> I'll add something to that effect to the commit message.

Yep, that's accurate.

It might make more of a difference when we eventually have a revindex
cache for a midx (because there we're mostly doing things linear in the
number of objects, and not actually traversing). But I'm quite happy to
punt it until then (or later; even there I'd guess it might shave one
second off a 10-15 second operation. That's not nothing, but if nobody
ever cares enough to implement it, I won't be sad).

> > I forgot to comment on this in patch 1, but: I think the format is
> > really independent of the hash size. The contents are identical for a
> > sha-1 versus sha-256 file.
> 
> Right; the format is identical. That is, the fields are the same, but
> the hashes are longer.

Hmm. Actually, I wasn't thinking it through completely. I thought the
file would literally have the exact same bytes in it, but the packfile
hash as well as the hashfd() trailing hash would both be different
sizes. So even if it doesn't match the packfile (which obviously is a
bug), it would hard to even be able to figure that out without knowing
the hash identifier.

> > That said, I don't overly mind having a hash identifier if it might help
> > debug things (OTOH, how the heck do you end up with one that matches the
> > trailer's packfile but  _doesn't_ match the trailer's contents?).
> >
> > If we do have it, should we also be checking it in the loading function?
> 
> I'm not opposed to it, but I imagined that this field would be primarily
> for debugging purposes. I was surprised to learn that we _don't_ verify
> the checksum for .idx files. So, I'm reluctant to start doing so here,
> honestly.

I think we're talking about two different things. I just meant that if
we see a .rev file that says "I'm sha256", we should make sure we are in
a sha256 repo (since that is what we'd be assuming for the .idx and
.pack files!).

We can't do that same check for .idx files, because they don't have a
hash identifier field in them.

Checking that the pack hash matches the trailer from the packfile is a
separate matter. I also think that's worth doing, but am likewise
surprised that we don't check it for the .idx. I'm OK leaving it out
since we don't do it there, either.

> > I wonder if we could factor out some of this repeated logic, but I
> > suspect it is mostly diminishing returns. Maybe this "open a pack file
> > for writing" could become a helper function, though.
> 
> Yeah. I tried factoring it out before replying, and it's a little gross.
> Most of my discomfort with it lies in the complexity of the parameters.
> 
> Consider extracting the code in write_idx_file():
> 
>   - There would need to be a double pointer for 'index_name' (which is
>     sometimes read, and sometimes written to).
> 
>   - There would be an unsigned bit for "verify" (i.e., "open with
>     hashfd_check() or not").
> 
>   - There would be a "pattern" variable if we were creating a new
>     temporary file with odb_mkstemp().
> 
> Having the caller be forced to juggle the combinations of passing
> NULL/0 or not for each of those three makes me leery that this is worth
> doing, so I tend to agree with your judgement that this provides a
> diminishing return.

Thanks for taking a look. I'm not too surprised it turned out ugly.

-Peff

  reply	other threads:[~2021-01-26 22:20 UTC|newest]

Thread overview: 82+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-01-08 18:19 [PATCH 0/8] pack-revindex: introduce on-disk '.rev' format Taylor Blau
2021-01-08 18:19 ` [PATCH 1/8] packfile: prepare for the existence of '*.rev' files Taylor Blau
2021-01-08 18:20 ` [PATCH 2/8] pack-write.c: prepare to write 'pack-*.rev' files Taylor Blau
2021-01-08 18:20 ` [PATCH 3/8] builtin/index-pack.c: write reverse indexes Taylor Blau
2021-01-08 18:20 ` [PATCH 4/8] builtin/pack-objects.c: respect 'pack.writeReverseIndex' Taylor Blau
2021-01-08 18:20 ` [PATCH 5/8] Documentation/config/pack.txt: advertise 'pack.writeReverseIndex' Taylor Blau
2021-01-08 18:20 ` [PATCH 6/8] t: prepare for GIT_TEST_WRITE_REV_INDEX Taylor Blau
2021-01-12 17:11   ` Ævar Arnfjörð Bjarmason
2021-01-12 18:40     ` Taylor Blau
2021-01-08 18:20 ` [PATCH 7/8] t: support GIT_TEST_WRITE_REV_INDEX Taylor Blau
2021-01-12 16:49   ` Derrick Stolee
2021-01-12 17:34     ` Taylor Blau
2021-01-12 17:18   ` Ævar Arnfjörð Bjarmason
2021-01-12 17:39     ` Derrick Stolee
2021-01-12 18:17       ` Taylor Blau
2021-01-08 18:20 ` [PATCH 8/8] pack-revindex: ensure that on-disk reverse indexes are given precedence Taylor Blau
2021-01-13 22:28 ` [PATCH v2 0/8] pack-revindex: introduce on-disk '.rev' format Taylor Blau
2021-01-13 22:28   ` [PATCH v2 1/8] packfile: prepare for the existence of '*.rev' files Taylor Blau
2021-01-14  7:22     ` Junio C Hamano
2021-01-14 12:07       ` Derrick Stolee
2021-01-14 19:57         ` Jeff King
2021-01-14 18:28       ` Taylor Blau
2021-01-14  7:26     ` Junio C Hamano
2021-01-14 18:13       ` Taylor Blau
2021-01-14 20:57         ` Junio C Hamano
2021-01-22 22:54     ` Jeff King
2021-01-25 17:44       ` Taylor Blau
2021-01-25 18:27         ` Jeff King
2021-01-25 19:04         ` Junio C Hamano
2021-01-25 19:23           ` Taylor Blau
2021-01-13 22:28   ` [PATCH v2 2/8] pack-write.c: prepare to write 'pack-*.rev' files Taylor Blau
2021-01-22 23:24     ` Jeff King
2021-01-25 19:15       ` Taylor Blau
2021-01-26 21:43         ` Jeff King [this message]
2021-01-13 22:28   ` [PATCH v2 3/8] builtin/index-pack.c: write reverse indexes Taylor Blau
2021-01-22 23:53     ` Jeff King
2021-01-25 20:03       ` Taylor Blau
2021-01-13 22:28   ` [PATCH v2 4/8] builtin/pack-objects.c: respect 'pack.writeReverseIndex' Taylor Blau
2021-01-22 23:57     ` Jeff King
2021-01-23  0:08       ` Jeff King
2021-01-25 20:21         ` Taylor Blau
2021-01-25 20:50           ` Jeff King
2021-01-13 22:28   ` [PATCH v2 5/8] Documentation/config/pack.txt: advertise 'pack.writeReverseIndex' Taylor Blau
2021-01-13 22:28   ` [PATCH v2 6/8] t: prepare for GIT_TEST_WRITE_REV_INDEX Taylor Blau
2021-01-13 22:28   ` [PATCH v2 7/8] t: support GIT_TEST_WRITE_REV_INDEX Taylor Blau
2021-01-13 22:28   ` [PATCH v2 8/8] pack-revindex: ensure that on-disk reverse indexes are given precedence Taylor Blau
2021-01-25 23:37 ` [PATCH v3 00/10] pack-revindex: introduce on-disk '.rev' format Taylor Blau
2021-01-25 23:37   ` [PATCH v3 01/10] packfile: prepare for the existence of '*.rev' files Taylor Blau
2021-01-29  0:27     ` Jeff King
2021-01-29  1:14       ` Taylor Blau
2021-01-30  8:39         ` Jeff King
2021-01-25 23:37   ` [PATCH v3 02/10] pack-write.c: prepare to write 'pack-*.rev' files Taylor Blau
2021-01-25 23:37   ` [PATCH v3 03/10] builtin/index-pack.c: allow stripping arbitrary extensions Taylor Blau
2021-01-29  0:28     ` Jeff King
2021-01-29  1:15       ` Taylor Blau
2021-01-25 23:37   ` [PATCH v3 04/10] builtin/index-pack.c: write reverse indexes Taylor Blau
2021-01-25 23:37   ` [PATCH v3 05/10] builtin/pack-objects.c: respect 'pack.writeReverseIndex' Taylor Blau
2021-01-25 23:37   ` [PATCH v3 06/10] Documentation/config/pack.txt: advertise 'pack.writeReverseIndex' Taylor Blau
2021-01-29  0:30     ` Jeff King
2021-01-29  1:17       ` Taylor Blau
2021-01-30  8:41         ` Jeff King
2021-01-25 23:37   ` [PATCH v3 07/10] t: prepare for GIT_TEST_WRITE_REV_INDEX Taylor Blau
2021-01-29  0:45     ` Jeff King
2021-01-29  1:09       ` Eric Sunshine
2021-01-29  1:21       ` Taylor Blau
2021-01-30  8:43         ` Jeff King
2021-01-29  2:42       ` Junio C Hamano
2021-01-25 23:37   ` [PATCH v3 08/10] t: support GIT_TEST_WRITE_REV_INDEX Taylor Blau
2021-01-29  0:47     ` Jeff King
2021-01-25 23:37   ` [PATCH v3 09/10] pack-revindex: ensure that on-disk reverse indexes are given precedence Taylor Blau
2021-01-29  0:53     ` Jeff King
2021-01-29  1:25       ` Taylor Blau
2021-01-30  8:46         ` Jeff King
2021-01-25 23:37   ` [PATCH v3 10/10] t5325: check both on-disk and in-memory reverse index Taylor Blau
2021-01-29  1:04     ` Jeff King
2021-01-29  1:05     ` Jeff King
2021-01-29  1:32       ` Taylor Blau
2021-01-30  8:47         ` Jeff King
2021-01-26  2:36   ` [PATCH v3 00/10] pack-revindex: introduce on-disk '.rev' format Junio C Hamano
2021-01-26  2:49     ` Taylor Blau
2021-01-29  1:06   ` Jeff King
2021-01-29  1:34     ` Taylor Blau

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=YBCM/fdZZwoPVsrI@coredump.intra.peff.net \
    --to=peff@peff.net \
    --cc=dstolee@microsoft.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=jrnieder@gmail.com \
    --cc=me@ttaylorr.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).