git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Taylor Blau <me@ttaylorr.com>
To: Derrick Stolee <stolee@gmail.com>
Cc: Taylor Blau <me@ttaylorr.com>,
	git@vger.kernel.org, peff@peff.net, jrnieder@gmail.com
Subject: Re: [PATCH 16/20] builtin/gc.c: guess the size of the revindex
Date: Mon, 11 Jan 2021 11:23:54 -0500	[thread overview]
Message-ID: <X/x7mrcwfxGO8xH7@nand.local> (raw)
In-Reply-To: <87cd1b2c-7a28-da77-4ae4-99ffbbdfda72@gmail.com>

On Mon, Jan 11, 2021 at 06:52:24AM -0500, Derrick Stolee wrote:
> This is so far the only not-completely-obvious change.
>
> > But, this is an approximation anyway, and it does remove a use of the
> > 'struct revindex_entry' from outside of pack-revindex internals.
>
> And this might be enough justification for it, but...
>
> > -	heap += sizeof(struct revindex_entry) * nr_objects;
> > +	heap += (sizeof(off_t) + sizeof(uint32_t)) * nr_objects;
>
> ...outside of the estimation change, will this need another change
> when the rev-index is mmap'd? Should this instead be an API call,
> such as estimate_rev_index_memory(nr_objects)? That would
> centralize the estimate to be next to the code that currently
> interacts with 'struct revindex_entry' and will later interact with
> the mmap region.

I definitely did consider this, and it seems that I made a mistake in
not documenting my consideration (since I assumed that it was so benign
nobody would notice / care ;-)).

The reason I didn't pursue it here was that we haven't yet loaded the
reverse index by this point. So, you'd want a function that at least
stats the '*.rev' file (and either does or doesn't parse it [1]), or
aborts early to indicate otherwise.

One would hope that 'load_pack_revindex()' would do just that, but it
falls back to load a reverse index in memory, which involves exactly the
slow sort that we're trying to avoid. (Of course, we're going to have to
do it later anyway, but allocating many GB of heap just to provide an
estimation seems ill-advised to me ;-).)

So, we'd have to expand the API in some way or another, and to me it
didn't seem worth it. As I mentioned in the commit message, I'm
skeptical of the value of being accurate here, since this is (after all)
an estimation.

Perhaps a longer response than you were bargaining for, but... :-).

Thanks,
Taylor

[1]: Likely negligible, since all "parsing" really does is verify the
internal checksum, and then assign a pointer into it.

  reply	other threads:[~2021-01-11 16:24 UTC|newest]

Thread overview: 121+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-01-08 18:16 [PATCH 00/20] pack-revindex: prepare for on-disk reverse index Taylor Blau
2021-01-08 18:16 ` [PATCH 01/20] pack-revindex: introduce a new API Taylor Blau
2021-01-12  8:41   ` Jeff King
2021-01-12  9:41     ` Jeff King
2021-01-12 16:27     ` Taylor Blau
2021-01-13  8:06   ` Junio C Hamano
2021-01-13  8:54     ` Junio C Hamano
2021-01-13 13:17     ` Jeff King
2021-01-13 16:23     ` Taylor Blau
2021-01-08 18:16 ` [PATCH 02/20] write_reuse_object(): convert to new revindex API Taylor Blau
2021-01-12  8:47   ` Jeff King
2021-01-12 16:31     ` Taylor Blau
2021-01-13 13:02       ` Jeff King
2021-01-08 18:16 ` [PATCH 03/20] write_reused_pack_one(): " Taylor Blau
2021-01-12  8:50   ` Jeff King
2021-01-12 16:34     ` Taylor Blau
2021-01-08 18:16 ` [PATCH 04/20] write_reused_pack_verbatim(): " Taylor Blau
2021-01-12  8:50   ` Jeff King
2021-01-08 18:17 ` [PATCH 05/20] check_object(): " Taylor Blau
2021-01-11 11:43   ` Derrick Stolee
2021-01-11 16:15     ` Taylor Blau
2021-01-12  8:54       ` Jeff King
2021-01-12  8:51   ` Jeff King
2021-01-08 18:17 ` [PATCH 06/20] bitmap_position_packfile(): " Taylor Blau
2021-01-08 18:17 ` [PATCH 07/20] show_objects_for_type(): " Taylor Blau
2021-01-12  8:57   ` Jeff King
2021-01-12 16:35     ` Taylor Blau
2021-01-08 18:17 ` [PATCH 08/20] get_size_by_pos(): " Taylor Blau
2021-01-08 18:17 ` [PATCH 09/20] try_partial_reuse(): " Taylor Blau
2021-01-12  9:06   ` Jeff King
2021-01-12 16:47     ` Taylor Blau
2021-01-08 18:17 ` [PATCH 10/20] rebuild_existing_bitmaps(): " Taylor Blau
2021-01-08 18:17 ` [PATCH 11/20] get_delta_base_oid(): " Taylor Blau
2021-01-08 18:17 ` [PATCH 12/20] retry_bad_packed_offset(): " Taylor Blau
2021-01-08 18:17 ` [PATCH 13/20] packed_object_info(): " Taylor Blau
2021-01-12  9:11   ` Jeff King
2021-01-12 16:51     ` Taylor Blau
2021-01-08 18:17 ` [PATCH 14/20] unpack_entry(): " Taylor Blau
2021-01-12  9:22   ` Jeff King
2021-01-12 16:56     ` Taylor Blau
2021-01-08 18:17 ` [PATCH 15/20] for_each_object_in_pack(): " Taylor Blau
2021-01-08 18:17 ` [PATCH 16/20] builtin/gc.c: guess the size of the revindex Taylor Blau
2021-01-11 11:52   ` Derrick Stolee
2021-01-11 16:23     ` Taylor Blau [this message]
2021-01-11 17:09       ` Derrick Stolee
2021-01-12  9:28         ` Jeff King
2021-01-08 18:17 ` [PATCH 17/20] pack-revindex: remove unused 'find_pack_revindex()' Taylor Blau
2021-01-08 18:17 ` [PATCH 18/20] pack-revindex: remove unused 'find_revindex_position()' Taylor Blau
2021-01-11 11:57   ` Derrick Stolee
2021-01-11 16:27     ` Taylor Blau
2021-01-11 17:11       ` Derrick Stolee
2021-01-12  9:32   ` Jeff King
2021-01-12 16:59     ` Taylor Blau
2021-01-13 13:05       ` Jeff King
2021-01-08 18:18 ` [PATCH 19/20] pack-revindex: hide the definition of 'revindex_entry' Taylor Blau
2021-01-11 11:57   ` Derrick Stolee
2021-01-12  9:34   ` Jeff King
2021-01-08 18:18 ` [PATCH 20/20] pack-revindex.c: avoid direct revindex access in 'offset_to_pack_pos()' Taylor Blau
2021-01-12  9:37   ` Jeff King
2021-01-12 17:02     ` Taylor Blau
2021-01-11 12:07 ` [PATCH 00/20] pack-revindex: prepare for on-disk reverse index Derrick Stolee
2021-01-11 16:30   ` Taylor Blau
2021-01-11 17:15     ` Derrick Stolee
2021-01-11 17:29       ` Taylor Blau
2021-01-11 18:40       ` Junio C Hamano
2021-01-12  9:45 ` Jeff King
2021-01-12 17:07   ` Taylor Blau
2021-01-13  0:23 ` Junio C Hamano
2021-01-13  0:52   ` Taylor Blau
2021-01-13  2:15     ` Junio C Hamano
2021-01-13  3:23       ` Taylor Blau
2021-01-13  8:21         ` Junio C Hamano
2021-01-13 13:13           ` Jeff King
2021-01-13 15:34             ` Taylor Blau
2021-01-13 20:06               ` Junio C Hamano
2021-01-13 20:13                 ` Taylor Blau
2021-01-13 20:22                 ` Jeff King
2021-01-13 22:23 ` [PATCH v2 " Taylor Blau
2021-01-13 22:23   ` [PATCH v2 01/20] pack-revindex: introduce a new API Taylor Blau
2021-01-14  6:46     ` Junio C Hamano
2021-01-14 12:00       ` Derrick Stolee
2021-01-14 17:06       ` Taylor Blau
2021-01-14 19:19         ` Jeff King
2021-01-14 20:47           ` Junio C Hamano
2021-01-13 22:23   ` [PATCH v2 02/20] write_reuse_object(): convert to new revindex API Taylor Blau
2021-01-13 22:23   ` [PATCH v2 03/20] write_reused_pack_one(): " Taylor Blau
2021-01-13 22:23   ` [PATCH v2 04/20] write_reused_pack_verbatim(): " Taylor Blau
2021-01-13 22:23   ` [PATCH v2 05/20] check_object(): " Taylor Blau
2021-01-13 22:23   ` [PATCH v2 06/20] bitmap_position_packfile(): " Taylor Blau
2021-01-13 22:23   ` [PATCH v2 07/20] show_objects_for_type(): " Taylor Blau
2021-01-13 22:24   ` [PATCH v2 08/20] get_size_by_pos(): " Taylor Blau
2021-01-13 22:24   ` [PATCH v2 09/20] try_partial_reuse(): " Taylor Blau
2021-01-13 22:24   ` [PATCH v2 10/20] rebuild_existing_bitmaps(): " Taylor Blau
2021-01-13 22:24   ` [PATCH v2 11/20] get_delta_base_oid(): " Taylor Blau
2021-01-13 22:24   ` [PATCH v2 12/20] retry_bad_packed_offset(): " Taylor Blau
2021-01-13 22:24   ` [PATCH v2 13/20] packed_object_info(): " Taylor Blau
2021-01-13 22:24   ` [PATCH v2 14/20] unpack_entry(): " Taylor Blau
2021-01-13 22:24   ` [PATCH v2 15/20] for_each_object_in_pack(): " Taylor Blau
2021-01-14  6:43     ` Junio C Hamano
2021-01-14 17:00       ` Taylor Blau
2021-01-14 19:33       ` Jeff King
2021-01-14 20:11         ` Jeff King
2021-01-14 20:15           ` Taylor Blau
2021-01-15  2:22             ` Junio C Hamano
2021-01-15  2:29               ` Taylor Blau
2021-01-14 20:51         ` Junio C Hamano
2021-01-13 22:24   ` [PATCH v2 16/20] builtin/gc.c: guess the size of the revindex Taylor Blau
2021-01-14  6:33     ` Junio C Hamano
2021-01-14 16:53       ` Taylor Blau
2021-01-14 19:43         ` Jeff King
2021-01-13 22:24   ` [PATCH v2 17/20] pack-revindex: remove unused 'find_pack_revindex()' Taylor Blau
2021-01-13 22:25   ` [PATCH v2 18/20] pack-revindex: remove unused 'find_revindex_position()' Taylor Blau
2021-01-14  6:42     ` Junio C Hamano
2021-01-13 22:25   ` [PATCH v2 19/20] pack-revindex: hide the definition of 'revindex_entry' Taylor Blau
2021-01-13 22:25   ` [PATCH v2 20/20] pack-revindex.c: avoid direct revindex access in 'offset_to_pack_pos()' Taylor Blau
2021-01-14  6:42     ` Junio C Hamano
2021-01-14 16:56       ` Taylor Blau
2021-01-14 19:51   ` [PATCH v2 00/20] pack-revindex: prepare for on-disk reverse index Jeff King
2021-01-14 20:53     ` Junio C Hamano
2021-01-15  9:32       ` Jeff King
2021-01-15  9:33         ` Jeff King

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=X/x7mrcwfxGO8xH7@nand.local \
    --to=me@ttaylorr.com \
    --cc=git@vger.kernel.org \
    --cc=jrnieder@gmail.com \
    --cc=peff@peff.net \
    --cc=stolee@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).