From: Taylor Blau <me@ttaylorr.com>
To: Jeff King <peff@peff.net>
Cc: git@vger.kernel.org, dstolee@microsoft.com, gitster@pobox.com
Subject: Re: [PATCH v2 7/8] packfile: add kept-pack cache for find_kept_pack_entry()
Date: Wed, 17 Feb 2021 14:54:33 -0500 [thread overview]
Message-ID: <YC10eZkpqtzLlJUP@nand.local> (raw)
In-Reply-To: <YC1OJDFXPnxGMHPK@coredump.intra.peff.net>
On Wed, Feb 17, 2021 at 12:11:00PM -0500, Jeff King wrote:
> On Wed, Feb 03, 2021 at 10:59:21PM -0500, Taylor Blau wrote:
>
> > diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
> > index fbd7b54d70..b2ba5aa14f 100644
> > --- a/builtin/pack-objects.c
> > +++ b/builtin/pack-objects.c
> > @@ -1225,9 +1225,9 @@ static int want_found_object(const struct object_id *oid, int exclude,
> > */
> > unsigned flags = 0;
> > if (ignore_packed_keep_on_disk)
> > - flags |= ON_DISK_KEEP_PACKS;
> > + flags |= CACHE_ON_DISK_KEEP_PACKS;
> > if (ignore_packed_keep_in_core)
> > - flags |= IN_CORE_KEEP_PACKS;
> > + flags |= CACHE_IN_CORE_KEEP_PACKS;
>
> Why are we renaming the constants in this patch?
>
> I know I'm listed as the author, but I think this came out of some
> off-list back and forth between us. It seems like the existing constants
> would have been fine.
Yeah, they would have been fine. They were renamed because this patch
makes them only used for the kept pack cache, but I agree the existing
names are fine, too.
In any case, they make an easier-to-read diff, so I'm perfectly happy to
un-rename them ;).
> > +static void maybe_invalidate_kept_pack_cache(struct repository *r,
> > + unsigned flags)
> > {
> > - return find_one_pack_entry(r, oid, e, 0);
> > + if (!r->objects->kept_pack_cache)
> > + return;
> > + if (r->objects->kept_pack_cache->flags == flags)
> > + return;
> > + free(r->objects->kept_pack_cache->packs);
> > + FREE_AND_NULL(r->objects->kept_pack_cache);
> > +}
>
> OK, so we keep a single cache based on the flags, and then if somebody
> ever asks for different flags, we throw it away. That's probably OK for
> our purposes, since we wouldn't expect multiple callers within a single
> process.
>
> I wondered if it would be simpler to just keep two lists, one for
> in-core keeps and one for on-disk keeps. And then just walk over each
> list separately based on the query flags. That makes things more robust
> _and_ I think would be less code. It does mean that a pack could appear
> in both lists, though, which means we might do a lookup in it twice.
> That doesn't seem all that likely, but it is working against our goal
> here.
>
> Another option is to keep 3 caches (two separate and one combined),
> rather than flipping between them. I'm not sure if that would be less
> code or not (it gets rid of the "invalidate" function, but you do have
> to pick the right cache depending on the query flags).
>
> Yet another option is to keep a cache of any that are marked as _either_
> in core or on-disk keeps, and then decide to look up the object based on
> the query flags. Then you just pay the cost to iterate over the list and
> check the flags (which really is all this cache is helping with in the
> first place).
All interesting ideas. In this patch (and by the end of the series)
callers that use the kept pack cache never ask for the cache with a
different set of flags. IOW, there isn't a situation where a caller
would populate the in-core kept pack cache, and then suddenly ask for
both in-core and on-disk packs to be kept.
So all of this code is defensive in case that were to change, and
suddenly we'd be returning subtly wrong results. I could imagine that
being kind of a nasty bug to track down, so detecting and invalidating
the cache would make it a non-issue.
I'll note it in the commit message, though, since it's good for future
readers to be aware, too.
> I dunno. TBH, I kind of wonder if this whole patch is worth doing at
> all, giving the underwhelming performance benefit (3% on the
> pathological 1000-pack case). When I had timed this strategy initially,
> it was more like 15%. I'm not sure where the savings went in the
> interim, or if it was a timing fluke.
Yeah, I dunno. It's certainly not hurting (I don't think the extra code
is all that complex, and the savings is at least non-zero), so I'm
inclined to keep it.
> > +static struct packed_git **kept_pack_cache(struct repository *r, unsigned flags)
> > +{
> > + maybe_invalidate_kept_pack_cache(r, flags);
> > +
> > + if (!r->objects->kept_pack_cache) {
> > + struct packed_git **packs = NULL;
> > + size_t nr = 0, alloc = 0;
> > + struct packed_git *p;
> > +
> > + /*
> > + * We want "all" packs here, because we need to cover ones that
> > + * are used by a midx, as well. We need to look in every one of
> > + * them (instead of the midx itself) to cover duplicates. It's
> > + * possible that an object is found in two packs that the midx
> > + * covers, one kept and one not kept, but the midx returns only
> > + * the non-kept version.
> > + */
> > + for (p = get_all_packs(r); p; p = p->next) {
> > + if ((p->pack_keep && (flags & CACHE_ON_DISK_KEEP_PACKS)) ||
> > + (p->pack_keep_in_core && (flags & CACHE_IN_CORE_KEEP_PACKS))) {
> > + ALLOC_GROW(packs, nr + 1, alloc);
> > + packs[nr++] = p;
> > + }
> > + }
> > + ALLOC_GROW(packs, nr + 1, alloc);
> > + packs[nr] = NULL;
> > +
> > + r->objects->kept_pack_cache = xmalloc(sizeof(*r->objects->kept_pack_cache));
> > + r->objects->kept_pack_cache->packs = packs;
> > + r->objects->kept_pack_cache->flags = flags;
> > + }
>
> Is there any reason not to just embed the kept_pack_cache struct inside
> the object_store? It's one less pointer to deal with. I wonder if this
> is a holdover from an attempt to have multiple caches.
>
> (I also think it would be reasonable if we wanted to hide the definition
> of the cache struct from callers, but we don't seem do to that).
Not a holdover, just designed to avoid adding too many extra fields to
the object-store. I don't feel strongly, but I do think hiding the
definition is a good idea, so I'll inline it.
> > @@ -2109,7 +2123,8 @@ int has_object_pack(const struct object_id *oid)
> > return find_pack_entry(the_repository, oid, &e);
> > }
> >
> > -int has_object_kept_pack(const struct object_id *oid, unsigned flags)
> > +int has_object_kept_pack(const struct object_id *oid,
> > + unsigned flags)
> > {
> > struct pack_entry e;
> > return find_kept_pack_entry(the_repository, oid, flags, &e);
>
> This seems like a stray change.
Good eyes, thanks.
>
> > diff --git a/packfile.h b/packfile.h
> > index 624327f64d..eb56db2a7b 100644
> > --- a/packfile.h
> > +++ b/packfile.h
> > @@ -161,10 +161,6 @@ int packed_object_info(struct repository *r,
> > void mark_bad_packed_object(struct packed_git *p, const unsigned char *sha1);
> > const struct packed_git *has_packed_and_bad(struct repository *r, const unsigned char *sha1);
> >
> > -#define ON_DISK_KEEP_PACKS 1
> > -#define IN_CORE_KEEP_PACKS 2
> > -#define ALL_KEEP_PACKS (ON_DISK_KEEP_PACKS | IN_CORE_KEEP_PACKS)
>
> I notice that when the constants moved, we didn't keep an equivalent of
> ALL_KEEP_PACKS. Maybe we didn't need it in the first place in patch 1?
Yeah, we didn't need it to begin with. I'll drop it accordingly.
> BTW, I absolutely hate the complication that all of this on-disk
> versus in-core keep distinction brings to this code. And I wondered
> what it was really doing for us and whether we could get rid of it.
> But I think we do need it: a common case may be to avoid using
> --honor-pack-keep (because you don't want to deal with racy .keep
> writes from incoming receive-pack processes), but use in-core ones for
> something like --stdin-packs. So we do need to respect one and not the
> other.
>
> I do wonder if things would be simpler if pack-objects simply kept its
> own list of "in core" packs in a separate array. But that is really
> just another form of the same problem, I guess.
Yeah, the complexity is awfully hard to reason about, but you're right
that here it is necessary.
> -Peff
Thanks,
Taylor
next prev parent reply other threads:[~2021-02-17 19:55 UTC|newest]
Thread overview: 120+ messages / expand[flat|nested] mbox.gz Atom feed top
2021-01-19 23:23 [PATCH 00/10] repack: support repacking into a geometric sequence Taylor Blau
2021-01-19 23:24 ` [PATCH 01/10] packfile: introduce 'find_kept_pack_entry()' Taylor Blau
2021-01-20 13:40 ` Derrick Stolee
2021-01-20 14:38 ` Taylor Blau
2021-01-29 2:33 ` Junio C Hamano
2021-01-29 18:38 ` Taylor Blau
2021-01-29 19:31 ` Jeff King
2021-01-29 20:20 ` Junio C Hamano
2021-01-19 23:24 ` [PATCH 02/10] revision: learn '--no-kept-objects' Taylor Blau
2021-01-29 3:10 ` Junio C Hamano
2021-01-29 19:13 ` Taylor Blau
2021-01-19 23:24 ` [PATCH 03/10] builtin/pack-objects.c: learn '--assume-kept-packs-closed' Taylor Blau
2021-01-29 3:21 ` Junio C Hamano
2021-01-29 19:19 ` Jeff King
2021-01-29 20:01 ` Taylor Blau
2021-01-29 20:25 ` Jeff King
2021-01-29 22:10 ` Taylor Blau
2021-01-29 22:57 ` Jeff King
2021-01-29 23:03 ` Junio C Hamano
2021-01-29 23:28 ` Taylor Blau
2021-02-02 3:04 ` Taylor Blau
2021-01-29 23:31 ` Jeff King
2021-01-29 22:13 ` Junio C Hamano
2021-01-29 20:30 ` Junio C Hamano
2021-01-29 22:43 ` Jeff King
2021-01-29 22:53 ` Taylor Blau
2021-01-29 23:00 ` Jeff King
2021-01-29 23:10 ` Junio C Hamano
2021-01-19 23:24 ` [PATCH 04/10] p5303: add missing &&-chains Taylor Blau
2021-01-19 23:24 ` [PATCH 05/10] p5303: measure time to repack with keep Taylor Blau
2021-01-29 3:40 ` Junio C Hamano
2021-01-29 19:32 ` Jeff King
2021-01-29 20:04 ` [PATCH] p5303: avoid sed GNU-ism Jeff King
2021-01-29 20:19 ` Eric Sunshine
2021-01-29 20:27 ` Jeff King
2021-01-29 20:36 ` Eric Sunshine
2021-01-29 22:11 ` Taylor Blau
2021-01-29 20:38 ` [PATCH 05/10] p5303: measure time to repack with keep Junio C Hamano
2021-01-29 22:10 ` Jeff King
2021-01-29 23:12 ` Junio C Hamano
2021-01-19 23:24 ` [PATCH 06/10] pack-objects: rewrite honor-pack-keep logic Taylor Blau
2021-01-19 23:24 ` [PATCH 07/10] packfile: add kept-pack cache for find_kept_pack_entry() Taylor Blau
2021-01-19 23:24 ` [PATCH 08/10] builtin/pack-objects.c: teach '--keep-pack-stdin' Taylor Blau
2021-01-19 23:24 ` [PATCH 09/10] builtin/repack.c: extract loose object handling Taylor Blau
2021-01-20 13:59 ` Derrick Stolee
2021-01-20 14:34 ` Taylor Blau
2021-01-20 15:51 ` Derrick Stolee
2021-01-21 3:45 ` Junio C Hamano
2021-01-19 23:24 ` [PATCH 10/10] builtin/repack.c: add '--geometric' option Taylor Blau
2021-01-20 14:05 ` [PATCH 00/10] repack: support repacking into a geometric sequence Derrick Stolee
2021-02-04 3:58 ` [PATCH v2 0/8] " Taylor Blau
2021-02-04 3:58 ` [PATCH v2 1/8] packfile: introduce 'find_kept_pack_entry()' Taylor Blau
2021-02-16 21:42 ` Jeff King
2021-02-16 21:48 ` Taylor Blau
2021-02-04 3:58 ` [PATCH v2 2/8] revision: learn '--no-kept-objects' Taylor Blau
2021-02-16 23:17 ` Jeff King
2021-02-17 18:35 ` Taylor Blau
2021-02-04 3:59 ` [PATCH v2 3/8] builtin/pack-objects.c: add '--stdin-packs' option Taylor Blau
2021-02-16 23:46 ` Jeff King
2021-02-17 18:59 ` Taylor Blau
2021-02-17 19:21 ` Jeff King
2021-02-04 3:59 ` [PATCH v2 4/8] p5303: add missing &&-chains Taylor Blau
2021-02-04 3:59 ` [PATCH v2 5/8] p5303: measure time to repack with keep Taylor Blau
2021-02-16 23:58 ` Jeff King
2021-02-17 0:02 ` Jeff King
2021-02-17 19:13 ` Taylor Blau
2021-02-17 19:25 ` Jeff King
2021-02-04 3:59 ` [PATCH v2 6/8] builtin/pack-objects.c: rewrite honor-pack-keep logic Taylor Blau
2021-02-17 16:05 ` Jeff King
2021-02-17 19:23 ` Taylor Blau
2021-02-17 19:29 ` Jeff King
2021-02-04 3:59 ` [PATCH v2 7/8] packfile: add kept-pack cache for find_kept_pack_entry() Taylor Blau
2021-02-17 17:11 ` Jeff King
2021-02-17 19:54 ` Taylor Blau [this message]
2021-02-17 20:25 ` Jeff King
2021-02-17 20:29 ` Taylor Blau
2021-02-17 21:43 ` Jeff King
2021-02-04 3:59 ` [PATCH v2 8/8] builtin/repack.c: add '--geometric' option Taylor Blau
2021-02-17 18:17 ` Jeff King
2021-02-17 20:01 ` Taylor Blau
2021-02-17 0:01 ` [PATCH v2 0/8] repack: support repacking into a geometric sequence Jeff King
2021-02-17 18:18 ` Jeff King
2021-02-18 3:14 ` [PATCH v3 " Taylor Blau
2021-02-18 3:14 ` [PATCH v3 1/8] packfile: introduce 'find_kept_pack_entry()' Taylor Blau
2021-02-18 3:14 ` [PATCH v3 2/8] revision: learn '--no-kept-objects' Taylor Blau
2021-02-18 3:14 ` [PATCH v3 3/8] builtin/pack-objects.c: add '--stdin-packs' option Taylor Blau
2021-02-18 3:14 ` [PATCH v3 4/8] p5303: add missing &&-chains Taylor Blau
2021-02-18 3:14 ` [PATCH v3 5/8] p5303: measure time to repack with keep Taylor Blau
2021-02-18 3:14 ` [PATCH v3 6/8] builtin/pack-objects.c: rewrite honor-pack-keep logic Taylor Blau
2021-02-18 3:14 ` [PATCH v3 7/8] packfile: add kept-pack cache for find_kept_pack_entry() Taylor Blau
2021-02-18 3:14 ` [PATCH v3 8/8] builtin/repack.c: add '--geometric' option Taylor Blau
2021-02-23 0:31 ` [PATCH v3 0/8] repack: support repacking into a geometric sequence Jeff King
2021-02-23 1:06 ` Taylor Blau
2021-02-23 1:42 ` Jeff King
2021-02-23 2:24 ` [PATCH v4 " Taylor Blau
2021-02-23 2:25 ` [PATCH v4 1/8] packfile: introduce 'find_kept_pack_entry()' Taylor Blau
2021-02-23 2:25 ` [PATCH v4 2/8] revision: learn '--no-kept-objects' Taylor Blau
2021-02-23 2:25 ` [PATCH v4 3/8] builtin/pack-objects.c: add '--stdin-packs' option Taylor Blau
2021-02-23 8:07 ` Junio C Hamano
2021-02-23 18:51 ` Jeff King
2021-02-23 2:25 ` [PATCH v4 4/8] p5303: add missing &&-chains Taylor Blau
2021-02-23 2:25 ` [PATCH v4 5/8] p5303: measure time to repack with keep Taylor Blau
2021-02-23 2:25 ` [PATCH v4 6/8] builtin/pack-objects.c: rewrite honor-pack-keep logic Taylor Blau
2021-02-23 2:25 ` [PATCH v4 7/8] packfile: add kept-pack cache for find_kept_pack_entry() Taylor Blau
2021-02-23 2:25 ` [PATCH v4 8/8] builtin/repack.c: add '--geometric' option Taylor Blau
2021-02-24 23:19 ` Junio C Hamano
2021-02-24 23:43 ` Junio C Hamano
2021-03-04 21:40 ` Taylor Blau
2021-03-04 21:55 ` Taylor Blau
2021-02-23 3:39 ` [PATCH v4 0/8] repack: support repacking into a geometric sequence Jeff King
2021-02-23 7:43 ` Junio C Hamano
2021-02-23 18:44 ` Jeff King
2021-02-23 19:54 ` Martin Fick
2021-02-23 20:06 ` Taylor Blau
2021-02-23 21:57 ` Martin Fick
2021-02-23 20:15 ` Jeff King
2021-02-23 21:41 ` Martin Fick
2021-02-23 21:53 ` Jeff King
2021-02-24 18:13 ` Martin Fick
2021-02-26 6:23 ` Jeff King
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=YC10eZkpqtzLlJUP@nand.local \
--to=me@ttaylorr.com \
--cc=dstolee@microsoft.com \
--cc=git@vger.kernel.org \
--cc=gitster@pobox.com \
--cc=peff@peff.net \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).