git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Taylor Blau <me@ttaylorr.com>
To: Jeff King <peff@peff.net>
Cc: git@vger.kernel.org, dstolee@microsoft.com, gitster@pobox.com
Subject: Re: [PATCH v2 7/8] packfile: add kept-pack cache for find_kept_pack_entry()
Date: Wed, 17 Feb 2021 14:54:33 -0500	[thread overview]
Message-ID: <YC10eZkpqtzLlJUP@nand.local> (raw)
In-Reply-To: <YC1OJDFXPnxGMHPK@coredump.intra.peff.net>

On Wed, Feb 17, 2021 at 12:11:00PM -0500, Jeff King wrote:
> On Wed, Feb 03, 2021 at 10:59:21PM -0500, Taylor Blau wrote:
>
> > diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
> > index fbd7b54d70..b2ba5aa14f 100644
> > --- a/builtin/pack-objects.c
> > +++ b/builtin/pack-objects.c
> > @@ -1225,9 +1225,9 @@ static int want_found_object(const struct object_id *oid, int exclude,
> >  		 */
> >  		unsigned flags = 0;
> >  		if (ignore_packed_keep_on_disk)
> > -			flags |= ON_DISK_KEEP_PACKS;
> > +			flags |= CACHE_ON_DISK_KEEP_PACKS;
> >  		if (ignore_packed_keep_in_core)
> > -			flags |= IN_CORE_KEEP_PACKS;
> > +			flags |= CACHE_IN_CORE_KEEP_PACKS;
>
> Why are we renaming the constants in this patch?
>
> I know I'm listed as the author, but I think this came out of some
> off-list back and forth between us. It seems like the existing constants
> would have been fine.

Yeah, they would have been fine. They were renamed because this patch
makes them only used for the kept pack cache, but I agree the existing
names are fine, too.

In any case, they make an easier-to-read diff, so I'm perfectly happy to
un-rename them ;).

> > +static void maybe_invalidate_kept_pack_cache(struct repository *r,
> > +					     unsigned flags)
> >  {
> > -	return find_one_pack_entry(r, oid, e, 0);
> > +	if (!r->objects->kept_pack_cache)
> > +		return;
> > +	if (r->objects->kept_pack_cache->flags == flags)
> > +		return;
> > +	free(r->objects->kept_pack_cache->packs);
> > +	FREE_AND_NULL(r->objects->kept_pack_cache);
> > +}
>
> OK, so we keep a single cache based on the flags, and then if somebody
> ever asks for different flags, we throw it away. That's probably OK for
> our purposes, since we wouldn't expect multiple callers within a single
> process.
>
> I wondered if it would be simpler to just keep two lists, one for
> in-core keeps and one for on-disk keeps. And then just walk over each
> list separately based on the query flags. That makes things more robust
> _and_ I think would be less code. It does mean that a pack could appear
> in both lists, though, which means we might do a lookup in it twice.
> That doesn't seem all that likely, but it is working against our goal
> here.
>
> Another option is to keep 3 caches (two separate and one combined),
> rather than flipping between them. I'm not sure if that would be less
> code or not (it gets rid of the "invalidate" function, but you do have
> to pick the right cache depending on the query flags).
>
> Yet another option is to keep a cache of any that are marked as _either_
> in core or on-disk keeps, and then decide to look up the object based on
> the query flags. Then you just pay the cost to iterate over the list and
> check the flags (which really is all this cache is helping with in the
> first place).

All interesting ideas. In this patch (and by the end of the series)
callers that use the kept pack cache never ask for the cache with a
different set of flags. IOW, there isn't a situation where a caller
would populate the in-core kept pack cache, and then suddenly ask for
both in-core and on-disk packs to be kept.

So all of this code is defensive in case that were to change, and
suddenly we'd be returning subtly wrong results. I could imagine that
being kind of a nasty bug to track down, so detecting and invalidating
the cache would make it a non-issue.

I'll note it in the commit message, though, since it's good for future
readers to be aware, too.

> I dunno. TBH, I kind of wonder if this whole patch is worth doing at
> all, giving the underwhelming performance benefit (3% on the
> pathological 1000-pack case). When I had timed this strategy initially,
> it was more like 15%. I'm not sure where the savings went in the
> interim, or if it was a timing fluke.

Yeah, I dunno. It's certainly not hurting (I don't think the extra code
is all that complex, and the savings is at least non-zero), so I'm
inclined to keep it.

> > +static struct packed_git **kept_pack_cache(struct repository *r, unsigned flags)
> > +{
> > +	maybe_invalidate_kept_pack_cache(r, flags);
> > +
> > +	if (!r->objects->kept_pack_cache) {
> > +		struct packed_git **packs = NULL;
> > +		size_t nr = 0, alloc = 0;
> > +		struct packed_git *p;
> > +
> > +		/*
> > +		 * We want "all" packs here, because we need to cover ones that
> > +		 * are used by a midx, as well. We need to look in every one of
> > +		 * them (instead of the midx itself) to cover duplicates. It's
> > +		 * possible that an object is found in two packs that the midx
> > +		 * covers, one kept and one not kept, but the midx returns only
> > +		 * the non-kept version.
> > +		 */
> > +		for (p = get_all_packs(r); p; p = p->next) {
> > +			if ((p->pack_keep && (flags & CACHE_ON_DISK_KEEP_PACKS)) ||
> > +			    (p->pack_keep_in_core && (flags & CACHE_IN_CORE_KEEP_PACKS))) {
> > +				ALLOC_GROW(packs, nr + 1, alloc);
> > +				packs[nr++] = p;
> > +			}
> > +		}
> > +		ALLOC_GROW(packs, nr + 1, alloc);
> > +		packs[nr] = NULL;
> > +
> > +		r->objects->kept_pack_cache = xmalloc(sizeof(*r->objects->kept_pack_cache));
> > +		r->objects->kept_pack_cache->packs = packs;
> > +		r->objects->kept_pack_cache->flags = flags;
> > +	}
>
> Is there any reason not to just embed the kept_pack_cache struct inside
> the object_store? It's one less pointer to deal with. I wonder if this
> is a holdover from an attempt to have multiple caches.
>
> (I also think it would be reasonable if we wanted to hide the definition
> of the cache struct from callers, but we don't seem do to that).

Not a holdover, just designed to avoid adding too many extra fields to
the object-store. I don't feel strongly, but I do think hiding the
definition is a good idea, so I'll inline it.

> > @@ -2109,7 +2123,8 @@ int has_object_pack(const struct object_id *oid)
> >  	return find_pack_entry(the_repository, oid, &e);
> >  }
> >
> > -int has_object_kept_pack(const struct object_id *oid, unsigned flags)
> > +int has_object_kept_pack(const struct object_id *oid,
> > +			 unsigned flags)
> >  {
> >  	struct pack_entry e;
> >  	return find_kept_pack_entry(the_repository, oid, flags, &e);
>
> This seems like a stray change.

Good eyes, thanks.

>
> > diff --git a/packfile.h b/packfile.h
> > index 624327f64d..eb56db2a7b 100644
> > --- a/packfile.h
> > +++ b/packfile.h
> > @@ -161,10 +161,6 @@ int packed_object_info(struct repository *r,
> >  void mark_bad_packed_object(struct packed_git *p, const unsigned char *sha1);
> >  const struct packed_git *has_packed_and_bad(struct repository *r, const unsigned char *sha1);
> >
> > -#define ON_DISK_KEEP_PACKS 1
> > -#define IN_CORE_KEEP_PACKS 2
> > -#define ALL_KEEP_PACKS (ON_DISK_KEEP_PACKS | IN_CORE_KEEP_PACKS)
>
> I notice that when the constants moved, we didn't keep an equivalent of
> ALL_KEEP_PACKS. Maybe we didn't need it in the first place in patch 1?

Yeah, we didn't need it to begin with. I'll drop it accordingly.

>   BTW, I absolutely hate the complication that all of this on-disk
>   versus in-core keep distinction brings to this code. And I wondered
>   what it was really doing for us and whether we could get rid of it.
>   But I think we do need it: a common case may be to avoid using
>   --honor-pack-keep (because you don't want to deal with racy .keep
>   writes from incoming receive-pack processes), but use in-core ones for
>   something like --stdin-packs. So we do need to respect one and not the
>   other.
>
>   I do wonder if things would be simpler if pack-objects simply kept its
>   own list of "in core" packs in a separate array. But that is really
>   just another form of the same problem, I guess.

Yeah, the complexity is awfully hard to reason about, but you're right
that here it is necessary.

> -Peff

Thanks,
Taylor

  reply	other threads:[~2021-02-17 19:55 UTC|newest]

Thread overview: 120+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-01-19 23:23 [PATCH 00/10] repack: support repacking into a geometric sequence Taylor Blau
2021-01-19 23:24 ` [PATCH 01/10] packfile: introduce 'find_kept_pack_entry()' Taylor Blau
2021-01-20 13:40   ` Derrick Stolee
2021-01-20 14:38     ` Taylor Blau
2021-01-29  2:33   ` Junio C Hamano
2021-01-29 18:38     ` Taylor Blau
2021-01-29 19:31     ` Jeff King
2021-01-29 20:20       ` Junio C Hamano
2021-01-19 23:24 ` [PATCH 02/10] revision: learn '--no-kept-objects' Taylor Blau
2021-01-29  3:10   ` Junio C Hamano
2021-01-29 19:13     ` Taylor Blau
2021-01-19 23:24 ` [PATCH 03/10] builtin/pack-objects.c: learn '--assume-kept-packs-closed' Taylor Blau
2021-01-29  3:21   ` Junio C Hamano
2021-01-29 19:19     ` Jeff King
2021-01-29 20:01       ` Taylor Blau
2021-01-29 20:25         ` Jeff King
2021-01-29 22:10           ` Taylor Blau
2021-01-29 22:57             ` Jeff King
2021-01-29 23:03             ` Junio C Hamano
2021-01-29 23:28               ` Taylor Blau
2021-02-02  3:04                 ` Taylor Blau
2021-01-29 23:31               ` Jeff King
2021-01-29 22:13           ` Junio C Hamano
2021-01-29 20:30       ` Junio C Hamano
2021-01-29 22:43         ` Jeff King
2021-01-29 22:53           ` Taylor Blau
2021-01-29 23:00             ` Jeff King
2021-01-29 23:10             ` Junio C Hamano
2021-01-19 23:24 ` [PATCH 04/10] p5303: add missing &&-chains Taylor Blau
2021-01-19 23:24 ` [PATCH 05/10] p5303: measure time to repack with keep Taylor Blau
2021-01-29  3:40   ` Junio C Hamano
2021-01-29 19:32     ` Jeff King
2021-01-29 20:04       ` [PATCH] p5303: avoid sed GNU-ism Jeff King
2021-01-29 20:19         ` Eric Sunshine
2021-01-29 20:27           ` Jeff King
2021-01-29 20:36             ` Eric Sunshine
2021-01-29 22:11               ` Taylor Blau
2021-01-29 20:38       ` [PATCH 05/10] p5303: measure time to repack with keep Junio C Hamano
2021-01-29 22:10         ` Jeff King
2021-01-29 23:12           ` Junio C Hamano
2021-01-19 23:24 ` [PATCH 06/10] pack-objects: rewrite honor-pack-keep logic Taylor Blau
2021-01-19 23:24 ` [PATCH 07/10] packfile: add kept-pack cache for find_kept_pack_entry() Taylor Blau
2021-01-19 23:24 ` [PATCH 08/10] builtin/pack-objects.c: teach '--keep-pack-stdin' Taylor Blau
2021-01-19 23:24 ` [PATCH 09/10] builtin/repack.c: extract loose object handling Taylor Blau
2021-01-20 13:59   ` Derrick Stolee
2021-01-20 14:34     ` Taylor Blau
2021-01-20 15:51       ` Derrick Stolee
2021-01-21  3:45     ` Junio C Hamano
2021-01-19 23:24 ` [PATCH 10/10] builtin/repack.c: add '--geometric' option Taylor Blau
2021-01-20 14:05 ` [PATCH 00/10] repack: support repacking into a geometric sequence Derrick Stolee
2021-02-04  3:58 ` [PATCH v2 0/8] " Taylor Blau
2021-02-04  3:58   ` [PATCH v2 1/8] packfile: introduce 'find_kept_pack_entry()' Taylor Blau
2021-02-16 21:42     ` Jeff King
2021-02-16 21:48       ` Taylor Blau
2021-02-04  3:58   ` [PATCH v2 2/8] revision: learn '--no-kept-objects' Taylor Blau
2021-02-16 23:17     ` Jeff King
2021-02-17 18:35       ` Taylor Blau
2021-02-04  3:59   ` [PATCH v2 3/8] builtin/pack-objects.c: add '--stdin-packs' option Taylor Blau
2021-02-16 23:46     ` Jeff King
2021-02-17 18:59       ` Taylor Blau
2021-02-17 19:21         ` Jeff King
2021-02-04  3:59   ` [PATCH v2 4/8] p5303: add missing &&-chains Taylor Blau
2021-02-04  3:59   ` [PATCH v2 5/8] p5303: measure time to repack with keep Taylor Blau
2021-02-16 23:58     ` Jeff King
2021-02-17  0:02       ` Jeff King
2021-02-17 19:13       ` Taylor Blau
2021-02-17 19:25         ` Jeff King
2021-02-04  3:59   ` [PATCH v2 6/8] builtin/pack-objects.c: rewrite honor-pack-keep logic Taylor Blau
2021-02-17 16:05     ` Jeff King
2021-02-17 19:23       ` Taylor Blau
2021-02-17 19:29         ` Jeff King
2021-02-04  3:59   ` [PATCH v2 7/8] packfile: add kept-pack cache for find_kept_pack_entry() Taylor Blau
2021-02-17 17:11     ` Jeff King
2021-02-17 19:54       ` Taylor Blau [this message]
2021-02-17 20:25         ` Jeff King
2021-02-17 20:29           ` Taylor Blau
2021-02-17 21:43             ` Jeff King
2021-02-04  3:59   ` [PATCH v2 8/8] builtin/repack.c: add '--geometric' option Taylor Blau
2021-02-17 18:17     ` Jeff King
2021-02-17 20:01       ` Taylor Blau
2021-02-17  0:01   ` [PATCH v2 0/8] repack: support repacking into a geometric sequence Jeff King
2021-02-17 18:18     ` Jeff King
2021-02-18  3:14 ` [PATCH v3 " Taylor Blau
2021-02-18  3:14   ` [PATCH v3 1/8] packfile: introduce 'find_kept_pack_entry()' Taylor Blau
2021-02-18  3:14   ` [PATCH v3 2/8] revision: learn '--no-kept-objects' Taylor Blau
2021-02-18  3:14   ` [PATCH v3 3/8] builtin/pack-objects.c: add '--stdin-packs' option Taylor Blau
2021-02-18  3:14   ` [PATCH v3 4/8] p5303: add missing &&-chains Taylor Blau
2021-02-18  3:14   ` [PATCH v3 5/8] p5303: measure time to repack with keep Taylor Blau
2021-02-18  3:14   ` [PATCH v3 6/8] builtin/pack-objects.c: rewrite honor-pack-keep logic Taylor Blau
2021-02-18  3:14   ` [PATCH v3 7/8] packfile: add kept-pack cache for find_kept_pack_entry() Taylor Blau
2021-02-18  3:14   ` [PATCH v3 8/8] builtin/repack.c: add '--geometric' option Taylor Blau
2021-02-23  0:31   ` [PATCH v3 0/8] repack: support repacking into a geometric sequence Jeff King
2021-02-23  1:06     ` Taylor Blau
2021-02-23  1:42       ` Jeff King
2021-02-23  2:24 ` [PATCH v4 " Taylor Blau
2021-02-23  2:25   ` [PATCH v4 1/8] packfile: introduce 'find_kept_pack_entry()' Taylor Blau
2021-02-23  2:25   ` [PATCH v4 2/8] revision: learn '--no-kept-objects' Taylor Blau
2021-02-23  2:25   ` [PATCH v4 3/8] builtin/pack-objects.c: add '--stdin-packs' option Taylor Blau
2021-02-23  8:07     ` Junio C Hamano
2021-02-23 18:51       ` Jeff King
2021-02-23  2:25   ` [PATCH v4 4/8] p5303: add missing &&-chains Taylor Blau
2021-02-23  2:25   ` [PATCH v4 5/8] p5303: measure time to repack with keep Taylor Blau
2021-02-23  2:25   ` [PATCH v4 6/8] builtin/pack-objects.c: rewrite honor-pack-keep logic Taylor Blau
2021-02-23  2:25   ` [PATCH v4 7/8] packfile: add kept-pack cache for find_kept_pack_entry() Taylor Blau
2021-02-23  2:25   ` [PATCH v4 8/8] builtin/repack.c: add '--geometric' option Taylor Blau
2021-02-24 23:19     ` Junio C Hamano
2021-02-24 23:43       ` Junio C Hamano
2021-03-04 21:40         ` Taylor Blau
2021-03-04 21:55       ` Taylor Blau
2021-02-23  3:39   ` [PATCH v4 0/8] repack: support repacking into a geometric sequence Jeff King
2021-02-23  7:43   ` Junio C Hamano
2021-02-23 18:44     ` Jeff King
2021-02-23 19:54       ` Martin Fick
2021-02-23 20:06         ` Taylor Blau
2021-02-23 21:57           ` Martin Fick
2021-02-23 20:15         ` Jeff King
2021-02-23 21:41           ` Martin Fick
2021-02-23 21:53             ` Jeff King
2021-02-24 18:13               ` Martin Fick
2021-02-26  6:23                 ` Jeff King

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=YC10eZkpqtzLlJUP@nand.local \
    --to=me@ttaylorr.com \
    --cc=dstolee@microsoft.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=peff@peff.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).