All of lore.kernel.org
 help / color / mirror / Atom feed
From: Jeff King <peff@peff.net>
To: Taylor Blau <me@ttaylorr.com>
Cc: git@vger.kernel.org, dstolee@microsoft.com, gitster@pobox.com
Subject: Re: [PATCH v2 7/8] packfile: add kept-pack cache for find_kept_pack_entry()
Date: Wed, 17 Feb 2021 12:11:00 -0500	[thread overview]
Message-ID: <YC1OJDFXPnxGMHPK@coredump.intra.peff.net> (raw)
In-Reply-To: <f1c07324f62cf4d087c41165cefed98f554cfd78.1612411124.git.me@ttaylorr.com>

On Wed, Feb 03, 2021 at 10:59:21PM -0500, Taylor Blau wrote:

> diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
> index fbd7b54d70..b2ba5aa14f 100644
> --- a/builtin/pack-objects.c
> +++ b/builtin/pack-objects.c
> @@ -1225,9 +1225,9 @@ static int want_found_object(const struct object_id *oid, int exclude,
>  		 */
>  		unsigned flags = 0;
>  		if (ignore_packed_keep_on_disk)
> -			flags |= ON_DISK_KEEP_PACKS;
> +			flags |= CACHE_ON_DISK_KEEP_PACKS;
>  		if (ignore_packed_keep_in_core)
> -			flags |= IN_CORE_KEEP_PACKS;
> +			flags |= CACHE_IN_CORE_KEEP_PACKS;

Why are we renaming the constants in this patch?

I know I'm listed as the author, but I think this came out of some
off-list back and forth between us. It seems like the existing constants
would have been fine.

> +static void maybe_invalidate_kept_pack_cache(struct repository *r,
> +					     unsigned flags)
>  {
> -	return find_one_pack_entry(r, oid, e, 0);
> +	if (!r->objects->kept_pack_cache)
> +		return;
> +	if (r->objects->kept_pack_cache->flags == flags)
> +		return;
> +	free(r->objects->kept_pack_cache->packs);
> +	FREE_AND_NULL(r->objects->kept_pack_cache);
> +}

OK, so we keep a single cache based on the flags, and then if somebody
ever asks for different flags, we throw it away. That's probably OK for
our purposes, since we wouldn't expect multiple callers within a single
process.

I wondered if it would be simpler to just keep two lists, one for
in-core keeps and one for on-disk keeps. And then just walk over each
list separately based on the query flags. That makes things more robust
_and_ I think would be less code. It does mean that a pack could appear
in both lists, though, which means we might do a lookup in it twice.
That doesn't seem all that likely, but it is working against our goal
here.

Another option is to keep 3 caches (two separate and one combined),
rather than flipping between them. I'm not sure if that would be less
code or not (it gets rid of the "invalidate" function, but you do have
to pick the right cache depending on the query flags).

Yet another option is to keep a cache of any that are marked as _either_
in core or on-disk keeps, and then decide to look up the object based on
the query flags. Then you just pay the cost to iterate over the list and
check the flags (which really is all this cache is helping with in the
first place).

I dunno. TBH, I kind of wonder if this whole patch is worth doing at
all, giving the underwhelming performance benefit (3% on the
pathological 1000-pack case). When I had timed this strategy initially,
it was more like 15%. I'm not sure where the savings went in the
interim, or if it was a timing fluke.

> +static struct packed_git **kept_pack_cache(struct repository *r, unsigned flags)
> +{
> +	maybe_invalidate_kept_pack_cache(r, flags);
> +
> +	if (!r->objects->kept_pack_cache) {
> +		struct packed_git **packs = NULL;
> +		size_t nr = 0, alloc = 0;
> +		struct packed_git *p;
> +
> +		/*
> +		 * We want "all" packs here, because we need to cover ones that
> +		 * are used by a midx, as well. We need to look in every one of
> +		 * them (instead of the midx itself) to cover duplicates. It's
> +		 * possible that an object is found in two packs that the midx
> +		 * covers, one kept and one not kept, but the midx returns only
> +		 * the non-kept version.
> +		 */
> +		for (p = get_all_packs(r); p; p = p->next) {
> +			if ((p->pack_keep && (flags & CACHE_ON_DISK_KEEP_PACKS)) ||
> +			    (p->pack_keep_in_core && (flags & CACHE_IN_CORE_KEEP_PACKS))) {
> +				ALLOC_GROW(packs, nr + 1, alloc);
> +				packs[nr++] = p;
> +			}
> +		}
> +		ALLOC_GROW(packs, nr + 1, alloc);
> +		packs[nr] = NULL;
> +
> +		r->objects->kept_pack_cache = xmalloc(sizeof(*r->objects->kept_pack_cache));
> +		r->objects->kept_pack_cache->packs = packs;
> +		r->objects->kept_pack_cache->flags = flags;
> +	}

Is there any reason not to just embed the kept_pack_cache struct inside
the object_store? It's one less pointer to deal with. I wonder if this
is a holdover from an attempt to have multiple caches.

(I also think it would be reasonable if we wanted to hide the definition
of the cache struct from callers, but we don't seem do to that).

> @@ -2109,7 +2123,8 @@ int has_object_pack(const struct object_id *oid)
>  	return find_pack_entry(the_repository, oid, &e);
>  }
>  
> -int has_object_kept_pack(const struct object_id *oid, unsigned flags)
> +int has_object_kept_pack(const struct object_id *oid,
> +			 unsigned flags)
>  {
>  	struct pack_entry e;
>  	return find_kept_pack_entry(the_repository, oid, flags, &e);

This seems like a stray change.

> diff --git a/packfile.h b/packfile.h
> index 624327f64d..eb56db2a7b 100644
> --- a/packfile.h
> +++ b/packfile.h
> @@ -161,10 +161,6 @@ int packed_object_info(struct repository *r,
>  void mark_bad_packed_object(struct packed_git *p, const unsigned char *sha1);
>  const struct packed_git *has_packed_and_bad(struct repository *r, const unsigned char *sha1);
>  
> -#define ON_DISK_KEEP_PACKS 1
> -#define IN_CORE_KEEP_PACKS 2
> -#define ALL_KEEP_PACKS (ON_DISK_KEEP_PACKS | IN_CORE_KEEP_PACKS)

I notice that when the constants moved, we didn't keep an equivalent of
ALL_KEEP_PACKS. Maybe we didn't need it in the first place in patch 1?

  BTW, I absolutely hate the complication that all of this on-disk
  versus in-core keep distinction brings to this code. And I wondered
  what it was really doing for us and whether we could get rid of it.
  But I think we do need it: a common case may be to avoid using
  --honor-pack-keep (because you don't want to deal with racy .keep
  writes from incoming receive-pack processes), but use in-core ones for
  something like --stdin-packs. So we do need to respect one and not the
  other.

  I do wonder if things would be simpler if pack-objects simply kept its
  own list of "in core" packs in a separate array. But that is really
  just another form of the same problem, I guess.

-Peff

  reply	other threads:[~2021-02-17 17:11 UTC|newest]

Thread overview: 120+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-01-19 23:23 [PATCH 00/10] repack: support repacking into a geometric sequence Taylor Blau
2021-01-19 23:24 ` [PATCH 01/10] packfile: introduce 'find_kept_pack_entry()' Taylor Blau
2021-01-20 13:40   ` Derrick Stolee
2021-01-20 14:38     ` Taylor Blau
2021-01-29  2:33   ` Junio C Hamano
2021-01-29 18:38     ` Taylor Blau
2021-01-29 19:31     ` Jeff King
2021-01-29 20:20       ` Junio C Hamano
2021-01-19 23:24 ` [PATCH 02/10] revision: learn '--no-kept-objects' Taylor Blau
2021-01-29  3:10   ` Junio C Hamano
2021-01-29 19:13     ` Taylor Blau
2021-01-19 23:24 ` [PATCH 03/10] builtin/pack-objects.c: learn '--assume-kept-packs-closed' Taylor Blau
2021-01-29  3:21   ` Junio C Hamano
2021-01-29 19:19     ` Jeff King
2021-01-29 20:01       ` Taylor Blau
2021-01-29 20:25         ` Jeff King
2021-01-29 22:10           ` Taylor Blau
2021-01-29 22:57             ` Jeff King
2021-01-29 23:03             ` Junio C Hamano
2021-01-29 23:28               ` Taylor Blau
2021-02-02  3:04                 ` Taylor Blau
2021-01-29 23:31               ` Jeff King
2021-01-29 22:13           ` Junio C Hamano
2021-01-29 20:30       ` Junio C Hamano
2021-01-29 22:43         ` Jeff King
2021-01-29 22:53           ` Taylor Blau
2021-01-29 23:00             ` Jeff King
2021-01-29 23:10             ` Junio C Hamano
2021-01-19 23:24 ` [PATCH 04/10] p5303: add missing &&-chains Taylor Blau
2021-01-19 23:24 ` [PATCH 05/10] p5303: measure time to repack with keep Taylor Blau
2021-01-29  3:40   ` Junio C Hamano
2021-01-29 19:32     ` Jeff King
2021-01-29 20:04       ` [PATCH] p5303: avoid sed GNU-ism Jeff King
2021-01-29 20:19         ` Eric Sunshine
2021-01-29 20:27           ` Jeff King
2021-01-29 20:36             ` Eric Sunshine
2021-01-29 22:11               ` Taylor Blau
2021-01-29 20:38       ` [PATCH 05/10] p5303: measure time to repack with keep Junio C Hamano
2021-01-29 22:10         ` Jeff King
2021-01-29 23:12           ` Junio C Hamano
2021-01-19 23:24 ` [PATCH 06/10] pack-objects: rewrite honor-pack-keep logic Taylor Blau
2021-01-19 23:24 ` [PATCH 07/10] packfile: add kept-pack cache for find_kept_pack_entry() Taylor Blau
2021-01-19 23:24 ` [PATCH 08/10] builtin/pack-objects.c: teach '--keep-pack-stdin' Taylor Blau
2021-01-19 23:24 ` [PATCH 09/10] builtin/repack.c: extract loose object handling Taylor Blau
2021-01-20 13:59   ` Derrick Stolee
2021-01-20 14:34     ` Taylor Blau
2021-01-20 15:51       ` Derrick Stolee
2021-01-21  3:45     ` Junio C Hamano
2021-01-19 23:24 ` [PATCH 10/10] builtin/repack.c: add '--geometric' option Taylor Blau
2021-01-20 14:05 ` [PATCH 00/10] repack: support repacking into a geometric sequence Derrick Stolee
2021-02-04  3:58 ` [PATCH v2 0/8] " Taylor Blau
2021-02-04  3:58   ` [PATCH v2 1/8] packfile: introduce 'find_kept_pack_entry()' Taylor Blau
2021-02-16 21:42     ` Jeff King
2021-02-16 21:48       ` Taylor Blau
2021-02-04  3:58   ` [PATCH v2 2/8] revision: learn '--no-kept-objects' Taylor Blau
2021-02-16 23:17     ` Jeff King
2021-02-17 18:35       ` Taylor Blau
2021-02-04  3:59   ` [PATCH v2 3/8] builtin/pack-objects.c: add '--stdin-packs' option Taylor Blau
2021-02-16 23:46     ` Jeff King
2021-02-17 18:59       ` Taylor Blau
2021-02-17 19:21         ` Jeff King
2021-02-04  3:59   ` [PATCH v2 4/8] p5303: add missing &&-chains Taylor Blau
2021-02-04  3:59   ` [PATCH v2 5/8] p5303: measure time to repack with keep Taylor Blau
2021-02-16 23:58     ` Jeff King
2021-02-17  0:02       ` Jeff King
2021-02-17 19:13       ` Taylor Blau
2021-02-17 19:25         ` Jeff King
2021-02-04  3:59   ` [PATCH v2 6/8] builtin/pack-objects.c: rewrite honor-pack-keep logic Taylor Blau
2021-02-17 16:05     ` Jeff King
2021-02-17 19:23       ` Taylor Blau
2021-02-17 19:29         ` Jeff King
2021-02-04  3:59   ` [PATCH v2 7/8] packfile: add kept-pack cache for find_kept_pack_entry() Taylor Blau
2021-02-17 17:11     ` Jeff King [this message]
2021-02-17 19:54       ` Taylor Blau
2021-02-17 20:25         ` Jeff King
2021-02-17 20:29           ` Taylor Blau
2021-02-17 21:43             ` Jeff King
2021-02-04  3:59   ` [PATCH v2 8/8] builtin/repack.c: add '--geometric' option Taylor Blau
2021-02-17 18:17     ` Jeff King
2021-02-17 20:01       ` Taylor Blau
2021-02-17  0:01   ` [PATCH v2 0/8] repack: support repacking into a geometric sequence Jeff King
2021-02-17 18:18     ` Jeff King
2021-02-18  3:14 ` [PATCH v3 " Taylor Blau
2021-02-18  3:14   ` [PATCH v3 1/8] packfile: introduce 'find_kept_pack_entry()' Taylor Blau
2021-02-18  3:14   ` [PATCH v3 2/8] revision: learn '--no-kept-objects' Taylor Blau
2021-02-18  3:14   ` [PATCH v3 3/8] builtin/pack-objects.c: add '--stdin-packs' option Taylor Blau
2021-02-18  3:14   ` [PATCH v3 4/8] p5303: add missing &&-chains Taylor Blau
2021-02-18  3:14   ` [PATCH v3 5/8] p5303: measure time to repack with keep Taylor Blau
2021-02-18  3:14   ` [PATCH v3 6/8] builtin/pack-objects.c: rewrite honor-pack-keep logic Taylor Blau
2021-02-18  3:14   ` [PATCH v3 7/8] packfile: add kept-pack cache for find_kept_pack_entry() Taylor Blau
2021-02-18  3:14   ` [PATCH v3 8/8] builtin/repack.c: add '--geometric' option Taylor Blau
2021-02-23  0:31   ` [PATCH v3 0/8] repack: support repacking into a geometric sequence Jeff King
2021-02-23  1:06     ` Taylor Blau
2021-02-23  1:42       ` Jeff King
2021-02-23  2:24 ` [PATCH v4 " Taylor Blau
2021-02-23  2:25   ` [PATCH v4 1/8] packfile: introduce 'find_kept_pack_entry()' Taylor Blau
2021-02-23  2:25   ` [PATCH v4 2/8] revision: learn '--no-kept-objects' Taylor Blau
2021-02-23  2:25   ` [PATCH v4 3/8] builtin/pack-objects.c: add '--stdin-packs' option Taylor Blau
2021-02-23  8:07     ` Junio C Hamano
2021-02-23 18:51       ` Jeff King
2021-02-23  2:25   ` [PATCH v4 4/8] p5303: add missing &&-chains Taylor Blau
2021-02-23  2:25   ` [PATCH v4 5/8] p5303: measure time to repack with keep Taylor Blau
2021-02-23  2:25   ` [PATCH v4 6/8] builtin/pack-objects.c: rewrite honor-pack-keep logic Taylor Blau
2021-02-23  2:25   ` [PATCH v4 7/8] packfile: add kept-pack cache for find_kept_pack_entry() Taylor Blau
2021-02-23  2:25   ` [PATCH v4 8/8] builtin/repack.c: add '--geometric' option Taylor Blau
2021-02-24 23:19     ` Junio C Hamano
2021-02-24 23:43       ` Junio C Hamano
2021-03-04 21:40         ` Taylor Blau
2021-03-04 21:55       ` Taylor Blau
2021-02-23  3:39   ` [PATCH v4 0/8] repack: support repacking into a geometric sequence Jeff King
2021-02-23  7:43   ` Junio C Hamano
2021-02-23 18:44     ` Jeff King
2021-02-23 19:54       ` Martin Fick
2021-02-23 20:06         ` Taylor Blau
2021-02-23 21:57           ` Martin Fick
2021-02-23 20:15         ` Jeff King
2021-02-23 21:41           ` Martin Fick
2021-02-23 21:53             ` Jeff King
2021-02-24 18:13               ` Martin Fick
2021-02-26  6:23                 ` Jeff King

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=YC1OJDFXPnxGMHPK@coredump.intra.peff.net \
    --to=peff@peff.net \
    --cc=dstolee@microsoft.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=me@ttaylorr.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.