All of lore.kernel.org
 help / color / mirror / Atom feed
From: Eric Wong <e@80x24.org>
To: "René Scharfe" <l.s.r@web.de>
Cc: git@vger.kernel.org, Jeff King <peff@peff.net>
Subject: Re: [PATCH v2 1/5] speed up alt_odb_usable() with many alternates
Date: Tue, 6 Jul 2021 23:01:15 +0000	[thread overview]
Message-ID: <20210706230115.GA8624@dcvr> (raw)
In-Reply-To: <fc342ddd-1b35-62cc-dd4b-e0462d595819@web.de>

René Scharfe <l.s.r@web.de> wrote:
> Am 29.06.21 um 22:53 schrieb Eric Wong:
> > With many alternates, the duplicate check in alt_odb_usable()
> > wastes many cycles doing repeated fspathcmp() on every existing
> > alternate.  Use a khash to speed up lookups by odb->path.
> >
> > Since the kh_put_* API uses the supplied key without
> > duplicating it, we also take advantage of it to replace both
> > xstrdup() and strbuf_release() in link_alt_odb_entry() with
> > strbuf_detach() to avoid the allocation and copy.
> >
> > In a test repository with 50K alternates and each of those 50K
> > alternates having one alternate each (for a total of 100K total
> > alternates); this speeds up lookup of a non-existent blob from
> > over 16 minutes to roughly 2.7 seconds on my busy workstation.
> 
> Yay for hashmaps! :)
> 
> > Note: all underlying git object directories were small and
> > unpacked with only loose objects and no packs.  Having to load
> > packs increases times significantly.
> >
> > Signed-off-by: Eric Wong <e@80x24.org>
> > ---
> >  object-file.c  | 33 ++++++++++++++++++++++-----------
> >  object-store.h | 17 +++++++++++++++++
> >  object.c       |  2 ++
> >  3 files changed, 41 insertions(+), 11 deletions(-)
> >
> > diff --git a/object-file.c b/object-file.c
> > index f233b440b2..304af3a172 100644
> > --- a/object-file.c
> > +++ b/object-file.c
> > @@ -517,9 +517,9 @@ const char *loose_object_path(struct repository *r, struct strbuf *buf,
> >   */
> >  static int alt_odb_usable(struct raw_object_store *o,
> >  			  struct strbuf *path,
> > -			  const char *normalized_objdir)
> > +			  const char *normalized_objdir, khiter_t *pos)
> >  {
> > -	struct object_directory *odb;
> > +	int r;
> >
> >  	/* Detect cases where alternate disappeared */
> >  	if (!is_directory(path->buf)) {
> > @@ -533,14 +533,22 @@ static int alt_odb_usable(struct raw_object_store *o,
> >  	 * Prevent the common mistake of listing the same
> >  	 * thing twice, or object directory itself.
> >  	 */
> > -	for (odb = o->odb; odb; odb = odb->next) {
> > -		if (!fspathcmp(path->buf, odb->path))
> > -			return 0;
> > +	if (!o->odb_by_path) {
> > +		khiter_t p;
> > +
> > +		o->odb_by_path = kh_init_odb_path_map();
> > +		assert(!o->odb->next);
> > +		p = kh_put_odb_path_map(o->odb_by_path, o->odb->path, &r);
> 
> So on the first run you not just create the hashmap, but you also
> pre-populate it with the main object directory.  Makes sense.  The
> hashmap wouldn't even be created in repositories without alternates.
> 
> > +		if (r < 0) die_errno(_("kh_put_odb_path_map"));
> 
> Our other callers don't handle a negative return code because it would
> indicate an allocation failure, and in our version we use ALLOC_ARRAY,
> which dies on error.  So you don't need that check here, but we better
> clarify that in khash.h.
> 
> > +		assert(r == 1); /* never used */
> > +		kh_value(o->odb_by_path, p) = o->odb;
> >  	}
> >  	if (!fspathcmp(path->buf, normalized_objdir))
> >  		return 0;
> > -
> > -	return 1;
> > +	*pos = kh_put_odb_path_map(o->odb_by_path, path->buf, &r);
> > +	if (r < 0) die_errno(_("kh_put_odb_path_map"));
> 
> Dito.
> 
> > +	/* r: 0 = exists, 1 = never used, 2 = deleted */
> > +	return r == 0 ? 0 : 1;
> 
> The comment indicates that khash would be nicer to use if it had an
> enum for the kh_put return values.  Perhaps, but that should be done in
> another series.

Agreed for another series.  I've also found myself wishing khash
used enums.  But I'm also not sure how much changing of 3rd
party code we should be doing...

> I like the solution in oidset.c to make this more readable, though: Call
> the return value "added" instead of "r" and then a "return !added;"
> makes sense without additional comments.
> 
> >  }
> >
> >  /*
> > diff --git a/object-store.h b/object-store.h
> > index ec32c23dcb..20c1cedb75 100644
> > --- a/object-store.h
> > +++ b/object-store.h
> > @@ -7,6 +7,8 @@
> >  #include "oid-array.h"
> >  #include "strbuf.h"
> >  #include "thread-utils.h"
> > +#include "khash.h"
> > +#include "dir.h"
> >
> >  struct object_directory {
> >  	struct object_directory *next;
> > @@ -30,6 +32,19 @@ struct object_directory {
> >  	char *path;
> >  };
> >
> > +static inline int odb_path_eq(const char *a, const char *b)
> > +{
> > +	return !fspathcmp(a, b);
> > +}
> 
> This is not specific to the object store.  It could be called fspatheq
> and live in dir.h.  Or dir.c -- a surprising amount of code seems to
> necessary for that negation (https://godbolt.org/z/MY7Wda3a7).  Anyway,
> it's just an idea for another series.

No JS here for godbolt, but there's also a bunch of "!fspathcmp"
here that could probably be changed to fspatheq.

> > +
> > +static inline int odb_path_hash(const char *str)
> > +{
> > +	return ignore_case ? strihash(str) : __ac_X31_hash_string(str);
> > +}
> 
> The internal Attractive Chaos (__ac_*) macros should be left confined
> to khash.h, I think.  Its alias kh_str_hash_func would be better
> suited here.
> 
> Do we want to use the K&R hash function here at all, though?  If we
> use FNV-1 when ignoring case, why not also use it (i.e. strhash) when
> respecting it?  At least that's done in builtin/sparse-checkout.c,
> dir.c and merge-recursive.c.  This is just handwaving and yammering
> about lack of symmetry, but I do wonder how your performance numbers
> look with strhash.  If it's fine then we could package this up as
> fspathhash..

Yeah, I think fspathhash should be path_hash in merge-recursive.c
(and path_hash eliminated).

I don't have performance numbers, and I doubt hash function
performance is much overhead, here.  I used X31 since it was
local to khash.

I would prefer we only have one non-cryptographic hash
implementation to reduce cognitive overhead, so maybe we can
drop X31 entirely for FNV-1.  I'd also prefer we only have khash
or hashmap, not both.

> And I also wonder how it looks if you use strihash unconditionally.
> I guess case collisions are usually rare and branching based on a
> global variable may be more expensive than case folding.

*shrug* I'll let somebody with more appropriate systems do
benchmarks, there.  But it could be an easy switch once
fspathhash is in place.

  parent reply	other threads:[~2021-07-06 23:01 UTC|newest]

Thread overview: 99+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-06-24  0:58 [PATCH] speed up alt_odb_usable() with many alternates Eric Wong
2021-06-27  2:47 ` [PATCH 0/5] optimizations for many odb alternates Eric Wong
2021-06-27  2:47   ` [PATCH 1/5] speed up alt_odb_usable() with many alternates Eric Wong
2021-06-27  2:47   ` [PATCH 2/5] avoid strlen via strbuf_addstr in link_alt_odb_entry Eric Wong
2021-06-27  2:47   ` [PATCH 3/5] make object_directory.loose_objects_subdir_seen a bitmap Eric Wong
2021-06-27 10:23     ` René Scharfe
2021-06-28 23:09       ` Eric Wong
2021-06-27  2:47   ` [PATCH 4/5] oidcpy_with_padding: constify `src' arg Eric Wong
2021-06-27  2:47   ` [PATCH 5/5] oidtree: a crit-bit tree for odb_loose_cache Eric Wong
2021-06-29 14:42     ` Junio C Hamano
2021-06-29 20:17       ` Eric Wong
2021-06-29 20:53   ` [PATCH v2 0/5] optimizations for many alternates Eric Wong
2021-07-07 23:10     ` [PATCH v3 " Eric Wong
2021-07-07 23:10     ` [PATCH v3 1/5] speed up alt_odb_usable() with " Eric Wong
2021-07-08  0:20       ` Junio C Hamano
2021-07-08  1:14         ` Eric Wong
2021-07-08  4:30           ` Junio C Hamano
2021-07-07 23:10     ` [PATCH v3 2/5] avoid strlen via strbuf_addstr in link_alt_odb_entry Eric Wong
2021-07-08  4:57       ` Junio C Hamano
2021-07-07 23:10     ` [PATCH v3 3/5] make object_directory.loose_objects_subdir_seen a bitmap Eric Wong
2021-07-07 23:10     ` [PATCH v3 4/5] oidcpy_with_padding: constify `src' arg Eric Wong
2021-07-07 23:10     ` [PATCH v3 5/5] oidtree: a crit-bit tree for odb_loose_cache Eric Wong
2021-06-29 20:53   ` [PATCH v2 1/5] speed up alt_odb_usable() with many alternates Eric Wong
2021-07-03 10:05     ` René Scharfe
2021-07-04  9:02       ` René Scharfe
2021-07-06 23:01       ` Eric Wong [this message]
2021-06-29 20:53   ` [PATCH v2 2/5] avoid strlen via strbuf_addstr in link_alt_odb_entry Eric Wong
2021-06-29 20:53   ` [PATCH v2 3/5] make object_directory.loose_objects_subdir_seen a bitmap Eric Wong
2021-06-29 20:53   ` [PATCH v2 4/5] oidcpy_with_padding: constify `src' arg Eric Wong
2021-06-29 20:53   ` [PATCH v2 5/5] oidtree: a crit-bit tree for odb_loose_cache Eric Wong
2021-07-04  9:02     ` René Scharfe
2021-07-06 23:21       ` Eric Wong
2021-07-04  9:32     ` Ævar Arnfjörð Bjarmason
2021-07-07 23:12       ` Eric Wong
2021-08-06 15:31     ` Andrzej Hunt
2021-08-06 17:53       ` René Scharfe
2021-08-07 22:49         ` Eric Wong
2021-08-09  1:35           ` Carlo Arenas
2021-08-09  1:38             ` [PATCH/RFC 0/3] pedantic errors in next Carlo Marcelo Arenas Belón
2021-08-09  1:38               ` [PATCH/RFC 1/3] oidtree: avoid nested struct oidtree_node Carlo Marcelo Arenas Belón
2021-08-09  1:38               ` [PATCH/RFC 2/3] object-store: avoid extra ';' from KHASH_INIT Carlo Marcelo Arenas Belón
2021-08-09 15:53                 ` Junio C Hamano
2021-08-09  1:38               ` [PATCH/RFC 3/3] ci: run a pedantic build as part of the GitHub workflow Carlo Marcelo Arenas Belón
2021-08-09 10:50                 ` Bagas Sanjaya
2021-08-09 22:03                   ` Carlo Arenas
2021-08-09 14:56                 ` Phillip Wood
2021-08-09 22:48                   ` Carlo Arenas
2021-08-10 15:24                     ` Phillip Wood
2021-08-10 18:25                       ` Junio C Hamano
2021-08-30 11:36                   ` Ævar Arnfjörð Bjarmason
2021-08-31 20:28                     ` Carlo Arenas
2021-08-31 20:51                       ` Ævar Arnfjörð Bjarmason
2021-08-31 23:54                         ` Carlo Arenas
2021-09-01  1:52                           ` Jeff King
2021-09-01 17:55                             ` Junio C Hamano
2021-08-30 11:40                 ` Ævar Arnfjörð Bjarmason
2021-09-01  9:19                 ` [RFC PATCH v2 0/4] developer: support pedantic Carlo Marcelo Arenas Belón
2021-09-01  9:19                   ` [RFC PATCH v2 1/4] developer: retire USE_PARENS_AROUND_GETTEXT_N support Carlo Marcelo Arenas Belón
2021-09-01  9:19                   ` [RFC PATCH v2 2/4] developer: enable pedantic by default Carlo Marcelo Arenas Belón
2021-09-01  9:19                   ` [RFC PATCH v2 3/4] developer: add an alternative script for detecting broken N_() Carlo Marcelo Arenas Belón
2021-09-01  9:19                   ` [RFC PATCH v2 4/4] developer: move detect-compiler out of the main directory Carlo Marcelo Arenas Belón
2021-09-01 10:10                   ` [RFC PATCH v2 0/4] developer: support pedantic Jeff King
2021-09-01 11:25                     ` [PATCH] gettext: remove optional non-standard parens in N_() definition Ævar Arnfjörð Bjarmason
2021-09-01 17:31                       ` Eric Sunshine
2021-09-02  9:13                       ` Jeff King
2021-09-02 19:19                       ` Junio C Hamano
2021-09-01 11:27                     ` [RFC PATCH v2 0/4] developer: support pedantic Ævar Arnfjörð Bjarmason
2021-09-01 18:03                       ` Carlo Arenas
2021-09-03 17:02                   ` [PATCH v3 0/3] support pedantic in developer mode Carlo Marcelo Arenas Belón
2021-09-03 17:02                     ` [PATCH v3 1/3] gettext: remove optional non-standard parens in N_() definition Carlo Marcelo Arenas Belón
2021-09-10 15:39                       ` Ævar Arnfjörð Bjarmason
2021-09-03 17:02                     ` [PATCH v3 2/3] win32: allow building with pedantic mode enabled Carlo Marcelo Arenas Belón
2021-09-03 18:47                       ` René Scharfe
2021-09-03 20:13                         ` Carlo Marcelo Arenas Belón
2021-09-03 20:32                           ` Junio C Hamano
2021-09-03 20:38                           ` René Scharfe
2021-09-04  9:37                             ` René Scharfe
2021-09-04 14:42                               ` Carlo Arenas
2021-09-27 23:04                       ` Jonathan Tan
2021-09-28  0:30                         ` Carlo Arenas
2021-09-28 16:50                           ` Jonathan Tan
2021-09-28 17:37                           ` Junio C Hamano
2021-09-28 20:16                             ` Jonathan Tan
2021-09-29  1:00                               ` Carlo Arenas
2021-09-29 15:55                                 ` Junio C Hamano
2021-09-03 17:02                     ` [PATCH v3 3/3] developer: enable pedantic by default Carlo Marcelo Arenas Belón
2021-09-05  7:54                     ` [PATCH v3 0/3] support pedantic in developer mode Ævar Arnfjörð Bjarmason
2021-08-09 16:44               ` [PATCH/RFC 0/3] pedantic errors in next Junio C Hamano
2021-08-09 20:10                 ` Eric Wong
2021-08-10  6:16                 ` Carlo Marcelo Arenas Belón
2021-08-10 19:30                   ` René Scharfe
2021-08-10 23:49                     ` Carlo Arenas
2021-08-11  0:57                       ` Carlo Arenas
2021-08-11 14:57                       ` René Scharfe
2021-08-11 17:20                         ` Junio C Hamano
2021-08-10 18:59             ` [PATCH v2 5/5] oidtree: a crit-bit tree for odb_loose_cache René Scharfe
2021-08-10 19:40           ` René Scharfe
2021-08-14 20:00       ` [PATCH] oidtree: avoid unaligned access to crit-bit tree René Scharfe
2021-08-16 19:11         ` Junio C Hamano

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20210706230115.GA8624@dcvr \
    --to=e@80x24.org \
    --cc=git@vger.kernel.org \
    --cc=l.s.r@web.de \
    --cc=peff@peff.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.