All of lore.kernel.org
 help / color / mirror / Atom feed
From: Junio C Hamano <gitster@pobox.com>
To: Andreas Krey <a.krey@gmx.de>
Cc: Git Mailing List <git@vger.kernel.org>
Subject: Re: [PATCH] refs.c: get_ref_cache: use a bucket hash
Date: Mon, 16 Mar 2015 10:23:05 -0700	[thread overview]
Message-ID: <xmqq1tkosvpi.fsf@gitster.dls.corp.google.com> (raw)
In-Reply-To: <20150316142026.GJ7847@inner.h.apk.li> (Andreas Krey's message of "Mon, 16 Mar 2015 15:20:26 +0100")

Andreas Krey <a.krey@gmx.de> writes:

> get_ref_cache used a linear list, which obviously is O(n^2).
> Use a fixed bucket hash which just takes a factor of 100000
> (~ 317^2) out of the n^2 - which is enough.
>
> Signed-off-by: Andreas Krey <a.krey@gmx.de>
> ---
>
> This brings 'git clean -ndx' times down from 17 minutes
> to 11 seconds on one of our workspaces (which accumulated
> a lot of ignored directories).

Nice.

These impressive numbers should go to the commit log message,
together with a bit more numbers to characterise the shape of the
repository that exhibits the problem with the original code.  You
say "a lot of ignored directories", but do you mean directories in
the working tree (which I suppose do not have much to do with the
submodule_ref_caches[])?  I am guessing that the repository has tons
of submodules?  How many is "tons" to make the pain noticeable?

> Actuallly using adaptive
> hashing or other structures seems overkill.

Perhaps _implementing_ these structures only for this codepath may
be overkill, but would it be an overkill to _use_ existing hashmap.c
implementation?  After all, those who wrote the original would have
thought that anything more complex than a linear list would be
overkill, and nobody disagreed until you found that your repository
disagreed with that assumption ;-)

>  refs.c | 13 ++++++++-----
>  1 file changed, 8 insertions(+), 5 deletions(-)
>
> diff --git a/refs.c b/refs.c
> index e23542b..8198d9e 100644
> --- a/refs.c
> +++ b/refs.c
> @@ -982,6 +982,8 @@ struct packed_ref_cache {
>  	struct stat_validity validity;
>  };
>  
> +#define REF_CACHE_HASH 317
> +
>  /*
>   * Future: need to be in "struct repository"
>   * when doing a full libification.
> @@ -996,7 +998,7 @@ static struct ref_cache {
>  	 * is initialized correctly.
>  	 */
>  	char name[1];
> -} ref_cache, *submodule_ref_caches;
> +} ref_cache, *submodule_ref_caches[REF_CACHE_HASH];
>  
>  /* Lock used for the main packed-refs file: */
>  static struct lock_file packlock;
> @@ -1065,18 +1067,19 @@ static struct ref_cache *create_ref_cache(const char *submodule)
>   */
>  static struct ref_cache *get_ref_cache(const char *submodule)
>  {
> -	struct ref_cache *refs;
> +	struct ref_cache *refs, **bucketp;
> +	bucketp = submodule_ref_caches + strhash(submodule) % REF_CACHE_HASH;
>  
>  	if (!submodule || !*submodule)
>  		return &ref_cache;
>  
> -	for (refs = submodule_ref_caches; refs; refs = refs->next)
> +	for (refs = *bucketp; refs; refs = refs->next)
>  		if (!strcmp(submodule, refs->name))
>  			return refs;
>  
>  	refs = create_ref_cache(submodule);
> -	refs->next = submodule_ref_caches;
> -	submodule_ref_caches = refs;
> +	refs->next = *bucketp;
> +	*bucketp = refs;
>  	return refs;
>  }

  parent reply	other threads:[~2015-03-16 17:23 UTC|newest]

Thread overview: 12+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-03-16 14:20 [PATCH] refs.c: get_ref_cache: use a bucket hash Andreas Krey
2015-03-16 17:19 ` Thomas Gummerer
2015-03-16 17:23 ` Junio C Hamano [this message]
2015-03-16 18:40   ` Andreas Krey
2015-03-17  2:40     ` Jeff King
2015-03-17  5:35       ` Junio C Hamano
2015-03-17  5:48         ` Jeff King
2015-11-13 15:29           ` Andreas Krey
2015-11-14  0:01             ` Jeff King
2015-11-14 13:22               ` Andreas Krey
2015-11-14 13:35               ` Andreas Krey
2015-11-16 16:31                 ` Jeff King

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=xmqq1tkosvpi.fsf@gitster.dls.corp.google.com \
    --to=gitster@pobox.com \
    --cc=a.krey@gmx.de \
    --cc=git@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.