git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Jeff King <peff@peff.net>
To: "René Scharfe" <l.s.r@web.de>
Cc: Git List <git@vger.kernel.org>, Junio C Hamano <gitster@pobox.com>
Subject: Re: [PATCH v2 2/2] oidset: use khash
Date: Wed, 3 Oct 2018 15:40:51 -0400	[thread overview]
Message-ID: <20181003194051.GB20709@sigill.intra.peff.net> (raw)
In-Reply-To: <5efe6695-2e82-786c-1170-7874978cb534@web.de>

On Wed, Oct 03, 2018 at 03:16:39PM +0200, René Scharfe wrote:

> Performance of a command that mainly checks for duplicate objects using
> an oidset, with master and Clang 6.0.1:
> 
>   $ cmd="./git-cat-file --batch-all-objects --unordered --buffer --batch-check='%(objectname)'"
> 
>   $ /usr/bin/time $cmd >/dev/null
>   0.22user 0.03system 0:00.25elapsed 99%CPU (0avgtext+0avgdata 48484maxresident)k
>   0inputs+0outputs (0major+11204minor)pagefaults 0swaps
> 
>   $ hyperfine "$cmd"
>   Benchmark #1: ./git-cat-file --batch-all-objects --unordered --buffer --batch-check='%(objectname)'
> 
>     Time (mean ± σ):     250.0 ms ±   6.0 ms    [User: 225.9 ms, System: 23.6 ms]
> 
>     Range (min … max):   242.0 ms … 261.1 ms
> 
> And with this patch:
> 
>   $ /usr/bin/time $cmd >/dev/null
>   0.14user 0.00system 0:00.15elapsed 100%CPU (0avgtext+0avgdata 41396maxresident)k
>   0inputs+0outputs (0major+8318minor)pagefaults 0swaps
> 
>   $ hyperfine "$cmd"
>   Benchmark #1: ./git-cat-file --batch-all-objects --unordered --buffer --batch-check='%(objectname)'
> 
>     Time (mean ± σ):     151.9 ms ±   4.9 ms    [User: 130.5 ms, System: 21.2 ms]
> 
>     Range (min … max):   148.2 ms … 170.4 ms

Thanks. Just to reinforce these timings, a similar command on linux.git
drops from 4.0s to 2.3s.

The patch itself mostly looks good. A few small comments:

> diff --git a/fetch-pack.c b/fetch-pack.c
> index 75047a4b2a..a839315726 100644
> --- a/fetch-pack.c
> +++ b/fetch-pack.c
> @@ -536,7 +536,7 @@ static int tip_oids_contain(struct oidset *tip_oids,
>  	 * add to "newlist" between calls, the additions will always be for
>  	 * oids that are already in the set.
>  	 */
> -	if (!tip_oids->map.map.tablesize) {
> +	if (!tip_oids->set.n_buckets) {
>  		add_refs_to_oidset(tip_oids, unmatched);
>  		add_refs_to_oidset(tip_oids, newlist);
>  	}

This is a little intimate with the implementation of khash, but I think
it's probably OK (and really no worse than what was there before).

As the comment above notes, I think we're really looking at the case
where this gets populated on the first call, but not subsequent ones. It
might be less hacky to use a "static int initialized" here. Or if we
want to avoid hidden globals, put the logic into filter_refs() to decide
when to populate.

I don't think any of that needs to hold up this series, though.

>  int oidset_insert(struct oidset *set, const struct object_id *oid)
>  {
> -	struct oidmap_entry *entry;
> -
> -	if (!set->map.map.tablesize)
> -		oidmap_init(&set->map, 0);
> -	else if (oidset_contains(set, oid))
> -		return 1;
> -
> -	entry = xmalloc(sizeof(*entry));
> -	oidcpy(&entry->oid, oid);
> -
> -	oidmap_put(&set->map, entry);
> -	return 0;
> +	int added;
> +	kh_put_oid(&set->set, *oid, &added);
> +	return !added;
>  }

This actually does the check-and-insert in a single lookup, which is
nice. ;)

> diff --git a/oidset.h b/oidset.h
> index 40ec5f87fe..4b90540cd4 100644
> --- a/oidset.h
> +++ b/oidset.h
> [...]
> +KHASH_INIT(oid, struct object_id, int, 0, oid_hash, oid_equal)

This will declare these "static inline". Our other major "macros become
inline functions" code is commit-slab.h, and there we found it necessary
to add MAYBE_UNUSED. I wonder if we ought to be doing the same here (I
don't get any warnings, but I suspect sparse might complain).

It might be nice if these functions could hide inside oidset.c (and just
declare the struct here). It looks like we might be able to do that with
__KHASH_TYPE(), but the double-underscore implies that we're not
supposed to. ;)

I guess we also use a few of them in our inlines here. I'm not 100% sure
that oidset_* needs to be inlined either, but this is at least a pretty
faithful conversion of the original.

-Peff

  reply	other threads:[~2018-10-03 19:40 UTC|newest]

Thread overview: 26+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-10-03 13:11 [PATCH v2 0/2] " René Scharfe
2018-10-03 13:12 ` [PATCH v2 1/2] khash: factor out kh_release_* René Scharfe
2018-10-03 13:16 ` [PATCH v2 2/2] oidset: use khash René Scharfe
2018-10-03 19:40   ` Jeff King [this message]
2018-10-04  5:56     ` René Scharfe
2018-10-04  6:48       ` Jeff King
2018-10-04  6:50         ` Jeff King
2018-10-04 15:05         ` René Scharfe
2018-10-04 15:05 ` [PATCH v3 0/5] " René Scharfe
2018-10-04 15:09   ` [PATCH v3 1/5] fetch-pack: factor out is_unmatched_ref() René Scharfe
2018-10-04 15:09   ` [PATCH v3 2/5] fetch-pack: load tip_oids eagerly iff needed René Scharfe
2018-10-04 21:38     ` Jonathan Tan
2018-10-04 22:11       ` René Scharfe
2018-10-05 20:13         ` René Scharfe
2018-10-04 22:14       ` Jeff King
2018-10-04 22:52         ` Jonathan Tan
2018-10-04 23:18           ` Jeff King
2018-10-04 22:07     ` Jeff King
2018-10-05 20:13       ` René Scharfe
2018-10-05 20:27         ` Jeff King
2018-10-05 21:22           ` René Scharfe
2018-10-05 21:47             ` Jeff King
2018-10-04 15:10   ` [PATCH v3 3/5] khash: factor out kh_release_* René Scharfe
2018-10-04 15:13   ` [PATCH v3 4/5] oidset: use khash René Scharfe
2018-10-04 15:14   ` [PATCH 5/5] oidset: uninline oidset_init() René Scharfe
2018-10-04 22:15   ` [PATCH v3 0/5] oidset: use khash Jeff King

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20181003194051.GB20709@sigill.intra.peff.net \
    --to=peff@peff.net \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=l.s.r@web.de \
    --subject='Re: [PATCH v2 2/2] oidset: use khash' \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

This is a public inbox, see mirroring instructions
on how to clone and mirror all data and code used for this inbox