All of lore.kernel.org
 help / color / mirror / Atom feed
From: Junio C Hamano <gitster@pobox.com>
To: Eric Wong <e@80x24.org>
Cc: git@vger.kernel.org
Subject: Re: [PATCH 0/3] switch to tombstone-free khashl table
Date: Thu, 28 Mar 2024 08:52:35 -0700	[thread overview]
Message-ID: <xmqqh6gqwdz0.fsf@gitster.g> (raw)
In-Reply-To: <20240328101356.300374-1-e@80x24.org> (Eric Wong's message of "Thu, 28 Mar 2024 10:13:53 +0000")

Eric Wong <e@80x24.org> writes:

> Fortunately, this set of changes is unintrusive; but I'm
> hoping to have more time to make deeper changes this year.
>
> Eric Wong (3):
>   list-objects-filter: use kh_size API
>   treewide: switch to khashl for memory savings
>   khashl: fix ensemble lookups on empty table
>
>  builtin/fast-import.c       |   2 +-
>  builtin/fsmonitor--daemon.c |   4 +-
>  delta-islands.c             |   4 +-
>  khash.h                     | 338 -----------------------
>  khashl.h                    | 522 ++++++++++++++++++++++++++++++++++++
>  list-objects-filter.c       |   2 +-
>  object-store-ll.h           |   2 +-
>  object-store.h              |   7 +-
>  oidset.h                    |   2 +-
>  pack-bitmap.h               |   2 +-
>  10 files changed, 535 insertions(+), 350 deletions(-)
>  delete mode 100644 khash.h
>  create mode 100644 khashl.h
>
> Range-diff:
> -:  ---------- > 1:  3bf3148cab list-objects-filter: use kh_size API
> 1:  e74965907e ! 2:  09900edb48 treewide: switch to khashl for memory savings

Do you have the correct range-diff?  The previous round had the
change to the list-object-filter.c to use kh_size() already.

But I see the 0 -> NULL fixes.  Perhaps the left-side base was off
by one when you took the range-diff and there is nothing else going
on that we should be worried about...

>     @@ Commit message
>      
>          khashl is an updated version of khash with less memory overhead
>          (one bit/bucket instead of two) than the original khash and
>     -    similar overall performance.  Insertions are simpler (linear
>     -    probing) but deletions may be slightly slower[1].  Of course,
>     -    the majority of hash tables in git do not delete individual
>     -    elements.
>     +    similar overall performance.  According to its author,
>     +    insertions are simpler (linear probing) but deletions may be
>     +    slightly slower[1].  Of course, the majority of hash tables in
>     +    git do not delete individual elements.
>      
>          Overall memory usage did not decrease much, as the hash tables
>          and elements we store in them are big and currently dwarf the
>          overhead of the khash internals.  Only around 10 MB in
>     -    allocations (not peak use) is saved when doing a no-op `git gc'
>     -    of a Linux kernel object store with thousands of refs and
>     -    islands.
>     +    allocations (and a few dozen KB peak use out of ~6 GB) is saved
>     +    when doing a no-op `git gc' of a Linux kernel object store with
>     +    thousands of refs and islands.
>      
>          A summary of differences I've found from khash to khashl:
>      
>     @@ Commit message
>          * flesh out KHASHL_{SET,MAP}_INIT wrappers with *_clear, *_resize,
>            and *_release functions
>      
>     +    * sparse fixes from Junio and Jeff
>     +
>          [1] https://attractivechaos.wordpress.com/2019/12/28/deletion-from-hash-tables-without-tombstones/
>          [2] git clone https://github.com/attractivechaos/klib.git
>              2895a16cb55e (support an ensemble of hash tables, 2023-12-18)
>     @@ Commit message
>            typedef) and was the only place where I had to change a definition.
>      
>          Signed-off-by: Eric Wong <e@80x24.org>
>     +    Helped-by: Junio C Hamano <gitster@pobox.com>
>     +    Helped-by: Jeff King <peff@peff.net>
>      
>       ## builtin/fast-import.c ##
>      @@
>     @@ khashl.h (new)
>      +#define __KHASHL_IMPL_GET(SCOPE, HType, prefix, khkey_t, __hash_fn, __hash_eq) \
>      +	SCOPE khint_t prefix##_getp_core(const HType *h, const khkey_t *key, khint_t hash) { \
>      +		khint_t i, last, n_buckets, mask; \
>     -+		if (h->keys == 0) return 0; \
>     ++		if (!h->keys) return 0; \
>      +		n_buckets = (khint_t)1U << h->bits; \
>      +		mask = n_buckets - 1U; \
>      +		i = last = __kh_h2b(hash, h->bits); \
>     @@ khashl.h (new)
>      +
>      +#define __KHASHL_IMPL_RESIZE(SCOPE, HType, prefix, khkey_t, __hash_fn, __hash_eq) \
>      +	SCOPE void prefix##_resize(HType *h, khint_t new_n_buckets) { \
>     -+		khint32_t *new_used = 0; \
>     ++		khint32_t *new_used = NULL; \
>      +		khint_t j = 0, x = new_n_buckets, n_buckets, new_bits, new_mask; \
>      +		while ((x >>= 1) != 0) ++j; \
>      +		if (new_n_buckets & (new_n_buckets - 1)) ++j; \
>     @@ khashl.h (new)
>      +#define __KHASHL_IMPL_DEL(SCOPE, HType, prefix, khkey_t, __hash_fn) \
>      +	SCOPE int prefix##_del(HType *h, khint_t i) { \
>      +		khint_t j = i, k, mask, n_buckets; \
>     -+		if (h->keys == 0) return 0; \
>     ++		if (!h->keys) return 0; \
>      +		n_buckets = (khint_t)1U<<h->bits; \
>      +		mask = n_buckets - 1U; \
>      +		while (1) { \
> 2:  744e1b7198 = 3:  bfb20eae37 khashl: fix ensemble lookups on empty table

  parent reply	other threads:[~2024-03-28 15:52 UTC|newest]

Thread overview: 11+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-03-28 10:13 [PATCH 0/3] switch to tombstone-free khashl table Eric Wong
2024-03-28 10:13 ` [PATCH 1/3] list-objects-filter: use kh_size API Eric Wong
2024-03-28 10:13 ` [PATCH 2/3] treewide: switch to khashl for memory savings Eric Wong
2024-03-28 10:13 ` [PATCH 3/3] khashl: fix ensemble lookups on empty table Eric Wong
2024-03-28 10:14 ` oops, forgot [v2] Eric Wong
2024-03-28 15:52 ` Junio C Hamano [this message]
2024-03-28 17:56   ` [PATCH 0/3] switch to tombstone-free khashl table Eric Wong
  -- strict thread matches above, loose matches on Subject: below --
2024-03-25 23:07 Eric Wong
2024-03-26 17:40 ` Junio C Hamano
2024-04-19 21:31   ` Junio C Hamano
2024-04-19 21:46     ` Eric Wong

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=xmqqh6gqwdz0.fsf@gitster.g \
    --to=gitster@pobox.com \
    --cc=e@80x24.org \
    --cc=git@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.