All of lore.kernel.org
 help / color / mirror / Atom feed
From: Eric Wong <e@80x24.org>
To: git@vger.kernel.org
Subject: [PATCH 0/3] switch to tombstone-free khashl table
Date: Mon, 25 Mar 2024 23:07:00 +0000	[thread overview]
Message-ID: <20240325230704.262272-1-e@80x24.org> (raw)

The memory improvement is minor, but any memory reduction at all
is welcome at this point.  Fortunately, this set of changes is
unintrusive.

I have some other ideas that I'll hopefully get to implement before
swapping kills all my SSDs (see bottom).

Eric Wong (3):
  list-objects-filter: use kh_size API
  treewide: switch to khashl for memory savings
  khashl: fix ensemble lookups on empty table

 builtin/fast-import.c       |   2 +-
 builtin/fsmonitor--daemon.c |   4 +-
 delta-islands.c             |   4 +-
 khash.h                     | 338 -----------------------
 khashl.h                    | 522 ++++++++++++++++++++++++++++++++++++
 list-objects-filter.c       |   2 +-
 object-store-ll.h           |   2 +-
 object-store.h              |   7 +-
 oidset.h                    |   2 +-
 pack-bitmap.h               |   2 +-
 10 files changed, 535 insertions(+), 350 deletions(-)
 delete mode 100644 khash.h
 create mode 100644 khashl.h

TODO (some ideas stolen from khashl):

* obj_hash can probably use a 0.75 load factor (instead of 0.5)
  to save memory and not slow down too much.  oid_* khash has
  always had 0.77 which was fine and now khashl has 0.75.
  0.75 is cheaper to compute via (shifts + ORs) than 0.77.

* obj_hash can use `i = (i + 1) & mask' like khashl does to
  avoid branches in linear probe loops

* obj_hash can use realloc and copy in-place resize logic khashl does.
  khashl uses the ->used bitmap for this, but it should be
  possible to do for obj_hash w/o additional allocations
  via pointer tagging

(not khashl inspired):

* keep an identity pool of OIDs separately (similar to how Perl5
  pools all hash keys) and only use tagged pointers for OIDs.
  Pointer tagging can be used to distinguish between 7 hash
  functions, leaving us room for 5 more in addition to SHA-(1|256).
  This change will be a large effort (with a hopefully large savings).
  If we ever need more than 7 hash functions, we can switch to
  storing hash type information in slabs that can be looked up
  using address ranges (AFAIK, jemalloc does this).

             reply	other threads:[~2024-03-25 23:15 UTC|newest]

Thread overview: 14+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-03-25 23:07 Eric Wong [this message]
2024-03-25 23:07 ` [PATCH 1/3] list-objects-filter: use kh_size API Eric Wong
2024-03-25 23:07 ` [PATCH 2/3] treewide: switch to khashl for memory savings Eric Wong
2024-03-26 17:48   ` Junio C Hamano
2024-03-27  9:37     ` Jeff King
2024-03-27 21:54       ` Junio C Hamano
2024-03-25 23:07 ` [PATCH 3/3] khashl: fix ensemble lookups on empty table Eric Wong
2024-03-25 23:07 ` [REJECT 4/3] switch to khashl ensemble Eric Wong
2024-03-26 17:40 ` [PATCH 0/3] switch to tombstone-free khashl table Junio C Hamano
2024-04-19 21:31   ` Junio C Hamano
2024-04-19 21:46     ` Eric Wong
2024-03-28 10:13 Eric Wong
2024-03-28 15:52 ` Junio C Hamano
2024-03-28 17:56   ` Eric Wong

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20240325230704.262272-1-e@80x24.org \
    --to=e@80x24.org \
    --cc=git@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.