bpf.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v2 bpf-next] bpf: optimize hashmap lookups when key_size is divisible by 4
@ 2023-04-01 20:06 Anton Protopopov
  2023-04-01 22:10 ` Alexei Starovoitov
  2023-04-01 22:20 ` patchwork-bot+netdevbpf
  0 siblings, 2 replies; 4+ messages in thread
From: Anton Protopopov @ 2023-04-01 20:06 UTC (permalink / raw)
  To: bpf, Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, John Fastabend
  Cc: Anton Protopopov

The BPF hashmap uses the jhash() hash function. There is an optimized version
of this hash function which may be used if hash size is a multiple of 4. Apply
this optimization to the hashmap in a similar way as it is done in the bloom
filter map.

On practice the optimization is only noticeable for smaller key sizes, which,
however, is sufficient for many applications. An example is listed in the
following table of measurements (a hashmap of 65536 elements was used):

    --------------------------------------------------------------------
    | key_size | fullness | lookups /sec | lookups (opt) /sec |   gain |
    --------------------------------------------------------------------
    |        4 |      25% |      42.990M |            46.000M |   7.0% |
    |        4 |      50% |      37.910M |            39.094M |   3.1% |
    |        4 |      75% |      34.486M |            36.124M |   4.7% |
    |        4 |     100% |      31.760M |            32.719M |   3.0% |
    --------------------------------------------------------------------
    |        8 |      25% |      43.855M |            49.626M |  13.2% |
    |        8 |      50% |      38.328M |            42.152M |  10.0% |
    |        8 |      75% |      34.483M |            38.088M |  10.5% |
    |        8 |     100% |      31.306M |            34.686M |  10.8% |
    --------------------------------------------------------------------
    |       12 |      25% |      38.398M |            43.770M |  14.0% |
    |       12 |      50% |      33.336M |            37.712M |  13.1% |
    |       12 |      75% |      29.917M |            34.440M |  15.1% |
    |       12 |     100% |      27.322M |            30.480M |  11.6% |
    --------------------------------------------------------------------
    |       16 |      25% |      41.491M |            41.921M |   1.0% |
    |       16 |      50% |      36.206M |            36.474M |   0.7% |
    |       16 |      75% |      32.529M |            33.027M |   1.5% |
    |       16 |     100% |      29.581M |            30.325M |   2.5% |
    --------------------------------------------------------------------
    |       20 |      25% |      34.240M |            36.787M |   7.4% |
    |       20 |      50% |      30.328M |            32.663M |   7.7% |
    |       20 |      75% |      27.536M |            29.354M |   6.6% |
    |       20 |     100% |      24.847M |            26.505M |   6.7% |
    --------------------------------------------------------------------
    |       24 |      25% |      36.329M |            40.608M |  11.8% |
    |       24 |      50% |      31.444M |            35.059M |  11.5% |
    |       24 |      75% |      28.426M |            31.452M |  10.6% |
    |       24 |     100% |      26.278M |            28.741M |   9.4% |
    --------------------------------------------------------------------
    |       28 |      25% |      31.540M |            31.944M |   1.3% |
    |       28 |      50% |      27.739M |            28.063M |   1.2% |
    |       28 |      75% |      24.993M |            25.814M |   3.3% |
    |       28 |     100% |      23.513M |            23.500M |  -0.1% |
    --------------------------------------------------------------------
    |       32 |      25% |      32.116M |            33.953M |   5.7% |
    |       32 |      50% |      28.879M |            29.859M |   3.4% |
    |       32 |      75% |      26.227M |            26.948M |   2.7% |
    |       32 |     100% |      23.829M |            24.613M |   3.3% |
    --------------------------------------------------------------------
    |       64 |      25% |      22.535M |            22.554M |   0.1% |
    |       64 |      50% |      20.471M |            20.675M |   1.0% |
    |       64 |      75% |      19.077M |            19.146M |   0.4% |
    |       64 |     100% |      17.710M |            18.131M |   2.4% |
    --------------------------------------------------------------------

The following script was used to gather the results (SMT & frequency off):

    cd tools/testing/selftests/bpf
    for key_size in 4 8 12 16 20 24 28 32 64; do
            for nr_entries in `seq 16384 16384 65536`; do
                    fullness=$(printf '%3s' $((nr_entries*100/65536)))
                    echo -n "key_size=$key_size: $fullness% full: "
                    sudo ./bench -d2 -a bpf-hashmap-lookup --key_size=$key_size --nr_entries=$nr_entries --max_entries=65536 --nr_loops=2000000 --map_flags=0x40 | grep cpu
            done
            echo
    done

Signed-off-by: Anton Protopopov <aspsk@isovalent.com>
---

v1->v2:
  - simplify/optimize code by just testing the (key_len%4 == 0) in hot path (Alexei)

 kernel/bpf/hashtab.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c
index 96b645bba3a4..00c253b84bf5 100644
--- a/kernel/bpf/hashtab.c
+++ b/kernel/bpf/hashtab.c
@@ -607,6 +607,8 @@ static struct bpf_map *htab_map_alloc(union bpf_attr *attr)
 
 static inline u32 htab_map_hash(const void *key, u32 key_len, u32 hashrnd)
 {
+	if (likely(key_len % 4 == 0))
+		return jhash2(key, key_len / 4, hashrnd);
 	return jhash(key, key_len, hashrnd);
 }
 
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 4+ messages in thread

* Re: [PATCH v2 bpf-next] bpf: optimize hashmap lookups when key_size is divisible by 4
  2023-04-01 20:06 [PATCH v2 bpf-next] bpf: optimize hashmap lookups when key_size is divisible by 4 Anton Protopopov
@ 2023-04-01 22:10 ` Alexei Starovoitov
  2023-04-02 11:45   ` Anton Protopopov
  2023-04-01 22:20 ` patchwork-bot+netdevbpf
  1 sibling, 1 reply; 4+ messages in thread
From: Alexei Starovoitov @ 2023-04-01 22:10 UTC (permalink / raw)
  To: Anton Protopopov
  Cc: bpf, Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, John Fastabend

On Sat, Apr 1, 2023 at 1:05 PM Anton Protopopov <aspsk@isovalent.com> wrote:
>
> The BPF hashmap uses the jhash() hash function. There is an optimized version
> of this hash function which may be used if hash size is a multiple of 4. Apply
> this optimization to the hashmap in a similar way as it is done in the bloom
> filter map.
>
> On practice the optimization is only noticeable for smaller key sizes, which,
> however, is sufficient for many applications. An example is listed in the
> following table of measurements (a hashmap of 65536 elements was used):
>
>     --------------------------------------------------------------------
>     | key_size | fullness | lookups /sec | lookups (opt) /sec |   gain |
>     --------------------------------------------------------------------
>     |        4 |      25% |      42.990M |            46.000M |   7.0% |
>     |        4 |      50% |      37.910M |            39.094M |   3.1% |
>     |        4 |      75% |      34.486M |            36.124M |   4.7% |
>     |        4 |     100% |      31.760M |            32.719M |   3.0% |
>     --------------------------------------------------------------------
>     |        8 |      25% |      43.855M |            49.626M |  13.2% |
>     |        8 |      50% |      38.328M |            42.152M |  10.0% |
>     |        8 |      75% |      34.483M |            38.088M |  10.5% |
>     |        8 |     100% |      31.306M |            34.686M |  10.8% |
>     --------------------------------------------------------------------
>     |       12 |      25% |      38.398M |            43.770M |  14.0% |
>     |       12 |      50% |      33.336M |            37.712M |  13.1% |
>     |       12 |      75% |      29.917M |            34.440M |  15.1% |
>     |       12 |     100% |      27.322M |            30.480M |  11.6% |
>     --------------------------------------------------------------------
>     |       16 |      25% |      41.491M |            41.921M |   1.0% |
>     |       16 |      50% |      36.206M |            36.474M |   0.7% |
>     |       16 |      75% |      32.529M |            33.027M |   1.5% |
>     |       16 |     100% |      29.581M |            30.325M |   2.5% |
>     --------------------------------------------------------------------
>     |       20 |      25% |      34.240M |            36.787M |   7.4% |
>     |       20 |      50% |      30.328M |            32.663M |   7.7% |
>     |       20 |      75% |      27.536M |            29.354M |   6.6% |
>     |       20 |     100% |      24.847M |            26.505M |   6.7% |
>     --------------------------------------------------------------------
>     |       24 |      25% |      36.329M |            40.608M |  11.8% |
>     |       24 |      50% |      31.444M |            35.059M |  11.5% |
>     |       24 |      75% |      28.426M |            31.452M |  10.6% |
>     |       24 |     100% |      26.278M |            28.741M |   9.4% |
>     --------------------------------------------------------------------
>     |       28 |      25% |      31.540M |            31.944M |   1.3% |
>     |       28 |      50% |      27.739M |            28.063M |   1.2% |
>     |       28 |      75% |      24.993M |            25.814M |   3.3% |
>     |       28 |     100% |      23.513M |            23.500M |  -0.1% |
>     --------------------------------------------------------------------
>     |       32 |      25% |      32.116M |            33.953M |   5.7% |
>     |       32 |      50% |      28.879M |            29.859M |   3.4% |
>     |       32 |      75% |      26.227M |            26.948M |   2.7% |
>     |       32 |     100% |      23.829M |            24.613M |   3.3% |
>     --------------------------------------------------------------------
>     |       64 |      25% |      22.535M |            22.554M |   0.1% |
>     |       64 |      50% |      20.471M |            20.675M |   1.0% |
>     |       64 |      75% |      19.077M |            19.146M |   0.4% |
>     |       64 |     100% |      17.710M |            18.131M |   2.4% |
>     --------------------------------------------------------------------
>
> The following script was used to gather the results (SMT & frequency off):
>
>     cd tools/testing/selftests/bpf
>     for key_size in 4 8 12 16 20 24 28 32 64; do
>             for nr_entries in `seq 16384 16384 65536`; do
>                     fullness=$(printf '%3s' $((nr_entries*100/65536)))
>                     echo -n "key_size=$key_size: $fullness% full: "
>                     sudo ./bench -d2 -a bpf-hashmap-lookup --key_size=$key_size --nr_entries=$nr_entries --max_entries=65536 --nr_loops=2000000 --map_flags=0x40 | grep cpu
>             done
>             echo
>     done
>
> Signed-off-by: Anton Protopopov <aspsk@isovalent.com>
> ---
>
> v1->v2:
>   - simplify/optimize code by just testing the (key_len%4 == 0) in hot path (Alexei)
>
>  kernel/bpf/hashtab.c | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c
> index 96b645bba3a4..00c253b84bf5 100644
> --- a/kernel/bpf/hashtab.c
> +++ b/kernel/bpf/hashtab.c
> @@ -607,6 +607,8 @@ static struct bpf_map *htab_map_alloc(union bpf_attr *attr)
>
>  static inline u32 htab_map_hash(const void *key, u32 key_len, u32 hashrnd)
>  {
> +       if (likely(key_len % 4 == 0))
> +               return jhash2(key, key_len / 4, hashrnd);
>         return jhash(key, key_len, hashrnd);
>  }

This looks much cleaner than v1. Applied.

Do you mind doing similar patch for bloomfilter?
(removing aligned_u32_count variable)

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH v2 bpf-next] bpf: optimize hashmap lookups when key_size is divisible by 4
  2023-04-01 20:06 [PATCH v2 bpf-next] bpf: optimize hashmap lookups when key_size is divisible by 4 Anton Protopopov
  2023-04-01 22:10 ` Alexei Starovoitov
@ 2023-04-01 22:20 ` patchwork-bot+netdevbpf
  1 sibling, 0 replies; 4+ messages in thread
From: patchwork-bot+netdevbpf @ 2023-04-01 22:20 UTC (permalink / raw)
  To: Anton Protopopov; +Cc: bpf, ast, daniel, andrii, martin.lau, john.fastabend

Hello:

This patch was applied to bpf/bpf-next.git (master)
by Alexei Starovoitov <ast@kernel.org>:

On Sat,  1 Apr 2023 20:06:02 +0000 you wrote:
> The BPF hashmap uses the jhash() hash function. There is an optimized version
> of this hash function which may be used if hash size is a multiple of 4. Apply
> this optimization to the hashmap in a similar way as it is done in the bloom
> filter map.
> 
> On practice the optimization is only noticeable for smaller key sizes, which,
> however, is sufficient for many applications. An example is listed in the
> following table of measurements (a hashmap of 65536 elements was used):
> 
> [...]

Here is the summary with links:
  - [v2,bpf-next] bpf: optimize hashmap lookups when key_size is divisible by 4
    https://git.kernel.org/bpf/bpf-next/c/5b85575ad428

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html



^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH v2 bpf-next] bpf: optimize hashmap lookups when key_size is divisible by 4
  2023-04-01 22:10 ` Alexei Starovoitov
@ 2023-04-02 11:45   ` Anton Protopopov
  0 siblings, 0 replies; 4+ messages in thread
From: Anton Protopopov @ 2023-04-02 11:45 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: bpf, Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, John Fastabend

On Sat, Apr 01, 2023 at 03:10:46PM -0700, Alexei Starovoitov wrote:
> On Sat, Apr 1, 2023 at 1:05 PM Anton Protopopov <aspsk@isovalent.com> wrote:
> >
> > [...]
> >
> >  static inline u32 htab_map_hash(const void *key, u32 key_len, u32 hashrnd)
> >  {
> > +       if (likely(key_len % 4 == 0))
> > +               return jhash2(key, key_len / 4, hashrnd);
> >         return jhash(key, key_len, hashrnd);
> >  }
> 
> This looks much cleaner than v1. Applied.

Thanks!

> Do you mind doing similar patch for bloomfilter?
> (removing aligned_u32_count variable)

Sure, done.

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2023-04-02 11:44 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-04-01 20:06 [PATCH v2 bpf-next] bpf: optimize hashmap lookups when key_size is divisible by 4 Anton Protopopov
2023-04-01 22:10 ` Alexei Starovoitov
2023-04-02 11:45   ` Anton Protopopov
2023-04-01 22:20 ` patchwork-bot+netdevbpf

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).