* [PATCH RFC bpf-next 0/4] bpf_jhash_mem() and BPF Bloom filter implementation
From: Andrii Nakryiko @ 2021-09-22 20:32 UTC
  To: bpf, ast, daniel; +Cc: kafai, joannekoong, Andrii Nakryiko

This is a quick experiment with adding a hashing helper and building a Bloom
filter in pure BPF code, with no extra kernel functionality beyond the generic
hashing helper.

This is based on top of Joanne's series [0].

Patch 1 adds the bpf_jhash_mem() helper. Patch 2 fixes the existing benchmark
to report valid and consistent results and reduces the overhead that the stats
counting itself causes. Patch 3 contains the BPF-side implementation of the
Bloom filter, based on global variables. Patch 4 completes the integration on
the user-space side. I split patches 3 and 4 so that the user-space plumbing
doesn't distract from the BPF-side changes.
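
The BPF-side core is small enough to show up front. Condensed from patch 3
(which also has the matching bloom_contains() lookup), the filter is just a
global array probed once per hash function with the new helper; bloom_mask,
bloom_hash_cnt, and bloom_seed are filled in by user space before skeleton
load (patch 4):

    static __u64 bloom_arr[256 * 1024];  /* bit array, lives in .bss */
    const volatile __u32 bloom_mask;     /* word-index mask; array size is a power of two */
    const volatile __u32 bloom_hash_cnt; /* number of hash functions */
    const volatile __u32 bloom_seed;

    static void bloom_add(const void *data, __u32 sz)
    {
            __u32 i, h;

            for (i = 0; i < bloom_hash_cnt; i++) {
                    h = bpf_jhash_mem(data, sz, bloom_seed + i);
                    __sync_fetch_and_or(&bloom_arr[(h / 64) & bloom_mask], 1ULL << (h % 64));
            }
    }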

Note that "Hashmap without bloom filter vs. hashmap with bloom filter"
benchmark is still spending lots of time in just generating random values on
BPF side, and it would be nice to optimize that and make it more reproducible
to compare different implementations. But I ran out of steam for doing that,
especially that I'm not sure this changes anything.

The same benchmark also exercises only short 8-byte keys, which is a valid use
case but probably not the only one, so it would be nice to have that extended
as well.


For reference, here are the full benchmark results comparing the in-kernel
Bloom filter and the custom BPF-implemented one. I shortened the set of tested
combinations to reduce the time needed to collect results.

The results for "Hashmap without bloom filter vs. hashmap with bloom filter"
are quite confusing, though. I spent a bit of time to try to find
discrepancies. I confirmed that both implementations function correctly and
match 100% of time in terms of positives/negatives. Pure Bloom filter reading
benchmarks show a pretty small gap in performance with custom BPF
implementation losing. The "Hashmap without bloom filter vs. hashmap with bloom
filter" benchmark shows much bigger difference, though, which I wasn't able to
completely explain, to be entirely honest.

It's probably worth spending some more time investigating this, but I ran out
of self-allotted time for it.


Bloom filter map
================
        # threads: 1, # hashes: 1
10,000 entries -
        Total operations:    50.854 ± 0.134M/s (drops 0.000 ± 0.000M/s)
        False positive rate:  31.36%
        [CUSTOM] Total operations:  49.391 ± 0.123M/s (drops 0.000 ± 0.000M/s)
        [CUSTOM] False positive rate:  31.37%
100,000 entries -
        Total operations:    50.969 ± 0.049M/s (drops 0.000 ± 0.000M/s)
        False positive rate:  24.04%
        [CUSTOM] Total operations:  49.135 ± 1.579M/s (drops 0.000 ± 0.000M/s)
        [CUSTOM] False positive rate:  24.04%
1,000,000 entries -
        Total operations:    48.474 ± 1.619M/s (drops 0.000 ± 0.000M/s)
        False positive rate:  27.50%
        [CUSTOM] Total operations:  46.088 ± 0.776M/s (drops 0.000 ± 0.000M/s)
        [CUSTOM] False positive rate:  27.50%

        # threads: 1, # hashes: 3
10,000 entries -
        Total operations:    25.136 ± 0.011M/s (drops 0.000 ± 0.000M/s)
        False positive rate:  4.71%
        [CUSTOM] Total operations:  24.115 ± 0.014M/s (drops 0.000 ± 0.000M/s)
        [CUSTOM] False positive rate:  4.77%
100,000 entries -
        Total operations:    25.045 ± 0.120M/s (drops 0.000 ± 0.000M/s)
        False positive rate:  7.67%
        [CUSTOM] Total operations:  23.028 ± 0.042M/s (drops 0.000 ± 0.000M/s)
        [CUSTOM] False positive rate:  7.65%
1,000,000 entries -
        Total operations:    18.712 ± 0.406M/s (drops 0.000 ± 0.000M/s)
        False positive rate:  2.64%
        [CUSTOM] Total operations:  18.100 ± 0.422M/s (drops 0.000 ± 0.000M/s)
        [CUSTOM] False positive rate:  2.64%

        # threads: 1, # hashes: 5
10,000 entries -
        Total operations:    16.672 ± 0.011M/s (drops 0.000 ± 0.000M/s)
        False positive rate:  0.32%
        [CUSTOM] Total operations:  15.318 ± 0.014M/s (drops 0.000 ± 0.000M/s)
        [CUSTOM] False positive rate:  0.32%
100,000 entries -
        Total operations:    16.540 ± 0.121M/s (drops 0.000 ± 0.000M/s)
        False positive rate:  0.78%
        [CUSTOM] Total operations:  15.189 ± 0.045M/s (drops 0.000 ± 0.000M/s)
        [CUSTOM] False positive rate:  0.78%
1,000,000 entries -
        Total operations:    11.781 ± 0.192M/s (drops 0.000 ± 0.000M/s)
        False positive rate:  1.79%
        [CUSTOM] Total operations:  11.651 ± 0.012M/s (drops 0.000 ± 0.000M/s)
        [CUSTOM] False positive rate:  1.79%

        # threads: 1, # hashes: 10
10,000 entries -
        Total operations:    9.038 ± 0.052M/s (drops 0.000 ± 0.000M/s)
        False positive rate:  0.00%
        [CUSTOM] Total operations:  8.620 ± 0.008M/s (drops 0.000 ± 0.000M/s)
        [CUSTOM] False positive rate:  0.00%
100,000 entries -
        Total operations:    8.076 ± 0.027M/s (drops 0.000 ± 0.000M/s)
        False positive rate:  0.01%
        [CUSTOM] Total operations:  7.261 ± 0.007M/s (drops 0.000 ± 0.000M/s)
        [CUSTOM] False positive rate:  0.01%
1,000,000 entries -
        Total operations:    6.263 ± 0.041M/s (drops 0.000 ± 0.000M/s)
        False positive rate:  0.03%
        [CUSTOM] Total operations:  6.173 ± 0.013M/s (drops 0.000 ± 0.000M/s)
        [CUSTOM] False positive rate:  0.03%

        # threads: 4, # hashes: 1
10,000 entries -
        Total operations:    203.453 ± 0.161M/s (drops 0.000 ± 0.000M/s)
        False positive rate:  31.28%
        [CUSTOM] Total operations:  197.959 ± 0.051M/s (drops 0.000 ± 0.000M/s)
        [CUSTOM] False positive rate:  31.34%
100,000 entries -
        Total operations:    203.887 ± 0.155M/s (drops 0.000 ± 0.000M/s)
        False positive rate:  24.07%
        [CUSTOM] Total operations:  197.476 ± 1.796M/s (drops 0.000 ± 0.000M/s)
        [CUSTOM] False positive rate:  24.09%
1,000,000 entries -
        Total operations:    189.259 ± 0.473M/s (drops 0.000 ± 0.000M/s)
        False positive rate:  27.50%
        [CUSTOM] Total operations:  185.157 ± 0.346M/s (drops 0.000 ± 0.000M/s)
        [CUSTOM] False positive rate:  27.48%

        # threads: 4, # hashes: 3
10,000 entries -
        Total operations:    100.394 ± 0.062M/s (drops 0.000 ± 0.000M/s)
        False positive rate:  4.76%
        [CUSTOM] Total operations:  93.896 ± 0.104M/s (drops 0.000 ± 0.000M/s)
        [CUSTOM] False positive rate:  4.75%
100,000 entries -
        Total operations:    100.382 ± 0.155M/s (drops 0.000 ± 0.000M/s)
        False positive rate:  7.62%
        [CUSTOM] Total operations:  93.460 ± 0.145M/s (drops 0.000 ± 0.000M/s)
        [CUSTOM] False positive rate:  7.65%
1,000,000 entries -
        Total operations:    71.424 ± 0.710M/s (drops 0.000 ± 0.000M/s)
        False positive rate:  2.65%
        [CUSTOM] Total operations:  72.546 ± 0.228M/s (drops 0.000 ± 0.000M/s)
        [CUSTOM] False positive rate:  2.65%

        # threads: 4, # hashes: 5
10,000 entries -
        Total operations:    66.652 ± 0.116M/s (drops 0.000 ± 0.000M/s)
        False positive rate:  0.32%
        [CUSTOM] Total operations:  60.790 ± 0.454M/s (drops 0.000 ± 0.000M/s)
        [CUSTOM] False positive rate:  0.33%
100,000 entries -
        Total operations:    66.401 ± 0.090M/s (drops 0.000 ± 0.000M/s)
        False positive rate:  0.78%
        [CUSTOM] Total operations:  61.066 ± 0.069M/s (drops 0.000 ± 0.000M/s)
        [CUSTOM] False positive rate:  0.77%
1,000,000 entries -
        Total operations:    48.299 ± 0.369M/s (drops 0.000 ± 0.000M/s)
        False positive rate:  1.79%
        [CUSTOM] Total operations:  47.401 ± 0.347M/s (drops 0.000 ± 0.000M/s)
        [CUSTOM] False positive rate:  1.79%

        # threads: 4, # hashes: 10
10,000 entries -
        Total operations:    36.273 ± 0.030M/s (drops 0.000 ± 0.000M/s)
        False positive rate:  0.00%
        [CUSTOM] Total operations:  34.464 ± 0.073M/s (drops 0.000 ± 0.000M/s)
        [CUSTOM] False positive rate:  0.00%
100,000 entries -
        Total operations:    32.525 ± 0.047M/s (drops 0.000 ± 0.000M/s)
        False positive rate:  0.01%
        [CUSTOM] Total operations:  29.516 ± 0.110M/s (drops 0.000 ± 0.000M/s)
        [CUSTOM] False positive rate:  0.01%
1,000,000 entries -
        Total operations:    25.515 ± 0.405M/s (drops 0.000 ± 0.000M/s)
        False positive rate:  0.03%
        [CUSTOM] Total operations:  24.566 ± 0.189M/s (drops 0.000 ± 0.000M/s)
        [CUSTOM] False positive rate:  0.03%

        # threads: 8, # hashes: 1
10,000 entries -
        Total operations:    406.129 ± 2.262M/s (drops 0.000 ± 0.000M/s)
        False positive rate:  31.45%
        [CUSTOM] Total operations:  384.758 ± 0.379M/s (drops 0.000 ± 0.000M/s)
        [CUSTOM] False positive rate:  31.24%
100,000 entries -
        Total operations:    407.817 ± 0.793M/s (drops 0.000 ± 0.000M/s)
        False positive rate:  24.05%
        [CUSTOM] Total operations:  394.745 ± 1.302M/s (drops 0.000 ± 0.000M/s)
        [CUSTOM] False positive rate:  24.05%
1,000,000 entries -
        Total operations:    383.159 ± 0.289M/s (drops 0.000 ± 0.000M/s)
        False positive rate:  27.49%
        [CUSTOM] Total operations:  371.173 ± 0.454M/s (drops 0.000 ± 0.000M/s)
        [CUSTOM] False positive rate:  27.49%

        # threads: 8, # hashes: 3
10,000 entries -
        Total operations:    201.079 ± 0.183M/s (drops 0.000 ± 0.000M/s)
        False positive rate:  4.69%
        [CUSTOM] Total operations:  187.658 ± 0.544M/s (drops 0.000 ± 0.000M/s)
        [CUSTOM] False positive rate:  4.74%
100,000 entries -
        Total operations:    199.826 ± 0.972M/s (drops 0.000 ± 0.000M/s)
        False positive rate:  7.63%
        [CUSTOM] Total operations:  185.415 ± 0.358M/s (drops 0.000 ± 0.000M/s)
        [CUSTOM] False positive rate:  7.62%
1,000,000 entries -
        Total operations:    148.589 ± 1.320M/s (drops 0.000 ± 0.000M/s)
        False positive rate:  2.65%
        [CUSTOM] Total operations:  142.591 ± 4.825M/s (drops 0.000 ± 0.000M/s)
        [CUSTOM] False positive rate:  2.64%

        # threads: 8, # hashes: 5
10,000 entries -
        Total operations:    133.306 ± 0.468M/s (drops 0.000 ± 0.000M/s)
        False positive rate:  0.32%
        [CUSTOM] Total operations:  127.377 ± 0.271M/s (drops 0.000 ± 0.000M/s)
        [CUSTOM] False positive rate:  0.32%
100,000 entries -
        Total operations:    132.915 ± 0.364M/s (drops 0.000 ± 0.000M/s)
        False positive rate:  0.78%
        [CUSTOM] Total operations:  123.722 ± 3.169M/s (drops 0.000 ± 0.000M/s)
        [CUSTOM] False positive rate:  0.77%
1,000,000 entries -
        Total operations:    94.803 ± 3.240M/s (drops 0.000 ± 0.000M/s)
        False positive rate:  1.79%
        [CUSTOM] Total operations:  95.670 ± 2.624M/s (drops 0.000 ± 0.000M/s)
        [CUSTOM] False positive rate:  1.79%

        # threads: 8, # hashes: 10
10,000 entries -
        Total operations:    72.517 ± 0.083M/s (drops 0.000 ± 0.000M/s)
        False positive rate:  0.00%
        [CUSTOM] Total operations:  65.803 ± 0.069M/s (drops 0.000 ± 0.000M/s)
        [CUSTOM] False positive rate:  0.00%
100,000 entries -
        Total operations:    65.533 ± 0.148M/s (drops 0.000 ± 0.000M/s)
        False positive rate:  0.01%
        [CUSTOM] Total operations:  59.908 ± 2.841M/s (drops 0.000 ± 0.000M/s)
        [CUSTOM] False positive rate:  0.01%
1,000,000 entries -
        Total operations:    50.937 ± 0.105M/s (drops 0.000 ± 0.000M/s)
        False positive rate:  0.03%
        [CUSTOM] Total operations:  47.526 ± 1.044M/s (drops 0.000 ± 0.000M/s)
        [CUSTOM] False positive rate:  0.03%


Hashmap without bloom filter vs. hashmap with bloom filter (throughput, 8 threads)
==================================================================================
        # hashes: 1
10,000 entries -
        Hashmap without bloom filter:  123.919 ± 0.458M/s
        Hashmap with bloom filter:  124.588 ± 0.450M/s
        [CUSTOM] Hashmap with bloom filter:  106.838 ± 0.114M/s
100,000 entries -
        Hashmap without bloom filter:  93.708 ± 0.715M/s
        Hashmap with bloom filter:  131.686 ± 1.272M/s
        [CUSTOM] Hashmap with bloom filter:  105.649 ± 0.278M/s
1,000,000 entries -
        Hashmap without bloom filter:  40.040 ± 0.677M/s
        Hashmap with bloom filter:  67.250 ± 0.506M/s
        [CUSTOM] Hashmap with bloom filter:  58.356 ± 0.541M/s

        # hashes: 3
10,000 entries -
        Hashmap without bloom filter:  123.882 ± 0.077M/s
        Hashmap with bloom filter:  152.423 ± 0.061M/s
        [CUSTOM] Hashmap with bloom filter:  126.021 ± 0.115M/s
100,000 entries -
        Hashmap without bloom filter:  94.291 ± 0.986M/s
        Hashmap with bloom filter:  127.944 ± 0.825M/s
        [CUSTOM] Hashmap with bloom filter:  106.943 ± 0.827M/s
1,000,000 entries -
        Hashmap without bloom filter:  40.183 ± 0.382M/s
        Hashmap with bloom filter:  125.224 ± 0.266M/s
        [CUSTOM] Hashmap with bloom filter:  89.717 ± 0.158M/s

        # hashes: 5
10,000 entries -
        Hashmap without bloom filter:  120.510 ± 0.351M/s
        Hashmap with bloom filter:  170.138 ± 0.088M/s
        [CUSTOM] Hashmap with bloom filter:  136.006 ± 0.324M/s
100,000 entries -
        Hashmap without bloom filter:  94.774 ± 0.191M/s
        Hashmap with bloom filter:  145.559 ± 0.687M/s
        [CUSTOM] Hashmap with bloom filter:  117.073 ± 0.382M/s
1,000,000 entries -
        Hashmap without bloom filter:  40.004 ± 0.492M/s
        Hashmap with bloom filter:  96.916 ± 0.148M/s
        [CUSTOM] Hashmap with bloom filter:  78.485 ± 0.289M/s

        # hashes: 10
10,000 entries -
        Hashmap without bloom filter:  124.034 ± 0.245M/s
        Hashmap with bloom filter:  169.757 ± 0.336M/s
        [CUSTOM] Hashmap with bloom filter:  134.634 ± 0.276M/s
100,000 entries -
        Hashmap without bloom filter:  94.872 ± 0.195M/s
        Hashmap with bloom filter:  141.107 ± 0.780M/s
        [CUSTOM] Hashmap with bloom filter:  109.279 ± 0.330M/s
1,000,000 entries -
        Hashmap without bloom filter:  40.215 ± 0.396M/s
        Hashmap with bloom filter:  98.084 ± 0.267M/s
        [CUSTOM] Hashmap with bloom filter:  78.579 ± 0.046M/s


  [0] https://patchwork.kernel.org/project/netdevbpf/list/?series=550585&state=*

Andrii Nakryiko (4):
  bpf: add bpf_jhash_mem BPF helper
  selftests/bpf: fix and optimize bloom filter bench
  selftests/bpf: implement custom Bloom filter purely in BPF
  selftests/bpf: integrate custom Bloom filter impl into benchs

 include/uapi/linux/bpf.h                      |   8 +
 kernel/bpf/helpers.c                          |  23 +++
 tools/include/uapi/linux/bpf.h                |   8 +
 tools/testing/selftests/bpf/Makefile          |   2 +-
 tools/testing/selftests/bpf/bench.c           |   6 +
 .../bpf/benchs/bench_bloom_filter_map.c       | 153 +++++++++++++++++-
 .../bpf/benchs/run_bench_bloom_filter_map.sh  |  22 +--
 .../selftests/bpf/benchs/run_common.sh        |   2 +-
 .../selftests/bpf/progs/bloom_filter_map.c    | 125 ++++++++++++--
 9 files changed, 317 insertions(+), 32 deletions(-)

-- 
2.30.2


* [PATCH RFC bpf-next 1/4] bpf: add bpf_jhash_mem BPF helper
From: Andrii Nakryiko @ 2021-09-22 20:32 UTC
  To: bpf, ast, daniel; +Cc: kafai, joannekoong, Andrii Nakryiko

Add a BPF helper that calculates the jhash of an arbitrary (initialized)
memory slice.
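
For illustration only, a hypothetical caller: the helper accepts any
initialized memory (ARG_PTR_TO_MEM) and an arbitrary seed, so a program can
derive several independent hashes by varying the seed (nr_hashes and seed
below are made-up names):

    /* hypothetical sketch: hash an 8-byte key nr_hashes times */
    __u64 key = 42;
    __u32 i, h;

    for (i = 0; i < nr_hashes; i++) {
            h = bpf_jhash_mem(&key, sizeof(key), seed + i);
            /* use h, e.g. as a bit index into a Bloom filter */
    }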

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
---
 include/uapi/linux/bpf.h       |  8 ++++++++
 kernel/bpf/helpers.c           | 23 +++++++++++++++++++++++
 tools/include/uapi/linux/bpf.h |  8 ++++++++
 3 files changed, 39 insertions(+)

diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 2e3048488feb..7b17e46b0be2 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -4913,6 +4913,13 @@ union bpf_attr {
  *	Return
  *		The number of bytes written to the buffer, or a negative error
  *		in case of failure.
+ *
+ * u64 bpf_jhash_mem(const void *data, u32 sz, u32 seed)
+ *	Description
+ *		Calculate jhash of a given memory slice.
+ *
+ *	Return
+ *		Calculated hash value.
  */
 #define __BPF_FUNC_MAPPER(FN)		\
 	FN(unspec),			\
@@ -5093,6 +5100,7 @@ union bpf_attr {
 	FN(task_pt_regs),		\
 	FN(get_branch_snapshot),	\
 	FN(trace_vprintk),		\
+	FN(jhash_mem),			\
 	/* */
 
 /* integer value in 'imm' field of BPF_CALL instruction selects which helper
diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
index 2c604ff8c7fb..00fb1b69595c 100644
--- a/kernel/bpf/helpers.c
+++ b/kernel/bpf/helpers.c
@@ -15,9 +15,30 @@
 #include <linux/pid_namespace.h>
 #include <linux/proc_ns.h>
 #include <linux/security.h>
+#include <linux/jhash.h>
 
 #include "../../lib/kstrtox.h"
 
+BPF_CALL_3(bpf_jhash_mem, const void *, data, u32, sz, u32, seed)
+{
+	if (unlikely((uintptr_t)data % 4 || sz % 4))
+		return jhash(data, sz, seed);
+
+	/* optimized 4-byte version */
+	return jhash2(data, sz / 4, seed);
+}
+
+
+static const struct bpf_func_proto bpf_jhash_mem_proto = {
+	.func		= bpf_jhash_mem,
+	.gpl_only	= false,
+	.ret_type	= RET_INTEGER,
+	.arg1_type	= ARG_PTR_TO_MEM,
+	.arg2_type	= ARG_CONST_SIZE_OR_ZERO,
+	.arg3_type	= ARG_ANYTHING,
+};
+
+
 /* If kernel subsystem is allowing eBPF programs to call this function,
  * inside its own verifier_ops->get_func_proto() callback it should return
  * bpf_map_lookup_elem_proto, so that verifier can properly check the arguments
@@ -1341,6 +1362,8 @@ const struct bpf_func_proto *
 bpf_base_func_proto(enum bpf_func_id func_id)
 {
 	switch (func_id) {
+	case BPF_FUNC_jhash_mem:
+		return &bpf_jhash_mem_proto;
 	case BPF_FUNC_map_lookup_elem:
 		return &bpf_map_lookup_elem_proto;
 	case BPF_FUNC_map_update_elem:
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index 2e3048488feb..7b17e46b0be2 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -4913,6 +4913,13 @@ union bpf_attr {
  *	Return
  *		The number of bytes written to the buffer, or a negative error
  *		in case of failure.
+ *
+ * u64 bpf_jhash_mem(const void *data, u32 sz, u32 seed)
+ *	Description
+ *		Calculate jhash of a given memory slice.
+ *
+ *	Return
+ *		Calculated hash value.
  */
 #define __BPF_FUNC_MAPPER(FN)		\
 	FN(unspec),			\
@@ -5093,6 +5100,7 @@ union bpf_attr {
 	FN(task_pt_regs),		\
 	FN(get_branch_snapshot),	\
 	FN(trace_vprintk),		\
+	FN(jhash_mem),			\
 	/* */
 
 /* integer value in 'imm' field of BPF_CALL instruction selects which helper
-- 
2.30.2


* [PATCH RFC bpf-next 2/4] selftests/bpf: fix and optimize bloom filter bench
From: Andrii Nakryiko @ 2021-09-22 20:32 UTC
  To: bpf, ast, daniel; +Cc: kafai, joannekoong, Andrii Nakryiko

Fix inconsistent and highly variable measurement logic by using
always-increasing counters and reporting per-interval deltas. Also reduce the
amount of accounting overhead by incrementing stats once after the entire
loop, use rodata for the bloom filter flag, and build bench files in
optimized mode.
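
The fix follows the usual monotonic-counter pattern; a sketch of what the
reworked measure() does (full version in the diff below):

    /* BPF-side counters only ever grow; report per-interval deltas
     * instead of zeroing shared state between measurements.
     */
    static long last_hits;
    long total_hits = 0;
    int i;

    for (i = 0; i < nr_cpus; i++)
            total_hits += percpu_stats[i].stats[hit_key]; /* read from the skeleton's .bss */

    res->hits = total_hits - last_hits;
    last_hits = total_hits;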

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
---
 tools/testing/selftests/bpf/Makefile          |  2 +-
 .../bpf/benchs/bench_bloom_filter_map.c       | 32 +++++++++++++-
 .../selftests/bpf/progs/bloom_filter_map.c    | 44 ++++++++++++++-----
 3 files changed, 63 insertions(+), 15 deletions(-)

diff --git a/tools/testing/selftests/bpf/Makefile b/tools/testing/selftests/bpf/Makefile
index 5dbaf7f512fd..f36f49e4d0f2 100644
--- a/tools/testing/selftests/bpf/Makefile
+++ b/tools/testing/selftests/bpf/Makefile
@@ -515,7 +515,7 @@ $(OUTPUT)/test_cpp: test_cpp.cpp $(OUTPUT)/test_core_extern.skel.h $(BPFOBJ)
 # Benchmark runner
 $(OUTPUT)/bench_%.o: benchs/bench_%.c bench.h $(BPFOBJ)
 	$(call msg,CC,,$@)
-	$(Q)$(CC) $(CFLAGS) -c $(filter %.c,$^) $(LDLIBS) -o $@
+	$(Q)$(CC) $(CFLAGS) -O2 -c $(filter %.c,$^) $(LDLIBS) -o $@
 $(OUTPUT)/bench_rename.o: $(OUTPUT)/test_overhead.skel.h
 $(OUTPUT)/bench_trigger.o: $(OUTPUT)/trigger_bench.skel.h
 $(OUTPUT)/bench_ringbufs.o: $(OUTPUT)/ringbuf_bench.skel.h \
diff --git a/tools/testing/selftests/bpf/benchs/bench_bloom_filter_map.c b/tools/testing/selftests/bpf/benchs/bench_bloom_filter_map.c
index 7adf80be7292..4f53cd9fb099 100644
--- a/tools/testing/selftests/bpf/benchs/bench_bloom_filter_map.c
+++ b/tools/testing/selftests/bpf/benchs/bench_bloom_filter_map.c
@@ -248,7 +248,7 @@ static void hashmap_no_bloom_filter_setup(void)
 
 	ctx.skel = setup_skeleton();
 
-	ctx.skel->data->hashmap_use_bloom_filter = false;
+	ctx.skel->rodata->hashmap_use_bloom_filter = false;
 
 	populate_maps();
 
@@ -259,16 +259,26 @@ static void hashmap_no_bloom_filter_setup(void)
 	}
 }
 
+struct stat { __u32 stats[3]; };
+
 static void measure(struct bench_res *res)
 {
+	static long last_hits, last_drops, last_false_hits;
 	long total_hits = 0, total_drops = 0, total_false_hits = 0;
 	unsigned int nr_cpus = bpf_num_possible_cpus();
+	/*
 	BPF_DECLARE_PERCPU(__u64, zeroed_values);
 	BPF_DECLARE_PERCPU(__u64, false_hits);
 	BPF_DECLARE_PERCPU(__u64, drops);
 	BPF_DECLARE_PERCPU(__u64, hits);
 	int err, i, percpu_array_fd;
 	__u32 key;
+	*/
+	int i;
+	int hit_key, drop_key, false_hit_key;
+	hit_key = ctx.skel->rodata->hit_key;
+	drop_key = ctx.skel->rodata->drop_key;
+	false_hit_key = ctx.skel->rodata->false_hit_key;
 
 	if (ctx.skel->bss->error != 0) {
 		fprintf(stderr, "error (%d) when searching the bloom filter\n",
@@ -276,6 +286,7 @@ static void measure(struct bench_res *res)
 		exit(1);
 	}
 
+	/*
 	key = ctx.skel->rodata->hit_key;
 	percpu_array_fd = bpf_map__fd(ctx.skel->maps.percpu_array);
 	err = bpf_map_lookup_elem(percpu_array_fd, &key, hits);
@@ -300,20 +311,36 @@ static void measure(struct bench_res *res)
 			-errno);
 		exit(1);
 	}
+	*/
 
 	for (i = 0; i < nr_cpus; i++) {
+		struct stat *s = (void *)&ctx.skel->bss->percpu_stats[i];
+
+		total_hits += s->stats[hit_key];
+		total_drops += s->stats[drop_key];
+		total_false_hits += s->stats[false_hit_key];
+		/*
 		total_hits += bpf_percpu(hits, i);
 		total_drops += bpf_percpu(drops, i);
 		total_false_hits += bpf_percpu(false_hits, i);
+		*/
 	}
 
+	res->hits = total_hits - last_hits;
+	res->drops = total_drops - last_drops;
+	res->false_hits = total_false_hits - last_false_hits;
+
+	last_hits = total_hits;
+	last_drops = total_drops;
+	last_false_hits = total_false_hits;
+
+	/*
 	res->hits = total_hits;
 	res->drops = total_drops;
 	res->false_hits = total_false_hits;
 
 	memset(zeroed_values, 0, sizeof(zeroed_values));
 
-	/* zero out the percpu array */
 	key = ctx.skel->rodata->hit_key;
 	err = bpf_map_update_elem(percpu_array_fd, &key, zeroed_values, BPF_ANY);
 	if (err) {
@@ -332,6 +359,7 @@ static void measure(struct bench_res *res)
 		fprintf(stderr, "zeroing the percpu array failed: %d\n", -errno);
 		exit(1);
 	}
+	*/
 }
 
 static void *consumer(void *input)
diff --git a/tools/testing/selftests/bpf/progs/bloom_filter_map.c b/tools/testing/selftests/bpf/progs/bloom_filter_map.c
index 05f9706a5ba6..3ae2f9bb5968 100644
--- a/tools/testing/selftests/bpf/progs/bloom_filter_map.c
+++ b/tools/testing/selftests/bpf/progs/bloom_filter_map.c
@@ -23,6 +23,7 @@ struct map_bloom_filter_type {
 	__uint(value_size, sizeof(__u64));
 	__uint(max_entries, 1000);
 	__uint(nr_hash_funcs, 3);
+//	__uint(map_flags, BPF_F_ZERO_SEED);
 } map_bloom_filter SEC(".maps");
 
 struct {
@@ -37,13 +38,19 @@ struct callback_ctx {
 	struct map_bloom_filter_type *map;
 };
 
+struct {
+	__u32 stats[3];
+} __attribute__((aligned(256))) percpu_stats[256];
+
 /* Tracks the number of hits, drops, and false hits */
+/*
 struct {
 	__uint(type, BPF_MAP_TYPE_PERCPU_ARRAY);
 	__uint(max_entries, 3);
 	__type(key, __u32);
 	__type(value, __u64);
 } percpu_array SEC(".maps");
+*/
 
 struct {
 	__uint(type, BPF_MAP_TYPE_HASH);
@@ -56,17 +63,23 @@ const __u32 hit_key  = 0;
 const __u32 drop_key  = 1;
 const __u32 false_hit_key = 2;
 
-bool hashmap_use_bloom_filter = true;
+const volatile bool hashmap_use_bloom_filter = true;
 
 int error = 0;
 
-static __always_inline void log_result(__u32 key)
+static __always_inline void log_result(__u32 key, __u32 val)
 {
+	__u32 cpu = bpf_get_smp_processor_id();
+
+	percpu_stats[cpu & 255].stats[key] += val;
+
+	/*
 	__u64 *count;
 
 	count = bpf_map_lookup_elem(&percpu_array, &key);
 	if (count)
-		*count += 1;
+		*count += val;
+	*/
 }
 
 static __u64
@@ -81,7 +94,7 @@ check_elem(struct bpf_map *map, __u32 *key, __u64 *val,
 		return 1; /* stop the iteration */
 	}
 
-	log_result(hit_key);
+	log_result(hit_key, 1);
 
 	return 0;
 }
@@ -121,37 +134,44 @@ int prog_bloom_filter_hashmap_lookup(void *ctx)
 {
 	__u64 *result;
 	int i, err;
-
 	union {
 		__u64 data64;
 		__u32 data32[2];
 	} val;
+	int hits = 0, drops = 0, false_hits = 0;
+	//bool custom_hit = false, noncustom_hit = false;
 
 	for (i = 0; i < 512; i++) {
-		val.data32[0] = bpf_get_prandom_u32();
-		val.data32[1] = bpf_get_prandom_u32();
+		val.data32[0] = /*i; */ bpf_get_prandom_u32();
+		val.data32[1] = /*i + 1;*/ bpf_get_prandom_u32();
 
-		if (hashmap_use_bloom_filter) {
+		if (hashmap_use_bloom_filter)
+		{
 			err = bpf_map_peek_elem(&map_bloom_filter, &val);
 			if (err) {
 				if (err != -ENOENT) {
 					error |= 3;
 					return 0;
 				}
-				log_result(drop_key);
+				hits++;
+				//noncustom_hit = true;
+				//__sync_fetch_and_add(&bloom_noncustom_hit, 1);
 				continue;
 			}
 		}
 
 		result = bpf_map_lookup_elem(&hashmap, &val);
 		if (result) {
-			log_result(hit_key);
+			hits++;
 		} else {
 			if (hashmap_use_bloom_filter)
-				log_result(false_hit_key);
-			log_result(drop_key);
+				false_hits++;
+			drops++;
 		}
 	}
+	log_result(hit_key, hits);
+	log_result(drop_key, drops);
+	log_result(false_hit_key, false_hits);
 
 	return 0;
 }
-- 
2.30.2


* [PATCH RFC bpf-next 3/4] selftests/bpf: implement custom Bloom filter purely in BPF
From: Andrii Nakryiko @ 2021-09-22 20:32 UTC
  To: bpf, ast, daniel; +Cc: kafai, joannekoong, Andrii Nakryiko

And integrate it into the existing benchmarks (on the BPF side). This is
posted separately from the user-space benchmark changes to emphasize how
little code is necessary on the BPF side to make this work.
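
One step worth spelling out in the code below: each 32-bit hash is split into
a 64-bit-word index and a bit index, so adds stay atomic per word
(__sync_fetch_and_or) and a lookup touches one word per hash function. Since
the array length is a power of two, masking with bloom_mask stands in for a
modulo:

    v = bloom_arr[(h / 64) & bloom_mask]; /* pick the 64-bit word */
    bit_set = (v >> (h % 64)) & 1;        /* pick the bit within it */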

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
---
 .../selftests/bpf/progs/bloom_filter_map.c    | 81 ++++++++++++++++++-
 1 file changed, 80 insertions(+), 1 deletion(-)

diff --git a/tools/testing/selftests/bpf/progs/bloom_filter_map.c b/tools/testing/selftests/bpf/progs/bloom_filter_map.c
index 3ae2f9bb5968..ee7bbde1af45 100644
--- a/tools/testing/selftests/bpf/progs/bloom_filter_map.c
+++ b/tools/testing/selftests/bpf/progs/bloom_filter_map.c
@@ -64,9 +64,43 @@ const __u32 drop_key  = 1;
 const __u32 false_hit_key = 2;
 
 const volatile bool hashmap_use_bloom_filter = true;
+const volatile bool hashmap_use_custom_bloom_filter = false;
+
+
+__u64 bloom_val = 0;
+static __u64 bloom_arr[256 * 1024]; /* enough for 1mln max_elems w/ 10 hash funcs */
+const volatile __u32 bloom_mask;
+const volatile __u32 bloom_hash_cnt;
+const volatile __u32 bloom_seed;
 
 int error = 0;
 
+static void bloom_add(const void *data, __u32 sz)
+{
+	__u32 i = 0, h;
+
+	for (i = 0; i < bloom_hash_cnt; i++) {
+		h = bpf_jhash_mem(data, sz, bloom_seed + i);
+		__sync_fetch_and_or(&bloom_arr[(h / 64) & bloom_mask], 1ULL << (h % 64));
+	}
+}
+
+bool bloom_contains(__u64 val)
+{
+	__u32 i = 0, h;
+	__u32 seed = bloom_seed, msk = bloom_mask;
+	__u64 v, *arr = bloom_arr;
+
+	for (i = bloom_hash_cnt; i > 0; i--) {
+		h = bpf_jhash_mem(&val, sizeof(val), seed);
+		v = arr[(h / 64) & msk];
+		if (!((v >> (h % 64)) & 1))
+			return false;
+		seed++;
+	}
+	return true;
+}
+
 static __always_inline void log_result(__u32 key, __u32 val)
 {
 	__u32 cpu = bpf_get_smp_processor_id();
@@ -99,6 +133,20 @@ check_elem(struct bpf_map *map, __u32 *key, __u64 *val,
 	return 0;
 }
 
+static __u64
+check_elem_custom(struct bpf_map *map, __u32 *key, __u64 *val,
+	   struct callback_ctx *data)
+{
+	if (!bloom_contains(*val)) {
+		error |= 1;
+		return 1; /* stop the iteration */
+	}
+
+	log_result(hit_key, 1);
+
+	return 0;
+}
+
 SEC("fentry/__x64_sys_getpgid")
 int prog_bloom_filter(void *ctx)
 {
@@ -110,6 +158,26 @@ int prog_bloom_filter(void *ctx)
 	return 0;
 }
 
+SEC("fentry/__x64_sys_getpgid")
+int prog_custom_bloom_filter(void *ctx)
+{
+	bpf_for_each_map_elem(&map_random_data, check_elem_custom, NULL, 0);
+
+	return 0;
+}
+
+__u32 bloom_err = 0;
+__u32 bloom_custom_hit = 0;
+__u32 bloom_noncustom_hit = 0;
+
+SEC("fentry/__x64_sys_getpgid")
+int prog_custom_bloom_filter_add(void *ctx)
+{
+	bloom_add(&bloom_val, sizeof(bloom_val));
+
+	return 0;
+}
+
 SEC("fentry/__x64_sys_getpgid")
 int prog_bloom_filter_inner_map(void *ctx)
 {
@@ -145,6 +213,15 @@ int prog_bloom_filter_hashmap_lookup(void *ctx)
 		val.data32[0] = /*i; */ bpf_get_prandom_u32();
 		val.data32[1] = /*i + 1;*/ bpf_get_prandom_u32();
 
+		if (hashmap_use_custom_bloom_filter)
+		{
+			if (!bloom_contains(val.data64)) {
+				hits++;
+				//custom_hit = true;
+				//__sync_fetch_and_add(&bloom_custom_hit, 1);
+				continue;
+			}
+		} 
 		if (hashmap_use_bloom_filter)
 		{
 			err = bpf_map_peek_elem(&map_bloom_filter, &val);
@@ -160,11 +237,13 @@ int prog_bloom_filter_hashmap_lookup(void *ctx)
 			}
 		}
 
+		//bloom_err += (custom_hit != noncustom_hit);
+
 		result = bpf_map_lookup_elem(&hashmap, &val);
 		if (result) {
 			hits++;
 		} else {
-			if (hashmap_use_bloom_filter)
+			if (hashmap_use_custom_bloom_filter || hashmap_use_bloom_filter)
 				false_hits++;
 			drops++;
 		}
-- 
2.30.2


* [PATCH RFC bpf-next 4/4] selftests/bpf: integrate custom Bloom filter impl into benchs
From: Andrii Nakryiko @ 2021-09-22 20:32 UTC
  To: bpf, ast, daniel; +Cc: kafai, joannekoong, Andrii Nakryiko

Add the user-space integration parts: new benchmark definitions, skeleton
setup (including Bloom filter sizing), and updated runner scripts.
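
The one non-obvious piece is sizing the filter before skeleton load: the 7/5
factor below gives roughly 1.4 bits per expected element per hash function,
rounded up to the next power of two so the BPF side can mask instead of
taking a modulo (this mirrors setup_skeleton() in the diff):

    bloom_sz = (long)args.nr_hash_funcs * args.nr_entries / 5 * 7;
    for (i = 64; i < bloom_sz; i *= 2)
            ;                              /* round up to a power of two */
    skel->rodata->bloom_mask = i / 64 - 1; /* word-index mask */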

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
---
 tools/testing/selftests/bpf/bench.c           |   6 +
 .../bpf/benchs/bench_bloom_filter_map.c       | 121 +++++++++++++++++-
 .../bpf/benchs/run_bench_bloom_filter_map.sh  |  22 ++--
 .../selftests/bpf/benchs/run_common.sh        |   2 +-
 4 files changed, 135 insertions(+), 16 deletions(-)

diff --git a/tools/testing/selftests/bpf/bench.c b/tools/testing/selftests/bpf/bench.c
index 7da1589a9fe0..ab03b259b76f 100644
--- a/tools/testing/selftests/bpf/bench.c
+++ b/tools/testing/selftests/bpf/bench.c
@@ -363,9 +363,12 @@ extern const struct bench bench_rb_custom;
 extern const struct bench bench_pb_libbpf;
 extern const struct bench bench_pb_custom;
 extern const struct bench bench_bloom_filter_map;
+extern const struct bench bench_custom_bloom_filter_map;
 extern const struct bench bench_bloom_filter_false_positive;
+extern const struct bench bench_custom_bloom_filter_false_positive;
 extern const struct bench bench_hashmap_without_bloom_filter;
 extern const struct bench bench_hashmap_with_bloom_filter;
+extern const struct bench bench_hashmap_with_custom_bloom_filter;
 
 static const struct bench *benchs[] = {
 	&bench_count_global,
@@ -388,9 +391,12 @@ static const struct bench *benchs[] = {
 	&bench_pb_libbpf,
 	&bench_pb_custom,
 	&bench_bloom_filter_map,
+	&bench_custom_bloom_filter_map,
 	&bench_bloom_filter_false_positive,
+	&bench_custom_bloom_filter_false_positive,
 	&bench_hashmap_without_bloom_filter,
 	&bench_hashmap_with_bloom_filter,
+	&bench_hashmap_with_custom_bloom_filter,
 };
 
 static void setup_benchmark()
diff --git a/tools/testing/selftests/bpf/benchs/bench_bloom_filter_map.c b/tools/testing/selftests/bpf/benchs/bench_bloom_filter_map.c
index 4f53cd9fb099..c0ccfbaacef7 100644
--- a/tools/testing/selftests/bpf/benchs/bench_bloom_filter_map.c
+++ b/tools/testing/selftests/bpf/benchs/bench_bloom_filter_map.c
@@ -95,11 +95,18 @@ static void *map_prepare_thread(void *arg)
 {
 	int err, random_data_fd, bloom_filter_fd, hashmap_fd;
 	__u64 i, val;
+	struct bpf_link *link;
 
 	bloom_filter_fd = bpf_map__fd(ctx.skel->maps.map_bloom_filter);
 	random_data_fd = bpf_map__fd(ctx.skel->maps.map_random_data);
 	hashmap_fd = bpf_map__fd(ctx.skel->maps.hashmap);
 
+	link = bpf_program__attach(ctx.skel->progs.prog_custom_bloom_filter_add);
+	if (libbpf_get_error(link)) {
+		fprintf(stderr, "failed to attach program!\n");
+		exit(1);
+	}
+
 	while (true) {
 		i = __atomic_add_fetch(&ctx.next_map_idx, 1, __ATOMIC_RELAXED);
 		if (i > args.nr_entries)
@@ -135,8 +142,13 @@ static void *map_prepare_thread(void *arg)
 			fprintf(stderr, "failed to add elem to bloom_filter: %d\n", -errno);
 			break;
 		}
+
+		ctx.skel->bss->bloom_val = val;
+		trigger_bpf_program();
 	}
 
+	bpf_link__destroy(link);
+
 	pthread_mutex_lock(&ctx.map_done_mtx);
 	pthread_cond_signal(&ctx.map_done);
 	pthread_mutex_unlock(&ctx.map_done_mtx);
@@ -146,7 +158,7 @@ static void *map_prepare_thread(void *arg)
 
 static void populate_maps(void)
 {
-	unsigned int nr_cpus = bpf_num_possible_cpus();
+	unsigned int nr_cpus = 1; // bpf_num_possible_cpus();
 	pthread_t map_thread;
 	int i, err;
 
@@ -167,10 +179,10 @@ static void populate_maps(void)
 		exit(1);
 }
 
-static struct bloom_filter_map *setup_skeleton(void)
+static struct bloom_filter_map *setup_skeleton()
 {
 	struct bloom_filter_map *skel;
-	int err;
+	int err, i, bloom_sz;
 
 	setup_libbpf();
 
@@ -204,10 +216,17 @@ static struct bloom_filter_map *setup_skeleton(void)
 		exit(1);
 	}
 
-	if (bloom_filter_map__load(skel)) {
-		fprintf(stderr, "failed to load skeleton\n");
-		exit(1);
+	skel->rodata->bloom_hash_cnt = args.nr_hash_funcs;
+	skel->rodata->bloom_seed = /* 0; */ rand();
+
+	bloom_sz = (long)args.nr_hash_funcs * args.nr_entries / 5 * 7;
+	for (i = 64; i < bloom_sz; i *= 2) {
 	}
+	bloom_sz = i / 64;
+	skel->rodata->bloom_mask = bloom_sz - 1;
+
+	//printf("SET BLOOM SZ TO %d NR_ENTRIES %d HASH CNT %d \n", bloom_sz, args.nr_entries, args.nr_hash_funcs);
+
 
 	return skel;
 }
@@ -218,6 +237,11 @@ static void bloom_filter_map_setup(void)
 
 	ctx.skel = setup_skeleton();
 
+	if (bloom_filter_map__load(ctx.skel)) {
+		fprintf(stderr, "failed to load skeleton\n");
+		exit(1);
+	}
+
 	populate_maps();
 
 	link = bpf_program__attach(ctx.skel->progs.prog_bloom_filter);
@@ -227,12 +251,59 @@ static void bloom_filter_map_setup(void)
 	}
 }
 
+static void custom_bloom_filter_map_setup(void)
+{
+	struct bpf_link *link;
+
+	ctx.skel = setup_skeleton();
+
+	if (bloom_filter_map__load(ctx.skel)) {
+		fprintf(stderr, "failed to load skeleton\n");
+		exit(1);
+	}
+
+	populate_maps();
+
+	link = bpf_program__attach(ctx.skel->progs.prog_custom_bloom_filter);
+	if (!link) {
+		fprintf(stderr, "failed to attach program!\n");
+		exit(1);
+	}
+}
+
 static void hashmap_lookup_setup(void)
 {
 	struct bpf_link *link;
 
 	ctx.skel = setup_skeleton();
 
+	if (bloom_filter_map__load(ctx.skel)) {
+		fprintf(stderr, "failed to load skeleton\n");
+		exit(1);
+	}
+
+	populate_maps();
+
+	link = bpf_program__attach(ctx.skel->progs.prog_bloom_filter_hashmap_lookup);
+	if (!link) {
+		fprintf(stderr, "failed to attach program!\n");
+		exit(1);
+	}
+}
+static void hashmap_lookup_custom_setup(void)
+{
+	struct bpf_link *link;
+
+	ctx.skel = setup_skeleton();
+
+	ctx.skel->rodata->hashmap_use_bloom_filter = false;
+	ctx.skel->rodata->hashmap_use_custom_bloom_filter = true;
+
+	if (bloom_filter_map__load(ctx.skel)) {
+		fprintf(stderr, "failed to load skeleton\n");
+		exit(1);
+	}
+
 	populate_maps();
 
 	link = bpf_program__attach(ctx.skel->progs.prog_bloom_filter_hashmap_lookup);
@@ -250,6 +321,11 @@ static void hashmap_no_bloom_filter_setup(void)
 
 	ctx.skel->rodata->hashmap_use_bloom_filter = false;
 
+	if (bloom_filter_map__load(ctx.skel)) {
+		fprintf(stderr, "failed to load skeleton\n");
+		exit(1);
+	}
+
 	populate_maps();
 
 	link = bpf_program__attach(ctx.skel->progs.prog_bloom_filter_hashmap_lookup);
@@ -378,6 +454,17 @@ const struct bench bench_bloom_filter_map = {
 	.report_final = hits_drops_report_final,
 };
 
+const struct bench bench_custom_bloom_filter_map = {
+	.name = "custom-bloom-filter-map",
+	.validate = validate,
+	.setup = custom_bloom_filter_map_setup,
+	.producer_thread = producer,
+	.consumer_thread = consumer,
+	.measure = measure,
+	.report_progress = hits_drops_report_progress,
+	.report_final = hits_drops_report_final,
+};
+
 const struct bench bench_bloom_filter_false_positive = {
 	.name = "bloom-filter-false-positive",
 	.validate = validate,
@@ -389,6 +476,17 @@ const struct bench bench_bloom_filter_false_positive = {
 	.report_final = false_hits_report_final,
 };
 
+const struct bench bench_custom_bloom_filter_false_positive = {
+	.name = "custom-bloom-filter-false-positive",
+	.validate = validate,
+	.setup = hashmap_lookup_custom_setup,
+	.producer_thread = producer,
+	.consumer_thread = consumer,
+	.measure = measure,
+	.report_progress = false_hits_report_progress,
+	.report_final = false_hits_report_final,
+};
+
 const struct bench bench_hashmap_without_bloom_filter = {
 	.name = "hashmap-without-bloom-filter",
 	.validate = validate,
@@ -410,3 +508,14 @@ const struct bench bench_hashmap_with_bloom_filter = {
 	.report_progress = hits_drops_report_progress,
 	.report_final = hits_drops_report_final,
 };
+
+const struct bench bench_hashmap_with_custom_bloom_filter = {
+	.name = "hashmap-with-custom-bloom-filter",
+	.validate = validate,
+	.setup = hashmap_lookup_custom_setup,
+	.producer_thread = producer,
+	.consumer_thread = consumer,
+	.measure = measure,
+	.report_progress = hits_drops_report_progress,
+	.report_final = hits_drops_report_final,
+};
diff --git a/tools/testing/selftests/bpf/benchs/run_bench_bloom_filter_map.sh b/tools/testing/selftests/bpf/benchs/run_bench_bloom_filter_map.sh
index 239c040b7aaa..ea2c2fff3f40 100755
--- a/tools/testing/selftests/bpf/benchs/run_bench_bloom_filter_map.sh
+++ b/tools/testing/selftests/bpf/benchs/run_bench_bloom_filter_map.sh
@@ -7,9 +7,9 @@ set -eufo pipefail
 
 header "Bloom filter map"
 for t in 1 4 8; do
-for h in {1..10}; do
+for h in 1 3 5 10; do
 subtitle "# threads: $t, # hashes: $h"
-	for e in 10000 50000 75000 100000 250000 500000 750000 1000000 2500000 5000000; do
+	for e in 10000 100000 1000000; do
 		printf "%'d entries -\n" $e
 		printf "\t"
 		summarize "Total operations: " \
@@ -17,20 +17,21 @@ subtitle "# threads: $t, # hashes: $h"
 		printf "\t"
 		summarize_percentage "False positive rate: " \
 			"$($RUN_BENCH -p $t --nr_hash_funcs $h --nr_entries $e bloom-filter-false-positive)"
+		printf "\t"
+		summarize "[CUSTOM] Total operations: " \
+			"$($RUN_BENCH -p $t --nr_hash_funcs $h --nr_entries $e custom-bloom-filter-map)"
+		printf "\t"
+		summarize_percentage "[CUSTOM] False positive rate: " \
+			"$($RUN_BENCH -p $t --nr_hash_funcs $h --nr_entries $e custom-bloom-filter-false-positive)"
 	done
 	printf "\n"
 done
 done
 
-header "Bloom filter map, multi-producer contention"
-for t in 1 2 3 4 8 12 16 20 24 28 32 36 40 44 48 52; do
-	summarize "$t threads - " "$($RUN_BENCH -p $t bloom-filter-map)"
-done
-
 header "Hashmap without bloom filter vs. hashmap with bloom filter (throughput, 8 threads)"
-for h in {1..10}; do
+for h in 1 3 5 10; do
 subtitle "# hashes: $h"
-	for e in 10000 50000 75000 100000 250000 500000 750000 1000000 2500000 5000000; do
+	for e in 10000 100000 1000000; do
 		printf "%'d entries -\n" $e
 		printf "\t"
 		summarize_total "Hashmap without bloom filter: " \
@@ -38,6 +39,9 @@ subtitle "# hashes: $h"
 		printf "\t"
 		summarize_total "Hashmap with bloom filter: " \
 			"$($RUN_BENCH --nr_hash_funcs $h --nr_entries $e -p 8 hashmap-with-bloom-filter)"
+		printf "\t"
+		summarize_total "[CUSTOM] Hashmap with bloom filter: " \
+			"$($RUN_BENCH --nr_hash_funcs $h --nr_entries $e -p 8 hashmap-with-custom-bloom-filter)"
 	done
 	printf "\n"
 done
diff --git a/tools/testing/selftests/bpf/benchs/run_common.sh b/tools/testing/selftests/bpf/benchs/run_common.sh
index 9a16be78b180..961d25374150 100644
--- a/tools/testing/selftests/bpf/benchs/run_common.sh
+++ b/tools/testing/selftests/bpf/benchs/run_common.sh
@@ -1,7 +1,7 @@
 #!/bin/bash
 # SPDX-License-Identifier: GPL-2.0
 
-RUN_BENCH="sudo ./bench -w3 -d10 -a"
+RUN_BENCH="sudo ./bench -w1 -d5 -a"
 
 function header()
 {
-- 
2.30.2

