All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH bpf-next 00/10] Add support for qp-trie map
@ 2022-09-17 15:31 Hou Tao
  2022-09-17 15:31 ` [PATCH bpf-next 01/10] bpf: Export bpf_dynptr_{get|set}_size Hou Tao
                   ` (9 more replies)
  0 siblings, 10 replies; 13+ messages in thread
From: Hou Tao @ 2022-09-17 15:31 UTC (permalink / raw)
  To: bpf, Andrii Nakryiko
  Cc: Song Liu, Hao Luo, Yonghong Song, Alexei Starovoitov,
	Daniel Borkmann, Martin KaFai Lau, KP Singh, David S . Miller,
	Jakub Kicinski, Stanislav Fomichev, Jiri Olsa, John Fastabend,
	Lorenz Bauer, Paul E . McKenney, houtao1

From: Hou Tao <houtao1@huawei.com>

Hi,

The initial motivation for qp-trie map is to reduce memory usage for
string keys special those with large differencies in length as
discussed in [0]. And as a big-endian lexicographical ordered map, it
can also be used for any binary data with fixed or variable length.

Now the basic functionality of qp-trie is ready, so posting it to get
more feedback or suggestions about qp-trie. Specially feedback
about the following questions:

(1) Use bpf_dynptr in map key
Now qp-trie uses bpf_dynptr as map key. For simplicity a new map flags
BPF_F_DYNPTR_KEY is introduced, but it is not usable for map key in
which a bpf_dynptr is embedded. So maybe a better and generic way is to
check the offset of bpf_dynptr in map key during map creation and use
the offset when initializing or verifying map key ?

(2) Use cases for qp-trie
Andrii had proposed to re-implement lpm-trie by using qp-trie. The
advantage would be the speed up of lookup operations due to lower tree
depth of qp-trie and the performance of update may also increase.
But is there any other use cases for qp-trie ? Specially those cases
which need both ordering and memory efficiency or cases in which qp-trie
will have high branch factor and its lookup performance will be much
better than hash-table as shown below:

  Randomly-generated binary data (key size=255, max entries=16K, key length range:[1, 255])
  htab lookup      (1  thread)    4.968 ± 0.009M/s (drops 0.002 ± 0.000M/s mem 8.169 MiB)
  htab lookup      (2  thread)   10.118 ± 0.010M/s (drops 0.007 ± 0.000M/s mem 8.169 MiB)
  htab lookup      (4  thread)   20.084 ± 0.022M/s (drops 0.007 ± 0.000M/s mem 8.168 MiB)
  htab lookup      (8  thread)   39.866 ± 0.047M/s (drops 0.010 ± 0.000M/s mem 8.168 MiB)
  htab lookup      (16 thread)   79.412 ± 0.065M/s (drops 0.049 ± 0.000M/s mem 8.169 MiB)
  
  qp-trie lookup   (1  thread)   10.291 ± 0.007M/s (drops 0.004 ± 0.000M/s mem 4.899 MiB)
  qp-trie lookup   (2  thread)   20.797 ± 0.009M/s (drops 0.006 ± 0.000M/s mem 4.879 MiB)
  qp-trie lookup   (4  thread)   41.943 ± 0.019M/s (drops 0.015 ± 0.000M/s mem 4.262 MiB)
  qp-trie lookup   (8  thread)   81.985 ± 0.032M/s (drops 0.025 ± 0.000M/s mem 4.215 MiB)
  qp-trie lookup   (16 thread)  164.681 ± 0.051M/s (drops 0.050 ± 0.000M/s mem 4.261 MiB)

  * non-zero drops is due to duplicated keys in generated keys.

(3) Improve update performance for qp-trie
Now qp-trie is divided into 256 sub-trees by using the first byte as the
key and one sub-tree is protected one spinlock. I also had tried the
hierarchical lock by using hand-over-hand lock scheme [1], but it
doesn't scale well. Update performance of hash-table has been optimized
a lot by "bpf: BPF specific memory allocator", but because qp-trie
allocates memory for a variable-length key and bpf_mem_alloc() only
support <=4KB memory, so it doesn't fill well for qp-trie.

  Strings in /proc/kallsyms (key size=83, max entries=170958)
  htab update      (1  thread)    3.135 ± 0.355M/s (drops 0.000 ± 0.000M/s mem 30.842 MiB)
  htab update      (2  thread)    6.269 ± 0.686M/s (drops 0.000 ± 0.000M/s mem 30.841 MiB)
  htab update      (4  thread)   11.607 ± 1.632M/s (drops 0.000 ± 0.000M/s mem 30.840 MiB)
  htab update      (8  thread)   23.041 ± 0.806M/s (drops 0.000 ± 0.000M/s mem 30.842 MiB)
  htab update      (16 thread)   31.393 ± 0.307M/s (drops 0.000 ± 0.000M/s mem 30.835 MiB)
  
  qp-trie update   (1  thread)    1.191 ± 0.093M/s (drops 0.000 ± 0.000M/s mem 17.170 MiB)
  qp-trie update   (2  thread)    2.057 ± 0.041M/s (drops 0.000 ± 0.000M/s mem 17.058 MiB)
  qp-trie update   (4  thread)    2.975 ± 0.035M/s (drops 0.000 ± 0.000M/s mem 17.411 MiB)
  qp-trie update   (8  thread)    3.596 ± 0.031M/s (drops 0.000 ± 0.000M/s mem 17.110 MiB)
  qp-trie update   (16 thread)    4.200 ± 0.048M/s (drops 0.000 ± 0.000M/s mem 17.228 MiB)
  
(4) Improve memory efficiency further
When using strings in BTF string section as a data set for qp-trie, the
slab memory usage showed in cgroup memory.stats file is about 11MB for
qp-trie and 22MB for hash table as shown below. However the theoretical
memory usage for qp-trie is ~6.8MB (is ~4.9MB if removing "parent" & "rcu"
fields from qp_trie_branch) and the extra memory usage (about 38% of total
usage) mainly comes from internal fragment in slab (namely 2^n alignment
for allocation) and overhead in kmem-cgroup accounting. We can reduce the
internal fragment by creating separated kmem_cache for qp_trie_branch with
different child nodes, but not sure whether it is worthy or not.

And in order to prevent allocating a rcu_head for each leaf node, now only
branch node is RCU-freed, so when replacing a leaf node, a new branch node
and a new leaf node will be allocated instead of replacing the old leaf
node and RCU-freed the old leaf node. Also it seems rcu_head can be
replaced by a pointer as BPF specific memory allocator does, so 8-bytes
can be saved for each node, and again not sure it's worth it.

  Sorted strings in BTF string sections (entries=115980)
  htab lookup      (1  thread)    6.915 ± 0.029M/s (drops 0.000 ± 0.000M/s mem 22.216 MiB)
  qp-trie lookup   (1  thread)    6.791 ± 0.005M/s (drops 0.000 ± 0.000M/s mem 11.273 MiB)
  
  All files under linux kernel source directory (entries=74359)
  htab lookup      (1  thread)    7.978 ± 0.009M/s (drops 0.000 ± 0.000M/s mem 14.272 MiB)
  qp-trie lookup   (1  thread)    5.521 ± 0.003M/s (drops 0.000 ± 0.000M/s mem 9.367 MiB)
  
  Domain names for Alexa top million web site (entries=1000000)
  htab lookup      (1  thread)    3.148 ± 0.039M/s (drops 0.000 ± 0.000M/s mem 190.831 MiB)
  qp-trie lookup   (1  thread)    2.374 ± 0.026M/s (drops 0.000 ± 0.000M/s mem 83.733 MiB)

Comments and suggestions are always welcome.

Change Log:
v1:
 * Use bpf_dynptr as map key type instead of bpf_lpm_trie_key-styled key (Suggested by Andrii)
 * Fix build error and RCU related sparse errors reported by lkp robot
 * Copy the passed key firstly in qp_trie_update_elem(), because the content of passed key may
   change and may break the assumption in two-round lookup process during update.
 * Add the missing rcu_barrier in qp_trie_free()

RFC: https://lore.kernel.org/bpf/20220726130005.3102470-1-houtao1@huawei.com/

[0]: https://lore.kernel.org/bpf/CAEf4Bzb7keBS8vXgV5JZzwgNGgMV0X3_guQ_m9JW3X6fJBDpPQ@mail.gmail.com/
[1]: https://lore.kernel.org/bpf/db34696a-cbfe-16e8-6dd5-8174b97dcf1d@huawei.com/

Hou Tao (10):
  bpf: Export bpf_dynptr_{get|set}_size
  bpf: Add helper btf_type_is_bpf_dynptr()
  bpf: Support bpf_dynptr-typed map key for bpf syscall
  bpf: Support bpf_dynptr-typed map key for bpf program
  libbpf: Add helpers for bpf_dynptr_user
  bpf: Add support for qp-trie map
  selftests/bpf: Add two new dynptr_fail cases for map key
  selftests/bpf: Add test case for basic qp-trie map operations
  selftests/bpf: Add benchmark for qp-trie map
  selftests/bpf: Add tests for qp-trie map by using bpf syscalls

 include/linux/bpf.h                           |    4 +
 include/linux/bpf_types.h                     |    1 +
 include/linux/btf.h                           |    1 +
 include/uapi/linux/bpf.h                      |   10 +
 kernel/bpf/Makefile                           |    1 +
 kernel/bpf/bpf_qp_trie.c                      | 1055 +++++++++++++++
 kernel/bpf/btf.c                              |    7 +
 kernel/bpf/helpers.c                          |    9 +-
 kernel/bpf/syscall.c                          |  108 +-
 kernel/bpf/verifier.c                         |   18 +-
 tools/include/uapi/linux/bpf.h                |    9 +
 tools/lib/bpf/bpf.h                           |   19 +
 tools/testing/selftests/bpf/DENYLIST.s390x    |    1 +
 tools/testing/selftests/bpf/Makefile          |    5 +-
 tools/testing/selftests/bpf/bench.c           |   10 +
 .../selftests/bpf/benchs/bench_qp_trie.c      |  511 ++++++++
 .../selftests/bpf/benchs/run_bench_qp_trie.sh |   55 +
 .../selftests/bpf/map_tests/qp_trie_map.c     | 1165 +++++++++++++++++
 .../testing/selftests/bpf/prog_tests/dynptr.c |    2 +
 .../selftests/bpf/prog_tests/qp_trie_test.c   |   91 ++
 .../testing/selftests/bpf/progs/dynptr_fail.c |   43 +
 .../selftests/bpf/progs/qp_trie_bench.c       |  236 ++++
 .../selftests/bpf/progs/qp_trie_test.c        |  110 ++
 23 files changed, 3452 insertions(+), 19 deletions(-)
 create mode 100644 kernel/bpf/bpf_qp_trie.c
 create mode 100644 tools/testing/selftests/bpf/benchs/bench_qp_trie.c
 create mode 100755 tools/testing/selftests/bpf/benchs/run_bench_qp_trie.sh
 create mode 100644 tools/testing/selftests/bpf/map_tests/qp_trie_map.c
 create mode 100644 tools/testing/selftests/bpf/prog_tests/qp_trie_test.c
 create mode 100644 tools/testing/selftests/bpf/progs/qp_trie_bench.c
 create mode 100644 tools/testing/selftests/bpf/progs/qp_trie_test.c

-- 
2.29.2


^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2022-09-17 18:30 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-09-17 15:31 [PATCH bpf-next 00/10] Add support for qp-trie map Hou Tao
2022-09-17 15:31 ` [PATCH bpf-next 01/10] bpf: Export bpf_dynptr_{get|set}_size Hou Tao
2022-09-17 15:31 ` [PATCH bpf-next 02/10] bpf: Add helper btf_type_is_bpf_dynptr() Hou Tao
2022-09-17 15:31 ` [PATCH bpf-next 03/10] bpf: Support bpf_dynptr-typed map key for bpf syscall Hou Tao
2022-09-17 18:18   ` kernel test robot
2022-09-17 18:29   ` kernel test robot
2022-09-17 15:31 ` [PATCH bpf-next 04/10] bpf: Support bpf_dynptr-typed map key for bpf program Hou Tao
2022-09-17 15:31 ` [PATCH bpf-next 05/10] libbpf: Add helpers for bpf_dynptr_user Hou Tao
2022-09-17 15:31 ` [PATCH bpf-next 06/10] bpf: Add support for qp-trie map Hou Tao
2022-09-17 15:31 ` [PATCH bpf-next 07/10] selftests/bpf: Add two new dynptr_fail cases for map key Hou Tao
2022-09-17 15:31 ` [PATCH bpf-next 08/10] selftests/bpf: Add test case for basic qp-trie map operations Hou Tao
2022-09-17 15:31 ` [PATCH bpf-next 09/10] selftests/bpf: Add benchmark for qp-trie map Hou Tao
2022-09-17 15:31 ` [PATCH bpf-next 10/10] selftests/bpf: Add tests for qp-trie map by using bpf syscalls Hou Tao

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.