From: Dave Marchevsky <davemarchevsky@fb.com>
To: <bpf@vger.kernel.org>
Cc: Alexei Starovoitov <ast@kernel.org>,
Daniel Borkmann <daniel@iogearbox.net>,
Andrii Nakryiko <andrii@kernel.org>,
Kernel Team <kernel-team@fb.com>,
Kumar Kartikeya Dwivedi <memxor@gmail.com>,
Tejun Heo <tj@kernel.org>,
Dave Marchevsky <davemarchevsky@fb.com>
Subject: [PATCH bpf-next 00/13] BPF rbtree next-gen datastructure
Date: Tue, 6 Dec 2022 15:09:47 -0800 [thread overview]
Message-ID: <20221206231000.3180914-1-davemarchevsky@fb.com> (raw)
This series adds a rbtree datastructure following the "next-gen
datastructure" precedent set by recently-added linked-list [0]. This is
a reimplementation of previous rbtree RFC [1] to use kfunc + kptr
instead of adding a new map type. This series adds a smaller set of API
functions than that RFC - just the minimum needed to support current
cgfifo example scheduler in ongoing sched_ext effort [2], namely:
bpf_rbtree_add
bpf_rbtree_remove
bpf_rbtree_first
The meat of this series is bugfixes and verifier infra work to support
these API functions. Adding more rbtree kfuncs in future patches should
be straightforward as a result.
BPF rbtree uses struct rb_root_cached + existing rbtree lib under the
hood. From the BPF program writer's perspective, a BPF rbtree is very
similar to existing linked list. Consider the following example:
struct node_data {
long key;
long data;
struct bpf_rb_node node;
}
static bool less(struct bpf_rb_node *a, const struct bpf_rb_node *b)
{
struct node_data *node_a;
struct node_data *node_b;
node_a = container_of(a, struct node_data, node);
node_b = container_of(b, struct node_data, node);
return node_a->key < node_b->key;
}
private(A) struct bpf_spin_lock glock;
private(A) struct bpf_rb_root groot __contains(node_data, node);
/* ... in BPF program */
struct node_data *n, *m;
struct bpf_rb_node *res;
n = bpf_obj_new(typeof(*n));
if (!n)
/* skip */
n->key = 5;
n->data = 10;
bpf_spin_lock(&glock);
bpf_rbtree_add(&groot, &n->node, less);
bpf_spin_unlock(&glock);
bpf_spin_lock(&glock);
res = bpf_rbtree_first(&groot);
if (!res)
/* skip */
res = bpf_rbtree_remove(&groot, res);
if (!res)
/* skip */
bpf_spin_unlock(&glock);
m = container_of(res, struct node_data, node);
bpf_obj_drop(m);
Some obvious similarities:
* Special bpf_rb_root and bpf_rb_node types have same semantics
as bpf_list_head and bpf_list_node, respectively
* __contains is used to associated node type with root
* The spin_lock associated with a rbtree must be held when using
rbtree API kfuncs
* Nodes are allocated via bpf_obj_new and dropped via bpf_obj_drop
* Rbtree takes ownership of node lifetime when a node is added.
Removing a node gives ownership back to the program, requiring a
bpf_obj_drop before program exit
Some new additions as well:
* Support for callbacks in kfunc args is added to enable 'less'
callback use above
* bpf_rbtree_first's release_on_unlock handling is a bit novel, as
it's the first next-gen ds API function to release_on_unlock its
return reg instead of nonexistent node arg
* Because all references to nodes already added to the rbtree are
'non-owning', i.e. release_on_unlock and PTR_UNTRUSTED,
bpf_rbtree_remove must accept such a reference in order to remove it
from the tree
It seemed better to special-case some 'new additions' verifier logic for
now instead of adding new type flags and concepts, as some of the concepts
(e.g. PTR_UNTRUSTED + release_on_unlock) need a refactoring pass before
we pile more on. Regardless, the net-new verifier logic added in this
patchset is minimal. Verifier changes are mostly generaliztion of
existing linked-list logic and some bugfixes.
A note on naming:
Some existing list-specific helpers are renamed to 'datastructure_head',
'datastructure_node', etc. Probably a more concise and accurate naming
would be something like 'ng_ds_head' for 'next-gen datastructure'.
For folks who weren't following the conversations over past few months,
though, such a naming scheme might seem to indicate that _all_ next-gen
datastructures must have certain semantics, like release_on_unlock,
which aren't necessarily required. For this reason I'd like some
feedback on how to name things.
Summary of patches:
Patches 1, 2, and 10 are bugfixes which are likely worth applying
independently of rbtree implementation. Patch 12 is somewhere between
nice-to-have and bugfix.
Patches 3 and 4 are nonfunctional refactor/rename.
Patches 5 - 9 implement the meat of rbtree support in this series,
gradually building up to implemented kfuncs that verify as expected.
Patch 11 adds the bpf_rbtree_{add,first,remove} to bpf_experimental.h.
Patch 13 adds tests.
[0]: lore.kernel.org/bpf/20221118015614.2013203-1-memxor@gmail.com
[1]: lore.kernel.org/bpf/20220830172759.4069786-1-davemarchevsky@fb.com
[2]: lore.kernel.org/bpf/20221130082313.3241517-1-tj@kernel.org
Future work:
Enabling writes to release_on_unlock refs should be done before the
functionality of BPF rbtree can truly be considered complete.
Implementing this proved more complex than expected so it's been
pushed off to a future patch.
Dave Marchevsky (13):
bpf: Loosen alloc obj test in verifier's reg_btf_record
bpf: map_check_btf should fail if btf_parse_fields fails
bpf: Minor refactor of ref_set_release_on_unlock
bpf: rename list_head -> datastructure_head in field info types
bpf: Add basic bpf_rb_{root,node} support
bpf: Add bpf_rbtree_{add,remove,first} kfuncs
bpf: Add support for bpf_rb_root and bpf_rb_node in kfunc args
bpf: Add callback validation to kfunc verifier logic
bpf: Special verifier handling for bpf_rbtree_{remove, first}
bpf, x86: BPF_PROBE_MEM handling for insn->off < 0
bpf: Add bpf_rbtree_{add,remove,first} decls to bpf_experimental.h
libbpf: Make BTF mandatory if program BTF has spin_lock or alloc_obj
type
selftests/bpf: Add rbtree selftests
arch/x86/net/bpf_jit_comp.c | 123 +++--
include/linux/bpf.h | 21 +-
include/uapi/linux/bpf.h | 11 +
kernel/bpf/btf.c | 181 ++++---
kernel/bpf/helpers.c | 75 ++-
kernel/bpf/syscall.c | 33 +-
kernel/bpf/verifier.c | 506 +++++++++++++++---
tools/include/uapi/linux/bpf.h | 11 +
tools/lib/bpf/libbpf.c | 50 +-
.../testing/selftests/bpf/bpf_experimental.h | 24 +
.../selftests/bpf/prog_tests/linked_list.c | 12 +-
.../testing/selftests/bpf/prog_tests/rbtree.c | 184 +++++++
tools/testing/selftests/bpf/progs/rbtree.c | 180 +++++++
.../progs/rbtree_btf_fail__add_wrong_type.c | 48 ++
.../progs/rbtree_btf_fail__wrong_node_type.c | 21 +
.../testing/selftests/bpf/progs/rbtree_fail.c | 263 +++++++++
16 files changed, 1549 insertions(+), 194 deletions(-)
create mode 100644 tools/testing/selftests/bpf/prog_tests/rbtree.c
create mode 100644 tools/testing/selftests/bpf/progs/rbtree.c
create mode 100644 tools/testing/selftests/bpf/progs/rbtree_btf_fail__add_wrong_type.c
create mode 100644 tools/testing/selftests/bpf/progs/rbtree_btf_fail__wrong_node_type.c
create mode 100644 tools/testing/selftests/bpf/progs/rbtree_fail.c
--
2.30.2
next reply other threads:[~2022-12-06 23:10 UTC|newest]
Thread overview: 51+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-12-06 23:09 Dave Marchevsky [this message]
2022-12-06 23:09 ` [PATCH bpf-next 01/13] bpf: Loosen alloc obj test in verifier's reg_btf_record Dave Marchevsky
2022-12-07 16:41 ` Kumar Kartikeya Dwivedi
2022-12-07 18:34 ` Dave Marchevsky
2022-12-07 18:59 ` Alexei Starovoitov
2022-12-07 20:38 ` Dave Marchevsky
2022-12-07 22:46 ` Alexei Starovoitov
2022-12-07 23:42 ` Dave Marchevsky
2022-12-07 19:03 ` Kumar Kartikeya Dwivedi
2022-12-06 23:09 ` [PATCH bpf-next 02/13] bpf: map_check_btf should fail if btf_parse_fields fails Dave Marchevsky
2022-12-07 1:32 ` Alexei Starovoitov
2022-12-07 16:49 ` Kumar Kartikeya Dwivedi
2022-12-07 19:05 ` Alexei Starovoitov
2022-12-17 8:59 ` Dave Marchevsky
2022-12-06 23:09 ` [PATCH bpf-next 03/13] bpf: Minor refactor of ref_set_release_on_unlock Dave Marchevsky
2022-12-06 23:09 ` [PATCH bpf-next 04/13] bpf: rename list_head -> datastructure_head in field info types Dave Marchevsky
2022-12-07 1:41 ` Alexei Starovoitov
2022-12-07 18:52 ` Dave Marchevsky
2022-12-07 19:01 ` Alexei Starovoitov
2022-12-06 23:09 ` [PATCH bpf-next 05/13] bpf: Add basic bpf_rb_{root,node} support Dave Marchevsky
2022-12-07 1:48 ` Alexei Starovoitov
2022-12-06 23:09 ` [PATCH bpf-next 06/13] bpf: Add bpf_rbtree_{add,remove,first} kfuncs Dave Marchevsky
2022-12-07 14:20 ` kernel test robot
2022-12-06 23:09 ` [PATCH bpf-next 07/13] bpf: Add support for bpf_rb_root and bpf_rb_node in kfunc args Dave Marchevsky
2022-12-07 1:51 ` Alexei Starovoitov
2022-12-06 23:09 ` [PATCH bpf-next 08/13] bpf: Add callback validation to kfunc verifier logic Dave Marchevsky
2022-12-07 2:01 ` Alexei Starovoitov
2022-12-17 8:49 ` Dave Marchevsky
2022-12-06 23:09 ` [PATCH bpf-next 09/13] bpf: Special verifier handling for bpf_rbtree_{remove, first} Dave Marchevsky
2022-12-07 2:18 ` Alexei Starovoitov
2022-12-06 23:09 ` [PATCH bpf-next 10/13] bpf, x86: BPF_PROBE_MEM handling for insn->off < 0 Dave Marchevsky
2022-12-07 2:39 ` Alexei Starovoitov
2022-12-07 6:46 ` Dave Marchevsky
2022-12-07 18:06 ` Alexei Starovoitov
2022-12-07 23:39 ` Dave Marchevsky
2022-12-08 0:47 ` Alexei Starovoitov
2022-12-08 8:50 ` Dave Marchevsky
2022-12-06 23:09 ` [PATCH bpf-next 11/13] bpf: Add bpf_rbtree_{add,remove,first} decls to bpf_experimental.h Dave Marchevsky
2022-12-06 23:09 ` [PATCH bpf-next 12/13] libbpf: Make BTF mandatory if program BTF has spin_lock or alloc_obj type Dave Marchevsky
2022-12-06 23:10 ` [PATCH bpf-next 13/13] selftests/bpf: Add rbtree selftests Dave Marchevsky
2022-12-07 2:50 ` [PATCH bpf-next 00/13] BPF rbtree next-gen datastructure patchwork-bot+netdevbpf
2022-12-07 19:36 ` Kumar Kartikeya Dwivedi
2022-12-07 22:28 ` Dave Marchevsky
2022-12-07 23:06 ` Alexei Starovoitov
2022-12-08 1:18 ` Dave Marchevsky
2022-12-08 3:51 ` Alexei Starovoitov
2022-12-08 8:28 ` Dave Marchevsky
2022-12-08 12:57 ` Kumar Kartikeya Dwivedi
2022-12-08 20:36 ` Alexei Starovoitov
2022-12-08 23:35 ` Dave Marchevsky
2022-12-09 0:39 ` Alexei Starovoitov
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20221206231000.3180914-1-davemarchevsky@fb.com \
--to=davemarchevsky@fb.com \
--cc=andrii@kernel.org \
--cc=ast@kernel.org \
--cc=bpf@vger.kernel.org \
--cc=daniel@iogearbox.net \
--cc=kernel-team@fb.com \
--cc=memxor@gmail.com \
--cc=tj@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.