From: Dave Marchevsky <davemarchevsky@fb.com>
To: <bpf@vger.kernel.org>
Cc: Alexei Starovoitov <ast@kernel.org>,
Daniel Borkmann <daniel@iogearbox.net>,
Andrii Nakryiko <andrii@kernel.org>,
Kernel Team <kernel-team@fb.com>, Tejun Heo <tj@kernel.org>,
Dave Marchevsky <davemarchevsky@fb.com>
Subject: [RFC PATCH bpf-next 10/11] bpf: Introduce PTR_ITER and PTR_ITER_END type flags
Date: Fri, 22 Jul 2022 11:34:37 -0700 [thread overview]
Message-ID: <20220722183438.3319790-11-davemarchevsky@fb.com> (raw)
In-Reply-To: <20220722183438.3319790-1-davemarchevsky@fb.com>
Add two type flags, PTR_ITER and PTR_ITER_END, meant to support the
following pattern for iterating data structures:
struct node *elem = data_structure_iter_first(&some_map)
while (elem) {
do_something();
elem = data_structure_iter_next(&some_map, elem);
}
Currently the ret_type of both _first() and _next() helpers would be
PTR_TO_BTF_ID_OR_NULL and the verifier would consider the loop to be
infinite as it knows nothing about an arbitrary PTR_TO_BTF_ID value.
However, if we can express "this PTR_TO_BTF_ID will eventually be
NULL if used in iteration helpers" via a new type flag, the verifier can
use this information to determine that the loop will terminate while
still verifying the loop body.
So for our contrived example above, _first() now returns a
PTR_TO_BTF_ID_OR_NULL | PTR_ITER, which _next() expects as input,
itself returning PTR_TO_BTF_ID_OR_NULL | PTR_ITER_END.
The verifier does nothing special for PTR_ITER, so the first iteration
of the example loop will result in both elem == NULL and elem != NULL
branches being verified. When verifying the latter branch, elem will
become a PTR_TO_BTF_ID_OR_NULL | PTR_ITER_END.
When the verifier sees a reg holding PTR_ITER_END in a conditional jump
it knows the reg type and val can be replaced with SCALAR 0. So in the
example loop on 2nd iteration elem == NULL will be assumed and
verification will continue with that branch only.
[ TODO:
* PTR_ITER needs to be associated with the helper that produced it, to
prevent something like:
struct node *elem = data_structure_iter_last(&some_map)
while (elem) {
do_something();
elem = data_structure_iter_next(&some_map, elem);
}
i.e. _first() and _next() should be used together, same with _last()
and _prev().
* This is currently very unsafe. Per multiple conversations w/ Alexei
and Andrii, there are probably some ways forward here, but may be more
complex than it's worth. If so, can migrate to a callback-based
approach with similar functionality .
]
Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
---
include/linux/bpf.h | 3 ++
include/uapi/linux/bpf.h | 34 ++++++++++++++-
kernel/bpf/helpers.c | 12 ++++++
kernel/bpf/rbtree.c | 78 ++++++++++++++++++++++++++++++++++
kernel/bpf/verifier.c | 53 +++++++++++++++++++++--
tools/include/uapi/linux/bpf.h | 34 ++++++++++++++-
6 files changed, 209 insertions(+), 5 deletions(-)
diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index a601ab30a2b1..819778e05060 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -416,6 +416,9 @@ enum bpf_type_flag {
CONDITIONAL_RELEASE = BIT(12 + BPF_BASE_TYPE_BITS),
+ PTR_ITER = BIT(13 + BPF_BASE_TYPE_BITS),
+ PTR_ITER_END = BIT(14 + BPF_BASE_TYPE_BITS),
+
__BPF_TYPE_FLAG_MAX,
__BPF_TYPE_LAST_FLAG = __BPF_TYPE_FLAG_MAX - 1,
};
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index d21e2c99ea14..3869653c6aeb 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -5409,6 +5409,34 @@ union bpf_attr {
*
* Return
* 0
+ *
+ * long bpf_rbtree_first(struct bpf_map *map)
+ * Description
+ * Return the first node in the tree according to sort order
+ *
+ * Return
+ * If found, ptr to node, otherwise NULL
+ *
+ * long bpf_rbtree_last(struct bpf_map *map)
+ * Description
+ * Return the last node in the tree according to sort order
+ *
+ * Return
+ * If found, ptr to node, otherwise NULL
+ *
+ * long bpf_rbtree_next(struct bpf_map *map, void *cur)
+ * Description
+ * Return the next node in the tree according to sort order
+ *
+ * Return
+ * If found, ptr to node, otherwise NULL
+ *
+ * long bpf_rbtree_prev(struct bpf_map *map, void *cur)
+ * Description
+ * Return the previous node in the tree according to sort order
+ *
+ * Return
+ * If found, ptr to node, otherwise NULL
*/
#define __BPF_FUNC_MAPPER(FN) \
FN(unspec), \
@@ -5620,13 +5648,17 @@ union bpf_attr {
FN(tcp_raw_check_syncookie_ipv4), \
FN(tcp_raw_check_syncookie_ipv6), \
FN(rbtree_alloc_node), \
- FN(rbtree_add), \
+ FN(rbtree_add), \
FN(rbtree_find), \
FN(rbtree_remove), \
FN(rbtree_free_node), \
FN(rbtree_get_lock), \
FN(rbtree_lock), \
FN(rbtree_unlock), \
+ FN(rbtree_first), \
+ FN(rbtree_last), \
+ FN(rbtree_next), \
+ FN(rbtree_prev), \
/* */
/* integer value in 'imm' field of BPF_CALL instruction selects which helper
diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
index fa2dba1dcec8..fdc505139b99 100644
--- a/kernel/bpf/helpers.c
+++ b/kernel/bpf/helpers.c
@@ -1592,6 +1592,10 @@ const struct bpf_func_proto bpf_rbtree_free_node_proto __weak;
const struct bpf_func_proto bpf_rbtree_get_lock_proto __weak;
const struct bpf_func_proto bpf_rbtree_lock_proto __weak;
const struct bpf_func_proto bpf_rbtree_unlock_proto __weak;
+const struct bpf_func_proto bpf_rbtree_first_proto __weak;
+const struct bpf_func_proto bpf_rbtree_last_proto __weak;
+const struct bpf_func_proto bpf_rbtree_next_proto __weak;
+const struct bpf_func_proto bpf_rbtree_prev_proto __weak;
const struct bpf_func_proto *
bpf_base_func_proto(enum bpf_func_id func_id)
@@ -1697,6 +1701,14 @@ bpf_base_func_proto(enum bpf_func_id func_id)
return &bpf_rbtree_lock_proto;
case BPF_FUNC_rbtree_unlock:
return &bpf_rbtree_unlock_proto;
+ case BPF_FUNC_rbtree_first:
+ return &bpf_rbtree_first_proto;
+ case BPF_FUNC_rbtree_last:
+ return &bpf_rbtree_last_proto;
+ case BPF_FUNC_rbtree_next:
+ return &bpf_rbtree_next_proto;
+ case BPF_FUNC_rbtree_prev:
+ return &bpf_rbtree_prev_proto;
default:
break;
}
diff --git a/kernel/bpf/rbtree.c b/kernel/bpf/rbtree.c
index dcf8f69d4ada..f1a06c340d88 100644
--- a/kernel/bpf/rbtree.c
+++ b/kernel/bpf/rbtree.c
@@ -140,6 +140,84 @@ const struct bpf_func_proto bpf_rbtree_find_proto = {
.arg3_type = ARG_PTR_TO_FUNC,
};
+BPF_CALL_1(bpf_rbtree_first, struct bpf_map *, map)
+{
+ struct bpf_rbtree *tree = container_of(map, struct bpf_rbtree, map);
+
+ if (!__rbtree_lock_held(tree))
+ return (u64)NULL;
+
+ return (u64)rb_first_cached(&tree->root);
+}
+
+const struct bpf_func_proto bpf_rbtree_first_proto = {
+ .func = bpf_rbtree_first,
+ .gpl_only = true,
+ .ret_type = RET_PTR_TO_BTF_ID_OR_NULL | PTR_ITER | OBJ_NON_OWNING_REF,
+ .ret_btf_id = &bpf_rbtree_btf_ids[0],
+ .arg1_type = ARG_CONST_MAP_PTR,
+};
+
+BPF_CALL_1(bpf_rbtree_last, struct bpf_map *, map)
+{
+ struct bpf_rbtree *tree = container_of(map, struct bpf_rbtree, map);
+
+ if (!__rbtree_lock_held(tree))
+ return (u64)NULL;
+
+ return (u64)rb_last(&tree->root.rb_root);
+}
+
+const struct bpf_func_proto bpf_rbtree_last_proto = {
+ .func = bpf_rbtree_last,
+ .gpl_only = true,
+ .ret_type = RET_PTR_TO_BTF_ID_OR_NULL | PTR_ITER | OBJ_NON_OWNING_REF,
+ .ret_btf_id = &bpf_rbtree_btf_ids[0],
+ .arg1_type = ARG_CONST_MAP_PTR,
+};
+
+BPF_CALL_2(bpf_rbtree_next, struct bpf_map *, map, void *, cur)
+{
+ struct rb_node *next = rb_next((struct rb_node *)cur);
+ struct bpf_rbtree *tree = container_of(map, struct bpf_rbtree, map);
+
+ if (!__rbtree_lock_held(tree))
+ return (u64)NULL;
+
+ return (u64)next;
+}
+
+const struct bpf_func_proto bpf_rbtree_next_proto = {
+ .func = bpf_rbtree_next,
+ .gpl_only = true,
+ .ret_type = RET_PTR_TO_BTF_ID_OR_NULL | PTR_ITER_END | OBJ_NON_OWNING_REF,
+ .ret_btf_id = &bpf_rbtree_btf_ids[0],
+ .arg1_type = ARG_CONST_MAP_PTR,
+ .arg2_type = ARG_PTR_TO_BTF_ID | PTR_ITER,
+ .arg2_btf_id = &bpf_rbtree_btf_ids[0],
+};
+
+BPF_CALL_2(bpf_rbtree_prev, struct bpf_map *, map, void *, cur)
+{
+ struct rb_node *node = (struct rb_node *)cur;
+ struct bpf_rbtree *tree = container_of(map, struct bpf_rbtree, map);
+
+ if (!__rbtree_lock_held(tree))
+ return (u64)NULL;
+
+ return (u64)rb_prev(node);
+}
+
+const struct bpf_func_proto bpf_rbtree_prev_proto = {
+ .func = bpf_rbtree_prev,
+ .gpl_only = true,
+ .ret_type = RET_PTR_TO_BTF_ID_OR_NULL | PTR_ITER_END | OBJ_NON_OWNING_REF,
+ .ret_btf_id = &bpf_rbtree_btf_ids[0],
+ .arg1_type = ARG_CONST_MAP_PTR,
+ .arg2_type = ARG_PTR_TO_BTF_ID | PTR_ITER,
+ .arg2_btf_id = &bpf_rbtree_btf_ids[0],
+};
+
/* Like rbtree_postorder_for_each_entry_safe, but 'pos' and 'n' are
* 'rb_node *', so field name of rb_node within containing struct is not
* needed.
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index f80e161170de..e1580a01cfe3 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -482,6 +482,11 @@ static bool type_may_be_null(u32 type)
return type & PTR_MAYBE_NULL;
}
+static bool type_is_iter(u32 type)
+{
+ return type & PTR_ITER || type & PTR_ITER_END;
+}
+
static bool is_acquire_function(enum bpf_func_id func_id,
const struct bpf_map *map)
{
@@ -547,14 +552,20 @@ static bool function_manipulates_rbtree_node(enum bpf_func_id func_id)
{
return func_id == BPF_FUNC_rbtree_add ||
func_id == BPF_FUNC_rbtree_remove ||
- func_id == BPF_FUNC_rbtree_free_node;
+ func_id == BPF_FUNC_rbtree_free_node ||
+ func_id == BPF_FUNC_rbtree_next ||
+ func_id == BPF_FUNC_rbtree_prev;
}
static bool function_returns_rbtree_node(enum bpf_func_id func_id)
{
return func_id == BPF_FUNC_rbtree_alloc_node ||
func_id == BPF_FUNC_rbtree_add ||
- func_id == BPF_FUNC_rbtree_remove;
+ func_id == BPF_FUNC_rbtree_remove ||
+ func_id == BPF_FUNC_rbtree_first ||
+ func_id == BPF_FUNC_rbtree_last ||
+ func_id == BPF_FUNC_rbtree_next ||
+ func_id == BPF_FUNC_rbtree_prev;
}
/* string representation of 'enum bpf_reg_type'
@@ -614,6 +625,12 @@ static const char *reg_type_str(struct bpf_verifier_env *env,
32 - postfix_idx);
}
+ if (type_is_iter(type)) {
+ postfix_idx += strlcpy(postfix + postfix_idx, "_iter", 32 - postfix_idx);
+ if (type & PTR_ITER_END)
+ postfix_idx += strlcpy(postfix + postfix_idx, "_end", 32 - postfix_idx);
+ }
+
if (type & MEM_RDONLY)
strncpy(prefix, "rdonly_", 32);
if (type & MEM_ALLOC)
@@ -1428,7 +1445,7 @@ static void mark_ptr_not_null_reg(struct bpf_reg_state *reg)
map->map_type == BPF_MAP_TYPE_SOCKHASH) {
reg->type = PTR_TO_SOCKET;
} else {
- reg->type = PTR_TO_MAP_VALUE;
+ reg->type &= ~PTR_MAYBE_NULL;
}
return;
}
@@ -3021,6 +3038,11 @@ static bool __is_pointer_value(bool allow_ptr_leaks,
return reg->type != SCALAR_VALUE;
}
+static bool __is_iter_end(const struct bpf_reg_state *reg)
+{
+ return reg->type & PTR_ITER_END;
+}
+
static void save_register_state(struct bpf_func_state *state,
int spi, struct bpf_reg_state *reg,
int size)
@@ -5666,6 +5688,8 @@ static const struct bpf_reg_types map_key_value_types = {
PTR_TO_PACKET_META,
PTR_TO_MAP_KEY,
PTR_TO_MAP_VALUE,
+ PTR_TO_MAP_VALUE | PTR_ITER,
+ PTR_TO_MAP_VALUE | PTR_ITER_END,
},
};
@@ -5793,6 +5817,17 @@ static int check_reg_type(struct bpf_verifier_env *env, u32 regno,
if (arg_type & PTR_MAYBE_NULL)
type &= ~PTR_MAYBE_NULL;
+ /* TYPE | PTR_ITER is valid input for helpers that expect TYPE
+ * TYPE is not valid input for helpers that expect TYPE | PTR_ITER
+ */
+ if (type_is_iter(arg_type)) {
+ if (!type_is_iter(type))
+ goto not_found;
+
+ type &= ~PTR_ITER;
+ type &= ~PTR_ITER_END;
+ }
+
for (i = 0; i < ARRAY_SIZE(compatible->types); i++) {
expected = compatible->types[i];
if (expected == NOT_INIT)
@@ -5802,6 +5837,7 @@ static int check_reg_type(struct bpf_verifier_env *env, u32 regno,
goto found;
}
+not_found:
verbose(env, "R%d type=%s expected=", regno, reg_type_str(env, reg->type));
for (j = 0; j + 1 < i; j++)
verbose(env, "%s, ", reg_type_str(env, compatible->types[j]));
@@ -9762,6 +9798,17 @@ static int is_branch_taken(struct bpf_reg_state *reg, u64 val, u8 opcode,
bool is_jmp32)
{
if (__is_pointer_value(false, reg)) {
+ if (__is_iter_end(reg) && val == 0) {
+ __mark_reg_const_zero(reg);
+ switch (opcode) {
+ case BPF_JEQ:
+ return 1;
+ case BPF_JNE:
+ return 0;
+ default:
+ return -1;
+ }
+ }
if (!reg_type_not_null(reg->type))
return -1;
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index d21e2c99ea14..3869653c6aeb 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -5409,6 +5409,34 @@ union bpf_attr {
*
* Return
* 0
+ *
+ * long bpf_rbtree_first(struct bpf_map *map)
+ * Description
+ * Return the first node in the tree according to sort order
+ *
+ * Return
+ * If found, ptr to node, otherwise NULL
+ *
+ * long bpf_rbtree_last(struct bpf_map *map)
+ * Description
+ * Return the last node in the tree according to sort order
+ *
+ * Return
+ * If found, ptr to node, otherwise NULL
+ *
+ * long bpf_rbtree_next(struct bpf_map *map, void *cur)
+ * Description
+ * Return the next node in the tree according to sort order
+ *
+ * Return
+ * If found, ptr to node, otherwise NULL
+ *
+ * long bpf_rbtree_prev(struct bpf_map *map, void *cur)
+ * Description
+ * Return the previous node in the tree according to sort order
+ *
+ * Return
+ * If found, ptr to node, otherwise NULL
*/
#define __BPF_FUNC_MAPPER(FN) \
FN(unspec), \
@@ -5620,13 +5648,17 @@ union bpf_attr {
FN(tcp_raw_check_syncookie_ipv4), \
FN(tcp_raw_check_syncookie_ipv6), \
FN(rbtree_alloc_node), \
- FN(rbtree_add), \
+ FN(rbtree_add), \
FN(rbtree_find), \
FN(rbtree_remove), \
FN(rbtree_free_node), \
FN(rbtree_get_lock), \
FN(rbtree_lock), \
FN(rbtree_unlock), \
+ FN(rbtree_first), \
+ FN(rbtree_last), \
+ FN(rbtree_next), \
+ FN(rbtree_prev), \
/* */
/* integer value in 'imm' field of BPF_CALL instruction selects which helper
--
2.30.2
next prev parent reply other threads:[~2022-07-22 18:35 UTC|newest]
Thread overview: 37+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-07-22 18:34 [RFC PATCH bpf-next 00/11] bpf: Introduce rbtree map Dave Marchevsky
2022-07-22 18:34 ` [RFC PATCH bpf-next 01/11] bpf: Pull repeated reg access bounds check into helper fn Dave Marchevsky
2022-07-22 18:34 ` [RFC PATCH bpf-next 02/11] bpf: Add verifier support for custom callback return range Dave Marchevsky
2022-07-22 18:34 ` [RFC PATCH bpf-next 03/11] bpf: Add rb_node_off to bpf_map Dave Marchevsky
2022-08-01 22:19 ` Alexei Starovoitov
2022-07-22 18:34 ` [RFC PATCH bpf-next 04/11] bpf: Add rbtree map Dave Marchevsky
2022-08-01 21:49 ` Alexei Starovoitov
2022-07-22 18:34 ` [RFC PATCH bpf-next 05/11] bpf: Add bpf_spin_lock member to rbtree Dave Marchevsky
2022-08-01 22:17 ` Alexei Starovoitov
2022-08-02 13:59 ` Kumar Kartikeya Dwivedi
2022-08-02 15:30 ` Alexei Starovoitov
2022-08-10 21:46 ` Kumar Kartikeya Dwivedi
2022-08-10 22:06 ` Alexei Starovoitov
2022-08-10 23:16 ` Kumar Kartikeya Dwivedi
2022-08-15 5:33 ` Yonghong Song
2022-08-15 5:37 ` Kumar Kartikeya Dwivedi
2022-07-22 18:34 ` [RFC PATCH bpf-next 06/11] bpf: Add bpf_rbtree_{lock,unlock} helpers Dave Marchevsky
2022-08-01 21:58 ` Alexei Starovoitov
2022-07-22 18:34 ` [RFC PATCH bpf-next 07/11] bpf: Enforce spinlock hold for bpf_rbtree_{add,remove,find} Dave Marchevsky
2022-07-22 18:34 ` [RFC PATCH bpf-next 08/11] bpf: Add OBJ_NON_OWNING_REF type flag Dave Marchevsky
2022-08-01 22:41 ` Alexei Starovoitov
2022-07-22 18:34 ` [RFC PATCH bpf-next 09/11] bpf: Add CONDITIONAL_RELEASE " Dave Marchevsky
2022-08-01 22:23 ` Alexei Starovoitov
2022-07-22 18:34 ` Dave Marchevsky [this message]
2022-07-29 16:31 ` [RFC PATCH bpf-next 10/11] bpf: Introduce PTR_ITER and PTR_ITER_END type flags Tejun Heo
2022-08-01 22:44 ` Alexei Starovoitov
2022-08-02 13:05 ` Kumar Kartikeya Dwivedi
2022-08-02 15:10 ` Alexei Starovoitov
2022-08-10 17:56 ` Dave Marchevsky
2022-07-22 18:34 ` [RFC PATCH bpf-next 11/11] selftests/bpf: Add rbtree map tests Dave Marchevsky
2022-07-28 7:18 ` Yonghong Song
2022-08-10 17:48 ` Dave Marchevsky
2022-07-28 7:04 ` [RFC PATCH bpf-next 00/11] bpf: Introduce rbtree map Yonghong Song
2022-08-10 17:54 ` Dave Marchevsky
2022-08-01 21:27 ` Alexei Starovoitov
2022-08-10 18:11 ` Dave Marchevsky
2022-08-02 22:02 ` Andrii Nakryiko
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20220722183438.3319790-11-davemarchevsky@fb.com \
--to=davemarchevsky@fb.com \
--cc=andrii@kernel.org \
--cc=ast@kernel.org \
--cc=bpf@vger.kernel.org \
--cc=daniel@iogearbox.net \
--cc=kernel-team@fb.com \
--cc=tj@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).