* [PATCH bpf-next 00/13] BPF rbtree next-gen datastructure
@ 2022-12-06 23:09 Dave Marchevsky
  2022-12-06 23:09 ` [PATCH bpf-next 01/13] bpf: Loosen alloc obj test in verifier's reg_btf_record Dave Marchevsky
                   ` (14 more replies)
  0 siblings, 15 replies; 51+ messages in thread
From: Dave Marchevsky @ 2022-12-06 23:09 UTC (permalink / raw)
  To: bpf
  Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Kernel Team, Kumar Kartikeya Dwivedi, Tejun Heo, Dave Marchevsky

This series adds an rbtree datastructure following the "next-gen
datastructure" precedent set by the recently-added linked list [0]. This
is a reimplementation of the previous rbtree RFC [1] using kfunc + kptr
instead of adding a new map type. This series adds a smaller set of API
functions than that RFC - just the minimum needed to support the current
cgfifo example scheduler in the ongoing sched_ext effort [2], namely:

  bpf_rbtree_add
  bpf_rbtree_remove
  bpf_rbtree_first

The meat of this series is bugfixes and verifier infra work to support
these API functions. Adding more rbtree kfuncs in future patches should
be straightforward as a result.

BPF rbtree uses struct rb_root_cached and the existing rbtree lib under
the hood. From the BPF program writer's perspective, a BPF rbtree is
very similar to the existing linked list. Consider the following example:

  struct node_data {
    long key;
    long data;
    struct bpf_rb_node node;
  };

  static bool less(struct bpf_rb_node *a, const struct bpf_rb_node *b)
  {
    struct node_data *node_a;
    struct node_data *node_b;

    node_a = container_of(a, struct node_data, node);
    node_b = container_of(b, struct node_data, node);

    return node_a->key < node_b->key;
  }

  private(A) struct bpf_spin_lock glock;
  private(A) struct bpf_rb_root groot __contains(node_data, node);

  /* ... in BPF program */
  struct node_data *n, *m;
  struct bpf_rb_node *res;

  n = bpf_obj_new(typeof(*n));
  if (!n)
    /* skip */
  n->key = 5;
  n->data = 10;

  bpf_spin_lock(&glock);
  bpf_rbtree_add(&groot, &n->node, less);
  bpf_spin_unlock(&glock);

  bpf_spin_lock(&glock);
  res = bpf_rbtree_first(&groot);
  if (!res)
    /* skip */
  res = bpf_rbtree_remove(&groot, res);
  if (!res)
    /* skip */
  bpf_spin_unlock(&glock);

  m = container_of(res, struct node_data, node);
  bpf_obj_drop(m);

Some obvious similarities:

  * The special bpf_rb_root and bpf_rb_node types have the same
    semantics as bpf_list_head and bpf_list_node, respectively
  * __contains is used to associate a node type with a root
  * The spin_lock associated with an rbtree must be held when using
    rbtree API kfuncs
  * Nodes are allocated via bpf_obj_new and dropped via bpf_obj_drop
  * The rbtree takes ownership of a node's lifetime when the node is
    added. Removing a node gives ownership back to the program,
    requiring a bpf_obj_drop before program exit

Some new additions as well:

  * Support for callbacks in kfunc args is added to enable the 'less'
    callback use above
  * bpf_rbtree_first's release_on_unlock handling is a bit novel, as
    it's the first next-gen ds API function to release_on_unlock its
    return reg instead of a nonexistent node arg
  * Because all references to nodes already added to the rbtree are
    'non-owning', i.e. release_on_unlock and PTR_UNTRUSTED,
    bpf_rbtree_remove must accept such a reference in order to remove
    the node from the tree

It seemed better to special-case some 'new additions' verifier logic for
now instead of adding new type flags and concepts, as some of the concepts
(e.g. PTR_UNTRUSTED + release_on_unlock) need a refactoring pass before
we pile more on. Regardless, the net-new verifier logic added in this
patchset is minimal. Verifier changes are mostly generalization of
existing linked-list logic and some bugfixes.

A note on naming:

Some existing list-specific helpers are renamed to 'datastructure_head',
'datastructure_node', etc. A more concise and accurate naming would
probably be something like 'ng_ds_head' for 'next-gen datastructure'.

For folks who weren't following the conversations over the past few
months, though, such a naming scheme might seem to indicate that _all_
next-gen datastructures must have certain semantics, like
release_on_unlock, which aren't necessarily required. For this reason
I'd like some feedback on how to name things.

Summary of patches:

  Patches 1, 2, and 10 are bugfixes which are likely worth applying
  independently of rbtree implementation. Patch 12 is somewhere between
  nice-to-have and bugfix.

  Patches 3 and 4 are nonfunctional refactor/rename.

  Patches 5 - 9 implement the meat of rbtree support in this series,
  gradually building up to implemented kfuncs that verify as expected.
  Patch 11 adds bpf_rbtree_{add,first,remove} decls to bpf_experimental.h.

  Patch 13 adds tests.

  [0]: lore.kernel.org/bpf/20221118015614.2013203-1-memxor@gmail.com
  [1]: lore.kernel.org/bpf/20220830172759.4069786-1-davemarchevsky@fb.com
  [2]: lore.kernel.org/bpf/20221130082313.3241517-1-tj@kernel.org

Future work:
  Enabling writes to release_on_unlock refs should be done before the
  functionality of BPF rbtree can truly be considered complete.
  Implementing this proved more complex than expected so it's been
  pushed off to a future patch.

Dave Marchevsky (13):
  bpf: Loosen alloc obj test in verifier's reg_btf_record
  bpf: map_check_btf should fail if btf_parse_fields fails
  bpf: Minor refactor of ref_set_release_on_unlock
  bpf: rename list_head -> datastructure_head in field info types
  bpf: Add basic bpf_rb_{root,node} support
  bpf: Add bpf_rbtree_{add,remove,first} kfuncs
  bpf: Add support for bpf_rb_root and bpf_rb_node in kfunc args
  bpf: Add callback validation to kfunc verifier logic
  bpf: Special verifier handling for bpf_rbtree_{remove, first}
  bpf, x86: BPF_PROBE_MEM handling for insn->off < 0
  bpf: Add bpf_rbtree_{add,remove,first} decls to bpf_experimental.h
  libbpf: Make BTF mandatory if program BTF has spin_lock or alloc_obj
    type
  selftests/bpf: Add rbtree selftests

 arch/x86/net/bpf_jit_comp.c                   | 123 +++--
 include/linux/bpf.h                           |  21 +-
 include/uapi/linux/bpf.h                      |  11 +
 kernel/bpf/btf.c                              | 181 ++++---
 kernel/bpf/helpers.c                          |  75 ++-
 kernel/bpf/syscall.c                          |  33 +-
 kernel/bpf/verifier.c                         | 506 +++++++++++++++---
 tools/include/uapi/linux/bpf.h                |  11 +
 tools/lib/bpf/libbpf.c                        |  50 +-
 .../testing/selftests/bpf/bpf_experimental.h  |  24 +
 .../selftests/bpf/prog_tests/linked_list.c    |  12 +-
 .../testing/selftests/bpf/prog_tests/rbtree.c | 184 +++++++
 tools/testing/selftests/bpf/progs/rbtree.c    | 180 +++++++
 .../progs/rbtree_btf_fail__add_wrong_type.c   |  48 ++
 .../progs/rbtree_btf_fail__wrong_node_type.c  |  21 +
 .../testing/selftests/bpf/progs/rbtree_fail.c | 263 +++++++++
 16 files changed, 1549 insertions(+), 194 deletions(-)
 create mode 100644 tools/testing/selftests/bpf/prog_tests/rbtree.c
 create mode 100644 tools/testing/selftests/bpf/progs/rbtree.c
 create mode 100644 tools/testing/selftests/bpf/progs/rbtree_btf_fail__add_wrong_type.c
 create mode 100644 tools/testing/selftests/bpf/progs/rbtree_btf_fail__wrong_node_type.c
 create mode 100644 tools/testing/selftests/bpf/progs/rbtree_fail.c

-- 
2.30.2



* [PATCH bpf-next 01/13] bpf: Loosen alloc obj test in verifier's reg_btf_record
  2022-12-06 23:09 [PATCH bpf-next 00/13] BPF rbtree next-gen datastructure Dave Marchevsky
@ 2022-12-06 23:09 ` Dave Marchevsky
  2022-12-07 16:41   ` Kumar Kartikeya Dwivedi
  2022-12-06 23:09 ` [PATCH bpf-next 02/13] bpf: map_check_btf should fail if btf_parse_fields fails Dave Marchevsky
                   ` (13 subsequent siblings)
  14 siblings, 1 reply; 51+ messages in thread
From: Dave Marchevsky @ 2022-12-06 23:09 UTC (permalink / raw)
  To: bpf
  Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Kernel Team, Kumar Kartikeya Dwivedi, Tejun Heo, Dave Marchevsky

btf->struct_meta_tab is populated by btf_parse_struct_metas in btf.c.
There, a BTF record is created for any type containing a spin_lock or
any next-gen datastructure node/head.

Currently, for non-MAP_VALUE types, reg_btf_record will only search for
a record using struct_meta_tab if the reg->type exactly matches
(PTR_TO_BTF_ID | MEM_ALLOC). This exact match is too strict: an
"allocated obj" type - returned from bpf_obj_new - might pick up other
flags while working its way through the program.

Loosen the check to require an exact match on base_type only, testing
just the MEM_ALLOC bit of type_flag.

This patch is marked Fixes as the original intent of reg_btf_record was
unlikely to have been to fail to find a btf_record for valid alloc obj
types with additional flags, some of which (e.g. PTR_UNTRUSTED)
are valid register type states for alloc obj independent of this series.
However, I didn't find a specific broken repro case outside of this
series' added functionality, so it's possible that nothing was
triggering this logic error before.

Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
cc: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Fixes: 4e814da0d599 ("bpf: Allow locking bpf_spin_lock in allocated objects")
---
 kernel/bpf/verifier.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 1d51bd9596da..67a13110bc22 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -451,6 +451,11 @@ static bool reg_type_not_null(enum bpf_reg_type type)
 		type == PTR_TO_SOCK_COMMON;
 }
 
+static bool type_is_ptr_alloc_obj(u32 type)
+{
+	return base_type(type) == PTR_TO_BTF_ID && type_flag(type) & MEM_ALLOC;
+}
+
 static struct btf_record *reg_btf_record(const struct bpf_reg_state *reg)
 {
 	struct btf_record *rec = NULL;
@@ -458,7 +463,7 @@ static struct btf_record *reg_btf_record(const struct bpf_reg_state *reg)
 
 	if (reg->type == PTR_TO_MAP_VALUE) {
 		rec = reg->map_ptr->record;
-	} else if (reg->type == (PTR_TO_BTF_ID | MEM_ALLOC)) {
+	} else if (type_is_ptr_alloc_obj(reg->type)) {
 		meta = btf_find_struct_meta(reg->btf, reg->btf_id);
 		if (meta)
 			rec = meta->record;
-- 
2.30.2



* [PATCH bpf-next 02/13] bpf: map_check_btf should fail if btf_parse_fields fails
  2022-12-06 23:09 [PATCH bpf-next 00/13] BPF rbtree next-gen datastructure Dave Marchevsky
  2022-12-06 23:09 ` [PATCH bpf-next 01/13] bpf: Loosen alloc obj test in verifier's reg_btf_record Dave Marchevsky
@ 2022-12-06 23:09 ` Dave Marchevsky
  2022-12-07  1:32   ` Alexei Starovoitov
  2022-12-07 16:49   ` Kumar Kartikeya Dwivedi
  2022-12-06 23:09 ` [PATCH bpf-next 03/13] bpf: Minor refactor of ref_set_release_on_unlock Dave Marchevsky
                   ` (12 subsequent siblings)
  14 siblings, 2 replies; 51+ messages in thread
From: Dave Marchevsky @ 2022-12-06 23:09 UTC (permalink / raw)
  To: bpf
  Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Kernel Team, Kumar Kartikeya Dwivedi, Tejun Heo, Dave Marchevsky

map_check_btf calls btf_parse_fields to create a btf_record for its
value_type. If there are no special fields in the value_type,
btf_parse_fields returns NULL, whereas if there are special value_type
fields but they are invalid in some way, an error pointer is returned.

An example invalid state would be:

  struct node_data {
    struct bpf_rb_node node;
    int data;
  };

  private(A) struct bpf_spin_lock glock;
  private(A) struct bpf_list_head ghead __contains(node_data, node);

ghead should be invalid as its __contains tag points to a field with
type != "bpf_list_node".

Before this patch, such a scenario would result in btf_parse_fields
returning an error ptr, subsequent !IS_ERR_OR_NULL check failing,
and btf_check_and_fixup_fields returning 0, which would then be
returned by map_check_btf.

After this patch's changes, -EINVAL would be returned by map_check_btf
and the map would correctly fail to load.

Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
cc: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Fixes: aa3496accc41 ("bpf: Refactor kptr_off_tab into btf_record")
---
 kernel/bpf/syscall.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 35972afb6850..c3599a7902f0 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -1007,7 +1007,10 @@ static int map_check_btf(struct bpf_map *map, const struct btf *btf,
 	map->record = btf_parse_fields(btf, value_type,
 				       BPF_SPIN_LOCK | BPF_TIMER | BPF_KPTR | BPF_LIST_HEAD,
 				       map->value_size);
-	if (!IS_ERR_OR_NULL(map->record)) {
+	if (IS_ERR(map->record))
+		return -EINVAL;
+
+	if (map->record) {
 		int i;
 
 		if (!bpf_capable()) {
-- 
2.30.2



* [PATCH bpf-next 03/13] bpf: Minor refactor of ref_set_release_on_unlock
  2022-12-06 23:09 [PATCH bpf-next 00/13] BPF rbtree next-gen datastructure Dave Marchevsky
  2022-12-06 23:09 ` [PATCH bpf-next 01/13] bpf: Loosen alloc obj test in verifier's reg_btf_record Dave Marchevsky
  2022-12-06 23:09 ` [PATCH bpf-next 02/13] bpf: map_check_btf should fail if btf_parse_fields fails Dave Marchevsky
@ 2022-12-06 23:09 ` Dave Marchevsky
  2022-12-06 23:09 ` [PATCH bpf-next 04/13] bpf: rename list_head -> datastructure_head in field info types Dave Marchevsky
                   ` (11 subsequent siblings)
  14 siblings, 0 replies; 51+ messages in thread
From: Dave Marchevsky @ 2022-12-06 23:09 UTC (permalink / raw)
  To: bpf
  Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Kernel Team, Kumar Kartikeya Dwivedi, Tejun Heo, Dave Marchevsky

This is mostly a nonfunctional change. The verifier log message
"expected false release_on_unlock" was missing a newline, so add it and
move some checks around to reduce indentation level.

Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
---
 kernel/bpf/verifier.c | 26 ++++++++++++++------------
 1 file changed, 14 insertions(+), 12 deletions(-)

diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 67a13110bc22..6f0aac837d77 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -8438,19 +8438,21 @@ static int ref_set_release_on_unlock(struct bpf_verifier_env *env, u32 ref_obj_i
 		return -EFAULT;
 	}
 	for (i = 0; i < state->acquired_refs; i++) {
-		if (state->refs[i].id == ref_obj_id) {
-			if (state->refs[i].release_on_unlock) {
-				verbose(env, "verifier internal error: expected false release_on_unlock");
-				return -EFAULT;
-			}
-			state->refs[i].release_on_unlock = true;
-			/* Now mark everyone sharing same ref_obj_id as untrusted */
-			bpf_for_each_reg_in_vstate(env->cur_state, state, reg, ({
-				if (reg->ref_obj_id == ref_obj_id)
-					reg->type |= PTR_UNTRUSTED;
-			}));
-			return 0;
+		if (state->refs[i].id != ref_obj_id)
+			continue;
+
+		if (state->refs[i].release_on_unlock) {
+			verbose(env, "verifier internal error: expected false release_on_unlock\n");
+			return -EFAULT;
 		}
+
+		state->refs[i].release_on_unlock = true;
+		/* Now mark everyone sharing same ref_obj_id as untrusted */
+		bpf_for_each_reg_in_vstate(env->cur_state, state, reg, ({
+			if (reg->ref_obj_id == ref_obj_id)
+				reg->type |= PTR_UNTRUSTED;
+		}));
+		return 0;
 	}
 	verbose(env, "verifier internal error: ref state missing for ref_obj_id\n");
 	return -EFAULT;
-- 
2.30.2



* [PATCH bpf-next 04/13] bpf: rename list_head -> datastructure_head in field info types
  2022-12-06 23:09 [PATCH bpf-next 00/13] BPF rbtree next-gen datastructure Dave Marchevsky
                   ` (2 preceding siblings ...)
  2022-12-06 23:09 ` [PATCH bpf-next 03/13] bpf: Minor refactor of ref_set_release_on_unlock Dave Marchevsky
@ 2022-12-06 23:09 ` Dave Marchevsky
  2022-12-07  1:41   ` Alexei Starovoitov
  2022-12-06 23:09 ` [PATCH bpf-next 05/13] bpf: Add basic bpf_rb_{root,node} support Dave Marchevsky
                   ` (10 subsequent siblings)
  14 siblings, 1 reply; 51+ messages in thread
From: Dave Marchevsky @ 2022-12-06 23:09 UTC (permalink / raw)
  To: bpf
  Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Kernel Team, Kumar Kartikeya Dwivedi, Tejun Heo, Dave Marchevsky

Many of the structs recently added to track field info for linked-list
head are useful as-is for rbtree root. So let's do a mechanical renaming
of list_head-related types and fields:

include/linux/bpf.h:
  struct btf_field_list_head -> struct btf_field_datastructure_head
  list_head -> datastructure_head in struct btf_field union
kernel/bpf/btf.c:
  list_head -> datastructure_head in struct btf_field_info

This is a nonfunctional change; functionality to actually use these
fields for rbtree will be added in further patches.

Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
---
 include/linux/bpf.h   |  4 ++--
 kernel/bpf/btf.c      | 21 +++++++++++----------
 kernel/bpf/helpers.c  |  4 ++--
 kernel/bpf/verifier.c | 21 +++++++++++----------
 4 files changed, 26 insertions(+), 24 deletions(-)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 4920ac252754..9e8b12c7061e 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -189,7 +189,7 @@ struct btf_field_kptr {
 	u32 btf_id;
 };
 
-struct btf_field_list_head {
+struct btf_field_datastructure_head {
 	struct btf *btf;
 	u32 value_btf_id;
 	u32 node_offset;
@@ -201,7 +201,7 @@ struct btf_field {
 	enum btf_field_type type;
 	union {
 		struct btf_field_kptr kptr;
-		struct btf_field_list_head list_head;
+		struct btf_field_datastructure_head datastructure_head;
 	};
 };
 
diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c
index c80bd8709e69..284e3e4b76b7 100644
--- a/kernel/bpf/btf.c
+++ b/kernel/bpf/btf.c
@@ -3227,7 +3227,7 @@ struct btf_field_info {
 		struct {
 			const char *node_name;
 			u32 value_btf_id;
-		} list_head;
+		} datastructure_head;
 	};
 };
 
@@ -3334,8 +3334,8 @@ static int btf_find_list_head(const struct btf *btf, const struct btf_type *pt,
 		return -EINVAL;
 	info->type = BPF_LIST_HEAD;
 	info->off = off;
-	info->list_head.value_btf_id = id;
-	info->list_head.node_name = list_node;
+	info->datastructure_head.value_btf_id = id;
+	info->datastructure_head.node_name = list_node;
 	return BTF_FIELD_FOUND;
 }
 
@@ -3603,13 +3603,14 @@ static int btf_parse_list_head(const struct btf *btf, struct btf_field *field,
 	u32 offset;
 	int i;
 
-	t = btf_type_by_id(btf, info->list_head.value_btf_id);
+	t = btf_type_by_id(btf, info->datastructure_head.value_btf_id);
 	/* We've already checked that value_btf_id is a struct type. We
 	 * just need to figure out the offset of the list_node, and
 	 * verify its type.
 	 */
 	for_each_member(i, t, member) {
-		if (strcmp(info->list_head.node_name, __btf_name_by_offset(btf, member->name_off)))
+		if (strcmp(info->datastructure_head.node_name,
+			   __btf_name_by_offset(btf, member->name_off)))
 			continue;
 		/* Invalid BTF, two members with same name */
 		if (n)
@@ -3626,9 +3627,9 @@ static int btf_parse_list_head(const struct btf *btf, struct btf_field *field,
 		if (offset % __alignof__(struct bpf_list_node))
 			return -EINVAL;
 
-		field->list_head.btf = (struct btf *)btf;
-		field->list_head.value_btf_id = info->list_head.value_btf_id;
-		field->list_head.node_offset = offset;
+		field->datastructure_head.btf = (struct btf *)btf;
+		field->datastructure_head.value_btf_id = info->datastructure_head.value_btf_id;
+		field->datastructure_head.node_offset = offset;
 	}
 	if (!n)
 		return -ENOENT;
@@ -3735,11 +3736,11 @@ int btf_check_and_fixup_fields(const struct btf *btf, struct btf_record *rec)
 
 		if (!(rec->fields[i].type & BPF_LIST_HEAD))
 			continue;
-		btf_id = rec->fields[i].list_head.value_btf_id;
+		btf_id = rec->fields[i].datastructure_head.value_btf_id;
 		meta = btf_find_struct_meta(btf, btf_id);
 		if (!meta)
 			return -EFAULT;
-		rec->fields[i].list_head.value_rec = meta->record;
+		rec->fields[i].datastructure_head.value_rec = meta->record;
 
 		if (!(rec->field_mask & BPF_LIST_NODE))
 			continue;
diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
index cca642358e80..6c67740222c2 100644
--- a/kernel/bpf/helpers.c
+++ b/kernel/bpf/helpers.c
@@ -1737,12 +1737,12 @@ void bpf_list_head_free(const struct btf_field *field, void *list_head,
 	while (head != orig_head) {
 		void *obj = head;
 
-		obj -= field->list_head.node_offset;
+		obj -= field->datastructure_head.node_offset;
 		head = head->next;
 		/* The contained type can also have resources, including a
 		 * bpf_list_head which needs to be freed.
 		 */
-		bpf_obj_free_fields(field->list_head.value_rec, obj);
+		bpf_obj_free_fields(field->datastructure_head.value_rec, obj);
 		/* bpf_mem_free requires migrate_disable(), since we can be
 		 * called from map free path as well apart from BPF program (as
 		 * part of map ops doing bpf_obj_free_fields).
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 6f0aac837d77..bc80b4c4377b 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -8615,21 +8615,22 @@ static int process_kf_arg_ptr_to_list_node(struct bpf_verifier_env *env,
 
 	field = meta->arg_list_head.field;
 
-	et = btf_type_by_id(field->list_head.btf, field->list_head.value_btf_id);
+	et = btf_type_by_id(field->datastructure_head.btf, field->datastructure_head.value_btf_id);
 	t = btf_type_by_id(reg->btf, reg->btf_id);
-	if (!btf_struct_ids_match(&env->log, reg->btf, reg->btf_id, 0, field->list_head.btf,
-				  field->list_head.value_btf_id, true)) {
+	if (!btf_struct_ids_match(&env->log, reg->btf, reg->btf_id, 0, field->datastructure_head.btf,
+				  field->datastructure_head.value_btf_id, true)) {
 		verbose(env, "operation on bpf_list_head expects arg#1 bpf_list_node at offset=%d "
 			"in struct %s, but arg is at offset=%d in struct %s\n",
-			field->list_head.node_offset, btf_name_by_offset(field->list_head.btf, et->name_off),
+			field->datastructure_head.node_offset,
+			btf_name_by_offset(field->datastructure_head.btf, et->name_off),
 			list_node_off, btf_name_by_offset(reg->btf, t->name_off));
 		return -EINVAL;
 	}
 
-	if (list_node_off != field->list_head.node_offset) {
+	if (list_node_off != field->datastructure_head.node_offset) {
 		verbose(env, "arg#1 offset=%d, but expected bpf_list_node at offset=%d in struct %s\n",
-			list_node_off, field->list_head.node_offset,
-			btf_name_by_offset(field->list_head.btf, et->name_off));
+			list_node_off, field->datastructure_head.node_offset,
+			btf_name_by_offset(field->datastructure_head.btf, et->name_off));
 		return -EINVAL;
 	}
 	/* Set arg#1 for expiration after unlock */
@@ -9078,9 +9079,9 @@ static int check_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
 
 				mark_reg_known_zero(env, regs, BPF_REG_0);
 				regs[BPF_REG_0].type = PTR_TO_BTF_ID | MEM_ALLOC;
-				regs[BPF_REG_0].btf = field->list_head.btf;
-				regs[BPF_REG_0].btf_id = field->list_head.value_btf_id;
-				regs[BPF_REG_0].off = field->list_head.node_offset;
+				regs[BPF_REG_0].btf = field->datastructure_head.btf;
+				regs[BPF_REG_0].btf_id = field->datastructure_head.value_btf_id;
+				regs[BPF_REG_0].off = field->datastructure_head.node_offset;
 			} else if (meta.func_id == special_kfunc_list[KF_bpf_cast_to_kern_ctx]) {
 				mark_reg_known_zero(env, regs, BPF_REG_0);
 				regs[BPF_REG_0].type = PTR_TO_BTF_ID | PTR_TRUSTED;
-- 
2.30.2



* [PATCH bpf-next 05/13] bpf: Add basic bpf_rb_{root,node} support
  2022-12-06 23:09 [PATCH bpf-next 00/13] BPF rbtree next-gen datastructure Dave Marchevsky
                   ` (3 preceding siblings ...)
  2022-12-06 23:09 ` [PATCH bpf-next 04/13] bpf: rename list_head -> datastructure_head in field info types Dave Marchevsky
@ 2022-12-06 23:09 ` Dave Marchevsky
  2022-12-07  1:48   ` Alexei Starovoitov
  2022-12-06 23:09 ` [PATCH bpf-next 06/13] bpf: Add bpf_rbtree_{add,remove,first} kfuncs Dave Marchevsky
                   ` (9 subsequent siblings)
  14 siblings, 1 reply; 51+ messages in thread
From: Dave Marchevsky @ 2022-12-06 23:09 UTC (permalink / raw)
  To: bpf
  Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Kernel Team, Kumar Kartikeya Dwivedi, Tejun Heo, Dave Marchevsky

This patch adds special BPF_RB_{ROOT,NODE} btf_field_types similar to
BPF_LIST_{HEAD,NODE}, adds the necessary plumbing to detect the new
types, and adds bpf_rb_root_free function for freeing bpf_rb_root in
map_values.

structs bpf_rb_root and bpf_rb_node are opaque types meant to
obscure structs rb_root_cached and rb_node, respectively.

btf_struct_access will prevent BPF programs from touching these special
fields automatically now that they're recognized.

btf_check_and_fixup_fields now groups list_head and rb_root together as
"owner" fields and {list,rb}_node as "ownee", and does the same
ownership cycle checking as before. Note this function does _not_
prevent ownership type mixups (e.g. rb_root owning list_node) - that's
handled by btf_parse_datastructure_head.

After this patch, a bpf program can have a struct bpf_rb_root in a
map_value, but cannot add anything to it or otherwise do anything
useful with it.

Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
---
 include/linux/bpf.h                           |  17 ++
 include/uapi/linux/bpf.h                      |  11 ++
 kernel/bpf/btf.c                              | 162 ++++++++++++------
 kernel/bpf/helpers.c                          |  40 +++++
 kernel/bpf/syscall.c                          |  28 ++-
 kernel/bpf/verifier.c                         |   5 +-
 tools/include/uapi/linux/bpf.h                |  11 ++
 .../selftests/bpf/prog_tests/linked_list.c    |  12 +-
 8 files changed, 214 insertions(+), 72 deletions(-)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 9e8b12c7061e..2f8c4960390e 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -180,6 +180,8 @@ enum btf_field_type {
 	BPF_KPTR       = BPF_KPTR_UNREF | BPF_KPTR_REF,
 	BPF_LIST_HEAD  = (1 << 4),
 	BPF_LIST_NODE  = (1 << 5),
+	BPF_RB_ROOT    = (1 << 6),
+	BPF_RB_NODE    = (1 << 7),
 };
 
 struct btf_field_kptr {
@@ -283,6 +285,10 @@ static inline const char *btf_field_type_name(enum btf_field_type type)
 		return "bpf_list_head";
 	case BPF_LIST_NODE:
 		return "bpf_list_node";
+	case BPF_RB_ROOT:
+		return "bpf_rb_root";
+	case BPF_RB_NODE:
+		return "bpf_rb_node";
 	default:
 		WARN_ON_ONCE(1);
 		return "unknown";
@@ -303,6 +309,10 @@ static inline u32 btf_field_type_size(enum btf_field_type type)
 		return sizeof(struct bpf_list_head);
 	case BPF_LIST_NODE:
 		return sizeof(struct bpf_list_node);
+	case BPF_RB_ROOT:
+		return sizeof(struct bpf_rb_root);
+	case BPF_RB_NODE:
+		return sizeof(struct bpf_rb_node);
 	default:
 		WARN_ON_ONCE(1);
 		return 0;
@@ -323,6 +333,10 @@ static inline u32 btf_field_type_align(enum btf_field_type type)
 		return __alignof__(struct bpf_list_head);
 	case BPF_LIST_NODE:
 		return __alignof__(struct bpf_list_node);
+	case BPF_RB_ROOT:
+		return __alignof__(struct bpf_rb_root);
+	case BPF_RB_NODE:
+		return __alignof__(struct bpf_rb_node);
 	default:
 		WARN_ON_ONCE(1);
 		return 0;
@@ -433,6 +447,9 @@ void copy_map_value_locked(struct bpf_map *map, void *dst, void *src,
 void bpf_timer_cancel_and_free(void *timer);
 void bpf_list_head_free(const struct btf_field *field, void *list_head,
 			struct bpf_spin_lock *spin_lock);
+void bpf_rb_root_free(const struct btf_field *field, void *rb_root,
+		      struct bpf_spin_lock *spin_lock);
+
 
 int bpf_obj_name_cpy(char *dst, const char *src, unsigned int size);
 
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index f89de51a45db..02e68c352372 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -6901,6 +6901,17 @@ struct bpf_list_node {
 	__u64 :64;
 } __attribute__((aligned(8)));
 
+struct bpf_rb_root {
+	__u64 :64;
+	__u64 :64;
+} __attribute__((aligned(8)));
+
+struct bpf_rb_node {
+	__u64 :64;
+	__u64 :64;
+	__u64 :64;
+} __attribute__((aligned(8)));
+
 struct bpf_sysctl {
 	__u32	write;		/* Sysctl is being read (= 0) or written (= 1).
 				 * Allows 1,2,4-byte read, but no write.
diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c
index 284e3e4b76b7..a42f67031963 100644
--- a/kernel/bpf/btf.c
+++ b/kernel/bpf/btf.c
@@ -3304,12 +3304,14 @@ static const char *btf_find_decl_tag_value(const struct btf *btf,
 	return NULL;
 }
 
-static int btf_find_list_head(const struct btf *btf, const struct btf_type *pt,
-			      const struct btf_type *t, int comp_idx,
-			      u32 off, int sz, struct btf_field_info *info)
+static int
+btf_find_datastructure_head(const struct btf *btf, const struct btf_type *pt,
+			    const struct btf_type *t, int comp_idx, u32 off,
+			    int sz, struct btf_field_info *info,
+			    enum btf_field_type head_type)
 {
+	const char *node_field_name;
 	const char *value_type;
-	const char *list_node;
 	s32 id;
 
 	if (!__btf_type_is_struct(t))
@@ -3319,26 +3321,32 @@ static int btf_find_list_head(const struct btf *btf, const struct btf_type *pt,
 	value_type = btf_find_decl_tag_value(btf, pt, comp_idx, "contains:");
 	if (!value_type)
 		return -EINVAL;
-	list_node = strstr(value_type, ":");
-	if (!list_node)
+	node_field_name = strstr(value_type, ":");
+	if (!node_field_name)
 		return -EINVAL;
-	value_type = kstrndup(value_type, list_node - value_type, GFP_KERNEL | __GFP_NOWARN);
+	value_type = kstrndup(value_type, node_field_name - value_type, GFP_KERNEL | __GFP_NOWARN);
 	if (!value_type)
 		return -ENOMEM;
 	id = btf_find_by_name_kind(btf, value_type, BTF_KIND_STRUCT);
 	kfree(value_type);
 	if (id < 0)
 		return id;
-	list_node++;
-	if (str_is_empty(list_node))
+	node_field_name++;
+	if (str_is_empty(node_field_name))
 		return -EINVAL;
-	info->type = BPF_LIST_HEAD;
+	info->type = head_type;
 	info->off = off;
 	info->datastructure_head.value_btf_id = id;
-	info->datastructure_head.node_name = list_node;
+	info->datastructure_head.node_name = node_field_name;
 	return BTF_FIELD_FOUND;
 }
 
+#define field_mask_test_name(field_type, field_type_str) \
+	if (field_mask & field_type && !strcmp(name, field_type_str)) { \
+		type = field_type;					\
+		goto end;						\
+	}
+
 static int btf_get_field_type(const char *name, u32 field_mask, u32 *seen_mask,
 			      int *align, int *sz)
 {
@@ -3362,18 +3370,11 @@ static int btf_get_field_type(const char *name, u32 field_mask, u32 *seen_mask,
 			goto end;
 		}
 	}
-	if (field_mask & BPF_LIST_HEAD) {
-		if (!strcmp(name, "bpf_list_head")) {
-			type = BPF_LIST_HEAD;
-			goto end;
-		}
-	}
-	if (field_mask & BPF_LIST_NODE) {
-		if (!strcmp(name, "bpf_list_node")) {
-			type = BPF_LIST_NODE;
-			goto end;
-		}
-	}
+	field_mask_test_name(BPF_LIST_HEAD, "bpf_list_head");
+	field_mask_test_name(BPF_LIST_NODE, "bpf_list_node");
+	field_mask_test_name(BPF_RB_ROOT,   "bpf_rb_root");
+	field_mask_test_name(BPF_RB_NODE,   "bpf_rb_node");
+
 	/* Only return BPF_KPTR when all other types with matchable names fail */
 	if (field_mask & BPF_KPTR) {
 		type = BPF_KPTR_REF;
@@ -3386,6 +3387,8 @@ static int btf_get_field_type(const char *name, u32 field_mask, u32 *seen_mask,
 	return type;
 }
 
+#undef field_mask_test_name
+
 static int btf_find_struct_field(const struct btf *btf,
 				 const struct btf_type *t, u32 field_mask,
 				 struct btf_field_info *info, int info_cnt)
@@ -3418,6 +3421,7 @@ static int btf_find_struct_field(const struct btf *btf,
 		case BPF_SPIN_LOCK:
 		case BPF_TIMER:
 		case BPF_LIST_NODE:
+		case BPF_RB_NODE:
 			ret = btf_find_struct(btf, member_type, off, sz, field_type,
 					      idx < info_cnt ? &info[idx] : &tmp);
 			if (ret < 0)
@@ -3431,8 +3435,11 @@ static int btf_find_struct_field(const struct btf *btf,
 				return ret;
 			break;
 		case BPF_LIST_HEAD:
-			ret = btf_find_list_head(btf, t, member_type, i, off, sz,
-						 idx < info_cnt ? &info[idx] : &tmp);
+		case BPF_RB_ROOT:
+			ret = btf_find_datastructure_head(btf, t, member_type,
+							  i, off, sz,
+							  idx < info_cnt ? &info[idx] : &tmp,
+							  field_type);
 			if (ret < 0)
 				return ret;
 			break;
@@ -3479,6 +3486,7 @@ static int btf_find_datasec_var(const struct btf *btf, const struct btf_type *t,
 		case BPF_SPIN_LOCK:
 		case BPF_TIMER:
 		case BPF_LIST_NODE:
+		case BPF_RB_NODE:
 			ret = btf_find_struct(btf, var_type, off, sz, field_type,
 					      idx < info_cnt ? &info[idx] : &tmp);
 			if (ret < 0)
@@ -3492,8 +3500,11 @@ static int btf_find_datasec_var(const struct btf *btf, const struct btf_type *t,
 				return ret;
 			break;
 		case BPF_LIST_HEAD:
-			ret = btf_find_list_head(btf, var, var_type, -1, off, sz,
-						 idx < info_cnt ? &info[idx] : &tmp);
+		case BPF_RB_ROOT:
+			ret = btf_find_datastructure_head(btf, var, var_type,
+							  -1, off, sz,
+							  idx < info_cnt ? &info[idx] : &tmp,
+							  field_type);
 			if (ret < 0)
 				return ret;
 			break;
@@ -3595,8 +3606,11 @@ static int btf_parse_kptr(const struct btf *btf, struct btf_field *field,
 	return ret;
 }
 
-static int btf_parse_list_head(const struct btf *btf, struct btf_field *field,
-			       struct btf_field_info *info)
+static int btf_parse_datastructure_head(const struct btf *btf,
+					struct btf_field *field,
+					struct btf_field_info *info,
+					const char *node_type_name,
+					size_t node_type_align)
 {
 	const struct btf_type *t, *n = NULL;
 	const struct btf_member *member;
@@ -3618,13 +3632,13 @@ static int btf_parse_list_head(const struct btf *btf, struct btf_field *field,
 		n = btf_type_by_id(btf, member->type);
 		if (!__btf_type_is_struct(n))
 			return -EINVAL;
-		if (strcmp("bpf_list_node", __btf_name_by_offset(btf, n->name_off)))
+		if (strcmp(node_type_name, __btf_name_by_offset(btf, n->name_off)))
 			return -EINVAL;
 		offset = __btf_member_bit_offset(n, member);
 		if (offset % 8)
 			return -EINVAL;
 		offset /= 8;
-		if (offset % __alignof__(struct bpf_list_node))
+		if (offset % node_type_align)
 			return -EINVAL;
 
 		field->datastructure_head.btf = (struct btf *)btf;
@@ -3636,6 +3650,20 @@ static int btf_parse_list_head(const struct btf *btf, struct btf_field *field,
 	return 0;
 }
 
+static int btf_parse_list_head(const struct btf *btf, struct btf_field *field,
+			       struct btf_field_info *info)
+{
+	return btf_parse_datastructure_head(btf, field, info, "bpf_list_node",
+					    __alignof__(struct bpf_list_node));
+}
+
+static int btf_parse_rb_root(const struct btf *btf, struct btf_field *field,
+			     struct btf_field_info *info)
+{
+	return btf_parse_datastructure_head(btf, field, info, "bpf_rb_node",
+					    __alignof__(struct bpf_rb_node));
+}
+
 struct btf_record *btf_parse_fields(const struct btf *btf, const struct btf_type *t,
 				    u32 field_mask, u32 value_size)
 {
@@ -3698,7 +3726,13 @@ struct btf_record *btf_parse_fields(const struct btf *btf, const struct btf_type
 			if (ret < 0)
 				goto end;
 			break;
+		case BPF_RB_ROOT:
+			ret = btf_parse_rb_root(btf, &rec->fields[i], &info_arr[i]);
+			if (ret < 0)
+				goto end;
+			break;
 		case BPF_LIST_NODE:
+		case BPF_RB_NODE:
 			break;
 		default:
 			ret = -EFAULT;
@@ -3707,8 +3741,9 @@ struct btf_record *btf_parse_fields(const struct btf *btf, const struct btf_type
 		rec->cnt++;
 	}
 
-	/* bpf_list_head requires bpf_spin_lock */
-	if (btf_record_has_field(rec, BPF_LIST_HEAD) && rec->spin_lock_off < 0) {
+	/* bpf_{list_head, rb_root} require bpf_spin_lock */
+	if ((btf_record_has_field(rec, BPF_LIST_HEAD) ||
+	     btf_record_has_field(rec, BPF_RB_ROOT)) && rec->spin_lock_off < 0) {
 		ret = -EINVAL;
 		goto end;
 	}
@@ -3719,22 +3754,28 @@ struct btf_record *btf_parse_fields(const struct btf *btf, const struct btf_type
 	return ERR_PTR(ret);
 }
 
+#define OWNER_FIELD_MASK (BPF_LIST_HEAD | BPF_RB_ROOT)
+#define OWNEE_FIELD_MASK (BPF_LIST_NODE | BPF_RB_NODE)
+
 int btf_check_and_fixup_fields(const struct btf *btf, struct btf_record *rec)
 {
 	int i;
 
-	/* There are two owning types, kptr_ref and bpf_list_head. The former
-	 * only supports storing kernel types, which can never store references
-	 * to program allocated local types, atleast not yet. Hence we only need
-	 * to ensure that bpf_list_head ownership does not form cycles.
+	/* There are three types that signify ownership of some other type:
+	 *  kptr_ref, bpf_list_head, bpf_rb_root.
+	 * kptr_ref only supports storing kernel types, which can't store
+	 * references to program allocated local types.
+	 *
+	 * Hence we only need to ensure that bpf_{list_head,rb_root} ownership
+	 * does not form cycles.
 	 */
-	if (IS_ERR_OR_NULL(rec) || !(rec->field_mask & BPF_LIST_HEAD))
+	if (IS_ERR_OR_NULL(rec) || !(rec->field_mask & OWNER_FIELD_MASK))
 		return 0;
 	for (i = 0; i < rec->cnt; i++) {
 		struct btf_struct_meta *meta;
 		u32 btf_id;
 
-		if (!(rec->fields[i].type & BPF_LIST_HEAD))
+		if (!(rec->fields[i].type & OWNER_FIELD_MASK))
 			continue;
 		btf_id = rec->fields[i].datastructure_head.value_btf_id;
 		meta = btf_find_struct_meta(btf, btf_id);
@@ -3742,39 +3783,47 @@ int btf_check_and_fixup_fields(const struct btf *btf, struct btf_record *rec)
 			return -EFAULT;
 		rec->fields[i].datastructure_head.value_rec = meta->record;
 
-		if (!(rec->field_mask & BPF_LIST_NODE))
+		/* We need to set value_rec for all owner types, but no need
+		 * to check ownership cycle for a type unless it's also an
+		 * ownee type.
+		 */
+		if (!(rec->field_mask & OWNEE_FIELD_MASK))
 			continue;
 
 		/* We need to ensure ownership acyclicity among all types. The
 		 * proper way to do it would be to topologically sort all BTF
 		 * IDs based on the ownership edges, since there can be multiple
-		 * bpf_list_head in a type. Instead, we use the following
-		 * reasoning:
+		 * bpf_{list_head,rb_root} in a type. Instead, we use the
+		 * following reasoning:
 		 *
 		 * - A type can only be owned by another type in user BTF if it
-		 *   has a bpf_list_node.
+		 *   has a bpf_{list,rb}_node. Let's call these ownee types.
 		 * - A type can only _own_ another type in user BTF if it has a
-		 *   bpf_list_head.
+		 *   bpf_{list_head,rb_root}. Let's call these owner types.
 		 *
-		 * We ensure that if a type has both bpf_list_head and
-		 * bpf_list_node, its element types cannot be owning types.
+		 * We ensure that if a type is both an owner and ownee, its
+		 * element types cannot be owner types.
 		 *
 		 * To ensure acyclicity:
 		 *
-		 * When A only has bpf_list_head, ownership chain can be:
+		 * When A is an owner type but not an ownee, its ownership
+		 * chain can be:
 		 *	A -> B -> C
 		 * Where:
-		 * - B has both bpf_list_head and bpf_list_node.
-		 * - C only has bpf_list_node.
+		 * - A is an owner, e.g. has bpf_rb_root.
+		 * - B is both an owner and ownee, e.g. has bpf_rb_node and
+		 *   bpf_list_head.
+		 * - C is only an ownee, e.g. has bpf_list_node.
 		 *
-		 * When A has both bpf_list_head and bpf_list_node, some other
-		 * type already owns it in the BTF domain, hence it can not own
-		 * another owning type through any of the bpf_list_head edges.
+		 * When A is both an owner and ownee, some other type already
+		 * owns it in the BTF domain, hence it can not own
+		 * another owner type through any of the ownership edges.
 		 *	A -> B
 		 * Where:
-		 * - B only has bpf_list_node.
+		 * - A is both an owner and ownee.
+		 * - B is only an ownee.
 		 */
-		if (meta->record->field_mask & BPF_LIST_HEAD)
+		if (meta->record->field_mask & OWNER_FIELD_MASK)
 			return -ELOOP;
 	}
 	return 0;
@@ -5236,6 +5285,8 @@ static const char *alloc_obj_fields[] = {
 	"bpf_spin_lock",
 	"bpf_list_head",
 	"bpf_list_node",
+	"bpf_rb_root",
+	"bpf_rb_node",
 };
 
 static struct btf_struct_metas *
@@ -5309,7 +5360,8 @@ btf_parse_struct_metas(struct bpf_verifier_log *log, struct btf *btf)
 
 		type = &tab->types[tab->cnt];
 		type->btf_id = i;
-		record = btf_parse_fields(btf, t, BPF_SPIN_LOCK | BPF_LIST_HEAD | BPF_LIST_NODE, t->size);
+		record = btf_parse_fields(btf, t, BPF_SPIN_LOCK | BPF_LIST_HEAD | BPF_LIST_NODE |
+						  BPF_RB_ROOT | BPF_RB_NODE, t->size);
 		/* The record cannot be unset, treat it as an error if so */
 		if (IS_ERR_OR_NULL(record)) {
 			ret = PTR_ERR_OR_ZERO(record) ?: -EFAULT;
diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
index 6c67740222c2..4d04432b162e 100644
--- a/kernel/bpf/helpers.c
+++ b/kernel/bpf/helpers.c
@@ -1753,6 +1753,46 @@ void bpf_list_head_free(const struct btf_field *field, void *list_head,
 	}
 }
 
+/* Like rbtree_postorder_for_each_entry_safe, but 'pos' and 'n' are
+ * 'rb_node *', so field name of rb_node within containing struct is not
+ * needed.
+ *
+ * Since a BPF rbtree's node type has a corresponding struct btf_field with
+ * datastructure_head.node_offset, it's not necessary to know the field name
+ * or type of the node struct.
+ */
+#define bpf_rbtree_postorder_for_each_entry_safe(pos, n, root) \
+	for (pos = rb_first_postorder(root); \
+	    pos && ({ n = rb_next_postorder(pos); 1; }); \
+	    pos = n)
+
+void bpf_rb_root_free(const struct btf_field *field, void *rb_root,
+		      struct bpf_spin_lock *spin_lock)
+{
+	struct rb_root_cached orig_root, *root = rb_root;
+	struct rb_node *pos, *n;
+	void *obj;
+
+	BUILD_BUG_ON(sizeof(struct rb_root_cached) > sizeof(struct bpf_rb_root));
+	BUILD_BUG_ON(__alignof__(struct rb_root_cached) > __alignof__(struct bpf_rb_root));
+
+	__bpf_spin_lock_irqsave(spin_lock);
+	orig_root = *root;
+	*root = RB_ROOT_CACHED;
+	__bpf_spin_unlock_irqrestore(spin_lock);
+
+	bpf_rbtree_postorder_for_each_entry_safe(pos, n, &orig_root.rb_root) {
+		obj = pos;
+		obj -= field->datastructure_head.node_offset;
+
+		bpf_obj_free_fields(field->datastructure_head.value_rec, obj);
+
+		migrate_disable();
+		bpf_mem_free(&bpf_global_ma, obj);
+		migrate_enable();
+	}
+}
+
 __diag_push();
 __diag_ignore_all("-Wmissing-prototypes",
 		  "Global functions as their definitions will be in vmlinux BTF");
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index c3599a7902f0..b6b464c15575 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -527,9 +527,6 @@ void btf_record_free(struct btf_record *rec)
 		return;
 	for (i = 0; i < rec->cnt; i++) {
 		switch (rec->fields[i].type) {
-		case BPF_SPIN_LOCK:
-		case BPF_TIMER:
-			break;
 		case BPF_KPTR_UNREF:
 		case BPF_KPTR_REF:
 			if (rec->fields[i].kptr.module)
@@ -538,7 +535,11 @@ void btf_record_free(struct btf_record *rec)
 			break;
 		case BPF_LIST_HEAD:
 		case BPF_LIST_NODE:
-			/* Nothing to release for bpf_list_head */
+		case BPF_RB_ROOT:
+		case BPF_RB_NODE:
+		case BPF_SPIN_LOCK:
+		case BPF_TIMER:
+			/* Nothing to release */
 			break;
 		default:
 			WARN_ON_ONCE(1);
@@ -571,9 +572,6 @@ struct btf_record *btf_record_dup(const struct btf_record *rec)
 	new_rec->cnt = 0;
 	for (i = 0; i < rec->cnt; i++) {
 		switch (fields[i].type) {
-		case BPF_SPIN_LOCK:
-		case BPF_TIMER:
-			break;
 		case BPF_KPTR_UNREF:
 		case BPF_KPTR_REF:
 			btf_get(fields[i].kptr.btf);
@@ -584,7 +582,11 @@ struct btf_record *btf_record_dup(const struct btf_record *rec)
 			break;
 		case BPF_LIST_HEAD:
 		case BPF_LIST_NODE:
-			/* Nothing to acquire for bpf_list_head */
+		case BPF_RB_ROOT:
+		case BPF_RB_NODE:
+		case BPF_SPIN_LOCK:
+		case BPF_TIMER:
+			/* Nothing to acquire */
 			break;
 		default:
 			ret = -EFAULT;
@@ -664,7 +666,13 @@ void bpf_obj_free_fields(const struct btf_record *rec, void *obj)
 				continue;
 			bpf_list_head_free(field, field_ptr, obj + rec->spin_lock_off);
 			break;
+		case BPF_RB_ROOT:
+			if (WARN_ON_ONCE(rec->spin_lock_off < 0))
+				continue;
+			bpf_rb_root_free(field, field_ptr, obj + rec->spin_lock_off);
+			break;
 		case BPF_LIST_NODE:
+		case BPF_RB_NODE:
 			break;
 		default:
 			WARN_ON_ONCE(1);
@@ -1005,7 +1013,8 @@ static int map_check_btf(struct bpf_map *map, const struct btf *btf,
 		return -EINVAL;
 
 	map->record = btf_parse_fields(btf, value_type,
-				       BPF_SPIN_LOCK | BPF_TIMER | BPF_KPTR | BPF_LIST_HEAD,
+				       BPF_SPIN_LOCK | BPF_TIMER | BPF_KPTR | BPF_LIST_HEAD |
+				       BPF_RB_ROOT,
 				       map->value_size);
 	if (IS_ERR(map->record))
 		return -EINVAL;
@@ -1056,6 +1065,7 @@ static int map_check_btf(struct bpf_map *map, const struct btf *btf,
 				}
 				break;
 			case BPF_LIST_HEAD:
+			case BPF_RB_ROOT:
 				if (map->map_type != BPF_MAP_TYPE_HASH &&
 				    map->map_type != BPF_MAP_TYPE_LRU_HASH &&
 				    map->map_type != BPF_MAP_TYPE_ARRAY) {
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index bc80b4c4377b..9d9e00fd6dfa 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -14105,9 +14105,10 @@ static int check_map_prog_compatibility(struct bpf_verifier_env *env,
 {
 	enum bpf_prog_type prog_type = resolve_prog_type(prog);
 
-	if (btf_record_has_field(map->record, BPF_LIST_HEAD)) {
+	if (btf_record_has_field(map->record, BPF_LIST_HEAD) ||
+	    btf_record_has_field(map->record, BPF_RB_ROOT)) {
 		if (is_tracing_prog_type(prog_type)) {
-			verbose(env, "tracing progs cannot use bpf_list_head yet\n");
+			verbose(env, "tracing progs cannot use bpf_{list_head,rb_root} yet\n");
 			return -EINVAL;
 		}
 	}
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index f89de51a45db..02e68c352372 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -6901,6 +6901,17 @@ struct bpf_list_node {
 	__u64 :64;
 } __attribute__((aligned(8)));
 
+struct bpf_rb_root {
+	__u64 :64;
+	__u64 :64;
+} __attribute__((aligned(8)));
+
+struct bpf_rb_node {
+	__u64 :64;
+	__u64 :64;
+	__u64 :64;
+} __attribute__((aligned(8)));
+
 struct bpf_sysctl {
 	__u32	write;		/* Sysctl is being read (= 0) or written (= 1).
 				 * Allows 1,2,4-byte read, but no write.
diff --git a/tools/testing/selftests/bpf/prog_tests/linked_list.c b/tools/testing/selftests/bpf/prog_tests/linked_list.c
index 9a7d4c47af63..b124028ab51a 100644
--- a/tools/testing/selftests/bpf/prog_tests/linked_list.c
+++ b/tools/testing/selftests/bpf/prog_tests/linked_list.c
@@ -58,12 +58,12 @@ static struct {
 	TEST(inner_map, pop_front)
 	TEST(inner_map, pop_back)
 #undef TEST
-	{ "map_compat_kprobe", "tracing progs cannot use bpf_list_head yet" },
-	{ "map_compat_kretprobe", "tracing progs cannot use bpf_list_head yet" },
-	{ "map_compat_tp", "tracing progs cannot use bpf_list_head yet" },
-	{ "map_compat_perf", "tracing progs cannot use bpf_list_head yet" },
-	{ "map_compat_raw_tp", "tracing progs cannot use bpf_list_head yet" },
-	{ "map_compat_raw_tp_w", "tracing progs cannot use bpf_list_head yet" },
+	{ "map_compat_kprobe", "tracing progs cannot use bpf_{list_head,rb_root} yet" },
+	{ "map_compat_kretprobe", "tracing progs cannot use bpf_{list_head,rb_root} yet" },
+	{ "map_compat_tp", "tracing progs cannot use bpf_{list_head,rb_root} yet" },
+	{ "map_compat_perf", "tracing progs cannot use bpf_{list_head,rb_root} yet" },
+	{ "map_compat_raw_tp", "tracing progs cannot use bpf_{list_head,rb_root} yet" },
+	{ "map_compat_raw_tp_w", "tracing progs cannot use bpf_{list_head,rb_root} yet" },
 	{ "obj_type_id_oor", "local type ID argument must be in range [0, U32_MAX]" },
 	{ "obj_new_no_composite", "bpf_obj_new type ID argument must be of a struct" },
 	{ "obj_new_no_struct", "bpf_obj_new type ID argument must be of a struct" },
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [PATCH bpf-next 06/13] bpf: Add bpf_rbtree_{add,remove,first} kfuncs
  2022-12-06 23:09 [PATCH bpf-next 00/13] BPF rbtree next-gen datastructure Dave Marchevsky
                   ` (4 preceding siblings ...)
  2022-12-06 23:09 ` [PATCH bpf-next 05/13] bpf: Add basic bpf_rb_{root,node} support Dave Marchevsky
@ 2022-12-06 23:09 ` Dave Marchevsky
  2022-12-07 14:20   ` kernel test robot
  2022-12-06 23:09 ` [PATCH bpf-next 07/13] bpf: Add support for bpf_rb_root and bpf_rb_node in kfunc args Dave Marchevsky
                   ` (8 subsequent siblings)
  14 siblings, 1 reply; 51+ messages in thread
From: Dave Marchevsky @ 2022-12-06 23:09 UTC (permalink / raw)
  To: bpf
  Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Kernel Team, Kumar Kartikeya Dwivedi, Tejun Heo, Dave Marchevsky

This patch adds implementations of bpf_rbtree_{add,remove,first}
and teaches verifier about their BTF_IDs as well as those of
bpf_rb_{root,node}.

All three kfuncs have some nonstandard component to their verification
that needs to be addressed in future patches before programs can
properly use them:

  * bpf_rbtree_add:     Takes 'less' callback, need to verify it

  * bpf_rbtree_first:   Returns ptr_to_node_type(off=rb_node_off) instead
                        of ptr_to_rb_node(off=0). Return value ref should
                        be released on unlock.

  * bpf_rbtree_remove:  Returns ptr_to_node_type(off=rb_node_off) instead
                        of ptr_to_rb_node(off=0). 2nd arg (node) is a
                        release_on_unlock + PTR_UNTRUSTED reg.

Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
---
 kernel/bpf/helpers.c  | 31 +++++++++++++++++++++++++++++++
 kernel/bpf/verifier.c | 11 +++++++++++
 2 files changed, 42 insertions(+)

diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
index 4d04432b162e..d216c54b65ab 100644
--- a/kernel/bpf/helpers.c
+++ b/kernel/bpf/helpers.c
@@ -1865,6 +1865,33 @@ struct bpf_list_node *bpf_list_pop_back(struct bpf_list_head *head)
 	return __bpf_list_del(head, true);
 }
 
+struct bpf_rb_node *bpf_rbtree_remove(struct bpf_rb_root *root, struct bpf_rb_node *node)
+{
+	struct rb_root_cached *r = (struct rb_root_cached *)root;
+	struct rb_node *n = (struct rb_node *)node;
+
+	if (WARN_ON_ONCE(RB_EMPTY_NODE(n)))
+		return (struct bpf_rb_node *)NULL;
+
+	rb_erase_cached(n, r);
+	RB_CLEAR_NODE(n);
+	return (struct bpf_rb_node *)n;
+}
+
+void bpf_rbtree_add(struct bpf_rb_root *root, struct bpf_rb_node *node,
+		    bool (less)(struct bpf_rb_node *a, const struct bpf_rb_node *b))
+{
+	rb_add_cached((struct rb_node *)node, (struct rb_root_cached *)root,
+		      (bool (*)(struct rb_node *, const struct rb_node *))less);
+}
+
+struct bpf_rb_node *bpf_rbtree_first(struct bpf_rb_root *root)
+{
+	struct rb_root_cached *r = (struct rb_root_cached *)root;
+
+	return (struct bpf_rb_node *)rb_first_cached(r);
+}
+
 /**
  * bpf_task_acquire - Acquire a reference to a task. A task acquired by this
  * kfunc which is not stored in a map as a kptr, must be released by calling
@@ -2069,6 +2096,10 @@ BTF_ID_FLAGS(func, bpf_task_acquire, KF_ACQUIRE | KF_TRUSTED_ARGS)
 BTF_ID_FLAGS(func, bpf_task_acquire_not_zero, KF_ACQUIRE | KF_RCU | KF_RET_NULL)
 BTF_ID_FLAGS(func, bpf_task_kptr_get, KF_ACQUIRE | KF_KPTR_GET | KF_RET_NULL)
 BTF_ID_FLAGS(func, bpf_task_release, KF_RELEASE)
+BTF_ID_FLAGS(func, bpf_rbtree_remove, KF_ACQUIRE | KF_RET_NULL)
+BTF_ID_FLAGS(func, bpf_rbtree_add)
+BTF_ID_FLAGS(func, bpf_rbtree_first, KF_ACQUIRE | KF_RET_NULL)
+
 #ifdef CONFIG_CGROUPS
 BTF_ID_FLAGS(func, bpf_cgroup_acquire, KF_ACQUIRE | KF_TRUSTED_ARGS)
 BTF_ID_FLAGS(func, bpf_cgroup_kptr_get, KF_ACQUIRE | KF_KPTR_GET | KF_RET_NULL)
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 9d9e00fd6dfa..e36dbde8736c 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -8135,6 +8135,8 @@ BTF_ID_LIST(kf_arg_btf_ids)
 BTF_ID(struct, bpf_dynptr_kern)
 BTF_ID(struct, bpf_list_head)
 BTF_ID(struct, bpf_list_node)
+BTF_ID(struct, bpf_rb_root)
+BTF_ID(struct, bpf_rb_node)
 
 static bool __is_kfunc_ptr_arg_type(const struct btf *btf,
 				    const struct btf_param *arg, int type)
@@ -8240,6 +8242,9 @@ enum special_kfunc_type {
 	KF_bpf_rdonly_cast,
 	KF_bpf_rcu_read_lock,
 	KF_bpf_rcu_read_unlock,
+	KF_bpf_rbtree_remove,
+	KF_bpf_rbtree_add,
+	KF_bpf_rbtree_first,
 };
 
 BTF_SET_START(special_kfunc_set)
@@ -8251,6 +8256,9 @@ BTF_ID(func, bpf_list_pop_front)
 BTF_ID(func, bpf_list_pop_back)
 BTF_ID(func, bpf_cast_to_kern_ctx)
 BTF_ID(func, bpf_rdonly_cast)
+BTF_ID(func, bpf_rbtree_remove)
+BTF_ID(func, bpf_rbtree_add)
+BTF_ID(func, bpf_rbtree_first)
 BTF_SET_END(special_kfunc_set)
 
 BTF_ID_LIST(special_kfunc_list)
@@ -8264,6 +8272,9 @@ BTF_ID(func, bpf_cast_to_kern_ctx)
 BTF_ID(func, bpf_rdonly_cast)
 BTF_ID(func, bpf_rcu_read_lock)
 BTF_ID(func, bpf_rcu_read_unlock)
+BTF_ID(func, bpf_rbtree_remove)
+BTF_ID(func, bpf_rbtree_add)
+BTF_ID(func, bpf_rbtree_first)
 
 static bool is_kfunc_bpf_rcu_read_lock(struct bpf_kfunc_call_arg_meta *meta)
 {
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [PATCH bpf-next 07/13] bpf: Add support for bpf_rb_root and bpf_rb_node in kfunc args
  2022-12-06 23:09 [PATCH bpf-next 00/13] BPF rbtree next-gen datastructure Dave Marchevsky
                   ` (5 preceding siblings ...)
  2022-12-06 23:09 ` [PATCH bpf-next 06/13] bpf: Add bpf_rbtree_{add,remove,first} kfuncs Dave Marchevsky
@ 2022-12-06 23:09 ` Dave Marchevsky
  2022-12-07  1:51   ` Alexei Starovoitov
  2022-12-06 23:09 ` [PATCH bpf-next 08/13] bpf: Add callback validation to kfunc verifier logic Dave Marchevsky
                   ` (7 subsequent siblings)
  14 siblings, 1 reply; 51+ messages in thread
From: Dave Marchevsky @ 2022-12-06 23:09 UTC (permalink / raw)
  To: bpf
  Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Kernel Team, Kumar Kartikeya Dwivedi, Tejun Heo, Dave Marchevsky

Now that we find bpf_rb_root and bpf_rb_node in structs, let's give args
that contain those types special classification and properly handle
these types when checking kfunc args.

"Properly handling" these types largely requires generalizing similar
handling for bpf_list_{head,node}, with little new logic added in this
patch.

Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
---
 kernel/bpf/verifier.c | 237 ++++++++++++++++++++++++++++++++++++------
 1 file changed, 203 insertions(+), 34 deletions(-)

diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index e36dbde8736c..652112007b2c 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -8018,6 +8018,9 @@ struct bpf_kfunc_call_arg_meta {
 	struct {
 		struct btf_field *field;
 	} arg_list_head;
+	struct {
+		struct btf_field *field;
+	} arg_rbtree_root;
 };
 
 static bool is_kfunc_acquire(struct bpf_kfunc_call_arg_meta *meta)
@@ -8129,6 +8132,8 @@ enum {
 	KF_ARG_DYNPTR_ID,
 	KF_ARG_LIST_HEAD_ID,
 	KF_ARG_LIST_NODE_ID,
+	KF_ARG_RB_ROOT_ID,
+	KF_ARG_RB_NODE_ID,
 };
 
 BTF_ID_LIST(kf_arg_btf_ids)
@@ -8170,6 +8175,16 @@ static bool is_kfunc_arg_list_node(const struct btf *btf, const struct btf_param
 	return __is_kfunc_ptr_arg_type(btf, arg, KF_ARG_LIST_NODE_ID);
 }
 
+static bool is_kfunc_arg_rbtree_root(const struct btf *btf, const struct btf_param *arg)
+{
+	return __is_kfunc_ptr_arg_type(btf, arg, KF_ARG_RB_ROOT_ID);
+}
+
+static bool is_kfunc_arg_rbtree_node(const struct btf *btf, const struct btf_param *arg)
+{
+	return __is_kfunc_ptr_arg_type(btf, arg, KF_ARG_RB_NODE_ID);
+}
+
 /* Returns true if struct is composed of scalars, 4 levels of nesting allowed */
 static bool __btf_type_is_scalar_struct(struct bpf_verifier_env *env,
 					const struct btf *btf,
@@ -8229,6 +8244,8 @@ enum kfunc_ptr_arg_type {
 	KF_ARG_PTR_TO_BTF_ID,	     /* Also covers reg2btf_ids conversions */
 	KF_ARG_PTR_TO_MEM,
 	KF_ARG_PTR_TO_MEM_SIZE,	     /* Size derived from next argument, skip it */
+	KF_ARG_PTR_TO_RB_ROOT,
+	KF_ARG_PTR_TO_RB_NODE,
 };
 
 enum special_kfunc_type {
@@ -8336,6 +8353,12 @@ get_kfunc_ptr_arg_type(struct bpf_verifier_env *env,
 	if (is_kfunc_arg_list_node(meta->btf, &args[argno]))
 		return KF_ARG_PTR_TO_LIST_NODE;
 
+	if (is_kfunc_arg_rbtree_root(meta->btf, &args[argno]))
+		return KF_ARG_PTR_TO_RB_ROOT;
+
+	if (is_kfunc_arg_rbtree_node(meta->btf, &args[argno]))
+		return KF_ARG_PTR_TO_RB_NODE;
+
 	if ((base_type(reg->type) == PTR_TO_BTF_ID || reg2btf_ids[base_type(reg->type)])) {
 		if (!btf_type_is_struct(ref_t)) {
 			verbose(env, "kernel function %s args#%d pointer type %s %s is not supported\n",
@@ -8550,97 +8573,196 @@ static bool is_bpf_list_api_kfunc(u32 btf_id)
 	       btf_id == special_kfunc_list[KF_bpf_list_pop_back];
 }
 
-static int process_kf_arg_ptr_to_list_head(struct bpf_verifier_env *env,
+static bool is_bpf_rbtree_api_kfunc(u32 btf_id)
+{
+	return btf_id == special_kfunc_list[KF_bpf_rbtree_add] ||
+	       btf_id == special_kfunc_list[KF_bpf_rbtree_remove] ||
+	       btf_id == special_kfunc_list[KF_bpf_rbtree_first];
+}
+
+static bool is_bpf_datastructure_api_kfunc(u32 btf_id)
+{
+	return is_bpf_list_api_kfunc(btf_id) || is_bpf_rbtree_api_kfunc(btf_id);
+}
+
+static bool check_kfunc_is_datastructure_head_api(struct bpf_verifier_env *env,
+						  enum btf_field_type head_field_type,
+						  u32 kfunc_btf_id)
+{
+	bool ret;
+
+	switch (head_field_type) {
+	case BPF_LIST_HEAD:
+		ret = is_bpf_list_api_kfunc(kfunc_btf_id);
+		break;
+	case BPF_RB_ROOT:
+		ret = is_bpf_rbtree_api_kfunc(kfunc_btf_id);
+		break;
+	default:
+		verbose(env, "verifier internal error: unexpected datastructure head argument type %s\n",
+			btf_field_type_name(head_field_type));
+		return false;
+	}
+
+	if (!ret)
+		verbose(env, "verifier internal error: %s head arg for unknown kfunc\n",
+			btf_field_type_name(head_field_type));
+	return ret;
+}
+
+static bool check_kfunc_is_datastructure_node_api(struct bpf_verifier_env *env,
+						  enum btf_field_type node_field_type,
+						  u32 kfunc_btf_id)
+{
+	bool ret;
+
+	switch (node_field_type) {
+	case BPF_LIST_NODE:
+		ret = (kfunc_btf_id == special_kfunc_list[KF_bpf_list_push_front] ||
+		       kfunc_btf_id == special_kfunc_list[KF_bpf_list_push_back]);
+		break;
+	case BPF_RB_NODE:
+		ret = (kfunc_btf_id == special_kfunc_list[KF_bpf_rbtree_remove] ||
+		       kfunc_btf_id == special_kfunc_list[KF_bpf_rbtree_add]);
+		break;
+	default:
+		verbose(env, "verifier internal error: unexpected datastructure node argument type %s\n",
+			btf_field_type_name(node_field_type));
+		return false;
+	}
+
+	if (!ret)
+		verbose(env, "verifier internal error: %s node arg for unknown kfunc\n",
+			btf_field_type_name(node_field_type));
+	return ret;
+}
+
+static int
+__process_kf_arg_ptr_to_datastructure_head(struct bpf_verifier_env *env,
 					   struct bpf_reg_state *reg, u32 regno,
-					   struct bpf_kfunc_call_arg_meta *meta)
+					   struct bpf_kfunc_call_arg_meta *meta,
+					   enum btf_field_type head_field_type,
+					   struct btf_field **head_field)
 {
+	const char *head_type_name;
 	struct btf_field *field;
 	struct btf_record *rec;
-	u32 list_head_off;
+	u32 head_off;
 
-	if (meta->btf != btf_vmlinux || !is_bpf_list_api_kfunc(meta->func_id)) {
-		verbose(env, "verifier internal error: bpf_list_head argument for unknown kfunc\n");
+	if (meta->btf != btf_vmlinux) {
+		verbose(env, "verifier internal error: unexpected btf mismatch in kfunc call\n");
 		return -EFAULT;
 	}
 
+	if (!check_kfunc_is_datastructure_head_api(env, head_field_type, meta->func_id))
+		return -EFAULT;
+
+	head_type_name = btf_field_type_name(head_field_type);
 	if (!tnum_is_const(reg->var_off)) {
 		verbose(env,
-			"R%d doesn't have constant offset. bpf_list_head has to be at the constant offset\n",
-			regno);
+			"R%d doesn't have constant offset. %s has to be at the constant offset\n",
+			regno, head_type_name);
 		return -EINVAL;
 	}
 
 	rec = reg_btf_record(reg);
-	list_head_off = reg->off + reg->var_off.value;
-	field = btf_record_find(rec, list_head_off, BPF_LIST_HEAD);
+	head_off = reg->off + reg->var_off.value;
+	field = btf_record_find(rec, head_off, head_field_type);
 	if (!field) {
-		verbose(env, "bpf_list_head not found at offset=%u\n", list_head_off);
+		verbose(env, "%s not found at offset=%u\n", head_type_name, head_off);
 		return -EINVAL;
 	}
 
 	/* All functions require bpf_list_head to be protected using a bpf_spin_lock */
 	if (check_reg_allocation_locked(env, reg)) {
-		verbose(env, "bpf_spin_lock at off=%d must be held for bpf_list_head\n",
-			rec->spin_lock_off);
+		verbose(env, "bpf_spin_lock at off=%d must be held for %s\n",
+			rec->spin_lock_off, head_type_name);
 		return -EINVAL;
 	}
 
-	if (meta->arg_list_head.field) {
-		verbose(env, "verifier internal error: repeating bpf_list_head arg\n");
+	if (*head_field) {
+		verbose(env, "verifier internal error: repeating %s arg\n", head_type_name);
 		return -EFAULT;
 	}
-	meta->arg_list_head.field = field;
+	*head_field = field;
 	return 0;
 }
 
-static int process_kf_arg_ptr_to_list_node(struct bpf_verifier_env *env,
+
+static int process_kf_arg_ptr_to_list_head(struct bpf_verifier_env *env,
 					   struct bpf_reg_state *reg, u32 regno,
 					   struct bpf_kfunc_call_arg_meta *meta)
 {
+	return __process_kf_arg_ptr_to_datastructure_head(env, reg, regno, meta, BPF_LIST_HEAD,
+							  &meta->arg_list_head.field);
+}
+
+static int process_kf_arg_ptr_to_rbtree_root(struct bpf_verifier_env *env,
+					     struct bpf_reg_state *reg, u32 regno,
+					     struct bpf_kfunc_call_arg_meta *meta)
+{
+	return __process_kf_arg_ptr_to_datastructure_head(env, reg, regno, meta, BPF_RB_ROOT,
+							  &meta->arg_rbtree_root.field);
+}
+
+static int
+__process_kf_arg_ptr_to_datastructure_node(struct bpf_verifier_env *env,
+					   struct bpf_reg_state *reg, u32 regno,
+					   struct bpf_kfunc_call_arg_meta *meta,
+					   enum btf_field_type head_field_type,
+					   enum btf_field_type node_field_type,
+					   struct btf_field **node_field)
+{
+	const char *node_type_name;
 	const struct btf_type *et, *t;
 	struct btf_field *field;
 	struct btf_record *rec;
-	u32 list_node_off;
+	u32 node_off;
 
-	if (meta->btf != btf_vmlinux ||
-	    (meta->func_id != special_kfunc_list[KF_bpf_list_push_front] &&
-	     meta->func_id != special_kfunc_list[KF_bpf_list_push_back])) {
-		verbose(env, "verifier internal error: bpf_list_node argument for unknown kfunc\n");
+	if (meta->btf != btf_vmlinux) {
+		verbose(env, "verifier internal error: unexpected btf mismatch in kfunc call\n");
 		return -EFAULT;
 	}
 
+	if (!check_kfunc_is_datastructure_node_api(env, node_field_type, meta->func_id))
+		return -EFAULT;
+
+	node_type_name = btf_field_type_name(node_field_type);
 	if (!tnum_is_const(reg->var_off)) {
 		verbose(env,
-			"R%d doesn't have constant offset. bpf_list_node has to be at the constant offset\n",
-			regno);
+			"R%d doesn't have constant offset. %s has to be at the constant offset\n",
+			regno, node_type_name);
 		return -EINVAL;
 	}
 
 	rec = reg_btf_record(reg);
-	list_node_off = reg->off + reg->var_off.value;
-	field = btf_record_find(rec, list_node_off, BPF_LIST_NODE);
-	if (!field || field->offset != list_node_off) {
-		verbose(env, "bpf_list_node not found at offset=%u\n", list_node_off);
+	node_off = reg->off + reg->var_off.value;
+	field = btf_record_find(rec, node_off, node_field_type);
+	if (!field || field->offset != node_off) {
+		verbose(env, "%s not found at offset=%u\n", node_type_name, node_off);
 		return -EINVAL;
 	}
 
-	field = meta->arg_list_head.field;
+	field = *node_field;
 
 	et = btf_type_by_id(field->datastructure_head.btf, field->datastructure_head.value_btf_id);
 	t = btf_type_by_id(reg->btf, reg->btf_id);
 	if (!btf_struct_ids_match(&env->log, reg->btf, reg->btf_id, 0, field->datastructure_head.btf,
 				  field->datastructure_head.value_btf_id, true)) {
-		verbose(env, "operation on bpf_list_head expects arg#1 bpf_list_node at offset=%d "
+		verbose(env, "operation on %s expects arg#1 %s at offset=%d "
 			"in struct %s, but arg is at offset=%d in struct %s\n",
+			btf_field_type_name(head_field_type),
+			btf_field_type_name(node_field_type),
 			field->datastructure_head.node_offset,
 			btf_name_by_offset(field->datastructure_head.btf, et->name_off),
-			list_node_off, btf_name_by_offset(reg->btf, t->name_off));
+			node_off, btf_name_by_offset(reg->btf, t->name_off));
 		return -EINVAL;
 	}
 
-	if (list_node_off != field->datastructure_head.node_offset) {
-		verbose(env, "arg#1 offset=%d, but expected bpf_list_node at offset=%d in struct %s\n",
-			list_node_off, field->datastructure_head.node_offset,
+	if (node_off != field->datastructure_head.node_offset) {
+		verbose(env, "arg#1 offset=%d, but expected %s at offset=%d in struct %s\n",
+			node_off, btf_field_type_name(node_field_type),
+			field->datastructure_head.node_offset,
 			btf_name_by_offset(field->datastructure_head.btf, et->name_off));
 		return -EINVAL;
 	}
@@ -8648,6 +8770,24 @@ static int process_kf_arg_ptr_to_list_node(struct bpf_verifier_env *env,
 	return ref_set_release_on_unlock(env, reg->ref_obj_id);
 }
 
+static int process_kf_arg_ptr_to_list_node(struct bpf_verifier_env *env,
+					   struct bpf_reg_state *reg, u32 regno,
+					   struct bpf_kfunc_call_arg_meta *meta)
+{
+	return __process_kf_arg_ptr_to_datastructure_node(env, reg, regno, meta,
+							  BPF_LIST_HEAD, BPF_LIST_NODE,
+							  &meta->arg_list_head.field);
+}
+
+static int process_kf_arg_ptr_to_rbtree_node(struct bpf_verifier_env *env,
+					     struct bpf_reg_state *reg, u32 regno,
+					     struct bpf_kfunc_call_arg_meta *meta)
+{
+	return __process_kf_arg_ptr_to_datastructure_node(env, reg, regno, meta,
+							  BPF_RB_ROOT, BPF_RB_NODE,
+							  &meta->arg_rbtree_root.field);
+}
+
 static int check_kfunc_args(struct bpf_verifier_env *env, struct bpf_kfunc_call_arg_meta *meta)
 {
 	const char *func_name = meta->func_name, *ref_tname;
@@ -8776,6 +8916,8 @@ static int check_kfunc_args(struct bpf_verifier_env *env, struct bpf_kfunc_call_
 		case KF_ARG_PTR_TO_DYNPTR:
 		case KF_ARG_PTR_TO_LIST_HEAD:
 		case KF_ARG_PTR_TO_LIST_NODE:
+		case KF_ARG_PTR_TO_RB_ROOT:
+		case KF_ARG_PTR_TO_RB_NODE:
 		case KF_ARG_PTR_TO_MEM:
 		case KF_ARG_PTR_TO_MEM_SIZE:
 			/* Trusted by default */
@@ -8861,6 +9003,20 @@ static int check_kfunc_args(struct bpf_verifier_env *env, struct bpf_kfunc_call_
 			if (ret < 0)
 				return ret;
 			break;
+		case KF_ARG_PTR_TO_RB_ROOT:
+			if (reg->type != PTR_TO_MAP_VALUE &&
+			    reg->type != (PTR_TO_BTF_ID | MEM_ALLOC)) {
+				verbose(env, "arg#%d expected pointer to map value or allocated object\n", i);
+				return -EINVAL;
+			}
+			if (reg->type == (PTR_TO_BTF_ID | MEM_ALLOC) && !reg->ref_obj_id) {
+				verbose(env, "allocated object must be referenced\n");
+				return -EINVAL;
+			}
+			ret = process_kf_arg_ptr_to_rbtree_root(env, reg, regno, meta);
+			if (ret < 0)
+				return ret;
+			break;
 		case KF_ARG_PTR_TO_LIST_NODE:
 			if (reg->type != (PTR_TO_BTF_ID | MEM_ALLOC)) {
 				verbose(env, "arg#%d expected pointer to allocated object\n", i);
@@ -8874,6 +9030,19 @@ static int check_kfunc_args(struct bpf_verifier_env *env, struct bpf_kfunc_call_
 			if (ret < 0)
 				return ret;
 			break;
+		case KF_ARG_PTR_TO_RB_NODE:
+			if (reg->type != (PTR_TO_BTF_ID | MEM_ALLOC)) {
+				verbose(env, "arg#%d expected pointer to allocated object\n", i);
+				return -EINVAL;
+			}
+			if (!reg->ref_obj_id) {
+				verbose(env, "allocated object must be referenced\n");
+				return -EINVAL;
+			}
+			ret = process_kf_arg_ptr_to_rbtree_node(env, reg, regno, meta);
+			if (ret < 0)
+				return ret;
+			break;
 		case KF_ARG_PTR_TO_BTF_ID:
 			/* Only base_type is checked, further checks are done here */
 			if ((base_type(reg->type) != PTR_TO_BTF_ID ||
@@ -13818,7 +13987,7 @@ static int do_check(struct bpf_verifier_env *env)
 					if ((insn->src_reg == BPF_REG_0 && insn->imm != BPF_FUNC_spin_unlock) ||
 					    (insn->src_reg == BPF_PSEUDO_CALL) ||
 					    (insn->src_reg == BPF_PSEUDO_KFUNC_CALL &&
-					     (insn->off != 0 || !is_bpf_list_api_kfunc(insn->imm)))) {
+					     (insn->off != 0 || !is_bpf_datastructure_api_kfunc(insn->imm)))) {
 						verbose(env, "function calls are not allowed while holding a lock\n");
 						return -EINVAL;
 					}
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [PATCH bpf-next 08/13] bpf: Add callback validation to kfunc verifier logic
  2022-12-06 23:09 [PATCH bpf-next 00/13] BPF rbtree next-gen datastructure Dave Marchevsky
                   ` (6 preceding siblings ...)
  2022-12-06 23:09 ` [PATCH bpf-next 07/13] bpf: Add support for bpf_rb_root and bpf_rb_node in kfunc args Dave Marchevsky
@ 2022-12-06 23:09 ` Dave Marchevsky
  2022-12-07  2:01   ` Alexei Starovoitov
  2022-12-06 23:09 ` [PATCH bpf-next 09/13] bpf: Special verifier handling for bpf_rbtree_{remove, first} Dave Marchevsky
                   ` (6 subsequent siblings)
  14 siblings, 1 reply; 51+ messages in thread
From: Dave Marchevsky @ 2022-12-06 23:09 UTC (permalink / raw)
  To: bpf
  Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Kernel Team, Kumar Kartikeya Dwivedi, Tejun Heo, Dave Marchevsky

Some BPF helpers take a callback function which the helper calls. For
each helper that takes such a callback, there's a special call to
__check_func_call with a callback-state-setting callback that sets up
verifier bpf_func_state for the callback's frame.

kfuncs don't have any of this infrastructure yet, so let's add it in
this patch, following the existing helper pattern as much as possible. To
validate this added plumbing, this patch adds callback handling for the
bpf_rbtree_add kfunc and aims to lay groundwork for future next-gen
datastructure callbacks.

In the "general plumbing" category we have:

  * check_kfunc_call doing callback verification right before clearing
    CALLER_SAVED_REGS, exactly like check_helper_call
  * recognition of func_ptr BTF types in kfunc args as
    KF_ARG_PTR_TO_CALLBACK + propagation of subprogno for this arg type

In the "rbtree_add / next-gen datastructure-specific plumbing" category:

  * Since bpf_rbtree_add must be called while the spin_lock associated
    with the tree is held, don't complain when callback's func_state
    doesn't unlock it by frame exit
  * Mark rbtree_add callback's args PTR_UNTRUSTED to prevent rbtree
    api functions from being called in the callback
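
In program terms, intended bpf_rbtree_add usage looks roughly like the
sketch below. It reuses the cover letter's node_data / glock / groot
example; treat it as an illustration of the rules above, not a tested
program. The less() callback runs with the lock held, and its node args
are marked PTR_UNTRUSTED, so reads through them are allowed but calling
rbtree API kfuncs on them is rejected:

  static bool less(struct bpf_rb_node *a, const struct bpf_rb_node *b)
  {
    /* a and b are PTR_UNTRUSTED here: field reads are fine, but
     * passing them to rbtree kfuncs fails verification
     */
    struct node_data *node_a = container_of(a, struct node_data, node);
    struct node_data *node_b = container_of(b, struct node_data, node);

    return node_a->key < node_b->key;
  }

  /* ... in BPF program, with n = bpf_obj_new(typeof(*n)) already
   * NULL-checked
   */
  bpf_spin_lock(&glock);
  bpf_rbtree_add(&groot, &n->node, less);
  bpf_spin_unlock(&glock);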

Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
---
 kernel/bpf/verifier.c | 136 ++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 130 insertions(+), 6 deletions(-)

diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 652112007b2c..9ad8c0b264dc 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -1448,6 +1448,16 @@ static void mark_ptr_not_null_reg(struct bpf_reg_state *reg)
 	reg->type &= ~PTR_MAYBE_NULL;
 }
 
+static void mark_reg_datastructure_node(struct bpf_reg_state *regs, u32 regno,
+					struct btf_field_datastructure_head *ds_head)
+{
+	__mark_reg_known_zero(&regs[regno]);
+	regs[regno].type = PTR_TO_BTF_ID | MEM_ALLOC;
+	regs[regno].btf = ds_head->btf;
+	regs[regno].btf_id = ds_head->value_btf_id;
+	regs[regno].off = ds_head->node_offset;
+}
+
 static bool reg_is_pkt_pointer(const struct bpf_reg_state *reg)
 {
 	return type_is_pkt_pointer(reg->type);
@@ -4771,7 +4781,8 @@ static int check_ptr_to_btf_access(struct bpf_verifier_env *env,
 			return -EACCES;
 		}
 
-		if (type_is_alloc(reg->type) && !reg->ref_obj_id) {
+		if (type_is_alloc(reg->type) && !reg->ref_obj_id &&
+		    !cur_func(env)->in_callback_fn) {
 			verbose(env, "verifier internal error: ref_obj_id for allocated object must be non-zero\n");
 			return -EFAULT;
 		}
@@ -6952,6 +6963,8 @@ static int set_callee_state(struct bpf_verifier_env *env,
 			    struct bpf_func_state *caller,
 			    struct bpf_func_state *callee, int insn_idx);
 
+static bool is_callback_calling_kfunc(u32 btf_id);
+
 static int __check_func_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
 			     int *insn_idx, int subprog,
 			     set_callee_state_fn set_callee_state_cb)
@@ -7006,10 +7019,18 @@ static int __check_func_call(struct bpf_verifier_env *env, struct bpf_insn *insn
 	 * interested in validating only BPF helpers that can call subprogs as
 	 * callbacks
 	 */
-	if (set_callee_state_cb != set_callee_state && !is_callback_calling_function(insn->imm)) {
-		verbose(env, "verifier bug: helper %s#%d is not marked as callback-calling\n",
-			func_id_name(insn->imm), insn->imm);
-		return -EFAULT;
+	if (set_callee_state_cb != set_callee_state) {
+		if (bpf_pseudo_kfunc_call(insn) &&
+		    !is_callback_calling_kfunc(insn->imm)) {
+			verbose(env, "verifier bug: kfunc %s#%d not marked as callback-calling\n",
+				func_id_name(insn->imm), insn->imm);
+			return -EFAULT;
+		} else if (!bpf_pseudo_kfunc_call(insn) &&
+			   !is_callback_calling_function(insn->imm)) { /* helper */
+			verbose(env, "verifier bug: helper %s#%d not marked as callback-calling\n",
+				func_id_name(insn->imm), insn->imm);
+			return -EFAULT;
+		}
 	}
 
 	if (insn->code == (BPF_JMP | BPF_CALL) &&
@@ -7275,6 +7296,67 @@ static int set_user_ringbuf_callback_state(struct bpf_verifier_env *env,
 	return 0;
 }
 
+static int set_rbtree_add_callback_state(struct bpf_verifier_env *env,
+					 struct bpf_func_state *caller,
+					 struct bpf_func_state *callee,
+					 int insn_idx)
+{
+	/* void bpf_rbtree_add(struct bpf_rb_root *root, struct bpf_rb_node *node,
+	 *                     bool (less)(struct bpf_rb_node *a, const struct bpf_rb_node *b));
+	 *
+	 * 'struct bpf_rb_node *node' arg to bpf_rbtree_add is the same PTR_TO_BTF_ID w/ offset
+	 * that 'less' callback args will be receiving. However, 'node' arg was release_reference'd
+	 * by this point, so look at 'root'
+	 */
+	struct btf_field *field;
+	struct btf_record *rec;
+
+	rec = reg_btf_record(&caller->regs[BPF_REG_1]);
+	if (!rec)
+		return -EFAULT;
+
+	field = btf_record_find(rec, caller->regs[BPF_REG_1].off, BPF_RB_ROOT);
+	if (!field || !field->datastructure_head.value_btf_id)
+		return -EFAULT;
+
+	mark_reg_datastructure_node(callee->regs, BPF_REG_1, &field->datastructure_head);
+	callee->regs[BPF_REG_1].type |= PTR_UNTRUSTED;
+	mark_reg_datastructure_node(callee->regs, BPF_REG_2, &field->datastructure_head);
+	callee->regs[BPF_REG_2].type |= PTR_UNTRUSTED;
+
+	__mark_reg_not_init(env, &callee->regs[BPF_REG_3]);
+	__mark_reg_not_init(env, &callee->regs[BPF_REG_4]);
+	__mark_reg_not_init(env, &callee->regs[BPF_REG_5]);
+	callee->in_callback_fn = true;
+	callee->callback_ret_range = tnum_range(0, 1);
+	return 0;
+}
+
+static bool is_rbtree_lock_required_kfunc(u32 btf_id);
+
+/* Are we currently verifying the callback for a rbtree helper that must
+ * be called with lock held? If so, no need to complain about unreleased
+ * lock
+ */
+static bool in_rbtree_lock_required_cb(struct bpf_verifier_env *env)
+{
+	struct bpf_verifier_state *state = env->cur_state;
+	struct bpf_insn *insn = env->prog->insnsi;
+	struct bpf_func_state *callee;
+	int kfunc_btf_id;
+
+	if (!state->curframe)
+		return false;
+
+	callee = state->frame[state->curframe];
+
+	if (!callee->in_callback_fn)
+		return false;
+
+	kfunc_btf_id = insn[callee->callsite].imm;
+	return is_rbtree_lock_required_kfunc(kfunc_btf_id);
+}
+
 static int prepare_func_exit(struct bpf_verifier_env *env, int *insn_idx)
 {
 	struct bpf_verifier_state *state = env->cur_state;
@@ -8007,6 +8089,7 @@ struct bpf_kfunc_call_arg_meta {
 	bool r0_rdonly;
 	u32 ret_btf_id;
 	u64 r0_size;
+	u32 subprogno;
 	struct {
 		u64 value;
 		bool found;
@@ -8185,6 +8268,18 @@ static bool is_kfunc_arg_rbtree_node(const struct btf *btf, const struct btf_par
 	return __is_kfunc_ptr_arg_type(btf, arg, KF_ARG_RB_NODE_ID);
 }
 
+static bool is_kfunc_arg_callback(struct bpf_verifier_env *env, const struct btf *btf,
+				  const struct btf_param *arg)
+{
+	const struct btf_type *t;
+
+	t = btf_type_resolve_func_ptr(btf, arg->type, NULL);
+	if (!t)
+		return false;
+
+	return true;
+}
+
 /* Returns true if struct is composed of scalars, 4 levels of nesting allowed */
 static bool __btf_type_is_scalar_struct(struct bpf_verifier_env *env,
 					const struct btf *btf,
@@ -8244,6 +8339,7 @@ enum kfunc_ptr_arg_type {
 	KF_ARG_PTR_TO_BTF_ID,	     /* Also covers reg2btf_ids conversions */
 	KF_ARG_PTR_TO_MEM,
 	KF_ARG_PTR_TO_MEM_SIZE,	     /* Size derived from next argument, skip it */
+	KF_ARG_PTR_TO_CALLBACK,
 	KF_ARG_PTR_TO_RB_ROOT,
 	KF_ARG_PTR_TO_RB_NODE,
 };
@@ -8368,6 +8464,9 @@ get_kfunc_ptr_arg_type(struct bpf_verifier_env *env,
 		return KF_ARG_PTR_TO_BTF_ID;
 	}
 
+	if (is_kfunc_arg_callback(env, meta->btf, &args[argno]))
+		return KF_ARG_PTR_TO_CALLBACK;
+
 	if (argno + 1 < nargs && is_kfunc_arg_mem_size(meta->btf, &args[argno + 1], &regs[regno + 1]))
 		arg_mem_size = true;
 
@@ -8585,6 +8684,16 @@ static bool is_bpf_datastructure_api_kfunc(u32 btf_id)
 	return is_bpf_list_api_kfunc(btf_id) || is_bpf_rbtree_api_kfunc(btf_id);
 }
 
+static bool is_callback_calling_kfunc(u32 btf_id)
+{
+	return btf_id == special_kfunc_list[KF_bpf_rbtree_add];
+}
+
+static bool is_rbtree_lock_required_kfunc(u32 btf_id)
+{
+	return is_bpf_rbtree_api_kfunc(btf_id);
+}
+
 static bool check_kfunc_is_datastructure_head_api(struct bpf_verifier_env *env,
 						  enum btf_field_type head_field_type,
 						  u32 kfunc_btf_id)
@@ -8920,6 +9029,7 @@ static int check_kfunc_args(struct bpf_verifier_env *env, struct bpf_kfunc_call_
 		case KF_ARG_PTR_TO_RB_NODE:
 		case KF_ARG_PTR_TO_MEM:
 		case KF_ARG_PTR_TO_MEM_SIZE:
+		case KF_ARG_PTR_TO_CALLBACK:
 			/* Trusted by default */
 			break;
 		default:
@@ -9078,6 +9188,9 @@ static int check_kfunc_args(struct bpf_verifier_env *env, struct bpf_kfunc_call_
 			/* Skip next '__sz' argument */
 			i++;
 			break;
+		case KF_ARG_PTR_TO_CALLBACK:
+			meta->subprogno = reg->subprogno;
+			break;
 		}
 	}
 
@@ -9193,6 +9306,16 @@ static int check_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
 		}
 	}
 
+	if (meta.func_id == special_kfunc_list[KF_bpf_rbtree_add]) {
+		err = __check_func_call(env, insn, insn_idx_p, meta.subprogno,
+					set_rbtree_add_callback_state);
+		if (err) {
+			verbose(env, "kfunc %s#%d failed callback verification\n",
+				func_name, func_id);
+			return err;
+		}
+	}
+
 	for (i = 0; i < CALLER_SAVED_REGS; i++)
 		mark_reg_not_init(env, regs, caller_saved[i]);
 
@@ -14023,7 +14146,8 @@ static int do_check(struct bpf_verifier_env *env)
 					return -EINVAL;
 				}
 
-				if (env->cur_state->active_lock.ptr) {
+				if (env->cur_state->active_lock.ptr &&
+				    !in_rbtree_lock_required_cb(env)) {
 					verbose(env, "bpf_spin_unlock is missing\n");
 					return -EINVAL;
 				}
-- 
2.30.2



* [PATCH bpf-next 09/13] bpf: Special verifier handling for bpf_rbtree_{remove, first}
  2022-12-06 23:09 [PATCH bpf-next 00/13] BPF rbtree next-gen datastructure Dave Marchevsky
                   ` (7 preceding siblings ...)
  2022-12-06 23:09 ` [PATCH bpf-next 08/13] bpf: Add callback validation to kfunc verifier logic Dave Marchevsky
@ 2022-12-06 23:09 ` Dave Marchevsky
  2022-12-07  2:18   ` Alexei Starovoitov
  2022-12-06 23:09 ` [PATCH bpf-next 10/13] bpf, x86: BPF_PROBE_MEM handling for insn->off < 0 Dave Marchevsky
                   ` (5 subsequent siblings)
  14 siblings, 1 reply; 51+ messages in thread
From: Dave Marchevsky @ 2022-12-06 23:09 UTC (permalink / raw)
  To: bpf
  Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Kernel Team, Kumar Kartikeya Dwivedi, Tejun Heo, Dave Marchevsky

Newly-added bpf_rbtree_{remove,first} kfuncs have some special properties
that require handling in the verifier:

  * both bpf_rbtree_remove and bpf_rbtree_first return the type containing
    the bpf_rb_node field, with the offset set to that field's offset,
    instead of a struct bpf_rb_node *
    * Generalized existing next-gen list verifier handling for this
      as mark_reg_datastructure_node helper

  * Unlike other functions, which set release_on_unlock on one of their
    args, bpf_rbtree_first takes no arguments, rather setting
    release_on_unlock on its return value

  * bpf_rbtree_remove's node input is a node that's been inserted
    in the tree. Only non-owning references (PTR_UNTRUSTED +
    release_on_unlock) refer to such nodes, but kfuncs don't take
    PTR_UNTRUSTED args
    * Added special carveout for bpf_rbtree_remove to take PTR_UNTRUSTED
    * Since node input already has release_on_unlock set, don't set
      it again

This patch, along with the previous one, completes special verifier
handling for all rbtree API functions added in this series.
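
Put together, the intended program-facing pattern looks roughly like the
sketch below, reusing the cover letter's node_data / glock / groot
example. This is an illustration of the properties above, not a tested
program, and the bpf_obj_drop cleanup step is an assumption here
(following the ownership conventions of the earlier linked-list series):

  struct node_data *n;
  struct bpf_rb_node *res;

  bpf_spin_lock(&glock);
  res = bpf_rbtree_first(&groot);  /* non-owning ref, release_on_unlock */
  if (!res) {
    bpf_spin_unlock(&glock);
    return 0;
  }
  /* res is PTR_UNTRUSTED + release_on_unlock; of the rbtree kfuncs,
   * only bpf_rbtree_remove accepts such an arg
   */
  res = bpf_rbtree_remove(&groot, res);
  bpf_spin_unlock(&glock);

  if (res) {
    n = container_of(res, struct node_data, node);
    bpf_obj_drop(n);  /* assumed: removed node must be released */
  }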

Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
---
 kernel/bpf/verifier.c | 89 +++++++++++++++++++++++++++++++++++--------
 1 file changed, 73 insertions(+), 16 deletions(-)

diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 9ad8c0b264dc..29983e2c27df 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -6122,6 +6122,23 @@ static int check_reg_type(struct bpf_verifier_env *env, u32 regno,
 	return 0;
 }
 
+static bool
+func_arg_reg_rb_node_offset(const struct bpf_reg_state *reg, s32 off)
+{
+	struct btf_record *rec;
+	struct btf_field *field;
+
+	rec = reg_btf_record(reg);
+	if (!rec)
+		return false;
+
+	field = btf_record_find(rec, off, BPF_RB_NODE);
+	if (!field)
+		return false;
+
+	return true;
+}
+
 int check_func_arg_reg_off(struct bpf_verifier_env *env,
 			   const struct bpf_reg_state *reg, int regno,
 			   enum bpf_arg_type arg_type)
@@ -6176,6 +6193,13 @@ int check_func_arg_reg_off(struct bpf_verifier_env *env,
 		 */
 		fixed_off_ok = true;
 		break;
+	case PTR_TO_BTF_ID | MEM_ALLOC | PTR_UNTRUSTED:
+		/* Currently only bpf_rbtree_remove accepts a PTR_UNTRUSTED
+		 * bpf_rb_node. Fixed off of the node type is OK
+		 */
+		if (reg->off && func_arg_reg_rb_node_offset(reg, reg->off))
+			fixed_off_ok = true;
+		break;
 	default:
 		break;
 	}
@@ -8875,26 +8899,44 @@ __process_kf_arg_ptr_to_datastructure_node(struct bpf_verifier_env *env,
 			btf_name_by_offset(field->datastructure_head.btf, et->name_off));
 		return -EINVAL;
 	}
-	/* Set arg#1 for expiration after unlock */
-	return ref_set_release_on_unlock(env, reg->ref_obj_id);
+
+	return 0;
 }
 
 static int process_kf_arg_ptr_to_list_node(struct bpf_verifier_env *env,
 					   struct bpf_reg_state *reg, u32 regno,
 					   struct bpf_kfunc_call_arg_meta *meta)
 {
-	return __process_kf_arg_ptr_to_datastructure_node(env, reg, regno, meta,
-							  BPF_LIST_HEAD, BPF_LIST_NODE,
-							  &meta->arg_list_head.field);
+	int err;
+
+	err = __process_kf_arg_ptr_to_datastructure_node(env, reg, regno, meta,
+							 BPF_LIST_HEAD, BPF_LIST_NODE,
+							 &meta->arg_list_head.field);
+	if (err)
+		return err;
+
+	return ref_set_release_on_unlock(env, reg->ref_obj_id);
 }
 
 static int process_kf_arg_ptr_to_rbtree_node(struct bpf_verifier_env *env,
 					     struct bpf_reg_state *reg, u32 regno,
 					     struct bpf_kfunc_call_arg_meta *meta)
 {
-	return __process_kf_arg_ptr_to_datastructure_node(env, reg, regno, meta,
-							  BPF_RB_ROOT, BPF_RB_NODE,
-							  &meta->arg_rbtree_root.field);
+	int err;
+
+	err = __process_kf_arg_ptr_to_datastructure_node(env, reg, regno, meta,
+							 BPF_RB_ROOT, BPF_RB_NODE,
+							 &meta->arg_rbtree_root.field);
+	if (err)
+		return err;
+
+	/* bpf_rbtree_remove's node parameter is a non-owning reference to
+	 * a bpf_rb_node, so release_on_unlock is already set
+	 */
+	if (meta->func_id == special_kfunc_list[KF_bpf_rbtree_remove])
+		return 0;
+
+	return ref_set_release_on_unlock(env, reg->ref_obj_id);
 }
 
 static int check_kfunc_args(struct bpf_verifier_env *env, struct bpf_kfunc_call_arg_meta *meta)
@@ -8902,7 +8944,7 @@ static int check_kfunc_args(struct bpf_verifier_env *env, struct bpf_kfunc_call_
 	const char *func_name = meta->func_name, *ref_tname;
 	const struct btf *btf = meta->btf;
 	const struct btf_param *args;
-	u32 i, nargs;
+	u32 i, nargs, check_type;
 	int ret;
 
 	args = (const struct btf_param *)(meta->func_proto + 1);
@@ -9141,7 +9183,13 @@ static int check_kfunc_args(struct bpf_verifier_env *env, struct bpf_kfunc_call_
 				return ret;
 			break;
 		case KF_ARG_PTR_TO_RB_NODE:
-			if (reg->type != (PTR_TO_BTF_ID | MEM_ALLOC)) {
+			if (meta->btf == btf_vmlinux &&
+			    meta->func_id == special_kfunc_list[KF_bpf_rbtree_remove])
+				check_type = (PTR_TO_BTF_ID | MEM_ALLOC | PTR_UNTRUSTED);
+			else
+				check_type = (PTR_TO_BTF_ID | MEM_ALLOC);
+
+			if (reg->type != check_type) {
 				verbose(env, "arg#%d expected pointer to allocated object\n", i);
 				return -EINVAL;
 			}
@@ -9380,11 +9428,14 @@ static int check_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
 				   meta.func_id == special_kfunc_list[KF_bpf_list_pop_back]) {
 				struct btf_field *field = meta.arg_list_head.field;
 
-				mark_reg_known_zero(env, regs, BPF_REG_0);
-				regs[BPF_REG_0].type = PTR_TO_BTF_ID | MEM_ALLOC;
-				regs[BPF_REG_0].btf = field->datastructure_head.btf;
-				regs[BPF_REG_0].btf_id = field->datastructure_head.value_btf_id;
-				regs[BPF_REG_0].off = field->datastructure_head.node_offset;
+				mark_reg_datastructure_node(regs, BPF_REG_0,
+							    &field->datastructure_head);
+			} else if (meta.func_id == special_kfunc_list[KF_bpf_rbtree_remove] ||
+				   meta.func_id == special_kfunc_list[KF_bpf_rbtree_first]) {
+				struct btf_field *field = meta.arg_rbtree_root.field;
+
+				mark_reg_datastructure_node(regs, BPF_REG_0,
+							    &field->datastructure_head);
 			} else if (meta.func_id == special_kfunc_list[KF_bpf_cast_to_kern_ctx]) {
 				mark_reg_known_zero(env, regs, BPF_REG_0);
 				regs[BPF_REG_0].type = PTR_TO_BTF_ID | PTR_TRUSTED;
@@ -9450,6 +9501,9 @@ static int check_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
 			if (is_kfunc_ret_null(&meta))
 				regs[BPF_REG_0].id = id;
 			regs[BPF_REG_0].ref_obj_id = id;
+
+			if (meta.func_id == special_kfunc_list[KF_bpf_rbtree_first])
+				ref_set_release_on_unlock(env, regs[BPF_REG_0].ref_obj_id);
 		}
 		if (reg_may_point_to_spin_lock(&regs[BPF_REG_0]) && !regs[BPF_REG_0].id)
 			regs[BPF_REG_0].id = ++env->id_gen;
@@ -11636,8 +11690,11 @@ static void mark_ptr_or_null_reg(struct bpf_func_state *state,
 		 */
 		if (WARN_ON_ONCE(reg->smin_value || reg->smax_value || !tnum_equals_const(reg->var_off, 0)))
 			return;
-		if (reg->type != (PTR_TO_BTF_ID | MEM_ALLOC | PTR_MAYBE_NULL) && WARN_ON_ONCE(reg->off))
+		if (reg->type != (PTR_TO_BTF_ID | MEM_ALLOC | PTR_MAYBE_NULL) &&
+		    reg->type != (PTR_TO_BTF_ID | MEM_ALLOC | PTR_MAYBE_NULL | PTR_UNTRUSTED) &&
+		    WARN_ON_ONCE(reg->off)) {
 			return;
+		}
 		if (is_null) {
 			reg->type = SCALAR_VALUE;
 			/* We don't need id and ref_obj_id from this point
-- 
2.30.2



* [PATCH bpf-next 10/13] bpf, x86: BPF_PROBE_MEM handling for insn->off < 0
  2022-12-06 23:09 [PATCH bpf-next 00/13] BPF rbtree next-gen datastructure Dave Marchevsky
                   ` (8 preceding siblings ...)
  2022-12-06 23:09 ` [PATCH bpf-next 09/13] bpf: Special verifier handling for bpf_rbtree_{remove, first} Dave Marchevsky
@ 2022-12-06 23:09 ` Dave Marchevsky
  2022-12-07  2:39   ` Alexei Starovoitov
  2022-12-06 23:09 ` [PATCH bpf-next 11/13] bpf: Add bpf_rbtree_{add,remove,first} decls to bpf_experimental.h Dave Marchevsky
                   ` (4 subsequent siblings)
  14 siblings, 1 reply; 51+ messages in thread
From: Dave Marchevsky @ 2022-12-06 23:09 UTC (permalink / raw)
  To: bpf
  Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Kernel Team, Kumar Kartikeya Dwivedi, Tejun Heo, Dave Marchevsky

The current comment in the BPF_PROBE_MEM JIT code claims that the
verifier prevents insn->off < 0, but this does not appear to be true,
irrespective of the changes in this series. Regardless, the changes in
this series will result in examples like:

  struct example_node {
    long key;
    long val;
    struct bpf_rb_node node;
  };

  /* In BPF prog, assume root contains example_node nodes */
  struct bpf_rb_node *res = bpf_rbtree_first(&root);
  if (!res)
    return 1;

  struct example_node *n = container_of(res, struct example_node, node);
  long key = n->key;

This results in a load with off = -16, as bpf_rbtree_first's return
value is modified by the verifier to be PTR_TO_BTF_ID of example_node
w/ offset = offsetof(struct example_node, node), instead of
PTR_TO_BTF_ID of bpf_rb_node. So it's necessary to support negative
insn->off when JITing BPF_PROBE_MEM.

In order to ensure that page fault for a BPF_PROBE_MEM load of *src_reg +
insn->off is safely handled, we must confirm that *src_reg + insn->off is
in kernel's memory. Two runtime checks are emitted to confirm that:

  1) (*src_reg + insn->off) > boundary between user and kernel address
  spaces
  2) (*src_reg + insn->off) does not overflow to a small positive
  number. This might happen if some function meant to set src_reg
  returns ERR_PTR(-EINVAL) or similar.

Check 1 is currently slightly off - it compares a

  u64 limit = TASK_SIZE_MAX + PAGE_SIZE + abs(insn->off);

to *src_reg, aborting the load if limit is larger. Rewriting this as an
inequality:

  *src_reg > TASK_SIZE_MAX + PAGE_SIZE + abs(insn->off)
  *src_reg - abs(insn->off) > TASK_SIZE_MAX + PAGE_SIZE

shows that this isn't quite right even if insn->off is positive, as we
really want:

  *src_reg + insn->off > TASK_SIZE_MAX + PAGE_SIZE
  *src_reg > TASK_SIZE_MAX + PAGE_SIZE - insn->off

*src_reg + insn->off is the address we'll be loading from, not
*src_reg - insn->off or *src_reg - abs(insn->off). So change the
subtraction to an addition and remove the abs(), as the comment
indicates it was only added to ignore negative insn->off.

For Check 2, currently "does not overflow to a small positive number" is
confirmed by emitting an 'add insn->off, src_reg' instruction and
checking for carry flag. While this works fine for a positive insn->off,
a small negative insn->off like -16 is almost guaranteed to wrap over to
a small positive number when added to any kernel address.

This patch addresses this by not doing Check 2 at BPF prog runtime when
insn->off is negative, rather doing a stronger check at JIT-time. The
logic supporting this is as follows:

1) Assume insn->off is negative, and call the most negative possible
   offset MAX_NEGATIVE_OFF. So insn->off >= MAX_NEGATIVE_OFF for all
   possible insn->off.

2) *src_reg + insn->off will not wrap over to an unexpected address by
   virtue of negative insn->off, but it might wrap under if
   -insn->off > *src_reg, as that implies *src_reg + insn->off < 0

3) Inequality (TASK_SIZE_MAX + PAGE_SIZE - insn->off) > (TASK_SIZE_MAX + PAGE_SIZE)
   must be true since insn->off is negative.

4) If we've completed check 1, we know that
   src_reg >= (TASK_SIZE_MAX + PAGE_SIZE - insn->off)

5) Combining statements 3 and 4, we know src_reg > (TASK_SIZE_MAX + PAGE_SIZE)

6) By statements 1, 4, and 5, if we can prove
   (TASK_SIZE_MAX + PAGE_SIZE) > -MAX_NEGATIVE_OFF, we'll know that
   (TASK_SIZE_MAX + PAGE_SIZE) > -insn->off for all possible insn->off
   values. We can rewrite this as (TASK_SIZE_MAX + PAGE_SIZE) +
   MAX_NEGATIVE_OFF > 0.

   Since src_reg > TASK_SIZE_MAX + PAGE_SIZE and MAX_NEGATIVE_OFF is
   negative, if the previous inequality is true,
   src_reg + MAX_NEGATIVE_OFF > 0 is also true for all src_reg values.
   Similarly, since insn->off >= MAX_NEGATIVE_OFF for all possible
   negative insn->off vals, src_reg + insn->off > 0 and there can be no
   wrapping under.

So proving (TASK_SIZE_MAX + PAGE_SIZE) + MAX_NEGATIVE_OFF > 0 implies
*src_reg + insn->off > 0 for any src_reg that's passed check 1 and any
negative insn->off. Luckily the former inequality does not need to be
checked at runtime, and in fact could be a static_assert if
TASK_SIZE_MAX wasn't determined by a function when CONFIG_X86_5LEVEL
kconfig is used.

Regardless, we can just check (TASK_SIZE_MAX + PAGE_SIZE) +
MAX_NEGATIVE_OFF > 0 once per do_jit call instead of emitting a runtime
check. Given that insn->off is an s16 and is unlikely to grow larger,
this check should always succeed on any x86 processor made in the 21st
century. If it doesn't, fail all do_jit calls and complain loudly, on
the assumption that the BPF subsystem is misconfigured or has a bug.
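
The chain of statements above can be sanity-checked numerically. The
standalone C sketch below models check 1 and the JIT-time condition
using a stand-in GUARD constant for TASK_SIZE_MAX + PAGE_SIZE (the
value here is an illustrative approximation of x86-64 with 4-level
paging, not taken from a kernel build):

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for TASK_SIZE_MAX + PAGE_SIZE; illustrative value only */
#define GUARD ((uint64_t)0x0000800000000000ULL)
/* Most negative insn->off, since insn->off is an s16 */
#define MAX_NEGATIVE_OFF ((int64_t)INT16_MIN)

/* Check 1 as rewritten: src_reg >= TASK_SIZE_MAX + PAGE_SIZE - insn->off */
static int passes_check1(uint64_t src_reg, int16_t off)
{
	uint64_t limit = GUARD - (int64_t)off;

	return src_reg >= limit;
}

/* JIT-time condition replacing runtime check 2 for negative offsets:
 * (TASK_SIZE_MAX + PAGE_SIZE) + MAX_NEGATIVE_OFF > 0
 */
static int jit_time_condition_holds(void)
{
	return (int64_t)(GUARD + (uint64_t)MAX_NEGATIVE_OFF) > 0;
}

/* Statements 4-6: a src_reg that passed check 1 cannot wrap under
 * when a negative insn->off is added to it
 */
static int no_wrap_under(uint64_t src_reg, int16_t off)
{
	return (int64_t)(src_reg + (int64_t)off) > 0;
}
```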

A few instructions are saved for negative insn->offs as a result. Using
the struct example_node / off = -16 example from before, code looks
like:

BEFORE CHANGE
  72:   movabs $0x800000000010,%r11
  7c:   cmp    %r11,%rdi
  7f:   jb     0x000000000000008d         (check 1 on 7c and here)
  81:   mov    %rdi,%r11
  84:   add    $0xfffffffffffffff0,%r11   (check 2, will set carry for almost any r11, so bug for
  8b:   jae    0x0000000000000091          negative insn->off)
  8d:   xor    %edi,%edi                  (as a result long key = n->key; will be 0'd out here)
  8f:   jmp    0x0000000000000095
  91:   mov    -0x10(%rdi),%rdi
  95:

AFTER CHANGE:
  5a:   movabs $0x800000000010,%r11
  64:   cmp    %r11,%rdi
  67:   jae    0x000000000000006d     (check 1 on 64 and here, but now JNC instead of JC)
  69:   xor    %edi,%edi              (no check 2, 0 out if %rdi - %r11 < 0)
  6b:   jmp    0x0000000000000071
  6d:   mov    -0x10(%rdi),%rdi
  71:

We could do the same for insn->off == 0, but for now keep code
generation unchanged for previously working nonnegative insn->offs.

Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
---
 arch/x86/net/bpf_jit_comp.c | 123 +++++++++++++++++++++++++++---------
 1 file changed, 92 insertions(+), 31 deletions(-)

diff --git a/arch/x86/net/bpf_jit_comp.c b/arch/x86/net/bpf_jit_comp.c
index 36ffe67ad6e5..843f619d0d35 100644
--- a/arch/x86/net/bpf_jit_comp.c
+++ b/arch/x86/net/bpf_jit_comp.c
@@ -11,6 +11,7 @@
 #include <linux/bpf.h>
 #include <linux/memory.h>
 #include <linux/sort.h>
+#include <linux/limits.h>
 #include <asm/extable.h>
 #include <asm/set_memory.h>
 #include <asm/nospec-branch.h>
@@ -94,6 +95,7 @@ static int bpf_size_to_x86_bytes(int bpf_size)
  */
 #define X86_JB  0x72
 #define X86_JAE 0x73
+#define X86_JNC 0x73
 #define X86_JE  0x74
 #define X86_JNE 0x75
 #define X86_JBE 0x76
@@ -950,6 +952,36 @@ static void emit_shiftx(u8 **pprog, u32 dst_reg, u8 src_reg, bool is64, u8 op)
 	*pprog = prog;
 }
 
+/* Check that condition necessary for PROBE_MEM handling for insn->off < 0
+ * holds.
+ *
+ * This could be a static_assert((TASK_SIZE_MAX + PAGE_SIZE) > -S16_MIN),
+ * but TASK_SIZE_MAX can't always be evaluated at compile time, so let's not
+ * assume insn->off size either
+ */
+static int check_probe_mem_task_size_overflow(void)
+{
+	struct bpf_insn insn;
+	s64 max_negative;
+
+	switch (sizeof(insn.off)) {
+	case 2:
+		max_negative = S16_MIN;
+		break;
+	default:
+		pr_err("bpf_jit_error: unexpected bpf_insn->off size\n");
+		return -EFAULT;
+	}
+
+	if (!((TASK_SIZE_MAX + PAGE_SIZE) > -max_negative)) {
+		pr_err("bpf jit error: assumption does not hold:\n");
+		pr_err("\t(TASK_SIZE_MAX + PAGE_SIZE) + (max negative insn->off) > 0\n");
+		return -EFAULT;
+	}
+
+	return 0;
+}
+
 #define INSN_SZ_DIFF (((addrs[i] - addrs[i - 1]) - (prog - temp)))
 
 static int do_jit(struct bpf_prog *bpf_prog, int *addrs, u8 *image, u8 *rw_image,
@@ -967,6 +999,10 @@ static int do_jit(struct bpf_prog *bpf_prog, int *addrs, u8 *image, u8 *rw_image
 	u8 *prog = temp;
 	int err;
 
+	err = check_probe_mem_task_size_overflow();
+	if (err)
+		return err;
+
 	detect_reg_usage(insn, insn_cnt, callee_regs_used,
 			 &tail_call_seen);
 
@@ -1359,20 +1395,30 @@ st:			if (is_imm8(insn->off))
 		case BPF_LDX | BPF_MEM | BPF_DW:
 		case BPF_LDX | BPF_PROBE_MEM | BPF_DW:
 			if (BPF_MODE(insn->code) == BPF_PROBE_MEM) {
-				/* Though the verifier prevents negative insn->off in BPF_PROBE_MEM
-				 * add abs(insn->off) to the limit to make sure that negative
-				 * offset won't be an issue.
-				 * insn->off is s16, so it won't affect valid pointers.
-				 */
-				u64 limit = TASK_SIZE_MAX + PAGE_SIZE + abs(insn->off);
-				u8 *end_of_jmp1, *end_of_jmp2;
-
 				/* Conservatively check that src_reg + insn->off is a kernel address:
-				 * 1. src_reg + insn->off >= limit
-				 * 2. src_reg + insn->off doesn't become small positive.
-				 * Cannot do src_reg + insn->off >= limit in one branch,
-				 * since it needs two spare registers, but JIT has only one.
+				 * 1. src_reg + insn->off >= TASK_SIZE_MAX + PAGE_SIZE
+				 * 2. src_reg + insn->off doesn't overflow and become small positive
+				 *
+				 * For check 1, to save registers, check
+				 * src_reg >= (TASK_SIZE_MAX + PAGE_SIZE - insn->off)
+				 * instead; call the right-hand side of the inequality 'limit'.
+				 *
+				 * For check 2:
+				 * If insn->off is positive, compute src_reg + insn->off and
+				 * check for overflow directly.
+				 * If insn->off is negative, we know that
+				 *   (TASK_SIZE_MAX + PAGE_SIZE - insn->off) > (TASK_SIZE_MAX + PAGE_SIZE)
+				 * and from check 1 we know
+				 *   src_reg >= (TASK_SIZE_MAX + PAGE_SIZE - insn->off)
+				 * So if (TASK_SIZE_MAX + PAGE_SIZE) + MAX_NEGATIVE_OFF > 0 we can
+				 * be sure that src_reg + insn->off won't overflow in either
+				 * direction, and the runtime check can be skipped entirely.
+				 *
+				 * check_probe_mem_task_size_overflow confirms the above
+				 * assumption at the beginning of this function.
 				 */
+				u64 limit = TASK_SIZE_MAX + PAGE_SIZE - insn->off;
+				u8 *end_of_jmp1, *end_of_jmp2;
 
 				/* movabsq r11, limit */
 				EMIT2(add_1mod(0x48, AUX_REG), add_1reg(0xB8, AUX_REG));
@@ -1381,32 +1427,47 @@ st:			if (is_imm8(insn->off))
 				/* cmp src_reg, r11 */
 				maybe_emit_mod(&prog, src_reg, AUX_REG, true);
 				EMIT2(0x39, add_2reg(0xC0, src_reg, AUX_REG));
-				/* if unsigned '<' goto end_of_jmp2 */
-				EMIT2(X86_JB, 0);
-				end_of_jmp1 = prog;
-
-				/* mov r11, src_reg */
-				emit_mov_reg(&prog, true, AUX_REG, src_reg);
-				/* add r11, insn->off */
-				maybe_emit_1mod(&prog, AUX_REG, true);
-				EMIT2_off32(0x81, add_1reg(0xC0, AUX_REG), insn->off);
-				/* jmp if not carry to start_of_ldx
-				 * Otherwise ERR_PTR(-EINVAL) + 128 will be the user addr
-				 * that has to be rejected.
-				 */
-				EMIT2(0x73 /* JNC */, 0);
-				end_of_jmp2 = prog;
+				if (insn->off >= 0) {
+					/* cmp src_reg, r11 */
+					/* if unsigned '<' goto end_of_jmp2 */
+					EMIT2(X86_JB, 0);
+					end_of_jmp1 = prog;
+
+					/* mov r11, src_reg */
+					emit_mov_reg(&prog, true, AUX_REG, src_reg);
+					/* add r11, insn->off */
+					maybe_emit_1mod(&prog, AUX_REG, true);
+					EMIT2_off32(0x81, add_1reg(0xC0, AUX_REG), insn->off);
+					/* jmp if not carry to start_of_ldx
+					 * Otherwise ERR_PTR(-EINVAL) + 128 will be the user addr
+					 * that has to be rejected.
+					 */
+					EMIT2(X86_JNC, 0);
+					end_of_jmp2 = prog;
+				} else {
+					/* cmp src_reg, r11 */
+					/* if unsigned '>=' goto start_of_ldx
+					 * w/o needing to do check 2
+					 */
+					EMIT2(X86_JAE, 0);
+					end_of_jmp1 = prog;
+				}
 
 				/* xor dst_reg, dst_reg */
 				emit_mov_imm32(&prog, false, dst_reg, 0);
 				/* jmp byte_after_ldx */
 				EMIT2(0xEB, 0);
 
-				/* populate jmp_offset for JB above to jump to xor dst_reg */
-				end_of_jmp1[-1] = end_of_jmp2 - end_of_jmp1;
-				/* populate jmp_offset for JNC above to jump to start_of_ldx */
 				start_of_ldx = prog;
-				end_of_jmp2[-1] = start_of_ldx - end_of_jmp2;
+				if (insn->off >= 0) {
+					/* populate jmp_offset for JB above to jump to xor dst_reg */
+					end_of_jmp1[-1] = end_of_jmp2 - end_of_jmp1;
+					/* populate jmp_offset for JNC above to jump to start_of_ldx */
+					end_of_jmp2[-1] = start_of_ldx - end_of_jmp2;
+				} else {
+					/* populate jmp_offset for JAE above to jump to start_of_ldx */
+					end_of_jmp1[-1] = start_of_ldx - end_of_jmp1;
+				}
 			}
 			emit_ldx(&prog, BPF_SIZE(insn->code), dst_reg, src_reg, insn->off);
 			if (BPF_MODE(insn->code) == BPF_PROBE_MEM) {
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [PATCH bpf-next 11/13] bpf: Add bpf_rbtree_{add,remove,first} decls to bpf_experimental.h
  2022-12-06 23:09 [PATCH bpf-next 00/13] BPF rbtree next-gen datastructure Dave Marchevsky
                   ` (9 preceding siblings ...)
  2022-12-06 23:09 ` [PATCH bpf-next 10/13] bpf, x86: BPF_PROBE_MEM handling for insn->off < 0 Dave Marchevsky
@ 2022-12-06 23:09 ` Dave Marchevsky
  2022-12-06 23:09 ` [PATCH bpf-next 12/13] libbpf: Make BTF mandatory if program BTF has spin_lock or alloc_obj type Dave Marchevsky
                   ` (3 subsequent siblings)
  14 siblings, 0 replies; 51+ messages in thread
From: Dave Marchevsky @ 2022-12-06 23:09 UTC (permalink / raw)
  To: bpf
  Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Kernel Team, Kumar Kartikeya Dwivedi, Tejun Heo, Dave Marchevsky

Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
---
 .../testing/selftests/bpf/bpf_experimental.h  | 24 +++++++++++++++++++
 1 file changed, 24 insertions(+)

diff --git a/tools/testing/selftests/bpf/bpf_experimental.h b/tools/testing/selftests/bpf/bpf_experimental.h
index 424f7bbbfe9b..dbd2c729781a 100644
--- a/tools/testing/selftests/bpf/bpf_experimental.h
+++ b/tools/testing/selftests/bpf/bpf_experimental.h
@@ -65,4 +65,28 @@ extern struct bpf_list_node *bpf_list_pop_front(struct bpf_list_head *head) __ks
  */
 extern struct bpf_list_node *bpf_list_pop_back(struct bpf_list_head *head) __ksym;
 
+/* Description
+ *	Remove 'node' from rbtree with root 'root'
+ * Returns
+ * 	Pointer to the removed node, or NULL if 'root' didn't contain 'node'
+ */
+extern struct bpf_rb_node *bpf_rbtree_remove(struct bpf_rb_root *root,
+					     struct bpf_rb_node *node) __ksym;
+
+/* Description
+ *	Add 'node' to rbtree with root 'root' using comparator 'less'
+ * Returns
+ *	Nothing
+ */
+extern void bpf_rbtree_add(struct bpf_rb_root *root, struct bpf_rb_node *node,
+			   bool (less)(struct bpf_rb_node *a, const struct bpf_rb_node *b)) __ksym;
+
+/* Description
+ *	Return the first (leftmost) node in input tree
+ * Returns
+ *	Pointer to the node, which is _not_ removed from the tree. If the tree
+ *	contains no nodes, returns NULL.
+ */
+extern struct bpf_rb_node *bpf_rbtree_first(struct bpf_rb_root *root) __ksym;
+
 #endif
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [PATCH bpf-next 12/13] libbpf: Make BTF mandatory if program BTF has spin_lock or alloc_obj type
  2022-12-06 23:09 [PATCH bpf-next 00/13] BPF rbtree next-gen datastructure Dave Marchevsky
                   ` (10 preceding siblings ...)
  2022-12-06 23:09 ` [PATCH bpf-next 11/13] bpf: Add bpf_rbtree_{add,remove,first} decls to bpf_experimental.h Dave Marchevsky
@ 2022-12-06 23:09 ` Dave Marchevsky
  2022-12-06 23:10 ` [PATCH bpf-next 13/13] selftests/bpf: Add rbtree selftests Dave Marchevsky
                   ` (2 subsequent siblings)
  14 siblings, 0 replies; 51+ messages in thread
From: Dave Marchevsky @ 2022-12-06 23:09 UTC (permalink / raw)
  To: bpf
  Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Kernel Team, Kumar Kartikeya Dwivedi, Tejun Heo, Dave Marchevsky

If a BPF program defines a struct or union type which has a field type
that the verifier considers special - spin_lock, next-gen datastructure
heads and nodes - the verifier needs to be able to find fields of that
type using BTF.

For such a program, BTF is required, so modify the kernel_needs_btf helper
to ensure that the correct "BTF is mandatory" error message is emitted.

The newly-added btf_has_alloc_obj_type looks for BTF_KIND_STRUCTs with a
name corresponding to a special type. If any such struct is found, it is
assumed that some variable is using it, and therefore that a successful
BTF load is necessary.

Also add a kernel_needs_btf check to bpf_object__create_map where it was
previously missing. When this function calls bpf_map_create, the kernel may
reject map creation due to mismatched datastructure owner and ownee
types (e.g. a struct bpf_list_head with a __contains tag pointing to a
bpf_rb_node field). In such a scenario - or any other where BTF is
necessary for verification - bpf_map_create should not be retried
without BTF.

Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
---
 tools/lib/bpf/libbpf.c | 50 ++++++++++++++++++++++++++++++++----------
 1 file changed, 39 insertions(+), 11 deletions(-)

diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
index 2a82f49ce16f..56a905b502c9 100644
--- a/tools/lib/bpf/libbpf.c
+++ b/tools/lib/bpf/libbpf.c
@@ -998,6 +998,31 @@ find_struct_ops_kern_types(const struct btf *btf, const char *tname,
 	return 0;
 }
 
+/* Should match alloc_obj_fields in kernel/bpf/btf.c
+ */
+static const char *alloc_obj_fields[] = {
+	"bpf_spin_lock",
+	"bpf_list_head",
+	"bpf_list_node",
+	"bpf_rb_root",
+	"bpf_rb_node",
+};
+
+static bool
+btf_has_alloc_obj_type(const struct btf *btf)
+{
+	const char *tname;
+	int i;
+
+	for (i = 0; i < ARRAY_SIZE(alloc_obj_fields); i++) {
+		tname = alloc_obj_fields[i];
+		if (btf__find_by_name_kind(btf, tname, BTF_KIND_STRUCT) > 0)
+			return true;
+	}
+
+	return false;
+}
+
 static bool bpf_map__is_struct_ops(const struct bpf_map *map)
 {
 	return map->def.type == BPF_MAP_TYPE_STRUCT_OPS;
@@ -2794,7 +2819,8 @@ static bool libbpf_needs_btf(const struct bpf_object *obj)
 
 static bool kernel_needs_btf(const struct bpf_object *obj)
 {
-	return obj->efile.st_ops_shndx >= 0;
+	return obj->efile.st_ops_shndx >= 0 ||
+		(obj->btf && btf_has_alloc_obj_type(obj->btf));
 }
 
 static int bpf_object__init_btf(struct bpf_object *obj,
@@ -5103,16 +5129,18 @@ static int bpf_object__create_map(struct bpf_object *obj, struct bpf_map *map, b
 
 		err = -errno;
 		cp = libbpf_strerror_r(err, errmsg, sizeof(errmsg));
-		pr_warn("Error in bpf_create_map_xattr(%s):%s(%d). Retrying without BTF.\n",
-			map->name, cp, err);
-		create_attr.btf_fd = 0;
-		create_attr.btf_key_type_id = 0;
-		create_attr.btf_value_type_id = 0;
-		map->btf_key_type_id = 0;
-		map->btf_value_type_id = 0;
-		map->fd = bpf_map_create(def->type, map_name,
-					 def->key_size, def->value_size,
-					 def->max_entries, &create_attr);
+		pr_warn("Error in bpf_create_map_xattr(%s):%s(%d).\n", map->name, cp, err);
+		if (!kernel_needs_btf(obj)) {
+			pr_warn("Retrying bpf_map_create_xattr(%s) without BTF.\n", map->name);
+			create_attr.btf_fd = 0;
+			create_attr.btf_key_type_id = 0;
+			create_attr.btf_value_type_id = 0;
+			map->btf_key_type_id = 0;
+			map->btf_value_type_id = 0;
+			map->fd = bpf_map_create(def->type, map_name,
+						 def->key_size, def->value_size,
+						 def->max_entries, &create_attr);
+		}
 	}
 
 	err = map->fd < 0 ? -errno : 0;
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [PATCH bpf-next 13/13] selftests/bpf: Add rbtree selftests
  2022-12-06 23:09 [PATCH bpf-next 00/13] BPF rbtree next-gen datastructure Dave Marchevsky
                   ` (11 preceding siblings ...)
  2022-12-06 23:09 ` [PATCH bpf-next 12/13] libbpf: Make BTF mandatory if program BTF has spin_lock or alloc_obj type Dave Marchevsky
@ 2022-12-06 23:10 ` Dave Marchevsky
  2022-12-07  2:50 ` [PATCH bpf-next 00/13] BPF rbtree next-gen datastructure patchwork-bot+netdevbpf
  2022-12-07 19:36 ` Kumar Kartikeya Dwivedi
  14 siblings, 0 replies; 51+ messages in thread
From: Dave Marchevsky @ 2022-12-06 23:10 UTC (permalink / raw)
  To: bpf
  Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Kernel Team, Kumar Kartikeya Dwivedi, Tejun Heo, Dave Marchevsky

This patch adds selftests exercising the logic changed/added in the
previous patches in the series. A variety of successful and unsuccessful
rbtree usages are validated:

Success:
  * Add some nodes, let map_value bpf_rb_root destructor clean them
    up
  * Add some nodes, remove one using the release_on_unlock ref leftover
    by successful rbtree_add() call
  * Add some nodes, remove one using the release_on_unlock ref returned
    from rbtree_first() call

Failure:
  * BTF where bpf_rb_root owns bpf_list_node should fail to load
  * BTF where node of type X is added to tree containing nodes of type Y
    should fail to load
  * No calling rbtree api functions in 'less' callback for rbtree_add
  * No releasing lock in 'less' callback for rbtree_add
  * No removing a node which hasn't been added to any tree
  * No adding a node which has already been added to a tree
  * No escaping of release_on_unlock references past their lock's
    critical section

These tests mostly focus on rbtree-specific additions, but some of the
Failure cases revalidate scenarios common to both linked_list and rbtree
which are covered in the former's tests. Better to be a bit redundant in
case linked_list and rbtree semantics deviate over time.

Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
---
 .../testing/selftests/bpf/prog_tests/rbtree.c | 184 ++++++++++++
 tools/testing/selftests/bpf/progs/rbtree.c    | 180 ++++++++++++
 .../progs/rbtree_btf_fail__add_wrong_type.c   |  48 ++++
 .../progs/rbtree_btf_fail__wrong_node_type.c  |  21 ++
 .../testing/selftests/bpf/progs/rbtree_fail.c | 263 ++++++++++++++++++
 5 files changed, 696 insertions(+)
 create mode 100644 tools/testing/selftests/bpf/prog_tests/rbtree.c
 create mode 100644 tools/testing/selftests/bpf/progs/rbtree.c
 create mode 100644 tools/testing/selftests/bpf/progs/rbtree_btf_fail__add_wrong_type.c
 create mode 100644 tools/testing/selftests/bpf/progs/rbtree_btf_fail__wrong_node_type.c
 create mode 100644 tools/testing/selftests/bpf/progs/rbtree_fail.c

diff --git a/tools/testing/selftests/bpf/prog_tests/rbtree.c b/tools/testing/selftests/bpf/prog_tests/rbtree.c
new file mode 100644
index 000000000000..688ce56d8b92
--- /dev/null
+++ b/tools/testing/selftests/bpf/prog_tests/rbtree.c
@@ -0,0 +1,184 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2022 Meta Platforms, Inc. and affiliates. */
+
+#include <test_progs.h>
+#include <network_helpers.h>
+
+#include "rbtree.skel.h"
+#include "rbtree_fail.skel.h"
+#include "rbtree_btf_fail__wrong_node_type.skel.h"
+#include "rbtree_btf_fail__add_wrong_type.skel.h"
+
+static char log_buf[1024 * 1024];
+
+static struct {
+	const char *prog_name;
+	const char *err_msg;
+} rbtree_fail_tests[] = {
+	{"rbtree_api_nolock_add", "bpf_spin_lock at off=16 must be held for bpf_rb_root"},
+	{"rbtree_api_nolock_remove", "bpf_spin_lock at off=16 must be held for bpf_rb_root"},
+	{"rbtree_api_nolock_first", "bpf_spin_lock at off=16 must be held for bpf_rb_root"},
+
+	/* Specific failure string for these three isn't very important, but it shouldn't be
+	 * possible to call rbtree api func from within add() callback
+	 */
+	{"rbtree_api_add_bad_cb_bad_fn_call_add", "arg#1 expected pointer to allocated object"},
+	{"rbtree_api_add_bad_cb_bad_fn_call_remove", "allocated object must be referenced"},
+	{"rbtree_api_add_bad_cb_bad_fn_call_first", "Unreleased reference id=4 alloc_insn=26"},
+	{"rbtree_api_add_bad_cb_bad_fn_call_first_unlock_after",
+	  "failed to release release_on_unlock reference"},
+
+	{"rbtree_api_remove_unadded_node", "arg#1 expected pointer to allocated object"},
+	{"rbtree_api_add_to_multiple_trees", "arg#1 expected pointer to allocated object"},
+	{"rbtree_api_add_release_unlock_escape", "arg#1 expected pointer to allocated object"},
+	{"rbtree_api_first_release_unlock_escape", "arg#1 expected pointer to allocated object"},
+	{"rbtree_api_remove_no_drop", "Unreleased reference id=4 alloc_insn=10"},
+};
+
+static void test_rbtree_fail_prog(const char *prog_name, const char *err_msg)
+{
+	LIBBPF_OPTS(bpf_object_open_opts, opts,
+		    .kernel_log_buf = log_buf,
+		    .kernel_log_size = sizeof(log_buf),
+		    .kernel_log_level = 1
+	);
+	struct rbtree_fail *skel;
+	struct bpf_program *prog;
+	int ret;
+
+	skel = rbtree_fail__open_opts(&opts);
+	if (!ASSERT_OK_PTR(skel, "rbtree_fail__open_opts"))
+		return;
+
+	prog = bpf_object__find_program_by_name(skel->obj, prog_name);
+	if (!ASSERT_OK_PTR(prog, "bpf_object__find_program_by_name"))
+		goto end;
+
+	bpf_program__set_autoload(prog, true);
+
+	ret = rbtree_fail__load(skel);
+	if (!ASSERT_ERR(ret, "rbtree_fail__load must fail"))
+		goto end;
+
+	if (!ASSERT_OK_PTR(strstr(log_buf, err_msg), "expected error message")) {
+		fprintf(stderr, "Expected: %s\n", err_msg);
+		fprintf(stderr, "Verifier: %s\n", log_buf);
+	}
+
+end:
+	rbtree_fail__destroy(skel);
+}
+
+static void test_rbtree_add_nodes(void)
+{
+	LIBBPF_OPTS(bpf_test_run_opts, opts,
+		    .data_in = &pkt_v4,
+		    .data_size_in = sizeof(pkt_v4),
+		    .repeat = 1,
+	);
+	struct rbtree *skel;
+	int ret;
+
+	skel = rbtree__open_and_load();
+	if (!ASSERT_OK_PTR(skel, "rbtree__open_and_load"))
+		return;
+
+	ret = bpf_prog_test_run_opts(bpf_program__fd(skel->progs.rbtree_add_nodes), &opts);
+	ASSERT_OK(ret, "rbtree_add_nodes run");
+	ASSERT_OK(opts.retval, "rbtree_add_nodes retval");
+	ASSERT_EQ(skel->data->less_callback_ran, 1, "rbtree_add_nodes less_callback_ran");
+
+	rbtree__destroy(skel);
+}
+
+static void test_rbtree_add_and_remove(void)
+{
+	LIBBPF_OPTS(bpf_test_run_opts, opts,
+		    .data_in = &pkt_v4,
+		    .data_size_in = sizeof(pkt_v4),
+		    .repeat = 1,
+	);
+	struct rbtree *skel;
+	int ret;
+
+	skel = rbtree__open_and_load();
+	if (!ASSERT_OK_PTR(skel, "rbtree__open_and_load"))
+		return;
+
+	ret = bpf_prog_test_run_opts(bpf_program__fd(skel->progs.rbtree_add_and_remove), &opts);
+	ASSERT_OK(ret, "rbtree_add_and_remove");
+	ASSERT_OK(opts.retval, "rbtree_add_and_remove retval");
+	ASSERT_EQ(skel->data->removed_key, 5, "rbtree_add_and_remove first removed key");
+
+	rbtree__destroy(skel);
+}
+
+static void test_rbtree_first_and_remove(void)
+{
+	LIBBPF_OPTS(bpf_test_run_opts, opts,
+		    .data_in = &pkt_v4,
+		    .data_size_in = sizeof(pkt_v4),
+		    .repeat = 1,
+	);
+	struct rbtree *skel;
+	int ret;
+
+	skel = rbtree__open_and_load();
+	if (!ASSERT_OK_PTR(skel, "rbtree__open_and_load"))
+		return;
+
+	ret = bpf_prog_test_run_opts(bpf_program__fd(skel->progs.rbtree_first_and_remove), &opts);
+	ASSERT_OK(ret, "rbtree_first_and_remove");
+	ASSERT_OK(opts.retval, "rbtree_first_and_remove retval");
+	ASSERT_EQ(skel->data->first_data[0], 2, "rbtree_first_and_remove first rbtree_first()");
+	ASSERT_EQ(skel->data->removed_key, 1, "rbtree_first_and_remove first removed key");
+	ASSERT_EQ(skel->data->first_data[1], 4, "rbtree_first_and_remove second rbtree_first()");
+
+	rbtree__destroy(skel);
+}
+
+void test_rbtree_success(void)
+{
+	if (test__start_subtest("rbtree_add_nodes"))
+		test_rbtree_add_nodes();
+	if (test__start_subtest("rbtree_add_and_remove"))
+		test_rbtree_add_and_remove();
+	if (test__start_subtest("rbtree_first_and_remove"))
+		test_rbtree_first_and_remove();
+}
+
+#define BTF_FAIL_TEST(suffix)									\
+void test_rbtree_btf_fail__##suffix(void)							\
+{												\
+	struct rbtree_btf_fail__##suffix *skel;							\
+												\
+	skel = rbtree_btf_fail__##suffix##__open_and_load();					\
+	if (!ASSERT_ERR_PTR(skel,								\
+			    "rbtree_btf_fail__" #suffix "__open_and_load unexpected success"))	\
+		rbtree_btf_fail__##suffix##__destroy(skel);					\
+}
+
+#define RUN_BTF_FAIL_TEST(suffix)				\
+	if (test__start_subtest("rbtree_btf_fail__" #suffix))	\
+		test_rbtree_btf_fail__##suffix();
+
+BTF_FAIL_TEST(wrong_node_type);
+BTF_FAIL_TEST(add_wrong_type);
+
+void test_rbtree_btf_fail(void)
+{
+	RUN_BTF_FAIL_TEST(wrong_node_type);
+	RUN_BTF_FAIL_TEST(add_wrong_type);
+}
+
+void test_rbtree_fail(void)
+{
+	int i;
+
+	for (i = 0; i < ARRAY_SIZE(rbtree_fail_tests); i++) {
+		if (!test__start_subtest(rbtree_fail_tests[i].prog_name))
+			continue;
+		test_rbtree_fail_prog(rbtree_fail_tests[i].prog_name,
+				      rbtree_fail_tests[i].err_msg);
+	}
+}
diff --git a/tools/testing/selftests/bpf/progs/rbtree.c b/tools/testing/selftests/bpf/progs/rbtree.c
new file mode 100644
index 000000000000..96a9d732e3fe
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/rbtree.c
@@ -0,0 +1,180 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2022 Meta Platforms, Inc. and affiliates. */
+
+#include <vmlinux.h>
+#include <bpf/bpf_tracing.h>
+#include <bpf/bpf_helpers.h>
+#include <bpf/bpf_core_read.h>
+#include "bpf_experimental.h"
+
+struct node_data {
+	long key;
+	long data;
+	struct bpf_rb_node node;
+};
+
+long less_callback_ran = -1;
+long removed_key = -1;
+long first_data[2] = {-1, -1};
+
+#define private(name) SEC(".data." #name) __hidden __attribute__((aligned(8)))
+private(A) struct bpf_spin_lock glock;
+private(A) struct bpf_rb_root groot __contains(node_data, node);
+
+static bool less(struct bpf_rb_node *a, const struct bpf_rb_node *b)
+{
+	struct node_data *node_a;
+	struct node_data *node_b;
+
+	node_a = container_of(a, struct node_data, node);
+	node_b = container_of(b, struct node_data, node);
+	less_callback_ran = 1;
+
+	return node_a->key < node_b->key;
+}
+
+static long __add_three(struct bpf_rb_root *root, struct bpf_spin_lock *lock)
+{
+	struct node_data *n, *m;
+
+	n = bpf_obj_new(typeof(*n));
+	if (!n)
+		return 1;
+	n->key = 5;
+
+	m = bpf_obj_new(typeof(*m));
+	if (!m) {
+		bpf_obj_drop(n);
+		return 2;
+	}
+	m->key = 1;
+
+	bpf_spin_lock(&glock);
+	bpf_rbtree_add(&groot, &n->node, less);
+	bpf_rbtree_add(&groot, &m->node, less);
+	bpf_spin_unlock(&glock);
+
+	n = bpf_obj_new(typeof(*n));
+	if (!n)
+		return 3;
+	n->key = 3;
+
+	bpf_spin_lock(&glock);
+	bpf_rbtree_add(&groot, &n->node, less);
+	bpf_spin_unlock(&glock);
+	return 0;
+}
+
+SEC("tc")
+long rbtree_add_nodes(void *ctx)
+{
+	return __add_three(&groot, &glock);
+}
+
+SEC("tc")
+long rbtree_add_and_remove(void *ctx)
+{
+	struct bpf_rb_node *res = NULL;
+	struct node_data *n, *m = NULL;
+
+	n = bpf_obj_new(typeof(*n));
+	if (!n)
+		goto err_out;
+	n->key = 5;
+
+	m = bpf_obj_new(typeof(*m));
+	if (!m)
+		goto err_out;
+	m->key = 3;
+
+	bpf_spin_lock(&glock);
+	bpf_rbtree_add(&groot, &n->node, less);
+	bpf_rbtree_add(&groot, &m->node, less);
+	res = bpf_rbtree_remove(&groot, &n->node);
+	bpf_spin_unlock(&glock);
+
+	if (!res)
+		return 1;
+	n = container_of(res, struct node_data, node);
+	removed_key = n->key;
+
+	bpf_obj_drop(n);
+
+	return 0;
+err_out:
+	if (n)
+		bpf_obj_drop(n);
+	if (m)
+		bpf_obj_drop(m);
+	return 1;
+}
+
+SEC("tc")
+long rbtree_first_and_remove(void *ctx)
+{
+	struct bpf_rb_node *res = NULL;
+	struct node_data *n, *m, *o;
+
+	n = bpf_obj_new(typeof(*n));
+	if (!n)
+		return 1;
+	n->key = 3;
+	n->data = 4;
+
+	m = bpf_obj_new(typeof(*m));
+	if (!m)
+		goto err_out;
+	m->key = 5;
+	m->data = 6;
+
+	o = bpf_obj_new(typeof(*o));
+	if (!o)
+		goto err_out;
+	o->key = 1;
+	o->data = 2;
+
+	bpf_spin_lock(&glock);
+	bpf_rbtree_add(&groot, &n->node, less);
+	bpf_rbtree_add(&groot, &m->node, less);
+	bpf_rbtree_add(&groot, &o->node, less);
+
+	res = bpf_rbtree_first(&groot);
+	if (!res) {
+		bpf_spin_unlock(&glock);
+		return 2;
+	}
+
+	o = container_of(res, struct node_data, node);
+	first_data[0] = o->data;
+
+	res = bpf_rbtree_remove(&groot, &o->node);
+	bpf_spin_unlock(&glock);
+
+	if (!res)
+		return 1;
+	o = container_of(res, struct node_data, node);
+	removed_key = o->key;
+
+	bpf_obj_drop(o);
+
+	bpf_spin_lock(&glock);
+	res = bpf_rbtree_first(&groot);
+	if (!res) {
+		bpf_spin_unlock(&glock);
+		return 3;
+	}
+
+	o = container_of(res, struct node_data, node);
+	first_data[1] = o->data;
+	bpf_spin_unlock(&glock);
+
+	return 0;
+err_out:
+	if (n)
+		bpf_obj_drop(n);
+	if (m)
+		bpf_obj_drop(m);
+	return 1;
+}
+
+char _license[] SEC("license") = "GPL";
diff --git a/tools/testing/selftests/bpf/progs/rbtree_btf_fail__add_wrong_type.c b/tools/testing/selftests/bpf/progs/rbtree_btf_fail__add_wrong_type.c
new file mode 100644
index 000000000000..1729712722ec
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/rbtree_btf_fail__add_wrong_type.c
@@ -0,0 +1,48 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2022 Meta Platforms, Inc. and affiliates. */
+
+#include <vmlinux.h>
+#include <bpf/bpf_tracing.h>
+#include <bpf/bpf_helpers.h>
+#include <bpf/bpf_core_read.h>
+#include "bpf_experimental.h"
+
+struct node_data {
+	int key;
+	int data;
+	struct bpf_rb_node node;
+};
+
+struct node_data2 {
+	int key;
+	struct bpf_rb_node node;
+	int data;
+};
+
+static bool less2(struct bpf_rb_node *a, const struct bpf_rb_node *b)
+{
+	struct node_data2 *node_a;
+	struct node_data2 *node_b;
+
+	node_a = container_of(a, struct node_data2, node);
+	node_b = container_of(b, struct node_data2, node);
+
+	return node_a->key < node_b->key;
+}
+
+#define private(name) SEC(".data." #name) __hidden __attribute__((aligned(8)))
+private(A) struct bpf_spin_lock glock;
+private(A) struct bpf_rb_root groot __contains(node_data, node);
+
+SEC("tc")
+long rbtree_api_nolock_add(void *ctx)
+{
+	struct node_data2 *n;
+
+	n = bpf_obj_new(typeof(*n));
+	if (!n)
+		return 1;
+
+	bpf_rbtree_add(&groot, &n->node, less2);
+	return 0;
+}
diff --git a/tools/testing/selftests/bpf/progs/rbtree_btf_fail__wrong_node_type.c b/tools/testing/selftests/bpf/progs/rbtree_btf_fail__wrong_node_type.c
new file mode 100644
index 000000000000..df0efb46177c
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/rbtree_btf_fail__wrong_node_type.c
@@ -0,0 +1,21 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2022 Meta Platforms, Inc. and affiliates. */
+
+#include <vmlinux.h>
+#include <bpf/bpf_tracing.h>
+#include <bpf/bpf_helpers.h>
+#include <bpf/bpf_core_read.h>
+#include "bpf_experimental.h"
+
+/* BTF load should fail as bpf_rb_root __contains this type and points to
+ * 'node', but 'node' is not a bpf_rb_node
+ */
+struct node_data {
+	int key;
+	int data;
+	struct bpf_list_node node;
+};
+
+#define private(name) SEC(".data." #name) __hidden __attribute__((aligned(8)))
+private(A) struct bpf_spin_lock glock;
+private(A) struct bpf_rb_root groot __contains(node_data, node);
diff --git a/tools/testing/selftests/bpf/progs/rbtree_fail.c b/tools/testing/selftests/bpf/progs/rbtree_fail.c
new file mode 100644
index 000000000000..96caa7f33805
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/rbtree_fail.c
@@ -0,0 +1,263 @@
+// SPDX-License-Identifier: GPL-2.0
+#include <vmlinux.h>
+#include <bpf/bpf_tracing.h>
+#include <bpf/bpf_helpers.h>
+#include <bpf/bpf_core_read.h>
+#include "bpf_experimental.h"
+
+struct node_data {
+	long key;
+	long data;
+	struct bpf_rb_node node;
+};
+
+#define private(name) SEC(".data." #name) __hidden __attribute__((aligned(8)))
+private(A) struct bpf_spin_lock glock;
+private(A) struct bpf_rb_root groot __contains(node_data, node);
+private(A) struct bpf_rb_root groot2 __contains(node_data, node);
+
+static bool less(struct bpf_rb_node *a, const struct bpf_rb_node *b)
+{
+	struct node_data *node_a;
+	struct node_data *node_b;
+
+	node_a = container_of(a, struct node_data, node);
+	node_b = container_of(b, struct node_data, node);
+
+	return node_a->key < node_b->key;
+}
+
+SEC("?tc")
+long rbtree_api_nolock_add(void *ctx)
+{
+	struct node_data *n;
+
+	n = bpf_obj_new(typeof(*n));
+	if (!n)
+		return 1;
+
+	bpf_rbtree_add(&groot, &n->node, less);
+	return 0;
+}
+
+SEC("?tc")
+long rbtree_api_nolock_remove(void *ctx)
+{
+	struct node_data *n;
+
+	n = bpf_obj_new(typeof(*n));
+	if (!n)
+		return 1;
+
+	bpf_spin_lock(&glock);
+	bpf_rbtree_add(&groot, &n->node, less);
+	bpf_spin_unlock(&glock);
+
+	bpf_rbtree_remove(&groot, &n->node);
+	return 0;
+}
+
+SEC("?tc")
+long rbtree_api_nolock_first(void *ctx)
+{
+	bpf_rbtree_first(&groot);
+	return 0;
+}
+
+SEC("?tc")
+long rbtree_api_remove_unadded_node(void *ctx)
+{
+	struct node_data *n, *m;
+	struct bpf_rb_node *res;
+
+	n = bpf_obj_new(typeof(*n));
+	if (!n)
+		return 1;
+
+	m = bpf_obj_new(typeof(*m));
+	if (!m) {
+		bpf_obj_drop(n);
+		return 1;
+	}
+
+	bpf_spin_lock(&glock);
+	bpf_rbtree_add(&groot, &n->node, less);
+
+	/* This remove should pass verifier */
+	res = bpf_rbtree_remove(&groot, &n->node);
+	if (res)
+		n = container_of(res, struct node_data, node);
+
+	/* This remove shouldn't pass the verifier, since m isn't in any rbtree */
+	res = bpf_rbtree_remove(&groot, &m->node);
+	if (res)
+		m = container_of(res, struct node_data, node);
+	bpf_spin_unlock(&glock);
+
+	if (n)
+		bpf_obj_drop(n);
+	if (m)
+		bpf_obj_drop(m);
+	return 0;
+}
+
+SEC("?tc")
+long rbtree_api_remove_no_drop(void *ctx)
+{
+	struct bpf_rb_node *res;
+	struct node_data *n;
+
+	bpf_spin_lock(&glock);
+	res = bpf_rbtree_first(&groot);
+	if (!res)
+		goto unlock_err;
+
+	res = bpf_rbtree_remove(&groot, res);
+	if (!res)
+		goto unlock_err;
+
+	n = container_of(res, struct node_data, node);
+	bpf_spin_unlock(&glock);
+
+	/* bpf_obj_drop(n) is missing here */
+	return 0;
+
+unlock_err:
+	bpf_spin_unlock(&glock);
+	return 1;
+}
+
+SEC("?tc")
+long rbtree_api_add_to_multiple_trees(void *ctx)
+{
+	struct node_data *n;
+
+	n = bpf_obj_new(typeof(*n));
+	if (!n)
+		return 1;
+
+	bpf_spin_lock(&glock);
+	bpf_rbtree_add(&groot, &n->node, less);
+
+	/* This add should fail since n already in groot's tree */
+	bpf_rbtree_add(&groot2, &n->node, less);
+	bpf_spin_unlock(&glock);
+	return 0;
+}
+
+SEC("?tc")
+long rbtree_api_add_release_unlock_escape(void *ctx)
+{
+	struct node_data *n;
+
+	n = bpf_obj_new(typeof(*n));
+	if (!n)
+		return 1;
+
+	bpf_spin_lock(&glock);
+	bpf_rbtree_add(&groot, &n->node, less);
+	bpf_spin_unlock(&glock);
+
+	bpf_spin_lock(&glock);
+	/* After add() in previous critical section, n should be
+	 * release_on_unlock and released after previous spin_unlock,
+	 * so should not be possible to use it here
+	 */
+	bpf_rbtree_remove(&groot, &n->node);
+	bpf_spin_unlock(&glock);
+	return 0;
+}
+
+SEC("?tc")
+long rbtree_api_first_release_unlock_escape(void *ctx)
+{
+	struct bpf_rb_node *res;
+	struct node_data *n;
+
+	bpf_spin_lock(&glock);
+	res = bpf_rbtree_first(&groot);
+	if (res)
+		n = container_of(res, struct node_data, node);
+	bpf_spin_unlock(&glock);
+
+	bpf_spin_lock(&glock);
+	/* After first() in previous critical section, n should be
+	 * release_on_unlock and released after previous spin_unlock,
+	 * so should not be possible to use it here
+	 */
+	bpf_rbtree_remove(&groot, &n->node);
+	bpf_spin_unlock(&glock);
+	return 0;
+}
+
+static bool less__bad_fn_call_add(struct bpf_rb_node *a, const struct bpf_rb_node *b)
+{
+	struct node_data *node_a;
+	struct node_data *node_b;
+
+	node_a = container_of(a, struct node_data, node);
+	node_b = container_of(b, struct node_data, node);
+	bpf_rbtree_add(&groot, &node_a->node, less);
+
+	return node_a->key < node_b->key;
+}
+
+static bool less__bad_fn_call_remove(struct bpf_rb_node *a, const struct bpf_rb_node *b)
+{
+	struct node_data *node_a;
+	struct node_data *node_b;
+
+	node_a = container_of(a, struct node_data, node);
+	node_b = container_of(b, struct node_data, node);
+	bpf_rbtree_remove(&groot, &node_a->node);
+
+	return node_a->key < node_b->key;
+}
+
+static bool less__bad_fn_call_first(struct bpf_rb_node *a, const struct bpf_rb_node *b)
+{
+	struct node_data *node_a;
+	struct node_data *node_b;
+
+	node_a = container_of(a, struct node_data, node);
+	node_b = container_of(b, struct node_data, node);
+	bpf_rbtree_first(&groot);
+
+	return node_a->key < node_b->key;
+}
+
+static bool less__bad_fn_call_first_unlock_after(struct bpf_rb_node *a, const struct bpf_rb_node *b)
+{
+	struct node_data *node_a;
+	struct node_data *node_b;
+
+	node_a = container_of(a, struct node_data, node);
+	node_b = container_of(b, struct node_data, node);
+	bpf_rbtree_first(&groot);
+	bpf_spin_unlock(&glock);
+
+	return node_a->key < node_b->key;
+}
+
+#define RBTREE_API_ADD_BAD_CB(cb_suffix)				\
+SEC("?tc")								\
+long rbtree_api_add_bad_cb_##cb_suffix(void *ctx)			\
+{									\
+	struct node_data *n;						\
+									\
+	n = bpf_obj_new(typeof(*n));					\
+	if (!n)								\
+		return 1;						\
+									\
+	bpf_spin_lock(&glock);						\
+	bpf_rbtree_add(&groot, &n->node, less__##cb_suffix);		\
+	bpf_spin_unlock(&glock);					\
+	return 0;							\
+}
+
+RBTREE_API_ADD_BAD_CB(bad_fn_call_add);
+RBTREE_API_ADD_BAD_CB(bad_fn_call_remove);
+RBTREE_API_ADD_BAD_CB(bad_fn_call_first);
+RBTREE_API_ADD_BAD_CB(bad_fn_call_first_unlock_after);
+
+char _license[] SEC("license") = "GPL";
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 51+ messages in thread

* Re: [PATCH bpf-next 02/13] bpf: map_check_btf should fail if btf_parse_fields fails
  2022-12-06 23:09 ` [PATCH bpf-next 02/13] bpf: map_check_btf should fail if btf_parse_fields fails Dave Marchevsky
@ 2022-12-07  1:32   ` Alexei Starovoitov
  2022-12-07 16:49   ` Kumar Kartikeya Dwivedi
  1 sibling, 0 replies; 51+ messages in thread
From: Alexei Starovoitov @ 2022-12-07  1:32 UTC (permalink / raw)
  To: Dave Marchevsky
  Cc: bpf, Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Kernel Team, Kumar Kartikeya Dwivedi, Tejun Heo

On Tue, Dec 06, 2022 at 03:09:49PM -0800, Dave Marchevsky wrote:
> map_check_btf calls btf_parse_fields to create a btf_record for its
> value_type. If there are no special fields in the value_type,
> btf_parse_fields returns NULL, whereas if there are special value_type
> fields but they are invalid in some way, an error is returned.
> 
> An example invalid state would be:
> 
>   struct node_data {
>     struct bpf_rb_node node;
>     int data;
>   };
> 
>   private(A) struct bpf_spin_lock glock;
>   private(A) struct bpf_list_head ghead __contains(node_data, node);
> 
> groot should be invalid as its __contains tag points to a field with

s/groot/ghead/ ?

> type != "bpf_list_node".
> 
> Before this patch, such a scenario would result in btf_parse_fields
> returning an error ptr, subsequent !IS_ERR_OR_NULL check failing,
> and btf_check_and_fixup_fields returning 0, which would then be
> returned by map_check_btf.
> 
> After this patch's changes, -EINVAL would be returned by map_check_btf
> and the map would correctly fail to load.
> 
> Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
> cc: Kumar Kartikeya Dwivedi <memxor@gmail.com>
> Fixes: aa3496accc41 ("bpf: Refactor kptr_off_tab into btf_record")
> ---
>  kernel/bpf/syscall.c | 5 ++++-
>  1 file changed, 4 insertions(+), 1 deletion(-)
> 
> diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
> index 35972afb6850..c3599a7902f0 100644
> --- a/kernel/bpf/syscall.c
> +++ b/kernel/bpf/syscall.c
> @@ -1007,7 +1007,10 @@ static int map_check_btf(struct bpf_map *map, const struct btf *btf,
>  	map->record = btf_parse_fields(btf, value_type,
>  				       BPF_SPIN_LOCK | BPF_TIMER | BPF_KPTR | BPF_LIST_HEAD,
>  				       map->value_size);
> -	if (!IS_ERR_OR_NULL(map->record)) {
> +	if (IS_ERR(map->record))
> +		return -EINVAL;
> +
> +	if (map->record) {
>  		int i;
>  
>  		if (!bpf_capable()) {
> -- 
> 2.30.2
> 

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH bpf-next 04/13] bpf: rename list_head -> datastructure_head in field info types
  2022-12-06 23:09 ` [PATCH bpf-next 04/13] bpf: rename list_head -> datastructure_head in field info types Dave Marchevsky
@ 2022-12-07  1:41   ` Alexei Starovoitov
  2022-12-07 18:52     ` Dave Marchevsky
  0 siblings, 1 reply; 51+ messages in thread
From: Alexei Starovoitov @ 2022-12-07  1:41 UTC (permalink / raw)
  To: Dave Marchevsky
  Cc: bpf, Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Kernel Team, Kumar Kartikeya Dwivedi, Tejun Heo

On Tue, Dec 06, 2022 at 03:09:51PM -0800, Dave Marchevsky wrote:
> Many of the structs recently added to track field info for linked-list
> head are useful as-is for rbtree root. So let's do a mechanical renaming
> of list_head-related types and fields:
> 
> include/linux/bpf.h:
>   struct btf_field_list_head -> struct btf_field_datastructure_head
>   list_head -> datastructure_head in struct btf_field union
> kernel/bpf/btf.c:
>   list_head -> datastructure_head in struct btf_field_info

Looking through this patch and others, it eventually becomes
confusing with the 'datastructure head' name.
I'm not sure what the 'head' of a data structure is.
There is a head in a linked list, but 'head of a tree' is odd.

The attempt here is to find a common name that represents the programming
concept where there is a 'root' and there are 'nodes' that are added to that 'root'.
The 'data structure' name is too broad in that sense.
Especially later it becomes 'datastructure_api' which is even broader.

I was thinking to propose:
 struct btf_field_list_head -> struct btf_field_tree_root
 list_head -> tree_root in struct btf_field union

and is_kfunc_tree_api later...
since a linked list is a tree too.

But reading 'tree' next to other names like 'field' and 'kfunc',
it might be mistaken that 'tree' applies to the former.
So I think using 'graph' as a more general concept to describe both
linked list and rb-tree would be best.

So the proposal:
 struct btf_field_list_head -> struct btf_field_graph_root
 list_head -> graph_root in struct btf_field union

and is_kfunc_graph_api later...

'graph' is short enough and rarely used in names,
so it stands on its own next to 'field' and in combination
with other names.
wdyt?

> 
> This is a nonfunctional change, functionality to actually use these
> fields for rbtree will be added in further patches.
> 
> Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
> ---
>  include/linux/bpf.h   |  4 ++--
>  kernel/bpf/btf.c      | 21 +++++++++++----------
>  kernel/bpf/helpers.c  |  4 ++--
>  kernel/bpf/verifier.c | 21 +++++++++++----------
>  4 files changed, 26 insertions(+), 24 deletions(-)
> 
> diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> index 4920ac252754..9e8b12c7061e 100644
> --- a/include/linux/bpf.h
> +++ b/include/linux/bpf.h
> @@ -189,7 +189,7 @@ struct btf_field_kptr {
>  	u32 btf_id;
>  };
>  
> -struct btf_field_list_head {
> +struct btf_field_datastructure_head {
>  	struct btf *btf;
>  	u32 value_btf_id;
>  	u32 node_offset;
> @@ -201,7 +201,7 @@ struct btf_field {
>  	enum btf_field_type type;
>  	union {
>  		struct btf_field_kptr kptr;
> -		struct btf_field_list_head list_head;
> +		struct btf_field_datastructure_head datastructure_head;
>  	};
>  };
>  
> diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c
> index c80bd8709e69..284e3e4b76b7 100644
> --- a/kernel/bpf/btf.c
> +++ b/kernel/bpf/btf.c
> @@ -3227,7 +3227,7 @@ struct btf_field_info {
>  		struct {
>  			const char *node_name;
>  			u32 value_btf_id;
> -		} list_head;
> +		} datastructure_head;
>  	};
>  };
>  
> @@ -3334,8 +3334,8 @@ static int btf_find_list_head(const struct btf *btf, const struct btf_type *pt,
>  		return -EINVAL;
>  	info->type = BPF_LIST_HEAD;
>  	info->off = off;
> -	info->list_head.value_btf_id = id;
> -	info->list_head.node_name = list_node;
> +	info->datastructure_head.value_btf_id = id;
> +	info->datastructure_head.node_name = list_node;
>  	return BTF_FIELD_FOUND;
>  }
>  
> @@ -3603,13 +3603,14 @@ static int btf_parse_list_head(const struct btf *btf, struct btf_field *field,
>  	u32 offset;
>  	int i;
>  
> -	t = btf_type_by_id(btf, info->list_head.value_btf_id);
> +	t = btf_type_by_id(btf, info->datastructure_head.value_btf_id);
>  	/* We've already checked that value_btf_id is a struct type. We
>  	 * just need to figure out the offset of the list_node, and
>  	 * verify its type.
>  	 */
>  	for_each_member(i, t, member) {
> -		if (strcmp(info->list_head.node_name, __btf_name_by_offset(btf, member->name_off)))
> +		if (strcmp(info->datastructure_head.node_name,
> +			   __btf_name_by_offset(btf, member->name_off)))
>  			continue;
>  		/* Invalid BTF, two members with same name */
>  		if (n)
> @@ -3626,9 +3627,9 @@ static int btf_parse_list_head(const struct btf *btf, struct btf_field *field,
>  		if (offset % __alignof__(struct bpf_list_node))
>  			return -EINVAL;
>  
> -		field->list_head.btf = (struct btf *)btf;
> -		field->list_head.value_btf_id = info->list_head.value_btf_id;
> -		field->list_head.node_offset = offset;
> +		field->datastructure_head.btf = (struct btf *)btf;
> +		field->datastructure_head.value_btf_id = info->datastructure_head.value_btf_id;
> +		field->datastructure_head.node_offset = offset;
>  	}
>  	if (!n)
>  		return -ENOENT;
> @@ -3735,11 +3736,11 @@ int btf_check_and_fixup_fields(const struct btf *btf, struct btf_record *rec)
>  
>  		if (!(rec->fields[i].type & BPF_LIST_HEAD))
>  			continue;
> -		btf_id = rec->fields[i].list_head.value_btf_id;
> +		btf_id = rec->fields[i].datastructure_head.value_btf_id;
>  		meta = btf_find_struct_meta(btf, btf_id);
>  		if (!meta)
>  			return -EFAULT;
> -		rec->fields[i].list_head.value_rec = meta->record;
> +		rec->fields[i].datastructure_head.value_rec = meta->record;
>  
>  		if (!(rec->field_mask & BPF_LIST_NODE))
>  			continue;
> diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
> index cca642358e80..6c67740222c2 100644
> --- a/kernel/bpf/helpers.c
> +++ b/kernel/bpf/helpers.c
> @@ -1737,12 +1737,12 @@ void bpf_list_head_free(const struct btf_field *field, void *list_head,
>  	while (head != orig_head) {
>  		void *obj = head;
>  
> -		obj -= field->list_head.node_offset;
> +		obj -= field->datastructure_head.node_offset;
>  		head = head->next;
>  		/* The contained type can also have resources, including a
>  		 * bpf_list_head which needs to be freed.
>  		 */
> -		bpf_obj_free_fields(field->list_head.value_rec, obj);
> +		bpf_obj_free_fields(field->datastructure_head.value_rec, obj);
>  		/* bpf_mem_free requires migrate_disable(), since we can be
>  		 * called from map free path as well apart from BPF program (as
>  		 * part of map ops doing bpf_obj_free_fields).
> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> index 6f0aac837d77..bc80b4c4377b 100644
> --- a/kernel/bpf/verifier.c
> +++ b/kernel/bpf/verifier.c
> @@ -8615,21 +8615,22 @@ static int process_kf_arg_ptr_to_list_node(struct bpf_verifier_env *env,
>  
>  	field = meta->arg_list_head.field;
>  
> -	et = btf_type_by_id(field->list_head.btf, field->list_head.value_btf_id);
> +	et = btf_type_by_id(field->datastructure_head.btf, field->datastructure_head.value_btf_id);
>  	t = btf_type_by_id(reg->btf, reg->btf_id);
> -	if (!btf_struct_ids_match(&env->log, reg->btf, reg->btf_id, 0, field->list_head.btf,
> -				  field->list_head.value_btf_id, true)) {
> +	if (!btf_struct_ids_match(&env->log, reg->btf, reg->btf_id, 0, field->datastructure_head.btf,
> +				  field->datastructure_head.value_btf_id, true)) {
>  		verbose(env, "operation on bpf_list_head expects arg#1 bpf_list_node at offset=%d "
>  			"in struct %s, but arg is at offset=%d in struct %s\n",
> -			field->list_head.node_offset, btf_name_by_offset(field->list_head.btf, et->name_off),
> +			field->datastructure_head.node_offset,
> +			btf_name_by_offset(field->datastructure_head.btf, et->name_off),
>  			list_node_off, btf_name_by_offset(reg->btf, t->name_off));
>  		return -EINVAL;
>  	}
>  
> -	if (list_node_off != field->list_head.node_offset) {
> +	if (list_node_off != field->datastructure_head.node_offset) {
>  		verbose(env, "arg#1 offset=%d, but expected bpf_list_node at offset=%d in struct %s\n",
> -			list_node_off, field->list_head.node_offset,
> -			btf_name_by_offset(field->list_head.btf, et->name_off));
> +			list_node_off, field->datastructure_head.node_offset,
> +			btf_name_by_offset(field->datastructure_head.btf, et->name_off));
>  		return -EINVAL;
>  	}
>  	/* Set arg#1 for expiration after unlock */
> @@ -9078,9 +9079,9 @@ static int check_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
>  
>  				mark_reg_known_zero(env, regs, BPF_REG_0);
>  				regs[BPF_REG_0].type = PTR_TO_BTF_ID | MEM_ALLOC;
> -				regs[BPF_REG_0].btf = field->list_head.btf;
> -				regs[BPF_REG_0].btf_id = field->list_head.value_btf_id;
> -				regs[BPF_REG_0].off = field->list_head.node_offset;
> +				regs[BPF_REG_0].btf = field->datastructure_head.btf;
> +				regs[BPF_REG_0].btf_id = field->datastructure_head.value_btf_id;
> +				regs[BPF_REG_0].off = field->datastructure_head.node_offset;
>  			} else if (meta.func_id == special_kfunc_list[KF_bpf_cast_to_kern_ctx]) {
>  				mark_reg_known_zero(env, regs, BPF_REG_0);
>  				regs[BPF_REG_0].type = PTR_TO_BTF_ID | PTR_TRUSTED;
> -- 
> 2.30.2
> 

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH bpf-next 05/13] bpf: Add basic bpf_rb_{root,node} support
  2022-12-06 23:09 ` [PATCH bpf-next 05/13] bpf: Add basic bpf_rb_{root,node} support Dave Marchevsky
@ 2022-12-07  1:48   ` Alexei Starovoitov
  0 siblings, 0 replies; 51+ messages in thread
From: Alexei Starovoitov @ 2022-12-07  1:48 UTC (permalink / raw)
  To: Dave Marchevsky
  Cc: bpf, Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Kernel Team, Kumar Kartikeya Dwivedi, Tejun Heo

On Tue, Dec 06, 2022 at 03:09:52PM -0800, Dave Marchevsky wrote:
>  
> +#define OWNER_FIELD_MASK (BPF_LIST_HEAD | BPF_RB_ROOT)
> +#define OWNEE_FIELD_MASK (BPF_LIST_NODE | BPF_RB_NODE)

A one-letter difference makes this hard to review.
How about
GRAPH_ROOT_MASK
GRAPH_NODE_MASK
?

> +
>  int btf_check_and_fixup_fields(const struct btf *btf, struct btf_record *rec)
>  {
>  	int i;
>  
> -	/* There are two owning types, kptr_ref and bpf_list_head. The former
> -	 * only supports storing kernel types, which can never store references
> -	 * to program allocated local types, atleast not yet. Hence we only need
> -	 * to ensure that bpf_list_head ownership does not form cycles.
> +	/* There are three types that signify ownership of some other type:
> +	 *  kptr_ref, bpf_list_head, bpf_rb_root.
> +	 * kptr_ref only supports storing kernel types, which can't store
> +	 * references to program allocated local types.
> +	 *
> +	 * Hence we only need to ensure that bpf_{list_head,rb_root} ownership
> +	 * does not form cycles.
>  	 */
> -	if (IS_ERR_OR_NULL(rec) || !(rec->field_mask & BPF_LIST_HEAD))
> +	if (IS_ERR_OR_NULL(rec) || !(rec->field_mask & OWNER_FIELD_MASK))
>  		return 0;
>  	for (i = 0; i < rec->cnt; i++) {
>  		struct btf_struct_meta *meta;
>  		u32 btf_id;
>  
> -		if (!(rec->fields[i].type & BPF_LIST_HEAD))
> +		if (!(rec->fields[i].type & OWNER_FIELD_MASK))
>  			continue;
>  		btf_id = rec->fields[i].datastructure_head.value_btf_id;
>  		meta = btf_find_struct_meta(btf, btf_id);
> @@ -3742,39 +3783,47 @@ int btf_check_and_fixup_fields(const struct btf *btf, struct btf_record *rec)
>  			return -EFAULT;
>  		rec->fields[i].datastructure_head.value_rec = meta->record;
>  
> -		if (!(rec->field_mask & BPF_LIST_NODE))
> +		/* We need to set value_rec for all owner types, but no need
> +		 * to check ownership cycle for a type unless it's also an
> +		 * ownee type.
> +		 */
> +		if (!(rec->field_mask & OWNEE_FIELD_MASK))
>  			continue;
>  
>  		/* We need to ensure ownership acyclicity among all types. The
>  		 * proper way to do it would be to topologically sort all BTF
>  		 * IDs based on the ownership edges, since there can be multiple
> -		 * bpf_list_head in a type. Instead, we use the following
> -		 * reasoning:
> +		 * bpf_{list_head,rb_node} in a type. Instead, we use the
> +		 * following reasoning:
>  		 *
>  		 * - A type can only be owned by another type in user BTF if it
> -		 *   has a bpf_list_node.
> +		 *   has a bpf_{list,rb}_node. Let's call these ownee types.
>  		 * - A type can only _own_ another type in user BTF if it has a
> -		 *   bpf_list_head.
> +		 *   bpf_{list_head,rb_root}. Let's call these owner types.
>  		 *
> -		 * We ensure that if a type has both bpf_list_head and
> -		 * bpf_list_node, its element types cannot be owning types.
> +		 * We ensure that if a type is both an owner and ownee, its
> +		 * element types cannot be owner types.
>  		 *
>  		 * To ensure acyclicity:
>  		 *
> -		 * When A only has bpf_list_head, ownership chain can be:
> +		 * When A is an owner type but not an ownee, its ownership

and that would become:
When A is a root type, but not a node type...

reads easier.

> +		 * chain can be:
>  		 *	A -> B -> C
>  		 * Where:
> -		 * - B has both bpf_list_head and bpf_list_node.
> -		 * - C only has bpf_list_node.
> +		 * - A is an owner, e.g. has bpf_rb_root.
> +		 * - B is both an owner and ownee, e.g. has bpf_rb_node and
> +		 *   bpf_list_head.
> +		 * - C is only an ownee, e.g. has bpf_list_node.
>  		 *
> -		 * When A has both bpf_list_head and bpf_list_node, some other
> -		 * type already owns it in the BTF domain, hence it can not own
> -		 * another owning type through any of the bpf_list_head edges.
> +		 * When A is both an owner and ownee, some other type already
> +		 * owns it in the BTF domain, hence it can not own
> +		 * another owner type through any of the ownership edges.
>  		 *	A -> B
>  		 * Where:
> -		 * - B only has bpf_list_node.
> +		 * - A is both an owner and ownee.
> +		 * - B is only an ownee.

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH bpf-next 07/13] bpf: Add support for bpf_rb_root and bpf_rb_node in kfunc args
  2022-12-06 23:09 ` [PATCH bpf-next 07/13] bpf: Add support for bpf_rb_root and bpf_rb_node in kfunc args Dave Marchevsky
@ 2022-12-07  1:51   ` Alexei Starovoitov
  0 siblings, 0 replies; 51+ messages in thread
From: Alexei Starovoitov @ 2022-12-07  1:51 UTC (permalink / raw)
  To: Dave Marchevsky
  Cc: bpf, Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Kernel Team, Kumar Kartikeya Dwivedi, Tejun Heo

On Tue, Dec 06, 2022 at 03:09:54PM -0800, Dave Marchevsky wrote:
>  
> -static int process_kf_arg_ptr_to_list_head(struct bpf_verifier_env *env,
> +static bool is_bpf_rbtree_api_kfunc(u32 btf_id)
> +{
> +	return btf_id == special_kfunc_list[KF_bpf_rbtree_add] ||
> +	       btf_id == special_kfunc_list[KF_bpf_rbtree_remove] ||
> +	       btf_id == special_kfunc_list[KF_bpf_rbtree_first];
> +}
> +
> +static bool is_bpf_datastructure_api_kfunc(u32 btf_id)
> +{
> +	return is_bpf_list_api_kfunc(btf_id) || is_bpf_rbtree_api_kfunc(btf_id);
> +}

static bool is_bpf_graph_api_kfunc(u32 btf_id)
{
	return is_bpf_list_api_kfunc(btf_id) || is_bpf_rbtree_api_kfunc(btf_id);
}

would read well here.
Much shorter too.

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH bpf-next 08/13] bpf: Add callback validation to kfunc verifier logic
  2022-12-06 23:09 ` [PATCH bpf-next 08/13] bpf: Add callback validation to kfunc verifier logic Dave Marchevsky
@ 2022-12-07  2:01   ` Alexei Starovoitov
  2022-12-17  8:49     ` Dave Marchevsky
  0 siblings, 1 reply; 51+ messages in thread
From: Alexei Starovoitov @ 2022-12-07  2:01 UTC (permalink / raw)
  To: Dave Marchevsky
  Cc: bpf, Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Kernel Team, Kumar Kartikeya Dwivedi, Tejun Heo

On Tue, Dec 06, 2022 at 03:09:55PM -0800, Dave Marchevsky wrote:
> Some BPF helpers take a callback function which the helper calls. For
> each helper that takes such a callback, there's a special call to
> __check_func_call with a callback-state-setting callback that sets up
> verifier bpf_func_state for the callback's frame.
> 
> kfuncs don't have any of this infrastructure yet, so let's add it in
> this patch, following existing helper pattern as much as possible. To
> validate functionality of this added plumbing, this patch adds
> callback handling for the bpf_rbtree_add kfunc and hopes to lay
> groundwork for future next-gen datastructure callbacks.
> 
> In the "general plumbing" category we have:
> 
>   * check_kfunc_call doing callback verification right before clearing
>     CALLER_SAVED_REGS, exactly like check_helper_call
>   * recognition of func_ptr BTF types in kfunc args as
>     KF_ARG_PTR_TO_CALLBACK + propagation of subprogno for this arg type
> 
> In the "rbtree_add / next-gen datastructure-specific plumbing" category:
> 
>   * Since bpf_rbtree_add must be called while the spin_lock associated
>     with the tree is held, don't complain when callback's func_state
>     doesn't unlock it by frame exit
>   * Mark rbtree_add callback's args PTR_UNTRUSTED to prevent rbtree
>     api functions from being called in the callback
> 
> Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
> ---
>  kernel/bpf/verifier.c | 136 ++++++++++++++++++++++++++++++++++++++++--
>  1 file changed, 130 insertions(+), 6 deletions(-)
> 
> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> index 652112007b2c..9ad8c0b264dc 100644
> --- a/kernel/bpf/verifier.c
> +++ b/kernel/bpf/verifier.c
> @@ -1448,6 +1448,16 @@ static void mark_ptr_not_null_reg(struct bpf_reg_state *reg)
>  	reg->type &= ~PTR_MAYBE_NULL;
>  }
>  
> +static void mark_reg_datastructure_node(struct bpf_reg_state *regs, u32 regno,
> +					struct btf_field_datastructure_head *ds_head)
> +{
> +	__mark_reg_known_zero(&regs[regno]);
> +	regs[regno].type = PTR_TO_BTF_ID | MEM_ALLOC;
> +	regs[regno].btf = ds_head->btf;
> +	regs[regno].btf_id = ds_head->value_btf_id;
> +	regs[regno].off = ds_head->node_offset;
> +}
> +
>  static bool reg_is_pkt_pointer(const struct bpf_reg_state *reg)
>  {
>  	return type_is_pkt_pointer(reg->type);
> @@ -4771,7 +4781,8 @@ static int check_ptr_to_btf_access(struct bpf_verifier_env *env,
>  			return -EACCES;
>  		}
>  
> -		if (type_is_alloc(reg->type) && !reg->ref_obj_id) {
> +		if (type_is_alloc(reg->type) && !reg->ref_obj_id &&
> +		    !cur_func(env)->in_callback_fn) {
>  			verbose(env, "verifier internal error: ref_obj_id for allocated object must be non-zero\n");
>  			return -EFAULT;
>  		}
> @@ -6952,6 +6963,8 @@ static int set_callee_state(struct bpf_verifier_env *env,
>  			    struct bpf_func_state *caller,
>  			    struct bpf_func_state *callee, int insn_idx);
>  
> +static bool is_callback_calling_kfunc(u32 btf_id);
> +
>  static int __check_func_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
>  			     int *insn_idx, int subprog,
>  			     set_callee_state_fn set_callee_state_cb)
> @@ -7006,10 +7019,18 @@ static int __check_func_call(struct bpf_verifier_env *env, struct bpf_insn *insn
>  	 * interested in validating only BPF helpers that can call subprogs as
>  	 * callbacks
>  	 */
> -	if (set_callee_state_cb != set_callee_state && !is_callback_calling_function(insn->imm)) {
> -		verbose(env, "verifier bug: helper %s#%d is not marked as callback-calling\n",
> -			func_id_name(insn->imm), insn->imm);
> -		return -EFAULT;
> +	if (set_callee_state_cb != set_callee_state) {
> +		if (bpf_pseudo_kfunc_call(insn) &&
> +		    !is_callback_calling_kfunc(insn->imm)) {
> +			verbose(env, "verifier bug: kfunc %s#%d not marked as callback-calling\n",
> +				func_id_name(insn->imm), insn->imm);
> +			return -EFAULT;
> +		} else if (!bpf_pseudo_kfunc_call(insn) &&
> +			   !is_callback_calling_function(insn->imm)) { /* helper */
> +			verbose(env, "verifier bug: helper %s#%d not marked as callback-calling\n",
> +				func_id_name(insn->imm), insn->imm);
> +			return -EFAULT;
> +		}
>  	}
>  
>  	if (insn->code == (BPF_JMP | BPF_CALL) &&
> @@ -7275,6 +7296,67 @@ static int set_user_ringbuf_callback_state(struct bpf_verifier_env *env,
>  	return 0;
>  }
>  
> +static int set_rbtree_add_callback_state(struct bpf_verifier_env *env,
> +					 struct bpf_func_state *caller,
> +					 struct bpf_func_state *callee,
> +					 int insn_idx)
> +{
> +	/* void bpf_rbtree_add(struct bpf_rb_root *root, struct bpf_rb_node *node,
> +	 *                     bool (less)(struct bpf_rb_node *a, const struct bpf_rb_node *b));
> +	 *
> +	 * 'struct bpf_rb_node *node' arg to bpf_rbtree_add is the same PTR_TO_BTF_ID w/ offset
> +	 * that 'less' callback args will be receiving. However, 'node' arg was release_reference'd
> +	 * by this point, so look at 'root'
> +	 */
> +	struct btf_field *field;
> +	struct btf_record *rec;
> +
> +	rec = reg_btf_record(&caller->regs[BPF_REG_1]);
> +	if (!rec)
> +		return -EFAULT;
> +
> +	field = btf_record_find(rec, caller->regs[BPF_REG_1].off, BPF_RB_ROOT);
> +	if (!field || !field->datastructure_head.value_btf_id)
> +		return -EFAULT;
> +
> +	mark_reg_datastructure_node(callee->regs, BPF_REG_1, &field->datastructure_head);
> +	callee->regs[BPF_REG_1].type |= PTR_UNTRUSTED;
> +	mark_reg_datastructure_node(callee->regs, BPF_REG_2, &field->datastructure_head);
> +	callee->regs[BPF_REG_2].type |= PTR_UNTRUSTED;

Please add a comment here to explain that the pointers are actually trusted
and here it's a quick hack to prevent callback to call into rb_tree kfuncs.
We definitely would need to clean it up.
Have you tried checking for is_bpf_list_api_kfunc() || is_bpf_rbtree_api_kfunc()
while processing kfuncs inside the callback?

> +	callee->in_callback_fn = true;

this will give you a flag to do that check.

> +	callee->callback_ret_range = tnum_range(0, 1);
> +	return 0;
> +}
> +
> +static bool is_rbtree_lock_required_kfunc(u32 btf_id);
> +
> +/* Are we currently verifying the callback for a rbtree helper that must
> + * be called with lock held? If so, no need to complain about unreleased
> + * lock
> + */
> +static bool in_rbtree_lock_required_cb(struct bpf_verifier_env *env)
> +{
> +	struct bpf_verifier_state *state = env->cur_state;
> +	struct bpf_insn *insn = env->prog->insnsi;
> +	struct bpf_func_state *callee;
> +	int kfunc_btf_id;
> +
> +	if (!state->curframe)
> +		return false;
> +
> +	callee = state->frame[state->curframe];
> +
> +	if (!callee->in_callback_fn)
> +		return false;
> +
> +	kfunc_btf_id = insn[callee->callsite].imm;
> +	return is_rbtree_lock_required_kfunc(kfunc_btf_id);
> +}
> +
>  static int prepare_func_exit(struct bpf_verifier_env *env, int *insn_idx)
>  {
>  	struct bpf_verifier_state *state = env->cur_state;
> @@ -8007,6 +8089,7 @@ struct bpf_kfunc_call_arg_meta {
>  	bool r0_rdonly;
>  	u32 ret_btf_id;
>  	u64 r0_size;
> +	u32 subprogno;
>  	struct {
>  		u64 value;
>  		bool found;
> @@ -8185,6 +8268,18 @@ static bool is_kfunc_arg_rbtree_node(const struct btf *btf, const struct btf_par
>  	return __is_kfunc_ptr_arg_type(btf, arg, KF_ARG_RB_NODE_ID);
>  }
>  
> +static bool is_kfunc_arg_callback(struct bpf_verifier_env *env, const struct btf *btf,
> +				  const struct btf_param *arg)
> +{
> +	const struct btf_type *t;
> +
> +	t = btf_type_resolve_func_ptr(btf, arg->type, NULL);
> +	if (!t)
> +		return false;
> +
> +	return true;
> +}
> +
>  /* Returns true if struct is composed of scalars, 4 levels of nesting allowed */
>  static bool __btf_type_is_scalar_struct(struct bpf_verifier_env *env,
>  					const struct btf *btf,
> @@ -8244,6 +8339,7 @@ enum kfunc_ptr_arg_type {
>  	KF_ARG_PTR_TO_BTF_ID,	     /* Also covers reg2btf_ids conversions */
>  	KF_ARG_PTR_TO_MEM,
>  	KF_ARG_PTR_TO_MEM_SIZE,	     /* Size derived from next argument, skip it */
> +	KF_ARG_PTR_TO_CALLBACK,
>  	KF_ARG_PTR_TO_RB_ROOT,
>  	KF_ARG_PTR_TO_RB_NODE,
>  };
> @@ -8368,6 +8464,9 @@ get_kfunc_ptr_arg_type(struct bpf_verifier_env *env,
>  		return KF_ARG_PTR_TO_BTF_ID;
>  	}
>  
> +	if (is_kfunc_arg_callback(env, meta->btf, &args[argno]))
> +		return KF_ARG_PTR_TO_CALLBACK;
> +
>  	if (argno + 1 < nargs && is_kfunc_arg_mem_size(meta->btf, &args[argno + 1], &regs[regno + 1]))
>  		arg_mem_size = true;
>  
> @@ -8585,6 +8684,16 @@ static bool is_bpf_datastructure_api_kfunc(u32 btf_id)
>  	return is_bpf_list_api_kfunc(btf_id) || is_bpf_rbtree_api_kfunc(btf_id);
>  }
>  
> +static bool is_callback_calling_kfunc(u32 btf_id)
> +{
> +	return btf_id == special_kfunc_list[KF_bpf_rbtree_add];
> +}
> +
> +static bool is_rbtree_lock_required_kfunc(u32 btf_id)
> +{
> +	return is_bpf_rbtree_api_kfunc(btf_id);
> +}
> +
>  static bool check_kfunc_is_datastructure_head_api(struct bpf_verifier_env *env,
>  						  enum btf_field_type head_field_type,
>  						  u32 kfunc_btf_id)
> @@ -8920,6 +9029,7 @@ static int check_kfunc_args(struct bpf_verifier_env *env, struct bpf_kfunc_call_
>  		case KF_ARG_PTR_TO_RB_NODE:
>  		case KF_ARG_PTR_TO_MEM:
>  		case KF_ARG_PTR_TO_MEM_SIZE:
> +		case KF_ARG_PTR_TO_CALLBACK:
>  			/* Trusted by default */
>  			break;
>  		default:
> @@ -9078,6 +9188,9 @@ static int check_kfunc_args(struct bpf_verifier_env *env, struct bpf_kfunc_call_
>  			/* Skip next '__sz' argument */
>  			i++;
>  			break;
> +		case KF_ARG_PTR_TO_CALLBACK:
> +			meta->subprogno = reg->subprogno;
> +			break;
>  		}
>  	}
>  
> @@ -9193,6 +9306,16 @@ static int check_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
>  		}
>  	}
>  
> +	if (meta.func_id == special_kfunc_list[KF_bpf_rbtree_add]) {
> +		err = __check_func_call(env, insn, insn_idx_p, meta.subprogno,
> +					set_rbtree_add_callback_state);
> +		if (err) {
> +			verbose(env, "kfunc %s#%d failed callback verification\n",
> +				func_name, func_id);
> +			return err;
> +		}
> +	}
> +
>  	for (i = 0; i < CALLER_SAVED_REGS; i++)
>  		mark_reg_not_init(env, regs, caller_saved[i]);
>  
> @@ -14023,7 +14146,8 @@ static int do_check(struct bpf_verifier_env *env)
>  					return -EINVAL;
>  				}
>  
> -				if (env->cur_state->active_lock.ptr) {
> +				if (env->cur_state->active_lock.ptr &&
> +				    !in_rbtree_lock_required_cb(env)) {

That looks wrong.
It will allow callbacks to use unpaired lock/unlock.
Have you tried clearing cur_state->active_lock when entering callback?
That should solve it and won't cause lock/unlock imbalance.

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH bpf-next 09/13] bpf: Special verifier handling for bpf_rbtree_{remove, first}
  2022-12-06 23:09 ` [PATCH bpf-next 09/13] bpf: Special verifier handling for bpf_rbtree_{remove, first} Dave Marchevsky
@ 2022-12-07  2:18   ` Alexei Starovoitov
  0 siblings, 0 replies; 51+ messages in thread
From: Alexei Starovoitov @ 2022-12-07  2:18 UTC (permalink / raw)
  To: Dave Marchevsky
  Cc: bpf, Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Kernel Team, Kumar Kartikeya Dwivedi, Tejun Heo

On Tue, Dec 06, 2022 at 03:09:56PM -0800, Dave Marchevsky wrote:
> Newly-added bpf_rbtree_{remove,first} kfuncs have some special properties
> that require handling in the verifier:
> 
>   * both bpf_rbtree_remove and bpf_rbtree_first return the type containing
>     the bpf_rb_node field, with the offset set to that field's offset,
>     instead of a struct bpf_rb_node *
>     * Generalized existing next-gen list verifier handling for this
>       as mark_reg_datastructure_node helper
> 
>   * Unlike other functions, which set release_on_unlock on one of their
>     args, bpf_rbtree_first takes no arguments, rather setting
>     release_on_unlock on its return value
> 
>   * bpf_rbtree_remove's node input is a node that's been inserted
>     in the tree. Only non-owning references (PTR_UNTRUSTED +
>     release_on_unlock) refer to such nodes, but kfuncs don't take
>     PTR_UNTRUSTED args
>     * Added special carveout for bpf_rbtree_remove to take PTR_UNTRUSTED
>     * Since node input already has release_on_unlock set, don't set
>       it again
> 
> This patch, along with the previous one, complete special verifier
> handling for all rbtree API functions added in this series.
> 
> Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
> ---
>  kernel/bpf/verifier.c | 89 +++++++++++++++++++++++++++++++++++--------
>  1 file changed, 73 insertions(+), 16 deletions(-)
> 
> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> index 9ad8c0b264dc..29983e2c27df 100644
> --- a/kernel/bpf/verifier.c
> +++ b/kernel/bpf/verifier.c
> @@ -6122,6 +6122,23 @@ static int check_reg_type(struct bpf_verifier_env *env, u32 regno,
>  	return 0;
>  }
>  
> +static bool
> +func_arg_reg_rb_node_offset(const struct bpf_reg_state *reg, s32 off)
> +{
> +	struct btf_record *rec;
> +	struct btf_field *field;
> +
> +	rec = reg_btf_record(reg);
> +	if (!rec)
> +		return false;
> +
> +	field = btf_record_find(rec, off, BPF_RB_NODE);
> +	if (!field)
> +		return false;
> +
> +	return true;
> +}
> +
>  int check_func_arg_reg_off(struct bpf_verifier_env *env,
>  			   const struct bpf_reg_state *reg, int regno,
>  			   enum bpf_arg_type arg_type)
> @@ -6176,6 +6193,13 @@ int check_func_arg_reg_off(struct bpf_verifier_env *env,
>  		 */
>  		fixed_off_ok = true;
>  		break;
> +	case PTR_TO_BTF_ID | MEM_ALLOC | PTR_UNTRUSTED:
> +		/* Currently only bpf_rbtree_remove accepts a PTR_UNTRUSTED
> +		 * bpf_rb_node. Fixed off of the node type is OK
> +		 */
> +		if (reg->off && func_arg_reg_rb_node_offset(reg, reg->off))
> +			fixed_off_ok = true;
> +		break;

This doesn't look safe.
We cannot pass generic PTR_UNTRUSTED to bpf_rbtree_remove.
bpf_rbtree_remove wouldn't be able to distinguish invalid pointer.

Considering the cover letter example:

 bpf_spin_lock(&glock);
 res = bpf_rbtree_first(&groot);
   // groot and res are both trusted, no?
 if (!res)
   /* skip */
 // res is acquired and !null here

 res = bpf_rbtree_remove(&groot, res); // both args are trusted

 // here old res becomes untrusted because it went through release kfunc
 // new res is untrusted
 if (!res)
   /* skip */
 bpf_spin_unlock(&glock);

what am I missing?

I thought
bpf_obj_new -> returns acq obj
bpf_rbtree_add -> releases that obj
same way bpf_rbtree_first/next/ -> return acq obj
that can be passed to both rbtree_add and rbtree_remove.
The former will be a nop at runtime, but a release from the verifier's pov.
Similar with rbtree_remove:
obj = bpf_obj_new
bpf_rbtree_remove(root, obj); will be equivalent to bpf_obj_drop at run-time
and a release from the verifier's pov.

Are you trying to return untrusted from bpf_rbtree_first?
But then how can we guarantee safety?

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH bpf-next 10/13] bpf, x86: BPF_PROBE_MEM handling for insn->off < 0
  2022-12-06 23:09 ` [PATCH bpf-next 10/13] bpf, x86: BPF_PROBE_MEM handling for insn->off < 0 Dave Marchevsky
@ 2022-12-07  2:39   ` Alexei Starovoitov
  2022-12-07  6:46     ` Dave Marchevsky
  0 siblings, 1 reply; 51+ messages in thread
From: Alexei Starovoitov @ 2022-12-07  2:39 UTC (permalink / raw)
  To: Dave Marchevsky
  Cc: bpf, Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Kernel Team, Kumar Kartikeya Dwivedi, Tejun Heo

On Tue, Dec 06, 2022 at 03:09:57PM -0800, Dave Marchevsky wrote:
> Current comment in BPF_PROBE_MEM jit code claims that verifier prevents
> insn->off < 0, but this appears to not be true irrespective of changes
> in this series. Regardless, changes in this series will result in an
> example like:
> 
>   struct example_node {
>     long key;
>     long val;
>     struct bpf_rb_node node;
>   }
> 
>   /* In BPF prog, assume root contains example_node nodes */
>   struct bpf_rb_node *res = bpf_rbtree_first(&root);
>   if (!res)
>     return 1;
> 
>   struct example_node *n = container_of(res, struct example_node, node);
>   long key = n->key;
> 
> Resulting in a load with off = -16, as bpf_rbtree_first's return is

Looks like the bug in the previous patch:
+                       } else if (meta.func_id == special_kfunc_list[KF_bpf_rbtree_remove] ||
+                                  meta.func_id == special_kfunc_list[KF_bpf_rbtree_first]) {
+                               struct btf_field *field = meta.arg_rbtree_root.field;
+
+                               mark_reg_datastructure_node(regs, BPF_REG_0,
+                                                           &field->datastructure_head);

The R0 .off should have been:
 regs[BPF_REG_0].off = field->rb_node.node_offset;

node, not root.

PTR_TO_BTF_ID should have been returned with the appropriate 'off',
so that container_of() would bring it back to zero offset.

The approach of returning untrusted from bpf_rbtree_first is questionable.
Without doing that this issue would not have surfaced.

All PTR_TO_BTF_ID need to have positive offset.
I'm not sure btf_struct_walk() and other PTR_TO_BTF_ID accessors
can deal with negative offsets.
There could be all kinds of things to fix.

> modified by verifier to be PTR_TO_BTF_ID of example_node w/ offset =
> offsetof(struct example_node, node), instead of PTR_TO_BTF_ID of
> bpf_rb_node. So it's necessary to support negative insn->off when
> jitting BPF_PROBE_MEM.

I'm not convinced it's necessary.
container_of() seems to be the only case where bpf prog can convert
PTR_TO_BTF_ID with off >= 0 to negative off.
Normal pointer walking will not make it negative.

> In order to ensure that page fault for a BPF_PROBE_MEM load of *src_reg +
> insn->off is safely handled, we must confirm that *src_reg + insn->off is
> in kernel's memory. Two runtime checks are emitted to confirm that:
> 
>   1) (*src_reg + insn->off) > boundary between user and kernel address
>   spaces
>   2) (*src_reg + insn->off) does not overflow to a small positive
>   number. This might happen if some function meant to set src_reg
>   returns ERR_PTR(-EINVAL) or similar.
> 
> Check 1 currently is slightly off - it compares a
> 
>   u64 limit = TASK_SIZE_MAX + PAGE_SIZE + abs(insn->off);
> 
> to *src_reg, aborting the load if limit is larger. Rewriting this as an
> inequality:
> 
>   *src_reg > TASK_SIZE_MAX + PAGE_SIZE + abs(insn->off)
>   *src_reg - abs(insn->off) > TASK_SIZE_MAX + PAGE_SIZE
> 
> shows that this isn't quite right even if insn->off is positive, as we
> really want:
> 
>   *src_reg + insn->off > TASK_SIZE_MAX + PAGE_SIZE
>   *src_reg > TASK_SIZE_MAX + PAGE_SIZE - insn->off
> 
> Since *src_reg + insn->off is the address we'll be loading from, not
> *src_reg - insn->off or *src_reg - abs(insn->off). So change the
> subtraction to an addition and remove the abs(), as comment indicates
> that it was only added to ignore negative insn->off.
> 
> For Check 2, currently "does not overflow to a small positive number" is
> confirmed by emitting an 'add insn->off, src_reg' instruction and
> checking for carry flag. While this works fine for a positive insn->off,
> a small negative insn->off like -16 is almost guaranteed to wrap over to
> a small positive number when added to any kernel address.
> 
> This patch addresses this by not doing Check 2 at BPF prog runtime when
> insn->off is negative, rather doing a stronger check at JIT-time. The
> logic supporting this is as follows:
> 
> 1) Assume insn->off is negative, call the largest such negative offset
>    MAX_NEGATIVE_OFF. So insn->off >= MAX_NEGATIVE_OFF for all possible
>    insn->off.
> 
> 2) *src_reg + insn->off will not wrap over to an unexpected address by
>    virtue of negative insn->off, but it might wrap under if
>    -insn->off > *src_reg, as that implies *src_reg + insn->off < 0
> 
> 3) Inequality (TASK_SIZE_MAX + PAGE_SIZE - insn->off) > (TASK_SIZE_MAX + PAGE_SIZE)
>    must be true since insn->off is negative.
> 
> 4) If we've completed check 1, we know that
>    src_reg >= (TASK_SIZE_MAX + PAGE_SIZE - insn->off)
> 
> 5) Combining statements 3 and 4, we know src_reg > (TASK_SIZE_MAX + PAGE_SIZE)
> 
> 6) By statements 1, 4, and 5, if we can prove
>    (TASK_SIZE_MAX + PAGE_SIZE) > -MAX_NEGATIVE_OFF, we'll know that
>    (TASK_SIZE_MAX + PAGE_SIZE) > -insn->off for all possible insn->off
>    values. We can rewrite this as (TASK_SIZE_MAX + PAGE_SIZE) +
>    MAX_NEGATIVE_OFF > 0.
> 
>    Since src_reg > TASK_SIZE_MAX + PAGE_SIZE and MAX_NEGATIVE_OFF is
>    negative, if the previous inequality is true,
>    src_reg + MAX_NEGATIVE_OFF > 0 is also true for all src_reg values.
>    Similarly, since insn->off >= MAX_NEGATIVE_OFF for all possible
>    negative insn->off vals, src_reg + insn->off > 0 and there can be no
>    wrapping under.
> 
> So proving (TASK_SIZE_MAX + PAGE_SIZE) + MAX_NEGATIVE_OFF > 0 implies
> *src_reg + insn->off > 0 for any src_reg that's passed check 1 and any
> negative insn->off. Luckily the former inequality does not need to be
> checked at runtime, and in fact could be a static_assert if
> TASK_SIZE_MAX wasn't determined by a function when CONFIG_X86_5LEVEL
> kconfig is used.
> 
> Regardless, we can just check (TASK_SIZE_MAX + PAGE_SIZE) +
> MAX_NEGATIVE_OFF > 0 once per do_jit call instead of emitting a runtime
> check. Given that insn->off is a s16 and is unlikely to grow larger,
> this check should always succeed on any x86 processor made in the 21st
> century. If it doesn't, fail all do_jit calls and complain loudly with
> the assumption that the BPF subsystem is misconfigured or has a bug.
> 
> A few instructions are saved for negative insn->offs as a result. Using
> the struct example_node / off = -16 example from before, code looks
> like:

This is quite complex to review. I couldn't convince myself
that dropping the 2nd check is safe, but I don't have an argument to
prove that it's not safe.
Let's get to these details when there is a need to support negative off.

> 
> BEFORE CHANGE
>   72:   movabs $0x800000000010,%r11
>   7c:   cmp    %r11,%rdi
>   7f:   jb     0x000000000000008d         (check 1 on 7c and here)
>   81:   mov    %rdi,%r11
>   84:   add    $0xfffffffffffffff0,%r11   (check 2, will set carry for almost any r11, so bug for
>   8b:   jae    0x0000000000000091          negative insn->off)
>   8d:   xor    %edi,%edi                  (as a result long key = n->key; will be 0'd out here)
>   8f:   jmp    0x0000000000000095
>   91:   mov    -0x10(%rdi),%rdi
>   95:
> 
> AFTER CHANGE:
>   5a:   movabs $0x800000000010,%r11
>   64:   cmp    %r11,%rdi
>   67:   jae    0x000000000000006d     (check 1 on 64 and here, but now JNC instead of JC)
>   69:   xor    %edi,%edi              (no check 2, 0 out if %rdi - %r11 < 0)
>   6b:   jmp    0x0000000000000071
>   6d:   mov    -0x10(%rdi),%rdi
>   71:
> 
> We could do the same for insn->off == 0, but for now keep code
> generation unchanged for previously working nonnegative insn->offs.
> 
> Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
> ---
>  arch/x86/net/bpf_jit_comp.c | 123 +++++++++++++++++++++++++++---------
>  1 file changed, 92 insertions(+), 31 deletions(-)
> 
> diff --git a/arch/x86/net/bpf_jit_comp.c b/arch/x86/net/bpf_jit_comp.c
> index 36ffe67ad6e5..843f619d0d35 100644
> --- a/arch/x86/net/bpf_jit_comp.c
> +++ b/arch/x86/net/bpf_jit_comp.c
> @@ -11,6 +11,7 @@
>  #include <linux/bpf.h>
>  #include <linux/memory.h>
>  #include <linux/sort.h>
> +#include <linux/limits.h>
>  #include <asm/extable.h>
>  #include <asm/set_memory.h>
>  #include <asm/nospec-branch.h>
> @@ -94,6 +95,7 @@ static int bpf_size_to_x86_bytes(int bpf_size)
>   */
>  #define X86_JB  0x72
>  #define X86_JAE 0x73
> +#define X86_JNC 0x73
>  #define X86_JE  0x74
>  #define X86_JNE 0x75
>  #define X86_JBE 0x76
> @@ -950,6 +952,36 @@ static void emit_shiftx(u8 **pprog, u32 dst_reg, u8 src_reg, bool is64, u8 op)
>  	*pprog = prog;
>  }
>  
> +/* Check that condition necessary for PROBE_MEM handling for insn->off < 0
> + * holds.
> + *
> + * This could be a static_assert((TASK_SIZE_MAX + PAGE_SIZE) > -S16_MIN),
> + * but TASK_SIZE_MAX can't always be evaluated at compile time, so let's not
> + * assume insn->off size either
> + */
> +static int check_probe_mem_task_size_overflow(void)
> +{
> +	struct bpf_insn insn;
> +	s64 max_negative;
> +
> +	switch (sizeof(insn.off)) {
> +	case 2:
> +		max_negative = S16_MIN;
> +		break;
> +	default:
> +		pr_err("bpf_jit_error: unexpected bpf_insn->off size\n");
> +		return -EFAULT;
> +	}
> +
> +	if (!((TASK_SIZE_MAX + PAGE_SIZE) > -max_negative)) {
> +		pr_err("bpf jit error: assumption does not hold:\n");
> +		pr_err("\t(TASK_SIZE_MAX + PAGE_SIZE) + (max negative insn->off) > 0\n");
> +		return -EFAULT;
> +	}
> +
> +	return 0;
> +}
> +
>  #define INSN_SZ_DIFF (((addrs[i] - addrs[i - 1]) - (prog - temp)))
>  
>  static int do_jit(struct bpf_prog *bpf_prog, int *addrs, u8 *image, u8 *rw_image,
> @@ -967,6 +999,10 @@ static int do_jit(struct bpf_prog *bpf_prog, int *addrs, u8 *image, u8 *rw_image
>  	u8 *prog = temp;
>  	int err;
>  
> +	err = check_probe_mem_task_size_overflow();
> +	if (err)
> +		return err;
> +
>  	detect_reg_usage(insn, insn_cnt, callee_regs_used,
>  			 &tail_call_seen);
>  
> @@ -1359,20 +1395,30 @@ st:			if (is_imm8(insn->off))
>  		case BPF_LDX | BPF_MEM | BPF_DW:
>  		case BPF_LDX | BPF_PROBE_MEM | BPF_DW:
>  			if (BPF_MODE(insn->code) == BPF_PROBE_MEM) {
> -				/* Though the verifier prevents negative insn->off in BPF_PROBE_MEM
> -				 * add abs(insn->off) to the limit to make sure that negative
> -				 * offset won't be an issue.
> -				 * insn->off is s16, so it won't affect valid pointers.
> -				 */
> -				u64 limit = TASK_SIZE_MAX + PAGE_SIZE + abs(insn->off);
> -				u8 *end_of_jmp1, *end_of_jmp2;
> -
>  				/* Conservatively check that src_reg + insn->off is a kernel address:
> -				 * 1. src_reg + insn->off >= limit
> -				 * 2. src_reg + insn->off doesn't become small positive.
> -				 * Cannot do src_reg + insn->off >= limit in one branch,
> -				 * since it needs two spare registers, but JIT has only one.
> +				 * 1. src_reg + insn->off >= TASK_SIZE_MAX + PAGE_SIZE
> +				 * 2. src_reg + insn->off doesn't overflow and become small positive
> +				 *
> +				 * For check 1, to save regs, do
> +				 * src_reg >= (TASK_SIZE_MAX + PAGE_SIZE - insn->off) call rhs
> +				 * of inequality 'limit'
> +				 *
> +				 * For check 2:
> +				 * If insn->off is positive, add src_reg + insn->off and check
> +				 * overflow directly
> +				 * If insn->off is negative, we know that
> +				 *   (TASK_SIZE_MAX + PAGE_SIZE - insn->off) > (TASK_SIZE_MAX + PAGE_SIZE)
> +				 * and from check 1 we know
> +				 *   src_reg >= (TASK_SIZE_MAX + PAGE_SIZE - insn->off)
> +				 * So if (TASK_SIZE_MAX + PAGE_SIZE) + MAX_NEGATIVE_OFF > 0 we can
> +				 * be sure that src_reg + insn->off won't overflow in either
> +				 * direction and avoid runtime check entirely.
> +				 *
> +				 * check_probe_mem_task_size_overflow confirms the above assumption
> +				 * at the beginning of this function
>  				 */
> +				u64 limit = TASK_SIZE_MAX + PAGE_SIZE - insn->off;
> +				u8 *end_of_jmp1, *end_of_jmp2;
>  
>  				/* movabsq r11, limit */
>  				EMIT2(add_1mod(0x48, AUX_REG), add_1reg(0xB8, AUX_REG));
> @@ -1381,32 +1427,47 @@ st:			if (is_imm8(insn->off))
>  				/* cmp src_reg, r11 */
>  				maybe_emit_mod(&prog, src_reg, AUX_REG, true);
>  				EMIT2(0x39, add_2reg(0xC0, src_reg, AUX_REG));
> -				/* if unsigned '<' goto end_of_jmp2 */
> -				EMIT2(X86_JB, 0);
> -				end_of_jmp1 = prog;
> -
> -				/* mov r11, src_reg */
> -				emit_mov_reg(&prog, true, AUX_REG, src_reg);
> -				/* add r11, insn->off */
> -				maybe_emit_1mod(&prog, AUX_REG, true);
> -				EMIT2_off32(0x81, add_1reg(0xC0, AUX_REG), insn->off);
> -				/* jmp if not carry to start_of_ldx
> -				 * Otherwise ERR_PTR(-EINVAL) + 128 will be the user addr
> -				 * that has to be rejected.
> -				 */
> -				EMIT2(0x73 /* JNC */, 0);
> -				end_of_jmp2 = prog;
> +				if (insn->off >= 0) {
> +					/* cmp src_reg, r11 */
> +					/* if unsigned '<' goto end_of_jmp2 */
> +					EMIT2(X86_JB, 0);
> +					end_of_jmp1 = prog;
> +
> +					/* mov r11, src_reg */
> +					emit_mov_reg(&prog, true, AUX_REG, src_reg);
> +					/* add r11, insn->off */
> +					maybe_emit_1mod(&prog, AUX_REG, true);
> +					EMIT2_off32(0x81, add_1reg(0xC0, AUX_REG), insn->off);
> +					/* jmp if not carry to start_of_ldx
> +					 * Otherwise ERR_PTR(-EINVAL) + 128 will be the user addr
> +					 * that has to be rejected.
> +					 */
> +					EMIT2(X86_JNC, 0);
> +					end_of_jmp2 = prog;
> +				} else {
> +					/* cmp src_reg, r11 */
> +					/* if unsigned '>=' goto start_of_ldx
> +					 * w/o needing to do check 2
> +					 */
> +					EMIT2(X86_JAE, 0);
> +					end_of_jmp1 = prog;
> +				}
>  
>  				/* xor dst_reg, dst_reg */
>  				emit_mov_imm32(&prog, false, dst_reg, 0);
>  				/* jmp byte_after_ldx */
>  				EMIT2(0xEB, 0);
>  
> -				/* populate jmp_offset for JB above to jump to xor dst_reg */
> -				end_of_jmp1[-1] = end_of_jmp2 - end_of_jmp1;
> -				/* populate jmp_offset for JNC above to jump to start_of_ldx */
>  				start_of_ldx = prog;
> -				end_of_jmp2[-1] = start_of_ldx - end_of_jmp2;
> +				if (insn->off >= 0) {
> +					/* populate jmp_offset for JB above to jump to xor dst_reg */
> +					end_of_jmp1[-1] = end_of_jmp2 - end_of_jmp1;
> +					/* populate jmp_offset for JNC above to jump to start_of_ldx */
> +					end_of_jmp2[-1] = start_of_ldx - end_of_jmp2;
> +				} else {
> +					/* populate jmp_offset for JAE above to jump to start_of_ldx */
> +					end_of_jmp1[-1] = start_of_ldx - end_of_jmp1;
> +				}
>  			}
>  			emit_ldx(&prog, BPF_SIZE(insn->code), dst_reg, src_reg, insn->off);
>  			if (BPF_MODE(insn->code) == BPF_PROBE_MEM) {
> -- 
> 2.30.2
> 

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH bpf-next 00/13] BPF rbtree next-gen datastructure
  2022-12-06 23:09 [PATCH bpf-next 00/13] BPF rbtree next-gen datastructure Dave Marchevsky
                   ` (12 preceding siblings ...)
  2022-12-06 23:10 ` [PATCH bpf-next 13/13] selftests/bpf: Add rbtree selftests Dave Marchevsky
@ 2022-12-07  2:50 ` patchwork-bot+netdevbpf
  2022-12-07 19:36 ` Kumar Kartikeya Dwivedi
  14 siblings, 0 replies; 51+ messages in thread
From: patchwork-bot+netdevbpf @ 2022-12-07  2:50 UTC (permalink / raw)
  To: Dave Marchevsky; +Cc: bpf, ast, daniel, andrii, kernel-team, memxor, tj

Hello:

This series was applied to bpf/bpf-next.git (master)
by Alexei Starovoitov <ast@kernel.org>:

On Tue, 6 Dec 2022 15:09:47 -0800 you wrote:
> This series adds a rbtree datastructure following the "next-gen
> datastructure" precedent set by recently-added linked-list [0]. This is
> a reimplementation of previous rbtree RFC [1] to use kfunc + kptr
> instead of adding a new map type. This series adds a smaller set of API
> functions than that RFC - just the minimum needed to support current
> cgfifo example scheduler in ongoing sched_ext effort [2], namely:
> 
> [...]

Here is the summary with links:
  - [bpf-next,01/13] bpf: Loosen alloc obj test in verifier's reg_btf_record
    https://git.kernel.org/bpf/bpf-next/c/d8939cb0a03c
  - [bpf-next,02/13] bpf: map_check_btf should fail if btf_parse_fields fails
    (no matching commit)
  - [bpf-next,03/13] bpf: Minor refactor of ref_set_release_on_unlock
    (no matching commit)
  - [bpf-next,04/13] bpf: rename list_head -> datastructure_head in field info types
    (no matching commit)
  - [bpf-next,05/13] bpf: Add basic bpf_rb_{root,node} support
    (no matching commit)
  - [bpf-next,06/13] bpf: Add bpf_rbtree_{add,remove,first} kfuncs
    (no matching commit)
  - [bpf-next,07/13] bpf: Add support for bpf_rb_root and bpf_rb_node in kfunc args
    (no matching commit)
  - [bpf-next,08/13] bpf: Add callback validation to kfunc verifier logic
    (no matching commit)
  - [bpf-next,09/13] bpf: Special verifier handling for bpf_rbtree_{remove, first}
    (no matching commit)
  - [bpf-next,10/13] bpf, x86: BPF_PROBE_MEM handling for insn->off < 0
    (no matching commit)
  - [bpf-next,11/13] bpf: Add bpf_rbtree_{add,remove,first} decls to bpf_experimental.h
    (no matching commit)
  - [bpf-next,12/13] libbpf: Make BTF mandatory if program BTF has spin_lock or alloc_obj type
    (no matching commit)
  - [bpf-next,13/13] selftests/bpf: Add rbtree selftests
    (no matching commit)

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html



^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH bpf-next 10/13] bpf, x86: BPF_PROBE_MEM handling for insn->off < 0
  2022-12-07  2:39   ` Alexei Starovoitov
@ 2022-12-07  6:46     ` Dave Marchevsky
  2022-12-07 18:06       ` Alexei Starovoitov
  0 siblings, 1 reply; 51+ messages in thread
From: Dave Marchevsky @ 2022-12-07  6:46 UTC (permalink / raw)
  To: Alexei Starovoitov, Dave Marchevsky
  Cc: bpf, Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Kernel Team, Kumar Kartikeya Dwivedi, Tejun Heo

On 12/6/22 9:39 PM, Alexei Starovoitov wrote:
> On Tue, Dec 06, 2022 at 03:09:57PM -0800, Dave Marchevsky wrote:
>> Current comment in BPF_PROBE_MEM jit code claims that verifier prevents
>> insn->off < 0, but this appears to not be true irrespective of changes
>> in this series. Regardless, changes in this series will result in an
>> example like:
>>
>>   struct example_node {
>>     long key;
>>     long val;
>>     struct bpf_rb_node node;
>>   }
>>
>>   /* In BPF prog, assume root contains example_node nodes */
>>   struct bpf_rb_node *res = bpf_rbtree_first(&root);
>>   if (!res)
>>     return 1;
>>
>>   struct example_node *n = container_of(res, struct example_node, node);
>>   long key = n->key;
>>
>> Resulting in a load with off = -16, as bpf_rbtree_first's return is
> 
> Looks like the bug in the previous patch:
> +                       } else if (meta.func_id == special_kfunc_list[KF_bpf_rbtree_remove] ||
> +                                  meta.func_id == special_kfunc_list[KF_bpf_rbtree_first]) {
> +                               struct btf_field *field = meta.arg_rbtree_root.field;
> +
> +                               mark_reg_datastructure_node(regs, BPF_REG_0,
> +                                                           &field->datastructure_head);
> 
> The R0 .off should have been:
>  regs[BPF_REG_0].off = field->rb_node.node_offset;
> 
> node, not root.
> 
> PTR_TO_BTF_ID should have been returned with approriate 'off',
> so that container_of() would it bring back to zero offset.
> 

The root's btf_field is used to hold information about the node type. Of
specific interest to us are value_btf_id and node_offset, which
mark_reg_datastructure_node uses to set REG_0's type and offset correctly.

This "use head type to keep info about node type" strategy felt strange to me
initially too: all PTR_TO_BTF_ID regs are passing around their type info, so
why not use that to lookup bpf_rb_node field info? But consider that
bpf_rbtree_first (and bpf_list_pop_{front,back}) doesn't take a node as
input arg, so there's no opportunity to get btf_field info from input
reg type. 

So we'll need to keep this info in rbtree_root's btf_field
regardless, and since any rbtree API function that operates on a node
also operates on a root and expects its node arg to match the node
type expected by the root, we might as well use root's field as the main
lookup for this info and not even have &field->rb_node for now.
All __process_kf_arg_ptr_to_datastructure_node calls (added earlier
in the series) use the &meta->arg_{list_head,rbtree_root}.field for same
reason.

So it's setting the reg offset correctly.

> All PTR_TO_BTF_ID need to have positive offset.
> I'm not sure btf_struct_walk() and other PTR_TO_BTF_ID accessors
> can deal with negative offsets.
> There could be all kinds of things to fix.

I think you may be conflating reg offset and insn offset here. None of the
changes in this series result in a PTR_TO_BTF_ID reg w/ negative offset
being returned. But LLVM may generate load insns with a negative offset,
and since we're passing around pointers to bpf_rb_node that may come
after useful data fields in a type, this will happen more often.

Consider this small example from selftests in this series:

struct node_data {
  long key;
  long data;
  struct bpf_rb_node node;
};

static bool less(struct bpf_rb_node *a, const struct bpf_rb_node *b)
{
        struct node_data *node_a;
        struct node_data *node_b;

        node_a = container_of(a, struct node_data, node);
        node_b = container_of(b, struct node_data, node);

        return node_a->key < node_b->key;
}

llvm-objdump shows this bpf bytecode for 'less':

0000000000000000 <less>:
;       return node_a->key < node_b->key;
       0:       79 22 f0 ff 00 00 00 00 r2 = *(u64 *)(r2 - 0x10)
       1:       79 11 f0 ff 00 00 00 00 r1 = *(u64 *)(r1 - 0x10)
       2:       b4 00 00 00 01 00 00 00 w0 = 0x1
;       return node_a->key < node_b->key;
       3:       cd 21 01 00 00 00 00 00 if r1 s< r2 goto +0x1 <LBB2_2>
       4:       b4 00 00 00 00 00 00 00 w0 = 0x0

0000000000000028 <LBB2_2>:
;       return node_a->key < node_b->key;
       5:       95 00 00 00 00 00 00 00 exit

Insns 0 and 1 are loading node_b->key and node_a->key, respectively, using
negative insn->off. Verifier's view or R1 and R2 before insn 0 is
untrusted_ptr_node_data(off=16). If there were some intermediate insns
storing result of container_of() before dereferencing:

  r3 = (r2 - 0x10)
  r2 = *(u64 *)(r3)

Verifier would see R3 as untrusted_ptr_node_data(off=0), and load for
r2 would have insn->off = 0. But LLVM decides to just do a load-with-offset
using original arg ptrs to less() instead of storing container_of() ptr
adjustments.

Since the container_of usage and code pattern in the above example's less()
isn't particularly specific to this series, I think there are other scenarios
where such code would be generated, so I considered this a general bugfix in
the cover letter.

[ below paragraph was moved here, it originally preceded "All PTR_TO_BTF_ID"
  paragraph ]

> The approach of returning untrusted from bpf_rbtree_first is questionable.
> Without doing that this issue would not have surfaced.
> 

I agree re: PTR_UNTRUSTED, but note that my earlier example doesn't involve
bpf_rbtree_first. Regardless, I think the issue is that PTR_UNTRUSTED is
used to denote a few separate traits of a PTR_TO_BTF_ID reg:

  * "I have no ownership over the thing I'm pointing to"
  * "My backing memory may go away at any time"
  * "Access to my fields might result in page fault"
  * "Kfuncs shouldn't accept me as an arg"

Seems like original PTR_UNTRUSTED usage really wanted to denote the first
point and the others were just naturally implied from the first. But
as you've noted there are some things using PTR_UNTRUSTED that really
want to make more granular statements:

ref_set_release_on_unlock logic sets release_on_unlock = true and adds
PTR_UNTRUSTED to the reg type. In this case PTR_UNTRUSTED is trying to say:

  * "I have no ownership over the thing I'm pointing to"
  * "My backing memory may go away at any time _after_ bpf_spin_unlock"
    * Before spin_unlock it's guaranteed to be valid
  * "Kfuncs shouldn't accept me as an arg"
    * We don't want arbitrary kfunc saving and accessing release_on_unlock
      reg after bpf_spin_unlock, as its backing memory can go away any time
      after spin_unlock.

The "backing memory" statement PTR_UNTRUSTED is making is a blunt superset
of what release_on_unlock really needs.

For less() callback we just want

  * "I have no ownership over the thing I'm pointing to"
  * "Kfuncs shouldn't accept me as an arg"

There is probably a way to decompose PTR_UNTRUSTED into a few flags such that
it's possible to denote these things separately and avoid unwanted additional
behavior. But after talking to David Vernet about current complexity of
PTR_TRUSTED and PTR_UNTRUSTED logic and his desire to refactor, it seemed
better to continue with PTR_UNTRUSTED blunt instrument with a bit of
special casing for now, instead of piling on more flags.

> 
>> modified by verifier to be PTR_TO_BTF_ID of example_node w/ offset =
>> offsetof(struct example_node, node), instead of PTR_TO_BTF_ID of
>> bpf_rb_node. So it's necessary to support negative insn->off when
>> jitting BPF_PROBE_MEM.
> 
> I'm not convinced it's necessary.
> container_of() seems to be the only case where bpf prog can convert
> PTR_TO_BTF_ID with off >= 0 to negative off.
> Normal pointer walking will not make it negative.
> 

I see what you mean - if some non-container_of case resulted in load generation
with negative insn->off, this probably would've been noticed already. But
hopefully my replies above explain why it should be addressed now.

>> In order to ensure that page fault for a BPF_PROBE_MEM load of *src_reg +
>> insn->off is safely handled, we must confirm that *src_reg + insn->off is
>> in kernel's memory. Two runtime checks are emitted to confirm that:
>>
>>   1) (*src_reg + insn->off) > boundary between user and kernel address
>>   spaces
>>   2) (*src_reg + insn->off) does not overflow to a small positive
>>   number. This might happen if some function meant to set src_reg
>>   returns ERR_PTR(-EINVAL) or similar.
>>
>> Check 1 currently is slightly off - it compares a
>>
>>   u64 limit = TASK_SIZE_MAX + PAGE_SIZE + abs(insn->off);
>>
>> to *src_reg, aborting the load if limit is larger. Rewriting this as an
>> inequality:
>>
>>   *src_reg > TASK_SIZE_MAX + PAGE_SIZE + abs(insn->off)
>>   *src_reg - abs(insn->off) > TASK_SIZE_MAX + PAGE_SIZE
>>
>> shows that this isn't quite right even if insn->off is positive, as we
>> really want:
>>
>>   *src_reg + insn->off > TASK_SIZE_MAX + PAGE_SIZE
>>   *src_reg > TASK_SIZE_MAX + PAGE_SIZE - insn->off
>>
>> Since *src_reg + insn->off is the address we'll be loading from, not
>> *src_reg - insn->off or *src_reg - abs(insn->off). So change the
>> subtraction to an addition and remove the abs(), as comment indicates
>> that it was only added to ignore negative insn->off.
>>
>> For Check 2, currently "does not overflow to a small positive number" is
>> confirmed by emitting an 'add insn->off, src_reg' instruction and
>> checking for carry flag. While this works fine for a positive insn->off,
>> a small negative insn->off like -16 is almost guaranteed to wrap over to
>> a small positive number when added to any kernel address.
>>
>> This patch addresses this by not doing Check 2 at BPF prog runtime when
>> insn->off is negative, rather doing a stronger check at JIT-time. The
>> logic supporting this is as follows:
>>
>> 1) Assume insn->off is negative, call the largest such negative offset
>>    MAX_NEGATIVE_OFF. So insn->off >= MAX_NEGATIVE_OFF for all possible
>>    insn->off.
>>
>> 2) *src_reg + insn->off will not wrap over to an unexpected address by
>>    virtue of negative insn->off, but it might wrap under if
>>    -insn->off > *src_reg, as that implies *src_reg + insn->off < 0
>>
>> 3) Inequality (TASK_SIZE_MAX + PAGE_SIZE - insn->off) > (TASK_SIZE_MAX + PAGE_SIZE)
>>    must be true since insn->off is negative.
>>
>> 4) If we've completed check 1, we know that
>>    src_reg >= (TASK_SIZE_MAX + PAGE_SIZE - insn->off)
>>
>> 5) Combining statements 3 and 4, we know src_reg > (TASK_SIZE_MAX + PAGE_SIZE)
>>
>> 6) By statements 1, 4, and 5, if we can prove
>>    (TASK_SIZE_MAX + PAGE_SIZE) > -MAX_NEGATIVE_OFF, we'll know that
>>    (TASK_SIZE_MAX + PAGE_SIZE) > -insn->off for all possible insn->off
>>    values. We can rewrite this as (TASK_SIZE_MAX + PAGE_SIZE) +
>>    MAX_NEGATIVE_OFF > 0.
>>
>>    Since src_reg > TASK_SIZE_MAX + PAGE_SIZE and MAX_NEGATIVE_OFF is
>>    negative, if the previous inequality is true,
>>    src_reg + MAX_NEGATIVE_OFF > 0 is also true for all src_reg values.
>>    Similarly, since insn->off >= MAX_NEGATIVE_OFF for all possible
>>    negative insn->off vals, src_reg + insn->off > 0 and there can be no
>>    wrapping under.
>>
>> So proving (TASK_SIZE_MAX + PAGE_SIZE) + MAX_NEGATIVE_OFF > 0 implies
>> *src_reg + insn->off > 0 for any src_reg that's passed check 1 and any
>> negative insn->off. Luckily the former inequality does not need to be
>> checked at runtime, and in fact could be a static_assert if
>> TASK_SIZE_MAX wasn't determined by a function when CONFIG_X86_5LEVEL
>> kconfig is used.
>>
>> Regardless, we can just check (TASK_SIZE_MAX + PAGE_SIZE) +
>> MAX_NEGATIVE_OFF > 0 once per do_jit call instead of emitting a runtime
>> check. Given that insn->off is a s16 and is unlikely to grow larger,
>> this check should always succeed on any x86 processor made in the 21st
>> century. If it doesn't, fail all do_jit calls and complain loudly with
>> the assumption that the BPF subsystem is misconfigured or has a bug.
>>
>> A few instructions are saved for negative insn->offs as a result. Using
>> the struct example_node / off = -16 example from before, code looks
>> like:
> 
> This is quite complex to review. I couldn't convince myself
> that droping 2nd check is safe, but don't have an argument to
> prove that it's not safe.
> Let's get to these details when there is need to support negative off.
> 

Hopefully above explanation shows that there's need to support it now.
I will try to simplify and rephrase the summary to make it easier to follow,
but will prioritize addressing feedback in less complex patches, so this
patch may not change for a few respins.

>>
>> BEFORE CHANGE
>>   72:   movabs $0x800000000010,%r11
>>   7c:   cmp    %r11,%rdi
>>   7f:   jb     0x000000000000008d         (check 1 on 7c and here)
>>   81:   mov    %rdi,%r11
>>   84:   add    $0xfffffffffffffff0,%r11   (check 2, will set carry for almost any r11, so bug for
>>   8b:   jae    0x0000000000000091          negative insn->off)
>>   8d:   xor    %edi,%edi                  (as a result long key = n->key; will be 0'd out here)
>>   8f:   jmp    0x0000000000000095
>>   91:   mov    -0x10(%rdi),%rdi
>>   95:
>>
>> AFTER CHANGE:
>>   5a:   movabs $0x800000000010,%r11
>>   64:   cmp    %r11,%rdi
>>   67:   jae    0x000000000000006d     (check 1 on 64 and here, but now JNC instead of JC)
>>   69:   xor    %edi,%edi              (no check 2, 0 out if %rdi - %r11 < 0)
>>   6b:   jmp    0x0000000000000071
>>   6d:   mov    -0x10(%rdi),%rdi
>>   71:
>>
>> We could do the same for insn->off == 0, but for now keep code
>> generation unchanged for previously working nonnegative insn->offs.
>>
>> Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
>> ---
>>  arch/x86/net/bpf_jit_comp.c | 123 +++++++++++++++++++++++++++---------
>>  1 file changed, 92 insertions(+), 31 deletions(-)
>>
>> diff --git a/arch/x86/net/bpf_jit_comp.c b/arch/x86/net/bpf_jit_comp.c
>> index 36ffe67ad6e5..843f619d0d35 100644
>> --- a/arch/x86/net/bpf_jit_comp.c
>> +++ b/arch/x86/net/bpf_jit_comp.c
>> @@ -11,6 +11,7 @@
>>  #include <linux/bpf.h>
>>  #include <linux/memory.h>
>>  #include <linux/sort.h>
>> +#include <linux/limits.h>
>>  #include <asm/extable.h>
>>  #include <asm/set_memory.h>
>>  #include <asm/nospec-branch.h>
>> @@ -94,6 +95,7 @@ static int bpf_size_to_x86_bytes(int bpf_size)
>>   */
>>  #define X86_JB  0x72
>>  #define X86_JAE 0x73
>> +#define X86_JNC 0x73
>>  #define X86_JE  0x74
>>  #define X86_JNE 0x75
>>  #define X86_JBE 0x76
>> @@ -950,6 +952,36 @@ static void emit_shiftx(u8 **pprog, u32 dst_reg, u8 src_reg, bool is64, u8 op)
>>  	*pprog = prog;
>>  }
>>  
>> +/* Check that condition necessary for PROBE_MEM handling for insn->off < 0
>> + * holds.
>> + *
>> + * This could be a static_assert((TASK_SIZE_MAX + PAGE_SIZE) > -S16_MIN),
>> + * but TASK_SIZE_MAX can't always be evaluated at compile time, so let's not
>> + * assume insn->off size either
>> + */
>> +static int check_probe_mem_task_size_overflow(void)
>> +{
>> +	struct bpf_insn insn;
>> +	s64 max_negative;
>> +
>> +	switch (sizeof(insn.off)) {
>> +	case 2:
>> +		max_negative = S16_MIN;
>> +		break;
>> +	default:
>> +		pr_err("bpf_jit_error: unexpected bpf_insn->off size\n");
>> +		return -EFAULT;
>> +	}
>> +
>> +	if (!((TASK_SIZE_MAX + PAGE_SIZE) > -max_negative)) {
>> +		pr_err("bpf jit error: assumption does not hold:\n");
>> +		pr_err("\t(TASK_SIZE_MAX + PAGE_SIZE) + (max negative insn->off) > 0\n");
>> +		return -EFAULT;
>> +	}
>> +
>> +	return 0;
>> +}
>> +
>>  #define INSN_SZ_DIFF (((addrs[i] - addrs[i - 1]) - (prog - temp)))
>>  
>>  static int do_jit(struct bpf_prog *bpf_prog, int *addrs, u8 *image, u8 *rw_image,
>> @@ -967,6 +999,10 @@ static int do_jit(struct bpf_prog *bpf_prog, int *addrs, u8 *image, u8 *rw_image
>>  	u8 *prog = temp;
>>  	int err;
>>  
>> +	err = check_probe_mem_task_size_overflow();
>> +	if (err)
>> +		return err;
>> +
>>  	detect_reg_usage(insn, insn_cnt, callee_regs_used,
>>  			 &tail_call_seen);
>>  
>> @@ -1359,20 +1395,30 @@ st:			if (is_imm8(insn->off))
>>  		case BPF_LDX | BPF_MEM | BPF_DW:
>>  		case BPF_LDX | BPF_PROBE_MEM | BPF_DW:
>>  			if (BPF_MODE(insn->code) == BPF_PROBE_MEM) {
>> -				/* Though the verifier prevents negative insn->off in BPF_PROBE_MEM
>> -				 * add abs(insn->off) to the limit to make sure that negative
>> -				 * offset won't be an issue.
>> -				 * insn->off is s16, so it won't affect valid pointers.
>> -				 */
>> -				u64 limit = TASK_SIZE_MAX + PAGE_SIZE + abs(insn->off);
>> -				u8 *end_of_jmp1, *end_of_jmp2;
>> -
>>  				/* Conservatively check that src_reg + insn->off is a kernel address:
>> -				 * 1. src_reg + insn->off >= limit
>> -				 * 2. src_reg + insn->off doesn't become small positive.
>> -				 * Cannot do src_reg + insn->off >= limit in one branch,
>> -				 * since it needs two spare registers, but JIT has only one.
>> +				 * 1. src_reg + insn->off >= TASK_SIZE_MAX + PAGE_SIZE
>> +				 * 2. src_reg + insn->off doesn't overflow and become small positive
>> +				 *
>> +				 * For check 1, to save regs, check
>> +				 * src_reg >= (TASK_SIZE_MAX + PAGE_SIZE - insn->off);
>> +				 * call the rhs of the inequality 'limit'
>> +				 *
>> +				 * For check 2:
>> +				 * If insn->off is positive, add src_reg + insn->off and check
>> +				 * overflow directly
>> +				 * If insn->off is negative, we know that
>> +				 *   (TASK_SIZE_MAX + PAGE_SIZE - insn->off) > (TASK_SIZE_MAX + PAGE_SIZE)
>> +				 * and from check 1 we know
>> +				 *   src_reg >= (TASK_SIZE_MAX + PAGE_SIZE - insn->off)
>> +				 * So if (TASK_SIZE_MAX + PAGE_SIZE) + MAX_NEGATIVE_OFF > 0 we can
>> +				 * be sure that src_reg + insn->off won't overflow in either
>> +				 * direction and avoid runtime check entirely.
>> +				 *
>> +				 * check_probe_mem_task_size_overflow confirms the above assumption
>> +				 * at the beginning of this function
>>  				 */
>> +				u64 limit = TASK_SIZE_MAX + PAGE_SIZE - insn->off;
>> +				u8 *end_of_jmp1, *end_of_jmp2;
>>  
>>  				/* movabsq r11, limit */
>>  				EMIT2(add_1mod(0x48, AUX_REG), add_1reg(0xB8, AUX_REG));
>> @@ -1381,32 +1427,47 @@ st:			if (is_imm8(insn->off))
>>  				/* cmp src_reg, r11 */
>>  				maybe_emit_mod(&prog, src_reg, AUX_REG, true);
>>  				EMIT2(0x39, add_2reg(0xC0, src_reg, AUX_REG));
>> -				/* if unsigned '<' goto end_of_jmp2 */
>> -				EMIT2(X86_JB, 0);
>> -				end_of_jmp1 = prog;
>> -
>> -				/* mov r11, src_reg */
>> -				emit_mov_reg(&prog, true, AUX_REG, src_reg);
>> -				/* add r11, insn->off */
>> -				maybe_emit_1mod(&prog, AUX_REG, true);
>> -				EMIT2_off32(0x81, add_1reg(0xC0, AUX_REG), insn->off);
>> -				/* jmp if not carry to start_of_ldx
>> -				 * Otherwise ERR_PTR(-EINVAL) + 128 will be the user addr
>> -				 * that has to be rejected.
>> -				 */
>> -				EMIT2(0x73 /* JNC */, 0);
>> -				end_of_jmp2 = prog;
>> +				if (insn->off >= 0) {
>> +					/* cmp src_reg, r11 */
>> +					/* if unsigned '<' goto end_of_jmp2 */
>> +					EMIT2(X86_JB, 0);
>> +					end_of_jmp1 = prog;
>> +
>> +					/* mov r11, src_reg */
>> +					emit_mov_reg(&prog, true, AUX_REG, src_reg);
>> +					/* add r11, insn->off */
>> +					maybe_emit_1mod(&prog, AUX_REG, true);
>> +					EMIT2_off32(0x81, add_1reg(0xC0, AUX_REG), insn->off);
>> +					/* jmp if not carry to start_of_ldx
>> +					 * Otherwise ERR_PTR(-EINVAL) + 128 will be the user addr
>> +					 * that has to be rejected.
>> +					 */
>> +					EMIT2(X86_JNC, 0);
>> +					end_of_jmp2 = prog;
>> +				} else {
>> +					/* cmp src_reg, r11 */
>> +					/* if unsigned '>=' goto start_of_ldx
>> +					 * w/o needing to do check 2
>> +					 */
>> +					EMIT2(X86_JAE, 0);
>> +					end_of_jmp1 = prog;
>> +				}
>>  
>>  				/* xor dst_reg, dst_reg */
>>  				emit_mov_imm32(&prog, false, dst_reg, 0);
>>  				/* jmp byte_after_ldx */
>>  				EMIT2(0xEB, 0);
>>  
>> -				/* populate jmp_offset for JB above to jump to xor dst_reg */
>> -				end_of_jmp1[-1] = end_of_jmp2 - end_of_jmp1;
>> -				/* populate jmp_offset for JNC above to jump to start_of_ldx */
>>  				start_of_ldx = prog;
>> -				end_of_jmp2[-1] = start_of_ldx - end_of_jmp2;
>> +				if (insn->off >= 0) {
>> +					/* populate jmp_offset for JB above to jump to xor dst_reg */
>> +					end_of_jmp1[-1] = end_of_jmp2 - end_of_jmp1;
>> +					/* populate jmp_offset for JNC above to jump to start_of_ldx */
>> +					end_of_jmp2[-1] = start_of_ldx - end_of_jmp2;
>> +				} else {
>> +					/* populate jmp_offset for JAE above to jump to start_of_ldx */
>> +					end_of_jmp1[-1] = start_of_ldx - end_of_jmp1;
>> +				}
>>  			}
>>  			emit_ldx(&prog, BPF_SIZE(insn->code), dst_reg, src_reg, insn->off);
>>  			if (BPF_MODE(insn->code) == BPF_PROBE_MEM) {
>> -- 
>> 2.30.2
>>

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH bpf-next 06/13] bpf: Add bpf_rbtree_{add,remove,first} kfuncs
  2022-12-06 23:09 ` [PATCH bpf-next 06/13] bpf: Add bpf_rbtree_{add,remove,first} kfuncs Dave Marchevsky
@ 2022-12-07 14:20   ` kernel test robot
  0 siblings, 0 replies; 51+ messages in thread
From: kernel test robot @ 2022-12-07 14:20 UTC (permalink / raw)
  To: Dave Marchevsky, bpf
  Cc: llvm, oe-kbuild-all, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, Kernel Team, Kumar Kartikeya Dwivedi, Tejun Heo,
	Dave Marchevsky

[-- Attachment #1: Type: text/plain, Size: 6083 bytes --]

Hi Dave,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on bpf-next/master]

url:    https://github.com/intel-lab-lkp/linux/commits/Dave-Marchevsky/BPF-rbtree-next-gen-datastructure/20221207-071214
base:   https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git master
patch link:    https://lore.kernel.org/r/20221206231000.3180914-7-davemarchevsky%40fb.com
patch subject: [PATCH bpf-next 06/13] bpf: Add bpf_rbtree_{add,remove,first} kfuncs
config: hexagon-randconfig-r036-20221207
compiler: clang version 16.0.0 (https://github.com/llvm/llvm-project 6e4cea55f0d1104408b26ac574566a0e4de48036)
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # https://github.com/intel-lab-lkp/linux/commit/829813f5a42feb5637b0e6acf6a840b65efb1c49
        git remote add linux-review https://github.com/intel-lab-lkp/linux
        git fetch --no-tags linux-review Dave-Marchevsky/BPF-rbtree-next-gen-datastructure/20221207-071214
        git checkout 829813f5a42feb5637b0e6acf6a840b65efb1c49
        # save the config file
        mkdir build_dir && cp config build_dir/.config
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross W=1 O=build_dir ARCH=hexagon SHELL=/bin/bash drivers/thermal/qcom/ kernel/bpf/

If you fix the issue, kindly add following tag where applicable
| Reported-by: kernel test robot <lkp@intel.com>

All warnings (new ones prefixed by >>):

   In file included from kernel/bpf/helpers.c:6:
   In file included from include/linux/bpf-cgroup.h:11:
   In file included from include/net/sock.h:38:
   In file included from include/linux/hardirq.h:11:
   In file included from ./arch/hexagon/include/generated/asm/hardirq.h:1:
   In file included from include/asm-generic/hardirq.h:17:
   In file included from include/linux/irq.h:20:
   In file included from include/linux/io.h:13:
   In file included from arch/hexagon/include/asm/io.h:334:
   include/asm-generic/io.h:547:31: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
           val = __raw_readb(PCI_IOBASE + addr);
                             ~~~~~~~~~~ ^
   include/asm-generic/io.h:560:61: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
           val = __le16_to_cpu((__le16 __force)__raw_readw(PCI_IOBASE + addr));
                                                           ~~~~~~~~~~ ^
   include/uapi/linux/byteorder/little_endian.h:37:51: note: expanded from macro '__le16_to_cpu'
   #define __le16_to_cpu(x) ((__force __u16)(__le16)(x))
                                                     ^
   In file included from kernel/bpf/helpers.c:6:
   In file included from include/linux/bpf-cgroup.h:11:
   In file included from include/net/sock.h:38:
   In file included from include/linux/hardirq.h:11:
   In file included from ./arch/hexagon/include/generated/asm/hardirq.h:1:
   In file included from include/asm-generic/hardirq.h:17:
   In file included from include/linux/irq.h:20:
   In file included from include/linux/io.h:13:
   In file included from arch/hexagon/include/asm/io.h:334:
   include/asm-generic/io.h:573:61: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
           val = __le32_to_cpu((__le32 __force)__raw_readl(PCI_IOBASE + addr));
                                                           ~~~~~~~~~~ ^
   include/uapi/linux/byteorder/little_endian.h:35:51: note: expanded from macro '__le32_to_cpu'
   #define __le32_to_cpu(x) ((__force __u32)(__le32)(x))
                                                     ^
   In file included from kernel/bpf/helpers.c:6:
   In file included from include/linux/bpf-cgroup.h:11:
   In file included from include/net/sock.h:38:
   In file included from include/linux/hardirq.h:11:
   In file included from ./arch/hexagon/include/generated/asm/hardirq.h:1:
   In file included from include/asm-generic/hardirq.h:17:
   In file included from include/linux/irq.h:20:
   In file included from include/linux/io.h:13:
   In file included from arch/hexagon/include/asm/io.h:334:
   include/asm-generic/io.h:584:33: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
           __raw_writeb(value, PCI_IOBASE + addr);
                               ~~~~~~~~~~ ^
   include/asm-generic/io.h:594:59: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
           __raw_writew((u16 __force)cpu_to_le16(value), PCI_IOBASE + addr);
                                                         ~~~~~~~~~~ ^
   include/asm-generic/io.h:604:59: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
           __raw_writel((u32 __force)cpu_to_le32(value), PCI_IOBASE + addr);
                                                         ~~~~~~~~~~ ^
>> kernel/bpf/helpers.c:1885:9: warning: cast from 'bool (*)(struct bpf_rb_node *, const struct bpf_rb_node *)' (aka '_Bool (*)(struct bpf_rb_node *, const struct bpf_rb_node *)') to 'bool (*)(struct rb_node *, const struct rb_node *)' (aka '_Bool (*)(struct rb_node *, const struct rb_node *)') converts to incompatible function type [-Wcast-function-type-strict]
                         (bool (*)(struct rb_node *, const struct rb_node *))less);
                         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   7 warnings generated.


vim +1885 kernel/bpf/helpers.c

  1880	
  1881	void bpf_rbtree_add(struct bpf_rb_root *root, struct bpf_rb_node *node,
  1882			    bool (less)(struct bpf_rb_node *a, const struct bpf_rb_node *b))
  1883	{
  1884		rb_add_cached((struct rb_node *)node, (struct rb_root_cached *)root,
> 1885			      (bool (*)(struct rb_node *, const struct rb_node *))less);
  1886	}
  1887	

-- 
0-DAY CI Kernel Test Service
https://01.org/lkp

[-- Attachment #2: config --]
[-- Type: text/plain, Size: 134770 bytes --]

#
# Automatically generated file; DO NOT EDIT.
# Linux/hexagon 6.1.0-rc7 Kernel Configuration
#
CONFIG_CC_VERSION_TEXT="clang version 16.0.0 (git://gitmirror/llvm_project 6e4cea55f0d1104408b26ac574566a0e4de48036)"
CONFIG_GCC_VERSION=0
CONFIG_CC_IS_CLANG=y
CONFIG_CLANG_VERSION=160000
CONFIG_AS_IS_LLVM=y
CONFIG_AS_VERSION=160000
CONFIG_LD_VERSION=0
CONFIG_LD_IS_LLD=y
CONFIG_LLD_VERSION=160000
CONFIG_RUST_IS_AVAILABLE=y
CONFIG_CC_HAS_ASM_GOTO_OUTPUT=y
CONFIG_CC_HAS_ASM_GOTO_TIED_OUTPUT=y
CONFIG_TOOLS_SUPPORT_RELR=y
CONFIG_CC_HAS_ASM_INLINE=y
CONFIG_CC_HAS_NO_PROFILE_FN_ATTR=y
CONFIG_PAHOLE_VERSION=123
CONFIG_CONSTRUCTORS=y
CONFIG_IRQ_WORK=y

#
# General setup
#
CONFIG_INIT_ENV_ARG_LIMIT=32
CONFIG_COMPILE_TEST=y
# CONFIG_WERROR is not set
CONFIG_LOCALVERSION=""
CONFIG_BUILD_SALT=""
CONFIG_DEFAULT_INIT=""
CONFIG_DEFAULT_HOSTNAME="(none)"
CONFIG_SYSVIPC=y
CONFIG_POSIX_MQUEUE=y
# CONFIG_WATCH_QUEUE is not set
CONFIG_CROSS_MEMORY_ATTACH=y
CONFIG_USELIB=y
# CONFIG_AUDIT is not set

#
# IRQ subsystem
#
CONFIG_GENERIC_IRQ_PROBE=y
CONFIG_GENERIC_IRQ_SHOW=y
CONFIG_GENERIC_IRQ_INJECTION=y
CONFIG_GENERIC_IRQ_CHIP=y
CONFIG_IRQ_DOMAIN=y
CONFIG_IRQ_SIM=y
CONFIG_IRQ_DOMAIN_HIERARCHY=y
CONFIG_GENERIC_MSI_IRQ=y
CONFIG_GENERIC_MSI_IRQ_DOMAIN=y
CONFIG_GENERIC_IRQ_DEBUGFS=y
# end of IRQ subsystem

CONFIG_GENERIC_CLOCKEVENTS=y
CONFIG_GENERIC_CLOCKEVENTS_BROADCAST=y
CONFIG_CONTEXT_TRACKING=y
CONFIG_CONTEXT_TRACKING_IDLE=y

#
# Timers subsystem
#
CONFIG_TICK_ONESHOT=y
CONFIG_HZ_PERIODIC=y
# CONFIG_NO_HZ_IDLE is not set
CONFIG_NO_HZ=y
CONFIG_HIGH_RES_TIMERS=y
# end of Timers subsystem

CONFIG_BPF=y

#
# BPF subsystem
#
CONFIG_BPF_SYSCALL=y
# CONFIG_BPF_UNPRIV_DEFAULT_OFF is not set
# end of BPF subsystem

CONFIG_PREEMPT_NONE_BUILD=y
CONFIG_PREEMPT_NONE=y

#
# CPU/Task time and stats accounting
#
CONFIG_TICK_CPU_ACCOUNTING=y
# CONFIG_PSI is not set
# end of CPU/Task time and stats accounting

# CONFIG_CPU_ISOLATION is not set

#
# RCU Subsystem
#
CONFIG_TREE_RCU=y
# CONFIG_RCU_EXPERT is not set
CONFIG_SRCU=y
CONFIG_TREE_SRCU=y
CONFIG_TASKS_RCU_GENERIC=y
CONFIG_TASKS_TRACE_RCU=y
CONFIG_RCU_STALL_COMMON=y
CONFIG_RCU_NEED_SEGCBLIST=y
# end of RCU Subsystem

# CONFIG_IKCONFIG is not set
CONFIG_IKHEADERS=y

#
# Scheduler features
#
# end of Scheduler features

CONFIG_CC_IMPLICIT_FALLTHROUGH="-Wimplicit-fallthrough"
CONFIG_GCC12_NO_ARRAY_BOUNDS=y
CONFIG_CGROUPS=y
# CONFIG_CGROUP_FAVOR_DYNMODS is not set
# CONFIG_MEMCG is not set
CONFIG_CGROUP_SCHED=y
CONFIG_FAIR_GROUP_SCHED=y
# CONFIG_CFS_BANDWIDTH is not set
CONFIG_RT_GROUP_SCHED=y
CONFIG_CGROUP_PIDS=y
CONFIG_CGROUP_RDMA=y
# CONFIG_CGROUP_FREEZER is not set
# CONFIG_CPUSETS is not set
CONFIG_CGROUP_DEVICE=y
CONFIG_CGROUP_CPUACCT=y
# CONFIG_CGROUP_BPF is not set
# CONFIG_CGROUP_MISC is not set
CONFIG_CGROUP_DEBUG=y
CONFIG_SOCK_CGROUP_DATA=y
CONFIG_SCHED_AUTOGROUP=y
# CONFIG_SYSFS_DEPRECATED is not set
# CONFIG_RELAY is not set
CONFIG_BLK_DEV_INITRD=y
CONFIG_INITRAMFS_SOURCE=""
# CONFIG_RD_GZIP is not set
# CONFIG_RD_BZIP2 is not set
# CONFIG_RD_LZMA is not set
# CONFIG_RD_XZ is not set
CONFIG_RD_LZO=y
# CONFIG_RD_LZ4 is not set
CONFIG_RD_ZSTD=y
CONFIG_BOOT_CONFIG=y
# CONFIG_BOOT_CONFIG_EMBED is not set
# CONFIG_INITRAMFS_PRESERVE_MTIME is not set
CONFIG_CC_OPTIMIZE_FOR_PERFORMANCE=y
# CONFIG_CC_OPTIMIZE_FOR_SIZE is not set
CONFIG_LD_ORPHAN_WARN=y
CONFIG_EXPERT=y
# CONFIG_MULTIUSER is not set
# CONFIG_SGETMASK_SYSCALL is not set
CONFIG_SYSFS_SYSCALL=y
# CONFIG_FHANDLE is not set
CONFIG_POSIX_TIMERS=y
# CONFIG_PRINTK is not set
CONFIG_BUG=y
CONFIG_ELF_CORE=y
CONFIG_BASE_FULL=y
CONFIG_FUTEX=y
CONFIG_FUTEX_PI=y
CONFIG_EPOLL=y
CONFIG_SIGNALFD=y
# CONFIG_TIMERFD is not set
# CONFIG_EVENTFD is not set
CONFIG_SHMEM=y
# CONFIG_AIO is not set
# CONFIG_IO_URING is not set
CONFIG_ADVISE_SYSCALLS=y
CONFIG_MEMBARRIER=y
CONFIG_KALLSYMS=y
CONFIG_KALLSYMS_ALL=y
CONFIG_KALLSYMS_BASE_RELATIVE=y
CONFIG_KCMP=y
CONFIG_EMBEDDED=y
CONFIG_HAVE_PERF_EVENTS=y
CONFIG_PC104=y

#
# Kernel Performance Events And Counters
#
# CONFIG_PERF_EVENTS is not set
# end of Kernel Performance Events And Counters

CONFIG_SYSTEM_DATA_VERIFICATION=y
# CONFIG_PROFILING is not set
CONFIG_TRACEPOINTS=y
# end of General setup

#
# Linux Kernel Configuration for Hexagon
#
CONFIG_HEXAGON=y
CONFIG_HEXAGON_PHYS_OFFSET=y
CONFIG_FRAME_POINTER=y
CONFIG_LOCKDEP_SUPPORT=y
CONFIG_EARLY_PRINTK=y
CONFIG_MMU=y
CONFIG_GENERIC_CSUM=y
CONFIG_GENERIC_HWEIGHT=y
CONFIG_STACKTRACE_SUPPORT=y
CONFIG_GENERIC_BUG=y

#
# Machine selection
#
CONFIG_HEXAGON_COMET=y
CONFIG_HEXAGON_ARCH_VERSION=2
CONFIG_CMDLINE=""
CONFIG_SMP=y
CONFIG_NR_CPUS=6
# CONFIG_PAGE_SIZE_4KB is not set
# CONFIG_PAGE_SIZE_16KB is not set
CONFIG_PAGE_SIZE_64KB=y
# CONFIG_PAGE_SIZE_256KB is not set
CONFIG_HZ_100=y
# CONFIG_HZ_250 is not set
# CONFIG_HZ_300 is not set
# CONFIG_HZ_1000 is not set
CONFIG_HZ=100
CONFIG_SCHED_HRTICK=y
# end of Machine selection

#
# General architecture-dependent options
#
CONFIG_TRACE_IRQFLAGS_SUPPORT=y
CONFIG_HAVE_ARCH_TRACEHOOK=y
CONFIG_GENERIC_SMP_IDLE_THREAD=y
CONFIG_ARCH_32BIT_OFF_T=y
CONFIG_LTO_NONE=y
CONFIG_MODULES_USE_ELF_RELA=y
CONFIG_PGTABLE_LEVELS=2
CONFIG_PAGE_SIZE_LESS_THAN_256KB=y
CONFIG_ISA_BUS_API=y
CONFIG_COMPAT_32BIT_TIME=y
CONFIG_ARCH_NO_PREEMPT=y
# CONFIG_LOCK_EVENT_COUNTS is not set
CONFIG_ARCH_WANT_LD_ORPHAN_WARN=y

#
# GCOV-based kernel profiling
#
CONFIG_GCOV_KERNEL=y
# end of GCOV-based kernel profiling
# end of General architecture-dependent options

CONFIG_RT_MUTEXES=y
CONFIG_BASE_SMALL=0
CONFIG_MODULE_SIG_FORMAT=y
CONFIG_MODULES=y
# CONFIG_MODULE_FORCE_LOAD is not set
CONFIG_MODULE_UNLOAD=y
# CONFIG_MODULE_FORCE_UNLOAD is not set
CONFIG_MODULE_UNLOAD_TAINT_TRACKING=y
CONFIG_MODVERSIONS=y
CONFIG_MODULE_SRCVERSION_ALL=y
CONFIG_MODULE_SIG=y
CONFIG_MODULE_SIG_FORCE=y
CONFIG_MODULE_SIG_ALL=y
CONFIG_MODULE_SIG_SHA1=y
# CONFIG_MODULE_SIG_SHA224 is not set
# CONFIG_MODULE_SIG_SHA256 is not set
# CONFIG_MODULE_SIG_SHA384 is not set
# CONFIG_MODULE_SIG_SHA512 is not set
CONFIG_MODULE_SIG_HASH="sha1"
CONFIG_MODULE_COMPRESS_NONE=y
# CONFIG_MODULE_COMPRESS_GZIP is not set
# CONFIG_MODULE_COMPRESS_XZ is not set
# CONFIG_MODULE_COMPRESS_ZSTD is not set
# CONFIG_MODULE_ALLOW_MISSING_NAMESPACE_IMPORTS is not set
CONFIG_MODPROBE_PATH="/sbin/modprobe"
CONFIG_MODULES_TREE_LOOKUP=y
# CONFIG_BLOCK is not set
CONFIG_PADATA=y
CONFIG_ASN1=y
CONFIG_UNINLINE_SPIN_UNLOCK=y

#
# Executable file formats
#
# CONFIG_BINFMT_ELF is not set
CONFIG_BINFMT_SCRIPT=y
CONFIG_BINFMT_MISC=y
CONFIG_COREDUMP=y
# end of Executable file formats

#
# Memory Management options
#

#
# SLAB allocator options
#
# CONFIG_SLAB is not set
CONFIG_SLUB=y
# CONFIG_SLOB is not set
# CONFIG_SLAB_MERGE_DEFAULT is not set
# CONFIG_SLAB_FREELIST_RANDOM is not set
CONFIG_SLAB_FREELIST_HARDENED=y
# CONFIG_SLUB_STATS is not set
# CONFIG_SLUB_CPU_PARTIAL is not set
# end of SLAB allocator options

# CONFIG_SHUFFLE_PAGE_ALLOCATOR is not set
# CONFIG_COMPAT_BRK is not set
CONFIG_FLATMEM=y
CONFIG_SPLIT_PTLOCK_CPUS=4
CONFIG_MEMORY_BALLOON=y
CONFIG_BALLOON_COMPACTION=y
CONFIG_COMPACTION=y
CONFIG_COMPACT_UNEVICTABLE_DEFAULT=1
CONFIG_PAGE_REPORTING=y
CONFIG_MIGRATION=y
CONFIG_KSM=y
CONFIG_DEFAULT_MMAP_MIN_ADDR=4096
# CONFIG_CMA is not set
# CONFIG_IDLE_PAGE_TRACKING is not set
CONFIG_VM_EVENT_COUNTERS=y
CONFIG_PERCPU_STATS=y
# CONFIG_GUP_TEST is not set
CONFIG_USERFAULTFD=y
CONFIG_LRU_GEN=y
CONFIG_LRU_GEN_ENABLED=y
# CONFIG_LRU_GEN_STATS is not set

#
# Data Access Monitoring
#
# CONFIG_DAMON is not set
# end of Data Access Monitoring
# end of Memory Management options

CONFIG_NET=y
CONFIG_NET_INGRESS=y
CONFIG_SKB_EXTENSIONS=y

#
# Networking options
#
CONFIG_PACKET=y
# CONFIG_PACKET_DIAG is not set
CONFIG_UNIX=y
CONFIG_UNIX_SCM=y
CONFIG_AF_UNIX_OOB=y
# CONFIG_UNIX_DIAG is not set
CONFIG_TLS=m
CONFIG_TLS_DEVICE=y
# CONFIG_TLS_TOE is not set
CONFIG_XFRM=y
CONFIG_XFRM_OFFLOAD=y
CONFIG_XFRM_ALGO=y
CONFIG_XFRM_USER=y
# CONFIG_XFRM_INTERFACE is not set
# CONFIG_XFRM_SUB_POLICY is not set
# CONFIG_XFRM_MIGRATE is not set
CONFIG_XFRM_AH=m
CONFIG_XFRM_ESP=m
CONFIG_XFRM_IPCOMP=y
# CONFIG_NET_KEY is not set
CONFIG_XFRM_ESPINTCP=y
CONFIG_SMC=m
CONFIG_SMC_DIAG=m
CONFIG_XDP_SOCKETS=y
CONFIG_XDP_SOCKETS_DIAG=y
CONFIG_INET=y
CONFIG_IP_MULTICAST=y
CONFIG_IP_ADVANCED_ROUTER=y
CONFIG_IP_FIB_TRIE_STATS=y
CONFIG_IP_MULTIPLE_TABLES=y
# CONFIG_IP_ROUTE_MULTIPATH is not set
CONFIG_IP_ROUTE_VERBOSE=y
CONFIG_IP_ROUTE_CLASSID=y
CONFIG_IP_PNP=y
CONFIG_IP_PNP_DHCP=y
# CONFIG_IP_PNP_BOOTP is not set
CONFIG_IP_PNP_RARP=y
# CONFIG_NET_IPIP is not set
# CONFIG_NET_IPGRE_DEMUX is not set
CONFIG_NET_IP_TUNNEL=m
CONFIG_IP_MROUTE_COMMON=y
CONFIG_IP_MROUTE=y
CONFIG_IP_MROUTE_MULTIPLE_TABLES=y
CONFIG_IP_PIMSM_V1=y
# CONFIG_IP_PIMSM_V2 is not set
CONFIG_SYN_COOKIES=y
# CONFIG_NET_IPVTI is not set
CONFIG_NET_UDP_TUNNEL=m
# CONFIG_NET_FOU is not set
CONFIG_INET_AH=m
CONFIG_INET_ESP=m
CONFIG_INET_ESP_OFFLOAD=m
CONFIG_INET_ESPINTCP=y
CONFIG_INET_IPCOMP=y
CONFIG_INET_TABLE_PERTURB_ORDER=16
CONFIG_INET_XFRM_TUNNEL=y
CONFIG_INET_TUNNEL=y
CONFIG_INET_DIAG=m
CONFIG_INET_TCP_DIAG=m
# CONFIG_INET_UDP_DIAG is not set
CONFIG_INET_RAW_DIAG=m
# CONFIG_INET_DIAG_DESTROY is not set
CONFIG_TCP_CONG_ADVANCED=y
CONFIG_TCP_CONG_BIC=m
CONFIG_TCP_CONG_CUBIC=y
CONFIG_TCP_CONG_WESTWOOD=y
CONFIG_TCP_CONG_HTCP=m
# CONFIG_TCP_CONG_HSTCP is not set
CONFIG_TCP_CONG_HYBLA=m
CONFIG_TCP_CONG_VEGAS=y
CONFIG_TCP_CONG_NV=y
CONFIG_TCP_CONG_SCALABLE=m
CONFIG_TCP_CONG_LP=m
CONFIG_TCP_CONG_VENO=y
CONFIG_TCP_CONG_YEAH=y
CONFIG_TCP_CONG_ILLINOIS=m
# CONFIG_TCP_CONG_DCTCP is not set
CONFIG_TCP_CONG_CDG=y
CONFIG_TCP_CONG_BBR=m
# CONFIG_DEFAULT_CUBIC is not set
# CONFIG_DEFAULT_VEGAS is not set
# CONFIG_DEFAULT_VENO is not set
CONFIG_DEFAULT_WESTWOOD=y
# CONFIG_DEFAULT_CDG is not set
# CONFIG_DEFAULT_RENO is not set
CONFIG_DEFAULT_TCP_CONG="westwood"
# CONFIG_TCP_MD5SIG is not set
CONFIG_IPV6=m
# CONFIG_IPV6_ROUTER_PREF is not set
# CONFIG_IPV6_OPTIMISTIC_DAD is not set
# CONFIG_INET6_AH is not set
CONFIG_INET6_ESP=m
# CONFIG_INET6_ESP_OFFLOAD is not set
# CONFIG_INET6_ESPINTCP is not set
CONFIG_INET6_IPCOMP=m
CONFIG_IPV6_MIP6=m
CONFIG_IPV6_ILA=m
CONFIG_INET6_XFRM_TUNNEL=m
CONFIG_INET6_TUNNEL=m
CONFIG_IPV6_VTI=m
# CONFIG_IPV6_SIT is not set
CONFIG_IPV6_TUNNEL=m
CONFIG_IPV6_MULTIPLE_TABLES=y
# CONFIG_IPV6_SUBTREES is not set
# CONFIG_IPV6_MROUTE is not set
CONFIG_IPV6_SEG6_LWTUNNEL=y
CONFIG_IPV6_SEG6_HMAC=y
CONFIG_IPV6_RPL_LWTUNNEL=y
# CONFIG_IPV6_IOAM6_LWTUNNEL is not set
# CONFIG_MPTCP is not set
CONFIG_NETWORK_SECMARK=y
CONFIG_NET_PTP_CLASSIFY=y
# CONFIG_NETWORK_PHY_TIMESTAMPING is not set
CONFIG_NETFILTER=y
CONFIG_NETFILTER_ADVANCED=y
CONFIG_BRIDGE_NETFILTER=m

#
# Core Netfilter Configuration
#
CONFIG_NETFILTER_INGRESS=y
# CONFIG_NETFILTER_EGRESS is not set
CONFIG_NETFILTER_NETLINK=y
CONFIG_NETFILTER_FAMILY_BRIDGE=y
CONFIG_NETFILTER_FAMILY_ARP=y
CONFIG_NETFILTER_NETLINK_HOOK=y
CONFIG_NETFILTER_NETLINK_ACCT=y
# CONFIG_NETFILTER_NETLINK_QUEUE is not set
CONFIG_NETFILTER_NETLINK_LOG=y
# CONFIG_NETFILTER_NETLINK_OSF is not set
CONFIG_NF_CONNTRACK=m
CONFIG_NF_LOG_SYSLOG=y
CONFIG_NETFILTER_CONNCOUNT=m
CONFIG_NF_CONNTRACK_MARK=y
CONFIG_NF_CONNTRACK_SECMARK=y
CONFIG_NF_CONNTRACK_ZONES=y
CONFIG_NF_CONNTRACK_EVENTS=y
CONFIG_NF_CONNTRACK_TIMEOUT=y
# CONFIG_NF_CONNTRACK_TIMESTAMP is not set
# CONFIG_NF_CONNTRACK_LABELS is not set
# CONFIG_NF_CT_PROTO_DCCP is not set
CONFIG_NF_CT_PROTO_GRE=y
CONFIG_NF_CT_PROTO_SCTP=y
# CONFIG_NF_CT_PROTO_UDPLITE is not set
# CONFIG_NF_CONNTRACK_AMANDA is not set
CONFIG_NF_CONNTRACK_FTP=m
CONFIG_NF_CONNTRACK_H323=m
# CONFIG_NF_CONNTRACK_IRC is not set
CONFIG_NF_CONNTRACK_BROADCAST=m
CONFIG_NF_CONNTRACK_NETBIOS_NS=m
# CONFIG_NF_CONNTRACK_SNMP is not set
CONFIG_NF_CONNTRACK_PPTP=m
# CONFIG_NF_CONNTRACK_SANE is not set
CONFIG_NF_CONNTRACK_SIP=m
CONFIG_NF_CONNTRACK_TFTP=m
CONFIG_NF_CT_NETLINK=m
# CONFIG_NF_CT_NETLINK_TIMEOUT is not set
# CONFIG_NETFILTER_NETLINK_GLUE_CT is not set
CONFIG_NF_NAT=m
CONFIG_NF_NAT_FTP=m
CONFIG_NF_NAT_SIP=m
CONFIG_NF_NAT_TFTP=m
CONFIG_NF_NAT_REDIRECT=y
CONFIG_NF_NAT_MASQUERADE=y
CONFIG_NETFILTER_SYNPROXY=m
CONFIG_NF_TABLES=y
# CONFIG_NF_TABLES_INET is not set
CONFIG_NF_TABLES_NETDEV=y
CONFIG_NFT_NUMGEN=y
# CONFIG_NFT_CT is not set
CONFIG_NFT_CONNLIMIT=m
CONFIG_NFT_LOG=y
# CONFIG_NFT_LIMIT is not set
CONFIG_NFT_MASQ=m
# CONFIG_NFT_REDIR is not set
# CONFIG_NFT_NAT is not set
# CONFIG_NFT_TUNNEL is not set
CONFIG_NFT_QUOTA=y
CONFIG_NFT_REJECT=m
CONFIG_NFT_COMPAT=y
CONFIG_NFT_HASH=m
CONFIG_NFT_FIB=m
# CONFIG_NFT_XFRM is not set
CONFIG_NFT_SOCKET=m
# CONFIG_NFT_OSF is not set
CONFIG_NFT_TPROXY=m
CONFIG_NFT_SYNPROXY=m
CONFIG_NF_DUP_NETDEV=y
# CONFIG_NFT_DUP_NETDEV is not set
CONFIG_NFT_FWD_NETDEV=y
# CONFIG_NF_FLOW_TABLE is not set
CONFIG_NETFILTER_XTABLES=y

#
# Xtables combined modules
#
CONFIG_NETFILTER_XT_MARK=y
CONFIG_NETFILTER_XT_CONNMARK=m

#
# Xtables targets
#
# CONFIG_NETFILTER_XT_TARGET_CHECKSUM is not set
# CONFIG_NETFILTER_XT_TARGET_CLASSIFY is not set
# CONFIG_NETFILTER_XT_TARGET_CONNMARK is not set
CONFIG_NETFILTER_XT_TARGET_CONNSECMARK=m
CONFIG_NETFILTER_XT_TARGET_CT=m
CONFIG_NETFILTER_XT_TARGET_DSCP=y
CONFIG_NETFILTER_XT_TARGET_HL=y
CONFIG_NETFILTER_XT_TARGET_HMARK=m
# CONFIG_NETFILTER_XT_TARGET_IDLETIMER is not set
CONFIG_NETFILTER_XT_TARGET_LED=m
CONFIG_NETFILTER_XT_TARGET_LOG=m
# CONFIG_NETFILTER_XT_TARGET_MARK is not set
CONFIG_NETFILTER_XT_NAT=m
CONFIG_NETFILTER_XT_TARGET_NETMAP=m
CONFIG_NETFILTER_XT_TARGET_NFLOG=y
# CONFIG_NETFILTER_XT_TARGET_NFQUEUE is not set
CONFIG_NETFILTER_XT_TARGET_NOTRACK=m
# CONFIG_NETFILTER_XT_TARGET_RATEEST is not set
CONFIG_NETFILTER_XT_TARGET_REDIRECT=m
CONFIG_NETFILTER_XT_TARGET_MASQUERADE=m
CONFIG_NETFILTER_XT_TARGET_TEE=m
CONFIG_NETFILTER_XT_TARGET_TPROXY=m
CONFIG_NETFILTER_XT_TARGET_TRACE=y
CONFIG_NETFILTER_XT_TARGET_SECMARK=m
CONFIG_NETFILTER_XT_TARGET_TCPMSS=m
CONFIG_NETFILTER_XT_TARGET_TCPOPTSTRIP=y

#
# Xtables matches
#
# CONFIG_NETFILTER_XT_MATCH_ADDRTYPE is not set
CONFIG_NETFILTER_XT_MATCH_BPF=m
# CONFIG_NETFILTER_XT_MATCH_CGROUP is not set
CONFIG_NETFILTER_XT_MATCH_CLUSTER=m
CONFIG_NETFILTER_XT_MATCH_COMMENT=y
CONFIG_NETFILTER_XT_MATCH_CONNBYTES=m
# CONFIG_NETFILTER_XT_MATCH_CONNLABEL is not set
CONFIG_NETFILTER_XT_MATCH_CONNLIMIT=m
CONFIG_NETFILTER_XT_MATCH_CONNMARK=m
CONFIG_NETFILTER_XT_MATCH_CONNTRACK=m
# CONFIG_NETFILTER_XT_MATCH_CPU is not set
CONFIG_NETFILTER_XT_MATCH_DCCP=m
CONFIG_NETFILTER_XT_MATCH_DEVGROUP=y
CONFIG_NETFILTER_XT_MATCH_DSCP=y
CONFIG_NETFILTER_XT_MATCH_ECN=y
# CONFIG_NETFILTER_XT_MATCH_ESP is not set
CONFIG_NETFILTER_XT_MATCH_HASHLIMIT=m
CONFIG_NETFILTER_XT_MATCH_HELPER=m
CONFIG_NETFILTER_XT_MATCH_HL=m
# CONFIG_NETFILTER_XT_MATCH_IPCOMP is not set
CONFIG_NETFILTER_XT_MATCH_IPRANGE=y
CONFIG_NETFILTER_XT_MATCH_IPVS=m
CONFIG_NETFILTER_XT_MATCH_L2TP=y
# CONFIG_NETFILTER_XT_MATCH_LENGTH is not set
# CONFIG_NETFILTER_XT_MATCH_LIMIT is not set
CONFIG_NETFILTER_XT_MATCH_MAC=m
CONFIG_NETFILTER_XT_MATCH_MARK=m
CONFIG_NETFILTER_XT_MATCH_MULTIPORT=y
CONFIG_NETFILTER_XT_MATCH_NFACCT=y
# CONFIG_NETFILTER_XT_MATCH_OSF is not set
# CONFIG_NETFILTER_XT_MATCH_OWNER is not set
CONFIG_NETFILTER_XT_MATCH_POLICY=y
# CONFIG_NETFILTER_XT_MATCH_PHYSDEV is not set
# CONFIG_NETFILTER_XT_MATCH_PKTTYPE is not set
CONFIG_NETFILTER_XT_MATCH_QUOTA=y
# CONFIG_NETFILTER_XT_MATCH_RATEEST is not set
CONFIG_NETFILTER_XT_MATCH_REALM=y
# CONFIG_NETFILTER_XT_MATCH_RECENT is not set
CONFIG_NETFILTER_XT_MATCH_SCTP=m
CONFIG_NETFILTER_XT_MATCH_SOCKET=m
# CONFIG_NETFILTER_XT_MATCH_STATE is not set
CONFIG_NETFILTER_XT_MATCH_STATISTIC=m
CONFIG_NETFILTER_XT_MATCH_STRING=m
CONFIG_NETFILTER_XT_MATCH_TCPMSS=m
CONFIG_NETFILTER_XT_MATCH_TIME=m
CONFIG_NETFILTER_XT_MATCH_U32=y
# end of Core Netfilter Configuration

# CONFIG_IP_SET is not set
CONFIG_IP_VS=m
# CONFIG_IP_VS_IPV6 is not set
# CONFIG_IP_VS_DEBUG is not set
CONFIG_IP_VS_TAB_BITS=12

#
# IPVS transport protocol load balancing support
#
CONFIG_IP_VS_PROTO_TCP=y
# CONFIG_IP_VS_PROTO_UDP is not set
CONFIG_IP_VS_PROTO_AH_ESP=y
# CONFIG_IP_VS_PROTO_ESP is not set
CONFIG_IP_VS_PROTO_AH=y
# CONFIG_IP_VS_PROTO_SCTP is not set

#
# IPVS scheduler
#
CONFIG_IP_VS_RR=m
CONFIG_IP_VS_WRR=m
CONFIG_IP_VS_LC=m
# CONFIG_IP_VS_WLC is not set
CONFIG_IP_VS_FO=m
CONFIG_IP_VS_OVF=m
CONFIG_IP_VS_LBLC=m
# CONFIG_IP_VS_LBLCR is not set
# CONFIG_IP_VS_DH is not set
CONFIG_IP_VS_SH=m
CONFIG_IP_VS_MH=m
# CONFIG_IP_VS_SED is not set
CONFIG_IP_VS_NQ=m
CONFIG_IP_VS_TWOS=m

#
# IPVS SH scheduler
#
CONFIG_IP_VS_SH_TAB_BITS=8

#
# IPVS MH scheduler
#
CONFIG_IP_VS_MH_TAB_INDEX=12

#
# IPVS application helper
#
CONFIG_IP_VS_FTP=m
CONFIG_IP_VS_NFCT=y

#
# IP: Netfilter Configuration
#
CONFIG_NF_DEFRAG_IPV4=m
CONFIG_NF_SOCKET_IPV4=y
CONFIG_NF_TPROXY_IPV4=y
# CONFIG_NF_TABLES_IPV4 is not set
CONFIG_NF_TABLES_ARP=y
CONFIG_NF_DUP_IPV4=m
CONFIG_NF_LOG_ARP=m
# CONFIG_NF_LOG_IPV4 is not set
CONFIG_NF_REJECT_IPV4=m
CONFIG_NF_NAT_PPTP=m
CONFIG_NF_NAT_H323=m
CONFIG_IP_NF_IPTABLES=y
CONFIG_IP_NF_MATCH_AH=y
# CONFIG_IP_NF_MATCH_ECN is not set
CONFIG_IP_NF_MATCH_RPFILTER=m
CONFIG_IP_NF_MATCH_TTL=m
CONFIG_IP_NF_FILTER=m
CONFIG_IP_NF_TARGET_REJECT=m
CONFIG_IP_NF_TARGET_SYNPROXY=m
# CONFIG_IP_NF_NAT is not set
CONFIG_IP_NF_MANGLE=y
# CONFIG_IP_NF_TARGET_CLUSTERIP is not set
CONFIG_IP_NF_TARGET_ECN=m
CONFIG_IP_NF_TARGET_TTL=y
CONFIG_IP_NF_RAW=y
CONFIG_IP_NF_ARPTABLES=y
# CONFIG_IP_NF_ARPFILTER is not set
CONFIG_IP_NF_ARP_MANGLE=m
# end of IP: Netfilter Configuration

#
# IPv6: Netfilter Configuration
#
CONFIG_NF_SOCKET_IPV6=m
CONFIG_NF_TPROXY_IPV6=m
CONFIG_NF_TABLES_IPV6=y
CONFIG_NFT_REJECT_IPV6=m
# CONFIG_NFT_DUP_IPV6 is not set
CONFIG_NFT_FIB_IPV6=m
CONFIG_NF_DUP_IPV6=m
CONFIG_NF_REJECT_IPV6=m
CONFIG_NF_LOG_IPV6=m
CONFIG_IP6_NF_IPTABLES=m
CONFIG_IP6_NF_MATCH_AH=m
CONFIG_IP6_NF_MATCH_EUI64=m
CONFIG_IP6_NF_MATCH_FRAG=m
CONFIG_IP6_NF_MATCH_OPTS=m
# CONFIG_IP6_NF_MATCH_HL is not set
CONFIG_IP6_NF_MATCH_IPV6HEADER=m
CONFIG_IP6_NF_MATCH_MH=m
CONFIG_IP6_NF_MATCH_RPFILTER=m
# CONFIG_IP6_NF_MATCH_RT is not set
# CONFIG_IP6_NF_MATCH_SRH is not set
CONFIG_IP6_NF_TARGET_HL=m
# CONFIG_IP6_NF_FILTER is not set
CONFIG_IP6_NF_TARGET_SYNPROXY=m
CONFIG_IP6_NF_MANGLE=m
CONFIG_IP6_NF_RAW=m
# CONFIG_IP6_NF_NAT is not set
# end of IPv6: Netfilter Configuration

CONFIG_NF_DEFRAG_IPV6=m
CONFIG_NF_TABLES_BRIDGE=m
CONFIG_NFT_BRIDGE_META=m
CONFIG_NFT_BRIDGE_REJECT=m
CONFIG_NF_CONNTRACK_BRIDGE=m
CONFIG_BRIDGE_NF_EBTABLES=m
CONFIG_BRIDGE_EBT_BROUTE=m
CONFIG_BRIDGE_EBT_T_FILTER=m
# CONFIG_BRIDGE_EBT_T_NAT is not set
CONFIG_BRIDGE_EBT_802_3=m
# CONFIG_BRIDGE_EBT_AMONG is not set
CONFIG_BRIDGE_EBT_ARP=m
# CONFIG_BRIDGE_EBT_IP is not set
CONFIG_BRIDGE_EBT_IP6=m
# CONFIG_BRIDGE_EBT_LIMIT is not set
CONFIG_BRIDGE_EBT_MARK=m
# CONFIG_BRIDGE_EBT_PKTTYPE is not set
CONFIG_BRIDGE_EBT_STP=m
CONFIG_BRIDGE_EBT_VLAN=m
CONFIG_BRIDGE_EBT_ARPREPLY=m
CONFIG_BRIDGE_EBT_DNAT=m
CONFIG_BRIDGE_EBT_MARK_T=m
# CONFIG_BRIDGE_EBT_REDIRECT is not set
CONFIG_BRIDGE_EBT_SNAT=m
# CONFIG_BRIDGE_EBT_LOG is not set
CONFIG_BRIDGE_EBT_NFLOG=m
# CONFIG_BPFILTER is not set
CONFIG_IP_DCCP=m
CONFIG_INET_DCCP_DIAG=m

#
# DCCP CCIDs Configuration
#
# CONFIG_IP_DCCP_CCID2_DEBUG is not set
# CONFIG_IP_DCCP_CCID3 is not set
# end of DCCP CCIDs Configuration

#
# DCCP Kernel Hacking
#
CONFIG_IP_DCCP_DEBUG=y
# end of DCCP Kernel Hacking

CONFIG_IP_SCTP=m
# CONFIG_SCTP_DEFAULT_COOKIE_HMAC_MD5 is not set
CONFIG_SCTP_DEFAULT_COOKIE_HMAC_SHA1=y
# CONFIG_SCTP_DEFAULT_COOKIE_HMAC_NONE is not set
CONFIG_SCTP_COOKIE_HMAC_MD5=y
CONFIG_SCTP_COOKIE_HMAC_SHA1=y
CONFIG_INET_SCTP_DIAG=m
CONFIG_RDS=m
# CONFIG_RDS_TCP is not set
CONFIG_RDS_DEBUG=y
CONFIG_TIPC=m
CONFIG_TIPC_MEDIA_UDP=y
CONFIG_TIPC_CRYPTO=y
CONFIG_TIPC_DIAG=m
# CONFIG_ATM is not set
# CONFIG_L2TP is not set
CONFIG_STP=y
CONFIG_GARP=y
CONFIG_MRP=y
CONFIG_BRIDGE=m
# CONFIG_BRIDGE_IGMP_SNOOPING is not set
CONFIG_BRIDGE_VLAN_FILTERING=y
# CONFIG_BRIDGE_MRP is not set
CONFIG_BRIDGE_CFM=y
CONFIG_VLAN_8021Q=y
CONFIG_VLAN_8021Q_GVRP=y
CONFIG_VLAN_8021Q_MVRP=y
CONFIG_LLC=y
CONFIG_LLC2=m
CONFIG_ATALK=y
CONFIG_DEV_APPLETALK=y
CONFIG_IPDDP=m
# CONFIG_IPDDP_ENCAP is not set
CONFIG_X25=y
# CONFIG_LAPB is not set
CONFIG_PHONET=m
CONFIG_6LOWPAN=m
CONFIG_6LOWPAN_DEBUGFS=y
CONFIG_6LOWPAN_NHC=m
CONFIG_6LOWPAN_NHC_DEST=m
CONFIG_6LOWPAN_NHC_FRAGMENT=m
CONFIG_6LOWPAN_NHC_HOP=m
CONFIG_6LOWPAN_NHC_IPV6=m
CONFIG_6LOWPAN_NHC_MOBILITY=m
# CONFIG_6LOWPAN_NHC_ROUTING is not set
# CONFIG_6LOWPAN_NHC_UDP is not set
# CONFIG_6LOWPAN_GHC_EXT_HDR_HOP is not set
# CONFIG_6LOWPAN_GHC_UDP is not set
CONFIG_6LOWPAN_GHC_ICMPV6=m
CONFIG_6LOWPAN_GHC_EXT_HDR_DEST=m
CONFIG_6LOWPAN_GHC_EXT_HDR_FRAG=m
# CONFIG_6LOWPAN_GHC_EXT_HDR_ROUTE is not set
CONFIG_IEEE802154=y
CONFIG_IEEE802154_NL802154_EXPERIMENTAL=y
CONFIG_IEEE802154_SOCKET=m
CONFIG_IEEE802154_6LOWPAN=m
CONFIG_MAC802154=m
# CONFIG_NET_SCHED is not set
# CONFIG_DCB is not set
CONFIG_DNS_RESOLVER=y
# CONFIG_BATMAN_ADV is not set
# CONFIG_OPENVSWITCH is not set
CONFIG_VSOCKETS=m
CONFIG_VSOCKETS_DIAG=m
CONFIG_VSOCKETS_LOOPBACK=m
CONFIG_VIRTIO_VSOCKETS=m
CONFIG_VIRTIO_VSOCKETS_COMMON=m
CONFIG_NETLINK_DIAG=y
# CONFIG_MPLS is not set
CONFIG_NET_NSH=y
# CONFIG_HSR is not set
# CONFIG_NET_SWITCHDEV is not set
CONFIG_NET_L3_MASTER_DEV=y
CONFIG_QRTR=m
# CONFIG_QRTR_SMD is not set
CONFIG_QRTR_TUN=m
CONFIG_NET_NCSI=y
CONFIG_NCSI_OEM_CMD_GET_MAC=y
CONFIG_NCSI_OEM_CMD_KEEP_PHY=y
CONFIG_PCPU_DEV_REFCNT=y
CONFIG_RPS=y
CONFIG_RFS_ACCEL=y
CONFIG_SOCK_RX_QUEUE_MAPPING=y
CONFIG_XPS=y
# CONFIG_CGROUP_NET_PRIO is not set
CONFIG_CGROUP_NET_CLASSID=y
CONFIG_NET_RX_BUSY_POLL=y
CONFIG_BQL=y
CONFIG_NET_FLOW_LIMIT=y

#
# Network testing
#
# CONFIG_NET_DROP_MONITOR is not set
# end of Network testing
# end of Networking options

# CONFIG_HAMRADIO is not set
# CONFIG_CAN is not set
CONFIG_BT=m
# CONFIG_BT_BREDR is not set
CONFIG_BT_LE=y
CONFIG_BT_6LOWPAN=m
# CONFIG_BT_LEDS is not set
CONFIG_BT_MSFTEXT=y
# CONFIG_BT_AOSPEXT is not set
CONFIG_BT_DEBUGFS=y
CONFIG_BT_SELFTEST=y
# CONFIG_BT_SELFTEST_ECDH is not set
# CONFIG_BT_SELFTEST_SMP is not set
# CONFIG_BT_FEATURE_DEBUG is not set

#
# Bluetooth device drivers
#
CONFIG_BT_QCA=m
CONFIG_BT_MTK=m
# CONFIG_BT_HCIBTSDIO is not set
# CONFIG_BT_HCIVHCI is not set
CONFIG_BT_MRVL=m
CONFIG_BT_MRVL_SDIO=m
# CONFIG_BT_MTKSDIO is not set
CONFIG_BT_MTKUART=m
CONFIG_BT_QCOMSMD=m
CONFIG_BT_VIRTIO=m
# end of Bluetooth device drivers

CONFIG_AF_RXRPC=m
CONFIG_AF_RXRPC_IPV6=y
CONFIG_AF_RXRPC_INJECT_LOSS=y
# CONFIG_AF_RXRPC_DEBUG is not set
# CONFIG_RXKAD is not set
CONFIG_AF_KCM=y
CONFIG_STREAM_PARSER=y
CONFIG_MCTP=y
CONFIG_FIB_RULES=y
CONFIG_WIRELESS=y
CONFIG_CFG80211=m
# CONFIG_NL80211_TESTMODE is not set
# CONFIG_CFG80211_DEVELOPER_WARNINGS is not set
CONFIG_CFG80211_CERTIFICATION_ONUS=y
CONFIG_CFG80211_REQUIRE_SIGNED_REGDB=y
CONFIG_CFG80211_USE_KERNEL_REGDB_KEYS=y
CONFIG_CFG80211_EXTRA_REGDB_KEYDIR=""
CONFIG_CFG80211_REG_CELLULAR_HINTS=y
CONFIG_CFG80211_REG_RELAX_NO_IR=y
# CONFIG_CFG80211_DEFAULT_PS is not set
# CONFIG_CFG80211_DEBUGFS is not set
CONFIG_CFG80211_CRDA_SUPPORT=y
# CONFIG_CFG80211_WEXT is not set
CONFIG_MAC80211=m
CONFIG_MAC80211_HAS_RC=y
CONFIG_MAC80211_RC_MINSTREL=y
CONFIG_MAC80211_RC_DEFAULT_MINSTREL=y
CONFIG_MAC80211_RC_DEFAULT="minstrel_ht"
CONFIG_MAC80211_MESH=y
CONFIG_MAC80211_LEDS=y
CONFIG_MAC80211_DEBUGFS=y
CONFIG_MAC80211_MESSAGE_TRACING=y
CONFIG_MAC80211_DEBUG_MENU=y
CONFIG_MAC80211_NOINLINE=y
# CONFIG_MAC80211_VERBOSE_DEBUG is not set
# CONFIG_MAC80211_MLME_DEBUG is not set
CONFIG_MAC80211_STA_DEBUG=y
# CONFIG_MAC80211_HT_DEBUG is not set
CONFIG_MAC80211_OCB_DEBUG=y
CONFIG_MAC80211_IBSS_DEBUG=y
CONFIG_MAC80211_PS_DEBUG=y
# CONFIG_MAC80211_MPL_DEBUG is not set
# CONFIG_MAC80211_MPATH_DEBUG is not set
CONFIG_MAC80211_MHWMP_DEBUG=y
CONFIG_MAC80211_MESH_SYNC_DEBUG=y
CONFIG_MAC80211_MESH_CSA_DEBUG=y
# CONFIG_MAC80211_MESH_PS_DEBUG is not set
# CONFIG_MAC80211_TDLS_DEBUG is not set
# CONFIG_MAC80211_DEBUG_COUNTERS is not set
CONFIG_MAC80211_STA_HASH_MAX_SIZE=0
CONFIG_RFKILL=y
CONFIG_RFKILL_LEDS=y
CONFIG_RFKILL_GPIO=m
CONFIG_NET_9P=y
CONFIG_NET_9P_FD=m
# CONFIG_NET_9P_VIRTIO is not set
CONFIG_NET_9P_DEBUG=y
CONFIG_CAIF=m
CONFIG_CAIF_DEBUG=y
CONFIG_CAIF_NETDEV=m
# CONFIG_CAIF_USB is not set
CONFIG_CEPH_LIB=y
CONFIG_CEPH_LIB_PRETTYDEBUG=y
CONFIG_CEPH_LIB_USE_DNS_RESOLVER=y
CONFIG_NFC=y
CONFIG_NFC_DIGITAL=y
CONFIG_NFC_NCI=y
CONFIG_NFC_NCI_SPI=y
# CONFIG_NFC_HCI is not set

#
# Near Field Communication (NFC) devices
#
CONFIG_NFC_TRF7970A=m
CONFIG_NFC_SIM=y
CONFIG_NFC_VIRTUAL_NCI=y
CONFIG_NFC_FDP=m
# CONFIG_NFC_FDP_I2C is not set
CONFIG_NFC_PN533=y
# CONFIG_NFC_PN533_I2C is not set
CONFIG_NFC_PN532_UART=y
# CONFIG_NFC_ST_NCI_I2C is not set
# CONFIG_NFC_ST_NCI_SPI is not set
# CONFIG_NFC_NXP_NCI is not set
CONFIG_NFC_S3FWRN5=m
CONFIG_NFC_S3FWRN5_I2C=m
# CONFIG_NFC_S3FWRN82_UART is not set
# CONFIG_NFC_ST95HF is not set
# end of Near Field Communication (NFC) devices

CONFIG_PSAMPLE=y
# CONFIG_NET_IFE is not set
CONFIG_LWTUNNEL=y
CONFIG_LWTUNNEL_BPF=y
CONFIG_DST_CACHE=y
CONFIG_GRO_CELLS=y
CONFIG_SOCK_VALIDATE_XMIT=y
CONFIG_NET_SOCK_MSG=y
CONFIG_PAGE_POOL=y
CONFIG_PAGE_POOL_STATS=y
CONFIG_FAILOVER=m
# CONFIG_ETHTOOL_NETLINK is not set

#
# Device Drivers
#
# CONFIG_PCCARD is not set

#
# Generic Driver Options
#
CONFIG_AUXILIARY_BUS=y
CONFIG_UEVENT_HELPER=y
CONFIG_UEVENT_HELPER_PATH=""
CONFIG_DEVTMPFS=y
CONFIG_DEVTMPFS_MOUNT=y
# CONFIG_DEVTMPFS_SAFE is not set
CONFIG_STANDALONE=y
# CONFIG_PREVENT_FIRMWARE_BUILD is not set

#
# Firmware loader
#
CONFIG_FW_LOADER=m
CONFIG_FW_LOADER_PAGED_BUF=y
CONFIG_FW_LOADER_SYSFS=y
CONFIG_EXTRA_FIRMWARE=""
CONFIG_FW_LOADER_USER_HELPER=y
CONFIG_FW_LOADER_USER_HELPER_FALLBACK=y
# CONFIG_FW_LOADER_COMPRESS is not set
CONFIG_FW_UPLOAD=y
# end of Firmware loader

CONFIG_WANT_DEV_COREDUMP=y
CONFIG_ALLOW_DEV_COREDUMP=y
CONFIG_DEV_COREDUMP=y
# CONFIG_DEBUG_DRIVER is not set
# CONFIG_DEBUG_DEVRES is not set
CONFIG_DEBUG_TEST_DRIVER_REMOVE=y
CONFIG_TEST_ASYNC_DRIVER_PROBE=m
CONFIG_GENERIC_CPU_DEVICES=y
CONFIG_SOC_BUS=y
CONFIG_REGMAP=y
CONFIG_REGMAP_I2C=y
CONFIG_REGMAP_SPI=y
CONFIG_REGMAP_SPMI=m
CONFIG_REGMAP_W1=y
CONFIG_REGMAP_MMIO=y
CONFIG_REGMAP_IRQ=y
CONFIG_REGMAP_I3C=m
CONFIG_REGMAP_SPI_AVMM=m
CONFIG_DMA_SHARED_BUFFER=y
# CONFIG_DMA_FENCE_TRACE is not set
# end of Generic Driver Options

#
# Bus devices
#
# CONFIG_ARM_INTEGRATOR_LM is not set
# CONFIG_BT1_APB is not set
# CONFIG_BT1_AXI is not set
# CONFIG_MOXTET is not set
# CONFIG_INTEL_IXP4XX_EB is not set
# CONFIG_QCOM_EBI2 is not set
# CONFIG_MHI_BUS is not set
CONFIG_MHI_BUS_EP=y
# end of Bus devices

CONFIG_CONNECTOR=m

#
# Firmware Drivers
#

#
# ARM System Control and Management Interface Protocol
#
# CONFIG_ARM_SCMI_PROTOCOL is not set
CONFIG_ARM_SCMI_POWER_DOMAIN=y
CONFIG_ARM_SCMI_POWER_CONTROL=y
# end of ARM System Control and Management Interface Protocol

# CONFIG_ARM_SCPI_POWER_DOMAIN is not set
# CONFIG_FIRMWARE_MEMMAP is not set
CONFIG_QCOM_SCM=m
CONFIG_QCOM_SCM_DOWNLOAD_MODE_DEFAULT=y
CONFIG_BCM47XX_NVRAM=y
CONFIG_BCM47XX_SPROM=y
# CONFIG_TEE_BNXT_FW is not set
# CONFIG_GOOGLE_FIRMWARE is not set

#
# Tegra firmware driver
#
# end of Tegra firmware driver
# end of Firmware Drivers

# CONFIG_GNSS is not set
CONFIG_MTD=m
CONFIG_MTD_TESTS=m

#
# Partition parsers
#
# CONFIG_MTD_AR7_PARTS is not set
# CONFIG_MTD_BCM63XX_PARTS is not set
CONFIG_MTD_BRCM_U_BOOT=m
CONFIG_MTD_CMDLINE_PARTS=m
CONFIG_MTD_OF_PARTS=m
CONFIG_MTD_OF_PARTS_BCM4908=y
CONFIG_MTD_OF_PARTS_LINKSYS_NS=y
CONFIG_MTD_PARSER_IMAGETAG=m
CONFIG_MTD_PARSER_TRX=m
CONFIG_MTD_SHARPSL_PARTS=m
CONFIG_MTD_REDBOOT_PARTS=m
CONFIG_MTD_REDBOOT_DIRECTORY_BLOCK=-1
CONFIG_MTD_REDBOOT_PARTS_UNALLOCATED=y
CONFIG_MTD_REDBOOT_PARTS_READONLY=y
# CONFIG_MTD_QCOMSMEM_PARTS is not set
# end of Partition parsers

#
# User Modules And Translation Layers
#
# CONFIG_MTD_OOPS is not set
CONFIG_MTD_PARTITIONED_MASTER=y

#
# RAM/ROM/Flash chip drivers
#
CONFIG_MTD_CFI=m
CONFIG_MTD_JEDECPROBE=m
CONFIG_MTD_GEN_PROBE=m
CONFIG_MTD_CFI_ADV_OPTIONS=y
# CONFIG_MTD_CFI_NOSWAP is not set
# CONFIG_MTD_CFI_BE_BYTE_SWAP is not set
CONFIG_MTD_CFI_LE_BYTE_SWAP=y
# CONFIG_MTD_CFI_GEOMETRY is not set
CONFIG_MTD_MAP_BANK_WIDTH_1=y
CONFIG_MTD_MAP_BANK_WIDTH_2=y
CONFIG_MTD_MAP_BANK_WIDTH_4=y
CONFIG_MTD_CFI_I1=y
CONFIG_MTD_CFI_I2=y
CONFIG_MTD_OTP=y
CONFIG_MTD_CFI_INTELEXT=m
CONFIG_MTD_CFI_AMDSTD=m
CONFIG_MTD_CFI_STAA=m
CONFIG_MTD_CFI_UTIL=m
CONFIG_MTD_RAM=m
CONFIG_MTD_ROM=m
CONFIG_MTD_ABSENT=m
# end of RAM/ROM/Flash chip drivers

#
# Mapping drivers for chip access
#
CONFIG_MTD_COMPLEX_MAPPINGS=y
CONFIG_MTD_PHYSMAP=m
CONFIG_MTD_PHYSMAP_COMPAT=y
CONFIG_MTD_PHYSMAP_START=0x8000000
CONFIG_MTD_PHYSMAP_LEN=0
CONFIG_MTD_PHYSMAP_BANKWIDTH=2
# CONFIG_MTD_PHYSMAP_OF is not set
CONFIG_MTD_PHYSMAP_GPIO_ADDR=y
CONFIG_MTD_SC520CDP=m
CONFIG_MTD_NETSC520=m
CONFIG_MTD_TS5500=m
CONFIG_MTD_PLATRAM=m
# end of Mapping drivers for chip access

#
# Self-contained MTD device drivers
#
CONFIG_MTD_DATAFLASH=m
# CONFIG_MTD_DATAFLASH_WRITE_VERIFY is not set
CONFIG_MTD_DATAFLASH_OTP=y
CONFIG_MTD_MCHP23K256=m
# CONFIG_MTD_MCHP48L640 is not set
# CONFIG_MTD_SPEAR_SMI is not set
# CONFIG_MTD_SST25L is not set
CONFIG_MTD_SLRAM=m
CONFIG_MTD_PHRAM=m
CONFIG_MTD_MTDRAM=m
CONFIG_MTDRAM_TOTAL_SIZE=4096
CONFIG_MTDRAM_ERASE_SIZE=128

#
# Disk-On-Chip Device Drivers
#
CONFIG_MTD_DOCG3=m
CONFIG_BCH_CONST_M=14
CONFIG_BCH_CONST_T=4
# end of Self-contained MTD device drivers

#
# NAND
#
CONFIG_MTD_NAND_CORE=m
CONFIG_MTD_ONENAND=m
CONFIG_MTD_ONENAND_VERIFY_WRITE=y
CONFIG_MTD_ONENAND_GENERIC=m
# CONFIG_MTD_ONENAND_SAMSUNG is not set
# CONFIG_MTD_ONENAND_OTP is not set
CONFIG_MTD_ONENAND_2X_PROGRAM=y
# CONFIG_MTD_RAW_NAND is not set
# CONFIG_MTD_SPI_NAND is not set

#
# ECC engine support
#
CONFIG_MTD_NAND_ECC=y
CONFIG_MTD_NAND_ECC_SW_HAMMING=y
# CONFIG_MTD_NAND_ECC_SW_HAMMING_SMC is not set
# CONFIG_MTD_NAND_ECC_SW_BCH is not set
# CONFIG_MTD_NAND_ECC_MXIC is not set
CONFIG_MTD_NAND_ECC_MEDIATEK=m
# end of ECC engine support
# end of NAND

#
# LPDDR & LPDDR2 PCM memory drivers
#
CONFIG_MTD_LPDDR=m
CONFIG_MTD_QINFO_PROBE=m
# end of LPDDR & LPDDR2 PCM memory drivers

# CONFIG_MTD_SPI_NOR is not set
# CONFIG_MTD_UBI is not set
CONFIG_MTD_HYPERBUS=m
CONFIG_HBMC_AM654=m
CONFIG_DTC=y
CONFIG_OF=y
# CONFIG_OF_UNITTEST is not set
CONFIG_OF_ALL_DTBS=y
CONFIG_OF_FLATTREE=y
CONFIG_OF_EARLY_FLATTREE=y
CONFIG_OF_KOBJ=y
CONFIG_OF_DYNAMIC=y
CONFIG_OF_ADDRESS=y
CONFIG_OF_IRQ=y
CONFIG_OF_RESERVED_MEM=y
CONFIG_OF_RESOLVE=y
CONFIG_OF_OVERLAY=y
CONFIG_PARPORT=y
CONFIG_PARPORT_AX88796=y
CONFIG_PARPORT_1284=y
CONFIG_PARPORT_NOT_PC=y

#
# NVME Support
#
# end of NVME Support

#
# Misc devices
#
# CONFIG_AD525X_DPOT is not set
# CONFIG_DUMMY_IRQ is not set
# CONFIG_ICS932S401 is not set
# CONFIG_ATMEL_SSC is not set
# CONFIG_ENCLOSURE_SERVICES is not set
# CONFIG_GEHC_ACHC is not set
# CONFIG_HI6421V600_IRQ is not set
CONFIG_QCOM_COINCELL=y
# CONFIG_QCOM_FASTRPC is not set
# CONFIG_APDS9802ALS is not set
CONFIG_ISL29003=y
CONFIG_ISL29020=y
CONFIG_SENSORS_TSL2550=y
CONFIG_SENSORS_BH1770=y
# CONFIG_SENSORS_APDS990X is not set
CONFIG_HMC6352=y
# CONFIG_DS1682 is not set
# CONFIG_LATTICE_ECP3_CONFIG is not set
# CONFIG_SRAM is not set
# CONFIG_XILINX_SDFEC is not set
# CONFIG_HISI_HIKEY_USB is not set
CONFIG_OPEN_DICE=y
# CONFIG_VCPU_STALL_DETECTOR is not set
CONFIG_C2PORT=y

#
# EEPROM support
#
CONFIG_EEPROM_AT24=m
CONFIG_EEPROM_AT25=m
CONFIG_EEPROM_LEGACY=y
CONFIG_EEPROM_MAX6875=m
CONFIG_EEPROM_93CX6=y
# CONFIG_EEPROM_93XX46 is not set
CONFIG_EEPROM_IDT_89HPESX=m
CONFIG_EEPROM_EE1004=y
# end of EEPROM support

#
# Texas Instruments shared transport line discipline
#
# end of Texas Instruments shared transport line discipline

CONFIG_ALTERA_STAPL=y
# CONFIG_ECHO is not set
# CONFIG_UACCE is not set
# CONFIG_PVPANIC is not set
# end of Misc devices

#
# SCSI device support
#
# end of SCSI device support

#
# IEEE 1394 (FireWire) support
#
# CONFIG_FIREWIRE is not set
# end of IEEE 1394 (FireWire) support

# CONFIG_NETDEVICES is not set

#
# Input device support
#
# CONFIG_INPUT is not set

#
# Hardware I/O ports
#
# CONFIG_SERIO is not set
CONFIG_GAMEPORT=y
CONFIG_GAMEPORT_NS558=m
CONFIG_GAMEPORT_L4=m
# end of Hardware I/O ports
# end of Input device support

#
# Character devices
#
# CONFIG_TTY is not set
CONFIG_SERIAL_DEV_BUS=y
# CONFIG_PRINTER is not set
CONFIG_PPDEV=m
CONFIG_IPMI_HANDLER=m
CONFIG_IPMI_PLAT_DATA=y
CONFIG_IPMI_PANIC_EVENT=y
# CONFIG_IPMI_PANIC_STRING is not set
CONFIG_IPMI_DEVICE_INTERFACE=m
CONFIG_IPMI_SI=m
CONFIG_IPMI_SSIF=m
CONFIG_IPMI_IPMB=m
CONFIG_IPMI_WATCHDOG=m
CONFIG_IPMI_POWEROFF=m
CONFIG_IPMI_KCS_BMC=m
# CONFIG_ASPEED_KCS_IPMI_BMC is not set
CONFIG_NPCM7XX_KCS_IPMI_BMC=m
CONFIG_IPMI_KCS_BMC_CDEV_IPMI=m
CONFIG_ASPEED_BT_IPMI_BMC=y
CONFIG_IPMB_DEVICE_INTERFACE=m
CONFIG_HW_RANDOM=m
CONFIG_HW_RANDOM_TIMERIOMEM=m
CONFIG_HW_RANDOM_ATMEL=m
CONFIG_HW_RANDOM_BA431=m
# CONFIG_HW_RANDOM_BCM2835 is not set
CONFIG_HW_RANDOM_IPROC_RNG200=m
CONFIG_HW_RANDOM_IXP4XX=m
# CONFIG_HW_RANDOM_OMAP is not set
# CONFIG_HW_RANDOM_OMAP3_ROM is not set
CONFIG_HW_RANDOM_VIRTIO=m
# CONFIG_HW_RANDOM_IMX_RNGC is not set
# CONFIG_HW_RANDOM_NOMADIK is not set
CONFIG_HW_RANDOM_STM32=m
CONFIG_HW_RANDOM_MESON=m
# CONFIG_HW_RANDOM_MTK is not set
CONFIG_HW_RANDOM_EXYNOS=m
# CONFIG_HW_RANDOM_NPCM is not set
CONFIG_HW_RANDOM_KEYSTONE=m
CONFIG_HW_RANDOM_CCTRNG=m
CONFIG_HW_RANDOM_XIPHERA=m
CONFIG_DEVMEM=y
CONFIG_TCG_TPM=m
CONFIG_HW_RANDOM_TPM=y
CONFIG_TCG_TIS_CORE=m
CONFIG_TCG_TIS=m
CONFIG_TCG_TIS_SPI=m
CONFIG_TCG_TIS_SPI_CR50=y
CONFIG_TCG_TIS_I2C=m
CONFIG_TCG_TIS_SYNQUACER=m
CONFIG_TCG_TIS_I2C_CR50=m
CONFIG_TCG_TIS_I2C_ATMEL=m
# CONFIG_TCG_TIS_I2C_INFINEON is not set
CONFIG_TCG_TIS_I2C_NUVOTON=m
CONFIG_TCG_ATMEL=m
# CONFIG_TCG_VTPM_PROXY is not set
CONFIG_TCG_TIS_ST33ZP24=m
CONFIG_TCG_TIS_ST33ZP24_I2C=m
# CONFIG_TCG_TIS_ST33ZP24_SPI is not set
CONFIG_XILLYBUS_CLASS=y
CONFIG_XILLYBUS=y
# CONFIG_XILLYBUS_OF is not set
# CONFIG_RANDOM_TRUST_CPU is not set
CONFIG_RANDOM_TRUST_BOOTLOADER=y
# end of Character devices

#
# I2C support
#
CONFIG_I2C=y
CONFIG_I2C_BOARDINFO=y
# CONFIG_I2C_COMPAT is not set
CONFIG_I2C_CHARDEV=y
CONFIG_I2C_MUX=y

#
# Multiplexer I2C Chip support
#
CONFIG_I2C_ARB_GPIO_CHALLENGE=y
# CONFIG_I2C_MUX_GPIO is not set
CONFIG_I2C_MUX_GPMUX=y
CONFIG_I2C_MUX_LTC4306=m
CONFIG_I2C_MUX_PCA9541=m
CONFIG_I2C_MUX_PCA954x=y
CONFIG_I2C_MUX_REG=m
CONFIG_I2C_MUX_MLXCPLD=m
# end of Multiplexer I2C Chip support

CONFIG_I2C_HELPER_AUTO=y
CONFIG_I2C_SMBUS=m
CONFIG_I2C_ALGOBIT=y
CONFIG_I2C_ALGOPCA=m

#
# I2C Hardware Bus support
#
CONFIG_I2C_HIX5HD2=y

#
# I2C system bus drivers (mostly embedded / system-on-chip)
#
# CONFIG_I2C_ALTERA is not set
CONFIG_I2C_ASPEED=m
CONFIG_I2C_AT91=m
# CONFIG_I2C_AT91_SLAVE_EXPERIMENTAL is not set
CONFIG_I2C_AXXIA=m
CONFIG_I2C_BCM2835=m
# CONFIG_I2C_BCM_IPROC is not set
CONFIG_I2C_BCM_KONA=m
CONFIG_I2C_BRCMSTB=y
# CONFIG_I2C_CADENCE is not set
CONFIG_I2C_CBUS_GPIO=y
CONFIG_I2C_DAVINCI=m
CONFIG_I2C_DESIGNWARE_CORE=m
CONFIG_I2C_DESIGNWARE_SLAVE=y
CONFIG_I2C_DESIGNWARE_PLATFORM=m
CONFIG_I2C_DIGICOLOR=m
CONFIG_I2C_EMEV2=m
# CONFIG_I2C_EXYNOS5 is not set
CONFIG_I2C_GPIO=m
CONFIG_I2C_GPIO_FAULT_INJECTOR=y
# CONFIG_I2C_HIGHLANDER is not set
# CONFIG_I2C_HISI is not set
CONFIG_I2C_IMG=y
# CONFIG_I2C_IMX is not set
# CONFIG_I2C_IMX_LPI2C is not set
CONFIG_I2C_IOP3XX=m
CONFIG_I2C_JZ4780=y
# CONFIG_I2C_KEMPLD is not set
CONFIG_I2C_LPC2K=m
CONFIG_I2C_MESON=y
CONFIG_I2C_MICROCHIP_CORE=m
CONFIG_I2C_MT65XX=m
# CONFIG_I2C_MT7621 is not set
# CONFIG_I2C_MV64XXX is not set
# CONFIG_I2C_MXS is not set
CONFIG_I2C_NPCM=m
# CONFIG_I2C_OCORES is not set
# CONFIG_I2C_OMAP is not set
# CONFIG_I2C_OWL is not set
CONFIG_I2C_APPLE=m
CONFIG_I2C_PCA_PLATFORM=m
CONFIG_I2C_PNX=m
CONFIG_I2C_PXA=m
# CONFIG_I2C_PXA_SLAVE is not set
CONFIG_I2C_QCOM_CCI=y
# CONFIG_I2C_QUP is not set
CONFIG_I2C_RIIC=m
CONFIG_I2C_RK3X=m
# CONFIG_I2C_RZV2M is not set
# CONFIG_I2C_S3C2410 is not set
# CONFIG_I2C_SH_MOBILE is not set
# CONFIG_I2C_SIMTEC is not set
CONFIG_I2C_SPRD=y
CONFIG_I2C_ST=y
CONFIG_I2C_STM32F4=y
# CONFIG_I2C_STM32F7 is not set
CONFIG_I2C_SUN6I_P2WI=y
# CONFIG_I2C_SYNQUACER is not set
CONFIG_I2C_TEGRA_BPMP=y
# CONFIG_I2C_UNIPHIER is not set
# CONFIG_I2C_UNIPHIER_F is not set
# CONFIG_I2C_VERSATILE is not set
CONFIG_I2C_WMT=m
CONFIG_I2C_XILINX=m
CONFIG_I2C_XLP9XX=y
CONFIG_I2C_RCAR=m

#
# External I2C/SMBus adapter drivers
#
CONFIG_I2C_PARPORT=m

#
# Other I2C/SMBus bus drivers
#
CONFIG_I2C_MLXCPLD=m
# CONFIG_I2C_CROS_EC_TUNNEL is not set
# CONFIG_I2C_FSI is not set
CONFIG_I2C_VIRTIO=y
# end of I2C Hardware Bus support

# CONFIG_I2C_STUB is not set
CONFIG_I2C_SLAVE=y
CONFIG_I2C_SLAVE_EEPROM=y
# CONFIG_I2C_SLAVE_TESTUNIT is not set
CONFIG_I2C_DEBUG_CORE=y
# CONFIG_I2C_DEBUG_ALGO is not set
CONFIG_I2C_DEBUG_BUS=y
# end of I2C support

CONFIG_I3C=y
CONFIG_CDNS_I3C_MASTER=y
# CONFIG_DW_I3C_MASTER is not set
CONFIG_SVC_I3C_MASTER=y
# CONFIG_MIPI_I3C_HCI is not set
CONFIG_SPI=y
# CONFIG_SPI_DEBUG is not set
CONFIG_SPI_MASTER=y
# CONFIG_SPI_MEM is not set

#
# SPI Master Controller Drivers
#
CONFIG_SPI_ALTERA=m
CONFIG_SPI_ALTERA_CORE=m
CONFIG_SPI_AR934X=m
CONFIG_SPI_ATH79=m
CONFIG_SPI_ARMADA_3700=m
CONFIG_SPI_ASPEED_SMC=y
CONFIG_SPI_ATMEL=m
# CONFIG_SPI_AT91_USART is not set
CONFIG_SPI_ATMEL_QUADSPI=m
# CONFIG_SPI_AXI_SPI_ENGINE is not set
CONFIG_SPI_BCM2835=y
CONFIG_SPI_BCM2835AUX=y
CONFIG_SPI_BCM63XX=m
CONFIG_SPI_BCM63XX_HSSPI=y
CONFIG_SPI_BCM_QSPI=y
CONFIG_SPI_BITBANG=y
CONFIG_SPI_BUTTERFLY=y
CONFIG_SPI_CADENCE=m
# CONFIG_SPI_CADENCE_QUADSPI is not set
CONFIG_SPI_CLPS711X=m
# CONFIG_SPI_DESIGNWARE is not set
CONFIG_SPI_EP93XX=m
CONFIG_SPI_FSI=m
CONFIG_SPI_FSL_LPSPI=m
CONFIG_SPI_FSL_QUADSPI=m
CONFIG_SPI_GXP=y
CONFIG_SPI_HISI_KUNPENG=y
# CONFIG_SPI_HISI_SFC_V3XX is not set
# CONFIG_SPI_NXP_FLEXSPI is not set
CONFIG_SPI_GPIO=y
CONFIG_SPI_IMG_SPFI=y
CONFIG_SPI_IMX=y
# CONFIG_SPI_INGENIC is not set
CONFIG_SPI_JCORE=m
CONFIG_SPI_LM70_LLP=y
CONFIG_SPI_LP8841_RTC=y
# CONFIG_SPI_FSL_SPI is not set
# CONFIG_SPI_FSL_DSPI is not set
CONFIG_SPI_MESON_SPICC=y
CONFIG_SPI_MESON_SPIFC=m
# CONFIG_SPI_MICROCHIP_CORE is not set
CONFIG_SPI_MICROCHIP_CORE_QSPI=y
# CONFIG_SPI_MT65XX is not set
CONFIG_SPI_MT7621=m
CONFIG_SPI_MTK_NOR=m
CONFIG_SPI_MTK_SNFI=m
CONFIG_SPI_NPCM_FIU=m
CONFIG_SPI_NPCM_PSPI=m
CONFIG_SPI_LANTIQ_SSC=y
# CONFIG_SPI_OC_TINY is not set
CONFIG_SPI_OMAP24XX=y
# CONFIG_SPI_TI_QSPI is not set
CONFIG_SPI_OMAP_100K=y
# CONFIG_SPI_ORION is not set
CONFIG_SPI_PIC32=m
CONFIG_SPI_PIC32_SQI=y
CONFIG_SPI_PXA2XX=m
# CONFIG_SPI_ROCKCHIP is not set
CONFIG_SPI_ROCKCHIP_SFC=y
# CONFIG_SPI_RPCIF is not set
CONFIG_SPI_RSPI=m
CONFIG_SPI_QUP=y
# CONFIG_SPI_S3C64XX is not set
# CONFIG_SPI_SC18IS602 is not set
CONFIG_SPI_SH_MSIOF=y
CONFIG_SPI_SH=y
CONFIG_SPI_SH_HSPI=y
CONFIG_SPI_SIFIVE=m
CONFIG_SPI_SPRD=m
CONFIG_SPI_SPRD_ADI=y
# CONFIG_SPI_STM32 is not set
CONFIG_SPI_ST_SSC4=m
CONFIG_SPI_SUN4I=m
CONFIG_SPI_SUN6I=m
# CONFIG_SPI_SUNPLUS_SP7021 is not set
CONFIG_SPI_SYNQUACER=m
CONFIG_SPI_MXIC=y
CONFIG_SPI_TEGRA210_QUAD=y
CONFIG_SPI_TEGRA114=y
CONFIG_SPI_TEGRA20_SFLASH=y
CONFIG_SPI_TEGRA20_SLINK=y
CONFIG_SPI_UNIPHIER=y
CONFIG_SPI_XCOMM=m
CONFIG_SPI_XILINX=m
# CONFIG_SPI_XLP is not set
# CONFIG_SPI_XTENSA_XTFPGA is not set
CONFIG_SPI_ZYNQ_QSPI=m
CONFIG_SPI_ZYNQMP_GQSPI=y
# CONFIG_SPI_AMD is not set

#
# SPI Multiplexer support
#
# CONFIG_SPI_MUX is not set

#
# SPI Protocol Masters
#
CONFIG_SPI_SPIDEV=m
CONFIG_SPI_LOOPBACK_TEST=m
CONFIG_SPI_TLE62X0=y
# CONFIG_SPI_SLAVE is not set
CONFIG_SPI_DYNAMIC=y
CONFIG_SPMI=m
CONFIG_SPMI_HISI3670=m
# CONFIG_SPMI_MSM_PMIC_ARB is not set
CONFIG_SPMI_MTK_PMIF=m
# CONFIG_HSI is not set
CONFIG_PPS=y
# CONFIG_PPS_DEBUG is not set
CONFIG_NTP_PPS=y

#
# PPS clients support
#
# CONFIG_PPS_CLIENT_KTIMER is not set
# CONFIG_PPS_CLIENT_PARPORT is not set
CONFIG_PPS_CLIENT_GPIO=y

#
# PPS generators support
#

#
# PTP clock support
#
CONFIG_PTP_1588_CLOCK=y
CONFIG_PTP_1588_CLOCK_OPTIONAL=y
CONFIG_PTP_1588_CLOCK_DTE=m
CONFIG_PTP_1588_CLOCK_QORIQ=y

#
# Enable PHYLIB and NETWORK_PHY_TIMESTAMPING to see the additional clocks.
#
CONFIG_PTP_1588_CLOCK_IDT82P33=m
CONFIG_PTP_1588_CLOCK_IDTCM=y
# end of PTP clock support

# CONFIG_PINCTRL is not set
CONFIG_GPIOLIB=y
CONFIG_GPIOLIB_FASTPATH_LIMIT=512
CONFIG_OF_GPIO=y
CONFIG_GPIOLIB_IRQCHIP=y
# CONFIG_DEBUG_GPIO is not set
CONFIG_GPIO_SYSFS=y
CONFIG_GPIO_CDEV=y
CONFIG_GPIO_CDEV_V1=y
CONFIG_GPIO_GENERIC=y
CONFIG_GPIO_REGMAP=y
CONFIG_GPIO_MAX730X=y

#
# Memory mapped GPIO drivers
#
CONFIG_GPIO_74XX_MMIO=m
CONFIG_GPIO_ALTERA=y
# CONFIG_GPIO_ASPEED is not set
# CONFIG_GPIO_ASPEED_SGPIO is not set
# CONFIG_GPIO_ATH79 is not set
# CONFIG_GPIO_RASPBERRYPI_EXP is not set
# CONFIG_GPIO_BCM_KONA is not set
CONFIG_GPIO_BCM_XGS_IPROC=y
CONFIG_GPIO_BRCMSTB=m
# CONFIG_GPIO_CADENCE is not set
CONFIG_GPIO_CLPS711X=m
CONFIG_GPIO_DWAPB=y
CONFIG_GPIO_EIC_SPRD=m
CONFIG_GPIO_EM=y
# CONFIG_GPIO_FTGPIO010 is not set
# CONFIG_GPIO_GENERIC_PLATFORM is not set
CONFIG_GPIO_GRGPIO=m
CONFIG_GPIO_HISI=y
CONFIG_GPIO_HLWD=m
# CONFIG_GPIO_IOP is not set
CONFIG_GPIO_LOGICVC=m
# CONFIG_GPIO_LPC18XX is not set
CONFIG_GPIO_LPC32XX=m
# CONFIG_GPIO_MB86S7X is not set
# CONFIG_GPIO_MENZ127 is not set
# CONFIG_GPIO_MPC8XXX is not set
CONFIG_GPIO_MT7621=y
CONFIG_GPIO_MXC=m
CONFIG_GPIO_MXS=y
CONFIG_GPIO_PMIC_EIC_SPRD=y
CONFIG_GPIO_PXA=y
# CONFIG_GPIO_RCAR is not set
CONFIG_GPIO_RDA=y
CONFIG_GPIO_ROCKCHIP=y
CONFIG_GPIO_SAMA5D2_PIOBU=y
CONFIG_GPIO_SIFIVE=y
CONFIG_GPIO_SNPS_CREG=y
CONFIG_GPIO_SPRD=m
# CONFIG_GPIO_STP_XWAY is not set
CONFIG_GPIO_SYSCON=y
# CONFIG_GPIO_TEGRA is not set
CONFIG_GPIO_TEGRA186=y
CONFIG_GPIO_TS4800=m
CONFIG_GPIO_UNIPHIER=m
CONFIG_GPIO_VISCONTI=y
CONFIG_GPIO_XGENE_SB=m
CONFIG_GPIO_XILINX=y
CONFIG_GPIO_XLP=y
# CONFIG_GPIO_AMD_FCH is not set
CONFIG_GPIO_IDT3243X=m
# end of Memory mapped GPIO drivers

#
# I2C GPIO expanders
#
CONFIG_GPIO_ADNP=m
CONFIG_GPIO_GW_PLD=y
# CONFIG_GPIO_MAX7300 is not set
# CONFIG_GPIO_MAX732X is not set
# CONFIG_GPIO_PCA953X is not set
# CONFIG_GPIO_PCA9570 is not set
# CONFIG_GPIO_PCF857X is not set
CONFIG_GPIO_TPIC2810=m
# CONFIG_GPIO_TS4900 is not set
# end of I2C GPIO expanders

#
# MFD GPIO expanders
#
CONFIG_GPIO_ARIZONA=m
CONFIG_GPIO_BD71815=m
# CONFIG_GPIO_BD71828 is not set
CONFIG_GPIO_BD9571MWV=y
# CONFIG_GPIO_DA9052 is not set
CONFIG_GPIO_DA9055=m
CONFIG_GPIO_KEMPLD=m
# CONFIG_GPIO_LP3943 is not set
CONFIG_GPIO_LP87565=y
CONFIG_GPIO_SL28CPLD=y
# CONFIG_GPIO_TC3589X is not set
# CONFIG_GPIO_TPS65086 is not set
CONFIG_GPIO_TPS65218=m
CONFIG_GPIO_TPS6586X=y
CONFIG_GPIO_TPS65910=y
# CONFIG_GPIO_TQMX86 is not set
# CONFIG_GPIO_TWL4030 is not set
# CONFIG_GPIO_WM831X is not set
CONFIG_GPIO_WM8994=m
# end of MFD GPIO expanders

#
# SPI GPIO expanders
#
CONFIG_GPIO_74X164=y
# CONFIG_GPIO_MAX3191X is not set
CONFIG_GPIO_MAX7301=y
CONFIG_GPIO_MC33880=m
# CONFIG_GPIO_PISOSR is not set
CONFIG_GPIO_XRA1403=m
# end of SPI GPIO expanders

#
# Virtual GPIO drivers
#
CONFIG_GPIO_AGGREGATOR=m
# CONFIG_GPIO_MOCKUP is not set
CONFIG_GPIO_VIRTIO=y
CONFIG_GPIO_SIM=m
# end of Virtual GPIO drivers

CONFIG_W1=y
# CONFIG_W1_CON is not set

#
# 1-wire Bus Masters
#
# CONFIG_W1_MASTER_DS2482 is not set
CONFIG_W1_MASTER_MXC=m
CONFIG_W1_MASTER_DS1WM=y
CONFIG_W1_MASTER_GPIO=m
# CONFIG_W1_MASTER_SGI is not set
# end of 1-wire Bus Masters

#
# 1-wire Slaves
#
CONFIG_W1_SLAVE_THERM=y
CONFIG_W1_SLAVE_SMEM=y
CONFIG_W1_SLAVE_DS2405=y
CONFIG_W1_SLAVE_DS2408=m
CONFIG_W1_SLAVE_DS2408_READBACK=y
CONFIG_W1_SLAVE_DS2413=m
CONFIG_W1_SLAVE_DS2406=y
CONFIG_W1_SLAVE_DS2423=m
# CONFIG_W1_SLAVE_DS2805 is not set
CONFIG_W1_SLAVE_DS2430=m
CONFIG_W1_SLAVE_DS2431=m
CONFIG_W1_SLAVE_DS2433=m
# CONFIG_W1_SLAVE_DS2433_CRC is not set
CONFIG_W1_SLAVE_DS2438=y
CONFIG_W1_SLAVE_DS250X=y
CONFIG_W1_SLAVE_DS2780=y
CONFIG_W1_SLAVE_DS2781=y
CONFIG_W1_SLAVE_DS28E04=m
# CONFIG_W1_SLAVE_DS28E17 is not set
# end of 1-wire Slaves

# CONFIG_POWER_RESET is not set
CONFIG_POWER_SUPPLY=y
# CONFIG_POWER_SUPPLY_DEBUG is not set
CONFIG_PDA_POWER=y
CONFIG_GENERIC_ADC_BATTERY=m
CONFIG_IP5XXX_POWER=m
# CONFIG_WM831X_BACKUP is not set
CONFIG_WM831X_POWER=y
CONFIG_TEST_POWER=y
CONFIG_CHARGER_ADP5061=y
CONFIG_BATTERY_ACT8945A=y
CONFIG_BATTERY_CW2015=m
CONFIG_BATTERY_DS2760=y
CONFIG_BATTERY_DS2780=y
CONFIG_BATTERY_DS2781=y
CONFIG_BATTERY_DS2782=y
CONFIG_BATTERY_LEGO_EV3=m
CONFIG_BATTERY_SAMSUNG_SDI=y
# CONFIG_BATTERY_INGENIC is not set
# CONFIG_BATTERY_SBS is not set
CONFIG_CHARGER_SBS=m
CONFIG_MANAGER_SBS=m
CONFIG_BATTERY_BQ27XXX=y
CONFIG_BATTERY_BQ27XXX_I2C=m
CONFIG_BATTERY_BQ27XXX_HDQ=m
CONFIG_BATTERY_BQ27XXX_DT_UPDATES_NVM=y
# CONFIG_BATTERY_DA9030 is not set
CONFIG_BATTERY_DA9052=y
CONFIG_CHARGER_AXP20X=m
# CONFIG_BATTERY_AXP20X is not set
CONFIG_AXP20X_POWER=m
CONFIG_BATTERY_MAX17040=m
# CONFIG_BATTERY_MAX17042 is not set
CONFIG_BATTERY_MAX1721X=y
# CONFIG_BATTERY_TWL4030_MADC is not set
CONFIG_BATTERY_RX51=m
# CONFIG_CHARGER_ISP1704 is not set
# CONFIG_CHARGER_MAX8903 is not set
CONFIG_CHARGER_TWL4030=m
# CONFIG_CHARGER_LP8727 is not set
CONFIG_CHARGER_GPIO=y
CONFIG_CHARGER_MANAGER=y
# CONFIG_CHARGER_LT3651 is not set
CONFIG_CHARGER_LTC4162L=m
CONFIG_CHARGER_MAX14577=m
# CONFIG_CHARGER_DETECTOR_MAX14656 is not set
CONFIG_CHARGER_MAX77693=m
CONFIG_CHARGER_MAX77976=y
CONFIG_CHARGER_MAX8998=y
CONFIG_CHARGER_MP2629=m
CONFIG_CHARGER_MT6370=m
CONFIG_CHARGER_QCOM_SMBB=y
# CONFIG_CHARGER_BQ2415X is not set
CONFIG_CHARGER_BQ24190=y
CONFIG_CHARGER_BQ24257=y
CONFIG_CHARGER_BQ24735=m
CONFIG_CHARGER_BQ2515X=m
CONFIG_CHARGER_BQ25890=y
CONFIG_CHARGER_BQ25980=y
CONFIG_CHARGER_BQ256XX=m
# CONFIG_CHARGER_RK817 is not set
CONFIG_CHARGER_SMB347=m
# CONFIG_CHARGER_TPS65217 is not set
# CONFIG_BATTERY_GAUGE_LTC2941 is not set
CONFIG_BATTERY_GOLDFISH=m
# CONFIG_BATTERY_RT5033 is not set
# CONFIG_CHARGER_RT9455 is not set
CONFIG_CHARGER_CROS_PCHG=y
# CONFIG_CHARGER_SC2731 is not set
CONFIG_FUEL_GAUGE_SC27XX=m
CONFIG_CHARGER_UCS1002=m
# CONFIG_CHARGER_BD99954 is not set
CONFIG_RN5T618_POWER=m
CONFIG_BATTERY_ACER_A500=m
# CONFIG_BATTERY_UG3105 is not set
CONFIG_HWMON=m
CONFIG_HWMON_VID=m
CONFIG_HWMON_DEBUG_CHIP=y

#
# Native drivers
#
CONFIG_SENSORS_AD7314=m
CONFIG_SENSORS_AD7414=m
CONFIG_SENSORS_AD7418=m
CONFIG_SENSORS_ADM1025=m
CONFIG_SENSORS_ADM1026=m
# CONFIG_SENSORS_ADM1029 is not set
CONFIG_SENSORS_ADM1031=m
CONFIG_SENSORS_ADM1177=m
CONFIG_SENSORS_ADM9240=m
CONFIG_SENSORS_ADT7X10=m
CONFIG_SENSORS_ADT7310=m
# CONFIG_SENSORS_ADT7410 is not set
CONFIG_SENSORS_ADT7411=m
# CONFIG_SENSORS_ADT7462 is not set
CONFIG_SENSORS_ADT7470=m
CONFIG_SENSORS_ADT7475=m
CONFIG_SENSORS_AHT10=m
CONFIG_SENSORS_AS370=m
CONFIG_SENSORS_ASC7621=m
CONFIG_SENSORS_AXI_FAN_CONTROL=m
CONFIG_SENSORS_ASB100=m
CONFIG_SENSORS_ASPEED=m
# CONFIG_SENSORS_ATXP1 is not set
# CONFIG_SENSORS_BT1_PVT is not set
CONFIG_SENSORS_DS620=m
# CONFIG_SENSORS_DS1621 is not set
CONFIG_SENSORS_DA9052_ADC=m
# CONFIG_SENSORS_DA9055 is not set
CONFIG_SENSORS_SPARX5=m
# CONFIG_SENSORS_F71805F is not set
# CONFIG_SENSORS_F71882FG is not set
# CONFIG_SENSORS_F75375S is not set
CONFIG_SENSORS_GSC=m
# CONFIG_SENSORS_MC13783_ADC is not set
CONFIG_SENSORS_FSCHMD=m
CONFIG_SENSORS_GL518SM=m
CONFIG_SENSORS_GL520SM=m
# CONFIG_SENSORS_G760A is not set
CONFIG_SENSORS_G762=m
# CONFIG_SENSORS_GPIO_FAN is not set
CONFIG_SENSORS_HIH6130=m
CONFIG_SENSORS_IBMAEM=m
CONFIG_SENSORS_IBMPEX=m
CONFIG_SENSORS_IIO_HWMON=m
CONFIG_SENSORS_IT87=m
CONFIG_SENSORS_JC42=m
# CONFIG_SENSORS_POWR1220 is not set
CONFIG_SENSORS_LAN966X=m
# CONFIG_SENSORS_LINEAGE is not set
CONFIG_SENSORS_LTC2945=m
CONFIG_SENSORS_LTC2947=m
CONFIG_SENSORS_LTC2947_I2C=m
CONFIG_SENSORS_LTC2947_SPI=m
# CONFIG_SENSORS_LTC2990 is not set
CONFIG_SENSORS_LTC2992=m
CONFIG_SENSORS_LTC4151=m
# CONFIG_SENSORS_LTC4215 is not set
# CONFIG_SENSORS_LTC4222 is not set
CONFIG_SENSORS_LTC4245=m
CONFIG_SENSORS_LTC4260=m
CONFIG_SENSORS_LTC4261=m
CONFIG_SENSORS_MAX1111=m
# CONFIG_SENSORS_MAX127 is not set
CONFIG_SENSORS_MAX16065=m
CONFIG_SENSORS_MAX1619=m
CONFIG_SENSORS_MAX1668=m
CONFIG_SENSORS_MAX197=m
CONFIG_SENSORS_MAX31722=m
CONFIG_SENSORS_MAX31730=m
CONFIG_SENSORS_MAX31760=m
# CONFIG_SENSORS_MAX6620 is not set
# CONFIG_SENSORS_MAX6621 is not set
CONFIG_SENSORS_MAX6639=m
CONFIG_SENSORS_MAX6650=m
# CONFIG_SENSORS_MAX6697 is not set
CONFIG_SENSORS_MAX31790=m
# CONFIG_SENSORS_MCP3021 is not set
# CONFIG_SENSORS_TC654 is not set
CONFIG_SENSORS_TPS23861=m
CONFIG_SENSORS_MENF21BMC_HWMON=m
CONFIG_SENSORS_MR75203=m
# CONFIG_SENSORS_ADCXX is not set
CONFIG_SENSORS_LM63=m
CONFIG_SENSORS_LM70=m
CONFIG_SENSORS_LM73=m
# CONFIG_SENSORS_LM75 is not set
CONFIG_SENSORS_LM77=m
CONFIG_SENSORS_LM78=m
CONFIG_SENSORS_LM80=m
CONFIG_SENSORS_LM83=m
CONFIG_SENSORS_LM85=m
CONFIG_SENSORS_LM87=m
CONFIG_SENSORS_LM90=m
CONFIG_SENSORS_LM92=m
# CONFIG_SENSORS_LM93 is not set
# CONFIG_SENSORS_LM95234 is not set
# CONFIG_SENSORS_LM95241 is not set
CONFIG_SENSORS_LM95245=m
CONFIG_SENSORS_PC87360=m
CONFIG_SENSORS_PC87427=m
# CONFIG_SENSORS_NTC_THERMISTOR is not set
CONFIG_SENSORS_NCT6683=m
CONFIG_SENSORS_NCT6775_CORE=m
CONFIG_SENSORS_NCT6775=m
# CONFIG_SENSORS_NCT6775_I2C is not set
CONFIG_SENSORS_NCT7802=m
# CONFIG_SENSORS_NPCM7XX is not set
CONFIG_SENSORS_NSA320=m
CONFIG_SENSORS_OCC_P8_I2C=m
# CONFIG_SENSORS_OCC_P9_SBE is not set
CONFIG_SENSORS_OCC=m
CONFIG_SENSORS_PCF8591=m
# CONFIG_PMBUS is not set
CONFIG_SENSORS_PWM_FAN=m
# CONFIG_SENSORS_RASPBERRYPI_HWMON is not set
CONFIG_SENSORS_SL28CPLD=m
# CONFIG_SENSORS_SBTSI is not set
CONFIG_SENSORS_SBRMI=m
# CONFIG_SENSORS_SHT15 is not set
CONFIG_SENSORS_SHT21=m
CONFIG_SENSORS_SHT3x=m
CONFIG_SENSORS_SHT4x=m
CONFIG_SENSORS_SHTC1=m
CONFIG_SENSORS_SY7636A=m
# CONFIG_SENSORS_DME1737 is not set
CONFIG_SENSORS_EMC1403=m
CONFIG_SENSORS_EMC2103=m
# CONFIG_SENSORS_EMC2305 is not set
CONFIG_SENSORS_EMC6W201=m
# CONFIG_SENSORS_SMSC47M1 is not set
CONFIG_SENSORS_SMSC47M192=m
# CONFIG_SENSORS_SMSC47B397 is not set
# CONFIG_SENSORS_STTS751 is not set
CONFIG_SENSORS_SMM665=m
CONFIG_SENSORS_ADC128D818=m
CONFIG_SENSORS_ADS7828=m
# CONFIG_SENSORS_ADS7871 is not set
CONFIG_SENSORS_AMC6821=m
CONFIG_SENSORS_INA209=m
CONFIG_SENSORS_INA2XX=m
# CONFIG_SENSORS_INA238 is not set
CONFIG_SENSORS_INA3221=m
CONFIG_SENSORS_TC74=m
CONFIG_SENSORS_THMC50=m
CONFIG_SENSORS_TMP102=m
# CONFIG_SENSORS_TMP103 is not set
# CONFIG_SENSORS_TMP108 is not set
CONFIG_SENSORS_TMP401=m
CONFIG_SENSORS_TMP421=m
CONFIG_SENSORS_TMP464=m
CONFIG_SENSORS_TMP513=m
CONFIG_SENSORS_VT1211=m
CONFIG_SENSORS_W83773G=m
CONFIG_SENSORS_W83781D=m
# CONFIG_SENSORS_W83791D is not set
CONFIG_SENSORS_W83792D=m
CONFIG_SENSORS_W83793=m
CONFIG_SENSORS_W83795=m
# CONFIG_SENSORS_W83795_FANCTRL is not set
CONFIG_SENSORS_W83L785TS=m
CONFIG_SENSORS_W83L786NG=m
CONFIG_SENSORS_W83627HF=m
CONFIG_SENSORS_W83627EHF=m
# CONFIG_SENSORS_WM831X is not set
CONFIG_SENSORS_INTEL_M10_BMC_HWMON=m
CONFIG_THERMAL=y
# CONFIG_THERMAL_NETLINK is not set
# CONFIG_THERMAL_STATISTICS is not set
CONFIG_THERMAL_EMERGENCY_POWEROFF_DELAY_MS=0
# CONFIG_THERMAL_OF is not set
CONFIG_THERMAL_WRITABLE_TRIPS=y
CONFIG_THERMAL_DEFAULT_GOV_STEP_WISE=y
# CONFIG_THERMAL_DEFAULT_GOV_FAIR_SHARE is not set
# CONFIG_THERMAL_DEFAULT_GOV_USER_SPACE is not set
CONFIG_THERMAL_GOV_FAIR_SHARE=y
CONFIG_THERMAL_GOV_STEP_WISE=y
CONFIG_THERMAL_GOV_BANG_BANG=y
# CONFIG_THERMAL_GOV_USER_SPACE is not set
CONFIG_DEVFREQ_THERMAL=y
CONFIG_THERMAL_EMULATION=y
CONFIG_THERMAL_MMIO=m
CONFIG_HISI_THERMAL=m
# CONFIG_IMX_THERMAL is not set
# CONFIG_IMX8MM_THERMAL is not set
# CONFIG_K3_THERMAL is not set
# CONFIG_SPEAR_THERMAL is not set
CONFIG_SUN8I_THERMAL=y
CONFIG_ROCKCHIP_THERMAL=m
# CONFIG_RCAR_THERMAL is not set
# CONFIG_RCAR_GEN3_THERMAL is not set
CONFIG_RZG2L_THERMAL=m
CONFIG_KIRKWOOD_THERMAL=y
# CONFIG_DOVE_THERMAL is not set
CONFIG_ARMADA_THERMAL=m
CONFIG_DA9062_THERMAL=y
# CONFIG_MTK_THERMAL is not set

#
# Intel thermal drivers
#

#
# ACPI INT340X thermal drivers
#
# end of ACPI INT340X thermal drivers
# end of Intel thermal drivers

#
# Broadcom thermal drivers
#
# CONFIG_BRCMSTB_THERMAL is not set
# CONFIG_BCM_NS_THERMAL is not set
CONFIG_BCM_SR_THERMAL=y
# end of Broadcom thermal drivers

#
# Texas Instruments thermal drivers
#
# CONFIG_TI_SOC_THERMAL is not set
# end of Texas Instruments thermal drivers

#
# Samsung thermal drivers
#
# end of Samsung thermal drivers

#
# NVIDIA Tegra thermal drivers
#
# CONFIG_TEGRA_SOCTHERM is not set
# CONFIG_TEGRA_BPMP_THERMAL is not set
CONFIG_TEGRA30_TSENSOR=m
# end of NVIDIA Tegra thermal drivers

CONFIG_GENERIC_ADC_THERMAL=m

#
# Qualcomm thermal drivers
#
CONFIG_QCOM_SPMI_ADC_TM5=m
CONFIG_QCOM_SPMI_TEMP_ALARM=m
# end of Qualcomm thermal drivers

CONFIG_SPRD_THERMAL=y
# CONFIG_WATCHDOG is not set
CONFIG_SSB_POSSIBLE=y
# CONFIG_SSB is not set
CONFIG_BCMA_POSSIBLE=y
CONFIG_BCMA=m
# CONFIG_BCMA_HOST_SOC is not set
# CONFIG_BCMA_DRIVER_MIPS is not set
# CONFIG_BCMA_DRIVER_GMAC_CMN is not set
# CONFIG_BCMA_DRIVER_GPIO is not set
# CONFIG_BCMA_DEBUG is not set

#
# Multifunction device drivers
#
CONFIG_MFD_CORE=y
CONFIG_MFD_ACT8945A=m
CONFIG_MFD_SUN4I_GPADC=y
CONFIG_MFD_AS3711=y
CONFIG_MFD_AS3722=m
# CONFIG_PMIC_ADP5520 is not set
CONFIG_MFD_AAT2870_CORE=y
CONFIG_MFD_AT91_USART=y
CONFIG_MFD_ATMEL_FLEXCOM=m
CONFIG_MFD_ATMEL_HLCDC=m
# CONFIG_MFD_BCM590XX is not set
CONFIG_MFD_BD9571MWV=y
CONFIG_MFD_AXP20X=y
CONFIG_MFD_AXP20X_I2C=y
CONFIG_MFD_CROS_EC_DEV=y
# CONFIG_MFD_MADERA is not set
CONFIG_MFD_ASIC3=y
CONFIG_PMIC_DA903X=y
CONFIG_PMIC_DA9052=y
CONFIG_MFD_DA9052_SPI=y
CONFIG_MFD_DA9052_I2C=y
CONFIG_MFD_DA9055=y
CONFIG_MFD_DA9062=m
# CONFIG_MFD_DA9063 is not set
# CONFIG_MFD_DA9150 is not set
CONFIG_MFD_ENE_KB3930=y
CONFIG_MFD_EXYNOS_LPASS=m
CONFIG_MFD_GATEWORKS_GSC=m
CONFIG_MFD_MC13XXX=m
# CONFIG_MFD_MC13XXX_SPI is not set
CONFIG_MFD_MC13XXX_I2C=m
CONFIG_MFD_MP2629=m
# CONFIG_MFD_MXS_LRADC is not set
CONFIG_MFD_MX25_TSADC=m
CONFIG_MFD_HI6421_PMIC=y
CONFIG_MFD_HI6421_SPMI=m
CONFIG_MFD_HI655X_PMIC=y
CONFIG_HTC_PASIC3=m
# CONFIG_HTC_I2CPLD is not set
# CONFIG_MFD_IQS62X is not set
CONFIG_MFD_KEMPLD=m
CONFIG_MFD_88PM800=m
# CONFIG_MFD_88PM805 is not set
# CONFIG_MFD_88PM860X is not set
CONFIG_MFD_MAX14577=m
# CONFIG_MFD_MAX77620 is not set
# CONFIG_MFD_MAX77650 is not set
CONFIG_MFD_MAX77686=y
CONFIG_MFD_MAX77693=m
CONFIG_MFD_MAX77714=m
CONFIG_MFD_MAX77843=y
# CONFIG_MFD_MAX8907 is not set
# CONFIG_MFD_MAX8925 is not set
# CONFIG_MFD_MAX8997 is not set
CONFIG_MFD_MAX8998=y
# CONFIG_MFD_MT6360 is not set
CONFIG_MFD_MT6370=y
CONFIG_MFD_MT6397=m
CONFIG_MFD_MENF21BMC=y
CONFIG_MFD_OCELOT=y
# CONFIG_EZX_PCAP is not set
# CONFIG_MFD_CPCAP is not set
# CONFIG_MFD_NTXEC is not set
CONFIG_MFD_RETU=m
# CONFIG_MFD_PCF50633 is not set
# CONFIG_MFD_PM8XXX is not set
# CONFIG_MFD_SPMI_PMIC is not set
CONFIG_MFD_SY7636A=m
# CONFIG_MFD_RT4831 is not set
CONFIG_MFD_RT5033=m
CONFIG_MFD_RT5120=y
# CONFIG_MFD_RC5T583 is not set
CONFIG_MFD_RK808=y
CONFIG_MFD_RN5T618=m
CONFIG_MFD_SEC_CORE=m
CONFIG_MFD_SI476X_CORE=m
CONFIG_MFD_SIMPLE_MFD_I2C=y
CONFIG_MFD_SL28CPLD=y
CONFIG_MFD_SM501=y
CONFIG_MFD_SM501_GPIO=y
# CONFIG_MFD_SKY81452 is not set
CONFIG_MFD_SC27XX_PMIC=m
CONFIG_ABX500_CORE=y
# CONFIG_MFD_STMPE is not set
CONFIG_MFD_SUN6I_PRCM=y
CONFIG_MFD_SYSCON=y
# CONFIG_MFD_TI_AM335X_TSCADC is not set
CONFIG_MFD_LP3943=m
CONFIG_MFD_LP8788=y
# CONFIG_MFD_TI_LMU is not set
# CONFIG_MFD_PALMAS is not set
CONFIG_TPS6105X=m
CONFIG_TPS65010=y
# CONFIG_TPS6507X is not set
CONFIG_MFD_TPS65086=m
# CONFIG_MFD_TPS65090 is not set
CONFIG_MFD_TPS65217=y
# CONFIG_MFD_TI_LP873X is not set
CONFIG_MFD_TI_LP87565=y
CONFIG_MFD_TPS65218=m
CONFIG_MFD_TPS6586X=y
CONFIG_MFD_TPS65910=y
# CONFIG_MFD_TPS65912_I2C is not set
# CONFIG_MFD_TPS65912_SPI is not set
CONFIG_TWL4030_CORE=y
CONFIG_MFD_TWL4030_AUDIO=y
# CONFIG_TWL6040_CORE is not set
# CONFIG_MFD_WL1273_CORE is not set
CONFIG_MFD_LM3533=m
CONFIG_MFD_TC3589X=y
CONFIG_MFD_TQMX86=y
# CONFIG_MFD_LOCHNAGAR is not set
CONFIG_MFD_ARIZONA=m
# CONFIG_MFD_ARIZONA_I2C is not set
CONFIG_MFD_ARIZONA_SPI=m
# CONFIG_MFD_CS47L24 is not set
CONFIG_MFD_WM5102=y
# CONFIG_MFD_WM5110 is not set
CONFIG_MFD_WM8997=y
# CONFIG_MFD_WM8998 is not set
# CONFIG_MFD_WM8400 is not set
CONFIG_MFD_WM831X=y
CONFIG_MFD_WM831X_I2C=y
CONFIG_MFD_WM831X_SPI=y
# CONFIG_MFD_WM8350_I2C is not set
CONFIG_MFD_WM8994=m
# CONFIG_MFD_STW481X is not set
CONFIG_MFD_ROHM_BD718XX=y
CONFIG_MFD_ROHM_BD71828=y
# CONFIG_MFD_ROHM_BD957XMUF is not set
CONFIG_MFD_STM32_LPTIMER=m
CONFIG_MFD_STM32_TIMERS=m
CONFIG_MFD_STPMIC1=m
CONFIG_MFD_STMFX=y
# CONFIG_MFD_WCD934X is not set
CONFIG_MFD_ATC260X=m
CONFIG_MFD_ATC260X_I2C=m
# CONFIG_MFD_KHADAS_MCU is not set
CONFIG_MFD_ACER_A500_EC=m
CONFIG_MFD_QCOM_PM8008=y
CONFIG_RAVE_SP_CORE=m
CONFIG_MFD_INTEL_M10_BMC=m
# CONFIG_MFD_RSMU_I2C is not set
CONFIG_MFD_RSMU_SPI=y
# end of Multifunction device drivers

CONFIG_REGULATOR=y
CONFIG_REGULATOR_DEBUG=y
CONFIG_REGULATOR_FIXED_VOLTAGE=y
CONFIG_REGULATOR_VIRTUAL_CONSUMER=m
CONFIG_REGULATOR_USERSPACE_CONSUMER=y
# CONFIG_REGULATOR_88PG86X is not set
CONFIG_REGULATOR_88PM800=m
CONFIG_REGULATOR_ACT8865=m
CONFIG_REGULATOR_ACT8945A=m
CONFIG_REGULATOR_AD5398=m
CONFIG_REGULATOR_ANATOP=m
CONFIG_REGULATOR_AAT2870=y
# CONFIG_REGULATOR_AS3711 is not set
# CONFIG_REGULATOR_AS3722 is not set
# CONFIG_REGULATOR_ATC260X is not set
# CONFIG_REGULATOR_AXP20X is not set
# CONFIG_REGULATOR_BD71815 is not set
CONFIG_REGULATOR_BD71828=y
CONFIG_REGULATOR_BD718XX=m
CONFIG_REGULATOR_BD9571MWV=m
CONFIG_REGULATOR_CROS_EC=m
# CONFIG_REGULATOR_DA9052 is not set
# CONFIG_REGULATOR_DA9055 is not set
CONFIG_REGULATOR_DA9062=m
CONFIG_REGULATOR_DA9121=m
CONFIG_REGULATOR_DA9210=y
CONFIG_REGULATOR_DA9211=y
CONFIG_REGULATOR_FAN53555=m
CONFIG_REGULATOR_FAN53880=y
CONFIG_REGULATOR_GPIO=y
CONFIG_REGULATOR_HI6421=m
CONFIG_REGULATOR_HI6421V530=m
CONFIG_REGULATOR_HI655X=y
CONFIG_REGULATOR_HI6421V600=m
# CONFIG_REGULATOR_ISL9305 is not set
CONFIG_REGULATOR_ISL6271A=m
# CONFIG_REGULATOR_LP3971 is not set
# CONFIG_REGULATOR_LP3972 is not set
# CONFIG_REGULATOR_LP872X is not set
# CONFIG_REGULATOR_LP8755 is not set
CONFIG_REGULATOR_LP87565=y
CONFIG_REGULATOR_LP8788=m
# CONFIG_REGULATOR_LTC3589 is not set
CONFIG_REGULATOR_LTC3676=y
CONFIG_REGULATOR_MAX14577=m
# CONFIG_REGULATOR_MAX1586 is not set
CONFIG_REGULATOR_MAX77620=y
CONFIG_REGULATOR_MAX77650=y
# CONFIG_REGULATOR_MAX8649 is not set
CONFIG_REGULATOR_MAX8660=m
CONFIG_REGULATOR_MAX8893=m
# CONFIG_REGULATOR_MAX8907 is not set
CONFIG_REGULATOR_MAX8952=y
CONFIG_REGULATOR_MAX8998=y
CONFIG_REGULATOR_MAX20086=y
CONFIG_REGULATOR_MAX77686=y
CONFIG_REGULATOR_MAX77693=m
CONFIG_REGULATOR_MAX77802=m
CONFIG_REGULATOR_MAX77826=m
CONFIG_REGULATOR_MC13XXX_CORE=m
CONFIG_REGULATOR_MC13783=m
CONFIG_REGULATOR_MC13892=m
CONFIG_REGULATOR_MCP16502=y
# CONFIG_REGULATOR_MP5416 is not set
CONFIG_REGULATOR_MP8859=y
# CONFIG_REGULATOR_MP886X is not set
CONFIG_REGULATOR_MPQ7920=y
CONFIG_REGULATOR_MT6311=m
# CONFIG_REGULATOR_MT6315 is not set
# CONFIG_REGULATOR_MT6323 is not set
CONFIG_REGULATOR_MT6331=m
# CONFIG_REGULATOR_MT6332 is not set
# CONFIG_REGULATOR_MT6358 is not set
# CONFIG_REGULATOR_MT6359 is not set
# CONFIG_REGULATOR_MT6370 is not set
CONFIG_REGULATOR_MT6380=m
CONFIG_REGULATOR_MT6397=m
CONFIG_REGULATOR_PBIAS=m
CONFIG_REGULATOR_PCA9450=y
CONFIG_REGULATOR_PF8X00=m
CONFIG_REGULATOR_PFUZE100=y
CONFIG_REGULATOR_PV88060=y
CONFIG_REGULATOR_PV88080=y
CONFIG_REGULATOR_PV88090=y
CONFIG_REGULATOR_PWM=y
# CONFIG_REGULATOR_QCOM_RPMH is not set
# CONFIG_REGULATOR_QCOM_SMD_RPM is not set
# CONFIG_REGULATOR_QCOM_SPMI is not set
# CONFIG_REGULATOR_QCOM_USB_VBUS is not set
# CONFIG_REGULATOR_RASPBERRYPI_TOUCHSCREEN_ATTINY is not set
CONFIG_REGULATOR_RK808=y
CONFIG_REGULATOR_RN5T618=m
CONFIG_REGULATOR_ROHM=y
# CONFIG_REGULATOR_RT4801 is not set
# CONFIG_REGULATOR_RT5033 is not set
CONFIG_REGULATOR_RT5120=y
# CONFIG_REGULATOR_RT5190A is not set
# CONFIG_REGULATOR_RT5759 is not set
# CONFIG_REGULATOR_RT6160 is not set
# CONFIG_REGULATOR_RT6245 is not set
CONFIG_REGULATOR_RTQ2134=m
# CONFIG_REGULATOR_RTMV20 is not set
# CONFIG_REGULATOR_RTQ6752 is not set
CONFIG_REGULATOR_S2MPA01=m
CONFIG_REGULATOR_S2MPS11=y
CONFIG_REGULATOR_S5M8767=y
# CONFIG_REGULATOR_SC2731 is not set
CONFIG_REGULATOR_SLG51000=m
CONFIG_REGULATOR_STM32_BOOSTER=y
# CONFIG_REGULATOR_STM32_VREFBUF is not set
# CONFIG_REGULATOR_STM32_PWR is not set
# CONFIG_REGULATOR_STPMIC1 is not set
CONFIG_REGULATOR_TI_ABB=y
CONFIG_REGULATOR_STW481X_VMMC=y
CONFIG_REGULATOR_SY7636A=m
CONFIG_REGULATOR_SY8106A=y
CONFIG_REGULATOR_SY8824X=m
CONFIG_REGULATOR_SY8827N=m
# CONFIG_REGULATOR_TPS51632 is not set
CONFIG_REGULATOR_TPS6105X=m
CONFIG_REGULATOR_TPS62360=m
CONFIG_REGULATOR_TPS6286X=m
# CONFIG_REGULATOR_TPS65023 is not set
CONFIG_REGULATOR_TPS6507X=m
CONFIG_REGULATOR_TPS65086=m
CONFIG_REGULATOR_TPS65132=y
CONFIG_REGULATOR_TPS65217=y
# CONFIG_REGULATOR_TPS65218 is not set
CONFIG_REGULATOR_TPS6524X=y
CONFIG_REGULATOR_TPS6586X=y
CONFIG_REGULATOR_TPS65910=y
# CONFIG_REGULATOR_TPS68470 is not set
# CONFIG_REGULATOR_TWL4030 is not set
# CONFIG_REGULATOR_UNIPHIER is not set
CONFIG_REGULATOR_VCTRL=m
CONFIG_REGULATOR_WM831X=m
# CONFIG_REGULATOR_WM8994 is not set
# CONFIG_REGULATOR_QCOM_LABIBB is not set
CONFIG_CEC_CORE=y
CONFIG_CEC_NOTIFIER=y
CONFIG_CEC_PIN=y

#
# CEC support
#
# CONFIG_CEC_PIN_ERROR_INJ is not set
CONFIG_MEDIA_CEC_SUPPORT=y
CONFIG_CEC_CH7322=y
CONFIG_CEC_CROS_EC=y
CONFIG_CEC_MESON_AO=m
# CONFIG_CEC_MESON_G12A_AO is not set
CONFIG_CEC_GPIO=m
# CONFIG_CEC_SAMSUNG_S5P is not set
CONFIG_CEC_STI=m
CONFIG_CEC_STM32=y
CONFIG_CEC_TEGRA=y
# end of CEC support

# CONFIG_MEDIA_SUPPORT is not set

#
# Graphics support
#
CONFIG_APERTURE_HELPERS=y
CONFIG_IMX_IPUV3_CORE=m
CONFIG_DRM=y
CONFIG_DRM_MIPI_DBI=y
CONFIG_DRM_MIPI_DSI=y
CONFIG_DRM_DEBUG_MM=y
CONFIG_DRM_KMS_HELPER=y
CONFIG_DRM_DEBUG_DP_MST_TOPOLOGY_REFS=y
CONFIG_DRM_DEBUG_MODESET_LOCK=y
CONFIG_DRM_LOAD_EDID_FIRMWARE=y
CONFIG_DRM_DP_AUX_BUS=y
CONFIG_DRM_DISPLAY_HELPER=y
CONFIG_DRM_DISPLAY_DP_HELPER=y
CONFIG_DRM_DISPLAY_HDCP_HELPER=y
# CONFIG_DRM_DP_AUX_CHARDEV is not set
# CONFIG_DRM_DP_CEC is not set
CONFIG_DRM_GEM_DMA_HELPER=y
CONFIG_DRM_GEM_SHMEM_HELPER=y
CONFIG_DRM_SCHED=m

#
# I2C encoder or helper chips
#
CONFIG_DRM_I2C_CH7006=y
# CONFIG_DRM_I2C_SIL164 is not set
CONFIG_DRM_I2C_NXP_TDA998X=y
CONFIG_DRM_I2C_NXP_TDA9950=m
# end of I2C encoder or helper chips

#
# ARM devices
#
CONFIG_DRM_HDLCD=m
# CONFIG_DRM_HDLCD_SHOW_UNDERRUN is not set
CONFIG_DRM_MALI_DISPLAY=y
# CONFIG_DRM_KOMEDA is not set
# end of ARM devices

CONFIG_DRM_KMB_DISPLAY=y
CONFIG_DRM_VGEM=y
CONFIG_DRM_VKMS=y
CONFIG_DRM_EXYNOS=y

#
# CRTCs
#
# CONFIG_DRM_EXYNOS_FIMD is not set
# CONFIG_DRM_EXYNOS5433_DECON is not set
CONFIG_DRM_EXYNOS7_DECON=y
# CONFIG_DRM_EXYNOS_MIXER is not set
CONFIG_DRM_EXYNOS_VIDI=y

#
# Encoders and Bridges
#
CONFIG_DRM_EXYNOS_DSI=y
CONFIG_DRM_EXYNOS_DP=y

#
# Sub-drivers
#
CONFIG_DRM_EXYNOS_G2D=y
CONFIG_DRM_EXYNOS_IPP=y
CONFIG_DRM_EXYNOS_FIMC=y
# CONFIG_DRM_EXYNOS_ROTATOR is not set
CONFIG_DRM_EXYNOS_SCALER=y
# CONFIG_DRM_EXYNOS_GSC is not set
CONFIG_DRM_ROCKCHIP=m
CONFIG_ROCKCHIP_VOP=y
# CONFIG_ROCKCHIP_VOP2 is not set
CONFIG_ROCKCHIP_ANALOGIX_DP=y
# CONFIG_ROCKCHIP_CDN_DP is not set
# CONFIG_ROCKCHIP_DW_HDMI is not set
CONFIG_ROCKCHIP_DW_MIPI_DSI=y
CONFIG_ROCKCHIP_INNO_HDMI=y
CONFIG_ROCKCHIP_RK3066_HDMI=y
# CONFIG_DRM_RCAR_DW_HDMI is not set
# CONFIG_DRM_RCAR_USE_LVDS is not set
# CONFIG_DRM_RCAR_USE_MIPI_DSI is not set
# CONFIG_DRM_SUN4I is not set
CONFIG_DRM_VIRTIO_GPU=m
# CONFIG_DRM_MSM is not set
CONFIG_DRM_PANEL=y

#
# Display Panels
#
CONFIG_DRM_PANEL_ABT_Y030XX067A=m
# CONFIG_DRM_PANEL_ARM_VERSATILE is not set
CONFIG_DRM_PANEL_ASUS_Z00T_TM5P5_NT35596=y
# CONFIG_DRM_PANEL_BOE_BF060Y8M_AJ0 is not set
# CONFIG_DRM_PANEL_BOE_HIMAX8279D is not set
# CONFIG_DRM_PANEL_BOE_TV101WUM_NL6 is not set
CONFIG_DRM_PANEL_DSI_CM=m
CONFIG_DRM_PANEL_LVDS=m
CONFIG_DRM_PANEL_EBBG_FT8719=m
# CONFIG_DRM_PANEL_ELIDA_KD35T133 is not set
CONFIG_DRM_PANEL_FEIXIN_K101_IM2BA02=y
# CONFIG_DRM_PANEL_FEIYANG_FY07024DI26A30D is not set
CONFIG_DRM_PANEL_ILITEK_IL9322=y
# CONFIG_DRM_PANEL_ILITEK_ILI9341 is not set
# CONFIG_DRM_PANEL_ILITEK_ILI9881C is not set
# CONFIG_DRM_PANEL_INNOLUX_EJ030NA is not set
CONFIG_DRM_PANEL_INNOLUX_P079ZCA=m
CONFIG_DRM_PANEL_JDI_LT070ME05000=m
# CONFIG_DRM_PANEL_JDI_R63452 is not set
CONFIG_DRM_PANEL_KHADAS_TS050=y
# CONFIG_DRM_PANEL_KINGDISPLAY_KD097D04 is not set
CONFIG_DRM_PANEL_LEADTEK_LTK050H3146W=y
CONFIG_DRM_PANEL_LEADTEK_LTK500HD1829=y
# CONFIG_DRM_PANEL_SAMSUNG_LD9040 is not set
CONFIG_DRM_PANEL_LG_LB035Q02=y
CONFIG_DRM_PANEL_LG_LG4573=m
CONFIG_DRM_PANEL_NEC_NL8048HL11=m
CONFIG_DRM_PANEL_NEWVISION_NV3052C=y
CONFIG_DRM_PANEL_NOVATEK_NT35510=m
CONFIG_DRM_PANEL_NOVATEK_NT35560=y
CONFIG_DRM_PANEL_NOVATEK_NT35950=y
# CONFIG_DRM_PANEL_NOVATEK_NT36672A is not set
# CONFIG_DRM_PANEL_NOVATEK_NT39016 is not set
# CONFIG_DRM_PANEL_MANTIX_MLAF057WE51 is not set
CONFIG_DRM_PANEL_OLIMEX_LCD_OLINUXINO=y
CONFIG_DRM_PANEL_ORISETECH_OTM8009A=y
# CONFIG_DRM_PANEL_OSD_OSD101T2587_53TS is not set
# CONFIG_DRM_PANEL_PANASONIC_VVX10F034N00 is not set
# CONFIG_DRM_PANEL_RASPBERRYPI_TOUCHSCREEN is not set
# CONFIG_DRM_PANEL_RAYDIUM_RM67191 is not set
# CONFIG_DRM_PANEL_RAYDIUM_RM68200 is not set
CONFIG_DRM_PANEL_RONBO_RB070D30=m
CONFIG_DRM_PANEL_SAMSUNG_DB7430=y
# CONFIG_DRM_PANEL_SAMSUNG_S6D16D0 is not set
# CONFIG_DRM_PANEL_SAMSUNG_S6D27A1 is not set
# CONFIG_DRM_PANEL_SAMSUNG_S6E3HA2 is not set
# CONFIG_DRM_PANEL_SAMSUNG_S6E63J0X03 is not set
CONFIG_DRM_PANEL_SAMSUNG_S6E63M0=m
# CONFIG_DRM_PANEL_SAMSUNG_S6E63M0_SPI is not set
CONFIG_DRM_PANEL_SAMSUNG_S6E63M0_DSI=m
# CONFIG_DRM_PANEL_SAMSUNG_S6E88A0_AMS452EF01 is not set
# CONFIG_DRM_PANEL_SAMSUNG_S6E8AA0 is not set
CONFIG_DRM_PANEL_SAMSUNG_SOFEF00=y
# CONFIG_DRM_PANEL_SEIKO_43WVF1G is not set
# CONFIG_DRM_PANEL_SHARP_LQ101R1SX01 is not set
# CONFIG_DRM_PANEL_SHARP_LS037V7DW01 is not set
CONFIG_DRM_PANEL_SHARP_LS043T1LE01=y
CONFIG_DRM_PANEL_SHARP_LS060T1SX01=m
# CONFIG_DRM_PANEL_SITRONIX_ST7701 is not set
# CONFIG_DRM_PANEL_SITRONIX_ST7703 is not set
CONFIG_DRM_PANEL_SITRONIX_ST7789V=m
CONFIG_DRM_PANEL_SONY_ACX565AKM=y
CONFIG_DRM_PANEL_SONY_TULIP_TRULY_NT35521=m
# CONFIG_DRM_PANEL_TDO_TL070WSH30 is not set
CONFIG_DRM_PANEL_TPO_TD028TTEC1=m
CONFIG_DRM_PANEL_TPO_TD043MTEA1=y
CONFIG_DRM_PANEL_TPO_TPG110=y
CONFIG_DRM_PANEL_TRULY_NT35597_WQXGA=m
# CONFIG_DRM_PANEL_VISIONOX_RM69299 is not set
CONFIG_DRM_PANEL_WIDECHIPS_WS2401=y
CONFIG_DRM_PANEL_XINPENG_XPP055C272=m
# end of Display Panels

CONFIG_DRM_BRIDGE=y
CONFIG_DRM_PANEL_BRIDGE=y

#
# Display Interface Bridges
#
CONFIG_DRM_CDNS_DSI=y
CONFIG_DRM_CHIPONE_ICN6211=m
CONFIG_DRM_CHRONTEL_CH7033=m
# CONFIG_DRM_CROS_EC_ANX7688 is not set
# CONFIG_DRM_DISPLAY_CONNECTOR is not set
# CONFIG_DRM_FSL_LDB is not set
CONFIG_DRM_ITE_IT6505=y
# CONFIG_DRM_LONTIUM_LT8912B is not set
CONFIG_DRM_LONTIUM_LT9211=m
# CONFIG_DRM_LONTIUM_LT9611 is not set
# CONFIG_DRM_LONTIUM_LT9611UXC is not set
CONFIG_DRM_ITE_IT66121=y
# CONFIG_DRM_LVDS_CODEC is not set
# CONFIG_DRM_MEGACHIPS_STDPXXXX_GE_B850V3_FW is not set
CONFIG_DRM_NWL_MIPI_DSI=m
# CONFIG_DRM_NXP_PTN3460 is not set
CONFIG_DRM_PARADE_PS8622=y
# CONFIG_DRM_PARADE_PS8640 is not set
CONFIG_DRM_SIL_SII8620=m
CONFIG_DRM_SII902X=y
# CONFIG_DRM_SII9234 is not set
CONFIG_DRM_SIMPLE_BRIDGE=m
# CONFIG_DRM_THINE_THC63LVD1024 is not set
CONFIG_DRM_TOSHIBA_TC358762=m
CONFIG_DRM_TOSHIBA_TC358764=y
# CONFIG_DRM_TOSHIBA_TC358767 is not set
# CONFIG_DRM_TOSHIBA_TC358768 is not set
# CONFIG_DRM_TOSHIBA_TC358775 is not set
CONFIG_DRM_TI_DLPC3433=y
# CONFIG_DRM_TI_TFP410 is not set
# CONFIG_DRM_TI_SN65DSI83 is not set
CONFIG_DRM_TI_SN65DSI86=m
CONFIG_DRM_TI_TPD12S015=y
CONFIG_DRM_ANALOGIX_ANX6345=m
CONFIG_DRM_ANALOGIX_ANX78XX=y
CONFIG_DRM_ANALOGIX_DP=y
CONFIG_DRM_ANALOGIX_ANX7625=y
# CONFIG_DRM_I2C_ADV7511 is not set
CONFIG_DRM_CDNS_MHDP8546=y
CONFIG_DRM_CDNS_MHDP8546_J721E=y
# CONFIG_DRM_IMX8QM_LDB is not set
# CONFIG_DRM_IMX8QXP_LDB is not set
CONFIG_DRM_IMX8QXP_PIXEL_COMBINER=y
CONFIG_DRM_IMX8QXP_PIXEL_LINK_TO_DPI=y
CONFIG_DRM_DW_MIPI_DSI=m
# end of Display Interface Bridges

# CONFIG_DRM_IMX is not set
# CONFIG_DRM_V3D is not set
CONFIG_DRM_ETNAVIV=m
# CONFIG_DRM_ETNAVIV_THERMAL is not set
CONFIG_DRM_LOGICVC=m
CONFIG_DRM_MXS=y
# CONFIG_DRM_MXSFB is not set
CONFIG_DRM_IMX_LCDIF=y
CONFIG_DRM_ARCPGU=m
CONFIG_DRM_PANEL_MIPI_DBI=y
CONFIG_DRM_SIMPLEDRM=m
# CONFIG_TINYDRM_HX8357D is not set
CONFIG_TINYDRM_ILI9163=m
CONFIG_TINYDRM_ILI9225=m
CONFIG_TINYDRM_ILI9341=y
CONFIG_TINYDRM_ILI9486=m
# CONFIG_TINYDRM_MI0283QT is not set
CONFIG_TINYDRM_REPAPER=m
CONFIG_TINYDRM_ST7586=m
CONFIG_TINYDRM_ST7735R=y
# CONFIG_DRM_PL111 is not set
CONFIG_DRM_LIMA=m
# CONFIG_DRM_ASPEED_GFX is not set
CONFIG_DRM_TIDSS=m
# CONFIG_DRM_SSD130X is not set
# CONFIG_DRM_SPRD is not set
CONFIG_DRM_LEGACY=y
CONFIG_DRM_PANEL_ORIENTATION_QUIRKS=y
CONFIG_DRM_NOMODESET=y

#
# Frame buffer Devices
#
CONFIG_FB_CMDLINE=y
CONFIG_FB_NOTIFY=y
CONFIG_FB=m
# CONFIG_FIRMWARE_EDID is not set
CONFIG_FB_CFB_FILLRECT=m
CONFIG_FB_CFB_COPYAREA=m
CONFIG_FB_CFB_IMAGEBLIT=m
CONFIG_FB_CFB_REV_PIXELS_IN_BYTE=y
CONFIG_FB_SYS_FILLRECT=m
CONFIG_FB_SYS_COPYAREA=m
CONFIG_FB_SYS_IMAGEBLIT=m
# CONFIG_FB_FOREIGN_ENDIAN is not set
CONFIG_FB_SYS_FOPS=m
CONFIG_FB_DEFERRED_IO=y
CONFIG_FB_BACKLIGHT=m
CONFIG_FB_MODE_HELPERS=y
CONFIG_FB_TILEBLITTING=y

#
# Frame buffer hardware drivers
#
CONFIG_FB_CLPS711X=m
CONFIG_FB_IMX=m
CONFIG_FB_ARC=m
# CONFIG_FB_UVESA is not set
CONFIG_FB_PVR2=m
CONFIG_FB_OPENCORES=m
CONFIG_FB_S1D13XXX=m
CONFIG_FB_ATMEL=m
# CONFIG_FB_PXA168 is not set
# CONFIG_FB_W100 is not set
CONFIG_FB_SH_MOBILE_LCDC=m
CONFIG_FB_TMIO=m
CONFIG_FB_TMIO_ACCELL=y
CONFIG_FB_S3C=m
# CONFIG_FB_S3C_DEBUG_REGWRITE is not set
CONFIG_FB_SM501=m
CONFIG_FB_IBM_GXT4500=m
CONFIG_FB_GOLDFISH=m
CONFIG_FB_DA8XX=m
CONFIG_FB_VIRTUAL=m
CONFIG_FB_METRONOME=m
CONFIG_FB_BROADSHEET=m
CONFIG_FB_SIMPLE=m
CONFIG_FB_SSD1307=m
CONFIG_FB_OMAP_LCD_H3=y
# CONFIG_FB_OMAP2 is not set
CONFIG_MMP_DISP=y
# CONFIG_MMP_DISP_CONTROLLER is not set
CONFIG_MMP_PANEL_TPOHVGA=y
CONFIG_MMP_FB=m
# end of Frame buffer Devices

#
# Backlight & LCD device support
#
CONFIG_LCD_CLASS_DEVICE=m
CONFIG_LCD_L4F00242T03=m
# CONFIG_LCD_LMS283GF05 is not set
CONFIG_LCD_LTV350QV=m
CONFIG_LCD_ILI922X=m
CONFIG_LCD_ILI9320=m
CONFIG_LCD_TDO24M=m
CONFIG_LCD_VGG2432A4=m
CONFIG_LCD_PLATFORM=m
# CONFIG_LCD_AMS369FG06 is not set
# CONFIG_LCD_LMS501KF03 is not set
# CONFIG_LCD_HX8357 is not set
CONFIG_LCD_OTM3225A=m
CONFIG_BACKLIGHT_CLASS_DEVICE=y
CONFIG_BACKLIGHT_ATMEL_LCDC=y
CONFIG_BACKLIGHT_KTD253=y
CONFIG_BACKLIGHT_LM3533=m
# CONFIG_BACKLIGHT_OMAP1 is not set
CONFIG_BACKLIGHT_PWM=m
CONFIG_BACKLIGHT_DA903X=m
CONFIG_BACKLIGHT_DA9052=y
# CONFIG_BACKLIGHT_MT6370 is not set
CONFIG_BACKLIGHT_QCOM_WLED=y
CONFIG_BACKLIGHT_WM831X=y
CONFIG_BACKLIGHT_ADP8860=m
CONFIG_BACKLIGHT_ADP8870=y
CONFIG_BACKLIGHT_AAT2870=m
# CONFIG_BACKLIGHT_LM3630A is not set
CONFIG_BACKLIGHT_LM3639=m
# CONFIG_BACKLIGHT_LP855X is not set
CONFIG_BACKLIGHT_LP8788=m
CONFIG_BACKLIGHT_PANDORA=m
# CONFIG_BACKLIGHT_TPS65217 is not set
# CONFIG_BACKLIGHT_AS3711 is not set
CONFIG_BACKLIGHT_GPIO=y
CONFIG_BACKLIGHT_LV5207LP=y
CONFIG_BACKLIGHT_BD6107=y
CONFIG_BACKLIGHT_ARCXCNN=y
CONFIG_BACKLIGHT_RAVE_SP=m
# CONFIG_BACKLIGHT_LED is not set
# end of Backlight & LCD device support

CONFIG_VIDEOMODE_HELPERS=y
CONFIG_HDMI=y
# CONFIG_LOGO is not set
# end of Graphics support

CONFIG_SOUND=y
# CONFIG_SND is not set
CONFIG_USB_OHCI_LITTLE_ENDIAN=y
CONFIG_USB_SUPPORT=y
CONFIG_USB_COMMON=y
CONFIG_USB_LED_TRIG=y
CONFIG_USB_ULPI_BUS=m
CONFIG_USB_CONN_GPIO=y
CONFIG_USB_ARCH_HAS_HCD=y
# CONFIG_USB is not set

#
# USB port drivers
#

#
# USB Physical Layer drivers
#
CONFIG_USB_PHY=y
# CONFIG_NOP_USB_XCEIV is not set
# CONFIG_USB_GPIO_VBUS is not set
CONFIG_TAHVO_USB=m
CONFIG_TAHVO_USB_HOST_BY_DEFAULT=y
CONFIG_USB_TEGRA_PHY=y
CONFIG_USB_ULPI=y
CONFIG_USB_ULPI_VIEWPORT=y
# CONFIG_JZ4770_PHY is not set
# end of USB Physical Layer drivers

# CONFIG_USB_GADGET is not set
CONFIG_TYPEC=y
# CONFIG_TYPEC_UCSI is not set
# CONFIG_TYPEC_TPS6598X is not set
CONFIG_TYPEC_ANX7411=m
CONFIG_TYPEC_RT1719=m
CONFIG_TYPEC_HD3SS3220=m
# CONFIG_TYPEC_STUSB160X is not set
# CONFIG_TYPEC_QCOM_PMIC is not set
CONFIG_TYPEC_WUSB3801=m

#
# USB Type-C Multiplexer/DeMultiplexer Switch support
#
CONFIG_TYPEC_MUX_FSA4480=m
CONFIG_TYPEC_MUX_PI3USB30532=y
# end of USB Type-C Multiplexer/DeMultiplexer Switch support

#
# USB Type-C Alternate Mode drivers
#
# CONFIG_TYPEC_DP_ALTMODE is not set
# end of USB Type-C Alternate Mode drivers

CONFIG_USB_ROLE_SWITCH=y
CONFIG_MMC=m
CONFIG_PWRSEQ_EMMC=m
CONFIG_PWRSEQ_SD8787=m
# CONFIG_PWRSEQ_SIMPLE is not set
CONFIG_MMC_TEST=m

#
# MMC/SD/SDIO Host Controller Drivers
#
CONFIG_MMC_DEBUG=y
# CONFIG_MMC_SDHCI is not set
CONFIG_MMC_MESON_GX=m
CONFIG_MMC_MESON_MX_SDHC=m
CONFIG_MMC_MESON_MX_SDIO=m
CONFIG_MMC_MOXART=m
CONFIG_MMC_OMAP_HS=m
CONFIG_MMC_DAVINCI=m
# CONFIG_MMC_SPI is not set
# CONFIG_MMC_S3C is not set
CONFIG_MMC_TMIO_CORE=m
# CONFIG_MMC_TMIO is not set
# CONFIG_MMC_SDHI is not set
CONFIG_MMC_UNIPHIER=m
# CONFIG_MMC_DW is not set
CONFIG_MMC_SH_MMCIF=m
# CONFIG_MMC_USDHI6ROL0 is not set
CONFIG_MMC_SUNXI=m
CONFIG_MMC_CQHCI=m
CONFIG_MMC_HSQ=m
CONFIG_MMC_BCM2835=m
CONFIG_MMC_MTK=m
# CONFIG_MMC_OWL is not set
CONFIG_MMC_LITEX=m
CONFIG_MEMSTICK=y
CONFIG_MEMSTICK_DEBUG=y

#
# MemoryStick drivers
#
# CONFIG_MEMSTICK_UNSAFE_RESUME is not set

#
# MemoryStick Host Controller Drivers
#
CONFIG_NEW_LEDS=y
CONFIG_LEDS_CLASS=y
CONFIG_LEDS_CLASS_FLASH=m
CONFIG_LEDS_CLASS_MULTICOLOR=m
CONFIG_LEDS_BRIGHTNESS_HW_CHANGED=y

#
# LED drivers
#
CONFIG_LEDS_AN30259A=m
CONFIG_LEDS_ARIEL=y
CONFIG_LEDS_AW2013=y
CONFIG_LEDS_BCM6328=m
# CONFIG_LEDS_BCM6358 is not set
CONFIG_LEDS_CR0014114=y
CONFIG_LEDS_EL15203000=m
CONFIG_LEDS_TURRIS_OMNIA=m
# CONFIG_LEDS_LM3530 is not set
CONFIG_LEDS_LM3532=y
# CONFIG_LEDS_LM3533 is not set
# CONFIG_LEDS_LM3642 is not set
CONFIG_LEDS_LM3692X=y
CONFIG_LEDS_MT6323=m
CONFIG_LEDS_S3C24XX=y
CONFIG_LEDS_COBALT_QUBE=y
CONFIG_LEDS_COBALT_RAQ=y
CONFIG_LEDS_GPIO=y
CONFIG_LEDS_LP3944=m
# CONFIG_LEDS_LP3952 is not set
CONFIG_LEDS_LP50XX=m
# CONFIG_LEDS_LP55XX_COMMON is not set
CONFIG_LEDS_LP8788=m
CONFIG_LEDS_LP8860=m
# CONFIG_LEDS_PCA955X is not set
CONFIG_LEDS_PCA963X=y
CONFIG_LEDS_WM831X_STATUS=m
CONFIG_LEDS_DA903X=y
CONFIG_LEDS_DA9052=y
CONFIG_LEDS_DAC124S085=y
# CONFIG_LEDS_PWM is not set
CONFIG_LEDS_REGULATOR=m
# CONFIG_LEDS_BD2802 is not set
CONFIG_LEDS_LT3593=m
CONFIG_LEDS_MC13783=m
CONFIG_LEDS_NS2=m
# CONFIG_LEDS_NETXBIG is not set
# CONFIG_LEDS_ASIC3 is not set
CONFIG_LEDS_TCA6507=y
CONFIG_LEDS_TLC591XX=y
# CONFIG_LEDS_LM355x is not set
CONFIG_LEDS_OT200=m
CONFIG_LEDS_MENF21BMC=y
# CONFIG_LEDS_IS31FL319X is not set
CONFIG_LEDS_IS31FL32XX=m
CONFIG_LEDS_SC27XX_BLTC=m

#
# LED driver for blink(1) USB RGB LED is under Special HID drivers (HID_THINGM)
#
CONFIG_LEDS_BLINKM=m
CONFIG_LEDS_SYSCON=y
# CONFIG_LEDS_MLXREG is not set
CONFIG_LEDS_USER=m
CONFIG_LEDS_SPI_BYTE=y
# CONFIG_LEDS_TI_LMU_COMMON is not set
# CONFIG_LEDS_TPS6105X is not set
CONFIG_LEDS_IP30=y
CONFIG_LEDS_ACER_A500=m
CONFIG_LEDS_BCM63138=y
# CONFIG_LEDS_LGM is not set

#
# Flash and Torch LED drivers
#
CONFIG_LEDS_AS3645A=m
# CONFIG_LEDS_KTD2692 is not set
CONFIG_LEDS_LM3601X=m
CONFIG_LEDS_MAX77693=m
CONFIG_LEDS_RT4505=m
CONFIG_LEDS_RT8515=m
# CONFIG_LEDS_SGM3140 is not set

#
# RGB LED drivers
#
CONFIG_LEDS_PWM_MULTICOLOR=m
CONFIG_LEDS_QCOM_LPG=m

#
# LED Triggers
#
CONFIG_LEDS_TRIGGERS=y
CONFIG_LEDS_TRIGGER_TIMER=y
CONFIG_LEDS_TRIGGER_ONESHOT=y
# CONFIG_LEDS_TRIGGER_MTD is not set
# CONFIG_LEDS_TRIGGER_HEARTBEAT is not set
CONFIG_LEDS_TRIGGER_BACKLIGHT=m
# CONFIG_LEDS_TRIGGER_CPU is not set
CONFIG_LEDS_TRIGGER_ACTIVITY=y
CONFIG_LEDS_TRIGGER_GPIO=m
CONFIG_LEDS_TRIGGER_DEFAULT_ON=m

#
# iptables trigger is under Netfilter config (LED target)
#
# CONFIG_LEDS_TRIGGER_TRANSIENT is not set
CONFIG_LEDS_TRIGGER_CAMERA=m
# CONFIG_LEDS_TRIGGER_PANIC is not set
# CONFIG_LEDS_TRIGGER_NETDEV is not set
# CONFIG_LEDS_TRIGGER_PATTERN is not set
CONFIG_LEDS_TRIGGER_AUDIO=y

#
# Simple LED drivers
#
# CONFIG_ACCESSIBILITY is not set
CONFIG_INFINIBAND=m
CONFIG_INFINIBAND_USER_MAD=m
CONFIG_INFINIBAND_USER_ACCESS=m
CONFIG_INFINIBAND_USER_MEM=y
# CONFIG_INFINIBAND_ON_DEMAND_PAGING is not set
# CONFIG_INFINIBAND_ADDR_TRANS is not set
CONFIG_INFINIBAND_VIRT_DMA=y
# CONFIG_RDMA_SIW is not set
# CONFIG_RTC_CLASS is not set
# CONFIG_DMADEVICES is not set

#
# DMABUF options
#
CONFIG_SYNC_FILE=y
# CONFIG_SW_SYNC is not set
# CONFIG_UDMABUF is not set
# CONFIG_DMABUF_MOVE_NOTIFY is not set
CONFIG_DMABUF_DEBUG=y
CONFIG_DMABUF_SELFTESTS=m
CONFIG_DMABUF_HEAPS=y
# CONFIG_DMABUF_SYSFS_STATS is not set
CONFIG_DMABUF_HEAPS_SYSTEM=y
# end of DMABUF options

CONFIG_AUXDISPLAY=y
CONFIG_CHARLCD=y
CONFIG_LINEDISP=y
CONFIG_HD44780_COMMON=m
# CONFIG_HD44780 is not set
CONFIG_IMG_ASCII_LCD=y
CONFIG_LCD2S=m
CONFIG_PARPORT_PANEL=m
CONFIG_PANEL_PARPORT=0
CONFIG_PANEL_PROFILE=5
CONFIG_PANEL_CHANGE_MESSAGE=y
CONFIG_PANEL_BOOT_MESSAGE=""
# CONFIG_CHARLCD_BL_OFF is not set
CONFIG_CHARLCD_BL_ON=y
# CONFIG_CHARLCD_BL_FLASH is not set
CONFIG_PANEL=m
CONFIG_UIO=m
# CONFIG_UIO_PDRV_GENIRQ is not set
CONFIG_UIO_DMEM_GENIRQ=m
CONFIG_UIO_PRUSS=m
CONFIG_VFIO=m
CONFIG_VFIO_NOIOMMU=y
# CONFIG_VFIO_PLATFORM is not set
CONFIG_VFIO_MDEV=m
CONFIG_VIRT_DRIVERS=y
CONFIG_VIRTIO_ANCHOR=y
CONFIG_VIRTIO=y
CONFIG_VIRTIO_MENU=y
# CONFIG_VIRTIO_VDPA is not set
CONFIG_VIRTIO_BALLOON=m
CONFIG_VIRTIO_MMIO=y
CONFIG_VIRTIO_MMIO_CMDLINE_DEVICES=y
CONFIG_VIRTIO_DMA_SHARED_BUFFER=m
CONFIG_VDPA=y
CONFIG_VDPA_SIM=y
# CONFIG_VDPA_SIM_NET is not set
CONFIG_VDPA_SIM_BLOCK=m
CONFIG_VHOST_IOTLB=y
CONFIG_VHOST_RING=y
# CONFIG_VHOST_MENU is not set

#
# Microsoft Hyper-V guest support
#
# end of Microsoft Hyper-V guest support

# CONFIG_GREYBUS is not set
CONFIG_COMEDI=m
# CONFIG_COMEDI_DEBUG is not set
CONFIG_COMEDI_DEFAULT_BUF_SIZE_KB=2048
CONFIG_COMEDI_DEFAULT_BUF_MAXSIZE_KB=20480
# CONFIG_COMEDI_MISC_DRIVERS is not set
# CONFIG_COMEDI_ISA_DRIVERS is not set
# CONFIG_COMEDI_8255_SA is not set
CONFIG_COMEDI_KCOMEDILIB=m
CONFIG_COMEDI_TESTS=m
CONFIG_COMEDI_TESTS_EXAMPLE=m
# CONFIG_COMEDI_TESTS_NI_ROUTES is not set
# CONFIG_STAGING is not set
CONFIG_GOLDFISH=y
CONFIG_GOLDFISH_PIPE=m
CONFIG_CHROME_PLATFORMS=y
CONFIG_CROS_EC=y
CONFIG_CROS_EC_I2C=y
CONFIG_CROS_EC_RPMSG=y
# CONFIG_CROS_EC_SPI is not set
CONFIG_CROS_EC_PROTO=y
CONFIG_CROS_KBD_LED_BACKLIGHT=m
CONFIG_CROS_EC_CHARDEV=m
CONFIG_CROS_EC_LIGHTBAR=m
CONFIG_CROS_EC_VBC=y
# CONFIG_CROS_EC_DEBUGFS is not set
CONFIG_CROS_EC_SENSORHUB=y
# CONFIG_CROS_EC_SYSFS is not set
# CONFIG_CROS_USBPD_NOTIFY is not set
# CONFIG_MELLANOX_PLATFORM is not set
# CONFIG_OLPC_XO175 is not set
CONFIG_SURFACE_PLATFORMS=y
CONFIG_HAVE_CLK=y
CONFIG_HAVE_CLK_PREPARE=y
CONFIG_COMMON_CLK=y
CONFIG_COMMON_CLK_WM831X=y

#
# Clock driver for ARM Reference designs
#
# CONFIG_CLK_ICST is not set
# CONFIG_CLK_SP810 is not set
# end of Clock driver for ARM Reference designs

# CONFIG_CLK_HSDK is not set
CONFIG_LMK04832=y
# CONFIG_COMMON_CLK_APPLE_NCO is not set
# CONFIG_COMMON_CLK_MAX77686 is not set
# CONFIG_COMMON_CLK_MAX9485 is not set
# CONFIG_COMMON_CLK_RK808 is not set
CONFIG_COMMON_CLK_HI655X=m
# CONFIG_COMMON_CLK_SCMI is not set
# CONFIG_COMMON_CLK_SCPI is not set
# CONFIG_COMMON_CLK_SI5341 is not set
# CONFIG_COMMON_CLK_SI5351 is not set
CONFIG_COMMON_CLK_SI514=y
CONFIG_COMMON_CLK_SI544=y
# CONFIG_COMMON_CLK_SI570 is not set
CONFIG_COMMON_CLK_BM1880=y
CONFIG_COMMON_CLK_CDCE706=m
CONFIG_COMMON_CLK_TPS68470=m
# CONFIG_COMMON_CLK_CDCE925 is not set
CONFIG_COMMON_CLK_CS2000_CP=m
CONFIG_COMMON_CLK_EN7523=y
CONFIG_COMMON_CLK_FSL_FLEXSPI=m
# CONFIG_COMMON_CLK_FSL_SAI is not set
CONFIG_COMMON_CLK_GEMINI=y
# CONFIG_COMMON_CLK_LAN966X is not set
# CONFIG_COMMON_CLK_ASPEED is not set
CONFIG_COMMON_CLK_S2MPS11=y
# CONFIG_COMMON_CLK_AXI_CLKGEN is not set
# CONFIG_CLK_QORIQ is not set
# CONFIG_CLK_LS1028A_PLLDIG is not set
CONFIG_COMMON_CLK_XGENE=y
CONFIG_COMMON_CLK_PWM=m
CONFIG_COMMON_CLK_OXNAS=y
CONFIG_COMMON_CLK_RS9_PCIE=y
CONFIG_COMMON_CLK_VC5=y
CONFIG_COMMON_CLK_VC7=y
CONFIG_COMMON_CLK_MMP2_AUDIO=m
CONFIG_COMMON_CLK_BD718XX=y
# CONFIG_COMMON_CLK_FIXED_MMIO is not set
# CONFIG_CLK_ACTIONS is not set
# CONFIG_CLK_BAIKAL_T1 is not set
CONFIG_CLK_BCM2711_DVP=y
# CONFIG_CLK_BCM2835 is not set
CONFIG_CLK_BCM_63XX=y
CONFIG_CLK_BCM_63XX_GATE=y
CONFIG_CLK_BCM_KONA=y
CONFIG_COMMON_CLK_IPROC=y
CONFIG_CLK_BCM_CYGNUS=y
CONFIG_CLK_BCM_HR2=y
# CONFIG_CLK_BCM_NSP is not set
CONFIG_CLK_BCM_NS2=y
CONFIG_CLK_BCM_SR=y
CONFIG_CLK_RASPBERRYPI=m
CONFIG_COMMON_CLK_HI3516CV300=m
# CONFIG_COMMON_CLK_HI3519 is not set
CONFIG_COMMON_CLK_HI3559A=y
CONFIG_COMMON_CLK_HI3660=y
# CONFIG_COMMON_CLK_HI3670 is not set
# CONFIG_COMMON_CLK_HI3798CV200 is not set
CONFIG_COMMON_CLK_HI6220=y
CONFIG_RESET_HISI=y
CONFIG_COMMON_CLK_BOSTON=y
CONFIG_MXC_CLK=y
CONFIG_CLK_IMX8MM=m
CONFIG_CLK_IMX8MN=y
CONFIG_CLK_IMX8MP=m
CONFIG_CLK_IMX8MQ=y
CONFIG_CLK_IMX8ULP=m
CONFIG_CLK_IMX93=m

#
# Ingenic SoCs drivers
#
CONFIG_INGENIC_CGU_COMMON=y
CONFIG_INGENIC_CGU_JZ4740=y
# CONFIG_INGENIC_CGU_JZ4725B is not set
# CONFIG_INGENIC_CGU_JZ4760 is not set
CONFIG_INGENIC_CGU_JZ4770=y
CONFIG_INGENIC_CGU_JZ4780=y
# CONFIG_INGENIC_CGU_X1000 is not set
CONFIG_INGENIC_CGU_X1830=y
# CONFIG_INGENIC_TCU_CLK is not set
# end of Ingenic SoCs drivers

CONFIG_COMMON_CLK_KEYSTONE=m
CONFIG_TI_SYSCON_CLK=y

#
# Clock driver for MediaTek SoC
#
CONFIG_COMMON_CLK_MEDIATEK=y
# CONFIG_COMMON_CLK_MT2701 is not set
CONFIG_COMMON_CLK_MT2712=y
# CONFIG_COMMON_CLK_MT2712_BDPSYS is not set
CONFIG_COMMON_CLK_MT2712_IMGSYS=y
# CONFIG_COMMON_CLK_MT2712_JPGDECSYS is not set
CONFIG_COMMON_CLK_MT2712_MFGCFG=y
# CONFIG_COMMON_CLK_MT2712_MMSYS is not set
CONFIG_COMMON_CLK_MT2712_VDECSYS=y
CONFIG_COMMON_CLK_MT2712_VENCSYS=y
# CONFIG_COMMON_CLK_MT6765 is not set
CONFIG_COMMON_CLK_MT6779=y
# CONFIG_COMMON_CLK_MT6779_MMSYS is not set
CONFIG_COMMON_CLK_MT6779_IMGSYS=y
# CONFIG_COMMON_CLK_MT6779_IPESYS is not set
CONFIG_COMMON_CLK_MT6779_CAMSYS=y
CONFIG_COMMON_CLK_MT6779_VDECSYS=y
CONFIG_COMMON_CLK_MT6779_VENCSYS=m
CONFIG_COMMON_CLK_MT6779_MFGCFG=m
CONFIG_COMMON_CLK_MT6779_AUDSYS=m
CONFIG_COMMON_CLK_MT6795=y
# CONFIG_COMMON_CLK_MT6795_MFGCFG is not set
# CONFIG_COMMON_CLK_MT6795_MMSYS is not set
CONFIG_COMMON_CLK_MT6795_VDECSYS=m
# CONFIG_COMMON_CLK_MT6795_VENCSYS is not set
# CONFIG_COMMON_CLK_MT6797 is not set
# CONFIG_COMMON_CLK_MT7622 is not set
CONFIG_COMMON_CLK_MT7629=y
CONFIG_COMMON_CLK_MT7629_ETHSYS=y
# CONFIG_COMMON_CLK_MT7629_HIFSYS is not set
CONFIG_COMMON_CLK_MT7986=y
CONFIG_COMMON_CLK_MT7986_ETHSYS=y
# CONFIG_COMMON_CLK_MT8135 is not set
# CONFIG_COMMON_CLK_MT8167 is not set
CONFIG_COMMON_CLK_MT8173=y
# CONFIG_COMMON_CLK_MT8173_MMSYS is not set
# CONFIG_COMMON_CLK_MT8183 is not set
CONFIG_COMMON_CLK_MT8186=y
CONFIG_COMMON_CLK_MT8192=y
CONFIG_COMMON_CLK_MT8192_AUDSYS=y
# CONFIG_COMMON_CLK_MT8192_CAMSYS is not set
CONFIG_COMMON_CLK_MT8192_IMGSYS=y
# CONFIG_COMMON_CLK_MT8192_IMP_IIC_WRAP is not set
CONFIG_COMMON_CLK_MT8192_IPESYS=y
CONFIG_COMMON_CLK_MT8192_MDPSYS=y
CONFIG_COMMON_CLK_MT8192_MFGCFG=y
CONFIG_COMMON_CLK_MT8192_MMSYS=y
# CONFIG_COMMON_CLK_MT8192_MSDC is not set
# CONFIG_COMMON_CLK_MT8192_SCP_ADSP is not set
CONFIG_COMMON_CLK_MT8192_VDECSYS=y
CONFIG_COMMON_CLK_MT8192_VENCSYS=y
CONFIG_COMMON_CLK_MT8195=y
# CONFIG_COMMON_CLK_MT8365 is not set
# CONFIG_COMMON_CLK_MT8516 is not set
# end of Clock driver for MediaTek SoC

#
# Clock support for Amlogic platforms
#
# end of Clock support for Amlogic platforms

# CONFIG_MSTAR_MSC313_MPLL is not set
# CONFIG_MCHP_CLK_MPFS is not set
# CONFIG_COMMON_CLK_PISTACHIO is not set
CONFIG_QCOM_GDSC=y
CONFIG_QCOM_RPMCC=y
CONFIG_COMMON_CLK_QCOM=m
CONFIG_QCOM_A53PLL=m
# CONFIG_QCOM_A7PLL is not set
CONFIG_QCOM_CLK_APCS_MSM8916=m
CONFIG_QCOM_CLK_APCS_SDX55=m
CONFIG_QCOM_CLK_SMD_RPM=m
CONFIG_APQ_GCC_8084=m
# CONFIG_APQ_MMCC_8084 is not set
CONFIG_IPQ_APSS_PLL=m
CONFIG_IPQ_APSS_6018=m
CONFIG_IPQ_GCC_4019=m
CONFIG_IPQ_GCC_6018=m
CONFIG_IPQ_GCC_806X=m
CONFIG_IPQ_LCC_806X=m
# CONFIG_IPQ_GCC_8074 is not set
CONFIG_MSM_GCC_8660=m
# CONFIG_MSM_GCC_8909 is not set
# CONFIG_MSM_GCC_8916 is not set
CONFIG_MSM_GCC_8939=m
CONFIG_MSM_GCC_8960=m
CONFIG_MSM_LCC_8960=m
# CONFIG_MDM_GCC_9607 is not set
# CONFIG_MDM_GCC_9615 is not set
# CONFIG_MDM_LCC_9615 is not set
CONFIG_MSM_MMCC_8960=m
CONFIG_MSM_GCC_8953=m
# CONFIG_MSM_GCC_8974 is not set
# CONFIG_MSM_MMCC_8974 is not set
CONFIG_MSM_GCC_8976=m
# CONFIG_MSM_MMCC_8994 is not set
CONFIG_MSM_GCC_8994=m
CONFIG_MSM_GCC_8996=m
CONFIG_MSM_MMCC_8996=m
CONFIG_MSM_GCC_8998=m
CONFIG_MSM_GPUCC_8998=m
CONFIG_MSM_MMCC_8998=m
CONFIG_QCM_GCC_2290=m
CONFIG_QCM_DISPCC_2290=m
CONFIG_QCS_GCC_404=m
CONFIG_SC_CAMCC_7180=m
CONFIG_SC_CAMCC_7280=m
# CONFIG_SC_DISPCC_7180 is not set
# CONFIG_SC_DISPCC_7280 is not set
CONFIG_SC_GCC_7180=m
CONFIG_SC_GCC_7280=m
CONFIG_SC_GCC_8180X=m
CONFIG_SC_GCC_8280XP=m
# CONFIG_SC_GPUCC_7180 is not set
CONFIG_SC_GPUCC_7280=m
CONFIG_SC_GPUCC_8280XP=m
# CONFIG_SC_LPASSCC_7280 is not set
CONFIG_SC_LPASS_CORECC_7180=m
CONFIG_SC_LPASS_CORECC_7280=m
CONFIG_SC_MSS_7180=m
CONFIG_SC_VIDEOCC_7180=m
CONFIG_SC_VIDEOCC_7280=m
# CONFIG_SDM_CAMCC_845 is not set
CONFIG_SDM_GCC_660=m
# CONFIG_SDM_MMCC_660 is not set
# CONFIG_SDM_GPUCC_660 is not set
CONFIG_QCS_TURING_404=m
CONFIG_QCS_Q6SSTOP_404=m
CONFIG_SDM_GCC_845=m
# CONFIG_SDM_GPUCC_845 is not set
CONFIG_SDM_VIDEOCC_845=m
# CONFIG_SDM_DISPCC_845 is not set
CONFIG_SDM_LPASSCC_845=m
CONFIG_SDX_GCC_55=m
CONFIG_SDX_GCC_65=m
CONFIG_SM_CAMCC_8250=m
CONFIG_SM_CAMCC_8450=m
# CONFIG_SM_DISPCC_6115 is not set
CONFIG_SM_DISPCC_6125=m
CONFIG_SM_DISPCC_8250=m
CONFIG_SM_DISPCC_8450=m
CONFIG_SM_GCC_6115=m
CONFIG_SM_GCC_6125=m
# CONFIG_SM_GCC_6350 is not set
# CONFIG_SM_GCC_6375 is not set
CONFIG_SM_GCC_8150=m
CONFIG_SM_GCC_8250=m
CONFIG_SM_GCC_8350=m
CONFIG_SM_GCC_8450=m
# CONFIG_SM_GPUCC_6350 is not set
CONFIG_SM_GPUCC_8150=m
CONFIG_SM_GPUCC_8250=m
CONFIG_SM_GPUCC_8350=m
# CONFIG_SM_VIDEOCC_8150 is not set
# CONFIG_SM_VIDEOCC_8250 is not set
CONFIG_SPMI_PMIC_CLKDIV=m
CONFIG_QCOM_HFPLL=m
CONFIG_KPSS_XCC=m
CONFIG_CLK_GFM_LPASS_SM8250=m
# CONFIG_CLK_MT7621 is not set
# CONFIG_CLK_RENESAS is not set
CONFIG_COMMON_CLK_SAMSUNG=y
# CONFIG_S3C64XX_COMMON_CLK is not set
# CONFIG_S5PV210_COMMON_CLK is not set
CONFIG_EXYNOS_3250_COMMON_CLK=y
# CONFIG_EXYNOS_4_COMMON_CLK is not set
CONFIG_EXYNOS_5250_COMMON_CLK=y
# CONFIG_EXYNOS_5260_COMMON_CLK is not set
# CONFIG_EXYNOS_5410_COMMON_CLK is not set
CONFIG_EXYNOS_5420_COMMON_CLK=y
CONFIG_EXYNOS_ARM64_COMMON_CLK=y
CONFIG_EXYNOS_AUDSS_CLK_CON=y
# CONFIG_EXYNOS_CLKOUT is not set
# CONFIG_S3C2410_COMMON_CLK is not set
CONFIG_S3C2412_COMMON_CLK=y
# CONFIG_S3C2443_COMMON_CLK is not set
CONFIG_TESLA_FSD_COMMON_CLK=y
CONFIG_CLK_SIFIVE=y
# CONFIG_CLK_SIFIVE_PRCI is not set
# CONFIG_CLK_INTEL_SOCFPGA is not set
CONFIG_SPRD_COMMON_CLK=m
# CONFIG_SPRD_SC9860_CLK is not set
CONFIG_SPRD_SC9863A_CLK=m
# CONFIG_SPRD_UMS512_CLK is not set
CONFIG_CLK_STARFIVE_JH7100=y
CONFIG_CLK_STARFIVE_JH7100_AUDIO=y
CONFIG_CLK_SUNXI=y
CONFIG_CLK_SUNXI_CLOCKS=y
CONFIG_CLK_SUNXI_PRCM_SUN6I=y
# CONFIG_CLK_SUNXI_PRCM_SUN8I is not set
CONFIG_CLK_SUNXI_PRCM_SUN9I=y
CONFIG_SUNXI_CCU=y
CONFIG_SUNIV_F1C100S_CCU=y
CONFIG_SUN20I_D1_CCU=m
# CONFIG_SUN20I_D1_R_CCU is not set
CONFIG_SUN50I_A64_CCU=m
CONFIG_SUN50I_A100_CCU=m
CONFIG_SUN50I_A100_R_CCU=m
CONFIG_SUN50I_H6_CCU=m
CONFIG_SUN50I_H616_CCU=m
# CONFIG_SUN50I_H6_R_CCU is not set
CONFIG_SUN4I_A10_CCU=y
# CONFIG_SUN5I_CCU is not set
CONFIG_SUN6I_A31_CCU=m
CONFIG_SUN6I_RTC_CCU=y
# CONFIG_SUN8I_A23_CCU is not set
# CONFIG_SUN8I_A33_CCU is not set
CONFIG_SUN8I_A83T_CCU=y
CONFIG_SUN8I_H3_CCU=m
CONFIG_SUN8I_V3S_CCU=m
CONFIG_SUN8I_DE2_CCU=m
CONFIG_SUN8I_R40_CCU=m
# CONFIG_SUN9I_A80_CCU is not set
CONFIG_SUN8I_R_CCU=y
CONFIG_COMMON_CLK_TI_ADPLL=y
# CONFIG_CLK_UNIPHIER is not set
# CONFIG_COMMON_CLK_VISCONTI is not set
CONFIG_CLK_LGM_CGU=y
CONFIG_XILINX_VCU=m
# CONFIG_COMMON_CLK_XLNX_CLKWZRD is not set
# CONFIG_COMMON_CLK_ZYNQMP is not set
CONFIG_HWSPINLOCK=y
CONFIG_HWSPINLOCK_OMAP=y
CONFIG_HWSPINLOCK_QCOM=m
# CONFIG_HWSPINLOCK_SPRD is not set
CONFIG_HWSPINLOCK_STM32=m
CONFIG_HWSPINLOCK_SUN6I=y
CONFIG_HSEM_U8500=y

#
# Clock Source drivers
#
CONFIG_TIMER_OF=y
CONFIG_TIMER_PROBE=y
CONFIG_CLKSRC_MMIO=y
# CONFIG_BCM2835_TIMER is not set
CONFIG_BCM_KONA_TIMER=y
# CONFIG_DAVINCI_TIMER is not set
# CONFIG_DIGICOLOR_TIMER is not set
# CONFIG_OMAP_DM_TIMER is not set
CONFIG_DW_APB_TIMER=y
CONFIG_FTTMR010_TIMER=y
CONFIG_IXP4XX_TIMER=y
CONFIG_MESON6_TIMER=y
CONFIG_OWL_TIMER=y
CONFIG_RDA_TIMER=y
CONFIG_SUN4I_TIMER=y
CONFIG_SUN5I_HSTIMER=y
CONFIG_TEGRA_TIMER=y
# CONFIG_VT8500_TIMER is not set
CONFIG_NPCM7XX_TIMER=y
# CONFIG_CADENCE_TTC_TIMER is not set
CONFIG_ASM9260_TIMER=y
CONFIG_CLKSRC_DBX500_PRCMU=y
CONFIG_CLPS711X_TIMER=y
# CONFIG_MXS_TIMER is not set
CONFIG_NSPIRE_TIMER=y
# CONFIG_INTEGRATOR_AP_TIMER is not set
# CONFIG_CLKSRC_PISTACHIO is not set
CONFIG_CLKSRC_STM32_LP=y
# CONFIG_ARMV7M_SYSTICK is not set
CONFIG_ATMEL_PIT=y
CONFIG_ATMEL_ST=y
CONFIG_CLKSRC_SAMSUNG_PWM=y
# CONFIG_FSL_FTM_TIMER is not set
CONFIG_OXNAS_RPS_TIMER=y
# CONFIG_MTK_TIMER is not set
CONFIG_SPRD_TIMER=y
CONFIG_CLKSRC_JCORE_PIT=y
CONFIG_SH_TIMER_CMT=y
# CONFIG_SH_TIMER_MTU2 is not set
# CONFIG_RENESAS_OSTM is not set
CONFIG_SH_TIMER_TMU=y
# CONFIG_EM_TIMER_STI is not set
# CONFIG_CLKSRC_PXA is not set
CONFIG_TIMER_IMX_SYS_CTR=y
# CONFIG_CLKSRC_ST_LPC is not set
CONFIG_GXP_TIMER=y
# CONFIG_MSC313E_TIMER is not set
# CONFIG_INGENIC_TIMER is not set
CONFIG_INGENIC_SYSOST=y
# CONFIG_INGENIC_OST is not set
CONFIG_MICROCHIP_PIT64B=y
# end of Clock Source drivers

# CONFIG_MAILBOX is not set
CONFIG_IOMMU_IOVA=y
CONFIG_IOMMU_API=y
CONFIG_IOMMU_SUPPORT=y

#
# Generic IOMMU Pagetable Support
#
CONFIG_IOMMU_IO_PGTABLE=y
CONFIG_IOMMU_IO_PGTABLE_ARMV7S=y
CONFIG_IOMMU_IO_PGTABLE_ARMV7S_SELFTEST=y
# end of Generic IOMMU Pagetable Support

# CONFIG_IOMMU_DEBUGFS is not set
# CONFIG_IOMMU_DEFAULT_DMA_STRICT is not set
CONFIG_IOMMU_DEFAULT_DMA_LAZY=y
# CONFIG_IOMMU_DEFAULT_PASSTHROUGH is not set
CONFIG_OF_IOMMU=y
# CONFIG_OMAP_IOMMU is not set
CONFIG_ROCKCHIP_IOMMU=y
CONFIG_SUN50I_IOMMU=y
# CONFIG_EXYNOS_IOMMU is not set
CONFIG_S390_CCW_IOMMU=y
# CONFIG_S390_AP_IOMMU is not set
CONFIG_MTK_IOMMU=m
CONFIG_SPRD_IOMMU=y

#
# Remoteproc drivers
#
# CONFIG_REMOTEPROC is not set
# end of Remoteproc drivers

#
# Rpmsg drivers
#
CONFIG_RPMSG=y
CONFIG_RPMSG_CHAR=y
CONFIG_RPMSG_CTRL=m
CONFIG_RPMSG_NS=y
CONFIG_RPMSG_VIRTIO=y
# end of Rpmsg drivers

CONFIG_SOUNDWIRE=m

#
# SoundWire Devices
#

#
# SOC (System On Chip) specific Drivers
#

#
# Amlogic SoC drivers
#
CONFIG_MESON_CANVAS=y
# CONFIG_MESON_CLK_MEASURE is not set
# CONFIG_MESON_GX_SOCINFO is not set
# CONFIG_MESON_MX_SOCINFO is not set
# end of Amlogic SoC drivers

#
# Apple SoC drivers
#
# CONFIG_APPLE_SART is not set
# end of Apple SoC drivers

#
# ASPEED SoC drivers
#
# CONFIG_ASPEED_LPC_CTRL is not set
CONFIG_ASPEED_LPC_SNOOP=m
CONFIG_ASPEED_UART_ROUTING=y
# CONFIG_ASPEED_P2A_CTRL is not set
CONFIG_ASPEED_SOCINFO=y
# end of ASPEED SoC drivers

# CONFIG_AT91_SOC_ID is not set
CONFIG_AT91_SOC_SFR=y

#
# Broadcom SoC drivers
#
# CONFIG_BCM2835_POWER is not set
CONFIG_SOC_BCM63XX=y
# CONFIG_SOC_BRCMSTB is not set
CONFIG_BCM63XX_POWER=y
# CONFIG_BCM_PMB is not set
# end of Broadcom SoC drivers

#
# NXP/Freescale QorIQ SoC drivers
#
# CONFIG_QUICC_ENGINE is not set
# CONFIG_DPAA2_CONSOLE is not set
# end of NXP/Freescale QorIQ SoC drivers

#
# fujitsu SoC drivers
#
# end of fujitsu SoC drivers

#
# i.MX SoC drivers
#
# CONFIG_SOC_IMX8M is not set
CONFIG_SOC_IMX9=m
# end of i.MX SoC drivers

#
# IXP4xx SoC drivers
#
# CONFIG_IXP4XX_QMGR is not set
# CONFIG_IXP4XX_NPE is not set
# end of IXP4xx SoC drivers

#
# Enable LiteX SoC Builder specific drivers
#
CONFIG_LITEX=y
CONFIG_LITEX_SOC_CONTROLLER=m
# end of Enable LiteX SoC Builder specific drivers

#
# MediaTek SoC drivers
#
# CONFIG_MTK_CMDQ is not set
CONFIG_MTK_DEVAPC=y
CONFIG_MTK_INFRACFG=y
CONFIG_MTK_PMIC_WRAP=y
CONFIG_MTK_SCPSYS=y
# CONFIG_MTK_MMSYS is not set
CONFIG_MTK_SVS=m
# end of MediaTek SoC drivers

#
# Qualcomm SoC drivers
#
# CONFIG_QCOM_COMMAND_DB is not set
# CONFIG_QCOM_GENI_SE is not set
CONFIG_QCOM_GSBI=y
CONFIG_QCOM_LLCC=m
# CONFIG_QCOM_RPMH is not set
CONFIG_QCOM_SMEM=m
CONFIG_QCOM_SMD_RPM=m
CONFIG_QCOM_SMEM_STATE=y
CONFIG_QCOM_SMSM=m
CONFIG_QCOM_SOCINFO=m
CONFIG_QCOM_SPM=m
CONFIG_QCOM_STATS=m
# CONFIG_QCOM_WCNSS_CTRL is not set
# CONFIG_QCOM_APR is not set
CONFIG_QCOM_ICC_BWMON=m
# end of Qualcomm SoC drivers

# CONFIG_SOC_RENESAS is not set
CONFIG_ROCKCHIP_GRF=y
# CONFIG_ROCKCHIP_IODOMAIN is not set
# CONFIG_SOC_SAMSUNG is not set
# CONFIG_SOC_TEGRA20_VOLTAGE_COUPLER is not set
CONFIG_SOC_TEGRA30_VOLTAGE_COUPLER=y
# CONFIG_SOC_TI is not set
# CONFIG_UX500_SOC_ID is not set

#
# Xilinx SoC drivers
#
# end of Xilinx SoC drivers
# end of SOC (System On Chip) specific Drivers

CONFIG_PM_DEVFREQ=y

#
# DEVFREQ Governors
#
CONFIG_DEVFREQ_GOV_SIMPLE_ONDEMAND=y
CONFIG_DEVFREQ_GOV_PERFORMANCE=m
CONFIG_DEVFREQ_GOV_POWERSAVE=y
CONFIG_DEVFREQ_GOV_USERSPACE=y
CONFIG_DEVFREQ_GOV_PASSIVE=y

#
# DEVFREQ Drivers
#
CONFIG_ARM_EXYNOS_BUS_DEVFREQ=m
CONFIG_ARM_IMX_BUS_DEVFREQ=y
CONFIG_ARM_TEGRA_DEVFREQ=y
CONFIG_ARM_MEDIATEK_CCI_DEVFREQ=y
# CONFIG_ARM_SUN8I_A33_MBUS_DEVFREQ is not set
CONFIG_PM_DEVFREQ_EVENT=y
CONFIG_DEVFREQ_EVENT_EXYNOS_NOCP=y
CONFIG_DEVFREQ_EVENT_EXYNOS_PPMU=m
# CONFIG_DEVFREQ_EVENT_ROCKCHIP_DFI is not set
CONFIG_EXTCON=y

#
# Extcon Device Drivers
#
# CONFIG_EXTCON_ADC_JACK is not set
CONFIG_EXTCON_GPIO=m
CONFIG_EXTCON_MAX14577=m
CONFIG_EXTCON_MAX3355=y
CONFIG_EXTCON_MAX77843=m
# CONFIG_EXTCON_PTN5150 is not set
# CONFIG_EXTCON_QCOM_SPMI_MISC is not set
CONFIG_EXTCON_RT8973A=m
# CONFIG_EXTCON_SM5502 is not set
CONFIG_EXTCON_USB_GPIO=m
# CONFIG_EXTCON_USBC_CROS_EC is not set
# CONFIG_EXTCON_USBC_TUSB320 is not set
CONFIG_MEMORY=y
CONFIG_DDR=y
# CONFIG_ATMEL_SDRAMC is not set
# CONFIG_ATMEL_EBI is not set
# CONFIG_BRCMSTB_DPFE is not set
# CONFIG_BRCMSTB_MEMC is not set
# CONFIG_BT1_L2_CTL is not set
# CONFIG_TI_AEMIF is not set
CONFIG_TI_EMIF=m
CONFIG_OMAP_GPMC=y
CONFIG_OMAP_GPMC_DEBUG=y
# CONFIG_MVEBU_DEVBUS is not set
# CONFIG_FSL_CORENET_CF is not set
# CONFIG_FSL_IFC is not set
CONFIG_JZ4780_NEMC=y
CONFIG_MTK_SMI=m
# CONFIG_DA8XX_DDRCTL is not set
CONFIG_RENESAS_RPCIF=y
# CONFIG_STM32_FMC2_EBI is not set
CONFIG_SAMSUNG_MC=y
CONFIG_EXYNOS5422_DMC=m
# CONFIG_EXYNOS_SROM is not set
# CONFIG_TEGRA_MC is not set
CONFIG_IIO=m
CONFIG_IIO_BUFFER=y
CONFIG_IIO_BUFFER_CB=m
CONFIG_IIO_BUFFER_DMA=m
CONFIG_IIO_BUFFER_DMAENGINE=m
# CONFIG_IIO_BUFFER_HW_CONSUMER is not set
CONFIG_IIO_KFIFO_BUF=m
CONFIG_IIO_TRIGGERED_BUFFER=m
CONFIG_IIO_CONFIGFS=m
CONFIG_IIO_TRIGGER=y
CONFIG_IIO_CONSUMERS_PER_TRIGGER=2
# CONFIG_IIO_SW_DEVICE is not set
CONFIG_IIO_SW_TRIGGER=m
CONFIG_IIO_TRIGGERED_EVENT=m

#
# Accelerometers
#
CONFIG_ADIS16201=m
# CONFIG_ADIS16209 is not set
CONFIG_ADXL313=m
# CONFIG_ADXL313_I2C is not set
CONFIG_ADXL313_SPI=m
CONFIG_ADXL345=m
CONFIG_ADXL345_I2C=m
CONFIG_ADXL345_SPI=m
CONFIG_ADXL355=m
# CONFIG_ADXL355_I2C is not set
CONFIG_ADXL355_SPI=m
CONFIG_ADXL367=m
CONFIG_ADXL367_SPI=m
CONFIG_ADXL367_I2C=m
CONFIG_ADXL372=m
CONFIG_ADXL372_SPI=m
CONFIG_ADXL372_I2C=m
CONFIG_BMA180=m
CONFIG_BMA220=m
CONFIG_BMA400=m
CONFIG_BMA400_I2C=m
CONFIG_BMA400_SPI=m
CONFIG_BMC150_ACCEL=m
CONFIG_BMC150_ACCEL_I2C=m
CONFIG_BMC150_ACCEL_SPI=m
CONFIG_BMI088_ACCEL=m
CONFIG_BMI088_ACCEL_SPI=m
CONFIG_DA280=m
CONFIG_DA311=m
# CONFIG_DMARD06 is not set
CONFIG_DMARD09=m
CONFIG_DMARD10=m
CONFIG_FXLS8962AF=m
CONFIG_FXLS8962AF_I2C=m
# CONFIG_FXLS8962AF_SPI is not set
CONFIG_IIO_ST_ACCEL_3AXIS=m
CONFIG_IIO_ST_ACCEL_I2C_3AXIS=m
CONFIG_IIO_ST_ACCEL_SPI_3AXIS=m
CONFIG_KXSD9=m
CONFIG_KXSD9_SPI=m
CONFIG_KXSD9_I2C=m
CONFIG_KXCJK1013=m
CONFIG_MC3230=m
CONFIG_MMA7455=m
CONFIG_MMA7455_I2C=m
CONFIG_MMA7455_SPI=m
CONFIG_MMA7660=m
# CONFIG_MMA8452 is not set
CONFIG_MMA9551_CORE=m
CONFIG_MMA9551=m
CONFIG_MMA9553=m
CONFIG_MSA311=m
# CONFIG_MXC4005 is not set
CONFIG_MXC6255=m
CONFIG_SCA3000=m
# CONFIG_SCA3300 is not set
CONFIG_STK8312=m
# CONFIG_STK8BA50 is not set
# end of Accelerometers

#
# Analog to digital converters
#
CONFIG_AD_SIGMA_DELTA=m
CONFIG_AD7091R5=m
# CONFIG_AD7124 is not set
CONFIG_AD7192=m
# CONFIG_AD7266 is not set
CONFIG_AD7280=m
# CONFIG_AD7291 is not set
CONFIG_AD7292=m
# CONFIG_AD7298 is not set
# CONFIG_AD7476 is not set
CONFIG_AD7606=m
# CONFIG_AD7606_IFACE_PARALLEL is not set
CONFIG_AD7606_IFACE_SPI=m
# CONFIG_AD7766 is not set
CONFIG_AD7768_1=m
# CONFIG_AD7780 is not set
# CONFIG_AD7791 is not set
CONFIG_AD7793=m
CONFIG_AD7887=m
# CONFIG_AD7923 is not set
CONFIG_AD7949=m
# CONFIG_AD799X is not set
# CONFIG_ADI_AXI_ADC is not set
CONFIG_ASPEED_ADC=m
CONFIG_AT91_SAMA5D2_ADC=m
CONFIG_AXP20X_ADC=m
CONFIG_AXP288_ADC=m
# CONFIG_BCM_IPROC_ADC is not set
# CONFIG_BERLIN2_ADC is not set
CONFIG_CC10001_ADC=m
# CONFIG_ENVELOPE_DETECTOR is not set
CONFIG_EXYNOS_ADC=m
# CONFIG_FSL_MX25_ADC is not set
CONFIG_HI8435=m
CONFIG_HX711=m
CONFIG_INA2XX_ADC=m
CONFIG_INGENIC_ADC=m
CONFIG_IMX7D_ADC=m
CONFIG_IMX8QXP_ADC=m
# CONFIG_LP8788_ADC is not set
CONFIG_LPC18XX_ADC=m
CONFIG_LPC32XX_ADC=m
CONFIG_LTC2471=m
# CONFIG_LTC2485 is not set
CONFIG_LTC2496=m
# CONFIG_LTC2497 is not set
CONFIG_MAX1027=m
CONFIG_MAX11100=m
# CONFIG_MAX1118 is not set
# CONFIG_MAX11205 is not set
CONFIG_MAX1241=m
# CONFIG_MAX1363 is not set
# CONFIG_MAX9611 is not set
CONFIG_MCP320X=m
CONFIG_MCP3422=m
CONFIG_MCP3911=m
CONFIG_MEDIATEK_MT6577_AUXADC=m
CONFIG_MEN_Z188_ADC=m
# CONFIG_MESON_SARADC is not set
CONFIG_MP2629_ADC=m
CONFIG_NAU7802=m
CONFIG_NPCM_ADC=m
CONFIG_QCOM_VADC_COMMON=m
# CONFIG_QCOM_SPMI_IADC is not set
CONFIG_QCOM_SPMI_VADC=m
# CONFIG_QCOM_SPMI_ADC5 is not set
CONFIG_RCAR_GYRO_ADC=m
CONFIG_RN5T618_ADC=m
CONFIG_ROCKCHIP_SARADC=m
CONFIG_RICHTEK_RTQ6056=m
CONFIG_RZG2L_ADC=m
# CONFIG_SC27XX_ADC is not set
CONFIG_SPEAR_ADC=m
CONFIG_SD_ADC_MODULATOR=m
CONFIG_STM32_ADC_CORE=m
# CONFIG_STM32_ADC is not set
CONFIG_STM32_DFSDM_CORE=m
# CONFIG_STM32_DFSDM_ADC is not set
# CONFIG_SUN4I_GPADC is not set
# CONFIG_TI_ADC081C is not set
# CONFIG_TI_ADC0832 is not set
CONFIG_TI_ADC084S021=m
CONFIG_TI_ADC12138=m
# CONFIG_TI_ADC108S102 is not set
CONFIG_TI_ADC128S052=m
# CONFIG_TI_ADC161S626 is not set
CONFIG_TI_ADS1015=m
CONFIG_TI_ADS7950=m
CONFIG_TI_ADS8344=m
CONFIG_TI_ADS8688=m
CONFIG_TI_ADS124S08=m
CONFIG_TI_ADS131E08=m
# CONFIG_TI_TLC4541 is not set
CONFIG_TI_TSC2046=m
CONFIG_TWL4030_MADC=m
# CONFIG_TWL6030_GPADC is not set
CONFIG_VF610_ADC=m
# CONFIG_XILINX_XADC is not set
CONFIG_XILINX_AMS=m
# end of Analog to digital converters

#
# Analog to digital and digital to analog converters
#
# CONFIG_AD74413R is not set
# end of Analog to digital and digital to analog converters

#
# Analog Front Ends
#
CONFIG_IIO_RESCALE=m
# end of Analog Front Ends

#
# Amplifiers
#
CONFIG_AD8366=m
CONFIG_ADA4250=m
# CONFIG_HMC425 is not set
# end of Amplifiers

#
# Capacitance to digital converters
#
CONFIG_AD7150=m
CONFIG_AD7746=m
# end of Capacitance to digital converters

#
# Chemical Sensors
#
# CONFIG_ATLAS_PH_SENSOR is not set
CONFIG_ATLAS_EZO_SENSOR=m
CONFIG_BME680=m
CONFIG_BME680_I2C=m
CONFIG_BME680_SPI=m
CONFIG_CCS811=m
CONFIG_IAQCORE=m
# CONFIG_PMS7003 is not set
CONFIG_SCD30_CORE=m
# CONFIG_SCD30_I2C is not set
CONFIG_SCD30_SERIAL=m
CONFIG_SCD4X=m
# CONFIG_SENSIRION_SGP30 is not set
CONFIG_SENSIRION_SGP40=m
CONFIG_SPS30=m
# CONFIG_SPS30_I2C is not set
CONFIG_SPS30_SERIAL=m
CONFIG_SENSEAIR_SUNRISE_CO2=m
# CONFIG_VZ89X is not set
# end of Chemical Sensors

# CONFIG_IIO_CROS_EC_SENSORS_CORE is not set

#
# Hid Sensor IIO Common
#
# end of Hid Sensor IIO Common

CONFIG_IIO_MS_SENSORS_I2C=m

#
# IIO SCMI Sensors
#
# end of IIO SCMI Sensors

#
# SSP Sensor Common
#
# CONFIG_IIO_SSP_SENSORHUB is not set
# end of SSP Sensor Common

CONFIG_IIO_ST_SENSORS_I2C=m
CONFIG_IIO_ST_SENSORS_SPI=m
CONFIG_IIO_ST_SENSORS_CORE=m

#
# Digital to analog converters
#
CONFIG_AD3552R=m
CONFIG_AD5064=m
CONFIG_AD5360=m
CONFIG_AD5380=m
CONFIG_AD5421=m
CONFIG_AD5446=m
CONFIG_AD5449=m
CONFIG_AD5592R_BASE=m
# CONFIG_AD5592R is not set
CONFIG_AD5593R=m
CONFIG_AD5504=m
# CONFIG_AD5624R_SPI is not set
CONFIG_LTC2688=m
CONFIG_AD5686=m
CONFIG_AD5686_SPI=m
CONFIG_AD5696_I2C=m
# CONFIG_AD5755 is not set
CONFIG_AD5758=m
CONFIG_AD5761=m
CONFIG_AD5764=m
# CONFIG_AD5766 is not set
CONFIG_AD5770R=m
CONFIG_AD5791=m
CONFIG_AD7293=m
# CONFIG_AD7303 is not set
# CONFIG_AD8801 is not set
# CONFIG_DPOT_DAC is not set
# CONFIG_DS4424 is not set
# CONFIG_LPC18XX_DAC is not set
# CONFIG_LTC1660 is not set
# CONFIG_LTC2632 is not set
# CONFIG_M62332 is not set
# CONFIG_MAX517 is not set
CONFIG_MAX5821=m
CONFIG_MCP4725=m
CONFIG_MCP4922=m
CONFIG_STM32_DAC=m
CONFIG_STM32_DAC_CORE=m
# CONFIG_TI_DAC082S085 is not set
# CONFIG_TI_DAC5571 is not set
CONFIG_TI_DAC7311=m
CONFIG_TI_DAC7612=m
# CONFIG_VF610_DAC is not set
# end of Digital to analog converters

#
# IIO dummy driver
#
# end of IIO dummy driver

#
# Filters
#
# end of Filters

#
# Frequency Synthesizers DDS/PLL
#

#
# Clock Generator/Distribution
#
CONFIG_AD9523=m
# end of Clock Generator/Distribution

#
# Phase-Locked Loop (PLL) frequency synthesizers
#
CONFIG_ADF4350=m
CONFIG_ADF4371=m
CONFIG_ADMV1013=m
CONFIG_ADMV4420=m
CONFIG_ADRF6780=m
# end of Phase-Locked Loop (PLL) frequency synthesizers
# end of Frequency Synthesizers DDS/PLL

#
# Digital gyroscope sensors
#
CONFIG_ADIS16080=m
CONFIG_ADIS16130=m
CONFIG_ADIS16136=m
CONFIG_ADIS16260=m
# CONFIG_ADXRS290 is not set
CONFIG_ADXRS450=m
CONFIG_BMG160=m
CONFIG_BMG160_I2C=m
CONFIG_BMG160_SPI=m
CONFIG_FXAS21002C=m
CONFIG_FXAS21002C_I2C=m
CONFIG_FXAS21002C_SPI=m
CONFIG_MPU3050=m
CONFIG_MPU3050_I2C=m
# CONFIG_IIO_ST_GYRO_3AXIS is not set
CONFIG_ITG3200=m
# end of Digital gyroscope sensors

#
# Health Sensors
#

#
# Heart Rate Monitors
#
CONFIG_AFE4403=m
CONFIG_AFE4404=m
CONFIG_MAX30100=m
CONFIG_MAX30102=m
# end of Heart Rate Monitors
# end of Health Sensors

#
# Humidity sensors
#
CONFIG_AM2315=m
CONFIG_DHT11=m
CONFIG_HDC100X=m
CONFIG_HDC2010=m
# CONFIG_HTS221 is not set
CONFIG_HTU21=m
CONFIG_SI7005=m
# CONFIG_SI7020 is not set
# end of Humidity sensors

#
# Inertial measurement units
#
CONFIG_ADIS16400=m
CONFIG_ADIS16460=m
CONFIG_ADIS16475=m
CONFIG_ADIS16480=m
CONFIG_BMI160=m
CONFIG_BMI160_I2C=m
# CONFIG_BMI160_SPI is not set
# CONFIG_BOSCH_BNO055_SERIAL is not set
# CONFIG_BOSCH_BNO055_I2C is not set
# CONFIG_FXOS8700_I2C is not set
# CONFIG_FXOS8700_SPI is not set
CONFIG_KMX61=m
CONFIG_INV_ICM42600=m
# CONFIG_INV_ICM42600_I2C is not set
CONFIG_INV_ICM42600_SPI=m
CONFIG_INV_MPU6050_IIO=m
CONFIG_INV_MPU6050_I2C=m
CONFIG_INV_MPU6050_SPI=m
CONFIG_IIO_ST_LSM6DSX=m
CONFIG_IIO_ST_LSM6DSX_I2C=m
CONFIG_IIO_ST_LSM6DSX_SPI=m
CONFIG_IIO_ST_LSM6DSX_I3C=m
CONFIG_IIO_ST_LSM9DS0=m
CONFIG_IIO_ST_LSM9DS0_I2C=m
CONFIG_IIO_ST_LSM9DS0_SPI=m
# end of Inertial measurement units

CONFIG_IIO_ADIS_LIB=m
CONFIG_IIO_ADIS_LIB_BUFFER=y

#
# Light sensors
#
# CONFIG_ADJD_S311 is not set
CONFIG_ADUX1020=m
# CONFIG_AL3010 is not set
CONFIG_AL3320A=m
CONFIG_APDS9300=m
# CONFIG_APDS9960 is not set
CONFIG_AS73211=m
# CONFIG_BH1750 is not set
CONFIG_BH1780=m
CONFIG_CM32181=m
CONFIG_CM3232=m
# CONFIG_CM3323 is not set
# CONFIG_CM3605 is not set
CONFIG_CM36651=m
CONFIG_GP2AP002=m
CONFIG_GP2AP020A00F=m
CONFIG_IQS621_ALS=m
# CONFIG_SENSORS_ISL29018 is not set
CONFIG_SENSORS_ISL29028=m
# CONFIG_ISL29125 is not set
# CONFIG_JSA1212 is not set
CONFIG_RPR0521=m
CONFIG_SENSORS_LM3533=m
CONFIG_LTR501=m
CONFIG_LTRF216A=m
CONFIG_LV0104CS=m
CONFIG_MAX44000=m
CONFIG_MAX44009=m
CONFIG_NOA1305=m
CONFIG_OPT3001=m
# CONFIG_PA12203001 is not set
# CONFIG_SI1133 is not set
CONFIG_SI1145=m
CONFIG_STK3310=m
# CONFIG_ST_UVIS25 is not set
CONFIG_TCS3414=m
CONFIG_TCS3472=m
CONFIG_SENSORS_TSL2563=m
CONFIG_TSL2583=m
CONFIG_TSL2591=m
CONFIG_TSL2772=m
CONFIG_TSL4531=m
CONFIG_US5182D=m
# CONFIG_VCNL4000 is not set
CONFIG_VCNL4035=m
CONFIG_VEML6030=m
# CONFIG_VEML6070 is not set
CONFIG_VL6180=m
CONFIG_ZOPT2201=m
# end of Light sensors

#
# Magnetometer sensors
#
CONFIG_AK8974=m
CONFIG_AK8975=m
# CONFIG_AK09911 is not set
# CONFIG_BMC150_MAGN_I2C is not set
# CONFIG_BMC150_MAGN_SPI is not set
# CONFIG_MAG3110 is not set
CONFIG_MMC35240=m
CONFIG_IIO_ST_MAGN_3AXIS=m
CONFIG_IIO_ST_MAGN_I2C_3AXIS=m
CONFIG_IIO_ST_MAGN_SPI_3AXIS=m
CONFIG_SENSORS_HMC5843=m
# CONFIG_SENSORS_HMC5843_I2C is not set
CONFIG_SENSORS_HMC5843_SPI=m
CONFIG_SENSORS_RM3100=m
# CONFIG_SENSORS_RM3100_I2C is not set
CONFIG_SENSORS_RM3100_SPI=m
CONFIG_YAMAHA_YAS530=m
# end of Magnetometer sensors

#
# Multiplexers
#
# CONFIG_IIO_MUX is not set
# end of Multiplexers

#
# Inclinometer sensors
#
# end of Inclinometer sensors

#
# Triggers - standalone
#
# CONFIG_IIO_HRTIMER_TRIGGER is not set
CONFIG_IIO_INTERRUPT_TRIGGER=m
# CONFIG_IIO_STM32_LPTIMER_TRIGGER is not set
CONFIG_IIO_STM32_TIMER_TRIGGER=m
# CONFIG_IIO_TIGHTLOOP_TRIGGER is not set
CONFIG_IIO_SYSFS_TRIGGER=m
# end of Triggers - standalone

#
# Linear and angular position sensors
#
# CONFIG_IQS624_POS is not set
# end of Linear and angular position sensors

#
# Digital potentiometers
#
# CONFIG_AD5110 is not set
CONFIG_AD5272=m
# CONFIG_DS1803 is not set
# CONFIG_MAX5432 is not set
# CONFIG_MAX5481 is not set
CONFIG_MAX5487=m
# CONFIG_MCP4018 is not set
# CONFIG_MCP4131 is not set
CONFIG_MCP4531=m
# CONFIG_MCP41010 is not set
CONFIG_TPL0102=m
# end of Digital potentiometers

#
# Digital potentiostats
#
CONFIG_LMP91000=m
# end of Digital potentiostats

#
# Pressure sensors
#
CONFIG_ABP060MG=m
CONFIG_BMP280=m
CONFIG_BMP280_I2C=m
CONFIG_BMP280_SPI=m
# CONFIG_DLHL60D is not set
CONFIG_DPS310=m
# CONFIG_HP03 is not set
CONFIG_ICP10100=m
CONFIG_MPL115=m
CONFIG_MPL115_I2C=m
CONFIG_MPL115_SPI=m
# CONFIG_MPL3115 is not set
CONFIG_MS5611=m
# CONFIG_MS5611_I2C is not set
CONFIG_MS5611_SPI=m
CONFIG_MS5637=m
CONFIG_IIO_ST_PRESS=m
# CONFIG_IIO_ST_PRESS_I2C is not set
# CONFIG_IIO_ST_PRESS_SPI is not set
# CONFIG_T5403 is not set
CONFIG_HP206C=m
CONFIG_ZPA2326=m
CONFIG_ZPA2326_I2C=m
CONFIG_ZPA2326_SPI=m
# end of Pressure sensors

#
# Lightning sensors
#
# CONFIG_AS3935 is not set
# end of Lightning sensors

#
# Proximity and distance sensors
#
CONFIG_CROS_EC_MKBP_PROXIMITY=m
CONFIG_ISL29501=m
CONFIG_LIDAR_LITE_V2=m
CONFIG_MB1232=m
CONFIG_PING=m
# CONFIG_RFD77402 is not set
CONFIG_SRF04=m
CONFIG_SX_COMMON=m
CONFIG_SX9310=m
# CONFIG_SX9324 is not set
# CONFIG_SX9360 is not set
# CONFIG_SX9500 is not set
CONFIG_SRF08=m
CONFIG_VCNL3020=m
CONFIG_VL53L0X_I2C=m
# end of Proximity and distance sensors

#
# Resolver to digital converters
#
CONFIG_AD2S90=m
# CONFIG_AD2S1200 is not set
# end of Resolver to digital converters

#
# Temperature sensors
#
# CONFIG_IQS620AT_TEMP is not set
# CONFIG_LTC2983 is not set
# CONFIG_MAXIM_THERMOCOUPLE is not set
CONFIG_MLX90614=m
# CONFIG_MLX90632 is not set
CONFIG_TMP006=m
CONFIG_TMP007=m
CONFIG_TMP117=m
CONFIG_TSYS01=m
# CONFIG_TSYS02D is not set
CONFIG_MAX31856=m
CONFIG_MAX31865=m
# end of Temperature sensors

CONFIG_PWM=y
CONFIG_PWM_SYSFS=y
CONFIG_PWM_DEBUG=y
# CONFIG_PWM_ATMEL is not set
# CONFIG_PWM_ATMEL_HLCDC_PWM is not set
CONFIG_PWM_ATMEL_TCB=m
# CONFIG_PWM_BCM_IPROC is not set
# CONFIG_PWM_BCM_KONA is not set
# CONFIG_PWM_BCM2835 is not set
CONFIG_PWM_BERLIN=y
CONFIG_PWM_BRCMSTB=y
CONFIG_PWM_CLK=m
CONFIG_PWM_CLPS711X=m
CONFIG_PWM_CROS_EC=y
CONFIG_PWM_EP93XX=y
# CONFIG_PWM_FSL_FTM is not set
CONFIG_PWM_HIBVT=y
CONFIG_PWM_IMG=y
CONFIG_PWM_IMX1=m
CONFIG_PWM_IMX27=y
CONFIG_PWM_IMX_TPM=m
CONFIG_PWM_INTEL_LGM=m
# CONFIG_PWM_IQS620A is not set
# CONFIG_PWM_JZ4740 is not set
# CONFIG_PWM_KEEMBAY is not set
CONFIG_PWM_LP3943=m
# CONFIG_PWM_LPC18XX_SCT is not set
# CONFIG_PWM_LPC32XX is not set
CONFIG_PWM_LPSS=m
CONFIG_PWM_LPSS_PLATFORM=m
CONFIG_PWM_MESON=m
CONFIG_PWM_MTK_DISP=m
CONFIG_PWM_MEDIATEK=m
CONFIG_PWM_MXS=m
CONFIG_PWM_OMAP_DMTIMER=m
CONFIG_PWM_PCA9685=y
CONFIG_PWM_PXA=m
# CONFIG_PWM_RASPBERRYPI_POE is not set
CONFIG_PWM_RCAR=m
CONFIG_PWM_RENESAS_TPU=y
CONFIG_PWM_ROCKCHIP=m
# CONFIG_PWM_SAMSUNG is not set
CONFIG_PWM_SIFIVE=m
# CONFIG_PWM_SL28CPLD is not set
CONFIG_PWM_SPEAR=y
CONFIG_PWM_SPRD=m
# CONFIG_PWM_STI is not set
# CONFIG_PWM_STM32 is not set
CONFIG_PWM_STM32_LP=m
CONFIG_PWM_SUN4I=m
# CONFIG_PWM_SUNPLUS is not set
CONFIG_PWM_TEGRA=m
# CONFIG_PWM_TIECAP is not set
# CONFIG_PWM_TIEHRPWM is not set
# CONFIG_PWM_TWL is not set
# CONFIG_PWM_TWL_LED is not set
CONFIG_PWM_VISCONTI=y
CONFIG_PWM_VT8500=m
CONFIG_PWM_XILINX=y

#
# IRQ chip support
#
CONFIG_IRQCHIP=y
# CONFIG_AL_FIC is not set
CONFIG_JCORE_AIC=y
CONFIG_RENESAS_INTC_IRQPIN=y
CONFIG_RENESAS_IRQC=y
CONFIG_RENESAS_RZA1_IRQC=y
CONFIG_RENESAS_RZG2L_IRQC=y
CONFIG_SL28CPLD_INTC=y
# CONFIG_TS4800_IRQ is not set
CONFIG_XILINX_INTC=y
# CONFIG_INGENIC_TCU_IRQ is not set
CONFIG_IRQ_UNIPHIER_AIDET=y
CONFIG_MESON_IRQ_GPIO=y
# CONFIG_IMX_IRQSTEER is not set
CONFIG_IMX_INTMUX=y
CONFIG_IMX_MU_MSI=m
# CONFIG_EXYNOS_IRQ_COMBINER is not set
CONFIG_MST_IRQ=y
CONFIG_MCHP_EIC=y
CONFIG_SUNPLUS_SP7021_INTC=y
# end of IRQ chip support

CONFIG_IPACK_BUS=y
CONFIG_RESET_CONTROLLER=y
CONFIG_RESET_A10SR=m
CONFIG_RESET_ATH79=y
# CONFIG_RESET_AXS10X is not set
CONFIG_RESET_BCM6345=y
# CONFIG_RESET_BERLIN is not set
CONFIG_RESET_BRCMSTB=y
CONFIG_RESET_BRCMSTB_RESCAL=y
CONFIG_RESET_HSDK=y
# CONFIG_RESET_IMX7 is not set
CONFIG_RESET_INTEL_GW=y
# CONFIG_RESET_K210 is not set
# CONFIG_RESET_LANTIQ is not set
CONFIG_RESET_LPC18XX=y
CONFIG_RESET_MCHP_SPARX5=y
CONFIG_RESET_MESON=y
CONFIG_RESET_MESON_AUDIO_ARB=y
CONFIG_RESET_NPCM=y
CONFIG_RESET_PISTACHIO=y
# CONFIG_RESET_QCOM_AOSS is not set
CONFIG_RESET_QCOM_PDC=y
CONFIG_RESET_RASPBERRYPI=y
CONFIG_RESET_RZG2L_USBPHY_CTRL=y
# CONFIG_RESET_SCMI is not set
CONFIG_RESET_SIMPLE=y
CONFIG_RESET_SOCFPGA=y
CONFIG_RESET_STARFIVE_JH7100=y
CONFIG_RESET_SUNPLUS=y
CONFIG_RESET_SUNXI=y
# CONFIG_RESET_TI_SCI is not set
# CONFIG_RESET_TI_SYSCON is not set
# CONFIG_RESET_TI_TPS380X is not set
CONFIG_RESET_TN48M_CPLD=y
CONFIG_RESET_UNIPHIER=y
CONFIG_RESET_UNIPHIER_GLUE=y
# CONFIG_RESET_ZYNQ is not set
CONFIG_COMMON_RESET_HI3660=y
# CONFIG_COMMON_RESET_HI6220 is not set

#
# PHY Subsystem
#
CONFIG_GENERIC_PHY=y
CONFIG_GENERIC_PHY_MIPI_DPHY=y
CONFIG_PHY_LPC18XX_USB_OTG=y
CONFIG_PHY_PISTACHIO_USB=m
CONFIG_PHY_XGENE=y
CONFIG_USB_LGM_PHY=y
# CONFIG_PHY_CAN_TRANSCEIVER is not set
# CONFIG_PHY_SUN4I_USB is not set
CONFIG_PHY_SUN6I_MIPI_DPHY=m
CONFIG_PHY_SUN9I_USB=y
CONFIG_PHY_SUN50I_USB3=m
CONFIG_PHY_MESON8_HDMI_TX=y
# CONFIG_PHY_MESON8B_USB2 is not set
# CONFIG_PHY_MESON_GXL_USB2 is not set
CONFIG_PHY_MESON_G12A_MIPI_DPHY_ANALOG=m
CONFIG_PHY_MESON_G12A_USB2=m
# CONFIG_PHY_MESON_G12A_USB3_PCIE is not set
CONFIG_PHY_MESON_AXG_PCIE=y
CONFIG_PHY_MESON_AXG_MIPI_PCIE_ANALOG=m
# CONFIG_PHY_MESON_AXG_MIPI_DPHY is not set

#
# PHY drivers for Broadcom platforms
#
# CONFIG_PHY_BCM63XX_USBH is not set
CONFIG_PHY_CYGNUS_PCIE=y
# CONFIG_PHY_BCM_SR_USB is not set
CONFIG_BCM_KONA_USB2_PHY=m
# CONFIG_PHY_BCM_NS_USB2 is not set
# CONFIG_PHY_NS2_USB_DRD is not set
CONFIG_PHY_BRCM_SATA=m
CONFIG_PHY_BRCM_USB=m
CONFIG_PHY_BCM_SR_PCIE=y
# end of PHY drivers for Broadcom platforms

CONFIG_PHY_CADENCE_TORRENT=y
CONFIG_PHY_CADENCE_DPHY=m
CONFIG_PHY_CADENCE_DPHY_RX=y
CONFIG_PHY_CADENCE_SIERRA=m
# CONFIG_PHY_CADENCE_SALVO is not set
CONFIG_PHY_FSL_IMX8MQ_USB=y
CONFIG_PHY_MIXEL_LVDS_PHY=y
CONFIG_PHY_MIXEL_MIPI_DPHY=m
CONFIG_PHY_FSL_IMX8M_PCIE=y
CONFIG_PHY_FSL_LYNX_28G=m
CONFIG_PHY_HI6220_USB=y
CONFIG_PHY_HI3660_USB=y
CONFIG_PHY_HI3670_USB=y
# CONFIG_PHY_HI3670_PCIE is not set
CONFIG_PHY_HISTB_COMBPHY=y
CONFIG_PHY_HISI_INNO_USB2=y
# CONFIG_PHY_INGENIC_USB is not set
# CONFIG_PHY_LANTIQ_VRX200_PCIE is not set
CONFIG_PHY_LANTIQ_RCU_USB2=y
CONFIG_ARMADA375_USBCLUSTER_PHY=y
CONFIG_PHY_BERLIN_SATA=y
CONFIG_PHY_BERLIN_USB=m
# CONFIG_PHY_MVEBU_A3700_UTMI is not set
# CONFIG_PHY_MVEBU_A38X_COMPHY is not set
CONFIG_PHY_MVEBU_CP110_UTMI=y
CONFIG_PHY_PXA_28NM_HSIC=m
CONFIG_PHY_PXA_28NM_USB2=m
CONFIG_PHY_PXA_USB=m
CONFIG_PHY_MMP3_USB=y
CONFIG_PHY_MMP3_HSIC=y
CONFIG_PHY_MTK_PCIE=m
# CONFIG_PHY_MTK_TPHY is not set
# CONFIG_PHY_MTK_UFS is not set
CONFIG_PHY_MTK_XSPHY=y
CONFIG_PHY_MTK_HDMI=m
# CONFIG_PHY_MTK_MIPI_DSI is not set
CONFIG_PHY_MTK_DP=m
# CONFIG_PHY_SPARX5_SERDES is not set
CONFIG_PHY_LAN966X_SERDES=m
# CONFIG_PHY_CPCAP_USB is not set
CONFIG_PHY_MAPPHONE_MDM6600=m
# CONFIG_PHY_OCELOT_SERDES is not set
CONFIG_PHY_ATH79_USB=y
# CONFIG_PHY_QCOM_EDP is not set
CONFIG_PHY_QCOM_IPQ4019_USB=m
# CONFIG_PHY_QCOM_PCIE2 is not set
CONFIG_PHY_QCOM_QMP=y
# CONFIG_PHY_QCOM_QUSB2 is not set
# CONFIG_PHY_QCOM_USB_HS is not set
# CONFIG_PHY_QCOM_USB_SNPS_FEMTO_V2 is not set
# CONFIG_PHY_QCOM_USB_HSIC is not set
# CONFIG_PHY_QCOM_USB_HS_28NM is not set
CONFIG_PHY_QCOM_USB_SS=m
CONFIG_PHY_QCOM_IPQ806X_USB=m
# CONFIG_PHY_MT7621_PCI is not set
CONFIG_PHY_RALINK_USB=m
CONFIG_PHY_RCAR_GEN3_USB3=y
# CONFIG_PHY_ROCKCHIP_DPHY_RX0 is not set
CONFIG_PHY_ROCKCHIP_INNO_HDMI=y
# CONFIG_PHY_ROCKCHIP_INNO_USB2 is not set
CONFIG_PHY_ROCKCHIP_INNO_CSIDPHY=y
# CONFIG_PHY_ROCKCHIP_INNO_DSIDPHY is not set
CONFIG_PHY_ROCKCHIP_PCIE=y
# CONFIG_PHY_ROCKCHIP_SNPS_PCIE3 is not set
# CONFIG_PHY_ROCKCHIP_TYPEC is not set
CONFIG_PHY_EXYNOS_DP_VIDEO=y
CONFIG_PHY_EXYNOS_MIPI_VIDEO=m
# CONFIG_PHY_EXYNOS_PCIE is not set
CONFIG_PHY_SAMSUNG_UFS=m
CONFIG_PHY_SAMSUNG_USB2=m
# CONFIG_PHY_S5PV210_USB2 is not set
CONFIG_PHY_UNIPHIER_USB2=m
# CONFIG_PHY_UNIPHIER_USB3 is not set
CONFIG_PHY_UNIPHIER_PCIE=m
CONFIG_PHY_UNIPHIER_AHCI=y
# CONFIG_PHY_ST_SPEAR1310_MIPHY is not set
# CONFIG_PHY_ST_SPEAR1340_MIPHY is not set
CONFIG_PHY_STIH407_USB=m
# CONFIG_PHY_STM32_USBPHYC is not set
CONFIG_PHY_SUNPLUS_USB=m
CONFIG_PHY_TEGRA194_P2U=y
CONFIG_PHY_DA8XX_USB=m
CONFIG_PHY_DM816X_USB=m
CONFIG_PHY_AM654_SERDES=m
CONFIG_PHY_J721E_WIZ=m
CONFIG_OMAP_CONTROL_PHY=y
CONFIG_TI_PIPE3=y
CONFIG_PHY_TUSB1210=m
# CONFIG_PHY_INTEL_KEEMBAY_EMMC is not set
CONFIG_PHY_INTEL_KEEMBAY_USB=m
# CONFIG_PHY_INTEL_LGM_COMBO is not set
CONFIG_PHY_INTEL_LGM_EMMC=m
CONFIG_PHY_INTEL_THUNDERBAY_EMMC=y
CONFIG_PHY_XILINX_ZYNQMP=m
# end of PHY Subsystem

CONFIG_POWERCAP=y
# CONFIG_DTPM is not set
CONFIG_MCB=m
CONFIG_MCB_LPC=m
CONFIG_RAS=y

#
# Android
#
CONFIG_ANDROID_BINDER_IPC=y
# CONFIG_ANDROID_BINDERFS is not set
CONFIG_ANDROID_BINDER_DEVICES="binder,hwbinder,vndbinder"
CONFIG_ANDROID_BINDER_IPC_SELFTEST=y
# end of Android

# CONFIG_DAX is not set
CONFIG_NVMEM=y
CONFIG_NVMEM_SYSFS=y
# CONFIG_NVMEM_APPLE_EFUSES is not set
# CONFIG_NVMEM_BCM_OCOTP is not set
CONFIG_NVMEM_BRCM_NVRAM=y
CONFIG_NVMEM_IMX_IIM=m
CONFIG_NVMEM_IMX_OCOTP=y
CONFIG_NVMEM_JZ4780_EFUSE=m
# CONFIG_NVMEM_LAN9662_OTPC is not set
# CONFIG_NVMEM_LAYERSCAPE_SFP is not set
CONFIG_NVMEM_LPC18XX_EEPROM=y
# CONFIG_NVMEM_LPC18XX_OTP is not set
# CONFIG_NVMEM_MESON_MX_EFUSE is not set
CONFIG_NVMEM_MICROCHIP_OTPC=y
CONFIG_NVMEM_MTK_EFUSE=y
CONFIG_NVMEM_MXS_OCOTP=m
CONFIG_NVMEM_NINTENDO_OTP=y
# CONFIG_NVMEM_QCOM_QFPROM is not set
CONFIG_NVMEM_RAVE_SP_EEPROM=m
CONFIG_NVMEM_RMEM=y
CONFIG_NVMEM_ROCKCHIP_EFUSE=m
CONFIG_NVMEM_ROCKCHIP_OTP=y
CONFIG_NVMEM_SC27XX_EFUSE=m
CONFIG_NVMEM_SNVS_LPGPR=y
CONFIG_NVMEM_SPMI_SDAM=m
CONFIG_NVMEM_SPRD_EFUSE=y
CONFIG_NVMEM_STM32_ROMEM=m
# CONFIG_NVMEM_SUNPLUS_OCOTP is not set
CONFIG_NVMEM_U_BOOT_ENV=m
CONFIG_NVMEM_UNIPHIER_EFUSE=y
CONFIG_NVMEM_VF610_OCOTP=y

#
# HW tracing support
#
CONFIG_STM=y
# CONFIG_STM_PROTO_BASIC is not set
CONFIG_STM_PROTO_SYS_T=y
CONFIG_STM_DUMMY=y
CONFIG_STM_SOURCE_CONSOLE=y
# CONFIG_STM_SOURCE_HEARTBEAT is not set
CONFIG_STM_SOURCE_FTRACE=y
CONFIG_INTEL_TH=m
CONFIG_INTEL_TH_GTH=m
CONFIG_INTEL_TH_STH=m
CONFIG_INTEL_TH_MSU=m
CONFIG_INTEL_TH_PTI=m
CONFIG_INTEL_TH_DEBUG=y
# end of HW tracing support

CONFIG_FPGA=y
CONFIG_FPGA_MGR_SOCFPGA=y
CONFIG_FPGA_MGR_SOCFPGA_A10=y
CONFIG_ALTERA_PR_IP_CORE=m
CONFIG_ALTERA_PR_IP_CORE_PLAT=m
CONFIG_FPGA_MGR_ALTERA_PS_SPI=y
CONFIG_FPGA_MGR_ZYNQ_FPGA=y
# CONFIG_FPGA_MGR_XILINX_SPI is not set
CONFIG_FPGA_MGR_ICE40_SPI=y
CONFIG_FPGA_MGR_MACHXO2_SPI=y
CONFIG_FPGA_BRIDGE=y
CONFIG_ALTERA_FREEZE_BRIDGE=y
CONFIG_XILINX_PR_DECOUPLER=y
CONFIG_FPGA_REGION=m
# CONFIG_OF_FPGA_REGION is not set
# CONFIG_FPGA_DFL is not set
# CONFIG_FPGA_MGR_ZYNQMP_FPGA is not set
# CONFIG_FPGA_MGR_VERSAL_FPGA is not set
CONFIG_FPGA_M10_BMC_SEC_UPDATE=m
CONFIG_FPGA_MGR_MICROCHIP_SPI=m
CONFIG_FSI=m
# CONFIG_FSI_NEW_DEV_NODE is not set
# CONFIG_FSI_MASTER_GPIO is not set
# CONFIG_FSI_MASTER_HUB is not set
CONFIG_FSI_MASTER_ASPEED=m
CONFIG_FSI_SCOM=m
CONFIG_FSI_SBEFIFO=m
CONFIG_FSI_OCC=m
CONFIG_TEE=y
CONFIG_MULTIPLEXER=y

#
# Multiplexer drivers
#
CONFIG_MUX_ADG792A=y
CONFIG_MUX_ADGS1408=y
# CONFIG_MUX_GPIO is not set
CONFIG_MUX_MMIO=y
# end of Multiplexer drivers

CONFIG_PM_OPP=y
# CONFIG_SIOX is not set
CONFIG_SLIMBUS=m
# CONFIG_SLIM_QCOM_CTRL is not set
# CONFIG_INTERCONNECT is not set
CONFIG_COUNTER=y
CONFIG_104_QUAD_8=y
# CONFIG_INTERRUPT_CNT is not set
CONFIG_STM32_TIMER_CNT=y
# CONFIG_STM32_LPTIMER_CNT is not set
CONFIG_TI_EQEP=y
CONFIG_FTM_QUADDEC=y
CONFIG_MICROCHIP_TCB_CAPTURE=m
CONFIG_TI_ECAP_CAPTURE=y
# CONFIG_MOST is not set
# CONFIG_PECI is not set
# CONFIG_HTE is not set
# end of Device Drivers

#
# File systems
#
# CONFIG_VALIDATE_FS_PARSER is not set
CONFIG_EXPORTFS=m
# CONFIG_EXPORTFS_BLOCK_OPS is not set
# CONFIG_FILE_LOCKING is not set
CONFIG_FS_ENCRYPTION=y
CONFIG_FS_VERITY=y
CONFIG_FS_VERITY_DEBUG=y
CONFIG_FS_VERITY_BUILTIN_SIGNATURES=y
CONFIG_FSNOTIFY=y
# CONFIG_DNOTIFY is not set
CONFIG_INOTIFY_USER=y
# CONFIG_FANOTIFY is not set
# CONFIG_QUOTA is not set
# CONFIG_AUTOFS4_FS is not set
# CONFIG_AUTOFS_FS is not set
# CONFIG_FUSE_FS is not set
CONFIG_OVERLAY_FS=m
CONFIG_OVERLAY_FS_REDIRECT_DIR=y
# CONFIG_OVERLAY_FS_REDIRECT_ALWAYS_FOLLOW is not set
CONFIG_OVERLAY_FS_INDEX=y
CONFIG_OVERLAY_FS_NFS_EXPORT=y
# CONFIG_OVERLAY_FS_METACOPY is not set

#
# Caches
#
CONFIG_NETFS_SUPPORT=y
CONFIG_FSCACHE=y
CONFIG_FSCACHE_DEBUG=y
# end of Caches

#
# Pseudo filesystems
#
# CONFIG_PROC_FS is not set
CONFIG_KERNFS=y
CONFIG_SYSFS=y
CONFIG_TMPFS=y
# CONFIG_TMPFS_POSIX_ACL is not set
# CONFIG_TMPFS_XATTR is not set
CONFIG_MEMFD_CREATE=y
CONFIG_CONFIGFS_FS=y
# end of Pseudo filesystems

# CONFIG_MISC_FILESYSTEMS is not set
# CONFIG_NETWORK_FILESYSTEMS is not set
CONFIG_NLS=m
CONFIG_NLS_DEFAULT="iso8859-1"
CONFIG_NLS_CODEPAGE_437=m
CONFIG_NLS_CODEPAGE_737=m
# CONFIG_NLS_CODEPAGE_775 is not set
CONFIG_NLS_CODEPAGE_850=m
CONFIG_NLS_CODEPAGE_852=m
CONFIG_NLS_CODEPAGE_855=m
# CONFIG_NLS_CODEPAGE_857 is not set
# CONFIG_NLS_CODEPAGE_860 is not set
# CONFIG_NLS_CODEPAGE_861 is not set
# CONFIG_NLS_CODEPAGE_862 is not set
# CONFIG_NLS_CODEPAGE_863 is not set
CONFIG_NLS_CODEPAGE_864=m
CONFIG_NLS_CODEPAGE_865=m
CONFIG_NLS_CODEPAGE_866=m
# CONFIG_NLS_CODEPAGE_869 is not set
CONFIG_NLS_CODEPAGE_936=m
# CONFIG_NLS_CODEPAGE_950 is not set
CONFIG_NLS_CODEPAGE_932=m
CONFIG_NLS_CODEPAGE_949=m
# CONFIG_NLS_CODEPAGE_874 is not set
CONFIG_NLS_ISO8859_8=m
# CONFIG_NLS_CODEPAGE_1250 is not set
# CONFIG_NLS_CODEPAGE_1251 is not set
CONFIG_NLS_ASCII=m
CONFIG_NLS_ISO8859_1=m
# CONFIG_NLS_ISO8859_2 is not set
# CONFIG_NLS_ISO8859_3 is not set
CONFIG_NLS_ISO8859_4=m
CONFIG_NLS_ISO8859_5=m
CONFIG_NLS_ISO8859_6=m
CONFIG_NLS_ISO8859_7=m
CONFIG_NLS_ISO8859_9=m
CONFIG_NLS_ISO8859_13=m
CONFIG_NLS_ISO8859_14=m
CONFIG_NLS_ISO8859_15=m
CONFIG_NLS_KOI8_R=m
CONFIG_NLS_KOI8_U=m
CONFIG_NLS_MAC_ROMAN=m
CONFIG_NLS_MAC_CELTIC=m
# CONFIG_NLS_MAC_CENTEURO is not set
CONFIG_NLS_MAC_CROATIAN=m
CONFIG_NLS_MAC_CYRILLIC=m
CONFIG_NLS_MAC_GAELIC=m
CONFIG_NLS_MAC_GREEK=m
# CONFIG_NLS_MAC_ICELAND is not set
CONFIG_NLS_MAC_INUIT=m
# CONFIG_NLS_MAC_ROMANIAN is not set
CONFIG_NLS_MAC_TURKISH=m
CONFIG_NLS_UTF8=m
CONFIG_DLM=m
CONFIG_DLM_DEPRECATED_API=y
# CONFIG_DLM_DEBUG is not set
# CONFIG_UNICODE is not set
# end of File systems

#
# Security options
#
CONFIG_KEYS=y
CONFIG_KEYS_REQUEST_CACHE=y
CONFIG_PERSISTENT_KEYRINGS=y
# CONFIG_TRUSTED_KEYS is not set
CONFIG_ENCRYPTED_KEYS=y
# CONFIG_USER_DECRYPTED_DATA is not set
CONFIG_KEY_DH_OPERATIONS=y
CONFIG_SECURITY_DMESG_RESTRICT=y
CONFIG_SECURITYFS=y
CONFIG_HAVE_HARDENED_USERCOPY_ALLOCATOR=y
# CONFIG_HARDENED_USERCOPY is not set
CONFIG_STATIC_USERMODEHELPER=y
CONFIG_STATIC_USERMODEHELPER_PATH="/sbin/usermode-helper"
CONFIG_DEFAULT_SECURITY_DAC=y
CONFIG_LSM="landlock,lockdown,yama,loadpin,safesetid,integrity,bpf"

#
# Kernel hardening options
#

#
# Memory initialization
#
CONFIG_CC_HAS_AUTO_VAR_INIT_PATTERN=y
CONFIG_CC_HAS_AUTO_VAR_INIT_ZERO_BARE=y
CONFIG_CC_HAS_AUTO_VAR_INIT_ZERO=y
# CONFIG_INIT_STACK_NONE is not set
CONFIG_INIT_STACK_ALL_PATTERN=y
# CONFIG_INIT_STACK_ALL_ZERO is not set
CONFIG_INIT_ON_ALLOC_DEFAULT_ON=y
# CONFIG_INIT_ON_FREE_DEFAULT_ON is not set
# end of Memory initialization

CONFIG_CC_HAS_RANDSTRUCT=y
# CONFIG_RANDSTRUCT_NONE is not set
CONFIG_RANDSTRUCT_FULL=y
CONFIG_RANDSTRUCT=y
# end of Kernel hardening options
# end of Security options

CONFIG_CRYPTO=y

#
# Crypto core or helper
#
CONFIG_CRYPTO_FIPS=y
CONFIG_CRYPTO_FIPS_NAME="Linux Kernel Cryptographic API"
CONFIG_CRYPTO_FIPS_CUSTOM_VERSION=y
CONFIG_CRYPTO_FIPS_VERSION="(none)"
CONFIG_CRYPTO_ALGAPI=y
CONFIG_CRYPTO_ALGAPI2=y
CONFIG_CRYPTO_AEAD=y
CONFIG_CRYPTO_AEAD2=y
CONFIG_CRYPTO_SKCIPHER=y
CONFIG_CRYPTO_SKCIPHER2=y
CONFIG_CRYPTO_HASH=y
CONFIG_CRYPTO_HASH2=y
CONFIG_CRYPTO_RNG=y
CONFIG_CRYPTO_RNG2=y
CONFIG_CRYPTO_RNG_DEFAULT=y
CONFIG_CRYPTO_AKCIPHER2=y
CONFIG_CRYPTO_AKCIPHER=y
CONFIG_CRYPTO_KPP2=y
CONFIG_CRYPTO_KPP=y
CONFIG_CRYPTO_ACOMP2=y
CONFIG_CRYPTO_MANAGER=y
CONFIG_CRYPTO_MANAGER2=y
CONFIG_CRYPTO_USER=y
# CONFIG_CRYPTO_MANAGER_DISABLE_TESTS is not set
CONFIG_CRYPTO_MANAGER_EXTRA_TESTS=y
CONFIG_CRYPTO_GF128MUL=y
CONFIG_CRYPTO_NULL=y
CONFIG_CRYPTO_NULL2=y
CONFIG_CRYPTO_PCRYPT=y
CONFIG_CRYPTO_CRYPTD=y
CONFIG_CRYPTO_AUTHENC=y
CONFIG_CRYPTO_TEST=m
CONFIG_CRYPTO_ENGINE=y
# end of Crypto core or helper

#
# Public-key cryptography
#
CONFIG_CRYPTO_RSA=y
CONFIG_CRYPTO_DH=y
# CONFIG_CRYPTO_DH_RFC7919_GROUPS is not set
CONFIG_CRYPTO_ECC=y
CONFIG_CRYPTO_ECDH=m
CONFIG_CRYPTO_ECDSA=y
# CONFIG_CRYPTO_ECRDSA is not set
# CONFIG_CRYPTO_SM2 is not set
# CONFIG_CRYPTO_CURVE25519 is not set
# end of Public-key cryptography

#
# Block ciphers
#
CONFIG_CRYPTO_AES=y
# CONFIG_CRYPTO_AES_TI is not set
CONFIG_CRYPTO_ARIA=m
CONFIG_CRYPTO_BLOWFISH=y
CONFIG_CRYPTO_BLOWFISH_COMMON=y
CONFIG_CRYPTO_CAMELLIA=m
CONFIG_CRYPTO_CAST_COMMON=y
CONFIG_CRYPTO_CAST5=y
# CONFIG_CRYPTO_CAST6 is not set
CONFIG_CRYPTO_DES=y
CONFIG_CRYPTO_FCRYPT=m
CONFIG_CRYPTO_SERPENT=y
CONFIG_CRYPTO_SM4=m
# CONFIG_CRYPTO_SM4_GENERIC is not set
# CONFIG_CRYPTO_TWOFISH is not set
# end of Block ciphers

#
# Length-preserving ciphers and modes
#
CONFIG_CRYPTO_ADIANTUM=y
CONFIG_CRYPTO_CHACHA20=y
CONFIG_CRYPTO_CBC=y
CONFIG_CRYPTO_CFB=m
CONFIG_CRYPTO_CTR=y
CONFIG_CRYPTO_CTS=m
CONFIG_CRYPTO_ECB=y
# CONFIG_CRYPTO_HCTR2 is not set
CONFIG_CRYPTO_KEYWRAP=y
# CONFIG_CRYPTO_LRW is not set
# CONFIG_CRYPTO_OFB is not set
# CONFIG_CRYPTO_PCBC is not set
CONFIG_CRYPTO_XTS=m
CONFIG_CRYPTO_NHPOLY1305=y
# end of Length-preserving ciphers and modes

#
# AEAD (authenticated encryption with associated data) ciphers
#
CONFIG_CRYPTO_AEGIS128=m
CONFIG_CRYPTO_CHACHA20POLY1305=m
CONFIG_CRYPTO_CCM=m
CONFIG_CRYPTO_GCM=y
CONFIG_CRYPTO_SEQIV=m
CONFIG_CRYPTO_ECHAINIV=m
CONFIG_CRYPTO_ESSIV=m
# end of AEAD (authenticated encryption with associated data) ciphers

#
# Hashes, digests, and MACs
#
# CONFIG_CRYPTO_BLAKE2B is not set
CONFIG_CRYPTO_CMAC=y
CONFIG_CRYPTO_GHASH=y
CONFIG_CRYPTO_HMAC=y
CONFIG_CRYPTO_MD4=m
CONFIG_CRYPTO_MD5=y
# CONFIG_CRYPTO_MICHAEL_MIC is not set
CONFIG_CRYPTO_POLY1305=y
# CONFIG_CRYPTO_RMD160 is not set
CONFIG_CRYPTO_SHA1=y
CONFIG_CRYPTO_SHA256=y
CONFIG_CRYPTO_SHA512=y
CONFIG_CRYPTO_SHA3=y
CONFIG_CRYPTO_SM3=m
CONFIG_CRYPTO_SM3_GENERIC=m
CONFIG_CRYPTO_STREEBOG=y
CONFIG_CRYPTO_VMAC=y
CONFIG_CRYPTO_WP512=y
CONFIG_CRYPTO_XCBC=m
CONFIG_CRYPTO_XXHASH=y
# end of Hashes, digests, and MACs

#
# CRCs (cyclic redundancy checks)
#
CONFIG_CRYPTO_CRC32C=y
CONFIG_CRYPTO_CRC32=y
CONFIG_CRYPTO_CRCT10DIF=m
CONFIG_CRYPTO_CRC64_ROCKSOFT=m
# end of CRCs (cyclic redundancy checks)

#
# Compression
#
CONFIG_CRYPTO_DEFLATE=y
CONFIG_CRYPTO_LZO=m
CONFIG_CRYPTO_842=y
CONFIG_CRYPTO_LZ4=y
CONFIG_CRYPTO_LZ4HC=m
CONFIG_CRYPTO_ZSTD=m
# end of Compression

#
# Random number generation
#
CONFIG_CRYPTO_ANSI_CPRNG=y
CONFIG_CRYPTO_DRBG_MENU=y
CONFIG_CRYPTO_DRBG_HMAC=y
CONFIG_CRYPTO_DRBG_HASH=y
# CONFIG_CRYPTO_DRBG_CTR is not set
CONFIG_CRYPTO_DRBG=y
CONFIG_CRYPTO_JITTERENTROPY=y
CONFIG_CRYPTO_KDF800108_CTR=y
# end of Random number generation

#
# Userspace interface
#
CONFIG_CRYPTO_USER_API=y
# CONFIG_CRYPTO_USER_API_HASH is not set
CONFIG_CRYPTO_USER_API_SKCIPHER=y
# CONFIG_CRYPTO_USER_API_RNG is not set
# CONFIG_CRYPTO_USER_API_AEAD is not set
# CONFIG_CRYPTO_USER_API_ENABLE_OBSOLETE is not set
# CONFIG_CRYPTO_STATS is not set
# end of Userspace interface

CONFIG_CRYPTO_HASH_INFO=y
CONFIG_CRYPTO_HW=y
# CONFIG_CRYPTO_DEV_ALLWINNER is not set
# CONFIG_CRYPTO_DEV_EXYNOS_RNG is not set
CONFIG_CRYPTO_DEV_S5P=m
# CONFIG_CRYPTO_DEV_EXYNOS_HASH is not set
# CONFIG_CRYPTO_DEV_ATMEL_AES is not set
CONFIG_CRYPTO_DEV_ATMEL_TDES=y
CONFIG_CRYPTO_DEV_ATMEL_SHA=y
CONFIG_CRYPTO_DEV_ATMEL_I2C=m
CONFIG_CRYPTO_DEV_ATMEL_ECC=m
CONFIG_CRYPTO_DEV_ATMEL_SHA204A=m
CONFIG_CRYPTO_DEV_QCE=m
CONFIG_CRYPTO_DEV_QCE_AEAD=y
# CONFIG_CRYPTO_DEV_QCE_ENABLE_ALL is not set
# CONFIG_CRYPTO_DEV_QCE_ENABLE_SKCIPHER is not set
# CONFIG_CRYPTO_DEV_QCE_ENABLE_SHA is not set
CONFIG_CRYPTO_DEV_QCE_ENABLE_AEAD=y
# CONFIG_CRYPTO_DEV_QCOM_RNG is not set
CONFIG_CRYPTO_DEV_IMGTEC_HASH=y
CONFIG_CRYPTO_DEV_ZYNQMP_AES=m
CONFIG_CRYPTO_DEV_ZYNQMP_SHA3=m
CONFIG_CRYPTO_DEV_VIRTIO=y
CONFIG_CRYPTO_DEV_SAFEXCEL=m
CONFIG_CRYPTO_DEV_CCREE=m
CONFIG_CRYPTO_DEV_HISI_SEC=y
# CONFIG_CRYPTO_DEV_AMLOGIC_GXL is not set
CONFIG_CRYPTO_DEV_SA2UL=m
CONFIG_CRYPTO_DEV_KEEMBAY_OCS_AES_SM4=m
CONFIG_CRYPTO_DEV_KEEMBAY_OCS_AES_SM4_ECB=y
# CONFIG_CRYPTO_DEV_KEEMBAY_OCS_AES_SM4_CTS is not set
CONFIG_CRYPTO_DEV_KEEMBAY_OCS_ECC=m
# CONFIG_CRYPTO_DEV_KEEMBAY_OCS_HCU is not set
# CONFIG_CRYPTO_DEV_ASPEED is not set
CONFIG_ASYMMETRIC_KEY_TYPE=y
CONFIG_ASYMMETRIC_PUBLIC_KEY_SUBTYPE=y
CONFIG_X509_CERTIFICATE_PARSER=y
CONFIG_PKCS8_PRIVATE_KEY_PARSER=y
CONFIG_PKCS7_MESSAGE_PARSER=y
CONFIG_PKCS7_TEST_KEY=y
CONFIG_SIGNED_PE_FILE_VERIFICATION=y
# CONFIG_FIPS_SIGNATURE_SELFTEST is not set

#
# Certificates for signature checking
#
CONFIG_MODULE_SIG_KEY="certs/signing_key.pem"
# CONFIG_MODULE_SIG_KEY_TYPE_RSA is not set
CONFIG_MODULE_SIG_KEY_TYPE_ECDSA=y
CONFIG_SYSTEM_TRUSTED_KEYRING=y
CONFIG_SYSTEM_TRUSTED_KEYS=""
CONFIG_SYSTEM_EXTRA_CERTIFICATE=y
CONFIG_SYSTEM_EXTRA_CERTIFICATE_SIZE=4096
CONFIG_SECONDARY_TRUSTED_KEYRING=y
CONFIG_SYSTEM_BLACKLIST_KEYRING=y
CONFIG_SYSTEM_BLACKLIST_HASH_LIST=""
# CONFIG_SYSTEM_REVOCATION_LIST is not set
CONFIG_SYSTEM_BLACKLIST_AUTH_UPDATE=y
# end of Certificates for signature checking

CONFIG_BINARY_PRINTF=y

#
# Library routines
#
CONFIG_LINEAR_RANGES=y
# CONFIG_PACKING is not set
CONFIG_BITREVERSE=y
CONFIG_GENERIC_STRNCPY_FROM_USER=y
CONFIG_GENERIC_STRNLEN_USER=y
CONFIG_GENERIC_NET_UTILS=y
CONFIG_CORDIC=m
CONFIG_PRIME_NUMBERS=m
CONFIG_RATIONAL=y
CONFIG_GENERIC_PCI_IOMAP=y
CONFIG_GENERIC_IOMAP=y
CONFIG_STMP_DEVICE=y

#
# Crypto library routines
#
CONFIG_CRYPTO_LIB_UTILS=y
CONFIG_CRYPTO_LIB_AES=y
CONFIG_CRYPTO_LIB_ARC4=m
CONFIG_CRYPTO_LIB_BLAKE2S_GENERIC=y
CONFIG_CRYPTO_LIB_CHACHA_GENERIC=y
# CONFIG_CRYPTO_LIB_CHACHA is not set
CONFIG_CRYPTO_LIB_CURVE25519_GENERIC=y
CONFIG_CRYPTO_LIB_CURVE25519=y
CONFIG_CRYPTO_LIB_DES=y
CONFIG_CRYPTO_LIB_POLY1305_RSIZE=1
CONFIG_CRYPTO_LIB_POLY1305_GENERIC=y
CONFIG_CRYPTO_LIB_POLY1305=y
# CONFIG_CRYPTO_LIB_CHACHA20POLY1305 is not set
CONFIG_CRYPTO_LIB_SHA1=y
CONFIG_CRYPTO_LIB_SHA256=y
# end of Crypto library routines

CONFIG_CRC_CCITT=y
CONFIG_CRC16=y
CONFIG_CRC_T10DIF=m
CONFIG_CRC64_ROCKSOFT=m
CONFIG_CRC_ITU_T=y
CONFIG_CRC32=y
CONFIG_CRC32_SELFTEST=m
# CONFIG_CRC32_SLICEBY8 is not set
# CONFIG_CRC32_SLICEBY4 is not set
# CONFIG_CRC32_SARWATE is not set
CONFIG_CRC32_BIT=y
CONFIG_CRC64=m
CONFIG_CRC4=y
CONFIG_CRC7=y
CONFIG_LIBCRC32C=y
CONFIG_CRC8=y
CONFIG_XXHASH=y
# CONFIG_RANDOM32_SELFTEST is not set
CONFIG_842_COMPRESS=y
CONFIG_842_DECOMPRESS=y
CONFIG_ZLIB_INFLATE=y
CONFIG_ZLIB_DEFLATE=y
CONFIG_LZO_COMPRESS=m
CONFIG_LZO_DECOMPRESS=y
CONFIG_LZ4_COMPRESS=y
CONFIG_LZ4HC_COMPRESS=m
CONFIG_LZ4_DECOMPRESS=y
CONFIG_ZSTD_COMMON=y
CONFIG_ZSTD_COMPRESS=m
CONFIG_ZSTD_DECOMPRESS=y
CONFIG_XZ_DEC=y
# CONFIG_XZ_DEC_X86 is not set
# CONFIG_XZ_DEC_POWERPC is not set
# CONFIG_XZ_DEC_IA64 is not set
CONFIG_XZ_DEC_ARM=y
CONFIG_XZ_DEC_ARMTHUMB=y
CONFIG_XZ_DEC_SPARC=y
CONFIG_XZ_DEC_MICROLZMA=y
CONFIG_XZ_DEC_BCJ=y
# CONFIG_XZ_DEC_TEST is not set
CONFIG_DECOMPRESS_LZO=y
CONFIG_DECOMPRESS_ZSTD=y
CONFIG_GENERIC_ALLOCATOR=y
CONFIG_REED_SOLOMON=y
CONFIG_REED_SOLOMON_ENC16=y
CONFIG_REED_SOLOMON_DEC16=y
CONFIG_BCH=m
CONFIG_BCH_CONST_PARAMS=y
CONFIG_TEXTSEARCH=y
CONFIG_TEXTSEARCH_KMP=m
CONFIG_TEXTSEARCH_BM=m
CONFIG_TEXTSEARCH_FSM=m
CONFIG_INTERVAL_TREE=y
CONFIG_ASSOCIATIVE_ARRAY=y
CONFIG_HAS_IOMEM=y
CONFIG_HAS_IOPORT_MAP=y
CONFIG_HAS_DMA=y
CONFIG_DMA_OPS=y
CONFIG_NEED_SG_DMA_LENGTH=y
CONFIG_NEED_DMA_MAP_STATE=y
CONFIG_DMA_DECLARE_COHERENT=y
CONFIG_ARCH_HAS_SYNC_DMA_FOR_DEVICE=y
CONFIG_DMA_GLOBAL_POOL=y
CONFIG_DMA_API_DEBUG=y
CONFIG_DMA_API_DEBUG_SG=y
CONFIG_DMA_MAP_BENCHMARK=y
CONFIG_SGL_ALLOC=y
# CONFIG_FORCE_NR_CPUS is not set
CONFIG_CPU_RMAP=y
CONFIG_DQL=y
CONFIG_GLOB=y
CONFIG_GLOB_SELFTEST=m
CONFIG_NLATTR=y
CONFIG_GENERIC_ATOMIC64=y
CONFIG_CLZ_TAB=y
CONFIG_IRQ_POLL=y
CONFIG_MPILIB=y
CONFIG_DIMLIB=y
CONFIG_LIBFDT=y
CONFIG_OID_REGISTRY=y
CONFIG_SG_SPLIT=y
CONFIG_STACKDEPOT=y
CONFIG_REF_TRACKER=y
CONFIG_PARMAN=m
# CONFIG_OBJAGG is not set
# end of Library routines

CONFIG_POLYNOMIAL=m

#
# Kernel hacking
#

#
# printk and dmesg options
#
CONFIG_CONSOLE_LOGLEVEL_DEFAULT=7
CONFIG_CONSOLE_LOGLEVEL_QUIET=4
CONFIG_MESSAGE_LOGLEVEL_DEFAULT=4
CONFIG_SYMBOLIC_ERRNAME=y
# CONFIG_DEBUG_BUGVERBOSE is not set
# end of printk and dmesg options

CONFIG_DEBUG_KERNEL=y
# CONFIG_DEBUG_MISC is not set

#
# Compile-time checks and compiler options
#
CONFIG_DEBUG_INFO=y
CONFIG_AS_HAS_NON_CONST_LEB128=y
# CONFIG_DEBUG_INFO_NONE is not set
# CONFIG_DEBUG_INFO_DWARF_TOOLCHAIN_DEFAULT is not set
CONFIG_DEBUG_INFO_DWARF4=y
# CONFIG_DEBUG_INFO_DWARF5 is not set
# CONFIG_DEBUG_INFO_REDUCED is not set
CONFIG_DEBUG_INFO_COMPRESSED=y
CONFIG_DEBUG_INFO_SPLIT=y
CONFIG_PAHOLE_HAS_SPLIT_BTF=y
CONFIG_PAHOLE_HAS_BTF_TAG=y
CONFIG_GDB_SCRIPTS=y
CONFIG_FRAME_WARN=1024
# CONFIG_STRIP_ASM_SYMS is not set
CONFIG_HEADERS_INSTALL=y
CONFIG_SECTION_MISMATCH_WARN_ONLY=y
# CONFIG_VMLINUX_MAP is not set
CONFIG_DEBUG_FORCE_WEAK_PER_CPU=y
# end of Compile-time checks and compiler options

#
# Generic Kernel Debugging Instruments
#
CONFIG_MAGIC_SYSRQ=y
CONFIG_MAGIC_SYSRQ_DEFAULT_ENABLE=0x1
# CONFIG_MAGIC_SYSRQ_SERIAL is not set
CONFIG_DEBUG_FS=y
# CONFIG_DEBUG_FS_ALLOW_ALL is not set
# CONFIG_DEBUG_FS_DISALLOW_MOUNT is not set
CONFIG_DEBUG_FS_ALLOW_NONE=y
CONFIG_HAVE_ARCH_KGDB=y
CONFIG_KGDB=y
# CONFIG_KGDB_TESTS is not set
CONFIG_KGDB_KDB=y
CONFIG_KDB_DEFAULT_ENABLE=0x1
CONFIG_KDB_CONTINUE_CATASTROPHIC=0
CONFIG_UBSAN=y
CONFIG_CC_HAS_UBSAN_BOUNDS=y
CONFIG_CC_HAS_UBSAN_ARRAY_BOUNDS=y
# CONFIG_UBSAN_BOUNDS is not set
# CONFIG_UBSAN_SHIFT is not set
CONFIG_UBSAN_UNREACHABLE=y
# CONFIG_UBSAN_BOOL is not set
CONFIG_UBSAN_ENUM=y
CONFIG_TEST_UBSAN=m
# end of Generic Kernel Debugging Instruments

#
# Networking Debugging
#
# CONFIG_NET_DEV_REFCNT_TRACKER is not set
# CONFIG_NET_NS_REFCNT_TRACKER is not set
CONFIG_DEBUG_NET=y
# end of Networking Debugging

#
# Memory Debugging
#
CONFIG_PAGE_EXTENSION=y
CONFIG_DEBUG_PAGEALLOC=y
CONFIG_DEBUG_PAGEALLOC_ENABLE_DEFAULT=y
# CONFIG_SLUB_DEBUG is not set
CONFIG_PAGE_OWNER=y
CONFIG_PAGE_POISONING=y
CONFIG_DEBUG_PAGE_REF=y
# CONFIG_DEBUG_OBJECTS is not set
CONFIG_SHRINKER_DEBUG=y
CONFIG_DEBUG_STACK_USAGE=y
CONFIG_SCHED_STACK_END_CHECK=y
# CONFIG_DEBUG_VM is not set
CONFIG_DEBUG_MEMORY_INIT=y
# CONFIG_DEBUG_PER_CPU_MAPS is not set
CONFIG_CC_HAS_KASAN_GENERIC=y
CONFIG_CC_HAS_WORKING_NOSANITIZE_ADDRESS=y
# end of Memory Debugging

CONFIG_DEBUG_SHIRQ=y

#
# Debug Oops, Lockups and Hangs
#
CONFIG_PANIC_ON_OOPS=y
CONFIG_PANIC_ON_OOPS_VALUE=1
CONFIG_PANIC_TIMEOUT=0
CONFIG_LOCKUP_DETECTOR=y
CONFIG_SOFTLOCKUP_DETECTOR=y
# CONFIG_BOOTPARAM_SOFTLOCKUP_PANIC is not set
# CONFIG_DETECT_HUNG_TASK is not set
CONFIG_WQ_WATCHDOG=y
CONFIG_TEST_LOCKUP=m
# end of Debug Oops, Lockups and Hangs

#
# Scheduler Debugging
#
# end of Scheduler Debugging

# CONFIG_DEBUG_TIMEKEEPING is not set

#
# Lock Debugging (spinlocks, mutexes, etc...)
#
CONFIG_LOCK_DEBUGGING_SUPPORT=y
CONFIG_PROVE_LOCKING=y
CONFIG_PROVE_RAW_LOCK_NESTING=y
# CONFIG_LOCK_STAT is not set
CONFIG_DEBUG_RT_MUTEXES=y
CONFIG_DEBUG_SPINLOCK=y
CONFIG_DEBUG_MUTEXES=y
CONFIG_DEBUG_WW_MUTEX_SLOWPATH=y
CONFIG_DEBUG_RWSEMS=y
CONFIG_DEBUG_LOCK_ALLOC=y
CONFIG_LOCKDEP=y
CONFIG_LOCKDEP_BITS=15
CONFIG_LOCKDEP_CHAINS_BITS=16
CONFIG_LOCKDEP_STACK_TRACE_BITS=19
CONFIG_LOCKDEP_STACK_TRACE_HASH_BITS=14
CONFIG_LOCKDEP_CIRCULAR_QUEUE_BITS=12
# CONFIG_DEBUG_LOCKDEP is not set
CONFIG_DEBUG_LOCKING_API_SELFTESTS=y
CONFIG_LOCK_TORTURE_TEST=m
# CONFIG_WW_MUTEX_SELFTEST is not set
CONFIG_SCF_TORTURE_TEST=y
# end of Lock Debugging (spinlocks, mutexes, etc...)

CONFIG_TRACE_IRQFLAGS=y
# CONFIG_DEBUG_IRQFLAGS is not set
CONFIG_STACKTRACE=y
CONFIG_WARN_ALL_UNSEEDED_RANDOM=y
CONFIG_DEBUG_KOBJECT=y

#
# Debug kernel data structures
#
# CONFIG_DEBUG_LIST is not set
CONFIG_DEBUG_PLIST=y
# CONFIG_DEBUG_SG is not set
CONFIG_DEBUG_NOTIFIERS=y
# CONFIG_BUG_ON_DATA_CORRUPTION is not set
# CONFIG_DEBUG_MAPLE_TREE is not set
# end of Debug kernel data structures

# CONFIG_DEBUG_CREDENTIALS is not set

#
# RCU Debugging
#
CONFIG_PROVE_RCU=y
CONFIG_TORTURE_TEST=y
# CONFIG_RCU_SCALE_TEST is not set
CONFIG_RCU_TORTURE_TEST=m
CONFIG_RCU_REF_SCALE_TEST=m
CONFIG_RCU_CPU_STALL_TIMEOUT=21
CONFIG_RCU_EXP_CPU_STALL_TIMEOUT=0
# CONFIG_RCU_TRACE is not set
CONFIG_RCU_EQS_DEBUG=y
# end of RCU Debugging

# CONFIG_DEBUG_WQ_FORCE_RR_CPU is not set
CONFIG_NOP_TRACER=y
CONFIG_TRACE_CLOCK=y
CONFIG_RING_BUFFER=y
CONFIG_EVENT_TRACING=y
CONFIG_CONTEXT_SWITCH_TRACER=y
CONFIG_PREEMPTIRQ_TRACEPOINTS=y
CONFIG_TRACING=y
CONFIG_TRACING_SUPPORT=y
# CONFIG_FTRACE is not set
# CONFIG_SAMPLES is not set

#
# hexagon Debugging
#
# end of hexagon Debugging

#
# Kernel Testing and Coverage
#
# CONFIG_KUNIT is not set
CONFIG_NOTIFIER_ERROR_INJECTION=m
CONFIG_OF_RECONFIG_NOTIFIER_ERROR_INJECT=m
CONFIG_NETDEV_NOTIFIER_ERROR_INJECT=m
# CONFIG_FAULT_INJECTION is not set
CONFIG_CC_HAS_SANCOV_TRACE_PC=y
CONFIG_RUNTIME_TESTING_MENU=y
CONFIG_LKDTM=m
CONFIG_TEST_MIN_HEAP=y
# CONFIG_TEST_DIV64 is not set
CONFIG_BACKTRACE_SELF_TEST=y
CONFIG_TEST_REF_TRACKER=y
CONFIG_RBTREE_TEST=m
CONFIG_REED_SOLOMON_TEST=y
CONFIG_INTERVAL_TREE_TEST=m
CONFIG_PERCPU_TEST=m
CONFIG_ATOMIC64_SELFTEST=m
CONFIG_TEST_HEXDUMP=m
CONFIG_STRING_SELFTEST=y
CONFIG_TEST_STRING_HELPERS=m
# CONFIG_TEST_STRSCPY is not set
CONFIG_TEST_KSTRTOX=m
# CONFIG_TEST_PRINTF is not set
CONFIG_TEST_SCANF=m
# CONFIG_TEST_BITMAP is not set
CONFIG_TEST_UUID=m
CONFIG_TEST_XARRAY=m
# CONFIG_TEST_MAPLE_TREE is not set
CONFIG_TEST_RHASHTABLE=m
# CONFIG_TEST_SIPHASH is not set
CONFIG_TEST_IDA=y
CONFIG_TEST_PARMAN=m
# CONFIG_TEST_LKM is not set
CONFIG_TEST_BITOPS=m
# CONFIG_TEST_VMALLOC is not set
CONFIG_TEST_USER_COPY=m
CONFIG_TEST_BPF=m
CONFIG_TEST_BLACKHOLE_DEV=m
# CONFIG_FIND_BIT_BENCHMARK is not set
# CONFIG_TEST_FIRMWARE is not set
CONFIG_TEST_UDELAY=y
CONFIG_TEST_STATIC_KEYS=m
CONFIG_TEST_MEMCAT_P=m
# CONFIG_TEST_MEMINIT is not set
CONFIG_TEST_FREE_PAGES=y
# end of Kernel Testing and Coverage

#
# Rust hacking
#
# end of Rust hacking

CONFIG_WARN_MISSING_DOCUMENTS=y
CONFIG_WARN_ABI_ERRORS=y
# end of Kernel hacking

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH bpf-next 01/13] bpf: Loosen alloc obj test in verifier's reg_btf_record
  2022-12-06 23:09 ` [PATCH bpf-next 01/13] bpf: Loosen alloc obj test in verifier's reg_btf_record Dave Marchevsky
@ 2022-12-07 16:41   ` Kumar Kartikeya Dwivedi
  2022-12-07 18:34     ` Dave Marchevsky
  0 siblings, 1 reply; 51+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2022-12-07 16:41 UTC (permalink / raw)
  To: Dave Marchevsky
  Cc: bpf, Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Kernel Team, Tejun Heo

On Wed, Dec 07, 2022 at 04:39:48AM IST, Dave Marchevsky wrote:
> btf->struct_meta_tab is populated by btf_parse_struct_metas in btf.c.
> There, a BTF record is created for any type containing a spin_lock or
> any next-gen datastructure node/head.
>
> Currently, for non-MAP_VALUE types, reg_btf_record will only search for
> a record using struct_meta_tab if the reg->type exactly matches
> (PTR_TO_BTF_ID | MEM_ALLOC). This exact match is too strict: an
> "allocated obj" type - returned from bpf_obj_new - might pick up other
> flags while working its way through the program.
>

Not following. Only PTR_TO_BTF_ID | MEM_ALLOC is the valid reg->type that can be
passed to helpers. reg_btf_record is used in helpers to inspect the btf_record.
Any other flag combination (the only one possible is PTR_UNTRUSTED right now)
cannot be passed to helpers in the first place. The reason to set PTR_UNTRUSTED
is to make them unpassable to helpers.

> Loosen the check to be exact for base_type and just use MEM_ALLOC mask
> for type_flag.
>
> This patch is marked Fixes as the original intent of reg_btf_record was
> unlikely to have been to fail finding btf_record for valid alloc obj
> types with additional flags, some of which (e.g. PTR_UNTRUSTED)
> are valid register type states for alloc obj independent of this series.

That was the actual intent, same as how check_ptr_to_btf_access uses the exact
reg->type to allow the BPF_WRITE case.

I think this series is the one introducing this case, passing bpf_rbtree_first's
result to bpf_rbtree_remove, which I think is not possible to make safe in the
first place. We decided to do bpf_list_pop_front instead of bpf_list_entry ->
bpf_list_del due to this exact issue. More in [0].

 [0]: https://lore.kernel.org/bpf/CAADnVQKifhUk_HE+8qQ=AOhAssH6w9LZ082Oo53rwaS+tAGtOw@mail.gmail.com

> However, I didn't find a specific broken repro case outside of this
> series' added functionality, so it's possible that nothing was
> triggering this logic error before.
>
> Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
> cc: Kumar Kartikeya Dwivedi <memxor@gmail.com>
> Fixes: 4e814da0d599 ("bpf: Allow locking bpf_spin_lock in allocated objects")
> ---
>  kernel/bpf/verifier.c | 7 ++++++-
>  1 file changed, 6 insertions(+), 1 deletion(-)
>
> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> index 1d51bd9596da..67a13110bc22 100644
> --- a/kernel/bpf/verifier.c
> +++ b/kernel/bpf/verifier.c
> @@ -451,6 +451,11 @@ static bool reg_type_not_null(enum bpf_reg_type type)
>  		type == PTR_TO_SOCK_COMMON;
>  }
>
> +static bool type_is_ptr_alloc_obj(u32 type)
> +{
> +	return base_type(type) == PTR_TO_BTF_ID && type_flag(type) & MEM_ALLOC;
> +}
> +
>  static struct btf_record *reg_btf_record(const struct bpf_reg_state *reg)
>  {
>  	struct btf_record *rec = NULL;
> @@ -458,7 +463,7 @@ static struct btf_record *reg_btf_record(const struct bpf_reg_state *reg)
>
>  	if (reg->type == PTR_TO_MAP_VALUE) {
>  		rec = reg->map_ptr->record;
> -	} else if (reg->type == (PTR_TO_BTF_ID | MEM_ALLOC)) {
> +	} else if (type_is_ptr_alloc_obj(reg->type)) {
>  		meta = btf_find_struct_meta(reg->btf, reg->btf_id);
>  		if (meta)
>  			rec = meta->record;
> --
> 2.30.2
>

* Re: [PATCH bpf-next 02/13] bpf: map_check_btf should fail if btf_parse_fields fails
  2022-12-06 23:09 ` [PATCH bpf-next 02/13] bpf: map_check_btf should fail if btf_parse_fields fails Dave Marchevsky
  2022-12-07  1:32   ` Alexei Starovoitov
@ 2022-12-07 16:49   ` Kumar Kartikeya Dwivedi
  2022-12-07 19:05     ` Alexei Starovoitov
  1 sibling, 1 reply; 51+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2022-12-07 16:49 UTC (permalink / raw)
  To: Dave Marchevsky
  Cc: bpf, Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Kernel Team, Tejun Heo

On Wed, Dec 07, 2022 at 04:39:49AM IST, Dave Marchevsky wrote:
> map_check_btf calls btf_parse_fields to create a btf_record for its
> value_type. If there are no special fields in the value_type,
> btf_parse_fields returns NULL, whereas if there are special value_type
> fields that are invalid in some way, an error is returned.
>
> An example invalid state would be:
>
>   struct node_data {
>     struct bpf_rb_node node;
>     int data;
>   };
>
>   private(A) struct bpf_spin_lock glock;
>   private(A) struct bpf_list_head ghead __contains(node_data, node);
>
> ghead should be invalid as its __contains tag points to a field with
> type != "bpf_list_node".
>
> Before this patch, such a scenario would result in btf_parse_fields
> returning an error ptr, subsequent !IS_ERR_OR_NULL check failing,
> and btf_check_and_fixup_fields returning 0, which would then be
> returned by map_check_btf.
>
> After this patch's changes, -EINVAL would be returned by map_check_btf
> and the map would correctly fail to load.
>
> Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
> cc: Kumar Kartikeya Dwivedi <memxor@gmail.com>
> Fixes: aa3496accc41 ("bpf: Refactor kptr_off_tab into btf_record")
> ---
>  kernel/bpf/syscall.c | 5 ++++-
>  1 file changed, 4 insertions(+), 1 deletion(-)
>
> diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
> index 35972afb6850..c3599a7902f0 100644
> --- a/kernel/bpf/syscall.c
> +++ b/kernel/bpf/syscall.c
> @@ -1007,7 +1007,10 @@ static int map_check_btf(struct bpf_map *map, const struct btf *btf,
>  	map->record = btf_parse_fields(btf, value_type,
>  				       BPF_SPIN_LOCK | BPF_TIMER | BPF_KPTR | BPF_LIST_HEAD,
>  				       map->value_size);
> -	if (!IS_ERR_OR_NULL(map->record)) {
> +	if (IS_ERR(map->record))
> +		return -EINVAL;
> +

I deliberately didn't do this, because of backward compatibility concerns. In
earlier kernel versions no error was returned at map creation time; those
fields acted like normal non-special regions, with errors only on use of the
helpers that act on those fields.

Especially that bpf_spin_lock and bpf_timer are part of the unified btf_record.

If we are doing such a change, then you should also drop the checks for IS_ERR
in verifier.c, since that shouldn't be possible anymore. But I think we need to
think carefully before changing this.

One possible example: if we introduce bpf_foo in the future and a program
already has that name defined in its map value, using it for some other
purpose with different alignment and size, its map creation will start failing.

* Re: [PATCH bpf-next 10/13] bpf, x86: BPF_PROBE_MEM handling for insn->off < 0
  2022-12-07  6:46     ` Dave Marchevsky
@ 2022-12-07 18:06       ` Alexei Starovoitov
  2022-12-07 23:39         ` Dave Marchevsky
  0 siblings, 1 reply; 51+ messages in thread
From: Alexei Starovoitov @ 2022-12-07 18:06 UTC (permalink / raw)
  To: Dave Marchevsky
  Cc: Dave Marchevsky, bpf, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, Kernel Team, Kumar Kartikeya Dwivedi, Tejun Heo

On Wed, Dec 07, 2022 at 01:46:56AM -0500, Dave Marchevsky wrote:
> On 12/6/22 9:39 PM, Alexei Starovoitov wrote:
> > On Tue, Dec 06, 2022 at 03:09:57PM -0800, Dave Marchevsky wrote:
> >> Current comment in BPF_PROBE_MEM jit code claims that verifier prevents
> >> insn->off < 0, but this appears to not be true irrespective of changes
> >> in this series. Regardless, changes in this series will result in an
> >> example like:
> >>
> >>   struct example_node {
> >>     long key;
> >>     long val;
> >>     struct bpf_rb_node node;
> >>   }
> >>
> >>   /* In BPF prog, assume root contains example_node nodes */
>>   struct bpf_rb_node *res = bpf_rbtree_first(&root);
> >>   if (!res)
> >>     return 1;
> >>
>>   struct example_node *n = container_of(res, struct example_node, node);
> >>   long key = n->key;
> >>
> >> Resulting in a load with off = -16, as bpf_rbtree_first's return is
> > 
> > Looks like the bug in the previous patch:
> > +                       } else if (meta.func_id == special_kfunc_list[KF_bpf_rbtree_remove] ||
> > +                                  meta.func_id == special_kfunc_list[KF_bpf_rbtree_first]) {
> > +                               struct btf_field *field = meta.arg_rbtree_root.field;
> > +
> > +                               mark_reg_datastructure_node(regs, BPF_REG_0,
> > +                                                           &field->datastructure_head);
> > 
> > The R0 .off should have been:
> >  regs[BPF_REG_0].off = field->rb_node.node_offset;
> > 
> > node, not root.
> > 
> > PTR_TO_BTF_ID should have been returned with approriate 'off',
> > so that container_of() would it bring back to zero offset.
> > 
> 
> The root's btf_field is used to hold information about the node type. Of
> specific interest to us are value_btf_id and node_offset, which
> mark_reg_datastructure_node uses to set REG_0's type and offset correctly.
> 
> This "use head type to keep info about node type" strategy felt strange to me
> initially too: all PTR_TO_BTF_ID regs are passing around their type info, so
> why not use that to look up bpf_rb_node field info? But consider that
> bpf_rbtree_first (and bpf_list_pop_{front,back}) doesn't take a node as
> input arg, so there's no opportunity to get btf_field info from input
> reg type. 
> 
> So we'll need to keep this info in rbtree_root's btf_field
> regardless, and since any rbtree API function that operates on a node
> also operates on a root and expects its node arg to match the node
> type expected by the root, might as well use root's field as the main
> lookup for this info and not even have &field->rb_node for now.
> All __process_kf_arg_ptr_to_datastructure_node calls (added earlier
> in the series) use the &meta->arg_{list_head,rbtree_root}.field for same
> reason.
> 
> So it's setting the reg offset correctly.

Ok. Got it. Then the commit log is incorrectly describing the failing scenario.
It's a container_of() inside bool less() that is generating negative offsets.

> > All PTR_TO_BTF_ID need to have positive offset.
> > I'm not sure btf_struct_walk() and other PTR_TO_BTF_ID accessors
> > can deal with negative offsets.
> > There could be all kinds of things to fix.
> 
> I think you may be conflating reg offset and insn offset here. None of the
> changes in this series result in a PTR_TO_BTF_ID reg w/ negative offset
> being returned. But LLVM may generate load insns with a negative offset,
> and since we're passing around pointers to bpf_rb_node that may come
> after useful data fields in a type, this will happen more often.
> 
> Consider this small example from selftests in this series:
> 
> struct node_data {
>   long key;
>   long data;
>   struct bpf_rb_node node;
> };
> 
> static bool less(struct bpf_rb_node *a, const struct bpf_rb_node *b)
> {
>         struct node_data *node_a;
>         struct node_data *node_b;
> 
>         node_a = container_of(a, struct node_data, node);
>         node_b = container_of(b, struct node_data, node);
> 
>         return node_a->key < node_b->key;
> }
> 
> llvm-objdump shows this bpf bytecode for 'less':
> 
> 0000000000000000 <less>:
> ;       return node_a->key < node_b->key;
>        0:       79 22 f0 ff 00 00 00 00 r2 = *(u64 *)(r2 - 0x10)
>        1:       79 11 f0 ff 00 00 00 00 r1 = *(u64 *)(r1 - 0x10)
>        2:       b4 00 00 00 01 00 00 00 w0 = 0x1
> ;       return node_a->key < node_b->key;

I see. That's the same bug.
The args to the callback should have been PTR_TO_BTF_ID | PTR_TRUSTED with
a correct positive offset.
Then node_a = container_of(a, struct node_data, node);
would have produced correct offset into proper btf_id.

The verifier should be passing into less() the btf_id
of struct node_data instead of btf_id of struct bpf_rb_node.

>        3:       cd 21 01 00 00 00 00 00 if r1 s< r2 goto +0x1 <LBB2_2>
>        4:       b4 00 00 00 00 00 00 00 w0 = 0x0
> 
> 0000000000000028 <LBB2_2>:
> ;       return node_a->key < node_b->key;
>        5:       95 00 00 00 00 00 00 00 exit
> 
> Insns 0 and 1 are loading node_b->key and node_a->key, respectively, using
> negative insn->off. Verifier's view of R1 and R2 before insn 0 is
> untrusted_ptr_node_data(off=16). If there were some intermediate insns
> storing result of container_of() before dereferencing:
> 
>   r3 = (r2 - 0x10)
>   r2 = *(u64 *)(r3)
> 
> Verifier would see R3 as untrusted_ptr_node_data(off=0), and load for
> r2 would have insn->off = 0. But LLVM decides to just do a load-with-offset
> using original arg ptrs to less() instead of storing container_of() ptr
> adjustments.
> 
> Since the container_of usage and code pattern in the above example's less()
> isn't particularly specific to this series, I think there are other scenarios
> where such code would be generated, so I considered this a general bugfix in
> the cover letter.

imo the negative offset looks specific to two misuses of PTR_UNTRUSTED in this set.

> 
> [ below paragraph was moved here, it originally preceded "All PTR_TO_BTF_ID"
>   paragraph ]
> 
> > The approach of returning untrusted from bpf_rbtree_first is questionable.
> > Without doing that this issue would not have surfaced.
> > 
> 
> I agree re: PTR_UNTRUSTED, but note that my earlier example doesn't involve
> bpf_rbtree_first. Regardless, I think the issue is that PTR_UNTRUSTED is
> used to denote a few separate traits of a PTR_TO_BTF_ID reg:
> 
>   * "I have no ownership over the thing I'm pointing to"
>   * "My backing memory may go away at any time"
>   * "Access to my fields might result in page fault"
>   * "Kfuncs shouldn't accept me as an arg"
> 
> Seems like original PTR_UNTRUSTED usage really wanted to denote the first
> point and the others were just naturally implied from the first. But
> as you've noted there are some things using PTR_UNTRUSTED that really
> want to make more granular statements:

I think PTR_UNTRUSTED implies all of the above. All 4 statements are connected.

> ref_set_release_on_unlock logic sets release_on_unlock = true and adds
> PTR_UNTRUSTED to the reg type. In this case PTR_UNTRUSTED is trying to say:
> 
>   * "I have no ownership over the thing I'm pointing to"
>   * "My backing memory may go away at any time _after_ bpf_spin_unlock"
>     * Before spin_unlock it's guaranteed to be valid
>   * "Kfuncs shouldn't accept me as an arg"
>     * We don't want arbitrary kfunc saving and accessing release_on_unlock
>       reg after bpf_spin_unlock, as its backing memory can go away any time
>       after spin_unlock.
> 
> The "backing memory" statement PTR_UNTRUSTED is making is a blunt superset
> of what release_on_unlock really needs.
> 
> For less() callback we just want
> 
>   * "I have no ownership over the thing I'm pointing to"
>   * "Kfuncs shouldn't accept me as an arg"
> 
> There is probably a way to decompose PTR_UNTRUSTED into a few flags such that
> it's possible to denote these things separately and avoid unwanted additional
> behavior. But after talking to David Vernet about current complexity of
> PTR_TRUSTED and PTR_UNTRUSTED logic and his desire to refactor, it seemed
> better to continue with PTR_UNTRUSTED blunt instrument with a bit of
> special casing for now, instead of piling on more flags.

Exactly. More flags will only increase the confusion.
Please try to make callback args as proper PTR_TRUSTED and disallow calling specific
rbtree kfuncs while inside this particular callback to prevent recursion.
That would solve all these issues, no?
Writing into such PTR_TRUSTED should still be allowed inside the cb, though it's bogus.

Consider less() receiving a ptr_trusted btf_id of struct node_data that contains
both a link list node and an rbtree node.
It should still be safe to operate on the link list part of that node from less(),
though it's not something we would ever recommend.
The kfunc call on the rb tree part of struct node_data is problematic because
of recursion, right? No other safety concerns?

> > 
> >> modified by verifier to be PTR_TO_BTF_ID of example_node w/ offset =
> >> offsetof(struct example_node, node), instead of PTR_TO_BTF_ID of
> >> bpf_rb_node. So it's necessary to support negative insn->off when
> >> jitting BPF_PROBE_MEM.
> > 
> > I'm not convinced it's necessary.
> > container_of() seems to be the only case where bpf prog can convert
> > PTR_TO_BTF_ID with off >= 0 to negative off.
> > Normal pointer walking will not make it negative.
> > 
> 
> I see what you mean - if some non-container_of case resulted in load generation
> with negative insn->off, this probably would've been noticed already. But
> hopefully my replies above explain why it should be addressed now.

Even with container_of() usage we should be passing proper btf_id of container
struct, so that callbacks and non-callbacks can properly container_of() it
and still get offset >= 0.

> >>
> >> A few instructions are saved for negative insn->offs as a result. Using
> >> the struct example_node / off = -16 example from before, code looks
> >> like:
> > 
> > This is quite complex to review. I couldn't convince myself
> > that dropping the 2nd check is safe, but don't have an argument to
> > prove that it's not safe.
> > Let's get to these details when there is need to support negative off.
> > 
> 
> Hopefully above explanation shows that there's need to support it now.
> I will try to simplify and rephrase the summary to make it easier to follow,
> but will prioritize addressing feedback in less complex patches, so this
> patch may not change for a few respins.

I'm not saying that this patch will never be needed.
Supporting negative offsets here is a good thing.
I'm arguing that it's not necessary to enable bpf_rbtree.

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH bpf-next 01/13] bpf: Loosen alloc obj test in verifier's reg_btf_record
  2022-12-07 16:41   ` Kumar Kartikeya Dwivedi
@ 2022-12-07 18:34     ` Dave Marchevsky
  2022-12-07 18:59       ` Alexei Starovoitov
  2022-12-07 19:03       ` Kumar Kartikeya Dwivedi
  0 siblings, 2 replies; 51+ messages in thread
From: Dave Marchevsky @ 2022-12-07 18:34 UTC (permalink / raw)
  To: Kumar Kartikeya Dwivedi, Dave Marchevsky
  Cc: bpf, Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Kernel Team, Tejun Heo

On 12/7/22 11:41 AM, Kumar Kartikeya Dwivedi wrote:
> On Wed, Dec 07, 2022 at 04:39:48AM IST, Dave Marchevsky wrote:
>> btf->struct_meta_tab is populated by btf_parse_struct_metas in btf.c.
>> There, a BTF record is created for any type containing a spin_lock or
>> any next-gen datastructure node/head.
>>
>> Currently, for non-MAP_VALUE types, reg_btf_record will only search for
>> a record using struct_meta_tab if the reg->type exactly matches
>> (PTR_TO_BTF_ID | MEM_ALLOC). This exact match is too strict: an
>> "allocated obj" type - returned from bpf_obj_new - might pick up other
>> flags while working its way through the program.
>>
> 
> Not following. Only PTR_TO_BTF_ID | MEM_ALLOC is the valid reg->type that can be
> passed to helpers. reg_btf_record is used in helpers to inspect the btf_record.
> Any other flag combination (the only one possible is PTR_UNTRUSTED right now)
> cannot be passed to helpers in the first place. The reason to set PTR_UNTRUSTED
> is to make them unpassable to helpers.
> 

I see what you mean. If reg_btf_record is only used on regs which are args,
then the exact match helps enforce PTR_UNTRUSTED not being an acceptable
type flag for an arg. Most uses of reg_btf_record seem to be on arg regs,
but then we have its use in reg_may_point_to_spin_lock, which is itself
used in mark_ptr_or_null_reg and on BPF_REG_0 in check_kfunc_call. So I'm not
sure that it's only used on arg regs currently.

Regardless, if the intended use is on arg regs only, it should be renamed to
arg_reg_btf_record or similar to make that clear, as current name sounds like
it should be applicable to any reg, and thus not enforce constraints particular
to arg regs.

But I think it's better to leave it general and enforce those constraints
elsewhere. For kfuncs this is already happening in check_kfunc_args, where the
big switch statements for KF_ARG_* are doing exact type matching.

>> Loosen the check to be exact for base_type and just use MEM_ALLOC mask
>> for type_flag.
>>
>> This patch is marked Fixes as the original intent of reg_btf_record was
>> unlikely to have been to fail finding btf_record for valid alloc obj
>> types with additional flags, some of which (e.g. PTR_UNTRUSTED)
>> are valid register type states for alloc obj independent of this series.
> 
> That was the actual intent, same as how check_ptr_to_btf_access uses the exact
> reg->type to allow the BPF_WRITE case.
> 
> I think this series is the one introducing this case, passing bpf_rbtree_first's
> result to bpf_rbtree_remove, which I think is not possible to make safe in the
> first place. We decided to do bpf_list_pop_front instead of bpf_list_entry ->
> bpf_list_del due to this exact issue. More in [0].
> 
>  [0]: https://lore.kernel.org/bpf/CAADnVQKifhUk_HE+8qQ=AOhAssH6w9LZ082Oo53rwaS+tAGtOw@mail.gmail.com
> 

Thanks for the link, I better understand what Alexei meant in his comment on
patch 9 of this series. For the helpers added in this series, we can make
bpf_rbtree_first -> bpf_rbtree_remove safe by invalidating all release_on_unlock
refs after the rbtree_remove in the same manner as they're invalidated after
spin_unlock currently.

Logic for why this is safe:

  * If we have two non-owning refs to nodes in a tree, e.g. from
    bpf_rbtree_add(node) and calling bpf_rbtree_first() immediately after,
    we have no way of knowing if they're aliases of same node.

  * If bpf_rbtree_remove takes arbitrary non-owning ref to node in the tree,
    it might be removing a node that's already been removed, e.g.:

        n = bpf_obj_new(...);
        bpf_spin_lock(&lock);

        bpf_rbtree_add(&tree, &n->node);
        // n is now non-owning ref to node which was added
        res = bpf_rbtree_first(&tree);
        if (!res) {}
        m = container_of(res, struct node_data, node);
        // m is now non-owning ref to the same node
        bpf_rbtree_remove(&tree, &n->node);
        bpf_rbtree_remove(&tree, &m->node); // BAD

        bpf_spin_unlock(&lock);

  * bpf_rbtree_remove is the only "pop()" currently. Non-owning refs are at risk
    of pointing to something that was already removed _only_ after a
    rbtree_remove, so if we invalidate them all after rbtree_remove they can't
    be inputs to subsequent remove()s

This does conflate current "release non-owning refs because it's not safe to
read from them" reasoning with new "release non-owning refs so they can't be
passed to remove()". Ideally we could add some new tag to these refs that
prevents them from being passed to remove()-type fns, but does allow them to
be read, e.g.:

  n = bpf_obj_new(...);
  bpf_spin_lock(&lock);

  bpf_rbtree_add(&tree, &n->node);
  // n is now non-owning ref to node which was added
  res = bpf_rbtree_first(&tree);
  if (!res) {}
  m = container_of(res, struct node_data, node);
  // m is now non-owning ref to the same node
  n = bpf_rbtree_remove(&tree, &n->node);
  // n is now owning ref again, m is non-owning ref to same node
  x = m->key; // this should be safe since we're still in CS
  bpf_rbtree_remove(&tree, &m->node); // But this should be prevented

  bpf_spin_unlock(&lock);

But this would introduce too much addt'l complexity for now IMO. The proposal
of just invalidating all non-owning refs prevents both the unsafe second
remove() and the safe x = m->key.

I will give it a shot, if it doesn't work can change rbtree_remove to
rbtree_remove_first w/o node param. But per that linked convo such logic
should be tackled eventually, might as well chip away at it now.

>> However, I didn't find a specific broken repro case outside of this
>> series' added functionality, so it's possible that nothing was
>> triggering this logic error before.
>>
>> Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
>> cc: Kumar Kartikeya Dwivedi <memxor@gmail.com>
>> Fixes: 4e814da0d599 ("bpf: Allow locking bpf_spin_lock in allocated objects")
>> ---
>>  kernel/bpf/verifier.c | 7 ++++++-
>>  1 file changed, 6 insertions(+), 1 deletion(-)
>>
>> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
>> index 1d51bd9596da..67a13110bc22 100644
>> --- a/kernel/bpf/verifier.c
>> +++ b/kernel/bpf/verifier.c
>> @@ -451,6 +451,11 @@ static bool reg_type_not_null(enum bpf_reg_type type)
>>  		type == PTR_TO_SOCK_COMMON;
>>  }
>>
>> +static bool type_is_ptr_alloc_obj(u32 type)
>> +{
>> +	return base_type(type) == PTR_TO_BTF_ID && type_flag(type) & MEM_ALLOC;
>> +}
>> +
>>  static struct btf_record *reg_btf_record(const struct bpf_reg_state *reg)
>>  {
>>  	struct btf_record *rec = NULL;
>> @@ -458,7 +463,7 @@ static struct btf_record *reg_btf_record(const struct bpf_reg_state *reg)
>>
>>  	if (reg->type == PTR_TO_MAP_VALUE) {
>>  		rec = reg->map_ptr->record;
>> -	} else if (reg->type == (PTR_TO_BTF_ID | MEM_ALLOC)) {
>> +	} else if (type_is_ptr_alloc_obj(reg->type)) {
>>  		meta = btf_find_struct_meta(reg->btf, reg->btf_id);
>>  		if (meta)
>>  			rec = meta->record;
>> --
>> 2.30.2
>>


* Re: [PATCH bpf-next 04/13] bpf: rename list_head -> datastructure_head in field info types
  2022-12-07  1:41   ` Alexei Starovoitov
@ 2022-12-07 18:52     ` Dave Marchevsky
  2022-12-07 19:01       ` Alexei Starovoitov
  0 siblings, 1 reply; 51+ messages in thread
From: Dave Marchevsky @ 2022-12-07 18:52 UTC (permalink / raw)
  To: Alexei Starovoitov, Dave Marchevsky
  Cc: bpf, Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Kernel Team, Kumar Kartikeya Dwivedi, Tejun Heo

On 12/6/22 8:41 PM, Alexei Starovoitov wrote:
> On Tue, Dec 06, 2022 at 03:09:51PM -0800, Dave Marchevsky wrote:
>> Many of the structs recently added to track field info for linked-list
>> head are useful as-is for rbtree root. So let's do a mechanical renaming
>> of list_head-related types and fields:
>>
>> include/linux/bpf.h:
>>   struct btf_field_list_head -> struct btf_field_datastructure_head
>>   list_head -> datastructure_head in struct btf_field union
>> kernel/bpf/btf.c:
>>   list_head -> datastructure_head in struct btf_field_info
> 
> Looking through this patch and others it eventually becomes
> confusing with 'datastructure head' name.
> I'm not sure what is 'head' of the data structure.
> There is head in the link list, but 'head of tree' is odd.
> 
> The attempt here is to find a common name that represents the programming
> concept where there is a 'root' and there are 'nodes' that are added to that 'root'.
> The 'data structure' name is too broad in that sense.
> Especially later it becomes 'datastructure_api' which is even broader.
> 
> I was thinking to propose:
>  struct btf_field_list_head -> struct btf_field_tree_root
>  list_head -> tree_root in struct btf_field union
> 
> and is_kfunc_tree_api later...
> since link list is a tree too.
> 
> But reading 'tree' next to other names like 'field', 'kfunc'
> it might be mistaken that 'tree' applies to the former.
> So I think using 'graph' as more general concept to describe both
> link list and rb-tree would be the best.
> 
> So the proposal:
>  struct btf_field_list_head -> struct btf_field_graph_root
>  list_head -> graph_root in struct btf_field union
> 
> and is_kfunc_graph_api later...
> 
> 'graph' is short enough and rarely used in names,
> so it stands on its own next to 'field' and in combination
> with other names.
> wdyt?
> 

I'm not a huge fan of 'graph', but it's certainly better than
'datastructure_api', and avoids the "all next-gen datastructures must do this"
implication of a 'ng_ds' name. So will try the rename in v2.

(all specific GRAPH naming suggestions in subsequent patches will
be done as well)

list 'head' -> list 'root' SGTM as well. Not ideal, but alternatives
are worse (rbtree 'head'...)

>>
>> This is a nonfunctional change, functionality to actually use these
>> fields for rbtree will be added in further patches.
>>
>> Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
>> ---
>>  include/linux/bpf.h   |  4 ++--
>>  kernel/bpf/btf.c      | 21 +++++++++++----------
>>  kernel/bpf/helpers.c  |  4 ++--
>>  kernel/bpf/verifier.c | 21 +++++++++++----------
>>  4 files changed, 26 insertions(+), 24 deletions(-)
>>
>> diff --git a/include/linux/bpf.h b/include/linux/bpf.h
>> index 4920ac252754..9e8b12c7061e 100644
>> --- a/include/linux/bpf.h
>> +++ b/include/linux/bpf.h
>> @@ -189,7 +189,7 @@ struct btf_field_kptr {
>>  	u32 btf_id;
>>  };
>>  
>> -struct btf_field_list_head {
>> +struct btf_field_datastructure_head {
>>  	struct btf *btf;
>>  	u32 value_btf_id;
>>  	u32 node_offset;
>> @@ -201,7 +201,7 @@ struct btf_field {
>>  	enum btf_field_type type;
>>  	union {
>>  		struct btf_field_kptr kptr;
>> -		struct btf_field_list_head list_head;
>> +		struct btf_field_datastructure_head datastructure_head;
>>  	};
>>  };
>>  
>> diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c
>> index c80bd8709e69..284e3e4b76b7 100644
>> --- a/kernel/bpf/btf.c
>> +++ b/kernel/bpf/btf.c
>> @@ -3227,7 +3227,7 @@ struct btf_field_info {
>>  		struct {
>>  			const char *node_name;
>>  			u32 value_btf_id;
>> -		} list_head;
>> +		} datastructure_head;
>>  	};
>>  };
>>  
>> @@ -3334,8 +3334,8 @@ static int btf_find_list_head(const struct btf *btf, const struct btf_type *pt,
>>  		return -EINVAL;
>>  	info->type = BPF_LIST_HEAD;
>>  	info->off = off;
>> -	info->list_head.value_btf_id = id;
>> -	info->list_head.node_name = list_node;
>> +	info->datastructure_head.value_btf_id = id;
>> +	info->datastructure_head.node_name = list_node;
>>  	return BTF_FIELD_FOUND;
>>  }
>>  
>> @@ -3603,13 +3603,14 @@ static int btf_parse_list_head(const struct btf *btf, struct btf_field *field,
>>  	u32 offset;
>>  	int i;
>>  
>> -	t = btf_type_by_id(btf, info->list_head.value_btf_id);
>> +	t = btf_type_by_id(btf, info->datastructure_head.value_btf_id);
>>  	/* We've already checked that value_btf_id is a struct type. We
>>  	 * just need to figure out the offset of the list_node, and
>>  	 * verify its type.
>>  	 */
>>  	for_each_member(i, t, member) {
>> -		if (strcmp(info->list_head.node_name, __btf_name_by_offset(btf, member->name_off)))
>> +		if (strcmp(info->datastructure_head.node_name,
>> +			   __btf_name_by_offset(btf, member->name_off)))
>>  			continue;
>>  		/* Invalid BTF, two members with same name */
>>  		if (n)
>> @@ -3626,9 +3627,9 @@ static int btf_parse_list_head(const struct btf *btf, struct btf_field *field,
>>  		if (offset % __alignof__(struct bpf_list_node))
>>  			return -EINVAL;
>>  
>> -		field->list_head.btf = (struct btf *)btf;
>> -		field->list_head.value_btf_id = info->list_head.value_btf_id;
>> -		field->list_head.node_offset = offset;
>> +		field->datastructure_head.btf = (struct btf *)btf;
>> +		field->datastructure_head.value_btf_id = info->datastructure_head.value_btf_id;
>> +		field->datastructure_head.node_offset = offset;
>>  	}
>>  	if (!n)
>>  		return -ENOENT;
>> @@ -3735,11 +3736,11 @@ int btf_check_and_fixup_fields(const struct btf *btf, struct btf_record *rec)
>>  
>>  		if (!(rec->fields[i].type & BPF_LIST_HEAD))
>>  			continue;
>> -		btf_id = rec->fields[i].list_head.value_btf_id;
>> +		btf_id = rec->fields[i].datastructure_head.value_btf_id;
>>  		meta = btf_find_struct_meta(btf, btf_id);
>>  		if (!meta)
>>  			return -EFAULT;
>> -		rec->fields[i].list_head.value_rec = meta->record;
>> +		rec->fields[i].datastructure_head.value_rec = meta->record;
>>  
>>  		if (!(rec->field_mask & BPF_LIST_NODE))
>>  			continue;
>> diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
>> index cca642358e80..6c67740222c2 100644
>> --- a/kernel/bpf/helpers.c
>> +++ b/kernel/bpf/helpers.c
>> @@ -1737,12 +1737,12 @@ void bpf_list_head_free(const struct btf_field *field, void *list_head,
>>  	while (head != orig_head) {
>>  		void *obj = head;
>>  
>> -		obj -= field->list_head.node_offset;
>> +		obj -= field->datastructure_head.node_offset;
>>  		head = head->next;
>>  		/* The contained type can also have resources, including a
>>  		 * bpf_list_head which needs to be freed.
>>  		 */
>> -		bpf_obj_free_fields(field->list_head.value_rec, obj);
>> +		bpf_obj_free_fields(field->datastructure_head.value_rec, obj);
>>  		/* bpf_mem_free requires migrate_disable(), since we can be
>>  		 * called from map free path as well apart from BPF program (as
>>  		 * part of map ops doing bpf_obj_free_fields).
>> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
>> index 6f0aac837d77..bc80b4c4377b 100644
>> --- a/kernel/bpf/verifier.c
>> +++ b/kernel/bpf/verifier.c
>> @@ -8615,21 +8615,22 @@ static int process_kf_arg_ptr_to_list_node(struct bpf_verifier_env *env,
>>  
>>  	field = meta->arg_list_head.field;
>>  
>> -	et = btf_type_by_id(field->list_head.btf, field->list_head.value_btf_id);
>> +	et = btf_type_by_id(field->datastructure_head.btf, field->datastructure_head.value_btf_id);
>>  	t = btf_type_by_id(reg->btf, reg->btf_id);
>> -	if (!btf_struct_ids_match(&env->log, reg->btf, reg->btf_id, 0, field->list_head.btf,
>> -				  field->list_head.value_btf_id, true)) {
>> +	if (!btf_struct_ids_match(&env->log, reg->btf, reg->btf_id, 0, field->datastructure_head.btf,
>> +				  field->datastructure_head.value_btf_id, true)) {
>>  		verbose(env, "operation on bpf_list_head expects arg#1 bpf_list_node at offset=%d "
>>  			"in struct %s, but arg is at offset=%d in struct %s\n",
>> -			field->list_head.node_offset, btf_name_by_offset(field->list_head.btf, et->name_off),
>> +			field->datastructure_head.node_offset,
>> +			btf_name_by_offset(field->datastructure_head.btf, et->name_off),
>>  			list_node_off, btf_name_by_offset(reg->btf, t->name_off));
>>  		return -EINVAL;
>>  	}
>>  
>> -	if (list_node_off != field->list_head.node_offset) {
>> +	if (list_node_off != field->datastructure_head.node_offset) {
>>  		verbose(env, "arg#1 offset=%d, but expected bpf_list_node at offset=%d in struct %s\n",
>> -			list_node_off, field->list_head.node_offset,
>> -			btf_name_by_offset(field->list_head.btf, et->name_off));
>> +			list_node_off, field->datastructure_head.node_offset,
>> +			btf_name_by_offset(field->datastructure_head.btf, et->name_off));
>>  		return -EINVAL;
>>  	}
>>  	/* Set arg#1 for expiration after unlock */
>> @@ -9078,9 +9079,9 @@ static int check_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
>>  
>>  				mark_reg_known_zero(env, regs, BPF_REG_0);
>>  				regs[BPF_REG_0].type = PTR_TO_BTF_ID | MEM_ALLOC;
>> -				regs[BPF_REG_0].btf = field->list_head.btf;
>> -				regs[BPF_REG_0].btf_id = field->list_head.value_btf_id;
>> -				regs[BPF_REG_0].off = field->list_head.node_offset;
>> +				regs[BPF_REG_0].btf = field->datastructure_head.btf;
>> +				regs[BPF_REG_0].btf_id = field->datastructure_head.value_btf_id;
>> +				regs[BPF_REG_0].off = field->datastructure_head.node_offset;
>>  			} else if (meta.func_id == special_kfunc_list[KF_bpf_cast_to_kern_ctx]) {
>>  				mark_reg_known_zero(env, regs, BPF_REG_0);
>>  				regs[BPF_REG_0].type = PTR_TO_BTF_ID | PTR_TRUSTED;
>> -- 
>> 2.30.2
>>


* Re: [PATCH bpf-next 01/13] bpf: Loosen alloc obj test in verifier's reg_btf_record
  2022-12-07 18:34     ` Dave Marchevsky
@ 2022-12-07 18:59       ` Alexei Starovoitov
  2022-12-07 20:38         ` Dave Marchevsky
  2022-12-07 19:03       ` Kumar Kartikeya Dwivedi
  1 sibling, 1 reply; 51+ messages in thread
From: Alexei Starovoitov @ 2022-12-07 18:59 UTC (permalink / raw)
  To: Dave Marchevsky
  Cc: Kumar Kartikeya Dwivedi, Dave Marchevsky, bpf,
	Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Kernel Team, Tejun Heo

On Wed, Dec 07, 2022 at 01:34:44PM -0500, Dave Marchevsky wrote:
> On 12/7/22 11:41 AM, Kumar Kartikeya Dwivedi wrote:
> > On Wed, Dec 07, 2022 at 04:39:48AM IST, Dave Marchevsky wrote:
> >> btf->struct_meta_tab is populated by btf_parse_struct_metas in btf.c.
> >> There, a BTF record is created for any type containing a spin_lock or
> >> any next-gen datastructure node/head.
> >>
> >> Currently, for non-MAP_VALUE types, reg_btf_record will only search for
> >> a record using struct_meta_tab if the reg->type exactly matches
> >> (PTR_TO_BTF_ID | MEM_ALLOC). This exact match is too strict: an
> >> "allocated obj" type - returned from bpf_obj_new - might pick up other
> >> flags while working its way through the program.
> >>
> > 
> > Not following. Only PTR_TO_BTF_ID | MEM_ALLOC is the valid reg->type that can be
> > passed to helpers. reg_btf_record is used in helpers to inspect the btf_record.
> > Any other flag combination (the only one possible is PTR_UNTRUSTED right now)
> > cannot be passed to helpers in the first place. The reason to set PTR_UNTRUSTED
> > is to make then unpassable to helpers.
> > 
> 
> I see what you mean. If reg_btf_record is only used on regs which are args,
> then the exact match helps enforce PTR_UNTRUSTED not being an acceptable
> type flag for an arg. Most uses of reg_btf_record seem to be on arg regs,
> but then we have its use in reg_may_point_to_spin_lock, which is itself
> used in mark_ptr_or_null_reg and on BPF_REG_0 in check_kfunc_call. So I'm not
> sure that it's only used on arg regs currently.
> 
> Regardless, if the intended use is on arg regs only, it should be renamed to
> arg_reg_btf_record or similar to make that clear, as current name sounds like
> it should be applicable to any reg, and thus not enforce constraints particular
> to arg regs.
> 
> But I think it's better to leave it general and enforce those constraints
> elsewhere. For kfuncs this is already happening in check_kfunc_args, where the
> big switch statements for KF_ARG_* are doing exact type matching.
> 
> >> Loosen the check to be exact for base_type and just use MEM_ALLOC mask
> >> for type_flag.
> >>
> >> This patch is marked Fixes as the original intent of reg_btf_record was
> >> unlikely to have been to fail finding btf_record for valid alloc obj
> >> types with additional flags, some of which (e.g. PTR_UNTRUSTED)
> >> are valid register type states for alloc obj independent of this series.
> > 
> > That was the actual intent, same as how check_ptr_to_btf_access uses the exact
> > reg->type to allow the BPF_WRITE case.
> > 
> > I think this series is the one introducing this case, passing bpf_rbtree_first's
> > result to bpf_rbtree_remove, which I think is not possible to make safe in the
> > first place. We decided to do bpf_list_pop_front instead of bpf_list_entry ->
> > bpf_list_del due to this exact issue. More in [0].
> > 
> >  [0]: https://lore.kernel.org/bpf/CAADnVQKifhUk_HE+8qQ=AOhAssH6w9LZ082Oo53rwaS+tAGtOw@mail.gmail.com
> > 
> 
> Thanks for the link, I better understand what Alexei meant in his comment on
> patch 9 of this series. For the helpers added in this series, we can make
> bpf_rbtree_first -> bpf_rbtree_remove safe by invalidating all release_on_unlock
> refs after the rbtree_remove in same manner as they're invalidated after
> spin_unlock currently.
> 
> Logic for why this is safe:
> 
>   * If we have two non-owning refs to nodes in a tree, e.g. from
>     bpf_rbtree_add(node) and calling bpf_rbtree_first() immediately after,
>     we have no way of knowing if they're aliases of same node.
> 
>   * If bpf_rbtree_remove takes arbitrary non-owning ref to node in the tree,
>     it might be removing a node that's already been removed, e.g.:
> 
>         n = bpf_obj_new(...);
>         bpf_spin_lock(&lock);
> 
>         bpf_rbtree_add(&tree, &n->node);
>         // n is now non-owning ref to node which was added
>         res = bpf_rbtree_first(&tree);
>         if (!res) {}
>         m = container_of(res, struct node_data, node);
>         // m is now non-owning ref to the same node
>         bpf_rbtree_remove(&tree, &n->node);
>         bpf_rbtree_remove(&tree, &m->node); // BAD

Let me clarify my previous email:

Above doesn't have to be 'BAD'.
Instead of
if (WARN_ON_ONCE(RB_EMPTY_NODE(n)))

we can drop WARN and simply return.
If node is not part of the tree -> nop.

Same for bpf_rbtree_add.
If it's already added -> nop.

Then we can have bpf_rbtree_first() returning PTR_TRUSTED with acquire semantics.
We do all these checks under the same rbtree root lock, so it's safe.

>         bpf_spin_unlock(&lock);
> 
>   * bpf_rbtree_remove is the only "pop()" currently. Non-owning refs are at risk
>     of pointing to something that was already removed _only_ after a
>     rbtree_remove, so if we invalidate them all after rbtree_remove they can't
>     be inputs to subsequent remove()s

With above proposed run-time checks both bpf_rbtree_remove and bpf_rbtree_add
can have release semantics.
No need for special release_on_unlock hacks.

> This does conflate current "release non-owning refs because it's not safe to
> read from them" reasoning with new "release non-owning refs so they can't be
> passed to remove()". Ideally we could add some new tag to these refs that
> prevents them from being passed to remove()-type fns, but does allow them to
> be read, e.g.:
> 
>   n = bpf_obj_new(...);

'n' is acquired.

>   bpf_spin_lock(&lock);
> 
>   bpf_rbtree_add(&tree, &n->node);
>   // n is now non-owning ref to node which was added

since bpf_rbtree_add does release on 'n'...

>   res = bpf_rbtree_first(&tree);
>   if (!res) {}
>   m = container_of(res, struct node_data, node);
>   // m is now non-owning ref to the same node

... below is not allowed by the verifier.
>   n = bpf_rbtree_remove(&tree, &n->node);

I'm not sure what the idea is behind returning 'n' from remove...
Maybe it should be a simple bool?

>   // n is now owning ref again, m is non-owning ref to same node
>   x = m->key; // this should be safe since we're still in CS

below works because 'm' came from bpf_rbtree_first that acquired 'res'.

>   bpf_rbtree_remove(&tree, &m->node); // But this should be prevented
> 
>   bpf_spin_unlock(&lock);
> 


* Re: [PATCH bpf-next 04/13] bpf: rename list_head -> datastructure_head in field info types
  2022-12-07 18:52     ` Dave Marchevsky
@ 2022-12-07 19:01       ` Alexei Starovoitov
  0 siblings, 0 replies; 51+ messages in thread
From: Alexei Starovoitov @ 2022-12-07 19:01 UTC (permalink / raw)
  To: Dave Marchevsky
  Cc: Dave Marchevsky, bpf, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, Kernel Team, Kumar Kartikeya Dwivedi, Tejun Heo

On Wed, Dec 07, 2022 at 01:52:07PM -0500, Dave Marchevsky wrote:
> On 12/6/22 8:41 PM, Alexei Starovoitov wrote:
> > On Tue, Dec 06, 2022 at 03:09:51PM -0800, Dave Marchevsky wrote:
> >> Many of the structs recently added to track field info for linked-list
> >> head are useful as-is for rbtree root. So let's do a mechanical renaming
> >> of list_head-related types and fields:
> >>
> >> include/linux/bpf.h:
> >>   struct btf_field_list_head -> struct btf_field_datastructure_head
> >>   list_head -> datastructure_head in struct btf_field union
> >> kernel/bpf/btf.c:
> >>   list_head -> datastructure_head in struct btf_field_info
> > 
> > Looking through this patch and others it eventually becomes
> > confusing with 'datastructure head' name.
> > I'm not sure what the 'head' of a data structure is.
> > There is a head in a linked list, but 'head of tree' is odd.
> > 
> > The attempt here is to find a common name that represents the programming
> > concept where there is a 'root' and there are 'nodes' that are added to that 'root'.
> > The 'data structure' name is too broad in that sense.
> > Especially later it becomes 'datastructure_api' which is even broader.
> > 
> > I was thinking to propose:
> >  struct btf_field_list_head -> struct btf_field_tree_root
> >  list_head -> tree_root in struct btf_field union
> > 
> > and is_kfunc_tree_api later...
> > since a linked list is a tree too.
> > 
> > But reading 'tree' next to other names like 'field', 'kfunc'
> > it might be mistaken that 'tree' applies to the former.
> > So I think using 'graph' as more general concept to describe both
> > link list and rb-tree would be the best.
> > 
> > So the proposal:
> >  struct btf_field_list_head -> struct btf_field_graph_root
> >  list_head -> graph_root in struct btf_field union
> > 
> > and is_kfunc_graph_api later...
> > 
> > 'graph' is short enough and rarely used in names,
> > so it stands on its own next to 'field' and in combination
> > with other names.
> > wdyt?
> > 
> 
> I'm not a huge fan of 'graph', but it's certainly better than
> 'datastructure_api', and avoids the "all next-gen datastructures must do this"
> implication of a 'ng_ds' name. So will try the rename in v2.

fwiw I don't like the 'next-' bit in 'next-gen ds'.
A year from now the 'next' will sound really old.
Just like N in NAPI used to be 'new'.

> (all specific GRAPH naming suggestions in subsequent patches will
> be done as well)
> 
> list 'head' -> list 'root' SGTM as well. Not ideal, but alternatives
> are worse (rbtree 'head'...)

Thanks!

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH bpf-next 01/13] bpf: Loosen alloc obj test in verifier's reg_btf_record
  2022-12-07 18:34     ` Dave Marchevsky
  2022-12-07 18:59       ` Alexei Starovoitov
@ 2022-12-07 19:03       ` Kumar Kartikeya Dwivedi
  1 sibling, 0 replies; 51+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2022-12-07 19:03 UTC (permalink / raw)
  To: Dave Marchevsky
  Cc: Dave Marchevsky, bpf, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, Kernel Team, Tejun Heo

On Thu, Dec 08, 2022 at 12:04:44AM IST, Dave Marchevsky wrote:
> On 12/7/22 11:41 AM, Kumar Kartikeya Dwivedi wrote:
> > On Wed, Dec 07, 2022 at 04:39:48AM IST, Dave Marchevsky wrote:
> >> btf->struct_meta_tab is populated by btf_parse_struct_metas in btf.c.
> >> There, a BTF record is created for any type containing a spin_lock or
> >> any next-gen datastructure node/head.
> >>
> >> Currently, for non-MAP_VALUE types, reg_btf_record will only search for
> >> a record using struct_meta_tab if the reg->type exactly matches
> >> (PTR_TO_BTF_ID | MEM_ALLOC). This exact match is too strict: an
> >> "allocated obj" type - returned from bpf_obj_new - might pick up other
> >> flags while working its way through the program.
> >>
> >
> > Not following. Only PTR_TO_BTF_ID | MEM_ALLOC is the valid reg->type that can be
> > passed to helpers. reg_btf_record is used in helpers to inspect the btf_record.
> > Any other flag combination (the only one possible is PTR_UNTRUSTED right now)
> > cannot be passed to helpers in the first place. The reason to set PTR_UNTRUSTED
> > is to make them unpassable to helpers.
> >
>
> I see what you mean. If reg_btf_record is only used on regs which are args,
> then the exact match helps enforce PTR_UNTRUSTED not being an acceptable
> type flag for an arg. Most uses of reg_btf_record seem to be on arg regs,
> but then we have its use in reg_may_point_to_spin_lock, which is itself
> used in mark_ptr_or_null_reg and on BPF_REG_0 in check_kfunc_call. So I'm not
> sure that it's only used on arg regs currently.
>
> Regardless, if the intended use is on arg regs only, it should be renamed to
> arg_reg_btf_record or similar to make that clear, as current name sounds like
> it should be applicable to any reg, and thus not enforce constraints particular
> to arg regs.
>
> But I think it's better to leave it general and enforce those constraints
> elsewhere. For kfuncs this is already happening in check_kfunc_args, where the
> big switch statements for KF_ARG_* are doing exact type matching.
>
> >> Loosen the check to be exact for base_type and just use MEM_ALLOC mask
> >> for type_flag.
> >>
> >> This patch is marked Fixes as the original intent of reg_btf_record was
> >> unlikely to have been to fail finding btf_record for valid alloc obj
> >> types with additional flags, some of which (e.g. PTR_UNTRUSTED)
> >> are valid register type states for alloc obj independent of this series.
> >
> > That was the actual intent, same as how check_ptr_to_btf_access uses the exact
> > reg->type to allow the BPF_WRITE case.
> >
> > I think this series is the one introducing this case, passing bpf_rbtree_first's
> > result to bpf_rbtree_remove, which I think is not possible to make safe in the
> > first place. We decided to do bpf_list_pop_front instead of bpf_list_entry ->
> > bpf_list_del due to this exact issue. More in [0].
> >
> >  [0]: https://lore.kernel.org/bpf/CAADnVQKifhUk_HE+8qQ=AOhAssH6w9LZ082Oo53rwaS+tAGtOw@mail.gmail.com
> >
>
> Thanks for the link, I better understand what Alexei meant in his comment on
> patch 9 of this series. For the helpers added in this series, we can make
> bpf_rbtree_first -> bpf_rbtree_remove safe by invalidating all release_on_unlock
> refs after the rbtree_remove in same manner as they're invalidated after
> spin_unlock currently.
>

Rather than doing that, you'll cut down on a lot of complexity and confusion
regarding PTR_UNTRUSTED's use in this set by removing bpf_rbtree_first and
bpf_rbtree_remove, and simply exposing bpf_rbtree_pop_front.

> Logic for why this is safe:
>
>   * If we have two non-owning refs to nodes in a tree, e.g. from
>     bpf_rbtree_add(node) and calling bpf_rbtree_first() immediately after,
>     we have no way of knowing if they're aliases of same node.
>
>   * If bpf_rbtree_remove takes arbitrary non-owning ref to node in the tree,
>     it might be removing a node that's already been removed, e.g.:
>
>         n = bpf_obj_new(...);
>         bpf_spin_lock(&lock);
>
>         bpf_rbtree_add(&tree, &n->node);
>         // n is now non-owning ref to node which was added
> >         res = bpf_rbtree_first(&tree);
> >         if (!res) {}
> >         m = container_of(res, struct node_data, node);
>         // m is now non-owning ref to the same node
>         bpf_rbtree_remove(&tree, &n->node);
>         bpf_rbtree_remove(&tree, &m->node); // BAD
>
>         bpf_spin_unlock(&lock);
>
>   * bpf_rbtree_remove is the only "pop()" currently. Non-owning refs are at risk
>     of pointing to something that was already removed _only_ after a
>     rbtree_remove, so if we invalidate them all after rbtree_remove they can't
>     be inputs to subsequent remove()s
>
> This does conflate current "release non-owning refs because it's not safe to
> read from them" reasoning with new "release non-owning refs so they can't be
> passed to remove()". Ideally we could add some new tag to these refs that
> prevents them from being passed to remove()-type fns, but does allow them to
> be read, e.g.:
>
>   n = bpf_obj_new(...);
>   bpf_spin_lock(&lock);
>
>   bpf_rbtree_add(&tree, &n->node);
>   // n is now non-owning ref to node which was added
> >   res = bpf_rbtree_first(&tree);
> >   if (!res) {}
> >   m = container_of(res, struct node_data, node);
>   // m is now non-owning ref to the same node
>   n = bpf_rbtree_remove(&tree, &n->node);
>   // n is now owning ref again, m is non-owning ref to same node
>   x = m->key; // this should be safe since we're still in CS
>   bpf_rbtree_remove(&tree, &m->node); // But this should be prevented
>
>   bpf_spin_unlock(&lock);
>
> But this would introduce too much addt'l complexity for now IMO. The proposal
> of just invalidating all non-owning refs prevents both the unsafe second
> remove() and the safe x = m->key.
>
> I will give it a shot, if it doesn't work can change rbtree_remove to
> rbtree_remove_first w/o node param. But per that linked convo such logic
> should be tackled eventually, might as well chip away at it now.
>

I sympathise with your goal to make it as close to kernel programming style as
possible. I was exploring the same option (as you saw in that link). But based
on multiple discussions so far and trying different approaches, I'm convinced
the additional complexity in the verifier is not worth it.

Both bpf_list_del and bpf_rbtree_remove are useful and should be added, but
should work on e.g. 'current' node in iteration callback. In that context
verifier knows that the node is part of the list/rbtree. Introducing more than
one node in that same context introduces potential aliasing which hinders the
verifier's ability to reason about safety. Then, it has to be pessimistic like
your case and invalidate everything to prevent invalid use, so that double
list_del and double rbtree_remove is not possible.

You will avoid all the problems with PTR_UNTRUSTED being passed to helpers if
you adopt such an approach. The code will become much simpler, while allowing
people to do the same thing without any loss of usability.

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH bpf-next 02/13] bpf: map_check_btf should fail if btf_parse_fields fails
  2022-12-07 16:49   ` Kumar Kartikeya Dwivedi
@ 2022-12-07 19:05     ` Alexei Starovoitov
  2022-12-17  8:59       ` Dave Marchevsky
  0 siblings, 1 reply; 51+ messages in thread
From: Alexei Starovoitov @ 2022-12-07 19:05 UTC (permalink / raw)
  To: Kumar Kartikeya Dwivedi
  Cc: Dave Marchevsky, bpf, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, Kernel Team, Tejun Heo

On Wed, Dec 07, 2022 at 10:19:00PM +0530, Kumar Kartikeya Dwivedi wrote:
> On Wed, Dec 07, 2022 at 04:39:49AM IST, Dave Marchevsky wrote:
> > map_check_btf calls btf_parse_fields to create a btf_record for its
> > value_type. If there are no special fields in the value_type
> > btf_parse_fields returns NULL, whereas if there special value_type
> > fields but they are invalid in some way an error is returned.
> >
> > An example invalid state would be:
> >
> >   struct node_data {
> >     struct bpf_rb_node node;
> >     int data;
> >   };
> >
> >   private(A) struct bpf_spin_lock glock;
> >   private(A) struct bpf_list_head ghead __contains(node_data, node);
> >
> > ghead should be invalid as its __contains tag points to a field with
> > type != "bpf_list_node".
> >
> > Before this patch, such a scenario would result in btf_parse_fields
> > returning an error ptr, subsequent !IS_ERR_OR_NULL check failing,
> > and btf_check_and_fixup_fields returning 0, which would then be
> > returned by map_check_btf.
> >
> > After this patch's changes, -EINVAL would be returned by map_check_btf
> > and the map would correctly fail to load.
> >
> > Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
> > cc: Kumar Kartikeya Dwivedi <memxor@gmail.com>
> > Fixes: aa3496accc41 ("bpf: Refactor kptr_off_tab into btf_record")
> > ---
> >  kernel/bpf/syscall.c | 5 ++++-
> >  1 file changed, 4 insertions(+), 1 deletion(-)
> >
> > diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
> > index 35972afb6850..c3599a7902f0 100644
> > --- a/kernel/bpf/syscall.c
> > +++ b/kernel/bpf/syscall.c
> > @@ -1007,7 +1007,10 @@ static int map_check_btf(struct bpf_map *map, const struct btf *btf,
> >  	map->record = btf_parse_fields(btf, value_type,
> >  				       BPF_SPIN_LOCK | BPF_TIMER | BPF_KPTR | BPF_LIST_HEAD,
> >  				       map->value_size);
> > -	if (!IS_ERR_OR_NULL(map->record)) {
> > +	if (IS_ERR(map->record))
> > +		return -EINVAL;
> > +
> 
> I didn't do this on purpose, because of backward compatibility concerns. An
> error has not been returned in earlier kernel versions during map creation time
> and those fields acted like normal non-special regions, with errors on use of
> helpers that act on those fields.
> 
> Especially that bpf_spin_lock and bpf_timer are part of the unified btf_record.
> 
> If we are doing such a change, then you should also drop the checks for IS_ERR
> in verifier.c, since that shouldn't be possible anymore. But I think we need to
> think carefully before changing this.
> 
> One possible example is: If we introduce bpf_foo in the future and program
> already has that defined in map value, using it for some other purpose, with
> different alignment and size, their map creation will start failing.

That's a good point.
If we can error on such a misconstructed map at program verification time, that's better
anyway, since there will be a proper verifier log instead of EINVAL from map_create.
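For context on the IS_ERR vs IS_ERR_OR_NULL distinction the patch hinges on, here is a
minimal userspace model of the kernel's ERR_PTR convention (a sketch of include/linux/err.h
semantics, not the kernel code itself): btf_parse_fields can return NULL (no special
fields), a valid pointer, or an encoded -errno in the top range of the address space.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_ERRNO 4095

/* Userspace model of the kernel's ERR_PTR convention: the top 4095
 * pointer values encode a negative errno. */
static void *ERR_PTR(long err) { return (void *)err; }
static long PTR_ERR(const void *p) { return (long)p; }

static int IS_ERR(const void *p)
{
	return (uintptr_t)p >= (uintptr_t)-MAX_ERRNO;
}

static int IS_ERR_OR_NULL(const void *p)
{
	return !p || IS_ERR(p);
}
```

Under this convention, the pre-patch `!IS_ERR_OR_NULL(map->record)` check silently treats
an error-encoding pointer the same as NULL, which is why map_check_btf needs the explicit
`IS_ERR()` branch to actually propagate -EINVAL.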

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH bpf-next 00/13] BPF rbtree next-gen datastructure
  2022-12-06 23:09 [PATCH bpf-next 00/13] BPF rbtree next-gen datastructure Dave Marchevsky
                   ` (13 preceding siblings ...)
  2022-12-07  2:50 ` [PATCH bpf-next 00/13] BPF rbtree next-gen datastructure patchwork-bot+netdevbpf
@ 2022-12-07 19:36 ` Kumar Kartikeya Dwivedi
  2022-12-07 22:28   ` Dave Marchevsky
  14 siblings, 1 reply; 51+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2022-12-07 19:36 UTC (permalink / raw)
  To: Dave Marchevsky
  Cc: bpf, Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Kernel Team, Tejun Heo

On Wed, Dec 07, 2022 at 04:39:47AM IST, Dave Marchevsky wrote:
> This series adds a rbtree datastructure following the "next-gen
> datastructure" precedent set by recently-added linked-list [0]. This is
> a reimplementation of previous rbtree RFC [1] to use kfunc + kptr
> instead of adding a new map type. This series adds a smaller set of API
> functions than that RFC - just the minimum needed to support current
> cgfifo example scheduler in ongoing sched_ext effort [2], namely:
>
>   bpf_rbtree_add
>   bpf_rbtree_remove
>   bpf_rbtree_first
>
> [...]
>
> Future work:
>   Enabling writes to release_on_unlock refs should be done before the
>   functionality of BPF rbtree can truly be considered complete.
>   Implementing this proved more complex than expected so it's been
>   pushed off to a future patch.
>

TBH, I think we need to revisit whether there's a strong need for this. I would
even argue that we should simply make the release semantics of rbtree_add,
list_push helpers stronger and remove release_on_unlock logic entirely,
releasing the node immediately. I don't see why it is so critical to have read,
and more importantly, write access to nodes after losing their ownership. And
that too is only available until the lock is unlocked.

I think this relaxed release logic and write support is the wrong direction to
take, as it has a direct bearing on what can be done with a node inside the
critical section. There's already the problem with not being able to do
bpf_obj_drop easily inside the critical section with this. That might be useful
for draining operations while holding the lock.

Semantically in other languages, once you move an object, accessing it is
usually a bug, and in most of the cases it is sufficient to prepare it before
insertion. We are certainly in the same territory here with these APIs.

Can you elaborate on actual use cases where immediate release or not having
write support makes it hard or impossible to support a certain use case, so that
it is easier to understand the requirements and design things accordingly?

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH bpf-next 01/13] bpf: Loosen alloc obj test in verifier's reg_btf_record
  2022-12-07 18:59       ` Alexei Starovoitov
@ 2022-12-07 20:38         ` Dave Marchevsky
  2022-12-07 22:46           ` Alexei Starovoitov
  0 siblings, 1 reply; 51+ messages in thread
From: Dave Marchevsky @ 2022-12-07 20:38 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Kumar Kartikeya Dwivedi, Dave Marchevsky, bpf,
	Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Kernel Team, Tejun Heo

On 12/7/22 1:59 PM, Alexei Starovoitov wrote:
> On Wed, Dec 07, 2022 at 01:34:44PM -0500, Dave Marchevsky wrote:
>> On 12/7/22 11:41 AM, Kumar Kartikeya Dwivedi wrote:
>>> On Wed, Dec 07, 2022 at 04:39:48AM IST, Dave Marchevsky wrote:
>>>> btf->struct_meta_tab is populated by btf_parse_struct_metas in btf.c.
>>>> There, a BTF record is created for any type containing a spin_lock or
>>>> any next-gen datastructure node/head.
>>>>
>>>> Currently, for non-MAP_VALUE types, reg_btf_record will only search for
>>>> a record using struct_meta_tab if the reg->type exactly matches
>>>> (PTR_TO_BTF_ID | MEM_ALLOC). This exact match is too strict: an
>>>> "allocated obj" type - returned from bpf_obj_new - might pick up other
>>>> flags while working its way through the program.
>>>>
>>>
>>> Not following. Only PTR_TO_BTF_ID | MEM_ALLOC is the valid reg->type that can be
>>> passed to helpers. reg_btf_record is used in helpers to inspect the btf_record.
>>> Any other flag combination (the only one possible is PTR_UNTRUSTED right now)
>>> cannot be passed to helpers in the first place. The reason to set PTR_UNTRUSTED
>>> is to make them unpassable to helpers.
>>>
>>
>> I see what you mean. If reg_btf_record is only used on regs which are args,
>> then the exact match helps enforce PTR_UNTRUSTED not being an acceptable
>> type flag for an arg. Most uses of reg_btf_record seem to be on arg regs,
>> but then we have its use in reg_may_point_to_spin_lock, which is itself
>> used in mark_ptr_or_null_reg and on BPF_REG_0 in check_kfunc_call. So I'm not
>> sure that it's only used on arg regs currently.
>>
>> Regardless, if the intended use is on arg regs only, it should be renamed to
>> arg_reg_btf_record or similar to make that clear, as current name sounds like
>> it should be applicable to any reg, and thus not enforce constraints particular
>> to arg regs.
>>
>> But I think it's better to leave it general and enforce those constraints
>> elsewhere. For kfuncs this is already happening in check_kfunc_args, where the
>> big switch statements for KF_ARG_* are doing exact type matching.
>>
>>>> Loosen the check to be exact for base_type and just use MEM_ALLOC mask
>>>> for type_flag.
>>>>
>>>> This patch is marked Fixes as the original intent of reg_btf_record was
>>>> unlikely to have been to fail finding btf_record for valid alloc obj
>>>> types with additional flags, some of which (e.g. PTR_UNTRUSTED)
>>>> are valid register type states for alloc obj independent of this series.
>>>
>>> That was the actual intent, same as how check_ptr_to_btf_access uses the exact
>>> reg->type to allow the BPF_WRITE case.
>>>
>>> I think this series is the one introducing this case, passing bpf_rbtree_first's
>>> result to bpf_rbtree_remove, which I think is not possible to make safe in the
>>> first place. We decided to do bpf_list_pop_front instead of bpf_list_entry ->
>>> bpf_list_del due to this exact issue. More in [0].
>>>
>>>  [0]: https://lore.kernel.org/bpf/CAADnVQKifhUk_HE+8qQ=AOhAssH6w9LZ082Oo53rwaS+tAGtOw@mail.gmail.com
>>>
>>
>> Thanks for the link, I better understand what Alexei meant in his comment on
>> patch 9 of this series. For the helpers added in this series, we can make
>> bpf_rbtree_first -> bpf_rbtree_remove safe by invalidating all release_on_unlock
>> refs after the rbtree_remove in same manner as they're invalidated after
>> spin_unlock currently.
>>
>> Logic for why this is safe:
>>
>>   * If we have two non-owning refs to nodes in a tree, e.g. from
>>     bpf_rbtree_add(node) and calling bpf_rbtree_first() immediately after,
>>     we have no way of knowing if they're aliases of same node.
>>
>>   * If bpf_rbtree_remove takes arbitrary non-owning ref to node in the tree,
>>     it might be removing a node that's already been removed, e.g.:
>>
>>         n = bpf_obj_new(...);
>>         bpf_spin_lock(&lock);
>>
>>         bpf_rbtree_add(&tree, &n->node);
>>         // n is now non-owning ref to node which was added
>>         res = bpf_rbtree_first(&tree);
>>         if (!res) {}
>>         m = container_of(res, struct node_data, node);
>>         // m is now non-owning ref to the same node
>>         bpf_rbtree_remove(&tree, &n->node);
>>         bpf_rbtree_remove(&tree, &m->node); // BAD
> 
> Let me clarify my previous email:
> 
> Above doesn't have to be 'BAD'.
> Instead of
> if (WARN_ON_ONCE(RB_EMPTY_NODE(n)))
> 
> we can drop WARN and simply return.
> If node is not part of the tree -> nop.
> 
> Same for bpf_rbtree_add.
> If it's already added -> nop.
> 

These runtime checks can certainly be done, but if we can guarantee via
the verifier's type system that a particular ptr-to-node is guaranteed to be in /
not be in a tree, that's better, no?

Feels like a similar train of thought to "fail verification when correct rbtree
lock isn't held" vs "just check if lock is held in every rbtree API kfunc".

> Then we can have bpf_rbtree_first() returning PTR_TRUSTED with acquire semantics.
> We do all these checks under the same rbtree root lock, so it's safe.
> 

I'll comment on PTR_TRUSTED in our discussion on patch 10.

>>         bpf_spin_unlock(&lock);
>>
>>   * bpf_rbtree_remove is the only "pop()" currently. Non-owning refs are at risk
>>     of pointing to something that was already removed _only_ after a
>>     rbtree_remove, so if we invalidate them all after rbtree_remove they can't
>>     be inputs to subsequent remove()s
> 
> With above proposed run-time checks both bpf_rbtree_remove and bpf_rbtree_add
> can have release semantics.
> No need for special release_on_unlock hacks.
> 

If we want to be able to interact w/ nodes after they've been added to the
rbtree, but before critical section ends, we need to support non-owning refs,
which are currently implemented using special release_on_unlock logic.

If we go with the runtime check suggestion from above, we'd need to implement
'conditional release' similarly to earlier "rbtree map" attempt:
https://lore.kernel.org/bpf/20220830172759.4069786-14-davemarchevsky@fb.com/ .

If rbtree_add has release semantics for its node arg, but the node is already
in some tree and runtime check fails, the reference should not be released as
rbtree_add() was a nop.

Similarly, if rbtree_remove has release semantics for its node arg and acquire
semantics for its return value, runtime check failing should result in the
node arg not being released. Acquire semantics for the retval are already
conditional - if retval == NULL, mark_ptr_or_null_regs will release the
acquired ref before it can be used. So no issue with failing rbtree_remove
messing up acquire.

For this reason rbtree_remove and rbtree_first are tagged
KF_ACQUIRE | KF_RET_NULL. "special release_on_unlock hacks" can likely be
refactored into a similar flag, KF_RELEASE_NON_OWN or similar.

>> This does conflate current "release non-owning refs because it's not safe to
>> read from them" reasoning with new "release non-owning refs so they can't be
>> passed to remove()". Ideally we could add some new tag to these refs that
>> prevents them from being passed to remove()-type fns, but does allow them to
>> be read, e.g.:
>>
>>   n = bpf_obj_new(...);
> 
> 'n' is acquired.
> 
>>   bpf_spin_lock(&lock);
>>
>>   bpf_rbtree_add(&tree, &n->node);
>>   // n is now non-owning ref to node which was added
> 
> since bpf_rbtree_add does release on 'n'...
> 
>>   res = bpf_rbtree_first(&tree);
>>   if (!res) {}
>>   m = container_of(res, struct node_data, node);
>>   // m is now non-owning ref to the same node
> 
> ... below is not allowed by the verifier.
>>   n = bpf_rbtree_remove(&tree, &n->node);
> 
> I'm not sure what the idea is behind returning 'n' from remove...
> Maybe it should be a simple bool?
> 

I agree that returning node from rbtree_remove is not strictly necessary, since
rbtree_remove can be thought of turning its non-owning ref argument into an
owning ref, instead of taking non-owning ref and returning owning ref. But such
an operation isn't really an 'acquire' by current verifier logic, since only
retvals can be 'acquired'. So we'd need to add some logic to enable acquire
semantics for args. Furthermore it's not really 'acquiring' a new ref, rather
changing properties of node arg ref.

However, if rbtree_remove can fail, such a "turn non-owning into owning"
operation will need to be able to fail as well, and the program will need to
be able to check for failure. Returning 'acquire' result in retval makes
this simple - just check for NULL. For your "return bool" proposal, we'd have
to add verifier logic which turns the 'acquired' owning ref back into non-owning
based on check of the bool, which will add some verifier complexity.

IIRC when doing experimentation with "rbtree map" implementation, I did
something like this and decided that the additional complexity wasn't worth
it when retval can just be used. 

>>   // n is now owning ref again, m is non-owning ref to same node
>>   x = m->key; // this should be safe since we're still in CS
> 
> below works because 'm' comes from bpf_rbtree_first, which acquired 'res'.
> 
>>   bpf_rbtree_remove(&tree, &m->node); // But this should be prevented
>>
>>   bpf_spin_unlock(&lock);
>>

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH bpf-next 00/13] BPF rbtree next-gen datastructure
  2022-12-07 19:36 ` Kumar Kartikeya Dwivedi
@ 2022-12-07 22:28   ` Dave Marchevsky
  2022-12-07 23:06     ` Alexei Starovoitov
  0 siblings, 1 reply; 51+ messages in thread
From: Dave Marchevsky @ 2022-12-07 22:28 UTC (permalink / raw)
  To: Kumar Kartikeya Dwivedi, Dave Marchevsky
  Cc: bpf, Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Kernel Team, Tejun Heo

On 12/7/22 2:36 PM, Kumar Kartikeya Dwivedi wrote:
> On Wed, Dec 07, 2022 at 04:39:47AM IST, Dave Marchevsky wrote:
>> This series adds a rbtree datastructure following the "next-gen
>> datastructure" precedent set by recently-added linked-list [0]. This is
>> a reimplementation of previous rbtree RFC [1] to use kfunc + kptr
>> instead of adding a new map type. This series adds a smaller set of API
>> functions than that RFC - just the minimum needed to support current
>> cgfifo example scheduler in ongoing sched_ext effort [2], namely:
>>
>>   bpf_rbtree_add
>>   bpf_rbtree_remove
>>   bpf_rbtree_first
>>
>> [...]
>>
>> Future work:
>>   Enabling writes to release_on_unlock refs should be done before the
>>   functionality of BPF rbtree can truly be considered complete.
>>   Implementing this proved more complex than expected so it's been
>>   pushed off to a future patch.
>>

> 
> TBH, I think we need to revisit whether there's a strong need for this. I would
> even argue that we should simply make the release semantics of rbtree_add,
> list_push helpers stronger and remove release_on_unlock logic entirely,
> releasing the node immediately. I don't see why it is so critical to have read,
> and more importantly, write access to nodes after losing their ownership. And
> that too is only available until the lock is unlocked.
> 

Moved the next paragraph here to ease reply; it was the last paragraph
in your response.

> 
> Can you elaborate on actual use cases where immediate release or not having
> write support makes it hard or impossible to support a certain use case, so that
> it is easier to understand the requirements and design things accordingly?
>

Sure, the main usecase and impetus behind this for me is the sched_ext work
Tejun and others are doing (https://lwn.net/Articles/916291/). One of the
things they'd like to be able to do is implement a CFS-like scheduler using
rbtree entirely in BPF. This would prove that sched_ext + BPF can be used to
implement complicated scheduling logic.

If we can implement such complicated scheduling logic, but it has so much
BPF-specific twisting of program logic that it's incomprehensible to scheduler
folks, that's not great. The overlap between "BPF experts" and "scheduler
experts" is small, and we want the latter group to be able to read BPF
scheduling logic without too much struggle. Lower learning curve makes folks
more likely to experiment with sched_ext.

When 'rbtree map' was in brainstorming / prototyping, non-owning reference
semantics were called out as moving BPF datastructures closer to their kernel
equivalents from a UX perspective.

If the "it makes BPF code better resemble normal kernel code" argument was the
only reason to do this I wouldn't feel so strongly, but there are practical
concerns as well:

If we could only read / write an rbtree node when it isn't in a tree, the common
operation of "find this node and update its data" would require removing and
re-adding it. For rbtree, these unnecessary remove and add operations could
result in unnecessary rebalancing. Going back to the sched_ext usecase,
if we have a rbtree with task or cgroup stats that need to be updated often,
unnecessary rebalancing would make this update slower than if non-owning refs
allowed in-place read/write of node data.
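As a sketch of that pattern (a hypothetical BPF program fragment, not code from this
series; it assumes non-owning refs stay readable/writable until bpf_spin_unlock, which
is exactly the semantics under discussion):

```c
/* Update a node's data in place via a non-owning ref, with no
 * remove + re-add and therefore no rebalancing. */
struct bpf_rb_node *res;
struct node_data *n;

bpf_spin_lock(&glock);
res = bpf_rbtree_first(&groot);
if (res) {
	n = container_of(res, struct node_data, node);
	n->data++; /* in-place write through non-owning ref */
}
bpf_spin_unlock(&glock);
```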

Also, we eventually want to be able to have a node that's part of both a
list and rbtree. Likely adding such a node to both would require calling
kfunc for adding to list, and separate kfunc call for adding to rbtree.
Once the node has been added to list, we need some way to represent a reference
to that node so that we can pass it to rbtree add kfunc. Sounds like a
non-owning reference to me, albeit with different semantics than current
release_on_unlock.

> I think this relaxed release logic and write support is the wrong direction to
> take, as it has a direct bearing on what can be done with a node inside the
> critical section. There's already the problem with not being able to do
> bpf_obj_drop easily inside the critical section with this. That might be useful
> for draining operations while holding the lock.
> 

The bpf_obj_drop case is similar to your "can't pass non-owning reference
to bpf_rbtree_remove" concern from patch 1's thread. If we have:

  n = bpf_obj_new(...); // n is owning ref
  bpf_rbtree_add(&tree, &n->node); // n is non-owning ref

  res = bpf_rbtree_first(&tree);
  if (!res) {...}
  m = container_of(res, struct node_data, node); // m is non-owning ref

  res = bpf_rbtree_remove(&tree, &n->node);
  n = container_of(res, struct node_data, node); // n is owning ref, m points to same memory

  bpf_obj_drop(n);
  // Not safe to use m anymore

Datastructures which support bpf_obj_drop in the critical section can
do same as my bpf_rbtree_remove suggestion: just invalidate all non-owning
references after bpf_obj_drop. Then there's no potential use-after-free.
(For the above example, pretend bpf_rbtree_remove didn't already invalidate
'm', or that there's some other way to obtain non-owning ref to 'n''s node
after rbtree_remove)

I think that, in practice, operations where the BPF program wants to remove
/ delete nodes will be distinct from operations where program just wants to 
obtain some non-owning refs and do read / write. At least for sched_ext usecase
this is true. So all the additional clobbers won't require program writer
to do special workarounds to deal with verifier in the common case.

> Semantically in other languages, once you move an object, accessing it is
> usually a bug, and in most of the cases it is sufficient to prepare it before
> insertion. We are certainly in the same territory here with these APIs.

Sure, but 'add'/'remove' for these intrusive linked datastructures is
_not_ a 'move'. Obscuring this from the user and forcing them to use
less performant patterns for the sake of some verifier complexity, or desire
to mimic semantics of languages w/o reference stability, doesn't make sense to
me.

If we were to add some datastructures without reference stability, sure, let's
not do non-owning references for those. So let's make this non-owning reference
stuff easy to turn on/off, perhaps via KF_RELEASE_NON_OWN or similar flags,
which will coincidentally make it very easy to remove if we later decide that
the complexity isn't worth it. 

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH bpf-next 01/13] bpf: Loosen alloc obj test in verifier's reg_btf_record
  2022-12-07 20:38         ` Dave Marchevsky
@ 2022-12-07 22:46           ` Alexei Starovoitov
  2022-12-07 23:42             ` Dave Marchevsky
  0 siblings, 1 reply; 51+ messages in thread
From: Alexei Starovoitov @ 2022-12-07 22:46 UTC (permalink / raw)
  To: Dave Marchevsky
  Cc: Kumar Kartikeya Dwivedi, Dave Marchevsky, bpf,
	Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Kernel Team, Tejun Heo

On Wed, Dec 07, 2022 at 03:38:55PM -0500, Dave Marchevsky wrote:
> On 12/7/22 1:59 PM, Alexei Starovoitov wrote:
> > On Wed, Dec 07, 2022 at 01:34:44PM -0500, Dave Marchevsky wrote:
> >> On 12/7/22 11:41 AM, Kumar Kartikeya Dwivedi wrote:
> >>> On Wed, Dec 07, 2022 at 04:39:48AM IST, Dave Marchevsky wrote:
> >>>> btf->struct_meta_tab is populated by btf_parse_struct_metas in btf.c.
> >>>> There, a BTF record is created for any type containing a spin_lock or
> >>>> any next-gen datastructure node/head.
> >>>>
> >>>> Currently, for non-MAP_VALUE types, reg_btf_record will only search for
> >>>> a record using struct_meta_tab if the reg->type exactly matches
> >>>> (PTR_TO_BTF_ID | MEM_ALLOC). This exact match is too strict: an
> >>>> "allocated obj" type - returned from bpf_obj_new - might pick up other
> >>>> flags while working its way through the program.
> >>>>
> >>>
> >>> Not following. Only PTR_TO_BTF_ID | MEM_ALLOC is the valid reg->type that can be
> >>> passed to helpers. reg_btf_record is used in helpers to inspect the btf_record.
> >>> Any other flag combination (the only one possible is PTR_UNTRUSTED right now)
> >>> cannot be passed to helpers in the first place. The reason to set PTR_UNTRUSTED
> >>> is to make then unpassable to helpers.
> >>>
> >>
> >> I see what you mean. If reg_btf_record is only used on regs which are args,
> >> then the exact match helps enforce PTR_UNTRUSTED not being an acceptable
> >> type flag for an arg. Most uses of reg_btf_record seem to be on arg regs,
> >> but then we have its use in reg_may_point_to_spin_lock, which is itself
> >> used in mark_ptr_or_null_reg and on BPF_REG_0 in check_kfunc_call. So I'm not
> >> sure that it's only used on arg regs currently.
> >>
> >> Regardless, if the intended use is on arg regs only, it should be renamed to
> >> arg_reg_btf_record or similar to make that clear, as current name sounds like
> >> it should be applicable to any reg, and thus not enforce constraints particular
> >> to arg regs.
> >>
> >> But I think it's better to leave it general and enforce those constraints
> >> elsewhere. For kfuncs this is already happening in check_kfunc_args, where the
> >> big switch statements for KF_ARG_* are doing exact type matching.
> >>
> >>>> Loosen the check to be exact for base_type and just use MEM_ALLOC mask
> >>>> for type_flag.
> >>>>
> >>>> This patch is marked Fixes as the original intent of reg_btf_record was
> >>>> unlikely to have been to fail finding btf_record for valid alloc obj
> >>>> types with additional flags, some of which (e.g. PTR_UNTRUSTED)
> >>>> are valid register type states for alloc obj independent of this series.
> >>>
> >>> That was the actual intent, same as how check_ptr_to_btf_access uses the exact
> >>> reg->type to allow the BPF_WRITE case.
> >>>
> >>> I think this series is the one introducing this case, passing bpf_rbtree_first's
> >>> result to bpf_rbtree_remove, which I think is not possible to make safe in the
> >>> first place. We decided to do bpf_list_pop_front instead of bpf_list_entry ->
> >>> bpf_list_del due to this exact issue. More in [0].
> >>>
> >>>  [0]: https://lore.kernel.org/bpf/CAADnVQKifhUk_HE+8qQ=AOhAssH6w9LZ082Oo53rwaS+tAGtOw@mail.gmail.com
> >>>
> >>
> >> Thanks for the link, I better understand what Alexei meant in his comment on
> >> patch 9 of this series. For the helpers added in this series, we can make
> >> bpf_rbtree_first -> bpf_rbtree_remove safe by invalidating all release_on_unlock
> >> refs after the rbtree_remove in same manner as they're invalidated after
> >> spin_unlock currently.
> >>
> >> Logic for why this is safe:
> >>
> >>   * If we have two non-owning refs to nodes in a tree, e.g. from
> >>     bpf_rbtree_add(node) and calling bpf_rbtree_first() immediately after,
> >>     we have no way of knowing if they're aliases of same node.
> >>
> >>   * If bpf_rbtree_remove takes arbitrary non-owning ref to node in the tree,
> >>     it might be removing a node that's already been removed, e.g.:
> >>
> >>         n = bpf_obj_new(...);
> >>         bpf_spin_lock(&lock);
> >>
> >>         bpf_rbtree_add(&tree, &n->node);
> >>         // n is now non-owning ref to node which was added
> >>         res = bpf_rbtree_first(&tree);
> >>         if (!res) {}
> >>         m = container_of(res, struct node_data, node);
> >>         // m is now non-owning ref to the same node
> >>         bpf_rbtree_remove(&tree, &n->node);
> >>         bpf_rbtree_remove(&tree, &m->node); // BAD
> > 
> > Let me clarify my previous email:
> > 
> > Above doesn't have to be 'BAD'.
> > Instead of
> > if (WARN_ON_ONCE(RB_EMPTY_NODE(n)))
> > 
> > we can drop WARN and simply return.
> > If node is not part of the tree -> nop.
> > 
> > Same for bpf_rbtree_add.
> > If it's already added -> nop.
> > 
> 
> These runtime checks can certainly be done, but if we can guarantee via
> verifier type system that a particular ptr-to-node is guaranteed to be in /
> not be in a tree, that's better, no?
> 
> Feels like a similar train of thought to "fail verification when correct rbtree
> lock isn't held" vs "just check if lock is held in every rbtree API kfunc".
> 
> > Then we can have bpf_rbtree_first() returning PTR_TRUSTED with acquire semantics.
> > We do all these checks under the same rbtree root lock, so it's safe.
> > 
> 
> I'll comment on PTR_TRUSTED in our discussion on patch 10.
> 
> >>         bpf_spin_unlock(&lock);
> >>
> >>   * bpf_rbtree_remove is the only "pop()" currently. Non-owning refs are at risk
> >>     of pointing to something that was already removed _only_ after a
> >>     rbtree_remove, so if we invalidate them all after rbtree_remove they can't
> >>     be inputs to subsequent remove()s
> > 
> > With above proposed run-time checks both bpf_rbtree_remove and bpf_rbtree_add
> > can have release semantics.
> > No need for special release_on_unlock hacks.
> > 
> 
> If we want to be able to interact w/ nodes after they've been added to the
> rbtree, but before critical section ends, we need to support non-owning refs,
> which are currently implemented using special release_on_unlock logic.
> 
> If we go with the runtime check suggestion from above, we'd need to implement
> 'conditional release' similarly to earlier "rbtree map" attempt:
> https://lore.kernel.org/bpf/20220830172759.4069786-14-davemarchevsky@fb.com/ .
> 
> If rbtree_add has release semantics for its node arg, but the node is already
> in some tree and runtime check fails, the reference should not be released as
> rbtree_add() was a nop.

Got it.
The conditional release is tricky. We should probably avoid it for now.

I think we can either go with Kumar's proposal and do
bpf_rbtree_pop_front() instead of bpf_rbtree_first()
that avoids all these issues...

but considering that we'll have inline iterators soon and should be able to do:

struct bpf_rbtree_iter it;
struct bpf_rb_node * node;

bpf_rbtree_iter_init(&it, rb_root); // locks the rbtree
while ((node = bpf_rbtree_iter_next(&it))) {
  if (node->field == condition) {
    struct bpf_rb_node *n;

    n = bpf_rbtree_remove(rb_root, node);
    bpf_spin_lock(another_rb_root);
    bpf_rbtree_add(another_rb_root, n);
    bpf_spin_unlock(another_rb_root);
    break;
  }
}
bpf_rbtree_iter_destroy(&it);

We can treat the 'node' returned from bpf_rbtree_iter_next() the same way
as return from bpf_rbtree_first() ->  PTR_TRUSTED | MAYBE_NULL,
but not acquired (ref_obj_id == 0).

bpf_rbtree_add -> KF_RELEASE
so we cannot pass not acquired pointers into it.

We should probably remove release_on_unlock logic as Kumar is suggesting and
make bpf_list_push_front/back KF_RELEASE.

Then
bpf_list_pop_front/back stay KF_ACQUIRE | KF_RET_NULL
and
bpf_rbtree_remove is also KF_ACQUIRE | KF_RET_NULL.

The difference is bpf_list_pop has only 'head'
while bpf_rbtree_remove has 'root' and 'node' where 'node' has to be PTR_TRUSTED
(but not acquired).

bpf_rbtree_add will always succeed.
bpf_rbtree_remove will conditionally fail if 'node' is not linked.

Similarly we can extend link list with
n = bpf_list_remove(node)
which will have KF_ACQUIRE | KF_RET_NULL semantics.

Then everything is nicely uniform.
We'll be able to iterate rbtree and iterate link lists.

There are downsides, of course.
Like the following from your test case:
+       bpf_spin_lock(&glock);
+       bpf_rbtree_add(&groot, &n->node, less);
+       bpf_rbtree_add(&groot, &m->node, less);
+       res = bpf_rbtree_remove(&groot, &n->node);
+       bpf_spin_unlock(&glock);
will not work.
Since bpf_rbtree_add() releases 'n' and it becomes UNTRUSTED.
(assuming release_on_unlock is removed).

I think it's fine for now. I have to agree with Kumar that it's hard to come up
with a realistic use case where 'n' should be accessed after it was added to a
link list or rbtree. The above test case doesn't look real.

This part of your test case:
+       bpf_spin_lock(&glock);
+       bpf_rbtree_add(&groot, &n->node, less);
+       bpf_rbtree_add(&groot, &m->node, less);
+       bpf_rbtree_add(&groot, &o->node, less);
+
+       res = bpf_rbtree_first(&groot);
+       if (!res) {
+               bpf_spin_unlock(&glock);
+               return 2;
+       }
+
+       o = container_of(res, struct node_data, node);
+       res = bpf_rbtree_remove(&groot, &o->node);
+       bpf_spin_unlock(&glock);

will work, because bpf_rbtree_first returns PTR_TRUSTED | MAYBE_NULL.

> Similarly, if rbtree_remove has release semantics for its node arg and acquire
> semantics for its return value, runtime check failing should result in the
> node arg not being released. Acquire semantics for the retval are already
> conditional - if retval == NULL, mark_ptr_or_null regs will release the
> acquired ref before it can be used. So no issue with failing rbtree_remove
> messing up acquire.
> 
> For this reason rbtree_remove and rbtree_first are tagged
> KF_ACQUIRE | KF_RET_NULL. "special release_on_unlock hacks" can likely be
> refactored into a similar flag, KF_RELEASE_NON_OWN or similar.

I guess what I'm proposing above is sort of the KF_RELEASE_NON_OWN idea,
but from a different angle.
I'd like to avoid introducing new flags.
I think PTR_TRUSTED is enough.

> > I'm not sure what's an idea to return 'n' from remove...
> > Maybe it should be simple bool ?
> > 
> 
> I agree that returning node from rbtree_remove is not strictly necessary, since
> rbtree_remove can be thought of as turning its non-owning ref argument into an
> owning ref, instead of taking a non-owning ref and returning an owning ref. But such
> an operation isn't really an 'acquire' by current verifier logic, since only
> retvals can be 'acquired'. So we'd need to add some logic to enable acquire
> semantics for args. Furthermore it's not really 'acquiring' a new ref, rather
> changing properties of node arg ref.
> 
> However, if rbtree_remove can fail, such a "turn non-owning into owning"
> operation will need to be able to fail as well, and the program will need to
> be able to check for failure. Returning 'acquire' result in retval makes
> this simple - just check for NULL. For your "return bool" proposal, we'd have
> to add verifier logic which turns the 'acquired' owning ref back into non-owning
> based on check of the bool, which will add some verifier complexity.
> 
> IIRC when doing experimentation with "rbtree map" implementation, I did
> something like this and decided that the additional complexity wasn't worth
> it when retval can just be used. 

Agree. Forget 'bool' idea.


* Re: [PATCH bpf-next 00/13] BPF rbtree next-gen datastructure
  2022-12-07 22:28   ` Dave Marchevsky
@ 2022-12-07 23:06     ` Alexei Starovoitov
  2022-12-08  1:18       ` Dave Marchevsky
  0 siblings, 1 reply; 51+ messages in thread
From: Alexei Starovoitov @ 2022-12-07 23:06 UTC (permalink / raw)
  To: Dave Marchevsky
  Cc: Kumar Kartikeya Dwivedi, Dave Marchevsky, bpf,
	Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Kernel Team, Tejun Heo

On Wed, Dec 07, 2022 at 05:28:34PM -0500, Dave Marchevsky wrote:
> On 12/7/22 2:36 PM, Kumar Kartikeya Dwivedi wrote:
> > On Wed, Dec 07, 2022 at 04:39:47AM IST, Dave Marchevsky wrote:
> >> This series adds a rbtree datastructure following the "next-gen
> >> datastructure" precedent set by recently-added linked-list [0]. This is
> >> a reimplementation of previous rbtree RFC [1] to use kfunc + kptr
> >> instead of adding a new map type. This series adds a smaller set of API
> >> functions than that RFC - just the minimum needed to support current
> >> cgfifo example scheduler in ongoing sched_ext effort [2], namely:
> >>
> >>   bpf_rbtree_add
> >>   bpf_rbtree_remove
> >>   bpf_rbtree_first
> >>
> >> [...]
> >>
> >> Future work:
> >>   Enabling writes to release_on_unlock refs should be done before the
> >>   functionality of BPF rbtree can truly be considered complete.
> >>   Implementing this proved more complex than expected so it's been
> >>   pushed off to a future patch.
> >>
> 
> > 
> > TBH, I think we need to revisit whether there's a strong need for this. I would
> > even argue that we should simply make the release semantics of rbtree_add,
> > list_push helpers stronger and remove release_on_unlock logic entirely,
> > releasing the node immediately. I don't see why it is so critical to have read,
> > and more importantly, write access to nodes after losing their ownership. And
> > that too is only available until the lock is unlocked.
> > 
> 
> Moved the next paragraph here to ease reply, it was the last paragraph
> in your response.
> 
> > 
> > Can you elaborate on actual use cases where immediate release or not having
> > write support makes it hard or impossible to support a certain use case, so that
> > it is easier to understand the requirements and design things accordingly?
> >
> 
> Sure, the main usecase and impetus behind this for me is the sched_ext work
> Tejun and others are doing (https://lwn.net/Articles/916291/). One of the
> things they'd like to be able to do is implement a CFS-like scheduler using
> rbtree entirely in BPF. This would prove that sched_ext + BPF can be used to
> implement complicated scheduling logic.
> 
> If we can implement such complicated scheduling logic, but it has so much
> BPF-specific twisting of program logic that it's incomprehensible to scheduler
> folks, that's not great. The overlap between "BPF experts" and "scheduler
> experts" is small, and we want the latter group to be able to read BPF
> scheduling logic without too much struggle. Lower learning curve makes folks
> more likely to experiment with sched_ext.
> 
> When 'rbtree map' was in brainstorming / prototyping, non-owning reference
> semantics were called out as moving BPF datastructures closer to their kernel
> equivalents from a UX perspective.

Our emails crossed. See my previous email.
Agree on the above.

> If the "it makes BPF code better resemble normal kernel code" argument was the
> only reason to do this I wouldn't feel so strongly, but there are practical
> concerns as well:
> 
> If we could only read / write from rbtree node if it isn't in a tree, the common
> operation of "find this node and update its data" would require removing and
> re-adding it. For rbtree, these unnecessary remove and add operations could

Not really. See my previous email.

> result in unnecessary rebalancing. Going back to the sched_ext usecase,
> if we have a rbtree with task or cgroup stats that need to be updated often,
> unnecessary rebalancing would make this update slower than if non-owning refs
> allowed in-place read/write of node data.

Agree. Read/write from non-owning refs is necessary.
In the other email I'm arguing that PTR_TRUSTED with ref_obj_id == 0
(your non-owning ref) should not be mixed with release_on_unlock logic.

KF_RELEASE should still accept as args and release only ptrs with ref_obj_id > 0.

> 
> Also, we eventually want to be able to have a node that's part of both a
> list and rbtree. Likely adding such a node to both would require calling
> kfunc for adding to list, and separate kfunc call for adding to rbtree.
> Once the node has been added to list, we need some way to represent a reference
> to that node so that we can pass it to rbtree add kfunc. Sounds like a
> non-owning reference to me, albeit with different semantics than current
> release_on_unlock.

A node with both link list and rbtree would be a new concept.
We'd need to introduce 'struct bpf_refcnt' and make sure prog does the right thing.
That's a future discussion.

> 
> > I think this relaxed release logic and write support is the wrong direction to
> > take, as it has a direct bearing on what can be done with a node inside the
> > critical section. There's already the problem with not being able to do
> > bpf_obj_drop easily inside the critical section with this. That might be useful
> > for draining operations while holding the lock.
> > 
> 
> The bpf_obj_drop case is similar to your "can't pass non-owning reference
> to bpf_rbtree_remove" concern from patch 1's thread. If we have:
> 
>   n = bpf_obj_new(...); // n is owning ref
>   bpf_rbtree_add(&tree, &n->node); // n is non-owning ref

what I proposed in the other email...
n should be untrusted here.
That's != 'n is non-owning ref'

>   res = bpf_rbtree_first(&tree);
>   if (!res) {...}
>   m = container_of(res, struct node_data, node); // m is non-owning ref

agree. m == PTR_TRUSTED with ref_obj_id == 0.

>   res = bpf_rbtree_remove(&tree, &n->node);

a typo here? Did you mean 'm->node' ?

and after 'if (res)' ...
>   n = container_of(res, struct node_data, node); // n is owning ref, m points to same memory

agree. n -> ref_obj_id > 0

>   bpf_obj_drop(n);

above is ok to do.
'n' becomes UNTRUSTED or invalid.

>   // Not safe to use m anymore

'm' should have become UNTRUSTED after bpf_rbtree_remove.

> Datastructures which support bpf_obj_drop in the critical section can
> do same as my bpf_rbtree_remove suggestion: just invalidate all non-owning
> references after bpf_obj_drop.

'invalidate all' sounds suspicious.
I don't think we need to do a sweeping search after bpf_obj_drop.

> Then there's no potential use-after-free.
> (For the above example, pretend bpf_rbtree_remove didn't already invalidate
> 'm', or that there's some other way to obtain non-owning ref to 'n''s node
> after rbtree_remove)
> 
> I think that, in practice, operations where the BPF program wants to remove
> / delete nodes will be distinct from operations where program just wants to 
> obtain some non-owning refs and do read / write. At least for sched_ext usecase
> this is true. So all the additional clobbers won't require program writer
> to do special workarounds to deal with verifier in the common case.
> 
> > Semantically in other languages, once you move an object, accessing it is
> > usually a bug, and in most of the cases it is sufficient to prepare it before
> > insertion. We are certainly in the same territory here with these APIs.
> 
> Sure, but 'add'/'remove' for these intrusive linked datastructures is
> _not_ a 'move'. Obscuring this from the user and forcing them to use
> less performant patterns for the sake of some verifier complexity, or desire
> to mimic semantics of languages w/o reference stability, doesn't make sense to
> me.

I agree, but everything we discuss above looks orthogonal to the
release_on_unlock logic that Kumar and I are proposing to drop.

> If we were to add some datastructures without reference stability, sure, let's
> not do non-owning references for those. So let's make this non-owning reference
> stuff easy to turn on/off, perhaps via KF_RELEASE_NON_OWN or similar flags,
> which will coincidentally make it very easy to remove if we later decide that
> the complexity isn't worth it. 

You mean KF_RELEASE_NON_OWN would be applied to bpf_rbtree_remove() ?
So it accepts PTR_TRUSTED ref_obj_id == 0 arg and makes it PTR_UNTRUSTED ?
If so then I agree. The 'release' part of the name was confusing.
It's also not clear which arg it applies to.
bpf_rbtree_remove has two args. Both are PTR_TRUSTED.
I wouldn't introduce a new flag for this just yet.
We can hard code bpf_rbtree_remove, bpf_list_pop for now
or use our name suffix hack.


* Re: [PATCH bpf-next 10/13] bpf, x86: BPF_PROBE_MEM handling for insn->off < 0
  2022-12-07 18:06       ` Alexei Starovoitov
@ 2022-12-07 23:39         ` Dave Marchevsky
  2022-12-08  0:47           ` Alexei Starovoitov
  0 siblings, 1 reply; 51+ messages in thread
From: Dave Marchevsky @ 2022-12-07 23:39 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Dave Marchevsky, bpf, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, Kernel Team, Kumar Kartikeya Dwivedi, Tejun Heo

On 12/7/22 1:06 PM, Alexei Starovoitov wrote:
> On Wed, Dec 07, 2022 at 01:46:56AM -0500, Dave Marchevsky wrote:
>> On 12/6/22 9:39 PM, Alexei Starovoitov wrote:
>>> On Tue, Dec 06, 2022 at 03:09:57PM -0800, Dave Marchevsky wrote:
>>>> Current comment in BPF_PROBE_MEM jit code claims that verifier prevents
>>>> insn->off < 0, but this appears to not be true irrespective of changes
>>>> in this series. Regardless, changes in this series will result in an
>>>> example like:
>>>>
>>>>   struct example_node {
>>>>     long key;
>>>>     long val;
>>>>     struct bpf_rb_node node;
>>>>   }
>>>>
>>>>   /* In BPF prog, assume root contains example_node nodes */
>>>>   struct bpf_rb_node res = bpf_rbtree_first(&root);
>>>>   if (!res)
>>>>     return 1;
>>>>
>>>>   struct example_node n = container_of(res, struct example_node, node);
>>>>   long key = n->key;
>>>>
>>>> Resulting in a load with off = -16, as bpf_rbtree_first's return is
>>>
>>> Looks like the bug in the previous patch:
>>> +                       } else if (meta.func_id == special_kfunc_list[KF_bpf_rbtree_remove] ||
>>> +                                  meta.func_id == special_kfunc_list[KF_bpf_rbtree_first]) {
>>> +                               struct btf_field *field = meta.arg_rbtree_root.field;
>>> +
>>> +                               mark_reg_datastructure_node(regs, BPF_REG_0,
>>> +                                                           &field->datastructure_head);
>>>
>>> The R0 .off should have been:
>>>  regs[BPF_REG_0].off = field->rb_node.node_offset;
>>>
>>> node, not root.
>>>
>>> PTR_TO_BTF_ID should have been returned with approriate 'off',
>>> so that container_of() would it bring back to zero offset.
>>>
>>
>> The root's btf_field is used to hold information about the node type. Of
>> specific interest to us are value_btf_id and node_offset, which
>> mark_reg_datastructure_node uses to set REG_0's type and offset correctly.
>>
>> This "use head type to keep info about node type" strategy felt strange to me
>> initially too: all PTR_TO_BTF_ID regs are passing around their type info, so
>> why not use that to lookup bpf_rb_node field info? But consider that
>> bpf_rbtree_first (and bpf_list_pop_{front,back}) doesn't take a node as
>> input arg, so there's no opportunity to get btf_field info from input
>> reg type. 
>>
>> So we'll need to keep this info in rbtree_root's btf_field
>> regardless, and since any rbtree API function that operates on a node
>> also operates on a root and expects its node arg to match the node
>> type expected by the root, might as well use root's field as the main
>> lookup for this info and not even have &field->rb_node for now.
>> All __process_kf_arg_ptr_to_datastructure_node calls (added earlier
>> in the series) use the &meta->arg_{list_head,rbtree_root}.field for same
>> reason.
>>
>> So it's setting the reg offset correctly.
> 
> Ok. Got it. Than the commit log is incorrectly describing the failing scenario.
> It's a container_of() inside bool less() that is generating negative offsets.
> 

I noticed this happening with container_of() both inside less() and in the
example in patch summary. Specifically in the rbtree_first_and_remove 'success'
selftest added in patch 13. There, operations like this:

  bpf_spin_lock(&glock);
  res = bpf_rbtree_first(&groot);
  if (!res) {...}

  o = container_of(res, struct node_data, node);
  first_data[1] = o->data;
  bpf_spin_unlock(&glock);

Would fail to set first_data[1] to the expected value, instead setting
it to 0. 

>>> All PTR_TO_BTF_ID need to have positive offset.
>>> I'm not sure btf_struct_walk() and other PTR_TO_BTF_ID accessors
>>> can deal with negative offsets.
>>> There could be all kinds of things to fix.
>>
>> I think you may be conflating reg offset and insn offset here. None of the
>> changes in this series result in a PTR_TO_BTF_ID reg w/ negative offset
>> being returned. But LLVM may generate load insns with a negative offset,
>> and since we're passing around pointers to bpf_rb_node that may come
>> after useful data fields in a type, this will happen more often.
>>
>> Consider this small example from selftests in this series:
>>
>> struct node_data {
>>   long key;
>>   long data;
>>   struct bpf_rb_node node;
>> };
>>
>> static bool less(struct bpf_rb_node *a, const struct bpf_rb_node *b)
>> {
>>         struct node_data *node_a;
>>         struct node_data *node_b;
>>
>>         node_a = container_of(a, struct node_data, node);
>>         node_b = container_of(b, struct node_data, node);
>>
>>         return node_a->key < node_b->key;
>> }
>>
>> llvm-objdump shows this bpf bytecode for 'less':
>>
>> 0000000000000000 <less>:
>> ;       return node_a->key < node_b->key;
>>        0:       79 22 f0 ff 00 00 00 00 r2 = *(u64 *)(r2 - 0x10)
>>        1:       79 11 f0 ff 00 00 00 00 r1 = *(u64 *)(r1 - 0x10)
>>        2:       b4 00 00 00 01 00 00 00 w0 = 0x1
>> ;       return node_a->key < node_b->key;
> 
> I see. That's the same bug.
> The args to callback should have been PTR_TO_BTF_ID | PTR_TRUSTED with 
> correct positive offset.
> Then node_a = container_of(a, struct node_data, node);
> would have produced correct offset into proper btf_id.
> 
> The verifier should be passing into less() the btf_id
> of struct node_data instead of btf_id of struct bpf_rb_node.
> 

The verifier is already passing the struct node_data type, not bpf_rb_node.
For less() args, and rbtree_{first,remove} retval, mark_reg_datastructure_node
- added in patch 8 - is doing as you describe.

Verifier sees less' arg regs as R=ptr_to_node_data(off=16). If it was
instead passing R=ptr_to_bpf_rb_node(off=0), attempting to access *(reg - 0x10)
would cause verifier err.

>>        3:       cd 21 01 00 00 00 00 00 if r1 s< r2 goto +0x1 <LBB2_2>
>>        4:       b4 00 00 00 00 00 00 00 w0 = 0x0
>>
>> 0000000000000028 <LBB2_2>:
>> ;       return node_a->key < node_b->key;
>>        5:       95 00 00 00 00 00 00 00 exit
>>
>> Insns 0 and 1 are loading node_b->key and node_a->key, respectively, using
>> negative insn->off. Verifier's view or R1 and R2 before insn 0 is
>> untrusted_ptr_node_data(off=16). If there were some intermediate insns
>> storing result of container_of() before dereferencing:
>>
>>   r3 = (r2 - 0x10)
>>   r2 = *(u64 *)(r3)
>>
>> Verifier would see R3 as untrusted_ptr_node_data(off=0), and load for
>> r2 would have insn->off = 0. But LLVM decides to just do a load-with-offset
>> using original arg ptrs to less() instead of storing container_of() ptr
>> adjustments.
>>
>> Since the container_of usage and code pattern in above example's less()
>> isn't particularly specific to this series, I think there are other scenarios
>> where such code would be generated and considered this a general bugfix in
>> cover letter.
> 
> imo the negative offset looks specific to two misuses of PTR_UNTRUSTED in this set.
> 

If I used PTR_TRUSTED here, the JITted instructions would still do a load like
r2 = *(u64 *)(r2 - 0x10). There would just be no BPF_PROBE_MEM runtime checking
insns generated, avoiding the negative insn->off issue there. But the negative
insn->off load being generated is not specific to PTR_UNTRUSTED.

>>
>> [ below paragraph was moved here, it originally preceded "All PTR_TO_BTF_ID"
>>   paragraph ]
>>
>>> The apporach of returning untrusted from bpf_rbtree_first is questionable.
>>> Without doing that this issue would not have surfaced.
>>>
>>
>> I agree re: PTR_UNTRUSTED, but note that my earlier example doesn't involve
>> bpf_rbtree_first. Regardless, I think the issue is that PTR_UNTRUSTED is
>> used to denote a few separate traits of a PTR_TO_BTF_ID reg:
>>
>>   * "I have no ownership over the thing I'm pointing to"
>>   * "My backing memory may go away at any time"
>>   * "Access to my fields might result in page fault"
>>   * "Kfuncs shouldn't accept me as an arg"
>>
>> Seems like original PTR_UNTRUSTED usage really wanted to denote the first
>> point and the others were just naturally implied from the first. But
>> as you've noted there are some things using PTR_UNTRUSTED that really
>> want to make more granular statements:
> 
> I think PTR_UNTRUSTED implies all of the above. All 4 statements are connected.
> 
>> ref_set_release_on_unlock logic sets release_on_unlock = true and adds
>> PTR_UNTRUSTED to the reg type. In this case PTR_UNTRUSTED is trying to say:
>>
>>   * "I have no ownership over the thing I'm pointing to"
>>   * "My backing memory may go away at any time _after_ bpf_spin_unlock"
>>     * Before spin_unlock it's guaranteed to be valid
>>   * "Kfuncs shouldn't accept me as an arg"
>>     * We don't want arbitrary kfunc saving and accessing release_on_unlock
>>       reg after bpf_spin_unlock, as its backing memory can go away any time
>>       after spin_unlock.
>>
>> The "backing memory" statement PTR_UNTRUSTED is making is a blunt superset
>> of what release_on_unlock really needs.
>>
>> For less() callback we just want
>>
>>   * "I have no ownership over the thing I'm pointing to"
>>   * "Kfuncs shouldn't accept me as an arg"
>>
>> There is probably a way to decompose PTR_UNTRUSTED into a few flags such that
>> it's possible to denote these things separately and avoid unwanted additional
>> behavior. But after talking to David Vernet about current complexity of
>> PTR_TRUSTED and PTR_UNTRUSTED logic and his desire to refactor, it seemed
>> better to continue with PTR_UNTRUSTED blunt instrument with a bit of
>> special casing for now, instead of piling on more flags.
> 
> Exactly. More flags will only increase the confusion.
> Please try to make callback args as proper PTR_TRUSTED and disallow calling specific
> rbtree kfuncs while inside this particular callback to prevent recursion.
> That would solve all these issues, no?
> Writing into such PTR_TRUSTED should be still allowed inside cb though it's bogus.
> 
> Consider less() receiving btf_id ptr_trusted of struct node_data and it contains
> both link list and rbtree.
> It should still be safe to operate on link list part of that node from less()
> though it's not something we would ever recommend.

I definitely want to allow writes on non-owning references. In order to properly
support this, there needs to be a way to designate a field as a "key":

struct node_data {
  long key __key;
  long data;
  struct bpf_rb_node node;
};

or perhaps on the rb_root via __contains or separate tag:

struct bpf_rb_root groot __contains(struct node_data, node, key);

This is necessary because rbtree's less() uses the key field to determine order, so
we don't want to allow writes to the key field while the node is in a rbtree. If
such a write were possible the rbtree could easily be left in an invalid state,
since the new key may mean that the rbtree is no longer sorted. Subsequent add()
operations would invoke less() with the new key, so other nodes could be placed
in the wrong spot as well.

Since PTR_UNTRUSTED currently allows read but not write, and prevents use of a
non-owning ref as a kfunc arg, it seemed to be a reasonable tag for less() args.

I was planning on adding __key / non-owning-ref write support as a followup, but
adding it as part of this series will probably save a lot of back-and-forth.
Will try to add it.
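
To make the intended restriction concrete, here is a sketch of the program-side
view (the __key tag and its write rejection are hypothetical - not implemented
in this series):

  struct node_data {
    long key __key; /* hypothetical tag marking the ordering field */
    long data;
    struct bpf_rb_node node;
  };

  /* ... in BPF program, inside the critical section */
  res = bpf_rbtree_first(&groot);
  if (!res) { /* ... */ }
  n = container_of(res, struct node_data, node);

  n->data = 42; /* ok: write to non-key field via non-owning ref */
  n->key = 7;   /* rejected: node may be in the tree, and rewriting
                 * the ordering field would unsort it */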

> The kfunc call on rb tree part of struct node_data is problematic because
> of recursion, right? No other safety concerns ?
> 
>>>
>>>> modified by verifier to be PTR_TO_BTF_ID of example_node w/ offset =
>>>> offsetof(struct example_node, node), instead of PTR_TO_BTF_ID of
>>>> bpf_rb_node. So it's necessary to support negative insn->off when
>>>> jitting BPF_PROBE_MEM.
>>>
>>> I'm not convinced it's necessary.
>>> container_of() seems to be the only case where bpf prog can convert
>>> PTR_TO_BTF_ID with off >= 0 to negative off.
>>> Normal pointer walking will not make it negative.
>>>
>>
>> I see what you mean - if some non-container_of case resulted in load generation
>> with negative insn->off, this probably would've been noticed already. But
>> hopefully my replies above explain why it should be addressed now.
> 
> Even with container_of() usage we should be passing proper btf_id of container
> struct, so that callbacks and non-callbacks can properly container_of() it
> and still get offset >= 0.
> 

This was addressed earlier in my response.

>>>>
>>>> A few instructions are saved for negative insn->offs as a result. Using
>>>> the struct example_node / off = -16 example from before, code looks
>>>> like:
>>>
>>> This is quite complex to review. I couldn't convince myself
>>> that dropping the 2nd check is safe, but don't have an argument to
>>> prove that it's not safe.
>>> Let's get to these details when there is need to support negative off.
>>>
>>
>> Hopefully above explanation shows that there's need to support it now.
>> I will try to simplify and rephrase the summary to make it easier to follow,
>> but will prioritize addressing feedback in less complex patches, so this
>> patch may not change for a few respins.
> 
> I'm not saying that this patch will never be needed.
> Supporting negative offsets here is a good thing.
> I'm arguing that it's not necessary to enable bpf_rbtree.

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH bpf-next 01/13] bpf: Loosen alloc obj test in verifier's reg_btf_record
  2022-12-07 22:46           ` Alexei Starovoitov
@ 2022-12-07 23:42             ` Dave Marchevsky
  0 siblings, 0 replies; 51+ messages in thread
From: Dave Marchevsky @ 2022-12-07 23:42 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Kumar Kartikeya Dwivedi, Dave Marchevsky, bpf,
	Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Kernel Team, Tejun Heo

On 12/7/22 5:46 PM, Alexei Starovoitov wrote:
> On Wed, Dec 07, 2022 at 03:38:55PM -0500, Dave Marchevsky wrote:
>> On 12/7/22 1:59 PM, Alexei Starovoitov wrote:
>>> On Wed, Dec 07, 2022 at 01:34:44PM -0500, Dave Marchevsky wrote:
>>>> On 12/7/22 11:41 AM, Kumar Kartikeya Dwivedi wrote:
>>>>> On Wed, Dec 07, 2022 at 04:39:48AM IST, Dave Marchevsky wrote:
>>>>>> btf->struct_meta_tab is populated by btf_parse_struct_metas in btf.c.
>>>>>> There, a BTF record is created for any type containing a spin_lock or
>>>>>> any next-gen datastructure node/head.
>>>>>>
>>>>>> Currently, for non-MAP_VALUE types, reg_btf_record will only search for
>>>>>> a record using struct_meta_tab if the reg->type exactly matches
>>>>>> (PTR_TO_BTF_ID | MEM_ALLOC). This exact match is too strict: an
>>>>>> "allocated obj" type - returned from bpf_obj_new - might pick up other
>>>>>> flags while working its way through the program.
>>>>>>
>>>>>
>>>>> Not following. Only PTR_TO_BTF_ID | MEM_ALLOC is the valid reg->type that can be
>>>>> passed to helpers. reg_btf_record is used in helpers to inspect the btf_record.
>>>>> Any other flag combination (the only one possible is PTR_UNTRUSTED right now)
>>>>> cannot be passed to helpers in the first place. The reason to set PTR_UNTRUSTED
>>>>> is to make them unpassable to helpers.
>>>>>
>>>>
>>>> I see what you mean. If reg_btf_record is only used on regs which are args,
>>>> then the exact match helps enforce PTR_UNTRUSTED not being an acceptable
>>>> type flag for an arg. Most uses of reg_btf_record seem to be on arg regs,
>>>> but then we have its use in reg_may_point_to_spin_lock, which is itself
>>>> used in mark_ptr_or_null_reg and on BPF_REG_0 in check_kfunc_call. So I'm not
>>>> sure that it's only used on arg regs currently.
>>>>
>>>> Regardless, if the intended use is on arg regs only, it should be renamed to
>>>> arg_reg_btf_record or similar to make that clear, as current name sounds like
>>>> it should be applicable to any reg, and thus not enforce constraints particular
>>>> to arg regs.
>>>>
>>>> But I think it's better to leave it general and enforce those constraints
>>>> elsewhere. For kfuncs this is already happening in check_kfunc_args, where the
>>>> big switch statements for KF_ARG_* are doing exact type matching.
>>>>
>>>>>> Loosen the check to be exact for base_type and just use MEM_ALLOC mask
>>>>>> for type_flag.
>>>>>>
>>>>>> This patch is marked Fixes as the original intent of reg_btf_record was
>>>>>> unlikely to have been to fail finding btf_record for valid alloc obj
>>>>>> types with additional flags, some of which (e.g. PTR_UNTRUSTED)
>>>>>> are valid register type states for alloc obj independent of this series.
>>>>>
>>>>> That was the actual intent, same as how check_ptr_to_btf_access uses the exact
>>>>> reg->type to allow the BPF_WRITE case.
>>>>>
>>>>> I think this series is the one introducing this case, passing bpf_rbtree_first's
>>>>> result to bpf_rbtree_remove, which I think is not possible to make safe in the
>>>>> first place. We decided to do bpf_list_pop_front instead of bpf_list_entry ->
>>>>> bpf_list_del due to this exact issue. More in [0].
>>>>>
>>>>>  [0]: https://lore.kernel.org/bpf/CAADnVQKifhUk_HE+8qQ=AOhAssH6w9LZ082Oo53rwaS+tAGtOw@mail.gmail.com
>>>>>
>>>>
>>>> Thanks for the link, I better understand what Alexei meant in his comment on
>>>> patch 9 of this series. For the helpers added in this series, we can make
>>>> bpf_rbtree_first -> bpf_rbtree_remove safe by invalidating all release_on_unlock
>>>> refs after the rbtree_remove in same manner as they're invalidated after
>>>> spin_unlock currently.
>>>>
>>>> Logic for why this is safe:
>>>>
>>>>   * If we have two non-owning refs to nodes in a tree, e.g. from
>>>>     bpf_rbtree_add(node) and calling bpf_rbtree_first() immediately after,
>>>>     we have no way of knowing if they're aliases of same node.
>>>>
>>>>   * If bpf_rbtree_remove takes arbitrary non-owning ref to node in the tree,
>>>>     it might be removing a node that's already been removed, e.g.:
>>>>
>>>>         n = bpf_obj_new(...);
>>>>         bpf_spin_lock(&lock);
>>>>
>>>>         bpf_rbtree_add(&tree, &n->node);
>>>>         // n is now non-owning ref to node which was added
>>>>         res = bpf_rbtree_first(&tree);
>>>>         if (!res) {}
>>>>         m = container_of(res, struct node_data, node);
>>>>         // m is now non-owning ref to the same node
>>>>         bpf_rbtree_remove(&tree, &n->node);
>>>>         bpf_rbtree_remove(&tree, &m->node); // BAD
>>>
>>> Let me clarify my previous email:
>>>
>>> Above doesn't have to be 'BAD'.
>>> Instead of
>>> if (WARN_ON_ONCE(RB_EMPTY_NODE(n)))
>>>
>>> we can drop WARN and simply return.
>>> If node is not part of the tree -> nop.
>>>
>>> Same for bpf_rbtree_add.
>>> If it's already added -> nop.
>>>
>>
>> These runtime checks can certainly be done, but if the verifier's type system
>> can guarantee that a particular ptr-to-node is / isn't in a tree, that's
>> better, no?
>>
>> Feels like a similar train of thought to "fail verification when correct rbtree
>> lock isn't held" vs "just check if lock is held in every rbtree API kfunc".
>>
>>> Then we can have bpf_rbtree_first() returning PTR_TRUSTED with acquire semantics.
>>> We do all these checks under the same rbtree root lock, so it's safe.
>>>
>>
>> I'll comment on PTR_TRUSTED in our discussion on patch 10.
>>
>>>>         bpf_spin_unlock(&lock);
>>>>
>>>>   * bpf_rbtree_remove is the only "pop()" currently. Non-owning refs are at risk
>>>>     of pointing to something that was already removed _only_ after a
>>>>     rbtree_remove, so if we invalidate them all after rbtree_remove they can't
>>>>     be inputs to subsequent remove()s
>>>
>>> With above proposed run-time checks both bpf_rbtree_remove and bpf_rbtree_add
>>> can have release semantics.
>>> No need for special release_on_unlock hacks.
>>>
>>
>> If we want to be able to interact w/ nodes after they've been added to the
>> rbtree, but before critical section ends, we need to support non-owning refs,
>> which are currently implemented using special release_on_unlock logic.
>>
>> If we go with the runtime check suggestion from above, we'd need to implement
>> 'conditional release' similarly to earlier "rbtree map" attempt:
>> https://lore.kernel.org/bpf/20220830172759.4069786-14-davemarchevsky@fb.com/ .
>>
>> If rbtree_add has release semantics for its node arg, but the node is already
>> in some tree and runtime check fails, the reference should not be released as
>> rbtree_add() was a nop.
> 
> Got it.
> The conditional release is tricky. We should probably avoid it for now.
> 
> I think we can either go with Kumar's proposal and do
> bpf_rbtree_pop_front() instead of bpf_rbtree_first()
> that avoids all these issues...
> 
> but considering that we'll have inline iterators soon and should be able to do:
> 
> struct bpf_rbtree_iter it;
> struct bpf_rb_node * node;
> 
> bpf_rbtree_iter_init(&it, rb_root); // locks the rbtree
> while ((node = bpf_rbtree_iter_next(&it)) {
>   if (node->field == condition) {
>     struct bpf_rb_node *n;
> 
>     n = bpf_rbtree_remove(rb_root, node);
>     bpf_spin_lock(another_rb_root);
>     bpf_rbtree_add(another_rb_root, n);
>     bpf_spin_unlock(another_rb_root);
>     break;
>   }
> }
> bpf_rbtree_iter_destroy(&it);
> 
> We can treat the 'node' returned from bpf_rbtree_iter_next() the same way
> as return from bpf_rbtree_first() ->  PTR_TRUSTED | MAYBE_NULL,
> but not acquired (ref_obj_id == 0).
> 
> bpf_rbtree_add -> KF_RELEASE
> so we cannot pass not acquired pointers into it.
> 
> We should probably remove release_on_unlock logic as Kumar suggesting and
> make bpf_list_push_front/back to be KF_RELEASE.
> 
> Then
> bpf_list_pop_front/back stay KF_ACQUIRE | KF_RET_NULL
> and
> bpf_rbtree_remove is also KF_ACQUIRE | KF_RET_NULL.
> 
> The difference is bpf_list_pop has only 'head'
> while bpf_rbtree_remove has 'root' and 'node' where 'node' has to be PTR_TRUSTED
> (but not acquired).
> 
> bpf_rbtree_add will always succeed.
> bpf_rbtree_remove will conditionally fail if 'node' is not linked.
> 
> Similarly we can extend link list with
> n = bpf_list_remove(node)
> which will have KF_ACQUIRE | KF_RET_NULL semantics.
> 
> Then everything is nicely uniform.
> We'll be able to iterate rbtree and iterate link lists.
> 
> There are downsides, of course.
> Like the following from your test case:
> +       bpf_spin_lock(&glock);
> +       bpf_rbtree_add(&groot, &n->node, less);
> +       bpf_rbtree_add(&groot, &m->node, less);
> +       res = bpf_rbtree_remove(&groot, &n->node);
> +       bpf_spin_unlock(&glock);
> will not work.
> Since bpf_rbtree_add() releases 'n' and it becomes UNTRUSTED.
> (assuming release_on_unlock is removed).
> 
> I think it's fine for now. I have to agree with Kumar that it's hard to come up
> with realistic use case where 'n' should be accessed after it was added to link
> list or rbtree. Above test case doesn't look real.
> 
> This part of your test case:
> +       bpf_spin_lock(&glock);
> +       bpf_rbtree_add(&groot, &n->node, less);
> +       bpf_rbtree_add(&groot, &m->node, less);
> +       bpf_rbtree_add(&groot, &o->node, less);
> +
> +       res = bpf_rbtree_first(&groot);
> +       if (!res) {
> +               bpf_spin_unlock(&glock);
> +               return 2;
> +       }
> +
> +       o = container_of(res, struct node_data, node);
> +       res = bpf_rbtree_remove(&groot, &o->node);
> +       bpf_spin_unlock(&glock);
> 
> will work, because bpf_rbtree_first returns PTR_TRUSTED | MAYBE_NULL.
> 
>> Similarly, if rbtree_remove has release semantics for its node arg and acquire
>> semantics for its return value, runtime check failing should result in the
>> node arg not being released. Acquire semantics for the retval are already
>> conditional - if retval == NULL, mark_ptr_or_null regs will release the
>> acquired ref before it can be used. So no issue with failing rbtree_remove
>> messing up acquire.
>>
>> For this reason rbtree_remove and rbtree_first are tagged
>> KF_ACQUIRE | KF_RET_NULL. "special release_on_unlock hacks" can likely be
>> refactored into a similar flag, KF_RELEASE_NON_OWN or similar.
> 
> I guess what I'm proposing above is sort of the KF_RELEASE_NON_OWN idea,
> but from a different angle.
> I'd like to avoid introducing new flags.
> I think PTR_TRUSTED is enough.
> 
>>> I'm not sure what's an idea to return 'n' from remove...
>>> Maybe it should be simple bool ?
>>>
>>
>> I agree that returning node from rbtree_remove is not strictly necessary, since
>> rbtree_remove can be thought of as turning its non-owning ref argument into an
>> owning ref, instead of taking non-owning ref and returning owning ref. But such
>> an operation isn't really an 'acquire' by current verifier logic, since only
>> retvals can be 'acquired'. So we'd need to add some logic to enable acquire
>> semantics for args. Furthermore it's not really 'acquiring' a new ref, rather
>> changing properties of node arg ref.
>>
>> However, if rbtree_remove can fail, such a "turn non-owning into owning"
>> operation will need to be able to fail as well, and the program will need to
>> be able to check for failure. Returning 'acquire' result in retval makes
>> this simple - just check for NULL. For your "return bool" proposal, we'd have
>> to add verifier logic which turns the 'acquired' owning ref back into non-owning
>> based on check of the bool, which will add some verifier complexity.
>>
>> IIRC when doing experimentation with "rbtree map" implementation, I did
>> something like this and decided that the additional complexity wasn't worth
>> it when retval can just be used. 
> 
> Agree. Forget 'bool' idea.

We will merge this convo w/ similar one in the cover letter's thread, and
continue w/ replies there.

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH bpf-next 10/13] bpf, x86: BPF_PROBE_MEM handling for insn->off < 0
  2022-12-07 23:39         ` Dave Marchevsky
@ 2022-12-08  0:47           ` Alexei Starovoitov
  2022-12-08  8:50             ` Dave Marchevsky
  0 siblings, 1 reply; 51+ messages in thread
From: Alexei Starovoitov @ 2022-12-08  0:47 UTC (permalink / raw)
  To: Dave Marchevsky
  Cc: Dave Marchevsky, bpf, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, Kernel Team, Kumar Kartikeya Dwivedi, Tejun Heo

On Wed, Dec 07, 2022 at 06:39:38PM -0500, Dave Marchevsky wrote:
> >>
> >> 0000000000000000 <less>:
> >> ;       return node_a->key < node_b->key;
> >>        0:       79 22 f0 ff 00 00 00 00 r2 = *(u64 *)(r2 - 0x10)
> >>        1:       79 11 f0 ff 00 00 00 00 r1 = *(u64 *)(r1 - 0x10)
> >>        2:       b4 00 00 00 01 00 00 00 w0 = 0x1
> >> ;       return node_a->key < node_b->key;
> > 
> > I see. That's the same bug.
> > The args to callback should have been PTR_TO_BTF_ID | PTR_TRUSTED with 
> > correct positive offset.
> > Then node_a = container_of(a, struct node_data, node);
> > would have produced correct offset into proper btf_id.
> > 
> > The verifier should be passing into less() the btf_id
> > of struct node_data instead of btf_id of struct bpf_rb_node.
> > 
> 
> The verifier is already passing the struct node_data type, not bpf_rb_node.
> For less() args, and rbtree_{first,remove} retval, mark_reg_datastructure_node
> - added in patch 8 - is doing as you describe.
> 
> Verifier sees less' arg regs as R=ptr_to_node_data(off=16). If it was
> instead passing R=ptr_to_bpf_rb_node(off=0), attempting to access *(reg - 0x10)
> would cause verifier err.

Ahh. I finally got it :)
Please put these details in the commit log when you respin.

> >>        3:       cd 21 01 00 00 00 00 00 if r1 s< r2 goto +0x1 <LBB2_2>
> >>        4:       b4 00 00 00 00 00 00 00 w0 = 0x0
> >>
> >> 0000000000000028 <LBB2_2>:
> >> ;       return node_a->key < node_b->key;
> >>        5:       95 00 00 00 00 00 00 00 exit
> >>
> >> Insns 0 and 1 are loading node_b->key and node_a->key, respectively, using
> >> negative insn->off. Verifier's view or R1 and R2 before insn 0 is
> >> untrusted_ptr_node_data(off=16). If there were some intermediate insns
> >> storing result of container_of() before dereferencing:
> >>
> >>   r3 = (r2 - 0x10)
> >>   r2 = *(u64 *)(r3)
> >>
> >> Verifier would see R3 as untrusted_ptr_node_data(off=0), and load for
> >> r2 would have insn->off = 0. But LLVM decides to just do a load-with-offset
> >> using original arg ptrs to less() instead of storing container_of() ptr
> >> adjustments.
> >>
> >> Since the container_of usage and code pattern in above example's less()
> >> isn't particularly specific to this series, I think there are other scenarios
> >> where such code would be generated and considered this a general bugfix in
> >> cover letter.
> > 
> > imo the negative offset looks specific to two misuses of PTR_UNTRUSTED in this set.
> > 
> 
> If I used PTR_TRUSTED here, the JITted instructions would still do a load like
> r2 = *(u64 *)(r2 - 0x10). There would just be no BPF_PROBE_MEM runtime checking
> insns generated, avoiding negative insn issue there. But the negative insn->off
> load being generated is not specific to PTR_UNTRUSTED.

yep.

> > 
> > Exactly. More flags will only increase the confusion.
> > Please try to make callback args as proper PTR_TRUSTED and disallow calling specific
> > rbtree kfuncs while inside this particular callback to prevent recursion.
> > That would solve all these issues, no?
> > Writing into such PTR_TRUSTED should be still allowed inside cb though it's bogus.
> > 
> > Consider less() receiving btf_id ptr_trusted of struct node_data and it contains
> > both link list and rbtree.
> > It should still be safe to operate on link list part of that node from less()
> > though it's not something we would ever recommend.
> 
> I definitely want to allow writes on non-owning references. In order to properly
> support this, there needs to be a way to designate a field as a "key":
> 
> struct node_data {
>   long key __key;
>   long data;
>   struct bpf_rb_node node;
> };
> 
> or perhaps on the rb_root via __contains or separate tag:
> 
> struct bpf_rb_root groot __contains(struct node_data, node, key);
> 
> This is necessary because rbtree's less() uses the key field to determine order, so
> we don't want to allow writes to the key field while the node is in a rbtree. If
> such a write were possible the rbtree could easily be left in an invalid state,
> since the new key may mean that the rbtree is no longer sorted. Subsequent add()
> operations would invoke less() with the new key, so other nodes could be placed
> in the wrong spot as well.
> 
> Since PTR_UNTRUSTED currently allows read but not write, and prevents use of a
> non-owning ref as a kfunc arg, it seemed to be a reasonable tag for less() args.
> 
> I was planning on adding __key / non-owning-ref write support as a followup, but
> adding it as part of this series will probably save a lot of back-and-forth.
> Will try to add it.

Just a key mark might not be enough. less() could be doing all sorts of complex
logic on more than one field, and even on global fields.
But what is the concern with writing into 'key'?
The rbtree will not be sorted. find/add operations will not be correct,
but nothing will crash. At the end bpf_rb_root_free() will walk all of the
unsorted nodes anyway and free them.
Even if we pass PTR_TRUSTED | MEM_RDONLY pointers into less(), less()
can still do nonsensical things like returning random true/false.
Doesn't look like an issue to me.

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH bpf-next 00/13] BPF rbtree next-gen datastructure
  2022-12-07 23:06     ` Alexei Starovoitov
@ 2022-12-08  1:18       ` Dave Marchevsky
  2022-12-08  3:51         ` Alexei Starovoitov
  0 siblings, 1 reply; 51+ messages in thread
From: Dave Marchevsky @ 2022-12-08  1:18 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Kumar Kartikeya Dwivedi, Dave Marchevsky, bpf,
	Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Kernel Team, Tejun Heo

On 12/7/22 6:06 PM, Alexei Starovoitov wrote:
> On Wed, Dec 07, 2022 at 05:28:34PM -0500, Dave Marchevsky wrote:
>> On 12/7/22 2:36 PM, Kumar Kartikeya Dwivedi wrote:
>>> On Wed, Dec 07, 2022 at 04:39:47AM IST, Dave Marchevsky wrote:
>>>> This series adds a rbtree datastructure following the "next-gen
>>>> datastructure" precedent set by recently-added linked-list [0]. This is
>>>> a reimplementation of previous rbtree RFC [1] to use kfunc + kptr
>>>> instead of adding a new map type. This series adds a smaller set of API
>>>> functions than that RFC - just the minimum needed to support current
>>>> cgfifo example scheduler in ongoing sched_ext effort [2], namely:
>>>>
>>>>   bpf_rbtree_add
>>>>   bpf_rbtree_remove
>>>>   bpf_rbtree_first
>>>>
>>>> [...]
>>>>
>>>> Future work:
>>>>   Enabling writes to release_on_unlock refs should be done before the
>>>>   functionality of BPF rbtree can truly be considered complete.
>>>>   Implementing this proved more complex than expected so it's been
>>>>   pushed off to a future patch.
>>>>
>>
>>>
>>> TBH, I think we need to revisit whether there's a strong need for this. I would
>>> even argue that we should simply make the release semantics of rbtree_add,
>>> list_push helpers stronger and remove release_on_unlock logic entirely,
>>> releasing the node immediately. I don't see why it is so critical to have read,
>>> and more importantly, write access to nodes after losing their ownership. And
>>> that too is only available until the lock is unlocked.
>>>
>>
>> Moved the next paragraph here to ease reply, it was the last paragraph
>> in your response.
>>
>>>
>>> Can you elaborate on actual use cases where immediate release or not having
>>> write support makes it hard or impossible to support a certain use case, so that
>>> it is easier to understand the requirements and design things accordingly?
>>>
>>
>> Sure, the main usecase and impetus behind this for me is the sched_ext work
>> Tejun and others are doing (https://lwn.net/Articles/916291/ ). One of the
>> things they'd like to be able to do is implement a CFS-like scheduler using
>> rbtree entirely in BPF. This would prove that sched_ext + BPF can be used to
>> implement complicated scheduling logic.
>>
>> If we can implement such complicated scheduling logic, but it has so much
>> BPF-specific twisting of program logic that it's incomprehensible to scheduler
>> folks, that's not great. The overlap between "BPF experts" and "scheduler
>> experts" is small, and we want the latter group to be able to read BPF
>> scheduling logic without too much struggle. Lower learning curve makes folks
>> more likely to experiment with sched_ext.
>>
>> When 'rbtree map' was in brainstorming / prototyping, non-owning reference
>> semantics were called out as moving BPF datastructures closer to their kernel
>> equivalents from a UX perspective.
> 
> Our emails crossed. See my previous email.
> Agree on the above.
> 
>> If the "it makes BPF code better resemble normal kernel code" argument was the
>> only reason to do this I wouldn't feel so strongly, but there are practical
>> concerns as well:
>>
>> If we could only read / write an rbtree node when it isn't in a tree, the common
>> operation of "find this node and update its data" would require removing and
>> re-adding it. For rbtree, these unnecessary remove and add operations could
> 
> Not really. See my previous email.
> 
>> result in unnecessary rebalancing. Going back to the sched_ext usecase,
>> if we have a rbtree with task or cgroup stats that need to be updated often,
>> unnecessary rebalancing would make this update slower than if non-owning refs
>> allowed in-place read/write of node data.
> 
> Agree. Read/write from non-owning refs is necessary.
> In the other email I'm arguing that PTR_TRUSTED with ref_obj_id == 0
> (your non-owning ref) should not be mixed with release_on_unlock logic.
> 
> KF_RELEASE should still accept as args and release only ptrs with ref_obj_id > 0.
> 
>>
>> Also, we eventually want to be able to have a node that's part of both a
>> list and rbtree. Likely adding such a node to both would require calling
>> kfunc for adding to list, and separate kfunc call for adding to rbtree.
>> Once the node has been added to list, we need some way to represent a reference
>> to that node so that we can pass it to rbtree add kfunc. Sounds like a
>> non-owning reference to me, albeit with different semantics than current
>> release_on_unlock.
> 
> A node with both link list and rbtree would be a new concept.
> We'd need to introduce 'struct bpf_refcnt' and make sure prog does the right thing.
> That's a future discussion.
> 
>>
>>> I think this relaxed release logic and write support is the wrong direction to
>>> take, as it has a direct bearing on what can be done with a node inside the
>>> critical section. There's already the problem with not being able to do
>>> bpf_obj_drop easily inside the critical section with this. That might be useful
>>> for draining operations while holding the lock.
>>>
>>
>> The bpf_obj_drop case is similar to your "can't pass non-owning reference
>> to bpf_rbtree_remove" concern from patch 1's thread. If we have:
>>
>>   n = bpf_obj_new(...); // n is owning ref
>>   bpf_rbtree_add(&tree, &n->node); // n is non-owning ref
> 
> what I proposed in the other email...
> n should be untrusted here.
> That's != 'n is non-owning ref'
> 
>>   res = bpf_rbtree_first(&tree);
>>   if (!res) {...}
>>   m = container_of(res, struct node_data, node); // m is non-owning ref
> 
> agree. m == PTR_TRUSTED with ref_obj_id == 0.
> 
>>   res = bpf_rbtree_remove(&tree, &n->node);
> 
> a typo here? Did you mean 'm->node' ?
> 
> and after 'if (res)' ...
>>   n = container_of(res, struct node_data, node); // n is owning ref, m points to same memory
> 
> agree. n -> ref_obj_id > 0
> 
>>   bpf_obj_drop(n);
> 
> above is ok to do.
> 'n' becomes UNTRUSTED or invalid.
> 
>>   // Not safe to use m anymore
> 
> 'm' should have become UNTRUSTED after bpf_rbtree_remove.
> 
>> Datastructures which support bpf_obj_drop in the critical section can
>> do same as my bpf_rbtree_remove suggestion: just invalidate all non-owning
>> references after bpf_obj_drop.
> 
> 'invalidate all' sounds suspicious.
> I don't think we need to do a sweeping search after bpf_obj_drop.
> 
>> Then there's no potential use-after-free.
>> (For the above example, pretend bpf_rbtree_remove didn't already invalidate
>> 'm', or that there's some other way to obtain non-owning ref to 'n''s node
>> after rbtree_remove)
>>
>> I think that, in practice, operations where the BPF program wants to remove
>> / delete nodes will be distinct from operations where program just wants to 
>> obtain some non-owning refs and do read / write. At least for sched_ext usecase
>> this is true. So all the additional clobbers won't require the program writer
>> to do special workarounds to deal with the verifier in the common case.
>>
>>> Semantically in other languages, once you move an object, accessing it is
>>> usually a bug, and in most of the cases it is sufficient to prepare it before
>>> insertion. We are certainly in the same territory here with these APIs.
>>
>> Sure, but 'add'/'remove' for these intrusive linked datastructures is
>> _not_ a 'move'. Obscuring this from the user and forcing them to use
>> less performant patterns for the sake of some verifier complexity, or desire
>> to mimic semantics of languages w/o reference stability, doesn't make sense to
>> me.
> 
> I agree, but everything we discuss in the above looks orthogonal
> to release_on_unlock that myself and Kumar are proposing to drop.
> 
>> If we were to add some datastructures without reference stability, sure, let's
>> not do non-owning references for those. So let's make this non-owning reference
>> stuff easy to turn on/off, perhaps via KF_RELEASE_NON_OWN or similar flags,
>> which will coincidentally make it very easy to remove if we later decide that
>> the complexity isn't worth it. 
> 
> You mean KF_RELEASE_NON_OWN would be applied to bpf_rbtree_remove() ?
> So it accepts PTR_TRUSTED ref_obj_id == 0 arg and makes it PTR_UNTRUSTED ?
> If so then I agree. The 'release' part of the name was confusing.
> It's also not clear which arg it applies to.
> bpf_rbtree_remove has two args. Both are PTR_TRUSTED.
> I wouldn't introduce a new flag for this just yet.
> We can hard code bpf_rbtree_remove, bpf_list_pop for now
> or use our name suffix hack.

Before replying to specific things in this email, I think it would be useful
to have a subthread clearing up definitions and semantics, as I think we're
talking past each other a bit.


On a conceptual level I've still been using "owning reference" and "non-owning
reference" to understand rbtree operations. I'll use those here and try to map
them to actual verifier concepts later.

owning reference

  * This reference controls the lifetime of the pointee
  * Ownership of pointee must be 'released' by passing it to some rbtree
    API kfunc - rbtree_add in our case - or via bpf_obj_drop, which frees it
    * If not released before program ends, verifier considers prog invalid
  * Access to the memory ref is pointing at will not page fault

non-owning reference

  * No ownership of pointee so can't pass ownership via rbtree_add, not allowed
    to bpf_obj_drop
  * No control of lifetime, but can infer memory safety based on context
    (see explanation below)
  * Access to the memory ref is pointing at will not page fault
    (see explanation below)

2) From verifier's perspective non-owning references can only exist
between spin_lock and spin_unlock. Why? After spin_unlock another program
can do arbitrary operations on the rbtree like removing and free-ing
via bpf_obj_drop. A non-owning ref to some chunk of memory that was remove'd,
free'd, and reused via bpf_obj_new would point to an entirely different thing.
Or the memory could go away.

To prevent this logic violation, all non-owning references are invalidated by
the verifier after the critical section ends. This is necessary to ensure "will
not page fault" property of non-owning reference. So if verifier hasn't
invalidated a non-owning ref, accessing it will not page fault.

Currently bpf_obj_drop is not allowed in the critical section, so similarly,
if there's a valid non-owning ref, we must be in critical section, and can
conclude that the ref's memory hasn't been dropped-and-free'd or dropped-
and-reused.

1) Any reference to a node that is in a rbtree _must_ be non-owning, since
the tree has control of pointee lifetime. Similarly, any ref to a node
that isn't in rbtree _must_ be owning. (let's ignore raw read from kptr_xchg'd
node in map_val for now)

Moving on to rbtree API:

bpf_rbtree_add(&tree, &node);
  'node' is an owning ref, becomes a non-owning ref.

bpf_rbtree_first(&tree);
  retval is a non-owning ref, since first() node is still in tree

bpf_rbtree_remove(&tree, &node);
  'node' is a non-owning ref, retval is an owning ref

All of the above can only be called when rbtree's lock is held, so invalidation
of all non-owning refs on spin_unlock is fine for rbtree_remove.

Nice property of paragraph marked with 1) above is the ability to use the
type system to prevent rbtree_add of node that's already in rbtree and
rbtree_remove of node that's not in one. So we can forego runtime
checking of "already in tree", "already not in tree".

But, as you and Kumar talked about in the past and referenced in patch 1's
thread, non-owning refs may alias each other, or an owning ref, and have no
way of knowing whether this is the case. So if X and Y are two non-owning refs
that alias each other, and bpf_rbtree_remove(tree, X) is called, a subsequent
call to bpf_rbtree_remove(tree, Y) would be removing node from tree which
already isn't in any tree (since prog has an owning ref to it). But verifier
doesn't know X and Y alias each other. So previous paragraph's "forego
runtime checks" statement can only hold if we invalidate all non-owning refs
after 'destructive' rbtree_remove operation.


It doesn't matter to me which combination of type flags, ref_obj_id, other
reg state stuff, and special-casing is used to implement owning and non-owning
refs. Specific ones chosen in this series for rbtree node:

owning ref: PTR_TO_BTF_ID | MEM_ALLOC (w/ type that contains bpf_rb_node)
            ref_obj_id > 0

non-owning ref: PTR_TO_BTF_ID | MEM_ALLOC (w/ type that contains bpf_rb_node)
                PTR_UNTRUSTED
                  - used for "can't pass ownership", not PROBE_MEM
                  - this is why I mentioned "decomposing UNTRUSTED into more
                    granular reg traits" in another thread
                ref_obj_id > 0
                release_on_unlock = true
                  - used due to paragraphs starting with 2) above                

Any other combination of type and reg state that gives me the semantics def'd
above works4me.


Based on this reply and others from today, I think you're saying that these
concepts should be implemented using:

owning ref: PTR_TO_BTF_ID | MEM_ALLOC (w/ rb_node type)
            PTR_TRUSTED
            ref_obj_id > 0

non-owning ref: PTR_TO_BTF_ID | MEM_ALLOC (w/ rb_node type)
                PTR_TRUSTED
                ref_obj_id == 0
                 - used for "can't pass ownership", since funcs that expect
                   owning ref need ref_obj_id > 0

And you're also adding 'untrusted' here, mainly as a result of
bpf_rbtree_add(tree, node) - 'node' becoming untrusted after it's added,
instead of becoming a non-owning ref. 'untrusted' would have state like:

PTR_TO_BTF_ID | MEM_ALLOC (w/ rb_node type)
PTR_UNTRUSTED
ref_obj_id == 0?

I think your "non-owning ref" definition also differs from mine, specifically
yours doesn't seem to have "will not page fault". For this reason, you don't
see the need for release_on_unlock logic, since that's used to prevent refs
escaping critical section and potentially referring to free'd memory.

This is where I start to get confused. Some questions:

  * If we get rid of release_on_unlock, and with mass invalidation of
    non-owning refs entirely, shouldn't non-owning refs be marked PTR_UNTRUSTED?

  * Since refs can alias each other, how to deal with bpf_obj_drop-and-reuse
    in this scheme, since non-owning ref can escape spin_unlock b/c no mass
    invalidation? PTR_UNTRUSTED isn't sufficient here

  * If non-owning ref can live past spin_unlock, do we expect read from
    such ref after _unlock to go through bpf_probe_read()? Otherwise direct
    read might fault and silently write 0.

  * For your 'untrusted', but not non-owning ref concept, I'm not sure
    what this gives us that's better than just invalidating the ref which
    gets in this state (rbtree_{add,remove} 'node' arg, bpf_obj_drop node)

I'm also not sure if you agree with my paragraph marked 1) above. But IMO the
release_on_unlock difference, and the perhaps-differing non-owning ref concept
are where we're really talking past each other.


* Re: [PATCH bpf-next 00/13] BPF rbtree next-gen datastructure
  2022-12-08  1:18       ` Dave Marchevsky
@ 2022-12-08  3:51         ` Alexei Starovoitov
  2022-12-08  8:28           ` Dave Marchevsky
  0 siblings, 1 reply; 51+ messages in thread
From: Alexei Starovoitov @ 2022-12-08  3:51 UTC (permalink / raw)
  To: Dave Marchevsky
  Cc: Kumar Kartikeya Dwivedi, Dave Marchevsky, bpf,
	Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Kernel Team, Tejun Heo

On Wed, Dec 07, 2022 at 08:18:25PM -0500, Dave Marchevsky wrote:
> 
> Before replying to specific things in this email, I think it would be useful
> to have a subthread clearing up definitions and semantics, as I think we're
> talking past each other a bit.

Yeah. We were not on the same page.
The concepts of 'owning ref' and 'non-owning ref' appeared 'new' to me.
I remember discussing 'conditional release' and OBJ_NON_OWNING_REF long ago
and I thought we agreed that both are not necessary and with that
I assumed that anything 'non-owning' as a concept is gone too.
So the only thing left (in my mind) was the 'owning' concept.
Which I mapped as ref_obj_id > 0. In other words 'owning' meant 'acquired'.

Please have this detailed explanation in the commit log next time to
avoid this back and forth.
Now to the fun part...

> 
> On a conceptual level I've still been using "owning reference" and "non-owning
> reference" to understand rbtree operations. I'll use those here and try to map
> them to actual verifier concepts later.
> 
> owning reference
> 
>   * This reference controls the lifetime of the pointee
>   * Ownership of pointee must be 'released' by passing it to some rbtree
>     API kfunc - rbtree_add in our case - or via bpf_obj_drop, which frees it
>     * If not released before program ends, verifier considers prog invalid
>   * Access to the memory ref is pointing at will not page fault

agree.

> non-owning reference
> 
>   * No ownership of pointee so can't pass ownership via rbtree_add, not allowed
>     to bpf_obj_drop
>   * No control of lifetime, but can infer memory safety based on context
>     (see explanation below)
>   * Access to the memory ref is pointing at will not page fault
>     (see explanation below)

agree with addition that both read and write should be allowed into this
'non-owning' ptr.
Which breaks if you map this to something that ORs with PTR_UNTRUSTED.

> 2) From verifier's perspective non-owning references can only exist
> between spin_lock and spin_unlock. Why? After spin_unlock another program
> can do arbitrary operations on the rbtree like removing and free-ing
> via bpf_obj_drop. A non-owning ref to some chunk of memory that was remove'd,
> free'd, and reused via bpf_obj_new would point to an entirely different thing.
> Or the memory could go away.

agree that spin_unlock needs to clean up 'non-owning'.

> To prevent this logic violation all non-owning references are invalidated by
> verifier after critical section ends. This is necessary to ensure "will
> not page fault" property of non-owning reference. So if verifier hasn't
> invalidated a non-owning ref, accessing it will not page fault.
> 
> Currently bpf_obj_drop is not allowed in the critical section, so similarly,
> if there's a valid non-owning ref, we must be in critical section, and can
> conclude that the ref's memory hasn't been dropped-and-free'd or dropped-
> and-reused.

I don't understand why is that a problem.

> 1) Any reference to a node that is in a rbtree _must_ be non-owning, since
> the tree has control of pointee lifetime. Similarly, any ref to a node
> that isn't in rbtree _must_ be owning. (let's ignore raw read from kptr_xchg'd
> node in map_val for now)

Also not clear why such restriction is necessary.

> Moving on to rbtree API:
> 
> bpf_rbtree_add(&tree, &node);
>   'node' is an owning ref, becomes a non-owning ref.
> 
> bpf_rbtree_first(&tree);
>   retval is a non-owning ref, since first() node is still in tree
> 
> bpf_rbtree_remove(&tree, &node);
>   'node' is a non-owning ref, retval is an owning ref

agree on the above definition.

> All of the above can only be called when rbtree's lock is held, so invalidation
> of all non-owning refs on spin_unlock is fine for rbtree_remove.
> 
> Nice property of paragraph marked with 1) above is the ability to use the
> type system to prevent rbtree_add of node that's already in rbtree and
> rbtree_remove of node that's not in one. So we can forego runtime
> checking of "already in tree", "already not in tree".

I think it's easier to add runtime check inside bpf_rbtree_remove()
since it already returns MAYBE_NULL. No 'conditional release' necessary.
And with that we don't need to worry about aliases.

> But, as you and Kumar talked about in the past and referenced in patch 1's
> thread, non-owning refs may alias each other, or an owning ref, and have no
> way of knowing whether this is the case. So if X and Y are two non-owning refs
> that alias each other, and bpf_rbtree_remove(tree, X) is called, a subsequent
> call to bpf_rbtree_remove(tree, Y) would be removing node from tree which
> already isn't in any tree (since prog has an owning ref to it). But verifier
> doesn't know X and Y alias each other. So previous paragraph's "forego
> runtime checks" statement can only hold if we invalidate all non-owning refs
> after 'destructive' rbtree_remove operation.

right. we either invalidate all non-owning after bpf_rbtree_remove
or do run-time check in bpf_rbtree_remove.
Consider the following:
bpf_spin_lock
n = bpf_rbtree_first(root);
m = bpf_rbtree_first(root);
x = bpf_rbtree_remove(root, n)
y = bpf_rbtree_remove(root, m)
bpf_spin_unlock
if (x)
   bpf_obj_drop(x)
if (y)
   bpf_obj_drop(y)

If we invalidate after bpf_rbtree_remove() the above will be rejected by the verifier.
If we do run-time check the above will be accepted and will work without crashing.

The problem with release_on_unlock is that it marks 'n' after 1st remove
as UNTRUSTED which means 'no write' and 'read via probe_read'.
That's not good imo.

> 
> It doesn't matter to me which combination of type flags, ref_obj_id, other
> reg state stuff, and special-casing is used to implement owning and non-owning
> refs. Specific ones chosen in this series for rbtree node:
> 
> owning ref: PTR_TO_BTF_ID | MEM_ALLOC (w/ type that contains bpf_rb_node)
>             ref_obj_id > 0
> 
> non-owning ref: PTR_TO_BTF_ID | MEM_ALLOC (w/ type that contains bpf_rb_node)
>                 PTR_UNTRUSTED
>                   - used for "can't pass ownership", not PROBE_MEM
>                   - this is why I mentioned "decomposing UNTRUSTED into more
>                     granular reg traits" in another thread

Now I understand, but that was very hard to grasp.
UNTRUSTED means 'no write' and 'read via probe_read'.
ref_set_release_on_unlock() also keeps ref_obj_id > 0 as you're correctly
pointing out below:
>                 ref_obj_id > 0
>                 release_on_unlock = true
>                   - used due to paragraphs starting with 2) above                

but the problem with ref_set_release_on_unlock() is that it mixes real ref-d
pointers with ref_obj_id > 0 with UNTRUSTED && ref_obj_id > 0.
And the latter is a quite confusing combination in my mind,
since we consider everything with ref_obj_id > 0 as good for KF_TRUSTED_ARGS.

> Any other combination of type and reg state that gives me the semantics def'd
> above works4me.
> 
> 
> Based on this reply and others from today, I think you're saying that these
> concepts should be implemented using:
> 
> owning ref: PTR_TO_BTF_ID | MEM_ALLOC (w/ rb_node type)
>             PTR_TRUSTED
>             ref_obj_id > 0

Almost.
I propose:
PTR_TO_BTF_ID | MEM_ALLOC  && ref_obj_id > 0

See the definition of is_trusted_reg().
It's ref_obj_id > 0 || flag == (MEM_ALLOC | PTR_TRUSTED)

I was saying 'trusted' because of is_trusted_reg() definition.
Sorry for confusion.

> non-owning ref: PTR_TO_BTF_ID | MEM_ALLOC (w/ rb_node type)
>                 PTR_TRUSTED
>                 ref_obj_id == 0
>                  - used for "can't pass ownership", since funcs that expect
>                    owning ref need ref_obj_id > 0

I propose:
PTR_TO_BTF_ID | MEM_ALLOC && ref_obj_id == 0

Both 'owning' and 'non-owning' will fit for KF_TRUSTED_ARGS kfuncs.

And we will be able to pass 'non-owning' under spin_lock into other kfuncs
and owning outside of spin_lock into other kfuncs.
Which is a good thing.

> And you're also adding 'untrusted' here, mainly as a result of
> bpf_rbtree_add(tree, node) - 'node' becoming untrusted after it's added,
> instead of becoming a non-owning ref. 'untrusted' would have state like:
> 
> PTR_TO_BTF_ID | MEM_ALLOC (w/ rb_node type)
> PTR_UNTRUSTED
> ref_obj_id == 0?

I'm not sure whether we really need full untrusted after going through bpf_rbtree_add()
or doing 'non-owning' is enough.
If it's full untrusted it will be:
PTR_TO_BTF_ID | PTR_UNTRUSTED && ref_obj_id == 0

tbh I don't remember why we even have 'MEM_ALLOC | PTR_UNTRUSTED'.

> I think your "non-owning ref" definition also differs from mine, specifically
> yours doesn't seem to have "will not page fault". For this reason, you don't
> see the need for release_on_unlock logic, since that's used to prevent refs
> escaping critical section and potentially referring to free'd memory.

Not quite.
We should be able to read/write directly through
PTR_TO_BTF_ID | MEM_ALLOC && ref_obj_id == 0
and we need to convert it to __mark_reg_unknown() after bpf_spin_unlock
the way release_reference() is doing.
I'm just not happy with using acquire_reference/release_reference() logic
(as release_on_unlock is doing) for cleaning after unlock.
Since we need to clean 'non-owning' ptrs in unlock it's confusing
to call the process 'release'.
I was hoping we can search through all states and __mark_reg_unknown() (or UNTRUSTED)
every reg where 
reg->id == cur_state->active_lock.id &&
flag == PTR_TO_BTF_ID | MEM_ALLOC && ref_obj_id == 0

By deleting release_on_unlock I meant delete the release_on_unlock flag
and remove ref_set_release_on_unlock.

> This is where I start to get confused. Some questions:
> 
>   * If we get rid of release_on_unlock, and with mass invalidation of
>     non-owning refs entirely, shouldn't non-owning refs be marked PTR_UNTRUSTED?

Since we'll be cleaning all
PTR_TO_BTF_ID | MEM_ALLOC && ref_obj_id == 0
it shouldn't affect ptrs with ref_obj_id > 0 that came from bpf_obj_new.

The verifier already enforces that bpf_spin_unlock will be present
at the right place in bpf prog.
When the verifier sees it it will clean all non-owning refs with this spinlock 'id'.
So no concerns of leaking 'non-owning' outside.

While processing bpf_rbtree_first we need to:
regs[BPF_REG_0].type = PTR_TO_BTF_ID | MEM_ALLOC;
regs[BPF_REG_0].id = active_lock.id;
regs[BPF_REG_0].ref_obj_id = 0;

>   * Since refs can alias each other, how to deal with bpf_obj_drop-and-reuse
>     in this scheme, since non-owning ref can escape spin_unlock b/c no mass
>     invalidation? PTR_UNTRUSTED isn't sufficient here

run-time check in bpf_rbtree_remove (and in the future bpf_list_remove)
should address it, no?

>   * If non-owning ref can live past spin_unlock, do we expect read from
>     such ref after _unlock to go through bpf_probe_read()? Otherwise direct
>     read might fault and silently write 0.

unlock has to clean them.

>   * For your 'untrusted', but not non-owning ref concept, I'm not sure
>     what this gives us that's better than just invalidating the ref which
>     gets in this state (rbtree_{add,remove} 'node' arg, bpf_obj_drop node)

Whether to mark unknown or untrusted or non-owning after bpf_rbtree_add() is a difficult one.
Untrusted will allow prog to do read only access (via probe_read) into the node
but might hide bugs.
The cleanup after bpf_spin_unlock of non-owning and clean up after
bpf_rbtree_add() does not have to be the same.
Currently I'm leaning towards PTR_UNTRUSTED for cleanup after bpf_spin_unlock
and non-owning after bpf_rbtree_add.

Walking the example from previous email:

struct bpf_rbtree_iter it;
struct bpf_rb_node * node;
struct bpf_rb_node *n, *m;

bpf_rbtree_iter_init(&it, rb_root); // locks the rbtree works as bpf_spin_lock
while ((node = bpf_rbtree_iter_next(&it)) {
  // node -> PTR_TO_BTF_ID | MEM_ALLOC | MAYBE_NULL && ref_obj_id == 0
  if (node && node->field == condition) {

    n = bpf_rbtree_remove(rb_root, node);
    if (!n) ...;
    // n -> PTR_TO_BTF_ID | MEM_ALLOC && ref_obj_id == X
    m = bpf_rbtree_remove(rb_root, node); // ok, but fails in run-time
    if (!m) ...;
    // m -> PTR_TO_BTF_ID | MEM_ALLOC && ref_obj_id == Y

    // node is still:
    // node -> PTR_TO_BTF_ID | MEM_ALLOC && ref_obj_id == 0 && id == active_lock[0].id

    // assume we allow double locks one day
    bpf_spin_lock(another_rb_root);
    bpf_rbtree_add(another_rb_root, n);
    // n -> PTR_TO_BTF_ID | MEM_ALLOC && ref_obj_id == 0 && id == active_lock[1].id
    bpf_spin_unlock(another_rb_root);
    // n -> PTR_TO_BTF_ID | PTR_UNTRUSTED && ref_obj_id == 0
    break;
  }
}
// node -> PTR_TO_BTF_ID | MEM_ALLOC && ref_obj_id == 0 && id == active_lock[0].id
bpf_rbtree_iter_destroy(&it); // does unlock
// node -> PTR_TO_BTF_ID | PTR_UNTRUSTED
// n -> PTR_TO_BTF_ID | PTR_UNTRUSTED
// m -> PTR_TO_BTF_ID | MEM_ALLOC && ref_obj_id == Y
bpf_obj_drop(m);


* Re: [PATCH bpf-next 00/13] BPF rbtree next-gen datastructure
  2022-12-08  3:51         ` Alexei Starovoitov
@ 2022-12-08  8:28           ` Dave Marchevsky
  2022-12-08 12:57             ` Kumar Kartikeya Dwivedi
  0 siblings, 1 reply; 51+ messages in thread
From: Dave Marchevsky @ 2022-12-08  8:28 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Kumar Kartikeya Dwivedi, Dave Marchevsky, bpf,
	Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Kernel Team, Tejun Heo

On 12/7/22 10:51 PM, Alexei Starovoitov wrote:
> On Wed, Dec 07, 2022 at 08:18:25PM -0500, Dave Marchevsky wrote:
>>
>> Before replying to specific things in this email, I think it would be useful
>> to have a subthread clearing up definitions and semantics, as I think we're
>> talking past each other a bit.
> 
> Yeah. We were not on the same page.
> The concepts of 'owning ref' and 'non-owning ref' appeared 'new' to me.
> I remember discussing 'conditional release' and OBJ_NON_OWNING_REF long ago
> and I thought we agreed that both are not necessary and with that
> I assumed that anything 'non-owning' as a concept is gone too.
> So the only thing left (in my mind) was the 'owning' concept.
> Which I mapped as ref_obj_id > 0. In other words 'owning' meant 'acquired'.
> 

Whereas in my mind the release_on_unlock logic was specifically added to
implement the mass invalidation part of non-owning reference semantics, and it
being accepted implied that we weren't getting rid of the concept :).

> Please have this detailed explanation in the commit log next time to
> avoid this back and forth.
> Now to the fun part...
> 

I will add a documentation commit explaining 'owning' and 'non-owning' ref
as they pertain to these datastructures, after we agree about the semantics.

Speaking of which, although I have a few questions / clarifications, I think
we're more in agreement after your reply. After one more round of clarification
I will summarize conclusions to see if we agree on enough to move forward.

>>
>> On a conceptual level I've still been using "owning reference" and "non-owning
>> reference" to understand rbtree operations. I'll use those here and try to map
>> them to actual verifier concepts later.
>>
>> owning reference
>>
>>   * This reference controls the lifetime of the pointee
>>   * Ownership of pointee must be 'released' by passing it to some rbtree
>>     API kfunc - rbtree_add in our case - or via bpf_obj_drop, which frees it
>>     * If not released before program ends, verifier considers prog invalid
>>   * Access to the memory ref is pointing at will not page fault
> 
> agree.
> 
>> non-owning reference
>>
>>   * No ownership of pointee so can't pass ownership via rbtree_add, not allowed
>>     to bpf_obj_drop
>>   * No control of lifetime, but can infer memory safety based on context
>>     (see explanation below)
>>   * Access to the memory ref is pointing at will not page fault
>>     (see explanation below)
> 
> agree with addition that both read and write should be allowed into this
> 'non-owning' ptr.
> Which breaks if you map this to something that ORs with PTR_UNTRUSTED.
> 

Agree re: read/write allowed. PTR_UNTRUSTED was an implementation detail.
Sounds like we agree on general purpose of owning, non-owning. Looks like
we're in agreement about above semantics.

>> 2) From verifier's perspective non-owning references can only exist
>> between spin_lock and spin_unlock. Why? After spin_unlock another program
>> can do arbitrary operations on the rbtree like removing and free-ing
>> via bpf_obj_drop. A non-owning ref to some chunk of memory that was remove'd,
>> free'd, and reused via bpf_obj_new would point to an entirely different thing.
>> Or the memory could go away.
> 
> agree that spin_unlock needs to clean up 'non-owning'.

Another point of agreement.

> 
>> To prevent this logic violation all non-owning references are invalidated by
>> verifier after critical section ends. This is necessary to ensure "will
>> not page fault" property of non-owning reference. So if verifier hasn't
>> invalidated a non-owning ref, accessing it will not page fault.
>>
>> Currently bpf_obj_drop is not allowed in the critical section, so similarly,
>> if there's a valid non-owning ref, we must be in critical section, and can
>> conclude that the ref's memory hasn't been dropped-and-free'd or dropped-
>> and-reused.
> 
> I don't understand why is that a problem.
> 
>> 1) Any reference to a node that is in a rbtree _must_ be non-owning, since
>> the tree has control of pointee lifetime. Similarly, any ref to a node
>> that isn't in rbtree _must_ be owning. (let's ignore raw read from kptr_xchg'd
>> node in map_val for now)
> 
> Also not clear why such restriction is necessary.
> 

If we have this restriction and bpf_rbtree_release also mass invalidates
non-owning refs, the type system will ensure that only nodes that are in a tree
will be passed to bpf_rbtree_release, and we can avoid the runtime check.

But below you mention preferring the runtime check, mostly noting here to
refer back when continuing reply below.

>> Moving on to rbtree API:
>>
>> bpf_rbtree_add(&tree, &node);
>>   'node' is an owning ref, becomes a non-owning ref.
>>
>> bpf_rbtree_first(&tree);
>>   retval is a non-owning ref, since first() node is still in tree
>>
>> bpf_rbtree_remove(&tree, &node);
>>   'node' is a non-owning ref, retval is an owning ref
> 
> agree on the above definition.
> >> All of the above can only be called when rbtree's lock is held, so invalidation
>> of all non-owning refs on spin_unlock is fine for rbtree_remove.
>>
>> Nice property of paragraph marked with 1) above is the ability to use the
>> type system to prevent rbtree_add of node that's already in rbtree and
>> rbtree_remove of node that's not in one. So we can forego runtime
>> checking of "already in tree", "already not in tree".
> 
> I think it's easier to add runtime check inside bpf_rbtree_remove()
> since it already returns MAYBE_NULL. No 'conditional release' necessary.
> And with that we don't need to worry about aliases.
> 

To clarify: You're proposing that we don't worry about solving the aliasing
problem at verification time. Instead rbtree_{add,remove} will deal with it
at runtime. Corollary of this is that my restriction tagged 1) above ("ref
to node in tree _must_ be non-owning, to node not in tree must be owning")
isn't something we're guaranteeing, due to possibility of aliasing.

So bpf_rbtree_remove might get a node that's not in tree, and
bpf_rbtree_add might get a node that's already in tree. Runtime behavior
of both should be 'nop'.


If that is an accurate restatement of your proposal, the verifier
logic will need to be changed:

For bpf_rbtree_remove(&tree, &node), if node is already not in a tree,
retval will be NULL, effectively not acquiring an owning ref due to
mark_ptr_or_null_reg's logic.

In this case, do we want to invalidate
arg 'node' as well? Or just leave it as a non-owning ref that points
to node not in tree? I think the latter requires fewer verifier changes,
but can see the argument for the former if we want restriction 1) to
mostly be true, unless aliasing.

The above scenario is the only case where bpf_rbtree_remove fails and
returns NULL.

(In this series it can fail and RET_NULL for this reason, but my earlier comment
about type system + invalidate all-non owning after remove as discussed below
was my original intent. So I shouldn't have been allowing RET_NULL for my
version of these semantics.)


For bpf_rbtree_add(&tree, &node, less), if arg is already in tree, then
'node' isn't really an owning ref, and we need to tag it as non-owning,
and program then won't need to bpf_obj_drop it before exiting. If node
wasn't already in tree and rbtree_add actually added it, 'node' would
also be tagged as non-owning, since tree now owns it. 

Do we need some way to indicate whether 'already in tree' case happened?
If so, would need to change retval from void to bool or struct bpf_rb_node *.

Above scenario is only case where bpf_rbtree_add fails and returns
NULL / false. 

>> But, as you and Kumar talked about in the past and referenced in patch 1's
>> thread, non-owning refs may alias each other, or an owning ref, and have no
>> way of knowing whether this is the case. So if X and Y are two non-owning refs
>> that alias each other, and bpf_rbtree_remove(tree, X) is called, a subsequent
>> call to bpf_rbtree_remove(tree, Y) would be removing node from tree which
>> already isn't in any tree (since prog has an owning ref to it). But verifier
>> doesn't know X and Y alias each other. So previous paragraph's "forego
>> runtime checks" statement can only hold if we invalidate all non-owning refs
>> after 'destructive' rbtree_remove operation.
> 
> right. we either invalidate all non-owning after bpf_rbtree_remove
> or do run-time check in bpf_rbtree_remove.
> Consider the following:
> bpf_spin_lock
> n = bpf_rbtree_first(root);
> m = bpf_rbtree_first(root);
> x = bpf_rbtree_remove(root, n)
> y = bpf_rbtree_remove(root, m)
> bpf_spin_unlock
> if (x)
>    bpf_obj_drop(x)
> if (y)
>    bpf_obj_drop(y)
> 
> If we invalidate after bpf_rbtree_remove() the above will be rejected by the verifier.
> If we do run-time check the above will be accepted and will work without crashing.
> 

Agreed, although the above example's invalid double-remove of same node is
the kind of thing I'd like to be prevented at verification time instead of
runtime. Regardless, continuing with your runtime check idea.

> The problem with release_on_unlock is that it marks 'n' after 1st remove
> as UNTRUSTED which means 'no write' and 'read via probe_read'.
> That's not good imo.
>

Based on your response to paragraph below this one, I think we're in agreement
that using PTR_UNTRUSTED for non-owning ref gives non-owning ref bunch of traits
it doesn't need, when I just wanted "can't pass ownership". So agreed that
PTR_UNTRUSTED is too blunt an instrument here.

Regarding "marks 'n' after 1st remove", the series isn't currently doing this,
I proposed it as a way to prevent aliasing problem, but I think your proposal
is explicitly not trying to prevent aliasing problem at verification time. So
for your semantics we would only have non-owning cleanup after spin_unlock.
And such cleanup might just mark refs PTR_UNTRUSTED instead of invalidating
entirely.

>>
>> It doesn't matter to me which combination of type flags, ref_obj_id, other
>> reg state stuff, and special-casing is used to implement owning and non-owning
>> refs. Specific ones chosen in this series for rbtree node:
>>
>> owning ref: PTR_TO_BTF_ID | MEM_ALLOC (w/ type that contains bpf_rb_node)
>>             ref_obj_id > 0
>>
>> non-owning ref: PTR_TO_BTF_ID | MEM_ALLOC (w/ type that contains bpf_rb_node)
>>                 PTR_UNTRUSTED
>>                   - used for "can't pass ownership", not PROBE_MEM
>>                   - this is why I mentioned "decomposing UNTRUSTED into more
>>                     granular reg traits" in another thread
> 
> Now I understand, but that was very hard to grasp.
> UNTRUSTED means 'no write' and 'read via probe_read'.
> ref_set_release_on_unlock() also keeps ref_obj_id > 0 as you're correctly
> pointing out below:
>>                 ref_obj_id > 0
>>                 release_on_unlock = true
>>                   - used due to paragraphs starting with 2) above                
> 
> but the problem with ref_set_release_on_unlock() is that it mixes real ref-d
> pointers with ref_obj_id > 0 with UNTRUSTED && ref_obj_id > 0.
> And the latter is a quite confusing combination in my mind,
> since we consider everything with ref_obj_id > 0 as good for KF_TRUSTED_ARGS.
> 

I think I understand your desire to get rid of release_on_unlock now. It's not
due to disliking the concept of "clean up non-owning refs after spin_unlock",
which you earlier agreed was necessary, but rather the specifics of
the release_on_unlock mechanism used to achieve this.

If so, I think I agree with your reasoning for why the mechanism is bad in
light of how you want owning/non-owning implemented. To summarize your
statements about release_on_unlock mechanism from the rest of your reply:

  * 'ref_obj_id > 0' already has a specific meaning wrt. is_trusted_reg,
    and we may want to support both TRUSTED and UNTRUSTED non-owning refs

    * My comment: Currently is_trusted_reg is only used for
      KF_ARG_PTR_TO_BTF_ID, while rbtree and list types are assigned special
      KF_ARGs. So hypothetically could have different 'is_trusted_reg' logic.
      I don't actually think that's a good idea, though, especially since
      rbtree / list types are really specializations of PTR_TO_BTF_ID anyways.
      So agreed.

  * Instead of using 'acquire' and (modified) 'release', we can achieve
    "clean-up non-owning after spin_unlock" by associating non-owning
    refs with active_lock.id when they're created. We can store this in
    reg.id, which is currently unused for PTR_TO_BTF_ID (afaict).

    * This will solve issue raised by previous point, allowing us to have
      non-owning refs which are truly 'untrusted' according to is_trusted_reg.

    * My comment: This all sounds reasonable. On spin_unlock we have
      active_lock.id, so can do bpf_for_each_reg_in_vstate to look for
      PTR_TO_BTF_IDs matching the id and do 'cleanup' for them.

>> Any other combination of type and reg state that gives me the semantics def'd
>> above works4me.
>>
>>
>> Based on this reply and others from today, I think you're saying that these
>> concepts should be implemented using:
>>
>> owning ref: PTR_TO_BTF_ID | MEM_ALLOC (w/ rb_node type)
>>             PTR_TRUSTED
>>             ref_obj_id > 0
> 
> Almost.
> I propose:
> PTR_TO_BTF_ID | MEM_ALLOC  && ref_obj_id > 0
> 
> See the definition of is_trusted_reg().
> It's ref_obj_id > 0 || flag == (MEM_ALLOC | PTR_TRUSTED)
> 
> I was saying 'trusted' because of is_trusted_reg() definition.
> Sorry for confusion.
>

I see. Sounds reasonable.

>> non-owning ref: PTR_TO_BTF_ID | MEM_ALLOC (w/ rb_node type)
>>                 PTR_TRUSTED
>>                 ref_obj_id == 0
>>                  - used for "can't pass ownership", since funcs that expect
>>                    owning ref need ref_obj_id > 0
> 
> I propose:
> PTR_TO_BTF_ID | MEM_ALLOC && ref_obj_id == 0
> 

Also sounds reasonable, perhaps with the addition of id > 0 to account for
your desired changes to release_on_unlock mechanism?

> Both 'owning' and 'non-owning' will fit for KF_TRUSTED_ARGS kfuncs.
> 
> And we will be able to pass 'non-owning' under spin_lock into other kfuncs
> and owning outside of spin_lock into other kfuncs.
> Which is a good thing.
> 

Allowing passing of owning ref outside of spin_lock sounds reasonable to me.
'non-owning' under spinlock will have the same "what if this touches __key"
issue I brought up in another thread. But you mentioned not preventing that
and I don't necessarily disagree, so just noting here.

>> And you're also adding 'untrusted' here, mainly as a result of
>> bpf_rbtree_add(tree, node) - 'node' becoming untrusted after it's added,
>> instead of becoming a non-owning ref. 'untrusted' would have state like:
>>
>> PTR_TO_BTF_ID | MEM_ALLOC (w/ rb_node type)
>> PTR_UNTRUSTED
>> ref_obj_id == 0?
> 
> I'm not sure whether we really need full untrusted after going through bpf_rbtree_add()
> or doing 'non-owning' is enough.
> If it's full untrusted it will be:
> PTR_TO_BTF_ID | PTR_UNTRUSTED && ref_obj_id == 0
> 

Yeah, I don't see what this "full untrusted" is giving us either. Let's have
"cleanup non-owning refs on spin_unlock" just invalidate the regs for now,
instead of converting to "full untrusted"?

Adding "full untrusted" later won't make any valid programs written with
"just invalidate the regs" in mind fail the verifier. So painless to add later.

> tbh I don't remember why we even have 'MEM_ALLOC | PTR_UNTRUSTED'.
> 

I think such type combo was only added to implement non-owning refs. If it's
rewritten to use your type combos I don't think there'll be any uses of
MEM_ALLOC | PTR_UNTRUSTED remaining.

>> I think your "non-owning ref" definition also differs from mine, specifically
>> yours doesn't seem to have "will not page fault". For this reason, you don't
>> see the need for release_on_unlock logic, since that's used to prevent refs
>> escaping critical section and potentially referring to free'd memory.
> 
> Not quite.
> We should be able to read/write directly through
> PTR_TO_BTF_ID | MEM_ALLOC && ref_obj_id == 0
> and we need to convert it to __mark_reg_unknown() after bpf_spin_unlock
> the way release_reference() is doing.
> I'm just not happy with using acquire_reference/release_reference() logic
> (as release_on_unlock is doing) for cleaning after unlock.
> Since we need to clean 'non-owning' ptrs in unlock it's confusing
> to call the process 'release'.
> I was hoping we can search through all states and __mark_reg_unknown() (or UNTRUSTED)
> every reg where 
> reg->id == cur_state->active_lock.id &&
> flag == PTR_TO_BTF_ID | MEM_ALLOC && ref_obj_id == 0
> 
> By deleting relase_on_unlock I meant delete release_on_unlock flag
> and remove ref_set_release_on_unlock.
> 

Summarized above, but: agreed, and thanks for clarifying what you meant by 
"delete release_on_unlock".

>> This is where I start to get confused. Some questions:
>>
>>   * If we get rid of release_on_unlock, and with mass invalidation of
>>     non-owning refs entirely, shouldn't non-owning refs be marked PTR_UNTRUSTED?
> 
> Since we'll be cleaning all
> PTR_TO_BTF_ID | MEM_ALLOC && ref_obj_id == 0
> it shouldn't affect ptrs with ref_obj_id > 0 that came from bpf_obj_new.
> 
> The verifier already enforces that bpf_spin_unlock will be present
> at the right place in bpf prog.
> When the verifier sees it it will clean all non-owning refs with this spinlock 'id'.
> So no concerns of leaking 'non-owning' outside.
> 

Sounds like we don't want "full untrusted" or any PTR_UNTRUSTED non-owning ref.

> While processing bpf_rbtree_first we need to:
> regs[BPF_REG_0].type = PTR_TO_BTF_ID | MEM_ALLOC;
> regs[BPF_REG_0].id = active_lock.id;
> regs[BPF_REG_0].ref_obj_id = 0;
> 

Agreed.

>>   * Since refs can alias each other, how to deal with bpf_obj_drop-and-reuse
>>     in this scheme, since non-owning ref can escape spin_unlock b/c no mass
>>     invalidation? PTR_UNTRUSTED isn't sufficient here
> 
> run-time check in bpf_rbtree_remove (and in the future bpf_list_remove)
> should address it, no?
> 

If we don't do "full untrusted" and cleanup non-owning refs by invalidating,
_and_ don't allow bpf_obj_{new,drop} in critical section, then I don't think
this is an issue.

But to elaborate on the issue, if we instead cleaned up non-owning by marking 
untrusted:

struct node_data *n = bpf_obj_new(typeof(*n));
struct node_data *m, *o;
struct some_other_type *t;

bpf_spin_lock(&lock);

bpf_rbtree_add(&tree, n);
m = bpf_rbtree_first(&tree);
o = bpf_rbtree_first(&tree); // m and o are non-owning, point to same node

m = bpf_rbtree_remove(&tree, m); // m is owning

bpf_spin_unlock(&lock); // o is "full untrusted", marked PTR_UNTRUSTED

bpf_obj_drop(m);
t = bpf_obj_new(typeof(*t)); // pretend that exact chunk of memory that was
                             // dropped in previous statement is returned here

data = o->some_data_field;   // PROBE_MEM, but no page fault, so load will
                             // succeed, but will read garbage from another type
                             // while verifier thinks it's reading from node_data


If we clean up by invalidating, but eventually enable bpf_obj_{new,drop} inside
critical section, we'll have similar issue.

It's not necessarily "crash the kernel" dangerous, but it may anger program
writers since they can't be sure they're not reading garbage in this scenario.

>>   * If non-owning ref can live past spin_unlock, do we expect read from
>>     such ref after _unlock to go through bpf_probe_read()? Otherwise direct
>>     read might fault and silently write 0.
> 
> unlock has to clean them.
> 

Ack.

>>   * For your 'untrusted', but not non-owning ref concept, I'm not sure
>>     what this gives us that's better than just invalidating the ref which
>>     gets in this state (rbtree_{add,remove} 'node' arg, bpf_obj_drop node)
> 
> Whether to mark unknown or untrusted or non-owning after bpf_rbtree_add() is a difficult one.
> Untrusted will allow prog to do read only access (via probe_read) into the node
> but might hide bugs.
> The cleanup after bpf_spin_unlock of non-owning and clean up after
> bpf_rbtree_add() does not have to be the same.

This is a good point.

> Currently I'm leaning towards PTR_UNTRUSTED for cleanup after bpf_spin_unlock
> and non-owning after bpf_rbtree_add.
> 
> Walking the example from previous email:
> 
> struct bpf_rbtree_iter it;
> struct bpf_rb_node * node;
> struct bpf_rb_node *n, *m;
> 
> bpf_rbtree_iter_init(&it, rb_root); // locks the rbtree; works as bpf_spin_lock
> while ((node = bpf_rbtree_iter_next(&it))) {
>   // node -> PTR_TO_BTF_ID | MEM_ALLOC | MAYBE_NULL && ref_obj_id == 0
>   if (node && node->field == condition) {
> 
>     n = bpf_rbtree_remove(rb_root, node);
>     if (!n) ...;
>     // n -> PTR_TO_BTF_ID | MEM_ALLOC && ref_obj_id == X
>     m = bpf_rbtree_remove(rb_root, node); // ok, but fails at run time
>     if (!m) ...;
>     // m -> PTR_TO_BTF_ID | MEM_ALLOC && ref_obj_id == Y
> 
>     // node is still:
>     // node -> PTR_TO_BTF_ID | MEM_ALLOC && ref_obj_id == 0 && id == active_lock[0].id
> 
>     // assume we allow double locks one day
>     bpf_spin_lock(another_rb_root);
>     bpf_rbtree_add(another_rb_root, n);
>     // n -> PTR_TO_BTF_ID | MEM_ALLOC && ref_obj_id == 0 && id == active_lock[1].id
>     bpf_spin_unlock(another_rb_root);
>     // n -> PTR_TO_BTF_ID | PTR_UNTRUSTED && ref_obj_id == 0
>     break;
>   }
> }
> // node -> PTR_TO_BTF_ID | MEM_ALLOC && ref_obj_id == 0 && id == active_lock[0].id
> bpf_rbtree_iter_destroy(&it); // does unlock
> // node -> PTR_TO_BTF_ID | PTR_UNTRUSTED
> // n -> PTR_TO_BTF_ID | PTR_UNTRUSTED
> // m -> PTR_TO_BTF_ID | MEM_ALLOC && ref_obj_id == Y
> bpf_obj_drop(m);

This seems like a departure from other statements in your reply, where you're
leaning towards "non-owning and trusted" -> "full untrusted" after unlock
being unnecessary. I think the combo of reference aliases +
bpf_obj_drop-and-reuse makes everything hard to reason about.

Regardless, your comments annotating reg state look correct to me.

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH bpf-next 10/13] bpf, x86: BPF_PROBE_MEM handling for insn->off < 0
  2022-12-08  0:47           ` Alexei Starovoitov
@ 2022-12-08  8:50             ` Dave Marchevsky
  0 siblings, 0 replies; 51+ messages in thread
From: Dave Marchevsky @ 2022-12-08  8:50 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Dave Marchevsky, bpf, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, Kernel Team, Kumar Kartikeya Dwivedi, Tejun Heo

On 12/7/22 7:47 PM, Alexei Starovoitov wrote:
> On Wed, Dec 07, 2022 at 06:39:38PM -0500, Dave Marchevsky wrote:
>>>>
>>>> 0000000000000000 <less>:
>>>> ;       return node_a->key < node_b->key;
>>>>        0:       79 22 f0 ff 00 00 00 00 r2 = *(u64 *)(r2 - 0x10)
>>>>        1:       79 11 f0 ff 00 00 00 00 r1 = *(u64 *)(r1 - 0x10)
>>>>        2:       b4 00 00 00 01 00 00 00 w0 = 0x1
>>>> ;       return node_a->key < node_b->key;
>>>
>>> I see. That's the same bug.
>>> The args to callback should have been PTR_TO_BTF_ID | PTR_TRUSTED with 
>>> correct positive offset.
>>> Then node_a = container_of(a, struct node_data, node);
>>> would have produced correct offset into proper btf_id.
>>>
>>> The verifier should be passing into less() the btf_id
>>> of struct node_data instead of btf_id of struct bpf_rb_node.
>>>
>>
>> The verifier is already passing the struct node_data type, not bpf_rb_node.
>> For less() args, and rbtree_{first,remove} retval, mark_reg_datastructure_node
>> - added in patch 8 - is doing as you describe.
>>
>> Verifier sees less' arg regs as R=ptr_to_node_data(off=16). If it was
>> instead passing R=ptr_to_bpf_rb_node(off=0), attempting to access *(reg - 0x10)
>> would cause verifier err.
> 
> Ahh. I finally got it :)
> Please put these details in the commit log when you respin.
> 

Glad it finally started making sense.
Will do big improvement of patch summary after addressing other
feedback from this series.

>>>>        3:       cd 21 01 00 00 00 00 00 if r1 s< r2 goto +0x1 <LBB2_2>
>>>>        4:       b4 00 00 00 00 00 00 00 w0 = 0x0
>>>>
>>>> 0000000000000028 <LBB2_2>:
>>>> ;       return node_a->key < node_b->key;
>>>>        5:       95 00 00 00 00 00 00 00 exit
>>>>
>>>> Insns 0 and 1 are loading node_b->key and node_a->key, respectively, using
>>>> negative insn->off. Verifier's view of R1 and R2 before insn 0 is
>>>> untrusted_ptr_node_data(off=16). If there were some intermediate insns
>>>> storing result of container_of() before dereferencing:
>>>>
>>>>   r3 = (r2 - 0x10)
>>>>   r2 = *(u64 *)(r3)
>>>>
>>>> Verifier would see R3 as untrusted_ptr_node_data(off=0), and load for
>>>> r2 would have insn->off = 0. But LLVM decides to just do a load-with-offset
>>>> using original arg ptrs to less() instead of storing container_of() ptr
>>>> adjustments.
>>>>
>>>> Since the container_of usage and code pattern in the above example's less()
>>>> aren't particularly specific to this series, I think there are other scenarios
>>>> where such code would be generated, so I considered this a general bugfix in
>>>> the cover letter.
>>>
>>> imo the negative offset looks specific to two misuses of PTR_UNTRUSTED in this set.
>>>
>>
>> If I used PTR_TRUSTED here, the JITted instructions would still do a load like
>> r2 = *(u64 *)(r2 - 0x10). There would just be no BPF_PROBE_MEM runtime checking
>> insns generated, avoiding negative insn issue there. But the negative insn->off
>> load being generated is not specific to PTR_UNTRUSTED.
> 
> yep.
> 
>>>
>>> Exactly. More flags will only increase the confusion.
>>> Please try to make callback args as proper PTR_TRUSTED and disallow calling specific
>>> rbtree kfuncs while inside this particular callback to prevent recursion.
>>> That would solve all these issues, no?
>>> Writing into such PTR_TRUSTED should be still allowed inside cb though it's bogus.
>>>
>>> Consider less() receiving btf_id ptr_trusted of struct node_data and it contains
>>> both link list and rbtree.
>>> It should still be safe to operate on link list part of that node from less()
>>> though it's not something we would ever recommend.
>>
>> I definitely want to allow writes on non-owning references. In order to properly
>> support this, there needs to be a way to designate a field as a "key":
>>
>> struct node_data {
>>   long key __key;
>>   long data;
>>   struct bpf_rb_node node;
>> };
>>
>> or perhaps on the rb_root via __contains or separate tag:
>>
>> struct bpf_rb_root groot __contains(struct node_data, node, key);
>>
>> This is necessary because rbtree's less() uses key field to determine order, so
>> we don't want to allow write to the key field when the node is in a rbtree. If
>> such a write were possible the rbtree could easily be placed in an invalid state
>> since the new key may mean that the rbtree is no longer sorted. Subsequent add()
>> operations would compare less() using the new key, so other nodes will be placed
>> in wrong spot as well.
>>
>> Since PTR_UNTRUSTED currently allows read but not write, and prevents use of
>> non-owning ref as kfunc arg, it seemed to be reasonable tag for less() args.
>>
>> I was planning on adding __key / non-owning-ref write support as a followup, but
>> adding it as part of this series will probably save a lot of back-and-forth.
>> Will try to add it.
> 
> Just key mark might not be enough. less() could be doing all sort of complex
> logic on more than one field and even global fields.
> But what is the concern with writing into 'key' ?
> The rbtree will not be sorted. find/add operation will not be correct,
> but nothing will crash. At the end bpf_rb_root_free() will walk all
> unsorted nodes anyway and free them all.
> Even if we pass PTR_TRUSTED | MEM_RDONLY pointers into less() the less()
> can still do nonsensical things like returning random true/false.
> Doesn't look like an issue to me.

Agreed re: complex logic + global fields, less() being able to do nonsensical
things, and writing to key not crashing anything even if it breaks the tree.

OK, let's forget about __key. In next version of the series non-owning refs
will be write-able. Can add more protection in the future if it's deemed
necessary. Since this means non-owning refs won't be PTR_UNTRUSTED anymore,
I can split this patch out from the rest of the series after confirming that
it isn't necessary to ship rbtree.

Still want to convince you that the skipping of a check is correct before
I page out the details, but less urgent now. IIUC although the cause of the
issue is clear now, you'd still like me to clarify the details of the solution.


* Re: [PATCH bpf-next 00/13] BPF rbtree next-gen datastructure
  2022-12-08  8:28           ` Dave Marchevsky
@ 2022-12-08 12:57             ` Kumar Kartikeya Dwivedi
  2022-12-08 20:36               ` Alexei Starovoitov
  0 siblings, 1 reply; 51+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2022-12-08 12:57 UTC (permalink / raw)
  To: Dave Marchevsky
  Cc: Alexei Starovoitov, Dave Marchevsky, bpf, Alexei Starovoitov,
	Daniel Borkmann, Andrii Nakryiko, Kernel Team, Tejun Heo

On Thu, Dec 08, 2022 at 01:58:44PM IST, Dave Marchevsky wrote:
> On 12/7/22 10:51 PM, Alexei Starovoitov wrote:
> > On Wed, Dec 07, 2022 at 08:18:25PM -0500, Dave Marchevsky wrote:
> >>
> >> Before replying to specific things in this email, I think it would be useful
> >> to have a subthread clearing up definitions and semantics, as I think we're
> >> talking past each other a bit.
> >
> > Yeah. We were not on the same page.
> > The concepts of 'owning ref' and 'non-owning ref' appeared 'new' to me.
> > I remember discussing 'conditional release' and OBJ_NON_OWNING_REF long ago
> > and I thought we agreed that both are not necessary and with that
> > I assumed that anything 'non-owning' as a concept is gone too.
> > So the only thing left (in my mind) was the 'owning' concept.
> > Which I mapped as ref_obj_id > 0. In other words 'owning' meant 'acquired'.
> >
>
> Whereas in my mind the release_on_unlock logic was specifically added to
> implement the mass invalidation part of non-owning reference semantics, and it
> being accepted implied that we weren't getting rid of the concept :).
>
> > Please have this detailed explanation in the commit log next time to
> > avoid this back and forth.
> > Now to the fun part...
> >
>
> I will add a documentation commit explaining 'owning' and 'non-owning' ref
> as they pertain to these datastructures, after we agree about the semantics.
>
> Speaking of which, although I have a few questions / clarifications, I think
> we're more in agreement after your reply. After one more round of clarification
> I will summarize conclusions to see if we agree on enough to move forward.
>
> >>
> >> On a conceptual level I've still been using "owning reference" and "non-owning
> >> reference" to understand rbtree operations. I'll use those here and try to map
> >> them to actual verifier concepts later.
> >>
> >> owning reference
> >>
> >>   * This reference controls the lifetime of the pointee
> >>   * Ownership of pointee must be 'released' by passing it to some rbtree
> >>     API kfunc - rbtree_add in our case -  or via bpf_obj_drop, which free's
> >>     * If not released before program ends, verifier considers prog invalid
> >>   * Access to the memory ref is pointing at will not page fault
> >
> > agree.
> >
> >> non-owning reference
> >>
> >>   * No ownership of pointee so can't pass ownership via rbtree_add, not allowed
> >>     to bpf_obj_drop
> >>   * No control of lifetime, but can infer memory safety based on context
> >>     (see explanation below)
> >>   * Access to the memory ref is pointing at will not page fault
> >>     (see explanation below)
> >
> > agree with addition that both read and write should be allowed into this
> > 'non-owning' ptr.
> > Which breaks if you map this to something that ORs with PTR_UNTRUSTED.
> >
>
> Agree re: read/write allowed. PTR_UNTRUSTED was an implementation detail.
> Sounds like we agree on general purpose of owning, non-owning. Looks like
> we're in agreement about above semantics.
>

Yes, PTR_UNTRUSTED is not appropriate for this. My opposition was also more to
the idea of mapping PTR_UNTRUSTED to non-owning references.
If we do PTR_TO_BTF_ID | MEM_ALLOC for them with ref_obj_id == 0, it SGTM.

> >> 2) From verifier's perspective non-owning references can only exist
> >> between spin_lock and spin_unlock. Why? After spin_unlock another program
> >> can do arbitrary operations on the rbtree like removing and free-ing
> >> via bpf_obj_drop. A non-owning ref to some chunk of memory that was remove'd,
> >> free'd, and reused via bpf_obj_new would point to an entirely different thing.
> >> Or the memory could go away.
> >
> > agree that spin_unlock needs to clean up 'non-owning'.
>
> Another point of agreement.
>

+1

> >
> >> To prevent this logic violation all non-owning references are invalidated by
> >> verifier after critical section ends. This is necessary to ensure "will
> >> not page fault" property of non-owning reference. So if verifier hasn't
> >> invalidated a non-owning ref, accessing it will not page fault.
> >>
> >> Currently bpf_obj_drop is not allowed in the critical section, so similarly,
> >> if there's a valid non-owning ref, we must be in critical section, and can
> >> conclude that the ref's memory hasn't been dropped-and-free'd or dropped-
> >> and-reused.
> >
> > I don't understand why is that a problem.
> >
> >> 1) Any reference to a node that is in a rbtree _must_ be non-owning, since
> >> the tree has control of pointee lifetime. Similarly, any ref to a node
> >> that isn't in rbtree _must_ be owning. (let's ignore raw read from kptr_xchg'd
> >> node in map_val for now)

The last case is going to be marked PTR_UNTRUSTED.

> >
> > Also not clear why such restriction is necessary.
> >
>
> If we have this restriction and bpf_rbtree_release also mass invalidates
> non-owning refs, the type system will ensure that only nodes that are in a tree
> will be passed to bpf_rbtree_release, and we can avoid the runtime check.
>

I like this property. This was also how I proposed implementing it for lists.
e.g. Any bpf_list_del would invalidate the result of prior bpf_list_first_entry
and bpf_list_last_entry to ensure safety.

It's a bit similar to aliasing XOR mutability guarantees that Rust has. We're
trying to implement a simple borrow checking mechanism.

Once the collection is mutated, any prior non-owning references become
invalidated. It can be further refined (e.g. bpf_rbtree_add won't do
invalidation on mutation) based on the properties of the data structure.

> But below you mention preferring the runtime check, mostly noting here to
> refer back when continuing reply below.
>
> >> Moving on to rbtree API:
> >>
> >> bpf_rbtree_add(&tree, &node);
> >>   'node' is an owning ref, becomes a non-owning ref.
> >>
> >> bpf_rbtree_first(&tree);
> >>   retval is a non-owning ref, since first() node is still in tree
> >>
> >> bpf_rbtree_remove(&tree, &node);
> >>   'node' is a non-owning ref, retval is an owning ref
> >
> > agree on the above definition.
> > >> All of the above can only be called when rbtree's lock is held, so invalidation
> >> of all non-owning refs on spin_unlock is fine for rbtree_remove.
> >>
> >> Nice property of paragraph marked with 1) above is the ability to use the
> >> type system to prevent rbtree_add of node that's already in rbtree and
> >> rbtree_remove of node that's not in one. So we can forego runtime
> >> checking of "already in tree", "already not in tree".
> >
> > I think it's easier to add runtime check inside bpf_rbtree_remove()
> > since it already returns MAYBE_NULL. No 'conditional release' necessary.
> > And with that we don't need to worry about aliases.
> >
>
> To clarify: You're proposing that we don't worry about solving the aliasing
> problem at verification time. Instead rbtree_{add,remove} will deal with it
> at runtime. Corollary of this is that my restriction tagged 1) above ("ref
> to node in tree _must_ be non-owning, to node not in tree must be owning")
> isn't something we're guaranteeing, due to possibility of aliasing.
>
> So bpf_rbtree_remove might get a node that's not in tree, and
> bpf_rbtree_add might get a node that's already in tree. Runtime behavior
> of both should be 'nop'.
>
>
> If that is an accurate restatement of your proposal, the verifier
> logic will need to be changed:
>
> For bpf_rbtree_remove(&tree, &node), if node is already not in a tree,
> retval will be NULL, effectively not acquiring an owning ref due to
> mark_ptr_or_null_reg's logic.
>
> In this case, do we want to invalidate
> arg 'node' as well? Or just leave it as a non-owning ref that points
> to node not in tree? I think the latter requires fewer verifier changes,
> but can see the argument for the former if we want restriction 1) to
> mostly be true, unless aliasing.
>
> The above scenario is the only case where bpf_rbtree_remove fails and
> returns NULL.
>
> (In this series it can fail and RET_NULL for this reason, but my earlier comment
> about type system + invalidate all-non owning after remove as discussed below
> was my original intent. So I shouldn't have been allowing RET_NULL for my
> version of these semantics.)
>

I agree with Dave to rely on the invariant that non-owning refs to nodes are
part of the collection. Then bpf_rbtree_remove is simply KF_ACQUIRE.

>
> For bpf_rbtree_add(&tree, &node, less), if arg is already in tree, then
> 'node' isn't really an owning ref, and we need to tag it as non-owning,
> and program then won't need to bpf_obj_drop it before exiting. If node
> wasn't already in tree and rbtree_add actually added it, 'node' would
> also be tagged as non-owning, since tree now owns it.
>
> Do we need some way to indicate whether 'already in tree' case happened?
> If so, would need to change retval from void to bool or struct bpf_rb_node *.
>
> The above scenario is the only case where bpf_rbtree_add fails and returns
> NULL / false.
>

Why should we allow a node that is not acquired to be passed to bpf_rbtree_add?

> >> But, as you and Kumar talked about in the past and referenced in patch 1's
> >> thread, non-owning refs may alias each other, or an owning ref, and have no
> >> way of knowing whether this is the case. So if X and Y are two non-owning refs
> >> that alias each other, and bpf_rbtree_remove(tree, X) is called, a subsequent
> >> call to bpf_rbtree_remove(tree, Y) would be removing node from tree which
> >> already isn't in any tree (since prog has an owning ref to it). But verifier
> >> doesn't know X and Y alias each other. So previous paragraph's "forego
> >> runtime checks" statement can only hold if we invalidate all non-owning refs
> >> after 'destructive' rbtree_remove operation.
> >
> > right. we either invalidate all non-owning after bpf_rbtree_remove
> > or do run-time check in bpf_rbtree_remove.
> > Consider the following:
> > bpf_spin_lock
> > n = bpf_rbtree_first(root);
> > m = bpf_rbtree_first(root);
> > x = bpf_rbtree_remove(root, n)
> > y = bpf_rbtree_remove(root, m)
> > bpf_spin_unlock
> > if (x)
> >    bpf_obj_drop(x)
> > if (y)
> >    bpf_obj_drop(y)
> >
> > If we invalidate after bpf_rbtree_remove() the above will be rejected by the verifier.
> > If we do run-time check the above will be accepted and will work without crashing.
> >
>
> Agreed, although the above example's invalid double-remove of same node is
> the kind of thing I'd like to be prevented at verification time instead of
> runtime. Regardless, continuing with your runtime check idea.
>

I agree with Dave, it seems better to invalidate non-owning refs after first
remove rather than allowing this to work.

> > The problem with release_on_unlock is that it marks 'n' after 1st remove
> > as UNTRUSTED which means 'no write' and 'read via probe_read'.
> > That's not good imo.
> >
>
> Based on your response to paragraph below this one, I think we're in agreement
> that using PTR_UNTRUSTED for non-owning ref gives non-owning ref bunch of traits
> it doesn't need, when I just wanted "can't pass ownership". So agreed that
> PTR_UNTRUSTED is too blunt an instrument here.
>

I think this is the part of the confusion which has left me wondering so far.
The discussion in this thread is making things more clear.

PTR_UNTRUSTED was never meant to be the kind of non-owning reference you want to
be returned from bpf_rbtree_first. PTR_TO_BTF_ID | MEM_ALLOC with ref_obj_id == 0
is the right choice.

> Regarding "marks 'n' after 1st remove", the series isn't currently doing this,
> I proposed it as a way to prevent aliasing problem, but I think your proposal
> is explicitly not trying to prevent aliasing problem at verification time. So
> for your semantics we would only have non-owning cleanup after spin_unlock.
> And such cleanup might just mark refs PTR_UNTRUSTED instead of invalidating
> entirely.
>

I would prefer proper invalidation using mark_reg_unknown.

> >>
> >> It doesn't matter to me which combination of type flags, ref_obj_id, other
> >> reg state stuff, and special-casing is used to implement owning and non-owning
> >> refs. Specific ones chosen in this series for rbtree node:
> >>
> >> owning ref: PTR_TO_BTF_ID | MEM_ALLOC (w/ type that contains bpf_rb_node)
> >>             ref_obj_id > 0
> >>
> >> non-owning ref: PTR_TO_BTF_ID | MEM_ALLOC (w/ type that contains bpf_rb_node)
> >>                 PTR_UNTRUSTED
> >>                   - used for "can't pass ownership", not PROBE_MEM
> >>                   - this is why I mentioned "decomposing UNTRUSTED into more
> >>                     granular reg traits" in another thread
> >
> > Now I understand, but that was very hard to grasp.
> > UNTRUSTED means 'no write' and 'read via probe_read'.
> > ref_set_release_on_unlock() also keeps ref_obj_id > 0 as you're correctly
> > pointing out below:
> >>                 ref_obj_id > 0
> >>                 release_on_unlock = true
> >>                   - used due to paragraphs starting with 2) above
> >
> > but the problem with ref_set_release_on_unlock() that it mixes real ref-d
> > pointers with ref_obj_id > 0 with UNTRUSTED && ref_obj_id > 0.
> > And the latter is a quite confusing combination in my mind,
> > since we consider everything with ref_obj_id > 0 as good for KF_TRUSTED_ARGS.
> >
>
> I think I understand your desire to get rid of release_on_unlock now. It's not
> due to disliking the concept of "clean up non-owning refs after spin_unlock",
> which you earlier agreed was necessary, but rather the specifics of
> release_on_unlock mechanism used to achieve this.
>
> If so, I think I agree with your reasoning for why the mechanism is bad in
> light of how you want owning/non-owning implemented. To summarize your
> statements about release_on_unlock mechanism from the rest of your reply:
>
>   * 'ref_obj_id > 0' already has a specific meaning wrt. is_trusted_reg,
>     and we may want to support both TRUSTED and UNTRUSTED non-owning refs
>
>     * My comment: Currently is_trusted_reg is only used for
>       KF_ARG_PTR_TO_BTF_ID, while rbtree and list types are assigned special
>       KF_ARGs. So hypothetically could have different 'is_trusted_reg' logic.
>       I don't actually think that's a good idea, though, especially since
>       rbtree / list types are really specializations of PTR_TO_BTF_ID anyways.
>       So agreed.
>
>   * Instead of using 'acquire' and (modified) 'release', we can achieve
>     "clean-up non-owning after spin_unlock" by associating non-owning
>     refs with active_lock.id when they're created. We can store this in
>     reg.id, which is currently unused for PTR_TO_BTF_ID (afaict).
>

I don't mind using active_lock.id for invalidation, but using reg->id to
associate it with the reg is a bad idea IMO. It's already preserved and set when
the object has a bpf_spin_lock in it, and it would allow doing bpf_spin_unlock
with that non-owning ref if it has a spin lock, essentially unlocking a
different spin lock whenever the reg->btf of the already-locked spin lock reg
is the same, due to the same active_lock.id.

Even if you prevent it somehow it's more confusing to overload reg->id again for
this purpose.

It makes more sense to introduce a new, dedicated nonref_obj_id for this
purpose, associating the ref back to the reg->id of the collection it came from.

Also, there are two cases of invalidation, one is on remove from rbtree, which
should only invalidate non-owning references into the rbtree, and one is on
unlock, which should invalidate all non-owning references.

bpf_rbtree_remove shouldn't invalidate non-owning into list protected by same
lock, but unlocking should do it for both rbtree and list non-owning refs it is
protecting.

So it seems you will have to maintain two IDs for non-owning references, one for
the collection it comes from, and one for the lock region it is obtained in.

>     * This will solve issue raised by previous point, allowing us to have
>       non-owning refs which are truly 'untrusted' according to is_trusted_reg.
>
>     * My comment: This all sounds reasonable. On spin_unlock we have
>       active_lock.id, so can do bpf_for_each_reg_in_vstate to look for
>       PTR_TO_BTF_IDs matching the id and do 'cleanup' for them.
>
> >> Any other combination of type and reg state that gives me the semantics def'd
> >> above works4me.
> >>
> >>
> >> Based on this reply and others from today, I think you're saying that these
> >> concepts should be implemented using:
> >>
> >> owning ref: PTR_TO_BTF_ID | MEM_ALLOC (w/ rb_node type)
> >>             PTR_TRUSTED
> >>             ref_obj_id > 0
> >
> > Almost.
> > I propose:
> > PTR_TO_BTF_ID | MEM_ALLOC  && ref_obj_id > 0
> >
> > See the definition of is_trusted_reg().
> > It's ref_obj_id > 0 || flag == (MEM_ALLOC | PTR_TRUSTED)
> >
> > I was saying 'trusted' because of is_trusted_reg() definition.
> > Sorry for confusion.
> >
>
> I see. Sounds reasonable.
>
> >> non-owning ref: PTR_TO_BTF_ID | MEM_ALLOC (w/ rb_node type)
> >>                 PTR_TRUSTED
> >>                 ref_obj_id == 0
> >>                  - used for "can't pass ownership", since funcs that expect
> >>                    owning ref need ref_obj_id > 0
> >
> > I propose:
> > PTR_TO_BTF_ID | MEM_ALLOC && ref_obj_id == 0
> >
>
> Also sounds reasonable, perhaps with the addition of id > 0 to account for
> your desired changes to release_on_unlock mechanism?
>
> > Both 'owning' and 'non-owning' will fit for KF_TRUSTED_ARGS kfuncs.
> >
> > And we will be able to pass 'non-owning' under spin_lock into other kfuncs
> > and owning outside of spin_lock into other kfuncs.
> > Which is a good thing.
> >
>
> Allowing passing of owning ref outside of spin_lock sounds reasonable to me.
> 'non-owning' under spinlock will have the same "what if this touches __key"
> issue I brought up in another thread. But you mentioned not preventing that
> and I don't necessarily disagree, so just noting here.
>

Yeah, I agree with Alexei that writing to the key is a non-issue. The 'less' cb
may not actually do the correct thing at all, so in that sense writing to the
key is a small issue. In any case, violating the 'sorted' property is not
something we should be trying to prevent.
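As a standalone illustration of why a key write only breaks the 'sorted'
property rather than memory safety, here is a plain-C sketch (a toy BST stands
in for bpf_rb_root; all names are hypothetical):

```c
/* Writing to the key of an already-inserted node silently breaks the
 * sorted property: later lookups that rely on the comparator can miss
 * the node, but nothing faults.
 */
#include <assert.h>
#include <stddef.h>

struct node { long key; struct node *left, *right; };

static struct node *bst_insert(struct node *root, struct node *n)
{
	if (!root)
		return n;
	if (n->key < root->key)
		root->left = bst_insert(root->left, n);
	else
		root->right = bst_insert(root->right, n);
	return root;
}

static struct node *bst_find(struct node *root, long key)
{
	while (root && root->key != key)
		root = key < root->key ? root->left : root->right;
	return root;
}
```

The node stays linked and readable; it is merely in the "wrong" place for the
comparator, which is exactly the class of bug left to the program writer.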

> >> And you're also adding 'untrusted' here, mainly as a result of
> >> bpf_rbtree_add(tree, node) - 'node' becoming untrusted after it's added,
> >> instead of becoming a non-owning ref. 'untrusted' would have state like:
> >>
> >> PTR_TO_BTF_ID | MEM_ALLOC (w/ rb_node type)
> >> PTR_UNTRUSTED
> >> ref_obj_id == 0?
> >
> > I'm not sure whether we really need full untrusted after going through bpf_rbtree_add()
> > or doing 'non-owning' is enough.
> > If it's full untrusted it will be:
> > PTR_TO_BTF_ID | PTR_UNTRUSTED && ref_obj_id == 0
> >
>
> Yeah, I don't see what this "full untrusted" is giving us either. Let's have
> "cleanup non-owning refs on spin_unlock" just invalidate the regs for now,
> instead of converting to "full untrusted"?
>

+1, I prefer invalidating completely on unlock.

> Adding "full untrusted" later won't make any valid programs written with
> "just invalidate the regs" in mind fail the verifier. So painless to add later.
>

+1

> > tbh I don't remember why we even have 'MEM_ALLOC | PTR_UNTRUSTED'.
> >

Eventually it will also be used for alloc obj kptr loaded from maps.

>
> I think such type combo was only added to implement non-owning refs. If it's
> rewritten to use your type combos I don't think there'll be any uses of
> MEM_ALLOC | PTR_UNTRUSTED remaining.
>

To be clear I was not intending to use PTR_UNTRUSTED to do such non-owning refs.

> >> I think your "non-owning ref" definition also differs from mine, specifically
> >> yours doesn't seem to have "will not page fault". For this reason, you don't
> >> see the need for release_on_unlock logic, since that's used to prevent refs
> >> escaping critical section and potentially referring to free'd memory.
> >
> > Not quite.
> > We should be able to read/write directly through
> > PTR_TO_BTF_ID | MEM_ALLOC && ref_obj_id == 0
> > and we need to convert it to __mark_reg_unknown() after bpf_spin_unlock
> > the way release_reference() is doing.
> > I'm just not happy with using acquire_reference/release_reference() logic
> > (as release_on_unlock is doing) for cleaning after unlock.
> > Since we need to clean 'non-owning' ptrs in unlock it's confusing
> > to call the process 'release'.
> > I was hoping we can search through all states and __mark_reg_unknown() (or UNTRUSTED)
> > every reg where
> > reg->id == cur_state->active_lock.id &&
> > flag == PTR_TO_BTF_ID | MEM_ALLOC && ref_obj_id == 0
> >
> > By deleting relase_on_unlock I meant delete release_on_unlock flag
> > and remove ref_set_release_on_unlock.
> >
>
> Summarized above, but: agreed, and thanks for clarifying what you meant by
> "delete release_on_unlock".
>
> >> This is where I start to get confused. Some questions:
> >>
> >>   * If we get rid of release_on_unlock, and with mass invalidation of
> >>     non-owning refs entirely, shouldn't non-owning refs be marked PTR_UNTRUSTED?
> >
> > Since we'll be cleaning all
> > PTR_TO_BTF_ID | MEM_ALLOC && ref_obj_id == 0
> > it shouldn't affect ptrs with ref_obj_id > 0 that came from bpf_obj_new.
> >
> > The verifier already enforces that bpf_spin_unlock will be present
> > at the right place in bpf prog.
> > When the verifier sees it it will clean all non-owning refs with this spinlock 'id'.
> > So no concerns of leaking 'non-owning' outside.
> >
>
> Sounds like we don't want "full untrusted" or any PTR_UNTRUSTED non-owning ref.
>
> > While processing bpf_rbtree_first we need to:
> > regs[BPF_REG_0].type = PTR_TO_BTF_ID | MEM_ALLOC;
> > regs[BPF_REG_0].id = active_lock.id;
> > regs[BPF_REG_0].ref_obj_id = 0;
> >
>
> Agreed.
>

I'm a bit concerned about putting active_lock.id in reg->id. I don't object to
the idea, but to the implementation, since we take PTR_TO_BTF_ID | MEM_ALLOC in
bpf_spin_lock/bpf_spin_unlock. It will lead to confusion: currently this exact
reg->type never has reg->ref_obj_id as 0. Maybe that needs to be checked for
those helper calls.

Just thinking out loud, maybe it's fine, but we need to be careful: reg->id
changes meaning when ref_obj_id == 0.

> >>   * Since refs can alias each other, how to deal with bpf_obj_drop-and-reuse
> >>     in this scheme, since non-owning ref can escape spin_unlock b/c no mass
> >>     invalidation? PTR_UNTRUSTED isn't sufficient here
> >
> > run-time check in bpf_rbtree_remove (and in the future bpf_list_remove)
> > should address it, no?
> >
>
> If we don't do "full untrusted" and cleanup non-owning refs by invalidating,
> _and_ don't allow bpf_obj_{new,drop} in critical section, then I don't think
> this is an issue.
>

bpf_obj_drop, if/when enabled, can also do invalidation. But let's table that
discussion until we introduce it. We most likely won't need it inside the CS.

> But to elaborate on the issue, if we instead cleaned up non-owning by marking
> untrusted:
>
> struct node_data *n = bpf_obj_new(typeof(*n));
> struct node_data *m, *o;
> struct some_other_type *t;
>
> bpf_spin_lock(&lock);
>
> bpf_rbtree_add(&tree, n);
> m = bpf_rbtree_first();
> o = bpf_rbtree_first(); // m and o are non-owning, point to same node
>
> m = bpf_rbtree_remove(&tree, m); // m is owning
>
> bpf_spin_unlock(&lock); // o is "full untrusted", marked PTR_UNTRUSTED
>
> bpf_obj_drop(m);
> t = bpf_obj_new(typeof(*t)); // pretend that exact chunk of memory that was
>                              // dropped in previous statement is returned here
>
> data = o->some_data_field;   // PROBE_MEM, but no page fault, so load will
>                              // succeed, but will read garbage from another type
>                              // while verifier thinks it's reading from node_data
>
>
> If we clean up by invalidating, but eventually enable bpf_obj_{new,drop} inside
> critical section, we'll have similar issue.
>
> It's not necessarily "crash the kernel" dangerous, but it may anger program
> writers since they can't be sure they're not reading garbage in this scenario.
>

I think it's better to clean by invalidating. We have better tools to form
untrusted pointers (like bpf_rdonly_cast) now if the BPF program writer needs
such an escape hatch for some reason. It's also easier to review where an
untrusted pointer is being used in a program, and has zero cost at runtime.

> >>   * If non-owning ref can live past spin_unlock, do we expect read from
> >>     such ref after _unlock to go through bpf_probe_read()? Otherwise direct
> >>     read might fault and silently write 0.
> >
> > unlock has to clean them.
> >
>
> Ack.
>
> >>   * For your 'untrusted', but not non-owning ref concept, I'm not sure
> >>     what this gives us that's better than just invalidating the ref which
> >>     gets in this state (rbtree_{add,remove} 'node' arg, bpf_obj_drop node)
> >
> > Whether to mark unknown or untrusted or non-owning after bpf_rbtree_add() is a difficult one.
> > Untrusted will allow prog to do read only access (via probe_read) into the node
> > but might hide bugs.
> > The cleanup after bpf_spin_unlock of non-owning and clean up after
> > bpf_rbtree_add() does not have to be the same.
>
> This is a good point.
>

So far I'm leaning towards:

bpf_rbtree_add(node) : node becomes non-owned ref
bpf_spin_unlock(lock) : node is invalidated
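That lifecycle can be captured as a toy state machine (plain C, purely
illustrative; the function names mirror the kfuncs they model):

```c
/* Toy model of the lifecycle above: bpf_obj_new yields an owning ref,
 * bpf_rbtree_add demotes it to a non-owning ref, and bpf_spin_unlock
 * invalidates every non-owning ref.
 */
#include <assert.h>

enum ref_state { REF_OWNING, REF_NON_OWNING, REF_INVALID };

static enum ref_state obj_new(void)
{
	return REF_OWNING;
}

static enum ref_state rbtree_add(enum ref_state s)
{
	/* only an owning ref can pass ownership into the tree */
	assert(s == REF_OWNING);
	return REF_NON_OWNING;
}

static enum ref_state spin_unlock(enum ref_state s)
{
	/* any non-owning ref is clobbered when the lock is dropped */
	return s == REF_NON_OWNING ? REF_INVALID : s;
}
```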

> > Currently I'm leaning towards PTR_UNTRUSTED for cleanup after bpf_spin_unlock
> > and non-owning after bpf_rbtree_add.
> >
> > Walking the example from previous email:
> >
> > struct bpf_rbtree_iter it;
> > struct bpf_rb_node * node;
> > struct bpf_rb_node *n, *m;
> >
> > bpf_rbtree_iter_init(&it, rb_root); // locks the rbtree works as bpf_spin_lock
> > while ((node = bpf_rbtree_iter_next(&it)) {
> >   // node -> PTR_TO_BTF_ID | MEM_ALLOC | MAYBE_NULL && ref_obj_id == 0
> >   if (node && node->field == condition) {
> >
> >     n = bpf_rbtree_remove(rb_root, node);
> >     if (!n) ...;
> >     // n -> PTR_TO_BTF_ID | MEM_ALLOC && ref_obj_id == X
> >     m = bpf_rbtree_remove(rb_root, node); // ok, but fails in run-time
> >     if (!m) ...;
> >     // m -> PTR_TO_BTF_ID | MEM_ALLOC && ref_obj_id == Y
> >

This second remove I would simply disallow during verification, as Dave is
suggesting, by invalidating non-owning refs for rb_root.

> >     // node is still:
> >     // node -> PTR_TO_BTF_ID | MEM_ALLOC && ref_obj_id == 0 && id == active_lock[0].id
> >
> >     // assume we allow double locks one day
> >     bpf_spin_lock(another_rb_root);
> >     bpf_rbtree_add(another_rb_root, n);
> >     // n -> PTR_TO_BTF_ID | MEM_ALLOC && ref_obj_id == 0 && id == active_lock[1].id
> >     bpf_spin_unlock(another_rb_root);
> >     // n -> PTR_TO_BTF_ID | PTR_UNTRUSTED && ref_obj_id == 0
> >     break;
> >   }
> > }
> > // node -> PTR_TO_BTF_ID | MEM_ALLOC && ref_obj_id == 0 && id == active_lock[0].id
> > bpf_rbtree_iter_destroy(&it); // does unlock
> > // node -> PTR_TO_BTF_ID | PTR_UNTRUSTED
> > // n -> PTR_TO_BTF_ID | PTR_UNTRUSTED
> > // m -> PTR_TO_BTF_ID | MEM_ALLOC && ref_obj_id == Y
> > bpf_obj_drop(m);
>
> This seems like a departure from other statements in your reply, where you're
> leaning towards "non-owning and trusted" -> "full untrusted" after unlock
> being unnecessary. I think the combo of reference aliases + bpf_obj_drop-and-
> reuse make everything hard to reason about.
>
> Regardless, your comments annotating reg state look correct to me.

I think it's much clearer in this thread what you want to do. After the thread
concludes, it would be good to summarize how you're going to finally implement
all this before respinning.

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH bpf-next 00/13] BPF rbtree next-gen datastructure
  2022-12-08 12:57             ` Kumar Kartikeya Dwivedi
@ 2022-12-08 20:36               ` Alexei Starovoitov
  2022-12-08 23:35                 ` Dave Marchevsky
  0 siblings, 1 reply; 51+ messages in thread
From: Alexei Starovoitov @ 2022-12-08 20:36 UTC (permalink / raw)
  To: Kumar Kartikeya Dwivedi
  Cc: Dave Marchevsky, Dave Marchevsky, bpf, Alexei Starovoitov,
	Daniel Borkmann, Andrii Nakryiko, Kernel Team, Tejun Heo

On Thu, Dec 08, 2022 at 06:27:29PM +0530, Kumar Kartikeya Dwivedi wrote:
> 
> I don't mind using active_lock.id for invalidation, but using reg->id to
> associate it with reg is a bad idea IMO, it's already preserved and set when the
> object has bpf_spin_lock in it, and it's going to allow doing bpf_spin_unlock
> with that non-owning ref if it has a spin lock, essentially unlocking different
> spin lock if the reg->btf of already locked spin lock reg is same due to same
> active_lock.id.

Right. Overwriting reg->id was a bad idea.

> Even if you prevent it somehow it's more confusing to overload reg->id again for
> this purpose.
> 
> It makes more sense to introduce a new nonref_obj_id instead dedicated for this
> purpose, to associate it back to the reg->id of the collection it is coming from.

nonref_obj_id name sounds too generic and I'm not sure that it shouldn't be
connected to reg->id the way we do it for ref_obj_id.

> Also, there are two cases of invalidation, one is on remove from rbtree, which
> should only invalidate non-owning references into the rbtree, and one is on
> unlock, which should invalidate all non-owning references.

Two cases only if we're going to do invalidation on rbtree_remove.

> bpf_rbtree_remove shouldn't invalidate non-owning into list protected by same
> lock, but unlocking should do it for both rbtree and list non-owning refs it is
> protecting.
> 
> So it seems you will have to maintain two IDs for non-owning references, one for
> the collection it comes from, and one for the lock region it is obtained in.

Right. Like this ?
collection_id = rbroot->reg->id; // to track the collection it came from
active_lock_id = cur_state->active_lock.id // to track the lock region

but before we proceed let me demonstrate an example where
cleanup on rbtree_remove is not user friendly:

bpf_spin_lock
x = bpf_list_first(); if (!x) ..
y = bpf_list_last(); if (!y) ..

n = bpf_list_remove(x); if (!n) ..

bpf_list_add_after(n, y); // we should allow this
bpf_spin_unlock

We don't have such apis right now.
The point here is that cleanup after bpf_list_remove/bpf_rbtree_remove will
destroy all regs that point somewhere in the collection.
This way we save a run-time check in bpf_rbtree_remove, but sacrifice usability.

x and y could be pointing to the same thing.
In such a case bpf_list_add_after() should fail at run time after discovering
that 'y' is unlinked.
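The run-time detection sketched here could follow the kernel's
list_del_init()/RB_CLEAR_NODE() self-pointing convention. A plain-C model
(names are illustrative, not a proposed kfunc API):

```c
/* After removal a node is re-initialized to point at itself, so a later
 * add-after can detect "y is unlinked" and become a nop instead of
 * corrupting the list.
 */
#include <assert.h>
#include <stdbool.h>

struct list_node { struct list_node *prev, *next; };

static void list_init(struct list_node *h)
{
	h->prev = h->next = h;
}

static bool list_unlinked(const struct list_node *n)
{
	return n->next == n;
}

/* insert n immediately after h */
static void list_add_head(struct list_node *h, struct list_node *n)
{
	n->next = h->next;
	n->prev = h;
	h->next->prev = n;
	h->next = n;
}

static void list_remove(struct list_node *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
	list_init(n);		/* self-pointing marks "unlinked" */
}

/* returns false (a run-time nop) when 'after' was already removed */
static bool list_add_after_checked(struct list_node *after,
				   struct list_node *n)
{
	if (list_unlinked(after))
		return false;
	list_add_head(after, n);
	return true;
}
```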

Similarly with bpf_rbtree_add().
Currently it cannot fail. It takes owning ref and will release it.
We can mark it as KF_RELEASE and no extra verifier changes necessary.

But in the future we might have failing add/insert operations on lists and rbtree.
If they're failing we'd need to struggle with 'conditional release' verifier additions,
the bpf prog would need to check return value, etc.

I think we better deal with it in run-time.
The verifier could supply bpf_list_add_after() with two hidden args:
- container_of offset (delta between the rb_node and the beginning of the prog's struct)
- struct btf_struct_meta *meta
Then inside bpf_list_add_after or any failing KF_RELEASE kfunc
it can call bpf_obj_drop_impl() on that element.
Then from the verifier pov the KF_RELEASE function did the release
and 'owning ref' became 'non-owning ref'.
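The container_of-offset hidden argument can be modeled in plain C with
offsetof(); here free() stands in for bpf_obj_drop_impl(), and all type names
are illustrative:

```c
/* Sketch of the "hidden args" idea: the verifier passes the distance
 * from the embedded node to the start of the program's struct, so a
 * failing release kfunc can recover and free the whole object itself.
 */
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

struct rb_node_stub { int dummy; };

struct node_data {
	long key;
	long data;
	struct rb_node_stub node;	/* embedded, like bpf_rb_node */
};

/* what a failing KF_RELEASE kfunc could do internally */
static void drop_from_embedded(struct rb_node_stub *n, size_t off)
{
	free((char *)n - off);		/* recover the containing object */
}
```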

> > >> And you're also adding 'untrusted' here, mainly as a result of
> > >> bpf_rbtree_add(tree, node) - 'node' becoming untrusted after it's added,
> > >> instead of becoming a non-owning ref. 'untrusted' would have state like:
> > >>
> > >> PTR_TO_BTF_ID | MEM_ALLOC (w/ rb_node type)
> > >> PTR_UNTRUSTED
> > >> ref_obj_id == 0?
> > >
> > > I'm not sure whether we really need full untrusted after going through bpf_rbtree_add()
> > > or doing 'non-owning' is enough.
> > > If it's full untrusted it will be:
> > > PTR_TO_BTF_ID | PTR_UNTRUSTED && ref_obj_id == 0
> > >
> >
> > Yeah, I don't see what this "full untrusted" is giving us either. Let's have
> > "cleanup non-owning refs on spin_unlock" just invalidate the regs for now,
> > instead of converting to "full untrusted"?
> >
> 
> +1, I prefer invalidating completely on unlock.

fine by me.

> 
> I think it's better to clean by invalidating. We have better tools to form
> untrusted pointers (like bpf_rdonly_cast) now if the BPF program writer needs
> such an escape hatch for some reason. It's also easier to review where an
> untrusted pointer is being used in a program, and has zero cost at runtime.

ok. Since it's more strict we can relax to untrusted later if necessary.

> So far I'm leaning towards:
> 
> bpf_rbtree_add(node) : node becomes non-owned ref
> bpf_spin_unlock(lock) : node is invalidated

ok

> > > Currently I'm leaning towards PTR_UNTRUSTED for cleanup after bpf_spin_unlock
> > > and non-owning after bpf_rbtree_add.
> > >
> > > Walking the example from previous email:
> > >
> > > struct bpf_rbtree_iter it;
> > > struct bpf_rb_node * node;
> > > struct bpf_rb_node *n, *m;
> > >
> > > bpf_rbtree_iter_init(&it, rb_root); // locks the rbtree works as bpf_spin_lock
> > > while ((node = bpf_rbtree_iter_next(&it)) {
> > >   // node -> PTR_TO_BTF_ID | MEM_ALLOC | MAYBE_NULL && ref_obj_id == 0
> > >   if (node && node->field == condition) {
> > >
> > >     n = bpf_rbtree_remove(rb_root, node);
> > >     if (!n) ...;
> > >     // n -> PTR_TO_BTF_ID | MEM_ALLOC && ref_obj_id == X
> > >     m = bpf_rbtree_remove(rb_root, node); // ok, but fails in run-time
> > >     if (!m) ...;
> > >     // m -> PTR_TO_BTF_ID | MEM_ALLOC && ref_obj_id == Y
> > >
> 
> This second remove I would simply disallow as Dave is suggesting during
> verification, by invalidating non-owning refs for rb_root.

Looks like cleanup from non-owning to untrusted|unknown on bpf_rbtree_remove is our
only remaining disagreement.
I feel run-time checks will be fast enough and will improve usability.

Also it feels that not doing cleanup on rbtree_remove is simpler to
implement and reason about.

Here is the proposal with one new field 'active_lock_id':

first = bpf_rbtree_first(root) KF_RET_NULL
  check_reg_allocation_locked() checks that root->reg->id == cur->active_lock.id
  R0 = PTR_TO_BTF_ID|MEM_ALLOC|PTR_MAYBE_NULL ref_obj_id = 0;
  R0->active_lock_id = root->reg->id
  R0->id = ++env->id_gen; which will be cleared after !NULL check inside prog.

same way we can add rb_find, rb_find_first,
but not rb_next, rb_prev, since they don't have 'root' argument.

bpf_rbtree_add(root, node, cb); KF_RELEASE.
  needs to see PTR_TO_BTF_ID|MEM_ALLOC node->ref_obj_id > 0
  check_reg_allocation_locked() checks that root->reg->id == cur->active_lock.id
  calls release_reference(node->ref_obj_id)
  converts 'node' to PTR_TO_BTF_ID|MEM_ALLOC ref_obj_id = 0;
  node->active_lock_id = root->reg->id

'node' is equivalent to 'first'. They both point to some element
inside rbtree and valid inside spin_locked region.
It's ok to read|write to both under lock.

removed_node = bpf_rbtree_remove(root, node); KF_ACQUIRE|KF_RET_NULL
  need to see PTR_TO_BTF_ID|MEM_ALLOC node->ref_obj_id = 0; and 
  usual check_reg_allocation_locked(root)
  R0 = PTR_TO_BTF_ID|MEM_ALLOC|MAYBE_NULL
  R0->ref_obj_id = R0->id = acquire_reference_state();
  R0->active_lock_id should stay 0
  mark_reg_unknown(node)

bpf_spin_unlock(lock);
  checks lock->id == cur->active_lock.id
  for all regs in state 
    if (reg->active_lock_id == lock->id)
       mark_reg_unknown(reg)


* Re: [PATCH bpf-next 00/13] BPF rbtree next-gen datastructure
  2022-12-08 20:36               ` Alexei Starovoitov
@ 2022-12-08 23:35                 ` Dave Marchevsky
  2022-12-09  0:39                   ` Alexei Starovoitov
  0 siblings, 1 reply; 51+ messages in thread
From: Dave Marchevsky @ 2022-12-08 23:35 UTC (permalink / raw)
  To: Alexei Starovoitov, Kumar Kartikeya Dwivedi
  Cc: Dave Marchevsky, bpf, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, Kernel Team, Tejun Heo

On 12/8/22 3:36 PM, Alexei Starovoitov wrote:
> On Thu, Dec 08, 2022 at 06:27:29PM +0530, Kumar Kartikeya Dwivedi wrote:
>>
>> I don't mind using active_lock.id for invalidation, but using reg->id to
>> associate it with reg is a bad idea IMO, it's already preserved and set when the
>> object has bpf_spin_lock in it, and it's going to allow doing bpf_spin_unlock
>> with that non-owning ref if it has a spin lock, essentially unlocking different
>> spin lock if the reg->btf of already locked spin lock reg is same due to same
>> active_lock.id.
> 
> Right. Overwriting reg->id was a bad idea.
> 
>> Even if you prevent it somehow it's more confusing to overload reg->id again for
>> this purpose.
>>
>> It makes more sense to introduce a new nonref_obj_id instead dedicated for this
>> purpose, to associate it back to the reg->id of the collection it is coming from.
> 
> nonref_obj_id name sounds too generic and I'm not sure that it shouldn't be
> connected to reg->id the way we do it for ref_obj_id.
> 
>> Also, there are two cases of invalidation, one is on remove from rbtree, which
>> should only invalidate non-owning references into the rbtree, and one is on
>> unlock, which should invalidate all non-owning references.
> 
> Two cases only if we're going to do invalidation on rbtree_remove.
> 
>> bpf_rbtree_remove shouldn't invalidate non-owning into list protected by same
>> lock, but unlocking should do it for both rbtree and list non-owning refs it is
>> protecting.
>>
>> So it seems you will have to maintain two IDs for non-owning references, one for
>> the collection it comes from, and one for the lock region it is obtained in.
> 
> Right. Like this ?
> collection_id = rbroot->reg->id; // to track the collection it came from
> active_lock_id = cur_state->active_lock.id // to track the lock region
> 
> but before we proceed let me demonstrate an example where
> cleanup on rbtree_remove is not user friendly:
> 
> bpf_spin_lock
> x = bpf_list_first(); if (!x) ..
> y = bpf_list_last(); if (!y) ..
> 
> n = bpf_list_remove(x); if (!n) ..
> 
> bpf_list_add_after(n, y); // we should allow this
> bpf_spin_unlock
> 
> We don't have such apis right now.
> The point here is that cleanup after bpf_list_remove/bpf_rbtree_remove will
> destroy all regs that point somewhere in the collection.
> This way we save a run-time check in bpf_rbtree_remove, but sacrifice usability.
> 
> x and y could be pointing to the same thing.
> In such case bpf_list_add_after() should fail in runtime after discovering
> that 'y' is unlinked.
> 
> Similarly with bpf_rbtree_add().
> Currently it cannot fail. It takes owning ref and will release it.
> We can mark it as KF_RELEASE and no extra verifier changes necessary.
> 
> But in the future we might have failing add/insert operations on lists and rbtree.
> If they're failing we'd need to struggle with 'conditional release' verifier additions,
> the bpf prog would need to check return value, etc.
> 
> I think we better deal with it in run-time.
> The verifier could supply bpf_list_add_after() with two hidden args:
> - container_of offset (delta between the rb_node and the beginning of the prog's struct)
> - struct btf_struct_meta *meta
> Then inside bpf_list_add_after or any failing KF_RELEASE kfunc
> it can call bpf_obj_drop_impl() on that element.
> Then from the verifier pov the KF_RELEASE function did the release
> and 'owning ref' became 'non-owning ref'.
> 
>>>>> And you're also adding 'untrusted' here, mainly as a result of
>>>>> bpf_rbtree_add(tree, node) - 'node' becoming untrusted after it's added,
>>>>> instead of becoming a non-owning ref. 'untrusted' would have state like:
>>>>>
>>>>> PTR_TO_BTF_ID | MEM_ALLOC (w/ rb_node type)
>>>>> PTR_UNTRUSTED
>>>>> ref_obj_id == 0?
>>>>
>>>> I'm not sure whether we really need full untrusted after going through bpf_rbtree_add()
>>>> or doing 'non-owning' is enough.
>>>> If it's full untrusted it will be:
>>>> PTR_TO_BTF_ID | PTR_UNTRUSTED && ref_obj_id == 0
>>>>
>>>
>>> Yeah, I don't see what this "full untrusted" is giving us either. Let's have
>>> "cleanup non-owning refs on spin_unlock" just invalidate the regs for now,
>>> instead of converting to "full untrusted"?
>>>
>>
>> +1, I prefer invalidating completely on unlock.
> 
> fine by me.
> 
>>
>> I think it's better to clean by invalidating. We have better tools to form
>> untrusted pointers (like bpf_rdonly_cast) now if the BPF program writer needs
>> such an escape hatch for some reason. It's also easier to review where an
>> untrusted pointer is being used in a program, and has zero cost at runtime.
> 
> ok. Since it's more strict we can relax to untrusted later if necessary.
> 
>> So far I'm leaning towards:
>>
>> bpf_rbtree_add(node) : node becomes non-owned ref
>> bpf_spin_unlock(lock) : node is invalidated
> 
> ok
> 
>>>> Currently I'm leaning towards PTR_UNTRUSTED for cleanup after bpf_spin_unlock
>>>> and non-owning after bpf_rbtree_add.
>>>>
>>>> Walking the example from previous email:
>>>>
>>>> struct bpf_rbtree_iter it;
>>>> struct bpf_rb_node * node;
>>>> struct bpf_rb_node *n, *m;
>>>>
>>>> bpf_rbtree_iter_init(&it, rb_root); // locks the rbtree works as bpf_spin_lock
>>>> while ((node = bpf_rbtree_iter_next(&it)) {
>>>>   // node -> PTR_TO_BTF_ID | MEM_ALLOC | MAYBE_NULL && ref_obj_id == 0
>>>>   if (node && node->field == condition) {
>>>>
>>>>     n = bpf_rbtree_remove(rb_root, node);
>>>>     if (!n) ...;
>>>>     // n -> PTR_TO_BTF_ID | MEM_ALLOC && ref_obj_id == X
>>>>     m = bpf_rbtree_remove(rb_root, node); // ok, but fails in run-time
>>>>     if (!m) ...;
>>>>     // m -> PTR_TO_BTF_ID | MEM_ALLOC && ref_obj_id == Y
>>>>
>>
>> This second remove I would simply disallow as Dave is suggesting during
>> verification, by invalidating non-owning refs for rb_root.
> 
> Looks like cleanup from non-owning to untrusted|unknown on bpf_rbtree_remove is our
> only remaining disagreement.
> I feel run-time checks will be fast enough and will improve usability.
> 
> Also it feels that not doing cleanup on rbtree_remove is simpler to
> implement and reason about.
> 
> Here is the proposal with one new field 'active_lock_id':
> 
> first = bpf_rbtree_first(root) KF_RET_NULL
>   check_reg_allocation_locked() checks that root->reg->id == cur->active_lock.id
>   R0 = PTR_TO_BTF_ID|MEM_ALLOC|PTR_MAYBE_NULL ref_obj_id = 0;
>   R0->active_lock_id = root->reg->id
>   R0->id = ++env->id_gen; which will be cleared after !NULL check inside prog.
> 
> same way we can add rb_find, rb_find_first,
> but not rb_next, rb_prev, since they don't have 'root' argument.
> 
> bpf_rbtree_add(root, node, cb); KF_RELEASE.
>   needs to see PTR_TO_BTF_ID|MEM_ALLOC node->ref_obj_id > 0
>   check_reg_allocation_locked() checks that root->reg->id == cur->active_lock.id
>   calls release_reference(node->ref_obj_id)
>   converts 'node' to PTR_TO_BTF_ID|MEM_ALLOC ref_obj_id = 0;
>   node->active_lock_id = root->reg->id
> 
> 'node' is equivalent to 'first'. They both point to some element
> inside rbtree and valid inside spin_locked region.
> It's ok to read|write to both under lock.
> 
> removed_node = bpf_rbtree_remove(root, node); KF_ACQUIRE|KF_RET_NULL
>   need to see PTR_TO_BTF_ID|MEM_ALLOC node->ref_obj_id = 0; and 
>   usual check_reg_allocation_locked(root)
>   R0 = PTR_TO_BTF_ID|MEM_ALLOC|MAYBE_NULL
>   R0->ref_obj_id = R0->id = acquire_reference_state();
>   R0->active_lock_id should stay 0
>   mark_reg_unknown(node)
> 
> bpf_spin_unlock(lock);
>   checks lock->id == cur->active_lock.id
>   for all regs in state 
>     if (reg->active_lock_id == lock->id)
>        mark_reg_unknown(reg)

OK, so sounds like a few more points of agreement, regardless of whether
we go the runtime checking route or the other one:

  * We're tossing 'full untrusted' for now. non-owning references will not be
    allowed to escape critical section. They'll be clobbered w/
    mark_reg_unknown.
    * No pressing need to make bpf_obj_drop callable from critical section.
      As a result no owning or non-owning ref access can page fault.

  * When spin_lock is unlocked, verifier needs to know about all non-owning
    references so that it can clobber them. Current implementation -
    ref_obj_id + release_on_unlock - is bad for a number of reasons, should
    be replaced with something that doesn't use ref_obj_id or reg->id.
    * Specific better approach was proposed above: new field + keep track
      of lock and datastructure identity.


Differences in proposed approaches:

"Type System checks + invalidation on 'destructive' rbtree ops"

  * This approach tries to prevent aliasing problems by invalidating
    non-owning refs after 'destructive' rbtree ops - like rbtree_remove -
    in addition to invalidation on spin_unlock

  * Type system guarantees invariants:
    * "if it's an owning ref, the node is guaranteed to not be in an rbtree"
    * "if it's a non-owning ref, the node is guaranteed to be in an rbtree"

  * Downside: mass non-owning ref invalidation on rbtree_remove will cause some
    programs that logically don't have an aliasing problem to be rejected by
    the verifier. Will affect usability depending on how bad this is.


"Runtime checks + spin_unlock invalidation only"

  * This approach allows for the possibility of aliasing problem. As a result
    the invariants guaranteed in point 2 above don't necessarily hold.
    * Helpers that add or remove need to account for the possibility that the
      node they're operating on has already been added / removed. Need to check
      this at runtime and nop if so.

  * non-owning refs are only invalidated on spin_unlock.
    * As a result, usability issues of previous approach don't happen here.

  * Downside: Need to do runtime checks, plus some additional verifier
    complexity to deal with the "runtime check failed" case, since the previous
    approach's invariant no longer holds
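A minimal executable model of that run-time failure mode (two aliasing refs;
the second remove becomes a nop; the 'linked' flag stands in for whatever
unlinked-detection, e.g. an RB_EMPTY_NODE-style check, the real kfunc would
use):

```c
/* Two non-owning refs alias the same node; the second
 * bpf_rbtree_remove detects at run time that the node is already
 * unlinked and returns NULL instead of corrupting the tree.
 */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct model_node { bool linked; };

static struct model_node *model_rbtree_remove(struct model_node *n)
{
	if (!n->linked)
		return NULL;	/* runtime check failed: already removed */
	n->linked = false;	/* actually unlink */
	return n;
}
```

This is the case the verifier must then handle via KF_RET_NULL: the program has
to null-check the result of every remove.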

Conversion of non-owning refs to 'untrusted' at an invalidation point (unlock
or remove) can be added to either approach (maybe - at least it was specifically
discussed for "runtime checks"). Such untrusted refs, by virtue of being
PTR_UNTRUSTED, can fault, and aren't accepted by rbtree_{add, remove} as input.
For the "type system" approach this might ameliorate some of the usability
issues. For the "runtime checks" approach it would only be useful to let
such refs escape spin_unlock.

But we're not going to do non-owning -> 'untrusted' for now, just listing for
completeness.


The distance between what I have now and the "type system" approach is smaller
than to the "runtime checks" approach. And to get from "type system" to
"runtime checks" I'd need to:

  * Remove 'destructive op' invalidation points
  * Add runtime checks to rbtree_{add,remove}
  * Add verifier handling of runtime check failure possibility

Of which only the first point is getting rid of something added for the
"type system" approach, and won't be much work relative to all the refactoring
and other improvements that are common between the two approaches.

So for V2 I will do the "type system + invalidation on 'destructive' ops"
approach as it'll take less time. This'll get eyes on common improvements
faster. Then can do a "runtime checks" v3 and we can compare usability of both
on same base.


* Re: [PATCH bpf-next 00/13] BPF rbtree next-gen datastructure
  2022-12-08 23:35                 ` Dave Marchevsky
@ 2022-12-09  0:39                   ` Alexei Starovoitov
  0 siblings, 0 replies; 51+ messages in thread
From: Alexei Starovoitov @ 2022-12-09  0:39 UTC (permalink / raw)
  To: Dave Marchevsky
  Cc: Kumar Kartikeya Dwivedi, Dave Marchevsky, bpf,
	Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Kernel Team, Tejun Heo

On Thu, Dec 08, 2022 at 06:35:24PM -0500, Dave Marchevsky wrote:
> > 
> > Here is the proposal with one new field 'active_lock_id':
> > 
> > first = bpf_rbtree_first(root) KF_RET_NULL
> >   check_reg_allocation_locked() checks that root->reg->id == cur->active_lock.id
> >   R0 = PTR_TO_BTF_ID|MEM_ALLOC|PTR_MAYBE_NULL ref_obj_id = 0;
> >   R0->active_lock_id = root->reg->id
> >   R0->id = ++env->id_gen; which will be cleared after !NULL check inside prog.
> > 
> > same way we can add rb_find, rb_find_first,
> > but not rb_next, rb_prev, since they don't have 'root' argument.
> > 
> > bpf_rbtree_add(root, node, cb); KF_RELEASE.
> >   needs to see PTR_TO_BTF_ID|MEM_ALLOC node->ref_obj_id > 0
> >   check_reg_allocation_locked() checks that root->reg->id == cur->active_lock.id
> >   calls release_reference(node->ref_obj_id)
> >   converts 'node' to PTR_TO_BTF_ID|MEM_ALLOC ref_obj_id = 0;
> >   node->active_lock_id = root->reg->id
> > 
> > 'node' is equivalent to 'first'. They both point to some element
> > inside rbtree and valid inside spin_locked region.
> > It's ok to read|write to both under lock.
> > 
> > removed_node = bpf_rbtree_remove(root, node); KF_ACQUIRE|KF_RET_NULL
> >   need to see PTR_TO_BTF_ID|MEM_ALLOC node->ref_obj_id = 0; and 
> >   usual check_reg_allocation_locked(root)
> >   R0 = PTR_TO_BTF_ID|MEM_ALLOC|MAYBE_NULL
> >   R0->ref_obj_id = R0->id = acquire_reference_state();
> >   R0->active_lock_id should stay 0
> >   mark_reg_unknown(node)
> > 
> > bpf_spin_unlock(lock);
> >   checks lock->id == cur->active_lock.id
> >   for all regs in state 
> >     if (reg->active_lock_id == lock->id)
> >        mark_reg_unknown(reg)
> 
> OK, so sounds like a few more points of agreement, regardless of whether
> we go the runtime checking route or the other one:
> 
>   * We're tossing 'full untrusted' for now. Non-owning references will not be
>     allowed to escape the critical section. They'll be clobbered w/
>     mark_reg_unknown.

agree

>     * No pressing need to make bpf_obj_drop callable from critical section.
>       As a result no owning or non-owning ref access can page fault.

agree

> 
>   * When spin_lock is unlocked, the verifier needs to know about all non-owning
>     references so that it can clobber them. The current implementation -
>     ref_obj_id + release_on_unlock - is bad for a number of reasons and should
>     be replaced with something that doesn't use ref_obj_id or reg->id.
>     * Specific better approach was proposed above: new field + keep track
>       of lock and datastructure identity.

yes

> 
> Differences in proposed approaches:
> 
> "Type System checks + invalidation on 'destructive' rbtree ops"
> 
>   * This approach tries to prevent aliasing problems by invalidating
>     non-owning refs after 'destructive' rbtree ops - like rbtree_remove -
>     in addition to invalidation on spin_unlock
> 
>   * Type system guarantees invariants:
>     * "if it's an owning ref, the node is guaranteed to not be in an rbtree"
>     * "if it's a non-owning ref, the node is guaranteed to be in an rbtree"
> 
>   * Downside: mass non-owning ref invalidation on rbtree_remove will cause
>     some programs that logically don't have an aliasing problem to be rejected
>     by the verifier. Will affect usability depending on how bad this is.

yes.

> 
> 
> "Runtime checks + spin_unlock invalidation only"
> 
>   * This approach allows for the possibility of the aliasing problem. As a
>     result the invariants guaranteed in point 2 above don't necessarily hold.
>     * Helpers that add or remove need to account for possibility that the node
>       they're operating on has already been added / removed. Need to check this
>       at runtime and nop if so.

Only 'remove' needs to check.
'add' is operating on 'owning ref'. It cannot fail.
Some future 'add_here(root, owning_node_to_add, nonowning_location)'
may need to fail.

> 
>   * non-owning refs are only invalidated on spin_unlock.
>     * As a result, usability issues of previous approach don't happen here.
> 
>   * Downside: Need to do runtime checks, plus some additional verifier
>     complexity to deal with the "runtime check failed" case, since the prev
>     approach's invariant doesn't hold.
> 
> Conversion of non-owning refs to 'untrusted' at an invalidation point (unlock
> or remove) can be added to either approach (maybe; at least it was specifically
> discussed for "runtime checks"). Such untrusted refs, by virtue of being
> PTR_UNTRUSTED, can fault, and aren't accepted by rbtree_{add, remove} as input.

correct.

> For the "type system" approach this might ameliorate some of the usability
> issues. For the "runtime checks" approach it would only be useful to let
> such refs escape spin_unlock.

the prog can do bpf_rdonly_cast() even after mark_unknown.

> But we're not going to do non-owning -> 'untrusted' for now, just listing for
> completeness.

right, because of bpf_rdonly_cast availability.

> The distance between what I have now and the "type system" approach is smaller
> than the distance to the "runtime checks" approach. To get from "type system"
> to "runtime checks" I'd need to:
> 
>   * Remove 'destructive op' invalidation points
>   * Add runtime checks to rbtree_{add,remove}
>   * Add verifier handling of runtime check failure possibility
> 
> Only the first point undoes something added for the "type system" approach,
> and it won't be much work relative to all the refactoring and other
> improvements that are common between the two approaches.
> 
> So for V2 I will do the "type system + invalidation on 'destructive' ops"
> approach, as it'll take less time. This'll get eyes on the common improvements
> faster. Then I can do a "runtime checks" v3 and we can compare the usability
> of both on the same base.

Sure, if you think cleanup on rbtree_remove is faster to implement,
then definitely go for it.
I was imagining the other way around, but it's fine. Happy to be wrong.
I'm not seeing, though, how you're going to do that cleanup.
Another id-like field?
Before doing all the coding, could you post a proposal in the format I used
above? IMO it's much easier to think through in that form than by analyzing
the source code.


* Re: [PATCH bpf-next 08/13] bpf: Add callback validation to kfunc verifier logic
  2022-12-07  2:01   ` Alexei Starovoitov
@ 2022-12-17  8:49     ` Dave Marchevsky
  0 siblings, 0 replies; 51+ messages in thread
From: Dave Marchevsky @ 2022-12-17  8:49 UTC (permalink / raw)
  To: Alexei Starovoitov, Dave Marchevsky
  Cc: bpf, Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Kernel Team, Kumar Kartikeya Dwivedi, Tejun Heo

On 12/6/22 9:01 PM, Alexei Starovoitov wrote:
> On Tue, Dec 06, 2022 at 03:09:55PM -0800, Dave Marchevsky wrote:
>> Some BPF helpers take a callback function which the helper calls. For
>> each helper that takes such a callback, there's a special call to
>> __check_func_call with a callback-state-setting callback that sets up
>> verifier bpf_func_state for the callback's frame.
>>
>> kfuncs don't have any of this infrastructure yet, so let's add it in
>> this patch, following existing helper pattern as much as possible. To
>> validate functionality of this added plumbing, this patch adds
>> callback handling for the bpf_rbtree_add kfunc and hopes to lay
>> groundwork for future next-gen datastructure callbacks.
>>
>> In the "general plumbing" category we have:
>>
>>   * check_kfunc_call doing callback verification right before clearing
>>     CALLER_SAVED_REGS, exactly like check_helper_call
>>   * recognition of func_ptr BTF types in kfunc args as
>>     KF_ARG_PTR_TO_CALLBACK + propagation of subprogno for this arg type
>>
>> In the "rbtree_add / next-gen datastructure-specific plumbing" category:
>>
>>   * Since bpf_rbtree_add must be called while the spin_lock associated
>>     with the tree is held, don't complain when callback's func_state
>>     doesn't unlock it by frame exit
>>   * Mark rbtree_add callback's args PTR_UNTRUSTED to prevent rbtree
>>     api functions from being called in the callback
>>
>> Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
>> ---
>>  kernel/bpf/verifier.c | 136 ++++++++++++++++++++++++++++++++++++++++--
>>  1 file changed, 130 insertions(+), 6 deletions(-)
>>
>> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
>> index 652112007b2c..9ad8c0b264dc 100644
>> --- a/kernel/bpf/verifier.c
>> +++ b/kernel/bpf/verifier.c
>> @@ -1448,6 +1448,16 @@ static void mark_ptr_not_null_reg(struct bpf_reg_state *reg)
>>  	reg->type &= ~PTR_MAYBE_NULL;
>>  }
>>  
>> +static void mark_reg_datastructure_node(struct bpf_reg_state *regs, u32 regno,
>> +					struct btf_field_datastructure_head *ds_head)
>> +{
>> +	__mark_reg_known_zero(&regs[regno]);
>> +	regs[regno].type = PTR_TO_BTF_ID | MEM_ALLOC;
>> +	regs[regno].btf = ds_head->btf;
>> +	regs[regno].btf_id = ds_head->value_btf_id;
>> +	regs[regno].off = ds_head->node_offset;
>> +}
>> +
>>  static bool reg_is_pkt_pointer(const struct bpf_reg_state *reg)
>>  {
>>  	return type_is_pkt_pointer(reg->type);
>> @@ -4771,7 +4781,8 @@ static int check_ptr_to_btf_access(struct bpf_verifier_env *env,
>>  			return -EACCES;
>>  		}
>>  
>> -		if (type_is_alloc(reg->type) && !reg->ref_obj_id) {
>> +		if (type_is_alloc(reg->type) && !reg->ref_obj_id &&
>> +		    !cur_func(env)->in_callback_fn) {
>>  			verbose(env, "verifier internal error: ref_obj_id for allocated object must be non-zero\n");
>>  			return -EFAULT;
>>  		}
>> @@ -6952,6 +6963,8 @@ static int set_callee_state(struct bpf_verifier_env *env,
>>  			    struct bpf_func_state *caller,
>>  			    struct bpf_func_state *callee, int insn_idx);
>>  
>> +static bool is_callback_calling_kfunc(u32 btf_id);
>> +
>>  static int __check_func_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
>>  			     int *insn_idx, int subprog,
>>  			     set_callee_state_fn set_callee_state_cb)
>> @@ -7006,10 +7019,18 @@ static int __check_func_call(struct bpf_verifier_env *env, struct bpf_insn *insn
>>  	 * interested in validating only BPF helpers that can call subprogs as
>>  	 * callbacks
>>  	 */
>> -	if (set_callee_state_cb != set_callee_state && !is_callback_calling_function(insn->imm)) {
>> -		verbose(env, "verifier bug: helper %s#%d is not marked as callback-calling\n",
>> -			func_id_name(insn->imm), insn->imm);
>> -		return -EFAULT;
>> +	if (set_callee_state_cb != set_callee_state) {
>> +		if (bpf_pseudo_kfunc_call(insn) &&
>> +		    !is_callback_calling_kfunc(insn->imm)) {
>> +			verbose(env, "verifier bug: kfunc %s#%d not marked as callback-calling\n",
>> +				func_id_name(insn->imm), insn->imm);
>> +			return -EFAULT;
>> +		} else if (!bpf_pseudo_kfunc_call(insn) &&
>> +			   !is_callback_calling_function(insn->imm)) { /* helper */
>> +			verbose(env, "verifier bug: helper %s#%d not marked as callback-calling\n",
>> +				func_id_name(insn->imm), insn->imm);
>> +			return -EFAULT;
>> +		}
>>  	}
>>  
>>  	if (insn->code == (BPF_JMP | BPF_CALL) &&
>> @@ -7275,6 +7296,67 @@ static int set_user_ringbuf_callback_state(struct bpf_verifier_env *env,
>>  	return 0;
>>  }
>>  
>> +static int set_rbtree_add_callback_state(struct bpf_verifier_env *env,
>> +					 struct bpf_func_state *caller,
>> +					 struct bpf_func_state *callee,
>> +					 int insn_idx)
>> +{
>> +	/* void bpf_rbtree_add(struct bpf_rb_root *root, struct bpf_rb_node *node,
>> +	 *                     bool (less)(struct bpf_rb_node *a, const struct bpf_rb_node *b));
>> +	 *
>> +	 * 'struct bpf_rb_node *node' arg to bpf_rbtree_add is the same PTR_TO_BTF_ID w/ offset
>> +	 * that 'less' callback args will be receiving. However, 'node' arg was release_reference'd
>> +	 * by this point, so look at 'root'
>> +	 */
>> +	struct btf_field *field;
>> +	struct btf_record *rec;
>> +
>> +	rec = reg_btf_record(&caller->regs[BPF_REG_1]);
>> +	if (!rec)
>> +		return -EFAULT;
>> +
>> +	field = btf_record_find(rec, caller->regs[BPF_REG_1].off, BPF_RB_ROOT);
>> +	if (!field || !field->datastructure_head.value_btf_id)
>> +		return -EFAULT;
>> +
>> +	mark_reg_datastructure_node(callee->regs, BPF_REG_1, &field->datastructure_head);
>> +	callee->regs[BPF_REG_1].type |= PTR_UNTRUSTED;
>> +	mark_reg_datastructure_node(callee->regs, BPF_REG_2, &field->datastructure_head);
>> +	callee->regs[BPF_REG_2].type |= PTR_UNTRUSTED;
> 
> Please add a comment here to explain that the pointers are actually trusted
> and here it's a quick hack to prevent callback to call into rb_tree kfuncs.
> We definitely would need to clean it up.
> Have you tried to check for is_bpf_list_api_kfunc() || is_bpf_rbtree_api_kfunc()
> while processing kfuncs inside callback ?
> 
>> +	callee->in_callback_fn = true;
> 
> this will give you a flag to do that check.
> 
>> +	callee->callback_ret_range = tnum_range(0, 1);
>> +	return 0;
>> +}
>> +
>> +static bool is_rbtree_lock_required_kfunc(u32 btf_id);
>> +
>> +/* Are we currently verifying the callback for a rbtree helper that must
>> + * be called with lock held? If so, no need to complain about unreleased
>> + * lock
>> + */
>> +static bool in_rbtree_lock_required_cb(struct bpf_verifier_env *env)
>> +{
>> +	struct bpf_verifier_state *state = env->cur_state;
>> +	struct bpf_insn *insn = env->prog->insnsi;
>> +	struct bpf_func_state *callee;
>> +	int kfunc_btf_id;
>> +
>> +	if (!state->curframe)
>> +		return false;
>> +
>> +	callee = state->frame[state->curframe];
>> +
>> +	if (!callee->in_callback_fn)
>> +		return false;
>> +
>> +	kfunc_btf_id = insn[callee->callsite].imm;
>> +	return is_rbtree_lock_required_kfunc(kfunc_btf_id);
>> +}
>> +
>>  static int prepare_func_exit(struct bpf_verifier_env *env, int *insn_idx)
>>  {
>>  	struct bpf_verifier_state *state = env->cur_state;
>> @@ -8007,6 +8089,7 @@ struct bpf_kfunc_call_arg_meta {
>>  	bool r0_rdonly;
>>  	u32 ret_btf_id;
>>  	u64 r0_size;
>> +	u32 subprogno;
>>  	struct {
>>  		u64 value;
>>  		bool found;
>> @@ -8185,6 +8268,18 @@ static bool is_kfunc_arg_rbtree_node(const struct btf *btf, const struct btf_par
>>  	return __is_kfunc_ptr_arg_type(btf, arg, KF_ARG_RB_NODE_ID);
>>  }
>>  
>> +static bool is_kfunc_arg_callback(struct bpf_verifier_env *env, const struct btf *btf,
>> +				  const struct btf_param *arg)
>> +{
>> +	const struct btf_type *t;
>> +
>> +	t = btf_type_resolve_func_ptr(btf, arg->type, NULL);
>> +	if (!t)
>> +		return false;
>> +
>> +	return true;
>> +}
>> +
>>  /* Returns true if struct is composed of scalars, 4 levels of nesting allowed */
>>  static bool __btf_type_is_scalar_struct(struct bpf_verifier_env *env,
>>  					const struct btf *btf,
>> @@ -8244,6 +8339,7 @@ enum kfunc_ptr_arg_type {
>>  	KF_ARG_PTR_TO_BTF_ID,	     /* Also covers reg2btf_ids conversions */
>>  	KF_ARG_PTR_TO_MEM,
>>  	KF_ARG_PTR_TO_MEM_SIZE,	     /* Size derived from next argument, skip it */
>> +	KF_ARG_PTR_TO_CALLBACK,
>>  	KF_ARG_PTR_TO_RB_ROOT,
>>  	KF_ARG_PTR_TO_RB_NODE,
>>  };
>> @@ -8368,6 +8464,9 @@ get_kfunc_ptr_arg_type(struct bpf_verifier_env *env,
>>  		return KF_ARG_PTR_TO_BTF_ID;
>>  	}
>>  
>> +	if (is_kfunc_arg_callback(env, meta->btf, &args[argno]))
>> +		return KF_ARG_PTR_TO_CALLBACK;
>> +
>>  	if (argno + 1 < nargs && is_kfunc_arg_mem_size(meta->btf, &args[argno + 1], &regs[regno + 1]))
>>  		arg_mem_size = true;
>>  
>> @@ -8585,6 +8684,16 @@ static bool is_bpf_datastructure_api_kfunc(u32 btf_id)
>>  	return is_bpf_list_api_kfunc(btf_id) || is_bpf_rbtree_api_kfunc(btf_id);
>>  }
>>  
>> +static bool is_callback_calling_kfunc(u32 btf_id)
>> +{
>> +	return btf_id == special_kfunc_list[KF_bpf_rbtree_add];
>> +}
>> +
>> +static bool is_rbtree_lock_required_kfunc(u32 btf_id)
>> +{
>> +	return is_bpf_rbtree_api_kfunc(btf_id);
>> +}
>> +
>>  static bool check_kfunc_is_datastructure_head_api(struct bpf_verifier_env *env,
>>  						  enum btf_field_type head_field_type,
>>  						  u32 kfunc_btf_id)
>> @@ -8920,6 +9029,7 @@ static int check_kfunc_args(struct bpf_verifier_env *env, struct bpf_kfunc_call_
>>  		case KF_ARG_PTR_TO_RB_NODE:
>>  		case KF_ARG_PTR_TO_MEM:
>>  		case KF_ARG_PTR_TO_MEM_SIZE:
>> +		case KF_ARG_PTR_TO_CALLBACK:
>>  			/* Trusted by default */
>>  			break;
>>  		default:
>> @@ -9078,6 +9188,9 @@ static int check_kfunc_args(struct bpf_verifier_env *env, struct bpf_kfunc_call_
>>  			/* Skip next '__sz' argument */
>>  			i++;
>>  			break;
>> +		case KF_ARG_PTR_TO_CALLBACK:
>> +			meta->subprogno = reg->subprogno;
>> +			break;
>>  		}
>>  	}
>>  
>> @@ -9193,6 +9306,16 @@ static int check_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
>>  		}
>>  	}
>>  
>> +	if (meta.func_id == special_kfunc_list[KF_bpf_rbtree_add]) {
>> +		err = __check_func_call(env, insn, insn_idx_p, meta.subprogno,
>> +					set_rbtree_add_callback_state);
>> +		if (err) {
>> +			verbose(env, "kfunc %s#%d failed callback verification\n",
>> +				func_name, func_id);
>> +			return err;
>> +		}
>> +	}
>> +
>>  	for (i = 0; i < CALLER_SAVED_REGS; i++)
>>  		mark_reg_not_init(env, regs, caller_saved[i]);
>>  
>> @@ -14023,7 +14146,8 @@ static int do_check(struct bpf_verifier_env *env)
>>  					return -EINVAL;
>>  				}
>>  
>> -				if (env->cur_state->active_lock.ptr) {
>> +				if (env->cur_state->active_lock.ptr &&
>> +				    !in_rbtree_lock_required_cb(env)) {
> 
> That looks wrong.
> It will allow callbacks to use unpaired lock/unlock.
> Have you tried clearing cur_state->active_lock when entering callback?
> That should solve it and won't cause lock/unlock imbalance.

I didn't directly address this in v2. cur_state->active_lock isn't cleared.
The rbtree callback is explicitly prevented from calling spin_{lock,unlock},
and the check above is preserved so that the verifier doesn't complain when
the cb exits without releasing the lock.

The logic for keeping it this way was:
  * We discussed allowing an rbtree_first() call in the less() cb, which
    requires the correct lock to be held, so we might as well keep lock info
    around
  * Similarly, because non-owning refs use active_lock info, we need to keep
    that info around
  * We could work around both issues above, but the net result would probably
    be _more_ special-casing, just in different places

Not trying to resurrect v1 with this comment; we can continue the convo on the
same patch in v2: https://lore.kernel.org/bpf/20221217082506.1570898-9-davemarchevsky@fb.com/


* Re: [PATCH bpf-next 02/13] bpf: map_check_btf should fail if btf_parse_fields fails
  2022-12-07 19:05     ` Alexei Starovoitov
@ 2022-12-17  8:59       ` Dave Marchevsky
  0 siblings, 0 replies; 51+ messages in thread
From: Dave Marchevsky @ 2022-12-17  8:59 UTC (permalink / raw)
  To: Alexei Starovoitov, Kumar Kartikeya Dwivedi
  Cc: Dave Marchevsky, bpf, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, Kernel Team, Tejun Heo

On 12/7/22 2:05 PM, Alexei Starovoitov wrote:
> On Wed, Dec 07, 2022 at 10:19:00PM +0530, Kumar Kartikeya Dwivedi wrote:
>> On Wed, Dec 07, 2022 at 04:39:49AM IST, Dave Marchevsky wrote:
>>> map_check_btf calls btf_parse_fields to create a btf_record for its
>>> value_type. If there are no special fields in the value_type,
>>> btf_parse_fields returns NULL, whereas if there are special value_type
>>> fields but they are invalid in some way, an error is returned.
>>>
>>> An example invalid state would be:
>>>
>>>   struct node_data {
>>>     struct bpf_rb_node node;
>>>     int data;
>>>   };
>>>
>>>   private(A) struct bpf_spin_lock glock;
>>>   private(A) struct bpf_list_head ghead __contains(node_data, node);
>>>
>>> ghead should be invalid as its __contains tag points to a field with
>>> type != "bpf_list_node".
>>>
>>> Before this patch, such a scenario would result in btf_parse_fields
>>> returning an error ptr, subsequent !IS_ERR_OR_NULL check failing,
>>> and btf_check_and_fixup_fields returning 0, which would then be
>>> returned by map_check_btf.
>>>
>>> After this patch's changes, -EINVAL would be returned by map_check_btf
>>> and the map would correctly fail to load.
>>>
>>> Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
>>> cc: Kumar Kartikeya Dwivedi <memxor@gmail.com>
>>> Fixes: aa3496accc41 ("bpf: Refactor kptr_off_tab into btf_record")
>>> ---
>>>  kernel/bpf/syscall.c | 5 ++++-
>>>  1 file changed, 4 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
>>> index 35972afb6850..c3599a7902f0 100644
>>> --- a/kernel/bpf/syscall.c
>>> +++ b/kernel/bpf/syscall.c
>>> @@ -1007,7 +1007,10 @@ static int map_check_btf(struct bpf_map *map, const struct btf *btf,
>>>  	map->record = btf_parse_fields(btf, value_type,
>>>  				       BPF_SPIN_LOCK | BPF_TIMER | BPF_KPTR | BPF_LIST_HEAD,
>>>  				       map->value_size);
>>> -	if (!IS_ERR_OR_NULL(map->record)) {
>>> +	if (IS_ERR(map->record))
>>> +		return -EINVAL;
>>> +
>>
>> I didn't do this on purpose, because of backward compatibility concerns. An
>> error has not been returned in earlier kernel versions at map creation time;
>> those fields acted like normal non-special regions, with errors on use of
>> the helpers that act on those fields.
>>
>> Especially since bpf_spin_lock and bpf_timer are part of the unified btf_record.
>>
>> If we are doing such a change, then you should also drop the checks for IS_ERR
>> in verifier.c, since that shouldn't be possible anymore. But I think we need to
>> think carefully before changing this.
>>
>> One possible example: if we introduce bpf_foo in the future and a program
>> already has that defined in its map value, using it for some other purpose
>> with different alignment and size, its map creation will start failing.
> 
> That's a good point.
> If we can error on such a misconstructed map at program verification time,
> that's better anyway, since there will be a proper verifier log instead of
> EINVAL from map_create.

In v2 I addressed these comments by just dropping this patch. No additional
logic is needed for "error at verification time", since btf_parse_fields doesn't
create a btf_record, and thus the first insn that expects the map_val to have
one will cause verification to fail.

For my "list_head __contains rb_node" case, the first insn is usually a
bpf_spin_lock call, which also needs a populated btf_record for spin_lock.
Unfortunately this doesn't really achieve a "proper verifier log", since the
spin_lock definition isn't the root cause here, but the verifier error msg can
only complain about spin_lock.

Not that the error message coming from BTF parse or check failing is any
better.

Anyways, I think there's some path forward here that results in a good error
message. But the semantics work the way we want without this commit, so it can
be deferred to followups.



Thread overview: 51+ messages
2022-12-06 23:09 [PATCH bpf-next 00/13] BPF rbtree next-gen datastructure Dave Marchevsky
2022-12-06 23:09 ` [PATCH bpf-next 01/13] bpf: Loosen alloc obj test in verifier's reg_btf_record Dave Marchevsky
2022-12-07 16:41   ` Kumar Kartikeya Dwivedi
2022-12-07 18:34     ` Dave Marchevsky
2022-12-07 18:59       ` Alexei Starovoitov
2022-12-07 20:38         ` Dave Marchevsky
2022-12-07 22:46           ` Alexei Starovoitov
2022-12-07 23:42             ` Dave Marchevsky
2022-12-07 19:03       ` Kumar Kartikeya Dwivedi
2022-12-06 23:09 ` [PATCH bpf-next 02/13] bpf: map_check_btf should fail if btf_parse_fields fails Dave Marchevsky
2022-12-07  1:32   ` Alexei Starovoitov
2022-12-07 16:49   ` Kumar Kartikeya Dwivedi
2022-12-07 19:05     ` Alexei Starovoitov
2022-12-17  8:59       ` Dave Marchevsky
2022-12-06 23:09 ` [PATCH bpf-next 03/13] bpf: Minor refactor of ref_set_release_on_unlock Dave Marchevsky
2022-12-06 23:09 ` [PATCH bpf-next 04/13] bpf: rename list_head -> datastructure_head in field info types Dave Marchevsky
2022-12-07  1:41   ` Alexei Starovoitov
2022-12-07 18:52     ` Dave Marchevsky
2022-12-07 19:01       ` Alexei Starovoitov
2022-12-06 23:09 ` [PATCH bpf-next 05/13] bpf: Add basic bpf_rb_{root,node} support Dave Marchevsky
2022-12-07  1:48   ` Alexei Starovoitov
2022-12-06 23:09 ` [PATCH bpf-next 06/13] bpf: Add bpf_rbtree_{add,remove,first} kfuncs Dave Marchevsky
2022-12-07 14:20   ` kernel test robot
2022-12-06 23:09 ` [PATCH bpf-next 07/13] bpf: Add support for bpf_rb_root and bpf_rb_node in kfunc args Dave Marchevsky
2022-12-07  1:51   ` Alexei Starovoitov
2022-12-06 23:09 ` [PATCH bpf-next 08/13] bpf: Add callback validation to kfunc verifier logic Dave Marchevsky
2022-12-07  2:01   ` Alexei Starovoitov
2022-12-17  8:49     ` Dave Marchevsky
2022-12-06 23:09 ` [PATCH bpf-next 09/13] bpf: Special verifier handling for bpf_rbtree_{remove, first} Dave Marchevsky
2022-12-07  2:18   ` Alexei Starovoitov
2022-12-06 23:09 ` [PATCH bpf-next 10/13] bpf, x86: BPF_PROBE_MEM handling for insn->off < 0 Dave Marchevsky
2022-12-07  2:39   ` Alexei Starovoitov
2022-12-07  6:46     ` Dave Marchevsky
2022-12-07 18:06       ` Alexei Starovoitov
2022-12-07 23:39         ` Dave Marchevsky
2022-12-08  0:47           ` Alexei Starovoitov
2022-12-08  8:50             ` Dave Marchevsky
2022-12-06 23:09 ` [PATCH bpf-next 11/13] bpf: Add bpf_rbtree_{add,remove,first} decls to bpf_experimental.h Dave Marchevsky
2022-12-06 23:09 ` [PATCH bpf-next 12/13] libbpf: Make BTF mandatory if program BTF has spin_lock or alloc_obj type Dave Marchevsky
2022-12-06 23:10 ` [PATCH bpf-next 13/13] selftests/bpf: Add rbtree selftests Dave Marchevsky
2022-12-07  2:50 ` [PATCH bpf-next 00/13] BPF rbtree next-gen datastructure patchwork-bot+netdevbpf
2022-12-07 19:36 ` Kumar Kartikeya Dwivedi
2022-12-07 22:28   ` Dave Marchevsky
2022-12-07 23:06     ` Alexei Starovoitov
2022-12-08  1:18       ` Dave Marchevsky
2022-12-08  3:51         ` Alexei Starovoitov
2022-12-08  8:28           ` Dave Marchevsky
2022-12-08 12:57             ` Kumar Kartikeya Dwivedi
2022-12-08 20:36               ` Alexei Starovoitov
2022-12-08 23:35                 ` Dave Marchevsky
2022-12-09  0:39                   ` Alexei Starovoitov
