* [PATCH v2 bpf-next 00/13] BPF rbtree next-gen datastructure
From: Dave Marchevsky @ 2022-12-17  8:24 UTC
  To: bpf
  Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Kernel Team, Kumar Kartikeya Dwivedi, Tejun Heo, Dave Marchevsky

This series adds an rbtree datastructure following the "next-gen
datastructure" precedent set by the recently-added linked list [0]. This
is a reimplementation of the previous rbtree RFC [1] to use kfunc + kptr
instead of adding a new map type. This series adds a smaller set of API
functions than that RFC - just the minimum needed to support the current
cgfifo example scheduler in the ongoing sched_ext effort [2], namely:

  bpf_rbtree_add
  bpf_rbtree_remove
  bpf_rbtree_first

The meat of this series is bugfixes and verifier infra work to support
these API functions. Adding more rbtree kfuncs in future patches should
be straightforward as a result.

First, the series refactors and extends linked_list's release_on_unlock
logic. The concept of "a reference to a node that has been added to a data
structure" is formalized as a "non-owning reference". From linked_list's
perspective, the non-owning reference held after
bpf_list_push_{front,back} has the same semantics as release_on_unlock,
with the addition that writes through such references are valid in the
critical section. Such references are no longer marked PTR_UNTRUSTED.
Patches 2 and 13 go into more detail.

The series then adds rbtree API kfuncs and necessary verifier support
for them - namely support for callback args to kfuncs and some
non-owning reference interactions that linked_list didn't need.

BPF rbtree uses struct rb_root_cached + the existing rbtree lib under the
hood. From the BPF program writer's perspective, a BPF rbtree is very
similar to the existing linked list. Consider the following example:

  struct node_data {
    long key;
    long data;
    struct bpf_rb_node node;
  };

  static bool less(struct bpf_rb_node *a, const struct bpf_rb_node *b)
  {
    struct node_data *node_a;
    struct node_data *node_b;

    node_a = container_of(a, struct node_data, node);
    node_b = container_of(b, struct node_data, node);

    return node_a->key < node_b->key;
  }

  private(A) struct bpf_spin_lock glock;
  private(A) struct bpf_rb_root groot __contains(node_data, node);

  /* ... in BPF program */
  struct node_data *n, *m;
  struct bpf_rb_node *res;

  n = bpf_obj_new(typeof(*n));
  if (!n)
    /* skip */
  n->key = 5;
  n->data = 10;

  bpf_spin_lock(&glock);
  bpf_rbtree_add(&groot, &n->node, less);
  bpf_spin_unlock(&glock);

  bpf_spin_lock(&glock);
  res = bpf_rbtree_first(&groot);
  if (!res)
    /* skip */
  res = bpf_rbtree_remove(&groot, res);
  if (!res)
    /* skip */
  bpf_spin_unlock(&glock);

  m = container_of(res, struct node_data, node);
  bpf_obj_drop(m);

Some obvious similarities:

  * The special bpf_rb_root and bpf_rb_node types have the same semantics
    as bpf_list_head and bpf_list_node, respectively
  * __contains is used to associate a node type with a root
  * The spin_lock associated with an rbtree must be held when using
    rbtree API kfuncs
  * Nodes are allocated via bpf_obj_new and dropped via bpf_obj_drop
  * The rbtree takes ownership of a node's lifetime when the node is added.
    Removing a node gives ownership back to the program, requiring a
    bpf_obj_drop before program exit

Some new additions as well:

  * Support for callbacks in kfunc args is added to enable the 'less'
    callback used above
  * bpf_rbtree_first is the first graph API function to return a
    non-owning reference instead of converting an arg from owning to
    non-owning
  * Because all references to nodes already added to the rbtree are
    non-owning, bpf_rbtree_remove must accept such a reference in order
    to remove the node from the tree (see the sketch below)
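
For illustration, a rough sketch of removal via a non-owning reference,
reusing groot, glock, and node_data from the example above (this is a
sketch, not code lifted from the selftests):

  struct bpf_rb_node *a, *b, *res;
  struct node_data *m;

  bpf_spin_lock(&glock);
  a = bpf_rbtree_first(&groot);  /* non-owning ref */
  b = bpf_rbtree_first(&groot);  /* another non-owning ref, same node */
  if (!a || !b)
    /* skip */

  res = bpf_rbtree_remove(&groot, a);  /* pass non-owning ref, get owning ref */
  if (!res)
    /* skip */
  /* b (and any other non-owning ref into groot) is invalidated by the
   * remove - further use of b would fail verification
   */
  bpf_spin_unlock(&glock);

  m = container_of(res, struct node_data, node);
  bpf_obj_drop(m);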

Summary of patches:
  Patch 1 lays groundwork for release_on_unlock -> non-owning ref
  changes

  Patches 2 and 3 do release_on_unlock -> non-owning ref migration and
  update linked_list tests

  Patch 4 is a nonfunctional rename

  Patches 5 - 9 implement the meat of rbtree support in this series,
  gradually building up to implemented kfuncs that verify as expected.

  Patch 10 adds bpf_rbtree_{add,remove,first} decls to bpf_experimental.h.

  Patch 11 makes libbpf treat BTF as mandatory if program BTF has a
  spin_lock or alloc_obj type.

  Patch 12 adds tests and Patch 13 adds documentation.

  [0]: lore.kernel.org/bpf/20221118015614.2013203-1-memxor@gmail.com
  [1]: lore.kernel.org/bpf/20220830172759.4069786-1-davemarchevsky@fb.com
  [2]: lore.kernel.org/bpf/20221130082313.3241517-1-tj@kernel.org

Changelog:

v1 -> v2: lore.kernel.org/bpf/20221206231000.3180914-1-davemarchevsky@fb.com/

Series-wide changes:
  * Rename datastructure_{head,node,api} -> graph_{root,node,api} (Alexei)
  * "graph datastructure" in patch summaries to refer to linked_list + rbtree
    instead of "next-gen datastructure" (Alexei)
  * Move from hacky marking of non-owning references as PTR_UNTRUSTED to
    cleaner implementation (Alexei)
  * Add invalidation of non-owning refs to rbtree_remove (Kumar, Alexei)

Patch #'s below refer to the patch's number in v1 unless otherwise stated.

Note that in v1 most of the meaty verifier changes were in the latter half
of the series. Here, about half of that complexity has been moved into
"bpf: Migrate release_on_unlock logic to non-owning ref semantics", which
was Patch 3 in v1.

* Patch 1 - "bpf: Loosen alloc obj test in verifier's reg_btf_record"
  * Was applied, dropped from further iterations

* Patch 2 - "bpf: map_check_btf should fail if btf_parse_fields fails"
  * Dropped in favor of verifier check-on-use: when some normal verifier
    checking expects the map to have btf_fields correctly parsed, it won't
    find any and verification will fail

* New patch added before Patch 3 - "bpf: Support multiple arg regs w/ ref_obj_id for kfuncs"
  * Addition of the KF_RELEASE_NON_OWN flag, which requires KF_RELEASE, and
    tagging of bpf_list_push_{front,back} as KF_RELEASE | KF_RELEASE_NON_OWN,
    means that list-in-list push_{front,back} will trigger the "only one
    ref_obj_id arg reg" logic. This is because the "head" arg to those
    functions can be a list-in-list, which itself can be an owning reference
    with ref_obj_id. So we need to support multiple ref_obj_id args for
    release kfuncs.

* Patch 3 - "bpf: Minor refactor of ref_set_release_on_unlock"
  * Now a major refactor w/ a rename to reflect this
    * "bpf: Migrate release_on_unlock logic to non-owning ref semantics"
  * Replaces release_on_unlock with active_lock logic as discussed in v1

* New patch added after Patch 3 - "selftests/bpf: Update linked_list tests for non_owning_ref logic"
  * Removes "write after push" linked_list failure tests - no longer failure
    scenarios.

* Patch 4 - "bpf: rename list_head -> datastructure_head in field info types"
  * rename to graph_root instead. Similar renamings across the series - see
    series-wide changes.

* Patch 5 - "bpf: Add basic bpf_rb_{root,node} support"
  * OWNER_FIELD_MASK -> GRAPH_ROOT_MASK, OWNEE_FIELD_MASK -> GRAPH_NODE_MASK,
    and change of "owner"/"ownee" in big btf_check_and_fixup_fields comment to
    "root"/"node" (Alexei)

* Patch 6 - "bpf: Add bpf_rbtree_{add,remove,first} kfuncs"
  * bpf_rbtree_remove can no longer return NULL. v2 continues v1's "use type
    system to prevent remove of node that isn't in a datastructure" approach,
    so rbtree_remove should never have been able to return NULL

* Patch 7 - "bpf: Add support for bpf_rb_root and bpf_rb_node in kfunc args"
  * is_bpf_datastructure_api_kfunc -> is_bpf_graph_api_kfunc (Alexei)

* Patch 8 - "bpf: Add callback validation to kfunc verifier logic"
  * Explicitly disallow rbtree_remove in rbtree callback
  * Explicitly disallow bpf_spin_{lock,unlock} call in rbtree callback,
    preventing possibility of "unbalanced" unlock (Alexei)

* Patch 10 - "bpf, x86: BPF_PROBE_MEM handling for insn->off < 0"
  * Now that non-owning refs aren't marked PTR_UNTRUSTED it's not necessary to
    include this patch as part of the series
  * After conversation w/ Alexei, did another pass and submitted as an
    independent series (lore.kernel.org/bpf/20221213182726.325137-1-davemarchevsky@fb.com/)

* Patch 13 - "selftests/bpf: Add rbtree selftests"
  * Since bpf_rbtree_remove can no longer return null, remove null checks
  * Remove test confirming that rbtree_first isn't allowed in callback. We want
    this to be possible
  * Add failure test confirming that rbtree_remove's new non-owning reference
    invalidation behavior behaves as expected
  * Add SEC("license") to rbtree_btf_fail__* progs. They were previously
    failing due to lack of this section. Now they're failing for correct
    reasons.
  * rbtree_btf_fail__add_wrong_type.c - add locking around rbtree_add, rename
    the bpf prog to something reasonable

* New patch added after patch 13 - "bpf, documentation: Add graph documentation for non-owning refs"
  * Summarizes details of owning and non-owning refs which we hashed out in
    v1


Dave Marchevsky (13):
  bpf: Support multiple arg regs w/ ref_obj_id for kfuncs
  bpf: Migrate release_on_unlock logic to non-owning ref semantics
  selftests/bpf: Update linked_list tests for non-owning ref semantics
  bpf: rename list_head -> graph_root in field info types
  bpf: Add basic bpf_rb_{root,node} support
  bpf: Add bpf_rbtree_{add,remove,first} kfuncs
  bpf: Add support for bpf_rb_root and bpf_rb_node in kfunc args
  bpf: Add callback validation to kfunc verifier logic
  bpf: Special verifier handling for bpf_rbtree_{remove, first}
  bpf: Add bpf_rbtree_{add,remove,first} decls to bpf_experimental.h
  libbpf: Make BTF mandatory if program BTF has spin_lock or alloc_obj
    type
  selftests/bpf: Add rbtree selftests
  bpf, documentation: Add graph documentation for non-owning refs

 Documentation/bpf/graph_ds_impl.rst           | 208 +++++
 Documentation/bpf/other.rst                   |   3 +-
 include/linux/bpf.h                           |  23 +-
 include/linux/bpf_verifier.h                  |  39 +-
 include/linux/btf.h                           |  18 +-
 include/uapi/linux/bpf.h                      |  11 +
 kernel/bpf/btf.c                              | 181 ++--
 kernel/bpf/helpers.c                          |  76 +-
 kernel/bpf/syscall.c                          |  28 +-
 kernel/bpf/verifier.c                         | 800 ++++++++++++++----
 tools/include/uapi/linux/bpf.h                |  11 +
 tools/lib/bpf/libbpf.c                        |  50 +-
 .../testing/selftests/bpf/bpf_experimental.h  |  24 +
 .../selftests/bpf/prog_tests/linked_list.c    |  22 +-
 .../testing/selftests/bpf/prog_tests/rbtree.c | 186 ++++
 .../testing/selftests/bpf/progs/linked_list.c |   2 +-
 .../selftests/bpf/progs/linked_list_fail.c    | 100 ++-
 tools/testing/selftests/bpf/progs/rbtree.c    | 176 ++++
 .../progs/rbtree_btf_fail__add_wrong_type.c   |  52 ++
 .../progs/rbtree_btf_fail__wrong_node_type.c  |  49 ++
 .../testing/selftests/bpf/progs/rbtree_fail.c | 296 +++++++
 21 files changed, 2018 insertions(+), 337 deletions(-)
 create mode 100644 Documentation/bpf/graph_ds_impl.rst
 create mode 100644 tools/testing/selftests/bpf/prog_tests/rbtree.c
 create mode 100644 tools/testing/selftests/bpf/progs/rbtree.c
 create mode 100644 tools/testing/selftests/bpf/progs/rbtree_btf_fail__add_wrong_type.c
 create mode 100644 tools/testing/selftests/bpf/progs/rbtree_btf_fail__wrong_node_type.c
 create mode 100644 tools/testing/selftests/bpf/progs/rbtree_fail.c

-- 
2.30.2

* [PATCH v2 bpf-next 01/13] bpf: Support multiple arg regs w/ ref_obj_id for kfuncs
From: Dave Marchevsky @ 2022-12-17  8:24 UTC
  To: bpf
  Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Kernel Team, Kumar Kartikeya Dwivedi, Tejun Heo, Dave Marchevsky

Currently, kfuncs marked KF_RELEASE indicate that they release some
previously-acquired arg. The verifier assumes that such a function will
only have one arg reg w/ ref_obj_id set, and that that arg is the one to
be released. Having multiple kfunc arg regs with ref_obj_id set is
considered an invalid state.

For helpers, RELEASE is used to tag a particular arg in the function
proto, not the function itself. The arg with OBJ_RELEASE type tag is the
arg that the helper will release. There can only be one such tagged arg.
When verifying arg regs, multiple helper arg regs w/ ref_obj_id set is
also considered an invalid state.

Later patches in this series will give some linked_list helpers marked
KF_RELEASE a valid reason to take two ref_obj_id args. Specifically,
bpf_list_push_{front,back} can push a node to a list head which is itself
part of a list node. In such a scenario both arguments to these functions
would have ref_obj_id > 0 and would thus fail verification under the
current logic.
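
To illustrate the list-in-list scenario, here is a rough sketch. The
struct and field names are made up for this example - it is not code
from the selftests - and error handling is elided:

  struct inner {
    long data;
    struct bpf_list_node node;
  };

  struct outer {
    struct bpf_list_node node;
    struct bpf_spin_lock lock;
    struct bpf_list_head head __contains(inner, node);
  };

  /* ... in BPF program */
  struct outer *o = bpf_obj_new(typeof(*o));  /* owning ref, ref_obj_id set */
  struct inner *i = bpf_obj_new(typeof(*i));  /* owning ref, ref_obj_id set */

  if (!o || !i)
    /* skip */

  bpf_spin_lock(&o->lock);
  /* Both arg regs - &o->head (derived from o) and &i->node (derived
   * from i) - have ref_obj_id set, which the "only one ref_obj_id arg
   * reg" logic would reject
   */
  bpf_list_push_back(&o->head, &i->node);
  bpf_spin_unlock(&o->lock);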

This patch changes the kfunc ref_obj_id searching logic to find the last
arg reg w/ ref_obj_id and consider that reg the one to release. This should be
backwards-compatible with all current kfuncs as they only expect one
such arg reg.

Currently the ref_obj_id and OBJ_RELEASE searching is done in the code
that examines each individual arg (check_func_arg for helpers and
check_kfunc_args inner loop for kfuncs). This patch pulls out this
searching to occur before individual arg type handling, resulting in a
cleaner separation of logic.

Two new helpers are added:
  * args_find_ref_obj_id_regno
    * For helpers and kfuncs. Searches through arg regs to find
      ref_obj_id reg and returns its regno. Helpers set allow_multi =
      false, retaining "only one ref_obj_id arg" behavior, while kfuncs
      set allow_multi = true and get the last ref_obj_id arg reg back.

  * helper_proto_find_release_arg_regno
    * For helpers only. Searches through fn proto args to find the
      OBJ_RELEASE arg and returns the corresponding regno.

Aside from the intentional semantic change for kfuncs, the rest of the
refactoring strives to keep failure logic and error messages unchanged.
However, because the release arg searching is now done before any
arg-specific type checking, verifier states that are invalid due to both
invalid release arg state _and_ some type- or helper-specific checking
may now see the release arg-related error message first.

Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
---
 kernel/bpf/verifier.c | 206 ++++++++++++++++++++++++++++--------------
 1 file changed, 138 insertions(+), 68 deletions(-)

diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index a5255a0dcbb6..824e2242eae5 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -6412,49 +6412,6 @@ static int check_func_arg(struct bpf_verifier_env *env, u32 arg,
 		return err;
 
 skip_type_check:
-	if (arg_type_is_release(arg_type)) {
-		if (arg_type_is_dynptr(arg_type)) {
-			struct bpf_func_state *state = func(env, reg);
-			int spi;
-
-			/* Only dynptr created on stack can be released, thus
-			 * the get_spi and stack state checks for spilled_ptr
-			 * should only be done before process_dynptr_func for
-			 * PTR_TO_STACK.
-			 */
-			if (reg->type == PTR_TO_STACK) {
-				spi = get_spi(reg->off);
-				if (!is_spi_bounds_valid(state, spi, BPF_DYNPTR_NR_SLOTS) ||
-				    !state->stack[spi].spilled_ptr.ref_obj_id) {
-					verbose(env, "arg %d is an unacquired reference\n", regno);
-					return -EINVAL;
-				}
-			} else {
-				verbose(env, "cannot release unowned const bpf_dynptr\n");
-				return -EINVAL;
-			}
-		} else if (!reg->ref_obj_id && !register_is_null(reg)) {
-			verbose(env, "R%d must be referenced when passed to release function\n",
-				regno);
-			return -EINVAL;
-		}
-		if (meta->release_regno) {
-			verbose(env, "verifier internal error: more than one release argument\n");
-			return -EFAULT;
-		}
-		meta->release_regno = regno;
-	}
-
-	if (reg->ref_obj_id) {
-		if (meta->ref_obj_id) {
-			verbose(env, "verifier internal error: more than one arg with ref_obj_id R%d %u %u\n",
-				regno, reg->ref_obj_id,
-				meta->ref_obj_id);
-			return -EFAULT;
-		}
-		meta->ref_obj_id = reg->ref_obj_id;
-	}
-
 	switch (base_type(arg_type)) {
 	case ARG_CONST_MAP_PTR:
 		/* bpf_map_xxx(map_ptr) call: remember that map_ptr */
@@ -6565,6 +6522,27 @@ static int check_func_arg(struct bpf_verifier_env *env, u32 arg,
 		err = check_mem_size_reg(env, reg, regno, true, meta);
 		break;
 	case ARG_PTR_TO_DYNPTR:
+		if (meta->release_regno == regno) {
+			struct bpf_func_state *state = func(env, reg);
+			int spi;
+
+			/* Only dynptr created on stack can be released, thus
+			 * the get_spi and stack state checks for spilled_ptr
+			 * should only be done before process_dynptr_func for
+			 * PTR_TO_STACK.
+			 */
+			if (reg->type == PTR_TO_STACK) {
+				spi = get_spi(reg->off);
+				if (!is_spi_bounds_valid(state, spi, BPF_DYNPTR_NR_SLOTS) ||
+				    !state->stack[spi].spilled_ptr.ref_obj_id) {
+					verbose(env, "arg %d is an unacquired reference\n", regno);
+					return -EINVAL;
+				}
+			} else {
+				verbose(env, "cannot release unowned const bpf_dynptr\n");
+				return -EINVAL;
+			}
+		}
 		err = process_dynptr_func(env, regno, arg_type, meta);
 		if (err)
 			return err;
@@ -7699,10 +7677,78 @@ static void update_loop_inline_state(struct bpf_verifier_env *env, u32 subprogno
 				 state->callback_subprogno == subprogno);
 }
 
+/* Call arg meta's ref_obj_id is used to either:
+ *   - For release funcs, keep track of ref that needs to be released
+ *   - For other funcs, keep track of ref that needs to be propagated to retval
+ *
+ * Find and return:
+ *   - Regno that should become meta->ref_obj_id on success
+ *     (regno > 0 since BPF_REG_1 is first arg)
+ *   - 0 if no arg had ref_obj_id set
+ *   - Negative err if some invalid arg reg state
+ *
+ * allow_multi controls whether multiple args w/ ref_obj_id set is valid
+ *   - true: regno of _last_ such arg reg is returned
+ *   - false: err if multiple args w/ ref_obj_id set are seen
+ */
+static int args_find_ref_obj_id_regno(struct bpf_verifier_env *env, struct bpf_reg_state *regs,
+				      u32 nargs, bool allow_multi)
+{
+	struct bpf_reg_state *reg;
+	u32 i, regno, found_regno = 0;
+
+	for (i = 0; i < nargs; i++) {
+		regno = i + 1;
+		reg = &regs[regno];
+
+		if (!reg->ref_obj_id)
+			continue;
+
+		if (!allow_multi && found_regno) {
+			verbose(env, "verifier internal error: more than one arg with ref_obj_id R%d %u %u\n",
+				regno, reg->ref_obj_id, regs[found_regno].ref_obj_id);
+			return -EFAULT;
+		}
+
+		found_regno = regno;
+	}
+
+	return found_regno;
+}
+
+/* Find the OBJ_RELEASE arg in helper func proto and return:
+ *   - regno of single OBJ_RELEASE arg
+ *   - 0 if no arg in the proto was OBJ_RELEASE
+ *   - Negative err if some invalid func proto state
+ */
+static int helper_proto_find_release_arg_regno(struct bpf_verifier_env *env,
+					       const struct bpf_func_proto *fn, u32 nargs)
+{
+	enum bpf_arg_type arg_type;
+	int i, release_regno = 0;
+
+	for (i = 0; i < nargs; i++) {
+		arg_type = fn->arg_type[i];
+
+		if (!arg_type_is_release(arg_type))
+			continue;
+
+		if (release_regno) {
+			verbose(env, "verifier internal error: more than one release argument\n");
+			return -EFAULT;
+		}
+
+		release_regno = i + BPF_REG_1;
+	}
+
+	return release_regno;
+}
+
 static int check_helper_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
 			     int *insn_idx_p)
 {
 	enum bpf_prog_type prog_type = resolve_prog_type(env->prog);
+	int i, err, func_id, nargs, release_regno, ref_regno;
 	const struct bpf_func_proto *fn = NULL;
 	enum bpf_return_type ret_type;
 	enum bpf_type_flag ret_flag;
@@ -7710,7 +7756,6 @@ static int check_helper_call(struct bpf_verifier_env *env, struct bpf_insn *insn
 	struct bpf_call_arg_meta meta;
 	int insn_idx = *insn_idx_p;
 	bool changes_data;
-	int i, err, func_id;
 
 	/* find function prototype */
 	func_id = insn->imm;
@@ -7774,8 +7819,38 @@ static int check_helper_call(struct bpf_verifier_env *env, struct bpf_insn *insn
 	}
 
 	meta.func_id = func_id;
+	regs = cur_regs(env);
+
+	/* find actual arg count */
+	for (i = 0; i < MAX_BPF_FUNC_REG_ARGS; i++)
+		if (fn->arg_type[i] == ARG_DONTCARE)
+			break;
+	nargs = i;
+
+	release_regno = helper_proto_find_release_arg_regno(env, fn, nargs);
+	if (release_regno < 0)
+		return release_regno;
+
+	ref_regno = args_find_ref_obj_id_regno(env, regs, nargs, false);
+	if (ref_regno < 0)
+		return ref_regno;
+	else if (ref_regno > 0)
+		meta.ref_obj_id = regs[ref_regno].ref_obj_id;
+
+	if (release_regno > 0) {
+		if (!regs[release_regno].ref_obj_id &&
+		    !register_is_null(&regs[release_regno]) &&
+		    !arg_type_is_dynptr(fn->arg_type[release_regno - BPF_REG_1])) {
+			verbose(env, "R%d must be referenced when passed to release function\n",
+				release_regno);
+			return -EINVAL;
+		}
+
+		meta.release_regno = release_regno;
+	}
+
 	/* check args */
-	for (i = 0; i < MAX_BPF_FUNC_REG_ARGS; i++) {
+	for (i = 0; i < nargs; i++) {
 		err = check_func_arg(env, i, &meta, fn);
 		if (err)
 			return err;
@@ -7799,8 +7874,6 @@ static int check_helper_call(struct bpf_verifier_env *env, struct bpf_insn *insn
 			return err;
 	}
 
-	regs = cur_regs(env);
-
 	/* This can only be set for PTR_TO_STACK, as CONST_PTR_TO_DYNPTR cannot
 	 * be reinitialized by any dynptr helper. Hence, mark_stack_slots_dynptr
 	 * is safe to do directly.
@@ -8795,10 +8868,11 @@ static int process_kf_arg_ptr_to_list_node(struct bpf_verifier_env *env,
 static int check_kfunc_args(struct bpf_verifier_env *env, struct bpf_kfunc_call_arg_meta *meta)
 {
 	const char *func_name = meta->func_name, *ref_tname;
+	struct bpf_reg_state *regs = cur_regs(env);
 	const struct btf *btf = meta->btf;
 	const struct btf_param *args;
 	u32 i, nargs;
-	int ret;
+	int ret, ref_regno;
 
 	args = (const struct btf_param *)(meta->func_proto + 1);
 	nargs = btf_type_vlen(meta->func_proto);
@@ -8808,17 +8882,31 @@ static int check_kfunc_args(struct bpf_verifier_env *env, struct bpf_kfunc_call_
 		return -EINVAL;
 	}
 
+	ref_regno = args_find_ref_obj_id_regno(env, cur_regs(env), nargs, true);
+	if (ref_regno < 0) {
+		return ref_regno;
+	} else if (!ref_regno && is_kfunc_release(meta)) {
+		verbose(env, "release kernel function %s expects refcounted PTR_TO_BTF_ID\n",
+			func_name);
+		return -EINVAL;
+	}
+
+	meta->ref_obj_id = regs[ref_regno].ref_obj_id;
+	if (is_kfunc_release(meta))
+		meta->release_regno = ref_regno;
+
 	/* Check that BTF function arguments match actual types that the
 	 * verifier sees.
 	 */
 	for (i = 0; i < nargs; i++) {
-		struct bpf_reg_state *regs = cur_regs(env), *reg = &regs[i + 1];
 		const struct btf_type *t, *ref_t, *resolve_ret;
 		enum bpf_arg_type arg_type = ARG_DONTCARE;
 		u32 regno = i + 1, ref_id, type_size;
 		bool is_ret_buf_sz = false;
+		struct bpf_reg_state *reg;
 		int kf_arg_type;
 
+		reg = &regs[regno];
 		t = btf_type_skip_modifiers(btf, args[i].type, NULL);
 
 		if (is_kfunc_arg_ignore(btf, &args[i]))
@@ -8875,18 +8963,6 @@ static int check_kfunc_args(struct bpf_verifier_env *env, struct bpf_kfunc_call_
 			return -EINVAL;
 		}
 
-		if (reg->ref_obj_id) {
-			if (is_kfunc_release(meta) && meta->ref_obj_id) {
-				verbose(env, "verifier internal error: more than one arg with ref_obj_id R%d %u %u\n",
-					regno, reg->ref_obj_id,
-					meta->ref_obj_id);
-				return -EFAULT;
-			}
-			meta->ref_obj_id = reg->ref_obj_id;
-			if (is_kfunc_release(meta))
-				meta->release_regno = regno;
-		}
-
 		ref_t = btf_type_skip_modifiers(btf, t->type, &ref_id);
 		ref_tname = btf_name_by_offset(btf, ref_t->name_off);
 
@@ -8929,7 +9005,7 @@ static int check_kfunc_args(struct bpf_verifier_env *env, struct bpf_kfunc_call_
 			return -EFAULT;
 		}
 
-		if (is_kfunc_release(meta) && reg->ref_obj_id)
+		if (is_kfunc_release(meta) && regno == meta->release_regno)
 			arg_type |= OBJ_RELEASE;
 		ret = check_func_arg_reg_off(env, reg, regno, arg_type);
 		if (ret < 0)
@@ -9049,12 +9125,6 @@ static int check_kfunc_args(struct bpf_verifier_env *env, struct bpf_kfunc_call_
 		}
 	}
 
-	if (is_kfunc_release(meta) && !meta->release_regno) {
-		verbose(env, "release kernel function %s expects refcounted PTR_TO_BTF_ID\n",
-			func_name);
-		return -EINVAL;
-	}
-
 	return 0;
 }
 
-- 
2.30.2


* [PATCH v2 bpf-next 02/13] bpf: Migrate release_on_unlock logic to non-owning ref semantics
From: Dave Marchevsky @ 2022-12-17  8:24 UTC
  To: bpf
  Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Kernel Team, Kumar Kartikeya Dwivedi, Tejun Heo, Dave Marchevsky

This patch introduces non-owning reference semantics to the verifier,
specifically to linked_list API kfunc handling. release_on_unlock logic for
refs is refactored - with small functional changes - to implement these
semantics, and bpf_list_push_{front,back} are migrated to use them.

When a list node is pushed to a list, the program still has a pointer to
the node:

  n = bpf_obj_new(typeof(*n));

  bpf_spin_lock(&l);
  bpf_list_push_back(&l, n);
  /* n still points to the just-added node */
  bpf_spin_unlock(&l);

What the verifier considers n to be after the push, and thus what can be
done with n, are changed by this patch.

Properties common to both before and after this patch:
  * After push, n is only a valid reference to the node until end of
    critical section
  * After push, n cannot be pushed to any list
  * After push, the program can read the node's fields using n

Before:
  * After push, n retains the ref_obj_id which it received on
    bpf_obj_new, but the associated bpf_reference_state's
    release_on_unlock field is set to true
    * release_on_unlock field and associated logic is used to implement
      "n is only a valid ref until end of critical section"
  * After push, n cannot be written to; the node must be removed from
    the list before writing to its fields
  * After push, n is marked PTR_UNTRUSTED

After:
  * After push, n's ref is released and ref_obj_id set to 0. The
    bpf_reg_state's non_owning_ref_lock struct is populated with the
    currently active lock
    * non_owning_ref_lock and logic is used to implement "n is only a
      valid ref until end of critical section"
  * n can be written to (except for special fields e.g. bpf_list_node,
    timer, ...)
  * No special type flag is added to n after push
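
Concretely, continuing the example above (the ->data field is
hypothetical, shown only to illustrate a write), the post-patch
behavior is:

  n = bpf_obj_new(typeof(*n));
  if (!n)
    /* skip */

  bpf_spin_lock(&l);
  bpf_list_push_back(&l, n);
  /* Allowed after this patch: n is a writable non-owning ref inside
   * the critical section (previously rejected with "only read is
   * supported")
   */
  n->data = 42;
  bpf_spin_unlock(&l);

  /* Still rejected: the non-owning ref is invalidated at unlock */
  n->data = 43;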

Summary of specific implementation changes to achieve the above:

  * release_on_unlock field, ref_set_release_on_unlock helper, and logic
    to "release on unlock" based on that field are removed

  * The anonymous active_lock struct used by bpf_verifier_state is
    pulled out into a named struct bpf_active_lock.

  * A non_owning_ref_lock field of type bpf_active_lock is added to
    bpf_reg_state's PTR_TO_BTF_ID union

  * Helpers are added to use non_owning_ref_lock to implement non-owning
    ref semantics as described above
    * invalidate_non_owning_refs - helper to clobber all non-owning refs
      matching a particular bpf_active_lock identity. Replaces
      release_on_unlock logic in process_spin_lock.
    * ref_set_non_owning_lock - set non_owning_ref_lock for a reg based
      on current verifier state
    * ref_convert_owning_non_owning - convert owning reference w/
      specified ref_obj_id to non-owning references. Setup
      non_owning_ref_lock for each reg with that ref_obj_id and 0 out
      its ref_obj_id

  * New KF_RELEASE_NON_OWN flag is added, to be used in conjunction with
    KF_RELEASE to indicate that the release arg reg should be converted
    to non-owning ref
    * Plain KF_RELEASE would clobber all regs with ref_obj_id matching
      the release arg reg's. KF_RELEASE_NON_OWN's logic triggers first -
      doing ref_convert_owning_non_owning on the ref first, which
      prevents the regs from being clobbered by 0ing out their
      ref_obj_ids. The bpf_reference_state itself is still released via
      release_reference as a result of the KF_RELEASE flag.
    * KF_RELEASE | KF_RELEASE_NON_OWN are added to
      bpf_list_push_{front,back}

After these changes, linked_list's "release on unlock" logic continues
to function as before, except for the semantic differences noted above.
The patch immediately following this one makes minor changes to
linked_list selftests to account for the differing behavior.

Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
---
 include/linux/bpf.h          |   1 +
 include/linux/bpf_verifier.h |  39 ++++-----
 include/linux/btf.h          |  17 ++--
 kernel/bpf/helpers.c         |   4 +-
 kernel/bpf/verifier.c        | 164 ++++++++++++++++++++++++-----------
 5 files changed, 146 insertions(+), 79 deletions(-)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 3de24cfb7a3d..f71571bf6adc 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -180,6 +180,7 @@ enum btf_field_type {
 	BPF_KPTR       = BPF_KPTR_UNREF | BPF_KPTR_REF,
 	BPF_LIST_HEAD  = (1 << 4),
 	BPF_LIST_NODE  = (1 << 5),
+	BPF_GRAPH_NODE_OR_ROOT = BPF_LIST_NODE | BPF_LIST_HEAD,
 };
 
 struct btf_field_kptr {
diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h
index 53d175cbaa02..cb417ffbbb84 100644
--- a/include/linux/bpf_verifier.h
+++ b/include/linux/bpf_verifier.h
@@ -43,6 +43,22 @@ enum bpf_reg_liveness {
 	REG_LIVE_DONE = 0x8, /* liveness won't be updating this register anymore */
 };
 
+/* For every reg representing a map value or allocated object pointer,
+ * we consider the tuple of (ptr, id) for them to be unique in verifier
+ * context and conside them to not alias each other for the purposes of
+ * tracking lock state.
+ */
+struct bpf_active_lock {
+	/* This can either be reg->map_ptr or reg->btf. If ptr is NULL,
+	 * there's no active lock held, and other fields have no
+	 * meaning. If non-NULL, it indicates that a lock is held and
+	 * id member has the reg->id of the register which can be >= 0.
+	 */
+	void *ptr;
+	/* This will be reg->id */
+	u32 id;
+};
+
 struct bpf_reg_state {
 	/* Ordering of fields matters.  See states_equal() */
 	enum bpf_reg_type type;
@@ -68,6 +84,7 @@ struct bpf_reg_state {
 		struct {
 			struct btf *btf;
 			u32 btf_id;
+			struct bpf_active_lock non_owning_ref_lock;
 		};
 
 		u32 mem_size; /* for PTR_TO_MEM | PTR_TO_MEM_OR_NULL */
@@ -223,11 +240,6 @@ struct bpf_reference_state {
 	 * exiting a callback function.
 	 */
 	int callback_ref;
-	/* Mark the reference state to release the registers sharing the same id
-	 * on bpf_spin_unlock (for nodes that we will lose ownership to but are
-	 * safe to access inside the critical section).
-	 */
-	bool release_on_unlock;
 };
 
 /* state of the program:
@@ -328,21 +340,8 @@ struct bpf_verifier_state {
 	u32 branches;
 	u32 insn_idx;
 	u32 curframe;
-	/* For every reg representing a map value or allocated object pointer,
-	 * we consider the tuple of (ptr, id) for them to be unique in verifier
-	 * context and conside them to not alias each other for the purposes of
-	 * tracking lock state.
-	 */
-	struct {
-		/* This can either be reg->map_ptr or reg->btf. If ptr is NULL,
-		 * there's no active lock held, and other fields have no
-		 * meaning. If non-NULL, it indicates that a lock is held and
-		 * id member has the reg->id of the register which can be >= 0.
-		 */
-		void *ptr;
-		/* This will be reg->id */
-		u32 id;
-	} active_lock;
+
+	struct bpf_active_lock active_lock;
 	bool speculative;
 	bool active_rcu_lock;
 
diff --git a/include/linux/btf.h b/include/linux/btf.h
index 5f628f323442..8aee3f7f4248 100644
--- a/include/linux/btf.h
+++ b/include/linux/btf.h
@@ -15,10 +15,10 @@
 #define BTF_TYPE_EMIT_ENUM(enum_val) ((void)enum_val)
 
 /* These need to be macros, as the expressions are used in assembler input */
-#define KF_ACQUIRE	(1 << 0) /* kfunc is an acquire function */
-#define KF_RELEASE	(1 << 1) /* kfunc is a release function */
-#define KF_RET_NULL	(1 << 2) /* kfunc returns a pointer that may be NULL */
-#define KF_KPTR_GET	(1 << 3) /* kfunc returns reference to a kptr */
+#define KF_ACQUIRE		(1 << 0) /* kfunc is an acquire function */
+#define KF_RELEASE		(1 << 1) /* kfunc is a release function */
+#define KF_RET_NULL		(1 << 2) /* kfunc returns a pointer that may be NULL */
+#define KF_KPTR_GET		(1 << 3) /* kfunc returns reference to a kptr */
 /* Trusted arguments are those which are guaranteed to be valid when passed to
  * the kfunc. It is used to enforce that pointers obtained from either acquire
  * kfuncs, or from the main kernel on a tracepoint or struct_ops callback
@@ -67,10 +67,11 @@
  *	return 0;
  * }
  */
-#define KF_TRUSTED_ARGS (1 << 4) /* kfunc only takes trusted pointer arguments */
-#define KF_SLEEPABLE    (1 << 5) /* kfunc may sleep */
-#define KF_DESTRUCTIVE  (1 << 6) /* kfunc performs destructive actions */
-#define KF_RCU          (1 << 7) /* kfunc only takes rcu pointer arguments */
+#define KF_TRUSTED_ARGS	(1 << 4) /* kfunc only takes trusted pointer arguments */
+#define KF_SLEEPABLE		(1 << 5) /* kfunc may sleep */
+#define KF_DESTRUCTIVE		(1 << 6) /* kfunc performs destructive actions */
+#define KF_RCU			(1 << 7) /* kfunc only takes rcu pointer arguments */
+#define KF_RELEASE_NON_OWN	(1 << 8) /* kfunc converts its referenced arg into non-owning ref */
 
 /*
  * Return the name of the passed struct, if exists, or halt the build if for
diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
index af30c6cbd65d..e041409779c3 100644
--- a/kernel/bpf/helpers.c
+++ b/kernel/bpf/helpers.c
@@ -2049,8 +2049,8 @@ BTF_ID_FLAGS(func, crash_kexec, KF_DESTRUCTIVE)
 #endif
 BTF_ID_FLAGS(func, bpf_obj_new_impl, KF_ACQUIRE | KF_RET_NULL)
 BTF_ID_FLAGS(func, bpf_obj_drop_impl, KF_RELEASE)
-BTF_ID_FLAGS(func, bpf_list_push_front)
-BTF_ID_FLAGS(func, bpf_list_push_back)
+BTF_ID_FLAGS(func, bpf_list_push_front, KF_RELEASE | KF_RELEASE_NON_OWN)
+BTF_ID_FLAGS(func, bpf_list_push_back, KF_RELEASE | KF_RELEASE_NON_OWN)
 BTF_ID_FLAGS(func, bpf_list_pop_front, KF_ACQUIRE | KF_RET_NULL)
 BTF_ID_FLAGS(func, bpf_list_pop_back, KF_ACQUIRE | KF_RET_NULL)
 BTF_ID_FLAGS(func, bpf_task_acquire, KF_ACQUIRE | KF_TRUSTED_ARGS)
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 824e2242eae5..84b0660e2a76 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -190,6 +190,10 @@ struct bpf_verifier_stack_elem {
 
 static int acquire_reference_state(struct bpf_verifier_env *env, int insn_idx);
 static int release_reference(struct bpf_verifier_env *env, int ref_obj_id);
+static void invalidate_non_owning_refs(struct bpf_verifier_env *env,
+				       struct bpf_active_lock *lock);
+static int ref_set_non_owning_lock(struct bpf_verifier_env *env,
+				   struct bpf_reg_state *reg);
 
 static bool bpf_map_ptr_poisoned(const struct bpf_insn_aux_data *aux)
 {
@@ -931,6 +935,9 @@ static void print_verifier_state(struct bpf_verifier_env *env,
 				verbose_a("id=%d", reg->id);
 			if (reg->ref_obj_id)
 				verbose_a("ref_obj_id=%d", reg->ref_obj_id);
+			if (reg->non_owning_ref_lock.ptr)
+				verbose_a("non_own_id=(%p,%d)", reg->non_owning_ref_lock.ptr,
+					  reg->non_owning_ref_lock.id);
 			if (t != SCALAR_VALUE)
 				verbose_a("off=%d", reg->off);
 			if (type_is_pkt_pointer(t))
@@ -4820,7 +4827,8 @@ static int check_ptr_to_btf_access(struct bpf_verifier_env *env,
 			return -EACCES;
 		}
 
-		if (type_is_alloc(reg->type) && !reg->ref_obj_id) {
+		if (type_is_alloc(reg->type) && !reg->ref_obj_id &&
+		    !reg->non_owning_ref_lock.ptr) {
 			verbose(env, "verifier internal error: ref_obj_id for allocated object must be non-zero\n");
 			return -EFAULT;
 		}
@@ -5778,9 +5786,7 @@ static int process_spin_lock(struct bpf_verifier_env *env, int regno,
 			cur->active_lock.ptr = btf;
 		cur->active_lock.id = reg->id;
 	} else {
-		struct bpf_func_state *fstate = cur_func(env);
 		void *ptr;
-		int i;
 
 		if (map)
 			ptr = map;
@@ -5796,25 +5802,11 @@ static int process_spin_lock(struct bpf_verifier_env *env, int regno,
 			verbose(env, "bpf_spin_unlock of different lock\n");
 			return -EINVAL;
 		}
-		cur->active_lock.ptr = NULL;
-		cur->active_lock.id = 0;
 
-		for (i = fstate->acquired_refs - 1; i >= 0; i--) {
-			int err;
+		invalidate_non_owning_refs(env, &cur->active_lock);
 
-			/* Complain on error because this reference state cannot
-			 * be freed before this point, as bpf_spin_lock critical
-			 * section does not allow functions that release the
-			 * allocated object immediately.
-			 */
-			if (!fstate->refs[i].release_on_unlock)
-				continue;
-			err = release_reference(env, fstate->refs[i].id);
-			if (err) {
-				verbose(env, "failed to release release_on_unlock reference");
-				return err;
-			}
-		}
+		cur->active_lock.ptr = NULL;
+		cur->active_lock.id = 0;
 	}
 	return 0;
 }
@@ -6273,6 +6265,23 @@ static int check_reg_type(struct bpf_verifier_env *env, u32 regno,
 	return 0;
 }
 
+static struct btf_field *
+reg_find_field_offset(const struct bpf_reg_state *reg, s32 off, u32 fields)
+{
+	struct btf_field *field;
+	struct btf_record *rec;
+
+	rec = reg_btf_record(reg);
+	if (!rec)
+		return NULL;
+
+	field = btf_record_find(rec, off, fields);
+	if (!field)
+		return NULL;
+
+	return field;
+}
+
 int check_func_arg_reg_off(struct bpf_verifier_env *env,
 			   const struct bpf_reg_state *reg, int regno,
 			   enum bpf_arg_type arg_type)
@@ -6294,6 +6303,18 @@ int check_func_arg_reg_off(struct bpf_verifier_env *env,
 		 */
 		if (arg_type_is_dynptr(arg_type) && type == PTR_TO_STACK)
 			return 0;
+
+		if (type == (PTR_TO_BTF_ID | MEM_ALLOC) && reg->off) {
+			if (reg_find_field_offset(reg, reg->off, BPF_GRAPH_NODE_OR_ROOT))
+				return __check_ptr_off_reg(env, reg, regno, true);
+
+			verbose(env, "R%d must have zero offset when passed to release func\n",
+				regno);
+			verbose(env, "No graph node or root found at R%d type:%s off:%d\n", regno,
+				kernel_type_name(reg->btf, reg->btf_id), reg->off);
+			return -EINVAL;
+		}
+
 		/* Doing check_ptr_off_reg check for the offset will catch this
 		 * because fixed_off_ok is false, but checking here allows us
 		 * to give the user a better error message.
@@ -7055,6 +7076,20 @@ static int release_reference(struct bpf_verifier_env *env,
 	return 0;
 }
 
+static void invalidate_non_owning_refs(struct bpf_verifier_env *env,
+				       struct bpf_active_lock *lock)
+{
+	struct bpf_func_state *unused;
+	struct bpf_reg_state *reg;
+
+	bpf_for_each_reg_in_vstate(env->cur_state, unused, reg, ({
+		if (reg->non_owning_ref_lock.ptr &&
+		    reg->non_owning_ref_lock.ptr == lock->ptr &&
+		    reg->non_owning_ref_lock.id == lock->id)
+			__mark_reg_unknown(env, reg);
+	}));
+}
+
 static void clear_caller_saved_regs(struct bpf_verifier_env *env,
 				    struct bpf_reg_state *regs)
 {
@@ -8266,6 +8301,11 @@ static bool is_kfunc_release(struct bpf_kfunc_call_arg_meta *meta)
 	return meta->kfunc_flags & KF_RELEASE;
 }
 
+static bool is_kfunc_release_non_own(struct bpf_kfunc_call_arg_meta *meta)
+{
+	return meta->kfunc_flags & KF_RELEASE_NON_OWN;
+}
+
 static bool is_kfunc_trusted_args(struct bpf_kfunc_call_arg_meta *meta)
 {
 	return meta->kfunc_flags & KF_TRUSTED_ARGS;
@@ -8651,38 +8691,55 @@ static int process_kf_arg_ptr_to_kptr(struct bpf_verifier_env *env,
 	return 0;
 }
 
-static int ref_set_release_on_unlock(struct bpf_verifier_env *env, u32 ref_obj_id)
+static int ref_set_non_owning_lock(struct bpf_verifier_env *env, struct bpf_reg_state *reg)
 {
-	struct bpf_func_state *state = cur_func(env);
+	struct bpf_verifier_state *state = env->cur_state;
+
+	if (!state->active_lock.ptr) {
+		verbose(env, "verifier internal error: ref_set_non_owning_lock w/o active lock\n");
+		return -EFAULT;
+	}
+
+	if (reg->non_owning_ref_lock.ptr) {
+		verbose(env, "verifier internal error: non_owning_ref_lock already set\n");
+		return -EFAULT;
+	}
+
+	reg->non_owning_ref_lock.id = state->active_lock.id;
+	reg->non_owning_ref_lock.ptr = state->active_lock.ptr;
+	return 0;
+}
+
+static int ref_convert_owning_non_owning(struct bpf_verifier_env *env, u32 ref_obj_id)
+{
+	struct bpf_func_state *state, *unused;
 	struct bpf_reg_state *reg;
 	int i;
 
-	/* bpf_spin_lock only allows calling list_push and list_pop, no BPF
-	 * subprogs, no global functions. This means that the references would
-	 * not be released inside the critical section but they may be added to
-	 * the reference state, and the acquired_refs are never copied out for a
-	 * different frame as BPF to BPF calls don't work in bpf_spin_lock
-	 * critical sections.
-	 */
+	state = cur_func(env);
+
 	if (!ref_obj_id) {
-		verbose(env, "verifier internal error: ref_obj_id is zero for release_on_unlock\n");
+		verbose(env, "verifier internal error: ref_obj_id is zero for "
+			     "owning -> non-owning conversion\n");
 		return -EFAULT;
 	}
+
 	for (i = 0; i < state->acquired_refs; i++) {
-		if (state->refs[i].id == ref_obj_id) {
-			if (state->refs[i].release_on_unlock) {
-				verbose(env, "verifier internal error: expected false release_on_unlock");
-				return -EFAULT;
+		if (state->refs[i].id != ref_obj_id)
+			continue;
+
+		/* Clear ref_obj_id here so release_reference doesn't clobber
+		 * the whole reg
+		 */
+		bpf_for_each_reg_in_vstate(env->cur_state, unused, reg, ({
+			if (reg->ref_obj_id == ref_obj_id) {
+				reg->ref_obj_id = 0;
+				ref_set_non_owning_lock(env, reg);
 			}
-			state->refs[i].release_on_unlock = true;
-			/* Now mark everyone sharing same ref_obj_id as untrusted */
-			bpf_for_each_reg_in_vstate(env->cur_state, state, reg, ({
-				if (reg->ref_obj_id == ref_obj_id)
-					reg->type |= PTR_UNTRUSTED;
-			}));
-			return 0;
-		}
+		}));
+		return 0;
 	}
+
 	verbose(env, "verifier internal error: ref state missing for ref_obj_id\n");
 	return -EFAULT;
 }
@@ -8817,7 +8874,6 @@ static int process_kf_arg_ptr_to_list_node(struct bpf_verifier_env *env,
 {
 	const struct btf_type *et, *t;
 	struct btf_field *field;
-	struct btf_record *rec;
 	u32 list_node_off;
 
 	if (meta->btf != btf_vmlinux ||
@@ -8834,9 +8890,8 @@ static int process_kf_arg_ptr_to_list_node(struct bpf_verifier_env *env,
 		return -EINVAL;
 	}
 
-	rec = reg_btf_record(reg);
 	list_node_off = reg->off + reg->var_off.value;
-	field = btf_record_find(rec, list_node_off, BPF_LIST_NODE);
+	field = reg_find_field_offset(reg, list_node_off, BPF_LIST_NODE);
 	if (!field || field->offset != list_node_off) {
 		verbose(env, "bpf_list_node not found at offset=%u\n", list_node_off);
 		return -EINVAL;
@@ -8861,8 +8916,8 @@ static int process_kf_arg_ptr_to_list_node(struct bpf_verifier_env *env,
 			btf_name_by_offset(field->list_head.btf, et->name_off));
 		return -EINVAL;
 	}
-	/* Set arg#1 for expiration after unlock */
-	return ref_set_release_on_unlock(env, reg->ref_obj_id);
+
+	return 0;
 }
 
 static int check_kfunc_args(struct bpf_verifier_env *env, struct bpf_kfunc_call_arg_meta *meta)
@@ -9132,11 +9187,11 @@ static int check_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
 			    int *insn_idx_p)
 {
 	const struct btf_type *t, *func, *func_proto, *ptr_type;
+	u32 i, nargs, func_id, ptr_type_id, release_ref_obj_id;
 	struct bpf_reg_state *regs = cur_regs(env);
 	const char *func_name, *ptr_type_name;
 	bool sleepable, rcu_lock, rcu_unlock;
 	struct bpf_kfunc_call_arg_meta meta;
-	u32 i, nargs, func_id, ptr_type_id;
 	int err, insn_idx = *insn_idx_p;
 	const struct btf_param *args;
 	const struct btf_type *ret_t;
@@ -9223,7 +9278,18 @@ static int check_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
 	 * PTR_TO_BTF_ID in bpf_kfunc_arg_meta, do the release now.
 	 */
 	if (meta.release_regno) {
-		err = release_reference(env, regs[meta.release_regno].ref_obj_id);
+		err = 0;
+		release_ref_obj_id = regs[meta.release_regno].ref_obj_id;
+
+		if (is_kfunc_release_non_own(&meta))
+			err = ref_convert_owning_non_owning(env, release_ref_obj_id);
+		if (err) {
+			verbose(env, "kfunc %s#%d conversion of owning ref to non-owning failed\n",
+				func_name, func_id);
+			return err;
+		}
+
+		err = release_reference(env, release_ref_obj_id);
 		if (err) {
 			verbose(env, "kfunc %s#%d reference has not been acquired before\n",
 				func_name, func_id);
-- 
2.30.2


* [PATCH v2 bpf-next 03/13] selftests/bpf: Update linked_list tests for non-owning ref semantics
From: Dave Marchevsky @ 2022-12-17  8:24 UTC
  To: bpf
  Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Kernel Team, Kumar Kartikeya Dwivedi, Tejun Heo, Dave Marchevsky

Current linked_list semantics for release_on_unlock node refs are almost
exactly the same as the newly-introduced "non-owning reference" concept. The
only difference: writes to a release_on_unlock node ref are not allowed,
while writes to non-owning reference pointees are.

As a result, the linked_list "write after push" failure tests are no
longer scenarios that should fail.

The test##_missing_lock_##op and test##_incorrect_lock_##op
macro-generated failure tests need to have a valid node argument in
order to have the same error output as before. Otherwise verification
will fail early and the expected error output won't be seen.

Some other tests have minor changes in error output, but fail for the
same reason.

Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
---
 .../selftests/bpf/prog_tests/linked_list.c    |  10 +-
 .../testing/selftests/bpf/progs/linked_list.c |   2 +-
 .../selftests/bpf/progs/linked_list_fail.c    | 100 +++++++++++-------
 3 files changed, 68 insertions(+), 44 deletions(-)

diff --git a/tools/testing/selftests/bpf/prog_tests/linked_list.c b/tools/testing/selftests/bpf/prog_tests/linked_list.c
index 9a7d4c47af63..a8091a0c0831 100644
--- a/tools/testing/selftests/bpf/prog_tests/linked_list.c
+++ b/tools/testing/selftests/bpf/prog_tests/linked_list.c
@@ -78,18 +78,18 @@ static struct {
 	{ "direct_write_head", "direct access to bpf_list_head is disallowed" },
 	{ "direct_read_node", "direct access to bpf_list_node is disallowed" },
 	{ "direct_write_node", "direct access to bpf_list_node is disallowed" },
-	{ "write_after_push_front", "only read is supported" },
-	{ "write_after_push_back", "only read is supported" },
 	{ "use_after_unlock_push_front", "invalid mem access 'scalar'" },
 	{ "use_after_unlock_push_back", "invalid mem access 'scalar'" },
-	{ "double_push_front", "arg#1 expected pointer to allocated object" },
-	{ "double_push_back", "arg#1 expected pointer to allocated object" },
+	{ "double_push_front",
+	  "release kernel function bpf_list_push_front expects refcounted PTR_TO_BTF_ID" },
+	{ "double_push_back",
+	  "release kernel function bpf_list_push_back expects refcounted PTR_TO_BTF_ID" },
 	{ "no_node_value_type", "bpf_list_node not found at offset=0" },
 	{ "incorrect_value_type",
 	  "operation on bpf_list_head expects arg#1 bpf_list_node at offset=0 in struct foo, "
 	  "but arg is at offset=0 in struct bar" },
 	{ "incorrect_node_var_off", "variable ptr_ access var_off=(0x0; 0xffffffff) disallowed" },
-	{ "incorrect_node_off1", "bpf_list_node not found at offset=1" },
+	{ "incorrect_node_off1", "No graph node or root found at R2 type:foo off:1" },
 	{ "incorrect_node_off2", "arg#1 offset=40, but expected bpf_list_node at offset=0 in struct foo" },
 	{ "no_head_type", "bpf_list_head not found at offset=0" },
 	{ "incorrect_head_var_off1", "R1 doesn't have constant offset" },
diff --git a/tools/testing/selftests/bpf/progs/linked_list.c b/tools/testing/selftests/bpf/progs/linked_list.c
index 4ad88da5cda2..4fa4a9b01bde 100644
--- a/tools/testing/selftests/bpf/progs/linked_list.c
+++ b/tools/testing/selftests/bpf/progs/linked_list.c
@@ -260,7 +260,7 @@ int test_list_push_pop_multiple(struct bpf_spin_lock *lock, struct bpf_list_head
 {
 	int ret;
 
-	ret = list_push_pop_multiple(lock ,head, false);
+	ret = list_push_pop_multiple(lock, head, false);
 	if (ret)
 		return ret;
 	return list_push_pop_multiple(lock, head, true);
diff --git a/tools/testing/selftests/bpf/progs/linked_list_fail.c b/tools/testing/selftests/bpf/progs/linked_list_fail.c
index 1d9017240e19..69cdc07cba13 100644
--- a/tools/testing/selftests/bpf/progs/linked_list_fail.c
+++ b/tools/testing/selftests/bpf/progs/linked_list_fail.c
@@ -54,28 +54,44 @@
 		return 0;                                   \
 	}
 
-CHECK(kptr, push_front, &f->head);
-CHECK(kptr, push_back, &f->head);
 CHECK(kptr, pop_front, &f->head);
 CHECK(kptr, pop_back, &f->head);
 
-CHECK(global, push_front, &ghead);
-CHECK(global, push_back, &ghead);
 CHECK(global, pop_front, &ghead);
 CHECK(global, pop_back, &ghead);
 
-CHECK(map, push_front, &v->head);
-CHECK(map, push_back, &v->head);
 CHECK(map, pop_front, &v->head);
 CHECK(map, pop_back, &v->head);
 
-CHECK(inner_map, push_front, &iv->head);
-CHECK(inner_map, push_back, &iv->head);
 CHECK(inner_map, pop_front, &iv->head);
 CHECK(inner_map, pop_back, &iv->head);
 
 #undef CHECK
 
+#define CHECK(test, op, hexpr, nexpr)					\
+	SEC("?tc")							\
+	int test##_missing_lock_##op(void *ctx)				\
+	{								\
+		INIT;							\
+		void (*p)(void *, void *) = (void *)&bpf_list_##op;	\
+		p(hexpr, nexpr);					\
+		return 0;						\
+	}
+
+CHECK(kptr, push_front, &f->head, b);
+CHECK(kptr, push_back, &f->head, b);
+
+CHECK(global, push_front, &ghead, f);
+CHECK(global, push_back, &ghead, f);
+
+CHECK(map, push_front, &v->head, f);
+CHECK(map, push_back, &v->head, f);
+
+CHECK(inner_map, push_front, &iv->head, f);
+CHECK(inner_map, push_back, &iv->head, f);
+
+#undef CHECK
+
 #define CHECK(test, op, lexpr, hexpr)                       \
 	SEC("?tc")                                          \
 	int test##_incorrect_lock_##op(void *ctx)           \
@@ -108,11 +124,47 @@ CHECK(inner_map, pop_back, &iv->head);
 	CHECK(inner_map_global, op, &iv->lock, &ghead);        \
 	CHECK(inner_map_map, op, &iv->lock, &v->head);
 
-CHECK_OP(push_front);
-CHECK_OP(push_back);
 CHECK_OP(pop_front);
 CHECK_OP(pop_back);
 
+#undef CHECK
+#undef CHECK_OP
+
+#define CHECK(test, op, lexpr, hexpr, nexpr)				\
+	SEC("?tc")							\
+	int test##_incorrect_lock_##op(void *ctx)			\
+	{								\
+		INIT;							\
+		void (*p)(void *, void*) = (void *)&bpf_list_##op;	\
+		bpf_spin_lock(lexpr);					\
+		p(hexpr, nexpr);					\
+		return 0;						\
+	}
+
+#define CHECK_OP(op)							\
+	CHECK(kptr_kptr, op, &f1->lock, &f2->head, b);			\
+	CHECK(kptr_global, op, &f1->lock, &ghead, f);			\
+	CHECK(kptr_map, op, &f1->lock, &v->head, f);			\
+	CHECK(kptr_inner_map, op, &f1->lock, &iv->head, f);		\
+									\
+	CHECK(global_global, op, &glock2, &ghead, f);			\
+	CHECK(global_kptr, op, &glock, &f1->head, b);			\
+	CHECK(global_map, op, &glock, &v->head, f);			\
+	CHECK(global_inner_map, op, &glock, &iv->head, f);		\
+									\
+	CHECK(map_map, op, &v->lock, &v2->head, f);			\
+	CHECK(map_kptr, op, &v->lock, &f2->head, b);			\
+	CHECK(map_global, op, &v->lock, &ghead, f);			\
+	CHECK(map_inner_map, op, &v->lock, &iv->head, f);		\
+									\
+	CHECK(inner_map_inner_map, op, &iv->lock, &iv2->head, f);	\
+	CHECK(inner_map_kptr, op, &iv->lock, &f2->head, b);		\
+	CHECK(inner_map_global, op, &iv->lock, &ghead, f);		\
+	CHECK(inner_map_map, op, &iv->lock, &v->head, f);
+
+CHECK_OP(push_front);
+CHECK_OP(push_back);
+
 #undef CHECK
 #undef CHECK_OP
 #undef INIT
@@ -303,34 +355,6 @@ int direct_write_node(void *ctx)
 	return 0;
 }
 
-static __always_inline
-int write_after_op(void (*push_op)(void *head, void *node))
-{
-	struct foo *f;
-
-	f = bpf_obj_new(typeof(*f));
-	if (!f)
-		return 0;
-	bpf_spin_lock(&glock);
-	push_op(&ghead, &f->node);
-	f->data = 42;
-	bpf_spin_unlock(&glock);
-
-	return 0;
-}
-
-SEC("?tc")
-int write_after_push_front(void *ctx)
-{
-	return write_after_op((void *)bpf_list_push_front);
-}
-
-SEC("?tc")
-int write_after_push_back(void *ctx)
-{
-	return write_after_op((void *)bpf_list_push_back);
-}
-
 static __always_inline
 int use_after_unlock(void (*op)(void *head, void *node))
 {
-- 
2.30.2


* [PATCH v2 bpf-next 04/13] bpf: rename list_head -> graph_root in field info types
From: Dave Marchevsky @ 2022-12-17  8:24 UTC
  To: bpf
  Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Kernel Team, Kumar Kartikeya Dwivedi, Tejun Heo, Dave Marchevsky

Many of the structs recently added to track field info for linked-list
head are useful as-is for rbtree root. So let's do a mechanical renaming
of list_head-related types and fields:

include/linux/bpf.h:
  struct btf_field_list_head -> struct btf_field_graph_root
  list_head -> graph_root in struct btf_field union
kernel/bpf/btf.c:
  list_head -> graph_root in struct btf_field_info

This is a nonfunctional change; functionality to actually use these
fields for rbtree will be added in further patches.

Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
---
 include/linux/bpf.h   |  4 ++--
 kernel/bpf/btf.c      | 21 +++++++++++----------
 kernel/bpf/helpers.c  |  4 ++--
 kernel/bpf/verifier.c | 21 +++++++++++----------
 4 files changed, 26 insertions(+), 24 deletions(-)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index f71571bf6adc..3b49c11729b0 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -190,7 +190,7 @@ struct btf_field_kptr {
 	u32 btf_id;
 };
 
-struct btf_field_list_head {
+struct btf_field_graph_root {
 	struct btf *btf;
 	u32 value_btf_id;
 	u32 node_offset;
@@ -202,7 +202,7 @@ struct btf_field {
 	enum btf_field_type type;
 	union {
 		struct btf_field_kptr kptr;
-		struct btf_field_list_head list_head;
+		struct btf_field_graph_root graph_root;
 	};
 };
 
diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c
index f7dd8af06413..578cee398550 100644
--- a/kernel/bpf/btf.c
+++ b/kernel/bpf/btf.c
@@ -3228,7 +3228,7 @@ struct btf_field_info {
 		struct {
 			const char *node_name;
 			u32 value_btf_id;
-		} list_head;
+		} graph_root;
 	};
 };
 
@@ -3335,8 +3335,8 @@ static int btf_find_list_head(const struct btf *btf, const struct btf_type *pt,
 		return -EINVAL;
 	info->type = BPF_LIST_HEAD;
 	info->off = off;
-	info->list_head.value_btf_id = id;
-	info->list_head.node_name = list_node;
+	info->graph_root.value_btf_id = id;
+	info->graph_root.node_name = list_node;
 	return BTF_FIELD_FOUND;
 }
 
@@ -3604,13 +3604,14 @@ static int btf_parse_list_head(const struct btf *btf, struct btf_field *field,
 	u32 offset;
 	int i;
 
-	t = btf_type_by_id(btf, info->list_head.value_btf_id);
+	t = btf_type_by_id(btf, info->graph_root.value_btf_id);
 	/* We've already checked that value_btf_id is a struct type. We
 	 * just need to figure out the offset of the list_node, and
 	 * verify its type.
 	 */
 	for_each_member(i, t, member) {
-		if (strcmp(info->list_head.node_name, __btf_name_by_offset(btf, member->name_off)))
+		if (strcmp(info->graph_root.node_name,
+			   __btf_name_by_offset(btf, member->name_off)))
 			continue;
 		/* Invalid BTF, two members with same name */
 		if (n)
@@ -3627,9 +3628,9 @@ static int btf_parse_list_head(const struct btf *btf, struct btf_field *field,
 		if (offset % __alignof__(struct bpf_list_node))
 			return -EINVAL;
 
-		field->list_head.btf = (struct btf *)btf;
-		field->list_head.value_btf_id = info->list_head.value_btf_id;
-		field->list_head.node_offset = offset;
+		field->graph_root.btf = (struct btf *)btf;
+		field->graph_root.value_btf_id = info->graph_root.value_btf_id;
+		field->graph_root.node_offset = offset;
 	}
 	if (!n)
 		return -ENOENT;
@@ -3736,11 +3737,11 @@ int btf_check_and_fixup_fields(const struct btf *btf, struct btf_record *rec)
 
 		if (!(rec->fields[i].type & BPF_LIST_HEAD))
 			continue;
-		btf_id = rec->fields[i].list_head.value_btf_id;
+		btf_id = rec->fields[i].graph_root.value_btf_id;
 		meta = btf_find_struct_meta(btf, btf_id);
 		if (!meta)
 			return -EFAULT;
-		rec->fields[i].list_head.value_rec = meta->record;
+		rec->fields[i].graph_root.value_rec = meta->record;
 
 		if (!(rec->field_mask & BPF_LIST_NODE))
 			continue;
diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
index e041409779c3..1df87af6919e 100644
--- a/kernel/bpf/helpers.c
+++ b/kernel/bpf/helpers.c
@@ -1745,12 +1745,12 @@ void bpf_list_head_free(const struct btf_field *field, void *list_head,
 	while (head != orig_head) {
 		void *obj = head;
 
-		obj -= field->list_head.node_offset;
+		obj -= field->graph_root.node_offset;
 		head = head->next;
 		/* The contained type can also have resources, including a
 		 * bpf_list_head which needs to be freed.
 		 */
-		bpf_obj_free_fields(field->list_head.value_rec, obj);
+		bpf_obj_free_fields(field->graph_root.value_rec, obj);
 		/* bpf_mem_free requires migrate_disable(), since we can be
 		 * called from map free path as well apart from BPF program (as
 		 * part of map ops doing bpf_obj_free_fields).
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 84b0660e2a76..c914230beea7 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -8899,21 +8899,22 @@ static int process_kf_arg_ptr_to_list_node(struct bpf_verifier_env *env,
 
 	field = meta->arg_list_head.field;
 
-	et = btf_type_by_id(field->list_head.btf, field->list_head.value_btf_id);
+	et = btf_type_by_id(field->graph_root.btf, field->graph_root.value_btf_id);
 	t = btf_type_by_id(reg->btf, reg->btf_id);
-	if (!btf_struct_ids_match(&env->log, reg->btf, reg->btf_id, 0, field->list_head.btf,
-				  field->list_head.value_btf_id, true)) {
+	if (!btf_struct_ids_match(&env->log, reg->btf, reg->btf_id, 0, field->graph_root.btf,
+				  field->graph_root.value_btf_id, true)) {
 		verbose(env, "operation on bpf_list_head expects arg#1 bpf_list_node at offset=%d "
 			"in struct %s, but arg is at offset=%d in struct %s\n",
-			field->list_head.node_offset, btf_name_by_offset(field->list_head.btf, et->name_off),
+			field->graph_root.node_offset,
+			btf_name_by_offset(field->graph_root.btf, et->name_off),
 			list_node_off, btf_name_by_offset(reg->btf, t->name_off));
 		return -EINVAL;
 	}
 
-	if (list_node_off != field->list_head.node_offset) {
+	if (list_node_off != field->graph_root.node_offset) {
 		verbose(env, "arg#1 offset=%d, but expected bpf_list_node at offset=%d in struct %s\n",
-			list_node_off, field->list_head.node_offset,
-			btf_name_by_offset(field->list_head.btf, et->name_off));
+			list_node_off, field->graph_root.node_offset,
+			btf_name_by_offset(field->graph_root.btf, et->name_off));
 		return -EINVAL;
 	}
 
@@ -9363,9 +9364,9 @@ static int check_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
 
 				mark_reg_known_zero(env, regs, BPF_REG_0);
 				regs[BPF_REG_0].type = PTR_TO_BTF_ID | MEM_ALLOC;
-				regs[BPF_REG_0].btf = field->list_head.btf;
-				regs[BPF_REG_0].btf_id = field->list_head.value_btf_id;
-				regs[BPF_REG_0].off = field->list_head.node_offset;
+				regs[BPF_REG_0].btf = field->graph_root.btf;
+				regs[BPF_REG_0].btf_id = field->graph_root.value_btf_id;
+				regs[BPF_REG_0].off = field->graph_root.node_offset;
 			} else if (meta.func_id == special_kfunc_list[KF_bpf_cast_to_kern_ctx]) {
 				mark_reg_known_zero(env, regs, BPF_REG_0);
 				regs[BPF_REG_0].type = PTR_TO_BTF_ID | PTR_TRUSTED;
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH v2 bpf-next 05/13] bpf: Add basic bpf_rb_{root,node} support
  2022-12-17  8:24 [PATCH v2 bpf-next 00/13] BPF rbtree next-gen datastructure Dave Marchevsky
                   ` (3 preceding siblings ...)
  2022-12-17  8:24 ` [PATCH v2 bpf-next 04/13] bpf: rename list_head -> graph_root in field info types Dave Marchevsky
@ 2022-12-17  8:24 ` Dave Marchevsky
  2022-12-17  8:24 ` [PATCH v2 bpf-next 06/13] bpf: Add bpf_rbtree_{add,remove,first} kfuncs Dave Marchevsky
                   ` (7 subsequent siblings)
  12 siblings, 0 replies; 38+ messages in thread
From: Dave Marchevsky @ 2022-12-17  8:24 UTC (permalink / raw)
  To: bpf
  Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Kernel Team, Kumar Kartikeya Dwivedi, Tejun Heo, Dave Marchevsky

This patch adds special BPF_RB_{ROOT,NODE} btf_field_types similar to
BPF_LIST_{HEAD,NODE}, adds the necessary plumbing to detect the new
types, and adds bpf_rb_root_free function for freeing bpf_rb_root in
map_values.

structs bpf_rb_root and bpf_rb_node are opaque types meant to
obscure structs rb_root_cached and rb_node, respectively.

btf_struct_access will prevent BPF programs from touching these special
fields automatically now that they're recognized.

btf_check_and_fixup_fields now groups list_head and rb_root together as
"graph root" fields and {list,rb}_node as "graph node" fields, and does
the same ownership cycle checking as before. Note that this function
does _not_ prevent ownership type mixups (e.g. rb_root owning list_node)
- that's handled by btf_parse_graph_root.
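
As an illustration of the cycle check (hypothetical types, not taken
from this series; __contains as provided by bpf_experimental.h), two
local types that own each other through their graph roots are rejected:

  struct bar;

  struct foo {
    struct bpf_spin_lock lock;
    struct bpf_rb_root root __contains(bar, rnode);
    struct bpf_rb_node rnode;
  };

  struct bar {
    struct bpf_spin_lock lock;
    struct bpf_rb_root root __contains(foo, rnode);
    struct bpf_rb_node rnode;
  };

Here foo is both a graph root and a graph node, and its element type bar
is itself a root, so btf_check_and_fixup_fields returns -ELOOP.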

After this patch, a bpf program can have a struct bpf_rb_root in a
map_value, but cannot yet add anything to it or do anything else useful
with it.
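
For example, after this patch a map value like the following
(hypothetical names) passes btf_parse_fields / map_check_btf, even
though no kfunc can operate on the tree yet:

  struct elem {
    long key;
    struct bpf_rb_node rnode;
  };

  struct map_value {
    struct bpf_spin_lock lock;
    struct bpf_rb_root root __contains(elem, rnode);
  };

  struct {
    __uint(type, BPF_MAP_TYPE_ARRAY);
    __type(key, int);
    __type(value, struct map_value);
    __uint(max_entries, 1);
  } rb_map SEC(".maps");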

Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
---
 include/linux/bpf.h                           |  20 ++-
 include/uapi/linux/bpf.h                      |  11 ++
 kernel/bpf/btf.c                              | 162 ++++++++++++------
 kernel/bpf/helpers.c                          |  40 +++++
 kernel/bpf/syscall.c                          |  28 ++-
 kernel/bpf/verifier.c                         |   5 +-
 tools/include/uapi/linux/bpf.h                |  11 ++
 .../selftests/bpf/prog_tests/linked_list.c    |  12 +-
 8 files changed, 216 insertions(+), 73 deletions(-)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 3b49c11729b0..01ada7e04fa7 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -180,7 +180,10 @@ enum btf_field_type {
 	BPF_KPTR       = BPF_KPTR_UNREF | BPF_KPTR_REF,
 	BPF_LIST_HEAD  = (1 << 4),
 	BPF_LIST_NODE  = (1 << 5),
-	BPF_GRAPH_NODE_OR_ROOT = BPF_LIST_NODE | BPF_LIST_HEAD,
+	BPF_RB_ROOT    = (1 << 6),
+	BPF_RB_NODE    = (1 << 7),
+	BPF_GRAPH_NODE_OR_ROOT = BPF_LIST_NODE | BPF_LIST_HEAD |
+				 BPF_RB_NODE | BPF_RB_ROOT,
 };
 
 struct btf_field_kptr {
@@ -284,6 +287,10 @@ static inline const char *btf_field_type_name(enum btf_field_type type)
 		return "bpf_list_head";
 	case BPF_LIST_NODE:
 		return "bpf_list_node";
+	case BPF_RB_ROOT:
+		return "bpf_rb_root";
+	case BPF_RB_NODE:
+		return "bpf_rb_node";
 	default:
 		WARN_ON_ONCE(1);
 		return "unknown";
@@ -304,6 +311,10 @@ static inline u32 btf_field_type_size(enum btf_field_type type)
 		return sizeof(struct bpf_list_head);
 	case BPF_LIST_NODE:
 		return sizeof(struct bpf_list_node);
+	case BPF_RB_ROOT:
+		return sizeof(struct bpf_rb_root);
+	case BPF_RB_NODE:
+		return sizeof(struct bpf_rb_node);
 	default:
 		WARN_ON_ONCE(1);
 		return 0;
@@ -324,6 +335,10 @@ static inline u32 btf_field_type_align(enum btf_field_type type)
 		return __alignof__(struct bpf_list_head);
 	case BPF_LIST_NODE:
 		return __alignof__(struct bpf_list_node);
+	case BPF_RB_ROOT:
+		return __alignof__(struct bpf_rb_root);
+	case BPF_RB_NODE:
+		return __alignof__(struct bpf_rb_node);
 	default:
 		WARN_ON_ONCE(1);
 		return 0;
@@ -434,6 +449,9 @@ void copy_map_value_locked(struct bpf_map *map, void *dst, void *src,
 void bpf_timer_cancel_and_free(void *timer);
 void bpf_list_head_free(const struct btf_field *field, void *list_head,
 			struct bpf_spin_lock *spin_lock);
+void bpf_rb_root_free(const struct btf_field *field, void *rb_root,
+		      struct bpf_spin_lock *spin_lock);
+
 
 int bpf_obj_name_cpy(char *dst, const char *src, unsigned int size);
 
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 464ca3f01fe7..bd260134c420 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -6901,6 +6901,17 @@ struct bpf_list_node {
 	__u64 :64;
 } __attribute__((aligned(8)));
 
+struct bpf_rb_root {
+	__u64 :64;
+	__u64 :64;
+} __attribute__((aligned(8)));
+
+struct bpf_rb_node {
+	__u64 :64;
+	__u64 :64;
+	__u64 :64;
+} __attribute__((aligned(8)));
+
 struct bpf_sysctl {
 	__u32	write;		/* Sysctl is being read (= 0) or written (= 1).
 				 * Allows 1,2,4-byte read, but no write.
diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c
index 578cee398550..830bf2a58402 100644
--- a/kernel/bpf/btf.c
+++ b/kernel/bpf/btf.c
@@ -3305,12 +3305,14 @@ static const char *btf_find_decl_tag_value(const struct btf *btf,
 	return NULL;
 }
 
-static int btf_find_list_head(const struct btf *btf, const struct btf_type *pt,
-			      const struct btf_type *t, int comp_idx,
-			      u32 off, int sz, struct btf_field_info *info)
+static int
+btf_find_graph_root(const struct btf *btf, const struct btf_type *pt,
+		    const struct btf_type *t, int comp_idx, u32 off,
+		    int sz, struct btf_field_info *info,
+		    enum btf_field_type head_type)
 {
+	const char *node_field_name;
 	const char *value_type;
-	const char *list_node;
 	s32 id;
 
 	if (!__btf_type_is_struct(t))
@@ -3320,26 +3322,32 @@ static int btf_find_list_head(const struct btf *btf, const struct btf_type *pt,
 	value_type = btf_find_decl_tag_value(btf, pt, comp_idx, "contains:");
 	if (!value_type)
 		return -EINVAL;
-	list_node = strstr(value_type, ":");
-	if (!list_node)
+	node_field_name = strstr(value_type, ":");
+	if (!node_field_name)
 		return -EINVAL;
-	value_type = kstrndup(value_type, list_node - value_type, GFP_KERNEL | __GFP_NOWARN);
+	value_type = kstrndup(value_type, node_field_name - value_type, GFP_KERNEL | __GFP_NOWARN);
 	if (!value_type)
 		return -ENOMEM;
 	id = btf_find_by_name_kind(btf, value_type, BTF_KIND_STRUCT);
 	kfree(value_type);
 	if (id < 0)
 		return id;
-	list_node++;
-	if (str_is_empty(list_node))
+	node_field_name++;
+	if (str_is_empty(node_field_name))
 		return -EINVAL;
-	info->type = BPF_LIST_HEAD;
+	info->type = head_type;
 	info->off = off;
 	info->graph_root.value_btf_id = id;
-	info->graph_root.node_name = list_node;
+	info->graph_root.node_name = node_field_name;
 	return BTF_FIELD_FOUND;
 }
 
+#define field_mask_test_name(field_type, field_type_str) \
+	if (field_mask & field_type && !strcmp(name, field_type_str)) { \
+		type = field_type;					\
+		goto end;						\
+	}
+
 static int btf_get_field_type(const char *name, u32 field_mask, u32 *seen_mask,
 			      int *align, int *sz)
 {
@@ -3363,18 +3371,11 @@ static int btf_get_field_type(const char *name, u32 field_mask, u32 *seen_mask,
 			goto end;
 		}
 	}
-	if (field_mask & BPF_LIST_HEAD) {
-		if (!strcmp(name, "bpf_list_head")) {
-			type = BPF_LIST_HEAD;
-			goto end;
-		}
-	}
-	if (field_mask & BPF_LIST_NODE) {
-		if (!strcmp(name, "bpf_list_node")) {
-			type = BPF_LIST_NODE;
-			goto end;
-		}
-	}
+	field_mask_test_name(BPF_LIST_HEAD, "bpf_list_head");
+	field_mask_test_name(BPF_LIST_NODE, "bpf_list_node");
+	field_mask_test_name(BPF_RB_ROOT,   "bpf_rb_root");
+	field_mask_test_name(BPF_RB_NODE,   "bpf_rb_node");
+
 	/* Only return BPF_KPTR when all other types with matchable names fail */
 	if (field_mask & BPF_KPTR) {
 		type = BPF_KPTR_REF;
@@ -3387,6 +3388,8 @@ static int btf_get_field_type(const char *name, u32 field_mask, u32 *seen_mask,
 	return type;
 }
 
+#undef field_mask_test_name
+
 static int btf_find_struct_field(const struct btf *btf,
 				 const struct btf_type *t, u32 field_mask,
 				 struct btf_field_info *info, int info_cnt)
@@ -3419,6 +3422,7 @@ static int btf_find_struct_field(const struct btf *btf,
 		case BPF_SPIN_LOCK:
 		case BPF_TIMER:
 		case BPF_LIST_NODE:
+		case BPF_RB_NODE:
 			ret = btf_find_struct(btf, member_type, off, sz, field_type,
 					      idx < info_cnt ? &info[idx] : &tmp);
 			if (ret < 0)
@@ -3432,8 +3436,11 @@ static int btf_find_struct_field(const struct btf *btf,
 				return ret;
 			break;
 		case BPF_LIST_HEAD:
-			ret = btf_find_list_head(btf, t, member_type, i, off, sz,
-						 idx < info_cnt ? &info[idx] : &tmp);
+		case BPF_RB_ROOT:
+			ret = btf_find_graph_root(btf, t, member_type,
+						  i, off, sz,
+						  idx < info_cnt ? &info[idx] : &tmp,
+						  field_type);
 			if (ret < 0)
 				return ret;
 			break;
@@ -3480,6 +3487,7 @@ static int btf_find_datasec_var(const struct btf *btf, const struct btf_type *t,
 		case BPF_SPIN_LOCK:
 		case BPF_TIMER:
 		case BPF_LIST_NODE:
+		case BPF_RB_NODE:
 			ret = btf_find_struct(btf, var_type, off, sz, field_type,
 					      idx < info_cnt ? &info[idx] : &tmp);
 			if (ret < 0)
@@ -3493,8 +3501,11 @@ static int btf_find_datasec_var(const struct btf *btf, const struct btf_type *t,
 				return ret;
 			break;
 		case BPF_LIST_HEAD:
-			ret = btf_find_list_head(btf, var, var_type, -1, off, sz,
-						 idx < info_cnt ? &info[idx] : &tmp);
+		case BPF_RB_ROOT:
+			ret = btf_find_graph_root(btf, var, var_type,
+						  -1, off, sz,
+						  idx < info_cnt ? &info[idx] : &tmp,
+						  field_type);
 			if (ret < 0)
 				return ret;
 			break;
@@ -3596,8 +3607,11 @@ static int btf_parse_kptr(const struct btf *btf, struct btf_field *field,
 	return ret;
 }
 
-static int btf_parse_list_head(const struct btf *btf, struct btf_field *field,
-			       struct btf_field_info *info)
+static int btf_parse_graph_root(const struct btf *btf,
+				struct btf_field *field,
+				struct btf_field_info *info,
+				const char *node_type_name,
+				size_t node_type_align)
 {
 	const struct btf_type *t, *n = NULL;
 	const struct btf_member *member;
@@ -3619,13 +3633,13 @@ static int btf_parse_list_head(const struct btf *btf, struct btf_field *field,
 		n = btf_type_by_id(btf, member->type);
 		if (!__btf_type_is_struct(n))
 			return -EINVAL;
-		if (strcmp("bpf_list_node", __btf_name_by_offset(btf, n->name_off)))
+		if (strcmp(node_type_name, __btf_name_by_offset(btf, n->name_off)))
 			return -EINVAL;
 		offset = __btf_member_bit_offset(n, member);
 		if (offset % 8)
 			return -EINVAL;
 		offset /= 8;
-		if (offset % __alignof__(struct bpf_list_node))
+		if (offset % node_type_align)
 			return -EINVAL;
 
 		field->graph_root.btf = (struct btf *)btf;
@@ -3637,6 +3651,20 @@ static int btf_parse_list_head(const struct btf *btf, struct btf_field *field,
 	return 0;
 }
 
+static int btf_parse_list_head(const struct btf *btf, struct btf_field *field,
+			       struct btf_field_info *info)
+{
+	return btf_parse_graph_root(btf, field, info, "bpf_list_node",
+					    __alignof__(struct bpf_list_node));
+}
+
+static int btf_parse_rb_root(const struct btf *btf, struct btf_field *field,
+			     struct btf_field_info *info)
+{
+	return btf_parse_graph_root(btf, field, info, "bpf_rb_node",
+					    __alignof__(struct bpf_rb_node));
+}
+
 struct btf_record *btf_parse_fields(const struct btf *btf, const struct btf_type *t,
 				    u32 field_mask, u32 value_size)
 {
@@ -3699,7 +3727,13 @@ struct btf_record *btf_parse_fields(const struct btf *btf, const struct btf_type
 			if (ret < 0)
 				goto end;
 			break;
+		case BPF_RB_ROOT:
+			ret = btf_parse_rb_root(btf, &rec->fields[i], &info_arr[i]);
+			if (ret < 0)
+				goto end;
+			break;
 		case BPF_LIST_NODE:
+		case BPF_RB_NODE:
 			break;
 		default:
 			ret = -EFAULT;
@@ -3708,8 +3742,9 @@ struct btf_record *btf_parse_fields(const struct btf *btf, const struct btf_type
 		rec->cnt++;
 	}
 
-	/* bpf_list_head requires bpf_spin_lock */
-	if (btf_record_has_field(rec, BPF_LIST_HEAD) && rec->spin_lock_off < 0) {
+	/* bpf_{list_head, rb_root} require bpf_spin_lock */
+	if ((btf_record_has_field(rec, BPF_LIST_HEAD) ||
+	     btf_record_has_field(rec, BPF_RB_ROOT)) && rec->spin_lock_off < 0) {
 		ret = -EINVAL;
 		goto end;
 	}
@@ -3720,22 +3755,28 @@ struct btf_record *btf_parse_fields(const struct btf *btf, const struct btf_type
 	return ERR_PTR(ret);
 }
 
+#define GRAPH_ROOT_MASK (BPF_LIST_HEAD | BPF_RB_ROOT)
+#define GRAPH_NODE_MASK (BPF_LIST_NODE | BPF_RB_NODE)
+
 int btf_check_and_fixup_fields(const struct btf *btf, struct btf_record *rec)
 {
 	int i;
 
-	/* There are two owning types, kptr_ref and bpf_list_head. The former
-	 * only supports storing kernel types, which can never store references
-	 * to program allocated local types, atleast not yet. Hence we only need
-	 * to ensure that bpf_list_head ownership does not form cycles.
+	/* There are three types that signify ownership of some other type:
+	 *  kptr_ref, bpf_list_head, bpf_rb_root.
+	 * kptr_ref only supports storing kernel types, which can't store
+	 * references to program allocated local types.
+	 *
+	 * Hence we only need to ensure that bpf_{list_head,rb_root} ownership
+	 * does not form cycles.
 	 */
-	if (IS_ERR_OR_NULL(rec) || !(rec->field_mask & BPF_LIST_HEAD))
+	if (IS_ERR_OR_NULL(rec) || !(rec->field_mask & GRAPH_ROOT_MASK))
 		return 0;
 	for (i = 0; i < rec->cnt; i++) {
 		struct btf_struct_meta *meta;
 		u32 btf_id;
 
-		if (!(rec->fields[i].type & BPF_LIST_HEAD))
+		if (!(rec->fields[i].type & GRAPH_ROOT_MASK))
 			continue;
 		btf_id = rec->fields[i].graph_root.value_btf_id;
 		meta = btf_find_struct_meta(btf, btf_id);
@@ -3743,39 +3784,47 @@ int btf_check_and_fixup_fields(const struct btf *btf, struct btf_record *rec)
 			return -EFAULT;
 		rec->fields[i].graph_root.value_rec = meta->record;
 
-		if (!(rec->field_mask & BPF_LIST_NODE))
+		/* We need to set value_rec for all root types, but no need
+		 * to check ownership cycle for a type unless it's also a
+		 * node type.
+		 */
+		if (!(rec->field_mask & GRAPH_NODE_MASK))
 			continue;
 
 		/* We need to ensure ownership acyclicity among all types. The
 		 * proper way to do it would be to topologically sort all BTF
 		 * IDs based on the ownership edges, since there can be multiple
-		 * bpf_list_head in a type. Instead, we use the following
-		 * reasoning:
+		 * bpf_{list_head,rb_node} in a type. Instead, we use the
+		 * following reasoning:
 		 *
 		 * - A type can only be owned by another type in user BTF if it
-		 *   has a bpf_list_node.
+		 *   has a bpf_{list,rb}_node. Let's call these node types.
 		 * - A type can only _own_ another type in user BTF if it has a
-		 *   bpf_list_head.
+		 *   bpf_{list_head,rb_root}. Let's call these root types.
 		 *
-		 * We ensure that if a type has both bpf_list_head and
-		 * bpf_list_node, its element types cannot be owning types.
+		 * We ensure that if a type is both a root and node, its
+		 * element types cannot be root types.
 		 *
 		 * To ensure acyclicity:
 		 *
-		 * When A only has bpf_list_head, ownership chain can be:
+		 * When A is a root type but not a node, its ownership
+		 * chain can be:
 		 *	A -> B -> C
 		 * Where:
-		 * - B has both bpf_list_head and bpf_list_node.
-		 * - C only has bpf_list_node.
+		 * - A is a root, e.g. has bpf_rb_root.
+		 * - B is both a root and node, e.g. has bpf_rb_node and
+		 *   bpf_list_head.
+		 * - C is only a node, e.g. has bpf_list_node.
 		 *
-		 * When A has both bpf_list_head and bpf_list_node, some other
-		 * type already owns it in the BTF domain, hence it can not own
-		 * another owning type through any of the bpf_list_head edges.
+		 * When A is both a root and node, some other type already
+		 * owns it in the BTF domain, hence it can not own
+		 * another root type through any of the ownership edges.
 		 *	A -> B
 		 * Where:
-		 * - B only has bpf_list_node.
+		 * - A is both a root and a node.
+		 * - B is only a node.
 		 */
-		if (meta->record->field_mask & BPF_LIST_HEAD)
+		if (meta->record->field_mask & GRAPH_ROOT_MASK)
 			return -ELOOP;
 	}
 	return 0;
@@ -5237,6 +5286,8 @@ static const char *alloc_obj_fields[] = {
 	"bpf_spin_lock",
 	"bpf_list_head",
 	"bpf_list_node",
+	"bpf_rb_root",
+	"bpf_rb_node",
 };
 
 static struct btf_struct_metas *
@@ -5310,7 +5361,8 @@ btf_parse_struct_metas(struct bpf_verifier_log *log, struct btf *btf)
 
 		type = &tab->types[tab->cnt];
 		type->btf_id = i;
-		record = btf_parse_fields(btf, t, BPF_SPIN_LOCK | BPF_LIST_HEAD | BPF_LIST_NODE, t->size);
+		record = btf_parse_fields(btf, t, BPF_SPIN_LOCK | BPF_LIST_HEAD | BPF_LIST_NODE |
+						  BPF_RB_ROOT | BPF_RB_NODE, t->size);
 		/* The record cannot be unset, treat it as an error if so */
 		if (IS_ERR_OR_NULL(record)) {
 			ret = PTR_ERR_OR_ZERO(record) ?: -EFAULT;
diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
index 1df87af6919e..30fff015f9a1 100644
--- a/kernel/bpf/helpers.c
+++ b/kernel/bpf/helpers.c
@@ -1761,6 +1761,46 @@ void bpf_list_head_free(const struct btf_field *field, void *list_head,
 	}
 }
 
+/* Like rbtree_postorder_for_each_entry_safe, but 'pos' and 'n' are
+ * 'rb_node *', so field name of rb_node within containing struct is not
+ * needed.
+ *
+ * Since bpf_rb_root's node type has a corresponding struct btf_field with
+ * graph_root.node_offset, it's not necessary to know the field name
+ * or type of the node struct.
+ */
+#define bpf_rbtree_postorder_for_each_entry_safe(pos, n, root) \
+	for (pos = rb_first_postorder(root); \
+	    pos && ({ n = rb_next_postorder(pos); 1; }); \
+	    pos = n)
+
+void bpf_rb_root_free(const struct btf_field *field, void *rb_root,
+		      struct bpf_spin_lock *spin_lock)
+{
+	struct rb_root_cached orig_root, *root = rb_root;
+	struct rb_node *pos, *n;
+	void *obj;
+
+	BUILD_BUG_ON(sizeof(struct rb_root_cached) > sizeof(struct bpf_rb_root));
+	BUILD_BUG_ON(__alignof__(struct rb_root_cached) > __alignof__(struct bpf_rb_root));
+
+	__bpf_spin_lock_irqsave(spin_lock);
+	orig_root = *root;
+	*root = RB_ROOT_CACHED;
+	__bpf_spin_unlock_irqrestore(spin_lock);
+
+	bpf_rbtree_postorder_for_each_entry_safe(pos, n, &orig_root.rb_root) {
+		obj = pos;
+		obj -= field->graph_root.node_offset;
+
+		bpf_obj_free_fields(field->graph_root.value_rec, obj);
+
+		migrate_disable();
+		bpf_mem_free(&bpf_global_ma, obj);
+		migrate_enable();
+	}
+}
+
 __diag_push();
 __diag_ignore_all("-Wmissing-prototypes",
 		  "Global functions as their definitions will be in vmlinux BTF");
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 35972afb6850..08e2def7ff93 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -527,9 +527,6 @@ void btf_record_free(struct btf_record *rec)
 		return;
 	for (i = 0; i < rec->cnt; i++) {
 		switch (rec->fields[i].type) {
-		case BPF_SPIN_LOCK:
-		case BPF_TIMER:
-			break;
 		case BPF_KPTR_UNREF:
 		case BPF_KPTR_REF:
 			if (rec->fields[i].kptr.module)
@@ -538,7 +535,11 @@ void btf_record_free(struct btf_record *rec)
 			break;
 		case BPF_LIST_HEAD:
 		case BPF_LIST_NODE:
-			/* Nothing to release for bpf_list_head */
+		case BPF_RB_ROOT:
+		case BPF_RB_NODE:
+		case BPF_SPIN_LOCK:
+		case BPF_TIMER:
+			/* Nothing to release */
 			break;
 		default:
 			WARN_ON_ONCE(1);
@@ -571,9 +572,6 @@ struct btf_record *btf_record_dup(const struct btf_record *rec)
 	new_rec->cnt = 0;
 	for (i = 0; i < rec->cnt; i++) {
 		switch (fields[i].type) {
-		case BPF_SPIN_LOCK:
-		case BPF_TIMER:
-			break;
 		case BPF_KPTR_UNREF:
 		case BPF_KPTR_REF:
 			btf_get(fields[i].kptr.btf);
@@ -584,7 +582,11 @@ struct btf_record *btf_record_dup(const struct btf_record *rec)
 			break;
 		case BPF_LIST_HEAD:
 		case BPF_LIST_NODE:
-			/* Nothing to acquire for bpf_list_head */
+		case BPF_RB_ROOT:
+		case BPF_RB_NODE:
+		case BPF_SPIN_LOCK:
+		case BPF_TIMER:
+			/* Nothing to acquire */
 			break;
 		default:
 			ret = -EFAULT;
@@ -664,7 +666,13 @@ void bpf_obj_free_fields(const struct btf_record *rec, void *obj)
 				continue;
 			bpf_list_head_free(field, field_ptr, obj + rec->spin_lock_off);
 			break;
+		case BPF_RB_ROOT:
+			if (WARN_ON_ONCE(rec->spin_lock_off < 0))
+				continue;
+			bpf_rb_root_free(field, field_ptr, obj + rec->spin_lock_off);
+			break;
 		case BPF_LIST_NODE:
+		case BPF_RB_NODE:
 			break;
 		default:
 			WARN_ON_ONCE(1);
@@ -1005,7 +1013,8 @@ static int map_check_btf(struct bpf_map *map, const struct btf *btf,
 		return -EINVAL;
 
 	map->record = btf_parse_fields(btf, value_type,
-				       BPF_SPIN_LOCK | BPF_TIMER | BPF_KPTR | BPF_LIST_HEAD,
+				       BPF_SPIN_LOCK | BPF_TIMER | BPF_KPTR | BPF_LIST_HEAD |
+				       BPF_RB_ROOT,
 				       map->value_size);
 	if (!IS_ERR_OR_NULL(map->record)) {
 		int i;
@@ -1053,6 +1062,7 @@ static int map_check_btf(struct bpf_map *map, const struct btf *btf,
 				}
 				break;
 			case BPF_LIST_HEAD:
+			case BPF_RB_ROOT:
 				if (map->map_type != BPF_MAP_TYPE_HASH &&
 				    map->map_type != BPF_MAP_TYPE_LRU_HASH &&
 				    map->map_type != BPF_MAP_TYPE_ARRAY) {
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index c914230beea7..89d8d754567b 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -14392,9 +14392,10 @@ static int check_map_prog_compatibility(struct bpf_verifier_env *env,
 {
 	enum bpf_prog_type prog_type = resolve_prog_type(prog);
 
-	if (btf_record_has_field(map->record, BPF_LIST_HEAD)) {
+	if (btf_record_has_field(map->record, BPF_LIST_HEAD) ||
+	    btf_record_has_field(map->record, BPF_RB_ROOT)) {
 		if (is_tracing_prog_type(prog_type)) {
-			verbose(env, "tracing progs cannot use bpf_list_head yet\n");
+			verbose(env, "tracing progs cannot use bpf_{list_head,rb_root} yet\n");
 			return -EINVAL;
 		}
 	}
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index 464ca3f01fe7..bd260134c420 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -6901,6 +6901,17 @@ struct bpf_list_node {
 	__u64 :64;
 } __attribute__((aligned(8)));
 
+struct bpf_rb_root {
+	__u64 :64;
+	__u64 :64;
+} __attribute__((aligned(8)));
+
+struct bpf_rb_node {
+	__u64 :64;
+	__u64 :64;
+	__u64 :64;
+} __attribute__((aligned(8)));
+
 struct bpf_sysctl {
 	__u32	write;		/* Sysctl is being read (= 0) or written (= 1).
 				 * Allows 1,2,4-byte read, but no write.
diff --git a/tools/testing/selftests/bpf/prog_tests/linked_list.c b/tools/testing/selftests/bpf/prog_tests/linked_list.c
index a8091a0c0831..d44ba935207f 100644
--- a/tools/testing/selftests/bpf/prog_tests/linked_list.c
+++ b/tools/testing/selftests/bpf/prog_tests/linked_list.c
@@ -58,12 +58,12 @@ static struct {
 	TEST(inner_map, pop_front)
 	TEST(inner_map, pop_back)
 #undef TEST
-	{ "map_compat_kprobe", "tracing progs cannot use bpf_list_head yet" },
-	{ "map_compat_kretprobe", "tracing progs cannot use bpf_list_head yet" },
-	{ "map_compat_tp", "tracing progs cannot use bpf_list_head yet" },
-	{ "map_compat_perf", "tracing progs cannot use bpf_list_head yet" },
-	{ "map_compat_raw_tp", "tracing progs cannot use bpf_list_head yet" },
-	{ "map_compat_raw_tp_w", "tracing progs cannot use bpf_list_head yet" },
+	{ "map_compat_kprobe", "tracing progs cannot use bpf_{list_head,rb_root} yet" },
+	{ "map_compat_kretprobe", "tracing progs cannot use bpf_{list_head,rb_root} yet" },
+	{ "map_compat_tp", "tracing progs cannot use bpf_{list_head,rb_root} yet" },
+	{ "map_compat_perf", "tracing progs cannot use bpf_{list_head,rb_root} yet" },
+	{ "map_compat_raw_tp", "tracing progs cannot use bpf_{list_head,rb_root} yet" },
+	{ "map_compat_raw_tp_w", "tracing progs cannot use bpf_{list_head,rb_root} yet" },
 	{ "obj_type_id_oor", "local type ID argument must be in range [0, U32_MAX]" },
 	{ "obj_new_no_composite", "bpf_obj_new type ID argument must be of a struct" },
 	{ "obj_new_no_struct", "bpf_obj_new type ID argument must be of a struct" },
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH v2 bpf-next 06/13] bpf: Add bpf_rbtree_{add,remove,first} kfuncs
  2022-12-17  8:24 [PATCH v2 bpf-next 00/13] BPF rbtree next-gen datastructure Dave Marchevsky
                   ` (4 preceding siblings ...)
  2022-12-17  8:24 ` [PATCH v2 bpf-next 05/13] bpf: Add basic bpf_rb_{root,node} support Dave Marchevsky
@ 2022-12-17  8:24 ` Dave Marchevsky
  2022-12-17  8:25 ` [PATCH v2 bpf-next 07/13] bpf: Add support for bpf_rb_root and bpf_rb_node in kfunc args Dave Marchevsky
                   ` (6 subsequent siblings)
  12 siblings, 0 replies; 38+ messages in thread
From: Dave Marchevsky @ 2022-12-17  8:24 UTC (permalink / raw)
  To: bpf
  Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Kernel Team, Kumar Kartikeya Dwivedi, Tejun Heo, Dave Marchevsky

This patch adds implementations of bpf_rbtree_{add,remove,first}
and teaches verifier about their BTF_IDs as well as those of
bpf_rb_{root,node}.

All three kfuncs have some nonstandard component to their verification
that needs to be addressed in future patches before programs can
properly use them:

  * bpf_rbtree_add:     Takes 'less' callback, need to verify it

  * bpf_rbtree_first:   Returns ptr_to_node_type(off=rb_node_off) instead
                        of ptr_to_rb_node(off=0). Return value ref is
                        non-owning.

  * bpf_rbtree_remove:  Returns ptr_to_node_type(off=rb_node_off) instead
                        of ptr_to_rb_node(off=0). 2nd arg (node) is a
                        non-owning reference.
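
For reference, the kernel definitions in this patch correspond to
prog-side declarations roughly like the following (the real declarations
are added to bpf_experimental.h later in the series, so treat these as a
sketch):

  extern struct bpf_rb_node *bpf_rbtree_remove(struct bpf_rb_root *root,
                                               struct bpf_rb_node *node) __ksym;
  extern void bpf_rbtree_add(struct bpf_rb_root *root, struct bpf_rb_node *node,
                             bool (less)(struct bpf_rb_node *a,
                                         const struct bpf_rb_node *b)) __ksym;
  extern struct bpf_rb_node *bpf_rbtree_first(struct bpf_rb_root *root) __ksym;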

Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
---
 kernel/bpf/helpers.c  | 28 ++++++++++++++++++++++++++++
 kernel/bpf/verifier.c | 11 +++++++++++
 2 files changed, 39 insertions(+)

diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
index 30fff015f9a1..de4523c777b7 100644
--- a/kernel/bpf/helpers.c
+++ b/kernel/bpf/helpers.c
@@ -1873,6 +1873,30 @@ struct bpf_list_node *bpf_list_pop_back(struct bpf_list_head *head)
 	return __bpf_list_del(head, true);
 }
 
+struct bpf_rb_node *bpf_rbtree_remove(struct bpf_rb_root *root, struct bpf_rb_node *node)
+{
+	struct rb_root_cached *r = (struct rb_root_cached *)root;
+	struct rb_node *n = (struct rb_node *)node;
+
+	rb_erase_cached(n, r);
+	RB_CLEAR_NODE(n);
+	return (struct bpf_rb_node *)n;
+}
+
+void bpf_rbtree_add(struct bpf_rb_root *root, struct bpf_rb_node *node,
+		    bool (less)(struct bpf_rb_node *a, const struct bpf_rb_node *b))
+{
+	rb_add_cached((struct rb_node *)node, (struct rb_root_cached *)root,
+		      (bool (*)(struct rb_node *, const struct rb_node *))less);
+}
+
+struct bpf_rb_node *bpf_rbtree_first(struct bpf_rb_root *root)
+{
+	struct rb_root_cached *r = (struct rb_root_cached *)root;
+
+	return (struct bpf_rb_node *)rb_first_cached(r);
+}
+
 /**
  * bpf_task_acquire - Acquire a reference to a task. A task acquired by this
  * kfunc which is not stored in a map as a kptr, must be released by calling
@@ -2097,6 +2121,10 @@ BTF_ID_FLAGS(func, bpf_task_acquire, KF_ACQUIRE | KF_TRUSTED_ARGS)
 BTF_ID_FLAGS(func, bpf_task_acquire_not_zero, KF_ACQUIRE | KF_RCU | KF_RET_NULL)
 BTF_ID_FLAGS(func, bpf_task_kptr_get, KF_ACQUIRE | KF_KPTR_GET | KF_RET_NULL)
 BTF_ID_FLAGS(func, bpf_task_release, KF_RELEASE)
+BTF_ID_FLAGS(func, bpf_rbtree_remove, KF_ACQUIRE)
+BTF_ID_FLAGS(func, bpf_rbtree_add, KF_RELEASE | KF_RELEASE_NON_OWN)
+BTF_ID_FLAGS(func, bpf_rbtree_first, KF_RET_NULL)
+
 #ifdef CONFIG_CGROUPS
 BTF_ID_FLAGS(func, bpf_cgroup_acquire, KF_ACQUIRE | KF_TRUSTED_ARGS)
 BTF_ID_FLAGS(func, bpf_cgroup_kptr_get, KF_ACQUIRE | KF_KPTR_GET | KF_RET_NULL)
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 89d8d754567b..e90cf0b74571 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -8406,6 +8406,8 @@ BTF_ID_LIST(kf_arg_btf_ids)
 BTF_ID(struct, bpf_dynptr_kern)
 BTF_ID(struct, bpf_list_head)
 BTF_ID(struct, bpf_list_node)
+BTF_ID(struct, bpf_rb_root)
+BTF_ID(struct, bpf_rb_node)
 
 static bool __is_kfunc_ptr_arg_type(const struct btf *btf,
 				    const struct btf_param *arg, int type)
@@ -8511,6 +8513,9 @@ enum special_kfunc_type {
 	KF_bpf_rdonly_cast,
 	KF_bpf_rcu_read_lock,
 	KF_bpf_rcu_read_unlock,
+	KF_bpf_rbtree_remove,
+	KF_bpf_rbtree_add,
+	KF_bpf_rbtree_first,
 };
 
 BTF_SET_START(special_kfunc_set)
@@ -8522,6 +8527,9 @@ BTF_ID(func, bpf_list_pop_front)
 BTF_ID(func, bpf_list_pop_back)
 BTF_ID(func, bpf_cast_to_kern_ctx)
 BTF_ID(func, bpf_rdonly_cast)
+BTF_ID(func, bpf_rbtree_remove)
+BTF_ID(func, bpf_rbtree_add)
+BTF_ID(func, bpf_rbtree_first)
 BTF_SET_END(special_kfunc_set)
 
 BTF_ID_LIST(special_kfunc_list)
@@ -8535,6 +8543,9 @@ BTF_ID(func, bpf_cast_to_kern_ctx)
 BTF_ID(func, bpf_rdonly_cast)
 BTF_ID(func, bpf_rcu_read_lock)
 BTF_ID(func, bpf_rcu_read_unlock)
+BTF_ID(func, bpf_rbtree_remove)
+BTF_ID(func, bpf_rbtree_add)
+BTF_ID(func, bpf_rbtree_first)
 
 static bool is_kfunc_bpf_rcu_read_lock(struct bpf_kfunc_call_arg_meta *meta)
 {
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH v2 bpf-next 07/13] bpf: Add support for bpf_rb_root and bpf_rb_node in kfunc args
  2022-12-17  8:24 [PATCH v2 bpf-next 00/13] BPF rbtree next-gen datastructure Dave Marchevsky
                   ` (5 preceding siblings ...)
  2022-12-17  8:24 ` [PATCH v2 bpf-next 06/13] bpf: Add bpf_rbtree_{add,remove,first} kfuncs Dave Marchevsky
@ 2022-12-17  8:25 ` Dave Marchevsky
  2022-12-29  4:00   ` Alexei Starovoitov
  2022-12-17  8:25 ` [PATCH v2 bpf-next 08/13] bpf: Add callback validation to kfunc verifier logic Dave Marchevsky
                   ` (5 subsequent siblings)
  12 siblings, 1 reply; 38+ messages in thread
From: Dave Marchevsky @ 2022-12-17  8:25 UTC (permalink / raw)
  To: bpf
  Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Kernel Team, Kumar Kartikeya Dwivedi, Tejun Heo, Dave Marchevsky

Now that we find bpf_rb_root and bpf_rb_node in structs, let's give args
that contain those types special classification and properly handle
these types when checking kfunc args.

"Properly handling" these types largely requires generalizing the
existing bpf_list_{head,node} handling, with little new logic added in
this patch.
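
From a program writer's perspective (sketch with made-up names; later
patches in the series are still needed before such a program fully
verifies), the new classification applies to call sites of this shape:

  /* val: ptr to a map value or allocated object holding the bpf_rb_root
   *      and its bpf_spin_lock
   * e:   ptr to a bpf_obj_new-allocated object containing a bpf_rb_node
   */
  bpf_spin_lock(&val->lock);
  /* &val->root -> KF_ARG_PTR_TO_RB_ROOT: constant offset, lock held
   * &e->rnode  -> KF_ARG_PTR_TO_RB_NODE: referenced allocated object whose
   *               bpf_rb_node offset matches the root's __contains tag
   */
  bpf_rbtree_add(&val->root, &e->rnode, less_cb);
  bpf_spin_unlock(&val->lock);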

Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
---
 kernel/bpf/verifier.c | 238 +++++++++++++++++++++++++++++++++++-------
 1 file changed, 203 insertions(+), 35 deletions(-)

diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index e90cf0b74571..06ab0eb6ee7f 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -8284,6 +8284,9 @@ struct bpf_kfunc_call_arg_meta {
 	struct {
 		struct btf_field *field;
 	} arg_list_head;
+	struct {
+		struct btf_field *field;
+	} arg_rbtree_root;
 };
 
 static bool is_kfunc_acquire(struct bpf_kfunc_call_arg_meta *meta)
@@ -8400,6 +8403,8 @@ enum {
 	KF_ARG_DYNPTR_ID,
 	KF_ARG_LIST_HEAD_ID,
 	KF_ARG_LIST_NODE_ID,
+	KF_ARG_RB_ROOT_ID,
+	KF_ARG_RB_NODE_ID,
 };
 
 BTF_ID_LIST(kf_arg_btf_ids)
@@ -8441,6 +8446,16 @@ static bool is_kfunc_arg_list_node(const struct btf *btf, const struct btf_param
 	return __is_kfunc_ptr_arg_type(btf, arg, KF_ARG_LIST_NODE_ID);
 }
 
+static bool is_kfunc_arg_rbtree_root(const struct btf *btf, const struct btf_param *arg)
+{
+	return __is_kfunc_ptr_arg_type(btf, arg, KF_ARG_RB_ROOT_ID);
+}
+
+static bool is_kfunc_arg_rbtree_node(const struct btf *btf, const struct btf_param *arg)
+{
+	return __is_kfunc_ptr_arg_type(btf, arg, KF_ARG_RB_NODE_ID);
+}
+
 /* Returns true if struct is composed of scalars, 4 levels of nesting allowed */
 static bool __btf_type_is_scalar_struct(struct bpf_verifier_env *env,
 					const struct btf *btf,
@@ -8500,6 +8515,8 @@ enum kfunc_ptr_arg_type {
 	KF_ARG_PTR_TO_BTF_ID,	     /* Also covers reg2btf_ids conversions */
 	KF_ARG_PTR_TO_MEM,
 	KF_ARG_PTR_TO_MEM_SIZE,	     /* Size derived from next argument, skip it */
+	KF_ARG_PTR_TO_RB_ROOT,
+	KF_ARG_PTR_TO_RB_NODE,
 };
 
 enum special_kfunc_type {
@@ -8607,6 +8624,12 @@ get_kfunc_ptr_arg_type(struct bpf_verifier_env *env,
 	if (is_kfunc_arg_list_node(meta->btf, &args[argno]))
 		return KF_ARG_PTR_TO_LIST_NODE;
 
+	if (is_kfunc_arg_rbtree_root(meta->btf, &args[argno]))
+		return KF_ARG_PTR_TO_RB_ROOT;
+
+	if (is_kfunc_arg_rbtree_node(meta->btf, &args[argno]))
+		return KF_ARG_PTR_TO_RB_NODE;
+
 	if ((base_type(reg->type) == PTR_TO_BTF_ID || reg2btf_ids[base_type(reg->type)])) {
 		if (!btf_type_is_struct(ref_t)) {
 			verbose(env, "kernel function %s args#%d pointer type %s %s is not supported\n",
@@ -8836,95 +8859,193 @@ static bool is_bpf_list_api_kfunc(u32 btf_id)
 	       btf_id == special_kfunc_list[KF_bpf_list_pop_back];
 }
 
-static int process_kf_arg_ptr_to_list_head(struct bpf_verifier_env *env,
-					   struct bpf_reg_state *reg, u32 regno,
-					   struct bpf_kfunc_call_arg_meta *meta)
+static bool is_bpf_rbtree_api_kfunc(u32 btf_id)
+{
+	return btf_id == special_kfunc_list[KF_bpf_rbtree_add] ||
+	       btf_id == special_kfunc_list[KF_bpf_rbtree_remove] ||
+	       btf_id == special_kfunc_list[KF_bpf_rbtree_first];
+}
+
+static bool is_bpf_graph_api_kfunc(u32 btf_id)
+{
+	return is_bpf_list_api_kfunc(btf_id) || is_bpf_rbtree_api_kfunc(btf_id);
+}
+
+static bool check_kfunc_is_graph_root_api(struct bpf_verifier_env *env,
+					  enum btf_field_type head_field_type,
+					  u32 kfunc_btf_id)
 {
+	bool ret;
+
+	switch (head_field_type) {
+	case BPF_LIST_HEAD:
+		ret = is_bpf_list_api_kfunc(kfunc_btf_id);
+		break;
+	case BPF_RB_ROOT:
+		ret = is_bpf_rbtree_api_kfunc(kfunc_btf_id);
+		break;
+	default:
+		verbose(env, "verifier internal error: unexpected graph root argument type %s\n",
+			btf_field_type_name(head_field_type));
+		return false;
+	}
+
+	if (!ret)
+		verbose(env, "verifier internal error: %s head arg for unknown kfunc\n",
+			btf_field_type_name(head_field_type));
+	return ret;
+}
+
+static bool check_kfunc_is_graph_node_api(struct bpf_verifier_env *env,
+					  enum btf_field_type node_field_type,
+					  u32 kfunc_btf_id)
+{
+	bool ret;
+
+	switch (node_field_type) {
+	case BPF_LIST_NODE:
+		ret = (kfunc_btf_id == special_kfunc_list[KF_bpf_list_push_front] ||
+		       kfunc_btf_id == special_kfunc_list[KF_bpf_list_push_back]);
+		break;
+	case BPF_RB_NODE:
+		ret = (kfunc_btf_id == special_kfunc_list[KF_bpf_rbtree_remove] ||
+		       kfunc_btf_id == special_kfunc_list[KF_bpf_rbtree_add]);
+		break;
+	default:
+		verbose(env, "verifier internal error: unexpected graph node argument type %s\n",
+			btf_field_type_name(node_field_type));
+		return false;
+	}
+
+	if (!ret)
+		verbose(env, "verifier internal error: %s node arg for unknown kfunc\n",
+			btf_field_type_name(node_field_type));
+	return ret;
+}
+
+static int
+__process_kf_arg_ptr_to_graph_root(struct bpf_verifier_env *env,
+				   struct bpf_reg_state *reg, u32 regno,
+				   struct bpf_kfunc_call_arg_meta *meta,
+				   enum btf_field_type head_field_type,
+				   struct btf_field **head_field)
+{
+	const char *head_type_name;
 	struct btf_field *field;
 	struct btf_record *rec;
-	u32 list_head_off;
+	u32 head_off;
 
-	if (meta->btf != btf_vmlinux || !is_bpf_list_api_kfunc(meta->func_id)) {
-		verbose(env, "verifier internal error: bpf_list_head argument for unknown kfunc\n");
+	if (meta->btf != btf_vmlinux) {
+		verbose(env, "verifier internal error: unexpected btf mismatch in kfunc call\n");
 		return -EFAULT;
 	}
 
+	if (!check_kfunc_is_graph_root_api(env, head_field_type, meta->func_id))
+		return -EFAULT;
+
+	head_type_name = btf_field_type_name(head_field_type);
 	if (!tnum_is_const(reg->var_off)) {
 		verbose(env,
-			"R%d doesn't have constant offset. bpf_list_head has to be at the constant offset\n",
-			regno);
+			"R%d doesn't have constant offset. %s has to be at the constant offset\n",
+			regno, head_type_name);
 		return -EINVAL;
 	}
 
 	rec = reg_btf_record(reg);
-	list_head_off = reg->off + reg->var_off.value;
-	field = btf_record_find(rec, list_head_off, BPF_LIST_HEAD);
+	head_off = reg->off + reg->var_off.value;
+	field = btf_record_find(rec, head_off, head_field_type);
 	if (!field) {
-		verbose(env, "bpf_list_head not found at offset=%u\n", list_head_off);
+		verbose(env, "%s not found at offset=%u\n", head_type_name, head_off);
 		return -EINVAL;
 	}
 
 	/* All functions require bpf_list_head to be protected using a bpf_spin_lock */
 	if (check_reg_allocation_locked(env, reg)) {
-		verbose(env, "bpf_spin_lock at off=%d must be held for bpf_list_head\n",
-			rec->spin_lock_off);
+		verbose(env, "bpf_spin_lock at off=%d must be held for %s\n",
+			rec->spin_lock_off, head_type_name);
 		return -EINVAL;
 	}
 
-	if (meta->arg_list_head.field) {
-		verbose(env, "verifier internal error: repeating bpf_list_head arg\n");
+	if (*head_field) {
+		verbose(env, "verifier internal error: repeating %s arg\n", head_type_name);
 		return -EFAULT;
 	}
-	meta->arg_list_head.field = field;
+	*head_field = field;
 	return 0;
 }
 
-static int process_kf_arg_ptr_to_list_node(struct bpf_verifier_env *env,
+static int process_kf_arg_ptr_to_list_head(struct bpf_verifier_env *env,
 					   struct bpf_reg_state *reg, u32 regno,
 					   struct bpf_kfunc_call_arg_meta *meta)
 {
+	return __process_kf_arg_ptr_to_graph_root(env, reg, regno, meta, BPF_LIST_HEAD,
+							  &meta->arg_list_head.field);
+}
+
+static int process_kf_arg_ptr_to_rbtree_root(struct bpf_verifier_env *env,
+					     struct bpf_reg_state *reg, u32 regno,
+					     struct bpf_kfunc_call_arg_meta *meta)
+{
+	return __process_kf_arg_ptr_to_graph_root(env, reg, regno, meta, BPF_RB_ROOT,
+							  &meta->arg_rbtree_root.field);
+}
+
+static int
+__process_kf_arg_ptr_to_graph_node(struct bpf_verifier_env *env,
+				   struct bpf_reg_state *reg, u32 regno,
+				   struct bpf_kfunc_call_arg_meta *meta,
+				   enum btf_field_type head_field_type,
+				   enum btf_field_type node_field_type,
+				   struct btf_field **node_field)
+{
+	const char *node_type_name;
 	const struct btf_type *et, *t;
 	struct btf_field *field;
-	u32 list_node_off;
+	u32 node_off;
 
-	if (meta->btf != btf_vmlinux ||
-	    (meta->func_id != special_kfunc_list[KF_bpf_list_push_front] &&
-	     meta->func_id != special_kfunc_list[KF_bpf_list_push_back])) {
-		verbose(env, "verifier internal error: bpf_list_node argument for unknown kfunc\n");
+	if (meta->btf != btf_vmlinux) {
+		verbose(env, "verifier internal error: unexpected btf mismatch in kfunc call\n");
 		return -EFAULT;
 	}
 
+	if (!check_kfunc_is_graph_node_api(env, node_field_type, meta->func_id))
+		return -EFAULT;
+
+	node_type_name = btf_field_type_name(node_field_type);
 	if (!tnum_is_const(reg->var_off)) {
 		verbose(env,
-			"R%d doesn't have constant offset. bpf_list_node has to be at the constant offset\n",
-			regno);
+			"R%d doesn't have constant offset. %s has to be at the constant offset\n",
+			regno, node_type_name);
 		return -EINVAL;
 	}
 
-	list_node_off = reg->off + reg->var_off.value;
-	field = reg_find_field_offset(reg, list_node_off, BPF_LIST_NODE);
-	if (!field || field->offset != list_node_off) {
-		verbose(env, "bpf_list_node not found at offset=%u\n", list_node_off);
+	node_off = reg->off + reg->var_off.value;
+	field = reg_find_field_offset(reg, node_off, node_field_type);
+	if (!field || field->offset != node_off) {
+		verbose(env, "%s not found at offset=%u\n", node_type_name, node_off);
 		return -EINVAL;
 	}
 
-	field = meta->arg_list_head.field;
+	field = *node_field;
 
 	et = btf_type_by_id(field->graph_root.btf, field->graph_root.value_btf_id);
 	t = btf_type_by_id(reg->btf, reg->btf_id);
 	if (!btf_struct_ids_match(&env->log, reg->btf, reg->btf_id, 0, field->graph_root.btf,
 				  field->graph_root.value_btf_id, true)) {
-		verbose(env, "operation on bpf_list_head expects arg#1 bpf_list_node at offset=%d "
+		verbose(env, "operation on %s expects arg#1 %s at offset=%d "
 			"in struct %s, but arg is at offset=%d in struct %s\n",
+			btf_field_type_name(head_field_type),
+			btf_field_type_name(node_field_type),
 			field->graph_root.node_offset,
 			btf_name_by_offset(field->graph_root.btf, et->name_off),
-			list_node_off, btf_name_by_offset(reg->btf, t->name_off));
+			node_off, btf_name_by_offset(reg->btf, t->name_off));
 		return -EINVAL;
 	}
 
-	if (list_node_off != field->graph_root.node_offset) {
-		verbose(env, "arg#1 offset=%d, but expected bpf_list_node at offset=%d in struct %s\n",
-			list_node_off, field->graph_root.node_offset,
+	if (node_off != field->graph_root.node_offset) {
+		verbose(env, "arg#1 offset=%d, but expected %s at offset=%d in struct %s\n",
+			node_off, btf_field_type_name(node_field_type),
+			field->graph_root.node_offset,
 			btf_name_by_offset(field->graph_root.btf, et->name_off));
 		return -EINVAL;
 	}
@@ -8932,6 +9053,24 @@ static int process_kf_arg_ptr_to_list_node(struct bpf_verifier_env *env,
 	return 0;
 }
 
+static int process_kf_arg_ptr_to_list_node(struct bpf_verifier_env *env,
+					   struct bpf_reg_state *reg, u32 regno,
+					   struct bpf_kfunc_call_arg_meta *meta)
+{
+	return __process_kf_arg_ptr_to_graph_node(env, reg, regno, meta,
+						  BPF_LIST_HEAD, BPF_LIST_NODE,
+						  &meta->arg_list_head.field);
+}
+
+static int process_kf_arg_ptr_to_rbtree_node(struct bpf_verifier_env *env,
+					     struct bpf_reg_state *reg, u32 regno,
+					     struct bpf_kfunc_call_arg_meta *meta)
+{
+	return __process_kf_arg_ptr_to_graph_node(env, reg, regno, meta,
+						  BPF_RB_ROOT, BPF_RB_NODE,
+						  &meta->arg_rbtree_root.field);
+}
+
 static int check_kfunc_args(struct bpf_verifier_env *env, struct bpf_kfunc_call_arg_meta *meta)
 {
 	const char *func_name = meta->func_name, *ref_tname;
@@ -9063,6 +9202,8 @@ static int check_kfunc_args(struct bpf_verifier_env *env, struct bpf_kfunc_call_
 		case KF_ARG_PTR_TO_DYNPTR:
 		case KF_ARG_PTR_TO_LIST_HEAD:
 		case KF_ARG_PTR_TO_LIST_NODE:
+		case KF_ARG_PTR_TO_RB_ROOT:
+		case KF_ARG_PTR_TO_RB_NODE:
 		case KF_ARG_PTR_TO_MEM:
 		case KF_ARG_PTR_TO_MEM_SIZE:
 			/* Trusted by default */
@@ -9141,6 +9282,20 @@ static int check_kfunc_args(struct bpf_verifier_env *env, struct bpf_kfunc_call_
 			if (ret < 0)
 				return ret;
 			break;
+		case KF_ARG_PTR_TO_RB_ROOT:
+			if (reg->type != PTR_TO_MAP_VALUE &&
+			    reg->type != (PTR_TO_BTF_ID | MEM_ALLOC)) {
+				verbose(env, "arg#%d expected pointer to map value or allocated object\n", i);
+				return -EINVAL;
+			}
+			if (reg->type == (PTR_TO_BTF_ID | MEM_ALLOC) && !reg->ref_obj_id) {
+				verbose(env, "allocated object must be referenced\n");
+				return -EINVAL;
+			}
+			ret = process_kf_arg_ptr_to_rbtree_root(env, reg, regno, meta);
+			if (ret < 0)
+				return ret;
+			break;
 		case KF_ARG_PTR_TO_LIST_NODE:
 			if (reg->type != (PTR_TO_BTF_ID | MEM_ALLOC)) {
 				verbose(env, "arg#%d expected pointer to allocated object\n", i);
@@ -9154,6 +9309,19 @@ static int check_kfunc_args(struct bpf_verifier_env *env, struct bpf_kfunc_call_
 			if (ret < 0)
 				return ret;
 			break;
+		case KF_ARG_PTR_TO_RB_NODE:
+			if (reg->type != (PTR_TO_BTF_ID | MEM_ALLOC)) {
+				verbose(env, "arg#%d expected pointer to allocated object\n", i);
+				return -EINVAL;
+			}
+			if (!reg->ref_obj_id) {
+				verbose(env, "allocated object must be referenced\n");
+				return -EINVAL;
+			}
+			ret = process_kf_arg_ptr_to_rbtree_node(env, reg, regno, meta);
+			if (ret < 0)
+				return ret;
+			break;
 		case KF_ARG_PTR_TO_BTF_ID:
 			/* Only base_type is checked, further checks are done here */
 			if ((base_type(reg->type) != PTR_TO_BTF_ID ||
@@ -14105,7 +14273,7 @@ static int do_check(struct bpf_verifier_env *env)
 					if ((insn->src_reg == BPF_REG_0 && insn->imm != BPF_FUNC_spin_unlock) ||
 					    (insn->src_reg == BPF_PSEUDO_CALL) ||
 					    (insn->src_reg == BPF_PSEUDO_KFUNC_CALL &&
-					     (insn->off != 0 || !is_bpf_list_api_kfunc(insn->imm)))) {
+					     (insn->off != 0 || !is_bpf_graph_api_kfunc(insn->imm)))) {
 						verbose(env, "function calls are not allowed while holding a lock\n");
 						return -EINVAL;
 					}
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH v2 bpf-next 08/13] bpf: Add callback validation to kfunc verifier logic
  2022-12-17  8:24 [PATCH v2 bpf-next 00/13] BPF rbtree next-gen datastructure Dave Marchevsky
                   ` (6 preceding siblings ...)
  2022-12-17  8:25 ` [PATCH v2 bpf-next 07/13] bpf: Add support for bpf_rb_root and bpf_rb_node in kfunc args Dave Marchevsky
@ 2022-12-17  8:25 ` Dave Marchevsky
  2022-12-17  8:25 ` [PATCH v2 bpf-next 09/13] bpf: Special verifier handling for bpf_rbtree_{remove, first} Dave Marchevsky
                   ` (4 subsequent siblings)
  12 siblings, 0 replies; 38+ messages in thread
From: Dave Marchevsky @ 2022-12-17  8:25 UTC (permalink / raw)
  To: bpf
  Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Kernel Team, Kumar Kartikeya Dwivedi, Tejun Heo, Dave Marchevsky

Some BPF helpers take a callback function which the helper calls. For
each helper that takes such a callback, there's a special call to
__check_func_call with a callback-state-setting callback that sets up
verifier bpf_func_state for the callback's frame.

kfuncs don't have any of this infrastructure yet, so let's add it in
this patch, following existing helper pattern as much as possible. To
validate functionality of this added plumbing, this patch adds
callback handling for the bpf_rbtree_add kfunc and hopes to lay
groundwork for future graph datastructure callbacks.

In the "general plumbing" category we have:

  * check_kfunc_call doing callback verification right before clearing
    CALLER_SAVED_REGS, exactly like check_helper_call
  * recognition of func_ptr BTF types in kfunc args as
    KF_ARG_PTR_TO_CALLBACK + propagation of subprogno for this arg type

In the "rbtree_add / graph datastructure-specific plumbing" category:

  * Since bpf_rbtree_add must be called while the spin_lock associated
    with the tree is held, don't complain when the callback's func_state
    doesn't unlock it by frame exit
  * Mark rbtree_add callback's args with ref_set_non_owning_lock
    to prevent rbtree api functions from being called in the callback.
    Semantically this makes sense, as less() takes no ownership of its
    args when determining which comes first (see the sketch below).
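
The sketch below (hypothetical code, not taken from the selftests;
'lock' and 'root' are assumed to be globals declared elsewhere) shows
the kind of operations these changes reject inside a less() callback
passed to bpf_rbtree_add:

  static bool bad_less(struct bpf_rb_node *a, const struct bpf_rb_node *b)
  {
    /* rejected: "can't spin_{lock,unlock} in rbtree cb" */
    bpf_spin_lock(&lock);

    /* rejected: a and b are non-owning references here, so rbtree API
     * kfuncs such as bpf_rbtree_remove can't be handed them
     */
    bpf_rbtree_remove(&root, a);

    return true;
  }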

Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
---
 kernel/bpf/verifier.c | 135 ++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 130 insertions(+), 5 deletions(-)

diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 06ab0eb6ee7f..75979f78399d 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -192,6 +192,8 @@ static int acquire_reference_state(struct bpf_verifier_env *env, int insn_idx);
 static int release_reference(struct bpf_verifier_env *env, int ref_obj_id);
 static void invalidate_non_owning_refs(struct bpf_verifier_env *env,
 				       struct bpf_active_lock *lock);
+static bool in_rbtree_lock_required_cb(struct bpf_verifier_env *env);
+
 static int ref_set_non_owning_lock(struct bpf_verifier_env *env,
 				   struct bpf_reg_state *reg);
 
@@ -1491,6 +1493,16 @@ static void mark_ptr_not_null_reg(struct bpf_reg_state *reg)
 	reg->type &= ~PTR_MAYBE_NULL;
 }
 
+static void mark_reg_graph_node(struct bpf_reg_state *regs, u32 regno,
+				struct btf_field_graph_root *ds_head)
+{
+	__mark_reg_known_zero(&regs[regno]);
+	regs[regno].type = PTR_TO_BTF_ID | MEM_ALLOC;
+	regs[regno].btf = ds_head->btf;
+	regs[regno].btf_id = ds_head->value_btf_id;
+	regs[regno].off = ds_head->node_offset;
+}
+
 static bool reg_is_pkt_pointer(const struct bpf_reg_state *reg)
 {
 	return type_is_pkt_pointer(reg->type);
@@ -6504,6 +6516,10 @@ static int check_func_arg(struct bpf_verifier_env *env, u32 arg,
 		meta->ret_btf_id = reg->btf_id;
 		break;
 	case ARG_PTR_TO_SPIN_LOCK:
+		if (in_rbtree_lock_required_cb(env)) {
+			verbose(env, "can't spin_{lock,unlock} in rbtree cb\n");
+			return -EACCES;
+		}
 		if (meta->func_id == BPF_FUNC_spin_lock) {
 			err = process_spin_lock(env, regno, true);
 			if (err)
@@ -7111,6 +7127,8 @@ static int set_callee_state(struct bpf_verifier_env *env,
 			    struct bpf_func_state *caller,
 			    struct bpf_func_state *callee, int insn_idx);
 
+static bool is_callback_calling_kfunc(u32 btf_id);
+
 static int __check_func_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
 			     int *insn_idx, int subprog,
 			     set_callee_state_fn set_callee_state_cb)
@@ -7165,10 +7183,18 @@ static int __check_func_call(struct bpf_verifier_env *env, struct bpf_insn *insn
 	 * interested in validating only BPF helpers that can call subprogs as
 	 * callbacks
 	 */
-	if (set_callee_state_cb != set_callee_state && !is_callback_calling_function(insn->imm)) {
-		verbose(env, "verifier bug: helper %s#%d is not marked as callback-calling\n",
-			func_id_name(insn->imm), insn->imm);
-		return -EFAULT;
+	if (set_callee_state_cb != set_callee_state) {
+		if (bpf_pseudo_kfunc_call(insn) &&
+		    !is_callback_calling_kfunc(insn->imm)) {
+			verbose(env, "verifier bug: kfunc %s#%d not marked as callback-calling\n",
+				func_id_name(insn->imm), insn->imm);
+			return -EFAULT;
+		} else if (!bpf_pseudo_kfunc_call(insn) &&
+			   !is_callback_calling_function(insn->imm)) { /* helper */
+			verbose(env, "verifier bug: helper %s#%d not marked as callback-calling\n",
+				func_id_name(insn->imm), insn->imm);
+			return -EFAULT;
+		}
 	}
 
 	if (insn->code == (BPF_JMP | BPF_CALL) &&
@@ -7433,6 +7459,63 @@ static int set_user_ringbuf_callback_state(struct bpf_verifier_env *env,
 	return 0;
 }
 
+static int set_rbtree_add_callback_state(struct bpf_verifier_env *env,
+					 struct bpf_func_state *caller,
+					 struct bpf_func_state *callee,
+					 int insn_idx)
+{
+	/* void bpf_rbtree_add(struct bpf_rb_root *root, struct bpf_rb_node *node,
+	 *                     bool (less)(struct bpf_rb_node *a, const struct bpf_rb_node *b));
+	 *
+	 * 'struct bpf_rb_node *node' arg to bpf_rbtree_add is the same PTR_TO_BTF_ID w/ offset
+	 * that 'less' callback args will be receiving. However, 'node' arg was release_reference'd
+	 * by this point, so look at 'root'
+	 */
+	struct btf_field *field;
+
+	field = reg_find_field_offset(&caller->regs[BPF_REG_1], caller->regs[BPF_REG_1].off,
+				      BPF_RB_ROOT);
+	if (!field || !field->graph_root.value_btf_id)
+		return -EFAULT;
+
+	mark_reg_graph_node(callee->regs, BPF_REG_1, &field->graph_root);
+	ref_set_non_owning_lock(env, &callee->regs[BPF_REG_1]);
+	mark_reg_graph_node(callee->regs, BPF_REG_2, &field->graph_root);
+	ref_set_non_owning_lock(env, &callee->regs[BPF_REG_2]);
+
+	__mark_reg_not_init(env, &callee->regs[BPF_REG_3]);
+	__mark_reg_not_init(env, &callee->regs[BPF_REG_4]);
+	__mark_reg_not_init(env, &callee->regs[BPF_REG_5]);
+	callee->in_callback_fn = true;
+	callee->callback_ret_range = tnum_range(0, 1);
+	return 0;
+}
+
+static bool is_rbtree_lock_required_kfunc(u32 btf_id);
+
+/* Are we currently verifying the callback for a rbtree helper that must
+ * be called with lock held? If so, no need to complain about unreleased
+ * lock
+ */
+static bool in_rbtree_lock_required_cb(struct bpf_verifier_env *env)
+{
+	struct bpf_verifier_state *state = env->cur_state;
+	struct bpf_insn *insn = env->prog->insnsi;
+	struct bpf_func_state *callee;
+	int kfunc_btf_id;
+
+	if (!state->curframe)
+		return false;
+
+	callee = state->frame[state->curframe];
+
+	if (!callee->in_callback_fn)
+		return false;
+
+	kfunc_btf_id = insn[callee->callsite].imm;
+	return is_rbtree_lock_required_kfunc(kfunc_btf_id);
+}
+
 static int prepare_func_exit(struct bpf_verifier_env *env, int *insn_idx)
 {
 	struct bpf_verifier_state *state = env->cur_state;
@@ -8273,6 +8356,7 @@ struct bpf_kfunc_call_arg_meta {
 	bool r0_rdonly;
 	u32 ret_btf_id;
 	u64 r0_size;
+	u32 subprogno;
 	struct {
 		u64 value;
 		bool found;
@@ -8456,6 +8540,18 @@ static bool is_kfunc_arg_rbtree_node(const struct btf *btf, const struct btf_par
 	return __is_kfunc_ptr_arg_type(btf, arg, KF_ARG_RB_NODE_ID);
 }
 
+static bool is_kfunc_arg_callback(struct bpf_verifier_env *env, const struct btf *btf,
+				  const struct btf_param *arg)
+{
+	const struct btf_type *t;
+
+	t = btf_type_resolve_func_ptr(btf, arg->type, NULL);
+	if (!t)
+		return false;
+
+	return true;
+}
+
 /* Returns true if struct is composed of scalars, 4 levels of nesting allowed */
 static bool __btf_type_is_scalar_struct(struct bpf_verifier_env *env,
 					const struct btf *btf,
@@ -8515,6 +8611,7 @@ enum kfunc_ptr_arg_type {
 	KF_ARG_PTR_TO_BTF_ID,	     /* Also covers reg2btf_ids conversions */
 	KF_ARG_PTR_TO_MEM,
 	KF_ARG_PTR_TO_MEM_SIZE,	     /* Size derived from next argument, skip it */
+	KF_ARG_PTR_TO_CALLBACK,
 	KF_ARG_PTR_TO_RB_ROOT,
 	KF_ARG_PTR_TO_RB_NODE,
 };
@@ -8639,6 +8736,9 @@ get_kfunc_ptr_arg_type(struct bpf_verifier_env *env,
 		return KF_ARG_PTR_TO_BTF_ID;
 	}
 
+	if (is_kfunc_arg_callback(env, meta->btf, &args[argno]))
+		return KF_ARG_PTR_TO_CALLBACK;
+
 	if (argno + 1 < nargs && is_kfunc_arg_mem_size(meta->btf, &args[argno + 1], &regs[regno + 1]))
 		arg_mem_size = true;
 
@@ -8871,6 +8971,16 @@ static bool is_bpf_graph_api_kfunc(u32 btf_id)
 	return is_bpf_list_api_kfunc(btf_id) || is_bpf_rbtree_api_kfunc(btf_id);
 }
 
+static bool is_callback_calling_kfunc(u32 btf_id)
+{
+	return btf_id == special_kfunc_list[KF_bpf_rbtree_add];
+}
+
+static bool is_rbtree_lock_required_kfunc(u32 btf_id)
+{
+	return is_bpf_rbtree_api_kfunc(btf_id);
+}
+
 static bool check_kfunc_is_graph_root_api(struct bpf_verifier_env *env,
 					  enum btf_field_type head_field_type,
 					  u32 kfunc_btf_id)
@@ -9206,6 +9316,7 @@ static int check_kfunc_args(struct bpf_verifier_env *env, struct bpf_kfunc_call_
 		case KF_ARG_PTR_TO_RB_NODE:
 		case KF_ARG_PTR_TO_MEM:
 		case KF_ARG_PTR_TO_MEM_SIZE:
+		case KF_ARG_PTR_TO_CALLBACK:
 			/* Trusted by default */
 			break;
 		default:
@@ -9357,6 +9468,9 @@ static int check_kfunc_args(struct bpf_verifier_env *env, struct bpf_kfunc_call_
 			/* Skip next '__sz' argument */
 			i++;
 			break;
+		case KF_ARG_PTR_TO_CALLBACK:
+			meta->subprogno = reg->subprogno;
+			break;
 		}
 	}
 
@@ -9477,6 +9591,16 @@ static int check_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
 		}
 	}
 
+	if (meta.func_id == special_kfunc_list[KF_bpf_rbtree_add]) {
+		err = __check_func_call(env, insn, insn_idx_p, meta.subprogno,
+					set_rbtree_add_callback_state);
+		if (err) {
+			verbose(env, "kfunc %s#%d failed callback verification\n",
+				func_name, func_id);
+			return err;
+		}
+	}
+
 	for (i = 0; i < CALLER_SAVED_REGS; i++)
 		mark_reg_not_init(env, regs, caller_saved[i]);
 
@@ -14309,7 +14433,8 @@ static int do_check(struct bpf_verifier_env *env)
 					return -EINVAL;
 				}
 
-				if (env->cur_state->active_lock.ptr) {
+				if (env->cur_state->active_lock.ptr &&
+				    !in_rbtree_lock_required_cb(env)) {
 					verbose(env, "bpf_spin_unlock is missing\n");
 					return -EINVAL;
 				}
-- 
2.30.2



* [PATCH v2 bpf-next 09/13] bpf: Special verifier handling for bpf_rbtree_{remove, first}
  2022-12-17  8:24 [PATCH v2 bpf-next 00/13] BPF rbtree next-gen datastructure Dave Marchevsky
                   ` (7 preceding siblings ...)
  2022-12-17  8:25 ` [PATCH v2 bpf-next 08/13] bpf: Add callback validation to kfunc verifier logic Dave Marchevsky
@ 2022-12-17  8:25 ` Dave Marchevsky
  2022-12-29  4:02   ` Alexei Starovoitov
  2022-12-17  8:25 ` [PATCH v2 bpf-next 10/13] bpf: Add bpf_rbtree_{add,remove,first} decls to bpf_experimental.h Dave Marchevsky
                   ` (3 subsequent siblings)
  12 siblings, 1 reply; 38+ messages in thread
From: Dave Marchevsky @ 2022-12-17  8:25 UTC (permalink / raw)
  To: bpf
  Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Kernel Team, Kumar Kartikeya Dwivedi, Tejun Heo, Dave Marchevsky

Newly-added bpf_rbtree_{remove,first} kfuncs have some special properties
that require handling in the verifier:

  * both bpf_rbtree_remove and bpf_rbtree_first return the type containing
    the bpf_rb_node field, with the offset set to that field's offset,
    instead of a struct bpf_rb_node *
    * mark_reg_graph_node helper added in previous patch generalizes
      this logic, use it

  * bpf_rbtree_remove's node input is a node that's been inserted
    in the tree - a non-owning reference.

  * bpf_rbtree_remove must invalidate non-owning references in order to
    avoid aliasing issues. Add KF_INVALIDATE_NON_OWN flag, which
    indicates that the marked kfunc is a non-owning ref invalidation
    point, and associated verifier logic using previously-added
    invalidate_non_owning_refs helper.

  * Unlike other functions, which convert one of their input arg regs to
    a non-owning reference, bpf_rbtree_first takes no node argument and
    just returns a non-owning reference (possibly NULL)
    * For now verifier logic for this is special-cased instead of
      adding new kfunc flag.

This patch, along with the previous one, completes special verifier
handling for all rbtree API functions added in this series.
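
For illustration, a rough sketch of the aliasing pattern that
KF_INVALIDATE_NON_OWN is meant to reject (node_data, glock and groot as
defined in the selftests later in this series; this is only a sketch, not
a test added by this patch):

  struct bpf_rb_node *a, *b, *res;
  struct node_data *n;

  bpf_spin_lock(&glock);
  a = bpf_rbtree_first(&groot);       /* non-owning ref */
  b = bpf_rbtree_first(&groot);       /* may alias a; verifier can't tell */
  if (!a || !b) {
    bpf_spin_unlock(&glock);
    return 0;
  }
  /* returns owning ref; a and b, being non-owning, are invalidated */
  res = bpf_rbtree_remove(&groot, a);
  /* any later use of a or b is rejected by the verifier */
  bpf_spin_unlock(&glock);

  n = container_of(res, struct node_data, node);
  bpf_obj_drop(n);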

Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
---
 include/linux/btf.h   |  1 +
 kernel/bpf/helpers.c  |  2 +-
 kernel/bpf/verifier.c | 34 ++++++++++++++++++++++++++++------
 3 files changed, 30 insertions(+), 7 deletions(-)

diff --git a/include/linux/btf.h b/include/linux/btf.h
index 8aee3f7f4248..3663911bb7c0 100644
--- a/include/linux/btf.h
+++ b/include/linux/btf.h
@@ -72,6 +72,7 @@
 #define KF_DESTRUCTIVE		(1 << 6) /* kfunc performs destructive actions */
 #define KF_RCU			(1 << 7) /* kfunc only takes rcu pointer arguments */
 #define KF_RELEASE_NON_OWN	(1 << 8) /* kfunc converts its referenced arg into non-owning ref */
+#define KF_INVALIDATE_NON_OWN	(1 << 9) /* kfunc invalidates non-owning refs after return */
 
 /*
  * Return the name of the passed struct, if exists, or halt the build if for
diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
index de4523c777b7..0e6d010e6423 100644
--- a/kernel/bpf/helpers.c
+++ b/kernel/bpf/helpers.c
@@ -2121,7 +2121,7 @@ BTF_ID_FLAGS(func, bpf_task_acquire, KF_ACQUIRE | KF_TRUSTED_ARGS)
 BTF_ID_FLAGS(func, bpf_task_acquire_not_zero, KF_ACQUIRE | KF_RCU | KF_RET_NULL)
 BTF_ID_FLAGS(func, bpf_task_kptr_get, KF_ACQUIRE | KF_KPTR_GET | KF_RET_NULL)
 BTF_ID_FLAGS(func, bpf_task_release, KF_RELEASE)
-BTF_ID_FLAGS(func, bpf_rbtree_remove, KF_ACQUIRE)
+BTF_ID_FLAGS(func, bpf_rbtree_remove, KF_ACQUIRE | KF_INVALIDATE_NON_OWN)
 BTF_ID_FLAGS(func, bpf_rbtree_add, KF_RELEASE | KF_RELEASE_NON_OWN)
 BTF_ID_FLAGS(func, bpf_rbtree_first, KF_RET_NULL)
 
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 75979f78399d..b4bf3701de7f 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -8393,6 +8393,11 @@ static bool is_kfunc_release_non_own(struct bpf_kfunc_call_arg_meta *meta)
 	return meta->kfunc_flags & KF_RELEASE_NON_OWN;
 }
 
+static bool is_kfunc_invalidate_non_own(struct bpf_kfunc_call_arg_meta *meta)
+{
+	return meta->kfunc_flags & KF_INVALIDATE_NON_OWN;
+}
+
 static bool is_kfunc_trusted_args(struct bpf_kfunc_call_arg_meta *meta)
 {
 	return meta->kfunc_flags & KF_TRUSTED_ARGS;
@@ -9425,10 +9430,20 @@ static int check_kfunc_args(struct bpf_verifier_env *env, struct bpf_kfunc_call_
 				verbose(env, "arg#%d expected pointer to allocated object\n", i);
 				return -EINVAL;
 			}
-			if (!reg->ref_obj_id) {
+			if (meta->func_id == special_kfunc_list[KF_bpf_rbtree_remove]) {
+				if (reg->ref_obj_id) {
+					verbose(env, "rbtree_remove node input must be non-owning ref\n");
+					return -EINVAL;
+				}
+				if (in_rbtree_lock_required_cb(env)) {
+					verbose(env, "rbtree_remove not allowed in rbtree cb\n");
+					return -EINVAL;
+				}
+			} else if (!reg->ref_obj_id) {
 				verbose(env, "allocated object must be referenced\n");
 				return -EINVAL;
 			}
+
 			ret = process_kf_arg_ptr_to_rbtree_node(env, reg, regno, meta);
 			if (ret < 0)
 				return ret;
@@ -9665,11 +9680,12 @@ static int check_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
 				   meta.func_id == special_kfunc_list[KF_bpf_list_pop_back]) {
 				struct btf_field *field = meta.arg_list_head.field;
 
-				mark_reg_known_zero(env, regs, BPF_REG_0);
-				regs[BPF_REG_0].type = PTR_TO_BTF_ID | MEM_ALLOC;
-				regs[BPF_REG_0].btf = field->graph_root.btf;
-				regs[BPF_REG_0].btf_id = field->graph_root.value_btf_id;
-				regs[BPF_REG_0].off = field->graph_root.node_offset;
+				mark_reg_graph_node(regs, BPF_REG_0, &field->graph_root);
+			} else if (meta.func_id == special_kfunc_list[KF_bpf_rbtree_remove] ||
+				   meta.func_id == special_kfunc_list[KF_bpf_rbtree_first]) {
+				struct btf_field *field = meta.arg_rbtree_root.field;
+
+				mark_reg_graph_node(regs, BPF_REG_0, &field->graph_root);
 			} else if (meta.func_id == special_kfunc_list[KF_bpf_cast_to_kern_ctx]) {
 				mark_reg_known_zero(env, regs, BPF_REG_0);
 				regs[BPF_REG_0].type = PTR_TO_BTF_ID | PTR_TRUSTED;
@@ -9735,7 +9751,13 @@ static int check_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
 			if (is_kfunc_ret_null(&meta))
 				regs[BPF_REG_0].id = id;
 			regs[BPF_REG_0].ref_obj_id = id;
+		} else if (meta.func_id == special_kfunc_list[KF_bpf_rbtree_first]) {
+			ref_set_non_owning_lock(env, &regs[BPF_REG_0]);
 		}
+
+		if (is_kfunc_invalidate_non_own(&meta))
+			invalidate_non_owning_refs(env, &env->cur_state->active_lock);
+
 		if (reg_may_point_to_spin_lock(&regs[BPF_REG_0]) && !regs[BPF_REG_0].id)
 			regs[BPF_REG_0].id = ++env->id_gen;
 	} /* else { add_kfunc_call() ensures it is btf_type_is_void(t) } */
-- 
2.30.2



* [PATCH v2 bpf-next 10/13] bpf: Add bpf_rbtree_{add,remove,first} decls to bpf_experimental.h
  2022-12-17  8:24 [PATCH v2 bpf-next 00/13] BPF rbtree next-gen datastructure Dave Marchevsky
                   ` (8 preceding siblings ...)
  2022-12-17  8:25 ` [PATCH v2 bpf-next 09/13] bpf: Special verifier handling for bpf_rbtree_{remove, first} Dave Marchevsky
@ 2022-12-17  8:25 ` Dave Marchevsky
  2022-12-17  8:25 ` [PATCH v2 bpf-next 11/13] libbpf: Make BTF mandatory if program BTF has spin_lock or alloc_obj type Dave Marchevsky
                   ` (2 subsequent siblings)
  12 siblings, 0 replies; 38+ messages in thread
From: Dave Marchevsky @ 2022-12-17  8:25 UTC (permalink / raw)
  To: bpf
  Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Kernel Team, Kumar Kartikeya Dwivedi, Tejun Heo, Dave Marchevsky

These kfuncs will be used by selftests in the following patches.
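
A minimal usage sketch tying the three declarations together (node_data,
less(), glock and groot as defined in the selftests added later in this
series; note that bpf_rbtree_remove hands back an owning reference which
must eventually be passed to bpf_obj_drop or re-added):

  struct bpf_rb_node *res;
  struct node_data *n;

  n = bpf_obj_new(typeof(*n));
  if (!n)
    return 1;
  n->key = 42;

  bpf_spin_lock(&glock);
  bpf_rbtree_add(&groot, &n->node, less);   /* tree now owns the node */
  res = bpf_rbtree_first(&groot);           /* non-owning ref, may be NULL */
  if (res)
    res = bpf_rbtree_remove(&groot, res);   /* owning ref again */
  bpf_spin_unlock(&glock);

  if (res) {
    n = container_of(res, struct node_data, node);
    bpf_obj_drop(n);                        /* program owns removed node */
  }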

Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
---
 .../testing/selftests/bpf/bpf_experimental.h  | 24 +++++++++++++++++++
 1 file changed, 24 insertions(+)

diff --git a/tools/testing/selftests/bpf/bpf_experimental.h b/tools/testing/selftests/bpf/bpf_experimental.h
index 424f7bbbfe9b..dbd2c729781a 100644
--- a/tools/testing/selftests/bpf/bpf_experimental.h
+++ b/tools/testing/selftests/bpf/bpf_experimental.h
@@ -65,4 +65,28 @@ extern struct bpf_list_node *bpf_list_pop_front(struct bpf_list_head *head) __ks
  */
 extern struct bpf_list_node *bpf_list_pop_back(struct bpf_list_head *head) __ksym;
 
+/* Description
+ *	Remove 'node' from rbtree with root 'root'
+ * Returns
+ * 	Pointer to the removed node, or NULL if 'root' didn't contain 'node'
+ */
+extern struct bpf_rb_node *bpf_rbtree_remove(struct bpf_rb_root *root,
+					     struct bpf_rb_node *node) __ksym;
+
+/* Description
+ *	Add 'node' to rbtree with root 'root' using comparator 'less'
+ * Returns
+ *	Nothing
+ */
+extern void bpf_rbtree_add(struct bpf_rb_root *root, struct bpf_rb_node *node,
+			   bool (less)(struct bpf_rb_node *a, const struct bpf_rb_node *b)) __ksym;
+
+/* Description
+ *	Return the first (leftmost) node in input tree
+ * Returns
+ *	Pointer to the node, which is _not_ removed from the tree. If the tree
+ *	contains no nodes, returns NULL.
+ */
+extern struct bpf_rb_node *bpf_rbtree_first(struct bpf_rb_root *root) __ksym;
+
 #endif
-- 
2.30.2



* [PATCH v2 bpf-next 11/13] libbpf: Make BTF mandatory if program BTF has spin_lock or alloc_obj type
  2022-12-17  8:24 [PATCH v2 bpf-next 00/13] BPF rbtree next-gen datastructure Dave Marchevsky
                   ` (9 preceding siblings ...)
  2022-12-17  8:25 ` [PATCH v2 bpf-next 10/13] bpf: Add bpf_rbtree_{add,remove,first} decls to bpf_experimental.h Dave Marchevsky
@ 2022-12-17  8:25 ` Dave Marchevsky
  2022-12-22 18:50   ` Andrii Nakryiko
  2022-12-17  8:25 ` [PATCH v2 bpf-next 12/13] selftests/bpf: Add rbtree selftests Dave Marchevsky
  2022-12-17  8:25 ` [PATCH v2 bpf-next 13/13] bpf, documentation: Add graph documentation for non-owning refs Dave Marchevsky
  12 siblings, 1 reply; 38+ messages in thread
From: Dave Marchevsky @ 2022-12-17  8:25 UTC (permalink / raw)
  To: bpf
  Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Kernel Team, Kumar Kartikeya Dwivedi, Tejun Heo, Dave Marchevsky

If a BPF program defines a struct or union type which has a field type
that the verifier considers special - spin_lock, graph datastructure
heads and nodes - the verifier needs to be able to find fields of that
type using BTF.

For such a program, BTF is required, so modify the kernel_needs_btf helper
to ensure that the correct "BTF is mandatory" error message is emitted.

The newly-added btf_has_alloc_obj_type looks for BTF_KIND_STRUCTs with a
name corresponding to a special type. If any such struct is found, it is
assumed that some variable is using it, and therefore that a successful
BTF load is necessary.

Also add a kernel_needs_btf check to bpf_object__create_map where it was
previously missing. When this function calls bpf_map_create, the kernel may
reject map creation due to mismatched graph owner and ownee
types (e.g. a struct bpf_list_head with a __contains tag pointing to a
bpf_rb_node field). In such a scenario - or any other where BTF is
necessary for verification - bpf_map_create should not be retried
without BTF.
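
For example (a sketch; the struct and map names here are made up), an
object whose BTF contains a map value type like the following will now be
treated as requiring BTF, since "bpf_spin_lock" appears as a
BTF_KIND_STRUCT in the object's BTF:

  struct map_val {
  	struct bpf_spin_lock lock;
  	long counter;
  };

  struct {
  	__uint(type, BPF_MAP_TYPE_ARRAY);
  	__uint(max_entries, 1);
  	__type(key, int);
  	__type(value, struct map_val);
  } val_map SEC(".maps");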

Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
---
 tools/lib/bpf/libbpf.c | 50 ++++++++++++++++++++++++++++++++----------
 1 file changed, 39 insertions(+), 11 deletions(-)

diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
index 2a82f49ce16f..56a905b502c9 100644
--- a/tools/lib/bpf/libbpf.c
+++ b/tools/lib/bpf/libbpf.c
@@ -998,6 +998,31 @@ find_struct_ops_kern_types(const struct btf *btf, const char *tname,
 	return 0;
 }
 
+/* Should match alloc_obj_fields in kernel/bpf/btf.c
+ */
+static const char *alloc_obj_fields[] = {
+	"bpf_spin_lock",
+	"bpf_list_head",
+	"bpf_list_node",
+	"bpf_rb_root",
+	"bpf_rb_node",
+};
+
+static bool
+btf_has_alloc_obj_type(const struct btf *btf)
+{
+	const char *tname;
+	int i;
+
+	for (i = 0; i < ARRAY_SIZE(alloc_obj_fields); i++) {
+		tname = alloc_obj_fields[i];
+		if (btf__find_by_name_kind(btf, tname, BTF_KIND_STRUCT) > 0)
+			return true;
+	}
+
+	return false;
+}
+
 static bool bpf_map__is_struct_ops(const struct bpf_map *map)
 {
 	return map->def.type == BPF_MAP_TYPE_STRUCT_OPS;
@@ -2794,7 +2819,8 @@ static bool libbpf_needs_btf(const struct bpf_object *obj)
 
 static bool kernel_needs_btf(const struct bpf_object *obj)
 {
-	return obj->efile.st_ops_shndx >= 0;
+	return obj->efile.st_ops_shndx >= 0 ||
+		(obj->btf && btf_has_alloc_obj_type(obj->btf));
 }
 
 static int bpf_object__init_btf(struct bpf_object *obj,
@@ -5103,16 +5129,18 @@ static int bpf_object__create_map(struct bpf_object *obj, struct bpf_map *map, b
 
 		err = -errno;
 		cp = libbpf_strerror_r(err, errmsg, sizeof(errmsg));
-		pr_warn("Error in bpf_create_map_xattr(%s):%s(%d). Retrying without BTF.\n",
-			map->name, cp, err);
-		create_attr.btf_fd = 0;
-		create_attr.btf_key_type_id = 0;
-		create_attr.btf_value_type_id = 0;
-		map->btf_key_type_id = 0;
-		map->btf_value_type_id = 0;
-		map->fd = bpf_map_create(def->type, map_name,
-					 def->key_size, def->value_size,
-					 def->max_entries, &create_attr);
+		pr_warn("Error in bpf_create_map_xattr(%s):%s(%d).\n", map->name, cp, err);
+		if (!kernel_needs_btf(obj)) {
+			pr_warn("Retrying bpf_map_create_xattr(%s) without BTF.\n", map->name);
+			create_attr.btf_fd = 0;
+			create_attr.btf_key_type_id = 0;
+			create_attr.btf_value_type_id = 0;
+			map->btf_key_type_id = 0;
+			map->btf_value_type_id = 0;
+			map->fd = bpf_map_create(def->type, map_name,
+						 def->key_size, def->value_size,
+						 def->max_entries, &create_attr);
+		}
 	}
 
 	err = map->fd < 0 ? -errno : 0;
-- 
2.30.2



* [PATCH v2 bpf-next 12/13] selftests/bpf: Add rbtree selftests
  2022-12-17  8:24 [PATCH v2 bpf-next 00/13] BPF rbtree next-gen datastructure Dave Marchevsky
                   ` (10 preceding siblings ...)
  2022-12-17  8:25 ` [PATCH v2 bpf-next 11/13] libbpf: Make BTF mandatory if program BTF has spin_lock or alloc_obj type Dave Marchevsky
@ 2022-12-17  8:25 ` Dave Marchevsky
  2022-12-17  8:25 ` [PATCH v2 bpf-next 13/13] bpf, documentation: Add graph documentation for non-owning refs Dave Marchevsky
  12 siblings, 0 replies; 38+ messages in thread
From: Dave Marchevsky @ 2022-12-17  8:25 UTC (permalink / raw)
  To: bpf
  Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Kernel Team, Kumar Kartikeya Dwivedi, Tejun Heo, Dave Marchevsky

This patch adds selftests exercising the logic changed/added in the
previous patches in the series. A variety of successful and unsuccessful
rbtree usages are validated:

Success:
  * Add some nodes, let map_value bpf_rbtree_root destructor clean them
    up
  * Add some nodes, remove one using the non-owning ref leftover by
    successful rbtree_add() call
  * Add some nodes, remove one using the non-owning ref returned by
    rbtree_first() call

Failure:
  * BTF where bpf_rb_root owns bpf_list_node should fail to load
  * BTF where node of type X is added to tree containing nodes of type Y
    should fail to load
  * No calling rbtree api functions in 'less' callback for rbtree_add
  * No releasing lock in 'less' callback for rbtree_add
  * No removing a node which hasn't been added to any tree
  * No adding a node which has already been added to a tree
  * No escaping of non-owning references past their lock's
    critical section
  * No escaping of non-owning references past other invalidation points
    (rbtree_remove)

These tests mostly focus on rbtree-specific additions, but some of the
failure cases revalidate scenarios common to both linked_list and rbtree
which are covered in the former's tests. Better to be a bit redundant in
case linked_list and rbtree semantics deviate over time.

Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
---
 .../testing/selftests/bpf/prog_tests/rbtree.c | 186 +++++++++++
 tools/testing/selftests/bpf/progs/rbtree.c    | 176 +++++++++++
 .../progs/rbtree_btf_fail__add_wrong_type.c   |  52 +++
 .../progs/rbtree_btf_fail__wrong_node_type.c  |  49 +++
 .../testing/selftests/bpf/progs/rbtree_fail.c | 296 ++++++++++++++++++
 5 files changed, 759 insertions(+)
 create mode 100644 tools/testing/selftests/bpf/prog_tests/rbtree.c
 create mode 100644 tools/testing/selftests/bpf/progs/rbtree.c
 create mode 100644 tools/testing/selftests/bpf/progs/rbtree_btf_fail__add_wrong_type.c
 create mode 100644 tools/testing/selftests/bpf/progs/rbtree_btf_fail__wrong_node_type.c
 create mode 100644 tools/testing/selftests/bpf/progs/rbtree_fail.c

diff --git a/tools/testing/selftests/bpf/prog_tests/rbtree.c b/tools/testing/selftests/bpf/prog_tests/rbtree.c
new file mode 100644
index 000000000000..ae332a532223
--- /dev/null
+++ b/tools/testing/selftests/bpf/prog_tests/rbtree.c
@@ -0,0 +1,186 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2022 Meta Platforms, Inc. and affiliates. */
+
+#include <test_progs.h>
+#include <network_helpers.h>
+
+#include "rbtree.skel.h"
+#include "rbtree_fail.skel.h"
+#include "rbtree_btf_fail__wrong_node_type.skel.h"
+#include "rbtree_btf_fail__add_wrong_type.skel.h"
+
+static char log_buf[1024 * 1024];
+
+static struct {
+	const char *prog_name;
+	const char *err_msg;
+} rbtree_fail_tests[] = {
+	{"rbtree_api_nolock_add", "bpf_spin_lock at off=16 must be held for bpf_rb_root"},
+	{"rbtree_api_nolock_remove", "bpf_spin_lock at off=16 must be held for bpf_rb_root"},
+	{"rbtree_api_nolock_first", "bpf_spin_lock at off=16 must be held for bpf_rb_root"},
+
+	/* Specific failure string for these three isn't very important, but it shouldn't be
+	 * possible to call rbtree api func from within add() callback
+	 */
+	{"rbtree_api_add_bad_cb_bad_fn_call_add",
+	 "release kernel function bpf_rbtree_add expects refcounted PTR_TO_BTF_ID"},
+	{"rbtree_api_add_bad_cb_bad_fn_call_remove", "rbtree_remove not allowed in rbtree cb"},
+	{"rbtree_api_add_bad_cb_bad_fn_call_first_unlock_after",
+	 "can't spin_{lock,unlock} in rbtree cb"},
+
+	{"rbtree_api_remove_unadded_node", "rbtree_remove node input must be non-owning ref"},
+	{"rbtree_api_add_to_multiple_trees",
+	 "function bpf_rbtree_add expects refcounted PTR_TO_BTF_ID"},
+	{"rbtree_api_add_release_unlock_escape", "arg#1 expected pointer to allocated object"},
+	{"rbtree_api_first_release_unlock_escape", "arg#1 expected pointer to allocated object"},
+	{"rbtree_api_remove_no_drop", "Unreleased reference id=2 alloc_insn=11"},
+	{"rbtree_api_release_aliasing", "arg#1 expected pointer to allocated object"},
+};
+
+static void test_rbtree_fail_prog(const char *prog_name, const char *err_msg)
+{
+	LIBBPF_OPTS(bpf_object_open_opts, opts,
+		    .kernel_log_buf = log_buf,
+		    .kernel_log_size = sizeof(log_buf),
+		    .kernel_log_level = 1
+	);
+	struct rbtree_fail *skel;
+	struct bpf_program *prog;
+	int ret;
+
+	skel = rbtree_fail__open_opts(&opts);
+	if (!ASSERT_OK_PTR(skel, "rbtree_fail__open_opts"))
+		return;
+
+	prog = bpf_object__find_program_by_name(skel->obj, prog_name);
+	if (!ASSERT_OK_PTR(prog, "bpf_object__find_program_by_name"))
+		goto end;
+
+	bpf_program__set_autoload(prog, true);
+
+	ret = rbtree_fail__load(skel);
+	if (!ASSERT_ERR(ret, "rbtree_fail__load must fail"))
+		goto end;
+
+	if (!ASSERT_OK_PTR(strstr(log_buf, err_msg), "expected error message")) {
+		fprintf(stderr, "Expected: %s\n", err_msg);
+		fprintf(stderr, "Verifier: %s\n", log_buf);
+	}
+
+end:
+	rbtree_fail__destroy(skel);
+}
+
+static void test_rbtree_add_nodes(void)
+{
+	LIBBPF_OPTS(bpf_test_run_opts, opts,
+		    .data_in = &pkt_v4,
+		    .data_size_in = sizeof(pkt_v4),
+		    .repeat = 1,
+	);
+	struct rbtree *skel;
+	int ret;
+
+	skel = rbtree__open_and_load();
+	if (!ASSERT_OK_PTR(skel, "rbtree__open_and_load"))
+		return;
+
+	ret = bpf_prog_test_run_opts(bpf_program__fd(skel->progs.rbtree_add_nodes), &opts);
+	ASSERT_OK(ret, "rbtree_add_nodes run");
+	ASSERT_OK(opts.retval, "rbtree_add_nodes retval");
+	ASSERT_EQ(skel->data->less_callback_ran, 1, "rbtree_add_nodes less_callback_ran");
+
+	rbtree__destroy(skel);
+}
+
+static void test_rbtree_add_and_remove(void)
+{
+	LIBBPF_OPTS(bpf_test_run_opts, opts,
+		    .data_in = &pkt_v4,
+		    .data_size_in = sizeof(pkt_v4),
+		    .repeat = 1,
+	);
+	struct rbtree *skel;
+	int ret;
+
+	skel = rbtree__open_and_load();
+	if (!ASSERT_OK_PTR(skel, "rbtree__open_and_load"))
+		return;
+
+	ret = bpf_prog_test_run_opts(bpf_program__fd(skel->progs.rbtree_add_and_remove), &opts);
+	ASSERT_OK(ret, "rbtree_add_and_remove");
+	ASSERT_OK(opts.retval, "rbtree_add_and_remove retval");
+	ASSERT_EQ(skel->data->removed_key, 5, "rbtree_add_and_remove first removed key");
+
+	rbtree__destroy(skel);
+}
+
+static void test_rbtree_first_and_remove(void)
+{
+	LIBBPF_OPTS(bpf_test_run_opts, opts,
+		    .data_in = &pkt_v4,
+		    .data_size_in = sizeof(pkt_v4),
+		    .repeat = 1,
+	);
+	struct rbtree *skel;
+	int ret;
+
+	skel = rbtree__open_and_load();
+	if (!ASSERT_OK_PTR(skel, "rbtree__open_and_load"))
+		return;
+
+	ret = bpf_prog_test_run_opts(bpf_program__fd(skel->progs.rbtree_first_and_remove), &opts);
+	ASSERT_OK(ret, "rbtree_first_and_remove");
+	ASSERT_OK(opts.retval, "rbtree_first_and_remove retval");
+	ASSERT_EQ(skel->data->first_data[0], 2, "rbtree_first_and_remove first rbtree_first()");
+	ASSERT_EQ(skel->data->removed_key, 1, "rbtree_first_and_remove first removed key");
+	ASSERT_EQ(skel->data->first_data[1], 4, "rbtree_first_and_remove second rbtree_first()");
+
+	rbtree__destroy(skel);
+}
+
+void test_rbtree_success(void)
+{
+	if (test__start_subtest("rbtree_add_nodes"))
+		test_rbtree_add_nodes();
+	if (test__start_subtest("rbtree_add_and_remove"))
+		test_rbtree_add_and_remove();
+	if (test__start_subtest("rbtree_first_and_remove"))
+		test_rbtree_first_and_remove();
+}
+
+#define BTF_FAIL_TEST(suffix)									\
+void test_rbtree_btf_fail__##suffix(void)							\
+{												\
+	struct rbtree_btf_fail__##suffix *skel;							\
+												\
+	skel = rbtree_btf_fail__##suffix##__open_and_load();					\
+	if (!ASSERT_ERR_PTR(skel,								\
+			    "rbtree_btf_fail__" #suffix "__open_and_load unexpected success"))	\
+		rbtree_btf_fail__##suffix##__destroy(skel);					\
+}
+
+#define RUN_BTF_FAIL_TEST(suffix)				\
+	if (test__start_subtest("rbtree_btf_fail__" #suffix))	\
+		test_rbtree_btf_fail__##suffix();
+
+BTF_FAIL_TEST(wrong_node_type);
+BTF_FAIL_TEST(add_wrong_type);
+
+void test_rbtree_btf_fail(void)
+{
+	RUN_BTF_FAIL_TEST(wrong_node_type);
+	RUN_BTF_FAIL_TEST(add_wrong_type);
+}
+
+void test_rbtree_fail(void)
+{
+	int i;
+
+	for (i = 0; i < ARRAY_SIZE(rbtree_fail_tests); i++) {
+		if (!test__start_subtest(rbtree_fail_tests[i].prog_name))
+			continue;
+		test_rbtree_fail_prog(rbtree_fail_tests[i].prog_name,
+				      rbtree_fail_tests[i].err_msg);
+	}
+}
diff --git a/tools/testing/selftests/bpf/progs/rbtree.c b/tools/testing/selftests/bpf/progs/rbtree.c
new file mode 100644
index 000000000000..e5db1a4287e5
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/rbtree.c
@@ -0,0 +1,176 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2022 Meta Platforms, Inc. and affiliates. */
+
+#include <vmlinux.h>
+#include <bpf/bpf_tracing.h>
+#include <bpf/bpf_helpers.h>
+#include <bpf/bpf_core_read.h>
+#include "bpf_experimental.h"
+
+struct node_data {
+	long key;
+	long data;
+	struct bpf_rb_node node;
+};
+
+long less_callback_ran = -1;
+long removed_key = -1;
+long first_data[2] = {-1, -1};
+
+#define private(name) SEC(".data." #name) __hidden __attribute__((aligned(8)))
+private(A) struct bpf_spin_lock glock;
+private(A) struct bpf_rb_root groot __contains(node_data, node);
+
+static bool less(struct bpf_rb_node *a, const struct bpf_rb_node *b)
+{
+	struct node_data *node_a;
+	struct node_data *node_b;
+
+	node_a = container_of(a, struct node_data, node);
+	node_b = container_of(b, struct node_data, node);
+	less_callback_ran = 1;
+
+	return node_a->key < node_b->key;
+}
+
+static long __add_three(struct bpf_rb_root *root, struct bpf_spin_lock *lock)
+{
+	struct node_data *n, *m;
+
+	n = bpf_obj_new(typeof(*n));
+	if (!n)
+		return 1;
+	n->key = 5;
+
+	m = bpf_obj_new(typeof(*m));
+	if (!m) {
+		bpf_obj_drop(n);
+		return 2;
+	}
+	m->key = 1;
+
+	bpf_spin_lock(&glock);
+	bpf_rbtree_add(&groot, &n->node, less);
+	bpf_rbtree_add(&groot, &m->node, less);
+	bpf_spin_unlock(&glock);
+
+	n = bpf_obj_new(typeof(*n));
+	if (!n)
+		return 3;
+	n->key = 3;
+
+	bpf_spin_lock(&glock);
+	bpf_rbtree_add(&groot, &n->node, less);
+	bpf_spin_unlock(&glock);
+	return 0;
+}
+
+SEC("tc")
+long rbtree_add_nodes(void *ctx)
+{
+	return __add_three(&groot, &glock);
+}
+
+SEC("tc")
+long rbtree_add_and_remove(void *ctx)
+{
+	struct bpf_rb_node *res = NULL;
+	struct node_data *n, *m;
+
+	n = bpf_obj_new(typeof(*n));
+	if (!n)
+		goto err_out;
+	n->key = 5;
+
+	m = bpf_obj_new(typeof(*m));
+	if (!m)
+		goto err_out;
+	m->key = 3;
+
+	bpf_spin_lock(&glock);
+	bpf_rbtree_add(&groot, &n->node, less);
+	bpf_rbtree_add(&groot, &m->node, less);
+	res = bpf_rbtree_remove(&groot, &n->node);
+	bpf_spin_unlock(&glock);
+
+	n = container_of(res, struct node_data, node);
+	removed_key = n->key;
+
+	bpf_obj_drop(n);
+
+	return 0;
+err_out:
+	if (n)
+		bpf_obj_drop(n);
+	if (m)
+		bpf_obj_drop(m);
+	return 1;
+}
+
+SEC("tc")
+long rbtree_first_and_remove(void *ctx)
+{
+	struct bpf_rb_node *res = NULL;
+	struct node_data *n, *m, *o;
+
+	n = bpf_obj_new(typeof(*n));
+	if (!n)
+		return 1;
+	n->key = 3;
+	n->data = 4;
+
+	m = bpf_obj_new(typeof(*m));
+	if (!m)
+		goto err_out;
+	m->key = 5;
+	m->data = 6;
+
+	o = bpf_obj_new(typeof(*o));
+	if (!o)
+		goto err_out;
+	o->key = 1;
+	o->data = 2;
+
+	bpf_spin_lock(&glock);
+	bpf_rbtree_add(&groot, &n->node, less);
+	bpf_rbtree_add(&groot, &m->node, less);
+	bpf_rbtree_add(&groot, &o->node, less);
+
+	res = bpf_rbtree_first(&groot);
+	if (!res) {
+		bpf_spin_unlock(&glock);
+		return 2;
+	}
+
+	o = container_of(res, struct node_data, node);
+	first_data[0] = o->data;
+
+	res = bpf_rbtree_remove(&groot, &o->node);
+	bpf_spin_unlock(&glock);
+
+	o = container_of(res, struct node_data, node);
+	removed_key = o->key;
+
+	bpf_obj_drop(o);
+
+	bpf_spin_lock(&glock);
+	res = bpf_rbtree_first(&groot);
+	if (!res) {
+		bpf_spin_unlock(&glock);
+		return 3;
+	}
+
+	o = container_of(res, struct node_data, node);
+	first_data[1] = o->data;
+	bpf_spin_unlock(&glock);
+
+	return 0;
+err_out:
+	if (n)
+		bpf_obj_drop(n);
+	if (m)
+		bpf_obj_drop(m);
+	return 1;
+}
+
+char _license[] SEC("license") = "GPL";
diff --git a/tools/testing/selftests/bpf/progs/rbtree_btf_fail__add_wrong_type.c b/tools/testing/selftests/bpf/progs/rbtree_btf_fail__add_wrong_type.c
new file mode 100644
index 000000000000..60079b202c07
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/rbtree_btf_fail__add_wrong_type.c
@@ -0,0 +1,52 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2022 Meta Platforms, Inc. and affiliates. */
+
+#include <vmlinux.h>
+#include <bpf/bpf_tracing.h>
+#include <bpf/bpf_helpers.h>
+#include <bpf/bpf_core_read.h>
+#include "bpf_experimental.h"
+
+struct node_data {
+	int key;
+	int data;
+	struct bpf_rb_node node;
+};
+
+struct node_data2 {
+	int key;
+	struct bpf_rb_node node;
+	int data;
+};
+
+static bool less2(struct bpf_rb_node *a, const struct bpf_rb_node *b)
+{
+	struct node_data2 *node_a;
+	struct node_data2 *node_b;
+
+	node_a = container_of(a, struct node_data2, node);
+	node_b = container_of(b, struct node_data2, node);
+
+	return node_a->key < node_b->key;
+}
+
+#define private(name) SEC(".data." #name) __hidden __attribute__((aligned(8)))
+private(A) struct bpf_spin_lock glock;
+private(A) struct bpf_rb_root groot __contains(node_data, node);
+
+SEC("tc")
+long rbtree_api_add__add_wrong_type(void *ctx)
+{
+	struct node_data2 *n;
+
+	n = bpf_obj_new(typeof(*n));
+	if (!n)
+		return 1;
+
+	bpf_spin_lock(&glock);
+	bpf_rbtree_add(&groot, &n->node, less2);
+	bpf_spin_unlock(&glock);
+	return 0;
+}
+
+char _license[] SEC("license") = "GPL";
diff --git a/tools/testing/selftests/bpf/progs/rbtree_btf_fail__wrong_node_type.c b/tools/testing/selftests/bpf/progs/rbtree_btf_fail__wrong_node_type.c
new file mode 100644
index 000000000000..340f97da1084
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/rbtree_btf_fail__wrong_node_type.c
@@ -0,0 +1,49 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2022 Meta Platforms, Inc. and affiliates. */
+
+#include <vmlinux.h>
+#include <bpf/bpf_tracing.h>
+#include <bpf/bpf_helpers.h>
+#include <bpf/bpf_core_read.h>
+#include "bpf_experimental.h"
+
+/* BTF load should fail as bpf_rb_root __contains this type and points to
+ * 'node', but 'node' is not a bpf_rb_node
+ */
+struct node_data {
+	int key;
+	int data;
+	struct bpf_list_node node;
+};
+
+static bool less(struct bpf_rb_node *a, const struct bpf_rb_node *b)
+{
+	struct node_data *node_a;
+	struct node_data *node_b;
+
+	node_a = container_of(a, struct node_data, node);
+	node_b = container_of(b, struct node_data, node);
+
+	return node_a->key < node_b->key;
+}
+
+#define private(name) SEC(".data." #name) __hidden __attribute__((aligned(8)))
+private(A) struct bpf_spin_lock glock;
+private(A) struct bpf_rb_root groot __contains(node_data, node);
+
+SEC("tc")
+long rbtree_api_add__wrong_node_type(void *ctx)
+{
+	struct node_data *n;
+
+	n = bpf_obj_new(typeof(*n));
+	if (!n)
+		return 1;
+
+	bpf_spin_lock(&glock);
+	bpf_rbtree_first(&groot);
+	bpf_spin_unlock(&glock);
+	return 0;
+}
+
+char _license[] SEC("license") = "GPL";
diff --git a/tools/testing/selftests/bpf/progs/rbtree_fail.c b/tools/testing/selftests/bpf/progs/rbtree_fail.c
new file mode 100644
index 000000000000..df6e2a39fcee
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/rbtree_fail.c
@@ -0,0 +1,296 @@
+// SPDX-License-Identifier: GPL-2.0
+#include <vmlinux.h>
+#include <bpf/bpf_tracing.h>
+#include <bpf/bpf_helpers.h>
+#include <bpf/bpf_core_read.h>
+#include "bpf_experimental.h"
+
+struct node_data {
+	long key;
+	long data;
+	struct bpf_rb_node node;
+};
+
+#define private(name) SEC(".data." #name) __hidden __attribute__((aligned(8)))
+private(A) struct bpf_spin_lock glock;
+private(A) struct bpf_rb_root groot __contains(node_data, node);
+private(A) struct bpf_rb_root groot2 __contains(node_data, node);
+
+static bool less(struct bpf_rb_node *a, const struct bpf_rb_node *b)
+{
+	struct node_data *node_a;
+	struct node_data *node_b;
+
+	node_a = container_of(a, struct node_data, node);
+	node_b = container_of(b, struct node_data, node);
+
+	return node_a->key < node_b->key;
+}
+
+SEC("?tc")
+long rbtree_api_nolock_add(void *ctx)
+{
+	struct node_data *n;
+
+	n = bpf_obj_new(typeof(*n));
+	if (!n)
+		return 1;
+
+	bpf_rbtree_add(&groot, &n->node, less);
+	return 0;
+}
+
+SEC("?tc")
+long rbtree_api_nolock_remove(void *ctx)
+{
+	struct node_data *n;
+
+	n = bpf_obj_new(typeof(*n));
+	if (!n)
+		return 1;
+
+	bpf_spin_lock(&glock);
+	bpf_rbtree_add(&groot, &n->node, less);
+	bpf_spin_unlock(&glock);
+
+	bpf_rbtree_remove(&groot, &n->node);
+	return 0;
+}
+
+SEC("?tc")
+long rbtree_api_nolock_first(void *ctx)
+{
+	bpf_rbtree_first(&groot);
+	return 0;
+}
+
+SEC("?tc")
+long rbtree_api_remove_unadded_node(void *ctx)
+{
+	struct node_data *n, *m;
+	struct bpf_rb_node *res;
+
+	n = bpf_obj_new(typeof(*n));
+	if (!n)
+		return 1;
+
+	m = bpf_obj_new(typeof(*m));
+	if (!m) {
+		bpf_obj_drop(n);
+		return 1;
+	}
+
+	bpf_spin_lock(&glock);
+	bpf_rbtree_add(&groot, &n->node, less);
+
+	/* This remove should pass verifier */
+	res = bpf_rbtree_remove(&groot, &n->node);
+	n = container_of(res, struct node_data, node);
+
+	/* This remove shouldn't, m isn't in an rbtree */
+	res = bpf_rbtree_remove(&groot, &m->node);
+	m = container_of(res, struct node_data, node);
+	bpf_spin_unlock(&glock);
+
+	if (n)
+		bpf_obj_drop(n);
+	if (m)
+		bpf_obj_drop(m);
+	return 0;
+}
+
+SEC("?tc")
+long rbtree_api_remove_no_drop(void *ctx)
+{
+	struct bpf_rb_node *res;
+	struct node_data *n;
+
+	bpf_spin_lock(&glock);
+	res = bpf_rbtree_first(&groot);
+	if (!res)
+		goto unlock_err;
+
+	res = bpf_rbtree_remove(&groot, res);
+
+	n = container_of(res, struct node_data, node);
+	bpf_spin_unlock(&glock);
+
+	/* bpf_obj_drop(n) is missing here */
+	return 0;
+
+unlock_err:
+	bpf_spin_unlock(&glock);
+	return 1;
+}
+
+SEC("?tc")
+long rbtree_api_add_to_multiple_trees(void *ctx)
+{
+	struct node_data *n;
+
+	n = bpf_obj_new(typeof(*n));
+	if (!n)
+		return 1;
+
+	bpf_spin_lock(&glock);
+	bpf_rbtree_add(&groot, &n->node, less);
+
+	/* This add should fail since n already in groot's tree */
+	bpf_rbtree_add(&groot2, &n->node, less);
+	bpf_spin_unlock(&glock);
+	return 0;
+}
+
+SEC("?tc")
+long rbtree_api_add_release_unlock_escape(void *ctx)
+{
+	struct node_data *n;
+
+	n = bpf_obj_new(typeof(*n));
+	if (!n)
+		return 1;
+
+	bpf_spin_lock(&glock);
+	bpf_rbtree_add(&groot, &n->node, less);
+	bpf_spin_unlock(&glock);
+
+	bpf_spin_lock(&glock);
+	/* After add() in previous critical section, n should be
+	 * release_on_unlock and released after previous spin_unlock,
+	 * so should not be possible to use it here
+	 */
+	bpf_rbtree_remove(&groot, &n->node);
+	bpf_spin_unlock(&glock);
+	return 0;
+}
+
+SEC("?tc")
+long rbtree_api_release_aliasing(void *ctx)
+{
+	struct node_data *n, *m, *o;
+	struct bpf_rb_node *res;
+
+	n = bpf_obj_new(typeof(*n));
+	if (!n)
+		return 1;
+
+	bpf_spin_lock(&glock);
+	bpf_rbtree_add(&groot, &n->node, less);
+	bpf_spin_unlock(&glock);
+
+	bpf_spin_lock(&glock);
+
+	/* m and o point to the same node,
+	 * but verifier doesn't know this
+	 */
+	res = bpf_rbtree_first(&groot);
+	if (!res)
+		return 1;
+	o = container_of(res, struct node_data, node);
+
+	res = bpf_rbtree_first(&groot);
+	if (!res)
+		return 1;
+	m = container_of(res, struct node_data, node);
+
+	bpf_rbtree_remove(&groot, &m->node);
+	/* This second remove shouldn't be possible. Retval of previous
+	 * remove returns owning reference to m, which is the same
+	 * node o's non-owning ref is pointing at
+	 *
+	 * In order to preserve property
+	 *   * owning ref must not be in rbtree
+	 *   * non-owning ref must be in rbtree
+	 *
+	 * o's ref must be invalidated after previous remove. Otherwise
+	 * we'd have non-owning ref to node that isn't in rbtree, and
+	 * verifier wouldn't be able to use type system to prevent remove
+	 * of ref that already isn't in any tree. Would have to do runtime
+	 * checks in that case.
+	 */
+	bpf_rbtree_remove(&groot, &o->node);
+
+	bpf_spin_unlock(&glock);
+	return 0;
+}
+
+SEC("?tc")
+long rbtree_api_first_release_unlock_escape(void *ctx)
+{
+	struct bpf_rb_node *res;
+	struct node_data *n;
+
+	bpf_spin_lock(&glock);
+	res = bpf_rbtree_first(&groot);
+	if (res)
+		n = container_of(res, struct node_data, node);
+	bpf_spin_unlock(&glock);
+
+	bpf_spin_lock(&glock);
+	/* After first() in previous critical section, n should be
+	 * release_on_unlock and released after previous spin_unlock,
+	 * so should not be possible to use it here
+	 */
+	bpf_rbtree_remove(&groot, &n->node);
+	bpf_spin_unlock(&glock);
+	return 0;
+}
+
+static bool less__bad_fn_call_add(struct bpf_rb_node *a, const struct bpf_rb_node *b)
+{
+	struct node_data *node_a;
+	struct node_data *node_b;
+
+	node_a = container_of(a, struct node_data, node);
+	node_b = container_of(b, struct node_data, node);
+	bpf_rbtree_add(&groot, &node_a->node, less);
+
+	return node_a->key < node_b->key;
+}
+
+static bool less__bad_fn_call_remove(struct bpf_rb_node *a, const struct bpf_rb_node *b)
+{
+	struct node_data *node_a;
+	struct node_data *node_b;
+
+	node_a = container_of(a, struct node_data, node);
+	node_b = container_of(b, struct node_data, node);
+	bpf_rbtree_remove(&groot, &node_a->node);
+
+	return node_a->key < node_b->key;
+}
+
+static bool less__bad_fn_call_first_unlock_after(struct bpf_rb_node *a, const struct bpf_rb_node *b)
+{
+	struct node_data *node_a;
+	struct node_data *node_b;
+
+	node_a = container_of(a, struct node_data, node);
+	node_b = container_of(b, struct node_data, node);
+	bpf_rbtree_first(&groot);
+	bpf_spin_unlock(&glock);
+
+	return node_a->key < node_b->key;
+}
+
+#define RBTREE_API_ADD_BAD_CB(cb_suffix)				\
+SEC("?tc")								\
+long rbtree_api_add_bad_cb_##cb_suffix(void *ctx)			\
+{									\
+	struct node_data *n;						\
+									\
+	n = bpf_obj_new(typeof(*n));					\
+	if (!n)								\
+		return 1;						\
+									\
+	bpf_spin_lock(&glock);						\
+	bpf_rbtree_add(&groot, &n->node, less__##cb_suffix);		\
+	bpf_spin_unlock(&glock);					\
+	return 0;							\
+}
+
+RBTREE_API_ADD_BAD_CB(bad_fn_call_add);
+RBTREE_API_ADD_BAD_CB(bad_fn_call_remove);
+RBTREE_API_ADD_BAD_CB(bad_fn_call_first_unlock_after);
+
+char _license[] SEC("license") = "GPL";
-- 
2.30.2



* [PATCH v2 bpf-next 13/13] bpf, documentation: Add graph documentation for non-owning refs
  2022-12-17  8:24 [PATCH v2 bpf-next 00/13] BPF rbtree next-gen datastructure Dave Marchevsky
                   ` (11 preceding siblings ...)
  2022-12-17  8:25 ` [PATCH v2 bpf-next 12/13] selftests/bpf: Add rbtree selftests Dave Marchevsky
@ 2022-12-17  8:25 ` Dave Marchevsky
  2022-12-28 21:26   ` David Vernet
  12 siblings, 1 reply; 38+ messages in thread
From: Dave Marchevsky @ 2022-12-17  8:25 UTC (permalink / raw)
  To: bpf
  Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Kernel Team, Kumar Kartikeya Dwivedi, Tejun Heo, Dave Marchevsky

It is difficult to intuit the semantics of owning and non-owning
references from verifier code. In order to keep the high-level details
from being lost in the mailing list, this patch adds documentation
explaining semantics and details.

The target audience of doc added in this patch is folks working on BPF
internals, as there's focus on "what should the verifier do here". Via
reorganization or copy-and-paste, much of the content can probably be
repurposed for BPF program writer audience as well.

Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
---
 Documentation/bpf/graph_ds_impl.rst | 208 ++++++++++++++++++++++++++++
 Documentation/bpf/other.rst         |   3 +-
 2 files changed, 210 insertions(+), 1 deletion(-)
 create mode 100644 Documentation/bpf/graph_ds_impl.rst

diff --git a/Documentation/bpf/graph_ds_impl.rst b/Documentation/bpf/graph_ds_impl.rst
new file mode 100644
index 000000000000..f92cbd223dc3
--- /dev/null
+++ b/Documentation/bpf/graph_ds_impl.rst
@@ -0,0 +1,208 @@
+=========================
+BPF Graph Data Structures
+=========================
+
+This document describes implementation details of new-style "graph" data
+structures (linked_list, rbtree), with particular focus on verifier
+implementation of semantics particular to those data structures.
+
+Note that the intent of this document is to describe the current state of
+these graph data structures, **no guarantees** of stability for either
+semantics or APIs are made or implied here.
+
+.. contents::
+    :local:
+    :depth: 2
+
+Introduction
+------------
+
+The BPF map API has historically been the main way to expose data structures
+of various types for use within BPF programs. Some data structures fit naturally
+with the map API (HASH, ARRAY), others less so. Consequently, programs
+interacting with the latter group of data structures can be hard to parse
+for kernel programmers without previous BPF experience.
+
+Luckily, some restrictions which necessitated the use of BPF map semantics are
+no longer relevant. With the introduction of kfuncs, kptrs, and the any-context
+BPF allocator, it is now possible to implement BPF data structures whose API
+and semantics more closely match those exposed to the rest of the kernel.
+
+Two such data structures - linked_list and rbtree - have many verification
+details in common. Because both have "root"s ("head" for linked_list) and
+"node"s, the verifier code and this document refer to common functionality
+as "graph_api", "graph_root", "graph_node", etc.
+
+Unless otherwise stated, examples and semantics below apply to both graph data
+structures.
+
+Non-owning references
+---------------------
+
+**Motivation**
+
+Consider the following BPF code:
+
+.. code-block:: c
+        struct node_data *n = bpf_obj_new(typeof(*n)); /* BEFORE */
+
+        bpf_spin_lock(&lock);
+
+        bpf_rbtree_add(&tree, n); /* AFTER */
+
+        bpf_spin_unlock(&lock);
+----
+
+From the verifier's perspective, after bpf_obj_new ``n`` has type
+``PTR_TO_BTF_ID | MEM_ALLOC`` with btf_id of ``struct node_data`` and a
+nonzero ``ref_obj_id``. Because it holds ``n``, the program has ownership
+of the pointee's lifetime (object pointed to by ``n``). The BPF program must
+pass off ownership before exiting - either via ``bpf_obj_drop``, which free's
+the object, or by adding it to ``tree`` with ``bpf_rbtree_add``.
+
+(``BEFORE`` and ``AFTER`` comments in the example denote beginning of "before
+ownership is passed" and "after ownership is passed")
+
+What should the verifier do with ``n`` after ownership is passed off? If the
+object was free'd with ``bpf_obj_drop`` the answer is obvious: the verifier
+should reject programs which attempt to access ``n`` after ``bpf_obj_drop`` as
+the object is no longer valid. The underlying memory may have been reused for
+some other allocation, unmapped, etc.
+
+When ownership is passed to ``tree`` via ``bpf_rbtree_add`` the answer is less
+obvious. The verifier could enforce the same semantics as for ``bpf_obj_drop``,
+but that would result in programs with useful, common coding patterns being
+rejected, e.g.:
+
+.. code-block:: c
+        int x;
+        struct node_data *n = bpf_obj_new(typeof(*n)); /* BEFORE */
+
+        bpf_spin_lock(&lock);
+
+        bpf_rbtree_add(&tree, n); /* AFTER */
+        x = n->data;
+        n->data = 42;
+
+        bpf_spin_unlock(&lock);
+----
+
+Both the read from and write to ``n->data`` would be rejected. The verifier
+can do better, though, by taking advantage of two details:
+
+  * Graph data structure APIs can only be used when the ``bpf_spin_lock``
+    associated with the graph root is held
+  * Both graph data structures have pointer stability
+    * Because graph nodes are allocated with ``bpf_obj_new`` and
+      adding / removing from the root involves fiddling with the
+      ``bpf_{list,rb}_node`` field of the node struct, a graph node will
+      remain at the same address after either operation.
+
+Because the associated ``bpf_spin_lock`` must be held by any program adding
+or removing, if we're in the critical section bounded by that lock, we know
+that no other program can add or remove until the end of the critical section.
+This combined with pointer stability means that, until the critical section
+ends, we can safely access the graph node through ``n`` even after it was used
+to pass ownership.
+
+The verifier considers such a reference a *non-owning reference*. The ref
+returned by ``bpf_obj_new`` is accordingly considered an *owning reference*.
+Both terms currently only have meaning in the context of graph nodes and API.
+
+**Details**
+
+Let's enumerate the properties of both types of references.
+
+*owning reference*
+
+  * This reference controls the lifetime of the pointee
+  * Ownership of pointee must be 'released' by passing it to some graph API
+    kfunc, or via ``bpf_obj_drop``, which free's the pointee
+    * If not released before program ends, verifier considers program invalid
+  * Access to the pointee's memory will not page fault
+
+*non-owning reference*
+
+  * This reference does not own the pointee
+    * It cannot be used to add the graph node to a graph root, nor free via
+      ``bpf_obj_drop``
+  * No explicit control of lifetime, but can infer valid lifetime based on
+    non-owning ref existence (see explanation below)
+  * Access to the pointee's memory will not page fault
+
+From verifier's perspective non-owning references can only exist
+between spin_lock and spin_unlock. Why? After spin_unlock another program
+can do arbitrary operations on the data structure like removing and free-ing
+via bpf_obj_drop. A non-owning ref to some chunk of memory that was remove'd,
+free'd, and reused via bpf_obj_new would point to an entirely different thing.
+Or the memory could go away.
+
+To prevent this logic violation all non-owning references are invalidated by
+verifier after critical section ends. This is necessary to ensure "will
+not page fault" property of non-owning reference. So if verifier hasn't
+invalidated a non-owning ref, accessing it will not page fault.
+
+Currently ``bpf_obj_drop`` is not allowed in the critical section, so
+if there's a valid non-owning ref, we must be in critical section, and can
+conclude that the ref's memory hasn't been dropped-and-free'd or dropped-
+and-reused.
+
+Any reference to a node that is in a rbtree _must_ be non-owning, since
+the tree has control of pointee lifetime. Similarly, any ref to a node
+that isn't in rbtree _must_ be owning. This results in a nice property:
+graph API add / remove implementations don't need to check if a node
+has already been added (or already removed), as the verifier type system
+prevents such a state from being valid.
+
+However, pointer aliasing poses an issue for the above "nice property".
+Consider the following example:
+
+.. code-block:: c
+        struct node_data *n, *m, *o, *p;
+        n = bpf_obj_new(typeof(*n));     /* 1 */
+
+        bpf_spin_lock(&lock);
+
+        bpf_rbtree_add(&tree, n);        /* 2 */
+        m = bpf_rbtree_first(&tree);     /* 3 */
+
+        o = bpf_rbtree_remove(&tree, n); /* 4 */
+        p = bpf_rbtree_remove(&tree, m); /* 5 */
+
+        bpf_spin_unlock(&lock);
+
+        bpf_obj_drop(o);
+        bpf_obj_drop(p); /* 6 */
+----
+
+Assume tree is empty before this program runs. If we track verifier state
+changes here using numbers in above comments:
+
+  1) n is an owning reference
+  2) n is a non-owning reference, it's been added to the tree
+  3) n and m are non-owning references, they both point to the same node
+  4) o is an owning reference, n and m non-owning, all point to same node
+  5) o and p are owning, n and m non-owning, all point to the same node
+  6) a double-free has occurred, since o and p point to same node and o was
+     free'd in previous statement
+
+States 4 and 5 violate our "nice property", as there are non-owning refs to
+a node which is not in a rbtree. Statement 5 will try to remove a node which
+has already been removed as a result of this violation. State 6 is a dangerous
+double-free.
+
+At a minimum we should prevent state 6 from being possible. If we can't also
+prevent state 5 then we must abandon our "nice property" and check whether a
+node has already been removed at runtime.
+
+We prevent both by generalizing the "invalidate non-owning references" behavior
+of ``bpf_spin_unlock`` and doing similar invalidation after
+``bpf_rbtree_remove``. The logic here being that any graph API kfunc which:
+
+  * takes an arbitrary node argument
+  * removes it from the datastructure
+  * returns an owning reference to the removed node
+
+May result in a state where some other non-owning reference points to the same
+node. So ``remove``-type kfuncs must be considered a non-owning reference
+invalidation point as well.
diff --git a/Documentation/bpf/other.rst b/Documentation/bpf/other.rst
index 3d61963403b4..7e6b12018802 100644
--- a/Documentation/bpf/other.rst
+++ b/Documentation/bpf/other.rst
@@ -6,4 +6,5 @@ Other
    :maxdepth: 1
 
    ringbuf
-   llvm_reloc
\ No newline at end of file
+   llvm_reloc
+   graph_ds_impl
-- 
2.30.2



* Re: [PATCH v2 bpf-next 02/13] bpf: Migrate release_on_unlock logic to non-owning ref semantics
  2022-12-17  8:24 ` [PATCH v2 bpf-next 02/13] bpf: Migrate release_on_unlock logic to non-owning ref semantics Dave Marchevsky
@ 2022-12-17  9:21   ` Dave Marchevsky
  2022-12-28 23:46   ` David Vernet
  2022-12-29  3:56   ` Alexei Starovoitov
  2 siblings, 0 replies; 38+ messages in thread
From: Dave Marchevsky @ 2022-12-17  9:21 UTC (permalink / raw)
  To: Dave Marchevsky, bpf
  Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Kernel Team, Kumar Kartikeya Dwivedi, Tejun Heo

On 12/17/22 3:24 AM, Dave Marchevsky wrote:
> This patch introduces non-owning reference semantics to the verifier,
> specifically linked_list API kfunc handling. release_on_unlock logic for
> refs is refactored - with small functional changes - to implement these
> semantics, and bpf_list_push_{front,back} are migrated to use them.
> 
> When a list node is pushed to a list, the program still has a pointer to
> the node:
> 
>   n = bpf_obj_new(typeof(*n));
> 
>   bpf_spin_lock(&l);
>   bpf_list_push_back(&l, n);
>   /* n still points to the just-added node */
>   bpf_spin_unlock(&l);
> 
> What the verifier considers n to be after the push, and thus what can be
> done with n, are changed by this patch.
> 
> Common properties both before/after this patch:
>   * After push, n is only a valid reference to the node until end of
>     critical section
>   * After push, n cannot be pushed to any list
>   * After push, the program can read the node's fields using n
> 
> Before:
>   * After push, n retains the ref_obj_id which it received on
>     bpf_obj_new, but the associated bpf_reference_state's
>     release_on_unlock field is set to true
>     * release_on_unlock field and associated logic is used to implement
>       "n is only a valid ref until end of critical section"
>   * After push, n cannot be written to, the node must be removed from
>     the list before writing to its fields
>   * After push, n is marked PTR_UNTRUSTED
> 
> After:
>   * After push, n's ref is released and ref_obj_id set to 0. The
>     bpf_reg_state's non_owning_ref_lock struct is populated with the
>     currently active lock
>     * non_owning_ref_lock and logic is used to implement "n is only a
>       valid ref until end of critical section"
>   * n can be written to (except for special fields e.g. bpf_list_node,
>     timer, ...)
>   * No special type flag is added to n after push
> 
> Summary of specific implementation changes to achieve the above:
> 
>   * release_on_unlock field, ref_set_release_on_unlock helper, and logic
>     to "release on unlock" based on that field are removed
> 
>   * The anonymous active_lock struct used by bpf_verifier_state is
>     pulled out into a named struct bpf_active_lock.
> 
>   * A non_owning_ref_lock field of type bpf_active_lock is added to
>     bpf_reg_state's PTR_TO_BTF_ID union
> 
>   * Helpers are added to use non_owning_ref_lock to implement non-owning
>     ref semantics as described above
>     * invalidate_non_owning_refs - helper to clobber all non-owning refs
>       matching a particular bpf_active_lock identity. Replaces
>       release_on_unlock logic in process_spin_lock.
>     * ref_set_non_owning_lock - set non_owning_ref_lock for a reg based
>       on current verifier state
>     * ref_convert_owning_non_owning - convert owning reference w/
>       specified ref_obj_id to non-owning references. Setup
>       non_owning_ref_lock for each reg with that ref_obj_id and 0 out
>       its ref_obj_id
> 
>   * New KF_RELEASE_NON_OWN flag is added, to be used in conjunction with
>     KF_RELEASE to indicate that the release arg reg should be converted
>     to non-owning ref
>     * Plain KF_RELEASE would clobber all regs with ref_obj_id matching
>       the release arg reg's. KF_RELEASE_NON_OWN's logic triggers first -
>       doing ref_convert_owning_non_owning on the ref first, which
>       prevents the regs from being clobbered by 0ing out their
>       ref_obj_ids. The bpf_reference_state itself is still released via
>       release_reference as a result of the KF_RELEASE flag.
>     * KF_RELEASE | KF_RELEASE_NON_OWN are added to
>       bpf_list_push_{front,back}
> 
> After these changes, linked_list's "release on unlock" logic continues
> to function as before, except for the semantic differences noted above.
> The patch immediately following this one makes minor changes to
> linked_list selftests to account for the differing behavior.
> 
> Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
> ---
> diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h
> index 53d175cbaa02..cb417ffbbb84 100644
> --- a/include/linux/bpf_verifier.h
> +++ b/include/linux/bpf_verifier.h
> @@ -43,6 +43,22 @@ enum bpf_reg_liveness {
>  	REG_LIVE_DONE = 0x8, /* liveness won't be updating this register anymore */
>  };

[...]

>  struct bpf_reg_state {
>  	/* Ordering of fields matters.  See states_equal() */
>  	enum bpf_reg_type type;
> @@ -68,6 +84,7 @@ struct bpf_reg_state {
>  		struct {
>  			struct btf *btf;
>  			u32 btf_id;
> +			struct bpf_active_lock non_owning_ref_lock;
>  		};
>  

I think it's possible for this to be a pointer by just pointing to
struct bpf_verifier_state's active_lock. Why?

  * There can currently only be one active_lock at a time
  * non-owning refs are only valid in the critical section

So if a verifier_state has an active_lock, any non-owning ref must've been
obtained under that lock, and any non-owning ref not obtained under that
lock must have been invalidated previously. 

This will keep bpf_reg_state size down. I'll give it a shot for v3;
I wanted to leave it in the current state for v2 so the logic in this
patch is easier to reason about.

Actually, if the above logic is correct, then the only valid states for
non_owning_ref_lock are "empty / null" and "same as current verifier_state",
in which case this can go back to being a bool. But for non-spin_unlock
invalidation points (e.g. rbtree_remove), we may want to keep additional
info around to avoid invalidating everything, which would require
re-introducing a non_owning_ref identity.
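
A rough sketch of what the pointer version could look like (placement and
naming not final):

  struct {
  	struct btf *btf;
  	u32 btf_id;
  	/* Points at the verifier state's active_lock when this reg is
  	 * a non-owning ref obtained under that lock, NULL otherwise.
  	 * Could degenerate back to a bool if no per-invalidation-point
  	 * identity ends up being needed.
  	 */
  	struct bpf_active_lock *non_owning_ref_lock;
  };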

>  		u32 mem_size; /* for PTR_TO_MEM | PTR_TO_MEM_OR_NULL */
> @@ -223,11 +240,6 @@ struct bpf_reference_state {
>  	 * exiting a callback function.
>  	 */
>  	int callback_ref;
> -	/* Mark the reference state to release the registers sharing the same id
> -	 * on bpf_spin_unlock (for nodes that we will lose ownership to but are
> -	 * safe to access inside the critical section).
> -	 */
> -	bool release_on_unlock;
>  };
>  
>  /* state of the program:
> @@ -328,21 +340,8 @@ struct bpf_verifier_state {
>  	u32 branches;
>  	u32 insn_idx;
>  	u32 curframe;
> -	/* For every reg representing a map value or allocated object pointer,
> -	 * we consider the tuple of (ptr, id) for them to be unique in verifier
> -	 * context and conside them to not alias each other for the purposes of
> -	 * tracking lock state.
> -	 */
> -	struct {
> -		/* This can either be reg->map_ptr or reg->btf. If ptr is NULL,
> -		 * there's no active lock held, and other fields have no
> -		 * meaning. If non-NULL, it indicates that a lock is held and
> -		 * id member has the reg->id of the register which can be >= 0.
> -		 */
> -		void *ptr;
> -		/* This will be reg->id */
> -		u32 id;
> -	} active_lock;
> +
> +	struct bpf_active_lock active_lock;
>  	bool speculative;
>  	bool active_rcu_lock;

[...]
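
For readers of this excerpt: struct bpf_active_lock itself is not
defined in the hunks shown here, but given the anonymous struct it
replaces above, it presumably amounts to:

  struct bpf_active_lock {
          /* reg->map_ptr or reg->btf; NULL means no lock is held */
          void *ptr;
          /* reg->id of the lock-holding register */
          u32 id;
  };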

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v2 bpf-next 02/13] bpf: Migrate release_on_unlock logic to non-owning ref semantics
@ 2022-12-23 10:51 ` Dan Carpenter
  0 siblings, 0 replies; 38+ messages in thread
From: kernel test robot @ 2022-12-17 10:23 UTC (permalink / raw)
  To: oe-kbuild; +Cc: lkp, Dan Carpenter

[-- Attachment #1: Type: text/plain, Size: 2745 bytes --]

BCC: lkp@intel.com
CC: oe-kbuild-all@lists.linux.dev
In-Reply-To: <20221217082506.1570898-3-davemarchevsky@fb.com>
References: <20221217082506.1570898-3-davemarchevsky@fb.com>
TO: Dave Marchevsky <davemarchevsky@fb.com>
TO: bpf@vger.kernel.org
CC: Alexei Starovoitov <ast@kernel.org>
CC: Daniel Borkmann <daniel@iogearbox.net>
CC: Andrii Nakryiko <andrii@kernel.org>
CC: Kernel Team <kernel-team@fb.com>
CC: Kumar Kartikeya Dwivedi <memxor@gmail.com>
CC: Tejun Heo <tj@kernel.org>
CC: Dave Marchevsky <davemarchevsky@fb.com>

Hi Dave,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on bpf-next/master]

url:    https://github.com/intel-lab-lkp/linux/commits/Dave-Marchevsky/BPF-rbtree-next-gen-datastructure/20221217-162646
base:   https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git master
patch link:    https://lore.kernel.org/r/20221217082506.1570898-3-davemarchevsky%40fb.com
patch subject: [PATCH v2 bpf-next 02/13] bpf: Migrate release_on_unlock logic to non-owning ref semantics
:::::: branch date: 2 hours ago
:::::: commit date: 2 hours ago
config: x86_64-randconfig-m001
compiler: gcc-11 (Debian 11.3.0-8) 11.3.0

If you fix the issue, kindly add the following tags where applicable
| Reported-by: kernel test robot <lkp@intel.com>
| Reported-by: Dan Carpenter <error27@gmail.com>

smatch warnings:
kernel/bpf/verifier.c:6275 reg_find_field_offset() warn: variable dereferenced before check 'reg' (see line 6274)

vim +/reg +6275 kernel/bpf/verifier.c

f79e7ea571732a Lorenz Bauer    2020-09-21  6267  
4ed17b8d6842ba Dave Marchevsky 2022-12-17  6268  static struct btf_field *
4ed17b8d6842ba Dave Marchevsky 2022-12-17  6269  reg_find_field_offset(const struct bpf_reg_state *reg, s32 off, u32 fields)
4ed17b8d6842ba Dave Marchevsky 2022-12-17  6270  {
4ed17b8d6842ba Dave Marchevsky 2022-12-17  6271  	struct btf_field *field;
4ed17b8d6842ba Dave Marchevsky 2022-12-17  6272  	struct btf_record *rec;
4ed17b8d6842ba Dave Marchevsky 2022-12-17  6273  
4ed17b8d6842ba Dave Marchevsky 2022-12-17 @6274  	rec = reg_btf_record(reg);
4ed17b8d6842ba Dave Marchevsky 2022-12-17 @6275  	if (!reg)
4ed17b8d6842ba Dave Marchevsky 2022-12-17  6276  		return NULL;
4ed17b8d6842ba Dave Marchevsky 2022-12-17  6277  
4ed17b8d6842ba Dave Marchevsky 2022-12-17  6278  	field = btf_record_find(rec, off, fields);
4ed17b8d6842ba Dave Marchevsky 2022-12-17  6279  	if (!field)
4ed17b8d6842ba Dave Marchevsky 2022-12-17  6280  		return NULL;
4ed17b8d6842ba Dave Marchevsky 2022-12-17  6281  
4ed17b8d6842ba Dave Marchevsky 2022-12-17  6282  	return field;
4ed17b8d6842ba Dave Marchevsky 2022-12-17  6283  }
4ed17b8d6842ba Dave Marchevsky 2022-12-17  6284  
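
The warning boils down to the NULL check testing the wrong variable: rec
is assigned from reg_btf_record(reg), but the check on the next line
tests reg (already passed in and used) rather than the freshly assigned
rec. The likely intent, shown here only as a sketch of what smatch is
pointing at, is:

  rec = reg_btf_record(reg);
  if (!rec)               /* check the value just computed, not reg */
          return NULL;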

-- 
0-DAY CI Kernel Test Service
https://01.org/lkp

[-- Attachment #2: config --]
[-- Type: text/plain, Size: 125794 bytes --]

#
# Automatically generated file; DO NOT EDIT.
# Linux/x86_64 6.1.0 Kernel Configuration
#
CONFIG_CC_VERSION_TEXT="gcc-11 (Debian 11.3.0-8) 11.3.0"
CONFIG_CC_IS_GCC=y
CONFIG_GCC_VERSION=110300
CONFIG_CLANG_VERSION=0
CONFIG_AS_IS_GNU=y
CONFIG_AS_VERSION=23900
CONFIG_LD_IS_BFD=y
CONFIG_LD_VERSION=23900
CONFIG_LLD_VERSION=0
CONFIG_CC_CAN_LINK=y
CONFIG_CC_CAN_LINK_STATIC=y
CONFIG_CC_HAS_ASM_GOTO_OUTPUT=y
CONFIG_CC_HAS_ASM_GOTO_TIED_OUTPUT=y
CONFIG_CC_HAS_ASM_INLINE=y
CONFIG_CC_HAS_NO_PROFILE_FN_ATTR=y
CONFIG_PAHOLE_VERSION=123
CONFIG_CONSTRUCTORS=y
CONFIG_IRQ_WORK=y
CONFIG_BUILDTIME_TABLE_SORT=y
CONFIG_THREAD_INFO_IN_TASK=y

#
# General setup
#
CONFIG_BROKEN_ON_SMP=y
CONFIG_INIT_ENV_ARG_LIMIT=32
# CONFIG_COMPILE_TEST is not set
# CONFIG_WERROR is not set
CONFIG_LOCALVERSION=""
CONFIG_LOCALVERSION_AUTO=y
CONFIG_BUILD_SALT=""
CONFIG_HAVE_KERNEL_GZIP=y
CONFIG_HAVE_KERNEL_BZIP2=y
CONFIG_HAVE_KERNEL_LZMA=y
CONFIG_HAVE_KERNEL_XZ=y
CONFIG_HAVE_KERNEL_LZO=y
CONFIG_HAVE_KERNEL_LZ4=y
CONFIG_HAVE_KERNEL_ZSTD=y
# CONFIG_KERNEL_GZIP is not set
# CONFIG_KERNEL_BZIP2 is not set
# CONFIG_KERNEL_LZMA is not set
# CONFIG_KERNEL_XZ is not set
# CONFIG_KERNEL_LZO is not set
# CONFIG_KERNEL_LZ4 is not set
CONFIG_KERNEL_ZSTD=y
CONFIG_DEFAULT_INIT=""
CONFIG_DEFAULT_HOSTNAME="(none)"
# CONFIG_SYSVIPC is not set
# CONFIG_POSIX_MQUEUE is not set
# CONFIG_WATCH_QUEUE is not set
# CONFIG_CROSS_MEMORY_ATTACH is not set
CONFIG_USELIB=y
# CONFIG_AUDIT is not set
CONFIG_HAVE_ARCH_AUDITSYSCALL=y

#
# IRQ subsystem
#
CONFIG_GENERIC_IRQ_PROBE=y
CONFIG_GENERIC_IRQ_SHOW=y
CONFIG_GENERIC_IRQ_INJECTION=y
CONFIG_HARDIRQS_SW_RESEND=y
CONFIG_IRQ_DOMAIN=y
CONFIG_IRQ_SIM=y
CONFIG_IRQ_DOMAIN_HIERARCHY=y
CONFIG_IRQ_MSI_IOMMU=y
CONFIG_GENERIC_IRQ_MATRIX_ALLOCATOR=y
CONFIG_GENERIC_IRQ_RESERVATION_MODE=y
CONFIG_IRQ_FORCED_THREADING=y
CONFIG_SPARSE_IRQ=y
CONFIG_GENERIC_IRQ_DEBUGFS=y
# end of IRQ subsystem

CONFIG_CLOCKSOURCE_WATCHDOG=y
CONFIG_ARCH_CLOCKSOURCE_INIT=y
CONFIG_CLOCKSOURCE_VALIDATE_LAST_CYCLE=y
CONFIG_GENERIC_TIME_VSYSCALL=y
CONFIG_GENERIC_CLOCKEVENTS=y
CONFIG_GENERIC_CLOCKEVENTS_BROADCAST=y
CONFIG_GENERIC_CLOCKEVENTS_MIN_ADJUST=y
CONFIG_GENERIC_CMOS_UPDATE=y
CONFIG_HAVE_POSIX_CPU_TIMERS_TASK_WORK=y

#
# Timers subsystem
#
CONFIG_HZ_PERIODIC=y
# CONFIG_NO_HZ_IDLE is not set
# CONFIG_NO_HZ is not set
# CONFIG_HIGH_RES_TIMERS is not set
CONFIG_CLOCKSOURCE_WATCHDOG_MAX_SKEW_US=100
# end of Timers subsystem

CONFIG_BPF=y
CONFIG_HAVE_EBPF_JIT=y
CONFIG_ARCH_WANT_DEFAULT_BPF_JIT=y

#
# BPF subsystem
#
CONFIG_BPF_SYSCALL=y
# CONFIG_BPF_JIT is not set
CONFIG_BPF_UNPRIV_DEFAULT_OFF=y
# CONFIG_BPF_PRELOAD is not set
# end of BPF subsystem

CONFIG_PREEMPT_VOLUNTARY_BUILD=y
# CONFIG_PREEMPT_NONE is not set
CONFIG_PREEMPT_VOLUNTARY=y
# CONFIG_PREEMPT is not set
CONFIG_PREEMPT_COUNT=y
# CONFIG_PREEMPT_DYNAMIC is not set

#
# CPU/Task time and stats accounting
#
CONFIG_TICK_CPU_ACCOUNTING=y
# CONFIG_VIRT_CPU_ACCOUNTING_GEN is not set
CONFIG_IRQ_TIME_ACCOUNTING=y
# CONFIG_BSD_PROCESS_ACCT is not set
# CONFIG_TASKSTATS is not set
# CONFIG_PSI is not set
# end of CPU/Task time and stats accounting

#
# RCU Subsystem
#
CONFIG_TINY_RCU=y
CONFIG_RCU_EXPERT=y
CONFIG_SRCU=y
CONFIG_TINY_SRCU=y
CONFIG_TASKS_RCU_GENERIC=y
CONFIG_FORCE_TASKS_RCU=y
CONFIG_TASKS_RCU=y
CONFIG_FORCE_TASKS_RUDE_RCU=y
CONFIG_TASKS_RUDE_RCU=y
CONFIG_FORCE_TASKS_TRACE_RCU=y
CONFIG_TASKS_TRACE_RCU=y
CONFIG_RCU_NEED_SEGCBLIST=y
CONFIG_TASKS_TRACE_RCU_READ_MB=y
# end of RCU Subsystem

CONFIG_IKCONFIG=y
CONFIG_IKCONFIG_PROC=y
CONFIG_IKHEADERS=y
CONFIG_LOG_BUF_SHIFT=20
CONFIG_PRINTK_SAFE_LOG_BUF_SHIFT=13
# CONFIG_PRINTK_INDEX is not set
CONFIG_HAVE_UNSTABLE_SCHED_CLOCK=y

#
# Scheduler features
#
# end of Scheduler features

CONFIG_ARCH_SUPPORTS_NUMA_BALANCING=y
CONFIG_ARCH_WANT_BATCHED_UNMAP_TLB_FLUSH=y
CONFIG_CC_HAS_INT128=y
CONFIG_CC_IMPLICIT_FALLTHROUGH="-Wimplicit-fallthrough=5"
CONFIG_GCC12_NO_ARRAY_BOUNDS=y
CONFIG_ARCH_SUPPORTS_INT128=y
CONFIG_CGROUPS=y
CONFIG_PAGE_COUNTER=y
# CONFIG_CGROUP_FAVOR_DYNMODS is not set
CONFIG_MEMCG=y
CONFIG_MEMCG_KMEM=y
# CONFIG_CGROUP_SCHED is not set
# CONFIG_CGROUP_PIDS is not set
CONFIG_CGROUP_RDMA=y
CONFIG_CGROUP_FREEZER=y
# CONFIG_CGROUP_HUGETLB is not set
# CONFIG_CGROUP_DEVICE is not set
CONFIG_CGROUP_CPUACCT=y
# CONFIG_CGROUP_PERF is not set
# CONFIG_CGROUP_BPF is not set
# CONFIG_CGROUP_MISC is not set
# CONFIG_CGROUP_DEBUG is not set
# CONFIG_NAMESPACES is not set
CONFIG_CHECKPOINT_RESTORE=y
# CONFIG_SCHED_AUTOGROUP is not set
# CONFIG_SYSFS_DEPRECATED is not set
# CONFIG_RELAY is not set
CONFIG_BLK_DEV_INITRD=y
CONFIG_INITRAMFS_SOURCE=""
CONFIG_RD_GZIP=y
CONFIG_RD_BZIP2=y
# CONFIG_RD_LZMA is not set
CONFIG_RD_XZ=y
# CONFIG_RD_LZO is not set
# CONFIG_RD_LZ4 is not set
CONFIG_RD_ZSTD=y
# CONFIG_BOOT_CONFIG is not set
CONFIG_INITRAMFS_PRESERVE_MTIME=y
CONFIG_CC_OPTIMIZE_FOR_PERFORMANCE=y
# CONFIG_CC_OPTIMIZE_FOR_SIZE is not set
CONFIG_LD_ORPHAN_WARN=y
CONFIG_SYSCTL=y
CONFIG_HAVE_UID16=y
CONFIG_SYSCTL_EXCEPTION_TRACE=y
CONFIG_HAVE_PCSPKR_PLATFORM=y
CONFIG_EXPERT=y
CONFIG_UID16=y
CONFIG_MULTIUSER=y
# CONFIG_SGETMASK_SYSCALL is not set
# CONFIG_SYSFS_SYSCALL is not set
CONFIG_FHANDLE=y
# CONFIG_POSIX_TIMERS is not set
CONFIG_PRINTK=y
CONFIG_BUG=y
CONFIG_ELF_CORE=y
# CONFIG_PCSPKR_PLATFORM is not set
CONFIG_BASE_FULL=y
CONFIG_FUTEX=y
CONFIG_FUTEX_PI=y
CONFIG_EPOLL=y
CONFIG_SIGNALFD=y
CONFIG_TIMERFD=y
CONFIG_EVENTFD=y
CONFIG_SHMEM=y
# CONFIG_AIO is not set
CONFIG_IO_URING=y
CONFIG_ADVISE_SYSCALLS=y
# CONFIG_MEMBARRIER is not set
CONFIG_KALLSYMS=y
# CONFIG_KALLSYMS_SELFTEST is not set
CONFIG_KALLSYMS_ALL=y
CONFIG_KALLSYMS_BASE_RELATIVE=y
CONFIG_ARCH_HAS_MEMBARRIER_SYNC_CORE=y
CONFIG_KCMP=y
# CONFIG_RSEQ is not set
CONFIG_EMBEDDED=y
CONFIG_HAVE_PERF_EVENTS=y
CONFIG_PERF_USE_VMALLOC=y
# CONFIG_PC104 is not set

#
# Kernel Performance Events And Counters
#
CONFIG_PERF_EVENTS=y
CONFIG_DEBUG_PERF_USE_VMALLOC=y
# end of Kernel Performance Events And Counters

CONFIG_SYSTEM_DATA_VERIFICATION=y
# CONFIG_PROFILING is not set
CONFIG_TRACEPOINTS=y
# end of General setup

CONFIG_64BIT=y
CONFIG_X86_64=y
CONFIG_X86=y
CONFIG_INSTRUCTION_DECODER=y
CONFIG_OUTPUT_FORMAT="elf64-x86-64"
CONFIG_LOCKDEP_SUPPORT=y
CONFIG_STACKTRACE_SUPPORT=y
CONFIG_MMU=y
CONFIG_ARCH_MMAP_RND_BITS_MIN=28
CONFIG_ARCH_MMAP_RND_BITS_MAX=32
CONFIG_ARCH_MMAP_RND_COMPAT_BITS_MIN=8
CONFIG_ARCH_MMAP_RND_COMPAT_BITS_MAX=16
CONFIG_GENERIC_CSUM=y
CONFIG_GENERIC_BUG=y
CONFIG_GENERIC_BUG_RELATIVE_POINTERS=y
CONFIG_GENERIC_CALIBRATE_DELAY=y
CONFIG_ARCH_HAS_CPU_RELAX=y
CONFIG_ARCH_HIBERNATION_POSSIBLE=y
CONFIG_ARCH_NR_GPIO=1024
CONFIG_ARCH_SUSPEND_POSSIBLE=y
CONFIG_AUDIT_ARCH=y
CONFIG_KASAN_SHADOW_OFFSET=0xdffffc0000000000
CONFIG_ARCH_SUPPORTS_UPROBES=y
CONFIG_FIX_EARLYCON_MEM=y
CONFIG_PGTABLE_LEVELS=4
CONFIG_CC_HAS_SANE_STACKPROTECTOR=y

#
# Processor type and features
#
# CONFIG_SMP is not set
CONFIG_X86_FEATURE_NAMES=y
# CONFIG_X86_X2APIC is not set
CONFIG_X86_MPPARSE=y
# CONFIG_GOLDFISH is not set
# CONFIG_X86_CPU_RESCTRL is not set
CONFIG_X86_EXTENDED_PLATFORM=y
# CONFIG_X86_GOLDFISH is not set
# CONFIG_X86_INTEL_MID is not set
# CONFIG_X86_INTEL_LPSS is not set
# CONFIG_X86_AMD_PLATFORM_DEVICE is not set
# CONFIG_IOSF_MBI is not set
# CONFIG_SCHED_OMIT_FRAME_POINTER is not set
CONFIG_HYPERVISOR_GUEST=y
CONFIG_PARAVIRT=y
# CONFIG_PARAVIRT_DEBUG is not set
CONFIG_X86_HV_CALLBACK_VECTOR=y
# CONFIG_XEN is not set
CONFIG_KVM_GUEST=y
CONFIG_ARCH_CPUIDLE_HALTPOLL=y
# CONFIG_PVH is not set
# CONFIG_PARAVIRT_TIME_ACCOUNTING is not set
CONFIG_PARAVIRT_CLOCK=y
# CONFIG_JAILHOUSE_GUEST is not set
# CONFIG_ACRN_GUEST is not set
# CONFIG_MK8 is not set
# CONFIG_MPSC is not set
# CONFIG_MCORE2 is not set
# CONFIG_MATOM is not set
CONFIG_GENERIC_CPU=y
CONFIG_X86_INTERNODE_CACHE_SHIFT=6
CONFIG_X86_L1_CACHE_SHIFT=6
CONFIG_X86_TSC=y
CONFIG_X86_CMPXCHG64=y
CONFIG_X86_CMOV=y
CONFIG_X86_MINIMUM_CPU_FAMILY=64
CONFIG_X86_DEBUGCTLMSR=y
CONFIG_IA32_FEAT_CTL=y
CONFIG_X86_VMX_FEATURE_NAMES=y
CONFIG_PROCESSOR_SELECT=y
CONFIG_CPU_SUP_INTEL=y
# CONFIG_CPU_SUP_AMD is not set
# CONFIG_CPU_SUP_HYGON is not set
# CONFIG_CPU_SUP_CENTAUR is not set
# CONFIG_CPU_SUP_ZHAOXIN is not set
CONFIG_HPET_TIMER=y
CONFIG_HPET_EMULATE_RTC=y
CONFIG_DMI=y
CONFIG_NR_CPUS_RANGE_BEGIN=1
CONFIG_NR_CPUS_RANGE_END=1
CONFIG_NR_CPUS_DEFAULT=1
CONFIG_NR_CPUS=1
CONFIG_UP_LATE_INIT=y
CONFIG_X86_LOCAL_APIC=y
CONFIG_X86_IO_APIC=y
CONFIG_X86_REROUTE_FOR_BROKEN_BOOT_IRQS=y
# CONFIG_X86_MCE is not set

#
# Performance monitoring
#
CONFIG_PERF_EVENTS_INTEL_UNCORE=y
CONFIG_PERF_EVENTS_INTEL_RAPL=y
CONFIG_PERF_EVENTS_INTEL_CSTATE=y
# end of Performance monitoring

CONFIG_X86_VSYSCALL_EMULATION=y
# CONFIG_X86_IOPL_IOPERM is not set
# CONFIG_MICROCODE is not set
CONFIG_X86_MSR=y
CONFIG_X86_CPUID=y
# CONFIG_X86_5LEVEL is not set
CONFIG_X86_DIRECT_GBPAGES=y
# CONFIG_X86_CPA_STATISTICS is not set
CONFIG_ARCH_SPARSEMEM_ENABLE=y
CONFIG_ARCH_SPARSEMEM_DEFAULT=y
CONFIG_ILLEGAL_POINTER_VALUE=0xdead000000000000
CONFIG_X86_CHECK_BIOS_CORRUPTION=y
# CONFIG_X86_BOOTPARAM_MEMORY_CORRUPTION_CHECK is not set
CONFIG_MTRR=y
CONFIG_MTRR_SANITIZER=y
CONFIG_MTRR_SANITIZER_ENABLE_DEFAULT=0
CONFIG_MTRR_SANITIZER_SPARE_REG_NR_DEFAULT=1
# CONFIG_X86_PAT is not set
# CONFIG_X86_UMIP is not set
CONFIG_CC_HAS_IBT=y
# CONFIG_X86_KERNEL_IBT is not set
# CONFIG_X86_INTEL_MEMORY_PROTECTION_KEYS is not set
CONFIG_X86_INTEL_TSX_MODE_OFF=y
# CONFIG_X86_INTEL_TSX_MODE_ON is not set
# CONFIG_X86_INTEL_TSX_MODE_AUTO is not set
CONFIG_EFI=y
CONFIG_EFI_STUB=y
CONFIG_EFI_HANDOVER_PROTOCOL=y
# CONFIG_EFI_MIXED is not set
CONFIG_EFI_FAKE_MEMMAP=y
CONFIG_EFI_MAX_FAKE_MEM=8
CONFIG_EFI_RUNTIME_MAP=y
# CONFIG_HZ_100 is not set
CONFIG_HZ_250=y
# CONFIG_HZ_300 is not set
# CONFIG_HZ_1000 is not set
CONFIG_HZ=250
CONFIG_KEXEC=y
CONFIG_KEXEC_FILE=y
CONFIG_ARCH_HAS_KEXEC_PURGATORY=y
CONFIG_KEXEC_SIG=y
CONFIG_KEXEC_SIG_FORCE=y
# CONFIG_CRASH_DUMP is not set
CONFIG_PHYSICAL_START=0x1000000
CONFIG_RELOCATABLE=y
# CONFIG_RANDOMIZE_BASE is not set
CONFIG_PHYSICAL_ALIGN=0x200000
CONFIG_COMPAT_VDSO=y
CONFIG_LEGACY_VSYSCALL_XONLY=y
# CONFIG_LEGACY_VSYSCALL_NONE is not set
# CONFIG_CMDLINE_BOOL is not set
# CONFIG_MODIFY_LDT_SYSCALL is not set
CONFIG_STRICT_SIGALTSTACK_SIZE=y
CONFIG_HAVE_LIVEPATCH=y
# end of Processor type and features

CONFIG_CC_HAS_SLS=y
CONFIG_CC_HAS_RETURN_THUNK=y
CONFIG_SPECULATION_MITIGATIONS=y
# CONFIG_PAGE_TABLE_ISOLATION is not set
CONFIG_RETPOLINE=y
CONFIG_RETHUNK=y
CONFIG_CPU_IBRS_ENTRY=y
# CONFIG_SLS is not set
CONFIG_ARCH_HAS_ADD_PAGES=y
CONFIG_ARCH_MHP_MEMMAP_ON_MEMORY_ENABLE=y

#
# Power management and ACPI options
#
CONFIG_SUSPEND=y
CONFIG_SUSPEND_FREEZER=y
# CONFIG_SUSPEND_SKIP_SYNC is not set
CONFIG_PM_SLEEP=y
# CONFIG_PM_AUTOSLEEP is not set
# CONFIG_PM_USERSPACE_AUTOSLEEP is not set
# CONFIG_PM_WAKELOCKS is not set
CONFIG_PM=y
# CONFIG_PM_DEBUG is not set
CONFIG_PM_CLK=y
CONFIG_WQ_POWER_EFFICIENT_DEFAULT=y
CONFIG_ARCH_SUPPORTS_ACPI=y
CONFIG_ACPI=y
CONFIG_ACPI_LEGACY_TABLES_LOOKUP=y
CONFIG_ARCH_MIGHT_HAVE_ACPI_PDC=y
CONFIG_ACPI_SYSTEM_POWER_STATES_SUPPORT=y
# CONFIG_ACPI_DEBUGGER is not set
CONFIG_ACPI_SPCR_TABLE=y
CONFIG_ACPI_FPDT=y
CONFIG_ACPI_LPIT=y
CONFIG_ACPI_SLEEP=y
CONFIG_ACPI_REV_OVERRIDE_POSSIBLE=y
CONFIG_ACPI_EC_DEBUGFS=y
CONFIG_ACPI_AC=y
CONFIG_ACPI_BATTERY=y
CONFIG_ACPI_BUTTON=y
CONFIG_ACPI_TAD=y
# CONFIG_ACPI_DOCK is not set
# CONFIG_ACPI_PROCESSOR is not set
CONFIG_ACPI_CUSTOM_DSDT_FILE=""
CONFIG_ARCH_HAS_ACPI_TABLE_UPGRADE=y
CONFIG_ACPI_TABLE_UPGRADE=y
# CONFIG_ACPI_DEBUG is not set
# CONFIG_ACPI_PCI_SLOT is not set
# CONFIG_ACPI_CONTAINER is not set
CONFIG_ACPI_HOTPLUG_IOAPIC=y
CONFIG_ACPI_SBS=y
CONFIG_ACPI_HED=y
# CONFIG_ACPI_CUSTOM_METHOD is not set
CONFIG_ACPI_BGRT=y
# CONFIG_ACPI_REDUCED_HARDWARE_ONLY is not set
CONFIG_HAVE_ACPI_APEI=y
CONFIG_HAVE_ACPI_APEI_NMI=y
CONFIG_ACPI_APEI=y
CONFIG_ACPI_APEI_GHES=y
CONFIG_ACPI_APEI_EINJ=y
CONFIG_ACPI_APEI_ERST_DEBUG=y
# CONFIG_ACPI_DPTF is not set
CONFIG_ACPI_CONFIGFS=y
# CONFIG_ACPI_PFRUT is not set
# CONFIG_ACPI_FFH is not set
# CONFIG_PMIC_OPREGION is not set
# CONFIG_TPS68470_PMIC_OPREGION is not set
CONFIG_ACPI_VIOT=y
CONFIG_ACPI_PRMT=y
CONFIG_X86_PM_TIMER=y

#
# CPU Frequency scaling
#
# CONFIG_CPU_FREQ is not set
# end of CPU Frequency scaling

#
# CPU Idle
#
# CONFIG_CPU_IDLE is not set
# end of CPU Idle
# end of Power management and ACPI options

#
# Bus options (PCI etc.)
#
CONFIG_PCI_DIRECT=y
CONFIG_PCI_MMCONFIG=y
CONFIG_MMCONF_FAM10H=y
# CONFIG_PCI_CNB20LE_QUIRK is not set
# CONFIG_ISA_BUS is not set
# CONFIG_ISA_DMA_API is not set
# end of Bus options (PCI etc.)

#
# Binary Emulations
#
CONFIG_IA32_EMULATION=y
# CONFIG_X86_X32_ABI is not set
CONFIG_COMPAT_32=y
CONFIG_COMPAT=y
CONFIG_COMPAT_FOR_U64_ALIGNMENT=y
# end of Binary Emulations

CONFIG_HAVE_KVM=y
CONFIG_VIRTUALIZATION=y
CONFIG_AS_AVX512=y
CONFIG_AS_SHA1_NI=y
CONFIG_AS_SHA256_NI=y
CONFIG_AS_TPAUSE=y

#
# General architecture-dependent options
#
CONFIG_CRASH_CORE=y
CONFIG_KEXEC_CORE=y
CONFIG_GENERIC_ENTRY=y
CONFIG_KPROBES=y
# CONFIG_JUMP_LABEL is not set
# CONFIG_STATIC_CALL_SELFTEST is not set
CONFIG_OPTPROBES=y
CONFIG_UPROBES=y
CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS=y
CONFIG_ARCH_USE_BUILTIN_BSWAP=y
CONFIG_KRETPROBES=y
CONFIG_KRETPROBE_ON_RETHOOK=y
CONFIG_HAVE_IOREMAP_PROT=y
CONFIG_HAVE_KPROBES=y
CONFIG_HAVE_KRETPROBES=y
CONFIG_HAVE_OPTPROBES=y
CONFIG_HAVE_KPROBES_ON_FTRACE=y
CONFIG_ARCH_CORRECT_STACKTRACE_ON_KRETPROBE=y
CONFIG_HAVE_FUNCTION_ERROR_INJECTION=y
CONFIG_HAVE_NMI=y
CONFIG_TRACE_IRQFLAGS_SUPPORT=y
CONFIG_TRACE_IRQFLAGS_NMI_SUPPORT=y
CONFIG_HAVE_ARCH_TRACEHOOK=y
CONFIG_HAVE_DMA_CONTIGUOUS=y
CONFIG_GENERIC_SMP_IDLE_THREAD=y
CONFIG_ARCH_HAS_FORTIFY_SOURCE=y
CONFIG_ARCH_HAS_SET_MEMORY=y
CONFIG_ARCH_HAS_SET_DIRECT_MAP=y
CONFIG_HAVE_ARCH_THREAD_STRUCT_WHITELIST=y
CONFIG_ARCH_WANTS_DYNAMIC_TASK_STRUCT=y
CONFIG_ARCH_WANTS_NO_INSTR=y
CONFIG_HAVE_ASM_MODVERSIONS=y
CONFIG_HAVE_REGS_AND_STACK_ACCESS_API=y
CONFIG_HAVE_RSEQ=y
CONFIG_HAVE_RUST=y
CONFIG_HAVE_FUNCTION_ARG_ACCESS_API=y
CONFIG_HAVE_HW_BREAKPOINT=y
CONFIG_HAVE_MIXED_BREAKPOINTS_REGS=y
CONFIG_HAVE_USER_RETURN_NOTIFIER=y
CONFIG_HAVE_PERF_EVENTS_NMI=y
CONFIG_HAVE_HARDLOCKUP_DETECTOR_PERF=y
CONFIG_HAVE_PERF_REGS=y
CONFIG_HAVE_PERF_USER_STACK_DUMP=y
CONFIG_HAVE_ARCH_JUMP_LABEL=y
CONFIG_HAVE_ARCH_JUMP_LABEL_RELATIVE=y
CONFIG_MMU_GATHER_TABLE_FREE=y
CONFIG_MMU_GATHER_RCU_TABLE_FREE=y
CONFIG_MMU_GATHER_MERGE_VMAS=y
CONFIG_ARCH_HAVE_NMI_SAFE_CMPXCHG=y
CONFIG_ARCH_HAS_NMI_SAFE_THIS_CPU_OPS=y
CONFIG_HAVE_ALIGNED_STRUCT_PAGE=y
CONFIG_HAVE_CMPXCHG_LOCAL=y
CONFIG_HAVE_CMPXCHG_DOUBLE=y
CONFIG_ARCH_WANT_COMPAT_IPC_PARSE_VERSION=y
CONFIG_ARCH_WANT_OLD_COMPAT_IPC=y
CONFIG_HAVE_ARCH_SECCOMP=y
CONFIG_HAVE_ARCH_SECCOMP_FILTER=y
CONFIG_SECCOMP=y
CONFIG_SECCOMP_FILTER=y
# CONFIG_SECCOMP_CACHE_DEBUG is not set
CONFIG_HAVE_ARCH_STACKLEAK=y
CONFIG_HAVE_STACKPROTECTOR=y
CONFIG_STACKPROTECTOR=y
# CONFIG_STACKPROTECTOR_STRONG is not set
CONFIG_ARCH_SUPPORTS_LTO_CLANG=y
CONFIG_ARCH_SUPPORTS_LTO_CLANG_THIN=y
CONFIG_LTO_NONE=y
CONFIG_ARCH_SUPPORTS_CFI_CLANG=y
CONFIG_HAVE_ARCH_WITHIN_STACK_FRAMES=y
CONFIG_HAVE_CONTEXT_TRACKING_USER=y
CONFIG_HAVE_CONTEXT_TRACKING_USER_OFFSTACK=y
CONFIG_HAVE_VIRT_CPU_ACCOUNTING_GEN=y
CONFIG_HAVE_IRQ_TIME_ACCOUNTING=y
CONFIG_HAVE_MOVE_PUD=y
CONFIG_HAVE_MOVE_PMD=y
CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE=y
CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD=y
CONFIG_HAVE_ARCH_HUGE_VMAP=y
CONFIG_HAVE_ARCH_HUGE_VMALLOC=y
CONFIG_ARCH_WANT_HUGE_PMD_SHARE=y
CONFIG_HAVE_ARCH_SOFT_DIRTY=y
CONFIG_HAVE_MOD_ARCH_SPECIFIC=y
CONFIG_MODULES_USE_ELF_RELA=y
CONFIG_HAVE_IRQ_EXIT_ON_IRQ_STACK=y
CONFIG_HAVE_SOFTIRQ_ON_OWN_STACK=y
CONFIG_SOFTIRQ_ON_OWN_STACK=y
CONFIG_ARCH_HAS_ELF_RANDOMIZE=y
CONFIG_HAVE_ARCH_MMAP_RND_BITS=y
CONFIG_HAVE_EXIT_THREAD=y
CONFIG_ARCH_MMAP_RND_BITS=28
CONFIG_HAVE_ARCH_MMAP_RND_COMPAT_BITS=y
CONFIG_ARCH_MMAP_RND_COMPAT_BITS=8
CONFIG_HAVE_ARCH_COMPAT_MMAP_BASES=y
CONFIG_PAGE_SIZE_LESS_THAN_64KB=y
CONFIG_PAGE_SIZE_LESS_THAN_256KB=y
CONFIG_HAVE_OBJTOOL=y
CONFIG_HAVE_JUMP_LABEL_HACK=y
CONFIG_HAVE_NOINSTR_HACK=y
CONFIG_HAVE_NOINSTR_VALIDATION=y
CONFIG_HAVE_UACCESS_VALIDATION=y
CONFIG_HAVE_STACK_VALIDATION=y
CONFIG_HAVE_RELIABLE_STACKTRACE=y
CONFIG_ISA_BUS_API=y
CONFIG_OLD_SIGSUSPEND3=y
CONFIG_COMPAT_OLD_SIGACTION=y
CONFIG_COMPAT_32BIT_TIME=y
CONFIG_HAVE_ARCH_VMAP_STACK=y
# CONFIG_VMAP_STACK is not set
CONFIG_HAVE_ARCH_RANDOMIZE_KSTACK_OFFSET=y
CONFIG_RANDOMIZE_KSTACK_OFFSET=y
CONFIG_RANDOMIZE_KSTACK_OFFSET_DEFAULT=y
CONFIG_ARCH_HAS_STRICT_KERNEL_RWX=y
CONFIG_STRICT_KERNEL_RWX=y
CONFIG_ARCH_HAS_STRICT_MODULE_RWX=y
CONFIG_STRICT_MODULE_RWX=y
CONFIG_HAVE_ARCH_PREL32_RELOCATIONS=y
CONFIG_ARCH_USE_MEMREMAP_PROT=y
CONFIG_LOCK_EVENT_COUNTS=y
CONFIG_ARCH_HAS_MEM_ENCRYPT=y
CONFIG_HAVE_STATIC_CALL=y
CONFIG_HAVE_STATIC_CALL_INLINE=y
CONFIG_HAVE_PREEMPT_DYNAMIC=y
CONFIG_HAVE_PREEMPT_DYNAMIC_CALL=y
CONFIG_ARCH_WANT_LD_ORPHAN_WARN=y
CONFIG_ARCH_SUPPORTS_DEBUG_PAGEALLOC=y
CONFIG_ARCH_SUPPORTS_PAGE_TABLE_CHECK=y
CONFIG_ARCH_HAS_ELFCORE_COMPAT=y
CONFIG_ARCH_HAS_PARANOID_L1D_FLUSH=y
CONFIG_DYNAMIC_SIGFRAME=y
CONFIG_ARCH_HAS_NONLEAF_PMD_YOUNG=y

#
# GCOV-based kernel profiling
#
CONFIG_GCOV_KERNEL=y
CONFIG_ARCH_HAS_GCOV_PROFILE_ALL=y
# CONFIG_GCOV_PROFILE_ALL is not set
# end of GCOV-based kernel profiling

CONFIG_HAVE_GCC_PLUGINS=y
CONFIG_GCC_PLUGINS=y
# CONFIG_GCC_PLUGIN_LATENT_ENTROPY is not set
# end of General architecture-dependent options

CONFIG_RT_MUTEXES=y
CONFIG_BASE_SMALL=0
CONFIG_MODULES=y
# CONFIG_MODULE_FORCE_LOAD is not set
CONFIG_MODULE_UNLOAD=y
# CONFIG_MODULE_FORCE_UNLOAD is not set
# CONFIG_MODULE_UNLOAD_TAINT_TRACKING is not set
# CONFIG_MODVERSIONS is not set
# CONFIG_MODULE_SRCVERSION_ALL is not set
# CONFIG_MODULE_SIG is not set
CONFIG_MODULE_COMPRESS_NONE=y
# CONFIG_MODULE_COMPRESS_GZIP is not set
# CONFIG_MODULE_COMPRESS_XZ is not set
# CONFIG_MODULE_COMPRESS_ZSTD is not set
# CONFIG_MODULE_ALLOW_MISSING_NAMESPACE_IMPORTS is not set
CONFIG_MODPROBE_PATH="/sbin/modprobe"
# CONFIG_TRIM_UNUSED_KSYMS is not set
CONFIG_MODULES_TREE_LOOKUP=y
# CONFIG_BLOCK is not set
CONFIG_ASN1=y
CONFIG_UNINLINE_SPIN_UNLOCK=y
CONFIG_ARCH_SUPPORTS_ATOMIC_RMW=y
CONFIG_ARCH_USE_QUEUED_SPINLOCKS=y
CONFIG_ARCH_USE_QUEUED_RWLOCKS=y
CONFIG_ARCH_HAS_NON_OVERLAPPING_ADDRESS_SPACE=y
CONFIG_ARCH_HAS_SYNC_CORE_BEFORE_USERMODE=y
CONFIG_ARCH_HAS_SYSCALL_WRAPPER=y
CONFIG_FREEZER=y

#
# Executable file formats
#
CONFIG_BINFMT_ELF=y
CONFIG_COMPAT_BINFMT_ELF=y
CONFIG_ELFCORE=y
# CONFIG_CORE_DUMP_DEFAULT_ELF_HEADERS is not set
CONFIG_BINFMT_SCRIPT=y
CONFIG_BINFMT_MISC=y
CONFIG_COREDUMP=y
# end of Executable file formats

#
# Memory Management options
#

#
# SLAB allocator options
#
# CONFIG_SLAB is not set
CONFIG_SLUB=y
# CONFIG_SLOB_DEPRECATED is not set
# CONFIG_SLUB_TINY is not set
CONFIG_SLAB_MERGE_DEFAULT=y
# CONFIG_SLAB_FREELIST_RANDOM is not set
# CONFIG_SLAB_FREELIST_HARDENED is not set
# CONFIG_SLUB_STATS is not set
# end of SLAB allocator options

CONFIG_SHUFFLE_PAGE_ALLOCATOR=y
CONFIG_COMPAT_BRK=y
CONFIG_SPARSEMEM=y
CONFIG_SPARSEMEM_EXTREME=y
CONFIG_SPARSEMEM_VMEMMAP_ENABLE=y
CONFIG_SPARSEMEM_VMEMMAP=y
CONFIG_HAVE_FAST_GUP=y
CONFIG_MEMORY_ISOLATION=y
CONFIG_EXCLUSIVE_SYSTEM_RAM=y
CONFIG_ARCH_ENABLE_MEMORY_HOTPLUG=y
# CONFIG_MEMORY_HOTPLUG is not set
CONFIG_SPLIT_PTLOCK_CPUS=4
CONFIG_ARCH_ENABLE_SPLIT_PMD_PTLOCK=y
CONFIG_COMPACTION=y
CONFIG_COMPACT_UNEVICTABLE_DEFAULT=1
CONFIG_PAGE_REPORTING=y
CONFIG_MIGRATION=y
CONFIG_ARCH_ENABLE_HUGEPAGE_MIGRATION=y
CONFIG_CONTIG_ALLOC=y
CONFIG_PHYS_ADDR_T_64BIT=y
CONFIG_KSM=y
CONFIG_DEFAULT_MMAP_MIN_ADDR=4096
CONFIG_ARCH_WANT_GENERAL_HUGETLB=y
CONFIG_ARCH_WANTS_THP_SWAP=y
# CONFIG_TRANSPARENT_HUGEPAGE is not set
CONFIG_NEED_PER_CPU_KM=y
CONFIG_NEED_PER_CPU_EMBED_FIRST_CHUNK=y
CONFIG_NEED_PER_CPU_PAGE_FIRST_CHUNK=y
CONFIG_HAVE_SETUP_PER_CPU_AREA=y
CONFIG_CMA=y
# CONFIG_CMA_DEBUG is not set
CONFIG_CMA_DEBUGFS=y
# CONFIG_CMA_SYSFS is not set
CONFIG_CMA_AREAS=7
# CONFIG_MEM_SOFT_DIRTY is not set
CONFIG_GENERIC_EARLY_IOREMAP=y
# CONFIG_IDLE_PAGE_TRACKING is not set
CONFIG_ARCH_HAS_CACHE_LINE_SIZE=y
CONFIG_ARCH_HAS_CURRENT_STACK_POINTER=y
CONFIG_ARCH_HAS_PTE_DEVMAP=y
CONFIG_ARCH_HAS_ZONE_DMA_SET=y
# CONFIG_ZONE_DMA is not set
CONFIG_ZONE_DMA32=y
# CONFIG_VM_EVENT_COUNTERS is not set
# CONFIG_PERCPU_STATS is not set
# CONFIG_GUP_TEST is not set
CONFIG_ARCH_HAS_PTE_SPECIAL=y
# CONFIG_ANON_VMA_NAME is not set
# CONFIG_USERFAULTFD is not set
# CONFIG_LRU_GEN is not set

#
# Data Access Monitoring
#
# CONFIG_DAMON is not set
# end of Data Access Monitoring
# end of Memory Management options

CONFIG_NET=y

#
# Networking options
#
CONFIG_PACKET=y
# CONFIG_PACKET_DIAG is not set
CONFIG_UNIX=y
CONFIG_UNIX_SCM=y
CONFIG_AF_UNIX_OOB=y
# CONFIG_UNIX_DIAG is not set
# CONFIG_TLS is not set
# CONFIG_XFRM_USER is not set
# CONFIG_NET_KEY is not set
# CONFIG_XDP_SOCKETS is not set
CONFIG_INET=y
# CONFIG_IP_MULTICAST is not set
# CONFIG_IP_ADVANCED_ROUTER is not set
CONFIG_IP_PNP=y
CONFIG_IP_PNP_DHCP=y
# CONFIG_IP_PNP_BOOTP is not set
# CONFIG_IP_PNP_RARP is not set
# CONFIG_NET_IPIP is not set
# CONFIG_NET_IPGRE_DEMUX is not set
CONFIG_NET_IP_TUNNEL=y
# CONFIG_SYN_COOKIES is not set
# CONFIG_NET_IPVTI is not set
# CONFIG_NET_FOU is not set
# CONFIG_NET_FOU_IP_TUNNELS is not set
# CONFIG_INET_AH is not set
# CONFIG_INET_ESP is not set
# CONFIG_INET_IPCOMP is not set
CONFIG_INET_TABLE_PERTURB_ORDER=16
CONFIG_INET_TUNNEL=y
CONFIG_INET_DIAG=y
CONFIG_INET_TCP_DIAG=y
# CONFIG_INET_UDP_DIAG is not set
# CONFIG_INET_RAW_DIAG is not set
# CONFIG_INET_DIAG_DESTROY is not set
# CONFIG_TCP_CONG_ADVANCED is not set
CONFIG_TCP_CONG_CUBIC=y
CONFIG_DEFAULT_TCP_CONG="cubic"
# CONFIG_TCP_MD5SIG is not set
CONFIG_IPV6=y
# CONFIG_IPV6_ROUTER_PREF is not set
# CONFIG_IPV6_OPTIMISTIC_DAD is not set
# CONFIG_INET6_AH is not set
# CONFIG_INET6_ESP is not set
# CONFIG_INET6_IPCOMP is not set
# CONFIG_IPV6_MIP6 is not set
# CONFIG_IPV6_VTI is not set
CONFIG_IPV6_SIT=y
# CONFIG_IPV6_SIT_6RD is not set
CONFIG_IPV6_NDISC_NODETYPE=y
# CONFIG_IPV6_TUNNEL is not set
# CONFIG_IPV6_MULTIPLE_TABLES is not set
# CONFIG_IPV6_MROUTE is not set
# CONFIG_IPV6_SEG6_LWTUNNEL is not set
# CONFIG_IPV6_SEG6_HMAC is not set
# CONFIG_IPV6_RPL_LWTUNNEL is not set
# CONFIG_IPV6_IOAM6_LWTUNNEL is not set
# CONFIG_NETLABEL is not set
# CONFIG_MPTCP is not set
# CONFIG_NETWORK_SECMARK is not set
# CONFIG_NETWORK_PHY_TIMESTAMPING is not set
# CONFIG_NETFILTER is not set
# CONFIG_BPFILTER is not set
# CONFIG_IP_DCCP is not set
# CONFIG_IP_SCTP is not set
# CONFIG_RDS is not set
# CONFIG_TIPC is not set
# CONFIG_ATM is not set
# CONFIG_L2TP is not set
# CONFIG_BRIDGE is not set
# CONFIG_NET_DSA is not set
# CONFIG_VLAN_8021Q is not set
# CONFIG_LLC2 is not set
# CONFIG_ATALK is not set
# CONFIG_X25 is not set
# CONFIG_LAPB is not set
# CONFIG_PHONET is not set
# CONFIG_6LOWPAN is not set
# CONFIG_IEEE802154 is not set
# CONFIG_NET_SCHED is not set
# CONFIG_DCB is not set
CONFIG_DNS_RESOLVER=m
# CONFIG_BATMAN_ADV is not set
# CONFIG_OPENVSWITCH is not set
# CONFIG_VSOCKETS is not set
# CONFIG_NETLINK_DIAG is not set
# CONFIG_MPLS is not set
# CONFIG_NET_NSH is not set
# CONFIG_HSR is not set
# CONFIG_NET_SWITCHDEV is not set
# CONFIG_NET_L3_MASTER_DEV is not set
# CONFIG_QRTR is not set
# CONFIG_NET_NCSI is not set
# CONFIG_CGROUP_NET_PRIO is not set
# CONFIG_CGROUP_NET_CLASSID is not set
CONFIG_NET_RX_BUSY_POLL=y
CONFIG_BQL=y

#
# Network testing
#
# CONFIG_NET_PKTGEN is not set
# CONFIG_NET_DROP_MONITOR is not set
# end of Network testing
# end of Networking options

# CONFIG_HAMRADIO is not set
# CONFIG_CAN is not set
# CONFIG_BT is not set
# CONFIG_AF_RXRPC is not set
# CONFIG_AF_KCM is not set
# CONFIG_MCTP is not set
CONFIG_WIRELESS=y
# CONFIG_CFG80211 is not set

#
# CFG80211 needs to be enabled for MAC80211
#
CONFIG_MAC80211_STA_HASH_MAX_SIZE=0
# CONFIG_RFKILL is not set
CONFIG_NET_9P=y
CONFIG_NET_9P_FD=y
CONFIG_NET_9P_VIRTIO=y
# CONFIG_NET_9P_DEBUG is not set
# CONFIG_CAIF is not set
# CONFIG_CEPH_LIB is not set
# CONFIG_NFC is not set
# CONFIG_PSAMPLE is not set
# CONFIG_NET_IFE is not set
# CONFIG_LWTUNNEL is not set
CONFIG_DST_CACHE=y
CONFIG_GRO_CELLS=y
CONFIG_NET_SOCK_MSG=y
CONFIG_PAGE_POOL=y
# CONFIG_PAGE_POOL_STATS is not set
CONFIG_FAILOVER=m
CONFIG_ETHTOOL_NETLINK=y

#
# Device Drivers
#
CONFIG_HAVE_EISA=y
# CONFIG_EISA is not set
CONFIG_HAVE_PCI=y
CONFIG_PCI=y
CONFIG_PCI_DOMAINS=y
# CONFIG_PCIEPORTBUS is not set
CONFIG_PCIEASPM=y
CONFIG_PCIEASPM_DEFAULT=y
# CONFIG_PCIEASPM_POWERSAVE is not set
# CONFIG_PCIEASPM_POWER_SUPERSAVE is not set
# CONFIG_PCIEASPM_PERFORMANCE is not set
# CONFIG_PCIE_PTM is not set
# CONFIG_PCI_MSI is not set
CONFIG_PCI_QUIRKS=y
# CONFIG_PCI_DEBUG is not set
# CONFIG_PCI_STUB is not set
CONFIG_PCI_LOCKLESS_CONFIG=y
# CONFIG_PCI_IOV is not set
# CONFIG_PCI_PRI is not set
# CONFIG_PCI_PASID is not set
CONFIG_PCI_LABEL=y
# CONFIG_PCIE_BUS_TUNE_OFF is not set
CONFIG_PCIE_BUS_DEFAULT=y
# CONFIG_PCIE_BUS_SAFE is not set
# CONFIG_PCIE_BUS_PERFORMANCE is not set
# CONFIG_PCIE_BUS_PEER2PEER is not set
CONFIG_VGA_ARB=y
CONFIG_VGA_ARB_MAX_GPUS=16
# CONFIG_HOTPLUG_PCI is not set

#
# PCI controller drivers
#

#
# DesignWare PCI Core Support
#
# end of DesignWare PCI Core Support

#
# Mobiveil PCIe Core Support
#
# end of Mobiveil PCIe Core Support

#
# Cadence PCIe controllers support
#
# end of Cadence PCIe controllers support
# end of PCI controller drivers

#
# PCI Endpoint
#
# CONFIG_PCI_ENDPOINT is not set
# end of PCI Endpoint

#
# PCI switch controller drivers
#
# CONFIG_PCI_SW_SWITCHTEC is not set
# end of PCI switch controller drivers

# CONFIG_CXL_BUS is not set
CONFIG_PCCARD=y
# CONFIG_PCMCIA is not set
CONFIG_CARDBUS=y

#
# PC-card bridges
#
# CONFIG_YENTA is not set
# CONFIG_RAPIDIO is not set

#
# Generic Driver Options
#
# CONFIG_UEVENT_HELPER is not set
CONFIG_DEVTMPFS=y
# CONFIG_DEVTMPFS_MOUNT is not set
# CONFIG_DEVTMPFS_SAFE is not set
# CONFIG_STANDALONE is not set
CONFIG_PREVENT_FIRMWARE_BUILD=y

#
# Firmware loader
#
CONFIG_FW_LOADER=y
CONFIG_FW_LOADER_PAGED_BUF=y
CONFIG_FW_LOADER_SYSFS=y
CONFIG_EXTRA_FIRMWARE=""
CONFIG_FW_LOADER_USER_HELPER=y
# CONFIG_FW_LOADER_USER_HELPER_FALLBACK is not set
CONFIG_FW_LOADER_COMPRESS=y
CONFIG_FW_LOADER_COMPRESS_XZ=y
# CONFIG_FW_LOADER_COMPRESS_ZSTD is not set
CONFIG_FW_CACHE=y
# CONFIG_FW_UPLOAD is not set
# end of Firmware loader

CONFIG_WANT_DEV_COREDUMP=y
# CONFIG_ALLOW_DEV_COREDUMP is not set
# CONFIG_DEBUG_DRIVER is not set
CONFIG_DEBUG_DEVRES=y
# CONFIG_DEBUG_TEST_DRIVER_REMOVE is not set
# CONFIG_TEST_ASYNC_DRIVER_PROBE is not set
CONFIG_GENERIC_CPU_AUTOPROBE=y
CONFIG_GENERIC_CPU_VULNERABILITIES=y
CONFIG_REGMAP=y
CONFIG_REGMAP_I2C=y
CONFIG_REGMAP_SPMI=y
CONFIG_REGMAP_W1=y
CONFIG_REGMAP_MMIO=y
CONFIG_REGMAP_IRQ=y
CONFIG_REGMAP_I3C=y
CONFIG_DMA_SHARED_BUFFER=y
# CONFIG_DMA_FENCE_TRACE is not set
# end of Generic Driver Options

#
# Bus devices
#
# CONFIG_MHI_BUS is not set
# CONFIG_MHI_BUS_EP is not set
# end of Bus devices

# CONFIG_CONNECTOR is not set

#
# Firmware Drivers
#

#
# ARM System Control and Management Interface Protocol
#
# end of ARM System Control and Management Interface Protocol

# CONFIG_EDD is not set
# CONFIG_FIRMWARE_MEMMAP is not set
# CONFIG_DMIID is not set
CONFIG_DMI_SYSFS=y
CONFIG_DMI_SCAN_MACHINE_NON_EFI_FALLBACK=y
# CONFIG_FW_CFG_SYSFS is not set
# CONFIG_SYSFB_SIMPLEFB is not set
CONFIG_GOOGLE_FIRMWARE=y
CONFIG_GOOGLE_SMI=y
CONFIG_GOOGLE_COREBOOT_TABLE=y
# CONFIG_GOOGLE_MEMCONSOLE_X86_LEGACY is not set
# CONFIG_GOOGLE_MEMCONSOLE_COREBOOT is not set
CONFIG_GOOGLE_VPD=y

#
# EFI (Extensible Firmware Interface) Support
#
CONFIG_EFI_ESRT=y
CONFIG_EFI_VARS_PSTORE=y
CONFIG_EFI_VARS_PSTORE_DEFAULT_DISABLE=y
CONFIG_EFI_DXE_MEM_ATTRIBUTES=y
CONFIG_EFI_RUNTIME_WRAPPERS=y
CONFIG_EFI_BOOTLOADER_CONTROL=y
# CONFIG_EFI_CAPSULE_LOADER is not set
# CONFIG_EFI_TEST is not set
CONFIG_EFI_DEV_PATH_PARSER=y
CONFIG_APPLE_PROPERTIES=y
CONFIG_RESET_ATTACK_MITIGATION=y
CONFIG_EFI_RCI2_TABLE=y
CONFIG_EFI_DISABLE_PCI_DMA=y
CONFIG_EFI_EARLYCON=y
# CONFIG_EFI_CUSTOM_SSDT_OVERLAYS is not set
# CONFIG_EFI_DISABLE_RUNTIME is not set
# CONFIG_EFI_COCO_SECRET is not set
# end of EFI (Extensible Firmware Interface) Support

CONFIG_UEFI_CPER=y
CONFIG_UEFI_CPER_X86=y

#
# Tegra firmware driver
#
# end of Tegra firmware driver
# end of Firmware Drivers

CONFIG_GNSS=y
CONFIG_GNSS_SERIAL=y
CONFIG_GNSS_MTK_SERIAL=y
# CONFIG_GNSS_SIRF_SERIAL is not set
CONFIG_GNSS_UBX_SERIAL=y
CONFIG_MTD=y
# CONFIG_MTD_TESTS is not set

#
# Partition parsers
#
# CONFIG_MTD_AR7_PARTS is not set
CONFIG_MTD_CMDLINE_PARTS=y
CONFIG_MTD_REDBOOT_PARTS=y
CONFIG_MTD_REDBOOT_DIRECTORY_BLOCK=-1
CONFIG_MTD_REDBOOT_PARTS_UNALLOCATED=y
# CONFIG_MTD_REDBOOT_PARTS_READONLY is not set
# end of Partition parsers

#
# User Modules And Translation Layers
#
CONFIG_MTD_OOPS=y
# CONFIG_MTD_PARTITIONED_MASTER is not set

#
# RAM/ROM/Flash chip drivers
#
CONFIG_MTD_CFI=y
CONFIG_MTD_JEDECPROBE=y
CONFIG_MTD_GEN_PROBE=y
CONFIG_MTD_CFI_ADV_OPTIONS=y
# CONFIG_MTD_CFI_NOSWAP is not set
CONFIG_MTD_CFI_BE_BYTE_SWAP=y
# CONFIG_MTD_CFI_LE_BYTE_SWAP is not set
CONFIG_MTD_CFI_GEOMETRY=y
# CONFIG_MTD_MAP_BANK_WIDTH_1 is not set
CONFIG_MTD_MAP_BANK_WIDTH_2=y
# CONFIG_MTD_MAP_BANK_WIDTH_4 is not set
CONFIG_MTD_MAP_BANK_WIDTH_8=y
# CONFIG_MTD_MAP_BANK_WIDTH_16 is not set
CONFIG_MTD_MAP_BANK_WIDTH_32=y
CONFIG_MTD_CFI_I1=y
# CONFIG_MTD_CFI_I2 is not set
CONFIG_MTD_CFI_I4=y
# CONFIG_MTD_CFI_I8 is not set
CONFIG_MTD_OTP=y
# CONFIG_MTD_CFI_INTELEXT is not set
CONFIG_MTD_CFI_AMDSTD=y
CONFIG_MTD_CFI_STAA=y
CONFIG_MTD_CFI_UTIL=y
CONFIG_MTD_RAM=y
# CONFIG_MTD_ROM is not set
CONFIG_MTD_ABSENT=y
# end of RAM/ROM/Flash chip drivers

#
# Mapping drivers for chip access
#
CONFIG_MTD_COMPLEX_MAPPINGS=y
# CONFIG_MTD_PHYSMAP is not set
# CONFIG_MTD_AMD76XROM is not set
CONFIG_MTD_ICHXROM=y
# CONFIG_MTD_ESB2ROM is not set
# CONFIG_MTD_CK804XROM is not set
# CONFIG_MTD_SCB2_FLASH is not set
# CONFIG_MTD_NETtel is not set
CONFIG_MTD_L440GX=y
# CONFIG_MTD_PCI is not set
# CONFIG_MTD_INTEL_VR_NOR is not set
CONFIG_MTD_PLATRAM=y
# end of Mapping drivers for chip access

#
# Self-contained MTD device drivers
#
# CONFIG_MTD_PMC551 is not set
CONFIG_MTD_SLRAM=y
CONFIG_MTD_PHRAM=y
CONFIG_MTD_MTDRAM=y
CONFIG_MTDRAM_TOTAL_SIZE=4096
CONFIG_MTDRAM_ERASE_SIZE=128

#
# Disk-On-Chip Device Drivers
#
# CONFIG_MTD_DOCG3 is not set
# end of Self-contained MTD device drivers

#
# NAND
#
CONFIG_MTD_NAND_CORE=y
# CONFIG_MTD_ONENAND is not set
CONFIG_MTD_RAW_NAND=y

#
# Raw/parallel NAND flash controllers
#
# CONFIG_MTD_NAND_DENALI_PCI is not set
# CONFIG_MTD_NAND_CAFE is not set
CONFIG_MTD_NAND_MXIC=y
CONFIG_MTD_NAND_GPIO=y
CONFIG_MTD_NAND_PLATFORM=y
CONFIG_MTD_NAND_ARASAN=y

#
# Misc
#
CONFIG_MTD_NAND_NANDSIM=y
# CONFIG_MTD_NAND_RICOH is not set
CONFIG_MTD_NAND_DISKONCHIP=y
CONFIG_MTD_NAND_DISKONCHIP_PROBE_ADVANCED=y
CONFIG_MTD_NAND_DISKONCHIP_PROBE_ADDRESS=0
# CONFIG_MTD_NAND_DISKONCHIP_PROBE_HIGH is not set
CONFIG_MTD_NAND_DISKONCHIP_BBTWRITE=y

#
# ECC engine support
#
CONFIG_MTD_NAND_ECC=y
# CONFIG_MTD_NAND_ECC_SW_HAMMING is not set
# CONFIG_MTD_NAND_ECC_SW_BCH is not set
# CONFIG_MTD_NAND_ECC_MXIC is not set
# end of ECC engine support
# end of NAND

#
# LPDDR & LPDDR2 PCM memory drivers
#
CONFIG_MTD_LPDDR=y
CONFIG_MTD_QINFO_PROBE=y
# end of LPDDR & LPDDR2 PCM memory drivers

CONFIG_MTD_UBI=y
CONFIG_MTD_UBI_WL_THRESHOLD=4096
CONFIG_MTD_UBI_BEB_LIMIT=20
CONFIG_MTD_UBI_FASTMAP=y
CONFIG_MTD_UBI_GLUEBI=y
CONFIG_MTD_HYPERBUS=y
# CONFIG_OF is not set
CONFIG_ARCH_MIGHT_HAVE_PC_PARPORT=y
CONFIG_PARPORT=y
# CONFIG_PARPORT_PC is not set
CONFIG_PARPORT_AX88796=y
CONFIG_PARPORT_1284=y
CONFIG_PARPORT_NOT_PC=y
CONFIG_PNP=y
CONFIG_PNP_DEBUG_MESSAGES=y

#
# Protocols
#
CONFIG_PNPACPI=y

#
# NVME Support
#
# end of NVME Support

#
# Misc devices
#
# CONFIG_AD525X_DPOT is not set
CONFIG_DUMMY_IRQ=y
# CONFIG_IBM_ASM is not set
# CONFIG_PHANTOM is not set
# CONFIG_TIFM_CORE is not set
CONFIG_ICS932S401=y
CONFIG_ENCLOSURE_SERVICES=y
# CONFIG_HP_ILO is not set
# CONFIG_APDS9802ALS is not set
CONFIG_ISL29003=y
CONFIG_ISL29020=y
CONFIG_SENSORS_TSL2550=y
CONFIG_SENSORS_BH1770=y
CONFIG_SENSORS_APDS990X=y
CONFIG_HMC6352=y
CONFIG_DS1682=y
# CONFIG_SRAM is not set
# CONFIG_DW_XDATA_PCIE is not set
# CONFIG_PCI_ENDPOINT_TEST is not set
# CONFIG_XILINX_SDFEC is not set
CONFIG_C2PORT=y
# CONFIG_C2PORT_DURAMAR_2150 is not set

#
# EEPROM support
#
CONFIG_EEPROM_AT24=y
# CONFIG_EEPROM_LEGACY is not set
CONFIG_EEPROM_MAX6875=y
# CONFIG_EEPROM_93CX6 is not set
CONFIG_EEPROM_IDT_89HPESX=y
CONFIG_EEPROM_EE1004=y
# end of EEPROM support

# CONFIG_CB710_CORE is not set

#
# Texas Instruments shared transport line discipline
#
# CONFIG_TI_ST is not set
# end of Texas Instruments shared transport line discipline

# CONFIG_SENSORS_LIS3_I2C is not set
CONFIG_ALTERA_STAPL=y
# CONFIG_INTEL_MEI is not set
# CONFIG_INTEL_MEI_ME is not set
# CONFIG_INTEL_MEI_TXE is not set
# CONFIG_VMWARE_VMCI is not set
# CONFIG_GENWQE is not set
CONFIG_ECHO=y
# CONFIG_MISC_ALCOR_PCI is not set
# CONFIG_MISC_RTSX_PCI is not set
# CONFIG_HABANA_AI is not set
# CONFIG_UACCE is not set
CONFIG_PVPANIC=y
CONFIG_PVPANIC_MMIO=y
# CONFIG_PVPANIC_PCI is not set
# CONFIG_GP_PCI1XXXX is not set
# end of Misc devices

#
# SCSI device support
#
# end of SCSI device support

# CONFIG_FUSION is not set

#
# IEEE 1394 (FireWire) support
#
# CONFIG_FIREWIRE is not set
# CONFIG_FIREWIRE_NOSY is not set
# end of IEEE 1394 (FireWire) support

# CONFIG_MACINTOSH_DRIVERS is not set
CONFIG_NETDEVICES=y
CONFIG_NET_CORE=y
# CONFIG_BONDING is not set
# CONFIG_DUMMY is not set
# CONFIG_WIREGUARD is not set
# CONFIG_EQUALIZER is not set
# CONFIG_NET_TEAM is not set
# CONFIG_MACVLAN is not set
# CONFIG_IPVLAN is not set
# CONFIG_VXLAN is not set
# CONFIG_GENEVE is not set
# CONFIG_BAREUDP is not set
# CONFIG_GTP is not set
# CONFIG_MACSEC is not set
# CONFIG_NETCONSOLE is not set
# CONFIG_TUN is not set
# CONFIG_TUN_VNET_CROSS_LE is not set
# CONFIG_VETH is not set
CONFIG_VIRTIO_NET=m
# CONFIG_NLMON is not set
# CONFIG_ARCNET is not set
CONFIG_ETHERNET=y
CONFIG_NET_VENDOR_3COM=y
# CONFIG_VORTEX is not set
# CONFIG_TYPHOON is not set
CONFIG_NET_VENDOR_ADAPTEC=y
# CONFIG_ADAPTEC_STARFIRE is not set
CONFIG_NET_VENDOR_AGERE=y
# CONFIG_ET131X is not set
CONFIG_NET_VENDOR_ALACRITECH=y
# CONFIG_SLICOSS is not set
CONFIG_NET_VENDOR_ALTEON=y
# CONFIG_ACENIC is not set
# CONFIG_ALTERA_TSE is not set
CONFIG_NET_VENDOR_AMAZON=y
# CONFIG_NET_VENDOR_AMD is not set
CONFIG_NET_VENDOR_AQUANTIA=y
# CONFIG_AQTION is not set
CONFIG_NET_VENDOR_ARC=y
CONFIG_NET_VENDOR_ASIX=y
CONFIG_NET_VENDOR_ATHEROS=y
# CONFIG_ATL2 is not set
# CONFIG_ATL1 is not set
# CONFIG_ATL1E is not set
# CONFIG_ATL1C is not set
# CONFIG_ALX is not set
# CONFIG_CX_ECAT is not set
CONFIG_NET_VENDOR_BROADCOM=y
# CONFIG_B44 is not set
# CONFIG_BCMGENET is not set
# CONFIG_BNX2 is not set
# CONFIG_CNIC is not set
# CONFIG_TIGON3 is not set
# CONFIG_BNX2X is not set
# CONFIG_SYSTEMPORT is not set
# CONFIG_BNXT is not set
CONFIG_NET_VENDOR_CADENCE=y
# CONFIG_MACB is not set
CONFIG_NET_VENDOR_CAVIUM=y
# CONFIG_THUNDER_NIC_PF is not set
# CONFIG_THUNDER_NIC_VF is not set
# CONFIG_THUNDER_NIC_BGX is not set
# CONFIG_THUNDER_NIC_RGX is not set
# CONFIG_LIQUIDIO is not set
CONFIG_NET_VENDOR_CHELSIO=y
# CONFIG_CHELSIO_T1 is not set
# CONFIG_CHELSIO_T3 is not set
# CONFIG_CHELSIO_T4 is not set
# CONFIG_CHELSIO_T4VF is not set
CONFIG_NET_VENDOR_CISCO=y
# CONFIG_ENIC is not set
CONFIG_NET_VENDOR_CORTINA=y
CONFIG_NET_VENDOR_DAVICOM=y
# CONFIG_DNET is not set
CONFIG_NET_VENDOR_DEC=y
# CONFIG_NET_TULIP is not set
CONFIG_NET_VENDOR_DLINK=y
# CONFIG_DL2K is not set
# CONFIG_SUNDANCE is not set
CONFIG_NET_VENDOR_EMULEX=y
# CONFIG_BE2NET is not set
CONFIG_NET_VENDOR_ENGLEDER=y
# CONFIG_TSNEP is not set
CONFIG_NET_VENDOR_EZCHIP=y
CONFIG_NET_VENDOR_FUNGIBLE=y
CONFIG_NET_VENDOR_GOOGLE=y
CONFIG_NET_VENDOR_HUAWEI=y
CONFIG_NET_VENDOR_I825XX=y
CONFIG_NET_VENDOR_INTEL=y
# CONFIG_E100 is not set
CONFIG_E1000=y
# CONFIG_E1000E is not set
# CONFIG_IGB is not set
# CONFIG_IGBVF is not set
# CONFIG_IXGB is not set
# CONFIG_IXGBE is not set
# CONFIG_I40E is not set
# CONFIG_IGC is not set
CONFIG_NET_VENDOR_WANGXUN=y
# CONFIG_NGBE is not set
# CONFIG_TXGBE is not set
# CONFIG_JME is not set
CONFIG_NET_VENDOR_LITEX=y
CONFIG_NET_VENDOR_MARVELL=y
# CONFIG_MVMDIO is not set
# CONFIG_SKGE is not set
# CONFIG_SKY2 is not set
# CONFIG_OCTEON_EP is not set
CONFIG_NET_VENDOR_MELLANOX=y
# CONFIG_MLX4_EN is not set
# CONFIG_MLX5_CORE is not set
# CONFIG_MLXSW_CORE is not set
# CONFIG_MLXFW is not set
CONFIG_NET_VENDOR_MICREL=y
# CONFIG_KS8842 is not set
# CONFIG_KS8851_MLL is not set
# CONFIG_KSZ884X_PCI is not set
CONFIG_NET_VENDOR_MICROCHIP=y
# CONFIG_LAN743X is not set
# CONFIG_VCAP is not set
CONFIG_NET_VENDOR_MICROSEMI=y
CONFIG_NET_VENDOR_MICROSOFT=y
CONFIG_NET_VENDOR_MYRI=y
# CONFIG_MYRI10GE is not set
CONFIG_NET_VENDOR_NI=y
# CONFIG_NI_XGE_MANAGEMENT_ENET is not set
CONFIG_NET_VENDOR_NATSEMI=y
# CONFIG_NATSEMI is not set
# CONFIG_NS83820 is not set
CONFIG_NET_VENDOR_NETERION=y
# CONFIG_S2IO is not set
CONFIG_NET_VENDOR_NETRONOME=y
CONFIG_NET_VENDOR_8390=y
# CONFIG_NE2K_PCI is not set
CONFIG_NET_VENDOR_NVIDIA=y
# CONFIG_FORCEDETH is not set
CONFIG_NET_VENDOR_OKI=y
# CONFIG_ETHOC is not set
CONFIG_NET_VENDOR_PACKET_ENGINES=y
# CONFIG_HAMACHI is not set
# CONFIG_YELLOWFIN is not set
CONFIG_NET_VENDOR_PENSANDO=y
# CONFIG_IONIC is not set
CONFIG_NET_VENDOR_QLOGIC=y
# CONFIG_QLA3XXX is not set
# CONFIG_QLCNIC is not set
# CONFIG_NETXEN_NIC is not set
# CONFIG_QED is not set
CONFIG_NET_VENDOR_BROCADE=y
# CONFIG_BNA is not set
CONFIG_NET_VENDOR_QUALCOMM=y
# CONFIG_QCOM_EMAC is not set
# CONFIG_RMNET is not set
CONFIG_NET_VENDOR_RDC=y
# CONFIG_R6040 is not set
CONFIG_NET_VENDOR_REALTEK=y
# CONFIG_ATP is not set
# CONFIG_8139CP is not set
# CONFIG_8139TOO is not set
# CONFIG_R8169 is not set
CONFIG_NET_VENDOR_RENESAS=y
CONFIG_NET_VENDOR_ROCKER=y
CONFIG_NET_VENDOR_SAMSUNG=y
# CONFIG_SXGBE_ETH is not set
CONFIG_NET_VENDOR_SEEQ=y
CONFIG_NET_VENDOR_SILAN=y
# CONFIG_SC92031 is not set
CONFIG_NET_VENDOR_SIS=y
# CONFIG_SIS900 is not set
# CONFIG_SIS190 is not set
CONFIG_NET_VENDOR_SOLARFLARE=y
# CONFIG_SFC is not set
# CONFIG_SFC_FALCON is not set
CONFIG_NET_VENDOR_SMSC=y
# CONFIG_EPIC100 is not set
# CONFIG_SMSC911X is not set
# CONFIG_SMSC9420 is not set
CONFIG_NET_VENDOR_SOCIONEXT=y
CONFIG_NET_VENDOR_STMICRO=y
# CONFIG_STMMAC_ETH is not set
CONFIG_NET_VENDOR_SUN=y
# CONFIG_HAPPYMEAL is not set
# CONFIG_SUNGEM is not set
# CONFIG_CASSINI is not set
# CONFIG_NIU is not set
CONFIG_NET_VENDOR_SYNOPSYS=y
# CONFIG_DWC_XLGMAC is not set
CONFIG_NET_VENDOR_TEHUTI=y
# CONFIG_TEHUTI is not set
CONFIG_NET_VENDOR_TI=y
# CONFIG_TI_CPSW_PHY_SEL is not set
# CONFIG_TLAN is not set
CONFIG_NET_VENDOR_VERTEXCOM=y
CONFIG_NET_VENDOR_VIA=y
# CONFIG_VIA_RHINE is not set
# CONFIG_VIA_VELOCITY is not set
CONFIG_NET_VENDOR_WIZNET=y
# CONFIG_WIZNET_W5100 is not set
# CONFIG_WIZNET_W5300 is not set
CONFIG_NET_VENDOR_XILINX=y
# CONFIG_XILINX_EMACLITE is not set
# CONFIG_XILINX_AXI_EMAC is not set
# CONFIG_XILINX_LL_TEMAC is not set
# CONFIG_FDDI is not set
# CONFIG_HIPPI is not set
# CONFIG_NET_SB1000 is not set
# CONFIG_PHYLIB is not set
# CONFIG_PSE_CONTROLLER is not set
# CONFIG_MDIO_DEVICE is not set

#
# PCS device drivers
#
# end of PCS device drivers

# CONFIG_PLIP is not set
# CONFIG_PPP is not set
# CONFIG_SLIP is not set

#
# Host-side USB support is needed for USB Network Adapter support
#
CONFIG_WLAN=y
CONFIG_WLAN_VENDOR_ADMTEK=y
CONFIG_WLAN_VENDOR_ATH=y
# CONFIG_ATH_DEBUG is not set
# CONFIG_ATH5K_PCI is not set
CONFIG_WLAN_VENDOR_ATMEL=y
CONFIG_WLAN_VENDOR_BROADCOM=y
CONFIG_WLAN_VENDOR_CISCO=y
CONFIG_WLAN_VENDOR_INTEL=y
CONFIG_WLAN_VENDOR_INTERSIL=y
# CONFIG_HOSTAP is not set
CONFIG_WLAN_VENDOR_MARVELL=y
CONFIG_WLAN_VENDOR_MEDIATEK=y
CONFIG_WLAN_VENDOR_MICROCHIP=y
CONFIG_WLAN_VENDOR_PURELIFI=y
CONFIG_WLAN_VENDOR_RALINK=y
CONFIG_WLAN_VENDOR_REALTEK=y
CONFIG_WLAN_VENDOR_RSI=y
CONFIG_WLAN_VENDOR_SILABS=y
CONFIG_WLAN_VENDOR_ST=y
CONFIG_WLAN_VENDOR_TI=y
CONFIG_WLAN_VENDOR_ZYDAS=y
CONFIG_WLAN_VENDOR_QUANTENNA=y
# CONFIG_WAN is not set

#
# Wireless WAN
#
# CONFIG_WWAN is not set
# end of Wireless WAN

# CONFIG_VMXNET3 is not set
# CONFIG_FUJITSU_ES is not set
# CONFIG_NETDEVSIM is not set
CONFIG_NET_FAILOVER=m
# CONFIG_ISDN is not set

#
# Input device support
#
CONFIG_INPUT=y
CONFIG_INPUT_LEDS=y
# CONFIG_INPUT_FF_MEMLESS is not set
# CONFIG_INPUT_SPARSEKMAP is not set
# CONFIG_INPUT_MATRIXKMAP is not set
CONFIG_INPUT_VIVALDIFMAP=y

#
# Userland interfaces
#
# CONFIG_INPUT_MOUSEDEV is not set
# CONFIG_INPUT_JOYDEV is not set
# CONFIG_INPUT_EVDEV is not set
# CONFIG_INPUT_EVBUG is not set

#
# Input Device Drivers
#
CONFIG_INPUT_KEYBOARD=y
# CONFIG_KEYBOARD_ADC is not set
# CONFIG_KEYBOARD_ADP5588 is not set
# CONFIG_KEYBOARD_ADP5589 is not set
CONFIG_KEYBOARD_ATKBD=y
# CONFIG_KEYBOARD_QT1050 is not set
# CONFIG_KEYBOARD_QT1070 is not set
# CONFIG_KEYBOARD_QT2160 is not set
# CONFIG_KEYBOARD_DLINK_DIR685 is not set
# CONFIG_KEYBOARD_LKKBD is not set
# CONFIG_KEYBOARD_GPIO is not set
# CONFIG_KEYBOARD_GPIO_POLLED is not set
# CONFIG_KEYBOARD_TCA6416 is not set
# CONFIG_KEYBOARD_TCA8418 is not set
# CONFIG_KEYBOARD_MATRIX is not set
# CONFIG_KEYBOARD_LM8323 is not set
# CONFIG_KEYBOARD_LM8333 is not set
# CONFIG_KEYBOARD_MAX7359 is not set
# CONFIG_KEYBOARD_MCS is not set
# CONFIG_KEYBOARD_MPR121 is not set
# CONFIG_KEYBOARD_NEWTON is not set
# CONFIG_KEYBOARD_OPENCORES is not set
# CONFIG_KEYBOARD_PINEPHONE is not set
# CONFIG_KEYBOARD_SAMSUNG is not set
# CONFIG_KEYBOARD_STOWAWAY is not set
# CONFIG_KEYBOARD_SUNKBD is not set
# CONFIG_KEYBOARD_TM2_TOUCHKEY is not set
# CONFIG_KEYBOARD_XTKBD is not set
# CONFIG_KEYBOARD_CROS_EC is not set
# CONFIG_KEYBOARD_MTK_PMIC is not set
# CONFIG_KEYBOARD_CYPRESS_SF is not set
CONFIG_INPUT_MOUSE=y
CONFIG_MOUSE_PS2=y
CONFIG_MOUSE_PS2_ALPS=y
CONFIG_MOUSE_PS2_BYD=y
CONFIG_MOUSE_PS2_LOGIPS2PP=y
CONFIG_MOUSE_PS2_SYNAPTICS=y
CONFIG_MOUSE_PS2_SYNAPTICS_SMBUS=y
CONFIG_MOUSE_PS2_CYPRESS=y
CONFIG_MOUSE_PS2_LIFEBOOK=y
CONFIG_MOUSE_PS2_TRACKPOINT=y
# CONFIG_MOUSE_PS2_ELANTECH is not set
# CONFIG_MOUSE_PS2_SENTELIC is not set
# CONFIG_MOUSE_PS2_TOUCHKIT is not set
CONFIG_MOUSE_PS2_FOCALTECH=y
# CONFIG_MOUSE_PS2_VMMOUSE is not set
CONFIG_MOUSE_PS2_SMBUS=y
# CONFIG_MOUSE_SERIAL is not set
# CONFIG_MOUSE_APPLETOUCH is not set
# CONFIG_MOUSE_BCM5974 is not set
# CONFIG_MOUSE_CYAPA is not set
# CONFIG_MOUSE_ELAN_I2C is not set
# CONFIG_MOUSE_VSXXXAA is not set
# CONFIG_MOUSE_GPIO is not set
# CONFIG_MOUSE_SYNAPTICS_I2C is not set
# CONFIG_MOUSE_SYNAPTICS_USB is not set
# CONFIG_INPUT_JOYSTICK is not set
# CONFIG_INPUT_TABLET is not set
# CONFIG_INPUT_TOUCHSCREEN is not set
# CONFIG_INPUT_MISC is not set
# CONFIG_RMI4_CORE is not set

#
# Hardware I/O ports
#
CONFIG_SERIO=y
CONFIG_ARCH_MIGHT_HAVE_PC_SERIO=y
CONFIG_SERIO_I8042=y
CONFIG_SERIO_SERPORT=y
# CONFIG_SERIO_CT82C710 is not set
# CONFIG_SERIO_PARKBD is not set
# CONFIG_SERIO_PCIPS2 is not set
CONFIG_SERIO_LIBPS2=y
# CONFIG_SERIO_RAW is not set
# CONFIG_SERIO_ALTERA_PS2 is not set
# CONFIG_SERIO_PS2MULT is not set
# CONFIG_SERIO_ARC_PS2 is not set
# CONFIG_SERIO_GPIO_PS2 is not set
# CONFIG_USERIO is not set
CONFIG_GAMEPORT=y
CONFIG_GAMEPORT_NS558=y
CONFIG_GAMEPORT_L4=y
# CONFIG_GAMEPORT_EMU10K1 is not set
# CONFIG_GAMEPORT_FM801 is not set
# end of Hardware I/O ports
# end of Input device support

#
# Character devices
#
CONFIG_TTY=y
# CONFIG_VT is not set
CONFIG_UNIX98_PTYS=y
CONFIG_LEGACY_PTYS=y
CONFIG_LEGACY_PTY_COUNT=256
CONFIG_LDISC_AUTOLOAD=y

#
# Serial drivers
#
CONFIG_SERIAL_EARLYCON=y
CONFIG_SERIAL_8250=y
CONFIG_SERIAL_8250_DEPRECATED_OPTIONS=y
CONFIG_SERIAL_8250_PNP=y
# CONFIG_SERIAL_8250_16550A_VARIANTS is not set
# CONFIG_SERIAL_8250_FINTEK is not set
CONFIG_SERIAL_8250_CONSOLE=y
CONFIG_SERIAL_8250_DMA=y
CONFIG_SERIAL_8250_PCI=y
CONFIG_SERIAL_8250_EXAR=y
# CONFIG_SERIAL_8250_MEN_MCB is not set
CONFIG_SERIAL_8250_NR_UARTS=4
CONFIG_SERIAL_8250_RUNTIME_UARTS=4
# CONFIG_SERIAL_8250_EXTENDED is not set
CONFIG_SERIAL_8250_DWLIB=y
# CONFIG_SERIAL_8250_DW is not set
# CONFIG_SERIAL_8250_RT288X is not set
CONFIG_SERIAL_8250_LPSS=y
CONFIG_SERIAL_8250_MID=y
CONFIG_SERIAL_8250_PERICOM=y

#
# Non-8250 serial port support
#
# CONFIG_SERIAL_UARTLITE is not set
CONFIG_SERIAL_CORE=y
CONFIG_SERIAL_CORE_CONSOLE=y
# CONFIG_SERIAL_JSM is not set
# CONFIG_SERIAL_LANTIQ is not set
# CONFIG_SERIAL_SCCNXP is not set
# CONFIG_SERIAL_SC16IS7XX is not set
# CONFIG_SERIAL_ALTERA_JTAGUART is not set
# CONFIG_SERIAL_ALTERA_UART is not set
# CONFIG_SERIAL_ARC is not set
# CONFIG_SERIAL_RP2 is not set
# CONFIG_SERIAL_FSL_LPUART is not set
# CONFIG_SERIAL_FSL_LINFLEXUART is not set
# CONFIG_SERIAL_MEN_Z135 is not set
# CONFIG_SERIAL_SPRD is not set
# end of Serial drivers

CONFIG_SERIAL_MCTRL_GPIO=y
# CONFIG_SERIAL_NONSTANDARD is not set
# CONFIG_N_GSM is not set
# CONFIG_NOZOMI is not set
# CONFIG_NULL_TTY is not set
CONFIG_SERIAL_DEV_BUS=y
CONFIG_SERIAL_DEV_CTRL_TTYPORT=y
# CONFIG_TTY_PRINTK is not set
CONFIG_PRINTER=y
CONFIG_LP_CONSOLE=y
CONFIG_PPDEV=y
# CONFIG_VIRTIO_CONSOLE is not set
# CONFIG_IPMI_HANDLER is not set
# CONFIG_SSIF_IPMI_BMC is not set
# CONFIG_IPMB_DEVICE_INTERFACE is not set
# CONFIG_HW_RANDOM is not set
# CONFIG_APPLICOM is not set
# CONFIG_MWAVE is not set
# CONFIG_DEVMEM is not set
# CONFIG_NVRAM is not set
CONFIG_DEVPORT=y
# CONFIG_HPET is not set
CONFIG_HANGCHECK_TIMER=y
CONFIG_TCG_TPM=y
CONFIG_TCG_TIS_CORE=y
CONFIG_TCG_TIS=y
# CONFIG_TCG_TIS_I2C is not set
CONFIG_TCG_TIS_I2C_CR50=y
CONFIG_TCG_TIS_I2C_ATMEL=y
CONFIG_TCG_TIS_I2C_INFINEON=y
CONFIG_TCG_TIS_I2C_NUVOTON=y
CONFIG_TCG_NSC=y
CONFIG_TCG_ATMEL=y
CONFIG_TCG_INFINEON=y
CONFIG_TCG_CRB=y
CONFIG_TCG_VTPM_PROXY=y
CONFIG_TCG_TIS_ST33ZP24=y
CONFIG_TCG_TIS_ST33ZP24_I2C=y
CONFIG_TELCLOCK=y
# CONFIG_XILLYBUS is not set
# end of Character devices

#
# I2C support
#
CONFIG_I2C=y
# CONFIG_ACPI_I2C_OPREGION is not set
CONFIG_I2C_BOARDINFO=y
CONFIG_I2C_COMPAT=y
CONFIG_I2C_CHARDEV=y
CONFIG_I2C_MUX=y

#
# Multiplexer I2C Chip support
#
CONFIG_I2C_MUX_GPIO=y
CONFIG_I2C_MUX_LTC4306=y
CONFIG_I2C_MUX_PCA9541=y
CONFIG_I2C_MUX_PCA954x=y
CONFIG_I2C_MUX_REG=y
CONFIG_I2C_MUX_MLXCPLD=y
# end of Multiplexer I2C Chip support

CONFIG_I2C_HELPER_AUTO=y
CONFIG_I2C_SMBUS=y
CONFIG_I2C_ALGOBIT=y

#
# I2C Hardware Bus support
#

#
# PC SMBus host controller drivers
#
# CONFIG_I2C_ALI1535 is not set
# CONFIG_I2C_ALI1563 is not set
# CONFIG_I2C_ALI15X3 is not set
# CONFIG_I2C_AMD756 is not set
# CONFIG_I2C_AMD8111 is not set
# CONFIG_I2C_AMD_MP2 is not set
# CONFIG_I2C_I801 is not set
# CONFIG_I2C_ISCH is not set
# CONFIG_I2C_ISMT is not set
# CONFIG_I2C_PIIX4 is not set
# CONFIG_I2C_NFORCE2 is not set
# CONFIG_I2C_NVIDIA_GPU is not set
# CONFIG_I2C_SIS5595 is not set
# CONFIG_I2C_SIS630 is not set
# CONFIG_I2C_SIS96X is not set
# CONFIG_I2C_VIA is not set
# CONFIG_I2C_VIAPRO is not set

#
# ACPI drivers
#
CONFIG_I2C_SCMI=y

#
# I2C system bus drivers (mostly embedded / system-on-chip)
#
CONFIG_I2C_CBUS_GPIO=y
# CONFIG_I2C_DESIGNWARE_PLATFORM is not set
# CONFIG_I2C_DESIGNWARE_PCI is not set
# CONFIG_I2C_EMEV2 is not set
CONFIG_I2C_GPIO=y
# CONFIG_I2C_GPIO_FAULT_INJECTOR is not set
CONFIG_I2C_OCORES=y
# CONFIG_I2C_PCA_PLATFORM is not set
CONFIG_I2C_SIMTEC=y
# CONFIG_I2C_XILINX is not set

#
# External I2C/SMBus adapter drivers
#
# CONFIG_I2C_PARPORT is not set
# CONFIG_I2C_PCI1XXXX is not set
# CONFIG_I2C_TAOS_EVM is not set

#
# Other I2C/SMBus bus drivers
#
# CONFIG_I2C_MLXCPLD is not set
# CONFIG_I2C_CROS_EC_TUNNEL is not set
CONFIG_I2C_VIRTIO=y
# end of I2C Hardware Bus support

# CONFIG_I2C_STUB is not set
CONFIG_I2C_SLAVE=y
CONFIG_I2C_SLAVE_EEPROM=y
CONFIG_I2C_SLAVE_TESTUNIT=y
# CONFIG_I2C_DEBUG_CORE is not set
# CONFIG_I2C_DEBUG_ALGO is not set
# CONFIG_I2C_DEBUG_BUS is not set
# end of I2C support

CONFIG_I3C=y
# CONFIG_CDNS_I3C_MASTER is not set
CONFIG_DW_I3C_MASTER=y
CONFIG_SVC_I3C_MASTER=y
# CONFIG_MIPI_I3C_HCI is not set
# CONFIG_SPI is not set
CONFIG_SPMI=y
# CONFIG_SPMI_HISI3670 is not set
CONFIG_HSI=y
CONFIG_HSI_BOARDINFO=y

#
# HSI controllers
#

#
# HSI clients
#
CONFIG_HSI_CHAR=y
CONFIG_PPS=y
# CONFIG_PPS_DEBUG is not set
CONFIG_NTP_PPS=y

#
# PPS clients support
#
# CONFIG_PPS_CLIENT_KTIMER is not set
# CONFIG_PPS_CLIENT_LDISC is not set
CONFIG_PPS_CLIENT_PARPORT=y
CONFIG_PPS_CLIENT_GPIO=y

#
# PPS generators support
#

#
# PTP clock support
#
CONFIG_PTP_1588_CLOCK_OPTIONAL=y

#
# Enable PHYLIB and NETWORK_PHY_TIMESTAMPING to see the additional clocks.
#
# end of PTP clock support

CONFIG_PINCTRL=y
CONFIG_PINMUX=y
CONFIG_PINCONF=y
CONFIG_GENERIC_PINCONF=y
CONFIG_DEBUG_PINCTRL=y
# CONFIG_PINCTRL_AMD is not set
# CONFIG_PINCTRL_CY8C95X0 is not set
CONFIG_PINCTRL_DA9062=y
# CONFIG_PINCTRL_MCP23S08 is not set
# CONFIG_PINCTRL_SX150X is not set
CONFIG_PINCTRL_MADERA=y
CONFIG_PINCTRL_CS47L35=y
CONFIG_PINCTRL_CS47L85=y
CONFIG_PINCTRL_CS47L90=y

#
# Intel pinctrl drivers
#
# CONFIG_PINCTRL_BAYTRAIL is not set
# CONFIG_PINCTRL_CHERRYVIEW is not set
# CONFIG_PINCTRL_LYNXPOINT is not set
CONFIG_PINCTRL_INTEL=y
CONFIG_PINCTRL_ALDERLAKE=y
# CONFIG_PINCTRL_BROXTON is not set
CONFIG_PINCTRL_CANNONLAKE=y
CONFIG_PINCTRL_CEDARFORK=y
CONFIG_PINCTRL_DENVERTON=y
# CONFIG_PINCTRL_ELKHARTLAKE is not set
CONFIG_PINCTRL_EMMITSBURG=y
# CONFIG_PINCTRL_GEMINILAKE is not set
# CONFIG_PINCTRL_ICELAKE is not set
CONFIG_PINCTRL_JASPERLAKE=y
# CONFIG_PINCTRL_LAKEFIELD is not set
CONFIG_PINCTRL_LEWISBURG=y
# CONFIG_PINCTRL_METEORLAKE is not set
CONFIG_PINCTRL_SUNRISEPOINT=y
CONFIG_PINCTRL_TIGERLAKE=y
# end of Intel pinctrl drivers

#
# Renesas pinctrl drivers
#
# end of Renesas pinctrl drivers

CONFIG_GPIOLIB=y
CONFIG_GPIOLIB_FASTPATH_LIMIT=512
CONFIG_GPIO_ACPI=y
CONFIG_GPIOLIB_IRQCHIP=y
CONFIG_DEBUG_GPIO=y
# CONFIG_GPIO_SYSFS is not set
CONFIG_GPIO_CDEV=y
# CONFIG_GPIO_CDEV_V1 is not set
CONFIG_GPIO_GENERIC=y

#
# Memory mapped GPIO drivers
#
# CONFIG_GPIO_AMDPT is not set
CONFIG_GPIO_DWAPB=y
# CONFIG_GPIO_EXAR is not set
CONFIG_GPIO_GENERIC_PLATFORM=y
# CONFIG_GPIO_MB86S7X is not set
# CONFIG_GPIO_MENZ127 is not set
# CONFIG_GPIO_VX855 is not set
# CONFIG_GPIO_AMD_FCH is not set
# end of Memory mapped GPIO drivers

#
# Port-mapped I/O GPIO drivers
#
CONFIG_GPIO_F7188X=y
CONFIG_GPIO_IT87=y
CONFIG_GPIO_SCH311X=y
# CONFIG_GPIO_WINBOND is not set
CONFIG_GPIO_WS16C48=y
# end of Port-mapped I/O GPIO drivers

#
# I2C GPIO expanders
#
# CONFIG_GPIO_MAX7300 is not set
CONFIG_GPIO_MAX732X=y
# CONFIG_GPIO_MAX732X_IRQ is not set
CONFIG_GPIO_PCA953X=y
# CONFIG_GPIO_PCA953X_IRQ is not set
CONFIG_GPIO_PCA9570=y
# CONFIG_GPIO_PCF857X is not set
CONFIG_GPIO_TPIC2810=y
# end of I2C GPIO expanders

#
# MFD GPIO expanders
#
CONFIG_GPIO_ARIZONA=y
# CONFIG_GPIO_BD9571MWV is not set
CONFIG_GPIO_DA9055=y
CONFIG_GPIO_LP3943=y
# CONFIG_GPIO_LP873X is not set
CONFIG_GPIO_MADERA=y
# CONFIG_GPIO_PALMAS is not set
# CONFIG_GPIO_TPS65086 is not set
CONFIG_GPIO_TPS6586X=y
CONFIG_GPIO_TPS65910=y
CONFIG_GPIO_TPS68470=y
CONFIG_GPIO_TQMX86=y
CONFIG_GPIO_WM831X=y
# CONFIG_GPIO_WM8350 is not set
CONFIG_GPIO_WM8994=y
# end of MFD GPIO expanders

#
# PCI GPIO expanders
#
# CONFIG_GPIO_AMD8111 is not set
# CONFIG_GPIO_BT8XX is not set
# CONFIG_GPIO_ML_IOH is not set
# CONFIG_GPIO_PCI_IDIO_16 is not set
# CONFIG_GPIO_PCIE_IDIO_24 is not set
# CONFIG_GPIO_RDC321X is not set
# end of PCI GPIO expanders

#
# Virtual GPIO drivers
#
# CONFIG_GPIO_AGGREGATOR is not set
CONFIG_GPIO_MOCKUP=y
CONFIG_GPIO_VIRTIO=y
# CONFIG_GPIO_SIM is not set
# end of Virtual GPIO drivers

CONFIG_W1=y

#
# 1-wire Bus Masters
#
# CONFIG_W1_MASTER_MATROX is not set
CONFIG_W1_MASTER_DS2482=y
CONFIG_W1_MASTER_DS1WM=y
CONFIG_W1_MASTER_GPIO=y
CONFIG_W1_MASTER_SGI=y
# end of 1-wire Bus Masters

#
# 1-wire Slaves
#
CONFIG_W1_SLAVE_THERM=y
# CONFIG_W1_SLAVE_SMEM is not set
CONFIG_W1_SLAVE_DS2405=y
CONFIG_W1_SLAVE_DS2408=y
# CONFIG_W1_SLAVE_DS2408_READBACK is not set
# CONFIG_W1_SLAVE_DS2413 is not set
CONFIG_W1_SLAVE_DS2406=y
# CONFIG_W1_SLAVE_DS2423 is not set
# CONFIG_W1_SLAVE_DS2805 is not set
CONFIG_W1_SLAVE_DS2430=y
CONFIG_W1_SLAVE_DS2431=y
CONFIG_W1_SLAVE_DS2433=y
# CONFIG_W1_SLAVE_DS2433_CRC is not set
CONFIG_W1_SLAVE_DS2438=y
# CONFIG_W1_SLAVE_DS250X is not set
# CONFIG_W1_SLAVE_DS2780 is not set
CONFIG_W1_SLAVE_DS2781=y
CONFIG_W1_SLAVE_DS28E04=y
CONFIG_W1_SLAVE_DS28E17=y
# end of 1-wire Slaves

CONFIG_POWER_RESET=y
CONFIG_POWER_RESET_ATC260X=y
CONFIG_POWER_RESET_MT6323=y
CONFIG_POWER_RESET_RESTART=y
# CONFIG_POWER_RESET_TPS65086 is not set
CONFIG_POWER_SUPPLY=y
# CONFIG_POWER_SUPPLY_DEBUG is not set
CONFIG_POWER_SUPPLY_HWMON=y
CONFIG_PDA_POWER=y
CONFIG_GENERIC_ADC_BATTERY=y
# CONFIG_IP5XXX_POWER is not set
# CONFIG_WM831X_BACKUP is not set
# CONFIG_WM831X_POWER is not set
CONFIG_WM8350_POWER=y
# CONFIG_TEST_POWER is not set
# CONFIG_BATTERY_88PM860X is not set
CONFIG_CHARGER_ADP5061=y
CONFIG_BATTERY_CW2015=y
CONFIG_BATTERY_DS2760=y
# CONFIG_BATTERY_DS2780 is not set
CONFIG_BATTERY_DS2781=y
CONFIG_BATTERY_DS2782=y
# CONFIG_BATTERY_SAMSUNG_SDI is not set
# CONFIG_BATTERY_SBS is not set
# CONFIG_CHARGER_SBS is not set
CONFIG_MANAGER_SBS=y
CONFIG_BATTERY_BQ27XXX=y
# CONFIG_BATTERY_BQ27XXX_I2C is not set
CONFIG_BATTERY_BQ27XXX_HDQ=y
CONFIG_BATTERY_DA9030=y
CONFIG_BATTERY_DA9150=y
CONFIG_CHARGER_AXP20X=y
CONFIG_BATTERY_AXP20X=y
CONFIG_AXP20X_POWER=y
# CONFIG_BATTERY_MAX17040 is not set
CONFIG_BATTERY_MAX17042=y
CONFIG_BATTERY_MAX1721X=y
CONFIG_CHARGER_ISP1704=y
# CONFIG_CHARGER_MAX8903 is not set
CONFIG_CHARGER_LP8727=y
CONFIG_CHARGER_LP8788=y
CONFIG_CHARGER_GPIO=y
# CONFIG_CHARGER_MANAGER is not set
CONFIG_CHARGER_LT3651=y
CONFIG_CHARGER_LTC4162L=y
CONFIG_CHARGER_MAX14577=y
# CONFIG_CHARGER_MAX77976 is not set
# CONFIG_CHARGER_MT6360 is not set
CONFIG_CHARGER_BQ2415X=y
CONFIG_CHARGER_BQ24190=y
# CONFIG_CHARGER_BQ24257 is not set
# CONFIG_CHARGER_BQ24735 is not set
CONFIG_CHARGER_BQ2515X=y
# CONFIG_CHARGER_BQ25890 is not set
# CONFIG_CHARGER_BQ25980 is not set
# CONFIG_CHARGER_BQ256XX is not set
CONFIG_CHARGER_SMB347=y
CONFIG_CHARGER_TPS65090=y
# CONFIG_BATTERY_GAUGE_LTC2941 is not set
CONFIG_BATTERY_GOLDFISH=y
# CONFIG_BATTERY_RT5033 is not set
CONFIG_CHARGER_RT9455=y
# CONFIG_CHARGER_BD99954 is not set
# CONFIG_BATTERY_UG3105 is not set
CONFIG_HWMON=y
CONFIG_HWMON_VID=y
CONFIG_HWMON_DEBUG_CHIP=y

#
# Native drivers
#
# CONFIG_SENSORS_ABITUGURU is not set
# CONFIG_SENSORS_ABITUGURU3 is not set
CONFIG_SENSORS_AD7414=y
# CONFIG_SENSORS_AD7418 is not set
# CONFIG_SENSORS_ADM1021 is not set
# CONFIG_SENSORS_ADM1025 is not set
CONFIG_SENSORS_ADM1026=y
CONFIG_SENSORS_ADM1029=y
CONFIG_SENSORS_ADM1031=y
CONFIG_SENSORS_ADM1177=y
# CONFIG_SENSORS_ADM9240 is not set
# CONFIG_SENSORS_ADT7410 is not set
# CONFIG_SENSORS_ADT7411 is not set
CONFIG_SENSORS_ADT7462=y
CONFIG_SENSORS_ADT7470=y
CONFIG_SENSORS_ADT7475=y
# CONFIG_SENSORS_AHT10 is not set
CONFIG_SENSORS_AS370=y
# CONFIG_SENSORS_ASC7621 is not set
CONFIG_SENSORS_AXI_FAN_CONTROL=y
# CONFIG_SENSORS_K8TEMP is not set
# CONFIG_SENSORS_APPLESMC is not set
CONFIG_SENSORS_ASB100=y
# CONFIG_SENSORS_ATXP1 is not set
# CONFIG_SENSORS_CORSAIR_CPRO is not set
# CONFIG_SENSORS_CORSAIR_PSU is not set
# CONFIG_SENSORS_DS620 is not set
# CONFIG_SENSORS_DS1621 is not set
CONFIG_SENSORS_DELL_SMM=y
# CONFIG_I8K is not set
# CONFIG_SENSORS_DA9055 is not set
# CONFIG_SENSORS_I5K_AMB is not set
# CONFIG_SENSORS_F71805F is not set
CONFIG_SENSORS_F71882FG=y
# CONFIG_SENSORS_F75375S is not set
# CONFIG_SENSORS_MC13783_ADC is not set
# CONFIG_SENSORS_FSCHMD is not set
# CONFIG_SENSORS_GL518SM is not set
CONFIG_SENSORS_GL520SM=y
CONFIG_SENSORS_G760A=y
CONFIG_SENSORS_G762=y
CONFIG_SENSORS_HIH6130=y
CONFIG_SENSORS_IIO_HWMON=y
# CONFIG_SENSORS_I5500 is not set
CONFIG_SENSORS_CORETEMP=y
CONFIG_SENSORS_IT87=y
# CONFIG_SENSORS_JC42 is not set
# CONFIG_SENSORS_POWR1220 is not set
# CONFIG_SENSORS_LINEAGE is not set
# CONFIG_SENSORS_LTC2945 is not set
CONFIG_SENSORS_LTC2947=y
CONFIG_SENSORS_LTC2947_I2C=y
CONFIG_SENSORS_LTC2990=y
# CONFIG_SENSORS_LTC2992 is not set
# CONFIG_SENSORS_LTC4151 is not set
CONFIG_SENSORS_LTC4215=y
CONFIG_SENSORS_LTC4222=y
# CONFIG_SENSORS_LTC4245 is not set
CONFIG_SENSORS_LTC4260=y
# CONFIG_SENSORS_LTC4261 is not set
# CONFIG_SENSORS_MAX127 is not set
CONFIG_SENSORS_MAX16065=y
# CONFIG_SENSORS_MAX1619 is not set
CONFIG_SENSORS_MAX1668=y
CONFIG_SENSORS_MAX197=y
CONFIG_SENSORS_MAX31730=y
# CONFIG_SENSORS_MAX31760 is not set
CONFIG_SENSORS_MAX6620=y
CONFIG_SENSORS_MAX6621=y
CONFIG_SENSORS_MAX6639=y
CONFIG_SENSORS_MAX6642=y
CONFIG_SENSORS_MAX6650=y
# CONFIG_SENSORS_MAX6697 is not set
# CONFIG_SENSORS_MAX31790 is not set
CONFIG_SENSORS_MCP3021=y
CONFIG_SENSORS_MLXREG_FAN=y
# CONFIG_SENSORS_TC654 is not set
CONFIG_SENSORS_TPS23861=y
CONFIG_SENSORS_MR75203=y
CONFIG_SENSORS_LM63=y
CONFIG_SENSORS_LM73=y
CONFIG_SENSORS_LM75=y
CONFIG_SENSORS_LM77=y
# CONFIG_SENSORS_LM78 is not set
CONFIG_SENSORS_LM80=y
CONFIG_SENSORS_LM83=y
CONFIG_SENSORS_LM85=y
CONFIG_SENSORS_LM87=y
# CONFIG_SENSORS_LM90 is not set
CONFIG_SENSORS_LM92=y
# CONFIG_SENSORS_LM93 is not set
# CONFIG_SENSORS_LM95234 is not set
# CONFIG_SENSORS_LM95241 is not set
# CONFIG_SENSORS_LM95245 is not set
# CONFIG_SENSORS_PC87360 is not set
CONFIG_SENSORS_PC87427=y
# CONFIG_SENSORS_NTC_THERMISTOR is not set
CONFIG_SENSORS_NCT6683=y
CONFIG_SENSORS_NCT6775_CORE=y
CONFIG_SENSORS_NCT6775=y
# CONFIG_SENSORS_NCT6775_I2C is not set
CONFIG_SENSORS_NCT7802=y
# CONFIG_SENSORS_NPCM7XX is not set
# CONFIG_SENSORS_OCC_P8_I2C is not set
# CONFIG_SENSORS_OXP is not set
# CONFIG_SENSORS_PCF8591 is not set
CONFIG_PMBUS=y
CONFIG_SENSORS_PMBUS=y
CONFIG_SENSORS_ADM1266=y
CONFIG_SENSORS_ADM1275=y
CONFIG_SENSORS_BEL_PFE=y
CONFIG_SENSORS_BPA_RS600=y
CONFIG_SENSORS_DELTA_AHE50DC_FAN=y
# CONFIG_SENSORS_FSP_3Y is not set
# CONFIG_SENSORS_IBM_CFFPS is not set
CONFIG_SENSORS_DPS920AB=y
CONFIG_SENSORS_INSPUR_IPSPS=y
CONFIG_SENSORS_IR35221=y
CONFIG_SENSORS_IR36021=y
CONFIG_SENSORS_IR38064=y
CONFIG_SENSORS_IR38064_REGULATOR=y
# CONFIG_SENSORS_IRPS5401 is not set
# CONFIG_SENSORS_ISL68137 is not set
CONFIG_SENSORS_LM25066=y
# CONFIG_SENSORS_LM25066_REGULATOR is not set
# CONFIG_SENSORS_LT7182S is not set
CONFIG_SENSORS_LTC2978=y
CONFIG_SENSORS_LTC2978_REGULATOR=y
# CONFIG_SENSORS_LTC3815 is not set
CONFIG_SENSORS_MAX15301=y
CONFIG_SENSORS_MAX16064=y
CONFIG_SENSORS_MAX16601=y
CONFIG_SENSORS_MAX20730=y
CONFIG_SENSORS_MAX20751=y
# CONFIG_SENSORS_MAX31785 is not set
# CONFIG_SENSORS_MAX34440 is not set
# CONFIG_SENSORS_MAX8688 is not set
CONFIG_SENSORS_MP2888=y
CONFIG_SENSORS_MP2975=y
CONFIG_SENSORS_MP5023=y
CONFIG_SENSORS_PIM4328=y
# CONFIG_SENSORS_PLI1209BC is not set
CONFIG_SENSORS_PM6764TR=y
# CONFIG_SENSORS_PXE1610 is not set
# CONFIG_SENSORS_Q54SJ108A2 is not set
CONFIG_SENSORS_STPDDC60=y
CONFIG_SENSORS_TPS40422=y
CONFIG_SENSORS_TPS53679=y
# CONFIG_SENSORS_TPS546D24 is not set
# CONFIG_SENSORS_UCD9000 is not set
# CONFIG_SENSORS_UCD9200 is not set
# CONFIG_SENSORS_XDPE152 is not set
CONFIG_SENSORS_XDPE122=y
# CONFIG_SENSORS_XDPE122_REGULATOR is not set
# CONFIG_SENSORS_ZL6100 is not set
# CONFIG_SENSORS_SBTSI is not set
CONFIG_SENSORS_SBRMI=y
CONFIG_SENSORS_SHT15=y
# CONFIG_SENSORS_SHT21 is not set
CONFIG_SENSORS_SHT3x=y
CONFIG_SENSORS_SHT4x=y
CONFIG_SENSORS_SHTC1=y
# CONFIG_SENSORS_SIS5595 is not set
CONFIG_SENSORS_DME1737=y
CONFIG_SENSORS_EMC1403=y
CONFIG_SENSORS_EMC2103=y
# CONFIG_SENSORS_EMC2305 is not set
CONFIG_SENSORS_EMC6W201=y
# CONFIG_SENSORS_SMSC47M1 is not set
CONFIG_SENSORS_SMSC47M192=y
# CONFIG_SENSORS_SMSC47B397 is not set
# CONFIG_SENSORS_STTS751 is not set
CONFIG_SENSORS_SMM665=y
CONFIG_SENSORS_ADC128D818=y
CONFIG_SENSORS_ADS7828=y
CONFIG_SENSORS_AMC6821=y
CONFIG_SENSORS_INA209=y
CONFIG_SENSORS_INA2XX=y
CONFIG_SENSORS_INA238=y
# CONFIG_SENSORS_INA3221 is not set
CONFIG_SENSORS_TC74=y
CONFIG_SENSORS_THMC50=y
# CONFIG_SENSORS_TMP102 is not set
CONFIG_SENSORS_TMP103=y
CONFIG_SENSORS_TMP108=y
CONFIG_SENSORS_TMP401=y
# CONFIG_SENSORS_TMP421 is not set
# CONFIG_SENSORS_TMP464 is not set
CONFIG_SENSORS_TMP513=y
# CONFIG_SENSORS_VIA_CPUTEMP is not set
# CONFIG_SENSORS_VIA686A is not set
# CONFIG_SENSORS_VT1211 is not set
# CONFIG_SENSORS_VT8231 is not set
CONFIG_SENSORS_W83773G=y
# CONFIG_SENSORS_W83781D is not set
# CONFIG_SENSORS_W83791D is not set
CONFIG_SENSORS_W83792D=y
# CONFIG_SENSORS_W83793 is not set
CONFIG_SENSORS_W83795=y
CONFIG_SENSORS_W83795_FANCTRL=y
# CONFIG_SENSORS_W83L785TS is not set
CONFIG_SENSORS_W83L786NG=y
# CONFIG_SENSORS_W83627HF is not set
# CONFIG_SENSORS_W83627EHF is not set
# CONFIG_SENSORS_WM831X is not set
CONFIG_SENSORS_WM8350=y

#
# ACPI drivers
#
CONFIG_SENSORS_ACPI_POWER=y
CONFIG_SENSORS_ATK0110=y
# CONFIG_SENSORS_ASUS_EC is not set
# CONFIG_THERMAL is not set
# CONFIG_WATCHDOG is not set
CONFIG_SSB_POSSIBLE=y
CONFIG_SSB=y
CONFIG_SSB_SPROM=y
CONFIG_SSB_PCIHOST_POSSIBLE=y
CONFIG_SSB_PCIHOST=y
CONFIG_SSB_SDIOHOST_POSSIBLE=y
# CONFIG_SSB_SDIOHOST is not set
CONFIG_SSB_DRIVER_PCICORE_POSSIBLE=y
# CONFIG_SSB_DRIVER_PCICORE is not set
CONFIG_SSB_DRIVER_GPIO=y
CONFIG_BCMA_POSSIBLE=y
CONFIG_BCMA=y
CONFIG_BCMA_HOST_PCI_POSSIBLE=y
CONFIG_BCMA_HOST_PCI=y
CONFIG_BCMA_HOST_SOC=y
CONFIG_BCMA_DRIVER_PCI=y
# CONFIG_BCMA_SFLASH is not set
# CONFIG_BCMA_DRIVER_GMAC_CMN is not set
CONFIG_BCMA_DRIVER_GPIO=y
# CONFIG_BCMA_DEBUG is not set

#
# Multifunction device drivers
#
CONFIG_MFD_CORE=y
CONFIG_MFD_AS3711=y
# CONFIG_PMIC_ADP5520 is not set
# CONFIG_MFD_AAT2870_CORE is not set
# CONFIG_MFD_BCM590XX is not set
CONFIG_MFD_BD9571MWV=y
CONFIG_MFD_AXP20X=y
CONFIG_MFD_AXP20X_I2C=y
# CONFIG_MFD_CROS_EC_DEV is not set
CONFIG_MFD_MADERA=y
# CONFIG_MFD_MADERA_I2C is not set
# CONFIG_MFD_CS47L15 is not set
CONFIG_MFD_CS47L35=y
CONFIG_MFD_CS47L85=y
CONFIG_MFD_CS47L90=y
# CONFIG_MFD_CS47L92 is not set
CONFIG_PMIC_DA903X=y
# CONFIG_MFD_DA9052_I2C is not set
CONFIG_MFD_DA9055=y
CONFIG_MFD_DA9062=y
CONFIG_MFD_DA9063=y
CONFIG_MFD_DA9150=y
CONFIG_MFD_MC13XXX=y
CONFIG_MFD_MC13XXX_I2C=y
# CONFIG_MFD_MP2629 is not set
CONFIG_HTC_PASIC3=y
CONFIG_HTC_I2CPLD=y
# CONFIG_MFD_INTEL_QUARK_I2C_GPIO is not set
# CONFIG_LPC_ICH is not set
# CONFIG_LPC_SCH is not set
# CONFIG_INTEL_SOC_PMIC_MRFLD is not set
CONFIG_MFD_INTEL_LPSS=y
CONFIG_MFD_INTEL_LPSS_ACPI=y
# CONFIG_MFD_INTEL_LPSS_PCI is not set
# CONFIG_MFD_INTEL_PMC_BXT is not set
# CONFIG_MFD_IQS62X is not set
# CONFIG_MFD_JANZ_CMODIO is not set
# CONFIG_MFD_KEMPLD is not set
# CONFIG_MFD_88PM800 is not set
CONFIG_MFD_88PM805=y
CONFIG_MFD_88PM860X=y
CONFIG_MFD_MAX14577=y
# CONFIG_MFD_MAX77693 is not set
# CONFIG_MFD_MAX77843 is not set
CONFIG_MFD_MAX8907=y
# CONFIG_MFD_MAX8925 is not set
CONFIG_MFD_MAX8997=y
# CONFIG_MFD_MAX8998 is not set
CONFIG_MFD_MT6360=y
# CONFIG_MFD_MT6370 is not set
CONFIG_MFD_MT6397=y
# CONFIG_MFD_MENF21BMC is not set
CONFIG_MFD_RETU=y
# CONFIG_MFD_PCF50633 is not set
# CONFIG_MFD_SY7636A is not set
# CONFIG_MFD_RDC321X is not set
CONFIG_MFD_RT4831=y
CONFIG_MFD_RT5033=y
# CONFIG_MFD_RT5120 is not set
# CONFIG_MFD_RC5T583 is not set
CONFIG_MFD_SI476X_CORE=y
CONFIG_MFD_SM501=y
# CONFIG_MFD_SM501_GPIO is not set
# CONFIG_MFD_SKY81452 is not set
CONFIG_MFD_SYSCON=y
CONFIG_MFD_TI_AM335X_TSCADC=y
CONFIG_MFD_LP3943=y
CONFIG_MFD_LP8788=y
CONFIG_MFD_TI_LMU=y
CONFIG_MFD_PALMAS=y
CONFIG_TPS6105X=y
# CONFIG_TPS65010 is not set
# CONFIG_TPS6507X is not set
CONFIG_MFD_TPS65086=y
CONFIG_MFD_TPS65090=y
CONFIG_MFD_TI_LP873X=y
CONFIG_MFD_TPS6586X=y
CONFIG_MFD_TPS65910=y
# CONFIG_MFD_TPS65912_I2C is not set
# CONFIG_TWL4030_CORE is not set
# CONFIG_TWL6040_CORE is not set
CONFIG_MFD_WL1273_CORE=y
CONFIG_MFD_LM3533=y
CONFIG_MFD_TQMX86=y
# CONFIG_MFD_VX855 is not set
CONFIG_MFD_ARIZONA=y
CONFIG_MFD_ARIZONA_I2C=y
# CONFIG_MFD_CS47L24 is not set
# CONFIG_MFD_WM5102 is not set
# CONFIG_MFD_WM5110 is not set
# CONFIG_MFD_WM8997 is not set
CONFIG_MFD_WM8998=y
CONFIG_MFD_WM8400=y
CONFIG_MFD_WM831X=y
CONFIG_MFD_WM831X_I2C=y
CONFIG_MFD_WM8350=y
CONFIG_MFD_WM8350_I2C=y
CONFIG_MFD_WM8994=y
# CONFIG_MFD_WCD934X is not set
CONFIG_MFD_ATC260X=y
CONFIG_MFD_ATC260X_I2C=y
CONFIG_RAVE_SP_CORE=y
# end of Multifunction device drivers

CONFIG_REGULATOR=y
CONFIG_REGULATOR_DEBUG=y
CONFIG_REGULATOR_FIXED_VOLTAGE=y
CONFIG_REGULATOR_VIRTUAL_CONSUMER=y
# CONFIG_REGULATOR_USERSPACE_CONSUMER is not set
CONFIG_REGULATOR_88PG86X=y
CONFIG_REGULATOR_88PM8607=y
CONFIG_REGULATOR_ACT8865=y
# CONFIG_REGULATOR_AD5398 is not set
CONFIG_REGULATOR_AS3711=y
CONFIG_REGULATOR_ATC260X=y
CONFIG_REGULATOR_AXP20X=y
# CONFIG_REGULATOR_BD9571MWV is not set
CONFIG_REGULATOR_DA903X=y
CONFIG_REGULATOR_DA9055=y
CONFIG_REGULATOR_DA9062=y
# CONFIG_REGULATOR_DA9210 is not set
# CONFIG_REGULATOR_DA9211 is not set
CONFIG_REGULATOR_FAN53555=y
CONFIG_REGULATOR_GPIO=y
CONFIG_REGULATOR_ISL9305=y
# CONFIG_REGULATOR_ISL6271A is not set
CONFIG_REGULATOR_LM363X=y
CONFIG_REGULATOR_LP3971=y
# CONFIG_REGULATOR_LP3972 is not set
CONFIG_REGULATOR_LP872X=y
CONFIG_REGULATOR_LP8755=y
CONFIG_REGULATOR_LP8788=y
# CONFIG_REGULATOR_LTC3589 is not set
CONFIG_REGULATOR_LTC3676=y
CONFIG_REGULATOR_MAX14577=y
CONFIG_REGULATOR_MAX1586=y
# CONFIG_REGULATOR_MAX8649 is not set
CONFIG_REGULATOR_MAX8660=y
# CONFIG_REGULATOR_MAX8893 is not set
CONFIG_REGULATOR_MAX8907=y
CONFIG_REGULATOR_MAX8952=y
# CONFIG_REGULATOR_MAX8997 is not set
# CONFIG_REGULATOR_MAX20086 is not set
# CONFIG_REGULATOR_MAX77826 is not set
# CONFIG_REGULATOR_MC13783 is not set
# CONFIG_REGULATOR_MC13892 is not set
CONFIG_REGULATOR_MP8859=y
CONFIG_REGULATOR_MT6311=y
# CONFIG_REGULATOR_MT6315 is not set
CONFIG_REGULATOR_MT6323=y
# CONFIG_REGULATOR_MT6331 is not set
# CONFIG_REGULATOR_MT6332 is not set
# CONFIG_REGULATOR_MT6357 is not set
CONFIG_REGULATOR_MT6358=y
CONFIG_REGULATOR_MT6359=y
# CONFIG_REGULATOR_MT6360 is not set
# CONFIG_REGULATOR_MT6397 is not set
CONFIG_REGULATOR_PALMAS=y
CONFIG_REGULATOR_PCA9450=y
CONFIG_REGULATOR_PV88060=y
CONFIG_REGULATOR_PV88080=y
# CONFIG_REGULATOR_PV88090 is not set
CONFIG_REGULATOR_QCOM_SPMI=y
CONFIG_REGULATOR_QCOM_USB_VBUS=y
CONFIG_REGULATOR_RT4801=y
CONFIG_REGULATOR_RT4831=y
CONFIG_REGULATOR_RT5033=y
# CONFIG_REGULATOR_RT5190A is not set
# CONFIG_REGULATOR_RT5759 is not set
CONFIG_REGULATOR_RT6160=y
# CONFIG_REGULATOR_RT6190 is not set
CONFIG_REGULATOR_RT6245=y
CONFIG_REGULATOR_RTQ2134=y
# CONFIG_REGULATOR_RTMV20 is not set
CONFIG_REGULATOR_RTQ6752=y
CONFIG_REGULATOR_SLG51000=y
# CONFIG_REGULATOR_TPS51632 is not set
CONFIG_REGULATOR_TPS6105X=y
CONFIG_REGULATOR_TPS62360=y
CONFIG_REGULATOR_TPS65023=y
# CONFIG_REGULATOR_TPS6507X is not set
CONFIG_REGULATOR_TPS65086=y
CONFIG_REGULATOR_TPS65090=y
CONFIG_REGULATOR_TPS65132=y
CONFIG_REGULATOR_TPS6586X=y
CONFIG_REGULATOR_TPS65910=y
# CONFIG_REGULATOR_TPS68470 is not set
CONFIG_REGULATOR_WM831X=y
# CONFIG_REGULATOR_WM8350 is not set
# CONFIG_REGULATOR_WM8400 is not set
CONFIG_REGULATOR_WM8994=y
CONFIG_REGULATOR_QCOM_LABIBB=y
# CONFIG_RC_CORE is not set
CONFIG_CEC_CORE=y
CONFIG_CEC_NOTIFIER=y

#
# CEC support
#
CONFIG_MEDIA_CEC_SUPPORT=y
# CONFIG_CEC_CH7322 is not set
CONFIG_CEC_CROS_EC=y
# CONFIG_CEC_SECO is not set
# CONFIG_USB_PULSE8_CEC is not set
# CONFIG_USB_RAINSHADOW_CEC is not set
# end of CEC support

# CONFIG_MEDIA_SUPPORT is not set

#
# Graphics support
#
CONFIG_APERTURE_HELPERS=y
CONFIG_VIDEO_NOMODESET=y
# CONFIG_AGP is not set
# CONFIG_VGA_SWITCHEROO is not set
CONFIG_DRM=y
CONFIG_DRM_DEBUG_MM=y
CONFIG_DRM_KMS_HELPER=y
CONFIG_DRM_DEBUG_DP_MST_TOPOLOGY_REFS=y
# CONFIG_DRM_DEBUG_MODESET_LOCK is not set
CONFIG_DRM_LOAD_EDID_FIRMWARE=y
CONFIG_DRM_DISPLAY_HELPER=y
CONFIG_DRM_DISPLAY_DP_HELPER=y
# CONFIG_DRM_DP_AUX_CHARDEV is not set
CONFIG_DRM_DP_CEC=y
CONFIG_DRM_GEM_SHMEM_HELPER=y
CONFIG_DRM_SCHED=y

#
# I2C encoder or helper chips
#
CONFIG_DRM_I2C_CH7006=y
CONFIG_DRM_I2C_SIL164=y
# CONFIG_DRM_I2C_NXP_TDA998X is not set
CONFIG_DRM_I2C_NXP_TDA9950=y
# end of I2C encoder or helper chips

#
# ARM devices
#
# end of ARM devices

# CONFIG_DRM_RADEON is not set
# CONFIG_DRM_AMDGPU is not set
# CONFIG_DRM_NOUVEAU is not set
# CONFIG_DRM_I915 is not set
CONFIG_DRM_VGEM=y
CONFIG_DRM_VKMS=y
# CONFIG_DRM_VMWGFX is not set
# CONFIG_DRM_GMA500 is not set
# CONFIG_DRM_AST is not set
# CONFIG_DRM_MGAG200 is not set
# CONFIG_DRM_QXL is not set
CONFIG_DRM_PANEL=y

#
# Display Panels
#
# end of Display Panels

CONFIG_DRM_BRIDGE=y
CONFIG_DRM_PANEL_BRIDGE=y

#
# Display Interface Bridges
#
CONFIG_DRM_ANALOGIX_ANX78XX=y
CONFIG_DRM_ANALOGIX_DP=y
# end of Display Interface Bridges

CONFIG_DRM_ETNAVIV=y
# CONFIG_DRM_ETNAVIV_THERMAL is not set
# CONFIG_DRM_BOCHS is not set
# CONFIG_DRM_CIRRUS_QEMU is not set
CONFIG_DRM_SIMPLEDRM=y
# CONFIG_DRM_VBOXVIDEO is not set
# CONFIG_DRM_SSD130X is not set
CONFIG_DRM_LEGACY=y
# CONFIG_DRM_TDFX is not set
# CONFIG_DRM_R128 is not set
# CONFIG_DRM_MGA is not set
# CONFIG_DRM_VIA is not set
# CONFIG_DRM_SAVAGE is not set
CONFIG_DRM_PANEL_ORIENTATION_QUIRKS=y

#
# Frame buffer Devices
#
CONFIG_FB_CMDLINE=y
# CONFIG_FB is not set
# end of Frame buffer Devices

#
# Backlight & LCD device support
#
CONFIG_LCD_CLASS_DEVICE=y
CONFIG_LCD_PLATFORM=y
CONFIG_BACKLIGHT_CLASS_DEVICE=y
CONFIG_BACKLIGHT_KTD253=y
# CONFIG_BACKLIGHT_LM3533 is not set
CONFIG_BACKLIGHT_DA903X=y
# CONFIG_BACKLIGHT_APPLE is not set
CONFIG_BACKLIGHT_QCOM_WLED=y
CONFIG_BACKLIGHT_RT4831=y
# CONFIG_BACKLIGHT_SAHARA is not set
# CONFIG_BACKLIGHT_WM831X is not set
CONFIG_BACKLIGHT_ADP8860=y
# CONFIG_BACKLIGHT_ADP8870 is not set
CONFIG_BACKLIGHT_88PM860X=y
CONFIG_BACKLIGHT_LM3639=y
CONFIG_BACKLIGHT_AS3711=y
CONFIG_BACKLIGHT_GPIO=y
# CONFIG_BACKLIGHT_LV5207LP is not set
CONFIG_BACKLIGHT_BD6107=y
# CONFIG_BACKLIGHT_ARCXCNN is not set
CONFIG_BACKLIGHT_RAVE_SP=y
# end of Backlight & LCD device support

CONFIG_HDMI=y
# end of Graphics support

# CONFIG_DRM_ACCEL is not set
# CONFIG_SOUND is not set

#
# HID support
#
CONFIG_HID=y
# CONFIG_HID_BATTERY_STRENGTH is not set
# CONFIG_HIDRAW is not set
# CONFIG_UHID is not set
CONFIG_HID_GENERIC=y

#
# Special HID drivers
#
# CONFIG_HID_A4TECH is not set
# CONFIG_HID_ACRUX is not set
# CONFIG_HID_APPLE is not set
# CONFIG_HID_AUREAL is not set
# CONFIG_HID_BELKIN is not set
# CONFIG_HID_CHERRY is not set
# CONFIG_HID_COUGAR is not set
# CONFIG_HID_MACALLY is not set
# CONFIG_HID_CMEDIA is not set
# CONFIG_HID_CYPRESS is not set
# CONFIG_HID_DRAGONRISE is not set
# CONFIG_HID_EMS_FF is not set
# CONFIG_HID_ELECOM is not set
# CONFIG_HID_EZKEY is not set
# CONFIG_HID_GEMBIRD is not set
# CONFIG_HID_GFRM is not set
# CONFIG_HID_GLORIOUS is not set
# CONFIG_HID_VIVALDI is not set
# CONFIG_HID_KEYTOUCH is not set
# CONFIG_HID_KYE is not set
# CONFIG_HID_WALTOP is not set
# CONFIG_HID_VIEWSONIC is not set
# CONFIG_HID_VRC2 is not set
# CONFIG_HID_XIAOMI is not set
# CONFIG_HID_GYRATION is not set
# CONFIG_HID_ICADE is not set
# CONFIG_HID_ITE is not set
# CONFIG_HID_JABRA is not set
# CONFIG_HID_TWINHAN is not set
# CONFIG_HID_KENSINGTON is not set
# CONFIG_HID_LCPOWER is not set
# CONFIG_HID_LED is not set
# CONFIG_HID_LENOVO is not set
# CONFIG_HID_MAGICMOUSE is not set
# CONFIG_HID_MALTRON is not set
# CONFIG_HID_MAYFLASH is not set
# CONFIG_HID_REDRAGON is not set
# CONFIG_HID_MICROSOFT is not set
# CONFIG_HID_MONTEREY is not set
# CONFIG_HID_MULTITOUCH is not set
# CONFIG_HID_NINTENDO is not set
# CONFIG_HID_NTI is not set
# CONFIG_HID_ORTEK is not set
# CONFIG_HID_PANTHERLORD is not set
# CONFIG_HID_PETALYNX is not set
# CONFIG_HID_PICOLCD is not set
# CONFIG_HID_PLANTRONICS is not set
# CONFIG_HID_PLAYSTATION is not set
# CONFIG_HID_PXRC is not set
# CONFIG_HID_RAZER is not set
# CONFIG_HID_PRIMAX is not set
# CONFIG_HID_SAITEK is not set
# CONFIG_HID_SEMITEK is not set
# CONFIG_HID_SPEEDLINK is not set
# CONFIG_HID_STEAM is not set
# CONFIG_HID_STEELSERIES is not set
# CONFIG_HID_SUNPLUS is not set
# CONFIG_HID_RMI is not set
# CONFIG_HID_GREENASIA is not set
# CONFIG_HID_SMARTJOYPLUS is not set
# CONFIG_HID_TIVO is not set
# CONFIG_HID_TOPSEED is not set
# CONFIG_HID_TOPRE is not set
# CONFIG_HID_THINGM is not set
# CONFIG_HID_UDRAW_PS3 is not set
# CONFIG_HID_WIIMOTE is not set
# CONFIG_HID_XINMO is not set
# CONFIG_HID_ZEROPLUS is not set
# CONFIG_HID_ZYDACRON is not set
# CONFIG_HID_SENSOR_HUB is not set
# CONFIG_HID_ALPS is not set
# end of Special HID drivers

#
# I2C HID support
#
# CONFIG_I2C_HID_ACPI is not set
# end of I2C HID support

#
# Intel ISH HID support
#
# CONFIG_INTEL_ISH_HID is not set
# end of Intel ISH HID support

#
# AMD SFH HID Support
#
# CONFIG_AMD_SFH_HID is not set
# end of AMD SFH HID Support

#
# Surface System Aggregator Module HID support
#
# CONFIG_SURFACE_KBD is not set
# end of Surface System Aggregator Module HID support
# end of HID support

CONFIG_USB_OHCI_LITTLE_ENDIAN=y
CONFIG_USB_SUPPORT=y
CONFIG_USB_COMMON=y
# CONFIG_USB_LED_TRIG is not set
CONFIG_USB_ULPI_BUS=y
CONFIG_USB_CONN_GPIO=y
CONFIG_USB_ARCH_HAS_HCD=y
# CONFIG_USB is not set
CONFIG_USB_PCI=y
CONFIG_USB_CDNS_SUPPORT=y
# CONFIG_USB_CDNS3 is not set
# CONFIG_USB_CDNSP_PCI is not set
CONFIG_USB_MUSB_HDRC=y
CONFIG_USB_MUSB_GADGET=y

#
# Platform Glue Layer
#

#
# MUSB DMA mode
#
# CONFIG_MUSB_PIO_ONLY is not set
CONFIG_USB_DWC3=y
CONFIG_USB_DWC3_ULPI=y
CONFIG_USB_DWC3_GADGET=y

#
# Platform Glue Driver Support
#
CONFIG_USB_DWC3_PCI=y
CONFIG_USB_DWC3_HAPS=y
CONFIG_USB_DWC2=y

#
# Gadget/Dual-role mode requires USB Gadget support to be enabled
#
CONFIG_USB_DWC2_PERIPHERAL=y
# CONFIG_USB_DWC2_PCI is not set
CONFIG_USB_DWC2_DEBUG=y
# CONFIG_USB_DWC2_VERBOSE is not set
# CONFIG_USB_DWC2_TRACK_MISSED_SOFS is not set
# CONFIG_USB_DWC2_DEBUG_PERIODIC is not set
CONFIG_USB_CHIPIDEA=y
CONFIG_USB_CHIPIDEA_UDC=y
CONFIG_USB_CHIPIDEA_PCI=y
CONFIG_USB_CHIPIDEA_MSM=y
CONFIG_USB_CHIPIDEA_GENERIC=y
CONFIG_USB_ISP1760=y
CONFIG_USB_ISP1761_UDC=y
CONFIG_USB_ISP1760_GADGET_ROLE=y

#
# USB port drivers
#

#
# USB Physical Layer drivers
#
CONFIG_USB_PHY=y
CONFIG_NOP_USB_XCEIV=y
CONFIG_USB_GPIO_VBUS=y
CONFIG_TAHVO_USB=y
# CONFIG_TAHVO_USB_HOST_BY_DEFAULT is not set
CONFIG_USB_ISP1301=y
# end of USB Physical Layer drivers

CONFIG_USB_GADGET=y
# CONFIG_USB_GADGET_DEBUG is not set
# CONFIG_USB_GADGET_DEBUG_FILES is not set
CONFIG_USB_GADGET_DEBUG_FS=y
CONFIG_USB_GADGET_VBUS_DRAW=2
CONFIG_USB_GADGET_STORAGE_NUM_BUFFERS=2

#
# USB Peripheral Controller
#
# CONFIG_USB_FOTG210_UDC is not set
CONFIG_USB_GR_UDC=y
# CONFIG_USB_R8A66597 is not set
CONFIG_USB_PXA27X=y
CONFIG_USB_MV_UDC=y
CONFIG_USB_MV_U3D=y
CONFIG_USB_M66592=y
CONFIG_USB_BDC_UDC=y
# CONFIG_USB_AMD5536UDC is not set
CONFIG_USB_NET2272=y
# CONFIG_USB_NET2272_DMA is not set
# CONFIG_USB_NET2280 is not set
# CONFIG_USB_GOKU is not set
# CONFIG_USB_EG20T is not set
# end of USB Peripheral Controller

CONFIG_USB_LIBCOMPOSITE=y
CONFIG_USB_F_FS=y
CONFIG_USB_F_HID=y
CONFIG_USB_F_PRINTER=y
# CONFIG_USB_CONFIGFS is not set

#
# USB Gadget precomposed configurations
#
# CONFIG_USB_ZERO is not set
# CONFIG_USB_ETH is not set
# CONFIG_USB_G_NCM is not set
# CONFIG_USB_GADGETFS is not set
CONFIG_USB_FUNCTIONFS=y
# CONFIG_USB_FUNCTIONFS_ETH is not set
# CONFIG_USB_FUNCTIONFS_RNDIS is not set
CONFIG_USB_FUNCTIONFS_GENERIC=y
# CONFIG_USB_G_SERIAL is not set
CONFIG_USB_G_PRINTER=y
# CONFIG_USB_CDC_COMPOSITE is not set
CONFIG_USB_G_HID=y
# CONFIG_USB_G_DBGP is not set
CONFIG_USB_RAW_GADGET=y
# end of USB Gadget precomposed configurations

# CONFIG_TYPEC is not set
CONFIG_USB_ROLE_SWITCH=y
CONFIG_USB_ROLES_INTEL_XHCI=y
CONFIG_MMC=y
# CONFIG_SDIO_UART is not set
# CONFIG_MMC_TEST is not set

#
# MMC/SD/SDIO Host Controller Drivers
#
# CONFIG_MMC_DEBUG is not set
CONFIG_MMC_SDHCI=y
# CONFIG_MMC_SDHCI_PCI is not set
# CONFIG_MMC_SDHCI_ACPI is not set
CONFIG_MMC_SDHCI_PLTFM=y
CONFIG_MMC_SDHCI_F_SDH30=y
# CONFIG_MMC_TIFM_SD is not set
# CONFIG_MMC_CB710 is not set
# CONFIG_MMC_VIA_SDMMC is not set
CONFIG_MMC_USDHI6ROL0=y
CONFIG_MMC_CQHCI=y
# CONFIG_MMC_HSQ is not set
# CONFIG_MMC_TOSHIBA_PCI is not set
# CONFIG_MMC_MTK is not set
# CONFIG_MMC_SDHCI_XENON is not set
# CONFIG_MEMSTICK is not set
CONFIG_NEW_LEDS=y
CONFIG_LEDS_CLASS=y
CONFIG_LEDS_CLASS_FLASH=y
CONFIG_LEDS_CLASS_MULTICOLOR=y
CONFIG_LEDS_BRIGHTNESS_HW_CHANGED=y

#
# LED drivers
#
CONFIG_LEDS_88PM860X=y
# CONFIG_LEDS_APU is not set
CONFIG_LEDS_LM3530=y
CONFIG_LEDS_LM3532=y
CONFIG_LEDS_LM3533=y
# CONFIG_LEDS_LM3642 is not set
CONFIG_LEDS_MT6323=y
# CONFIG_LEDS_PCA9532 is not set
CONFIG_LEDS_GPIO=y
CONFIG_LEDS_LP3944=y
# CONFIG_LEDS_LP3952 is not set
CONFIG_LEDS_LP50XX=y
# CONFIG_LEDS_LP8788 is not set
CONFIG_LEDS_PCA955X=y
CONFIG_LEDS_PCA955X_GPIO=y
# CONFIG_LEDS_PCA963X is not set
CONFIG_LEDS_WM831X_STATUS=y
CONFIG_LEDS_WM8350=y
# CONFIG_LEDS_DA903X is not set
CONFIG_LEDS_REGULATOR=y
CONFIG_LEDS_BD2802=y
# CONFIG_LEDS_INTEL_SS4200 is not set
CONFIG_LEDS_LT3593=y
# CONFIG_LEDS_MC13783 is not set
# CONFIG_LEDS_TCA6507 is not set
CONFIG_LEDS_TLC591XX=y
CONFIG_LEDS_MAX8997=y
CONFIG_LEDS_LM355x=y
# CONFIG_LEDS_IS31FL319X is not set

#
# LED driver for blink(1) USB RGB LED is under Special HID drivers (HID_THINGM)
#
CONFIG_LEDS_BLINKM=y
# CONFIG_LEDS_MLXCPLD is not set
CONFIG_LEDS_MLXREG=y
CONFIG_LEDS_USER=y
CONFIG_LEDS_NIC78BX=y
CONFIG_LEDS_TI_LMU_COMMON=y
# CONFIG_LEDS_LM36274 is not set
# CONFIG_LEDS_TPS6105X is not set

#
# Flash and Torch LED drivers
#
CONFIG_LEDS_AS3645A=y
CONFIG_LEDS_LM3601X=y
CONFIG_LEDS_RT8515=y
CONFIG_LEDS_SGM3140=y

#
# RGB LED drivers
#

#
# LED Triggers
#
CONFIG_LEDS_TRIGGERS=y
CONFIG_LEDS_TRIGGER_TIMER=y
CONFIG_LEDS_TRIGGER_ONESHOT=y
# CONFIG_LEDS_TRIGGER_MTD is not set
CONFIG_LEDS_TRIGGER_HEARTBEAT=y
CONFIG_LEDS_TRIGGER_BACKLIGHT=y
# CONFIG_LEDS_TRIGGER_CPU is not set
CONFIG_LEDS_TRIGGER_ACTIVITY=y
CONFIG_LEDS_TRIGGER_GPIO=y
CONFIG_LEDS_TRIGGER_DEFAULT_ON=y

#
# iptables trigger is under Netfilter config (LED target)
#
CONFIG_LEDS_TRIGGER_TRANSIENT=y
CONFIG_LEDS_TRIGGER_CAMERA=y
# CONFIG_LEDS_TRIGGER_PANIC is not set
# CONFIG_LEDS_TRIGGER_NETDEV is not set
# CONFIG_LEDS_TRIGGER_PATTERN is not set
CONFIG_LEDS_TRIGGER_AUDIO=y
# CONFIG_LEDS_TRIGGER_TTY is not set

#
# Simple LED drivers
#
# CONFIG_ACCESSIBILITY is not set
# CONFIG_INFINIBAND is not set
CONFIG_EDAC_ATOMIC_SCRUB=y
CONFIG_EDAC_SUPPORT=y
CONFIG_EDAC=y
# CONFIG_EDAC_LEGACY_SYSFS is not set
CONFIG_EDAC_DEBUG=y
CONFIG_EDAC_GHES=y
# CONFIG_EDAC_E752X is not set
# CONFIG_EDAC_I82975X is not set
# CONFIG_EDAC_I3000 is not set
# CONFIG_EDAC_I3200 is not set
# CONFIG_EDAC_IE31200 is not set
# CONFIG_EDAC_X38 is not set
# CONFIG_EDAC_I5400 is not set
# CONFIG_EDAC_I5100 is not set
# CONFIG_EDAC_I7300 is not set
CONFIG_RTC_LIB=y
CONFIG_RTC_MC146818_LIB=y
CONFIG_RTC_CLASS=y
CONFIG_RTC_HCTOSYS=y
CONFIG_RTC_HCTOSYS_DEVICE="rtc0"
CONFIG_RTC_SYSTOHC=y
CONFIG_RTC_SYSTOHC_DEVICE="rtc0"
CONFIG_RTC_DEBUG=y
# CONFIG_RTC_NVMEM is not set

#
# RTC interfaces
#
CONFIG_RTC_INTF_SYSFS=y
CONFIG_RTC_INTF_PROC=y
# CONFIG_RTC_INTF_DEV is not set
# CONFIG_RTC_DRV_TEST is not set

#
# I2C RTC drivers
#
CONFIG_RTC_DRV_88PM860X=y
CONFIG_RTC_DRV_ABB5ZES3=y
CONFIG_RTC_DRV_ABEOZ9=y
CONFIG_RTC_DRV_ABX80X=y
CONFIG_RTC_DRV_DS1307=y
# CONFIG_RTC_DRV_DS1307_CENTURY is not set
CONFIG_RTC_DRV_DS1374=y
CONFIG_RTC_DRV_DS1672=y
# CONFIG_RTC_DRV_LP8788 is not set
# CONFIG_RTC_DRV_MAX6900 is not set
CONFIG_RTC_DRV_MAX8907=y
CONFIG_RTC_DRV_MAX8997=y
CONFIG_RTC_DRV_RS5C372=y
CONFIG_RTC_DRV_ISL1208=y
# CONFIG_RTC_DRV_ISL12022 is not set
CONFIG_RTC_DRV_X1205=y
CONFIG_RTC_DRV_PCF8523=y
# CONFIG_RTC_DRV_PCF85063 is not set
CONFIG_RTC_DRV_PCF85363=y
CONFIG_RTC_DRV_PCF8563=y
CONFIG_RTC_DRV_PCF8583=y
CONFIG_RTC_DRV_M41T80=y
CONFIG_RTC_DRV_M41T80_WDT=y
CONFIG_RTC_DRV_BQ32K=y
# CONFIG_RTC_DRV_PALMAS is not set
CONFIG_RTC_DRV_TPS6586X=y
# CONFIG_RTC_DRV_TPS65910 is not set
CONFIG_RTC_DRV_S35390A=y
# CONFIG_RTC_DRV_FM3130 is not set
# CONFIG_RTC_DRV_RX8010 is not set
# CONFIG_RTC_DRV_RX8581 is not set
# CONFIG_RTC_DRV_RX8025 is not set
CONFIG_RTC_DRV_EM3027=y
CONFIG_RTC_DRV_RV3028=y
# CONFIG_RTC_DRV_RV3032 is not set
# CONFIG_RTC_DRV_RV8803 is not set
# CONFIG_RTC_DRV_SD3078 is not set

#
# SPI RTC drivers
#
CONFIG_RTC_I2C_AND_SPI=y

#
# SPI and I2C RTC drivers
#
CONFIG_RTC_DRV_DS3232=y
CONFIG_RTC_DRV_DS3232_HWMON=y
# CONFIG_RTC_DRV_PCF2127 is not set
CONFIG_RTC_DRV_RV3029C2=y
CONFIG_RTC_DRV_RV3029_HWMON=y
# CONFIG_RTC_DRV_RX6110 is not set

#
# Platform RTC drivers
#
CONFIG_RTC_DRV_CMOS=y
CONFIG_RTC_DRV_DS1286=y
CONFIG_RTC_DRV_DS1511=y
CONFIG_RTC_DRV_DS1553=y
CONFIG_RTC_DRV_DS1685_FAMILY=y
# CONFIG_RTC_DRV_DS1685 is not set
# CONFIG_RTC_DRV_DS1689 is not set
# CONFIG_RTC_DRV_DS17285 is not set
# CONFIG_RTC_DRV_DS17485 is not set
CONFIG_RTC_DRV_DS17885=y
CONFIG_RTC_DRV_DS1742=y
CONFIG_RTC_DRV_DS2404=y
# CONFIG_RTC_DRV_DA9055 is not set
CONFIG_RTC_DRV_DA9063=y
CONFIG_RTC_DRV_STK17TA8=y
# CONFIG_RTC_DRV_M48T86 is not set
CONFIG_RTC_DRV_M48T35=y
CONFIG_RTC_DRV_M48T59=y
# CONFIG_RTC_DRV_MSM6242 is not set
CONFIG_RTC_DRV_BQ4802=y
# CONFIG_RTC_DRV_RP5C01 is not set
# CONFIG_RTC_DRV_V3020 is not set
# CONFIG_RTC_DRV_WM831X is not set
CONFIG_RTC_DRV_WM8350=y
# CONFIG_RTC_DRV_CROS_EC is not set

#
# on-CPU RTC drivers
#
CONFIG_RTC_DRV_FTRTC010=y
CONFIG_RTC_DRV_MC13XXX=y
# CONFIG_RTC_DRV_MT6397 is not set

#
# HID Sensor RTC drivers
#
CONFIG_RTC_DRV_GOLDFISH=y
CONFIG_DMADEVICES=y
# CONFIG_DMADEVICES_DEBUG is not set

#
# DMA Devices
#
CONFIG_DMA_ENGINE=y
CONFIG_DMA_VIRTUAL_CHANNELS=y
CONFIG_DMA_ACPI=y
# CONFIG_ALTERA_MSGDMA is not set
CONFIG_INTEL_IDMA64=y
# CONFIG_INTEL_IDXD_COMPAT is not set
# CONFIG_INTEL_IOATDMA is not set
# CONFIG_PLX_DMA is not set
# CONFIG_AMD_PTDMA is not set
# CONFIG_QCOM_HIDMA_MGMT is not set
# CONFIG_QCOM_HIDMA is not set
CONFIG_DW_DMAC_CORE=y
CONFIG_DW_DMAC=y
# CONFIG_DW_DMAC_PCI is not set
CONFIG_HSU_DMA=y
# CONFIG_SF_PDMA is not set
# CONFIG_INTEL_LDMA is not set

#
# DMA Clients
#
CONFIG_ASYNC_TX_DMA=y
CONFIG_DMATEST=y
CONFIG_DMA_ENGINE_RAID=y

#
# DMABUF options
#
CONFIG_SYNC_FILE=y
# CONFIG_SW_SYNC is not set
CONFIG_UDMABUF=y
# CONFIG_DMABUF_MOVE_NOTIFY is not set
CONFIG_DMABUF_DEBUG=y
# CONFIG_DMABUF_SELFTESTS is not set
CONFIG_DMABUF_HEAPS=y
CONFIG_DMABUF_SYSFS_STATS=y
CONFIG_DMABUF_HEAPS_SYSTEM=y
CONFIG_DMABUF_HEAPS_CMA=y
# end of DMABUF options

CONFIG_AUXDISPLAY=y
CONFIG_CHARLCD=y
CONFIG_LINEDISP=y
CONFIG_HD44780_COMMON=y
CONFIG_HD44780=y
CONFIG_IMG_ASCII_LCD=y
CONFIG_LCD2S=y
CONFIG_PARPORT_PANEL=y
CONFIG_PANEL_PARPORT=0
CONFIG_PANEL_PROFILE=5
CONFIG_PANEL_CHANGE_MESSAGE=y
CONFIG_PANEL_BOOT_MESSAGE=""
CONFIG_CHARLCD_BL_OFF=y
# CONFIG_CHARLCD_BL_ON is not set
# CONFIG_CHARLCD_BL_FLASH is not set
CONFIG_PANEL=y
# CONFIG_UIO is not set
# CONFIG_VFIO is not set
CONFIG_VIRT_DRIVERS=y
CONFIG_VMGENID=y
# CONFIG_VBOXGUEST is not set
# CONFIG_EFI_SECRET is not set
CONFIG_VIRTIO_ANCHOR=y
CONFIG_VIRTIO=y
# CONFIG_VIRTIO_MENU is not set
# CONFIG_VDPA is not set
CONFIG_VHOST_MENU=y
# CONFIG_VHOST_NET is not set
# CONFIG_VHOST_CROSS_ENDIAN_LEGACY is not set

#
# Microsoft Hyper-V guest support
#
# CONFIG_HYPERV is not set
# end of Microsoft Hyper-V guest support

# CONFIG_GREYBUS is not set
CONFIG_COMEDI=y
CONFIG_COMEDI_DEBUG=y
CONFIG_COMEDI_DEFAULT_BUF_SIZE_KB=2048
CONFIG_COMEDI_DEFAULT_BUF_MAXSIZE_KB=20480
# CONFIG_COMEDI_MISC_DRIVERS is not set
# CONFIG_COMEDI_ISA_DRIVERS is not set
# CONFIG_COMEDI_PCI_DRIVERS is not set
CONFIG_COMEDI_8255=y
CONFIG_COMEDI_8255_SA=y
# CONFIG_COMEDI_KCOMEDILIB is not set
# CONFIG_COMEDI_TESTS is not set
CONFIG_STAGING=y
# CONFIG_RTLLIB is not set

#
# IIO staging drivers
#

#
# Accelerometers
#
# end of Accelerometers

#
# Analog to digital converters
#
# end of Analog to digital converters

#
# Analog digital bi-direction converters
#
# CONFIG_ADT7316 is not set
# end of Analog digital bi-direction converters

#
# Direct Digital Synthesis
#
# end of Direct Digital Synthesis

#
# Network Analyzer, Impedance Converters
#
# CONFIG_AD5933 is not set
# end of Network Analyzer, Impedance Converters

#
# Active energy metering IC
#
CONFIG_ADE7854=y
CONFIG_ADE7854_I2C=y
# end of Active energy metering IC

#
# Resolver to digital converters
#
# end of Resolver to digital converters
# end of IIO staging drivers

# CONFIG_STAGING_MEDIA is not set
# CONFIG_MOST_COMPONENTS is not set
# CONFIG_KS7010 is not set
CONFIG_FIELDBUS_DEV=y
# CONFIG_QLGE is not set
# CONFIG_VME_BUS is not set
CONFIG_CHROME_PLATFORMS=y
# CONFIG_CHROMEOS_ACPI is not set
# CONFIG_CHROMEOS_LAPTOP is not set
CONFIG_CHROMEOS_PSTORE=y
# CONFIG_CHROMEOS_TBMC is not set
CONFIG_CROS_EC=y
# CONFIG_CROS_EC_I2C is not set
# CONFIG_CROS_EC_LPC is not set
CONFIG_CROS_EC_PROTO=y
CONFIG_CROS_KBD_LED_BACKLIGHT=y
# CONFIG_CROS_HPS_I2C is not set
# CONFIG_CHROMEOS_PRIVACY_SCREEN is not set
CONFIG_MELLANOX_PLATFORM=y
CONFIG_MLXREG_HOTPLUG=y
CONFIG_MLXREG_IO=y
CONFIG_MLXREG_LC=y
# CONFIG_NVSW_SN2201 is not set
CONFIG_SURFACE_PLATFORMS=y
CONFIG_SURFACE_3_POWER_OPREGION=y
CONFIG_SURFACE_ACPI_NOTIFY=y
CONFIG_SURFACE_AGGREGATOR_CDEV=y
# CONFIG_SURFACE_DTX is not set
# CONFIG_SURFACE_GPE is not set
CONFIG_SURFACE_HOTPLUG=y
# CONFIG_SURFACE_PRO3_BUTTON is not set
CONFIG_SURFACE_AGGREGATOR=y
# CONFIG_SURFACE_AGGREGATOR_BUS is not set
# CONFIG_SURFACE_AGGREGATOR_ERROR_INJECTION is not set
CONFIG_X86_PLATFORM_DEVICES=y
# CONFIG_ACPI_WMI is not set
# CONFIG_ACER_WIRELESS is not set
# CONFIG_AMD_PMF is not set
# CONFIG_AMD_PMC is not set
# CONFIG_ADV_SWBUTTON is not set
# CONFIG_APPLE_GMUX is not set
# CONFIG_ASUS_LAPTOP is not set
# CONFIG_ASUS_WIRELESS is not set
# CONFIG_ASUS_TF103C_DOCK is not set
CONFIG_X86_PLATFORM_DRIVERS_DELL=y
CONFIG_DCDBAS=y
CONFIG_DELL_LAPTOP=m
CONFIG_DELL_RBU=y
CONFIG_DELL_SMBIOS=y
# CONFIG_DELL_SMBIOS_SMM is not set
CONFIG_DELL_SMO8800=y
# CONFIG_FUJITSU_LAPTOP is not set
# CONFIG_FUJITSU_TABLET is not set
# CONFIG_X86_PLATFORM_DRIVERS_HP is not set
# CONFIG_WIRELESS_HOTKEY is not set
# CONFIG_IBM_RTL is not set
# CONFIG_SENSORS_HDAPS is not set
# CONFIG_THINKPAD_ACPI is not set
CONFIG_INTEL_ATOMISP2_PDX86=y
CONFIG_INTEL_ATOMISP2_LED=y
CONFIG_INTEL_SAR_INT1092=y
CONFIG_INTEL_SKL_INT3472=y
# CONFIG_INTEL_PMC_CORE is not set

#
# Intel Speed Select Technology interface support
#
# CONFIG_INTEL_SPEED_SELECT_INTERFACE is not set
# end of Intel Speed Select Technology interface support

#
# Intel Uncore Frequency Control
#
CONFIG_INTEL_UNCORE_FREQ_CONTROL=y
# end of Intel Uncore Frequency Control

# CONFIG_INTEL_HID_EVENT is not set
# CONFIG_INTEL_VBTN is not set
CONFIG_INTEL_INT0002_VGPIO=y
# CONFIG_INTEL_PUNIT_IPC is not set
# CONFIG_INTEL_RST is not set
CONFIG_INTEL_SMARTCONNECT=y
# CONFIG_INTEL_VSEC is not set
# CONFIG_PCENGINES_APU2 is not set
CONFIG_BARCO_P50_GPIO=y
# CONFIG_SAMSUNG_LAPTOP is not set
CONFIG_SAMSUNG_Q10=y
CONFIG_TOSHIBA_BT_RFKILL=y
CONFIG_TOSHIBA_HAPS=y
# CONFIG_ACPI_CMPC is not set
# CONFIG_PANASONIC_LAPTOP is not set
# CONFIG_SYSTEM76_ACPI is not set
# CONFIG_TOPSTAR_LAPTOP is not set
# CONFIG_MLX_PLATFORM is not set
# CONFIG_INTEL_IPS is not set
CONFIG_INTEL_SCU_IPC=y
CONFIG_INTEL_SCU=y
# CONFIG_INTEL_SCU_PCI is not set
CONFIG_INTEL_SCU_PLATFORM=y
# CONFIG_INTEL_SCU_IPC_UTIL is not set
# CONFIG_SIEMENS_SIMATIC_IPC is not set
# CONFIG_WINMATE_FM07_KEYS is not set
# CONFIG_P2SB is not set
CONFIG_HAVE_CLK=y
CONFIG_HAVE_CLK_PREPARE=y
CONFIG_COMMON_CLK=y
CONFIG_COMMON_CLK_WM831X=y
CONFIG_COMMON_CLK_MAX9485=y
CONFIG_COMMON_CLK_SI5341=y
CONFIG_COMMON_CLK_SI5351=y
CONFIG_COMMON_CLK_SI544=y
CONFIG_COMMON_CLK_CDCE706=y
# CONFIG_COMMON_CLK_TPS68470 is not set
CONFIG_COMMON_CLK_CS2000_CP=y
CONFIG_COMMON_CLK_PALMAS=y
# CONFIG_XILINX_VCU is not set
# CONFIG_HWSPINLOCK is not set

#
# Clock Source drivers
#
CONFIG_CLKEVT_I8253=y
CONFIG_CLKBLD_I8253=y
# end of Clock Source drivers

# CONFIG_MAILBOX is not set
CONFIG_IOMMU_IOVA=y
CONFIG_IOMMU_API=y
CONFIG_IOMMU_SUPPORT=y

#
# Generic IOMMU Pagetable Support
#
# end of Generic IOMMU Pagetable Support

# CONFIG_IOMMU_DEBUGFS is not set
CONFIG_IOMMU_DEFAULT_DMA_STRICT=y
# CONFIG_IOMMU_DEFAULT_DMA_LAZY is not set
# CONFIG_IOMMU_DEFAULT_PASSTHROUGH is not set
CONFIG_IOMMU_DMA=y
# CONFIG_AMD_IOMMU is not set
CONFIG_VIRTIO_IOMMU=y

#
# Remoteproc drivers
#
CONFIG_REMOTEPROC=y
# CONFIG_REMOTEPROC_CDEV is not set
# end of Remoteproc drivers

#
# Rpmsg drivers
#
# CONFIG_RPMSG_VIRTIO is not set
# end of Rpmsg drivers

# CONFIG_SOUNDWIRE is not set

#
# SOC (System On Chip) specific Drivers
#

#
# Amlogic SoC drivers
#
# end of Amlogic SoC drivers

#
# Broadcom SoC drivers
#
# end of Broadcom SoC drivers

#
# NXP/Freescale QorIQ SoC drivers
#
# end of NXP/Freescale QorIQ SoC drivers

#
# fujitsu SoC drivers
#
# end of fujitsu SoC drivers

#
# i.MX SoC drivers
#
# end of i.MX SoC drivers

#
# Enable LiteX SoC Builder specific drivers
#
# end of Enable LiteX SoC Builder specific drivers

#
# Qualcomm SoC drivers
#
# end of Qualcomm SoC drivers

# CONFIG_SOC_TI is not set

#
# Xilinx SoC drivers
#
# end of Xilinx SoC drivers
# end of SOC (System On Chip) specific Drivers

CONFIG_PM_DEVFREQ=y

#
# DEVFREQ Governors
#
# CONFIG_DEVFREQ_GOV_SIMPLE_ONDEMAND is not set
CONFIG_DEVFREQ_GOV_PERFORMANCE=y
# CONFIG_DEVFREQ_GOV_POWERSAVE is not set
# CONFIG_DEVFREQ_GOV_USERSPACE is not set
# CONFIG_DEVFREQ_GOV_PASSIVE is not set

#
# DEVFREQ Drivers
#
# CONFIG_PM_DEVFREQ_EVENT is not set
CONFIG_EXTCON=y

#
# Extcon Device Drivers
#
CONFIG_EXTCON_ADC_JACK=y
# CONFIG_EXTCON_FSA9480 is not set
CONFIG_EXTCON_GPIO=y
CONFIG_EXTCON_INTEL_INT3496=y
CONFIG_EXTCON_MAX14577=y
CONFIG_EXTCON_MAX3355=y
CONFIG_EXTCON_MAX8997=y
CONFIG_EXTCON_PALMAS=y
CONFIG_EXTCON_PTN5150=y
# CONFIG_EXTCON_RT8973A is not set
# CONFIG_EXTCON_SM5502 is not set
CONFIG_EXTCON_USB_GPIO=y
CONFIG_EXTCON_USBC_CROS_EC=y
CONFIG_MEMORY=y
CONFIG_FPGA_DFL_EMIF=y
CONFIG_IIO=y
CONFIG_IIO_BUFFER=y
CONFIG_IIO_BUFFER_CB=y
CONFIG_IIO_BUFFER_DMA=y
CONFIG_IIO_BUFFER_DMAENGINE=y
CONFIG_IIO_BUFFER_HW_CONSUMER=y
CONFIG_IIO_KFIFO_BUF=y
CONFIG_IIO_TRIGGERED_BUFFER=y
CONFIG_IIO_CONFIGFS=y
CONFIG_IIO_TRIGGER=y
CONFIG_IIO_CONSUMERS_PER_TRIGGER=2
CONFIG_IIO_SW_DEVICE=y
CONFIG_IIO_SW_TRIGGER=y
# CONFIG_IIO_TRIGGERED_EVENT is not set

#
# Accelerometers
#
# CONFIG_ADXL313_I2C is not set
CONFIG_ADXL345=y
CONFIG_ADXL345_I2C=y
# CONFIG_ADXL355_I2C is not set
# CONFIG_ADXL367_I2C is not set
CONFIG_ADXL372=y
CONFIG_ADXL372_I2C=y
CONFIG_BMA180=y
# CONFIG_BMA400 is not set
CONFIG_BMC150_ACCEL=y
CONFIG_BMC150_ACCEL_I2C=y
CONFIG_DA280=y
# CONFIG_DA311 is not set
# CONFIG_DMARD06 is not set
CONFIG_DMARD09=y
CONFIG_DMARD10=y
CONFIG_FXLS8962AF=y
CONFIG_FXLS8962AF_I2C=y
CONFIG_IIO_ST_ACCEL_3AXIS=y
# CONFIG_IIO_ST_ACCEL_I2C_3AXIS is not set
# CONFIG_KXSD9 is not set
# CONFIG_KXCJK1013 is not set
CONFIG_MC3230=y
CONFIG_MMA7455=y
CONFIG_MMA7455_I2C=y
# CONFIG_MMA7660 is not set
CONFIG_MMA8452=y
CONFIG_MMA9551_CORE=y
CONFIG_MMA9551=y
CONFIG_MMA9553=y
# CONFIG_MSA311 is not set
CONFIG_MXC4005=y
CONFIG_MXC6255=y
CONFIG_STK8312=y
CONFIG_STK8BA50=y
# end of Accelerometers

#
# Analog to digital converters
#
CONFIG_AD7091R5=y
CONFIG_AD7291=y
# CONFIG_AD7606_IFACE_PARALLEL is not set
CONFIG_AD799X=y
CONFIG_AXP20X_ADC=y
# CONFIG_AXP288_ADC is not set
# CONFIG_CC10001_ADC is not set
# CONFIG_DA9150_GPADC is not set
# CONFIG_ENVELOPE_DETECTOR is not set
# CONFIG_HX711 is not set
CONFIG_LP8788_ADC=y
CONFIG_LTC2471=y
# CONFIG_LTC2485 is not set
CONFIG_LTC2497=y
CONFIG_MAX1363=y
CONFIG_MAX9611=y
CONFIG_MCP3422=y
CONFIG_MEDIATEK_MT6360_ADC=y
# CONFIG_MEN_Z188_ADC is not set
CONFIG_NAU7802=y
# CONFIG_PALMAS_GPADC is not set
CONFIG_QCOM_VADC_COMMON=y
CONFIG_QCOM_SPMI_IADC=y
CONFIG_QCOM_SPMI_VADC=y
CONFIG_QCOM_SPMI_ADC5=y
# CONFIG_RICHTEK_RTQ6056 is not set
# CONFIG_SD_ADC_MODULATOR is not set
# CONFIG_TI_ADC081C is not set
# CONFIG_TI_ADS1015 is not set
CONFIG_TI_AM335X_ADC=y
# CONFIG_VF610_ADC is not set
CONFIG_XILINX_XADC=y
# end of Analog to digital converters

#
# Analog to digital and digital to analog converters
#
# end of Analog to digital and digital to analog converters

#
# Analog Front Ends
#
# CONFIG_IIO_RESCALE is not set
# end of Analog Front Ends

#
# Amplifiers
#
# CONFIG_HMC425 is not set
# end of Amplifiers

#
# Capacitance to digital converters
#
# CONFIG_AD7150 is not set
CONFIG_AD7746=y
# end of Capacitance to digital converters

#
# Chemical Sensors
#
CONFIG_ATLAS_PH_SENSOR=y
CONFIG_ATLAS_EZO_SENSOR=y
# CONFIG_BME680 is not set
CONFIG_CCS811=y
CONFIG_IAQCORE=y
# CONFIG_PMS7003 is not set
CONFIG_SCD30_CORE=y
# CONFIG_SCD30_I2C is not set
CONFIG_SCD30_SERIAL=y
CONFIG_SCD4X=y
# CONFIG_SENSIRION_SGP30 is not set
CONFIG_SENSIRION_SGP40=y
CONFIG_SPS30=y
# CONFIG_SPS30_I2C is not set
CONFIG_SPS30_SERIAL=y
CONFIG_SENSEAIR_SUNRISE_CO2=y
# CONFIG_VZ89X is not set
# end of Chemical Sensors

#
# Hid Sensor IIO Common
#
# end of Hid Sensor IIO Common

CONFIG_IIO_MS_SENSORS_I2C=y

#
# IIO SCMI Sensors
#
# end of IIO SCMI Sensors

#
# SSP Sensor Common
#
# end of SSP Sensor Common

CONFIG_IIO_ST_SENSORS_I2C=y
CONFIG_IIO_ST_SENSORS_CORE=y

#
# Digital to analog converters
#
CONFIG_AD5064=y
# CONFIG_AD5380 is not set
CONFIG_AD5446=y
CONFIG_AD5592R_BASE=y
CONFIG_AD5593R=y
# CONFIG_AD5696_I2C is not set
# CONFIG_DPOT_DAC is not set
CONFIG_DS4424=y
# CONFIG_M62332 is not set
# CONFIG_MAX517 is not set
CONFIG_MAX5821=y
CONFIG_MCP4725=y
CONFIG_TI_DAC5571=y
# CONFIG_VF610_DAC is not set
# end of Digital to analog converters

#
# IIO dummy driver
#
# CONFIG_IIO_SIMPLE_DUMMY is not set
# end of IIO dummy driver

#
# Filters
#
# end of Filters

#
# Frequency Synthesizers DDS/PLL
#

#
# Clock Generator/Distribution
#
# end of Clock Generator/Distribution

#
# Phase-Locked Loop (PLL) frequency synthesizers
#
# end of Phase-Locked Loop (PLL) frequency synthesizers
# end of Frequency Synthesizers DDS/PLL

#
# Digital gyroscope sensors
#
CONFIG_BMG160=y
CONFIG_BMG160_I2C=y
CONFIG_FXAS21002C=y
CONFIG_FXAS21002C_I2C=y
# CONFIG_MPU3050_I2C is not set
CONFIG_IIO_ST_GYRO_3AXIS=y
CONFIG_IIO_ST_GYRO_I2C_3AXIS=y
CONFIG_ITG3200=y
# end of Digital gyroscope sensors

#
# Health Sensors
#

#
# Heart Rate Monitors
#
# CONFIG_AFE4404 is not set
CONFIG_MAX30100=y
CONFIG_MAX30102=y
# end of Heart Rate Monitors
# end of Health Sensors

#
# Humidity sensors
#
CONFIG_AM2315=y
# CONFIG_DHT11 is not set
CONFIG_HDC100X=y
# CONFIG_HDC2010 is not set
CONFIG_HTS221=y
CONFIG_HTS221_I2C=y
CONFIG_HTU21=y
# CONFIG_SI7005 is not set
CONFIG_SI7020=y
# end of Humidity sensors

#
# Inertial measurement units
#
CONFIG_BMI160=y
CONFIG_BMI160_I2C=y
# CONFIG_BOSCH_BNO055_SERIAL is not set
# CONFIG_BOSCH_BNO055_I2C is not set
CONFIG_FXOS8700=y
CONFIG_FXOS8700_I2C=y
CONFIG_KMX61=y
CONFIG_INV_ICM42600=y
CONFIG_INV_ICM42600_I2C=y
# CONFIG_INV_MPU6050_I2C is not set
CONFIG_IIO_ST_LSM6DSX=y
CONFIG_IIO_ST_LSM6DSX_I2C=y
CONFIG_IIO_ST_LSM6DSX_I3C=y
# CONFIG_IIO_ST_LSM9DS0 is not set
# end of Inertial measurement units

#
# Light sensors
#
CONFIG_ACPI_ALS=y
CONFIG_ADJD_S311=y
CONFIG_ADUX1020=y
CONFIG_AL3010=y
CONFIG_AL3320A=y
# CONFIG_APDS9300 is not set
CONFIG_APDS9960=y
# CONFIG_AS73211 is not set
CONFIG_BH1750=y
CONFIG_BH1780=y
CONFIG_CM32181=y
# CONFIG_CM3232 is not set
CONFIG_CM3323=y
# CONFIG_CM3605 is not set
# CONFIG_CM36651 is not set
CONFIG_GP2AP002=y
CONFIG_GP2AP020A00F=y
CONFIG_SENSORS_ISL29018=y
# CONFIG_SENSORS_ISL29028 is not set
CONFIG_ISL29125=y
CONFIG_JSA1212=y
CONFIG_RPR0521=y
CONFIG_SENSORS_LM3533=y
CONFIG_LTR501=y
# CONFIG_LTRF216A is not set
# CONFIG_LV0104CS is not set
CONFIG_MAX44000=y
# CONFIG_MAX44009 is not set
CONFIG_NOA1305=y
CONFIG_OPT3001=y
CONFIG_PA12203001=y
# CONFIG_SI1133 is not set
# CONFIG_SI1145 is not set
# CONFIG_STK3310 is not set
# CONFIG_ST_UVIS25 is not set
# CONFIG_TCS3414 is not set
CONFIG_TCS3472=y
CONFIG_SENSORS_TSL2563=y
# CONFIG_TSL2583 is not set
# CONFIG_TSL2591 is not set
# CONFIG_TSL2772 is not set
CONFIG_TSL4531=y
CONFIG_US5182D=y
CONFIG_VCNL4000=y
# CONFIG_VCNL4035 is not set
# CONFIG_VEML6030 is not set
# CONFIG_VEML6070 is not set
# CONFIG_VL6180 is not set
# CONFIG_ZOPT2201 is not set
# end of Light sensors

#
# Magnetometer sensors
#
# CONFIG_AK8974 is not set
CONFIG_AK8975=y
# CONFIG_AK09911 is not set
CONFIG_BMC150_MAGN=y
CONFIG_BMC150_MAGN_I2C=y
CONFIG_MAG3110=y
# CONFIG_MMC35240 is not set
CONFIG_IIO_ST_MAGN_3AXIS=y
# CONFIG_IIO_ST_MAGN_I2C_3AXIS is not set
# CONFIG_SENSORS_HMC5843_I2C is not set
CONFIG_SENSORS_RM3100=y
CONFIG_SENSORS_RM3100_I2C=y
CONFIG_YAMAHA_YAS530=y
# end of Magnetometer sensors

#
# Multiplexers
#
# CONFIG_IIO_MUX is not set
# end of Multiplexers

#
# Inclinometer sensors
#
# end of Inclinometer sensors

#
# Triggers - standalone
#
CONFIG_IIO_HRTIMER_TRIGGER=y
CONFIG_IIO_INTERRUPT_TRIGGER=y
CONFIG_IIO_TIGHTLOOP_TRIGGER=y
CONFIG_IIO_SYSFS_TRIGGER=y
# end of Triggers - standalone

#
# Linear and angular position sensors
#
# end of Linear and angular position sensors

#
# Digital potentiometers
#
CONFIG_AD5110=y
CONFIG_AD5272=y
# CONFIG_DS1803 is not set
CONFIG_MAX5432=y
# CONFIG_MCP4018 is not set
CONFIG_MCP4531=y
CONFIG_TPL0102=y
# end of Digital potentiometers

#
# Digital potentiostats
#
CONFIG_LMP91000=y
# end of Digital potentiostats

#
# Pressure sensors
#
# CONFIG_ABP060MG is not set
CONFIG_BMP280=y
CONFIG_BMP280_I2C=y
CONFIG_DLHL60D=y
# CONFIG_DPS310 is not set
# CONFIG_HP03 is not set
CONFIG_ICP10100=y
CONFIG_MPL115=y
CONFIG_MPL115_I2C=y
CONFIG_MPL3115=y
CONFIG_MS5611=y
CONFIG_MS5611_I2C=y
CONFIG_MS5637=y
CONFIG_IIO_ST_PRESS=y
# CONFIG_IIO_ST_PRESS_I2C is not set
# CONFIG_T5403 is not set
CONFIG_HP206C=y
# CONFIG_ZPA2326 is not set
# end of Pressure sensors

#
# Lightning sensors
#
# end of Lightning sensors

#
# Proximity and distance sensors
#
CONFIG_CROS_EC_MKBP_PROXIMITY=y
CONFIG_ISL29501=y
CONFIG_LIDAR_LITE_V2=y
# CONFIG_MB1232 is not set
CONFIG_PING=y
CONFIG_RFD77402=y
# CONFIG_SRF04 is not set
CONFIG_SX_COMMON=y
# CONFIG_SX9310 is not set
CONFIG_SX9324=y
# CONFIG_SX9360 is not set
CONFIG_SX9500=y
# CONFIG_SRF08 is not set
CONFIG_VCNL3020=y
CONFIG_VL53L0X_I2C=y
# end of Proximity and distance sensors

#
# Resolver to digital converters
#
# end of Resolver to digital converters

#
# Temperature sensors
#
# CONFIG_MLX90614 is not set
CONFIG_MLX90632=y
CONFIG_TMP006=y
# CONFIG_TMP007 is not set
CONFIG_TMP117=y
CONFIG_TSYS01=y
CONFIG_TSYS02D=y
# end of Temperature sensors

# CONFIG_NTB is not set
# CONFIG_PWM is not set

#
# IRQ chip support
#
CONFIG_MADERA_IRQ=y
# end of IRQ chip support

CONFIG_IPACK_BUS=y
# CONFIG_BOARD_TPCI200 is not set
# CONFIG_SERIAL_IPOCTAL is not set
CONFIG_RESET_CONTROLLER=y
# CONFIG_RESET_SIMPLE is not set
CONFIG_RESET_TI_SYSCON=y
# CONFIG_RESET_TI_TPS380X is not set

#
# PHY Subsystem
#
CONFIG_GENERIC_PHY=y
CONFIG_USB_LGM_PHY=y
# CONFIG_PHY_CAN_TRANSCEIVER is not set

#
# PHY drivers for Broadcom platforms
#
CONFIG_BCM_KONA_USB2_PHY=y
# end of PHY drivers for Broadcom platforms

CONFIG_PHY_PXA_28NM_HSIC=y
CONFIG_PHY_PXA_28NM_USB2=y
CONFIG_PHY_CPCAP_USB=y
# CONFIG_PHY_QCOM_USB_HS is not set
# CONFIG_PHY_QCOM_USB_HSIC is not set
CONFIG_PHY_SAMSUNG_USB2=y
CONFIG_PHY_TUSB1210=y
CONFIG_PHY_INTEL_LGM_EMMC=y
# end of PHY Subsystem

CONFIG_POWERCAP=y
CONFIG_MCB=y
# CONFIG_MCB_PCI is not set
# CONFIG_MCB_LPC is not set

#
# Performance monitor support
#
# end of Performance monitor support

CONFIG_RAS=y
# CONFIG_USB4 is not set

#
# Android
#
CONFIG_ANDROID_BINDER_IPC=y
CONFIG_ANDROID_BINDERFS=y
CONFIG_ANDROID_BINDER_DEVICES="binder,hwbinder,vndbinder"
# CONFIG_ANDROID_BINDER_IPC_SELFTEST is not set
# end of Android

# CONFIG_DAX is not set
CONFIG_NVMEM=y
CONFIG_NVMEM_SYSFS=y
# CONFIG_NVMEM_RAVE_SP_EEPROM is not set
CONFIG_NVMEM_RMEM=y
CONFIG_NVMEM_SPMI_SDAM=y

#
# HW tracing support
#
CONFIG_STM=y
# CONFIG_STM_PROTO_BASIC is not set
CONFIG_STM_PROTO_SYS_T=y
CONFIG_STM_DUMMY=y
CONFIG_STM_SOURCE_CONSOLE=y
# CONFIG_STM_SOURCE_HEARTBEAT is not set
# CONFIG_STM_SOURCE_FTRACE is not set
# CONFIG_INTEL_TH is not set
# end of HW tracing support

CONFIG_FPGA=y
CONFIG_ALTERA_PR_IP_CORE=y
# CONFIG_FPGA_MGR_ALTERA_CVP is not set
CONFIG_FPGA_BRIDGE=y
CONFIG_ALTERA_FREEZE_BRIDGE=y
# CONFIG_XILINX_PR_DECOUPLER is not set
CONFIG_FPGA_REGION=y
CONFIG_FPGA_DFL=y
# CONFIG_FPGA_DFL_FME is not set
CONFIG_FPGA_DFL_AFU=y
CONFIG_FPGA_DFL_NIOS_INTEL_PAC_N3000=y
# CONFIG_FPGA_DFL_PCI is not set
CONFIG_PM_OPP=y
# CONFIG_SIOX is not set
CONFIG_SLIMBUS=y
CONFIG_SLIM_QCOM_CTRL=y
CONFIG_INTERCONNECT=y
# CONFIG_COUNTER is not set
CONFIG_MOST=y
CONFIG_MOST_CDEV=y
# CONFIG_PECI is not set
# CONFIG_HTE is not set
# end of Device Drivers

#
# File systems
#
CONFIG_DCACHE_WORD_ACCESS=y
# CONFIG_VALIDATE_FS_PARSER is not set
CONFIG_FS_POSIX_ACL=y
CONFIG_EXPORTFS=y
CONFIG_EXPORTFS_BLOCK_OPS=y
CONFIG_FILE_LOCKING=y
# CONFIG_FS_ENCRYPTION is not set
CONFIG_FS_VERITY=y
CONFIG_FS_VERITY_DEBUG=y
# CONFIG_FS_VERITY_BUILTIN_SIGNATURES is not set
CONFIG_FSNOTIFY=y
# CONFIG_DNOTIFY is not set
CONFIG_INOTIFY_USER=y
# CONFIG_FANOTIFY is not set
# CONFIG_QUOTA is not set
CONFIG_AUTOFS4_FS=y
CONFIG_AUTOFS_FS=y
CONFIG_FUSE_FS=y
CONFIG_CUSE=y
CONFIG_VIRTIO_FS=y
CONFIG_OVERLAY_FS=y
CONFIG_OVERLAY_FS_REDIRECT_DIR=y
# CONFIG_OVERLAY_FS_REDIRECT_ALWAYS_FOLLOW is not set
# CONFIG_OVERLAY_FS_INDEX is not set
CONFIG_OVERLAY_FS_XINO_AUTO=y
CONFIG_OVERLAY_FS_METACOPY=y

#
# Caches
#
CONFIG_NETFS_SUPPORT=y
# CONFIG_NETFS_STATS is not set
CONFIG_FSCACHE=y
# CONFIG_FSCACHE_STATS is not set
CONFIG_FSCACHE_DEBUG=y
# end of Caches

#
# Pseudo filesystems
#
CONFIG_PROC_FS=y
# CONFIG_PROC_KCORE is not set
CONFIG_PROC_SYSCTL=y
CONFIG_PROC_PAGE_MONITOR=y
CONFIG_PROC_CHILDREN=y
CONFIG_PROC_PID_ARCH_STATUS=y
CONFIG_KERNFS=y
CONFIG_SYSFS=y
CONFIG_TMPFS=y
CONFIG_TMPFS_POSIX_ACL=y
CONFIG_TMPFS_XATTR=y
CONFIG_TMPFS_INODE64=y
CONFIG_HUGETLBFS=y
CONFIG_HUGETLB_PAGE=y
CONFIG_ARCH_WANT_HUGETLB_PAGE_OPTIMIZE_VMEMMAP=y
CONFIG_HUGETLB_PAGE_OPTIMIZE_VMEMMAP=y
# CONFIG_HUGETLB_PAGE_OPTIMIZE_VMEMMAP_DEFAULT_ON is not set
CONFIG_MEMFD_CREATE=y
CONFIG_ARCH_HAS_GIGANTIC_PAGE=y
CONFIG_CONFIGFS_FS=y
# CONFIG_EFIVAR_FS is not set
# end of Pseudo filesystems

CONFIG_MISC_FILESYSTEMS=y
# CONFIG_ORANGEFS_FS is not set
# CONFIG_ECRYPT_FS is not set
CONFIG_JFFS2_FS=y
CONFIG_JFFS2_FS_DEBUG=0
# CONFIG_JFFS2_FS_WRITEBUFFER is not set
# CONFIG_JFFS2_SUMMARY is not set
# CONFIG_JFFS2_FS_XATTR is not set
CONFIG_JFFS2_COMPRESSION_OPTIONS=y
CONFIG_JFFS2_ZLIB=y
CONFIG_JFFS2_LZO=y
# CONFIG_JFFS2_RTIME is not set
# CONFIG_JFFS2_RUBIN is not set
# CONFIG_JFFS2_CMODE_NONE is not set
# CONFIG_JFFS2_CMODE_PRIORITY is not set
CONFIG_JFFS2_CMODE_SIZE=y
# CONFIG_JFFS2_CMODE_FAVOURLZO is not set
CONFIG_UBIFS_FS=y
# CONFIG_UBIFS_FS_ADVANCED_COMPR is not set
CONFIG_UBIFS_FS_LZO=y
CONFIG_UBIFS_FS_ZLIB=y
CONFIG_UBIFS_FS_ZSTD=y
# CONFIG_UBIFS_ATIME_SUPPORT is not set
# CONFIG_UBIFS_FS_XATTR is not set
CONFIG_UBIFS_FS_AUTHENTICATION=y
CONFIG_CRAMFS=y
# CONFIG_CRAMFS_MTD is not set
CONFIG_ROMFS_FS=y
CONFIG_ROMFS_BACKED_BY_MTD=y
CONFIG_ROMFS_ON_MTD=y
CONFIG_PSTORE=y
CONFIG_PSTORE_DEFAULT_KMSG_BYTES=10240
CONFIG_PSTORE_DEFLATE_COMPRESS=y
# CONFIG_PSTORE_LZO_COMPRESS is not set
# CONFIG_PSTORE_LZ4_COMPRESS is not set
# CONFIG_PSTORE_LZ4HC_COMPRESS is not set
CONFIG_PSTORE_842_COMPRESS=y
# CONFIG_PSTORE_ZSTD_COMPRESS is not set
CONFIG_PSTORE_COMPRESS=y
# CONFIG_PSTORE_DEFLATE_COMPRESS_DEFAULT is not set
CONFIG_PSTORE_842_COMPRESS_DEFAULT=y
CONFIG_PSTORE_COMPRESS_DEFAULT="842"
# CONFIG_PSTORE_CONSOLE is not set
CONFIG_PSTORE_PMSG=y
CONFIG_PSTORE_RAM=y
CONFIG_NETWORK_FILESYSTEMS=y
CONFIG_NFS_FS=y
CONFIG_NFS_V2=y
CONFIG_NFS_V3=y
# CONFIG_NFS_V3_ACL is not set
CONFIG_NFS_V4=m
# CONFIG_NFS_V4_1 is not set
# CONFIG_ROOT_NFS is not set
# CONFIG_NFS_FSCACHE is not set
# CONFIG_NFS_USE_LEGACY_DNS is not set
CONFIG_NFS_USE_KERNEL_DNS=y
CONFIG_NFS_DISABLE_UDP_SUPPORT=y
# CONFIG_NFSD is not set
CONFIG_GRACE_PERIOD=y
CONFIG_LOCKD=y
CONFIG_LOCKD_V4=y
CONFIG_NFS_COMMON=y
CONFIG_SUNRPC=y
CONFIG_SUNRPC_GSS=m
# CONFIG_SUNRPC_DEBUG is not set
# CONFIG_CEPH_FS is not set
CONFIG_CIFS=m
CONFIG_CIFS_STATS2=y
CONFIG_CIFS_ALLOW_INSECURE_LEGACY=y
# CONFIG_CIFS_UPCALL is not set
# CONFIG_CIFS_XATTR is not set
CONFIG_CIFS_DEBUG=y
# CONFIG_CIFS_DEBUG2 is not set
# CONFIG_CIFS_DEBUG_DUMP_KEYS is not set
# CONFIG_CIFS_DFS_UPCALL is not set
# CONFIG_CIFS_SWN_UPCALL is not set
# CONFIG_CIFS_FSCACHE is not set
# CONFIG_SMB_SERVER is not set
CONFIG_SMBFS_COMMON=m
# CONFIG_CODA_FS is not set
# CONFIG_AFS_FS is not set
# CONFIG_9P_FS is not set
CONFIG_NLS=y
CONFIG_NLS_DEFAULT="iso8859-1"
CONFIG_NLS_CODEPAGE_437=y
CONFIG_NLS_CODEPAGE_737=y
CONFIG_NLS_CODEPAGE_775=y
CONFIG_NLS_CODEPAGE_850=y
CONFIG_NLS_CODEPAGE_852=y
# CONFIG_NLS_CODEPAGE_855 is not set
# CONFIG_NLS_CODEPAGE_857 is not set
# CONFIG_NLS_CODEPAGE_860 is not set
CONFIG_NLS_CODEPAGE_861=y
# CONFIG_NLS_CODEPAGE_862 is not set
# CONFIG_NLS_CODEPAGE_863 is not set
# CONFIG_NLS_CODEPAGE_864 is not set
CONFIG_NLS_CODEPAGE_865=y
# CONFIG_NLS_CODEPAGE_866 is not set
CONFIG_NLS_CODEPAGE_869=y
# CONFIG_NLS_CODEPAGE_936 is not set
CONFIG_NLS_CODEPAGE_950=y
# CONFIG_NLS_CODEPAGE_932 is not set
# CONFIG_NLS_CODEPAGE_949 is not set
# CONFIG_NLS_CODEPAGE_874 is not set
CONFIG_NLS_ISO8859_8=y
CONFIG_NLS_CODEPAGE_1250=y
CONFIG_NLS_CODEPAGE_1251=y
CONFIG_NLS_ASCII=y
# CONFIG_NLS_ISO8859_1 is not set
CONFIG_NLS_ISO8859_2=y
CONFIG_NLS_ISO8859_3=y
CONFIG_NLS_ISO8859_4=y
CONFIG_NLS_ISO8859_5=y
# CONFIG_NLS_ISO8859_6 is not set
CONFIG_NLS_ISO8859_7=y
# CONFIG_NLS_ISO8859_9 is not set
CONFIG_NLS_ISO8859_13=y
CONFIG_NLS_ISO8859_14=y
CONFIG_NLS_ISO8859_15=y
CONFIG_NLS_KOI8_R=y
CONFIG_NLS_KOI8_U=y
CONFIG_NLS_MAC_ROMAN=y
# CONFIG_NLS_MAC_CELTIC is not set
CONFIG_NLS_MAC_CENTEURO=y
# CONFIG_NLS_MAC_CROATIAN is not set
CONFIG_NLS_MAC_CYRILLIC=y
# CONFIG_NLS_MAC_GAELIC is not set
CONFIG_NLS_MAC_GREEK=y
CONFIG_NLS_MAC_ICELAND=y
CONFIG_NLS_MAC_INUIT=y
# CONFIG_NLS_MAC_ROMANIAN is not set
CONFIG_NLS_MAC_TURKISH=y
CONFIG_NLS_UTF8=y
# CONFIG_DLM is not set
# CONFIG_UNICODE is not set
CONFIG_IO_WQ=y
# end of File systems

#
# Security options
#
CONFIG_KEYS=y
CONFIG_KEYS_REQUEST_CACHE=y
CONFIG_PERSISTENT_KEYRINGS=y
# CONFIG_TRUSTED_KEYS is not set
CONFIG_ENCRYPTED_KEYS=y
# CONFIG_USER_DECRYPTED_DATA is not set
CONFIG_KEY_DH_OPERATIONS=y
# CONFIG_SECURITY_DMESG_RESTRICT is not set
CONFIG_SECURITY=y
# CONFIG_SECURITYFS is not set
# CONFIG_SECURITY_NETWORK is not set
# CONFIG_SECURITY_PATH is not set
CONFIG_HAVE_HARDENED_USERCOPY_ALLOCATOR=y
# CONFIG_HARDENED_USERCOPY is not set
CONFIG_FORTIFY_SOURCE=y
CONFIG_STATIC_USERMODEHELPER=y
CONFIG_STATIC_USERMODEHELPER_PATH="/sbin/usermode-helper"
# CONFIG_SECURITY_SMACK is not set
# CONFIG_SECURITY_TOMOYO is not set
# CONFIG_SECURITY_APPARMOR is not set
# CONFIG_SECURITY_YAMA is not set
# CONFIG_SECURITY_SAFESETID is not set
# CONFIG_SECURITY_LOCKDOWN_LSM is not set
# CONFIG_SECURITY_LANDLOCK is not set
CONFIG_INTEGRITY=y
# CONFIG_INTEGRITY_SIGNATURE is not set
# CONFIG_IMA is not set
# CONFIG_IMA_SECURE_AND_OR_TRUSTED_BOOT is not set
# CONFIG_EVM is not set
CONFIG_DEFAULT_SECURITY_DAC=y
CONFIG_LSM="landlock,lockdown,yama,loadpin,safesetid,integrity,bpf"

#
# Kernel hardening options
#

#
# Memory initialization
#
CONFIG_INIT_STACK_NONE=y
# CONFIG_GCC_PLUGIN_STRUCTLEAK_USER is not set
# CONFIG_GCC_PLUGIN_STACKLEAK is not set
CONFIG_INIT_ON_ALLOC_DEFAULT_ON=y
CONFIG_INIT_ON_FREE_DEFAULT_ON=y
CONFIG_CC_HAS_ZERO_CALL_USED_REGS=y
# CONFIG_ZERO_CALL_USED_REGS is not set
# end of Memory initialization

CONFIG_RANDSTRUCT_NONE=y
# CONFIG_RANDSTRUCT_FULL is not set
# CONFIG_RANDSTRUCT_PERFORMANCE is not set
# end of Kernel hardening options
# end of Security options

CONFIG_CRYPTO=y

#
# Crypto core or helper
#
CONFIG_CRYPTO_ALGAPI=y
CONFIG_CRYPTO_ALGAPI2=y
CONFIG_CRYPTO_AEAD=y
CONFIG_CRYPTO_AEAD2=y
CONFIG_CRYPTO_SKCIPHER=y
CONFIG_CRYPTO_SKCIPHER2=y
CONFIG_CRYPTO_HASH=y
CONFIG_CRYPTO_HASH2=y
CONFIG_CRYPTO_RNG=y
CONFIG_CRYPTO_RNG2=y
CONFIG_CRYPTO_RNG_DEFAULT=y
CONFIG_CRYPTO_AKCIPHER2=y
CONFIG_CRYPTO_AKCIPHER=y
CONFIG_CRYPTO_KPP2=y
CONFIG_CRYPTO_KPP=y
CONFIG_CRYPTO_ACOMP2=y
CONFIG_CRYPTO_MANAGER=y
CONFIG_CRYPTO_MANAGER2=y
# CONFIG_CRYPTO_USER is not set
CONFIG_CRYPTO_MANAGER_DISABLE_TESTS=y
CONFIG_CRYPTO_GF128MUL=y
CONFIG_CRYPTO_NULL=y
CONFIG_CRYPTO_NULL2=y
CONFIG_CRYPTO_CRYPTD=y
CONFIG_CRYPTO_AUTHENC=y
# CONFIG_CRYPTO_TEST is not set
CONFIG_CRYPTO_SIMD=y
# end of Crypto core or helper

#
# Public-key cryptography
#
CONFIG_CRYPTO_RSA=y
CONFIG_CRYPTO_DH=y
# CONFIG_CRYPTO_DH_RFC7919_GROUPS is not set
CONFIG_CRYPTO_ECC=y
# CONFIG_CRYPTO_ECDH is not set
CONFIG_CRYPTO_ECDSA=y
CONFIG_CRYPTO_ECRDSA=y
CONFIG_CRYPTO_SM2=y
# CONFIG_CRYPTO_CURVE25519 is not set
# end of Public-key cryptography

#
# Block ciphers
#
CONFIG_CRYPTO_AES=y
CONFIG_CRYPTO_AES_TI=y
# CONFIG_CRYPTO_ARIA is not set
CONFIG_CRYPTO_BLOWFISH=y
CONFIG_CRYPTO_BLOWFISH_COMMON=y
# CONFIG_CRYPTO_CAMELLIA is not set
CONFIG_CRYPTO_CAST_COMMON=y
# CONFIG_CRYPTO_CAST5 is not set
CONFIG_CRYPTO_CAST6=y
CONFIG_CRYPTO_DES=y
CONFIG_CRYPTO_FCRYPT=y
CONFIG_CRYPTO_SERPENT=y
CONFIG_CRYPTO_SM4=y
# CONFIG_CRYPTO_SM4_GENERIC is not set
# CONFIG_CRYPTO_TWOFISH is not set
CONFIG_CRYPTO_TWOFISH_COMMON=y
# end of Block ciphers

#
# Length-preserving ciphers and modes
#
CONFIG_CRYPTO_ADIANTUM=y
CONFIG_CRYPTO_CHACHA20=y
CONFIG_CRYPTO_CBC=y
CONFIG_CRYPTO_CFB=y
CONFIG_CRYPTO_CTR=y
# CONFIG_CRYPTO_CTS is not set
CONFIG_CRYPTO_ECB=y
# CONFIG_CRYPTO_HCTR2 is not set
# CONFIG_CRYPTO_KEYWRAP is not set
# CONFIG_CRYPTO_LRW is not set
CONFIG_CRYPTO_OFB=y
# CONFIG_CRYPTO_PCBC is not set
CONFIG_CRYPTO_XTS=y
CONFIG_CRYPTO_NHPOLY1305=y
# end of Length-preserving ciphers and modes

#
# AEAD (authenticated encryption with associated data) ciphers
#
CONFIG_CRYPTO_AEGIS128=y
# CONFIG_CRYPTO_CHACHA20POLY1305 is not set
CONFIG_CRYPTO_CCM=y
CONFIG_CRYPTO_GCM=y
# CONFIG_CRYPTO_SEQIV is not set
# CONFIG_CRYPTO_ECHAINIV is not set
CONFIG_CRYPTO_ESSIV=y
# end of AEAD (authenticated encryption with associated data) ciphers

#
# Hashes, digests, and MACs
#
CONFIG_CRYPTO_BLAKE2B=y
CONFIG_CRYPTO_CMAC=m
CONFIG_CRYPTO_GHASH=y
CONFIG_CRYPTO_HMAC=y
CONFIG_CRYPTO_MD4=y
CONFIG_CRYPTO_MD5=m
CONFIG_CRYPTO_MICHAEL_MIC=y
# CONFIG_CRYPTO_POLY1305 is not set
CONFIG_CRYPTO_RMD160=y
CONFIG_CRYPTO_SHA1=y
CONFIG_CRYPTO_SHA256=y
CONFIG_CRYPTO_SHA512=y
CONFIG_CRYPTO_SHA3=y
CONFIG_CRYPTO_SM3=y
# CONFIG_CRYPTO_SM3_GENERIC is not set
CONFIG_CRYPTO_STREEBOG=y
CONFIG_CRYPTO_VMAC=y
CONFIG_CRYPTO_WP512=y
CONFIG_CRYPTO_XCBC=y
CONFIG_CRYPTO_XXHASH=y
# end of Hashes, digests, and MACs

#
# CRCs (cyclic redundancy checks)
#
CONFIG_CRYPTO_CRC32C=y
# CONFIG_CRYPTO_CRC32 is not set
CONFIG_CRYPTO_CRCT10DIF=y
# end of CRCs (cyclic redundancy checks)

#
# Compression
#
CONFIG_CRYPTO_DEFLATE=y
CONFIG_CRYPTO_LZO=y
CONFIG_CRYPTO_842=y
CONFIG_CRYPTO_LZ4=y
# CONFIG_CRYPTO_LZ4HC is not set
CONFIG_CRYPTO_ZSTD=y
# end of Compression

#
# Random number generation
#
CONFIG_CRYPTO_ANSI_CPRNG=y
CONFIG_CRYPTO_DRBG_MENU=y
CONFIG_CRYPTO_DRBG_HMAC=y
# CONFIG_CRYPTO_DRBG_HASH is not set
CONFIG_CRYPTO_DRBG_CTR=y
CONFIG_CRYPTO_DRBG=y
CONFIG_CRYPTO_JITTERENTROPY=y
CONFIG_CRYPTO_KDF800108_CTR=y
# end of Random number generation

#
# Userspace interface
#
# CONFIG_CRYPTO_USER_API_HASH is not set
# CONFIG_CRYPTO_USER_API_SKCIPHER is not set
# CONFIG_CRYPTO_USER_API_RNG is not set
# CONFIG_CRYPTO_USER_API_AEAD is not set
# end of Userspace interface

CONFIG_CRYPTO_HASH_INFO=y

#
# Accelerated Cryptographic Algorithms for CPU (x86)
#
# CONFIG_CRYPTO_CURVE25519_X86 is not set
CONFIG_CRYPTO_AES_NI_INTEL=y
CONFIG_CRYPTO_BLOWFISH_X86_64=y
CONFIG_CRYPTO_CAMELLIA_X86_64=y
CONFIG_CRYPTO_CAMELLIA_AESNI_AVX_X86_64=y
CONFIG_CRYPTO_CAMELLIA_AESNI_AVX2_X86_64=y
# CONFIG_CRYPTO_CAST5_AVX_X86_64 is not set
CONFIG_CRYPTO_CAST6_AVX_X86_64=y
CONFIG_CRYPTO_DES3_EDE_X86_64=y
# CONFIG_CRYPTO_SERPENT_SSE2_X86_64 is not set
CONFIG_CRYPTO_SERPENT_AVX_X86_64=y
CONFIG_CRYPTO_SERPENT_AVX2_X86_64=y
CONFIG_CRYPTO_SM4_AESNI_AVX_X86_64=y
CONFIG_CRYPTO_SM4_AESNI_AVX2_X86_64=y
CONFIG_CRYPTO_TWOFISH_X86_64=y
CONFIG_CRYPTO_TWOFISH_X86_64_3WAY=y
CONFIG_CRYPTO_TWOFISH_AVX_X86_64=y
# CONFIG_CRYPTO_ARIA_AESNI_AVX_X86_64 is not set
CONFIG_CRYPTO_CHACHA20_X86_64=y
CONFIG_CRYPTO_AEGIS128_AESNI_SSE2=y
CONFIG_CRYPTO_NHPOLY1305_SSE2=y
# CONFIG_CRYPTO_NHPOLY1305_AVX2 is not set
# CONFIG_CRYPTO_BLAKE2S_X86 is not set
# CONFIG_CRYPTO_POLYVAL_CLMUL_NI is not set
# CONFIG_CRYPTO_POLY1305_X86_64 is not set
CONFIG_CRYPTO_SHA1_SSSE3=y
CONFIG_CRYPTO_SHA256_SSSE3=y
# CONFIG_CRYPTO_SHA512_SSSE3 is not set
# CONFIG_CRYPTO_SM3_AVX_X86_64 is not set
CONFIG_CRYPTO_GHASH_CLMUL_NI_INTEL=y
CONFIG_CRYPTO_CRC32C_INTEL=y
CONFIG_CRYPTO_CRC32_PCLMUL=y
# CONFIG_CRYPTO_CRCT10DIF_PCLMUL is not set
# end of Accelerated Cryptographic Algorithms for CPU (x86)

# CONFIG_CRYPTO_HW is not set
CONFIG_ASYMMETRIC_KEY_TYPE=y
CONFIG_ASYMMETRIC_PUBLIC_KEY_SUBTYPE=y
CONFIG_X509_CERTIFICATE_PARSER=y
# CONFIG_PKCS8_PRIVATE_KEY_PARSER is not set
CONFIG_PKCS7_MESSAGE_PARSER=y
# CONFIG_PKCS7_TEST_KEY is not set
# CONFIG_SIGNED_PE_FILE_VERIFICATION is not set
# CONFIG_FIPS_SIGNATURE_SELFTEST is not set

#
# Certificates for signature checking
#
CONFIG_SYSTEM_TRUSTED_KEYRING=y
CONFIG_SYSTEM_TRUSTED_KEYS=""
CONFIG_SYSTEM_EXTRA_CERTIFICATE=y
CONFIG_SYSTEM_EXTRA_CERTIFICATE_SIZE=4096
# CONFIG_SECONDARY_TRUSTED_KEYRING is not set
CONFIG_SYSTEM_BLACKLIST_KEYRING=y
CONFIG_SYSTEM_BLACKLIST_HASH_LIST=""
# CONFIG_SYSTEM_REVOCATION_LIST is not set
# CONFIG_SYSTEM_BLACKLIST_AUTH_UPDATE is not set
# end of Certificates for signature checking

CONFIG_BINARY_PRINTF=y

#
# Library routines
#
CONFIG_LINEAR_RANGES=y
CONFIG_PACKING=y
CONFIG_BITREVERSE=y
CONFIG_GENERIC_STRNCPY_FROM_USER=y
CONFIG_GENERIC_STRNLEN_USER=y
CONFIG_GENERIC_NET_UTILS=y
CONFIG_CORDIC=y
CONFIG_PRIME_NUMBERS=y
CONFIG_RATIONAL=y
CONFIG_GENERIC_PCI_IOMAP=y
CONFIG_GENERIC_IOMAP=y
CONFIG_ARCH_USE_CMPXCHG_LOCKREF=y
CONFIG_ARCH_HAS_FAST_MULTIPLIER=y
CONFIG_ARCH_USE_SYM_ANNOTATIONS=y

#
# Crypto library routines
#
CONFIG_CRYPTO_LIB_UTILS=y
CONFIG_CRYPTO_LIB_AES=y
CONFIG_CRYPTO_LIB_BLAKE2S_GENERIC=y
CONFIG_CRYPTO_ARCH_HAVE_LIB_CHACHA=y
CONFIG_CRYPTO_LIB_CHACHA_GENERIC=y
# CONFIG_CRYPTO_LIB_CHACHA is not set
# CONFIG_CRYPTO_LIB_CURVE25519 is not set
CONFIG_CRYPTO_LIB_DES=y
CONFIG_CRYPTO_LIB_POLY1305_RSIZE=11
CONFIG_CRYPTO_LIB_POLY1305_GENERIC=y
# CONFIG_CRYPTO_LIB_POLY1305 is not set
# CONFIG_CRYPTO_LIB_CHACHA20POLY1305 is not set
CONFIG_CRYPTO_LIB_SHA1=y
CONFIG_CRYPTO_LIB_SHA256=y
# end of Crypto library routines

CONFIG_CRC_CCITT=y
CONFIG_CRC16=y
CONFIG_CRC_T10DIF=y
# CONFIG_CRC64_ROCKSOFT is not set
CONFIG_CRC_ITU_T=y
CONFIG_CRC32=y
# CONFIG_CRC32_SELFTEST is not set
# CONFIG_CRC32_SLICEBY8 is not set
# CONFIG_CRC32_SLICEBY4 is not set
# CONFIG_CRC32_SARWATE is not set
CONFIG_CRC32_BIT=y
# CONFIG_CRC64 is not set
CONFIG_CRC4=y
# CONFIG_CRC7 is not set
CONFIG_LIBCRC32C=y
CONFIG_CRC8=y
CONFIG_XXHASH=y
# CONFIG_RANDOM32_SELFTEST is not set
CONFIG_842_COMPRESS=y
CONFIG_842_DECOMPRESS=y
CONFIG_ZLIB_INFLATE=y
CONFIG_ZLIB_DEFLATE=y
CONFIG_LZO_COMPRESS=y
CONFIG_LZO_DECOMPRESS=y
CONFIG_LZ4_COMPRESS=y
CONFIG_LZ4_DECOMPRESS=y
CONFIG_ZSTD_COMMON=y
CONFIG_ZSTD_COMPRESS=y
CONFIG_ZSTD_DECOMPRESS=y
CONFIG_XZ_DEC=y
# CONFIG_XZ_DEC_X86 is not set
# CONFIG_XZ_DEC_POWERPC is not set
# CONFIG_XZ_DEC_IA64 is not set
CONFIG_XZ_DEC_ARM=y
# CONFIG_XZ_DEC_ARMTHUMB is not set
CONFIG_XZ_DEC_SPARC=y
# CONFIG_XZ_DEC_MICROLZMA is not set
CONFIG_XZ_DEC_BCJ=y
# CONFIG_XZ_DEC_TEST is not set
CONFIG_DECOMPRESS_GZIP=y
CONFIG_DECOMPRESS_BZIP2=y
CONFIG_DECOMPRESS_XZ=y
CONFIG_DECOMPRESS_ZSTD=y
CONFIG_GENERIC_ALLOCATOR=y
CONFIG_REED_SOLOMON=y
CONFIG_REED_SOLOMON_ENC8=y
CONFIG_REED_SOLOMON_DEC8=y
CONFIG_REED_SOLOMON_DEC16=y
CONFIG_BCH=y
CONFIG_INTERVAL_TREE=y
CONFIG_ASSOCIATIVE_ARRAY=y
CONFIG_HAS_IOMEM=y
CONFIG_HAS_IOPORT_MAP=y
CONFIG_HAS_DMA=y
CONFIG_DMA_OPS=y
CONFIG_NEED_SG_DMA_LENGTH=y
CONFIG_NEED_DMA_MAP_STATE=y
CONFIG_ARCH_DMA_ADDR_T_64BIT=y
CONFIG_SWIOTLB=y
CONFIG_DMA_CMA=y
CONFIG_DMA_PERNUMA_CMA=y

#
# Default contiguous memory area size:
#
CONFIG_CMA_SIZE_MBYTES=0
CONFIG_CMA_SIZE_PERCENTAGE=0
# CONFIG_CMA_SIZE_SEL_MBYTES is not set
# CONFIG_CMA_SIZE_SEL_PERCENTAGE is not set
# CONFIG_CMA_SIZE_SEL_MIN is not set
CONFIG_CMA_SIZE_SEL_MAX=y
CONFIG_CMA_ALIGNMENT=8
# CONFIG_DMA_API_DEBUG is not set
CONFIG_DMA_MAP_BENCHMARK=y
CONFIG_SGL_ALLOC=y
CONFIG_DQL=y
CONFIG_GLOB=y
# CONFIG_GLOB_SELFTEST is not set
CONFIG_NLATTR=y
CONFIG_CLZ_TAB=y
# CONFIG_IRQ_POLL is not set
CONFIG_MPILIB=y
CONFIG_OID_REGISTRY=y
CONFIG_UCS2_STRING=y
CONFIG_HAVE_GENERIC_VDSO=y
CONFIG_GENERIC_GETTIMEOFDAY=y
CONFIG_GENERIC_VDSO_TIME_NS=y
CONFIG_FONT_SUPPORT=y
CONFIG_FONT_8x16=y
CONFIG_FONT_AUTOSELECT=y
CONFIG_ARCH_HAS_PMEM_API=y
CONFIG_ARCH_HAS_CPU_CACHE_INVALIDATE_MEMREGION=y
CONFIG_ARCH_HAS_UACCESS_FLUSHCACHE=y
CONFIG_ARCH_HAS_COPY_MC=y
CONFIG_ARCH_STACKWALK=y
CONFIG_STACKDEPOT=y
CONFIG_STACKDEPOT_ALWAYS_INIT=y
CONFIG_REF_TRACKER=y
# end of Library routines

#
# Kernel hacking
#

#
# printk and dmesg options
#
CONFIG_PRINTK_TIME=y
CONFIG_PRINTK_CALLER=y
# CONFIG_STACKTRACE_BUILD_ID is not set
CONFIG_CONSOLE_LOGLEVEL_DEFAULT=7
CONFIG_CONSOLE_LOGLEVEL_QUIET=4
CONFIG_MESSAGE_LOGLEVEL_DEFAULT=4
# CONFIG_BOOT_PRINTK_DELAY is not set
# CONFIG_DYNAMIC_DEBUG is not set
# CONFIG_DYNAMIC_DEBUG_CORE is not set
CONFIG_SYMBOLIC_ERRNAME=y
CONFIG_DEBUG_BUGVERBOSE=y
# end of printk and dmesg options

CONFIG_DEBUG_KERNEL=y
CONFIG_DEBUG_MISC=y

#
# Compile-time checks and compiler options
#
CONFIG_DEBUG_INFO=y
CONFIG_AS_HAS_NON_CONST_LEB128=y
# CONFIG_DEBUG_INFO_NONE is not set
# CONFIG_DEBUG_INFO_DWARF_TOOLCHAIN_DEFAULT is not set
# CONFIG_DEBUG_INFO_DWARF4 is not set
CONFIG_DEBUG_INFO_DWARF5=y
# CONFIG_DEBUG_INFO_REDUCED is not set
# CONFIG_DEBUG_INFO_COMPRESSED is not set
# CONFIG_DEBUG_INFO_SPLIT is not set
CONFIG_DEBUG_INFO_BTF=y
CONFIG_PAHOLE_HAS_SPLIT_BTF=y
CONFIG_DEBUG_INFO_BTF_MODULES=y
# CONFIG_MODULE_ALLOW_BTF_MISMATCH is not set
# CONFIG_GDB_SCRIPTS is not set
CONFIG_FRAME_WARN=8192
# CONFIG_STRIP_ASM_SYMS is not set
CONFIG_READABLE_ASM=y
# CONFIG_HEADERS_INSTALL is not set
CONFIG_DEBUG_SECTION_MISMATCH=y
CONFIG_SECTION_MISMATCH_WARN_ONLY=y
CONFIG_DEBUG_FORCE_FUNCTION_ALIGN_64B=y
CONFIG_OBJTOOL=y
CONFIG_NOINSTR_VALIDATION=y
CONFIG_VMLINUX_MAP=y
# CONFIG_DEBUG_FORCE_WEAK_PER_CPU is not set
# end of Compile-time checks and compiler options

#
# Generic Kernel Debugging Instruments
#
CONFIG_MAGIC_SYSRQ=y
CONFIG_MAGIC_SYSRQ_DEFAULT_ENABLE=0x1
CONFIG_MAGIC_SYSRQ_SERIAL=y
CONFIG_MAGIC_SYSRQ_SERIAL_SEQUENCE=""
CONFIG_DEBUG_FS=y
# CONFIG_DEBUG_FS_ALLOW_ALL is not set
CONFIG_DEBUG_FS_DISALLOW_MOUNT=y
# CONFIG_DEBUG_FS_ALLOW_NONE is not set
CONFIG_HAVE_ARCH_KGDB=y
# CONFIG_KGDB is not set
CONFIG_ARCH_HAS_UBSAN_SANITIZE_ALL=y
CONFIG_UBSAN=y
# CONFIG_UBSAN_TRAP is not set
CONFIG_CC_HAS_UBSAN_BOUNDS=y
CONFIG_UBSAN_BOUNDS=y
CONFIG_UBSAN_ONLY_BOUNDS=y
CONFIG_UBSAN_SHIFT=y
# CONFIG_UBSAN_DIV_ZERO is not set
# CONFIG_UBSAN_BOOL is not set
# CONFIG_UBSAN_ENUM is not set
# CONFIG_UBSAN_ALIGNMENT is not set
CONFIG_UBSAN_SANITIZE_ALL=y
# CONFIG_TEST_UBSAN is not set
CONFIG_HAVE_ARCH_KCSAN=y
CONFIG_HAVE_KCSAN_COMPILER=y
# end of Generic Kernel Debugging Instruments

#
# Networking Debugging
#
CONFIG_NET_DEV_REFCNT_TRACKER=y
CONFIG_NET_NS_REFCNT_TRACKER=y
# CONFIG_DEBUG_NET is not set
# end of Networking Debugging

#
# Memory Debugging
#
CONFIG_PAGE_EXTENSION=y
# CONFIG_DEBUG_PAGEALLOC is not set
CONFIG_SLUB_DEBUG=y
# CONFIG_SLUB_DEBUG_ON is not set
CONFIG_PAGE_OWNER=y
# CONFIG_PAGE_TABLE_CHECK is not set
# CONFIG_PAGE_POISONING is not set
CONFIG_DEBUG_PAGE_REF=y
# CONFIG_DEBUG_RODATA_TEST is not set
CONFIG_ARCH_HAS_DEBUG_WX=y
CONFIG_DEBUG_WX=y
CONFIG_GENERIC_PTDUMP=y
CONFIG_PTDUMP_CORE=y
CONFIG_PTDUMP_DEBUGFS=y
CONFIG_DEBUG_OBJECTS=y
# CONFIG_DEBUG_OBJECTS_SELFTEST is not set
CONFIG_DEBUG_OBJECTS_FREE=y
# CONFIG_DEBUG_OBJECTS_TIMERS is not set
# CONFIG_DEBUG_OBJECTS_WORK is not set
# CONFIG_DEBUG_OBJECTS_RCU_HEAD is not set
# CONFIG_DEBUG_OBJECTS_PERCPU_COUNTER is not set
CONFIG_DEBUG_OBJECTS_ENABLE_DEFAULT=1
# CONFIG_SHRINKER_DEBUG is not set
CONFIG_HAVE_DEBUG_KMEMLEAK=y
# CONFIG_DEBUG_KMEMLEAK is not set
# CONFIG_DEBUG_STACK_USAGE is not set
CONFIG_SCHED_STACK_END_CHECK=y
CONFIG_ARCH_HAS_DEBUG_VM_PGTABLE=y
# CONFIG_DEBUG_VM is not set
CONFIG_DEBUG_VM_PGTABLE=y
CONFIG_ARCH_HAS_DEBUG_VIRTUAL=y
# CONFIG_DEBUG_VIRTUAL is not set
# CONFIG_DEBUG_MEMORY_INIT is not set
CONFIG_ARCH_SUPPORTS_KMAP_LOCAL_FORCE_MAP=y
# CONFIG_DEBUG_KMAP_LOCAL_FORCE_MAP is not set
CONFIG_HAVE_ARCH_KASAN=y
CONFIG_HAVE_ARCH_KASAN_VMALLOC=y
CONFIG_CC_HAS_KASAN_GENERIC=y
CONFIG_CC_HAS_WORKING_NOSANITIZE_ADDRESS=y
CONFIG_KASAN=y
CONFIG_KASAN_GENERIC=y
# CONFIG_KASAN_OUTLINE is not set
CONFIG_KASAN_INLINE=y
CONFIG_KASAN_STACK=y
CONFIG_KASAN_VMALLOC=y
# CONFIG_KASAN_MODULE_TEST is not set
CONFIG_HAVE_ARCH_KFENCE=y
# CONFIG_KFENCE is not set
CONFIG_HAVE_ARCH_KMSAN=y
# end of Memory Debugging

CONFIG_DEBUG_SHIRQ=y

#
# Debug Oops, Lockups and Hangs
#
CONFIG_PANIC_ON_OOPS=y
CONFIG_PANIC_ON_OOPS_VALUE=1
CONFIG_PANIC_TIMEOUT=0
CONFIG_LOCKUP_DETECTOR=y
CONFIG_SOFTLOCKUP_DETECTOR=y
# CONFIG_BOOTPARAM_SOFTLOCKUP_PANIC is not set
CONFIG_HARDLOCKUP_DETECTOR_PERF=y
CONFIG_HARDLOCKUP_CHECK_TIMESTAMP=y
CONFIG_HARDLOCKUP_DETECTOR=y
# CONFIG_BOOTPARAM_HARDLOCKUP_PANIC is not set
CONFIG_DETECT_HUNG_TASK=y
CONFIG_DEFAULT_HUNG_TASK_TIMEOUT=480
CONFIG_BOOTPARAM_HUNG_TASK_PANIC=y
CONFIG_WQ_WATCHDOG=y
# CONFIG_TEST_LOCKUP is not set
# end of Debug Oops, Lockups and Hangs

#
# Scheduler Debugging
#
CONFIG_SCHED_DEBUG=y
CONFIG_SCHED_INFO=y
CONFIG_SCHEDSTATS=y
# end of Scheduler Debugging

CONFIG_DEBUG_TIMEKEEPING=y

#
# Lock Debugging (spinlocks, mutexes, etc...)
#
CONFIG_LOCK_DEBUGGING_SUPPORT=y
CONFIG_PROVE_LOCKING=y
# CONFIG_PROVE_RAW_LOCK_NESTING is not set
# CONFIG_LOCK_STAT is not set
CONFIG_DEBUG_RT_MUTEXES=y
CONFIG_DEBUG_SPINLOCK=y
CONFIG_DEBUG_MUTEXES=y
CONFIG_DEBUG_WW_MUTEX_SLOWPATH=y
CONFIG_DEBUG_RWSEMS=y
CONFIG_DEBUG_LOCK_ALLOC=y
CONFIG_LOCKDEP=y
CONFIG_LOCKDEP_BITS=15
CONFIG_LOCKDEP_CHAINS_BITS=16
CONFIG_LOCKDEP_STACK_TRACE_BITS=19
CONFIG_LOCKDEP_STACK_TRACE_HASH_BITS=14
CONFIG_LOCKDEP_CIRCULAR_QUEUE_BITS=12
# CONFIG_DEBUG_LOCKDEP is not set
CONFIG_DEBUG_ATOMIC_SLEEP=y
# CONFIG_DEBUG_LOCKING_API_SELFTESTS is not set
CONFIG_LOCK_TORTURE_TEST=m
# CONFIG_WW_MUTEX_SELFTEST is not set
# CONFIG_SCF_TORTURE_TEST is not set
# CONFIG_CSD_LOCK_WAIT_DEBUG is not set
# end of Lock Debugging (spinlocks, mutexes, etc...)

CONFIG_TRACE_IRQFLAGS=y
CONFIG_TRACE_IRQFLAGS_NMI=y
CONFIG_DEBUG_IRQFLAGS=y
CONFIG_STACKTRACE=y
# CONFIG_WARN_ALL_UNSEEDED_RANDOM is not set
# CONFIG_DEBUG_KOBJECT is not set

#
# Debug kernel data structures
#
CONFIG_DEBUG_LIST=y
# CONFIG_DEBUG_PLIST is not set
# CONFIG_DEBUG_SG is not set
CONFIG_DEBUG_NOTIFIERS=y
# CONFIG_BUG_ON_DATA_CORRUPTION is not set
# CONFIG_DEBUG_MAPLE_TREE is not set
# end of Debug kernel data structures

CONFIG_DEBUG_CREDENTIALS=y

#
# RCU Debugging
#
CONFIG_PROVE_RCU=y
# CONFIG_PROVE_RCU_LIST is not set
CONFIG_TORTURE_TEST=m
CONFIG_RCU_SCALE_TEST=m
CONFIG_RCU_TORTURE_TEST=m
CONFIG_RCU_REF_SCALE_TEST=m
# CONFIG_RCU_TRACE is not set
CONFIG_RCU_EQS_DEBUG=y
# end of RCU Debugging

CONFIG_DEBUG_WQ_FORCE_RR_CPU=y
CONFIG_LATENCYTOP=y
# CONFIG_DEBUG_CGROUP_REF is not set
CONFIG_USER_STACKTRACE_SUPPORT=y
CONFIG_NOP_TRACER=y
CONFIG_HAVE_RETHOOK=y
CONFIG_RETHOOK=y
CONFIG_HAVE_FUNCTION_TRACER=y
CONFIG_HAVE_DYNAMIC_FTRACE=y
CONFIG_HAVE_DYNAMIC_FTRACE_WITH_REGS=y
CONFIG_HAVE_DYNAMIC_FTRACE_WITH_DIRECT_CALLS=y
CONFIG_HAVE_DYNAMIC_FTRACE_WITH_ARGS=y
CONFIG_HAVE_DYNAMIC_FTRACE_NO_PATCHABLE=y
CONFIG_HAVE_FTRACE_MCOUNT_RECORD=y
CONFIG_HAVE_SYSCALL_TRACEPOINTS=y
CONFIG_HAVE_FENTRY=y
CONFIG_HAVE_OBJTOOL_MCOUNT=y
CONFIG_HAVE_C_RECORDMCOUNT=y
CONFIG_HAVE_BUILDTIME_MCOUNT_SORT=y
CONFIG_TRACE_CLOCK=y
CONFIG_RING_BUFFER=y
CONFIG_EVENT_TRACING=y
CONFIG_CONTEXT_SWITCH_TRACER=y
CONFIG_PREEMPTIRQ_TRACEPOINTS=y
CONFIG_TRACING=y
CONFIG_TRACING_SUPPORT=y
CONFIG_FTRACE=y
# CONFIG_BOOTTIME_TRACING is not set
# CONFIG_FUNCTION_TRACER is not set
# CONFIG_STACK_TRACER is not set
# CONFIG_IRQSOFF_TRACER is not set
# CONFIG_SCHED_TRACER is not set
# CONFIG_HWLAT_TRACER is not set
# CONFIG_OSNOISE_TRACER is not set
# CONFIG_TIMERLAT_TRACER is not set
# CONFIG_MMIOTRACE is not set
# CONFIG_ENABLE_DEFAULT_TRACERS is not set
# CONFIG_FTRACE_SYSCALLS is not set
# CONFIG_TRACER_SNAPSHOT is not set
CONFIG_BRANCH_PROFILE_NONE=y
# CONFIG_PROFILE_ANNOTATED_BRANCHES is not set
CONFIG_KPROBE_EVENTS=y
CONFIG_UPROBE_EVENTS=y
CONFIG_BPF_EVENTS=y
CONFIG_DYNAMIC_EVENTS=y
CONFIG_PROBE_EVENTS=y
CONFIG_BPF_KPROBE_OVERRIDE=y
# CONFIG_SYNTH_EVENTS is not set
# CONFIG_HIST_TRIGGERS is not set
# CONFIG_TRACE_EVENT_INJECT is not set
# CONFIG_TRACEPOINT_BENCHMARK is not set
# CONFIG_RING_BUFFER_BENCHMARK is not set
# CONFIG_TRACE_EVAL_MAP_FILE is not set
# CONFIG_GCOV_PROFILE_FTRACE is not set
# CONFIG_RING_BUFFER_STARTUP_TEST is not set
# CONFIG_RING_BUFFER_VALIDATE_TIME_DELTAS is not set
# CONFIG_PREEMPTIRQ_DELAY_TEST is not set
# CONFIG_KPROBE_EVENT_GEN_TEST is not set
# CONFIG_RV is not set
# CONFIG_PROVIDE_OHCI1394_DMA_INIT is not set
CONFIG_SAMPLES=y
CONFIG_SAMPLE_AUXDISPLAY=y
# CONFIG_SAMPLE_TRACE_EVENTS is not set
# CONFIG_SAMPLE_TRACE_CUSTOM_EVENTS is not set
# CONFIG_SAMPLE_TRACE_PRINTK is not set
# CONFIG_SAMPLE_TRACE_ARRAY is not set
CONFIG_SAMPLE_KOBJECT=y
# CONFIG_SAMPLE_KPROBES is not set
# CONFIG_SAMPLE_HW_BREAKPOINT is not set
# CONFIG_SAMPLE_KFIFO is not set
# CONFIG_SAMPLE_CONFIGFS is not set
CONFIG_SAMPLE_WATCHDOG=y
CONFIG_HAVE_SAMPLE_FTRACE_DIRECT=y
CONFIG_HAVE_SAMPLE_FTRACE_DIRECT_MULTI=y
CONFIG_ARCH_HAS_DEVMEM_IS_ALLOWED=y

#
# x86 Debugging
#
CONFIG_EARLY_PRINTK_USB=y
CONFIG_X86_VERBOSE_BOOTUP=y
CONFIG_EARLY_PRINTK=y
CONFIG_EARLY_PRINTK_DBGP=y
CONFIG_EARLY_PRINTK_USB_XDBC=y
CONFIG_EFI_PGT_DUMP=y
CONFIG_DEBUG_TLBFLUSH=y
CONFIG_HAVE_MMIOTRACE_SUPPORT=y
# CONFIG_X86_DECODER_SELFTEST is not set
# CONFIG_IO_DELAY_0X80 is not set
# CONFIG_IO_DELAY_0XED is not set
# CONFIG_IO_DELAY_UDELAY is not set
CONFIG_IO_DELAY_NONE=y
# CONFIG_DEBUG_BOOT_PARAMS is not set
# CONFIG_CPA_DEBUG is not set
CONFIG_DEBUG_ENTRY=y
# CONFIG_DEBUG_NMI_SELFTEST is not set
# CONFIG_X86_DEBUG_FPU is not set
# CONFIG_PUNIT_ATOM_DEBUG is not set
CONFIG_UNWINDER_ORC=y
# CONFIG_UNWINDER_FRAME_POINTER is not set
# end of x86 Debugging

#
# Kernel Testing and Coverage
#
# CONFIG_KUNIT is not set
# CONFIG_NOTIFIER_ERROR_INJECTION is not set
CONFIG_FUNCTION_ERROR_INJECTION=y
# CONFIG_FAULT_INJECTION is not set
CONFIG_ARCH_HAS_KCOV=y
CONFIG_CC_HAS_SANCOV_TRACE_PC=y
CONFIG_KCOV=y
CONFIG_KCOV_ENABLE_COMPARISONS=y
CONFIG_KCOV_INSTRUMENT_ALL=y
CONFIG_KCOV_IRQ_AREA_SIZE=0x40000
# CONFIG_RUNTIME_TESTING_MENU is not set
CONFIG_ARCH_USE_MEMTEST=y
CONFIG_MEMTEST=y
# end of Kernel Testing and Coverage

#
# Rust hacking
#
# end of Rust hacking
# end of Kernel hacking

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v2 bpf-next 11/13] libbpf: Make BTF mandatory if program BTF has spin_lock or alloc_obj type
  2022-12-17  8:25 ` [PATCH v2 bpf-next 11/13] libbpf: Make BTF mandatory if program BTF has spin_lock or alloc_obj type Dave Marchevsky
@ 2022-12-22 18:50   ` Andrii Nakryiko
  0 siblings, 0 replies; 38+ messages in thread
From: Andrii Nakryiko @ 2022-12-22 18:50 UTC (permalink / raw)
  To: Dave Marchevsky
  Cc: bpf, Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Kernel Team, Kumar Kartikeya Dwivedi, Tejun Heo

On Sat, Dec 17, 2022 at 12:25 AM Dave Marchevsky <davemarchevsky@fb.com> wrote:
>
> If a BPF program defines a struct or union type which has a field type
> that the verifier considers special - spin_lock, graph datastructure
> heads and nodes - the verifier needs to be able to find fields of that
> type using BTF.
>
> For such a program, BTF is required, so modify kernel_needs_btf helper
> to ensure that correct "BTF is mandatory" error message is emitted.
>
> The newly-added btf_has_alloc_obj_type looks for BTF_KIND_STRUCTs with a
> name corresponding to a special type. If any such struct is found it is
> assumed that some variable is using it, and therefore that successful
> BTF load is necessary.
>
> Also add a kernel_needs_btf check to bpf_object__create_map where it was
> previously missing. When this function calls bpf_map_create, kernel may
> reject map creation due to mismatched graph owner and ownee
> types (e.g. a struct bpf_list_head with __contains tag pointing to
> bpf_rbtree_node field). In such a scenario - or any other where BTF is
> necessary for verification - bpf_map_create should not be retried
> without BTF.
>
> Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
> ---
>  tools/lib/bpf/libbpf.c | 50 ++++++++++++++++++++++++++++++++----------
>  1 file changed, 39 insertions(+), 11 deletions(-)
>
> diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
> index 2a82f49ce16f..56a905b502c9 100644
> --- a/tools/lib/bpf/libbpf.c
> +++ b/tools/lib/bpf/libbpf.c
> @@ -998,6 +998,31 @@ find_struct_ops_kern_types(const struct btf *btf, const char *tname,
>         return 0;
>  }
>
> +/* Should match alloc_obj_fields in kernel/bpf/btf.c
> + */

nit: keep comment on a single line?

> +static const char *alloc_obj_fields[] = {
> +       "bpf_spin_lock",
> +       "bpf_list_head",
> +       "bpf_list_node",
> +       "bpf_rb_root",
> +       "bpf_rb_node",
> +};
> +
> +static bool
> +btf_has_alloc_obj_type(const struct btf *btf)

I find "alloc_obj_type" naming completely unhelpful, tbh. Let's use
something more generic and unassuming as "special_btf_type" or
something along those lines?

> +{
> +       const char *tname;
> +       int i;
> +
> +       for (i = 0; i < ARRAY_SIZE(alloc_obj_fields); i++) {
> +               tname = alloc_obj_fields[i];
> +               if (btf__find_by_name_kind(btf, tname, BTF_KIND_STRUCT) > 0)

this will do a linear search over the entire program's BTF for each
alloc_obj_fields element. Given that alloc_obj_fields is supposed to be
a small array, I think it's better to do a single linear pass over the
prog BTF and, for each STRUCT found, check if its name matches an
alloc_obj_fields entry.
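
Something like this, as a rough and untested sketch (it reuses the
alloc_obj_fields[] table from the patch and libbpf's existing BTF
accessors; error handling is elided):

static bool btf_has_alloc_obj_type(const struct btf *btf)
{
	const struct btf_type *t;
	const char *tname;
	int i, j, n;

	/* single pass over all BTF types in the program */
	n = btf__type_cnt(btf);
	for (i = 1; i < n; i++) {
		t = btf__type_by_id(btf, i);
		if (!btf_is_struct(t))
			continue;

		tname = btf__name_by_offset(btf, t->name_off);
		if (!tname)
			continue;

		/* small table, so the inner loop stays cheap */
		for (j = 0; j < ARRAY_SIZE(alloc_obj_fields); j++) {
			if (strcmp(tname, alloc_obj_fields[j]) == 0)
				return true;
		}
	}

	return false;
}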

Having said that, it feels like the better logic would be to check
whether any map value's BTF (including global var ARRAYs) has a field
of one of those special types. Just searching for any STRUCT type with
one of those names feels off.

> +                       return true;
> +       }
> +
> +       return false;
> +}
> +
>  static bool bpf_map__is_struct_ops(const struct bpf_map *map)
>  {
>         return map->def.type == BPF_MAP_TYPE_STRUCT_OPS;
> @@ -2794,7 +2819,8 @@ static bool libbpf_needs_btf(const struct bpf_object *obj)
>
>  static bool kernel_needs_btf(const struct bpf_object *obj)
>  {
> -       return obj->efile.st_ops_shndx >= 0;
> +       return obj->efile.st_ops_shndx >= 0 ||
> +               (obj->btf && btf_has_alloc_obj_type(obj->btf));
>  }
>
>  static int bpf_object__init_btf(struct bpf_object *obj,
> @@ -5103,16 +5129,18 @@ static int bpf_object__create_map(struct bpf_object *obj, struct bpf_map *map, b
>
>                 err = -errno;
>                 cp = libbpf_strerror_r(err, errmsg, sizeof(errmsg));
> -               pr_warn("Error in bpf_create_map_xattr(%s):%s(%d). Retrying without BTF.\n",
> -                       map->name, cp, err);
> -               create_attr.btf_fd = 0;
> -               create_attr.btf_key_type_id = 0;
> -               create_attr.btf_value_type_id = 0;
> -               map->btf_key_type_id = 0;
> -               map->btf_value_type_id = 0;
> -               map->fd = bpf_map_create(def->type, map_name,
> -                                        def->key_size, def->value_size,
> -                                        def->max_entries, &create_attr);
> +               pr_warn("Error in bpf_create_map_xattr(%s):%s(%d).\n", map->name, cp, err);
> +               if (!kernel_needs_btf(obj)) {

see above about checking whether a map's value BTF itself is using any
of the special types. I think this decision should be made based on the
particular map's need for BTF, not based on kernel_needs_btf().

I think it would be better to have an if/else with different
pr_warn()s. Both should report that the initial bpf_map_create() (btw,
gotta update the message now, missed that) failed with an error, but
then in one case say that we are retrying without BTF, and in the other
explain that we are not retrying because the map requires the kernel to
see its BTF. WDYT?
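
Something along these lines (sketch only; map_needs_btf() is a
hypothetical per-map check in the spirit of the above, and the warning
messages are just illustrative):

cp = libbpf_strerror_r(err, errmsg, sizeof(errmsg));
if (map_needs_btf(obj, map)) {
	/* hypothetical per-map check, not the object-wide kernel_needs_btf() */
	pr_warn("map '%s': failed to create: %s(%d); not retrying without BTF, kernel must see this map's BTF\n",
		map->name, cp, err);
} else {
	pr_warn("map '%s': failed to create: %s(%d); retrying without BTF\n",
		map->name, cp, err);
	create_attr.btf_fd = 0;
	create_attr.btf_key_type_id = 0;
	create_attr.btf_value_type_id = 0;
	map->btf_key_type_id = 0;
	map->btf_value_type_id = 0;
	map->fd = bpf_map_create(def->type, map_name,
				 def->key_size, def->value_size,
				 def->max_entries, &create_attr);
}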

> +                       pr_warn("Retrying bpf_map_create_xattr(%s) without BTF.\n", map->name);
> +                       create_attr.btf_fd = 0;
> +                       create_attr.btf_key_type_id = 0;
> +                       create_attr.btf_value_type_id = 0;
> +                       map->btf_key_type_id = 0;
> +                       map->btf_value_type_id = 0;
> +                       map->fd = bpf_map_create(def->type, map_name,
> +                                                def->key_size, def->value_size,
> +                                                def->max_entries, &create_attr);
> +               }
>         }
>
>         err = map->fd < 0 ? -errno : 0;
> --
> 2.30.2
>

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v2 bpf-next 02/13] bpf: Migrate release_on_unlock logic to non-owning ref semantics
@ 2022-12-23 10:51 ` Dan Carpenter
  0 siblings, 0 replies; 38+ messages in thread
From: Dan Carpenter @ 2022-12-23 10:51 UTC (permalink / raw)
  To: oe-kbuild, Dave Marchevsky, bpf
  Cc: lkp, oe-kbuild-all, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, Kernel Team, Kumar Kartikeya Dwivedi, Tejun Heo,
	Dave Marchevsky

Hi Dave,

url:    https://github.com/intel-lab-lkp/linux/commits/Dave-Marchevsky/BPF-rbtree-next-gen-datastructure/20221217-162646
base:   https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git master
patch link:    https://lore.kernel.org/r/20221217082506.1570898-3-davemarchevsky%40fb.com
patch subject: [PATCH v2 bpf-next 02/13] bpf: Migrate release_on_unlock logic to non-owning ref semantics
config: x86_64-randconfig-m001
compiler: gcc-11 (Debian 11.3.0-8) 11.3.0

If you fix the issue, kindly add following tag where applicable
| Reported-by: kernel test robot <lkp@intel.com>
| Reported-by: Dan Carpenter <error27@gmail.com>

smatch warnings:
kernel/bpf/verifier.c:6275 reg_find_field_offset() warn: variable dereferenced before check 'reg' (see line 6274)

vim +/reg +6275 kernel/bpf/verifier.c

4ed17b8d6842ba Dave Marchevsky 2022-12-17  6268  static struct btf_field *
4ed17b8d6842ba Dave Marchevsky 2022-12-17  6269  reg_find_field_offset(const struct bpf_reg_state *reg, s32 off, u32 fields)
4ed17b8d6842ba Dave Marchevsky 2022-12-17  6270  {
4ed17b8d6842ba Dave Marchevsky 2022-12-17  6271  	struct btf_field *field;
4ed17b8d6842ba Dave Marchevsky 2022-12-17  6272  	struct btf_record *rec;
4ed17b8d6842ba Dave Marchevsky 2022-12-17  6273  
4ed17b8d6842ba Dave Marchevsky 2022-12-17 @6274  	rec = reg_btf_record(reg);
4ed17b8d6842ba Dave Marchevsky 2022-12-17 @6275  	if (!reg)

Is this supposed to test rec instead of reg?

4ed17b8d6842ba Dave Marchevsky 2022-12-17  6276  		return NULL;
4ed17b8d6842ba Dave Marchevsky 2022-12-17  6277  
4ed17b8d6842ba Dave Marchevsky 2022-12-17  6278  	field = btf_record_find(rec, off, fields);
4ed17b8d6842ba Dave Marchevsky 2022-12-17  6279  	if (!field)
4ed17b8d6842ba Dave Marchevsky 2022-12-17  6280  		return NULL;
4ed17b8d6842ba Dave Marchevsky 2022-12-17  6281  
4ed17b8d6842ba Dave Marchevsky 2022-12-17  6282  	return field;
4ed17b8d6842ba Dave Marchevsky 2022-12-17  6283  }
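
Presumably the intent was to test the value just computed, i.e.
something like this (sketch of the likely fix, slightly condensed,
keeping the surrounding helpers as-is):

static struct btf_field *
reg_find_field_offset(const struct bpf_reg_state *reg, s32 off, u32 fields)
{
	struct btf_record *rec;

	rec = reg_btf_record(reg);
	if (!rec)	/* was: if (!reg) */
		return NULL;

	return btf_record_find(rec, off, fields);
}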

-- 
0-DAY CI Kernel Test Service
https://01.org/lkp


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v2 bpf-next 13/13] bpf, documentation: Add graph documentation for non-owning refs
  2022-12-17  8:25 ` [PATCH v2 bpf-next 13/13] bpf, documentation: Add graph documentation for non-owning refs Dave Marchevsky
@ 2022-12-28 21:26   ` David Vernet
  2023-01-18  2:16     ` Dave Marchevsky
  0 siblings, 1 reply; 38+ messages in thread
From: David Vernet @ 2022-12-28 21:26 UTC (permalink / raw)
  To: Dave Marchevsky
  Cc: bpf, Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Kernel Team, Kumar Kartikeya Dwivedi, Tejun Heo

On Sat, Dec 17, 2022 at 12:25:06AM -0800, Dave Marchevsky wrote:
> It is difficult to intuit the semantics of owning and non-owning
> references from verifier code. In order to keep the high-level details
> from being lost in the mailing list, this patch adds documentation
> explaining semantics and details.
> 
> The target audience of doc added in this patch is folks working on BPF
> internals, as there's focus on "what should the verifier do here". Via
> reorganization or copy-and-paste, much of the content can probably be
> repurposed for BPF program writer audience as well.
> 
> Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>

Hey Dave,

Thanks for writing this up. I left a few comments and suggestions as a
first pass. Feel free to push back on any of them.

> ---
>  Documentation/bpf/graph_ds_impl.rst | 208 ++++++++++++++++++++++++++++
>  Documentation/bpf/other.rst         |   3 +-
>  2 files changed, 210 insertions(+), 1 deletion(-)
>  create mode 100644 Documentation/bpf/graph_ds_impl.rst
> 
> diff --git a/Documentation/bpf/graph_ds_impl.rst b/Documentation/bpf/graph_ds_impl.rst
> new file mode 100644
> index 000000000000..f92cbd223dc3
> --- /dev/null
> +++ b/Documentation/bpf/graph_ds_impl.rst
> @@ -0,0 +1,208 @@
> +=========================
> +BPF Graph Data Structures
> +=========================
> +
> +This document describes implementation details of new-style "graph" data
> +structures (linked_list, rbtree), with particular focus on verifier

s/with particular/with a particular

> +implementation of semantics particular to those data structures.

s/particular/specific

Just because we already use the word "particular" in the sentence?

In general this sentence feels a bit difficult to parse. Wdyt about
this?

...with a particular focus on how the verifier ensures that they are
properly and safely used by BPF programs.

> +
> +Note that the intent of this document is to describe the current state of
> +these graph data structures, **no guarantees** of stability for either

I think we can end the sentence in the middle here.

...these graph data structures. **No guarantees**...

Should we also add a sentence or two here about the intended audience
(people working on the verifier or readers who are interested in
learning more about BPF internals)?

> +semantics or APIs are made or implied here.
> +
> +.. contents::
> +    :local:
> +    :depth: 2
> +
> +Introduction
> +------------
> +
> +The BPF map API has historically been the main way to expose data structures
> +of various types for use within BPF programs. Some data structures fit naturally
> +with the map API (HASH, ARRAY), others less so. Consequentially, programs

Would you mind please adding some details on why some data structures
don't fit naturally into the existing map APIs? I feel like that's kind
of the main focus of the article, so it would probably help to give some
high-level context up front.

> +interacting with the latter group of data structures can be hard to parse
> +for kernel programmers without previous BPF experience.

I'm not sure I quite follow how this latter point about data structures
being hard to parse is derived from the point about how some data
structures don't fit naturally with the map APIs. Maybe we should say
something like:

..., others less so. Given that the API surface and behavioral semantics
are fundamentally different between these two classes of BPF data
structures, kernel programmers who are used to interacting with map-type
data structures may find these graph-type data structures to be
confusing or unfamiliar.

Wdyt?

> +
> +Luckily, some restrictions which necessitated the use of BPF map semantics are
> +no longer relevant. With the introduction of kfuncs, kptrs, and the any-context
> +BPF allocator, it is now possible to implement BPF data structures whose API
> +and semantics more closely match those exposed to the rest of the kernel.

Suggestion, I'd consider explicitly contrasting the map-type
implementation here with the graph-type implementation. What do you
think of something like this instead of the above paragraph:

BPF map-type data structures are defined as part of the UAPI in ``enum
bpf_map_type``, and are accessed and manipulated using BPF
:doc:`helpers`. The behaviors, backing memory, and implementations of
these map-type data structures are entirely encapsulated from BPF
programs, and mostly encapsulated from the verifier, by the helper
functions. The logic in the verifier for ensuring that map-type data
structures are correctly used therefore essentially amounts to
statically verifying that the helper functions that manipulate and
access the data structure are called correctly by the program, as
defined in the helper prototypes. The verifier then relies on the helper
to properly manipulate the backing data structure with its validated
arguments.

BPF graph-type data structures, on the other hand, leverage more modern
features such as :doc:`kfuncs`, kptrs, and the any-context BPF
allocator. They allow BPF programs to manipulate the data structures
directly using APIs and semantics which more closely match those exposed
to code in the main kernel, with the verifier's job now being to ensure
that the programs are properly manipulating the data structures, rather
than relying on helper functions to properly manipulate the data
structures in the main kernel.
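
Purely to make that contrast concrete (this is not wording proposed for
the doc; it assumes the usual vmlinux.h / bpf_helpers.h boilerplate, and
the map/program names are made up):

/* map-type: all data structure behavior sits behind the helper call */
struct {
	__uint(type, BPF_MAP_TYPE_HASH);
	__uint(max_entries, 16);
	__type(key, __u32);
	__type(value, __u64);
} counts SEC(".maps");

SEC("tc")
int map_style(struct __sk_buff *ctx)
{
	__u32 key = 0;
	__u64 *v = bpf_map_lookup_elem(&counts, &key);

	if (v)
		__sync_fetch_and_add(v, 1);
	return 0;
}

/* graph-type: the program itself allocates, links, and frees nodes with
 * kfuncs (bpf_obj_new, bpf_list_push_back, bpf_obj_drop, ...) under the
 * associated bpf_spin_lock, and the verifier checks that usage directly.
 */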

> +
> +Two such data structures - linked_list and rbtree - have many verification
> +details in common. Because both have "root"s ("head" for linked_list) and
> +"node"s, the verifier code and this document refer to common functionality
> +as "graph_api", "graph_root", "graph_node", etc.



> +
> +Unless otherwise stated, examples and semantics below apply to both graph data
> +structures.
> +
> +Non-owning references
> +---------------------
> +
> +**Motivation**
> +
> +Consider the following BPF code:
> +
> +.. code-block:: c

You need an extra newline here or the docs build will complain:

bpf-next/Documentation/bpf/graph_ds_impl.rst:46: ERROR: Error in "code-block" directive:
maximum 1 argument(s) allowed, 9 supplied.

.. code-block:: c
        struct node_data *n = bpf_obj_new(typeof(*n)); /* BEFORE */

        bpf_spin_lock(&lock);

        bpf_rbtree_add(&tree, n); /* AFTER */

        bpf_spin_unlock(&lock);

> +        struct node_data *n = bpf_obj_new(typeof(*n)); /* BEFORE */
> +
> +        bpf_spin_lock(&lock);
> +
> +        bpf_rbtree_add(&tree, n); /* AFTER */
> +
> +        bpf_spin_unlock(&lock);

Also need a newline here or sphinx will get confused and think the
vertical line is part of the code block.

> +----
> +
> +From the verifier's perspective, after bpf_obj_new ``n`` has type
> +``PTR_TO_BTF_ID | MEM_ALLOC`` with btf_id of ``struct node_data`` and a
> +nonzero ``ref_obj_id``. Because it holds ``n``, the program has ownership

I had to read this first sentence a few times to parse it, maybe due to
a missing comma between "after bpf_obj_new" and "``n`` has type...".
What do you think about this wording?

From the verifier's perspective, the pointer ``n`` returned from
``bpf_obj_new`` has type ``PTR_TO_BTF_ID | MEM_ALLOC``, with a `btf_id`
of ``struct node_data``, and a nonzero ``ref_obj_id``.

> +of the pointee's lifetime (object pointed to by ``n``). The BPF program must

Should we move (object pointed to by ``n``) to be directly after
"pointee's" / before "lifetime"? Otherwise it reads kind of odd given
that "lifetime" is really the indirect object in the sentence.

> +pass off ownership before exiting - either via ``bpf_obj_drop``, which free's

s/free's/frees

> +the object, or by adding it to ``tree`` with ``bpf_rbtree_add``.
> +
> +(``BEFORE`` and ``AFTER`` comments in the example denote beginning of "before
> +ownership is passed" and "after ownership is passed")

Should we use something like ACQUIRED / PASSED / RELEASED instead of
BEFORE / AFTER?

> +
> +What should the verifier do with ``n`` after ownership is passed off? If the
> +object was free'd with ``bpf_obj_drop`` the answer is obvious: the verifier

s/free'd/freed

> +should reject programs which attempt to access ``n`` after ``bpf_obj_drop`` as
> +the object is no longer valid. The underlying memory may have been reused for
> +some other allocation, unmapped, etc.
> +
> +When ownership is passed to ``tree`` via ``bpf_rbtree_add`` the answer is less
> +obvious. The verifier could enforce the same semantics as for ``bpf_obj_drop``,
> +but that would result in programs with useful, common coding patterns being
> +rejected, e.g.:
> +
> +.. code-block:: c

Same here (newline)

> +        int x;
> +        struct node_data *n = bpf_obj_new(typeof(*n)); /* BEFORE */
> +
> +        bpf_spin_lock(&lock);
> +
> +        bpf_rbtree_add(&tree, n); /* AFTER */
> +        x = n->data;
> +        n->data = 42;
> +
> +        bpf_spin_unlock(&lock);

Same here (newline)

> +----
> +
> +Both the read from and write to ``n->data`` would be rejected. The verifier
> +can do better, though, by taking advantage of two details:
> +
> +  * Graph data structure APIs can only be used when the ``bpf_spin_lock``
> +    associated with the graph root is held

I'd consider giving a bit more background information on this somewhere
above. This is the first time we've mentioned anything about a lock, so
it might be worth it to give some context on how these graph-type maps
are defined and initialized.

I realize we could be approaching "useful even to people who aren't
working on the verifier" territory if we go into too much detail, but I
also think it's important to give background context on this stuff
regardless of the intended audience in order for the documentation to
really be useful.

> +  * Both graph data structures have pointer stability

You also need a newline between nested list entries or sphinx will get
confused. My suggestion would be to just always have a newline between
list entries (applies elsewhere in the file as well).

> +    * Because graph nodes are allocated with ``bpf_obj_new`` and
> +      adding / removing from the root involves fiddling with the
> +      ``bpf_{list,rb}_node`` field of the node struct, a graph node will
> +      remain at the same address after either operation.
> +
> +Because the associated ``bpf_spin_lock`` must be held by any program adding
> +or removing, if we're in the critical section bounded by that lock, we know
> +that no other program can add or remove until the end of the critical section.
> +This combined with pointer stability means that, until the critical section
> +ends, we can safely access the graph node through ``n`` even after it was used
> +to pass ownership.
> +
> +The verifier considers such a reference a *non-owning reference*. The ref
> +returned by ``bpf_obj_new`` is accordingly considered an *owning reference*.
> +Both terms currently only have meaning in the context of graph nodes and API.
> +
> +**Details**
> +
> +Let's enumerate the properties of both types of references.
> +
> +*owning reference*
> +
> +  * This reference controls the lifetime of the pointee
> +  * Ownership of pointee must be 'released' by passing it to some graph API
> +    kfunc, or via ``bpf_obj_drop``, which free's the pointee

s/free's/frees. "Frees" is a verb, "free's" is a possessive.

> +    * If not released before program ends, verifier considers program invalid
> +  * Access to the pointee's memory will not page fault
> +
> +*non-owning reference*
> +
> +  * This reference does not own the pointee
> +    * It cannot be used to add the graph node to a graph root, nor free via
> +      ``bpf_obj_drop``
> +  * No explicit control of lifetime, but can infer valid lifetime based on
> +    non-owning ref existence (see explanation below)
> +  * Access to the pointee's memory will not page fault

I'd consider defining references, or at least giving some high-level
description of how they work, somewhere a bit earlier in the page. The
"Non-owning references" section kind of just jumps right into examples
of what the verifier allows without describing the concept at a higher
level, so readers will have a difficult time applying what they're
reading to the examples being provided.

> +
> +From verifier's perspective non-owning references can only exist
> +between spin_lock and spin_unlock. Why? After spin_unlock another program
> +can do arbitrary operations on the data structure like removing and free-ing

s/free-ing/freeing

> +via bpf_obj_drop. A non-owning ref to some chunk of memory that was remove'd,

s/remove'd/removed

I'll stop pointing these out for now, they apply throughout the page.

> +free'd, and reused via bpf_obj_new would point to an entirely different thing.
> +Or the memory could go away.
> +
> +To prevent this logic violation all non-owning references are invalidated by
> +verifier after critical section ends. This is necessary to ensure "will

- s/by verifier/by the verifier
- s/after critical section/after a critical section
- s/to ensure "will not"/to ensure a "will not"


> +not page fault" property of non-owning reference. So if verifier hasn't

- s/of non-owning/of the non-owning
- s/So if verifier/So if the verifier

> +invalidated a non-owning ref, accessing it will not page fault.
> +
> +Currently ``bpf_obj_drop`` is not allowed in the critical section, so
> +if there's a valid non-owning ref, we must be in critical section, and can

s/in critical section/in a critical section

> +conclude that the ref's memory hasn't been dropped-and-free'd or dropped-
> +and-reused.

If you split the line like this, it will render as "dropped-and- reused".

> +
> +Any reference to a node that is in a rbtree _must_ be non-owning, since

s/a rbtree/an rbtree

> +the tree has control of pointee lifetime. Similarly, any ref to a node

s/of pointee lifetime/of the pointee's lifetime

> +that isn't in rbtree _must_ be owning. This results in a nice property:

s/in rbtree/in an rbtree

> +graph API add / remove implementations don't need to check if a node
> +has already been added (or already removed), as the verifier type system
> +prevents such a state from being valid.

I feel like "verifier type system" isn't quite accurate here, though I
may be wrong. When I think of something like "verifier type system" I'm
more envisioning how the verifier ensures that the correct BTF IDs are
passed. In this case, it's really the BPF graph-object ownership model
that's ensuring that the state is valid, right?

> +
> +However, pointer aliasing poses an issue for the above "nice property".
> +Consider the following example:
> +
> +.. code-block:: c

Same here (newline)

> +        struct node_data *n, *m, *o, *p;
> +        n = bpf_obj_new(typeof(*n));     /* 1 */
> +
> +        bpf_spin_lock(&lock);
> +
> +        bpf_rbtree_add(&tree, n);        /* 2 */
> +        m = bpf_rbtree_first(&tree);     /* 3 */
> +
> +        o = bpf_rbtree_remove(&tree, n); /* 4 */
> +        p = bpf_rbtree_remove(&tree, m); /* 5 */
> +
> +        bpf_spin_unlock(&lock);
> +
> +        bpf_obj_drop(o);
> +        bpf_obj_drop(p); /* 6 */

Same here (newline)

> +----
> +
> +Assume tree is empty before this program runs. If we track verifier state

s/Assume tree/Assume the tree

> +changes here using numbers in above comments:
> +
> +  1) n is an owning reference
> +  2) n is a non-owning reference, it's been added to the tree
> +  3) n and m are non-owning references, they both point to the same node
> +  4) o is an owning reference, n and m non-owning, all point to same node
> +  5) o and p are owning, n and m non-owning, all point to the same node
> +  6) a double-free has occurred, since o and p point to same node and o was
> +     free'd in previous statement
> +
> +States 4 and 5 violate our "nice property", as there are non-owning refs to
> +a node which is not in a rbtree. Statement 5 will try to remove a node which
> +has already been removed as a result of this violation. State 6 is a dangerous
> +double-free.
> +
> +At a minimum we should prevent state 6 from being possible. If we can't also
> +prevent state 5 then we must abandon our "nice property" and check whether a
> +node has already been removed at runtime.
> +
> +We prevent both by generalizing the "invalidate non-owning references" behavior
> +of ``bpf_spin_unlock`` and doing similar invalidation after
> +``bpf_rbtree_remove``. The logic here being that any graph API kfunc which:
> +
> +  * takes an arbitrary node argument
> +  * removes it from the datastructure
> +  * returns an owning reference to the removed node
> +
> +May result in a state where some other non-owning reference points to the same
> +node. So ``remove``-type kfuncs must be considered a non-owning reference
> +invalidation point as well.

Could you please also add the new kfunc flags that signal this to
Documentation/bpf/kfuncs.rst?

> diff --git a/Documentation/bpf/other.rst b/Documentation/bpf/other.rst
> index 3d61963403b4..7e6b12018802 100644
> --- a/Documentation/bpf/other.rst
> +++ b/Documentation/bpf/other.rst
> @@ -6,4 +6,5 @@ Other
>     :maxdepth: 1
>  
>     ringbuf
> -   llvm_reloc
> \ No newline at end of file
> +   llvm_reloc
> +   graph_ds_impl
> -- 
> 2.30.2
> 

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v2 bpf-next 02/13] bpf: Migrate release_on_unlock logic to non-owning ref semantics
  2022-12-17  8:24 ` [PATCH v2 bpf-next 02/13] bpf: Migrate release_on_unlock logic to non-owning ref semantics Dave Marchevsky
  2022-12-17  9:21   ` Dave Marchevsky
@ 2022-12-28 23:46   ` David Vernet
  2022-12-29 15:39     ` David Vernet
  2022-12-29  3:56   ` Alexei Starovoitov
  2 siblings, 1 reply; 38+ messages in thread
From: David Vernet @ 2022-12-28 23:46 UTC (permalink / raw)
  To: Dave Marchevsky
  Cc: bpf, Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Kernel Team, Kumar Kartikeya Dwivedi, Tejun Heo

On Sat, Dec 17, 2022 at 12:24:55AM -0800, Dave Marchevsky wrote:
> This patch introduces non-owning reference semantics to the verifier,
> specifically linked_list API kfunc handling. release_on_unlock logic for
> refs is refactored - with small functional changes - to implement these
> semantics, and bpf_list_push_{front,back} are migrated to use them.
> 
> When a list node is pushed to a list, the program still has a pointer to
> the node:
> 
>   n = bpf_obj_new(typeof(*n));
> 
>   bpf_spin_lock(&l);
>   bpf_list_push_back(&l, n);
>   /* n still points to the just-added node */
>   bpf_spin_unlock(&l);
> 
> What the verifier considers n to be after the push, and thus what can be
> done with n, are changed by this patch.
> 
> Common properties both before/after this patch:
>   * After push, n is only a valid reference to the node until end of
>     critical section
>   * After push, n cannot be pushed to any list
>   * After push, the program can read the node's fields using n
> 
> Before:
>   * After push, n retains the ref_obj_id which it received on
>     bpf_obj_new, but the associated bpf_reference_state's
>     release_on_unlock field is set to true
>     * release_on_unlock field and associated logic is used to implement
>       "n is only a valid ref until end of critical section"
>   * After push, n cannot be written to, the node must be removed from
>     the list before writing to its fields
>   * After push, n is marked PTR_UNTRUSTED
> 
> After:
>   * After push, n's ref is released and ref_obj_id set to 0. The
>     bpf_reg_state's non_owning_ref_lock struct is populated with the
>     currently active lock
>     * non_owning_ref_lock and logic is used to implement "n is only a
>       valid ref until end of critical section"
>   * n can be written to (except for special fields e.g. bpf_list_node,
>     timer, ...)
>   * No special type flag is added to n after push
> 
> Summary of specific implementation changes to achieve the above:
> 
>   * release_on_unlock field, ref_set_release_on_unlock helper, and logic
>     to "release on unlock" based on that field are removed
> 
>   * The anonymous active_lock struct used by bpf_verifier_state is
>     pulled out into a named struct bpf_active_lock.
> 
>   * A non_owning_ref_lock field of type bpf_active_lock is added to
>     bpf_reg_state's PTR_TO_BTF_ID union
> 
>   * Helpers are added to use non_owning_ref_lock to implement non-owning
>     ref semantics as described above
>     * invalidate_non_owning_refs - helper to clobber all non-owning refs
>       matching a particular bpf_active_lock identity. Replaces
>       release_on_unlock logic in process_spin_lock.
>     * ref_set_non_owning_lock - set non_owning_ref_lock for a reg based
>       on current verifier state
>     * ref_convert_owning_non_owning - convert owning reference w/
>       specified ref_obj_id to non-owning references. Setup
>       non_owning_ref_lock for each reg with that ref_obj_id and 0 out
>       its ref_obj_id
> 
>   * New KF_RELEASE_NON_OWN flag is added, to be used in conjunction with
>     KF_RELEASE to indicate that the release arg reg should be converted
>     to non-owning ref
>     * Plain KF_RELEASE would clobber all regs with ref_obj_id matching
>       the release arg reg's. KF_RELEASE_NON_OWN's logic triggers first -
>       doing ref_convert_owning_non_owning on the ref first, which
>       prevents the regs from being clobbered by 0ing out their
>       ref_obj_ids. The bpf_reference_state itself is still released via
>       release_reference as a result of the KF_RELEASE flag.
>     * KF_RELEASE | KF_RELEASE_NON_OWN are added to
>       bpf_list_push_{front,back}
> 
> After these changes, linked_list's "release on unlock" logic continues
> to function as before, except for the semantic differences noted above.
> The patch immediately following this one makes minor changes to
> linked_list selftests to account for the differing behavior.
> 
> Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>

Hey Dave,

I'm sorry to be chiming in a bit late in the game here, but I only
finally had the time to fully review some of this stuff during the
holiday lull, and I have a few questions / concerns about the whole
owning vs. non-owning refcount approach we're taking here.

> ---
>  include/linux/bpf.h          |   1 +
>  include/linux/bpf_verifier.h |  39 ++++-----
>  include/linux/btf.h          |  17 ++--
>  kernel/bpf/helpers.c         |   4 +-
>  kernel/bpf/verifier.c        | 164 ++++++++++++++++++++++++-----------
>  5 files changed, 146 insertions(+), 79 deletions(-)
> 
> diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> index 3de24cfb7a3d..f71571bf6adc 100644
> --- a/include/linux/bpf.h
> +++ b/include/linux/bpf.h
> @@ -180,6 +180,7 @@ enum btf_field_type {
>  	BPF_KPTR       = BPF_KPTR_UNREF | BPF_KPTR_REF,
>  	BPF_LIST_HEAD  = (1 << 4),
>  	BPF_LIST_NODE  = (1 << 5),
> +	BPF_GRAPH_NODE_OR_ROOT = BPF_LIST_NODE | BPF_LIST_HEAD,
>  };
>  
>  struct btf_field_kptr {
> diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h
> index 53d175cbaa02..cb417ffbbb84 100644
> --- a/include/linux/bpf_verifier.h
> +++ b/include/linux/bpf_verifier.h
> @@ -43,6 +43,22 @@ enum bpf_reg_liveness {
>  	REG_LIVE_DONE = 0x8, /* liveness won't be updating this register anymore */
>  };
>  
> +/* For every reg representing a map value or allocated object pointer,
> + * we consider the tuple of (ptr, id) for them to be unique in verifier
> + * context and conside them to not alias each other for the purposes of
> + * tracking lock state.
> + */
> +struct bpf_active_lock {
> +	/* This can either be reg->map_ptr or reg->btf. If ptr is NULL,
> +	 * there's no active lock held, and other fields have no
> +	 * meaning. If non-NULL, it indicates that a lock is held and
> +	 * id member has the reg->id of the register which can be >= 0.
> +	 */
> +	void *ptr;
> +	/* This will be reg->id */
> +	u32 id;
> +};
> +
>  struct bpf_reg_state {
>  	/* Ordering of fields matters.  See states_equal() */
>  	enum bpf_reg_type type;
> @@ -68,6 +84,7 @@ struct bpf_reg_state {
>  		struct {
>  			struct btf *btf;
>  			u32 btf_id;
> +			struct bpf_active_lock non_owning_ref_lock;
>  		};
>  
>  		u32 mem_size; /* for PTR_TO_MEM | PTR_TO_MEM_OR_NULL */
> @@ -223,11 +240,6 @@ struct bpf_reference_state {
>  	 * exiting a callback function.
>  	 */
>  	int callback_ref;
> -	/* Mark the reference state to release the registers sharing the same id
> -	 * on bpf_spin_unlock (for nodes that we will lose ownership to but are
> -	 * safe to access inside the critical section).
> -	 */
> -	bool release_on_unlock;
>  };
>  
>  /* state of the program:
> @@ -328,21 +340,8 @@ struct bpf_verifier_state {
>  	u32 branches;
>  	u32 insn_idx;
>  	u32 curframe;
> -	/* For every reg representing a map value or allocated object pointer,
> -	 * we consider the tuple of (ptr, id) for them to be unique in verifier
> -	 * context and conside them to not alias each other for the purposes of
> -	 * tracking lock state.
> -	 */
> -	struct {
> -		/* This can either be reg->map_ptr or reg->btf. If ptr is NULL,
> -		 * there's no active lock held, and other fields have no
> -		 * meaning. If non-NULL, it indicates that a lock is held and
> -		 * id member has the reg->id of the register which can be >= 0.
> -		 */
> -		void *ptr;
> -		/* This will be reg->id */
> -		u32 id;
> -	} active_lock;
> +
> +	struct bpf_active_lock active_lock;
>  	bool speculative;
>  	bool active_rcu_lock;
>  
> diff --git a/include/linux/btf.h b/include/linux/btf.h
> index 5f628f323442..8aee3f7f4248 100644
> --- a/include/linux/btf.h
> +++ b/include/linux/btf.h
> @@ -15,10 +15,10 @@
>  #define BTF_TYPE_EMIT_ENUM(enum_val) ((void)enum_val)
>  
>  /* These need to be macros, as the expressions are used in assembler input */
> -#define KF_ACQUIRE	(1 << 0) /* kfunc is an acquire function */
> -#define KF_RELEASE	(1 << 1) /* kfunc is a release function */
> -#define KF_RET_NULL	(1 << 2) /* kfunc returns a pointer that may be NULL */
> -#define KF_KPTR_GET	(1 << 3) /* kfunc returns reference to a kptr */
> +#define KF_ACQUIRE		(1 << 0) /* kfunc is an acquire function */
> +#define KF_RELEASE		(1 << 1) /* kfunc is a release function */
> +#define KF_RET_NULL		(1 << 2) /* kfunc returns a pointer that may be NULL */
> +#define KF_KPTR_GET		(1 << 3) /* kfunc returns reference to a kptr */
>  /* Trusted arguments are those which are guaranteed to be valid when passed to
>   * the kfunc. It is used to enforce that pointers obtained from either acquire
>   * kfuncs, or from the main kernel on a tracepoint or struct_ops callback
> @@ -67,10 +67,11 @@
>   *	return 0;
>   * }
>   */
> -#define KF_TRUSTED_ARGS (1 << 4) /* kfunc only takes trusted pointer arguments */
> -#define KF_SLEEPABLE    (1 << 5) /* kfunc may sleep */
> -#define KF_DESTRUCTIVE  (1 << 6) /* kfunc performs destructive actions */
> -#define KF_RCU          (1 << 7) /* kfunc only takes rcu pointer arguments */
> +#define KF_TRUSTED_ARGS	(1 << 4) /* kfunc only takes trusted pointer arguments */
> +#define KF_SLEEPABLE		(1 << 5) /* kfunc may sleep */
> +#define KF_DESTRUCTIVE		(1 << 6) /* kfunc performs destructive actions */
> +#define KF_RCU			(1 << 7) /* kfunc only takes rcu pointer arguments */
> +#define KF_RELEASE_NON_OWN	(1 << 8) /* kfunc converts its referenced arg into non-owning ref */

It would be nice if we could come up with new kfunc flag names that
don't have 'RELEASE' in them. As is, this is arguably a bit of a leaky
abstraction, given that kfunc authors now have to understand a notion of
"releasing", "releasing but keeping a non-owning ref", and "releasing
but it must be a non-owning reference". I know that in [0] you mention
that the notions of owning and non-owning references are entirely
relegated to graph-type maps, but I disagree. More below.

[0]: https://lore.kernel.org/all/20221217082506.1570898-14-davemarchevsky@fb.com/

In general, IMO this muddies the existing, crystal-clear semantics of
BPF object ownership and refcounting. Usually a "weak" or "non-owning"
reference is a shadow of a strong reference, and "using" the weak
reference requires attempting (because it could fail) to temporarily
promote it to a strong reference. If successful, the object's existence
is guaranteed until the weak pointer is demoted back to a weak pointer
and/or the promoted strong pointer is released, and it's perfectly valid
for an object's lifetime to be extended due to a promoted weak pointer
not dropping its reference until after all the other strong pointer
references have been dropped. The key point here is that a pointer's
safety is entirely dictated by whether or not the holder has or is able
to acquire a strong reference, and nothing more.
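
To make the terminology concrete, here is a tiny standalone C sketch of
that "promote weak to strong" step (nothing BPF-specific; freeing and
the weak-count bookkeeping are elided):

#include <stdatomic.h>
#include <stdbool.h>

struct obj {
	atomic_int strong;	/* strong (owning) refcount */
};

/* A weak holder must try to promote before touching the object; the
 * attempt fails if the last strong reference is already gone.
 */
static bool obj_tryget(struct obj *o)
{
	int cur = atomic_load(&o->strong);

	while (cur > 0) {
		if (atomic_compare_exchange_weak(&o->strong, &cur, cur + 1))
			return true;	/* caller now owns a strong reference */
	}
	return false;
}

static void obj_put(struct obj *o)
{
	atomic_fetch_sub(&o->strong, 1);	/* actual freeing elided */
}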

In contrast, if I understand correctly, in this proposal a "non-owning"
reference means that the object is guaranteed to be valid due to
external factors such as a lock being held on the root node of the
graph, and is used to e.g. signal whether an object has or has not yet
been added as a node to an rbtree or a list. If so, IMO these are
completely separate concepts from refcounting, and I don't think we
should intertwine it with the acquire / release semantics that we
currently use for ensuring object lifetime.

Note that weak references are usually (if not always, at least in my
experience) used to resolve circular dependencies where the reference
would always be leaked if both sides had a strong reference. I don't
think that applies here, where instead we're using "owning reference" to
mean that ownership of the object has not yet been passed to a
graph-type data structure, and "non-owning reference" to mean that the
graph now owns the strong reference, but it's still safe to reference
the object due to it being protected by some external synchronization
mechanism like a lock. There's no danger of a circular dependency here,
we just want to provide consistent API semantics.

If we want to encapsulate notions of "safe due to a lock being held on a
root node", and "pointer hasn't yet been inserted into the graph", I
think we should consider adding some entirely separate abstractions. For
example, something like PTR_GRAPH_STORED on the register type-modifier
side for signaling whether a pointer has already been stored in a graph,
and KF_GRAPH_INSERT, KF_GRAPH_REMOVE type kfunc flags for adding and
removing from graphs respectively. I don't think we'd have to add
anything at all for ensuring pointer safety from the lock being held, as
the verifier should be able to figure out that a pointer that was
inserted with KF_GRAPH_INSERT is safe to reference inside of the locked
region of the lock associated with the root node. The refcnt of the
object isn't relevant at all, it's the association of the root node with
a specific lock.
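
As a pure strawman (none of these flags exist today, the names come from
the paragraph above, and the bit values are arbitrary), the declarations
would just mirror the existing KF_* flags:

#define KF_GRAPH_INSERT	(1 << 8) /* kfunc links its node arg into a graph data structure */
#define KF_GRAPH_REMOVE	(1 << 9) /* kfunc unlinks and returns a node from a graph data structure */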

>  
>  /*
>   * Return the name of the passed struct, if exists, or halt the build if for
> diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
> index af30c6cbd65d..e041409779c3 100644
> --- a/kernel/bpf/helpers.c
> +++ b/kernel/bpf/helpers.c
> @@ -2049,8 +2049,8 @@ BTF_ID_FLAGS(func, crash_kexec, KF_DESTRUCTIVE)
>  #endif
>  BTF_ID_FLAGS(func, bpf_obj_new_impl, KF_ACQUIRE | KF_RET_NULL)
>  BTF_ID_FLAGS(func, bpf_obj_drop_impl, KF_RELEASE)
> -BTF_ID_FLAGS(func, bpf_list_push_front)
> -BTF_ID_FLAGS(func, bpf_list_push_back)
> +BTF_ID_FLAGS(func, bpf_list_push_front, KF_RELEASE | KF_RELEASE_NON_OWN)
> +BTF_ID_FLAGS(func, bpf_list_push_back, KF_RELEASE | KF_RELEASE_NON_OWN)

I don't think a helper should specify both of these flags together.
IIUC, what this is saying is something along the lines of, "Release the
reference, but rather than actually releasing it, just keep it and
convert it into a non-owning reference". IMO KF_RELEASE should always
mean, exclusively, "I'm releasing a previously-acquired strong reference
to an object", and the expectation should be that the object cannot be
referenced _at all_ afterwards, unless you happen to have another strong
reference.
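
That's the contract bpf_obj_drop() already follows today. As a trivial
example of what I'd expect KF_RELEASE to mean (struct foo here is just an
arbitrary bpf_obj_new-able type):

  struct foo *f;

  f = bpf_obj_new(typeof(*f));
  if (!f)
  	return 0;

  bpf_obj_drop(f);	/* strong reference released */
  /* f must not be touched past this point; the verifier should reject
   * any further use unless another strong reference is held.
   */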

IMO this is another sign that we should consider going in a different
direction for owning vs.  non-owning references. I don't think this
makes sense from an object-refcounting perspective, but I readily admit
that I could be missing a lot of important context here.

[...]

Thanks,
David

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v2 bpf-next 01/13] bpf: Support multiple arg regs w/ ref_obj_id for kfuncs
  2022-12-17  8:24 ` [PATCH v2 bpf-next 01/13] bpf: Support multiple arg regs w/ ref_obj_id for kfuncs Dave Marchevsky
@ 2022-12-29  3:24   ` Alexei Starovoitov
  2022-12-29  6:40   ` David Vernet
  1 sibling, 0 replies; 38+ messages in thread
From: Alexei Starovoitov @ 2022-12-29  3:24 UTC (permalink / raw)
  To: Dave Marchevsky
  Cc: bpf, Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Kernel Team, Kumar Kartikeya Dwivedi, Tejun Heo

On Sat, Dec 17, 2022 at 12:24:54AM -0800, Dave Marchevsky wrote:
> Currently, kfuncs marked KF_RELEASE indicate that they release some
> previously-acquired arg. The verifier assumes that such a function will
> only have one arg reg w/ ref_obj_id set, and that that arg is the one to
> be released. Multiple kfunc arg regs having ref_obj_id set is considered
> an invalid state.
> 
> For helpers, RELEASE is used to tag a particular arg in the function
> proto, not the function itself. The arg with OBJ_RELEASE type tag is the
> arg that the helper will release. There can only be one such tagged arg.
> When verifying arg regs, multiple helper arg regs w/ ref_obj_id set is
> also considered an invalid state.
> 
> Later patches in this series will result in some linked_list helpers
> marked KF_RELEASE having a valid reason to take two ref_obj_id args.
> Specifically, bpf_list_push_{front,back} can push a node to a list head
> which is itself part of a list node. In such a scenario both arguments
> to these functions would have ref_obj_id > 0, thus would fail
> verification under current logic.

Well, I think this patch is unnecessary, because there is really no need
to mark list_push as KF_RELEASE.
The verifier still has to do custom checks for both arguments:
list_node and list_head.
They are different enough. The 'generalization' via
KF_RELEASE | KF_RELEASE_NON_OWN is quite confusing.
Especially considering how the register is being picked: 1st vs 2nd.
More details on this in the other reply to patch 2.

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v2 bpf-next 02/13] bpf: Migrate release_on_unlock logic to non-owning ref semantics
  2022-12-17  8:24 ` [PATCH v2 bpf-next 02/13] bpf: Migrate release_on_unlock logic to non-owning ref semantics Dave Marchevsky
  2022-12-17  9:21   ` Dave Marchevsky
  2022-12-28 23:46   ` David Vernet
@ 2022-12-29  3:56   ` Alexei Starovoitov
  2022-12-29 16:54     ` David Vernet
  2023-01-17 16:07     ` Dave Marchevsky
  2 siblings, 2 replies; 38+ messages in thread
From: Alexei Starovoitov @ 2022-12-29  3:56 UTC (permalink / raw)
  To: Dave Marchevsky
  Cc: bpf, Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Kernel Team, Kumar Kartikeya Dwivedi, Tejun Heo

On Sat, Dec 17, 2022 at 12:24:55AM -0800, Dave Marchevsky wrote:
> This patch introduces non-owning reference semantics to the verifier,
> specifically linked_list API kfunc handling. release_on_unlock logic for
> refs is refactored - with small functional changes - to implement these
> semantics, and bpf_list_push_{front,back} are migrated to use them.
> 
> When a list node is pushed to a list, the program still has a pointer to
> the node:
> 
>   n = bpf_obj_new(typeof(*n));
> 
>   bpf_spin_lock(&l);
>   bpf_list_push_back(&l, n);
>   /* n still points to the just-added node */
>   bpf_spin_unlock(&l);
> 
> What the verifier considers n to be after the push, and thus what can be
> done with n, are changed by this patch.
> 
> Common properties both before/after this patch:
>   * After push, n is only a valid reference to the node until end of
>     critical section
>   * After push, n cannot be pushed to any list
>   * After push, the program can read the node's fields using n

correct.

> Before:
>   * After push, n retains the ref_obj_id which it received on
>     bpf_obj_new, but the associated bpf_reference_state's
>     release_on_unlock field is set to true
>     * release_on_unlock field and associated logic is used to implement
>       "n is only a valid ref until end of critical section"
>   * After push, n cannot be written to, the node must be removed from
>     the list before writing to its fields
>   * After push, n is marked PTR_UNTRUSTED

yep

> After:
>   * After push, n's ref is released and ref_obj_id set to 0. The
>     bpf_reg_state's non_owning_ref_lock struct is populated with the
>     currently active lock
>     * non_owning_ref_lock and logic is used to implement "n is only a
>       valid ref until end of critical section"
>   * n can be written to (except for special fields e.g. bpf_list_node,
>     timer, ...)
>   * No special type flag is added to n after push

yep.
Great summary.

> Summary of specific implementation changes to achieve the above:
> 
>   * release_on_unlock field, ref_set_release_on_unlock helper, and logic
>     to "release on unlock" based on that field are removed

+1 

>   * The anonymous active_lock struct used by bpf_verifier_state is
>     pulled out into a named struct bpf_active_lock.
...
>   * A non_owning_ref_lock field of type bpf_active_lock is added to
>     bpf_reg_state's PTR_TO_BTF_ID union

not great. see below.

>   * Helpers are added to use non_owning_ref_lock to implement non-owning
>     ref semantics as described above
>     * invalidate_non_owning_refs - helper to clobber all non-owning refs
>       matching a particular bpf_active_lock identity. Replaces
>       release_on_unlock logic in process_spin_lock.

+1

>     * ref_set_non_owning_lock - set non_owning_ref_lock for a reg based
>       on current verifier state

+1

>     * ref_convert_owning_non_owning - convert owning reference w/
>       specified ref_obj_id to non-owning references. Setup
>       non_owning_ref_lock for each reg with that ref_obj_id and 0 out
>       its ref_obj_id

+1

>   * New KF_RELEASE_NON_OWN flag is added, to be used in conjunction with
>     KF_RELEASE to indicate that the release arg reg should be converted
>     to non-owning ref
>     * Plain KF_RELEASE would clobber all regs with ref_obj_id matching
>       the release arg reg's. KF_RELEASE_NON_OWN's logic triggers first -
>       doing ref_convert_owning_non_owning on the ref first, which
>       prevents the regs from being clobbered by 0ing out their
>       ref_obj_ids. The bpf_reference_state itself is still released via
>       release_reference as a result of the KF_RELEASE flag.
>     * KF_RELEASE | KF_RELEASE_NON_OWN are added to
>       bpf_list_push_{front,back}

And this bit is confusing and not generalizable.
As David noticed in his reply, KF_RELEASE_NON_OWN is not a great name.
It's hard to come up with a good name and it won't be generic anyway.
The ref_convert_owning_non_owning has to be applied to a specific arg.
The function itself is not KF_RELEASE in the current definition of it.
The combination of KF_RELEASE|KF_RELEASE_NON_OWN is something new
that should have been generic, but doesn't really work this way.
In the next patches rbtree_root/node still has to have all the custom
logic.
KF_RELEASE_NON_OWN by itself is a nonsensical flag.
Only the combination of KF_RELEASE|KF_RELEASE_NON_OWN sort of makes
sense, but it's still hard to understand what releases what.
More below.

> After these changes, linked_list's "release on unlock" logic continues
> to function as before, except for the semantic differences noted above.
> The patch immediately following this one makes minor changes to
> linked_list selftests to account for the differing behavior.
> 
> Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
> ---
>  include/linux/bpf.h          |   1 +
>  include/linux/bpf_verifier.h |  39 ++++-----
>  include/linux/btf.h          |  17 ++--
>  kernel/bpf/helpers.c         |   4 +-
>  kernel/bpf/verifier.c        | 164 ++++++++++++++++++++++++-----------
>  5 files changed, 146 insertions(+), 79 deletions(-)
> 
> diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> index 3de24cfb7a3d..f71571bf6adc 100644
> --- a/include/linux/bpf.h
> +++ b/include/linux/bpf.h
> @@ -180,6 +180,7 @@ enum btf_field_type {
>  	BPF_KPTR       = BPF_KPTR_UNREF | BPF_KPTR_REF,
>  	BPF_LIST_HEAD  = (1 << 4),
>  	BPF_LIST_NODE  = (1 << 5),
> +	BPF_GRAPH_NODE_OR_ROOT = BPF_LIST_NODE | BPF_LIST_HEAD,
>  };
>  
>  struct btf_field_kptr {
> diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h
> index 53d175cbaa02..cb417ffbbb84 100644
> --- a/include/linux/bpf_verifier.h
> +++ b/include/linux/bpf_verifier.h
> @@ -43,6 +43,22 @@ enum bpf_reg_liveness {
>  	REG_LIVE_DONE = 0x8, /* liveness won't be updating this register anymore */
>  };
>  
> +/* For every reg representing a map value or allocated object pointer,
> + * we consider the tuple of (ptr, id) for them to be unique in verifier
> + * context and conside them to not alias each other for the purposes of
> + * tracking lock state.
> + */
> +struct bpf_active_lock {
> +	/* This can either be reg->map_ptr or reg->btf. If ptr is NULL,
> +	 * there's no active lock held, and other fields have no
> +	 * meaning. If non-NULL, it indicates that a lock is held and
> +	 * id member has the reg->id of the register which can be >= 0.
> +	 */
> +	void *ptr;
> +	/* This will be reg->id */
> +	u32 id;
> +};
> +
>  struct bpf_reg_state {
>  	/* Ordering of fields matters.  See states_equal() */
>  	enum bpf_reg_type type;
> @@ -68,6 +84,7 @@ struct bpf_reg_state {
>  		struct {
>  			struct btf *btf;
>  			u32 btf_id;
> +			struct bpf_active_lock non_owning_ref_lock;

In your other email you argue that a pointer should be enough.
I suspect that won't be correct.
See the fixes that Andrii did in states_equal() and regsafe().
In particular:
        if (!!old->active_lock.id != !!cur->active_lock.id)
                return false;

        if (old->active_lock.id &&
            !check_ids(old->active_lock.id, cur->active_lock.id, env->idmap_scratch))
                return false;

We have to do the comparison of this new ID via idmap as well.

I think the introduction of struct bpf_active_lock and adding it
to bpf_reg_state is overkill.
Here we can add just 'u32 non_own_ref_obj_id;' and compare it via idmap in regsafe().
I'm guessing you didn't like my 'active_lock_id' suggestion. Fine.
non_own_ref_obj_id would at least match the existing ref_obj_id naming.
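
i.e. regsafe() would need something mirroring the active_lock check
above, along the lines of (sketch only, field named as suggested):

        if (!!rold->non_own_ref_obj_id != !!rcur->non_own_ref_obj_id)
                return false;

        if (rold->non_own_ref_obj_id &&
            !check_ids(rold->non_own_ref_obj_id, rcur->non_own_ref_obj_id,
                       env->idmap_scratch))
                return false;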

>  		};
>  
>  		u32 mem_size; /* for PTR_TO_MEM | PTR_TO_MEM_OR_NULL */
> @@ -223,11 +240,6 @@ struct bpf_reference_state {
>  	 * exiting a callback function.
>  	 */
>  	int callback_ref;
> -	/* Mark the reference state to release the registers sharing the same id
> -	 * on bpf_spin_unlock (for nodes that we will lose ownership to but are
> -	 * safe to access inside the critical section).
> -	 */
> -	bool release_on_unlock;
>  };
>  
>  /* state of the program:
> @@ -328,21 +340,8 @@ struct bpf_verifier_state {
>  	u32 branches;
>  	u32 insn_idx;
>  	u32 curframe;
> -	/* For every reg representing a map value or allocated object pointer,
> -	 * we consider the tuple of (ptr, id) for them to be unique in verifier
> -	 * context and conside them to not alias each other for the purposes of
> -	 * tracking lock state.
> -	 */
> -	struct {
> -		/* This can either be reg->map_ptr or reg->btf. If ptr is NULL,
> -		 * there's no active lock held, and other fields have no
> -		 * meaning. If non-NULL, it indicates that a lock is held and
> -		 * id member has the reg->id of the register which can be >= 0.
> -		 */
> -		void *ptr;
> -		/* This will be reg->id */
> -		u32 id;
> -	} active_lock;

I would keep it as-is.

> +
> +	struct bpf_active_lock active_lock;
>  	bool speculative;
>  	bool active_rcu_lock;
>  
> diff --git a/include/linux/btf.h b/include/linux/btf.h
> index 5f628f323442..8aee3f7f4248 100644
> --- a/include/linux/btf.h
> +++ b/include/linux/btf.h
> @@ -15,10 +15,10 @@
>  #define BTF_TYPE_EMIT_ENUM(enum_val) ((void)enum_val)
>  
>  /* These need to be macros, as the expressions are used in assembler input */
> -#define KF_ACQUIRE	(1 << 0) /* kfunc is an acquire function */
> -#define KF_RELEASE	(1 << 1) /* kfunc is a release function */
> -#define KF_RET_NULL	(1 << 2) /* kfunc returns a pointer that may be NULL */
> -#define KF_KPTR_GET	(1 << 3) /* kfunc returns reference to a kptr */
> +#define KF_ACQUIRE		(1 << 0) /* kfunc is an acquire function */
> +#define KF_RELEASE		(1 << 1) /* kfunc is a release function */
> +#define KF_RET_NULL		(1 << 2) /* kfunc returns a pointer that may be NULL */
> +#define KF_KPTR_GET		(1 << 3) /* kfunc returns reference to a kptr */
>  /* Trusted arguments are those which are guaranteed to be valid when passed to
>   * the kfunc. It is used to enforce that pointers obtained from either acquire
>   * kfuncs, or from the main kernel on a tracepoint or struct_ops callback
> @@ -67,10 +67,11 @@
>   *	return 0;
>   * }
>   */
> -#define KF_TRUSTED_ARGS (1 << 4) /* kfunc only takes trusted pointer arguments */
> -#define KF_SLEEPABLE    (1 << 5) /* kfunc may sleep */
> -#define KF_DESTRUCTIVE  (1 << 6) /* kfunc performs destructive actions */
> -#define KF_RCU          (1 << 7) /* kfunc only takes rcu pointer arguments */
> +#define KF_TRUSTED_ARGS	(1 << 4) /* kfunc only takes trusted pointer arguments */
> +#define KF_SLEEPABLE		(1 << 5) /* kfunc may sleep */
> +#define KF_DESTRUCTIVE		(1 << 6) /* kfunc performs destructive actions */
> +#define KF_RCU			(1 << 7) /* kfunc only takes rcu pointer arguments */
> +#define KF_RELEASE_NON_OWN	(1 << 8) /* kfunc converts its referenced arg into non-owning ref */

No need for this flag.

>  /*
>   * Return the name of the passed struct, if exists, or halt the build if for
> diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
> index af30c6cbd65d..e041409779c3 100644
> --- a/kernel/bpf/helpers.c
> +++ b/kernel/bpf/helpers.c
> @@ -2049,8 +2049,8 @@ BTF_ID_FLAGS(func, crash_kexec, KF_DESTRUCTIVE)
>  #endif
>  BTF_ID_FLAGS(func, bpf_obj_new_impl, KF_ACQUIRE | KF_RET_NULL)
>  BTF_ID_FLAGS(func, bpf_obj_drop_impl, KF_RELEASE)
> -BTF_ID_FLAGS(func, bpf_list_push_front)
> -BTF_ID_FLAGS(func, bpf_list_push_back)
> +BTF_ID_FLAGS(func, bpf_list_push_front, KF_RELEASE | KF_RELEASE_NON_OWN)
> +BTF_ID_FLAGS(func, bpf_list_push_back, KF_RELEASE | KF_RELEASE_NON_OWN)

No need for this.

>  BTF_ID_FLAGS(func, bpf_list_pop_front, KF_ACQUIRE | KF_RET_NULL)
>  BTF_ID_FLAGS(func, bpf_list_pop_back, KF_ACQUIRE | KF_RET_NULL)
>  BTF_ID_FLAGS(func, bpf_task_acquire, KF_ACQUIRE | KF_TRUSTED_ARGS)
> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> index 824e2242eae5..84b0660e2a76 100644
> --- a/kernel/bpf/verifier.c
> +++ b/kernel/bpf/verifier.c
> @@ -190,6 +190,10 @@ struct bpf_verifier_stack_elem {
>  
>  static int acquire_reference_state(struct bpf_verifier_env *env, int insn_idx);
>  static int release_reference(struct bpf_verifier_env *env, int ref_obj_id);
> +static void invalidate_non_owning_refs(struct bpf_verifier_env *env,
> +				       struct bpf_active_lock *lock);
> +static int ref_set_non_owning_lock(struct bpf_verifier_env *env,
> +				   struct bpf_reg_state *reg);
>  
>  static bool bpf_map_ptr_poisoned(const struct bpf_insn_aux_data *aux)
>  {
> @@ -931,6 +935,9 @@ static void print_verifier_state(struct bpf_verifier_env *env,
>  				verbose_a("id=%d", reg->id);
>  			if (reg->ref_obj_id)
>  				verbose_a("ref_obj_id=%d", reg->ref_obj_id);
> +			if (reg->non_owning_ref_lock.ptr)
> +				verbose_a("non_own_id=(%p,%d)", reg->non_owning_ref_lock.ptr,
> +					  reg->non_owning_ref_lock.id);
>  			if (t != SCALAR_VALUE)
>  				verbose_a("off=%d", reg->off);
>  			if (type_is_pkt_pointer(t))
> @@ -4820,7 +4827,8 @@ static int check_ptr_to_btf_access(struct bpf_verifier_env *env,
>  			return -EACCES;
>  		}
>  
> -		if (type_is_alloc(reg->type) && !reg->ref_obj_id) {
> +		if (type_is_alloc(reg->type) && !reg->ref_obj_id &&
> +		    !reg->non_owning_ref_lock.ptr) {
>  			verbose(env, "verifier internal error: ref_obj_id for allocated object must be non-zero\n");
>  			return -EFAULT;
>  		}
> @@ -5778,9 +5786,7 @@ static int process_spin_lock(struct bpf_verifier_env *env, int regno,
>  			cur->active_lock.ptr = btf;
>  		cur->active_lock.id = reg->id;
>  	} else {
> -		struct bpf_func_state *fstate = cur_func(env);
>  		void *ptr;
> -		int i;
>  
>  		if (map)
>  			ptr = map;
> @@ -5796,25 +5802,11 @@ static int process_spin_lock(struct bpf_verifier_env *env, int regno,
>  			verbose(env, "bpf_spin_unlock of different lock\n");
>  			return -EINVAL;
>  		}
> -		cur->active_lock.ptr = NULL;
> -		cur->active_lock.id = 0;
>  
> -		for (i = fstate->acquired_refs - 1; i >= 0; i--) {
> -			int err;
> +		invalidate_non_owning_refs(env, &cur->active_lock);

+1

> -			/* Complain on error because this reference state cannot
> -			 * be freed before this point, as bpf_spin_lock critical
> -			 * section does not allow functions that release the
> -			 * allocated object immediately.
> -			 */
> -			if (!fstate->refs[i].release_on_unlock)
> -				continue;
> -			err = release_reference(env, fstate->refs[i].id);
> -			if (err) {
> -				verbose(env, "failed to release release_on_unlock reference");
> -				return err;
> -			}
> -		}
> +		cur->active_lock.ptr = NULL;
> +		cur->active_lock.id = 0;

+1

>  	}
>  	return 0;
>  }
> @@ -6273,6 +6265,23 @@ static int check_reg_type(struct bpf_verifier_env *env, u32 regno,
>  	return 0;
>  }
>  
> +static struct btf_field *
> +reg_find_field_offset(const struct bpf_reg_state *reg, s32 off, u32 fields)
> +{
> +	struct btf_field *field;
> +	struct btf_record *rec;
> +
> +	rec = reg_btf_record(reg);
> +	if (!reg)
> +		return NULL;
> +
> +	field = btf_record_find(rec, off, fields);
> +	if (!field)
> +		return NULL;
> +
> +	return field;
> +}

Doesn't look like that this helper is really necessary.

> +
>  int check_func_arg_reg_off(struct bpf_verifier_env *env,
>  			   const struct bpf_reg_state *reg, int regno,
>  			   enum bpf_arg_type arg_type)
> @@ -6294,6 +6303,18 @@ int check_func_arg_reg_off(struct bpf_verifier_env *env,
>  		 */
>  		if (arg_type_is_dynptr(arg_type) && type == PTR_TO_STACK)
>  			return 0;
> +
> +		if (type == (PTR_TO_BTF_ID | MEM_ALLOC) && reg->off) {
> +			if (reg_find_field_offset(reg, reg->off, BPF_GRAPH_NODE_OR_ROOT))
> +				return __check_ptr_off_reg(env, reg, regno, true);
> +
> +			verbose(env, "R%d must have zero offset when passed to release func\n",
> +				regno);
> +			verbose(env, "No graph node or root found at R%d type:%s off:%d\n", regno,
> +				kernel_type_name(reg->btf, reg->btf_id), reg->off);
> +			return -EINVAL;
> +		}

This bit is only necessary if we mark push_list as KF_RELEASE.
Just don't add this mark and drop above.

> +
>  		/* Doing check_ptr_off_reg check for the offset will catch this
>  		 * because fixed_off_ok is false, but checking here allows us
>  		 * to give the user a better error message.
> @@ -7055,6 +7076,20 @@ static int release_reference(struct bpf_verifier_env *env,
>  	return 0;
>  }
>  
> +static void invalidate_non_owning_refs(struct bpf_verifier_env *env,
> +				       struct bpf_active_lock *lock)
> +{
> +	struct bpf_func_state *unused;
> +	struct bpf_reg_state *reg;
> +
> +	bpf_for_each_reg_in_vstate(env->cur_state, unused, reg, ({
> +		if (reg->non_owning_ref_lock.ptr &&
> +		    reg->non_owning_ref_lock.ptr == lock->ptr &&
> +		    reg->non_owning_ref_lock.id == lock->id)

I think the lock.ptr == lock->ptr comparison is unnecessary for invalidating things.
We're under an active spin_lock here. All regs were checked earlier and the id keeps incrementing.
So we can just do 'u32 non_own_ref_obj_id'.
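
With only an id to track, the invalidation becomes a single compare.
Roughly (a sketch, reusing the helper from this patch but keyed on the
id alone):

        static void invalidate_non_owning_refs(struct bpf_verifier_env *env, u32 lock_id)
        {
                struct bpf_func_state *unused;
                struct bpf_reg_state *reg;

                bpf_for_each_reg_in_vstate(env->cur_state, unused, reg, ({
                        if (reg->non_own_ref_obj_id &&
                            reg->non_own_ref_obj_id == lock_id)
                                __mark_reg_unknown(env, reg);
                }));
        }

called with cur->active_lock.id from process_spin_lock().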

> +			__mark_reg_unknown(env, reg);
> +	}));
> +}
> +
>  static void clear_caller_saved_regs(struct bpf_verifier_env *env,
>  				    struct bpf_reg_state *regs)
>  {
> @@ -8266,6 +8301,11 @@ static bool is_kfunc_release(struct bpf_kfunc_call_arg_meta *meta)
>  	return meta->kfunc_flags & KF_RELEASE;
>  }
>  
> +static bool is_kfunc_release_non_own(struct bpf_kfunc_call_arg_meta *meta)
> +{
> +	return meta->kfunc_flags & KF_RELEASE_NON_OWN;
> +}
> +

No need.

>  static bool is_kfunc_trusted_args(struct bpf_kfunc_call_arg_meta *meta)
>  {
>  	return meta->kfunc_flags & KF_TRUSTED_ARGS;
> @@ -8651,38 +8691,55 @@ static int process_kf_arg_ptr_to_kptr(struct bpf_verifier_env *env,
>  	return 0;
>  }
>  
> -static int ref_set_release_on_unlock(struct bpf_verifier_env *env, u32 ref_obj_id)
> +static int ref_set_non_owning_lock(struct bpf_verifier_env *env, struct bpf_reg_state *reg)
>  {
> -	struct bpf_func_state *state = cur_func(env);
> +	struct bpf_verifier_state *state = env->cur_state;
> +
> +	if (!state->active_lock.ptr) {
> +		verbose(env, "verifier internal error: ref_set_non_owning_lock w/o active lock\n");
> +		return -EFAULT;
> +	}
> +
> +	if (reg->non_owning_ref_lock.ptr) {
> +		verbose(env, "verifier internal error: non_owning_ref_lock already set\n");
> +		return -EFAULT;
> +	}
> +
> +	reg->non_owning_ref_lock.id = state->active_lock.id;
> +	reg->non_owning_ref_lock.ptr = state->active_lock.ptr;
> +	return 0;
> +}
> +
> +static int ref_convert_owning_non_owning(struct bpf_verifier_env *env, u32 ref_obj_id)
> +{
> +	struct bpf_func_state *state, *unused;
>  	struct bpf_reg_state *reg;
>  	int i;
>  
> -	/* bpf_spin_lock only allows calling list_push and list_pop, no BPF
> -	 * subprogs, no global functions. This means that the references would
> -	 * not be released inside the critical section but they may be added to
> -	 * the reference state, and the acquired_refs are never copied out for a
> -	 * different frame as BPF to BPF calls don't work in bpf_spin_lock
> -	 * critical sections.
> -	 */
> +	state = cur_func(env);
> +
>  	if (!ref_obj_id) {
> -		verbose(env, "verifier internal error: ref_obj_id is zero for release_on_unlock\n");
> +		verbose(env, "verifier internal error: ref_obj_id is zero for "
> +			     "owning -> non-owning conversion\n");
>  		return -EFAULT;
>  	}
> +
>  	for (i = 0; i < state->acquired_refs; i++) {
> -		if (state->refs[i].id == ref_obj_id) {
> -			if (state->refs[i].release_on_unlock) {
> -				verbose(env, "verifier internal error: expected false release_on_unlock");
> -				return -EFAULT;
> +		if (state->refs[i].id != ref_obj_id)
> +			continue;
> +
> +		/* Clear ref_obj_id here so release_reference doesn't clobber
> +		 * the whole reg
> +		 */
> +		bpf_for_each_reg_in_vstate(env->cur_state, unused, reg, ({
> +			if (reg->ref_obj_id == ref_obj_id) {
> +				reg->ref_obj_id = 0;
> +				ref_set_non_owning_lock(env, reg);

+1, except the ref_set_... name doesn't quite fit. reg_set_... is more accurate, no?
And probably reg_set_non_own_ref_obj_id()?
Or just open code it?

>  			}
> -			state->refs[i].release_on_unlock = true;
> -			/* Now mark everyone sharing same ref_obj_id as untrusted */
> -			bpf_for_each_reg_in_vstate(env->cur_state, state, reg, ({
> -				if (reg->ref_obj_id == ref_obj_id)
> -					reg->type |= PTR_UNTRUSTED;
> -			}));
> -			return 0;
> -		}
> +		}));
> +		return 0;
>  	}
> +
>  	verbose(env, "verifier internal error: ref state missing for ref_obj_id\n");
>  	return -EFAULT;
>  }
> @@ -8817,7 +8874,6 @@ static int process_kf_arg_ptr_to_list_node(struct bpf_verifier_env *env,
>  {
>  	const struct btf_type *et, *t;
>  	struct btf_field *field;
> -	struct btf_record *rec;
>  	u32 list_node_off;
>  
>  	if (meta->btf != btf_vmlinux ||
> @@ -8834,9 +8890,8 @@ static int process_kf_arg_ptr_to_list_node(struct bpf_verifier_env *env,
>  		return -EINVAL;
>  	}
>  
> -	rec = reg_btf_record(reg);
>  	list_node_off = reg->off + reg->var_off.value;
> -	field = btf_record_find(rec, list_node_off, BPF_LIST_NODE);
> +	field = reg_find_field_offset(reg, list_node_off, BPF_LIST_NODE);
>  	if (!field || field->offset != list_node_off) {
>  		verbose(env, "bpf_list_node not found at offset=%u\n", list_node_off);
>  		return -EINVAL;
> @@ -8861,8 +8916,8 @@ static int process_kf_arg_ptr_to_list_node(struct bpf_verifier_env *env,
>  			btf_name_by_offset(field->list_head.btf, et->name_off));
>  		return -EINVAL;
>  	}
> -	/* Set arg#1 for expiration after unlock */
> -	return ref_set_release_on_unlock(env, reg->ref_obj_id);
> +
> +	return 0;

and here we come to the main point.
Can you just call
ref_convert_owning_non_owning(env, reg->ref_obj_id) and release_reference() here?
Everything will be so much simpler, no?
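
i.e. something like this at the end of process_kf_arg_ptr_to_list_node(),
reusing the helpers introduced in this patch (sketch):

        /* Save the id first: ref_convert_owning_non_owning() zeroes
         * ref_obj_id on every reg sharing it.
         */
        u32 ref_obj_id = reg->ref_obj_id;
        int err;

        err = ref_convert_owning_non_owning(env, ref_obj_id);
        if (err) {
                verbose(env, "owning -> non-owning conversion failed\n");
                return err;
        }

        return release_reference(env, ref_obj_id);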

>  }
>  
>  static int check_kfunc_args(struct bpf_verifier_env *env, struct bpf_kfunc_call_arg_meta *meta)
> @@ -9132,11 +9187,11 @@ static int check_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
>  			    int *insn_idx_p)
>  {
>  	const struct btf_type *t, *func, *func_proto, *ptr_type;
> +	u32 i, nargs, func_id, ptr_type_id, release_ref_obj_id;
>  	struct bpf_reg_state *regs = cur_regs(env);
>  	const char *func_name, *ptr_type_name;
>  	bool sleepable, rcu_lock, rcu_unlock;
>  	struct bpf_kfunc_call_arg_meta meta;
> -	u32 i, nargs, func_id, ptr_type_id;
>  	int err, insn_idx = *insn_idx_p;
>  	const struct btf_param *args;
>  	const struct btf_type *ret_t;
> @@ -9223,7 +9278,18 @@ static int check_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
>  	 * PTR_TO_BTF_ID in bpf_kfunc_arg_meta, do the release now.
>  	 */
>  	if (meta.release_regno) {
> -		err = release_reference(env, regs[meta.release_regno].ref_obj_id);
> +		err = 0;
> +		release_ref_obj_id = regs[meta.release_regno].ref_obj_id;
> +
> +		if (is_kfunc_release_non_own(&meta))
> +			err = ref_convert_owning_non_owning(env, release_ref_obj_id);
> +		if (err) {
> +			verbose(env, "kfunc %s#%d conversion of owning ref to non-owning failed\n",
> +				func_name, func_id);
> +			return err;
> +		}
> +
> +		err = release_reference(env, release_ref_obj_id);

and this bit won't be needed.
and no need to guess in patch 1 which arg has to be released and converted to non_own.

>  		if (err) {
>  			verbose(env, "kfunc %s#%d reference has not been acquired before\n",
>  				func_name, func_id);
> -- 
> 2.30.2
> 

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v2 bpf-next 07/13] bpf: Add support for bpf_rb_root and bpf_rb_node in kfunc args
  2022-12-17  8:25 ` [PATCH v2 bpf-next 07/13] bpf: Add support for bpf_rb_root and bpf_rb_node in kfunc args Dave Marchevsky
@ 2022-12-29  4:00   ` Alexei Starovoitov
  0 siblings, 0 replies; 38+ messages in thread
From: Alexei Starovoitov @ 2022-12-29  4:00 UTC (permalink / raw)
  To: Dave Marchevsky
  Cc: bpf, Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Kernel Team, Kumar Kartikeya Dwivedi, Tejun Heo

On Sat, Dec 17, 2022 at 12:25:00AM -0800, Dave Marchevsky wrote:
>  
> +static int
> +__process_kf_arg_ptr_to_graph_node(struct bpf_verifier_env *env,
> +				   struct bpf_reg_state *reg, u32 regno,
> +				   struct bpf_kfunc_call_arg_meta *meta,
> +				   enum btf_field_type head_field_type,
> +				   enum btf_field_type node_field_type,
> +				   struct btf_field **node_field)
> +{
> +	const char *node_type_name;
>  	const struct btf_type *et, *t;
>  	struct btf_field *field;
> -	u32 list_node_off;
> +	u32 node_off;
>  
> -	if (meta->btf != btf_vmlinux ||
> -	    (meta->func_id != special_kfunc_list[KF_bpf_list_push_front] &&
> -	     meta->func_id != special_kfunc_list[KF_bpf_list_push_back])) {
> -		verbose(env, "verifier internal error: bpf_list_node argument for unknown kfunc\n");
> +	if (meta->btf != btf_vmlinux) {
> +		verbose(env, "verifier internal error: unexpected btf mismatch in kfunc call\n");
>  		return -EFAULT;
>  	}
>  
> +	if (!check_kfunc_is_graph_node_api(env, node_field_type, meta->func_id))
> +		return -EFAULT;
> +
> +	node_type_name = btf_field_type_name(node_field_type);
>  	if (!tnum_is_const(reg->var_off)) {
>  		verbose(env,
> -			"R%d doesn't have constant offset. bpf_list_node has to be at the constant offset\n",
> -			regno);
> +			"R%d doesn't have constant offset. %s has to be at the constant offset\n",
> +			regno, node_type_name);
>  		return -EINVAL;
>  	}
>  
> -	list_node_off = reg->off + reg->var_off.value;
> -	field = reg_find_field_offset(reg, list_node_off, BPF_LIST_NODE);
> -	if (!field || field->offset != list_node_off) {
> -		verbose(env, "bpf_list_node not found at offset=%u\n", list_node_off);
> +	node_off = reg->off + reg->var_off.value;
> +	field = reg_find_field_offset(reg, node_off, node_field_type);
> +	if (!field || field->offset != node_off) {
> +		verbose(env, "%s not found at offset=%u\n", node_type_name, node_off);
>  		return -EINVAL;
>  	}
>  
> -	field = meta->arg_list_head.field;
> +	field = *node_field;
>  
>  	et = btf_type_by_id(field->graph_root.btf, field->graph_root.value_btf_id);
>  	t = btf_type_by_id(reg->btf, reg->btf_id);
>  	if (!btf_struct_ids_match(&env->log, reg->btf, reg->btf_id, 0, field->graph_root.btf,
>  				  field->graph_root.value_btf_id, true)) {
> -		verbose(env, "operation on bpf_list_head expects arg#1 bpf_list_node at offset=%d "
> +		verbose(env, "operation on %s expects arg#1 %s at offset=%d "
>  			"in struct %s, but arg is at offset=%d in struct %s\n",
> +			btf_field_type_name(head_field_type),
> +			btf_field_type_name(node_field_type),
>  			field->graph_root.node_offset,
>  			btf_name_by_offset(field->graph_root.btf, et->name_off),
> -			list_node_off, btf_name_by_offset(reg->btf, t->name_off));
> +			node_off, btf_name_by_offset(reg->btf, t->name_off));
>  		return -EINVAL;
>  	}
>  
> -	if (list_node_off != field->graph_root.node_offset) {
> -		verbose(env, "arg#1 offset=%d, but expected bpf_list_node at offset=%d in struct %s\n",
> -			list_node_off, field->graph_root.node_offset,
> +	if (node_off != field->graph_root.node_offset) {
> +		verbose(env, "arg#1 offset=%d, but expected %s at offset=%d in struct %s\n",
> +			node_off, btf_field_type_name(node_field_type),
> +			field->graph_root.node_offset,
>  			btf_name_by_offset(field->graph_root.btf, et->name_off));
>  		return -EINVAL;
>  	}
> @@ -8932,6 +9053,24 @@ static int process_kf_arg_ptr_to_list_node(struct bpf_verifier_env *env,
>  	return 0;
>  }

and with suggestion in patch 2 the single __process_kf_arg_ptr_to_graph_node helper
called as:

> +static int process_kf_arg_ptr_to_list_node(struct bpf_verifier_env *env,
> +					   struct bpf_reg_state *reg, u32 regno,
> +					   struct bpf_kfunc_call_arg_meta *meta)
> +{
> +	return __process_kf_arg_ptr_to_graph_node(env, reg, regno, meta,
> +						  BPF_LIST_HEAD, BPF_LIST_NODE,
> +						  &meta->arg_list_head.field);
> +}
> +
> +static int process_kf_arg_ptr_to_rbtree_node(struct bpf_verifier_env *env,
> +					     struct bpf_reg_state *reg, u32 regno,
> +					     struct bpf_kfunc_call_arg_meta *meta)
> +{
> +	return __process_kf_arg_ptr_to_graph_node(env, reg, regno, meta,
> +						  BPF_RB_ROOT, BPF_RB_NODE,
> +						  &meta->arg_rbtree_root.field);
> +}

would convert the arg from owning to non-owning.

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v2 bpf-next 09/13] bpf: Special verifier handling for bpf_rbtree_{remove, first}
  2022-12-17  8:25 ` [PATCH v2 bpf-next 09/13] bpf: Special verifier handling for bpf_rbtree_{remove, first} Dave Marchevsky
@ 2022-12-29  4:02   ` Alexei Starovoitov
  0 siblings, 0 replies; 38+ messages in thread
From: Alexei Starovoitov @ 2022-12-29  4:02 UTC (permalink / raw)
  To: Dave Marchevsky
  Cc: bpf, Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Kernel Team, Kumar Kartikeya Dwivedi, Tejun Heo

On Sat, Dec 17, 2022 at 12:25:02AM -0800, Dave Marchevsky wrote:
> Newly-added bpf_rbtree_{remove,first} kfuncs have some special properties
> that require handling in the verifier:
> 
>   * both bpf_rbtree_remove and bpf_rbtree_first return the type containing
>     the bpf_rb_node field, with the offset set to that field's offset,
>     instead of a struct bpf_rb_node *
>     * mark_reg_graph_node helper added in previous patch generalizes
>       this logic, use it
> 
>   * bpf_rbtree_remove's node input is a node that's been inserted
>     in the tree - a non-owning reference.
> 
>   * bpf_rbtree_remove must invalidate non-owning references in order to
>     avoid aliasing issue. Add KF_INVALIDATE_NON_OWN flag, which
>     indicates that the marked kfunc is a non-owning ref invalidation
>     point, and associated verifier logic using previously-added
>     invalidate_non_owning_refs helper.
> 
>   * Unlike other functions, which convert one of their input arg regs to
>     non-owning reference, bpf_rbtree_first takes no arguments and just
>     returns a non-owning reference (possibly null)
>     * For now verifier logic for this is special-cased instead of
>       adding new kfunc flag.
> 
> This patch, along with the previous one, complete special verifier
> handling for all rbtree API functions added in this series.
> 
> Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
> ---
>  include/linux/btf.h   |  1 +
>  kernel/bpf/helpers.c  |  2 +-
>  kernel/bpf/verifier.c | 34 ++++++++++++++++++++++++++++------
>  3 files changed, 30 insertions(+), 7 deletions(-)
> 
> diff --git a/include/linux/btf.h b/include/linux/btf.h
> index 8aee3f7f4248..3663911bb7c0 100644
> --- a/include/linux/btf.h
> +++ b/include/linux/btf.h
> @@ -72,6 +72,7 @@
>  #define KF_DESTRUCTIVE		(1 << 6) /* kfunc performs destructive actions */
>  #define KF_RCU			(1 << 7) /* kfunc only takes rcu pointer arguments */
>  #define KF_RELEASE_NON_OWN	(1 << 8) /* kfunc converts its referenced arg into non-owning ref */
> +#define KF_INVALIDATE_NON_OWN	(1 << 9) /* kfunc invalidates non-owning refs after return */
>  
>  /*
>   * Return the name of the passed struct, if exists, or halt the build if for
> diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
> index de4523c777b7..0e6d010e6423 100644
> --- a/kernel/bpf/helpers.c
> +++ b/kernel/bpf/helpers.c
> @@ -2121,7 +2121,7 @@ BTF_ID_FLAGS(func, bpf_task_acquire, KF_ACQUIRE | KF_TRUSTED_ARGS)
>  BTF_ID_FLAGS(func, bpf_task_acquire_not_zero, KF_ACQUIRE | KF_RCU | KF_RET_NULL)
>  BTF_ID_FLAGS(func, bpf_task_kptr_get, KF_ACQUIRE | KF_KPTR_GET | KF_RET_NULL)
>  BTF_ID_FLAGS(func, bpf_task_release, KF_RELEASE)
> -BTF_ID_FLAGS(func, bpf_rbtree_remove, KF_ACQUIRE)
> +BTF_ID_FLAGS(func, bpf_rbtree_remove, KF_ACQUIRE | KF_INVALIDATE_NON_OWN)

I don't like this 'generalization' either.

>  BTF_ID_FLAGS(func, bpf_rbtree_add, KF_RELEASE | KF_RELEASE_NON_OWN)
>  BTF_ID_FLAGS(func, bpf_rbtree_first, KF_RET_NULL)
>  
> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> index 75979f78399d..b4bf3701de7f 100644
> --- a/kernel/bpf/verifier.c
> +++ b/kernel/bpf/verifier.c
> @@ -8393,6 +8393,11 @@ static bool is_kfunc_release_non_own(struct bpf_kfunc_call_arg_meta *meta)
>  	return meta->kfunc_flags & KF_RELEASE_NON_OWN;
>  }
>  
> +static bool is_kfunc_invalidate_non_own(struct bpf_kfunc_call_arg_meta *meta)
> +{
> +	return meta->kfunc_flags & KF_INVALIDATE_NON_OWN;
> +}
> +
>  static bool is_kfunc_trusted_args(struct bpf_kfunc_call_arg_meta *meta)
>  {
>  	return meta->kfunc_flags & KF_TRUSTED_ARGS;
> @@ -9425,10 +9430,20 @@ static int check_kfunc_args(struct bpf_verifier_env *env, struct bpf_kfunc_call_
>  				verbose(env, "arg#%d expected pointer to allocated object\n", i);
>  				return -EINVAL;
>  			}
> -			if (!reg->ref_obj_id) {
> +			if (meta->func_id == special_kfunc_list[KF_bpf_rbtree_remove]) {
> +				if (reg->ref_obj_id) {
> +					verbose(env, "rbtree_remove node input must be non-owning ref\n");
> +					return -EINVAL;
> +				}
> +				if (in_rbtree_lock_required_cb(env)) {
> +					verbose(env, "rbtree_remove not allowed in rbtree cb\n");
> +					return -EINVAL;
> +				}
> +			} else if (!reg->ref_obj_id) {
>  				verbose(env, "allocated object must be referenced\n");
>  				return -EINVAL;
>  			}
> +
>  			ret = process_kf_arg_ptr_to_rbtree_node(env, reg, regno, meta);
>  			if (ret < 0)
>  				return ret;
> @@ -9665,11 +9680,12 @@ static int check_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
>  				   meta.func_id == special_kfunc_list[KF_bpf_list_pop_back]) {
>  				struct btf_field *field = meta.arg_list_head.field;
>  
> -				mark_reg_known_zero(env, regs, BPF_REG_0);
> -				regs[BPF_REG_0].type = PTR_TO_BTF_ID | MEM_ALLOC;
> -				regs[BPF_REG_0].btf = field->graph_root.btf;
> -				regs[BPF_REG_0].btf_id = field->graph_root.value_btf_id;
> -				regs[BPF_REG_0].off = field->graph_root.node_offset;
> +				mark_reg_graph_node(regs, BPF_REG_0, &field->graph_root);
> +			} else if (meta.func_id == special_kfunc_list[KF_bpf_rbtree_remove] ||

Just call invalidate_non_owning_refs() here since it needs to be a special case anyway.
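i.e. keep the special casing explicit in check_kfunc_call(), something
like (sketch, using the invalidate helper with the signature from
patch 2 as posted):

        } else if (meta.func_id == special_kfunc_list[KF_bpf_rbtree_remove] ||
                   meta.func_id == special_kfunc_list[KF_bpf_rbtree_first]) {
                struct btf_field *field = meta.arg_rbtree_root.field;

                mark_reg_graph_node(regs, BPF_REG_0, &field->graph_root);
                if (meta.func_id == special_kfunc_list[KF_bpf_rbtree_remove])
                        invalidate_non_owning_refs(env, &env->cur_state->active_lock);
        }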

> +				   meta.func_id == special_kfunc_list[KF_bpf_rbtree_first]) {
> +				struct btf_field *field = meta.arg_rbtree_root.field;
> +
> +				mark_reg_graph_node(regs, BPF_REG_0, &field->graph_root);
>  			} else if (meta.func_id == special_kfunc_list[KF_bpf_cast_to_kern_ctx]) {
>  				mark_reg_known_zero(env, regs, BPF_REG_0);
>  				regs[BPF_REG_0].type = PTR_TO_BTF_ID | PTR_TRUSTED;
> @@ -9735,7 +9751,13 @@ static int check_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
>  			if (is_kfunc_ret_null(&meta))
>  				regs[BPF_REG_0].id = id;
>  			regs[BPF_REG_0].ref_obj_id = id;
> +		} else if (meta.func_id == special_kfunc_list[KF_bpf_rbtree_first]) {
> +			ref_set_non_owning_lock(env, &regs[BPF_REG_0]);
>  		}
> +
> +		if (is_kfunc_invalidate_non_own(&meta))
> +			invalidate_non_owning_refs(env, &env->cur_state->active_lock);
> +
>  		if (reg_may_point_to_spin_lock(&regs[BPF_REG_0]) && !regs[BPF_REG_0].id)
>  			regs[BPF_REG_0].id = ++env->id_gen;
>  	} /* else { add_kfunc_call() ensures it is btf_type_is_void(t) } */
> -- 
> 2.30.2
> 

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v2 bpf-next 01/13] bpf: Support multiple arg regs w/ ref_obj_id for kfuncs
  2022-12-17  8:24 ` [PATCH v2 bpf-next 01/13] bpf: Support multiple arg regs w/ ref_obj_id for kfuncs Dave Marchevsky
  2022-12-29  3:24   ` Alexei Starovoitov
@ 2022-12-29  6:40   ` David Vernet
  2022-12-29 16:50     ` Alexei Starovoitov
  1 sibling, 1 reply; 38+ messages in thread
From: David Vernet @ 2022-12-29  6:40 UTC (permalink / raw)
  To: Dave Marchevsky
  Cc: bpf, Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Kernel Team, Kumar Kartikeya Dwivedi, Tejun Heo

On Sat, Dec 17, 2022 at 12:24:54AM -0800, Dave Marchevsky wrote:
> Currently, kfuncs marked KF_RELEASE indicate that they release some
> previously-acquired arg. The verifier assumes that such a function will
> only have one arg reg w/ ref_obj_id set, and that that arg is the one to
> be released. Multiple kfunc arg regs having ref_obj_id set is considered
> an invalid state.
> 
> For helpers, RELEASE is used to tag a particular arg in the function
> proto, not the function itself. The arg with OBJ_RELEASE type tag is the
> arg that the helper will release. There can only be one such tagged arg.
> When verifying arg regs, multiple helper arg regs w/ ref_obj_id set is
> also considered an invalid state.
> 
> Later patches in this series will result in some linked_list helpers
> marked KF_RELEASE having a valid reason to take two ref_obj_id args.
> Specifically, bpf_list_push_{front,back} can push a node to a list head
> which is itself part of a list node. In such a scenario both arguments
> to these functions would have ref_obj_id > 0, thus would fail
> verification under current logic.
> 
> This patch changes kfunc ref_obj_id searching logic to find the last arg
> reg w/ ref_obj_id and consider that the reg-to-release. This should be
> backwards-compatible with all current kfuncs as they only expect one
> such arg reg.

Can't say I'm a huge fan of this proposal :-( While I think it's really
unfortunate that kfunc flags are not defined per-arg for this exact type
of reason, adding more flag-specific semantics like this is IMO a step
in the wrong direction.  It's similar to the existing __sz and __k
argument-naming semantics that inform the verifier that the arguments
have special meaning. All of these little additions of special-case
handling for kfunc flags end up requiring people writing kfuncs (and
sometimes calling them) to read through the verifier to understand
what's going on (though I will say that it's nice that __sz and __k are
properly documented in [0]).

[0]: https://docs.kernel.org/bpf/kfuncs.html#sz-annotation

The correct thing to do here, in my opinion, is to work to combine kfunc
and helper definitions. Right now that's of course not possible for a
number of reasons, including the fact that kfuncs can do things that
helpers cannot. If we do end up merging it, at the very least I'd ask
you to please loudly document the behavior both in
Documentation/bpf/kfuncs.rst, and in the code where the kfunc flags are
defined, if you don't mind.

Of course, that's assuming that we decide that we still need this, per
Alexei's comment in [1].

[1]: https://lore.kernel.org/all/20221229032442.dkastsstktsxjymb@MacBook-Pro-6.local/

> 
> Currently the ref_obj_id and OBJ_RELEASE searching is done in the code
> that examines each individual arg (check_func_arg for helpers and
> check_kfunc_args inner loop for kfuncs). This patch pulls out this
> searching to occur before individual arg type handling, resulting in a
> cleaner separation of logic.
> 
> Two new helpers are added:
>   * args_find_ref_obj_id_regno
>     * For helpers and kfuncs. Searches through arg regs to find
>       ref_obj_id reg and returns its regno. Helpers set allow_multi =
>       false, retaining "only one ref_obj_id arg" behavior, while kfuncs
>       set allow_multi = true and get the last ref_obj_id arg reg back.
> 
>   * helper_proto_find_release_arg_regno
>     * For helpers only. Searches through fn proto args to find the
>       OBJ_RELEASE arg and returns the corresponding regno.
> 
> Aside from the intentional semantic change for kfuncs, the rest of the
> refactoring strives to keep failure logic and error messages unchanged.
> However, because the release arg searching is now done before any
> arg-specific type checking, verifier states that are invalid due to both
> invalid release arg state _and_ some type- or helper-specific checking
> might see release arg-related error message first.
> 
> Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
> ---
>  kernel/bpf/verifier.c | 206 ++++++++++++++++++++++++++++--------------
>  1 file changed, 138 insertions(+), 68 deletions(-)
> 
> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> index a5255a0dcbb6..824e2242eae5 100644
> --- a/kernel/bpf/verifier.c
> +++ b/kernel/bpf/verifier.c
> @@ -6412,49 +6412,6 @@ static int check_func_arg(struct bpf_verifier_env *env, u32 arg,
>  		return err;
>  
>  skip_type_check:
> -	if (arg_type_is_release(arg_type)) {
> -		if (arg_type_is_dynptr(arg_type)) {
> -			struct bpf_func_state *state = func(env, reg);
> -			int spi;
> -
> -			/* Only dynptr created on stack can be released, thus
> -			 * the get_spi and stack state checks for spilled_ptr
> -			 * should only be done before process_dynptr_func for
> -			 * PTR_TO_STACK.
> -			 */
> -			if (reg->type == PTR_TO_STACK) {
> -				spi = get_spi(reg->off);
> -				if (!is_spi_bounds_valid(state, spi, BPF_DYNPTR_NR_SLOTS) ||
> -				    !state->stack[spi].spilled_ptr.ref_obj_id) {
> -					verbose(env, "arg %d is an unacquired reference\n", regno);
> -					return -EINVAL;
> -				}
> -			} else {
> -				verbose(env, "cannot release unowned const bpf_dynptr\n");
> -				return -EINVAL;
> -			}
> -		} else if (!reg->ref_obj_id && !register_is_null(reg)) {
> -			verbose(env, "R%d must be referenced when passed to release function\n",
> -				regno);
> -			return -EINVAL;
> -		}
> -		if (meta->release_regno) {
> -			verbose(env, "verifier internal error: more than one release argument\n");
> -			return -EFAULT;
> -		}
> -		meta->release_regno = regno;
> -	}
> -
> -	if (reg->ref_obj_id) {
> -		if (meta->ref_obj_id) {
> -			verbose(env, "verifier internal error: more than one arg with ref_obj_id R%d %u %u\n",
> -				regno, reg->ref_obj_id,
> -				meta->ref_obj_id);
> -			return -EFAULT;
> -		}
> -		meta->ref_obj_id = reg->ref_obj_id;
> -	}
> -
>  	switch (base_type(arg_type)) {
>  	case ARG_CONST_MAP_PTR:
>  		/* bpf_map_xxx(map_ptr) call: remember that map_ptr */
> @@ -6565,6 +6522,27 @@ static int check_func_arg(struct bpf_verifier_env *env, u32 arg,
>  		err = check_mem_size_reg(env, reg, regno, true, meta);
>  		break;
>  	case ARG_PTR_TO_DYNPTR:
> +		if (meta->release_regno == regno) {
> +			struct bpf_func_state *state = func(env, reg);
> +			int spi;
> +
> +			/* Only dynptr created on stack can be released, thus
> +			 * the get_spi and stack state checks for spilled_ptr
> +			 * should only be done before process_dynptr_func for
> +			 * PTR_TO_STACK.
> +			 */
> +			if (reg->type == PTR_TO_STACK) {
> +				spi = get_spi(reg->off);
> +				if (!is_spi_bounds_valid(state, spi, BPF_DYNPTR_NR_SLOTS) ||
> +				    !state->stack[spi].spilled_ptr.ref_obj_id) {
> +					verbose(env, "arg %d is an unacquired reference\n", regno);
> +					return -EINVAL;
> +				}
> +			} else {
> +				verbose(env, "cannot release unowned const bpf_dynptr\n");
> +				return -EINVAL;
> +			}
> +		}
>  		err = process_dynptr_func(env, regno, arg_type, meta);
>  		if (err)
>  			return err;
> @@ -7699,10 +7677,78 @@ static void update_loop_inline_state(struct bpf_verifier_env *env, u32 subprogno
>  				 state->callback_subprogno == subprogno);
>  }
>  
> +/* Call arg meta's ref_obj_id is used to either:
> + *   - For release funcs, keep track of ref that needs to be released
> + *   - For other funcs, keep track of ref that needs to be propagated to retval
> + *
> + * Find and return:
> + *   - Regno that should become meta->ref_obj_id on success
> + *     (regno > 0 since BPF_REG_1 is first arg)
> + *   - 0 if no arg had ref_obj_id set
> + *   - Negative err if some invalid arg reg state
> + *
> + * allow_multi controls whether multiple args w/ ref_obj_id set is valid
> + *   - true: regno of _last_ such arg reg is returned
> + *   - false: err if multiple args w/ ref_obj_id set are seen
> + */

Could you please update this function header to match the suggested
formatting in the coding style ([1])? Applies to
helper_proto_find_release_arg_regno() as well.

[1]: https://www.kernel.org/doc/html/latest/doc-guide/kernel-doc.html#function-documentation

> +static int args_find_ref_obj_id_regno(struct bpf_verifier_env *env, struct bpf_reg_state *regs,
> +				      u32 nargs, bool allow_multi)
> +{
> +	struct bpf_reg_state *reg;
> +	u32 i, regno, found_regno = 0;
> +
> +	for (i = 0; i < nargs; i++) {
> +		regno = i + 1;
> +		reg = &regs[regno];
> +
> +		if (!reg->ref_obj_id)
> +			continue;
> +
> +		if (!allow_multi && found_regno) {
> +			verbose(env, "verifier internal error: more than one arg with ref_obj_id R%d %u %u\n",
> +				regno, reg->ref_obj_id, regs[found_regno].ref_obj_id);
> +			return -EFAULT;
> +		}
> +
> +		found_regno = regno;
> +	}
> +
> +	return found_regno;
> +}
> +
> +/* Find the OBJ_RELEASE arg in helper func proto and return:
> + *   - regno of single OBJ_RELEASE arg
> + *   - 0 if no arg in the proto was OBJ_RELEASE
> + *   - Negative err if some invalid func proto state
> + */
> +static int helper_proto_find_release_arg_regno(struct bpf_verifier_env *env,
> +					       const struct bpf_func_proto *fn, u32 nargs)
> +{
> +	enum bpf_arg_type arg_type;
> +	int i, release_regno = 0;
> +
> +	for (i = 0; i < nargs; i++) {
> +		arg_type = fn->arg_type[i];
> +
> +		if (!arg_type_is_release(arg_type))
> +			continue;
> +
> +		if (release_regno) {
> +			verbose(env, "verifier internal error: more than one release argument\n");
> +			return -EFAULT;
> +		}
> +
> +		release_regno = i + BPF_REG_1;
> +	}
> +
> +	return release_regno;
> +}
> +
>  static int check_helper_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
>  			     int *insn_idx_p)
>  {
>  	enum bpf_prog_type prog_type = resolve_prog_type(env->prog);
> +	int i, err, func_id, nargs, release_regno, ref_regno;
>  	const struct bpf_func_proto *fn = NULL;
>  	enum bpf_return_type ret_type;
>  	enum bpf_type_flag ret_flag;
> @@ -7710,7 +7756,6 @@ static int check_helper_call(struct bpf_verifier_env *env, struct bpf_insn *insn
>  	struct bpf_call_arg_meta meta;
>  	int insn_idx = *insn_idx_p;
>  	bool changes_data;
> -	int i, err, func_id;
>  
>  	/* find function prototype */
>  	func_id = insn->imm;
> @@ -7774,8 +7819,38 @@ static int check_helper_call(struct bpf_verifier_env *env, struct bpf_insn *insn
>  	}
>  
>  	meta.func_id = func_id;
> +	regs = cur_regs(env);
> +
> +	/* find actual arg count */
> +	for (i = 0; i < MAX_BPF_FUNC_REG_ARGS; i++)
> +		if (fn->arg_type[i] == ARG_DONTCARE)
> +			break;
> +	nargs = i;

Is this just an optimization to avoid unnecessary loop iterations? If
so, can we pull it into a separate patch? Also, you could very slightly
simplify this by doing the for loop over nargs here instead of i. Feel
free to ignore though if you think that will be less readable.

> +
> +	release_regno = helper_proto_find_release_arg_regno(env, fn, nargs);
> +	if (release_regno < 0)
> +		return release_regno;
> +
> +	ref_regno = args_find_ref_obj_id_regno(env, regs, nargs, false);
> +	if (ref_regno < 0)
> +		return ref_regno;

Hmm, I'm confused. Why are we tracking two different registers here,
given that it's a helper function so the release argument should be
unambiguous? Can we just get rid of ref_regno and use release_regno
here? Or am I missing something?

Note that I don't think it's necessarily incorrect to pass multiple
arguments with ref_obj_id > 0 to a helper function precisely because
there's no ambiguity as to which argument is being released. One
argument could be a refcounted object that's not being released, and
another could be the object being released. I don't think we have any
such helpers, but conceptually it doesn't seem like something we'd need
to protect against. It's actually kfuncs where it feels problematic to
have multiple ref_obj_id > 0 args due to the inherent ambiguity, though
I realize the intention of this patch is to enable the behavior for
kfuncs.

> +	else if (ref_regno > 0)
> +		meta.ref_obj_id = regs[ref_regno].ref_obj_id;
> +
> +	if (release_regno > 0) {
> +		if (!regs[release_regno].ref_obj_id &&
> +		    !register_is_null(&regs[release_regno]) &&
> +		    !arg_type_is_dynptr(fn->arg_type[release_regno - BPF_REG_1])) {
> +			verbose(env, "R%d must be referenced when passed to release function\n",
> +				release_regno);
> +			return -EINVAL;
> +		}
> +
> +		meta.release_regno = release_regno;
> +	}
> +
>  	/* check args */
> -	for (i = 0; i < MAX_BPF_FUNC_REG_ARGS; i++) {
> +	for (i = 0; i < nargs; i++) {
>  		err = check_func_arg(env, i, &meta, fn);
>  		if (err)
>  			return err;
> @@ -7799,8 +7874,6 @@ static int check_helper_call(struct bpf_verifier_env *env, struct bpf_insn *insn
>  			return err;
>  	}
>  
> -	regs = cur_regs(env);
> -
>  	/* This can only be set for PTR_TO_STACK, as CONST_PTR_TO_DYNPTR cannot
>  	 * be reinitialized by any dynptr helper. Hence, mark_stack_slots_dynptr
>  	 * is safe to do directly.
> @@ -8795,10 +8868,11 @@ static int process_kf_arg_ptr_to_list_node(struct bpf_verifier_env *env,
>  static int check_kfunc_args(struct bpf_verifier_env *env, struct bpf_kfunc_call_arg_meta *meta)
>  {
>  	const char *func_name = meta->func_name, *ref_tname;
> +	struct bpf_reg_state *regs = cur_regs(env);
>  	const struct btf *btf = meta->btf;
>  	const struct btf_param *args;
>  	u32 i, nargs;
> -	int ret;
> +	int ret, ref_regno;
>  
>  	args = (const struct btf_param *)(meta->func_proto + 1);
>  	nargs = btf_type_vlen(meta->func_proto);
> @@ -8808,17 +8882,31 @@ static int check_kfunc_args(struct bpf_verifier_env *env, struct bpf_kfunc_call_
>  		return -EINVAL;
>  	}
>  
> +	ref_regno = args_find_ref_obj_id_regno(env, cur_regs(env), nargs, true);
> +	if (ref_regno < 0) {
> +		return ref_regno;
> +	} else if (!ref_regno && is_kfunc_release(meta)) {
> +		verbose(env, "release kernel function %s expects refcounted PTR_TO_BTF_ID\n",
> +			func_name);
> +		return -EINVAL;
> +	}
> +
> +	meta->ref_obj_id = regs[ref_regno].ref_obj_id;
> +	if (is_kfunc_release(meta))
> +		meta->release_regno = ref_regno;
> +
>  	/* Check that BTF function arguments match actual types that the
>  	 * verifier sees.
>  	 */
>  	for (i = 0; i < nargs; i++) {
> -		struct bpf_reg_state *regs = cur_regs(env), *reg = &regs[i + 1];
>  		const struct btf_type *t, *ref_t, *resolve_ret;
>  		enum bpf_arg_type arg_type = ARG_DONTCARE;
>  		u32 regno = i + 1, ref_id, type_size;
>  		bool is_ret_buf_sz = false;
> +		struct bpf_reg_state *reg;
>  		int kf_arg_type;
>  
> +		reg = &regs[regno];
>  		t = btf_type_skip_modifiers(btf, args[i].type, NULL);
>  
>  		if (is_kfunc_arg_ignore(btf, &args[i]))
> @@ -8875,18 +8963,6 @@ static int check_kfunc_args(struct bpf_verifier_env *env, struct bpf_kfunc_call_
>  			return -EINVAL;
>  		}
>  
> -		if (reg->ref_obj_id) {
> -			if (is_kfunc_release(meta) && meta->ref_obj_id) {
> -				verbose(env, "verifier internal error: more than one arg with ref_obj_id R%d %u %u\n",
> -					regno, reg->ref_obj_id,
> -					meta->ref_obj_id);
> -				return -EFAULT;
> -			}
> -			meta->ref_obj_id = reg->ref_obj_id;
> -			if (is_kfunc_release(meta))
> -				meta->release_regno = regno;
> -		}
> -
>  		ref_t = btf_type_skip_modifiers(btf, t->type, &ref_id);
>  		ref_tname = btf_name_by_offset(btf, ref_t->name_off);
>  
> @@ -8929,7 +9005,7 @@ static int check_kfunc_args(struct bpf_verifier_env *env, struct bpf_kfunc_call_
>  			return -EFAULT;
>  		}
>  
> -		if (is_kfunc_release(meta) && reg->ref_obj_id)
> +		if (is_kfunc_release(meta) && regno == meta->release_regno)
>  			arg_type |= OBJ_RELEASE;
>  		ret = check_func_arg_reg_off(env, reg, regno, arg_type);
>  		if (ret < 0)
> @@ -9049,12 +9125,6 @@ static int check_kfunc_args(struct bpf_verifier_env *env, struct bpf_kfunc_call_
>  		}
>  	}
>  
> -	if (is_kfunc_release(meta) && !meta->release_regno) {
> -		verbose(env, "release kernel function %s expects refcounted PTR_TO_BTF_ID\n",
> -			func_name);
> -		return -EINVAL;
> -	}
> -
>  	return 0;
>  }
>  
> -- 
> 2.30.2
> 

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v2 bpf-next 02/13] bpf: Migrate release_on_unlock logic to non-owning ref semantics
  2022-12-28 23:46   ` David Vernet
@ 2022-12-29 15:39     ` David Vernet
  0 siblings, 0 replies; 38+ messages in thread
From: David Vernet @ 2022-12-29 15:39 UTC (permalink / raw)
  To: Dave Marchevsky
  Cc: bpf, Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Kernel Team, Kumar Kartikeya Dwivedi, Tejun Heo

On Wed, Dec 28, 2022 at 05:46:54PM -0600, David Vernet wrote:

[...]

> Hey Dave,
> 
> I'm sorry to be chiming in a bit late in the game here, but I only
> finally had the time to fully review some of this stuff during the
> holiday-lull, and I have a few questions / concerns about the whole
> owning vs. non-owning refcount approach we're taking here.

After sleeping on this and reading through the
discussion in [0], I have some slight adjustments I want to make to my
points here.

[0]: https://lore.kernel.org/bpf/20221207230602.logjjjv3kwiiy6u3@macbook-pro-6.dhcp.thefacebook.com/

> 
> > ---
> >  include/linux/bpf.h          |   1 +
> >  include/linux/bpf_verifier.h |  39 ++++-----
> >  include/linux/btf.h          |  17 ++--
> >  kernel/bpf/helpers.c         |   4 +-
> >  kernel/bpf/verifier.c        | 164 ++++++++++++++++++++++++-----------
> >  5 files changed, 146 insertions(+), 79 deletions(-)
> > 
> > diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> > index 3de24cfb7a3d..f71571bf6adc 100644
> > --- a/include/linux/bpf.h
> > +++ b/include/linux/bpf.h
> > @@ -180,6 +180,7 @@ enum btf_field_type {
> >  	BPF_KPTR       = BPF_KPTR_UNREF | BPF_KPTR_REF,
> >  	BPF_LIST_HEAD  = (1 << 4),
> >  	BPF_LIST_NODE  = (1 << 5),
> > +	BPF_GRAPH_NODE_OR_ROOT = BPF_LIST_NODE | BPF_LIST_HEAD,
> >  };
> >  
> >  struct btf_field_kptr {
> > diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h
> > index 53d175cbaa02..cb417ffbbb84 100644
> > --- a/include/linux/bpf_verifier.h
> > +++ b/include/linux/bpf_verifier.h
> > @@ -43,6 +43,22 @@ enum bpf_reg_liveness {
> >  	REG_LIVE_DONE = 0x8, /* liveness won't be updating this register anymore */
> >  };
> >  
> > +/* For every reg representing a map value or allocated object pointer,
> > + * we consider the tuple of (ptr, id) for them to be unique in verifier
> > + * context and conside them to not alias each other for the purposes of
> > + * tracking lock state.
> > + */
> > +struct bpf_active_lock {
> > +	/* This can either be reg->map_ptr or reg->btf. If ptr is NULL,
> > +	 * there's no active lock held, and other fields have no
> > +	 * meaning. If non-NULL, it indicates that a lock is held and
> > +	 * id member has the reg->id of the register which can be >= 0.
> > +	 */
> > +	void *ptr;
> > +	/* This will be reg->id */
> > +	u32 id;
> > +};
> > +
> >  struct bpf_reg_state {
> >  	/* Ordering of fields matters.  See states_equal() */
> >  	enum bpf_reg_type type;
> > @@ -68,6 +84,7 @@ struct bpf_reg_state {
> >  		struct {
> >  			struct btf *btf;
> >  			u32 btf_id;
> > +			struct bpf_active_lock non_owning_ref_lock;
> >  		};
> >  
> >  		u32 mem_size; /* for PTR_TO_MEM | PTR_TO_MEM_OR_NULL */
> > @@ -223,11 +240,6 @@ struct bpf_reference_state {
> >  	 * exiting a callback function.
> >  	 */
> >  	int callback_ref;
> > -	/* Mark the reference state to release the registers sharing the same id
> > -	 * on bpf_spin_unlock (for nodes that we will lose ownership to but are
> > -	 * safe to access inside the critical section).
> > -	 */
> > -	bool release_on_unlock;
> >  };
> >  
> >  /* state of the program:
> > @@ -328,21 +340,8 @@ struct bpf_verifier_state {
> >  	u32 branches;
> >  	u32 insn_idx;
> >  	u32 curframe;
> > -	/* For every reg representing a map value or allocated object pointer,
> > -	 * we consider the tuple of (ptr, id) for them to be unique in verifier
> > -	 * context and conside them to not alias each other for the purposes of
> > -	 * tracking lock state.
> > -	 */
> > -	struct {
> > -		/* This can either be reg->map_ptr or reg->btf. If ptr is NULL,
> > -		 * there's no active lock held, and other fields have no
> > -		 * meaning. If non-NULL, it indicates that a lock is held and
> > -		 * id member has the reg->id of the register which can be >= 0.
> > -		 */
> > -		void *ptr;
> > -		/* This will be reg->id */
> > -		u32 id;
> > -	} active_lock;
> > +
> > +	struct bpf_active_lock active_lock;
> >  	bool speculative;
> >  	bool active_rcu_lock;
> >  
> > diff --git a/include/linux/btf.h b/include/linux/btf.h
> > index 5f628f323442..8aee3f7f4248 100644
> > --- a/include/linux/btf.h
> > +++ b/include/linux/btf.h
> > @@ -15,10 +15,10 @@
> >  #define BTF_TYPE_EMIT_ENUM(enum_val) ((void)enum_val)
> >  
> >  /* These need to be macros, as the expressions are used in assembler input */
> > -#define KF_ACQUIRE	(1 << 0) /* kfunc is an acquire function */
> > -#define KF_RELEASE	(1 << 1) /* kfunc is a release function */
> > -#define KF_RET_NULL	(1 << 2) /* kfunc returns a pointer that may be NULL */
> > -#define KF_KPTR_GET	(1 << 3) /* kfunc returns reference to a kptr */
> > +#define KF_ACQUIRE		(1 << 0) /* kfunc is an acquire function */
> > +#define KF_RELEASE		(1 << 1) /* kfunc is a release function */
> > +#define KF_RET_NULL		(1 << 2) /* kfunc returns a pointer that may be NULL */
> > +#define KF_KPTR_GET		(1 << 3) /* kfunc returns reference to a kptr */
> >  /* Trusted arguments are those which are guaranteed to be valid when passed to
> >   * the kfunc. It is used to enforce that pointers obtained from either acquire
> >   * kfuncs, or from the main kernel on a tracepoint or struct_ops callback
> > @@ -67,10 +67,11 @@
> >   *	return 0;
> >   * }
> >   */
> > -#define KF_TRUSTED_ARGS (1 << 4) /* kfunc only takes trusted pointer arguments */
> > -#define KF_SLEEPABLE    (1 << 5) /* kfunc may sleep */
> > -#define KF_DESTRUCTIVE  (1 << 6) /* kfunc performs destructive actions */
> > -#define KF_RCU          (1 << 7) /* kfunc only takes rcu pointer arguments */
> > +#define KF_TRUSTED_ARGS	(1 << 4) /* kfunc only takes trusted pointer arguments */
> > +#define KF_SLEEPABLE		(1 << 5) /* kfunc may sleep */
> > +#define KF_DESTRUCTIVE		(1 << 6) /* kfunc performs destructive actions */
> > +#define KF_RCU			(1 << 7) /* kfunc only takes rcu pointer arguments */
> > +#define KF_RELEASE_NON_OWN	(1 << 8) /* kfunc converts its referenced arg into non-owning ref */
> 
> It would be nice if we could come up with new kfunc flag names that
> don't have 'RELEASE' in it. As is, this is arguably a bit of a leaky
> abstraction given that kfunc authors now have to understand a notion of
> "releasing", "releasing but keeping a non-owning ref", and "releasing
> but it must be a non-owning reference". I know that in [0] you mention
> that the notions of owning and non-owning references are entirely
> relegated to graph-type maps, but I disagree. More below.
> 
> [0]: https://lore.kernel.org/all/20221217082506.1570898-14-davemarchevsky@fb.com/

I see now why you said that owning and non-owning were entirely
relegated to graph-type maps. I think the general idea of owning vs.
non-owning references in the context of graph-type maps is reasonable,
but I think the proposal here unintentionally conflates graph membership
with refcounted ownership through its naming choices, which is
problematic.

I'll briefly say one more thing below, but I'll continue the
conversation on Alexei's email in [1] to keep everything in the same
place. I'll be responding there shortly.

[1]: https://lore.kernel.org/all/20221229035600.m43ayhidfisbl4sq@MacBook-Pro-6.local/

> 
> In general, IMO this muddies the existing, crystal-clear semantics of
> BPF object ownership and refcounting. Usually a "weak" or "non-owning"
> reference is a shadow of a strong reference, and "using" the weak
> reference requires attempting (because it could fail) to temporarily
> promote it to a strong reference. If successful, the object's existence
> is guaranteed until the weak pointer is demoted back to a weak pointer
> and/or the promoted strong pointer is released, and it's perfectly valid
> for an object's lifetime to be extended due to a promoted weak pointer
> not dropping its reference until after all the other strong pointer
> references have been dropped. The key point here is that a pointer's
> safety is entirely dictated by whether or not the holder has or is able
> to acquire a strong reference, and nothing more.
> 
> In contrast, if I understand correctly, in this proposal a "non-owning"
> reference means that the object is guaranteed to be valid due to
> external factors such as a lock being held on the root node of the
> graph, and is used to e.g. signal whether an object has or has not yet
> been added as a node to an rbtree or a list. If so, IMO these are
> completely separate concepts from refcounting, and I don't think we
> should intertwine it with the acquire / release semantics that we
> currently use for ensuring object lifetime.
> 
> Note that weak references are usually (if not always, at least in my
> experience) used to resolve circular dependencies where the reference
> would always be leaked if both sides had a strong reference. I don't
> think that applies here, where instead we're using "owning reference" to
> mean that ownership of the object has not yet been passed to a
> graph-type data structure, and "non-owning reference" to mean that the
> graph now owns the strong reference, but it's still safe to reference
> the object due to it being protected by some external synchronization
> mechanism like a lock. There's no danger of a circular dependency here,
> we just want to provide consistent API semantics.
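To make the ownership transition being discussed concrete, here's a
minimal sketch from the program's point of view, assuming the list API
from this series and the private()/__contains conventions used in its
selftests; the "elem" type and "val" field are made up for illustration:

  struct elem {
    long val;
    struct bpf_list_node node;
  };

  private(B) struct bpf_spin_lock elock;
  private(B) struct bpf_list_head elist __contains(elem, node);

  /* ... in a BPF program */
  struct elem *e = bpf_obj_new(typeof(*e));

  if (!e)
    return 0;
  /* e is an owning reference: the program owns the object */

  bpf_spin_lock(&elock);
  bpf_list_push_back(&elist, &e->node);
  /* e is now a non-owning reference: the list owns the object, but e
   * stays readable, and after this patch writable (special fields like
   * bpf_list_node excluded), while elist's lock is held
   */
  e->val = 42;
  bpf_spin_unlock(&elock);
  /* after unlock the verifier invalidates e */

The question raised above is whether that guarantee should be expressed
in refcounting terms at all.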
> 
> If we want to encapsulate notions of "safe due to a lock being held on a
> root node", and "pointer hasn't yet been inserted into the graph", I
> think we should consider adding some entirely separate abstractions. For
> example, something like PTR_GRAPH_STORED on the register type-modifier
> side for signaling whether a pointer has already been stored in a graph,
> and KF_GRAPH_INSERT, KF_GRAPH_REMOVE type kfunc flags for adding and
> removing from graphs respectively. I don't think we'd have to add
> anything at all for ensuring pointer safety from the lock being held, as
> the verifier should be able to figure out that a pointer that was
> inserted with KF_GRAPH_INSERT is safe to reference inside of the locked
> region of the lock associated with the root node. The refcnt of the
> object isn't relevant at all, it's the association of the root node with
> a specific lock.
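As a purely hypothetical strawman for the naming proposed above (these
flags don't exist, and the bit values are made up):

  #define KF_GRAPH_INSERT	(1 << 8) /* kfunc stores its node arg in a graph */
  #define KF_GRAPH_REMOVE	(1 << 9) /* kfunc removes its node arg from a graph */

  BTF_ID_FLAGS(func, bpf_list_push_front, KF_GRAPH_INSERT)
  BTF_ID_FLAGS(func, bpf_list_push_back, KF_GRAPH_INSERT)
  BTF_ID_FLAGS(func, bpf_list_pop_front, KF_ACQUIRE | KF_RET_NULL | KF_GRAPH_REMOVE)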
> 
> >  
> >  /*
> >   * Return the name of the passed struct, if exists, or halt the build if for
> > diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
> > index af30c6cbd65d..e041409779c3 100644
> > --- a/kernel/bpf/helpers.c
> > +++ b/kernel/bpf/helpers.c
> > @@ -2049,8 +2049,8 @@ BTF_ID_FLAGS(func, crash_kexec, KF_DESTRUCTIVE)
> >  #endif
> >  BTF_ID_FLAGS(func, bpf_obj_new_impl, KF_ACQUIRE | KF_RET_NULL)
> >  BTF_ID_FLAGS(func, bpf_obj_drop_impl, KF_RELEASE)
> > -BTF_ID_FLAGS(func, bpf_list_push_front)
> > -BTF_ID_FLAGS(func, bpf_list_push_back)
> > +BTF_ID_FLAGS(func, bpf_list_push_front, KF_RELEASE | KF_RELEASE_NON_OWN)
> > +BTF_ID_FLAGS(func, bpf_list_push_back, KF_RELEASE | KF_RELEASE_NON_OWN)
> 
> I don't think a helper should specify both of these flags together.
> IIUC, what this is saying is something along the lines of, "Release the
> reference, but rather than actually releasing it, just keep it and
> convert it into a non-owning reference". IMO KF_RELEASE should always
> mean, exclusively, "I'm releasing a previously-acquired strong reference
> to an object", and the expectation should be that the object cannot be
> referenced _at all_ afterwards, unless you happen to have another strong
> reference.
> 
> IMO this is another sign that we should consider going in a different
> direction for owning vs.  non-owning references. I don't think this

I no longer feel that we should be going in a different direction for
owning vs. non-owning as it applies to graphs, as you said in [2]. I do
think, however, that we have to revisit some naming choices, and
possibly consider not adding new kfunc flags at all to enable this.
I'll respond on the other thread to Alexei to keep the discussion in one
place.

[2]: https://lore.kernel.org/all/20221229035600.m43ayhidfisbl4sq@MacBook-Pro-6.local/

> makes sense from an object-refcounting perspective, but I readily admit
> that I could be missing a lot of important context here.
> 
> [...]
> 
> Thanks,
> David

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v2 bpf-next 01/13] bpf: Support multiple arg regs w/ ref_obj_id for kfuncs
  2022-12-29  6:40   ` David Vernet
@ 2022-12-29 16:50     ` Alexei Starovoitov
  2022-12-29 17:00       ` David Vernet
  0 siblings, 1 reply; 38+ messages in thread
From: Alexei Starovoitov @ 2022-12-29 16:50 UTC (permalink / raw)
  To: David Vernet
  Cc: Dave Marchevsky, bpf, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, Kernel Team, Kumar Kartikeya Dwivedi, Tejun Heo

On Wed, Dec 28, 2022 at 10:40 PM David Vernet <void@manifault.com> wrote:
>
> On Sat, Dec 17, 2022 at 12:24:54AM -0800, Dave Marchevsky wrote:
> > Currently, kfuncs marked KF_RELEASE indicate that they release some
> > previously-acquired arg. The verifier assumes that such a function will
> > only have one arg reg w/ ref_obj_id set, and that that arg is the one to
> > be released. Multiple kfunc arg regs having ref_obj_id set is considered
> > an invalid state.
> >
> > For helpers, RELEASE is used to tag a particular arg in the function
> > proto, not the function itself. The arg with OBJ_RELEASE type tag is the
> > arg that the helper will release. There can only be one such tagged arg.
> > When verifying arg regs, multiple helper arg regs w/ ref_obj_id set is
> > also considered an invalid state.
> >
> > Later patches in this series will result in some linked_list helpers
> > marked KF_RELEASE having a valid reason to take two ref_obj_id args.
> > Specifically, bpf_list_push_{front,back} can push a node to a list head
> > which is itself part of a list node. In such a scenario both arguments
> > to these functions would have ref_obj_id > 0, thus would fail
> > verification under current logic.
> >
> > This patch changes kfunc ref_obj_id searching logic to find the last arg
> > reg w/ ref_obj_id and consider that the reg-to-release. This should be
> > backwards-compatible with all current kfuncs as they only expect one
> > such arg reg.
>
> Can't say I'm a huge fan of this proposal :-( While I think it's really
> unfortunate that kfunc flags are not defined per-arg for this exact type
> of reason, adding more flag-specific semantics like this is IMO a step
> in the wrong direction.  It's similar to the existing __sz and __k
> argument-naming semantics that inform the verifier that the arguments
> have special meaning. All of these little additions of special-case
> handling for kfunc flags end up requiring people writing kfuncs (and
> sometimes calling them) to read through the verifier to understand
> what's going on (though I will say that it's nice that __sz and __k are
> properly documented in [0]).

Before getting to pros/cons of KF_* vs name suffix vs helper style
per-arg description...
It's important to highlight that here we're talking about
linked list and rbtree kfuncs that are not like other kfuncs.
The majority of kfuncs can be added by subsystems like hid-bpf
without touching the verifier.
Here we're paving the way for graphs (aka new-gen data structs),
and so far not only the kfuncs but also their arg types have to have
special handling inside the verifier.
There is not much yet to generalize and expose as a generic KF_
flag or as a name suffix.
Therefore I think it's more appropriate to implement them
with minimal verifier changes and minimal complexity.
There is no 3rd graph algorithm on the horizon after linked list
and rbtree. Instead there is a big todo list for
'multi owner graph node' and 'bpf_refcount_t'.
Those will require bigger changes in the verifier,
so I'd like to avoid premature generalization :) as analogous
to premature optimization :)
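
To make the patch 1 scenario concrete: the "push a node to a list head
which is itself part of a list node" case looks roughly like the sketch
below (names are made up, and exact lock-placement rules are elided):

  struct inner {
    long data;
    struct bpf_list_node node;
  };

  struct outer {
    struct bpf_list_node node;
    struct bpf_spin_lock lock;
    struct bpf_list_head head __contains(inner, node);
  };

  struct outer *o = bpf_obj_new(typeof(*o));
  struct inner *i = bpf_obj_new(typeof(*i));

  /* ... NULL checks ... */
  bpf_spin_lock(&o->lock);
  /* both args are derived from regs with ref_obj_id > 0, which the
   * old "only one arg with ref_obj_id" check rejects
   */
  bpf_list_push_back(&o->head, &i->node);
  bpf_spin_unlock(&o->lock);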

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v2 bpf-next 02/13] bpf: Migrate release_on_unlock logic to non-owning ref semantics
  2022-12-29  3:56   ` Alexei Starovoitov
@ 2022-12-29 16:54     ` David Vernet
  2023-01-17 16:54       ` Dave Marchevsky
  2023-01-17 16:07     ` Dave Marchevsky
  1 sibling, 1 reply; 38+ messages in thread
From: David Vernet @ 2022-12-29 16:54 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Dave Marchevsky, bpf, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, Kernel Team, Kumar Kartikeya Dwivedi, Tejun Heo

On Wed, Dec 28, 2022 at 07:56:00PM -0800, Alexei Starovoitov wrote:
> On Sat, Dec 17, 2022 at 12:24:55AM -0800, Dave Marchevsky wrote:
> > This patch introduces non-owning reference semantics to the verifier,
> > specifically linked_list API kfunc handling. release_on_unlock logic for
> > refs is refactored - with small functional changes - to implement these
> > semantics, and bpf_list_push_{front,back} are migrated to use them.
> > 
> > When a list node is pushed to a list, the program still has a pointer to
> > the node:
> > 
> >   n = bpf_obj_new(typeof(*n));
> > 
> >   bpf_spin_lock(&l);
> >   bpf_list_push_back(&l, n);
> >   /* n still points to the just-added node */
> >   bpf_spin_unlock(&l);
> > 
> > What the verifier considers n to be after the push, and thus what can be
> > done with n, are changed by this patch.
> > 
> > Common properties both before/after this patch:
> >   * After push, n is only a valid reference to the node until end of
> >     critical section
> >   * After push, n cannot be pushed to any list
> >   * After push, the program can read the node's fields using n
> 
> correct.
> 
> > Before:
> >   * After push, n retains the ref_obj_id which it received on
> >     bpf_obj_new, but the associated bpf_reference_state's
> >     release_on_unlock field is set to true
> >     * release_on_unlock field and associated logic is used to implement
> >       "n is only a valid ref until end of critical section"
> >   * After push, n cannot be written to, the node must be removed from
> >     the list before writing to its fields
> >   * After push, n is marked PTR_UNTRUSTED
> 
> yep
> 
> > After:
> >   * After push, n's ref is released and ref_obj_id set to 0. The
> >     bpf_reg_state's non_owning_ref_lock struct is populated with the
> >     currently active lock
> >     * non_owning_ref_lock and logic is used to implement "n is only a
> >       valid ref until end of critical section"
> >   * n can be written to (except for special fields e.g. bpf_list_node,
> >     timer, ...)
> >   * No special type flag is added to n after push
> 
> yep.
> Great summary.
> 
> > Summary of specific implementation changes to achieve the above:
> > 
> >   * release_on_unlock field, ref_set_release_on_unlock helper, and logic
> >     to "release on unlock" based on that field are removed
> 
> +1 
> 
> >   * The anonymous active_lock struct used by bpf_verifier_state is
> >     pulled out into a named struct bpf_active_lock.
> ...
> >   * A non_owning_ref_lock field of type bpf_active_lock is added to
> >     bpf_reg_state's PTR_TO_BTF_ID union
> 
> not great. see below.
> 
> >   * Helpers are added to use non_owning_ref_lock to implement non-owning
> >     ref semantics as described above
> >     * invalidate_non_owning_refs - helper to clobber all non-owning refs
> >       matching a particular bpf_active_lock identity. Replaces
> >       release_on_unlock logic in process_spin_lock.
> 
> +1
> 
> >     * ref_set_non_owning_lock - set non_owning_ref_lock for a reg based
> >       on current verifier state
> 
> +1
> 
> >     * ref_convert_owning_non_owning - convert owning reference w/
> >       specified ref_obj_id to non-owning references. Setup
> >       non_owning_ref_lock for each reg with that ref_obj_id and 0 out
> >       its ref_obj_id
> 
> +1
> 
> >   * New KF_RELEASE_NON_OWN flag is added, to be used in conjunction with
> >     KF_RELEASE to indicate that the release arg reg should be converted
> >     to non-owning ref
> >     * Plain KF_RELEASE would clobber all regs with ref_obj_id matching
> >       the release arg reg's. KF_RELEASE_NON_OWN's logic triggers first -
> >       doing ref_convert_owning_non_owning on the ref first, which
> >       prevents the regs from being clobbered by 0ing out their
> >       ref_obj_ids. The bpf_reference_state itself is still released via
> >       release_reference as a result of the KF_RELEASE flag.
> >     * KF_RELEASE | KF_RELEASE_NON_OWN are added to
> >       bpf_list_push_{front,back}
> 
> And this bit is confusing and not generalizable.

+1 on both counts. If we want to make it generalizable, I think the only
way to do it would be to generalize it across different graph map types.
For example, to have kfunc flags like KF_GRAPH_INSERT and
KF_GRAPH_REMOVE which signal to the verifier that "for this graph-type
map which has a spin-lock associated with its root node that I expect to
be held, I've {inserted, removed} the node {to, from} the graph, so
adjust the refcnt / pointer type accordingly and then clean up when the
lock is dropped."

I don't see any reason to add kfunc flags for that though, as the fact
that the pointer in question refers to a node whose root node has a lock
associated with it is already itself a special-case scenario.
I think we should just special-case these kfuncs in the verifier as
"graph-type" kfuncs in some static global array(s).  That's probably
less error prone anyways, and I don't see the typical kfunc writer ever
needing to do this.
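
Something along these lines (a sketch of the btf_id-based special-casing
I mean, not actual verifier code):

  BTF_ID_LIST(graph_insert_kfuncs)
  BTF_ID(func, bpf_list_push_front)
  BTF_ID(func, bpf_list_push_back)
  BTF_ID(func, bpf_rbtree_add)

  static bool is_graph_insert_kfunc(u32 btf_id)
  {
    return btf_id == graph_insert_kfuncs[0] ||
           btf_id == graph_insert_kfuncs[1] ||
           btf_id == graph_insert_kfuncs[2];
  }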

> As David noticed in his reply KF_RELEASE_NON_OWN is not a great name.
> It's hard to come up with a good name and it won't be generic anyway.
> The ref_convert_owning_non_owning has to be applied to a specific arg.
> The function itself is not KF_RELEASE in the current definition of it.
> The combination of KF_RELEASE|KF_RELEASE_NON_OWN is something new
> that should have been generic, but doesn't really work this way.
> In the next patches rbtree_root/node still has to have all the custom
> logic.
> KF_RELEASE_NON_OWN by itself is a nonsensical flag.

IMO if a flag doesn't make any sense on its own, or even possibly if it
needs to be mutually exclusive with one or more other flags, it is
probably never a correct building block. Even KF_TRUSTED_ARGS doesn't
really make sense, as it's redundant if KF_RCU is specified. This is
fine though, as IIUC our long-term plan is to get rid of KF_TRUSTED_ARGS
and make it the default behavior for all kfuncs (not trying to hijack
this thread with a tangential discussion about KF_TRUSTED_ARGS, just
using this as an opportunity to point out something to keep in mind as
we continue to add kfunc flags down the road).

> Only combination of KF_RELEASE|KF_RELEASE_NON_OWN sort-of kinda makes
> sense, but still hard to understand what releases what.

I agree and I think this is an important point. IMO it is a worse
tradeoff to try to generalize this by complicating the definition of a
reference than it is to keep the refcounting APIs straightforward and
well defined. As a basic building block, having an owning refcount
should mean one thing: that the object will not be destroyed and is safe
to dereference. When you start mixing in these graph-specific notions of
references meaning different things in specific contexts, it compromises
that and makes the API significantly less usable and extensible.

For example, at some point we may decide to add something like a
kptr_weak_ref which would function exactly like an std::weak_ptr, except
of course that it would wrap a kptr_ref instead of an std::shared_ptr.
IMO something like that is a more natural and generalizable building
block that cleanly complements refcounting as it exists today.
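
For comparison, conventional weak references require a promotion step
that can fail; a sketch of that pattern on the kernel side (not a
proposed API, and assuming something like RCU keeps the object's memory
valid while weak refs exist):

  struct obj {
    refcount_t refcnt;
    /* ... */
  };

  /* promote a weak reference to a strong one; fails if all strong
   * references are already gone
   */
  static struct obj *obj_tryget(struct obj *o)
  {
    return refcount_inc_not_zero(&o->refcnt) ? o : NULL;
  }

  /* drop a strong reference */
  static void obj_put(struct obj *o)
  {
    if (refcount_dec_and_test(&o->refcnt))
      kfree(o);
  }

That fallible promotion step is exactly what the "non-owning ref valid
while the lock is held" scheme doesn't have, which is the distinction
I'm drawing above.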

> More below.
> 
> > After these changes, linked_list's "release on unlock" logic continues
> > to function as before, except for the semantic differences noted above.
> > The patch immediately following this one makes minor changes to
> > linked_list selftests to account for the differing behavior.
> > 
> > Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
> > ---
> >  include/linux/bpf.h          |   1 +
> >  include/linux/bpf_verifier.h |  39 ++++-----
> >  include/linux/btf.h          |  17 ++--
> >  kernel/bpf/helpers.c         |   4 +-
> >  kernel/bpf/verifier.c        | 164 ++++++++++++++++++++++++-----------
> >  5 files changed, 146 insertions(+), 79 deletions(-)
> > 
> > diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> > index 3de24cfb7a3d..f71571bf6adc 100644
> > --- a/include/linux/bpf.h
> > +++ b/include/linux/bpf.h
> > @@ -180,6 +180,7 @@ enum btf_field_type {
> >  	BPF_KPTR       = BPF_KPTR_UNREF | BPF_KPTR_REF,
> >  	BPF_LIST_HEAD  = (1 << 4),
> >  	BPF_LIST_NODE  = (1 << 5),
> > +	BPF_GRAPH_NODE_OR_ROOT = BPF_LIST_NODE | BPF_LIST_HEAD,

Can you update the rest of the elements here to keep common indentation?

> >  };
> >  
> >  struct btf_field_kptr {
> > diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h
> > index 53d175cbaa02..cb417ffbbb84 100644
> > --- a/include/linux/bpf_verifier.h
> > +++ b/include/linux/bpf_verifier.h
> > @@ -43,6 +43,22 @@ enum bpf_reg_liveness {
> >  	REG_LIVE_DONE = 0x8, /* liveness won't be updating this register anymore */
> >  };
> >  
> > +/* For every reg representing a map value or allocated object pointer,
> > + * we consider the tuple of (ptr, id) for them to be unique in verifier
> > + * context and conside them to not alias each other for the purposes of
> > + * tracking lock state.
> > + */
> > +struct bpf_active_lock {
> > +	/* This can either be reg->map_ptr or reg->btf. If ptr is NULL,
> > +	 * there's no active lock held, and other fields have no
> > +	 * meaning. If non-NULL, it indicates that a lock is held and
> > +	 * id member has the reg->id of the register which can be >= 0.
> > +	 */
> > +	void *ptr;
> > +	/* This will be reg->id */
> > +	u32 id;
> > +};
> > +
> >  struct bpf_reg_state {
> >  	/* Ordering of fields matters.  See states_equal() */
> >  	enum bpf_reg_type type;
> > @@ -68,6 +84,7 @@ struct bpf_reg_state {
> >  		struct {
> >  			struct btf *btf;
> >  			u32 btf_id;
> > +			struct bpf_active_lock non_owning_ref_lock;
> 
> In your other email you argue that pointer should be enough.
> I suspect that won't be correct.
> See fixes that Andrii did in states_equal() and regsafe().
> In particular:
>         if (!!old->active_lock.id != !!cur->active_lock.id)
>                 return false;
> 
>         if (old->active_lock.id &&
>             !check_ids(old->active_lock.id, cur->active_lock.id, env->idmap_scratch))
>                 return false;
> 
> We have to do the comparison of this new ID via idmap as well.
> 
> I think introduction of struct bpf_active_lock  and addition of it
> to bpf_reg_state is overkill.
> Here we can add 'u32 non_own_ref_obj_id;' only and compare it via idmap in regsafe().
> I'm guessing you didn't like my 'active_lock_id' suggestion. Fine.
> non_own_ref_obj_id would match existing ref_obj_id at least.
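
(Presumably the regsafe() side of that would mirror the existing checks
quoted above, i.e. something like this sketch:

  if (!!rold->non_own_ref_obj_id != !!rcur->non_own_ref_obj_id)
    return false;
  if (rold->non_own_ref_obj_id &&
      !check_ids(rold->non_own_ref_obj_id, rcur->non_own_ref_obj_id, idmap))
    return false;

with rold/rcur being the registers regsafe() is already comparing.)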
> 
> >  		};
> >  
> >  		u32 mem_size; /* for PTR_TO_MEM | PTR_TO_MEM_OR_NULL */
> > @@ -223,11 +240,6 @@ struct bpf_reference_state {
> >  	 * exiting a callback function.
> >  	 */
> >  	int callback_ref;
> > -	/* Mark the reference state to release the registers sharing the same id
> > -	 * on bpf_spin_unlock (for nodes that we will lose ownership to but are
> > -	 * safe to access inside the critical section).
> > -	 */
> > -	bool release_on_unlock;
> >  };
> >  
> >  /* state of the program:
> > @@ -328,21 +340,8 @@ struct bpf_verifier_state {
> >  	u32 branches;
> >  	u32 insn_idx;
> >  	u32 curframe;
> > -	/* For every reg representing a map value or allocated object pointer,
> > -	 * we consider the tuple of (ptr, id) for them to be unique in verifier
> > -	 * context and conside them to not alias each other for the purposes of
> > -	 * tracking lock state.
> > -	 */
> > -	struct {
> > -		/* This can either be reg->map_ptr or reg->btf. If ptr is NULL,
> > -		 * there's no active lock held, and other fields have no
> > -		 * meaning. If non-NULL, it indicates that a lock is held and
> > -		 * id member has the reg->id of the register which can be >= 0.
> > -		 */
> > -		void *ptr;
> > -		/* This will be reg->id */
> > -		u32 id;
> > -	} active_lock;
> 
> I would keep it as-is.
> 
> > +
> > +	struct bpf_active_lock active_lock;
> >  	bool speculative;
> >  	bool active_rcu_lock;
> >  
> > diff --git a/include/linux/btf.h b/include/linux/btf.h
> > index 5f628f323442..8aee3f7f4248 100644
> > --- a/include/linux/btf.h
> > +++ b/include/linux/btf.h
> > @@ -15,10 +15,10 @@
> >  #define BTF_TYPE_EMIT_ENUM(enum_val) ((void)enum_val)
> >  
> >  /* These need to be macros, as the expressions are used in assembler input */
> > -#define KF_ACQUIRE	(1 << 0) /* kfunc is an acquire function */
> > -#define KF_RELEASE	(1 << 1) /* kfunc is a release function */
> > -#define KF_RET_NULL	(1 << 2) /* kfunc returns a pointer that may be NULL */
> > -#define KF_KPTR_GET	(1 << 3) /* kfunc returns reference to a kptr */
> > +#define KF_ACQUIRE		(1 << 0) /* kfunc is an acquire function */
> > +#define KF_RELEASE		(1 << 1) /* kfunc is a release function */
> > +#define KF_RET_NULL		(1 << 2) /* kfunc returns a pointer that may be NULL */
> > +#define KF_KPTR_GET		(1 << 3) /* kfunc returns reference to a kptr */
> >  /* Trusted arguments are those which are guaranteed to be valid when passed to
> >   * the kfunc. It is used to enforce that pointers obtained from either acquire
> >   * kfuncs, or from the main kernel on a tracepoint or struct_ops callback
> > @@ -67,10 +67,11 @@
> >   *	return 0;
> >   * }
> >   */
> > -#define KF_TRUSTED_ARGS (1 << 4) /* kfunc only takes trusted pointer arguments */
> > -#define KF_SLEEPABLE    (1 << 5) /* kfunc may sleep */
> > -#define KF_DESTRUCTIVE  (1 << 6) /* kfunc performs destructive actions */
> > -#define KF_RCU          (1 << 7) /* kfunc only takes rcu pointer arguments */
> > +#define KF_TRUSTED_ARGS	(1 << 4) /* kfunc only takes trusted pointer arguments */
> > +#define KF_SLEEPABLE		(1 << 5) /* kfunc may sleep */
> > +#define KF_DESTRUCTIVE		(1 << 6) /* kfunc performs destructive actions */
> > +#define KF_RCU			(1 << 7) /* kfunc only takes rcu pointer arguments */
> > +#define KF_RELEASE_NON_OWN	(1 << 8) /* kfunc converts its referenced arg into non-owning ref */
> 
> No need for this flag.
> 
> >  /*
> >   * Return the name of the passed struct, if exists, or halt the build if for
> > diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
> > index af30c6cbd65d..e041409779c3 100644
> > --- a/kernel/bpf/helpers.c
> > +++ b/kernel/bpf/helpers.c
> > @@ -2049,8 +2049,8 @@ BTF_ID_FLAGS(func, crash_kexec, KF_DESTRUCTIVE)
> >  #endif
> >  BTF_ID_FLAGS(func, bpf_obj_new_impl, KF_ACQUIRE | KF_RET_NULL)
> >  BTF_ID_FLAGS(func, bpf_obj_drop_impl, KF_RELEASE)
> > -BTF_ID_FLAGS(func, bpf_list_push_front)
> > -BTF_ID_FLAGS(func, bpf_list_push_back)
> > +BTF_ID_FLAGS(func, bpf_list_push_front, KF_RELEASE | KF_RELEASE_NON_OWN)
> > +BTF_ID_FLAGS(func, bpf_list_push_back, KF_RELEASE | KF_RELEASE_NON_OWN)
> 
> No need for this.
> 
> >  BTF_ID_FLAGS(func, bpf_list_pop_front, KF_ACQUIRE | KF_RET_NULL)
> >  BTF_ID_FLAGS(func, bpf_list_pop_back, KF_ACQUIRE | KF_RET_NULL)
> >  BTF_ID_FLAGS(func, bpf_task_acquire, KF_ACQUIRE | KF_TRUSTED_ARGS)
> > diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> > index 824e2242eae5..84b0660e2a76 100644
> > --- a/kernel/bpf/verifier.c
> > +++ b/kernel/bpf/verifier.c
> > @@ -190,6 +190,10 @@ struct bpf_verifier_stack_elem {
> >  
> >  static int acquire_reference_state(struct bpf_verifier_env *env, int insn_idx);
> >  static int release_reference(struct bpf_verifier_env *env, int ref_obj_id);
> > +static void invalidate_non_owning_refs(struct bpf_verifier_env *env,
> > +				       struct bpf_active_lock *lock);
> > +static int ref_set_non_owning_lock(struct bpf_verifier_env *env,
> > +				   struct bpf_reg_state *reg);
> >  
> >  static bool bpf_map_ptr_poisoned(const struct bpf_insn_aux_data *aux)
> >  {
> > @@ -931,6 +935,9 @@ static void print_verifier_state(struct bpf_verifier_env *env,
> >  				verbose_a("id=%d", reg->id);
> >  			if (reg->ref_obj_id)
> >  				verbose_a("ref_obj_id=%d", reg->ref_obj_id);
> > +			if (reg->non_owning_ref_lock.ptr)
> > +				verbose_a("non_own_id=(%p,%d)", reg->non_owning_ref_lock.ptr,
> > +					  reg->non_owning_ref_lock.id);
> >  			if (t != SCALAR_VALUE)
> >  				verbose_a("off=%d", reg->off);
> >  			if (type_is_pkt_pointer(t))
> > @@ -4820,7 +4827,8 @@ static int check_ptr_to_btf_access(struct bpf_verifier_env *env,
> >  			return -EACCES;
> >  		}
> >  
> > -		if (type_is_alloc(reg->type) && !reg->ref_obj_id) {
> > +		if (type_is_alloc(reg->type) && !reg->ref_obj_id &&
> > +		    !reg->non_owning_ref_lock.ptr) {
> >  			verbose(env, "verifier internal error: ref_obj_id for allocated object must be non-zero\n");
> >  			return -EFAULT;
> >  		}
> > @@ -5778,9 +5786,7 @@ static int process_spin_lock(struct bpf_verifier_env *env, int regno,
> >  			cur->active_lock.ptr = btf;
> >  		cur->active_lock.id = reg->id;
> >  	} else {
> > -		struct bpf_func_state *fstate = cur_func(env);
> >  		void *ptr;
> > -		int i;
> >  
> >  		if (map)
> >  			ptr = map;
> > @@ -5796,25 +5802,11 @@ static int process_spin_lock(struct bpf_verifier_env *env, int regno,
> >  			verbose(env, "bpf_spin_unlock of different lock\n");
> >  			return -EINVAL;
> >  		}
> > -		cur->active_lock.ptr = NULL;
> > -		cur->active_lock.id = 0;
> >  
> > -		for (i = fstate->acquired_refs - 1; i >= 0; i--) {
> > -			int err;
> > +		invalidate_non_owning_refs(env, &cur->active_lock);
> 
> +1
> 
> > -			/* Complain on error because this reference state cannot
> > -			 * be freed before this point, as bpf_spin_lock critical
> > -			 * section does not allow functions that release the
> > -			 * allocated object immediately.
> > -			 */
> > -			if (!fstate->refs[i].release_on_unlock)
> > -				continue;
> > -			err = release_reference(env, fstate->refs[i].id);
> > -			if (err) {
> > -				verbose(env, "failed to release release_on_unlock reference");
> > -				return err;
> > -			}
> > -		}
> > +		cur->active_lock.ptr = NULL;
> > +		cur->active_lock.id = 0;
> 
> +1
> 
> >  	}
> >  	return 0;
> >  }
> > @@ -6273,6 +6265,23 @@ static int check_reg_type(struct bpf_verifier_env *env, u32 regno,
> >  	return 0;
> >  }
> >  
> > +static struct btf_field *
> > +reg_find_field_offset(const struct bpf_reg_state *reg, s32 off, u32 fields)
> > +{
> > +	struct btf_field *field;
> > +	struct btf_record *rec;
> > +
> > +	rec = reg_btf_record(reg);
> > +	if (!reg)
> > +		return NULL;
> > +
> > +	field = btf_record_find(rec, off, fields);
> > +	if (!field)
> > +		return NULL;
> > +
> > +	return field;
> > +}
> 
> Doesn't look like that this helper is really necessary.
> 
> > +
> >  int check_func_arg_reg_off(struct bpf_verifier_env *env,
> >  			   const struct bpf_reg_state *reg, int regno,
> >  			   enum bpf_arg_type arg_type)
> > @@ -6294,6 +6303,18 @@ int check_func_arg_reg_off(struct bpf_verifier_env *env,
> >  		 */
> >  		if (arg_type_is_dynptr(arg_type) && type == PTR_TO_STACK)
> >  			return 0;
> > +
> > +		if (type == (PTR_TO_BTF_ID | MEM_ALLOC) && reg->off) {
> > +			if (reg_find_field_offset(reg, reg->off, BPF_GRAPH_NODE_OR_ROOT))
> > +				return __check_ptr_off_reg(env, reg, regno, true);
> > +
> > +			verbose(env, "R%d must have zero offset when passed to release func\n",
> > +				regno);
> > +			verbose(env, "No graph node or root found at R%d type:%s off:%d\n", regno,
> > +				kernel_type_name(reg->btf, reg->btf_id), reg->off);
> > +			return -EINVAL;
> > +		}
> 
> This bit is only necessary if we mark push_list as KF_RELEASE.
> Just don't add this mark and drop above.
> 
> > +
> >  		/* Doing check_ptr_off_reg check for the offset will catch this
> >  		 * because fixed_off_ok is false, but checking here allows us
> >  		 * to give the user a better error message.
> > @@ -7055,6 +7076,20 @@ static int release_reference(struct bpf_verifier_env *env,
> >  	return 0;
> >  }
> >  
> > +static void invalidate_non_owning_refs(struct bpf_verifier_env *env,
> > +				       struct bpf_active_lock *lock)
> > +{
> > +	struct bpf_func_state *unused;
> > +	struct bpf_reg_state *reg;
> > +
> > +	bpf_for_each_reg_in_vstate(env->cur_state, unused, reg, ({
> > +		if (reg->non_owning_ref_lock.ptr &&
> > +		    reg->non_owning_ref_lock.ptr == lock->ptr &&
> > +		    reg->non_owning_ref_lock.id == lock->id)
> 
> I think the lock.ptr == lock->ptr comparison is unnecessary to invalidate things.
> We're under active spin_lock here. All regs were checked earlier and id keeps incrementing.
> So we can just do 'u32 non_own_ref_obj_id'.
> 
> > +			__mark_reg_unknown(env, reg);
> > +	}));
> > +}
> > +
> >  static void clear_caller_saved_regs(struct bpf_verifier_env *env,
> >  				    struct bpf_reg_state *regs)
> >  {
> > @@ -8266,6 +8301,11 @@ static bool is_kfunc_release(struct bpf_kfunc_call_arg_meta *meta)
> >  	return meta->kfunc_flags & KF_RELEASE;
> >  }
> >  
> > +static bool is_kfunc_release_non_own(struct bpf_kfunc_call_arg_meta *meta)
> > +{
> > +	return meta->kfunc_flags & KF_RELEASE_NON_OWN;
> > +}
> > +
> 
> No need.
> 
> >  static bool is_kfunc_trusted_args(struct bpf_kfunc_call_arg_meta *meta)
> >  {
> >  	return meta->kfunc_flags & KF_TRUSTED_ARGS;
> > @@ -8651,38 +8691,55 @@ static int process_kf_arg_ptr_to_kptr(struct bpf_verifier_env *env,
> >  	return 0;
> >  }
> >  
> > -static int ref_set_release_on_unlock(struct bpf_verifier_env *env, u32 ref_obj_id)
> > +static int ref_set_non_owning_lock(struct bpf_verifier_env *env, struct bpf_reg_state *reg)
> >  {
> > -	struct bpf_func_state *state = cur_func(env);
> > +	struct bpf_verifier_state *state = env->cur_state;
> > +
> > +	if (!state->active_lock.ptr) {
> > +		verbose(env, "verifier internal error: ref_set_non_owning_lock w/o active lock\n");
> > +		return -EFAULT;
> > +	}
> > +
> > +	if (reg->non_owning_ref_lock.ptr) {
> > +		verbose(env, "verifier internal error: non_owning_ref_lock already set\n");
> > +		return -EFAULT;
> > +	}
> > +
> > +	reg->non_owning_ref_lock.id = state->active_lock.id;
> > +	reg->non_owning_ref_lock.ptr = state->active_lock.ptr;
> > +	return 0;
> > +}
> > +
> > +static int ref_convert_owning_non_owning(struct bpf_verifier_env *env, u32 ref_obj_id)
> > +{
> > +	struct bpf_func_state *state, *unused;
> >  	struct bpf_reg_state *reg;
> >  	int i;
> >  
> > -	/* bpf_spin_lock only allows calling list_push and list_pop, no BPF
> > -	 * subprogs, no global functions. This means that the references would
> > -	 * not be released inside the critical section but they may be added to
> > -	 * the reference state, and the acquired_refs are never copied out for a
> > -	 * different frame as BPF to BPF calls don't work in bpf_spin_lock
> > -	 * critical sections.
> > -	 */
> > +	state = cur_func(env);
> > +
> >  	if (!ref_obj_id) {
> > -		verbose(env, "verifier internal error: ref_obj_id is zero for release_on_unlock\n");
> > +		verbose(env, "verifier internal error: ref_obj_id is zero for "
> > +			     "owning -> non-owning conversion\n");
> >  		return -EFAULT;
> >  	}
> > +
> >  	for (i = 0; i < state->acquired_refs; i++) {
> > -		if (state->refs[i].id == ref_obj_id) {
> > -			if (state->refs[i].release_on_unlock) {
> > -				verbose(env, "verifier internal error: expected false release_on_unlock");
> > -				return -EFAULT;
> > +		if (state->refs[i].id != ref_obj_id)
> > +			continue;
> > +
> > +		/* Clear ref_obj_id here so release_reference doesn't clobber
> > +		 * the whole reg
> > +		 */
> > +		bpf_for_each_reg_in_vstate(env->cur_state, unused, reg, ({
> > +			if (reg->ref_obj_id == ref_obj_id) {
> > +				reg->ref_obj_id = 0;
> > +				ref_set_non_owning_lock(env, reg);
> 
> +1 except ref_set_... name doesn't quite fit. reg_set_... is more accurate, no?
> and probably reg_set_non_own_ref_obj_id() ?
> Or just open code it?
> 
> >  			}
> > -			state->refs[i].release_on_unlock = true;
> > -			/* Now mark everyone sharing same ref_obj_id as untrusted */
> > -			bpf_for_each_reg_in_vstate(env->cur_state, state, reg, ({
> > -				if (reg->ref_obj_id == ref_obj_id)
> > -					reg->type |= PTR_UNTRUSTED;
> > -			}));
> > -			return 0;
> > -		}
> > +		}));
> > +		return 0;
> >  	}
> > +
> >  	verbose(env, "verifier internal error: ref state missing for ref_obj_id\n");
> >  	return -EFAULT;
> >  }
> > @@ -8817,7 +8874,6 @@ static int process_kf_arg_ptr_to_list_node(struct bpf_verifier_env *env,
> >  {
> >  	const struct btf_type *et, *t;
> >  	struct btf_field *field;
> > -	struct btf_record *rec;
> >  	u32 list_node_off;
> >  
> >  	if (meta->btf != btf_vmlinux ||
> > @@ -8834,9 +8890,8 @@ static int process_kf_arg_ptr_to_list_node(struct bpf_verifier_env *env,
> >  		return -EINVAL;
> >  	}
> >  
> > -	rec = reg_btf_record(reg);
> >  	list_node_off = reg->off + reg->var_off.value;
> > -	field = btf_record_find(rec, list_node_off, BPF_LIST_NODE);
> > +	field = reg_find_field_offset(reg, list_node_off, BPF_LIST_NODE);
> >  	if (!field || field->offset != list_node_off) {
> >  		verbose(env, "bpf_list_node not found at offset=%u\n", list_node_off);
> >  		return -EINVAL;
> > @@ -8861,8 +8916,8 @@ static int process_kf_arg_ptr_to_list_node(struct bpf_verifier_env *env,
> >  			btf_name_by_offset(field->list_head.btf, et->name_off));
> >  		return -EINVAL;
> >  	}
> > -	/* Set arg#1 for expiration after unlock */
> > -	return ref_set_release_on_unlock(env, reg->ref_obj_id);
> > +
> > +	return 0;
> 
> and here we come to the main point.
> Can you just call
> ref_convert_owning_non_owning(env, reg->ref_obj_id) and release_reference() here?
> Everything will be so much simpler, no?
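
(If I'm reading the suggestion right, that would be roughly this sketch
inside process_kf_arg_ptr_to_list_node(), with the ref_obj_id saved
before the conversion zeroes it on the matching regs:

  u32 ref_obj_id = reg->ref_obj_id;
  int err;

  err = ref_convert_owning_non_owning(env, ref_obj_id);
  if (err)
    return err;
  return release_reference(env, ref_obj_id);

so no new KF_ flag would be involved at all.)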
> 
> >  }
> >  
> >  static int check_kfunc_args(struct bpf_verifier_env *env, struct bpf_kfunc_call_arg_meta *meta)
> > @@ -9132,11 +9187,11 @@ static int check_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
> >  			    int *insn_idx_p)
> >  {
> >  	const struct btf_type *t, *func, *func_proto, *ptr_type;
> > +	u32 i, nargs, func_id, ptr_type_id, release_ref_obj_id;
> >  	struct bpf_reg_state *regs = cur_regs(env);
> >  	const char *func_name, *ptr_type_name;
> >  	bool sleepable, rcu_lock, rcu_unlock;
> >  	struct bpf_kfunc_call_arg_meta meta;
> > -	u32 i, nargs, func_id, ptr_type_id;
> >  	int err, insn_idx = *insn_idx_p;
> >  	const struct btf_param *args;
> >  	const struct btf_type *ret_t;
> > @@ -9223,7 +9278,18 @@ static int check_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
> >  	 * PTR_TO_BTF_ID in bpf_kfunc_arg_meta, do the release now.
> >  	 */
> >  	if (meta.release_regno) {
> > -		err = release_reference(env, regs[meta.release_regno].ref_obj_id);
> > +		err = 0;
> > +		release_ref_obj_id = regs[meta.release_regno].ref_obj_id;
> > +
> > +		if (is_kfunc_release_non_own(&meta))
> > +			err = ref_convert_owning_non_owning(env, release_ref_obj_id);
> > +		if (err) {
> > +			verbose(env, "kfunc %s#%d conversion of owning ref to non-owning failed\n",
> > +				func_name, func_id);
> > +			return err;
> > +		}
> > +
> > +		err = release_reference(env, release_ref_obj_id);
> 
> and this bit won't be needed.
> and no need to guess in patch 1 which arg has to be released and converted to non_own.
> 
> >  		if (err) {
> >  			verbose(env, "kfunc %s#%d reference has not been acquired before\n",
> >  				func_name, func_id);
> > -- 
> > 2.30.2
> > 

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v2 bpf-next 01/13] bpf: Support multiple arg regs w/ ref_obj_id for kfuncs
  2022-12-29 16:50     ` Alexei Starovoitov
@ 2022-12-29 17:00       ` David Vernet
  2023-01-17 17:26         ` Dave Marchevsky
  0 siblings, 1 reply; 38+ messages in thread
From: David Vernet @ 2022-12-29 17:00 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Dave Marchevsky, bpf, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, Kernel Team, Kumar Kartikeya Dwivedi, Tejun Heo

On Thu, Dec 29, 2022 at 08:50:19AM -0800, Alexei Starovoitov wrote:
> On Wed, Dec 28, 2022 at 10:40 PM David Vernet <void@manifault.com> wrote:
> >
> > On Sat, Dec 17, 2022 at 12:24:54AM -0800, Dave Marchevsky wrote:
> > > Currently, kfuncs marked KF_RELEASE indicate that they release some
> > > previously-acquired arg. The verifier assumes that such a function will
> > > only have one arg reg w/ ref_obj_id set, and that that arg is the one to
> > > be released. Multiple kfunc arg regs having ref_obj_id set is considered
> > > an invalid state.
> > >
> > > For helpers, RELEASE is used to tag a particular arg in the function
> > > proto, not the function itself. The arg with OBJ_RELEASE type tag is the
> > > arg that the helper will release. There can only be one such tagged arg.
> > > When verifying arg regs, multiple helper arg regs w/ ref_obj_id set is
> > > also considered an invalid state.
> > >
> > > Later patches in this series will result in some linked_list helpers
> > > marked KF_RELEASE having a valid reason to take two ref_obj_id args.
> > > Specifically, bpf_list_push_{front,back} can push a node to a list head
> > > which is itself part of a list node. In such a scenario both arguments
> > > to these functions would have ref_obj_id > 0, thus would fail
> > > verification under current logic.
> > >
> > > This patch changes kfunc ref_obj_id searching logic to find the last arg
> > > reg w/ ref_obj_id and consider that the reg-to-release. This should be
> > > backwards-compatible with all current kfuncs as they only expect one
> > > such arg reg.
> >
> > Can't say I'm a huge fan of this proposal :-( While I think it's really
> > unfortunate that kfunc flags are not defined per-arg for this exact type
> > of reason, adding more flag-specific semantics like this is IMO a step
> > in the wrong direction.  It's similar to the existing __sz and __k
> > argument-naming semantics that inform the verifier that the arguments
> > have special meaning. All of these little additions of special-case
> > handling for kfunc flags end up requiring people writing kfuncs (and
> > sometimes calling them) to read through the verifier to understand
> > what's going on (though I will say that it's nice that __sz and __k are
> > properly documented in [0]).
> 
> Before getting to pros/cons of KF_* vs name suffix vs helper style
> per-arg description...
> It's important to highlight that here we're talking about
> linked list and rbtree kfuncs that are not like other kfuncs.
> The majority of kfuncs can be added by subsystems like hid-bpf
> without touching the verifier.

I hear you and I agree. It wasn't my intention to drag us into a larger
discussion about kfuncs vs. helpers, but rather just to point out that I
think we have to try hard to avoid adding special-case logic that
requires looking into the verifier to understand the semantics. I think
we're on the same page about this, based on this and your other
response.

> Here we're paving the way for graphs (aka new-gen data structs),
> and so far not only the kfuncs but also their arg types have to have
> special handling inside the verifier.
> There is not much yet to generalize and expose as a generic KF_
> flag or as a name suffix.
> Therefore I think it's more appropriate to implement them
> with minimal verifier changes and minimal complexity.

Agreed

> There is no 3rd graph algorithm on the horizon after linked list
> and rbtree. Instead there is a big todo list for
> 'multi owner graph node' and 'bpf_refcount_t'.

In this case, my point in [0] that the only option for generalizing would
be to have something like KF_GRAPH_INSERT / KF_GRAPH_REMOVE is just not
the way forward (which I also said was my opinion when I pointed it out
as an option). Let's just special-case these kfuncs. There's already a
precedent for doing that in the verifier anyways. Minimal complexity,
minimal API changes. It's a win-win.

[0]: https://lore.kernel.org/all/Y63GLqZil9l1NzY4@maniforge.lan/

> Those will require bigger changes in the verifier,
> so I'd like to avoid premature generalization :) as analogous
> to premature optimization :)

And of course given my points above and in other threads: agreed. I
think we have an ideal middle-ground for minimizing complexity in the
short term, and some nice follow-on todo-list items to work on in the
medium-long term which will continue to improve things without
(negatively) affecting users in any way. All SGTM

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v2 bpf-next 02/13] bpf: Migrate release_on_unlock logic to non-owning ref semantics
  2022-12-29  3:56   ` Alexei Starovoitov
  2022-12-29 16:54     ` David Vernet
@ 2023-01-17 16:07     ` Dave Marchevsky
  2023-01-17 16:56       ` Alexei Starovoitov
  1 sibling, 1 reply; 38+ messages in thread
From: Dave Marchevsky @ 2023-01-17 16:07 UTC (permalink / raw)
  To: Alexei Starovoitov, Dave Marchevsky
  Cc: bpf, Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Kernel Team, Kumar Kartikeya Dwivedi, Tejun Heo

On 12/28/22 10:56 PM, Alexei Starovoitov wrote:
> On Sat, Dec 17, 2022 at 12:24:55AM -0800, Dave Marchevsky wrote:
>> This patch introduces non-owning reference semantics to the verifier,
>> specifically linked_list API kfunc handling. release_on_unlock logic for
>> refs is refactored - with small functional changes - to implement these
>> semantics, and bpf_list_push_{front,back} are migrated to use them.
>>
>> When a list node is pushed to a list, the program still has a pointer to
>> the node:
>>
>>   n = bpf_obj_new(typeof(*n));
>>
>>   bpf_spin_lock(&l);
>>   bpf_list_push_back(&l, n);
>>   /* n still points to the just-added node */
>>   bpf_spin_unlock(&l);
>>
>> What the verifier considers n to be after the push, and thus what can be
>> done with n, are changed by this patch.
>>
>> Common properties both before/after this patch:
>>   * After push, n is only a valid reference to the node until end of
>>     critical section
>>   * After push, n cannot be pushed to any list
>>   * After push, the program can read the node's fields using n
> 
> correct.
> 
>> Before:
>>   * After push, n retains the ref_obj_id which it received on
>>     bpf_obj_new, but the associated bpf_reference_state's
>>     release_on_unlock field is set to true
>>     * release_on_unlock field and associated logic is used to implement
>>       "n is only a valid ref until end of critical section"
>>   * After push, n cannot be written to, the node must be removed from
>>     the list before writing to its fields
>>   * After push, n is marked PTR_UNTRUSTED
> 
> yep
> 
>> After:
>>   * After push, n's ref is released and ref_obj_id set to 0. The
>>     bpf_reg_state's non_owning_ref_lock struct is populated with the
>>     currently active lock
>>     * non_owning_ref_lock and logic is used to implement "n is only a
>>       valid ref until end of critical section"
>>   * n can be written to (except for special fields e.g. bpf_list_node,
>>     timer, ...)
>>   * No special type flag is added to n after push
> 
> yep.
> Great summary.
> 
>> Summary of specific implementation changes to achieve the above:
>>
>>   * release_on_unlock field, ref_set_release_on_unlock helper, and logic
>>     to "release on unlock" based on that field are removed
> 
> +1 
> 
>>   * The anonymous active_lock struct used by bpf_verifier_state is
>>     pulled out into a named struct bpf_active_lock.
> ...
>>   * A non_owning_ref_lock field of type bpf_active_lock is added to
>>     bpf_reg_state's PTR_TO_BTF_ID union
> 
> not great. see below.
> 
>>   * Helpers are added to use non_owning_ref_lock to implement non-owning
>>     ref semantics as described above
>>     * invalidate_non_owning_refs - helper to clobber all non-owning refs
>>       matching a particular bpf_active_lock identity. Replaces
>>       release_on_unlock logic in process_spin_lock.
> 
> +1
> 
>>     * ref_set_non_owning_lock - set non_owning_ref_lock for a reg based
>>       on current verifier state
> 
> +1
> 
>>     * ref_convert_owning_non_owning - convert owning reference w/
>>       specified ref_obj_id to non-owning references. Setup
>>       non_owning_ref_lock for each reg with that ref_obj_id and 0 out
>>       its ref_obj_id
> 
> +1
> 
>>   * New KF_RELEASE_NON_OWN flag is added, to be used in conjunction with
>>     KF_RELEASE to indicate that the release arg reg should be converted
>>     to non-owning ref
>>     * Plain KF_RELEASE would clobber all regs with ref_obj_id matching
>>       the release arg reg's. KF_RELEASE_NON_OWN's logic triggers first -
>>       doing ref_convert_owning_non_owning on the ref first, which
>>       prevents the regs from being clobbered by 0ing out their
>>       ref_obj_ids. The bpf_reference_state itself is still released via
>>       release_reference as a result of the KF_RELEASE flag.
>>     * KF_RELEASE | KF_RELEASE_NON_OWN are added to
>>       bpf_list_push_{front,back}
> 
> And this bit is confusing and not generalizable.
> As David noticed in his reply KF_RELEASE_NON_OWN is not a great name.
> It's hard to come up with a good name and it won't be generic anyway.
> The ref_convert_owning_non_owning has to be applied to a specific arg.
> The function itself is not KF_RELEASE in the current definition of it.
> The combination of KF_RELEASE|KF_RELEASE_NON_OWN is something new
> that should have been generic, but doesn't really work this way.
> In the next patches rbtree_root/node still has to have all the custom
> logic.
> KF_RELEASE_NON_OWN by itself is a nonsensical flag.
> Only combination of KF_RELEASE|KF_RELEASE_NON_OWN sort-of kinda makes
> sense, but still hard to understand what releases what.
> More below.
> 

Addressed below (in response to your 'here we come to the main point'
comment).

>> After these changes, linked_list's "release on unlock" logic continues
>> to function as before, except for the semantic differences noted above.
>> The patch immediately following this one makes minor changes to
>> linked_list selftests to account for the differing behavior.
>>
>> Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
>> ---
>>  include/linux/bpf.h          |   1 +
>>  include/linux/bpf_verifier.h |  39 ++++-----
>>  include/linux/btf.h          |  17 ++--
>>  kernel/bpf/helpers.c         |   4 +-
>>  kernel/bpf/verifier.c        | 164 ++++++++++++++++++++++++-----------
>>  5 files changed, 146 insertions(+), 79 deletions(-)
>>
>> diff --git a/include/linux/bpf.h b/include/linux/bpf.h
>> index 3de24cfb7a3d..f71571bf6adc 100644
>> --- a/include/linux/bpf.h
>> +++ b/include/linux/bpf.h
>> @@ -180,6 +180,7 @@ enum btf_field_type {
>>  	BPF_KPTR       = BPF_KPTR_UNREF | BPF_KPTR_REF,
>>  	BPF_LIST_HEAD  = (1 << 4),
>>  	BPF_LIST_NODE  = (1 << 5),
>> +	BPF_GRAPH_NODE_OR_ROOT = BPF_LIST_NODE | BPF_LIST_HEAD,
>>  };
>>  
>>  struct btf_field_kptr {
>> diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h
>> index 53d175cbaa02..cb417ffbbb84 100644
>> --- a/include/linux/bpf_verifier.h
>> +++ b/include/linux/bpf_verifier.h
>> @@ -43,6 +43,22 @@ enum bpf_reg_liveness {
>>  	REG_LIVE_DONE = 0x8, /* liveness won't be updating this register anymore */
>>  };
>>  
>> +/* For every reg representing a map value or allocated object pointer,
>> + * we consider the tuple of (ptr, id) for them to be unique in verifier
>> + * context and conside them to not alias each other for the purposes of
>> + * tracking lock state.
>> + */
>> +struct bpf_active_lock {
>> +	/* This can either be reg->map_ptr or reg->btf. If ptr is NULL,
>> +	 * there's no active lock held, and other fields have no
>> +	 * meaning. If non-NULL, it indicates that a lock is held and
>> +	 * id member has the reg->id of the register which can be >= 0.
>> +	 */
>> +	void *ptr;
>> +	/* This will be reg->id */
>> +	u32 id;
>> +};
>> +
>>  struct bpf_reg_state {
>>  	/* Ordering of fields matters.  See states_equal() */
>>  	enum bpf_reg_type type;
>> @@ -68,6 +84,7 @@ struct bpf_reg_state {
>>  		struct {
>>  			struct btf *btf;
>>  			u32 btf_id;
>> +			struct bpf_active_lock non_owning_ref_lock;
> 
> In your other email you argue that pointer should be enough.
> I suspect that won't be correct.
> See fixes that Andrii did in states_equal() and regsafe().
> In particular:
>         if (!!old->active_lock.id != !!cur->active_lock.id)
>                 return false;
> 
>         if (old->active_lock.id &&
>             !check_ids(old->active_lock.id, cur->active_lock.id, env->idmap_scratch))
>                 return false;
> 
> We have to do the comparison of this new ID via idmap as well.
> 
> I think introduction of struct bpf_active_lock  and addition of it
> to bpf_reg_state is overkill.
> Here we can add 'u32 non_own_ref_obj_id;' only and compare it via idmap in regsafe().
> I'm guessing you didn't like my 'active_lock_id' suggestion. Fine.
> non_own_ref_obj_id would match existing ref_obj_id at least.
> 
>>  		};
>>  
>>  		u32 mem_size; /* for PTR_TO_MEM | PTR_TO_MEM_OR_NULL */
>> @@ -223,11 +240,6 @@ struct bpf_reference_state {
>>  	 * exiting a callback function.
>>  	 */
>>  	int callback_ref;
>> -	/* Mark the reference state to release the registers sharing the same id
>> -	 * on bpf_spin_unlock (for nodes that we will lose ownership to but are
>> -	 * safe to access inside the critical section).
>> -	 */
>> -	bool release_on_unlock;
>>  };
>>  
>>  /* state of the program:
>> @@ -328,21 +340,8 @@ struct bpf_verifier_state {
>>  	u32 branches;
>>  	u32 insn_idx;
>>  	u32 curframe;
>> -	/* For every reg representing a map value or allocated object pointer,
>> -	 * we consider the tuple of (ptr, id) for them to be unique in verifier
>> -	 * context and conside them to not alias each other for the purposes of
>> -	 * tracking lock state.
>> -	 */
>> -	struct {
>> -		/* This can either be reg->map_ptr or reg->btf. If ptr is NULL,
>> -		 * there's no active lock held, and other fields have no
>> -		 * meaning. If non-NULL, it indicates that a lock is held and
>> -		 * id member has the reg->id of the register which can be >= 0.
>> -		 */
>> -		void *ptr;
>> -		/* This will be reg->id */
>> -		u32 id;
>> -	} active_lock;
> 
> I would keep it as-is.
> 
>> +
>> +	struct bpf_active_lock active_lock;
>>  	bool speculative;
>>  	bool active_rcu_lock;
>>  
>> diff --git a/include/linux/btf.h b/include/linux/btf.h
>> index 5f628f323442..8aee3f7f4248 100644
>> --- a/include/linux/btf.h
>> +++ b/include/linux/btf.h
>> @@ -15,10 +15,10 @@
>>  #define BTF_TYPE_EMIT_ENUM(enum_val) ((void)enum_val)
>>  
>>  /* These need to be macros, as the expressions are used in assembler input */
>> -#define KF_ACQUIRE	(1 << 0) /* kfunc is an acquire function */
>> -#define KF_RELEASE	(1 << 1) /* kfunc is a release function */
>> -#define KF_RET_NULL	(1 << 2) /* kfunc returns a pointer that may be NULL */
>> -#define KF_KPTR_GET	(1 << 3) /* kfunc returns reference to a kptr */
>> +#define KF_ACQUIRE		(1 << 0) /* kfunc is an acquire function */
>> +#define KF_RELEASE		(1 << 1) /* kfunc is a release function */
>> +#define KF_RET_NULL		(1 << 2) /* kfunc returns a pointer that may be NULL */
>> +#define KF_KPTR_GET		(1 << 3) /* kfunc returns reference to a kptr */
>>  /* Trusted arguments are those which are guaranteed to be valid when passed to
>>   * the kfunc. It is used to enforce that pointers obtained from either acquire
>>   * kfuncs, or from the main kernel on a tracepoint or struct_ops callback
>> @@ -67,10 +67,11 @@
>>   *	return 0;
>>   * }
>>   */
>> -#define KF_TRUSTED_ARGS (1 << 4) /* kfunc only takes trusted pointer arguments */
>> -#define KF_SLEEPABLE    (1 << 5) /* kfunc may sleep */
>> -#define KF_DESTRUCTIVE  (1 << 6) /* kfunc performs destructive actions */
>> -#define KF_RCU          (1 << 7) /* kfunc only takes rcu pointer arguments */
>> +#define KF_TRUSTED_ARGS	(1 << 4) /* kfunc only takes trusted pointer arguments */
>> +#define KF_SLEEPABLE		(1 << 5) /* kfunc may sleep */
>> +#define KF_DESTRUCTIVE		(1 << 6) /* kfunc performs destructive actions */
>> +#define KF_RCU			(1 << 7) /* kfunc only takes rcu pointer arguments */
>> +#define KF_RELEASE_NON_OWN	(1 << 8) /* kfunc converts its referenced arg into non-owning ref */
> 
> No need for this flag.
> 

Addressed below (in re: your 'here we come to the main point' comment)

>>  /*
>>   * Return the name of the passed struct, if exists, or halt the build if for
>> diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
>> index af30c6cbd65d..e041409779c3 100644
>> --- a/kernel/bpf/helpers.c
>> +++ b/kernel/bpf/helpers.c
>> @@ -2049,8 +2049,8 @@ BTF_ID_FLAGS(func, crash_kexec, KF_DESTRUCTIVE)
>>  #endif
>>  BTF_ID_FLAGS(func, bpf_obj_new_impl, KF_ACQUIRE | KF_RET_NULL)
>>  BTF_ID_FLAGS(func, bpf_obj_drop_impl, KF_RELEASE)
>> -BTF_ID_FLAGS(func, bpf_list_push_front)
>> -BTF_ID_FLAGS(func, bpf_list_push_back)
>> +BTF_ID_FLAGS(func, bpf_list_push_front, KF_RELEASE | KF_RELEASE_NON_OWN)
>> +BTF_ID_FLAGS(func, bpf_list_push_back, KF_RELEASE | KF_RELEASE_NON_OWN)
> 
> No need for this.
> 

Addressed below (in re: your 'here we come to the main point' comment)

>>  BTF_ID_FLAGS(func, bpf_list_pop_front, KF_ACQUIRE | KF_RET_NULL)
>>  BTF_ID_FLAGS(func, bpf_list_pop_back, KF_ACQUIRE | KF_RET_NULL)
>>  BTF_ID_FLAGS(func, bpf_task_acquire, KF_ACQUIRE | KF_TRUSTED_ARGS)
>> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
>> index 824e2242eae5..84b0660e2a76 100644
>> --- a/kernel/bpf/verifier.c
>> +++ b/kernel/bpf/verifier.c
>> @@ -190,6 +190,10 @@ struct bpf_verifier_stack_elem {
>>  
>>  static int acquire_reference_state(struct bpf_verifier_env *env, int insn_idx);
>>  static int release_reference(struct bpf_verifier_env *env, int ref_obj_id);
>> +static void invalidate_non_owning_refs(struct bpf_verifier_env *env,
>> +				       struct bpf_active_lock *lock);
>> +static int ref_set_non_owning_lock(struct bpf_verifier_env *env,
>> +				   struct bpf_reg_state *reg);
>>  
>>  static bool bpf_map_ptr_poisoned(const struct bpf_insn_aux_data *aux)
>>  {
>> @@ -931,6 +935,9 @@ static void print_verifier_state(struct bpf_verifier_env *env,
>>  				verbose_a("id=%d", reg->id);
>>  			if (reg->ref_obj_id)
>>  				verbose_a("ref_obj_id=%d", reg->ref_obj_id);
>> +			if (reg->non_owning_ref_lock.ptr)
>> +				verbose_a("non_own_id=(%p,%d)", reg->non_owning_ref_lock.ptr,
>> +					  reg->non_owning_ref_lock.id);
>>  			if (t != SCALAR_VALUE)
>>  				verbose_a("off=%d", reg->off);
>>  			if (type_is_pkt_pointer(t))
>> @@ -4820,7 +4827,8 @@ static int check_ptr_to_btf_access(struct bpf_verifier_env *env,
>>  			return -EACCES;
>>  		}
>>  
>> -		if (type_is_alloc(reg->type) && !reg->ref_obj_id) {
>> +		if (type_is_alloc(reg->type) && !reg->ref_obj_id &&
>> +		    !reg->non_owning_ref_lock.ptr) {
>>  			verbose(env, "verifier internal error: ref_obj_id for allocated object must be non-zero\n");
>>  			return -EFAULT;
>>  		}
>> @@ -5778,9 +5786,7 @@ static int process_spin_lock(struct bpf_verifier_env *env, int regno,
>>  			cur->active_lock.ptr = btf;
>>  		cur->active_lock.id = reg->id;
>>  	} else {
>> -		struct bpf_func_state *fstate = cur_func(env);
>>  		void *ptr;
>> -		int i;
>>  
>>  		if (map)
>>  			ptr = map;
>> @@ -5796,25 +5802,11 @@ static int process_spin_lock(struct bpf_verifier_env *env, int regno,
>>  			verbose(env, "bpf_spin_unlock of different lock\n");
>>  			return -EINVAL;
>>  		}
>> -		cur->active_lock.ptr = NULL;
>> -		cur->active_lock.id = 0;
>>  
>> -		for (i = fstate->acquired_refs - 1; i >= 0; i--) {
>> -			int err;
>> +		invalidate_non_owning_refs(env, &cur->active_lock);
> 
> +1
> 
>> -			/* Complain on error because this reference state cannot
>> -			 * be freed before this point, as bpf_spin_lock critical
>> -			 * section does not allow functions that release the
>> -			 * allocated object immediately.
>> -			 */
>> -			if (!fstate->refs[i].release_on_unlock)
>> -				continue;
>> -			err = release_reference(env, fstate->refs[i].id);
>> -			if (err) {
>> -				verbose(env, "failed to release release_on_unlock reference");
>> -				return err;
>> -			}
>> -		}
>> +		cur->active_lock.ptr = NULL;
>> +		cur->active_lock.id = 0;
> 
> +1
> 
>>  	}
>>  	return 0;
>>  }
>> @@ -6273,6 +6265,23 @@ static int check_reg_type(struct bpf_verifier_env *env, u32 regno,
>>  	return 0;
>>  }
>>  
>> +static struct btf_field *
>> +reg_find_field_offset(const struct bpf_reg_state *reg, s32 off, u32 fields)
>> +{
>> +	struct btf_field *field;
>> +	struct btf_record *rec;
>> +
>> +	rec = reg_btf_record(reg);
>> +	if (!reg)
>> +		return NULL;
>> +
>> +	field = btf_record_find(rec, off, fields);
>> +	if (!field)
>> +		return NULL;
>> +
>> +	return field;
>> +}
> 
> Doesn't look like that this helper is really necessary.
> 

The helper is used in other places in the series. It saves some boilerplate,
since a reg's btf_record is usually fetched just to check for the presence of
some field.

If all uses of the helper in this patch are removed, I will move the definition
to the first patch where it's used when I respin.
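
For reference, a rough sketch of the call-site pattern the helper collapses
(fragment only, reusing the functions and variables from the hunk above; not an
actual diff from the series):

  /* without the helper, each caller open-codes the fetch + search: */
  rec = reg_btf_record(reg);
  if (!rec)
    return -EINVAL;
  field = btf_record_find(rec, list_node_off, BPF_LIST_NODE);

  /* with the helper, callers that only care about field presence become: */
  field = reg_find_field_offset(reg, list_node_off, BPF_LIST_NODE);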

>> +
>>  int check_func_arg_reg_off(struct bpf_verifier_env *env,
>>  			   const struct bpf_reg_state *reg, int regno,
>>  			   enum bpf_arg_type arg_type)
>> @@ -6294,6 +6303,18 @@ int check_func_arg_reg_off(struct bpf_verifier_env *env,
>>  		 */
>>  		if (arg_type_is_dynptr(arg_type) && type == PTR_TO_STACK)
>>  			return 0;
>> +
>> +		if (type == (PTR_TO_BTF_ID | MEM_ALLOC) && reg->off) {
>> +			if (reg_find_field_offset(reg, reg->off, BPF_GRAPH_NODE_OR_ROOT))
>> +				return __check_ptr_off_reg(env, reg, regno, true);
>> +
>> +			verbose(env, "R%d must have zero offset when passed to release func\n",
>> +				regno);
>> +			verbose(env, "No graph node or root found at R%d type:%s off:%d\n", regno,
>> +				kernel_type_name(reg->btf, reg->btf_id), reg->off);
>> +			return -EINVAL;
>> +		}
> 
> This bit is only necessary if we mark push_list as KF_RELEASE.
> Just don't add this mark and drop above.
> 

Addressed below (in re: your 'here we come to the main point' comment)

>> +
>>  		/* Doing check_ptr_off_reg check for the offset will catch this
>>  		 * because fixed_off_ok is false, but checking here allows us
>>  		 * to give the user a better error message.
>> @@ -7055,6 +7076,20 @@ static int release_reference(struct bpf_verifier_env *env,
>>  	return 0;
>>  }
>>  
>> +static void invalidate_non_owning_refs(struct bpf_verifier_env *env,
>> +				       struct bpf_active_lock *lock)
>> +{
>> +	struct bpf_func_state *unused;
>> +	struct bpf_reg_state *reg;
>> +
>> +	bpf_for_each_reg_in_vstate(env->cur_state, unused, reg, ({
>> +		if (reg->non_owning_ref_lock.ptr &&
>> +		    reg->non_owning_ref_lock.ptr == lock->ptr &&
>> +		    reg->non_owning_ref_lock.id == lock->id)
> 
> I think the lock.ptr = lock->ptr comparison is unnecessary to invalidate things.
> We're under active spin_lock here. All regs were checked earlier and id keeps incrementing.
> So we can just do 'u32 non_own_ref_obj_id'.
> 
>> +			__mark_reg_unknown(env, reg);
>> +	}));
>> +}
>> +
>>  static void clear_caller_saved_regs(struct bpf_verifier_env *env,
>>  				    struct bpf_reg_state *regs)
>>  {
>> @@ -8266,6 +8301,11 @@ static bool is_kfunc_release(struct bpf_kfunc_call_arg_meta *meta)
>>  	return meta->kfunc_flags & KF_RELEASE;
>>  }
>>  
>> +static bool is_kfunc_release_non_own(struct bpf_kfunc_call_arg_meta *meta)
>> +{
>> +	return meta->kfunc_flags & KF_RELEASE_NON_OWN;
>> +}
>> +
> 
> No need.
> 

Addressed below (in re: your 'here we come to the main point' comment)

>>  static bool is_kfunc_trusted_args(struct bpf_kfunc_call_arg_meta *meta)
>>  {
>>  	return meta->kfunc_flags & KF_TRUSTED_ARGS;
>> @@ -8651,38 +8691,55 @@ static int process_kf_arg_ptr_to_kptr(struct bpf_verifier_env *env,
>>  	return 0;
>>  }
>>  
>> -static int ref_set_release_on_unlock(struct bpf_verifier_env *env, u32 ref_obj_id)
>> +static int ref_set_non_owning_lock(struct bpf_verifier_env *env, struct bpf_reg_state *reg)
>>  {
>> -	struct bpf_func_state *state = cur_func(env);
>> +	struct bpf_verifier_state *state = env->cur_state;
>> +
>> +	if (!state->active_lock.ptr) {
>> +		verbose(env, "verifier internal error: ref_set_non_owning_lock w/o active lock\n");
>> +		return -EFAULT;
>> +	}
>> +
>> +	if (reg->non_owning_ref_lock.ptr) {
>> +		verbose(env, "verifier internal error: non_owning_ref_lock already set\n");
>> +		return -EFAULT;
>> +	}
>> +
>> +	reg->non_owning_ref_lock.id = state->active_lock.id;
>> +	reg->non_owning_ref_lock.ptr = state->active_lock.ptr;
>> +	return 0;
>> +}
>> +
>> +static int ref_convert_owning_non_owning(struct bpf_verifier_env *env, u32 ref_obj_id)
>> +{
>> +	struct bpf_func_state *state, *unused;
>>  	struct bpf_reg_state *reg;
>>  	int i;
>>  
>> -	/* bpf_spin_lock only allows calling list_push and list_pop, no BPF
>> -	 * subprogs, no global functions. This means that the references would
>> -	 * not be released inside the critical section but they may be added to
>> -	 * the reference state, and the acquired_refs are never copied out for a
>> -	 * different frame as BPF to BPF calls don't work in bpf_spin_lock
>> -	 * critical sections.
>> -	 */
>> +	state = cur_func(env);
>> +
>>  	if (!ref_obj_id) {
>> -		verbose(env, "verifier internal error: ref_obj_id is zero for release_on_unlock\n");
>> +		verbose(env, "verifier internal error: ref_obj_id is zero for "
>> +			     "owning -> non-owning conversion\n");
>>  		return -EFAULT;
>>  	}
>> +
>>  	for (i = 0; i < state->acquired_refs; i++) {
>> -		if (state->refs[i].id == ref_obj_id) {
>> -			if (state->refs[i].release_on_unlock) {
>> -				verbose(env, "verifier internal error: expected false release_on_unlock");
>> -				return -EFAULT;
>> +		if (state->refs[i].id != ref_obj_id)
>> +			continue;
>> +
>> +		/* Clear ref_obj_id here so release_reference doesn't clobber
>> +		 * the whole reg
>> +		 */
>> +		bpf_for_each_reg_in_vstate(env->cur_state, unused, reg, ({
>> +			if (reg->ref_obj_id == ref_obj_id) {
>> +				reg->ref_obj_id = 0;
>> +				ref_set_non_owning_lock(env, reg);
> 
> +1 except ref_set_... name doesn't quite fit. reg_set_... is more accurate, no?
> and probably reg_set_non_own_ref_obj_id() ?
> Or just open code it?
> 

I like reg_set_... Will change

>>  			}
>> -			state->refs[i].release_on_unlock = true;
>> -			/* Now mark everyone sharing same ref_obj_id as untrusted */
>> -			bpf_for_each_reg_in_vstate(env->cur_state, state, reg, ({
>> -				if (reg->ref_obj_id == ref_obj_id)
>> -					reg->type |= PTR_UNTRUSTED;
>> -			}));
>> -			return 0;
>> -		}
>> +		}));
>> +		return 0;
>>  	}
>> +
>>  	verbose(env, "verifier internal error: ref state missing for ref_obj_id\n");
>>  	return -EFAULT;
>>  }
>> @@ -8817,7 +8874,6 @@ static int process_kf_arg_ptr_to_list_node(struct bpf_verifier_env *env,
>>  {
>>  	const struct btf_type *et, *t;
>>  	struct btf_field *field;
>> -	struct btf_record *rec;
>>  	u32 list_node_off;
>>  
>>  	if (meta->btf != btf_vmlinux ||
>> @@ -8834,9 +8890,8 @@ static int process_kf_arg_ptr_to_list_node(struct bpf_verifier_env *env,
>>  		return -EINVAL;
>>  	}
>>  
>> -	rec = reg_btf_record(reg);
>>  	list_node_off = reg->off + reg->var_off.value;
>> -	field = btf_record_find(rec, list_node_off, BPF_LIST_NODE);
>> +	field = reg_find_field_offset(reg, list_node_off, BPF_LIST_NODE);
>>  	if (!field || field->offset != list_node_off) {
>>  		verbose(env, "bpf_list_node not found at offset=%u\n", list_node_off);
>>  		return -EINVAL;
>> @@ -8861,8 +8916,8 @@ static int process_kf_arg_ptr_to_list_node(struct bpf_verifier_env *env,
>>  			btf_name_by_offset(field->list_head.btf, et->name_off));
>>  		return -EINVAL;
>>  	}
>> -	/* Set arg#1 for expiration after unlock */
>> -	return ref_set_release_on_unlock(env, reg->ref_obj_id);
>> +
>> +	return 0;
> 
> and here we come to the main point.
> Can you just call
> ref_convert_owning_non_owning(env, reg->ref_obj_id) and release_reference() here?
> Everything will be so much simpler, no?
> 

IIUC, your proposal here is what you'd like me to do instead of the
KF_RELEASE_NON_OWN kfunc flag. I think all the points you're making are
interrelated, so I'll summarize my understanding of them and reply to them as a
group here.

1) KF_RELEASE_NON_OWN shouldn't depend on KF_RELEASE and thus shouldn't require
   the flags to be used together. Why? It's confusing and KF_RELEASE_NON_OWN
   isn't really doing a 'release' by current KF_RELEASE semantics since it's
   converting to non-owning.

This point seems reasonable to me. KF_RELEASE_NON_OWN wants to do many of the
things that KF_RELEASE does, and in the same places (e.g. release_reference(),
confirming that a referenced arg was passed in). Adding KF_RELEASE_NON_OWN
logic as a special case of the KF_RELEASE logic in a few places reduced the
amount of change to the verifier, at the cost of tying the flags together.

That cost doesn't seem worth it if it's confusing to read and muddies the
meaning of 'RELEASE'. So agreed. Will make KF_RELEASE_NON_OWN logic separate
from KF_RELEASE.


2) process_kf_arg_ptr_to_list_node, renamed __process_kf_arg_ptr_to_graph_node
   in further patches in the series, should handle owning -> non-owning
   conversion, instead of relying on a kfunc flag.

This is how the implementation in v1 worked. In [0] both list_head and rb_root
start using the same __process_kf_arg_ptr_to_datastructure_node helper, which does
ref_set_release_on_unlock. Then in [1] the ref_set_release_on_unlock call
is moved out of the __process helper because process_kf_arg_ptr_to_rbtree_node
needs to special-case its behavior for bpf_rbtree_remove.

That special-casing led me to move away from doing the owning -> non-owning
conversion in the process_ helpers. My logic was as follows:

  * process_kf_arg_ptr_to_{list_node,rbtree_node} act on the arg reg when it's
    a specific type, but owning -> non-owning conversion is more of a function-
    level property. e.g. rbtree_add sees an owning rbtree_node and converts it 
    to non-owning, but not all functions which take an owning rbtree_node will
    want to do this.
    * Really it's both an arg/type-level behavior and a function-level behavior.
      A stable-BPF-helper-style func proto, with .arg1_type carrying a base type
      and flags, would be a better way of expressing this, IMO, as it'd remove
      the need to search for the arg-to-release with the current KF_RELEASE
      kfunc-level flag (see the proto sketch after this list).
  * The deeper into the arg-reg helper functions that function-based
    special-casing goes, the harder the code becomes to understand.
    * Retval special-casing based on function is less confusing since that's
      usually in 'check_kfunc_call' with minimal helper usage. This is why
      I kept some special-cased logic for rbtree_remove retval in this series,
      although ideally there would be a named semantic for that as well.
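
To make the per-arg point from the list above concrete: a purely hypothetical
proto for a push kfunc, if kfuncs had helper-style per-arg descriptions. None
of this exists today - OBJ_RELEASE is the existing helper-side per-arg tag, and
the names and field values here are made up for illustration only:

  static const struct bpf_func_proto bpf_list_push_front_proto = {
    .func      = bpf_list_push_front,
    .ret_type  = RET_VOID,
    /* list head to push to */
    .arg1_type = ARG_PTR_TO_BTF_ID | MEM_ALLOC,
    /* node arg: tagged as the one to release / convert to non-owning */
    .arg2_type = ARG_PTR_TO_BTF_ID | MEM_ALLOC | OBJ_RELEASE,
  };

With a per-arg tag like that there'd be no kfunc-level flag and no need to
search the arg regs for the one to release.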

But as you mention in this patch and others, we can have this be function-level
behavior without adding a new kfunc flag, by special-casing in the appropriate
spots. This suggestion is pretty reasonable to me, but I'd like to make the case
for keeping the named kfunc flags.

I made a mistake when I used 'generalize' to describe the purpose of moving
logic behind kfunc flags. You and David Vernet correctly state that it's not
likely that the logic will be used by many other kfuncs; even other graph
datastructure API kfuncs won't be that numerous, so is it really 'general'
functionality?

Really what I meant by 'generalize' was 'give this behavior a name that clearly
ties it to graph datastructure semantics'. By 'clearly ties it to ...', I mean:

  * It should be obvious that the named semantic is not unique to the particular
    kfunc
  * It should be obvious that the named semantic is tied to other named graph
    datastructure semantics
  * When specific semantics are discussed in the documentation or on the mailing
    list, it should be easy to tie the concept being discussed to some specific
    code without ambiguity

Personally, whenever I see "func_id == BPF_FUNC_whatever" (or the kfunc
equivalent), it's not clear to me whether the logic that follows is unique to
that helper or is due to the helper being e.g. a "special dynptr helper". For
this graph ds stuff
specifically, you had trouble understanding what I wanted to do until we stepped
back from the specific implementation and talked about general semantics of
what args / retval look like before / after kfunc call. Since we nailed down the
semantics - in some detail - in earlier convos, and decided to document them
outside of the code, it made sense to me to give them top-level names.

Your (and David's) comment that "KF_RELEASE_NON_OWN is not a great name"
is IMO an acknowledgement that giving the semantic a _good_ name would be useful.

How about I try to make the names better in v3 instead of removing the kfunc
flags entirely? If you're still opposed after that, I will instead add helpers
with comments like:

    /* Implement 'release non owning reference' semantic as described by graph
     * ds documentation
     */
     void graph_ds_release_non_own_ref() { ... }

To satisfy my bullets above.

  [0]: lore.kernel.org/bpf/20221206231000.3180914-8-davemarchevsky@fb.com
  [1]: lore.kernel.org/bpf/20221206231000.3180914-10-davemarchevsky@fb.com
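
For concreteness, here's my understanding of that alternative as a rough
sketch - reusing the helpers added in this patch, with the existing list_node /
__contains checks elided and the error message made up for illustration:

  static int process_kf_arg_ptr_to_list_node(struct bpf_verifier_env *env,
                                             struct bpf_reg_state *reg, u32 regno,
                                             struct bpf_kfunc_call_arg_meta *meta)
  {
    u32 ref_obj_id = reg->ref_obj_id;
    int err;

    /* ... existing bpf_list_node / __contains type checks stay as-is ... */

    /* Save ref_obj_id before converting: the conversion zeroes it on every
     * reg sharing the id, so release_reference() won't clobber those regs.
     */
    err = ref_convert_owning_non_owning(env, ref_obj_id);
    if (err) {
      verbose(env, "owning -> non-owning conversion failed\n");
      return err;
    }

    /* Drop the acquired-ref state; the regs live on as non-owning refs
     * until bpf_spin_unlock invalidates them.
     */
    return release_reference(env, ref_obj_id);
  }

If we go this route, the KF_RELEASE | KF_RELEASE_NON_OWN tagging and the
release_regno handling in check_kfunc_call wouldn't be needed for the push
kfuncs, at the cost of the conversion being implied by the arg type and
callsite rather than by a named flag.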

>>  }
>>  
>>  static int check_kfunc_args(struct bpf_verifier_env *env, struct bpf_kfunc_call_arg_meta *meta)
>> @@ -9132,11 +9187,11 @@ static int check_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
>>  			    int *insn_idx_p)
>>  {
>>  	const struct btf_type *t, *func, *func_proto, *ptr_type;
>> +	u32 i, nargs, func_id, ptr_type_id, release_ref_obj_id;
>>  	struct bpf_reg_state *regs = cur_regs(env);
>>  	const char *func_name, *ptr_type_name;
>>  	bool sleepable, rcu_lock, rcu_unlock;
>>  	struct bpf_kfunc_call_arg_meta meta;
>> -	u32 i, nargs, func_id, ptr_type_id;
>>  	int err, insn_idx = *insn_idx_p;
>>  	const struct btf_param *args;
>>  	const struct btf_type *ret_t;
>> @@ -9223,7 +9278,18 @@ static int check_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
>>  	 * PTR_TO_BTF_ID in bpf_kfunc_arg_meta, do the release now.
>>  	 */
>>  	if (meta.release_regno) {
>> -		err = release_reference(env, regs[meta.release_regno].ref_obj_id);
>> +		err = 0;
>> +		release_ref_obj_id = regs[meta.release_regno].ref_obj_id;
>> +
>> +		if (is_kfunc_release_non_own(&meta))
>> +			err = ref_convert_owning_non_owning(env, release_ref_obj_id);
>> +		if (err) {
>> +			verbose(env, "kfunc %s#%d conversion of owning ref to non-owning failed\n",
>> +				func_name, func_id);
>> +			return err;
>> +		}
>> +
>> +		err = release_reference(env, release_ref_obj_id);
> 
> and this bit won't be needed.
> and no need to guess in patch 1 which arg has to be released and converted to non_own.
> 

Addressed above (in re: your 'here we come to the main point' comment)

>>  		if (err) {
>>  			verbose(env, "kfunc %s#%d reference has not been acquired before\n",
>>  				func_name, func_id);
>> -- 
>> 2.30.2
>>

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v2 bpf-next 02/13] bpf: Migrate release_on_unlock logic to non-owning ref semantics
  2022-12-29 16:54     ` David Vernet
@ 2023-01-17 16:54       ` Dave Marchevsky
  0 siblings, 0 replies; 38+ messages in thread
From: Dave Marchevsky @ 2023-01-17 16:54 UTC (permalink / raw)
  To: David Vernet, Alexei Starovoitov
  Cc: Dave Marchevsky, bpf, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, Kernel Team, Kumar Kartikeya Dwivedi, Tejun Heo

On 12/29/22 11:54 AM, David Vernet wrote:
> On Wed, Dec 28, 2022 at 07:56:00PM -0800, Alexei Starovoitov wrote:
>> On Sat, Dec 17, 2022 at 12:24:55AM -0800, Dave Marchevsky wrote:
>>> This patch introduces non-owning reference semantics to the verifier,
>>> specifically linked_list API kfunc handling. release_on_unlock logic for
>>> refs is refactored - with small functional changes - to implement these
>>> semantics, and bpf_list_push_{front,back} are migrated to use them.
>>>
>>> When a list node is pushed to a list, the program still has a pointer to
>>> the node:
>>>
>>>   n = bpf_obj_new(typeof(*n));
>>>
>>>   bpf_spin_lock(&l);
>>>   bpf_list_push_back(&l, n);
>>>   /* n still points to the just-added node */
>>>   bpf_spin_unlock(&l);
>>>
>>> What the verifier considers n to be after the push, and thus what can be
>>> done with n, are changed by this patch.
>>>
>>> Common properties both before/after this patch:
>>>   * After push, n is only a valid reference to the node until end of
>>>     critical section
>>>   * After push, n cannot be pushed to any list
>>>   * After push, the program can read the node's fields using n
>>
>> correct.
>>
>>> Before:
>>>   * After push, n retains the ref_obj_id which it received on
>>>     bpf_obj_new, but the associated bpf_reference_state's
>>>     release_on_unlock field is set to true
>>>     * release_on_unlock field and associated logic is used to implement
>>>       "n is only a valid ref until end of critical section"
>>>   * After push, n cannot be written to, the node must be removed from
>>>     the list before writing to its fields
>>>   * After push, n is marked PTR_UNTRUSTED
>>
>> yep
>>
>>> After:
>>>   * After push, n's ref is released and ref_obj_id set to 0. The
>>>     bpf_reg_state's non_owning_ref_lock struct is populated with the
>>>     currently active lock
>>>     * non_owning_ref_lock and logic is used to implement "n is only a
>>>       valid ref until end of critical section"
>>>   * n can be written to (except for special fields e.g. bpf_list_node,
>>>     timer, ...)
>>>   * No special type flag is added to n after push
>>
>> yep.
>> Great summary.
>>
>>> Summary of specific implementation changes to achieve the above:
>>>
>>>   * release_on_unlock field, ref_set_release_on_unlock helper, and logic
>>>     to "release on unlock" based on that field are removed
>>
>> +1 
>>
>>>   * The anonymous active_lock struct used by bpf_verifier_state is
>>>     pulled out into a named struct bpf_active_lock.
>> ...
>>>   * A non_owning_ref_lock field of type bpf_active_lock is added to
>>>     bpf_reg_state's PTR_TO_BTF_ID union
>>
>> not great. see below.
>>
>>>   * Helpers are added to use non_owning_ref_lock to implement non-owning
>>>     ref semantics as described above
>>>     * invalidate_non_owning_refs - helper to clobber all non-owning refs
>>>       matching a particular bpf_active_lock identity. Replaces
>>>       release_on_unlock logic in process_spin_lock.
>>
>> +1
>>
>>>     * ref_set_non_owning_lock - set non_owning_ref_lock for a reg based
>>>       on current verifier state
>>
>> +1
>>
>>>     * ref_convert_owning_non_owning - convert owning reference w/
>>>       specified ref_obj_id to non-owning references. Setup
>>>       non_owning_ref_lock for each reg with that ref_obj_id and 0 out
>>>       its ref_obj_id
>>
>> +1
>>
>>>   * New KF_RELEASE_NON_OWN flag is added, to be used in conjunction with
>>>     KF_RELEASE to indicate that the release arg reg should be converted
>>>     to non-owning ref
>>>     * Plain KF_RELEASE would clobber all regs with ref_obj_id matching
>>>       the release arg reg's. KF_RELEASE_NON_OWN's logic triggers first -
>>>       doing ref_convert_owning_non_owning on the ref first, which
>>>       prevents the regs from being clobbered by 0ing out their
>>>       ref_obj_ids. The bpf_reference_state itself is still released via
>>>       release_reference as a result of the KF_RELEASE flag.
>>>     * KF_RELEASE | KF_RELEASE_NON_OWN are added to
>>>       bpf_list_push_{front,back}
>>
>> And this bit is confusing and not generalizable.
> 
> +1 on both counts. If we want to make it generalizable, I think the only
> way to do would be to generalize it across different graph map types.
> For example, to have kfunc flags like KF_GRAPH_INSERT and
> KF_GRAPH_REMOVE which signal to the verifier that "for this graph-type
> map which has a spin-lock associated with its root node that I expect to
> be held, I've {inserted, removed} the node {to, from} the graph, so
> adjust the refcnt / pointer type accordingly and then clean up when the
> lock is dropped."
> 
> I don't see any reason to add kfunc flags for that though, as the fact
> that the pointer in question refers to a node that has a root node that
> has a lock associated with it is already itself a special-case scenario.
> I think we should just special-case these kfuncs in the verifier as
> "graph-type" kfuncs in some static global array(s).  That's probably
> less error prone anyways, and I don't see the typical kfunc writer ever
> needing to do this.
> 

re: "generalizable" and "why add a kfunc flag at all", I addressed that in
a side-reply to the msg which you're replying to here [0].

But to address your specific points:

"the fact that the pointer in question refers to a node that has a root node
that has a lock associated with it is already itself a special-case scenario"

Are you trying to say here that because the arg is of a special type, special
behavior should be tied to that arg type instead of the function? If so, that's
addressed in [0]. A function with KF_RELEASE_NON_OWN semantics certainly does
need args of a certain type in order to do its thing, but the semantic is really
a function-level thing. If we can tag the function with it, then later check arg
regs, that's preferable to checking kfunc id while processing arg regs, as the
latter conflates "the arg is of this special type so the function should do X"
with "the function does X so it must have an arg with this special type".

  [0]: https://lore.kernel.org/bpf/9763aed7-0284-e400-b4dc-ed01718d8e1e@meta.com/

>> As David noticed in his reply KF_RELEASE_NON_OWN is not a great name.
>> It's hard to come up with a good name and it won't be generic anyway.
>> The ref_convert_owning_non_owning has to be applied to a specific arg.
>> The function itself is not KF_RELEASE in the current definition of it.
>> The combination of KF_RELEASE|KF_RELEASE_NON_OWN is something new
>> that should have been generic, but doesn't really work this way.
>> In the next patches rbtree_root/node still has to have all the custom
>> logic.
>> KF_RELEASE_NON_OWN by itself is a nonsensical flag.
> 
> IMO if a flag doesn't make any sense on its own, or even possibly if it
> needs to be mutually exclusive with one or more other flags, it is
> probably never a correct building block. Even KF_TRUSTED_ARGS doesn't
> really make sense, as it's redundant if KF_RCU is specified. This is
> fine though, as IIUC our long-term plan is to get rid of KF_TRUSTED_ARGS
> and make it the default behavior for all kfuncs (not trying to hijack
> this thread with a tangential discussion about KF_TRUSTED_ARGS, just
> using this as an opportunity to point out something to keep in mind as
> we continue to add kfunc flags down the road).
> 

I'm fine with making KF_RELEASE_NON_OWN not depend on KF_RELEASE. Addressed in
[0] above.

>> Only combination of KF_RELEASE|KF_RELEASE_NON_OWN sort-of kinda makes
>> sense, but still hard to understand what releases what.
> 
> I agree and I think this is an important point. IMO it is a worse
> tradeoff to try to generalize this by complicating the definition of a
> reference than it is to keep the refcounting APIs straightforward and
> well defined. As a basic building block, having an owning refcount
> should mean one thing: that the object will not be destroyed and is safe
> to dereference. When you start mixing in these graph-specific notions of
> references meaning different things in specific contexts, it compromises
> that and makes the API significantly less usable and extensible.
> 

"Generalize" was the wrong word for me to use here. Addressed in [0] above.

Regarding polluting the meaning of "reference": owning and non-owning references
are intentionally scoped to graph datastructures only, and have well-defined and
documented meaning in that context. Elsewhere in the verifier "reference",
"owning refcount", etc are not well-defined as folks have been adding whatever
semantics they need to get their stuff working for some time. Scoping these
new concepts to graph datastructures only is my attempt at making progress
without adding to that confusion.

> For example, at some point we may decide to add something like a
> kptr_weak_ref which would function exactly like an std::weak_ptr, except
> of course that it would wrap a kptr_ref instead of an std::shared_ptr.
> IMO something like that is a more natural and generalizable building
> block that cleanly complements refcounting as it exists today.
> 

Any functionality that implements the desired semantics for rbtree / linked_list
is fine with me. If it's a superset of what I'm adding here, happy to migrate.

If changes to rbtree/linked_list APIs are needed to make such migration
possible, luckily it's all unstable kptr/kfunc, so that better future state
isn't blocked by these semantics / implementation.

All this next-gen datastructure work has been an exercise in YAGNI and scope
reduction. Luckily since it's all unstable API we're not backing ourselves
into any corners by doing so.

re "std::weak_ptr" idea more specifically - despite obvious similarities to
some rust or cpp concepts, I've been intentionally avoiding trying to sell this
work as such or copying semantics wholesale. Better to focus on the specific
things needed to move forward and avoid starting big-scope arguments like
"should we just add std::shared_ptr semantics?" "should the verifier be doing
'borrow checking' similar to rust, and if so to what extent". Don't get me
wrong, I'd find such discussions interesting, but a YAGNI approach where such
functionality is gradually added in response to concrete usecases will likely
save much contentious back-and-forth.

>> More below.
>>
>>> After these changes, linked_list's "release on unlock" logic continues
>>> to function as before, except for the semantic differences noted above.
>>> The patch immediately following this one makes minor changes to
>>> linked_list selftests to account for the differing behavior.
>>>
>>> Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
>>> ---
>>>  include/linux/bpf.h          |   1 +
>>>  include/linux/bpf_verifier.h |  39 ++++-----
>>>  include/linux/btf.h          |  17 ++--
>>>  kernel/bpf/helpers.c         |   4 +-
>>>  kernel/bpf/verifier.c        | 164 ++++++++++++++++++++++++-----------
>>>  5 files changed, 146 insertions(+), 79 deletions(-)
>>>
>>> diff --git a/include/linux/bpf.h b/include/linux/bpf.h
>>> index 3de24cfb7a3d..f71571bf6adc 100644
>>> --- a/include/linux/bpf.h
>>> +++ b/include/linux/bpf.h
>>> @@ -180,6 +180,7 @@ enum btf_field_type {
>>>  	BPF_KPTR       = BPF_KPTR_UNREF | BPF_KPTR_REF,
>>>  	BPF_LIST_HEAD  = (1 << 4),
>>>  	BPF_LIST_NODE  = (1 << 5),
>>> +	BPF_GRAPH_NODE_OR_ROOT = BPF_LIST_NODE | BPF_LIST_HEAD,
> 
> Can you update the rest of the elements here to keep common indentation?
> 

Ack

>>>  };
>>>  
>>>  struct btf_field_kptr {
>>> diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h
>>> index 53d175cbaa02..cb417ffbbb84 100644
>>> --- a/include/linux/bpf_verifier.h
>>> +++ b/include/linux/bpf_verifier.h
>>> @@ -43,6 +43,22 @@ enum bpf_reg_liveness {
>>>  	REG_LIVE_DONE = 0x8, /* liveness won't be updating this register anymore */
>>>  };
>>>  
>>> +/* For every reg representing a map value or allocated object pointer,
>>> + * we consider the tuple of (ptr, id) for them to be unique in verifier
>>> + * context and conside them to not alias each other for the purposes of
>>> + * tracking lock state.
>>> + */
>>> +struct bpf_active_lock {
>>> +	/* This can either be reg->map_ptr or reg->btf. If ptr is NULL,
>>> +	 * there's no active lock held, and other fields have no
>>> +	 * meaning. If non-NULL, it indicates that a lock is held and
>>> +	 * id member has the reg->id of the register which can be >= 0.
>>> +	 */
>>> +	void *ptr;
>>> +	/* This will be reg->id */
>>> +	u32 id;
>>> +};
>>> +
>>>  struct bpf_reg_state {
>>>  	/* Ordering of fields matters.  See states_equal() */
>>>  	enum bpf_reg_type type;
>>> @@ -68,6 +84,7 @@ struct bpf_reg_state {
>>>  		struct {
>>>  			struct btf *btf;
>>>  			u32 btf_id;
>>> +			struct bpf_active_lock non_owning_ref_lock;
>>
>> In your other email you argue that pointer should be enough.
>> I suspect that won't be correct.
>> See fixes that Andrii did in states_equal() and regsafe().
>> In particular:
>>         if (!!old->active_lock.id != !!cur->active_lock.id)
>>                 return false;
>>
>>         if (old->active_lock.id &&
>>             !check_ids(old->active_lock.id, cur->active_lock.id, env->idmap_scratch))
>>                 return false;
>>
>> We have to do the comparison of this new ID via idmap as well.
>>
>> I think introduction of struct bpf_active_lock  and addition of it
>> to bpf_reg_state is overkill.
>> Here we can add 'u32 non_own_ref_obj_id;' only and compare it via idmap in regsafe().
>> I'm guessing you didn't like my 'active_lock_id' suggestion. Fine.
>> non_own_ref_obj_id would match existing ref_obj_id at least.
>>
>>>  		};
>>>  
>>>  		u32 mem_size; /* for PTR_TO_MEM | PTR_TO_MEM_OR_NULL */
>>> @@ -223,11 +240,6 @@ struct bpf_reference_state {
>>>  	 * exiting a callback function.
>>>  	 */
>>>  	int callback_ref;
>>> -	/* Mark the reference state to release the registers sharing the same id
>>> -	 * on bpf_spin_unlock (for nodes that we will lose ownership to but are
>>> -	 * safe to access inside the critical section).
>>> -	 */
>>> -	bool release_on_unlock;
>>>  };
>>>  
>>>  /* state of the program:
>>> @@ -328,21 +340,8 @@ struct bpf_verifier_state {
>>>  	u32 branches;
>>>  	u32 insn_idx;
>>>  	u32 curframe;
>>> -	/* For every reg representing a map value or allocated object pointer,
>>> -	 * we consider the tuple of (ptr, id) for them to be unique in verifier
>>> -	 * context and conside them to not alias each other for the purposes of
>>> -	 * tracking lock state.
>>> -	 */
>>> -	struct {
>>> -		/* This can either be reg->map_ptr or reg->btf. If ptr is NULL,
>>> -		 * there's no active lock held, and other fields have no
>>> -		 * meaning. If non-NULL, it indicates that a lock is held and
>>> -		 * id member has the reg->id of the register which can be >= 0.
>>> -		 */
>>> -		void *ptr;
>>> -		/* This will be reg->id */
>>> -		u32 id;
>>> -	} active_lock;
>>
>> I would keep it as-is.
>>
>>> +
>>> +	struct bpf_active_lock active_lock;
>>>  	bool speculative;
>>>  	bool active_rcu_lock;
>>>  
>>> diff --git a/include/linux/btf.h b/include/linux/btf.h
>>> index 5f628f323442..8aee3f7f4248 100644
>>> --- a/include/linux/btf.h
>>> +++ b/include/linux/btf.h
>>> @@ -15,10 +15,10 @@
>>>  #define BTF_TYPE_EMIT_ENUM(enum_val) ((void)enum_val)
>>>  
>>>  /* These need to be macros, as the expressions are used in assembler input */
>>> -#define KF_ACQUIRE	(1 << 0) /* kfunc is an acquire function */
>>> -#define KF_RELEASE	(1 << 1) /* kfunc is a release function */
>>> -#define KF_RET_NULL	(1 << 2) /* kfunc returns a pointer that may be NULL */
>>> -#define KF_KPTR_GET	(1 << 3) /* kfunc returns reference to a kptr */
>>> +#define KF_ACQUIRE		(1 << 0) /* kfunc is an acquire function */
>>> +#define KF_RELEASE		(1 << 1) /* kfunc is a release function */
>>> +#define KF_RET_NULL		(1 << 2) /* kfunc returns a pointer that may be NULL */
>>> +#define KF_KPTR_GET		(1 << 3) /* kfunc returns reference to a kptr */
>>>  /* Trusted arguments are those which are guaranteed to be valid when passed to
>>>   * the kfunc. It is used to enforce that pointers obtained from either acquire
>>>   * kfuncs, or from the main kernel on a tracepoint or struct_ops callback
>>> @@ -67,10 +67,11 @@
>>>   *	return 0;
>>>   * }
>>>   */
>>> -#define KF_TRUSTED_ARGS (1 << 4) /* kfunc only takes trusted pointer arguments */
>>> -#define KF_SLEEPABLE    (1 << 5) /* kfunc may sleep */
>>> -#define KF_DESTRUCTIVE  (1 << 6) /* kfunc performs destructive actions */
>>> -#define KF_RCU          (1 << 7) /* kfunc only takes rcu pointer arguments */
>>> +#define KF_TRUSTED_ARGS	(1 << 4) /* kfunc only takes trusted pointer arguments */
>>> +#define KF_SLEEPABLE		(1 << 5) /* kfunc may sleep */
>>> +#define KF_DESTRUCTIVE		(1 << 6) /* kfunc performs destructive actions */
>>> +#define KF_RCU			(1 << 7) /* kfunc only takes rcu pointer arguments */
>>> +#define KF_RELEASE_NON_OWN	(1 << 8) /* kfunc converts its referenced arg into non-owning ref */
>>
>> No need for this flag.
>>
>>>  /*
>>>   * Return the name of the passed struct, if exists, or halt the build if for
>>> diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
>>> index af30c6cbd65d..e041409779c3 100644
>>> --- a/kernel/bpf/helpers.c
>>> +++ b/kernel/bpf/helpers.c
>>> @@ -2049,8 +2049,8 @@ BTF_ID_FLAGS(func, crash_kexec, KF_DESTRUCTIVE)
>>>  #endif
>>>  BTF_ID_FLAGS(func, bpf_obj_new_impl, KF_ACQUIRE | KF_RET_NULL)
>>>  BTF_ID_FLAGS(func, bpf_obj_drop_impl, KF_RELEASE)
>>> -BTF_ID_FLAGS(func, bpf_list_push_front)
>>> -BTF_ID_FLAGS(func, bpf_list_push_back)
>>> +BTF_ID_FLAGS(func, bpf_list_push_front, KF_RELEASE | KF_RELEASE_NON_OWN)
>>> +BTF_ID_FLAGS(func, bpf_list_push_back, KF_RELEASE | KF_RELEASE_NON_OWN)
>>
>> No need for this.
>>
>>>  BTF_ID_FLAGS(func, bpf_list_pop_front, KF_ACQUIRE | KF_RET_NULL)
>>>  BTF_ID_FLAGS(func, bpf_list_pop_back, KF_ACQUIRE | KF_RET_NULL)
>>>  BTF_ID_FLAGS(func, bpf_task_acquire, KF_ACQUIRE | KF_TRUSTED_ARGS)
>>> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
>>> index 824e2242eae5..84b0660e2a76 100644
>>> --- a/kernel/bpf/verifier.c
>>> +++ b/kernel/bpf/verifier.c
>>> @@ -190,6 +190,10 @@ struct bpf_verifier_stack_elem {
>>>  
>>>  static int acquire_reference_state(struct bpf_verifier_env *env, int insn_idx);
>>>  static int release_reference(struct bpf_verifier_env *env, int ref_obj_id);
>>> +static void invalidate_non_owning_refs(struct bpf_verifier_env *env,
>>> +				       struct bpf_active_lock *lock);
>>> +static int ref_set_non_owning_lock(struct bpf_verifier_env *env,
>>> +				   struct bpf_reg_state *reg);
>>>  
>>>  static bool bpf_map_ptr_poisoned(const struct bpf_insn_aux_data *aux)
>>>  {
>>> @@ -931,6 +935,9 @@ static void print_verifier_state(struct bpf_verifier_env *env,
>>>  				verbose_a("id=%d", reg->id);
>>>  			if (reg->ref_obj_id)
>>>  				verbose_a("ref_obj_id=%d", reg->ref_obj_id);
>>> +			if (reg->non_owning_ref_lock.ptr)
>>> +				verbose_a("non_own_id=(%p,%d)", reg->non_owning_ref_lock.ptr,
>>> +					  reg->non_owning_ref_lock.id);
>>>  			if (t != SCALAR_VALUE)
>>>  				verbose_a("off=%d", reg->off);
>>>  			if (type_is_pkt_pointer(t))
>>> @@ -4820,7 +4827,8 @@ static int check_ptr_to_btf_access(struct bpf_verifier_env *env,
>>>  			return -EACCES;
>>>  		}
>>>  
>>> -		if (type_is_alloc(reg->type) && !reg->ref_obj_id) {
>>> +		if (type_is_alloc(reg->type) && !reg->ref_obj_id &&
>>> +		    !reg->non_owning_ref_lock.ptr) {
>>>  			verbose(env, "verifier internal error: ref_obj_id for allocated object must be non-zero\n");
>>>  			return -EFAULT;
>>>  		}
>>> @@ -5778,9 +5786,7 @@ static int process_spin_lock(struct bpf_verifier_env *env, int regno,
>>>  			cur->active_lock.ptr = btf;
>>>  		cur->active_lock.id = reg->id;
>>>  	} else {
>>> -		struct bpf_func_state *fstate = cur_func(env);
>>>  		void *ptr;
>>> -		int i;
>>>  
>>>  		if (map)
>>>  			ptr = map;
>>> @@ -5796,25 +5802,11 @@ static int process_spin_lock(struct bpf_verifier_env *env, int regno,
>>>  			verbose(env, "bpf_spin_unlock of different lock\n");
>>>  			return -EINVAL;
>>>  		}
>>> -		cur->active_lock.ptr = NULL;
>>> -		cur->active_lock.id = 0;
>>>  
>>> -		for (i = fstate->acquired_refs - 1; i >= 0; i--) {
>>> -			int err;
>>> +		invalidate_non_owning_refs(env, &cur->active_lock);
>>
>> +1
>>
>>> -			/* Complain on error because this reference state cannot
>>> -			 * be freed before this point, as bpf_spin_lock critical
>>> -			 * section does not allow functions that release the
>>> -			 * allocated object immediately.
>>> -			 */
>>> -			if (!fstate->refs[i].release_on_unlock)
>>> -				continue;
>>> -			err = release_reference(env, fstate->refs[i].id);
>>> -			if (err) {
>>> -				verbose(env, "failed to release release_on_unlock reference");
>>> -				return err;
>>> -			}
>>> -		}
>>> +		cur->active_lock.ptr = NULL;
>>> +		cur->active_lock.id = 0;
>>
>> +1
>>
>>>  	}
>>>  	return 0;
>>>  }
>>> @@ -6273,6 +6265,23 @@ static int check_reg_type(struct bpf_verifier_env *env, u32 regno,
>>>  	return 0;
>>>  }
>>>  
>>> +static struct btf_field *
>>> +reg_find_field_offset(const struct bpf_reg_state *reg, s32 off, u32 fields)
>>> +{
>>> +	struct btf_field *field;
>>> +	struct btf_record *rec;
>>> +
>>> +	rec = reg_btf_record(reg);
>>> +	if (!reg)
>>> +		return NULL;
>>> +
>>> +	field = btf_record_find(rec, off, fields);
>>> +	if (!field)
>>> +		return NULL;
>>> +
>>> +	return field;
>>> +}
>>
>> Doesn't look like that this helper is really necessary.
>>
>>> +
>>>  int check_func_arg_reg_off(struct bpf_verifier_env *env,
>>>  			   const struct bpf_reg_state *reg, int regno,
>>>  			   enum bpf_arg_type arg_type)
>>> @@ -6294,6 +6303,18 @@ int check_func_arg_reg_off(struct bpf_verifier_env *env,
>>>  		 */
>>>  		if (arg_type_is_dynptr(arg_type) && type == PTR_TO_STACK)
>>>  			return 0;
>>> +
>>> +		if (type == (PTR_TO_BTF_ID | MEM_ALLOC) && reg->off) {
>>> +			if (reg_find_field_offset(reg, reg->off, BPF_GRAPH_NODE_OR_ROOT))
>>> +				return __check_ptr_off_reg(env, reg, regno, true);
>>> +
>>> +			verbose(env, "R%d must have zero offset when passed to release func\n",
>>> +				regno);
>>> +			verbose(env, "No graph node or root found at R%d type:%s off:%d\n", regno,
>>> +				kernel_type_name(reg->btf, reg->btf_id), reg->off);
>>> +			return -EINVAL;
>>> +		}
>>
>> This bit is only necessary if we mark push_list as KF_RELEASE.
>> Just don't add this mark and drop above.
>>
>>> +
>>>  		/* Doing check_ptr_off_reg check for the offset will catch this
>>>  		 * because fixed_off_ok is false, but checking here allows us
>>>  		 * to give the user a better error message.
>>> @@ -7055,6 +7076,20 @@ static int release_reference(struct bpf_verifier_env *env,
>>>  	return 0;
>>>  }
>>>  
>>> +static void invalidate_non_owning_refs(struct bpf_verifier_env *env,
>>> +				       struct bpf_active_lock *lock)
>>> +{
>>> +	struct bpf_func_state *unused;
>>> +	struct bpf_reg_state *reg;
>>> +
>>> +	bpf_for_each_reg_in_vstate(env->cur_state, unused, reg, ({
>>> +		if (reg->non_owning_ref_lock.ptr &&
>>> +		    reg->non_owning_ref_lock.ptr == lock->ptr &&
>>> +		    reg->non_owning_ref_lock.id == lock->id)
>>
>> I think the lock.ptr = lock->ptr comparison is unnecessary to invalidate things.
>> We're under active spin_lock here. All regs were checked earlier and id keeps incrementing.
>> So we can just do 'u32 non_own_ref_obj_id'.
>>
>>> +			__mark_reg_unknown(env, reg);
>>> +	}));
>>> +}
>>> +
>>>  static void clear_caller_saved_regs(struct bpf_verifier_env *env,
>>>  				    struct bpf_reg_state *regs)
>>>  {
>>> @@ -8266,6 +8301,11 @@ static bool is_kfunc_release(struct bpf_kfunc_call_arg_meta *meta)
>>>  	return meta->kfunc_flags & KF_RELEASE;
>>>  }
>>>  
>>> +static bool is_kfunc_release_non_own(struct bpf_kfunc_call_arg_meta *meta)
>>> +{
>>> +	return meta->kfunc_flags & KF_RELEASE_NON_OWN;
>>> +}
>>> +
>>
>> No need.
>>
>>>  static bool is_kfunc_trusted_args(struct bpf_kfunc_call_arg_meta *meta)
>>>  {
>>>  	return meta->kfunc_flags & KF_TRUSTED_ARGS;
>>> @@ -8651,38 +8691,55 @@ static int process_kf_arg_ptr_to_kptr(struct bpf_verifier_env *env,
>>>  	return 0;
>>>  }
>>>  
>>> -static int ref_set_release_on_unlock(struct bpf_verifier_env *env, u32 ref_obj_id)
>>> +static int ref_set_non_owning_lock(struct bpf_verifier_env *env, struct bpf_reg_state *reg)
>>>  {
>>> -	struct bpf_func_state *state = cur_func(env);
>>> +	struct bpf_verifier_state *state = env->cur_state;
>>> +
>>> +	if (!state->active_lock.ptr) {
>>> +		verbose(env, "verifier internal error: ref_set_non_owning_lock w/o active lock\n");
>>> +		return -EFAULT;
>>> +	}
>>> +
>>> +	if (reg->non_owning_ref_lock.ptr) {
>>> +		verbose(env, "verifier internal error: non_owning_ref_lock already set\n");
>>> +		return -EFAULT;
>>> +	}
>>> +
>>> +	reg->non_owning_ref_lock.id = state->active_lock.id;
>>> +	reg->non_owning_ref_lock.ptr = state->active_lock.ptr;
>>> +	return 0;
>>> +}
>>> +
>>> +static int ref_convert_owning_non_owning(struct bpf_verifier_env *env, u32 ref_obj_id)
>>> +{
>>> +	struct bpf_func_state *state, *unused;
>>>  	struct bpf_reg_state *reg;
>>>  	int i;
>>>  
>>> -	/* bpf_spin_lock only allows calling list_push and list_pop, no BPF
>>> -	 * subprogs, no global functions. This means that the references would
>>> -	 * not be released inside the critical section but they may be added to
>>> -	 * the reference state, and the acquired_refs are never copied out for a
>>> -	 * different frame as BPF to BPF calls don't work in bpf_spin_lock
>>> -	 * critical sections.
>>> -	 */
>>> +	state = cur_func(env);
>>> +
>>>  	if (!ref_obj_id) {
>>> -		verbose(env, "verifier internal error: ref_obj_id is zero for release_on_unlock\n");
>>> +		verbose(env, "verifier internal error: ref_obj_id is zero for "
>>> +			     "owning -> non-owning conversion\n");
>>>  		return -EFAULT;
>>>  	}
>>> +
>>>  	for (i = 0; i < state->acquired_refs; i++) {
>>> -		if (state->refs[i].id == ref_obj_id) {
>>> -			if (state->refs[i].release_on_unlock) {
>>> -				verbose(env, "verifier internal error: expected false release_on_unlock");
>>> -				return -EFAULT;
>>> +		if (state->refs[i].id != ref_obj_id)
>>> +			continue;
>>> +
>>> +		/* Clear ref_obj_id here so release_reference doesn't clobber
>>> +		 * the whole reg
>>> +		 */
>>> +		bpf_for_each_reg_in_vstate(env->cur_state, unused, reg, ({
>>> +			if (reg->ref_obj_id == ref_obj_id) {
>>> +				reg->ref_obj_id = 0;
>>> +				ref_set_non_owning_lock(env, reg);
>>
>> +1 except ref_set_... name doesn't quite fit. reg_set_... is more accurate, no?
>> and probably reg_set_non_own_ref_obj_id() ?
>> Or just open code it?
>>
>>>  			}
>>> -			state->refs[i].release_on_unlock = true;
>>> -			/* Now mark everyone sharing same ref_obj_id as untrusted */
>>> -			bpf_for_each_reg_in_vstate(env->cur_state, state, reg, ({
>>> -				if (reg->ref_obj_id == ref_obj_id)
>>> -					reg->type |= PTR_UNTRUSTED;
>>> -			}));
>>> -			return 0;
>>> -		}
>>> +		}));
>>> +		return 0;
>>>  	}
>>> +
>>>  	verbose(env, "verifier internal error: ref state missing for ref_obj_id\n");
>>>  	return -EFAULT;
>>>  }
>>> @@ -8817,7 +8874,6 @@ static int process_kf_arg_ptr_to_list_node(struct bpf_verifier_env *env,
>>>  {
>>>  	const struct btf_type *et, *t;
>>>  	struct btf_field *field;
>>> -	struct btf_record *rec;
>>>  	u32 list_node_off;
>>>  
>>>  	if (meta->btf != btf_vmlinux ||
>>> @@ -8834,9 +8890,8 @@ static int process_kf_arg_ptr_to_list_node(struct bpf_verifier_env *env,
>>>  		return -EINVAL;
>>>  	}
>>>  
>>> -	rec = reg_btf_record(reg);
>>>  	list_node_off = reg->off + reg->var_off.value;
>>> -	field = btf_record_find(rec, list_node_off, BPF_LIST_NODE);
>>> +	field = reg_find_field_offset(reg, list_node_off, BPF_LIST_NODE);
>>>  	if (!field || field->offset != list_node_off) {
>>>  		verbose(env, "bpf_list_node not found at offset=%u\n", list_node_off);
>>>  		return -EINVAL;
>>> @@ -8861,8 +8916,8 @@ static int process_kf_arg_ptr_to_list_node(struct bpf_verifier_env *env,
>>>  			btf_name_by_offset(field->list_head.btf, et->name_off));
>>>  		return -EINVAL;
>>>  	}
>>> -	/* Set arg#1 for expiration after unlock */
>>> -	return ref_set_release_on_unlock(env, reg->ref_obj_id);
>>> +
>>> +	return 0;
>>
>> and here we come to the main point.
>> Can you just call
>> ref_convert_owning_non_owning(env, reg->ref_obj_id) and release_reference() here?
>> Everything will be so much simpler, no?
>>
>>>  }
>>>  
>>>  static int check_kfunc_args(struct bpf_verifier_env *env, struct bpf_kfunc_call_arg_meta *meta)
>>> @@ -9132,11 +9187,11 @@ static int check_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
>>>  			    int *insn_idx_p)
>>>  {
>>>  	const struct btf_type *t, *func, *func_proto, *ptr_type;
>>> +	u32 i, nargs, func_id, ptr_type_id, release_ref_obj_id;
>>>  	struct bpf_reg_state *regs = cur_regs(env);
>>>  	const char *func_name, *ptr_type_name;
>>>  	bool sleepable, rcu_lock, rcu_unlock;
>>>  	struct bpf_kfunc_call_arg_meta meta;
>>> -	u32 i, nargs, func_id, ptr_type_id;
>>>  	int err, insn_idx = *insn_idx_p;
>>>  	const struct btf_param *args;
>>>  	const struct btf_type *ret_t;
>>> @@ -9223,7 +9278,18 @@ static int check_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
>>>  	 * PTR_TO_BTF_ID in bpf_kfunc_arg_meta, do the release now.
>>>  	 */
>>>  	if (meta.release_regno) {
>>> -		err = release_reference(env, regs[meta.release_regno].ref_obj_id);
>>> +		err = 0;
>>> +		release_ref_obj_id = regs[meta.release_regno].ref_obj_id;
>>> +
>>> +		if (is_kfunc_release_non_own(&meta))
>>> +			err = ref_convert_owning_non_owning(env, release_ref_obj_id);
>>> +		if (err) {
>>> +			verbose(env, "kfunc %s#%d conversion of owning ref to non-owning failed\n",
>>> +				func_name, func_id);
>>> +			return err;
>>> +		}
>>> +
>>> +		err = release_reference(env, release_ref_obj_id);
>>
>> and this bit won't be needed.
>> and no need to guess in patch 1 which arg has to be released and converted to non_own.
>>
>>>  		if (err) {
>>>  			verbose(env, "kfunc %s#%d reference has not been acquired before\n",
>>>  				func_name, func_id);
>>> -- 
>>> 2.30.2
>>>

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v2 bpf-next 02/13] bpf: Migrate release_on_unlock logic to non-owning ref semantics
  2023-01-17 16:07     ` Dave Marchevsky
@ 2023-01-17 16:56       ` Alexei Starovoitov
  0 siblings, 0 replies; 38+ messages in thread
From: Alexei Starovoitov @ 2023-01-17 16:56 UTC (permalink / raw)
  To: Dave Marchevsky
  Cc: Dave Marchevsky, bpf, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, Kernel Team, Kumar Kartikeya Dwivedi, Tejun Heo

On Tue, Jan 17, 2023 at 8:07 AM Dave Marchevsky <davemarchevsky@meta.com> wrote:
>
> How about I try to make the names better in v3 instead of removing the kfunc
> flags entirely? If you're still opposed after that, I will instead add helpers
> with comments like:

Please review what I did with this patch:
https://patchwork.kernel.org/project/netdevbpf/patch/20221230010738.45277-1-alexei.starovoitov@gmail.com/

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v2 bpf-next 01/13] bpf: Support multiple arg regs w/ ref_obj_id for kfuncs
  2022-12-29 17:00       ` David Vernet
@ 2023-01-17 17:26         ` Dave Marchevsky
  2023-01-17 17:36           ` Alexei Starovoitov
  2023-01-20  5:13           ` David Vernet
  0 siblings, 2 replies; 38+ messages in thread
From: Dave Marchevsky @ 2023-01-17 17:26 UTC (permalink / raw)
  To: David Vernet, Alexei Starovoitov
  Cc: Dave Marchevsky, bpf, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, Kernel Team, Kumar Kartikeya Dwivedi, Tejun Heo

On 12/29/22 12:00 PM, David Vernet wrote:
> On Thu, Dec 29, 2022 at 08:50:19AM -0800, Alexei Starovoitov wrote:
>> On Wed, Dec 28, 2022 at 10:40 PM David Vernet <void@manifault.com> wrote:
>>>
>>> On Sat, Dec 17, 2022 at 12:24:54AM -0800, Dave Marchevsky wrote:
>>>> Currently, kfuncs marked KF_RELEASE indicate that they release some
>>>> previously-acquired arg. The verifier assumes that such a function will
>>>> only have one arg reg w/ ref_obj_id set, and that that arg is the one to
>>>> be released. Multiple kfunc arg regs have ref_obj_id set is considered
>>>> an invalid state.
>>>>
>>>> For helpers, RELEASE is used to tag a particular arg in the function
>>>> proto, not the function itself. The arg with OBJ_RELEASE type tag is the
>>>> arg that the helper will release. There can only be one such tagged arg.
>>>> When verifying arg regs, multiple helper arg regs w/ ref_obj_id set is
>>>> also considered an invalid state.
>>>>
>>>> Later patches in this series will result in some linked_list helpers
>>>> marked KF_RELEASE having a valid reason to take two ref_obj_id args.
>>>> Specifically, bpf_list_push_{front,back} can push a node to a list head
>>>> which is itself part of a list node. In such a scenario both arguments
>>>> to these functions would have ref_obj_id > 0, thus would fail
>>>> verification under current logic.
>>>>
>>>> This patch changes kfunc ref_obj_id searching logic to find the last arg
>>>> reg w/ ref_obj_id and consider that the reg-to-release. This should be
>>>> backwards-compatible with all current kfuncs as they only expect one
>>>> such arg reg.
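
(To illustrate the scenario described above - rough sketch only, with made-up
type and field names, not code from the series' selftests:)

  struct inner {
    long data;
    struct bpf_list_node node;
  };

  struct outer {
    struct bpf_list_node node;
    struct bpf_spin_lock lock;
    struct bpf_list_head head __contains(inner, node);
  };

  /* ... in a BPF program ... */
  struct outer *o = bpf_obj_new(typeof(*o));
  struct inner *i = bpf_obj_new(typeof(*i));

  if (!o || !i)
    /* skip, bpf_obj_drop whichever allocation succeeded */

  bpf_spin_lock(&o->lock);
  /* Both &o->head and &i->node are derived from regs with ref_obj_id > 0,
   * so the verifier has to pick which one the push releases.
   */
  bpf_list_push_front(&o->head, &i->node);
  bpf_spin_unlock(&o->lock);
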
>>>
>>> Can't say I'm a huge fan of this proposal :-( While I think it's really
>>> unfortunate that kfunc flags are not defined per-arg for this exact type
>>> of reason, adding more flag-specific semantics like this is IMO a step
>>> in the wrong direction.  It's similar to the existing __sz and __k
>>> argument-naming semantics that inform the verifier that the arguments
>>> have special meaning. All of these little additions of special-case
>>> handling for kfunc flags end up requiring people writing kfuncs (and
>>> sometimes calling them) to read through the verifier to understand
>>> what's going on (though I will say that it's nice that __sz and __k are
>>> properly documented in [0]).
>>
>> Before getting to pros/cons of KF_* vs name suffix vs helper style
>> per-arg description...
>> It's important to highlight that here we're talking about
>> link list and rb tree kfuncs that are not like other kfuncs.
>> Majority of kfuncs can be added by subsystems like hid-bpf
>> without touching the verifier.
> 
> I hear you and I agree. It wasn't my intention to drag us into a larger
> discussion about kfuncs vs. helpers, but rather just to point out that I
> think we have to try hard to avoid adding special-case logic that
> requires looking into the verifier to understand the semantics. I think
> we're on the same page about this, based on this and your other
> response.
> 

In another thread you also mentioned that a hypothetical "kfunc writer" persona
shouldn't have to understand kfunc flags in order to add their simple kfunc, and
I think your comments here also presuppose a "kfunc writer" persona that doesn't
look at the verifier. Having such a person be able to add kfuncs without
understanding the verifier is a good goal, but it doesn't reflect current
reality when the kfunc needs any special semantics.

Regardless, I'd expect that anyone adding further new-style Graph
datastructures, old-style maps, or new datastructures unrelated to either,
will be closer to "verifier expert" than "random person adding a few kfuncs".

>> Here we're paving the way for graph (aka new gen data structs)
>> and so far not only kfuncs, but their arg types have to have
>> special handling inside the verifier.
>> There is not much yet to generalize and expose as generic KF_
>> flag or as a name suffix.
>> Therefore I think it's more appropriate to implement them
>> with minimal verifier changes and minimal complexity.
> 
> Agreed
> 

'Generalize' was addressed in Patch 2's thread.

>> There is no 3rd graph algorithm on the horizon after link list
>> and rbtree. Instead there is a big todo list for
>> 'multi owner graph node' and 'bpf_refcount_t'.
> 
> In this case my point in [0] of the only option for generalizing being
> to have something like KF_GRAPH_INSERT / KF_GRAPH_REMOVE is just not the
> way forward (which I also said was my opinion when I pointed it out as
> an option). Let's just special-case these kfuncs. There's already a
> precedent for doing that in the verifier anyways. Minimal complexity,
> minimal API changes. It's a win-win.
> 
> [0]: https://lore.kernel.org/all/Y63GLqZil9l1NzY4@maniforge.lan/
> 

There's certainly precedent for adding special-case "kfunc_id == KFUNC_whatever"
all over the verifier. It's a bad precedent, though, for reasons discussed in
[0].

To specifically address your points here, I don't buy the argument that
special-casing based on func id is "minimal complexity, minimal API changes".
Re: 'complexity': the logic implementing the complicated semantic will be
added regardless, it just won't have a name that's easily referenced in docs
and mailing list discussions.

Similarly, re: 'API changes': if by 'API' here you mean "API that's exposed
to folks adding kfuncs" - see my comments about "kfunc writer" persona above.
We can think of the verifier itself as an API too - with a single bpf_check
function. That API's behavior is indeed changed here, regardless of whether
the added semantics are gated by a kfunc flag or special-case checks. I don't
think that hiding complexity behind special-case checks when there could be
a named flag simplifies anything. The complexity is added regardless, question
is how many breadcrumbs and pointers we want to leave for folks trying to make
sense of it in the future.

  [0]: https://lore.kernel.org/bpf/9763aed7-0284-e400-b4dc-ed01718d8e1e@meta.com/
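
To illustrate the shape of the two options being weighed here (pseudocode
only; none of the identifiers below are claimed to match actual verifier
code):

  /* named-flag approach: the special semantic is attached to the kfunc's
   * flags and is greppable by name
   */
  if (meta->kfunc_flags & KF_SOME_NAMED_FLAG)
    handle_special_semantic(env, meta);

  /* func-id approach: the same logic, gated on a list of specific kfunc
   * ids maintained inside the verifier
   */
  if (meta->func_id == special_kfunc_ids[KF_bpf_rbtree_remove] ||
      meta->func_id == special_kfunc_ids[KF_bpf_list_push_front])
    handle_special_semantic(env, meta);

Either way handle_special_semantic() exists; the difference is in how the
behavior is named and discovered.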

>> Those will require bigger changes in the verifier,
>> so I'd like to avoid premature generalization :) as analogous
>> to premature optimization :)
> 
> And of course given my points above and in other threads: agreed. I
> think we have an ideal middle-ground for minimizing complexity in the
> short term, and some nice follow-on todo-list items to work on in the
> medium-long term which will continue to improve things without
> (negatively) affecting users in any way. All SGTM

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v2 bpf-next 01/13] bpf: Support multiple arg regs w/ ref_obj_id for kfuncs
  2023-01-17 17:26         ` Dave Marchevsky
@ 2023-01-17 17:36           ` Alexei Starovoitov
  2023-01-17 23:12             ` Dave Marchevsky
  2023-01-20  5:13           ` David Vernet
  1 sibling, 1 reply; 38+ messages in thread
From: Alexei Starovoitov @ 2023-01-17 17:36 UTC (permalink / raw)
  To: Dave Marchevsky
  Cc: David Vernet, Dave Marchevsky, bpf, Alexei Starovoitov,
	Daniel Borkmann, Andrii Nakryiko, Kernel Team,
	Kumar Kartikeya Dwivedi, Tejun Heo

On Tue, Jan 17, 2023 at 9:26 AM Dave Marchevsky <davemarchevsky@meta.com> wrote:
>
> In another thread you also mentioned that hypothetical "kfunc writer" persona
> shouldn't have to understand kfunc flags in order to add their simple kfunc, and
> I think your comments here are also presupposing a "kfunc writer" persona that
> doesn't look at the verifier. Having such a person able to add kfuncs without
> understanding the verifier is a good goal, but doesn't reflect current
> reality when the kfunc needs to have any special semantics.

agree on that goal.

> Regardless, I'd expect that anyone adding further new-style Graph
> datastructures, old-style maps, or new datastructures unrelated to either,
> will be closer to "verifier expert" than "random person adding a few kfuncs".

also agree, since it's a reality right now.

> >> Here we're paving the way for graph (aka new gen data structs)
> >> and so far not only kfuncs, but their arg types have to have
> >> special handling inside the verifier.
> >> There is not much yet to generalize and expose as generic KF_
> >> flag or as a name suffix.
> >> Therefore I think it's more appropriate to implement them
> >> with minimal verifier changes and minimal complexity.
> >
> > Agreed
> >
>
> 'Generalize' was addressed in Patch 2's thread.
>
> >> There is no 3rd graph algorithm on the horizon after link list
> >> and rbtree. Instead there is a big todo list for
> >> 'multi owner graph node' and 'bpf_refcount_t'.
> >
> > In this case my point in [0] of the only option for generalizing being
> > to have something like KF_GRAPH_INSERT / KF_GRAPH_REMOVE is just not the
> > way forward (which I also said was my opinion when I pointed it out as
> > an option). Let's just special-case these kfuncs. There's already a
> > precedent for doing that in the verifier anyways. Minimal complexity,
> > minimal API changes. It's a win-win.
> >
> > [0]: https://lore.kernel.org/all/Y63GLqZil9l1NzY4@maniforge.lan/
> >
>
> There's certainly precedent for adding special-case "kfunc_id == KFUNC_whatever"
> all over the verifier. It's a bad precedent, though, for reasons discussed in
> [0].
>
> To specifically address your points here, I don't buy the argument that
> special-casing based on func id is "minimal complexity, minimal API changes".
> Re: 'complexity': the logic implementing the complicated semantic will be
> added regardless, it just won't have a name that's easily referenced in docs
> and mailing list discussions.
>
> Similarly, re: 'API changes': if by 'API' here you mean "API that's exposed
> to folks adding kfuncs" - see my comments about "kfunc writer" persona above.
> We can think of the verifier itself as an API too - with a single bpf_check
> function. That API's behavior is indeed changed here, regardless of whether
> the added semantics are gated by a kfunc flag or special-case checks. I don't
> think that hiding complexity behind special-case checks when there could be
> a named flag simplifies anything. The complexity is added regardless, question
> is how many breadcrumbs and pointers we want to leave for folks trying to make
> sense of it in the future.
>
>   [0]: https://lore.kernel.org/bpf/9763aed7-0284-e400-b4dc-ed01718d8e1e@meta.com/

I could have agreed to this as well if I didn't go and remove
all the new KF_*OWN* flags.
imo the resulting diff of mine vs your initial patch is easier to
follow and reason about.
So for this case "kfunc_id == KFUNC_whatever" is cleaner.
It doesn't mean that it will be the case in other situations.

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v2 bpf-next 01/13] bpf: Support multiple arg regs w/ ref_obj_id for kfuncs
  2023-01-17 17:36           ` Alexei Starovoitov
@ 2023-01-17 23:12             ` Dave Marchevsky
  0 siblings, 0 replies; 38+ messages in thread
From: Dave Marchevsky @ 2023-01-17 23:12 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: David Vernet, Dave Marchevsky, bpf, Alexei Starovoitov,
	Daniel Borkmann, Andrii Nakryiko, Kernel Team,
	Kumar Kartikeya Dwivedi, Tejun Heo

On 1/17/23 12:36 PM, Alexei Starovoitov wrote:
> On Tue, Jan 17, 2023 at 9:26 AM Dave Marchevsky <davemarchevsky@meta.com> wrote:
>>
>> In another thread you also mentioned that hypothetical "kfunc writer" persona
>> shouldn't have to understand kfunc flags in order to add their simple kfunc, and
>> I think your comments here are also presupposing a "kfunc writer" persona that
>> doesn't look at the verifier. Having such a person able to add kfuncs without
>> understanding the verifier is a good goal, but doesn't reflect current
>> reality when the kfunc needs to have any special semantics.
> 
> agree on that goal.
> 
>> Regardless, I'd expect that anyone adding further new-style Graph
>> datastructures, old-style maps, or new datastructures unrelated to either,
>> will be closer to "verifier expert" than "random person adding a few kfuncs".
> 
> also agree, since it's a reality right now.
> 
>>>> Here we're paving the way for graph (aka new gen data structs)
>>>> and so far not only kfuncs, but their arg types have to have
>>>> special handling inside the verifier.
>>>> There is not much yet to generalize and expose as generic KF_
>>>> flag or as a name suffix.
>>>> Therefore I think it's more appropriate to implement them
>>>> with minimal verifier changes and minimal complexity.
>>>
>>> Agreed
>>>
>>
>> 'Generalize' was addressed in Patch 2's thread.
>>
>>>> There is no 3rd graph algorithm on the horizon after link list
>>>> and rbtree. Instead there is a big todo list for
>>>> 'multi owner graph node' and 'bpf_refcount_t'.
>>>
>>> In this case my point in [0] of the only option for generalizing being
>>> to have something like KF_GRAPH_INSERT / KF_GRAPH_REMOVE is just not the
>>> way forward (which I also said was my opinion when I pointed it out as
>>> an option). Let's just special-case these kfuncs. There's already a
>>> precedent for doing that in the verifier anyways. Minimal complexity,
>>> minimal API changes. It's a win-win.
>>>
>>> [0]: https://lore.kernel.org/all/Y63GLqZil9l1NzY4@maniforge.lan/
>>>
>>
>> There's certainly precedent for adding special-case "kfunc_id == KFUNC_whatever"
>> all over the verifier. It's a bad precedent, though, for reasons discussed in
>> [0].
>>
>> To specifically address your points here, I don't buy the argument that
>> special-casing based on func id is "minimal complexity, minimal API changes".
>> Re: 'complexity': the logic implementing the complicated semantic will be
>> added regardless, it just won't have a name that's easily referenced in docs
>> and mailing list discussions.
>>
>> Similarly, re: 'API changes': if by 'API' here you mean "API that's exposed
>> to folks adding kfuncs" - see my comments about "kfunc writer" persona above.
>> We can think of the verifier itself as an API too - with a single bpf_check
>> function. That API's behavior is indeed changed here, regardless of whether
>> the added semantics are gated by a kfunc flag or special-case checks. I don't
>> think that hiding complexity behind special-case checks when there could be
>> a named flag simplifies anything. The complexity is added regardless, question
>> is how many breadcrumbs and pointers we want to leave for folks trying to make
>> sense of it in the future.
>>
>>   [0]: https://lore.kernel.org/bpf/9763aed7-0284-e400-b4dc-ed01718d8e1e@meta.com/
> 
> I could have agreed to this as well if I didn't go and remove
> all the new KF_*OWN* flags.
> imo the resulting diff of mine vs your initial patch is easier to
> follow and reason about.
> So for this case "kfunc_id == KFUNC_whatever" is cleaner.
> It doesn't mean that it will be the case in other situations.

In the alternate "bpf: Migrate release_on_unlock logic to non-owning ref
semantics" series you submitted, you mean?

It's certainly a smaller diff and easier to reason about as an individual
change. IMO "smaller diff" is largely due to my version moving
convert_owning_non_owning semantics to function-level while yours keeps it at
arg-level. I think moving to function-level is necessary; I elaborated on
why in the other deep side-thread [0].

  [0]: https://lore.kernel.org/bpf/9763aed7-0284-e400-b4dc-ed01718d8e1e@meta.com/

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v2 bpf-next 13/13] bpf, documentation: Add graph documentation for non-owning refs
  2022-12-28 21:26   ` David Vernet
@ 2023-01-18  2:16     ` Dave Marchevsky
  2023-01-20  4:45       ` David Vernet
  0 siblings, 1 reply; 38+ messages in thread
From: Dave Marchevsky @ 2023-01-18  2:16 UTC (permalink / raw)
  To: David Vernet, Dave Marchevsky
  Cc: bpf, Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Kernel Team, Kumar Kartikeya Dwivedi, Tejun Heo

On 12/28/22 4:26 PM, David Vernet wrote:
> On Sat, Dec 17, 2022 at 12:25:06AM -0800, Dave Marchevsky wrote:
>> It is difficult to intuit the semantics of owning and non-owning
>> references from verifier code. In order to keep the high-level details
>> from being lost in the mailing list, this patch adds documentation
>> explaining semantics and details.
>>
>> The target audience of doc added in this patch is folks working on BPF
>> internals, as there's focus on "what should the verifier do here". Via
>> reorganization or copy-and-paste, much of the content can probably be
>> repurposed for BPF program writer audience as well.
>>
>> Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
> 
> Hey Dave,
> 
> Thanks for writing this up. I left a few comments and suggestions as a
> first pass. Feel free to push back on any of them.
> 
>> ---
>>  Documentation/bpf/graph_ds_impl.rst | 208 ++++++++++++++++++++++++++++
>>  Documentation/bpf/other.rst         |   3 +-
>>  2 files changed, 210 insertions(+), 1 deletion(-)
>>  create mode 100644 Documentation/bpf/graph_ds_impl.rst
>>
>> diff --git a/Documentation/bpf/graph_ds_impl.rst b/Documentation/bpf/graph_ds_impl.rst
>> new file mode 100644
>> index 000000000000..f92cbd223dc3
>> --- /dev/null
>> +++ b/Documentation/bpf/graph_ds_impl.rst
>> @@ -0,0 +1,208 @@
>> +=========================
>> +BPF Graph Data Structures
>> +=========================
>> +
>> +This document describes implementation details of new-style "graph" data
>> +structures (linked_list, rbtree), with particular focus on verifier
> 
> s/with particular/with a particular
> 

I'm no grammar expert, but based on my googling
"with particular focus" is widely used in newspapers and other places
where grammarians lurk.

>> +implementation of semantics particular to those data structures.
> 
> s/particular/specific
> 
> Just because we already use the word "particular" in the sentence?
> 

Ack

> In general this sentence feels a bit difficult to parse. Wdyt about
> this?
> 
> ...with a particular focus on how the verifier ensures that they are
> properly and safely used by BPF programs.
> 

Agreed in general, but re: your specific suggestion: "the verifier's
implementation of semantics specific to those data structures" communicates
that there are semantics specific to those data structures which required
verifier changes. 

"ensures that they are properly and safely used by BPF programs" is more
vague, but definitely easier to parse.

Will rewrite in some other way which is hopefully best of both worlds.

>> +
>> +Note that the intent of this document is to describe the current state of
>> +these graph data structures, **no guarantees** of stability for either
> 
> I think we can end the sentence in the middle here.
> 
> ...these graph data structures. **No guarantees**...

Ack

> 
> Should we also add a sentence or two here about the intended audience
> (people working on the verifier or readers who are interested in
> learning more about BPF internals)?
> 

Ack

>> +semantics or APIs are made or implied here.
>> +
>> +.. contents::
>> +    :local:
>> +    :depth: 2
>> +
>> +Introduction
>> +------------
>> +
>> +The BPF map API has historically been the main way to expose data structures
>> +of various types for use within BPF programs. Some data structures fit naturally
>> +with the map API (HASH, ARRAY), others less so. Consequentially, programs
> 
> Would you mind please adding some details on why some data structures
> don't fit naturally into the existing map APIs? I feel like that's kind
> of the main focus of the article, so it would probably help to give some
> high-level context up front.
> 
>> +interacting with the latter group of data structures can be hard to parse
>> +for kernel programmers without previous BPF experience.
> 
> I'm not sure I quite follow how this latter point about data structures
> being hard to parse is derived from the point about how some data
> structures don't fit naturally with the map APIs. Maybe we should say
> something like:
> 
> ..., others less so. Given that the API surface and behavioral semantics
> are fundamentally different between these two classes of BPF data
> structures, kernel programmers who are used to interacting with map-type
> data structures may find these graph-type data structures to be
> confusing or unfamiliar.
> 
> Wdyt?
> 

The "Introduction" section is trying to make these points:

  * Data structures have historically been forced to adhere to the Map API
  * Some data structures (linked list, rbtree) don't fit the Map API well
  * For data structures that don't fit the Map API well, two problems would
    arise if they were exposed as maps:
    * "square peg / round hole" - in a vacuum, it'd be hard to make sense
      of how Map API manipulates those data structures
    * "Familiarity" - we're not in a vacuum, folks would prefer to write / read
      code that interacts with these data structures in a "normal" kernel style

I will expand upon these, but FWIW the main point of this document is to explain
why new verifier functionality is necessary to make Graph datastructures work,
and what said new functionality does.

Explaining why map API is a bad fit is part of that, but I expect the reader to
have some experience writing BPF programs which interact with maps, so I
probably won't elaborate too much on the basics here. The sentence(s) added to
satisfy your "intended audience" suggestion will say as much.

>> +
>> +Luckily, some restrictions which necessitated the use of BPF map semantics are
>> +no longer relevant. With the introduction of kfuncs, kptrs, and the any-context
>> +BPF allocator, it is now possible to implement BPF data structures whose API
>> +and semantics more closely match those exposed to the rest of the kernel.
> 
> Suggestion, I'd consider explicitly contrasting the map-type
> implementation here with the graph-type implementation. What do you
> think of something like this instead of the above paragraph:
> 
> BPF map-type data structures are defined as part of the UAPI in ``enum
> bpf_map_type``, and are accessed and manipulated using BPF
> :doc:`helpers`. The behaviors, backing memory, and implementations of
> these map-type data structures are entirely encapsulated from BPF
> programs, and mostly encapsulated from the verifier, by the helper
> functions. The logic in the verifier for ensuring that map-type data
> structures are correctly used therefore essentially amounts to
> statically verifying that the helper functions that manipulate and
> access the data structure are called correctly by the program, as
> defined in the helper prototypes. The verifier then relies on the helper
> to properly manipulate the backing data structure with its validated
> arguments.
>> BPF graph-type data structures, on the other hand, leverage more modern
> features such as :doc:`kfuncs`, kptrs, and the any-context BPF
> allocator. They allow BPF programs to manipulate the data structures
> directly using APIs and semantics which more closely match those exposed
> to code in the main kernel, with the verifier's job now being to ensure
> that the programs are properly manipulating the data structures, rather
> than relying on helper functions to properly manipulate the data
> structures in the main kernel.
> 

There's good info here, but I think it belongs in specific sections where
the new approach is discussed, not in the introduction. For "intended audience" reasons
touched on in my response above.

For "non-owning references section", I will add some paragraphs explaining why
there's no equivalent concept for Map API. For other things you touched on (UAPI
vs kptrs, prealloc vs any-context allocator, etc), I'll add other sections.

>> +
>> +Two such data structures - linked_list and rbtree - have many verification
>> +details in common. Because both have "root"s ("head" for linked_list) and
>> +"node"s, the verifier code and this document refer to common functionality
>> +as "graph_api", "graph_root", "graph_node", etc.
> 
> 
> 
>> +
>> +Unless otherwise stated, examples and semantics below apply to both graph data
>> +structures.
>> +
>> +Non-owning references
>> +---------------------
>> +
>> +**Motivation**
>> +
>> +Consider the following BPF code:
>> +
>> +.. code-block:: c
> 
> You need an extra newline here or the docs build will complain:
> 
> bpf-next/Documentation/bpf/graph_ds_impl.rst:46: ERROR: Error in "code-block" directive:
> maximum 1 argument(s) allowed, 9 supplied.
> 
> .. code-block:: c
>         struct node_data *n = bpf_obj_new(typeof(*n)); /* BEFORE */
> 
>         bpf_spin_lock(&lock);
> 
>         bpf_rbtree_add(&tree, n); /* AFTER */
> 
>         bpf_spin_unlock(&lock);
> 

Ack

>> +        struct node_data *n = bpf_obj_new(typeof(*n)); /* BEFORE */
>> +
>> +        bpf_spin_lock(&lock);
>> +
>> +        bpf_rbtree_add(&tree, n); /* AFTER */
>> +
>> +        bpf_spin_unlock(&lock);
> 
> Also need a newline here or sphinx will get confused and think the
> vertical line is part of the code block.
> 

Ack, all sphinx build errors / warnings for this doc have been fixed.

>> +----
>> +
>> +From the verifier's perspective, after bpf_obj_new ``n`` has type
>> +``PTR_TO_BTF_ID | MEM_ALLOC`` with btf_id of ``struct node_data`` and a
>> +nonzero ``ref_obj_id``. Because it holds ``n``, the program has ownership
> 
> I had to read this first sentence a few times to parse it, maybe due to
> a missing comma between "after bpf_obj_new" and "``n`` has type...".
> What do you think about this wording?
> 
> From the verifier's perspective, the pointer ``n`` returned from
> ``bpf_obj_new`` has type ``PTR_TO_BTF_ID | MEM_ALLOC``, with a `btf_id`
> of ``struct node_data``, and a nonzero ``ref_obj_id``.
> 

Ack, your wording is better.

>> +of the pointee's lifetime (object pointed to by ``n``). The BPF program must
> 
> Should we move (object pointed to by ``n``) to be directly after
> "pointee's" / before "lifetime"? Otherwise it reads kind of odd given
> that "lifetime" is really the indirect object in the sentence.
> 

Ack.

>> +pass off ownership before exiting - either via ``bpf_obj_drop``, which free's
> 
> s/free's/frees
> 

I did ``free``'s and ``free``'d instead of these suggested changes. Want to make
it obvious that the action taken is equivalent to free() from malloc API.

>> +the object, or by adding it to ``tree`` with ``bpf_rbtree_add``.
>> +
>> +(``BEFORE`` and ``AFTER`` comments in the example denote beginning of "before
>> +ownership is passed" and "after ownership is passed")
> 
> Should we use something like ACQUIRED / PASSED / RELEASED instead of
> BEFORE / AFTER?
> 

Ack. None of the code samples need RELEASED comment yet, but this scheme is
easier to follow regardless.
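
With that scheme the first sample above would read roughly:

        struct node_data *n = bpf_obj_new(typeof(*n)); /* ACQUIRED */

        bpf_spin_lock(&lock);

        bpf_rbtree_add(&tree, n); /* PASSED */

        bpf_spin_unlock(&lock);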

>> +
>> +What should the verifier do with ``n`` after ownership is passed off? If the
>> +object was free'd with ``bpf_obj_drop`` the answer is obvious: the verifier
> 
> s/free'd/freed
> 
>> +should reject programs which attempt to access ``n`` after ``bpf_obj_drop`` as
>> +the object is no longer valid. The underlying memory may have been reused for
>> +some other allocation, unmapped, etc.
>> +
>> +When ownership is passed to ``tree`` via ``bpf_rbtree_add`` the answer is less
>> +obvious. The verifier could enforce the same semantics as for ``bpf_obj_drop``,
>> +but that would result in programs with useful, common coding patterns being
>> +rejected, e.g.:
>> +
>> +.. code-block:: c
> 
> Same here (newline)
> 
>> +        int x;
>> +        struct node_data *n = bpf_obj_new(typeof(*n)); /* BEFORE */
>> +
>> +        bpf_spin_lock(&lock);
>> +
>> +        bpf_rbtree_add(&tree, n); /* AFTER */
>> +        x = n->data;
>> +        n->data = 42;
>> +
>> +        bpf_spin_unlock(&lock);
> 
> Same here (newline)
> 
>> +----
>> +
>> +Both the read from and write to ``n->data`` would be rejected. The verifier
>> +can do better, though, by taking advantage of two details:
>> +
>> +  * Graph data structure APIs can only be used when the ``bpf_spin_lock``
>> +    associated with the graph root is held
> 
> I'd consider giving a bit more background information on this somewhere
> above. This is the first time we've mentioned anything about a lock, so
> it might be worth it to give some context on how these graph-type maps
> are defined and initialized.
> 
> I realize we could be approaching "useful even to people who aren't
> working on the verifier" territory if we go into too much detail, but I
> also think it's important to give backround context on this stuff
> regardless of the intended audience in order for the documentation to
> really be useful.
> 

Agreed, this document is missing important background information about
spin_locks + Graph Datastructures.

>> +  * Both graph data structures have pointer stability
> 
> You also need a newline between nested list entries or sphinx will get
> confused. My suggestion would be to just always have a newline between
> list entries (applies elsewhere in the file as well).
> 

Ack. Apparently I needed three spaces to trigger the next nesting level (had
two). After doing that, it was obvious why your "always have a newline"
suggestion is good.

>> +    * Because graph nodes are allocated with ``bpf_obj_new`` and
>> +      adding / removing from the root involves fiddling with the
>> +      ``bpf_{list,rb}_node`` field of the node struct, a graph node will
>> +      remain at the same address after either operation.
>> +
>> +Because the associated ``bpf_spin_lock`` must be held by any program adding
>> +or removing, if we're in the critical section bounded by that lock, we know
>> +that no other program can add or remove until the end of the critical section.
>> +This combined with pointer stability means that, until the critical section
>> +ends, we can safely access the graph node through ``n`` even after it was used
>> +to pass ownership.
>> +
>> +The verifier considers such a reference a *non-owning reference*. The ref
>> +returned by ``bpf_obj_new`` is accordingly considered an *owning reference*.
>> +Both terms currently only have meaning in the context of graph nodes and API.
>> +
>> +**Details**
>> +
>> +Let's enumerate the properties of both types of references.
>> +
>> +*owning reference*
>> +
>> +  * This reference controls the lifetime of the pointee
>> +  * Ownership of pointee must be 'released' by passing it to some graph API
>> +    kfunc, or via ``bpf_obj_drop``, which free's the pointee
> 
> s/free's/frees. "Frees" is a verb, "free's" is a possessive.
> 
>> +    * If not released before program ends, verifier considers program invalid
>> +  * Access to the pointee's memory will not page fault
>> +
>> +*non-owning reference*
>> +
>> +  * This reference does not own the pointee
>> +    * It cannot be used to add the graph node to a graph root, nor free via
>> +      ``bpf_obj_drop``
>> +  * No explicit control of lifetime, but can infer valid lifetime based on
>> +    non-owning ref existence (see explanation below)
>> +  * Access to the pointee's memory will not page fault
> 
> I'd consider defining references, or at least giving some high-level
> description of how they work, somewhere a bit earlier in the page. The
> "Non-owning references" section kind of just jumps right into examples
> of what the verifier allows without describing the concept at a higher
> level, so readers will have a difficult time applying what they're
> reading to the examples being provided.
> 
>> +
>> +From verifier's perspective non-owning references can only exist
>> +between spin_lock and spin_unlock. Why? After spin_unlock another program
>> +can do arbitrary operations on the data structure like removing and free-ing
> 
> s/free-ing/freeing
> 
>> +via bpf_obj_drop. A non-owning ref to some chunk of memory that was remove'd,
> 
> s/remove'd/removed

Similarly to ``free``'d, 'remove' here is referring to a specific function, so
did ``remove``'d instead.

> 
> I'll stop pointing these out for now, they apply throughout the page.
> 
>> +free'd, and reused via bpf_obj_new would point to an entirely different thing.
>> +Or the memory could go away.
>> +
>> +To prevent this logic violation all non-owning references are invalidated by
>> +verifier after critical section ends. This is necessary to ensure "will
> 
> - s/by verifier/by the verifier
> - s/after critical section/after a critical section
> - s/to ensure "will not"/to ensure a "will not"
> 
> 

Ack, except s/to ensure "will not"/to ensure the "will not"

>> +not page fault" property of non-owning reference. So if verifier hasn't
> 
> - s/of non-owning/of the non-owning
> - s/So if verifier/So if the verifier
> 

Ack, except s/of non-owning reference/of non-owning references

>> +invalidated a non-owning ref, accessing it will not page fault.
>> +
>> +Currently ``bpf_obj_drop`` is not allowed in the critical section, so
>> +if there's a valid non-owning ref, we must be in critical section, and can
> 
> s/in critical section/in a critical section
> 

Ack

>> +conclude that the ref's memory hasn't been dropped-and-free'd or dropped-
>> +and-reused.
> 
> If you split the line like this, it will render as "dropped-and- reused".
> 

Ack

>> +
>> +Any reference to a node that is in a rbtree _must_ be non-owning, since
> 
> s/a rbtree/an rbtree
> 

TIL, ack.

>> +the tree has control of pointee lifetime. Similarly, any ref to a node
> 
> s/of pointee lifetime/of the pointee's lifetime
> 

ack

>> +that isn't in rbtree _must_ be owning. This results in a nice property:
> 
> s/in rbtree/in an rbtree
> 

ack

>> +graph API add / remove implementations don't need to check if a node
>> +has already been added (or already removed), as the verifier type system
>> +prevents such a state from being valid.
> 
> I feel like "verifier type system" isn't quite accurate here, though I
> may be wrong. When I think of something like "verifier type system" I'm
> more envisioning how the verifier ensures that the correct BTF IDs are
> passed. In this case, it's really the BPF graph-object ownership model
> that's ensuring that the state is valid, right?
> 

I mean "type system" here in the PL / language runtime sense. Although the
verifier doesn't execute the code at runtime, at verification time it augments
the raw BPF bytecode with type information (BTF or type inferred from attach
context) and does some execution-like things with the program, including
complaining if some function expects type X but gets type Y as input.

In this case "owning reference" and "non-owning reference" are distinct types
(owning has nonzero ref_obj_id) and the verifier rejects wrong type for kfunc
input based on this info alone. "graph-object ownership model" is responsible
for changing refs of one type to another.

Regardless, your broader point stands - "verifier type system" isn't commonly
used to describe this behavior, so I should phrase this better.
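
Concretely, what I mean by the type info alone doing the work (using the
same simplified call style as the samples in this doc): after the first add
below, ``n`` is a non-owning reference. Since ``bpf_rbtree_add`` (like
``bpf_obj_drop``) requires an owning reference, the second call is rejected
at verification time, with no runtime "already in a tree?" check needed:

        struct node_data *n = bpf_obj_new(typeof(*n)); /* owning ref */

        bpf_spin_lock(&lock);

        bpf_rbtree_add(&tree, n); /* n is now a non-owning ref */
        bpf_rbtree_add(&tree, n); /* rejected: owning ref expected */

        bpf_spin_unlock(&lock);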

>> +
>> +However, pointer aliasing poses an issue for the above "nice property".
>> +Consider the following example:
>> +
>> +.. code-block:: c
> 
> Same here (newline)
> 
>> +        struct node_data *n, *m, *o, *p;
>> +        n = bpf_obj_new(typeof(*n));     /* 1 */
>> +
>> +        bpf_spin_lock(&lock);
>> +
>> +        bpf_rbtree_add(&tree, n);        /* 2 */
>> +        m = bpf_rbtree_first(&tree);     /* 3 */
>> +
>> +        o = bpf_rbtree_remove(&tree, n); /* 4 */
>> +        p = bpf_rbtree_remove(&tree, m); /* 5 */
>> +
>> +        bpf_spin_unlock(&lock);
>> +
>> +        bpf_obj_drop(o);
>> +        bpf_obj_drop(p); /* 6 */
> 
> Same here (newline)
> 
>> +----
>> +
>> +Assume tree is empty before this program runs. If we track verifier state
> 
> s/Assume tree,/Assume the tree
> 

ack

>> +changes here using numbers in above comments:
>> +
>> +  1) n is an owning reference
>> +  2) n is a non-owning reference, it's been added to the tree
>> +  3) n and m are non-owning references, they both point to the same node
>> +  4) o is an owning reference, n and m non-owning, all point to same node
>> +  5) o and p are owning, n and m non-owning, all point to the same node
>> +  6) a double-free has occurred, since o and p point to same node and o was
>> +     free'd in previous statement
>> +
>> +States 4 and 5 violate our "nice property", as there are non-owning refs to
>> +a node which is not in a rbtree. Statement 5 will try to remove a node which
>> +has already been removed as a result of this violation. State 6 is a dangerous
>> +double-free.
>> +
>> +At a minimum we should prevent state 6 from being possible. If we can't also
>> +prevent state 5 then we must abandon our "nice property" and check whether a
>> +node has already been removed at runtime.
>> +
>> +We prevent both by generalizing the "invalidate non-owning references" behavior
>> +of ``bpf_spin_unlock`` and doing similar invalidation after
>> +``bpf_rbtree_remove``. The logic here being that any graph API kfunc which:
>> +
>> +  * takes an arbitrary node argument
>> +  * removes it from the datastructure
>> +  * returns an owning reference to the removed node
>> +
>> +May result in a state where some other non-owning reference points to the same
>> +node. So ``remove``-type kfuncs must be considered a non-owning reference
>> +invalidation point as well.
> 
> Could you please also add the new kfunc flags that signal this to
> Documentation/bpf/kfuncs.rst?
> 

ack

>> diff --git a/Documentation/bpf/other.rst b/Documentation/bpf/other.rst
>> index 3d61963403b4..7e6b12018802 100644
>> --- a/Documentation/bpf/other.rst
>> +++ b/Documentation/bpf/other.rst
>> @@ -6,4 +6,5 @@ Other
>>     :maxdepth: 1
>>  
>>     ringbuf
>> -   llvm_reloc
>> \ No newline at end of file
>> +   llvm_reloc
>> +   graph_ds_impl
>> -- 
>> 2.30.2
>>

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v2 bpf-next 13/13] bpf, documentation: Add graph documentation for non-owning refs
  2023-01-18  2:16     ` Dave Marchevsky
@ 2023-01-20  4:45       ` David Vernet
  0 siblings, 0 replies; 38+ messages in thread
From: David Vernet @ 2023-01-20  4:45 UTC (permalink / raw)
  To: Dave Marchevsky
  Cc: Dave Marchevsky, bpf, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, Kernel Team, Kumar Kartikeya Dwivedi, Tejun Heo

On Tue, Jan 17, 2023 at 09:16:00PM -0500, Dave Marchevsky wrote:
> On 12/28/22 4:26 PM, David Vernet wrote:
> > On Sat, Dec 17, 2022 at 12:25:06AM -0800, Dave Marchevsky wrote:
> >> It is difficult to intuit the semantics of owning and non-owning
> >> references from verifier code. In order to keep the high-level details
> >> from being lost in the mailing list, this patch adds documentation
> >> explaining semantics and details.
> >>
> >> The target audience of doc added in this patch is folks working on BPF
> >> internals, as there's focus on "what should the verifier do here". Via
> >> reorganization or copy-and-paste, much of the content can probably be
> >> repurposed for BPF program writer audience as well.
> >>
> >> Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
> > 
> > Hey Dave,
> > 
> > Thanks for writing this up. I left a few comments and suggestions as a
> > first pass. Feel free to push back on any of them.
> > 
> >> ---
> >>  Documentation/bpf/graph_ds_impl.rst | 208 ++++++++++++++++++++++++++++
> >>  Documentation/bpf/other.rst         |   3 +-
> >>  2 files changed, 210 insertions(+), 1 deletion(-)
> >>  create mode 100644 Documentation/bpf/graph_ds_impl.rst
> >>
> >> diff --git a/Documentation/bpf/graph_ds_impl.rst b/Documentation/bpf/graph_ds_impl.rst
> >> new file mode 100644
> >> index 000000000000..f92cbd223dc3
> >> --- /dev/null
> >> +++ b/Documentation/bpf/graph_ds_impl.rst
> >> @@ -0,0 +1,208 @@
> >> +=========================
> >> +BPF Graph Data Structures
> >> +=========================
> >> +
> >> +This document describes implementation details of new-style "graph" data
> >> +structures (linked_list, rbtree), with particular focus on verifier
> > 
> > s/with particular/with a particular
> > 
> 
> I'm no grammar expert, but based on my googling
> "with particular focus" is widely used in newspapers and other places
> where grammarians lurk.

I have no doubt whatsoever that people who write professionally know
better than I do, so yeah, I'm probably wrong and it's fine to leave it
as is.

> 
> >> +implementation of semantics particular to those data structures.
> > 
> > s/particular/specific
> > 
> > Just because we already use the word "particular" in the sentence?
> > 
> 
> Ack
> 
> > In general this sentence feels a bit difficult to parse. Wdyt about
> > this?
> > 
> > ...with a particular focus on how the verifier ensures that they are
> > properly and safely used by BPF programs.
> > 
> 
> Agreed in general, but re: your specific suggestion: "the verifier's
> implementation of semantics specific to those data structures" communicates
> that there are semantics specific to those data structures which required
> verifier changes. 
> 
> "ensures that they are properly and safely used by BPF programs" is more
> vague, but definitely easier to parse.
> 
> Will rewrite in some other way which is hopefully best of both worlds.

Yeah that's fair, feel free to keep the way you had it before if you
prefer that. Or reword it, up to you.

> 
> >> +
> >> +Note that the intent of this document is to describe the current state of
> >> +these graph data structures, **no guarantees** of stability for either
> > 
> > I think we can end the sentence in the middle here.
> > 
> > ...these graph data structures. **No guarantees**...
> 
> Ack
> 
> > 
> > Should we also add a sentence or two here about the intended audience
> > (people working on the verifier or readers who are interested in
> > learning more about BPF internals)?
> > 
> 
> Ack
> 
> >> +semantics or APIs are made or implied here.
> >> +
> >> +.. contents::
> >> +    :local:
> >> +    :depth: 2
> >> +
> >> +Introduction
> >> +------------
> >> +
> >> +The BPF map API has historically been the main way to expose data structures
> >> +of various types for use within BPF programs. Some data structures fit naturally
> >> +with the map API (HASH, ARRAY), others less so. Consequentially, programs
> > 
> > Would you mind please adding some details on why some data structures
> > don't fit naturally into the existing map APIs? I feel like that's kind
> > of the main focus of the article, so it would probably help to give some
> > high-level context up front.
> > 
> >> +interacting with the latter group of data structures can be hard to parse
> >> +for kernel programmers without previous BPF experience.
> > 
> > I'm not sure I quite follow how this latter point about data structures
> > being hard to parse is derived from the point about how some data
> > structures don't fit naturally with the map APIs. Maybe we should say
> > something like:
> > 
> > ..., others less so. Given that the API surface and behavioral semantics
> > are fundamentally different between these two classes of BPF data
> > structures, kernel programmers who are used to interacting with map-type
> > data structures may find these graph-type data structures to be
> > confusing or unfamiliar.
> > 
> > Wdyt?
> > 
> 
> The "Introduction" section is trying to make these points:
> 
>   * Data structures have historically been forced to adhere to the Map API
>   * Some data structures (linked list, rbtree) don't fit the Map API well
>   * For data structures that don't fit the Map API well, two problems would
>     arise if they were exposed as maps:
>     * "square peg / round hole" - in a vacuum, it'd be hard to make sense
>       of how Map API manipulates those data structures
>     * "Familiarity" - we're not in a vacuum, folks would prefer to write / read
>       code that interacts with these data structures in a "normal" kernel style
> 
> I will expand upon these, but FWIW the main point of this document is to explain
> why new verifier functionality is necessary to make Graph datastructures work,
> and what said new functionality does.
> 
> Explaining why map API is a bad fit is part of that, but I expect the reader to
> have some experience writing BPF programs which interact with maps, so I
> probably won't elaborate too much on the basics here. The sentence(s) added to
> satisfy your "intended audience" suggestion will say as much.
> 
> >> +
> >> +Luckily, some restrictions which necessitated the use of BPF map semantics are
> >> +no longer relevant. With the introduction of kfuncs, kptrs, and the any-context
> >> +BPF allocator, it is now possible to implement BPF data structures whose API
> >> +and semantics more closely match those exposed to the rest of the kernel.
> > 
> > Suggestion, I'd consider explicitly contrasting the map-type
> > implementation here with the graph-type implementation. What do you
> > think of something like this instead of the above paragraph:
> > 
> > BPF map-type data structures are defined as part of the UAPI in ``enum
> > bpf_map_type``, and are accessed and manipulated using BPF
> > :doc:`helpers`. The behaviors, backing memory, and implementations of
> > these map-type data structures are entirely encapsulated from BPF
> > programs, and mostly encapsulated from the verifier, by the helper
> > functions. The logic in the verifier for ensuring that map-type data
> > structures are correctly used therefore essentially amounts to
> > statically verifying that the helper functions that manipulate and
> > access the data structure are called correctly by the program, as
> > defined in the helper prototypes. The verifier then relies on the helper
> > to properly manipulate the backing data structure with its validated
> > arguments.
> >> BPF graph-type data structures, on the other hand, leverage more modern
> > features such as :doc:`kfuncs`, kptrs, and the any-context BPF
> > allocator. They allow BPF programs to manipulate the data structures
> > directly using APIs and semantics which more closely match those exposed
> > to code in the main kernel, with the verifier's job now being to ensure
> > that the programs are properly manipulating the data structures, rather
> > than relying on helper functions to properly manipulate the data
> > structures in the main kernel.
> > 
> 
> There's good info here, but I think it belongs in specific sections where
> the new approach is discussed, not in the introduction. For "intended audience" reasons
> touched on in my response above.

Sounds good.

> For "non-owning references section", I will add some paragraphs explaining why
> there's no equivalent concept for Map API. For other things you touched on (UAPI
> vs kptrs, prealloc vs any-context allocator, etc), I'll add other sections.

Thanks!

> 
> >> +
> >> +Two such data structures - linked_list and rbtree - have many verification
> >> +details in common. Because both have "root"s ("head" for linked_list) and
> >> +"node"s, the verifier code and this document refer to common functionality
> >> +as "graph_api", "graph_root", "graph_node", etc.
> > 
> > 
> > 
> >> +
> >> +Unless otherwise stated, examples and semantics below apply to both graph data
> >> +structures.
> >> +
> >> +Non-owning references
> >> +---------------------
> >> +
> >> +**Motivation**
> >> +
> >> +Consider the following BPF code:
> >> +
> >> +.. code-block:: c
> > 
> > You need an extra newline here or the docs build will complain:
> > 
> > bpf-next/Documentation/bpf/graph_ds_impl.rst:46: ERROR: Error in "code-block" directive:
> > maximum 1 argument(s) allowed, 9 supplied.
> > 
> > .. code-block:: c
> >         struct node_data *n = bpf_obj_new(typeof(*n)); /* BEFORE */
> > 
> >         bpf_spin_lock(&lock);
> > 
> >         bpf_rbtree_add(&tree, n); /* AFTER */
> > 
> >         bpf_spin_unlock(&lock);
> > 
> 
> Ack
> 
> >> +        struct node_data *n = bpf_obj_new(typeof(*n)); /* BEFORE */
> >> +
> >> +        bpf_spin_lock(&lock);
> >> +
> >> +        bpf_rbtree_add(&tree, n); /* AFTER */
> >> +
> >> +        bpf_spin_unlock(&lock);
> > 
> > Also need a newline here or sphinx will get confused and think the
> > vertical line is part of the code block.
> > 
> 
> Ack, all sphinx build errors / warnings for this doc have been fixed.
> 
> >> +----
> >> +
> >> +From the verifier's perspective, after bpf_obj_new ``n`` has type
> >> +``PTR_TO_BTF_ID | MEM_ALLOC`` with btf_id of ``struct node_data`` and a
> >> +nonzero ``ref_obj_id``. Because it holds ``n``, the program has ownership
> > 
> > I had to read this first sentence a few times to parse it, maybe due to
> > a missing comma between "after bpf_obj_new" and "``n`` has type...".
> > What do you think about this wording?
> > 
> > From the verifier's perspective, the pointer ``n`` returned from
> > ``bpf_obj_new`` has type ``PTR_TO_BTF_ID | MEM_ALLOC``, with a `btf_id`
> > of ``struct node_data``, and a nonzero ``ref_obj_id``.
> > 
> 
> Ack, your wording is better.
> 
> >> +of the pointee's lifetime (object pointed to by ``n``). The BPF program must
> > 
> > Should we move (object pointed to by ``n``) to be directly after
> > "pointee's" / before "lifetime"? Otherwise it reads kind of odd given
> > that "lifetime" is really the indirect object in the sentence.
> > 
> 
> Ack.
> 
> >> +pass off ownership before exiting - either via ``bpf_obj_drop``, which free's
> > 
> > s/free's/frees
> > 
> 
> I did ``free``'s and ``free``'d instead of these suggested changes. Want to make
> it obvious that the action taken is equivalent to free() from malloc API.
> 
> >> +the object, or by adding it to ``tree`` with ``bpf_rbtree_add``.
> >> +
> >> +(``BEFORE`` and ``AFTER`` comments in the example denote beginning of "before
> >> +ownership is passed" and "after ownership is passed")
> > 
> > Should we use something like ACQUIRED / PASSED / RELEASED instead of
> > BEFORE / AFTER?
> > 
> 
> Ack. None of the code samples need RELEASED comment yet, but this scheme is
> easier to follow regardless.
> 
> >> +
> >> +What should the verifier do with ``n`` after ownership is passed off? If the
> >> +object was free'd with ``bpf_obj_drop`` the answer is obvious: the verifier
> > 
> > s/free'd/freed
> > 
> >> +should reject programs which attempt to access ``n`` after ``bpf_obj_drop`` as
> >> +the object is no longer valid. The underlying memory may have been reused for
> >> +some other allocation, unmapped, etc.
> >> +
> >> +When ownership is passed to ``tree`` via ``bpf_rbtree_add`` the answer is less
> >> +obvious. The verifier could enforce the same semantics as for ``bpf_obj_drop``,
> >> +but that would result in programs with useful, common coding patterns being
> >> +rejected, e.g.:
> >> +
> >> +.. code-block:: c
> > 
> > Same here (newline)
> > 
> >> +        int x;
> >> +        struct node_data *n = bpf_obj_new(typeof(*n)); /* BEFORE */
> >> +
> >> +        bpf_spin_lock(&lock);
> >> +
> >> +        bpf_rbtree_add(&tree, n); /* AFTER */
> >> +        x = n->data;
> >> +        n->data = 42;
> >> +
> >> +        bpf_spin_unlock(&lock);
> > 
> > Same here (newline)
> > 
> >> +----
> >> +
> >> +Both the read from and write to ``n->data`` would be rejected. The verifier
> >> +can do better, though, by taking advantage of two details:
> >> +
> >> +  * Graph data structure APIs can only be used when the ``bpf_spin_lock``
> >> +    associated with the graph root is held
> > 
> > I'd consider giving a bit more background information on this somewhere
> > above. This is the first time we've mentioned anything about a lock, so
> > it might be worth it to give some context on how these graph-type maps
> > are defined and initialized.
> > 
> > I realize we could be approaching "useful even to people who aren't
> > working on the verifier" territory if we go into too much detail, but I
> > also think it's important to give backround context on this stuff
> > regardless of the intended audience in order for the documentation to
> > really be useful.
> > 
> 
> Agreed, this document is missing important background information about
> spin_locks + Graph Datastructures.
> 
> >> +  * Both graph data structures have pointer stability
> > 
> > You also need a newline between nested list entries or sphinx will get
> > confused. My suggestion would be to just always have a newline between
> > list entries (applies elsewhere in the file as well).
> > 
> 
> Ack. Apparently I needed three spaces to trigger the next nesting level (had
> two). After doing that, it was obvious why your "always have a newline"
> suggestion is good.
> 
> >> +    * Because graph nodes are allocated with ``bpf_obj_new`` and
> >> +      adding / removing from the root involves fiddling with the
> >> +      ``bpf_{list,rb}_node`` field of the node struct, a graph node will
> >> +      remain at the same address after either operation.
> >> +
> >> +Because the associated ``bpf_spin_lock`` must be held by any program adding
> >> +or removing, if we're in the critical section bounded by that lock, we know
> >> +that no other program can add or remove until the end of the critical section.
> >> +This combined with pointer stability means that, until the critical section
> >> +ends, we can safely access the graph node through ``n`` even after it was used
> >> +to pass ownership.
> >> +
> >> +The verifier considers such a reference a *non-owning reference*. The ref
> >> +returned by ``bpf_obj_new`` is accordingly considered an *owning reference*.
> >> +Both terms currently only have meaning in the context of graph nodes and API.
> >> +
> >> +**Details**
> >> +
> >> +Let's enumerate the properties of both types of references.
> >> +
> >> +*owning reference*
> >> +
> >> +  * This reference controls the lifetime of the pointee
> >> +  * Ownership of pointee must be 'released' by passing it to some graph API
> >> +    kfunc, or via ``bpf_obj_drop``, which free's the pointee
> > 
> > s/free's/frees. "Frees" is a verb, "free's" is a possessive.
> > 
> >> +    * If not released before program ends, verifier considers program invalid
> >> +  * Access to the pointee's memory will not page fault
> >> +
> >> +*non-owning reference*
> >> +
> >> +  * This reference does not own the pointee
> >> +    * It cannot be used to add the graph node to a graph root, nor free via
> >> +      ``bpf_obj_drop``
> >> +  * No explicit control of lifetime, but can infer valid lifetime based on
> >> +    non-owning ref existence (see explanation below)
> >> +  * Access to the pointee's memory will not page fault
> > 
> > I'd consider defining references, or at least giving some high-level
> > description of how they work, somewhere a bit earlier in the page. The
> > "Non-owning references" section kind of just jumps right into examples
> > of what the verifier allows without describing the concept at a higher
> > level, so readers will have a difficult time applying what they're
> > reading to the examples being provided.
> > 
> >> +
> >> +From verifier's perspective non-owning references can only exist
> >> +between spin_lock and spin_unlock. Why? After spin_unlock another program
> >> +can do arbitrary operations on the data structure like removing and free-ing
> > 
> > s/free-ing/freeing
> > 
> >> +via bpf_obj_drop. A non-owning ref to some chunk of memory that was remove'd,
> > 
> > s/remove'd/removed
> 
> Similarly to ``free``'d, 'remove' here is referring to a specific function, so
> did ``remove``'d instead.
> 
> > 
> > I'll stop pointing these out for now, they apply throughout the page.
> > 
> >> +free'd, and reused via bpf_obj_new would point to an entirely different thing.
> >> +Or the memory could go away.
> >> +
> >> +To prevent this logic violation all non-owning references are invalidated by
> >> +verifier after critical section ends. This is necessary to ensure "will
> > 
> > - s/by verifier/by the verifier
> > - s/after critical section/after a critical section
> > - s/to ensure "will not"/to ensure a "will not"
> > 
> > 
> 
> Ack, except s/to ensure "will not"/to ensure the "will not"
> 
> >> +not page fault" property of non-owning reference. So if verifier hasn't
> > 
> > - s/of non-owning/of the non-owning
> > - s/So if verifier/So if the verifier
> > 
> 
> Ack, except s/of non-owning reference/of non-owning references
> 
> >> +invalidated a non-owning ref, accessing it will not page fault.
> >> +
> >> +Currently ``bpf_obj_drop`` is not allowed in the critical section, so
> >> +if there's a valid non-owning ref, we must be in critical section, and can
> > 
> > s/in critical section/in a critical section
> > 
> 
> Ack
> 
> >> +conclude that the ref's memory hasn't been dropped-and-free'd or dropped-
> >> +and-reused.
> > 
> > If you split the line like this, it will render as "dropped-and- reused".
> > 
> 
> Ack
> 
> >> +
> >> +Any reference to a node that is in a rbtree _must_ be non-owning, since
> > 
> > s/a rbtree/an rbtree
> > 
> 
> TIL, ack.
> 
> >> +the tree has control of pointee lifetime. Similarly, any ref to a node
> > 
> > s/of pointee lifetime/of the pointee's lifetime
> > 
> 
> ack
> 
> >> +that isn't in rbtree _must_ be owning. This results in a nice property:
> > 
> > s/in rbtree/in an rbtree
> > 
> 
> ack
> 
> >> +graph API add / remove implementations don't need to check if a node
> >> +has already been added (or already removed), as the verifier type system
> >> +prevents such a state from being valid.
> > 
> > I feel like "verifier type system" isn't quite accurate here, though I
> > may be wrong. When I think of something like "verifier type system" I'm
> > more envisioning how the verifier ensures that the correct BTF IDs are
> > passed. In this case, it's really the BPF graph-object ownership model
> > that's ensuring that the state is valid, right?
> > 
> 
> I mean "type system" here in the PL / language runtime sense. Although the
> verifier doesn't execute the code at runtime, at verification time it augments
> the raw BPF bytecode with type information (BTF or type inferred from attach
> context) and does some execution-like things with the program, including
> complaining if some function expects type X but gets type Y as input.
> 
> In this case "owning reference" and "non-owning reference" are distinct types
> (owning has nonzero ref_obj_id) and the verifier rejects wrong type for kfunc
> input based on this info alone. "graph-object ownership model" is responsible
> for changing refs of one type to another.
> 
> Regardless, your broader point stands - "verifier type system" isn't commonly
> used to describe this behavior, so I should phrase this better.

Thanks for explaining. That all makes sense, but yeah, might be worth
tinkering with the wording a bit just to avoid future confusion for
others.

> 
> >> +
> >> +However, pointer aliasing poses an issue for the above "nice property".
> >> +Consider the following example:
> >> +
> >> +.. code-block:: c
> > 
> > Same here (newline)
> > 
> >> +        struct node_data *n, *m, *o, *p;
> >> +        n = bpf_obj_new(typeof(*n));     /* 1 */
> >> +
> >> +        bpf_spin_lock(&lock);
> >> +
> >> +        bpf_rbtree_add(&tree, n);        /* 2 */
> >> +        m = bpf_rbtree_first(&tree);     /* 3 */
> >> +
> >> +        o = bpf_rbtree_remove(&tree, n); /* 4 */
> >> +        p = bpf_rbtree_remove(&tree, m); /* 5 */
> >> +
> >> +        bpf_spin_unlock(&lock);
> >> +
> >> +        bpf_obj_drop(o);
> >> +        bpf_obj_drop(p); /* 6 */
> > 
> > Same here (newline)
> > 
> >> +----
> >> +
> >> +Assume tree is empty before this program runs. If we track verifier state
> > 
> > s/Assume tree,/Assume the tree
> > 
> 
> ack
> 
> >> +changes here using numbers in above comments:
> >> +
> >> +  1) n is an owning reference
> >> +  2) n is a non-owning reference, it's been added to the tree
> >> +  3) n and m are non-owning references, they both point to the same node
> >> +  4) o is an owning reference, n and m non-owning, all point to same node
> >> +  5) o and p are owning, n and m non-owning, all point to the same node
> >> +  6) a double-free has occurred, since o and p point to same node and o was
> >> +     free'd in previous statement
> >> +
> >> +States 4 and 5 violate our "nice property", as there are non-owning refs to
> >> +a node which is not in a rbtree. Statement 5 will try to remove a node which
> >> +has already been removed as a result of this violation. State 6 is a dangerous
> >> +double-free.
> >> +
> >> +At a minimum we should prevent state 6 from being possible. If we can't also
> >> +prevent state 5 then we must abandon our "nice property" and check whether a
> >> +node has already been removed at runtime.
> >> +
> >> +We prevent both by generalizing the "invalidate non-owning references" behavior
> >> +of ``bpf_spin_unlock`` and doing similar invalidation after
> >> +``bpf_rbtree_remove``. The logic here being that any graph API kfunc which:
> >> +
> >> +  * takes an arbitrary node argument
> >> +  * removes it from the datastructure
> >> +  * returns an owning reference to the removed node
> >> +
> >> +May result in a state where some other non-owning reference points to the same
> >> +node. So ``remove``-type kfuncs must be considered a non-owning reference
> >> +invalidation point as well.
> > 
> > Could you please also add the new kfunc flags that signal this to
> > Documentation/bpf/kfuncs.rst?
> > 
> 
> ack
> 
> >> diff --git a/Documentation/bpf/other.rst b/Documentation/bpf/other.rst
> >> index 3d61963403b4..7e6b12018802 100644
> >> --- a/Documentation/bpf/other.rst
> >> +++ b/Documentation/bpf/other.rst
> >> @@ -6,4 +6,5 @@ Other
> >>     :maxdepth: 1
> >>  
> >>     ringbuf
> >> -   llvm_reloc
> >> \ No newline at end of file
> >> +   llvm_reloc
> >> +   graph_ds_impl
> >> -- 
> >> 2.30.2
> >>

* Re: [PATCH v2 bpf-next 01/13] bpf: Support multiple arg regs w/ ref_obj_id for kfuncs
  2023-01-17 17:26         ` Dave Marchevsky
  2023-01-17 17:36           ` Alexei Starovoitov
@ 2023-01-20  5:13           ` David Vernet
  1 sibling, 0 replies; 38+ messages in thread
From: David Vernet @ 2023-01-20  5:13 UTC (permalink / raw)
  To: Dave Marchevsky
  Cc: Alexei Starovoitov, Dave Marchevsky, bpf, Alexei Starovoitov,
	Daniel Borkmann, Andrii Nakryiko, Kernel Team,
	Kumar Kartikeya Dwivedi, Tejun Heo

On Tue, Jan 17, 2023 at 12:26:32PM -0500, Dave Marchevsky wrote:
> On 12/29/22 12:00 PM, David Vernet wrote:
> > On Thu, Dec 29, 2022 at 08:50:19AM -0800, Alexei Starovoitov wrote:
> >> On Wed, Dec 28, 2022 at 10:40 PM David Vernet <void@manifault.com> wrote:
> >>>
> >>> On Sat, Dec 17, 2022 at 12:24:54AM -0800, Dave Marchevsky wrote:
> >>>> Currently, kfuncs marked KF_RELEASE indicate that they release some
> >>>> previously-acquired arg. The verifier assumes that such a function will
> >>>> only have one arg reg w/ ref_obj_id set, and that that arg is the one to
> >>>> be released. Multiple kfunc arg regs have ref_obj_id set is considered
> >>>> an invalid state.
> >>>>
> >>>> For helpers, RELEASE is used to tag a particular arg in the function
> >>>> proto, not the function itself. The arg with OBJ_RELEASE type tag is the
> >>>> arg that the helper will release. There can only be one such tagged arg.
> >>>> When verifying arg regs, having multiple helper arg regs w/ ref_obj_id set
> >>>> is also considered an invalid state.
> >>>>
> >>>> Later patches in this series will result in some linked_list helpers
> >>>> marked KF_RELEASE having a valid reason to take two ref_obj_id args.
> >>>> Specifically, bpf_list_push_{front,back} can push a node to a list head
> >>>> which is itself part of a list node. In such a scenario both arguments
> >>>> to these functions would have ref_obj_id > 0, thus would fail
> >>>> verification under current logic.
> >>>>
> >>>> This patch changes kfunc ref_obj_id searching logic to find the last arg
> >>>> reg w/ ref_obj_id and consider that the reg-to-release. This should be
> >>>> backwards-compatible with all current kfuncs as they only expect one
> >>>> such arg reg.
> >>>
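
For readers following along, the commit message's "find the last arg reg w/
ref_obj_id" change boils down to something like the sketch below; this is
illustrative only, not the actual verifier code or data structures:

  /* Previously, a second arg reg with ref_obj_id set was an error.
   * With this change, the last such arg is treated as the reg to release.
   */
  static int find_release_arg(const int *arg_ref_obj_ids, int nargs)
  {
          int i, release_arg = -1;

          for (i = 0; i < nargs; i++) {
                  if (arg_ref_obj_ids[i])
                          release_arg = i; /* last arg w/ ref_obj_id wins */
          }
          return release_arg;
  }
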
> >>> Can't say I'm a huge fan of this proposal :-( While I think it's really
> >>> unfortunate that kfunc flags are not defined per-arg for this exact type
> >>> of reason, adding more flag-specific semantics like this is IMO a step
> >>> in the wrong direction.  It's similar to the existing __sz and __k
> >>> argument-naming semantics that inform the verifier that the arguments
> >>> have special meaning. All of these little additions of special-case
> >>> handling for kfunc flags end up requiring people writing kfuncs (and
> >>> sometimes calling them) to read through the verifier to understand
> >>> what's going on (though I will say that it's nice that __sz and __k are
> >>> properly documented in [0]).
> >>
> >> Before getting to pros/cons of KF_* vs name suffix vs helper style
> >> per-arg description...
> >> It's important to highlight that here we're talking about
> >> linked list and rbtree kfuncs that are not like other kfuncs.
> >> The majority of kfuncs can be added by subsystems like hid-bpf
> >> without touching the verifier.
> > 
> > I hear you and I agree. It wasn't my intention to drag us into a larger
> > discussion about kfuncs vs. helpers, but rather just to point out that I
> > think we have to try hard to avoid adding special-case logic that
> > requires looking into the verifier to understand the semantics. I think
> > we're on the same page about this, based on this and your other
> > response.
> > 
> 
> In another thread you also mentioned that a hypothetical "kfunc writer" persona
> shouldn't have to understand kfunc flags in order to add a simple kfunc, and
> I think your comments here likewise presuppose a "kfunc writer" persona who
> doesn't look at the verifier. Having such a person able to add kfuncs without
> understanding the verifier is a good goal, but it doesn't reflect the current
> reality when the kfunc needs any special semantics.

Agreed that today's reality is that you need to read the verifier to add
kfuncs, but I disagree with the sentiment that this makes it acceptable to
add arguably odd semantics in the interim that move us in the opposite
direction of that goal.
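
For readers without the rest of the series in front of them, the case that
forces two ref_obj_id args in the first place (described in the commit message
quoted above) looks roughly like the sketch below. Type names are made up for
illustration; NULL checks and locking are elided:

  struct inner {
          long data;
          struct bpf_list_node node;
  };

  struct outer {
          struct bpf_list_node node;
          struct bpf_list_head head __contains(inner, node);
  };

  struct outer *o = bpf_obj_new(typeof(*o)); /* ref_obj_id > 0 */
  struct inner *i = bpf_obj_new(typeof(*i)); /* ref_obj_id > 0 */

  /* Both arg regs carry a ref_obj_id: &o->head because o is still owned
   * by the program, &i->node because i hasn't been pushed anywhere yet.
   */
  bpf_list_push_front(&o->head, &i->node);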

> Regardless, I'd expect that anyone adding further new-style Graph
> datastructures, old-style maps, or new datastructures unrelated to either,
> will be closer to "verifier expert" than "random person adding a few kfuncs".

This doesn't affect just graph datastructure kfunc authors, though; it
affects anyone adding a kfunc. It just happens to be needed specifically
for graph data structures. If we really end up needing this, IMO it would
be better to get rid of the KF_ACQUIRE and KF_RELEASE flags and just use
__acq / __rel suffixes to match __k and __sz.
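
To make that concrete, a purely hypothetical sketch of what per-arg suffixes
could look like next to a function-wide flag (bpf_example_release and struct
foo are invented names, not anything in the tree):

  /* Hypothetical per-arg marking, in the spirit of __sz / __k: the suffix
   * would tag the specific argument being released...
   */
  void bpf_example_release(struct foo *obj__rel);

  /* ...versus today's function-level flag, which says "this kfunc releases
   * something" without naming which arg:
   */
  BTF_ID_FLAGS(func, bpf_example_release, KF_RELEASE)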

> 
> >> Here we're paving the way for graph (aka new gen data structs),
> >> and so far not only the kfuncs but also their arg types have to have
> >> special handling inside the verifier.
> >> There is not much yet to generalize and expose as a generic KF_
> >> flag or as a name suffix.
> >> Therefore I think it's more appropriate to implement them
> >> with minimal verifier changes and minimal complexity.
> > 
> > Agreed
> > 
> 
> 'Generalize' was addressed in Patch 2's thread.
> 
> >> There is no 3rd graph algorithm on the horizon after linked list
> >> and rbtree. Instead there is a big todo list for
> >> 'multi owner graph node' and 'bpf_refcount_t'.
> > 
> > In this case, my point in [0] was that the only option for generalizing
> > would be something like KF_GRAPH_INSERT / KF_GRAPH_REMOVE, and that is
> > just not the way forward (which I also said was my opinion when I pointed
> > it out as an option). Let's just special-case these kfuncs. There's
> > already precedent for doing that in the verifier anyways. Minimal
> > complexity, minimal API changes. It's a win-win.
> > 
> > [0]: https://lore.kernel.org/all/Y63GLqZil9l1NzY4@maniforge.lan/
> > 
> 
> There's certainly precedent for adding special-case "kfunc_id == KFUNC_whatever"
> all over the verifier. It's a bad precedent, though, for reasons discussed in
> [0].
> 
> To specifically address your points here, I don't buy the argument that
> special-casing based on func id is "minimal complexity, minimal API changes".
> Re: 'complexity': the logic implementing the complicated semantics will be
> added regardless; it just won't have a name that's easily referenced in docs
> and mailing list discussions.
> 
> Similarly, re: 'API changes': if by 'API' here you mean "API that's exposed
> to folks adding kfuncs" - see my comments about "kfunc writer" persona above.
> We can think of the verifier itself as an API too - with a single bpf_check
> function. That API's behavior is indeed changed here, regardless of whether
> the added semantics are gated by a kfunc flag or special-case checks. I don't
> think that hiding complexity behind special-case checks when there could be
> a named flag simplifies anything. The complexity is added regardless; the
> question is how many breadcrumbs and pointers we want to leave for folks
> trying to make sense of it in the future.
> 
>   [0]: https://lore.kernel.org/bpf/9763aed7-0284-e400-b4dc-ed01718d8e1e@meta.com/

Will reply on that thread.

> 
> >> Those will require bigger changes in the verifier,
> >> so I'd like to avoid premature generalization :) as analogous
> >> to premature optimization :)
> > 
> > And of course given my points above and in other threads: agreed. I
> > think we have an ideal middle-ground for minimizing complexity in the
> > short term, and some nice follow-on todo-list items to work on in the
> > medium-long term which will continue to improve things without
> > (negatively) affecting users in any way. All SGTM


Thread overview: 38+ messages
2022-12-17  8:24 [PATCH v2 bpf-next 00/13] BPF rbtree next-gen datastructure Dave Marchevsky
2022-12-17  8:24 ` [PATCH v2 bpf-next 01/13] bpf: Support multiple arg regs w/ ref_obj_id for kfuncs Dave Marchevsky
2022-12-29  3:24   ` Alexei Starovoitov
2022-12-29  6:40   ` David Vernet
2022-12-29 16:50     ` Alexei Starovoitov
2022-12-29 17:00       ` David Vernet
2023-01-17 17:26         ` Dave Marchevsky
2023-01-17 17:36           ` Alexei Starovoitov
2023-01-17 23:12             ` Dave Marchevsky
2023-01-20  5:13           ` David Vernet
2022-12-17  8:24 ` [PATCH v2 bpf-next 02/13] bpf: Migrate release_on_unlock logic to non-owning ref semantics Dave Marchevsky
2022-12-17  9:21   ` Dave Marchevsky
2022-12-28 23:46   ` David Vernet
2022-12-29 15:39     ` David Vernet
2022-12-29  3:56   ` Alexei Starovoitov
2022-12-29 16:54     ` David Vernet
2023-01-17 16:54       ` Dave Marchevsky
2023-01-17 16:07     ` Dave Marchevsky
2023-01-17 16:56       ` Alexei Starovoitov
2022-12-17  8:24 ` [PATCH v2 bpf-next 03/13] selftests/bpf: Update linked_list tests for " Dave Marchevsky
2022-12-17  8:24 ` [PATCH v2 bpf-next 04/13] bpf: rename list_head -> graph_root in field info types Dave Marchevsky
2022-12-17  8:24 ` [PATCH v2 bpf-next 05/13] bpf: Add basic bpf_rb_{root,node} support Dave Marchevsky
2022-12-17  8:24 ` [PATCH v2 bpf-next 06/13] bpf: Add bpf_rbtree_{add,remove,first} kfuncs Dave Marchevsky
2022-12-17  8:25 ` [PATCH v2 bpf-next 07/13] bpf: Add support for bpf_rb_root and bpf_rb_node in kfunc args Dave Marchevsky
2022-12-29  4:00   ` Alexei Starovoitov
2022-12-17  8:25 ` [PATCH v2 bpf-next 08/13] bpf: Add callback validation to kfunc verifier logic Dave Marchevsky
2022-12-17  8:25 ` [PATCH v2 bpf-next 09/13] bpf: Special verifier handling for bpf_rbtree_{remove, first} Dave Marchevsky
2022-12-29  4:02   ` Alexei Starovoitov
2022-12-17  8:25 ` [PATCH v2 bpf-next 10/13] bpf: Add bpf_rbtree_{add,remove,first} decls to bpf_experimental.h Dave Marchevsky
2022-12-17  8:25 ` [PATCH v2 bpf-next 11/13] libbpf: Make BTF mandatory if program BTF has spin_lock or alloc_obj type Dave Marchevsky
2022-12-22 18:50   ` Andrii Nakryiko
2022-12-17  8:25 ` [PATCH v2 bpf-next 12/13] selftests/bpf: Add rbtree selftests Dave Marchevsky
2022-12-17  8:25 ` [PATCH v2 bpf-next 13/13] bpf, documentation: Add graph documentation for non-owning refs Dave Marchevsky
2022-12-28 21:26   ` David Vernet
2023-01-18  2:16     ` Dave Marchevsky
2023-01-20  4:45       ` David Vernet
2022-12-17 10:23 [PATCH v2 bpf-next 02/13] bpf: Migrate release_on_unlock logic to non-owning ref semantics kernel test robot
2022-12-23 10:51 ` Dan Carpenter
