* [PATCH v4 bpf-next 00/11] BPF rbtree next-gen datastructure
@ 2023-02-09 17:41 Dave Marchevsky
  2023-02-09 17:41 ` [PATCH v4 bpf-next 01/11] bpf: Migrate release_on_unlock logic to non-owning ref semantics Dave Marchevsky
                   ` (10 more replies)
  0 siblings, 11 replies; 27+ messages in thread
From: Dave Marchevsky @ 2023-02-09 17:41 UTC (permalink / raw)
  To: bpf
  Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Kernel Team, Kumar Kartikeya Dwivedi, Tejun Heo, Dave Marchevsky

This series adds an rbtree datastructure following the "next-gen
datastructure" precedent set by the recently-added linked list [0]. This
is a reimplementation of the previous rbtree RFC [1] to use kfunc + kptr
instead of adding a new map type. This series adds a smaller set of API
functions than that RFC - just the minimum needed to support the current
cgfifo example scheduler in the ongoing sched_ext effort [2], namely:

  bpf_rbtree_add
  bpf_rbtree_remove
  bpf_rbtree_first

The meat of this series is bugfixes and verifier infra work to support
these API functions. Adding more rbtree kfuncs in future patches should
be straightforward as a result.

First, the series refactors and extends linked_list's release_on_unlock
logic. The concept of "reference to a node that was added to a data
structure" is formalized as a "non-owning reference". From linked_list's
perspective, the non-owning reference left after
bpf_list_push_{front,back} has the same semantics as release_on_unlock,
with the addition that writes to such references are valid in the
critical section. Such references are no longer marked PTR_UNTRUSTED.
Patches 1 and 11 go into more detail.
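
As a rough sketch of the new semantics (names are illustrative, in the
style of the linked_list selftests; not verbatim test code):

  bpf_spin_lock(&glock);
  bpf_list_push_back(&ghead, &f->node);
  /* f is now a non-owning reference: reads and writes are allowed
   * for the rest of the critical section
   */
  f->data = 42;
  bpf_spin_unlock(&glock);
  /* f is invalidated at unlock; any further use is rejected */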

The series then adds rbtree API kfuncs and necessary verifier support
for them - namely support for callback args to kfuncs and some
non-owning reference interactions that linked_list didn't need.

BPF rbtree uses struct rb_root_cached + the existing rbtree lib under the
hood. From the BPF program writer's perspective, a BPF rbtree is very
similar to the existing linked list. Consider the following example:

  struct node_data {
    long key;
    long data;
    struct bpf_rb_node node;
  };

  static bool less(struct bpf_rb_node *a, const struct bpf_rb_node *b)
  {
    struct node_data *node_a;
    struct node_data *node_b;

    node_a = container_of(a, struct node_data, node);
    node_b = container_of(b, struct node_data, node);

    return node_a->key < node_b->key;
  }

  private(A) struct bpf_spin_lock glock;
  private(A) struct bpf_rb_root groot __contains(node_data, node);

  /* ... in BPF program */
  struct node_data *n, *m;
  struct bpf_rb_node *res;

  n = bpf_obj_new(typeof(*n));
  if (!n)
    /* skip */
  n->key = 5;
  n->data = 10;

  bpf_spin_lock(&glock);
  bpf_rbtree_add(&groot, &n->node, less);
  bpf_spin_unlock(&glock);

  bpf_spin_lock(&glock);
  res = bpf_rbtree_first(&groot);
  if (!res)
    /* skip */
  res = bpf_rbtree_remove(&groot, res);
  if (!res)
    /* skip */
  bpf_spin_unlock(&glock);

  m = container_of(res, struct node_data, node);
  bpf_obj_drop(m);

Some obvious similarities:

  * Special bpf_rb_root and bpf_rb_node types have the same semantics
    as bpf_list_head and bpf_list_node, respectively
  * __contains is used to associate a node type with its root
  * The spin_lock associated with an rbtree must be held when using
    rbtree API kfuncs
  * Nodes are allocated via bpf_obj_new and dropped via bpf_obj_drop
  * Rbtree takes ownership of node lifetime when a node is added.
    Removing a node gives ownership back to the program, requiring a
    bpf_obj_drop before program exit

Some new additions as well:

  * Support for callbacks in kfunc args is added to enable 'less'
    callback use above
  * bpf_rbtree_first is the first graph API function to return a
    non-owning reference instead of converting an arg from own->non-own
  * Because all references to nodes already added to the rbtree are
    non-owning, bpf_rbtree_remove must accept such a reference in order
    to remove the node from the tree (see the sketch below)
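
To illustrate the last two points, a rough sketch reusing the types and
globals from the example above (mirroring behavior exercised by the
selftests, not new API surface):

  struct bpf_rb_node *a, *b;

  bpf_spin_lock(&glock);
  a = bpf_rbtree_first(&groot);  /* non-owning reference */
  b = bpf_rbtree_first(&groot);  /* another non-owning reference */
  if (!a || !b)
    /* skip */
  a = bpf_rbtree_remove(&groot, a);  /* accepts a non-owning ref,
                                      * returns an owning ref */
  /* b was invalidated by the remove; using it here would be rejected */
  bpf_spin_unlock(&glock);

  if (a)
    bpf_obj_drop(container_of(a, struct node_data, node));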

Summary of patches:
  Patch 1 does the release_on_unlock -> non-owning ref migration

  Patch 2 improves bpf_reg_state space usage for the non-owning ref lock

  Patch 3 updates linked_list tests for non-owning ref semantics

  Patches 4 - 8 implement the meat of rbtree support in this series,
  gradually building up to implemented kfuncs that verify as expected.

  Patch 9 adds bpf_rbtree_{add,remove,first} decls to bpf_experimental.h.

  Patch 10 adds tests, Patch 11 adds documentation.

  [0]: lore.kernel.org/bpf/20221118015614.2013203-1-memxor@gmail.com
  [1]: lore.kernel.org/bpf/20220830172759.4069786-1-davemarchevsky@fb.com
  [2]: lore.kernel.org/bpf/20221130082313.3241517-1-tj@kernel.org

Changelog:

v3 -> v4: lore.kernel.org/bpf/20230131180016.3368305-1-davemarchevsky@fb.com/

Patch #'s below refer to the patch's number in v3 unless otherwise stated.

* General
  * Don't base this series on "bpf: Refactor release_regno searching logic",
    which was submitted separately as a refactor.
  * Rebase onto latest bpf-next: "samples/bpf: Add openat2() enter/exit tracepoint to syscall_tp sample"

* Patch 2 - "bpf: Improve bpf_reg_state space usage for non-owning ref lock"
  * print_verifier_state change was adding redundant comma after "non_own_ref",
    fix it to put comma in correct place
  * invalidate_non_owning_refs no longer needs to take bpf_active_lock param,
    since any non-owning ref reg in env's cur_state is assumed to use that
    state's active_lock (Alexei)
  * invalidate_non_owning_refs' reg loop should check that the reg being
    inspected is a PTR_TO_BTF_ID before checking reg->non_owning_ref_lock,
    since that field is part of a union and may be filled w/ meaningless bytes
    if reg != PTR_TO_BTF_ID (Alexei)

* Patch 3 - "selftests/bpf: Update linked_list tests for non-owning ref semantics"
  * Change the string searched for by the following tests:
    * linked_list/incorrect_node_off1
    * linked_list/double_push_front
    * linked_list/double_push_back

    necessary due to rebase / dropping of "release_regno searching logic" patch
    (see "General" changes)

* Patch 8 - "bpf: Special verifier handling for bpf_rbtree_{remove, first}"
  * Just call invalidate_non_owning_refs w/ env instead of env, lock. (see
    Patch 2 changes)

* Patch 11 - "bpf, documentation: Add graph documentation for non-owning refs"
  * Fix documentation formatting and improve content (David)
  * v3's version of patch 11 was missing some changes, v4's patch 11 is still
    addressing David's feedback from v2


v2 -> v3: lore.kernel.org/bpf/20221217082506.1570898-1-davemarchevsky@fb.com/

Patch #'s below refer to the patch's number in v2 unless otherwise stated.

* Patch 1 - "bpf: Support multiple arg regs w/ ref_obj_id for kfuncs"
  * No longer needed as v3 doesn't have multiple ref_obj_id arg regs
  * The refactoring pieces were submitted separately
    (https://lore.kernel.org/bpf/20230121002417.1684602-1-davemarchevsky@fb.com/)

* Patch 2 - "bpf: Migrate release_on_unlock logic to non-owning ref semantics"
  * Remove KF_RELEASE_NON_OWN flag from list API push methods, just match
    against specific kfuncs for now (Alexei, David)
  * Separate "release non owning reference" logic from KF_RELEASE logic
    (Alexei, David)
  * reg_find_field_offset now correctly tests 'rec' instead of 'reg' after
    calling reg_btf_record (Dan Carpenter)

* New patch added after Patch 2 - "bpf: Improve bpf_reg_state space usage for non-owning ref lock"
  * Eliminates extra bpf_reg_state memory usage by using a bool instead of
    copying lock identity

* Patch 4 - "bpf: rename list_head -> graph_root in field info types"
  * v2's version was applied to bpf-next, not including in respins

* Patch 6 - "bpf: Add bpf_rbtree_{add,remove,first} kfuncs"
  * Remove KF_RELEASE_NON_OWN flag from rbtree_add, just add it to specific
    kfunc matching (Alexei, David)

* Patch 9 - "bpf: Special verifier handling for bpf_rbtree_{remove, first}"
  * Remove KF_INVALIDATE_NON_OWN kfunc flag, just match against specific kfunc
    for now (Alexei, David)

* Patch 11 - "libbpf: Make BTF mandatory if program BTF has spin_lock or alloc_obj type"
  * Drop for now, will submit separately

* Patch 12 - "selftests/bpf: Add rbtree selftests"
  * Some expected-failure tests have different error messages due to "release
    non-owning reference logic" being separated from KF_RELEASE logic in Patch
    2 changes

* Patch 13 - "bpf, documentation: Add graph documentation for non-owning refs"
  * Fix documentation formatting and improve content (David)


v1 -> v2: lore.kernel.org/bpf/20221206231000.3180914-1-davemarchevsky@fb.com/

Series-wide changes:
  * Rename datastructure_{head,node,api} -> graph_{root,node,api} (Alexei)
  * "graph datastructure" in patch summaries to refer to linked_list + rbtree
    instead of "next-gen datastructure" (Alexei)
  * Move from hacky marking of non-owning references as PTR_UNTRUSTED to
    cleaner implementation (Alexei)
  * Add invalidation of non-owning refs to rbtree_remove (Kumar, Alexei)

Patch #'s below refer to the patch's number in v1 unless otherwise stated.

Note that in v1 most of the meaty verifier changes were in the latter half
of the series. Here, about half of that complexity has been moved to
"bpf: Migrate release_on_unlock logic to non-owning ref semantics", which
was Patch 3 in v1.

* Patch 1 - "bpf: Loosen alloc obj test in verifier's reg_btf_record"
  * Was applied, dropped from further iterations

* Patch 2 - "bpf: map_check_btf should fail if btf_parse_fields fails"
  * Dropped in favor of verifier check-on-use: when some normal verifier
    checking expects the map to have btf_fields correctly parsed, it won't
    find any and verification will fail

* New patch added before Patch 3 - "bpf: Support multiple arg regs w/ ref_obj_id for kfuncs"
  * Addition of KF_RELEASE_NON_OWN flag, which requires KF_RELEASE, and tagging
    of bpf_list_push_{front,back} KF_RELEASE | KF_RELEASE_NON_OWN, means that
    list-in-list push_{front,back} will trigger "only one ref_obj_id arg reg"
    logic. This is because "head" arg to those functions can be a list-in-list,
    which itself can be an owning reference with ref_obj_id. So need to
    support multiple ref_obj_id for release kfuncs.

* Patch 3 - "bpf: Minor refactor of ref_set_release_on_unlock"
  * Now a major refactor w/ a rename to reflect this
    * "bpf: Migrate release_on_unlock logic to non-owning ref semantics"
  * Replaces release_on_unlock with active_lock logic as discussed in v1

* New patch added after Patch 3 - "selftests/bpf: Update linked_list tests for non_owning_ref logic"
  * Removes "write after push" linked_list failure tests - no longer failure
    scenarios.

* Patch 4 - "bpf: rename list_head -> datastructure_head in field info types"
  * rename to graph_root instead. Similar renamings across the series - see
    series-wide changes.

* Patch 5 - "bpf: Add basic bpf_rb_{root,node} support"
  * OWNER_FIELD_MASK -> GRAPH_ROOT_MASK, OWNEE_FIELD_MASK -> GRAPH_NODE_MASK,
    and change of "owner"/"ownee" in big btf_check_and_fixup_fields comment to
    "root"/"node" (Alexei)

* Patch 6 - "bpf: Add bpf_rbtree_{add,remove,first} kfuncs"
  * bpf_rbtree_remove can no longer return NULL. v2 continues v1's "use type
    system to prevent remove of node that isn't in a datastructure" approach,
    so rbtree_remove should never have been able to return NULL

* Patch 7 - "bpf: Add support for bpf_rb_root and bpf_rb_node in kfunc args"
  * is_bpf_datastructure_api_kfunc -> is_bpf_graph_api_kfunc (Alexei)

* Patch 8 - "bpf: Add callback validation to kfunc verifier logic"
  * Explicitly disallow rbtree_remove in rbtree callback
  * Explicitly disallow bpf_spin_{lock,unlock} call in rbtree callback,
    preventing possibility of "unbalanced" unlock (Alexei)

* Patch 10 - "bpf, x86: BPF_PROBE_MEM handling for insn->off < 0"
  * Now that non-owning refs aren't marked PTR_UNTRUSTED it's not necessary to
    include this patch as part of the series
  * After conversation w/ Alexei, did another pass and submitted as an
    independent series (lore.kernel.org/bpf/20221213182726.325137-1-davemarchevsky@fb.com/)

* Patch 13 - "selftests/bpf: Add rbtree selftests"
  * Since bpf_rbtree_remove can no longer return null, remove null checks
  * Remove test confirming that rbtree_first isn't allowed in callback. We want
    this to be possible
  * Add failure test confirming that rbtree_remove's new non-owning reference
    invalidation behavior behaves as expected
  * Add SEC("license") to rbtree_btf_fail__* progs. They were previously
    failing due to lack of this section. Now they're failing for correct
    reasons.
  * rbtree_btf_fail__add_wrong_type.c - add locking around rbtree_add, rename
    the bpf prog to something reasonable

* New patch added after patch 13 - "bpf, documentation: Add graph documentation for non-owning refs"
  * Summarizes details of owning and non-owning refs which we hashed out in
    v1

Dave Marchevsky (11):
  bpf: Migrate release_on_unlock logic to non-owning ref semantics
  bpf: Improve bpf_reg_state space usage for non-owning ref lock
  selftests/bpf: Update linked_list tests for non-owning ref semantics
  bpf: Add basic bpf_rb_{root,node} support
  bpf: Add bpf_rbtree_{add,remove,first} kfuncs
  bpf: Add support for bpf_rb_root and bpf_rb_node in kfunc args
  bpf: Add callback validation to kfunc verifier logic
  bpf: Special verifier handling for bpf_rbtree_{remove, first}
  bpf: Add bpf_rbtree_{add,remove,first} decls to bpf_experimental.h
  selftests/bpf: Add rbtree selftests
  bpf, documentation: Add graph documentation for non-owning refs

 Documentation/bpf/graph_ds_impl.rst           | 266 ++++++++
 Documentation/bpf/other.rst                   |   3 +-
 include/linux/bpf.h                           |  19 +
 include/linux/bpf_verifier.h                  |  39 +-
 include/uapi/linux/bpf.h                      |  11 +
 kernel/bpf/btf.c                              | 162 +++--
 kernel/bpf/helpers.c                          |  68 +++
 kernel/bpf/syscall.c                          |  28 +-
 kernel/bpf/verifier.c                         | 575 +++++++++++++++---
 tools/include/uapi/linux/bpf.h                |  11 +
 .../testing/selftests/bpf/bpf_experimental.h  |  24 +
 .../selftests/bpf/prog_tests/linked_list.c    |  18 +-
 .../testing/selftests/bpf/prog_tests/rbtree.c | 184 ++++++
 .../testing/selftests/bpf/progs/linked_list.c |   2 +-
 .../selftests/bpf/progs/linked_list_fail.c    | 100 +--
 tools/testing/selftests/bpf/progs/rbtree.c    | 176 ++++++
 .../progs/rbtree_btf_fail__add_wrong_type.c   |  52 ++
 .../progs/rbtree_btf_fail__wrong_node_type.c  |  49 ++
 .../testing/selftests/bpf/progs/rbtree_fail.c | 296 +++++++++
 19 files changed, 1854 insertions(+), 229 deletions(-)
 create mode 100644 Documentation/bpf/graph_ds_impl.rst
 create mode 100644 tools/testing/selftests/bpf/prog_tests/rbtree.c
 create mode 100644 tools/testing/selftests/bpf/progs/rbtree.c
 create mode 100644 tools/testing/selftests/bpf/progs/rbtree_btf_fail__add_wrong_type.c
 create mode 100644 tools/testing/selftests/bpf/progs/rbtree_btf_fail__wrong_node_type.c
 create mode 100644 tools/testing/selftests/bpf/progs/rbtree_fail.c

-- 
2.30.2


* [PATCH v4 bpf-next 01/11] bpf: Migrate release_on_unlock logic to non-owning ref semantics
  2023-02-09 17:41 [PATCH v4 bpf-next 00/11] BPF rbtree next-gen datastructure Dave Marchevsky
@ 2023-02-09 17:41 ` Dave Marchevsky
  2023-02-10 13:24   ` Kumar Kartikeya Dwivedi
  2023-02-09 17:41 ` [PATCH v4 bpf-next 02/11] bpf: Improve bpf_reg_state space usage for non-owning ref lock Dave Marchevsky
                   ` (9 subsequent siblings)
  10 siblings, 1 reply; 27+ messages in thread
From: Dave Marchevsky @ 2023-02-09 17:41 UTC (permalink / raw)
  To: bpf
  Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Kernel Team, Kumar Kartikeya Dwivedi, Tejun Heo, Dave Marchevsky

This patch introduces non-owning reference semantics to the verifier,
specifically linked_list API kfunc handling. release_on_unlock logic for
refs is refactored - with small functional changes - to implement these
semantics, and bpf_list_push_{front,back} are migrated to use them.

When a list node is pushed to a list, the program still has a pointer to
the node:

  n = bpf_obj_new(typeof(*n));

  bpf_spin_lock(&l);
  bpf_list_push_back(&l, n);
  /* n still points to the just-added node */
  bpf_spin_unlock(&l);

What the verifier considers n to be after the push, and thus what can be
done with n, are changed by this patch.

Common properties both before/after this patch:
  * After push, n is only a valid reference to the node until end of
    critical section
  * After push, n cannot be pushed to any list
  * After push, the program can read the node's fields using n

Before:
  * After push, n retains the ref_obj_id which it received on
    bpf_obj_new, but the associated bpf_reference_state's
    release_on_unlock field is set to true
    * The release_on_unlock field and associated logic are used to
      implement "n is only a valid ref until end of critical section"
  * After push, n cannot be written to; the node must be removed from
    the list before writing to its fields
  * After push, n is marked PTR_UNTRUSTED

After:
  * After push, n's ref is released and ref_obj_id set to 0. The
    bpf_reg_state's non_owning_ref_lock struct is populated with the
    currently active lock
    * The non_owning_ref_lock field and associated logic are used to
      implement "n is only a valid ref until end of critical section"
  * n can be written to (except for special fields e.g. bpf_list_node,
    timer, ...)
  * No special type flag is added to n after push
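
A rough sketch of the net effect, in terms of the example above (the
quoted error string is the one the linked_list selftests match against;
this is not verbatim test code):

  bpf_spin_lock(&l);
  bpf_list_push_back(&l, n);
  n->data = 42;   /* Before: rejected ("only read is supported")
                   * After:  allowed                              */
  bpf_spin_unlock(&l);
  n->data = 7;    /* Before and after: rejected, n is invalid once
                   * the critical section ends                    */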

Summary of specific implementation changes to achieve the above:

  * release_on_unlock field, ref_set_release_on_unlock helper, and logic
    to "release on unlock" based on that field are removed

  * The anonymous active_lock struct used by bpf_verifier_state is
    pulled out into a named struct bpf_active_lock.

  * A non_owning_ref_lock field of type bpf_active_lock is added to
    bpf_reg_state's PTR_TO_BTF_ID union

  * Helpers are added to use non_owning_ref_lock to implement non-owning
    ref semantics as described above
    * invalidate_non_owning_refs - helper to clobber all non-owning refs
      matching a particular bpf_active_lock identity. Replaces
      release_on_unlock logic in process_spin_lock.
    * ref_set_non_owning_lock - set non_owning_ref_lock for a reg based
      on current verifier state
    * ref_convert_owning_non_owning - convert an owning reference w/
      specified ref_obj_id to non-owning references. Set up
      non_owning_ref_lock for each reg with that ref_obj_id and 0 out
      its ref_obj_id

After these changes, linked_list's "release on unlock" logic continues
to function as before, except for the semantic differences noted above.
The patch immediately following this one makes minor changes to
linked_list selftests to account for the differing behavior.

Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
---
 include/linux/bpf.h          |   1 +
 include/linux/bpf_verifier.h |  39 ++++-----
 kernel/bpf/verifier.c        | 164 +++++++++++++++++++++++++----------
 3 files changed, 136 insertions(+), 68 deletions(-)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 35c18a98c21a..9a79ebe1774c 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -180,6 +180,7 @@ enum btf_field_type {
 	BPF_KPTR       = BPF_KPTR_UNREF | BPF_KPTR_REF,
 	BPF_LIST_HEAD  = (1 << 4),
 	BPF_LIST_NODE  = (1 << 5),
+	BPF_GRAPH_NODE_OR_ROOT = BPF_LIST_NODE | BPF_LIST_HEAD,
 };
 
 struct btf_field_kptr {
diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h
index aa83de1fe755..7b5fbb66446c 100644
--- a/include/linux/bpf_verifier.h
+++ b/include/linux/bpf_verifier.h
@@ -43,6 +43,22 @@ enum bpf_reg_liveness {
 	REG_LIVE_DONE = 0x8, /* liveness won't be updating this register anymore */
 };
 
+/* For every reg representing a map value or allocated object pointer,
+ * we consider the tuple of (ptr, id) for them to be unique in verifier
+ * context and conside them to not alias each other for the purposes of
+ * tracking lock state.
+ */
+struct bpf_active_lock {
+	/* This can either be reg->map_ptr or reg->btf. If ptr is NULL,
+	 * there's no active lock held, and other fields have no
+	 * meaning. If non-NULL, it indicates that a lock is held and
+	 * id member has the reg->id of the register which can be >= 0.
+	 */
+	void *ptr;
+	/* This will be reg->id */
+	u32 id;
+};
+
 struct bpf_reg_state {
 	/* Ordering of fields matters.  See states_equal() */
 	enum bpf_reg_type type;
@@ -68,6 +84,7 @@ struct bpf_reg_state {
 		struct {
 			struct btf *btf;
 			u32 btf_id;
+			struct bpf_active_lock non_owning_ref_lock;
 		};
 
 		struct { /* for PTR_TO_MEM | PTR_TO_MEM_OR_NULL */
@@ -226,11 +243,6 @@ struct bpf_reference_state {
 	 * exiting a callback function.
 	 */
 	int callback_ref;
-	/* Mark the reference state to release the registers sharing the same id
-	 * on bpf_spin_unlock (for nodes that we will lose ownership to but are
-	 * safe to access inside the critical section).
-	 */
-	bool release_on_unlock;
 };
 
 /* state of the program:
@@ -331,21 +343,8 @@ struct bpf_verifier_state {
 	u32 branches;
 	u32 insn_idx;
 	u32 curframe;
-	/* For every reg representing a map value or allocated object pointer,
-	 * we consider the tuple of (ptr, id) for them to be unique in verifier
-	 * context and conside them to not alias each other for the purposes of
-	 * tracking lock state.
-	 */
-	struct {
-		/* This can either be reg->map_ptr or reg->btf. If ptr is NULL,
-		 * there's no active lock held, and other fields have no
-		 * meaning. If non-NULL, it indicates that a lock is held and
-		 * id member has the reg->id of the register which can be >= 0.
-		 */
-		void *ptr;
-		/* This will be reg->id */
-		u32 id;
-	} active_lock;
+
+	struct bpf_active_lock active_lock;
 	bool speculative;
 	bool active_rcu_lock;
 
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 4cc0e70ee71e..f693cc97c574 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -190,6 +190,10 @@ struct bpf_verifier_stack_elem {
 
 static int acquire_reference_state(struct bpf_verifier_env *env, int insn_idx);
 static int release_reference(struct bpf_verifier_env *env, int ref_obj_id);
+static void invalidate_non_owning_refs(struct bpf_verifier_env *env,
+				       struct bpf_active_lock *lock);
+static int ref_set_non_owning_lock(struct bpf_verifier_env *env,
+				   struct bpf_reg_state *reg);
 
 static bool bpf_map_ptr_poisoned(const struct bpf_insn_aux_data *aux)
 {
@@ -1073,6 +1077,9 @@ static void print_verifier_state(struct bpf_verifier_env *env,
 				verbose_a("id=%d", reg->id);
 			if (reg->ref_obj_id)
 				verbose_a("ref_obj_id=%d", reg->ref_obj_id);
+			if (reg->non_owning_ref_lock.ptr)
+				verbose_a("non_own_id=(%p,%d)", reg->non_owning_ref_lock.ptr,
+					  reg->non_owning_ref_lock.id);
 			if (t != SCALAR_VALUE)
 				verbose_a("off=%d", reg->off);
 			if (type_is_pkt_pointer(t))
@@ -5041,7 +5048,8 @@ static int check_ptr_to_btf_access(struct bpf_verifier_env *env,
 			return -EACCES;
 		}
 
-		if (type_is_alloc(reg->type) && !reg->ref_obj_id) {
+		if (type_is_alloc(reg->type) && !reg->ref_obj_id &&
+		    !reg->non_owning_ref_lock.ptr) {
 			verbose(env, "verifier internal error: ref_obj_id for allocated object must be non-zero\n");
 			return -EFAULT;
 		}
@@ -6031,9 +6039,7 @@ static int process_spin_lock(struct bpf_verifier_env *env, int regno,
 			cur->active_lock.ptr = btf;
 		cur->active_lock.id = reg->id;
 	} else {
-		struct bpf_func_state *fstate = cur_func(env);
 		void *ptr;
-		int i;
 
 		if (map)
 			ptr = map;
@@ -6049,25 +6055,11 @@ static int process_spin_lock(struct bpf_verifier_env *env, int regno,
 			verbose(env, "bpf_spin_unlock of different lock\n");
 			return -EINVAL;
 		}
-		cur->active_lock.ptr = NULL;
-		cur->active_lock.id = 0;
 
-		for (i = fstate->acquired_refs - 1; i >= 0; i--) {
-			int err;
+		invalidate_non_owning_refs(env, &cur->active_lock);
 
-			/* Complain on error because this reference state cannot
-			 * be freed before this point, as bpf_spin_lock critical
-			 * section does not allow functions that release the
-			 * allocated object immediately.
-			 */
-			if (!fstate->refs[i].release_on_unlock)
-				continue;
-			err = release_reference(env, fstate->refs[i].id);
-			if (err) {
-				verbose(env, "failed to release release_on_unlock reference");
-				return err;
-			}
-		}
+		cur->active_lock.ptr = NULL;
+		cur->active_lock.id = 0;
 	}
 	return 0;
 }
@@ -6535,6 +6527,23 @@ static int check_reg_type(struct bpf_verifier_env *env, u32 regno,
 	return 0;
 }
 
+static struct btf_field *
+reg_find_field_offset(const struct bpf_reg_state *reg, s32 off, u32 fields)
+{
+	struct btf_field *field;
+	struct btf_record *rec;
+
+	rec = reg_btf_record(reg);
+	if (!rec)
+		return NULL;
+
+	field = btf_record_find(rec, off, fields);
+	if (!field)
+		return NULL;
+
+	return field;
+}
+
 int check_func_arg_reg_off(struct bpf_verifier_env *env,
 			   const struct bpf_reg_state *reg, int regno,
 			   enum bpf_arg_type arg_type)
@@ -6556,6 +6565,18 @@ int check_func_arg_reg_off(struct bpf_verifier_env *env,
 		 */
 		if (arg_type_is_dynptr(arg_type) && type == PTR_TO_STACK)
 			return 0;
+
+		if (type == (PTR_TO_BTF_ID | MEM_ALLOC) && reg->off) {
+			if (reg_find_field_offset(reg, reg->off, BPF_GRAPH_NODE_OR_ROOT))
+				return __check_ptr_off_reg(env, reg, regno, true);
+
+			verbose(env, "R%d must have zero offset when passed to release func\n",
+				regno);
+			verbose(env, "No graph node or root found at R%d type:%s off:%d\n", regno,
+				kernel_type_name(reg->btf, reg->btf_id), reg->off);
+			return -EINVAL;
+		}
+
 		/* Doing check_ptr_off_reg check for the offset will catch this
 		 * because fixed_off_ok is false, but checking here allows us
 		 * to give the user a better error message.
@@ -7352,6 +7373,20 @@ static int release_reference(struct bpf_verifier_env *env,
 	return 0;
 }
 
+static void invalidate_non_owning_refs(struct bpf_verifier_env *env,
+				       struct bpf_active_lock *lock)
+{
+	struct bpf_func_state *unused;
+	struct bpf_reg_state *reg;
+
+	bpf_for_each_reg_in_vstate(env->cur_state, unused, reg, ({
+		if (reg->non_owning_ref_lock.ptr &&
+		    reg->non_owning_ref_lock.ptr == lock->ptr &&
+		    reg->non_owning_ref_lock.id == lock->id)
+			__mark_reg_unknown(env, reg);
+	}));
+}
+
 static void clear_caller_saved_regs(struct bpf_verifier_env *env,
 				    struct bpf_reg_state *regs)
 {
@@ -8904,38 +8939,55 @@ static int process_kf_arg_ptr_to_kptr(struct bpf_verifier_env *env,
 	return 0;
 }
 
-static int ref_set_release_on_unlock(struct bpf_verifier_env *env, u32 ref_obj_id)
+static int ref_set_non_owning_lock(struct bpf_verifier_env *env, struct bpf_reg_state *reg)
 {
-	struct bpf_func_state *state = cur_func(env);
+	struct bpf_verifier_state *state = env->cur_state;
+
+	if (!state->active_lock.ptr) {
+		verbose(env, "verifier internal error: ref_set_non_owning_lock w/o active lock\n");
+		return -EFAULT;
+	}
+
+	if (reg->non_owning_ref_lock.ptr) {
+		verbose(env, "verifier internal error: non_owning_ref_lock already set\n");
+		return -EFAULT;
+	}
+
+	reg->non_owning_ref_lock.id = state->active_lock.id;
+	reg->non_owning_ref_lock.ptr = state->active_lock.ptr;
+	return 0;
+}
+
+static int ref_convert_owning_non_owning(struct bpf_verifier_env *env, u32 ref_obj_id)
+{
+	struct bpf_func_state *state, *unused;
 	struct bpf_reg_state *reg;
 	int i;
 
-	/* bpf_spin_lock only allows calling list_push and list_pop, no BPF
-	 * subprogs, no global functions. This means that the references would
-	 * not be released inside the critical section but they may be added to
-	 * the reference state, and the acquired_refs are never copied out for a
-	 * different frame as BPF to BPF calls don't work in bpf_spin_lock
-	 * critical sections.
-	 */
+	state = cur_func(env);
+
 	if (!ref_obj_id) {
-		verbose(env, "verifier internal error: ref_obj_id is zero for release_on_unlock\n");
+		verbose(env, "verifier internal error: ref_obj_id is zero for "
+			     "owning -> non-owning conversion\n");
 		return -EFAULT;
 	}
+
 	for (i = 0; i < state->acquired_refs; i++) {
-		if (state->refs[i].id == ref_obj_id) {
-			if (state->refs[i].release_on_unlock) {
-				verbose(env, "verifier internal error: expected false release_on_unlock");
-				return -EFAULT;
+		if (state->refs[i].id != ref_obj_id)
+			continue;
+
+		/* Clear ref_obj_id here so release_reference doesn't clobber
+		 * the whole reg
+		 */
+		bpf_for_each_reg_in_vstate(env->cur_state, unused, reg, ({
+			if (reg->ref_obj_id == ref_obj_id) {
+				reg->ref_obj_id = 0;
+				ref_set_non_owning_lock(env, reg);
 			}
-			state->refs[i].release_on_unlock = true;
-			/* Now mark everyone sharing same ref_obj_id as untrusted */
-			bpf_for_each_reg_in_vstate(env->cur_state, state, reg, ({
-				if (reg->ref_obj_id == ref_obj_id)
-					reg->type |= PTR_UNTRUSTED;
-			}));
-			return 0;
-		}
+		}));
+		return 0;
 	}
+
 	verbose(env, "verifier internal error: ref state missing for ref_obj_id\n");
 	return -EFAULT;
 }
@@ -9070,7 +9122,6 @@ static int process_kf_arg_ptr_to_list_node(struct bpf_verifier_env *env,
 {
 	const struct btf_type *et, *t;
 	struct btf_field *field;
-	struct btf_record *rec;
 	u32 list_node_off;
 
 	if (meta->btf != btf_vmlinux ||
@@ -9087,9 +9138,8 @@ static int process_kf_arg_ptr_to_list_node(struct bpf_verifier_env *env,
 		return -EINVAL;
 	}
 
-	rec = reg_btf_record(reg);
 	list_node_off = reg->off + reg->var_off.value;
-	field = btf_record_find(rec, list_node_off, BPF_LIST_NODE);
+	field = reg_find_field_offset(reg, list_node_off, BPF_LIST_NODE);
 	if (!field || field->offset != list_node_off) {
 		verbose(env, "bpf_list_node not found at offset=%u\n", list_node_off);
 		return -EINVAL;
@@ -9115,8 +9165,8 @@ static int process_kf_arg_ptr_to_list_node(struct bpf_verifier_env *env,
 			btf_name_by_offset(field->graph_root.btf, et->name_off));
 		return -EINVAL;
 	}
-	/* Set arg#1 for expiration after unlock */
-	return ref_set_release_on_unlock(env, reg->ref_obj_id);
+
+	return 0;
 }
 
 static int check_kfunc_args(struct bpf_verifier_env *env, struct bpf_kfunc_call_arg_meta *meta)
@@ -9395,11 +9445,11 @@ static int check_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
 			    int *insn_idx_p)
 {
 	const struct btf_type *t, *func, *func_proto, *ptr_type;
+	u32 i, nargs, func_id, ptr_type_id, release_ref_obj_id;
 	struct bpf_reg_state *regs = cur_regs(env);
 	const char *func_name, *ptr_type_name;
 	bool sleepable, rcu_lock, rcu_unlock;
 	struct bpf_kfunc_call_arg_meta meta;
-	u32 i, nargs, func_id, ptr_type_id;
 	int err, insn_idx = *insn_idx_p;
 	const struct btf_param *args;
 	const struct btf_type *ret_t;
@@ -9494,6 +9544,24 @@ static int check_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
 		}
 	}
 
+	if (meta.func_id == special_kfunc_list[KF_bpf_list_push_front] ||
+	    meta.func_id == special_kfunc_list[KF_bpf_list_push_back]) {
+		release_ref_obj_id = regs[BPF_REG_2].ref_obj_id;
+		err = ref_convert_owning_non_owning(env, release_ref_obj_id);
+		if (err) {
+			verbose(env, "kfunc %s#%d conversion of owning ref to non-owning failed\n",
+				func_name, func_id);
+			return err;
+		}
+
+		err = release_reference(env, release_ref_obj_id);
+		if (err) {
+			verbose(env, "kfunc %s#%d reference has not been acquired before\n",
+				func_name, func_id);
+			return err;
+		}
+	}
+
 	for (i = 0; i < CALLER_SAVED_REGS; i++)
 		mark_reg_not_init(env, regs, caller_saved[i]);
 
-- 
2.30.2



* [PATCH v4 bpf-next 02/11] bpf: Improve bpf_reg_state space usage for non-owning ref lock
  2023-02-09 17:41 [PATCH v4 bpf-next 00/11] BPF rbtree next-gen datastructure Dave Marchevsky
  2023-02-09 17:41 ` [PATCH v4 bpf-next 01/11] bpf: Migrate release_on_unlock logic to non-owning ref semantics Dave Marchevsky
@ 2023-02-09 17:41 ` Dave Marchevsky
  2023-02-09 17:41 ` [PATCH v4 bpf-next 03/11] selftests/bpf: Update linked_list tests for non-owning ref semantics Dave Marchevsky
                   ` (8 subsequent siblings)
  10 siblings, 0 replies; 27+ messages in thread
From: Dave Marchevsky @ 2023-02-09 17:41 UTC (permalink / raw)
  To: bpf
  Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Kernel Team, Kumar Kartikeya Dwivedi, Tejun Heo, Dave Marchevsky

This patch eliminates the extra bpf_reg_state memory usage added by the
previous patch, which keeps a copy of the lock identity in reg state for
non-owning refs.

Instead of copying the lock identity around, this patch changes the
non_owning_ref_lock field to a bool, taking advantage of the
following:

  * There can currently only be one active lock at a time
  * non-owning refs are only valid in the critical section

So if a verifier_state has an active_lock, any non-owning ref must've
been obtained under that lock, and any non-owning ref not obtained under
that lock must have been invalidated previously. Therefore if a
non-owning ref is associated with a lock, it's the active_lock of the
current state. So we can keep a bool "are we associated with active_lock
of current state" instead of copying lock identity around.
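
In other words, the following invariant holds by construction (a sketch
of the reasoning, not verifier code):

  /* reg->non_owning_ref_lock == true implies:
   *   - cur_state->active_lock is currently held, and
   *   - reg was obtained under that same lock,
   * because bpf_spin_unlock invalidates every reg with the bool set
   * before clearing active_lock.
   */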

Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
---
 include/linux/bpf_verifier.h |  2 +-
 kernel/bpf/verifier.c        | 25 ++++++++++---------------
 2 files changed, 11 insertions(+), 16 deletions(-)

diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h
index 7b5fbb66446c..d25446dd0413 100644
--- a/include/linux/bpf_verifier.h
+++ b/include/linux/bpf_verifier.h
@@ -84,7 +84,7 @@ struct bpf_reg_state {
 		struct {
 			struct btf *btf;
 			u32 btf_id;
-			struct bpf_active_lock non_owning_ref_lock;
+			bool non_owning_ref_lock;
 		};
 
 		struct { /* for PTR_TO_MEM | PTR_TO_MEM_OR_NULL */
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index f693cc97c574..89c09752421c 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -190,8 +190,7 @@ struct bpf_verifier_stack_elem {
 
 static int acquire_reference_state(struct bpf_verifier_env *env, int insn_idx);
 static int release_reference(struct bpf_verifier_env *env, int ref_obj_id);
-static void invalidate_non_owning_refs(struct bpf_verifier_env *env,
-				       struct bpf_active_lock *lock);
+static void invalidate_non_owning_refs(struct bpf_verifier_env *env);
 static int ref_set_non_owning_lock(struct bpf_verifier_env *env,
 				   struct bpf_reg_state *reg);
 
@@ -1077,9 +1076,8 @@ static void print_verifier_state(struct bpf_verifier_env *env,
 				verbose_a("id=%d", reg->id);
 			if (reg->ref_obj_id)
 				verbose_a("ref_obj_id=%d", reg->ref_obj_id);
-			if (reg->non_owning_ref_lock.ptr)
-				verbose_a("non_own_id=(%p,%d)", reg->non_owning_ref_lock.ptr,
-					  reg->non_owning_ref_lock.id);
+			if (reg->non_owning_ref_lock)
+				verbose_a("%s", "non_own_ref");
 			if (t != SCALAR_VALUE)
 				verbose_a("off=%d", reg->off);
 			if (type_is_pkt_pointer(t))
@@ -5049,7 +5047,7 @@ static int check_ptr_to_btf_access(struct bpf_verifier_env *env,
 		}
 
 		if (type_is_alloc(reg->type) && !reg->ref_obj_id &&
-		    !reg->non_owning_ref_lock.ptr) {
+		    !reg->non_owning_ref_lock) {
 			verbose(env, "verifier internal error: ref_obj_id for allocated object must be non-zero\n");
 			return -EFAULT;
 		}
@@ -6056,7 +6054,7 @@ static int process_spin_lock(struct bpf_verifier_env *env, int regno,
 			return -EINVAL;
 		}
 
-		invalidate_non_owning_refs(env, &cur->active_lock);
+		invalidate_non_owning_refs(env);
 
 		cur->active_lock.ptr = NULL;
 		cur->active_lock.id = 0;
@@ -7373,16 +7371,14 @@ static int release_reference(struct bpf_verifier_env *env,
 	return 0;
 }
 
-static void invalidate_non_owning_refs(struct bpf_verifier_env *env,
-				       struct bpf_active_lock *lock)
+static void invalidate_non_owning_refs(struct bpf_verifier_env *env)
 {
 	struct bpf_func_state *unused;
 	struct bpf_reg_state *reg;
 
 	bpf_for_each_reg_in_vstate(env->cur_state, unused, reg, ({
-		if (reg->non_owning_ref_lock.ptr &&
-		    reg->non_owning_ref_lock.ptr == lock->ptr &&
-		    reg->non_owning_ref_lock.id == lock->id)
+		if (type_is_ptr_alloc_obj(reg->type) &&
+		    reg->non_owning_ref_lock)
 			__mark_reg_unknown(env, reg);
 	}));
 }
@@ -8948,13 +8944,12 @@ static int ref_set_non_owning_lock(struct bpf_verifier_env *env, struct bpf_reg_
 		return -EFAULT;
 	}
 
-	if (reg->non_owning_ref_lock.ptr) {
+	if (reg->non_owning_ref_lock) {
 		verbose(env, "verifier internal error: non_owning_ref_lock already set\n");
 		return -EFAULT;
 	}
 
-	reg->non_owning_ref_lock.id = state->active_lock.id;
-	reg->non_owning_ref_lock.ptr = state->active_lock.ptr;
+	reg->non_owning_ref_lock = true;
 	return 0;
 }
 
-- 
2.30.2



* [PATCH v4 bpf-next 03/11] selftests/bpf: Update linked_list tests for non-owning ref semantics
  2023-02-09 17:41 [PATCH v4 bpf-next 00/11] BPF rbtree next-gen datastructure Dave Marchevsky
  2023-02-09 17:41 ` [PATCH v4 bpf-next 01/11] bpf: Migrate release_on_unlock logic to non-owning ref semantics Dave Marchevsky
  2023-02-09 17:41 ` [PATCH v4 bpf-next 02/11] bpf: Improve bpf_reg_state space usage for non-owning ref lock Dave Marchevsky
@ 2023-02-09 17:41 ` Dave Marchevsky
  2023-02-09 17:41 ` [PATCH v4 bpf-next 04/11] bpf: Add basic bpf_rb_{root,node} support Dave Marchevsky
                   ` (7 subsequent siblings)
  10 siblings, 0 replies; 27+ messages in thread
From: Dave Marchevsky @ 2023-02-09 17:41 UTC (permalink / raw)
  To: bpf
  Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Kernel Team, Kumar Kartikeya Dwivedi, Tejun Heo, Dave Marchevsky

Current linked_list semantics for release_on_unlock node refs are almost
exactly the same as the newly-introduced "non-owning reference" concept.
The only difference: writes to a release_on_unlock node ref are not
allowed, while writes to non-owning reference pointees are.

As a result, the linked_list "write after push" failure tests are no
longer scenarios that should fail.

The test##_missing_lock_##op and test##_incorrect_lock_##op
macro-generated failure tests need to have a valid node argument in
order to have the same error output as before. Otherwise verification
will fail early and the expected error output won't be seen.

Some other tests have minor changes in error output, but fail for the
same reason.

Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
---
 .../selftests/bpf/prog_tests/linked_list.c    |   6 +-
 .../testing/selftests/bpf/progs/linked_list.c |   2 +-
 .../selftests/bpf/progs/linked_list_fail.c    | 100 +++++++++++-------
 3 files changed, 65 insertions(+), 43 deletions(-)

diff --git a/tools/testing/selftests/bpf/prog_tests/linked_list.c b/tools/testing/selftests/bpf/prog_tests/linked_list.c
index 9a7d4c47af63..0a0fdaec98bd 100644
--- a/tools/testing/selftests/bpf/prog_tests/linked_list.c
+++ b/tools/testing/selftests/bpf/prog_tests/linked_list.c
@@ -78,12 +78,10 @@ static struct {
 	{ "direct_write_head", "direct access to bpf_list_head is disallowed" },
 	{ "direct_read_node", "direct access to bpf_list_node is disallowed" },
 	{ "direct_write_node", "direct access to bpf_list_node is disallowed" },
-	{ "write_after_push_front", "only read is supported" },
-	{ "write_after_push_back", "only read is supported" },
 	{ "use_after_unlock_push_front", "invalid mem access 'scalar'" },
 	{ "use_after_unlock_push_back", "invalid mem access 'scalar'" },
-	{ "double_push_front", "arg#1 expected pointer to allocated object" },
-	{ "double_push_back", "arg#1 expected pointer to allocated object" },
+	{ "double_push_front", "allocated object must be referenced" },
+	{ "double_push_back", "allocated object must be referenced" },
 	{ "no_node_value_type", "bpf_list_node not found at offset=0" },
 	{ "incorrect_value_type",
 	  "operation on bpf_list_head expects arg#1 bpf_list_node at offset=0 in struct foo, "
diff --git a/tools/testing/selftests/bpf/progs/linked_list.c b/tools/testing/selftests/bpf/progs/linked_list.c
index 4ad88da5cda2..4fa4a9b01bde 100644
--- a/tools/testing/selftests/bpf/progs/linked_list.c
+++ b/tools/testing/selftests/bpf/progs/linked_list.c
@@ -260,7 +260,7 @@ int test_list_push_pop_multiple(struct bpf_spin_lock *lock, struct bpf_list_head
 {
 	int ret;
 
-	ret = list_push_pop_multiple(lock ,head, false);
+	ret = list_push_pop_multiple(lock, head, false);
 	if (ret)
 		return ret;
 	return list_push_pop_multiple(lock, head, true);
diff --git a/tools/testing/selftests/bpf/progs/linked_list_fail.c b/tools/testing/selftests/bpf/progs/linked_list_fail.c
index 1d9017240e19..69cdc07cba13 100644
--- a/tools/testing/selftests/bpf/progs/linked_list_fail.c
+++ b/tools/testing/selftests/bpf/progs/linked_list_fail.c
@@ -54,28 +54,44 @@
 		return 0;                                   \
 	}
 
-CHECK(kptr, push_front, &f->head);
-CHECK(kptr, push_back, &f->head);
 CHECK(kptr, pop_front, &f->head);
 CHECK(kptr, pop_back, &f->head);
 
-CHECK(global, push_front, &ghead);
-CHECK(global, push_back, &ghead);
 CHECK(global, pop_front, &ghead);
 CHECK(global, pop_back, &ghead);
 
-CHECK(map, push_front, &v->head);
-CHECK(map, push_back, &v->head);
 CHECK(map, pop_front, &v->head);
 CHECK(map, pop_back, &v->head);
 
-CHECK(inner_map, push_front, &iv->head);
-CHECK(inner_map, push_back, &iv->head);
 CHECK(inner_map, pop_front, &iv->head);
 CHECK(inner_map, pop_back, &iv->head);
 
 #undef CHECK
 
+#define CHECK(test, op, hexpr, nexpr)					\
+	SEC("?tc")							\
+	int test##_missing_lock_##op(void *ctx)				\
+	{								\
+		INIT;							\
+		void (*p)(void *, void *) = (void *)&bpf_list_##op;	\
+		p(hexpr, nexpr);					\
+		return 0;						\
+	}
+
+CHECK(kptr, push_front, &f->head, b);
+CHECK(kptr, push_back, &f->head, b);
+
+CHECK(global, push_front, &ghead, f);
+CHECK(global, push_back, &ghead, f);
+
+CHECK(map, push_front, &v->head, f);
+CHECK(map, push_back, &v->head, f);
+
+CHECK(inner_map, push_front, &iv->head, f);
+CHECK(inner_map, push_back, &iv->head, f);
+
+#undef CHECK
+
 #define CHECK(test, op, lexpr, hexpr)                       \
 	SEC("?tc")                                          \
 	int test##_incorrect_lock_##op(void *ctx)           \
@@ -108,11 +124,47 @@ CHECK(inner_map, pop_back, &iv->head);
 	CHECK(inner_map_global, op, &iv->lock, &ghead);        \
 	CHECK(inner_map_map, op, &iv->lock, &v->head);
 
-CHECK_OP(push_front);
-CHECK_OP(push_back);
 CHECK_OP(pop_front);
 CHECK_OP(pop_back);
 
+#undef CHECK
+#undef CHECK_OP
+
+#define CHECK(test, op, lexpr, hexpr, nexpr)				\
+	SEC("?tc")							\
+	int test##_incorrect_lock_##op(void *ctx)			\
+	{								\
+		INIT;							\
+		void (*p)(void *, void*) = (void *)&bpf_list_##op;	\
+		bpf_spin_lock(lexpr);					\
+		p(hexpr, nexpr);					\
+		return 0;						\
+	}
+
+#define CHECK_OP(op)							\
+	CHECK(kptr_kptr, op, &f1->lock, &f2->head, b);			\
+	CHECK(kptr_global, op, &f1->lock, &ghead, f);			\
+	CHECK(kptr_map, op, &f1->lock, &v->head, f);			\
+	CHECK(kptr_inner_map, op, &f1->lock, &iv->head, f);		\
+									\
+	CHECK(global_global, op, &glock2, &ghead, f);			\
+	CHECK(global_kptr, op, &glock, &f1->head, b);			\
+	CHECK(global_map, op, &glock, &v->head, f);			\
+	CHECK(global_inner_map, op, &glock, &iv->head, f);		\
+									\
+	CHECK(map_map, op, &v->lock, &v2->head, f);			\
+	CHECK(map_kptr, op, &v->lock, &f2->head, b);			\
+	CHECK(map_global, op, &v->lock, &ghead, f);			\
+	CHECK(map_inner_map, op, &v->lock, &iv->head, f);		\
+									\
+	CHECK(inner_map_inner_map, op, &iv->lock, &iv2->head, f);	\
+	CHECK(inner_map_kptr, op, &iv->lock, &f2->head, b);		\
+	CHECK(inner_map_global, op, &iv->lock, &ghead, f);		\
+	CHECK(inner_map_map, op, &iv->lock, &v->head, f);
+
+CHECK_OP(push_front);
+CHECK_OP(push_back);
+
 #undef CHECK
 #undef CHECK_OP
 #undef INIT
@@ -303,34 +355,6 @@ int direct_write_node(void *ctx)
 	return 0;
 }
 
-static __always_inline
-int write_after_op(void (*push_op)(void *head, void *node))
-{
-	struct foo *f;
-
-	f = bpf_obj_new(typeof(*f));
-	if (!f)
-		return 0;
-	bpf_spin_lock(&glock);
-	push_op(&ghead, &f->node);
-	f->data = 42;
-	bpf_spin_unlock(&glock);
-
-	return 0;
-}
-
-SEC("?tc")
-int write_after_push_front(void *ctx)
-{
-	return write_after_op((void *)bpf_list_push_front);
-}
-
-SEC("?tc")
-int write_after_push_back(void *ctx)
-{
-	return write_after_op((void *)bpf_list_push_back);
-}
-
 static __always_inline
 int use_after_unlock(void (*op)(void *head, void *node))
 {
-- 
2.30.2



* [PATCH v4 bpf-next 04/11] bpf: Add basic bpf_rb_{root,node} support
  2023-02-09 17:41 [PATCH v4 bpf-next 00/11] BPF rbtree next-gen datastructure Dave Marchevsky
                   ` (2 preceding siblings ...)
  2023-02-09 17:41 ` [PATCH v4 bpf-next 03/11] selftests/bpf: Update linked_list tests for non-owning ref semantics Dave Marchevsky
@ 2023-02-09 17:41 ` Dave Marchevsky
  2023-02-10 14:18   ` Kumar Kartikeya Dwivedi
  2023-02-09 17:41 ` [PATCH v4 bpf-next 05/11] bpf: Add bpf_rbtree_{add,remove,first} kfuncs Dave Marchevsky
                   ` (6 subsequent siblings)
  10 siblings, 1 reply; 27+ messages in thread
From: Dave Marchevsky @ 2023-02-09 17:41 UTC (permalink / raw)
  To: bpf
  Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Kernel Team, Kumar Kartikeya Dwivedi, Tejun Heo, Dave Marchevsky

This patch adds special BPF_RB_{ROOT,NODE} btf_field_types similar to
BPF_LIST_{HEAD,NODE}, adds the necessary plumbing to detect the new
types, and adds bpf_rb_root_free function for freeing bpf_rb_root in
map_values.

structs bpf_rb_root and bpf_rb_node are opaque types meant to
obscure structs rb_root_cached and rb_node, respectively.

btf_struct_access will prevent BPF programs from touching these special
fields automatically now that they're recognized.

btf_check_and_fixup_fields now groups list_head and rb_root together as
"graph root" fields and {list,rb}_node as "graph node", and does the
same ownership cycle checking as before. Note that this function does
_not_ prevent ownership type mixups (e.g. rb_root owning list_node) -
that's handled by btf_parse_graph_root.

After this patch, a bpf program can have a struct bpf_rb_root in a
map_value, but not add anything to nor do anything useful with it.
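
For example, after this patch a map value like the following sketch is
accepted (layout is illustrative, mirroring the cover letter's
__contains usage), even though no kfuncs operate on it yet:

  struct node_data {
          long key;
          struct bpf_rb_node node;
  };

  struct map_value {
          struct bpf_spin_lock lock; /* bpf_rb_root requires a bpf_spin_lock */
          struct bpf_rb_root root __contains(node_data, node);
  };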

Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
---
 include/linux/bpf.h                           |  20 ++-
 include/uapi/linux/bpf.h                      |  11 ++
 kernel/bpf/btf.c                              | 162 ++++++++++++------
 kernel/bpf/helpers.c                          |  40 +++++
 kernel/bpf/syscall.c                          |  28 ++-
 kernel/bpf/verifier.c                         |   5 +-
 tools/include/uapi/linux/bpf.h                |  11 ++
 .../selftests/bpf/prog_tests/linked_list.c    |  12 +-
 8 files changed, 216 insertions(+), 73 deletions(-)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 9a79ebe1774c..0074bc6e42a5 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -180,7 +180,10 @@ enum btf_field_type {
 	BPF_KPTR       = BPF_KPTR_UNREF | BPF_KPTR_REF,
 	BPF_LIST_HEAD  = (1 << 4),
 	BPF_LIST_NODE  = (1 << 5),
-	BPF_GRAPH_NODE_OR_ROOT = BPF_LIST_NODE | BPF_LIST_HEAD,
+	BPF_RB_ROOT    = (1 << 6),
+	BPF_RB_NODE    = (1 << 7),
+	BPF_GRAPH_NODE_OR_ROOT = BPF_LIST_NODE | BPF_LIST_HEAD |
+				 BPF_RB_NODE | BPF_RB_ROOT,
 };
 
 struct btf_field_kptr {
@@ -284,6 +287,10 @@ static inline const char *btf_field_type_name(enum btf_field_type type)
 		return "bpf_list_head";
 	case BPF_LIST_NODE:
 		return "bpf_list_node";
+	case BPF_RB_ROOT:
+		return "bpf_rb_root";
+	case BPF_RB_NODE:
+		return "bpf_rb_node";
 	default:
 		WARN_ON_ONCE(1);
 		return "unknown";
@@ -304,6 +311,10 @@ static inline u32 btf_field_type_size(enum btf_field_type type)
 		return sizeof(struct bpf_list_head);
 	case BPF_LIST_NODE:
 		return sizeof(struct bpf_list_node);
+	case BPF_RB_ROOT:
+		return sizeof(struct bpf_rb_root);
+	case BPF_RB_NODE:
+		return sizeof(struct bpf_rb_node);
 	default:
 		WARN_ON_ONCE(1);
 		return 0;
@@ -324,6 +335,10 @@ static inline u32 btf_field_type_align(enum btf_field_type type)
 		return __alignof__(struct bpf_list_head);
 	case BPF_LIST_NODE:
 		return __alignof__(struct bpf_list_node);
+	case BPF_RB_ROOT:
+		return __alignof__(struct bpf_rb_root);
+	case BPF_RB_NODE:
+		return __alignof__(struct bpf_rb_node);
 	default:
 		WARN_ON_ONCE(1);
 		return 0;
@@ -434,6 +449,9 @@ void copy_map_value_locked(struct bpf_map *map, void *dst, void *src,
 void bpf_timer_cancel_and_free(void *timer);
 void bpf_list_head_free(const struct btf_field *field, void *list_head,
 			struct bpf_spin_lock *spin_lock);
+void bpf_rb_root_free(const struct btf_field *field, void *rb_root,
+		      struct bpf_spin_lock *spin_lock);
+
 
 int bpf_obj_name_cpy(char *dst, const char *src, unsigned int size);
 
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 17afd2b35ee5..1503f61336b6 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -6917,6 +6917,17 @@ struct bpf_list_node {
 	__u64 :64;
 } __attribute__((aligned(8)));
 
+struct bpf_rb_root {
+	__u64 :64;
+	__u64 :64;
+} __attribute__((aligned(8)));
+
+struct bpf_rb_node {
+	__u64 :64;
+	__u64 :64;
+	__u64 :64;
+} __attribute__((aligned(8)));
+
 struct bpf_sysctl {
 	__u32	write;		/* Sysctl is being read (= 0) or written (= 1).
 				 * Allows 1,2,4-byte read, but no write.
diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c
index 1622a3b15d6f..8a09da37045e 100644
--- a/kernel/bpf/btf.c
+++ b/kernel/bpf/btf.c
@@ -3324,12 +3324,14 @@ static const char *btf_find_decl_tag_value(const struct btf *btf,
 	return NULL;
 }
 
-static int btf_find_list_head(const struct btf *btf, const struct btf_type *pt,
-			      const struct btf_type *t, int comp_idx,
-			      u32 off, int sz, struct btf_field_info *info)
+static int
+btf_find_graph_root(const struct btf *btf, const struct btf_type *pt,
+		    const struct btf_type *t, int comp_idx, u32 off,
+		    int sz, struct btf_field_info *info,
+		    enum btf_field_type head_type)
 {
+	const char *node_field_name;
 	const char *value_type;
-	const char *list_node;
 	s32 id;
 
 	if (!__btf_type_is_struct(t))
@@ -3339,26 +3341,32 @@ static int btf_find_list_head(const struct btf *btf, const struct btf_type *pt,
 	value_type = btf_find_decl_tag_value(btf, pt, comp_idx, "contains:");
 	if (!value_type)
 		return -EINVAL;
-	list_node = strstr(value_type, ":");
-	if (!list_node)
+	node_field_name = strstr(value_type, ":");
+	if (!node_field_name)
 		return -EINVAL;
-	value_type = kstrndup(value_type, list_node - value_type, GFP_KERNEL | __GFP_NOWARN);
+	value_type = kstrndup(value_type, node_field_name - value_type, GFP_KERNEL | __GFP_NOWARN);
 	if (!value_type)
 		return -ENOMEM;
 	id = btf_find_by_name_kind(btf, value_type, BTF_KIND_STRUCT);
 	kfree(value_type);
 	if (id < 0)
 		return id;
-	list_node++;
-	if (str_is_empty(list_node))
+	node_field_name++;
+	if (str_is_empty(node_field_name))
 		return -EINVAL;
-	info->type = BPF_LIST_HEAD;
+	info->type = head_type;
 	info->off = off;
 	info->graph_root.value_btf_id = id;
-	info->graph_root.node_name = list_node;
+	info->graph_root.node_name = node_field_name;
 	return BTF_FIELD_FOUND;
 }
 
+#define field_mask_test_name(field_type, field_type_str) \
+	if (field_mask & field_type && !strcmp(name, field_type_str)) { \
+		type = field_type;					\
+		goto end;						\
+	}
+
 static int btf_get_field_type(const char *name, u32 field_mask, u32 *seen_mask,
 			      int *align, int *sz)
 {
@@ -3382,18 +3390,11 @@ static int btf_get_field_type(const char *name, u32 field_mask, u32 *seen_mask,
 			goto end;
 		}
 	}
-	if (field_mask & BPF_LIST_HEAD) {
-		if (!strcmp(name, "bpf_list_head")) {
-			type = BPF_LIST_HEAD;
-			goto end;
-		}
-	}
-	if (field_mask & BPF_LIST_NODE) {
-		if (!strcmp(name, "bpf_list_node")) {
-			type = BPF_LIST_NODE;
-			goto end;
-		}
-	}
+	field_mask_test_name(BPF_LIST_HEAD, "bpf_list_head");
+	field_mask_test_name(BPF_LIST_NODE, "bpf_list_node");
+	field_mask_test_name(BPF_RB_ROOT,   "bpf_rb_root");
+	field_mask_test_name(BPF_RB_NODE,   "bpf_rb_node");
+
 	/* Only return BPF_KPTR when all other types with matchable names fail */
 	if (field_mask & BPF_KPTR) {
 		type = BPF_KPTR_REF;
@@ -3406,6 +3407,8 @@ static int btf_get_field_type(const char *name, u32 field_mask, u32 *seen_mask,
 	return type;
 }
 
+#undef field_mask_test_name
+
 static int btf_find_struct_field(const struct btf *btf,
 				 const struct btf_type *t, u32 field_mask,
 				 struct btf_field_info *info, int info_cnt)
@@ -3438,6 +3441,7 @@ static int btf_find_struct_field(const struct btf *btf,
 		case BPF_SPIN_LOCK:
 		case BPF_TIMER:
 		case BPF_LIST_NODE:
+		case BPF_RB_NODE:
 			ret = btf_find_struct(btf, member_type, off, sz, field_type,
 					      idx < info_cnt ? &info[idx] : &tmp);
 			if (ret < 0)
@@ -3451,8 +3455,11 @@ static int btf_find_struct_field(const struct btf *btf,
 				return ret;
 			break;
 		case BPF_LIST_HEAD:
-			ret = btf_find_list_head(btf, t, member_type, i, off, sz,
-						 idx < info_cnt ? &info[idx] : &tmp);
+		case BPF_RB_ROOT:
+			ret = btf_find_graph_root(btf, t, member_type,
+						  i, off, sz,
+						  idx < info_cnt ? &info[idx] : &tmp,
+						  field_type);
 			if (ret < 0)
 				return ret;
 			break;
@@ -3499,6 +3506,7 @@ static int btf_find_datasec_var(const struct btf *btf, const struct btf_type *t,
 		case BPF_SPIN_LOCK:
 		case BPF_TIMER:
 		case BPF_LIST_NODE:
+		case BPF_RB_NODE:
 			ret = btf_find_struct(btf, var_type, off, sz, field_type,
 					      idx < info_cnt ? &info[idx] : &tmp);
 			if (ret < 0)
@@ -3512,8 +3520,11 @@ static int btf_find_datasec_var(const struct btf *btf, const struct btf_type *t,
 				return ret;
 			break;
 		case BPF_LIST_HEAD:
-			ret = btf_find_list_head(btf, var, var_type, -1, off, sz,
-						 idx < info_cnt ? &info[idx] : &tmp);
+		case BPF_RB_ROOT:
+			ret = btf_find_graph_root(btf, var, var_type,
+						  -1, off, sz,
+						  idx < info_cnt ? &info[idx] : &tmp,
+						  field_type);
 			if (ret < 0)
 				return ret;
 			break;
@@ -3615,8 +3626,11 @@ static int btf_parse_kptr(const struct btf *btf, struct btf_field *field,
 	return ret;
 }
 
-static int btf_parse_list_head(const struct btf *btf, struct btf_field *field,
-			       struct btf_field_info *info)
+static int btf_parse_graph_root(const struct btf *btf,
+				struct btf_field *field,
+				struct btf_field_info *info,
+				const char *node_type_name,
+				size_t node_type_align)
 {
 	const struct btf_type *t, *n = NULL;
 	const struct btf_member *member;
@@ -3638,13 +3652,13 @@ static int btf_parse_list_head(const struct btf *btf, struct btf_field *field,
 		n = btf_type_by_id(btf, member->type);
 		if (!__btf_type_is_struct(n))
 			return -EINVAL;
-		if (strcmp("bpf_list_node", __btf_name_by_offset(btf, n->name_off)))
+		if (strcmp(node_type_name, __btf_name_by_offset(btf, n->name_off)))
 			return -EINVAL;
 		offset = __btf_member_bit_offset(n, member);
 		if (offset % 8)
 			return -EINVAL;
 		offset /= 8;
-		if (offset % __alignof__(struct bpf_list_node))
+		if (offset % node_type_align)
 			return -EINVAL;
 
 		field->graph_root.btf = (struct btf *)btf;
@@ -3656,6 +3670,20 @@ static int btf_parse_list_head(const struct btf *btf, struct btf_field *field,
 	return 0;
 }
 
+static int btf_parse_list_head(const struct btf *btf, struct btf_field *field,
+			       struct btf_field_info *info)
+{
+	return btf_parse_graph_root(btf, field, info, "bpf_list_node",
+					    __alignof__(struct bpf_list_node));
+}
+
+static int btf_parse_rb_root(const struct btf *btf, struct btf_field *field,
+			     struct btf_field_info *info)
+{
+	return btf_parse_graph_root(btf, field, info, "bpf_rb_node",
+					    __alignof__(struct bpf_rb_node));
+}
+
 struct btf_record *btf_parse_fields(const struct btf *btf, const struct btf_type *t,
 				    u32 field_mask, u32 value_size)
 {
@@ -3718,7 +3746,13 @@ struct btf_record *btf_parse_fields(const struct btf *btf, const struct btf_type
 			if (ret < 0)
 				goto end;
 			break;
+		case BPF_RB_ROOT:
+			ret = btf_parse_rb_root(btf, &rec->fields[i], &info_arr[i]);
+			if (ret < 0)
+				goto end;
+			break;
 		case BPF_LIST_NODE:
+		case BPF_RB_NODE:
 			break;
 		default:
 			ret = -EFAULT;
@@ -3727,8 +3761,9 @@ struct btf_record *btf_parse_fields(const struct btf *btf, const struct btf_type
 		rec->cnt++;
 	}
 
-	/* bpf_list_head requires bpf_spin_lock */
-	if (btf_record_has_field(rec, BPF_LIST_HEAD) && rec->spin_lock_off < 0) {
+	/* bpf_{list_head, rb_root} require bpf_spin_lock */
+	if ((btf_record_has_field(rec, BPF_LIST_HEAD) ||
+	     btf_record_has_field(rec, BPF_RB_ROOT)) && rec->spin_lock_off < 0) {
 		ret = -EINVAL;
 		goto end;
 	}
@@ -3739,22 +3774,28 @@ struct btf_record *btf_parse_fields(const struct btf *btf, const struct btf_type
 	return ERR_PTR(ret);
 }
 
+#define GRAPH_ROOT_MASK (BPF_LIST_HEAD | BPF_RB_ROOT)
+#define GRAPH_NODE_MASK (BPF_LIST_NODE | BPF_RB_NODE)
+
 int btf_check_and_fixup_fields(const struct btf *btf, struct btf_record *rec)
 {
 	int i;
 
-	/* There are two owning types, kptr_ref and bpf_list_head. The former
-	 * only supports storing kernel types, which can never store references
-	 * to program allocated local types, atleast not yet. Hence we only need
-	 * to ensure that bpf_list_head ownership does not form cycles.
+	/* There are three types that signify ownership of some other type:
+	 *  kptr_ref, bpf_list_head, bpf_rb_root.
+	 * kptr_ref only supports storing kernel types, which can't store
+	 * references to program allocated local types.
+	 *
+	 * Hence we only need to ensure that bpf_{list_head,rb_root} ownership
+	 * does not form cycles.
 	 */
-	if (IS_ERR_OR_NULL(rec) || !(rec->field_mask & BPF_LIST_HEAD))
+	if (IS_ERR_OR_NULL(rec) || !(rec->field_mask & GRAPH_ROOT_MASK))
 		return 0;
 	for (i = 0; i < rec->cnt; i++) {
 		struct btf_struct_meta *meta;
 		u32 btf_id;
 
-		if (!(rec->fields[i].type & BPF_LIST_HEAD))
+		if (!(rec->fields[i].type & GRAPH_ROOT_MASK))
 			continue;
 		btf_id = rec->fields[i].graph_root.value_btf_id;
 		meta = btf_find_struct_meta(btf, btf_id);
@@ -3762,39 +3803,47 @@ int btf_check_and_fixup_fields(const struct btf *btf, struct btf_record *rec)
 			return -EFAULT;
 		rec->fields[i].graph_root.value_rec = meta->record;
 
-		if (!(rec->field_mask & BPF_LIST_NODE))
+		/* We need to set value_rec for all root types, but no need
+		 * to check ownership cycle for a type unless it's also a
+		 * node type.
+		 */
+		if (!(rec->field_mask & GRAPH_NODE_MASK))
 			continue;
 
 		/* We need to ensure ownership acyclicity among all types. The
 		 * proper way to do it would be to topologically sort all BTF
 		 * IDs based on the ownership edges, since there can be multiple
-		 * bpf_list_head in a type. Instead, we use the following
-		 * reasoning:
+		 * bpf_{list_head,rb_root} in a type. Instead, we use the
+		 * following reasoning:
 		 *
 		 * - A type can only be owned by another type in user BTF if it
-		 *   has a bpf_list_node.
+		 *   has a bpf_{list,rb}_node. Let's call these node types.
 		 * - A type can only _own_ another type in user BTF if it has a
-		 *   bpf_list_head.
+		 *   bpf_{list_head,rb_root}. Let's call these root types.
 		 *
-		 * We ensure that if a type has both bpf_list_head and
-		 * bpf_list_node, its element types cannot be owning types.
+		 * We ensure that if a type is both a root and node, its
+		 * element types cannot be root types.
 		 *
 		 * To ensure acyclicity:
 		 *
-		 * When A only has bpf_list_head, ownership chain can be:
+		 * When A is a root type but not a node, its ownership
+		 * chain can be:
 		 *	A -> B -> C
 		 * Where:
-		 * - B has both bpf_list_head and bpf_list_node.
-		 * - C only has bpf_list_node.
+		 * - A is a root, e.g. has bpf_rb_root.
+		 * - B is both a root and node, e.g. has bpf_rb_node and
+		 *   bpf_list_head.
+		 * - C is only a node, e.g. has bpf_list_node.
 		 *
-		 * When A has both bpf_list_head and bpf_list_node, some other
-		 * type already owns it in the BTF domain, hence it can not own
-		 * another owning type through any of the bpf_list_head edges.
+		 * When A is both a root and node, some other type already
+		 * owns it in the BTF domain, hence it can not own
+		 * another root type through any of the ownership edges.
 		 *	A -> B
 		 * Where:
-		 * - B only has bpf_list_node.
+		 * - A is both a root and node.
+		 * - B is only a node.
 		 */
-		if (meta->record->field_mask & BPF_LIST_HEAD)
+		if (meta->record->field_mask & GRAPH_ROOT_MASK)
 			return -ELOOP;
 	}
 	return 0;
@@ -5256,6 +5305,8 @@ static const char *alloc_obj_fields[] = {
 	"bpf_spin_lock",
 	"bpf_list_head",
 	"bpf_list_node",
+	"bpf_rb_root",
+	"bpf_rb_node",
 };
 
 static struct btf_struct_metas *
@@ -5329,7 +5380,8 @@ btf_parse_struct_metas(struct bpf_verifier_log *log, struct btf *btf)
 
 		type = &tab->types[tab->cnt];
 		type->btf_id = i;
-		record = btf_parse_fields(btf, t, BPF_SPIN_LOCK | BPF_LIST_HEAD | BPF_LIST_NODE, t->size);
+		record = btf_parse_fields(btf, t, BPF_SPIN_LOCK | BPF_LIST_HEAD | BPF_LIST_NODE |
+						  BPF_RB_ROOT | BPF_RB_NODE, t->size);
 		/* The record cannot be unset, treat it as an error if so */
 		if (IS_ERR_OR_NULL(record)) {
 			ret = PTR_ERR_OR_ZERO(record) ?: -EFAULT;
diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
index 2dae44581922..192184b5156e 100644
--- a/kernel/bpf/helpers.c
+++ b/kernel/bpf/helpers.c
@@ -1772,6 +1772,46 @@ void bpf_list_head_free(const struct btf_field *field, void *list_head,
 	}
 }
 
+/* Like rbtree_postorder_for_each_entry_safe, but 'pos' and 'n' are
+ * 'rb_node *', so the field name of the rb_node within the containing
+ * struct is not needed.
+ *
+ * Since the bpf_rb_root's btf_field records its value type's node offset
+ * (graph_root.node_offset), it's not necessary to know the field name or
+ * the type of the node struct.
+ */
+#define bpf_rbtree_postorder_for_each_entry_safe(pos, n, root) \
+	for (pos = rb_first_postorder(root); \
+	    pos && ({ n = rb_next_postorder(pos); 1; }); \
+	    pos = n)
+
+void bpf_rb_root_free(const struct btf_field *field, void *rb_root,
+		      struct bpf_spin_lock *spin_lock)
+{
+	struct rb_root_cached orig_root, *root = rb_root;
+	struct rb_node *pos, *n;
+	void *obj;
+
+	BUILD_BUG_ON(sizeof(struct rb_root_cached) > sizeof(struct bpf_rb_root));
+	BUILD_BUG_ON(__alignof__(struct rb_root_cached) > __alignof__(struct bpf_rb_root));
+
+	__bpf_spin_lock_irqsave(spin_lock);
+	orig_root = *root;
+	*root = RB_ROOT_CACHED;
+	__bpf_spin_unlock_irqrestore(spin_lock);
+
+	bpf_rbtree_postorder_for_each_entry_safe(pos, n, &orig_root.rb_root) {
+		obj = pos;
+		obj -= field->graph_root.node_offset;
+
+		bpf_obj_free_fields(field->graph_root.value_rec, obj);
+
+		migrate_disable();
+		bpf_mem_free(&bpf_global_ma, obj);
+		migrate_enable();
+	}
+}
+
 __diag_push();
 __diag_ignore_all("-Wmissing-prototypes",
 		  "Global functions as their definitions will be in vmlinux BTF");
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index bcc97613de76..0806cb9f8a1c 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -521,9 +521,6 @@ void btf_record_free(struct btf_record *rec)
 		return;
 	for (i = 0; i < rec->cnt; i++) {
 		switch (rec->fields[i].type) {
-		case BPF_SPIN_LOCK:
-		case BPF_TIMER:
-			break;
 		case BPF_KPTR_UNREF:
 		case BPF_KPTR_REF:
 			if (rec->fields[i].kptr.module)
@@ -532,7 +529,11 @@ void btf_record_free(struct btf_record *rec)
 			break;
 		case BPF_LIST_HEAD:
 		case BPF_LIST_NODE:
-			/* Nothing to release for bpf_list_head */
+		case BPF_RB_ROOT:
+		case BPF_RB_NODE:
+		case BPF_SPIN_LOCK:
+		case BPF_TIMER:
+			/* Nothing to release */
 			break;
 		default:
 			WARN_ON_ONCE(1);
@@ -565,9 +566,6 @@ struct btf_record *btf_record_dup(const struct btf_record *rec)
 	new_rec->cnt = 0;
 	for (i = 0; i < rec->cnt; i++) {
 		switch (fields[i].type) {
-		case BPF_SPIN_LOCK:
-		case BPF_TIMER:
-			break;
 		case BPF_KPTR_UNREF:
 		case BPF_KPTR_REF:
 			btf_get(fields[i].kptr.btf);
@@ -578,7 +576,11 @@ struct btf_record *btf_record_dup(const struct btf_record *rec)
 			break;
 		case BPF_LIST_HEAD:
 		case BPF_LIST_NODE:
-			/* Nothing to acquire for bpf_list_head */
+		case BPF_RB_ROOT:
+		case BPF_RB_NODE:
+		case BPF_SPIN_LOCK:
+		case BPF_TIMER:
+			/* Nothing to acquire */
 			break;
 		default:
 			ret = -EFAULT;
@@ -658,7 +660,13 @@ void bpf_obj_free_fields(const struct btf_record *rec, void *obj)
 				continue;
 			bpf_list_head_free(field, field_ptr, obj + rec->spin_lock_off);
 			break;
+		case BPF_RB_ROOT:
+			if (WARN_ON_ONCE(rec->spin_lock_off < 0))
+				continue;
+			bpf_rb_root_free(field, field_ptr, obj + rec->spin_lock_off);
+			break;
 		case BPF_LIST_NODE:
+		case BPF_RB_NODE:
 			break;
 		default:
 			WARN_ON_ONCE(1);
@@ -994,7 +1002,8 @@ static int map_check_btf(struct bpf_map *map, const struct btf *btf,
 		return -EINVAL;
 
 	map->record = btf_parse_fields(btf, value_type,
-				       BPF_SPIN_LOCK | BPF_TIMER | BPF_KPTR | BPF_LIST_HEAD,
+				       BPF_SPIN_LOCK | BPF_TIMER | BPF_KPTR | BPF_LIST_HEAD |
+				       BPF_RB_ROOT,
 				       map->value_size);
 	if (!IS_ERR_OR_NULL(map->record)) {
 		int i;
@@ -1042,6 +1051,7 @@ static int map_check_btf(struct bpf_map *map, const struct btf *btf,
 				}
 				break;
 			case BPF_LIST_HEAD:
+			case BPF_RB_ROOT:
 				if (map->map_type != BPF_MAP_TYPE_HASH &&
 				    map->map_type != BPF_MAP_TYPE_LRU_HASH &&
 				    map->map_type != BPF_MAP_TYPE_ARRAY) {
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 89c09752421c..ec4d7e19ed41 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -14685,9 +14685,10 @@ static int check_map_prog_compatibility(struct bpf_verifier_env *env,
 {
 	enum bpf_prog_type prog_type = resolve_prog_type(prog);
 
-	if (btf_record_has_field(map->record, BPF_LIST_HEAD)) {
+	if (btf_record_has_field(map->record, BPF_LIST_HEAD) ||
+	    btf_record_has_field(map->record, BPF_RB_ROOT)) {
 		if (is_tracing_prog_type(prog_type)) {
-			verbose(env, "tracing progs cannot use bpf_list_head yet\n");
+			verbose(env, "tracing progs cannot use bpf_{list_head,rb_root} yet\n");
 			return -EINVAL;
 		}
 	}
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index 17afd2b35ee5..1503f61336b6 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -6917,6 +6917,17 @@ struct bpf_list_node {
 	__u64 :64;
 } __attribute__((aligned(8)));
 
+struct bpf_rb_root {
+	__u64 :64;
+	__u64 :64;
+} __attribute__((aligned(8)));
+
+struct bpf_rb_node {
+	__u64 :64;
+	__u64 :64;
+	__u64 :64;
+} __attribute__((aligned(8)));
+
 struct bpf_sysctl {
 	__u32	write;		/* Sysctl is being read (= 0) or written (= 1).
 				 * Allows 1,2,4-byte read, but no write.
diff --git a/tools/testing/selftests/bpf/prog_tests/linked_list.c b/tools/testing/selftests/bpf/prog_tests/linked_list.c
index 0a0fdaec98bd..7fde4a95dd49 100644
--- a/tools/testing/selftests/bpf/prog_tests/linked_list.c
+++ b/tools/testing/selftests/bpf/prog_tests/linked_list.c
@@ -58,12 +58,12 @@ static struct {
 	TEST(inner_map, pop_front)
 	TEST(inner_map, pop_back)
 #undef TEST
-	{ "map_compat_kprobe", "tracing progs cannot use bpf_list_head yet" },
-	{ "map_compat_kretprobe", "tracing progs cannot use bpf_list_head yet" },
-	{ "map_compat_tp", "tracing progs cannot use bpf_list_head yet" },
-	{ "map_compat_perf", "tracing progs cannot use bpf_list_head yet" },
-	{ "map_compat_raw_tp", "tracing progs cannot use bpf_list_head yet" },
-	{ "map_compat_raw_tp_w", "tracing progs cannot use bpf_list_head yet" },
+	{ "map_compat_kprobe", "tracing progs cannot use bpf_{list_head,rb_root} yet" },
+	{ "map_compat_kretprobe", "tracing progs cannot use bpf_{list_head,rb_root} yet" },
+	{ "map_compat_tp", "tracing progs cannot use bpf_{list_head,rb_root} yet" },
+	{ "map_compat_perf", "tracing progs cannot use bpf_{list_head,rb_root} yet" },
+	{ "map_compat_raw_tp", "tracing progs cannot use bpf_{list_head,rb_root} yet" },
+	{ "map_compat_raw_tp_w", "tracing progs cannot use bpf_{list_head,rb_root} yet" },
 	{ "obj_type_id_oor", "local type ID argument must be in range [0, U32_MAX]" },
 	{ "obj_new_no_composite", "bpf_obj_new type ID argument must be of a struct" },
 	{ "obj_new_no_struct", "bpf_obj_new type ID argument must be of a struct" },
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH v4 bpf-next 05/11] bpf: Add bpf_rbtree_{add,remove,first} kfuncs
  2023-02-09 17:41 [PATCH v4 bpf-next 00/11] BPF rbtree next-gen datastructure Dave Marchevsky
                   ` (3 preceding siblings ...)
  2023-02-09 17:41 ` [PATCH v4 bpf-next 04/11] bpf: Add basic bpf_rb_{root,node} support Dave Marchevsky
@ 2023-02-09 17:41 ` Dave Marchevsky
  2023-02-09 17:41 ` [PATCH v4 bpf-next 06/11] bpf: Add support for bpf_rb_root and bpf_rb_node in kfunc args Dave Marchevsky
                   ` (5 subsequent siblings)
  10 siblings, 0 replies; 27+ messages in thread
From: Dave Marchevsky @ 2023-02-09 17:41 UTC (permalink / raw)
  To: bpf
  Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Kernel Team, Kumar Kartikeya Dwivedi, Tejun Heo, Dave Marchevsky

This patch adds implementations of bpf_rbtree_{add,remove,first}
and teaches the verifier about their BTF_IDs as well as those of
bpf_rb_{root,node}.

All three kfuncs have some nonstandard component to their verification
that needs to be addressed in later patches in this series before
programs can properly use them:

  * bpf_rbtree_add:     Takes a 'less' callback, which needs to be verified

  * bpf_rbtree_first:   Returns ptr_to_node_type(off=rb_node_off) instead
                        of ptr_to_rb_node(off=0). Return value ref is
                        non-owning (this return convention is sketched below).

  * bpf_rbtree_remove:  Returns ptr_to_node_type(off=rb_node_off) instead
                        of ptr_to_rb_node(off=0). 2nd arg (node) is a
			non-owning reference.
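
To make the ptr_to_node_type(off=rb_node_off) return convention above
concrete, here is a minimal sketch. It is illustrative only: 'struct nval',
its 'rbn' field and 'some_rb_root' are hypothetical names, not symbols
added by this series.

  struct nval {
    long key;
    struct bpf_rb_node rbn;  /* rb_node_off == offsetof(struct nval, rbn) */
  };

  struct bpf_rb_node *res;
  struct nval *v;

  /* The returned pointer's value is the address of the embedded 'rbn'
   * field, and the verifier tracks it as a pointer to 'struct nval' whose
   * fixed offset equals offsetof(struct nval, rbn), not as a plain
   * 'struct bpf_rb_node *' at offset 0, so container_of() recovers the
   * start of the node. Both 'res' and 'v' are non-owning references.
   */
  res = bpf_rbtree_first(&some_rb_root);  /* lock protecting the root is held */
  if (res) {
    v = container_of(res, struct nval, rbn);
    /* v->key is readable here, inside the critical section */
  }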

Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
---
 kernel/bpf/helpers.c  | 28 ++++++++++++++++++++++++++++
 kernel/bpf/verifier.c | 14 +++++++++++++-
 2 files changed, 41 insertions(+), 1 deletion(-)

diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
index 192184b5156e..cea42b31e9b8 100644
--- a/kernel/bpf/helpers.c
+++ b/kernel/bpf/helpers.c
@@ -1884,6 +1884,30 @@ __bpf_kfunc struct bpf_list_node *bpf_list_pop_back(struct bpf_list_head *head)
 	return __bpf_list_del(head, true);
 }
 
+struct bpf_rb_node *bpf_rbtree_remove(struct bpf_rb_root *root, struct bpf_rb_node *node)
+{
+	struct rb_root_cached *r = (struct rb_root_cached *)root;
+	struct rb_node *n = (struct rb_node *)node;
+
+	rb_erase_cached(n, r);
+	RB_CLEAR_NODE(n);
+	return (struct bpf_rb_node *)n;
+}
+
+void bpf_rbtree_add(struct bpf_rb_root *root, struct bpf_rb_node *node,
+		    bool (less)(struct bpf_rb_node *a, const struct bpf_rb_node *b))
+{
+	rb_add_cached((struct rb_node *)node, (struct rb_root_cached *)root,
+		      (bool (*)(struct rb_node *, const struct rb_node *))less);
+}
+
+struct bpf_rb_node *bpf_rbtree_first(struct bpf_rb_root *root)
+{
+	struct rb_root_cached *r = (struct rb_root_cached *)root;
+
+	return (struct bpf_rb_node *)rb_first_cached(r);
+}
+
 /**
  * bpf_task_acquire - Acquire a reference to a task. A task acquired by this
  * kfunc which is not stored in a map as a kptr, must be released by calling
@@ -2108,6 +2132,10 @@ BTF_ID_FLAGS(func, bpf_task_acquire, KF_ACQUIRE | KF_TRUSTED_ARGS)
 BTF_ID_FLAGS(func, bpf_task_acquire_not_zero, KF_ACQUIRE | KF_RCU | KF_RET_NULL)
 BTF_ID_FLAGS(func, bpf_task_kptr_get, KF_ACQUIRE | KF_KPTR_GET | KF_RET_NULL)
 BTF_ID_FLAGS(func, bpf_task_release, KF_RELEASE)
+BTF_ID_FLAGS(func, bpf_rbtree_remove, KF_ACQUIRE)
+BTF_ID_FLAGS(func, bpf_rbtree_add)
+BTF_ID_FLAGS(func, bpf_rbtree_first, KF_RET_NULL)
+
 #ifdef CONFIG_CGROUPS
 BTF_ID_FLAGS(func, bpf_cgroup_acquire, KF_ACQUIRE | KF_TRUSTED_ARGS)
 BTF_ID_FLAGS(func, bpf_cgroup_kptr_get, KF_ACQUIRE | KF_KPTR_GET | KF_RET_NULL)
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index ec4d7e19ed41..58416aef7c0b 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -8622,6 +8622,8 @@ BTF_ID_LIST(kf_arg_btf_ids)
 BTF_ID(struct, bpf_dynptr_kern)
 BTF_ID(struct, bpf_list_head)
 BTF_ID(struct, bpf_list_node)
+BTF_ID(struct, bpf_rb_root)
+BTF_ID(struct, bpf_rb_node)
 
 static bool __is_kfunc_ptr_arg_type(const struct btf *btf,
 				    const struct btf_param *arg, int type)
@@ -8727,6 +8729,9 @@ enum special_kfunc_type {
 	KF_bpf_rdonly_cast,
 	KF_bpf_rcu_read_lock,
 	KF_bpf_rcu_read_unlock,
+	KF_bpf_rbtree_remove,
+	KF_bpf_rbtree_add,
+	KF_bpf_rbtree_first,
 };
 
 BTF_SET_START(special_kfunc_set)
@@ -8738,6 +8743,9 @@ BTF_ID(func, bpf_list_pop_front)
 BTF_ID(func, bpf_list_pop_back)
 BTF_ID(func, bpf_cast_to_kern_ctx)
 BTF_ID(func, bpf_rdonly_cast)
+BTF_ID(func, bpf_rbtree_remove)
+BTF_ID(func, bpf_rbtree_add)
+BTF_ID(func, bpf_rbtree_first)
 BTF_SET_END(special_kfunc_set)
 
 BTF_ID_LIST(special_kfunc_list)
@@ -8751,6 +8759,9 @@ BTF_ID(func, bpf_cast_to_kern_ctx)
 BTF_ID(func, bpf_rdonly_cast)
 BTF_ID(func, bpf_rcu_read_lock)
 BTF_ID(func, bpf_rcu_read_unlock)
+BTF_ID(func, bpf_rbtree_remove)
+BTF_ID(func, bpf_rbtree_add)
+BTF_ID(func, bpf_rbtree_first)
 
 static bool is_kfunc_bpf_rcu_read_lock(struct bpf_kfunc_call_arg_meta *meta)
 {
@@ -9540,7 +9551,8 @@ static int check_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
 	}
 
 	if (meta.func_id == special_kfunc_list[KF_bpf_list_push_front] ||
-	    meta.func_id == special_kfunc_list[KF_bpf_list_push_back]) {
+	    meta.func_id == special_kfunc_list[KF_bpf_list_push_back] ||
+	    meta.func_id == special_kfunc_list[KF_bpf_rbtree_add]) {
 		release_ref_obj_id = regs[BPF_REG_2].ref_obj_id;
 		err = ref_convert_owning_non_owning(env, release_ref_obj_id);
 		if (err) {
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH v4 bpf-next 06/11] bpf: Add support for bpf_rb_root and bpf_rb_node in kfunc args
  2023-02-09 17:41 [PATCH v4 bpf-next 00/11] BPF rbtree next-gen datastructure Dave Marchevsky
                   ` (4 preceding siblings ...)
  2023-02-09 17:41 ` [PATCH v4 bpf-next 05/11] bpf: Add bpf_rbtree_{add,remove,first} kfuncs Dave Marchevsky
@ 2023-02-09 17:41 ` Dave Marchevsky
  2023-02-09 17:41 ` [PATCH v4 bpf-next 07/11] bpf: Add callback validation to kfunc verifier logic Dave Marchevsky
                   ` (4 subsequent siblings)
  10 siblings, 0 replies; 27+ messages in thread
From: Dave Marchevsky @ 2023-02-09 17:41 UTC (permalink / raw)
  To: bpf
  Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Kernel Team, Kumar Kartikeya Dwivedi, Tejun Heo, Dave Marchevsky

Now that we find bpf_rb_root and bpf_rb_node in structs, let's give args
that contain those types special classification and properly handle
these types when checking kfunc args.

"Properly handling" these types largely requires generalizing similar
handling for bpf_list_{head,node}, with little new logic added in this
patch.
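
For example, after this patch a bpf_rb_root argument may come either from
an allocated object or from a map value. A minimal sketch of the map-value
case, using the __contains() annotation and libbpf map-definition macros
from the selftests; 'struct nval', 'rbn', 'map_val' and 'rb_map' are
illustrative names, and error handling is elided:

  struct nval {
    long key;
    struct bpf_rb_node rbn;
  };

  struct map_val {
    struct bpf_spin_lock lock;
    struct bpf_rb_root root __contains(nval, rbn);
  };

  struct {
    __uint(type, BPF_MAP_TYPE_ARRAY);
    __uint(max_entries, 1);
    __type(key, int);
    __type(value, struct map_val);
  } rb_map SEC(".maps");

  /* ... in a BPF program, with 'v' looked up from rb_map: '&v->root' is
   * a PTR_TO_MAP_VALUE at the root field's offset, which the new
   * KF_ARG_PTR_TO_RB_ROOT handling accepts, provided the bpf_spin_lock
   * in the same map value is held.
   */
  struct bpf_rb_node *res;

  bpf_spin_lock(&v->lock);
  res = bpf_rbtree_first(&v->root);
  bpf_spin_unlock(&v->lock);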

Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
---
 kernel/bpf/verifier.c | 238 +++++++++++++++++++++++++++++++++++-------
 1 file changed, 203 insertions(+), 35 deletions(-)

diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 58416aef7c0b..8a47bc9617e1 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -8505,6 +8505,9 @@ struct bpf_kfunc_call_arg_meta {
 	struct {
 		struct btf_field *field;
 	} arg_list_head;
+	struct {
+		struct btf_field *field;
+	} arg_rbtree_root;
 };
 
 static bool is_kfunc_acquire(struct bpf_kfunc_call_arg_meta *meta)
@@ -8616,6 +8619,8 @@ enum {
 	KF_ARG_DYNPTR_ID,
 	KF_ARG_LIST_HEAD_ID,
 	KF_ARG_LIST_NODE_ID,
+	KF_ARG_RB_ROOT_ID,
+	KF_ARG_RB_NODE_ID,
 };
 
 BTF_ID_LIST(kf_arg_btf_ids)
@@ -8657,6 +8662,16 @@ static bool is_kfunc_arg_list_node(const struct btf *btf, const struct btf_param
 	return __is_kfunc_ptr_arg_type(btf, arg, KF_ARG_LIST_NODE_ID);
 }
 
+static bool is_kfunc_arg_rbtree_root(const struct btf *btf, const struct btf_param *arg)
+{
+	return __is_kfunc_ptr_arg_type(btf, arg, KF_ARG_RB_ROOT_ID);
+}
+
+static bool is_kfunc_arg_rbtree_node(const struct btf *btf, const struct btf_param *arg)
+{
+	return __is_kfunc_ptr_arg_type(btf, arg, KF_ARG_RB_NODE_ID);
+}
+
 /* Returns true if struct is composed of scalars, 4 levels of nesting allowed */
 static bool __btf_type_is_scalar_struct(struct bpf_verifier_env *env,
 					const struct btf *btf,
@@ -8716,6 +8731,8 @@ enum kfunc_ptr_arg_type {
 	KF_ARG_PTR_TO_BTF_ID,	     /* Also covers reg2btf_ids conversions */
 	KF_ARG_PTR_TO_MEM,
 	KF_ARG_PTR_TO_MEM_SIZE,	     /* Size derived from next argument, skip it */
+	KF_ARG_PTR_TO_RB_ROOT,
+	KF_ARG_PTR_TO_RB_NODE,
 };
 
 enum special_kfunc_type {
@@ -8823,6 +8840,12 @@ get_kfunc_ptr_arg_type(struct bpf_verifier_env *env,
 	if (is_kfunc_arg_list_node(meta->btf, &args[argno]))
 		return KF_ARG_PTR_TO_LIST_NODE;
 
+	if (is_kfunc_arg_rbtree_root(meta->btf, &args[argno]))
+		return KF_ARG_PTR_TO_RB_ROOT;
+
+	if (is_kfunc_arg_rbtree_node(meta->btf, &args[argno]))
+		return KF_ARG_PTR_TO_RB_NODE;
+
 	if ((base_type(reg->type) == PTR_TO_BTF_ID || reg2btf_ids[base_type(reg->type)])) {
 		if (!btf_type_is_struct(ref_t)) {
 			verbose(env, "kernel function %s args#%d pointer type %s %s is not supported\n",
@@ -9079,95 +9102,193 @@ static bool is_bpf_list_api_kfunc(u32 btf_id)
 	       btf_id == special_kfunc_list[KF_bpf_list_pop_back];
 }
 
-static int process_kf_arg_ptr_to_list_head(struct bpf_verifier_env *env,
-					   struct bpf_reg_state *reg, u32 regno,
-					   struct bpf_kfunc_call_arg_meta *meta)
+static bool is_bpf_rbtree_api_kfunc(u32 btf_id)
+{
+	return btf_id == special_kfunc_list[KF_bpf_rbtree_add] ||
+	       btf_id == special_kfunc_list[KF_bpf_rbtree_remove] ||
+	       btf_id == special_kfunc_list[KF_bpf_rbtree_first];
+}
+
+static bool is_bpf_graph_api_kfunc(u32 btf_id)
+{
+	return is_bpf_list_api_kfunc(btf_id) || is_bpf_rbtree_api_kfunc(btf_id);
+}
+
+static bool check_kfunc_is_graph_root_api(struct bpf_verifier_env *env,
+					  enum btf_field_type head_field_type,
+					  u32 kfunc_btf_id)
 {
+	bool ret;
+
+	switch (head_field_type) {
+	case BPF_LIST_HEAD:
+		ret = is_bpf_list_api_kfunc(kfunc_btf_id);
+		break;
+	case BPF_RB_ROOT:
+		ret = is_bpf_rbtree_api_kfunc(kfunc_btf_id);
+		break;
+	default:
+		verbose(env, "verifier internal error: unexpected graph root argument type %s\n",
+			btf_field_type_name(head_field_type));
+		return false;
+	}
+
+	if (!ret)
+		verbose(env, "verifier internal error: %s head arg for unknown kfunc\n",
+			btf_field_type_name(head_field_type));
+	return ret;
+}
+
+static bool check_kfunc_is_graph_node_api(struct bpf_verifier_env *env,
+					  enum btf_field_type node_field_type,
+					  u32 kfunc_btf_id)
+{
+	bool ret;
+
+	switch (node_field_type) {
+	case BPF_LIST_NODE:
+		ret = (kfunc_btf_id == special_kfunc_list[KF_bpf_list_push_front] ||
+		       kfunc_btf_id == special_kfunc_list[KF_bpf_list_push_back]);
+		break;
+	case BPF_RB_NODE:
+		ret = (kfunc_btf_id == special_kfunc_list[KF_bpf_rbtree_remove] ||
+		       kfunc_btf_id == special_kfunc_list[KF_bpf_rbtree_add]);
+		break;
+	default:
+		verbose(env, "verifier internal error: unexpected graph node argument type %s\n",
+			btf_field_type_name(node_field_type));
+		return false;
+	}
+
+	if (!ret)
+		verbose(env, "verifier internal error: %s node arg for unknown kfunc\n",
+			btf_field_type_name(node_field_type));
+	return ret;
+}
+
+static int
+__process_kf_arg_ptr_to_graph_root(struct bpf_verifier_env *env,
+				   struct bpf_reg_state *reg, u32 regno,
+				   struct bpf_kfunc_call_arg_meta *meta,
+				   enum btf_field_type head_field_type,
+				   struct btf_field **head_field)
+{
+	const char *head_type_name;
 	struct btf_field *field;
 	struct btf_record *rec;
-	u32 list_head_off;
+	u32 head_off;
 
-	if (meta->btf != btf_vmlinux || !is_bpf_list_api_kfunc(meta->func_id)) {
-		verbose(env, "verifier internal error: bpf_list_head argument for unknown kfunc\n");
+	if (meta->btf != btf_vmlinux) {
+		verbose(env, "verifier internal error: unexpected btf mismatch in kfunc call\n");
 		return -EFAULT;
 	}
 
+	if (!check_kfunc_is_graph_root_api(env, head_field_type, meta->func_id))
+		return -EFAULT;
+
+	head_type_name = btf_field_type_name(head_field_type);
 	if (!tnum_is_const(reg->var_off)) {
 		verbose(env,
-			"R%d doesn't have constant offset. bpf_list_head has to be at the constant offset\n",
-			regno);
+			"R%d doesn't have constant offset. %s has to be at the constant offset\n",
+			regno, head_type_name);
 		return -EINVAL;
 	}
 
 	rec = reg_btf_record(reg);
-	list_head_off = reg->off + reg->var_off.value;
-	field = btf_record_find(rec, list_head_off, BPF_LIST_HEAD);
+	head_off = reg->off + reg->var_off.value;
+	field = btf_record_find(rec, head_off, head_field_type);
 	if (!field) {
-		verbose(env, "bpf_list_head not found at offset=%u\n", list_head_off);
+		verbose(env, "%s not found at offset=%u\n", head_type_name, head_off);
 		return -EINVAL;
 	}
 
 	/* All functions require bpf_list_head to be protected using a bpf_spin_lock */
 	if (check_reg_allocation_locked(env, reg)) {
-		verbose(env, "bpf_spin_lock at off=%d must be held for bpf_list_head\n",
-			rec->spin_lock_off);
+		verbose(env, "bpf_spin_lock at off=%d must be held for %s\n",
+			rec->spin_lock_off, head_type_name);
 		return -EINVAL;
 	}
 
-	if (meta->arg_list_head.field) {
-		verbose(env, "verifier internal error: repeating bpf_list_head arg\n");
+	if (*head_field) {
+		verbose(env, "verifier internal error: repeating %s arg\n", head_type_name);
 		return -EFAULT;
 	}
-	meta->arg_list_head.field = field;
+	*head_field = field;
 	return 0;
 }
 
-static int process_kf_arg_ptr_to_list_node(struct bpf_verifier_env *env,
+static int process_kf_arg_ptr_to_list_head(struct bpf_verifier_env *env,
 					   struct bpf_reg_state *reg, u32 regno,
 					   struct bpf_kfunc_call_arg_meta *meta)
 {
+	return __process_kf_arg_ptr_to_graph_root(env, reg, regno, meta, BPF_LIST_HEAD,
+							  &meta->arg_list_head.field);
+}
+
+static int process_kf_arg_ptr_to_rbtree_root(struct bpf_verifier_env *env,
+					     struct bpf_reg_state *reg, u32 regno,
+					     struct bpf_kfunc_call_arg_meta *meta)
+{
+	return __process_kf_arg_ptr_to_graph_root(env, reg, regno, meta, BPF_RB_ROOT,
+							  &meta->arg_rbtree_root.field);
+}
+
+static int
+__process_kf_arg_ptr_to_graph_node(struct bpf_verifier_env *env,
+				   struct bpf_reg_state *reg, u32 regno,
+				   struct bpf_kfunc_call_arg_meta *meta,
+				   enum btf_field_type head_field_type,
+				   enum btf_field_type node_field_type,
+				   struct btf_field **node_field)
+{
+	const char *node_type_name;
 	const struct btf_type *et, *t;
 	struct btf_field *field;
-	u32 list_node_off;
+	u32 node_off;
 
-	if (meta->btf != btf_vmlinux ||
-	    (meta->func_id != special_kfunc_list[KF_bpf_list_push_front] &&
-	     meta->func_id != special_kfunc_list[KF_bpf_list_push_back])) {
-		verbose(env, "verifier internal error: bpf_list_node argument for unknown kfunc\n");
+	if (meta->btf != btf_vmlinux) {
+		verbose(env, "verifier internal error: unexpected btf mismatch in kfunc call\n");
 		return -EFAULT;
 	}
 
+	if (!check_kfunc_is_graph_node_api(env, node_field_type, meta->func_id))
+		return -EFAULT;
+
+	node_type_name = btf_field_type_name(node_field_type);
 	if (!tnum_is_const(reg->var_off)) {
 		verbose(env,
-			"R%d doesn't have constant offset. bpf_list_node has to be at the constant offset\n",
-			regno);
+			"R%d doesn't have constant offset. %s has to be at the constant offset\n",
+			regno, node_type_name);
 		return -EINVAL;
 	}
 
-	list_node_off = reg->off + reg->var_off.value;
-	field = reg_find_field_offset(reg, list_node_off, BPF_LIST_NODE);
-	if (!field || field->offset != list_node_off) {
-		verbose(env, "bpf_list_node not found at offset=%u\n", list_node_off);
+	node_off = reg->off + reg->var_off.value;
+	field = reg_find_field_offset(reg, node_off, node_field_type);
+	if (!field || field->offset != node_off) {
+		verbose(env, "%s not found at offset=%u\n", node_type_name, node_off);
 		return -EINVAL;
 	}
 
-	field = meta->arg_list_head.field;
+	field = *node_field;
 
 	et = btf_type_by_id(field->graph_root.btf, field->graph_root.value_btf_id);
 	t = btf_type_by_id(reg->btf, reg->btf_id);
 	if (!btf_struct_ids_match(&env->log, reg->btf, reg->btf_id, 0, field->graph_root.btf,
 				  field->graph_root.value_btf_id, true)) {
-		verbose(env, "operation on bpf_list_head expects arg#1 bpf_list_node at offset=%d "
+		verbose(env, "operation on %s expects arg#1 %s at offset=%d "
 			"in struct %s, but arg is at offset=%d in struct %s\n",
+			btf_field_type_name(head_field_type),
+			btf_field_type_name(node_field_type),
 			field->graph_root.node_offset,
 			btf_name_by_offset(field->graph_root.btf, et->name_off),
-			list_node_off, btf_name_by_offset(reg->btf, t->name_off));
+			node_off, btf_name_by_offset(reg->btf, t->name_off));
 		return -EINVAL;
 	}
 
-	if (list_node_off != field->graph_root.node_offset) {
-		verbose(env, "arg#1 offset=%d, but expected bpf_list_node at offset=%d in struct %s\n",
-			list_node_off, field->graph_root.node_offset,
+	if (node_off != field->graph_root.node_offset) {
+		verbose(env, "arg#1 offset=%d, but expected %s at offset=%d in struct %s\n",
+			node_off, btf_field_type_name(node_field_type),
+			field->graph_root.node_offset,
 			btf_name_by_offset(field->graph_root.btf, et->name_off));
 		return -EINVAL;
 	}
@@ -9175,6 +9296,24 @@ static int process_kf_arg_ptr_to_list_node(struct bpf_verifier_env *env,
 	return 0;
 }
 
+static int process_kf_arg_ptr_to_list_node(struct bpf_verifier_env *env,
+					   struct bpf_reg_state *reg, u32 regno,
+					   struct bpf_kfunc_call_arg_meta *meta)
+{
+	return __process_kf_arg_ptr_to_graph_node(env, reg, regno, meta,
+						  BPF_LIST_HEAD, BPF_LIST_NODE,
+						  &meta->arg_list_head.field);
+}
+
+static int process_kf_arg_ptr_to_rbtree_node(struct bpf_verifier_env *env,
+					     struct bpf_reg_state *reg, u32 regno,
+					     struct bpf_kfunc_call_arg_meta *meta)
+{
+	return __process_kf_arg_ptr_to_graph_node(env, reg, regno, meta,
+						  BPF_RB_ROOT, BPF_RB_NODE,
+						  &meta->arg_rbtree_root.field);
+}
+
 static int check_kfunc_args(struct bpf_verifier_env *env, struct bpf_kfunc_call_arg_meta *meta)
 {
 	const char *func_name = meta->func_name, *ref_tname;
@@ -9309,6 +9448,8 @@ static int check_kfunc_args(struct bpf_verifier_env *env, struct bpf_kfunc_call_
 		case KF_ARG_PTR_TO_DYNPTR:
 		case KF_ARG_PTR_TO_LIST_HEAD:
 		case KF_ARG_PTR_TO_LIST_NODE:
+		case KF_ARG_PTR_TO_RB_ROOT:
+		case KF_ARG_PTR_TO_RB_NODE:
 		case KF_ARG_PTR_TO_MEM:
 		case KF_ARG_PTR_TO_MEM_SIZE:
 			/* Trusted by default */
@@ -9387,6 +9528,20 @@ static int check_kfunc_args(struct bpf_verifier_env *env, struct bpf_kfunc_call_
 			if (ret < 0)
 				return ret;
 			break;
+		case KF_ARG_PTR_TO_RB_ROOT:
+			if (reg->type != PTR_TO_MAP_VALUE &&
+			    reg->type != (PTR_TO_BTF_ID | MEM_ALLOC)) {
+				verbose(env, "arg#%d expected pointer to map value or allocated object\n", i);
+				return -EINVAL;
+			}
+			if (reg->type == (PTR_TO_BTF_ID | MEM_ALLOC) && !reg->ref_obj_id) {
+				verbose(env, "allocated object must be referenced\n");
+				return -EINVAL;
+			}
+			ret = process_kf_arg_ptr_to_rbtree_root(env, reg, regno, meta);
+			if (ret < 0)
+				return ret;
+			break;
 		case KF_ARG_PTR_TO_LIST_NODE:
 			if (reg->type != (PTR_TO_BTF_ID | MEM_ALLOC)) {
 				verbose(env, "arg#%d expected pointer to allocated object\n", i);
@@ -9400,6 +9555,19 @@ static int check_kfunc_args(struct bpf_verifier_env *env, struct bpf_kfunc_call_
 			if (ret < 0)
 				return ret;
 			break;
+		case KF_ARG_PTR_TO_RB_NODE:
+			if (reg->type != (PTR_TO_BTF_ID | MEM_ALLOC)) {
+				verbose(env, "arg#%d expected pointer to allocated object\n", i);
+				return -EINVAL;
+			}
+			if (!reg->ref_obj_id) {
+				verbose(env, "allocated object must be referenced\n");
+				return -EINVAL;
+			}
+			ret = process_kf_arg_ptr_to_rbtree_node(env, reg, regno, meta);
+			if (ret < 0)
+				return ret;
+			break;
 		case KF_ARG_PTR_TO_BTF_ID:
 			/* Only base_type is checked, further checks are done here */
 			if ((base_type(reg->type) != PTR_TO_BTF_ID ||
@@ -14399,7 +14567,7 @@ static int do_check(struct bpf_verifier_env *env)
 					if ((insn->src_reg == BPF_REG_0 && insn->imm != BPF_FUNC_spin_unlock) ||
 					    (insn->src_reg == BPF_PSEUDO_CALL) ||
 					    (insn->src_reg == BPF_PSEUDO_KFUNC_CALL &&
-					     (insn->off != 0 || !is_bpf_list_api_kfunc(insn->imm)))) {
+					     (insn->off != 0 || !is_bpf_graph_api_kfunc(insn->imm)))) {
 						verbose(env, "function calls are not allowed while holding a lock\n");
 						return -EINVAL;
 					}
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH v4 bpf-next 07/11] bpf: Add callback validation to kfunc verifier logic
  2023-02-09 17:41 [PATCH v4 bpf-next 00/11] BPF rbtree next-gen datastructure Dave Marchevsky
                   ` (5 preceding siblings ...)
  2023-02-09 17:41 ` [PATCH v4 bpf-next 06/11] bpf: Add support for bpf_rb_root and bpf_rb_node in kfunc args Dave Marchevsky
@ 2023-02-09 17:41 ` Dave Marchevsky
  2023-02-09 17:41 ` [PATCH v4 bpf-next 08/11] bpf: Special verifier handling for bpf_rbtree_{remove, first} Dave Marchevsky
                   ` (3 subsequent siblings)
  10 siblings, 0 replies; 27+ messages in thread
From: Dave Marchevsky @ 2023-02-09 17:41 UTC (permalink / raw)
  To: bpf
  Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Kernel Team, Kumar Kartikeya Dwivedi, Tejun Heo, Dave Marchevsky

Some BPF helpers take a callback function which the helper calls. For
each helper that takes such a callback, there's a special call to
__check_func_call with a callback-state-setting callback that sets up
verifier bpf_func_state for the callback's frame.

kfuncs don't have any of this infrastructure yet, so let's add it in
this patch, following the existing helper pattern as much as possible. To
validate functionality of this added plumbing, this patch adds
callback handling for the bpf_rbtree_add kfunc and hopes to lay
groundwork for future graph datastructure callbacks.

In the "general plumbing" category we have:

  * check_kfunc_call doing callback verification right before clearing
    CALLER_SAVED_REGS, exactly like check_helper_call
  * recognition of func_ptr BTF types in kfunc args as
    KF_ARG_PTR_TO_CALLBACK + propagation of subprogno for this arg type

In the "rbtree_add / graph datastructure-specific plumbing" category:

  * Since bpf_rbtree_add must be called while the spin_lock associated
    with the tree is held, don't complain when the callback's func_state
    doesn't unlock it by frame exit
  * Mark rbtree_add callback's args with ref_set_non_owning_lock
    to prevent rbtree api functions from being called in the callback.
    Semantically this makes sense, as less() takes no ownership of its
    args when determining which comes first (see the sketch below).
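
A minimal sketch of what these constraints mean for a 'less' callback.
'struct nval', 'rbn' and 'nval_less' are illustrative names, and the
rejected calls are shown only as comments:

  struct nval {
    long key;
    struct bpf_rb_node rbn;
  };

  static bool nval_less(struct bpf_rb_node *a, const struct bpf_rb_node *b)
  {
    struct nval *va = container_of(a, struct nval, rbn);
    struct nval *vb = container_of(b, struct nval, rbn);

    /* 'a' and 'b' are set up by set_rbtree_add_callback_state() as
     * non-owning references, so plain field reads like the ones below
     * are fine. By contrast,
     *
     *   bpf_rbtree_add(..., a, ...);  rejected: 'a' is a non-owning ref
     *   bpf_spin_unlock(...);         rejected: no spin_{lock,unlock} in the cb
     *
     * would fail verification, and the callback's return value is
     * restricted to the range [0, 1].
     */
    return va->key < vb->key;
  }

The callback is then passed as the third argument to bpf_rbtree_add()
while the bpf_spin_lock protecting the tree is held; per the exception
above, the callback frame itself is not required to release that lock.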

Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
---
 kernel/bpf/verifier.c | 136 ++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 130 insertions(+), 6 deletions(-)

diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 8a47bc9617e1..842f8b98c9b4 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -191,6 +191,7 @@ struct bpf_verifier_stack_elem {
 static int acquire_reference_state(struct bpf_verifier_env *env, int insn_idx);
 static int release_reference(struct bpf_verifier_env *env, int ref_obj_id);
 static void invalidate_non_owning_refs(struct bpf_verifier_env *env);
+static bool in_rbtree_lock_required_cb(struct bpf_verifier_env *env);
 static int ref_set_non_owning_lock(struct bpf_verifier_env *env,
 				   struct bpf_reg_state *reg);
 
@@ -1637,6 +1638,16 @@ static void mark_ptr_not_null_reg(struct bpf_reg_state *reg)
 	reg->type &= ~PTR_MAYBE_NULL;
 }
 
+static void mark_reg_graph_node(struct bpf_reg_state *regs, u32 regno,
+				struct btf_field_graph_root *ds_head)
+{
+	__mark_reg_known_zero(&regs[regno]);
+	regs[regno].type = PTR_TO_BTF_ID | MEM_ALLOC;
+	regs[regno].btf = ds_head->btf;
+	regs[regno].btf_id = ds_head->value_btf_id;
+	regs[regno].off = ds_head->node_offset;
+}
+
 static bool reg_is_pkt_pointer(const struct bpf_reg_state *reg)
 {
 	return type_is_pkt_pointer(reg->type);
@@ -6820,6 +6831,10 @@ static int check_func_arg(struct bpf_verifier_env *env, u32 arg,
 		meta->ret_btf_id = reg->btf_id;
 		break;
 	case ARG_PTR_TO_SPIN_LOCK:
+		if (in_rbtree_lock_required_cb(env)) {
+			verbose(env, "can't spin_{lock,unlock} in rbtree cb\n");
+			return -EACCES;
+		}
 		if (meta->func_id == BPF_FUNC_spin_lock) {
 			err = process_spin_lock(env, regno, true);
 			if (err)
@@ -7404,6 +7419,8 @@ static int set_callee_state(struct bpf_verifier_env *env,
 			    struct bpf_func_state *caller,
 			    struct bpf_func_state *callee, int insn_idx);
 
+static bool is_callback_calling_kfunc(u32 btf_id);
+
 static int __check_func_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
 			     int *insn_idx, int subprog,
 			     set_callee_state_fn set_callee_state_cb)
@@ -7458,10 +7475,18 @@ static int __check_func_call(struct bpf_verifier_env *env, struct bpf_insn *insn
 	 * interested in validating only BPF helpers that can call subprogs as
 	 * callbacks
 	 */
-	if (set_callee_state_cb != set_callee_state && !is_callback_calling_function(insn->imm)) {
-		verbose(env, "verifier bug: helper %s#%d is not marked as callback-calling\n",
-			func_id_name(insn->imm), insn->imm);
-		return -EFAULT;
+	if (set_callee_state_cb != set_callee_state) {
+		if (bpf_pseudo_kfunc_call(insn) &&
+		    !is_callback_calling_kfunc(insn->imm)) {
+			verbose(env, "verifier bug: kfunc %s#%d not marked as callback-calling\n",
+				func_id_name(insn->imm), insn->imm);
+			return -EFAULT;
+		} else if (!bpf_pseudo_kfunc_call(insn) &&
+			   !is_callback_calling_function(insn->imm)) { /* helper */
+			verbose(env, "verifier bug: helper %s#%d not marked as callback-calling\n",
+				func_id_name(insn->imm), insn->imm);
+			return -EFAULT;
+		}
 	}
 
 	if (insn->code == (BPF_JMP | BPF_CALL) &&
@@ -7726,6 +7751,63 @@ static int set_user_ringbuf_callback_state(struct bpf_verifier_env *env,
 	return 0;
 }
 
+static int set_rbtree_add_callback_state(struct bpf_verifier_env *env,
+					 struct bpf_func_state *caller,
+					 struct bpf_func_state *callee,
+					 int insn_idx)
+{
+	/* void bpf_rbtree_add(struct bpf_rb_root *root, struct bpf_rb_node *node,
+	 *                     bool (less)(struct bpf_rb_node *a, const struct bpf_rb_node *b));
+	 *
+	 * 'struct bpf_rb_node *node' arg to bpf_rbtree_add is the same PTR_TO_BTF_ID w/ offset
+	 * that 'less' callback args will be receiving. However, 'node' arg was release_reference'd
+	 * by this point, so look at 'root'
+	 */
+	struct btf_field *field;
+
+	field = reg_find_field_offset(&caller->regs[BPF_REG_1], caller->regs[BPF_REG_1].off,
+				      BPF_RB_ROOT);
+	if (!field || !field->graph_root.value_btf_id)
+		return -EFAULT;
+
+	mark_reg_graph_node(callee->regs, BPF_REG_1, &field->graph_root);
+	ref_set_non_owning_lock(env, &callee->regs[BPF_REG_1]);
+	mark_reg_graph_node(callee->regs, BPF_REG_2, &field->graph_root);
+	ref_set_non_owning_lock(env, &callee->regs[BPF_REG_2]);
+
+	__mark_reg_not_init(env, &callee->regs[BPF_REG_3]);
+	__mark_reg_not_init(env, &callee->regs[BPF_REG_4]);
+	__mark_reg_not_init(env, &callee->regs[BPF_REG_5]);
+	callee->in_callback_fn = true;
+	callee->callback_ret_range = tnum_range(0, 1);
+	return 0;
+}
+
+static bool is_rbtree_lock_required_kfunc(u32 btf_id);
+
+/* Are we currently verifying the callback for a rbtree helper that must
+ * be called with lock held? If so, no need to complain about unreleased
+ * lock
+ */
+static bool in_rbtree_lock_required_cb(struct bpf_verifier_env *env)
+{
+	struct bpf_verifier_state *state = env->cur_state;
+	struct bpf_insn *insn = env->prog->insnsi;
+	struct bpf_func_state *callee;
+	int kfunc_btf_id;
+
+	if (!state->curframe)
+		return false;
+
+	callee = state->frame[state->curframe];
+
+	if (!callee->in_callback_fn)
+		return false;
+
+	kfunc_btf_id = insn[callee->callsite].imm;
+	return is_rbtree_lock_required_kfunc(kfunc_btf_id);
+}
+
 static int prepare_func_exit(struct bpf_verifier_env *env, int *insn_idx)
 {
 	struct bpf_verifier_state *state = env->cur_state;
@@ -8494,6 +8576,7 @@ struct bpf_kfunc_call_arg_meta {
 	bool r0_rdonly;
 	u32 ret_btf_id;
 	u64 r0_size;
+	u32 subprogno;
 	struct {
 		u64 value;
 		bool found;
@@ -8672,6 +8755,18 @@ static bool is_kfunc_arg_rbtree_node(const struct btf *btf, const struct btf_par
 	return __is_kfunc_ptr_arg_type(btf, arg, KF_ARG_RB_NODE_ID);
 }
 
+static bool is_kfunc_arg_callback(struct bpf_verifier_env *env, const struct btf *btf,
+				  const struct btf_param *arg)
+{
+	const struct btf_type *t;
+
+	t = btf_type_resolve_func_ptr(btf, arg->type, NULL);
+	if (!t)
+		return false;
+
+	return true;
+}
+
 /* Returns true if struct is composed of scalars, 4 levels of nesting allowed */
 static bool __btf_type_is_scalar_struct(struct bpf_verifier_env *env,
 					const struct btf *btf,
@@ -8731,6 +8826,7 @@ enum kfunc_ptr_arg_type {
 	KF_ARG_PTR_TO_BTF_ID,	     /* Also covers reg2btf_ids conversions */
 	KF_ARG_PTR_TO_MEM,
 	KF_ARG_PTR_TO_MEM_SIZE,	     /* Size derived from next argument, skip it */
+	KF_ARG_PTR_TO_CALLBACK,
 	KF_ARG_PTR_TO_RB_ROOT,
 	KF_ARG_PTR_TO_RB_NODE,
 };
@@ -8855,6 +8951,9 @@ get_kfunc_ptr_arg_type(struct bpf_verifier_env *env,
 		return KF_ARG_PTR_TO_BTF_ID;
 	}
 
+	if (is_kfunc_arg_callback(env, meta->btf, &args[argno]))
+		return KF_ARG_PTR_TO_CALLBACK;
+
 	if (argno + 1 < nargs && is_kfunc_arg_mem_size(meta->btf, &args[argno + 1], &regs[regno + 1]))
 		arg_mem_size = true;
 
@@ -9114,6 +9213,16 @@ static bool is_bpf_graph_api_kfunc(u32 btf_id)
 	return is_bpf_list_api_kfunc(btf_id) || is_bpf_rbtree_api_kfunc(btf_id);
 }
 
+static bool is_callback_calling_kfunc(u32 btf_id)
+{
+	return btf_id == special_kfunc_list[KF_bpf_rbtree_add];
+}
+
+static bool is_rbtree_lock_required_kfunc(u32 btf_id)
+{
+	return is_bpf_rbtree_api_kfunc(btf_id);
+}
+
 static bool check_kfunc_is_graph_root_api(struct bpf_verifier_env *env,
 					  enum btf_field_type head_field_type,
 					  u32 kfunc_btf_id)
@@ -9452,6 +9561,7 @@ static int check_kfunc_args(struct bpf_verifier_env *env, struct bpf_kfunc_call_
 		case KF_ARG_PTR_TO_RB_NODE:
 		case KF_ARG_PTR_TO_MEM:
 		case KF_ARG_PTR_TO_MEM_SIZE:
+		case KF_ARG_PTR_TO_CALLBACK:
 			/* Trusted by default */
 			break;
 		default:
@@ -9603,6 +9713,9 @@ static int check_kfunc_args(struct bpf_verifier_env *env, struct bpf_kfunc_call_
 			/* Skip next '__sz' argument */
 			i++;
 			break;
+		case KF_ARG_PTR_TO_CALLBACK:
+			meta->subprogno = reg->subprogno;
+			break;
 		}
 	}
 
@@ -9737,6 +9850,16 @@ static int check_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
 		}
 	}
 
+	if (meta.func_id == special_kfunc_list[KF_bpf_rbtree_add]) {
+		err = __check_func_call(env, insn, insn_idx_p, meta.subprogno,
+					set_rbtree_add_callback_state);
+		if (err) {
+			verbose(env, "kfunc %s#%d failed callback verification\n",
+				func_name, func_id);
+			return err;
+		}
+	}
+
 	for (i = 0; i < CALLER_SAVED_REGS; i++)
 		mark_reg_not_init(env, regs, caller_saved[i]);
 
@@ -13646,7 +13769,7 @@ static bool regs_exact(const struct bpf_reg_state *rold,
 		       const struct bpf_reg_state *rcur,
 		       struct bpf_id_pair *idmap)
 {
-	return memcmp(rold, rcur, offsetof(struct bpf_reg_state, id)) == 0 && 
+	return memcmp(rold, rcur, offsetof(struct bpf_reg_state, id)) == 0 &&
 	       check_ids(rold->id, rcur->id, idmap) &&
 	       check_ids(rold->ref_obj_id, rcur->ref_obj_id, idmap);
 }
@@ -14603,7 +14726,8 @@ static int do_check(struct bpf_verifier_env *env)
 					return -EINVAL;
 				}
 
-				if (env->cur_state->active_lock.ptr) {
+				if (env->cur_state->active_lock.ptr &&
+				    !in_rbtree_lock_required_cb(env)) {
 					verbose(env, "bpf_spin_unlock is missing\n");
 					return -EINVAL;
 				}
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH v4 bpf-next 08/11] bpf: Special verifier handling for bpf_rbtree_{remove, first}
  2023-02-09 17:41 [PATCH v4 bpf-next 00/11] BPF rbtree next-gen datastructure Dave Marchevsky
                   ` (6 preceding siblings ...)
  2023-02-09 17:41 ` [PATCH v4 bpf-next 07/11] bpf: Add callback validation to kfunc verifier logic Dave Marchevsky
@ 2023-02-09 17:41 ` Dave Marchevsky
  2023-02-10  3:11   ` Alexei Starovoitov
  2023-02-10 13:55   ` Kumar Kartikeya Dwivedi
  2023-02-09 17:41 ` [PATCH v4 bpf-next 09/11] bpf: Add bpf_rbtree_{add,remove,first} decls to bpf_experimental.h Dave Marchevsky
                   ` (2 subsequent siblings)
  10 siblings, 2 replies; 27+ messages in thread
From: Dave Marchevsky @ 2023-02-09 17:41 UTC (permalink / raw)
  To: bpf
  Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Kernel Team, Kumar Kartikeya Dwivedi, Tejun Heo, Dave Marchevsky

Newly-added bpf_rbtree_{remove,first} kfuncs have some special properties
that require handling in the verifier:

  * both bpf_rbtree_remove and bpf_rbtree_first return the type containing
    the bpf_rb_node field, with the offset set to that field's offset,
    instead of a struct bpf_rb_node *
    * mark_reg_graph_node helper added in previous patch generalizes
      this logic, use it

  * bpf_rbtree_remove's node input is a node that's been inserted
    in the tree - a non-owning reference.

  * bpf_rbtree_remove must invalidate non-owning references in order to
    avoid aliasing issues. Use the previously-added
    invalidate_non_owning_refs helper to mark this function as a
    non-owning ref invalidation point (see the sketch below).

  * Unlike other functions, which convert one of their input arg regs to
    a non-owning reference, bpf_rbtree_first takes no node argument and
    just returns a non-owning reference (possibly NULL)
    * For now, verifier logic for this is special-cased instead of
      adding a new kfunc flag.

This patch, along with the previous one, completes special verifier
handling for all rbtree API functions added in this series.
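
As a sketch of the aliasing problem this invalidation prevents (names are
illustrative; assume a root/lock pair declared elsewhere and a node type
'struct nval' whose bpf_rb_node lives in field 'rbn'):

  struct bpf_rb_node *a, *b, *removed = NULL;

  bpf_spin_lock(&lock);
  a = bpf_rbtree_first(&root);  /* non-owning ref */
  b = bpf_rbtree_first(&root);  /* non-owning ref, may alias 'a' */
  if (a && b) {
    /* Returns an owning ref (or NULL) and invalidates every other
     * non-owning ref, 'b' included, so a subsequent
     * bpf_rbtree_remove(&root, b) would be rejected by the verifier
     * rather than potentially double-removing the same node.
     */
    removed = bpf_rbtree_remove(&root, a);
  }
  bpf_spin_unlock(&lock);

  if (removed)
    bpf_obj_drop(container_of(removed, struct nval, rbn));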

Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
---
 kernel/bpf/verifier.c | 29 +++++++++++++++++++++++------
 1 file changed, 23 insertions(+), 6 deletions(-)

diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 842f8b98c9b4..269853c632f3 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -9670,10 +9670,20 @@ static int check_kfunc_args(struct bpf_verifier_env *env, struct bpf_kfunc_call_
 				verbose(env, "arg#%d expected pointer to allocated object\n", i);
 				return -EINVAL;
 			}
-			if (!reg->ref_obj_id) {
+			if (meta->func_id == special_kfunc_list[KF_bpf_rbtree_remove]) {
+				if (reg->ref_obj_id) {
+					verbose(env, "rbtree_remove node input must be non-owning ref\n");
+					return -EINVAL;
+				}
+				if (in_rbtree_lock_required_cb(env)) {
+					verbose(env, "rbtree_remove not allowed in rbtree cb\n");
+					return -EINVAL;
+				}
+			} else if (!reg->ref_obj_id) {
 				verbose(env, "allocated object must be referenced\n");
 				return -EINVAL;
 			}
+
 			ret = process_kf_arg_ptr_to_rbtree_node(env, reg, regno, meta);
 			if (ret < 0)
 				return ret;
@@ -9924,11 +9934,12 @@ static int check_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
 				   meta.func_id == special_kfunc_list[KF_bpf_list_pop_back]) {
 				struct btf_field *field = meta.arg_list_head.field;
 
-				mark_reg_known_zero(env, regs, BPF_REG_0);
-				regs[BPF_REG_0].type = PTR_TO_BTF_ID | MEM_ALLOC;
-				regs[BPF_REG_0].btf = field->graph_root.btf;
-				regs[BPF_REG_0].btf_id = field->graph_root.value_btf_id;
-				regs[BPF_REG_0].off = field->graph_root.node_offset;
+				mark_reg_graph_node(regs, BPF_REG_0, &field->graph_root);
+			} else if (meta.func_id == special_kfunc_list[KF_bpf_rbtree_remove] ||
+				   meta.func_id == special_kfunc_list[KF_bpf_rbtree_first]) {
+				struct btf_field *field = meta.arg_rbtree_root.field;
+
+				mark_reg_graph_node(regs, BPF_REG_0, &field->graph_root);
 			} else if (meta.func_id == special_kfunc_list[KF_bpf_cast_to_kern_ctx]) {
 				mark_reg_known_zero(env, regs, BPF_REG_0);
 				regs[BPF_REG_0].type = PTR_TO_BTF_ID | PTR_TRUSTED;
@@ -9994,7 +10005,13 @@ static int check_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
 			if (is_kfunc_ret_null(&meta))
 				regs[BPF_REG_0].id = id;
 			regs[BPF_REG_0].ref_obj_id = id;
+		} else if (meta.func_id == special_kfunc_list[KF_bpf_rbtree_first]) {
+			ref_set_non_owning_lock(env, &regs[BPF_REG_0]);
 		}
+
+		if (meta.func_id == special_kfunc_list[KF_bpf_rbtree_remove])
+			invalidate_non_owning_refs(env);
+
 		if (reg_may_point_to_spin_lock(&regs[BPF_REG_0]) && !regs[BPF_REG_0].id)
 			regs[BPF_REG_0].id = ++env->id_gen;
 	} /* else { add_kfunc_call() ensures it is btf_type_is_void(t) } */
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH v4 bpf-next 09/11] bpf: Add bpf_rbtree_{add,remove,first} decls to bpf_experimental.h
  2023-02-09 17:41 [PATCH v4 bpf-next 00/11] BPF rbtree next-gen datastructure Dave Marchevsky
                   ` (7 preceding siblings ...)
  2023-02-09 17:41 ` [PATCH v4 bpf-next 08/11] bpf: Special verifier handling for bpf_rbtree_{remove, first} Dave Marchevsky
@ 2023-02-09 17:41 ` Dave Marchevsky
  2023-02-09 17:41 ` [PATCH v4 bpf-next 10/11] selftests/bpf: Add rbtree selftests Dave Marchevsky
  2023-02-09 17:41 ` [PATCH v4 bpf-next 11/11] bpf, documentation: Add graph documentation for non-owning refs Dave Marchevsky
  10 siblings, 0 replies; 27+ messages in thread
From: Dave Marchevsky @ 2023-02-09 17:41 UTC (permalink / raw)
  To: bpf
  Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Kernel Team, Kumar Kartikeya Dwivedi, Tejun Heo, Dave Marchevsky

These kfuncs will be used by selftests in the following patches.

Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
---
 .../testing/selftests/bpf/bpf_experimental.h  | 24 +++++++++++++++++++
 1 file changed, 24 insertions(+)

diff --git a/tools/testing/selftests/bpf/bpf_experimental.h b/tools/testing/selftests/bpf/bpf_experimental.h
index 424f7bbbfe9b..dbd2c729781a 100644
--- a/tools/testing/selftests/bpf/bpf_experimental.h
+++ b/tools/testing/selftests/bpf/bpf_experimental.h
@@ -65,4 +65,28 @@ extern struct bpf_list_node *bpf_list_pop_front(struct bpf_list_head *head) __ks
  */
 extern struct bpf_list_node *bpf_list_pop_back(struct bpf_list_head *head) __ksym;
 
+/* Description
+ *	Remove 'node' from rbtree with root 'root'
+ * Returns
+ * 	Pointer to the removed node, or NULL if 'root' didn't contain 'node'
+ */
+extern struct bpf_rb_node *bpf_rbtree_remove(struct bpf_rb_root *root,
+					     struct bpf_rb_node *node) __ksym;
+
+/* Description
+ *	Add 'node' to rbtree with root 'root' using comparator 'less'
+ * Returns
+ *	Nothing
+ */
+extern void bpf_rbtree_add(struct bpf_rb_root *root, struct bpf_rb_node *node,
+			   bool (less)(struct bpf_rb_node *a, const struct bpf_rb_node *b)) __ksym;
+
+/* Description
+ *	Return the first (leftmost) node in the input tree
+ * Returns
+ *	Pointer to the node, which is _not_ removed from the tree. If the tree
+ *	contains no nodes, returns NULL.
+ */
+extern struct bpf_rb_node *bpf_rbtree_first(struct bpf_rb_root *root) __ksym;
+
 #endif
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH v4 bpf-next 10/11] selftests/bpf: Add rbtree selftests
  2023-02-09 17:41 [PATCH v4 bpf-next 00/11] BPF rbtree next-gen datastructure Dave Marchevsky
                   ` (8 preceding siblings ...)
  2023-02-09 17:41 ` [PATCH v4 bpf-next 09/11] bpf: Add bpf_rbtree_{add,remove,first} decls to bpf_experimental.h Dave Marchevsky
@ 2023-02-09 17:41 ` Dave Marchevsky
  2023-02-10  2:52   ` Alexei Starovoitov
  2023-02-09 17:41 ` [PATCH v4 bpf-next 11/11] bpf, documentation: Add graph documentation for non-owning refs Dave Marchevsky
  10 siblings, 1 reply; 27+ messages in thread
From: Dave Marchevsky @ 2023-02-09 17:41 UTC (permalink / raw)
  To: bpf
  Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Kernel Team, Kumar Kartikeya Dwivedi, Tejun Heo, Dave Marchevsky

This patch adds selftests exercising the logic changed/added in the
previous patches in the series. A variety of successful and unsuccessful
rbtree usages are validated:

Success:
  * Add some nodes, let the map_value bpf_rb_root destructor clean them
    up
  * Add some nodes, remove one using the non-owning ref leftover by
    successful rbtree_add() call
  * Add some nodes, remove one using the non-owning ref returned by
    rbtree_first() call

Failure:
  * BTF where bpf_rb_root owns bpf_list_node should fail to load
  * BTF where node of type X is added to tree containing nodes of type Y
    should fail to load
  * No calling rbtree api functions in 'less' callback for rbtree_add
  * No releasing lock in 'less' callback for rbtree_add
  * No removing a node which hasn't been added to any tree
  * No adding a node which has already been added to a tree
  * No escaping of non-owning references past their lock's
    critical section (see the sketch below)
  * No escaping of non-owning references past other invalidation points
    (rbtree_remove)

These tests mostly focus on rbtree-specific additions, but some of the
failure cases revalidate scenarios common to both linked_list and rbtree,
which are covered in the former's tests. Better to be a bit redundant in
case linked_list and rbtree semantics deviate over time.
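
For instance, the "escape past the critical section" failure cases boil
down to a pattern like the following sketch (not the literal selftest
code; the root, lock and node names are illustrative):

  struct bpf_rb_node *res;
  struct nval *v;

  bpf_spin_lock(&lock);
  res = bpf_rbtree_first(&root);
  if (!res) {
    bpf_spin_unlock(&lock);
    return 0;
  }
  v = container_of(res, struct nval, rbn);
  bpf_spin_unlock(&lock);  /* non-owning refs do not survive this unlock */

  bpf_spin_lock(&lock);
  /* Rejected: 'v' escaped the critical section that produced it, so it
   * is no longer a valid reference by the time of this call and the
   * program fails to load.
   */
  bpf_rbtree_remove(&root, &v->rbn);
  bpf_spin_unlock(&lock);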

Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
---
 .../testing/selftests/bpf/prog_tests/rbtree.c | 184 +++++++++++
 tools/testing/selftests/bpf/progs/rbtree.c    | 176 +++++++++++
 .../progs/rbtree_btf_fail__add_wrong_type.c   |  52 +++
 .../progs/rbtree_btf_fail__wrong_node_type.c  |  49 +++
 .../testing/selftests/bpf/progs/rbtree_fail.c | 296 ++++++++++++++++++
 5 files changed, 757 insertions(+)
 create mode 100644 tools/testing/selftests/bpf/prog_tests/rbtree.c
 create mode 100644 tools/testing/selftests/bpf/progs/rbtree.c
 create mode 100644 tools/testing/selftests/bpf/progs/rbtree_btf_fail__add_wrong_type.c
 create mode 100644 tools/testing/selftests/bpf/progs/rbtree_btf_fail__wrong_node_type.c
 create mode 100644 tools/testing/selftests/bpf/progs/rbtree_fail.c

diff --git a/tools/testing/selftests/bpf/prog_tests/rbtree.c b/tools/testing/selftests/bpf/prog_tests/rbtree.c
new file mode 100644
index 000000000000..733db8d79a2d
--- /dev/null
+++ b/tools/testing/selftests/bpf/prog_tests/rbtree.c
@@ -0,0 +1,184 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2022 Meta Platforms, Inc. and affiliates. */
+
+#include <test_progs.h>
+#include <network_helpers.h>
+
+#include "rbtree.skel.h"
+#include "rbtree_fail.skel.h"
+#include "rbtree_btf_fail__wrong_node_type.skel.h"
+#include "rbtree_btf_fail__add_wrong_type.skel.h"
+
+static char log_buf[1024 * 1024];
+
+static struct {
+	const char *prog_name;
+	const char *err_msg;
+} rbtree_fail_tests[] = {
+	{"rbtree_api_nolock_add", "bpf_spin_lock at off=16 must be held for bpf_rb_root"},
+	{"rbtree_api_nolock_remove", "bpf_spin_lock at off=16 must be held for bpf_rb_root"},
+	{"rbtree_api_nolock_first", "bpf_spin_lock at off=16 must be held for bpf_rb_root"},
+
+	/* Specific failure string for these three isn't very important, but it shouldn't be
+	 * possible to call rbtree api func from within add() callback
+	 */
+	{"rbtree_api_add_bad_cb_bad_fn_call_add", "allocated object must be referenced"},
+	{"rbtree_api_add_bad_cb_bad_fn_call_remove", "rbtree_remove not allowed in rbtree cb"},
+	{"rbtree_api_add_bad_cb_bad_fn_call_first_unlock_after",
+	 "can't spin_{lock,unlock} in rbtree cb"},
+
+	{"rbtree_api_remove_unadded_node", "rbtree_remove node input must be non-owning ref"},
+	{"rbtree_api_add_to_multiple_trees", "allocated object must be referenced"},
+	{"rbtree_api_add_release_unlock_escape", "arg#1 expected pointer to allocated object"},
+	{"rbtree_api_first_release_unlock_escape", "arg#1 expected pointer to allocated object"},
+	{"rbtree_api_remove_no_drop", "Unreleased reference id=2 alloc_insn=11"},
+	{"rbtree_api_release_aliasing", "arg#1 expected pointer to allocated object"},
+};
+
+static void test_rbtree_fail_prog(const char *prog_name, const char *err_msg)
+{
+	LIBBPF_OPTS(bpf_object_open_opts, opts,
+		    .kernel_log_buf = log_buf,
+		    .kernel_log_size = sizeof(log_buf),
+		    .kernel_log_level = 1
+	);
+	struct rbtree_fail *skel;
+	struct bpf_program *prog;
+	int ret;
+
+	skel = rbtree_fail__open_opts(&opts);
+	if (!ASSERT_OK_PTR(skel, "rbtree_fail__open_opts"))
+		return;
+
+	prog = bpf_object__find_program_by_name(skel->obj, prog_name);
+	if (!ASSERT_OK_PTR(prog, "bpf_object__find_program_by_name"))
+		goto end;
+
+	bpf_program__set_autoload(prog, true);
+
+	ret = rbtree_fail__load(skel);
+	if (!ASSERT_ERR(ret, "rbtree_fail__load must fail"))
+		goto end;
+
+	if (!ASSERT_OK_PTR(strstr(log_buf, err_msg), "expected error message")) {
+		fprintf(stderr, "Expected: %s\n", err_msg);
+		fprintf(stderr, "Verifier: %s\n", log_buf);
+	}
+
+end:
+	rbtree_fail__destroy(skel);
+}
+
+static void test_rbtree_add_nodes(void)
+{
+	LIBBPF_OPTS(bpf_test_run_opts, opts,
+		    .data_in = &pkt_v4,
+		    .data_size_in = sizeof(pkt_v4),
+		    .repeat = 1,
+	);
+	struct rbtree *skel;
+	int ret;
+
+	skel = rbtree__open_and_load();
+	if (!ASSERT_OK_PTR(skel, "rbtree__open_and_load"))
+		return;
+
+	ret = bpf_prog_test_run_opts(bpf_program__fd(skel->progs.rbtree_add_nodes), &opts);
+	ASSERT_OK(ret, "rbtree_add_nodes run");
+	ASSERT_OK(opts.retval, "rbtree_add_nodes retval");
+	ASSERT_EQ(skel->data->less_callback_ran, 1, "rbtree_add_nodes less_callback_ran");
+
+	rbtree__destroy(skel);
+}
+
+static void test_rbtree_add_and_remove(void)
+{
+	LIBBPF_OPTS(bpf_test_run_opts, opts,
+		    .data_in = &pkt_v4,
+		    .data_size_in = sizeof(pkt_v4),
+		    .repeat = 1,
+	);
+	struct rbtree *skel;
+	int ret;
+
+	skel = rbtree__open_and_load();
+	if (!ASSERT_OK_PTR(skel, "rbtree__open_and_load"))
+		return;
+
+	ret = bpf_prog_test_run_opts(bpf_program__fd(skel->progs.rbtree_add_and_remove), &opts);
+	ASSERT_OK(ret, "rbtree_add_and_remove");
+	ASSERT_OK(opts.retval, "rbtree_add_and_remove retval");
+	ASSERT_EQ(skel->data->removed_key, 5, "rbtree_add_and_remove first removed key");
+
+	rbtree__destroy(skel);
+}
+
+static void test_rbtree_first_and_remove(void)
+{
+	LIBBPF_OPTS(bpf_test_run_opts, opts,
+		    .data_in = &pkt_v4,
+		    .data_size_in = sizeof(pkt_v4),
+		    .repeat = 1,
+	);
+	struct rbtree *skel;
+	int ret;
+
+	skel = rbtree__open_and_load();
+	if (!ASSERT_OK_PTR(skel, "rbtree__open_and_load"))
+		return;
+
+	ret = bpf_prog_test_run_opts(bpf_program__fd(skel->progs.rbtree_first_and_remove), &opts);
+	ASSERT_OK(ret, "rbtree_first_and_remove");
+	ASSERT_OK(opts.retval, "rbtree_first_and_remove retval");
+	ASSERT_EQ(skel->data->first_data[0], 2, "rbtree_first_and_remove first rbtree_first()");
+	ASSERT_EQ(skel->data->removed_key, 1, "rbtree_first_and_remove first removed key");
+	ASSERT_EQ(skel->data->first_data[1], 4, "rbtree_first_and_remove second rbtree_first()");
+
+	rbtree__destroy(skel);
+}
+
+void test_rbtree_success(void)
+{
+	if (test__start_subtest("rbtree_add_nodes"))
+		test_rbtree_add_nodes();
+	if (test__start_subtest("rbtree_add_and_remove"))
+		test_rbtree_add_and_remove();
+	if (test__start_subtest("rbtree_first_and_remove"))
+		test_rbtree_first_and_remove();
+}
+
+#define BTF_FAIL_TEST(suffix)									\
+void test_rbtree_btf_fail__##suffix(void)							\
+{												\
+	struct rbtree_btf_fail__##suffix *skel;							\
+												\
+	skel = rbtree_btf_fail__##suffix##__open_and_load();					\
+	if (!ASSERT_ERR_PTR(skel,								\
+			    "rbtree_btf_fail__" #suffix "__open_and_load unexpected success"))	\
+		rbtree_btf_fail__##suffix##__destroy(skel);					\
+}
+
+#define RUN_BTF_FAIL_TEST(suffix)				\
+	if (test__start_subtest("rbtree_btf_fail__" #suffix))	\
+		test_rbtree_btf_fail__##suffix();
+
+BTF_FAIL_TEST(wrong_node_type);
+BTF_FAIL_TEST(add_wrong_type);
+
+void test_rbtree_btf_fail(void)
+{
+	RUN_BTF_FAIL_TEST(wrong_node_type);
+	RUN_BTF_FAIL_TEST(add_wrong_type);
+}
+
+void test_rbtree_fail(void)
+{
+	int i;
+
+	for (i = 0; i < ARRAY_SIZE(rbtree_fail_tests); i++) {
+		if (!test__start_subtest(rbtree_fail_tests[i].prog_name))
+			continue;
+		test_rbtree_fail_prog(rbtree_fail_tests[i].prog_name,
+				      rbtree_fail_tests[i].err_msg);
+	}
+}
diff --git a/tools/testing/selftests/bpf/progs/rbtree.c b/tools/testing/selftests/bpf/progs/rbtree.c
new file mode 100644
index 000000000000..e5db1a4287e5
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/rbtree.c
@@ -0,0 +1,176 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2022 Meta Platforms, Inc. and affiliates. */
+
+#include <vmlinux.h>
+#include <bpf/bpf_tracing.h>
+#include <bpf/bpf_helpers.h>
+#include <bpf/bpf_core_read.h>
+#include "bpf_experimental.h"
+
+struct node_data {
+	long key;
+	long data;
+	struct bpf_rb_node node;
+};
+
+long less_callback_ran = -1;
+long removed_key = -1;
+long first_data[2] = {-1, -1};
+
+#define private(name) SEC(".data." #name) __hidden __attribute__((aligned(8)))
+private(A) struct bpf_spin_lock glock;
+private(A) struct bpf_rb_root groot __contains(node_data, node);
+
+static bool less(struct bpf_rb_node *a, const struct bpf_rb_node *b)
+{
+	struct node_data *node_a;
+	struct node_data *node_b;
+
+	node_a = container_of(a, struct node_data, node);
+	node_b = container_of(b, struct node_data, node);
+	less_callback_ran = 1;
+
+	return node_a->key < node_b->key;
+}
+
+static long __add_three(struct bpf_rb_root *root, struct bpf_spin_lock *lock)
+{
+	struct node_data *n, *m;
+
+	n = bpf_obj_new(typeof(*n));
+	if (!n)
+		return 1;
+	n->key = 5;
+
+	m = bpf_obj_new(typeof(*m));
+	if (!m) {
+		bpf_obj_drop(n);
+		return 2;
+	}
+	m->key = 1;
+
+	bpf_spin_lock(&glock);
+	bpf_rbtree_add(&groot, &n->node, less);
+	bpf_rbtree_add(&groot, &m->node, less);
+	bpf_spin_unlock(&glock);
+
+	n = bpf_obj_new(typeof(*n));
+	if (!n)
+		return 3;
+	n->key = 3;
+
+	bpf_spin_lock(&glock);
+	bpf_rbtree_add(&groot, &n->node, less);
+	bpf_spin_unlock(&glock);
+	return 0;
+}
+
+SEC("tc")
+long rbtree_add_nodes(void *ctx)
+{
+	return __add_three(&groot, &glock);
+}
+
+SEC("tc")
+long rbtree_add_and_remove(void *ctx)
+{
+	struct bpf_rb_node *res = NULL;
+	struct node_data *n, *m = NULL;
+
+	n = bpf_obj_new(typeof(*n));
+	if (!n)
+		goto err_out;
+	n->key = 5;
+
+	m = bpf_obj_new(typeof(*m));
+	if (!m)
+		goto err_out;
+	m->key = 3;
+
+	bpf_spin_lock(&glock);
+	bpf_rbtree_add(&groot, &n->node, less);
+	bpf_rbtree_add(&groot, &m->node, less);
+	res = bpf_rbtree_remove(&groot, &n->node);
+	bpf_spin_unlock(&glock);
+
+	n = container_of(res, struct node_data, node);
+	removed_key = n->key;
+
+	bpf_obj_drop(n);
+
+	return 0;
+err_out:
+	if (n)
+		bpf_obj_drop(n);
+	if (m)
+		bpf_obj_drop(m);
+	return 1;
+}
+
+SEC("tc")
+long rbtree_first_and_remove(void *ctx)
+{
+	struct bpf_rb_node *res = NULL;
+	struct node_data *n, *m, *o;
+
+	n = bpf_obj_new(typeof(*n));
+	if (!n)
+		return 1;
+	n->key = 3;
+	n->data = 4;
+
+	m = bpf_obj_new(typeof(*m));
+	if (!m)
+		goto err_out;
+	m->key = 5;
+	m->data = 6;
+
+	o = bpf_obj_new(typeof(*o));
+	if (!o)
+		goto err_out;
+	o->key = 1;
+	o->data = 2;
+
+	bpf_spin_lock(&glock);
+	bpf_rbtree_add(&groot, &n->node, less);
+	bpf_rbtree_add(&groot, &m->node, less);
+	bpf_rbtree_add(&groot, &o->node, less);
+
+	res = bpf_rbtree_first(&groot);
+	if (!res) {
+		bpf_spin_unlock(&glock);
+		return 2;
+	}
+
+	o = container_of(res, struct node_data, node);
+	first_data[0] = o->data;
+
+	res = bpf_rbtree_remove(&groot, &o->node);
+	bpf_spin_unlock(&glock);
+
+	o = container_of(res, struct node_data, node);
+	removed_key = o->key;
+
+	bpf_obj_drop(o);
+
+	bpf_spin_lock(&glock);
+	res = bpf_rbtree_first(&groot);
+	if (!res) {
+		bpf_spin_unlock(&glock);
+		return 3;
+	}
+
+	o = container_of(res, struct node_data, node);
+	first_data[1] = o->data;
+	bpf_spin_unlock(&glock);
+
+	return 0;
+err_out:
+	if (n)
+		bpf_obj_drop(n);
+	if (m)
+		bpf_obj_drop(m);
+	return 1;
+}
+
+char _license[] SEC("license") = "GPL";
diff --git a/tools/testing/selftests/bpf/progs/rbtree_btf_fail__add_wrong_type.c b/tools/testing/selftests/bpf/progs/rbtree_btf_fail__add_wrong_type.c
new file mode 100644
index 000000000000..60079b202c07
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/rbtree_btf_fail__add_wrong_type.c
@@ -0,0 +1,52 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2022 Meta Platforms, Inc. and affiliates. */
+
+#include <vmlinux.h>
+#include <bpf/bpf_tracing.h>
+#include <bpf/bpf_helpers.h>
+#include <bpf/bpf_core_read.h>
+#include "bpf_experimental.h"
+
+struct node_data {
+	int key;
+	int data;
+	struct bpf_rb_node node;
+};
+
+struct node_data2 {
+	int key;
+	struct bpf_rb_node node;
+	int data;
+};
+
+static bool less2(struct bpf_rb_node *a, const struct bpf_rb_node *b)
+{
+	struct node_data2 *node_a;
+	struct node_data2 *node_b;
+
+	node_a = container_of(a, struct node_data2, node);
+	node_b = container_of(b, struct node_data2, node);
+
+	return node_a->key < node_b->key;
+}
+
+#define private(name) SEC(".data." #name) __hidden __attribute__((aligned(8)))
+private(A) struct bpf_spin_lock glock;
+private(A) struct bpf_rb_root groot __contains(node_data, node);
+
+SEC("tc")
+long rbtree_api_add__add_wrong_type(void *ctx)
+{
+	struct node_data2 *n;
+
+	n = bpf_obj_new(typeof(*n));
+	if (!n)
+		return 1;
+
+	bpf_spin_lock(&glock);
+	bpf_rbtree_add(&groot, &n->node, less2);
+	bpf_spin_unlock(&glock);
+	return 0;
+}
+
+char _license[] SEC("license") = "GPL";
diff --git a/tools/testing/selftests/bpf/progs/rbtree_btf_fail__wrong_node_type.c b/tools/testing/selftests/bpf/progs/rbtree_btf_fail__wrong_node_type.c
new file mode 100644
index 000000000000..340f97da1084
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/rbtree_btf_fail__wrong_node_type.c
@@ -0,0 +1,49 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2022 Meta Platforms, Inc. and affiliates. */
+
+#include <vmlinux.h>
+#include <bpf/bpf_tracing.h>
+#include <bpf/bpf_helpers.h>
+#include <bpf/bpf_core_read.h>
+#include "bpf_experimental.h"
+
+/* BTF load should fail as bpf_rb_root __contains this type and points to
+ * 'node', but 'node' is not a bpf_rb_node
+ */
+struct node_data {
+	int key;
+	int data;
+	struct bpf_list_node node;
+};
+
+static bool less(struct bpf_rb_node *a, const struct bpf_rb_node *b)
+{
+	struct node_data *node_a;
+	struct node_data *node_b;
+
+	node_a = container_of(a, struct node_data, node);
+	node_b = container_of(b, struct node_data, node);
+
+	return node_a->key < node_b->key;
+}
+
+#define private(name) SEC(".data." #name) __hidden __attribute__((aligned(8)))
+private(A) struct bpf_spin_lock glock;
+private(A) struct bpf_rb_root groot __contains(node_data, node);
+
+SEC("tc")
+long rbtree_api_add__wrong_node_type(void *ctx)
+{
+	struct node_data *n;
+
+	n = bpf_obj_new(typeof(*n));
+	if (!n)
+		return 1;
+
+	bpf_spin_lock(&glock);
+	bpf_rbtree_first(&groot);
+	bpf_spin_unlock(&glock);
+	return 0;
+}
+
+char _license[] SEC("license") = "GPL";
diff --git a/tools/testing/selftests/bpf/progs/rbtree_fail.c b/tools/testing/selftests/bpf/progs/rbtree_fail.c
new file mode 100644
index 000000000000..df6e2a39fcee
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/rbtree_fail.c
@@ -0,0 +1,296 @@
+// SPDX-License-Identifier: GPL-2.0
+#include <vmlinux.h>
+#include <bpf/bpf_tracing.h>
+#include <bpf/bpf_helpers.h>
+#include <bpf/bpf_core_read.h>
+#include "bpf_experimental.h"
+
+struct node_data {
+	long key;
+	long data;
+	struct bpf_rb_node node;
+};
+
+#define private(name) SEC(".data." #name) __hidden __attribute__((aligned(8)))
+private(A) struct bpf_spin_lock glock;
+private(A) struct bpf_rb_root groot __contains(node_data, node);
+private(A) struct bpf_rb_root groot2 __contains(node_data, node);
+
+static bool less(struct bpf_rb_node *a, const struct bpf_rb_node *b)
+{
+	struct node_data *node_a;
+	struct node_data *node_b;
+
+	node_a = container_of(a, struct node_data, node);
+	node_b = container_of(b, struct node_data, node);
+
+	return node_a->key < node_b->key;
+}
+
+SEC("?tc")
+long rbtree_api_nolock_add(void *ctx)
+{
+	struct node_data *n;
+
+	n = bpf_obj_new(typeof(*n));
+	if (!n)
+		return 1;
+
+	bpf_rbtree_add(&groot, &n->node, less);
+	return 0;
+}
+
+SEC("?tc")
+long rbtree_api_nolock_remove(void *ctx)
+{
+	struct node_data *n;
+
+	n = bpf_obj_new(typeof(*n));
+	if (!n)
+		return 1;
+
+	bpf_spin_lock(&glock);
+	bpf_rbtree_add(&groot, &n->node, less);
+	bpf_spin_unlock(&glock);
+
+	bpf_rbtree_remove(&groot, &n->node);
+	return 0;
+}
+
+SEC("?tc")
+long rbtree_api_nolock_first(void *ctx)
+{
+	bpf_rbtree_first(&groot);
+	return 0;
+}
+
+SEC("?tc")
+long rbtree_api_remove_unadded_node(void *ctx)
+{
+	struct node_data *n, *m;
+	struct bpf_rb_node *res;
+
+	n = bpf_obj_new(typeof(*n));
+	if (!n)
+		return 1;
+
+	m = bpf_obj_new(typeof(*m));
+	if (!m) {
+		bpf_obj_drop(n);
+		return 1;
+	}
+
+	bpf_spin_lock(&glock);
+	bpf_rbtree_add(&groot, &n->node, less);
+
+	/* This remove should pass verifier */
+	res = bpf_rbtree_remove(&groot, &n->node);
+	n = container_of(res, struct node_data, node);
+
+	/* This remove shouldn't, m isn't in an rbtree */
+	res = bpf_rbtree_remove(&groot, &m->node);
+	m = container_of(res, struct node_data, node);
+	bpf_spin_unlock(&glock);
+
+	if (n)
+		bpf_obj_drop(n);
+	if (m)
+		bpf_obj_drop(m);
+	return 0;
+}
+
+SEC("?tc")
+long rbtree_api_remove_no_drop(void *ctx)
+{
+	struct bpf_rb_node *res;
+	struct node_data *n;
+
+	bpf_spin_lock(&glock);
+	res = bpf_rbtree_first(&groot);
+	if (!res)
+		goto unlock_err;
+
+	res = bpf_rbtree_remove(&groot, res);
+
+	n = container_of(res, struct node_data, node);
+	bpf_spin_unlock(&glock);
+
+	/* bpf_obj_drop(n) is missing here */
+	return 0;
+
+unlock_err:
+	bpf_spin_unlock(&glock);
+	return 1;
+}
+
+SEC("?tc")
+long rbtree_api_add_to_multiple_trees(void *ctx)
+{
+	struct node_data *n;
+
+	n = bpf_obj_new(typeof(*n));
+	if (!n)
+		return 1;
+
+	bpf_spin_lock(&glock);
+	bpf_rbtree_add(&groot, &n->node, less);
+
+	/* This add should fail since n already in groot's tree */
+	bpf_rbtree_add(&groot2, &n->node, less);
+	bpf_spin_unlock(&glock);
+	return 0;
+}
+
+SEC("?tc")
+long rbtree_api_add_release_unlock_escape(void *ctx)
+{
+	struct node_data *n;
+
+	n = bpf_obj_new(typeof(*n));
+	if (!n)
+		return 1;
+
+	bpf_spin_lock(&glock);
+	bpf_rbtree_add(&groot, &n->node, less);
+	bpf_spin_unlock(&glock);
+
+	bpf_spin_lock(&glock);
+	/* After add() in previous critical section, n should be
+	 * release_on_unlock and released after previous spin_unlock,
+	 * so should not be possible to use it here
+	 */
+	bpf_rbtree_remove(&groot, &n->node);
+	bpf_spin_unlock(&glock);
+	return 0;
+}
+
+SEC("?tc")
+long rbtree_api_release_aliasing(void *ctx)
+{
+	struct node_data *n, *m, *o;
+	struct bpf_rb_node *res;
+
+	n = bpf_obj_new(typeof(*n));
+	if (!n)
+		return 1;
+
+	bpf_spin_lock(&glock);
+	bpf_rbtree_add(&groot, &n->node, less);
+	bpf_spin_unlock(&glock);
+
+	bpf_spin_lock(&glock);
+
+	/* m and o point to the same node,
+	 * but verifier doesn't know this
+	 */
+	res = bpf_rbtree_first(&groot);
+	if (!res)
+		return 1;
+	o = container_of(res, struct node_data, node);
+
+	res = bpf_rbtree_first(&groot);
+	if (!res)
+		return 1;
+	m = container_of(res, struct node_data, node);
+
+	bpf_rbtree_remove(&groot, &m->node);
+	/* This second remove shouldn't be possible. Retval of previous
+	 * remove returns owning reference to m, which is the same
+	 * node o's non-owning ref is pointing at
+	 *
+	 * In order to preserve property
+	 *   * owning ref must not be in rbtree
+	 *   * non-owning ref must be in rbtree
+	 *
+	 * o's ref must be invalidated after previous remove. Otherwise
+	 * we'd have non-owning ref to node that isn't in rbtree, and
+	 * verifier wouldn't be able to use type system to prevent remove
+	 * of ref that already isn't in any tree. Would have to do runtime
+	 * checks in that case.
+	 */
+	bpf_rbtree_remove(&groot, &o->node);
+
+	bpf_spin_unlock(&glock);
+	return 0;
+}
+
+SEC("?tc")
+long rbtree_api_first_release_unlock_escape(void *ctx)
+{
+	struct bpf_rb_node *res;
+	struct node_data *n;
+
+	bpf_spin_lock(&glock);
+	res = bpf_rbtree_first(&groot);
+	if (res)
+		n = container_of(res, struct node_data, node);
+	bpf_spin_unlock(&glock);
+
+	bpf_spin_lock(&glock);
+	/* After first() in previous critical section, n should be
+	 * release_on_unlock and released after previous spin_unlock,
+	 * so should not be possible to use it here
+	 */
+	bpf_rbtree_remove(&groot, &n->node);
+	bpf_spin_unlock(&glock);
+	return 0;
+}
+
+static bool less__bad_fn_call_add(struct bpf_rb_node *a, const struct bpf_rb_node *b)
+{
+	struct node_data *node_a;
+	struct node_data *node_b;
+
+	node_a = container_of(a, struct node_data, node);
+	node_b = container_of(b, struct node_data, node);
+	bpf_rbtree_add(&groot, &node_a->node, less);
+
+	return node_a->key < node_b->key;
+}
+
+static bool less__bad_fn_call_remove(struct bpf_rb_node *a, const struct bpf_rb_node *b)
+{
+	struct node_data *node_a;
+	struct node_data *node_b;
+
+	node_a = container_of(a, struct node_data, node);
+	node_b = container_of(b, struct node_data, node);
+	bpf_rbtree_remove(&groot, &node_a->node);
+
+	return node_a->key < node_b->key;
+}
+
+static bool less__bad_fn_call_first_unlock_after(struct bpf_rb_node *a, const struct bpf_rb_node *b)
+{
+	struct node_data *node_a;
+	struct node_data *node_b;
+
+	node_a = container_of(a, struct node_data, node);
+	node_b = container_of(b, struct node_data, node);
+	bpf_rbtree_first(&groot);
+	bpf_spin_unlock(&glock);
+
+	return node_a->key < node_b->key;
+}
+
+#define RBTREE_API_ADD_BAD_CB(cb_suffix)				\
+SEC("?tc")								\
+long rbtree_api_add_bad_cb_##cb_suffix(void *ctx)			\
+{									\
+	struct node_data *n;						\
+									\
+	n = bpf_obj_new(typeof(*n));					\
+	if (!n)								\
+		return 1;						\
+									\
+	bpf_spin_lock(&glock);						\
+	bpf_rbtree_add(&groot, &n->node, less__##cb_suffix);		\
+	bpf_spin_unlock(&glock);					\
+	return 0;							\
+}
+
+RBTREE_API_ADD_BAD_CB(bad_fn_call_add);
+RBTREE_API_ADD_BAD_CB(bad_fn_call_remove);
+RBTREE_API_ADD_BAD_CB(bad_fn_call_first_unlock_after);
+
+char _license[] SEC("license") = "GPL";
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH v4 bpf-next 11/11] bpf, documentation: Add graph documentation for non-owning refs
  2023-02-09 17:41 [PATCH v4 bpf-next 00/11] BPF rbtree next-gen datastructure Dave Marchevsky
                   ` (9 preceding siblings ...)
  2023-02-09 17:41 ` [PATCH v4 bpf-next 10/11] selftests/bpf: Add rbtree selftests Dave Marchevsky
@ 2023-02-09 17:41 ` Dave Marchevsky
  10 siblings, 0 replies; 27+ messages in thread
From: Dave Marchevsky @ 2023-02-09 17:41 UTC (permalink / raw)
  To: bpf
  Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Kernel Team, Kumar Kartikeya Dwivedi, Tejun Heo, Dave Marchevsky

It is difficult to intuit the semantics of owning and non-owning
references from verifier code. In order to keep the high-level details
from being lost in the mailing list, this patch adds documentation
explaining semantics and details.

The target audience of doc added in this patch is folks working on BPF
internals, as there's focus on "what should the verifier do here". Via
reorganization or copy-and-paste, much of the content can probably be
repurposed for BPF program writer audience as well.

Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
---
 Documentation/bpf/graph_ds_impl.rst | 266 ++++++++++++++++++++++++++++
 Documentation/bpf/other.rst         |   3 +-
 2 files changed, 268 insertions(+), 1 deletion(-)
 create mode 100644 Documentation/bpf/graph_ds_impl.rst

diff --git a/Documentation/bpf/graph_ds_impl.rst b/Documentation/bpf/graph_ds_impl.rst
new file mode 100644
index 000000000000..8bbf1815efe7
--- /dev/null
+++ b/Documentation/bpf/graph_ds_impl.rst
@@ -0,0 +1,266 @@
+=========================
+BPF Graph Data Structures
+=========================
+
+This document describes implementation details of new-style "graph" data
+structures (linked_list, rbtree), with particular focus on the verifier's
+implementation of semantics specific to those data structures.
+
+Although no specific verifier code is referred to here, this document assumes
+that the reader has general knowledge of BPF verifier internals, BPF maps, and
+BPF program writing.
+
+Note that the intent of this document is to describe the current state of
+these graph data structures. **No guarantees** of stability for either
+semantics or APIs are made or implied here.
+
+.. contents::
+    :local:
+    :depth: 2
+
+Introduction
+------------
+
+The BPF map API has historically been the main way to expose data structures
+of various types for use within BPF programs. Some data structures fit naturally
+with the map API (HASH, ARRAY), others less so. Consequently, programs
+interacting with the latter group of data structures can be hard to parse
+for kernel programmers without previous BPF experience.
+
+Luckily, some restrictions which necessitated the use of BPF map semantics are
+no longer relevant. With the introduction of kfuncs, kptrs, and the any-context
+BPF allocator, it is now possible to implement BPF data structures whose API
+and semantics more closely match those exposed to the rest of the kernel.
+
+Two such data structures - linked_list and rbtree - have many verification
+details in common. Because both have "root"s ("head" for linked_list) and
+"node"s, the verifier code and this document refer to common functionality
+as "graph_api", "graph_root", "graph_node", etc.
+
+Unless otherwise stated, examples and semantics below apply to both graph data
+structures.
+
+Unstable API
+------------
+
+Data structures implemented using the BPF map API have historically used BPF
+helper functions - either standard map API helpers like ``bpf_map_update_elem``
+or map-specific helpers. The new-style graph data structures instead use kfuncs
+to define their manipulation helpers. Because there are no stability guarantees
+for kfuncs, the API and semantics for these data structures can be evolved in
+a way that breaks backwards compatibility if necessary.
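+
+For example, a BPF program might declare the rbtree kfuncs added by this series
+along the following lines (a sketch - the selftests' ``bpf_experimental.h``
+header is the authoritative source for these declarations):
+
+.. code-block:: c
+
+        extern void bpf_rbtree_add(struct bpf_rb_root *root, struct bpf_rb_node *node,
+                                   bool (less)(struct bpf_rb_node *a, const struct bpf_rb_node *b)) __ksym;
+        extern struct bpf_rb_node *bpf_rbtree_remove(struct bpf_rb_root *root,
+                                                     struct bpf_rb_node *node) __ksym;
+        extern struct bpf_rb_node *bpf_rbtree_first(struct bpf_rb_root *root) __ksym;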
+
+Root and node types for the new data structures are opaquely defined in the
+``uapi/linux/bpf.h`` header.
+
+Locking
+-------
+
+The new-style data structures are intrusive and are defined similarly to their
+vanilla kernel counterparts:
+
+.. code-block:: c
+
+        struct node_data {
+          long key;
+          long data;
+          struct bpf_rb_node node;
+        };
+
+        struct bpf_spin_lock glock;
+        struct bpf_rb_root groot __contains(node_data, node);
+
+The "root" type for both linked_list and rbtree expects to be in a map_value
+which also contains a ``bpf_spin_lock`` - in the above example both global
+variables are placed in a single-value arraymap. The verifier considers this
+spin_lock to be associated with the ``bpf_rb_root`` by virtue of both being in
+the same map_value and will enforce that the correct lock is held when
+verifying BPF programs that manipulate the tree. Since this lock checking
+happens at verification time, there is no runtime penalty.
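+
+For example, a program which calls a tree-manipulating kfunc without holding
+the associated lock is rejected at load time. A minimal sketch, reusing the
+globals declared above and assuming a ``less`` comparator callback has been
+defined (error handling elided):
+
+.. code-block:: c
+
+        struct node_data *n = bpf_obj_new(typeof(*n));
+
+        if (!n)
+                return 0;
+
+        /* rejected: glock is not held around the add */
+        bpf_rbtree_add(&groot, &n->node, less);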
+
+Non-owning references
+---------------------
+
+**Motivation**
+
+Consider the following BPF code:
+
+.. code-block:: c
+
+        struct node_data *n = bpf_obj_new(typeof(*n)); /* ACQUIRED */
+
+        bpf_spin_lock(&lock);
+
+        bpf_rbtree_add(&tree, n); /* PASSED */
+
+        bpf_spin_unlock(&lock);
+
+From the verifier's perspective, the pointer ``n`` returned from ``bpf_obj_new``
+has type ``PTR_TO_BTF_ID | MEM_ALLOC``, with a ``btf_id`` of
+``struct node_data`` and a nonzero ``ref_obj_id``. Because it holds ``n``, the
+program has ownership of the pointee's (object pointed to by ``n``) lifetime.
+The BPF program must pass off ownership before exiting - either via
+``bpf_obj_drop``, which ``free``'s the object, or by adding it to ``tree`` with
+``bpf_rbtree_add``.
+
+(``ACQUIRED`` and ``PASSED`` comments in the example denote statements where
+"ownership is acquired" and "ownership is passed", respectively)
+
+What should the verifier do with ``n`` after ownership is passed off? If the
+object was ``free``'d with ``bpf_obj_drop`` the answer is obvious: the verifier
+should reject programs which attempt to access ``n`` after ``bpf_obj_drop`` as
+the object is no longer valid. The underlying memory may have been reused for
+some other allocation, unmapped, etc.
+
+When ownership is passed to ``tree`` via ``bpf_rbtree_add`` the answer is less
+obvious. The verifier could enforce the same semantics as for ``bpf_obj_drop``,
+but that would result in programs with useful, common coding patterns being
+rejected, e.g.:
+
+.. code-block:: c
+
+        int x;
+        struct node_data *n = bpf_obj_new(typeof(*n)); /* ACQUIRED */
+
+        bpf_spin_lock(&lock);
+
+        bpf_rbtree_add(&tree, n); /* PASSED */
+        x = n->data;
+        n->data = 42;
+
+        bpf_spin_unlock(&lock);
+
+Both the read from and write to ``n->data`` would be rejected. The verifier
+can do better, though, by taking advantage of two details:
+
+  * Graph data structure APIs can only be used when the ``bpf_spin_lock``
+    associated with the graph root is held
+
+  * Both graph data structures have pointer stability
+
+     * Because graph nodes are allocated with ``bpf_obj_new`` and
+       adding / removing from the root involves fiddling with the
+       ``bpf_{list,rb}_node`` field of the node struct, a graph node will
+       remain at the same address after either operation.
+
+Because the associated ``bpf_spin_lock`` must be held by any program adding
+or removing, if we're in the critical section bounded by that lock, we know
+that no other program can add or remove until the end of the critical section.
+This combined with pointer stability means that, until the critical section
+ends, we can safely access the graph node through ``n`` even after it was used
+to pass ownership.
+
+The verifier considers such a reference a *non-owning reference*. The ref
+returned by ``bpf_obj_new`` is accordingly considered an *owning reference*.
+Both terms currently only have meaning in the context of graph nodes and API.
+
+**Details**
+
+Let's enumerate the properties of both types of references.
+
+*owning reference*
+
+  * This reference controls the lifetime of the pointee
+
+  * Ownership of pointee must be 'released' by passing it to some graph API
+    kfunc, or via ``bpf_obj_drop``, which ``free``'s the pointee
+
+    * If not released before program ends, verifier considers program invalid
+
+  * Access to the pointee's memory will not page fault
+
+*non-owning reference*
+
+  * This reference does not own the pointee
+
+     * It cannot be used to add the graph node to a graph root, nor ``free``'d via
+       ``bpf_obj_drop``
+
+  * No explicit control of lifetime, but can infer valid lifetime based on
+    non-owning ref existence (see explanation below)
+
+  * Access to the pointee's memory will not page fault
+
+From the verifier's perspective, non-owning references can only exist
+between spin_lock and spin_unlock. Why? After spin_unlock another program
+can do arbitrary operations on the data structure like removing and ``free``-ing
+via ``bpf_obj_drop``. A non-owning ref to some chunk of memory that was ``remove``'d,
+``free``'d, and reused via ``bpf_obj_new`` would point to an entirely different thing.
+Or the memory could go away.
+
+To prevent this logic violation all non-owning references are invalidated by the
+verifier after a critical section ends. This is necessary to ensure the "will
+not page fault" property of non-owning references. So if the verifier hasn't
+invalidated a non-owning ref, accessing it will not page fault.
+
+Currently ``bpf_obj_drop`` is not allowed in the critical section, so
+if there's a valid non-owning ref, we must be in a critical section, and can
+conclude that the ref's memory hasn't been dropped-and-``free``'d or
+dropped-and-reused.
+
+Any reference to a node that is in an rbtree _must_ be non-owning, since
+the tree has control of the pointee's lifetime. Similarly, any ref to a node
+that isn't in an rbtree _must_ be owning. This results in a nice property:
+graph API add / remove implementations don't need to check if a node
+has already been added (or already removed), as the ownership model
+allows the verifier to prevent such a state from being valid by simply checking
+types.
+
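+For example, a double-add is rejected purely via types - after the first add the
+reference is non-owning, while ``bpf_rbtree_add`` requires an owning reference
+(a sketch, with arguments simplified as in the other examples in this document):
+
+.. code-block:: c
+
+        bpf_rbtree_add(&tree, n); /* n: owning -> non-owning */
+        bpf_rbtree_add(&tree, n); /* rejected: an owning reference is required */
+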
+However, pointer aliasing poses an issue for the above "nice property".
+Consider the following example:
+
+.. code-block:: c
+
+        struct node_data *n, *m, *o, *p;
+        n = bpf_obj_new(typeof(*n));     /* 1 */
+
+        bpf_spin_lock(&lock);
+
+        bpf_rbtree_add(&tree, n);        /* 2 */
+        m = bpf_rbtree_first(&tree);     /* 3 */
+
+        o = bpf_rbtree_remove(&tree, n); /* 4 */
+        p = bpf_rbtree_remove(&tree, m); /* 5 */
+
+        bpf_spin_unlock(&lock);
+
+        bpf_obj_drop(o);
+        bpf_obj_drop(p); /* 6 */
+
+Assume the tree is empty before this program runs. If we track verifier state
+changes here using numbers in above comments:
+
+  1) n is an owning reference
+
+  2) n is a non-owning reference, it's been added to the tree
+
+  3) n and m are non-owning references, they both point to the same node
+
+  4) o is an owning reference, n and m non-owning, all point to same node
+
+  5) o and p are owning, n and m non-owning, all point to the same node
+
+  6) a double-free has occurred, since o and p point to same node and o was
+     ``free``'d in previous statement
+
+States 4 and 5 violate our "nice property", as there are non-owning refs to
+a node which is not in an rbtree. Statement 5 will try to remove a node which
+has already been removed as a result of this violation. State 6 is a dangerous
+double-free.
+
+At a minimum we should prevent state 6 from being possible. If we can't also
+prevent state 5 then we must abandon our "nice property" and check whether a
+node has already been removed at runtime.
+
+We prevent both by generalizing the "invalidate non-owning references" behavior
+of ``bpf_spin_unlock`` and doing similar invalidation after
+``bpf_rbtree_remove``. The logic here is that any graph API kfunc which:
+
+  * takes an arbitrary node argument
+
+  * removes it from the data structure
+
+  * returns an owning reference to the removed node
+
+may result in a state where some other non-owning reference points to the same
+node. So ``remove``-type kfuncs must be considered a non-owning reference
+invalidation point as well.
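+
+With that rule, the verifier rejects statement 5 in the example above: after
+``bpf_rbtree_remove(&tree, n)`` both ``n`` and ``m`` are invalidated, so the
+second remove through ``m`` fails verification instead of reaching runtime.
+A sketch of the resulting pattern (exact verifier error text is not guaranteed):
+
+.. code-block:: c
+
+        o = bpf_rbtree_remove(&tree, n); /* n and m invalidated here */
+        p = bpf_rbtree_remove(&tree, m); /* rejected by the verifier */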
diff --git a/Documentation/bpf/other.rst b/Documentation/bpf/other.rst
index 3d61963403b4..7e6b12018802 100644
--- a/Documentation/bpf/other.rst
+++ b/Documentation/bpf/other.rst
@@ -6,4 +6,5 @@ Other
    :maxdepth: 1
 
    ringbuf
-   llvm_reloc
\ No newline at end of file
+   llvm_reloc
+   graph_ds_impl
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* Re: [PATCH v4 bpf-next 10/11] selftests/bpf: Add rbtree selftests
  2023-02-09 17:41 ` [PATCH v4 bpf-next 10/11] selftests/bpf: Add rbtree selftests Dave Marchevsky
@ 2023-02-10  2:52   ` Alexei Starovoitov
  0 siblings, 0 replies; 27+ messages in thread
From: Alexei Starovoitov @ 2023-02-10  2:52 UTC (permalink / raw)
  To: Dave Marchevsky
  Cc: bpf, Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Kernel Team, Kumar Kartikeya Dwivedi, Tejun Heo

On Thu, Feb 09, 2023 at 09:41:43AM -0800, Dave Marchevsky wrote:
> This patch adds selftests exercising the logic changed/added in the
> previous patches in the series. A variety of successful and unsuccessful
> rbtree usages are validated:
> 
> Success:
>   * Add some nodes, let map_value bpf_rbtree_root destructor clean them
>     up
>   * Add some nodes, remove one using the non-owning ref leftover by
>     successful rbtree_add() call
>   * Add some nodes, remove one using the non-owning ref returned by
>     rbtree_first() call
> 
> Failure:
>   * BTF where bpf_rb_root owns bpf_list_node should fail to load
>   * BTF where node of type X is added to tree containing nodes of type Y
>     should fail to load
>   * No calling rbtree api functions in 'less' callback for rbtree_add
>   * No releasing lock in 'less' callback for rbtree_add
>   * No removing a node which hasn't been added to any tree
>   * No adding a node which has already been added to a tree
>   * No escaping of non-owning references past their lock's
>     critical section
>   * No escaping of non-owning references past other invalidation points
>     (rbtree_remove)
> 
> These tests mostly focus on rbtree-specific additions, but some of the
> failure cases revalidate scenarios common to both linked_list and rbtree
> which are covered in the former's tests. Better to be a bit redundant in
> case linked_list and rbtree semantics deviate over time.
> 
> Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
> ---
>  .../testing/selftests/bpf/prog_tests/rbtree.c | 184 +++++++++++
>  tools/testing/selftests/bpf/progs/rbtree.c    | 176 +++++++++++
>  .../progs/rbtree_btf_fail__add_wrong_type.c   |  52 +++
>  .../progs/rbtree_btf_fail__wrong_node_type.c  |  49 +++
>  .../testing/selftests/bpf/progs/rbtree_fail.c | 296 ++++++++++++++++++
>  5 files changed, 757 insertions(+)
>  create mode 100644 tools/testing/selftests/bpf/prog_tests/rbtree.c
>  create mode 100644 tools/testing/selftests/bpf/progs/rbtree.c
>  create mode 100644 tools/testing/selftests/bpf/progs/rbtree_btf_fail__add_wrong_type.c
>  create mode 100644 tools/testing/selftests/bpf/progs/rbtree_btf_fail__wrong_node_type.c
>  create mode 100644 tools/testing/selftests/bpf/progs/rbtree_fail.c
> 
> diff --git a/tools/testing/selftests/bpf/prog_tests/rbtree.c b/tools/testing/selftests/bpf/prog_tests/rbtree.c
> new file mode 100644
> index 000000000000..733db8d79a2d
> --- /dev/null
> +++ b/tools/testing/selftests/bpf/prog_tests/rbtree.c
> @@ -0,0 +1,184 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/* Copyright (c) 2022 Meta Platforms, Inc. and affiliates. */
> +
> +#include <test_progs.h>
> +#include <network_helpers.h>
> +
> +#include "rbtree.skel.h"
> +#include "rbtree_fail.skel.h"
> +#include "rbtree_btf_fail__wrong_node_type.skel.h"
> +#include "rbtree_btf_fail__add_wrong_type.skel.h"
> +
> +static char log_buf[1024 * 1024];
> +
> +static struct {
> +	const char *prog_name;
> +	const char *err_msg;
> +} rbtree_fail_tests[] = {
> +	{"rbtree_api_nolock_add", "bpf_spin_lock at off=16 must be held for bpf_rb_root"},
> +	{"rbtree_api_nolock_remove", "bpf_spin_lock at off=16 must be held for bpf_rb_root"},
> +	{"rbtree_api_nolock_first", "bpf_spin_lock at off=16 must be held for bpf_rb_root"},
> +
> +	/* Specific failure string for these three isn't very important, but it shouldn't be
> +	 * possible to call rbtree api func from within add() callback
> +	 */
> +	{"rbtree_api_add_bad_cb_bad_fn_call_add", "allocated object must be referenced"},
> +	{"rbtree_api_add_bad_cb_bad_fn_call_remove", "rbtree_remove not allowed in rbtree cb"},
> +	{"rbtree_api_add_bad_cb_bad_fn_call_first_unlock_after",
> +	 "can't spin_{lock,unlock} in rbtree cb"},
> +
> +	{"rbtree_api_remove_unadded_node", "rbtree_remove node input must be non-owning ref"},
> +	{"rbtree_api_add_to_multiple_trees", "allocated object must be referenced"},
> +	{"rbtree_api_add_release_unlock_escape", "arg#1 expected pointer to allocated object"},
> +	{"rbtree_api_first_release_unlock_escape", "arg#1 expected pointer to allocated object"},
> +	{"rbtree_api_remove_no_drop", "Unreleased reference id=2 alloc_insn=11"},
> +	{"rbtree_api_release_aliasing", "arg#1 expected pointer to allocated object"},

Please convert the fail tests to RUN_TESTS() and __failure __msg() framework.
The success tests can stay as-is, since the runner checks the return values
and we don't have this feature yet in RUN_TESTS().
We probably should add such a feature, but that's orthogonal and not a blocker for this set.
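
Something along these lines (rough sketch, not tested) in rbtree_fail.c, with
bpf_misc.h included for the tags:

  SEC("?tc")
  __failure __msg("bpf_spin_lock at off=16 must be held for bpf_rb_root")
  long rbtree_api_nolock_add(void *ctx)
  {
  	...
  }

with the prog_tests side reduced to a single RUN_TESTS(rbtree_fail) call.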

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v4 bpf-next 08/11] bpf: Special verifier handling for bpf_rbtree_{remove, first}
  2023-02-09 17:41 ` [PATCH v4 bpf-next 08/11] bpf: Special verifier handling for bpf_rbtree_{remove, first} Dave Marchevsky
@ 2023-02-10  3:11   ` Alexei Starovoitov
  2023-02-10  8:22     ` Dave Marchevsky
  2023-02-10 14:15     ` Kumar Kartikeya Dwivedi
  2023-02-10 13:55   ` Kumar Kartikeya Dwivedi
  1 sibling, 2 replies; 27+ messages in thread
From: Alexei Starovoitov @ 2023-02-10  3:11 UTC (permalink / raw)
  To: Dave Marchevsky
  Cc: bpf, Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Kernel Team, Kumar Kartikeya Dwivedi, Tejun Heo

On Thu, Feb 09, 2023 at 09:41:41AM -0800, Dave Marchevsky wrote:
> @@ -9924,11 +9934,12 @@ static int check_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
>  				   meta.func_id == special_kfunc_list[KF_bpf_list_pop_back]) {
>  				struct btf_field *field = meta.arg_list_head.field;
>  
> -				mark_reg_known_zero(env, regs, BPF_REG_0);
> -				regs[BPF_REG_0].type = PTR_TO_BTF_ID | MEM_ALLOC;
> -				regs[BPF_REG_0].btf = field->graph_root.btf;
> -				regs[BPF_REG_0].btf_id = field->graph_root.value_btf_id;
> -				regs[BPF_REG_0].off = field->graph_root.node_offset;
> +				mark_reg_graph_node(regs, BPF_REG_0, &field->graph_root);
> +			} else if (meta.func_id == special_kfunc_list[KF_bpf_rbtree_remove] ||
> +				   meta.func_id == special_kfunc_list[KF_bpf_rbtree_first]) {
> +				struct btf_field *field = meta.arg_rbtree_root.field;
> +
> +				mark_reg_graph_node(regs, BPF_REG_0, &field->graph_root);
>  			} else if (meta.func_id == special_kfunc_list[KF_bpf_cast_to_kern_ctx]) {
>  				mark_reg_known_zero(env, regs, BPF_REG_0);
>  				regs[BPF_REG_0].type = PTR_TO_BTF_ID | PTR_TRUSTED;
> @@ -9994,7 +10005,13 @@ static int check_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
>  			if (is_kfunc_ret_null(&meta))
>  				regs[BPF_REG_0].id = id;
>  			regs[BPF_REG_0].ref_obj_id = id;
> +		} else if (meta.func_id == special_kfunc_list[KF_bpf_rbtree_first]) {
> +			ref_set_non_owning_lock(env, &regs[BPF_REG_0]);
>  		}

Looking at the above code where R0 state is set across two different if-s
it feels that bool non_owning_ref_lock from patch 2 shouldn't be a bool.

Patch 7 also has this split initialization of the reg state.
First it does mark_reg_graph_node() which sets regs[regno].type = PTR_TO_BTF_ID | MEM_ALLOC
and then it does ref_set_non_owning_lock() that sets that bool flag.
Setting PTR_TO_BTF_ID | MEM_ALLOC in the helper without setting ref_obj_id > 0
at the same time feels error prone.

This non_owning_ref_lock bool flag is actually just a flag.
I think it would be cleaner to make it similar to MEM_ALLOC and call it
NON_OWN_REF = BIT(14 + BPF_BASE_TYPE_BITS).

Then we can set it at once in this patch and in patch 7 and avoid this split init.
The check in patch 2 also will become cleaner.
Instead of:
if (type_is_ptr_alloc_obj(reg->type) && reg->non_owning_ref_lock)
it will be
if (reg->type == PTR_TO_BTF_ID | MEM_ALLOC | NON_OWN_REF)

the transition from owning to non-owning would be easier to follow as well:
PTR_TO_BTF_ID | MEM_ALLOC with ref_obj_id > 0
 -> 
   PTR_TO_BTF_ID | MEM_ALLOC | NON_OWN_REF with ref_obj_id == 0

and it will probably help to avoid bugs where PTR_TO_BTF_ID | MEM_ALLOC is accepted
but we forgot to check ref_obj_id. There are no such places now, but it feels
less error prone with a proper flag instead of a bool.

I would also squash patches 1 and 2. Since we've analyzed correctness of patch 2 well enough
it doesn't make sense to go through the churn in patch 1 just to delete it in patch 2.

wdyt?

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v4 bpf-next 08/11] bpf: Special verifier handling for bpf_rbtree_{remove, first}
  2023-02-10  3:11   ` Alexei Starovoitov
@ 2023-02-10  8:22     ` Dave Marchevsky
  2023-02-10 17:30       ` Alexei Starovoitov
  2023-02-10 14:15     ` Kumar Kartikeya Dwivedi
  1 sibling, 1 reply; 27+ messages in thread
From: Dave Marchevsky @ 2023-02-10  8:22 UTC (permalink / raw)
  To: Alexei Starovoitov, Dave Marchevsky
  Cc: bpf, Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Kernel Team, Kumar Kartikeya Dwivedi, Tejun Heo

On 2/9/23 10:11 PM, Alexei Starovoitov wrote:
> On Thu, Feb 09, 2023 at 09:41:41AM -0800, Dave Marchevsky wrote:
>> @@ -9924,11 +9934,12 @@ static int check_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
>>  				   meta.func_id == special_kfunc_list[KF_bpf_list_pop_back]) {
>>  				struct btf_field *field = meta.arg_list_head.field;
>>  
>> -				mark_reg_known_zero(env, regs, BPF_REG_0);
>> -				regs[BPF_REG_0].type = PTR_TO_BTF_ID | MEM_ALLOC;
>> -				regs[BPF_REG_0].btf = field->graph_root.btf;
>> -				regs[BPF_REG_0].btf_id = field->graph_root.value_btf_id;
>> -				regs[BPF_REG_0].off = field->graph_root.node_offset;
>> +				mark_reg_graph_node(regs, BPF_REG_0, &field->graph_root);
>> +			} else if (meta.func_id == special_kfunc_list[KF_bpf_rbtree_remove] ||
>> +				   meta.func_id == special_kfunc_list[KF_bpf_rbtree_first]) {
>> +				struct btf_field *field = meta.arg_rbtree_root.field;
>> +
>> +				mark_reg_graph_node(regs, BPF_REG_0, &field->graph_root);
>>  			} else if (meta.func_id == special_kfunc_list[KF_bpf_cast_to_kern_ctx]) {
>>  				mark_reg_known_zero(env, regs, BPF_REG_0);
>>  				regs[BPF_REG_0].type = PTR_TO_BTF_ID | PTR_TRUSTED;
>> @@ -9994,7 +10005,13 @@ static int check_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
>>  			if (is_kfunc_ret_null(&meta))
>>  				regs[BPF_REG_0].id = id;
>>  			regs[BPF_REG_0].ref_obj_id = id;
>> +		} else if (meta.func_id == special_kfunc_list[KF_bpf_rbtree_first]) {
>> +			ref_set_non_owning_lock(env, &regs[BPF_REG_0]);
>>  		}
> 
> Looking at the above code where R0 state is set across two different if-s
> it feels that bool non_owning_ref_lock from patch 2 shouldn't be a bool.

Re: "set across two different if-s" - I see what you mean, and the fact that
both are doing 'meta.func_id == whatever' checks doesn't make it clear why
they're separate. But note that what sits above the else-if added by the second
check is the "if (is_kfunc_acquire(&meta))" check, acquire_reference_state, etc.

"Is function acquire" is a function-level property and, as the kfunc flags I
tried to add in previous versions of this series indicate, I think that
"returns a non-owning reference" and "need to invalidate non-owning refs"
are function-level properties as well.

As a contrast, the first addition - with mark_reg_graph_node - is more of a
return-type-level property. Instead of doing meta.func_id checks in that change,
we could instead assume that any kfunc returning "bpf_rb_node *" is actually
returning a graph node type w/ bpf_rb_node field. It's certainly a blurry line
in this case since it's necessary to peek at the bpf_rb_root arg in order to
provide info about the node type to mark_reg_graph_node. But this is similar
to RET_PTR_TO_MAP_VALUE logic which requires a bpf_map * to provide info about
the map value being returned.

Why does this distinction matter at all? Because I'd like to eventually merge
helper and kfunc verification as much as possible / reasonable, especially the
different approaches to func_proto-like logic. Currently, the bpf_func_proto
approach used by bpf helpers is better at expressing 
{arg,return}-type level properties. A helper func_proto can do

  .arg2_type = ARG_PTR_TO_BTF_ID_OR_NULL | OBJ_RELEASE,

and it's obvious which arg is being released, whereas kfunc equivalent is
KF_RELEASE flag on the kfunc itself and verifier needs to assume that there's
a single arg w/ ref_obj_id which is being released. Sure, kfunc annotations
(e.g. __sz, __alloc) could be extended to support all of this, but that's
not the current state of things, and such name suffixes wouldn't work for
retval.

Similarly, current kfunc definition scheme is better at expressing function-
level properties:

  BTF_ID_FLAGS(func, whatever, KF_ACQUIRE)

There's no func_proto equivalent, the is_acquire_function helper used in
check_helper_call resorts to "func_id ==" checks. For acquire specifically
it could be faked with a OBJ_ACQUIRE flag on retval in the proto, but I
don't know if the same would make sense for "need to invalidate non-owning
refs" or something like KF_TRUSTED_ARGS.

Anyways, this was a long-winded way of saying that separating this logic across
two different if-s was intentional and will help with future refactoring.

> Patch 7 also has this split initialization of the reg state.
> First it does mark_reg_graph_node() which sets regs[regno].type = PTR_TO_BTF_ID | MEM_ALLOC
> and then it does ref_set_non_owning_lock() that sets that bool flag.
> Setting PTR_TO_BTF_ID | MEM_ALLOC in the helper without setting ref_obj_id > 0
> at the same time feels error prone.

It's unfortunate that the reg type isn't really complete for rbtree_first
until after the second chunk of code, but this was already happening with
bpf_list_pop_{front,back}, which rely on KF_ACQUIRE flag and
is_kfunc_acquire check to set ref_obj_id on the popped owning reference.

Maybe to assuage your 'error prone' concern some check can be added at
the end of check_kfunc_call which ensures that PTR_TO_BTF_ID | MEM_ALLOC
types are properly configured, and dies with 'verifier internal error'
if not. I'm not convinced it's necessary, but regardless it would be
similar to commit 47e34cb74d37 ("bpf: Add verifier check for BPF_PTR_POISON retval and arg")
which I added a few months ago.

> This non_owning_ref_lock bool flag is actually a just flag.
> I think it would be cleaner to make it similar to MEM_ALLOC and call it
> NON_OWN_REF = BIT(14 + BPF_BASE_TYPE_BITS).
> 
> Then we can set it at once in this patch and in patch 7 and avoid this split init.
> The check in patch 2 also will become cleaner.
> Instead of:
> if (type_is_ptr_alloc_obj(reg->type) && reg->non_owning_ref_lock)
> it will be
> if (reg->type == PTR_TO_BTF_ID | MEM_ALLOC | NON_OWN_REF)

Minor tangent: this is why I like helpers like type_is_ptr_alloc_obj - they make
it obvious what properties a reg should have to be considered a certain type by
the verifier, and provide more context as to what specific type check is
happening vs a raw check.

IMO the cleanest version of that check would be if(reg_is_non_owning_ref(reg))
with the newly-added helper doing what you'd expect.

> 
> the transition from owning to non-owning would be easier to follow as well:
> PTR_TO_BTF_ID | MEM_ALLOC with ref_obj_id > 0
>  -> 
>    PTR_TO_BTF_ID | MEM_ALLOC | NON_OWN_REF with ref_obj_id == 0
> 
> and it will probably help to avoid bugs where PTR_TO_BTF_ID | MEM_ALLOC is accepted
> but we forgot to check ref_obj_id. There are no such places now, but it feels
> less error prone with proper flag instead of bool.

I'm not strongly opposed to a NON_OWN_REF type flag. It's a more granular
version of the KF_RELEASE_NON_OWN flag which I tried to add in a previous
version of this series. But some comments:

* Such a flag would eliminate the need to change bpf_reg_state in this
  series, but I think this will be temporary. If we add support for nested
  locks we'll need to reintroduce some "lock identity" again. If we want
  to improve UX for non-owning reference invalidation in the case where
  a list_head and rb_root share the same lock, we'll need to introduce some
  "datastructure root identity" to allow invalidation of only the list's
  non-owning refs on list_pop.

* Sure, we could have both the NON_OWN_REF flag and additional {lock,root}
  identity structures. But there isn't infinite room for type flags and
  currently non-owning ref concept is contained to just two data structures.
  IMO in terms of generality this flag is closer to MEM_RINGBUF than
  PTR_MAYBE_NULL. If we're going to need {lock,root} identity structs
  and can use them to disambiguate between owning/non-owning refs quickly,
  why bother with an extra flag?

* Followup to earlier func_proto rant: I can't use NON_OWN_REF flag to tag
  a kfunc retval currently. So it'll really only be a verifier-internal flag.
  At that point, might as well add reg_is_non_owning_ref for canonical way
  of checking this.

> I would also squash patches 1 and 2. Since we've analyzed correctness of patch 2 well enough
> it doesn't make sense to go through the churn in patch 1 just to delete it in patch 2.
> 

Ack.

> wdyt?

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v4 bpf-next 01/11] bpf: Migrate release_on_unlock logic to non-owning ref semantics
  2023-02-09 17:41 ` [PATCH v4 bpf-next 01/11] bpf: Migrate release_on_unlock logic to non-owning ref semantics Dave Marchevsky
@ 2023-02-10 13:24   ` Kumar Kartikeya Dwivedi
  2023-02-10 17:13     ` Alexei Starovoitov
  0 siblings, 1 reply; 27+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2023-02-10 13:24 UTC (permalink / raw)
  To: Dave Marchevsky
  Cc: bpf, Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Kernel Team, Tejun Heo

On Thu, Feb 09, 2023 at 06:41:34PM CET, Dave Marchevsky wrote:
> This patch introduces non-owning reference semantics to the verifier,
> specifically linked_list API kfunc handling. release_on_unlock logic for
> refs is refactored - with small functional changes - to implement these
> semantics, and bpf_list_push_{front,back} are migrated to use them.
>
> When a list node is pushed to a list, the program still has a pointer to
> the node:
>
>   n = bpf_obj_new(typeof(*n));
>
>   bpf_spin_lock(&l);
>   bpf_list_push_back(&l, n);
>   /* n still points to the just-added node */
>   bpf_spin_unlock(&l);
>
> What the verifier considers n to be after the push, and thus what can be
> done with n, are changed by this patch.
>
> Common properties both before/after this patch:
>   * After push, n is only a valid reference to the node until end of
>     critical section
>   * After push, n cannot be pushed to any list
>   * After push, the program can read the node's fields using n
>
> Before:
>   * After push, n retains the ref_obj_id which it received on
>     bpf_obj_new, but the associated bpf_reference_state's
>     release_on_unlock field is set to true
>     * release_on_unlock field and associated logic is used to implement
>       "n is only a valid ref until end of critical section"
>   * After push, n cannot be written to, the node must be removed from
>     the list before writing to its fields
>   * After push, n is marked PTR_UNTRUSTED
>
> After:
>   * After push, n's ref is released and ref_obj_id set to 0. The
>     bpf_reg_state's non_owning_ref_lock struct is populated with the
>     currently active lock
>     * non_owning_ref_lock and logic is used to implement "n is only a
>       valid ref until end of critical section"
>   * n can be written to (except for special fields e.g. bpf_list_node,
>     timer, ...)
>   * No special type flag is added to n after push
>
> Summary of specific implementation changes to achieve the above:
>
>   * release_on_unlock field, ref_set_release_on_unlock helper, and logic
>     to "release on unlock" based on that field are removed
>
>   * The anonymous active_lock struct used by bpf_verifier_state is
>     pulled out into a named struct bpf_active_lock.
>
>   * A non_owning_ref_lock field of type bpf_active_lock is added to
>     bpf_reg_state's PTR_TO_BTF_ID union
>
>   * Helpers are added to use non_owning_ref_lock to implement non-owning
>     ref semantics as described above
>     * invalidate_non_owning_refs - helper to clobber all non-owning refs
>       matching a particular bpf_active_lock identity. Replaces
>       release_on_unlock logic in process_spin_lock.
>     * ref_set_non_owning_lock - set non_owning_ref_lock for a reg based
>       on current verifier state
>     * ref_convert_owning_non_owning - convert owning reference w/
>       specified ref_obj_id to non-owning references. Setup
>       non_owning_ref_lock for each reg with that ref_obj_id and 0 out
>       its ref_obj_id
>
> After these changes, linked_list's "release on unlock" logic continues
> to function as before, except for the semantic differences noted above.
> The patch immediately following this one makes minor changes to
> linked_list selftests to account for the differing behavior.
>

I think you need to squash the selftest changes into this one to ensure clean
bisection.

> Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
> ---
>  include/linux/bpf.h          |   1 +
>  include/linux/bpf_verifier.h |  39 ++++-----
>  kernel/bpf/verifier.c        | 164 +++++++++++++++++++++++++----------
>  3 files changed, 136 insertions(+), 68 deletions(-)
>
> diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> index 35c18a98c21a..9a79ebe1774c 100644
> --- a/include/linux/bpf.h
> +++ b/include/linux/bpf.h
> @@ -180,6 +180,7 @@ enum btf_field_type {
>  	BPF_KPTR       = BPF_KPTR_UNREF | BPF_KPTR_REF,
>  	BPF_LIST_HEAD  = (1 << 4),
>  	BPF_LIST_NODE  = (1 << 5),
> +	BPF_GRAPH_NODE_OR_ROOT = BPF_LIST_NODE | BPF_LIST_HEAD,
>  };
>
>  struct btf_field_kptr {
> diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h
> index aa83de1fe755..7b5fbb66446c 100644
> --- a/include/linux/bpf_verifier.h
> +++ b/include/linux/bpf_verifier.h
> @@ -43,6 +43,22 @@ enum bpf_reg_liveness {
>  	REG_LIVE_DONE = 0x8, /* liveness won't be updating this register anymore */
>  };
>
> +/* For every reg representing a map value or allocated object pointer,
> + * we consider the tuple of (ptr, id) for them to be unique in verifier
> + * context and consider them to not alias each other for the purposes of
> + * tracking lock state.
> + */
> +struct bpf_active_lock {
> +	/* This can either be reg->map_ptr or reg->btf. If ptr is NULL,
> +	 * there's no active lock held, and other fields have no
> +	 * meaning. If non-NULL, it indicates that a lock is held and
> +	 * id member has the reg->id of the register which can be >= 0.
> +	 */
> +	void *ptr;
> +	/* This will be reg->id */
> +	u32 id;
> +};
> +
>  struct bpf_reg_state {
>  	/* Ordering of fields matters.  See states_equal() */
>  	enum bpf_reg_type type;
> @@ -68,6 +84,7 @@ struct bpf_reg_state {
>  		struct {
>  			struct btf *btf;
>  			u32 btf_id;
> +			struct bpf_active_lock non_owning_ref_lock;
>  		};

As Alexei said, it'd be better to merge patch 1 and patch 2. But if not, we
should probably increase the size of 'raw' member in this change.

>
>  		struct { /* for PTR_TO_MEM | PTR_TO_MEM_OR_NULL */
> @@ -226,11 +243,6 @@ struct bpf_reference_state {
>  	 * exiting a callback function.
>  	 */
>  	int callback_ref;
> -	/* Mark the reference state to release the registers sharing the same id
> -	 * on bpf_spin_unlock (for nodes that we will lose ownership to but are
> -	 * safe to access inside the critical section).
> -	 */
> -	bool release_on_unlock;
>  };
>
> [...]
> +static void invalidate_non_owning_refs(struct bpf_verifier_env *env,
> +				       struct bpf_active_lock *lock)
> +{
> +	struct bpf_func_state *unused;
> +	struct bpf_reg_state *reg;
> +
> +	bpf_for_each_reg_in_vstate(env->cur_state, unused, reg, ({
> +		if (reg->non_owning_ref_lock.ptr &&
> +		    reg->non_owning_ref_lock.ptr == lock->ptr &&
> +		    reg->non_owning_ref_lock.id == lock->id)
> +			__mark_reg_unknown(env, reg);

Probably better to do:

	if (!env->allow_ptr_leaks)
		__mark_reg_not_init(...);
	else
		__mark_reg_unknown(...);

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v4 bpf-next 08/11] bpf: Special verifier handling for bpf_rbtree_{remove, first}
  2023-02-09 17:41 ` [PATCH v4 bpf-next 08/11] bpf: Special verifier handling for bpf_rbtree_{remove, first} Dave Marchevsky
  2023-02-10  3:11   ` Alexei Starovoitov
@ 2023-02-10 13:55   ` Kumar Kartikeya Dwivedi
  2023-02-10 17:21     ` Alexei Starovoitov
  2023-02-10 19:01     ` Dave Marchevsky
  1 sibling, 2 replies; 27+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2023-02-10 13:55 UTC (permalink / raw)
  To: Dave Marchevsky
  Cc: bpf, Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Kernel Team, Tejun Heo

On Thu, Feb 09, 2023 at 06:41:41PM CET, Dave Marchevsky wrote:
> Newly-added bpf_rbtree_{remove,first} kfuncs have some special properties
> that require handling in the verifier:
>
>   * both bpf_rbtree_remove and bpf_rbtree_first return the type containing
>     the bpf_rb_node field, with the offset set to that field's offset,
>     instead of a struct bpf_rb_node *
>     * mark_reg_graph_node helper added in previous patch generalizes
>       this logic, use it
>
>   * bpf_rbtree_remove's node input is a node that's been inserted
>     in the tree - a non-owning reference.
>
>   * bpf_rbtree_remove must invalidate non-owning references in order to
>     avoid aliasing issue. Use previously-added
>     invalidate_non_owning_refs helper to mark this function as a
>     non-owning ref invalidation point.
>
>   * Unlike other functions, which convert one of their input arg regs to
>     non-owning reference, bpf_rbtree_first takes no arguments and just
>     returns a non-owning reference (possibly null)
>     * For now verifier logic for this is special-cased instead of
>       adding new kfunc flag.
>
> This patch, along with the previous one, complete special verifier
> handling for all rbtree API functions added in this series.
>

I think there are two issues with the current approach. The fundamental
assumption with non-owning references is that they are part of the collection. So
bpf_rbtree_{add,first}, bpf_list_push_{front,back} will create them, as no node
is being removed from the collection. Marking bpf_rbtree_remove (and in the future
bpf_list_del) as invalidation points is also right, since once a node has been
removed it is going to be unclear whether existing non-owning references have
the same value, and thus the property of 'part of the collection' will be
broken.

The first issue relates to usability. If I have non-owning references to nodes
inserted into both a list and an rbtree, bpf_rbtree_remove should only
invalidate the ones that are part of the particular rbtree. It should have no
effect on others. Likewise for the bpf_list_del operation in the future.
Therefore, we need to track the collection identity associated with each
non-owning reference, then only invalidate non-owning references associated with
the same collection.

The case of bpf_spin_unlock is different, which should invalidate all non-owning
references.

The second issue is more serious. By not tracking the collection identity, we
will currently allow a non-owning reference for an object inserted into a list
to be passed to bpf_rbtree_remove, because the verifier cannot discern between
'inserted into rbtree' vs 'inserted into list'. For it, both are currently
equivalent in the verifier state. An object is allowed to have both
bpf_list_node and bpf_rb_node, but it can only be part of one collection at a
time (because of no shared ownership).

	struct obj {
		bpf_list_node ln;
		bpf_rb_node rn;
	};

	bpf_list_push_front(head, &obj->ln); // node is non-own-ref
	bpf_rbtree_remove(&obj->rn); // should not work, but does

So some notion of a collection identity needs to be constructed, the amount of
data which needs to be remembered in each non-owning reference's register state
depends on our requirements.

The first sanity check is that bpf_rbtree_remove only removes something in an
rbtree, so probably an enum member indicating whether collection is a list or
rbtree. To ensure proper scoped invalidation, we will unfortunately need more
than just the reg->id of the reg holding the graph root, since map values of
different maps may have same id (0). Hence, we need id and ptr similar to the
active lock case for proper matching. Even this won't be enough, as there can be
multiple list or rbtree roots in a particular memory region, therefore the
offset also needs to be part of the collection identity.

So it seems it will amount to:

	struct bpf_collection_id {
		enum bpf_collection_type type;
		void *ptr;
		int id;
		int off;
	};

There might be ways to optimize the memory footprint of this struct, but I'm
just trying to state why we'll need to include all four, so we don't miss out on
a corner case again.
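
To make the scoping concrete, here is a minimal sketch of what
collection-scoped invalidation could look like, assuming each non-owning
reference's register state grows a hypothetical 'coll' member of the type
above (all names are illustrative, nothing here is from this series):

	static void invalidate_non_owning_refs_in_coll(struct bpf_verifier_env *env,
						       struct bpf_collection_id *coll)
	{
		struct bpf_func_state *unused;
		struct bpf_reg_state *reg;

		bpf_for_each_reg_in_vstate(env->cur_state, unused, reg, ({
			if (reg->coll.ptr && reg->coll.type == coll->type &&
			    reg->coll.ptr == coll->ptr &&
			    reg->coll.id == coll->id &&
			    reg->coll.off == coll->off)
				__mark_reg_unknown(env, reg);
		}));
	}

bpf_spin_unlock would keep invalidating all non-owning references as it does
today.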

> Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
> ---
> [...]

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v4 bpf-next 08/11] bpf: Special verifier handling for bpf_rbtree_{remove, first}
  2023-02-10  3:11   ` Alexei Starovoitov
  2023-02-10  8:22     ` Dave Marchevsky
@ 2023-02-10 14:15     ` Kumar Kartikeya Dwivedi
  1 sibling, 0 replies; 27+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2023-02-10 14:15 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Dave Marchevsky, bpf, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, Kernel Team, Tejun Heo

On Fri, Feb 10, 2023 at 04:11:25AM CET, Alexei Starovoitov wrote:
> On Thu, Feb 09, 2023 at 09:41:41AM -0800, Dave Marchevsky wrote:
> > @@ -9924,11 +9934,12 @@ static int check_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
> >  				   meta.func_id == special_kfunc_list[KF_bpf_list_pop_back]) {
> >  				struct btf_field *field = meta.arg_list_head.field;
> >
> > -				mark_reg_known_zero(env, regs, BPF_REG_0);
> > -				regs[BPF_REG_0].type = PTR_TO_BTF_ID | MEM_ALLOC;
> > -				regs[BPF_REG_0].btf = field->graph_root.btf;
> > -				regs[BPF_REG_0].btf_id = field->graph_root.value_btf_id;
> > -				regs[BPF_REG_0].off = field->graph_root.node_offset;
> > +				mark_reg_graph_node(regs, BPF_REG_0, &field->graph_root);
> > +			} else if (meta.func_id == special_kfunc_list[KF_bpf_rbtree_remove] ||
> > +				   meta.func_id == special_kfunc_list[KF_bpf_rbtree_first]) {
> > +				struct btf_field *field = meta.arg_rbtree_root.field;
> > +
> > +				mark_reg_graph_node(regs, BPF_REG_0, &field->graph_root);
> >  			} else if (meta.func_id == special_kfunc_list[KF_bpf_cast_to_kern_ctx]) {
> >  				mark_reg_known_zero(env, regs, BPF_REG_0);
> >  				regs[BPF_REG_0].type = PTR_TO_BTF_ID | PTR_TRUSTED;
> > @@ -9994,7 +10005,13 @@ static int check_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
> >  			if (is_kfunc_ret_null(&meta))
> >  				regs[BPF_REG_0].id = id;
> >  			regs[BPF_REG_0].ref_obj_id = id;
> > +		} else if (meta.func_id == special_kfunc_list[KF_bpf_rbtree_first]) {
> > +			ref_set_non_owning_lock(env, &regs[BPF_REG_0]);
> >  		}
>
> Looking at the above code where R0 state is set across two different if-s
> it feels that bool non_owning_ref_lock from patch 2 shouldn't be a bool.
>
> Patch 7 also has this split initialization of the reg state.
> First it does mark_reg_graph_node() which sets regs[regno].type = PTR_TO_BTF_ID | MEM_ALLOC
> and then it does ref_set_non_owning_lock() that sets that bool flag.
> Setting PTR_TO_BTF_ID | MEM_ALLOC in the helper without setting ref_obj_id > 0
> at the same time feels error prone.
>
> This non_owning_ref_lock bool flag is actually a just flag.
> I think it would be cleaner to make it similar to MEM_ALLOC and call it
> NON_OWN_REF = BIT(14 + BPF_BASE_TYPE_BITS).
>
> Then we can set it at once in this patch and in patch 7 and avoid this split init.
> The check in patch 2 also will become cleaner.
> Instead of:
> if (type_is_ptr_alloc_obj(reg->type) && reg->non_owning_ref_lock)
> it will be
> if (reg->type == PTR_TO_BTF_ID | MEM_ALLOC | NON_OWN_REF)
>
> the transition from owning to non-owning would be easier to follow as well:
> PTR_TO_BTF_ID | MEM_ALLOC with ref_obj_id > 0
>  ->
>    PTR_TO_BTF_ID | MEM_ALLOC | NON_OWN_REF with ref_obj_id == 0
>

Separate type flag looks cleaner to me too, especially now that such non-owning
references have concrete semantics and context associated with them.

> and it will probably help to avoid bugs where PTR_TO_BTF_ID | MEM_ALLOC is accepted
> but we forgot to check ref_obj_id. There are no such places now, but it feels
> less error prone with proper flag instead of bool.
>
> I would also squash patches 1 and 2. Since we've analyzed correctness of patch 2 well enough
> it doesn't make sense to go through the churn in patch 1 just to delete it in patch 2.
>

+1

> wdyt?

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v4 bpf-next 04/11] bpf: Add basic bpf_rb_{root,node} support
  2023-02-09 17:41 ` [PATCH v4 bpf-next 04/11] bpf: Add basic bpf_rb_{root,node} support Dave Marchevsky
@ 2023-02-10 14:18   ` Kumar Kartikeya Dwivedi
  0 siblings, 0 replies; 27+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2023-02-10 14:18 UTC (permalink / raw)
  To: Dave Marchevsky
  Cc: bpf, Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Kernel Team, Tejun Heo

On Thu, Feb 09, 2023 at 06:41:37PM CET, Dave Marchevsky wrote:
> This patch adds special BPF_RB_{ROOT,NODE} btf_field_types similar to
> BPF_LIST_{HEAD,NODE}, adds the necessary plumbing to detect the new
> types, and adds bpf_rb_root_free function for freeing bpf_rb_root in
> map_values.
>
> structs bpf_rb_root and bpf_rb_node are opaque types meant to
> obscure structs rb_root_cached and rb_node, respectively.
>
> btf_struct_access will prevent BPF programs from touching these special
> fields automatically now that they're recognized.
>
> btf_check_and_fixup_fields now groups list_head and rb_root together as
> "graph root" fields and {list,rb}_node as "graph node", and does same
> ownership cycle checking as before. Note that this function does _not_
> prevent ownership type mixups (e.g. rb_root owning list_node) - that's
> handled by btf_parse_graph_root.
>
> After this patch, a bpf program can have a struct bpf_rb_root in a
> map_value, but not add anything to nor do anything useful with it.
>
> Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
> ---
> [...]
> +#define GRAPH_ROOT_MASK (BPF_LIST_HEAD | BPF_RB_ROOT)
> +#define GRAPH_NODE_MASK (BPF_LIST_NODE | BPF_RB_NODE)
> +
>  int btf_check_and_fixup_fields(const struct btf *btf, struct btf_record *rec)
>  {
>  	int i;
>
> -	/* There are two owning types, kptr_ref and bpf_list_head. The former
> -	 * only supports storing kernel types, which can never store references
> -	 * to program allocated local types, atleast not yet. Hence we only need
> -	 * to ensure that bpf_list_head ownership does not form cycles.
> +	/* There are three types that signify ownership of some other type:
> +	 *  kptr_ref, bpf_list_head, bpf_rb_root.
> +	 * kptr_ref only supports storing kernel types, which can't store
> +	 * references to program allocated local types.
> +	 *
> +	 * Hence we only need to ensure that bpf_{list_head,rb_root} ownership
> +	 * does not form cycles.
>  	 */
> -	if (IS_ERR_OR_NULL(rec) || !(rec->field_mask & BPF_LIST_HEAD))
> +	if (IS_ERR_OR_NULL(rec) || !(rec->field_mask & GRAPH_ROOT_MASK))
>  		return 0;
>  	for (i = 0; i < rec->cnt; i++) {
>  		struct btf_struct_meta *meta;
>  		u32 btf_id;
>
> -		if (!(rec->fields[i].type & BPF_LIST_HEAD))
> +		if (!(rec->fields[i].type & GRAPH_ROOT_MASK))
>  			continue;
>  		btf_id = rec->fields[i].graph_root.value_btf_id;
>  		meta = btf_find_struct_meta(btf, btf_id);
> @@ -3762,39 +3803,47 @@ int btf_check_and_fixup_fields(const struct btf *btf, struct btf_record *rec)
>  			return -EFAULT;
>  		rec->fields[i].graph_root.value_rec = meta->record;
>
> -		if (!(rec->field_mask & BPF_LIST_NODE))
> +		/* We need to set value_rec for all root types, but no need
> +		 * to check ownership cycle for a type unless it's also a
> +		 * node type.
> +		 */
> +		if (!(rec->field_mask & GRAPH_NODE_MASK))
>  			continue;
>
>  		/* We need to ensure ownership acyclicity among all types. The
>  		 * proper way to do it would be to topologically sort all BTF
>  		 * IDs based on the ownership edges, since there can be multiple
> -		 * bpf_list_head in a type. Instead, we use the following
> -		 * reasoning:
> +		 * bpf_{list_head,rb_node} in a type. Instead, we use the
> +		 * following reasoning:
>  		 *
>  		 * - A type can only be owned by another type in user BTF if it
> -		 *   has a bpf_list_node.
> +		 *   has a bpf_{list,rb}_node. Let's call these node types.
>  		 * - A type can only _own_ another type in user BTF if it has a
> -		 *   bpf_list_head.
> +		 *   bpf_{list_head,rb_root}. Let's call these root types.
>  		 *
> -		 * We ensure that if a type has both bpf_list_head and
> -		 * bpf_list_node, its element types cannot be owning types.
> +		 * We ensure that if a type is both a root and node, its
> +		 * element types cannot be root types.
>  		 *
>  		 * To ensure acyclicity:
>  		 *
> -		 * When A only has bpf_list_head, ownership chain can be:
> +		 * When A is a root type but not a node, its ownership
> +		 * chain can be:
>  		 *	A -> B -> C
>  		 * Where:
> -		 * - B has both bpf_list_head and bpf_list_node.
> -		 * - C only has bpf_list_node.
> +		 * - A is a root, e.g. has bpf_rb_root.
> +		 * - B is both a root and node, e.g. has bpf_rb_node and
> +		 *   bpf_list_head.
> +		 * - C is only a node, e.g. has bpf_list_node.
>  		 *
> -		 * When A has both bpf_list_head and bpf_list_node, some other
> -		 * type already owns it in the BTF domain, hence it can not own
> -		 * another owning type through any of the bpf_list_head edges.
> +		 * When A is both a root and node, some other type already
> +		 * owns it in the BTF domain, hence it can not own
> +		 * another root type through any of the ownership edges.
>  		 *	A -> B
>  		 * Where:
> -		 * - B only has bpf_list_node.
> +		 * - A is both a root and node.
> +		 * - B is only a node.
>  		 */
> -		if (meta->record->field_mask & BPF_LIST_HEAD)
> +		if (meta->record->field_mask & GRAPH_ROOT_MASK)
>  			return -ELOOP;

While you are at it, can you include BTF selftests (similar to what linked list
tests are doing in test_btf) to ensure all of this being correctly rejected for
rbtree and mixed rbtree + list cases?
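
For example (test_btf encodes raw BTF, so this is only the C-level shape of one
mixed case worth covering; type and field names are made up): foo below is both
a root and a node and owns bar, which is itself a root, so
btf_check_and_fixup_fields should reject it with -ELOOP:

	struct bar;

	struct foo {
		struct bpf_rb_root rtree __contains(bar, rnode);
		struct bpf_list_node lnode;
	};

	struct bar {
		struct bpf_rb_node rnode;
		struct bpf_list_head lhead __contains(foo, lnode);
	};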

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v4 bpf-next 01/11] bpf: Migrate release_on_unlock logic to non-owning ref semantics
  2023-02-10 13:24   ` Kumar Kartikeya Dwivedi
@ 2023-02-10 17:13     ` Alexei Starovoitov
  0 siblings, 0 replies; 27+ messages in thread
From: Alexei Starovoitov @ 2023-02-10 17:13 UTC (permalink / raw)
  To: Kumar Kartikeya Dwivedi
  Cc: Dave Marchevsky, bpf, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, Kernel Team, Tejun Heo

On Fri, Feb 10, 2023 at 02:24:13PM +0100, Kumar Kartikeya Dwivedi wrote:
> > [...]
> > +static void invalidate_non_owning_refs(struct bpf_verifier_env *env,
> > +				       struct bpf_active_lock *lock)
> > +{
> > +	struct bpf_func_state *unused;
> > +	struct bpf_reg_state *reg;
> > +
> > +	bpf_for_each_reg_in_vstate(env->cur_state, unused, reg, ({
> > +		if (reg->non_owning_ref_lock.ptr &&
> > +		    reg->non_owning_ref_lock.ptr == lock->ptr &&
> > +		    reg->non_owning_ref_lock.id == lock->id)
> > +			__mark_reg_unknown(env, reg);
> 
> Probably better to do:
> 
> 	if (!env->allow_ptr_leaks)
> 		__mark_reg_not_init(...);
> 	else
> 		__mark_reg_unknown(...);

That's redundant. kfuncs and any PTR_TO_BTF_ID access requires allow_ptr_leaks.
See first check in check_ptr_to_btf_access()

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v4 bpf-next 08/11] bpf: Special verifier handling for bpf_rbtree_{remove, first}
  2023-02-10 13:55   ` Kumar Kartikeya Dwivedi
@ 2023-02-10 17:21     ` Alexei Starovoitov
  2023-02-10 18:03       ` Kumar Kartikeya Dwivedi
  2023-02-10 19:01     ` Dave Marchevsky
  1 sibling, 1 reply; 27+ messages in thread
From: Alexei Starovoitov @ 2023-02-10 17:21 UTC (permalink / raw)
  To: Kumar Kartikeya Dwivedi
  Cc: Dave Marchevsky, bpf, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, Kernel Team, Tejun Heo

On Fri, Feb 10, 2023 at 02:55:41PM +0100, Kumar Kartikeya Dwivedi wrote:
> On Thu, Feb 09, 2023 at 06:41:41PM CET, Dave Marchevsky wrote:
> > Newly-added bpf_rbtree_{remove,first} kfuncs have some special properties
> > that require handling in the verifier:
> >
> >   * both bpf_rbtree_remove and bpf_rbtree_first return the type containing
> >     the bpf_rb_node field, with the offset set to that field's offset,
> >     instead of a struct bpf_rb_node *
> >     * mark_reg_graph_node helper added in previous patch generalizes
> >       this logic, use it
> >
> >   * bpf_rbtree_remove's node input is a node that's been inserted
> >     in the tree - a non-owning reference.
> >
> >   * bpf_rbtree_remove must invalidate non-owning references in order to
> >     avoid aliasing issue. Use previously-added
> >     invalidate_non_owning_refs helper to mark this function as a
> >     non-owning ref invalidation point.
> >
> >   * Unlike other functions, which convert one of their input arg regs to
> >     non-owning reference, bpf_rbtree_first takes no arguments and just
> >     returns a non-owning reference (possibly null)
> >     * For now verifier logic for this is special-cased instead of
> >       adding new kfunc flag.
> >
> > This patch, along with the previous one, completes special verifier
> > handling for all rbtree API functions added in this series.
> >
> 
> I think there are two issues with the current approach. The fundamental
> assumption with non-owning references is that it is part of the collection. So
> bpf_rbtree_{add,first}, bpf_list_push_{front,back} will create them, as no node
> is being removed from collection. Marking bpf_rbtree_remove (and in the future
> bpf_list_del) as invalidation points is also right, since once a node has been
> removed it is going to be unclear whether existing non-owning references have
> the same value, and thus the property of 'part of the collection' will be
> broken.

correct, but the patch set does invalidate after bpf_rbtree_remove(),
so it's not an issue.

> The first issue relates to usability. If I have non-owning references to nodes
> inserted into both a list and an rbtree, bpf_rbtree_remove should only
> invalidate the ones that are part of the particular rbtree. It should have no
> effect on others. Likewise for the bpf_list_del operation in the future.
> Therefore, we need to track the collection identity associated with each
> non-owning reference, then only invalidate non-owning references associated with
> the same collection.
> 
> The case of bpf_spin_unlock is different, which should invalidate all non-owning
> references.
> 
> The second issue is more serious. By not tracking the collection identity, we
> will currently allow a non-owning reference for an object inserted into a list
> to be passed to bpf_rbtree_remove, because the verifier cannot discern between
> 'inserted into rbtree' vs 'inserted into list'. For it, both are currently
> equivalent in the verifier state. An object is allowed to have both
> bpf_list_node and bpf_rb_node, but it can only be part of one collection at a
> time (because of no shared ownership).
> 
> 	struct obj {
> 		bpf_list_node ln;
> 		bpf_rb_node rn;
> 	};
> 
> 	bpf_list_push_front(head, &obj->ln); // node is non-own-ref
> 	bpf_rbtree_remove(&obj->rn); // should not work, but does

Also correct, but inserting the same single owner node into rbtree and link list
is not supported. Only 'shared ownership' node can be inserted into
two collections.
The check to disallow bpf_list_node and bpf_rb_node in the same obj
can be a follow up patch to close this hole.

> So some notion of a collection identity needs to be constructed, the amount of
> data which needs to be remembered in each non-owning reference's register state
> depends on our requirements.
> 
> The first sanity check is that bpf_rbtree_remove only removes something in an
> rbtree, so probably an enum member indicating whether collection is a list or
> rbtree. To ensure proper scoped invalidation, we will unfortunately need more
> than just the reg->id of the reg holding the graph root, since map values of
> different maps may have same id (0). Hence, we need id and ptr similar to the
> active lock case for proper matching. Even this won't be enough, as there can be
> multiple list or rbtree roots in a particular memory region, therefore the
> offset also needs to be part of the collection identity.
> 
> So it seems it will amount to:
> 
> 	struct bpf_collection_id {
> 		enum bpf_collection_type type;
> 		void *ptr;
> 		int id;
> 		int off;
> 	};
> 
> There might be ways to optimize the memory footprint of this struct, but I'm
> just trying to state why we'll need to include all four, so we don't miss out on
> a corner case again.

The trade-off doesn't feel right here. Tracking collection id complexity in
the verifier for single owner case is not worth it imo.
We should focus on adding support for 'shared ownership' with explicit refcount in the obj.
Then the same obj can be inserted into two rbtrees or into rbtree and link list.

Single owner rb-tree is a big step and we've been trying to make this step for the last ~6 months.
I prefer to do it now and worry about UX, shared owner, etc in the follow ups.
We need to start using lists and rbtree in bpf progs that do real work to get a feel
on whether UX is right or unusable or somewhere in-between.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v4 bpf-next 08/11] bpf: Special verifier handling for bpf_rbtree_{remove, first}
  2023-02-10  8:22     ` Dave Marchevsky
@ 2023-02-10 17:30       ` Alexei Starovoitov
  0 siblings, 0 replies; 27+ messages in thread
From: Alexei Starovoitov @ 2023-02-10 17:30 UTC (permalink / raw)
  To: Dave Marchevsky
  Cc: Dave Marchevsky, bpf, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, Kernel Team, Kumar Kartikeya Dwivedi, Tejun Heo

On Fri, Feb 10, 2023 at 03:22:43AM -0500, Dave Marchevsky wrote:
> On 2/9/23 10:11 PM, Alexei Starovoitov wrote:
> > On Thu, Feb 09, 2023 at 09:41:41AM -0800, Dave Marchevsky wrote:
> >> @@ -9924,11 +9934,12 @@ static int check_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
> >>  				   meta.func_id == special_kfunc_list[KF_bpf_list_pop_back]) {
> >>  				struct btf_field *field = meta.arg_list_head.field;
> >>  
> >> -				mark_reg_known_zero(env, regs, BPF_REG_0);
> >> -				regs[BPF_REG_0].type = PTR_TO_BTF_ID | MEM_ALLOC;
> >> -				regs[BPF_REG_0].btf = field->graph_root.btf;
> >> -				regs[BPF_REG_0].btf_id = field->graph_root.value_btf_id;
> >> -				regs[BPF_REG_0].off = field->graph_root.node_offset;
> >> +				mark_reg_graph_node(regs, BPF_REG_0, &field->graph_root);
> >> +			} else if (meta.func_id == special_kfunc_list[KF_bpf_rbtree_remove] ||
> >> +				   meta.func_id == special_kfunc_list[KF_bpf_rbtree_first]) {
> >> +				struct btf_field *field = meta.arg_rbtree_root.field;
> >> +
> >> +				mark_reg_graph_node(regs, BPF_REG_0, &field->graph_root);
> >>  			} else if (meta.func_id == special_kfunc_list[KF_bpf_cast_to_kern_ctx]) {
> >>  				mark_reg_known_zero(env, regs, BPF_REG_0);
> >>  				regs[BPF_REG_0].type = PTR_TO_BTF_ID | PTR_TRUSTED;
> >> @@ -9994,7 +10005,13 @@ static int check_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
> >>  			if (is_kfunc_ret_null(&meta))
> >>  				regs[BPF_REG_0].id = id;
> >>  			regs[BPF_REG_0].ref_obj_id = id;
> >> +		} else if (meta.func_id == special_kfunc_list[KF_bpf_rbtree_first]) {
> >> +			ref_set_non_owning_lock(env, &regs[BPF_REG_0]);
> >>  		}
> > 
> > Looking at the above code where R0 state is set across two different if-s
> > it feels that bool non_owning_ref_lock from patch 2 shouldn't be a bool.
> 
> Re: "set across two different if-s" - I see what you mean, and the fact that
> both are doing 'meta.func_id == whatever' checks doesn't make it clear why
> they're separate. But note that what sits above the 'else if' that the second
> chunk adds is the "if (is_kfunc_acquire(&meta))" check, acquire_reference_state, etc.

fair enough. let's keep it split.

> "Is function acquire" is a function-level property and, as the kfunc flags I
> tried to add in previous versions of this series indicate, I think that
> "returns a non-owning reference" and "need to invalidate non-owning refs"
> are function-level properties as well.

agree

> As a contrast, the first addition - with mark_reg_graph_node - is more of a
> return-type-level property. Instead of doing meta.func_id checks in that change,
> we could instead assume that any kfunc returning "bpf_rb_node *" is actually
> returning a graph node type w/ bpf_rb_node field. It's certainly a blurry line
> in this case since it's necessary to peek at the bpf_rb_root arg in order to
> provide info about the node type to mark_reg_graph_node. But this is similar
> to RET_PTR_TO_MAP_VALUE logic which requires a bpf_map * to provide info about
> the map value being returned.
> 
> Why does this distinction matter at all? Because I'd like to eventually merge
> helper and kfunc verification as much as possible / reasonable, especially the
> different approaches to func_proto-like logic. Currently, the bpf_func_proto
> approach used by bpf helpers is better at expressing 
> {arg,return}-type level properties. A helper func_proto can do
> 
>   .arg2_type = ARG_PTR_TO_BTF_ID_OR_NULL | OBJ_RELEASE,
> 
> and it's obvious which arg is being released, whereas kfunc equivalent is
> KF_RELEASE flag on the kfunc itself and verifier needs to assume that there's
> a single arg w/ ref_obj_id which is being released. Sure, kfunc annotations
> (e.g. __sz, __alloc) could be extended to support all of this, but that's
> not the current state of things, and such name suffixes wouldn't work for
> retval.
> 
> Similarly, current kfunc definition scheme is better at expressing function-
> level properties:
> 
>   BTF_ID_FLAGS(func, whatever, KF_ACQUIRE)
> 
> There's no func_proto equivalent, the is_acquire_function helper used in
> check_helper_call resorts to "func_id ==" checks. For acquire specifically
> it could be faked with a OBJ_ACQUIRE flag on retval in the proto, but I
> don't know if the same would make sense for "need to invalidate non-owning
> refs" or something like KF_TRUSTED_ARGS.
> 
> Anyways, this was a long-winded way of saying that separating this logic across
> two different if-s was intentional and will help with future refactoring.

Agree with all of the above. We need to address this tech debt and merge
kfunc and helper validation, but this is orthogonal to this patch set.
It's a separate discussion to have with lots of bike shedding ahead.

> > Patch 7 also has this split initialization of the reg state.
> > First it does mark_reg_graph_node() which sets regs[regno].type = PTR_TO_BTF_ID | MEM_ALLOC
> > and then it does ref_set_non_owning_lock() that sets that bool flag.
> > Setting PTR_TO_BTF_ID | MEM_ALLOC in the helper without setting ref_obj_id > 0
> > at the same time feels error prone.
> 
> It's unfortunate that the reg type isn't really complete for rbtree_first
> until after the second chunk of code, but this was already happening with
> bpf_list_pop_{front,back}, which rely on KF_ACQUIRE flag and
> is_kfunc_acquire check to set ref_obj_id on the popped owning reference.
> 
> Maybe to assuage your 'error prone' concern some check can be added at
> the end of check_kfunc_call which ensures that PTR_TO_BTF_ID | MEM_ALLOC
> types are properly configured, and dies with 'verifier internal error'
> if not. I'm not convinced it's necessary, but regardless it would be
> similar to commit 47e34cb74d37 ("bpf: Add verifier check for BPF_PTR_POISON retval and arg")
> which I added a few months ago.

I think it's a good idea to add such a safety check, but let's do it in the follow up.
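
Something along these lines at the end of check_kfunc_call would do, roughly
(the condition and message are only a sketch and assume the NON_OWN_REF flag
discussed below):

	if (base_type(regs[BPF_REG_0].type) == PTR_TO_BTF_ID &&
	    (type_flag(regs[BPF_REG_0].type) & MEM_ALLOC) &&
	    !(type_flag(regs[BPF_REG_0].type) & NON_OWN_REF) &&
	    !regs[BPF_REG_0].ref_obj_id) {
		/* alloc obj retval must be either owning (ref_obj_id > 0)
		 * or explicitly marked non-owning
		 */
		verbose(env, "verifier internal error: MEM_ALLOC retval neither owning nor non-owning\n");
		return -EFAULT;
	}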

> > This non_owning_ref_lock bool flag is actually a just flag.
> > I think it would be cleaner to make it similar to MEM_ALLOC and call it
> > NON_OWN_REF = BIT(14 + BPF_BASE_TYPE_BITS).
> > 
> > Then we can set it at once in this patch and in patch 7 and avoid this split init.
> > The check in patch 2 also will become cleaner.
> > Instead of:
> > if (type_is_ptr_alloc_obj(reg->type) && reg->non_owning_ref_lock)
> > it will be
> > if (reg->type == PTR_TO_BTF_ID | MEM_ALLOC | NON_OWN_REF)
> 
> Minor tangent: this is why I like helpers like type_is_ptr_alloc_obj - they make
> it obvious what properties a reg should have to be considered a certain type by
> the verifier, and provide more context as to what specific type check is
> happening vs a raw check.
> 
> IMO the cleanest version of that check would be if(reg_is_non_owning_ref(reg))
> with the newly-added helper doing what you'd expect.

Like
static bool type_is_non_owning_ref(u32 type)
{
        return base_type(type) == PTR_TO_BTF_ID &&
               (type_flag(type) & (MEM_ALLOC | NON_OWN_REF)) == (MEM_ALLOC | NON_OWN_REF);
}

?
If so that makes sense.

> > 
> > the transition from owning to non-owning would be easier to follow as well:
> > PTR_TO_BTF_ID | MEM_ALLOC with ref_obj_id > 0
> >  -> 
> >    PTR_TO_BTF_ID | MEM_ALLOC | NON_OWN_REF with ref_obj_id == 0
> > 
> > and it will probably help to avoid bugs where PTR_TO_BTF_ID | MEM_ALLOC is accepted
> > but we forgot to check ref_obj_id. There are no such places now, but it feels
> > less error prone with proper flag instead of bool.
> 
> I'm not strongly opposed to a NON_OWN_REF type flag. It's a more granular
> version of the KF_RELEASE_NON_OWN flag which I tried to add in a previous
> version of this series. But some comments:
> 
> * Such a flag would eliminate the need to change bpf_reg_state in this
>   series, but I think this will be temporary. If we add support for nested
>   locks we'll need to reintroduce some "lock identity" again. If we want
>   to improve UX for non-owning reference invalidation in the case where
>   a list_head and rb_root share the same lock, we'll need to introduce some
>   "datastructure root identity" to allow invalidation of only the list's
>   non-owning refs on list_pop.

As replied to Kumar I prefer to minimize the complexity now and worry about
list_head and rb_root under the same lock later.

> * Sure, we could have both the NON_OWN_REF flag and additional {lock,root}
>   identity structures. But there isn't infinite room for type flags and
>   currently non-owning ref concept is contained to just two data structures.
>   IMO in terms of generality this flag is closer to MEM_RINGBUF than
>   PTR_MAYBE_NULL. If we're going to need {lock,root} identity structs
>   and can use them to disambiguate between owning/non-owning refs quickly,
>   why bother with an extra flag?

Let's cross that bridge when we get there.
Currently NON_OWN_REF flag looks strictly cleaner to me than extra 'bool' in reg_state.
If we need to refactor it later we can certainly do that.
I really want to move this feature forward.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v4 bpf-next 08/11] bpf: Special verifier handling for bpf_rbtree_{remove, first}
  2023-02-10 17:21     ` Alexei Starovoitov
@ 2023-02-10 18:03       ` Kumar Kartikeya Dwivedi
  2023-02-10 18:58         ` Alexei Starovoitov
  0 siblings, 1 reply; 27+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2023-02-10 18:03 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Dave Marchevsky, bpf, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, Kernel Team, Tejun Heo

On Fri, Feb 10, 2023 at 06:21:37PM CET, Alexei Starovoitov wrote:
> On Fri, Feb 10, 2023 at 02:55:41PM +0100, Kumar Kartikeya Dwivedi wrote:
> > On Thu, Feb 09, 2023 at 06:41:41PM CET, Dave Marchevsky wrote:
> > > Newly-added bpf_rbtree_{remove,first} kfuncs have some special properties
> > > that require handling in the verifier:
> > >
> > >   * both bpf_rbtree_remove and bpf_rbtree_first return the type containing
> > >     the bpf_rb_node field, with the offset set to that field's offset,
> > >     instead of a struct bpf_rb_node *
> > >     * mark_reg_graph_node helper added in previous patch generalizes
> > >       this logic, use it
> > >
> > >   * bpf_rbtree_remove's node input is a node that's been inserted
> > >     in the tree - a non-owning reference.
> > >
> > >   * bpf_rbtree_remove must invalidate non-owning references in order to
> > >     avoid aliasing issue. Use previously-added
> > >     invalidate_non_owning_refs helper to mark this function as a
> > >     non-owning ref invalidation point.
> > >
> > >   * Unlike other functions, which convert one of their input arg regs to
> > >     non-owning reference, bpf_rbtree_first takes no arguments and just
> > >     returns a non-owning reference (possibly null)
> > >     * For now verifier logic for this is special-cased instead of
> > >       adding new kfunc flag.
> > >
> > > This patch, along with the previous one, completes special verifier
> > > handling for all rbtree API functions added in this series.
> > >
> >
> > I think there are two issues with the current approach. The fundamental
> > assumption with non-owning references is that it is part of the collection. So
> > bpf_rbtree_{add,first}, bpf_list_push_{front,back} will create them, as no node
> > is being removed from collection. Marking bpf_rbtree_remove (and in the future
> > bpf_list_del) as invalidation points is also right, since once a node has been
> > removed it is going to be unclear whether existing non-owning references have
> > the same value, and thus the property of 'part of the collection' will be
> > broken.
>
> correct, but the patch set does invalidate after bpf_rbtree_remove(),
> so it's not an issue.
>
> > The first issue relates to usability. If I have non-owning references to nodes
> > inserted into both a list and an rbtree, bpf_rbtree_remove should only
> > invalidate the ones that are part of the particular rbtree. It should have no
> > effect on others. Likewise for the bpf_list_del operation in the future.
> > Therefore, we need to track the collection identity associated with each
> > non-owning reference, then only invalidate non-owning references associated with
> > the same collection.
> >
> > The case of bpf_spin_unlock is different, which should invalidate all non-owning
> > references.
> >
> > The second issue is more serious. By not tracking the collection identity, we
> > will currently allow a non-owning reference for an object inserted into a list
> > to be passed to bpf_rbtree_remove, because the verifier cannot discern between
> > 'inserted into rbtree' vs 'inserted into list'. For it, both are currently
> > equivalent in the verifier state. An object is allowed to have both
> > bpf_list_node and bpf_rb_node, but it can only be part of one collection at a
> > time (because of no shared ownership).
> >
> > 	struct obj {
> > 		bpf_list_node ln;
> > 		bpf_rb_node rn;
> > 	};
> >
> > 	bpf_list_push_front(head, &obj->ln); // node is non-own-ref
> > 	bpf_rbtree_remove(&obj->rn); // should not work, but does
>
> Also correct, but inserting the same single owner node into rbtree and link list
> is not supported. Only 'shared ownership' node can be inserted into
> two collections.

What is supported is having an object be part of a list and an rbtree one at a
time, which is what I'm talking about here. Shared ownership has nothing to do
with this.

> The check to disallow bpf_list_node and bpf_rb_node in the same obj
> can be a follow up patch to close this hole.
>

Fine, that would also 'fix' this problem, where a non-owning reference part of a
list could be passed to bpf_rbtree_remove etc. If there can only be a list node
or an rbtree node in an object, such a case cannot be constructed. But I think
it's an awkward limitation.

> > So some notion of a collection identity needs to be constructed, the amount of
> > data which needs to be remembered in each non-owning reference's register state
> > depends on our requirements.
> >
> > The first sanity check is that bpf_rbtree_remove only removes something in an
> > rbtree, so probably an enum member indicating whether collection is a list or
> > rbtree. To ensure proper scoped invalidation, we will unfortunately need more
> > than just the reg->id of the reg holding the graph root, since map values of
> > different maps may have same id (0). Hence, we need id and ptr similar to the
> > active lock case for proper matching. Even this won't be enough, as there can be
> > multiple list or rbtree roots in a particular memory region, therefore the
> > offset also needs to be part of the collection identity.
> >
> > So it seems it will amount to:
> >
> > 	struct bpf_collection_id {
> > 		enum bpf_collection_type type;
> > 		void *ptr;
> > 		int id;
> > 		int off;
> > 	};
> >
> > There might be ways to optimize the memory footprint of this struct, but I'm
> > just trying to state why we'll need to include all four, so we don't miss out on
> > a corner case again.
>
> The trade-off doesn't feel right here. Tracking collection id complexity in
> the verifier for single owner case is not worth it imo.

It was more about correctness after this set is applied than being worth it. We
could argue it's not worth it for the first issue which relates to usability.
Dave has already mentioned that point. But the second one is simply incorrect to
allow. As soon as you want an object which can be part of both a list and rbtree
at different times (e.g., an item usually part of an rbtree but which is popped
out and moved to a list for some processing), people will be able to trigger it.

Simply disallowing that (as you said above) is also an option. But whenever you
do allow it, you will need to add something like this. It has little to do with
whether shared ownership is supported or not.

> We should focus on adding support for 'shared ownership' with explicit refcount in the obj.
> Then the same obj can be inserted into two rbtrees or into rbtree and link list.
>
> Single owner rb-tree is a big step and we've been trying to make this step for the last ~6 months.
> I prefer to do it now and worry about UX, shared owner, etc in the follow ups.
> We need to start using lists and rbtree in bpf progs that do real work to get a feel
> on whether UX is right or unusable or somewhere in-between.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v4 bpf-next 08/11] bpf: Special verifier handling for bpf_rbtree_{remove, first}
  2023-02-10 18:03       ` Kumar Kartikeya Dwivedi
@ 2023-02-10 18:58         ` Alexei Starovoitov
  2023-02-10 19:38           ` Kumar Kartikeya Dwivedi
  0 siblings, 1 reply; 27+ messages in thread
From: Alexei Starovoitov @ 2023-02-10 18:58 UTC (permalink / raw)
  To: Kumar Kartikeya Dwivedi
  Cc: Dave Marchevsky, bpf, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, Kernel Team, Tejun Heo

On Fri, Feb 10, 2023 at 07:03:46PM +0100, Kumar Kartikeya Dwivedi wrote:
> On Fri, Feb 10, 2023 at 06:21:37PM CET, Alexei Starovoitov wrote:
> > On Fri, Feb 10, 2023 at 02:55:41PM +0100, Kumar Kartikeya Dwivedi wrote:
> > > On Thu, Feb 09, 2023 at 06:41:41PM CET, Dave Marchevsky wrote:
> > > > Newly-added bpf_rbtree_{remove,first} kfuncs have some special properties
> > > > that require handling in the verifier:
> > > >
> > > >   * both bpf_rbtree_remove and bpf_rbtree_first return the type containing
> > > >     the bpf_rb_node field, with the offset set to that field's offset,
> > > >     instead of a struct bpf_rb_node *
> > > >     * mark_reg_graph_node helper added in previous patch generalizes
> > > >       this logic, use it
> > > >
> > > >   * bpf_rbtree_remove's node input is a node that's been inserted
> > > >     in the tree - a non-owning reference.
> > > >
> > > >   * bpf_rbtree_remove must invalidate non-owning references in order to
> > > >     avoid aliasing issue. Use previously-added
> > > >     invalidate_non_owning_refs helper to mark this function as a
> > > >     non-owning ref invalidation point.
> > > >
> > > >   * Unlike other functions, which convert one of their input arg regs to
> > > >     non-owning reference, bpf_rbtree_first takes no arguments and just
> > > >     returns a non-owning reference (possibly null)
> > > >     * For now verifier logic for this is special-cased instead of
> > > >       adding new kfunc flag.
> > > >
> > > > This patch, along with the previous one, completes special verifier
> > > > handling for all rbtree API functions added in this series.
> > > >
> > >
> > > I think there are two issues with the current approach. The fundamental
> > > assumption with non-owning references is that it is part of the collection. So
> > > bpf_rbtree_{add,first}, bpf_list_push_{front,back} will create them, as no node
> > > is being removed from collection. Marking bpf_rbtree_remove (and in the future
> > > bpf_list_del) as invalidation points is also right, since once a node has been
> > > removed it is going to be unclear whether existing non-owning references have
> > > the same value, and thus the property of 'part of the collection' will be
> > > broken.
> >
> > correct, but the patch set does invalidate after bpf_rbtree_remove(),
> > so it's not an issue.
> >
> > > The first issue relates to usability. If I have non-owning references to nodes
> > > inserted into both a list and an rbtree, bpf_rbtree_remove should only
> > > invalidate the ones that are part of the particular rbtree. It should have no
> > > effect on others. Likewise for the bpf_list_del operation in the future.
> > > Therefore, we need to track the collection identity associated with each
> > > non-owning reference, then only invalidate non-owning references associated with
> > > the same collection.
> > >
> > > The case of bpf_spin_unlock is different, which should invalidate all non-owning
> > > references.
> > >
> > > The second issue is more serious. By not tracking the collection identity, we
> > > will currently allow a non-owning reference for an object inserted into a list
> > > to be passed to bpf_rbtree_remove, because the verifier cannot discern between
> > > 'inserted into rbtree' vs 'inserted into list'. For it, both are currently
> > > equivalent in the verifier state. An object is allowed to have both
> > > bpf_list_node and bpf_rb_node, but it can only be part of one collection at a
> > > time (because of no shared ownership).
> > >
> > > 	struct obj {
> > > 		bpf_list_node ln;
> > > 		bpf_rb_node rn;
> > > 	};
> > >
> > > 	bpf_list_push_front(head, &obj->ln); // node is non-own-ref
> > > 	bpf_rbtree_remove(&obj->rn); // should not work, but does
> >
> > Also correct, but inserting the same single owner node into rbtree and link list
> > is not supported. Only 'shared ownership' node can be inserted into
> > two collections.
> 
> What is supported is having an object be part of a list and an rbtree one at a
> time, which is what I'm talking about here. Shared ownership has nothing to do
> with this.

I wouldn't call it "supported".
btf_parse_struct_metas() doesn't error on structs with bpf_list_node and bpf_rb_node in them.
It allows two bpf_list_node-s as well.
list_node+rb_node is broken as you said.
I'm not sure two list_node-s is correct either. For the same reasons.
So it's an omission rather than support.

> > The check to disallow bpf_list_node and bpf_rb_node in the same obj
> > can be a follow up patch to close this hole.
> >
> 
> Fine, that would also 'fix' this problem, where a non-owning reference part of a
> list could be passed to bpf_rbtree_remove etc. If there can only be a list node
> or an rbtree node in an object, such a case cannot be constructed. But I think
> it's an awkward limitation.

Regardless of whether non-own refs exist or not, two list_node-s are unusable
with the single owner model.
struct obj {
	bpf_list_node ln;
	bpf_list_node ln2;
};

bpf_list_push_front(head, &obj->ln); // doesn't matter what we do with obj here
bpf_list_push_front(head2, &obj->ln2); // cannot be made to work correctly 
The only way to actually support this case is to have refcount in obj.

It can "work" when obj is a part of only one list at a time, but one can reasonably
argue that a requirement to have two 'bpf_list_node'-s in an obj just to insert obj
in one list, pop it, and then insert into another list through a different list_node
is nothing but a waste of memory.
The user should be able to use the same 'bpf_list_node' to insert into one
list_head, remove it from there, and then insert into another list_head.
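
To sketch the direction (field name and helper semantics are hypothetical,
nothing here is in this series): shared ownership would mean an explicit
refcount field in the object, with each insert taking a reference and the
object freed only once it sits in no collection and no program holds an owning
reference:

	struct obj {
		struct bpf_refcount refcount;	/* hypothetical */
		struct bpf_list_node ln;
		struct bpf_rb_node rn;
		long data;
	};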

> > > So some notion of a collection identity needs to be constructed, the amount of
> > > data which needs to be remembered in each non-owning reference's register state
> > > depends on our requirements.
> > >
> > > The first sanity check is that bpf_rbtree_remove only removes something in an
> > > rbtree, so probably an enum member indicating whether collection is a list or
> > > rbtree. To ensure proper scoped invalidation, we will unfortunately need more
> > > than just the reg->id of the reg holding the graph root, since map values of
> > > different maps may have same id (0). Hence, we need id and ptr similar to the
> > > active lock case for proper matching. Even this won't be enough, as there can be
> > > multiple list or rbtree roots in a particular memory region, therefore the
> > > offset also needs to be part of the collection identity.
> > >
> > > So it seems it will amount to:
> > >
> > > 	struct bpf_collection_id {
> > > 		enum bpf_collection_type type;
> > > 		void *ptr;
> > > 		int id;
> > > 		int off;
> > > 	};
> > >
> > > There might be ways to optimize the memory footprint of this struct, but I'm
> > > just trying to state why we'll need to include all four, so we don't miss out on
> > > a corner case again.
> >
> > The trade-off doesn't feel right here. Tracking collection id complexity in
> > the verifier for single owner case is not worth it imo.
> 
> It was more about correctness after this set is applied than being worth it. We
> could argue it's not worth it for the first issue which relates to usability.
> Dave has already mentioned that point. But the second one is simply incorrect to
> allow. As soon as you want an object which can be part of both a list and rbtree
> at different times (e.g., an item usually part of an rbtree but which is popped
> out and moved to a list for some processing), people will be able to trigger it.

obj-in-a-list or obj-in-a-rbtree is technically possible, but I argue it's an equally
weird corner case to support. See above for why two list_node-s used with one head at a time are awkward.

> Simply disallowing that (as you said above) is also an option. But whenever you
> do allow it, you will need to add something like this. It has little to do with
> whether shared ownership is supported or not.

I think we will allow two list_nodes or list_node+rb_node or ten rb_nodes
only when refcount is present as well and that would be 'shared ownership' case.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v4 bpf-next 08/11] bpf: Special verifier handling for bpf_rbtree_{remove, first}
  2023-02-10 13:55   ` Kumar Kartikeya Dwivedi
  2023-02-10 17:21     ` Alexei Starovoitov
@ 2023-02-10 19:01     ` Dave Marchevsky
  1 sibling, 0 replies; 27+ messages in thread
From: Dave Marchevsky @ 2023-02-10 19:01 UTC (permalink / raw)
  To: Kumar Kartikeya Dwivedi, Dave Marchevsky
  Cc: bpf, Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Kernel Team, Tejun Heo

On 2/10/23 8:55 AM, Kumar Kartikeya Dwivedi wrote:
> On Thu, Feb 09, 2023 at 06:41:41PM CET, Dave Marchevsky wrote:
>> Newly-added bpf_rbtree_{remove,first} kfuncs have some special properties
>> that require handling in the verifier:
>>
>>   * both bpf_rbtree_remove and bpf_rbtree_first return the type containing
>>     the bpf_rb_node field, with the offset set to that field's offset,
>>     instead of a struct bpf_rb_node *
>>     * mark_reg_graph_node helper added in previous patch generalizes
>>       this logic, use it
>>
>>   * bpf_rbtree_remove's node input is a node that's been inserted
>>     in the tree - a non-owning reference.
>>
>>   * bpf_rbtree_remove must invalidate non-owning references in order to
>>     avoid aliasing issue. Use previously-added
>>     invalidate_non_owning_refs helper to mark this function as a
>>     non-owning ref invalidation point.
>>
>>   * Unlike other functions, which convert one of their input arg regs to
>>     non-owning reference, bpf_rbtree_first takes no arguments and just
>>     returns a non-owning reference (possibly null)
>>     * For now verifier logic for this is special-cased instead of
>>       adding new kfunc flag.
>>
>> This patch, along with the previous one, completes special verifier
>> handling for all rbtree API functions added in this series.
>>

As I was writing this response I noticed that Alexei started a subthread
which makes similar points.

> 
> I think there are two issues with the current approach. The fundamental
> assumption with non-owning references is that it is part of the collection. So
> bpf_rbtree_{add,first}, bpf_list_push_{front,back} will create them, as no node
> is being removed from collection. Marking bpf_rbtree_remove (and in the future
> bpf_list_del) as invalidation points is also right, since once a node has been
> removed it is going to be unclear whether existing non-owning references have
> the same value, and thus the property of 'part of the collection' will be
> broken.
> 
> The first issue relates to usability. If I have non-owning references to nodes
> inserted into both a list and an rbtree, bpf_rbtree_remove should only
> invalidate the ones that are part of the particular rbtree. It should have no
> effect on others. Likewise for the bpf_list_del operation in the future.
> Therefore, we need to track the collection identity associated with each
> non-owning reference, then only invalidate non-owning references associated with
> the same collection.
> 
> The case of bpf_spin_unlock is different, which should invalidate all non-owning
> references.
> 

I agree that if we had "data structure" or "collection" identity we would be
able to make this optimization. I also agree that such an optimization is likely
necessary for good usability in nontrivial scenarios. Alexei convinced me to
punt on it for now (over VC, I think, so can't link a thread), with the goal
of keeping complexity of this series under control.

Since "invalidate all non-owning references" is a superset of "invalidate
non-owning references associated with a particular collection", I think it's
fine to proceed with this for now if it means we can avoid adding collection
identity until followup. But I do plan on adding it and was thinking along
similar lines to your struct bpf_collection_id below. Which brings us to
second issue...

> The second issue is more serious. By not tracking the collection identity, we
> will currently allow a non-owning reference for an object inserted into a list
> to be passed to bpf_rbtree_remove, because the verifier cannot discern between
> 'inserted into rbtree' vs 'inserted into list'. For it, both are currently
> equivalent in the verifier state. An object is allowed to have both
> bpf_list_node and bpf_rb_node, but it can only be part of one collection at a
> time (because of no shared ownership).
> 
> 	struct obj {
> 		bpf_list_node ln;
> 		bpf_rb_node rn;
> 	};
> 
> 	bpf_list_push_front(head, &obj->ln); // node is non-own-ref
> 	bpf_rbtree_remove(&obj->rn); // should not work, but does
> 
> So some notion of a collection identity needs to be constructed, the amount of
> data which needs to be remembered in each non-owning reference's register state
> depends on our requirements.
> 
> The first sanity check is that bpf_rbtree_remove only removes something in an
> rbtree, so probably an enum member indicating whether collection is a list or
> rbtree. To ensure proper scoped invalidation, we will unfortunately need more
> than just the reg->id of the reg holding the graph root, since map values of
> different maps may have same id (0). Hence, we need id and ptr similar to the
> active lock case for proper matching. Even this won't be enough, as there can be
> multiple list or rbtree roots in a particular memory region, therefore the
> offset also needs to be part of the collection identity.
> 

I agree with your logic here: the logic in this series will consider your
example code a valid program when it should fail verification.

Since there's no support for "shared ownership" added in this series, we must
prevent an object from being part of both list and rbtree in your above example.
Adding "collection identity" as you propose seems like it would be sufficient,
but would increase complexity of this series. Alternatively, we could add a
small patch which causes btf_parse_fields to fail if a struct has both
bpf_list_node and bpf_rb_node. Then a "collection identity"-focused followup
could remove that kludge in favor of proper logic.
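
Concretely, the kludge could be as small as something like this in
btf_parse_fields (placement and error code are illustrative):

	/* Interim: no shared ownership yet, so a struct may not carry both a
	 * list node and an rbtree node.
	 */
	if ((rec->field_mask & BPF_LIST_NODE) && (rec->field_mask & BPF_RB_NODE)) {
		ret = -EINVAL;
		goto end;
	}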

This was an oversight on my part, I wasn't separating "node can be part of both
list and rbtree" from "node can be part of either, but not both". So I see why
you mention this second issue being orthogonal from shared ownership in the
other subthread. But, at least for our current internal usecases, we always want
shared ownership, not "part of either, but not both". If it's possible to
turn both off for now and do "collection identity" and "shared ownership"
followups, I'd rather do that and ship complexity as gradually as possible.

> So it seems it will amount to:
> 
> 	struct bpf_collection_id {
> 		enum bpf_collection_type type;
> 		void *ptr;
> 		int id;
> 		int off;
> 	};
> 
> There might be ways to optimize the memory footprint of this struct, but I'm
> just trying to state why we'll need to include all four, so we don't miss out on
> a corner case again.
> 
>> Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
>> ---
>> [...]

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v4 bpf-next 08/11] bpf: Special verifier handling for bpf_rbtree_{remove, first}
  2023-02-10 18:58         ` Alexei Starovoitov
@ 2023-02-10 19:38           ` Kumar Kartikeya Dwivedi
  2023-02-10 20:01             ` Alexei Starovoitov
  0 siblings, 1 reply; 27+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2023-02-10 19:38 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Dave Marchevsky, bpf, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, Kernel Team, Tejun Heo

On Fri, Feb 10, 2023 at 07:58:54PM CET, Alexei Starovoitov wrote:
> On Fri, Feb 10, 2023 at 07:03:46PM +0100, Kumar Kartikeya Dwivedi wrote:
> > On Fri, Feb 10, 2023 at 06:21:37PM CET, Alexei Starovoitov wrote:
> > > On Fri, Feb 10, 2023 at 02:55:41PM +0100, Kumar Kartikeya Dwivedi wrote:
> > > > On Thu, Feb 09, 2023 at 06:41:41PM CET, Dave Marchevsky wrote:
> > > > > Newly-added bpf_rbtree_{remove,first} kfuncs have some special properties
> > > > > that require handling in the verifier:
> > > > >
> > > > >   * both bpf_rbtree_remove and bpf_rbtree_first return the type containing
> > > > >     the bpf_rb_node field, with the offset set to that field's offset,
> > > > >     instead of a struct bpf_rb_node *
> > > > >     * mark_reg_graph_node helper added in previous patch generalizes
> > > > >       this logic, use it
> > > > >
> > > > >   * bpf_rbtree_remove's node input is a node that's been inserted
> > > > >     in the tree - a non-owning reference.
> > > > >
> > > > >   * bpf_rbtree_remove must invalidate non-owning references in order to
> > > > >     avoid aliasing issue. Use previously-added
> > > > >     invalidate_non_owning_refs helper to mark this function as a
> > > > >     non-owning ref invalidation point.
> > > > >
> > > > >   * Unlike other functions, which convert one of their input arg regs to
> > > > >     non-owning reference, bpf_rbtree_first takes no arguments and just
> > > > >     returns a non-owning reference (possibly null)
> > > > >     * For now verifier logic for this is special-cased instead of
> > > > >       adding new kfunc flag.
> > > > >
> > > > > This patch, along with the previous one, completes special verifier
> > > > > handling for all rbtree API functions added in this series.
> > > > >
> > > >
> > > > I think there are two issues with the current approach. The fundamental
> > > > assumption with non-owning references is that it is part of the collection. So
> > > > bpf_rbtree_{add,first}, bpf_list_push_{front,back} will create them, as no node
> > > > is being removed from collection. Marking bpf_rbtree_remove (and in the future
> > > > bpf_list_del) as invalidation points is also right, since once a node has been
> > > > removed it is going to be unclear whether existing non-owning references have
> > > > the same value, and thus the property of 'part of the collection' will be
> > > > broken.
> > >
> > > correct, but the patch set does invalidate after bpf_rbtree_remove(),
> > > so it's not an issue.
> > >
> > > > The first issue relates to usability. If I have non-owning references to nodes
> > > > inserted into both a list and an rbtree, bpf_rbtree_remove should only
> > > > invalidate the ones that are part of the particular rbtree. It should have no
> > > > effect on others. Likewise for the bpf_list_del operation in the future.
> > > > Therefore, we need to track the collection identity associated with each
> > > > non-owning reference, then only invalidate non-owning references associated with
> > > > the same collection.
> > > >
> > > > The case of bpf_spin_unlock is different, which should invalidate all non-owning
> > > > references.
> > > >
> > > > The second issue is more serious. By not tracking the collection identity, we
> > > > will currently allow a non-owning reference for an object inserted into a list
> > > > to be passed to bpf_rbtree_remove, because the verifier cannot discern between
> > > > 'inserted into rbtree' vs 'inserted into list'. For it, both are currently
> > > > equivalent in the verifier state. An object is allowed to have both
> > > > bpf_list_node and bpf_rb_node, but it can only be part of one collection at a
> > > > time (because of no shared ownership).
> > > >
> > > > 	struct obj {
> > > > 		bpf_list_node ln;
> > > > 		bpf_rb_node rn;
> > > > 	};
> > > >
> > > > 	bpf_list_push_front(head, &obj->ln); // node is non-own-ref
> > > > 	bpf_rbtree_remove(&obj->rn); // should not work, but does
> > >
> > > Also correct, but inserting the same single owner node into rbtree and link list
> > > is not supported. Only 'shared ownership' node can be inserted into
> > > two collections.
> >
> > What is supported is having an object be part of a list and an rbtree one at a
> > time, which is what I'm talking about here. Shared ownership has nothing to do
> > with this.
>
> I wouldn't call it "supported".
> btf_parse_struct_metas() doesn't error on structs with bpf_list_node and bpf_rb_node in them.
> It allows two bpf_list_node-s as well.
> list_node+rb_node is broken, as you said.
> I'm not sure two list_node-s are correct either, for the same reasons.

You mean it's incorrect before or after this series? If before, can you
elaborate? Before this series, a node isn't usable with kfuncs after a push.

> So it's an omission rather than support.
>
> > > The check to disallow bpf_list_node and bpf_rb_node in the same obj
> > > can be a follow up patch to close this hole.
> > >
> >
> > Fine, that would also 'fix' this problem, where a non-owning reference part of a
> > list could be passed to bpf_rbtree_remove etc. If there can only be a list node
> > or an rbtree node in an object, such a case cannot be constructed. But I think
> > it's an awkward limitation.
>
> Regardless of whether non-own refs exist or not, two list_node-s are unusable
> with the single owner model.
> struct obj {
> 	bpf_list_node ln;
> 	bpf_list_node ln2;
> };
>
> bpf_list_push_front(head, &obj->ln); // doesn't matter what we do with obj here
> bpf_list_push_front(head2, &obj->ln2); // cannot be made to work correctly
> The only way to actually support this case is to have refcount in obj.
>

Yes, I think we all agree it's not possible currently.

> It can "work" when obj is a part of only one list at a time, but one can reasonably
> argue that a requirement to have two 'bpf_list_node'-s in a obj just two insert obj
> in one list, pop it, and then insert into another list through a different list_node
> is nothing but a waste of memory.

I didn't get this. Unless I'm mistaken, you should be able to move objects
around from one list to another using the same single bpf_list_node. You don't
need two; each list_head just needs the correct __contains annotation.
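
Roughly what I have in mind, as an untested sketch (head/lock names are made
up here; the locks and heads follow the usual private() + __contains
conventions from the selftests):

	struct node_data {
		long key;
		struct bpf_list_node node;
	};

	private(A) struct bpf_spin_lock lock_a;
	private(A) struct bpf_list_head head_a __contains(node_data, node);
	private(B) struct bpf_spin_lock lock_b;
	private(B) struct bpf_list_head head_b __contains(node_data, node);

	SEC("tc")
	int move_front_a_to_b(void *ctx)
	{
		struct bpf_list_node *ln;
		struct node_data *n;

		bpf_spin_lock(&lock_a);
		ln = bpf_list_pop_front(&head_a); /* owning ref comes back to us */
		bpf_spin_unlock(&lock_a);
		if (!ln)
			return 0;
		n = container_of(ln, struct node_data, node);

		bpf_spin_lock(&lock_b);
		/* same bpf_list_node field, different list_head */
		bpf_list_push_front(&head_b, &n->node);
		bpf_spin_unlock(&lock_b);
		return 0;
	}

No second bpf_list_node is needed for the move.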

> The user should be able to use the same 'bpf_list_node' to insert into one
> list_head, remove it from there, and then insert into another list_head.
>

I think this already works.

> > > > So some notion of a collection identity needs to be constructed, the amount of
> > > > data which needs to be remembered in each non-owning reference's register state
> > > > depends on our requirements.
> > > >
> > > > The first sanity check is that bpf_rbtree_remove only removes something in an
> > > > rbtree, so probably an enum member indicating whether collection is a list or
> > > > rbtree. To ensure proper scoped invalidation, we will unfortunately need more
> > > > than just the reg->id of the reg holding the graph root, since map values of
> > > > different maps may have same id (0). Hence, we need id and ptr similar to the
> > > > active lock case for proper matching. Even this won't be enough, as there can be
> > > > multiple list or rbtree roots in a particular memory region, therefore the
> > > > offset also needs to be part of the collection identity.
> > > >
> > > > So it seems it will amount to:
> > > >
> > > > 	struct bpf_collection_id {
> > > > 		enum bpf_collection_type type;
> > > > 		void *ptr;
> > > > 		int id;
> > > > 		int off;
> > > > 	};
> > > >
> > > > There might be ways to optimize the memory footprint of this struct, but I'm
> > > > just trying to state why we'll need to include all four, so we don't miss out on
> > > > a corner case again.
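
To make the matching side concrete, roughly (illustrative only, nothing from
this set; the helper and enum member names are made up):

	/* true if two non-owning refs hang off the same graph root */
	static bool bpf_collection_match(const struct bpf_collection_id *a,
					 const struct bpf_collection_id *b)
	{
		return a->type == b->type && a->ptr == b->ptr &&
		       a->id == b->id && a->off == b->off;
	}

bpf_rbtree_remove would then only invalidate non-owning refs whose collection
id matches the root it was called on (and reject refs whose type says 'list'),
while bpf_spin_unlock would keep invalidating all of them.
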
> > >
> > > The trade-off doesn't feel right here. Tracking collection id complexity in
> > > the verifier for single owner case is not worth it imo.
> >
> > It was more about correctness after this set is applied than being worth it. We
> > could argue it's not worth it for the first issue which relates to usability.
> > Dave has already mentioned that point. But the second one is simply incorrect to
> > allow. As soon as you want an object which can be part of both a list and rbtree
> > at different times (e.g., an item usually part of an rbtree but which is popped
> > out and moved to a list for some processing), people will be able to trigger it.
>
> obj-in-a-list or obj-in-a-rbtree is technically possible, but I argue it's an equally
> weird corner case to support. See above for why two list_node-s used one-head-at-a-time are awkward.
>
> > Simply disallowing that (as you said above) is also an option. But whenever you
> > do allow it, you will need to add something like this. It has little to do with
> > whether shared ownership is supported or not.
>
> I think we will allow two list_nodes or list_node+rb_node or ten rb_nodes
> only when a refcount is present as well, and that would be the 'shared ownership' case.

I get that you just want to punt it for now until someone has a use case, and
move forward.

But saying that people will have to use a refcount in their object just to be
able to have single unique ownership while allowing it to be part of multiple
data structures at different times is weird. It's like forcing someone to use a
shared_ptr in C++ (when they can use a unique_ptr) just because they want to
std::move it from one place to another.

I thought the whole point was building composable building blocks and letting
people choose their tradeoffs.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v4 bpf-next 08/11] bpf: Special verifier handling for bpf_rbtree_{remove, first}
  2023-02-10 19:38           ` Kumar Kartikeya Dwivedi
@ 2023-02-10 20:01             ` Alexei Starovoitov
  0 siblings, 0 replies; 27+ messages in thread
From: Alexei Starovoitov @ 2023-02-10 20:01 UTC (permalink / raw)
  To: Kumar Kartikeya Dwivedi
  Cc: Dave Marchevsky, bpf, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, Kernel Team, Tejun Heo

On Fri, Feb 10, 2023 at 08:38:47PM +0100, Kumar Kartikeya Dwivedi wrote:
> On Fri, Feb 10, 2023 at 07:58:54PM CET, Alexei Starovoitov wrote:
> > On Fri, Feb 10, 2023 at 07:03:46PM +0100, Kumar Kartikeya Dwivedi wrote:
> > > On Fri, Feb 10, 2023 at 06:21:37PM CET, Alexei Starovoitov wrote:
> > > > On Fri, Feb 10, 2023 at 02:55:41PM +0100, Kumar Kartikeya Dwivedi wrote:
> > > > > On Thu, Feb 09, 2023 at 06:41:41PM CET, Dave Marchevsky wrote:
> > > > > > Newly-added bpf_rbtree_{remove,first} kfuncs have some special properties
> > > > > > that require handling in the verifier:
> > > > > >
> > > > > >   * both bpf_rbtree_remove and bpf_rbtree_first return the type containing
> > > > > >     the bpf_rb_node field, with the offset set to that field's offset,
> > > > > >     instead of a struct bpf_rb_node *
> > > > > >     * mark_reg_graph_node helper added in previous patch generalizes
> > > > > >       this logic, use it
> > > > > >
> > > > > >   * bpf_rbtree_remove's node input is a node that's been inserted
> > > > > >     in the tree - a non-owning reference.
> > > > > >
> > > > > >   * bpf_rbtree_remove must invalidate non-owning references in order to
> > > > > >     avoid aliasing issues. Use the previously-added
> > > > > >     invalidate_non_owning_refs helper to mark this function as a
> > > > > >     non-owning ref invalidation point.
> > > > > >
> > > > > >   * Unlike other functions, which convert one of their input arg regs to
> > > > > >     a non-owning reference, bpf_rbtree_first takes no node argument and just
> > > > > >     returns a non-owning reference (possibly null)
> > > > > >     * For now the verifier logic for this is special-cased instead of
> > > > > >       adding a new kfunc flag.
> > > > > >
> > > > > > This patch, along with the previous one, completes special verifier
> > > > > > handling for all rbtree API functions added in this series.
> > > > > >
> > > > >
> > > > > I think there are two issues with the current approach. The fundamental
> > > > > assumption with non-owning references is that the node is part of the
> > > > > collection. So bpf_rbtree_{add,first}, bpf_list_push_{front,back} will create
> > > > > them, as no node is being removed from the collection. Marking bpf_rbtree_remove
> > > > > (and in the future bpf_list_del) as invalidation points is also right, since
> > > > > once a node has been removed it is going to be unclear whether existing
> > > > > non-owning references have the same value, and thus the property of 'part of
> > > > > the collection' will be broken.
> > > >
> > > > correct, but the patch set does invalidate after bpf_rbtree_remove(),
> > > > so it's not an issue.
> > > >
> > > > > The first issue relates to usability. If I have non-owning references to nodes
> > > > > inserted into both a list and an rbtree, bpf_rbtree_remove should only
> > > > > invalidate the ones that are part of the particular rbtree. It should have no
> > > > > effect on others. Likewise for the bpf_list_del operation in the future.
> > > > > Therefore, we need to track the collection identity associated with each
> > > > > non-owning reference, then only invalidate non-owning references associated with
> > > > > the same collection.
> > > > >
> > > > > The case of bpf_spin_unlock is different, which should invalidate all non-owning
> > > > > references.
> > > > >
> > > > > The second issue is more serious. By not tracking the collection identity, we
> > > > > will currently allow a non-owning reference for an object inserted into a list
> > > > > to be passed to bpf_rbtree_remove, because the verifier cannot discern between
> > > > > 'inserted into rbtree' vs 'inserted into list'. For it, both are currently
> > > > > equivalent in the verifier state. An object is allowed to have both
> > > > > bpf_list_node and bpf_rb_node, but it can only be part of one collection at a
> > > > > time (because of no shared ownership).
> > > > >
> > > > > 	struct obj {
> > > > > 		bpf_list_node ln;
> > > > > 		bpf_rb_node rn;
> > > > > 	};
> > > > >
> > > > > 	bpf_list_push_front(head, &obj->ln); // node is non-own-ref
> > > > > 	bpf_rbtree_remove(&obj->rn); // should not work, but does
> > > >
> > > > Also correct, but inserting the same single-owner node into both an rbtree and a
> > > > linked list is not supported. Only a 'shared ownership' node can be inserted into
> > > > two collections.
> > >
> > > What is supported is having an object be part of a list and an rbtree one at a
> > > time, which is what I'm talking about here. Shared ownership has nothing to do
> > > with this.
> >
> > I wouldn't call it "supported".
> > btf_parse_struct_metas() doesn't error on structs with bpf_list_node and bpf_rb_node in them.
> > It allows two bpf_list_node-s as well.
> > list_node+rb_node is broken, as you said.
> > I'm not sure two list_node-s are correct either, for the same reasons.
> 
> You mean it's incorrect before or after this series? If before, can you
> elaborate? Before this series, a node isn't usable with kfuncs after a push.

I meant once we add non-own-refs and bpf_list_del. Something like:
struct obj {
	bpf_list_node ln;
	bpf_list_node ln2;
};

bpf_list_push_front(head, &obj->ln); // obj is non-own-ref
bpf_list_del(&obj->ln2); // should not work, but does

...details to fill in...
I'm arguing that bpf_list_node x 2 is the same danger zone as bpf_list_node + bpf_rb_node.
It's safe today only because the linked list is so limited.
I'm looking at a full-featured rbtree and linked list with similar ops allowed in both.
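
The follow-up check to close both holes could be little more than a walk over
the struct's btf_record, e.g. (untested sketch; helper name is made up, exact
placement around btf_parse_struct_metas to be decided, with BPF_RB_NODE being
the field type this set adds):

	/* count graph node fields in an already-parsed btf_record */
	static int btf_record_count_graph_nodes(const struct btf_record *rec)
	{
		int cnt = 0;
		u32 i;

		for (i = 0; i < rec->cnt; i++)
			if (rec->fields[i].type == BPF_LIST_NODE ||
			    rec->fields[i].type == BPF_RB_NODE)
				cnt++;
		return cnt;
	}

Anything where this returns more than one would be rejected until a refcount /
shared ownership model exists.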

> 
> > So it's an omission rather than support.
> >
> > > > The check to disallow bpf_list_node and bpf_rb_node in the same obj
> > > > can be a follow up patch to close this hole.
> > > >
> > >
> > > Fine, that would also 'fix' this problem, where a non-owning reference part of a
> > > list could be passed to bpf_rbtree_remove etc. If there can only be a list node
> > > or an rbtree node in an object, such a case cannot be constructed. But I think
> > > it's an awkward limitation.
> >
> > Regardless of whether non-own refs exist or not, two list_node-s are unusable
> > with the single owner model.
> > struct obj {
> > 	bpf_list_node ln;
> > 	bpf_list_node ln2;
> > };
> >
> > bpf_list_push_front(head, &obj->ln); // doesn't matter what we do with obj here
> > bpf_list_push_front(head2, &obj->ln2); // cannot be made to work correctly
> > The only way to actually support this case is to have refcount in obj.
> >
> 
> Yes, I think we all agree it's not possible currently.
> 
> > It can "work" when obj is a part of only one list at a time, but one can reasonably
> > argue that a requirement to have two 'bpf_list_node'-s in a obj just two insert obj
> > in one list, pop it, and then insert into another list through a different list_node
> > is nothing but a waste of memory.
> 
> I didn't get this. Unless I'm mistaken, you should be able to move objects
> around from one list to another using the same single bpf_list_node. You don't
> need two; each list_head just needs the correct __contains annotation.
> 
> > The user should be able to use the same 'bpf_list_node' to insert into one
> > list_head, remove it from there, and then insert into another list_head.
> >
> 
> I think this already works.

It could be. We don't have tests for it, so it's a guess.

> 
> > > > > So some notion of a collection identity needs to be constructed, the amount of
> > > > > data which needs to be remembered in each non-owning reference's register state
> > > > > depends on our requirements.
> > > > >
> > > > > The first sanity check is that bpf_rbtree_remove only removes something in an
> > > > > rbtree, so probably an enum member indicating whether collection is a list or
> > > > > rbtree. To ensure proper scoped invalidation, we will unfortunately need more
> > > > > than just the reg->id of the reg holding the graph root, since map values of
> > > > > different maps may have same id (0). Hence, we need id and ptr similar to the
> > > > > active lock case for proper matching. Even this won't be enough, as there can be
> > > > > multiple list or rbtree roots in a particular memory region, therefore the
> > > > > offset also needs to be part of the collection identity.
> > > > >
> > > > > So it seems it will amount to:
> > > > >
> > > > > 	struct bpf_collection_id {
> > > > > 		enum bpf_collection_type type;
> > > > > 		void *ptr;
> > > > > 		int id;
> > > > > 		int off;
> > > > > 	};
> > > > >
> > > > > There might be ways to optimize the memory footprint of this struct, but I'm
> > > > > just trying to state why we'll need to include all four, so we don't miss out on
> > > > > a corner case again.
> > > >
> > > > The trade-off doesn't feel right here. Tracking collection id complexity in
> > > > the verifier for single owner case is not worth it imo.
> > >
> > > It was more about correctness after this set is applied than being worth it. We
> > > could argue it's not worth it for the first issue which relates to usability.
> > > Dave has already mentioned that point. But the second one is simply incorrect to
> > > allow. As soon as you want an object which can be part of both a list and rbtree
> > > at different times (e.g., an item usually part of an rbtree but which is popped
> > > out and moved to a list for some processing), people will be able to trigger it.
> >
> > obj-in-a-list or obj-in-a-rbtree is technically possible, but I argue it's an equally
> > weird corner case to support. See above for why two list_node-s used one-head-at-a-time are awkward.
> >
> > > Simply disallowing that (as you said above) is also an option. But whenever you
> > > do allow it, you will need to add something like this. It has little to do with
> > > whether shared ownership is supported or not.
> >
> > I think we will allow two list_nodes or list_node+rb_node or ten rb_nodes
> > only when a refcount is present as well, and that would be the 'shared ownership' case.
> 
> I get that you just want to punt it for now until someone has a use case, and
> move forward.
> 
> But saying that people will have to use a refcount in their object just to be
> able to have single unique ownership while allowing it to be part of multiple
> data structures at different times is weird. It's like forcing someone to use a
> shared_ptr in C++ (when they can use a unique_ptr) just because they want to
> std::move it from one place to another.
> 
> I thought the whole point was building composable building blocks and letting
> people choose their tradeoffs.

That's exactly what we're doing. Rbtrees and linked lists are building blocks.
If we can improve their UX we will certainly do it. I'm arguing that we don't
have a real user yet, so whether the UX is good or bad is subjective.

For example, in this thread we looked critically at kptrs and kptr_xchg for
the first time. When people first tried to use them for real, they said it sucks.
That's feedback we should listen to instead of our own beliefs.
We try to strike a balance between a feature being useful and keeping verifier
complexity to a minimum, then deliver the feature into the hands of users and
iterate based on their feedback.

^ permalink raw reply	[flat|nested] 27+ messages in thread

end of thread, other threads:[~2023-02-10 20:01 UTC | newest]

Thread overview: 27+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-02-09 17:41 [PATCH v4 bpf-next 00/11] BPF rbtree next-gen datastructure Dave Marchevsky
2023-02-09 17:41 ` [PATCH v4 bpf-next 01/11] bpf: Migrate release_on_unlock logic to non-owning ref semantics Dave Marchevsky
2023-02-10 13:24   ` Kumar Kartikeya Dwivedi
2023-02-10 17:13     ` Alexei Starovoitov
2023-02-09 17:41 ` [PATCH v4 bpf-next 02/11] bpf: Improve bpf_reg_state space usage for non-owning ref lock Dave Marchevsky
2023-02-09 17:41 ` [PATCH v4 bpf-next 03/11] selftests/bpf: Update linked_list tests for non-owning ref semantics Dave Marchevsky
2023-02-09 17:41 ` [PATCH v4 bpf-next 04/11] bpf: Add basic bpf_rb_{root,node} support Dave Marchevsky
2023-02-10 14:18   ` Kumar Kartikeya Dwivedi
2023-02-09 17:41 ` [PATCH v4 bpf-next 05/11] bpf: Add bpf_rbtree_{add,remove,first} kfuncs Dave Marchevsky
2023-02-09 17:41 ` [PATCH v4 bpf-next 06/11] bpf: Add support for bpf_rb_root and bpf_rb_node in kfunc args Dave Marchevsky
2023-02-09 17:41 ` [PATCH v4 bpf-next 07/11] bpf: Add callback validation to kfunc verifier logic Dave Marchevsky
2023-02-09 17:41 ` [PATCH v4 bpf-next 08/11] bpf: Special verifier handling for bpf_rbtree_{remove, first} Dave Marchevsky
2023-02-10  3:11   ` Alexei Starovoitov
2023-02-10  8:22     ` Dave Marchevsky
2023-02-10 17:30       ` Alexei Starovoitov
2023-02-10 14:15     ` Kumar Kartikeya Dwivedi
2023-02-10 13:55   ` Kumar Kartikeya Dwivedi
2023-02-10 17:21     ` Alexei Starovoitov
2023-02-10 18:03       ` Kumar Kartikeya Dwivedi
2023-02-10 18:58         ` Alexei Starovoitov
2023-02-10 19:38           ` Kumar Kartikeya Dwivedi
2023-02-10 20:01             ` Alexei Starovoitov
2023-02-10 19:01     ` Dave Marchevsky
2023-02-09 17:41 ` [PATCH v4 bpf-next 09/11] bpf: Add bpf_rbtree_{add,remove,first} decls to bpf_experimental.h Dave Marchevsky
2023-02-09 17:41 ` [PATCH v4 bpf-next 10/11] selftests/bpf: Add rbtree selftests Dave Marchevsky
2023-02-10  2:52   ` Alexei Starovoitov
2023-02-09 17:41 ` [PATCH v4 bpf-next 11/11] bpf, documentation: Add graph documentation for non-owning refs Dave Marchevsky
