* [PATCH bpf-next v2 00/25] Local kptrs, BPF linked lists
@ 2022-10-13  6:22 Kumar Kartikeya Dwivedi
  2022-10-13  6:22 ` [PATCH bpf-next v2 01/25] bpf: Document UAPI details for special BPF types Kumar Kartikeya Dwivedi
                   ` (24 more replies)
  0 siblings, 25 replies; 52+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2022-10-13  6:22 UTC (permalink / raw)
  To: bpf
  Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Dave Marchevsky, Delyan Kratunov

Hi, this is the non-RFC v2. This is a major rewrite of the previous RFC set,
hence a complete review of some of these patches is needed again. I have cut
down the series to only the specific pieces needed to enable other work (like
Dave's RB-Tree). The rest will come as a follow-up to this.

--

This series introduces user defined BPF objects, by introducing the idea of
local kptrs. These are kptrs (strongly typed pointers) that refer to objects of
a user defined type, hence called "local" kptrs. This allows BPF programs to
allocate their own objects, build their own object hierarchies, and use the
basic building blocks provided by BPF runtime to build their own data structures
flexibly.

Then, we introduce support for single ownership BPF linked lists, which can
be put inside BPF maps, or local kptrs, and hold such allocated local kptrs as
elements. It works as an intrusive collection, which is done to allow making
local kptrs part of multiple data structures at the same time in the future.
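
To illustrate what "intrusive" means here, the following is a minimal
userspace sketch in plain C. It is not the BPF linked list API added by this
series: struct list_node, struct list_head, list_push_front, list_pop_front
and list_demo are hypothetical names chosen for illustration only. The point
it shows is that the link lives inside the user-defined object, so the list
code never allocates, and the owning object is recovered from the link with
container_of.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration; not the API from this series. */
struct list_node {
	struct list_node *next;
};

struct list_head {
	struct list_node *first;
};

/* User-defined element: the link is embedded in the object itself. */
struct elem {
	int value;
	struct list_node node;
};

/* Recover the enclosing object from a pointer to its embedded member. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

static void list_push_front(struct list_head *head, struct list_node *n)
{
	n->next = head->first;
	head->first = n;
}

/* Unlinks and returns the first node, or NULL if the list is empty. */
static struct list_node *list_pop_front(struct list_head *head)
{
	struct list_node *n = head->first;

	if (n) {
		head->first = n->next;
		n->next = NULL;
	}
	return n;
}

/* Pushes two elements and returns the value of the one popped first. */
static int list_demo(void)
{
	struct list_head head = { NULL };
	struct elem a = { .value = 1 }, b = { .value = 2 };
	struct list_node *n;

	list_push_front(&head, &a.node);
	list_push_front(&head, &b.node);
	n = list_pop_front(&head);
	assert(n != NULL);
	return container_of(n, struct elem, node)->value;
}
```

Because the node is just a member, an object could in principle embed more
than one node and sit in several collections at once, which is the future
direction the paragraph above refers to.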

The eventual goal of this and future patches is to allow one to do some limited
form of kernel style programming in BPF C, and allow programmers to build their
own complex data structures flexibly out of basic building blocks.

The key difference will be that such programs are verified to be safe, preserve
runtime integrity of the system, and are proven to be bug free as far as the
invariants of BPF specific APIs are concerned.

One immediate use case that will exercise the entire infrastructure this series
introduces is managing percpu NMI safe linked lists inside BPF programs.

Another use case, in the near future, is linking kernel structures like XDP
frame and sk_buff directly into user data structures (rbtree, pifomap, etc.)
for packet queueing. This will follow the single ownership concept included in
this series.

The user has complete control of the internal locking, and hence also the
batching of operations for each critical section.

The features are:
- Local kptrs - User defined kernel objects.
- bpf_kptr_new, bpf_kptr_drop to allocate and free them.
- Single ownership BPF linked lists.
  - Support for them in BPF maps.
  - Support for them in local kptrs.
- Global spin locks.
- Spin locks inside local kptrs.
- Allow storing local kptrs in all BPF maps with support for kernel kptrs.

Some other notable things:
- Completely static verification of locking.
- Kfunc argument handling has been completely reworked.
- Argument rewriting support for kfuncs.
- Search pruning now understands non-size precise registers.
- A new bpf_experimental.h header as a dumping ground for these APIs.

Any functionality exposed in this series is NOT part of UAPI. It is only
available through use of kfuncs, and structs that can be added to map values
may also change their size or name in the future. Hence, every feature in this
series must be considered experimental.

Follow-ups:
-----------
 * Support for kptrs (local and kernel) in local storage and percpu maps + kptr tests
 * Fixes that rebase on top of refactorings in this series for dynptrs
   and helper access checks (not included since unrelated, and this is too big already)

Next steps:
-----------
 * NMI safe percpu single ownership linked lists (using local_t protection).
 * Lockless linked lists.
 * Allow RCU protected local kptrs. This then allows RCU protected list lookups,
   since spinlock protection for readers does not scale.
 * Introduce explicit RCU read sections (using kfuncs).
 * Introduce bpf_refcount for local kptrs, shared ownership.
 * Introduce shared ownership linked lists.
 * Documentation.

Notes:
------

 * Dave's patch for libbpf sections is still needed for this to move forward.

Changelog:
----------
 v1 -> v2
 v1: https://lore.kernel.org/bpf/20221011012240.3149-1-memxor@gmail.com

  * Rebase on bpf-next to resolve merge conflict in DENYLIST.s390x
  * Fix a couple of mental lapses in bpf_list_head_free

 RFC v1 -> v1
 RFC v1: https://lore.kernel.org/bpf/20220904204145.3089-1-memxor@gmail.com

  * Mostly a complete rewrite of BTF parsing, refactor existing code (Kartikeya)
  * Rebase kfunc rewrite for bpf-next, add support for more changes
  * Cache type metadata in BTF to avoid recomputation inside verifier (Kartikeya)
  * Remove __kernel tag, make things similar to map values, reserve bpf_ prefix
  * bpf_kptr_new, bpf_kptr_drop (Alexei)
  * Rename precision state enum values (Alexei)
  * Drop explicit constructor/destructor support (Alexei)
  * Rewrite code for constructing/destructing objects and offload to runtime
  * Minimize duplication in bpf_map_value_off_desc handling (Alexei)
  * Expose global memory allocator (Alexei)
  * Address other nits from Alexei
  * Split out local kptrs in maps, more kptrs in maps support into a follow up

Links:
------
 * Dave's BPF RB-Tree RFC series
   v1 (Discussion thread)
     https://lore.kernel.org/bpf/20220722183438.3319790-1-davemarchevsky@fb.com
   v2 (With support for static locks)
     https://lore.kernel.org/bpf/20220830172759.4069786-1-davemarchevsky@fb.com
 * BPF Linked Lists Discussion
   https://lore.kernel.org/bpf/CAP01T74U30+yeBHEgmgzTJ-XYxZ0zj71kqCDJtTH9YQNfTK+Xw@mail.gmail.com
 * BPF Memory Allocator from Alexei
   https://lore.kernel.org/bpf/20220902211058.60789-1-alexei.starovoitov@gmail.com
 * BPF Memory Allocator UAPI Discussion
   https://lore.kernel.org/bpf/d3f76b27f4e55ec9e400ae8dcaecbb702a4932e8.camel@fb.com

Dave Marchevsky (1):
  libbpf: Add support for private BSS map section

Kumar Kartikeya Dwivedi (24):
  bpf: Document UAPI details for special BPF types
  bpf: Allow specifying volatile type modifier for kptrs
  bpf: Clobber stack slot when writing over spilled PTR_TO_BTF_ID
  bpf: Fix slot type check in check_stack_write_var_off
  bpf: Drop reg_type_may_be_refcounted_or_null
  bpf: Refactor kptr_off_tab into fields_tab
  bpf: Consolidate spin_lock, timer management into fields_tab
  bpf: Refactor map->off_arr handling
  bpf: Support bpf_list_head in map values
  bpf: Introduce local kptrs
  bpf: Recognize bpf_{spin_lock,list_head,list_node} in local kptrs
  bpf: Verify ownership relationships for owning types
  bpf: Support locking bpf_spin_lock in local kptr
  bpf: Allow locking bpf_spin_lock global variables
  bpf: Rewrite kfunc argument handling
  bpf: Drop kfunc bits from btf_check_func_arg_match
  bpf: Support constant scalar arguments for kfuncs
  bpf: Teach verifier about non-size constant arguments
  bpf: Introduce bpf_kptr_new
  bpf: Introduce bpf_kptr_drop
  bpf: Permit NULL checking pointer with non-zero fixed offset
  bpf: Introduce single ownership BPF linked list API
  selftests/bpf: Add __contains macro to bpf_experimental.h
  selftests/bpf: Add BPF linked list API tests

 Documentation/bpf/bpf_design_QA.rst           |   44 +
 Documentation/bpf/kfuncs.rst                  |   30 +
 include/linux/bpf.h                           |  238 ++-
 include/linux/bpf_verifier.h                  |   22 +-
 include/linux/btf.h                           |   76 +-
 include/linux/filter.h                        |    4 +-
 kernel/bpf/arraymap.c                         |   30 +-
 kernel/bpf/bpf_local_storage.c                |    2 +-
 kernel/bpf/btf.c                              | 1204 +++++++-------
 kernel/bpf/core.c                             |   14 +
 kernel/bpf/hashtab.c                          |   40 +-
 kernel/bpf/helpers.c                          |  141 +-
 kernel/bpf/local_storage.c                    |    2 +-
 kernel/bpf/map_in_map.c                       |   18 +-
 kernel/bpf/syscall.c                          |  381 +++--
 kernel/bpf/verifier.c                         | 1445 ++++++++++++++---
 net/bpf/bpf_dummy_struct_ops.c                |    3 +-
 net/core/bpf_sk_storage.c                     |    4 +-
 net/core/filter.c                             |   13 +-
 net/ipv4/bpf_tcp_ca.c                         |    3 +-
 net/netfilter/nf_conntrack_bpf.c              |    1 +
 tools/lib/bpf/libbpf.c                        |   65 +-
 tools/testing/selftests/bpf/DENYLIST.s390x    |    1 +
 .../testing/selftests/bpf/bpf_experimental.h  |   85 +
 .../bpf/prog_tests/kfunc_dynptr_param.c       |    2 +-
 .../selftests/bpf/prog_tests/linked_list.c    |   88 +
 .../testing/selftests/bpf/progs/linked_list.c |  325 ++++
 tools/testing/selftests/bpf/verifier/calls.c  |    4 +-
 .../selftests/bpf/verifier/ref_tracking.c     |    4 +-
 29 files changed, 3146 insertions(+), 1143 deletions(-)
 create mode 100644 tools/testing/selftests/bpf/bpf_experimental.h
 create mode 100644 tools/testing/selftests/bpf/prog_tests/linked_list.c
 create mode 100644 tools/testing/selftests/bpf/progs/linked_list.c

-- 
2.38.0



* [PATCH bpf-next v2 01/25] bpf: Document UAPI details for special BPF types
  2022-10-13  6:22 [PATCH bpf-next v2 00/25] Local kptrs, BPF linked lists Kumar Kartikeya Dwivedi
@ 2022-10-13  6:22 ` Kumar Kartikeya Dwivedi
  2022-10-13  6:22 ` [PATCH bpf-next v2 02/25] bpf: Allow specifying volatile type modifier for kptrs Kumar Kartikeya Dwivedi
                   ` (23 subsequent siblings)
  24 siblings, 0 replies; 52+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2022-10-13  6:22 UTC (permalink / raw)
  To: bpf
  Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Dave Marchevsky, Delyan Kratunov

The kernel recognizes some special BPF types in map values or local
kptrs. Document that only bpf_spin_lock and bpf_timer will preserve
backwards compatibility, and kptr will preserve backwards compatibility
for the operations on the pointer, not the types supported for such
kptrs.

For local kptrs, document that there are no stability guarantees at all.

Finally, document that 'bpf_' namespace is reserved for adding future
special fields, hence BPF programs must not declare types with such
names in their programs and still expect backwards compatibility.

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
---
 Documentation/bpf/bpf_design_QA.rst | 44 +++++++++++++++++++++++++++++
 1 file changed, 44 insertions(+)

diff --git a/Documentation/bpf/bpf_design_QA.rst b/Documentation/bpf/bpf_design_QA.rst
index a210b8a4df00..c82c50475e1b 100644
--- a/Documentation/bpf/bpf_design_QA.rst
+++ b/Documentation/bpf/bpf_design_QA.rst
@@ -298,3 +298,47 @@ A: NO.
 
 The BTF_ID macro does not cause a function to become part of the ABI
 any more than does the EXPORT_SYMBOL_GPL macro.
+
+Q: What is the compatibility story for special BPF types in map values?
+-----------------------------------------------------------------------
+Q: Users are allowed to embed bpf_spin_lock, bpf_timer fields in their BPF map
+values (when using BTF support for BPF maps). This allows to use helpers for
+such objects on these fields inside map values. Users are also allowed to embed
+pointers to some kernel types (with __kptr and __kptr_ref BTF tags). Will the
+kernel preserve backwards compatibility for these features?
+
+A: It depends. For bpf_spin_lock, bpf_timer: YES, for kptr and everything else:
+NO, but see below.
+
+For struct types that have been added already, like bpf_spin_lock and bpf_timer,
+the kernel will preserve backwards compatibility, as they are part of UAPI.
+
+For kptrs, they are also part of UAPI, but only with respect to the kptr
+mechanism. The types that you can use with a __kptr and __kptr_ref tagged
+pointer in your struct is NOT part of the UAPI contract. The supported types
+can and will change across kernel releases. However, operations like accessing
+kptr fields and bpf_kptr_xchg() helper will continue to be supported across
+kernel releases for the supported types.
+
+For any other supported struct type, unless explicitly stated in this document
+and added to bpf.h UAPI header, such types can and will arbitrarily change their
+size, type, and alignment, or any other user visible API or ABI detail across
+kernel releases. The users must adapt their BPF programs to the new changes and
+update them to make sure their programs continue to work correctly.
+
+NOTE: BPF subsystem specially reserves the 'bpf_' prefix for type names, in
+order to introduce more special fields in the future. Hence, user programs must
+avoid defining types with 'bpf_' prefix to not be broken in future releases. In
+other words, no backwards compatibility is guaranteed if one using a type in BTF
+with 'bpf_' prefix.
+
+Q: What is the compatibility story for special BPF types in local kptrs?
+------------------------------------------------------------------------
+Q: Same as above, but for local kptrs (i.e. kptrs allocated using bpf_kptr_new
+for user defined structures). Will the kernel preserve backwards compatibility
+for these features?
+
+A: NO.
+
+Unlike map value types, there are no stability guarantees for this case. The
+whole local kptr API itself is unstable (since it is exposed through kfuncs).
-- 
2.38.0



* [PATCH bpf-next v2 02/25] bpf: Allow specifying volatile type modifier for kptrs
  2022-10-13  6:22 [PATCH bpf-next v2 00/25] Local kptrs, BPF linked lists Kumar Kartikeya Dwivedi
  2022-10-13  6:22 ` [PATCH bpf-next v2 01/25] bpf: Document UAPI details for special BPF types Kumar Kartikeya Dwivedi
@ 2022-10-13  6:22 ` Kumar Kartikeya Dwivedi
  2022-10-13  6:22 ` [PATCH bpf-next v2 03/25] bpf: Clobber stack slot when writing over spilled PTR_TO_BTF_ID Kumar Kartikeya Dwivedi
                   ` (22 subsequent siblings)
  24 siblings, 0 replies; 52+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2022-10-13  6:22 UTC (permalink / raw)
  To: bpf
  Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Dave Marchevsky, Delyan Kratunov

This is useful in particular to mark the pointer as volatile, so that the
compiler treats each load from and store to the field as a volatile access.
The alternative is having to define and use READ_ONCE and WRITE_ONCE in
the BPF program.
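
A minimal userspace sketch of the two alternatives follows. The struct and
function names are hypothetical, the kptr BTF tags are deliberately left out,
and the READ_ONCE/WRITE_ONCE definitions are simplified assumptions (they rely
on the GCC/Clang __typeof__ extension), not the kernel's actual macros.

```c
#include <assert.h>

struct foo {
	int data;
};

/* Simplified, assumed definitions; the kernel's versions are more involved. */
#define READ_ONCE(x)       (*(volatile __typeof__(x) *)&(x))
#define WRITE_ONCE(x, val) (*(volatile __typeof__(x) *)&(x) = (val))

/* Alternative 1: plain pointer field, every access goes through a macro. */
struct val_plain {
	struct foo *ptr;
};

/* Alternative 2: the pointer itself is volatile-qualified, so each ordinary
 * load or store of the field is already a volatile access.
 */
struct val_volatile {
	struct foo *volatile ptr;
};

static struct foo *load_plain(struct val_plain *v)
{
	return READ_ONCE(v->ptr);
}

static void store_plain(struct val_plain *v, struct foo *p)
{
	WRITE_ONCE(v->ptr, p);
}

static struct foo *load_volatile(struct val_volatile *v)
{
	return v->ptr;
}

static void store_volatile(struct val_volatile *v, struct foo *p)
{
	v->ptr = p;
}
```

Note that the qualifier sits on the pointer (struct foo *volatile), not on the
pointee, which matches a modifier applied to the pointer itself.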

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
---
 include/linux/btf.h | 5 +++++
 kernel/bpf/btf.c    | 3 +++
 2 files changed, 8 insertions(+)

diff --git a/include/linux/btf.h b/include/linux/btf.h
index f9aababc5d78..86aad9b2ce02 100644
--- a/include/linux/btf.h
+++ b/include/linux/btf.h
@@ -288,6 +288,11 @@ static inline bool btf_type_is_typedef(const struct btf_type *t)
 	return BTF_INFO_KIND(t->info) == BTF_KIND_TYPEDEF;
 }
 
+static inline bool btf_type_is_volatile(const struct btf_type *t)
+{
+	return BTF_INFO_KIND(t->info) == BTF_KIND_VOLATILE;
+}
+
 static inline bool btf_type_is_func(const struct btf_type *t)
 {
 	return BTF_INFO_KIND(t->info) == BTF_KIND_FUNC;
diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c
index eba603cec2c5..ad301e78f7ee 100644
--- a/kernel/bpf/btf.c
+++ b/kernel/bpf/btf.c
@@ -3225,6 +3225,9 @@ static int btf_find_kptr(const struct btf *btf, const struct btf_type *t,
 	enum bpf_kptr_type type;
 	u32 res_id;
 
+	/* Permit modifiers on the pointer itself */
+	if (btf_type_is_volatile(t))
+		t = btf_type_by_id(btf, t->type);
 	/* For PTR, sz is always == 8 */
 	if (!btf_type_is_ptr(t))
 		return BTF_FIELD_IGNORE;
-- 
2.38.0



* [PATCH bpf-next v2 03/25] bpf: Clobber stack slot when writing over spilled PTR_TO_BTF_ID
  2022-10-13  6:22 [PATCH bpf-next v2 00/25] Local kptrs, BPF linked lists Kumar Kartikeya Dwivedi
  2022-10-13  6:22 ` [PATCH bpf-next v2 01/25] bpf: Document UAPI details for special BPF types Kumar Kartikeya Dwivedi
  2022-10-13  6:22 ` [PATCH bpf-next v2 02/25] bpf: Allow specifying volatile type modifier for kptrs Kumar Kartikeya Dwivedi
@ 2022-10-13  6:22 ` Kumar Kartikeya Dwivedi
  2022-10-13  6:22 ` [PATCH bpf-next v2 04/25] bpf: Fix slot type check in check_stack_write_var_off Kumar Kartikeya Dwivedi
                   ` (21 subsequent siblings)
  24 siblings, 0 replies; 52+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2022-10-13  6:22 UTC (permalink / raw)
  To: bpf
  Cc: Yonghong Song, Alexei Starovoitov, Andrii Nakryiko,
	Daniel Borkmann, Dave Marchevsky, Delyan Kratunov

When support was added for spilled PTR_TO_BTF_ID to be accessed by
helper memory access, the stack slot was not overwritten to STACK_MISC
(and that too is only safe when env->allow_ptr_leaks is true).

This means that helpers that take ARG_PTR_TO_MEM and write to it may
essentially overwrite the value while the verifier continues to track
the slot as a spilled register.

This can cause issues when PTR_TO_BTF_ID is spilled to stack, and then
overwritten by helper write access, which can then be passed to BPF
helpers or kfuncs.

Handle this by falling back to the case introduced in a later commit,
which will also handle PTR_TO_BTF_ID along with other pointer types,
i.e. cd17d38f8b28 ("bpf: Permits pointers on stack for helper calls").

Finally, include a comment on why REG_LIVE_WRITTEN is not being set when
clobber is set to true. In short, when clobber is unset we know that we
won't be writing, but when it is true we *may* write to any of the stack
slots in that range. It may be a partial or complete write, to just one
or many stack slots.

We cannot be sure, hence to be conservative, we leave things as is and
never set REG_LIVE_WRITTEN for any stack slot. Clobber still needs to
reset the slots to STACK_MISC, assuming writes happened. Read marks,
however, still need to be propagated upwards from a liveness point of
view, as the parent stack slot's contents may still continue to matter
to child states.

Cc: Yonghong Song <yhs@meta.com>
Fixes: 1d68f22b3d53 ("bpf: Handle spilled PTR_TO_BTF_ID properly when checking stack_boundary")
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
---
 kernel/bpf/verifier.c | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 6f6d2d511c06..48a10d79f1bf 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -5154,10 +5154,6 @@ static int check_stack_range_initialized(
 			goto mark;
 		}
 
-		if (is_spilled_reg(&state->stack[spi]) &&
-		    base_type(state->stack[spi].spilled_ptr.type) == PTR_TO_BTF_ID)
-			goto mark;
-
 		if (is_spilled_reg(&state->stack[spi]) &&
 		    (state->stack[spi].spilled_ptr.type == SCALAR_VALUE ||
 		     env->allow_ptr_leaks)) {
@@ -5188,6 +5184,11 @@ static int check_stack_range_initialized(
 		mark_reg_read(env, &state->stack[spi].spilled_ptr,
 			      state->stack[spi].spilled_ptr.parent,
 			      REG_LIVE_READ64);
+		/* We do not set REG_LIVE_WRITTEN for stack slot, as we can not
+		 * be sure that whether stack slot is written to or not. Hence,
+		 * we must still conservatively propagate reads upwards even if
+		 * helper may write to the entire memory range.
+		 */
 	}
 	return update_stack_depth(env, state, min_off);
 }
-- 
2.38.0



* [PATCH bpf-next v2 04/25] bpf: Fix slot type check in check_stack_write_var_off
  2022-10-13  6:22 [PATCH bpf-next v2 00/25] Local kptrs, BPF linked lists Kumar Kartikeya Dwivedi
                   ` (2 preceding siblings ...)
  2022-10-13  6:22 ` [PATCH bpf-next v2 03/25] bpf: Clobber stack slot when writing over spilled PTR_TO_BTF_ID Kumar Kartikeya Dwivedi
@ 2022-10-13  6:22 ` Kumar Kartikeya Dwivedi
  2022-10-13  6:22 ` [PATCH bpf-next v2 05/25] bpf: Drop reg_type_may_be_refcounted_or_null Kumar Kartikeya Dwivedi
                   ` (20 subsequent siblings)
  24 siblings, 0 replies; 52+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2022-10-13  6:22 UTC (permalink / raw)
  To: bpf
  Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Dave Marchevsky, Delyan Kratunov

For the case where allow_ptr_leaks is false, the code effectively accepts
only the slot types STACK_INVALID and STACK_SPILL and rejects every other
case. This is a consequence of incorrectly comparing the slot type
against register type values (NOT_INIT and SCALAR_VALUE respectively)
instead of slot type values. Fix the check.
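
The mix-up can be shown with a small userspace model. The enum values below
are copied by hand as assumptions about the leading members of the kernel's
register type and stack slot type enums, and old_rejects/new_rejects are
hypothetical names that only mirror the conditions in the diff.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed leading values of the two kernel enums, reproduced for the model. */
enum reg_type {
	NOT_INIT = 0,
	SCALAR_VALUE = 1,
};

enum slot_type {
	STACK_INVALID = 0,
	STACK_SPILL = 1,
	STACK_MISC = 2,
	STACK_ZERO = 3,
};

/* Old condition: a slot type is compared against register type constants,
 * so only slot values 0 and 1 (STACK_INVALID, STACK_SPILL) get through.
 */
static bool old_rejects(bool allow_ptr_leaks, unsigned char stype)
{
	return !allow_ptr_leaks && stype != NOT_INIT && stype != SCALAR_VALUE;
}

/* New condition: only initialized, non-spill slots get through. */
static bool new_rejects(bool allow_ptr_leaks, unsigned char stype)
{
	return !allow_ptr_leaks && stype != STACK_MISC && stype != STACK_ZERO;
}
```

In this model the old condition lets a STACK_SPILL slot through, which is the
very case the error message is about, while the new one rejects it.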

Fixes: 01f810ace9ed ("bpf: Allow variable-offset stack access")
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
---
 kernel/bpf/verifier.c | 19 +++++++++++--------
 1 file changed, 11 insertions(+), 8 deletions(-)

diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 48a10d79f1bf..bbbd44b0fd6f 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -3181,14 +3181,17 @@ static int check_stack_write_var_off(struct bpf_verifier_env *env,
 		stype = &state->stack[spi].slot_type[slot % BPF_REG_SIZE];
 		mark_stack_slot_scratched(env, spi);
 
-		if (!env->allow_ptr_leaks
-				&& *stype != NOT_INIT
-				&& *stype != SCALAR_VALUE) {
-			/* Reject the write if there's are spilled pointers in
-			 * range. If we didn't reject here, the ptr status
-			 * would be erased below (even though not all slots are
-			 * actually overwritten), possibly opening the door to
-			 * leaks.
+		if (!env->allow_ptr_leaks && *stype != STACK_MISC && *stype != STACK_ZERO) {
+			/* Reject the write if range we may write to has not
+			 * been initialized beforehand. If we didn't reject
+			 * here, the ptr status would be erased below (even
+			 * though not all slots are actually overwritten),
+			 * possibly opening the door to leaks.
+			 *
+			 * We do however catch STACK_INVALID case below, and
+			 * only allow reading possibly uninitialized memory
+			 * later for CAP_PERFMON, as the write may not happen to
+			 * that slot.
 			 */
 			verbose(env, "spilled ptr in range of var-offset stack write; insn %d, ptr off: %d",
 				insn_idx, i);
-- 
2.38.0



* [PATCH bpf-next v2 05/25] bpf: Drop reg_type_may_be_refcounted_or_null
  2022-10-13  6:22 [PATCH bpf-next v2 00/25] Local kptrs, BPF linked lists Kumar Kartikeya Dwivedi
                   ` (3 preceding siblings ...)
  2022-10-13  6:22 ` [PATCH bpf-next v2 04/25] bpf: Fix slot type check in check_stack_write_var_off Kumar Kartikeya Dwivedi
@ 2022-10-13  6:22 ` Kumar Kartikeya Dwivedi
  2022-10-19 16:04   ` Dave Marchevsky
  2022-10-13  6:22 ` [PATCH bpf-next v2 06/25] bpf: Refactor kptr_off_tab into fields_tab Kumar Kartikeya Dwivedi
                   ` (19 subsequent siblings)
  24 siblings, 1 reply; 52+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2022-10-13  6:22 UTC (permalink / raw)
  To: bpf
  Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Dave Marchevsky, Delyan Kratunov

It is not scalable to maintain a list of types that can have non-zero
ref_obj_id. It is never set for scalars anyway, so just remove the
conditional on register types and print it whenever it is non-zero.

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
---
 kernel/bpf/verifier.c | 9 +--------
 1 file changed, 1 insertion(+), 8 deletions(-)

diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index bbbd44b0fd6f..c96419cf7033 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -457,13 +457,6 @@ static bool reg_may_point_to_spin_lock(const struct bpf_reg_state *reg)
 		map_value_has_spin_lock(reg->map_ptr);
 }
 
-static bool reg_type_may_be_refcounted_or_null(enum bpf_reg_type type)
-{
-	type = base_type(type);
-	return type == PTR_TO_SOCKET || type == PTR_TO_TCP_SOCK ||
-		type == PTR_TO_MEM || type == PTR_TO_BTF_ID;
-}
-
 static bool type_is_rdonly_mem(u32 type)
 {
 	return type & MEM_RDONLY;
@@ -875,7 +868,7 @@ static void print_verifier_state(struct bpf_verifier_env *env,
 
 			if (reg->id)
 				verbose_a("id=%d", reg->id);
-			if (reg_type_may_be_refcounted_or_null(t) && reg->ref_obj_id)
+			if (reg->ref_obj_id)
 				verbose_a("ref_obj_id=%d", reg->ref_obj_id);
 			if (t != SCALAR_VALUE)
 				verbose_a("off=%d", reg->off);
-- 
2.38.0



* [PATCH bpf-next v2 06/25] bpf: Refactor kptr_off_tab into fields_tab
  2022-10-13  6:22 [PATCH bpf-next v2 00/25] Local kptrs, BPF linked lists Kumar Kartikeya Dwivedi
                   ` (4 preceding siblings ...)
  2022-10-13  6:22 ` [PATCH bpf-next v2 05/25] bpf: Drop reg_type_may_be_refcounted_or_null Kumar Kartikeya Dwivedi
@ 2022-10-13  6:22 ` Kumar Kartikeya Dwivedi
  2022-10-19  1:35   ` Alexei Starovoitov
  2022-10-13  6:22 ` [PATCH bpf-next v2 07/25] bpf: Consolidate spin_lock, timer management " Kumar Kartikeya Dwivedi
                   ` (18 subsequent siblings)
  24 siblings, 1 reply; 52+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2022-10-13  6:22 UTC (permalink / raw)
  To: bpf
  Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Dave Marchevsky, Delyan Kratunov

To prepare the BPF verifier to handle special fields in both map values
and program allocated types coming from program BTF, we need to refactor
the kptr_off_tab handling code into something more generic and reusable
across both cases to avoid code duplication.

Later patches also require passing this data to helpers at runtime, so
that they can work on user defined types, initialize them, destruct
them, etc.

The main observation is that both map values and such allocated types
point to a type in program BTF, hence they can be handled similarly. We
can prepare a field metadata table for both cases and store them in
struct bpf_map or struct btf depending on the use case.

Hence, refactor the code into generic btf_type_fields and btf_field
member structs. The btf_type_fields represents the fields of a specific
btf_type in user BTF. The cnt indicates the number of special fields we
successfully recognized, and field_mask is a bitmask of fields that were
found, to enable quick determination of availability of a certain field.
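
As a rough userspace model of how cnt and field_mask work together: the bit
values below mirror the enum in this patch, but the struct layout (a fixed
array instead of a flexible one) and the helper names add_field and has_field
are simplifications made up for this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Bit values mirror the patch; all other names are userspace stand-ins. */
enum field_type {
	KPTR_UNREF = 1 << 2,
	KPTR_REF   = 1 << 3,
	KPTR       = KPTR_UNREF | KPTR_REF,
};

#define FIELDS_MAX 8

struct field {
	uint32_t offset;
	enum field_type type;
};

struct type_fields {
	uint32_t cnt;        /* number of special fields recognized */
	uint32_t field_mask; /* OR of the types of all recognized fields */
	struct field fields[FIELDS_MAX];
};

/* Records one field; returns 0 on success, -1 if the table is full. */
static int add_field(struct type_fields *tab, uint32_t off, enum field_type t)
{
	if (tab->cnt >= FIELDS_MAX)
		return -1;
	tab->fields[tab->cnt].offset = off;
	tab->fields[tab->cnt].type = t;
	tab->cnt++;
	tab->field_mask |= t;
	return 0;
}

/* Constant-time presence test; no walk over fields[] is needed. */
static bool has_field(const struct type_fields *tab, enum field_type t)
{
	if (!tab)
		return false;
	return (tab->field_mask & t) != 0;
}
```

Since each type is a distinct bit, a combined value such as KPTR answers "is
there any kptr, referenced or not" with a single mask test.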

Subsequently, refactor the rest of the code to work with these generic
types, remove assumptions about kptr and kptr_off_tab, rename variables
to more meaningful names, etc.

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
---
 include/linux/bpf.h     | 103 +++++++++++++-------
 include/linux/btf.h     |   4 +-
 kernel/bpf/arraymap.c   |  13 ++-
 kernel/bpf/btf.c        |  64 ++++++-------
 kernel/bpf/hashtab.c    |  14 ++-
 kernel/bpf/map_in_map.c |  13 ++-
 kernel/bpf/syscall.c    | 203 +++++++++++++++++++++++-----------------
 kernel/bpf/verifier.c   |  96 ++++++++++---------
 8 files changed, 289 insertions(+), 221 deletions(-)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 9e7d46d16032..25e77a172d7c 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -164,35 +164,41 @@ struct bpf_map_ops {
 };
 
 enum {
-	/* Support at most 8 pointers in a BPF map value */
-	BPF_MAP_VALUE_OFF_MAX = 8,
-	BPF_MAP_OFF_ARR_MAX   = BPF_MAP_VALUE_OFF_MAX +
+	/* Support at most 8 pointers in a BTF type */
+	BTF_FIELDS_MAX	      = 8,
+	BPF_MAP_OFF_ARR_MAX   = BTF_FIELDS_MAX +
 				1 + /* for bpf_spin_lock */
 				1,  /* for bpf_timer */
 };
 
-enum bpf_kptr_type {
-	BPF_KPTR_UNREF,
-	BPF_KPTR_REF,
+enum btf_field_type {
+	BPF_KPTR_UNREF = (1 << 2),
+	BPF_KPTR_REF   = (1 << 3),
+	BPF_KPTR       = BPF_KPTR_UNREF | BPF_KPTR_REF,
 };
 
-struct bpf_map_value_off_desc {
+struct btf_field_kptr {
+	struct btf *btf;
+	struct module *module;
+	btf_dtor_kfunc_t dtor;
+	u32 btf_id;
+};
+
+struct btf_field {
 	u32 offset;
-	enum bpf_kptr_type type;
-	struct {
-		struct btf *btf;
-		struct module *module;
-		btf_dtor_kfunc_t dtor;
-		u32 btf_id;
-	} kptr;
+	enum btf_field_type type;
+	union {
+		struct btf_field_kptr kptr;
+	};
 };
 
-struct bpf_map_value_off {
-	u32 nr_off;
-	struct bpf_map_value_off_desc off[];
+struct btf_type_fields {
+	u32 cnt;
+	u32 field_mask;
+	struct btf_field fields[];
 };
 
-struct bpf_map_off_arr {
+struct btf_type_fields_off {
 	u32 cnt;
 	u32 field_off[BPF_MAP_OFF_ARR_MAX];
 	u8 field_sz[BPF_MAP_OFF_ARR_MAX];
@@ -214,7 +220,7 @@ struct bpf_map {
 	u64 map_extra; /* any per-map-type extra fields */
 	u32 map_flags;
 	int spin_lock_off; /* >=0 valid offset, <0 error */
-	struct bpf_map_value_off *kptr_off_tab;
+	struct btf_type_fields *fields_tab;
 	int timer_off; /* >=0 valid offset, <0 error */
 	u32 id;
 	int numa_node;
@@ -226,7 +232,7 @@ struct bpf_map {
 	struct obj_cgroup *objcg;
 #endif
 	char name[BPF_OBJ_NAME_LEN];
-	struct bpf_map_off_arr *off_arr;
+	struct btf_type_fields_off *off_arr;
 	/* The 3rd and 4th cacheline with misc members to avoid false sharing
 	 * particularly with refcounting.
 	 */
@@ -250,6 +256,37 @@ struct bpf_map {
 	bool frozen; /* write-once; write-protected by freeze_mutex */
 };
 
+static inline u32 btf_field_type_size(enum btf_field_type type)
+{
+	switch (type) {
+	case BPF_KPTR_UNREF:
+	case BPF_KPTR_REF:
+		return sizeof(u64);
+	default:
+		WARN_ON_ONCE(1);
+		return 0;
+	}
+}
+
+static inline u32 btf_field_type_align(enum btf_field_type type)
+{
+	switch (type) {
+	case BPF_KPTR_UNREF:
+	case BPF_KPTR_REF:
+		return __alignof__(u64);
+	default:
+		WARN_ON_ONCE(1);
+		return 0;
+	}
+}
+
+static inline bool btf_type_fields_has_field(const struct btf_type_fields *tab, enum btf_field_type type)
+{
+	if (IS_ERR_OR_NULL(tab))
+		return false;
+	return tab->field_mask & type;
+}
+
 static inline bool map_value_has_spin_lock(const struct bpf_map *map)
 {
 	return map->spin_lock_off >= 0;
@@ -260,23 +297,19 @@ static inline bool map_value_has_timer(const struct bpf_map *map)
 	return map->timer_off >= 0;
 }
 
-static inline bool map_value_has_kptrs(const struct bpf_map *map)
-{
-	return !IS_ERR_OR_NULL(map->kptr_off_tab);
-}
-
 static inline void check_and_init_map_value(struct bpf_map *map, void *dst)
 {
 	if (unlikely(map_value_has_spin_lock(map)))
 		memset(dst + map->spin_lock_off, 0, sizeof(struct bpf_spin_lock));
 	if (unlikely(map_value_has_timer(map)))
 		memset(dst + map->timer_off, 0, sizeof(struct bpf_timer));
-	if (unlikely(map_value_has_kptrs(map))) {
-		struct bpf_map_value_off *tab = map->kptr_off_tab;
+	if (!IS_ERR_OR_NULL(map->fields_tab)) {
+		struct btf_field *fields = map->fields_tab->fields;
+		u32 cnt = map->fields_tab->cnt;
 		int i;
 
-		for (i = 0; i < tab->nr_off; i++)
-			*(u64 *)(dst + tab->off[i].offset) = 0;
+		for (i = 0; i < cnt; i++)
+			memset(dst + fields[i].offset, 0, btf_field_type_size(fields[i].type));
 	}
 }
 
@@ -1691,11 +1724,13 @@ void bpf_prog_put(struct bpf_prog *prog);
 void bpf_prog_free_id(struct bpf_prog *prog, bool do_idr_lock);
 void bpf_map_free_id(struct bpf_map *map, bool do_idr_lock);
 
-struct bpf_map_value_off_desc *bpf_map_kptr_off_contains(struct bpf_map *map, u32 offset);
-void bpf_map_free_kptr_off_tab(struct bpf_map *map);
-struct bpf_map_value_off *bpf_map_copy_kptr_off_tab(const struct bpf_map *map);
-bool bpf_map_equal_kptr_off_tab(const struct bpf_map *map_a, const struct bpf_map *map_b);
-void bpf_map_free_kptrs(struct bpf_map *map, void *map_value);
+struct btf_field *btf_type_fields_find(const struct btf_type_fields *tab,
+				       u32 offset, enum btf_field_type type);
+void btf_type_fields_free(struct btf_type_fields *tab);
+void bpf_map_free_fields_tab(struct bpf_map *map);
+struct btf_type_fields *btf_type_fields_dup(const struct btf_type_fields *tab);
+bool btf_type_fields_equal(const struct btf_type_fields *tab_a, const struct btf_type_fields *tab_b);
+void bpf_obj_free_fields(const struct btf_type_fields *tab, void *obj);
 
 struct bpf_map *bpf_map_get(u32 ufd);
 struct bpf_map *bpf_map_get_with_uref(u32 ufd);
diff --git a/include/linux/btf.h b/include/linux/btf.h
index 86aad9b2ce02..0d47cbb11a59 100644
--- a/include/linux/btf.h
+++ b/include/linux/btf.h
@@ -163,8 +163,8 @@ bool btf_member_is_reg_int(const struct btf *btf, const struct btf_type *s,
 			   u32 expected_offset, u32 expected_size);
 int btf_find_spin_lock(const struct btf *btf, const struct btf_type *t);
 int btf_find_timer(const struct btf *btf, const struct btf_type *t);
-struct bpf_map_value_off *btf_parse_kptrs(const struct btf *btf,
-					  const struct btf_type *t);
+struct btf_type_fields *btf_parse_fields(const struct btf *btf,
+					 const struct btf_type *t);
 bool btf_type_is_void(const struct btf_type *t);
 s32 btf_find_by_name_kind(const struct btf *btf, const char *name, u8 kind);
 const struct btf_type *btf_type_skip_modifiers(const struct btf *btf,
diff --git a/kernel/bpf/arraymap.c b/kernel/bpf/arraymap.c
index 832b2659e96e..defe5c00049a 100644
--- a/kernel/bpf/arraymap.c
+++ b/kernel/bpf/arraymap.c
@@ -310,8 +310,7 @@ static void check_and_free_fields(struct bpf_array *arr, void *val)
 {
 	if (map_value_has_timer(&arr->map))
 		bpf_timer_cancel_and_free(val + arr->map.timer_off);
-	if (map_value_has_kptrs(&arr->map))
-		bpf_map_free_kptrs(&arr->map, val);
+	bpf_obj_free_fields(arr->map.fields_tab, val);
 }
 
 /* Called from syscall or from eBPF program */
@@ -409,7 +408,7 @@ static void array_map_free_timers(struct bpf_map *map)
 	struct bpf_array *array = container_of(map, struct bpf_array, map);
 	int i;
 
-	/* We don't reset or free kptr on uref dropping to zero. */
+	/* We don't reset or free fields other than timer on uref dropping to zero. */
 	if (!map_value_has_timer(map))
 		return;
 
@@ -423,22 +422,22 @@ static void array_map_free(struct bpf_map *map)
 	struct bpf_array *array = container_of(map, struct bpf_array, map);
 	int i;
 
-	if (map_value_has_kptrs(map)) {
+	if (!IS_ERR_OR_NULL(map->fields_tab)) {
 		if (array->map.map_type == BPF_MAP_TYPE_PERCPU_ARRAY) {
 			for (i = 0; i < array->map.max_entries; i++) {
 				void __percpu *pptr = array->pptrs[i & array->index_mask];
 				int cpu;
 
 				for_each_possible_cpu(cpu) {
-					bpf_map_free_kptrs(map, per_cpu_ptr(pptr, cpu));
+					bpf_obj_free_fields(map->fields_tab, per_cpu_ptr(pptr, cpu));
 					cond_resched();
 				}
 			}
 		} else {
 			for (i = 0; i < array->map.max_entries; i++)
-				bpf_map_free_kptrs(map, array_map_elem_ptr(array, i));
+				bpf_obj_free_fields(map->fields_tab, array_map_elem_ptr(array, i));
 		}
-		bpf_map_free_kptr_off_tab(map);
+		bpf_map_free_fields_tab(map);
 	}
 
 	if (array->map.map_type == BPF_MAP_TYPE_PERCPU_ARRAY)
diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c
index ad301e78f7ee..c8d267098b87 100644
--- a/kernel/bpf/btf.c
+++ b/kernel/bpf/btf.c
@@ -3191,7 +3191,7 @@ static void btf_struct_log(struct btf_verifier_env *env,
 	btf_verifier_log(env, "size=%u vlen=%u", t->size, btf_type_vlen(t));
 }
 
-enum btf_field_type {
+enum btf_field_info_type {
 	BTF_FIELD_SPIN_LOCK,
 	BTF_FIELD_TIMER,
 	BTF_FIELD_KPTR,
@@ -3203,9 +3203,9 @@ enum {
 };
 
 struct btf_field_info {
-	u32 type_id;
+	enum btf_field_type type;
 	u32 off;
-	enum bpf_kptr_type type;
+	u32 type_id;
 };
 
 static int btf_find_struct(const struct btf *btf, const struct btf_type *t,
@@ -3222,7 +3222,7 @@ static int btf_find_struct(const struct btf *btf, const struct btf_type *t,
 static int btf_find_kptr(const struct btf *btf, const struct btf_type *t,
 			 u32 off, int sz, struct btf_field_info *info)
 {
-	enum bpf_kptr_type type;
+	enum btf_field_type type;
 	u32 res_id;
 
 	/* Permit modifiers on the pointer itself */
@@ -3259,7 +3259,7 @@ static int btf_find_kptr(const struct btf *btf, const struct btf_type *t,
 
 static int btf_find_struct_field(const struct btf *btf, const struct btf_type *t,
 				 const char *name, int sz, int align,
-				 enum btf_field_type field_type,
+				 enum btf_field_info_type field_type,
 				 struct btf_field_info *info, int info_cnt)
 {
 	const struct btf_member *member;
@@ -3311,7 +3311,7 @@ static int btf_find_struct_field(const struct btf *btf, const struct btf_type *t
 
 static int btf_find_datasec_var(const struct btf *btf, const struct btf_type *t,
 				const char *name, int sz, int align,
-				enum btf_field_type field_type,
+				enum btf_field_info_type field_type,
 				struct btf_field_info *info, int info_cnt)
 {
 	const struct btf_var_secinfo *vsi;
@@ -3360,7 +3360,7 @@ static int btf_find_datasec_var(const struct btf *btf, const struct btf_type *t,
 }
 
 static int btf_find_field(const struct btf *btf, const struct btf_type *t,
-			  enum btf_field_type field_type,
+			  enum btf_field_info_type field_type,
 			  struct btf_field_info *info, int info_cnt)
 {
 	const char *name;
@@ -3423,14 +3423,14 @@ int btf_find_timer(const struct btf *btf, const struct btf_type *t)
 	return info.off;
 }
 
-struct bpf_map_value_off *btf_parse_kptrs(const struct btf *btf,
-					  const struct btf_type *t)
+struct btf_type_fields *btf_parse_fields(const struct btf *btf,
+					 const struct btf_type *t)
 {
-	struct btf_field_info info_arr[BPF_MAP_VALUE_OFF_MAX];
-	struct bpf_map_value_off *tab;
+	struct btf_field_info info_arr[BTF_FIELDS_MAX];
 	struct btf *kernel_btf = NULL;
+	struct btf_type_fields *tab;
 	struct module *mod = NULL;
-	int ret, i, nr_off;
+	int ret, i, cnt;
 
 	ret = btf_find_field(btf, t, BTF_FIELD_KPTR, info_arr, ARRAY_SIZE(info_arr));
 	if (ret < 0)
@@ -3438,12 +3438,12 @@ struct bpf_map_value_off *btf_parse_kptrs(const struct btf *btf,
 	if (!ret)
 		return NULL;
 
-	nr_off = ret;
-	tab = kzalloc(offsetof(struct bpf_map_value_off, off[nr_off]), GFP_KERNEL | __GFP_NOWARN);
+	cnt = ret;
+	tab = kzalloc(offsetof(struct btf_type_fields, fields[cnt]), GFP_KERNEL | __GFP_NOWARN);
 	if (!tab)
 		return ERR_PTR(-ENOMEM);
-
-	for (i = 0; i < nr_off; i++) {
+	tab->cnt = 0;
+	for (i = 0; i < cnt; i++) {
 		const struct btf_type *t;
 		s32 id;
 
@@ -3500,28 +3500,24 @@ struct bpf_map_value_off *btf_parse_kptrs(const struct btf *btf,
 				ret = -EINVAL;
 				goto end_mod;
 			}
-			tab->off[i].kptr.dtor = (void *)addr;
+			tab->fields[i].kptr.dtor = (void *)addr;
 		}
 
-		tab->off[i].offset = info_arr[i].off;
-		tab->off[i].type = info_arr[i].type;
-		tab->off[i].kptr.btf_id = id;
-		tab->off[i].kptr.btf = kernel_btf;
-		tab->off[i].kptr.module = mod;
+		tab->fields[i].offset = info_arr[i].off;
+		tab->fields[i].type = info_arr[i].type;
+		tab->fields[i].kptr.btf_id = id;
+		tab->fields[i].kptr.btf = kernel_btf;
+		tab->fields[i].kptr.module = mod;
+		tab->cnt++;
 	}
-	tab->nr_off = nr_off;
+	tab->cnt = cnt;
 	return tab;
 end_mod:
 	module_put(mod);
 end_btf:
 	btf_put(kernel_btf);
 end:
-	while (i--) {
-		btf_put(tab->off[i].kptr.btf);
-		if (tab->off[i].kptr.module)
-			module_put(tab->off[i].kptr.module);
-	}
-	kfree(tab);
+	btf_type_fields_free(tab);
 	return ERR_PTR(ret);
 }
 
@@ -6365,7 +6361,7 @@ static int btf_check_func_arg_match(struct bpf_verifier_env *env,
 
 		/* kptr_get is only true for kfunc */
 		if (i == 0 && kptr_get) {
-			struct bpf_map_value_off_desc *off_desc;
+			struct btf_field *kptr_field;
 
 			if (reg->type != PTR_TO_MAP_VALUE) {
 				bpf_log(log, "arg#0 expected pointer to map value\n");
@@ -6381,8 +6377,8 @@ static int btf_check_func_arg_match(struct bpf_verifier_env *env,
 				return -EINVAL;
 			}
 
-			off_desc = bpf_map_kptr_off_contains(reg->map_ptr, reg->off + reg->var_off.value);
-			if (!off_desc || off_desc->type != BPF_KPTR_REF) {
+			kptr_field = btf_type_fields_find(reg->map_ptr->fields_tab, reg->off + reg->var_off.value, BPF_KPTR);
+			if (!kptr_field || kptr_field->type != BPF_KPTR_REF) {
 				bpf_log(log, "arg#0 no referenced kptr at map value offset=%llu\n",
 					reg->off + reg->var_off.value);
 				return -EINVAL;
@@ -6401,8 +6397,8 @@ static int btf_check_func_arg_match(struct bpf_verifier_env *env,
 					func_name, i, btf_type_str(ref_t), ref_tname);
 				return -EINVAL;
 			}
-			if (!btf_struct_ids_match(log, btf, ref_id, 0, off_desc->kptr.btf,
-						  off_desc->kptr.btf_id, true)) {
+			if (!btf_struct_ids_match(log, btf, ref_id, 0, kptr_field->kptr.btf,
+						  kptr_field->kptr.btf_id, true)) {
 				bpf_log(log, "kernel function %s args#%d expected pointer to %s %s\n",
 					func_name, i, btf_type_str(ref_t), ref_tname);
 				return -EINVAL;
diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c
index ed3f8a53603b..59cdbea587c5 100644
--- a/kernel/bpf/hashtab.c
+++ b/kernel/bpf/hashtab.c
@@ -238,21 +238,20 @@ static void htab_free_prealloced_timers(struct bpf_htab *htab)
 	}
 }
 
-static void htab_free_prealloced_kptrs(struct bpf_htab *htab)
+static void htab_free_prealloced_fields(struct bpf_htab *htab)
 {
 	u32 num_entries = htab->map.max_entries;
 	int i;
 
-	if (!map_value_has_kptrs(&htab->map))
+	if (IS_ERR_OR_NULL(htab->map.fields_tab))
 		return;
 	if (htab_has_extra_elems(htab))
 		num_entries += num_possible_cpus();
-
 	for (i = 0; i < num_entries; i++) {
 		struct htab_elem *elem;
 
 		elem = get_htab_elem(htab, i);
-		bpf_map_free_kptrs(&htab->map, elem->key + round_up(htab->map.key_size, 8));
+		bpf_obj_free_fields(htab->map.fields_tab, elem->key + round_up(htab->map.key_size, 8));
 		cond_resched();
 	}
 }
@@ -766,8 +765,7 @@ static void check_and_free_fields(struct bpf_htab *htab,
 
 	if (map_value_has_timer(&htab->map))
 		bpf_timer_cancel_and_free(map_value + htab->map.timer_off);
-	if (map_value_has_kptrs(&htab->map))
-		bpf_map_free_kptrs(&htab->map, map_value);
+	bpf_obj_free_fields(htab->map.fields_tab, map_value);
 }
 
 /* It is called from the bpf_lru_list when the LRU needs to delete
@@ -1517,11 +1515,11 @@ static void htab_map_free(struct bpf_map *map)
 	if (!htab_is_prealloc(htab)) {
 		delete_all_elements(htab);
 	} else {
-		htab_free_prealloced_kptrs(htab);
+		htab_free_prealloced_fields(htab);
 		prealloc_destroy(htab);
 	}
 
-	bpf_map_free_kptr_off_tab(map);
+	bpf_map_free_fields_tab(map);
 	free_percpu(htab->extra_elems);
 	bpf_map_area_free(htab->buckets);
 	bpf_mem_alloc_destroy(&htab->pcpu_ma);
diff --git a/kernel/bpf/map_in_map.c b/kernel/bpf/map_in_map.c
index 135205d0d560..2bff5f3a5efc 100644
--- a/kernel/bpf/map_in_map.c
+++ b/kernel/bpf/map_in_map.c
@@ -52,7 +52,14 @@ struct bpf_map *bpf_map_meta_alloc(int inner_map_ufd)
 	inner_map_meta->max_entries = inner_map->max_entries;
 	inner_map_meta->spin_lock_off = inner_map->spin_lock_off;
 	inner_map_meta->timer_off = inner_map->timer_off;
-	inner_map_meta->kptr_off_tab = bpf_map_copy_kptr_off_tab(inner_map);
+	inner_map_meta->fields_tab = btf_type_fields_dup(inner_map->fields_tab);
+	if (IS_ERR(inner_map_meta->fields_tab)) {
+		/* btf_type_fields_dup returns NULL for an invalid or empty
+		 * table, a valid pointer on success, and ERR_PTR on errors.
+		 */
+		fdput(f);
+		return ERR_CAST(inner_map_meta->fields_tab);
+	}
 	if (inner_map->btf) {
 		btf_get(inner_map->btf);
 		inner_map_meta->btf = inner_map->btf;
@@ -72,7 +79,7 @@ struct bpf_map *bpf_map_meta_alloc(int inner_map_ufd)
 
 void bpf_map_meta_free(struct bpf_map *map_meta)
 {
-	bpf_map_free_kptr_off_tab(map_meta);
+	bpf_map_free_fields_tab(map_meta);
 	btf_put(map_meta->btf);
 	kfree(map_meta);
 }
@@ -86,7 +93,7 @@ bool bpf_map_meta_equal(const struct bpf_map *meta0,
 		meta0->value_size == meta1->value_size &&
 		meta0->timer_off == meta1->timer_off &&
 		meta0->map_flags == meta1->map_flags &&
-		bpf_map_equal_kptr_off_tab(meta0, meta1);
+		btf_type_fields_equal(meta0->fields_tab, meta1->fields_tab);
 }
 
 void *bpf_map_fd_get_ptr(struct bpf_map *map,
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 7b373a5e861f..83e7a290ad06 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -495,114 +495,134 @@ static void bpf_map_release_memcg(struct bpf_map *map)
 }
 #endif
 
-static int bpf_map_kptr_off_cmp(const void *a, const void *b)
+static int btf_field_cmp(const void *a, const void *b)
 {
-	const struct bpf_map_value_off_desc *off_desc1 = a, *off_desc2 = b;
+	const struct btf_field *f1 = a, *f2 = b;
 
-	if (off_desc1->offset < off_desc2->offset)
+	if (f1->offset < f2->offset)
 		return -1;
-	else if (off_desc1->offset > off_desc2->offset)
+	else if (f1->offset > f2->offset)
 		return 1;
 	return 0;
 }
 
-struct bpf_map_value_off_desc *bpf_map_kptr_off_contains(struct bpf_map *map, u32 offset)
+struct btf_field *btf_type_fields_find(const struct btf_type_fields *tab, u32 offset,
+				       enum btf_field_type type)
 {
-	/* Since members are iterated in btf_find_field in increasing order,
-	 * offsets appended to kptr_off_tab are in increasing order, so we can
-	 * do bsearch to find exact match.
-	 */
-	struct bpf_map_value_off *tab;
+	struct btf_field *field;
 
-	if (!map_value_has_kptrs(map))
+	if (IS_ERR_OR_NULL(tab) || !(tab->field_mask & type))
+		return NULL;
+	field = bsearch(&offset, tab->fields, tab->cnt, sizeof(tab->fields[0]), btf_field_cmp);
+	if (!field || !(field->type & type))
 		return NULL;
-	tab = map->kptr_off_tab;
-	return bsearch(&offset, tab->off, tab->nr_off, sizeof(tab->off[0]), bpf_map_kptr_off_cmp);
+	return field;
 }
 
-void bpf_map_free_kptr_off_tab(struct bpf_map *map)
+void btf_type_fields_free(struct btf_type_fields *tab)
 {
-	struct bpf_map_value_off *tab = map->kptr_off_tab;
 	int i;
 
-	if (!map_value_has_kptrs(map))
+	if (IS_ERR_OR_NULL(tab))
 		return;
-	for (i = 0; i < tab->nr_off; i++) {
-		if (tab->off[i].kptr.module)
-			module_put(tab->off[i].kptr.module);
-		btf_put(tab->off[i].kptr.btf);
+	for (i = 0; i < tab->cnt; i++) {
+		switch (tab->fields[i].type) {
+		case BPF_KPTR_UNREF:
+		case BPF_KPTR_REF:
+			if (tab->fields[i].kptr.module)
+				module_put(tab->fields[i].kptr.module);
+			btf_put(tab->fields[i].kptr.btf);
+			break;
+		default:
+			WARN_ON_ONCE(1);
+			continue;
+		}
 	}
 	kfree(tab);
-	map->kptr_off_tab = NULL;
 }
 
-struct bpf_map_value_off *bpf_map_copy_kptr_off_tab(const struct bpf_map *map)
+void bpf_map_free_fields_tab(struct bpf_map *map)
+{
+	btf_type_fields_free(map->fields_tab);
+	map->fields_tab = NULL;
+}
+
+struct btf_type_fields *btf_type_fields_dup(const struct btf_type_fields *tab)
 {
-	struct bpf_map_value_off *tab = map->kptr_off_tab, *new_tab;
-	int size, i;
+	struct btf_type_fields *new_tab;
+	const struct btf_field *fields;
+	int ret, size, i;
 
-	if (!map_value_has_kptrs(map))
-		return ERR_PTR(-ENOENT);
-	size = offsetof(struct bpf_map_value_off, off[tab->nr_off]);
+	if (IS_ERR_OR_NULL(tab))
+		return NULL;
+	size = offsetof(struct btf_type_fields, fields[tab->cnt]);
 	new_tab = kmemdup(tab, size, GFP_KERNEL | __GFP_NOWARN);
 	if (!new_tab)
 		return ERR_PTR(-ENOMEM);
-	/* Do a deep copy of the kptr_off_tab */
-	for (i = 0; i < tab->nr_off; i++) {
-		btf_get(tab->off[i].kptr.btf);
-		if (tab->off[i].kptr.module && !try_module_get(tab->off[i].kptr.module)) {
-			while (i--) {
-				if (tab->off[i].kptr.module)
-					module_put(tab->off[i].kptr.module);
-				btf_put(tab->off[i].kptr.btf);
+	/* Do a deep copy of the fields_tab */
+	fields = tab->fields;
+	new_tab->cnt = 0;
+	for (i = 0; i < tab->cnt; i++) {
+		switch (fields[i].type) {
+		case BPF_KPTR_UNREF:
+		case BPF_KPTR_REF:
+			btf_get(fields[i].kptr.btf);
+			if (fields[i].kptr.module && !try_module_get(fields[i].kptr.module)) {
+				ret = -ENXIO;
+				goto free;
 			}
-			kfree(new_tab);
-			return ERR_PTR(-ENXIO);
+			break;
+		default:
+			ret = -EFAULT;
+			WARN_ON_ONCE(1);
+			goto free;
 		}
+		new_tab->cnt++;
 	}
 	return new_tab;
+free:
+	btf_type_fields_free(new_tab);
+	return ERR_PTR(ret);
 }
 
-bool bpf_map_equal_kptr_off_tab(const struct bpf_map *map_a, const struct bpf_map *map_b)
+bool btf_type_fields_equal(const struct btf_type_fields *tab_a, const struct btf_type_fields *tab_b)
 {
-	struct bpf_map_value_off *tab_a = map_a->kptr_off_tab, *tab_b = map_b->kptr_off_tab;
-	bool a_has_kptr = map_value_has_kptrs(map_a), b_has_kptr = map_value_has_kptrs(map_b);
+	bool a_has_fields = !IS_ERR_OR_NULL(tab_a), b_has_fields = !IS_ERR_OR_NULL(tab_b);
 	int size;
 
-	if (!a_has_kptr && !b_has_kptr)
+	if (!a_has_fields && !b_has_fields)
 		return true;
-	if (a_has_kptr != b_has_kptr)
+	if (a_has_fields != b_has_fields)
 		return false;
-	if (tab_a->nr_off != tab_b->nr_off)
+	if (tab_a->cnt != tab_b->cnt)
 		return false;
-	size = offsetof(struct bpf_map_value_off, off[tab_a->nr_off]);
+	size = offsetof(struct btf_type_fields, fields[tab_a->cnt]);
 	return !memcmp(tab_a, tab_b, size);
 }
 
-/* Caller must ensure map_value_has_kptrs is true. Note that this function can
- * be called on a map value while the map_value is visible to BPF programs, as
- * it ensures the correct synchronization, and we already enforce the same using
- * the bpf_kptr_xchg helper on the BPF program side for referenced kptrs.
- */
-void bpf_map_free_kptrs(struct bpf_map *map, void *map_value)
+void bpf_obj_free_fields(const struct btf_type_fields *tab, void *obj)
 {
-	struct bpf_map_value_off *tab = map->kptr_off_tab;
-	unsigned long *btf_id_ptr;
+	const struct btf_field *fields;
 	int i;
 
-	for (i = 0; i < tab->nr_off; i++) {
-		struct bpf_map_value_off_desc *off_desc = &tab->off[i];
-		unsigned long old_ptr;
-
-		btf_id_ptr = map_value + off_desc->offset;
-		if (off_desc->type == BPF_KPTR_UNREF) {
-			u64 *p = (u64 *)btf_id_ptr;
-
-			WRITE_ONCE(*p, 0);
+	if (IS_ERR_OR_NULL(tab))
+		return;
+	fields = tab->fields;
+	for (i = 0; i < tab->cnt; i++) {
+		const struct btf_field *field = &fields[i];
+		void *field_ptr = obj + field->offset;
+
+		switch (fields[i].type) {
+		case BPF_KPTR_UNREF:
+			WRITE_ONCE(*(u64 *)field_ptr, 0);
+			break;
+		case BPF_KPTR_REF:
+			field->kptr.dtor((void *)xchg((unsigned long *)field_ptr, 0));
+			break;
+		default:
+			WARN_ON_ONCE(1);
 			continue;
 		}
-		old_ptr = xchg(btf_id_ptr, 0);
-		off_desc->kptr.dtor((void *)old_ptr);
 	}
 }
 
@@ -615,7 +635,7 @@ static void bpf_map_free_deferred(struct work_struct *work)
 	kfree(map->off_arr);
 	bpf_map_release_memcg(map);
 	/* implementation dependent freeing, map_free callback also does
-	 * bpf_map_free_kptr_off_tab, if needed.
+	 * bpf_map_free_fields_tab, if needed.
 	 */
 	map->ops->map_free(map);
 }
@@ -779,7 +799,7 @@ static int bpf_map_mmap(struct file *filp, struct vm_area_struct *vma)
 	int err;
 
 	if (!map->ops->map_mmap || map_value_has_spin_lock(map) ||
-	    map_value_has_timer(map) || map_value_has_kptrs(map))
+	    map_value_has_timer(map) || !IS_ERR_OR_NULL(map->fields_tab))
 		return -ENOTSUPP;
 
 	if (!(vma->vm_flags & VM_SHARED))
@@ -936,11 +956,11 @@ static int bpf_map_alloc_off_arr(struct bpf_map *map)
 {
 	bool has_spin_lock = map_value_has_spin_lock(map);
 	bool has_timer = map_value_has_timer(map);
-	bool has_kptrs = map_value_has_kptrs(map);
-	struct bpf_map_off_arr *off_arr;
+	bool has_fields = !IS_ERR_OR_NULL(map->fields_tab);
+	struct btf_type_fields_off *off_arr;
 	u32 i;
 
-	if (!has_spin_lock && !has_timer && !has_kptrs) {
+	if (!has_spin_lock && !has_timer && !has_fields) {
 		map->off_arr = NULL;
 		return 0;
 	}
@@ -965,16 +985,16 @@ static int bpf_map_alloc_off_arr(struct bpf_map *map)
 		off_arr->field_sz[i] = sizeof(struct bpf_timer);
 		off_arr->cnt++;
 	}
-	if (has_kptrs) {
-		struct bpf_map_value_off *tab = map->kptr_off_tab;
+	if (has_fields) {
+		struct btf_type_fields *tab = map->fields_tab;
 		u32 *off = &off_arr->field_off[off_arr->cnt];
 		u8 *sz = &off_arr->field_sz[off_arr->cnt];
 
-		for (i = 0; i < tab->nr_off; i++) {
-			*off++ = tab->off[i].offset;
-			*sz++ = sizeof(u64);
+		for (i = 0; i < tab->cnt; i++) {
+			*off++ = tab->fields[i].offset;
+			*sz++ = btf_field_type_size(tab->fields[i].type);
 		}
-		off_arr->cnt += tab->nr_off;
+		off_arr->cnt += tab->cnt;
 	}
 
 	if (off_arr->cnt == 1)
@@ -1037,8 +1057,10 @@ static int map_check_btf(struct bpf_map *map, const struct btf *btf,
 			return -EOPNOTSUPP;
 	}
 
-	map->kptr_off_tab = btf_parse_kptrs(btf, value_type);
-	if (map_value_has_kptrs(map)) {
+	map->fields_tab = btf_parse_fields(btf, value_type);
+	if (!IS_ERR_OR_NULL(map->fields_tab)) {
+		int i;
+
 		if (!bpf_capable()) {
 			ret = -EPERM;
 			goto free_map_tab;
@@ -1047,12 +1069,25 @@ static int map_check_btf(struct bpf_map *map, const struct btf *btf,
 			ret = -EACCES;
 			goto free_map_tab;
 		}
-		if (map->map_type != BPF_MAP_TYPE_HASH &&
-		    map->map_type != BPF_MAP_TYPE_LRU_HASH &&
-		    map->map_type != BPF_MAP_TYPE_ARRAY &&
-		    map->map_type != BPF_MAP_TYPE_PERCPU_ARRAY) {
-			ret = -EOPNOTSUPP;
-			goto free_map_tab;
+		for (i = 0; i < sizeof(map->fields_tab->field_mask) * 8; i++) {
+			switch (map->fields_tab->field_mask & (1 << i)) {
+			case 0:
+				continue;
+			case BPF_KPTR_UNREF:
+			case BPF_KPTR_REF:
+				if (map->map_type != BPF_MAP_TYPE_HASH &&
+				    map->map_type != BPF_MAP_TYPE_LRU_HASH &&
+				    map->map_type != BPF_MAP_TYPE_ARRAY &&
+				    map->map_type != BPF_MAP_TYPE_PERCPU_ARRAY) {
+					ret = -EOPNOTSUPP;
+					goto free_map_tab;
+				}
+				break;
+			default:
+				/* Fail if map_type checks are missing for a field type */
+				ret = -EOPNOTSUPP;
+				goto free_map_tab;
+			}
 		}
 	}
 
@@ -1064,7 +1099,7 @@ static int map_check_btf(struct bpf_map *map, const struct btf *btf,
 
 	return ret;
 free_map_tab:
-	bpf_map_free_kptr_off_tab(map);
+	bpf_map_free_fields_tab(map);
 	return ret;
 }
 
@@ -1882,7 +1917,7 @@ static int map_freeze(const union bpf_attr *attr)
 		return PTR_ERR(map);
 
 	if (map->map_type == BPF_MAP_TYPE_STRUCT_OPS ||
-	    map_value_has_timer(map) || map_value_has_kptrs(map)) {
+	    map_value_has_timer(map) || !IS_ERR_OR_NULL(map->fields_tab)) {
 		fdput(f);
 		return -ENOTSUPP;
 	}
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index c96419cf7033..9c375949804d 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -262,7 +262,7 @@ struct bpf_call_arg_meta {
 	struct btf *ret_btf;
 	u32 ret_btf_id;
 	u32 subprogno;
-	struct bpf_map_value_off_desc *kptr_off_desc;
+	struct btf_field *kptr_field;
 	u8 uninit_dynptr_regno;
 };
 
@@ -3674,15 +3674,15 @@ int check_ptr_off_reg(struct bpf_verifier_env *env,
 }
 
 static int map_kptr_match_type(struct bpf_verifier_env *env,
-			       struct bpf_map_value_off_desc *off_desc,
+			       struct btf_field *kptr_field,
 			       struct bpf_reg_state *reg, u32 regno)
 {
-	const char *targ_name = kernel_type_name(off_desc->kptr.btf, off_desc->kptr.btf_id);
+	const char *targ_name = kernel_type_name(kptr_field->kptr.btf, kptr_field->kptr.btf_id);
 	int perm_flags = PTR_MAYBE_NULL;
 	const char *reg_name = "";
 
 	/* Only unreferenced case accepts untrusted pointers */
-	if (off_desc->type == BPF_KPTR_UNREF)
+	if (kptr_field->type == BPF_KPTR_UNREF)
 		perm_flags |= PTR_UNTRUSTED;
 
 	if (base_type(reg->type) != PTR_TO_BTF_ID || (type_flag(reg->type) & ~perm_flags))
@@ -3729,15 +3729,15 @@ static int map_kptr_match_type(struct bpf_verifier_env *env,
 	 * strict mode to true for type match.
 	 */
 	if (!btf_struct_ids_match(&env->log, reg->btf, reg->btf_id, reg->off,
-				  off_desc->kptr.btf, off_desc->kptr.btf_id,
-				  off_desc->type == BPF_KPTR_REF))
+				  kptr_field->kptr.btf, kptr_field->kptr.btf_id,
+				  kptr_field->type == BPF_KPTR_REF))
 		goto bad_type;
 	return 0;
 bad_type:
 	verbose(env, "invalid kptr access, R%d type=%s%s ", regno,
 		reg_type_str(env, reg->type), reg_name);
 	verbose(env, "expected=%s%s", reg_type_str(env, PTR_TO_BTF_ID), targ_name);
-	if (off_desc->type == BPF_KPTR_UNREF)
+	if (kptr_field->type == BPF_KPTR_UNREF)
 		verbose(env, " or %s%s\n", reg_type_str(env, PTR_TO_BTF_ID | PTR_UNTRUSTED),
 			targ_name);
 	else
@@ -3747,7 +3747,7 @@ static int map_kptr_match_type(struct bpf_verifier_env *env,
 
 static int check_map_kptr_access(struct bpf_verifier_env *env, u32 regno,
 				 int value_regno, int insn_idx,
-				 struct bpf_map_value_off_desc *off_desc)
+				 struct btf_field *kptr_field)
 {
 	struct bpf_insn *insn = &env->prog->insnsi[insn_idx];
 	int class = BPF_CLASS(insn->code);
@@ -3757,7 +3757,7 @@ static int check_map_kptr_access(struct bpf_verifier_env *env, u32 regno,
 	 *  - Reject cases where variable offset may touch kptr
 	 *  - size of access (must be BPF_DW)
 	 *  - tnum_is_const(reg->var_off)
-	 *  - off_desc->offset == off + reg->var_off.value
+	 *  - kptr_field->offset == off + reg->var_off.value
 	 */
 	/* Only BPF_[LDX,STX,ST] | BPF_MEM | BPF_DW is supported */
 	if (BPF_MODE(insn->code) != BPF_MEM) {
@@ -3768,7 +3768,7 @@ static int check_map_kptr_access(struct bpf_verifier_env *env, u32 regno,
 	/* We only allow loading referenced kptr, since it will be marked as
 	 * untrusted, similar to unreferenced kptr.
 	 */
-	if (class != BPF_LDX && off_desc->type == BPF_KPTR_REF) {
+	if (class != BPF_LDX && kptr_field->type == BPF_KPTR_REF) {
 		verbose(env, "store to referenced kptr disallowed\n");
 		return -EACCES;
 	}
@@ -3778,19 +3778,19 @@ static int check_map_kptr_access(struct bpf_verifier_env *env, u32 regno,
 		/* We can simply mark the value_regno receiving the pointer
 		 * value from map as PTR_TO_BTF_ID, with the correct type.
 		 */
-		mark_btf_ld_reg(env, cur_regs(env), value_regno, PTR_TO_BTF_ID, off_desc->kptr.btf,
-				off_desc->kptr.btf_id, PTR_MAYBE_NULL | PTR_UNTRUSTED);
+		mark_btf_ld_reg(env, cur_regs(env), value_regno, PTR_TO_BTF_ID, kptr_field->kptr.btf,
+				kptr_field->kptr.btf_id, PTR_MAYBE_NULL | PTR_UNTRUSTED);
 		/* For mark_ptr_or_null_reg */
 		val_reg->id = ++env->id_gen;
 	} else if (class == BPF_STX) {
 		val_reg = reg_state(env, value_regno);
 		if (!register_is_null(val_reg) &&
-		    map_kptr_match_type(env, off_desc, val_reg, value_regno))
+		    map_kptr_match_type(env, kptr_field, val_reg, value_regno))
 			return -EACCES;
 	} else if (class == BPF_ST) {
 		if (insn->imm) {
 			verbose(env, "BPF_ST imm must be 0 when storing to kptr at off=%u\n",
-				off_desc->offset);
+				kptr_field->offset);
 			return -EACCES;
 		}
 	} else {
@@ -3809,7 +3809,8 @@ static int check_map_access(struct bpf_verifier_env *env, u32 regno,
 	struct bpf_func_state *state = vstate->frame[vstate->curframe];
 	struct bpf_reg_state *reg = &state->regs[regno];
 	struct bpf_map *map = reg->map_ptr;
-	int err;
+	struct btf_type_fields *tab;
+	int err, i;
 
 	err = check_mem_region_access(env, regno, off, size, map->value_size,
 				      zero_size_allowed);
@@ -3839,15 +3840,18 @@ static int check_map_access(struct bpf_verifier_env *env, u32 regno,
 			return -EACCES;
 		}
 	}
-	if (map_value_has_kptrs(map)) {
-		struct bpf_map_value_off *tab = map->kptr_off_tab;
-		int i;
-
-		for (i = 0; i < tab->nr_off; i++) {
-			u32 p = tab->off[i].offset;
-
-			if (reg->smin_value + off < p + sizeof(u64) &&
-			    p < reg->umax_value + off + size) {
+	if (IS_ERR_OR_NULL(map->fields_tab))
+		return 0;
+	tab = map->fields_tab;
+	for (i = 0; i < tab->cnt; i++) {
+		struct btf_field *field = &tab->fields[i];
+		u32 p = field->offset;
+
+		if (reg->smin_value + off < p + btf_field_type_size(field->type) &&
+		    p < reg->umax_value + off + size) {
+			switch (field->type) {
+			case BPF_KPTR_UNREF:
+			case BPF_KPTR_REF:
 				if (src != ACCESS_DIRECT) {
 					verbose(env, "kptr cannot be accessed indirectly by helper\n");
 					return -EACCES;
@@ -3866,10 +3870,13 @@ static int check_map_access(struct bpf_verifier_env *env, u32 regno,
 					return -EACCES;
 				}
 				break;
+			default:
+				verbose(env, "field cannot be accessed directly by load/store\n");
+				return -EACCES;
 			}
 		}
 	}
-	return err;
+	return 0;
 }
 
 #define MAX_PACKET_OFF 0xffff
@@ -4742,7 +4749,7 @@ static int check_mem_access(struct bpf_verifier_env *env, int insn_idx, u32 regn
 		if (value_regno >= 0)
 			mark_reg_unknown(env, regs, value_regno);
 	} else if (reg->type == PTR_TO_MAP_VALUE) {
-		struct bpf_map_value_off_desc *kptr_off_desc = NULL;
+		struct btf_field *kptr_field = NULL;
 
 		if (t == BPF_WRITE && value_regno >= 0 &&
 		    is_pointer_value(env, value_regno)) {
@@ -4756,11 +4763,10 @@ static int check_mem_access(struct bpf_verifier_env *env, int insn_idx, u32 regn
 		if (err)
 			return err;
 		if (tnum_is_const(reg->var_off))
-			kptr_off_desc = bpf_map_kptr_off_contains(reg->map_ptr,
-								  off + reg->var_off.value);
-		if (kptr_off_desc) {
-			err = check_map_kptr_access(env, regno, value_regno, insn_idx,
-						    kptr_off_desc);
+			kptr_field = btf_type_fields_find(reg->map_ptr->fields_tab,
+							  off + reg->var_off.value, BPF_KPTR);
+		if (kptr_field) {
+			err = check_map_kptr_access(env, regno, value_regno, insn_idx, kptr_field);
 		} else if (t == BPF_READ && value_regno >= 0) {
 			struct bpf_map *map = reg->map_ptr;
 
@@ -5527,10 +5533,9 @@ static int process_kptr_func(struct bpf_verifier_env *env, int regno,
 			     struct bpf_call_arg_meta *meta)
 {
 	struct bpf_reg_state *regs = cur_regs(env), *reg = &regs[regno];
-	struct bpf_map_value_off_desc *off_desc;
 	struct bpf_map *map_ptr = reg->map_ptr;
+	struct btf_field *kptr_field;
 	u32 kptr_off;
-	int ret;
 
 	if (!tnum_is_const(reg->var_off)) {
 		verbose(env,
@@ -5543,30 +5548,23 @@ static int process_kptr_func(struct bpf_verifier_env *env, int regno,
 			map_ptr->name);
 		return -EINVAL;
 	}
-	if (!map_value_has_kptrs(map_ptr)) {
-		ret = PTR_ERR_OR_ZERO(map_ptr->kptr_off_tab);
-		if (ret == -E2BIG)
-			verbose(env, "map '%s' has more than %d kptr\n", map_ptr->name,
-				BPF_MAP_VALUE_OFF_MAX);
-		else if (ret == -EEXIST)
-			verbose(env, "map '%s' has repeating kptr BTF tags\n", map_ptr->name);
-		else
-			verbose(env, "map '%s' has no valid kptr\n", map_ptr->name);
+	if (!btf_type_fields_has_field(map_ptr->fields_tab, BPF_KPTR)) {
+		verbose(env, "map '%s' has no valid kptr\n", map_ptr->name);
 		return -EINVAL;
 	}
 
 	meta->map_ptr = map_ptr;
 	kptr_off = reg->off + reg->var_off.value;
-	off_desc = bpf_map_kptr_off_contains(map_ptr, kptr_off);
-	if (!off_desc) {
+	kptr_field = btf_type_fields_find(map_ptr->fields_tab, kptr_off, BPF_KPTR);
+	if (!kptr_field) {
 		verbose(env, "off=%d doesn't point to kptr\n", kptr_off);
 		return -EACCES;
 	}
-	if (off_desc->type != BPF_KPTR_REF) {
+	if (kptr_field->type != BPF_KPTR_REF) {
 		verbose(env, "off=%d kptr isn't referenced kptr\n", kptr_off);
 		return -EACCES;
 	}
-	meta->kptr_off_desc = off_desc;
+	meta->kptr_field = kptr_field;
 	return 0;
 }
 
@@ -5798,7 +5796,7 @@ static int check_reg_type(struct bpf_verifier_env *env, u32 regno,
 		}
 
 		if (meta->func_id == BPF_FUNC_kptr_xchg) {
-			if (map_kptr_match_type(env, meta->kptr_off_desc, reg, regno))
+			if (map_kptr_match_type(env, meta->kptr_field, reg, regno))
 				return -EACCES;
 		} else {
 			if (arg_btf_id == BPF_PTR_POISON) {
@@ -7535,8 +7533,8 @@ static int check_helper_call(struct bpf_verifier_env *env, struct bpf_insn *insn
 		mark_reg_known_zero(env, regs, BPF_REG_0);
 		regs[BPF_REG_0].type = PTR_TO_BTF_ID | ret_flag;
 		if (func_id == BPF_FUNC_kptr_xchg) {
-			ret_btf = meta.kptr_off_desc->kptr.btf;
-			ret_btf_id = meta.kptr_off_desc->kptr.btf_id;
+			ret_btf = meta.kptr_field->kptr.btf;
+			ret_btf_id = meta.kptr_field->kptr.btf_id;
 		} else {
 			if (fn->ret_btf_id == BPF_PTR_POISON) {
 				verbose(env, "verifier internal error:");
-- 
2.38.0


* [PATCH bpf-next v2 07/25] bpf: Consolidate spin_lock, timer management into fields_tab
  2022-10-13  6:22 [PATCH bpf-next v2 00/25] Local kptrs, BPF linked lists Kumar Kartikeya Dwivedi
                   ` (5 preceding siblings ...)
  2022-10-13  6:22 ` [PATCH bpf-next v2 06/25] bpf: Refactor kptr_off_tab into fields_tab Kumar Kartikeya Dwivedi
@ 2022-10-13  6:22 ` Kumar Kartikeya Dwivedi
  2022-10-19  1:40   ` Alexei Starovoitov
  2022-10-13  6:22 ` [PATCH bpf-next v2 08/25] bpf: Refactor map->off_arr handling Kumar Kartikeya Dwivedi
                   ` (17 subsequent siblings)
  24 siblings, 1 reply; 52+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2022-10-13  6:22 UTC (permalink / raw)
  To: bpf
  Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Dave Marchevsky, Delyan Kratunov

Now that kptr_off_tab has been refactored into fields_tab, and can hold
more than one specific field type, accommodate bpf_spin_lock and
bpf_timer as well.

While they don't require any more metadata than an offset, having all
special fields in one place allows us to share the same code for
allocated user-defined types and to handle both map values and these
allocated objects in a similar fashion.

As an optimization, we still keep the spin_lock_off and timer_off
offsets in the btf_type_fields structure, to avoid having to find the
btf_field struct each time their offset is needed. This is mostly needed
to manipulate such objects in a map value at runtime. It's ok to cache
just one offset per type, as more than one field of each type is
disallowed.

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
---
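Illustration only, not part of the patch: a minimal user-space sketch of
the idea described above, where one table records every special field, a
bitmask answers "does this type have a lock/timer?", and the two offsets
are cached next to it. All demo_* names are hypothetical stand-ins and do
not match the kernel definitions.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the field type bits; not the kernel enum. */
enum demo_field_type {
	DEMO_SPIN_LOCK = 1 << 0,
	DEMO_TIMER     = 1 << 1,
	DEMO_KPTR      = 1 << 2,
};

struct demo_field {
	uint32_t offset;
	enum demo_field_type type;
};

#define DEMO_FIELDS_MAX 10

struct demo_type_fields {
	uint32_t cnt;
	uint32_t field_mask;
	int spin_lock_off;	/* >= 0 when present, -EINVAL otherwise */
	int timer_off;		/* >= 0 when present, -EINVAL otherwise */
	struct demo_field fields[DEMO_FIELDS_MAX];
};

/* Copy the discovered fields into the table and cache the two offsets. */
static int demo_fields_init(struct demo_type_fields *tab,
			    const struct demo_field *found, uint32_t cnt)
{
	uint32_t i;

	assert(tab != NULL && (found != NULL || cnt == 0));
	if (cnt > DEMO_FIELDS_MAX)
		return -E2BIG;

	tab->cnt = 0;
	tab->field_mask = 0;
	tab->spin_lock_off = -EINVAL;
	tab->timer_off = -EINVAL;

	for (i = 0; i < cnt; i++) {
		enum demo_field_type type = found[i].type;

		/* Only one lock and one timer are allowed per type. */
		if ((type == DEMO_SPIN_LOCK || type == DEMO_TIMER) &&
		    (tab->field_mask & type))
			return -E2BIG;

		tab->fields[i] = found[i];
		tab->field_mask |= type;
		if (type == DEMO_SPIN_LOCK)
			tab->spin_lock_off = (int)found[i].offset;
		else if (type == DEMO_TIMER)
			tab->timer_off = (int)found[i].offset;
		tab->cnt++;
	}
	return 0;
}

/* A NULL table simply means "no special fields". */
static bool demo_has_field(const struct demo_type_fields *tab,
			   enum demo_field_type type)
{
	return tab && (tab->field_mask & type);
}
```

A caller would test demo_has_field(tab, DEMO_TIMER) first and only then
use tab->timer_off, which mirrors how the converted call sites in this
patch check the mask before touching the cached offset.
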
 include/linux/bpf.h            |  53 +++---
 include/linux/btf.h            |   3 +-
 kernel/bpf/arraymap.c          |  19 +-
 kernel/bpf/bpf_local_storage.c |   2 +-
 kernel/bpf/btf.c               | 323 ++++++++++++++++++---------------
 kernel/bpf/hashtab.c           |  26 +--
 kernel/bpf/helpers.c           |   6 +-
 kernel/bpf/local_storage.c     |   2 +-
 kernel/bpf/map_in_map.c        |   5 +-
 kernel/bpf/syscall.c           | 111 +++++------
 kernel/bpf/verifier.c          |  83 +++------
 net/core/bpf_sk_storage.c      |   4 +-
 12 files changed, 308 insertions(+), 329 deletions(-)
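
Illustration only, not part of the patch: a simplified user-space sketch
of classifying a member by its type name against a caller-supplied mask,
while rejecting a second lock or timer. It is loosely modeled on the
btf_get_field_type() helper added in btf.c below, but the sketch_* and
SK_* names are hypothetical, and the size/alignment outputs as well as
the real kptr detection are left out.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical bit values for the sketch; not the kernel enum. */
enum {
	SK_SPIN_LOCK = 1 << 0,
	SK_TIMER     = 1 << 1,
	SK_KPTR      = 1 << 2,
};

/*
 * Returns the matched type bit, 0 when the member is not of interest,
 * or -E2BIG when a lock or timer shows up a second time.
 */
static int sketch_get_field_type(const char *name, uint32_t field_mask,
				 uint32_t *seen_mask)
{
	assert(name != NULL && seen_mask != NULL);

	if ((field_mask & SK_SPIN_LOCK) && !strcmp(name, "bpf_spin_lock")) {
		if (*seen_mask & SK_SPIN_LOCK)
			return -E2BIG;
		*seen_mask |= SK_SPIN_LOCK;
		return SK_SPIN_LOCK;
	}
	if ((field_mask & SK_TIMER) && !strcmp(name, "bpf_timer")) {
		if (*seen_mask & SK_TIMER)
			return -E2BIG;
		*seen_mask |= SK_TIMER;
		return SK_TIMER;
	}
	/* Anything else is only a kptr candidate, checked later by the caller. */
	if (field_mask & SK_KPTR)
		return SK_KPTR;
	return 0;
}
```

Passing the mask in lets one walker serve callers that want different
subsets of fields; a caller that only passes SK_SPIN_LOCK never sees
timers or kptr candidates.
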

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 25e77a172d7c..ba59147dfa61 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -165,13 +165,13 @@ struct bpf_map_ops {
 
 enum {
 	/* Support at most 8 pointers in a BTF type */
-	BTF_FIELDS_MAX	      = 8,
-	BPF_MAP_OFF_ARR_MAX   = BTF_FIELDS_MAX +
-				1 + /* for bpf_spin_lock */
-				1,  /* for bpf_timer */
+	BTF_FIELDS_MAX	      = 10,
+	BPF_MAP_OFF_ARR_MAX   = BTF_FIELDS_MAX,
 };
 
 enum btf_field_type {
+	BPF_SPIN_LOCK  = (1 << 0),
+	BPF_TIMER      = (1 << 1),
 	BPF_KPTR_UNREF = (1 << 2),
 	BPF_KPTR_REF   = (1 << 3),
 	BPF_KPTR       = BPF_KPTR_UNREF | BPF_KPTR_REF,
@@ -195,6 +195,8 @@ struct btf_field {
 struct btf_type_fields {
 	u32 cnt;
 	u32 field_mask;
+	int spin_lock_off;
+	int timer_off;
 	struct btf_field fields[];
 };
 
@@ -219,10 +221,8 @@ struct bpf_map {
 	u32 max_entries;
 	u64 map_extra; /* any per-map-type extra fields */
 	u32 map_flags;
-	int spin_lock_off; /* >=0 valid offset, <0 error */
-	struct btf_type_fields *fields_tab;
-	int timer_off; /* >=0 valid offset, <0 error */
 	u32 id;
+	struct btf_type_fields *fields_tab;
 	int numa_node;
 	u32 btf_key_type_id;
 	u32 btf_value_type_id;
@@ -256,9 +256,29 @@ struct bpf_map {
 	bool frozen; /* write-once; write-protected by freeze_mutex */
 };
 
+static inline const char *btf_field_type_name(enum btf_field_type type)
+{
+	switch (type) {
+	case BPF_SPIN_LOCK:
+		return "bpf_spin_lock";
+	case BPF_TIMER:
+		return "bpf_timer";
+	case BPF_KPTR_UNREF:
+	case BPF_KPTR_REF:
+		return "kptr";
+	default:
+		WARN_ON_ONCE(1);
+		return "unknown";
+	}
+}
+
 static inline u32 btf_field_type_size(enum btf_field_type type)
 {
 	switch (type) {
+	case BPF_SPIN_LOCK:
+		return sizeof(struct bpf_spin_lock);
+	case BPF_TIMER:
+		return sizeof(struct bpf_timer);
 	case BPF_KPTR_UNREF:
 	case BPF_KPTR_REF:
 		return sizeof(u64);
@@ -271,6 +291,10 @@ static inline u32 btf_field_type_size(enum btf_field_type type)
 static inline u32 btf_field_type_align(enum btf_field_type type)
 {
 	switch (type) {
+	case BPF_SPIN_LOCK:
+		return __alignof__(struct bpf_spin_lock);
+	case BPF_TIMER:
+		return __alignof__(struct bpf_timer);
 	case BPF_KPTR_UNREF:
 	case BPF_KPTR_REF:
 		return __alignof__(u64);
@@ -287,22 +311,8 @@ static inline bool btf_type_fields_has_field(const struct btf_type_fields *tab,
 	return tab->field_mask & type;
 }
 
-static inline bool map_value_has_spin_lock(const struct bpf_map *map)
-{
-	return map->spin_lock_off >= 0;
-}
-
-static inline bool map_value_has_timer(const struct bpf_map *map)
-{
-	return map->timer_off >= 0;
-}
-
 static inline void check_and_init_map_value(struct bpf_map *map, void *dst)
 {
-	if (unlikely(map_value_has_spin_lock(map)))
-		memset(dst + map->spin_lock_off, 0, sizeof(struct bpf_spin_lock));
-	if (unlikely(map_value_has_timer(map)))
-		memset(dst + map->timer_off, 0, sizeof(struct bpf_timer));
 	if (!IS_ERR_OR_NULL(map->fields_tab)) {
 		struct btf_field *fields = map->fields_tab->fields;
 		u32 cnt = map->fields_tab->cnt;
@@ -1730,6 +1740,7 @@ void btf_type_fields_free(struct btf_type_fields *tab);
 void bpf_map_free_fields_tab(struct bpf_map *map);
 struct btf_type_fields *btf_type_fields_dup(const struct btf_type_fields *tab);
 bool btf_type_fields_equal(const struct btf_type_fields *tab_a, const struct btf_type_fields *tab_b);
+void bpf_obj_free_timer(const struct btf_type_fields *tab, void *obj);
 void bpf_obj_free_fields(const struct btf_type_fields *tab, void *obj);
 
 struct bpf_map *bpf_map_get(u32 ufd);
diff --git a/include/linux/btf.h b/include/linux/btf.h
index 0d47cbb11a59..72136c9ae4cd 100644
--- a/include/linux/btf.h
+++ b/include/linux/btf.h
@@ -164,7 +164,8 @@ bool btf_member_is_reg_int(const struct btf *btf, const struct btf_type *s,
 int btf_find_spin_lock(const struct btf *btf, const struct btf_type *t);
 int btf_find_timer(const struct btf *btf, const struct btf_type *t);
 struct btf_type_fields *btf_parse_fields(const struct btf *btf,
-					 const struct btf_type *t);
+					 const struct btf_type *t,
+					 u32 field_mask, u32 value_size);
 bool btf_type_is_void(const struct btf_type *t);
 s32 btf_find_by_name_kind(const struct btf *btf, const char *name, u8 kind);
 const struct btf_type *btf_type_skip_modifiers(const struct btf *btf,
diff --git a/kernel/bpf/arraymap.c b/kernel/bpf/arraymap.c
index defe5c00049a..c993e164fb65 100644
--- a/kernel/bpf/arraymap.c
+++ b/kernel/bpf/arraymap.c
@@ -306,13 +306,6 @@ static int array_map_get_next_key(struct bpf_map *map, void *key, void *next_key
 	return 0;
 }
 
-static void check_and_free_fields(struct bpf_array *arr, void *val)
-{
-	if (map_value_has_timer(&arr->map))
-		bpf_timer_cancel_and_free(val + arr->map.timer_off);
-	bpf_obj_free_fields(arr->map.fields_tab, val);
-}
-
 /* Called from syscall or from eBPF program */
 static int array_map_update_elem(struct bpf_map *map, void *key, void *value,
 				 u64 map_flags)
@@ -334,13 +327,13 @@ static int array_map_update_elem(struct bpf_map *map, void *key, void *value,
 		return -EEXIST;
 
 	if (unlikely((map_flags & BPF_F_LOCK) &&
-		     !map_value_has_spin_lock(map)))
+		     !btf_type_fields_has_field(map->fields_tab, BPF_SPIN_LOCK)))
 		return -EINVAL;
 
 	if (array->map.map_type == BPF_MAP_TYPE_PERCPU_ARRAY) {
 		val = this_cpu_ptr(array->pptrs[index & array->index_mask]);
 		copy_map_value(map, val, value);
-		check_and_free_fields(array, val);
+		bpf_obj_free_fields(array->map.fields_tab, val);
 	} else {
 		val = array->value +
 			(u64)array->elem_size * (index & array->index_mask);
@@ -348,7 +341,7 @@ static int array_map_update_elem(struct bpf_map *map, void *key, void *value,
 			copy_map_value_locked(map, val, value, false);
 		else
 			copy_map_value(map, val, value);
-		check_and_free_fields(array, val);
+		bpf_obj_free_fields(array->map.fields_tab, val);
 	}
 	return 0;
 }
@@ -385,7 +378,7 @@ int bpf_percpu_array_update(struct bpf_map *map, void *key, void *value,
 	pptr = array->pptrs[index & array->index_mask];
 	for_each_possible_cpu(cpu) {
 		copy_map_value_long(map, per_cpu_ptr(pptr, cpu), value + off);
-		check_and_free_fields(array, per_cpu_ptr(pptr, cpu));
+		bpf_obj_free_fields(array->map.fields_tab, per_cpu_ptr(pptr, cpu));
 		off += size;
 	}
 	rcu_read_unlock();
@@ -409,11 +402,11 @@ static void array_map_free_timers(struct bpf_map *map)
 	int i;
 
 	/* We don't reset or free fields other than timer on uref dropping to zero. */
-	if (!map_value_has_timer(map))
+	if (!btf_type_fields_has_field(map->fields_tab, BPF_TIMER))
 		return;
 
 	for (i = 0; i < array->map.max_entries; i++)
-		bpf_timer_cancel_and_free(array_map_elem_ptr(array, i) + map->timer_off);
+		bpf_obj_free_timer(map->fields_tab, array_map_elem_ptr(array, i));
 }
 
 /* Called when map->refcnt goes to zero, either from workqueue or from syscall */
diff --git a/kernel/bpf/bpf_local_storage.c b/kernel/bpf/bpf_local_storage.c
index 802fc15b0d73..a3abebb7f38f 100644
--- a/kernel/bpf/bpf_local_storage.c
+++ b/kernel/bpf/bpf_local_storage.c
@@ -372,7 +372,7 @@ bpf_local_storage_update(void *owner, struct bpf_local_storage_map *smap,
 	if (unlikely((map_flags & ~BPF_F_LOCK) > BPF_EXIST) ||
 	    /* BPF_F_LOCK can only be used in a value with spin_lock */
 	    unlikely((map_flags & BPF_F_LOCK) &&
-		     !map_value_has_spin_lock(&smap->map)))
+		     !btf_type_fields_has_field(smap->map.fields_tab, BPF_SPIN_LOCK)))
 		return ERR_PTR(-EINVAL);
 
 	if (gfp_flags == GFP_KERNEL && (map_flags & ~BPF_F_LOCK) != BPF_NOEXIST)
diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c
index c8d267098b87..fe00d9c95c96 100644
--- a/kernel/bpf/btf.c
+++ b/kernel/bpf/btf.c
@@ -3205,16 +3205,20 @@ enum {
 struct btf_field_info {
 	enum btf_field_type type;
 	u32 off;
-	u32 type_id;
+	struct {
+		u32 type_id;
+	} kptr;
 };
 
 static int btf_find_struct(const struct btf *btf, const struct btf_type *t,
-			   u32 off, int sz, struct btf_field_info *info)
+			   u32 off, int sz, enum btf_field_type field_type,
+			   struct btf_field_info *info)
 {
 	if (!__btf_type_is_struct(t))
 		return BTF_FIELD_IGNORE;
 	if (t->size != sz)
 		return BTF_FIELD_IGNORE;
+	info->type = field_type;
 	info->off = off;
 	return BTF_FIELD_FOUND;
 }
@@ -3251,28 +3255,66 @@ static int btf_find_kptr(const struct btf *btf, const struct btf_type *t,
 	if (!__btf_type_is_struct(t))
 		return -EINVAL;
 
-	info->type_id = res_id;
-	info->off = off;
 	info->type = type;
+	info->off = off;
+	info->kptr.type_id = res_id;
 	return BTF_FIELD_FOUND;
 }
 
-static int btf_find_struct_field(const struct btf *btf, const struct btf_type *t,
-				 const char *name, int sz, int align,
-				 enum btf_field_info_type field_type,
+static int btf_get_field_type(const char *name, u32 field_mask, u32 *seen_mask,
+			      int *align, int *sz)
+{
+	int type = 0;
+
+	if (field_mask & BPF_SPIN_LOCK) {
+		if (!strcmp(name, "bpf_spin_lock")) {
+			if (*seen_mask & BPF_SPIN_LOCK)
+				return -E2BIG;
+			*seen_mask |= BPF_SPIN_LOCK;
+			type = BPF_SPIN_LOCK;
+			goto end;
+		}
+	}
+	if (field_mask & BPF_TIMER) {
+		if (!strcmp(name, "bpf_timer")) {
+			if (*seen_mask & BPF_TIMER)
+				return -E2BIG;
+			*seen_mask |= BPF_TIMER;
+			type = BPF_TIMER;
+			goto end;
+		}
+	}
+	/* Only return BPF_KPTR when all other types with matchable names fail */
+	if (field_mask & BPF_KPTR) {
+		type = BPF_KPTR_REF;
+		goto end;
+	}
+	return 0;
+end:
+	*sz = btf_field_type_size(type);
+	*align = btf_field_type_align(type);
+	return type;
+}
+
+static int btf_find_struct_field(const struct btf *btf,
+				 const struct btf_type *t, u32 field_mask,
 				 struct btf_field_info *info, int info_cnt)
 {
+	int ret, idx = 0, align, sz, field_type;
 	const struct btf_member *member;
 	struct btf_field_info tmp;
-	int ret, idx = 0;
-	u32 i, off;
+	u32 i, off, seen_mask = 0;
 
 	for_each_member(i, t, member) {
 		const struct btf_type *member_type = btf_type_by_id(btf,
 								    member->type);
 
-		if (name && strcmp(__btf_name_by_offset(btf, member_type->name_off), name))
+		field_type = btf_get_field_type(__btf_name_by_offset(btf, member_type->name_off),
+						field_mask, &seen_mask, &align, &sz);
+		if (field_type == 0)
 			continue;
+		if (field_type < 0)
+			return field_type;
 
 		off = __btf_member_bit_offset(t, member);
 		if (off % 8)
@@ -3280,17 +3322,18 @@ static int btf_find_struct_field(const struct btf *btf, const struct btf_type *t
 			return -EINVAL;
 		off /= 8;
 		if (off % align)
-			return -EINVAL;
+			continue;
 
 		switch (field_type) {
-		case BTF_FIELD_SPIN_LOCK:
-		case BTF_FIELD_TIMER:
-			ret = btf_find_struct(btf, member_type, off, sz,
+		case BPF_SPIN_LOCK:
+		case BPF_TIMER:
+			ret = btf_find_struct(btf, member_type, off, sz, field_type,
 					      idx < info_cnt ? &info[idx] : &tmp);
 			if (ret < 0)
 				return ret;
 			break;
-		case BTF_FIELD_KPTR:
+		case BPF_KPTR_UNREF:
+		case BPF_KPTR_REF:
 			ret = btf_find_kptr(btf, member_type, off, sz,
 					    idx < info_cnt ? &info[idx] : &tmp);
 			if (ret < 0)
@@ -3310,37 +3353,41 @@ static int btf_find_struct_field(const struct btf *btf, const struct btf_type *t
 }
 
 static int btf_find_datasec_var(const struct btf *btf, const struct btf_type *t,
-				const char *name, int sz, int align,
-				enum btf_field_info_type field_type,
-				struct btf_field_info *info, int info_cnt)
+				u32 field_mask, struct btf_field_info *info,
+				int info_cnt)
 {
+	int ret, idx = 0, align, sz, field_type;
 	const struct btf_var_secinfo *vsi;
 	struct btf_field_info tmp;
-	int ret, idx = 0;
-	u32 i, off;
+	u32 i, off, seen_mask = 0;
 
 	for_each_vsi(i, t, vsi) {
 		const struct btf_type *var = btf_type_by_id(btf, vsi->type);
 		const struct btf_type *var_type = btf_type_by_id(btf, var->type);
 
-		off = vsi->offset;
-
-		if (name && strcmp(__btf_name_by_offset(btf, var_type->name_off), name))
+		field_type = btf_get_field_type(__btf_name_by_offset(btf, var_type->name_off),
+						field_mask, &seen_mask, &align, &sz);
+		if (field_type == 0)
 			continue;
+		if (field_type < 0)
+			return field_type;
+
+		off = vsi->offset;
 		if (vsi->size != sz)
 			continue;
 		if (off % align)
-			return -EINVAL;
+			continue;
 
 		switch (field_type) {
-		case BTF_FIELD_SPIN_LOCK:
-		case BTF_FIELD_TIMER:
-			ret = btf_find_struct(btf, var_type, off, sz,
+		case BPF_SPIN_LOCK:
+		case BPF_TIMER:
+			ret = btf_find_struct(btf, var_type, off, sz, field_type,
 					      idx < info_cnt ? &info[idx] : &tmp);
 			if (ret < 0)
 				return ret;
 			break;
-		case BTF_FIELD_KPTR:
+		case BPF_KPTR_UNREF:
+		case BPF_KPTR_REF:
 			ret = btf_find_kptr(btf, var_type, off, sz,
 					    idx < info_cnt ? &info[idx] : &tmp);
 			if (ret < 0)
@@ -3360,79 +3407,100 @@ static int btf_find_datasec_var(const struct btf *btf, const struct btf_type *t,
 }
 
 static int btf_find_field(const struct btf *btf, const struct btf_type *t,
-			  enum btf_field_info_type field_type,
-			  struct btf_field_info *info, int info_cnt)
+			  u32 field_mask, struct btf_field_info *info,
+			  int info_cnt)
 {
-	const char *name;
-	int sz, align;
-
-	switch (field_type) {
-	case BTF_FIELD_SPIN_LOCK:
-		name = "bpf_spin_lock";
-		sz = sizeof(struct bpf_spin_lock);
-		align = __alignof__(struct bpf_spin_lock);
-		break;
-	case BTF_FIELD_TIMER:
-		name = "bpf_timer";
-		sz = sizeof(struct bpf_timer);
-		align = __alignof__(struct bpf_timer);
-		break;
-	case BTF_FIELD_KPTR:
-		name = NULL;
-		sz = sizeof(u64);
-		align = 8;
-		break;
-	default:
-		return -EFAULT;
-	}
-
 	if (__btf_type_is_struct(t))
-		return btf_find_struct_field(btf, t, name, sz, align, field_type, info, info_cnt);
+		return btf_find_struct_field(btf, t, field_mask, info, info_cnt);
 	else if (btf_type_is_datasec(t))
-		return btf_find_datasec_var(btf, t, name, sz, align, field_type, info, info_cnt);
+		return btf_find_datasec_var(btf, t, field_mask, info, info_cnt);
 	return -EINVAL;
 }
 
-/* find 'struct bpf_spin_lock' in map value.
- * return >= 0 offset if found
- * and < 0 in case of error
- */
-int btf_find_spin_lock(const struct btf *btf, const struct btf_type *t)
+static int btf_parse_kptr(const struct btf *btf, struct btf_field *field,
+			  struct btf_field_info *info)
 {
-	struct btf_field_info info;
+	struct module *mod = NULL;
+	const struct btf_type *t;
+	struct btf *kernel_btf;
 	int ret;
+	s32 id;
 
-	ret = btf_find_field(btf, t, BTF_FIELD_SPIN_LOCK, &info, 1);
-	if (ret < 0)
-		return ret;
-	if (!ret)
-		return -ENOENT;
-	return info.off;
-}
+	/* Find type in map BTF, and use it to look up the matching type
+	 * in vmlinux or module BTFs, by name and kind.
+	 */
+	t = btf_type_by_id(btf, info->kptr.type_id);
+	id = bpf_find_btf_id(__btf_name_by_offset(btf, t->name_off), BTF_INFO_KIND(t->info),
+			     &kernel_btf);
+	if (id < 0)
+		return id;
+
+	/* Find and stash the function pointer for the destruction function that
+	 * needs to be eventually invoked from the map free path.
+	 */
+	if (info->type == BPF_KPTR_REF) {
+		const struct btf_type *dtor_func;
+		const char *dtor_func_name;
+		unsigned long addr;
+		s32 dtor_btf_id;
+
+		/* This call also serves as a whitelist of allowed objects that
+		 * can be used as a referenced pointer and be stored in a map at
+		 * the same time.
+		 */
+		dtor_btf_id = btf_find_dtor_kfunc(kernel_btf, id);
+		if (dtor_btf_id < 0) {
+			ret = dtor_btf_id;
+			goto end_btf;
+		}
 
-int btf_find_timer(const struct btf *btf, const struct btf_type *t)
-{
-	struct btf_field_info info;
-	int ret;
+		dtor_func = btf_type_by_id(kernel_btf, dtor_btf_id);
+		if (!dtor_func) {
+			ret = -ENOENT;
+			goto end_btf;
+		}
 
-	ret = btf_find_field(btf, t, BTF_FIELD_TIMER, &info, 1);
-	if (ret < 0)
-		return ret;
-	if (!ret)
-		return -ENOENT;
-	return info.off;
+		if (btf_is_module(kernel_btf)) {
+			mod = btf_try_get_module(kernel_btf);
+			if (!mod) {
+				ret = -ENXIO;
+				goto end_btf;
+			}
+		}
+
+		/* We already verified dtor_func to be btf_type_is_func
+		 * in register_btf_id_dtor_kfuncs.
+		 */
+		dtor_func_name = __btf_name_by_offset(kernel_btf, dtor_func->name_off);
+		addr = kallsyms_lookup_name(dtor_func_name);
+		if (!addr) {
+			ret = -EINVAL;
+			goto end_mod;
+		}
+		field->kptr.dtor = (void *)addr;
+	}
+
+	field->kptr.btf_id = id;
+	field->kptr.btf = kernel_btf;
+	field->kptr.module = mod;
+	return 0;
+end_mod:
+	module_put(mod);
+end_btf:
+	btf_put(kernel_btf);
+	return ret;
 }
 
 struct btf_type_fields *btf_parse_fields(const struct btf *btf,
-					 const struct btf_type *t)
+					 const struct btf_type *t,
+					 u32 field_mask,
+					 u32 value_size)
 {
 	struct btf_field_info info_arr[BTF_FIELDS_MAX];
-	struct btf *kernel_btf = NULL;
 	struct btf_type_fields *tab;
-	struct module *mod = NULL;
 	int ret, i, cnt;
 
-	ret = btf_find_field(btf, t, BTF_FIELD_KPTR, info_arr, ARRAY_SIZE(info_arr));
+	ret = btf_find_field(btf, t, field_mask, info_arr, ARRAY_SIZE(info_arr));
 	if (ret < 0)
 		return ERR_PTR(ret);
 	if (!ret)
@@ -3443,79 +3511,46 @@ struct btf_type_fields *btf_parse_fields(const struct btf *btf,
 	if (!tab)
 		return ERR_PTR(-ENOMEM);
 	tab->cnt = 0;
-	for (i = 0; i < cnt; i++) {
-		const struct btf_type *t;
-		s32 id;
 
-		/* Find type in map BTF, and use it to look up the matching type
-		 * in vmlinux or module BTFs, by name and kind.
-		 */
-		t = btf_type_by_id(btf, info_arr[i].type_id);
-		id = bpf_find_btf_id(__btf_name_by_offset(btf, t->name_off), BTF_INFO_KIND(t->info),
-				     &kernel_btf);
-		if (id < 0) {
-			ret = id;
+	tab->spin_lock_off = -EINVAL;
+	tab->timer_off = -EINVAL;
+	for (i = 0; i < cnt; i++) {
+		if (info_arr[i].off + btf_field_type_size(info_arr[i].type) > value_size) {
+			WARN_ONCE(1, "verifier bug off %d size %d", info_arr[i].off, value_size);
+			ret = -EFAULT;
 			goto end;
 		}
 
-		/* Find and stash the function pointer for the destruction function that
-		 * needs to be eventually invoked from the map free path.
-		 */
-		if (info_arr[i].type == BPF_KPTR_REF) {
-			const struct btf_type *dtor_func;
-			const char *dtor_func_name;
-			unsigned long addr;
-			s32 dtor_btf_id;
-
-			/* This call also serves as a whitelist of allowed objects that
-			 * can be used as a referenced pointer and be stored in a map at
-			 * the same time.
-			 */
-			dtor_btf_id = btf_find_dtor_kfunc(kernel_btf, id);
-			if (dtor_btf_id < 0) {
-				ret = dtor_btf_id;
-				goto end_btf;
-			}
-
-			dtor_func = btf_type_by_id(kernel_btf, dtor_btf_id);
-			if (!dtor_func) {
-				ret = -ENOENT;
-				goto end_btf;
-			}
-
-			if (btf_is_module(kernel_btf)) {
-				mod = btf_try_get_module(kernel_btf);
-				if (!mod) {
-					ret = -ENXIO;
-					goto end_btf;
-				}
-			}
+		tab->field_mask |= info_arr[i].type;
+		tab->fields[i].offset = info_arr[i].off;
+		tab->fields[i].type = info_arr[i].type;
 
-			/* We already verified dtor_func to be btf_type_is_func
-			 * in register_btf_id_dtor_kfuncs.
-			 */
-			dtor_func_name = __btf_name_by_offset(kernel_btf, dtor_func->name_off);
-			addr = kallsyms_lookup_name(dtor_func_name);
-			if (!addr) {
-				ret = -EINVAL;
-				goto end_mod;
-			}
-			tab->fields[i].kptr.dtor = (void *)addr;
+		switch (info_arr[i].type) {
+		case BPF_SPIN_LOCK:
+			WARN_ON_ONCE(tab->spin_lock_off >= 0);
+			/* Cache offset for faster lookup at runtime */
+			tab->spin_lock_off = tab->fields[i].offset;
+			break;
+		case BPF_TIMER:
+			WARN_ON_ONCE(tab->timer_off >= 0);
+			/* Cache offset for faster lookup at runtime */
+			tab->timer_off = tab->fields[i].offset;
+			break;
+		case BPF_KPTR_UNREF:
+		case BPF_KPTR_REF:
+			ret = btf_parse_kptr(btf, &tab->fields[i], &info_arr[i]);
+			if (ret < 0)
+				goto end;
+			break;
+		default:
+			ret = -EFAULT;
+			goto end;
 		}
 
-		tab->fields[i].offset = info_arr[i].off;
-		tab->fields[i].type = info_arr[i].type;
-		tab->fields[i].kptr.btf_id = id;
-		tab->fields[i].kptr.btf = kernel_btf;
-		tab->fields[i].kptr.module = mod;
 		tab->cnt++;
 	}
 	tab->cnt = cnt;
 	return tab;
-end_mod:
-	module_put(mod);
-end_btf:
-	btf_put(kernel_btf);
 end:
 	btf_type_fields_free(tab);
 	return ERR_PTR(ret);
diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c
index 59cdbea587c5..b19c4efd8a80 100644
--- a/kernel/bpf/hashtab.c
+++ b/kernel/bpf/hashtab.c
@@ -222,7 +222,7 @@ static void htab_free_prealloced_timers(struct bpf_htab *htab)
 	u32 num_entries = htab->map.max_entries;
 	int i;
 
-	if (!map_value_has_timer(&htab->map))
+	if (!btf_type_fields_has_field(htab->map.fields_tab, BPF_TIMER))
 		return;
 	if (htab_has_extra_elems(htab))
 		num_entries += num_possible_cpus();
@@ -231,9 +231,8 @@ static void htab_free_prealloced_timers(struct bpf_htab *htab)
 		struct htab_elem *elem;
 
 		elem = get_htab_elem(htab, i);
-		bpf_timer_cancel_and_free(elem->key +
-					  round_up(htab->map.key_size, 8) +
-					  htab->map.timer_off);
+		bpf_obj_free_timer(htab->map.fields_tab, elem->key +
+				   round_up(htab->map.key_size, 8));
 		cond_resched();
 	}
 }
@@ -763,8 +762,6 @@ static void check_and_free_fields(struct bpf_htab *htab,
 {
 	void *map_value = elem->key + round_up(htab->map.key_size, 8);
 
-	if (map_value_has_timer(&htab->map))
-		bpf_timer_cancel_and_free(map_value + htab->map.timer_off);
 	bpf_obj_free_fields(htab->map.fields_tab, map_value);
 }
 
@@ -1089,7 +1086,7 @@ static int htab_map_update_elem(struct bpf_map *map, void *key, void *value,
 	head = &b->head;
 
 	if (unlikely(map_flags & BPF_F_LOCK)) {
-		if (unlikely(!map_value_has_spin_lock(map)))
+		if (unlikely(!btf_type_fields_has_field(map->fields_tab, BPF_SPIN_LOCK)))
 			return -EINVAL;
 		/* find an element without taking the bucket lock */
 		l_old = lookup_nulls_elem_raw(head, hash, key, key_size,
@@ -1472,12 +1469,9 @@ static void htab_free_malloced_timers(struct bpf_htab *htab)
 		struct htab_elem *l;
 
 		hlist_nulls_for_each_entry(l, n, head, hash_node) {
-			/* We don't reset or free kptr on uref dropping to zero,
-			 * hence just free timer.
-			 */
-			bpf_timer_cancel_and_free(l->key +
-						  round_up(htab->map.key_size, 8) +
-						  htab->map.timer_off);
+			/* We only free timer on uref dropping to zero */
+			bpf_obj_free_timer(htab->map.fields_tab, l->key +
+					   round_up(htab->map.key_size, 8));
 		}
 		cond_resched_rcu();
 	}
@@ -1488,8 +1482,8 @@ static void htab_map_free_timers(struct bpf_map *map)
 {
 	struct bpf_htab *htab = container_of(map, struct bpf_htab, map);
 
-	/* We don't reset or free kptr on uref dropping to zero. */
-	if (!map_value_has_timer(&htab->map))
+	/* We only free timer on uref dropping to zero */
+	if (!btf_type_fields_has_field(htab->map.fields_tab, BPF_TIMER))
 		return;
 	if (!htab_is_prealloc(htab))
 		htab_free_malloced_timers(htab);
@@ -1673,7 +1667,7 @@ __htab_map_lookup_and_delete_batch(struct bpf_map *map,
 
 	elem_map_flags = attr->batch.elem_flags;
 	if ((elem_map_flags & ~BPF_F_LOCK) ||
-	    ((elem_map_flags & BPF_F_LOCK) && !map_value_has_spin_lock(map)))
+	    ((elem_map_flags & BPF_F_LOCK) && !btf_type_fields_has_field(map->fields_tab, BPF_SPIN_LOCK)))
 		return -EINVAL;
 
 	map_flags = attr->batch.flags;
diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
index a6b04faed282..8f425596b9c6 100644
--- a/kernel/bpf/helpers.c
+++ b/kernel/bpf/helpers.c
@@ -366,9 +366,9 @@ void copy_map_value_locked(struct bpf_map *map, void *dst, void *src,
 	struct bpf_spin_lock *lock;
 
 	if (lock_src)
-		lock = src + map->spin_lock_off;
+		lock = src + map->fields_tab->spin_lock_off;
 	else
-		lock = dst + map->spin_lock_off;
+		lock = dst + map->fields_tab->spin_lock_off;
 	preempt_disable();
 	__bpf_spin_lock_irqsave(lock);
 	copy_map_value(map, dst, src);
@@ -1169,7 +1169,7 @@ BPF_CALL_3(bpf_timer_init, struct bpf_timer_kern *, timer, struct bpf_map *, map
 		ret = -ENOMEM;
 		goto out;
 	}
-	t->value = (void *)timer - map->timer_off;
+	t->value = (void *)timer - map->fields_tab->timer_off;
 	t->map = map;
 	t->prog = NULL;
 	rcu_assign_pointer(t->callback_fn, NULL);
diff --git a/kernel/bpf/local_storage.c b/kernel/bpf/local_storage.c
index 098cf336fae6..42af40c74c40 100644
--- a/kernel/bpf/local_storage.c
+++ b/kernel/bpf/local_storage.c
@@ -151,7 +151,7 @@ static int cgroup_storage_update_elem(struct bpf_map *map, void *key,
 		return -EINVAL;
 
 	if (unlikely((flags & BPF_F_LOCK) &&
-		     !map_value_has_spin_lock(map)))
+		     !btf_type_fields_has_field(map->fields_tab, BPF_SPIN_LOCK)))
 		return -EINVAL;
 
 	storage = cgroup_storage_lookup((struct bpf_cgroup_storage_map *)map,
diff --git a/kernel/bpf/map_in_map.c b/kernel/bpf/map_in_map.c
index 2bff5f3a5efc..b3f82fec8e2e 100644
--- a/kernel/bpf/map_in_map.c
+++ b/kernel/bpf/map_in_map.c
@@ -29,7 +29,7 @@ struct bpf_map *bpf_map_meta_alloc(int inner_map_ufd)
 		return ERR_PTR(-ENOTSUPP);
 	}
 
-	if (map_value_has_spin_lock(inner_map)) {
+	if (btf_type_fields_has_field(inner_map->fields_tab, BPF_SPIN_LOCK)) {
 		fdput(f);
 		return ERR_PTR(-ENOTSUPP);
 	}
@@ -50,8 +50,6 @@ struct bpf_map *bpf_map_meta_alloc(int inner_map_ufd)
 	inner_map_meta->value_size = inner_map->value_size;
 	inner_map_meta->map_flags = inner_map->map_flags;
 	inner_map_meta->max_entries = inner_map->max_entries;
-	inner_map_meta->spin_lock_off = inner_map->spin_lock_off;
-	inner_map_meta->timer_off = inner_map->timer_off;
 	inner_map_meta->fields_tab = btf_type_fields_dup(inner_map->fields_tab);
 	if (IS_ERR(inner_map_meta->fields_tab)) {
 		/* btf_type_fields returns NULL or valid pointer in case of
@@ -91,7 +89,6 @@ bool bpf_map_meta_equal(const struct bpf_map *meta0,
 	return meta0->map_type == meta1->map_type &&
 		meta0->key_size == meta1->key_size &&
 		meta0->value_size == meta1->value_size &&
-		meta0->timer_off == meta1->timer_off &&
 		meta0->map_flags == meta1->map_flags &&
 		btf_type_fields_equal(meta0->fields_tab, meta1->fields_tab);
 }
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 83e7a290ad06..afa736132cc5 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -527,6 +527,9 @@ void btf_type_fields_free(struct btf_type_fields *tab)
 		return;
 	for (i = 0; i < tab->cnt; i++) {
 		switch (tab->fields[i].type) {
+		case BPF_SPIN_LOCK:
+		case BPF_TIMER:
+			break;
 		case BPF_KPTR_UNREF:
 		case BPF_KPTR_REF:
 			if (tab->fields[i].kptr.module)
@@ -564,6 +567,9 @@ struct btf_type_fields *btf_type_fields_dup(const struct btf_type_fields *tab)
 	new_tab->cnt = 0;
 	for (i = 0; i < tab->cnt; i++) {
 		switch (fields[i].type) {
+		case BPF_SPIN_LOCK:
+		case BPF_TIMER:
+			break;
 		case BPF_KPTR_UNREF:
 		case BPF_KPTR_REF:
 			btf_get(fields[i].kptr.btf);
@@ -600,6 +606,13 @@ bool btf_type_fields_equal(const struct btf_type_fields *tab_a, const struct btf
 	return !memcmp(tab_a, tab_b, size);
 }
 
+void bpf_obj_free_timer(const struct btf_type_fields *tab, void *obj)
+{
+	if (WARN_ON_ONCE(!btf_type_fields_has_field(tab, BPF_TIMER)))
+		return;
+	bpf_timer_cancel_and_free(obj + tab->timer_off);
+}
+
 void bpf_obj_free_fields(const struct btf_type_fields *tab, void *obj)
 {
 	const struct btf_field *fields;
@@ -613,6 +626,11 @@ void bpf_obj_free_fields(const struct btf_type_fields *tab, void *obj)
 		void *field_ptr = obj + field->offset;
 
 		switch (fields[i].type) {
+		case BPF_SPIN_LOCK:
+			break;
+		case BPF_TIMER:
+			bpf_timer_cancel_and_free(field_ptr);
+			break;
 		case BPF_KPTR_UNREF:
 			WRITE_ONCE(*(u64 *)field_ptr, 0);
 			break;
@@ -798,8 +816,7 @@ static int bpf_map_mmap(struct file *filp, struct vm_area_struct *vma)
 	struct bpf_map *map = filp->private_data;
 	int err;
 
-	if (!map->ops->map_mmap || map_value_has_spin_lock(map) ||
-	    map_value_has_timer(map) || !IS_ERR_OR_NULL(map->fields_tab))
+	if (!map->ops->map_mmap || !IS_ERR_OR_NULL(map->fields_tab))
 		return -ENOTSUPP;
 
 	if (!(vma->vm_flags & VM_SHARED))
@@ -954,13 +971,11 @@ static void map_off_arr_swap(void *_a, void *_b, int size, const void *priv)
 
 static int bpf_map_alloc_off_arr(struct bpf_map *map)
 {
-	bool has_spin_lock = map_value_has_spin_lock(map);
-	bool has_timer = map_value_has_timer(map);
 	bool has_fields = !IS_ERR_OR_NULL(map);
 	struct btf_type_fields_off *off_arr;
 	u32 i;
 
-	if (!has_spin_lock && !has_timer && !has_fields) {
+	if (!has_fields) {
 		map->off_arr = NULL;
 		return 0;
 	}
@@ -971,20 +986,6 @@ static int bpf_map_alloc_off_arr(struct bpf_map *map)
 	map->off_arr = off_arr;
 
 	off_arr->cnt = 0;
-	if (has_spin_lock) {
-		i = off_arr->cnt;
-
-		off_arr->field_off[i] = map->spin_lock_off;
-		off_arr->field_sz[i] = sizeof(struct bpf_spin_lock);
-		off_arr->cnt++;
-	}
-	if (has_timer) {
-		i = off_arr->cnt;
-
-		off_arr->field_off[i] = map->timer_off;
-		off_arr->field_sz[i] = sizeof(struct bpf_timer);
-		off_arr->cnt++;
-	}
 	if (has_fields) {
 		struct btf_type_fields *tab = map->fields_tab;
 		u32 *off = &off_arr->field_off[off_arr->cnt];
@@ -994,7 +995,7 @@ static int bpf_map_alloc_off_arr(struct bpf_map *map)
 			*off++ = tab->fields[i].offset;
 			*sz++ = btf_field_type_size(tab->fields[i].type);
 		}
-		off_arr->cnt += tab->cnt;
+		off_arr->cnt = tab->cnt;
 	}
 
 	if (off_arr->cnt == 1)
@@ -1026,38 +1027,8 @@ static int map_check_btf(struct bpf_map *map, const struct btf *btf,
 	if (!value_type || value_size != map->value_size)
 		return -EINVAL;
 
-	map->spin_lock_off = btf_find_spin_lock(btf, value_type);
-
-	if (map_value_has_spin_lock(map)) {
-		if (map->map_flags & BPF_F_RDONLY_PROG)
-			return -EACCES;
-		if (map->map_type != BPF_MAP_TYPE_HASH &&
-		    map->map_type != BPF_MAP_TYPE_ARRAY &&
-		    map->map_type != BPF_MAP_TYPE_CGROUP_STORAGE &&
-		    map->map_type != BPF_MAP_TYPE_SK_STORAGE &&
-		    map->map_type != BPF_MAP_TYPE_INODE_STORAGE &&
-		    map->map_type != BPF_MAP_TYPE_TASK_STORAGE)
-			return -ENOTSUPP;
-		if (map->spin_lock_off + sizeof(struct bpf_spin_lock) >
-		    map->value_size) {
-			WARN_ONCE(1,
-				  "verifier bug spin_lock_off %d value_size %d\n",
-				  map->spin_lock_off, map->value_size);
-			return -EFAULT;
-		}
-	}
-
-	map->timer_off = btf_find_timer(btf, value_type);
-	if (map_value_has_timer(map)) {
-		if (map->map_flags & BPF_F_RDONLY_PROG)
-			return -EACCES;
-		if (map->map_type != BPF_MAP_TYPE_HASH &&
-		    map->map_type != BPF_MAP_TYPE_LRU_HASH &&
-		    map->map_type != BPF_MAP_TYPE_ARRAY)
-			return -EOPNOTSUPP;
-	}
-
-	map->fields_tab = btf_parse_fields(btf, value_type);
+	map->fields_tab = btf_parse_fields(btf, value_type, BPF_SPIN_LOCK | BPF_TIMER | BPF_KPTR,
+					   map->value_size);
 	if (!IS_ERR_OR_NULL(map->fields_tab)) {
 		int i;
 
@@ -1073,6 +1044,25 @@ static int map_check_btf(struct bpf_map *map, const struct btf *btf,
 			switch (map->fields_tab->field_mask & (1 << i)) {
 			case 0:
 				continue;
+			case BPF_SPIN_LOCK:
+				if (map->map_type != BPF_MAP_TYPE_HASH &&
+				    map->map_type != BPF_MAP_TYPE_ARRAY &&
+				    map->map_type != BPF_MAP_TYPE_CGROUP_STORAGE &&
+				    map->map_type != BPF_MAP_TYPE_SK_STORAGE &&
+				    map->map_type != BPF_MAP_TYPE_INODE_STORAGE &&
+				    map->map_type != BPF_MAP_TYPE_TASK_STORAGE) {
+					ret = -EOPNOTSUPP;
+					goto free_map_tab;
+				}
+				break;
+			case BPF_TIMER:
+				if (map->map_type != BPF_MAP_TYPE_HASH &&
+				    map->map_type != BPF_MAP_TYPE_LRU_HASH &&
+				    map->map_type != BPF_MAP_TYPE_ARRAY) {
+					ret = -EOPNOTSUPP;
+					goto free_map_tab;
+				}
+				break;
 			case BPF_KPTR_UNREF:
 			case BPF_KPTR_REF:
 				if (map->map_type != BPF_MAP_TYPE_HASH &&
@@ -1152,8 +1142,6 @@ static int map_create(union bpf_attr *attr)
 	mutex_init(&map->freeze_mutex);
 	spin_lock_init(&map->owner.lock);
 
-	map->spin_lock_off = -EINVAL;
-	map->timer_off = -EINVAL;
 	if (attr->btf_key_type_id || attr->btf_value_type_id ||
 	    /* Even the map's value is a kernel's struct,
 	     * the bpf_prog.o must have BTF to begin with
@@ -1367,7 +1355,7 @@ static int map_lookup_elem(union bpf_attr *attr)
 	}
 
 	if ((attr->flags & BPF_F_LOCK) &&
-	    !map_value_has_spin_lock(map)) {
+	    !btf_type_fields_has_field(map->fields_tab, BPF_SPIN_LOCK)) {
 		err = -EINVAL;
 		goto err_put;
 	}
@@ -1440,7 +1428,7 @@ static int map_update_elem(union bpf_attr *attr, bpfptr_t uattr)
 	}
 
 	if ((attr->flags & BPF_F_LOCK) &&
-	    !map_value_has_spin_lock(map)) {
+	    !btf_type_fields_has_field(map->fields_tab, BPF_SPIN_LOCK)) {
 		err = -EINVAL;
 		goto err_put;
 	}
@@ -1603,7 +1591,7 @@ int generic_map_delete_batch(struct bpf_map *map,
 		return -EINVAL;
 
 	if ((attr->batch.elem_flags & BPF_F_LOCK) &&
-	    !map_value_has_spin_lock(map)) {
+	    !btf_type_fields_has_field(map->fields_tab, BPF_SPIN_LOCK)) {
 		return -EINVAL;
 	}
 
@@ -1660,7 +1648,7 @@ int generic_map_update_batch(struct bpf_map *map,
 		return -EINVAL;
 
 	if ((attr->batch.elem_flags & BPF_F_LOCK) &&
-	    !map_value_has_spin_lock(map)) {
+	    !btf_type_fields_has_field(map->fields_tab, BPF_SPIN_LOCK)) {
 		return -EINVAL;
 	}
 
@@ -1723,7 +1711,7 @@ int generic_map_lookup_batch(struct bpf_map *map,
 		return -EINVAL;
 
 	if ((attr->batch.elem_flags & BPF_F_LOCK) &&
-	    !map_value_has_spin_lock(map))
+	    !btf_type_fields_has_field(map->fields_tab, BPF_SPIN_LOCK))
 		return -EINVAL;
 
 	value_size = bpf_map_value_size(map);
@@ -1845,7 +1833,7 @@ static int map_lookup_and_delete_elem(union bpf_attr *attr)
 	}
 
 	if ((attr->flags & BPF_F_LOCK) &&
-	    !map_value_has_spin_lock(map)) {
+	    !btf_type_fields_has_field(map->fields_tab, BPF_SPIN_LOCK)) {
 		err = -EINVAL;
 		goto err_put;
 	}
@@ -1916,8 +1904,7 @@ static int map_freeze(const union bpf_attr *attr)
 	if (IS_ERR(map))
 		return PTR_ERR(map);
 
-	if (map->map_type == BPF_MAP_TYPE_STRUCT_OPS ||
-	    map_value_has_timer(map) || !IS_ERR_OR_NULL(map->fields_tab)) {
+	if (map->map_type == BPF_MAP_TYPE_STRUCT_OPS || !IS_ERR_OR_NULL(map->fields_tab)) {
 		fdput(f);
 		return -ENOTSUPP;
 	}
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 9c375949804d..8660d08589c8 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -454,7 +454,7 @@ static bool reg_type_not_null(enum bpf_reg_type type)
 static bool reg_may_point_to_spin_lock(const struct bpf_reg_state *reg)
 {
 	return reg->type == PTR_TO_MAP_VALUE &&
-		map_value_has_spin_lock(reg->map_ptr);
+		btf_type_fields_has_field(reg->map_ptr->fields_tab, BPF_SPIN_LOCK);
 }
 
 static bool type_is_rdonly_mem(u32 type)
@@ -1388,7 +1388,7 @@ static void mark_ptr_not_null_reg(struct bpf_reg_state *reg)
 			/* transfer reg's id which is unique for every map_lookup_elem
 			 * as UID of the inner map.
 			 */
-			if (map_value_has_timer(map->inner_map_meta))
+			if (btf_type_fields_has_field(map->inner_map_meta->fields_tab, BPF_TIMER))
 				reg->map_uid = reg->id;
 		} else if (map->map_type == BPF_MAP_TYPE_XSKMAP) {
 			reg->type = PTR_TO_XDP_SOCK;
@@ -3817,29 +3817,6 @@ static int check_map_access(struct bpf_verifier_env *env, u32 regno,
 	if (err)
 		return err;
 
-	if (map_value_has_spin_lock(map)) {
-		u32 lock = map->spin_lock_off;
-
-		/* if any part of struct bpf_spin_lock can be touched by
-		 * load/store reject this program.
-		 * To check that [x1, x2) overlaps with [y1, y2)
-		 * it is sufficient to check x1 < y2 && y1 < x2.
-		 */
-		if (reg->smin_value + off < lock + sizeof(struct bpf_spin_lock) &&
-		     lock < reg->umax_value + off + size) {
-			verbose(env, "bpf_spin_lock cannot be accessed directly by load/store\n");
-			return -EACCES;
-		}
-	}
-	if (map_value_has_timer(map)) {
-		u32 t = map->timer_off;
-
-		if (reg->smin_value + off < t + sizeof(struct bpf_timer) &&
-		     t < reg->umax_value + off + size) {
-			verbose(env, "bpf_timer cannot be accessed directly by load/store\n");
-			return -EACCES;
-		}
-	}
 	if (IS_ERR_OR_NULL(map->fields_tab))
 		return 0;
 	tab = map->fields_tab;
@@ -3847,6 +3824,11 @@ static int check_map_access(struct bpf_verifier_env *env, u32 regno,
 		struct btf_field *field = &tab->fields[i];
 		u32 p = field->offset;
 
+		/* if any part of struct bpf_spin_lock can be touched by
+		 * load/store reject this program.
+		 * To check that [x1, x2) overlaps with [y1, y2)
+		 * it is sufficient to check x1 < y2 && y1 < x2.
+		 */
 		if (reg->smin_value + off < p + btf_field_type_size(field->type) &&
 		    p < reg->umax_value + off + size) {
 			switch (field->type) {
@@ -3871,7 +3853,8 @@ static int check_map_access(struct bpf_verifier_env *env, u32 regno,
 				}
 				break;
 			default:
-				verbose(env, "field cannot be accessed directly by load/store\n");
+				verbose(env, "%s cannot be accessed directly by load/store\n",
+					btf_field_type_name(field->type));
 				return -EACCES;
 			}
 		}
@@ -5440,24 +5423,13 @@ static int process_spin_lock(struct bpf_verifier_env *env, int regno,
 			map->name);
 		return -EINVAL;
 	}
-	if (!map_value_has_spin_lock(map)) {
-		if (map->spin_lock_off == -E2BIG)
-			verbose(env,
-				"map '%s' has more than one 'struct bpf_spin_lock'\n",
-				map->name);
-		else if (map->spin_lock_off == -ENOENT)
-			verbose(env,
-				"map '%s' doesn't have 'struct bpf_spin_lock'\n",
-				map->name);
-		else
-			verbose(env,
-				"map '%s' is not a struct type or bpf_spin_lock is mangled\n",
-				map->name);
+	if (!btf_type_fields_has_field(map->fields_tab, BPF_SPIN_LOCK)) {
+		verbose(env, "map '%s' has no valid bpf_spin_lock\n", map->name);
 		return -EINVAL;
 	}
-	if (map->spin_lock_off != val + reg->off) {
-		verbose(env, "off %lld doesn't point to 'struct bpf_spin_lock'\n",
-			val + reg->off);
+	if (map->fields_tab->spin_lock_off != val + reg->off) {
+		verbose(env, "off %lld doesn't point to 'struct bpf_spin_lock' that is at %d\n",
+			val + reg->off, map->fields_tab->spin_lock_off);
 		return -EINVAL;
 	}
 	if (is_lock) {
@@ -5500,24 +5472,13 @@ static int process_timer_func(struct bpf_verifier_env *env, int regno,
 			map->name);
 		return -EINVAL;
 	}
-	if (!map_value_has_timer(map)) {
-		if (map->timer_off == -E2BIG)
-			verbose(env,
-				"map '%s' has more than one 'struct bpf_timer'\n",
-				map->name);
-		else if (map->timer_off == -ENOENT)
-			verbose(env,
-				"map '%s' doesn't have 'struct bpf_timer'\n",
-				map->name);
-		else
-			verbose(env,
-				"map '%s' is not a struct type or bpf_timer is mangled\n",
-				map->name);
+	if (!btf_type_fields_has_field(map->fields_tab, BPF_TIMER)) {
+		verbose(env, "map '%s' has no valid bpf_timer\n", map->name);
 		return -EINVAL;
 	}
-	if (map->timer_off != val + reg->off) {
+	if (map->fields_tab->timer_off != val + reg->off) {
 		verbose(env, "off %lld doesn't point to 'struct bpf_timer' that is at %d\n",
-			val + reg->off, map->timer_off);
+			val + reg->off, map->fields_tab->timer_off);
 		return -EINVAL;
 	}
 	if (meta->map_ptr) {
@@ -7469,7 +7430,7 @@ static int check_helper_call(struct bpf_verifier_env *env, struct bpf_insn *insn
 		regs[BPF_REG_0].map_uid = meta.map_uid;
 		regs[BPF_REG_0].type = PTR_TO_MAP_VALUE | ret_flag;
 		if (!type_may_be_null(ret_type) &&
-		    map_value_has_spin_lock(meta.map_ptr)) {
+		    btf_type_fields_has_field(meta.map_ptr->fields_tab, BPF_SPIN_LOCK)) {
 			regs[BPF_REG_0].id = ++env->id_gen;
 		}
 		break;
@@ -10380,7 +10341,7 @@ static int check_ld_imm(struct bpf_verifier_env *env, struct bpf_insn *insn)
 	    insn->src_reg == BPF_PSEUDO_MAP_IDX_VALUE) {
 		dst_reg->type = PTR_TO_MAP_VALUE;
 		dst_reg->off = aux->map_off;
-		if (map_value_has_spin_lock(map))
+		if (btf_type_fields_has_field(map->fields_tab, BPF_SPIN_LOCK))
 			dst_reg->id = ++env->id_gen;
 	} else if (insn->src_reg == BPF_PSEUDO_MAP_FD ||
 		   insn->src_reg == BPF_PSEUDO_MAP_IDX) {
@@ -12658,7 +12619,7 @@ static int check_map_prog_compatibility(struct bpf_verifier_env *env,
 {
 	enum bpf_prog_type prog_type = resolve_prog_type(prog);
 
-	if (map_value_has_spin_lock(map)) {
+	if (btf_type_fields_has_field(map->fields_tab, BPF_SPIN_LOCK)) {
 		if (prog_type == BPF_PROG_TYPE_SOCKET_FILTER) {
 			verbose(env, "socket filter progs cannot use bpf_spin_lock yet\n");
 			return -EINVAL;
@@ -12675,7 +12636,7 @@ static int check_map_prog_compatibility(struct bpf_verifier_env *env,
 		}
 	}
 
-	if (map_value_has_timer(map)) {
+	if (btf_type_fields_has_field(map->fields_tab, BPF_TIMER)) {
 		if (is_tracing_prog_type(prog_type)) {
 			verbose(env, "tracing progs cannot use bpf_timer yet\n");
 			return -EINVAL;
diff --git a/net/core/bpf_sk_storage.c b/net/core/bpf_sk_storage.c
index 94374d529ea4..f6bbe83329d7 100644
--- a/net/core/bpf_sk_storage.c
+++ b/net/core/bpf_sk_storage.c
@@ -176,7 +176,7 @@ bpf_sk_storage_clone_elem(struct sock *newsk,
 	if (!copy_selem)
 		return NULL;
 
-	if (map_value_has_spin_lock(&smap->map))
+	if (btf_type_fields_has_field(smap->map.fields_tab, BPF_SPIN_LOCK))
 		copy_map_value_locked(&smap->map, SDATA(copy_selem)->data,
 				      SDATA(selem)->data, true);
 	else
@@ -595,7 +595,7 @@ static int diag_get(struct bpf_local_storage_data *sdata, struct sk_buff *skb)
 	if (!nla_value)
 		goto errout;
 
-	if (map_value_has_spin_lock(&smap->map))
+	if (btf_type_fields_has_field(smap->map.fields_tab, BPF_SPIN_LOCK))
 		copy_map_value_locked(&smap->map, nla_data(nla_value),
 				      sdata->data, true);
 	else
-- 
2.38.0



* [PATCH bpf-next v2 08/25] bpf: Refactor map->off_arr handling
  2022-10-13  6:22 [PATCH bpf-next v2 00/25] Local kptrs, BPF linked lists Kumar Kartikeya Dwivedi
                   ` (6 preceding siblings ...)
  2022-10-13  6:22 ` [PATCH bpf-next v2 07/25] bpf: Consolidate spin_lock, timer management " Kumar Kartikeya Dwivedi
@ 2022-10-13  6:22 ` Kumar Kartikeya Dwivedi
  2022-10-13  6:22 ` [PATCH bpf-next v2 09/25] bpf: Support bpf_list_head in map values Kumar Kartikeya Dwivedi
                   ` (16 subsequent siblings)
  24 siblings, 0 replies; 52+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2022-10-13  6:22 UTC (permalink / raw)
  To: bpf
  Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Dave Marchevsky, Delyan Kratunov

Refactor map->off_arr handling into generic functions that work on their
own without hardcoding map-specific code. The btf_type_fields_off
structure is now returned from btf_parse_fields_off, which can be reused
later for types in program BTF.

Functions such as copy_map_value and zero_map_value now call generic
underlying functions (bpf_obj_memcpy and bpf_obj_memzero), so that these
can be reused later for values allocated in programs that contain
special fields.

Later, some helper functions will also require access to this off_arr
structure to be able to skip over special fields at runtime.
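
The skipping copy described above can be sketched in user space as
follows. This is only an illustration under stated assumptions: struct
fields_off and obj_memcpy_skip are hypothetical stand-ins for
btf_type_fields_off and bpf_obj_memcpy, the offsets are assumed to be
sorted in ascending order, and the sketch advances the cursor to the end
of each skipped field (next_off + size), which is what skipping requires
when there are gaps between special fields.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for struct btf_type_fields_off: parallel arrays
 * of field offsets (sorted ascending) and field sizes.
 */
struct fields_off {
	uint32_t cnt;
	uint32_t field_off[4];
	uint8_t field_sz[4];
};

/* Copy size bytes from src to dst, leaving the special fields in dst
 * untouched. A NULL fo means the object has no special fields.
 */
static void obj_memcpy_skip(const struct fields_off *fo, void *dst,
			    const void *src, uint32_t size)
{
	uint32_t curr_off = 0;
	uint32_t i;

	if (!fo) {
		memcpy(dst, src, size);
		return;
	}

	for (i = 0; i < fo->cnt; i++) {
		uint32_t next_off = fo->field_off[i];

		/* Fields must be sorted, non-overlapping, and in bounds. */
		assert(next_off >= curr_off);
		assert(next_off + fo->field_sz[i] <= size);

		/* Copy the plain bytes in front of the field ... */
		memcpy((char *)dst + curr_off, (const char *)src + curr_off,
		       next_off - curr_off);
		/* ... then move the cursor past the field itself. */
		curr_off = next_off + fo->field_sz[i];
	}
	/* Copy whatever follows the last special field. */
	memcpy((char *)dst + curr_off, (const char *)src + curr_off,
	       size - curr_off);
}
```

Zeroing a value follows the same walk with memset in place of memcpy,
which is why both can share one sorted offset array.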

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
---
 include/linux/bpf.h  | 41 ++++++++++++++-----------
 include/linux/btf.h  |  1 +
 kernel/bpf/btf.c     | 55 ++++++++++++++++++++++++++++++++++
 kernel/bpf/syscall.c | 71 +++++---------------------------------------
 4 files changed, 87 insertions(+), 81 deletions(-)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index ba59147dfa61..bc8e7a132664 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -340,55 +340,62 @@ static inline void bpf_long_memcpy(void *dst, const void *src, u32 size)
 }
 
 /* copy everything but bpf_spin_lock, bpf_timer, and kptrs. There could be one of each. */
-static inline void __copy_map_value(struct bpf_map *map, void *dst, void *src, bool long_memcpy)
+static inline void bpf_obj_memcpy(struct btf_type_fields_off *off_arr,
+				  void *dst, void *src, u32 size,
+				  bool long_memcpy)
 {
 	u32 curr_off = 0;
 	int i;
 
-	if (likely(!map->off_arr)) {
+	if (likely(!off_arr)) {
 		if (long_memcpy)
-			bpf_long_memcpy(dst, src, round_up(map->value_size, 8));
+			bpf_long_memcpy(dst, src, round_up(size, 8));
 		else
-			memcpy(dst, src, map->value_size);
+			memcpy(dst, src, size);
 		return;
 	}
 
-	for (i = 0; i < map->off_arr->cnt; i++) {
-		u32 next_off = map->off_arr->field_off[i];
+	for (i = 0; i < off_arr->cnt; i++) {
+		u32 next_off = off_arr->field_off[i];
 
 		memcpy(dst + curr_off, src + curr_off, next_off - curr_off);
-		curr_off += map->off_arr->field_sz[i];
+		curr_off = next_off + off_arr->field_sz[i];
 	}
-	memcpy(dst + curr_off, src + curr_off, map->value_size - curr_off);
+	memcpy(dst + curr_off, src + curr_off, size - curr_off);
 }
 
 static inline void copy_map_value(struct bpf_map *map, void *dst, void *src)
 {
-	__copy_map_value(map, dst, src, false);
+	bpf_obj_memcpy(map->off_arr, dst, src, map->value_size, false);
 }
 
 static inline void copy_map_value_long(struct bpf_map *map, void *dst, void *src)
 {
-	__copy_map_value(map, dst, src, true);
+	bpf_obj_memcpy(map->off_arr, dst, src, map->value_size, true);
 }
 
-static inline void zero_map_value(struct bpf_map *map, void *dst)
+static inline void bpf_obj_memzero(struct btf_type_fields_off *off_arr, void *dst, u32 size)
 {
 	u32 curr_off = 0;
 	int i;
 
-	if (likely(!map->off_arr)) {
-		memset(dst, 0, map->value_size);
+	if (likely(!off_arr)) {
+		memset(dst, 0, size);
 		return;
 	}
 
-	for (i = 0; i < map->off_arr->cnt; i++) {
-		u32 next_off = map->off_arr->field_off[i];
+	for (i = 0; i < off_arr->cnt; i++) {
+		u32 next_off = off_arr->field_off[i];
 
 		memset(dst + curr_off, 0, next_off - curr_off);
-		curr_off += map->off_arr->field_sz[i];
+		curr_off = next_off + off_arr->field_sz[i];
 	}
-	memset(dst + curr_off, 0, map->value_size - curr_off);
+	memset(dst + curr_off, 0, size - curr_off);
+}
+
+static inline void zero_map_value(struct bpf_map *map, void *dst)
+{
+	bpf_obj_memzero(map->off_arr, dst, map->value_size);
 }
 
 void copy_map_value_locked(struct bpf_map *map, void *dst, void *src,
diff --git a/include/linux/btf.h b/include/linux/btf.h
index 72136c9ae4cd..609809017ea1 100644
--- a/include/linux/btf.h
+++ b/include/linux/btf.h
@@ -166,6 +166,7 @@ int btf_find_timer(const struct btf *btf, const struct btf_type *t);
 struct btf_type_fields *btf_parse_fields(const struct btf *btf,
 					 const struct btf_type *t,
 					 u32 field_mask, u32 value_size);
+struct btf_type_fields_off *btf_parse_fields_off(struct btf_type_fields *tab);
 bool btf_type_is_void(const struct btf_type *t);
 s32 btf_find_by_name_kind(const struct btf *btf, const char *name, u8 kind);
 const struct btf_type *btf_type_skip_modifiers(const struct btf *btf,
diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c
index fe00d9c95c96..daadcd8641b5 100644
--- a/kernel/bpf/btf.c
+++ b/kernel/bpf/btf.c
@@ -3556,6 +3556,61 @@ struct btf_type_fields *btf_parse_fields(const struct btf *btf,
 	return ERR_PTR(ret);
 }
 
+static int btf_type_fields_off_cmp(const void *_a, const void *_b, const void *priv)
+{
+	const u32 a = *(const u32 *)_a;
+	const u32 b = *(const u32 *)_b;
+
+	if (a < b)
+		return -1;
+	else if (a > b)
+		return 1;
+	return 0;
+}
+
+static void btf_type_fields_off_swap(void *_a, void *_b, int size, const void *priv)
+{
+	struct btf_type_fields_off *off_arr = (void *)priv;
+	u32 *off_base = off_arr->field_off;
+	u32 *a = _a, *b = _b;
+	u8 *sz_a, *sz_b;
+
+	sz_a = off_arr->field_sz + (a - off_base);
+	sz_b = off_arr->field_sz + (b - off_base);
+
+	swap(*a, *b);
+	swap(*sz_a, *sz_b);
+}
+
+struct btf_type_fields_off *btf_parse_fields_off(struct btf_type_fields *tab)
+{
+	struct btf_type_fields_off *off_arr;
+	u32 i, *off;
+	u8 *sz;
+
+	BUILD_BUG_ON(ARRAY_SIZE(off_arr->field_off) != ARRAY_SIZE(off_arr->field_sz));
+	if (IS_ERR_OR_NULL(tab) || WARN_ON_ONCE(tab->cnt > ARRAY_SIZE(off_arr->field_off)))
+		return NULL;
+
+	off_arr = kzalloc(sizeof(*off_arr), GFP_KERNEL | __GFP_NOWARN);
+	if (!off_arr)
+		return ERR_PTR(-ENOMEM);
+
+	off = &off_arr->field_off[0];
+	sz = &off_arr->field_sz[0];
+	for (i = 0; i < tab->cnt; i++) {
+		off[i] = tab->fields[i].offset;
+		sz[i] = btf_field_type_size(tab->fields[i].type);
+	}
+	off_arr->cnt = tab->cnt;
+
+	if (off_arr->cnt == 1)
+		return off_arr;
+	sort_r(off_arr->field_off, off_arr->cnt, sizeof(off_arr->field_off[0]),
+	       btf_type_fields_off_cmp, btf_type_fields_off_swap, off_arr);
+	return off_arr;
+}
+
 static void __btf_struct_show(const struct btf *btf, const struct btf_type *t,
 			      u32 type_id, void *data, u8 bits_offset,
 			      struct btf_show *show)
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index afa736132cc5..3f3f9697d299 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -943,68 +943,6 @@ int map_check_no_btf(const struct bpf_map *map,
 	return -ENOTSUPP;
 }
 
-static int map_off_arr_cmp(const void *_a, const void *_b, const void *priv)
-{
-	const u32 a = *(const u32 *)_a;
-	const u32 b = *(const u32 *)_b;
-
-	if (a < b)
-		return -1;
-	else if (a > b)
-		return 1;
-	return 0;
-}
-
-static void map_off_arr_swap(void *_a, void *_b, int size, const void *priv)
-{
-	struct bpf_map *map = (struct bpf_map *)priv;
-	u32 *off_base = map->off_arr->field_off;
-	u32 *a = _a, *b = _b;
-	u8 *sz_a, *sz_b;
-
-	sz_a = map->off_arr->field_sz + (a - off_base);
-	sz_b = map->off_arr->field_sz + (b - off_base);
-
-	swap(*a, *b);
-	swap(*sz_a, *sz_b);
-}
-
-static int bpf_map_alloc_off_arr(struct bpf_map *map)
-{
-	bool has_fields = !IS_ERR_OR_NULL(map);
-	struct btf_type_fields_off *off_arr;
-	u32 i;
-
-	if (!has_fields) {
-		map->off_arr = NULL;
-		return 0;
-	}
-
-	off_arr = kmalloc(sizeof(*map->off_arr), GFP_KERNEL | __GFP_NOWARN);
-	if (!off_arr)
-		return -ENOMEM;
-	map->off_arr = off_arr;
-
-	off_arr->cnt = 0;
-	if (has_fields) {
-		struct btf_type_fields *tab = map->fields_tab;
-		u32 *off = &off_arr->field_off[off_arr->cnt];
-		u8 *sz = &off_arr->field_sz[off_arr->cnt];
-
-		for (i = 0; i < tab->cnt; i++) {
-			*off++ = tab->fields[i].offset;
-			*sz++ = btf_field_type_size(tab->fields[i].type);
-		}
-		off_arr->cnt = tab->cnt;
-	}
-
-	if (off_arr->cnt == 1)
-		return 0;
-	sort_r(off_arr->field_off, off_arr->cnt, sizeof(off_arr->field_off[0]),
-	       map_off_arr_cmp, map_off_arr_swap, map);
-	return 0;
-}
-
 static int map_check_btf(struct bpf_map *map, const struct btf *btf,
 			 u32 btf_key_id, u32 btf_value_id)
 {
@@ -1098,6 +1036,7 @@ static int map_check_btf(struct bpf_map *map, const struct btf *btf,
 static int map_create(union bpf_attr *attr)
 {
 	int numa_node = bpf_map_attr_numa_node(attr);
+	struct btf_type_fields_off *off_arr;
 	struct bpf_map *map;
 	int f_flags;
 	int err;
@@ -1177,9 +1116,13 @@ static int map_create(union bpf_attr *attr)
 			attr->btf_vmlinux_value_type_id;
 	}
 
-	err = bpf_map_alloc_off_arr(map);
-	if (err)
+
+	off_arr = btf_parse_fields_off(map->fields_tab);
+	if (IS_ERR(off_arr)) {
+		err = PTR_ERR(off_arr);
 		goto free_map;
+	}
+	map->off_arr = off_arr;
 
 	err = security_bpf_map_alloc(map);
 	if (err)
-- 
2.38.0



* [PATCH bpf-next v2 09/25] bpf: Support bpf_list_head in map values
  2022-10-13  6:22 [PATCH bpf-next v2 00/25] Local kptrs, BPF linked lists Kumar Kartikeya Dwivedi
                   ` (7 preceding siblings ...)
  2022-10-13  6:22 ` [PATCH bpf-next v2 08/25] bpf: Refactor map->off_arr handling Kumar Kartikeya Dwivedi
@ 2022-10-13  6:22 ` Kumar Kartikeya Dwivedi
  2022-10-19  1:59   ` Alexei Starovoitov
  2022-10-13  6:22 ` [PATCH bpf-next v2 10/25] bpf: Introduce local kptrs Kumar Kartikeya Dwivedi
                   ` (15 subsequent siblings)
  24 siblings, 1 reply; 52+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2022-10-13  6:22 UTC (permalink / raw)
  To: bpf
  Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Dave Marchevsky, Delyan Kratunov

Add basic support on the map side to parse, recognize, verify, and
build the metadata table for a new special field of type struct
bpf_list_head. To parameterize the bpf_list_head for a certain value
type and the list_node member it will accept in that value type, we use
BTF declaration tags.

The definition of bpf_list_head in a map value will be done as follows:

struct foo {
	struct bpf_list_node node;
	int data;
};

struct map_value {
	struct bpf_list_head head __contains(foo, node);
};

Then, the bpf_list_head only allows adding to the list 'head' using the
bpf_list_node 'node' for the type struct foo.

The 'contains' annotation is a BTF declaration tag composed of four
parts, "contains:kind:name:node", where the kind and name are then used
to look up the type in the map BTF. The node part gives the name of the
member in this type that has the type struct bpf_list_node, which is
actually used for linking into the linked list. For now, the 'kind' part
is hardcoded as struct.
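
As a rough illustration of the tag layout, the following user-space
sketch splits such a string into its name and node parts. The function
parse_contains_tag and its buffer-based interface are hypothetical and
are not the kernel's implementation; the only assumption taken from the
description above is the "contains:struct:<name>:<node>" layout.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: split "contains:struct:<name>:<node>" into its
 * name and node parts. Returns 0 on success, -1 on a malformed tag or
 * when an output buffer is too small.
 */
static int parse_contains_tag(const char *tag, char *name, size_t name_sz,
			      char *node, size_t node_sz)
{
	static const char prefix[] = "contains:";
	static const char kind[] = "struct:";
	const char *sep;
	size_t len;

	if (strncmp(tag, prefix, sizeof(prefix) - 1))
		return -1;
	tag += sizeof(prefix) - 1;

	/* Only the struct kind is accepted for now. */
	if (strncmp(tag, kind, sizeof(kind) - 1))
		return -1;
	tag += sizeof(kind) - 1;

	/* The type name runs up to the next ':' separator. */
	sep = strchr(tag, ':');
	if (!sep)
		return -1;
	len = (size_t)(sep - tag);
	if (len == 0 || len >= name_sz)
		return -1;
	memcpy(name, tag, len);
	name[len] = '\0';

	/* Everything after the separator is the node member name. */
	sep++;
	len = strlen(sep);
	if (len == 0 || len >= node_sz)
		return -1;
	memcpy(node, sep, len + 1);
	return 0;
}
```

For the struct map_value example above, the tag string would be
"contains:struct:foo:node", giving the name "foo" and the node "node".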

This allows building intrusive linked lists in BPF, using container_of
to obtain a pointer to the entry, while being completely type safe from
the perspective of the verifier. The verifier knows exactly the type of
the nodes, and knows that list helpers return that type at some fixed
offset where the bpf_list_node member used for this list exists. The
verifier also uses this information to disallow adding types that are
not accepted by a certain list.
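
The container_of step can be sketched in plain user-space C as follows.
The struct list_node type and the foo_from_node helper are hypothetical
stand-ins (the real struct bpf_list_node is opaque to programs), and the
data member is placed first here only so that the node offset is
non-zero and the pointer adjustment is visible.

```c
#include <assert.h>
#include <stddef.h>

/* User-space stand-in for the list node embedded in each entry. */
struct list_node {
	struct list_node *next, *prev;
};

struct foo {
	int data;
	struct list_node node;
};

/* Recover the enclosing object from a pointer to one of its members. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* A list walk hands back node pointers; subtracting the fixed offset
 * of 'node' within struct foo yields the entry itself.
 */
static struct foo *foo_from_node(struct list_node *n)
{
	return container_of(n, struct foo, node);
}
```

Because the offset of the node member is fixed per list, recording it
once alongside the value type is enough to recover every entry.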

For now, no elements can be added to such lists. Support for that is
coming in future patches, hence the code that drains and frees items
carries a TODO that will be resolved in a future patch.

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
---
 include/linux/bpf.h                           |  19 +++
 kernel/bpf/btf.c                              | 147 +++++++++++++++++-
 kernel/bpf/helpers.c                          |  32 ++++
 kernel/bpf/syscall.c                          |  22 ++-
 kernel/bpf/verifier.c                         |   7 +
 .../testing/selftests/bpf/bpf_experimental.h  |  23 +++
 6 files changed, 246 insertions(+), 4 deletions(-)
 create mode 100644 tools/testing/selftests/bpf/bpf_experimental.h

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index bc8e7a132664..46330d871d4e 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -27,6 +27,8 @@
 #include <linux/bpfptr.h>
 #include <linux/btf.h>
 #include <linux/rcupdate_trace.h>
+/* Experimental BPF APIs header for type definitions */
+#include "../tools/testing/selftests/bpf/bpf_experimental.h"
 
 struct bpf_verifier_env;
 struct bpf_verifier_log;
@@ -175,6 +177,7 @@ enum btf_field_type {
 	BPF_KPTR_UNREF = (1 << 2),
 	BPF_KPTR_REF   = (1 << 3),
 	BPF_KPTR       = BPF_KPTR_UNREF | BPF_KPTR_REF,
+	BPF_LIST_HEAD  = (1 << 4),
 };
 
 struct btf_field_kptr {
@@ -184,11 +187,18 @@ struct btf_field_kptr {
 	u32 btf_id;
 };
 
+struct btf_field_list_head {
+	struct btf *btf;
+	u32 value_btf_id;
+	u32 node_offset;
+};
+
 struct btf_field {
 	u32 offset;
 	enum btf_field_type type;
 	union {
 		struct btf_field_kptr kptr;
+		struct btf_field_list_head list_head;
 	};
 };
 
@@ -266,6 +276,8 @@ static inline const char *btf_field_type_name(enum btf_field_type type)
 	case BPF_KPTR_UNREF:
 	case BPF_KPTR_REF:
 		return "kptr";
+	case BPF_LIST_HEAD:
+		return "bpf_list_head";
 	default:
 		WARN_ON_ONCE(1);
 		return "unknown";
@@ -282,6 +294,8 @@ static inline u32 btf_field_type_size(enum btf_field_type type)
 	case BPF_KPTR_UNREF:
 	case BPF_KPTR_REF:
 		return sizeof(u64);
+	case BPF_LIST_HEAD:
+		return sizeof(struct bpf_list_head);
 	default:
 		WARN_ON_ONCE(1);
 		return 0;
@@ -298,6 +312,8 @@ static inline u32 btf_field_type_align(enum btf_field_type type)
 	case BPF_KPTR_UNREF:
 	case BPF_KPTR_REF:
 		return __alignof__(u64);
+	case BPF_LIST_HEAD:
+		return __alignof__(struct bpf_list_head);
 	default:
 		WARN_ON_ONCE(1);
 		return 0;
@@ -401,6 +417,9 @@ static inline void zero_map_value(struct bpf_map *map, void *dst)
 void copy_map_value_locked(struct bpf_map *map, void *dst, void *src,
 			   bool lock_src);
 void bpf_timer_cancel_and_free(void *timer);
+void bpf_list_head_free(const struct btf_field *field, void *list_head,
+			struct bpf_spin_lock *spin_lock);
+
 int bpf_obj_name_cpy(char *dst, const char *src, unsigned int size);
 
 struct bpf_offload_dev;
diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c
index daadcd8641b5..066984d73a8b 100644
--- a/kernel/bpf/btf.c
+++ b/kernel/bpf/btf.c
@@ -3205,9 +3205,15 @@ enum {
 struct btf_field_info {
 	enum btf_field_type type;
 	u32 off;
-	struct {
-		u32 type_id;
-	} kptr;
+	union {
+		struct {
+			u32 type_id;
+		} kptr;
+		struct {
+			const char *node_name;
+			u32 value_btf_id;
+		} list_head;
+	};
 };
 
 static int btf_find_struct(const struct btf *btf, const struct btf_type *t,
@@ -3261,6 +3267,69 @@ static int btf_find_kptr(const struct btf *btf, const struct btf_type *t,
 	return BTF_FIELD_FOUND;
 }
 
+static const char *btf_find_decl_tag_value(const struct btf *btf,
+					   const struct btf_type *pt,
+					   int comp_idx, const char *tag_key)
+{
+	int i;
+
+	for (i = 1; i < btf_nr_types(btf); i++) {
+		const struct btf_type *t = btf_type_by_id(btf, i);
+		int len = strlen(tag_key);
+
+		if (!btf_type_is_decl_tag(t))
+			continue;
+		/* TODO: Instead of btf_type pt, it would be much better if we had BTF
+		 * ID of the map value type. This would avoid btf_type_by_id call here.
+		 */
+		if (pt != btf_type_by_id(btf, t->type) ||
+		    btf_type_decl_tag(t)->component_idx != comp_idx)
+			continue;
+		if (strncmp(__btf_name_by_offset(btf, t->name_off), tag_key, len))
+			continue;
+		return __btf_name_by_offset(btf, t->name_off) + len;
+	}
+	return NULL;
+}
+
+static int btf_find_list_head(const struct btf *btf, const struct btf_type *pt,
+			      const struct btf_type *t, int comp_idx,
+			      u32 off, int sz, struct btf_field_info *info)
+{
+	const char *value_type;
+	const char *list_node;
+	s32 id;
+
+	if (!__btf_type_is_struct(t))
+		return BTF_FIELD_IGNORE;
+	if (t->size != sz)
+		return BTF_FIELD_IGNORE;
+	value_type = btf_find_decl_tag_value(btf, pt, comp_idx, "contains:");
+	if (!value_type)
+		return -EINVAL;
+	if (strncmp(value_type, "struct:", sizeof("struct:") - 1))
+		return -EINVAL;
+	value_type += sizeof("struct:") - 1;
+	list_node = strstr(value_type, ":");
+	if (!list_node)
+		return -EINVAL;
+	value_type = kstrndup(value_type, list_node - value_type, GFP_ATOMIC);
+	if (!value_type)
+		return -ENOMEM;
+	id = btf_find_by_name_kind(btf, value_type, BTF_KIND_STRUCT);
+	kfree(value_type);
+	if (id < 0)
+		return id;
+	list_node++;
+	if (str_is_empty(list_node))
+		return -EINVAL;
+	info->type = BPF_LIST_HEAD;
+	info->off = off;
+	info->list_head.value_btf_id = id;
+	info->list_head.node_name = list_node;
+	return BTF_FIELD_FOUND;
+}
+
 static int btf_get_field_type(const char *name, u32 field_mask, u32 *seen_mask,
 			      int *align, int *sz)
 {
@@ -3284,6 +3353,12 @@ static int btf_get_field_type(const char *name, u32 field_mask, u32 *seen_mask,
 			goto end;
 		}
 	}
+	if (field_mask & BPF_LIST_HEAD) {
+		if (!strcmp(name, "bpf_list_head")) {
+			type = BPF_LIST_HEAD;
+			goto end;
+		}
+	}
 	/* Only return BPF_KPTR when all other types with matchable names fail */
 	if (field_mask & BPF_KPTR) {
 		type = BPF_KPTR_REF;
@@ -3317,6 +3392,8 @@ static int btf_find_struct_field(const struct btf *btf,
 			return field_type;
 
 		off = __btf_member_bit_offset(t, member);
+		if (i && !off)
+			return -EFAULT;
 		if (off % 8)
 			/* valid C code cannot generate such BTF */
 			return -EINVAL;
@@ -3339,6 +3416,12 @@ static int btf_find_struct_field(const struct btf *btf,
 			if (ret < 0)
 				return ret;
 			break;
+		case BPF_LIST_HEAD:
+			ret = btf_find_list_head(btf, t, member_type, i, off, sz,
+						 idx < info_cnt ? &info[idx] : &tmp);
+			if (ret < 0)
+				return ret;
+			break;
 		default:
 			return -EFAULT;
 		}
@@ -3373,6 +3456,8 @@ static int btf_find_datasec_var(const struct btf *btf, const struct btf_type *t,
 			return field_type;
 
 		off = vsi->offset;
+		if (i && !off)
+			return -EFAULT;
 		if (vsi->size != sz)
 			continue;
 		if (off % align)
@@ -3393,6 +3478,12 @@ static int btf_find_datasec_var(const struct btf *btf, const struct btf_type *t,
 			if (ret < 0)
 				return ret;
 			break;
+		case BPF_LIST_HEAD:
+			ret = btf_find_list_head(btf, var, var_type, -1, off, sz,
+						 idx < info_cnt ? &info[idx] : &tmp);
+			if (ret < 0)
+				return ret;
+			break;
 		default:
 			return -EFAULT;
 		}
@@ -3491,6 +3582,44 @@ static int btf_parse_kptr(const struct btf *btf, struct btf_field *field,
 	return ret;
 }
 
+static int btf_parse_list_head(const struct btf *btf, struct btf_field *field,
+			       struct btf_field_info *info)
+{
+	const struct btf_type *t, *n = NULL;
+	const struct btf_member *member;
+	u32 offset;
+	int i;
+
+	t = btf_type_by_id(btf, info->list_head.value_btf_id);
+	/* We've already checked that value_btf_id is a struct type. We
+	 * just need to figure out the offset of the list_node, and
+	 * verify its type.
+	 */
+	for_each_member(i, t, member) {
+		if (strcmp(info->list_head.node_name, __btf_name_by_offset(btf, member->name_off)))
+			continue;
+		/* Invalid BTF, two members with same name */
+		if (n)
+			return -EINVAL;
+		n = btf_type_by_id(btf, member->type);
+		if (!__btf_type_is_struct(n))
+			return -EINVAL;
+		if (strcmp("bpf_list_node", __btf_name_by_offset(btf, n->name_off)))
+			return -EINVAL;
+		offset = __btf_member_bit_offset(t, member);
+		if (offset % 8)
+			return -EINVAL;
+		offset /= 8;
+		if (offset % __alignof__(struct bpf_list_node))
+			return -EINVAL;
+
+		field->list_head.btf = (struct btf *)btf;
+		field->list_head.value_btf_id = info->list_head.value_btf_id;
+		field->list_head.node_offset = offset;
+	}
+	return 0;
+}
+
 struct btf_type_fields *btf_parse_fields(const struct btf *btf,
 					 const struct btf_type *t,
 					 u32 field_mask,
@@ -3542,6 +3671,11 @@ struct btf_type_fields *btf_parse_fields(const struct btf *btf,
 			if (ret < 0)
 				goto end;
 			break;
+		case BPF_LIST_HEAD:
+			ret = btf_parse_list_head(btf, &tab->fields[i], &info_arr[i]);
+			if (ret < 0)
+				goto end;
+			break;
 		default:
 			ret = -EFAULT;
 			goto end;
@@ -3550,6 +3684,13 @@ struct btf_type_fields *btf_parse_fields(const struct btf *btf,
 		tab->cnt++;
 	}
 	tab->cnt = cnt;
+
+	/* bpf_list_head requires bpf_spin_lock */
+	if (btf_type_fields_has_field(tab, BPF_LIST_HEAD) && tab->spin_lock_off < 0) {
+		ret = -EINVAL;
+		goto end;
+	}
+
 	return tab;
 end:
 	btf_type_fields_free(tab);
diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
index 8f425596b9c6..a2f2fe43916b 100644
--- a/kernel/bpf/helpers.c
+++ b/kernel/bpf/helpers.c
@@ -1700,6 +1700,38 @@ bpf_base_func_proto(enum bpf_func_id func_id)
 	}
 }
 
+void bpf_list_head_free(const struct btf_field *field, void *list_head,
+			struct bpf_spin_lock *spin_lock)
+{
+	struct list_head *head = list_head, *orig_head = head;
+	unsigned long flags;
+
+	BUILD_BUG_ON(sizeof(struct bpf_list_head) != sizeof(struct list_head));
+	BUILD_BUG_ON(__alignof__(struct bpf_list_head) != __alignof__(struct list_head));
+
+	/* __bpf_spin_lock_irqsave cannot be used here, as we may take a spin
+	 * lock again when we call bpf_obj_free_fields in the loop, and it will
+	 * overwrite the per-CPU local_irq_save state.
+	 */
+	local_irq_save(flags);
+	__bpf_spin_lock(spin_lock);
+	if (!head->next || list_empty(head))
+		goto unlock;
+	head = head->next;
+	while (head != orig_head) {
+		void *obj = head;
+
+		obj -= field->list_head.node_offset;
+		head = head->next;
+		/* TODO: Rework later */
+		kfree(obj);
+	}
+unlock:
+	INIT_LIST_HEAD(head);
+	__bpf_spin_unlock(spin_lock);
+	local_irq_restore(flags);
+}
+
 BTF_SET8_START(tracing_btf_ids)
 #ifdef CONFIG_KEXEC_CORE
 BTF_ID_FLAGS(func, crash_kexec, KF_DESTRUCTIVE)
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 3f3f9697d299..92486d777246 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -536,6 +536,9 @@ void btf_type_fields_free(struct btf_type_fields *tab)
 				module_put(tab->fields[i].kptr.module);
 			btf_put(tab->fields[i].kptr.btf);
 			break;
+		case BPF_LIST_HEAD:
+			/* Nothing to release for bpf_list_head */
+			break;
 		default:
 			WARN_ON_ONCE(1);
 			continue;
@@ -578,6 +581,9 @@ struct btf_type_fields *btf_type_fields_dup(const struct btf_type_fields *tab)
 				goto free;
 			}
 			break;
+		case BPF_LIST_HEAD:
+			/* Nothing to acquire for bpf_list_head */
+			break;
 		default:
 			ret = -EFAULT;
 			WARN_ON_ONCE(1);
@@ -637,6 +643,11 @@ void bpf_obj_free_fields(const struct btf_type_fields *tab, void *obj)
 		case BPF_KPTR_REF:
 			field->kptr.dtor((void *)xchg((unsigned long *)field_ptr, 0));
 			break;
+		case BPF_LIST_HEAD:
+			if (WARN_ON_ONCE(tab->spin_lock_off < 0))
+				continue;
+			bpf_list_head_free(field, field_ptr, obj + tab->spin_lock_off);
+			break;
 		default:
 			WARN_ON_ONCE(1);
 			continue;
@@ -965,7 +976,8 @@ static int map_check_btf(struct bpf_map *map, const struct btf *btf,
 	if (!value_type || value_size != map->value_size)
 		return -EINVAL;
 
-	map->fields_tab = btf_parse_fields(btf, value_type, BPF_SPIN_LOCK | BPF_TIMER | BPF_KPTR,
+	map->fields_tab = btf_parse_fields(btf, value_type,
+					   BPF_SPIN_LOCK | BPF_TIMER | BPF_KPTR | BPF_LIST_HEAD,
 					   map->value_size);
 	if (!IS_ERR_OR_NULL(map->fields_tab)) {
 		int i;
@@ -1011,6 +1023,14 @@ static int map_check_btf(struct bpf_map *map, const struct btf *btf,
 					goto free_map_tab;
 				}
 				break;
+			case BPF_LIST_HEAD:
+				if (map->map_type != BPF_MAP_TYPE_HASH &&
+				    map->map_type != BPF_MAP_TYPE_LRU_HASH &&
+				    map->map_type != BPF_MAP_TYPE_ARRAY) {
+					ret = -EOPNOTSUPP;
+					goto free_map_tab;
+				}
+				break;
 			default:
 				/* Fail if map_type checks are missing for a field type */
 				ret = -EOPNOTSUPP;
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 8660d08589c8..3c47cecda302 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -12643,6 +12643,13 @@ static int check_map_prog_compatibility(struct bpf_verifier_env *env,
 		}
 	}
 
+	if (btf_type_fields_has_field(map->fields_tab, BPF_LIST_HEAD)) {
+		if (is_tracing_prog_type(prog_type)) {
+			verbose(env, "tracing progs cannot use bpf_list_head yet\n");
+			return -EINVAL;
+		}
+	}
+
 	if ((bpf_prog_is_dev_bound(prog->aux) || bpf_map_is_dev_bound(map)) &&
 	    !bpf_offload_prog_map_match(prog, map)) {
 		verbose(env, "offload device mismatch between prog and map\n");
diff --git a/tools/testing/selftests/bpf/bpf_experimental.h b/tools/testing/selftests/bpf/bpf_experimental.h
new file mode 100644
index 000000000000..4e31790e433d
--- /dev/null
+++ b/tools/testing/selftests/bpf/bpf_experimental.h
@@ -0,0 +1,23 @@
+#ifndef __KERNEL__
+
+#include <vmlinux.h>
+#include <bpf/bpf_tracing.h>
+#include <bpf/bpf_helpers.h>
+#include <bpf/bpf_core_read.h>
+
+#else
+
+struct bpf_list_head {
+	__u64 __a;
+	__u64 __b;
+} __attribute__((aligned(8)));
+
+struct bpf_list_node {
+	__u64 __a;
+	__u64 __b;
+} __attribute__((aligned(8)));
+
+#endif
+
+#ifndef __KERNEL__
+#endif
-- 
2.38.0



* [PATCH bpf-next v2 10/25] bpf: Introduce local kptrs
  2022-10-13  6:22 [PATCH bpf-next v2 00/25] Local kptrs, BPF linked lists Kumar Kartikeya Dwivedi
                   ` (8 preceding siblings ...)
  2022-10-13  6:22 ` [PATCH bpf-next v2 09/25] bpf: Support bpf_list_head in map values Kumar Kartikeya Dwivedi
@ 2022-10-13  6:22 ` Kumar Kartikeya Dwivedi
  2022-10-19 17:15   ` Dave Marchevsky
  2022-10-25 16:32   ` Dave Marchevsky
  2022-10-13  6:22 ` [PATCH bpf-next v2 11/25] bpf: Recognize bpf_{spin_lock,list_head,list_node} in " Kumar Kartikeya Dwivedi
                   ` (14 subsequent siblings)
  24 siblings, 2 replies; 52+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2022-10-13  6:22 UTC (permalink / raw)
  To: bpf
  Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Dave Marchevsky, Delyan Kratunov

Introduce the idea of local kptrs, i.e. PTR_TO_BTF_ID that point to a
type in program BTF. This is indicated by the presence of MEM_TYPE_LOCAL
type tag in reg->type to avoid having to check btf_is_kernel when trying
to match argument types in helpers.

For now, these local kptrs will always be referenced in verifier
context, hence ref_obj_id == 0 for them is a bug. Writing to such
objects is allowed, as long as fields that are special are not touched
(support for which will be added in subsequent patches).

No PROBE_MEM handling is hence done since they can never be in an
undefined state, and their lifetime will always be valid.
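
As a rough illustration of how a flag such as MEM_TYPE_LOCAL composes with
a base register type, here is a minimal userspace sketch. The bit width,
the numeric value of PTR_TO_BTF_ID, and the helper is_local_kptr() are
assumptions made for this example; the real definitions live in
include/linux/bpf.h and the verifier.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative values only, not the kernel's definitions. */
#define BASE_TYPE_BITS	8
#define BASE_TYPE_MASK	((1u << BASE_TYPE_BITS) - 1)
#define BIT(n)		(1u << (n))

enum { PTR_TO_BTF_ID = 5 };	/* hypothetical base type value */
enum { MEM_TYPE_LOCAL = BIT(11 + BASE_TYPE_BITS) };

/* Low bits carry the base type, high bits carry modifier flags. */
static uint32_t base_type(uint32_t type)
{
	return type & BASE_TYPE_MASK;
}

static uint32_t type_flag(uint32_t type)
{
	return type & ~BASE_TYPE_MASK;
}

/* Hypothetical helper: a local kptr is PTR_TO_BTF_ID plus MEM_TYPE_LOCAL. */
static bool is_local_kptr(uint32_t type)
{
	return base_type(type) == PTR_TO_BTF_ID &&
	       (type_flag(type) & MEM_TYPE_LOCAL);
}
```

Because the flag is part of reg->type itself, a check like the one above
needs no lookup into the BTF object to tell program types from kernel types.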

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
---
 include/linux/bpf.h              | 14 +++++++++++---
 include/linux/filter.h           |  4 +++-
 kernel/bpf/btf.c                 |  9 ++++++++-
 kernel/bpf/verifier.c            | 15 ++++++++++-----
 net/bpf/bpf_dummy_struct_ops.c   |  3 ++-
 net/core/filter.c                | 13 ++++++++-----
 net/ipv4/bpf_tcp_ca.c            |  3 ++-
 net/netfilter/nf_conntrack_bpf.c |  1 +
 8 files changed, 45 insertions(+), 17 deletions(-)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 46330d871d4e..a2f4d3356cc8 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -526,6 +526,11 @@ enum bpf_type_flag {
 	/* Size is known at compile time. */
 	MEM_FIXED_SIZE		= BIT(10 + BPF_BASE_TYPE_BITS),
 
+	/* MEM is of a type from program BTF, not kernel BTF. This is used to
+	 * tag PTR_TO_BTF_ID allocated using bpf_kptr_alloc.
+	 */
+	MEM_TYPE_LOCAL		= BIT(11 + BPF_BASE_TYPE_BITS),
+
 	__BPF_TYPE_FLAG_MAX,
 	__BPF_TYPE_LAST_FLAG	= __BPF_TYPE_FLAG_MAX - 1,
 };
@@ -774,6 +779,7 @@ struct bpf_prog_ops {
 			union bpf_attr __user *uattr);
 };
 
+struct bpf_reg_state;
 struct bpf_verifier_ops {
 	/* return eBPF function prototype for verification */
 	const struct bpf_func_proto *
@@ -795,6 +801,7 @@ struct bpf_verifier_ops {
 				  struct bpf_insn *dst,
 				  struct bpf_prog *prog, u32 *target_size);
 	int (*btf_struct_access)(struct bpf_verifier_log *log,
+				 const struct bpf_reg_state *reg,
 				 const struct btf *btf,
 				 const struct btf_type *t, int off, int size,
 				 enum bpf_access_type atype,
@@ -2076,10 +2083,11 @@ static inline bool bpf_tracing_btf_ctx_access(int off, int size,
 	return btf_ctx_access(off, size, type, prog, info);
 }
 
-int btf_struct_access(struct bpf_verifier_log *log, const struct btf *btf,
+int btf_struct_access(struct bpf_verifier_log *log,
+		      const struct bpf_reg_state *reg, const struct btf *btf,
 		      const struct btf_type *t, int off, int size,
-		      enum bpf_access_type atype,
-		      u32 *next_btf_id, enum bpf_type_flag *flag);
+		      enum bpf_access_type atype, u32 *next_btf_id,
+		      enum bpf_type_flag *flag);
 bool btf_struct_ids_match(struct bpf_verifier_log *log,
 			  const struct btf *btf, u32 id, int off,
 			  const struct btf *need_btf, u32 need_type_id,
diff --git a/include/linux/filter.h b/include/linux/filter.h
index efc42a6e3aed..9b94e24f90b9 100644
--- a/include/linux/filter.h
+++ b/include/linux/filter.h
@@ -568,7 +568,9 @@ struct sk_filter {
 DECLARE_STATIC_KEY_FALSE(bpf_stats_enabled_key);
 
 extern struct mutex nf_conn_btf_access_lock;
-extern int (*nfct_btf_struct_access)(struct bpf_verifier_log *log, const struct btf *btf,
+extern int (*nfct_btf_struct_access)(struct bpf_verifier_log *log,
+				     const struct bpf_reg_state *reg,
+				     const struct btf *btf,
 				     const struct btf_type *t, int off, int size,
 				     enum bpf_access_type atype, u32 *next_btf_id,
 				     enum bpf_type_flag *flag);
diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c
index 066984d73a8b..65f444405d9c 100644
--- a/kernel/bpf/btf.c
+++ b/kernel/bpf/btf.c
@@ -6019,11 +6019,13 @@ static int btf_struct_walk(struct bpf_verifier_log *log, const struct btf *btf,
 	return -EINVAL;
 }
 
-int btf_struct_access(struct bpf_verifier_log *log, const struct btf *btf,
+int btf_struct_access(struct bpf_verifier_log *log,
+		      const struct bpf_reg_state *reg, const struct btf *btf,
 		      const struct btf_type *t, int off, int size,
 		      enum bpf_access_type atype __maybe_unused,
 		      u32 *next_btf_id, enum bpf_type_flag *flag)
 {
+	bool local_type = reg && (type_flag(reg->type) & MEM_TYPE_LOCAL);
 	enum bpf_type_flag tmp_flag = 0;
 	int err;
 	u32 id;
@@ -6033,6 +6035,11 @@ int btf_struct_access(struct bpf_verifier_log *log, const struct btf *btf,
 
 		switch (err) {
 		case WALK_PTR:
+			/* For local types, the destination register cannot
+			 * become a pointer again.
+			 */
+			if (local_type)
+				return SCALAR_VALUE;
 			/* If we found the pointer or scalar on t+off,
 			 * we're done.
 			 */
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 3c47cecda302..6ee8c06c2080 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -4522,16 +4522,20 @@ static int check_ptr_to_btf_access(struct bpf_verifier_env *env,
 		return -EACCES;
 	}
 
-	if (env->ops->btf_struct_access) {
-		ret = env->ops->btf_struct_access(&env->log, reg->btf, t,
+	if (env->ops->btf_struct_access && !(type_flag(reg->type) & MEM_TYPE_LOCAL)) {
+		WARN_ON_ONCE(!btf_is_kernel(reg->btf));
+		ret = env->ops->btf_struct_access(&env->log, reg, reg->btf, t,
 						  off, size, atype, &btf_id, &flag);
 	} else {
-		if (atype != BPF_READ) {
+		if (atype != BPF_READ && !(type_flag(reg->type) & MEM_TYPE_LOCAL)) {
 			verbose(env, "only read is supported\n");
 			return -EACCES;
 		}
 
-		ret = btf_struct_access(&env->log, reg->btf, t, off, size,
+		if (reg->type & MEM_TYPE_LOCAL)
+			WARN_ON_ONCE(!reg->ref_obj_id);
+
+		ret = btf_struct_access(&env->log, reg, reg->btf, t, off, size,
 					atype, &btf_id, &flag);
 	}
 
@@ -4596,7 +4600,7 @@ static int check_ptr_to_map_access(struct bpf_verifier_env *env,
 		return -EACCES;
 	}
 
-	ret = btf_struct_access(&env->log, btf_vmlinux, t, off, size, atype, &btf_id, &flag);
+	ret = btf_struct_access(&env->log, NULL, btf_vmlinux, t, off, size, atype, &btf_id, &flag);
 	if (ret < 0)
 		return ret;
 
@@ -5816,6 +5820,7 @@ int check_func_arg_reg_off(struct bpf_verifier_env *env,
 	 * fixed offset.
 	 */
 	case PTR_TO_BTF_ID:
+	case PTR_TO_BTF_ID | MEM_TYPE_LOCAL:
 		/* When referenced PTR_TO_BTF_ID is passed to release function,
 		 * it's fixed offset must be 0.	In the other cases, fixed offset
 		 * can be non-zero.
diff --git a/net/bpf/bpf_dummy_struct_ops.c b/net/bpf/bpf_dummy_struct_ops.c
index e78dadfc5829..d7aa636d90ce 100644
--- a/net/bpf/bpf_dummy_struct_ops.c
+++ b/net/bpf/bpf_dummy_struct_ops.c
@@ -156,6 +156,7 @@ static bool bpf_dummy_ops_is_valid_access(int off, int size,
 }
 
 static int bpf_dummy_ops_btf_struct_access(struct bpf_verifier_log *log,
+					   const struct bpf_reg_state *reg,
 					   const struct btf *btf,
 					   const struct btf_type *t, int off,
 					   int size, enum bpf_access_type atype,
@@ -177,7 +178,7 @@ static int bpf_dummy_ops_btf_struct_access(struct bpf_verifier_log *log,
 		return -EACCES;
 	}
 
-	err = btf_struct_access(log, btf, t, off, size, atype, next_btf_id,
+	err = btf_struct_access(log, reg, btf, t, off, size, atype, next_btf_id,
 				flag);
 	if (err < 0)
 		return err;
diff --git a/net/core/filter.c b/net/core/filter.c
index bb0136e7a8e4..cc7af7be91d9 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -8647,13 +8647,15 @@ static bool tc_cls_act_is_valid_access(int off, int size,
 DEFINE_MUTEX(nf_conn_btf_access_lock);
 EXPORT_SYMBOL_GPL(nf_conn_btf_access_lock);
 
-int (*nfct_btf_struct_access)(struct bpf_verifier_log *log, const struct btf *btf,
+int (*nfct_btf_struct_access)(struct bpf_verifier_log *log,
+			      const struct bpf_reg_state *reg, const struct btf *btf,
 			      const struct btf_type *t, int off, int size,
 			      enum bpf_access_type atype, u32 *next_btf_id,
 			      enum bpf_type_flag *flag);
 EXPORT_SYMBOL_GPL(nfct_btf_struct_access);
 
 static int tc_cls_act_btf_struct_access(struct bpf_verifier_log *log,
+					const struct bpf_reg_state *reg,
 					const struct btf *btf,
 					const struct btf_type *t, int off,
 					int size, enum bpf_access_type atype,
@@ -8663,12 +8665,12 @@ static int tc_cls_act_btf_struct_access(struct bpf_verifier_log *log,
 	int ret = -EACCES;
 
 	if (atype == BPF_READ)
-		return btf_struct_access(log, btf, t, off, size, atype, next_btf_id,
+		return btf_struct_access(log, reg, btf, t, off, size, atype, next_btf_id,
 					 flag);
 
 	mutex_lock(&nf_conn_btf_access_lock);
 	if (nfct_btf_struct_access)
-		ret = nfct_btf_struct_access(log, btf, t, off, size, atype, next_btf_id, flag);
+		ret = nfct_btf_struct_access(log, reg, btf, t, off, size, atype, next_btf_id, flag);
 	mutex_unlock(&nf_conn_btf_access_lock);
 
 	return ret;
@@ -8734,6 +8736,7 @@ void bpf_warn_invalid_xdp_action(struct net_device *dev, struct bpf_prog *prog,
 EXPORT_SYMBOL_GPL(bpf_warn_invalid_xdp_action);
 
 static int xdp_btf_struct_access(struct bpf_verifier_log *log,
+				 const struct bpf_reg_state *reg,
 				 const struct btf *btf,
 				 const struct btf_type *t, int off,
 				 int size, enum bpf_access_type atype,
@@ -8743,12 +8746,12 @@ static int xdp_btf_struct_access(struct bpf_verifier_log *log,
 	int ret = -EACCES;
 
 	if (atype == BPF_READ)
-		return btf_struct_access(log, btf, t, off, size, atype, next_btf_id,
+		return btf_struct_access(log, reg, btf, t, off, size, atype, next_btf_id,
 					 flag);
 
 	mutex_lock(&nf_conn_btf_access_lock);
 	if (nfct_btf_struct_access)
-		ret = nfct_btf_struct_access(log, btf, t, off, size, atype, next_btf_id, flag);
+		ret = nfct_btf_struct_access(log, reg, btf, t, off, size, atype, next_btf_id, flag);
 	mutex_unlock(&nf_conn_btf_access_lock);
 
 	return ret;
diff --git a/net/ipv4/bpf_tcp_ca.c b/net/ipv4/bpf_tcp_ca.c
index 6da16ae6a962..1fe3935c4260 100644
--- a/net/ipv4/bpf_tcp_ca.c
+++ b/net/ipv4/bpf_tcp_ca.c
@@ -69,6 +69,7 @@ static bool bpf_tcp_ca_is_valid_access(int off, int size,
 }
 
 static int bpf_tcp_ca_btf_struct_access(struct bpf_verifier_log *log,
+					const struct bpf_reg_state *reg,
 					const struct btf *btf,
 					const struct btf_type *t, int off,
 					int size, enum bpf_access_type atype,
@@ -78,7 +79,7 @@ static int bpf_tcp_ca_btf_struct_access(struct bpf_verifier_log *log,
 	size_t end;
 
 	if (atype == BPF_READ)
-		return btf_struct_access(log, btf, t, off, size, atype, next_btf_id,
+		return btf_struct_access(log, reg, btf, t, off, size, atype, next_btf_id,
 					 flag);
 
 	if (t != tcp_sock_type) {
diff --git a/net/netfilter/nf_conntrack_bpf.c b/net/netfilter/nf_conntrack_bpf.c
index 8639e7efd0e2..f6036a84484b 100644
--- a/net/netfilter/nf_conntrack_bpf.c
+++ b/net/netfilter/nf_conntrack_bpf.c
@@ -191,6 +191,7 @@ BTF_ID(struct, nf_conn___init)
 
 /* Check writes into `struct nf_conn` */
 static int _nf_conntrack_btf_struct_access(struct bpf_verifier_log *log,
+					   const struct bpf_reg_state *reg,
 					   const struct btf *btf,
 					   const struct btf_type *t, int off,
 					   int size, enum bpf_access_type atype,
-- 
2.38.0



* [PATCH bpf-next v2 11/25] bpf: Recognize bpf_{spin_lock,list_head,list_node} in local kptrs
  2022-10-13  6:22 [PATCH bpf-next v2 00/25] Local kptrs, BPF linked lists Kumar Kartikeya Dwivedi
                   ` (9 preceding siblings ...)
  2022-10-13  6:22 ` [PATCH bpf-next v2 10/25] bpf: Introduce local kptrs Kumar Kartikeya Dwivedi
@ 2022-10-13  6:22 ` Kumar Kartikeya Dwivedi
  2022-10-13  6:22 ` [PATCH bpf-next v2 12/25] bpf: Verify ownership relationships for owning types Kumar Kartikeya Dwivedi
                   ` (13 subsequent siblings)
  24 siblings, 0 replies; 52+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2022-10-13  6:22 UTC (permalink / raw)
  To: bpf
  Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Dave Marchevsky, Delyan Kratunov

Allow specifying bpf_spin_lock, bpf_list_head, bpf_list_node fields in a
local kptr.

A bpf_list_head allows implementing map-in-map style use cases, where
local kptr with bpf_list_head is linked into a list in a map value. This
would require embedding a bpf_list_node, support for which is also
included.

Lastly, while holding a bpf_spin_lock is not strictly required when
manipulating the bpf_list_head of a local kptr (when we have access to
it, we have complete ownership of the object), the locking constraint is
still kept and may be conditionally lifted in the future.

Note that the specification of such types can be done just like map
values, e.g.:

struct bar {
	struct bpf_list_node node;
};

struct foo {
	struct bpf_spin_lock lock;
	struct bpf_list_head head __contains(bar, node);
	struct bpf_list_node node;
};

struct map_value {
	struct bpf_spin_lock lock;
	struct bpf_list_head head __contains(foo, node);
};

To recognize such types in user BTF, we build a btf_struct_metas array
of metadata items, one for each struct BTF ID that contains such fields.
This is done once during the btf_parse stage, to avoid recomputing it
each time the verifier needs to inspect the metadata.

Moreover, the computed metadata needs to be passed to some helpers in
future patches. This requires allocating the metadata and storing it in
the BTF that is pinned by the program itself, so that access to it can
be assumed valid during program runtime.
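
The lookup side of such a table can be sketched in userspace C as a
binary search over entries sorted by BTF ID. The struct and function
names below (struct_meta, find_meta, id_cmp) are hypothetical stand-ins,
and the comparator avoids plain subtraction as a defensive choice of this
sketch rather than something taken from the patch.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for a per-struct metadata entry. The ID must be
 * the first member so a pointer to an entry is also a pointer to its key.
 */
struct struct_meta {
	uint32_t btf_id;
	const char *name;
};

/* Compare two IDs without relying on subtraction. */
static int id_cmp(const void *a, const void *b)
{
	uint32_t x = *(const uint32_t *)a, y = *(const uint32_t *)b;

	return (x > y) - (x < y);
}

/* Returns the entry for id, or NULL. tab must be sorted by btf_id. */
static const struct struct_meta *
find_meta(const struct struct_meta *tab, size_t cnt, uint32_t id)
{
	return bsearch(&id, tab, cnt, sizeof(tab[0]), id_cmp);
}
```

Since types are visited in increasing ID order while the table is built,
entries appended in that order are already sorted and need no extra sort
before the search.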

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
---
 include/linux/bpf.h  |   7 ++
 include/linux/btf.h  |  35 ++++++++
 kernel/bpf/btf.c     | 196 +++++++++++++++++++++++++++++++++++++++----
 kernel/bpf/syscall.c |   4 +
 4 files changed, 224 insertions(+), 18 deletions(-)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index a2f4d3356cc8..76548a9d57db 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -178,6 +178,7 @@ enum btf_field_type {
 	BPF_KPTR_REF   = (1 << 3),
 	BPF_KPTR       = BPF_KPTR_UNREF | BPF_KPTR_REF,
 	BPF_LIST_HEAD  = (1 << 4),
+	BPF_LIST_NODE  = (1 << 5),
 };
 
 struct btf_field_kptr {
@@ -278,6 +279,8 @@ static inline const char *btf_field_type_name(enum btf_field_type type)
 		return "kptr";
 	case BPF_LIST_HEAD:
 		return "bpf_list_head";
+	case BPF_LIST_NODE:
+		return "bpf_list_node";
 	default:
 		WARN_ON_ONCE(1);
 		return "unknown";
@@ -296,6 +299,8 @@ static inline u32 btf_field_type_size(enum btf_field_type type)
 		return sizeof(u64);
 	case BPF_LIST_HEAD:
 		return sizeof(struct bpf_list_head);
+	case BPF_LIST_NODE:
+		return sizeof(struct bpf_list_node);
 	default:
 		WARN_ON_ONCE(1);
 		return 0;
@@ -314,6 +319,8 @@ static inline u32 btf_field_type_align(enum btf_field_type type)
 		return __alignof__(u64);
 	case BPF_LIST_HEAD:
 		return __alignof__(struct bpf_list_head);
+	case BPF_LIST_NODE:
+		return __alignof__(struct bpf_list_node);
 	default:
 		WARN_ON_ONCE(1);
 		return 0;
diff --git a/include/linux/btf.h b/include/linux/btf.h
index 609809017ea1..b63c88de3135 100644
--- a/include/linux/btf.h
+++ b/include/linux/btf.h
@@ -6,6 +6,8 @@
 
 #include <linux/types.h>
 #include <linux/bpfptr.h>
+#include <linux/bsearch.h>
+#include <linux/btf_ids.h>
 #include <uapi/linux/btf.h>
 #include <uapi/linux/bpf.h>
 
@@ -78,6 +80,17 @@ struct btf_id_dtor_kfunc {
 	u32 kfunc_btf_id;
 };
 
+struct btf_struct_meta {
+	u32 btf_id;
+	struct btf_type_fields *fields_tab;
+	struct btf_type_fields_off *off_arr;
+};
+
+struct btf_struct_metas {
+	u32 cnt;
+	struct btf_struct_meta types[];
+};
+
 typedef void (*btf_dtor_kfunc_t)(void *);
 
 extern const struct file_operations btf_fops;
@@ -409,6 +422,23 @@ static inline struct btf_param *btf_params(const struct btf_type *t)
 	return (struct btf_param *)(t + 1);
 }
 
+static inline int btf_id_cmp_func(const void *a, const void *b)
+{
+	const int *pa = a, *pb = b;
+
+	return *pa - *pb;
+}
+
+static inline bool btf_id_set_contains(const struct btf_id_set *set, u32 id)
+{
+	return bsearch(&id, set->ids, set->cnt, sizeof(u32), btf_id_cmp_func) != NULL;
+}
+
+static inline void *btf_id_set8_contains(const struct btf_id_set8 *set, u32 id)
+{
+	return bsearch(&id, set->pairs, set->cnt, sizeof(set->pairs[0]), btf_id_cmp_func);
+}
+
 #ifdef CONFIG_BPF_SYSCALL
 struct bpf_prog;
 
@@ -424,6 +454,7 @@ int register_btf_kfunc_id_set(enum bpf_prog_type prog_type,
 s32 btf_find_dtor_kfunc(struct btf *btf, u32 btf_id);
 int register_btf_id_dtor_kfuncs(const struct btf_id_dtor_kfunc *dtors, u32 add_cnt,
 				struct module *owner);
+struct btf_struct_meta *btf_find_struct_meta(const struct btf *btf, u32 btf_id);
 #else
 static inline const struct btf_type *btf_type_by_id(const struct btf *btf,
 						    u32 type_id)
@@ -455,6 +486,10 @@ static inline int register_btf_id_dtor_kfuncs(const struct btf_id_dtor_kfunc *dt
 {
 	return 0;
 }
+static inline struct btf_struct_meta *btf_find_struct_meta(const struct btf *btf, u32 btf_id)
+{
+	return NULL;
+}
 #endif
 
 static inline bool btf_type_is_struct_ptr(struct btf *btf, const struct btf_type *t)
diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c
index 65f444405d9c..6c4701f7c938 100644
--- a/kernel/bpf/btf.c
+++ b/kernel/bpf/btf.c
@@ -237,6 +237,7 @@ struct btf {
 	struct rcu_head rcu;
 	struct btf_kfunc_set_tab *kfunc_set_tab;
 	struct btf_id_dtor_kfunc_tab *dtor_kfunc_tab;
+	struct btf_struct_metas *struct_meta_tab;
 
 	/* split BTF support */
 	struct btf *base_btf;
@@ -1642,8 +1643,30 @@ static void btf_free_dtor_kfunc_tab(struct btf *btf)
 	btf->dtor_kfunc_tab = NULL;
 }
 
+static void btf_struct_metas_free(struct btf_struct_metas *tab)
+{
+	int i;
+
+	if (!tab)
+		return;
+	for (i = 0; i < tab->cnt; i++) {
+		btf_type_fields_free(tab->types[i].fields_tab);
+		kfree(tab->types[i].off_arr);
+	}
+	kfree(tab);
+}
+
+static void btf_free_struct_meta_tab(struct btf *btf)
+{
+	struct btf_struct_metas *tab = btf->struct_meta_tab;
+
+	btf_struct_metas_free(tab);
+	btf->struct_meta_tab = NULL;
+}
+
 static void btf_free(struct btf *btf)
 {
+	btf_free_struct_meta_tab(btf);
 	btf_free_dtor_kfunc_tab(btf);
 	btf_free_kfunc_set_tab(btf);
 	kvfree(btf->types);
@@ -3359,6 +3382,12 @@ static int btf_get_field_type(const char *name, u32 field_mask, u32 *seen_mask,
 			goto end;
 		}
 	}
+	if (field_mask & BPF_LIST_NODE) {
+		if (!strcmp(name, "bpf_list_node")) {
+			type = BPF_LIST_NODE;
+			goto end;
+		}
+	}
 	/* Only return BPF_KPTR when all other types with matchable names fail */
 	if (field_mask & BPF_KPTR) {
 		type = BPF_KPTR_REF;
@@ -3404,6 +3433,7 @@ static int btf_find_struct_field(const struct btf *btf,
 		switch (field_type) {
 		case BPF_SPIN_LOCK:
 		case BPF_TIMER:
+		case BPF_LIST_NODE:
 			ret = btf_find_struct(btf, member_type, off, sz, field_type,
 					      idx < info_cnt ? &info[idx] : &tmp);
 			if (ret < 0)
@@ -3466,6 +3496,7 @@ static int btf_find_datasec_var(const struct btf *btf, const struct btf_type *t,
 		switch (field_type) {
 		case BPF_SPIN_LOCK:
 		case BPF_TIMER:
+		case BPF_LIST_NODE:
 			ret = btf_find_struct(btf, var_type, off, sz, field_type,
 					      idx < info_cnt ? &info[idx] : &tmp);
 			if (ret < 0)
@@ -3676,6 +3707,8 @@ struct btf_type_fields *btf_parse_fields(const struct btf *btf,
 			if (ret < 0)
 				goto end;
 			break;
+		case BPF_LIST_NODE:
+			break;
 		default:
 			ret = -EFAULT;
 			goto end;
@@ -5143,6 +5176,118 @@ static int btf_parse_hdr(struct btf_verifier_env *env)
 	return btf_check_sec_info(env, btf_data_size);
 }
 
+static const char *local_kptr_fields[] = {
+	"bpf_spin_lock",
+	"bpf_list_head",
+	"bpf_list_node",
+};
+
+static struct btf_struct_metas *
+btf_parse_struct_metas(struct bpf_verifier_log *log, struct btf *btf)
+{
+	union {
+		struct btf_id_set set;
+		struct {
+			u32 _cnt;
+			u32 _ids[ARRAY_SIZE(local_kptr_fields)];
+		} _arr;
+	} lkf;
+	struct btf_struct_metas *tab = NULL;
+	int i, n, id, ret;
+
+	memset(&lkf, 0, sizeof(lkf));
+
+	for (i = 0; i < ARRAY_SIZE(local_kptr_fields); i++) {
+		/* Try to find whether this special type exists in user BTF, and
+		 * if so remember its ID so we can easily find it among members
+		 * of structs that we iterate in the next loop.
+		 */
+		id = btf_find_by_name_kind(btf, local_kptr_fields[i], BTF_KIND_STRUCT);
+		if (id < 0)
+			continue;
+		lkf.set.ids[lkf.set.cnt++] = id;
+	}
+
+	if (!lkf.set.cnt)
+		return NULL;
+	sort(&lkf.set.ids, lkf.set.cnt, sizeof(lkf.set.ids[0]), btf_id_cmp_func, NULL);
+
+	n = btf_nr_types(btf);
+	for (i = 1; i < n; i++) {
+		struct btf_type_fields_off *off_arr;
+		struct btf_type_fields *fields_tab;
+		const struct btf_member *member;
+		struct btf_struct_meta *type;
+		const struct btf_type *t;
+		int j;
+
+		t = btf_type_by_id(btf, i);
+		if (!t) {
+			ret = -EINVAL;
+			goto free;
+		}
+		if (!__btf_type_is_struct(t))
+			continue;
+
+		cond_resched();
+
+		for_each_member(j, t, member) {
+			if (btf_id_set_contains(&lkf.set, member->type))
+				goto parse;
+		}
+		continue;
+	parse:
+		if (!tab) {
+			tab = kzalloc(offsetof(struct btf_struct_metas, types[1]),
+				      GFP_KERNEL | __GFP_NOWARN);
+			if (!tab)
+				return ERR_PTR(-ENOMEM);
+		} else {
+			struct btf_struct_metas *new_tab;
+
+			new_tab = krealloc(tab, offsetof(struct btf_struct_metas, types[tab->cnt + 1]),
+					   GFP_KERNEL | __GFP_NOWARN);
+			if (!new_tab) {
+				ret = -ENOMEM;
+				goto free;
+			}
+			tab = new_tab;
+		}
+		type = &tab->types[tab->cnt];
+
+		type->btf_id = i;
+		fields_tab = btf_parse_fields(btf, t, BPF_SPIN_LOCK | BPF_LIST_HEAD | BPF_LIST_NODE, t->size);
+		if (IS_ERR_OR_NULL(fields_tab)) {
+			ret = PTR_ERR_OR_ZERO(fields_tab) ?: -EFAULT;
+			goto free;
+		}
+		off_arr = btf_parse_fields_off(fields_tab);
+		if (WARN_ON_ONCE(IS_ERR_OR_NULL(off_arr))) {
+			btf_type_fields_free(fields_tab);
+			ret = -EFAULT;
+			goto free;
+		}
+		type->fields_tab = fields_tab;
+		type->off_arr = off_arr;
+		tab->cnt++;
+	}
+	return tab;
+free:
+	btf_struct_metas_free(tab);
+	return ERR_PTR(ret);
+}
+
+struct btf_struct_meta *btf_find_struct_meta(const struct btf *btf, u32 btf_id)
+{
+	struct btf_struct_metas *tab;
+
+	BUILD_BUG_ON(offsetof(struct btf_struct_meta, btf_id) != 0);
+	tab = btf->struct_meta_tab;
+	if (!tab)
+		return NULL;
+	return bsearch(&btf_id, tab->types, tab->cnt, sizeof(tab->types[0]), btf_id_cmp_func);
+}
+
 static int btf_check_type_tags(struct btf_verifier_env *env,
 			       struct btf *btf, int start_id)
 {
@@ -5193,6 +5338,7 @@ static int btf_check_type_tags(struct btf_verifier_env *env,
 static struct btf *btf_parse(bpfptr_t btf_data, u32 btf_data_size,
 			     u32 log_level, char __user *log_ubuf, u32 log_size)
 {
+	struct btf_struct_metas *struct_meta_tab;
 	struct btf_verifier_env *env = NULL;
 	struct bpf_verifier_log *log;
 	struct btf *btf = NULL;
@@ -5261,15 +5407,24 @@ static struct btf *btf_parse(bpfptr_t btf_data, u32 btf_data_size,
 	if (err)
 		goto errout;
 
+	struct_meta_tab = btf_parse_struct_metas(log, btf);
+	if (IS_ERR(struct_meta_tab)) {
+		err = PTR_ERR(struct_meta_tab);
+		goto errout;
+	}
+	btf->struct_meta_tab = struct_meta_tab;
+
 	if (log->level && bpf_verifier_log_full(log)) {
 		err = -ENOSPC;
-		goto errout;
+		goto errout_meta;
 	}
 
 	btf_verifier_env_free(env);
 	refcount_set(&btf->refcnt, 1);
 	return btf;
 
+errout_meta:
+	btf_free_struct_meta_tab(btf);
 errout:
 	btf_verifier_env_free(env);
 	if (btf)
@@ -6030,6 +6185,28 @@ int btf_struct_access(struct bpf_verifier_log *log,
 	int err;
 	u32 id;
 
+	while (local_type) {
+		struct btf_struct_meta *meta;
+		struct btf_type_fields *tab;
+		int i;
+
+		meta = btf_find_struct_meta(btf, reg->btf_id);
+		if (!meta)
+			break;
+		tab = meta->fields_tab;
+		for (i = 0; i < tab->cnt; i++) {
+			struct btf_field *field = &tab->fields[i];
+			u32 offset = field->offset;
+			if (off < offset + btf_field_type_size(field->type) && offset < off + size) {
+				bpf_log(log,
+					"direct access to %s is disallowed\n",
+					btf_field_type_name(field->type));
+				return -EACCES;
+			}
+		}
+		break;
+	}
+
 	do {
 		err = btf_struct_walk(log, btf, t, off, size, &id, &tmp_flag);
 
@@ -7270,23 +7447,6 @@ bool btf_is_module(const struct btf *btf)
 	return btf->kernel_btf && strcmp(btf->name, "vmlinux") != 0;
 }
 
-static int btf_id_cmp_func(const void *a, const void *b)
-{
-	const int *pa = a, *pb = b;
-
-	return *pa - *pb;
-}
-
-bool btf_id_set_contains(const struct btf_id_set *set, u32 id)
-{
-	return bsearch(&id, set->ids, set->cnt, sizeof(u32), btf_id_cmp_func) != NULL;
-}
-
-static void *btf_id_set8_contains(const struct btf_id_set8 *set, u32 id)
-{
-	return bsearch(&id, set->pairs, set->cnt, sizeof(set->pairs[0]), btf_id_cmp_func);
-}
-
 enum {
 	BTF_MODULE_F_LIVE = (1 << 0),
 };
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 92486d777246..c60bf641301d 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -537,6 +537,7 @@ void btf_type_fields_free(struct btf_type_fields *tab)
 			btf_put(tab->fields[i].kptr.btf);
 			break;
 		case BPF_LIST_HEAD:
+		case BPF_LIST_NODE:
 			/* Nothing to release for bpf_list_head */
 			break;
 		default:
@@ -582,6 +583,7 @@ struct btf_type_fields *btf_type_fields_dup(const struct btf_type_fields *tab)
 			}
 			break;
 		case BPF_LIST_HEAD:
+		case BPF_LIST_NODE:
 			/* Nothing to acquire for bpf_list_head */
 			break;
 		default:
@@ -648,6 +650,8 @@ void bpf_obj_free_fields(const struct btf_type_fields *tab, void *obj)
 				continue;
 			bpf_list_head_free(field, field_ptr, obj + tab->spin_lock_off);
 			break;
+		case BPF_LIST_NODE:
+			break;
 		default:
 			WARN_ON_ONCE(1);
 			continue;
-- 
2.38.0



* [PATCH bpf-next v2 12/25] bpf: Verify ownership relationships for owning types
  2022-10-13  6:22 [PATCH bpf-next v2 00/25] Local kptrs, BPF linked lists Kumar Kartikeya Dwivedi
                   ` (10 preceding siblings ...)
  2022-10-13  6:22 ` [PATCH bpf-next v2 11/25] bpf: Recognize bpf_{spin_lock,list_head,list_node} in " Kumar Kartikeya Dwivedi
@ 2022-10-13  6:22 ` Kumar Kartikeya Dwivedi
  2022-10-13  6:22 ` [PATCH bpf-next v2 13/25] bpf: Support locking bpf_spin_lock in local kptr Kumar Kartikeya Dwivedi
                   ` (12 subsequent siblings)
  24 siblings, 0 replies; 52+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2022-10-13  6:22 UTC (permalink / raw)
  To: bpf
  Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Dave Marchevsky, Delyan Kratunov

Ensure that there can be no ownership cycles among different types by
way of having owning objects that can hold some other type as their
element. For instance, a map value can only hold local kptrs, but these
are allowed to have another bpf_list_head. To prevent unbounded
recursion while freeing resources, the elements of a bpf_list_head in a
local kptr that is itself part of a list in a map value can never have
a bpf_list_head.

Also, to make runtime destruction easier, once btf_struct_metas is fully
populated, we can stash the metadata of the value type directly in the
metadata of the list_head fields, as that allows easier access to the
value type's layout to destruct it at runtime from the btf_field entry
of the list head itself.
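
The ownership rule can be pictured with a small userspace sketch that
walks the "list head contains type X" chain and rejects chains that are
too deep, which also rejects cycles. This is not the code in this patch:
the one-list-head-per-type model, the depth limit passed by the caller,
and the names type_info and ownership_chain_ok are assumptions made for
illustration.

```c
#include <stdbool.h>

#define NO_LIST_HEAD	(-1)

/* Hypothetical model: each type has at most one list head, and elem is
 * the index of the type that list head contains, or NO_LIST_HEAD.
 */
struct type_info {
	const char *name;
	int elem;
};

/* Follow the ownership chain starting at root. Fail once more than
 * max_depth owned levels are seen; a cycle always exceeds the limit.
 */
static bool ownership_chain_ok(const struct type_info *types, int root,
			       int max_depth)
{
	int depth = 0;
	int cur;

	for (cur = types[root].elem; cur != NO_LIST_HEAD; cur = types[cur].elem) {
		if (++depth > max_depth)
			return false;
	}
	return true;
}
```

With the types from the previous patch's example (map_value owning foo,
foo owning bar), a limit of two levels accepts the chain, while a type
whose list head contains itself is rejected.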

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
---
 include/linux/bpf.h  |  1 +
 include/linux/btf.h  |  1 +
 kernel/bpf/btf.c     | 71 ++++++++++++++++++++++++++++++++++++++++++++
 kernel/bpf/syscall.c |  4 +++
 4 files changed, 77 insertions(+)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 76548a9d57db..b2419752542a 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -192,6 +192,7 @@ struct btf_field_list_head {
 	struct btf *btf;
 	u32 value_btf_id;
 	u32 node_offset;
+	struct btf_type_fields *value_tab;
 };
 
 struct btf_field {
diff --git a/include/linux/btf.h b/include/linux/btf.h
index b63c88de3135..4492636c3571 100644
--- a/include/linux/btf.h
+++ b/include/linux/btf.h
@@ -179,6 +179,7 @@ int btf_find_timer(const struct btf *btf, const struct btf_type *t);
 struct btf_type_fields *btf_parse_fields(const struct btf *btf,
 					 const struct btf_type *t,
 					 u32 field_mask, u32 value_size);
+int btf_check_and_fixup_fields(const struct btf *btf, struct btf_type_fields *tab);
 struct btf_type_fields_off *btf_parse_fields_off(struct btf_type_fields *tab);
 bool btf_type_is_void(const struct btf_type *t);
 s32 btf_find_by_name_kind(const struct btf *btf, const char *name, u8 kind);
diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c
index 6c4701f7c938..86ee5841d8dc 100644
--- a/kernel/bpf/btf.c
+++ b/kernel/bpf/btf.c
@@ -3730,6 +3730,67 @@ struct btf_type_fields *btf_parse_fields(const struct btf *btf,
 	return ERR_PTR(ret);
 }
 
+int btf_check_and_fixup_fields(const struct btf *btf, struct btf_type_fields *tab)
+{
+	int i;
+
+	/* There are two owning types, kptr_ref and bpf_list_head. The former
+	 * only supports storing kernel types, which can never store references
+	 * to program allocated local types, at least not yet. Hence we only need
+	 * to ensure that bpf_list_head ownership does not form cycles.
+	 */
+	if (IS_ERR_OR_NULL(tab) || !(tab->field_mask & BPF_LIST_HEAD))
+		return 0;
+	for (i = 0; i < tab->cnt; i++) {
+		struct btf_struct_meta *meta;
+		u32 btf_id;
+
+		if (!(tab->fields[i].type & BPF_LIST_HEAD))
+			continue;
+		btf_id = tab->fields[i].list_head.value_btf_id;
+		meta = btf_find_struct_meta(btf, btf_id);
+		if (!meta)
+			return -EFAULT;
+		tab->fields[i].list_head.value_tab = meta->fields_tab;
+
+		if (!(tab->field_mask & BPF_LIST_NODE))
+			continue;
+
+		/* We need to ensure ownership acyclicity among all types. The
+		 * proper way to do it would be to topologically sort all BTF
+		 * IDs based on the ownership edges, since there can be multiple
+		 * bpf_list_head in a type. Instead, we use the following
+		 * reasoning:
+		 *
+		 * - A type can only be owned by another type in user BTF if it
+		 *   has a bpf_list_node.
+		 * - A type can only _own_ another type in user BTF if it has a
+		 *   bpf_list_head.
+		 *
+		 * We ensure that if a type has both bpf_list_head and
+		 * bpf_list_node, its element types cannot be owning types.
+		 *
+		 * To ensure acyclicity:
+		 *
+		 * When A only has bpf_list_head, ownership chain can be:
+		 *	A -> B -> C
+		 * Where:
+		 * - B has both bpf_list_head and bpf_list_node.
+		 * - C only has bpf_list_node.
+		 *
+		 * When A has both bpf_list_head and bpf_list_node, some other
+		 * type already owns it in the BTF domain, hence it can not own
+		 * another owning type through any of the bpf_list_head edges.
+		 *	A -> B
+		 * Where:
+		 * - B only has bpf_list_node.
+		 */
+		if (meta->fields_tab->field_mask & BPF_LIST_HEAD)
+			return -ELOOP;
+	}
+	return 0;
+}
+
 static int btf_type_fields_off_cmp(const void *_a, const void *_b, const void *priv)
 {
 	const u32 a = *(const u32 *)_a;
@@ -5414,6 +5475,16 @@ static struct btf *btf_parse(bpfptr_t btf_data, u32 btf_data_size,
 	}
 	btf->struct_meta_tab = struct_meta_tab;
 
+	if (struct_meta_tab) {
+		int i;
+
+		for (i = 0; i < struct_meta_tab->cnt; i++) {
+			err = btf_check_and_fixup_fields(btf, struct_meta_tab->types[i].fields_tab);
+			if (err < 0)
+				goto errout_meta;
+		}
+	}
+
 	if (log->level && bpf_verifier_log_full(log)) {
 		err = -ENOSPC;
 		goto errout_meta;
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index c60bf641301d..c8e1bdcbc205 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -1043,6 +1043,10 @@ static int map_check_btf(struct bpf_map *map, const struct btf *btf,
 		}
 	}
 
+	ret = btf_check_and_fixup_fields(btf, map->fields_tab);
+	if (ret < 0)
+		goto free_map_tab;
+
 	if (map->ops->map_check_btf) {
 		ret = map->ops->map_check_btf(map, btf, key_type, value_type);
 		if (ret < 0)
-- 
2.38.0



* [PATCH bpf-next v2 13/25] bpf: Support locking bpf_spin_lock in local kptr
  2022-10-13  6:22 [PATCH bpf-next v2 00/25] Local kptrs, BPF linked lists Kumar Kartikeya Dwivedi
                   ` (11 preceding siblings ...)
  2022-10-13  6:22 ` [PATCH bpf-next v2 12/25] bpf: Verify ownership relationships for owning types Kumar Kartikeya Dwivedi
@ 2022-10-13  6:22 ` Kumar Kartikeya Dwivedi
  2022-10-13  6:22 ` [PATCH bpf-next v2 14/25] bpf: Allow locking bpf_spin_lock global variables Kumar Kartikeya Dwivedi
                   ` (11 subsequent siblings)
  24 siblings, 0 replies; 52+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2022-10-13  6:22 UTC (permalink / raw)
  To: bpf
  Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Dave Marchevsky, Delyan Kratunov

Allow locking a bpf_spin_lock embedded in a local kptr, in addition to
the already supported map value pointers. The handling is similar to
that of map values: preserve the reg->id of local kptrs as well, and
adjust process_spin_lock to work with non-PTR_TO_MAP_VALUE registers
and remember the id in the verifier state.
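
A simplified userspace model of the idea, for illustration only: the
field table is picked based on the register type, and the lock is only
accepted when the register points exactly at the lock's offset. All
names here (model_reg, model_fields, model_points_to_spin_lock) are
hypothetical and do not exist in the kernel; the real code looks the
table up through reg->map_ptr or btf_find_struct_meta().

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical register kinds, loosely mirroring the verifier's types. */
enum model_reg_type { MODEL_SCALAR, MODEL_MAP_VALUE, MODEL_LOCAL_KPTR };

struct model_fields {
	bool has_spin_lock;
	unsigned int spin_lock_off;	/* offset of the lock in the object */
};

struct model_reg {
	enum model_reg_type type;
	const struct model_fields *map_fields;   /* used for MODEL_MAP_VALUE */
	const struct model_fields *local_fields; /* used for MODEL_LOCAL_KPTR,
						  * NULL if no metadata */
	unsigned int off;			 /* constant offset in object */
};

/* Pick the field table that describes the object the register points to. */
static const struct model_fields *model_reg_fields(const struct model_reg *reg)
{
	switch (reg->type) {
	case MODEL_MAP_VALUE:
		return reg->map_fields;
	case MODEL_LOCAL_KPTR:
		return reg->local_fields;
	default:
		return NULL;
	}
}

/* True only if the register points exactly at an embedded spin lock. */
static bool model_points_to_spin_lock(const struct model_reg *reg)
{
	const struct model_fields *f = model_reg_fields(reg);

	return f && f->has_spin_lock && f->spin_lock_off == reg->off;
}
```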

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
---
 kernel/bpf/helpers.c  |  2 ++
 kernel/bpf/verifier.c | 70 ++++++++++++++++++++++++++++++++-----------
 2 files changed, 55 insertions(+), 17 deletions(-)

diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
index a2f2fe43916b..238103dc6c5e 100644
--- a/kernel/bpf/helpers.c
+++ b/kernel/bpf/helpers.c
@@ -336,6 +336,7 @@ const struct bpf_func_proto bpf_spin_lock_proto = {
 	.gpl_only	= false,
 	.ret_type	= RET_VOID,
 	.arg1_type	= ARG_PTR_TO_SPIN_LOCK,
+	.arg1_btf_id    = BPF_PTR_POISON,
 };
 
 static inline void __bpf_spin_unlock_irqrestore(struct bpf_spin_lock *lock)
@@ -358,6 +359,7 @@ const struct bpf_func_proto bpf_spin_unlock_proto = {
 	.gpl_only	= false,
 	.ret_type	= RET_VOID,
 	.arg1_type	= ARG_PTR_TO_SPIN_LOCK,
+	.arg1_btf_id    = BPF_PTR_POISON,
 };
 
 void copy_map_value_locked(struct bpf_map *map, void *dst, void *src,
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 6ee8c06c2080..5114cc97cdd4 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -453,8 +453,16 @@ static bool reg_type_not_null(enum bpf_reg_type type)
 
 static bool reg_may_point_to_spin_lock(const struct bpf_reg_state *reg)
 {
-	return reg->type == PTR_TO_MAP_VALUE &&
-		btf_type_fields_has_field(reg->map_ptr->fields_tab, BPF_SPIN_LOCK);
+	struct btf_type_fields *tab = NULL;
+
+	if (reg->type == PTR_TO_MAP_VALUE) {
+		tab = reg->map_ptr->fields_tab;
+	} else if (reg->type == (PTR_TO_BTF_ID | MEM_TYPE_LOCAL)) {
+		struct btf_struct_meta *meta = btf_find_struct_meta(reg->btf, reg->btf_id);
+		if (meta)
+			tab = meta->fields_tab;
+	}
+	return btf_type_fields_has_field(tab, BPF_SPIN_LOCK);
 }
 
 static bool type_is_rdonly_mem(u32 type)
@@ -5412,8 +5420,10 @@ static int process_spin_lock(struct bpf_verifier_env *env, int regno,
 	struct bpf_reg_state *regs = cur_regs(env), *reg = &regs[regno];
 	struct bpf_verifier_state *cur = env->cur_state;
 	bool is_const = tnum_is_const(reg->var_off);
-	struct bpf_map *map = reg->map_ptr;
 	u64 val = reg->var_off.value;
+	struct btf_type_fields *tab = NULL;
+	struct bpf_map *map = NULL;
+	struct btf *btf = NULL;
 
 	if (!is_const) {
 		verbose(env,
@@ -5421,19 +5431,32 @@ static int process_spin_lock(struct bpf_verifier_env *env, int regno,
 			regno);
 		return -EINVAL;
 	}
-	if (!map->btf) {
-		verbose(env,
-			"map '%s' has to have BTF in order to use bpf_spin_lock\n",
-			map->name);
-		return -EINVAL;
+	if (reg->type == PTR_TO_MAP_VALUE) {
+		map = reg->map_ptr;
+		if (!map->btf) {
+			verbose(env,
+				"map '%s' has to have BTF in order to use bpf_spin_lock\n",
+				map->name);
+			return -EINVAL;
+		}
+		tab = map->fields_tab;
+	} else {
+		struct btf_struct_meta *meta;
+
+		btf = reg->btf;
+		meta = btf_find_struct_meta(reg->btf, reg->btf_id);
+		if (meta)
+			tab = meta->fields_tab;
 	}
-	if (!btf_type_fields_has_field(map->fields_tab, BPF_SPIN_LOCK)) {
-		verbose(env, "map '%s' has no valid bpf_spin_lock\n", map->name);
+
+	if (!btf_type_fields_has_field(tab, BPF_SPIN_LOCK)) {
+		verbose(env, "%s '%s' has no valid bpf_spin_lock\n", map ? "map" : "local",
+			map ? map->name : "kptr");
 		return -EINVAL;
 	}
-	if (map->fields_tab->spin_lock_off != val + reg->off) {
+	if (tab->spin_lock_off != val + reg->off) {
 		verbose(env, "off %lld doesn't point to 'struct bpf_spin_lock' that is at %d\n",
-			val + reg->off, map->fields_tab->spin_lock_off);
+			val + reg->off, tab->spin_lock_off);
 		return -EINVAL;
 	}
 	if (is_lock) {
@@ -5649,13 +5672,19 @@ static const struct bpf_reg_types int_ptr_types = {
 	},
 };
 
+static const struct bpf_reg_types spin_lock_types = {
+	.types = {
+		PTR_TO_MAP_VALUE,
+		PTR_TO_BTF_ID | MEM_TYPE_LOCAL,
+	}
+};
+
 static const struct bpf_reg_types fullsock_types = { .types = { PTR_TO_SOCKET } };
 static const struct bpf_reg_types scalar_types = { .types = { SCALAR_VALUE } };
 static const struct bpf_reg_types context_types = { .types = { PTR_TO_CTX } };
 static const struct bpf_reg_types alloc_mem_types = { .types = { PTR_TO_MEM | MEM_ALLOC } };
 static const struct bpf_reg_types const_map_ptr_types = { .types = { CONST_PTR_TO_MAP } };
 static const struct bpf_reg_types btf_ptr_types = { .types = { PTR_TO_BTF_ID } };
-static const struct bpf_reg_types spin_lock_types = { .types = { PTR_TO_MAP_VALUE } };
 static const struct bpf_reg_types percpu_btf_ptr_types = { .types = { PTR_TO_BTF_ID | MEM_PERCPU } };
 static const struct bpf_reg_types func_ptr_types = { .types = { PTR_TO_FUNC } };
 static const struct bpf_reg_types stack_ptr_types = { .types = { PTR_TO_STACK } };
@@ -5780,6 +5809,11 @@ static int check_reg_type(struct bpf_verifier_env *env, u32 regno,
 				return -EACCES;
 			}
 		}
+	} else if (reg->type == (PTR_TO_BTF_ID | MEM_TYPE_LOCAL)) {
+		if (meta->func_id != BPF_FUNC_spin_lock && meta->func_id != BPF_FUNC_spin_unlock) {
+			verbose(env, "verifier internal error: unimplemented handling of local kptr\n");
+			return -EFAULT;
+		}
 	}
 
 	return 0;
@@ -5896,7 +5930,8 @@ static int check_func_arg(struct bpf_verifier_env *env, u32 arg,
 		goto skip_type_check;
 
 	/* arg_btf_id and arg_size are in a union. */
-	if (base_type(arg_type) == ARG_PTR_TO_BTF_ID)
+	if (base_type(arg_type) == ARG_PTR_TO_BTF_ID ||
+	    base_type(arg_type) == ARG_PTR_TO_SPIN_LOCK)
 		arg_btf_id = fn->arg_btf_id[arg];
 
 	err = check_reg_type(env, regno, arg_type, arg_btf_id, meta);
@@ -6504,9 +6539,10 @@ static bool check_btf_id_ok(const struct bpf_func_proto *fn)
 	int i;
 
 	for (i = 0; i < ARRAY_SIZE(fn->arg_type); i++) {
-		if (base_type(fn->arg_type[i]) == ARG_PTR_TO_BTF_ID && !fn->arg_btf_id[i])
-			return false;
-
+		if (base_type(fn->arg_type[i]) == ARG_PTR_TO_BTF_ID)
+			return !!fn->arg_btf_id[i];
+		if (base_type(fn->arg_type[i]) == ARG_PTR_TO_SPIN_LOCK)
+			return fn->arg_btf_id[i] == BPF_PTR_POISON;
 		if (base_type(fn->arg_type[i]) != ARG_PTR_TO_BTF_ID && fn->arg_btf_id[i] &&
 		    /* arg_btf_id and arg_size are in a union. */
 		    (base_type(fn->arg_type[i]) != ARG_PTR_TO_MEM ||
-- 
2.38.0



* [PATCH bpf-next v2 14/25] bpf: Allow locking bpf_spin_lock global variables
  2022-10-13  6:22 [PATCH bpf-next v2 00/25] Local kptrs, BPF linked lists Kumar Kartikeya Dwivedi
                   ` (12 preceding siblings ...)
  2022-10-13  6:22 ` [PATCH bpf-next v2 13/25] bpf: Support locking bpf_spin_lock in local kptr Kumar Kartikeya Dwivedi
@ 2022-10-13  6:22 ` Kumar Kartikeya Dwivedi
  2022-10-13  6:22 ` [PATCH bpf-next v2 15/25] bpf: Rewrite kfunc argument handling Kumar Kartikeya Dwivedi
                   ` (10 subsequent siblings)
  24 siblings, 0 replies; 52+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2022-10-13  6:22 UTC (permalink / raw)
  To: bpf
  Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Dave Marchevsky, Delyan Kratunov

Global variables reside in maps accessible using direct_value_addr
callbacks, so giving each load instruction's rewrite a unique reg->id
prevents us from holding locks which are global, because two loads of
the same variable end up with different ids.

This is not great, so refactor active_spin_lock into two separate
fields, active_spin_lock_ptr and active_spin_lock_id, which together
are generic enough to support global variables, map lookups, and local
kptr registers at the same time.

Held vs non-held is indicated by active_spin_lock_ptr, which stores the
reg->map_ptr or reg->btf pointer of the register used to take the spin
lock. The active_spin_lock_id also needs to be compared to ensure that
bpf_spin_unlock is for the same register.

Next, pseudo load instructions are not given a unique reg->id, as they
are doing lookup for the same map value (max_entries is never greater
than 1).

Essentially, we consider that the tuple of (active_spin_lock_ptr,
active_spin_lock_id) will always be unique for any kind of argument to
bpf_spin_{lock,unlock}.

Note that this can be extended in the future to also remember offset
used for locking, so that we can introduce multiple bpf_spin_lock fields
in the same allocation.
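
The tuple-based tracking can be sketched as a small userspace model.
This is illustrative only: model_state, model_lock and model_unlock are
hypothetical names, and the owner pointer stands in for either
reg->map_ptr or reg->btf.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical model of the verifier's lock-tracking state. */
struct model_state {
	const void *lock_ptr;	/* owning map or BTF, NULL if no lock held */
	unsigned int lock_id;	/* id of the register used for locking */
};

static int model_lock(struct model_state *st, const void *owner,
		      unsigned int id)
{
	if (st->lock_ptr)
		return -EINVAL;	/* taking a second lock is rejected */
	st->lock_ptr = owner;
	st->lock_id = id;
	return 0;
}

static int model_unlock(struct model_state *st, const void *owner,
			unsigned int id)
{
	if (!st->lock_ptr)
		return -EINVAL;	/* unlock without taking a lock */
	if (st->lock_ptr != owner || st->lock_id != id)
		return -EINVAL;	/* unlock of a different lock */
	st->lock_ptr = NULL;
	st->lock_id = 0;
	return 0;
}
```

In this model a global variable always presents the same (owner, 0)
pair, so separate loads of it can lock and unlock, whereas two distinct
lookups into the same map carry different ids and do not match each
other.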

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
---
 include/linux/bpf_verifier.h |  3 ++-
 kernel/bpf/verifier.c        | 39 +++++++++++++++++++++++++-----------
 2 files changed, 29 insertions(+), 13 deletions(-)

diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h
index 9e1e6965f407..c283484f8b94 100644
--- a/include/linux/bpf_verifier.h
+++ b/include/linux/bpf_verifier.h
@@ -323,7 +323,8 @@ struct bpf_verifier_state {
 	u32 branches;
 	u32 insn_idx;
 	u32 curframe;
-	u32 active_spin_lock;
+	void *active_spin_lock_ptr;
+	u32 active_spin_lock_id;
 	bool speculative;
 
 	/* first and last insn idx of this verifier state */
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 5114cc97cdd4..41a5cc5fbcd4 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -1201,7 +1201,8 @@ static int copy_verifier_state(struct bpf_verifier_state *dst_state,
 	}
 	dst_state->speculative = src->speculative;
 	dst_state->curframe = src->curframe;
-	dst_state->active_spin_lock = src->active_spin_lock;
+	dst_state->active_spin_lock_ptr = src->active_spin_lock_ptr;
+	dst_state->active_spin_lock_id = src->active_spin_lock_id;
 	dst_state->branches = src->branches;
 	dst_state->parent = src->parent;
 	dst_state->first_insn_idx = src->first_insn_idx;
@@ -5460,22 +5461,35 @@ static int process_spin_lock(struct bpf_verifier_env *env, int regno,
 		return -EINVAL;
 	}
 	if (is_lock) {
-		if (cur->active_spin_lock) {
+		if (cur->active_spin_lock_ptr) {
 			verbose(env,
 				"Locking two bpf_spin_locks are not allowed\n");
 			return -EINVAL;
 		}
-		cur->active_spin_lock = reg->id;
+		if (map)
+			cur->active_spin_lock_ptr = map;
+		else
+			cur->active_spin_lock_ptr = btf;
+		cur->active_spin_lock_id = reg->id;
 	} else {
-		if (!cur->active_spin_lock) {
+		void *ptr;
+
+		if (map)
+			ptr = map;
+		else
+			ptr = btf;
+
+		if (!cur->active_spin_lock_ptr) {
 			verbose(env, "bpf_spin_unlock without taking a lock\n");
 			return -EINVAL;
 		}
-		if (cur->active_spin_lock != reg->id) {
+		if (cur->active_spin_lock_ptr != ptr ||
+		    cur->active_spin_lock_id != reg->id) {
 			verbose(env, "bpf_spin_unlock of different lock\n");
 			return -EINVAL;
 		}
-		cur->active_spin_lock = 0;
+		cur->active_spin_lock_ptr = NULL;
+		cur->active_spin_lock_id = 0;
 	}
 	return 0;
 }
@@ -10382,8 +10396,8 @@ static int check_ld_imm(struct bpf_verifier_env *env, struct bpf_insn *insn)
 	    insn->src_reg == BPF_PSEUDO_MAP_IDX_VALUE) {
 		dst_reg->type = PTR_TO_MAP_VALUE;
 		dst_reg->off = aux->map_off;
-		if (btf_type_fields_has_field(map->fields_tab, BPF_SPIN_LOCK))
-			dst_reg->id = ++env->id_gen;
+		WARN_ON_ONCE(map->max_entries != 1);
+		/* We want reg->id to be same (0) as map_value is not distinct */
 	} else if (insn->src_reg == BPF_PSEUDO_MAP_FD ||
 		   insn->src_reg == BPF_PSEUDO_MAP_IDX) {
 		dst_reg->type = CONST_PTR_TO_MAP;
@@ -10461,7 +10475,7 @@ static int check_ld_abs(struct bpf_verifier_env *env, struct bpf_insn *insn)
 		return err;
 	}
 
-	if (env->cur_state->active_spin_lock) {
+	if (env->cur_state->active_spin_lock_ptr) {
 		verbose(env, "BPF_LD_[ABS|IND] cannot be used inside bpf_spin_lock-ed region\n");
 		return -EINVAL;
 	}
@@ -11727,7 +11741,8 @@ static bool states_equal(struct bpf_verifier_env *env,
 	if (old->speculative && !cur->speculative)
 		return false;
 
-	if (old->active_spin_lock != cur->active_spin_lock)
+	if (old->active_spin_lock_ptr != cur->active_spin_lock_ptr ||
+	    old->active_spin_lock_id != cur->active_spin_lock_id)
 		return false;
 
 	/* for states to be equal callsites have to be the same
@@ -12366,7 +12381,7 @@ static int do_check(struct bpf_verifier_env *env)
 					return -EINVAL;
 				}
 
-				if (env->cur_state->active_spin_lock &&
+				if (env->cur_state->active_spin_lock_ptr &&
 				    (insn->src_reg == BPF_PSEUDO_CALL ||
 				     insn->imm != BPF_FUNC_spin_unlock)) {
 					verbose(env, "function calls are not allowed while holding a lock\n");
@@ -12403,7 +12418,7 @@ static int do_check(struct bpf_verifier_env *env)
 					return -EINVAL;
 				}
 
-				if (env->cur_state->active_spin_lock) {
+				if (env->cur_state->active_spin_lock_ptr) {
 					verbose(env, "bpf_spin_unlock is missing\n");
 					return -EINVAL;
 				}
-- 
2.38.0



* [PATCH bpf-next v2 15/25] bpf: Rewrite kfunc argument handling
  2022-10-13  6:22 [PATCH bpf-next v2 00/25] Local kptrs, BPF linked lists Kumar Kartikeya Dwivedi
                   ` (13 preceding siblings ...)
  2022-10-13  6:22 ` [PATCH bpf-next v2 14/25] bpf: Allow locking bpf_spin_lock global variables Kumar Kartikeya Dwivedi
@ 2022-10-13  6:22 ` Kumar Kartikeya Dwivedi
  2022-10-13 13:48   ` kernel test robot
  2022-10-13  6:22 ` [PATCH bpf-next v2 16/25] bpf: Drop kfunc bits from btf_check_func_arg_match Kumar Kartikeya Dwivedi
                   ` (9 subsequent siblings)
  24 siblings, 1 reply; 52+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2022-10-13  6:22 UTC (permalink / raw)
  To: bpf
  Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Dave Marchevsky, Delyan Kratunov

As we continue to add more features, argument types, kfunc flags, and
different extensions to kfuncs, the code to verify the correctness of
the kfunc prototype with respect to the passed-in registers has become
ad hoc and ugly to read.

To make life easier, and to make a very clear split between different
stages of argument processing, move all the code into verifier.c and
refactor it into easier-to-read helpers and functions.

This also makes it easier to share code within the verifier with kfunc
argument processing. This will be increasingly useful in later patches,
as we are now moving to implement very core BPF helpers as kfuncs, to
keep them experimental before baking them into UAPI.

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
---
 include/linux/btf.h                           |  29 +
 kernel/bpf/btf.c                              |  16 +-
 kernel/bpf/verifier.c                         | 546 +++++++++++++++++-
 .../bpf/prog_tests/kfunc_dynptr_param.c       |   2 +-
 tools/testing/selftests/bpf/verifier/calls.c  |   4 +-
 .../selftests/bpf/verifier/ref_tracking.c     |   4 +-
 6 files changed, 568 insertions(+), 33 deletions(-)

diff --git a/include/linux/btf.h b/include/linux/btf.h
index 4492636c3571..022c70a62363 100644
--- a/include/linux/btf.h
+++ b/include/linux/btf.h
@@ -339,6 +339,16 @@ static inline bool btf_type_is_struct(const struct btf_type *t)
 	return kind == BTF_KIND_STRUCT || kind == BTF_KIND_UNION;
 }
 
+static inline bool __btf_type_is_struct(const struct btf_type *t)
+{
+	return BTF_INFO_KIND(t->info) == BTF_KIND_STRUCT;
+}
+
+static inline bool btf_type_is_array(const struct btf_type *t)
+{
+	return BTF_INFO_KIND(t->info) == BTF_KIND_ARRAY;
+}
+
 static inline u16 btf_type_vlen(const struct btf_type *t)
 {
 	return BTF_INFO_VLEN(t->info);
@@ -442,6 +452,7 @@ static inline void *btf_id_set8_contains(const struct btf_id_set8 *set, u32 id)
 
 #ifdef CONFIG_BPF_SYSCALL
 struct bpf_prog;
+struct bpf_verifier_log;
 
 const struct btf_type *btf_type_by_id(const struct btf *btf, u32 type_id);
 const char *btf_name_by_offset(const struct btf *btf, u32 offset);
@@ -456,6 +467,12 @@ s32 btf_find_dtor_kfunc(struct btf *btf, u32 btf_id);
 int register_btf_id_dtor_kfuncs(const struct btf_id_dtor_kfunc *dtors, u32 add_cnt,
 				struct module *owner);
 struct btf_struct_meta *btf_find_struct_meta(const struct btf *btf, u32 btf_id);
+const struct btf_member *
+btf_get_prog_ctx_type(struct bpf_verifier_log *log, const struct btf *btf,
+		      const struct btf_type *t, enum bpf_prog_type prog_type,
+		      int arg);
+bool btf_types_are_same(const struct btf *btf1, u32 id1,
+			const struct btf *btf2, u32 id2);
 #else
 static inline const struct btf_type *btf_type_by_id(const struct btf *btf,
 						    u32 type_id)
@@ -491,6 +508,18 @@ static inline struct btf_struct_meta *btf_find_struct_meta(const struct btf *btf
 {
 	return NULL;
 }
+static inline const struct btf_member *
+btf_get_prog_ctx_type(struct bpf_verifier_log *log, const struct btf *btf,
+		      const struct btf_type *t, enum bpf_prog_type prog_type,
+		      int arg)
+{
+	return NULL;
+}
+static inline bool btf_types_are_same(const struct btf *btf1, u32 id1,
+				      const struct btf *btf2, u32 id2)
+{
+	return false;
+}
 #endif
 
 static inline bool btf_type_is_struct_ptr(struct btf *btf, const struct btf_type *t)
diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c
index 86ee5841d8dc..56757ec79c1a 100644
--- a/kernel/bpf/btf.c
+++ b/kernel/bpf/btf.c
@@ -478,16 +478,6 @@ static bool btf_type_nosize_or_null(const struct btf_type *t)
 	return !t || btf_type_nosize(t);
 }
 
-static bool __btf_type_is_struct(const struct btf_type *t)
-{
-	return BTF_INFO_KIND(t->info) == BTF_KIND_STRUCT;
-}
-
-static bool btf_type_is_array(const struct btf_type *t)
-{
-	return BTF_INFO_KIND(t->info) == BTF_KIND_ARRAY;
-}
-
 static bool btf_type_is_datasec(const struct btf_type *t)
 {
 	return BTF_INFO_KIND(t->info) == BTF_KIND_DATASEC;
@@ -5537,7 +5527,7 @@ static u8 bpf_ctx_convert_map[] = {
 #undef BPF_MAP_TYPE
 #undef BPF_LINK_TYPE
 
-static const struct btf_member *
+const struct btf_member *
 btf_get_prog_ctx_type(struct bpf_verifier_log *log, const struct btf *btf,
 		      const struct btf_type *t, enum bpf_prog_type prog_type,
 		      int arg)
@@ -6322,8 +6312,8 @@ int btf_struct_access(struct bpf_verifier_log *log,
  * end up with two different module BTFs, but IDs point to the common type in
  * vmlinux BTF.
  */
-static bool btf_types_are_same(const struct btf *btf1, u32 id1,
-			       const struct btf *btf2, u32 id2)
+bool btf_types_are_same(const struct btf *btf1, u32 id1,
+			const struct btf *btf2, u32 id2)
 {
 	if (id1 != id2)
 		return false;
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 41a5cc5fbcd4..e29ea51276cb 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -7664,19 +7664,522 @@ static void mark_btf_func_reg_size(struct bpf_verifier_env *env, u32 regno,
 	}
 }
 
+struct bpf_kfunc_call_arg_meta {
+	/* In parameters */
+	struct btf *btf;
+	u32 func_id;
+	u32 kfunc_flags;
+	const struct btf_type *func_proto;
+	const char *func_name;
+	/* Out parameters */
+	u32 ref_obj_id;
+	u8 release_regno;
+	bool r0_rdonly;
+	u64 r0_size;
+};
+
+static bool is_kfunc_acquire(struct bpf_kfunc_call_arg_meta *meta)
+{
+	return meta->kfunc_flags & KF_ACQUIRE;
+}
+
+static bool is_kfunc_ret_null(struct bpf_kfunc_call_arg_meta *meta)
+{
+	return meta->kfunc_flags & KF_RET_NULL;
+}
+
+static bool is_kfunc_release(struct bpf_kfunc_call_arg_meta *meta)
+{
+	return meta->kfunc_flags & KF_RELEASE;
+}
+
+static bool is_kfunc_trusted_args(struct bpf_kfunc_call_arg_meta *meta)
+{
+	return meta->kfunc_flags & KF_TRUSTED_ARGS;
+}
+
+static bool is_kfunc_sleepable(struct bpf_kfunc_call_arg_meta *meta)
+{
+	return meta->kfunc_flags & KF_SLEEPABLE;
+}
+
+static bool is_kfunc_destructive(struct bpf_kfunc_call_arg_meta *meta)
+{
+	return meta->kfunc_flags & KF_DESTRUCTIVE;
+}
+
+static bool is_kfunc_arg_kptr_get(struct bpf_kfunc_call_arg_meta *meta, int arg)
+{
+	return arg == 0 && (meta->kfunc_flags & KF_KPTR_GET);
+}
+
+static bool is_kfunc_arg_mem_size(const struct btf *btf,
+				  const struct btf_param *arg,
+				  const struct bpf_reg_state *reg)
+{
+	int len, sfx_len = sizeof("__sz") - 1;
+	const struct btf_type *t;
+	const char *param_name;
+
+	t = btf_type_skip_modifiers(btf, arg->type, NULL);
+	if (!btf_type_is_scalar(t) || reg->type != SCALAR_VALUE)
+		return false;
+
+	/* In the future, this can be ported to use BTF tagging */
+	param_name = btf_name_by_offset(btf, arg->name_off);
+	if (str_is_empty(param_name))
+		return false;
+	len = strlen(param_name);
+	if (len < sfx_len)
+		return false;
+	param_name += len - sfx_len;
+	if (strncmp(param_name, "__sz", sfx_len))
+		return false;
+
+	return true;
+}
+
+static bool is_kfunc_arg_ret_buf_size(const struct btf *btf,
+				      const struct btf_param *arg,
+				      const struct bpf_reg_state *reg,
+				      const char *name)
+{
+	int len, target_len = strlen(name);
+	const struct btf_type *t;
+	const char *param_name;
+
+	t = btf_type_skip_modifiers(btf, arg->type, NULL);
+	if (!btf_type_is_scalar(t) || reg->type != SCALAR_VALUE)
+		return false;
+
+	param_name = btf_name_by_offset(btf, arg->name_off);
+	if (str_is_empty(param_name))
+		return false;
+	len = strlen(param_name);
+	if (len != target_len)
+		return false;
+	if (strcmp(param_name, name))
+		return false;
+
+	return true;
+}
+
+enum {
+	KF_ARG_DYNPTR_ID,
+};
+
+BTF_ID_LIST(kf_arg_btf_ids)
+BTF_ID(struct, bpf_dynptr_kern)
+
+static bool is_kfunc_arg_dynptr(const struct btf *btf,
+				const struct btf_param *arg)
+{
+	const struct btf_type *t;
+	u32 res_id;
+
+	t = btf_type_skip_modifiers(btf, arg->type, NULL);
+	if (!t)
+		return false;
+	if (!btf_type_is_ptr(t))
+		return false;
+	t = btf_type_skip_modifiers(btf, t->type, &res_id);
+	if (!t)
+		return false;
+	return btf_types_are_same(btf, res_id, btf_vmlinux, kf_arg_btf_ids[KF_ARG_DYNPTR_ID]);
+}
+
+/* Returns true if struct is composed of scalars, 4 levels of nesting allowed */
+static bool __btf_type_is_scalar_struct(struct bpf_verifier_env *env,
+					const struct btf *btf,
+					const struct btf_type *t, int rec)
+{
+	const struct btf_type *member_type;
+	const struct btf_member *member;
+	u32 i;
+
+	if (!btf_type_is_struct(t))
+		return false;
+
+	for_each_member(i, t, member) {
+		const struct btf_array *array;
+
+		member_type = btf_type_skip_modifiers(btf, member->type, NULL);
+		if (btf_type_is_struct(member_type)) {
+			if (rec >= 3) {
+				verbose(env, "max struct nesting depth exceeded\n");
+				return false;
+			}
+			if (!__btf_type_is_scalar_struct(env, btf, member_type, rec + 1))
+				return false;
+			continue;
+		}
+		if (btf_type_is_array(member_type)) {
+			array = btf_array(member_type);
+			if (!array->nelems)
+				return false;
+			member_type = btf_type_skip_modifiers(btf, array->type, NULL);
+			if (!btf_type_is_scalar(member_type))
+				return false;
+			continue;
+		}
+		if (!btf_type_is_scalar(member_type))
+			return false;
+	}
+	return true;
+}
+
+
+static u32 *reg2btf_ids[__BPF_REG_TYPE_MAX] = {
+#ifdef CONFIG_NET
+	[PTR_TO_SOCKET] = &btf_sock_ids[BTF_SOCK_TYPE_SOCK],
+	[PTR_TO_SOCK_COMMON] = &btf_sock_ids[BTF_SOCK_TYPE_SOCK_COMMON],
+	[PTR_TO_TCP_SOCK] = &btf_sock_ids[BTF_SOCK_TYPE_TCP],
+#endif
+};
+
+enum kfunc_ptr_arg_type {
+	KF_ARG_PTR_TO_CTX,
+	KF_ARG_PTR_TO_BTF_ID,	     /* Also covers reg2btf_ids conversions */
+	KF_ARG_PTR_TO_KPTR_STRONG,   /* PTR_TO_KPTR but type specific */
+	KF_ARG_PTR_TO_DYNPTR,
+	KF_ARG_PTR_TO_MEM,
+	KF_ARG_PTR_TO_MEM_SIZE,	     /* Size derived from next argument, skip it */
+};
+
+enum kfunc_ptr_arg_type get_kfunc_ptr_arg_type(struct bpf_verifier_env *env,
+						struct bpf_kfunc_call_arg_meta *meta,
+						const struct btf_type *t,
+						const struct btf_type *ref_t,
+						const char *ref_tname,
+						const struct btf_param *args,
+						int argno, int nargs)
+{
+	u32 regno = argno + 1;
+	struct bpf_reg_state *regs = cur_regs(env);
+	struct bpf_reg_state *reg = &regs[regno];
+	bool arg_mem_size = false;
+
+	/* In this function, we verify the kfunc's BTF as per the argument type,
+	 * leaving the rest of the verification with respect to the register
+	 * type to our caller. When a set of conditions hold in the BTF type of
+	 * arguments, we resolve it to a known kfunc_ptr_arg_type.
+	 */
+	if (btf_get_prog_ctx_type(&env->log, meta->btf, t, resolve_prog_type(env->prog), argno))
+		return KF_ARG_PTR_TO_CTX;
+
+	if ((base_type(reg->type) == PTR_TO_BTF_ID || reg2btf_ids[base_type(reg->type)])) {
+		if (!btf_type_is_struct(ref_t)) {
+			verbose(env, "kernel function %s args#%d pointer type %s %s is not supported\n",
+				meta->func_name, argno, btf_type_str(ref_t), ref_tname);
+			return -EINVAL;
+		}
+		return KF_ARG_PTR_TO_BTF_ID;
+	}
+
+	if (is_kfunc_arg_kptr_get(meta, argno)) {
+		if (!btf_type_is_ptr(ref_t)) {
+			verbose(env, "arg#0 BTF type must be a double pointer for kptr_get kfunc\n");
+			return -EINVAL;
+		}
+		ref_t = btf_type_by_id(meta->btf, ref_t->type);
+		ref_tname = btf_name_by_offset(meta->btf, ref_t->name_off);
+		if (!btf_type_is_struct(ref_t)) {
+			verbose(env, "kernel function %s args#0 pointer type %s %s is not supported\n",
+				meta->func_name, btf_type_str(ref_t), ref_tname);
+			return -EINVAL;
+		}
+		return KF_ARG_PTR_TO_KPTR_STRONG;
+	}
+
+	if (is_kfunc_arg_dynptr(meta->btf, &args[argno]))
+		return KF_ARG_PTR_TO_DYNPTR;
+
+	if (argno + 1 < nargs && is_kfunc_arg_mem_size(meta->btf, &args[argno + 1], &regs[regno + 1]))
+		arg_mem_size = true;
+
+	/* This is the catch all argument type of register types supported by
+	 * check_helper_mem_access. However, we only allow when argument type is
+	 * pointer to scalar, or struct composed (recursively) of scalars. When
+	 * arg_mem_size is true, the pointer can be void *.
+	 */
+	if (!btf_type_is_scalar(ref_t) && !__btf_type_is_scalar_struct(env, meta->btf, ref_t, 0) &&
+	    (arg_mem_size ? !btf_type_is_void(ref_t) : 1)) {
+		verbose(env, "arg#%d pointer type %s %s must point to %sscalar, or struct with scalar\n",
+			argno, btf_type_str(ref_t), ref_tname, arg_mem_size ? "void, " : "");
+		return -EINVAL;
+	}
+	return arg_mem_size ? KF_ARG_PTR_TO_MEM_SIZE : KF_ARG_PTR_TO_MEM;
+}
+
+static int process_kf_arg_ptr_to_btf_id(struct bpf_verifier_env *env,
+					struct bpf_reg_state *reg,
+					const struct btf_type *ref_t,
+					const char *ref_tname, u32 ref_id,
+					struct bpf_kfunc_call_arg_meta *meta,
+					int argno)
+{
+	const struct btf_type *reg_ref_t;
+	bool strict_type_match = false;
+	const struct btf *reg_btf;
+	const char *reg_ref_tname;
+	u32 reg_ref_id;
+
+	if (reg->type == PTR_TO_BTF_ID) {
+		reg_btf = reg->btf;
+		reg_ref_id = reg->btf_id;
+	} else {
+		reg_btf = btf_vmlinux;
+		reg_ref_id = *reg2btf_ids[base_type(reg->type)];
+	}
+
+	if (is_kfunc_trusted_args(meta) || (is_kfunc_release(meta) && reg->ref_obj_id))
+		strict_type_match = true;
+
+	reg_ref_t = btf_type_skip_modifiers(reg_btf, reg_ref_id, &reg_ref_id);
+	reg_ref_tname = btf_name_by_offset(reg_btf, reg_ref_t->name_off);
+	if (!btf_struct_ids_match(&env->log, reg_btf, reg_ref_id, reg->off, meta->btf, ref_id, strict_type_match)) {
+		verbose(env, "kernel function %s args#%d expected pointer to %s %s but R%d has a pointer to %s %s\n",
+			meta->func_name, argno, btf_type_str(ref_t), ref_tname, argno + 1,
+			btf_type_str(reg_ref_t), reg_ref_tname);
+		return -EINVAL;
+	}
+	return 0;
+}
+
+static int process_kf_arg_ptr_to_kptr_strong(struct bpf_verifier_env *env,
+					     struct bpf_reg_state *reg,
+					     const struct btf_type *ref_t,
+					     const char *ref_tname,
+					     struct bpf_kfunc_call_arg_meta *meta,
+					     int argno)
+{
+	struct btf_field *kptr_field;
+
+	/* check_func_arg_reg_off allows var_off for
+	 * PTR_TO_MAP_VALUE, but we need fixed offset to find
+	 * off_desc.
+	 */
+	if (!tnum_is_const(reg->var_off)) {
+		verbose(env, "arg#0 must have constant offset\n");
+		return -EINVAL;
+	}
+
+	kptr_field = btf_type_fields_find(reg->map_ptr->fields_tab, reg->off + reg->var_off.value, BPF_KPTR);
+	if (!kptr_field || kptr_field->type != BPF_KPTR_REF) {
+		verbose(env, "arg#0 no referenced kptr at map value offset=%llu\n",
+			reg->off + reg->var_off.value);
+		return -EINVAL;
+	}
+
+	if (!btf_struct_ids_match(&env->log, meta->btf, ref_t->type, 0, kptr_field->kptr.btf,
+				  kptr_field->kptr.btf_id, true)) {
+		verbose(env, "kernel function %s args#%d expected pointer to %s %s\n",
+			meta->func_name, argno, btf_type_str(ref_t), ref_tname);
+		return -EINVAL;
+	}
+	return 0;
+}
+
+static int check_kfunc_args(struct bpf_verifier_env *env, struct bpf_kfunc_call_arg_meta *meta)
+{
+	const char *func_name = meta->func_name, *ref_tname;
+	const struct btf *btf = meta->btf;
+	const struct btf_param *args;
+	u32 i, nargs;
+	int ret;
+
+	args = (const struct btf_param *)(meta->func_proto + 1);
+	nargs = btf_type_vlen(meta->func_proto);
+	if (nargs > MAX_BPF_FUNC_REG_ARGS) {
+		verbose(env, "Function %s has %d > %d args\n", func_name, nargs,
+			MAX_BPF_FUNC_REG_ARGS);
+		return -EINVAL;
+	}
+
+	/* Check that BTF function arguments match actual types that the
+	 * verifier sees.
+	 */
+	for (i = 0; i < nargs; i++) {
+		struct bpf_reg_state *regs = cur_regs(env), *reg = &regs[i + 1];
+		const struct btf_type *t, *ref_t, *resolve_ret;
+		enum bpf_arg_type arg_type = ARG_DONTCARE;
+		u32 regno = i + 1, ref_id, type_size;
+		bool is_ret_buf_sz = false;
+		int kf_arg_type;
+
+		t = btf_type_skip_modifiers(btf, args[i].type, NULL);
+		if (btf_type_is_scalar(t)) {
+			if (reg->type != SCALAR_VALUE) {
+				verbose(env, "R%d is not a scalar\n", regno);
+				return -EINVAL;
+			}
+			if (is_kfunc_arg_ret_buf_size(btf, &args[i], reg, "rdonly_buf_size")) {
+				meta->r0_rdonly = true;
+				is_ret_buf_sz = true;
+			} else if (is_kfunc_arg_ret_buf_size(btf, &args[i], reg, "rdwr_buf_size")) {
+				is_ret_buf_sz = true;
+			}
+
+			if (is_ret_buf_sz) {
+				if (meta->r0_size) {
+					verbose(env, "2 or more rdonly/rdwr_buf_size parameters for kfunc\n");
+					return -EINVAL;
+				}
+
+				if (!tnum_is_const(reg->var_off)) {
+					verbose(env, "R%d is not a const\n", regno);
+					return -EINVAL;
+				}
+
+				meta->r0_size = reg->var_off.value;
+				ret = mark_chain_precision(env, regno);
+				if (ret)
+					return ret;
+			}
+			continue;
+		}
+
+		if (!btf_type_is_ptr(t)) {
+			verbose(env, "Unrecognized arg#%d type %s\n", i, btf_type_str(t));
+			return -EINVAL;
+		}
+
+		if (reg->ref_obj_id) {
+			if (is_kfunc_release(meta) && meta->ref_obj_id) {
+				verbose(env, "verifier internal error: more than one arg with ref_obj_id R%d %u %u\n",
+					regno, reg->ref_obj_id,
+					meta->ref_obj_id);
+				return -EFAULT;
+			}
+			meta->ref_obj_id = reg->ref_obj_id;
+			if (is_kfunc_release(meta))
+				meta->release_regno = regno;
+		}
+
+		ref_t = btf_type_skip_modifiers(btf, t->type, &ref_id);
+		ref_tname = btf_name_by_offset(btf, ref_t->name_off);
+
+		kf_arg_type = get_kfunc_ptr_arg_type(env, meta, t, ref_t, ref_tname, args, i, nargs);
+		if (kf_arg_type < 0)
+			return kf_arg_type;
+
+		switch (kf_arg_type) {
+		case KF_ARG_PTR_TO_BTF_ID:
+			if (is_kfunc_trusted_args(meta) && !reg->ref_obj_id) {
+				verbose(env, "R%d must be referenced\n", regno);
+				return -EINVAL;
+			}
+			fallthrough;
+		case KF_ARG_PTR_TO_CTX:
+			/* Trusted arguments have the same offset checks as release arguments */
+			arg_type |= OBJ_RELEASE;
+			break;
+		case KF_ARG_PTR_TO_KPTR_STRONG:
+		case KF_ARG_PTR_TO_DYNPTR:
+		case KF_ARG_PTR_TO_MEM:
+		case KF_ARG_PTR_TO_MEM_SIZE:
+			/* Trusted by default */
+			break;
+		default:
+			WARN_ON_ONCE(1);
+			return -EFAULT;
+		}
+
+		if (is_kfunc_release(meta) && reg->ref_obj_id)
+			arg_type |= OBJ_RELEASE;
+		ret = check_func_arg_reg_off(env, reg, regno, arg_type);
+		if (ret < 0)
+			return ret;
+
+		switch (kf_arg_type) {
+		case KF_ARG_PTR_TO_CTX:
+			if (reg->type != PTR_TO_CTX) {
+				verbose(env, "arg#%d expected pointer to ctx, but got %s\n", i, btf_type_str(t));
+				return -EINVAL;
+			}
+			break;
+		case KF_ARG_PTR_TO_BTF_ID:
+			/* Only base_type is checked, further checks are done here */
+			if (reg->type != PTR_TO_BTF_ID &&
+			    (!reg2btf_ids[base_type(reg->type)] || type_flag(reg->type))) {
+				verbose(env, "arg#%d expected pointer to btf or socket\n", i);
+				return -EINVAL;
+			}
+			ret = process_kf_arg_ptr_to_btf_id(env, reg, ref_t, ref_tname, ref_id, meta, i);
+			if (ret < 0)
+				return ret;
+			break;
+		case KF_ARG_PTR_TO_KPTR_STRONG:
+			if (reg->type != PTR_TO_MAP_VALUE) {
+				verbose(env, "arg#0 expected pointer to map value\n");
+				return -EINVAL;
+			}
+			ret = process_kf_arg_ptr_to_kptr_strong(env, reg, ref_t, ref_tname, meta, i);
+			if (ret < 0)
+				return ret;
+			break;
+		case KF_ARG_PTR_TO_DYNPTR:
+			if (reg->type != PTR_TO_STACK) {
+				verbose(env, "arg#%d expected pointer to stack\n", i);
+				return -EINVAL;
+			}
+
+			if (!is_dynptr_reg_valid_init(env, reg)) {
+				verbose(env, "arg#%d pointer type %s %s must be valid and initialized\n",
+					i, btf_type_str(ref_t), ref_tname);
+				return -EINVAL;
+			}
+
+			if (!is_dynptr_type_expected(env, reg, ARG_PTR_TO_DYNPTR | DYNPTR_TYPE_LOCAL)) {
+				verbose(env, "arg#%d pointer type %s %s points to unsupported dynamic pointer type\n",
+					i, btf_type_str(ref_t), ref_tname);
+				return -EINVAL;
+			}
+			break;
+		case KF_ARG_PTR_TO_MEM:
+			resolve_ret = btf_resolve_size(btf, ref_t, &type_size);
+			if (IS_ERR(resolve_ret)) {
+				verbose(env, "arg#%d reference type('%s %s') size cannot be determined: %ld\n",
+					i, btf_type_str(ref_t), ref_tname, PTR_ERR(resolve_ret));
+				return -EINVAL;
+			}
+			ret = check_mem_reg(env, reg, regno, type_size);
+			if (ret < 0)
+				return ret;
+			break;
+		case KF_ARG_PTR_TO_MEM_SIZE:
+			ret = check_kfunc_mem_size_reg(env, &regs[regno + 1], regno + 1);
+			if (ret < 0) {
+				verbose(env, "arg#%d arg#%d memory, len pair leads to invalid memory access\n", i, i + 1);
+				return ret;
+			}
+			/* Skip next '__sz' argument */
+			i++;
+			break;
+		}
+	}
+
+	if (is_kfunc_release(meta) && !meta->release_regno) {
+		verbose(env, "release kernel function %s expects refcounted PTR_TO_BTF_ID\n",
+			func_name);
+		return -EINVAL;
+	}
+
+	return 0;
+}
+
 static int check_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
 			    int *insn_idx_p)
 {
 	const struct btf_type *t, *func, *func_proto, *ptr_type;
 	struct bpf_reg_state *regs = cur_regs(env);
-	struct bpf_kfunc_arg_meta meta = { 0 };
 	const char *func_name, *ptr_type_name;
+	struct bpf_kfunc_call_arg_meta meta;
 	u32 i, nargs, func_id, ptr_type_id;
 	int err, insn_idx = *insn_idx_p;
 	const struct btf_param *args;
 	struct btf *desc_btf;
 	u32 *kfunc_flags;
-	bool acq;
 
 	/* skip for now, but return error when we find this in fixup_kfunc_call */
 	if (!insn->imm)
@@ -7697,24 +8200,34 @@ static int check_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
 			func_name);
 		return -EACCES;
 	}
-	if (*kfunc_flags & KF_DESTRUCTIVE && !capable(CAP_SYS_BOOT)) {
-		verbose(env, "destructive kfunc calls require CAP_SYS_BOOT capabilities\n");
+
+	/* Prepare kfunc call metadata */
+	memset(&meta, 0, sizeof(meta));
+	meta.btf = desc_btf;
+	meta.func_id = func_id;
+	meta.kfunc_flags = *kfunc_flags;
+	meta.func_proto = func_proto;
+	meta.func_name = func_name;
+
+	if (is_kfunc_destructive(&meta) && !capable(CAP_SYS_BOOT)) {
+		verbose(env, "destructive kfunc calls require CAP_SYS_BOOT capability\n");
 		return -EACCES;
 	}
 
-	acq = *kfunc_flags & KF_ACQUIRE;
-
-	meta.flags = *kfunc_flags;
+	if (is_kfunc_sleepable(&meta) && !env->prog->aux->sleepable) {
+		verbose(env, "program must be sleepable to call sleepable kfunc %s\n", func_name);
+		return -EACCES;
+	}
 
 	/* Check the arguments */
-	err = btf_check_kfunc_arg_match(env, desc_btf, func_id, regs, &meta);
+	err = check_kfunc_args(env, &meta);
 	if (err < 0)
 		return err;
 	/* In case of release function, we get register number of refcounted
-	 * PTR_TO_BTF_ID back from btf_check_kfunc_arg_match, do the release now
+	 * PTR_TO_BTF_ID in bpf_kfunc_call_arg_meta, do the release now.
 	 */
-	if (err) {
-		err = release_reference(env, regs[err].ref_obj_id);
+	if (meta.release_regno) {
+		err = release_reference(env, regs[meta.release_regno].ref_obj_id);
 		if (err) {
 			verbose(env, "kfunc %s#%d reference has not been acquired before\n",
 				func_name, func_id);
@@ -7728,7 +8241,7 @@ static int check_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
 	/* Check return type */
 	t = btf_type_skip_modifiers(desc_btf, func_proto->type, NULL);
 
-	if (acq && !btf_type_is_struct_ptr(desc_btf, t)) {
+	if (is_kfunc_acquire(&meta) && !btf_type_is_struct_ptr(meta.btf, t)) {
 		verbose(env, "acquire kernel function does not return PTR_TO_BTF_ID\n");
 		return -EINVAL;
 	}
@@ -7767,20 +8280,23 @@ static int check_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
 			regs[BPF_REG_0].type = PTR_TO_BTF_ID;
 			regs[BPF_REG_0].btf_id = ptr_type_id;
 		}
-		if (*kfunc_flags & KF_RET_NULL) {
+		if (is_kfunc_ret_null(&meta)) {
 			regs[BPF_REG_0].type |= PTR_MAYBE_NULL;
 			/* For mark_ptr_or_null_reg, see 93c230e3f5bd6 */
 			regs[BPF_REG_0].id = ++env->id_gen;
 		}
 		mark_btf_func_reg_size(env, BPF_REG_0, sizeof(void *));
-		if (acq) {
+		if (is_kfunc_acquire(&meta)) {
 			int id = acquire_reference_state(env, insn_idx);
 
 			if (id < 0)
 				return id;
-			regs[BPF_REG_0].id = id;
+			if (is_kfunc_ret_null(&meta))
+				regs[BPF_REG_0].id = id;
 			regs[BPF_REG_0].ref_obj_id = id;
 		}
+		if (reg_may_point_to_spin_lock(&regs[BPF_REG_0]) && !regs[BPF_REG_0].id)
+			regs[BPF_REG_0].id = ++env->id_gen;
 	} /* else { add_kfunc_call() ensures it is btf_type_is_void(t) } */
 
 	nargs = btf_type_vlen(func_proto);
diff --git a/tools/testing/selftests/bpf/prog_tests/kfunc_dynptr_param.c b/tools/testing/selftests/bpf/prog_tests/kfunc_dynptr_param.c
index c210657d4d0a..55d641c1f126 100644
--- a/tools/testing/selftests/bpf/prog_tests/kfunc_dynptr_param.c
+++ b/tools/testing/selftests/bpf/prog_tests/kfunc_dynptr_param.c
@@ -22,7 +22,7 @@ static struct {
 	 "arg#0 pointer type STRUCT bpf_dynptr_kern points to unsupported dynamic pointer type", 0},
 	{"not_valid_dynptr",
 	 "arg#0 pointer type STRUCT bpf_dynptr_kern must be valid and initialized", 0},
-	{"not_ptr_to_stack", "arg#0 pointer type STRUCT bpf_dynptr_kern not to stack", 0},
+	{"not_ptr_to_stack", "arg#0 expected pointer to stack", 0},
 	{"dynptr_data_null", NULL, -EBADMSG},
 };
 
diff --git a/tools/testing/selftests/bpf/verifier/calls.c b/tools/testing/selftests/bpf/verifier/calls.c
index e1a937277b54..e349d85f7717 100644
--- a/tools/testing/selftests/bpf/verifier/calls.c
+++ b/tools/testing/selftests/bpf/verifier/calls.c
@@ -109,7 +109,7 @@
 	},
 	.prog_type = BPF_PROG_TYPE_SCHED_CLS,
 	.result = REJECT,
-	.errstr = "arg#0 pointer type STRUCT prog_test_ref_kfunc must point",
+	.errstr = "arg#0 expected pointer to btf or socket",
 	.fixup_kfunc_btf_id = {
 		{ "bpf_kfunc_call_test_acquire", 3 },
 		{ "bpf_kfunc_call_test_release", 5 },
@@ -181,7 +181,7 @@
 	},
 	.result_unpriv = REJECT,
 	.result = REJECT,
-	.errstr = "negative offset ptr_ ptr R1 off=-4 disallowed",
+	.errstr = "R1 must have zero offset when passed to release func",
 },
 {
 	"calls: invalid kfunc call: PTR_TO_BTF_ID with variable offset",
diff --git a/tools/testing/selftests/bpf/verifier/ref_tracking.c b/tools/testing/selftests/bpf/verifier/ref_tracking.c
index f18ce867271f..4784471b0b7f 100644
--- a/tools/testing/selftests/bpf/verifier/ref_tracking.c
+++ b/tools/testing/selftests/bpf/verifier/ref_tracking.c
@@ -142,7 +142,7 @@
 	.kfunc = "bpf",
 	.expected_attach_type = BPF_LSM_MAC,
 	.flags = BPF_F_SLEEPABLE,
-	.errstr = "arg#0 pointer type STRUCT bpf_key must point to scalar, or struct with scalar",
+	.errstr = "arg#0 expected pointer to btf or socket",
 	.fixup_kfunc_btf_id = {
 		{ "bpf_lookup_user_key", 2 },
 		{ "bpf_key_put", 4 },
@@ -163,7 +163,7 @@
 	.kfunc = "bpf",
 	.expected_attach_type = BPF_LSM_MAC,
 	.flags = BPF_F_SLEEPABLE,
-	.errstr = "arg#0 pointer type STRUCT bpf_key must point to scalar, or struct with scalar",
+	.errstr = "arg#0 expected pointer to btf or socket",
 	.fixup_kfunc_btf_id = {
 		{ "bpf_lookup_system_key", 1 },
 		{ "bpf_key_put", 3 },
-- 
2.38.0



* [PATCH bpf-next v2 16/25] bpf: Drop kfunc bits from btf_check_func_arg_match
  2022-10-13  6:22 [PATCH bpf-next v2 00/25] Local kptrs, BPF linked lists Kumar Kartikeya Dwivedi
                   ` (14 preceding siblings ...)
  2022-10-13  6:22 ` [PATCH bpf-next v2 15/25] bpf: Rewrite kfunc argument handling Kumar Kartikeya Dwivedi
@ 2022-10-13  6:22 ` Kumar Kartikeya Dwivedi
  2022-10-13  6:22 ` [PATCH bpf-next v2 17/25] bpf: Support constant scalar arguments for kfuncs Kumar Kartikeya Dwivedi
                   ` (8 subsequent siblings)
  24 siblings, 0 replies; 52+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2022-10-13  6:22 UTC (permalink / raw)
  To: bpf
  Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Dave Marchevsky, Delyan Kratunov

Remove all kfunc-related bits from btf_check_func_arg_match, now that
its users have been converted to the refactored kfunc argument handling.

This is split into a separate commit to aid review: keeping the removed
hunks out of the previous patch makes it easier to compare what has been
preserved from the removed code.
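
One of the helpers removed from btf.c here is the check that a parameter
name ends in "__sz". For reviewers comparing it against its counterpart in
verifier.c, the string logic can be sketched in userspace as below. This is
only an illustration: has_sz_suffix is a hypothetical name, it takes a plain
C string instead of a BTF name offset, and it omits the scalar type and
register checks that the kernel helper also performs.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical userspace stand-in for the "__sz" parameter-name check.
 * The kernel helper reads the name from BTF and additionally requires a
 * scalar BTF type and a SCALAR_VALUE register; none of that is modeled.
 */
static bool has_sz_suffix(const char *param_name)
{
	size_t sfx_len = sizeof("__sz") - 1;
	size_t len;

	/* Treat a missing or empty name as "no suffix" */
	if (!param_name || !param_name[0])
		return false;

	len = strlen(param_name);
	if (len < sfx_len)
		return false;

	/* Compare only the last sfx_len characters of the name */
	return strncmp(param_name + len - sfx_len, "__sz", sfx_len) == 0;
}
```

With this sketch, a name such as "buf__sz" matches, while "buf_size" and
the empty string do not.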

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
---
 include/linux/bpf.h          |  11 --
 include/linux/bpf_verifier.h |   2 -
 kernel/bpf/btf.c             | 364 +----------------------------------
 kernel/bpf/verifier.c        |   4 +-
 4 files changed, 10 insertions(+), 371 deletions(-)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index b2419752542a..7ffafa5bb866 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -2107,22 +2107,11 @@ int btf_distill_func_proto(struct bpf_verifier_log *log,
 			   const char *func_name,
 			   struct btf_func_model *m);
 
-struct bpf_kfunc_arg_meta {
-	u64 r0_size;
-	bool r0_rdonly;
-	int ref_obj_id;
-	u32 flags;
-};
-
 struct bpf_reg_state;
 int btf_check_subprog_arg_match(struct bpf_verifier_env *env, int subprog,
 				struct bpf_reg_state *regs);
 int btf_check_subprog_call(struct bpf_verifier_env *env, int subprog,
 			   struct bpf_reg_state *regs);
-int btf_check_kfunc_arg_match(struct bpf_verifier_env *env,
-			      const struct btf *btf, u32 func_id,
-			      struct bpf_reg_state *regs,
-			      struct bpf_kfunc_arg_meta *meta);
 int btf_prepare_func_args(struct bpf_verifier_env *env, int subprog,
 			  struct bpf_reg_state *reg);
 int btf_check_type_match(struct bpf_verifier_log *log, const struct bpf_prog *prog,
diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h
index c283484f8b94..4585de45ad1c 100644
--- a/include/linux/bpf_verifier.h
+++ b/include/linux/bpf_verifier.h
@@ -590,8 +590,6 @@ int check_ptr_off_reg(struct bpf_verifier_env *env,
 int check_func_arg_reg_off(struct bpf_verifier_env *env,
 			   const struct bpf_reg_state *reg, int regno,
 			   enum bpf_arg_type arg_type);
-int check_kfunc_mem_size_reg(struct bpf_verifier_env *env, struct bpf_reg_state *reg,
-			     u32 regno);
 int check_mem_reg(struct bpf_verifier_env *env, struct bpf_reg_state *reg,
 		   u32 regno, u32 mem_size);
 bool is_dynptr_reg_valid_init(struct bpf_verifier_env *env,
diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c
index 56757ec79c1a..8a22f3a3c293 100644
--- a/kernel/bpf/btf.c
+++ b/kernel/bpf/btf.c
@@ -6595,122 +6595,19 @@ int btf_check_type_match(struct bpf_verifier_log *log, const struct bpf_prog *pr
 	return btf_check_func_type_match(log, btf1, t1, btf2, t2);
 }
 
-static u32 *reg2btf_ids[__BPF_REG_TYPE_MAX] = {
-#ifdef CONFIG_NET
-	[PTR_TO_SOCKET] = &btf_sock_ids[BTF_SOCK_TYPE_SOCK],
-	[PTR_TO_SOCK_COMMON] = &btf_sock_ids[BTF_SOCK_TYPE_SOCK_COMMON],
-	[PTR_TO_TCP_SOCK] = &btf_sock_ids[BTF_SOCK_TYPE_TCP],
-#endif
-};
-
-/* Returns true if struct is composed of scalars, 4 levels of nesting allowed */
-static bool __btf_type_is_scalar_struct(struct bpf_verifier_log *log,
-					const struct btf *btf,
-					const struct btf_type *t, int rec)
-{
-	const struct btf_type *member_type;
-	const struct btf_member *member;
-	u32 i;
-
-	if (!btf_type_is_struct(t))
-		return false;
-
-	for_each_member(i, t, member) {
-		const struct btf_array *array;
-
-		member_type = btf_type_skip_modifiers(btf, member->type, NULL);
-		if (btf_type_is_struct(member_type)) {
-			if (rec >= 3) {
-				bpf_log(log, "max struct nesting depth exceeded\n");
-				return false;
-			}
-			if (!__btf_type_is_scalar_struct(log, btf, member_type, rec + 1))
-				return false;
-			continue;
-		}
-		if (btf_type_is_array(member_type)) {
-			array = btf_type_array(member_type);
-			if (!array->nelems)
-				return false;
-			member_type = btf_type_skip_modifiers(btf, array->type, NULL);
-			if (!btf_type_is_scalar(member_type))
-				return false;
-			continue;
-		}
-		if (!btf_type_is_scalar(member_type))
-			return false;
-	}
-	return true;
-}
-
-static bool is_kfunc_arg_mem_size(const struct btf *btf,
-				  const struct btf_param *arg,
-				  const struct bpf_reg_state *reg)
-{
-	int len, sfx_len = sizeof("__sz") - 1;
-	const struct btf_type *t;
-	const char *param_name;
-
-	t = btf_type_skip_modifiers(btf, arg->type, NULL);
-	if (!btf_type_is_scalar(t) || reg->type != SCALAR_VALUE)
-		return false;
-
-	/* In the future, this can be ported to use BTF tagging */
-	param_name = btf_name_by_offset(btf, arg->name_off);
-	if (str_is_empty(param_name))
-		return false;
-	len = strlen(param_name);
-	if (len < sfx_len)
-		return false;
-	param_name += len - sfx_len;
-	if (strncmp(param_name, "__sz", sfx_len))
-		return false;
-
-	return true;
-}
-
-static bool btf_is_kfunc_arg_mem_size(const struct btf *btf,
-				      const struct btf_param *arg,
-				      const struct bpf_reg_state *reg,
-				      const char *name)
-{
-	int len, target_len = strlen(name);
-	const struct btf_type *t;
-	const char *param_name;
-
-	t = btf_type_skip_modifiers(btf, arg->type, NULL);
-	if (!btf_type_is_scalar(t) || reg->type != SCALAR_VALUE)
-		return false;
-
-	param_name = btf_name_by_offset(btf, arg->name_off);
-	if (str_is_empty(param_name))
-		return false;
-	len = strlen(param_name);
-	if (len != target_len)
-		return false;
-	if (strcmp(param_name, name))
-		return false;
-
-	return true;
-}
-
 static int btf_check_func_arg_match(struct bpf_verifier_env *env,
 				    const struct btf *btf, u32 func_id,
 				    struct bpf_reg_state *regs,
 				    bool ptr_to_mem_ok,
-				    struct bpf_kfunc_arg_meta *kfunc_meta,
 				    bool processing_call)
 {
 	enum bpf_prog_type prog_type = resolve_prog_type(env->prog);
-	bool rel = false, kptr_get = false, trusted_args = false;
-	bool sleepable = false;
 	struct bpf_verifier_log *log = &env->log;
-	u32 i, nargs, ref_id, ref_obj_id = 0;
-	bool is_kfunc = btf_is_kernel(btf);
 	const char *func_name, *ref_tname;
 	const struct btf_type *t, *ref_t;
 	const struct btf_param *args;
-	int ref_regno = 0, ret;
+	u32 i, nargs, ref_id;
+	int ret;
 
 	t = btf_type_by_id(btf, func_id);
 	if (!t || !btf_type_is_func(t)) {
@@ -6736,14 +6633,6 @@ static int btf_check_func_arg_match(struct bpf_verifier_env *env,
 		return -EINVAL;
 	}
 
-	if (is_kfunc && kfunc_meta) {
-		/* Only kfunc can be release func */
-		rel = kfunc_meta->flags & KF_RELEASE;
-		kptr_get = kfunc_meta->flags & KF_KPTR_GET;
-		trusted_args = kfunc_meta->flags & KF_TRUSTED_ARGS;
-		sleepable = kfunc_meta->flags & KF_SLEEPABLE;
-	}
-
 	/* check that BTF function arguments match actual types that the
 	 * verifier sees.
 	 */
@@ -6751,42 +6640,9 @@ static int btf_check_func_arg_match(struct bpf_verifier_env *env,
 		enum bpf_arg_type arg_type = ARG_DONTCARE;
 		u32 regno = i + 1;
 		struct bpf_reg_state *reg = &regs[regno];
-		bool obj_ptr = false;
 
 		t = btf_type_skip_modifiers(btf, args[i].type, NULL);
 		if (btf_type_is_scalar(t)) {
-			if (is_kfunc && kfunc_meta) {
-				bool is_buf_size = false;
-
-				/* check for any const scalar parameter of name "rdonly_buf_size"
-				 * or "rdwr_buf_size"
-				 */
-				if (btf_is_kfunc_arg_mem_size(btf, &args[i], reg,
-							      "rdonly_buf_size")) {
-					kfunc_meta->r0_rdonly = true;
-					is_buf_size = true;
-				} else if (btf_is_kfunc_arg_mem_size(btf, &args[i], reg,
-								     "rdwr_buf_size"))
-					is_buf_size = true;
-
-				if (is_buf_size) {
-					if (kfunc_meta->r0_size) {
-						bpf_log(log, "2 or more rdonly/rdwr_buf_size parameters for kfunc");
-						return -EINVAL;
-					}
-
-					if (!tnum_is_const(reg->var_off)) {
-						bpf_log(log, "R%d is not a const\n", regno);
-						return -EINVAL;
-					}
-
-					kfunc_meta->r0_size = reg->var_off.value;
-					ret = mark_chain_precision(env, regno);
-					if (ret)
-						return ret;
-				}
-			}
-
 			if (reg->type == SCALAR_VALUE)
 				continue;
 			bpf_log(log, "R%d is not a scalar\n", regno);
@@ -6799,88 +6655,14 @@ static int btf_check_func_arg_match(struct bpf_verifier_env *env,
 			return -EINVAL;
 		}
 
-		/* These register types have special constraints wrt ref_obj_id
-		 * and offset checks. The rest of trusted args don't.
-		 */
-		obj_ptr = reg->type == PTR_TO_CTX || reg->type == PTR_TO_BTF_ID ||
-			  reg2btf_ids[base_type(reg->type)];
-
-		/* Check if argument must be a referenced pointer, args + i has
-		 * been verified to be a pointer (after skipping modifiers).
-		 * PTR_TO_CTX is ok without having non-zero ref_obj_id.
-		 */
-		if (is_kfunc && trusted_args && (obj_ptr && reg->type != PTR_TO_CTX) && !reg->ref_obj_id) {
-			bpf_log(log, "R%d must be referenced\n", regno);
-			return -EINVAL;
-		}
-
 		ref_t = btf_type_skip_modifiers(btf, t->type, &ref_id);
 		ref_tname = btf_name_by_offset(btf, ref_t->name_off);
 
-		/* Trusted args have the same offset checks as release arguments */
-		if ((trusted_args && obj_ptr) || (rel && reg->ref_obj_id))
-			arg_type |= OBJ_RELEASE;
 		ret = check_func_arg_reg_off(env, reg, regno, arg_type);
 		if (ret < 0)
 			return ret;
 
-		if (is_kfunc && reg->ref_obj_id) {
-			/* Ensure only one argument is referenced PTR_TO_BTF_ID */
-			if (ref_obj_id) {
-				bpf_log(log, "verifier internal error: more than one arg with ref_obj_id R%d %u %u\n",
-					regno, reg->ref_obj_id, ref_obj_id);
-				return -EFAULT;
-			}
-			ref_regno = regno;
-			ref_obj_id = reg->ref_obj_id;
-		}
-
-		/* kptr_get is only true for kfunc */
-		if (i == 0 && kptr_get) {
-			struct btf_field *kptr_field;
-
-			if (reg->type != PTR_TO_MAP_VALUE) {
-				bpf_log(log, "arg#0 expected pointer to map value\n");
-				return -EINVAL;
-			}
-
-			/* check_func_arg_reg_off allows var_off for
-			 * PTR_TO_MAP_VALUE, but we need fixed offset to find
-			 * off_desc.
-			 */
-			if (!tnum_is_const(reg->var_off)) {
-				bpf_log(log, "arg#0 must have constant offset\n");
-				return -EINVAL;
-			}
-
-			kptr_field = btf_type_fields_find(reg->map_ptr->fields_tab, reg->off + reg->var_off.value, BPF_KPTR);
-			if (!kptr_field || kptr_field->type != BPF_KPTR_REF) {
-				bpf_log(log, "arg#0 no referenced kptr at map value offset=%llu\n",
-					reg->off + reg->var_off.value);
-				return -EINVAL;
-			}
-
-			if (!btf_type_is_ptr(ref_t)) {
-				bpf_log(log, "arg#0 BTF type must be a double pointer\n");
-				return -EINVAL;
-			}
-
-			ref_t = btf_type_skip_modifiers(btf, ref_t->type, &ref_id);
-			ref_tname = btf_name_by_offset(btf, ref_t->name_off);
-
-			if (!btf_type_is_struct(ref_t)) {
-				bpf_log(log, "kernel function %s args#%d pointer type %s %s is not supported\n",
-					func_name, i, btf_type_str(ref_t), ref_tname);
-				return -EINVAL;
-			}
-			if (!btf_struct_ids_match(log, btf, ref_id, 0, kptr_field->kptr.btf,
-						  kptr_field->kptr.btf_id, true)) {
-				bpf_log(log, "kernel function %s args#%d expected pointer to %s %s\n",
-					func_name, i, btf_type_str(ref_t), ref_tname);
-				return -EINVAL;
-			}
-			/* rest of the arguments can be anything, like normal kfunc */
-		} else if (btf_get_prog_ctx_type(log, btf, t, prog_type, i)) {
+		if (btf_get_prog_ctx_type(log, btf, t, prog_type, i)) {
 			/* If function expects ctx type in BTF check that caller
 			 * is passing PTR_TO_CTX.
 			 */
@@ -6890,109 +6672,10 @@ static int btf_check_func_arg_match(struct bpf_verifier_env *env,
 					i, btf_type_str(t));
 				return -EINVAL;
 			}
-		} else if (is_kfunc && (reg->type == PTR_TO_BTF_ID ||
-			   (reg2btf_ids[base_type(reg->type)] && !type_flag(reg->type)))) {
-			const struct btf_type *reg_ref_t;
-			const struct btf *reg_btf;
-			const char *reg_ref_tname;
-			u32 reg_ref_id;
-
-			if (!btf_type_is_struct(ref_t)) {
-				bpf_log(log, "kernel function %s args#%d pointer type %s %s is not supported\n",
-					func_name, i, btf_type_str(ref_t),
-					ref_tname);
-				return -EINVAL;
-			}
-
-			if (reg->type == PTR_TO_BTF_ID) {
-				reg_btf = reg->btf;
-				reg_ref_id = reg->btf_id;
-			} else {
-				reg_btf = btf_vmlinux;
-				reg_ref_id = *reg2btf_ids[base_type(reg->type)];
-			}
-
-			reg_ref_t = btf_type_skip_modifiers(reg_btf, reg_ref_id,
-							    &reg_ref_id);
-			reg_ref_tname = btf_name_by_offset(reg_btf,
-							   reg_ref_t->name_off);
-			if (!btf_struct_ids_match(log, reg_btf, reg_ref_id,
-						  reg->off, btf, ref_id,
-						  trusted_args || (rel && reg->ref_obj_id))) {
-				bpf_log(log, "kernel function %s args#%d expected pointer to %s %s but R%d has a pointer to %s %s\n",
-					func_name, i,
-					btf_type_str(ref_t), ref_tname,
-					regno, btf_type_str(reg_ref_t),
-					reg_ref_tname);
-				return -EINVAL;
-			}
 		} else if (ptr_to_mem_ok && processing_call) {
 			const struct btf_type *resolve_ret;
 			u32 type_size;
 
-			if (is_kfunc) {
-				bool arg_mem_size = i + 1 < nargs && is_kfunc_arg_mem_size(btf, &args[i + 1], &regs[regno + 1]);
-				bool arg_dynptr = btf_type_is_struct(ref_t) &&
-						  !strcmp(ref_tname,
-							  stringify_struct(bpf_dynptr_kern));
-
-				/* Permit pointer to mem, but only when argument
-				 * type is pointer to scalar, or struct composed
-				 * (recursively) of scalars.
-				 * When arg_mem_size is true, the pointer can be
-				 * void *.
-				 * Also permit initialized local dynamic pointers.
-				 */
-				if (!btf_type_is_scalar(ref_t) &&
-				    !__btf_type_is_scalar_struct(log, btf, ref_t, 0) &&
-				    !arg_dynptr &&
-				    (arg_mem_size ? !btf_type_is_void(ref_t) : 1)) {
-					bpf_log(log,
-						"arg#%d pointer type %s %s must point to %sscalar, or struct with scalar\n",
-						i, btf_type_str(ref_t), ref_tname, arg_mem_size ? "void, " : "");
-					return -EINVAL;
-				}
-
-				if (arg_dynptr) {
-					if (reg->type != PTR_TO_STACK) {
-						bpf_log(log, "arg#%d pointer type %s %s not to stack\n",
-							i, btf_type_str(ref_t),
-							ref_tname);
-						return -EINVAL;
-					}
-
-					if (!is_dynptr_reg_valid_init(env, reg)) {
-						bpf_log(log,
-							"arg#%d pointer type %s %s must be valid and initialized\n",
-							i, btf_type_str(ref_t),
-							ref_tname);
-						return -EINVAL;
-					}
-
-					if (!is_dynptr_type_expected(env, reg,
-							ARG_PTR_TO_DYNPTR | DYNPTR_TYPE_LOCAL)) {
-						bpf_log(log,
-							"arg#%d pointer type %s %s points to unsupported dynamic pointer type\n",
-							i, btf_type_str(ref_t),
-							ref_tname);
-						return -EINVAL;
-					}
-
-					continue;
-				}
-
-				/* Check for mem, len pair */
-				if (arg_mem_size) {
-					if (check_kfunc_mem_size_reg(env, &regs[regno + 1], regno + 1)) {
-						bpf_log(log, "arg#%d arg#%d memory, len pair leads to invalid memory access\n",
-							i, i + 1);
-						return -EINVAL;
-					}
-					i++;
-					continue;
-				}
-			}
-
 			resolve_ret = btf_resolve_size(btf, ref_t, &type_size);
 			if (IS_ERR(resolve_ret)) {
 				bpf_log(log,
@@ -7005,36 +6688,13 @@ static int btf_check_func_arg_match(struct bpf_verifier_env *env,
 			if (check_mem_reg(env, reg, regno, type_size))
 				return -EINVAL;
 		} else {
-			bpf_log(log, "reg type unsupported for arg#%d %sfunction %s#%d\n", i,
-				is_kfunc ? "kernel " : "", func_name, func_id);
+			bpf_log(log, "reg type unsupported for arg#%d function %s#%d\n", i,
+				func_name, func_id);
 			return -EINVAL;
 		}
 	}
 
-	/* Either both are set, or neither */
-	WARN_ON_ONCE((ref_obj_id && !ref_regno) || (!ref_obj_id && ref_regno));
-	/* We already made sure ref_obj_id is set only for one argument. We do
-	 * allow (!rel && ref_obj_id), so that passing such referenced
-	 * PTR_TO_BTF_ID to other kfuncs works. Note that rel is only true when
-	 * is_kfunc is true.
-	 */
-	if (rel && !ref_obj_id) {
-		bpf_log(log, "release kernel function %s expects refcounted PTR_TO_BTF_ID\n",
-			func_name);
-		return -EINVAL;
-	}
-
-	if (sleepable && !env->prog->aux->sleepable) {
-		bpf_log(log, "kernel function %s is sleepable but the program is not\n",
-			func_name);
-		return -EINVAL;
-	}
-
-	if (kfunc_meta && ref_obj_id)
-		kfunc_meta->ref_obj_id = ref_obj_id;
-
-	/* returns argument register number > 0 in case of reference release kfunc */
-	return rel ? ref_regno : 0;
+	return 0;
 }
 
 /* Compare BTF of a function declaration with given bpf_reg_state.
@@ -7064,7 +6724,7 @@ int btf_check_subprog_arg_match(struct bpf_verifier_env *env, int subprog,
 		return -EINVAL;
 
 	is_global = prog->aux->func_info_aux[subprog].linkage == BTF_FUNC_GLOBAL;
-	err = btf_check_func_arg_match(env, btf, btf_id, regs, is_global, NULL, false);
+	err = btf_check_func_arg_match(env, btf, btf_id, regs, is_global, false);
 
 	/* Compiler optimizations can remove arguments from static functions
 	 * or mismatched type can be passed into a global function.
@@ -7107,7 +6767,7 @@ int btf_check_subprog_call(struct bpf_verifier_env *env, int subprog,
 		return -EINVAL;
 
 	is_global = prog->aux->func_info_aux[subprog].linkage == BTF_FUNC_GLOBAL;
-	err = btf_check_func_arg_match(env, btf, btf_id, regs, is_global, NULL, true);
+	err = btf_check_func_arg_match(env, btf, btf_id, regs, is_global, true);
 
 	/* Compiler optimizations can remove arguments from static functions
 	 * or mismatched type can be passed into a global function.
@@ -7118,14 +6778,6 @@ int btf_check_subprog_call(struct bpf_verifier_env *env, int subprog,
 	return err;
 }
 
-int btf_check_kfunc_arg_match(struct bpf_verifier_env *env,
-			      const struct btf *btf, u32 func_id,
-			      struct bpf_reg_state *regs,
-			      struct bpf_kfunc_arg_meta *meta)
-{
-	return btf_check_func_arg_match(env, btf, func_id, regs, true, meta, true);
-}
-
 /* Convert BTF of a function into bpf_reg_state if possible
  * Returns:
  * EFAULT - there is a verifier bug. Abort verification.
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index e29ea51276cb..0ff021ab3064 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -5368,8 +5368,8 @@ int check_mem_reg(struct bpf_verifier_env *env, struct bpf_reg_state *reg,
 	return err;
 }
 
-int check_kfunc_mem_size_reg(struct bpf_verifier_env *env, struct bpf_reg_state *reg,
-			     u32 regno)
+static int check_kfunc_mem_size_reg(struct bpf_verifier_env *env, struct bpf_reg_state *reg,
+				    u32 regno)
 {
 	struct bpf_reg_state *mem_reg = &cur_regs(env)[regno - 1];
 	bool may_be_null = type_may_be_null(mem_reg->type);
-- 
2.38.0


* [PATCH bpf-next v2 17/25] bpf: Support constant scalar arguments for kfuncs
  2022-10-13  6:22 [PATCH bpf-next v2 00/25] Local kptrs, BPF linked lists Kumar Kartikeya Dwivedi
                   ` (15 preceding siblings ...)
  2022-10-13  6:22 ` [PATCH bpf-next v2 16/25] bpf: Drop kfunc bits from btf_check_func_arg_match Kumar Kartikeya Dwivedi
@ 2022-10-13  6:22 ` Kumar Kartikeya Dwivedi
  2022-10-13  6:22 ` [PATCH bpf-next v2 18/25] bpf: Teach verifier about non-size constant arguments Kumar Kartikeya Dwivedi
                   ` (7 subsequent siblings)
  24 siblings, 0 replies; 52+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2022-10-13  6:22 UTC (permalink / raw)
  To: bpf
  Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Dave Marchevsky, Delyan Kratunov

Allow passing known constant scalars as arguments to kfuncs that do not
represent a size parameter. This makes the search pruning optimization
of the verifier more conservative for such kfunc calls, as each distinct
argument value is considered non-equivalent.

We will then use this support to expose a global bpf_kptr_alloc function
that takes the local type ID in program BTF and returns a PTR_TO_BTF_ID
to the local type. These will be called local kptrs, and allow programs
to allocate their own objects.

However, this is still not completely safe, as mark_chain_precision
logic is buggy without more work when the constant argument is not a
size, but still needs precise marker propagation for pruning checks.
The next patch fixes this problem.
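As a rough illustration of how the suffix-based tagging in this patch
works, here is a minimal userspace sketch of the trailing-suffix test.
It models the logic of __kfunc_param_match_suffix on plain C strings
instead of BTF parameters; the names param_has_suffix, arg_is_constant
and arg_is_mem_size are hypothetical and exist only for this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Userspace model of the suffix test; not the kernel function itself. */
static bool param_has_suffix(const char *param_name, const char *suffix)
{
	size_t len, suffix_len;

	assert(suffix != NULL);
	/* Unnamed parameters can never carry an annotation. */
	if (!param_name || !*param_name)
		return false;
	len = strlen(param_name);
	suffix_len = strlen(suffix);
	if (len < suffix_len)
		return false;
	/* Compare only the trailing suffix_len bytes of the name. */
	return strncmp(param_name + len - suffix_len, suffix, suffix_len) == 0;
}

/* Hypothetical wrappers mirroring the __k and __sz checks. */
static bool arg_is_constant(const char *param_name)
{
	return param_has_suffix(param_name, "__k");
}

static bool arg_is_mem_size(const char *param_name)
{
	return param_has_suffix(param_name, "__sz");
}
```

With this model, a parameter named local_type_id__k is treated as a
constant argument, while one named len__sz is treated as a size.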

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
---
 Documentation/bpf/kfuncs.rst | 30 ++++++++++++++++++
 kernel/bpf/verifier.c        | 59 +++++++++++++++++++++++++++---------
 2 files changed, 75 insertions(+), 14 deletions(-)

diff --git a/Documentation/bpf/kfuncs.rst b/Documentation/bpf/kfuncs.rst
index 0f858156371d..08f9a968d06d 100644
--- a/Documentation/bpf/kfuncs.rst
+++ b/Documentation/bpf/kfuncs.rst
@@ -72,6 +72,36 @@ argument as its size. By default, without __sz annotation, the size of the type
 of the pointer is used. Without __sz annotation, a kfunc cannot accept a void
 pointer.
 
+2.2.1 __k Annotation
+--------------------
+
+This annotation is only understood for scalar arguments, where it indicates that
+the verifier must check the scalar argument to be a known constant, which does
+not indicate a size parameter. This distinction is important, as when the scalar
+argument does not represent a size parameter, verifier is more conservative in
+state search pruning and does not consider two arguments equivalent for safety
+purposes if the already verified value was within range of the new one.
+
+This assumption holds well for sizes (as memory accessed within smaller bounds
+in old verified state will also work for bigger bounds in current to be explored
+state), but not for other constant arguments where each carries a distinct
+semantic effect.
+
+An example is given below::
+
+        void *bpf_mem_alloc(u32 local_type_id__k)
+        {
+        ...
+        }
+
+Here, bpf_mem_alloc uses local_type_id argument to find out the size of that
+type ID in program's BTF and return a sized pointer to it. Each type ID will
+have a distinct size, hence it is crucial to treat each such call as distinct
+when values don't match.
+
+Hence, whenever a constant scalar argument is accepted by a kfunc which is not a
+size parameter, __k suffix must be used.
+
 .. _BPF_kfunc_nodef:
 
 2.3 Using an existing kernel function
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 0ff021ab3064..7c3d7d07773f 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -7676,6 +7676,10 @@ struct bpf_kfunc_call_arg_meta {
 	u8 release_regno;
 	bool r0_rdonly;
 	u64 r0_size;
+	struct {
+		u64 value;
+		bool found;
+	} arg_constant;
 };
 
 static bool is_kfunc_acquire(struct bpf_kfunc_call_arg_meta *meta)
@@ -7713,30 +7717,40 @@ static bool is_kfunc_arg_kptr_get(struct bpf_kfunc_call_arg_meta *meta, int arg)
 	return arg == 0 && (meta->kfunc_flags & KF_KPTR_GET);
 }
 
-static bool is_kfunc_arg_mem_size(const struct btf *btf,
-				  const struct btf_param *arg,
-				  const struct bpf_reg_state *reg)
+static bool __kfunc_param_match_suffix(const struct btf *btf,
+				       const struct btf_param *arg,
+				       const char *suffix)
 {
-	int len, sfx_len = sizeof("__sz") - 1;
-	const struct btf_type *t;
+	int suffix_len = strlen(suffix), len;
 	const char *param_name;
 
-	t = btf_type_skip_modifiers(btf, arg->type, NULL);
-	if (!btf_type_is_scalar(t) || reg->type != SCALAR_VALUE)
-		return false;
-
 	/* In the future, this can be ported to use BTF tagging */
 	param_name = btf_name_by_offset(btf, arg->name_off);
 	if (str_is_empty(param_name))
 		return false;
 	len = strlen(param_name);
-	if (len < sfx_len)
+	if (len < suffix_len)
 		return false;
-	param_name += len - sfx_len;
-	if (strncmp(param_name, "__sz", sfx_len))
+	param_name += len - suffix_len;
+	return !strncmp(param_name, suffix, suffix_len);
+}
+
+static bool is_kfunc_arg_mem_size(const struct btf *btf,
+				  const struct btf_param *arg,
+				  const struct bpf_reg_state *reg)
+{
+	const struct btf_type *t;
+
+	t = btf_type_skip_modifiers(btf, arg->type, NULL);
+	if (!btf_type_is_scalar(t) || reg->type != SCALAR_VALUE)
 		return false;
 
-	return true;
+	return __kfunc_param_match_suffix(btf, arg, "__sz");
+}
+
+static bool is_kfunc_arg_sfx_constant(const struct btf *btf, const struct btf_param *arg)
+{
+	return __kfunc_param_match_suffix(btf, arg, "__k");
 }
 
 static bool is_kfunc_arg_ret_buf_size(const struct btf *btf,
@@ -8013,7 +8027,24 @@ static int check_kfunc_args(struct bpf_verifier_env *env, struct bpf_kfunc_call_
 				verbose(env, "R%d is not a scalar\n", regno);
 				return -EINVAL;
 			}
-			if (is_kfunc_arg_ret_buf_size(btf, &args[i], reg, "rdonly_buf_size")) {
+			if (is_kfunc_arg_sfx_constant(meta->btf, &args[i])) {
+				/* kfunc is already bpf_capable() only, no need
+				 * to check it here.
+				 */
+				if (meta->arg_constant.found) {
+					verbose(env, "verifier internal error: only one constant argument permitted\n");
+					return -EFAULT;
+				}
+				if (!tnum_is_const(reg->var_off)) {
+					verbose(env, "R%d must be a known constant\n", regno);
+					return -EINVAL;
+				}
+				ret = mark_chain_precision(env, regno);
+				if (ret < 0)
+					return ret;
+				meta->arg_constant.found = true;
+				meta->arg_constant.value = reg->var_off.value;
+			} else if (is_kfunc_arg_ret_buf_size(btf, &args[i], reg, "rdonly_buf_size")) {
 					meta->r0_rdonly = true;
 					is_ret_buf_sz = true;
 			} else if (is_kfunc_arg_ret_buf_size(btf, &args[i], reg, "rdwr_buf_size")) {
-- 
2.38.0


* [PATCH bpf-next v2 18/25] bpf: Teach verifier about non-size constant arguments
  2022-10-13  6:22 [PATCH bpf-next v2 00/25] Local kptrs, BPF linked lists Kumar Kartikeya Dwivedi
                   ` (16 preceding siblings ...)
  2022-10-13  6:22 ` [PATCH bpf-next v2 17/25] bpf: Support constant scalar arguments for kfuncs Kumar Kartikeya Dwivedi
@ 2022-10-13  6:22 ` Kumar Kartikeya Dwivedi
  2022-10-13  6:22 ` [PATCH bpf-next v2 19/25] bpf: Introduce bpf_kptr_new Kumar Kartikeya Dwivedi
                   ` (6 subsequent siblings)
  24 siblings, 0 replies; 52+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2022-10-13  6:22 UTC (permalink / raw)
  To: bpf
  Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Dave Marchevsky, Delyan Kratunov

Currently, the verifier has support for various arguments that either
describe the size of the memory being passed in to a helper, or describe
the size of the memory being returned. When a constant is passed in like
this, it is assumed for the purposes of precision tracking that if the
value in the already explored safe state is within the value in current
state, it would be fine to prune the search.

While this holds well for size arguments, arguments where each value may
denote a distinct meaning and must be verified separately need more
work. Search can only be pruned if both are constant values and both are
equal. In all other cases, it would be incorrect to treat those two
precise registers as equivalent if the new value satisfies the old one
(i.e. old <= cur).

Hence, make the register precision marker tri-state. There are now three
values that reg->precise takes: NOT_PRECISE, PRECISE, EXACT.

Both PRECISE and EXACT are 'true' values. EXACT affects how regsafe
decides whether both registers are equivalent for the purposes of
verifier state equivalence. When it sees that either register has
reg->precise == EXACT, it returns false unless both are EXACT. When both
are, it returns true only when both are const and both have the same
value. Otherwise, for the PRECISE case, it falls back to the default
check that is present now (i.e. assuming that we're talking about
sizes).

This is required as a future patch introduces a BPF memory allocator
interface, where we take the program BTF's type ID as an argument. Each
distinct type ID may result in the returned pointer obtaining a
different size, hence precision tracking is needed, and pruning cannot
just happen when the old value is within the current value. It must only
happen when the type ID is equal. The type ID will always correspond to
prog->aux->btf hence actual type match is not required.

Finally, change the precision argument passed to mark_chain_precision to
EXACT for constant non-size scalar arguments of kfuncs (tagged with the
__k suffix).

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
---
 include/linux/bpf_verifier.h |  10 ++--
 kernel/bpf/verifier.c        | 101 ++++++++++++++++++++++-------------
 2 files changed, 70 insertions(+), 41 deletions(-)

diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h
index 4585de45ad1c..8b09c3f82071 100644
--- a/include/linux/bpf_verifier.h
+++ b/include/linux/bpf_verifier.h
@@ -43,6 +43,12 @@ enum bpf_reg_liveness {
 	REG_LIVE_DONE = 0x8, /* liveness won't be updating this register anymore */
 };
 
+enum bpf_reg_precise {
+	NOT_PRECISE,
+	PRECISE,
+	EXACT,
+};
+
 struct bpf_reg_state {
 	/* Ordering of fields matters.  See states_equal() */
 	enum bpf_reg_type type;
@@ -180,7 +186,7 @@ struct bpf_reg_state {
 	s32 subreg_def;
 	enum bpf_reg_liveness live;
 	/* if (!precise && SCALAR_VALUE) min/max/tnum don't affect safety */
-	bool precise;
+	enum bpf_reg_precise precise;
 };
 
 enum bpf_stack_slot_type {
@@ -624,8 +630,6 @@ int bpf_check_attach_target(struct bpf_verifier_log *log,
 			    struct bpf_attach_target_info *tgt_info);
 void bpf_free_kfunc_btf_tab(struct bpf_kfunc_btf_tab *tab);
 
-int mark_chain_precision(struct bpf_verifier_env *env, int regno);
-
 #define BPF_BASE_TYPE_MASK	GENMASK(BPF_BASE_TYPE_BITS - 1, 0)
 
 /* extract base type from bpf_{arg, return, reg}_type. */
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 7c3d7d07773f..b324c1042fb8 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -855,7 +855,7 @@ static void print_verifier_state(struct bpf_verifier_env *env,
 		print_liveness(env, reg->live);
 		verbose(env, "=");
 		if (t == SCALAR_VALUE && reg->precise)
-			verbose(env, "P");
+			verbose(env, reg->precise == EXACT ? "E" : "P");
 		if ((t == SCALAR_VALUE || t == PTR_TO_STACK) &&
 		    tnum_is_const(reg->var_off)) {
 			/* reg->off should be 0 for SCALAR_VALUE */
@@ -952,7 +952,7 @@ static void print_verifier_state(struct bpf_verifier_env *env,
 			t = reg->type;
 			verbose(env, "=%s", t == SCALAR_VALUE ? "" : reg_type_str(env, t));
 			if (t == SCALAR_VALUE && reg->precise)
-				verbose(env, "P");
+				verbose(env, reg->precise == EXACT ? "E" : "P");
 			if (t == SCALAR_VALUE && tnum_is_const(reg->var_off))
 				verbose(env, "%lld", reg->var_off.value + reg->off);
 		} else {
@@ -1686,7 +1686,17 @@ static void __mark_reg_unknown(const struct bpf_verifier_env *env,
 	reg->type = SCALAR_VALUE;
 	reg->var_off = tnum_unknown;
 	reg->frameno = 0;
-	reg->precise = env->subprog_cnt > 1 || !env->bpf_capable;
+	/* Helpers requiring EXACT for constant arguments cannot be called from
+	 * programs without CAP_BPF. This is because we don't propagate
+	 * precision markers for when CAP_BPF is missing. If we allowed calling
+	 * such helpers in those programs, the default would have to be EXACT
+	 * for them, which would be too aggressive.
+	 *
+	 * We still propagate EXACT when subprog_cnt > 1, hence those cases
+	 * would still override the default PRECISE value when we propagate the
+	 * precision markers.
+	 */
+	reg->precise = (env->subprog_cnt > 1 || !env->bpf_capable) ? PRECISE : NOT_PRECISE;
 	__mark_reg_unbounded(reg);
 }
 
@@ -2736,7 +2746,8 @@ static int backtrack_insn(struct bpf_verifier_env *env, int idx,
  * For now backtracking falls back into conservative marking.
  */
 static void mark_all_scalars_precise(struct bpf_verifier_env *env,
-				     struct bpf_verifier_state *st)
+				     struct bpf_verifier_state *st,
+				     enum bpf_reg_precise precise)
 {
 	struct bpf_func_state *func;
 	struct bpf_reg_state *reg;
@@ -2752,7 +2763,7 @@ static void mark_all_scalars_precise(struct bpf_verifier_env *env,
 				reg = &func->regs[j];
 				if (reg->type != SCALAR_VALUE)
 					continue;
-				reg->precise = true;
+				reg->precise = precise;
 			}
 			for (j = 0; j < func->allocated_stack / BPF_REG_SIZE; j++) {
 				if (!is_spilled_reg(&func->stack[j]))
@@ -2760,13 +2771,13 @@ static void mark_all_scalars_precise(struct bpf_verifier_env *env,
 				reg = &func->stack[j].spilled_ptr;
 				if (reg->type != SCALAR_VALUE)
 					continue;
-				reg->precise = true;
+				reg->precise = precise;
 			}
 		}
 }
 
 static int __mark_chain_precision(struct bpf_verifier_env *env, int regno,
-				  int spi)
+				  int spi, enum bpf_reg_precise precise)
 {
 	struct bpf_verifier_state *st = env->cur_state;
 	int first_idx = st->first_insn_idx;
@@ -2793,7 +2804,7 @@ static int __mark_chain_precision(struct bpf_verifier_env *env, int regno,
 			new_marks = true;
 		else
 			reg_mask = 0;
-		reg->precise = true;
+		reg->precise = precise;
 	}
 
 	while (spi >= 0) {
@@ -2810,7 +2821,7 @@ static int __mark_chain_precision(struct bpf_verifier_env *env, int regno,
 			new_marks = true;
 		else
 			stack_mask = 0;
-		reg->precise = true;
+		reg->precise = precise;
 		break;
 	}
 
@@ -2832,7 +2843,7 @@ static int __mark_chain_precision(struct bpf_verifier_env *env, int regno,
 				err = backtrack_insn(env, i, &reg_mask, &stack_mask);
 			}
 			if (err == -ENOTSUPP) {
-				mark_all_scalars_precise(env, st);
+				mark_all_scalars_precise(env, st, precise);
 				return 0;
 			} else if (err) {
 				return err;
@@ -2873,7 +2884,7 @@ static int __mark_chain_precision(struct bpf_verifier_env *env, int regno,
 			}
 			if (!reg->precise)
 				new_marks = true;
-			reg->precise = true;
+			reg->precise = precise;
 		}
 
 		bitmap_from_u64(mask, stack_mask);
@@ -2892,7 +2903,7 @@ static int __mark_chain_precision(struct bpf_verifier_env *env, int regno,
 				 * fp-8 and it's "unallocated" stack space.
 				 * In such case fallback to conservative.
 				 */
-				mark_all_scalars_precise(env, st);
+				mark_all_scalars_precise(env, st, precise);
 				return 0;
 			}
 
@@ -2907,7 +2918,7 @@ static int __mark_chain_precision(struct bpf_verifier_env *env, int regno,
 			}
 			if (!reg->precise)
 				new_marks = true;
-			reg->precise = true;
+			reg->precise = precise;
 		}
 		if (env->log.level & BPF_LOG_LEVEL2) {
 			verbose(env, "parent %s regs=%x stack=%llx marks:",
@@ -2927,14 +2938,16 @@ static int __mark_chain_precision(struct bpf_verifier_env *env, int regno,
 	return 0;
 }
 
-int mark_chain_precision(struct bpf_verifier_env *env, int regno)
+static int mark_chain_precision(struct bpf_verifier_env *env, int regno,
+				enum bpf_reg_precise precise)
 {
-	return __mark_chain_precision(env, regno, -1);
+	return __mark_chain_precision(env, regno, -1, precise);
 }
 
-static int mark_chain_precision_stack(struct bpf_verifier_env *env, int spi)
+static int mark_chain_precision_stack(struct bpf_verifier_env *env, int spi,
+				      enum bpf_reg_precise precise)
 {
-	return __mark_chain_precision(env, -1, spi);
+	return __mark_chain_precision(env, -1, spi, precise);
 }
 
 static bool is_spillable_regtype(enum bpf_reg_type type)
@@ -3069,7 +3082,7 @@ static int check_stack_write_fixed_off(struct bpf_verifier_env *env,
 			 * Backtrack from here and mark all registers as precise
 			 * that contributed into 'reg' being a constant.
 			 */
-			err = mark_chain_precision(env, value_regno);
+			err = mark_chain_precision(env, value_regno, PRECISE);
 			if (err)
 				return err;
 		}
@@ -3110,7 +3123,7 @@ static int check_stack_write_fixed_off(struct bpf_verifier_env *env,
 		/* when we zero initialize stack slots mark them as such */
 		if (reg && register_is_null(reg)) {
 			/* backtracking doesn't work for STACK_ZERO yet. */
-			err = mark_chain_precision(env, value_regno);
+			err = mark_chain_precision(env, value_regno, PRECISE);
 			if (err)
 				return err;
 			type = STACK_ZERO;
@@ -3226,7 +3239,7 @@ static int check_stack_write_var_off(struct bpf_verifier_env *env,
 	}
 	if (zero_used) {
 		/* backtracking doesn't work for STACK_ZERO yet. */
-		err = mark_chain_precision(env, value_regno);
+		err = mark_chain_precision(env, value_regno, PRECISE);
 		if (err)
 			return err;
 	}
@@ -3275,7 +3288,7 @@ static void mark_reg_stack_read(struct bpf_verifier_env *env,
 		 * backtracking. Any register that contributed
 		 * to const 0 was marked precise before spill.
 		 */
-		state->regs[dst_regno].precise = true;
+		state->regs[dst_regno].precise = PRECISE;
 	} else {
 		/* have read misc data from the stack */
 		mark_reg_unknown(env, state->regs, dst_regno);
@@ -5332,7 +5345,7 @@ static int check_mem_size_reg(struct bpf_verifier_env *env,
 				      reg->umax_value,
 				      zero_size_allowed, meta);
 	if (!err)
-		err = mark_chain_precision(env, regno);
+		err = mark_chain_precision(env, regno, PRECISE);
 	return err;
 }
 
@@ -6150,7 +6163,7 @@ static int check_func_arg(struct bpf_verifier_env *env, u32 arg,
 			return -EACCES;
 		}
 		meta->mem_size = reg->var_off.value;
-		err = mark_chain_precision(env, regno);
+		err = mark_chain_precision(env, regno, PRECISE);
 		if (err)
 			return err;
 		break;
@@ -7117,7 +7130,7 @@ record_func_key(struct bpf_verifier_env *env, struct bpf_call_arg_meta *meta,
 		return 0;
 	}
 
-	err = mark_chain_precision(env, BPF_REG_3);
+	err = mark_chain_precision(env, BPF_REG_3, PRECISE);
 	if (err)
 		return err;
 	if (bpf_map_key_unseen(aux))
@@ -7217,7 +7230,7 @@ static bool loop_flag_is_zero(struct bpf_verifier_env *env)
 	bool reg_is_null = register_is_null(reg);
 
 	if (reg_is_null)
-		mark_chain_precision(env, BPF_REG_4);
+		mark_chain_precision(env, BPF_REG_4, PRECISE);
 
 	return reg_is_null;
 }
@@ -8039,7 +8052,7 @@ static int check_kfunc_args(struct bpf_verifier_env *env, struct bpf_kfunc_call_
 					verbose(env, "R%d must be a known constant\n", regno);
 					return -EINVAL;
 				}
-				ret = mark_chain_precision(env, regno);
+				ret = mark_chain_precision(env, regno, EXACT);
 				if (ret < 0)
 					return ret;
 				meta->arg_constant.found = true;
@@ -8063,7 +8076,7 @@ static int check_kfunc_args(struct bpf_verifier_env *env, struct bpf_kfunc_call_
 				}
 
 				meta->r0_size = reg->var_off.value;
-				ret = mark_chain_precision(env, regno);
+				ret = mark_chain_precision(env, regno, PRECISE);
 				if (ret)
 					return ret;
 			}
@@ -9742,7 +9755,7 @@ static int adjust_reg_min_max_vals(struct bpf_verifier_env *env,
 				 * This is legal, but we have to reverse our
 				 * src/dest handling in computing the range
 				 */
-				err = mark_chain_precision(env, insn->dst_reg);
+				err = mark_chain_precision(env, insn->dst_reg, PRECISE);
 				if (err)
 					return err;
 				return adjust_ptr_min_max_vals(env, insn,
@@ -9750,7 +9763,7 @@ static int adjust_reg_min_max_vals(struct bpf_verifier_env *env,
 			}
 		} else if (ptr_reg) {
 			/* pointer += scalar */
-			err = mark_chain_precision(env, insn->src_reg);
+			err = mark_chain_precision(env, insn->src_reg, PRECISE);
 			if (err)
 				return err;
 			return adjust_ptr_min_max_vals(env, insn,
@@ -10746,10 +10759,10 @@ static int check_cond_jmp_op(struct bpf_verifier_env *env,
 		 * above is_branch_taken() special cased the 0 comparison.
 		 */
 		if (!__is_pointer_value(false, dst_reg))
-			err = mark_chain_precision(env, insn->dst_reg);
+			err = mark_chain_precision(env, insn->dst_reg, PRECISE);
 		if (BPF_SRC(insn->code) == BPF_X && !err &&
 		    !__is_pointer_value(false, src_reg))
-			err = mark_chain_precision(env, insn->src_reg);
+			err = mark_chain_precision(env, insn->src_reg, PRECISE);
 		if (err)
 			return err;
 	}
@@ -12070,9 +12083,19 @@ static bool regsafe(struct bpf_verifier_env *env, struct bpf_reg_state *rold,
 		if (rcur->type == SCALAR_VALUE) {
 			if (!rold->precise && !rcur->precise)
 				return true;
-			/* new val must satisfy old val knowledge */
-			return range_within(rold, rcur) &&
-			       tnum_in(rold->var_off, rcur->var_off);
+			/* We can only determine safety when type of precision
+			 * needed is same. For EXACT, we need values to match
+			 * exactly, so simply return false as the memcmp above
+			 * failed already, otherwise current being within the
+			 * old value suffices.
+			 */
+			if (rold->precise == EXACT || rcur->precise == EXACT) {
+				return false;
+			} else {
+				/* new val must satisfy old val knowledge */
+				return range_within(rold, rcur) &&
+				       tnum_in(rold->var_off, rcur->var_off);
+			}
 		} else {
 			/* We're trying to use a pointer in place of a scalar.
 			 * Even if the scalar was unbounded, this could lead to
@@ -12401,8 +12424,9 @@ static int propagate_precision(struct bpf_verifier_env *env,
 		    !state_reg->precise)
 			continue;
 		if (env->log.level & BPF_LOG_LEVEL2)
-			verbose(env, "propagating r%d\n", i);
-		err = mark_chain_precision(env, i);
+			verbose(env, "propagating %sr%d\n",
+				state_reg->precise == EXACT ? "exact " : "", i);
+		err = mark_chain_precision(env, i, state_reg->precise);
 		if (err < 0)
 			return err;
 	}
@@ -12415,9 +12439,10 @@ static int propagate_precision(struct bpf_verifier_env *env,
 		    !state_reg->precise)
 			continue;
 		if (env->log.level & BPF_LOG_LEVEL2)
-			verbose(env, "propagating fp%d\n",
+			verbose(env, "propagating %sfp%d\n",
+				state_reg->precise == EXACT ? "exact " : "",
 				(-i - 1) * BPF_REG_SIZE);
-		err = mark_chain_precision_stack(env, i);
+		err = mark_chain_precision_stack(env, i, state_reg->precise);
 		if (err < 0)
 			return err;
 	}
-- 
2.38.0


* [PATCH bpf-next v2 19/25] bpf: Introduce bpf_kptr_new
  2022-10-13  6:22 [PATCH bpf-next v2 00/25] Local kptrs, BPF linked lists Kumar Kartikeya Dwivedi
                   ` (17 preceding siblings ...)
  2022-10-13  6:22 ` [PATCH bpf-next v2 18/25] bpf: Teach verifier about non-size constant arguments Kumar Kartikeya Dwivedi
@ 2022-10-13  6:22 ` Kumar Kartikeya Dwivedi
  2022-10-19  2:31   ` Alexei Starovoitov
  2022-10-13  6:22 ` [PATCH bpf-next v2 20/25] bpf: Introduce bpf_kptr_drop Kumar Kartikeya Dwivedi
                   ` (5 subsequent siblings)
  24 siblings, 1 reply; 52+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2022-10-13  6:22 UTC (permalink / raw)
  To: bpf
  Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Dave Marchevsky, Delyan Kratunov

Introduce type safe memory allocator bpf_kptr_new for BPF programs. The
kernel side kfunc is named bpf_kptr_new_impl, as passing hidden
arguments to kfuncs still requires having them in prototype, unlike BPF
helpers which always take 5 arguments and have them checked using
bpf_func_proto in verifier, ignoring unset argument types.

Introduce __ign suffix to ignore a specific kfunc argument during type
checks, then use this to introduce support for passing type metadata to
the bpf_kptr_new_impl kfunc.

The user passes the BTF ID of the type it wants to allocate in program
BTF; the verifier then rewrites the first argument as the size of this
type, after performing some sanity checks (to ensure it exists and it is
a struct type).
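A minimal userspace sketch of that first-argument rewrite is given
below. It is only a model under stated assumptions: program BTF is
replaced by a hypothetical flat table (struct type_info), and
resolve_alloc_size is an illustrative name, not a kernel function. The
checks mirror the ones named above: the constant must fit in 32 bits,
the type must exist, and it must be a struct.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum type_kind { KIND_INT, KIND_STRUCT };

/* Hypothetical stand-in for one entry of the program's BTF. */
struct type_info {
	enum type_kind kind;
	uint32_t size;
};

/* Returns 0 and stores the size on success, -1 on any failed check. */
static int resolve_alloc_size(const struct type_info *types, size_t nr_types,
			      uint64_t type_id, uint64_t *size)
{
	assert(types != NULL && size != NULL);
	/* The constant must fit in 32 bits, as type IDs are u32. */
	if ((uint64_t)(uint32_t)type_id != type_id)
		return -1;
	/* ID 0 is void in BTF, and IDs past the table do not exist. */
	if (type_id == 0 || type_id >= nr_types)
		return -1;
	/* Only struct types may be allocated. */
	if (types[type_id].kind != KIND_STRUCT)
		return -1;
	/* This is the value that would replace the type ID in R1. */
	*size = types[type_id].size;
	return 0;
}
```

Because each type ID can resolve to a different size, two calls with
different IDs must be verified separately, which is why the argument
carries the __k suffix.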

The second argument, flags, is reserved and must be 0 for now.

The third argument is also fixed up and passed by the verifier. This is
the btf_struct_meta for the type being allocated. It would be needed
mostly for the offset array which is required for zero initializing
special fields while leaving the rest of the storage uninitialized.

It would also be needed in the next patch to perform proper destruction
of the object's special fields.
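To make the role of the offset array concrete, here is a small
userspace sketch of the zero-initialization step, modeled on the
bpf_obj_init helper added in this patch. The struct field_offsets
layout, the MAX_SPECIAL_FIELDS bound and the obj_init name are
assumptions made for the sketch; only the loop over (offset, size)
pairs follows the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_SPECIAL_FIELDS 8	/* arbitrary bound for this sketch */

/* Hypothetical stand-in for the per-type offset array. */
struct field_offsets {
	uint32_t cnt;
	uint32_t field_off[MAX_SPECIAL_FIELDS];
	uint32_t field_sz[MAX_SPECIAL_FIELDS];
};

/* Zero only the special fields; all other bytes are left untouched. */
static void obj_init(const struct field_offsets *offs, void *obj)
{
	uint32_t i;

	assert(obj != NULL);
	/* A type without special fields needs no initialization. */
	if (!offs)
		return;
	for (i = 0; i < offs->cnt; i++)
		memset((char *)obj + offs->field_off[i], 0, offs->field_sz[i]);
}
```

Given a 16-byte object with one 8-byte special field at offset 4, only
bytes 4 through 11 are cleared and the rest keep whatever the allocator
returned.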

A convenience macro is included in the bpf_experimental.h header to hide
the ugly details of the implementation, so that user code looks similar
to a language level extension which allocates and constructs fields of a
user type.

struct bar {
	struct bpf_list_node node;
};

struct foo {
	struct bpf_spin_lock lock;
	struct bpf_list_head head __contains(bar, node);
};

void prog(void) {
	struct foo *f;

	f = bpf_kptr_new(typeof(*f));
	if (!f)
		return;
	...
}

A key piece of this story is still missing, i.e. the free function,
which will come in the next patch.

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
---
 include/linux/bpf.h                           |  21 ++--
 include/linux/bpf_verifier.h                  |   2 +
 kernel/bpf/core.c                             |  14 +++
 kernel/bpf/helpers.c                          |  41 +++++--
 kernel/bpf/verifier.c                         | 107 ++++++++++++++++--
 .../testing/selftests/bpf/bpf_experimental.h  |  19 ++++
 6 files changed, 181 insertions(+), 23 deletions(-)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 7ffafa5bb866..29fccf7c8505 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -55,6 +55,8 @@ struct cgroup;
 extern struct idr btf_idr;
 extern spinlock_t btf_idr_lock;
 extern struct kobject *btf_kobj;
+extern struct bpf_mem_alloc bpf_global_ma;
+extern bool bpf_global_ma_set;
 
 typedef u64 (*bpf_callback_t)(u64, u64, u64, u64, u64);
 typedef int (*bpf_iter_init_seq_priv_t)(void *private_data,
@@ -335,16 +337,19 @@ static inline bool btf_type_fields_has_field(const struct btf_type_fields *tab,
 	return tab->field_mask & type;
 }
 
-static inline void check_and_init_map_value(struct bpf_map *map, void *dst)
+static inline void bpf_obj_init(const struct btf_type_fields_off *off_arr, void *obj)
 {
-	if (!IS_ERR_OR_NULL(map->fields_tab)) {
-		struct btf_field *fields = map->fields_tab->fields;
-		u32 cnt = map->fields_tab->cnt;
-		int i;
+	int i;
 
-		for (i = 0; i < cnt; i++)
-			memset(dst + fields[i].offset, 0, btf_field_type_size(fields[i].type));
-	}
+	if (!off_arr)
+		return;
+	for (i = 0; i < off_arr->cnt; i++)
+		memset(obj + off_arr->field_off[i], 0, off_arr->field_sz[i]);
+}
+
+static inline void check_and_init_map_value(struct bpf_map *map, void *dst)
+{
+	bpf_obj_init(map->off_arr, dst);
 }
 
 /* memcpy that is used with 8-byte aligned pointers, power-of-8 size and
diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h
index 8b09c3f82071..0cc4679f3f42 100644
--- a/include/linux/bpf_verifier.h
+++ b/include/linux/bpf_verifier.h
@@ -426,6 +426,8 @@ struct bpf_insn_aux_data {
 		 */
 		struct bpf_loop_inline_state loop_inline_state;
 	};
+	u64 kptr_new_size; /* remember the size of type passed to bpf_kptr_new to rewrite R1 */
+	struct btf_struct_meta *kptr_struct_meta;
 	u64 map_key_state; /* constant (32 bit) key tracking for maps */
 	int ctx_field_size; /* the ctx field size for load insn, maybe 0 */
 	u32 seen; /* this insn was processed by the verifier at env->pass_cnt */
diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
index 711fd293b6de..a8b3263a9a45 100644
--- a/kernel/bpf/core.c
+++ b/kernel/bpf/core.c
@@ -34,6 +34,7 @@
 #include <linux/log2.h>
 #include <linux/bpf_verifier.h>
 #include <linux/nodemask.h>
+#include <linux/bpf_mem_alloc.h>
 
 #include <asm/barrier.h>
 #include <asm/unaligned.h>
@@ -60,6 +61,9 @@
 #define CTX	regs[BPF_REG_CTX]
 #define IMM	insn->imm
 
+struct bpf_mem_alloc bpf_global_ma;
+bool bpf_global_ma_set;
+
 /* No hurry in this branch
  *
  * Exported for the bpf jit load helper.
@@ -2740,6 +2744,16 @@ int __weak bpf_arch_text_invalidate(void *dst, size_t len)
 	return -ENOTSUPP;
 }
 
+static int __init bpf_global_ma_init(void)
+{
+	int ret;
+
+	ret = bpf_mem_alloc_init(&bpf_global_ma, 0, false);
+	bpf_global_ma_set = !ret;
+	return ret;
+}
+late_initcall(bpf_global_ma_init);
+
 DEFINE_STATIC_KEY_FALSE(bpf_stats_enabled_key);
 EXPORT_SYMBOL(bpf_stats_enabled_key);
 
diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
index 238103dc6c5e..954e0bf18269 100644
--- a/kernel/bpf/helpers.c
+++ b/kernel/bpf/helpers.c
@@ -19,6 +19,7 @@
 #include <linux/proc_ns.h>
 #include <linux/security.h>
 #include <linux/btf_ids.h>
+#include <linux/bpf_mem_alloc.h>
 
 #include "../../lib/kstrtox.h"
 
@@ -1725,8 +1726,11 @@ void bpf_list_head_free(const struct btf_field *field, void *list_head,
 
 		obj -= field->list_head.node_offset;
 		head = head->next;
-		/* TODO: Rework later */
-		kfree(obj);
+		/* The contained type can also have resources, including a
+		 * bpf_list_head which needs to be freed.
+		 */
+		bpf_obj_free_fields(field->list_head.value_tab, obj);
+		bpf_mem_free(&bpf_global_ma, obj);
 	}
 unlock:
 	INIT_LIST_HEAD(head);
@@ -1734,20 +1738,43 @@ void bpf_list_head_free(const struct btf_field *field, void *list_head,
 	local_irq_restore(flags);
 }
 
-BTF_SET8_START(tracing_btf_ids)
+__diag_push();
+__diag_ignore_all("-Wmissing-prototypes",
+		  "Global functions as their definitions will be in vmlinux BTF");
+
+void *bpf_kptr_new_impl(u64 local_type_id__k, u64 flags, void *meta__ign)
+{
+	struct btf_struct_meta *meta = meta__ign;
+	u64 size = local_type_id__k;
+	void *p;
+
+	if (unlikely(flags || !bpf_global_ma_set))
+		return NULL;
+	p = bpf_mem_alloc(&bpf_global_ma, size);
+	if (!p)
+		return NULL;
+	if (meta)
+		bpf_obj_init(meta->off_arr, p);
+	return p;
+}
+
+__diag_pop();
+
+BTF_SET8_START(generic_btf_ids)
 #ifdef CONFIG_KEXEC_CORE
 BTF_ID_FLAGS(func, crash_kexec, KF_DESTRUCTIVE)
 #endif
-BTF_SET8_END(tracing_btf_ids)
+BTF_ID_FLAGS(func, bpf_kptr_new_impl, KF_ACQUIRE | KF_RET_NULL)
+BTF_SET8_END(generic_btf_ids)
 
-static const struct btf_kfunc_id_set tracing_kfunc_set = {
+static const struct btf_kfunc_id_set generic_kfunc_set = {
 	.owner = THIS_MODULE,
-	.set   = &tracing_btf_ids,
+	.set   = &generic_btf_ids,
 };
 
 static int __init kfunc_init(void)
 {
-	return register_btf_kfunc_id_set(BPF_PROG_TYPE_TRACING, &tracing_kfunc_set);
+	return register_btf_kfunc_id_set(BPF_PROG_TYPE_TRACING, &generic_kfunc_set);
 }
 
 late_initcall(kfunc_init);
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index b324c1042fb8..9cc01535e391 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -7766,6 +7766,11 @@ static bool is_kfunc_arg_sfx_constant(const struct btf *btf, const struct btf_pa
 	return __kfunc_param_match_suffix(btf, arg, "__k");
 }
 
+static bool is_kfunc_arg_sfx_ignore(const struct btf *btf, const struct btf_param *arg)
+{
+	return __kfunc_param_match_suffix(btf, arg, "__ign");
+}
+
 static bool is_kfunc_arg_ret_buf_size(const struct btf *btf,
 				      const struct btf_param *arg,
 				      const struct bpf_reg_state *reg,
@@ -8035,6 +8040,10 @@ static int check_kfunc_args(struct bpf_verifier_env *env, struct bpf_kfunc_call_
 		int kf_arg_type;
 
 		t = btf_type_skip_modifiers(btf, args[i].type, NULL);
+
+		if (is_kfunc_arg_sfx_ignore(btf, &args[i]))
+			continue;
+
 		if (btf_type_is_scalar(t)) {
 			if (reg->type != SCALAR_VALUE) {
 				verbose(env, "R%d is not a scalar\n", regno);
@@ -8212,6 +8221,17 @@ static int check_kfunc_args(struct bpf_verifier_env *env, struct bpf_kfunc_call_
 	return 0;
 }
 
+enum special_kfunc_type {
+	KF_bpf_kptr_new_impl,
+};
+
+BTF_SET_START(special_kfunc_set)
+BTF_ID(func, bpf_kptr_new_impl)
+BTF_SET_END(special_kfunc_set)
+
+BTF_ID_LIST(special_kfunc_list)
+BTF_ID(func, bpf_kptr_new_impl)
+
 static int check_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
 			    int *insn_idx_p)
 {
@@ -8286,17 +8306,64 @@ static int check_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
 	t = btf_type_skip_modifiers(desc_btf, func_proto->type, NULL);
 
 	if (is_kfunc_acquire(&meta) && !btf_type_is_struct_ptr(meta.btf, t)) {
-		verbose(env, "acquire kernel function does not return PTR_TO_BTF_ID\n");
-		return -EINVAL;
+		/* Only exception is bpf_kptr_new_impl */
+		if (meta.btf != btf_vmlinux || meta.func_id != special_kfunc_list[KF_bpf_kptr_new_impl]) {
+			verbose(env, "acquire kernel function does not return PTR_TO_BTF_ID\n");
+			return -EINVAL;
+		}
 	}
 
 	if (btf_type_is_scalar(t)) {
 		mark_reg_unknown(env, regs, BPF_REG_0);
 		mark_btf_func_reg_size(env, BPF_REG_0, t->size);
 	} else if (btf_type_is_ptr(t)) {
-		ptr_type = btf_type_skip_modifiers(desc_btf, t->type,
-						   &ptr_type_id);
-		if (!btf_type_is_struct(ptr_type)) {
+		ptr_type = btf_type_skip_modifiers(desc_btf, t->type, &ptr_type_id);
+
+		if (meta.btf == btf_vmlinux && btf_id_set_contains(&special_kfunc_set, meta.func_id)) {
+			if (!btf_type_is_void(ptr_type)) {
+				verbose(env, "kernel function %s must have void * return type\n",
+					meta.func_name);
+				return -EINVAL;
+			}
+			if (meta.func_id == special_kfunc_list[KF_bpf_kptr_new_impl]) {
+				const struct btf_type *ret_t;
+				struct btf *ret_btf;
+				u32 ret_btf_id;
+
+				if (((u64)(u32)meta.arg_constant.value) != meta.arg_constant.value) {
+					verbose(env, "local type ID argument must be in range [0, U32_MAX]\n");
+					return -EINVAL;
+				}
+
+				ret_btf = env->prog->aux->btf;
+				ret_btf_id = meta.arg_constant.value;
+
+				/* This may be NULL due to user not supplying a BTF */
+				if (!ret_btf) {
+					verbose(env, "bpf_kptr_new requires prog BTF\n");
+					return -EINVAL;
+				}
+
+				mark_reg_known_zero(env, regs, BPF_REG_0);
+				regs[BPF_REG_0].type = PTR_TO_BTF_ID | MEM_TYPE_LOCAL;
+				regs[BPF_REG_0].btf = ret_btf;
+				regs[BPF_REG_0].btf_id = ret_btf_id;
+
+				ret_t = btf_type_by_id(ret_btf, ret_btf_id);
+				if (!ret_t || !__btf_type_is_struct(ret_t)) {
+					verbose(env, "bpf_kptr_new type ID argument must be of a struct\n");
+					return -EINVAL;
+				}
+
+				env->insn_aux_data[insn_idx].kptr_new_size = ret_t->size;
+				env->insn_aux_data[insn_idx].kptr_struct_meta =
+					btf_find_struct_meta(ret_btf, ret_btf_id);
+			} else {
+				verbose(env, "kernel function %s unhandled dynamic return type\n",
+					meta.func_name);
+				return -EFAULT;
+			}
+		} else if (!__btf_type_is_struct(ptr_type)) {
 			if (!meta.r0_size) {
 				ptr_type_name = btf_name_by_offset(desc_btf,
 								   ptr_type->name_off);
@@ -8324,6 +8391,7 @@ static int check_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
 			regs[BPF_REG_0].type = PTR_TO_BTF_ID;
 			regs[BPF_REG_0].btf_id = ptr_type_id;
 		}
+
 		if (is_kfunc_ret_null(&meta)) {
 			regs[BPF_REG_0].type |= PTR_MAYBE_NULL;
 			/* For mark_ptr_or_null_reg, see 93c230e3f5bd6 */
@@ -14455,8 +14523,8 @@ static int fixup_call_args(struct bpf_verifier_env *env)
 	return err;
 }
 
-static int fixup_kfunc_call(struct bpf_verifier_env *env,
-			    struct bpf_insn *insn)
+static int fixup_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
+			    struct bpf_insn *insn_buf, int insn_idx, int *cnt)
 {
 	const struct bpf_kfunc_desc *desc;
 
@@ -14475,8 +14543,21 @@ static int fixup_kfunc_call(struct bpf_verifier_env *env,
 		return -EFAULT;
 	}
 
+	*cnt = 0;
 	insn->imm = desc->imm;
+	if (insn->off)
+		return 0;
+	if (desc->func_id == special_kfunc_list[KF_bpf_kptr_new_impl]) {
+		struct btf_struct_meta *kptr_struct_meta = env->insn_aux_data[insn_idx].kptr_struct_meta;
+		struct bpf_insn addr[2] = { BPF_LD_IMM64(BPF_REG_3, (long)kptr_struct_meta) };
+		u64 kptr_new_size = env->insn_aux_data[insn_idx].kptr_new_size;
 
+		insn_buf[0] = BPF_MOV64_IMM(BPF_REG_1, kptr_new_size);
+		insn_buf[1] = addr[0];
+		insn_buf[2] = addr[1];
+		insn_buf[3] = *insn;
+		*cnt = 4;
+	}
 	return 0;
 }
 
@@ -14618,9 +14699,19 @@ static int do_misc_fixups(struct bpf_verifier_env *env)
 		if (insn->src_reg == BPF_PSEUDO_CALL)
 			continue;
 		if (insn->src_reg == BPF_PSEUDO_KFUNC_CALL) {
-			ret = fixup_kfunc_call(env, insn);
+			ret = fixup_kfunc_call(env, insn, insn_buf, i + delta, &cnt);
 			if (ret)
 				return ret;
+			if (cnt == 0)
+				continue;
+
+			new_prog = bpf_patch_insn_data(env, i + delta, insn_buf, cnt);
+			if (!new_prog)
+				return -ENOMEM;
+
+			delta	 += cnt - 1;
+			env->prog = prog = new_prog;
+			insn	  = new_prog->insnsi + i + delta;
 			continue;
 		}
 
diff --git a/tools/testing/selftests/bpf/bpf_experimental.h b/tools/testing/selftests/bpf/bpf_experimental.h
index 4e31790e433d..9c7d0badb02e 100644
--- a/tools/testing/selftests/bpf/bpf_experimental.h
+++ b/tools/testing/selftests/bpf/bpf_experimental.h
@@ -20,4 +20,23 @@ struct bpf_list_node {
 #endif
 
 #ifndef __KERNEL__
+
+/* Description
+ *	Allocates a local kptr of type represented by 'local_type_id' in program
+ *	BTF. User may use the bpf_core_type_id_local macro to pass the type ID
+ *	of a struct in program BTF.
+ *
+ *	The 'local_type_id' parameter must be a known constant. The 'flags'
+ *	parameter must be 0.
+ *
+ *	The 'meta__ign' parameter is a hidden argument that is ignored.
+ * Returns
+ *	A local kptr corresponding to passed in 'local_type_id', or NULL on
+ *	failure.
+ */
+extern void *bpf_kptr_new_impl(__u64 local_type_id, __u64 flags, void *meta__ign) __ksym;
+
+/* Convenience macro to wrap over bpf_kptr_new_impl */
+#define bpf_kptr_new(type) bpf_kptr_new_impl(bpf_core_type_id_local(type), 0, NULL)
+
 #endif
-- 
2.38.0



* [PATCH bpf-next v2 20/25] bpf: Introduce bpf_kptr_drop
  2022-10-13  6:22 [PATCH bpf-next v2 00/25] Local kptrs, BPF linked lists Kumar Kartikeya Dwivedi
                   ` (18 preceding siblings ...)
  2022-10-13  6:22 ` [PATCH bpf-next v2 19/25] bpf: Introduce bpf_kptr_new Kumar Kartikeya Dwivedi
@ 2022-10-13  6:22 ` Kumar Kartikeya Dwivedi
  2022-10-13  6:22 ` [PATCH bpf-next v2 21/25] bpf: Permit NULL checking pointer with non-zero fixed offset Kumar Kartikeya Dwivedi
                   ` (4 subsequent siblings)
  24 siblings, 0 replies; 52+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2022-10-13  6:22 UTC (permalink / raw)
  To: bpf
  Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Dave Marchevsky, Delyan Kratunov

Introduce bpf_kptr_drop, the kfunc used to free local kptrs allocated
using bpf_kptr_new. Just as bpf_kptr_new implicitly initializes the
special fields of a local kptr, bpf_kptr_drop implicitly destructs them
before freeing the storage, without user intervention.

Just like the previous patch, the btf_struct_meta that is needed to free
up the special fields is passed as a hidden argument to the kfunc.

For the user, a convenience macro wraps the kernel-side kfunc, which is
named bpf_kptr_drop_impl.

Continuing the previous example:

void prog(void) {
	struct foo *f;

	f = bpf_kptr_new(typeof(*f));
	if (!f)
		return;
	bpf_kptr_drop(f);
}
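
The hidden-argument scheme can be pictured with a small userspace model.
This is only a sketch under stated assumptions: every name ending in
_model is hypothetical, malloc/free stand in for the BPF memory
allocator, and a plain destructor table stands in for btf_struct_meta.
In the kernel it is the verifier, not the program, that patches the real
meta pointer into the call.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for one special field inside an object. */
struct field_model {
	size_t off;                 /* offset of the field in the object */
	void (*dtor)(void *field);  /* how to destruct that field */
};

/* Hypothetical stand-in for btf_struct_meta. */
struct meta_model {
	size_t cnt;
	const struct field_model *fields;
};

static int dtor_calls_model;

/* Destructor used for illustration: only counts its invocations. */
static void count_dtor_model(void *field)
{
	(void)field;
	dtor_calls_model++;
}

struct foo_model {
	int data;
	int special;  /* pretend this field needs destruction */
};

static const struct field_model foo_fields_model[] = {
	{ offsetof(struct foo_model, special), count_dtor_model },
};

static const struct meta_model foo_meta_model = { 1, foo_fields_model };

/* Same shape as bpf_kptr_drop_impl: destruct fields if meta is set, then free. */
static void kptr_drop_model(void *p, const struct meta_model *meta)
{
	size_t i;

	if (meta)
		for (i = 0; i < meta->cnt; i++)
			meta->fields[i].dtor((char *)p + meta->fields[i].off);
	free(p);
}
```

Calling kptr_drop_model(f, &foo_meta_model) runs the one registered
destructor before freeing, while passing NULL for the meta argument
frees the storage without touching any field.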

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
---
 kernel/bpf/helpers.c                          | 11 ++++
 kernel/bpf/verifier.c                         | 66 +++++++++++++++----
 .../testing/selftests/bpf/bpf_experimental.h  | 13 ++++
 3 files changed, 79 insertions(+), 11 deletions(-)

diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
index 954e0bf18269..43a7c9999e94 100644
--- a/kernel/bpf/helpers.c
+++ b/kernel/bpf/helpers.c
@@ -1758,6 +1758,16 @@ void *bpf_kptr_new_impl(u64 local_type_id__k, u64 flags, void *meta__ign)
 	return p;
 }
 
+void bpf_kptr_drop_impl(void *p__lkptr, void *meta__ign)
+{
+	struct btf_struct_meta *meta = meta__ign;
+	void *p = p__lkptr;
+
+	if (meta)
+		bpf_obj_free_fields(meta->fields_tab, p);
+	bpf_mem_free(&bpf_global_ma, p);
+}
+
 __diag_pop();
 
 BTF_SET8_START(generic_btf_ids)
@@ -1765,6 +1775,7 @@ BTF_SET8_START(generic_btf_ids)
 BTF_ID_FLAGS(func, crash_kexec, KF_DESTRUCTIVE)
 #endif
 BTF_ID_FLAGS(func, bpf_kptr_new_impl, KF_ACQUIRE | KF_RET_NULL)
+BTF_ID_FLAGS(func, bpf_kptr_drop_impl, KF_RELEASE)
 BTF_SET8_END(generic_btf_ids)
 
 static const struct btf_kfunc_id_set generic_kfunc_set = {
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 9cc01535e391..a4a806cb68dc 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -7693,6 +7693,10 @@ struct bpf_kfunc_call_arg_meta {
 		u64 value;
 		bool found;
 	} arg_constant;
+	struct {
+		struct btf *btf;
+		u32 btf_id;
+	} arg_kptr_drop;
 };
 
 static bool is_kfunc_acquire(struct bpf_kfunc_call_arg_meta *meta)
@@ -7771,6 +7775,11 @@ static bool is_kfunc_arg_sfx_ignore(const struct btf *btf, const struct btf_para
 	return __kfunc_param_match_suffix(btf, arg, "__ign");
 }
 
+static bool is_kfunc_arg_local_kptr(const struct btf *btf, const struct btf_param *arg)
+{
+	return __kfunc_param_match_suffix(btf, arg, "__lkptr");
+}
+
 static bool is_kfunc_arg_ret_buf_size(const struct btf *btf,
 				      const struct btf_param *arg,
 				      const struct bpf_reg_state *reg,
@@ -7871,6 +7880,7 @@ static u32 *reg2btf_ids[__BPF_REG_TYPE_MAX] = {
 
 enum kfunc_ptr_arg_type {
 	KF_ARG_PTR_TO_CTX,
+	KF_ARG_PTR_TO_LOCAL_BTF_ID,  /* Local kptr */
 	KF_ARG_PTR_TO_BTF_ID,	     /* Also covers reg2btf_ids conversions */
 	KF_ARG_PTR_TO_KPTR_STRONG,   /* PTR_TO_KPTR but type specific */
 	KF_ARG_PTR_TO_DYNPTR,
@@ -7878,6 +7888,20 @@ enum kfunc_ptr_arg_type {
 	KF_ARG_PTR_TO_MEM_SIZE,	     /* Size derived from next argument, skip it */
 };
 
+enum special_kfunc_type {
+	KF_bpf_kptr_new_impl,
+	KF_bpf_kptr_drop_impl,
+};
+
+BTF_SET_START(special_kfunc_set)
+BTF_ID(func, bpf_kptr_new_impl)
+BTF_ID(func, bpf_kptr_drop_impl)
+BTF_SET_END(special_kfunc_set)
+
+BTF_ID_LIST(special_kfunc_list)
+BTF_ID(func, bpf_kptr_new_impl)
+BTF_ID(func, bpf_kptr_drop_impl)
+
 enum kfunc_ptr_arg_type get_kfunc_ptr_arg_type(struct bpf_verifier_env *env,
 						struct bpf_kfunc_call_arg_meta *meta,
 						const struct btf_type *t,
@@ -7899,6 +7923,9 @@ enum kfunc_ptr_arg_type get_kfunc_ptr_arg_type(struct bpf_verifier_env *env,
 	if (btf_get_prog_ctx_type(&env->log, meta->btf, t, resolve_prog_type(env->prog), argno))
 		return KF_ARG_PTR_TO_CTX;
 
+	if (is_kfunc_arg_local_kptr(meta->btf, &args[argno]))
+		return KF_ARG_PTR_TO_LOCAL_BTF_ID;
+
 	if ((base_type(reg->type) == PTR_TO_BTF_ID || reg2btf_ids[base_type(reg->type)])) {
 		if (!btf_type_is_struct(ref_t)) {
 			verbose(env, "kernel function %s args#%d pointer type %s %s is not supported\n",
@@ -8117,6 +8144,7 @@ static int check_kfunc_args(struct bpf_verifier_env *env, struct bpf_kfunc_call_
 			return kf_arg_type;
 
 		switch (kf_arg_type) {
+		case KF_ARG_PTR_TO_LOCAL_BTF_ID:
 		case KF_ARG_PTR_TO_BTF_ID:
 			if (is_kfunc_trusted_args(meta) && !reg->ref_obj_id) {
 				verbose(env, "R%d must be referenced\n", regno);
@@ -8151,6 +8179,21 @@ static int check_kfunc_args(struct bpf_verifier_env *env, struct bpf_kfunc_call_
 				return -EINVAL;
 			}
 			break;
+		case KF_ARG_PTR_TO_LOCAL_BTF_ID:
+			if (reg->type != (PTR_TO_BTF_ID | MEM_TYPE_LOCAL)) {
+				verbose(env, "arg#%d expected pointer to local kptr\n", i);
+				return -EINVAL;
+			}
+			if (!reg->ref_obj_id) {
+				verbose(env, "local kptr must be referenced\n");
+				return -EINVAL;
+			}
+			if (meta->btf == btf_vmlinux &&
+			    meta->func_id == special_kfunc_list[KF_bpf_kptr_drop_impl]) {
+				meta->arg_kptr_drop.btf = reg->btf;
+				meta->arg_kptr_drop.btf_id = reg->btf_id;
+			}
+			break;
 		case KF_ARG_PTR_TO_BTF_ID:
 			/* Only base_type is checked, further checks are done here */
 			if (reg->type != PTR_TO_BTF_ID &&
@@ -8221,17 +8264,6 @@ static int check_kfunc_args(struct bpf_verifier_env *env, struct bpf_kfunc_call_
 	return 0;
 }
 
-enum special_kfunc_type {
-	KF_bpf_kptr_new_impl,
-};
-
-BTF_SET_START(special_kfunc_set)
-BTF_ID(func, bpf_kptr_new_impl)
-BTF_SET_END(special_kfunc_set)
-
-BTF_ID_LIST(special_kfunc_list)
-BTF_ID(func, bpf_kptr_new_impl)
-
 static int check_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
 			    int *insn_idx_p)
 {
@@ -8358,6 +8390,10 @@ static int check_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
 				env->insn_aux_data[insn_idx].kptr_new_size = ret_t->size;
 				env->insn_aux_data[insn_idx].kptr_struct_meta =
 					btf_find_struct_meta(ret_btf, ret_btf_id);
+			} else if (meta.func_id == special_kfunc_list[KF_bpf_kptr_drop_impl]) {
+				env->insn_aux_data[insn_idx].kptr_struct_meta =
+					btf_find_struct_meta(meta.arg_kptr_drop.btf,
+							     meta.arg_kptr_drop.btf_id);
 			} else {
 				verbose(env, "kernel function %s unhandled dynamic return type\n",
 					meta.func_name);
@@ -14557,6 +14593,14 @@ static int fixup_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
 		insn_buf[2] = addr[1];
 		insn_buf[3] = *insn;
 		*cnt = 4;
+	} else if (desc->func_id == special_kfunc_list[KF_bpf_kptr_drop_impl]) {
+		struct btf_struct_meta *kptr_struct_meta = env->insn_aux_data[insn_idx].kptr_struct_meta;
+		struct bpf_insn addr[2] = { BPF_LD_IMM64(BPF_REG_2, (long)kptr_struct_meta) };
+
+		insn_buf[0] = addr[0];
+		insn_buf[1] = addr[1];
+		insn_buf[2] = *insn;
+		*cnt = 3;
 	}
 	return 0;
 }
diff --git a/tools/testing/selftests/bpf/bpf_experimental.h b/tools/testing/selftests/bpf/bpf_experimental.h
index 9c7d0badb02e..c47d16f3e817 100644
--- a/tools/testing/selftests/bpf/bpf_experimental.h
+++ b/tools/testing/selftests/bpf/bpf_experimental.h
@@ -39,4 +39,17 @@ extern void *bpf_kptr_new_impl(__u64 local_type_id, __u64 flags, void *meta__ign
 /* Convenience macro to wrap over bpf_kptr_new_impl */
 #define bpf_kptr_new(type) bpf_kptr_new_impl(bpf_core_type_id_local(type), 0, NULL)
 
+/* Description
+ *	Free a local kptr. All fields of local kptr that require destruction
+ *	will be destructed before the storage is freed.
+ *
+ *	The 'meta__ign' parameter is a hidden argument that is ignored.
+ * Returns
+ *	Void.
+ */
+extern void bpf_kptr_drop_impl(void *kptr, void *meta__ign) __ksym;
+
+/* Convenience macro to wrap over bpf_kptr_drop_impl */
+#define bpf_kptr_drop(kptr) bpf_kptr_drop_impl(kptr, NULL)
+
 #endif
-- 
2.38.0



* [PATCH bpf-next v2 21/25] bpf: Permit NULL checking pointer with non-zero fixed offset
  2022-10-13  6:22 [PATCH bpf-next v2 00/25] Local kptrs, BPF linked lists Kumar Kartikeya Dwivedi
                   ` (19 preceding siblings ...)
  2022-10-13  6:22 ` [PATCH bpf-next v2 20/25] bpf: Introduce bpf_kptr_drop Kumar Kartikeya Dwivedi
@ 2022-10-13  6:22 ` Kumar Kartikeya Dwivedi
  2022-10-13  6:23 ` [PATCH bpf-next v2 22/25] bpf: Introduce single ownership BPF linked list API Kumar Kartikeya Dwivedi
                   ` (3 subsequent siblings)
  24 siblings, 0 replies; 52+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2022-10-13  6:22 UTC (permalink / raw)
  To: bpf
  Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Dave Marchevsky, Delyan Kratunov

Pointer arithmetic on a register marked PTR_MAYBE_NULL is already
prevented by the verifier, hence make an exception for local kptrs while
still keeping the warning for other unintended cases that might creep in.

The bpf_list_del{,_tail} helpers return a local kptr with an incremented
offset, pointing to the bpf_list_node field. The user is then supposed to
obtain the pointer to the entry using container_of after NULL checking
it. The current restrictions trigger a warning when doing the NULL
check. Revisiting the reason for the warning, it is meant as an
assertion, which seems to actually work and catch the bad case.

Hence, under no other circumstances can reg->off be non-zero for a
register that has the PTR_MAYBE_NULL type flag set.
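
The access pattern that motivates the exception can be sketched in plain
userspace C. This is only an illustration: the _model names are
hypothetical, and pop_model merely stands in for a helper that returns a
pointer to an embedded node at a non-zero offset (or NULL).

```c
#include <assert.h>
#include <stddef.h>

struct list_node_model {
	struct list_node_model *next, *prev;
};

struct foo_model {
	int data;
	struct list_node_model node;  /* lives at a non-zero offset */
};

/* Userspace equivalent of the kernel's container_of. */
#define container_of_model(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Stand-in for a list_del style helper: pointer to the node, or NULL. */
static struct list_node_model *pop_model(struct foo_model *only)
{
	return only ? &only->node : NULL;
}

static int read_data_model(struct foo_model *only)
{
	struct list_node_model *n = pop_model(only);

	/* The NULL check is done on the pointer that already has an offset. */
	if (!n)
		return -1;
	return container_of_model(n, struct foo_model, node)->data;
}
```

The NULL check has to happen before container_of, because subtracting
the offset from a NULL pointer would produce a bogus non-NULL address.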

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
---
 kernel/bpf/verifier.c | 13 +++++++++----
 1 file changed, 9 insertions(+), 4 deletions(-)

diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index a4a806cb68dc..a8cd04c18ac5 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -10612,15 +10612,20 @@ static void mark_ptr_or_null_reg(struct bpf_func_state *state,
 {
 	if (type_may_be_null(reg->type) && reg->id == id &&
 	    !WARN_ON_ONCE(!reg->id)) {
-		if (WARN_ON_ONCE(reg->smin_value || reg->smax_value ||
-				 !tnum_equals_const(reg->var_off, 0) ||
-				 reg->off)) {
+		if (reg->smin_value || reg->smax_value || !tnum_equals_const(reg->var_off, 0) || reg->off) {
 			/* Old offset (both fixed and variable parts) should
 			 * have been known-zero, because we don't allow pointer
 			 * arithmetic on pointers that might be NULL. If we
 			 * see this happening, don't convert the register.
+			 *
+			 * But in some cases, some helpers that return local
+			 * kptrs advance offset for the returned pointer.
+			 * In those cases, it is fine to expect to see reg->off.
 			 */
-			return;
+			if (WARN_ON_ONCE(reg->type != (PTR_TO_BTF_ID | MEM_TYPE_LOCAL | PTR_MAYBE_NULL)))
+				return;
+			if (WARN_ON_ONCE(reg->smin_value || reg->smax_value || !tnum_equals_const(reg->var_off, 0)))
+				return;
 		}
 		if (is_null) {
 			reg->type = SCALAR_VALUE;
-- 
2.38.0



* [PATCH bpf-next v2 22/25] bpf: Introduce single ownership BPF linked list API
  2022-10-13  6:22 [PATCH bpf-next v2 00/25] Local kptrs, BPF linked lists Kumar Kartikeya Dwivedi
                   ` (20 preceding siblings ...)
  2022-10-13  6:22 ` [PATCH bpf-next v2 21/25] bpf: Permit NULL checking pointer with non-zero fixed offset Kumar Kartikeya Dwivedi
@ 2022-10-13  6:23 ` Kumar Kartikeya Dwivedi
  2022-10-25 17:45   ` Dave Marchevsky
  2022-10-13  6:23 ` [PATCH bpf-next v2 23/25] libbpf: Add support for private BSS map section Kumar Kartikeya Dwivedi
                   ` (2 subsequent siblings)
  24 siblings, 1 reply; 52+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2022-10-13  6:23 UTC (permalink / raw)
  To: bpf
  Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Dave Marchevsky, Delyan Kratunov

Add a linked list API for use in BPF programs. It expects the list to be
protected by the bpf_spin_lock in the same allocation as the
bpf_list_head. Future patches will extend the same infrastructure to
have different flavors with varying protection domains and visibility
(e.g. percpu variant with local_t protection, usable in NMI progs).

The following functions are added to kick things off:

bpf_list_add
bpf_list_add_tail
bpf_list_del
bpf_list_del_tail

The lock protecting the bpf_list_head needs to be taken for all
operations.

Once a node has been added to the list, its pointer is marked
PTR_UNTRUSTED. However, the reference is only released once the lock
protecting the list is unlocked. For such local kptrs with PTR_UNTRUSTED
set but an active ref_obj_id, it is still permitted to read and write to
them as long as the lock is held.
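
The deferred release can be pictured with a toy reference table. This is
a simplified userspace sketch, not the verifier's data structures: all
_model names are hypothetical, and only the release_on_unlock flag
mirrors a field that this patch adds to bpf_reference_state.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_REFS_MODEL 8

struct ref_model {
	int id;
	bool live;               /* reference still usable by the program */
	bool release_on_unlock;  /* ownership was handed to the list */
};

struct state_model {
	struct ref_model refs[MAX_REFS_MODEL];
	int nr;
};

/* Record a newly acquired reference. */
static int acquire_model(struct state_model *s, int id)
{
	if (s->nr == MAX_REFS_MODEL)
		return -1;
	s->refs[s->nr].id = id;
	s->refs[s->nr].live = true;
	s->refs[s->nr].release_on_unlock = false;
	s->nr++;
	return 0;
}

/* Called when a node is added to a list: defer the release to unlock. */
static int mark_release_on_unlock_model(struct state_model *s, int id)
{
	int i;

	for (i = 0; i < s->nr; i++) {
		if (s->refs[i].live && s->refs[i].id == id) {
			s->refs[i].release_on_unlock = true;
			return 0;
		}
	}
	return -1;
}

/* Called on unlock: drop every reference that was marked. */
static void unlock_model(struct state_model *s)
{
	int i;

	for (i = 0; i < s->nr; i++)
		if (s->refs[i].release_on_unlock)
			s->refs[i].live = false;
}

static bool is_live_model(const struct state_model *s, int id)
{
	int i;

	for (i = 0; i < s->nr; i++)
		if (s->refs[i].id == id)
			return s->refs[i].live;
	return false;
}
```

A marked reference stays live until unlock_model runs, which corresponds
to the node remaining readable and writable inside the critical section.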

bpf_list_del and bpf_list_del_tail delete the first or last item of the
list respectively, and return a pointer to the element at the list_node
offset. The user can then use a container_of style macro to get the
actual entry type. The verifier, however, statically knows the actual
type, so the safety properties are still preserved.

With these additions, programs can now manage their own linked lists and
store their objects in them.
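
The runtime behaviour of the helpers, including the lazy initialization
of a zeroed head, can be approximated in userspace. This is a sketch
only: the _model names are hypothetical, the list primitives are written
out by hand instead of using the kernel's list.h, and no locking or
verifier checking is modelled.

```c
#include <assert.h>
#include <stddef.h>

struct list_head_model {
	struct list_head_model *next, *prev;
};

/* Make a head (or node) point at itself, i.e. an empty circular list. */
static void init_model(struct list_head_model *h)
{
	h->next = h;
	h->prev = h;
}

/* Link n between two adjacent entries. */
static void insert_between_model(struct list_head_model *n,
				 struct list_head_model *prev,
				 struct list_head_model *next)
{
	next->prev = n;
	n->next = next;
	n->prev = prev;
	prev->next = n;
}

/* Add at the front (tail == 0) or at the back (tail != 0). */
static void list_add_model(struct list_head_model *n,
			   struct list_head_model *h, int tail)
{
	/* A zero-filled head has not been initialized yet. */
	if (!h->next)
		init_model(h);
	if (tail)
		insert_between_model(n, h->prev, h);
	else
		insert_between_model(n, h, h->next);
}

/* Remove and return the first or last node, or NULL if the list is empty. */
static struct list_head_model *list_del_model(struct list_head_model *h, int tail)
{
	struct list_head_model *n;

	if (!h->next)
		init_model(h);
	if (h->next == h)
		return NULL;
	n = tail ? h->prev : h->next;
	n->prev->next = n->next;
	n->next->prev = n->prev;
	init_model(n);
	return n;
}
```

Because a zeroed head is initialized on first use, a head embedded in
zero-filled storage behaves as an empty list without an explicit init
call.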

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
---
 include/linux/bpf_verifier.h                  |   5 +
 kernel/bpf/helpers.c                          |  48 +++
 kernel/bpf/verifier.c                         | 344 ++++++++++++++++--
 .../testing/selftests/bpf/bpf_experimental.h  |  28 ++
 4 files changed, 391 insertions(+), 34 deletions(-)

diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h
index 0cc4679f3f42..01d3dd76b224 100644
--- a/include/linux/bpf_verifier.h
+++ b/include/linux/bpf_verifier.h
@@ -229,6 +229,11 @@ struct bpf_reference_state {
 	 * exiting a callback function.
 	 */
 	int callback_ref;
+	/* Mark the reference state to release the registers sharing the same id
+	 * on bpf_spin_unlock (for nodes that we will lose ownership to but are
+	 * safe to access inside the critical section).
+	 */
+	bool release_on_unlock;
 };
 
 /* state of the program:
diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
index 43a7c9999e94..71e0f19f738a 100644
--- a/kernel/bpf/helpers.c
+++ b/kernel/bpf/helpers.c
@@ -1768,6 +1768,50 @@ void bpf_kptr_drop_impl(void *p__lkptr, void *meta__ign)
 	bpf_mem_free(&bpf_global_ma, p);
 }
 
+static void __bpf_list_add(struct bpf_list_node *node, struct bpf_list_head *head, bool tail)
+{
+	struct list_head *n = (void *)node, *h = (void *)head;
+
+	if (unlikely(!h->next))
+		INIT_LIST_HEAD(h);
+	if (unlikely(!n->next))
+		INIT_LIST_HEAD(n);
+	tail ? list_add_tail(n, h) : list_add(n, h);
+}
+
+void bpf_list_add(struct bpf_list_node *node, struct bpf_list_head *head)
+{
+	return __bpf_list_add(node, head, false);
+}
+
+void bpf_list_add_tail(struct bpf_list_node *node, struct bpf_list_head *head)
+{
+	return __bpf_list_add(node, head, true);
+}
+
+static struct bpf_list_node *__bpf_list_del(struct bpf_list_head *head, bool tail)
+{
+	struct list_head *n, *h = (void *)head;
+
+	if (unlikely(!h->next))
+		INIT_LIST_HEAD(h);
+	if (list_empty(h))
+		return NULL;
+	n = tail ? h->prev : h->next;
+	list_del_init(n);
+	return (struct bpf_list_node *)n;
+}
+
+struct bpf_list_node *bpf_list_del(struct bpf_list_head *head)
+{
+	return __bpf_list_del(head, false);
+}
+
+struct bpf_list_node *bpf_list_del_tail(struct bpf_list_head *head)
+{
+	return __bpf_list_del(head, true);
+}
+
 __diag_pop();
 
 BTF_SET8_START(generic_btf_ids)
@@ -1776,6 +1820,10 @@ BTF_ID_FLAGS(func, crash_kexec, KF_DESTRUCTIVE)
 #endif
 BTF_ID_FLAGS(func, bpf_kptr_new_impl, KF_ACQUIRE | KF_RET_NULL)
 BTF_ID_FLAGS(func, bpf_kptr_drop_impl, KF_RELEASE)
+BTF_ID_FLAGS(func, bpf_list_add)
+BTF_ID_FLAGS(func, bpf_list_add_tail)
+BTF_ID_FLAGS(func, bpf_list_del, KF_ACQUIRE | KF_RET_NULL)
+BTF_ID_FLAGS(func, bpf_list_del_tail, KF_ACQUIRE | KF_RET_NULL)
 BTF_SET8_END(generic_btf_ids)
 
 static const struct btf_kfunc_id_set generic_kfunc_set = {
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index a8cd04c18ac5..96cf576784c6 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -5485,7 +5485,9 @@ static int process_spin_lock(struct bpf_verifier_env *env, int regno,
 			cur->active_spin_lock_ptr = btf;
 		cur->active_spin_lock_id = reg->id;
 	} else {
+		struct bpf_func_state *fstate = cur_func(env);
 		void *ptr;
+		int i;
 
 		if (map)
 			ptr = map;
@@ -5503,6 +5505,16 @@ static int process_spin_lock(struct bpf_verifier_env *env, int regno,
 		}
 		cur->active_spin_lock_ptr = NULL;
 		cur->active_spin_lock_id = 0;
+
+		for (i = 0; i < fstate->acquired_refs; i++) {
+			/* WARN because this reference state cannot be freed
+			 * before this point, as bpf_spin_lock CS does not
+			 * allow functions that release the local kptr
+			 * immediately.
+			 */
+			if (fstate->refs[i].release_on_unlock)
+				WARN_ON_ONCE(release_reference(env, fstate->refs[i].id));
+		}
 	}
 	return 0;
 }
@@ -7697,6 +7709,16 @@ struct bpf_kfunc_call_arg_meta {
 		struct btf *btf;
 		u32 btf_id;
 	} arg_kptr_drop;
+	struct {
+		struct btf_field *field;
+	} arg_list_head;
+	struct {
+		struct btf_field *field;
+		struct btf *reg_btf;
+		u32 reg_btf_id;
+		u32 reg_offset;
+		u32 reg_ref_obj_id;
+	} arg_list_node;
 };
 
 static bool is_kfunc_acquire(struct bpf_kfunc_call_arg_meta *meta)
@@ -7807,13 +7829,17 @@ static bool is_kfunc_arg_ret_buf_size(const struct btf *btf,
 
 enum {
 	KF_ARG_DYNPTR_ID,
+	KF_ARG_LIST_HEAD_ID,
+	KF_ARG_LIST_NODE_ID,
 };
 
 BTF_ID_LIST(kf_arg_btf_ids)
 BTF_ID(struct, bpf_dynptr_kern)
+BTF_ID(struct, bpf_list_head)
+BTF_ID(struct, bpf_list_node)
 
-static bool is_kfunc_arg_dynptr(const struct btf *btf,
-				const struct btf_param *arg)
+static bool __is_kfunc_ptr_arg_type(const struct btf *btf,
+				    const struct btf_param *arg, int type)
 {
 	const struct btf_type *t;
 	u32 res_id;
@@ -7826,7 +7852,22 @@ static bool is_kfunc_arg_dynptr(const struct btf *btf,
 	t = btf_type_skip_modifiers(btf, t->type, &res_id);
 	if (!t)
 		return false;
-	return btf_types_are_same(btf, res_id, btf_vmlinux, kf_arg_btf_ids[KF_ARG_DYNPTR_ID]);
+	return btf_types_are_same(btf, res_id, btf_vmlinux, kf_arg_btf_ids[type]);
+}
+
+static bool is_kfunc_arg_dynptr(const struct btf *btf, const struct btf_param *arg)
+{
+	return __is_kfunc_ptr_arg_type(btf, arg, KF_ARG_DYNPTR_ID);
+}
+
+static bool is_kfunc_arg_list_head(const struct btf *btf, const struct btf_param *arg)
+{
+	return __is_kfunc_ptr_arg_type(btf, arg, KF_ARG_LIST_HEAD_ID);
+}
+
+static bool is_kfunc_arg_list_node(const struct btf *btf, const struct btf_param *arg)
+{
+	return __is_kfunc_ptr_arg_type(btf, arg, KF_ARG_LIST_NODE_ID);
 }
 
 /* Returns true if struct is composed of scalars, 4 levels of nesting allowed */
@@ -7881,9 +7922,11 @@ static u32 *reg2btf_ids[__BPF_REG_TYPE_MAX] = {
 enum kfunc_ptr_arg_type {
 	KF_ARG_PTR_TO_CTX,
 	KF_ARG_PTR_TO_LOCAL_BTF_ID,  /* Local kptr */
-	KF_ARG_PTR_TO_BTF_ID,	     /* Also covers reg2btf_ids conversions */
 	KF_ARG_PTR_TO_KPTR_STRONG,   /* PTR_TO_KPTR but type specific */
 	KF_ARG_PTR_TO_DYNPTR,
+	KF_ARG_PTR_TO_LIST_HEAD,
+	KF_ARG_PTR_TO_LIST_NODE,
+	KF_ARG_PTR_TO_BTF_ID,	     /* Also covers reg2btf_ids conversions */
 	KF_ARG_PTR_TO_MEM,
 	KF_ARG_PTR_TO_MEM_SIZE,	     /* Size derived from next argument, skip it */
 };
@@ -7891,16 +7934,28 @@ enum kfunc_ptr_arg_type {
 enum special_kfunc_type {
 	KF_bpf_kptr_new_impl,
 	KF_bpf_kptr_drop_impl,
+	KF_bpf_list_add,
+	KF_bpf_list_add_tail,
+	KF_bpf_list_del,
+	KF_bpf_list_del_tail,
 };
 
 BTF_SET_START(special_kfunc_set)
 BTF_ID(func, bpf_kptr_new_impl)
 BTF_ID(func, bpf_kptr_drop_impl)
+BTF_ID(func, bpf_list_add)
+BTF_ID(func, bpf_list_add_tail)
+BTF_ID(func, bpf_list_del)
+BTF_ID(func, bpf_list_del_tail)
 BTF_SET_END(special_kfunc_set)
 
 BTF_ID_LIST(special_kfunc_list)
 BTF_ID(func, bpf_kptr_new_impl)
 BTF_ID(func, bpf_kptr_drop_impl)
+BTF_ID(func, bpf_list_add)
+BTF_ID(func, bpf_list_add_tail)
+BTF_ID(func, bpf_list_del)
+BTF_ID(func, bpf_list_del_tail)
 
 enum kfunc_ptr_arg_type get_kfunc_ptr_arg_type(struct bpf_verifier_env *env,
 						struct bpf_kfunc_call_arg_meta *meta,
@@ -7926,15 +7981,6 @@ enum kfunc_ptr_arg_type get_kfunc_ptr_arg_type(struct bpf_verifier_env *env,
 	if (is_kfunc_arg_local_kptr(meta->btf, &args[argno]))
 		return KF_ARG_PTR_TO_LOCAL_BTF_ID;
 
-	if ((base_type(reg->type) == PTR_TO_BTF_ID || reg2btf_ids[base_type(reg->type)])) {
-		if (!btf_type_is_struct(ref_t)) {
-			verbose(env, "kernel function %s args#%d pointer type %s %s is not supported\n",
-				meta->func_name, argno, btf_type_str(ref_t), ref_tname);
-			return -EINVAL;
-		}
-		return KF_ARG_PTR_TO_BTF_ID;
-	}
-
 	if (is_kfunc_arg_kptr_get(meta, argno)) {
 		if (!btf_type_is_ptr(ref_t)) {
 			verbose(env, "arg#0 BTF type must be a double pointer for kptr_get kfunc\n");
@@ -7953,6 +7999,21 @@ enum kfunc_ptr_arg_type get_kfunc_ptr_arg_type(struct bpf_verifier_env *env,
 	if (is_kfunc_arg_dynptr(meta->btf, &args[argno]))
 		return KF_ARG_PTR_TO_DYNPTR;
 
+	if (is_kfunc_arg_list_head(meta->btf, &args[argno]))
+		return KF_ARG_PTR_TO_LIST_HEAD;
+
+	if (is_kfunc_arg_list_node(meta->btf, &args[argno]))
+		return KF_ARG_PTR_TO_LIST_NODE;
+
+	if ((base_type(reg->type) == PTR_TO_BTF_ID || reg2btf_ids[base_type(reg->type)])) {
+		if (!btf_type_is_struct(ref_t)) {
+			verbose(env, "kernel function %s args#%d pointer type %s %s is not supported\n",
+				meta->func_name, argno, btf_type_str(ref_t), ref_tname);
+			return -EINVAL;
+		}
+		return KF_ARG_PTR_TO_BTF_ID;
+	}
+
 	if (argno + 1 < nargs && is_kfunc_arg_mem_size(meta->btf, &args[argno + 1], &regs[regno + 1]))
 		arg_mem_size = true;
 
@@ -8039,6 +8100,181 @@ static int process_kf_arg_ptr_to_kptr_strong(struct bpf_verifier_env *env,
 	return 0;
 }
 
+static int ref_obj_id_set_release_on_unlock(struct bpf_verifier_env *env, u32 ref_obj_id)
+{
+	struct bpf_func_state *state = cur_func(env);
+	struct bpf_reg_state *reg;
+	int i;
+
+	/* bpf_spin_lock only allows calling list_add and list_del, no BPF
+	 * subprogs, no global functions, so this acquired refs state is the
+	 * same one we will use to find registers to kill on bpf_spin_unlock.
+	 */
+	WARN_ON_ONCE(!ref_obj_id);
+	for (i = 0; i < state->acquired_refs; i++) {
+		if (state->refs[i].id == ref_obj_id) {
+			WARN_ON_ONCE(state->refs[i].release_on_unlock);
+			state->refs[i].release_on_unlock = true;
+			/* Now mark everyone sharing same ref_obj_id as untrusted */
+			bpf_for_each_reg_in_vstate(env->cur_state, state, reg, ({
+				if (reg->ref_obj_id == ref_obj_id)
+					reg->type |= PTR_UNTRUSTED;
+			}));
+			return 0;
+		}
+	}
+	verbose(env, "verifier internal error: ref state missing for ref_obj_id\n");
+	return -EFAULT;
+}
+
+static bool is_reg_allocation_locked(struct bpf_verifier_env *env, struct bpf_reg_state *reg)
+{
+	void *ptr;
+	u32 id;
+
+	switch ((int)reg->type) {
+	case PTR_TO_MAP_VALUE:
+		ptr = reg->map_ptr;
+		break;
+	case PTR_TO_BTF_ID | MEM_TYPE_LOCAL:
+		ptr = reg->btf;
+		break;
+	default:
+		WARN_ON_ONCE(1);
+		return false;
+	}
+	id = reg->id;
+
+	return env->cur_state->active_spin_lock_ptr == ptr &&
+	       env->cur_state->active_spin_lock_id == id;
+}
+
+static int process_kf_arg_ptr_to_list_head(struct bpf_verifier_env *env,
+					   struct bpf_reg_state *reg,
+					   u32 regno,
+					   struct bpf_kfunc_call_arg_meta *meta)
+{
+	struct btf_type_fields *tab = NULL;
+	struct btf_field *field;
+	u32 list_head_off;
+
+	if (meta->btf != btf_vmlinux ||
+	    (meta->func_id != special_kfunc_list[KF_bpf_list_add] &&
+	     meta->func_id != special_kfunc_list[KF_bpf_list_add_tail] &&
+	     meta->func_id != special_kfunc_list[KF_bpf_list_del] &&
+	     meta->func_id != special_kfunc_list[KF_bpf_list_del_tail])) {
+		verbose(env, "verifier internal error: bpf_list_head argument for unknown kfunc\n");
+		return -EFAULT;
+	}
+
+	if (reg->type == PTR_TO_MAP_VALUE) {
+		tab = reg->map_ptr->fields_tab;
+	} else /* PTR_TO_BTF_ID | MEM_TYPE_LOCAL */ {
+		struct btf_struct_meta *meta;
+
+		meta = btf_find_struct_meta(reg->btf, reg->btf_id);
+		if (!meta) {
+			verbose(env, "bpf_list_head not found for local kptr\n");
+			return -EINVAL;
+		}
+		tab = meta->fields_tab;
+	}
+
+	if (!tnum_is_const(reg->var_off)) {
+		verbose(env,
+			"R%d doesn't have constant offset. bpf_list_head has to be at the constant offset\n",
+			regno);
+		return -EINVAL;
+	}
+
+	list_head_off = reg->off + reg->var_off.value;
+	field = btf_type_fields_find(tab, list_head_off, BPF_LIST_HEAD);
+	if (!field) {
+		verbose(env, "bpf_list_head not found at offset=%u\n", list_head_off);
+		return -EINVAL;
+	}
+
+	/* All functions require bpf_list_head to be protected using a bpf_spin_lock */
+	if (!is_reg_allocation_locked(env, reg)) {
+		verbose(env, "bpf_spin_lock at off=%d must be held for manipulating bpf_list_head\n",
+			tab->spin_lock_off);
+		return -EINVAL;
+	}
+
+	if (meta->func_id == special_kfunc_list[KF_bpf_list_add] ||
+	    meta->func_id == special_kfunc_list[KF_bpf_list_add_tail]) {
+		if (!btf_struct_ids_match(&env->log, meta->arg_list_node.reg_btf,
+					  meta->arg_list_node.reg_btf_id, 0,
+					  field->list_head.btf, field->list_head.value_btf_id, true)) {
+			verbose(env, "bpf_list_head value type does not match arg#0\n");
+			return -EINVAL;
+		}
+		if (meta->arg_list_node.reg_offset != field->list_head.node_offset) {
+			verbose(env, "arg#0 offset must be for bpf_list_node at off=%d\n",
+				field->list_head.node_offset);
+			return -EINVAL;
+		}
+		/* Set arg#0 for expiration after unlock */
+		ref_obj_id_set_release_on_unlock(env, meta->arg_list_node.reg_ref_obj_id);
+	} else {
+		if (meta->arg_list_head.field) {
+			verbose(env, "verifier internal error: repeating bpf_list_head arg\n");
+			return -EFAULT;
+		}
+		meta->arg_list_head.field = field;
+	}
+	return 0;
+}
+
+static int process_kf_arg_ptr_to_list_node(struct bpf_verifier_env *env,
+					   struct bpf_reg_state *reg,
+					   u32 regno,
+					   struct bpf_kfunc_call_arg_meta *meta)
+{
+	struct btf_struct_meta *struct_meta;
+	struct btf_type_fields *tab;
+	struct btf_field *field;
+	u32 list_node_off;
+
+	if (meta->btf != btf_vmlinux ||
+	    (meta->func_id != special_kfunc_list[KF_bpf_list_add] &&
+	     meta->func_id != special_kfunc_list[KF_bpf_list_add_tail])) {
+		verbose(env, "verifier internal error: bpf_list_node argument for unknown kfunc\n");
+		return -EFAULT;
+	}
+
+	if (!tnum_is_const(reg->var_off)) {
+		verbose(env,
+			"R%d doesn't have constant offset. bpf_list_node has to be at the constant offset\n",
+			regno);
+		return -EINVAL;
+	}
+
+	struct_meta = btf_find_struct_meta(reg->btf, reg->btf_id);
+	if (!struct_meta) {
+		verbose(env, "bpf_list_node not found for local kptr\n");
+		return -EINVAL;
+	}
+	tab = struct_meta->fields_tab;
+
+	list_node_off = reg->off + reg->var_off.value;
+	field = btf_type_fields_find(tab, list_node_off, BPF_LIST_NODE);
+	if (!field || field->offset != list_node_off) {
+		verbose(env, "bpf_list_node not found at offset=%u\n", list_node_off);
+		return -EINVAL;
+	}
+	if (meta->arg_list_node.field) {
+		verbose(env, "verifier internal error: repeating bpf_list_node arg\n");
+		return -EFAULT;
+	}
+	meta->arg_list_node.field = field;
+	meta->arg_list_node.reg_btf = reg->btf;
+	meta->arg_list_node.reg_btf_id = reg->btf_id;
+	meta->arg_list_node.reg_offset = list_node_off;
+	meta->arg_list_node.reg_ref_obj_id = reg->ref_obj_id;
+	return 0;
+}
+
 static int check_kfunc_args(struct bpf_verifier_env *env, struct bpf_kfunc_call_arg_meta *meta)
 {
 	const char *func_name = meta->func_name, *ref_tname;
@@ -8157,6 +8393,8 @@ static int check_kfunc_args(struct bpf_verifier_env *env, struct bpf_kfunc_call_
 			break;
 		case KF_ARG_PTR_TO_KPTR_STRONG:
 		case KF_ARG_PTR_TO_DYNPTR:
+		case KF_ARG_PTR_TO_LIST_HEAD:
+		case KF_ARG_PTR_TO_LIST_NODE:
 		case KF_ARG_PTR_TO_MEM:
 		case KF_ARG_PTR_TO_MEM_SIZE:
 			/* Trusted by default */
@@ -8194,17 +8432,6 @@ static int check_kfunc_args(struct bpf_verifier_env *env, struct bpf_kfunc_call_
 				meta->arg_kptr_drop.btf_id = reg->btf_id;
 			}
 			break;
-		case KF_ARG_PTR_TO_BTF_ID:
-			/* Only base_type is checked, further checks are done here */
-			if (reg->type != PTR_TO_BTF_ID &&
-			    (!reg2btf_ids[base_type(reg->type)] || type_flag(reg->type))) {
-				verbose(env, "arg#%d expected pointer to btf or socket\n", i);
-				return -EINVAL;
-			}
-			ret = process_kf_arg_ptr_to_btf_id(env, reg, ref_t, ref_tname, ref_id, meta, i);
-			if (ret < 0)
-				return ret;
-			break;
 		case KF_ARG_PTR_TO_KPTR_STRONG:
 			if (reg->type != PTR_TO_MAP_VALUE) {
 				verbose(env, "arg#0 expected pointer to map value\n");
@@ -8232,6 +8459,44 @@ static int check_kfunc_args(struct bpf_verifier_env *env, struct bpf_kfunc_call_
 				return -EINVAL;
 			}
 			break;
+		case KF_ARG_PTR_TO_LIST_HEAD:
+			if (reg->type != PTR_TO_MAP_VALUE &&
+			    reg->type != (PTR_TO_BTF_ID | MEM_TYPE_LOCAL)) {
+				verbose(env, "arg#%d expected pointer to map value or local kptr\n", i);
+				return -EINVAL;
+			}
+			if (reg->type == (PTR_TO_BTF_ID | MEM_TYPE_LOCAL) && !reg->ref_obj_id) {
+				verbose(env, "local kptr must be referenced\n");
+				return -EINVAL;
+			}
+			ret = process_kf_arg_ptr_to_list_head(env, reg, regno, meta);
+			if (ret < 0)
+				return ret;
+			break;
+		case KF_ARG_PTR_TO_LIST_NODE:
+			if (reg->type != (PTR_TO_BTF_ID | MEM_TYPE_LOCAL)) {
+				verbose(env, "arg#%d expected pointer to local kptr\n", i);
+				return -EINVAL;
+			}
+			if (!reg->ref_obj_id) {
+				verbose(env, "local kptr must be referenced\n");
+				return -EINVAL;
+			}
+			ret = process_kf_arg_ptr_to_list_node(env, reg, regno, meta);
+			if (ret < 0)
+				return ret;
+			break;
+		case KF_ARG_PTR_TO_BTF_ID:
+			/* Only base_type is checked, further checks are done here */
+			if (reg->type != PTR_TO_BTF_ID &&
+			    (!reg2btf_ids[base_type(reg->type)] || type_flag(reg->type))) {
+				verbose(env, "arg#%d expected pointer to btf or socket\n", i);
+				return -EINVAL;
+			}
+			ret = process_kf_arg_ptr_to_btf_id(env, reg, ref_t, ref_tname, ref_id, meta, i);
+			if (ret < 0)
+				return ret;
+			break;
 		case KF_ARG_PTR_TO_MEM:
 			resolve_ret = btf_resolve_size(btf, ref_t, &type_size);
 			if (IS_ERR(resolve_ret)) {
@@ -8352,11 +8617,6 @@ static int check_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
 		ptr_type = btf_type_skip_modifiers(desc_btf, t->type, &ptr_type_id);
 
 		if (meta.btf == btf_vmlinux && btf_id_set_contains(&special_kfunc_set, meta.func_id)) {
-			if (!btf_type_is_void(ptr_type)) {
-				verbose(env, "kernel function %s must have void * return type\n",
-					meta.func_name);
-				return -EINVAL;
-			}
 			if (meta.func_id == special_kfunc_list[KF_bpf_kptr_new_impl]) {
 				const struct btf_type *ret_t;
 				struct btf *ret_btf;
@@ -8394,6 +8654,15 @@ static int check_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
 				env->insn_aux_data[insn_idx].kptr_struct_meta =
 					btf_find_struct_meta(meta.arg_kptr_drop.btf,
 							     meta.arg_kptr_drop.btf_id);
+			} else if (meta.func_id == special_kfunc_list[KF_bpf_list_del] ||
+				   meta.func_id == special_kfunc_list[KF_bpf_list_del_tail]) {
+				struct btf_field *field = meta.arg_list_head.field;
+
+				mark_reg_known_zero(env, regs, BPF_REG_0);
+				regs[BPF_REG_0].type = PTR_TO_BTF_ID | MEM_TYPE_LOCAL;
+				regs[BPF_REG_0].btf = field->list_head.btf;
+				regs[BPF_REG_0].btf_id = field->list_head.value_btf_id;
+				regs[BPF_REG_0].off = field->list_head.node_offset;
 			} else {
 				verbose(env, "kernel function %s unhandled dynamic return type\n",
 					meta.func_name);
@@ -13062,11 +13331,18 @@ static int do_check(struct bpf_verifier_env *env)
 					return -EINVAL;
 				}
 
-				if (env->cur_state->active_spin_lock_ptr &&
-				    (insn->src_reg == BPF_PSEUDO_CALL ||
-				     insn->imm != BPF_FUNC_spin_unlock)) {
-					verbose(env, "function calls are not allowed while holding a lock\n");
-					return -EINVAL;
+				if (env->cur_state->active_spin_lock_ptr) {
+					if ((insn->src_reg == BPF_REG_0 && insn->imm != BPF_FUNC_spin_unlock) ||
+					    (insn->src_reg == BPF_PSEUDO_CALL) ||
+					    (insn->src_reg == BPF_PSEUDO_KFUNC_CALL &&
+					     (insn->off != 0 ||
+					      (insn->imm != special_kfunc_list[KF_bpf_list_add] &&
+					       insn->imm != special_kfunc_list[KF_bpf_list_add_tail] &&
+					       insn->imm != special_kfunc_list[KF_bpf_list_del] &&
+					       insn->imm != special_kfunc_list[KF_bpf_list_del_tail])))) {
+						verbose(env, "function calls are not allowed while holding a lock\n");
+						return -EINVAL;
+					}
 				}
 				if (insn->src_reg == BPF_PSEUDO_CALL)
 					err = check_func_call(env, insn, &env->insn_idx);
diff --git a/tools/testing/selftests/bpf/bpf_experimental.h b/tools/testing/selftests/bpf/bpf_experimental.h
index c47d16f3e817..21b85cd721cb 100644
--- a/tools/testing/selftests/bpf/bpf_experimental.h
+++ b/tools/testing/selftests/bpf/bpf_experimental.h
@@ -52,4 +52,32 @@ extern void bpf_kptr_drop_impl(void *kptr, void *meta__ign) __ksym;
 /* Convenience macro to wrap over bpf_kptr_drop_impl */
 #define bpf_kptr_drop(kptr) bpf_kptr_drop_impl(kptr, NULL)
 
+/* Description
+ *	Add a new entry to the head of the BPF linked list.
+ * Returns
+ *	Void.
+ */
+extern void bpf_list_add(struct bpf_list_node *node, struct bpf_list_head *head) __ksym;
+
+/* Description
+ *	Add a new entry to the tail of the BPF linked list.
+ * Returns
+ *	Void.
+ */
+extern void bpf_list_add_tail(struct bpf_list_node *node, struct bpf_list_head *head) __ksym;
+
+/* Description
+ *	Remove the entry at head of the BPF linked list.
+ * Returns
+ *	Pointer to bpf_list_node of deleted entry, or NULL if list is empty.
+ */
+extern struct bpf_list_node *bpf_list_del(struct bpf_list_head *head) __ksym;
+
+/* Description
+ *	Remove the entry at tail of the BPF linked list.
+ * Returns
+ *	Pointer to bpf_list_node of deleted entry, or NULL if list is empty.
+ */
+extern struct bpf_list_node *bpf_list_del_tail(struct bpf_list_head *head) __ksym;
+
 #endif
-- 
2.38.0



* [PATCH bpf-next v2 23/25] libbpf: Add support for private BSS map section
  2022-10-13  6:22 [PATCH bpf-next v2 00/25] Local kptrs, BPF linked lists Kumar Kartikeya Dwivedi
                   ` (21 preceding siblings ...)
  2022-10-13  6:23 ` [PATCH bpf-next v2 22/25] bpf: Introduce single ownership BPF linked list API Kumar Kartikeya Dwivedi
@ 2022-10-13  6:23 ` Kumar Kartikeya Dwivedi
  2022-10-18  4:03   ` Andrii Nakryiko
  2022-10-13  6:23 ` [PATCH bpf-next v2 24/25] selftests/bpf: Add __contains macro to bpf_experimental.h Kumar Kartikeya Dwivedi
  2022-10-13  6:23 ` [PATCH bpf-next v2 25/25] selftests/bpf: Add BPF linked list API tests Kumar Kartikeya Dwivedi
  24 siblings, 1 reply; 52+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2022-10-13  6:23 UTC (permalink / raw)
  To: bpf
  Cc: Dave Marchevsky, Alexei Starovoitov, Andrii Nakryiko,
	Daniel Borkmann, Dave Marchevsky, Delyan Kratunov

From: Dave Marchevsky <davemarchevsky@fb.com>

Currently, libbpf does not allow declaring a struct bpf_spin_lock in
global scope. Attempting to do so results in a "failed to re-mmap"
error, as a .bss array map containing a spin lock is not allowed to be
mmap'd.

This patch adds support for a .bss.private section. The maps contained
in this section will not be mmap'd into userspace by libbpf, nor will
they be exposed via the bpftool-generated skeleton.

The intent here is to allow a more natural programming pattern for
global-scope spin locks, which will be used by the rbtree locking
mechanism in further patches in this series.

Notes:

  * Initially I called the section .bss.no_mmap, but the broader
    'private' term better indicates that skeleton shouldn't expose these
    maps at all, IMO.

  * bpftool/gen.c's is_internal_mmapable_map function checks whether the
    map flags have BPF_F_MMAPABLE, so no bpftool changes were necessary
    to remove .bss.private maps from the skeleton.
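
For readers skimming the diff, the section handling described above can
be sketched roughly as follows. This is only a userspace illustration,
not the libbpf code: enum sec_kind, classify_nobits_section() and
section_wants_mmap() are hypothetical names made up for this sketch, and
only the ".bss.private" string comes from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical, simplified stand-ins for libbpf's internal section types. */
enum sec_kind { SEC_KIND_OTHER, SEC_KIND_BSS, SEC_KIND_BSS_PRIVATE };

#define BSS_SEC         ".bss"
#define BSS_SEC_PRIVATE ".bss.private"	/* same string as in the patch */

/* Classify a NOBITS section by its name (assumed simplification). */
static enum sec_kind classify_nobits_section(const char *name)
{
	if (strcmp(name, BSS_SEC_PRIVATE) == 0)
		return SEC_KIND_BSS_PRIVATE;
	if (strcmp(name, BSS_SEC) == 0)
		return SEC_KIND_BSS;
	return SEC_KIND_OTHER;
}

/*
 * In this sketch only the regular .bss map gets a userspace mapping;
 * the private variant is created without it, which is what lets it
 * hold a spin lock.
 */
static bool section_wants_mmap(enum sec_kind kind)
{
	return kind == SEC_KIND_BSS;
}
```

On the BPF side, the selftests later in the series use the section as
`struct bpf_spin_lock glock SEC(".bss.private");`.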

Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
---
 tools/lib/bpf/libbpf.c | 65 ++++++++++++++++++++++++++++--------------
 1 file changed, 44 insertions(+), 21 deletions(-)

diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
index 184ce1684dcd..fc4d15515b02 100644
--- a/tools/lib/bpf/libbpf.c
+++ b/tools/lib/bpf/libbpf.c
@@ -465,6 +465,7 @@ struct bpf_struct_ops {
 #define KCONFIG_SEC ".kconfig"
 #define KSYMS_SEC ".ksyms"
 #define STRUCT_OPS_SEC ".struct_ops"
+#define BSS_SEC_PRIVATE ".bss.private"
 
 enum libbpf_map_type {
 	LIBBPF_MAP_UNSPEC,
@@ -578,6 +579,7 @@ enum sec_type {
 	SEC_BSS,
 	SEC_DATA,
 	SEC_RODATA,
+	SEC_BSS_PRIVATE,
 };
 
 struct elf_sec_desc {
@@ -1582,7 +1584,8 @@ bpf_map_find_btf_info(struct bpf_object *obj, struct bpf_map *map);
 
 static int
 bpf_object__init_internal_map(struct bpf_object *obj, enum libbpf_map_type type,
-			      const char *real_name, int sec_idx, void *data, size_t data_sz)
+			      const char *real_name, int sec_idx, void *data,
+			      size_t data_sz, bool do_mmap)
 {
 	struct bpf_map_def *def;
 	struct bpf_map *map;
@@ -1610,27 +1613,31 @@ bpf_object__init_internal_map(struct bpf_object *obj, enum libbpf_map_type type,
 	def->max_entries = 1;
 	def->map_flags = type == LIBBPF_MAP_RODATA || type == LIBBPF_MAP_KCONFIG
 			 ? BPF_F_RDONLY_PROG : 0;
-	def->map_flags |= BPF_F_MMAPABLE;
+	if (do_mmap)
+		def->map_flags |= BPF_F_MMAPABLE;
 
 	pr_debug("map '%s' (global data): at sec_idx %d, offset %zu, flags %x.\n",
 		 map->name, map->sec_idx, map->sec_offset, def->map_flags);
 
-	map->mmaped = mmap(NULL, bpf_map_mmap_sz(map), PROT_READ | PROT_WRITE,
-			   MAP_SHARED | MAP_ANONYMOUS, -1, 0);
-	if (map->mmaped == MAP_FAILED) {
-		err = -errno;
-		map->mmaped = NULL;
-		pr_warn("failed to alloc map '%s' content buffer: %d\n",
-			map->name, err);
-		zfree(&map->real_name);
-		zfree(&map->name);
-		return err;
+	map->mmaped = NULL;
+	if (do_mmap) {
+		map->mmaped = mmap(NULL, bpf_map_mmap_sz(map), PROT_READ | PROT_WRITE,
+				   MAP_SHARED | MAP_ANONYMOUS, -1, 0);
+		if (map->mmaped == MAP_FAILED) {
+			err = -errno;
+			map->mmaped = NULL;
+			pr_warn("failed to alloc map '%s' content buffer: %d\n",
+				map->name, err);
+			zfree(&map->real_name);
+			zfree(&map->name);
+			return err;
+		}
 	}
 
 	/* failures are fine because of maps like .rodata.str1.1 */
 	(void) bpf_map_find_btf_info(obj, map);
 
-	if (data)
+	if (do_mmap && data)
 		memcpy(map->mmaped, data, data_sz);
 
 	pr_debug("map %td is \"%s\"\n", map - obj->maps, map->name);
@@ -1642,12 +1649,14 @@ static int bpf_object__init_global_data_maps(struct bpf_object *obj)
 	struct elf_sec_desc *sec_desc;
 	const char *sec_name;
 	int err = 0, sec_idx;
+	bool do_mmap;
 
 	/*
 	 * Populate obj->maps with libbpf internal maps.
 	 */
 	for (sec_idx = 1; sec_idx < obj->efile.sec_cnt; sec_idx++) {
 		sec_desc = &obj->efile.secs[sec_idx];
+		do_mmap = true;
 
 		/* Skip recognized sections with size 0. */
 		if (!sec_desc->data || sec_desc->data->d_size == 0)
@@ -1659,7 +1668,8 @@ static int bpf_object__init_global_data_maps(struct bpf_object *obj)
 			err = bpf_object__init_internal_map(obj, LIBBPF_MAP_DATA,
 							    sec_name, sec_idx,
 							    sec_desc->data->d_buf,
-							    sec_desc->data->d_size);
+							    sec_desc->data->d_size,
+							    do_mmap);
 			break;
 		case SEC_RODATA:
 			obj->has_rodata = true;
@@ -1667,14 +1677,18 @@ static int bpf_object__init_global_data_maps(struct bpf_object *obj)
 			err = bpf_object__init_internal_map(obj, LIBBPF_MAP_RODATA,
 							    sec_name, sec_idx,
 							    sec_desc->data->d_buf,
-							    sec_desc->data->d_size);
+							    sec_desc->data->d_size,
+							    do_mmap);
 			break;
+		case SEC_BSS_PRIVATE:
+			do_mmap = false;
 		case SEC_BSS:
 			sec_name = elf_sec_name(obj, elf_sec_by_idx(obj, sec_idx));
 			err = bpf_object__init_internal_map(obj, LIBBPF_MAP_BSS,
 							    sec_name, sec_idx,
 							    NULL,
-							    sec_desc->data->d_size);
+							    sec_desc->data->d_size,
+							    do_mmap);
 			break;
 		default:
 			/* skip */
@@ -1988,7 +2002,7 @@ static int bpf_object__init_kconfig_map(struct bpf_object *obj)
 	map_sz = last_ext->kcfg.data_off + last_ext->kcfg.sz;
 	err = bpf_object__init_internal_map(obj, LIBBPF_MAP_KCONFIG,
 					    ".kconfig", obj->efile.symbols_shndx,
-					    NULL, map_sz);
+					    NULL, map_sz, true);
 	if (err)
 		return err;
 
@@ -3449,6 +3463,10 @@ static int bpf_object__elf_collect(struct bpf_object *obj)
 			sec_desc->sec_type = SEC_BSS;
 			sec_desc->shdr = sh;
 			sec_desc->data = data;
+		} else if (sh->sh_type == SHT_NOBITS && strcmp(name, BSS_SEC_PRIVATE) == 0) {
+			sec_desc->sec_type = SEC_BSS_PRIVATE;
+			sec_desc->shdr = sh;
+			sec_desc->data = data;
 		} else {
 			pr_info("elf: skipping section(%d) %s (size %zu)\n", idx, name,
 				(size_t)sh->sh_size);
@@ -3911,6 +3929,7 @@ static bool bpf_object__shndx_is_data(const struct bpf_object *obj,
 	case SEC_BSS:
 	case SEC_DATA:
 	case SEC_RODATA:
+	case SEC_BSS_PRIVATE:
 		return true;
 	default:
 		return false;
@@ -3930,6 +3949,7 @@ bpf_object__section_to_libbpf_map_type(const struct bpf_object *obj, int shndx)
 		return LIBBPF_MAP_KCONFIG;
 
 	switch (obj->efile.secs[shndx].sec_type) {
+	case SEC_BSS_PRIVATE:
 	case SEC_BSS:
 		return LIBBPF_MAP_BSS;
 	case SEC_DATA:
@@ -4919,16 +4939,19 @@ bpf_object__populate_internal_map(struct bpf_object *obj, struct bpf_map *map)
 {
 	enum libbpf_map_type map_type = map->libbpf_type;
 	char *cp, errmsg[STRERR_BUFSIZE];
-	int err, zero = 0;
+	int err = 0, zero = 0;
 
 	if (obj->gen_loader) {
-		bpf_gen__map_update_elem(obj->gen_loader, map - obj->maps,
-					 map->mmaped, map->def.value_size);
+		if (map->mmaped)
+			bpf_gen__map_update_elem(obj->gen_loader, map - obj->maps,
+						 map->mmaped, map->def.value_size);
 		if (map_type == LIBBPF_MAP_RODATA || map_type == LIBBPF_MAP_KCONFIG)
 			bpf_gen__map_freeze(obj->gen_loader, map - obj->maps);
 		return 0;
 	}
-	err = bpf_map_update_elem(map->fd, &zero, map->mmaped, 0);
+
+	if (map->mmaped)
+		err = bpf_map_update_elem(map->fd, &zero, map->mmaped, 0);
 	if (err) {
 		err = -errno;
 		cp = libbpf_strerror_r(err, errmsg, sizeof(errmsg));
-- 
2.38.0



* [PATCH bpf-next v2 24/25] selftests/bpf: Add __contains macro to bpf_experimental.h
  2022-10-13  6:22 [PATCH bpf-next v2 00/25] Local kptrs, BPF linked lists Kumar Kartikeya Dwivedi
                   ` (22 preceding siblings ...)
  2022-10-13  6:23 ` [PATCH bpf-next v2 23/25] libbpf: Add support for private BSS map section Kumar Kartikeya Dwivedi
@ 2022-10-13  6:23 ` Kumar Kartikeya Dwivedi
  2022-10-13  6:23 ` [PATCH bpf-next v2 25/25] selftests/bpf: Add BPF linked list API tests Kumar Kartikeya Dwivedi
  24 siblings, 0 replies; 52+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2022-10-13  6:23 UTC (permalink / raw)
  To: bpf
  Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Dave Marchevsky, Delyan Kratunov

Add a user-facing __contains macro, which provides a convenient wrapper
over the verbose, kernel-specific BTF declaration tag required to
annotate BPF list head structs in user types.
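
As a rough illustration of what the macro produces, the tag string can
be reproduced in plain C with the same stringification. CONTAINS_TAG and
foo_tag() are hypothetical helpers for this sketch only; the real macro
wraps the string in __attribute__((btf_decl_tag(...))), which is left
out here so the snippet builds with any C compiler.

```c
#include <assert.h>
#include <string.h>

/*
 * Hypothetical helper: builds only the string that __contains(name, node)
 * passes to btf_decl_tag, using the same "#" stringification.
 */
#define CONTAINS_TAG(name, node) "contains:struct:" #name ":" #node

/* Tag string for a list head holding 'struct foo' linked via its 'node' member. */
static const char *foo_tag(void)
{
	return CONTAINS_TAG(foo, node);
}
```

So `struct bpf_list_head head __contains(foo, node);` should carry the
declaration tag "contains:struct:foo:node".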

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
---
 tools/testing/selftests/bpf/bpf_experimental.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/tools/testing/selftests/bpf/bpf_experimental.h b/tools/testing/selftests/bpf/bpf_experimental.h
index 21b85cd721cb..dc71b58b123c 100644
--- a/tools/testing/selftests/bpf/bpf_experimental.h
+++ b/tools/testing/selftests/bpf/bpf_experimental.h
@@ -5,6 +5,8 @@
 #include <bpf/bpf_helpers.h>
 #include <bpf/bpf_core_read.h>
 
+#define __contains(name, node) __attribute__((btf_decl_tag("contains:struct:" #name ":" #node)))
+
 #else
 
 struct bpf_list_head {
-- 
2.38.0



* [PATCH bpf-next v2 25/25] selftests/bpf: Add BPF linked list API tests
  2022-10-13  6:22 [PATCH bpf-next v2 00/25] Local kptrs, BPF linked lists Kumar Kartikeya Dwivedi
                   ` (23 preceding siblings ...)
  2022-10-13  6:23 ` [PATCH bpf-next v2 24/25] selftests/bpf: Add __contains macro to bpf_experimental.h Kumar Kartikeya Dwivedi
@ 2022-10-13  6:23 ` Kumar Kartikeya Dwivedi
  24 siblings, 0 replies; 52+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2022-10-13  6:23 UTC (permalink / raw)
  To: bpf
  Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Dave Marchevsky, Delyan Kratunov

Include various tests covering the success and failure cases. Also, run
the success cases at runtime to verify correctness of linked list
manipulation routines, in addition to ensuring successful verification.
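
To make the ordering checks in the multiple push/pop test easier to
follow, here is a userspace model of the same sequence. It is only a
sketch under assumptions: the list type and the list_add_front(),
list_add_back(), list_del_front() and list_del_back() helpers are
hypothetical stand-ins, not the BPF kfuncs, and malloc()/free() take the
place of bpf_kptr_new()/bpf_kptr_drop(). Locking is omitted.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Userspace stand-ins that only mimic the shape of the BPF list API. */
struct list_node { struct list_node *prev, *next; };
struct list_head { struct list_node root; };

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

struct foo {
	struct list_node node;	/* intrusive link, first member as in the selftest */
	int data;
};

static void list_init(struct list_head *h)
{
	h->root.prev = h->root.next = &h->root;
}

static void insert_between(struct list_node *n, struct list_node *prev,
			   struct list_node *next)
{
	n->prev = prev;
	n->next = next;
	prev->next = n;
	next->prev = n;
}

static void list_add_front(struct list_node *n, struct list_head *h)
{
	insert_between(n, &h->root, h->root.next);
}

static void list_add_back(struct list_node *n, struct list_head *h)
{
	insert_between(n, h->root.prev, &h->root);
}

/* Unlink n and return it, or NULL if n is the sentinel (list is empty). */
static struct list_node *unlink_node(struct list_head *h, struct list_node *n)
{
	if (n == &h->root)
		return NULL;
	n->prev->next = n->next;
	n->next->prev = n->prev;
	n->prev = n->next = NULL;
	return n;
}

static struct list_node *list_del_front(struct list_head *h)
{
	return unlink_node(h, h->root.next);
}

static struct list_node *list_del_back(struct list_head *h)
{
	return unlink_node(h, h->root.prev);
}

/* Returns 0 on success or the number of the failing step. */
static int push_pop_multiple(void)
{
	struct list_head head;
	struct list_node *n;
	struct foo *pf;
	int i, count = 8;

	list_init(&head);
	for (i = 0; i < count; i++) {
		pf = malloc(sizeof(*pf));
		if (!pf)
			return 2;
		pf->data = i;
		list_add_front(&pf->node, &head);
	}
	/* Front pops yield count-1 down to 0; each element moves to the back. */
	for (i = 0; i < count; i++) {
		n = list_del_front(&head);
		if (!n)
			return 3;
		pf = container_of(n, struct foo, node);
		if (pf->data != count - i - 1) {
			free(pf);
			return 4;
		}
		list_add_back(&pf->node, &head);
	}
	/* Back pops now yield 0 up to count-1. */
	for (i = 0; i < count; i++) {
		n = list_del_back(&head);
		if (!n)
			return 5;
		pf = container_of(n, struct foo, node);
		if (pf->data != i) {
			free(pf);
			return 6;
		}
		free(pf);
	}
	/* The list must be empty again. */
	return list_del_back(&head) ? 7 : 0;
}
```

The error paths in this model leak the remaining elements for brevity;
the BPF version cannot do that, since the verifier rejects programs that
leave references unreleased.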

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
---
 kernel/bpf/helpers.c                          |   5 +-
 tools/testing/selftests/bpf/DENYLIST.s390x    |   1 +
 .../selftests/bpf/prog_tests/linked_list.c    |  88 +++++
 .../testing/selftests/bpf/progs/linked_list.c | 325 ++++++++++++++++++
 4 files changed, 418 insertions(+), 1 deletion(-)
 create mode 100644 tools/testing/selftests/bpf/prog_tests/linked_list.c
 create mode 100644 tools/testing/selftests/bpf/progs/linked_list.c

diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
index 71e0f19f738a..6f012aa44ebe 100644
--- a/kernel/bpf/helpers.c
+++ b/kernel/bpf/helpers.c
@@ -1833,7 +1833,10 @@ static const struct btf_kfunc_id_set generic_kfunc_set = {
 
 static int __init kfunc_init(void)
 {
-	return register_btf_kfunc_id_set(BPF_PROG_TYPE_TRACING, &generic_kfunc_set);
+	int ret;
+
+	ret = register_btf_kfunc_id_set(BPF_PROG_TYPE_TRACING, &generic_kfunc_set);
+	return ret ?: register_btf_kfunc_id_set(BPF_PROG_TYPE_SCHED_CLS, &generic_kfunc_set);
 }
 
 late_initcall(kfunc_init);
diff --git a/tools/testing/selftests/bpf/DENYLIST.s390x b/tools/testing/selftests/bpf/DENYLIST.s390x
index 520f12229b98..d13e908da387 100644
--- a/tools/testing/selftests/bpf/DENYLIST.s390x
+++ b/tools/testing/selftests/bpf/DENYLIST.s390x
@@ -32,6 +32,7 @@ ksyms_module                             # test_ksyms_module__open_and_load unex
 ksyms_module_libbpf                      # JIT does not support calling kernel function                                (kfunc)
 ksyms_module_lskel                       # test_ksyms_module_lskel__open_and_load unexpected error: -9                 (?)
 libbpf_get_fd_by_id_opts                 # failed to attach: ERROR: strerror_r(-524)=22                                (trampoline)
+linked_list				 # JIT does not support calling kernel function                                (kfunc)
 lookup_key                               # JIT does not support calling kernel function                                (kfunc)
 lru_bug                                  # prog 'printk': failed to auto-attach: -524
 map_kptr                                 # failed to open_and_load program: -524 (trampoline)
diff --git a/tools/testing/selftests/bpf/prog_tests/linked_list.c b/tools/testing/selftests/bpf/prog_tests/linked_list.c
new file mode 100644
index 000000000000..2dc695fb05b3
--- /dev/null
+++ b/tools/testing/selftests/bpf/prog_tests/linked_list.c
@@ -0,0 +1,88 @@
+// SPDX-License-Identifier: GPL-2.0
+#include <test_progs.h>
+#include <network_helpers.h>
+
+#define __KERNEL__
+#include "bpf_experimental.h"
+#undef __KERNEL__
+
+#include "linked_list.skel.h"
+
+static char log_buf[1024 * 1024];
+
+static struct {
+	const char *prog_name;
+	const char *err_msg;
+} linked_list_fail_tests = {
+};
+
+static void test_linked_list_success(void)
+{
+	LIBBPF_OPTS(bpf_test_run_opts, opts,
+		.data_in = &pkt_v4,
+		.data_size_in = sizeof(pkt_v4),
+		.repeat = 1,
+	);
+	struct linked_list *skel;
+	int key = 0, ret;
+	char buf[32];
+
+	(void)log_buf;
+	(void)&linked_list_fail_tests;
+
+	skel = linked_list__open_and_load();
+	if (!ASSERT_OK_PTR(skel, "linked_list__open_and_load"))
+		return;
+
+	ret = bpf_prog_test_run_opts(bpf_program__fd(skel->progs.map_list_push_pop), &opts);
+	ASSERT_OK(ret, "map_list_push_pop");
+	ASSERT_OK(opts.retval, "map_list_push_pop retval");
+
+	ret = bpf_prog_test_run_opts(bpf_program__fd(skel->progs.global_list_push_pop), &opts);
+	ASSERT_OK(ret, "global_list_push_pop");
+	ASSERT_OK(opts.retval, "global_list_push_pop retval");
+
+	ret = bpf_prog_test_run_opts(bpf_program__fd(skel->progs.global_list_push_pop_unclean), &opts);
+	ASSERT_OK(ret, "global_list_push_pop_unclean");
+	ASSERT_OK(opts.retval, "global_list_push_pop_unclean retval");
+
+	ASSERT_OK(bpf_map_update_elem(bpf_map__fd(skel->maps.bss_private), &key, buf, 0),
+		  "check_and_free_fields");
+
+	ret = bpf_prog_test_run_opts(bpf_program__fd(skel->progs.map_list_push_pop_multiple), &opts);
+	ASSERT_OK(ret, "map_list_push_pop_multiple");
+	ASSERT_OK(opts.retval, "map_list_push_pop_multiple retval");
+
+	ret = bpf_prog_test_run_opts(bpf_program__fd(skel->progs.global_list_push_pop_multiple), &opts);
+	ASSERT_OK(ret, "global_list_push_pop_multiple");
+	ASSERT_OK(opts.retval, "global_list_push_pop_multiple retval");
+
+	ret = bpf_prog_test_run_opts(bpf_program__fd(skel->progs.global_list_push_pop_multiple_unclean), &opts);
+	ASSERT_OK(ret, "global_list_push_pop_multiple_unclean");
+	ASSERT_OK(opts.retval, "global_list_push_pop_multiple_unclean retval");
+
+	ASSERT_OK(bpf_map_update_elem(bpf_map__fd(skel->maps.bss_private), &key, buf, 0),
+		  "check_and_free_fields");
+
+	ret = bpf_prog_test_run_opts(bpf_program__fd(skel->progs.map_list_in_list), &opts);
+	ASSERT_OK(ret, "map_list_in_list");
+	ASSERT_OK(opts.retval, "map_list_in_list retval");
+
+	ret = bpf_prog_test_run_opts(bpf_program__fd(skel->progs.global_list_in_list), &opts);
+	ASSERT_OK(ret, "global_list_in_list");
+	ASSERT_OK(opts.retval, "global_list_in_list retval");
+
+	ret = bpf_prog_test_run_opts(bpf_program__fd(skel->progs.global_list_in_list_unclean), &opts);
+	ASSERT_OK(ret, "global_list_in_list_unclean");
+	ASSERT_OK(opts.retval, "global_list_in_list_unclean retval");
+
+	ASSERT_OK(bpf_map_update_elem(bpf_map__fd(skel->maps.bss_private), &key, buf, 0),
+		  "check_and_free_fields");
+
+	linked_list__destroy(skel);
+}
+
+void test_linked_list(void)
+{
+	test_linked_list_success();
+}
diff --git a/tools/testing/selftests/bpf/progs/linked_list.c b/tools/testing/selftests/bpf/progs/linked_list.c
new file mode 100644
index 000000000000..1b228ada7d2c
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/linked_list.c
@@ -0,0 +1,325 @@
+// SPDX-License-Identifier: GPL-2.0
+#include <vmlinux.h>
+#include <bpf/bpf_tracing.h>
+#include <bpf/bpf_helpers.h>
+#include <bpf/bpf_core_read.h>
+#include "bpf_experimental.h"
+
+#ifndef ARRAY_SIZE
+#define ARRAY_SIZE(x) (sizeof(x) / sizeof((x)[0]))
+#endif
+
+struct bar {
+	struct bpf_list_node node;
+	int data;
+};
+
+struct foo {
+	struct bpf_list_node node;
+	struct bpf_list_head head __contains(bar, node);
+	struct bpf_spin_lock lock;
+	int data;
+};
+
+struct map_value {
+	struct bpf_list_head head __contains(foo, node);
+	struct bpf_spin_lock lock;
+	int data;
+};
+
+struct array_map {
+	__uint(type, BPF_MAP_TYPE_ARRAY);
+	__type(key, int);
+	__type(value, struct map_value);
+	__uint(max_entries, 1);
+} array_map SEC(".maps");
+
+struct bpf_spin_lock glock SEC(".bss.private");
+struct bpf_list_head ghead __contains(foo, node) SEC(".bss.private");
+struct bpf_list_head gghead __contains(foo, node) SEC(".bss.private");
+
+static __always_inline int list_push_pop(void *lock, void *head, bool leave_in_map)
+{
+	struct bpf_list_node *n;
+	struct foo *f;
+
+	f = bpf_kptr_new(typeof(*f));
+	if (!f)
+		return 2;
+
+	bpf_spin_lock(lock);
+	n = bpf_list_del(head);
+	bpf_spin_unlock(lock);
+	if (n) {
+		bpf_kptr_drop(container_of(n, struct foo, node));
+		bpf_kptr_drop(f);
+		return 3;
+	}
+
+	bpf_spin_lock(lock);
+	n = bpf_list_del_tail(head);
+	bpf_spin_unlock(lock);
+	if (n) {
+		bpf_kptr_drop(container_of(n, struct foo, node));
+		bpf_kptr_drop(f);
+		return 4;
+	}
+
+
+	bpf_spin_lock(lock);
+	bpf_list_add(&f->node, head);
+	f->data = 42;
+	bpf_spin_unlock(lock);
+	if (leave_in_map)
+		return 0;
+	bpf_spin_lock(lock);
+	n = bpf_list_del_tail(head);
+	bpf_spin_unlock(lock);
+	if (!n)
+		return 5;
+	f = container_of(n, struct foo, node);
+	if (f->data != 42) {
+		bpf_kptr_drop(f);
+		return 6;
+	}
+
+	bpf_spin_lock(lock);
+	bpf_list_add(&f->node, head);
+	f->data = 13;
+	bpf_spin_unlock(lock);
+	bpf_spin_lock(lock);
+	n = bpf_list_del(head);
+	bpf_spin_unlock(lock);
+	if (!n)
+		return 7;
+	f = container_of(n, struct foo, node);
+	if (f->data != 13) {
+		bpf_kptr_drop(f);
+		return 8;
+	}
+	bpf_kptr_drop(f);
+
+	bpf_spin_lock(lock);
+	n = bpf_list_del(head);
+	bpf_spin_unlock(lock);
+	if (n) {
+		bpf_kptr_drop(container_of(n, struct foo, node));
+		return 9;
+	}
+
+	bpf_spin_lock(lock);
+	n = bpf_list_del_tail(head);
+	bpf_spin_unlock(lock);
+	if (n) {
+		bpf_kptr_drop(container_of(n, struct foo, node));
+		return 10;
+	}
+	return 0;
+}
+
+
+static __always_inline int list_push_pop_multiple(void *lock, void *head, bool leave_in_map)
+{
+	struct bpf_list_node *n;
+	struct foo *f[8], *pf;
+	int i;
+
+	for (i = 0; i < ARRAY_SIZE(f); i++) {
+		f[i] = bpf_kptr_new(typeof(**f));
+		if (!f[i])
+			return 2;
+		f[i]->data = i;
+		bpf_spin_lock(lock);
+		bpf_list_add(&f[i]->node, head);
+		bpf_spin_unlock(lock);
+	}
+
+	for (i = 0; i < ARRAY_SIZE(f); i++) {
+		bpf_spin_lock(lock);
+		n = bpf_list_del(head);
+		bpf_spin_unlock(lock);
+		if (!n)
+			return 3;
+		pf = container_of(n, struct foo, node);
+		if (pf->data != (ARRAY_SIZE(f) - i - 1)) {
+			bpf_kptr_drop(pf);
+			return 4;
+		}
+		bpf_spin_lock(lock);
+		bpf_list_add_tail(&pf->node, head);
+		bpf_spin_unlock(lock);
+	}
+
+	if (leave_in_map)
+		return 0;
+
+	for (i = 0; i < ARRAY_SIZE(f); i++) {
+		bpf_spin_lock(lock);
+		n = bpf_list_del_tail(head);
+		bpf_spin_unlock(lock);
+		if (!n)
+			return 5;
+		pf = container_of(n, struct foo, node);
+		if (pf->data != i) {
+			bpf_kptr_drop(pf);
+			return 6;
+		}
+		bpf_kptr_drop(pf);
+	}
+	bpf_spin_lock(lock);
+	n = bpf_list_del_tail(head);
+	bpf_spin_unlock(lock);
+	if (n) {
+		bpf_kptr_drop(container_of(n, struct foo, node));
+		return 7;
+	}
+
+	bpf_spin_lock(lock);
+	n = bpf_list_del(head);
+	bpf_spin_unlock(lock);
+	if (n) {
+		bpf_kptr_drop(container_of(n, struct foo, node));
+		return 8;
+	}
+	return 0;
+}
+
+static __always_inline int list_in_list(void *lock, void *head, bool leave_in_map)
+{
+	struct bpf_list_node *n;
+	struct bar *ba[8], *b;
+	struct foo *f;
+	int i;
+
+	f = bpf_kptr_new(typeof(*f));
+	if (!f)
+		return 2;
+	for (i = 0; i < ARRAY_SIZE(ba); i++) {
+		b = bpf_kptr_new(typeof(*b));
+		if (!b) {
+			bpf_kptr_drop(f);
+			return 3;
+		}
+		b->data = i;
+		bpf_spin_lock(&f->lock);
+		bpf_list_add_tail(&b->node, &f->head);
+		bpf_spin_unlock(&f->lock);
+	}
+
+	bpf_spin_lock(lock);
+	bpf_list_add(&f->node, head);
+	f->data = 42;
+	bpf_spin_unlock(lock);
+
+	if (leave_in_map)
+		return 0;
+
+	bpf_spin_lock(lock);
+	n = bpf_list_del(head);
+	bpf_spin_unlock(lock);
+	if (!n)
+		return 4;
+	f = container_of(n, struct foo, node);
+	if (f->data != 42) {
+		bpf_kptr_drop(f);
+		return 5;
+	}
+
+	for (i = 0; i < ARRAY_SIZE(ba); i++) {
+		bpf_spin_lock(&f->lock);
+		n = bpf_list_del(&f->head);
+		bpf_spin_unlock(&f->lock);
+		if (!n) {
+			bpf_kptr_drop(f);
+			return 6;
+		}
+		b = container_of(n, struct bar, node);
+		if (b->data != i) {
+			bpf_kptr_drop(f);
+			bpf_kptr_drop(b);
+			return 7;
+		}
+		bpf_kptr_drop(b);
+	}
+	bpf_spin_lock(&f->lock);
+	n = bpf_list_del(&f->head);
+	bpf_spin_unlock(&f->lock);
+	if (n) {
+		bpf_kptr_drop(f);
+		bpf_kptr_drop(container_of(n, struct bar, node));
+		return 8;
+	}
+	bpf_kptr_drop(f);
+	return 0;
+}
+
+SEC("tc")
+int map_list_push_pop(void *ctx)
+{
+	struct map_value *v;
+
+	v = bpf_map_lookup_elem(&array_map, &(int){0});
+	if (!v)
+		return 1;
+	return list_push_pop(&v->lock, &v->head, false);
+}
+
+SEC("tc")
+int global_list_push_pop(void *ctx)
+{
+	return list_push_pop(&glock, &ghead, false);
+}
+
+SEC("tc")
+int global_list_push_pop_unclean(void *ctx)
+{
+	return list_push_pop(&glock, &gghead, true);
+}
+
+SEC("tc")
+int map_list_push_pop_multiple(void *ctx)
+{
+	struct map_value *v;
+
+	v = bpf_map_lookup_elem(&array_map, &(int){0});
+	if (!v)
+		return 1;
+	return list_push_pop_multiple(&v->lock, &v->head, false);
+}
+
+SEC("tc")
+int global_list_push_pop_multiple(void *ctx)
+{
+	return list_push_pop_multiple(&glock, &ghead, false);
+}
+
+SEC("tc")
+int global_list_push_pop_multiple_unclean(void *ctx)
+{
+	return list_push_pop_multiple(&glock, &gghead, true);
+}
+
+SEC("tc")
+int map_list_in_list(void *ctx)
+{
+	struct map_value *v;
+
+	v = bpf_map_lookup_elem(&array_map, &(int){0});
+	if (!v)
+		return 1;
+	return list_in_list(&v->lock, &v->head, false);
+}
+
+SEC("tc")
+int global_list_in_list(void *ctx)
+{
+	return list_in_list(&glock, &ghead, false);
+}
+
+SEC("tc")
+int global_list_in_list_unclean(void *ctx)
+{
+	return list_in_list(&glock, &gghead, true);
+}
+
+char _license[] SEC("license") = "GPL";
-- 
2.38.0



* Re: [PATCH bpf-next v2 15/25] bpf: Rewrite kfunc argument handling
  2022-10-13  6:22 ` [PATCH bpf-next v2 15/25] bpf: Rewrite kfunc argument handling Kumar Kartikeya Dwivedi
@ 2022-10-13 13:48   ` kernel test robot
  0 siblings, 0 replies; 52+ messages in thread
From: kernel test robot @ 2022-10-13 13:48 UTC (permalink / raw)
  To: Kumar Kartikeya Dwivedi, bpf
  Cc: kbuild-all, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Dave Marchevsky, Delyan Kratunov

[-- Attachment #1: Type: text/plain, Size: 6015 bytes --]

Hi Kumar,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on bpf-next/master]

url:    https://github.com/intel-lab-lkp/linux/commits/Kumar-Kartikeya-Dwivedi/Local-kptrs-BPF-linked-lists/20221013-142606
base:   https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git master
config: powerpc-allnoconfig
compiler: powerpc-linux-gcc (GCC) 12.1.0
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # https://github.com/intel-lab-lkp/linux/commit/b8a8cce859fde905d2a8f2694d5aee0b5d3a77e2
        git remote add linux-review https://github.com/intel-lab-lkp/linux
        git fetch --no-tags linux-review Kumar-Kartikeya-Dwivedi/Local-kptrs-BPF-linked-lists/20221013-142606
        git checkout b8a8cce859fde905d2a8f2694d5aee0b5d3a77e2
        # save the config file
        mkdir build_dir && cp config build_dir/.config
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-12.1.0 make.cross W=1 O=build_dir ARCH=powerpc SHELL=/bin/bash

If you fix the issue, kindly add the following tag where applicable
| Reported-by: kernel test robot <lkp@intel.com>

All warnings (new ones prefixed by >>):

   In file included from include/linux/bpf.h:28,
                    from kernel/fork.c:99:
>> include/linux/btf.h:512:30: warning: 'struct bpf_verifier_log' declared inside parameter list will not be visible outside of this definition or declaration
     512 | btf_get_prog_ctx_type(struct bpf_verifier_log *log, const struct btf *btf,
         |                              ^~~~~~~~~~~~~~~~
   kernel/fork.c:163:13: warning: no previous prototype for 'arch_release_task_struct' [-Wmissing-prototypes]
     163 | void __weak arch_release_task_struct(struct task_struct *tsk)
         |             ^~~~~~~~~~~~~~~~~~~~~~~~
   kernel/fork.c:852:20: warning: no previous prototype for 'arch_task_cache_init' [-Wmissing-prototypes]
     852 | void __init __weak arch_task_cache_init(void) { }
         |                    ^~~~~~~~~~~~~~~~~~~~
--
   In file included from include/linux/bpf.h:28,
                    from include/linux/filter.h:9,
                    from kernel/sysctl.c:35:
>> include/linux/btf.h:512:30: warning: 'struct bpf_verifier_log' declared inside parameter list will not be visible outside of this definition or declaration
     512 | btf_get_prog_ctx_type(struct bpf_verifier_log *log, const struct btf *btf,
         |                              ^~~~~~~~~~~~~~~~
--
   In file included from include/linux/bpf.h:28,
                    from include/linux/filter.h:9,
                    from kernel/kallsyms.c:25:
>> include/linux/btf.h:512:30: warning: 'struct bpf_verifier_log' declared inside parameter list will not be visible outside of this definition or declaration
     512 | btf_get_prog_ctx_type(struct bpf_verifier_log *log, const struct btf *btf,
         |                              ^~~~~~~~~~~~~~~~
   kernel/kallsyms.c:571:12: warning: no previous prototype for 'arch_get_kallsym' [-Wmissing-prototypes]
     571 | int __weak arch_get_kallsym(unsigned int symnum, unsigned long *value,
         |            ^~~~~~~~~~~~~~~~


vim +512 include/linux/btf.h

   456	
   457	const struct btf_type *btf_type_by_id(const struct btf *btf, u32 type_id);
   458	const char *btf_name_by_offset(const struct btf *btf, u32 offset);
   459	struct btf *btf_parse_vmlinux(void);
   460	struct btf *bpf_prog_get_target_btf(const struct bpf_prog *prog);
   461	u32 *btf_kfunc_id_set_contains(const struct btf *btf,
   462				       enum bpf_prog_type prog_type,
   463				       u32 kfunc_btf_id);
   464	int register_btf_kfunc_id_set(enum bpf_prog_type prog_type,
   465				      const struct btf_kfunc_id_set *s);
   466	s32 btf_find_dtor_kfunc(struct btf *btf, u32 btf_id);
   467	int register_btf_id_dtor_kfuncs(const struct btf_id_dtor_kfunc *dtors, u32 add_cnt,
   468					struct module *owner);
   469	struct btf_struct_meta *btf_find_struct_meta(const struct btf *btf, u32 btf_id);
   470	const struct btf_member *
   471	btf_get_prog_ctx_type(struct bpf_verifier_log *log, const struct btf *btf,
   472			      const struct btf_type *t, enum bpf_prog_type prog_type,
   473			      int arg);
   474	bool btf_types_are_same(const struct btf *btf1, u32 id1,
   475				const struct btf *btf2, u32 id2);
   476	#else
   477	static inline const struct btf_type *btf_type_by_id(const struct btf *btf,
   478							    u32 type_id)
   479	{
   480		return NULL;
   481	}
   482	static inline const char *btf_name_by_offset(const struct btf *btf,
   483						     u32 offset)
   484	{
   485		return NULL;
   486	}
   487	static inline u32 *btf_kfunc_id_set_contains(const struct btf *btf,
   488						     enum bpf_prog_type prog_type,
   489						     u32 kfunc_btf_id)
   490	{
   491		return NULL;
   492	}
   493	static inline int register_btf_kfunc_id_set(enum bpf_prog_type prog_type,
   494						    const struct btf_kfunc_id_set *s)
   495	{
   496		return 0;
   497	}
   498	static inline s32 btf_find_dtor_kfunc(struct btf *btf, u32 btf_id)
   499	{
   500		return -ENOENT;
   501	}
   502	static inline int register_btf_id_dtor_kfuncs(const struct btf_id_dtor_kfunc *dtors,
   503						      u32 add_cnt, struct module *owner)
   504	{
   505		return 0;
   506	}
   507	static inline struct btf_struct_meta *btf_find_struct_meta(const struct btf *btf, u32 btf_id)
   508	{
   509		return NULL;
   510	}
   511	static inline const struct btf_member *
 > 512	btf_get_prog_ctx_type(struct bpf_verifier_log *log, const struct btf *btf,
   513			      const struct btf_type *t, enum bpf_prog_type prog_type,
   514			      int arg)
   515	{
   516		return NULL;
   517	}
   518	static inline bool btf_types_are_same(const struct btf *btf1, u32 id1,
   519					      const struct btf *btf2, u32 id2)
   520	{
   521		return false;
   522	}
   523	#endif
   524	
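
A note on the warning itself: GCC emits it when a struct tag first appears
inside a function's parameter list, because the tag's scope then ends with
that declaration. Here that presumably happens because the attached config
leaves CONFIG_BPF_SYSCALL unset, so the stub at line 512 is compiled with no
earlier declaration of struct bpf_verifier_log in view. One common way to
silence this class of warning is a file-scope forward declaration ahead of
the prototype. The sketch below is a standalone illustration, not the change
made in the series: prog_type is simplified to int (the header uses
enum bpf_prog_type), and stub_returns_null() is a hypothetical helper added
only so the example can be exercised.

```c
#include <assert.h>
#include <stddef.h>

/* File-scope forward declarations: the tags are now known before any
 * parameter list mentions them, so their scope is the whole file. */
struct bpf_verifier_log;
struct btf;
struct btf_type;
struct btf_member;

/* Simplified stand-in for the stub at include/linux/btf.h:512.
 * Assumption: prog_type is an int here; the kernel header uses
 * enum bpf_prog_type, which is omitted to keep this self-contained. */
static inline const struct btf_member *
btf_get_prog_ctx_type(struct bpf_verifier_log *log, const struct btf *btf,
		      const struct btf_type *t, int prog_type, int arg)
{
	(void)log; (void)btf; (void)t; (void)prog_type; (void)arg;
	return NULL;
}

/* Hypothetical helper: returns 1 if the stub yields NULL, as expected. */
static int stub_returns_null(void)
{
	const struct btf_member *m;

	m = btf_get_prog_ctx_type(NULL, NULL, NULL, 0, 0);
	assert(m == NULL);
	return m == NULL;
}
```

Removing the line "struct bpf_verifier_log;" from the sketch and building
with GCC should bring the same warning back, which is a quick way to confirm
the cause before touching the header.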

-- 
0-DAY CI Kernel Test Service
https://01.org/lkp

[-- Attachment #2: config --]
[-- Type: text/plain, Size: 30549 bytes --]

#
# Automatically generated file; DO NOT EDIT.
# Linux/powerpc 6.0.0 Kernel Configuration
#
CONFIG_CC_VERSION_TEXT="powerpc-linux-gcc (GCC) 12.1.0"
CONFIG_CC_IS_GCC=y
CONFIG_GCC_VERSION=120100
CONFIG_CLANG_VERSION=0
CONFIG_AS_IS_GNU=y
CONFIG_AS_VERSION=23800
CONFIG_LD_IS_BFD=y
CONFIG_LD_VERSION=23800
CONFIG_LLD_VERSION=0
CONFIG_CC_HAS_ASM_GOTO_OUTPUT=y
CONFIG_CC_HAS_ASM_INLINE=y
CONFIG_CC_HAS_NO_PROFILE_FN_ATTR=y
CONFIG_PAHOLE_VERSION=123
CONFIG_IRQ_WORK=y
CONFIG_BUILDTIME_TABLE_SORT=y
CONFIG_THREAD_INFO_IN_TASK=y

#
# General setup
#
CONFIG_BROKEN_ON_SMP=y
CONFIG_INIT_ENV_ARG_LIMIT=32
# CONFIG_COMPILE_TEST is not set
# CONFIG_WERROR is not set
CONFIG_LOCALVERSION=""
# CONFIG_LOCALVERSION_AUTO is not set
CONFIG_BUILD_SALT=""
CONFIG_HAVE_KERNEL_GZIP=y
CONFIG_HAVE_KERNEL_XZ=y
CONFIG_KERNEL_GZIP=y
# CONFIG_KERNEL_XZ is not set
CONFIG_DEFAULT_INIT=""
CONFIG_DEFAULT_HOSTNAME="(none)"
# CONFIG_SYSVIPC is not set
# CONFIG_WATCH_QUEUE is not set
# CONFIG_CROSS_MEMORY_ATTACH is not set
# CONFIG_USELIB is not set
CONFIG_HAVE_ARCH_AUDITSYSCALL=y

#
# IRQ subsystem
#
CONFIG_GENERIC_IRQ_SHOW=y
CONFIG_GENERIC_IRQ_SHOW_LEVEL=y
CONFIG_IRQ_DOMAIN=y
CONFIG_IRQ_FORCED_THREADING=y
CONFIG_SPARSE_IRQ=y
# end of IRQ subsystem

CONFIG_GENERIC_TIME_VSYSCALL=y
CONFIG_GENERIC_CLOCKEVENTS=y
CONFIG_GENERIC_CMOS_UPDATE=y

#
# Timers subsystem
#
CONFIG_HZ_PERIODIC=y
# CONFIG_NO_HZ_IDLE is not set
# CONFIG_NO_HZ is not set
# CONFIG_HIGH_RES_TIMERS is not set
# end of Timers subsystem

CONFIG_HAVE_EBPF_JIT=y

#
# BPF subsystem
#
# CONFIG_BPF_SYSCALL is not set
# end of BPF subsystem

CONFIG_PREEMPT_NONE_BUILD=y
CONFIG_PREEMPT_NONE=y
# CONFIG_PREEMPT_VOLUNTARY is not set
# CONFIG_PREEMPT is not set

#
# CPU/Task time and stats accounting
#
CONFIG_TICK_CPU_ACCOUNTING=y
# CONFIG_VIRT_CPU_ACCOUNTING_NATIVE is not set
# CONFIG_IRQ_TIME_ACCOUNTING is not set
# CONFIG_BSD_PROCESS_ACCT is not set
# CONFIG_PSI is not set
# end of CPU/Task time and stats accounting

#
# RCU Subsystem
#
CONFIG_TINY_RCU=y
# CONFIG_RCU_EXPERT is not set
CONFIG_SRCU=y
CONFIG_TINY_SRCU=y
# end of RCU Subsystem

# CONFIG_IKCONFIG is not set
# CONFIG_IKHEADERS is not set
CONFIG_LOG_BUF_SHIFT=17
CONFIG_PRINTK_SAFE_LOG_BUF_SHIFT=13

#
# Scheduler features
#
# end of Scheduler features

CONFIG_CC_IMPLICIT_FALLTHROUGH="-Wimplicit-fallthrough=5"
CONFIG_GCC12_NO_ARRAY_BOUNDS=y
CONFIG_CC_NO_ARRAY_BOUNDS=y
# CONFIG_CGROUPS is not set
CONFIG_NAMESPACES=y
# CONFIG_UTS_NS is not set
# CONFIG_TIME_NS is not set
# CONFIG_USER_NS is not set
# CONFIG_PID_NS is not set
# CONFIG_CHECKPOINT_RESTORE is not set
# CONFIG_SCHED_AUTOGROUP is not set
# CONFIG_SYSFS_DEPRECATED is not set
# CONFIG_RELAY is not set
# CONFIG_BLK_DEV_INITRD is not set
# CONFIG_BOOT_CONFIG is not set
# CONFIG_INITRAMFS_PRESERVE_MTIME is not set
CONFIG_CC_OPTIMIZE_FOR_PERFORMANCE=y
# CONFIG_CC_OPTIMIZE_FOR_SIZE is not set
CONFIG_HAVE_LD_DEAD_CODE_DATA_ELIMINATION=y
# CONFIG_LD_DEAD_CODE_DATA_ELIMINATION is not set
CONFIG_LD_ORPHAN_WARN=y
CONFIG_SYSCTL=y
CONFIG_SYSCTL_EXCEPTION_TRACE=y
CONFIG_EXPERT=y
CONFIG_MULTIUSER=y
CONFIG_SGETMASK_SYSCALL=y
CONFIG_SYSFS_SYSCALL=y
CONFIG_FHANDLE=y
CONFIG_POSIX_TIMERS=y
CONFIG_PRINTK=y
CONFIG_BUG=y
CONFIG_ELF_CORE=y
CONFIG_BASE_FULL=y
CONFIG_FUTEX=y
CONFIG_FUTEX_PI=y
CONFIG_EPOLL=y
CONFIG_SIGNALFD=y
CONFIG_TIMERFD=y
CONFIG_EVENTFD=y
CONFIG_SHMEM=y
CONFIG_AIO=y
CONFIG_IO_URING=y
CONFIG_ADVISE_SYSCALLS=y
CONFIG_MEMBARRIER=y
CONFIG_KALLSYMS=y
# CONFIG_KALLSYMS_ALL is not set
CONFIG_KALLSYMS_BASE_RELATIVE=y
CONFIG_ARCH_HAS_MEMBARRIER_CALLBACKS=y
CONFIG_ARCH_HAS_MEMBARRIER_SYNC_CORE=y
# CONFIG_KCMP is not set
CONFIG_RSEQ=y
# CONFIG_DEBUG_RSEQ is not set
# CONFIG_EMBEDDED is not set
CONFIG_HAVE_PERF_EVENTS=y
# CONFIG_PC104 is not set

#
# Kernel Performance Events And Counters
#
# CONFIG_PERF_EVENTS is not set
# end of Kernel Performance Events And Counters

# CONFIG_PROFILING is not set
# end of General setup

CONFIG_PPC32=y
# CONFIG_PPC64 is not set

#
# Processor support
#
CONFIG_PPC_BOOK3S_32=y
# CONFIG_PPC_85xx is not set
# CONFIG_PPC_8xx is not set
# CONFIG_40x is not set
# CONFIG_44x is not set
# CONFIG_PPC_BOOK3S_603 is not set
CONFIG_PPC_BOOK3S_604=y
CONFIG_POWERPC_CPU=y
# CONFIG_E300C2_CPU is not set
# CONFIG_E300C3_CPU is not set
# CONFIG_G4_CPU is not set
# CONFIG_TOOLCHAIN_DEFAULT_CPU is not set
CONFIG_TARGET_CPU_BOOL=y
CONFIG_TARGET_CPU="powerpc"
CONFIG_PPC_BOOK3S=y
CONFIG_PPC_FPU_REGS=y
CONFIG_PPC_FPU=y
# CONFIG_ALTIVEC is not set
# CONFIG_PPC_KUEP is not set
# CONFIG_PPC_KUAP is not set
CONFIG_PPC_HAVE_PMU_SUPPORT=y
# CONFIG_PMU_SYSFS is not set
# CONFIG_SMP is not set
CONFIG_NR_CPUS=1
# end of Processor support

CONFIG_VDSO32=y
CONFIG_CPU_BIG_ENDIAN=y
CONFIG_32BIT=y
CONFIG_MMU=y
CONFIG_ARCH_MMAP_RND_BITS_MAX=17
CONFIG_ARCH_MMAP_RND_BITS_MIN=11
CONFIG_ARCH_MMAP_RND_COMPAT_BITS_MAX=17
CONFIG_ARCH_MMAP_RND_COMPAT_BITS_MIN=11
CONFIG_NR_IRQS=512
CONFIG_STACKTRACE_SUPPORT=y
CONFIG_LOCKDEP_SUPPORT=y
CONFIG_GENERIC_HWEIGHT=y
CONFIG_PPC=y
CONFIG_EARLY_PRINTK=y
CONFIG_PANIC_TIMEOUT=180
CONFIG_SCHED_OMIT_FRAME_POINTER=y
CONFIG_AUDIT_ARCH=y
CONFIG_GENERIC_BUG=y
CONFIG_GENERIC_BUG_RELATIVE_POINTERS=y
CONFIG_ARCH_HIBERNATION_POSSIBLE=y
CONFIG_ARCH_HAS_ADD_PAGES=y
# CONFIG_PPC_PCI_BUS_NUM_DOMAIN_DEPENDENT is not set
CONFIG_ARCH_SUPPORTS_UPROBES=y
CONFIG_PGTABLE_LEVELS=2

#
# Platform support
#
# CONFIG_PPC_CHRP is not set
# CONFIG_PPC_MPC512x is not set
# CONFIG_PPC_MPC52xx is not set
# CONFIG_PPC_PMAC is not set
# CONFIG_PPC_82xx is not set
# CONFIG_PPC_83xx is not set
# CONFIG_PPC_86xx is not set
# CONFIG_EMBEDDED6xx is not set
# CONFIG_AMIGAONE is not set
# CONFIG_KVM_GUEST is not set
# CONFIG_EPAPR_PARAVIRT is not set
# CONFIG_PPC_OF_BOOT_TRAMPOLINE is not set

#
# CPU Frequency scaling
#
# CONFIG_CPU_FREQ is not set
# end of CPU Frequency scaling

#
# CPUIdle driver
#

#
# CPU Idle
#
# CONFIG_CPU_IDLE is not set
# end of CPU Idle
# end of CPUIdle driver

# CONFIG_TAU is not set
# CONFIG_GEN_RTC is not set
# end of Platform support

#
# Kernel options
#
# CONFIG_HIGHMEM is not set
# CONFIG_HZ_100 is not set
CONFIG_HZ_250=y
# CONFIG_HZ_300 is not set
# CONFIG_HZ_1000 is not set
CONFIG_HZ=250
# CONFIG_KEXEC is not set
# CONFIG_CRASH_DUMP is not set
CONFIG_ARCH_FLATMEM_ENABLE=y
CONFIG_ILLEGAL_POINTER_VALUE=0
CONFIG_PPC_4K_PAGES=y
CONFIG_PAGE_SIZE_4KB=y
CONFIG_PPC_PAGE_SHIFT=12
CONFIG_THREAD_SHIFT=13
CONFIG_DATA_SHIFT=12
CONFIG_FORCE_MAX_ZONEORDER=11
CONFIG_CMDLINE=""
CONFIG_EXTRA_TARGETS=""
# CONFIG_PM is not set
# end of Kernel options

#
# Bus options
#
# CONFIG_FSL_LBC is not set
# end of Bus options

#
# Advanced setup
#
# CONFIG_ADVANCED_OPTIONS is not set

#
# Default settings for advanced configuration options are used
#
CONFIG_LOWMEM_SIZE=0x30000000
CONFIG_PAGE_OFFSET=0xc0000000
CONFIG_KERNEL_START=0xc0000000
CONFIG_PHYSICAL_START=0x00000000
CONFIG_TASK_SIZE=0xb0000000
# end of Advanced setup

# CONFIG_VIRTUALIZATION is not set
CONFIG_HAVE_LIVEPATCH=y

#
# General architecture-dependent options
#
# CONFIG_JUMP_LABEL is not set
# CONFIG_STATIC_CALL_SELFTEST is not set
CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS=y
CONFIG_ARCH_USE_BUILTIN_BSWAP=y
CONFIG_HAVE_IOREMAP_PROT=y
CONFIG_HAVE_KPROBES=y
CONFIG_HAVE_KRETPROBES=y
CONFIG_HAVE_OPTPROBES=y
CONFIG_HAVE_KPROBES_ON_FTRACE=y
CONFIG_HAVE_FUNCTION_ERROR_INJECTION=y
CONFIG_TRACE_IRQFLAGS_SUPPORT=y
CONFIG_HAVE_ARCH_TRACEHOOK=y
CONFIG_GENERIC_SMP_IDLE_THREAD=y
CONFIG_ARCH_HAS_FORTIFY_SOURCE=y
CONFIG_ARCH_HAS_SET_MEMORY=y
CONFIG_ARCH_WANTS_NO_INSTR=y
CONFIG_ARCH_32BIT_OFF_T=y
CONFIG_HAVE_ASM_MODVERSIONS=y
CONFIG_HAVE_REGS_AND_STACK_ACCESS_API=y
CONFIG_HAVE_RSEQ=y
CONFIG_HAVE_PERF_REGS=y
CONFIG_HAVE_PERF_USER_STACK_DUMP=y
CONFIG_HAVE_ARCH_JUMP_LABEL=y
CONFIG_HAVE_ARCH_JUMP_LABEL_RELATIVE=y
CONFIG_MMU_GATHER_TABLE_FREE=y
CONFIG_MMU_GATHER_RCU_TABLE_FREE=y
CONFIG_MMU_GATHER_PAGE_SIZE=y
CONFIG_MMU_GATHER_MERGE_VMAS=y
CONFIG_ARCH_WANT_IRQS_OFF_ACTIVATE_MM=y
CONFIG_ARCH_HAVE_NMI_SAFE_CMPXCHG=y
CONFIG_ARCH_WEAK_RELEASE_ACQUIRE=y
CONFIG_ARCH_WANT_IPC_PARSE_VERSION=y
CONFIG_HAVE_ARCH_SECCOMP=y
CONFIG_HAVE_ARCH_SECCOMP_FILTER=y
# CONFIG_SECCOMP is not set
CONFIG_HAVE_STACKPROTECTOR=y
# CONFIG_STACKPROTECTOR is not set
CONFIG_LTO_NONE=y
CONFIG_HAVE_VIRT_CPU_ACCOUNTING=y
CONFIG_HAVE_IRQ_TIME_ACCOUNTING=y
CONFIG_HAVE_MOD_ARCH_SPECIFIC=y
CONFIG_MODULES_USE_ELF_RELA=y
CONFIG_ARCH_WANTS_MODULES_DATA_IN_VMALLOC=y
CONFIG_HAVE_SOFTIRQ_ON_OWN_STACK=y
CONFIG_SOFTIRQ_ON_OWN_STACK=y
CONFIG_ARCH_HAS_ELF_RANDOMIZE=y
CONFIG_HAVE_ARCH_MMAP_RND_BITS=y
CONFIG_ARCH_MMAP_RND_BITS=11
CONFIG_PAGE_SIZE_LESS_THAN_64KB=y
CONFIG_PAGE_SIZE_LESS_THAN_256KB=y
CONFIG_ARCH_WANT_DEFAULT_TOPDOWN_MMAP_LAYOUT=y
CONFIG_HAVE_RELIABLE_STACKTRACE=y
CONFIG_HAVE_ARCH_NVRAM_OPS=y
CONFIG_CLONE_BACKWARDS=y
CONFIG_OLD_SIGSUSPEND=y
CONFIG_OLD_SIGACTION=y
# CONFIG_COMPAT_32BIT_TIME is not set
CONFIG_HAVE_ARCH_VMAP_STACK=y
# CONFIG_VMAP_STACK is not set
CONFIG_HAVE_ARCH_RANDOMIZE_KSTACK_OFFSET=y
CONFIG_RANDOMIZE_KSTACK_OFFSET=y
# CONFIG_RANDOMIZE_KSTACK_OFFSET_DEFAULT is not set
CONFIG_ARCH_OPTIONAL_KERNEL_RWX=y
CONFIG_ARCH_OPTIONAL_KERNEL_RWX_DEFAULT=y
CONFIG_ARCH_HAS_STRICT_KERNEL_RWX=y
# CONFIG_STRICT_KERNEL_RWX is not set
CONFIG_ARCH_HAS_STRICT_MODULE_RWX=y
CONFIG_ARCH_HAS_PHYS_TO_DMA=y
CONFIG_HAVE_STATIC_CALL=y
CONFIG_ARCH_WANT_LD_ORPHAN_WARN=y
CONFIG_ARCH_SUPPORTS_DEBUG_PAGEALLOC=y

#
# GCOV-based kernel profiling
#
CONFIG_ARCH_HAS_GCOV_PROFILE_ALL=y
# end of GCOV-based kernel profiling

CONFIG_HAVE_GCC_PLUGINS=y
# CONFIG_GCC_PLUGINS is not set
# end of General architecture-dependent options

CONFIG_RT_MUTEXES=y
CONFIG_BASE_SMALL=0
# CONFIG_MODULES is not set
CONFIG_BLOCK=y
# CONFIG_BLOCK_LEGACY_AUTOLOAD is not set
# CONFIG_BLK_DEV_BSGLIB is not set
# CONFIG_BLK_DEV_INTEGRITY is not set
# CONFIG_BLK_DEV_ZONED is not set
# CONFIG_BLK_WBT is not set
# CONFIG_BLK_SED_OPAL is not set
# CONFIG_BLK_INLINE_ENCRYPTION is not set

#
# Partition Types
#
# CONFIG_PARTITION_ADVANCED is not set
CONFIG_MSDOS_PARTITION=y
CONFIG_EFI_PARTITION=y
# end of Partition Types

#
# IO Schedulers
#
# CONFIG_MQ_IOSCHED_DEADLINE is not set
# CONFIG_MQ_IOSCHED_KYBER is not set
# CONFIG_IOSCHED_BFQ is not set
# end of IO Schedulers

CONFIG_INLINE_SPIN_UNLOCK_IRQ=y
CONFIG_INLINE_READ_UNLOCK=y
CONFIG_INLINE_READ_UNLOCK_IRQ=y
CONFIG_INLINE_WRITE_UNLOCK=y
CONFIG_INLINE_WRITE_UNLOCK_IRQ=y
CONFIG_ARCH_SUPPORTS_ATOMIC_RMW=y
CONFIG_ARCH_HAS_NON_OVERLAPPING_ADDRESS_SPACE=y

#
# Executable file formats
#
CONFIG_BINFMT_ELF=y
CONFIG_ELFCORE=y
# CONFIG_CORE_DUMP_DEFAULT_ELF_HEADERS is not set
# CONFIG_BINFMT_SCRIPT is not set
# CONFIG_BINFMT_MISC is not set
CONFIG_COREDUMP=y
# end of Executable file formats

#
# Memory Management options
#
# CONFIG_SWAP is not set

#
# SLAB allocator options
#
# CONFIG_SLAB is not set
CONFIG_SLUB=y
# CONFIG_SLOB is not set
# CONFIG_SLAB_MERGE_DEFAULT is not set
# CONFIG_SLAB_FREELIST_RANDOM is not set
# CONFIG_SLAB_FREELIST_HARDENED is not set
# CONFIG_SLUB_STATS is not set
# end of SLAB allocator options

# CONFIG_SHUFFLE_PAGE_ALLOCATOR is not set
# CONFIG_COMPAT_BRK is not set
CONFIG_FLATMEM=y
CONFIG_HAVE_FAST_GUP=y
CONFIG_ARCH_KEEP_MEMBLOCK=y
CONFIG_EXCLUSIVE_SYSTEM_RAM=y
CONFIG_ARCH_ENABLE_MEMORY_HOTPLUG=y
CONFIG_ARCH_ENABLE_MEMORY_HOTREMOVE=y
CONFIG_SPLIT_PTLOCK_CPUS=4
# CONFIG_COMPACTION is not set
# CONFIG_PAGE_REPORTING is not set
# CONFIG_MIGRATION is not set
# CONFIG_KSM is not set
CONFIG_DEFAULT_MMAP_MIN_ADDR=4096
CONFIG_NEED_PER_CPU_KM=y
# CONFIG_CMA is not set
CONFIG_GENERIC_EARLY_IOREMAP=y
# CONFIG_IDLE_PAGE_TRACKING is not set
CONFIG_ARCH_HAS_CURRENT_STACK_POINTER=y
CONFIG_VM_EVENT_COUNTERS=y
# CONFIG_PERCPU_STATS is not set

#
# GUP_TEST needs to have DEBUG_FS enabled
#
CONFIG_ARCH_HAS_PTE_SPECIAL=y
# CONFIG_ANON_VMA_NAME is not set
# CONFIG_USERFAULTFD is not set

#
# Data Access Monitoring
#
# CONFIG_DAMON is not set
# end of Data Access Monitoring
# end of Memory Management options

# CONFIG_NET is not set

#
# Device Drivers
#
# CONFIG_PCCARD is not set

#
# Generic Driver Options
#
# CONFIG_UEVENT_HELPER is not set
# CONFIG_DEVTMPFS is not set
# CONFIG_STANDALONE is not set
# CONFIG_PREVENT_FIRMWARE_BUILD is not set

#
# Firmware loader
#
CONFIG_FW_LOADER=y
CONFIG_EXTRA_FIRMWARE=""
# CONFIG_FW_LOADER_USER_HELPER is not set
# CONFIG_FW_LOADER_COMPRESS is not set
# CONFIG_FW_UPLOAD is not set
# end of Firmware loader

CONFIG_ALLOW_DEV_COREDUMP=y
# CONFIG_DEBUG_DRIVER is not set
# CONFIG_DEBUG_DEVRES is not set
# CONFIG_DEBUG_TEST_DRIVER_REMOVE is not set
CONFIG_GENERIC_CPU_AUTOPROBE=y
# end of Generic Driver Options

#
# Bus devices
#
# CONFIG_MHI_BUS is not set
# CONFIG_MHI_BUS_EP is not set
# end of Bus devices

#
# Firmware Drivers
#

#
# ARM System Control and Management Interface Protocol
#
# end of ARM System Control and Management Interface Protocol

# CONFIG_FIRMWARE_MEMMAP is not set
# CONFIG_GOOGLE_FIRMWARE is not set

#
# Tegra firmware driver
#
# end of Tegra firmware driver
# end of Firmware Drivers

# CONFIG_GNSS is not set
# CONFIG_MTD is not set
CONFIG_DTC=y
CONFIG_OF=y
# CONFIG_OF_UNITTEST is not set
CONFIG_OF_FLATTREE=y
CONFIG_OF_EARLY_FLATTREE=y
CONFIG_OF_KOBJ=y
CONFIG_OF_ADDRESS=y
CONFIG_OF_IRQ=y
CONFIG_OF_RESERVED_MEM=y
# CONFIG_OF_OVERLAY is not set
CONFIG_OF_DMA_DEFAULT_COHERENT=y
CONFIG_ARCH_MIGHT_HAVE_PC_PARPORT=y
# CONFIG_PARPORT is not set
# CONFIG_BLK_DEV is not set

#
# NVME Support
#
# CONFIG_NVME_FC is not set
# end of NVME Support

#
# Misc devices
#
# CONFIG_DUMMY_IRQ is not set
# CONFIG_ENCLOSURE_SERVICES is not set
# CONFIG_SRAM is not set
# CONFIG_XILINX_SDFEC is not set
# CONFIG_OPEN_DICE is not set
# CONFIG_VCPU_STALL_DETECTOR is not set
# CONFIG_C2PORT is not set

#
# EEPROM support
#
# CONFIG_EEPROM_93CX6 is not set
# end of EEPROM support

#
# Texas Instruments shared transport line discipline
#
# end of Texas Instruments shared transport line discipline

#
# Altera FPGA firmware download module (requires I2C)
#
# CONFIG_ECHO is not set
# CONFIG_PVPANIC is not set
# end of Misc devices

#
# SCSI device support
#
CONFIG_SCSI_MOD=y
# CONFIG_RAID_ATTRS is not set
# CONFIG_SCSI is not set
# end of SCSI device support

# CONFIG_ATA is not set
# CONFIG_MD is not set
# CONFIG_TARGET_CORE is not set
# CONFIG_MACINTOSH_DRIVERS is not set

#
# Input device support
#
CONFIG_INPUT=y
# CONFIG_INPUT_FF_MEMLESS is not set
# CONFIG_INPUT_SPARSEKMAP is not set
# CONFIG_INPUT_MATRIXKMAP is not set

#
# Userland interfaces
#
# CONFIG_INPUT_MOUSEDEV is not set
# CONFIG_INPUT_JOYDEV is not set
# CONFIG_INPUT_EVDEV is not set
# CONFIG_INPUT_EVBUG is not set

#
# Input Device Drivers
#
# CONFIG_INPUT_KEYBOARD is not set
# CONFIG_INPUT_MOUSE is not set
# CONFIG_INPUT_JOYSTICK is not set
# CONFIG_INPUT_TABLET is not set
# CONFIG_INPUT_TOUCHSCREEN is not set
# CONFIG_INPUT_MISC is not set
# CONFIG_RMI4_CORE is not set

#
# Hardware I/O ports
#
# CONFIG_SERIO is not set
CONFIG_ARCH_MIGHT_HAVE_PC_SERIO=y
# CONFIG_GAMEPORT is not set
# end of Hardware I/O ports
# end of Input device support

#
# Character devices
#
CONFIG_TTY=y
CONFIG_VT=y
CONFIG_CONSOLE_TRANSLATIONS=y
CONFIG_VT_CONSOLE=y
CONFIG_HW_CONSOLE=y
# CONFIG_VT_HW_CONSOLE_BINDING is not set
CONFIG_UNIX98_PTYS=y
# CONFIG_LEGACY_PTYS is not set
# CONFIG_LDISC_AUTOLOAD is not set

#
# Serial drivers
#
# CONFIG_SERIAL_8250 is not set

#
# Non-8250 serial port support
#
# CONFIG_SERIAL_UARTLITE is not set
# CONFIG_SERIAL_SIFIVE is not set
# CONFIG_SERIAL_SCCNXP is not set
# CONFIG_SERIAL_ALTERA_JTAGUART is not set
# CONFIG_SERIAL_ALTERA_UART is not set
# CONFIG_SERIAL_XILINX_PS_UART is not set
# CONFIG_SERIAL_ARC is not set
# CONFIG_SERIAL_FSL_LPUART is not set
# CONFIG_SERIAL_FSL_LINFLEXUART is not set
# CONFIG_SERIAL_CONEXANT_DIGICOLOR is not set
# end of Serial drivers

# CONFIG_SERIAL_NONSTANDARD is not set
# CONFIG_PPC_EPAPR_HV_BYTECHAN is not set
# CONFIG_NULL_TTY is not set
# CONFIG_HVC_UDBG is not set
# CONFIG_SERIAL_DEV_BUS is not set
# CONFIG_TTY_PRINTK is not set
# CONFIG_VIRTIO_CONSOLE is not set
# CONFIG_IPMI_HANDLER is not set
# CONFIG_HW_RANDOM is not set
# CONFIG_DEVMEM is not set
# CONFIG_NVRAM is not set
# CONFIG_TCG_TPM is not set
# CONFIG_XILLYBUS is not set
# CONFIG_RANDOM_TRUST_CPU is not set
# CONFIG_RANDOM_TRUST_BOOTLOADER is not set
# end of Character devices

#
# I2C support
#
# CONFIG_I2C is not set
# end of I2C support

# CONFIG_I3C is not set
# CONFIG_SPI is not set
# CONFIG_SPMI is not set
# CONFIG_HSI is not set
# CONFIG_PPS is not set

#
# PTP clock support
#
CONFIG_PTP_1588_CLOCK_OPTIONAL=y

#
# Enable PHYLIB and NETWORK_PHY_TIMESTAMPING to see the additional clocks.
#
# end of PTP clock support

# CONFIG_PINCTRL is not set
# CONFIG_GPIOLIB is not set
# CONFIG_W1 is not set
# CONFIG_POWER_RESET is not set
# CONFIG_POWER_SUPPLY is not set
# CONFIG_HWMON is not set
# CONFIG_THERMAL is not set
# CONFIG_WATCHDOG is not set
CONFIG_SSB_POSSIBLE=y
# CONFIG_SSB is not set
CONFIG_BCMA_POSSIBLE=y
# CONFIG_BCMA is not set

#
# Multifunction device drivers
#
# CONFIG_MFD_ATMEL_FLEXCOM is not set
# CONFIG_MFD_ATMEL_HLCDC is not set
# CONFIG_MFD_MADERA is not set
# CONFIG_MFD_HI6421_PMIC is not set
# CONFIG_HTC_PASIC3 is not set
# CONFIG_MFD_KEMPLD is not set
# CONFIG_MFD_MT6397 is not set
# CONFIG_MFD_SM501 is not set
# CONFIG_MFD_SYSCON is not set
# CONFIG_MFD_TI_AM335X_TSCADC is not set
# CONFIG_MFD_TQMX86 is not set
# end of Multifunction device drivers

# CONFIG_REGULATOR is not set
# CONFIG_RC_CORE is not set

#
# CEC support
#
# CONFIG_MEDIA_CEC_SUPPORT is not set
# end of CEC support

# CONFIG_MEDIA_SUPPORT is not set

#
# Graphics support
#
# CONFIG_DRM is not set
# CONFIG_DRM_DEBUG_MODESET_LOCK is not set

#
# ARM devices
#
# end of ARM devices

#
# Frame buffer Devices
#
# CONFIG_FB is not set
# end of Frame buffer Devices

#
# Backlight & LCD device support
#
# CONFIG_LCD_CLASS_DEVICE is not set
# CONFIG_BACKLIGHT_CLASS_DEVICE is not set
# end of Backlight & LCD device support

#
# Console display driver support
#
# CONFIG_VGA_CONSOLE is not set
CONFIG_DUMMY_CONSOLE=y
CONFIG_DUMMY_CONSOLE_COLUMNS=80
CONFIG_DUMMY_CONSOLE_ROWS=25
# end of Console display driver support
# end of Graphics support

# CONFIG_SOUND is not set

#
# HID support
#
# CONFIG_HID is not set
# end of HID support

CONFIG_USB_OHCI_LITTLE_ENDIAN=y
# CONFIG_USB_SUPPORT is not set
# CONFIG_MMC is not set
# CONFIG_MEMSTICK is not set
# CONFIG_NEW_LEDS is not set
# CONFIG_ACCESSIBILITY is not set
CONFIG_EDAC_ATOMIC_SCRUB=y
CONFIG_EDAC_SUPPORT=y
CONFIG_RTC_LIB=y
# CONFIG_RTC_CLASS is not set
# CONFIG_DMADEVICES is not set

#
# DMABUF options
#
# CONFIG_SYNC_FILE is not set
# CONFIG_DMABUF_HEAPS is not set
# end of DMABUF options

# CONFIG_AUXDISPLAY is not set
# CONFIG_UIO is not set
# CONFIG_VFIO is not set
# CONFIG_VIRT_DRIVERS is not set
# CONFIG_VIRTIO_MENU is not set
# CONFIG_VHOST_MENU is not set

#
# Microsoft Hyper-V guest support
#
# end of Microsoft Hyper-V guest support

# CONFIG_GREYBUS is not set
# CONFIG_COMEDI is not set
# CONFIG_STAGING is not set
# CONFIG_GOLDFISH is not set
# CONFIG_COMMON_CLK is not set
# CONFIG_HWSPINLOCK is not set

#
# Clock Source drivers
#
# CONFIG_MICROCHIP_PIT64B is not set
# end of Clock Source drivers

# CONFIG_MAILBOX is not set
# CONFIG_IOMMU_SUPPORT is not set

#
# Remoteproc drivers
#
# CONFIG_REMOTEPROC is not set
# end of Remoteproc drivers

#
# Rpmsg drivers
#
# CONFIG_RPMSG_VIRTIO is not set
# end of Rpmsg drivers

# CONFIG_SOUNDWIRE is not set

#
# SOC (System On Chip) specific Drivers
#

#
# Amlogic SoC drivers
#
# end of Amlogic SoC drivers

#
# Broadcom SoC drivers
#
# end of Broadcom SoC drivers

#
# NXP/Freescale QorIQ SoC drivers
#
# CONFIG_QUICC_ENGINE is not set
# end of NXP/Freescale QorIQ SoC drivers

#
# fujitsu SoC drivers
#
# end of fujitsu SoC drivers

#
# i.MX SoC drivers
#
# end of i.MX SoC drivers

#
# Enable LiteX SoC Builder specific drivers
#
# CONFIG_LITEX_SOC_CONTROLLER is not set
# end of Enable LiteX SoC Builder specific drivers

#
# Qualcomm SoC drivers
#
# end of Qualcomm SoC drivers

# CONFIG_SOC_TI is not set

#
# Xilinx SoC drivers
#
# end of Xilinx SoC drivers
# end of SOC (System On Chip) specific Drivers

# CONFIG_PM_DEVFREQ is not set
# CONFIG_EXTCON is not set
# CONFIG_MEMORY is not set
# CONFIG_IIO is not set
# CONFIG_PWM is not set

#
# IRQ chip support
#
CONFIG_IRQCHIP=y
# CONFIG_AL_FIC is not set
# CONFIG_XILINX_INTC is not set
# end of IRQ chip support

# CONFIG_IPACK_BUS is not set
# CONFIG_RESET_CONTROLLER is not set

#
# PHY Subsystem
#
# CONFIG_GENERIC_PHY is not set
# CONFIG_PHY_CAN_TRANSCEIVER is not set

#
# PHY drivers for Broadcom platforms
#
# CONFIG_BCM_KONA_USB2_PHY is not set
# end of PHY drivers for Broadcom platforms

# CONFIG_PHY_CADENCE_DPHY is not set
# CONFIG_PHY_CADENCE_DPHY_RX is not set
# CONFIG_PHY_CADENCE_SALVO is not set
# CONFIG_PHY_PXA_28NM_HSIC is not set
# CONFIG_PHY_PXA_28NM_USB2 is not set
# end of PHY Subsystem

# CONFIG_POWERCAP is not set
# CONFIG_MCB is not set
# CONFIG_RAS is not set

#
# Android
#
# CONFIG_ANDROID_BINDER_IPC is not set
# end of Android

# CONFIG_DAX is not set
# CONFIG_NVMEM is not set

#
# HW tracing support
#
# CONFIG_STM is not set
# CONFIG_INTEL_TH is not set
# end of HW tracing support

# CONFIG_FPGA is not set
# CONFIG_FSI is not set
# CONFIG_SIOX is not set
# CONFIG_SLIMBUS is not set
# CONFIG_INTERCONNECT is not set
# CONFIG_COUNTER is not set
# CONFIG_PECI is not set
# CONFIG_HTE is not set
# end of Device Drivers

#
# File systems
#
# CONFIG_VALIDATE_FS_PARSER is not set
# CONFIG_EXT2_FS is not set
# CONFIG_EXT3_FS is not set
# CONFIG_EXT4_FS is not set
# CONFIG_REISERFS_FS is not set
# CONFIG_JFS_FS is not set
# CONFIG_XFS_FS is not set
# CONFIG_GFS2_FS is not set
# CONFIG_BTRFS_FS is not set
# CONFIG_NILFS2_FS is not set
# CONFIG_F2FS_FS is not set
CONFIG_EXPORTFS=y
# CONFIG_EXPORTFS_BLOCK_OPS is not set
CONFIG_FILE_LOCKING=y
# CONFIG_FS_ENCRYPTION is not set
# CONFIG_FS_VERITY is not set
# CONFIG_DNOTIFY is not set
# CONFIG_INOTIFY_USER is not set
# CONFIG_FANOTIFY is not set
# CONFIG_QUOTA is not set
# CONFIG_AUTOFS4_FS is not set
# CONFIG_AUTOFS_FS is not set
# CONFIG_FUSE_FS is not set
# CONFIG_OVERLAY_FS is not set

#
# Caches
#
# CONFIG_FSCACHE is not set
# end of Caches

#
# CD-ROM/DVD Filesystems
#
# CONFIG_ISO9660_FS is not set
# CONFIG_UDF_FS is not set
# end of CD-ROM/DVD Filesystems

#
# DOS/FAT/EXFAT/NT Filesystems
#
# CONFIG_MSDOS_FS is not set
# CONFIG_VFAT_FS is not set
# CONFIG_EXFAT_FS is not set
# CONFIG_NTFS_FS is not set
# CONFIG_NTFS3_FS is not set
# end of DOS/FAT/EXFAT/NT Filesystems

#
# Pseudo filesystems
#
CONFIG_PROC_FS=y
# CONFIG_PROC_KCORE is not set
CONFIG_PROC_SYSCTL=y
CONFIG_PROC_PAGE_MONITOR=y
# CONFIG_PROC_CHILDREN is not set
CONFIG_KERNFS=y
CONFIG_SYSFS=y
# CONFIG_TMPFS is not set
# CONFIG_CONFIGFS_FS is not set
# end of Pseudo filesystems

# CONFIG_MISC_FILESYSTEMS is not set
# CONFIG_NLS is not set
# CONFIG_UNICODE is not set
CONFIG_IO_WQ=y
# end of File systems

#
# Security options
#
# CONFIG_KEYS is not set
# CONFIG_SECURITY_DMESG_RESTRICT is not set
# CONFIG_SECURITY is not set
# CONFIG_SECURITYFS is not set
CONFIG_HAVE_HARDENED_USERCOPY_ALLOCATOR=y
# CONFIG_HARDENED_USERCOPY is not set
# CONFIG_FORTIFY_SOURCE is not set
# CONFIG_STATIC_USERMODEHELPER is not set
CONFIG_DEFAULT_SECURITY_DAC=y
CONFIG_LSM="landlock,lockdown,yama,loadpin,safesetid,integrity,bpf"

#
# Kernel hardening options
#

#
# Memory initialization
#
CONFIG_CC_HAS_AUTO_VAR_INIT_PATTERN=y
CONFIG_CC_HAS_AUTO_VAR_INIT_ZERO_BARE=y
CONFIG_CC_HAS_AUTO_VAR_INIT_ZERO=y
# CONFIG_INIT_STACK_NONE is not set
# CONFIG_INIT_STACK_ALL_PATTERN is not set
CONFIG_INIT_STACK_ALL_ZERO=y
# CONFIG_INIT_ON_ALLOC_DEFAULT_ON is not set
# CONFIG_INIT_ON_FREE_DEFAULT_ON is not set
CONFIG_CC_HAS_ZERO_CALL_USED_REGS=y
# CONFIG_ZERO_CALL_USED_REGS is not set
# end of Memory initialization

CONFIG_RANDSTRUCT_NONE=y
# end of Kernel hardening options
# end of Security options

# CONFIG_CRYPTO is not set

#
# Library routines
#
# CONFIG_PACKING is not set
CONFIG_BITREVERSE=y
CONFIG_GENERIC_STRNCPY_FROM_USER=y
CONFIG_GENERIC_STRNLEN_USER=y
# CONFIG_CORDIC is not set
# CONFIG_PRIME_NUMBERS is not set

#
# Crypto library routines
#
CONFIG_CRYPTO_LIB_BLAKE2S_GENERIC=y
# CONFIG_CRYPTO_LIB_CURVE25519 is not set
CONFIG_CRYPTO_LIB_POLY1305_RSIZE=1
# CONFIG_CRYPTO_LIB_POLY1305 is not set
# end of Crypto library routines

# CONFIG_CRC_CCITT is not set
# CONFIG_CRC16 is not set
# CONFIG_CRC_T10DIF is not set
# CONFIG_CRC64_ROCKSOFT is not set
# CONFIG_CRC_ITU_T is not set
CONFIG_CRC32=y
# CONFIG_CRC32_SELFTEST is not set
CONFIG_CRC32_SLICEBY8=y
# CONFIG_CRC32_SLICEBY4 is not set
# CONFIG_CRC32_SARWATE is not set
# CONFIG_CRC32_BIT is not set
# CONFIG_CRC64 is not set
# CONFIG_CRC4 is not set
# CONFIG_CRC7 is not set
# CONFIG_LIBCRC32C is not set
# CONFIG_CRC8 is not set
# CONFIG_RANDOM32_SELFTEST is not set
# CONFIG_XZ_DEC is not set
CONFIG_HAS_IOMEM=y
CONFIG_HAS_IOPORT_MAP=y
CONFIG_HAS_DMA=y
CONFIG_NEED_SG_DMA_LENGTH=y
CONFIG_DMA_DECLARE_COHERENT=y
# CONFIG_DMA_API_DEBUG is not set
CONFIG_GENERIC_ATOMIC64=y
# CONFIG_IRQ_POLL is not set
CONFIG_LIBFDT=y
CONFIG_HAVE_GENERIC_VDSO=y
CONFIG_GENERIC_GETTIMEOFDAY=y
CONFIG_GENERIC_VDSO_TIME_NS=y
CONFIG_ARCH_HAS_PMEM_API=y
CONFIG_ARCH_HAS_UACCESS_FLUSHCACHE=y
CONFIG_ARCH_STACKWALK=y
CONFIG_STACKDEPOT=y
CONFIG_SBITMAP=y
# end of Library routines

#
# Kernel hacking
#

#
# printk and dmesg options
#
# CONFIG_PRINTK_TIME is not set
# CONFIG_PRINTK_CALLER is not set
# CONFIG_STACKTRACE_BUILD_ID is not set
CONFIG_CONSOLE_LOGLEVEL_DEFAULT=7
CONFIG_CONSOLE_LOGLEVEL_QUIET=4
CONFIG_MESSAGE_LOGLEVEL_DEFAULT=4
# CONFIG_DYNAMIC_DEBUG is not set
# CONFIG_DYNAMIC_DEBUG_CORE is not set
# CONFIG_SYMBOLIC_ERRNAME is not set
CONFIG_DEBUG_BUGVERBOSE=y
# end of printk and dmesg options

CONFIG_DEBUG_KERNEL=y
CONFIG_DEBUG_MISC=y

#
# Compile-time checks and compiler options
#
CONFIG_DEBUG_INFO_NONE=y
# CONFIG_DEBUG_INFO_DWARF_TOOLCHAIN_DEFAULT is not set
# CONFIG_DEBUG_INFO_DWARF4 is not set
# CONFIG_DEBUG_INFO_DWARF5 is not set
CONFIG_FRAME_WARN=1024
# CONFIG_STRIP_ASM_SYMS is not set
# CONFIG_READABLE_ASM is not set
# CONFIG_HEADERS_INSTALL is not set
CONFIG_DEBUG_SECTION_MISMATCH=y
CONFIG_SECTION_MISMATCH_WARN_ONLY=y
# CONFIG_DEBUG_FORCE_FUNCTION_ALIGN_64B is not set
# CONFIG_VMLINUX_MAP is not set
# CONFIG_DEBUG_FORCE_WEAK_PER_CPU is not set
# end of Compile-time checks and compiler options

#
# Generic Kernel Debugging Instruments
#
# CONFIG_MAGIC_SYSRQ is not set
# CONFIG_DEBUG_FS is not set
CONFIG_HAVE_ARCH_KGDB=y
# CONFIG_KGDB is not set
CONFIG_ARCH_HAS_UBSAN_SANITIZE_ALL=y
# CONFIG_UBSAN is not set
CONFIG_HAVE_KCSAN_COMPILER=y
# end of Generic Kernel Debugging Instruments

#
# Networking Debugging
#
# end of Networking Debugging

#
# Memory Debugging
#
# CONFIG_PAGE_EXTENSION is not set
# CONFIG_DEBUG_PAGEALLOC is not set
CONFIG_SLUB_DEBUG=y
# CONFIG_SLUB_DEBUG_ON is not set
# CONFIG_PAGE_OWNER is not set
# CONFIG_PAGE_POISONING is not set
CONFIG_GENERIC_PTDUMP=y
# CONFIG_DEBUG_OBJECTS is not set
CONFIG_HAVE_DEBUG_KMEMLEAK=y
# CONFIG_DEBUG_KMEMLEAK is not set
# CONFIG_DEBUG_STACK_USAGE is not set
# CONFIG_SCHED_STACK_END_CHECK is not set
CONFIG_ARCH_HAS_DEBUG_VM_PGTABLE=y
# CONFIG_DEBUG_VM is not set
# CONFIG_DEBUG_VM_PGTABLE is not set
CONFIG_ARCH_HAS_DEBUG_VIRTUAL=y
# CONFIG_DEBUG_VIRTUAL is not set
CONFIG_DEBUG_MEMORY_INIT=y
CONFIG_HAVE_DEBUG_STACKOVERFLOW=y
# CONFIG_DEBUG_STACKOVERFLOW is not set
CONFIG_HAVE_ARCH_KASAN=y
CONFIG_HAVE_ARCH_KASAN_VMALLOC=y
CONFIG_CC_HAS_KASAN_GENERIC=y
CONFIG_CC_HAS_WORKING_NOSANITIZE_ADDRESS=y
# CONFIG_KASAN is not set
CONFIG_HAVE_ARCH_KFENCE=y
# CONFIG_KFENCE is not set
# end of Memory Debugging

# CONFIG_DEBUG_SHIRQ is not set

#
# Debug Oops, Lockups and Hangs
#
# CONFIG_PANIC_ON_OOPS is not set
CONFIG_PANIC_ON_OOPS_VALUE=0
# CONFIG_SOFTLOCKUP_DETECTOR is not set
# CONFIG_DETECT_HUNG_TASK is not set
# CONFIG_WQ_WATCHDOG is not set
# end of Debug Oops, Lockups and Hangs

#
# Scheduler Debugging
#
CONFIG_SCHED_DEBUG=y
# CONFIG_SCHEDSTATS is not set
# end of Scheduler Debugging

# CONFIG_DEBUG_TIMEKEEPING is not set

#
# Lock Debugging (spinlocks, mutexes, etc...)
#
CONFIG_LOCK_DEBUGGING_SUPPORT=y
# CONFIG_PROVE_LOCKING is not set
# CONFIG_LOCK_STAT is not set
# CONFIG_DEBUG_RT_MUTEXES is not set
# CONFIG_DEBUG_SPINLOCK is not set
# CONFIG_DEBUG_MUTEXES is not set
# CONFIG_DEBUG_WW_MUTEX_SLOWPATH is not set
# CONFIG_DEBUG_RWSEMS is not set
# CONFIG_DEBUG_LOCK_ALLOC is not set
# CONFIG_DEBUG_ATOMIC_SLEEP is not set
# CONFIG_DEBUG_LOCKING_API_SELFTESTS is not set
# CONFIG_LOCK_TORTURE_TEST is not set
# CONFIG_WW_MUTEX_SELFTEST is not set
# CONFIG_SCF_TORTURE_TEST is not set
# end of Lock Debugging (spinlocks, mutexes, etc...)

# CONFIG_DEBUG_IRQFLAGS is not set
CONFIG_STACKTRACE=y
# CONFIG_WARN_ALL_UNSEEDED_RANDOM is not set
# CONFIG_DEBUG_KOBJECT is not set

#
# Debug kernel data structures
#
# CONFIG_DEBUG_LIST is not set
# CONFIG_DEBUG_PLIST is not set
# CONFIG_DEBUG_SG is not set
# CONFIG_DEBUG_NOTIFIERS is not set
# CONFIG_BUG_ON_DATA_CORRUPTION is not set
# end of Debug kernel data structures

# CONFIG_DEBUG_CREDENTIALS is not set

#
# RCU Debugging
#
# CONFIG_RCU_SCALE_TEST is not set
# CONFIG_RCU_TORTURE_TEST is not set
# CONFIG_RCU_REF_SCALE_TEST is not set
# CONFIG_RCU_TRACE is not set
# CONFIG_RCU_EQS_DEBUG is not set
# end of RCU Debugging

# CONFIG_DEBUG_WQ_FORCE_RR_CPU is not set
# CONFIG_LATENCYTOP is not set
CONFIG_HAVE_FUNCTION_TRACER=y
CONFIG_HAVE_FUNCTION_GRAPH_TRACER=y
CONFIG_HAVE_DYNAMIC_FTRACE=y
CONFIG_HAVE_DYNAMIC_FTRACE_WITH_REGS=y
CONFIG_HAVE_DYNAMIC_FTRACE_WITH_ARGS=y
CONFIG_HAVE_FTRACE_MCOUNT_RECORD=y
CONFIG_HAVE_SYSCALL_TRACEPOINTS=y
CONFIG_HAVE_C_RECORDMCOUNT=y
CONFIG_TRACING_SUPPORT=y
# CONFIG_FTRACE is not set
# CONFIG_SAMPLES is not set
CONFIG_ARCH_HAS_DEVMEM_IS_ALLOWED=y

#
# powerpc Debugging
#
# CONFIG_PPC_DISABLE_WERROR is not set
CONFIG_PPC_WERROR=y
CONFIG_PRINT_STACK_DEPTH=64
# CONFIG_CODE_PATCHING_SELFTEST is not set
# CONFIG_FTR_FIXUP_SELFTEST is not set
# CONFIG_MSI_BITMAP_SELFTEST is not set
# CONFIG_XMON is not set
# CONFIG_BDI_SWITCH is not set
# CONFIG_BOOTX_TEXT is not set
# CONFIG_PPC_EARLY_DEBUG is not set
# end of powerpc Debugging

#
# Kernel Testing and Coverage
#
# CONFIG_KUNIT is not set
# CONFIG_NOTIFIER_ERROR_INJECTION is not set
# CONFIG_FAULT_INJECTION is not set
CONFIG_ARCH_HAS_KCOV=y
CONFIG_CC_HAS_SANCOV_TRACE_PC=y
# CONFIG_KCOV is not set
# CONFIG_RUNTIME_TESTING_MENU is not set
CONFIG_ARCH_USE_MEMTEST=y
# CONFIG_MEMTEST is not set
# end of Kernel Testing and Coverage

#
# Rust hacking
#
# end of Rust hacking
# end of Kernel hacking

* Re: [PATCH bpf-next v2 23/25] libbpf: Add support for private BSS map section
  2022-10-13  6:23 ` [PATCH bpf-next v2 23/25] libbpf: Add support for private BSS map section Kumar Kartikeya Dwivedi
@ 2022-10-18  4:03   ` Andrii Nakryiko
  0 siblings, 0 replies; 52+ messages in thread
From: Andrii Nakryiko @ 2022-10-18  4:03 UTC (permalink / raw)
  To: Kumar Kartikeya Dwivedi
  Cc: bpf, Dave Marchevsky, Alexei Starovoitov, Andrii Nakryiko,
	Daniel Borkmann, Dave Marchevsky, Delyan Kratunov

On Wed, Oct 12, 2022 at 11:24 PM Kumar Kartikeya Dwivedi
<memxor@gmail.com> wrote:
>
> From: Dave Marchevsky <davemarchevsky@fb.com>
>
> Currently libbpf does not allow declaration of a struct bpf_spin_lock in
> global scope. Attempting to do so results in "failed to re-mmap" error,
> as .bss arraymap containing spinlock is not allowed to be mmap'd.
>
> This patch adds support for a .bss.private section. The maps contained
> in this section will not be mmaped into userspace by libbpf, nor will
> they be exposed via bpftool-generated skeleton.
>
> The intent here is to allow a more natural programming pattern for
> global-scope spinlocks, which will be used by the rbtree locking mechanism in
> further patches in this series.
>
> Notes:
>
>   * Initially I called the section .bss.no_mmap, but the broader
>     'private' term better indicates that skeleton shouldn't expose these
>     maps at all, IMO.
>
>   * bpftool/gen.c's is_internal_mmapable_map function checks whether the
>     map flags have BPF_F_MMAPABLE, so no bpftool changes were necessary
>     to remove .bss.private maps from skeleton
>
> Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
> Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
> ---

Please see [0] for what I think is a better way forward specifically
for the libbpf-side part.

  [0] https://patchwork.kernel.org/project/netdevbpf/list/?series=686066&state=*

>  tools/lib/bpf/libbpf.c | 65 ++++++++++++++++++++++++++++--------------
>  1 file changed, 44 insertions(+), 21 deletions(-)
>

[...]

* Re: [PATCH bpf-next v2 06/25] bpf: Refactor kptr_off_tab into fields_tab
  2022-10-13  6:22 ` [PATCH bpf-next v2 06/25] bpf: Refactor kptr_off_tab into fields_tab Kumar Kartikeya Dwivedi
@ 2022-10-19  1:35   ` Alexei Starovoitov
  2022-10-19  5:42     ` Kumar Kartikeya Dwivedi
  0 siblings, 1 reply; 52+ messages in thread
From: Alexei Starovoitov @ 2022-10-19  1:35 UTC (permalink / raw)
  To: Kumar Kartikeya Dwivedi
  Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Dave Marchevsky, Delyan Kratunov

On Thu, Oct 13, 2022 at 11:52:44AM +0530, Kumar Kartikeya Dwivedi wrote:
> To prepare the BPF verifier to handle special fields in both map values
> and program allocated types coming from program BTF, we need to refactor
> the kptr_off_tab handling code into something more generic and reusable
> across both cases to avoid code duplication.
> 
> Later patches also require passing this data to helpers at runtime, so
> that they can work on user defined types, initialize them, destruct
> them, etc.
> 
> The main observation is that both map values and such allocated types
> point to a type in program BTF, hence they can be handled similarly. We
> can prepare a field metadata table for both cases and store them in
> struct bpf_map or struct btf depending on the use case.
> 
> Hence, refactor the code into generic btf_type_fields and btf_field
> member structs. The btf_type_fields represents the fields of a specific
> btf_type in user BTF. The cnt indicates the number of special fields we
> successfully recognized, and field_mask is a bitmask of fields that were
> found, to enable quick determination of availability of a certain field.
> 
> Subsequently, refactor the rest of the code to work with these generic
> types, remove assumptions about kptr and kptr_off_tab, rename variables
> to more meaningful names, etc.
> 
> Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
> ---
>  include/linux/bpf.h     | 103 +++++++++++++-------
>  include/linux/btf.h     |   4 +-
>  kernel/bpf/arraymap.c   |  13 ++-
>  kernel/bpf/btf.c        |  64 ++++++-------
>  kernel/bpf/hashtab.c    |  14 ++-
>  kernel/bpf/map_in_map.c |  13 ++-
>  kernel/bpf/syscall.c    | 203 +++++++++++++++++++++++-----------------
>  kernel/bpf/verifier.c   |  96 ++++++++++---------
>  8 files changed, 289 insertions(+), 221 deletions(-)
> 
> diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> index 9e7d46d16032..25e77a172d7c 100644
> --- a/include/linux/bpf.h
> +++ b/include/linux/bpf.h
> @@ -164,35 +164,41 @@ struct bpf_map_ops {
>  };
>  
>  enum {
> -	/* Support at most 8 pointers in a BPF map value */
> -	BPF_MAP_VALUE_OFF_MAX = 8,
> -	BPF_MAP_OFF_ARR_MAX   = BPF_MAP_VALUE_OFF_MAX +
> +	/* Support at most 8 pointers in a BTF type */
> +	BTF_FIELDS_MAX	      = 8,
> +	BPF_MAP_OFF_ARR_MAX   = BTF_FIELDS_MAX +
>  				1 + /* for bpf_spin_lock */
>  				1,  /* for bpf_timer */
>  };
>  
> -enum bpf_kptr_type {
> -	BPF_KPTR_UNREF,
> -	BPF_KPTR_REF,
> +enum btf_field_type {
> +	BPF_KPTR_UNREF = (1 << 2),
> +	BPF_KPTR_REF   = (1 << 3),
> +	BPF_KPTR       = BPF_KPTR_UNREF | BPF_KPTR_REF,
>  };
>  
> -struct bpf_map_value_off_desc {
> +struct btf_field_kptr {
> +	struct btf *btf;
> +	struct module *module;
> +	btf_dtor_kfunc_t dtor;
> +	u32 btf_id;
> +};
> +
> +struct btf_field {
>  	u32 offset;
> -	enum bpf_kptr_type type;
> -	struct {
> -		struct btf *btf;
> -		struct module *module;
> -		btf_dtor_kfunc_t dtor;
> -		u32 btf_id;
> -	} kptr;
> +	enum btf_field_type type;
> +	union {
> +		struct btf_field_kptr kptr;
> +	};
>  };
>  
> -struct bpf_map_value_off {
> -	u32 nr_off;
> -	struct bpf_map_value_off_desc off[];
> +struct btf_type_fields {

How about btf_record instead?
Then btf_type_fields_has_field() would become btf_record_has_field().

> +	u32 cnt;
> +	u32 field_mask;
> +	struct btf_field fields[];
>  };
>  
> -struct bpf_map_off_arr {
> +struct btf_type_fields_off {

struct btf_field_offs ?

>  	u32 cnt;
>  	u32 field_off[BPF_MAP_OFF_ARR_MAX];
>  	u8 field_sz[BPF_MAP_OFF_ARR_MAX];
> @@ -214,7 +220,7 @@ struct bpf_map {
>  	u64 map_extra; /* any per-map-type extra fields */
>  	u32 map_flags;
>  	int spin_lock_off; /* >=0 valid offset, <0 error */
> -	struct bpf_map_value_off *kptr_off_tab;
> +	struct btf_type_fields *fields_tab;

struct btf_record *record; ?
Is the '_tab' suffix supposed to mean 'fieldS table'?
Just 'record' seems clear enough.
Or
struct btf_record *btf_record; ?
if just 'record' is ambiguous.

>  	int timer_off; /* >=0 valid offset, <0 error */
>  	u32 id;
>  	int numa_node;
> @@ -226,7 +232,7 @@ struct bpf_map {
>  	struct obj_cgroup *objcg;
>  #endif
>  	char name[BPF_OBJ_NAME_LEN];
> -	struct bpf_map_off_arr *off_arr;
> +	struct btf_type_fields_off *off_arr;

'off_arr' should probably be renamed as well.
How about 'struct btf_field_offs *field_offs;' ?
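
To illustrate the naming suggested above, here is a minimal userspace sketch
(not kernel code) of a btf_record and the field_mask test that a
btf_record_has_field() helper would perform. The kptr bit values come from the
quoted patch; the BPF_SPIN_LOCK and BPF_TIMER values, the record_new() helper,
and the use of calloc() are assumptions made only for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Userspace stand-ins for the kernel types discussed in the review. */
enum btf_field_type {
	BPF_SPIN_LOCK  = (1 << 0),	/* assumed value, not in this patch */
	BPF_TIMER      = (1 << 1),	/* assumed value, not in this patch */
	BPF_KPTR_UNREF = (1 << 2),
	BPF_KPTR_REF   = (1 << 3),
	BPF_KPTR       = BPF_KPTR_UNREF | BPF_KPTR_REF,
};

struct btf_field {
	uint32_t offset;
	enum btf_field_type type;
};

/* Proposed name for what the patch calls struct btf_type_fields. */
struct btf_record {
	uint32_t cnt;
	uint32_t field_mask;	/* OR of the type bits of all fields */
	struct btf_field fields[];
};

/* Hypothetical helper: copy 'cnt' fields and accumulate their type bits. */
static struct btf_record *record_new(const struct btf_field *fields, uint32_t cnt)
{
	struct btf_record *rec;
	uint32_t i;

	assert(fields || cnt == 0);
	rec = calloc(1, offsetof(struct btf_record, fields) +
			cnt * sizeof(rec->fields[0]));
	if (!rec)
		return NULL;
	rec->cnt = cnt;
	for (i = 0; i < cnt; i++) {
		rec->fields[i] = fields[i];
		rec->field_mask |= fields[i].type;
	}
	return rec;
}

/* True if the record contains at least one field whose type is in 'type'. */
static bool btf_record_has_field(const struct btf_record *rec,
				 enum btf_field_type type)
{
	if (!rec)
		return false;
	return (rec->field_mask & type) != 0;
}
```

The mask test answers "does this type have a spin lock, timer or kptr" without
walking fields[], which is the reason for keeping field_mask next to cnt.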

>  static inline void check_and_init_map_value(struct bpf_map *map, void *dst)
>  {
>  	if (unlikely(map_value_has_spin_lock(map)))
>  		memset(dst + map->spin_lock_off, 0, sizeof(struct bpf_spin_lock));
>  	if (unlikely(map_value_has_timer(map)))
>  		memset(dst + map->timer_off, 0, sizeof(struct bpf_timer));
> -	if (unlikely(map_value_has_kptrs(map))) {
> -		struct bpf_map_value_off *tab = map->kptr_off_tab;
> +	if (!IS_ERR_OR_NULL(map->fields_tab)) {
> +		struct btf_field *fields = map->fields_tab->fields;

will become
struct btf_field *fields = map->record->fields;

> +		u32 cnt = map->fields_tab->cnt;
>  		int i;
>  
> -		for (i = 0; i < tab->nr_off; i++)
> -			*(u64 *)(dst + tab->off[i].offset) = 0;
> +		for (i = 0; i < cnt; i++)
> +			memset(dst + fields[i].offset, 0, btf_field_type_size(fields[i].type));
>  	}
>  }
>  
> @@ -1691,11 +1724,13 @@ void bpf_prog_put(struct bpf_prog *prog);
>  void bpf_prog_free_id(struct bpf_prog *prog, bool do_idr_lock);
>  void bpf_map_free_id(struct bpf_map *map, bool do_idr_lock);
>  
> -struct bpf_map_value_off_desc *bpf_map_kptr_off_contains(struct bpf_map *map, u32 offset);
> -void bpf_map_free_kptr_off_tab(struct bpf_map *map);
> -struct bpf_map_value_off *bpf_map_copy_kptr_off_tab(const struct bpf_map *map);
> -bool bpf_map_equal_kptr_off_tab(const struct bpf_map *map_a, const struct bpf_map *map_b);
> -void bpf_map_free_kptrs(struct bpf_map *map, void *map_value);
> +struct btf_field *btf_type_fields_find(const struct btf_type_fields *tab,
> +				       u32 offset, enum btf_field_type type);
> +void btf_type_fields_free(struct btf_type_fields *tab);

void btf_record_free(struct btf_record *r) ?

> +void bpf_map_free_fields_tab(struct bpf_map *map);

will become bpf_map_free_btf_record() ?

> +struct btf_type_fields *btf_type_fields_dup(const struct btf_type_fields *tab);
> +bool btf_type_fields_equal(const struct btf_type_fields *tab_a, const struct btf_type_fields *tab_b);
> +void bpf_obj_free_fields(const struct btf_type_fields *tab, void *obj);
>  
>  struct bpf_map *bpf_map_get(u32 ufd);
>  struct bpf_map *bpf_map_get_with_uref(u32 ufd);
> diff --git a/include/linux/btf.h b/include/linux/btf.h
> index 86aad9b2ce02..0d47cbb11a59 100644
> --- a/include/linux/btf.h
> +++ b/include/linux/btf.h
> @@ -163,8 +163,8 @@ bool btf_member_is_reg_int(const struct btf *btf, const struct btf_type *s,
>  			   u32 expected_offset, u32 expected_size);
>  int btf_find_spin_lock(const struct btf *btf, const struct btf_type *t);
>  int btf_find_timer(const struct btf *btf, const struct btf_type *t);
> -struct bpf_map_value_off *btf_parse_kptrs(const struct btf *btf,
> -					  const struct btf_type *t);
> +struct btf_type_fields *btf_parse_fields(const struct btf *btf,
> +					 const struct btf_type *t);
>  bool btf_type_is_void(const struct btf_type *t);
>  s32 btf_find_by_name_kind(const struct btf *btf, const char *name, u8 kind);
>  const struct btf_type *btf_type_skip_modifiers(const struct btf *btf,
> diff --git a/kernel/bpf/arraymap.c b/kernel/bpf/arraymap.c
> index 832b2659e96e..defe5c00049a 100644
> --- a/kernel/bpf/arraymap.c
> +++ b/kernel/bpf/arraymap.c
> @@ -310,8 +310,7 @@ static void check_and_free_fields(struct bpf_array *arr, void *val)
>  {
>  	if (map_value_has_timer(&arr->map))
>  		bpf_timer_cancel_and_free(val + arr->map.timer_off);
> -	if (map_value_has_kptrs(&arr->map))
> -		bpf_map_free_kptrs(&arr->map, val);
> +	bpf_obj_free_fields(arr->map.fields_tab, val);
>  }
>  
>  /* Called from syscall or from eBPF program */
> @@ -409,7 +408,7 @@ static void array_map_free_timers(struct bpf_map *map)
>  	struct bpf_array *array = container_of(map, struct bpf_array, map);
>  	int i;
>  
> -	/* We don't reset or free kptr on uref dropping to zero. */
> +	/* We don't reset or free fields other than timer on uref dropping to zero. */
>  	if (!map_value_has_timer(map))
>  		return;
>  
> @@ -423,22 +422,22 @@ static void array_map_free(struct bpf_map *map)
>  	struct bpf_array *array = container_of(map, struct bpf_array, map);
>  	int i;
>  
> -	if (map_value_has_kptrs(map)) {
> +	if (!IS_ERR_OR_NULL(map->fields_tab)) {
>  		if (array->map.map_type == BPF_MAP_TYPE_PERCPU_ARRAY) {
>  			for (i = 0; i < array->map.max_entries; i++) {
>  				void __percpu *pptr = array->pptrs[i & array->index_mask];
>  				int cpu;
>  
>  				for_each_possible_cpu(cpu) {
> -					bpf_map_free_kptrs(map, per_cpu_ptr(pptr, cpu));
> +					bpf_obj_free_fields(map->fields_tab, per_cpu_ptr(pptr, cpu));
>  					cond_resched();
>  				}
>  			}
>  		} else {
>  			for (i = 0; i < array->map.max_entries; i++)
> -				bpf_map_free_kptrs(map, array_map_elem_ptr(array, i));
> +				bpf_obj_free_fields(map->fields_tab, array_map_elem_ptr(array, i));
>  		}
> -		bpf_map_free_kptr_off_tab(map);
> +		bpf_map_free_fields_tab(map);
>  	}
>  
>  	if (array->map.map_type == BPF_MAP_TYPE_PERCPU_ARRAY)
> diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c
> index ad301e78f7ee..c8d267098b87 100644
> --- a/kernel/bpf/btf.c
> +++ b/kernel/bpf/btf.c
> @@ -3191,7 +3191,7 @@ static void btf_struct_log(struct btf_verifier_env *env,
>  	btf_verifier_log(env, "size=%u vlen=%u", t->size, btf_type_vlen(t));
>  }
>  
> -enum btf_field_type {
> +enum btf_field_info_type {
>  	BTF_FIELD_SPIN_LOCK,
>  	BTF_FIELD_TIMER,
>  	BTF_FIELD_KPTR,
> @@ -3203,9 +3203,9 @@ enum {
>  };
>  
>  struct btf_field_info {
> -	u32 type_id;
> +	enum btf_field_type type;
>  	u32 off;
> -	enum bpf_kptr_type type;
> +	u32 type_id;
>  };
>  
>  static int btf_find_struct(const struct btf *btf, const struct btf_type *t,
> @@ -3222,7 +3222,7 @@ static int btf_find_struct(const struct btf *btf, const struct btf_type *t,
>  static int btf_find_kptr(const struct btf *btf, const struct btf_type *t,
>  			 u32 off, int sz, struct btf_field_info *info)
>  {
> -	enum bpf_kptr_type type;
> +	enum btf_field_type type;
>  	u32 res_id;
>  
>  	/* Permit modifiers on the pointer itself */
> @@ -3259,7 +3259,7 @@ static int btf_find_kptr(const struct btf *btf, const struct btf_type *t,
>  
>  static int btf_find_struct_field(const struct btf *btf, const struct btf_type *t,
>  				 const char *name, int sz, int align,
> -				 enum btf_field_type field_type,
> +				 enum btf_field_info_type field_type,
>  				 struct btf_field_info *info, int info_cnt)
>  {
>  	const struct btf_member *member;
> @@ -3311,7 +3311,7 @@ static int btf_find_struct_field(const struct btf *btf, const struct btf_type *t
>  
>  static int btf_find_datasec_var(const struct btf *btf, const struct btf_type *t,
>  				const char *name, int sz, int align,
> -				enum btf_field_type field_type,
> +				enum btf_field_info_type field_type,
>  				struct btf_field_info *info, int info_cnt)
>  {
>  	const struct btf_var_secinfo *vsi;
> @@ -3360,7 +3360,7 @@ static int btf_find_datasec_var(const struct btf *btf, const struct btf_type *t,
>  }
>  
>  static int btf_find_field(const struct btf *btf, const struct btf_type *t,
> -			  enum btf_field_type field_type,
> +			  enum btf_field_info_type field_type,
>  			  struct btf_field_info *info, int info_cnt)
>  {
>  	const char *name;
> @@ -3423,14 +3423,14 @@ int btf_find_timer(const struct btf *btf, const struct btf_type *t)
>  	return info.off;
>  }
>  
> -struct bpf_map_value_off *btf_parse_kptrs(const struct btf *btf,
> -					  const struct btf_type *t)
> +struct btf_type_fields *btf_parse_fields(const struct btf *btf,
> +					 const struct btf_type *t)
>  {
> -	struct btf_field_info info_arr[BPF_MAP_VALUE_OFF_MAX];
> -	struct bpf_map_value_off *tab;
> +	struct btf_field_info info_arr[BTF_FIELDS_MAX];
>  	struct btf *kernel_btf = NULL;
> +	struct btf_type_fields *tab;

struct btf_record *r; ?

>  	struct module *mod = NULL;
> -	int ret, i, nr_off;
> +	int ret, i, cnt;
>  
>  	ret = btf_find_field(btf, t, BTF_FIELD_KPTR, info_arr, ARRAY_SIZE(info_arr));
>  	if (ret < 0)
> @@ -3438,12 +3438,12 @@ struct bpf_map_value_off *btf_parse_kptrs(const struct btf *btf,
>  	if (!ret)
>  		return NULL;
>  
> -	nr_off = ret;
> -	tab = kzalloc(offsetof(struct bpf_map_value_off, off[nr_off]), GFP_KERNEL | __GFP_NOWARN);
> +	cnt = ret;
> +	tab = kzalloc(offsetof(struct btf_type_fields, fields[cnt]), GFP_KERNEL | __GFP_NOWARN);
>  	if (!tab)
>  		return ERR_PTR(-ENOMEM);
> -
> -	for (i = 0; i < nr_off; i++) {
> +	tab->cnt = 0;
> +	for (i = 0; i < cnt; i++) {
>  		const struct btf_type *t;
>  		s32 id;
>  
> @@ -3500,28 +3500,24 @@ struct bpf_map_value_off *btf_parse_kptrs(const struct btf *btf,
>  				ret = -EINVAL;
>  				goto end_mod;
>  			}
> -			tab->off[i].kptr.dtor = (void *)addr;
> +			tab->fields[i].kptr.dtor = (void *)addr;
>  		}
>  
> -		tab->off[i].offset = info_arr[i].off;
> -		tab->off[i].type = info_arr[i].type;
> -		tab->off[i].kptr.btf_id = id;
> -		tab->off[i].kptr.btf = kernel_btf;
> -		tab->off[i].kptr.module = mod;
> +		tab->fields[i].offset = info_arr[i].off;
> +		tab->fields[i].type = info_arr[i].type;
> +		tab->fields[i].kptr.btf_id = id;
> +		tab->fields[i].kptr.btf = kernel_btf;
> +		tab->fields[i].kptr.module = mod;
> +		tab->cnt++;
>  	}
> -	tab->nr_off = nr_off;
> +	tab->cnt = cnt;
>  	return tab;
>  end_mod:
>  	module_put(mod);
>  end_btf:
>  	btf_put(kernel_btf);
>  end:
> -	while (i--) {
> -		btf_put(tab->off[i].kptr.btf);
> -		if (tab->off[i].kptr.module)
> -			module_put(tab->off[i].kptr.module);
> -	}
> -	kfree(tab);
> +	btf_type_fields_free(tab);
>  	return ERR_PTR(ret);
>  }
>  
> @@ -6365,7 +6361,7 @@ static int btf_check_func_arg_match(struct bpf_verifier_env *env,
>  
>  		/* kptr_get is only true for kfunc */
>  		if (i == 0 && kptr_get) {
> -			struct bpf_map_value_off_desc *off_desc;
> +			struct btf_field *kptr_field;
>  
>  			if (reg->type != PTR_TO_MAP_VALUE) {
>  				bpf_log(log, "arg#0 expected pointer to map value\n");
> @@ -6381,8 +6377,8 @@ static int btf_check_func_arg_match(struct bpf_verifier_env *env,
>  				return -EINVAL;
>  			}
>  
> -			off_desc = bpf_map_kptr_off_contains(reg->map_ptr, reg->off + reg->var_off.value);
> -			if (!off_desc || off_desc->type != BPF_KPTR_REF) {
> +			kptr_field = btf_type_fields_find(reg->map_ptr->fields_tab, reg->off + reg->var_off.value, BPF_KPTR);
> +			if (!kptr_field || kptr_field->type != BPF_KPTR_REF) {
>  				bpf_log(log, "arg#0 no referenced kptr at map value offset=%llu\n",
>  					reg->off + reg->var_off.value);
>  				return -EINVAL;
> @@ -6401,8 +6397,8 @@ static int btf_check_func_arg_match(struct bpf_verifier_env *env,
>  					func_name, i, btf_type_str(ref_t), ref_tname);
>  				return -EINVAL;
>  			}
> -			if (!btf_struct_ids_match(log, btf, ref_id, 0, off_desc->kptr.btf,
> -						  off_desc->kptr.btf_id, true)) {
> +			if (!btf_struct_ids_match(log, btf, ref_id, 0, kptr_field->kptr.btf,
> +						  kptr_field->kptr.btf_id, true)) {
>  				bpf_log(log, "kernel function %s args#%d expected pointer to %s %s\n",
>  					func_name, i, btf_type_str(ref_t), ref_tname);
>  				return -EINVAL;
> diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c
> index ed3f8a53603b..59cdbea587c5 100644
> --- a/kernel/bpf/hashtab.c
> +++ b/kernel/bpf/hashtab.c
> @@ -238,21 +238,20 @@ static void htab_free_prealloced_timers(struct bpf_htab *htab)
>  	}
>  }
>  
> -static void htab_free_prealloced_kptrs(struct bpf_htab *htab)
> +static void htab_free_prealloced_fields(struct bpf_htab *htab)
>  {
>  	u32 num_entries = htab->map.max_entries;
>  	int i;
>  
> -	if (!map_value_has_kptrs(&htab->map))
> +	if (IS_ERR_OR_NULL(htab->map.fields_tab))
>  		return;
>  	if (htab_has_extra_elems(htab))
>  		num_entries += num_possible_cpus();
> -
>  	for (i = 0; i < num_entries; i++) {
>  		struct htab_elem *elem;
>  
>  		elem = get_htab_elem(htab, i);
> -		bpf_map_free_kptrs(&htab->map, elem->key + round_up(htab->map.key_size, 8));
> +		bpf_obj_free_fields(htab->map.fields_tab, elem->key + round_up(htab->map.key_size, 8));
>  		cond_resched();
>  	}
>  }
> @@ -766,8 +765,7 @@ static void check_and_free_fields(struct bpf_htab *htab,
>  
>  	if (map_value_has_timer(&htab->map))
>  		bpf_timer_cancel_and_free(map_value + htab->map.timer_off);
> -	if (map_value_has_kptrs(&htab->map))
> -		bpf_map_free_kptrs(&htab->map, map_value);
> +	bpf_obj_free_fields(htab->map.fields_tab, map_value);
>  }
>  
>  /* It is called from the bpf_lru_list when the LRU needs to delete
> @@ -1517,11 +1515,11 @@ static void htab_map_free(struct bpf_map *map)
>  	if (!htab_is_prealloc(htab)) {
>  		delete_all_elements(htab);
>  	} else {
> -		htab_free_prealloced_kptrs(htab);
> +		htab_free_prealloced_fields(htab);
>  		prealloc_destroy(htab);
>  	}
>  
> -	bpf_map_free_kptr_off_tab(map);
> +	bpf_map_free_fields_tab(map);
>  	free_percpu(htab->extra_elems);
>  	bpf_map_area_free(htab->buckets);
>  	bpf_mem_alloc_destroy(&htab->pcpu_ma);
> diff --git a/kernel/bpf/map_in_map.c b/kernel/bpf/map_in_map.c
> index 135205d0d560..2bff5f3a5efc 100644
> --- a/kernel/bpf/map_in_map.c
> +++ b/kernel/bpf/map_in_map.c
> @@ -52,7 +52,14 @@ struct bpf_map *bpf_map_meta_alloc(int inner_map_ufd)
>  	inner_map_meta->max_entries = inner_map->max_entries;
>  	inner_map_meta->spin_lock_off = inner_map->spin_lock_off;
>  	inner_map_meta->timer_off = inner_map->timer_off;
> -	inner_map_meta->kptr_off_tab = bpf_map_copy_kptr_off_tab(inner_map);
> +	inner_map_meta->fields_tab = btf_type_fields_dup(inner_map->fields_tab);
> +	if (IS_ERR(inner_map_meta->fields_tab)) {
> +		/* btf_type_fields returns NULL or valid pointer in case of
> +		 * invalid/empty/valid, but ERR_PTR in case of errors.
> +		 */
> +		fdput(f);
> +		return ERR_CAST(inner_map_meta->fields_tab);
> +	}
>  	if (inner_map->btf) {
>  		btf_get(inner_map->btf);
>  		inner_map_meta->btf = inner_map->btf;
> @@ -72,7 +79,7 @@ struct bpf_map *bpf_map_meta_alloc(int inner_map_ufd)
>  
>  void bpf_map_meta_free(struct bpf_map *map_meta)
>  {
> -	bpf_map_free_kptr_off_tab(map_meta);
> +	bpf_map_free_fields_tab(map_meta);
>  	btf_put(map_meta->btf);
>  	kfree(map_meta);
>  }
> @@ -86,7 +93,7 @@ bool bpf_map_meta_equal(const struct bpf_map *meta0,
>  		meta0->value_size == meta1->value_size &&
>  		meta0->timer_off == meta1->timer_off &&
>  		meta0->map_flags == meta1->map_flags &&
> -		bpf_map_equal_kptr_off_tab(meta0, meta1);
> +		btf_type_fields_equal(meta0->fields_tab, meta1->fields_tab);
>  }
>  
>  void *bpf_map_fd_get_ptr(struct bpf_map *map,
> diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
> index 7b373a5e861f..83e7a290ad06 100644
> --- a/kernel/bpf/syscall.c
> +++ b/kernel/bpf/syscall.c
> @@ -495,114 +495,134 @@ static void bpf_map_release_memcg(struct bpf_map *map)
>  }
>  #endif
>  
> -static int bpf_map_kptr_off_cmp(const void *a, const void *b)
> +static int btf_field_cmp(const void *a, const void *b)
>  {
> -	const struct bpf_map_value_off_desc *off_desc1 = a, *off_desc2 = b;
> +	const struct btf_field *f1 = a, *f2 = b;
>  
> -	if (off_desc1->offset < off_desc2->offset)
> +	if (f1->offset < f2->offset)
>  		return -1;
> -	else if (off_desc1->offset > off_desc2->offset)
> +	else if (f1->offset > f2->offset)
>  		return 1;
>  	return 0;
>  }
>  
> -struct bpf_map_value_off_desc *bpf_map_kptr_off_contains(struct bpf_map *map, u32 offset)
> +struct btf_field *btf_type_fields_find(const struct btf_type_fields *tab, u32 offset,
> +				       enum btf_field_type type)
>  {
> -	/* Since members are iterated in btf_find_field in increasing order,
> -	 * offsets appended to kptr_off_tab are in increasing order, so we can
> -	 * do bsearch to find exact match.
> -	 */
> -	struct bpf_map_value_off *tab;
> +	struct btf_field *field;
>  
> -	if (!map_value_has_kptrs(map))
> +	if (IS_ERR_OR_NULL(tab) || !(tab->field_mask & type))
> +		return NULL;
> +	field = bsearch(&offset, tab->fields, tab->cnt, sizeof(tab->fields[0]), btf_field_cmp);
> +	if (!field || !(field->type & type))
>  		return NULL;
> -	tab = map->kptr_off_tab;
> -	return bsearch(&offset, tab->off, tab->nr_off, sizeof(tab->off[0]), bpf_map_kptr_off_cmp);
> +	return field;
>  }
>  
> -void bpf_map_free_kptr_off_tab(struct bpf_map *map)
> +void btf_type_fields_free(struct btf_type_fields *tab)

void btf_record_free(struct btf_record *r)

>  {
> -	struct bpf_map_value_off *tab = map->kptr_off_tab;
>  	int i;
>  
> -	if (!map_value_has_kptrs(map))
> +	if (IS_ERR_OR_NULL(tab))
>  		return;
> -	for (i = 0; i < tab->nr_off; i++) {
> -		if (tab->off[i].kptr.module)
> -			module_put(tab->off[i].kptr.module);
> -		btf_put(tab->off[i].kptr.btf);
> +	for (i = 0; i < tab->cnt; i++) {
> +		switch (tab->fields[i].type) {
> +		case BPF_KPTR_UNREF:
> +		case BPF_KPTR_REF:
> +			if (tab->fields[i].kptr.module)
> +				module_put(tab->fields[i].kptr.module);
> +			btf_put(tab->fields[i].kptr.btf);
> +			break;
> +		default:
> +			WARN_ON_ONCE(1);
> +			continue;
> +		}
>  	}
>  	kfree(tab);
> -	map->kptr_off_tab = NULL;
>  }
>  
> -struct bpf_map_value_off *bpf_map_copy_kptr_off_tab(const struct bpf_map *map)
> +void bpf_map_free_fields_tab(struct bpf_map *map)

void bpf_map_free_btf_record(struct bpf_map *map) ?

> +{
> +	btf_type_fields_free(map->fields_tab);
> +	map->fields_tab = NULL;
> +}
> +
> +struct btf_type_fields *btf_type_fields_dup(const struct btf_type_fields *tab)
>  {
> -	struct bpf_map_value_off *tab = map->kptr_off_tab, *new_tab;
> -	int size, i;
> +	struct btf_type_fields *new_tab;
> +	const struct btf_field *fields;
> +	int ret, size, i;
>  
> -	if (!map_value_has_kptrs(map))
> -		return ERR_PTR(-ENOENT);
> -	size = offsetof(struct bpf_map_value_off, off[tab->nr_off]);
> +	if (IS_ERR_OR_NULL(tab))
> +		return NULL;
> +	size = offsetof(struct btf_type_fields, fields[tab->cnt]);
>  	new_tab = kmemdup(tab, size, GFP_KERNEL | __GFP_NOWARN);
>  	if (!new_tab)
>  		return ERR_PTR(-ENOMEM);
> -	/* Do a deep copy of the kptr_off_tab */
> -	for (i = 0; i < tab->nr_off; i++) {
> -		btf_get(tab->off[i].kptr.btf);
> -		if (tab->off[i].kptr.module && !try_module_get(tab->off[i].kptr.module)) {
> -			while (i--) {
> -				if (tab->off[i].kptr.module)
> -					module_put(tab->off[i].kptr.module);
> -				btf_put(tab->off[i].kptr.btf);
> +	/* Do a deep copy of the fields_tab */
> +	fields = tab->fields;
> +	new_tab->cnt = 0;
> +	for (i = 0; i < tab->cnt; i++) {
> +		switch (fields[i].type) {
> +		case BPF_KPTR_UNREF:
> +		case BPF_KPTR_REF:
> +			btf_get(fields[i].kptr.btf);
> +			if (fields[i].kptr.module && !try_module_get(fields[i].kptr.module)) {
> +				ret = -ENXIO;
> +				goto free;
>  			}
> -			kfree(new_tab);
> -			return ERR_PTR(-ENXIO);
> +			break;
> +		default:
> +			ret = -EFAULT;
> +			WARN_ON_ONCE(1);
> +			goto free;
>  		}
> +		new_tab->cnt++;
>  	}
>  	return new_tab;
> +free:
> +	btf_type_fields_free(new_tab);
> +	return ERR_PTR(ret);
>  }
>  
> -bool bpf_map_equal_kptr_off_tab(const struct bpf_map *map_a, const struct bpf_map *map_b)
> +bool btf_type_fields_equal(const struct btf_type_fields *tab_a, const struct btf_type_fields *tab_b)
>  {
> -	struct bpf_map_value_off *tab_a = map_a->kptr_off_tab, *tab_b = map_b->kptr_off_tab;
> -	bool a_has_kptr = map_value_has_kptrs(map_a), b_has_kptr = map_value_has_kptrs(map_b);
> +	bool a_has_fields = !IS_ERR_OR_NULL(tab_a), b_has_fields = !IS_ERR_OR_NULL(tab_b);
>  	int size;
>  
> -	if (!a_has_kptr && !b_has_kptr)
> +	if (!a_has_fields && !b_has_fields)
>  		return true;
> -	if (a_has_kptr != b_has_kptr)
> +	if (a_has_fields != b_has_fields)
>  		return false;
> -	if (tab_a->nr_off != tab_b->nr_off)
> +	if (tab_a->cnt != tab_b->cnt)
>  		return false;
> -	size = offsetof(struct bpf_map_value_off, off[tab_a->nr_off]);
> +	size = offsetof(struct btf_type_fields, fields[tab_a->cnt]);
>  	return !memcmp(tab_a, tab_b, size);
>  }
>  
> -/* Caller must ensure map_value_has_kptrs is true. Note that this function can
> - * be called on a map value while the map_value is visible to BPF programs, as
> - * it ensures the correct synchronization, and we already enforce the same using
> - * the bpf_kptr_xchg helper on the BPF program side for referenced kptrs.
> - */
> -void bpf_map_free_kptrs(struct bpf_map *map, void *map_value)
> +void bpf_obj_free_fields(const struct btf_type_fields *tab, void *obj)

Still thinking about this one. bpf_obj_free_fields() seems to fit fine.
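
For reference, the comparison in the quoted btf_type_fields_equal() hunk can be
sketched in plain userspace C. This is only an illustration under assumptions:
struct record, struct field, record_size(), record_alloc() and record_equal()
are hypothetical stand-ins, not the kernel definitions, and a plain NULL check
replaces IS_ERR_OR_NULL(). Objects are zero-allocated so that memcmp() never
reads indeterminate bytes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins for the field table discussed in the thread. */
struct field {
	uint32_t offset;
	uint32_t type;
};

struct record {
	uint32_t cnt;
	uint32_t field_mask;
	struct field fields[]; /* flexible array, cnt entries */
};

/* Bytes occupied by the header plus the first cnt entries. */
static size_t record_size(uint32_t cnt)
{
	return offsetof(struct record, fields) + (size_t)cnt * sizeof(struct field);
}

/* Zeroed allocation keeps the bytewise comparison below well defined. */
static struct record *record_alloc(uint32_t cnt)
{
	struct record *rec = calloc(1, record_size(cnt));

	if (rec)
		rec->cnt = cnt;
	return rec;
}

/* Two absent records are equal; otherwise compare count, then raw bytes. */
static bool record_equal(const struct record *a, const struct record *b)
{
	if (!a && !b)
		return true;
	if (!a || !b)
		return false;
	if (a->cnt != b->cnt)
		return false;
	return memcmp(a, b, record_size(a->cnt)) == 0;
}
```

Comparing the counts first matters: it guarantees that the memcmp() length is
valid for both objects.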


* Re: [PATCH bpf-next v2 07/25] bpf: Consolidate spin_lock, timer management into fields_tab
  2022-10-13  6:22 ` [PATCH bpf-next v2 07/25] bpf: Consolidate spin_lock, timer management " Kumar Kartikeya Dwivedi
@ 2022-10-19  1:40   ` Alexei Starovoitov
  2022-10-19  5:43     ` Kumar Kartikeya Dwivedi
  0 siblings, 1 reply; 52+ messages in thread
From: Alexei Starovoitov @ 2022-10-19  1:40 UTC (permalink / raw)
  To: Kumar Kartikeya Dwivedi
  Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Dave Marchevsky, Delyan Kratunov

On Thu, Oct 13, 2022 at 11:52:45AM +0530, Kumar Kartikeya Dwivedi wrote:
>  	if (unlikely((map_flags & BPF_F_LOCK) &&
> -		     !map_value_has_spin_lock(map)))
> +		     !btf_type_fields_has_field(map->fields_tab, BPF_SPIN_LOCK)))
>  		return -EINVAL;

...

>  	/* We don't reset or free fields other than timer on uref dropping to zero. */
> -	if (!map_value_has_timer(map))
> +	if (!btf_type_fields_has_field(map->fields_tab, BPF_TIMER))

...

> -		     !map_value_has_spin_lock(&smap->map)))
> +		     !btf_type_fields_has_field(smap->map.fields_tab, BPF_SPIN_LOCK)))
>  		return ERR_PTR(-EINVAL);

...

> -	if (!map_value_has_timer(&htab->map))
> +	if (!btf_type_fields_has_field(htab->map.fields_tab, BPF_TIMER))
>  		return;

...

>  	if (unlikely(map_flags & BPF_F_LOCK)) {
> -		if (unlikely(!map_value_has_spin_lock(map)))
> +		if (unlikely(!btf_type_fields_has_field(map->fields_tab, BPF_SPIN_LOCK)))
>  			return -EINVAL;

...

> -	/* We don't reset or free kptr on uref dropping to zero. */
> -	if (!map_value_has_timer(&htab->map))
> +	/* We only free timer on uref dropping to zero */
> +	if (!btf_type_fields_has_field(htab->map.fields_tab, BPF_TIMER))
>  		return;

...
>  	if ((elem_map_flags & ~BPF_F_LOCK) ||
> -	    ((elem_map_flags & BPF_F_LOCK) && !map_value_has_spin_lock(map)))
> +	    ((elem_map_flags & BPF_F_LOCK) && !btf_type_fields_has_field(map->fields_tab, BPF_SPIN_LOCK)))
>  		return -EINVAL;

...

>  	if (unlikely((flags & BPF_F_LOCK) &&
> -		     !map_value_has_spin_lock(map)))
> +		     !btf_type_fields_has_field(map->fields_tab, BPF_SPIN_LOCK)))
>  		return -EINVAL;

...

> -	if (map_value_has_spin_lock(inner_map)) {
> +	if (btf_type_fields_has_field(inner_map->fields_tab, BPF_SPIN_LOCK)) {
>  		fdput(f);
>  		return ERR_PTR(-ENOTSUPP);

...

>  	if ((attr->flags & BPF_F_LOCK) &&
> -	    !map_value_has_spin_lock(map)) {
> +	    !btf_type_fields_has_field(map->fields_tab, BPF_SPIN_LOCK)) {
>  		err = -EINVAL;
>  		goto err_put;
>  	}
> @@ -1440,7 +1428,7 @@ static int map_update_elem(union bpf_attr *attr, bpfptr_t uattr)
>  	}
>  
>  	if ((attr->flags & BPF_F_LOCK) &&
> -	    !map_value_has_spin_lock(map)) {
> +	    !btf_type_fields_has_field(map->fields_tab, BPF_SPIN_LOCK)) {
>  		err = -EINVAL;
>  		goto err_put;
>  	}
> @@ -1603,7 +1591,7 @@ int generic_map_delete_batch(struct bpf_map *map,
>  		return -EINVAL;
>  
>  	if ((attr->batch.elem_flags & BPF_F_LOCK) &&
> -	    !map_value_has_spin_lock(map)) {
> +	    !btf_type_fields_has_field(map->fields_tab, BPF_SPIN_LOCK)) {
>  		return -EINVAL;
>  	}
>  
> @@ -1660,7 +1648,7 @@ int generic_map_update_batch(struct bpf_map *map,
>  		return -EINVAL;
>  
>  	if ((attr->batch.elem_flags & BPF_F_LOCK) &&
> -	    !map_value_has_spin_lock(map)) {
> +	    !btf_type_fields_has_field(map->fields_tab, BPF_SPIN_LOCK)) {
>  		return -EINVAL;
>  	}
>  
> @@ -1723,7 +1711,7 @@ int generic_map_lookup_batch(struct bpf_map *map,
>  		return -EINVAL;
>  
>  	if ((attr->batch.elem_flags & BPF_F_LOCK) &&
> -	    !map_value_has_spin_lock(map))
> +	    !btf_type_fields_has_field(map->fields_tab, BPF_SPIN_LOCK))
>  		return -EINVAL;
>  
>  	value_size = bpf_map_value_size(map);
> @@ -1845,7 +1833,7 @@ static int map_lookup_and_delete_elem(union bpf_attr *attr)
>  	}
>  
>  	if ((attr->flags & BPF_F_LOCK) &&
> -	    !map_value_has_spin_lock(map)) {
> +	    !btf_type_fields_has_field(map->fields_tab, BPF_SPIN_LOCK)) {

All of these btf_type_fields_has_field() calls are quite an eyesore.
That was the reason to suggest btf_record_has_field() in the previous email.
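
A minimal userspace sketch of what such a shorter helper could look like,
assuming it simply tests the field_mask described in patch 06. The names
record_has_field(), struct record and the FIELD_* constants are hypothetical,
the bit values for the spin lock and timer are assumed, and a plain NULL check
stands in for IS_ERR_OR_NULL():

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* One bit per field kind. The kptr bits follow the enum quoted in patch 06;
 * the spin lock and timer bits are assumptions made for this sketch.
 */
enum field_type {
	FIELD_SPIN_LOCK  = 1 << 0, /* assumed value */
	FIELD_TIMER      = 1 << 1, /* assumed value */
	FIELD_KPTR_UNREF = 1 << 2,
	FIELD_KPTR_REF   = 1 << 3,
	FIELD_KPTR       = FIELD_KPTR_UNREF | FIELD_KPTR_REF,
};

/* Hypothetical stand-in for the record; only the mask matters here. */
struct record {
	uint32_t cnt;
	uint32_t field_mask;
};

/* True if the record contains at least one field of the given kind(s). */
static bool record_has_field(const struct record *rec, enum field_type type)
{
	assert(type != 0);
	if (!rec)
		return false;
	return (rec->field_mask & (uint32_t)type) != 0;
}
```

With a combined mask such as FIELD_KPTR, the check is true if either kptr
flavour is present, which is what makes the bitmask form convenient.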


* Re: [PATCH bpf-next v2 09/25] bpf: Support bpf_list_head in map values
  2022-10-13  6:22 ` [PATCH bpf-next v2 09/25] bpf: Support bpf_list_head in map values Kumar Kartikeya Dwivedi
@ 2022-10-19  1:59   ` Alexei Starovoitov
  2022-10-19  5:48     ` Kumar Kartikeya Dwivedi
  0 siblings, 1 reply; 52+ messages in thread
From: Alexei Starovoitov @ 2022-10-19  1:59 UTC (permalink / raw)
  To: Kumar Kartikeya Dwivedi
  Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Dave Marchevsky, Delyan Kratunov

On Thu, Oct 13, 2022 at 11:52:47AM +0530, Kumar Kartikeya Dwivedi wrote:
> Add the basic support on the map side to parse, recognize, verify, and
> build metadata table for a new special field of the type struct
> bpf_list_head. To parameterize the bpf_list_head for a certain value
> type and the list_node member it will accept in that value type, we use
> BTF declaration tags.
> 
> The definition of bpf_list_head in a map value will be done as follows:
> 
> struct foo {
> 	struct bpf_list_node node;
> 	int data;
> };
> 
> struct map_value {
> 	struct bpf_list_head head __contains(foo, node);
> };
> 
> Then, the bpf_list_head only allows adding to the list 'head' using the
> bpf_list_node 'node' for the type struct foo.
> 
> The 'contains' annotation is a BTF declaration tag composed of four
> parts, "contains:kind:name:node" where the kind and name is then used to
> look up the type in the map BTF. The node defines name of the member in
> this type that has the type struct bpf_list_node, which is actually used
> for linking into the linked list. For now, 'kind' part is hardcoded as
> struct.

...

> +	value_type = btf_find_decl_tag_value(btf, pt, comp_idx, "contains:");
> +	if (!value_type)
> +		return -EINVAL;
> +	if (strncmp(value_type, "struct:", sizeof("struct:") - 1))
> +		return -EINVAL;
> +	value_type += sizeof("struct:") - 1;

I don't get it.
The patch 24 does:
+#define __contains(name, node) __attribute__((btf_decl_tag("contains:struct:" #name ":" #node)))

The 'struct:' part is invisible to users. They won't make a mistake.
Why bother adding it to BTF and then check for it?
Backward compat concerns?
But it's in bpf_experimental.h.
That will probably be the last thing to change, and it is easy to do.
Please drop it?
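
To illustrate, with the 'struct:' part gone the value after "contains:" would
be just "name:node". A rough userspace sketch of splitting that value follows;
parse_contains_value() is a hypothetical helper written for this example, not
code from the series:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Split "name:node" into a copied type name and a pointer to the node
 * member name. Returns 0 on success and -EINVAL on malformed input.
 */
static int parse_contains_value(const char *value, char *name, size_t name_sz,
				const char **node)
{
	const char *sep;
	size_t len;

	assert(name && name_sz > 0 && node);
	if (!value)
		return -EINVAL;
	sep = strchr(value, ':');
	if (!sep)
		return -EINVAL;
	len = (size_t)(sep - value);
	/* Reject an empty name, a name that does not fit, or an empty node. */
	if (len == 0 || len >= name_sz || sep[1] == '\0')
		return -EINVAL;
	/* Exactly one separator is expected in this sketch. */
	if (strchr(sep + 1, ':'))
		return -EINVAL;
	memcpy(name, value, len);
	name[len] = '\0';
	*node = sep + 1;
	return 0;
}
```

A value that still carries the old "struct:" part has two separators and is
rejected by this sketch.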

> diff --git a/tools/testing/selftests/bpf/bpf_experimental.h b/tools/testing/selftests/bpf/bpf_experimental.h
> new file mode 100644
> index 000000000000..4e31790e433d
> --- /dev/null
> +++ b/tools/testing/selftests/bpf/bpf_experimental.h
> @@ -0,0 +1,23 @@
> +#ifndef __KERNEL__
> +
> +#include <vmlinux.h>
> +#include <bpf/bpf_tracing.h>
> +#include <bpf/bpf_helpers.h>
> +#include <bpf/bpf_core_read.h>
> +

Why bother with the above?
The below should be enough ?

> +#else
> +
> +struct bpf_list_head {
> +	__u64 __a;
> +	__u64 __b;
> +} __attribute__((aligned(8)));
> +
> +struct bpf_list_node {
> +	__u64 __a;
> +	__u64 __b;
> +} __attribute__((aligned(8)));
> +
> +#endif

> +
> +#ifndef __KERNEL__
> +#endif

hmm.


* Re: [PATCH bpf-next v2 19/25] bpf: Introduce bpf_kptr_new
  2022-10-13  6:22 ` [PATCH bpf-next v2 19/25] bpf: Introduce bpf_kptr_new Kumar Kartikeya Dwivedi
@ 2022-10-19  2:31   ` Alexei Starovoitov
  2022-10-19  5:58     ` Kumar Kartikeya Dwivedi
  0 siblings, 1 reply; 52+ messages in thread
From: Alexei Starovoitov @ 2022-10-19  2:31 UTC (permalink / raw)
  To: Kumar Kartikeya Dwivedi
  Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Dave Marchevsky, Delyan Kratunov

On Thu, Oct 13, 2022 at 11:52:57AM +0530, Kumar Kartikeya Dwivedi wrote:
> +void *bpf_kptr_new_impl(u64 local_type_id__k, u64 flags, void *meta__ign)
> +{
> +	struct btf_struct_meta *meta = meta__ign;
> +	u64 size = local_type_id__k;
> +	void *p;
> +
> +	if (unlikely(flags || !bpf_global_ma_set))
> +		return NULL;

Unused 'flags' looks weird in unstable api. Just drop it?
And keep it as:
void *bpf_kptr_new(u64 local_type_id__k, struct btf_struct_meta *meta__ign);

and in bpf_experimental.h:

extern void *bpf_kptr_new(__u64 local_type_id) __ksym;

since __ign args are ignored during kfunc type match
the bpf progs can use it without #define.

> +	p = bpf_mem_alloc(&bpf_global_ma, size);
> +	if (!p)
> +		return NULL;
> +	if (meta)
> +		bpf_obj_init(meta->off_arr, p);

I'm starting to dislike all those _arr and _tab suffixes in the verifier code base.
They remind me of the programming style where people tried to add types into
variable names. imo dropping _arr would be just fine.


* Re: [PATCH bpf-next v2 06/25] bpf: Refactor kptr_off_tab into fields_tab
  2022-10-19  1:35   ` Alexei Starovoitov
@ 2022-10-19  5:42     ` Kumar Kartikeya Dwivedi
  2022-10-19 15:54       ` Alexei Starovoitov
  0 siblings, 1 reply; 52+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2022-10-19  5:42 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Dave Marchevsky, Delyan Kratunov

On Wed, Oct 19, 2022 at 07:05:26AM IST, Alexei Starovoitov wrote:
> On Thu, Oct 13, 2022 at 11:52:44AM +0530, Kumar Kartikeya Dwivedi wrote:
> > To prepare the BPF verifier to handle special fields in both map values
> > and program allocated types coming from program BTF, we need to refactor
> > the kptr_off_tab handling code into something more generic and reusable
> > across both cases to avoid code duplication.
> >
> > Later patches also require passing this data to helpers at runtime, so
> > that they can work on user defined types, initialize them, destruct
> > them, etc.
> >
> > The main observation is that both map values and such allocated types
> > point to a type in program BTF, hence they can be handled similarly. We
> > can prepare a field metadata table for both cases and store them in
> > struct bpf_map or struct btf depending on the use case.
> >
> > Hence, refactor the code into generic btf_type_fields and btf_field
> > member structs. The btf_type_fields represents the fields of a specific
> > btf_type in user BTF. The cnt indicates the number of special fields we
> > successfully recognized, and field_mask is a bitmask of fields that were
> > found, to enable quick determination of availability of a certain field.
> >
> > Subsequently, refactor the rest of the code to work with these generic
> > types, remove assumptions about kptr and kptr_off_tab, rename variables
> > to more meaningful names, etc.
> >
> > Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
> > ---
> >  include/linux/bpf.h     | 103 +++++++++++++-------
> >  include/linux/btf.h     |   4 +-
> >  kernel/bpf/arraymap.c   |  13 ++-
> >  kernel/bpf/btf.c        |  64 ++++++-------
> >  kernel/bpf/hashtab.c    |  14 ++-
> >  kernel/bpf/map_in_map.c |  13 ++-
> >  kernel/bpf/syscall.c    | 203 +++++++++++++++++++++++-----------------
> >  kernel/bpf/verifier.c   |  96 ++++++++++---------
> >  8 files changed, 289 insertions(+), 221 deletions(-)
> >
> > diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> > index 9e7d46d16032..25e77a172d7c 100644
> > --- a/include/linux/bpf.h
> > +++ b/include/linux/bpf.h
> > @@ -164,35 +164,41 @@ struct bpf_map_ops {
> >  };
> >
> >  enum {
> > -	/* Support at most 8 pointers in a BPF map value */
> > -	BPF_MAP_VALUE_OFF_MAX = 8,
> > -	BPF_MAP_OFF_ARR_MAX   = BPF_MAP_VALUE_OFF_MAX +
> > +	/* Support at most 8 pointers in a BTF type */
> > +	BTF_FIELDS_MAX	      = 8,
> > +	BPF_MAP_OFF_ARR_MAX   = BTF_FIELDS_MAX +
> >  				1 + /* for bpf_spin_lock */
> >  				1,  /* for bpf_timer */
> >  };
> >
> > -enum bpf_kptr_type {
> > -	BPF_KPTR_UNREF,
> > -	BPF_KPTR_REF,
> > +enum btf_field_type {
> > +	BPF_KPTR_UNREF = (1 << 2),
> > +	BPF_KPTR_REF   = (1 << 3),
> > +	BPF_KPTR       = BPF_KPTR_UNREF | BPF_KPTR_REF,
> >  };
> >
> > -struct bpf_map_value_off_desc {
> > +struct btf_field_kptr {
> > +	struct btf *btf;
> > +	struct module *module;
> > +	btf_dtor_kfunc_t dtor;
> > +	u32 btf_id;
> > +};
> > +
> > +struct btf_field {
> >  	u32 offset;
> > -	enum bpf_kptr_type type;
> > -	struct {
> > -		struct btf *btf;
> > -		struct module *module;
> > -		btf_dtor_kfunc_t dtor;
> > -		u32 btf_id;
> > -	} kptr;
> > +	enum btf_field_type type;
> > +	union {
> > +		struct btf_field_kptr kptr;
> > +	};
> >  };
> >
> > -struct bpf_map_value_off {
> > -	u32 nr_off;
> > -	struct bpf_map_value_off_desc off[];
> > +struct btf_type_fields {
>
> How about btf_record instead ?
> Then btf_type_fields_has_field() will become btf_record_has_field() ?
>

I guess btf_record is ok. I thought of just making it btf_fields, but then
bpf_map_free_fields (for freeing this struct) and bpf_obj_free_fields (for
freeing the actual fields of an object) get confusing.

Or, to be more precise, I could name the struct btf_type_record,
but name the member variable 'record' in all places.

> > +	u32 cnt;
> > +	u32 field_mask;
> > +	struct btf_field fields[];
> >  };
> >
> > -struct bpf_map_off_arr {
> > +struct btf_type_fields_off {
>
> struct btf_field_offs ?
>
> >  	u32 cnt;
> >  	u32 field_off[BPF_MAP_OFF_ARR_MAX];
> >  	u8 field_sz[BPF_MAP_OFF_ARR_MAX];
> > @@ -214,7 +220,7 @@ struct bpf_map {
> >  	u64 map_extra; /* any per-map-type extra fields */
> >  	u32 map_flags;
> >  	int spin_lock_off; /* >=0 valid offset, <0 error */
> > -	struct bpf_map_value_off *kptr_off_tab;
> > +	struct btf_type_fields *fields_tab;
>
> struct btf_record *record; ?
> The '_tab' suffix suppose to mean 'fieldS table' ?
> Just 'record' seems clear enough.
> Or
> struct btf_record *btf_record; ?
> if just 'record' ambiguous.
>

Ack. I think just 'record' is ok.

> >  	int timer_off; /* >=0 valid offset, <0 error */
> >  	u32 id;
> >  	int numa_node;
> > @@ -226,7 +232,7 @@ struct bpf_map {
> >  	struct obj_cgroup *objcg;
> >  #endif
> >  	char name[BPF_OBJ_NAME_LEN];
> > -	struct bpf_map_off_arr *off_arr;
> > +	struct btf_type_fields_off *off_arr;
>
> 'off_arr' should probably be renamed as well.
> How about 'struct btf_field_offs *field_offs;' ?
>

Ack.

> >  static inline void check_and_init_map_value(struct bpf_map *map, void *dst)
> >  {
> >  	if (unlikely(map_value_has_spin_lock(map)))
> >  		memset(dst + map->spin_lock_off, 0, sizeof(struct bpf_spin_lock));
> >  	if (unlikely(map_value_has_timer(map)))
> >  		memset(dst + map->timer_off, 0, sizeof(struct bpf_timer));
> > -	if (unlikely(map_value_has_kptrs(map))) {
> > -		struct bpf_map_value_off *tab = map->kptr_off_tab;
> > +	if (!IS_ERR_OR_NULL(map->fields_tab)) {
> > +		struct btf_field *fields = map->fields_tab->fields;
>
> will become
> struct btf_field *fields = map->record->fields;
>

Ack for this and the rest below.

> [...]


* Re: [PATCH bpf-next v2 07/25] bpf: Consolidate spin_lock, timer management into fields_tab
  2022-10-19  1:40   ` Alexei Starovoitov
@ 2022-10-19  5:43     ` Kumar Kartikeya Dwivedi
  0 siblings, 0 replies; 52+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2022-10-19  5:43 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Dave Marchevsky, Delyan Kratunov

On Wed, Oct 19, 2022 at 07:10:50AM IST, Alexei Starovoitov wrote:
> On Thu, Oct 13, 2022 at 11:52:45AM +0530, Kumar Kartikeya Dwivedi wrote:
> >  	if (unlikely((map_flags & BPF_F_LOCK) &&
> > -		     !map_value_has_spin_lock(map)))
> > +		     !btf_type_fields_has_field(map->fields_tab, BPF_SPIN_LOCK)))
> >  		return -EINVAL;
>
> ...
>
> >  	/* We don't reset or free fields other than timer on uref dropping to zero. */
> > -	if (!map_value_has_timer(map))
> > +	if (!btf_type_fields_has_field(map->fields_tab, BPF_TIMER))
>
> ...
>
> > -		     !map_value_has_spin_lock(&smap->map)))
> > +		     !btf_type_fields_has_field(smap->map.fields_tab, BPF_SPIN_LOCK)))
> >  		return ERR_PTR(-EINVAL);
>
> ...
>
> > -	if (!map_value_has_timer(&htab->map))
> > +	if (!btf_type_fields_has_field(htab->map.fields_tab, BPF_TIMER))
> >  		return;
>
> ...
>
> >  	if (unlikely(map_flags & BPF_F_LOCK)) {
> > -		if (unlikely(!map_value_has_spin_lock(map)))
> > +		if (unlikely(!btf_type_fields_has_field(map->fields_tab, BPF_SPIN_LOCK)))
> >  			return -EINVAL;
>
> ...
>
> > -	/* We don't reset or free kptr on uref dropping to zero. */
> > -	if (!map_value_has_timer(&htab->map))
> > +	/* We only free timer on uref dropping to zero */
> > +	if (!btf_type_fields_has_field(htab->map.fields_tab, BPF_TIMER))
> >  		return;
>
> ...
> >  	if ((elem_map_flags & ~BPF_F_LOCK) ||
> > -	    ((elem_map_flags & BPF_F_LOCK) && !map_value_has_spin_lock(map)))
> > +	    ((elem_map_flags & BPF_F_LOCK) && !btf_type_fields_has_field(map->fields_tab, BPF_SPIN_LOCK)))
> >  		return -EINVAL;
>
> ...
>
> >  	if (unlikely((flags & BPF_F_LOCK) &&
> > -		     !map_value_has_spin_lock(map)))
> > +		     !btf_type_fields_has_field(map->fields_tab, BPF_SPIN_LOCK)))
> >  		return -EINVAL;
>
> ...
>
> > -	if (map_value_has_spin_lock(inner_map)) {
> > +	if (btf_type_fields_has_field(inner_map->fields_tab, BPF_SPIN_LOCK)) {
> >  		fdput(f);
> >  		return ERR_PTR(-ENOTSUPP);
>
> ...
>
> >  	if ((attr->flags & BPF_F_LOCK) &&
> > -	    !map_value_has_spin_lock(map)) {
> > +	    !btf_type_fields_has_field(map->fields_tab, BPF_SPIN_LOCK)) {
> >  		err = -EINVAL;
> >  		goto err_put;
> >  	}
> > @@ -1440,7 +1428,7 @@ static int map_update_elem(union bpf_attr *attr, bpfptr_t uattr)
> >  	}
> >
> >  	if ((attr->flags & BPF_F_LOCK) &&
> > -	    !map_value_has_spin_lock(map)) {
> > +	    !btf_type_fields_has_field(map->fields_tab, BPF_SPIN_LOCK)) {
> >  		err = -EINVAL;
> >  		goto err_put;
> >  	}
> > @@ -1603,7 +1591,7 @@ int generic_map_delete_batch(struct bpf_map *map,
> >  		return -EINVAL;
> >
> >  	if ((attr->batch.elem_flags & BPF_F_LOCK) &&
> > -	    !map_value_has_spin_lock(map)) {
> > +	    !btf_type_fields_has_field(map->fields_tab, BPF_SPIN_LOCK)) {
> >  		return -EINVAL;
> >  	}
> >
> > @@ -1660,7 +1648,7 @@ int generic_map_update_batch(struct bpf_map *map,
> >  		return -EINVAL;
> >
> >  	if ((attr->batch.elem_flags & BPF_F_LOCK) &&
> > -	    !map_value_has_spin_lock(map)) {
> > +	    !btf_type_fields_has_field(map->fields_tab, BPF_SPIN_LOCK)) {
> >  		return -EINVAL;
> >  	}
> >
> > @@ -1723,7 +1711,7 @@ int generic_map_lookup_batch(struct bpf_map *map,
> >  		return -EINVAL;
> >
> >  	if ((attr->batch.elem_flags & BPF_F_LOCK) &&
> > -	    !map_value_has_spin_lock(map))
> > +	    !btf_type_fields_has_field(map->fields_tab, BPF_SPIN_LOCK))
> >  		return -EINVAL;
> >
> >  	value_size = bpf_map_value_size(map);
> > @@ -1845,7 +1833,7 @@ static int map_lookup_and_delete_elem(union bpf_attr *attr)
> >  	}
> >
> >  	if ((attr->flags & BPF_F_LOCK) &&
> > -	    !map_value_has_spin_lock(map)) {
> > +	    !btf_type_fields_has_field(map->fields_tab, BPF_SPIN_LOCK)) {
>
> All of these btf_type_fields_has_field() is quite an eyesore.
> That was the reason to suggest btf_record_has_field() in the previous email.

I agree, what do you think of calling it btf_type_has_field? You pass in the
btf_type_record and the field type.


* Re: [PATCH bpf-next v2 09/25] bpf: Support bpf_list_head in map values
  2022-10-19  1:59   ` Alexei Starovoitov
@ 2022-10-19  5:48     ` Kumar Kartikeya Dwivedi
  2022-10-19 15:57       ` Alexei Starovoitov
  0 siblings, 1 reply; 52+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2022-10-19  5:48 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Dave Marchevsky, Delyan Kratunov

On Wed, Oct 19, 2022 at 07:29:16AM IST, Alexei Starovoitov wrote:
> On Thu, Oct 13, 2022 at 11:52:47AM +0530, Kumar Kartikeya Dwivedi wrote:
> > Add the basic support on the map side to parse, recognize, verify, and
> > build metadata table for a new special field of the type struct
> > bpf_list_head. To parameterize the bpf_list_head for a certain value
> > type and the list_node member it will accept in that value type, we use
> > BTF declaration tags.
> >
> > The definition of bpf_list_head in a map value will be done as follows:
> >
> > struct foo {
> > 	struct bpf_list_node node;
> > 	int data;
> > };
> >
> > struct map_value {
> > 	struct bpf_list_head head __contains(foo, node);
> > };
> >
> > Then, the bpf_list_head only allows adding to the list 'head' using the
> > bpf_list_node 'node' for the type struct foo.
> >
> > The 'contains' annotation is a BTF declaration tag composed of four
> > parts, "contains:kind:name:node" where the kind and name is then used to
> > look up the type in the map BTF. The node defines name of the member in
> > this type that has the type struct bpf_list_node, which is actually used
> > for linking into the linked list. For now, 'kind' part is hardcoded as
> > struct.
>
> ...
>
> > +	value_type = btf_find_decl_tag_value(btf, pt, comp_idx, "contains:");
> > +	if (!value_type)
> > +		return -EINVAL;
> > +	if (strncmp(value_type, "struct:", sizeof("struct:") - 1))
> > +		return -EINVAL;
> > +	value_type += sizeof("struct:") - 1;
>
> I don't get it.
> The patch 24 does:
> +#define __contains(name, node) __attribute__((btf_decl_tag("contains:struct:" #name ":" #node)))
>
> The 'struct:' part is invisible to users. They won't make a mistake.
> Why bother adding it to BTF and then check for it?
> Backward compat concerns?
> But it's in bpf_experimental.h.
> That probably be the last thing to change and so easy to do.
> Please drop it?
>

Fair, I just left it there anticipating that at least a union with a discriminant
might be a possible candidate, but since this is all unstable it's not a big deal.

> > diff --git a/tools/testing/selftests/bpf/bpf_experimental.h b/tools/testing/selftests/bpf/bpf_experimental.h
> > new file mode 100644
> > index 000000000000..4e31790e433d
> > --- /dev/null
> > +++ b/tools/testing/selftests/bpf/bpf_experimental.h
> > @@ -0,0 +1,23 @@
> > +#ifndef __KERNEL__
> > +
> > +#include <vmlinux.h>
> > +#include <bpf/bpf_tracing.h>
> > +#include <bpf/bpf_helpers.h>
> > +#include <bpf/bpf_core_read.h>
> > +
>
> Why bother with the above?
> The below should be enough ?
>

Actually, I'm using this header inside the kernel, userspace, and BPF programs.
In the kernel, it provides type definitions for bpf_list_head and bpf_list_node,
which are then emitted to vmlinux.h (and also used inside the kernel, of course).

In userspace, it provides these types because otherwise including the skeleton
fails to build, as such types are used in global variables, but there I have to
define __KERNEL__ around the include.

In the BPF program, it provides the kfunc declarations.

I guess I can split the header into two to avoid confusion. I agree it's a bit
ugly.


* Re: [PATCH bpf-next v2 19/25] bpf: Introduce bpf_kptr_new
  2022-10-19  2:31   ` Alexei Starovoitov
@ 2022-10-19  5:58     ` Kumar Kartikeya Dwivedi
  2022-10-19 16:31       ` Alexei Starovoitov
  0 siblings, 1 reply; 52+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2022-10-19  5:58 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Dave Marchevsky, Delyan Kratunov

On Wed, Oct 19, 2022 at 08:01:24AM IST, Alexei Starovoitov wrote:
> On Thu, Oct 13, 2022 at 11:52:57AM +0530, Kumar Kartikeya Dwivedi wrote:
> > +void *bpf_kptr_new_impl(u64 local_type_id__k, u64 flags, void *meta__ign)
> > +{
> > +	struct btf_struct_meta *meta = meta__ign;
> > +	u64 size = local_type_id__k;
> > +	void *p;
> > +
> > +	if (unlikely(flags || !bpf_global_ma_set))
> > +		return NULL;
>
> Unused 'flags' looks weird in unstable api. Just drop it?
> And keep it as:
> void *bpf_kptr_new(u64 local_type_id__k, struct btf_struct_meta *meta__ign);
>
> and in bpf_experimental.h:
>
> extern void *bpf_kptr_new(__u64 local_type_id) __ksym;
>
> since __ign args are ignored during kfunc type match
> the bpf progs can use it without #define.
>

It's ignored during check_kfunc_call, but libbpf doesn't ignore that. The
prototypes will not be the same. I guess I'll have to teach it to do that during
type match, but IDK how you feel about that.

Otherwise unless you want people to manually pass something to the ignored
argument, we have to hide it behind a macro.

I actually like the macro on top, then I don't even pass the type ID but the
type. But that's a personal preference, and I don't feel strongly about it.

So in C one does malloc(sizeof(*p)), here we'll just write
bpf_kptr_new(typeof(*p)). YMMV.
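
A rough userspace sketch of that macro idea, only to show the shape of the
call site. Everything here is a stand-in: kptr_new_impl() and kptr_new() are
hypothetical names, the first argument is passed as sizeof(type) because the
quoted implementation uses it as the allocation size, and calloc() replaces
bpf_mem_alloc() so the result is deterministic.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Stand-in for the kfunc: 'size' plays the role of the first argument and
 * 'meta_ign' the role of the argument the program never fills in itself.
 */
static void *kptr_new_impl(uint64_t size, void *meta_ign)
{
	(void)meta_ign;
	assert(size > 0);
	return calloc(1, (size_t)size);
}

/* The macro hides the ignored argument and takes a type, not a type ID. */
#define kptr_new(type) ((type *)kptr_new_impl(sizeof(type), NULL))

/* Example user defined type. */
struct foo {
	int data;
};
```

With GNU C, a call such as kptr_new(typeof(*p)) also works, since the macro
only needs something that names a type.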

> > +	p = bpf_mem_alloc(&bpf_global_ma, size);
> > +	if (!p)
> > +		return NULL;
> > +	if (meta)
> > +		bpf_obj_init(meta->off_arr, p);
>
> I'm starting to dislike all that _arr and _tab suffixes in the verifier code base.
> It reminds me of programming style where people tried to add types into
> variable names. imo dropping _arr wouldn't be just fine.

Ack, I'll do it in v3.

Also, I'd like to invite people to please bikeshed a bit over the naming of the
APIs, e.g. whether it should be bpf_kptr_drop vs bpf_kptr_delete.

In the BPF list API, it's named bpf_list_del but it's actually distinct from how
list_del in the kernel works. So does it make sense to give them a different
name (like pop_front/pop_back and push_front/push_back)?

Even bpf_list_add takes a bpf_list_head. In the kernel there's no
distinction between node and head, so you can do list_add on a node as well, but
that won't be possible with the kfunc (unless we overload the head argument to
also work with nodes).

Later we'll probably have to add bpf_list_node_add etc. that add before or after
a node to make that work.

The main question is whether it should closely resemble the linked list API in
the kernel, or whether it can steer away considerably from that.
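
To make the head vs. node distinction concrete, here is a small userspace
sketch of an intrusive list where the two are separate types and the
operations use push/pop naming. All names (list_head_init, list_push_front,
list_push_back, list_pop_front, node_to_foo) are hypothetical and only
illustrate the naming option, not the proposed kfuncs.

```c
#include <assert.h>
#include <stddef.h>

struct list_node {
	struct list_node *next, *prev;
};

/* Distinct type: a head cannot be passed where a node is expected. */
struct list_head {
	struct list_node root;
};

static void list_head_init(struct list_head *h)
{
	assert(h);
	h->root.next = h->root.prev = &h->root;
}

static void list_push_front(struct list_head *h, struct list_node *n)
{
	assert(h && n);
	n->next = h->root.next;
	n->prev = &h->root;
	h->root.next->prev = n;
	h->root.next = n;
}

static void list_push_back(struct list_head *h, struct list_node *n)
{
	assert(h && n);
	n->prev = h->root.prev;
	n->next = &h->root;
	h->root.prev->next = n;
	h->root.prev = n;
}

/* Unlinks and returns the first node, or NULL if the list is empty. */
static struct list_node *list_pop_front(struct list_head *h)
{
	struct list_node *n;

	assert(h);
	n = h->root.next;
	if (n == &h->root)
		return NULL;
	h->root.next = n->next;
	n->next->prev = &h->root;
	n->next = n->prev = NULL;
	return n;
}

/* Example element type embedding the node, as in the thread's struct foo. */
struct foo {
	struct list_node node;
	int data;
};

#define node_to_foo(n) ((struct foo *)((char *)(n) - offsetof(struct foo, node)))
```

Unlike the kernel's list_add, the push functions here only accept a head as
the first argument, which mirrors the restriction described above.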


* Re: [PATCH bpf-next v2 06/25] bpf: Refactor kptr_off_tab into fields_tab
  2022-10-19  5:42     ` Kumar Kartikeya Dwivedi
@ 2022-10-19 15:54       ` Alexei Starovoitov
  2022-10-19 23:57         ` Kumar Kartikeya Dwivedi
  0 siblings, 1 reply; 52+ messages in thread
From: Alexei Starovoitov @ 2022-10-19 15:54 UTC (permalink / raw)
  To: Kumar Kartikeya Dwivedi
  Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Dave Marchevsky, Delyan Kratunov

On Tue, Oct 18, 2022 at 10:43 PM Kumar Kartikeya Dwivedi
<memxor@gmail.com> wrote:
>
> On Wed, Oct 19, 2022 at 07:05:26AM IST, Alexei Starovoitov wrote:
> > On Thu, Oct 13, 2022 at 11:52:44AM +0530, Kumar Kartikeya Dwivedi wrote:
> > > To prepare the BPF verifier to handle special fields in both map values
> > > and program allocated types coming from program BTF, we need to refactor
> > > the kptr_off_tab handling code into something more generic and reusable
> > > across both cases to avoid code duplication.
> > >
> > > Later patches also require passing this data to helpers at runtime, so
> > > that they can work on user defined types, initialize them, destruct
> > > them, etc.
> > >
> > > The main observation is that both map values and such allocated types
> > > point to a type in program BTF, hence they can be handled similarly. We
> > > can prepare a field metadata table for both cases and store them in
> > > struct bpf_map or struct btf depending on the use case.
> > >
> > > Hence, refactor the code into generic btf_type_fields and btf_field
> > > member structs. The btf_type_fields represents the fields of a specific
> > > btf_type in user BTF. The cnt indicates the number of special fields we
> > > successfully recognized, and field_mask is a bitmask of fields that were
> > > found, to enable quick determination of availability of a certain field.
> > >
> > > Subsequently, refactor the rest of the code to work with these generic
> > > types, remove assumptions about kptr and kptr_off_tab, rename variables
> > > to more meaningful names, etc.
> > >
> > > Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
> > > ---
> > >  include/linux/bpf.h     | 103 +++++++++++++-------
> > >  include/linux/btf.h     |   4 +-
> > >  kernel/bpf/arraymap.c   |  13 ++-
> > >  kernel/bpf/btf.c        |  64 ++++++-------
> > >  kernel/bpf/hashtab.c    |  14 ++-
> > >  kernel/bpf/map_in_map.c |  13 ++-
> > >  kernel/bpf/syscall.c    | 203 +++++++++++++++++++++++-----------------
> > >  kernel/bpf/verifier.c   |  96 ++++++++++---------
> > >  8 files changed, 289 insertions(+), 221 deletions(-)
> > >
> > > diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> > > index 9e7d46d16032..25e77a172d7c 100644
> > > --- a/include/linux/bpf.h
> > > +++ b/include/linux/bpf.h
> > > @@ -164,35 +164,41 @@ struct bpf_map_ops {
> > >  };
> > >
> > >  enum {
> > > -   /* Support at most 8 pointers in a BPF map value */
> > > -   BPF_MAP_VALUE_OFF_MAX = 8,
> > > -   BPF_MAP_OFF_ARR_MAX   = BPF_MAP_VALUE_OFF_MAX +
> > > +   /* Support at most 8 pointers in a BTF type */
> > > +   BTF_FIELDS_MAX        = 8,
> > > +   BPF_MAP_OFF_ARR_MAX   = BTF_FIELDS_MAX +
> > >                             1 + /* for bpf_spin_lock */
> > >                             1,  /* for bpf_timer */
> > >  };
> > >
> > > -enum bpf_kptr_type {
> > > -   BPF_KPTR_UNREF,
> > > -   BPF_KPTR_REF,
> > > +enum btf_field_type {
> > > +   BPF_KPTR_UNREF = (1 << 2),
> > > +   BPF_KPTR_REF   = (1 << 3),
> > > +   BPF_KPTR       = BPF_KPTR_UNREF | BPF_KPTR_REF,
> > >  };
> > >
> > > -struct bpf_map_value_off_desc {
> > > +struct btf_field_kptr {
> > > +   struct btf *btf;
> > > +   struct module *module;
> > > +   btf_dtor_kfunc_t dtor;
> > > +   u32 btf_id;
> > > +};
> > > +
> > > +struct btf_field {
> > >     u32 offset;
> > > -   enum bpf_kptr_type type;
> > > -   struct {
> > > -           struct btf *btf;
> > > -           struct module *module;
> > > -           btf_dtor_kfunc_t dtor;
> > > -           u32 btf_id;
> > > -   } kptr;
> > > +   enum btf_field_type type;
> > > +   union {
> > > +           struct btf_field_kptr kptr;
> > > +   };
> > >  };
> > >
> > > -struct bpf_map_value_off {
> > > -   u32 nr_off;
> > > -   struct bpf_map_value_off_desc off[];
> > > +struct btf_type_fields {
> >
> > How about btf_record instead ?
> > Then btf_type_fields_has_field() will become btf_record_has_field() ?
> >
>
> I guess btf_record is ok. I thought of just making it btf_fields, but then
> bpf_map_free_fields (for freeing this struct) and bpf_obj_free_fields (for
> freeing actual fields of object) gets confusing.
>
> Or to be more precise I could name the struct btf_type_record,
> but the member variable record in all places.

What does the "_type_" prefix add to btf_record ?

btf already has Type in the abbrev.

And from the other email:

> I agree, what do you think of calling it btf_type_has_field? You pass in the
> btf_type_record and the field type.

btf_type_has_field doesn't sound right.


* Re: [PATCH bpf-next v2 09/25] bpf: Support bpf_list_head in map values
  2022-10-19  5:48     ` Kumar Kartikeya Dwivedi
@ 2022-10-19 15:57       ` Alexei Starovoitov
  2022-10-19 23:59         ` Kumar Kartikeya Dwivedi
  0 siblings, 1 reply; 52+ messages in thread
From: Alexei Starovoitov @ 2022-10-19 15:57 UTC (permalink / raw)
  To: Kumar Kartikeya Dwivedi
  Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Dave Marchevsky, Delyan Kratunov

On Tue, Oct 18, 2022 at 10:48 PM Kumar Kartikeya Dwivedi
<memxor@gmail.com> wrote:
>
> On Wed, Oct 19, 2022 at 07:29:16AM IST, Alexei Starovoitov wrote:
> > On Thu, Oct 13, 2022 at 11:52:47AM +0530, Kumar Kartikeya Dwivedi wrote:
> > > Add the basic support on the map side to parse, recognize, verify, and
> > > build metadata table for a new special field of the type struct
> > > bpf_list_head. To parameterize the bpf_list_head for a certain value
> > > type and the list_node member it will accept in that value type, we use
> > > BTF declaration tags.
> > >
> > > The definition of bpf_list_head in a map value will be done as follows:
> > >
> > > struct foo {
> > >     struct bpf_list_node node;
> > >     int data;
> > > };
> > >
> > > struct map_value {
> > >     struct bpf_list_head head __contains(foo, node);
> > > };
> > >
> > > Then, the bpf_list_head only allows adding to the list 'head' using the
> > > bpf_list_node 'node' for the type struct foo.
> > >
> > > The 'contains' annotation is a BTF declaration tag composed of four
> > > parts, "contains:kind:name:node" where the kind and name is then used to
> > > look up the type in the map BTF. The node defines name of the member in
> > > this type that has the type struct bpf_list_node, which is actually used
> > > for linking into the linked list. For now, 'kind' part is hardcoded as
> > > struct.
> >
> > ...
> >
> > > +   value_type = btf_find_decl_tag_value(btf, pt, comp_idx, "contains:");
> > > +   if (!value_type)
> > > +           return -EINVAL;
> > > +   if (strncmp(value_type, "struct:", sizeof("struct:") - 1))
> > > +           return -EINVAL;
> > > +   value_type += sizeof("struct:") - 1;
> >
> > I don't get it.
> > The patch 24 does:
> > +#define __contains(name, node) __attribute__((btf_decl_tag("contains:struct:" #name ":" #node)))
> >
> > The 'struct:' part is invisible to users. They won't make a mistake.
> > Why bother adding it to BTF and then check for it?
> > Backward compat concerns?
> > But it's in bpf_experimental.h.
> > That will probably be the last thing to change, and it is easy to do.
> > Please drop it?
> >
>
> Fair, I just left it there anticipating at least union with a discriminant might
> be a possible candidate, but since this is all unstable it's not a big deal.
>
> > > diff --git a/tools/testing/selftests/bpf/bpf_experimental.h b/tools/testing/selftests/bpf/bpf_experimental.h
> > > new file mode 100644
> > > index 000000000000..4e31790e433d
> > > --- /dev/null
> > > +++ b/tools/testing/selftests/bpf/bpf_experimental.h
> > > @@ -0,0 +1,23 @@
> > > +#ifndef __KERNEL__
> > > +
> > > +#include <vmlinux.h>
> > > +#include <bpf/bpf_tracing.h>
> > > +#include <bpf/bpf_helpers.h>
> > > +#include <bpf/bpf_core_read.h>
> > > +
> >
> > Why bother with the above?
> > The below should be enough ?
> >
>
> Actually, I'm using this header inside the kernel, userspace, and BPF programs.
> In the kernel to provide type definitions for bpf_list_head and bpf_list_node,
> which are then emitted to vmlinux.h (and also used inside the kernel of course).
>
> In userspace for these types as otherwise including skeleton fails to build, as
> such types are global variables, but there I have to define __KERNEL__ around
> include.
>
> In the BPF program, for the kfunc declarations.
>
> I guess I can split the header into two to avoid confusion. I agree it's a bit
> ugly.

I think we can add bpf_list_head and bpf_list_node to uapi/bpf.h
The chances of them changing the size are pretty low.
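
To make the decl tag format discussed above concrete, here is a minimal userspace sketch of splitting such a tag into the value type name and the node member name. It is not the kernel parser: parse_contains_tag() and its buffer-based interface are hypothetical, and it assumes the tag is either the v2 form "contains:struct:name:node" or the shorter "contains:name:node" that results from dropping the hardcoded kind.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical userspace helper, not kernel code: split a decl tag of the
 * form "contains:name:node" (or the v2 form "contains:struct:name:node")
 * into the value type name and the list node member name.
 * Returns 0 on success, -1 on malformed input or a too-small buffer.
 */
static int parse_contains_tag(const char *tag,
			      char *name, size_t name_sz,
			      char *node, size_t node_sz)
{
	static const char prefix[] = "contains:";
	static const char kind[] = "struct:";
	const char *p, *sep;
	size_t len;

	assert(name && node);
	if (!tag || strncmp(tag, prefix, sizeof(prefix) - 1))
		return -1;
	p = tag + sizeof(prefix) - 1;

	/* Accept, but do not require, the hardcoded kind used in v2. */
	if (!strncmp(p, kind, sizeof(kind) - 1))
		p += sizeof(kind) - 1;

	/* The type name runs up to the next ':' and must be non-empty. */
	sep = strchr(p, ':');
	if (!sep || sep == p)
		return -1;
	len = (size_t)(sep - p);
	if (len >= name_sz)
		return -1;
	memcpy(name, p, len);
	name[len] = '\0';

	/* The remainder is the member name: non-empty, no further ':'. */
	p = sep + 1;
	len = strlen(p);
	if (!len || strchr(p, ':') || len >= node_sz)
		return -1;
	memcpy(node, p, len + 1);
	return 0;
}
```

For the tag generated by __contains(foo, node), this yields name "foo" and node "node"; a tag with a missing member name is rejected.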

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH bpf-next v2 05/25] bpf: Drop reg_type_may_be_refcounted_or_null
  2022-10-13  6:22 ` [PATCH bpf-next v2 05/25] bpf: Drop reg_type_may_be_refcounted_or_null Kumar Kartikeya Dwivedi
@ 2022-10-19 16:04   ` Dave Marchevsky
  0 siblings, 0 replies; 52+ messages in thread
From: Dave Marchevsky @ 2022-10-19 16:04 UTC (permalink / raw)
  To: Kumar Kartikeya Dwivedi, bpf
  Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann, Delyan Kratunov

On 10/13/22 2:22 AM, Kumar Kartikeya Dwivedi wrote:
> It is not scalable to maintain a list of types that can have non-zero
> ref_obj_id. It is never set for scalars anyway, so just remove the
> conditional on register types and print it whenever it is non-zero.
> 
> Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
> ---

Acked-by: Dave Marchevsky <davemarchevsky@fb.com>

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH bpf-next v2 19/25] bpf: Introduce bpf_kptr_new
  2022-10-19  5:58     ` Kumar Kartikeya Dwivedi
@ 2022-10-19 16:31       ` Alexei Starovoitov
  2022-10-20  0:44         ` Kumar Kartikeya Dwivedi
  0 siblings, 1 reply; 52+ messages in thread
From: Alexei Starovoitov @ 2022-10-19 16:31 UTC (permalink / raw)
  To: Kumar Kartikeya Dwivedi
  Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Dave Marchevsky, Delyan Kratunov

On Tue, Oct 18, 2022 at 10:58 PM Kumar Kartikeya Dwivedi
<memxor@gmail.com> wrote:
>
> On Wed, Oct 19, 2022 at 08:01:24AM IST, Alexei Starovoitov wrote:
> > On Thu, Oct 13, 2022 at 11:52:57AM +0530, Kumar Kartikeya Dwivedi wrote:
> > > +void *bpf_kptr_new_impl(u64 local_type_id__k, u64 flags, void *meta__ign)
> > > +{
> > > +   struct btf_struct_meta *meta = meta__ign;
> > > +   u64 size = local_type_id__k;
> > > +   void *p;
> > > +
> > > +   if (unlikely(flags || !bpf_global_ma_set))
> > > +           return NULL;
> >
> > Unused 'flags' looks weird in unstable api. Just drop it?
> > And keep it as:
> > void *bpf_kptr_new(u64 local_type_id__k, struct btf_struct_meta *meta__ign);
> >
> > and in bpf_experimental.h:
> >
> > extern void *bpf_kptr_new(__u64 local_type_id) __ksym;
> >
> > since __ign args are ignored during kfunc type match
> > the bpf progs can use it without #define.
> >
>
> It's ignored during check_kfunc_call, but libbpf doesn't ignore that. The
> prototypes will not be the same. I guess I'll have to teach it to do that during
> type match, but IDK how you feel about that.

libbpf does the full type match, really?
Could you point me to the code?

> Otherwise unless you want people to manually pass something to the ignored
> argument, we have to hide it behind a macro.
>
> I actually like the macro on top, then I don't even pass the type ID but the
> type. But that's a personal preference, and I don't feel strongly about it.
>
> So in C one does malloc(sizeof(*p)), here we'll just write
> bpf_kptr_new(typeof(*p)). YMMV.

bpf_kptr_new(typeof(*p)) is cleaner.

> > > +   p = bpf_mem_alloc(&bpf_global_ma, size);
> > > +   if (!p)
> > > +           return NULL;
> > > +   if (meta)
> > > +           bpf_obj_init(meta->off_arr, p);
> >
> > I'm starting to dislike all that _arr and _tab suffixes in the verifier code base.
> > It reminds me of programming style where people tried to add types into
> > variable names. imo dropping _arr would be just fine.
>
> Ack, I'll do it in v3.
>
> Also, I'd like to invite people to please bikeshed a bit over the naming of the
> APIs, e.g. whether it should be bpf_kptr_drop vs bpf_kptr_delete.

bpf_kptr_drop is more precise.
delete assumes instant free which is not the case here.

How about
extern void *__bpf_obj_new(__u64 local_type_id) __ksym;
extern void bpf_obj_drop(void *obj) __ksym;
#define bpf_obj_new(t) \
 ((t *)__bpf_obj_new(bpf_core_type_id_local(t)))

kptr means 'kernel pointer'.
Here we have program supplied object.
It feels 'obj' is better than 'kptr' in this context.

> In the BPF list API, it's named bpf_list_del but it's actually distinct from how
> list_del in the kernel works. So it does make sense to give them a different
> name (like pop_front/pop_back and push_front/push_back)?
>
> Because even bpf_list_add takes bpf_list_head, in the kernel there's no
> distinction between node and head, so you can do list_add on a node as well, but
> it won't be possible with the kfunc (unless we overload the head argument to
> also work with nodes).
>
> Later we'll probably have to add bpf_list_node_add etc. that add before or after
> a node to make that work.
>
> The main question is whether it should closely resemble the linked list API in
> the kernel, or can it steer away considerably from that?

If we do doubly linked list we should allow delete in
the middle with
bpf_list_del_any(head, node)

and
bpf_list_pop_front/pop_back(head)

bpf_list_add(node, head) would match kernel style,
but I think it's cleaner to have head as 1st arg.
In that sense new pop/push/_front/_back are cleaner.
And similar for rbtree.

If we keep (node, head) and (rb_node, rb_root) order
we should keep kernel names.
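
As a rough illustration of the head-first push/pop naming, here is a userspace sketch of an intrusive doubly linked list. None of this is the kernel or kfunc implementation: struct list_head, struct list_node and the list_* helpers below are hypothetical stand-ins, and ownership and locking, which the verifier would enforce for the real API, are ignored entirely.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins; not the BPF kfunc implementation. */
struct list_node {
	struct list_node *next, *prev;
};

struct list_head {
	struct list_node root;	/* sentinel: root.next is front, root.prev is back */
};

/* Recover the containing object from its embedded node. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

static void list_init(struct list_head *head)
{
	head->root.next = head->root.prev = &head->root;
}

static void link_between(struct list_node *node, struct list_node *prev,
			 struct list_node *next)
{
	node->prev = prev;
	node->next = next;
	prev->next = node;
	next->prev = node;
}

/* Head is always the first argument, unlike list_add(node, head). */
static void list_push_front(struct list_head *head, struct list_node *node)
{
	assert(head && node);
	link_between(node, &head->root, head->root.next);
}

static void list_push_back(struct list_head *head, struct list_node *node)
{
	assert(head && node);
	link_between(node, head->root.prev, &head->root);
}

static struct list_node *unlink_node(struct list_head *head,
				     struct list_node *node)
{
	if (node == &head->root)	/* only the sentinel left: list is empty */
		return NULL;
	node->prev->next = node->next;
	node->next->prev = node->prev;
	node->next = node->prev = NULL;
	return node;
}

static struct list_node *list_pop_front(struct list_head *head)
{
	return unlink_node(head, head->root.next);
}

static struct list_node *list_pop_back(struct list_head *head)
{
	return unlink_node(head, head->root.prev);
}

/* Element type from the thread's example. */
struct foo {
	struct list_node node;
	int data;
};
```

Because the head and node types are distinct here, pushing onto a node is a compile-time type error, which mirrors the distinction the kfunc API makes and the kernel's list_add does not.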

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH bpf-next v2 10/25] bpf: Introduce local kptrs
  2022-10-13  6:22 ` [PATCH bpf-next v2 10/25] bpf: Introduce local kptrs Kumar Kartikeya Dwivedi
@ 2022-10-19 17:15   ` Dave Marchevsky
  2022-10-20  0:48     ` Kumar Kartikeya Dwivedi
  2022-10-25 16:32   ` Dave Marchevsky
  1 sibling, 1 reply; 52+ messages in thread
From: Dave Marchevsky @ 2022-10-19 17:15 UTC (permalink / raw)
  To: Kumar Kartikeya Dwivedi, bpf
  Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann, Delyan Kratunov

On 10/13/22 2:22 AM, Kumar Kartikeya Dwivedi wrote:
> Introduce the idea of local kptrs, i.e. PTR_TO_BTF_ID that point to a
> type in program BTF. This is indicated by the presence of MEM_TYPE_LOCAL
> type tag in reg->type to avoid having to check btf_is_kernel when trying
> to match argument types in helpers.
> 
> For now, these local kptrs will always be referenced in verifier
> context, hence ref_obj_id == 0 for them is a bug. It is allowed to write
> to such objects, as long as fields that are special are not touched
> (support for which will be added in subsequent patches).
> 
> No PROBE_MEM handling is hence done since they can never be in an
> undefined state, and their lifetime will always be valid.
> 
> Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
> ---
>  include/linux/bpf.h              | 14 +++++++++++---
>  include/linux/filter.h           |  4 +++-
>  kernel/bpf/btf.c                 |  9 ++++++++-
>  kernel/bpf/verifier.c            | 15 ++++++++++-----
>  net/bpf/bpf_dummy_struct_ops.c   |  3 ++-
>  net/core/filter.c                | 13 ++++++++-----
>  net/ipv4/bpf_tcp_ca.c            |  3 ++-
>  net/netfilter/nf_conntrack_bpf.c |  1 +
>  8 files changed, 45 insertions(+), 17 deletions(-)
> 
> diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> index 46330d871d4e..a2f4d3356cc8 100644
> --- a/include/linux/bpf.h
> +++ b/include/linux/bpf.h
> @@ -526,6 +526,11 @@ enum bpf_type_flag {
>  	/* Size is known at compile time. */
>  	MEM_FIXED_SIZE		= BIT(10 + BPF_BASE_TYPE_BITS),
>  
> +	/* MEM is of a type from program BTF, not kernel BTF. This is used to
> +	 * tag PTR_TO_BTF_ID allocated using bpf_kptr_alloc.
> +	 */
> +	MEM_TYPE_LOCAL		= BIT(11 + BPF_BASE_TYPE_BITS),
> +
>  	__BPF_TYPE_FLAG_MAX,
>  	__BPF_TYPE_LAST_FLAG	= __BPF_TYPE_FLAG_MAX - 1,
>  };
> @@ -774,6 +779,7 @@ struct bpf_prog_ops {
>  			union bpf_attr __user *uattr);
>  };
>  
> +struct bpf_reg_state;
>  struct bpf_verifier_ops {
>  	/* return eBPF function prototype for verification */
>  	const struct bpf_func_proto *
> @@ -795,6 +801,7 @@ struct bpf_verifier_ops {
>  				  struct bpf_insn *dst,
>  				  struct bpf_prog *prog, u32 *target_size);
>  	int (*btf_struct_access)(struct bpf_verifier_log *log,
> +				 const struct bpf_reg_state *reg,

Not that the struct_ops API is meant to be stable, but it would be good to note
in the summary that this changes that API.

On that note, maybe passing the whole bpf_reg_state *reg can be avoided for now
by making this a 'bool disallow_ptr_walk' or similar, since that's the only
thing this patch is using it for.

>  				 const struct btf *btf,
>  				 const struct btf_type *t, int off, int size,
>  				 enum bpf_access_type atype,
> @@ -2076,10 +2083,11 @@ static inline bool bpf_tracing_btf_ctx_access(int off, int size,
>  	return btf_ctx_access(off, size, type, prog, info);
>  }
>  
> -int btf_struct_access(struct bpf_verifier_log *log, const struct btf *btf,
> +int btf_struct_access(struct bpf_verifier_log *log,
> +		      const struct bpf_reg_state *reg, const struct btf *btf,
>  		      const struct btf_type *t, int off, int size,
> -		      enum bpf_access_type atype,
> -		      u32 *next_btf_id, enum bpf_type_flag *flag);
> +		      enum bpf_access_type atype, u32 *next_btf_id,
> +		      enum bpf_type_flag *flag);
>  bool btf_struct_ids_match(struct bpf_verifier_log *log,
>  			  const struct btf *btf, u32 id, int off,
>  			  const struct btf *need_btf, u32 need_type_id,
> diff --git a/include/linux/filter.h b/include/linux/filter.h
> index efc42a6e3aed..9b94e24f90b9 100644
> --- a/include/linux/filter.h
> +++ b/include/linux/filter.h
> @@ -568,7 +568,9 @@ struct sk_filter {
>  DECLARE_STATIC_KEY_FALSE(bpf_stats_enabled_key);
>  
>  extern struct mutex nf_conn_btf_access_lock;
> -extern int (*nfct_btf_struct_access)(struct bpf_verifier_log *log, const struct btf *btf,
> +extern int (*nfct_btf_struct_access)(struct bpf_verifier_log *log,
> +				     const struct bpf_reg_state *reg,
> +				     const struct btf *btf,
>  				     const struct btf_type *t, int off, int size,
>  				     enum bpf_access_type atype, u32 *next_btf_id,
>  				     enum bpf_type_flag *flag);
> diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c
> index 066984d73a8b..65f444405d9c 100644
> --- a/kernel/bpf/btf.c
> +++ b/kernel/bpf/btf.c
> @@ -6019,11 +6019,13 @@ static int btf_struct_walk(struct bpf_verifier_log *log, const struct btf *btf,
>  	return -EINVAL;
>  }
>  
> -int btf_struct_access(struct bpf_verifier_log *log, const struct btf *btf,
> +int btf_struct_access(struct bpf_verifier_log *log,
> +		      const struct bpf_reg_state *reg, const struct btf *btf,
>  		      const struct btf_type *t, int off, int size,
>  		      enum bpf_access_type atype __maybe_unused,
>  		      u32 *next_btf_id, enum bpf_type_flag *flag)
>  {
> +	bool local_type = reg && (type_flag(reg->type) & MEM_TYPE_LOCAL);
>  	enum bpf_type_flag tmp_flag = 0;
>  	int err;
>  	u32 id;
> @@ -6033,6 +6035,11 @@ int btf_struct_access(struct bpf_verifier_log *log, const struct btf *btf,
>  
>  		switch (err) {
>  		case WALK_PTR:
> +			/* For local types, the destination register cannot
> +			 * become a pointer again.
> +			 */
> +			if (local_type)
> +				return SCALAR_VALUE;
>  			/* If we found the pointer or scalar on t+off,
>  			 * we're done.
>  			 */
> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> index 3c47cecda302..6ee8c06c2080 100644
> --- a/kernel/bpf/verifier.c
> +++ b/kernel/bpf/verifier.c
> @@ -4522,16 +4522,20 @@ static int check_ptr_to_btf_access(struct bpf_verifier_env *env,
>  		return -EACCES;
>  	}
>  
> -	if (env->ops->btf_struct_access) {
> -		ret = env->ops->btf_struct_access(&env->log, reg->btf, t,
> +	if (env->ops->btf_struct_access && !(type_flag(reg->type) & MEM_TYPE_LOCAL)) {
> +		WARN_ON_ONCE(!btf_is_kernel(reg->btf));
> +		ret = env->ops->btf_struct_access(&env->log, reg, reg->btf, t,
>  						  off, size, atype, &btf_id, &flag);
>  	} else {
> -		if (atype != BPF_READ) {
> +		if (atype != BPF_READ && !(type_flag(reg->type) & MEM_TYPE_LOCAL)) {
>  			verbose(env, "only read is supported\n");
>  			return -EACCES;
>  		}
>  
> -		ret = btf_struct_access(&env->log, reg->btf, t, off, size,
> +		if (reg->type & MEM_TYPE_LOCAL)
> +			WARN_ON_ONCE(!reg->ref_obj_id);

Can we instead verbose(env, ...) and return error? Then when someone tries to
add local kptrs that don't set ref_obj_id in the future, it'll be more obvious
that this wasn't explicitly supported and they need to check verifier logic
carefully. Also rest of check_ptr_to_btf_access checks do verbose + err.

Similar for btf_is_kernel WARN above.
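
A rough userspace sketch of the "log and return an error" shape suggested here, as opposed to a warning that lets verification continue. Everything below is a simplified stand-in: struct reg_state, check_local_kptr_reg() and the log text are hypothetical, and the constants only imitate the base-type/flag split that type_flag() relies on (the flag bit position follows the patch, BIT(11 + BPF_BASE_TYPE_BITS)).

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdio.h>

/* Low bits hold the base type, higher bits hold flags (simplified). */
#define BASE_TYPE_BITS	8
#define BASE_TYPE_MASK	((1u << BASE_TYPE_BITS) - 1)
#define TYPE_FLAG(n)	(1u << ((n) + BASE_TYPE_BITS))

enum { SCALAR_VALUE = 1, PTR_TO_BTF_ID = 2 };	/* illustrative values only */
#define MEM_TYPE_LOCAL	TYPE_FLAG(11)

/* Hypothetical, heavily reduced register state. */
struct reg_state {
	unsigned int type;
	unsigned int ref_obj_id;
};

static unsigned int base_type(unsigned int type)
{
	return type & BASE_TYPE_MASK;
}

static unsigned int type_flag(unsigned int type)
{
	return type & ~BASE_TYPE_MASK;
}

/* Reject an unreferenced local kptr with a message and an error code,
 * rather than warning and carrying on.
 */
static int check_local_kptr_reg(const struct reg_state *reg,
				char *log, size_t log_sz)
{
	assert(reg && log);
	if (!(type_flag(reg->type) & MEM_TYPE_LOCAL))
		return 0;	/* not a local kptr, nothing to check here */
	if (!reg->ref_obj_id) {
		snprintf(log, log_sz, "local kptr must be referenced\n");
		return -EACCES;
	}
	return 0;
}
```

With this shape, a future change that introduces unreferenced local kptrs fails verification with a readable message instead of tripping a one-time warning.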

> +
> +		ret = btf_struct_access(&env->log, reg, reg->btf, t, off, size,


More re: passing the entire reg state to btf_struct_access:

In the next patch in the series ("bpf: Recognize bpf_{spin_lock,list_head,list_node} in local kptrs")
you do btf_find_struct_meta(btf, reg->btf_id). I see why you couldn't use 't'
that's passed in here / elsewhere since you need the btf_id for meta lookup.
Perhaps 'btf_type *t' param can be changed to btf_id, eliminating the need
to pass 'reg'.

Alternatively, since we're already passing reg->btf and result of
btf_type_by_id(reg->btf, reg->btf_id), seems like btf_struct_access
maybe is tied closely enough to reg state that passing reg state
directly and getting rid of extraneous args is cleaner.

>  					atype, &btf_id, &flag);
>  	}
>  
> @@ -4596,7 +4600,7 @@ static int check_ptr_to_map_access(struct bpf_verifier_env *env,
>  		return -EACCES;
>  	}
>  
> -	ret = btf_struct_access(&env->log, btf_vmlinux, t, off, size, atype, &btf_id, &flag);
> +	ret = btf_struct_access(&env->log, NULL, btf_vmlinux, t, off, size, atype, &btf_id, &flag);
>  	if (ret < 0)
>  		return ret;
>  
> @@ -5816,6 +5820,7 @@ int check_func_arg_reg_off(struct bpf_verifier_env *env,
>  	 * fixed offset.
>  	 */
>  	case PTR_TO_BTF_ID:
> +	case PTR_TO_BTF_ID | MEM_TYPE_LOCAL:
>  		/* When referenced PTR_TO_BTF_ID is passed to release function,
>  		 * it's fixed offset must be 0.	In the other cases, fixed offset
>  		 * can be non-zero.
> diff --git a/net/bpf/bpf_dummy_struct_ops.c b/net/bpf/bpf_dummy_struct_ops.c
> index e78dadfc5829..d7aa636d90ce 100644
> --- a/net/bpf/bpf_dummy_struct_ops.c
> +++ b/net/bpf/bpf_dummy_struct_ops.c
> @@ -156,6 +156,7 @@ static bool bpf_dummy_ops_is_valid_access(int off, int size,
>  }
>  
>  static int bpf_dummy_ops_btf_struct_access(struct bpf_verifier_log *log,
> +					   const struct bpf_reg_state *reg,
>  					   const struct btf *btf,
>  					   const struct btf_type *t, int off,
>  					   int size, enum bpf_access_type atype,
> @@ -177,7 +178,7 @@ static int bpf_dummy_ops_btf_struct_access(struct bpf_verifier_log *log,
>  		return -EACCES;
>  	}
>  
> -	err = btf_struct_access(log, btf, t, off, size, atype, next_btf_id,
> +	err = btf_struct_access(log, reg, btf, t, off, size, atype, next_btf_id,
>  				flag);
>  	if (err < 0)
>  		return err;
> diff --git a/net/core/filter.c b/net/core/filter.c
> index bb0136e7a8e4..cc7af7be91d9 100644
> --- a/net/core/filter.c
> +++ b/net/core/filter.c
> @@ -8647,13 +8647,15 @@ static bool tc_cls_act_is_valid_access(int off, int size,
>  DEFINE_MUTEX(nf_conn_btf_access_lock);
>  EXPORT_SYMBOL_GPL(nf_conn_btf_access_lock);
>  
> -int (*nfct_btf_struct_access)(struct bpf_verifier_log *log, const struct btf *btf,
> +int (*nfct_btf_struct_access)(struct bpf_verifier_log *log,
> +			      const struct bpf_reg_state *reg, const struct btf *btf,
>  			      const struct btf_type *t, int off, int size,
>  			      enum bpf_access_type atype, u32 *next_btf_id,
>  			      enum bpf_type_flag *flag);
>  EXPORT_SYMBOL_GPL(nfct_btf_struct_access);
>  
>  static int tc_cls_act_btf_struct_access(struct bpf_verifier_log *log,
> +					const struct bpf_reg_state *reg,
>  					const struct btf *btf,
>  					const struct btf_type *t, int off,
>  					int size, enum bpf_access_type atype,
> @@ -8663,12 +8665,12 @@ static int tc_cls_act_btf_struct_access(struct bpf_verifier_log *log,
>  	int ret = -EACCES;
>  
>  	if (atype == BPF_READ)
> -		return btf_struct_access(log, btf, t, off, size, atype, next_btf_id,
> +		return btf_struct_access(log, reg, btf, t, off, size, atype, next_btf_id,
>  					 flag);
>  
>  	mutex_lock(&nf_conn_btf_access_lock);
>  	if (nfct_btf_struct_access)
> -		ret = nfct_btf_struct_access(log, btf, t, off, size, atype, next_btf_id, flag);
> +		ret = nfct_btf_struct_access(log, reg, btf, t, off, size, atype, next_btf_id, flag);
>  	mutex_unlock(&nf_conn_btf_access_lock);
>  
>  	return ret;
> @@ -8734,6 +8736,7 @@ void bpf_warn_invalid_xdp_action(struct net_device *dev, struct bpf_prog *prog,
>  EXPORT_SYMBOL_GPL(bpf_warn_invalid_xdp_action);
>  
>  static int xdp_btf_struct_access(struct bpf_verifier_log *log,
> +				 const struct bpf_reg_state *reg,
>  				 const struct btf *btf,
>  				 const struct btf_type *t, int off,
>  				 int size, enum bpf_access_type atype,
> @@ -8743,12 +8746,12 @@ static int xdp_btf_struct_access(struct bpf_verifier_log *log,
>  	int ret = -EACCES;
>  
>  	if (atype == BPF_READ)
> -		return btf_struct_access(log, btf, t, off, size, atype, next_btf_id,
> +		return btf_struct_access(log, reg, btf, t, off, size, atype, next_btf_id,
>  					 flag);
>  
>  	mutex_lock(&nf_conn_btf_access_lock);
>  	if (nfct_btf_struct_access)
> -		ret = nfct_btf_struct_access(log, btf, t, off, size, atype, next_btf_id, flag);
> +		ret = nfct_btf_struct_access(log, reg, btf, t, off, size, atype, next_btf_id, flag);
>  	mutex_unlock(&nf_conn_btf_access_lock);
>  
>  	return ret;
> diff --git a/net/ipv4/bpf_tcp_ca.c b/net/ipv4/bpf_tcp_ca.c
> index 6da16ae6a962..1fe3935c4260 100644
> --- a/net/ipv4/bpf_tcp_ca.c
> +++ b/net/ipv4/bpf_tcp_ca.c
> @@ -69,6 +69,7 @@ static bool bpf_tcp_ca_is_valid_access(int off, int size,
>  }
>  
>  static int bpf_tcp_ca_btf_struct_access(struct bpf_verifier_log *log,
> +					const struct bpf_reg_state *reg,
>  					const struct btf *btf,
>  					const struct btf_type *t, int off,
>  					int size, enum bpf_access_type atype,
> @@ -78,7 +79,7 @@ static int bpf_tcp_ca_btf_struct_access(struct bpf_verifier_log *log,
>  	size_t end;
>  
>  	if (atype == BPF_READ)
> -		return btf_struct_access(log, btf, t, off, size, atype, next_btf_id,
> +		return btf_struct_access(log, reg, btf, t, off, size, atype, next_btf_id,
>  					 flag);
>  
>  	if (t != tcp_sock_type) {
> diff --git a/net/netfilter/nf_conntrack_bpf.c b/net/netfilter/nf_conntrack_bpf.c
> index 8639e7efd0e2..f6036a84484b 100644
> --- a/net/netfilter/nf_conntrack_bpf.c
> +++ b/net/netfilter/nf_conntrack_bpf.c
> @@ -191,6 +191,7 @@ BTF_ID(struct, nf_conn___init)
>  
>  /* Check writes into `struct nf_conn` */
>  static int _nf_conntrack_btf_struct_access(struct bpf_verifier_log *log,
> +					   const struct bpf_reg_state *reg,
>  					   const struct btf *btf,
>  					   const struct btf_type *t, int off,
>  					   int size, enum bpf_access_type atype,

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH bpf-next v2 06/25] bpf: Refactor kptr_off_tab into fields_tab
  2022-10-19 15:54       ` Alexei Starovoitov
@ 2022-10-19 23:57         ` Kumar Kartikeya Dwivedi
  0 siblings, 0 replies; 52+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2022-10-19 23:57 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Dave Marchevsky, Delyan Kratunov

On Wed, Oct 19, 2022 at 09:24:18PM IST, Alexei Starovoitov wrote:
> On Tue, Oct 18, 2022 at 10:43 PM Kumar Kartikeya Dwivedi
> <memxor@gmail.com> wrote:
> >
> > On Wed, Oct 19, 2022 at 07:05:26AM IST, Alexei Starovoitov wrote:
> > > On Thu, Oct 13, 2022 at 11:52:44AM +0530, Kumar Kartikeya Dwivedi wrote:
> > > > To prepare the BPF verifier to handle special fields in both map values
> > > > and program allocated types coming from program BTF, we need to refactor
> > > > the kptr_off_tab handling code into something more generic and reusable
> > > > across both cases to avoid code duplication.
> > > >
> > > > Later patches also require passing this data to helpers at runtime, so
> > > > that they can work on user defined types, initialize them, destruct
> > > > them, etc.
> > > >
> > > > The main observation is that both map values and such allocated types
> > > > point to a type in program BTF, hence they can be handled similarly. We
> > > > can prepare a field metadata table for both cases and store them in
> > > > struct bpf_map or struct btf depending on the use case.
> > > >
> > > > Hence, refactor the code into generic btf_type_fields and btf_field
> > > > member structs. The btf_type_fields represents the fields of a specific
> > > > btf_type in user BTF. The cnt indicates the number of special fields we
> > > > successfully recognized, and field_mask is a bitmask of fields that were
> > > > found, to enable quick determination of availability of a certain field.
> > > >
> > > > Subsequently, refactor the rest of the code to work with these generic
> > > > types, remove assumptions about kptr and kptr_off_tab, rename variables
> > > > to more meaningful names, etc.
> > > >
> > > > Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
> > > > ---
> > > >  include/linux/bpf.h     | 103 +++++++++++++-------
> > > >  include/linux/btf.h     |   4 +-
> > > >  kernel/bpf/arraymap.c   |  13 ++-
> > > >  kernel/bpf/btf.c        |  64 ++++++-------
> > > >  kernel/bpf/hashtab.c    |  14 ++-
> > > >  kernel/bpf/map_in_map.c |  13 ++-
> > > >  kernel/bpf/syscall.c    | 203 +++++++++++++++++++++++-----------------
> > > >  kernel/bpf/verifier.c   |  96 ++++++++++---------
> > > >  8 files changed, 289 insertions(+), 221 deletions(-)
> > > >
> > > > diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> > > > index 9e7d46d16032..25e77a172d7c 100644
> > > > --- a/include/linux/bpf.h
> > > > +++ b/include/linux/bpf.h
> > > > @@ -164,35 +164,41 @@ struct bpf_map_ops {
> > > >  };
> > > >
> > > >  enum {
> > > > -   /* Support at most 8 pointers in a BPF map value */
> > > > -   BPF_MAP_VALUE_OFF_MAX = 8,
> > > > -   BPF_MAP_OFF_ARR_MAX   = BPF_MAP_VALUE_OFF_MAX +
> > > > +   /* Support at most 8 pointers in a BTF type */
> > > > +   BTF_FIELDS_MAX        = 8,
> > > > +   BPF_MAP_OFF_ARR_MAX   = BTF_FIELDS_MAX +
> > > >                             1 + /* for bpf_spin_lock */
> > > >                             1,  /* for bpf_timer */
> > > >  };
> > > >
> > > > -enum bpf_kptr_type {
> > > > -   BPF_KPTR_UNREF,
> > > > -   BPF_KPTR_REF,
> > > > +enum btf_field_type {
> > > > +   BPF_KPTR_UNREF = (1 << 2),
> > > > +   BPF_KPTR_REF   = (1 << 3),
> > > > +   BPF_KPTR       = BPF_KPTR_UNREF | BPF_KPTR_REF,
> > > >  };
> > > >
> > > > -struct bpf_map_value_off_desc {
> > > > +struct btf_field_kptr {
> > > > +   struct btf *btf;
> > > > +   struct module *module;
> > > > +   btf_dtor_kfunc_t dtor;
> > > > +   u32 btf_id;
> > > > +};
> > > > +
> > > > +struct btf_field {
> > > >     u32 offset;
> > > > -   enum bpf_kptr_type type;
> > > > -   struct {
> > > > -           struct btf *btf;
> > > > -           struct module *module;
> > > > -           btf_dtor_kfunc_t dtor;
> > > > -           u32 btf_id;
> > > > -   } kptr;
> > > > +   enum btf_field_type type;
> > > > +   union {
> > > > +           struct btf_field_kptr kptr;
> > > > +   };
> > > >  };
> > > >
> > > > -struct bpf_map_value_off {
> > > > -   u32 nr_off;
> > > > -   struct bpf_map_value_off_desc off[];
> > > > +struct btf_type_fields {
> > >
> > > How about btf_record instead ?
> > > Then btf_type_fields_has_field() will become btf_record_has_field() ?
> > >
> >
> > I guess btf_record is ok. I thought of just making it btf_fields, but then
> > bpf_map_free_fields (for freeing this struct) and bpf_obj_free_fields (for
> > freeing actual fields of object) gets confusing.
> >
> > Or to be more precise I could name the struct btf_type_record,
> > but the member variable record in all places.
>
> What "_type_" prefix adds to btf_record ?
>
> btf already has Type in the abbrev.
>

Well, it's the record of a btf_type, so btf_type_record.

> And from the other email:
>
> > I agree, what do you think of calling it btf_type_has_field? You pass in the
> > btf_type_record and the field type.
>
> btf_type_has_field doesn't sound right.

Ok, let's go with just the btf_record naming.
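
For reference, a small userspace sketch of how a field_mask lets btf_record_has_field() answer without walking the field array. The enum values are taken from the patch quoted above; the struct layout, the fixed-size array and btf_record_add_field() are assumptions made only for this illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define FIELDS_MAX 8	/* mirrors BTF_FIELDS_MAX from the patch */

/* Values as in the patch: one bit per field kind. */
enum btf_field_type {
	BPF_KPTR_UNREF = (1 << 2),
	BPF_KPTR_REF   = (1 << 3),
	BPF_KPTR       = BPF_KPTR_UNREF | BPF_KPTR_REF,
};

struct btf_field {
	uint32_t offset;
	enum btf_field_type type;
};

/* Assumed layout for this sketch; the kernel struct may differ. */
struct btf_record {
	uint32_t cnt;		/* number of recognized special fields */
	uint32_t field_mask;	/* OR of the types of all fields found */
	struct btf_field fields[FIELDS_MAX];
};

/* A single mask test; a NULL record simply has no special fields. */
static bool btf_record_has_field(const struct btf_record *rec,
				 enum btf_field_type type)
{
	return rec && (rec->field_mask & type) != 0;
}

/* Hypothetical helper that keeps cnt and field_mask in sync. */
static int btf_record_add_field(struct btf_record *rec, uint32_t offset,
				enum btf_field_type type)
{
	assert(rec);
	if (rec->cnt >= FIELDS_MAX)
		return -1;
	rec->fields[rec->cnt].offset = offset;
	rec->fields[rec->cnt].type = type;
	rec->cnt++;
	rec->field_mask |= type;
	return 0;
}
```

Passing the combined BPF_KPTR value asks "is there any kptr, referenced or not", while a single bit asks for one specific kind.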

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH bpf-next v2 09/25] bpf: Support bpf_list_head in map values
  2022-10-19 15:57       ` Alexei Starovoitov
@ 2022-10-19 23:59         ` Kumar Kartikeya Dwivedi
  0 siblings, 0 replies; 52+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2022-10-19 23:59 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Dave Marchevsky, Delyan Kratunov

On Wed, Oct 19, 2022 at 09:27:57PM IST, Alexei Starovoitov wrote:
> On Tue, Oct 18, 2022 at 10:48 PM Kumar Kartikeya Dwivedi
> <memxor@gmail.com> wrote:
> >
> > On Wed, Oct 19, 2022 at 07:29:16AM IST, Alexei Starovoitov wrote:
> > > On Thu, Oct 13, 2022 at 11:52:47AM +0530, Kumar Kartikeya Dwivedi wrote:
> > > > Add the basic support on the map side to parse, recognize, verify, and
> > > > build metadata table for a new special field of the type struct
> > > > bpf_list_head. To parameterize the bpf_list_head for a certain value
> > > > type and the list_node member it will accept in that value type, we use
> > > > BTF declaration tags.
> > > >
> > > > The definition of bpf_list_head in a map value will be done as follows:
> > > >
> > > > struct foo {
> > > >     struct bpf_list_node node;
> > > >     int data;
> > > > };
> > > >
> > > > struct map_value {
> > > >     struct bpf_list_head head __contains(foo, node);
> > > > };
> > > >
> > > > Then, the bpf_list_head only allows adding to the list 'head' using the
> > > > bpf_list_node 'node' for the type struct foo.
> > > >
> > > > The 'contains' annotation is a BTF declaration tag composed of four
> > > > parts, "contains:kind:name:node" where the kind and name is then used to
> > > > look up the type in the map BTF. The node defines name of the member in
> > > > this type that has the type struct bpf_list_node, which is actually used
> > > > for linking into the linked list. For now, 'kind' part is hardcoded as
> > > > struct.
> > >
> > > ...
> > >
> > > > +   value_type = btf_find_decl_tag_value(btf, pt, comp_idx, "contains:");
> > > > +   if (!value_type)
> > > > +           return -EINVAL;
> > > > +   if (strncmp(value_type, "struct:", sizeof("struct:") - 1))
> > > > +           return -EINVAL;
> > > > +   value_type += sizeof("struct:") - 1;
> > >
> > > I don't get it.
> > > The patch 24 does:
> > > +#define __contains(name, node) __attribute__((btf_decl_tag("contains:struct:" #name ":" #node)))
> > >
> > > The 'struct:' part is invisible to users. They won't make a mistake.
> > > Why bother adding it to BTF and then check for it?
> > > Backward compat concerns?
> > > But it's in bpf_experimental.h.
> > > That will probably be the last thing to change, and it is easy to do.
> > > Please drop it?
> > >
> >
> > Fair, I just left it there anticipating at least union with a discriminant might
> > be a possible candidate, but since this is all unstable it's not a big deal.
> >
> > > > diff --git a/tools/testing/selftests/bpf/bpf_experimental.h b/tools/testing/selftests/bpf/bpf_experimental.h
> > > > new file mode 100644
> > > > index 000000000000..4e31790e433d
> > > > --- /dev/null
> > > > +++ b/tools/testing/selftests/bpf/bpf_experimental.h
> > > > @@ -0,0 +1,23 @@
> > > > +#ifndef __KERNEL__
> > > > +
> > > > +#include <vmlinux.h>
> > > > +#include <bpf/bpf_tracing.h>
> > > > +#include <bpf/bpf_helpers.h>
> > > > +#include <bpf/bpf_core_read.h>
> > > > +
> > >
> > > Why bother with the above?
> > > The below should be enough ?
> > >
> >
> > Actually, I'm using this header inside the kernel, userspace, and BPF programs.
> > In the kernel to provide type definitions for bpf_list_head and bpf_list_node,
> > which are then emitted to vmlinux.h (and also used inside the kernel of course).
> >
> > In userspace for these types as otherwise including skeleton fails to build, as
> > such types are global variables, but there I have to define __KERNEL__ around
> > include.
> >
> > In the BPF program, for the kfunc declarations.
> >
> > I guess I can split the header into two to avoid confusion. I agree it's a bit
> > ugly.
>
> I think we can add bpf_list_head and bpf_list_node to uapi/bpf.h
> The chances of them changing the size are pretty low.

Sounds good to me, the rest I'll keep in bpf_experimental.h.

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH bpf-next v2 19/25] bpf: Introduce bpf_kptr_new
  2022-10-19 16:31       ` Alexei Starovoitov
@ 2022-10-20  0:44         ` Kumar Kartikeya Dwivedi
  2022-10-20  1:11           ` Alexei Starovoitov
  0 siblings, 1 reply; 52+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2022-10-20  0:44 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Dave Marchevsky, Delyan Kratunov

On Wed, Oct 19, 2022 at 10:01:21PM IST, Alexei Starovoitov wrote:
> On Tue, Oct 18, 2022 at 10:58 PM Kumar Kartikeya Dwivedi
> <memxor@gmail.com> wrote:
> >
> > On Wed, Oct 19, 2022 at 08:01:24AM IST, Alexei Starovoitov wrote:
> > > On Thu, Oct 13, 2022 at 11:52:57AM +0530, Kumar Kartikeya Dwivedi wrote:
> > > > +void *bpf_kptr_new_impl(u64 local_type_id__k, u64 flags, void *meta__ign)
> > > > +{
> > > > +   struct btf_struct_meta *meta = meta__ign;
> > > > +   u64 size = local_type_id__k;
> > > > +   void *p;
> > > > +
> > > > +   if (unlikely(flags || !bpf_global_ma_set))
> > > > +           return NULL;
> > >
> > > Unused 'flags' looks weird in unstable api. Just drop it?
> > > And keep it as:
> > > void *bpf_kptr_new(u64 local_type_id__k, struct btf_struct_meta *meta__ign);
> > >
> > > and in bpf_experimental.h:
> > >
> > > extern void *bpf_kptr_new(__u64 local_type_id) __ksym;
> > >
> > > since __ign args are ignored during kfunc type match
> > > the bpf progs can use it without #define.
> > >
> >
> > It's ignored during check_kfunc_call, but libbpf doesn't ignore that. The
> > prototypes will not be the same. I guess I'll have to teach it to do that during
> > type match, but IDK how you feel about that.
>
> libbpf does the full type match, really?
> Could you point me to the code?
>

Not a full type match, but the number of arguments must be the same, so it won't
allow having the kfunc as:

void *bpf_kptr_new(u64 local_type_id__k, struct btf_struct_meta *meta__ign);

in the kernel and ksym declaration in the program as:

extern void *bpf_kptr_new(__u64 local_type_id) __ksym;

I get:

libbpf: extern (func ksym) 'bpf_kptr_new_impl': func_proto [25] incompatible with kernel [60043]

vlen of func_proto in kernel type is 2, for us it will be 1.
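
To make the failure mode concrete, here is a rough userspace model of the
arity check. It is only a sketch: toy_func_proto and toy_protos_compatible are
made-up names, and the real check in libbpf compares BTF func_proto types, not
this toy struct.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a BTF FUNC_PROTO: only the parameter count (vlen) is
 * modelled; real BTF also records the type of every parameter.
 */
struct toy_func_proto {
	const char *name;
	unsigned int vlen;	/* number of parameters */
};

/* Hypothetical check mirroring the behaviour described above: prototypes
 * with different parameter counts are rejected outright.
 */
static bool toy_protos_compatible(const struct toy_func_proto *local,
				  const struct toy_func_proto *kernel)
{
	assert(local != NULL && kernel != NULL);
	return local->vlen == kernel->vlen;
}

/* Program side: bpf_kptr_new(__u64 local_type_id), one parameter. */
static const struct toy_func_proto prog_decl = { "bpf_kptr_new", 1 };
/* Kernel side: bpf_kptr_new(u64 local_type_id__k, meta__ign), two. */
static const struct toy_func_proto kernel_kfunc = { "bpf_kptr_new", 2 };
```

With these two prototypes the check fails, which is the userspace equivalent
of the "incompatible with kernel" message above.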

> > Otherwise unless you want people to manually pass something to the ignored
> > argument, we have to hide it behind a macro.
> >
> > I actually like the macro on top, then I don't even pass the type ID but the
> > type. But that's a personal preference, and I don't feel strongly about it.
> >
> > So in C one does malloc(sizeof(*p)), here we'll just write
> > bpf_kptr_new(typeof(*p)). YMMV.
>
> bpf_kptr_new(typeof(*p)) is cleaner.
>

So if we're having a macro anyway, the thing above can be hidden behind it.
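
Roughly what I have in mind, as a userspace analogue only. obj_new_impl,
obj_new and struct foo are hypothetical names, malloc stands in for the BPF
allocator, and sizeof(t) stands in for the local type ID that the real macro
would pass.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the kfunc. The second parameter plays the role
 * of meta__ign: callers are never expected to fill it in themselves.
 */
static void *obj_new_impl(size_t size, void *meta_ign)
{
	void *p;

	assert(size > 0);
	(void)meta_ign;
	p = malloc(size);
	if (p)
		memset(p, 0, size);
	return p;
}

/* Takes a type rather than a size or ID, and hides the ignored argument. */
#define obj_new(t) ((t *)obj_new_impl(sizeof(t), NULL))

struct foo {
	int data;
};

/* Same shape as malloc(sizeof(*p)), written as obj_new(typeof(*p)). */
static struct foo *foo_alloc(void)
{
	struct foo *p;

	p = obj_new(__typeof__(*p));
	return p;
}
```

The caller never sees the second argument, so the kernel prototype and the
program-side declaration can keep the same number of parameters.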

> > > > +   p = bpf_mem_alloc(&bpf_global_ma, size);
> > > > +   if (!p)
> > > > +           return NULL;
> > > > +   if (meta)
> > > > +           bpf_obj_init(meta->off_arr, p);
> > >
> > > I'm starting to dislike all that _arr and _tab suffixes in the verifier code base.
> > > It reminds me of programming style where people tried to add types into
> > > variable names. imo dropping _arr would be just fine.
> >
> > Ack, I'll do it in v3.
> >
> > Also, I'd like to invite people to please bikeshed a bit over the naming of the
> > APIs, e.g. whether it should be bpf_kptr_drop vs bpf_kptr_delete.
>
> bpf_kptr_drop is more precise.
> delete assumes instant free which is not the case here.
>
> How about
> extern void *__bpf_obj_new(__u64 local_type_id) __ksym;
> extern void bpf_obj_drop(void *obj) __ksym;
> #define bpf_obj_new(t) \
>  (t *)__bpf_obj_new(bpf_core_type_id_local(t));
>
> kptr means 'kernel pointer'.
> Here we have program supplied object.
> It feels 'obj' is better than 'kptr' in this context.
>

I agree, I'll rename it to bpf_obj_*.

Also, that __bpf_obj_new doesn't work yet in clang [0] but I think we can rename
it to that once clang is fixed.

 [0]: https://reviews.llvm.org/D136041

> > In the BPF list API, it's named bpf_list_del but it's actually distinct from how
> > list_del in the kernel works. So does it make sense to give them a different
> > name (like pop_front/pop_back and push_front/push_back)?
> >
> > Because even bpf_list_add takes bpf_list_head, in the kernel there's no
> > distinction between node and head, so you can do list_add on a node as well, but
> > it won't be possible with the kfunc (unless we overload the head argument to
> > also work with nodes).
> >
> > Later we'll probably have to add bpf_list_node_add etc. that add before or after
> > a node to make that work.
> >
> > The main question is whether it should closely resemble the linked list API in
> > the kernel, or can it steer away considerably from that?
>
> If we do doubly linked list we should allow delete in
> the middle with
> bpf_list_del_any(head, node)
>
> and
> bpf_list_pop_front/pop_back(head)
>
> bpf_list_add(node, head) would match kernel style,
> but I think it's cleaner to have head as 1st arg.
> In that sense new pop/push/_front/_back are cleaner.
> And similar for rbtree.
>
> If we keep (node, head) and (rb_node, rb_root) order
> we should keep kernel names.

There are also tradeoffs in how various operations are done.

Right now we have bpf_list_del and bpf_list_del_tail doing pop_front and
pop_back.

To replicate the same in kernel style API, you would do:

struct foo *f = bpf_list_first_entry_or_null(head);
if (!f) {}
bpf_list_del(&f->node);

Between those two calls, you might do bpf_list_last_entry -> bpf_list_del. The
verifier is going to have a hard time proving whether the two nodes alias, so
it will have to invalidate all pointers peeking into the list whenever something
modifies it. You would still be able to access memory but cannot pass it to list
ops anymore.
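
A small userspace sketch of the aliasing problem, assuming a plain circular
doubly linked list. All names here are made up for illustration, and none of
this is the BPF API. With exactly one element on the list, the first and last
entries are the same object, which is what the verifier cannot rule out
statically.

```c
#include <assert.h>
#include <stddef.h>

struct list_node {
	struct list_node *next, *prev;
};

struct foo {
	int data;
	struct list_node node;
};

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

static void list_init(struct list_node *h)
{
	h->next = h->prev = h;
}

static void list_add_tail(struct list_node *n, struct list_node *h)
{
	n->prev = h->prev;
	n->next = h;
	h->prev->next = n;
	h->prev = n;
}

/* Peek helpers: return the entry without unlinking it. */
static struct foo *first_entry_or_null(struct list_node *h)
{
	return h->next == h ? NULL : container_of(h->next, struct foo, node);
}

static struct foo *last_entry_or_null(struct list_node *h)
{
	return h->prev == h ? NULL : container_of(h->prev, struct foo, node);
}

/* Build a list of nr elements and report whether the two peeks alias. */
static int peeked_entries_alias(int nr)
{
	struct foo elems[4];
	struct list_node head;
	int i;

	assert(nr >= 0 && nr <= 4);
	list_init(&head);
	for (i = 0; i < nr; i++)
		list_add_tail(&elems[i].node, &head);
	return first_entry_or_null(&head) == last_entry_or_null(&head);
}
```

Whether the two pointers alias depends on the runtime length of the list, so
deleting through one of them has to invalidate the other.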

But I think we will pay this cost eventually anyway, people will add a peek
operation and such analysis would have to be done then. So I think it's unlikely
we can avoid this, and it might be better to make things more consistent and end
up mirroring the kernel list APIs.

WDYT?

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH bpf-next v2 10/25] bpf: Introduce local kptrs
  2022-10-19 17:15   ` Dave Marchevsky
@ 2022-10-20  0:48     ` Kumar Kartikeya Dwivedi
  2022-10-25 16:27       ` Dave Marchevsky
  0 siblings, 1 reply; 52+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2022-10-20  0:48 UTC (permalink / raw)
  To: Dave Marchevsky
  Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Delyan Kratunov

On Wed, Oct 19, 2022 at 10:45:22PM IST, Dave Marchevsky wrote:
> On 10/13/22 2:22 AM, Kumar Kartikeya Dwivedi wrote:
> > Introduce the idea of local kptrs, i.e. PTR_TO_BTF_ID that point to a
> > type in program BTF. This is indicated by the presence of MEM_TYPE_LOCAL
> > type tag in reg->type to avoid having to check btf_is_kernel when trying
> > to match argument types in helpers.
> >
> > For now, these local kptrs will always be referenced in verifier
> > context, hence ref_obj_id == 0 for them is a bug. It is allowed to write
> > to such objects, as long as fields that are special are not touched
> > (support for which will be added in subsequent patches).
> >
> > No PROBE_MEM handling is hence done since they can never be in an
> > undefined state, and their lifetime will always be valid.
> >
> > Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
> > ---
> >  include/linux/bpf.h              | 14 +++++++++++---
> >  include/linux/filter.h           |  4 +++-
> >  kernel/bpf/btf.c                 |  9 ++++++++-
> >  kernel/bpf/verifier.c            | 15 ++++++++++-----
> >  net/bpf/bpf_dummy_struct_ops.c   |  3 ++-
> >  net/core/filter.c                | 13 ++++++++-----
> >  net/ipv4/bpf_tcp_ca.c            |  3 ++-
> >  net/netfilter/nf_conntrack_bpf.c |  1 +
> >  8 files changed, 45 insertions(+), 17 deletions(-)
> >
> > diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> > index 46330d871d4e..a2f4d3356cc8 100644
> > --- a/include/linux/bpf.h
> > +++ b/include/linux/bpf.h
> > @@ -526,6 +526,11 @@ enum bpf_type_flag {
> >  	/* Size is known at compile time. */
> >  	MEM_FIXED_SIZE		= BIT(10 + BPF_BASE_TYPE_BITS),
> >
> > +	/* MEM is of a type from program BTF, not kernel BTF. This is used to
> > +	 * tag PTR_TO_BTF_ID allocated using bpf_kptr_alloc.
> > +	 */
> > +	MEM_TYPE_LOCAL		= BIT(11 + BPF_BASE_TYPE_BITS),
> > +
> >  	__BPF_TYPE_FLAG_MAX,
> >  	__BPF_TYPE_LAST_FLAG	= __BPF_TYPE_FLAG_MAX - 1,
> >  };
> > @@ -774,6 +779,7 @@ struct bpf_prog_ops {
> >  			union bpf_attr __user *uattr);
> >  };
> >
> > +struct bpf_reg_state;
> >  struct bpf_verifier_ops {
> >  	/* return eBPF function prototype for verification */
> >  	const struct bpf_func_proto *
> > @@ -795,6 +801,7 @@ struct bpf_verifier_ops {
> >  				  struct bpf_insn *dst,
> >  				  struct bpf_prog *prog, u32 *target_size);
> >  	int (*btf_struct_access)(struct bpf_verifier_log *log,
> > +				 const struct bpf_reg_state *reg,
>
> Not that struct_ops API is meant to be stable, but would be good to note that
> this changes that API in the summary.
>

Ack, will do.

> On that note, maybe passing whole bpf_reg_state *reg can be avoided for now
> by making this a 'bool disallow_ptr_walk' or similar, since that's the only
> thing this patch is using it for.
>

I did this in the RFC version, with bool local_type, but Alexei asked me to drop
it and just pass the register in. But more on that below...

> >  				 const struct btf *btf,
> >  				 const struct btf_type *t, int off, int size,
> >  				 enum bpf_access_type atype,
> > @@ -2076,10 +2083,11 @@ static inline bool bpf_tracing_btf_ctx_access(int off, int size,
> >  	return btf_ctx_access(off, size, type, prog, info);
> >  }
> >
> > -int btf_struct_access(struct bpf_verifier_log *log, const struct btf *btf,
> > +int btf_struct_access(struct bpf_verifier_log *log,
> > +		      const struct bpf_reg_state *reg, const struct btf *btf,
> >  		      const struct btf_type *t, int off, int size,
> > -		      enum bpf_access_type atype,
> > -		      u32 *next_btf_id, enum bpf_type_flag *flag);
> > +		      enum bpf_access_type atype, u32 *next_btf_id,
> > +		      enum bpf_type_flag *flag);
> >  bool btf_struct_ids_match(struct bpf_verifier_log *log,
> >  			  const struct btf *btf, u32 id, int off,
> >  			  const struct btf *need_btf, u32 need_type_id,
> > diff --git a/include/linux/filter.h b/include/linux/filter.h
> > index efc42a6e3aed..9b94e24f90b9 100644
> > --- a/include/linux/filter.h
> > +++ b/include/linux/filter.h
> > @@ -568,7 +568,9 @@ struct sk_filter {
> >  DECLARE_STATIC_KEY_FALSE(bpf_stats_enabled_key);
> >
> >  extern struct mutex nf_conn_btf_access_lock;
> > -extern int (*nfct_btf_struct_access)(struct bpf_verifier_log *log, const struct btf *btf,
> > +extern int (*nfct_btf_struct_access)(struct bpf_verifier_log *log,
> > +				     const struct bpf_reg_state *reg,
> > +				     const struct btf *btf,
> >  				     const struct btf_type *t, int off, int size,
> >  				     enum bpf_access_type atype, u32 *next_btf_id,
> >  				     enum bpf_type_flag *flag);
> > diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c
> > index 066984d73a8b..65f444405d9c 100644
> > --- a/kernel/bpf/btf.c
> > +++ b/kernel/bpf/btf.c
> > @@ -6019,11 +6019,13 @@ static int btf_struct_walk(struct bpf_verifier_log *log, const struct btf *btf,
> >  	return -EINVAL;
> >  }
> >
> > -int btf_struct_access(struct bpf_verifier_log *log, const struct btf *btf,
> > +int btf_struct_access(struct bpf_verifier_log *log,
> > +		      const struct bpf_reg_state *reg, const struct btf *btf,
> >  		      const struct btf_type *t, int off, int size,
> >  		      enum bpf_access_type atype __maybe_unused,
> >  		      u32 *next_btf_id, enum bpf_type_flag *flag)
> >  {
> > +	bool local_type = reg && (type_flag(reg->type) & MEM_TYPE_LOCAL);
> >  	enum bpf_type_flag tmp_flag = 0;
> >  	int err;
> >  	u32 id;
> > @@ -6033,6 +6035,11 @@ int btf_struct_access(struct bpf_verifier_log *log, const struct btf *btf,
> >
> >  		switch (err) {
> >  		case WALK_PTR:
> > +			/* For local types, the destination register cannot
> > +			 * become a pointer again.
> > +			 */
> > +			if (local_type)
> > +				return SCALAR_VALUE;
> >  			/* If we found the pointer or scalar on t+off,
> >  			 * we're done.
> >  			 */
> > diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> > index 3c47cecda302..6ee8c06c2080 100644
> > --- a/kernel/bpf/verifier.c
> > +++ b/kernel/bpf/verifier.c
> > @@ -4522,16 +4522,20 @@ static int check_ptr_to_btf_access(struct bpf_verifier_env *env,
> >  		return -EACCES;
> >  	}
> >
> > -	if (env->ops->btf_struct_access) {
> > -		ret = env->ops->btf_struct_access(&env->log, reg->btf, t,
> > +	if (env->ops->btf_struct_access && !(type_flag(reg->type) & MEM_TYPE_LOCAL)) {
> > +		WARN_ON_ONCE(!btf_is_kernel(reg->btf));
> > +		ret = env->ops->btf_struct_access(&env->log, reg, reg->btf, t,
> >  						  off, size, atype, &btf_id, &flag);
> >  	} else {
> > -		if (atype != BPF_READ) {
> > +		if (atype != BPF_READ && !(type_flag(reg->type) & MEM_TYPE_LOCAL)) {
> >  			verbose(env, "only read is supported\n");
> >  			return -EACCES;
> >  		}
> >
> > -		ret = btf_struct_access(&env->log, reg->btf, t, off, size,
> > +		if (reg->type & MEM_TYPE_LOCAL)
> > +			WARN_ON_ONCE(!reg->ref_obj_id);
>
> Can we instead verbose(env, ...) and return error? Then when someone tries to
> add local kptrs that don't set ref_obj_id in the future, it'll be more obvious
> that this wasn't explicitly supported and they need to check verifier logic
> carefully. Also rest of check_ptr_to_btf_access checks do verbose + err.
>
> Similar for btf_is_kernel WARN above.
>

Ack.

> > +
> > +		ret = btf_struct_access(&env->log, reg, reg->btf, t, off, size,
>
>
> more re: passing entire reg state to btf_struct access:
>
> In the next patch in the series ("bpf: Recognize bpf_{spin_lock,list_head,list_node} in local kptrs")
> you do btf_find_struct_meta(btf, reg->btf_id). I see why you couldn't use 't'
> that's passed in here / elsewhere since you need the btf_id for meta lookup.
> Perhaps 'btf_type *t' param can be changed to btf_id, eliminating the need
> to pass 'reg'.
>
> Alternatively, since we're already passing reg->btf and result of
> btf_type_by_id(reg->btf, reg->btf_id), seems like btf_struct_access
> maybe is tied closely enough to reg state that passing reg state
> directly and getting rid of extraneous args is cleaner.
>

So Alexei actually suggested dropping both btf and type arguments and simply
pass in the register and get it from there.

But one call site threw a wrench in the plan:

check_ptr_to_map_access -> btf_struct_access

Here, it passes its own btf and type to simulate access to a map. Maybe I
should be creating a dummy register on stack and make it work like that for this
particular case? Otherwise all other callers pass in what they have from reg.

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH bpf-next v2 19/25] bpf: Introduce bpf_kptr_new
  2022-10-20  0:44         ` Kumar Kartikeya Dwivedi
@ 2022-10-20  1:11           ` Alexei Starovoitov
  0 siblings, 0 replies; 52+ messages in thread
From: Alexei Starovoitov @ 2022-10-20  1:11 UTC (permalink / raw)
  To: Kumar Kartikeya Dwivedi
  Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Dave Marchevsky, Delyan Kratunov

On Wed, Oct 19, 2022 at 5:44 PM Kumar Kartikeya Dwivedi
<memxor@gmail.com> wrote:
>
> On Wed, Oct 19, 2022 at 10:01:21PM IST, Alexei Starovoitov wrote:
> > On Tue, Oct 18, 2022 at 10:58 PM Kumar Kartikeya Dwivedi
> > <memxor@gmail.com> wrote:
> > >
> > > On Wed, Oct 19, 2022 at 08:01:24AM IST, Alexei Starovoitov wrote:
> > > > On Thu, Oct 13, 2022 at 11:52:57AM +0530, Kumar Kartikeya Dwivedi wrote:
> > > > > +void *bpf_kptr_new_impl(u64 local_type_id__k, u64 flags, void *meta__ign)
> > > > > +{
> > > > > +   struct btf_struct_meta *meta = meta__ign;
> > > > > +   u64 size = local_type_id__k;
> > > > > +   void *p;
> > > > > +
> > > > > +   if (unlikely(flags || !bpf_global_ma_set))
> > > > > +           return NULL;
> > > >
> > > > Unused 'flags' looks weird in unstable api. Just drop it?
> > > > And keep it as:
> > > > void *bpf_kptr_new(u64 local_type_id__k, struct btf_struct_meta *meta__ign);
> > > >
> > > > and in bpf_experimental.h:
> > > >
> > > > extern void *bpf_kptr_new(__u64 local_type_id) __ksym;
> > > >
> > > > since __ign args are ignored during kfunc type match
> > > > the bpf progs can use it without #define.
> > > >
> > >
> > > It's ignored during check_kfunc_call, but libbpf doesn't ignore that. The
> > > prototypes will not be the same. I guess I'll have to teach it to do that during
> > > type match, but IDK how you feel about that.
> >
> > libbpf does the full type match, really?
> > Could you point me to the code?
> >
>
> Not full type match, but the number of arguments must be same, so it won't allow
> having kfunc as:
>
> void *bpf_kptr_new(u64 local_type_id__k, struct btf_struct_meta *meta__ign);
>
> in the kernel and ksym declaration in the program as:
>
> extern void *bpf_kptr_new(__u64 local_type_id) __ksym;
>
> I get:
>
> libbpf: extern (func ksym) 'bpf_kptr_new_impl': func_proto [25] incompatible with kernel [60043]
>
> vlen of func_proto in kernel type is 2, for us it will be 1.

Ahh. Found it.
__bpf_core_types_are_compat() runs in both kernel and libbpf.
We could hack it for this case, but let's not complicate it.

> > > Otherwise unless you want people to manually pass something to the ignored
> > > argument, we have to hide it behind a macro.
> > >
> > > I actually like the macro on top, then I don't even pass the type ID but the
> > > type. But that's a personal preference, and I don't feel strongly about it.
> > >
> > > So in C one does malloc(sizeof(*p)), here we'll just write
> > > bpf_kptr_new(typeof(*p)). YMMV.
> >
> > bpf_kptr_new(typeof(*p)) is cleaner.
> >
>
> So if we're having a macro anyway, the thing above can be hidden behind it.

Indeed. Since #define is there let's not mess with
__bpf_core_types_are_compat().

> > > > > +   p = bpf_mem_alloc(&bpf_global_ma, size);
> > > > > +   if (!p)
> > > > > +           return NULL;
> > > > > +   if (meta)
> > > > > +           bpf_obj_init(meta->off_arr, p);
> > > >
> > > > I'm starting to dislike all that _arr and _tab suffixes in the verifier code base.
> > > > It reminds me of programming style where people tried to add types into
> > > > variable names. imo dropping _arr would be just fine.
> > >
> > > Ack, I'll do it in v3.
> > >
> > > Also, I'd like to invite people to please bikeshed a bit over the naming of the
> > > APIs, e.g. whether it should be bpf_kptr_drop vs bpf_kptr_delete.
> >
> > bpf_kptr_drop is more precise.
> > delete assumes instant free which is not the case here.
> >
> > How about
> > extern void *__bpf_obj_new(__u64 local_type_id) __ksym;
> > extern void bpf_obj_drop(void *obj) __ksym;
> > #define bpf_obj_new(t) \
> >  (t *)__bpf_obj_new(bpf_core_type_id_local(t));
> >
> > kptr means 'kernel pointer'.
> > Here we have program supplied object.
> > It feels 'obj' is better than 'kptr' in this context.
> >
>
> I agree, I'll rename it to bpf_obj_*.
>
> Also, that __bpf_obj_new doesn't work yet in clang [0] but I think we can rename
> it to that once clang is fixed.

argh. ok. let's keep bpf_obj_new_impl() for now.

>
>  [0]: https://reviews.llvm.org/D136041
>
> > > In the BPF list API, it's named bpf_list_del but it's actually distinct from how
> > > list_del in the kernel works. So it does make sense to give them a different
> > > name (like pop_front/pop_back and push_front/push_back)?
> > >
> > > Because even bpf_list_add takes bpf_list_head, in the kernel there's no
> > > distinction between node and head, so you can do list_add on a node as well, but
> > > it won't be possible with the kfunc (unless we overload the head argument to
> > > also work with nodes).
> > >
> > > Later we'll probably have to add bpf_list_node_add etc. that add before or after
> > > a node to make that work.
> > >
> > > The main question is whether it should closely resemble the linked list API in
> > > the kernel, or can it steer away considerably from that?
> >
> > If we do doubly linked list we should allow delete in
> > the middle with
> > bpf_list_del_any(head, node)
> >
> > and
> > bpf_list_pop_front/pop_back(head)
> >
> > bpf_list_add(node, head) would match kernel style,
> > but I think it's cleaner to have head as 1st arg.
> > In that sense new pop/push/_front/_back are cleaner.
> > And similar for rbtree.
> >
> > If we keep (node, head) and (rb_node, rb_root) order
> > we should keep kernel names.
>
> There's also tradeoffs in how various operations are done.
>
> Right now we have bpf_list_del and bpf_list_del_tail doing pop_front and
> pop_back.
>
> To replicate the same in kernel style API, you would do:
>
> struct foo *f = bpf_list_first_entry_or_null(head);
> if (!f) {}
> bpf_list_del(&f->node);
>
> Between those two calls, you might do bpf_list_last_entry -> bpf_list_del, the
> verifier is going to have a hard time proving the aliasing of the two nodes, so
> it will have to invalidate all pointers peeking into the list whenever something
> modifies it. You would still be able to access memory but cannot pass it to list
> ops anymore.

yeah. ugly.

>
> But I think we will pay this cost eventually anyway, people will add a peek
> operation and such analysis would have to be done then. So I think it's unlikely
> we can avoid this, and it might be better to make things more consistent and end
> up mirroring the kernel list APIs.

Let's keep the verifier simpler for now.
We can go with pop/push/_front/_back, since they are simpler
and later add kernel style accessor as well with swapped
node/head order.
One doesn't preclude the other.
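
Something along these lines, as a userspace sketch only. The toy_* names are
hypothetical, and the lazy initialisation of a zeroed head is borrowed from the
__bpf_list_add helper in patch 22. Head is the first argument everywhere.

```c
#include <assert.h>
#include <stddef.h>

struct toy_node {
	struct toy_node *next, *prev;
};

/* Distinct head type, so a node cannot be passed where a head is expected. */
struct toy_head {
	struct toy_node n;
};

/* A zero-filled head is treated as an empty list and set up on first use. */
static void toy_head_lazy_init(struct toy_head *h)
{
	if (!h->n.next)
		h->n.next = h->n.prev = &h->n;
}

static void toy_push_front(struct toy_head *h, struct toy_node *node)
{
	assert(node != NULL);
	toy_head_lazy_init(h);
	node->next = h->n.next;
	node->prev = &h->n;
	h->n.next->prev = node;
	h->n.next = node;
}

static void toy_push_back(struct toy_head *h, struct toy_node *node)
{
	assert(node != NULL);
	toy_head_lazy_init(h);
	node->prev = h->n.prev;
	node->next = &h->n;
	h->n.prev->next = node;
	h->n.prev = node;
}

/* Unlink and return the first (back == 0) or last node, NULL if empty. */
static struct toy_node *toy_pop(struct toy_head *h, int back)
{
	struct toy_node *node;

	toy_head_lazy_init(h);
	if (h->n.next == &h->n)
		return NULL;
	node = back ? h->n.prev : h->n.next;
	node->prev->next = node->next;
	node->next->prev = node->prev;
	node->next = node->prev = node;
	return node;
}

static struct toy_node *toy_pop_front(struct toy_head *h)
{
	return toy_pop(h, 0);
}

static struct toy_node *toy_pop_back(struct toy_head *h)
{
	return toy_pop(h, 1);
}
```

Since pop hands back the node and unlinks it in one step, nothing is left
peeking into the list afterwards.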

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH bpf-next v2 10/25] bpf: Introduce local kptrs
  2022-10-20  0:48     ` Kumar Kartikeya Dwivedi
@ 2022-10-25 16:27       ` Dave Marchevsky
  2022-10-25 18:11         ` Kumar Kartikeya Dwivedi
  0 siblings, 1 reply; 52+ messages in thread
From: Dave Marchevsky @ 2022-10-25 16:27 UTC (permalink / raw)
  To: Kumar Kartikeya Dwivedi
  Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Delyan Kratunov

On 10/19/22 8:48 PM, Kumar Kartikeya Dwivedi wrote:
> On Wed, Oct 19, 2022 at 10:45:22PM IST, Dave Marchevsky wrote:
>> On 10/13/22 2:22 AM, Kumar Kartikeya Dwivedi wrote:
>>> Introduce the idea of local kptrs, i.e. PTR_TO_BTF_ID that point to a
>>> type in program BTF. This is indicated by the presence of MEM_TYPE_LOCAL
>>> type tag in reg->type to avoid having to check btf_is_kernel when trying
>>> to match argument types in helpers.
>>>
>>> For now, these local kptrs will always be referenced in verifier
>>> context, hence ref_obj_id == 0 for them is a bug. It is allowed to write
>>> to such objects, as long as fields that are special are not touched
>>> (support for which will be added in subsequent patches).
>>>
>>> No PROBE_MEM handling is hence done since they can never be in an
>>> undefined state, and their lifetime will always be valid.
>>>
>>> Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
>>> ---

[...]

>>
>>
>> more re: passing entire reg state to btf_struct access:
>>
>> In the next patch in the series ("bpf: Recognize bpf_{spin_lock,list_head,list_node} in local kptrs")
>> you do btf_find_struct_meta(btf, reg->btf_id). I see why you couldn't use 't'
>> that's passed in here / elsewhere since you need the btf_id for meta lookup.
>> Perhaps 'btf_type *t' param can be changed to btf_id, eliminating the need
>> to pass 'reg'.
>>
>> Alternatively, since we're already passing reg->btf and result of
>> btf_type_by_id(reg->btf, reg->btf_id), seems like btf_struct_access
>> maybe is tied closely enough to reg state that passing reg state
>> directly and getting rid of extraneous args is cleaner.
>>
> 
> So Alexei actually suggested dropping both btf and type arguments and simply
> pass in the register and get it from there.
> 
> But one call site threw a wrench in the plan:
> 
> check_ptr_to_map_access -> btf_struct_access
> 
> Here, it passes its own btf and type to simulate access to a map. Maybe I
> should be creating a dummy register on stack and make it work like that for this
> particular case? Otherwise all other callers pass in what they have from reg.

Ah, sorry for missing that. Personally I'm not a fan of dummy register on the
stack. Then if btf_struct_access starts using some reg state that wasn't
populated in the dummy reg it will be confusing.

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH bpf-next v2 10/25] bpf: Introduce local kptrs
  2022-10-13  6:22 ` [PATCH bpf-next v2 10/25] bpf: Introduce local kptrs Kumar Kartikeya Dwivedi
  2022-10-19 17:15   ` Dave Marchevsky
@ 2022-10-25 16:32   ` Dave Marchevsky
  2022-10-25 18:11     ` Kumar Kartikeya Dwivedi
  1 sibling, 1 reply; 52+ messages in thread
From: Dave Marchevsky @ 2022-10-25 16:32 UTC (permalink / raw)
  To: Kumar Kartikeya Dwivedi, bpf
  Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann, Delyan Kratunov

On 10/13/22 2:22 AM, Kumar Kartikeya Dwivedi wrote:
> Introduce the idea of local kptrs, i.e. PTR_TO_BTF_ID that point to a
> type in program BTF. This is indicated by the presence of MEM_TYPE_LOCAL
> type tag in reg->type to avoid having to check btf_is_kernel when trying
> to match argument types in helpers.
> 
> For now, these local kptrs will always be referenced in verifier
> context, hence ref_obj_id == 0 for them is a bug. It is allowed to write
> to such objects, as long as fields that are special are not touched
> (support for which will be added in subsequent patches).
> 
> No PROBE_MEM handling is hence done since they can never be in an
> undefined state, and their lifetime will always be valid.
> 
> Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
> ---

One nit unrelated to the other thread we have going for this patch.

[...]

> diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c
> index 066984d73a8b..65f444405d9c 100644
> --- a/kernel/bpf/btf.c
> +++ b/kernel/bpf/btf.c
> @@ -6019,11 +6019,13 @@ static int btf_struct_walk(struct bpf_verifier_log *log, const struct btf *btf,
>  	return -EINVAL;
>  }
>  
> -int btf_struct_access(struct bpf_verifier_log *log, const struct btf *btf,
> +int btf_struct_access(struct bpf_verifier_log *log,
> +		      const struct bpf_reg_state *reg, const struct btf *btf,
>  		      const struct btf_type *t, int off, int size,
>  		      enum bpf_access_type atype __maybe_unused,
>  		      u32 *next_btf_id, enum bpf_type_flag *flag)
>  {
> +	bool local_type = reg && (type_flag(reg->type) & MEM_TYPE_LOCAL);

Can you add a type_is_local_kptr helper (or similar name) to reduce this
type_flag(reg->type) & MEM_TYPE_LOCAL repetition here and elsewhere in the patch?
Some examples of repetition in verifier.c below.
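
A rough sketch of the kind of helper I mean, written against a toy encoding
so it can be compiled standalone. The TOY_* constants and toy_* functions are
placeholders, and the bit values do not match the kernel's.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy encoding: low bits hold the base type, high bits hold the flags. */
#define TOY_BASE_TYPE_BITS	8
#define TOY_BASE_TYPE_MASK	((1u << TOY_BASE_TYPE_BITS) - 1)
#define TOY_PTR_TO_BTF_ID	5u
#define TOY_MEM_TYPE_LOCAL	(1u << (11 + TOY_BASE_TYPE_BITS))

static_assert((TOY_MEM_TYPE_LOCAL & TOY_BASE_TYPE_MASK) == 0,
	      "flag must not overlap the base type bits");

struct toy_reg_state {
	uint32_t type;
};

static uint32_t toy_type_flag(uint32_t type)
{
	return type & ~TOY_BASE_TYPE_MASK;
}

/* Single place that knows how "local kptr" is encoded in the type. */
static bool toy_type_is_local_kptr(uint32_t type)
{
	return (toy_type_flag(type) & TOY_MEM_TYPE_LOCAL) != 0;
}

/* NULL-tolerant variant for call sites that have no register to pass. */
static bool toy_reg_is_local_kptr(const struct toy_reg_state *reg)
{
	return reg && toy_type_is_local_kptr(reg->type);
}
```

Having one helper would also keep the call sites consistent about going
through type_flag() before testing the bit.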

>  	enum bpf_type_flag tmp_flag = 0;
>  	int err;
>  	u32 id;
> @@ -6033,6 +6035,11 @@ int btf_struct_access(struct bpf_verifier_log *log, const struct btf *btf,
>  
>  		switch (err) {
>  		case WALK_PTR:
> +			/* For local types, the destination register cannot
> +			 * become a pointer again.
> +			 */
> +			if (local_type)
> +				return SCALAR_VALUE;
>  			/* If we found the pointer or scalar on t+off,
>  			 * we're done.
>  			 */
> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> index 3c47cecda302..6ee8c06c2080 100644
> --- a/kernel/bpf/verifier.c
> +++ b/kernel/bpf/verifier.c
> @@ -4522,16 +4522,20 @@ static int check_ptr_to_btf_access(struct bpf_verifier_env *env,
>  		return -EACCES;
>  	}
>  
> -	if (env->ops->btf_struct_access) {
> -		ret = env->ops->btf_struct_access(&env->log, reg->btf, t,
> +	if (env->ops->btf_struct_access && !(type_flag(reg->type) & MEM_TYPE_LOCAL)) {
> +		WARN_ON_ONCE(!btf_is_kernel(reg->btf));
> +		ret = env->ops->btf_struct_access(&env->log, reg, reg->btf, t,
>  						  off, size, atype, &btf_id, &flag);
>  	} else {
> -		if (atype != BPF_READ) {
> +		if (atype != BPF_READ && !(type_flag(reg->type) & MEM_TYPE_LOCAL)) {
>  			verbose(env, "only read is supported\n");
>  			return -EACCES;
>  		}
>  
> -		ret = btf_struct_access(&env->log, reg->btf, t, off, size,
> +		if (reg->type & MEM_TYPE_LOCAL)
> +			WARN_ON_ONCE(!reg->ref_obj_id);
> +
> +		ret = btf_struct_access(&env->log, reg, reg->btf, t, off, size,
>  					atype, &btf_id, &flag);
>  	}
>  
> @@ -4596,7 +4600,7 @@ static int check_ptr_to_map_access(struct bpf_verifier_env *env,
>  		return -EACCES;
>  	}
>  
> -	ret = btf_struct_access(&env->log, btf_vmlinux, t, off, size, atype, &btf_id, &flag);
> +	ret = btf_struct_access(&env->log, NULL, btf_vmlinux, t, off, size, atype, &btf_id, &flag);
>  	if (ret < 0)
>  		return ret;
>  
> @@ -5816,6 +5820,7 @@ int check_func_arg_reg_off(struct bpf_verifier_env *env,
>  	 * fixed offset.
>  	 */
>  	case PTR_TO_BTF_ID:
> +	case PTR_TO_BTF_ID | MEM_TYPE_LOCAL:
>  		/* When referenced PTR_TO_BTF_ID is passed to release function,
>  		 * it's fixed offset must be 0.	In the other cases, fixed offset
>  		 * can be non-zero.

[...]

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH bpf-next v2 22/25] bpf: Introduce single ownership BPF linked list API
  2022-10-13  6:23 ` [PATCH bpf-next v2 22/25] bpf: Introduce single ownership BPF linked list API Kumar Kartikeya Dwivedi
@ 2022-10-25 17:45   ` Dave Marchevsky
  2022-10-25 19:00     ` Kumar Kartikeya Dwivedi
  0 siblings, 1 reply; 52+ messages in thread
From: Dave Marchevsky @ 2022-10-25 17:45 UTC (permalink / raw)
  To: Kumar Kartikeya Dwivedi, bpf
  Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann, Delyan Kratunov

On 10/13/22 2:23 AM, Kumar Kartikeya Dwivedi wrote:
> Add a linked list API for use in BPF programs, where it expects
> protection from the bpf_spin_lock in the same allocation as the
> bpf_list_head. Future patches will extend the same infrastructure to
> have different flavors with varying protection domains and visibility
> (e.g. percpu variant with local_t protection, usable in NMI progs).
> 
> The following functions are added to kick things off:
> 
> bpf_list_add
> bpf_list_add_tail
> bpf_list_del
> bpf_list_del_tail
> 
> The lock protecting the bpf_list_head needs to be taken for all
> operations.
> 
> Once a node has been added to the list, its pointer changes to
> PTR_UNTRUSTED. However, it is only released once the lock protecting the
> list is unlocked. For such local kptrs with PTR_UNTRUSTED set but an
> active ref_obj_id, it is still permitted to read and write to them as
> long as the lock is held.
> 
> bpf_list_del and bpf_list_del_tail delete the first or last item of the
> list respectively, and return pointer to the element at the list_node
> offset. The user can then use container_of style macro to get the actual
> entry type. The verifier however statically knows the actual type, so
> the safety properties are still preserved.
> 
> With these additions, programs can now manage their own linked lists and
> store their objects in them.
> 
> Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
> ---
>  include/linux/bpf_verifier.h                  |   5 +
>  kernel/bpf/helpers.c                          |  48 +++
>  kernel/bpf/verifier.c                         | 344 ++++++++++++++++--
>  .../testing/selftests/bpf/bpf_experimental.h  |  28 ++
>  4 files changed, 391 insertions(+), 34 deletions(-)
> 
> diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h
> index 0cc4679f3f42..01d3dd76b224 100644
> --- a/include/linux/bpf_verifier.h
> +++ b/include/linux/bpf_verifier.h
> @@ -229,6 +229,11 @@ struct bpf_reference_state {
>  	 * exiting a callback function.
>  	 */
>  	int callback_ref;
> +	/* Mark the reference state to release the registers sharing the same id
> +	 * on bpf_spin_unlock (for nodes that we will lose ownership to but are
> +	 * safe to access inside the critical section).
> +	 */
> +	bool release_on_unlock;
>  };
>  
>  /* state of the program:
> diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
> index 43a7c9999e94..71e0f19f738a 100644
> --- a/kernel/bpf/helpers.c
> +++ b/kernel/bpf/helpers.c
> @@ -1768,6 +1768,50 @@ void bpf_kptr_drop_impl(void *p__lkptr, void *meta__ign)
>  	bpf_mem_free(&bpf_global_ma, p);
>  }
>  
> +static void __bpf_list_add(struct bpf_list_node *node, struct bpf_list_head *head, bool tail)
> +{
> +	struct list_head *n = (void *)node, *h = (void *)head;
> +
> +	if (unlikely(!h->next))
> +		INIT_LIST_HEAD(h);
> +	if (unlikely(!n->next))
> +		INIT_LIST_HEAD(n);
> +	tail ? list_add_tail(n, h) : list_add(n, h);
> +}
> +
> +void bpf_list_add(struct bpf_list_node *node, struct bpf_list_head *head)
> +{
> +	return __bpf_list_add(node, head, false);
> +}
> +
> +void bpf_list_add_tail(struct bpf_list_node *node, struct bpf_list_head *head)
> +{
> +	return __bpf_list_add(node, head, true);
> +}
> +
> +static struct bpf_list_node *__bpf_list_del(struct bpf_list_head *head, bool tail)
> +{
> +	struct list_head *n, *h = (void *)head;
> +
> +	if (unlikely(!h->next))
> +		INIT_LIST_HEAD(h);
> +	if (list_empty(h))
> +		return NULL;
> +	n = tail ? h->prev : h->next;
> +	list_del_init(n);
> +	return (struct bpf_list_node *)n;
> +}
> +
> +struct bpf_list_node *bpf_list_del(struct bpf_list_head *head)
> +{
> +	return __bpf_list_del(head, false);
> +}
> +
> +struct bpf_list_node *bpf_list_del_tail(struct bpf_list_head *head)
> +{
> +	return __bpf_list_del(head, true);
> +}
> +
>  __diag_pop();
>  
>  BTF_SET8_START(generic_btf_ids)
> @@ -1776,6 +1820,10 @@ BTF_ID_FLAGS(func, crash_kexec, KF_DESTRUCTIVE)
>  #endif
>  BTF_ID_FLAGS(func, bpf_kptr_new_impl, KF_ACQUIRE | KF_RET_NULL)
>  BTF_ID_FLAGS(func, bpf_kptr_drop_impl, KF_RELEASE)
> +BTF_ID_FLAGS(func, bpf_list_add)
> +BTF_ID_FLAGS(func, bpf_list_add_tail)
> +BTF_ID_FLAGS(func, bpf_list_del, KF_ACQUIRE | KF_RET_NULL)
> +BTF_ID_FLAGS(func, bpf_list_del_tail, KF_ACQUIRE | KF_RET_NULL)
>  BTF_SET8_END(generic_btf_ids)
>  
>  static const struct btf_kfunc_id_set generic_kfunc_set = {
> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> index a8cd04c18ac5..96cf576784c6 100644
> --- a/kernel/bpf/verifier.c
> +++ b/kernel/bpf/verifier.c
> @@ -5485,7 +5485,9 @@ static int process_spin_lock(struct bpf_verifier_env *env, int regno,
>  			cur->active_spin_lock_ptr = btf;
>  		cur->active_spin_lock_id = reg->id;
>  	} else {
> +		struct bpf_func_state *fstate = cur_func(env);
>  		void *ptr;
> +		int i;
>  
>  		if (map)
>  			ptr = map;
> @@ -5503,6 +5505,16 @@ static int process_spin_lock(struct bpf_verifier_env *env, int regno,
>  		}
>  		cur->active_spin_lock_ptr = NULL;
>  		cur->active_spin_lock_id = 0;
> +
> +		for (i = 0; i < fstate->acquired_refs; i++) {
> +			/* WARN because this reference state cannot be freed
> +			 * before this point, as bpf_spin_lock CS does not
> +			 * allow functions that release the local kptr
> +			 * immediately.
> +			 */
> +			if (fstate->refs[i].release_on_unlock)
> +				WARN_ON_ONCE(release_reference(env, fstate->refs[i].id));
> +		}
>  	}
>  	return 0;
>  }
> @@ -7697,6 +7709,16 @@ struct bpf_kfunc_call_arg_meta {
>  		struct btf *btf;
>  		u32 btf_id;
>  	} arg_kptr_drop;
> +	struct {
> +		struct btf_field *field;
> +	} arg_list_head;
> +	struct {
> +		struct btf_field *field;
> +		struct btf *reg_btf;
> +		u32 reg_btf_id;
> +		u32 reg_offset;
> +		u32 reg_ref_obj_id;
> +	} arg_list_node;
>  };
>  
>  static bool is_kfunc_acquire(struct bpf_kfunc_call_arg_meta *meta)
> @@ -7807,13 +7829,17 @@ static bool is_kfunc_arg_ret_buf_size(const struct btf *btf,
>  
>  enum {
>  	KF_ARG_DYNPTR_ID,
> +	KF_ARG_LIST_HEAD_ID,
> +	KF_ARG_LIST_NODE_ID,
>  };
>  
>  BTF_ID_LIST(kf_arg_btf_ids)
>  BTF_ID(struct, bpf_dynptr_kern)
> +BTF_ID(struct, bpf_list_head)
> +BTF_ID(struct, bpf_list_node)
>  
> -static bool is_kfunc_arg_dynptr(const struct btf *btf,
> -				const struct btf_param *arg)
> +static bool __is_kfunc_ptr_arg_type(const struct btf *btf,
> +				    const struct btf_param *arg, int type)
>  {
>  	const struct btf_type *t;
>  	u32 res_id;
> @@ -7826,7 +7852,22 @@ static bool is_kfunc_arg_dynptr(const struct btf *btf,
>  	t = btf_type_skip_modifiers(btf, t->type, &res_id);
>  	if (!t)
>  		return false;
> -	return btf_types_are_same(btf, res_id, btf_vmlinux, kf_arg_btf_ids[KF_ARG_DYNPTR_ID]);
> +	return btf_types_are_same(btf, res_id, btf_vmlinux, kf_arg_btf_ids[type]);
> +}
> +
> +static bool is_kfunc_arg_dynptr(const struct btf *btf, const struct btf_param *arg)
> +{
> +	return __is_kfunc_ptr_arg_type(btf, arg, KF_ARG_DYNPTR_ID);
> +}
> +
> +static bool is_kfunc_arg_list_head(const struct btf *btf, const struct btf_param *arg)
> +{
> +	return __is_kfunc_ptr_arg_type(btf, arg, KF_ARG_LIST_HEAD_ID);
> +}
> +
> +static bool is_kfunc_arg_list_node(const struct btf *btf, const struct btf_param *arg)
> +{
> +	return __is_kfunc_ptr_arg_type(btf, arg, KF_ARG_LIST_NODE_ID);
>  }
>  
>  /* Returns true if struct is composed of scalars, 4 levels of nesting allowed */
> @@ -7881,9 +7922,11 @@ static u32 *reg2btf_ids[__BPF_REG_TYPE_MAX] = {
>  enum kfunc_ptr_arg_type {
>  	KF_ARG_PTR_TO_CTX,
>  	KF_ARG_PTR_TO_LOCAL_BTF_ID,  /* Local kptr */
> -	KF_ARG_PTR_TO_BTF_ID,	     /* Also covers reg2btf_ids conversions */
>  	KF_ARG_PTR_TO_KPTR_STRONG,   /* PTR_TO_KPTR but type specific */
>  	KF_ARG_PTR_TO_DYNPTR,
> +	KF_ARG_PTR_TO_LIST_HEAD,
> +	KF_ARG_PTR_TO_LIST_NODE,
> +	KF_ARG_PTR_TO_BTF_ID,	     /* Also covers reg2btf_ids conversions */
>  	KF_ARG_PTR_TO_MEM,
>  	KF_ARG_PTR_TO_MEM_SIZE,	     /* Size derived from next argument, skip it */
>  };
> @@ -7891,16 +7934,28 @@ enum kfunc_ptr_arg_type {
>  enum special_kfunc_type {
>  	KF_bpf_kptr_new_impl,
>  	KF_bpf_kptr_drop_impl,
> +	KF_bpf_list_add,
> +	KF_bpf_list_add_tail,
> +	KF_bpf_list_del,
> +	KF_bpf_list_del_tail,
>  };
>  
>  BTF_SET_START(special_kfunc_set)
>  BTF_ID(func, bpf_kptr_new_impl)
>  BTF_ID(func, bpf_kptr_drop_impl)
> +BTF_ID(func, bpf_list_add)
> +BTF_ID(func, bpf_list_add_tail)
> +BTF_ID(func, bpf_list_del)
> +BTF_ID(func, bpf_list_del_tail)
>  BTF_SET_END(special_kfunc_set)
>  
>  BTF_ID_LIST(special_kfunc_list)
>  BTF_ID(func, bpf_kptr_new_impl)
>  BTF_ID(func, bpf_kptr_drop_impl)
> +BTF_ID(func, bpf_list_add)
> +BTF_ID(func, bpf_list_add_tail)
> +BTF_ID(func, bpf_list_del)
> +BTF_ID(func, bpf_list_del_tail)
>  
>  enum kfunc_ptr_arg_type get_kfunc_ptr_arg_type(struct bpf_verifier_env *env,
>  						struct bpf_kfunc_call_arg_meta *meta,
> @@ -7926,15 +7981,6 @@ enum kfunc_ptr_arg_type get_kfunc_ptr_arg_type(struct bpf_verifier_env *env,
>  	if (is_kfunc_arg_local_kptr(meta->btf, &args[argno]))
>  		return KF_ARG_PTR_TO_LOCAL_BTF_ID;
>  
> -	if ((base_type(reg->type) == PTR_TO_BTF_ID || reg2btf_ids[base_type(reg->type)])) {
> -		if (!btf_type_is_struct(ref_t)) {
> -			verbose(env, "kernel function %s args#%d pointer type %s %s is not supported\n",
> -				meta->func_name, argno, btf_type_str(ref_t), ref_tname);
> -			return -EINVAL;
> -		}
> -		return KF_ARG_PTR_TO_BTF_ID;
> -	}
> -
>  	if (is_kfunc_arg_kptr_get(meta, argno)) {
>  		if (!btf_type_is_ptr(ref_t)) {
>  			verbose(env, "arg#0 BTF type must be a double pointer for kptr_get kfunc\n");
> @@ -7953,6 +7999,21 @@ enum kfunc_ptr_arg_type get_kfunc_ptr_arg_type(struct bpf_verifier_env *env,
>  	if (is_kfunc_arg_dynptr(meta->btf, &args[argno]))
>  		return KF_ARG_PTR_TO_DYNPTR;
>  
> +	if (is_kfunc_arg_list_head(meta->btf, &args[argno]))
> +		return KF_ARG_PTR_TO_LIST_HEAD;
> +
> +	if (is_kfunc_arg_list_node(meta->btf, &args[argno]))
> +		return KF_ARG_PTR_TO_LIST_NODE;
> +
> +	if ((base_type(reg->type) == PTR_TO_BTF_ID || reg2btf_ids[base_type(reg->type)])) {
> +		if (!btf_type_is_struct(ref_t)) {
> +			verbose(env, "kernel function %s args#%d pointer type %s %s is not supported\n",
> +				meta->func_name, argno, btf_type_str(ref_t), ref_tname);
> +			return -EINVAL;
> +		}
> +		return KF_ARG_PTR_TO_BTF_ID;
> +	}
> +
>  	if (argno + 1 < nargs && is_kfunc_arg_mem_size(meta->btf, &args[argno + 1], &regs[regno + 1]))
>  		arg_mem_size = true;
>  
> @@ -8039,6 +8100,181 @@ static int process_kf_arg_ptr_to_kptr_strong(struct bpf_verifier_env *env,
>  	return 0;
>  }
>  
> +static bool ref_obj_id_set_release_on_unlock(struct bpf_verifier_env *env, u32 ref_obj_id)
> +{
> +	struct bpf_func_state *state = cur_func(env);
> +	struct bpf_reg_state *reg;
> +	int i;
> +
> +	/* bpf_spin_lock only allows calling list_add and list_del, no BPF
> +	 * subprogs, no global functions, so this acquired refs state is the
> +	 * same one we will use to find registers to kill on bpf_spin_unlock.
> +	 */

It's unclear to me what "only allows calling ... is the same one" in this
comment is trying to say. Are you trying to say something similar to your
comment in the process_spin_lock change in this patch? e.g. "bpf_spin_lock CS
does not allow functions that release the local kptr, so this ref_obj_id will
still be valid then". At least to me the language in the other comment is
clearer.

> +	WARN_ON_ONCE(!ref_obj_id);

Can this be a verbose("verifier internal error: ...") + return?
Same for similar 'this should never happen' checks elsewhere in this function
and patch.

> +	for (i = 0; i < state->acquired_refs; i++) {
> +		if (state->refs[i].id == ref_obj_id) {
> +			WARN_ON_ONCE(state->refs[i].release_on_unlock);
> +			state->refs[i].release_on_unlock = true;
> +			/* Now mark everyone sharing same ref_obj_id as untrusted */
> +			bpf_for_each_reg_in_vstate(env->cur_state, state, reg, ({
> +				if (reg->ref_obj_id == ref_obj_id)
> +					reg->type |= PTR_UNTRUSTED;

To confirm my understanding: since ownership of the thing this reference points
to has been transferred to the data structure, it's now necessary to mark all
instances of the reference PTR_UNTRUSTED to prevent them from being passed
to helpers/kfuncs, as the owning data structure could make it disappear
at any time? Or is it because an arbitrary kfunc might mess with bpf_list_node
internal fields?

> +			}));
> +			return 0;
> +		}
> +	}
> +	verbose(env, "verifier internal error: ref state missing for ref_obj_id\n");
> +	return -EFAULT;

You're returning -EFAULT here, but the fn return type is 'bool' above.

> +}
> +
> +static bool is_reg_allocation_locked(struct bpf_verifier_env *env, struct bpf_reg_state *reg)
> +{
> +	void *ptr;
> +	u32 id;
> +
> +	switch ((int)reg->type) {
> +	case PTR_TO_MAP_VALUE:
> +		ptr = reg->map_ptr;
> +		break;
> +	case PTR_TO_BTF_ID | MEM_TYPE_LOCAL:
> +		ptr = reg->btf;
> +		break;
> +	default:
> +		WARN_ON_ONCE(1);
> +		return false;
> +	}
> +	id = reg->id;
> +
> +	return env->cur_state->active_spin_lock_ptr == ptr &&
> +	       env->cur_state->active_spin_lock_id == id;
> +}
> +
> +static int process_kf_arg_ptr_to_list_head(struct bpf_verifier_env *env,
> +					   struct bpf_reg_state *reg,
> +					   u32 regno,
> +					   struct bpf_kfunc_call_arg_meta *meta)
> +{
> +	struct btf_type_fields *tab = NULL;
> +	struct btf_field *field;
> +	u32 list_head_off;
> +
> +	if (meta->btf != btf_vmlinux ||
> +	    (meta->func_id != special_kfunc_list[KF_bpf_list_add] &&
> +	     meta->func_id != special_kfunc_list[KF_bpf_list_add_tail] &&
> +	     meta->func_id != special_kfunc_list[KF_bpf_list_del] &&
> +	     meta->func_id != special_kfunc_list[KF_bpf_list_del_tail])) {
> +		verbose(env, "verifier internal error: bpf_list_head argument for unknown kfunc\n");
> +		return -EFAULT;
> +	}
> +
> +	if (reg->type == PTR_TO_MAP_VALUE) {
> +		tab = reg->map_ptr->fields_tab;
> +	} else /* PTR_TO_BTF_ID | MEM_TYPE_LOCAL */ {
> +		struct btf_struct_meta *meta;
> +
> +		meta = btf_find_struct_meta(reg->btf, reg->btf_id);
> +		if (!meta) {
> +			verbose(env, "bpf_list_head not found for local kptr\n");
> +			return -EINVAL;
> +		}
> +		tab = meta->fields_tab;
> +	}
> +
> +	if (!tnum_is_const(reg->var_off)) {
> +		verbose(env,
> +			"R%d doesn't have constant offset. bpf_list_head has to be at the constant offset\n",
> +			regno);
> +		return -EINVAL;
> +	}
> +
> +	list_head_off = reg->off + reg->var_off.value;
> +	field = btf_type_fields_find(tab, list_head_off, BPF_LIST_HEAD);
> +	if (!field) {
> +		verbose(env, "bpf_list_head not found at offset=%u\n", list_head_off);
> +		return -EINVAL;
> +	}
> +
> +	/* All functions require bpf_list_head to be protected using a bpf_spin_lock */
> +	if (!is_reg_allocation_locked(env, reg)) {
> +		verbose(env, "bpf_spin_lock at off=%d must be held for manipulating bpf_list_head\n",
> +			tab->spin_lock_off);
> +		return -EINVAL;
> +	}
> +
> +	if (meta->func_id == special_kfunc_list[KF_bpf_list_add] ||
> +	    meta->func_id == special_kfunc_list[KF_bpf_list_add_tail]) {
> +		if (!btf_struct_ids_match(&env->log, meta->arg_list_node.reg_btf,
> +					  meta->arg_list_node.reg_btf_id, 0,
> +					  field->list_head.btf, field->list_head.value_btf_id, true)) {
> +			verbose(env, "bpf_list_head value type does not match arg#0\n");
> +			return -EINVAL;
> +		}
> +		if (meta->arg_list_node.reg_offset != field->list_head.node_offset) {
> +			verbose(env, "arg#0 offset must be for bpf_list_node at off=%d\n",
> +				field->list_head.node_offset);
> +			return -EINVAL;
> +		}
> +		/* Set arg#0 for expiration after unlock */
> +		ref_obj_id_set_release_on_unlock(env, meta->arg_list_node.reg_ref_obj_id);
> +	} else {
> +		if (meta->arg_list_head.field) {
> +			verbose(env, "verifier internal error: repeating bpf_list_head arg\n");
> +			return -EFAULT;
> +		}
> +		meta->arg_list_head.field = field;
> +	}
> +	return 0;
> +}
> +
> +static int process_kf_arg_ptr_to_list_node(struct bpf_verifier_env *env,
> +					   struct bpf_reg_state *reg,
> +					   u32 regno,
> +					   struct bpf_kfunc_call_arg_meta *meta)
> +{
> +	struct btf_struct_meta *struct_meta;
> +	struct btf_type_fields *tab;
> +	struct btf_field *field;
> +	u32 list_node_off;
> +
> +	if (meta->btf != btf_vmlinux ||
> +	    (meta->func_id != special_kfunc_list[KF_bpf_list_add] &&
> +	     meta->func_id != special_kfunc_list[KF_bpf_list_add_tail])) {
> +		verbose(env, "verifier internal error: bpf_list_head argument for unknown kfunc\n");
> +		return -EFAULT;
> +	}
> +
> +	if (!tnum_is_const(reg->var_off)) {
> +		verbose(env,
> +			"R%d doesn't have constant offset. bpf_list_head has to be at the constant offset\n",
> +			regno);
> +		return -EINVAL;
> +	}
> +
> +	struct_meta = btf_find_struct_meta(reg->btf, reg->btf_id);
> +	if (!struct_meta) {
> +		verbose(env, "bpf_list_node not found for local kptr\n");
> +		return -EINVAL;
> +	}
> +	tab = struct_meta->fields_tab;
> +
> +	list_node_off = reg->off + reg->var_off.value;
> +	field = btf_type_fields_find(tab, list_node_off, BPF_LIST_NODE);
> +	if (!field || field->offset != list_node_off) {
> +		verbose(env, "bpf_list_node not found at offset=%u\n", list_node_off);
> +		return -EINVAL;
> +	}
> +	if (meta->arg_list_node.field) {
> +		verbose(env, "verifier internal error: repeating bpf_list_node arg\n");
> +		return -EFAULT;
> +	}
> +	meta->arg_list_node.field = field;
> +	meta->arg_list_node.reg_btf = reg->btf;
> +	meta->arg_list_node.reg_btf_id = reg->btf_id;
> +	meta->arg_list_node.reg_offset = list_node_off;
> +	meta->arg_list_node.reg_ref_obj_id = reg->ref_obj_id;
> +	return 0;
> +}
> +
>  static int check_kfunc_args(struct bpf_verifier_env *env, struct bpf_kfunc_call_arg_meta *meta)
>  {
>  	const char *func_name = meta->func_name, *ref_tname;
> @@ -8157,6 +8393,8 @@ static int check_kfunc_args(struct bpf_verifier_env *env, struct bpf_kfunc_call_
>  			break;
>  		case KF_ARG_PTR_TO_KPTR_STRONG:
>  		case KF_ARG_PTR_TO_DYNPTR:
> +		case KF_ARG_PTR_TO_LIST_HEAD:
> +		case KF_ARG_PTR_TO_LIST_NODE:
>  		case KF_ARG_PTR_TO_MEM:
>  		case KF_ARG_PTR_TO_MEM_SIZE:
>  			/* Trusted by default */
> @@ -8194,17 +8432,6 @@ static int check_kfunc_args(struct bpf_verifier_env *env, struct bpf_kfunc_call_
>  				meta->arg_kptr_drop.btf_id = reg->btf_id;
>  			}
>  			break;
> -		case KF_ARG_PTR_TO_BTF_ID:
> -			/* Only base_type is checked, further checks are done here */
> -			if (reg->type != PTR_TO_BTF_ID &&
> -			    (!reg2btf_ids[base_type(reg->type)] || type_flag(reg->type))) {
> -				verbose(env, "arg#%d expected pointer to btf or socket\n", i);
> -				return -EINVAL;
> -			}
> -			ret = process_kf_arg_ptr_to_btf_id(env, reg, ref_t, ref_tname, ref_id, meta, i);
> -			if (ret < 0)
> -				return ret;
> -			break;
>  		case KF_ARG_PTR_TO_KPTR_STRONG:
>  			if (reg->type != PTR_TO_MAP_VALUE) {
>  				verbose(env, "arg#0 expected pointer to map value\n");
> @@ -8232,6 +8459,44 @@ static int check_kfunc_args(struct bpf_verifier_env *env, struct bpf_kfunc_call_
>  				return -EINVAL;
>  			}
>  			break;
> +		case KF_ARG_PTR_TO_LIST_HEAD:
> +			if (reg->type != PTR_TO_MAP_VALUE &&
> +			    reg->type != (PTR_TO_BTF_ID | MEM_TYPE_LOCAL)) {
> +				verbose(env, "arg#%d expected pointer to map value or local kptr\n", i);
> +				return -EINVAL;
> +			}
> +			if (reg->type == (PTR_TO_BTF_ID | MEM_TYPE_LOCAL) && !reg->ref_obj_id) {
> +				verbose(env, "local kptr must be referenced\n");
> +				return -EINVAL;
> +			}
> +			ret = process_kf_arg_ptr_to_list_head(env, reg, regno, meta);
> +			if (ret < 0)
> +				return ret;
> +			break;
> +		case KF_ARG_PTR_TO_LIST_NODE:
> +			if (reg->type != (PTR_TO_BTF_ID | MEM_TYPE_LOCAL)) {
> +				verbose(env, "arg#%d expected point to local kptr\n", i);
> +				return -EINVAL;
> +			}
> +			if (!reg->ref_obj_id) {
> +				verbose(env, "local kptr must be referenced\n");
> +				return -EINVAL;
> +			}
> +			ret = process_kf_arg_ptr_to_list_node(env, reg, regno, meta);
> +			if (ret < 0)
> +				return ret;
> +			break;
> +		case KF_ARG_PTR_TO_BTF_ID:
> +			/* Only base_type is checked, further checks are done here */
> +			if (reg->type != PTR_TO_BTF_ID &&
> +			    (!reg2btf_ids[base_type(reg->type)] || type_flag(reg->type))) {
> +				verbose(env, "arg#%d expected pointer to btf or socket\n", i);
> +				return -EINVAL;
> +			}
> +			ret = process_kf_arg_ptr_to_btf_id(env, reg, ref_t, ref_tname, ref_id, meta, i);
> +			if (ret < 0)
> +				return ret;
> +			break;
>  		case KF_ARG_PTR_TO_MEM:
>  			resolve_ret = btf_resolve_size(btf, ref_t, &type_size);
>  			if (IS_ERR(resolve_ret)) {
> @@ -8352,11 +8617,6 @@ static int check_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
>  		ptr_type = btf_type_skip_modifiers(desc_btf, t->type, &ptr_type_id);
>  
>  		if (meta.btf == btf_vmlinux && btf_id_set_contains(&special_kfunc_set, meta.func_id)) {
> -			if (!btf_type_is_void(ptr_type)) {
> -				verbose(env, "kernel function %s must have void * return type\n",
> -					meta.func_name);
> -				return -EINVAL;
> -			}
>  			if (meta.func_id == special_kfunc_list[KF_bpf_kptr_new_impl]) {
>  				const struct btf_type *ret_t;
>  				struct btf *ret_btf;
> @@ -8394,6 +8654,15 @@ static int check_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
>  				env->insn_aux_data[insn_idx].kptr_struct_meta =
>  					btf_find_struct_meta(meta.arg_kptr_drop.btf,
>  							     meta.arg_kptr_drop.btf_id);
> +			} else if (meta.func_id == special_kfunc_list[KF_bpf_list_del] ||
> +				   meta.func_id == special_kfunc_list[KF_bpf_list_del_tail]) {
> +				struct btf_field *field = meta.arg_list_head.field;
> +
> +				mark_reg_known_zero(env, regs, BPF_REG_0);
> +				regs[BPF_REG_0].type = PTR_TO_BTF_ID | MEM_TYPE_LOCAL;
> +				regs[BPF_REG_0].btf = field->list_head.btf;
> +				regs[BPF_REG_0].btf_id = field->list_head.value_btf_id;
> +				regs[BPF_REG_0].off = field->list_head.node_offset;
>  			} else {
>  				verbose(env, "kernel function %s unhandled dynamic return type\n",
>  					meta.func_name);
> @@ -13062,11 +13331,18 @@ static int do_check(struct bpf_verifier_env *env)
>  					return -EINVAL;
>  				}
>  
> -				if (env->cur_state->active_spin_lock_ptr &&
> -				    (insn->src_reg == BPF_PSEUDO_CALL ||
> -				     insn->imm != BPF_FUNC_spin_unlock)) {
> -					verbose(env, "function calls are not allowed while holding a lock\n");
> -					return -EINVAL;
> +				if (env->cur_state->active_spin_lock_ptr) {
> +					if ((insn->src_reg == BPF_REG_0 && insn->imm != BPF_FUNC_spin_unlock) ||
> +					    (insn->src_reg == BPF_PSEUDO_CALL) ||
> +					    (insn->src_reg == BPF_PSEUDO_KFUNC_CALL &&
> +					     (insn->off != 0 ||
> +					      (insn->imm != special_kfunc_list[KF_bpf_list_add] &&
> +					       insn->imm != special_kfunc_list[KF_bpf_list_add_tail] &&
> +					       insn->imm != special_kfunc_list[KF_bpf_list_del] &&
> +					       insn->imm != special_kfunc_list[KF_bpf_list_del_tail])))) {

There's some similar special_kfunc_list checking in 
process_kf_arg_ptr_to_list_head. Can you make a helper for this check?
kfunc_manipulates_bpf_list or something? Similarly for
KF_bpf_list_del{_tail} check in previous hunk, maybe
something like kfunc_acquires_bpf_list_node?
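
Something along these lines, as an untested user-space sketch. The numeric IDs
are made up to stand in for the BTF IDs that special_kfunc_list really holds,
and both helper names are only suggestions:

```c
#include <stdbool.h>
#include <stdint.h>

enum special_kfunc_type {
	KF_bpf_kptr_new_impl,
	KF_bpf_kptr_drop_impl,
	KF_bpf_list_add,
	KF_bpf_list_add_tail,
	KF_bpf_list_del,
	KF_bpf_list_del_tail,
};

/* Placeholder IDs (assumption): in the kernel these are resolved from BTF
 * at build time by BTF_ID_LIST, not hard-coded.
 */
static const uint32_t special_kfunc_list[] = {
	[KF_bpf_kptr_new_impl]	= 101,
	[KF_bpf_kptr_drop_impl]	= 102,
	[KF_bpf_list_add]	= 103,
	[KF_bpf_list_add_tail]	= 104,
	[KF_bpf_list_del]	= 105,
	[KF_bpf_list_del_tail]	= 106,
};

/* True for the kfuncs that hand a list node back to the program. */
static bool kfunc_acquires_bpf_list_node(uint32_t func_id)
{
	return func_id == special_kfunc_list[KF_bpf_list_del] ||
	       func_id == special_kfunc_list[KF_bpf_list_del_tail];
}

/* True for every kfunc that is allowed to touch a bpf_list_head. */
static bool kfunc_manipulates_bpf_list(uint32_t func_id)
{
	return func_id == special_kfunc_list[KF_bpf_list_add] ||
	       func_id == special_kfunc_list[KF_bpf_list_add_tail] ||
	       kfunc_acquires_bpf_list_node(func_id);
}
```

The do_check condition and the check at the top of
process_kf_arg_ptr_to_list_head could then both call the second helper.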

> +						verbose(env, "function calls are not allowed while holding a lock\n");
> +						return -EINVAL;
> +					}
>  				}
>  				if (insn->src_reg == BPF_PSEUDO_CALL)
>  					err = check_func_call(env, insn, &env->insn_idx);
> diff --git a/tools/testing/selftests/bpf/bpf_experimental.h b/tools/testing/selftests/bpf/bpf_experimental.h
> index c47d16f3e817..21b85cd721cb 100644
> --- a/tools/testing/selftests/bpf/bpf_experimental.h
> +++ b/tools/testing/selftests/bpf/bpf_experimental.h
> @@ -52,4 +52,32 @@ extern void bpf_kptr_drop_impl(void *kptr, void *meta__ign) __ksym;
>  /* Convenience macro to wrap over bpf_kptr_drop_impl */
>  #define bpf_kptr_drop(kptr) bpf_kptr_drop_impl(kptr, NULL)
>  
> +/* Description
> + *	Add a new entry to the head of the BPF linked list.
> + * Returns
> + *	Void.
> + */
> +extern void bpf_list_add(struct bpf_list_node *node, struct bpf_list_head *head) __ksym;
> +
> +/* Description
> + *	Add a new entry to the tail of the BPF linked list.
> + * Returns
> + *	Void.
> + */
> +extern void bpf_list_add_tail(struct bpf_list_node *node, struct bpf_list_head *head) __ksym;
> +
> +/* Description
> + *	Remove the entry at head of the BPF linked list.
> + * Returns
> + *	Pointer to bpf_list_node of deleted entry, or NULL if list is empty.
> + */
> +extern struct bpf_list_node *bpf_list_del(struct bpf_list_head *head) __ksym;
> +
> +/* Description
> + *	Remove the entry at tail of the BPF linked list.
> + * Returns
> + *	Pointer to bpf_list_node of deleted entry, or NULL if list is empty.
> + */
> +extern struct bpf_list_node *bpf_list_del_tail(struct bpf_list_head *head) __ksym;
> +
>  #endif


* Re: [PATCH bpf-next v2 10/25] bpf: Introduce local kptrs
  2022-10-25 16:27       ` Dave Marchevsky
@ 2022-10-25 18:11         ` Kumar Kartikeya Dwivedi
  0 siblings, 0 replies; 52+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2022-10-25 18:11 UTC (permalink / raw)
  To: Dave Marchevsky
  Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Delyan Kratunov

On Tue, Oct 25, 2022 at 09:57:58PM IST, Dave Marchevsky wrote:
> On 10/19/22 8:48 PM, Kumar Kartikeya Dwivedi wrote:
> > On Wed, Oct 19, 2022 at 10:45:22PM IST, Dave Marchevsky wrote:
> >> On 10/13/22 2:22 AM, Kumar Kartikeya Dwivedi wrote:
> >>> Introduce the idea of local kptrs, i.e. PTR_TO_BTF_ID that point to a
> >>> type in program BTF. This is indicated by the presence of MEM_TYPE_LOCAL
> >>> type tag in reg->type to avoid having to check btf_is_kernel when trying
> >>> to match argument types in helpers.
> >>>
> >>> For now, these local kptrs will always be referenced in verifier
> >>> context, hence ref_obj_id == 0 for them is a bug. It is allowed to write
> >>> to such objects, as long as fields that are special are not touched
> >>> (support for which will be added in subsequent patches).
> >>>
> >>> No PROBE_MEM handling is hence done since they can never be in an
> >>> undefined state, and their lifetime will always be valid.
> >>>
> >>> Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
> >>> ---
>
> [...]
>
> >>
> >>
> >> more re: passing entire reg state to btf_struct access:
> >>
> >> In the next patch in the series ("bpf: Recognize bpf_{spin_lock,list_head,list_node} in local kptrs")
> >> you do btf_find_struct_meta(btf, reg->btf_id). I see why you couldn't use 't'
> >> that's passed in here / elsewhere since you need the btf_id for meta lookup.
> >> Perhaps 'btf_type *t' param can be changed to btf_id, eliminating the need
> >> to pass 'reg'.
> >>
> >> Alternatively, since we're already passing reg->btf and result of
> >> btf_type_by_id(reg->btf, reg->btf_id), seems like btf_struct_access
> >> maybe is tied closely enough to reg state that passing reg state
> >> directly and getting rid of extraneous args is cleaner.
> >>
> >
> > So Alexei actually suggested dropping both btf and type arguments and simply
> > pass in the register and get it from there.
> >
> > But one call site threw a wrench in the plan:
> >
> > check_ptr_to_map_access -> btf_struct_access
> >
> > Here, it passes its own btf and type to simulate access to a map. Maybe I
> > should be creating a dummy register on stack and make it work like that for this
> > particular case? Otherwise all other callers pass in what they have from reg.
>
> Ah, sorry for missing that. Personally I'm not a fan of dummy register on the
> stack. Then if btf_struct_access starts using some reg state that wasn't
> populated in the dummy reg it will be confusing.

Well, it can be initialized the same way as is done for normal regs: memset to 0,
then mark_reg_known_zero. The memset zeroes the state left untouched by
mark_reg_known_zero (reg->parent, reg->live, etc.), which won't matter here.

In general, your point is valid as well, but I think for this particular case we
can get away with it, more so because this is the only case needing such an
adjustment.

Or maybe it can be split into two, with the inner call taking btf and btf_id,
while the outer one passes them in. Then btf_struct_access uses reg to pass them
in, while __btf_struct_access takes them directly.
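
A rough sketch of that split, compilable in user space. The struct definitions
and the argument check are placeholders made up for illustration; the real
btf_struct_access takes more arguments and walks the type:

```c
#include <errno.h>
#include <stddef.h>

/* Heavily reduced stand-ins for the kernel types (assumption). */
struct btf {
	int placeholder;
};

struct bpf_reg_state {
	const struct btf *btf;
	unsigned int btf_id;
};

/* Inner call: takes btf and btf_id directly, so a caller that has no
 * register to offer (like check_ptr_to_map_access) can use it as is.
 * The validation below only stands in for the real struct walk.
 */
static int __btf_struct_access(const struct btf *btf, unsigned int btf_id,
			       int off, int size)
{
	if (!btf || !btf_id || off < 0 || size <= 0)
		return -EINVAL;
	return 0;
}

/* Outer call: every other caller passes its register, and the btf and
 * btf_id are taken from there.
 */
static int btf_struct_access(const struct bpf_reg_state *reg, int off, int size)
{
	return __btf_struct_access(reg->btf, reg->btf_id, off, size);
}
```

That way no dummy register is needed for the map access case.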


* Re: [PATCH bpf-next v2 10/25] bpf: Introduce local kptrs
  2022-10-25 16:32   ` Dave Marchevsky
@ 2022-10-25 18:11     ` Kumar Kartikeya Dwivedi
  0 siblings, 0 replies; 52+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2022-10-25 18:11 UTC (permalink / raw)
  To: Dave Marchevsky
  Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Delyan Kratunov

On Tue, Oct 25, 2022 at 10:02:49PM IST, Dave Marchevsky wrote:
> On 10/13/22 2:22 AM, Kumar Kartikeya Dwivedi wrote:
> > Introduce the idea of local kptrs, i.e. PTR_TO_BTF_ID that point to a
> > type in program BTF. This is indicated by the presence of MEM_TYPE_LOCAL
> > type tag in reg->type to avoid having to check btf_is_kernel when trying
> > to match argument types in helpers.
> >
> > For now, these local kptrs will always be referenced in verifier
> > context, hence ref_obj_id == 0 for them is a bug. It is allowed to write
> > to such objects, as long as fields that are special are not touched
> > (support for which will be added in subsequent patches).
> >
> > No PROBE_MEM handling is hence done since they can never be in an
> > undefined state, and their lifetime will always be valid.
> >
> > Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
> > ---
>
> One nit unrelated to the other thread we have going for this patch.
>
> [...]
>
> > diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c
> > index 066984d73a8b..65f444405d9c 100644
> > --- a/kernel/bpf/btf.c
> > +++ b/kernel/bpf/btf.c
> > @@ -6019,11 +6019,13 @@ static int btf_struct_walk(struct bpf_verifier_log *log, const struct btf *btf,
> >  	return -EINVAL;
> >  }
> >
> > -int btf_struct_access(struct bpf_verifier_log *log, const struct btf *btf,
> > +int btf_struct_access(struct bpf_verifier_log *log,
> > +		      const struct bpf_reg_state *reg, const struct btf *btf,
> >  		      const struct btf_type *t, int off, int size,
> >  		      enum bpf_access_type atype __maybe_unused,
> >  		      u32 *next_btf_id, enum bpf_type_flag *flag)
> >  {
> > +	bool local_type = reg && (type_flag(reg->type) & MEM_TYPE_LOCAL);
>
> Can you add a type_is_local_kptr helper (or similar name) to reduce this
> type_flag(reg->type) & MEM_TYPE_LOCAL repetition here and elsewhere in the patch?
> Some examples of repetition in verifier.c below.
>

Good point. It was there in the RFC, but for some reason I decided against it. I
will include it in v3.
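
Roughly the following, as a user-space sketch. The bit position chosen for
MEM_TYPE_LOCAL and the PTR_TO_BTF_ID value are placeholders rather than the
real enum values, and the helper name is just the one suggested above:

```c
#include <stdbool.h>
#include <stdint.h>

/* Placeholder layout (assumption): low bits hold the base type, the bits
 * above it hold type flags.
 */
#define BPF_BASE_TYPE_BITS	8
#define BPF_BASE_TYPE_MASK	((1u << BPF_BASE_TYPE_BITS) - 1)
#define PTR_TO_BTF_ID		1u
#define MEM_TYPE_LOCAL		(1u << BPF_BASE_TYPE_BITS)

/* Strip the base type and keep only the flag bits. */
static uint32_t type_flag(uint32_t type)
{
	return type & ~BPF_BASE_TYPE_MASK;
}

/* Single place for the check that is currently open-coded at each site. */
static bool type_is_local_kptr(uint32_t type)
{
	return (type_flag(type) & MEM_TYPE_LOCAL) != 0;
}
```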


* Re: [PATCH bpf-next v2 22/25] bpf: Introduce single ownership BPF linked list API
  2022-10-25 17:45   ` Dave Marchevsky
@ 2022-10-25 19:00     ` Kumar Kartikeya Dwivedi
  0 siblings, 0 replies; 52+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2022-10-25 19:00 UTC (permalink / raw)
  To: Dave Marchevsky
  Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Delyan Kratunov

On Tue, Oct 25, 2022 at 11:15:12PM IST, Dave Marchevsky wrote:
> On 10/13/22 2:23 AM, Kumar Kartikeya Dwivedi wrote:
> > Add a linked list API for use in BPF programs, where it expects
> > protection from the bpf_spin_lock in the same allocation as the
> > bpf_list_head. Future patches will extend the same infrastructure to
> > have different flavors with varying protection domains and visibility
> > (e.g. percpu variant with local_t protection, usable in NMI progs).
> >
> > The following functions are added to kick things off:
> >
> > bpf_list_add
> > bpf_list_add_tail
> > bpf_list_del
> > bpf_list_del_tail
> >
> > The lock protecting the bpf_list_head needs to be taken for all
> > operations.
> >
> > Once a node has been added to the list, its pointer changes to
> > PTR_UNTRUSTED. However, it is only released once the lock protecting the
> > list is unlocked. For such local kptrs with PTR_UNTRUSTED set but an
> > active ref_obj_id, it is still permitted to read and write to them as
> > long as the lock is held.
> >
> > bpf_list_del and bpf_list_del_tail delete the first or last item of the
> > list respectively, and return pointer to the element at the list_node
> > offset. The user can then use container_of style macro to get the actual
> > entry type. The verifier however statically knows the actual type, so
> > the safety properties are still preserved.
> >
> > With these additions, programs can now manage their own linked lists and
> > store their objects in them.
> >
> > Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
> > ---
> > [...]
> > +static bool ref_obj_id_set_release_on_unlock(struct bpf_verifier_env *env, u32 ref_obj_id)
> > +{
> > +	struct bpf_func_state *state = cur_func(env);
> > +	struct bpf_reg_state *reg;
> > +	int i;
> > +
> > +	/* bpf_spin_lock only allows calling list_add and list_del, no BPF
> > +	 * subprogs, no global functions, so this acquired refs state is the
> > +	 * same one we will use to find registers to kill on bpf_spin_unlock.
> > +	 */
>
> It's unclear to me what "only allows calling ... is the same one" in this
> comment is trying to say. Are you trying to say something similar to your
> comment in the process_spin_lock change in this patch? e.g. "bpf_spin_lock CS
> does not allow functions that release the local kptr, so this ref_obj_id will
> still be valid then". At least to me the language in the other comment is
> clearer.

It's not even about the same ref_obj_id. It means what it says: when you call a
BPF function, the reference state is copied into the callee func state. Since
that is not allowed here, the walk over reference state IDs that kills the
release_on_unlock ones on bpf_spin_unlock will always happen for this particular
bpf_reference_state only.

>
> > +	WARN_ON_ONCE(!ref_obj_id);
>
> Can this be a verbose("verifier internal error: ...") + return?
> Same for similar 'this should never happen' checks elsewhere in this function
> and patch.
>

Yes, I will address similar patterns elsewhere.

> > +	for (i = 0; i < state->acquired_refs; i++) {
> > +		if (state->refs[i].id == ref_obj_id) {
> > +			WARN_ON_ONCE(state->refs[i].release_on_unlock);
> > +			state->refs[i].release_on_unlock = true;
> > +			/* Now mark everyone sharing same ref_obj_id as untrusted */
> > +			bpf_for_each_reg_in_vstate(env->cur_state, state, reg, ({
> > +				if (reg->ref_obj_id == ref_obj_id)
> > +					reg->type |= PTR_UNTRUSTED;
>
> To confirm my understanding: since ownership of the thing this reference points
> to has been transferred to the datastructure, it's now necessary to mark all
> instances of the reference PTR_UNTRUSTED to prevent them from being passed
> to helpers/kfuncs as the owning datastructure could make it disappear
> at any time? Or because arbitrary kfunc might mess with bpf_list_node internal
> fields?
>

Yes, it's because the ownership through that reference is gone. So access to the
object is still permitted until unlock, but only through memory accesses handled
using PROBE_MEM.

'Through that reference' is key, because the object might return to the program
in the case of push_front -> pop_back, but we cannot know that for sure (at least
not without more complicated heap modelling for the list). The verifier cannot
know when ownership of this specific object returns to the program, so this
particular reference (identified by the ref_obj_id) needs to become untrusted.

Marking it untrusted serves both as a way to prevent passing it to list helpers
anymore, and as a way to give safe read access to a potentially invalid object
after its ownership is gone. This is the same state a kptr/kptr_ref loaded using
a load insn from a BPF map ends up in.
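
As a rough illustration of the marking step, here is a toy model (the toy_*
names and the flag bit are made up, this is not the verifier's register state):
every register sharing the ref_obj_id gets the untrusted flag, and a flagged
register is then rejected as a list kfunc argument.

```c
#include <stdbool.h>
#include <stddef.h>

#define TOY_PTR_UNTRUSTED (1u << 8)	/* illustrative flag bit, not the kernel value */

struct toy_reg {
	unsigned int type;
	unsigned int ref_obj_id;
};

/* Flag every register that aliases the reference being handed to the list. */
static void toy_mark_untrusted(struct toy_reg *regs, size_t n, unsigned int ref_obj_id)
{
	for (size_t i = 0; i < n; i++) {
		if (regs[i].ref_obj_id == ref_obj_id)
			regs[i].type |= TOY_PTR_UNTRUSTED;
	}
}

/* A list kfunc argument must be referenced and must not be untrusted. */
static bool toy_can_pass_to_list_kfunc(const struct toy_reg *reg)
{
	return reg->ref_obj_id && !(reg->type & TOY_PTR_UNTRUSTED);
}
```

Registers holding other references are left alone, so only aliases of the
inserted object lose their trusted status.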

I have switched things a bit to disallow stores. Allowing them is a bug right now
in this set, because one can do this:

push_front(head, &p->node);
p2 = container_of(pop_front(head));
// p2 == p
bpf_obj_drop(p2);
p->data = ...; // store through the stale p after the object was dropped

One can always fully initialize the object _before_ inserting it into the list;
in some cases (like adding to RCU protected lists) that will be required for
correctness.

> > +			}));
> > +			return 0;
> > +		}
> > +	}
> > +	verbose(env, "verifier internal error: ref state missing for ref_obj_id\n");
> > +	return -EFAULT;
>
> You're returning -EFAULT here, but the fn return type is 'bool' above.
>

Ack.
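
For the record, a small standalone sketch of why the bool return type matters
(the function names are hypothetical): a negative errno returned through bool
collapses to true, so the caller can no longer see the error code. Note that the
call in process_kf_arg_ptr_to_list_head also ignores the return value, so that
probably wants a check as well once the type is int.

```c
#include <errno.h>
#include <stdbool.h>

/* Declared bool: any nonzero value, including -EFAULT, becomes true (1). */
static bool finish_as_bool(int err)
{
	return err;
}

/* Declared int: the error code reaches the caller unchanged. */
static int finish_as_int(int err)
{
	return err;
}
```

With these, finish_as_bool(-EFAULT) compares equal to true and is never
negative, while finish_as_int(-EFAULT) can be tested with the usual ret < 0.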

> > +}
> > +
> > +static bool is_reg_allocation_locked(struct bpf_verifier_env *env, struct bpf_reg_state *reg)
> > +{
> > +	void *ptr;
> > +	u32 id;
> > +
> > +	switch ((int)reg->type) {
> > +	case PTR_TO_MAP_VALUE:
> > +		ptr = reg->map_ptr;
> > +		break;
> > +	case PTR_TO_BTF_ID | MEM_TYPE_LOCAL:
> > +		ptr = reg->btf;
> > +		break;
> > +	default:
> > +		WARN_ON_ONCE(1);
> > +		return false;
> > +	}
> > +	id = reg->id;
> > +
> > +	return env->cur_state->active_spin_lock_ptr == ptr &&
> > +	       env->cur_state->active_spin_lock_id == id;
> > +}
> > +
> > +static int process_kf_arg_ptr_to_list_head(struct bpf_verifier_env *env,
> > +					   struct bpf_reg_state *reg,
> > +					   u32 regno,
> > +					   struct bpf_kfunc_call_arg_meta *meta)
> > +{
> > +	struct btf_type_fields *tab = NULL;
> > +	struct btf_field *field;
> > +	u32 list_head_off;
> > +
> > +	if (meta->btf != btf_vmlinux ||
> > +	    (meta->func_id != special_kfunc_list[KF_bpf_list_add] &&
> > +	     meta->func_id != special_kfunc_list[KF_bpf_list_add_tail] &&
> > +	     meta->func_id != special_kfunc_list[KF_bpf_list_del] &&
> > +	     meta->func_id != special_kfunc_list[KF_bpf_list_del_tail])) {
> > +		verbose(env, "verifier internal error: bpf_list_head argument for unknown kfunc\n");
> > +		return -EFAULT;
> > +	}
> > +
> > +	if (reg->type == PTR_TO_MAP_VALUE) {
> > +		tab = reg->map_ptr->fields_tab;
> > +	} else /* PTR_TO_BTF_ID | MEM_TYPE_LOCAL */ {
> > +		struct btf_struct_meta *meta;
> > +
> > +		meta = btf_find_struct_meta(reg->btf, reg->btf_id);
> > +		if (!meta) {
> > +			verbose(env, "bpf_list_head not found for local kptr\n");
> > +			return -EINVAL;
> > +		}
> > +		tab = meta->fields_tab;
> > +	}
> > +
> > +	if (!tnum_is_const(reg->var_off)) {
> > +		verbose(env,
> > +			"R%d doesn't have constant offset. bpf_list_head has to be at the constant offset\n",
> > +			regno);
> > +		return -EINVAL;
> > +	}
> > +
> > +	list_head_off = reg->off + reg->var_off.value;
> > +	field = btf_type_fields_find(tab, list_head_off, BPF_LIST_HEAD);
> > +	if (!field) {
> > +		verbose(env, "bpf_list_head not found at offset=%u\n", list_head_off);
> > +		return -EINVAL;
> > +	}
> > +
> > +	/* All functions require bpf_list_head to be protected using a bpf_spin_lock */
> > +	if (!is_reg_allocation_locked(env, reg)) {
> > +		verbose(env, "bpf_spin_lock at off=%d must be held for manipulating bpf_list_head\n",
> > +			tab->spin_lock_off);
> > +		return -EINVAL;
> > +	}
> > +
> > +	if (meta->func_id == special_kfunc_list[KF_bpf_list_add] ||
> > +	    meta->func_id == special_kfunc_list[KF_bpf_list_add_tail]) {
> > +		if (!btf_struct_ids_match(&env->log, meta->arg_list_node.reg_btf,
> > +					  meta->arg_list_node.reg_btf_id, 0,
> > +					  field->list_head.btf, field->list_head.value_btf_id, true)) {
> > +			verbose(env, "bpf_list_head value type does not match arg#0\n");
> > +			return -EINVAL;
> > +		}
> > +		if (meta->arg_list_node.reg_offset != field->list_head.node_offset) {
> > +			verbose(env, "arg#0 offset must be for bpf_list_node at off=%d\n",
> > +				field->list_head.node_offset);
> > +			return -EINVAL;
> > +		}
> > +		/* Set arg#0 for expiration after unlock */
> > +		ref_obj_id_set_release_on_unlock(env, meta->arg_list_node.reg_ref_obj_id);
> > +	} else {
> > +		if (meta->arg_list_head.field) {
> > +			verbose(env, "verifier internal error: repeating bpf_list_head arg\n");
> > +			return -EFAULT;
> > +		}
> > +		meta->arg_list_head.field = field;
> > +	}
> > +	return 0;
> > +}
> > +
> > +static int process_kf_arg_ptr_to_list_node(struct bpf_verifier_env *env,
> > +					   struct bpf_reg_state *reg,
> > +					   u32 regno,
> > +					   struct bpf_kfunc_call_arg_meta *meta)
> > +{
> > +	struct btf_struct_meta *struct_meta;
> > +	struct btf_type_fields *tab;
> > +	struct btf_field *field;
> > +	u32 list_node_off;
> > +
> > +	if (meta->btf != btf_vmlinux ||
> > +	    (meta->func_id != special_kfunc_list[KF_bpf_list_add] &&
> > +	     meta->func_id != special_kfunc_list[KF_bpf_list_add_tail])) {
> > +		verbose(env, "verifier internal error: bpf_list_head argument for unknown kfunc\n");
> > +		return -EFAULT;
> > +	}
> > +
> > +	if (!tnum_is_const(reg->var_off)) {
> > +		verbose(env,
> > +			"R%d doesn't have constant offset. bpf_list_head has to be at the constant offset\n",
> > +			regno);
> > +		return -EINVAL;
> > +	}
> > +
> > +	struct_meta = btf_find_struct_meta(reg->btf, reg->btf_id);
> > +	if (!struct_meta) {
> > +		verbose(env, "bpf_list_node not found for local kptr\n");
> > +		return -EINVAL;
> > +	}
> > +	tab = struct_meta->fields_tab;
> > +
> > +	list_node_off = reg->off + reg->var_off.value;
> > +	field = btf_type_fields_find(tab, list_node_off, BPF_LIST_NODE);
> > +	if (!field || field->offset != list_node_off) {
> > +		verbose(env, "bpf_list_node not found at offset=%u\n", list_node_off);
> > +		return -EINVAL;
> > +	}
> > +	if (meta->arg_list_node.field) {
> > +		verbose(env, "verifier internal error: repeating bpf_list_node arg\n");
> > +		return -EFAULT;
> > +	}
> > +	meta->arg_list_node.field = field;
> > +	meta->arg_list_node.reg_btf = reg->btf;
> > +	meta->arg_list_node.reg_btf_id = reg->btf_id;
> > +	meta->arg_list_node.reg_offset = list_node_off;
> > +	meta->arg_list_node.reg_ref_obj_id = reg->ref_obj_id;
> > +	return 0;
> > +}
> > +
> >  static int check_kfunc_args(struct bpf_verifier_env *env, struct bpf_kfunc_call_arg_meta *meta)
> >  {
> >  	const char *func_name = meta->func_name, *ref_tname;
> > @@ -8157,6 +8393,8 @@ static int check_kfunc_args(struct bpf_verifier_env *env, struct bpf_kfunc_call_
> >  			break;
> >  		case KF_ARG_PTR_TO_KPTR_STRONG:
> >  		case KF_ARG_PTR_TO_DYNPTR:
> > +		case KF_ARG_PTR_TO_LIST_HEAD:
> > +		case KF_ARG_PTR_TO_LIST_NODE:
> >  		case KF_ARG_PTR_TO_MEM:
> >  		case KF_ARG_PTR_TO_MEM_SIZE:
> >  			/* Trusted by default */
> > @@ -8194,17 +8432,6 @@ static int check_kfunc_args(struct bpf_verifier_env *env, struct bpf_kfunc_call_
> >  				meta->arg_kptr_drop.btf_id = reg->btf_id;
> >  			}
> >  			break;
> > -		case KF_ARG_PTR_TO_BTF_ID:
> > -			/* Only base_type is checked, further checks are done here */
> > -			if (reg->type != PTR_TO_BTF_ID &&
> > -			    (!reg2btf_ids[base_type(reg->type)] || type_flag(reg->type))) {
> > -				verbose(env, "arg#%d expected pointer to btf or socket\n", i);
> > -				return -EINVAL;
> > -			}
> > -			ret = process_kf_arg_ptr_to_btf_id(env, reg, ref_t, ref_tname, ref_id, meta, i);
> > -			if (ret < 0)
> > -				return ret;
> > -			break;
> >  		case KF_ARG_PTR_TO_KPTR_STRONG:
> >  			if (reg->type != PTR_TO_MAP_VALUE) {
> >  				verbose(env, "arg#0 expected pointer to map value\n");
> > @@ -8232,6 +8459,44 @@ static int check_kfunc_args(struct bpf_verifier_env *env, struct bpf_kfunc_call_
> >  				return -EINVAL;
> >  			}
> >  			break;
> > +		case KF_ARG_PTR_TO_LIST_HEAD:
> > +			if (reg->type != PTR_TO_MAP_VALUE &&
> > +			    reg->type != (PTR_TO_BTF_ID | MEM_TYPE_LOCAL)) {
> > +				verbose(env, "arg#%d expected pointer to map value or local kptr\n", i);
> > +				return -EINVAL;
> > +			}
> > +			if (reg->type == (PTR_TO_BTF_ID | MEM_TYPE_LOCAL) && !reg->ref_obj_id) {
> > +				verbose(env, "local kptr must be referenced\n");
> > +				return -EINVAL;
> > +			}
> > +			ret = process_kf_arg_ptr_to_list_head(env, reg, regno, meta);
> > +			if (ret < 0)
> > +				return ret;
> > +			break;
> > +		case KF_ARG_PTR_TO_LIST_NODE:
> > +			if (reg->type != (PTR_TO_BTF_ID | MEM_TYPE_LOCAL)) {
> > +				verbose(env, "arg#%d expected point to local kptr\n", i);
> > +				return -EINVAL;
> > +			}
> > +			if (!reg->ref_obj_id) {
> > +				verbose(env, "local kptr must be referenced\n");
> > +				return -EINVAL;
> > +			}
> > +			ret = process_kf_arg_ptr_to_list_node(env, reg, regno, meta);
> > +			if (ret < 0)
> > +				return ret;
> > +			break;
> > +		case KF_ARG_PTR_TO_BTF_ID:
> > +			/* Only base_type is checked, further checks are done here */
> > +			if (reg->type != PTR_TO_BTF_ID &&
> > +			    (!reg2btf_ids[base_type(reg->type)] || type_flag(reg->type))) {
> > +				verbose(env, "arg#%d expected pointer to btf or socket\n", i);
> > +				return -EINVAL;
> > +			}
> > +			ret = process_kf_arg_ptr_to_btf_id(env, reg, ref_t, ref_tname, ref_id, meta, i);
> > +			if (ret < 0)
> > +				return ret;
> > +			break;
> >  		case KF_ARG_PTR_TO_MEM:
> >  			resolve_ret = btf_resolve_size(btf, ref_t, &type_size);
> >  			if (IS_ERR(resolve_ret)) {
> > @@ -8352,11 +8617,6 @@ static int check_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
> >  		ptr_type = btf_type_skip_modifiers(desc_btf, t->type, &ptr_type_id);
> >
> >  		if (meta.btf == btf_vmlinux && btf_id_set_contains(&special_kfunc_set, meta.func_id)) {
> > -			if (!btf_type_is_void(ptr_type)) {
> > -				verbose(env, "kernel function %s must have void * return type\n",
> > -					meta.func_name);
> > -				return -EINVAL;
> > -			}
> >  			if (meta.func_id == special_kfunc_list[KF_bpf_kptr_new_impl]) {
> >  				const struct btf_type *ret_t;
> >  				struct btf *ret_btf;
> > @@ -8394,6 +8654,15 @@ static int check_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
> >  				env->insn_aux_data[insn_idx].kptr_struct_meta =
> >  					btf_find_struct_meta(meta.arg_kptr_drop.btf,
> >  							     meta.arg_kptr_drop.btf_id);
> > +			} else if (meta.func_id == special_kfunc_list[KF_bpf_list_del] ||
> > +				   meta.func_id == special_kfunc_list[KF_bpf_list_del_tail]) {
> > +				struct btf_field *field = meta.arg_list_head.field;
> > +
> > +				mark_reg_known_zero(env, regs, BPF_REG_0);
> > +				regs[BPF_REG_0].type = PTR_TO_BTF_ID | MEM_TYPE_LOCAL;
> > +				regs[BPF_REG_0].btf = field->list_head.btf;
> > +				regs[BPF_REG_0].btf_id = field->list_head.value_btf_id;
> > +				regs[BPF_REG_0].off = field->list_head.node_offset;
> >  			} else {
> >  				verbose(env, "kernel function %s unhandled dynamic return type\n",
> >  					meta.func_name);
> > @@ -13062,11 +13331,18 @@ static int do_check(struct bpf_verifier_env *env)
> >  					return -EINVAL;
> >  				}
> >
> > -				if (env->cur_state->active_spin_lock_ptr &&
> > -				    (insn->src_reg == BPF_PSEUDO_CALL ||
> > -				     insn->imm != BPF_FUNC_spin_unlock)) {
> > -					verbose(env, "function calls are not allowed while holding a lock\n");
> > -					return -EINVAL;
> > +				if (env->cur_state->active_spin_lock_ptr) {
> > +					if ((insn->src_reg == BPF_REG_0 && insn->imm != BPF_FUNC_spin_unlock) ||
> > +					    (insn->src_reg == BPF_PSEUDO_CALL) ||
> > +					    (insn->src_reg == BPF_PSEUDO_KFUNC_CALL &&
> > +					     (insn->off != 0 ||
> > +					      (insn->imm != special_kfunc_list[KF_bpf_list_add] &&
> > +					       insn->imm != special_kfunc_list[KF_bpf_list_add_tail] &&
> > +					       insn->imm != special_kfunc_list[KF_bpf_list_del] &&
> > +					       insn->imm != special_kfunc_list[KF_bpf_list_del_tail])))) {
>
> There's some similar special_kfunc_list checking in
> process_kf_arg_ptr_to_list_head. Can you make a helper for this check?
> kfunc_manipulates_bpf_list or something? Similarly for
> KF_bpf_list_del{_tail} check in previous hunk, maybe
> something like kfunc_acquires_bpf_list_node?
>

Good point, it makes sense to hide this list behind a helper.
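
Something along these lines, as a standalone sketch: the helper names are the
ones suggested above, the enum order and the BTF IDs in special_kfunc_list are
placeholders I made up so that it compiles on its own.

```c
#include <stdbool.h>

typedef unsigned int u32;

/* Index names follow the patch; the order and the ID values are placeholders. */
enum special_kfunc_type {
	KF_bpf_list_add,
	KF_bpf_list_add_tail,
	KF_bpf_list_del,
	KF_bpf_list_del_tail,
	KF_MAX,
};

static const u32 special_kfunc_list[KF_MAX] = {
	[KF_bpf_list_add]      = 101,
	[KF_bpf_list_add_tail] = 102,
	[KF_bpf_list_del]      = 103,
	[KF_bpf_list_del_tail] = 104,
};

/* True for the kfuncs that hand a list node back to the program. */
static bool kfunc_acquires_bpf_list_node(u32 func_id)
{
	return func_id == special_kfunc_list[KF_bpf_list_del] ||
	       func_id == special_kfunc_list[KF_bpf_list_del_tail];
}

/* True for every kfunc that is allowed to touch a bpf_list_head. */
static bool kfunc_manipulates_bpf_list(u32 func_id)
{
	return func_id == special_kfunc_list[KF_bpf_list_add] ||
	       func_id == special_kfunc_list[KF_bpf_list_add_tail] ||
	       kfunc_acquires_bpf_list_node(func_id);
}
```

The check in do_check and the one at the top of process_kf_arg_ptr_to_list_head
could then both call kfunc_manipulates_bpf_list(), and the return type handling
in check_kfunc_call could use kfunc_acquires_bpf_list_node().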

Thread overview: 52+ messages
2022-10-13  6:22 [PATCH bpf-next v2 00/25] Local kptrs, BPF linked lists Kumar Kartikeya Dwivedi
2022-10-13  6:22 ` [PATCH bpf-next v2 01/25] bpf: Document UAPI details for special BPF types Kumar Kartikeya Dwivedi
2022-10-13  6:22 ` [PATCH bpf-next v2 02/25] bpf: Allow specifying volatile type modifier for kptrs Kumar Kartikeya Dwivedi
2022-10-13  6:22 ` [PATCH bpf-next v2 03/25] bpf: Clobber stack slot when writing over spilled PTR_TO_BTF_ID Kumar Kartikeya Dwivedi
2022-10-13  6:22 ` [PATCH bpf-next v2 04/25] bpf: Fix slot type check in check_stack_write_var_off Kumar Kartikeya Dwivedi
2022-10-13  6:22 ` [PATCH bpf-next v2 05/25] bpf: Drop reg_type_may_be_refcounted_or_null Kumar Kartikeya Dwivedi
2022-10-19 16:04   ` Dave Marchevsky
2022-10-13  6:22 ` [PATCH bpf-next v2 06/25] bpf: Refactor kptr_off_tab into fields_tab Kumar Kartikeya Dwivedi
2022-10-19  1:35   ` Alexei Starovoitov
2022-10-19  5:42     ` Kumar Kartikeya Dwivedi
2022-10-19 15:54       ` Alexei Starovoitov
2022-10-19 23:57         ` Kumar Kartikeya Dwivedi
2022-10-13  6:22 ` [PATCH bpf-next v2 07/25] bpf: Consolidate spin_lock, timer management " Kumar Kartikeya Dwivedi
2022-10-19  1:40   ` Alexei Starovoitov
2022-10-19  5:43     ` Kumar Kartikeya Dwivedi
2022-10-13  6:22 ` [PATCH bpf-next v2 08/25] bpf: Refactor map->off_arr handling Kumar Kartikeya Dwivedi
2022-10-13  6:22 ` [PATCH bpf-next v2 09/25] bpf: Support bpf_list_head in map values Kumar Kartikeya Dwivedi
2022-10-19  1:59   ` Alexei Starovoitov
2022-10-19  5:48     ` Kumar Kartikeya Dwivedi
2022-10-19 15:57       ` Alexei Starovoitov
2022-10-19 23:59         ` Kumar Kartikeya Dwivedi
2022-10-13  6:22 ` [PATCH bpf-next v2 10/25] bpf: Introduce local kptrs Kumar Kartikeya Dwivedi
2022-10-19 17:15   ` Dave Marchevsky
2022-10-20  0:48     ` Kumar Kartikeya Dwivedi
2022-10-25 16:27       ` Dave Marchevsky
2022-10-25 18:11         ` Kumar Kartikeya Dwivedi
2022-10-25 16:32   ` Dave Marchevsky
2022-10-25 18:11     ` Kumar Kartikeya Dwivedi
2022-10-13  6:22 ` [PATCH bpf-next v2 11/25] bpf: Recognize bpf_{spin_lock,list_head,list_node} in " Kumar Kartikeya Dwivedi
2022-10-13  6:22 ` [PATCH bpf-next v2 12/25] bpf: Verify ownership relationships for owning types Kumar Kartikeya Dwivedi
2022-10-13  6:22 ` [PATCH bpf-next v2 13/25] bpf: Support locking bpf_spin_lock in local kptr Kumar Kartikeya Dwivedi
2022-10-13  6:22 ` [PATCH bpf-next v2 14/25] bpf: Allow locking bpf_spin_lock global variables Kumar Kartikeya Dwivedi
2022-10-13  6:22 ` [PATCH bpf-next v2 15/25] bpf: Rewrite kfunc argument handling Kumar Kartikeya Dwivedi
2022-10-13 13:48   ` kernel test robot
2022-10-13  6:22 ` [PATCH bpf-next v2 16/25] bpf: Drop kfunc bits from btf_check_func_arg_match Kumar Kartikeya Dwivedi
2022-10-13  6:22 ` [PATCH bpf-next v2 17/25] bpf: Support constant scalar arguments for kfuncs Kumar Kartikeya Dwivedi
2022-10-13  6:22 ` [PATCH bpf-next v2 18/25] bpf: Teach verifier about non-size constant arguments Kumar Kartikeya Dwivedi
2022-10-13  6:22 ` [PATCH bpf-next v2 19/25] bpf: Introduce bpf_kptr_new Kumar Kartikeya Dwivedi
2022-10-19  2:31   ` Alexei Starovoitov
2022-10-19  5:58     ` Kumar Kartikeya Dwivedi
2022-10-19 16:31       ` Alexei Starovoitov
2022-10-20  0:44         ` Kumar Kartikeya Dwivedi
2022-10-20  1:11           ` Alexei Starovoitov
2022-10-13  6:22 ` [PATCH bpf-next v2 20/25] bpf: Introduce bpf_kptr_drop Kumar Kartikeya Dwivedi
2022-10-13  6:22 ` [PATCH bpf-next v2 21/25] bpf: Permit NULL checking pointer with non-zero fixed offset Kumar Kartikeya Dwivedi
2022-10-13  6:23 ` [PATCH bpf-next v2 22/25] bpf: Introduce single ownership BPF linked list API Kumar Kartikeya Dwivedi
2022-10-25 17:45   ` Dave Marchevsky
2022-10-25 19:00     ` Kumar Kartikeya Dwivedi
2022-10-13  6:23 ` [PATCH bpf-next v2 23/25] libbpf: Add support for private BSS map section Kumar Kartikeya Dwivedi
2022-10-18  4:03   ` Andrii Nakryiko
2022-10-13  6:23 ` [PATCH bpf-next v2 24/25] selftests/bpf: Add __contains macro to bpf_experimental.h Kumar Kartikeya Dwivedi
2022-10-13  6:23 ` [PATCH bpf-next v2 25/25] selftests/bpf: Add BPF linked list API tests Kumar Kartikeya Dwivedi
