* [PATCH bpf-next v7 00/26] Allocated objects, BPF linked lists
@ 2022-11-14 19:15 Kumar Kartikeya Dwivedi
  2022-11-14 19:15 ` [PATCH bpf-next v7 01/26] bpf: Remove local kptr references in documentation Kumar Kartikeya Dwivedi
                   ` (26 more replies)
  0 siblings, 27 replies; 48+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2022-11-14 19:15 UTC (permalink / raw)
  To: bpf
  Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Martin KaFai Lau, Dave Marchevsky

This series introduces user-defined BPF objects, i.e. objects of a type
defined in program BTF. This allows BPF programs to allocate their own
objects, build their own object hierarchies, and flexibly construct
their own data structures using the basic building blocks provided by
the BPF runtime.

Then, we introduce support for single ownership BPF linked lists, which
can be placed inside BPF maps or allocated objects, and which hold such
allocated objects as elements. The list works as an intrusive
collection, a choice made to allow allocated objects to become part of
multiple data structures at the same time in the future.

The eventual goal of this and future patches is to allow a limited form
of kernel-style programming in BPF C, and to let programmers build their
own complex data structures flexibly out of basic building blocks.

The key difference will be that such programs are verified to be safe,
preserve runtime integrity of the system, and are proven to be bug-free
as far as the invariants of BPF-specific APIs are concerned.

One immediate use case for the entire infrastructure this series
introduces is managing percpu NMI-safe linked lists inside BPF
programs.

The other use case this will serve in the near future is linking kernel
structures like XDP frame and sk_buff directly into user data
structures (rbtree, pifomap, etc.) for packet queueing. This will
follow the single ownership concept introduced in this series.

The user has complete control of the internal locking, and hence also
the batching of operations for each critical section.

The features are:
- Allocated objects.
- bpf_obj_new, bpf_obj_drop to allocate and free them.
- Single ownership BPF linked lists.
  - Support for them in BPF maps.
  - Support for them in allocated objects.
- Global spin locks.
- Spin locks inside allocated objects.
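
To give a feel for how these pieces compose, below is a rough sketch in
BPF C. It uses the kfuncs and the __contains macro this series adds
through bpf_experimental.h; treat it as an illustrative sketch only, the
selftests added later in the series are the authoritative usage:

#include <vmlinux.h>
#include <bpf/bpf_helpers.h>
#include "bpf_experimental.h"

struct foo {
	struct bpf_list_node node;
	int data;
};

struct map_value {
	struct bpf_spin_lock lock;	/* bpf_list_head requires a bpf_spin_lock */
	struct bpf_list_head head __contains(foo, node);
};

struct {
	__uint(type, BPF_MAP_TYPE_ARRAY);
	__type(key, int);
	__type(value, struct map_value);
	__uint(max_entries, 1);
} array_map SEC(".maps");

SEC("tc")
int push_foo(struct __sk_buff *ctx)
{
	struct map_value *v;
	struct foo *f;
	int key = 0;

	v = bpf_map_lookup_elem(&array_map, &key);
	if (!v)
		return 0;
	/* Allocate an object of a program BTF type */
	f = bpf_obj_new(typeof(*f));
	if (!f)
		return 0;
	f->data = 42;
	bpf_spin_lock(&v->lock);
	/* Single ownership of f passes to the list on push */
	bpf_list_push_front(&v->head, &f->node);
	bpf_spin_unlock(&v->lock);
	return 0;
}

char _license[] SEC("license") = "GPL";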

Some other notable things:
- Completely static verification of locking.
- Kfunc argument handling has been completely reworked.
- Argument rewriting support for kfuncs.
- A new bpf_experimental.h header as a dumping ground for these APIs.

Any functionality exposed in this series is NOT part of UAPI. It is
only available through the use of kfuncs, and the structs that can be
added to map values may also change their size or name in the future.
Hence, every feature in this series must be considered experimental.

Follow-ups:
-----------
 * Support for kptrs (local and kernel) in local storage and percpu maps + kptr tests
 * Fixes for helper access checks rebasing on top of this series

Next steps:
-----------
 * NMI safe percpu single ownership linked lists (using local_t protection).
 * Lockless linked lists.
 * Allow RCU protected BPF allocated objects. This then allows RCU
   protected list lookups, since spinlock protection for readers does
   not scale.
 * Introduce bpf_refcount for local kptrs, shared ownership.
 * Introduce shared ownership linked lists.
 * Documentation.

Changelog:
----------
 v6 -> v7
 v6: https://lore.kernel.org/bpf/20221111193224.876706-1-memxor@gmail.com

  * Fix uninitialized variable warning (Dan Carpenter, Kernel Test Robot)
  * One more local_kptr renaming

 v5 -> v6
 v5: https://lore.kernel.org/bpf/20221107230950.7117-1-memxor@gmail.com

  * Replace (i && !off) check with next_off, include test (Andrii)
  * Drop local kptrs naming (Andrii, Alexei)
  * Drop reg->precise == EXACT patch (Andrii)
  * Add comment about ptr member of struct active_lock (Andrii)
  * Use btf__new_empty + btf__add_xxx APIs (Andrii)
  * Address other misc nits from Andrii

 v4 -> v5
 v4: https://lore.kernel.org/bpf/20221103191013.1236066-1-memxor@gmail.com

  * Add a lot more selftests (failure, success, runtime, BTF)
  * Make sure series is bisect friendly
  * Move list draining out of spin lock
    * This exposed an issue where bpf_mem_free can now be called in the
      map_free path without migrate_disable; that is also fixed.
  * Rename MEM_ALLOC -> MEM_RINGBUF, MEM_TYPE_LOCAL -> MEM_ALLOC (Alexei)
  * Group lock identity into a struct active_lock { ptr, id } (Dave)
  * Split set_release_on_unlock logic into separate patch (Alexei)

 v3 -> v4
 v3: https://lore.kernel.org/bpf/20221102202658.963008-1-memxor@gmail.com

  * Fix compiler error for !CONFIG_BPF_SYSCALL (Kernel Test Robot)
  * Fix error due to BUILD_BUG_ON on 32-bit platforms (Kernel Test Robot)

 v2 -> v3
 v2: https://lore.kernel.org/bpf/20221013062303.896469-1-memxor@gmail.com

  * Add ack from Dave for patch 5
  * Rename btf_type_fields -> btf_record, btf_type_fields_off ->
    btf_field_offs, rename functions similarly (Alexei)
  * Remove 'kind' component from contains declaration tag (Alexei)
  * Move bpf_list_head, bpf_list_node definitions to UAPI bpf.h (Alexei)
  * Add note in commit log about modifying btf_struct_access API (Dave)
  * Downgrade WARN_ON_ONCE to verbose(env, "...") and return -EFAULT (Dave)
  * Add type_is_local_kptr wrapper to avoid noisy checks (Dave)
  * Remove unused flags parameter from bpf_kptr_new (Alexei)
  * Rename bpf_kptr_new -> bpf_obj_new, bpf_kptr_drop -> bpf_obj_drop (Alexei)
  * Reword comment in ref_obj_id_set_release_on_unlock (Dave)
  * Fix return type of ref_obj_id_set_release_on_unlock (Dave)
  * Introduce is_bpf_list_api_kfunc to dedup checks (Dave)
  * Disallow BPF_WRITE to untrusted local kptrs
  * Add details about soundness of check_reg_allocation_locked logic
  * List untrusted local kptrs for PROBE_MEM handling

 v1 -> v2
 v1: https://lore.kernel.org/bpf/20221011012240.3149-1-memxor@gmail.com

  * Rebase on bpf-next to resolve merge conflict in DENYLIST.s390x
  * Fix a couple of mental lapses in bpf_list_head_free

 RFC v1 -> v1
 RFC v1: https://lore.kernel.org/bpf/20220904204145.3089-1-memxor@gmail.com

  * Mostly a complete rewrite of BTF parsing, refactor existing code (Kartikeya)
  * Rebase kfunc rewrite for bpf-next, add support for more changes
  * Cache type metadata in BTF to avoid recomputation inside verifier (Kartikeya)
  * Remove __kernel tag, make things similar to map values, reserve bpf_ prefix
  * bpf_kptr_new, bpf_kptr_drop
  * Rename precision state enum values (Alexei)
  * Drop explicit constructor/destructor support (Alexei)
  * Rewrite code for constructing/destructing objects and offload to runtime
  * Minimize duplication in bpf_map_value_off_desc handling (Alexei)
  * Expose global memory allocator (Alexei)
  * Address other nits from Alexei
  * Split out local kptrs in maps, more kptrs in maps support into a follow up

Links:
------
 * Dave's BPF RB-Tree RFC series
   v1 (Discussion thread)
     https://lore.kernel.org/bpf/20220722183438.3319790-1-davemarchevsky@fb.com
   v2 (With support for static locks)
     https://lore.kernel.org/bpf/20220830172759.4069786-1-davemarchevsky@fb.com
 * BPF Linked Lists Discussion
   https://lore.kernel.org/bpf/CAP01T74U30+yeBHEgmgzTJ-XYxZ0zj71kqCDJtTH9YQNfTK+Xw@mail.gmail.com
 * BPF Memory Allocator from Alexei
   https://lore.kernel.org/bpf/20220902211058.60789-1-alexei.starovoitov@gmail.com
 * BPF Memory Allocator UAPI Discussion
   https://lore.kernel.org/bpf/d3f76b27f4e55ec9e400ae8dcaecbb702a4932e8.camel@fb.com

Kumar Kartikeya Dwivedi (26):
  bpf: Remove local kptr references in documentation
  bpf: Remove BPF_MAP_OFF_ARR_MAX
  bpf: Fix copy_map_value, zero_map_value
  bpf: Support bpf_list_head in map values
  bpf: Rename RET_PTR_TO_ALLOC_MEM
  bpf: Rename MEM_ALLOC to MEM_RINGBUF
  bpf: Refactor btf_struct_access
  bpf: Introduce allocated objects support
  bpf: Recognize lock and list fields in allocated objects
  bpf: Verify ownership relationships for user BTF types
  bpf: Allow locking bpf_spin_lock in allocated objects
  bpf: Allow locking bpf_spin_lock global variables
  bpf: Allow locking bpf_spin_lock in inner map values
  bpf: Rewrite kfunc argument handling
  bpf: Drop kfunc bits from btf_check_func_arg_match
  bpf: Support constant scalar arguments for kfuncs
  bpf: Introduce bpf_obj_new
  bpf: Introduce bpf_obj_drop
  bpf: Permit NULL checking pointer with non-zero fixed offset
  bpf: Introduce single ownership BPF linked list API
  bpf: Add 'release on unlock' logic for bpf_list_push_{front,back}
  selftests/bpf: Add __contains macro to bpf_experimental.h
  selftests/bpf: Update spinlock selftest
  selftests/bpf: Add failure test cases for spin lock pairing
  selftests/bpf: Add BPF linked list API tests
  selftests/bpf: Add BTF sanity tests

 Documentation/bpf/bpf_design_QA.rst           |   11 +-
 Documentation/bpf/kfuncs.rst                  |   22 +
 include/linux/bpf.h                           |  113 +-
 include/linux/bpf_verifier.h                  |   19 +-
 include/linux/btf.h                           |   67 +-
 include/linux/filter.h                        |    8 +-
 include/uapi/linux/bpf.h                      |   10 +
 kernel/bpf/btf.c                              |  812 +++++------
 kernel/bpf/core.c                             |   16 +
 kernel/bpf/helpers.c                          |  143 +-
 kernel/bpf/map_in_map.c                       |    5 -
 kernel/bpf/ringbuf.c                          |    6 +-
 kernel/bpf/syscall.c                          |   30 +-
 kernel/bpf/verifier.c                         | 1239 +++++++++++++++--
 net/bpf/bpf_dummy_struct_ops.c                |   14 +-
 net/core/filter.c                             |   34 +-
 net/ipv4/bpf_tcp_ca.c                         |   13 +-
 net/netfilter/nf_conntrack_bpf.c              |   17 +-
 tools/include/uapi/linux/bpf.h                |   10 +
 tools/testing/selftests/bpf/DENYLIST.s390x    |    1 +
 .../testing/selftests/bpf/bpf_experimental.h  |   68 +
 .../testing/selftests/bpf/prog_tests/dynptr.c |    2 +-
 .../bpf/prog_tests/kfunc_dynptr_param.c       |    2 +-
 .../selftests/bpf/prog_tests/linked_list.c    |  738 ++++++++++
 .../selftests/bpf/prog_tests/spin_lock.c      |  136 ++
 .../selftests/bpf/prog_tests/spinlock.c       |   45 -
 .../testing/selftests/bpf/progs/linked_list.c |  370 +++++
 .../testing/selftests/bpf/progs/linked_list.h |   56 +
 .../selftests/bpf/progs/linked_list_fail.c    |  581 ++++++++
 .../selftests/bpf/progs/test_spin_lock.c      |    4 +-
 .../selftests/bpf/progs/test_spin_lock_fail.c |  204 +++
 tools/testing/selftests/bpf/verifier/calls.c  |    2 +-
 .../selftests/bpf/verifier/ref_tracking.c     |    4 +-
 .../testing/selftests/bpf/verifier/ringbuf.c  |    2 +-
 .../selftests/bpf/verifier/spill_fill.c       |    2 +-
 35 files changed, 4162 insertions(+), 644 deletions(-)
 create mode 100644 tools/testing/selftests/bpf/bpf_experimental.h
 create mode 100644 tools/testing/selftests/bpf/prog_tests/linked_list.c
 create mode 100644 tools/testing/selftests/bpf/prog_tests/spin_lock.c
 delete mode 100644 tools/testing/selftests/bpf/prog_tests/spinlock.c
 create mode 100644 tools/testing/selftests/bpf/progs/linked_list.c
 create mode 100644 tools/testing/selftests/bpf/progs/linked_list.h
 create mode 100644 tools/testing/selftests/bpf/progs/linked_list_fail.c
 create mode 100644 tools/testing/selftests/bpf/progs/test_spin_lock_fail.c


base-commit: e662c7753668bbfb95e25043c6064088cc3a996d
-- 
2.38.1



* [PATCH bpf-next v7 01/26] bpf: Remove local kptr references in documentation
  2022-11-14 19:15 [PATCH bpf-next v7 00/26] Allocated objects, BPF linked lists Kumar Kartikeya Dwivedi
@ 2022-11-14 19:15 ` Kumar Kartikeya Dwivedi
  2022-11-14 19:15 ` [PATCH bpf-next v7 02/26] bpf: Remove BPF_MAP_OFF_ARR_MAX Kumar Kartikeya Dwivedi
                   ` (25 subsequent siblings)
  26 siblings, 0 replies; 48+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2022-11-14 19:15 UTC (permalink / raw)
  To: bpf
  Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Martin KaFai Lau, Dave Marchevsky

We don't want to commit to a specific name for these. Simply call them
allocated objects coming from bpf_obj_new, which is completely clear in
itself.

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
---
 Documentation/bpf/bpf_design_QA.rst | 11 ++++++-----
 1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/Documentation/bpf/bpf_design_QA.rst b/Documentation/bpf/bpf_design_QA.rst
index 17e774d96c5e..cec2371173d7 100644
--- a/Documentation/bpf/bpf_design_QA.rst
+++ b/Documentation/bpf/bpf_design_QA.rst
@@ -332,13 +332,14 @@ avoid defining types with 'bpf\_' prefix to not be broken in future releases.
 In other words, no backwards compatibility is guaranteed if one using a type
 in BTF with 'bpf\_' prefix.
 
-Q: What is the compatibility story for special BPF types in local kptrs?
-------------------------------------------------------------------------
-Q: Same as above, but for local kptrs (i.e. pointers to objects allocated using
-bpf_obj_new for user defined structures). Will the kernel preserve backwards
+Q: What is the compatibility story for special BPF types in allocated objects?
+------------------------------------------------------------------------------
+Q: Same as above, but for allocated objects (i.e. objects allocated using
+bpf_obj_new for user defined types). Will the kernel preserve backwards
 compatibility for these features?
 
 A: NO.
 
 Unlike map value types, there are no stability guarantees for this case. The
-whole local kptr API itself is unstable (since it is exposed through kfuncs).
+whole API to work with allocated objects and any support for special fields
+inside them is unstable (since it is exposed through kfuncs).
-- 
2.38.1



* [PATCH bpf-next v7 02/26] bpf: Remove BPF_MAP_OFF_ARR_MAX
  2022-11-14 19:15 [PATCH bpf-next v7 00/26] Allocated objects, BPF linked lists Kumar Kartikeya Dwivedi
  2022-11-14 19:15 ` [PATCH bpf-next v7 01/26] bpf: Remove local kptr references in documentation Kumar Kartikeya Dwivedi
@ 2022-11-14 19:15 ` Kumar Kartikeya Dwivedi
  2022-11-14 19:15 ` [PATCH bpf-next v7 03/26] bpf: Fix copy_map_value, zero_map_value Kumar Kartikeya Dwivedi
                   ` (24 subsequent siblings)
  26 siblings, 0 replies; 48+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2022-11-14 19:15 UTC (permalink / raw)
  To: bpf
  Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Martin KaFai Lau, Dave Marchevsky

In f71b2f64177a ("bpf: Refactor map->off_arr handling"), map->off_arr
was refactored to be btf_field_offs. The number of field offsets is
equal to the maximum possible fields, limited by BTF_FIELDS_MAX. Hence,
reuse BTF_FIELDS_MAX, as spin_lock and timer no longer need to be
handled specially for offset sorting; fix the comment, and remove the
incorrect WARN_ON, as rec->cnt can never exceed this value. The reason
for keeping a separate constant was that it used to be 2 more than the
total number of kptrs. This is no longer the case.

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
---
 include/linux/bpf.h | 9 ++++-----
 kernel/bpf/btf.c    | 2 +-
 2 files changed, 5 insertions(+), 6 deletions(-)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 798aec816970..1a66a1df1af1 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -165,9 +165,8 @@ struct bpf_map_ops {
 };
 
 enum {
-	/* Support at most 8 pointers in a BTF type */
-	BTF_FIELDS_MAX	      = 10,
-	BPF_MAP_OFF_ARR_MAX   = BTF_FIELDS_MAX,
+	/* Support at most 10 fields in a BTF type */
+	BTF_FIELDS_MAX	   = 10,
 };
 
 enum btf_field_type {
@@ -203,8 +202,8 @@ struct btf_record {
 
 struct btf_field_offs {
 	u32 cnt;
-	u32 field_off[BPF_MAP_OFF_ARR_MAX];
-	u8 field_sz[BPF_MAP_OFF_ARR_MAX];
+	u32 field_off[BTF_FIELDS_MAX];
+	u8 field_sz[BTF_FIELDS_MAX];
 };
 
 struct bpf_map {
diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c
index 5579ff3a5b54..12361d7b2498 100644
--- a/kernel/bpf/btf.c
+++ b/kernel/bpf/btf.c
@@ -3584,7 +3584,7 @@ struct btf_field_offs *btf_parse_field_offs(struct btf_record *rec)
 	u8 *sz;
 
 	BUILD_BUG_ON(ARRAY_SIZE(foffs->field_off) != ARRAY_SIZE(foffs->field_sz));
-	if (IS_ERR_OR_NULL(rec) || WARN_ON_ONCE(rec->cnt > sizeof(foffs->field_off)))
+	if (IS_ERR_OR_NULL(rec))
 		return NULL;
 
 	foffs = kzalloc(sizeof(*foffs), GFP_KERNEL | __GFP_NOWARN);
-- 
2.38.1



* [PATCH bpf-next v7 03/26] bpf: Fix copy_map_value, zero_map_value
  2022-11-14 19:15 [PATCH bpf-next v7 00/26] Allocated objects, BPF linked lists Kumar Kartikeya Dwivedi
  2022-11-14 19:15 ` [PATCH bpf-next v7 01/26] bpf: Remove local kptr references in documentation Kumar Kartikeya Dwivedi
  2022-11-14 19:15 ` [PATCH bpf-next v7 02/26] bpf: Remove BPF_MAP_OFF_ARR_MAX Kumar Kartikeya Dwivedi
@ 2022-11-14 19:15 ` Kumar Kartikeya Dwivedi
  2022-11-14 19:15 ` [PATCH bpf-next v7 04/26] bpf: Support bpf_list_head in map values Kumar Kartikeya Dwivedi
                   ` (23 subsequent siblings)
  26 siblings, 0 replies; 48+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2022-11-14 19:15 UTC (permalink / raw)
  To: bpf
  Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Martin KaFai Lau, Dave Marchevsky

The current offset needs to also skip over the already-copied region in
addition to the size of the next field. The bug manifests when there
are gaps between adjacent special fields.

It was observed that for a map value with size 48, having fields at:
off:  0, 16, 32
size: 4, 16, 16

The current code does:

memcpy(dst + 0, src + 0, 0)
memcpy(dst + 4, src + 4, 12)
memcpy(dst + 20, src + 20, 12)
memcpy(dst + 36, src + 36, 12)

With the fix, it is done correctly as:

memcpy(dst + 0, src + 0, 0)
memcpy(dst + 4, src + 4, 12)
memcpy(dst + 32, src + 32, 0)
memcpy(dst + 48, src + 48, 0)
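
To sanity check the arithmetic, here is a small standalone userspace
sketch (illustrative only, not part of the kernel change) that
reproduces the fixed loop for this layout and prints the list above:

#include <stdio.h>

int main(void)
{
	unsigned int field_off[] = { 0, 16, 32 };
	unsigned int field_sz[]  = { 4, 16, 16 };
	unsigned int size = 48, curr_off = 0;
	int i;

	for (i = 0; i < 3; i++) {
		unsigned int next_off = field_off[i];
		unsigned int sz = next_off - curr_off;

		printf("memcpy(dst + %u, src + %u, %u)\n",
		       curr_off, curr_off, sz);
		/* The fix: skip the copied gap in addition to the field */
		curr_off += field_sz[i] + sz;
	}
	printf("memcpy(dst + %u, src + %u, %u)\n",
	       curr_off, curr_off, size - curr_off);
	return 0;
}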

Fixes: 4d7d7f69f4b1 ("bpf: Adapt copy_map_value for multiple offset case")
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
---
 include/linux/bpf.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 1a66a1df1af1..f08eb2d27de0 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -360,7 +360,7 @@ static inline void bpf_obj_memcpy(struct btf_field_offs *foffs,
 		u32 sz = next_off - curr_off;
 
 		memcpy(dst + curr_off, src + curr_off, sz);
-		curr_off += foffs->field_sz[i];
+		curr_off += foffs->field_sz[i] + sz;
 	}
 	memcpy(dst + curr_off, src + curr_off, size - curr_off);
 }
@@ -390,7 +390,7 @@ static inline void bpf_obj_memzero(struct btf_field_offs *foffs, void *dst, u32
 		u32 sz = next_off - curr_off;
 
 		memset(dst + curr_off, 0, sz);
-		curr_off += foffs->field_sz[i];
+		curr_off += foffs->field_sz[i] + sz;
 	}
 	memset(dst + curr_off, 0, size - curr_off);
 }
-- 
2.38.1



* [PATCH bpf-next v7 04/26] bpf: Support bpf_list_head in map values
  2022-11-14 19:15 [PATCH bpf-next v7 00/26] Allocated objects, BPF linked lists Kumar Kartikeya Dwivedi
                   ` (2 preceding siblings ...)
  2022-11-14 19:15 ` [PATCH bpf-next v7 03/26] bpf: Fix copy_map_value, zero_map_value Kumar Kartikeya Dwivedi
@ 2022-11-14 19:15 ` Kumar Kartikeya Dwivedi
  2022-11-14 19:15 ` [PATCH bpf-next v7 05/26] bpf: Rename RET_PTR_TO_ALLOC_MEM Kumar Kartikeya Dwivedi
                   ` (22 subsequent siblings)
  26 siblings, 0 replies; 48+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2022-11-14 19:15 UTC (permalink / raw)
  To: bpf
  Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Martin KaFai Lau, Dave Marchevsky

Add support on the map side to parse, recognize, verify, and build a
metadata table for a new special field of type struct bpf_list_head. To
parameterize the bpf_list_head for a certain value type and the
list_node member it will accept in that value type, we use BTF
declaration tags.

A bpf_list_head in a map value is defined as follows:

struct foo {
	struct bpf_list_node node;
	int data;
};

struct map_value {
	struct bpf_list_head head __contains(foo, node);
};

Then, the bpf_list_head only allows adding to the list 'head' using the
bpf_list_node 'node' for the type struct foo.

The 'contains' annotation is a BTF declaration tag composed of three
parts, "contains:name:node", where the name is used to look up the type
in the map BTF, with its kind hardcoded to BTF_KIND_STRUCT during the
lookup. The node part names the member in this type that has the type
struct bpf_list_node, which is actually used for linking into the
linked list. For now, the 'kind' part is implicitly hardcoded as
struct.

This allows building intrusive linked lists in BPF, using container_of
to obtain a pointer to the entry, while being completely type safe from
the perspective of the verifier. The verifier knows the exact type of
the nodes, and knows that list helpers return that type at some fixed
offset of which the bpf_list_node member used for this list exists. The
verifier also uses this information to disallow adding types that are
not accepted by a certain list.

For now, no elements can be added to such lists. Support for that is
coming in future patches, hence draining and freeing of items is done
with a TODO that will be resolved in a future patch.

Note that the bpf_list_head_free function moves the list out to a local
variable under the lock and releases it, doing the actual draining of
the list items outside the lock. While this helps avoid holding the
lock for too long and pessimizing other concurrent list operations, it
is also necessary for deadlock prevention: unless every function called
in the critical section were notrace, a fentry/fexit program could
attach and call bpf_map_update_elem again on the map, leading to the
same lock being acquired if the key matches, and hence to a deadlock.
While this requires some special effort on the part of the BPF
programmer to trigger and is highly unlikely to occur in practice, it
is always better if we can avoid such a condition.

While notrace would prevent this, doing the draining outside the lock
has advantages of its own, hence it is used to also fix the deadlock
related problem.

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
---
 include/linux/bpf.h            |  17 ++++
 include/uapi/linux/bpf.h       |  10 +++
 kernel/bpf/btf.c               | 145 ++++++++++++++++++++++++++++++++-
 kernel/bpf/helpers.c           |  32 ++++++++
 kernel/bpf/syscall.c           |  22 ++++-
 kernel/bpf/verifier.c          |   7 ++
 tools/include/uapi/linux/bpf.h |  10 +++
 7 files changed, 239 insertions(+), 4 deletions(-)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index f08eb2d27de0..05f98e9e5c48 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -175,6 +175,7 @@ enum btf_field_type {
 	BPF_KPTR_UNREF = (1 << 2),
 	BPF_KPTR_REF   = (1 << 3),
 	BPF_KPTR       = BPF_KPTR_UNREF | BPF_KPTR_REF,
+	BPF_LIST_HEAD  = (1 << 4),
 };
 
 struct btf_field_kptr {
@@ -184,11 +185,18 @@ struct btf_field_kptr {
 	u32 btf_id;
 };
 
+struct btf_field_list_head {
+	struct btf *btf;
+	u32 value_btf_id;
+	u32 node_offset;
+};
+
 struct btf_field {
 	u32 offset;
 	enum btf_field_type type;
 	union {
 		struct btf_field_kptr kptr;
+		struct btf_field_list_head list_head;
 	};
 };
 
@@ -266,6 +274,8 @@ static inline const char *btf_field_type_name(enum btf_field_type type)
 	case BPF_KPTR_UNREF:
 	case BPF_KPTR_REF:
 		return "kptr";
+	case BPF_LIST_HEAD:
+		return "bpf_list_head";
 	default:
 		WARN_ON_ONCE(1);
 		return "unknown";
@@ -282,6 +292,8 @@ static inline u32 btf_field_type_size(enum btf_field_type type)
 	case BPF_KPTR_UNREF:
 	case BPF_KPTR_REF:
 		return sizeof(u64);
+	case BPF_LIST_HEAD:
+		return sizeof(struct bpf_list_head);
 	default:
 		WARN_ON_ONCE(1);
 		return 0;
@@ -298,6 +310,8 @@ static inline u32 btf_field_type_align(enum btf_field_type type)
 	case BPF_KPTR_UNREF:
 	case BPF_KPTR_REF:
 		return __alignof__(u64);
+	case BPF_LIST_HEAD:
+		return __alignof__(struct bpf_list_head);
 	default:
 		WARN_ON_ONCE(1);
 		return 0;
@@ -403,6 +417,9 @@ static inline void zero_map_value(struct bpf_map *map, void *dst)
 void copy_map_value_locked(struct bpf_map *map, void *dst, void *src,
 			   bool lock_src);
 void bpf_timer_cancel_and_free(void *timer);
+void bpf_list_head_free(const struct btf_field *field, void *list_head,
+			struct bpf_spin_lock *spin_lock);
+
 int bpf_obj_name_cpy(char *dst, const char *src, unsigned int size);
 
 struct bpf_offload_dev;
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index fb4c911d2a03..6580448e9f77 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -6888,6 +6888,16 @@ struct bpf_dynptr {
 	__u64 :64;
 } __attribute__((aligned(8)));
 
+struct bpf_list_head {
+	__u64 :64;
+	__u64 :64;
+} __attribute__((aligned(8)));
+
+struct bpf_list_node {
+	__u64 :64;
+	__u64 :64;
+} __attribute__((aligned(8)));
+
 struct bpf_sysctl {
 	__u32	write;		/* Sysctl is being read (= 0) or written (= 1).
 				 * Allows 1,2,4-byte read, but no write.
diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c
index 12361d7b2498..c0d73d71c539 100644
--- a/kernel/bpf/btf.c
+++ b/kernel/bpf/btf.c
@@ -3205,9 +3205,15 @@ enum {
 struct btf_field_info {
 	enum btf_field_type type;
 	u32 off;
-	struct {
-		u32 type_id;
-	} kptr;
+	union {
+		struct {
+			u32 type_id;
+		} kptr;
+		struct {
+			const char *node_name;
+			u32 value_btf_id;
+		} list_head;
+	};
 };
 
 static int btf_find_struct(const struct btf *btf, const struct btf_type *t,
@@ -3261,6 +3267,63 @@ static int btf_find_kptr(const struct btf *btf, const struct btf_type *t,
 	return BTF_FIELD_FOUND;
 }
 
+static const char *btf_find_decl_tag_value(const struct btf *btf,
+					   const struct btf_type *pt,
+					   int comp_idx, const char *tag_key)
+{
+	int i;
+
+	for (i = 1; i < btf_nr_types(btf); i++) {
+		const struct btf_type *t = btf_type_by_id(btf, i);
+		int len = strlen(tag_key);
+
+		if (!btf_type_is_decl_tag(t))
+			continue;
+		if (pt != btf_type_by_id(btf, t->type) ||
+		    btf_type_decl_tag(t)->component_idx != comp_idx)
+			continue;
+		if (strncmp(__btf_name_by_offset(btf, t->name_off), tag_key, len))
+			continue;
+		return __btf_name_by_offset(btf, t->name_off) + len;
+	}
+	return NULL;
+}
+
+static int btf_find_list_head(const struct btf *btf, const struct btf_type *pt,
+			      const struct btf_type *t, int comp_idx,
+			      u32 off, int sz, struct btf_field_info *info)
+{
+	const char *value_type;
+	const char *list_node;
+	s32 id;
+
+	if (!__btf_type_is_struct(t))
+		return BTF_FIELD_IGNORE;
+	if (t->size != sz)
+		return BTF_FIELD_IGNORE;
+	value_type = btf_find_decl_tag_value(btf, pt, comp_idx, "contains:");
+	if (!value_type)
+		return -EINVAL;
+	list_node = strstr(value_type, ":");
+	if (!list_node)
+		return -EINVAL;
+	value_type = kstrndup(value_type, list_node - value_type, GFP_KERNEL | __GFP_NOWARN);
+	if (!value_type)
+		return -ENOMEM;
+	id = btf_find_by_name_kind(btf, value_type, BTF_KIND_STRUCT);
+	kfree(value_type);
+	if (id < 0)
+		return id;
+	list_node++;
+	if (str_is_empty(list_node))
+		return -EINVAL;
+	info->type = BPF_LIST_HEAD;
+	info->off = off;
+	info->list_head.value_btf_id = id;
+	info->list_head.node_name = list_node;
+	return BTF_FIELD_FOUND;
+}
+
 static int btf_get_field_type(const char *name, u32 field_mask, u32 *seen_mask,
 			      int *align, int *sz)
 {
@@ -3284,6 +3347,12 @@ static int btf_get_field_type(const char *name, u32 field_mask, u32 *seen_mask,
 			goto end;
 		}
 	}
+	if (field_mask & BPF_LIST_HEAD) {
+		if (!strcmp(name, "bpf_list_head")) {
+			type = BPF_LIST_HEAD;
+			goto end;
+		}
+	}
 	/* Only return BPF_KPTR when all other types with matchable names fail */
 	if (field_mask & BPF_KPTR) {
 		type = BPF_KPTR_REF;
@@ -3339,6 +3408,12 @@ static int btf_find_struct_field(const struct btf *btf,
 			if (ret < 0)
 				return ret;
 			break;
+		case BPF_LIST_HEAD:
+			ret = btf_find_list_head(btf, t, member_type, i, off, sz,
+						 idx < info_cnt ? &info[idx] : &tmp);
+			if (ret < 0)
+				return ret;
+			break;
 		default:
 			return -EFAULT;
 		}
@@ -3393,6 +3468,12 @@ static int btf_find_datasec_var(const struct btf *btf, const struct btf_type *t,
 			if (ret < 0)
 				return ret;
 			break;
+		case BPF_LIST_HEAD:
+			ret = btf_find_list_head(btf, var, var_type, -1, off, sz,
+						 idx < info_cnt ? &info[idx] : &tmp);
+			if (ret < 0)
+				return ret;
+			break;
 		default:
 			return -EFAULT;
 		}
@@ -3491,11 +3572,52 @@ static int btf_parse_kptr(const struct btf *btf, struct btf_field *field,
 	return ret;
 }
 
+static int btf_parse_list_head(const struct btf *btf, struct btf_field *field,
+			       struct btf_field_info *info)
+{
+	const struct btf_type *t, *n = NULL;
+	const struct btf_member *member;
+	u32 offset;
+	int i;
+
+	t = btf_type_by_id(btf, info->list_head.value_btf_id);
+	/* We've already checked that value_btf_id is a struct type. We
+	 * just need to figure out the offset of the list_node, and
+	 * verify its type.
+	 */
+	for_each_member(i, t, member) {
+		if (strcmp(info->list_head.node_name, __btf_name_by_offset(btf, member->name_off)))
+			continue;
+		/* Invalid BTF, two members with same name */
+		if (n)
+			return -EINVAL;
+		n = btf_type_by_id(btf, member->type);
+		if (!__btf_type_is_struct(n))
+			return -EINVAL;
+		if (strcmp("bpf_list_node", __btf_name_by_offset(btf, n->name_off)))
+			return -EINVAL;
+		offset = __btf_member_bit_offset(n, member);
+		if (offset % 8)
+			return -EINVAL;
+		offset /= 8;
+		if (offset % __alignof__(struct bpf_list_node))
+			return -EINVAL;
+
+		field->list_head.btf = (struct btf *)btf;
+		field->list_head.value_btf_id = info->list_head.value_btf_id;
+		field->list_head.node_offset = offset;
+	}
+	if (!n)
+		return -ENOENT;
+	return 0;
+}
+
 struct btf_record *btf_parse_fields(const struct btf *btf, const struct btf_type *t,
 				    u32 field_mask, u32 value_size)
 {
 	struct btf_field_info info_arr[BTF_FIELDS_MAX];
 	struct btf_record *rec;
+	u32 next_off = 0;
 	int ret, i, cnt;
 
 	ret = btf_find_field(btf, t, field_mask, info_arr, ARRAY_SIZE(info_arr));
@@ -3517,6 +3639,11 @@ struct btf_record *btf_parse_fields(const struct btf *btf, const struct btf_type
 			ret = -EFAULT;
 			goto end;
 		}
+		if (info_arr[i].off < next_off) {
+			ret = -EEXIST;
+			goto end;
+		}
+		next_off = info_arr[i].off + btf_field_type_size(info_arr[i].type);
 
 		rec->field_mask |= info_arr[i].type;
 		rec->fields[i].offset = info_arr[i].off;
@@ -3539,12 +3666,24 @@ struct btf_record *btf_parse_fields(const struct btf *btf, const struct btf_type
 			if (ret < 0)
 				goto end;
 			break;
+		case BPF_LIST_HEAD:
+			ret = btf_parse_list_head(btf, &rec->fields[i], &info_arr[i]);
+			if (ret < 0)
+				goto end;
+			break;
 		default:
 			ret = -EFAULT;
 			goto end;
 		}
 		rec->cnt++;
 	}
+
+	/* bpf_list_head requires bpf_spin_lock */
+	if (btf_record_has_field(rec, BPF_LIST_HEAD) && rec->spin_lock_off < 0) {
+		ret = -EINVAL;
+		goto end;
+	}
+
 	return rec;
 end:
 	btf_record_free(rec);
diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
index 283f55bbeb70..7bc71995f17c 100644
--- a/kernel/bpf/helpers.c
+++ b/kernel/bpf/helpers.c
@@ -1706,6 +1706,38 @@ bpf_base_func_proto(enum bpf_func_id func_id)
 	}
 }
 
+void bpf_list_head_free(const struct btf_field *field, void *list_head,
+			struct bpf_spin_lock *spin_lock)
+{
+	struct list_head *head = list_head, *orig_head = list_head;
+
+	BUILD_BUG_ON(sizeof(struct list_head) > sizeof(struct bpf_list_head));
+	BUILD_BUG_ON(__alignof__(struct list_head) > __alignof__(struct bpf_list_head));
+
+	/* Do the actual list draining outside the lock to not hold the lock for
+	 * too long, and also prevent deadlocks if tracing programs end up
+	 * executing on entry/exit of functions called inside the critical
+	 * section, and end up doing map ops that call bpf_list_head_free for
+	 * the same map value again.
+	 */
+	__bpf_spin_lock_irqsave(spin_lock);
+	if (!head->next || list_empty(head))
+		goto unlock;
+	head = head->next;
+unlock:
+	INIT_LIST_HEAD(orig_head);
+	__bpf_spin_unlock_irqrestore(spin_lock);
+
+	while (head != orig_head) {
+		void *obj = head;
+
+		obj -= field->list_head.node_offset;
+		head = head->next;
+		/* TODO: Rework later */
+		kfree(obj);
+	}
+}
+
 BTF_SET8_START(tracing_btf_ids)
 #ifdef CONFIG_KEXEC_CORE
 BTF_ID_FLAGS(func, crash_kexec, KF_DESTRUCTIVE)
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 85532d301124..fdbae52f463f 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -536,6 +536,9 @@ void btf_record_free(struct btf_record *rec)
 				module_put(rec->fields[i].kptr.module);
 			btf_put(rec->fields[i].kptr.btf);
 			break;
+		case BPF_LIST_HEAD:
+			/* Nothing to release for bpf_list_head */
+			break;
 		default:
 			WARN_ON_ONCE(1);
 			continue;
@@ -578,6 +581,9 @@ struct btf_record *btf_record_dup(const struct btf_record *rec)
 				goto free;
 			}
 			break;
+		case BPF_LIST_HEAD:
+			/* Nothing to acquire for bpf_list_head */
+			break;
 		default:
 			ret = -EFAULT;
 			WARN_ON_ONCE(1);
@@ -637,6 +643,11 @@ void bpf_obj_free_fields(const struct btf_record *rec, void *obj)
 		case BPF_KPTR_REF:
 			field->kptr.dtor((void *)xchg((unsigned long *)field_ptr, 0));
 			break;
+		case BPF_LIST_HEAD:
+			if (WARN_ON_ONCE(rec->spin_lock_off < 0))
+				continue;
+			bpf_list_head_free(field, field_ptr, obj + rec->spin_lock_off);
+			break;
 		default:
 			WARN_ON_ONCE(1);
 			continue;
@@ -965,7 +976,8 @@ static int map_check_btf(struct bpf_map *map, const struct btf *btf,
 	if (!value_type || value_size != map->value_size)
 		return -EINVAL;
 
-	map->record = btf_parse_fields(btf, value_type, BPF_SPIN_LOCK | BPF_TIMER | BPF_KPTR,
+	map->record = btf_parse_fields(btf, value_type,
+				       BPF_SPIN_LOCK | BPF_TIMER | BPF_KPTR | BPF_LIST_HEAD,
 				       map->value_size);
 	if (!IS_ERR_OR_NULL(map->record)) {
 		int i;
@@ -1012,6 +1024,14 @@ static int map_check_btf(struct bpf_map *map, const struct btf *btf,
 					goto free_map_tab;
 				}
 				break;
+			case BPF_LIST_HEAD:
+				if (map->map_type != BPF_MAP_TYPE_HASH &&
+				    map->map_type != BPF_MAP_TYPE_LRU_HASH &&
+				    map->map_type != BPF_MAP_TYPE_ARRAY) {
+					ret = -EOPNOTSUPP;
+					goto free_map_tab;
+				}
+				break;
 			default:
 				/* Fail if map_type checks are missing for a field type */
 				ret = -EOPNOTSUPP;
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 07c0259dfc1a..a50018e2d4a0 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -12814,6 +12814,13 @@ static int check_map_prog_compatibility(struct bpf_verifier_env *env,
 {
 	enum bpf_prog_type prog_type = resolve_prog_type(prog);
 
+	if (btf_record_has_field(map->record, BPF_LIST_HEAD)) {
+		if (is_tracing_prog_type(prog_type)) {
+			verbose(env, "tracing progs cannot use bpf_list_head yet\n");
+			return -EINVAL;
+		}
+	}
+
 	if (btf_record_has_field(map->record, BPF_SPIN_LOCK)) {
 		if (prog_type == BPF_PROG_TYPE_SOCKET_FILTER) {
 			verbose(env, "socket filter progs cannot use bpf_spin_lock yet\n");
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index fb4c911d2a03..6580448e9f77 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -6888,6 +6888,16 @@ struct bpf_dynptr {
 	__u64 :64;
 } __attribute__((aligned(8)));
 
+struct bpf_list_head {
+	__u64 :64;
+	__u64 :64;
+} __attribute__((aligned(8)));
+
+struct bpf_list_node {
+	__u64 :64;
+	__u64 :64;
+} __attribute__((aligned(8)));
+
 struct bpf_sysctl {
 	__u32	write;		/* Sysctl is being read (= 0) or written (= 1).
 				 * Allows 1,2,4-byte read, but no write.
-- 
2.38.1



* [PATCH bpf-next v7 05/26] bpf: Rename RET_PTR_TO_ALLOC_MEM
  2022-11-14 19:15 [PATCH bpf-next v7 00/26] Allocated objects, BPF linked lists Kumar Kartikeya Dwivedi
                   ` (3 preceding siblings ...)
  2022-11-14 19:15 ` [PATCH bpf-next v7 04/26] bpf: Support bpf_list_head in map values Kumar Kartikeya Dwivedi
@ 2022-11-14 19:15 ` Kumar Kartikeya Dwivedi
  2022-11-14 19:15 ` [PATCH bpf-next v7 06/26] bpf: Rename MEM_ALLOC to MEM_RINGBUF Kumar Kartikeya Dwivedi
                   ` (21 subsequent siblings)
  26 siblings, 0 replies; 48+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2022-11-14 19:15 UTC (permalink / raw)
  To: bpf
  Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Martin KaFai Lau, Dave Marchevsky

Currently, the verifier has two return types, RET_PTR_TO_ALLOC_MEM and
RET_PTR_TO_ALLOC_MEM_OR_NULL; however, the former is confusingly named,
as it implies that it carries MEM_ALLOC, while only the latter does.
This causes confusion during code review, leading to conclusions such
as that the return value of RET_PTR_TO_DYNPTR_MEM_OR_NULL (which is
RET_PTR_TO_ALLOC_MEM | PTR_MAYBE_NULL) may be consumable by
bpf_ringbuf_{submit,commit}.

Rename it to make it clear MEM_ALLOC needs to be tacked on top of
RET_PTR_TO_MEM.

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
---
 include/linux/bpf.h   | 6 +++---
 kernel/bpf/verifier.c | 2 +-
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 05f98e9e5c48..2fe3ec620d54 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -607,7 +607,7 @@ enum bpf_return_type {
 	RET_PTR_TO_SOCKET,		/* returns a pointer to a socket */
 	RET_PTR_TO_TCP_SOCK,		/* returns a pointer to a tcp_sock */
 	RET_PTR_TO_SOCK_COMMON,		/* returns a pointer to a sock_common */
-	RET_PTR_TO_ALLOC_MEM,		/* returns a pointer to dynamically allocated memory */
+	RET_PTR_TO_MEM,			/* returns a pointer to memory */
 	RET_PTR_TO_MEM_OR_BTF_ID,	/* returns a pointer to a valid memory or a btf_id */
 	RET_PTR_TO_BTF_ID,		/* returns a pointer to a btf_id */
 	__BPF_RET_TYPE_MAX,
@@ -617,8 +617,8 @@ enum bpf_return_type {
 	RET_PTR_TO_SOCKET_OR_NULL	= PTR_MAYBE_NULL | RET_PTR_TO_SOCKET,
 	RET_PTR_TO_TCP_SOCK_OR_NULL	= PTR_MAYBE_NULL | RET_PTR_TO_TCP_SOCK,
 	RET_PTR_TO_SOCK_COMMON_OR_NULL	= PTR_MAYBE_NULL | RET_PTR_TO_SOCK_COMMON,
-	RET_PTR_TO_ALLOC_MEM_OR_NULL	= PTR_MAYBE_NULL | MEM_ALLOC | RET_PTR_TO_ALLOC_MEM,
-	RET_PTR_TO_DYNPTR_MEM_OR_NULL	= PTR_MAYBE_NULL | RET_PTR_TO_ALLOC_MEM,
+	RET_PTR_TO_ALLOC_MEM_OR_NULL	= PTR_MAYBE_NULL | MEM_ALLOC | RET_PTR_TO_MEM,
+	RET_PTR_TO_DYNPTR_MEM_OR_NULL	= PTR_MAYBE_NULL | RET_PTR_TO_MEM,
 	RET_PTR_TO_BTF_ID_OR_NULL	= PTR_MAYBE_NULL | RET_PTR_TO_BTF_ID,
 
 	/* This must be the last entry. Its purpose is to ensure the enum is
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index a50018e2d4a0..c88da7e3ca74 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -7630,7 +7630,7 @@ static int check_helper_call(struct bpf_verifier_env *env, struct bpf_insn *insn
 		mark_reg_known_zero(env, regs, BPF_REG_0);
 		regs[BPF_REG_0].type = PTR_TO_TCP_SOCK | ret_flag;
 		break;
-	case RET_PTR_TO_ALLOC_MEM:
+	case RET_PTR_TO_MEM:
 		mark_reg_known_zero(env, regs, BPF_REG_0);
 		regs[BPF_REG_0].type = PTR_TO_MEM | ret_flag;
 		regs[BPF_REG_0].mem_size = meta.mem_size;
-- 
2.38.1



* [PATCH bpf-next v7 06/26] bpf: Rename MEM_ALLOC to MEM_RINGBUF
  2022-11-14 19:15 [PATCH bpf-next v7 00/26] Allocated objects, BPF linked lists Kumar Kartikeya Dwivedi
                   ` (4 preceding siblings ...)
  2022-11-14 19:15 ` [PATCH bpf-next v7 05/26] bpf: Rename RET_PTR_TO_ALLOC_MEM Kumar Kartikeya Dwivedi
@ 2022-11-14 19:15 ` Kumar Kartikeya Dwivedi
  2022-11-14 19:15 ` [PATCH bpf-next v7 07/26] bpf: Refactor btf_struct_access Kumar Kartikeya Dwivedi
                   ` (20 subsequent siblings)
  26 siblings, 0 replies; 48+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2022-11-14 19:15 UTC (permalink / raw)
  To: bpf
  Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Martin KaFai Lau, Dave Marchevsky

Currently, the verifier uses the MEM_ALLOC type tag to specially tag
memory returned from the bpf_ringbuf_reserve helper. However, this tag
is only used for this purpose, and there is an implicit assumption that
it only refers to ringbuf memory (e.g. the check for
ARG_PTR_TO_ALLOC_MEM in check_func_arg_reg_off).

Hence, rename MEM_ALLOC to MEM_RINGBUF to indicate this special
relationship, and instead free up MEM_ALLOC for more generic
allocations made for user types.

Also, since ARG_PTR_TO_ALLOC_MEM_OR_NULL is unused, simply drop it.

Finally, update selftests using the 'alloc_' verifier string to
'ringbuf_'.

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
---
 include/linux/bpf.h                               | 11 ++++-------
 kernel/bpf/ringbuf.c                              |  6 +++---
 kernel/bpf/verifier.c                             | 14 +++++++-------
 tools/testing/selftests/bpf/prog_tests/dynptr.c   |  2 +-
 tools/testing/selftests/bpf/verifier/ringbuf.c    |  2 +-
 tools/testing/selftests/bpf/verifier/spill_fill.c |  2 +-
 6 files changed, 17 insertions(+), 20 deletions(-)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 2fe3ec620d54..afc1c51b59ff 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -488,10 +488,8 @@ enum bpf_type_flag {
 	 */
 	MEM_RDONLY		= BIT(1 + BPF_BASE_TYPE_BITS),
 
-	/* MEM was "allocated" from a different helper, and cannot be mixed
-	 * with regular non-MEM_ALLOC'ed MEM types.
-	 */
-	MEM_ALLOC		= BIT(2 + BPF_BASE_TYPE_BITS),
+	/* MEM points to BPF ring buffer reservation. */
+	MEM_RINGBUF		= BIT(2 + BPF_BASE_TYPE_BITS),
 
 	/* MEM is in user address space. */
 	MEM_USER		= BIT(3 + BPF_BASE_TYPE_BITS),
@@ -565,7 +563,7 @@ enum bpf_arg_type {
 	ARG_PTR_TO_LONG,	/* pointer to long */
 	ARG_PTR_TO_SOCKET,	/* pointer to bpf_sock (fullsock) */
 	ARG_PTR_TO_BTF_ID,	/* pointer to in-kernel struct */
-	ARG_PTR_TO_ALLOC_MEM,	/* pointer to dynamically allocated memory */
+	ARG_PTR_TO_RINGBUF_MEM,	/* pointer to dynamically reserved ringbuf memory */
 	ARG_CONST_ALLOC_SIZE_OR_ZERO,	/* number of allocated bytes requested */
 	ARG_PTR_TO_BTF_ID_SOCK_COMMON,	/* pointer to in-kernel sock_common or bpf-mirrored bpf_sock */
 	ARG_PTR_TO_PERCPU_BTF_ID,	/* pointer to in-kernel percpu type */
@@ -582,7 +580,6 @@ enum bpf_arg_type {
 	ARG_PTR_TO_MEM_OR_NULL		= PTR_MAYBE_NULL | ARG_PTR_TO_MEM,
 	ARG_PTR_TO_CTX_OR_NULL		= PTR_MAYBE_NULL | ARG_PTR_TO_CTX,
 	ARG_PTR_TO_SOCKET_OR_NULL	= PTR_MAYBE_NULL | ARG_PTR_TO_SOCKET,
-	ARG_PTR_TO_ALLOC_MEM_OR_NULL	= PTR_MAYBE_NULL | ARG_PTR_TO_ALLOC_MEM,
 	ARG_PTR_TO_STACK_OR_NULL	= PTR_MAYBE_NULL | ARG_PTR_TO_STACK,
 	ARG_PTR_TO_BTF_ID_OR_NULL	= PTR_MAYBE_NULL | ARG_PTR_TO_BTF_ID,
 	/* pointer to memory does not need to be initialized, helper function must fill
@@ -617,7 +614,7 @@ enum bpf_return_type {
 	RET_PTR_TO_SOCKET_OR_NULL	= PTR_MAYBE_NULL | RET_PTR_TO_SOCKET,
 	RET_PTR_TO_TCP_SOCK_OR_NULL	= PTR_MAYBE_NULL | RET_PTR_TO_TCP_SOCK,
 	RET_PTR_TO_SOCK_COMMON_OR_NULL	= PTR_MAYBE_NULL | RET_PTR_TO_SOCK_COMMON,
-	RET_PTR_TO_ALLOC_MEM_OR_NULL	= PTR_MAYBE_NULL | MEM_ALLOC | RET_PTR_TO_MEM,
+	RET_PTR_TO_RINGBUF_MEM_OR_NULL	= PTR_MAYBE_NULL | MEM_RINGBUF | RET_PTR_TO_MEM,
 	RET_PTR_TO_DYNPTR_MEM_OR_NULL	= PTR_MAYBE_NULL | RET_PTR_TO_MEM,
 	RET_PTR_TO_BTF_ID_OR_NULL	= PTR_MAYBE_NULL | RET_PTR_TO_BTF_ID,
 
diff --git a/kernel/bpf/ringbuf.c b/kernel/bpf/ringbuf.c
index 9e832acf4692..80f4b4d88aaf 100644
--- a/kernel/bpf/ringbuf.c
+++ b/kernel/bpf/ringbuf.c
@@ -447,7 +447,7 @@ BPF_CALL_3(bpf_ringbuf_reserve, struct bpf_map *, map, u64, size, u64, flags)
 
 const struct bpf_func_proto bpf_ringbuf_reserve_proto = {
 	.func		= bpf_ringbuf_reserve,
-	.ret_type	= RET_PTR_TO_ALLOC_MEM_OR_NULL,
+	.ret_type	= RET_PTR_TO_RINGBUF_MEM_OR_NULL,
 	.arg1_type	= ARG_CONST_MAP_PTR,
 	.arg2_type	= ARG_CONST_ALLOC_SIZE_OR_ZERO,
 	.arg3_type	= ARG_ANYTHING,
@@ -490,7 +490,7 @@ BPF_CALL_2(bpf_ringbuf_submit, void *, sample, u64, flags)
 const struct bpf_func_proto bpf_ringbuf_submit_proto = {
 	.func		= bpf_ringbuf_submit,
 	.ret_type	= RET_VOID,
-	.arg1_type	= ARG_PTR_TO_ALLOC_MEM | OBJ_RELEASE,
+	.arg1_type	= ARG_PTR_TO_RINGBUF_MEM | OBJ_RELEASE,
 	.arg2_type	= ARG_ANYTHING,
 };
 
@@ -503,7 +503,7 @@ BPF_CALL_2(bpf_ringbuf_discard, void *, sample, u64, flags)
 const struct bpf_func_proto bpf_ringbuf_discard_proto = {
 	.func		= bpf_ringbuf_discard,
 	.ret_type	= RET_VOID,
-	.arg1_type	= ARG_PTR_TO_ALLOC_MEM | OBJ_RELEASE,
+	.arg1_type	= ARG_PTR_TO_RINGBUF_MEM | OBJ_RELEASE,
 	.arg2_type	= ARG_ANYTHING,
 };
 
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index c88da7e3ca74..c588e5483540 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -577,8 +577,8 @@ static const char *reg_type_str(struct bpf_verifier_env *env,
 
 	if (type & MEM_RDONLY)
 		strncpy(prefix, "rdonly_", 32);
-	if (type & MEM_ALLOC)
-		strncpy(prefix, "alloc_", 32);
+	if (type & MEM_RINGBUF)
+		strncpy(prefix, "ringbuf_", 32);
 	if (type & MEM_USER)
 		strncpy(prefix, "user_", 32);
 	if (type & MEM_PERCPU)
@@ -5785,7 +5785,7 @@ static const struct bpf_reg_types mem_types = {
 		PTR_TO_MAP_KEY,
 		PTR_TO_MAP_VALUE,
 		PTR_TO_MEM,
-		PTR_TO_MEM | MEM_ALLOC,
+		PTR_TO_MEM | MEM_RINGBUF,
 		PTR_TO_BUF,
 	},
 };
@@ -5803,7 +5803,7 @@ static const struct bpf_reg_types int_ptr_types = {
 static const struct bpf_reg_types fullsock_types = { .types = { PTR_TO_SOCKET } };
 static const struct bpf_reg_types scalar_types = { .types = { SCALAR_VALUE } };
 static const struct bpf_reg_types context_types = { .types = { PTR_TO_CTX } };
-static const struct bpf_reg_types alloc_mem_types = { .types = { PTR_TO_MEM | MEM_ALLOC } };
+static const struct bpf_reg_types ringbuf_mem_types = { .types = { PTR_TO_MEM | MEM_RINGBUF } };
 static const struct bpf_reg_types const_map_ptr_types = { .types = { CONST_PTR_TO_MAP } };
 static const struct bpf_reg_types btf_ptr_types = { .types = { PTR_TO_BTF_ID } };
 static const struct bpf_reg_types spin_lock_types = { .types = { PTR_TO_MAP_VALUE } };
@@ -5836,7 +5836,7 @@ static const struct bpf_reg_types *compatible_reg_types[__BPF_ARG_TYPE_MAX] = {
 	[ARG_PTR_TO_BTF_ID]		= &btf_ptr_types,
 	[ARG_PTR_TO_SPIN_LOCK]		= &spin_lock_types,
 	[ARG_PTR_TO_MEM]		= &mem_types,
-	[ARG_PTR_TO_ALLOC_MEM]		= &alloc_mem_types,
+	[ARG_PTR_TO_RINGBUF_MEM]	= &ringbuf_mem_types,
 	[ARG_PTR_TO_INT]		= &int_ptr_types,
 	[ARG_PTR_TO_LONG]		= &int_ptr_types,
 	[ARG_PTR_TO_PERCPU_BTF_ID]	= &percpu_btf_ptr_types,
@@ -5957,14 +5957,14 @@ int check_func_arg_reg_off(struct bpf_verifier_env *env,
 	case PTR_TO_MAP_VALUE:
 	case PTR_TO_MEM:
 	case PTR_TO_MEM | MEM_RDONLY:
-	case PTR_TO_MEM | MEM_ALLOC:
+	case PTR_TO_MEM | MEM_RINGBUF:
 	case PTR_TO_BUF:
 	case PTR_TO_BUF | MEM_RDONLY:
 	case SCALAR_VALUE:
 		/* Some of the argument types nevertheless require a
 		 * zero register offset.
 		 */
-		if (base_type(arg_type) != ARG_PTR_TO_ALLOC_MEM)
+		if (base_type(arg_type) != ARG_PTR_TO_RINGBUF_MEM)
 			return 0;
 		break;
 	/* All the rest must be rejected, except PTR_TO_BTF_ID which allows
diff --git a/tools/testing/selftests/bpf/prog_tests/dynptr.c b/tools/testing/selftests/bpf/prog_tests/dynptr.c
index 8fc4e6c02bfd..b0c06f821cb8 100644
--- a/tools/testing/selftests/bpf/prog_tests/dynptr.c
+++ b/tools/testing/selftests/bpf/prog_tests/dynptr.c
@@ -17,7 +17,7 @@ static struct {
 	{"ringbuf_missing_release2", "Unreleased reference id=2"},
 	{"ringbuf_missing_release_callback", "Unreleased reference id"},
 	{"use_after_invalid", "Expected an initialized dynptr as arg #3"},
-	{"ringbuf_invalid_api", "type=mem expected=alloc_mem"},
+	{"ringbuf_invalid_api", "type=mem expected=ringbuf_mem"},
 	{"add_dynptr_to_map1", "invalid indirect read from stack"},
 	{"add_dynptr_to_map2", "invalid indirect read from stack"},
 	{"data_slice_out_of_bounds_ringbuf", "value is outside of the allowed memory range"},
diff --git a/tools/testing/selftests/bpf/verifier/ringbuf.c b/tools/testing/selftests/bpf/verifier/ringbuf.c
index b64d33e4833c..84838feba47f 100644
--- a/tools/testing/selftests/bpf/verifier/ringbuf.c
+++ b/tools/testing/selftests/bpf/verifier/ringbuf.c
@@ -28,7 +28,7 @@
 	},
 	.fixup_map_ringbuf = { 1 },
 	.result = REJECT,
-	.errstr = "dereference of modified alloc_mem ptr R1",
+	.errstr = "dereference of modified ringbuf_mem ptr R1",
 },
 {
 	"ringbuf: invalid reservation offset 2",
diff --git a/tools/testing/selftests/bpf/verifier/spill_fill.c b/tools/testing/selftests/bpf/verifier/spill_fill.c
index e23f07175e1b..9bb302dade23 100644
--- a/tools/testing/selftests/bpf/verifier/spill_fill.c
+++ b/tools/testing/selftests/bpf/verifier/spill_fill.c
@@ -84,7 +84,7 @@
 	},
 	.fixup_map_ringbuf = { 1 },
 	.result = REJECT,
-	.errstr = "R0 pointer arithmetic on alloc_mem_or_null prohibited",
+	.errstr = "R0 pointer arithmetic on ringbuf_mem_or_null prohibited",
 },
 {
 	"check corrupted spill/fill",
-- 
2.38.1



* [PATCH bpf-next v7 07/26] bpf: Refactor btf_struct_access
  2022-11-14 19:15 [PATCH bpf-next v7 00/26] Allocated objects, BPF linked lists Kumar Kartikeya Dwivedi
                   ` (5 preceding siblings ...)
  2022-11-14 19:15 ` [PATCH bpf-next v7 06/26] bpf: Rename MEM_ALLOC to MEM_RINGBUF Kumar Kartikeya Dwivedi
@ 2022-11-14 19:15 ` Kumar Kartikeya Dwivedi
  2022-11-14 19:15 ` [PATCH bpf-next v7 08/26] bpf: Introduce allocated objects support Kumar Kartikeya Dwivedi
                   ` (19 subsequent siblings)
  26 siblings, 0 replies; 48+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2022-11-14 19:15 UTC (permalink / raw)
  To: bpf
  Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Martin KaFai Lau, Dave Marchevsky

Instead of having to pass multiple arguments that describe the
register, pass the bpf_reg_state into the btf_struct_access callback.
Currently, all call sites simply reuse the btf and btf_id of the reg
whose access they want to check. The only exception to this pattern is
the call site in check_ptr_to_map_access, hence for that case create a
dummy reg to simulate PTR_TO_BTF_ID access.

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
---
 include/linux/bpf.h              | 17 ++++++++--------
 include/linux/filter.h           |  8 ++++----
 kernel/bpf/btf.c                 | 11 +++++++----
 kernel/bpf/verifier.c            | 12 ++++++-----
 net/bpf/bpf_dummy_struct_ops.c   | 14 ++++++-------
 net/core/filter.c                | 34 +++++++++++++-------------------
 net/ipv4/bpf_tcp_ca.c            | 13 ++++++------
 net/netfilter/nf_conntrack_bpf.c | 17 +++++++---------
 8 files changed, 60 insertions(+), 66 deletions(-)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index afc1c51b59ff..49f9d2bec401 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -771,6 +771,7 @@ struct bpf_prog_ops {
 			union bpf_attr __user *uattr);
 };
 
+struct bpf_reg_state;
 struct bpf_verifier_ops {
 	/* return eBPF function prototype for verification */
 	const struct bpf_func_proto *
@@ -792,9 +793,8 @@ struct bpf_verifier_ops {
 				  struct bpf_insn *dst,
 				  struct bpf_prog *prog, u32 *target_size);
 	int (*btf_struct_access)(struct bpf_verifier_log *log,
-				 const struct btf *btf,
-				 const struct btf_type *t, int off, int size,
-				 enum bpf_access_type atype,
+				 const struct bpf_reg_state *reg,
+				 int off, int size, enum bpf_access_type atype,
 				 u32 *next_btf_id, enum bpf_type_flag *flag);
 };
 
@@ -2080,9 +2080,9 @@ static inline bool bpf_tracing_btf_ctx_access(int off, int size,
 	return btf_ctx_access(off, size, type, prog, info);
 }
 
-int btf_struct_access(struct bpf_verifier_log *log, const struct btf *btf,
-		      const struct btf_type *t, int off, int size,
-		      enum bpf_access_type atype,
+int btf_struct_access(struct bpf_verifier_log *log,
+		      const struct bpf_reg_state *reg,
+		      int off, int size, enum bpf_access_type atype,
 		      u32 *next_btf_id, enum bpf_type_flag *flag);
 bool btf_struct_ids_match(struct bpf_verifier_log *log,
 			  const struct btf *btf, u32 id, int off,
@@ -2333,9 +2333,8 @@ static inline struct bpf_prog *bpf_prog_by_id(u32 id)
 }
 
 static inline int btf_struct_access(struct bpf_verifier_log *log,
-				    const struct btf *btf,
-				    const struct btf_type *t, int off, int size,
-				    enum bpf_access_type atype,
+				    const struct bpf_reg_state *reg,
+				    int off, int size, enum bpf_access_type atype,
 				    u32 *next_btf_id, enum bpf_type_flag *flag)
 {
 	return -EACCES;
diff --git a/include/linux/filter.h b/include/linux/filter.h
index efc42a6e3aed..787d35dbf5b0 100644
--- a/include/linux/filter.h
+++ b/include/linux/filter.h
@@ -568,10 +568,10 @@ struct sk_filter {
 DECLARE_STATIC_KEY_FALSE(bpf_stats_enabled_key);
 
 extern struct mutex nf_conn_btf_access_lock;
-extern int (*nfct_btf_struct_access)(struct bpf_verifier_log *log, const struct btf *btf,
-				     const struct btf_type *t, int off, int size,
-				     enum bpf_access_type atype, u32 *next_btf_id,
-				     enum bpf_type_flag *flag);
+extern int (*nfct_btf_struct_access)(struct bpf_verifier_log *log,
+				     const struct bpf_reg_state *reg,
+				     int off, int size, enum bpf_access_type atype,
+				     u32 *next_btf_id, enum bpf_type_flag *flag);
 
 typedef unsigned int (*bpf_dispatcher_fn)(const void *ctx,
 					  const struct bpf_insn *insnsi,
diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c
index c0d73d71c539..875355ff3718 100644
--- a/kernel/bpf/btf.c
+++ b/kernel/bpf/btf.c
@@ -6017,15 +6017,18 @@ static int btf_struct_walk(struct bpf_verifier_log *log, const struct btf *btf,
 	return -EINVAL;
 }
 
-int btf_struct_access(struct bpf_verifier_log *log, const struct btf *btf,
-		      const struct btf_type *t, int off, int size,
-		      enum bpf_access_type atype __maybe_unused,
+int btf_struct_access(struct bpf_verifier_log *log,
+		      const struct bpf_reg_state *reg,
+		      int off, int size, enum bpf_access_type atype __maybe_unused,
 		      u32 *next_btf_id, enum bpf_type_flag *flag)
 {
+	const struct btf *btf = reg->btf;
 	enum bpf_type_flag tmp_flag = 0;
+	const struct btf_type *t;
+	u32 id = reg->btf_id;
 	int err;
-	u32 id;
 
+	t = btf_type_by_id(btf, id);
 	do {
 		err = btf_struct_walk(log, btf, t, off, size, &id, &tmp_flag);
 
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index c588e5483540..5e74f460dfd0 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -4688,16 +4688,14 @@ static int check_ptr_to_btf_access(struct bpf_verifier_env *env,
 	}
 
 	if (env->ops->btf_struct_access) {
-		ret = env->ops->btf_struct_access(&env->log, reg->btf, t,
-						  off, size, atype, &btf_id, &flag);
+		ret = env->ops->btf_struct_access(&env->log, reg, off, size, atype, &btf_id, &flag);
 	} else {
 		if (atype != BPF_READ) {
 			verbose(env, "only read is supported\n");
 			return -EACCES;
 		}
 
-		ret = btf_struct_access(&env->log, reg->btf, t, off, size,
-					atype, &btf_id, &flag);
+		ret = btf_struct_access(&env->log, reg, off, size, atype, &btf_id, &flag);
 	}
 
 	if (ret < 0)
@@ -4723,6 +4721,7 @@ static int check_ptr_to_map_access(struct bpf_verifier_env *env,
 {
 	struct bpf_reg_state *reg = regs + regno;
 	struct bpf_map *map = reg->map_ptr;
+	struct bpf_reg_state map_reg;
 	enum bpf_type_flag flag = 0;
 	const struct btf_type *t;
 	const char *tname;
@@ -4761,7 +4760,10 @@ static int check_ptr_to_map_access(struct bpf_verifier_env *env,
 		return -EACCES;
 	}
 
-	ret = btf_struct_access(&env->log, btf_vmlinux, t, off, size, atype, &btf_id, &flag);
+	/* Simulate access to a PTR_TO_BTF_ID */
+	memset(&map_reg, 0, sizeof(map_reg));
+	mark_btf_ld_reg(env, &map_reg, 0, PTR_TO_BTF_ID, btf_vmlinux, *map->ops->map_btf_id, 0);
+	ret = btf_struct_access(&env->log, &map_reg, off, size, atype, &btf_id, &flag);
 	if (ret < 0)
 		return ret;
 
diff --git a/net/bpf/bpf_dummy_struct_ops.c b/net/bpf/bpf_dummy_struct_ops.c
index e78dadfc5829..2d434c1f4617 100644
--- a/net/bpf/bpf_dummy_struct_ops.c
+++ b/net/bpf/bpf_dummy_struct_ops.c
@@ -156,29 +156,29 @@ static bool bpf_dummy_ops_is_valid_access(int off, int size,
 }
 
 static int bpf_dummy_ops_btf_struct_access(struct bpf_verifier_log *log,
-					   const struct btf *btf,
-					   const struct btf_type *t, int off,
-					   int size, enum bpf_access_type atype,
+					   const struct bpf_reg_state *reg,
+					   int off, int size, enum bpf_access_type atype,
 					   u32 *next_btf_id,
 					   enum bpf_type_flag *flag)
 {
 	const struct btf_type *state;
+	const struct btf_type *t;
 	s32 type_id;
 	int err;
 
-	type_id = btf_find_by_name_kind(btf, "bpf_dummy_ops_state",
+	type_id = btf_find_by_name_kind(reg->btf, "bpf_dummy_ops_state",
 					BTF_KIND_STRUCT);
 	if (type_id < 0)
 		return -EINVAL;
 
-	state = btf_type_by_id(btf, type_id);
+	t = btf_type_by_id(reg->btf, reg->btf_id);
+	state = btf_type_by_id(reg->btf, type_id);
 	if (t != state) {
 		bpf_log(log, "only access to bpf_dummy_ops_state is supported\n");
 		return -EACCES;
 	}
 
-	err = btf_struct_access(log, btf, t, off, size, atype, next_btf_id,
-				flag);
+	err = btf_struct_access(log, reg, off, size, atype, next_btf_id, flag);
 	if (err < 0)
 		return err;
 
diff --git a/net/core/filter.c b/net/core/filter.c
index 6dd2baf5eeb2..37fad5a9b752 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -8651,28 +8651,25 @@ static bool tc_cls_act_is_valid_access(int off, int size,
 DEFINE_MUTEX(nf_conn_btf_access_lock);
 EXPORT_SYMBOL_GPL(nf_conn_btf_access_lock);
 
-int (*nfct_btf_struct_access)(struct bpf_verifier_log *log, const struct btf *btf,
-			      const struct btf_type *t, int off, int size,
-			      enum bpf_access_type atype, u32 *next_btf_id,
-			      enum bpf_type_flag *flag);
+int (*nfct_btf_struct_access)(struct bpf_verifier_log *log,
+			      const struct bpf_reg_state *reg,
+			      int off, int size, enum bpf_access_type atype,
+			      u32 *next_btf_id, enum bpf_type_flag *flag);
 EXPORT_SYMBOL_GPL(nfct_btf_struct_access);
 
 static int tc_cls_act_btf_struct_access(struct bpf_verifier_log *log,
-					const struct btf *btf,
-					const struct btf_type *t, int off,
-					int size, enum bpf_access_type atype,
-					u32 *next_btf_id,
-					enum bpf_type_flag *flag)
+					const struct bpf_reg_state *reg,
+					int off, int size, enum bpf_access_type atype,
+					u32 *next_btf_id, enum bpf_type_flag *flag)
 {
 	int ret = -EACCES;
 
 	if (atype == BPF_READ)
-		return btf_struct_access(log, btf, t, off, size, atype, next_btf_id,
-					 flag);
+		return btf_struct_access(log, reg, off, size, atype, next_btf_id, flag);
 
 	mutex_lock(&nf_conn_btf_access_lock);
 	if (nfct_btf_struct_access)
-		ret = nfct_btf_struct_access(log, btf, t, off, size, atype, next_btf_id, flag);
+		ret = nfct_btf_struct_access(log, reg, off, size, atype, next_btf_id, flag);
 	mutex_unlock(&nf_conn_btf_access_lock);
 
 	return ret;
@@ -8738,21 +8735,18 @@ void bpf_warn_invalid_xdp_action(struct net_device *dev, struct bpf_prog *prog,
 EXPORT_SYMBOL_GPL(bpf_warn_invalid_xdp_action);
 
 static int xdp_btf_struct_access(struct bpf_verifier_log *log,
-				 const struct btf *btf,
-				 const struct btf_type *t, int off,
-				 int size, enum bpf_access_type atype,
-				 u32 *next_btf_id,
-				 enum bpf_type_flag *flag)
+				 const struct bpf_reg_state *reg,
+				 int off, int size, enum bpf_access_type atype,
+				 u32 *next_btf_id, enum bpf_type_flag *flag)
 {
 	int ret = -EACCES;
 
 	if (atype == BPF_READ)
-		return btf_struct_access(log, btf, t, off, size, atype, next_btf_id,
-					 flag);
+		return btf_struct_access(log, reg, off, size, atype, next_btf_id, flag);
 
 	mutex_lock(&nf_conn_btf_access_lock);
 	if (nfct_btf_struct_access)
-		ret = nfct_btf_struct_access(log, btf, t, off, size, atype, next_btf_id, flag);
+		ret = nfct_btf_struct_access(log, reg, off, size, atype, next_btf_id, flag);
 	mutex_unlock(&nf_conn_btf_access_lock);
 
 	return ret;
diff --git a/net/ipv4/bpf_tcp_ca.c b/net/ipv4/bpf_tcp_ca.c
index 6da16ae6a962..d15c91de995f 100644
--- a/net/ipv4/bpf_tcp_ca.c
+++ b/net/ipv4/bpf_tcp_ca.c
@@ -69,18 +69,17 @@ static bool bpf_tcp_ca_is_valid_access(int off, int size,
 }
 
 static int bpf_tcp_ca_btf_struct_access(struct bpf_verifier_log *log,
-					const struct btf *btf,
-					const struct btf_type *t, int off,
-					int size, enum bpf_access_type atype,
-					u32 *next_btf_id,
-					enum bpf_type_flag *flag)
+					const struct bpf_reg_state *reg,
+					int off, int size, enum bpf_access_type atype,
+					u32 *next_btf_id, enum bpf_type_flag *flag)
 {
+	const struct btf_type *t;
 	size_t end;
 
 	if (atype == BPF_READ)
-		return btf_struct_access(log, btf, t, off, size, atype, next_btf_id,
-					 flag);
+		return btf_struct_access(log, reg, off, size, atype, next_btf_id, flag);
 
+	t = btf_type_by_id(reg->btf, reg->btf_id);
 	if (t != tcp_sock_type) {
 		bpf_log(log, "only read is supported\n");
 		return -EACCES;
diff --git a/net/netfilter/nf_conntrack_bpf.c b/net/netfilter/nf_conntrack_bpf.c
index 8639e7efd0e2..24002bc61e07 100644
--- a/net/netfilter/nf_conntrack_bpf.c
+++ b/net/netfilter/nf_conntrack_bpf.c
@@ -191,19 +191,16 @@ BTF_ID(struct, nf_conn___init)
 
 /* Check writes into `struct nf_conn` */
 static int _nf_conntrack_btf_struct_access(struct bpf_verifier_log *log,
-					   const struct btf *btf,
-					   const struct btf_type *t, int off,
-					   int size, enum bpf_access_type atype,
-					   u32 *next_btf_id,
-					   enum bpf_type_flag *flag)
+					   const struct bpf_reg_state *reg,
+					   int off, int size, enum bpf_access_type atype,
+					   u32 *next_btf_id, enum bpf_type_flag *flag)
 {
-	const struct btf_type *ncit;
-	const struct btf_type *nct;
+	const struct btf_type *ncit, *nct, *t;
 	size_t end;
 
-	ncit = btf_type_by_id(btf, btf_nf_conn_ids[1]);
-	nct = btf_type_by_id(btf, btf_nf_conn_ids[0]);
-
+	ncit = btf_type_by_id(reg->btf, btf_nf_conn_ids[1]);
+	nct = btf_type_by_id(reg->btf, btf_nf_conn_ids[0]);
+	t = btf_type_by_id(reg->btf, reg->btf_id);
 	if (t != nct && t != ncit) {
 		bpf_log(log, "only read is supported\n");
 		return -EACCES;
-- 
2.38.1


^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [PATCH bpf-next v7 08/26] bpf: Introduce allocated objects support
  2022-11-14 19:15 [PATCH bpf-next v7 00/26] Allocated objects, BPF linked lists Kumar Kartikeya Dwivedi
                   ` (6 preceding siblings ...)
  2022-11-14 19:15 ` [PATCH bpf-next v7 07/26] bpf: Refactor btf_struct_access Kumar Kartikeya Dwivedi
@ 2022-11-14 19:15 ` Kumar Kartikeya Dwivedi
  2022-11-15  5:48   ` Alexei Starovoitov
  2022-11-14 19:15 ` [PATCH bpf-next v7 09/26] bpf: Recognize lock and list fields in allocated objects Kumar Kartikeya Dwivedi
                   ` (18 subsequent siblings)
  26 siblings, 1 reply; 48+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2022-11-14 19:15 UTC (permalink / raw)
  To: bpf
  Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Martin KaFai Lau, Dave Marchevsky

Introduce support for representing pointers to objects allocated by the
BPF program, i.e. PTR_TO_BTF_ID that points to a type in program BTF.
This is indicated by the presence of MEM_ALLOC type flag in reg->type to
avoid having to check btf_is_kernel when trying to match argument types
in helpers.

Whenever walking such types, any pointers being walked will always yield
a SCALAR instead of a pointer. In the future we might permit kptrs inside
such allocated objects (either kernel or local), and it will then form a
PTR_TO_BTF_ID of the respective type.

For now, such allocated objects will always be referenced in verifier
context, hence ref_obj_id == 0 for them is a bug. It is allowed to write
to such objects, as long as fields that are special are not touched
(support for which will be added in subsequent patches). Note that once
such a pointer is marked PTR_UNTRUSTED, it is no longer allowed to write
to it.

No PROBE_MEM handling is therefore done for loads into this type unless
PTR_UNTRUSTED is part of the register type, since they can never be in
an undefined state, and their lifetime will always be valid.
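
To illustrate the intended usage, here is a rough sketch of a program
working with such an object. This relies on bpf_obj_new and
bpf_obj_drop, which are only introduced in later patches of this
series; the type and field names are made up for the example, and the
usual includes are omitted:

struct foo {
	int data;
	struct foo *next; /* walking this pointer yields a SCALAR */
};

SEC("tc")
int prog(void *ctx)
{
	struct foo *f;

	f = bpf_obj_new(typeof(*f));
	if (!f)
		return 0;
	f->data = 42; /* writes to non-special fields are allowed */
	bpf_obj_drop(f);
	return 0;
}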

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
---
 include/linux/bpf.h   | 11 +++++++++++
 kernel/bpf/btf.c      |  5 +++++
 kernel/bpf/verifier.c | 25 +++++++++++++++++++++++--
 3 files changed, 39 insertions(+), 2 deletions(-)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 49f9d2bec401..3cab113b149e 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -524,6 +524,11 @@ enum bpf_type_flag {
 	/* Size is known at compile time. */
 	MEM_FIXED_SIZE		= BIT(10 + BPF_BASE_TYPE_BITS),
 
+	/* MEM is of an allocated object of type from program BTF. This is
+	 * used to tag PTR_TO_BTF_ID allocated using bpf_obj_new.
+	 */
+	MEM_ALLOC		= BIT(11 + BPF_BASE_TYPE_BITS),
+
 	__BPF_TYPE_FLAG_MAX,
 	__BPF_TYPE_LAST_FLAG	= __BPF_TYPE_FLAG_MAX - 1,
 };
@@ -2791,4 +2796,10 @@ struct bpf_key {
 	bool has_ref;
 };
 #endif /* CONFIG_KEYS */
+
+static inline bool type_is_alloc(u32 type)
+{
+	return type & MEM_ALLOC;
+}
+
 #endif /* _LINUX_BPF_H */
diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c
index 875355ff3718..9a596f430558 100644
--- a/kernel/bpf/btf.c
+++ b/kernel/bpf/btf.c
@@ -6034,6 +6034,11 @@ int btf_struct_access(struct bpf_verifier_log *log,
 
 		switch (err) {
 		case WALK_PTR:
+			/* For local types, the destination register cannot
+			 * become a pointer again.
+			 */
+			if (type_is_alloc(reg->type))
+				return SCALAR_VALUE;
 			/* If we found the pointer or scalar on t+off,
 			 * we're done.
 			 */
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 5e74f460dfd0..d726d55622c9 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -4687,14 +4687,27 @@ static int check_ptr_to_btf_access(struct bpf_verifier_env *env,
 		return -EACCES;
 	}
 
-	if (env->ops->btf_struct_access) {
+	if (env->ops->btf_struct_access && !type_is_alloc(reg->type)) {
+		if (!btf_is_kernel(reg->btf)) {
+			verbose(env, "verifier internal error: reg->btf must be kernel btf\n");
+			return -EFAULT;
+		}
 		ret = env->ops->btf_struct_access(&env->log, reg, off, size, atype, &btf_id, &flag);
 	} else {
-		if (atype != BPF_READ) {
+		/* Writes are permitted with default btf_struct_access for local
+		 * kptrs (which always have ref_obj_id > 0), but not for
+		 * untrusted PTR_TO_BTF_ID | MEM_ALLOC.
+		 */
+		if (atype != BPF_READ && reg->type != (PTR_TO_BTF_ID | MEM_ALLOC)) {
 			verbose(env, "only read is supported\n");
 			return -EACCES;
 		}
 
+		if (type_is_alloc(reg->type) && !reg->ref_obj_id) {
+			verbose(env, "verifier internal error: ref_obj_id for allocated object must be non-zero\n");
+			return -EFAULT;
+		}
+
 		ret = btf_struct_access(&env->log, reg, off, size, atype, &btf_id, &flag);
 	}
 
@@ -5973,6 +5986,7 @@ int check_func_arg_reg_off(struct bpf_verifier_env *env,
 	 * fixed offset.
 	 */
 	case PTR_TO_BTF_ID:
+	case PTR_TO_BTF_ID | MEM_ALLOC:
 		/* When referenced PTR_TO_BTF_ID is passed to release function,
 		 * it's fixed offset must be 0.	In the other cases, fixed offset
 		 * can be non-zero.
@@ -13659,6 +13673,13 @@ static int convert_ctx_accesses(struct bpf_verifier_env *env)
 			break;
 		case PTR_TO_BTF_ID:
 		case PTR_TO_BTF_ID | PTR_UNTRUSTED:
+		/* PTR_TO_BTF_ID | MEM_ALLOC always has a valid lifetime, unlike
+		 * PTR_TO_BTF_ID, and an active ref_obj_id, but the same cannot
+		 * be said once it is marked PTR_UNTRUSTED, hence we must handle
+		 * any faults for loads into such types. BPF_WRITE is disallowed
+		 * for this case.
+		 */
+		case PTR_TO_BTF_ID | MEM_ALLOC | PTR_UNTRUSTED:
 			if (type == BPF_READ) {
 				insn->code = BPF_LDX | BPF_PROBE_MEM |
 					BPF_SIZE((insn)->code);
-- 
2.38.1


^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [PATCH bpf-next v7 09/26] bpf: Recognize lock and list fields in allocated objects
  2022-11-14 19:15 [PATCH bpf-next v7 00/26] Allocated objects, BPF linked lists Kumar Kartikeya Dwivedi
                   ` (7 preceding siblings ...)
  2022-11-14 19:15 ` [PATCH bpf-next v7 08/26] bpf: Introduce allocated objects support Kumar Kartikeya Dwivedi
@ 2022-11-14 19:15 ` Kumar Kartikeya Dwivedi
  2022-11-15  5:50   ` Alexei Starovoitov
  2022-11-14 19:15 ` [PATCH bpf-next v7 10/26] bpf: Verify ownership relationships for user BTF types Kumar Kartikeya Dwivedi
                   ` (17 subsequent siblings)
  26 siblings, 1 reply; 48+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2022-11-14 19:15 UTC (permalink / raw)
  To: bpf
  Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Martin KaFai Lau, Dave Marchevsky

Allow specifying bpf_spin_lock, bpf_list_head, bpf_list_node fields in an
allocated object.

Also update btf_struct_access to reject direct access to these special
fields.

A bpf_list_head allows implementing map-in-map style use cases, where an
allocated object with bpf_list_head is linked into a list in a map
value. This would require embedding a bpf_list_node, support for which
is also included. The bpf_spin_lock is used to protect the bpf_list_head
and other data.

While we don't strictly require holding a bpf_spin_lock while touching
the bpf_list_head in such objects, since having access to the object
implies complete ownership of it, the locking constraint is still kept
and may be conditionally lifted in the future.

Note that the specification of such types can be done just like map
values, e.g.:

struct bar {
	struct bpf_list_node node;
};

struct foo {
	struct bpf_spin_lock lock;
	struct bpf_list_head head __contains(bar, node);
	struct bpf_list_node node;
};

struct map_value {
	struct bpf_spin_lock lock;
	struct bpf_list_head head __contains(foo, node);
};
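
The __contains annotation used above is not compiler magic; it is meant
to expand to a BTF declaration tag recording the value type and its
embedded node field, roughly as sketched below (the exact helper lives
in a bpf_experimental.h header added later in this series):

#define __contains(name, node) \
	__attribute__((btf_decl_tag("contains:" #name ":" #node)))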

To recognize such types in user BTF, we build a btf_struct_metas array
of metadata items corresponding to each BTF ID. This is done once during
the btf_parse stage to avoid having to recompute it each time the
verification process needs to inspect the metadata.

Moreover, the computed metadata needs to be passed to some helpers in
future patches, which requires allocating it and storing it in the BTF
that is pinned by the program itself, so that valid access to such data
can be assumed during program runtime.

A key thing to note is that once a btf_struct_meta is available for a
type, both the btf_record and btf_field_offs should be available. It is
critical that btf_field_offs is available in case special fields are
present, as we extensively rely on special fields being zeroed out in
map values and allocated objects in later patches. The code ensures that
by bailing out in case of errors and ensuring both are available
together. If the record is not available, the special fields won't be
recognized, so not having both is also fine (in terms of being a
verification error and not a runtime bug).

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
---
 include/linux/bpf.h  |   7 ++
 include/linux/btf.h  |  35 ++++++++
 kernel/bpf/btf.c     | 198 +++++++++++++++++++++++++++++++++++++++----
 kernel/bpf/syscall.c |   4 +
 4 files changed, 226 insertions(+), 18 deletions(-)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 3cab113b149e..4cd3c9e6f50b 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -176,6 +176,7 @@ enum btf_field_type {
 	BPF_KPTR_REF   = (1 << 3),
 	BPF_KPTR       = BPF_KPTR_UNREF | BPF_KPTR_REF,
 	BPF_LIST_HEAD  = (1 << 4),
+	BPF_LIST_NODE  = (1 << 5),
 };
 
 struct btf_field_kptr {
@@ -276,6 +277,8 @@ static inline const char *btf_field_type_name(enum btf_field_type type)
 		return "kptr";
 	case BPF_LIST_HEAD:
 		return "bpf_list_head";
+	case BPF_LIST_NODE:
+		return "bpf_list_node";
 	default:
 		WARN_ON_ONCE(1);
 		return "unknown";
@@ -294,6 +297,8 @@ static inline u32 btf_field_type_size(enum btf_field_type type)
 		return sizeof(u64);
 	case BPF_LIST_HEAD:
 		return sizeof(struct bpf_list_head);
+	case BPF_LIST_NODE:
+		return sizeof(struct bpf_list_node);
 	default:
 		WARN_ON_ONCE(1);
 		return 0;
@@ -312,6 +317,8 @@ static inline u32 btf_field_type_align(enum btf_field_type type)
 		return __alignof__(u64);
 	case BPF_LIST_HEAD:
 		return __alignof__(struct bpf_list_head);
+	case BPF_LIST_NODE:
+		return __alignof__(struct bpf_list_node);
 	default:
 		WARN_ON_ONCE(1);
 		return 0;
diff --git a/include/linux/btf.h b/include/linux/btf.h
index d80345fa566b..a01a8da20021 100644
--- a/include/linux/btf.h
+++ b/include/linux/btf.h
@@ -6,6 +6,8 @@
 
 #include <linux/types.h>
 #include <linux/bpfptr.h>
+#include <linux/bsearch.h>
+#include <linux/btf_ids.h>
 #include <uapi/linux/btf.h>
 #include <uapi/linux/bpf.h>
 
@@ -78,6 +80,17 @@ struct btf_id_dtor_kfunc {
 	u32 kfunc_btf_id;
 };
 
+struct btf_struct_meta {
+	u32 btf_id;
+	struct btf_record *record;
+	struct btf_field_offs *field_offs;
+};
+
+struct btf_struct_metas {
+	u32 cnt;
+	struct btf_struct_meta types[];
+};
+
 typedef void (*btf_dtor_kfunc_t)(void *);
 
 extern const struct file_operations btf_fops;
@@ -408,6 +421,23 @@ static inline struct btf_param *btf_params(const struct btf_type *t)
 	return (struct btf_param *)(t + 1);
 }
 
+static inline int btf_id_cmp_func(const void *a, const void *b)
+{
+	const int *pa = a, *pb = b;
+
+	return *pa - *pb;
+}
+
+static inline bool btf_id_set_contains(const struct btf_id_set *set, u32 id)
+{
+	return bsearch(&id, set->ids, set->cnt, sizeof(u32), btf_id_cmp_func) != NULL;
+}
+
+static inline void *btf_id_set8_contains(const struct btf_id_set8 *set, u32 id)
+{
+	return bsearch(&id, set->pairs, set->cnt, sizeof(set->pairs[0]), btf_id_cmp_func);
+}
+
 #ifdef CONFIG_BPF_SYSCALL
 struct bpf_prog;
 
@@ -423,6 +453,7 @@ int register_btf_kfunc_id_set(enum bpf_prog_type prog_type,
 s32 btf_find_dtor_kfunc(struct btf *btf, u32 btf_id);
 int register_btf_id_dtor_kfuncs(const struct btf_id_dtor_kfunc *dtors, u32 add_cnt,
 				struct module *owner);
+struct btf_struct_meta *btf_find_struct_meta(const struct btf *btf, u32 btf_id);
 #else
 static inline const struct btf_type *btf_type_by_id(const struct btf *btf,
 						    u32 type_id)
@@ -454,6 +485,10 @@ static inline int register_btf_id_dtor_kfuncs(const struct btf_id_dtor_kfunc *dt
 {
 	return 0;
 }
+static inline struct btf_struct_meta *btf_find_struct_meta(const struct btf *btf, u32 btf_id)
+{
+	return NULL;
+}
 #endif
 
 static inline bool btf_type_is_struct_ptr(struct btf *btf, const struct btf_type *t)
diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c
index 9a596f430558..c0c2db93e0de 100644
--- a/kernel/bpf/btf.c
+++ b/kernel/bpf/btf.c
@@ -237,6 +237,7 @@ struct btf {
 	struct rcu_head rcu;
 	struct btf_kfunc_set_tab *kfunc_set_tab;
 	struct btf_id_dtor_kfunc_tab *dtor_kfunc_tab;
+	struct btf_struct_metas *struct_meta_tab;
 
 	/* split BTF support */
 	struct btf *base_btf;
@@ -1642,8 +1643,30 @@ static void btf_free_dtor_kfunc_tab(struct btf *btf)
 	btf->dtor_kfunc_tab = NULL;
 }
 
+static void btf_struct_metas_free(struct btf_struct_metas *tab)
+{
+	int i;
+
+	if (!tab)
+		return;
+	for (i = 0; i < tab->cnt; i++) {
+		btf_record_free(tab->types[i].record);
+		kfree(tab->types[i].field_offs);
+	}
+	kfree(tab);
+}
+
+static void btf_free_struct_meta_tab(struct btf *btf)
+{
+	struct btf_struct_metas *tab = btf->struct_meta_tab;
+
+	btf_struct_metas_free(tab);
+	btf->struct_meta_tab = NULL;
+}
+
 static void btf_free(struct btf *btf)
 {
+	btf_free_struct_meta_tab(btf);
 	btf_free_dtor_kfunc_tab(btf);
 	btf_free_kfunc_set_tab(btf);
 	kvfree(btf->types);
@@ -3353,6 +3376,12 @@ static int btf_get_field_type(const char *name, u32 field_mask, u32 *seen_mask,
 			goto end;
 		}
 	}
+	if (field_mask & BPF_LIST_NODE) {
+		if (!strcmp(name, "bpf_list_node")) {
+			type = BPF_LIST_NODE;
+			goto end;
+		}
+	}
 	/* Only return BPF_KPTR when all other types with matchable names fail */
 	if (field_mask & BPF_KPTR) {
 		type = BPF_KPTR_REF;
@@ -3396,6 +3425,7 @@ static int btf_find_struct_field(const struct btf *btf,
 		switch (field_type) {
 		case BPF_SPIN_LOCK:
 		case BPF_TIMER:
+		case BPF_LIST_NODE:
 			ret = btf_find_struct(btf, member_type, off, sz, field_type,
 					      idx < info_cnt ? &info[idx] : &tmp);
 			if (ret < 0)
@@ -3456,6 +3486,7 @@ static int btf_find_datasec_var(const struct btf *btf, const struct btf_type *t,
 		switch (field_type) {
 		case BPF_SPIN_LOCK:
 		case BPF_TIMER:
+		case BPF_LIST_NODE:
 			ret = btf_find_struct(btf, var_type, off, sz, field_type,
 					      idx < info_cnt ? &info[idx] : &tmp);
 			if (ret < 0)
@@ -3671,6 +3702,8 @@ struct btf_record *btf_parse_fields(const struct btf *btf, const struct btf_type
 			if (ret < 0)
 				goto end;
 			break;
+		case BPF_LIST_NODE:
+			break;
 		default:
 			ret = -EFAULT;
 			goto end;
@@ -5141,6 +5174,120 @@ static int btf_parse_hdr(struct btf_verifier_env *env)
 	return btf_check_sec_info(env, btf_data_size);
 }
 
+static const char *alloc_obj_fields[] = {
+	"bpf_spin_lock",
+	"bpf_list_head",
+	"bpf_list_node",
+};
+
+static struct btf_struct_metas *
+btf_parse_struct_metas(struct bpf_verifier_log *log, struct btf *btf)
+{
+	union {
+		struct btf_id_set set;
+		struct {
+			u32 _cnt;
+			u32 _ids[ARRAY_SIZE(alloc_obj_fields)];
+		} _arr;
+	} aof;
+	struct btf_struct_metas *tab = NULL;
+	int i, n, id, ret;
+
+	BUILD_BUG_ON(offsetof(struct btf_id_set, cnt) != 0);
+	BUILD_BUG_ON(sizeof(struct btf_id_set) != sizeof(u32));
+
+	memset(&aof, 0, sizeof(aof));
+	for (i = 0; i < ARRAY_SIZE(alloc_obj_fields); i++) {
+		/* Try to find whether this special type exists in user BTF, and
+		 * if so remember its ID so we can easily find it among members
+		 * of structs that we iterate in the next loop.
+		 */
+		id = btf_find_by_name_kind(btf, alloc_obj_fields[i], BTF_KIND_STRUCT);
+		if (id < 0)
+			continue;
+		aof.set.ids[aof.set.cnt++] = id;
+	}
+
+	if (!aof.set.cnt)
+		return NULL;
+	sort(&aof.set.ids, aof.set.cnt, sizeof(aof.set.ids[0]), btf_id_cmp_func, NULL);
+
+	n = btf_nr_types(btf);
+	for (i = 1; i < n; i++) {
+		const struct btf_member *member;
+		struct btf_field_offs *foffs;
+		struct btf_struct_meta *type;
+		struct btf_record *record;
+		const struct btf_type *t;
+		int j;
+
+		t = btf_type_by_id(btf, i);
+		if (!t) {
+			ret = -EINVAL;
+			goto free;
+		}
+		if (!__btf_type_is_struct(t))
+			continue;
+
+		cond_resched();
+
+		for_each_member(j, t, member) {
+			if (btf_id_set_contains(&aof.set, member->type))
+				goto parse;
+		}
+		continue;
+	parse:
+		if (!tab) {
+			tab = kzalloc(offsetof(struct btf_struct_metas, types[1]),
+				      GFP_KERNEL | __GFP_NOWARN);
+			if (!tab)
+				return ERR_PTR(-ENOMEM);
+		} else {
+			struct btf_struct_metas *new_tab;
+
+			new_tab = krealloc(tab, offsetof(struct btf_struct_metas, types[tab->cnt + 1]),
+					   GFP_KERNEL | __GFP_NOWARN);
+			if (!new_tab) {
+				ret = -ENOMEM;
+				goto free;
+			}
+			tab = new_tab;
+		}
+		type = &tab->types[tab->cnt];
+
+		type->btf_id = i;
+		record = btf_parse_fields(btf, t, BPF_SPIN_LOCK | BPF_LIST_HEAD | BPF_LIST_NODE, t->size);
+		if (IS_ERR_OR_NULL(record)) {
+			ret = PTR_ERR_OR_ZERO(record) ?: -EFAULT;
+			goto free;
+		}
+		foffs = btf_parse_field_offs(record);
+		if (WARN_ON_ONCE(IS_ERR_OR_NULL(foffs))) {
+			btf_record_free(record);
+			ret = -EFAULT;
+			goto free;
+		}
+		type->record = record;
+		type->field_offs = foffs;
+		tab->cnt++;
+	}
+	return tab;
+free:
+	btf_struct_metas_free(tab);
+	return ERR_PTR(ret);
+}
+
+struct btf_struct_meta *btf_find_struct_meta(const struct btf *btf, u32 btf_id)
+{
+	struct btf_struct_metas *tab;
+
+	BUILD_BUG_ON(offsetof(struct btf_struct_meta, btf_id) != 0);
+	tab = btf->struct_meta_tab;
+	if (!tab)
+		return NULL;
+	return bsearch(&btf_id, tab->types, tab->cnt, sizeof(tab->types[0]), btf_id_cmp_func);
+}
+
 static int btf_check_type_tags(struct btf_verifier_env *env,
 			       struct btf *btf, int start_id)
 {
@@ -5191,6 +5338,7 @@ static int btf_check_type_tags(struct btf_verifier_env *env,
 static struct btf *btf_parse(bpfptr_t btf_data, u32 btf_data_size,
 			     u32 log_level, char __user *log_ubuf, u32 log_size)
 {
+	struct btf_struct_metas *struct_meta_tab;
 	struct btf_verifier_env *env = NULL;
 	struct bpf_verifier_log *log;
 	struct btf *btf = NULL;
@@ -5259,15 +5407,24 @@ static struct btf *btf_parse(bpfptr_t btf_data, u32 btf_data_size,
 	if (err)
 		goto errout;
 
+	struct_meta_tab = btf_parse_struct_metas(log, btf);
+	if (IS_ERR(struct_meta_tab)) {
+		err = PTR_ERR(struct_meta_tab);
+		goto errout;
+	}
+	btf->struct_meta_tab = struct_meta_tab;
+
 	if (log->level && bpf_verifier_log_full(log)) {
 		err = -ENOSPC;
-		goto errout;
+		goto errout_meta;
 	}
 
 	btf_verifier_env_free(env);
 	refcount_set(&btf->refcnt, 1);
 	return btf;
 
+errout_meta:
+	btf_free_struct_meta_tab(btf);
 errout:
 	btf_verifier_env_free(env);
 	if (btf)
@@ -6028,6 +6185,28 @@ int btf_struct_access(struct bpf_verifier_log *log,
 	u32 id = reg->btf_id;
 	int err;
 
+	while (type_is_alloc(reg->type)) {
+		struct btf_struct_meta *meta;
+		struct btf_record *rec;
+		int i;
+
+		meta = btf_find_struct_meta(btf, id);
+		if (!meta)
+			break;
+		rec = meta->record;
+		for (i = 0; i < rec->cnt; i++) {
+			struct btf_field *field = &rec->fields[i];
+			u32 offset = field->offset;
+			if (off < offset + btf_field_type_size(field->type) && offset < off + size) {
+				bpf_log(log,
+					"direct access to %s is disallowed\n",
+					btf_field_type_name(field->type));
+				return -EACCES;
+			}
+		}
+		break;
+	}
+
 	t = btf_type_by_id(btf, id);
 	do {
 		err = btf_struct_walk(log, btf, t, off, size, &id, &tmp_flag);
@@ -7269,23 +7448,6 @@ bool btf_is_module(const struct btf *btf)
 	return btf->kernel_btf && strcmp(btf->name, "vmlinux") != 0;
 }
 
-static int btf_id_cmp_func(const void *a, const void *b)
-{
-	const int *pa = a, *pb = b;
-
-	return *pa - *pb;
-}
-
-bool btf_id_set_contains(const struct btf_id_set *set, u32 id)
-{
-	return bsearch(&id, set->ids, set->cnt, sizeof(u32), btf_id_cmp_func) != NULL;
-}
-
-static void *btf_id_set8_contains(const struct btf_id_set8 *set, u32 id)
-{
-	return bsearch(&id, set->pairs, set->cnt, sizeof(set->pairs[0]), btf_id_cmp_func);
-}
-
 enum {
 	BTF_MODULE_F_LIVE = (1 << 0),
 };
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index fdbae52f463f..c96039a4e57f 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -537,6 +537,7 @@ void btf_record_free(struct btf_record *rec)
 			btf_put(rec->fields[i].kptr.btf);
 			break;
 		case BPF_LIST_HEAD:
+		case BPF_LIST_NODE:
 			/* Nothing to release for bpf_list_head */
 			break;
 		default:
@@ -582,6 +583,7 @@ struct btf_record *btf_record_dup(const struct btf_record *rec)
 			}
 			break;
 		case BPF_LIST_HEAD:
+		case BPF_LIST_NODE:
 			/* Nothing to acquire for bpf_list_head */
 			break;
 		default:
@@ -648,6 +650,8 @@ void bpf_obj_free_fields(const struct btf_record *rec, void *obj)
 				continue;
 			bpf_list_head_free(field, field_ptr, obj + rec->spin_lock_off);
 			break;
+		case BPF_LIST_NODE:
+			break;
 		default:
 			WARN_ON_ONCE(1);
 			continue;
-- 
2.38.1


^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [PATCH bpf-next v7 10/26] bpf: Verify ownership relationships for user BTF types
  2022-11-14 19:15 [PATCH bpf-next v7 00/26] Allocated objects, BPF linked lists Kumar Kartikeya Dwivedi
                   ` (8 preceding siblings ...)
  2022-11-14 19:15 ` [PATCH bpf-next v7 09/26] bpf: Recognize lock and list fields in allocated objects Kumar Kartikeya Dwivedi
@ 2022-11-14 19:15 ` Kumar Kartikeya Dwivedi
  2022-11-14 19:15 ` [PATCH bpf-next v7 11/26] bpf: Allow locking bpf_spin_lock in allocated objects Kumar Kartikeya Dwivedi
                   ` (16 subsequent siblings)
  26 siblings, 0 replies; 48+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2022-11-14 19:15 UTC (permalink / raw)
  To: bpf
  Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Martin KaFai Lau, Dave Marchevsky

Ensure that there can be no ownership cycles among different types by
way of having owning objects that can hold some other type as their
element. For instance, a map value can only hold allocated objects, but
these are allowed to have another bpf_list_head. To prevent unbounded
recursion while freeing resources, elements of a bpf_list_head in local
kptrs which are themselves part of a list in a map value can never have
a bpf_list_head of their own. Later patches will verify this with
dedicated BTF selftests.
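
As an illustration with made-up type names, the following layout would
now be rejected with -ELOOP: 'owner' has both a bpf_list_head and a
bpf_list_node, so it may itself be owned by some other type, yet its
bpf_list_head element type 'elem' is itself an owning type:

struct leaf {
	struct bpf_list_node node;
};

struct elem {
	struct bpf_list_head head __contains(leaf, node); /* owning type */
	struct bpf_list_node node;
};

struct owner {
	struct bpf_spin_lock lock;
	struct bpf_list_head head __contains(elem, node); /* -ELOOP */
	struct bpf_list_node node;
};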

Also, to make runtime destruction easier, once btf_struct_metas is fully
populated, we can stash the metadata of the value type directly in the
metadata of the list_head fields, as that allows easier access to the
value type's layout to destruct it at runtime from the btf_field entry
of the list head itself.

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
---
 include/linux/bpf.h  |  1 +
 include/linux/btf.h  |  1 +
 kernel/bpf/btf.c     | 71 ++++++++++++++++++++++++++++++++++++++++++++
 kernel/bpf/syscall.c |  4 +++
 4 files changed, 77 insertions(+)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 4cd3c9e6f50b..c88f75a68893 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -190,6 +190,7 @@ struct btf_field_list_head {
 	struct btf *btf;
 	u32 value_btf_id;
 	u32 node_offset;
+	struct btf_record *value_rec;
 };
 
 struct btf_field {
diff --git a/include/linux/btf.h b/include/linux/btf.h
index a01a8da20021..42d8f3730a8d 100644
--- a/include/linux/btf.h
+++ b/include/linux/btf.h
@@ -178,6 +178,7 @@ int btf_find_spin_lock(const struct btf *btf, const struct btf_type *t);
 int btf_find_timer(const struct btf *btf, const struct btf_type *t);
 struct btf_record *btf_parse_fields(const struct btf *btf, const struct btf_type *t,
 				    u32 field_mask, u32 value_size);
+int btf_check_and_fixup_fields(const struct btf *btf, struct btf_record *rec);
 struct btf_field_offs *btf_parse_field_offs(struct btf_record *rec);
 bool btf_type_is_void(const struct btf_type *t);
 s32 btf_find_by_name_kind(const struct btf *btf, const char *name, u8 kind);
diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c
index c0c2db93e0de..10644343d877 100644
--- a/kernel/bpf/btf.c
+++ b/kernel/bpf/btf.c
@@ -3723,6 +3723,67 @@ struct btf_record *btf_parse_fields(const struct btf *btf, const struct btf_type
 	return ERR_PTR(ret);
 }
 
+int btf_check_and_fixup_fields(const struct btf *btf, struct btf_record *rec)
+{
+	int i;
+
+	/* There are two owning types, kptr_ref and bpf_list_head. The former
+	 * only supports storing kernel types, which can never store references
+	 * to program allocated local types, at least not yet. Hence we only need
+	 * to ensure that bpf_list_head ownership does not form cycles.
+	 */
+	if (IS_ERR_OR_NULL(rec) || !(rec->field_mask & BPF_LIST_HEAD))
+		return 0;
+	for (i = 0; i < rec->cnt; i++) {
+		struct btf_struct_meta *meta;
+		u32 btf_id;
+
+		if (!(rec->fields[i].type & BPF_LIST_HEAD))
+			continue;
+		btf_id = rec->fields[i].list_head.value_btf_id;
+		meta = btf_find_struct_meta(btf, btf_id);
+		if (!meta)
+			return -EFAULT;
+		rec->fields[i].list_head.value_rec = meta->record;
+
+		if (!(rec->field_mask & BPF_LIST_NODE))
+			continue;
+
+		/* We need to ensure ownership acyclicity among all types. The
+		 * proper way to do it would be to topologically sort all BTF
+		 * IDs based on the ownership edges, since there can be multiple
+		 * bpf_list_head in a type. Instead, we use the following
+		 * reasoning:
+		 *
+		 * - A type can only be owned by another type in user BTF if it
+		 *   has a bpf_list_node.
+		 * - A type can only _own_ another type in user BTF if it has a
+		 *   bpf_list_head.
+		 *
+		 * We ensure that if a type has both bpf_list_head and
+		 * bpf_list_node, its element types cannot be owning types.
+		 *
+		 * To ensure acyclicity:
+		 *
+		 * When A only has bpf_list_head, ownership chain can be:
+		 *	A -> B -> C
+		 * Where:
+		 * - B has both bpf_list_head and bpf_list_node.
+		 * - C only has bpf_list_node.
+		 *
+		 * When A has both bpf_list_head and bpf_list_node, some other
+		 * type already owns it in the BTF domain, hence it can not own
+		 * another owning type through any of the bpf_list_head edges.
+		 *	A -> B
+		 * Where:
+		 * - B only has bpf_list_node.
+		 */
+		if (meta->record->field_mask & BPF_LIST_HEAD)
+			return -ELOOP;
+	}
+	return 0;
+}
+
 static int btf_field_offs_cmp(const void *_a, const void *_b, const void *priv)
 {
 	const u32 a = *(const u32 *)_a;
@@ -5414,6 +5475,16 @@ static struct btf *btf_parse(bpfptr_t btf_data, u32 btf_data_size,
 	}
 	btf->struct_meta_tab = struct_meta_tab;
 
+	if (struct_meta_tab) {
+		int i;
+
+		for (i = 0; i < struct_meta_tab->cnt; i++) {
+			err = btf_check_and_fixup_fields(btf, struct_meta_tab->types[i].record);
+			if (err < 0)
+				goto errout_meta;
+		}
+	}
+
 	if (log->level && bpf_verifier_log_full(log)) {
 		err = -ENOSPC;
 		goto errout_meta;
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index c96039a4e57f..4669020bb47d 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -1044,6 +1044,10 @@ static int map_check_btf(struct bpf_map *map, const struct btf *btf,
 		}
 	}
 
+	ret = btf_check_and_fixup_fields(btf, map->record);
+	if (ret < 0)
+		goto free_map_tab;
+
 	if (map->ops->map_check_btf) {
 		ret = map->ops->map_check_btf(map, btf, key_type, value_type);
 		if (ret < 0)
-- 
2.38.1


^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [PATCH bpf-next v7 11/26] bpf: Allow locking bpf_spin_lock in allocated objects
  2022-11-14 19:15 [PATCH bpf-next v7 00/26] Allocated objects, BPF linked lists Kumar Kartikeya Dwivedi
                   ` (9 preceding siblings ...)
  2022-11-14 19:15 ` [PATCH bpf-next v7 10/26] bpf: Verify ownership relationships for user BTF types Kumar Kartikeya Dwivedi
@ 2022-11-14 19:15 ` Kumar Kartikeya Dwivedi
  2022-11-14 19:15 ` [PATCH bpf-next v7 12/26] bpf: Allow locking bpf_spin_lock global variables Kumar Kartikeya Dwivedi
                   ` (15 subsequent siblings)
  26 siblings, 0 replies; 48+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2022-11-14 19:15 UTC (permalink / raw)
  To: bpf
  Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Martin KaFai Lau, Dave Marchevsky

Allow locking a bpf_spin_lock in an allocated object, in addition to
the already supported map value pointers. The handling is similar to
that of map values: preserve the reg->id of PTR_TO_BTF_ID | MEM_ALLOC
as well, and adjust process_spin_lock to work with such registers and
remember the id in the verifier state.

Refactor the existing process_spin_lock to work with PTR_TO_BTF_ID |
MEM_ALLOC in addition to PTR_TO_MAP_VALUE. We need to update
reg_may_point_to_spin_lock, which is used in mark_ptr_or_null_reg to
preserve reg->id; that id will be used in env->cur_state->active_spin_lock
to remember the currently held spin lock.
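
A rough sketch of what this enables on the program side (bpf_obj_new
and bpf_obj_drop come from earlier patches in this series; the type
and field names are illustrative):

struct obj {
	struct bpf_spin_lock lock;
	int data;
};

SEC("tc")
int prog(void *ctx)
{
	struct obj *o;

	o = bpf_obj_new(typeof(*o));
	if (!o)
		return 0;
	bpf_spin_lock(&o->lock);
	o->data++;
	bpf_spin_unlock(&o->lock);
	bpf_obj_drop(o);
	return 0;
}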

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
---
 kernel/bpf/helpers.c  |  2 ++
 kernel/bpf/verifier.c | 70 ++++++++++++++++++++++++++++++++-----------
 2 files changed, 55 insertions(+), 17 deletions(-)

diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
index 7bc71995f17c..5bc0b9f0f306 100644
--- a/kernel/bpf/helpers.c
+++ b/kernel/bpf/helpers.c
@@ -336,6 +336,7 @@ const struct bpf_func_proto bpf_spin_lock_proto = {
 	.gpl_only	= false,
 	.ret_type	= RET_VOID,
 	.arg1_type	= ARG_PTR_TO_SPIN_LOCK,
+	.arg1_btf_id    = BPF_PTR_POISON,
 };
 
 static inline void __bpf_spin_unlock_irqrestore(struct bpf_spin_lock *lock)
@@ -358,6 +359,7 @@ const struct bpf_func_proto bpf_spin_unlock_proto = {
 	.gpl_only	= false,
 	.ret_type	= RET_VOID,
 	.arg1_type	= ARG_PTR_TO_SPIN_LOCK,
+	.arg1_btf_id    = BPF_PTR_POISON,
 };
 
 void copy_map_value_locked(struct bpf_map *map, void *dst, void *src,
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index d726d55622c9..070d003a99f0 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -453,8 +453,16 @@ static bool reg_type_not_null(enum bpf_reg_type type)
 
 static bool reg_may_point_to_spin_lock(const struct bpf_reg_state *reg)
 {
-	return reg->type == PTR_TO_MAP_VALUE &&
-	       btf_record_has_field(reg->map_ptr->record, BPF_SPIN_LOCK);
+	struct btf_record *rec = NULL;
+
+	if (reg->type == PTR_TO_MAP_VALUE) {
+		rec = reg->map_ptr->record;
+	} else if (reg->type == (PTR_TO_BTF_ID | MEM_ALLOC)) {
+		struct btf_struct_meta *meta = btf_find_struct_meta(reg->btf, reg->btf_id);
+		if (meta)
+			rec = meta->record;
+	}
+	return btf_record_has_field(rec, BPF_SPIN_LOCK);
 }
 
 static bool type_is_rdonly_mem(u32 type)
@@ -5588,8 +5596,10 @@ static int process_spin_lock(struct bpf_verifier_env *env, int regno,
 	struct bpf_reg_state *regs = cur_regs(env), *reg = &regs[regno];
 	struct bpf_verifier_state *cur = env->cur_state;
 	bool is_const = tnum_is_const(reg->var_off);
-	struct bpf_map *map = reg->map_ptr;
+	struct btf_record *rec = NULL;
 	u64 val = reg->var_off.value;
+	struct bpf_map *map = NULL;
+	struct btf *btf = NULL;
 
 	if (!is_const) {
 		verbose(env,
@@ -5597,19 +5607,32 @@ static int process_spin_lock(struct bpf_verifier_env *env, int regno,
 			regno);
 		return -EINVAL;
 	}
-	if (!map->btf) {
-		verbose(env,
-			"map '%s' has to have BTF in order to use bpf_spin_lock\n",
-			map->name);
-		return -EINVAL;
+	if (reg->type == PTR_TO_MAP_VALUE) {
+		map = reg->map_ptr;
+		if (!map->btf) {
+			verbose(env,
+				"map '%s' has to have BTF in order to use bpf_spin_lock\n",
+				map->name);
+			return -EINVAL;
+		}
+		rec = map->record;
+	} else {
+		struct btf_struct_meta *meta;
+
+		btf = reg->btf;
+		meta = btf_find_struct_meta(reg->btf, reg->btf_id);
+		if (meta)
+			rec = meta->record;
 	}
-	if (!btf_record_has_field(map->record, BPF_SPIN_LOCK)) {
-		verbose(env, "map '%s' has no valid bpf_spin_lock\n", map->name);
+
+	if (!btf_record_has_field(rec, BPF_SPIN_LOCK)) {
+		verbose(env, "%s '%s' has no valid bpf_spin_lock\n", map ? "map" : "local",
+			map ? map->name : "kptr");
 		return -EINVAL;
 	}
-	if (map->record->spin_lock_off != val + reg->off) {
+	if (rec->spin_lock_off != val + reg->off) {
 		verbose(env, "off %lld doesn't point to 'struct bpf_spin_lock' that is at %d\n",
-			val + reg->off, map->record->spin_lock_off);
+			val + reg->off, rec->spin_lock_off);
 		return -EINVAL;
 	}
 	if (is_lock) {
@@ -5815,13 +5838,19 @@ static const struct bpf_reg_types int_ptr_types = {
 	},
 };
 
+static const struct bpf_reg_types spin_lock_types = {
+	.types = {
+		PTR_TO_MAP_VALUE,
+		PTR_TO_BTF_ID | MEM_ALLOC,
+	}
+};
+
 static const struct bpf_reg_types fullsock_types = { .types = { PTR_TO_SOCKET } };
 static const struct bpf_reg_types scalar_types = { .types = { SCALAR_VALUE } };
 static const struct bpf_reg_types context_types = { .types = { PTR_TO_CTX } };
 static const struct bpf_reg_types ringbuf_mem_types = { .types = { PTR_TO_MEM | MEM_RINGBUF } };
 static const struct bpf_reg_types const_map_ptr_types = { .types = { CONST_PTR_TO_MAP } };
 static const struct bpf_reg_types btf_ptr_types = { .types = { PTR_TO_BTF_ID } };
-static const struct bpf_reg_types spin_lock_types = { .types = { PTR_TO_MAP_VALUE } };
 static const struct bpf_reg_types percpu_btf_ptr_types = { .types = { PTR_TO_BTF_ID | MEM_PERCPU } };
 static const struct bpf_reg_types func_ptr_types = { .types = { PTR_TO_FUNC } };
 static const struct bpf_reg_types stack_ptr_types = { .types = { PTR_TO_STACK } };
@@ -5946,6 +5975,11 @@ static int check_reg_type(struct bpf_verifier_env *env, u32 regno,
 				return -EACCES;
 			}
 		}
+	} else if (type_is_alloc(reg->type)) {
+		if (meta->func_id != BPF_FUNC_spin_lock && meta->func_id != BPF_FUNC_spin_unlock) {
+			verbose(env, "verifier internal error: unimplemented handling of MEM_ALLOC\n");
+			return -EFAULT;
+		}
 	}
 
 	return 0;
@@ -6062,7 +6096,8 @@ static int check_func_arg(struct bpf_verifier_env *env, u32 arg,
 		goto skip_type_check;
 
 	/* arg_btf_id and arg_size are in a union. */
-	if (base_type(arg_type) == ARG_PTR_TO_BTF_ID)
+	if (base_type(arg_type) == ARG_PTR_TO_BTF_ID ||
+	    base_type(arg_type) == ARG_PTR_TO_SPIN_LOCK)
 		arg_btf_id = fn->arg_btf_id[arg];
 
 	err = check_reg_type(env, regno, arg_type, arg_btf_id, meta);
@@ -6680,9 +6715,10 @@ static bool check_btf_id_ok(const struct bpf_func_proto *fn)
 	int i;
 
 	for (i = 0; i < ARRAY_SIZE(fn->arg_type); i++) {
-		if (base_type(fn->arg_type[i]) == ARG_PTR_TO_BTF_ID && !fn->arg_btf_id[i])
-			return false;
-
+		if (base_type(fn->arg_type[i]) == ARG_PTR_TO_BTF_ID)
+			return !!fn->arg_btf_id[i];
+		if (base_type(fn->arg_type[i]) == ARG_PTR_TO_SPIN_LOCK)
+			return fn->arg_btf_id[i] == BPF_PTR_POISON;
 		if (base_type(fn->arg_type[i]) != ARG_PTR_TO_BTF_ID && fn->arg_btf_id[i] &&
 		    /* arg_btf_id and arg_size are in a union. */
 		    (base_type(fn->arg_type[i]) != ARG_PTR_TO_MEM ||
-- 
2.38.1


^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [PATCH bpf-next v7 12/26] bpf: Allow locking bpf_spin_lock global variables
  2022-11-14 19:15 [PATCH bpf-next v7 00/26] Allocated objects, BPF linked lists Kumar Kartikeya Dwivedi
                   ` (10 preceding siblings ...)
  2022-11-14 19:15 ` [PATCH bpf-next v7 11/26] bpf: Allow locking bpf_spin_lock in allocated objects Kumar Kartikeya Dwivedi
@ 2022-11-14 19:15 ` Kumar Kartikeya Dwivedi
  2022-11-15 17:33   ` Dave Marchevsky
  2022-11-14 19:15 ` [PATCH bpf-next v7 13/26] bpf: Allow locking bpf_spin_lock in inner map values Kumar Kartikeya Dwivedi
                   ` (14 subsequent siblings)
  26 siblings, 1 reply; 48+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2022-11-14 19:15 UTC (permalink / raw)
  To: bpf
  Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Martin KaFai Lau, Dave Marchevsky

Global variables reside in maps accessible using direct_value_addr
callbacks, so giving each load instruction's rewrite a unique reg->id
prevents us from holding locks which are global.

The reason for preserving reg->id as a unique value for registers that
may point to a spin lock is that two separate lookups are treated as two
separate memory regions, and any possible aliasing is ignored for the
purposes of spin lock correctness.

This is especially not great for the global variable case, which is
served from maps that have max_entries == 1, i.e. all lookups lead to
pointers into the same map value.

So refactor the active_spin_lock into an 'active_lock' structure which
represents the lock identity, and instead of the reg->id, remember two
fields, a pointer and the reg->id. The pointer will store reg->map_ptr
or reg->btf. It's only necessary to distinguish for the id == 0 case of
global variables, but always setting the pointer to a non-NULL value and
using the pointer to check whether the lock is held simplifies code in
the verifier.

This is generic enough to allow it for global variables, map lookups,
and allocated objects at the same time.

Note that while whether a lock is held can be answered by just comparing
active_lock.ptr to NULL, determining whether the register points to the
same held lock requires comparing _both_ ptr and id.

Finally, as a result of this refactoring, pseudo load instructions are
not given a unique reg->id, as they are doing a lookup of the same map
value (max_entries is never greater than 1).

Essentially, we consider that the tuple of (ptr, id) will always be
unique for any kind of argument to bpf_spin_{lock,unlock}.

Note that this can be extended in the future to also remember offset
used for locking, so that we can introduce multiple bpf_spin_lock fields
in the same allocation.

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
---
 include/linux/bpf_verifier.h | 10 ++++++++-
 kernel/bpf/verifier.c        | 41 ++++++++++++++++++++++++------------
 2 files changed, 37 insertions(+), 14 deletions(-)

diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h
index 1a32baa78ce2..fa738abea267 100644
--- a/include/linux/bpf_verifier.h
+++ b/include/linux/bpf_verifier.h
@@ -323,7 +323,15 @@ struct bpf_verifier_state {
 	u32 branches;
 	u32 insn_idx;
 	u32 curframe;
-	u32 active_spin_lock;
+	struct {
+		/* This can either be reg->map_ptr or reg->btf, but it is only
+		 * used to check whether the lock is held or not by comparing to
+		 * NULL.
+		 */
+		void *ptr;
+		/* This will be reg->id */
+		u32 id;
+	} active_lock;
 	bool speculative;
 
 	/* first and last insn idx of this verifier state */
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 070d003a99f0..99b5edb56978 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -1215,7 +1215,8 @@ static int copy_verifier_state(struct bpf_verifier_state *dst_state,
 	}
 	dst_state->speculative = src->speculative;
 	dst_state->curframe = src->curframe;
-	dst_state->active_spin_lock = src->active_spin_lock;
+	dst_state->active_lock.ptr = src->active_lock.ptr;
+	dst_state->active_lock.id = src->active_lock.id;
 	dst_state->branches = src->branches;
 	dst_state->parent = src->parent;
 	dst_state->first_insn_idx = src->first_insn_idx;
@@ -5587,7 +5588,7 @@ int check_kfunc_mem_size_reg(struct bpf_verifier_env *env, struct bpf_reg_state
  * Since only one bpf_spin_lock is allowed the checks are simpler than
  * reg_is_refcounted() logic. The verifier needs to remember only
  * one spin_lock instead of array of acquired_refs.
- * cur_state->active_spin_lock remembers which map value element got locked
+ * cur_state->active_lock remembers which map value element got locked
  * and clears it after bpf_spin_unlock.
  */
 static int process_spin_lock(struct bpf_verifier_env *env, int regno,
@@ -5636,22 +5637,35 @@ static int process_spin_lock(struct bpf_verifier_env *env, int regno,
 		return -EINVAL;
 	}
 	if (is_lock) {
-		if (cur->active_spin_lock) {
+		if (cur->active_lock.ptr) {
 			verbose(env,
 				"Locking two bpf_spin_locks are not allowed\n");
 			return -EINVAL;
 		}
-		cur->active_spin_lock = reg->id;
+		if (map)
+			cur->active_lock.ptr = map;
+		else
+			cur->active_lock.ptr = btf;
+		cur->active_lock.id = reg->id;
 	} else {
-		if (!cur->active_spin_lock) {
+		void *ptr;
+
+		if (map)
+			ptr = map;
+		else
+			ptr = btf;
+
+		if (!cur->active_lock.ptr) {
 			verbose(env, "bpf_spin_unlock without taking a lock\n");
 			return -EINVAL;
 		}
-		if (cur->active_spin_lock != reg->id) {
+		if (cur->active_lock.ptr != ptr ||
+		    cur->active_lock.id != reg->id) {
 			verbose(env, "bpf_spin_unlock of different lock\n");
 			return -EINVAL;
 		}
-		cur->active_spin_lock = 0;
+		cur->active_lock.ptr = NULL;
+		cur->active_lock.id = 0;
 	}
 	return 0;
 }
@@ -10582,8 +10596,8 @@ static int check_ld_imm(struct bpf_verifier_env *env, struct bpf_insn *insn)
 	    insn->src_reg == BPF_PSEUDO_MAP_IDX_VALUE) {
 		dst_reg->type = PTR_TO_MAP_VALUE;
 		dst_reg->off = aux->map_off;
-		if (btf_record_has_field(map->record, BPF_SPIN_LOCK))
-			dst_reg->id = ++env->id_gen;
+		WARN_ON_ONCE(map->max_entries != 1);
+		/* We want reg->id to be same (0) as map_value is not distinct */
 	} else if (insn->src_reg == BPF_PSEUDO_MAP_FD ||
 		   insn->src_reg == BPF_PSEUDO_MAP_IDX) {
 		dst_reg->type = CONST_PTR_TO_MAP;
@@ -10661,7 +10675,7 @@ static int check_ld_abs(struct bpf_verifier_env *env, struct bpf_insn *insn)
 		return err;
 	}
 
-	if (env->cur_state->active_spin_lock) {
+	if (env->cur_state->active_lock.ptr) {
 		verbose(env, "BPF_LD_[ABS|IND] cannot be used inside bpf_spin_lock-ed region\n");
 		return -EINVAL;
 	}
@@ -11927,7 +11941,8 @@ static bool states_equal(struct bpf_verifier_env *env,
 	if (old->speculative && !cur->speculative)
 		return false;
 
-	if (old->active_spin_lock != cur->active_spin_lock)
+	if (old->active_lock.ptr != cur->active_lock.ptr ||
+	    old->active_lock.id != cur->active_lock.id)
 		return false;
 
 	/* for states to be equal callsites have to be the same
@@ -12572,7 +12587,7 @@ static int do_check(struct bpf_verifier_env *env)
 					return -EINVAL;
 				}
 
-				if (env->cur_state->active_spin_lock &&
+				if (env->cur_state->active_lock.ptr &&
 				    (insn->src_reg == BPF_PSEUDO_CALL ||
 				     insn->imm != BPF_FUNC_spin_unlock)) {
 					verbose(env, "function calls are not allowed while holding a lock\n");
@@ -12609,7 +12624,7 @@ static int do_check(struct bpf_verifier_env *env)
 					return -EINVAL;
 				}
 
-				if (env->cur_state->active_spin_lock) {
+				if (env->cur_state->active_lock.ptr) {
 					verbose(env, "bpf_spin_unlock is missing\n");
 					return -EINVAL;
 				}
-- 
2.38.1


^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [PATCH bpf-next v7 13/26] bpf: Allow locking bpf_spin_lock in inner map values
  2022-11-14 19:15 [PATCH bpf-next v7 00/26] Allocated objects, BPF linked lists Kumar Kartikeya Dwivedi
                   ` (11 preceding siblings ...)
  2022-11-14 19:15 ` [PATCH bpf-next v7 12/26] bpf: Allow locking bpf_spin_lock global variables Kumar Kartikeya Dwivedi
@ 2022-11-14 19:15 ` Kumar Kartikeya Dwivedi
  2022-11-14 19:15 ` [PATCH bpf-next v7 14/26] bpf: Rewrite kfunc argument handling Kumar Kartikeya Dwivedi
                   ` (13 subsequent siblings)
  26 siblings, 0 replies; 48+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2022-11-14 19:15 UTC (permalink / raw)
  To: bpf
  Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Martin KaFai Lau, Dave Marchevsky

There is no need to restrict users from locking bpf_spin_lock in map
values of inner maps. Each inner map lookup gets a unique reg->id
assigned to the returned PTR_TO_MAP_VALUE which will be preserved after
the NULL check. Distinct lookups into different inner map get unique
IDs, and distinct lookups into same inner map also get unique IDs.

Hence, lift the restriction by removing the check that returns -ENOTSUPP
in map_in_map.c. Later commits will add comprehensive test cases to ensure
that invalid cases are rejected.
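
A sketch of a now-accepted program, using the usual libbpf map-in-map
declarations (names are illustrative and includes are omitted):

struct inner_val {
	struct bpf_spin_lock lock;
	int data;
};

struct {
	__uint(type, BPF_MAP_TYPE_ARRAY);
	__uint(max_entries, 1);
	__type(key, int);
	__type(value, struct inner_val);
} inner SEC(".maps");

struct {
	__uint(type, BPF_MAP_TYPE_ARRAY_OF_MAPS);
	__uint(max_entries, 1);
	__type(key, int);
	__array(values, typeof(inner));
} outer SEC(".maps") = {
	.values = { &inner },
};

SEC("tc")
int prog(void *ctx)
{
	struct inner_val *v;
	int key = 0;
	void *m;

	m = bpf_map_lookup_elem(&outer, &key);
	if (!m)
		return 0;
	v = bpf_map_lookup_elem(m, &key);
	if (!v)
		return 0;
	bpf_spin_lock(&v->lock);
	v->data++;
	bpf_spin_unlock(&v->lock);
	return 0;
}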

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
---
 kernel/bpf/map_in_map.c | 5 -----
 1 file changed, 5 deletions(-)

diff --git a/kernel/bpf/map_in_map.c b/kernel/bpf/map_in_map.c
index 8ca0cca39d49..f31893a123a2 100644
--- a/kernel/bpf/map_in_map.c
+++ b/kernel/bpf/map_in_map.c
@@ -29,11 +29,6 @@ struct bpf_map *bpf_map_meta_alloc(int inner_map_ufd)
 		return ERR_PTR(-ENOTSUPP);
 	}
 
-	if (btf_record_has_field(inner_map->record, BPF_SPIN_LOCK)) {
-		fdput(f);
-		return ERR_PTR(-ENOTSUPP);
-	}
-
 	inner_map_meta_size = sizeof(*inner_map_meta);
 	/* In some cases verifier needs to access beyond just base map. */
 	if (inner_map->ops == &array_map_ops)
-- 
2.38.1


^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [PATCH bpf-next v7 14/26] bpf: Rewrite kfunc argument handling
  2022-11-14 19:15 [PATCH bpf-next v7 00/26] Allocated objects, BPF linked lists Kumar Kartikeya Dwivedi
                   ` (12 preceding siblings ...)
  2022-11-14 19:15 ` [PATCH bpf-next v7 13/26] bpf: Allow locking bpf_spin_lock in inner map values Kumar Kartikeya Dwivedi
@ 2022-11-14 19:15 ` Kumar Kartikeya Dwivedi
  2022-11-15  6:10   ` Alexei Starovoitov
  2022-11-14 19:15 ` [PATCH bpf-next v7 15/26] bpf: Drop kfunc bits from btf_check_func_arg_match Kumar Kartikeya Dwivedi
                   ` (12 subsequent siblings)
  26 siblings, 1 reply; 48+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2022-11-14 19:15 UTC (permalink / raw)
  To: bpf
  Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Martin KaFai Lau, Dave Marchevsky

As we continue to add more features, argument types, kfunc flags, and
different extensions to kfuncs, the code to verify the correctness of
the kfunc prototype wrt the passed-in registers has become ad hoc and
ugly to read. To make life easier, and to make a very clear split between
different stages of argument processing, move all the code into
verifier.c and refactor it into easier to read helpers and functions.

This also makes it easier to share code within the verifier for kfunc
argument processing. This will be more and more useful in later patches
as we are now moving to implement very core BPF helpers as kfuncs, to
keep them experimental before baking into UAPI.
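
As one example of the conventions now centralized here,
is_kfunc_arg_mem_size below keys on a "__sz" name suffix to pair a
memory argument with its size argument; a hypothetical kfunc prototype
following that convention would look like:

/* hypothetical kfunc prototype: because of the __sz suffix, the
 * verifier pairs mem with mem__sz as a (pointer, size) argument */
void bpf_demo_memzero(void *mem, u32 mem__sz);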

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
---
 include/linux/btf.h                           |  31 +-
 kernel/bpf/btf.c                              |  16 +-
 kernel/bpf/verifier.c                         | 547 +++++++++++++++++-
 .../bpf/prog_tests/kfunc_dynptr_param.c       |   2 +-
 tools/testing/selftests/bpf/verifier/calls.c  |   2 +-
 .../selftests/bpf/verifier/ref_tracking.c     |   4 +-
 6 files changed, 569 insertions(+), 33 deletions(-)

diff --git a/include/linux/btf.h b/include/linux/btf.h
index 42d8f3730a8d..d5b26380a60f 100644
--- a/include/linux/btf.h
+++ b/include/linux/btf.h
@@ -338,6 +338,16 @@ static inline bool btf_type_is_struct(const struct btf_type *t)
 	return kind == BTF_KIND_STRUCT || kind == BTF_KIND_UNION;
 }
 
+static inline bool __btf_type_is_struct(const struct btf_type *t)
+{
+	return BTF_INFO_KIND(t->info) == BTF_KIND_STRUCT;
+}
+
+static inline bool btf_type_is_array(const struct btf_type *t)
+{
+	return BTF_INFO_KIND(t->info) == BTF_KIND_ARRAY;
+}
+
 static inline u16 btf_type_vlen(const struct btf_type *t)
 {
 	return BTF_INFO_VLEN(t->info);
@@ -439,9 +449,10 @@ static inline void *btf_id_set8_contains(const struct btf_id_set8 *set, u32 id)
 	return bsearch(&id, set->pairs, set->cnt, sizeof(set->pairs[0]), btf_id_cmp_func);
 }
 
-#ifdef CONFIG_BPF_SYSCALL
 struct bpf_prog;
+struct bpf_verifier_log;
 
+#ifdef CONFIG_BPF_SYSCALL
 const struct btf_type *btf_type_by_id(const struct btf *btf, u32 type_id);
 const char *btf_name_by_offset(const struct btf *btf, u32 offset);
 struct btf *btf_parse_vmlinux(void);
@@ -455,6 +466,12 @@ s32 btf_find_dtor_kfunc(struct btf *btf, u32 btf_id);
 int register_btf_id_dtor_kfuncs(const struct btf_id_dtor_kfunc *dtors, u32 add_cnt,
 				struct module *owner);
 struct btf_struct_meta *btf_find_struct_meta(const struct btf *btf, u32 btf_id);
+const struct btf_member *
+btf_get_prog_ctx_type(struct bpf_verifier_log *log, const struct btf *btf,
+		      const struct btf_type *t, enum bpf_prog_type prog_type,
+		      int arg);
+bool btf_types_are_same(const struct btf *btf1, u32 id1,
+			const struct btf *btf2, u32 id2);
 #else
 static inline const struct btf_type *btf_type_by_id(const struct btf *btf,
 						    u32 type_id)
@@ -490,6 +507,18 @@ static inline struct btf_struct_meta *btf_find_struct_meta(const struct btf *btf
 {
 	return NULL;
 }
+static inline const struct btf_member *
+btf_get_prog_ctx_type(struct bpf_verifier_log *log, const struct btf *btf,
+		      const struct btf_type *t, enum bpf_prog_type prog_type,
+		      int arg)
+{
+	return NULL;
+}
+static inline bool btf_types_are_same(const struct btf *btf1, u32 id1,
+				      const struct btf *btf2, u32 id2)
+{
+	return false;
+}
 #endif
 
 static inline bool btf_type_is_struct_ptr(struct btf *btf, const struct btf_type *t)
diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c
index 10644343d877..ff8b46c209dd 100644
--- a/kernel/bpf/btf.c
+++ b/kernel/bpf/btf.c
@@ -478,16 +478,6 @@ static bool btf_type_nosize_or_null(const struct btf_type *t)
 	return !t || btf_type_nosize(t);
 }
 
-static bool __btf_type_is_struct(const struct btf_type *t)
-{
-	return BTF_INFO_KIND(t->info) == BTF_KIND_STRUCT;
-}
-
-static bool btf_type_is_array(const struct btf_type *t)
-{
-	return BTF_INFO_KIND(t->info) == BTF_KIND_ARRAY;
-}
-
 static bool btf_type_is_datasec(const struct btf_type *t)
 {
 	return BTF_INFO_KIND(t->info) == BTF_KIND_DATASEC;
@@ -5537,7 +5527,7 @@ static u8 bpf_ctx_convert_map[] = {
 #undef BPF_MAP_TYPE
 #undef BPF_LINK_TYPE
 
-static const struct btf_member *
+const struct btf_member *
 btf_get_prog_ctx_type(struct bpf_verifier_log *log, const struct btf *btf,
 		      const struct btf_type *t, enum bpf_prog_type prog_type,
 		      int arg)
@@ -6323,8 +6313,8 @@ int btf_struct_access(struct bpf_verifier_log *log,
  * end up with two different module BTFs, but IDs point to the common type in
  * vmlinux BTF.
  */
-static bool btf_types_are_same(const struct btf *btf1, u32 id1,
-			       const struct btf *btf2, u32 id2)
+bool btf_types_are_same(const struct btf *btf1, u32 id1,
+			const struct btf *btf2, u32 id2)
 {
 	if (id1 != id2)
 		return false;
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 99b5edb56978..ddb7ac1cb529 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -7859,19 +7859,523 @@ static void mark_btf_func_reg_size(struct bpf_verifier_env *env, u32 regno,
 	}
 }
 
+struct bpf_kfunc_call_arg_meta {
+	/* In parameters */
+	struct btf *btf;
+	u32 func_id;
+	u32 kfunc_flags;
+	const struct btf_type *func_proto;
+	const char *func_name;
+	/* Out parameters */
+	u32 ref_obj_id;
+	u8 release_regno;
+	bool r0_rdonly;
+	u64 r0_size;
+};
+
+static bool is_kfunc_acquire(struct bpf_kfunc_call_arg_meta *meta)
+{
+	return meta->kfunc_flags & KF_ACQUIRE;
+}
+
+static bool is_kfunc_ret_null(struct bpf_kfunc_call_arg_meta *meta)
+{
+	return meta->kfunc_flags & KF_RET_NULL;
+}
+
+static bool is_kfunc_release(struct bpf_kfunc_call_arg_meta *meta)
+{
+	return meta->kfunc_flags & KF_RELEASE;
+}
+
+static bool is_kfunc_trusted_args(struct bpf_kfunc_call_arg_meta *meta)
+{
+	return meta->kfunc_flags & KF_TRUSTED_ARGS;
+}
+
+static bool is_kfunc_sleepable(struct bpf_kfunc_call_arg_meta *meta)
+{
+	return meta->kfunc_flags & KF_SLEEPABLE;
+}
+
+static bool is_kfunc_destructive(struct bpf_kfunc_call_arg_meta *meta)
+{
+	return meta->kfunc_flags & KF_DESTRUCTIVE;
+}
+
+static bool is_kfunc_arg_kptr_get(struct bpf_kfunc_call_arg_meta *meta, int arg)
+{
+	return arg == 0 && (meta->kfunc_flags & KF_KPTR_GET);
+}
+
+static bool is_kfunc_arg_mem_size(const struct btf *btf,
+				  const struct btf_param *arg,
+				  const struct bpf_reg_state *reg)
+{
+	int len, sfx_len = sizeof("__sz") - 1;
+	const struct btf_type *t;
+	const char *param_name;
+
+	t = btf_type_skip_modifiers(btf, arg->type, NULL);
+	if (!btf_type_is_scalar(t) || reg->type != SCALAR_VALUE)
+		return false;
+
+	/* In the future, this can be ported to use BTF tagging */
+	param_name = btf_name_by_offset(btf, arg->name_off);
+	if (str_is_empty(param_name))
+		return false;
+	len = strlen(param_name);
+	if (len < sfx_len)
+		return false;
+	param_name += len - sfx_len;
+	if (strncmp(param_name, "__sz", sfx_len))
+		return false;
+
+	return true;
+}
+
+static bool is_kfunc_arg_ret_buf_size(const struct btf *btf,
+				      const struct btf_param *arg,
+				      const struct bpf_reg_state *reg,
+				      const char *name)
+{
+	int len, target_len = strlen(name);
+	const struct btf_type *t;
+	const char *param_name;
+
+	t = btf_type_skip_modifiers(btf, arg->type, NULL);
+	if (!btf_type_is_scalar(t) || reg->type != SCALAR_VALUE)
+		return false;
+
+	param_name = btf_name_by_offset(btf, arg->name_off);
+	if (str_is_empty(param_name))
+		return false;
+	len = strlen(param_name);
+	if (len != target_len)
+		return false;
+	if (strcmp(param_name, name))
+		return false;
+
+	return true;
+}
+
+enum {
+	KF_ARG_DYNPTR_ID,
+};
+
+BTF_ID_LIST(kf_arg_btf_ids)
+BTF_ID(struct, bpf_dynptr_kern)
+
+static bool is_kfunc_arg_dynptr(const struct btf *btf,
+				const struct btf_param *arg)
+{
+	const struct btf_type *t;
+	u32 res_id;
+
+	t = btf_type_skip_modifiers(btf, arg->type, NULL);
+	if (!t)
+		return false;
+	if (!btf_type_is_ptr(t))
+		return false;
+	t = btf_type_skip_modifiers(btf, t->type, &res_id);
+	if (!t)
+		return false;
+	return btf_types_are_same(btf, res_id, btf_vmlinux, kf_arg_btf_ids[KF_ARG_DYNPTR_ID]);
+}
+
+/* Returns true if struct is composed of scalars, 4 levels of nesting allowed */
+static bool __btf_type_is_scalar_struct(struct bpf_verifier_env *env,
+					const struct btf *btf,
+					const struct btf_type *t, int rec)
+{
+	const struct btf_type *member_type;
+	const struct btf_member *member;
+	u32 i;
+
+	if (!btf_type_is_struct(t))
+		return false;
+
+	for_each_member(i, t, member) {
+		const struct btf_array *array;
+
+		member_type = btf_type_skip_modifiers(btf, member->type, NULL);
+		if (btf_type_is_struct(member_type)) {
+			if (rec >= 3) {
+				verbose(env, "max struct nesting depth exceeded\n");
+				return false;
+			}
+			if (!__btf_type_is_scalar_struct(env, btf, member_type, rec + 1))
+				return false;
+			continue;
+		}
+		if (btf_type_is_array(member_type)) {
+			array = btf_array(member_type);
+			if (!array->nelems)
+				return false;
+			member_type = btf_type_skip_modifiers(btf, array->type, NULL);
+			if (!btf_type_is_scalar(member_type))
+				return false;
+			continue;
+		}
+		if (!btf_type_is_scalar(member_type))
+			return false;
+	}
+	return true;
+}
+
+
+static u32 *reg2btf_ids[__BPF_REG_TYPE_MAX] = {
+#ifdef CONFIG_NET
+	[PTR_TO_SOCKET] = &btf_sock_ids[BTF_SOCK_TYPE_SOCK],
+	[PTR_TO_SOCK_COMMON] = &btf_sock_ids[BTF_SOCK_TYPE_SOCK_COMMON],
+	[PTR_TO_TCP_SOCK] = &btf_sock_ids[BTF_SOCK_TYPE_TCP],
+#endif
+};
+
+enum kfunc_ptr_arg_type {
+	KF_ARG_PTR_TO_CTX,
+	KF_ARG_PTR_TO_KPTR_STRONG,   /* PTR_TO_KPTR but type specific */
+	KF_ARG_PTR_TO_DYNPTR,
+	KF_ARG_PTR_TO_BTF_ID,	     /* Also covers reg2btf_ids conversions */
+	KF_ARG_PTR_TO_MEM,
+	KF_ARG_PTR_TO_MEM_SIZE,	     /* Size derived from next argument, skip it */
+};
+
+static enum kfunc_ptr_arg_type
+get_kfunc_ptr_arg_type(struct bpf_verifier_env *env,
+		       struct bpf_kfunc_call_arg_meta *meta,
+		       const struct btf_type *t, const struct btf_type *ref_t,
+		       const char *ref_tname, const struct btf_param *args,
+		       int argno, int nargs)
+{
+	u32 regno = argno + 1;
+	struct bpf_reg_state *regs = cur_regs(env);
+	struct bpf_reg_state *reg = &regs[regno];
+	bool arg_mem_size = false;
+
+	/* In this function, we verify the kfunc's BTF as per the argument type,
+	 * leaving the rest of the verification with respect to the register
+	 * type to our caller. When a set of conditions hold in the BTF type of
+	 * arguments, we resolve it to a known kfunc_ptr_arg_type.
+	 */
+	if (btf_get_prog_ctx_type(&env->log, meta->btf, t, resolve_prog_type(env->prog), argno))
+		return KF_ARG_PTR_TO_CTX;
+
+	if (is_kfunc_arg_kptr_get(meta, argno)) {
+		if (!btf_type_is_ptr(ref_t)) {
+			verbose(env, "arg#0 BTF type must be a double pointer for kptr_get kfunc\n");
+			return -EINVAL;
+		}
+		ref_t = btf_type_by_id(meta->btf, ref_t->type);
+		ref_tname = btf_name_by_offset(meta->btf, ref_t->name_off);
+		if (!btf_type_is_struct(ref_t)) {
+			verbose(env, "kernel function %s args#0 pointer type %s %s is not supported\n",
+				meta->func_name, btf_type_str(ref_t), ref_tname);
+			return -EINVAL;
+		}
+		return KF_ARG_PTR_TO_KPTR_STRONG;
+	}
+
+	if (is_kfunc_arg_dynptr(meta->btf, &args[argno]))
+		return KF_ARG_PTR_TO_DYNPTR;
+
+	if ((base_type(reg->type) == PTR_TO_BTF_ID || reg2btf_ids[base_type(reg->type)])) {
+		if (!btf_type_is_struct(ref_t)) {
+			verbose(env, "kernel function %s args#%d pointer type %s %s is not supported\n",
+				meta->func_name, argno, btf_type_str(ref_t), ref_tname);
+			return -EINVAL;
+		}
+		return KF_ARG_PTR_TO_BTF_ID;
+	}
+
+	if (argno + 1 < nargs && is_kfunc_arg_mem_size(meta->btf, &args[argno + 1], &regs[regno + 1]))
+		arg_mem_size = true;
+
+	/* This is the catch all argument type of register types supported by
+	 * check_helper_mem_access. However, we only allow when argument type is
+	 * pointer to scalar, or struct composed (recursively) of scalars. When
+	 * arg_mem_size is true, the pointer can be void *.
+	 */
+	if (!btf_type_is_scalar(ref_t) && !__btf_type_is_scalar_struct(env, meta->btf, ref_t, 0) &&
+	    (arg_mem_size ? !btf_type_is_void(ref_t) : 1)) {
+		verbose(env, "arg#%d pointer type %s %s must point to %sscalar, or struct with scalar\n",
+			argno, btf_type_str(ref_t), ref_tname, arg_mem_size ? "void, " : "");
+		return -EINVAL;
+	}
+	return arg_mem_size ? KF_ARG_PTR_TO_MEM_SIZE : KF_ARG_PTR_TO_MEM;
+}
+
+static int process_kf_arg_ptr_to_btf_id(struct bpf_verifier_env *env,
+					struct bpf_reg_state *reg,
+					const struct btf_type *ref_t,
+					const char *ref_tname, u32 ref_id,
+					struct bpf_kfunc_call_arg_meta *meta,
+					int argno)
+{
+	const struct btf_type *reg_ref_t;
+	bool strict_type_match = false;
+	const struct btf *reg_btf;
+	const char *reg_ref_tname;
+	u32 reg_ref_id;
+
+	if (reg->type == PTR_TO_BTF_ID) {
+		reg_btf = reg->btf;
+		reg_ref_id = reg->btf_id;
+	} else {
+		reg_btf = btf_vmlinux;
+		reg_ref_id = *reg2btf_ids[base_type(reg->type)];
+	}
+
+	if (is_kfunc_trusted_args(meta) || (is_kfunc_release(meta) && reg->ref_obj_id))
+		strict_type_match = true;
+
+	reg_ref_t = btf_type_skip_modifiers(reg_btf, reg_ref_id, &reg_ref_id);
+	reg_ref_tname = btf_name_by_offset(reg_btf, reg_ref_t->name_off);
+	if (!btf_struct_ids_match(&env->log, reg_btf, reg_ref_id, reg->off, meta->btf, ref_id, strict_type_match)) {
+		verbose(env, "kernel function %s args#%d expected pointer to %s %s but R%d has a pointer to %s %s\n",
+			meta->func_name, argno, btf_type_str(ref_t), ref_tname, argno + 1,
+			btf_type_str(reg_ref_t), reg_ref_tname);
+		return -EINVAL;
+	}
+	return 0;
+}
+
+static int process_kf_arg_ptr_to_kptr_strong(struct bpf_verifier_env *env,
+					     struct bpf_reg_state *reg,
+					     const struct btf_type *ref_t,
+					     const char *ref_tname,
+					     struct bpf_kfunc_call_arg_meta *meta,
+					     int argno)
+{
+	struct btf_field *kptr_field;
+
+	/* check_func_arg_reg_off allows var_off for
+	 * PTR_TO_MAP_VALUE, but we need fixed offset to find
+	 * off_desc.
+	 */
+	if (!tnum_is_const(reg->var_off)) {
+		verbose(env, "arg#0 must have constant offset\n");
+		return -EINVAL;
+	}
+
+	kptr_field = btf_record_find(reg->map_ptr->record, reg->off + reg->var_off.value, BPF_KPTR);
+	if (!kptr_field || kptr_field->type != BPF_KPTR_REF) {
+		verbose(env, "arg#0 no referenced kptr at map value offset=%llu\n",
+			reg->off + reg->var_off.value);
+		return -EINVAL;
+	}
+
+	if (!btf_struct_ids_match(&env->log, meta->btf, ref_t->type, 0, kptr_field->kptr.btf,
+				  kptr_field->kptr.btf_id, true)) {
+		verbose(env, "kernel function %s args#%d expected pointer to %s %s\n",
+			meta->func_name, argno, btf_type_str(ref_t), ref_tname);
+		return -EINVAL;
+	}
+	return 0;
+}
+
+static int check_kfunc_args(struct bpf_verifier_env *env, struct bpf_kfunc_call_arg_meta *meta)
+{
+	const char *func_name = meta->func_name, *ref_tname;
+	const struct btf *btf = meta->btf;
+	const struct btf_param *args;
+	u32 i, nargs;
+	int ret;
+
+	args = (const struct btf_param *)(meta->func_proto + 1);
+	nargs = btf_type_vlen(meta->func_proto);
+	if (nargs > MAX_BPF_FUNC_REG_ARGS) {
+		verbose(env, "Function %s has %d > %d args\n", func_name, nargs,
+			MAX_BPF_FUNC_REG_ARGS);
+		return -EINVAL;
+	}
+
+	/* Check that BTF function arguments match actual types that the
+	 * verifier sees.
+	 */
+	for (i = 0; i < nargs; i++) {
+		struct bpf_reg_state *regs = cur_regs(env), *reg = &regs[i + 1];
+		const struct btf_type *t, *ref_t, *resolve_ret;
+		enum bpf_arg_type arg_type = ARG_DONTCARE;
+		u32 regno = i + 1, ref_id, type_size;
+		bool is_ret_buf_sz = false;
+		int kf_arg_type;
+
+		t = btf_type_skip_modifiers(btf, args[i].type, NULL);
+		if (btf_type_is_scalar(t)) {
+			if (reg->type != SCALAR_VALUE) {
+				verbose(env, "R%d is not a scalar\n", regno);
+				return -EINVAL;
+			}
+			if (is_kfunc_arg_ret_buf_size(btf, &args[i], reg, "rdonly_buf_size")) {
+				meta->r0_rdonly = true;
+				is_ret_buf_sz = true;
+			} else if (is_kfunc_arg_ret_buf_size(btf, &args[i], reg, "rdwr_buf_size")) {
+				is_ret_buf_sz = true;
+			}
+
+			if (is_ret_buf_sz) {
+				if (meta->r0_size) {
+					verbose(env, "2 or more rdonly/rdwr_buf_size parameters for kfunc");
+					return -EINVAL;
+				}
+
+				if (!tnum_is_const(reg->var_off)) {
+					verbose(env, "R%d is not a const\n", regno);
+					return -EINVAL;
+				}
+
+				meta->r0_size = reg->var_off.value;
+				ret = mark_chain_precision(env, regno);
+				if (ret)
+					return ret;
+			}
+			continue;
+		}
+
+		if (!btf_type_is_ptr(t)) {
+			verbose(env, "Unrecognized arg#%d type %s\n", i, btf_type_str(t));
+			return -EINVAL;
+		}
+
+		if (reg->ref_obj_id) {
+			if (is_kfunc_release(meta) && meta->ref_obj_id) {
+				verbose(env, "verifier internal error: more than one arg with ref_obj_id R%d %u %u\n",
+					regno, reg->ref_obj_id,
+					meta->ref_obj_id);
+				return -EFAULT;
+			}
+			meta->ref_obj_id = reg->ref_obj_id;
+			if (is_kfunc_release(meta))
+				meta->release_regno = regno;
+		}
+
+		ref_t = btf_type_skip_modifiers(btf, t->type, &ref_id);
+		ref_tname = btf_name_by_offset(btf, ref_t->name_off);
+
+		kf_arg_type = get_kfunc_ptr_arg_type(env, meta, t, ref_t, ref_tname, args, i, nargs);
+		if (kf_arg_type < 0)
+			return kf_arg_type;
+
+		switch (kf_arg_type) {
+		case KF_ARG_PTR_TO_BTF_ID:
+			if (!is_kfunc_trusted_args(meta))
+				break;
+			if (!reg->ref_obj_id) {
+				verbose(env, "R%d must be referenced\n", regno);
+				return -EINVAL;
+			}
+			fallthrough;
+		case KF_ARG_PTR_TO_CTX:
+			/* Trusted arguments have the same offset checks as release arguments */
+			arg_type |= OBJ_RELEASE;
+			break;
+		case KF_ARG_PTR_TO_KPTR_STRONG:
+		case KF_ARG_PTR_TO_DYNPTR:
+		case KF_ARG_PTR_TO_MEM:
+		case KF_ARG_PTR_TO_MEM_SIZE:
+			/* Trusted by default */
+			break;
+		default:
+			WARN_ON_ONCE(1);
+			return -EFAULT;
+		}
+
+		if (is_kfunc_release(meta) && reg->ref_obj_id)
+			arg_type |= OBJ_RELEASE;
+		ret = check_func_arg_reg_off(env, reg, regno, arg_type);
+		if (ret < 0)
+			return ret;
+
+		switch (kf_arg_type) {
+		case KF_ARG_PTR_TO_CTX:
+			if (reg->type != PTR_TO_CTX) {
+				verbose(env, "arg#%d expected pointer to ctx, but got %s\n", i, btf_type_str(t));
+				return -EINVAL;
+			}
+			break;
+		case KF_ARG_PTR_TO_KPTR_STRONG:
+			if (reg->type != PTR_TO_MAP_VALUE) {
+				verbose(env, "arg#0 expected pointer to map value\n");
+				return -EINVAL;
+			}
+			ret = process_kf_arg_ptr_to_kptr_strong(env, reg, ref_t, ref_tname, meta, i);
+			if (ret < 0)
+				return ret;
+			break;
+		case KF_ARG_PTR_TO_DYNPTR:
+			if (reg->type != PTR_TO_STACK) {
+				verbose(env, "arg#%d expected pointer to stack\n", i);
+				return -EINVAL;
+			}
+
+			if (!is_dynptr_reg_valid_init(env, reg)) {
+				verbose(env, "arg#%d pointer type %s %s must be valid and initialized\n",
+					i, btf_type_str(ref_t), ref_tname);
+				return -EINVAL;
+			}
+
+			if (!is_dynptr_type_expected(env, reg, ARG_PTR_TO_DYNPTR | DYNPTR_TYPE_LOCAL)) {
+				verbose(env, "arg#%d pointer type %s %s points to unsupported dynamic pointer type\n",
+					i, btf_type_str(ref_t), ref_tname);
+				return -EINVAL;
+			}
+			break;
+		case KF_ARG_PTR_TO_BTF_ID:
+			/* Only base_type is checked, further checks are done here */
+			if (reg->type != PTR_TO_BTF_ID &&
+			    (!reg2btf_ids[base_type(reg->type)] || type_flag(reg->type))) {
+				verbose(env, "arg#%d expected pointer to btf or socket\n", i);
+				return -EINVAL;
+			}
+			ret = process_kf_arg_ptr_to_btf_id(env, reg, ref_t, ref_tname, ref_id, meta, i);
+			if (ret < 0)
+				return ret;
+			break;
+		case KF_ARG_PTR_TO_MEM:
+			resolve_ret = btf_resolve_size(btf, ref_t, &type_size);
+			if (IS_ERR(resolve_ret)) {
+				verbose(env, "arg#%d reference type('%s %s') size cannot be determined: %ld\n",
+					i, btf_type_str(ref_t), ref_tname, PTR_ERR(resolve_ret));
+				return -EINVAL;
+			}
+			ret = check_mem_reg(env, reg, regno, type_size);
+			if (ret < 0)
+				return ret;
+			break;
+		case KF_ARG_PTR_TO_MEM_SIZE:
+			ret = check_kfunc_mem_size_reg(env, &regs[regno + 1], regno + 1);
+			if (ret < 0) {
+				verbose(env, "arg#%d arg#%d memory, len pair leads to invalid memory access\n", i, i + 1);
+				return ret;
+			}
+			/* Skip next '__sz' argument */
+			i++;
+			break;
+		}
+	}
+
+	if (is_kfunc_release(meta) && !meta->release_regno) {
+		verbose(env, "release kernel function %s expects refcounted PTR_TO_BTF_ID\n",
+			func_name);
+		return -EINVAL;
+	}
+
+	return 0;
+}
+
 static int check_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
 			    int *insn_idx_p)
 {
 	const struct btf_type *t, *func, *func_proto, *ptr_type;
 	struct bpf_reg_state *regs = cur_regs(env);
-	struct bpf_kfunc_arg_meta meta = { 0 };
 	const char *func_name, *ptr_type_name;
+	struct bpf_kfunc_call_arg_meta meta;
 	u32 i, nargs, func_id, ptr_type_id;
 	int err, insn_idx = *insn_idx_p;
 	const struct btf_param *args;
 	struct btf *desc_btf;
 	u32 *kfunc_flags;
-	bool acq;
 
 	/* skip for now, but return error when we find this in fixup_kfunc_call */
 	if (!insn->imm)
@@ -7892,24 +8396,34 @@ static int check_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
 			func_name);
 		return -EACCES;
 	}
-	if (*kfunc_flags & KF_DESTRUCTIVE && !capable(CAP_SYS_BOOT)) {
-		verbose(env, "destructive kfunc calls require CAP_SYS_BOOT capabilities\n");
+
+	/* Prepare kfunc call metadata */
+	memset(&meta, 0, sizeof(meta));
+	meta.btf = desc_btf;
+	meta.func_id = func_id;
+	meta.kfunc_flags = *kfunc_flags;
+	meta.func_proto = func_proto;
+	meta.func_name = func_name;
+
+	if (is_kfunc_destructive(&meta) && !capable(CAP_SYS_BOOT)) {
+		verbose(env, "destructive kfunc calls require CAP_SYS_BOOT capability\n");
 		return -EACCES;
 	}
 
-	acq = *kfunc_flags & KF_ACQUIRE;
-
-	meta.flags = *kfunc_flags;
+	if (is_kfunc_sleepable(&meta) && !env->prog->aux->sleepable) {
+		verbose(env, "program must be sleepable to call sleepable kfunc %s\n", func_name);
+		return -EACCES;
+	}
 
 	/* Check the arguments */
-	err = btf_check_kfunc_arg_match(env, desc_btf, func_id, regs, &meta);
+	err = check_kfunc_args(env, &meta);
 	if (err < 0)
 		return err;
 	/* In case of release function, we get register number of refcounted
-	 * PTR_TO_BTF_ID back from btf_check_kfunc_arg_match, do the release now
+	 * PTR_TO_BTF_ID in bpf_kfunc_call_arg_meta, do the release now.
 	 */
-	if (err) {
-		err = release_reference(env, regs[err].ref_obj_id);
+	if (meta.release_regno) {
+		err = release_reference(env, regs[meta.release_regno].ref_obj_id);
 		if (err) {
 			verbose(env, "kfunc %s#%d reference has not been acquired before\n",
 				func_name, func_id);
@@ -7923,7 +8437,7 @@ static int check_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
 	/* Check return type */
 	t = btf_type_skip_modifiers(desc_btf, func_proto->type, NULL);
 
-	if (acq && !btf_type_is_struct_ptr(desc_btf, t)) {
+	if (is_kfunc_acquire(&meta) && !btf_type_is_struct_ptr(meta.btf, t)) {
 		verbose(env, "acquire kernel function does not return PTR_TO_BTF_ID\n");
 		return -EINVAL;
 	}
@@ -7962,20 +8476,23 @@ static int check_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
 			regs[BPF_REG_0].type = PTR_TO_BTF_ID;
 			regs[BPF_REG_0].btf_id = ptr_type_id;
 		}
-		if (*kfunc_flags & KF_RET_NULL) {
+		if (is_kfunc_ret_null(&meta)) {
 			regs[BPF_REG_0].type |= PTR_MAYBE_NULL;
 			/* For mark_ptr_or_null_reg, see 93c230e3f5bd6 */
 			regs[BPF_REG_0].id = ++env->id_gen;
 		}
 		mark_btf_func_reg_size(env, BPF_REG_0, sizeof(void *));
-		if (acq) {
+		if (is_kfunc_acquire(&meta)) {
 			int id = acquire_reference_state(env, insn_idx);
 
 			if (id < 0)
 				return id;
-			regs[BPF_REG_0].id = id;
+			if (is_kfunc_ret_null(&meta))
+				regs[BPF_REG_0].id = id;
 			regs[BPF_REG_0].ref_obj_id = id;
 		}
+		if (reg_may_point_to_spin_lock(&regs[BPF_REG_0]) && !regs[BPF_REG_0].id)
+			regs[BPF_REG_0].id = ++env->id_gen;
 	} /* else { add_kfunc_call() ensures it is btf_type_is_void(t) } */
 
 	nargs = btf_type_vlen(func_proto);
diff --git a/tools/testing/selftests/bpf/prog_tests/kfunc_dynptr_param.c b/tools/testing/selftests/bpf/prog_tests/kfunc_dynptr_param.c
index c210657d4d0a..55d641c1f126 100644
--- a/tools/testing/selftests/bpf/prog_tests/kfunc_dynptr_param.c
+++ b/tools/testing/selftests/bpf/prog_tests/kfunc_dynptr_param.c
@@ -22,7 +22,7 @@ static struct {
 	 "arg#0 pointer type STRUCT bpf_dynptr_kern points to unsupported dynamic pointer type", 0},
 	{"not_valid_dynptr",
 	 "arg#0 pointer type STRUCT bpf_dynptr_kern must be valid and initialized", 0},
-	{"not_ptr_to_stack", "arg#0 pointer type STRUCT bpf_dynptr_kern not to stack", 0},
+	{"not_ptr_to_stack", "arg#0 expected pointer to stack", 0},
 	{"dynptr_data_null", NULL, -EBADMSG},
 };
 
diff --git a/tools/testing/selftests/bpf/verifier/calls.c b/tools/testing/selftests/bpf/verifier/calls.c
index e1a937277b54..86d6fef2e3b4 100644
--- a/tools/testing/selftests/bpf/verifier/calls.c
+++ b/tools/testing/selftests/bpf/verifier/calls.c
@@ -109,7 +109,7 @@
 	},
 	.prog_type = BPF_PROG_TYPE_SCHED_CLS,
 	.result = REJECT,
-	.errstr = "arg#0 pointer type STRUCT prog_test_ref_kfunc must point",
+	.errstr = "arg#0 expected pointer to btf or socket",
 	.fixup_kfunc_btf_id = {
 		{ "bpf_kfunc_call_test_acquire", 3 },
 		{ "bpf_kfunc_call_test_release", 5 },
diff --git a/tools/testing/selftests/bpf/verifier/ref_tracking.c b/tools/testing/selftests/bpf/verifier/ref_tracking.c
index fd683a32a276..55cba01c99d5 100644
--- a/tools/testing/selftests/bpf/verifier/ref_tracking.c
+++ b/tools/testing/selftests/bpf/verifier/ref_tracking.c
@@ -142,7 +142,7 @@
 	.kfunc = "bpf",
 	.expected_attach_type = BPF_LSM_MAC,
 	.flags = BPF_F_SLEEPABLE,
-	.errstr = "arg#0 pointer type STRUCT bpf_key must point to scalar, or struct with scalar",
+	.errstr = "arg#0 expected pointer to btf or socket",
 	.fixup_kfunc_btf_id = {
 		{ "bpf_lookup_user_key", 2 },
 		{ "bpf_key_put", 4 },
@@ -163,7 +163,7 @@
 	.kfunc = "bpf",
 	.expected_attach_type = BPF_LSM_MAC,
 	.flags = BPF_F_SLEEPABLE,
-	.errstr = "arg#0 pointer type STRUCT bpf_key must point to scalar, or struct with scalar",
+	.errstr = "arg#0 expected pointer to btf or socket",
 	.fixup_kfunc_btf_id = {
 		{ "bpf_lookup_system_key", 1 },
 		{ "bpf_key_put", 3 },
-- 
2.38.1



* [PATCH bpf-next v7 15/26] bpf: Drop kfunc bits from btf_check_func_arg_match
  2022-11-14 19:15 [PATCH bpf-next v7 00/26] Allocated objects, BPF linked lists Kumar Kartikeya Dwivedi
                   ` (13 preceding siblings ...)
  2022-11-14 19:15 ` [PATCH bpf-next v7 14/26] bpf: Rewrite kfunc argument handling Kumar Kartikeya Dwivedi
@ 2022-11-14 19:15 ` Kumar Kartikeya Dwivedi
  2022-11-14 19:15 ` [PATCH bpf-next v7 16/26] bpf: Support constant scalar arguments for kfuncs Kumar Kartikeya Dwivedi
                   ` (11 subsequent siblings)
  26 siblings, 0 replies; 48+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2022-11-14 19:15 UTC (permalink / raw)
  To: bpf
  Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Martin KaFai Lau, Dave Marchevsky

Remove all kfunc related bits from btf_check_func_arg_match now that its
users have been converted to the refactored kfunc argument handling.

This is split into a separate commit to aid review, so that what has
been preserved from the removed bits can be compared easily, instead of
mixing removed hunks with the previous patch.
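
After this change the function only serves BPF subprog argument checks;
its reduced signature, as seen in the diff below, becomes:

static int btf_check_func_arg_match(struct bpf_verifier_env *env,
				    const struct btf *btf, u32 func_id,
				    struct bpf_reg_state *regs,
				    bool ptr_to_mem_ok,
				    bool processing_call);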

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
---
 include/linux/bpf.h          |  11 --
 include/linux/bpf_verifier.h |   2 -
 kernel/bpf/btf.c             | 364 +----------------------------------
 kernel/bpf/verifier.c        |   4 +-
 4 files changed, 10 insertions(+), 371 deletions(-)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index c88f75a68893..62a16b699e71 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -2108,22 +2108,11 @@ int btf_distill_func_proto(struct bpf_verifier_log *log,
 			   const char *func_name,
 			   struct btf_func_model *m);
 
-struct bpf_kfunc_arg_meta {
-	u64 r0_size;
-	bool r0_rdonly;
-	int ref_obj_id;
-	u32 flags;
-};
-
 struct bpf_reg_state;
 int btf_check_subprog_arg_match(struct bpf_verifier_env *env, int subprog,
 				struct bpf_reg_state *regs);
 int btf_check_subprog_call(struct bpf_verifier_env *env, int subprog,
 			   struct bpf_reg_state *regs);
-int btf_check_kfunc_arg_match(struct bpf_verifier_env *env,
-			      const struct btf *btf, u32 func_id,
-			      struct bpf_reg_state *regs,
-			      struct bpf_kfunc_arg_meta *meta);
 int btf_prepare_func_args(struct bpf_verifier_env *env, int subprog,
 			  struct bpf_reg_state *reg);
 int btf_check_type_match(struct bpf_verifier_log *log, const struct bpf_prog *prog,
diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h
index fa738abea267..887fa4d922f6 100644
--- a/include/linux/bpf_verifier.h
+++ b/include/linux/bpf_verifier.h
@@ -597,8 +597,6 @@ int check_ptr_off_reg(struct bpf_verifier_env *env,
 int check_func_arg_reg_off(struct bpf_verifier_env *env,
 			   const struct bpf_reg_state *reg, int regno,
 			   enum bpf_arg_type arg_type);
-int check_kfunc_mem_size_reg(struct bpf_verifier_env *env, struct bpf_reg_state *reg,
-			     u32 regno);
 int check_mem_reg(struct bpf_verifier_env *env, struct bpf_reg_state *reg,
 		   u32 regno, u32 mem_size);
 bool is_dynptr_reg_valid_init(struct bpf_verifier_env *env,
diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c
index ff8b46c209dd..6ea8d0cf81f7 100644
--- a/kernel/bpf/btf.c
+++ b/kernel/bpf/btf.c
@@ -6596,122 +6596,19 @@ int btf_check_type_match(struct bpf_verifier_log *log, const struct bpf_prog *pr
 	return btf_check_func_type_match(log, btf1, t1, btf2, t2);
 }
 
-static u32 *reg2btf_ids[__BPF_REG_TYPE_MAX] = {
-#ifdef CONFIG_NET
-	[PTR_TO_SOCKET] = &btf_sock_ids[BTF_SOCK_TYPE_SOCK],
-	[PTR_TO_SOCK_COMMON] = &btf_sock_ids[BTF_SOCK_TYPE_SOCK_COMMON],
-	[PTR_TO_TCP_SOCK] = &btf_sock_ids[BTF_SOCK_TYPE_TCP],
-#endif
-};
-
-/* Returns true if struct is composed of scalars, 4 levels of nesting allowed */
-static bool __btf_type_is_scalar_struct(struct bpf_verifier_log *log,
-					const struct btf *btf,
-					const struct btf_type *t, int rec)
-{
-	const struct btf_type *member_type;
-	const struct btf_member *member;
-	u32 i;
-
-	if (!btf_type_is_struct(t))
-		return false;
-
-	for_each_member(i, t, member) {
-		const struct btf_array *array;
-
-		member_type = btf_type_skip_modifiers(btf, member->type, NULL);
-		if (btf_type_is_struct(member_type)) {
-			if (rec >= 3) {
-				bpf_log(log, "max struct nesting depth exceeded\n");
-				return false;
-			}
-			if (!__btf_type_is_scalar_struct(log, btf, member_type, rec + 1))
-				return false;
-			continue;
-		}
-		if (btf_type_is_array(member_type)) {
-			array = btf_type_array(member_type);
-			if (!array->nelems)
-				return false;
-			member_type = btf_type_skip_modifiers(btf, array->type, NULL);
-			if (!btf_type_is_scalar(member_type))
-				return false;
-			continue;
-		}
-		if (!btf_type_is_scalar(member_type))
-			return false;
-	}
-	return true;
-}
-
-static bool is_kfunc_arg_mem_size(const struct btf *btf,
-				  const struct btf_param *arg,
-				  const struct bpf_reg_state *reg)
-{
-	int len, sfx_len = sizeof("__sz") - 1;
-	const struct btf_type *t;
-	const char *param_name;
-
-	t = btf_type_skip_modifiers(btf, arg->type, NULL);
-	if (!btf_type_is_scalar(t) || reg->type != SCALAR_VALUE)
-		return false;
-
-	/* In the future, this can be ported to use BTF tagging */
-	param_name = btf_name_by_offset(btf, arg->name_off);
-	if (str_is_empty(param_name))
-		return false;
-	len = strlen(param_name);
-	if (len < sfx_len)
-		return false;
-	param_name += len - sfx_len;
-	if (strncmp(param_name, "__sz", sfx_len))
-		return false;
-
-	return true;
-}
-
-static bool btf_is_kfunc_arg_mem_size(const struct btf *btf,
-				      const struct btf_param *arg,
-				      const struct bpf_reg_state *reg,
-				      const char *name)
-{
-	int len, target_len = strlen(name);
-	const struct btf_type *t;
-	const char *param_name;
-
-	t = btf_type_skip_modifiers(btf, arg->type, NULL);
-	if (!btf_type_is_scalar(t) || reg->type != SCALAR_VALUE)
-		return false;
-
-	param_name = btf_name_by_offset(btf, arg->name_off);
-	if (str_is_empty(param_name))
-		return false;
-	len = strlen(param_name);
-	if (len != target_len)
-		return false;
-	if (strcmp(param_name, name))
-		return false;
-
-	return true;
-}
-
 static int btf_check_func_arg_match(struct bpf_verifier_env *env,
 				    const struct btf *btf, u32 func_id,
 				    struct bpf_reg_state *regs,
 				    bool ptr_to_mem_ok,
-				    struct bpf_kfunc_arg_meta *kfunc_meta,
 				    bool processing_call)
 {
 	enum bpf_prog_type prog_type = resolve_prog_type(env->prog);
-	bool rel = false, kptr_get = false, trusted_args = false;
-	bool sleepable = false;
 	struct bpf_verifier_log *log = &env->log;
-	u32 i, nargs, ref_id, ref_obj_id = 0;
-	bool is_kfunc = btf_is_kernel(btf);
 	const char *func_name, *ref_tname;
 	const struct btf_type *t, *ref_t;
 	const struct btf_param *args;
-	int ref_regno = 0, ret;
+	u32 i, nargs, ref_id;
+	int ret;
 
 	t = btf_type_by_id(btf, func_id);
 	if (!t || !btf_type_is_func(t)) {
@@ -6737,14 +6634,6 @@ static int btf_check_func_arg_match(struct bpf_verifier_env *env,
 		return -EINVAL;
 	}
 
-	if (is_kfunc && kfunc_meta) {
-		/* Only kfunc can be release func */
-		rel = kfunc_meta->flags & KF_RELEASE;
-		kptr_get = kfunc_meta->flags & KF_KPTR_GET;
-		trusted_args = kfunc_meta->flags & KF_TRUSTED_ARGS;
-		sleepable = kfunc_meta->flags & KF_SLEEPABLE;
-	}
-
 	/* check that BTF function arguments match actual types that the
 	 * verifier sees.
 	 */
@@ -6752,42 +6641,9 @@ static int btf_check_func_arg_match(struct bpf_verifier_env *env,
 		enum bpf_arg_type arg_type = ARG_DONTCARE;
 		u32 regno = i + 1;
 		struct bpf_reg_state *reg = &regs[regno];
-		bool obj_ptr = false;
 
 		t = btf_type_skip_modifiers(btf, args[i].type, NULL);
 		if (btf_type_is_scalar(t)) {
-			if (is_kfunc && kfunc_meta) {
-				bool is_buf_size = false;
-
-				/* check for any const scalar parameter of name "rdonly_buf_size"
-				 * or "rdwr_buf_size"
-				 */
-				if (btf_is_kfunc_arg_mem_size(btf, &args[i], reg,
-							      "rdonly_buf_size")) {
-					kfunc_meta->r0_rdonly = true;
-					is_buf_size = true;
-				} else if (btf_is_kfunc_arg_mem_size(btf, &args[i], reg,
-								     "rdwr_buf_size"))
-					is_buf_size = true;
-
-				if (is_buf_size) {
-					if (kfunc_meta->r0_size) {
-						bpf_log(log, "2 or more rdonly/rdwr_buf_size parameters for kfunc");
-						return -EINVAL;
-					}
-
-					if (!tnum_is_const(reg->var_off)) {
-						bpf_log(log, "R%d is not a const\n", regno);
-						return -EINVAL;
-					}
-
-					kfunc_meta->r0_size = reg->var_off.value;
-					ret = mark_chain_precision(env, regno);
-					if (ret)
-						return ret;
-				}
-			}
-
 			if (reg->type == SCALAR_VALUE)
 				continue;
 			bpf_log(log, "R%d is not a scalar\n", regno);
@@ -6800,88 +6656,14 @@ static int btf_check_func_arg_match(struct bpf_verifier_env *env,
 			return -EINVAL;
 		}
 
-		/* These register types have special constraints wrt ref_obj_id
-		 * and offset checks. The rest of trusted args don't.
-		 */
-		obj_ptr = reg->type == PTR_TO_CTX || reg->type == PTR_TO_BTF_ID ||
-			  reg2btf_ids[base_type(reg->type)];
-
-		/* Check if argument must be a referenced pointer, args + i has
-		 * been verified to be a pointer (after skipping modifiers).
-		 * PTR_TO_CTX is ok without having non-zero ref_obj_id.
-		 */
-		if (is_kfunc && trusted_args && (obj_ptr && reg->type != PTR_TO_CTX) && !reg->ref_obj_id) {
-			bpf_log(log, "R%d must be referenced\n", regno);
-			return -EINVAL;
-		}
-
 		ref_t = btf_type_skip_modifiers(btf, t->type, &ref_id);
 		ref_tname = btf_name_by_offset(btf, ref_t->name_off);
 
-		/* Trusted args have the same offset checks as release arguments */
-		if ((trusted_args && obj_ptr) || (rel && reg->ref_obj_id))
-			arg_type |= OBJ_RELEASE;
 		ret = check_func_arg_reg_off(env, reg, regno, arg_type);
 		if (ret < 0)
 			return ret;
 
-		if (is_kfunc && reg->ref_obj_id) {
-			/* Ensure only one argument is referenced PTR_TO_BTF_ID */
-			if (ref_obj_id) {
-				bpf_log(log, "verifier internal error: more than one arg with ref_obj_id R%d %u %u\n",
-					regno, reg->ref_obj_id, ref_obj_id);
-				return -EFAULT;
-			}
-			ref_regno = regno;
-			ref_obj_id = reg->ref_obj_id;
-		}
-
-		/* kptr_get is only true for kfunc */
-		if (i == 0 && kptr_get) {
-			struct btf_field *kptr_field;
-
-			if (reg->type != PTR_TO_MAP_VALUE) {
-				bpf_log(log, "arg#0 expected pointer to map value\n");
-				return -EINVAL;
-			}
-
-			/* check_func_arg_reg_off allows var_off for
-			 * PTR_TO_MAP_VALUE, but we need fixed offset to find
-			 * off_desc.
-			 */
-			if (!tnum_is_const(reg->var_off)) {
-				bpf_log(log, "arg#0 must have constant offset\n");
-				return -EINVAL;
-			}
-
-			kptr_field = btf_record_find(reg->map_ptr->record, reg->off + reg->var_off.value, BPF_KPTR);
-			if (!kptr_field || kptr_field->type != BPF_KPTR_REF) {
-				bpf_log(log, "arg#0 no referenced kptr at map value offset=%llu\n",
-					reg->off + reg->var_off.value);
-				return -EINVAL;
-			}
-
-			if (!btf_type_is_ptr(ref_t)) {
-				bpf_log(log, "arg#0 BTF type must be a double pointer\n");
-				return -EINVAL;
-			}
-
-			ref_t = btf_type_skip_modifiers(btf, ref_t->type, &ref_id);
-			ref_tname = btf_name_by_offset(btf, ref_t->name_off);
-
-			if (!btf_type_is_struct(ref_t)) {
-				bpf_log(log, "kernel function %s args#%d pointer type %s %s is not supported\n",
-					func_name, i, btf_type_str(ref_t), ref_tname);
-				return -EINVAL;
-			}
-			if (!btf_struct_ids_match(log, btf, ref_id, 0, kptr_field->kptr.btf,
-						  kptr_field->kptr.btf_id, true)) {
-				bpf_log(log, "kernel function %s args#%d expected pointer to %s %s\n",
-					func_name, i, btf_type_str(ref_t), ref_tname);
-				return -EINVAL;
-			}
-			/* rest of the arguments can be anything, like normal kfunc */
-		} else if (btf_get_prog_ctx_type(log, btf, t, prog_type, i)) {
+		if (btf_get_prog_ctx_type(log, btf, t, prog_type, i)) {
 			/* If function expects ctx type in BTF check that caller
 			 * is passing PTR_TO_CTX.
 			 */
@@ -6891,109 +6673,10 @@ static int btf_check_func_arg_match(struct bpf_verifier_env *env,
 					i, btf_type_str(t));
 				return -EINVAL;
 			}
-		} else if (is_kfunc && (reg->type == PTR_TO_BTF_ID ||
-			   (reg2btf_ids[base_type(reg->type)] && !type_flag(reg->type)))) {
-			const struct btf_type *reg_ref_t;
-			const struct btf *reg_btf;
-			const char *reg_ref_tname;
-			u32 reg_ref_id;
-
-			if (!btf_type_is_struct(ref_t)) {
-				bpf_log(log, "kernel function %s args#%d pointer type %s %s is not supported\n",
-					func_name, i, btf_type_str(ref_t),
-					ref_tname);
-				return -EINVAL;
-			}
-
-			if (reg->type == PTR_TO_BTF_ID) {
-				reg_btf = reg->btf;
-				reg_ref_id = reg->btf_id;
-			} else {
-				reg_btf = btf_vmlinux;
-				reg_ref_id = *reg2btf_ids[base_type(reg->type)];
-			}
-
-			reg_ref_t = btf_type_skip_modifiers(reg_btf, reg_ref_id,
-							    &reg_ref_id);
-			reg_ref_tname = btf_name_by_offset(reg_btf,
-							   reg_ref_t->name_off);
-			if (!btf_struct_ids_match(log, reg_btf, reg_ref_id,
-						  reg->off, btf, ref_id,
-						  trusted_args || (rel && reg->ref_obj_id))) {
-				bpf_log(log, "kernel function %s args#%d expected pointer to %s %s but R%d has a pointer to %s %s\n",
-					func_name, i,
-					btf_type_str(ref_t), ref_tname,
-					regno, btf_type_str(reg_ref_t),
-					reg_ref_tname);
-				return -EINVAL;
-			}
 		} else if (ptr_to_mem_ok && processing_call) {
 			const struct btf_type *resolve_ret;
 			u32 type_size;
 
-			if (is_kfunc) {
-				bool arg_mem_size = i + 1 < nargs && is_kfunc_arg_mem_size(btf, &args[i + 1], &regs[regno + 1]);
-				bool arg_dynptr = btf_type_is_struct(ref_t) &&
-						  !strcmp(ref_tname,
-							  stringify_struct(bpf_dynptr_kern));
-
-				/* Permit pointer to mem, but only when argument
-				 * type is pointer to scalar, or struct composed
-				 * (recursively) of scalars.
-				 * When arg_mem_size is true, the pointer can be
-				 * void *.
-				 * Also permit initialized local dynamic pointers.
-				 */
-				if (!btf_type_is_scalar(ref_t) &&
-				    !__btf_type_is_scalar_struct(log, btf, ref_t, 0) &&
-				    !arg_dynptr &&
-				    (arg_mem_size ? !btf_type_is_void(ref_t) : 1)) {
-					bpf_log(log,
-						"arg#%d pointer type %s %s must point to %sscalar, or struct with scalar\n",
-						i, btf_type_str(ref_t), ref_tname, arg_mem_size ? "void, " : "");
-					return -EINVAL;
-				}
-
-				if (arg_dynptr) {
-					if (reg->type != PTR_TO_STACK) {
-						bpf_log(log, "arg#%d pointer type %s %s not to stack\n",
-							i, btf_type_str(ref_t),
-							ref_tname);
-						return -EINVAL;
-					}
-
-					if (!is_dynptr_reg_valid_init(env, reg)) {
-						bpf_log(log,
-							"arg#%d pointer type %s %s must be valid and initialized\n",
-							i, btf_type_str(ref_t),
-							ref_tname);
-						return -EINVAL;
-					}
-
-					if (!is_dynptr_type_expected(env, reg,
-							ARG_PTR_TO_DYNPTR | DYNPTR_TYPE_LOCAL)) {
-						bpf_log(log,
-							"arg#%d pointer type %s %s points to unsupported dynamic pointer type\n",
-							i, btf_type_str(ref_t),
-							ref_tname);
-						return -EINVAL;
-					}
-
-					continue;
-				}
-
-				/* Check for mem, len pair */
-				if (arg_mem_size) {
-					if (check_kfunc_mem_size_reg(env, &regs[regno + 1], regno + 1)) {
-						bpf_log(log, "arg#%d arg#%d memory, len pair leads to invalid memory access\n",
-							i, i + 1);
-						return -EINVAL;
-					}
-					i++;
-					continue;
-				}
-			}
-
 			resolve_ret = btf_resolve_size(btf, ref_t, &type_size);
 			if (IS_ERR(resolve_ret)) {
 				bpf_log(log,
@@ -7006,36 +6689,13 @@ static int btf_check_func_arg_match(struct bpf_verifier_env *env,
 			if (check_mem_reg(env, reg, regno, type_size))
 				return -EINVAL;
 		} else {
-			bpf_log(log, "reg type unsupported for arg#%d %sfunction %s#%d\n", i,
-				is_kfunc ? "kernel " : "", func_name, func_id);
+			bpf_log(log, "reg type unsupported for arg#%d function %s#%d\n", i,
+				func_name, func_id);
 			return -EINVAL;
 		}
 	}
 
-	/* Either both are set, or neither */
-	WARN_ON_ONCE((ref_obj_id && !ref_regno) || (!ref_obj_id && ref_regno));
-	/* We already made sure ref_obj_id is set only for one argument. We do
-	 * allow (!rel && ref_obj_id), so that passing such referenced
-	 * PTR_TO_BTF_ID to other kfuncs works. Note that rel is only true when
-	 * is_kfunc is true.
-	 */
-	if (rel && !ref_obj_id) {
-		bpf_log(log, "release kernel function %s expects refcounted PTR_TO_BTF_ID\n",
-			func_name);
-		return -EINVAL;
-	}
-
-	if (sleepable && !env->prog->aux->sleepable) {
-		bpf_log(log, "kernel function %s is sleepable but the program is not\n",
-			func_name);
-		return -EINVAL;
-	}
-
-	if (kfunc_meta && ref_obj_id)
-		kfunc_meta->ref_obj_id = ref_obj_id;
-
-	/* returns argument register number > 0 in case of reference release kfunc */
-	return rel ? ref_regno : 0;
+	return 0;
 }
 
 /* Compare BTF of a function declaration with given bpf_reg_state.
@@ -7065,7 +6725,7 @@ int btf_check_subprog_arg_match(struct bpf_verifier_env *env, int subprog,
 		return -EINVAL;
 
 	is_global = prog->aux->func_info_aux[subprog].linkage == BTF_FUNC_GLOBAL;
-	err = btf_check_func_arg_match(env, btf, btf_id, regs, is_global, NULL, false);
+	err = btf_check_func_arg_match(env, btf, btf_id, regs, is_global, false);
 
 	/* Compiler optimizations can remove arguments from static functions
 	 * or mismatched type can be passed into a global function.
@@ -7108,7 +6768,7 @@ int btf_check_subprog_call(struct bpf_verifier_env *env, int subprog,
 		return -EINVAL;
 
 	is_global = prog->aux->func_info_aux[subprog].linkage == BTF_FUNC_GLOBAL;
-	err = btf_check_func_arg_match(env, btf, btf_id, regs, is_global, NULL, true);
+	err = btf_check_func_arg_match(env, btf, btf_id, regs, is_global, true);
 
 	/* Compiler optimizations can remove arguments from static functions
 	 * or mismatched type can be passed into a global function.
@@ -7119,14 +6779,6 @@ int btf_check_subprog_call(struct bpf_verifier_env *env, int subprog,
 	return err;
 }
 
-int btf_check_kfunc_arg_match(struct bpf_verifier_env *env,
-			      const struct btf *btf, u32 func_id,
-			      struct bpf_reg_state *regs,
-			      struct bpf_kfunc_arg_meta *meta)
-{
-	return btf_check_func_arg_match(env, btf, func_id, regs, true, meta, true);
-}
-
 /* Convert BTF of a function into bpf_reg_state if possible
  * Returns:
  * EFAULT - there is a verifier bug. Abort verification.
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index ddb7ac1cb529..d95b6cc63e38 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -5544,8 +5544,8 @@ int check_mem_reg(struct bpf_verifier_env *env, struct bpf_reg_state *reg,
 	return err;
 }
 
-int check_kfunc_mem_size_reg(struct bpf_verifier_env *env, struct bpf_reg_state *reg,
-			     u32 regno)
+static int check_kfunc_mem_size_reg(struct bpf_verifier_env *env, struct bpf_reg_state *reg,
+				    u32 regno)
 {
 	struct bpf_reg_state *mem_reg = &cur_regs(env)[regno - 1];
 	bool may_be_null = type_may_be_null(mem_reg->type);
-- 
2.38.1



* [PATCH bpf-next v7 16/26] bpf: Support constant scalar arguments for kfuncs
  2022-11-14 19:15 [PATCH bpf-next v7 00/26] Allocated objects, BPF linked lists Kumar Kartikeya Dwivedi
                   ` (14 preceding siblings ...)
  2022-11-14 19:15 ` [PATCH bpf-next v7 15/26] bpf: Drop kfunc bits from btf_check_func_arg_match Kumar Kartikeya Dwivedi
@ 2022-11-14 19:15 ` Kumar Kartikeya Dwivedi
  2022-11-14 19:15 ` [PATCH bpf-next v7 17/26] bpf: Introduce bpf_obj_new Kumar Kartikeya Dwivedi
                   ` (10 subsequent siblings)
  26 siblings, 0 replies; 48+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2022-11-14 19:15 UTC (permalink / raw)
  To: bpf
  Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Martin KaFai Lau, Dave Marchevsky

Allow passing known constant scalars as arguments to kfuncs where they
do not represent a size parameter. This makes the verifier's search
pruning optimization more conservative for such kfunc calls: verifier
states that pass distinct constant values to such an argument are not
considered equivalent.

We will then use this support to expose a bpf_obj_new function which
takes the local type ID of a type in program BTF, returns a
PTR_TO_BTF_ID | MEM_ALLOC pointer to the local type, and allows
programs to allocate their own objects.
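
As a sketch of the intended usage (mirroring the documentation example
added below; the allocator itself arrives in a later patch):

/* The trailing __k suffix on the parameter name tells the verifier
 * that this scalar must be a known constant rather than a memory size,
 * and that states reaching this call with different constant values
 * must not be pruned as equivalent.
 */
void *bpf_obj_new(u32 local_type_id__k, ...);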

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
---
 Documentation/bpf/kfuncs.rst | 22 ++++++++++++++
 kernel/bpf/verifier.c        | 59 +++++++++++++++++++++++++++---------
 2 files changed, 67 insertions(+), 14 deletions(-)

diff --git a/Documentation/bpf/kfuncs.rst b/Documentation/bpf/kfuncs.rst
index 0f858156371d..8fa9c052417f 100644
--- a/Documentation/bpf/kfuncs.rst
+++ b/Documentation/bpf/kfuncs.rst
@@ -72,6 +72,28 @@ argument as its size. By default, without __sz annotation, the size of the type
 of the pointer is used. Without __sz annotation, a kfunc cannot accept a void
 pointer.
 
+2.2.1 __k Annotation
+--------------------
+
+This annotation is only understood for scalar arguments, where it indicates that
+the verifier must check that the scalar argument is a known constant which does
+not represent a size parameter.
+
+An example is given below::
+
+        void *bpf_obj_new(u32 local_type_id__k, ...)
+        {
+        ...
+        }
+
+Here, bpf_obj_new uses the local_type_id argument to find out the size of that
+type ID in the program's BTF and returns a sized pointer to it. Each type ID
+has a distinct size, hence it is crucial to treat each such call as distinct
+when the values don't match.
+
+Hence, whenever a kfunc accepts a constant scalar argument that is not a size
+parameter, the __k suffix should be used.
+
 .. _BPF_kfunc_nodef:
 
 2.3 Using an existing kernel function
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index d95b6cc63e38..a4a1424b19a5 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -7871,6 +7871,10 @@ struct bpf_kfunc_call_arg_meta {
 	u8 release_regno;
 	bool r0_rdonly;
 	u64 r0_size;
+	struct {
+		u64 value;
+		bool found;
+	} arg_constant;
 };
 
 static bool is_kfunc_acquire(struct bpf_kfunc_call_arg_meta *meta)
@@ -7908,30 +7912,40 @@ static bool is_kfunc_arg_kptr_get(struct bpf_kfunc_call_arg_meta *meta, int arg)
 	return arg == 0 && (meta->kfunc_flags & KF_KPTR_GET);
 }
 
-static bool is_kfunc_arg_mem_size(const struct btf *btf,
-				  const struct btf_param *arg,
-				  const struct bpf_reg_state *reg)
+static bool __kfunc_param_match_suffix(const struct btf *btf,
+				       const struct btf_param *arg,
+				       const char *suffix)
 {
-	int len, sfx_len = sizeof("__sz") - 1;
-	const struct btf_type *t;
+	int suffix_len = strlen(suffix), len;
 	const char *param_name;
 
-	t = btf_type_skip_modifiers(btf, arg->type, NULL);
-	if (!btf_type_is_scalar(t) || reg->type != SCALAR_VALUE)
-		return false;
-
 	/* In the future, this can be ported to use BTF tagging */
 	param_name = btf_name_by_offset(btf, arg->name_off);
 	if (str_is_empty(param_name))
 		return false;
 	len = strlen(param_name);
-	if (len < sfx_len)
+	if (len < suffix_len)
 		return false;
-	param_name += len - sfx_len;
-	if (strncmp(param_name, "__sz", sfx_len))
+	param_name += len - suffix_len;
+	return !strncmp(param_name, suffix, suffix_len);
+}
+
+static bool is_kfunc_arg_mem_size(const struct btf *btf,
+				  const struct btf_param *arg,
+				  const struct bpf_reg_state *reg)
+{
+	const struct btf_type *t;
+
+	t = btf_type_skip_modifiers(btf, arg->type, NULL);
+	if (!btf_type_is_scalar(t) || reg->type != SCALAR_VALUE)
 		return false;
 
-	return true;
+	return __kfunc_param_match_suffix(btf, arg, "__sz");
+}
+
+static bool is_kfunc_arg_sfx_constant(const struct btf *btf, const struct btf_param *arg)
+{
+	return __kfunc_param_match_suffix(btf, arg, "__k");
 }
 
 static bool is_kfunc_arg_ret_buf_size(const struct btf *btf,
@@ -8207,7 +8221,24 @@ static int check_kfunc_args(struct bpf_verifier_env *env, struct bpf_kfunc_call_
 				verbose(env, "R%d is not a scalar\n", regno);
 				return -EINVAL;
 			}
-			if (is_kfunc_arg_ret_buf_size(btf, &args[i], reg, "rdonly_buf_size")) {
+			if (is_kfunc_arg_sfx_constant(meta->btf, &args[i])) {
+				/* kfunc is already bpf_capable() only, no need
+				 * to check it here.
+				 */
+				if (meta->arg_constant.found) {
+					verbose(env, "verifier internal error: only one constant argument permitted\n");
+					return -EFAULT;
+				}
+				if (!tnum_is_const(reg->var_off)) {
+					verbose(env, "R%d must be a known constant\n", regno);
+					return -EINVAL;
+				}
+				ret = mark_chain_precision(env, regno);
+				if (ret < 0)
+					return ret;
+				meta->arg_constant.found = true;
+				meta->arg_constant.value = reg->var_off.value;
+			} else if (is_kfunc_arg_ret_buf_size(btf, &args[i], reg, "rdonly_buf_size")) {
 				meta->r0_rdonly = true;
 				is_ret_buf_sz = true;
 			} else if (is_kfunc_arg_ret_buf_size(btf, &args[i], reg, "rdwr_buf_size")) {
-- 
2.38.1



* [PATCH bpf-next v7 17/26] bpf: Introduce bpf_obj_new
  2022-11-14 19:15 [PATCH bpf-next v7 00/26] Allocated objects, BPF linked lists Kumar Kartikeya Dwivedi
                   ` (15 preceding siblings ...)
  2022-11-14 19:15 ` [PATCH bpf-next v7 16/26] bpf: Support constant scalar arguments for kfuncs Kumar Kartikeya Dwivedi
@ 2022-11-14 19:15 ` Kumar Kartikeya Dwivedi
  2022-11-15  6:19   ` Alexei Starovoitov
  2022-11-14 19:15 ` [PATCH bpf-next v7 18/26] bpf: Introduce bpf_obj_drop Kumar Kartikeya Dwivedi
                   ` (9 subsequent siblings)
  26 siblings, 1 reply; 48+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2022-11-14 19:15 UTC (permalink / raw)
  To: bpf
  Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Martin KaFai Lau, Dave Marchevsky

Introduce the type safe memory allocator bpf_obj_new for BPF programs.
The kernel side kfunc is named bpf_obj_new_impl, as passing hidden
arguments to kfuncs still requires having them in the prototype, unlike
BPF helpers, which always take 5 arguments and have them checked using
bpf_func_proto in the verifier, ignoring unset argument types.

Introduce the __ign suffix to ignore a specific kfunc argument during
type checks, then use this to introduce support for passing type
metadata to the bpf_obj_new_impl kfunc.

The user passes the BTF ID of the type it wants to allocate in program
BTF; the verifier then rewrites the first argument as the size of this
type, after performing some sanity checks (to ensure the type exists
and is a struct type).

The second argument is also fixed up and passed by the verifier. This is
the btf_struct_meta for the type being allocated. It is needed mostly
for the offset array, which is required for zero initializing special
fields while leaving the rest of the storage in an uninitialized state.

It will also be needed in the next patch to perform proper destruction
of the object's special fields.

Under the hood, bpf_obj_new and its upcoming free counterpart will call
bpf_mem_alloc and bpf_mem_free, using the recently introduced any
context BPF memory allocator. To this end, a global instance of the BPF
memory allocator is initialized on boot. This 'bpf_global_ma' serves
all allocations for bpf_obj_new. In the future, bpf_obj_new variants
will allow specifying a custom allocator.

Note that now that bpf_obj_new can be used to allocate objects that can
be linked into a BPF linked list (when future linked list helpers are
available), we also need to free the elements using bpf_mem_free.
However, since the draining of elements is done outside the
bpf_spin_lock, we need to wrap the call in migrate_disable and
migrate_enable, since bpf_list_head_free can be called from the map
free path, where migration is enabled. When called from BPF programs,
migration is already disabled.

A convenience macro is included in the bpf_experimental.h header to
hide the ugly details of the implementation, leading to user code that
looks similar to a language level extension which allocates and
constructs fields of a user type.

struct bar {
	struct bpf_list_node node;
};

struct foo {
	struct bpf_spin_lock lock;
	struct bpf_list_head head __contains(bar, node);
};

void prog(void) {
	struct foo *f;

	f = bpf_obj_new(typeof(*f));
	if (!f)
		return;
	...
}
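
The macro hiding these details could look roughly like the following
sketch (the actual definition lives in the bpf_experimental.h header
added by this patch; bpf_core_type_id_local supplies the constant type
ID the verifier expects):

/* Pass the local BTF type ID as the __k constant argument, and NULL
 * for the hidden __ign metadata argument, which the verifier fixes up
 * with the btf_struct_meta pointer.
 */
#define bpf_obj_new(type) \
	((type *)bpf_obj_new_impl(bpf_core_type_id_local(type), NULL))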

A key piece of this story is still missing, i.e. the free function,
which will come in the next patch.

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
---
 include/linux/bpf.h                           |  21 ++--
 include/linux/bpf_verifier.h                  |   2 +
 kernel/bpf/core.c                             |  16 +++
 kernel/bpf/helpers.c                          |  47 ++++++--
 kernel/bpf/verifier.c                         | 107 ++++++++++++++++--
 .../testing/selftests/bpf/bpf_experimental.h  |  25 ++++
 6 files changed, 195 insertions(+), 23 deletions(-)
 create mode 100644 tools/testing/selftests/bpf/bpf_experimental.h

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 62a16b699e71..4635e31bd6fc 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -54,6 +54,8 @@ struct cgroup;
 extern struct idr btf_idr;
 extern spinlock_t btf_idr_lock;
 extern struct kobject *btf_kobj;
+extern struct bpf_mem_alloc bpf_global_ma;
+extern bool bpf_global_ma_set;
 
 typedef u64 (*bpf_callback_t)(u64, u64, u64, u64, u64);
 typedef int (*bpf_iter_init_seq_priv_t)(void *private_data,
@@ -333,16 +335,19 @@ static inline bool btf_record_has_field(const struct btf_record *rec, enum btf_f
 	return rec->field_mask & type;
 }
 
-static inline void check_and_init_map_value(struct bpf_map *map, void *dst)
+static inline void bpf_obj_init(const struct btf_field_offs *foffs, void *obj)
 {
-	if (!IS_ERR_OR_NULL(map->record)) {
-		struct btf_field *fields = map->record->fields;
-		u32 cnt = map->record->cnt;
-		int i;
+	int i;
 
-		for (i = 0; i < cnt; i++)
-			memset(dst + fields[i].offset, 0, btf_field_type_size(fields[i].type));
-	}
+	if (!foffs)
+		return;
+	for (i = 0; i < foffs->cnt; i++)
+		memset(obj + foffs->field_off[i], 0, foffs->field_sz[i]);
+}
+
+static inline void check_and_init_map_value(struct bpf_map *map, void *dst)
+{
+	bpf_obj_init(map->field_offs, dst);
 }
 
 /* memcpy that is used with 8-byte aligned pointers, power-of-8 size and
diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h
index 887fa4d922f6..306fc1d6cc4a 100644
--- a/include/linux/bpf_verifier.h
+++ b/include/linux/bpf_verifier.h
@@ -427,6 +427,8 @@ struct bpf_insn_aux_data {
 		 */
 		struct bpf_loop_inline_state loop_inline_state;
 	};
+	u64 obj_new_size; /* remember the size of type passed to bpf_obj_new to rewrite R1 */
+	struct btf_struct_meta *kptr_struct_meta;
 	u64 map_key_state; /* constant (32 bit) key tracking for maps */
 	int ctx_field_size; /* the ctx field size for load insn, maybe 0 */
 	u32 seen; /* this insn was processed by the verifier at env->pass_cnt */
diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
index 9c16338bcbe8..2e57fc839a5c 100644
--- a/kernel/bpf/core.c
+++ b/kernel/bpf/core.c
@@ -34,6 +34,7 @@
 #include <linux/log2.h>
 #include <linux/bpf_verifier.h>
 #include <linux/nodemask.h>
+#include <linux/bpf_mem_alloc.h>
 
 #include <asm/barrier.h>
 #include <asm/unaligned.h>
@@ -60,6 +61,9 @@
 #define CTX	regs[BPF_REG_CTX]
 #define IMM	insn->imm
 
+struct bpf_mem_alloc bpf_global_ma;
+bool bpf_global_ma_set;
+
 /* No hurry in this branch
  *
  * Exported for the bpf jit load helper.
@@ -2746,6 +2750,18 @@ int __weak bpf_arch_text_invalidate(void *dst, size_t len)
 	return -ENOTSUPP;
 }
 
+#ifdef CONFIG_BPF_SYSCALL
+static int __init bpf_global_ma_init(void)
+{
+	int ret;
+
+	ret = bpf_mem_alloc_init(&bpf_global_ma, 0, false);
+	bpf_global_ma_set = !ret;
+	return ret;
+}
+late_initcall(bpf_global_ma_init);
+#endif
+
 DEFINE_STATIC_KEY_FALSE(bpf_stats_enabled_key);
 EXPORT_SYMBOL(bpf_stats_enabled_key);
 
diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
index 5bc0b9f0f306..c4f1c22cc44c 100644
--- a/kernel/bpf/helpers.c
+++ b/kernel/bpf/helpers.c
@@ -19,6 +19,7 @@
 #include <linux/proc_ns.h>
 #include <linux/security.h>
 #include <linux/btf_ids.h>
+#include <linux/bpf_mem_alloc.h>
 
 #include "../../lib/kstrtox.h"
 
@@ -1735,25 +1736,57 @@ void bpf_list_head_free(const struct btf_field *field, void *list_head,
 
 		obj -= field->list_head.node_offset;
 		head = head->next;
-		/* TODO: Rework later */
-		kfree(obj);
+		/* The contained type can also have resources, including a
+		 * bpf_list_head which needs to be freed.
+		 */
+		bpf_obj_free_fields(field->list_head.value_rec, obj);
+		/* bpf_mem_free requires migrate_disable(), since we can be
+		 * called from map free path as well apart from BPF program (as
+		 * part of map ops doing bpf_obj_free_fields).
+		 */
+		migrate_disable();
+		bpf_mem_free(&bpf_global_ma, obj);
+		migrate_enable();
 	}
 }
 
-BTF_SET8_START(tracing_btf_ids)
+__diag_push();
+__diag_ignore_all("-Wmissing-prototypes",
+		  "Global functions as their definitions will be in vmlinux BTF");
+
+void *bpf_obj_new_impl(u64 local_type_id__k, void *meta__ign)
+{
+	struct btf_struct_meta *meta = meta__ign;
+	u64 size = local_type_id__k;
+	void *p;
+
+	if (unlikely(!bpf_global_ma_set))
+		return NULL;
+	p = bpf_mem_alloc(&bpf_global_ma, size);
+	if (!p)
+		return NULL;
+	if (meta)
+		bpf_obj_init(meta->field_offs, p);
+	return p;
+}
+
+__diag_pop();
+
+BTF_SET8_START(generic_btf_ids)
 #ifdef CONFIG_KEXEC_CORE
 BTF_ID_FLAGS(func, crash_kexec, KF_DESTRUCTIVE)
 #endif
-BTF_SET8_END(tracing_btf_ids)
+BTF_ID_FLAGS(func, bpf_obj_new_impl, KF_ACQUIRE | KF_RET_NULL)
+BTF_SET8_END(generic_btf_ids)
 
-static const struct btf_kfunc_id_set tracing_kfunc_set = {
+static const struct btf_kfunc_id_set generic_kfunc_set = {
 	.owner = THIS_MODULE,
-	.set   = &tracing_btf_ids,
+	.set   = &generic_btf_ids,
 };
 
 static int __init kfunc_init(void)
 {
-	return register_btf_kfunc_id_set(BPF_PROG_TYPE_TRACING, &tracing_kfunc_set);
+	return register_btf_kfunc_id_set(BPF_PROG_TYPE_TRACING, &generic_kfunc_set);
 }
 
 late_initcall(kfunc_init);
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index a4a1424b19a5..c7f5d83783db 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -7948,6 +7948,11 @@ static bool is_kfunc_arg_sfx_constant(const struct btf *btf, const struct btf_pa
 	return __kfunc_param_match_suffix(btf, arg, "__k");
 }
 
+static bool is_kfunc_arg_sfx_ignore(const struct btf *btf, const struct btf_param *arg)
+{
+	return __kfunc_param_match_suffix(btf, arg, "__ign");
+}
+
 static bool is_kfunc_arg_ret_buf_size(const struct btf *btf,
 				      const struct btf_param *arg,
 				      const struct bpf_reg_state *reg,
@@ -8216,6 +8221,10 @@ static int check_kfunc_args(struct bpf_verifier_env *env, struct bpf_kfunc_call_
 		int kf_arg_type;
 
 		t = btf_type_skip_modifiers(btf, args[i].type, NULL);
+
+		if (is_kfunc_arg_sfx_ignore(btf, &args[i]))
+			continue;
+
 		if (btf_type_is_scalar(t)) {
 			if (reg->type != SCALAR_VALUE) {
 				verbose(env, "R%d is not a scalar\n", regno);
@@ -8395,6 +8404,17 @@ static int check_kfunc_args(struct bpf_verifier_env *env, struct bpf_kfunc_call_
 	return 0;
 }
 
+enum special_kfunc_type {
+	KF_bpf_obj_new_impl,
+};
+
+BTF_SET_START(special_kfunc_set)
+BTF_ID(func, bpf_obj_new_impl)
+BTF_SET_END(special_kfunc_set)
+
+BTF_ID_LIST(special_kfunc_list)
+BTF_ID(func, bpf_obj_new_impl)
+
 static int check_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
 			    int *insn_idx_p)
 {
@@ -8469,17 +8489,64 @@ static int check_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
 	t = btf_type_skip_modifiers(desc_btf, func_proto->type, NULL);
 
 	if (is_kfunc_acquire(&meta) && !btf_type_is_struct_ptr(meta.btf, t)) {
-		verbose(env, "acquire kernel function does not return PTR_TO_BTF_ID\n");
-		return -EINVAL;
+		/* Only exception is bpf_obj_new_impl */
+		if (meta.btf != btf_vmlinux || meta.func_id != special_kfunc_list[KF_bpf_obj_new_impl]) {
+			verbose(env, "acquire kernel function does not return PTR_TO_BTF_ID\n");
+			return -EINVAL;
+		}
 	}
 
 	if (btf_type_is_scalar(t)) {
 		mark_reg_unknown(env, regs, BPF_REG_0);
 		mark_btf_func_reg_size(env, BPF_REG_0, t->size);
 	} else if (btf_type_is_ptr(t)) {
-		ptr_type = btf_type_skip_modifiers(desc_btf, t->type,
-						   &ptr_type_id);
-		if (!btf_type_is_struct(ptr_type)) {
+		ptr_type = btf_type_skip_modifiers(desc_btf, t->type, &ptr_type_id);
+
+		if (meta.btf == btf_vmlinux && btf_id_set_contains(&special_kfunc_set, meta.func_id)) {
+			if (!btf_type_is_void(ptr_type)) {
+				verbose(env, "kernel function %s must have void * return type\n",
+					meta.func_name);
+				return -EINVAL;
+			}
+			if (meta.func_id == special_kfunc_list[KF_bpf_obj_new_impl]) {
+				const struct btf_type *ret_t;
+				struct btf *ret_btf;
+				u32 ret_btf_id;
+
+				if (((u64)(u32)meta.arg_constant.value) != meta.arg_constant.value) {
+					verbose(env, "local type ID argument must be in range [0, U32_MAX]\n");
+					return -EINVAL;
+				}
+
+				ret_btf = env->prog->aux->btf;
+				ret_btf_id = meta.arg_constant.value;
+
+				/* This may be NULL due to user not supplying a BTF */
+				if (!ret_btf) {
+					verbose(env, "bpf_obj_new requires prog BTF\n");
+					return -EINVAL;
+				}
+
+				ret_t = btf_type_by_id(ret_btf, ret_btf_id);
+				if (!ret_t || !__btf_type_is_struct(ret_t)) {
+					verbose(env, "bpf_obj_new type ID argument must be of a struct\n");
+					return -EINVAL;
+				}
+
+				mark_reg_known_zero(env, regs, BPF_REG_0);
+				regs[BPF_REG_0].type = PTR_TO_BTF_ID | MEM_ALLOC;
+				regs[BPF_REG_0].btf = ret_btf;
+				regs[BPF_REG_0].btf_id = ret_btf_id;
+
+				env->insn_aux_data[insn_idx].obj_new_size = ret_t->size;
+				env->insn_aux_data[insn_idx].kptr_struct_meta =
+					btf_find_struct_meta(ret_btf, ret_btf_id);
+			} else {
+				verbose(env, "kernel function %s unhandled dynamic return type\n",
+					meta.func_name);
+				return -EFAULT;
+			}
+		} else if (!__btf_type_is_struct(ptr_type)) {
 			if (!meta.r0_size) {
 				ptr_type_name = btf_name_by_offset(desc_btf,
 								   ptr_type->name_off);
@@ -8507,6 +8574,7 @@ static int check_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
 			regs[BPF_REG_0].type = PTR_TO_BTF_ID;
 			regs[BPF_REG_0].btf_id = ptr_type_id;
 		}
+
 		if (is_kfunc_ret_null(&meta)) {
 			regs[BPF_REG_0].type |= PTR_MAYBE_NULL;
 			/* For mark_ptr_or_null_reg, see 93c230e3f5bd6 */
@@ -14644,8 +14712,8 @@ static int fixup_call_args(struct bpf_verifier_env *env)
 	return err;
 }
 
-static int fixup_kfunc_call(struct bpf_verifier_env *env,
-			    struct bpf_insn *insn)
+static int fixup_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
+			    struct bpf_insn *insn_buf, int insn_idx, int *cnt)
 {
 	const struct bpf_kfunc_desc *desc;
 
@@ -14664,8 +14732,21 @@ static int fixup_kfunc_call(struct bpf_verifier_env *env,
 		return -EFAULT;
 	}
 
+	*cnt = 0;
 	insn->imm = desc->imm;
+	if (insn->off)
+		return 0;
+	if (desc->func_id == special_kfunc_list[KF_bpf_obj_new_impl]) {
+		struct btf_struct_meta *kptr_struct_meta = env->insn_aux_data[insn_idx].kptr_struct_meta;
+		struct bpf_insn addr[2] = { BPF_LD_IMM64(BPF_REG_2, (long)kptr_struct_meta) };
+		u64 obj_new_size = env->insn_aux_data[insn_idx].obj_new_size;
 
+		insn_buf[0] = BPF_MOV64_IMM(BPF_REG_1, obj_new_size);
+		insn_buf[1] = addr[0];
+		insn_buf[2] = addr[1];
+		insn_buf[3] = *insn;
+		*cnt = 4;
+	}
 	return 0;
 }
 
@@ -14807,9 +14888,19 @@ static int do_misc_fixups(struct bpf_verifier_env *env)
 		if (insn->src_reg == BPF_PSEUDO_CALL)
 			continue;
 		if (insn->src_reg == BPF_PSEUDO_KFUNC_CALL) {
-			ret = fixup_kfunc_call(env, insn);
+			ret = fixup_kfunc_call(env, insn, insn_buf, i + delta, &cnt);
 			if (ret)
 				return ret;
+			if (cnt == 0)
+				continue;
+
+			new_prog = bpf_patch_insn_data(env, i + delta, insn_buf, cnt);
+			if (!new_prog)
+				return -ENOMEM;
+
+			delta	 += cnt - 1;
+			env->prog = prog = new_prog;
+			insn	  = new_prog->insnsi + i + delta;
 			continue;
 		}
 
diff --git a/tools/testing/selftests/bpf/bpf_experimental.h b/tools/testing/selftests/bpf/bpf_experimental.h
new file mode 100644
index 000000000000..aeb6a7fcb7c4
--- /dev/null
+++ b/tools/testing/selftests/bpf/bpf_experimental.h
@@ -0,0 +1,25 @@
+#ifndef __BPF_EXPERIMENTAL__
+#define __BPF_EXPERIMENTAL__
+
+#include <vmlinux.h>
+#include <bpf/bpf_tracing.h>
+#include <bpf/bpf_helpers.h>
+#include <bpf/bpf_core_read.h>
+
+/* Description
+ *	Allocates an object of the type represented by 'local_type_id' in
+ *	program BTF. User may use the bpf_core_type_id_local macro to pass the
+ *	type ID of a struct in program BTF.
+ *
+ *	The 'local_type_id' parameter must be a known constant.
+ *	The 'meta' parameter is a hidden argument that is ignored.
+ * Returns
+ *	A pointer to an object of the type corresponding to the passed in
+ *	'local_type_id', or NULL on failure.
+ */
+extern void *bpf_obj_new_impl(__u64 local_type_id, void *meta) __ksym;
+
+/* Convenience macro to wrap over bpf_obj_new_impl */
+#define bpf_obj_new(type) ((type *)bpf_obj_new_impl(bpf_core_type_id_local(type), NULL))
+
+#endif
-- 
2.38.1


^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [PATCH bpf-next v7 18/26] bpf: Introduce bpf_obj_drop
  2022-11-14 19:15 [PATCH bpf-next v7 00/26] Allocated objects, BPF linked lists Kumar Kartikeya Dwivedi
                   ` (16 preceding siblings ...)
  2022-11-14 19:15 ` [PATCH bpf-next v7 17/26] bpf: Introduce bpf_obj_new Kumar Kartikeya Dwivedi
@ 2022-11-14 19:15 ` Kumar Kartikeya Dwivedi
  2022-11-14 19:15 ` [PATCH bpf-next v7 19/26] bpf: Permit NULL checking pointer with non-zero fixed offset Kumar Kartikeya Dwivedi
                   ` (8 subsequent siblings)
  26 siblings, 0 replies; 48+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2022-11-14 19:15 UTC (permalink / raw)
  To: bpf
  Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Martin KaFai Lau, Dave Marchevsky

Introduce bpf_obj_drop, which is the kfunc used to free allocated
objects (allocated using bpf_obj_new). Pairing with bpf_obj_new, it
implicitly destructs the fields that are part of the object
automatically, without user intervention.

Just like the previous patch, btf_struct_meta that is needed to free up
the special fields is passed as a hidden argument to the kfunc.
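
Conceptually, the verifier materializes this hidden argument during
fixup by prepending an ld_imm64 that loads the btf_struct_meta pointer
into R2 before the rewritten call (a sketch in pseudo-assembly, see
fixup_kfunc_call):

	r1 = <kptr>                    /* set by the program */
	r2 = kptr_struct_meta ll       /* hidden arg, BPF_LD_IMM64 */
	call bpf_obj_drop_impl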

For the user, a convenience macro hides the kernel side kfunc, which is
named bpf_obj_drop_impl.

Continuing the previous example:

void prog(void) {
	struct foo *f;

	f = bpf_obj_new(typeof(*f));
	if (!f)
		return;
	bpf_obj_drop(f);
}

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
---
 kernel/bpf/helpers.c                          | 11 ++++
 kernel/bpf/verifier.c                         | 66 +++++++++++++++----
 .../testing/selftests/bpf/bpf_experimental.h  | 13 ++++
 3 files changed, 79 insertions(+), 11 deletions(-)

diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
index c4f1c22cc44c..71d803ca0c1d 100644
--- a/kernel/bpf/helpers.c
+++ b/kernel/bpf/helpers.c
@@ -1770,6 +1770,16 @@ void *bpf_obj_new_impl(u64 local_type_id__k, void *meta__ign)
 	return p;
 }
 
+void bpf_obj_drop_impl(void *p__alloc, void *meta__ign)
+{
+	struct btf_struct_meta *meta = meta__ign;
+	void *p = p__alloc;
+
+	if (meta)
+		bpf_obj_free_fields(meta->record, p);
+	bpf_mem_free(&bpf_global_ma, p);
+}
+
 __diag_pop();
 
 BTF_SET8_START(generic_btf_ids)
@@ -1777,6 +1787,7 @@ BTF_SET8_START(generic_btf_ids)
 BTF_ID_FLAGS(func, crash_kexec, KF_DESTRUCTIVE)
 #endif
 BTF_ID_FLAGS(func, bpf_obj_new_impl, KF_ACQUIRE | KF_RET_NULL)
+BTF_ID_FLAGS(func, bpf_obj_drop_impl, KF_RELEASE)
 BTF_SET8_END(generic_btf_ids)
 
 static const struct btf_kfunc_id_set generic_kfunc_set = {
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index c7f5d83783db..7372737cbde9 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -7875,6 +7875,10 @@ struct bpf_kfunc_call_arg_meta {
 		u64 value;
 		bool found;
 	} arg_constant;
+	struct {
+		struct btf *btf;
+		u32 btf_id;
+	} arg_obj_drop;
 };
 
 static bool is_kfunc_acquire(struct bpf_kfunc_call_arg_meta *meta)
@@ -7953,6 +7957,11 @@ static bool is_kfunc_arg_sfx_ignore(const struct btf *btf, const struct btf_para
 	return __kfunc_param_match_suffix(btf, arg, "__ign");
 }
 
+static bool is_kfunc_arg_alloc_obj(const struct btf *btf, const struct btf_param *arg)
+{
+	return __kfunc_param_match_suffix(btf, arg, "__alloc");
+}
+
 static bool is_kfunc_arg_ret_buf_size(const struct btf *btf,
 				      const struct btf_param *arg,
 				      const struct bpf_reg_state *reg,
@@ -8053,6 +8062,7 @@ static u32 *reg2btf_ids[__BPF_REG_TYPE_MAX] = {
 
 enum kfunc_ptr_arg_type {
 	KF_ARG_PTR_TO_CTX,
+	KF_ARG_PTR_TO_ALLOC_BTF_ID,  /* Allocated object */
 	KF_ARG_PTR_TO_KPTR_STRONG,   /* PTR_TO_KPTR but type specific */
 	KF_ARG_PTR_TO_DYNPTR,
 	KF_ARG_PTR_TO_BTF_ID,	     /* Also covers reg2btf_ids conversions */
@@ -8060,6 +8070,20 @@ enum kfunc_ptr_arg_type {
 	KF_ARG_PTR_TO_MEM_SIZE,	     /* Size derived from next argument, skip it */
 };
 
+enum special_kfunc_type {
+	KF_bpf_obj_new_impl,
+	KF_bpf_obj_drop_impl,
+};
+
+BTF_SET_START(special_kfunc_set)
+BTF_ID(func, bpf_obj_new_impl)
+BTF_ID(func, bpf_obj_drop_impl)
+BTF_SET_END(special_kfunc_set)
+
+BTF_ID_LIST(special_kfunc_list)
+BTF_ID(func, bpf_obj_new_impl)
+BTF_ID(func, bpf_obj_drop_impl)
+
 static enum kfunc_ptr_arg_type
 get_kfunc_ptr_arg_type(struct bpf_verifier_env *env,
 		       struct bpf_kfunc_call_arg_meta *meta,
@@ -8080,6 +8104,9 @@ get_kfunc_ptr_arg_type(struct bpf_verifier_env *env,
 	if (btf_get_prog_ctx_type(&env->log, meta->btf, t, resolve_prog_type(env->prog), argno))
 		return KF_ARG_PTR_TO_CTX;
 
+	if (is_kfunc_arg_alloc_obj(meta->btf, &args[argno]))
+		return KF_ARG_PTR_TO_ALLOC_BTF_ID;
+
 	if (is_kfunc_arg_kptr_get(meta, argno)) {
 		if (!btf_type_is_ptr(ref_t)) {
 			verbose(env, "arg#0 BTF type must be a double pointer for kptr_get kfunc\n");
@@ -8298,6 +8325,7 @@ static int check_kfunc_args(struct bpf_verifier_env *env, struct bpf_kfunc_call_
 			return kf_arg_type;
 
 		switch (kf_arg_type) {
+		case KF_ARG_PTR_TO_ALLOC_BTF_ID:
 		case KF_ARG_PTR_TO_BTF_ID:
 			if (!is_kfunc_trusted_args(meta))
 				break;
@@ -8334,6 +8362,21 @@ static int check_kfunc_args(struct bpf_verifier_env *env, struct bpf_kfunc_call_
 				return -EINVAL;
 			}
 			break;
+		case KF_ARG_PTR_TO_ALLOC_BTF_ID:
+			if (reg->type != (PTR_TO_BTF_ID | MEM_ALLOC)) {
+				verbose(env, "arg#%d expected pointer to allocated object\n", i);
+				return -EINVAL;
+			}
+			if (!reg->ref_obj_id) {
+				verbose(env, "allocated object must be referenced\n");
+				return -EINVAL;
+			}
+			if (meta->btf == btf_vmlinux &&
+			    meta->func_id == special_kfunc_list[KF_bpf_obj_drop_impl]) {
+				meta->arg_obj_drop.btf = reg->btf;
+				meta->arg_obj_drop.btf_id = reg->btf_id;
+			}
+			break;
 		case KF_ARG_PTR_TO_KPTR_STRONG:
 			if (reg->type != PTR_TO_MAP_VALUE) {
 				verbose(env, "arg#0 expected pointer to map value\n");
@@ -8404,17 +8447,6 @@ static int check_kfunc_args(struct bpf_verifier_env *env, struct bpf_kfunc_call_
 	return 0;
 }
 
-enum special_kfunc_type {
-	KF_bpf_obj_new_impl,
-};
-
-BTF_SET_START(special_kfunc_set)
-BTF_ID(func, bpf_obj_new_impl)
-BTF_SET_END(special_kfunc_set)
-
-BTF_ID_LIST(special_kfunc_list)
-BTF_ID(func, bpf_obj_new_impl)
-
 static int check_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
 			    int *insn_idx_p)
 {
@@ -8541,6 +8573,10 @@ static int check_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
 				env->insn_aux_data[insn_idx].obj_new_size = ret_t->size;
 				env->insn_aux_data[insn_idx].kptr_struct_meta =
 					btf_find_struct_meta(ret_btf, ret_btf_id);
+			} else if (meta.func_id == special_kfunc_list[KF_bpf_obj_drop_impl]) {
+				env->insn_aux_data[insn_idx].kptr_struct_meta =
+					btf_find_struct_meta(meta.arg_obj_drop.btf,
+							     meta.arg_obj_drop.btf_id);
 			} else {
 				verbose(env, "kernel function %s unhandled dynamic return type\n",
 					meta.func_name);
@@ -14746,6 +14782,14 @@ static int fixup_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
 		insn_buf[2] = addr[1];
 		insn_buf[3] = *insn;
 		*cnt = 4;
+	} else if (desc->func_id == special_kfunc_list[KF_bpf_obj_drop_impl]) {
+		struct btf_struct_meta *kptr_struct_meta = env->insn_aux_data[insn_idx].kptr_struct_meta;
+		struct bpf_insn addr[2] = { BPF_LD_IMM64(BPF_REG_2, (long)kptr_struct_meta) };
+
+		insn_buf[0] = addr[0];
+		insn_buf[1] = addr[1];
+		insn_buf[2] = *insn;
+		*cnt = 3;
 	}
 	return 0;
 }
diff --git a/tools/testing/selftests/bpf/bpf_experimental.h b/tools/testing/selftests/bpf/bpf_experimental.h
index aeb6a7fcb7c4..8473395a11af 100644
--- a/tools/testing/selftests/bpf/bpf_experimental.h
+++ b/tools/testing/selftests/bpf/bpf_experimental.h
@@ -22,4 +22,17 @@ extern void *bpf_obj_new_impl(__u64 local_type_id, void *meta) __ksym;
 /* Convenience macro to wrap over bpf_obj_new_impl */
 #define bpf_obj_new(type) ((type *)bpf_obj_new_impl(bpf_core_type_id_local(type), NULL))
 
+/* Description
+ *	Free an allocated object. All fields of the object that require
+ *	destruction will be destructed before the storage is freed.
+ *
+ *	The 'meta' parameter is a hidden argument that is ignored.
+ * Returns
+ *	Void.
+ */
+extern void bpf_obj_drop_impl(void *kptr, void *meta) __ksym;
+
+/* Convenience macro to wrap over bpf_obj_drop_impl */
+#define bpf_obj_drop(kptr) bpf_obj_drop_impl(kptr, NULL)
+
 #endif
-- 
2.38.1


^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [PATCH bpf-next v7 19/26] bpf: Permit NULL checking pointer with non-zero fixed offset
  2022-11-14 19:15 [PATCH bpf-next v7 00/26] Allocated objects, BPF linked lists Kumar Kartikeya Dwivedi
                   ` (17 preceding siblings ...)
  2022-11-14 19:15 ` [PATCH bpf-next v7 18/26] bpf: Introduce bpf_obj_drop Kumar Kartikeya Dwivedi
@ 2022-11-14 19:15 ` Kumar Kartikeya Dwivedi
  2022-11-15  6:23   ` Alexei Starovoitov
  2022-11-14 19:15 ` [PATCH bpf-next v7 20/26] bpf: Introduce single ownership BPF linked list API Kumar Kartikeya Dwivedi
                   ` (7 subsequent siblings)
  26 siblings, 1 reply; 48+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2022-11-14 19:15 UTC (permalink / raw)
  To: bpf
  Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Martin KaFai Lau, Dave Marchevsky

Pointer increment on seeing PTR_MAYBE_NULL is already protected against,
hence make an exception for PTR_TO_BTF_ID | MEM_ALLOC while still
keeping the warning for other unintended cases that might creep in.

The bpf_list_pop_{front,back} helpers planned to be introduced in the
next commit will return a MEM_ALLOC register with an incremented offset
pointing to the bpf_list_node field. The user is then supposed to obtain
the pointer to the entry using container_of after NULL checking it. The
current restrictions trigger a warning when doing that NULL check.
Revisiting the reason for the restriction: it is meant as an assertion,
and it does actually work and catch the bad cases, so it is kept for
them.

Hence, under no other circumstances can reg->off be non-zero for a
register that has the PTR_MAYBE_NULL type flag set.
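
As a sketch of the pattern this change permits (struct foo is
illustrative; bpf_list_pop_front lands in the next commit):

	struct foo {
		int data;
		struct bpf_list_node node;
	};

	/* with the bpf_spin_lock protecting the list held */
	n = bpf_list_pop_front(&head);
	/* n is PTR_TO_BTF_ID | MEM_ALLOC | PTR_MAYBE_NULL with non-zero
	 * reg->off pointing at the embedded bpf_list_node; this NULL
	 * check is what used to trigger the WARN_ON_ONCE.
	 */
	if (!n)
		return 0;
	f = container_of(n, struct foo, node);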

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
---
 kernel/bpf/verifier.c | 13 +++++++++----
 1 file changed, 9 insertions(+), 4 deletions(-)

diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 7372737cbde9..e194c3feb01f 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -10800,15 +10800,20 @@ static void mark_ptr_or_null_reg(struct bpf_func_state *state,
 {
 	if (type_may_be_null(reg->type) && reg->id == id &&
 	    !WARN_ON_ONCE(!reg->id)) {
-		if (WARN_ON_ONCE(reg->smin_value || reg->smax_value ||
-				 !tnum_equals_const(reg->var_off, 0) ||
-				 reg->off)) {
+		if (reg->smin_value || reg->smax_value || !tnum_equals_const(reg->var_off, 0) || reg->off) {
 			/* Old offset (both fixed and variable parts) should
 			 * have been known-zero, because we don't allow pointer
 			 * arithmetic on pointers that might be NULL. If we
 			 * see this happening, don't convert the register.
+			 *
+			 * But in some cases, some helpers that return local
+			 * kptrs advance offset for the returned pointer.
+			 * In those cases, it is fine to expect to see reg->off.
 			 */
-			return;
+			if (WARN_ON_ONCE(reg->type != (PTR_TO_BTF_ID | MEM_ALLOC | PTR_MAYBE_NULL)))
+				return;
+			if (WARN_ON_ONCE(reg->smin_value || reg->smax_value || !tnum_equals_const(reg->var_off, 0)))
+				return;
 		}
 		if (is_null) {
 			reg->type = SCALAR_VALUE;
-- 
2.38.1


^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [PATCH bpf-next v7 20/26] bpf: Introduce single ownership BPF linked list API
  2022-11-14 19:15 [PATCH bpf-next v7 00/26] Allocated objects, BPF linked lists Kumar Kartikeya Dwivedi
                   ` (18 preceding siblings ...)
  2022-11-14 19:15 ` [PATCH bpf-next v7 19/26] bpf: Permit NULL checking pointer with non-zero fixed offset Kumar Kartikeya Dwivedi
@ 2022-11-14 19:15 ` Kumar Kartikeya Dwivedi
  2022-11-15  6:26   ` Alexei Starovoitov
  2022-11-14 19:15 ` [PATCH bpf-next v7 21/26] bpf: Add 'release on unlock' logic for bpf_list_push_{front,back} Kumar Kartikeya Dwivedi
                   ` (6 subsequent siblings)
  26 siblings, 1 reply; 48+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2022-11-14 19:15 UTC (permalink / raw)
  To: bpf
  Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Martin KaFai Lau, Dave Marchevsky

Add a linked list API for use in BPF programs. It expects protection
from a bpf_spin_lock in the same allocation as the bpf_list_head. For
now, only one bpf_spin_lock can be present, hence it is assumed to be
the one protecting the bpf_list_head.

The following functions are added to kick things off:

// Add node to beginning of list
void bpf_list_push_front(struct bpf_list_head *head, struct bpf_list_node *node);

// Add node to end of list
void bpf_list_push_back(struct bpf_list_head *head, struct bpf_list_node *node);

// Remove node at beginning of list and return it
struct bpf_list_node *bpf_list_pop_front(struct bpf_list_head *head);

// Remove node at end of list and return it
struct bpf_list_node *bpf_list_pop_back(struct bpf_list_head *head);

The lock protecting the bpf_list_head needs to be taken for all
operations. The verifier ensures that the correct lock is always held
for these operations. These checks are made statically by relying on the
reg->id preserved for registers pointing into regions containing both
the bpf_spin_lock and the objects protected by it. The comment over
check_reg_allocation_locked in this change describes the logic in
detail.
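
For instance, if f1 and f2 are two separately allocated objects that
each embed their own bpf_spin_lock and bpf_list_head (a sketch, layout
illustrative), the following is rejected because the wrong lock is held:

	bpf_spin_lock(&f1->lock);
	bpf_list_push_front(&f2->head, &n->node); /* f2's lock is not held */
	bpf_spin_unlock(&f1->lock);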

Note that bpf_list_push_front and bpf_list_push_back are meant to
consume the object containing the node passed as arg#1; however, the
mechanism is intended to not release the ref_obj_id directly until
bpf_spin_unlock is called. In this commit, nothing is done about that
yet; the next commit will introduce the logic to handle this case, so it
has been left as is for now.

bpf_list_pop_front and bpf_list_pop_back delete the first or last item
of the list, respectively, and return a pointer to the element at the
bpf_list_node offset. The user can then use a container_of style macro
to get to the actual entry type. The verifier, however, statically knows
the actual type, so the safety properties are still preserved.

With these additions, programs can now manage their own linked lists and
store their objects in them.
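
A minimal end-to-end sketch (illustrative; __contains is only added to
bpf_experimental.h in a later patch of this series, and container_of is
open coded here as no common BPF-side header provides it yet):

	#include "bpf_experimental.h"

	struct elem {
		int data;
		struct bpf_list_node node;
	};

	/* Lock and list head in the same .data section, i.e. the same
	 * map value (allocation), so this lock protects this list.
	 */
	SEC(".data.A") struct bpf_spin_lock glock;
	SEC(".data.A") struct bpf_list_head ghead __contains(elem, node);

	#ifndef container_of
	#define container_of(ptr, type, member) \
		((type *)((void *)(ptr) - offsetof(type, member)))
	#endif

	SEC("tc")
	int list_example(void *ctx)
	{
		struct bpf_list_node *n;
		struct elem *e;

		e = bpf_obj_new(typeof(*e));
		if (!e)
			return 0;
		e->data = 42;

		bpf_spin_lock(&glock);
		bpf_list_push_front(&ghead, &e->node);
		n = bpf_list_pop_back(&ghead);
		bpf_spin_unlock(&glock);

		if (!n)
			return 0;
		e = container_of(n, struct elem, node);
		bpf_obj_drop(e);
		return 0;
	}

	char _license[] SEC("license") = "GPL";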

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
---
 kernel/bpf/helpers.c                          |  55 +++-
 kernel/bpf/verifier.c                         | 292 +++++++++++++++++-
 .../testing/selftests/bpf/bpf_experimental.h  |  28 ++
 3 files changed, 361 insertions(+), 14 deletions(-)

diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
index 71d803ca0c1d..212e791d7452 100644
--- a/kernel/bpf/helpers.c
+++ b/kernel/bpf/helpers.c
@@ -1780,6 +1780,50 @@ void bpf_obj_drop_impl(void *p__alloc, void *meta__ign)
 	bpf_mem_free(&bpf_global_ma, p);
 }
 
+static void __bpf_list_add(struct bpf_list_node *node, struct bpf_list_head *head, bool tail)
+{
+	struct list_head *n = (void *)node, *h = (void *)head;
+
+	if (unlikely(!h->next))
+		INIT_LIST_HEAD(h);
+	if (unlikely(!n->next))
+		INIT_LIST_HEAD(n);
+	tail ? list_add_tail(n, h) : list_add(n, h);
+}
+
+void bpf_list_push_front(struct bpf_list_head *head, struct bpf_list_node *node)
+{
+	return __bpf_list_add(node, head, false);
+}
+
+void bpf_list_push_back(struct bpf_list_head *head, struct bpf_list_node *node)
+{
+	return __bpf_list_add(node, head, true);
+}
+
+static struct bpf_list_node *__bpf_list_del(struct bpf_list_head *head, bool tail)
+{
+	struct list_head *n, *h = (void *)head;
+
+	if (unlikely(!h->next))
+		INIT_LIST_HEAD(h);
+	if (list_empty(h))
+		return NULL;
+	n = tail ? h->prev : h->next;
+	list_del_init(n);
+	return (struct bpf_list_node *)n;
+}
+
+struct bpf_list_node *bpf_list_pop_front(struct bpf_list_head *head)
+{
+	return __bpf_list_del(head, false);
+}
+
+struct bpf_list_node *bpf_list_pop_back(struct bpf_list_head *head)
+{
+	return __bpf_list_del(head, true);
+}
+
 __diag_pop();
 
 BTF_SET8_START(generic_btf_ids)
@@ -1788,6 +1832,10 @@ BTF_ID_FLAGS(func, crash_kexec, KF_DESTRUCTIVE)
 #endif
 BTF_ID_FLAGS(func, bpf_obj_new_impl, KF_ACQUIRE | KF_RET_NULL)
 BTF_ID_FLAGS(func, bpf_obj_drop_impl, KF_RELEASE)
+BTF_ID_FLAGS(func, bpf_list_push_front)
+BTF_ID_FLAGS(func, bpf_list_push_back)
+BTF_ID_FLAGS(func, bpf_list_pop_front, KF_ACQUIRE | KF_RET_NULL)
+BTF_ID_FLAGS(func, bpf_list_pop_back, KF_ACQUIRE | KF_RET_NULL)
 BTF_SET8_END(generic_btf_ids)
 
 static const struct btf_kfunc_id_set generic_kfunc_set = {
@@ -1797,7 +1845,12 @@ static const struct btf_kfunc_id_set generic_kfunc_set = {
 
 static int __init kfunc_init(void)
 {
-	return register_btf_kfunc_id_set(BPF_PROG_TYPE_TRACING, &generic_kfunc_set);
+	int ret;
+
+	ret = register_btf_kfunc_id_set(BPF_PROG_TYPE_TRACING, &generic_kfunc_set);
+	if (ret)
+		return ret;
+	return register_btf_kfunc_id_set(BPF_PROG_TYPE_SCHED_CLS, &generic_kfunc_set);
 }
 
 late_initcall(kfunc_init);
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index e194c3feb01f..c034ca2d9479 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -7879,6 +7879,9 @@ struct bpf_kfunc_call_arg_meta {
 		struct btf *btf;
 		u32 btf_id;
 	} arg_obj_drop;
+	struct {
+		struct btf_field *field;
+	} arg_list_head;
 };
 
 static bool is_kfunc_acquire(struct bpf_kfunc_call_arg_meta *meta)
@@ -7989,13 +7992,17 @@ static bool is_kfunc_arg_ret_buf_size(const struct btf *btf,
 
 enum {
 	KF_ARG_DYNPTR_ID,
+	KF_ARG_LIST_HEAD_ID,
+	KF_ARG_LIST_NODE_ID,
 };
 
 BTF_ID_LIST(kf_arg_btf_ids)
 BTF_ID(struct, bpf_dynptr_kern)
+BTF_ID(struct, bpf_list_head)
+BTF_ID(struct, bpf_list_node)
 
-static bool is_kfunc_arg_dynptr(const struct btf *btf,
-				const struct btf_param *arg)
+static bool __is_kfunc_ptr_arg_type(const struct btf *btf,
+				    const struct btf_param *arg, int type)
 {
 	const struct btf_type *t;
 	u32 res_id;
@@ -8008,7 +8015,22 @@ static bool is_kfunc_arg_dynptr(const struct btf *btf,
 	t = btf_type_skip_modifiers(btf, t->type, &res_id);
 	if (!t)
 		return false;
-	return btf_types_are_same(btf, res_id, btf_vmlinux, kf_arg_btf_ids[KF_ARG_DYNPTR_ID]);
+	return btf_types_are_same(btf, res_id, btf_vmlinux, kf_arg_btf_ids[type]);
+}
+
+static bool is_kfunc_arg_dynptr(const struct btf *btf, const struct btf_param *arg)
+{
+	return __is_kfunc_ptr_arg_type(btf, arg, KF_ARG_DYNPTR_ID);
+}
+
+static bool is_kfunc_arg_list_head(const struct btf *btf, const struct btf_param *arg)
+{
+	return __is_kfunc_ptr_arg_type(btf, arg, KF_ARG_LIST_HEAD_ID);
+}
+
+static bool is_kfunc_arg_list_node(const struct btf *btf, const struct btf_param *arg)
+{
+	return __is_kfunc_ptr_arg_type(btf, arg, KF_ARG_LIST_NODE_ID);
 }
 
 /* Returns true if struct is composed of scalars, 4 levels of nesting allowed */
@@ -8065,6 +8087,8 @@ enum kfunc_ptr_arg_type {
 	KF_ARG_PTR_TO_ALLOC_BTF_ID,  /* Allocated object */
 	KF_ARG_PTR_TO_KPTR_STRONG,   /* PTR_TO_KPTR but type specific */
 	KF_ARG_PTR_TO_DYNPTR,
+	KF_ARG_PTR_TO_LIST_HEAD,
+	KF_ARG_PTR_TO_LIST_NODE,
 	KF_ARG_PTR_TO_BTF_ID,	     /* Also covers reg2btf_ids conversions */
 	KF_ARG_PTR_TO_MEM,
 	KF_ARG_PTR_TO_MEM_SIZE,	     /* Size derived from next argument, skip it */
@@ -8073,16 +8097,28 @@ enum kfunc_ptr_arg_type {
 enum special_kfunc_type {
 	KF_bpf_obj_new_impl,
 	KF_bpf_obj_drop_impl,
+	KF_bpf_list_push_front,
+	KF_bpf_list_push_back,
+	KF_bpf_list_pop_front,
+	KF_bpf_list_pop_back,
 };
 
 BTF_SET_START(special_kfunc_set)
 BTF_ID(func, bpf_obj_new_impl)
 BTF_ID(func, bpf_obj_drop_impl)
+BTF_ID(func, bpf_list_push_front)
+BTF_ID(func, bpf_list_push_back)
+BTF_ID(func, bpf_list_pop_front)
+BTF_ID(func, bpf_list_pop_back)
 BTF_SET_END(special_kfunc_set)
 
 BTF_ID_LIST(special_kfunc_list)
 BTF_ID(func, bpf_obj_new_impl)
 BTF_ID(func, bpf_obj_drop_impl)
+BTF_ID(func, bpf_list_push_front)
+BTF_ID(func, bpf_list_push_back)
+BTF_ID(func, bpf_list_pop_front)
+BTF_ID(func, bpf_list_pop_back)
 
 static enum kfunc_ptr_arg_type
 get_kfunc_ptr_arg_type(struct bpf_verifier_env *env,
@@ -8125,6 +8161,12 @@ get_kfunc_ptr_arg_type(struct bpf_verifier_env *env,
 	if (is_kfunc_arg_dynptr(meta->btf, &args[argno]))
 		return KF_ARG_PTR_TO_DYNPTR;
 
+	if (is_kfunc_arg_list_head(meta->btf, &args[argno]))
+		return KF_ARG_PTR_TO_LIST_HEAD;
+
+	if (is_kfunc_arg_list_node(meta->btf, &args[argno]))
+		return KF_ARG_PTR_TO_LIST_NODE;
+
 	if ((base_type(reg->type) == PTR_TO_BTF_ID || reg2btf_ids[base_type(reg->type)])) {
 		if (!btf_type_is_struct(ref_t)) {
 			verbose(env, "kernel function %s args#%d pointer type %s %s is not supported\n",
@@ -8220,6 +8262,194 @@ static int process_kf_arg_ptr_to_kptr_strong(struct bpf_verifier_env *env,
 	return 0;
 }
 
+/* Implementation details:
+ *
+ * Each register points to some region of memory, which we define as an
+ * allocation. Each allocation may embed a bpf_spin_lock which protects any
+ * special BPF objects (bpf_list_head, bpf_rb_root, etc.) part of the same
+ * allocation. The lock and the data it protects are colocated in the same
+ * memory region.
+ *
+ * Hence, every time a register holds a pointer value pointing to such an
+ * allocation, the verifier preserves a unique reg->id for it.
+ *
+ * The verifier remembers the lock 'ptr' and the lock 'id' whenever
+ * bpf_spin_lock is called.
+ *
+ * To enable this, lock state in the verifier captures two values:
+ *	active_lock.ptr = Register's type specific pointer
+ *	active_lock.id  = A unique ID for each register pointer value
+ *
+ * Currently, PTR_TO_MAP_VALUE and PTR_TO_BTF_ID | MEM_ALLOC are the two
+ * supported register types.
+ *
+ * The active_lock.ptr in case of map values is the reg->map_ptr, and in case of
+ * allocated objects is the reg->btf pointer.
+ *
+ * The active_lock.id is non-unique for maps supporting direct_value_addr, as we
+ * can establish the provenance of the map value statically for each distinct
+ * lookup into such maps. They always contain a single map value, hence unique
+ * IDs for each pseudo load would pessimize the algorithm and reject valid programs.
+ *
+ * So, in case of global variables, they use array maps with max_entries = 1,
+ * hence their active_lock.ptr becomes map_ptr and id = 0 (since they all point
+ * into the same map value as max_entries is 1, as described above).
+ *
+ * In case of inner map lookups, the inner map pointer has same map_ptr as the
+ * outer map pointer (in verifier context), but each lookup into an inner map
+ * assigns a fresh reg->id to the lookup, so while lookups into distinct inner
+ * maps from the same outer map share the same map_ptr as active_lock.ptr, they
+ * will get different reg->id assigned to each lookup, hence different
+ * active_lock.id.
+ *
+ * In case of allocated objects, active_lock.ptr is the reg->btf, and the
+ * reg->id is a unique ID preserved after the NULL pointer check on the pointer
+ * returned from bpf_obj_new. Each allocation receives a new reg->id.
+ */
+static int check_reg_allocation_locked(struct bpf_verifier_env *env, struct bpf_reg_state *reg)
+{
+	void *ptr;
+	u32 id;
+
+	switch ((int)reg->type) {
+	case PTR_TO_MAP_VALUE:
+		ptr = reg->map_ptr;
+		break;
+	case PTR_TO_BTF_ID | MEM_ALLOC:
+		ptr = reg->btf;
+		break;
+	default:
+		verbose(env, "verifier internal error: unknown reg type for lock check\n");
+		return -EFAULT;
+	}
+	id = reg->id;
+
+	if (!env->cur_state->active_lock.ptr)
+		return -EINVAL;
+	if (env->cur_state->active_lock.ptr != ptr ||
+	    env->cur_state->active_lock.id != id) {
+		verbose(env, "held lock and object are not in the same allocation\n");
+		return -EINVAL;
+	}
+	return 0;
+}
+
+static bool is_bpf_list_api_kfunc(u32 btf_id)
+{
+	return btf_id == special_kfunc_list[KF_bpf_list_push_front] ||
+	       btf_id == special_kfunc_list[KF_bpf_list_push_back] ||
+	       btf_id == special_kfunc_list[KF_bpf_list_pop_front] ||
+	       btf_id == special_kfunc_list[KF_bpf_list_pop_back];
+}
+
+static int process_kf_arg_ptr_to_list_head(struct bpf_verifier_env *env,
+					   struct bpf_reg_state *reg, u32 regno,
+					   struct bpf_kfunc_call_arg_meta *meta)
+{
+	struct btf_record *rec = NULL;
+	struct btf_field *field;
+	u32 list_head_off;
+
+	if (meta->btf != btf_vmlinux || !is_bpf_list_api_kfunc(meta->func_id)) {
+		verbose(env, "verifier internal error: bpf_list_head argument for unknown kfunc\n");
+		return -EFAULT;
+	}
+
+	if (reg->type == PTR_TO_MAP_VALUE) {
+		rec = reg->map_ptr->record;
+	} else /* PTR_TO_BTF_ID | MEM_ALLOC */ {
+		struct btf_struct_meta *meta;
+
+		meta = btf_find_struct_meta(reg->btf, reg->btf_id);
+		if (!meta) {
+			verbose(env, "bpf_list_head not found for allocated object\n");
+			return -EINVAL;
+		}
+		rec = meta->record;
+	}
+
+	if (!tnum_is_const(reg->var_off)) {
+		verbose(env,
+			"R%d doesn't have constant offset. bpf_list_head has to be at the constant offset\n",
+			regno);
+		return -EINVAL;
+	}
+
+	list_head_off = reg->off + reg->var_off.value;
+	field = btf_record_find(rec, list_head_off, BPF_LIST_HEAD);
+	if (!field) {
+		verbose(env, "bpf_list_head not found at offset=%u\n", list_head_off);
+		return -EINVAL;
+	}
+
+	/* All functions require bpf_list_head to be protected using a bpf_spin_lock */
+	if (check_reg_allocation_locked(env, reg)) {
+		verbose(env, "bpf_spin_lock at off=%d must be held for bpf_list_head\n",
+			rec->spin_lock_off);
+		return -EINVAL;
+	}
+
+	if (meta->arg_list_head.field) {
+		verbose(env, "verifier internal error: repeating bpf_list_head arg\n");
+		return -EFAULT;
+	}
+	meta->arg_list_head.field = field;
+	return 0;
+}
+
+static int process_kf_arg_ptr_to_list_node(struct bpf_verifier_env *env,
+					   struct bpf_reg_state *reg, u32 regno,
+					   struct bpf_kfunc_call_arg_meta *meta)
+{
+	struct btf_struct_meta *struct_meta;
+	struct btf_field *field;
+	struct btf_record *rec;
+	u32 list_node_off;
+
+	if (meta->btf != btf_vmlinux ||
+	    (meta->func_id != special_kfunc_list[KF_bpf_list_push_front] &&
+	     meta->func_id != special_kfunc_list[KF_bpf_list_push_back])) {
+		verbose(env, "verifier internal error: bpf_list_head argument for unknown kfunc\n");
+		return -EFAULT;
+	}
+
+	if (!tnum_is_const(reg->var_off)) {
+		verbose(env,
+			"R%d doesn't have constant offset. bpf_list_head has to be at the constant offset\n",
+			regno);
+		return -EINVAL;
+	}
+
+	struct_meta = btf_find_struct_meta(reg->btf, reg->btf_id);
+	if (!struct_meta) {
+		verbose(env, "bpf_list_node not found for allocated object\n");
+		return -EINVAL;
+	}
+	rec = struct_meta->record;
+
+	list_node_off = reg->off + reg->var_off.value;
+	field = btf_record_find(rec, list_node_off, BPF_LIST_NODE);
+	if (!field || field->offset != list_node_off) {
+		verbose(env, "bpf_list_node not found at offset=%u\n", list_node_off);
+		return -EINVAL;
+	}
+
+	field = meta->arg_list_head.field;
+
+	if (!btf_struct_ids_match(&env->log, reg->btf, reg->btf_id, 0, field->list_head.btf,
+				  field->list_head.value_btf_id, true)) {
+		verbose(env, "bpf_list_head value type does not match arg#1\n");
+		return -EINVAL;
+	}
+
+	if (list_node_off != field->list_head.node_offset) {
+		verbose(env, "arg#1 offset must be for bpf_list_node at off=%d\n",
+			field->list_head.node_offset);
+		return -EINVAL;
+	}
+	return 0;
+}
+
 static int check_kfunc_args(struct bpf_verifier_env *env, struct bpf_kfunc_call_arg_meta *meta)
 {
 	const char *func_name = meta->func_name, *ref_tname;
@@ -8340,6 +8570,8 @@ static int check_kfunc_args(struct bpf_verifier_env *env, struct bpf_kfunc_call_
 			break;
 		case KF_ARG_PTR_TO_KPTR_STRONG:
 		case KF_ARG_PTR_TO_DYNPTR:
+		case KF_ARG_PTR_TO_LIST_HEAD:
+		case KF_ARG_PTR_TO_LIST_NODE:
 		case KF_ARG_PTR_TO_MEM:
 		case KF_ARG_PTR_TO_MEM_SIZE:
 			/* Trusted by default */
@@ -8404,6 +8636,33 @@ static int check_kfunc_args(struct bpf_verifier_env *env, struct bpf_kfunc_call_
 				return -EINVAL;
 			}
 			break;
+		case KF_ARG_PTR_TO_LIST_HEAD:
+			if (reg->type != PTR_TO_MAP_VALUE &&
+			    reg->type != (PTR_TO_BTF_ID | MEM_ALLOC)) {
+				verbose(env, "arg#%d expected pointer to map value or allocated object\n", i);
+				return -EINVAL;
+			}
+			if (reg->type == (PTR_TO_BTF_ID | MEM_ALLOC) && !reg->ref_obj_id) {
+				verbose(env, "allocated object must be referenced\n");
+				return -EINVAL;
+			}
+			ret = process_kf_arg_ptr_to_list_head(env, reg, regno, meta);
+			if (ret < 0)
+				return ret;
+			break;
+		case KF_ARG_PTR_TO_LIST_NODE:
+			if (reg->type != (PTR_TO_BTF_ID | MEM_ALLOC)) {
+				verbose(env, "arg#%d expected pointer to allocated object\n", i);
+				return -EINVAL;
+			}
+			if (!reg->ref_obj_id) {
+				verbose(env, "allocated object must be referenced\n");
+				return -EINVAL;
+			}
+			ret = process_kf_arg_ptr_to_list_node(env, reg, regno, meta);
+			if (ret < 0)
+				return ret;
+			break;
 		case KF_ARG_PTR_TO_BTF_ID:
 			/* Only base_type is checked, further checks are done here */
 			if (reg->type != PTR_TO_BTF_ID &&
@@ -8535,11 +8794,6 @@ static int check_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
 		ptr_type = btf_type_skip_modifiers(desc_btf, t->type, &ptr_type_id);
 
 		if (meta.btf == btf_vmlinux && btf_id_set_contains(&special_kfunc_set, meta.func_id)) {
-			if (!btf_type_is_void(ptr_type)) {
-				verbose(env, "kernel function %s must have void * return type\n",
-					meta.func_name);
-				return -EINVAL;
-			}
 			if (meta.func_id == special_kfunc_list[KF_bpf_obj_new_impl]) {
 				const struct btf_type *ret_t;
 				struct btf *ret_btf;
@@ -8577,6 +8831,15 @@ static int check_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
 				env->insn_aux_data[insn_idx].kptr_struct_meta =
 					btf_find_struct_meta(meta.arg_obj_drop.btf,
 							     meta.arg_obj_drop.btf_id);
+			} else if (meta.func_id == special_kfunc_list[KF_bpf_list_pop_front] ||
+				   meta.func_id == special_kfunc_list[KF_bpf_list_pop_back]) {
+				struct btf_field *field = meta.arg_list_head.field;
+
+				mark_reg_known_zero(env, regs, BPF_REG_0);
+				regs[BPF_REG_0].type = PTR_TO_BTF_ID | MEM_ALLOC;
+				regs[BPF_REG_0].btf = field->list_head.btf;
+				regs[BPF_REG_0].btf_id = field->list_head.value_btf_id;
+				regs[BPF_REG_0].off = field->list_head.node_offset;
 			} else {
 				verbose(env, "kernel function %s unhandled dynamic return type\n",
 					meta.func_name);
@@ -13244,11 +13507,14 @@ static int do_check(struct bpf_verifier_env *env)
 					return -EINVAL;
 				}
 
-				if (env->cur_state->active_lock.ptr &&
-				    (insn->src_reg == BPF_PSEUDO_CALL ||
-				     insn->imm != BPF_FUNC_spin_unlock)) {
-					verbose(env, "function calls are not allowed while holding a lock\n");
-					return -EINVAL;
+				if (env->cur_state->active_lock.ptr) {
+					if ((insn->src_reg == BPF_REG_0 && insn->imm != BPF_FUNC_spin_unlock) ||
+					    (insn->src_reg == BPF_PSEUDO_CALL) ||
+					    (insn->src_reg == BPF_PSEUDO_KFUNC_CALL &&
+					     (insn->off != 0 || !is_bpf_list_api_kfunc(insn->imm)))) {
+						verbose(env, "function calls are not allowed while holding a lock\n");
+						return -EINVAL;
+					}
 				}
 				if (insn->src_reg == BPF_PSEUDO_CALL)
 					err = check_func_call(env, insn, &env->insn_idx);
diff --git a/tools/testing/selftests/bpf/bpf_experimental.h b/tools/testing/selftests/bpf/bpf_experimental.h
index 8473395a11af..d6b143275e82 100644
--- a/tools/testing/selftests/bpf/bpf_experimental.h
+++ b/tools/testing/selftests/bpf/bpf_experimental.h
@@ -35,4 +35,32 @@ extern void bpf_obj_drop_impl(void *kptr, void *meta) __ksym;
 /* Convenience macro to wrap over bpf_obj_drop_impl */
 #define bpf_obj_drop(kptr) bpf_obj_drop_impl(kptr, NULL)
 
+/* Description
+ *	Add a new entry to the beginning of the BPF linked list.
+ * Returns
+ *	Void.
+ */
+extern void bpf_list_push_front(struct bpf_list_head *head, struct bpf_list_node *node) __ksym;
+
+/* Description
+ *	Add a new entry to the end of the BPF linked list.
+ * Returns
+ *	Void.
+ */
+extern void bpf_list_push_back(struct bpf_list_head *head, struct bpf_list_node *node) __ksym;
+
+/* Description
+ *	Remove the entry at the beginning of the BPF linked list.
+ * Returns
+ *	Pointer to bpf_list_node of deleted entry, or NULL if list is empty.
+ */
+extern struct bpf_list_node *bpf_list_pop_front(struct bpf_list_head *head) __ksym;
+
+/* Description
+ *	Remove the entry at the end of the BPF linked list.
+ * Returns
+ *	Pointer to bpf_list_node of deleted entry, or NULL if list is empty.
+ */
+extern struct bpf_list_node *bpf_list_pop_back(struct bpf_list_head *head) __ksym;
+
 #endif
-- 
2.38.1


^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [PATCH bpf-next v7 21/26] bpf: Add 'release on unlock' logic for bpf_list_push_{front,back}
  2022-11-14 19:15 [PATCH bpf-next v7 00/26] Allocated objects, BPF linked lists Kumar Kartikeya Dwivedi
                   ` (19 preceding siblings ...)
  2022-11-14 19:15 ` [PATCH bpf-next v7 20/26] bpf: Introduce single ownership BPF linked list API Kumar Kartikeya Dwivedi
@ 2022-11-14 19:15 ` Kumar Kartikeya Dwivedi
  2022-11-15 16:52   ` Dave Marchevsky
  2022-11-14 19:15 ` [PATCH bpf-next v7 22/26] selftests/bpf: Add __contains macro to bpf_experimental.h Kumar Kartikeya Dwivedi
                   ` (5 subsequent siblings)
  26 siblings, 1 reply; 48+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2022-11-14 19:15 UTC (permalink / raw)
  To: bpf
  Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Martin KaFai Lau, Dave Marchevsky

This commit implements the delayed release logic for bpf_list_push_front
and bpf_list_push_back.

Once a node has been added to the list, its pointer changes to
PTR_UNTRUSTED. However, the reference is only released once the lock
protecting the list is unlocked. For such PTR_TO_BTF_ID | MEM_ALLOC
registers with PTR_UNTRUSTED set but an active ref_obj_id, reading them
is still permitted as long as the lock is held. Writing to them is not
allowed.

This allows read access to pushed items we no longer own until we
release the lock guarding the list, allowing a little more flexibility
when working with these APIs.
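
For example (a sketch; e is an allocated object and glock/ghead are as
in the previous commit's example):

	bpf_spin_lock(&glock);
	bpf_list_push_front(&ghead, &e->node);
	/* e is now PTR_UNTRUSTED; reading it is still allowed here */
	data = e->data;
	/* e->data = 0; would be rejected */
	bpf_spin_unlock(&glock);
	/* e's reference is released here; it can no longer be used */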

Note that enabling write support has fairly tricky interactions with
what happens inside the critical section. Just as an example, currently,
bpf_obj_drop is not permitted, but if it were, being able to write to
the PTR_UNTRUSTED pointer while the object gets released back to the
memory allocator would violate safety properties we wish to guarantee
(i.e. not crashing the kernel). The memory could be reused for a
different type in the BPF program or even in the kernel as it gets
eventually kfree'd.

Not enabling bpf_obj_drop inside the critical section would appear to
prevent all of the above, but that is more of an artificial limitation
right now. Since the write support is tangled with how we handle
potential aliasing of nodes inside the critical section that may or may
not be part of the list anymore, it has been deferred to a future patch.

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
---
 include/linux/bpf_verifier.h |  5 ++++
 kernel/bpf/verifier.c        | 48 +++++++++++++++++++++++++++++++++++-
 2 files changed, 52 insertions(+), 1 deletion(-)

diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h
index 306fc1d6cc4a..740e774e1c7a 100644
--- a/include/linux/bpf_verifier.h
+++ b/include/linux/bpf_verifier.h
@@ -223,6 +223,11 @@ struct bpf_reference_state {
 	 * exiting a callback function.
 	 */
 	int callback_ref;
+	/* Mark the reference state to release the registers sharing the same id
+	 * on bpf_spin_unlock (for nodes that we will lose ownership to but are
+	 * safe to access inside the critical section).
+	 */
+	bool release_on_unlock;
 };
 
 /* state of the program:
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index c034ca2d9479..8725c2ee7eb4 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -5648,7 +5648,9 @@ static int process_spin_lock(struct bpf_verifier_env *env, int regno,
 			cur->active_lock.ptr = btf;
 		cur->active_lock.id = reg->id;
 	} else {
+		struct bpf_func_state *fstate = cur_func(env);
 		void *ptr;
+		int i;
 
 		if (map)
 			ptr = map;
@@ -5666,6 +5668,16 @@ static int process_spin_lock(struct bpf_verifier_env *env, int regno,
 		}
 		cur->active_lock.ptr = NULL;
 		cur->active_lock.id = 0;
+
+		for (i = 0; i < fstate->acquired_refs; i++) {
+			/* WARN because this reference state cannot be freed
+			 * before this point, as bpf_spin_lock critical section
+			 * does not allow functions that release the allocated
+			 * object immediately.
+			 */
+			if (fstate->refs[i].release_on_unlock)
+				WARN_ON_ONCE(release_reference(env, fstate->refs[i].id));
+		}
 	}
 	return 0;
 }
@@ -8262,6 +8274,39 @@ static int process_kf_arg_ptr_to_kptr_strong(struct bpf_verifier_env *env,
 	return 0;
 }
 
+static int ref_set_release_on_unlock(struct bpf_verifier_env *env, u32 ref_obj_id)
+{
+	struct bpf_func_state *state = cur_func(env);
+	struct bpf_reg_state *reg;
+	int i;
+
+	/* bpf_spin_lock only allows calling list_push and list_pop, no BPF
+	 * subprogs, no global functions. This means that the references would
+	 * not be released inside the critical section but they may be added to
+	 * the reference state, and the acquired_refs are never copied out for a
+	 * different frame as BPF to BPF calls don't work in bpf_spin_lock
+	 * critical sections.
+	 */
+	if (!ref_obj_id) {
+		verbose(env, "verifier internal error: ref_obj_id is zero for release_on_unlock\n");
+		return -EFAULT;
+	}
+	for (i = 0; i < state->acquired_refs; i++) {
+		if (state->refs[i].id == ref_obj_id) {
+			WARN_ON_ONCE(state->refs[i].release_on_unlock);
+			state->refs[i].release_on_unlock = true;
+			/* Now mark everyone sharing same ref_obj_id as untrusted */
+			bpf_for_each_reg_in_vstate(env->cur_state, state, reg, ({
+				if (reg->ref_obj_id == ref_obj_id)
+					reg->type |= PTR_UNTRUSTED;
+			}));
+			return 0;
+		}
+	}
+	verbose(env, "verifier internal error: ref state missing for ref_obj_id\n");
+	return -EFAULT;
+}
+
 /* Implementation details:
  *
  * Each register points to some region of memory, which we define as an
@@ -8447,7 +8492,8 @@ static int process_kf_arg_ptr_to_list_node(struct bpf_verifier_env *env,
 			field->list_head.node_offset);
 		return -EINVAL;
 	}
-	return 0;
+	/* Set arg#1 for expiration after unlock */
+	return ref_set_release_on_unlock(env, reg->ref_obj_id);
 }
 
 static int check_kfunc_args(struct bpf_verifier_env *env, struct bpf_kfunc_call_arg_meta *meta)
-- 
2.38.1


^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [PATCH bpf-next v7 22/26] selftests/bpf: Add __contains macro to bpf_experimental.h
  2022-11-14 19:15 [PATCH bpf-next v7 00/26] Allocated objects, BPF linked lists Kumar Kartikeya Dwivedi
                   ` (20 preceding siblings ...)
  2022-11-14 19:15 ` [PATCH bpf-next v7 21/26] bpf: Add 'release on unlock' logic for bpf_list_push_{front,back} Kumar Kartikeya Dwivedi
@ 2022-11-14 19:15 ` Kumar Kartikeya Dwivedi
  2022-11-15 16:40   ` Dave Marchevsky
  2022-11-14 19:15 ` [PATCH bpf-next v7 23/26] selftests/bpf: Update spinlock selftest Kumar Kartikeya Dwivedi
                   ` (4 subsequent siblings)
  26 siblings, 1 reply; 48+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2022-11-14 19:15 UTC (permalink / raw)
  To: bpf
  Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Martin KaFai Lau, Dave Marchevsky

Add a user facing __contains macro which provides a convenient wrapper
over the verbose, kernel specific BTF declaration tag required to
annotate bpf_list_head fields in user types.
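
For example, with the macro one can write (struct names illustrative):

	struct elem {
		int data;
		struct bpf_list_node node;
	};

	struct map_value {
		struct bpf_spin_lock lock;
		struct bpf_list_head head __contains(elem, node);
	};

instead of spelling out
__attribute__((btf_decl_tag("contains:elem:node"))) by hand.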

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
---
 tools/testing/selftests/bpf/bpf_experimental.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/tools/testing/selftests/bpf/bpf_experimental.h b/tools/testing/selftests/bpf/bpf_experimental.h
index d6b143275e82..424f7bbbfe9b 100644
--- a/tools/testing/selftests/bpf/bpf_experimental.h
+++ b/tools/testing/selftests/bpf/bpf_experimental.h
@@ -6,6 +6,8 @@
 #include <bpf/bpf_helpers.h>
 #include <bpf/bpf_core_read.h>
 
+#define __contains(name, node) __attribute__((btf_decl_tag("contains:" #name ":" #node)))
+
 /* Description
  *	Allocates an object of the type represented by 'local_type_id' in
  *	program BTF. User may use the bpf_core_type_id_local macro to pass the
-- 
2.38.1


^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [PATCH bpf-next v7 23/26] selftests/bpf: Update spinlock selftest
  2022-11-14 19:15 [PATCH bpf-next v7 00/26] Allocated objects, BPF linked lists Kumar Kartikeya Dwivedi
                   ` (21 preceding siblings ...)
  2022-11-14 19:15 ` [PATCH bpf-next v7 22/26] selftests/bpf: Add __contains macro to bpf_experimental.h Kumar Kartikeya Dwivedi
@ 2022-11-14 19:15 ` Kumar Kartikeya Dwivedi
  2022-11-14 19:15 ` [PATCH bpf-next v7 24/26] selftests/bpf: Add failure test cases for spin lock pairing Kumar Kartikeya Dwivedi
                   ` (3 subsequent siblings)
  26 siblings, 0 replies; 48+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2022-11-14 19:15 UTC (permalink / raw)
  To: bpf
  Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Martin KaFai Lau, Dave Marchevsky

Make updates in preparation for adding more test cases to this selftest:
- Convert from CHECK_ to ASSERT macros.
- Use BPF skeleton
- Fix typo sping -> spin
- Rename spinlock.c -> spin_lock.c

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
---
 .../selftests/bpf/prog_tests/spin_lock.c      | 49 +++++++++++++++++++
 .../selftests/bpf/prog_tests/spinlock.c       | 45 -----------------
 .../selftests/bpf/progs/test_spin_lock.c      |  4 +-
 3 files changed, 51 insertions(+), 47 deletions(-)
 create mode 100644 tools/testing/selftests/bpf/prog_tests/spin_lock.c
 delete mode 100644 tools/testing/selftests/bpf/prog_tests/spinlock.c

diff --git a/tools/testing/selftests/bpf/prog_tests/spin_lock.c b/tools/testing/selftests/bpf/prog_tests/spin_lock.c
new file mode 100644
index 000000000000..fab061e9d77c
--- /dev/null
+++ b/tools/testing/selftests/bpf/prog_tests/spin_lock.c
@@ -0,0 +1,49 @@
+// SPDX-License-Identifier: GPL-2.0
+#include <test_progs.h>
+#include <network_helpers.h>
+
+#include "test_spin_lock.skel.h"
+
+static void *spin_lock_thread(void *arg)
+{
+	int err, prog_fd = *(u32 *) arg;
+	LIBBPF_OPTS(bpf_test_run_opts, topts,
+		.data_in = &pkt_v4,
+		.data_size_in = sizeof(pkt_v4),
+		.repeat = 10000,
+	);
+
+	err = bpf_prog_test_run_opts(prog_fd, &topts);
+	ASSERT_OK(err, "test_run");
+	ASSERT_OK(topts.retval, "test_run retval");
+	pthread_exit(arg);
+}
+
+void test_spinlock(void)
+{
+	struct test_spin_lock *skel;
+	pthread_t thread_id[4];
+	int prog_fd, i;
+	void *ret;
+
+	skel = test_spin_lock__open_and_load();
+	if (!ASSERT_OK_PTR(skel, "test_spin_lock__open_and_load"))
+		return;
+	prog_fd = bpf_program__fd(skel->progs.bpf_spin_lock_test);
+	for (i = 0; i < 4; i++) {
+		int err;
+
+		err = pthread_create(&thread_id[i], NULL, &spin_lock_thread, &prog_fd);
+		if (!ASSERT_OK(err, "pthread_create"))
+			goto end;
+	}
+
+	for (i = 0; i < 4; i++) {
+		if (!ASSERT_OK(pthread_join(thread_id[i], &ret), "pthread_join"))
+			goto end;
+		if (!ASSERT_EQ(ret, &prog_fd, "ret == prog_fd"))
+			goto end;
+	}
+end:
+	test_spin_lock__destroy(skel);
+}
diff --git a/tools/testing/selftests/bpf/prog_tests/spinlock.c b/tools/testing/selftests/bpf/prog_tests/spinlock.c
deleted file mode 100644
index 15eb1372d771..000000000000
--- a/tools/testing/selftests/bpf/prog_tests/spinlock.c
+++ /dev/null
@@ -1,45 +0,0 @@
-// SPDX-License-Identifier: GPL-2.0
-#include <test_progs.h>
-#include <network_helpers.h>
-
-static void *spin_lock_thread(void *arg)
-{
-	int err, prog_fd = *(u32 *) arg;
-	LIBBPF_OPTS(bpf_test_run_opts, topts,
-		.data_in = &pkt_v4,
-		.data_size_in = sizeof(pkt_v4),
-		.repeat = 10000,
-	);
-
-	err = bpf_prog_test_run_opts(prog_fd, &topts);
-	ASSERT_OK(err, "test_run");
-	ASSERT_OK(topts.retval, "test_run retval");
-	pthread_exit(arg);
-}
-
-void test_spinlock(void)
-{
-	const char *file = "./test_spin_lock.bpf.o";
-	pthread_t thread_id[4];
-	struct bpf_object *obj = NULL;
-	int prog_fd;
-	int err = 0, i;
-	void *ret;
-
-	err = bpf_prog_test_load(file, BPF_PROG_TYPE_CGROUP_SKB, &obj, &prog_fd);
-	if (CHECK_FAIL(err)) {
-		printf("test_spin_lock:bpf_prog_test_load errno %d\n", errno);
-		goto close_prog;
-	}
-	for (i = 0; i < 4; i++)
-		if (CHECK_FAIL(pthread_create(&thread_id[i], NULL,
-					      &spin_lock_thread, &prog_fd)))
-			goto close_prog;
-
-	for (i = 0; i < 4; i++)
-		if (CHECK_FAIL(pthread_join(thread_id[i], &ret) ||
-			       ret != (void *)&prog_fd))
-			goto close_prog;
-close_prog:
-	bpf_object__close(obj);
-}
diff --git a/tools/testing/selftests/bpf/progs/test_spin_lock.c b/tools/testing/selftests/bpf/progs/test_spin_lock.c
index 7e88309d3229..5bd10409285b 100644
--- a/tools/testing/selftests/bpf/progs/test_spin_lock.c
+++ b/tools/testing/selftests/bpf/progs/test_spin_lock.c
@@ -45,8 +45,8 @@ struct {
 
 #define CREDIT_PER_NS(delta, rate) (((delta) * rate) >> 20)
 
-SEC("tc")
-int bpf_sping_lock_test(struct __sk_buff *skb)
+SEC("cgroup_skb/ingress")
+int bpf_spin_lock_test(struct __sk_buff *skb)
 {
 	volatile int credit = 0, max_credit = 100, pkt_len = 64;
 	struct hmap_elem zero = {}, *val;
-- 
2.38.1


^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [PATCH bpf-next v7 24/26] selftests/bpf: Add failure test cases for spin lock pairing
  2022-11-14 19:15 [PATCH bpf-next v7 00/26] Allocated objects, BPF linked lists Kumar Kartikeya Dwivedi
                   ` (22 preceding siblings ...)
  2022-11-14 19:15 ` [PATCH bpf-next v7 23/26] selftests/bpf: Update spinlock selftest Kumar Kartikeya Dwivedi
@ 2022-11-14 19:15 ` Kumar Kartikeya Dwivedi
  2022-11-14 19:15 ` [PATCH bpf-next v7 25/26] selftests/bpf: Add BPF linked list API tests Kumar Kartikeya Dwivedi
                   ` (2 subsequent siblings)
  26 siblings, 0 replies; 48+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2022-11-14 19:15 UTC (permalink / raw)
  To: bpf
  Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Martin KaFai Lau, Dave Marchevsky

First, ensure that whenever a bpf_spin_lock is present in an allocation,
the reg->id is preserved. This won't be true for global variables,
however, since they have a single map value per map; the verifier
hardcodes it to 0 (so that multiple pseudo ldimm64 insns can yield the
same lock object per map at a given offset).

Next, add test cases for all possible combinations (kptr, global, map
value, inner map value). Since we lifted the restriction on locking in
inner maps, also add test cases for them. Currently, each lookup into an
inner map gets a fresh reg->id, so even if the reg->map_ptr is the same,
they will be treated as separate allocations and the incorrect unlock
pairing will be rejected.

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
---
 .../selftests/bpf/prog_tests/spin_lock.c      |  89 +++++++-
 .../selftests/bpf/progs/test_spin_lock_fail.c | 204 ++++++++++++++++++
 2 files changed, 292 insertions(+), 1 deletion(-)
 create mode 100644 tools/testing/selftests/bpf/progs/test_spin_lock_fail.c

diff --git a/tools/testing/selftests/bpf/prog_tests/spin_lock.c b/tools/testing/selftests/bpf/prog_tests/spin_lock.c
index fab061e9d77c..72282e92a78a 100644
--- a/tools/testing/selftests/bpf/prog_tests/spin_lock.c
+++ b/tools/testing/selftests/bpf/prog_tests/spin_lock.c
@@ -3,6 +3,79 @@
 #include <network_helpers.h>
 
 #include "test_spin_lock.skel.h"
+#include "test_spin_lock_fail.skel.h"
+
+static char log_buf[1024 * 1024];
+
+static struct {
+	const char *prog_name;
+	const char *err_msg;
+} spin_lock_fail_tests[] = {
+	{ "lock_id_kptr_preserve",
+	  "5: (bf) r1 = r0                       ; R0_w=ptr_foo(id=2,ref_obj_id=2,off=0,imm=0) "
+	  "R1_w=ptr_foo(id=2,ref_obj_id=2,off=0,imm=0) refs=2\n6: (85) call bpf_this_cpu_ptr#154\n"
+	  "R1 type=ptr_ expected=percpu_ptr_" },
+	{ "lock_id_global_zero",
+	  "; R1_w=map_value(off=0,ks=4,vs=4,imm=0)\n2: (85) call bpf_this_cpu_ptr#154\n"
+	  "R1 type=map_value expected=percpu_ptr_" },
+	{ "lock_id_mapval_preserve",
+	  "8: (bf) r1 = r0                       ; R0_w=map_value(id=1,off=0,ks=4,vs=8,imm=0) "
+	  "R1_w=map_value(id=1,off=0,ks=4,vs=8,imm=0)\n9: (85) call bpf_this_cpu_ptr#154\n"
+	  "R1 type=map_value expected=percpu_ptr_" },
+	{ "lock_id_innermapval_preserve",
+	  "13: (bf) r1 = r0                      ; R0=map_value(id=2,off=0,ks=4,vs=8,imm=0) "
+	  "R1_w=map_value(id=2,off=0,ks=4,vs=8,imm=0)\n14: (85) call bpf_this_cpu_ptr#154\n"
+	  "R1 type=map_value expected=percpu_ptr_" },
+	{ "lock_id_mismatch_kptr_kptr", "bpf_spin_unlock of different lock" },
+	{ "lock_id_mismatch_kptr_global", "bpf_spin_unlock of different lock" },
+	{ "lock_id_mismatch_kptr_mapval", "bpf_spin_unlock of different lock" },
+	{ "lock_id_mismatch_kptr_innermapval", "bpf_spin_unlock of different lock" },
+	{ "lock_id_mismatch_global_global", "bpf_spin_unlock of different lock" },
+	{ "lock_id_mismatch_global_kptr", "bpf_spin_unlock of different lock" },
+	{ "lock_id_mismatch_global_mapval", "bpf_spin_unlock of different lock" },
+	{ "lock_id_mismatch_global_innermapval", "bpf_spin_unlock of different lock" },
+	{ "lock_id_mismatch_mapval_mapval", "bpf_spin_unlock of different lock" },
+	{ "lock_id_mismatch_mapval_kptr", "bpf_spin_unlock of different lock" },
+	{ "lock_id_mismatch_mapval_global", "bpf_spin_unlock of different lock" },
+	{ "lock_id_mismatch_mapval_innermapval", "bpf_spin_unlock of different lock" },
+	{ "lock_id_mismatch_innermapval_innermapval1", "bpf_spin_unlock of different lock" },
+	{ "lock_id_mismatch_innermapval_innermapval2", "bpf_spin_unlock of different lock" },
+	{ "lock_id_mismatch_innermapval_kptr", "bpf_spin_unlock of different lock" },
+	{ "lock_id_mismatch_innermapval_global", "bpf_spin_unlock of different lock" },
+	{ "lock_id_mismatch_innermapval_mapval", "bpf_spin_unlock of different lock" },
+};
+
+static void test_spin_lock_fail_prog(const char *prog_name, const char *err_msg)
+{
+	LIBBPF_OPTS(bpf_object_open_opts, opts, .kernel_log_buf = log_buf,
+						.kernel_log_size = sizeof(log_buf),
+						.kernel_log_level = 1);
+	struct test_spin_lock_fail *skel;
+	struct bpf_program *prog;
+	int ret;
+
+	skel = test_spin_lock_fail__open_opts(&opts);
+	if (!ASSERT_OK_PTR(skel, "test_spin_lock_fail__open_opts"))
+		return;
+
+	prog = bpf_object__find_program_by_name(skel->obj, prog_name);
+	if (!ASSERT_OK_PTR(prog, "bpf_object__find_program_by_name"))
+		goto end;
+
+	bpf_program__set_autoload(prog, true);
+
+	ret = test_spin_lock_fail__load(skel);
+	if (!ASSERT_ERR(ret, "test_spin_lock_fail__load must fail"))
+		goto end;
+
+	if (!ASSERT_OK_PTR(strstr(log_buf, err_msg), "expected error message")) {
+		fprintf(stderr, "Expected: %s\n", err_msg);
+		fprintf(stderr, "Verifier: %s\n", log_buf);
+	}
+
+end:
+	test_spin_lock_fail__destroy(skel);
+}
 
 static void *spin_lock_thread(void *arg)
 {
@@ -19,7 +92,7 @@ static void *spin_lock_thread(void *arg)
 	pthread_exit(arg);
 }
 
-void test_spinlock(void)
+void test_spin_lock_success(void)
 {
 	struct test_spin_lock *skel;
 	pthread_t thread_id[4];
@@ -47,3 +120,17 @@ void test_spinlock(void)
 end:
 	test_spin_lock__destroy(skel);
 }
+
+void test_spin_lock(void)
+{
+	int i;
+
+	test_spin_lock_success();
+
+	for (i = 0; i < ARRAY_SIZE(spin_lock_fail_tests); i++) {
+		if (!test__start_subtest(spin_lock_fail_tests[i].prog_name))
+			continue;
+		test_spin_lock_fail_prog(spin_lock_fail_tests[i].prog_name,
+					 spin_lock_fail_tests[i].err_msg);
+	}
+}
diff --git a/tools/testing/selftests/bpf/progs/test_spin_lock_fail.c b/tools/testing/selftests/bpf/progs/test_spin_lock_fail.c
new file mode 100644
index 000000000000..86cd183ef6dc
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/test_spin_lock_fail.c
@@ -0,0 +1,204 @@
+// SPDX-License-Identifier: GPL-2.0
+#include <vmlinux.h>
+#include <bpf/bpf_tracing.h>
+#include <bpf/bpf_helpers.h>
+#include "bpf_experimental.h"
+
+struct foo {
+	struct bpf_spin_lock lock;
+	int data;
+};
+
+struct array_map {
+	__uint(type, BPF_MAP_TYPE_ARRAY);
+	__type(key, int);
+	__type(value, struct foo);
+	__uint(max_entries, 1);
+} array_map SEC(".maps");
+
+struct {
+	__uint(type, BPF_MAP_TYPE_ARRAY_OF_MAPS);
+	__uint(max_entries, 1);
+	__type(key, int);
+	__type(value, int);
+	__array(values, struct array_map);
+} map_of_maps SEC(".maps") = {
+	.values = {
+		[0] = &array_map,
+	},
+};
+
+SEC(".data.A") struct bpf_spin_lock lockA;
+SEC(".data.B") struct bpf_spin_lock lockB;
+
+SEC("?tc")
+int lock_id_kptr_preserve(void *ctx)
+{
+	struct foo *f;
+
+	f = bpf_obj_new(typeof(*f));
+	if (!f)
+		return 0;
+	bpf_this_cpu_ptr(f);
+	return 0;
+}
+
+SEC("?tc")
+int lock_id_global_zero(void *ctx)
+{
+	bpf_this_cpu_ptr(&lockA);
+	return 0;
+}
+
+SEC("?tc")
+int lock_id_mapval_preserve(void *ctx)
+{
+	struct foo *f;
+	int key = 0;
+
+	f = bpf_map_lookup_elem(&array_map, &key);
+	if (!f)
+		return 0;
+	bpf_this_cpu_ptr(f);
+	return 0;
+}
+
+SEC("?tc")
+int lock_id_innermapval_preserve(void *ctx)
+{
+	struct foo *f;
+	int key = 0;
+	void *map;
+
+	map = bpf_map_lookup_elem(&map_of_maps, &key);
+	if (!map)
+		return 0;
+	f = bpf_map_lookup_elem(map, &key);
+	if (!f)
+		return 0;
+	bpf_this_cpu_ptr(f);
+	return 0;
+}
+
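+/* Each program takes lock A but releases lock B; since the two locks belong
+ * to different allocations or map values, the verifier must reject the
+ * bpf_spin_unlock call.
+ */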
+#define CHECK(test, A, B)                                      \
+	SEC("?tc")                                             \
+	int lock_id_mismatch_##test(void *ctx)                 \
+	{                                                      \
+		struct foo *f1, *f2, *v, *iv;                  \
+		int key = 0;                                   \
+		void *map;                                     \
+                                                               \
+		map = bpf_map_lookup_elem(&map_of_maps, &key); \
+		if (!map)                                      \
+			return 0;                              \
+		iv = bpf_map_lookup_elem(map, &key);           \
+		if (!iv)                                       \
+			return 0;                              \
+		v = bpf_map_lookup_elem(&array_map, &key);     \
+		if (!v)                                        \
+			return 0;                              \
+		f1 = bpf_obj_new(typeof(*f1));                 \
+		if (!f1)                                       \
+			return 0;                              \
+		f2 = bpf_obj_new(typeof(*f2));                 \
+		if (!f2) {                                     \
+			bpf_obj_drop(f1);                      \
+			return 0;                              \
+		}                                              \
+		bpf_spin_lock(A);                              \
+		bpf_spin_unlock(B);                            \
+		return 0;                                      \
+	}
+
+CHECK(kptr_kptr, &f1->lock, &f2->lock);
+CHECK(kptr_global, &f1->lock, &lockA);
+CHECK(kptr_mapval, &f1->lock, &v->lock);
+CHECK(kptr_innermapval, &f1->lock, &iv->lock);
+
+CHECK(global_global, &lockA, &lockB);
+CHECK(global_kptr, &lockA, &f1->lock);
+CHECK(global_mapval, &lockA, &v->lock);
+CHECK(global_innermapval, &lockA, &iv->lock);
+
+SEC("?tc")
+int lock_id_mismatch_mapval_mapval(void *ctx)
+{
+	struct foo *f1, *f2;
+	int key = 0;
+
+	f1 = bpf_map_lookup_elem(&array_map, &key);
+	if (!f1)
+		return 0;
+	f2 = bpf_map_lookup_elem(&array_map, &key);
+	if (!f2)
+		return 0;
+
+	bpf_spin_lock(&f1->lock);
+	f1->data = 42;
+	bpf_spin_unlock(&f2->lock);
+
+	return 0;
+}
+
+CHECK(mapval_kptr, &v->lock, &f1->lock);
+CHECK(mapval_global, &v->lock, &lockB);
+CHECK(mapval_innermapval, &v->lock, &iv->lock);
+
+SEC("?tc")
+int lock_id_mismatch_innermapval_innermapval1(void *ctx)
+{
+	struct foo *f1, *f2;
+	int key = 0;
+	void *map;
+
+	map = bpf_map_lookup_elem(&map_of_maps, &key);
+	if (!map)
+		return 0;
+	f1 = bpf_map_lookup_elem(map, &key);
+	if (!f1)
+		return 0;
+	f2 = bpf_map_lookup_elem(map, &key);
+	if (!f2)
+		return 0;
+
+	bpf_spin_lock(&f1->lock);
+	f1->data = 42;
+	bpf_spin_unlock(&f2->lock);
+
+	return 0;
+}
+
+SEC("?tc")
+int lock_id_mismatch_innermapval_innermapval2(void *ctx)
+{
+	struct foo *f1, *f2;
+	int key = 0;
+	void *map;
+
+	map = bpf_map_lookup_elem(&map_of_maps, &key);
+	if (!map)
+		return 0;
+	f1 = bpf_map_lookup_elem(map, &key);
+	if (!f1)
+		return 0;
+	map = bpf_map_lookup_elem(&map_of_maps, &key);
+	if (!map)
+		return 0;
+	f2 = bpf_map_lookup_elem(map, &key);
+	if (!f2)
+		return 0;
+
+	bpf_spin_lock(&f1->lock);
+	f1->data = 42;
+	bpf_spin_unlock(&f2->lock);
+
+	return 0;
+}
+
+CHECK(innermapval_kptr, &iv->lock, &f1->lock);
+CHECK(innermapval_global, &iv->lock, &lockA);
+CHECK(innermapval_mapval, &iv->lock, &v->lock);
+
+#undef CHECK
+
+char _license[] SEC("license") = "GPL";
-- 
2.38.1



* [PATCH bpf-next v7 25/26] selftests/bpf: Add BPF linked list API tests
  2022-11-14 19:15 [PATCH bpf-next v7 00/26] Allocated objects, BPF linked lists Kumar Kartikeya Dwivedi
                   ` (23 preceding siblings ...)
  2022-11-14 19:15 ` [PATCH bpf-next v7 24/26] selftests/bpf: Add failure test cases for spin lock pairing Kumar Kartikeya Dwivedi
@ 2022-11-14 19:15 ` Kumar Kartikeya Dwivedi
  2022-11-14 19:15 ` [PATCH bpf-next v7 26/26] selftests/bpf: Add BTF sanity tests Kumar Kartikeya Dwivedi
  2022-11-15  6:00 ` [PATCH bpf-next v7 00/26] Allocated objects, BPF linked lists patchwork-bot+netdevbpf
  26 siblings, 0 replies; 48+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2022-11-14 19:15 UTC (permalink / raw)
  To: bpf
  Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Martin KaFai Lau, Dave Marchevsky

Include various tests covering the success and failure cases. Also, run
the success cases at runtime to verify correctness of linked list
manipulation routines, in addition to ensuring successful verification.
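
For reference, the core pattern the success tests exercise looks roughly
like the sketch below (illustrative names; assumes the vmlinux.h,
bpf_experimental.h and selftest header includes used by the actual progs,
where private() and __contains() are defined):

  struct elem {
  	struct bpf_list_node node;
  	int data;
  };

  private(A) struct bpf_spin_lock glock;
  private(A) struct bpf_list_head ghead __contains(elem, node);

  SEC("tc")
  int example(void *ctx)
  {
  	struct bpf_list_node *n;
  	struct elem *e;

  	e = bpf_obj_new(typeof(*e));
  	if (!e)
  		return 0;

  	bpf_spin_lock(&glock);
  	e->data = 42;
  	bpf_list_push_front(&ghead, &e->node);
  	bpf_spin_unlock(&glock);

  	bpf_spin_lock(&glock);
  	n = bpf_list_pop_front(&ghead);
  	bpf_spin_unlock(&glock);
  	if (n)
  		bpf_obj_drop(container_of(n, struct elem, node));
  	return 0;
  }

Each push transfers ownership of the node to the list, and each successful
pop acquires a reference that must be released with bpf_obj_drop.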

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
---
 tools/testing/selftests/bpf/DENYLIST.s390x    |   1 +
 .../selftests/bpf/prog_tests/linked_list.c    | 253 ++++++++
 .../testing/selftests/bpf/progs/linked_list.c | 370 +++++++++++
 .../testing/selftests/bpf/progs/linked_list.h |  56 ++
 .../selftests/bpf/progs/linked_list_fail.c    | 581 ++++++++++++++++++
 5 files changed, 1261 insertions(+)
 create mode 100644 tools/testing/selftests/bpf/prog_tests/linked_list.c
 create mode 100644 tools/testing/selftests/bpf/progs/linked_list.c
 create mode 100644 tools/testing/selftests/bpf/progs/linked_list.h
 create mode 100644 tools/testing/selftests/bpf/progs/linked_list_fail.c

diff --git a/tools/testing/selftests/bpf/DENYLIST.s390x b/tools/testing/selftests/bpf/DENYLIST.s390x
index be4e3d47ea3e..072243af93b0 100644
--- a/tools/testing/selftests/bpf/DENYLIST.s390x
+++ b/tools/testing/selftests/bpf/DENYLIST.s390x
@@ -33,6 +33,7 @@ ksyms_module                             # test_ksyms_module__open_and_load unex
 ksyms_module_libbpf                      # JIT does not support calling kernel function                                (kfunc)
 ksyms_module_lskel                       # test_ksyms_module_lskel__open_and_load unexpected error: -9                 (?)
 libbpf_get_fd_by_id_opts                 # failed to attach: ERROR: strerror_r(-524)=22                                (trampoline)
+linked_list				 # JIT does not support calling kernel function                                (kfunc)
 lookup_key                               # JIT does not support calling kernel function                                (kfunc)
 lru_bug                                  # prog 'printk': failed to auto-attach: -524
 map_kptr                                 # failed to open_and_load program: -524 (trampoline)
diff --git a/tools/testing/selftests/bpf/prog_tests/linked_list.c b/tools/testing/selftests/bpf/prog_tests/linked_list.c
new file mode 100644
index 000000000000..e8569db2f3bc
--- /dev/null
+++ b/tools/testing/selftests/bpf/prog_tests/linked_list.c
@@ -0,0 +1,253 @@
+// SPDX-License-Identifier: GPL-2.0
+#include <test_progs.h>
+#include <network_helpers.h>
+
+#include "linked_list.skel.h"
+#include "linked_list_fail.skel.h"
+
+static char log_buf[1024 * 1024];
+
+static struct {
+	const char *prog_name;
+	const char *err_msg;
+} linked_list_fail_tests[] = {
+#define TEST(test, off) \
+	{ #test "_missing_lock_push_front", \
+	  "bpf_spin_lock at off=" #off " must be held for bpf_list_head" }, \
+	{ #test "_missing_lock_push_back", \
+	  "bpf_spin_lock at off=" #off " must be held for bpf_list_head" }, \
+	{ #test "_missing_lock_pop_front", \
+	  "bpf_spin_lock at off=" #off " must be held for bpf_list_head" }, \
+	{ #test "_missing_lock_pop_back", \
+	  "bpf_spin_lock at off=" #off " must be held for bpf_list_head" },
+	TEST(kptr, 32)
+	TEST(global, 16)
+	TEST(map, 0)
+	TEST(inner_map, 0)
+#undef TEST
+#define TEST(test, op) \
+	{ #test "_kptr_incorrect_lock_" #op, \
+	  "held lock and object are not in the same allocation\n" \
+	  "bpf_spin_lock at off=32 must be held for bpf_list_head" }, \
+	{ #test "_global_incorrect_lock_" #op, \
+	  "held lock and object are not in the same allocation\n" \
+	  "bpf_spin_lock at off=16 must be held for bpf_list_head" }, \
+	{ #test "_map_incorrect_lock_" #op, \
+	  "held lock and object are not in the same allocation\n" \
+	  "bpf_spin_lock at off=0 must be held for bpf_list_head" }, \
+	{ #test "_inner_map_incorrect_lock_" #op, \
+	  "held lock and object are not in the same allocation\n" \
+	  "bpf_spin_lock at off=0 must be held for bpf_list_head" },
+	TEST(kptr, push_front)
+	TEST(kptr, push_back)
+	TEST(kptr, pop_front)
+	TEST(kptr, pop_back)
+	TEST(global, push_front)
+	TEST(global, push_back)
+	TEST(global, pop_front)
+	TEST(global, pop_back)
+	TEST(map, push_front)
+	TEST(map, push_back)
+	TEST(map, pop_front)
+	TEST(map, pop_back)
+	TEST(inner_map, push_front)
+	TEST(inner_map, push_back)
+	TEST(inner_map, pop_front)
+	TEST(inner_map, pop_back)
+#undef TEST
+	{ "map_compat_kprobe", "tracing progs cannot use bpf_list_head yet" },
+	{ "map_compat_kretprobe", "tracing progs cannot use bpf_list_head yet" },
+	{ "map_compat_tp", "tracing progs cannot use bpf_list_head yet" },
+	{ "map_compat_perf", "tracing progs cannot use bpf_list_head yet" },
+	{ "map_compat_raw_tp", "tracing progs cannot use bpf_list_head yet" },
+	{ "map_compat_raw_tp_w", "tracing progs cannot use bpf_list_head yet" },
+	{ "obj_type_id_oor", "local type ID argument must be in range [0, U32_MAX]" },
+	{ "obj_new_no_composite", "bpf_obj_new type ID argument must be of a struct" },
+	{ "obj_new_no_struct", "bpf_obj_new type ID argument must be of a struct" },
+	{ "obj_drop_non_zero_off", "R1 must have zero offset when passed to release func" },
+	{ "new_null_ret", "R0 invalid mem access 'ptr_or_null_'" },
+	{ "obj_new_acq", "Unreleased reference id=" },
+	{ "use_after_drop", "invalid mem access 'scalar'" },
+	{ "ptr_walk_scalar", "type=scalar expected=percpu_ptr_" },
+	{ "direct_read_lock", "direct access to bpf_spin_lock is disallowed" },
+	{ "direct_write_lock", "direct access to bpf_spin_lock is disallowed" },
+	{ "direct_read_head", "direct access to bpf_list_head is disallowed" },
+	{ "direct_write_head", "direct access to bpf_list_head is disallowed" },
+	{ "direct_read_node", "direct access to bpf_list_node is disallowed" },
+	{ "direct_write_node", "direct access to bpf_list_node is disallowed" },
+	{ "write_after_push_front", "only read is supported" },
+	{ "write_after_push_back", "only read is supported" },
+	{ "use_after_unlock_push_front", "invalid mem access 'scalar'" },
+	{ "use_after_unlock_push_back", "invalid mem access 'scalar'" },
+	{ "double_push_front", "arg#1 expected pointer to allocated object" },
+	{ "double_push_back", "arg#1 expected pointer to allocated object" },
+	{ "no_node_value_type", "bpf_list_node not found for allocated object\n" },
+	{ "incorrect_value_type", "bpf_list_head value type does not match arg#1" },
+	{ "incorrect_node_var_off", "variable ptr_ access var_off=(0x0; 0xffffffff) disallowed" },
+	{ "incorrect_node_off1", "bpf_list_node not found at offset=1" },
+	{ "incorrect_node_off2", "arg#1 offset must be for bpf_list_node at off=0" },
+	{ "no_head_type", "bpf_list_head not found for allocated object" },
+	{ "incorrect_head_var_off1", "R1 doesn't have constant offset" },
+	{ "incorrect_head_var_off2", "variable ptr_ access var_off=(0x0; 0xffffffff) disallowed" },
+	{ "incorrect_head_off1", "bpf_list_head not found at offset=17" },
+	{ "incorrect_head_off2", "bpf_list_head not found at offset=1" },
+	{ "pop_front_off",
+	  "15: (bf) r1 = r6                      ; R1_w=ptr_or_null_foo(id=4,ref_obj_id=4,off=40,imm=0) "
+	  "R6_w=ptr_or_null_foo(id=4,ref_obj_id=4,off=40,imm=0) refs=2,4\n"
+	  "16: (85) call bpf_this_cpu_ptr#154\nR1 type=ptr_or_null_ expected=percpu_ptr_" },
+	{ "pop_back_off",
+	  "15: (bf) r1 = r6                      ; R1_w=ptr_or_null_foo(id=4,ref_obj_id=4,off=40,imm=0) "
+	  "R6_w=ptr_or_null_foo(id=4,ref_obj_id=4,off=40,imm=0) refs=2,4\n"
+	  "16: (85) call bpf_this_cpu_ptr#154\nR1 type=ptr_or_null_ expected=percpu_ptr_" },
+};
+
+static void test_linked_list_fail_prog(const char *prog_name, const char *err_msg)
+{
+	LIBBPF_OPTS(bpf_object_open_opts, opts, .kernel_log_buf = log_buf,
+						.kernel_log_size = sizeof(log_buf),
+						.kernel_log_level = 1);
+	struct linked_list_fail *skel;
+	struct bpf_program *prog;
+	int ret;
+
+	skel = linked_list_fail__open_opts(&opts);
+	if (!ASSERT_OK_PTR(skel, "linked_list_fail__open_opts"))
+		return;
+
+	prog = bpf_object__find_program_by_name(skel->obj, prog_name);
+	if (!ASSERT_OK_PTR(prog, "bpf_object__find_program_by_name"))
+		goto end;
+
+	bpf_program__set_autoload(prog, true);
+
+	ret = linked_list_fail__load(skel);
+	if (!ASSERT_ERR(ret, "linked_list_fail__load must fail"))
+		goto end;
+
+	if (!ASSERT_OK_PTR(strstr(log_buf, err_msg), "expected error message")) {
+		fprintf(stderr, "Expected: %s\n", err_msg);
+		fprintf(stderr, "Verifier: %s\n", log_buf);
+	}
+
+end:
+	linked_list_fail__destroy(skel);
+}
+
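+/* Overwrite the map value from userspace; the update makes the kernel
+ * release any list nodes the old value still owned, cleaning up after the
+ * leave_in_map runs.
+ */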
+static void clear_fields(struct bpf_map *map)
+{
+	char buf[24];
+	int key = 0;
+
+	memset(buf, 0xff, sizeof(buf));
+	ASSERT_OK(bpf_map__update_elem(map, &key, sizeof(key), buf, sizeof(buf), 0), "check_and_free_fields");
+}
+
+enum {
+	TEST_ALL,
+	PUSH_POP,
+	PUSH_POP_MULT,
+	LIST_IN_LIST,
+};
+
+static void test_linked_list_success(int mode, bool leave_in_map)
+{
+	LIBBPF_OPTS(bpf_test_run_opts, opts,
+		.data_in = &pkt_v4,
+		.data_size_in = sizeof(pkt_v4),
+		.repeat = 1,
+	);
+	struct linked_list *skel;
+	int ret;
+
+	skel = linked_list__open_and_load();
+	if (!ASSERT_OK_PTR(skel, "linked_list__open_and_load"))
+		return;
+
+	if (mode == LIST_IN_LIST)
+		goto lil;
+	if (mode == PUSH_POP_MULT)
+		goto ppm;
+
+	ret = bpf_prog_test_run_opts(bpf_program__fd(skel->progs.map_list_push_pop), &opts);
+	ASSERT_OK(ret, "map_list_push_pop");
+	ASSERT_OK(opts.retval, "map_list_push_pop retval");
+	if (!leave_in_map)
+		clear_fields(skel->maps.array_map);
+
+	ret = bpf_prog_test_run_opts(bpf_program__fd(skel->progs.inner_map_list_push_pop), &opts);
+	ASSERT_OK(ret, "inner_map_list_push_pop");
+	ASSERT_OK(opts.retval, "inner_map_list_push_pop retval");
+	if (!leave_in_map)
+		clear_fields(skel->maps.inner_map);
+
+	ret = bpf_prog_test_run_opts(bpf_program__fd(skel->progs.global_list_push_pop), &opts);
+	ASSERT_OK(ret, "global_list_push_pop");
+	ASSERT_OK(opts.retval, "global_list_push_pop retval");
+	if (!leave_in_map)
+		clear_fields(skel->maps.data_A);
+
+	if (mode == PUSH_POP)
+		goto end;
+
+ppm:
+	ret = bpf_prog_test_run_opts(bpf_program__fd(skel->progs.map_list_push_pop_multiple), &opts);
+	ASSERT_OK(ret, "map_list_push_pop_multiple");
+	ASSERT_OK(opts.retval, "map_list_push_pop_multiple retval");
+	if (!leave_in_map)
+		clear_fields(skel->maps.array_map);
+
+	ret = bpf_prog_test_run_opts(bpf_program__fd(skel->progs.inner_map_list_push_pop_multiple), &opts);
+	ASSERT_OK(ret, "inner_map_list_push_pop_multiple");
+	ASSERT_OK(opts.retval, "inner_map_list_push_pop_multiple retval");
+	if (!leave_in_map)
+		clear_fields(skel->maps.inner_map);
+
+	ret = bpf_prog_test_run_opts(bpf_program__fd(skel->progs.global_list_push_pop_multiple), &opts);
+	ASSERT_OK(ret, "global_list_push_pop_multiple");
+	ASSERT_OK(opts.retval, "global_list_push_pop_multiple retval");
+	if (!leave_in_map)
+		clear_fields(skel->maps.data_A);
+
+	if (mode == PUSH_POP_MULT)
+		goto end;
+
+lil:
+	ret = bpf_prog_test_run_opts(bpf_program__fd(skel->progs.map_list_in_list), &opts);
+	ASSERT_OK(ret, "map_list_in_list");
+	ASSERT_OK(opts.retval, "map_list_in_list retval");
+	if (!leave_in_map)
+		clear_fields(skel->maps.array_map);
+
+	ret = bpf_prog_test_run_opts(bpf_program__fd(skel->progs.inner_map_list_in_list), &opts);
+	ASSERT_OK(ret, "inner_map_list_in_list");
+	ASSERT_OK(opts.retval, "inner_map_list_in_list retval");
+	if (!leave_in_map)
+		clear_fields(skel->maps.inner_map);
+
+	ret = bpf_prog_test_run_opts(bpf_program__fd(skel->progs.global_list_in_list), &opts);
+	ASSERT_OK(ret, "global_list_in_list");
+	ASSERT_OK(opts.retval, "global_list_in_list retval");
+	if (!leave_in_map)
+		clear_fields(skel->maps.data_A);
+end:
+	linked_list__destroy(skel);
+}
+
+void test_linked_list(void)
+{
+	int i;
+
+	for (i = 0; i < ARRAY_SIZE(linked_list_fail_tests); i++) {
+		if (!test__start_subtest(linked_list_fail_tests[i].prog_name))
+			continue;
+		test_linked_list_fail_prog(linked_list_fail_tests[i].prog_name,
+					   linked_list_fail_tests[i].err_msg);
+	}
+	test_linked_list_success(PUSH_POP, false);
+	test_linked_list_success(PUSH_POP, true);
+	test_linked_list_success(PUSH_POP_MULT, false);
+	test_linked_list_success(PUSH_POP_MULT, true);
+	test_linked_list_success(LIST_IN_LIST, false);
+	test_linked_list_success(LIST_IN_LIST, true);
+	test_linked_list_success(TEST_ALL, false);
+}
diff --git a/tools/testing/selftests/bpf/progs/linked_list.c b/tools/testing/selftests/bpf/progs/linked_list.c
new file mode 100644
index 000000000000..2c7b615c6d41
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/linked_list.c
@@ -0,0 +1,370 @@
+// SPDX-License-Identifier: GPL-2.0
+#include <vmlinux.h>
+#include <bpf/bpf_tracing.h>
+#include <bpf/bpf_helpers.h>
+#include <bpf/bpf_core_read.h>
+#include "bpf_experimental.h"
+
+#ifndef ARRAY_SIZE
+#define ARRAY_SIZE(x) (sizeof(x) / sizeof((x)[0]))
+#endif
+
+#include "linked_list.h"
+
+static __always_inline
+int list_push_pop(struct bpf_spin_lock *lock, struct bpf_list_head *head, bool leave_in_map)
+{
+	struct bpf_list_node *n;
+	struct foo *f;
+
+	f = bpf_obj_new(typeof(*f));
+	if (!f)
+		return 2;
+
+	bpf_spin_lock(lock);
+	n = bpf_list_pop_front(head);
+	bpf_spin_unlock(lock);
+	if (n) {
+		bpf_obj_drop(container_of(n, struct foo, node));
+		bpf_obj_drop(f);
+		return 3;
+	}
+
+	bpf_spin_lock(lock);
+	n = bpf_list_pop_back(head);
+	bpf_spin_unlock(lock);
+	if (n) {
+		bpf_obj_drop(container_of(n, struct foo, node));
+		bpf_obj_drop(f);
+		return 4;
+	}
+
+	bpf_spin_lock(lock);
+	f->data = 42;
+	bpf_list_push_front(head, &f->node);
+	bpf_spin_unlock(lock);
+	if (leave_in_map)
+		return 0;
+	bpf_spin_lock(lock);
+	n = bpf_list_pop_back(head);
+	bpf_spin_unlock(lock);
+	if (!n)
+		return 5;
+	f = container_of(n, struct foo, node);
+	if (f->data != 42) {
+		bpf_obj_drop(f);
+		return 6;
+	}
+
+	bpf_spin_lock(lock);
+	f->data = 13;
+	bpf_list_push_front(head, &f->node);
+	bpf_spin_unlock(lock);
+	bpf_spin_lock(lock);
+	n = bpf_list_pop_front(head);
+	bpf_spin_unlock(lock);
+	if (!n)
+		return 7;
+	f = container_of(n, struct foo, node);
+	if (f->data != 13) {
+		bpf_obj_drop(f);
+		return 8;
+	}
+	bpf_obj_drop(f);
+
+	bpf_spin_lock(lock);
+	n = bpf_list_pop_front(head);
+	bpf_spin_unlock(lock);
+	if (n) {
+		bpf_obj_drop(container_of(n, struct foo, node));
+		return 9;
+	}
+
+	bpf_spin_lock(lock);
+	n = bpf_list_pop_back(head);
+	bpf_spin_unlock(lock);
+	if (n) {
+		bpf_obj_drop(container_of(n, struct foo, node));
+		return 10;
+	}
+	return 0;
+}
+
+static __always_inline
+int list_push_pop_multiple(struct bpf_spin_lock *lock, struct bpf_list_head *head, bool leave_in_map)
+{
+	struct bpf_list_node *n;
+	struct foo *f[8], *pf;
+	int i;
+
+	for (i = 0; i < ARRAY_SIZE(f); i++) {
+		f[i] = bpf_obj_new(typeof(**f));
+		if (!f[i])
+			return 2;
+		f[i]->data = i;
+		bpf_spin_lock(lock);
+		bpf_list_push_front(head, &f[i]->node);
+		bpf_spin_unlock(lock);
+	}
+
+	for (i = 0; i < ARRAY_SIZE(f); i++) {
+		bpf_spin_lock(lock);
+		n = bpf_list_pop_front(head);
+		bpf_spin_unlock(lock);
+		if (!n)
+			return 3;
+		pf = container_of(n, struct foo, node);
+		if (pf->data != (ARRAY_SIZE(f) - i - 1)) {
+			bpf_obj_drop(pf);
+			return 4;
+		}
+		bpf_spin_lock(lock);
+		bpf_list_push_back(head, &pf->node);
+		bpf_spin_unlock(lock);
+	}
+
+	if (leave_in_map)
+		return 0;
+
+	for (i = 0; i < ARRAY_SIZE(f); i++) {
+		bpf_spin_lock(lock);
+		n = bpf_list_pop_back(head);
+		bpf_spin_unlock(lock);
+		if (!n)
+			return 5;
+		pf = container_of(n, struct foo, node);
+		if (pf->data != i) {
+			bpf_obj_drop(pf);
+			return 6;
+		}
+		bpf_obj_drop(pf);
+	}
+	bpf_spin_lock(lock);
+	n = bpf_list_pop_back(head);
+	bpf_spin_unlock(lock);
+	if (n) {
+		bpf_obj_drop(container_of(n, struct foo, node));
+		return 7;
+	}
+
+	bpf_spin_lock(lock);
+	n = bpf_list_pop_front(head);
+	bpf_spin_unlock(lock);
+	if (n) {
+		bpf_obj_drop(container_of(n, struct foo, node));
+		return 8;
+	}
+	return 0;
+}
+
+static __always_inline
+int list_in_list(struct bpf_spin_lock *lock, struct bpf_list_head *head, bool leave_in_map)
+{
+	struct bpf_list_node *n;
+	struct bar *ba[8], *b;
+	struct foo *f;
+	int i;
+
+	f = bpf_obj_new(typeof(*f));
+	if (!f)
+		return 2;
+	for (i = 0; i < ARRAY_SIZE(ba); i++) {
+		b = bpf_obj_new(typeof(*b));
+		if (!b) {
+			bpf_obj_drop(f);
+			return 3;
+		}
+		b->data = i;
+		bpf_spin_lock(&f->lock);
+		bpf_list_push_back(&f->head, &b->node);
+		bpf_spin_unlock(&f->lock);
+	}
+
+	bpf_spin_lock(lock);
+	f->data = 42;
+	bpf_list_push_front(head, &f->node);
+	bpf_spin_unlock(lock);
+
+	if (leave_in_map)
+		return 0;
+
+	bpf_spin_lock(lock);
+	n = bpf_list_pop_front(head);
+	bpf_spin_unlock(lock);
+	if (!n)
+		return 4;
+	f = container_of(n, struct foo, node);
+	if (f->data != 42) {
+		bpf_obj_drop(f);
+		return 5;
+	}
+
+	for (i = 0; i < ARRAY_SIZE(ba); i++) {
+		bpf_spin_lock(&f->lock);
+		n = bpf_list_pop_front(&f->head);
+		bpf_spin_unlock(&f->lock);
+		if (!n) {
+			bpf_obj_drop(f);
+			return 6;
+		}
+		b = container_of(n, struct bar, node);
+		if (b->data != i) {
+			bpf_obj_drop(f);
+			bpf_obj_drop(b);
+			return 7;
+		}
+		bpf_obj_drop(b);
+	}
+	bpf_spin_lock(&f->lock);
+	n = bpf_list_pop_front(&f->head);
+	bpf_spin_unlock(&f->lock);
+	if (n) {
+		bpf_obj_drop(f);
+		bpf_obj_drop(container_of(n, struct bar, node));
+		return 8;
+	}
+	bpf_obj_drop(f);
+	return 0;
+}
+
+static __always_inline
+int test_list_push_pop(struct bpf_spin_lock *lock, struct bpf_list_head *head)
+{
+	int ret;
+
+	ret = list_push_pop(lock, head, false);
+	if (ret)
+		return ret;
+	return list_push_pop(lock, head, true);
+}
+
+static __always_inline
+int test_list_push_pop_multiple(struct bpf_spin_lock *lock, struct bpf_list_head *head)
+{
+	int ret;
+
+	ret = list_push_pop_multiple(lock, head, false);
+	if (ret)
+		return ret;
+	return list_push_pop_multiple(lock, head, true);
+}
+
+static __always_inline
+int test_list_in_list(struct bpf_spin_lock *lock, struct bpf_list_head *head)
+{
+	int ret;
+
+	ret = list_in_list(lock, head, false);
+	if (ret)
+		return ret;
+	return list_in_list(lock, head, true);
+}
+
+SEC("tc")
+int map_list_push_pop(void *ctx)
+{
+	struct map_value *v;
+
+	v = bpf_map_lookup_elem(&array_map, &(int){0});
+	if (!v)
+		return 1;
+	return test_list_push_pop(&v->lock, &v->head);
+}
+
+SEC("tc")
+int inner_map_list_push_pop(void *ctx)
+{
+	struct map_value *v;
+	void *map;
+
+	map = bpf_map_lookup_elem(&map_of_maps, &(int){0});
+	if (!map)
+		return 1;
+	v = bpf_map_lookup_elem(map, &(int){0});
+	if (!v)
+		return 1;
+	return test_list_push_pop(&v->lock, &v->head);
+}
+
+SEC("tc")
+int global_list_push_pop(void *ctx)
+{
+	return test_list_push_pop(&glock, &ghead);
+}
+
+SEC("tc")
+int map_list_push_pop_multiple(void *ctx)
+{
+	struct map_value *v;
+
+	v = bpf_map_lookup_elem(&array_map, &(int){0});
+	if (!v)
+		return 1;
+	return test_list_push_pop_multiple(&v->lock, &v->head);
+}
+
+SEC("tc")
+int inner_map_list_push_pop_multiple(void *ctx)
+{
+	struct map_value *v;
+	void *map;
+
+	map = bpf_map_lookup_elem(&map_of_maps, &(int){0});
+	if (!map)
+		return 1;
+	v = bpf_map_lookup_elem(map, &(int){0});
+	if (!v)
+		return 1;
+	return test_list_push_pop_multiple(&v->lock, &v->head);
+}
+
+SEC("tc")
+int global_list_push_pop_multiple(void *ctx)
+{
+	int ret;
+
+	ret = list_push_pop_multiple(&glock, &ghead, false);
+	if (ret)
+		return ret;
+	return list_push_pop_multiple(&glock, &ghead, true);
+}
+
+SEC("tc")
+int map_list_in_list(void *ctx)
+{
+	struct map_value *v;
+
+	v = bpf_map_lookup_elem(&array_map, &(int){0});
+	if (!v)
+		return 1;
+	return test_list_in_list(&v->lock, &v->head);
+}
+
+SEC("tc")
+int inner_map_list_in_list(void *ctx)
+{
+	struct map_value *v;
+	void *map;
+
+	map = bpf_map_lookup_elem(&map_of_maps, &(int){0});
+	if (!map)
+		return 1;
+	v = bpf_map_lookup_elem(map, &(int){0});
+	if (!v)
+		return 1;
+	return test_list_in_list(&v->lock, &v->head);
+}
+
+SEC("tc")
+int global_list_in_list(void *ctx)
+{
+	return test_list_in_list(&glock, &ghead);
+}
+
+char _license[] SEC("license") = "GPL";
diff --git a/tools/testing/selftests/bpf/progs/linked_list.h b/tools/testing/selftests/bpf/progs/linked_list.h
new file mode 100644
index 000000000000..8db80ed64db1
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/linked_list.h
@@ -0,0 +1,56 @@
+// SPDX-License-Identifier: GPL-2.0
+#ifndef LINKED_LIST_H
+#define LINKED_LIST_H
+
+#include <vmlinux.h>
+#include <bpf/bpf_helpers.h>
+#include "bpf_experimental.h"
+
+struct bar {
+	struct bpf_list_node node;
+	int data;
+};
+
+struct foo {
+	struct bpf_list_node node;
+	struct bpf_list_head head __contains(bar, node);
+	struct bpf_spin_lock lock;
+	int data;
+	struct bpf_list_node node2;
+};
+
+struct map_value {
+	struct bpf_spin_lock lock;
+	int data;
+	struct bpf_list_head head __contains(foo, node);
+};
+
+struct array_map {
+	__uint(type, BPF_MAP_TYPE_ARRAY);
+	__type(key, int);
+	__type(value, struct map_value);
+	__uint(max_entries, 1);
+};
+
+struct array_map array_map SEC(".maps");
+struct array_map inner_map SEC(".maps");
+
+struct {
+	__uint(type, BPF_MAP_TYPE_ARRAY_OF_MAPS);
+	__uint(max_entries, 1);
+	__type(key, int);
+	__type(value, int);
+	__array(values, struct array_map);
+} map_of_maps SEC(".maps") = {
+	.values = {
+		[0] = &inner_map,
+	},
+};
+
+#define private(name) SEC(".data." #name) __hidden __attribute__((aligned(8)))
+
+private(A) struct bpf_spin_lock glock;
+private(A) struct bpf_list_head ghead __contains(foo, node);
+private(B) struct bpf_spin_lock glock2;
+
+#endif
diff --git a/tools/testing/selftests/bpf/progs/linked_list_fail.c b/tools/testing/selftests/bpf/progs/linked_list_fail.c
new file mode 100644
index 000000000000..1d9017240e19
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/linked_list_fail.c
@@ -0,0 +1,581 @@
+// SPDX-License-Identifier: GPL-2.0
+#include <vmlinux.h>
+#include <bpf/bpf_tracing.h>
+#include <bpf/bpf_helpers.h>
+#include <bpf/bpf_core_read.h>
+#include "bpf_experimental.h"
+
+#include "linked_list.h"
+
+#define INIT                                                  \
+	struct map_value *v, *v2, *iv, *iv2;                  \
+	struct foo *f, *f1, *f2;                              \
+	struct bar *b;                                        \
+	void *map;                                            \
+                                                              \
+	map = bpf_map_lookup_elem(&map_of_maps, &(int){ 0 }); \
+	if (!map)                                             \
+		return 0;                                     \
+	v = bpf_map_lookup_elem(&array_map, &(int){ 0 });     \
+	if (!v)                                               \
+		return 0;                                     \
+	v2 = bpf_map_lookup_elem(&array_map, &(int){ 0 });    \
+	if (!v2)                                              \
+		return 0;                                     \
+	iv = bpf_map_lookup_elem(map, &(int){ 0 });           \
+	if (!iv)                                              \
+		return 0;                                     \
+	iv2 = bpf_map_lookup_elem(map, &(int){ 0 });          \
+	if (!iv2)                                             \
+		return 0;                                     \
+	f = bpf_obj_new(typeof(*f));                          \
+	if (!f)                                               \
+		return 0;                                     \
+	f1 = f;                                               \
+	f2 = bpf_obj_new(typeof(*f2));                        \
+	if (!f2) {                                            \
+		bpf_obj_drop(f1);                             \
+		return 0;                                     \
+	}                                                     \
+	b = bpf_obj_new(typeof(*b));                          \
+	if (!b) {                                             \
+		bpf_obj_drop(f2);                             \
+		bpf_obj_drop(f1);                             \
+		return 0;                                     \
+	}
+
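+/* Invoke a list kfunc on hexpr without holding any lock; the verifier must
+ * reject this with "bpf_spin_lock at off=... must be held for bpf_list_head".
+ */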
+#define CHECK(test, op, hexpr)                              \
+	SEC("?tc")                                          \
+	int test##_missing_lock_##op(void *ctx)             \
+	{                                                   \
+		INIT;                                       \
+		void (*p)(void *) = (void *)&bpf_list_##op; \
+		p(hexpr);                                   \
+		return 0;                                   \
+	}
+
+CHECK(kptr, push_front, &f->head);
+CHECK(kptr, push_back, &f->head);
+CHECK(kptr, pop_front, &f->head);
+CHECK(kptr, pop_back, &f->head);
+
+CHECK(global, push_front, &ghead);
+CHECK(global, push_back, &ghead);
+CHECK(global, pop_front, &ghead);
+CHECK(global, pop_back, &ghead);
+
+CHECK(map, push_front, &v->head);
+CHECK(map, push_back, &v->head);
+CHECK(map, pop_front, &v->head);
+CHECK(map, pop_back, &v->head);
+
+CHECK(inner_map, push_front, &iv->head);
+CHECK(inner_map, push_back, &iv->head);
+CHECK(inner_map, pop_front, &iv->head);
+CHECK(inner_map, pop_back, &iv->head);
+
+#undef CHECK
+
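+/* Hold the lock lexpr while operating on hexpr, which is protected by a
+ * different lock; expect "held lock and object are not in the same
+ * allocation".
+ */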
+#define CHECK(test, op, lexpr, hexpr)                       \
+	SEC("?tc")                                          \
+	int test##_incorrect_lock_##op(void *ctx)           \
+	{                                                   \
+		INIT;                                       \
+		void (*p)(void *) = (void *)&bpf_list_##op; \
+		bpf_spin_lock(lexpr);                       \
+		p(hexpr);                                   \
+		return 0;                                   \
+	}
+
+#define CHECK_OP(op)                                           \
+	CHECK(kptr_kptr, op, &f1->lock, &f2->head);            \
+	CHECK(kptr_global, op, &f1->lock, &ghead);             \
+	CHECK(kptr_map, op, &f1->lock, &v->head);              \
+	CHECK(kptr_inner_map, op, &f1->lock, &iv->head);       \
+                                                               \
+	CHECK(global_global, op, &glock2, &ghead);             \
+	CHECK(global_kptr, op, &glock, &f1->head);             \
+	CHECK(global_map, op, &glock, &v->head);               \
+	CHECK(global_inner_map, op, &glock, &iv->head);        \
+                                                               \
+	CHECK(map_map, op, &v->lock, &v2->head);               \
+	CHECK(map_kptr, op, &v->lock, &f2->head);              \
+	CHECK(map_global, op, &v->lock, &ghead);               \
+	CHECK(map_inner_map, op, &v->lock, &iv->head);         \
+                                                               \
+	CHECK(inner_map_inner_map, op, &iv->lock, &iv2->head); \
+	CHECK(inner_map_kptr, op, &iv->lock, &f2->head);       \
+	CHECK(inner_map_global, op, &iv->lock, &ghead);        \
+	CHECK(inner_map_map, op, &iv->lock, &v->head);
+
+CHECK_OP(push_front);
+CHECK_OP(push_back);
+CHECK_OP(pop_front);
+CHECK_OP(pop_back);
+
+#undef CHECK
+#undef CHECK_OP
+#undef INIT
+
+SEC("?kprobe/xyz")
+int map_compat_kprobe(void *ctx)
+{
+	bpf_list_push_front(&ghead, NULL);
+	return 0;
+}
+
+SEC("?kretprobe/xyz")
+int map_compat_kretprobe(void *ctx)
+{
+	bpf_list_push_front(&ghead, NULL);
+	return 0;
+}
+
+SEC("?tracepoint/xyz")
+int map_compat_tp(void *ctx)
+{
+	bpf_list_push_front(&ghead, NULL);
+	return 0;
+}
+
+SEC("?perf_event")
+int map_compat_perf(void *ctx)
+{
+	bpf_list_push_front(&ghead, NULL);
+	return 0;
+}
+
+SEC("?raw_tp/xyz")
+int map_compat_raw_tp(void *ctx)
+{
+	bpf_list_push_front(&ghead, NULL);
+	return 0;
+}
+
+SEC("?raw_tp.w/xyz")
+int map_compat_raw_tp_w(void *ctx)
+{
+	bpf_list_push_front(&ghead, NULL);
+	return 0;
+}
+
+SEC("?tc")
+int obj_type_id_oor(void *ctx)
+{
+	bpf_obj_new_impl(~0UL, NULL);
+	return 0;
+}
+
+SEC("?tc")
+int obj_new_no_composite(void *ctx)
+{
+	bpf_obj_new_impl(bpf_core_type_id_local(int), (void *)42);
+	return 0;
+}
+
+SEC("?tc")
+int obj_new_no_struct(void *ctx)
+{
+	bpf_obj_new(union { int data; unsigned udata; });
+	return 0;
+}
+
+SEC("?tc")
+int obj_drop_non_zero_off(void *ctx)
+{
+	void *f;
+
+	f = bpf_obj_new(struct foo);
+	if (!f)
+		return 0;
+	bpf_obj_drop(f+1);
+	return 0;
+}
+
+SEC("?tc")
+int new_null_ret(void *ctx)
+{
+	return bpf_obj_new(struct foo)->data;
+}
+
+SEC("?tc")
+int obj_new_acq(void *ctx)
+{
+	bpf_obj_new(struct foo);
+	return 0;
+}
+
+SEC("?tc")
+int use_after_drop(void *ctx)
+{
+	struct foo *f;
+
+	f = bpf_obj_new(typeof(*f));
+	if (!f)
+		return 0;
+	bpf_obj_drop(f);
+	return f->data;
+}
+
+SEC("?tc")
+int ptr_walk_scalar(void *ctx)
+{
+	struct test1 {
+		struct test2 {
+			struct test2 *next;
+		} *ptr;
+	} *p;
+
+	p = bpf_obj_new(typeof(*p));
+	if (!p)
+		return 0;
+	bpf_this_cpu_ptr(p->ptr);
+	return 0;
+}
+
+SEC("?tc")
+int direct_read_lock(void *ctx)
+{
+	struct foo *f;
+
+	f = bpf_obj_new(typeof(*f));
+	if (!f)
+		return 0;
+	return *(int *)&f->lock;
+}
+
+SEC("?tc")
+int direct_write_lock(void *ctx)
+{
+	struct foo *f;
+
+	f = bpf_obj_new(typeof(*f));
+	if (!f)
+		return 0;
+	*(int *)&f->lock = 0;
+	return 0;
+}
+
+SEC("?tc")
+int direct_read_head(void *ctx)
+{
+	struct foo *f;
+
+	f = bpf_obj_new(typeof(*f));
+	if (!f)
+		return 0;
+	return *(int *)&f->head;
+}
+
+SEC("?tc")
+int direct_write_head(void *ctx)
+{
+	struct foo *f;
+
+	f = bpf_obj_new(typeof(*f));
+	if (!f)
+		return 0;
+	*(int *)&f->head = 0;
+	return 0;
+}
+
+SEC("?tc")
+int direct_read_node(void *ctx)
+{
+	struct foo *f;
+
+	f = bpf_obj_new(typeof(*f));
+	if (!f)
+		return 0;
+	return *(int *)&f->node;
+}
+
+SEC("?tc")
+int direct_write_node(void *ctx)
+{
+	struct foo *f;
+
+	f = bpf_obj_new(typeof(*f));
+	if (!f)
+		return 0;
+	*(int *)&f->node = 0;
+	return 0;
+}
+
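+/* Ownership of the node passes to the list on push; a later write through
+ * the same pointer must be rejected with "only read is supported".
+ */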
+static __always_inline
+int write_after_op(void (*push_op)(void *head, void *node))
+{
+	struct foo *f;
+
+	f = bpf_obj_new(typeof(*f));
+	if (!f)
+		return 0;
+	bpf_spin_lock(&glock);
+	push_op(&ghead, &f->node);
+	f->data = 42;
+	bpf_spin_unlock(&glock);
+
+	return 0;
+}
+
+SEC("?tc")
+int write_after_push_front(void *ctx)
+{
+	return write_after_op((void *)bpf_list_push_front);
+}
+
+SEC("?tc")
+int write_after_push_back(void *ctx)
+{
+	return write_after_op((void *)bpf_list_push_back);
+}
+
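+/* After bpf_spin_unlock, the reference to the pushed node is invalidated,
+ * so the final read of f->data must fail with "invalid mem access 'scalar'".
+ */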
+static __always_inline
+int use_after_unlock(void (*op)(void *head, void *node))
+{
+	struct foo *f;
+
+	f = bpf_obj_new(typeof(*f));
+	if (!f)
+		return 0;
+	bpf_spin_lock(&glock);
+	f->data = 42;
+	op(&ghead, &f->node);
+	bpf_spin_unlock(&glock);
+
+	return f->data;
+}
+
+SEC("?tc")
+int use_after_unlock_push_front(void *ctx)
+{
+	return use_after_unlock((void *)bpf_list_push_front);
+}
+
+SEC("?tc")
+int use_after_unlock_push_back(void *ctx)
+{
+	return use_after_unlock((void *)bpf_list_push_back);
+}
+
+static __always_inline
+int list_double_add(void (*op)(void *head, void *node))
+{
+	struct foo *f;
+
+	f = bpf_obj_new(typeof(*f));
+	if (!f)
+		return 0;
+	bpf_spin_lock(&glock);
+	op(&ghead, &f->node);
+	op(&ghead, &f->node);
+	bpf_spin_unlock(&glock);
+
+	return 0;
+}
+
+SEC("?tc")
+int double_push_front(void *ctx)
+{
+	return list_double_add((void *)bpf_list_push_front);
+}
+
+SEC("?tc")
+int double_push_back(void *ctx)
+{
+	return list_double_add((void *)bpf_list_push_back);
+}
+
+SEC("?tc")
+int no_node_value_type(void *ctx)
+{
+	void *p;
+
+	p = bpf_obj_new(struct { int data; });
+	if (!p)
+		return 0;
+	bpf_spin_lock(&glock);
+	bpf_list_push_front(&ghead, p);
+	bpf_spin_unlock(&glock);
+
+	return 0;
+}
+
+SEC("?tc")
+int incorrect_value_type(void *ctx)
+{
+	struct bar *b;
+
+	b = bpf_obj_new(typeof(*b));
+	if (!b)
+		return 0;
+	bpf_spin_lock(&glock);
+	bpf_list_push_front(&ghead, &b->node);
+	bpf_spin_unlock(&glock);
+
+	return 0;
+}
+
+SEC("?tc")
+int incorrect_node_var_off(struct __sk_buff *ctx)
+{
+	struct foo *f;
+
+	f = bpf_obj_new(typeof(*f));
+	if (!f)
+		return 0;
+	bpf_spin_lock(&glock);
+	bpf_list_push_front(&ghead, (void *)&f->node + ctx->protocol);
+	bpf_spin_unlock(&glock);
+
+	return 0;
+}
+
+SEC("?tc")
+int incorrect_node_off1(void *ctx)
+{
+	struct foo *f;
+
+	f = bpf_obj_new(typeof(*f));
+	if (!f)
+		return 0;
+	bpf_spin_lock(&glock);
+	bpf_list_push_front(&ghead, (void *)&f->node + 1);
+	bpf_spin_unlock(&glock);
+
+	return 0;
+}
+
+SEC("?tc")
+int incorrect_node_off2(void *ctx)
+{
+	struct foo *f;
+
+	f = bpf_obj_new(typeof(*f));
+	if (!f)
+		return 0;
+	bpf_spin_lock(&glock);
+	bpf_list_push_front(&ghead, &f->node2);
+	bpf_spin_unlock(&glock);
+
+	return 0;
+}
+
+SEC("?tc")
+int no_head_type(void *ctx)
+{
+	void *p;
+
+	p = bpf_obj_new(typeof(struct { int data; }));
+	if (!p)
+		return 0;
+	bpf_spin_lock(&glock);
+	bpf_list_push_front(p, NULL);
+	bpf_spin_unlock(&glock);
+
+	return 0;
+}
+
+SEC("?tc")
+int incorrect_head_var_off1(struct __sk_buff *ctx)
+{
+	struct foo *f;
+
+	f = bpf_obj_new(typeof(*f));
+	if (!f)
+		return 0;
+	bpf_spin_lock(&glock);
+	bpf_list_push_front((void *)&ghead + ctx->protocol, &f->node);
+	bpf_spin_unlock(&glock);
+
+	return 0;
+}
+
+SEC("?tc")
+int incorrect_head_var_off2(struct __sk_buff *ctx)
+{
+	struct foo *f;
+
+	f = bpf_obj_new(typeof(*f));
+	if (!f)
+		return 0;
+	bpf_spin_lock(&glock);
+	bpf_list_push_front((void *)&f->head + ctx->protocol, &f->node);
+	bpf_spin_unlock(&glock);
+
+	return 0;
+}
+
+SEC("?tc")
+int incorrect_head_off1(void *ctx)
+{
+	struct foo *f;
+	struct bar *b;
+
+	f = bpf_obj_new(typeof(*f));
+	if (!f)
+		return 0;
+	b = bpf_obj_new(typeof(*b));
+	if (!b) {
+		bpf_obj_drop(f);
+		return 0;
+	}
+
+	bpf_spin_lock(&f->lock);
+	bpf_list_push_front((void *)&f->head + 1, &b->node);
+	bpf_spin_unlock(&f->lock);
+
+	return 0;
+}
+
+SEC("?tc")
+int incorrect_head_off2(void *ctx)
+{
+	struct foo *f;
+	struct bar *b;
+
+	f = bpf_obj_new(typeof(*f));
+	if (!f)
+		return 0;
+
+	bpf_spin_lock(&glock);
+	bpf_list_push_front((void *)&ghead + 1, &f->node);
+	bpf_spin_unlock(&glock);
+
+	return 0;
+}
+
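+/* Pop from a list whose elements link through foo::node2, which sits at a
+ * non-zero offset (40) inside struct foo, then pass the returned pointer to
+ * bpf_this_cpu_ptr() so the log captures the offset-adjusted
+ * ptr_or_null_foo register.
+ */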
+static __always_inline
+int pop_ptr_off(void *(*op)(void *head))
+{
+	struct {
+		struct bpf_list_head head __contains(foo, node2);
+		struct bpf_spin_lock lock;
+	} *p;
+	struct bpf_list_node *n;
+
+	p = bpf_obj_new(typeof(*p));
+	if (!p)
+		return 0;
+	bpf_spin_lock(&p->lock);
+	n = op(&p->head);
+	bpf_spin_unlock(&p->lock);
+
+	bpf_this_cpu_ptr(n);
+	return 0;
+}
+
+SEC("?tc")
+int pop_front_off(void *ctx)
+{
+	return pop_ptr_off((void *)bpf_list_pop_front);
+}
+
+SEC("?tc")
+int pop_back_off(void *ctx)
+{
+	return pop_ptr_off((void *)bpf_list_pop_back);
+}
+
+char _license[] SEC("license") = "GPL";
-- 
2.38.1



* [PATCH bpf-next v7 26/26] selftests/bpf: Add BTF sanity tests
  2022-11-14 19:15 [PATCH bpf-next v7 00/26] Allocated objects, BPF linked lists Kumar Kartikeya Dwivedi
                   ` (24 preceding siblings ...)
  2022-11-14 19:15 ` [PATCH bpf-next v7 25/26] selftests/bpf: Add BPF linked list API tests Kumar Kartikeya Dwivedi
@ 2022-11-14 19:15 ` Kumar Kartikeya Dwivedi
  2022-11-15  6:00 ` [PATCH bpf-next v7 00/26] Allocated objects, BPF linked lists patchwork-bot+netdevbpf
  26 siblings, 0 replies; 48+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2022-11-14 19:15 UTC (permalink / raw)
  To: bpf
  Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Martin KaFai Lau, Dave Marchevsky

Preparing the metadata for bpf_list_head involves a complicated parsing
step and type resolution for the contained value. Ensure that corner
cases are tested and that invalid specifications in the source are duly
rejected. Also include tests for incorrect ownership relationships in
the BTF.
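
For reference, the ownership relations being validated map to the
__contains annotation in BPF C; a minimal sketch (illustrative names) of
an accepted owning -> owned chain and of the self-referential "AA cycle"
that must be rejected with -ELOOP:

  struct owned {
  	struct bpf_list_node node;
  };

  struct owner {	/* accepted: owning -> owned */
  	struct bpf_list_head head __contains(owned, node);
  	struct bpf_spin_lock lock;
  };

  struct bad {		/* rejected: bad owns bad, -ELOOP */
  	struct bpf_list_head head __contains(bad, node);
  	struct bpf_list_node node;
  	struct bpf_spin_lock lock;
  };

The subtests below construct the equivalent BTF by hand with
btf__add_struct/btf__add_field/btf__add_decl_tag and check the error
returned by btf__load_into_kernel.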

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
---
 .../selftests/bpf/prog_tests/linked_list.c    | 485 ++++++++++++++++++
 1 file changed, 485 insertions(+)

diff --git a/tools/testing/selftests/bpf/prog_tests/linked_list.c b/tools/testing/selftests/bpf/prog_tests/linked_list.c
index e8569db2f3bc..bdc5a4f82e79 100644
--- a/tools/testing/selftests/bpf/prog_tests/linked_list.c
+++ b/tools/testing/selftests/bpf/prog_tests/linked_list.c
@@ -1,4 +1,7 @@
 // SPDX-License-Identifier: GPL-2.0
+#include <bpf/btf.h>
+#include <test_btf.h>
+#include <linux/btf.h>
 #include <test_progs.h>
 #include <network_helpers.h>
 
@@ -233,6 +236,487 @@ static void test_linked_list_success(int mode, bool leave_in_map)
 	linked_list__destroy(skel);
 }
 
+#define SPIN_LOCK 2
+#define LIST_HEAD 3
+#define LIST_NODE 4
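+
+/* BTF type IDs that init_btf() below assigns: int is 1, followed by the
+ * three special structs in the order they are added.
+ */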
+
+static struct btf *init_btf(void)
+{
+	int id, lid, hid, nid;
+	struct btf *btf;
+
+	btf = btf__new_empty();
+	if (!ASSERT_OK_PTR(btf, "btf__new_empty"))
+		return NULL;
+	id = btf__add_int(btf, "int", 4, BTF_INT_SIGNED);
+	if (!ASSERT_EQ(id, 1, "btf__add_int"))
+		goto end;
+	lid = btf__add_struct(btf, "bpf_spin_lock", 4);
+	if (!ASSERT_EQ(lid, SPIN_LOCK, "btf__add_struct bpf_spin_lock"))
+		goto end;
+	hid = btf__add_struct(btf, "bpf_list_head", 16);
+	if (!ASSERT_EQ(hid, LIST_HEAD, "btf__add_struct bpf_list_head"))
+		goto end;
+	nid = btf__add_struct(btf, "bpf_list_node", 16);
+	if (!ASSERT_EQ(nid, LIST_NODE, "btf__add_struct bpf_list_node"))
+		goto end;
+	return btf;
+end:
+	btf__free(btf);
+	return NULL;
+}
+
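+/* Each subtest body is a while (test__start_subtest(...)) loop that always
+ * breaks at the end, so a failed assertion can bail out early with break.
+ */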
+static void test_btf(void)
+{
+	struct btf *btf = NULL;
+	int id, err;
+
+	while (test__start_subtest("btf: too many locks")) {
+		btf = init_btf();
+		if (!ASSERT_OK_PTR(btf, "init_btf"))
+			break;
+		id = btf__add_struct(btf, "foo", 24);
+		if (!ASSERT_EQ(id, 5, "btf__add_struct foo"))
+			break;
+		err = btf__add_field(btf, "a", SPIN_LOCK, 0, 0);
+		if (!ASSERT_OK(err, "btf__add_struct foo::a"))
+			break;
+		err = btf__add_field(btf, "b", SPIN_LOCK, 32, 0);
+		if (!ASSERT_OK(err, "btf__add_struct foo::a"))
+			break;
+		err = btf__add_field(btf, "c", LIST_HEAD, 64, 0);
+		if (!ASSERT_OK(err, "btf__add_struct foo::a"))
+			break;
+
+		err = btf__load_into_kernel(btf);
+		ASSERT_EQ(err, -E2BIG, "check btf");
+		btf__free(btf);
+		break;
+	}
+
+	while (test__start_subtest("btf: missing lock")) {
+		btf = init_btf();
+		if (!ASSERT_OK_PTR(btf, "init_btf"))
+			break;
+		id = btf__add_struct(btf, "foo", 16);
+		if (!ASSERT_EQ(id, 5, "btf__add_struct foo"))
+			break;
+		err = btf__add_field(btf, "a", LIST_HEAD, 0, 0);
+		if (!ASSERT_OK(err, "btf__add_struct foo::a"))
+			break;
+		id = btf__add_decl_tag(btf, "contains:baz:a", 5, 0);
+		if (!ASSERT_EQ(id, 6, "btf__add_decl_tag contains:baz:a"))
+			break;
+		id = btf__add_struct(btf, "baz", 16);
+		if (!ASSERT_EQ(id, 7, "btf__add_struct baz"))
+			break;
+		err = btf__add_field(btf, "a", LIST_NODE, 0, 0);
+		if (!ASSERT_OK(err, "btf__add_field baz::a"))
+			break;
+
+		err = btf__load_into_kernel(btf);
+		ASSERT_EQ(err, -EINVAL, "check btf");
+		btf__free(btf);
+		break;
+	}
+
+	while (test__start_subtest("btf: bad offset")) {
+		btf = init_btf();
+		if (!ASSERT_OK_PTR(btf, "init_btf"))
+			break;
+		id = btf__add_struct(btf, "foo", 36);
+		if (!ASSERT_EQ(id, 5, "btf__add_struct foo"))
+			break;
+		err = btf__add_field(btf, "a", LIST_HEAD, 0, 0);
+		if (!ASSERT_OK(err, "btf__add_field foo::a"))
+			break;
+		err = btf__add_field(btf, "b", LIST_NODE, 0, 0);
+		if (!ASSERT_OK(err, "btf__add_field foo::b"))
+			break;
+		err = btf__add_field(btf, "c", SPIN_LOCK, 0, 0);
+		if (!ASSERT_OK(err, "btf__add_field foo::c"))
+			break;
+		id = btf__add_decl_tag(btf, "contains:foo:b", 5, 0);
+		if (!ASSERT_EQ(id, 6, "btf__add_decl_tag contains:foo:b"))
+			break;
+
+		err = btf__load_into_kernel(btf);
+		ASSERT_EQ(err, -EEXIST, "check btf");
+		btf__free(btf);
+		break;
+	}
+
+	while (test__start_subtest("btf: missing contains:")) {
+		btf = init_btf();
+		if (!ASSERT_OK_PTR(btf, "init_btf"))
+			break;
+		id = btf__add_struct(btf, "foo", 24);
+		if (!ASSERT_EQ(id, 5, "btf__add_struct foo"))
+			break;
+		err = btf__add_field(btf, "a", SPIN_LOCK, 0, 0);
+		if (!ASSERT_OK(err, "btf__add_field foo::a"))
+			break;
+		err = btf__add_field(btf, "b", LIST_HEAD, 64, 0);
+		if (!ASSERT_OK(err, "btf__add_field foo::b"))
+			break;
+
+		err = btf__load_into_kernel(btf);
+		ASSERT_EQ(err, -EINVAL, "check btf");
+		btf__free(btf);
+		break;
+	}
+
+	while (test__start_subtest("btf: missing struct")) {
+		btf = init_btf();
+		if (!ASSERT_OK_PTR(btf, "init_btf"))
+			break;
+		id = btf__add_struct(btf, "foo", 24);
+		if (!ASSERT_EQ(id, 5, "btf__add_struct foo"))
+			break;
+		err = btf__add_field(btf, "a", SPIN_LOCK, 0, 0);
+		if (!ASSERT_OK(err, "btf__add_field foo::a"))
+			break;
+		err = btf__add_field(btf, "b", LIST_HEAD, 64, 0);
+		if (!ASSERT_OK(err, "btf__add_field foo::b"))
+			break;
+		id = btf__add_decl_tag(btf, "contains:bar:bar", 5, 1);
+		if (!ASSERT_EQ(id, 6, "btf__add_decl_tag contains:bar:bar"))
+			break;
+
+		err = btf__load_into_kernel(btf);
+		ASSERT_EQ(err, -ENOENT, "check btf");
+		btf__free(btf);
+		break;
+	}
+
+	while (test__start_subtest("btf: missing node")) {
+		btf = init_btf();
+		if (!ASSERT_OK_PTR(btf, "init_btf"))
+			break;
+		id = btf__add_struct(btf, "foo", 24);
+		if (!ASSERT_EQ(id, 5, "btf__add_struct foo"))
+			break;
+		err = btf__add_field(btf, "a", SPIN_LOCK, 0, 0);
+		if (!ASSERT_OK(err, "btf__add_field foo::a"))
+			break;
+		err = btf__add_field(btf, "b", LIST_HEAD, 64, 0);
+		if (!ASSERT_OK(err, "btf__add_field foo::b"))
+			break;
+		id = btf__add_decl_tag(btf, "contains:foo:c", 5, 1);
+		if (!ASSERT_EQ(id, 6, "btf__add_decl_tag contains:foo:c"))
+			break;
+
+		err = btf__load_into_kernel(btf);
+		btf__free(btf);
+		ASSERT_EQ(err, -ENOENT, "check btf");
+		break;
+	}
+
+	while (test__start_subtest("btf: node incorrect type")) {
+		btf = init_btf();
+		if (!ASSERT_OK_PTR(btf, "init_btf"))
+			break;
+		id = btf__add_struct(btf, "foo", 20);
+		if (!ASSERT_EQ(id, 5, "btf__add_struct foo"))
+			break;
+		err = btf__add_field(btf, "a", LIST_HEAD, 0, 0);
+		if (!ASSERT_OK(err, "btf__add_field foo::a"))
+			break;
+		err = btf__add_field(btf, "b", SPIN_LOCK, 128, 0);
+		if (!ASSERT_OK(err, "btf__add_field foo::b"))
+			break;
+		id = btf__add_decl_tag(btf, "contains:bar:a", 5, 0);
+		if (!ASSERT_EQ(id, 6, "btf__add_decl_tag contains:bar:a"))
+			break;
+		id = btf__add_struct(btf, "bar", 4);
+		if (!ASSERT_EQ(id, 7, "btf__add_struct bar"))
+			break;
+		err = btf__add_field(btf, "a", SPIN_LOCK, 0, 0);
+		if (!ASSERT_OK(err, "btf__add_field bar::a"))
+			break;
+
+		err = btf__load_into_kernel(btf);
+		ASSERT_EQ(err, -EINVAL, "check btf");
+		btf__free(btf);
+		break;
+	}
+
+	while (test__start_subtest("btf: multiple bpf_list_node with name b")) {
+		btf = init_btf();
+		if (!ASSERT_OK_PTR(btf, "init_btf"))
+			break;
+		id = btf__add_struct(btf, "foo", 52);
+		if (!ASSERT_EQ(id, 5, "btf__add_struct foo"))
+			break;
+		err = btf__add_field(btf, "a", LIST_HEAD, 0, 0);
+		if (!ASSERT_OK(err, "btf__add_field foo::a"))
+			break;
+		err = btf__add_field(btf, "b", LIST_NODE, 128, 0);
+		if (!ASSERT_OK(err, "btf__add_field foo::b"))
+			break;
+		err = btf__add_field(btf, "b", LIST_NODE, 256, 0);
+		if (!ASSERT_OK(err, "btf__add_field foo::c"))
+			break;
+		err = btf__add_field(btf, "d", SPIN_LOCK, 384, 0);
+		if (!ASSERT_OK(err, "btf__add_field foo::d"))
+			break;
+		id = btf__add_decl_tag(btf, "contains:foo:b", 5, 0);
+		if (!ASSERT_EQ(id, 6, "btf__add_decl_tag contains:foo:b"))
+			break;
+
+		err = btf__load_into_kernel(btf);
+		ASSERT_EQ(err, -EINVAL, "check btf");
+		btf__free(btf);
+		break;
+	}
+
+	while (test__start_subtest("btf: owning | owned AA cycle")) {
+		btf = init_btf();
+		if (!ASSERT_OK_PTR(btf, "init_btf"))
+			break;
+		id = btf__add_struct(btf, "foo", 36);
+		if (!ASSERT_EQ(id, 5, "btf__add_struct foo"))
+			break;
+		err = btf__add_field(btf, "a", LIST_HEAD, 0, 0);
+		if (!ASSERT_OK(err, "btf__add_field foo::a"))
+			break;
+		err = btf__add_field(btf, "b", LIST_NODE, 128, 0);
+		if (!ASSERT_OK(err, "btf__add_field foo::b"))
+			break;
+		err = btf__add_field(btf, "c", SPIN_LOCK, 256, 0);
+		if (!ASSERT_OK(err, "btf__add_field foo::c"))
+			break;
+		id = btf__add_decl_tag(btf, "contains:foo:b", 5, 0);
+		if (!ASSERT_EQ(id, 6, "btf__add_decl_tag contains:foo:b"))
+			break;
+
+		err = btf__load_into_kernel(btf);
+		ASSERT_EQ(err, -ELOOP, "check btf");
+		btf__free(btf);
+		break;
+	}
+
+	while (test__start_subtest("btf: owning | owned ABA cycle")) {
+		btf = init_btf();
+		if (!ASSERT_OK_PTR(btf, "init_btf"))
+			break;
+		id = btf__add_struct(btf, "foo", 36);
+		if (!ASSERT_EQ(id, 5, "btf__add_struct foo"))
+			break;
+		err = btf__add_field(btf, "a", LIST_HEAD, 0, 0);
+		if (!ASSERT_OK(err, "btf__add_field foo::a"))
+			break;
+		err = btf__add_field(btf, "b", LIST_NODE, 128, 0);
+		if (!ASSERT_OK(err, "btf__add_field foo::b"))
+			break;
+		err = btf__add_field(btf, "c", SPIN_LOCK, 256, 0);
+		if (!ASSERT_OK(err, "btf__add_field foo::c"))
+			break;
+		id = btf__add_decl_tag(btf, "contains:bar:b", 5, 0);
+		if (!ASSERT_EQ(id, 6, "btf__add_decl_tag contains:bar:b"))
+			break;
+		id = btf__add_struct(btf, "bar", 36);
+		if (!ASSERT_EQ(id, 7, "btf__add_struct bar"))
+			break;
+		err = btf__add_field(btf, "a", LIST_HEAD, 0, 0);
+		if (!ASSERT_OK(err, "btf__add_field bar::a"))
+			break;
+		err = btf__add_field(btf, "b", LIST_NODE, 128, 0);
+		if (!ASSERT_OK(err, "btf__add_field bar::b"))
+			break;
+		err = btf__add_field(btf, "c", SPIN_LOCK, 256, 0);
+		if (!ASSERT_OK(err, "btf__add_field bar::c"))
+			break;
+		id = btf__add_decl_tag(btf, "contains:foo:b", 7, 0);
+		if (!ASSERT_EQ(id, 8, "btf__add_decl_tag contains:foo:b"))
+			break;
+
+		err = btf__load_into_kernel(btf);
+		ASSERT_EQ(err, -ELOOP, "check btf");
+		btf__free(btf);
+		break;
+	}
+
+	while (test__start_subtest("btf: owning -> owned")) {
+		btf = init_btf();
+		if (!ASSERT_OK_PTR(btf, "init_btf"))
+			break;
+		id = btf__add_struct(btf, "foo", 20);
+		if (!ASSERT_EQ(id, 5, "btf__add_struct foo"))
+			break;
+		err = btf__add_field(btf, "a", LIST_HEAD, 0, 0);
+		if (!ASSERT_OK(err, "btf__add_field foo::a"))
+			break;
+		err = btf__add_field(btf, "b", SPIN_LOCK, 128, 0);
+		if (!ASSERT_OK(err, "btf__add_field foo::b"))
+			break;
+		id = btf__add_decl_tag(btf, "contains:bar:a", 5, 0);
+		if (!ASSERT_EQ(id, 6, "btf__add_decl_tag contains:bar:a"))
+			break;
+		id = btf__add_struct(btf, "bar", 16);
+		if (!ASSERT_EQ(id, 7, "btf__add_struct bar"))
+			break;
+		err = btf__add_field(btf, "a", LIST_NODE, 0, 0);
+		if (!ASSERT_OK(err, "btf__add_field bar::a"))
+			break;
+
+		err = btf__load_into_kernel(btf);
+		ASSERT_EQ(err, 0, "check btf");
+		btf__free(btf);
+		break;
+	}
+
+	while (test__start_subtest("btf: owning -> owning | owned -> owned")) {
+		btf = init_btf();
+		if (!ASSERT_OK_PTR(btf, "init_btf"))
+			break;
+		id = btf__add_struct(btf, "foo", 20);
+		if (!ASSERT_EQ(id, 5, "btf__add_struct foo"))
+			break;
+		err = btf__add_field(btf, "a", LIST_HEAD, 0, 0);
+		if (!ASSERT_OK(err, "btf__add_field foo::a"))
+			break;
+		err = btf__add_field(btf, "b", SPIN_LOCK, 128, 0);
+		if (!ASSERT_OK(err, "btf__add_field foo::b"))
+			break;
+		id = btf__add_decl_tag(btf, "contains:bar:b", 5, 0);
+		if (!ASSERT_EQ(id, 6, "btf__add_decl_tag contains:bar:b"))
+			break;
+		id = btf__add_struct(btf, "bar", 36);
+		if (!ASSERT_EQ(id, 7, "btf__add_struct bar"))
+			break;
+		err = btf__add_field(btf, "a", LIST_HEAD, 0, 0);
+		if (!ASSERT_OK(err, "btf__add_field bar::a"))
+			break;
+		err = btf__add_field(btf, "b", LIST_NODE, 128, 0);
+		if (!ASSERT_OK(err, "btf__add_field bar::b"))
+			break;
+		err = btf__add_field(btf, "c", SPIN_LOCK, 256, 0);
+		if (!ASSERT_OK(err, "btf__add_field bar::c"))
+			break;
+		id = btf__add_decl_tag(btf, "contains:baz:a", 7, 0);
+		if (!ASSERT_EQ(id, 8, "btf__add_decl_tag contains:baz:a"))
+			break;
+		id = btf__add_struct(btf, "baz", 16);
+		if (!ASSERT_EQ(id, 9, "btf__add_struct baz"))
+			break;
+		err = btf__add_field(btf, "a", LIST_NODE, 0, 0);
+		if (!ASSERT_OK(err, "btf__add_field baz:a"))
+			break;
+
+		err = btf__load_into_kernel(btf);
+		ASSERT_EQ(err, 0, "check btf");
+		btf__free(btf);
+		break;
+	}
+
+	while (test__start_subtest("btf: owning | owned -> owning | owned -> owned")) {
+		btf = init_btf();
+		if (!ASSERT_OK_PTR(btf, "init_btf"))
+			break;
+		id = btf__add_struct(btf, "foo", 36);
+		if (!ASSERT_EQ(id, 5, "btf__add_struct foo"))
+			break;
+		err = btf__add_field(btf, "a", LIST_HEAD, 0, 0);
+		if (!ASSERT_OK(err, "btf__add_field foo::a"))
+			break;
+		err = btf__add_field(btf, "b", LIST_NODE, 128, 0);
+		if (!ASSERT_OK(err, "btf__add_field foo::b"))
+			break;
+		err = btf__add_field(btf, "c", SPIN_LOCK, 256, 0);
+		if (!ASSERT_OK(err, "btf__add_field foo::c"))
+			break;
+		id = btf__add_decl_tag(btf, "contains:bar:b", 5, 0);
+		if (!ASSERT_EQ(id, 6, "btf__add_decl_tag contains:bar:b"))
+			break;
+		id = btf__add_struct(btf, "bar", 36);
+		if (!ASSERT_EQ(id, 7, "btf__add_struct bar"))
+			break;
+		err = btf__add_field(btf, "a", LIST_HEAD, 0, 0);
+		if (!ASSERT_OK(err, "btf__add_field bar:a"))
+			break;
+		err = btf__add_field(btf, "b", LIST_NODE, 128, 0);
+		if (!ASSERT_OK(err, "btf__add_field bar:b"))
+			break;
+		err = btf__add_field(btf, "c", SPIN_LOCK, 256, 0);
+		if (!ASSERT_OK(err, "btf__add_field bar:c"))
+			break;
+		id = btf__add_decl_tag(btf, "contains:baz:a", 7, 0);
+		if (!ASSERT_EQ(id, 8, "btf__add_decl_tag contains:baz:a"))
+			break;
+		id = btf__add_struct(btf, "baz", 16);
+		if (!ASSERT_EQ(id, 9, "btf__add_struct baz"))
+			break;
+		err = btf__add_field(btf, "a", LIST_NODE, 0, 0);
+		if (!ASSERT_OK(err, "btf__add_field baz:a"))
+			break;
+
+		err = btf__load_into_kernel(btf);
+		ASSERT_EQ(err, -ELOOP, "check btf");
+		btf__free(btf);
+		break;
+	}
+
+	while (test__start_subtest("btf: owning -> owning | owned -> owning | owned -> owned")) {
+		btf = init_btf();
+		if (!ASSERT_OK_PTR(btf, "init_btf"))
+			break;
+		id = btf__add_struct(btf, "foo", 20);
+		if (!ASSERT_EQ(id, 5, "btf__add_struct foo"))
+			break;
+		err = btf__add_field(btf, "a", LIST_HEAD, 0, 0);
+		if (!ASSERT_OK(err, "btf__add_field foo::a"))
+			break;
+		err = btf__add_field(btf, "b", SPIN_LOCK, 128, 0);
+		if (!ASSERT_OK(err, "btf__add_field foo::b"))
+			break;
+		id = btf__add_decl_tag(btf, "contains:bar:b", 5, 0);
+		if (!ASSERT_EQ(id, 6, "btf__add_decl_tag contains:bar:b"))
+			break;
+		id = btf__add_struct(btf, "bar", 36);
+		if (!ASSERT_EQ(id, 7, "btf__add_struct bar"))
+			break;
+		err = btf__add_field(btf, "a", LIST_HEAD, 0, 0);
+		if (!ASSERT_OK(err, "btf__add_field bar::a"))
+			break;
+		err = btf__add_field(btf, "b", LIST_NODE, 128, 0);
+		if (!ASSERT_OK(err, "btf__add_field bar::b"))
+			break;
+		err = btf__add_field(btf, "c", SPIN_LOCK, 256, 0);
+		if (!ASSERT_OK(err, "btf__add_field bar::c"))
+			break;
+		id = btf__add_decl_tag(btf, "contains:baz:b", 7, 0);
+		if (!ASSERT_EQ(id, 8, "btf__add_decl_tag"))
+			break;
+		id = btf__add_struct(btf, "baz", 36);
+		if (!ASSERT_EQ(id, 9, "btf__add_struct baz"))
+			break;
+		err = btf__add_field(btf, "a", LIST_HEAD, 0, 0);
+		if (!ASSERT_OK(err, "btf__add_field bar::a"))
+			break;
+		err = btf__add_field(btf, "b", LIST_NODE, 128, 0);
+		if (!ASSERT_OK(err, "btf__add_field bar::b"))
+			break;
+		err = btf__add_field(btf, "c", SPIN_LOCK, 256, 0);
+		if (!ASSERT_OK(err, "btf__add_field bar::c"))
+			break;
+		id = btf__add_decl_tag(btf, "contains:bam:a", 9, 0);
+		if (!ASSERT_EQ(id, 10, "btf__add_decl_tag contains:bam:a"))
+			break;
+		id = btf__add_struct(btf, "bam", 16);
+		if (!ASSERT_EQ(id, 11, "btf__add_struct bam"))
+			break;
+		err = btf__add_field(btf, "a", LIST_NODE, 0, 0);
+		if (!ASSERT_OK(err, "btf__add_field bam::a"))
+			break;
+
+		err = btf__load_into_kernel(btf);
+		ASSERT_EQ(err, -ELOOP, "check btf");
+		btf__free(btf);
+		break;
+	}
+}
+
 void test_linked_list(void)
 {
 	int i;
@@ -243,6 +727,7 @@ void test_linked_list(void)
 		test_linked_list_fail_prog(linked_list_fail_tests[i].prog_name,
 					   linked_list_fail_tests[i].err_msg);
 	}
+	test_btf();
 	test_linked_list_success(PUSH_POP, false);
 	test_linked_list_success(PUSH_POP, true);
 	test_linked_list_success(PUSH_POP_MULT, false);
-- 
2.38.1


^ permalink raw reply related	[flat|nested] 48+ messages in thread

* Re: [PATCH bpf-next v7 08/26] bpf: Introduce allocated objects support
  2022-11-14 19:15 ` [PATCH bpf-next v7 08/26] bpf: Introduce allocated objects support Kumar Kartikeya Dwivedi
@ 2022-11-15  5:48   ` Alexei Starovoitov
  2022-11-15 16:47     ` Kumar Kartikeya Dwivedi
  0 siblings, 1 reply; 48+ messages in thread
From: Alexei Starovoitov @ 2022-11-15  5:48 UTC (permalink / raw)
  To: Kumar Kartikeya Dwivedi
  Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Martin KaFai Lau, Dave Marchevsky

On Tue, Nov 15, 2022 at 12:45:29AM +0530, Kumar Kartikeya Dwivedi wrote:
> Introduce support for representing pointers to objects allocated by the
> BPF program, i.e. PTR_TO_BTF_ID that point to a type in program BTF.
> This is indicated by the presence of MEM_ALLOC type flag in reg->type to
> avoid having to check btf_is_kernel when trying to match argument types
> in helpers.
> 
> Whenever walking such types, any pointers being walked will always yield
> a SCALAR instead of pointer. In the future we might permit kptr inside
> such allocated objects (either kernel or local), and it will then form a

(either kernel or program allocated) ?

> PTR_TO_BTF_ID of the respective type.
> 
> For now, such allocated objects will always be referenced in verifier
> context, hence ref_obj_id == 0 for them is a bug. It is allowed to write
> to such objects, as long as fields that are special are not touched
> (support for which will be added in subsequent patches). Note that once
> such a pointer is marked PTR_UNTRUSTED, it is no longer allowed to write
> to it.
> 
> No PROBE_MEM handling is therefore done for loads into this type unless
> PTR_UNTRUSTED is part of the register type, since they can never be in
> an undefined state, and their lifetime will always be valid.
> 
> Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
> ---
>  include/linux/bpf.h   | 11 +++++++++++
>  kernel/bpf/btf.c      |  5 +++++
>  kernel/bpf/verifier.c | 25 +++++++++++++++++++++++--
>  3 files changed, 39 insertions(+), 2 deletions(-)
> 
> diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> index 49f9d2bec401..3cab113b149e 100644
> --- a/include/linux/bpf.h
> +++ b/include/linux/bpf.h
> @@ -524,6 +524,11 @@ enum bpf_type_flag {
>  	/* Size is known at compile time. */
>  	MEM_FIXED_SIZE		= BIT(10 + BPF_BASE_TYPE_BITS),
>  
> +	/* MEM is of an allocated object of a type from program BTF. This is
> +	 * used to tag PTR_TO_BTF_ID allocated using bpf_obj_new.
> +	 */
> +	MEM_ALLOC		= BIT(11 + BPF_BASE_TYPE_BITS),
> +
>  	__BPF_TYPE_FLAG_MAX,
>  	__BPF_TYPE_LAST_FLAG	= __BPF_TYPE_FLAG_MAX - 1,
>  };
> @@ -2791,4 +2796,10 @@ struct bpf_key {
>  	bool has_ref;
>  };
>  #endif /* CONFIG_KEYS */
> +
> +static inline bool type_is_alloc(u32 type)
> +{
> +	return type & MEM_ALLOC;
> +}
> +
>  #endif /* _LINUX_BPF_H */
> diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c
> index 875355ff3718..9a596f430558 100644
> --- a/kernel/bpf/btf.c
> +++ b/kernel/bpf/btf.c
> @@ -6034,6 +6034,11 @@ int btf_struct_access(struct bpf_verifier_log *log,
>  
>  		switch (err) {
>  		case WALK_PTR:
> +			/* For local types, the destination register cannot
> +			 * become a pointer again.
> +			 */
> +			if (type_is_alloc(reg->type))
> +				return SCALAR_VALUE;
>  			/* If we found the pointer or scalar on t+off,
>  			 * we're done.
>  			 */
> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> index 5e74f460dfd0..d726d55622c9 100644
> --- a/kernel/bpf/verifier.c
> +++ b/kernel/bpf/verifier.c
> @@ -4687,14 +4687,27 @@ static int check_ptr_to_btf_access(struct bpf_verifier_env *env,
>  		return -EACCES;
>  	}
>  
> -	if (env->ops->btf_struct_access) {
> +	if (env->ops->btf_struct_access && !type_is_alloc(reg->type)) {
> +		if (!btf_is_kernel(reg->btf)) {
> +			verbose(env, "verifier internal error: reg->btf must be kernel btf\n");
> +			return -EFAULT;
> +		}
>  		ret = env->ops->btf_struct_access(&env->log, reg, off, size, atype, &btf_id, &flag);
>  	} else {
> -		if (atype != BPF_READ) {
> +		/* Writes are permitted with default btf_struct_access for local
> +		 * kptrs (which always have ref_obj_id > 0), but not for

for program allocated objects (which always have ref_obj_id > 0) ?
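
Fwiw the two register type flavors in play here, as the patch composes
them further down:

	PTR_TO_BTF_ID | MEM_ALLOC                   /* writable, ref_obj_id > 0 */
	PTR_TO_BTF_ID | MEM_ALLOC | PTR_UNTRUSTED   /* read-only, loads get PROBE_MEM */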

> +		 * untrusted PTR_TO_BTF_ID | MEM_ALLOC.
> +		 */
> +		if (atype != BPF_READ && reg->type != (PTR_TO_BTF_ID | MEM_ALLOC)) {
>  			verbose(env, "only read is supported\n");
>  			return -EACCES;
>  		}
>  
> +		if (type_is_alloc(reg->type) && !reg->ref_obj_id) {
> +			verbose(env, "verifier internal error: ref_obj_id for allocated object must be non-zero\n");
> +			return -EFAULT;
> +		}
> +
>  		ret = btf_struct_access(&env->log, reg, off, size, atype, &btf_id, &flag);
>  	}
>  
> @@ -5973,6 +5986,7 @@ int check_func_arg_reg_off(struct bpf_verifier_env *env,
>  	 * fixed offset.
>  	 */
>  	case PTR_TO_BTF_ID:
> +	case PTR_TO_BTF_ID | MEM_ALLOC:
>  		/* When referenced PTR_TO_BTF_ID is passed to release function,
>  		 * its fixed offset must be 0. In the other cases, fixed offset
>  		 * can be non-zero.
> @@ -13659,6 +13673,13 @@ static int convert_ctx_accesses(struct bpf_verifier_env *env)
>  			break;
>  		case PTR_TO_BTF_ID:
>  		case PTR_TO_BTF_ID | PTR_UNTRUSTED:
> +		/* PTR_TO_BTF_ID | MEM_ALLOC always has a valid lifetime, unlike
> +		 * PTR_TO_BTF_ID, and an active ref_obj_id, but the same cannot
> +		 * be said once it is marked PTR_UNTRUSTED, hence we must handle
> +		 * any faults for loads into such types. BPF_WRITE is disallowed
> +		 * for this case.
> +		 */
> +		case PTR_TO_BTF_ID | MEM_ALLOC | PTR_UNTRUSTED:
>  			if (type == BPF_READ) {
>  				insn->code = BPF_LDX | BPF_PROBE_MEM |
>  					BPF_SIZE((insn)->code);
> -- 
> 2.38.1
> 

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH bpf-next v7 09/26] bpf: Recognize lock and list fields in allocated objects
  2022-11-14 19:15 ` [PATCH bpf-next v7 09/26] bpf: Recognize lock and list fields in allocated objects Kumar Kartikeya Dwivedi
@ 2022-11-15  5:50   ` Alexei Starovoitov
  2022-11-15 16:48     ` Kumar Kartikeya Dwivedi
  0 siblings, 1 reply; 48+ messages in thread
From: Alexei Starovoitov @ 2022-11-15  5:50 UTC (permalink / raw)
  To: Kumar Kartikeya Dwivedi
  Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Martin KaFai Lau, Dave Marchevsky

On Tue, Nov 15, 2022 at 12:45:30AM +0530, Kumar Kartikeya Dwivedi wrote:
> +	parse:
> +		if (!tab) {
> +			tab = kzalloc(offsetof(struct btf_struct_metas, types[1]),
> +				      GFP_KERNEL | __GFP_NOWARN);
> +			if (!tab)
> +				return ERR_PTR(-ENOMEM);
> +		} else {
> +			struct btf_struct_metas *new_tab;
> +
> +			new_tab = krealloc(tab, offsetof(struct btf_struct_metas, types[tab->cnt + 1]),
> +					   GFP_KERNEL | __GFP_NOWARN);
> +			if (!new_tab) {
> +				ret = -ENOMEM;
> +				goto free;
> +			}
> +			tab = new_tab;
> +		}

If @p is %NULL, krealloc() behaves exactly like kmalloc().

That can help to simplify above a bit?
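
e.g. something like this (untested sketch; assumes 'cnt' is the only
field that depended on kzalloc()'s zeroing on the first allocation):

	struct btf_struct_metas *new_tab;
	u32 cnt = tab ? tab->cnt : 0;

	/* krealloc(NULL, size, flags) is equivalent to kmalloc(size, flags),
	 * so the initial allocation and the grow path can share one branch.
	 * krealloc() preserves existing contents but does not zero fresh
	 * memory, hence cnt is (re)set explicitly.
	 */
	new_tab = krealloc(tab, offsetof(struct btf_struct_metas, types[cnt + 1]),
			   GFP_KERNEL | __GFP_NOWARN);
	if (!new_tab) {
		ret = -ENOMEM;
		goto free;
	}
	new_tab->cnt = cnt;
	tab = new_tab;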

> +		type = &tab->types[tab->cnt];
> +
> +		type->btf_id = i;
> +		record = btf_parse_fields(btf, t, BPF_SPIN_LOCK | BPF_LIST_HEAD | BPF_LIST_NODE, t->size);
> +		if (IS_ERR_OR_NULL(record)) {
> +			ret = PTR_ERR_OR_ZERO(record) ?: -EFAULT;
> +			goto free;
> +		}
> +		foffs = btf_parse_field_offs(record);
> +		if (WARN_ON_ONCE(IS_ERR_OR_NULL(foffs))) {

WARN_ON_ONCE ?
Pls add a comment.
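
e.g. a guess at the intent that could be spelled out in a comment
(sketch only, not verified against the actual failure modes):

	foffs = btf_parse_field_offs(record);
	/* record was parsed and validated just above, so failure here can
	 * only be an allocation error or an internal inconsistency, neither
	 * of which we expect; hence WARN_ON_ONCE.
	 */
	if (WARN_ON_ONCE(IS_ERR_OR_NULL(foffs))) {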

> +			btf_record_free(record);
> +			ret = -EFAULT;
> +			goto free;
> +		}
> +		type->record = record;
> +		type->field_offs = foffs;
> +		tab->cnt++;
> +	}
> +	return tab;
> +free:
> +	btf_struct_metas_free(tab);
> +	return ERR_PTR(ret);
> +}
> +
> +struct btf_struct_meta *btf_find_struct_meta(const struct btf *btf, u32 btf_id)
> +{
> +	struct btf_struct_metas *tab;
> +
> +	BUILD_BUG_ON(offsetof(struct btf_struct_meta, btf_id) != 0);
> +	tab = btf->struct_meta_tab;
> +	if (!tab)
> +		return NULL;
> +	return bsearch(&btf_id, tab->types, tab->cnt, sizeof(tab->types[0]), btf_id_cmp_func);
> +}
> +
>  static int btf_check_type_tags(struct btf_verifier_env *env,
>  			       struct btf *btf, int start_id)
>  {
> @@ -5191,6 +5338,7 @@ static int btf_check_type_tags(struct btf_verifier_env *env,
>  static struct btf *btf_parse(bpfptr_t btf_data, u32 btf_data_size,
>  			     u32 log_level, char __user *log_ubuf, u32 log_size)
>  {
> +	struct btf_struct_metas *struct_meta_tab;
>  	struct btf_verifier_env *env = NULL;
>  	struct bpf_verifier_log *log;
>  	struct btf *btf = NULL;
> @@ -5259,15 +5407,24 @@ static struct btf *btf_parse(bpfptr_t btf_data, u32 btf_data_size,
>  	if (err)
>  		goto errout;
>  
> +	struct_meta_tab = btf_parse_struct_metas(log, btf);
> +	if (IS_ERR(struct_meta_tab)) {
> +		err = PTR_ERR(struct_meta_tab);
> +		goto errout;
> +	}
> +	btf->struct_meta_tab = struct_meta_tab;
> +
>  	if (log->level && bpf_verifier_log_full(log)) {
>  		err = -ENOSPC;
> -		goto errout;
> +		goto errout_meta;
>  	}
>  
>  	btf_verifier_env_free(env);
>  	refcount_set(&btf->refcnt, 1);
>  	return btf;
>  
> +errout_meta:
> +	btf_free_struct_meta_tab(btf);
>  errout:
>  	btf_verifier_env_free(env);
>  	if (btf)
> @@ -6028,6 +6185,28 @@ int btf_struct_access(struct bpf_verifier_log *log,
>  	u32 id = reg->btf_id;
>  	int err;
>  
> +	while (type_is_alloc(reg->type)) {
> +		struct btf_struct_meta *meta;
> +		struct btf_record *rec;
> +		int i;
> +
> +		meta = btf_find_struct_meta(btf, id);
> +		if (!meta)
> +			break;
> +		rec = meta->record;
> +		for (i = 0; i < rec->cnt; i++) {
> +			struct btf_field *field = &rec->fields[i];
> +			u32 offset = field->offset;
> +			if (off < offset + btf_field_type_size(field->type) && offset < off + size) {
> +				bpf_log(log,
> +					"direct access to %s is disallowed\n",
> +					btf_field_type_name(field->type));
> +				return -EACCES;
> +			}
> +		}
> +		break;
> +	}
> +
>  	t = btf_type_by_id(btf, id);
>  	do {
>  		err = btf_struct_walk(log, btf, t, off, size, &id, &tmp_flag);
> @@ -7269,23 +7448,6 @@ bool btf_is_module(const struct btf *btf)
>  	return btf->kernel_btf && strcmp(btf->name, "vmlinux") != 0;
>  }
>  
> -static int btf_id_cmp_func(const void *a, const void *b)
> -{
> -	const int *pa = a, *pb = b;
> -
> -	return *pa - *pb;
> -}
> -
> -bool btf_id_set_contains(const struct btf_id_set *set, u32 id)
> -{
> -	return bsearch(&id, set->ids, set->cnt, sizeof(u32), btf_id_cmp_func) != NULL;
> -}
> -
> -static void *btf_id_set8_contains(const struct btf_id_set8 *set, u32 id)
> -{
> -	return bsearch(&id, set->pairs, set->cnt, sizeof(set->pairs[0]), btf_id_cmp_func);
> -}
> -
>  enum {
>  	BTF_MODULE_F_LIVE = (1 << 0),
>  };
> diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
> index fdbae52f463f..c96039a4e57f 100644
> --- a/kernel/bpf/syscall.c
> +++ b/kernel/bpf/syscall.c
> @@ -537,6 +537,7 @@ void btf_record_free(struct btf_record *rec)
>  			btf_put(rec->fields[i].kptr.btf);
>  			break;
>  		case BPF_LIST_HEAD:
> +		case BPF_LIST_NODE:
>  			/* Nothing to release for bpf_list_head */
>  			break;
>  		default:
> @@ -582,6 +583,7 @@ struct btf_record *btf_record_dup(const struct btf_record *rec)
>  			}
>  			break;
>  		case BPF_LIST_HEAD:
> +		case BPF_LIST_NODE:
>  			/* Nothing to acquire for bpf_list_head */
>  			break;
>  		default:
> @@ -648,6 +650,8 @@ void bpf_obj_free_fields(const struct btf_record *rec, void *obj)
>  				continue;
>  			bpf_list_head_free(field, field_ptr, obj + rec->spin_lock_off);
>  			break;
> +		case BPF_LIST_NODE:
> +			break;
>  		default:
>  			WARN_ON_ONCE(1);
>  			continue;
> -- 
> 2.38.1
> 

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH bpf-next v7 00/26] Allocated objects, BPF linked lists
  2022-11-14 19:15 [PATCH bpf-next v7 00/26] Allocated objects, BPF linked lists Kumar Kartikeya Dwivedi
                   ` (25 preceding siblings ...)
  2022-11-14 19:15 ` [PATCH bpf-next v7 26/26] selftests/bpf: Add BTF sanity tests Kumar Kartikeya Dwivedi
@ 2022-11-15  6:00 ` patchwork-bot+netdevbpf
  26 siblings, 0 replies; 48+ messages in thread
From: patchwork-bot+netdevbpf @ 2022-11-15  6:00 UTC (permalink / raw)
  To: Kumar Kartikeya Dwivedi
  Cc: bpf, ast, andrii, daniel, martin.lau, davemarchevsky

Hello:

This series was applied to bpf/bpf-next.git (master)
by Alexei Starovoitov <ast@kernel.org>:

On Tue, 15 Nov 2022 00:45:21 +0530 you wrote:
> This series introduces user defined BPF objects of a type in program
> BTF. This allows BPF programs to allocate their own objects, build their
> own object hierarchies, and use the basic building blocks provided by
> BPF runtime to build their own data structures flexibly.
> 
> Then, we introduce the support for single ownership BPF linked lists,
> which can be put inside BPF maps, or allocated objects, and hold such
> allocated objects as elements. It works as an intrusive collection,
> which is done to allow making allocated objects part of multiple data
> structures at the same time in the future.
> 
> [...]

Here is the summary with links:
  - [bpf-next,v7,01/26] bpf: Remove local kptr references in documentation
    https://git.kernel.org/bpf/bpf-next/c/1f6d52f1a894
  - [bpf-next,v7,02/26] bpf: Remove BPF_MAP_OFF_ARR_MAX
    https://git.kernel.org/bpf/bpf-next/c/2d577252579b
  - [bpf-next,v7,03/26] bpf: Fix copy_map_value, zero_map_value
    https://git.kernel.org/bpf/bpf-next/c/e5feed0f64f7
  - [bpf-next,v7,04/26] bpf: Support bpf_list_head in map values
    https://git.kernel.org/bpf/bpf-next/c/f0c5941ff5b2
  - [bpf-next,v7,05/26] bpf: Rename RET_PTR_TO_ALLOC_MEM
    https://git.kernel.org/bpf/bpf-next/c/2de2669b4e52
  - [bpf-next,v7,06/26] bpf: Rename MEM_ALLOC to MEM_RINGBUF
    https://git.kernel.org/bpf/bpf-next/c/894f2a8b1673
  - [bpf-next,v7,07/26] bpf: Refactor btf_struct_access
    https://git.kernel.org/bpf/bpf-next/c/6728aea7216c
  - [bpf-next,v7,08/26] bpf: Introduce allocated objects support
    (no matching commit)
  - [bpf-next,v7,09/26] bpf: Recognize lock and list fields in allocated objects
    (no matching commit)
  - [bpf-next,v7,10/26] bpf: Verify ownership relationships for user BTF types
    (no matching commit)
  - [bpf-next,v7,11/26] bpf: Allow locking bpf_spin_lock in allocated objects
    (no matching commit)
  - [bpf-next,v7,12/26] bpf: Allow locking bpf_spin_lock global variables
    (no matching commit)
  - [bpf-next,v7,13/26] bpf: Allow locking bpf_spin_lock in inner map values
    (no matching commit)
  - [bpf-next,v7,14/26] bpf: Rewrite kfunc argument handling
    (no matching commit)
  - [bpf-next,v7,15/26] bpf: Drop kfunc bits from btf_check_func_arg_match
    (no matching commit)
  - [bpf-next,v7,16/26] bpf: Support constant scalar arguments for kfuncs
    (no matching commit)
  - [bpf-next,v7,17/26] bpf: Introduce bpf_obj_new
    (no matching commit)
  - [bpf-next,v7,18/26] bpf: Introduce bpf_obj_drop
    (no matching commit)
  - [bpf-next,v7,19/26] bpf: Permit NULL checking pointer with non-zero fixed offset
    (no matching commit)
  - [bpf-next,v7,20/26] bpf: Introduce single ownership BPF linked list API
    (no matching commit)
  - [bpf-next,v7,21/26] bpf: Add 'release on unlock' logic for bpf_list_push_{front,back}
    (no matching commit)
  - [bpf-next,v7,22/26] selftests/bpf: Add __contains macro to bpf_experimental.h
    (no matching commit)
  - [bpf-next,v7,23/26] selftests/bpf: Update spinlock selftest
    (no matching commit)
  - [bpf-next,v7,24/26] selftests/bpf: Add failure test cases for spin lock pairing
    (no matching commit)
  - [bpf-next,v7,25/26] selftests/bpf: Add BPF linked list API tests
    (no matching commit)
  - [bpf-next,v7,26/26] selftests/bpf: Add BTF sanity tests
    (no matching commit)

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html



^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH bpf-next v7 14/26] bpf: Rewrite kfunc argument handling
  2022-11-14 19:15 ` [PATCH bpf-next v7 14/26] bpf: Rewrite kfunc argument handling Kumar Kartikeya Dwivedi
@ 2022-11-15  6:10   ` Alexei Starovoitov
  2022-11-15 16:51     ` Kumar Kartikeya Dwivedi
  0 siblings, 1 reply; 48+ messages in thread
From: Alexei Starovoitov @ 2022-11-15  6:10 UTC (permalink / raw)
  To: Kumar Kartikeya Dwivedi
  Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Martin KaFai Lau, Dave Marchevsky

On Tue, Nov 15, 2022 at 12:45:35AM +0530, Kumar Kartikeya Dwivedi wrote:
> As we continue to add more features, argument types, kfunc flags, and
> different extensions to kfuncs, the code to verify the correctness of
> the kfunc prototype wrt the passed in registers has become ad-hoc and
> ugly to read. To make life easier, and make a very clear split between
> different stages of argument processing, move all the code into
> verifier.c and refactor into easier to read helpers and functions.
> 
> This also makes sharing code within the verifier easier with kfunc
> argument processing. This will be more and more useful in later patches
> as we are now moving to implement very core BPF helpers as kfuncs, to
> keep them experimental before baking into UAPI.
> 
> Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
> ---
>  include/linux/btf.h                           |  31 +-
>  kernel/bpf/btf.c                              |  16 +-
>  kernel/bpf/verifier.c                         | 547 +++++++++++++++++-
>  .../bpf/prog_tests/kfunc_dynptr_param.c       |   2 +-
>  tools/testing/selftests/bpf/verifier/calls.c  |   2 +-
>  .../selftests/bpf/verifier/ref_tracking.c     |   4 +-
>  6 files changed, 569 insertions(+), 33 deletions(-)
> 
> diff --git a/include/linux/btf.h b/include/linux/btf.h
> index 42d8f3730a8d..d5b26380a60f 100644
> --- a/include/linux/btf.h
> +++ b/include/linux/btf.h
> @@ -338,6 +338,16 @@ static inline bool btf_type_is_struct(const struct btf_type *t)
>  	return kind == BTF_KIND_STRUCT || kind == BTF_KIND_UNION;
>  }
>  
> +static inline bool __btf_type_is_struct(const struct btf_type *t)
> +{
> +	return BTF_INFO_KIND(t->info) == BTF_KIND_STRUCT;
> +}
> +
> +static inline bool btf_type_is_array(const struct btf_type *t)
> +{
> +	return BTF_INFO_KIND(t->info) == BTF_KIND_ARRAY;
> +}
> +
>  static inline u16 btf_type_vlen(const struct btf_type *t)
>  {
>  	return BTF_INFO_VLEN(t->info);
> @@ -439,9 +449,10 @@ static inline void *btf_id_set8_contains(const struct btf_id_set8 *set, u32 id)
>  	return bsearch(&id, set->pairs, set->cnt, sizeof(set->pairs[0]), btf_id_cmp_func);
>  }
>  
> -#ifdef CONFIG_BPF_SYSCALL
>  struct bpf_prog;
> +struct bpf_verifier_log;
>  
> +#ifdef CONFIG_BPF_SYSCALL
>  const struct btf_type *btf_type_by_id(const struct btf *btf, u32 type_id);
>  const char *btf_name_by_offset(const struct btf *btf, u32 offset);
>  struct btf *btf_parse_vmlinux(void);
> @@ -455,6 +466,12 @@ s32 btf_find_dtor_kfunc(struct btf *btf, u32 btf_id);
>  int register_btf_id_dtor_kfuncs(const struct btf_id_dtor_kfunc *dtors, u32 add_cnt,
>  				struct module *owner);
>  struct btf_struct_meta *btf_find_struct_meta(const struct btf *btf, u32 btf_id);
> +const struct btf_member *
> +btf_get_prog_ctx_type(struct bpf_verifier_log *log, const struct btf *btf,
> +		      const struct btf_type *t, enum bpf_prog_type prog_type,
> +		      int arg);
> +bool btf_types_are_same(const struct btf *btf1, u32 id1,
> +			const struct btf *btf2, u32 id2);
>  #else
>  static inline const struct btf_type *btf_type_by_id(const struct btf *btf,
>  						    u32 type_id)
> @@ -490,6 +507,18 @@ static inline struct btf_struct_meta *btf_find_struct_meta(const struct btf *btf
>  {
>  	return NULL;
>  }
> +static inline const struct btf_member *
> +btf_get_prog_ctx_type(struct bpf_verifier_log *log, const struct btf *btf,
> +		      const struct btf_type *t, enum bpf_prog_type prog_type,
> +		      int arg)
> +{
> +	return NULL;
> +}
> +static inline bool btf_types_are_same(const struct btf *btf1, u32 id1,
> +				      const struct btf *btf2, u32 id2)
> +{
> +	return false;
> +}
>  #endif
>  
>  static inline bool btf_type_is_struct_ptr(struct btf *btf, const struct btf_type *t)
> diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c
> index 10644343d877..ff8b46c209dd 100644
> --- a/kernel/bpf/btf.c
> +++ b/kernel/bpf/btf.c
> @@ -478,16 +478,6 @@ static bool btf_type_nosize_or_null(const struct btf_type *t)
>  	return !t || btf_type_nosize(t);
>  }
>  
> -static bool __btf_type_is_struct(const struct btf_type *t)
> -{
> -	return BTF_INFO_KIND(t->info) == BTF_KIND_STRUCT;
> -}
> -
> -static bool btf_type_is_array(const struct btf_type *t)
> -{
> -	return BTF_INFO_KIND(t->info) == BTF_KIND_ARRAY;
> -}
> -
>  static bool btf_type_is_datasec(const struct btf_type *t)
>  {
>  	return BTF_INFO_KIND(t->info) == BTF_KIND_DATASEC;
> @@ -5537,7 +5527,7 @@ static u8 bpf_ctx_convert_map[] = {
>  #undef BPF_MAP_TYPE
>  #undef BPF_LINK_TYPE
>  
> -static const struct btf_member *
> +const struct btf_member *
>  btf_get_prog_ctx_type(struct bpf_verifier_log *log, const struct btf *btf,
>  		      const struct btf_type *t, enum bpf_prog_type prog_type,
>  		      int arg)
> @@ -6323,8 +6313,8 @@ int btf_struct_access(struct bpf_verifier_log *log,
>   * end up with two different module BTFs, but IDs point to the common type in
>   * vmlinux BTF.
>   */
> -static bool btf_types_are_same(const struct btf *btf1, u32 id1,
> -			       const struct btf *btf2, u32 id2)
> +bool btf_types_are_same(const struct btf *btf1, u32 id1,
> +			const struct btf *btf2, u32 id2)
>  {
>  	if (id1 != id2)
>  		return false;
> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> index 99b5edb56978..ddb7ac1cb529 100644
> --- a/kernel/bpf/verifier.c
> +++ b/kernel/bpf/verifier.c
> @@ -7859,19 +7859,523 @@ static void mark_btf_func_reg_size(struct bpf_verifier_env *env, u32 regno,
>  	}
>  }
>  
> +struct bpf_kfunc_call_arg_meta {
> +	/* In parameters */
> +	struct btf *btf;
> +	u32 func_id;
> +	u32 kfunc_flags;
> +	const struct btf_type *func_proto;
> +	const char *func_name;
> +	/* Out parameters */
> +	u32 ref_obj_id;
> +	u8 release_regno;
> +	bool r0_rdonly;
> +	u64 r0_size;
> +};
> +
> +static bool is_kfunc_acquire(struct bpf_kfunc_call_arg_meta *meta)
> +{
> +	return meta->kfunc_flags & KF_ACQUIRE;
> +}
> +
> +static bool is_kfunc_ret_null(struct bpf_kfunc_call_arg_meta *meta)
> +{
> +	return meta->kfunc_flags & KF_RET_NULL;
> +}
> +
> +static bool is_kfunc_release(struct bpf_kfunc_call_arg_meta *meta)
> +{
> +	return meta->kfunc_flags & KF_RELEASE;
> +}
> +
> +static bool is_kfunc_trusted_args(struct bpf_kfunc_call_arg_meta *meta)
> +{
> +	return meta->kfunc_flags & KF_TRUSTED_ARGS;
> +}
> +
> +static bool is_kfunc_sleepable(struct bpf_kfunc_call_arg_meta *meta)
> +{
> +	return meta->kfunc_flags & KF_SLEEPABLE;
> +}
> +
> +static bool is_kfunc_destructive(struct bpf_kfunc_call_arg_meta *meta)
> +{
> +	return meta->kfunc_flags & KF_DESTRUCTIVE;
> +}
> +
> +static bool is_kfunc_arg_kptr_get(struct bpf_kfunc_call_arg_meta *meta, int arg)
> +{
> +	return arg == 0 && (meta->kfunc_flags & KF_KPTR_GET);
> +}
> +
> +static bool is_kfunc_arg_mem_size(const struct btf *btf,
> +				  const struct btf_param *arg,
> +				  const struct bpf_reg_state *reg)
> +{
> +	int len, sfx_len = sizeof("__sz") - 1;
> +	const struct btf_type *t;
> +	const char *param_name;
> +
> +	t = btf_type_skip_modifiers(btf, arg->type, NULL);
> +	if (!btf_type_is_scalar(t) || reg->type != SCALAR_VALUE)
> +		return false;
> +
> +	/* In the future, this can be ported to use BTF tagging */
> +	param_name = btf_name_by_offset(btf, arg->name_off);
> +	if (str_is_empty(param_name))
> +		return false;
> +	len = strlen(param_name);
> +	if (len < sfx_len)
> +		return false;
> +	param_name += len - sfx_len;
> +	if (strncmp(param_name, "__sz", sfx_len))
> +		return false;
> +
> +	return true;
> +}
> +
> +static bool is_kfunc_arg_ret_buf_size(const struct btf *btf,
> +				      const struct btf_param *arg,
> +				      const struct bpf_reg_state *reg,
> +				      const char *name)
> +{
> +	int len, target_len = strlen(name);
> +	const struct btf_type *t;
> +	const char *param_name;
> +
> +	t = btf_type_skip_modifiers(btf, arg->type, NULL);
> +	if (!btf_type_is_scalar(t) || reg->type != SCALAR_VALUE)
> +		return false;
> +
> +	param_name = btf_name_by_offset(btf, arg->name_off);
> +	if (str_is_empty(param_name))
> +		return false;
> +	len = strlen(param_name);
> +	if (len != target_len)
> +		return false;
> +	if (strcmp(param_name, name))
> +		return false;
> +
> +	return true;
> +}
> +
> +enum {
> +	KF_ARG_DYNPTR_ID,
> +};
> +
> +BTF_ID_LIST(kf_arg_btf_ids)
> +BTF_ID(struct, bpf_dynptr_kern)
> +
> +static bool is_kfunc_arg_dynptr(const struct btf *btf,
> +				const struct btf_param *arg)
> +{
> +	const struct btf_type *t;
> +	u32 res_id;
> +
> +	t = btf_type_skip_modifiers(btf, arg->type, NULL);
> +	if (!t)
> +		return false;
> +	if (!btf_type_is_ptr(t))
> +		return false;
> +	t = btf_type_skip_modifiers(btf, t->type, &res_id);
> +	if (!t)
> +		return false;
> +	return btf_types_are_same(btf, res_id, btf_vmlinux, kf_arg_btf_ids[KF_ARG_DYNPTR_ID]);
> +}
> +
> +/* Returns true if struct is composed of scalars, 4 levels of nesting allowed */
> +static bool __btf_type_is_scalar_struct(struct bpf_verifier_env *env,
> +					const struct btf *btf,
> +					const struct btf_type *t, int rec)
> +{
> +	const struct btf_type *member_type;
> +	const struct btf_member *member;
> +	u32 i;
> +
> +	if (!btf_type_is_struct(t))
> +		return false;
> +
> +	for_each_member(i, t, member) {
> +		const struct btf_array *array;
> +
> +		member_type = btf_type_skip_modifiers(btf, member->type, NULL);
> +		if (btf_type_is_struct(member_type)) {
> +			if (rec >= 3) {
> +				verbose(env, "max struct nesting depth exceeded\n");
> +				return false;
> +			}
> +			if (!__btf_type_is_scalar_struct(env, btf, member_type, rec + 1))
> +				return false;
> +			continue;
> +		}
> +		if (btf_type_is_array(member_type)) {
> +			array = btf_array(member_type);
> +			if (!array->nelems)
> +				return false;
> +			member_type = btf_type_skip_modifiers(btf, array->type, NULL);
> +			if (!btf_type_is_scalar(member_type))
> +				return false;
> +			continue;
> +		}
> +		if (!btf_type_is_scalar(member_type))
> +			return false;
> +	}
> +	return true;
> +}

Deleting the code from the next patch can be combined with this patch,
since it's a pure code move?

Similarly for the few funcs that are a pure code move... it's better to
have them in a single patch.

Not sure about the 2-patch split strategy in general.
Not clear whether it helps or hurts the code review.

> +
> +
> +static u32 *reg2btf_ids[__BPF_REG_TYPE_MAX] = {
> +#ifdef CONFIG_NET
> +	[PTR_TO_SOCKET] = &btf_sock_ids[BTF_SOCK_TYPE_SOCK],
> +	[PTR_TO_SOCK_COMMON] = &btf_sock_ids[BTF_SOCK_TYPE_SOCK_COMMON],
> +	[PTR_TO_TCP_SOCK] = &btf_sock_ids[BTF_SOCK_TYPE_TCP],
> +#endif
> +};
> +
> +enum kfunc_ptr_arg_type {
> +	KF_ARG_PTR_TO_CTX,
> +	KF_ARG_PTR_TO_KPTR_STRONG,   /* PTR_TO_KPTR but type specific */

What does the STRONG suffix signify?

PTR_TO_KPTR should always be safe.
Just to make it different from ARG_PTR_TO_KPTR in kptr_xchg that
has 'void *' arg and "auto converts" the type?

Here STRONG means that the type of the arg should match?

I think it's too verbose.
Just KF_ARG_PTR_TO_KPTR would be clear enough.
If we ever have another kptr_xchg that is done as kfunc
we can add KF_ARG_PTR_TO_KPTR_AUTO or some other name that we can bikeshed later.

> +	KF_ARG_PTR_TO_DYNPTR,
> +	KF_ARG_PTR_TO_BTF_ID,	     /* Also covers reg2btf_ids conversions */
> +	KF_ARG_PTR_TO_MEM,
> +	KF_ARG_PTR_TO_MEM_SIZE,	     /* Size derived from next argument, skip it */
> +};
> +
> +static enum kfunc_ptr_arg_type
> +get_kfunc_ptr_arg_type(struct bpf_verifier_env *env,
> +		       struct bpf_kfunc_call_arg_meta *meta,
> +		       const struct btf_type *t, const struct btf_type *ref_t,
> +		       const char *ref_tname, const struct btf_param *args,
> +		       int argno, int nargs)
> +{
> +	u32 regno = argno + 1;
> +	struct bpf_reg_state *regs = cur_regs(env);
> +	struct bpf_reg_state *reg = &regs[regno];
> +	bool arg_mem_size = false;
> +
> +	/* In this function, we verify the kfunc's BTF as per the argument type,
> +	 * leaving the rest of the verification with respect to the register
> +	 * type to our caller. When a set of conditions hold in the BTF type of
> +	 * arguments, we resolve it to a known kfunc_ptr_arg_type.
> +	 */
> +	if (btf_get_prog_ctx_type(&env->log, meta->btf, t, resolve_prog_type(env->prog), argno))
> +		return KF_ARG_PTR_TO_CTX;
> +
> +	if (is_kfunc_arg_kptr_get(meta, argno)) {
> +		if (!btf_type_is_ptr(ref_t)) {
> +			verbose(env, "arg#0 BTF type must be a double pointer for kptr_get kfunc\n");
> +			return -EINVAL;
> +		}
> +		ref_t = btf_type_by_id(meta->btf, ref_t->type);
> +		ref_tname = btf_name_by_offset(meta->btf, ref_t->name_off);
> +		if (!btf_type_is_struct(ref_t)) {
> +			verbose(env, "kernel function %s args#0 pointer type %s %s is not supported\n",
> +				meta->func_name, btf_type_str(ref_t), ref_tname);
> +			return -EINVAL;
> +		}
> +		return KF_ARG_PTR_TO_KPTR_STRONG;
> +	}
> +
> +	if (is_kfunc_arg_dynptr(meta->btf, &args[argno]))
> +		return KF_ARG_PTR_TO_DYNPTR;
> +
> +	if ((base_type(reg->type) == PTR_TO_BTF_ID || reg2btf_ids[base_type(reg->type)])) {
> +		if (!btf_type_is_struct(ref_t)) {
> +			verbose(env, "kernel function %s args#%d pointer type %s %s is not supported\n",
> +				meta->func_name, argno, btf_type_str(ref_t), ref_tname);
> +			return -EINVAL;
> +		}
> +		return KF_ARG_PTR_TO_BTF_ID;
> +	}
> +
> +	if (argno + 1 < nargs && is_kfunc_arg_mem_size(meta->btf, &args[argno + 1], &regs[regno + 1]))
> +		arg_mem_size = true;
> +
> +	/* This is the catch all argument type of register types supported by
> +	 * check_helper_mem_access. However, we only allow it when the argument
> +	 * type is a pointer to scalar, or struct composed (recursively) of
> +	 * scalars. When arg_mem_size is true, the pointer can be void *.
> +	 */
> +	if (!btf_type_is_scalar(ref_t) && !__btf_type_is_scalar_struct(env, meta->btf, ref_t, 0) &&
> +	    (arg_mem_size ? !btf_type_is_void(ref_t) : 1)) {
> +		verbose(env, "arg#%d pointer type %s %s must point to %sscalar, or struct with scalar\n",
> +			argno, btf_type_str(ref_t), ref_tname, arg_mem_size ? "void, " : "");
> +		return -EINVAL;
> +	}
> +	return arg_mem_size ? KF_ARG_PTR_TO_MEM_SIZE : KF_ARG_PTR_TO_MEM;
> +}
> +
> +static int process_kf_arg_ptr_to_btf_id(struct bpf_verifier_env *env,
> +					struct bpf_reg_state *reg,
> +					const struct btf_type *ref_t,
> +					const char *ref_tname, u32 ref_id,
> +					struct bpf_kfunc_call_arg_meta *meta,
> +					int argno)
> +{
> +	const struct btf_type *reg_ref_t;
> +	bool strict_type_match = false;
> +	const struct btf *reg_btf;
> +	const char *reg_ref_tname;
> +	u32 reg_ref_id;
> +
> +	if (reg->type == PTR_TO_BTF_ID) {
> +		reg_btf = reg->btf;
> +		reg_ref_id = reg->btf_id;
> +	} else {
> +		reg_btf = btf_vmlinux;
> +		reg_ref_id = *reg2btf_ids[base_type(reg->type)];
> +	}
> +
> +	if (is_kfunc_trusted_args(meta) || (is_kfunc_release(meta) && reg->ref_obj_id))
> +		strict_type_match = true;
> +
> +	reg_ref_t = btf_type_skip_modifiers(reg_btf, reg_ref_id, &reg_ref_id);
> +	reg_ref_tname = btf_name_by_offset(reg_btf, reg_ref_t->name_off);
> +	if (!btf_struct_ids_match(&env->log, reg_btf, reg_ref_id, reg->off, meta->btf, ref_id, strict_type_match)) {
> +		verbose(env, "kernel function %s args#%d expected pointer to %s %s but R%d has a pointer to %s %s\n",
> +			meta->func_name, argno, btf_type_str(ref_t), ref_tname, argno + 1,
> +			btf_type_str(reg_ref_t), reg_ref_tname);
> +		return -EINVAL;
> +	}
> +	return 0;
> +}
> +
> +static int process_kf_arg_ptr_to_kptr_strong(struct bpf_verifier_env *env,
> +					     struct bpf_reg_state *reg,
> +					     const struct btf_type *ref_t,
> +					     const char *ref_tname,
> +					     struct bpf_kfunc_call_arg_meta *meta,
> +					     int argno)
> +{
> +	struct btf_field *kptr_field;
> +
> +	/* check_func_arg_reg_off allows var_off for
> +	 * PTR_TO_MAP_VALUE, but we need fixed offset to find
> +	 * off_desc.
> +	 */
> +	if (!tnum_is_const(reg->var_off)) {
> +		verbose(env, "arg#0 must have constant offset\n");
> +		return -EINVAL;
> +	}
> +
> +	kptr_field = btf_record_find(reg->map_ptr->record, reg->off + reg->var_off.value, BPF_KPTR);
> +	if (!kptr_field || kptr_field->type != BPF_KPTR_REF) {
> +		verbose(env, "arg#0 no referenced kptr at map value offset=%llu\n",
> +			reg->off + reg->var_off.value);
> +		return -EINVAL;
> +	}
> +
> +	if (!btf_struct_ids_match(&env->log, meta->btf, ref_t->type, 0, kptr_field->kptr.btf,
> +				  kptr_field->kptr.btf_id, true)) {
> +		verbose(env, "kernel function %s args#%d expected pointer to %s %s\n",
> +			meta->func_name, argno, btf_type_str(ref_t), ref_tname);
> +		return -EINVAL;
> +	}
> +	return 0;
> +}
> +
> +static int check_kfunc_args(struct bpf_verifier_env *env, struct bpf_kfunc_call_arg_meta *meta)
> +{
> +	const char *func_name = meta->func_name, *ref_tname;
> +	const struct btf *btf = meta->btf;
> +	const struct btf_param *args;
> +	u32 i, nargs;
> +	int ret;
> +
> +	args = (const struct btf_param *)(meta->func_proto + 1);
> +	nargs = btf_type_vlen(meta->func_proto);
> +	if (nargs > MAX_BPF_FUNC_REG_ARGS) {
> +		verbose(env, "Function %s has %d > %d args\n", func_name, nargs,
> +			MAX_BPF_FUNC_REG_ARGS);
> +		return -EINVAL;
> +	}
> +
> +	/* Check that BTF function arguments match actual types that the
> +	 * verifier sees.
> +	 */
> +	for (i = 0; i < nargs; i++) {
> +		struct bpf_reg_state *regs = cur_regs(env), *reg = &regs[i + 1];
> +		const struct btf_type *t, *ref_t, *resolve_ret;
> +		enum bpf_arg_type arg_type = ARG_DONTCARE;
> +		u32 regno = i + 1, ref_id, type_size;
> +		bool is_ret_buf_sz = false;
> +		int kf_arg_type;
> +
> +		t = btf_type_skip_modifiers(btf, args[i].type, NULL);
> +		if (btf_type_is_scalar(t)) {
> +			if (reg->type != SCALAR_VALUE) {
> +				verbose(env, "R%d is not a scalar\n", regno);
> +				return -EINVAL;
> +			}
> +			if (is_kfunc_arg_ret_buf_size(btf, &args[i], reg, "rdonly_buf_size")) {
> +					meta->r0_rdonly = true;
> +					is_ret_buf_sz = true;
> +			} else if (is_kfunc_arg_ret_buf_size(btf, &args[i], reg, "rdwr_buf_size")) {
> +					is_ret_buf_sz = true;
> +			}

is_kfunc_arg_ret_buf_size() is more generic than its name suggests.
Maybe is_scalar_arg_with_name() ?

Also looks wrong to triple check inside it:
 +	if (!btf_type_is_scalar(t) || reg->type != SCALAR_VALUE)
 +		return false;
when there was a check above.
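
e.g. a minimal sketch of the suggested shape, with the scalar checks
left to the caller (untested):

static bool is_scalar_arg_with_name(const struct btf *btf,
				    const struct btf_param *arg,
				    const char *name)
{
	const char *param_name;

	/* Caller already verified btf_type_is_scalar(t) and
	 * reg->type == SCALAR_VALUE; only the parameter name is matched.
	 */
	param_name = btf_name_by_offset(btf, arg->name_off);
	if (str_is_empty(param_name))
		return false;
	return !strcmp(param_name, name);
}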

> +
> +			if (is_ret_buf_sz) {
> +				if (meta->r0_size) {
> +					verbose(env, "2 or more rdonly/rdwr_buf_size parameters for kfunc");
> +					return -EINVAL;
> +				}
> +
> +				if (!tnum_is_const(reg->var_off)) {
> +					verbose(env, "R%d is not a const\n", regno);
> +					return -EINVAL;
> +				}
> +
> +				meta->r0_size = reg->var_off.value;
> +				ret = mark_chain_precision(env, regno);
> +				if (ret)
> +					return ret;
> +			}
> +			continue;
> +		}
> +
> +		if (!btf_type_is_ptr(t)) {
> +			verbose(env, "Unrecognized arg#%d type %s\n", i, btf_type_str(t));
> +			return -EINVAL;
> +		}
> +
> +		if (reg->ref_obj_id) {
> +			if (is_kfunc_release(meta) && meta->ref_obj_id) {
> +				verbose(env, "verifier internal error: more than one arg with ref_obj_id R%d %u %u\n",
> +					regno, reg->ref_obj_id,
> +					meta->ref_obj_id);
> +				return -EFAULT;
> +			}
> +			meta->ref_obj_id = reg->ref_obj_id;
> +			if (is_kfunc_release(meta))
> +				meta->release_regno = regno;
> +		}
> +
> +		ref_t = btf_type_skip_modifiers(btf, t->type, &ref_id);
> +		ref_tname = btf_name_by_offset(btf, ref_t->name_off);
> +
> +		kf_arg_type = get_kfunc_ptr_arg_type(env, meta, t, ref_t, ref_tname, args, i, nargs);
> +		if (kf_arg_type < 0)
> +			return kf_arg_type;
> +
> +		switch (kf_arg_type) {
> +		case KF_ARG_PTR_TO_BTF_ID:
> +			if (!is_kfunc_trusted_args(meta))
> +				break;
> +			if (!reg->ref_obj_id) {
> +				verbose(env, "R%d must be referenced\n", regno);
> +				return -EINVAL;
> +			}
> +			fallthrough;
> +		case KF_ARG_PTR_TO_CTX:
> +			/* Trusted arguments have the same offset checks as release arguments */
> +			arg_type |= OBJ_RELEASE;
> +			break;
> +		case KF_ARG_PTR_TO_KPTR_STRONG:
> +		case KF_ARG_PTR_TO_DYNPTR:
> +		case KF_ARG_PTR_TO_MEM:
> +		case KF_ARG_PTR_TO_MEM_SIZE:
> +			/* Trusted by default */
> +			break;
> +		default:
> +			WARN_ON_ONCE(1);
> +			return -EFAULT;
> +		}
> +
> +		if (is_kfunc_release(meta) && reg->ref_obj_id)
> +			arg_type |= OBJ_RELEASE;
> +		ret = check_func_arg_reg_off(env, reg, regno, arg_type);
> +		if (ret < 0)
> +			return ret;
> +
> +		switch (kf_arg_type) {
> +		case KF_ARG_PTR_TO_CTX:
> +			if (reg->type != PTR_TO_CTX) {
> +				verbose(env, "arg#%d expected pointer to ctx, but got %s\n", i, btf_type_str(t));
> +				return -EINVAL;
> +			}
> +			break;
> +		case KF_ARG_PTR_TO_KPTR_STRONG:
> +			if (reg->type != PTR_TO_MAP_VALUE) {
> +				verbose(env, "arg#0 expected pointer to map value\n");
> +				return -EINVAL;
> +			}
> +			ret = process_kf_arg_ptr_to_kptr_strong(env, reg, ref_t, ref_tname, meta, i);
> +			if (ret < 0)
> +				return ret;
> +			break;
> +		case KF_ARG_PTR_TO_DYNPTR:
> +			if (reg->type != PTR_TO_STACK) {
> +				verbose(env, "arg#%d expected pointer to stack\n", i);
> +				return -EINVAL;
> +			}
> +
> +			if (!is_dynptr_reg_valid_init(env, reg)) {
> +				verbose(env, "arg#%d pointer type %s %s must be valid and initialized\n",
> +					i, btf_type_str(ref_t), ref_tname);
> +				return -EINVAL;
> +			}
> +
> +			if (!is_dynptr_type_expected(env, reg, ARG_PTR_TO_DYNPTR | DYNPTR_TYPE_LOCAL)) {
> +				verbose(env, "arg#%d pointer type %s %s points to unsupported dynamic pointer type\n",
> +					i, btf_type_str(ref_t), ref_tname);
> +				return -EINVAL;
> +			}
> +			break;
> +		case KF_ARG_PTR_TO_BTF_ID:
> +			/* Only base_type is checked, further checks are done here */
> +			if (reg->type != PTR_TO_BTF_ID &&
> +			    (!reg2btf_ids[base_type(reg->type)] || type_flag(reg->type))) {
> +				verbose(env, "arg#%d expected pointer to btf or socket\n", i);
> +				return -EINVAL;
> +			}
> +			ret = process_kf_arg_ptr_to_btf_id(env, reg, ref_t, ref_tname, ref_id, meta, i);
> +			if (ret < 0)
> +				return ret;
> +			break;
> +		case KF_ARG_PTR_TO_MEM:
> +			resolve_ret = btf_resolve_size(btf, ref_t, &type_size);
> +			if (IS_ERR(resolve_ret)) {
> +				verbose(env, "arg#%d reference type('%s %s') size cannot be determined: %ld\n",
> +					i, btf_type_str(ref_t), ref_tname, PTR_ERR(resolve_ret));
> +				return -EINVAL;
> +			}
> +			ret = check_mem_reg(env, reg, regno, type_size);
> +			if (ret < 0)
> +				return ret;
> +			break;
> +		case KF_ARG_PTR_TO_MEM_SIZE:
> +			ret = check_kfunc_mem_size_reg(env, &regs[regno + 1], regno + 1);
> +			if (ret < 0) {
> +				verbose(env, "arg#%d arg#%d memory, len pair leads to invalid memory access\n", i, i + 1);
> +				return ret;
> +			}
> +			/* Skip next '__sz' argument */
> +			i++;
> +			break;
> +		}
> +	}
> +
> +	if (is_kfunc_release(meta) && !meta->release_regno) {
> +		verbose(env, "release kernel function %s expects refcounted PTR_TO_BTF_ID\n",
> +			func_name);
> +		return -EINVAL;
> +	}
> +
> +	return 0;
> +}
> +
>  static int check_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
>  			    int *insn_idx_p)
>  {
>  	const struct btf_type *t, *func, *func_proto, *ptr_type;
>  	struct bpf_reg_state *regs = cur_regs(env);
> -	struct bpf_kfunc_arg_meta meta = { 0 };
>  	const char *func_name, *ptr_type_name;
> +	struct bpf_kfunc_call_arg_meta meta;
>  	u32 i, nargs, func_id, ptr_type_id;
>  	int err, insn_idx = *insn_idx_p;
>  	const struct btf_param *args;
>  	struct btf *desc_btf;
>  	u32 *kfunc_flags;
> -	bool acq;
>  
>  	/* skip for now, but return error when we find this in fixup_kfunc_call */
>  	if (!insn->imm)
> @@ -7892,24 +8396,34 @@ static int check_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
>  			func_name);
>  		return -EACCES;
>  	}
> -	if (*kfunc_flags & KF_DESTRUCTIVE && !capable(CAP_SYS_BOOT)) {
> -		verbose(env, "destructive kfunc calls require CAP_SYS_BOOT capabilities\n");
> +
> +	/* Prepare kfunc call metadata */
> +	memset(&meta, 0, sizeof(meta));
> +	meta.btf = desc_btf;
> +	meta.func_id = func_id;
> +	meta.kfunc_flags = *kfunc_flags;
> +	meta.func_proto = func_proto;
> +	meta.func_name = func_name;
> +
> +	if (is_kfunc_destructive(&meta) && !capable(CAP_SYS_BOOT)) {
> +		verbose(env, "destructive kfunc calls require CAP_SYS_BOOT capability\n");
>  		return -EACCES;
>  	}
>  
> -	acq = *kfunc_flags & KF_ACQUIRE;
> -
> -	meta.flags = *kfunc_flags;
> +	if (is_kfunc_sleepable(&meta) && !env->prog->aux->sleepable) {
> +		verbose(env, "program must be sleepable to call sleepable kfunc %s\n", func_name);
> +		return -EACCES;
> +	}
>  
>  	/* Check the arguments */
> -	err = btf_check_kfunc_arg_match(env, desc_btf, func_id, regs, &meta);
> +	err = check_kfunc_args(env, &meta);
>  	if (err < 0)
>  		return err;
>  	/* In case of release function, we get register number of refcounted
> -	 * PTR_TO_BTF_ID back from btf_check_kfunc_arg_match, do the release now
> +	 * PTR_TO_BTF_ID in bpf_kfunc_arg_meta, do the release now.
>  	 */
> -	if (err) {
> -		err = release_reference(env, regs[err].ref_obj_id);
> +	if (meta.release_regno) {
> +		err = release_reference(env, regs[meta.release_regno].ref_obj_id);
>  		if (err) {
>  			verbose(env, "kfunc %s#%d reference has not been acquired before\n",
>  				func_name, func_id);
> @@ -7923,7 +8437,7 @@ static int check_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
>  	/* Check return type */
>  	t = btf_type_skip_modifiers(desc_btf, func_proto->type, NULL);
>  
> -	if (acq && !btf_type_is_struct_ptr(desc_btf, t)) {
> +	if (is_kfunc_acquire(&meta) && !btf_type_is_struct_ptr(meta.btf, t)) {
>  		verbose(env, "acquire kernel function does not return PTR_TO_BTF_ID\n");
>  		return -EINVAL;
>  	}
> @@ -7962,20 +8476,23 @@ static int check_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
>  			regs[BPF_REG_0].type = PTR_TO_BTF_ID;
>  			regs[BPF_REG_0].btf_id = ptr_type_id;
>  		}
> -		if (*kfunc_flags & KF_RET_NULL) {
> +		if (is_kfunc_ret_null(&meta)) {
>  			regs[BPF_REG_0].type |= PTR_MAYBE_NULL;
>  			/* For mark_ptr_or_null_reg, see 93c230e3f5bd6 */
>  			regs[BPF_REG_0].id = ++env->id_gen;
>  		}
>  		mark_btf_func_reg_size(env, BPF_REG_0, sizeof(void *));
> -		if (acq) {
> +		if (is_kfunc_acquire(&meta)) {
>  			int id = acquire_reference_state(env, insn_idx);
>  
>  			if (id < 0)
>  				return id;
> -			regs[BPF_REG_0].id = id;
> +			if (is_kfunc_ret_null(&meta))
> +				regs[BPF_REG_0].id = id;
>  			regs[BPF_REG_0].ref_obj_id = id;
>  		}
> +		if (reg_may_point_to_spin_lock(&regs[BPF_REG_0]) && !regs[BPF_REG_0].id)
> +			regs[BPF_REG_0].id = ++env->id_gen;
>  	} /* else { add_kfunc_call() ensures it is btf_type_is_void(t) } */
>  
>  	nargs = btf_type_vlen(func_proto);
> diff --git a/tools/testing/selftests/bpf/prog_tests/kfunc_dynptr_param.c b/tools/testing/selftests/bpf/prog_tests/kfunc_dynptr_param.c
> index c210657d4d0a..55d641c1f126 100644
> --- a/tools/testing/selftests/bpf/prog_tests/kfunc_dynptr_param.c
> +++ b/tools/testing/selftests/bpf/prog_tests/kfunc_dynptr_param.c
> @@ -22,7 +22,7 @@ static struct {
>  	 "arg#0 pointer type STRUCT bpf_dynptr_kern points to unsupported dynamic pointer type", 0},
>  	{"not_valid_dynptr",
>  	 "arg#0 pointer type STRUCT bpf_dynptr_kern must be valid and initialized", 0},
> -	{"not_ptr_to_stack", "arg#0 pointer type STRUCT bpf_dynptr_kern not to stack", 0},
> +	{"not_ptr_to_stack", "arg#0 expected pointer to stack", 0},
>  	{"dynptr_data_null", NULL, -EBADMSG},
>  };
>  
> diff --git a/tools/testing/selftests/bpf/verifier/calls.c b/tools/testing/selftests/bpf/verifier/calls.c
> index e1a937277b54..86d6fef2e3b4 100644
> --- a/tools/testing/selftests/bpf/verifier/calls.c
> +++ b/tools/testing/selftests/bpf/verifier/calls.c
> @@ -109,7 +109,7 @@
>  	},
>  	.prog_type = BPF_PROG_TYPE_SCHED_CLS,
>  	.result = REJECT,
> -	.errstr = "arg#0 pointer type STRUCT prog_test_ref_kfunc must point",
> +	.errstr = "arg#0 expected pointer to btf or socket",
>  	.fixup_kfunc_btf_id = {
>  		{ "bpf_kfunc_call_test_acquire", 3 },
>  		{ "bpf_kfunc_call_test_release", 5 },
> diff --git a/tools/testing/selftests/bpf/verifier/ref_tracking.c b/tools/testing/selftests/bpf/verifier/ref_tracking.c
> index fd683a32a276..55cba01c99d5 100644
> --- a/tools/testing/selftests/bpf/verifier/ref_tracking.c
> +++ b/tools/testing/selftests/bpf/verifier/ref_tracking.c
> @@ -142,7 +142,7 @@
>  	.kfunc = "bpf",
>  	.expected_attach_type = BPF_LSM_MAC,
>  	.flags = BPF_F_SLEEPABLE,
> -	.errstr = "arg#0 pointer type STRUCT bpf_key must point to scalar, or struct with scalar",
> +	.errstr = "arg#0 expected pointer to btf or socket",
>  	.fixup_kfunc_btf_id = {
>  		{ "bpf_lookup_user_key", 2 },
>  		{ "bpf_key_put", 4 },
> @@ -163,7 +163,7 @@
>  	.kfunc = "bpf",
>  	.expected_attach_type = BPF_LSM_MAC,
>  	.flags = BPF_F_SLEEPABLE,
> -	.errstr = "arg#0 pointer type STRUCT bpf_key must point to scalar, or struct with scalar",
> +	.errstr = "arg#0 expected pointer to btf or socket",
>  	.fixup_kfunc_btf_id = {
>  		{ "bpf_lookup_system_key", 1 },
>  		{ "bpf_key_put", 3 },
> -- 
> 2.38.1
> 

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH bpf-next v7 17/26] bpf: Introduce bpf_obj_new
  2022-11-14 19:15 ` [PATCH bpf-next v7 17/26] bpf: Introduce bpf_obj_new Kumar Kartikeya Dwivedi
@ 2022-11-15  6:19   ` Alexei Starovoitov
  2022-11-15 16:53     ` Kumar Kartikeya Dwivedi
  0 siblings, 1 reply; 48+ messages in thread
From: Alexei Starovoitov @ 2022-11-15  6:19 UTC (permalink / raw)
  To: Kumar Kartikeya Dwivedi
  Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Martin KaFai Lau, Dave Marchevsky

On Tue, Nov 15, 2022 at 12:45:38AM +0530, Kumar Kartikeya Dwivedi wrote:
> Introduce type safe memory allocator bpf_obj_new for BPF programs. The
> kernel side kfunc is named bpf_obj_new_impl, as passing hidden arguments
> to kfuncs still requires having them in the prototype, unlike BPF helpers
> which always take 5 arguments and have them checked using bpf_func_proto
> in the verifier, ignoring unset argument types.
> 
> Introduce __ign suffix to ignore a specific kfunc argument during type
> checks, then use this to introduce support for passing type metadata to
> the bpf_obj_new_impl kfunc.
> 
> The user passes BTF ID of the type it wants to allocate in program BTF,
> the verifier then rewrites the first argument as the size of this type,
> after performing some sanity checks (to ensure it exists and it is a
> struct type).
> 
> The second argument is also fixed up and passed by the verifier. This is
> the btf_struct_meta for the type being allocated. It would be needed
> mostly for the offset array which is required for zero initializing
> special fields while leaving the rest of the storage in an uninitialized state.
> 
> It would also be needed in the next patch to perform proper destruction
> of the object's special fields.
> 
> Under the hood, bpf_obj_new will call bpf_mem_alloc and bpf_mem_free,
> using the any context BPF memory allocator introduced recently. To this
> end, a global instance of the BPF memory allocator is initialized on
> boot to be used for this purpose. This 'bpf_global_ma' serves all
> allocations for bpf_obj_new. In the future, bpf_obj_new variants will
> allow specifying a custom allocator.
> 
> Note that now that bpf_obj_new can be used to allocate objects that can
> be linked to BPF linked list (when future linked list helpers are
> available), we need to also free the elements using bpf_mem_free.
> However, since the draining of elements is done outside the
> bpf_spin_lock, we need to do migrate_disable around the call since
> bpf_list_head_free can be called from map free path where migration is
> enabled. Otherwise, when called from BPF programs migration is already
> disabled.
> 
> A convenience macro is included in the bpf_experimental.h header to hide
> over the ugly details of the implementation, leading to user code
> looking similar to a language level extension which allocates and
> constructs fields of a user type.
> 
> struct bar {
> 	struct bpf_list_node node;
> };
> 
> struct foo {
> 	struct bpf_spin_lock lock;
> 	struct bpf_list_head head __contains(bar, node);
> };
> 
> void prog(void) {
> 	struct foo *f;
> 
> 	f = bpf_obj_new(typeof(*f));
> 	if (!f)
> 		return;
> 	...
> }
> 
> A key piece of this story is still missing, i.e. the free function,
> which will come in the next patch.
> 
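For context, the convenience macro in bpf_experimental.h presumably
reduces to something like the sketch below; the verifier rewrites the
__k argument into the type's size and supplies the __ign metadata
pointer itself:

extern void *bpf_obj_new_impl(__u64 local_type_id__k, void *meta__ign) __ksym;

#define bpf_obj_new(type) \
	((type *)bpf_obj_new_impl(bpf_core_type_id_local(type), NULL))
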
> Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
> ---
>  include/linux/bpf.h                           |  21 ++--
>  include/linux/bpf_verifier.h                  |   2 +
>  kernel/bpf/core.c                             |  16 +++
>  kernel/bpf/helpers.c                          |  47 ++++++--
>  kernel/bpf/verifier.c                         | 107 ++++++++++++++++--
>  .../testing/selftests/bpf/bpf_experimental.h  |  25 ++++
>  6 files changed, 195 insertions(+), 23 deletions(-)
>  create mode 100644 tools/testing/selftests/bpf/bpf_experimental.h
> 
> diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> index 62a16b699e71..4635e31bd6fc 100644
> --- a/include/linux/bpf.h
> +++ b/include/linux/bpf.h
> @@ -54,6 +54,8 @@ struct cgroup;
>  extern struct idr btf_idr;
>  extern spinlock_t btf_idr_lock;
>  extern struct kobject *btf_kobj;
> +extern struct bpf_mem_alloc bpf_global_ma;
> +extern bool bpf_global_ma_set;
>  
>  typedef u64 (*bpf_callback_t)(u64, u64, u64, u64, u64);
>  typedef int (*bpf_iter_init_seq_priv_t)(void *private_data,
> @@ -333,16 +335,19 @@ static inline bool btf_record_has_field(const struct btf_record *rec, enum btf_f
>  	return rec->field_mask & type;
>  }
>  
> -static inline void check_and_init_map_value(struct bpf_map *map, void *dst)
> +static inline void bpf_obj_init(const struct btf_field_offs *foffs, void *obj)
>  {
> -	if (!IS_ERR_OR_NULL(map->record)) {
> -		struct btf_field *fields = map->record->fields;
> -		u32 cnt = map->record->cnt;
> -		int i;
> +	int i;
>  
> -		for (i = 0; i < cnt; i++)
> -			memset(dst + fields[i].offset, 0, btf_field_type_size(fields[i].type));
> -	}
> +	if (!foffs)
> +		return;
> +	for (i = 0; i < foffs->cnt; i++)
> +		memset(obj + foffs->field_off[i], 0, foffs->field_sz[i]);
> +}
> +
> +static inline void check_and_init_map_value(struct bpf_map *map, void *dst)
> +{
> +	bpf_obj_init(map->field_offs, dst);
>  }
>  
>  /* memcpy that is used with 8-byte aligned pointers, power-of-8 size and
> diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h
> index 887fa4d922f6..306fc1d6cc4a 100644
> --- a/include/linux/bpf_verifier.h
> +++ b/include/linux/bpf_verifier.h
> @@ -427,6 +427,8 @@ struct bpf_insn_aux_data {
>  		 */
>  		struct bpf_loop_inline_state loop_inline_state;
>  	};
> +	u64 obj_new_size; /* remember the size of type passed to bpf_obj_new to rewrite R1 */
> +	struct btf_struct_meta *kptr_struct_meta;
>  	u64 map_key_state; /* constant (32 bit) key tracking for maps */
>  	int ctx_field_size; /* the ctx field size for load insn, maybe 0 */
>  	u32 seen; /* this insn was processed by the verifier at env->pass_cnt */
> diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
> index 9c16338bcbe8..2e57fc839a5c 100644
> --- a/kernel/bpf/core.c
> +++ b/kernel/bpf/core.c
> @@ -34,6 +34,7 @@
>  #include <linux/log2.h>
>  #include <linux/bpf_verifier.h>
>  #include <linux/nodemask.h>
> +#include <linux/bpf_mem_alloc.h>
>  
>  #include <asm/barrier.h>
>  #include <asm/unaligned.h>
> @@ -60,6 +61,9 @@
>  #define CTX	regs[BPF_REG_CTX]
>  #define IMM	insn->imm
>  
> +struct bpf_mem_alloc bpf_global_ma;
> +bool bpf_global_ma_set;
> +
>  /* No hurry in this branch
>   *
>   * Exported for the bpf jit load helper.
> @@ -2746,6 +2750,18 @@ int __weak bpf_arch_text_invalidate(void *dst, size_t len)
>  	return -ENOTSUPP;
>  }
>  
> +#ifdef CONFIG_BPF_SYSCALL
> +static int __init bpf_global_ma_init(void)
> +{
> +	int ret;
> +
> +	ret = bpf_mem_alloc_init(&bpf_global_ma, 0, false);
> +	bpf_global_ma_set = !ret;
> +	return ret;
> +}
> +late_initcall(bpf_global_ma_init);
> +#endif
> +
>  DEFINE_STATIC_KEY_FALSE(bpf_stats_enabled_key);
>  EXPORT_SYMBOL(bpf_stats_enabled_key);
>  
> diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
> index 5bc0b9f0f306..c4f1c22cc44c 100644
> --- a/kernel/bpf/helpers.c
> +++ b/kernel/bpf/helpers.c
> @@ -19,6 +19,7 @@
>  #include <linux/proc_ns.h>
>  #include <linux/security.h>
>  #include <linux/btf_ids.h>
> +#include <linux/bpf_mem_alloc.h>
>  
>  #include "../../lib/kstrtox.h"
>  
> @@ -1735,25 +1736,57 @@ void bpf_list_head_free(const struct btf_field *field, void *list_head,
>  
>  		obj -= field->list_head.node_offset;
>  		head = head->next;
> -		/* TODO: Rework later */
> -		kfree(obj);
> +		/* The contained type can also have resources, including a
> +		 * bpf_list_head which needs to be freed.
> +		 */
> +		bpf_obj_free_fields(field->list_head.value_rec, obj);
> +		/* bpf_mem_free requires migrate_disable(), since we can be
> +		 * called from map free path as well apart from BPF program (as
> +		 * part of map ops doing bpf_obj_free_fields).
> +		 */
> +		migrate_disable();
> +		bpf_mem_free(&bpf_global_ma, obj);
> +		migrate_enable();
>  	}
>  }
>  
> -BTF_SET8_START(tracing_btf_ids)
> +__diag_push();
> +__diag_ignore_all("-Wmissing-prototypes",
> +		  "Global functions as their definitions will be in vmlinux BTF");
> +
> +void *bpf_obj_new_impl(u64 local_type_id__k, void *meta__ign)
> +{
> +	struct btf_struct_meta *meta = meta__ign;
> +	u64 size = local_type_id__k;
> +	void *p;
> +
> +	if (unlikely(!bpf_global_ma_set))
> +		return NULL;
> +	p = bpf_mem_alloc(&bpf_global_ma, size);
> +	if (!p)
> +		return NULL;
> +	if (meta)
> +		bpf_obj_init(meta->field_offs, p);
> +	return p;
> +}
> +
> +__diag_pop();
> +
> +BTF_SET8_START(generic_btf_ids)
>  #ifdef CONFIG_KEXEC_CORE
>  BTF_ID_FLAGS(func, crash_kexec, KF_DESTRUCTIVE)
>  #endif
> -BTF_SET8_END(tracing_btf_ids)
> +BTF_ID_FLAGS(func, bpf_obj_new_impl, KF_ACQUIRE | KF_RET_NULL)
> +BTF_SET8_END(generic_btf_ids)
>  
> -static const struct btf_kfunc_id_set tracing_kfunc_set = {
> +static const struct btf_kfunc_id_set generic_kfunc_set = {
>  	.owner = THIS_MODULE,
> -	.set   = &tracing_btf_ids,
> +	.set   = &generic_btf_ids,
>  };
>  
>  static int __init kfunc_init(void)
>  {
> -	return register_btf_kfunc_id_set(BPF_PROG_TYPE_TRACING, &tracing_kfunc_set);
> +	return register_btf_kfunc_id_set(BPF_PROG_TYPE_TRACING, &generic_kfunc_set);
>  }
>  
>  late_initcall(kfunc_init);
> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> index a4a1424b19a5..c7f5d83783db 100644
> --- a/kernel/bpf/verifier.c
> +++ b/kernel/bpf/verifier.c
> @@ -7948,6 +7948,11 @@ static bool is_kfunc_arg_sfx_constant(const struct btf *btf, const struct btf_pa
>  	return __kfunc_param_match_suffix(btf, arg, "__k");
>  }
>  
> +static bool is_kfunc_arg_sfx_ignore(const struct btf *btf, const struct btf_param *arg)
> +{
> +	return __kfunc_param_match_suffix(btf, arg, "__ign");
> +}
> +
>  static bool is_kfunc_arg_ret_buf_size(const struct btf *btf,
>  				      const struct btf_param *arg,
>  				      const struct bpf_reg_state *reg,
> @@ -8216,6 +8221,10 @@ static int check_kfunc_args(struct bpf_verifier_env *env, struct bpf_kfunc_call_
>  		int kf_arg_type;
>  
>  		t = btf_type_skip_modifiers(btf, args[i].type, NULL);
> +
> +		if (is_kfunc_arg_sfx_ignore(btf, &args[i]))
> +			continue;
> +
>  		if (btf_type_is_scalar(t)) {
>  			if (reg->type != SCALAR_VALUE) {
>  				verbose(env, "R%d is not a scalar\n", regno);
> @@ -8395,6 +8404,17 @@ static int check_kfunc_args(struct bpf_verifier_env *env, struct bpf_kfunc_call_
>  	return 0;
>  }
>  
> +enum special_kfunc_type {
> +	KF_bpf_obj_new_impl,
> +};
> +
> +BTF_SET_START(special_kfunc_set)
> +BTF_ID(func, bpf_obj_new_impl)
> +BTF_SET_END(special_kfunc_set)
> +
> +BTF_ID_LIST(special_kfunc_list)
> +BTF_ID(func, bpf_obj_new_impl)
> +
>  static int check_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
>  			    int *insn_idx_p)
>  {
> @@ -8469,17 +8489,64 @@ static int check_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
>  	t = btf_type_skip_modifiers(desc_btf, func_proto->type, NULL);
>  
>  	if (is_kfunc_acquire(&meta) && !btf_type_is_struct_ptr(meta.btf, t)) {
> -		verbose(env, "acquire kernel function does not return PTR_TO_BTF_ID\n");
> -		return -EINVAL;
> +		/* Only exception is bpf_obj_new_impl */
> +		if (meta.btf != btf_vmlinux || meta.func_id != special_kfunc_list[KF_bpf_obj_new_impl]) {
> +			verbose(env, "acquire kernel function does not return PTR_TO_BTF_ID\n");
> +			return -EINVAL;
> +		}
>  	}
>  
>  	if (btf_type_is_scalar(t)) {
>  		mark_reg_unknown(env, regs, BPF_REG_0);
>  		mark_btf_func_reg_size(env, BPF_REG_0, t->size);
>  	} else if (btf_type_is_ptr(t)) {
> -		ptr_type = btf_type_skip_modifiers(desc_btf, t->type,
> -						   &ptr_type_id);
> -		if (!btf_type_is_struct(ptr_type)) {
> +		ptr_type = btf_type_skip_modifiers(desc_btf, t->type, &ptr_type_id);
> +
> +		if (meta.btf == btf_vmlinux && btf_id_set_contains(&special_kfunc_set, meta.func_id)) {
> +			if (!btf_type_is_void(ptr_type)) {
> +				verbose(env, "kernel function %s must have void * return type\n",
> +					meta.func_name);
> +				return -EINVAL;
> +			}

Here you're checking that void *bpf_obj_new_impl and obj_drop actually were
declared with 'void *' return ?
and wasting run-time cycles to check that??
Please don't.

Even if we miss that during code review there is no safety issue.

> +			if (meta.func_id == special_kfunc_list[KF_bpf_obj_new_impl]) {
> +				const struct btf_type *ret_t;
> +				struct btf *ret_btf;
> +				u32 ret_btf_id;
> +
> +				if (((u64)(u32)meta.arg_constant.value) != meta.arg_constant.value) {
> +					verbose(env, "local type ID argument must be in range [0, U32_MAX]\n");
> +					return -EINVAL;
> +				}
> +
> +				ret_btf = env->prog->aux->btf;
> +				ret_btf_id = meta.arg_constant.value;
> +
> +				/* This may be NULL due to user not supplying a BTF */
> +				if (!ret_btf) {
> +					verbose(env, "bpf_obj_new requires prog BTF\n");
> +					return -EINVAL;
> +				}
> +
> +				ret_t = btf_type_by_id(ret_btf, ret_btf_id);
> +				if (!ret_t || !__btf_type_is_struct(ret_t)) {
> +					verbose(env, "bpf_obj_new type ID argument must be of a struct\n");
> +					return -EINVAL;
> +				}
> +
> +				mark_reg_known_zero(env, regs, BPF_REG_0);
> +				regs[BPF_REG_0].type = PTR_TO_BTF_ID | MEM_ALLOC;
> +				regs[BPF_REG_0].btf = ret_btf;
> +				regs[BPF_REG_0].btf_id = ret_btf_id;
> +
> +				env->insn_aux_data[insn_idx].obj_new_size = ret_t->size;
> +				env->insn_aux_data[insn_idx].kptr_struct_meta =
> +					btf_find_struct_meta(ret_btf, ret_btf_id);
> +			} else {
> +				verbose(env, "kernel function %s unhandled dynamic return type\n",
> +					meta.func_name);
> +				return -EFAULT;
> +			}
> +		} else if (!__btf_type_is_struct(ptr_type)) {
>  			if (!meta.r0_size) {
>  				ptr_type_name = btf_name_by_offset(desc_btf,
>  								   ptr_type->name_off);
> @@ -8507,6 +8574,7 @@ static int check_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
>  			regs[BPF_REG_0].type = PTR_TO_BTF_ID;
>  			regs[BPF_REG_0].btf_id = ptr_type_id;
>  		}
> +
>  		if (is_kfunc_ret_null(&meta)) {
>  			regs[BPF_REG_0].type |= PTR_MAYBE_NULL;
>  			/* For mark_ptr_or_null_reg, see 93c230e3f5bd6 */
> @@ -14644,8 +14712,8 @@ static int fixup_call_args(struct bpf_verifier_env *env)
>  	return err;
>  }
>  
> -static int fixup_kfunc_call(struct bpf_verifier_env *env,
> -			    struct bpf_insn *insn)
> +static int fixup_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
> +			    struct bpf_insn *insn_buf, int insn_idx, int *cnt)
>  {
>  	const struct bpf_kfunc_desc *desc;
>  
> @@ -14664,8 +14732,21 @@ static int fixup_kfunc_call(struct bpf_verifier_env *env,
>  		return -EFAULT;
>  	}
>  
> +	*cnt = 0;
>  	insn->imm = desc->imm;
> +	if (insn->off)
> +		return 0;
> +	if (desc->func_id == special_kfunc_list[KF_bpf_obj_new_impl]) {
> +		struct btf_struct_meta *kptr_struct_meta = env->insn_aux_data[insn_idx].kptr_struct_meta;
> +		struct bpf_insn addr[2] = { BPF_LD_IMM64(BPF_REG_2, (long)kptr_struct_meta) };
> +		u64 obj_new_size = env->insn_aux_data[insn_idx].obj_new_size;
>  
> +		insn_buf[0] = BPF_MOV64_IMM(BPF_REG_1, obj_new_size);
> +		insn_buf[1] = addr[0];
> +		insn_buf[2] = addr[1];
> +		insn_buf[3] = *insn;
> +		*cnt = 4;
> +	}
>  	return 0;
>  }
>  
> @@ -14807,9 +14888,19 @@ static int do_misc_fixups(struct bpf_verifier_env *env)
>  		if (insn->src_reg == BPF_PSEUDO_CALL)
>  			continue;
>  		if (insn->src_reg == BPF_PSEUDO_KFUNC_CALL) {
> -			ret = fixup_kfunc_call(env, insn);
> +			ret = fixup_kfunc_call(env, insn, insn_buf, i + delta, &cnt);
>  			if (ret)
>  				return ret;
> +			if (cnt == 0)
> +				continue;
> +
> +			new_prog = bpf_patch_insn_data(env, i + delta, insn_buf, cnt);
> +			if (!new_prog)
> +				return -ENOMEM;
> +
> +			delta	 += cnt - 1;
> +			env->prog = prog = new_prog;
> +			insn	  = new_prog->insnsi + i + delta;
>  			continue;
>  		}
>  
> diff --git a/tools/testing/selftests/bpf/bpf_experimental.h b/tools/testing/selftests/bpf/bpf_experimental.h
> new file mode 100644
> index 000000000000..aeb6a7fcb7c4
> --- /dev/null
> +++ b/tools/testing/selftests/bpf/bpf_experimental.h
> @@ -0,0 +1,25 @@
> +#ifndef __BPF_EXPERIMENTAL__
> +#define __BPF_EXPERIMENTAL__
> +
> +#include <vmlinux.h>
> +#include <bpf/bpf_tracing.h>
> +#include <bpf/bpf_helpers.h>
> +#include <bpf/bpf_core_read.h>
> +
> +/* Description
> + *	Allocates an object of the type represented by 'local_type_id' in
> + *	program BTF. User may use the bpf_core_type_id_local macro to pass the
> + *	type ID of a struct in program BTF.
> + *
> + *	The 'local_type_id' parameter must be a known constant.
> + *	The 'meta' parameter is a hidden argument that is ignored.
> + * Returns
> + *	A pointer to an object of the type corresponding to the passed in
> + *	'local_type_id', or NULL on failure.
> + */
> +extern void *bpf_obj_new_impl(__u64 local_type_id, void *meta) __ksym;
> +
> +/* Convenience macro to wrap over bpf_obj_new_impl */
> +#define bpf_obj_new(type) ((type *)bpf_obj_new_impl(bpf_core_type_id_local(type), NULL))
> +
> +#endif
> -- 
> 2.38.1
> 

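For readers following the series, a minimal sketch of the intended usage,
putting the header above together (illustrative only: the program and section
names are made up, the kfunc set is registered for tracing programs at this
point in the series, and without the free kfunc from the next patch the
verifier would reject the leaked reference):

#include "bpf_experimental.h"

char _license[] SEC("license") = "GPL";

struct foo {
	int data;
};

SEC("fentry/bpf_fentry_test1")
int BPF_PROG(obj_new_example, int x)
{
	struct foo *f;

	/* Rewritten by the verifier into a call to bpf_obj_new_impl() with
	 * R1 = sizeof(struct foo) and R2 = the btf_struct_meta pointer.
	 */
	f = bpf_obj_new(typeof(*f));
	if (!f)
		return 0;
	f->data = 42;
	/* A real program must release 'f' (bpf_obj_drop() arrives in the
	 * next patch); leaking an acquired reference fails verification.
	 */
	return 0;
}
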
^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH bpf-next v7 19/26] bpf: Permit NULL checking pointer with non-zero fixed offset
  2022-11-14 19:15 ` [PATCH bpf-next v7 19/26] bpf: Permit NULL checking pointer with non-zero fixed offset Kumar Kartikeya Dwivedi
@ 2022-11-15  6:23   ` Alexei Starovoitov
  2022-11-15 16:57     ` Kumar Kartikeya Dwivedi
  0 siblings, 1 reply; 48+ messages in thread
From: Alexei Starovoitov @ 2022-11-15  6:23 UTC (permalink / raw)
  To: Kumar Kartikeya Dwivedi
  Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Martin KaFai Lau, Dave Marchevsky

On Tue, Nov 15, 2022 at 12:45:40AM +0530, Kumar Kartikeya Dwivedi wrote:
>  	if (type_may_be_null(reg->type) && reg->id == id &&
>  	    !WARN_ON_ONCE(!reg->id)) {
> -		if (WARN_ON_ONCE(reg->smin_value || reg->smax_value ||
> -				 !tnum_equals_const(reg->var_off, 0) ||
> -				 reg->off)) {
> +		if (reg->smin_value || reg->smax_value || !tnum_equals_const(reg->var_off, 0) || reg->off) {
....
> +			if (WARN_ON_ONCE(reg->smin_value || reg->smax_value || !tnum_equals_const(reg->var_off, 0)))

That is too much copy-paste between two lines.
Please combine the checks.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH bpf-next v7 20/26] bpf: Introduce single ownership BPF linked list API
  2022-11-14 19:15 ` [PATCH bpf-next v7 20/26] bpf: Introduce single ownership BPF linked list API Kumar Kartikeya Dwivedi
@ 2022-11-15  6:26   ` Alexei Starovoitov
  2022-11-15 16:59     ` Kumar Kartikeya Dwivedi
  0 siblings, 1 reply; 48+ messages in thread
From: Alexei Starovoitov @ 2022-11-15  6:26 UTC (permalink / raw)
  To: Kumar Kartikeya Dwivedi
  Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Martin KaFai Lau, Dave Marchevsky

On Tue, Nov 15, 2022 at 12:45:41AM +0530, Kumar Kartikeya Dwivedi wrote:
> +}
> +
> +static int process_kf_arg_ptr_to_list_node(struct bpf_verifier_env *env,
> +					   struct bpf_reg_state *reg, u32 regno,
> +					   struct bpf_kfunc_call_arg_meta *meta)
> +{
> +	struct btf_struct_meta *struct_meta;
> +	struct btf_field *field;
> +	struct btf_record *rec;
> +	u32 list_node_off;
> +
> +	if (meta->btf != btf_vmlinux ||
> +	    (meta->func_id != special_kfunc_list[KF_bpf_list_push_front] &&
> +	     meta->func_id != special_kfunc_list[KF_bpf_list_push_back])) {
> +		verbose(env, "verifier internal error: bpf_list_head argument for unknown kfunc\n");

typo. bpf_list_node ?

> +		return -EFAULT;
> +	}
> +
> +	if (!tnum_is_const(reg->var_off)) {
> +		verbose(env,
> +			"R%d doesn't have constant offset. bpf_list_head has to be at the constant offset\n",

same typo?

> +			regno);
> +		return -EINVAL;
> +	}
> +
> +	struct_meta = btf_find_struct_meta(reg->btf, reg->btf_id);
> +	if (!struct_meta) {
> +		verbose(env, "bpf_list_node not found for allocated object\n");
> +		return -EINVAL;
> +	}
> +	rec = struct_meta->record;
> +
> +	list_node_off = reg->off + reg->var_off.value;
> +	field = btf_record_find(rec, list_node_off, BPF_LIST_NODE);
> +	if (!field || field->offset != list_node_off) {
> +		verbose(env, "bpf_list_node not found at offset=%u\n", list_node_off);
> +		return -EINVAL;
> +	}
> +
> +	field = meta->arg_list_head.field;
> +
> +	if (!btf_struct_ids_match(&env->log, reg->btf, reg->btf_id, 0, field->list_head.btf,
> +				  field->list_head.value_btf_id, true)) {
> +		verbose(env, "bpf_list_head value type does not match arg#1\n");

and the same typo again?!

> +		return -EINVAL;
> +	}
> +
> +	if (list_node_off != field->list_head.node_offset) {
> +		verbose(env, "arg#1 offset must be for bpf_list_node at off=%d\n",
> +			field->list_head.node_offset);
> +		return -EINVAL;
> +	}
> +	return 0;
> +}

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH bpf-next v7 22/26] selftests/bpf: Add __contains macro to bpf_experimental.h
  2022-11-14 19:15 ` [PATCH bpf-next v7 22/26] selftests/bpf: Add __contains macro to bpf_experimental.h Kumar Kartikeya Dwivedi
@ 2022-11-15 16:40   ` Dave Marchevsky
  0 siblings, 0 replies; 48+ messages in thread
From: Dave Marchevsky @ 2022-11-15 16:40 UTC (permalink / raw)
  To: Kumar Kartikeya Dwivedi, bpf
  Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann, Martin KaFai Lau

On 11/14/22 2:15 PM, Kumar Kartikeya Dwivedi wrote:
> Add a user-facing __contains macro which provides a convenient wrapper
> over the verbose kernel-specific BTF declaration tag required to
> annotate BPF list head structs in user types.
> 
> Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
> ---

Acked-by: Dave Marchevsky <davemarchevsky@fb.com>
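
For context, the wrapper reduces to a single BTF declaration tag. A sketch of
its shape, assuming the "contains:<type>:<member>" tag format that the kernel
side of this series parses:

/* Annotates a bpf_list_head with the type and member name of the nodes
 * it contains.
 */
#define __contains(name, node) __attribute__((btf_decl_tag("contains:" #name ":" #node)))

struct bar {
	struct bpf_list_node node;
};

struct foo {
	struct bpf_spin_lock lock;
	struct bpf_list_head head __contains(bar, node);
};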

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH bpf-next v7 08/26] bpf: Introduce allocated objects support
  2022-11-15  5:48   ` Alexei Starovoitov
@ 2022-11-15 16:47     ` Kumar Kartikeya Dwivedi
  0 siblings, 0 replies; 48+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2022-11-15 16:47 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Martin KaFai Lau, Dave Marchevsky

On Tue, Nov 15, 2022 at 11:18:18AM IST, Alexei Starovoitov wrote:
> On Tue, Nov 15, 2022 at 12:45:29AM +0530, Kumar Kartikeya Dwivedi wrote:
> > Introduce support for representing pointers to objects allocated by the
> > BPF program, i.e. PTR_TO_BTF_ID that point to a type in program BTF.
> > This is indicated by the presence of MEM_ALLOC type flag in reg->type to
> > avoid having to check btf_is_kernel when trying to match argument types
> > in helpers.
> >
> > Whenever walking such types, any pointers being walked will always yield
> > a SCALAR instead of pointer. In the future we might permit kptr inside
> > such allocated objects (either kernel or local), and it will then form a
>
> (either kernel or program allocated) ?
>
> > PTR_TO_BTF_ID of the respective type.
> >
> > For now, such allocated objects will always be referenced in verifier
> > context, hence ref_obj_id == 0 for them is a bug. It is allowed to write
> > to such objects, as long fields that are special are not touched
> > (support for which will be added in subsequent patches). Note that once
> > such a pointer is marked PTR_UNTRUSTED, it is no longer allowed to write
> > to it.
> >
> > No PROBE_MEM handling is therefore done for loads into this type unless
> > PTR_UNTRUSTED is part of the register type, since they can never be in
> > an undefined state, and their lifetime will always be valid.
> >
> > Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
> > ---
> >  include/linux/bpf.h   | 11 +++++++++++
> >  kernel/bpf/btf.c      |  5 +++++
> >  kernel/bpf/verifier.c | 25 +++++++++++++++++++++++--
> >  3 files changed, 39 insertions(+), 2 deletions(-)
> >
> > diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> > index 49f9d2bec401..3cab113b149e 100644
> > --- a/include/linux/bpf.h
> > +++ b/include/linux/bpf.h
> > @@ -524,6 +524,11 @@ enum bpf_type_flag {
> >  	/* Size is known at compile time. */
> >  	MEM_FIXED_SIZE		= BIT(10 + BPF_BASE_TYPE_BITS),
> >
> > +	/* MEM is of an allocated object of type from program BTF. This is
> > +	 * used to tag PTR_TO_BTF_ID allocated using bpf_obj_new.
> > +	 */
> > +	MEM_ALLOC		= BIT(11 + BPF_BASE_TYPE_BITS),
> > +
> >  	__BPF_TYPE_FLAG_MAX,
> >  	__BPF_TYPE_LAST_FLAG	= __BPF_TYPE_FLAG_MAX - 1,
> >  };
> > @@ -2791,4 +2796,10 @@ struct bpf_key {
> >  	bool has_ref;
> >  };
> >  #endif /* CONFIG_KEYS */
> > +
> > +static inline bool type_is_alloc(u32 type)
> > +{
> > +	return type & MEM_ALLOC;
> > +}
> > +
> >  #endif /* _LINUX_BPF_H */
> > diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c
> > index 875355ff3718..9a596f430558 100644
> > --- a/kernel/bpf/btf.c
> > +++ b/kernel/bpf/btf.c
> > @@ -6034,6 +6034,11 @@ int btf_struct_access(struct bpf_verifier_log *log,
> >
> >  		switch (err) {
> >  		case WALK_PTR:
> > +			/* For local types, the destination register cannot
> > +			 * become a pointer again.
> > +			 */
> > +			if (type_is_alloc(reg->type))
> > +				return SCALAR_VALUE;
> >  			/* If we found the pointer or scalar on t+off,
> >  			 * we're done.
> >  			 */
> > diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> > index 5e74f460dfd0..d726d55622c9 100644
> > --- a/kernel/bpf/verifier.c
> > +++ b/kernel/bpf/verifier.c
> > @@ -4687,14 +4687,27 @@ static int check_ptr_to_btf_access(struct bpf_verifier_env *env,
> >  		return -EACCES;
> >  	}
> >
> > -	if (env->ops->btf_struct_access) {
> > +	if (env->ops->btf_struct_access && !type_is_alloc(reg->type)) {
> > +		if (!btf_is_kernel(reg->btf)) {
> > +			verbose(env, "verifier internal error: reg->btf must be kernel btf\n");
> > +			return -EFAULT;
> > +		}
> >  		ret = env->ops->btf_struct_access(&env->log, reg, off, size, atype, &btf_id, &flag);
> >  	} else {
> > -		if (atype != BPF_READ) {
> > +		/* Writes are permitted with default btf_struct_access for local
> > +		 * kptrs (which always have ref_obj_id > 0), but not for
>
> for program allocated objects (which always have ref_obj_id > 0) ?
>

Ack for both.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH bpf-next v7 09/26] bpf: Recognize lock and list fields in allocated objects
  2022-11-15  5:50   ` Alexei Starovoitov
@ 2022-11-15 16:48     ` Kumar Kartikeya Dwivedi
  0 siblings, 0 replies; 48+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2022-11-15 16:48 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Martin KaFai Lau, Dave Marchevsky

On Tue, Nov 15, 2022 at 11:20:34AM IST, Alexei Starovoitov wrote:
> On Tue, Nov 15, 2022 at 12:45:30AM +0530, Kumar Kartikeya Dwivedi wrote:
> > +	parse:
> > +		if (!tab) {
> > +			tab = kzalloc(offsetof(struct btf_struct_metas, types[1]),
> > +				      GFP_KERNEL | __GFP_NOWARN);
> > +			if (!tab)
> > +				return ERR_PTR(-ENOMEM);
> > +		} else {
> > +			struct btf_struct_metas *new_tab;
> > +
> > +			new_tab = krealloc(tab, offsetof(struct btf_struct_metas, types[tab->cnt + 1]),
> > +					   GFP_KERNEL | __GFP_NOWARN);
> > +			if (!new_tab) {
> > +				ret = -ENOMEM;
> > +				goto free;
> > +			}
> > +			tab = new_tab;
> > +		}
>
> If @p is %NULL, krealloc() behaves exactly like kmalloc().
>
> That can help to simplify above a bit?
>

Great suggestion, I made the change.
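
Roughly, the simplified version would look like this (a sketch against the
hunk above; note krealloc() does not zero the newly grown area, so the count
has to be carried explicitly for the first allocation):

		struct btf_struct_metas *new_tab;
		u32 cnt = tab ? tab->cnt : 0;

		/* krealloc(NULL, ...) behaves like kmalloc(), so the kzalloc
		 * branch collapses into the same call.
		 */
		new_tab = krealloc(tab, offsetof(struct btf_struct_metas, types[cnt + 1]),
				   GFP_KERNEL | __GFP_NOWARN);
		if (!new_tab) {
			ret = -ENOMEM;
			goto free;
		}
		/* Initializes cnt on the first allocation; a no-op afterwards. */
		new_tab->cnt = cnt;
		tab = new_tab;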

> > +		type = &tab->types[tab->cnt];
> > +
> > +		type->btf_id = i;
> > +		record = btf_parse_fields(btf, t, BPF_SPIN_LOCK | BPF_LIST_HEAD | BPF_LIST_NODE, t->size);
> > +		if (IS_ERR_OR_NULL(record)) {
> > +			ret = PTR_ERR_OR_ZERO(record) ?: -EFAULT;
> > +			goto free;
> > +		}
> > +		foffs = btf_parse_field_offs(record);
> > +		if (WARN_ON_ONCE(IS_ERR_OR_NULL(foffs))) {
>
> WARN_ON_ONCE ?
> Pls add a comment.
>

I'll drop this, it doesn't have to be a WARN_ON_ONCE. But we must still fail
when field_offs can't be constructed for a valid btf_record. btf_record
enables discovery of the special fields, but without field_offs we would
never skip over them or rewrite them to 0 (and zero init is critical for
everything else which assumes it).

I'll add a comment that both should be set or we should simply bail.
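
Something along these lines (a sketch; error-path cleanup of 'record' elided):

		foffs = btf_parse_field_offs(record);
		/* btf_record and btf_field_offs come in pairs: the record lets
		 * us find the special fields, but only field_offs lets us skip
		 * over them or zero-initialize them, so bail if it can't be
		 * constructed for a valid record.
		 */
		if (IS_ERR_OR_NULL(foffs)) {
			ret = -EFAULT;
			goto free;
		}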

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH bpf-next v7 14/26] bpf: Rewrite kfunc argument handling
  2022-11-15  6:10   ` Alexei Starovoitov
@ 2022-11-15 16:51     ` Kumar Kartikeya Dwivedi
  0 siblings, 0 replies; 48+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2022-11-15 16:51 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Martin KaFai Lau, Dave Marchevsky

On Tue, Nov 15, 2022 at 11:40:50AM IST, Alexei Starovoitov wrote:
> On Tue, Nov 15, 2022 at 12:45:35AM +0530, Kumar Kartikeya Dwivedi wrote:
> > As we continue to add more features, argument types, kfunc flags, and
> > different extensions to kfuncs, the code to verify the correctness of
> > the kfunc prototype wrt the passed in registers has become ad-hoc and
> > ugly to read. To make life easier, and make a very clear split between
> > different stages of argument processing, move all the code into
> > verifier.c and refactor into easier to read helpers and functions.
> >
> > This also makes sharing code within the verifier easier with kfunc
> > argument processing. This will be more and more useful in later patches
> > as we are now moving to implement very core BPF helpers as kfuncs, to
> > keep them experimental before baking into UAPI.
> >
> > Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
> > ---
> > [...]
> > +/* Returns true if struct is composed of scalars, 4 levels of nesting allowed */
> > +static bool __btf_type_is_scalar_struct(struct bpf_verifier_env *env,
> > +					const struct btf *btf,
> > +					const struct btf_type *t, int rec)
> > +{
> > +	const struct btf_type *member_type;
> > +	const struct btf_member *member;
> > +	u32 i;
> > +
> > +	if (!btf_type_is_struct(t))
> > +		return false;
> > +
> > +	for_each_member(i, t, member) {
> > +		const struct btf_array *array;
> > +
> > +		member_type = btf_type_skip_modifiers(btf, member->type, NULL);
> > +		if (btf_type_is_struct(member_type)) {
> > +			if (rec >= 3) {
> > +				verbose(env, "max struct nesting depth exceeded\n");
> > +				return false;
> > +			}
> > +			if (!__btf_type_is_scalar_struct(env, btf, member_type, rec + 1))
> > +				return false;
> > +			continue;
> > +		}
> > +		if (btf_type_is_array(member_type)) {
> > +			array = btf_array(member_type);
> > +			if (!array->nelems)
> > +				return false;
> > +			member_type = btf_type_skip_modifiers(btf, array->type, NULL);
> > +			if (!btf_type_is_scalar(member_type))
> > +				return false;
> > +			continue;
> > +		}
> > +		if (!btf_type_is_scalar(member_type))
> > +			return false;
> > +	}
> > +	return true;
> > +}
>
> Deleting the code from the next patch can be combined with this patch,
> since it's a pure code move?
>
> Similar for a few funcs that do a pure code move... it's better to have them
> in a single patch.
>
> Not sure about the 2-patch split strategy in general.
> Not clear whether it helps or hurts the code review.
>

No problem, I can squash both into a single patch.
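
To make the nesting rule in __btf_type_is_scalar_struct() above concrete,
two illustrative argument types (not taken from the patch):

struct ok_arg {		/* accepted: scalars, arrays of scalars, and
			 * structs of scalars nested up to 4 levels
			 */
	int a;
	char buf[8];
	struct { long x; } inner;
};

struct bad_arg {	/* rejected: contains a pointer member */
	int a;
	int *p;
};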

> > +
> > +
> > +static u32 *reg2btf_ids[__BPF_REG_TYPE_MAX] = {
> > +#ifdef CONFIG_NET
> > +	[PTR_TO_SOCKET] = &btf_sock_ids[BTF_SOCK_TYPE_SOCK],
> > +	[PTR_TO_SOCK_COMMON] = &btf_sock_ids[BTF_SOCK_TYPE_SOCK_COMMON],
> > +	[PTR_TO_TCP_SOCK] = &btf_sock_ids[BTF_SOCK_TYPE_TCP],
> > +#endif
> > +};
> > +
> > +enum kfunc_ptr_arg_type {
> > +	KF_ARG_PTR_TO_CTX,
> > +	KF_ARG_PTR_TO_KPTR_STRONG,   /* PTR_TO_KPTR but type specific */
>
> What does the STRONG suffix signify?
>
> PTR_TO_KPTR should always be safe.
> Just to make it different from ARG_PTR_TO_KPTR in kptr_xchg that
> has 'void *' arg and "auto converts" the type?
>

Yes.

> Here STRONG means that the type of the arg should match?

Yes.

>
> I think it's too verbose.
> Just KF_ARG_PTR_TO_KPTR would be clear enough.
> If we ever have another kptr_xchg that is done as kfunc
> we can add KF_ARG_PTR_TO_KPTR_AUTO or some other name that we can bikeshed later.
>

Ack, I'll drop the _STRONG suffix and simply call it ARG_PTR_TO_KPTR.

> > +	KF_ARG_PTR_TO_DYNPTR,
> > +	KF_ARG_PTR_TO_BTF_ID,	     /* Also covers reg2btf_ids conversions */
> > +	KF_ARG_PTR_TO_MEM,
> > +	KF_ARG_PTR_TO_MEM_SIZE,	     /* Size derived from next argument, skip it */
> > +};
> > +
> > +static enum kfunc_ptr_arg_type
> > +get_kfunc_ptr_arg_type(struct bpf_verifier_env *env,
> > +		       struct bpf_kfunc_call_arg_meta *meta,
> > +		       const struct btf_type *t, const struct btf_type *ref_t,
> > +		       const char *ref_tname, const struct btf_param *args,
> > +		       int argno, int nargs)
> > +{
> > +	u32 regno = argno + 1;
> > +	struct bpf_reg_state *regs = cur_regs(env);
> > +	struct bpf_reg_state *reg = &regs[regno];
> > +	bool arg_mem_size = false;
> > +
> > +	/* In this function, we verify the kfunc's BTF as per the argument type,
> > +	 * leaving the rest of the verification with respect to the register
> > +	 * type to our caller. When a set of conditions hold in the BTF type of
> > +	 * arguments, we resolve it to a known kfunc_ptr_arg_type.
> > +	 */
> > +	if (btf_get_prog_ctx_type(&env->log, meta->btf, t, resolve_prog_type(env->prog), argno))
> > +		return KF_ARG_PTR_TO_CTX;
> > +
> > +	if (is_kfunc_arg_kptr_get(meta, argno)) {
> > +		if (!btf_type_is_ptr(ref_t)) {
> > +			verbose(env, "arg#0 BTF type must be a double pointer for kptr_get kfunc\n");
> > +			return -EINVAL;
> > +		}
> > +		ref_t = btf_type_by_id(meta->btf, ref_t->type);
> > +		ref_tname = btf_name_by_offset(meta->btf, ref_t->name_off);
> > +		if (!btf_type_is_struct(ref_t)) {
> > +			verbose(env, "kernel function %s args#0 pointer type %s %s is not supported\n",
> > +				meta->func_name, btf_type_str(ref_t), ref_tname);
> > +			return -EINVAL;
> > +		}
> > +		return KF_ARG_PTR_TO_KPTR_STRONG;
> > +	}
> > +
> > +	if (is_kfunc_arg_dynptr(meta->btf, &args[argno]))
> > +		return KF_ARG_PTR_TO_DYNPTR;
> > +
> > +	if ((base_type(reg->type) == PTR_TO_BTF_ID || reg2btf_ids[base_type(reg->type)])) {
> > +		if (!btf_type_is_struct(ref_t)) {
> > +			verbose(env, "kernel function %s args#%d pointer type %s %s is not supported\n",
> > +				meta->func_name, argno, btf_type_str(ref_t), ref_tname);
> > +			return -EINVAL;
> > +		}
> > +		return KF_ARG_PTR_TO_BTF_ID;
> > +	}
> > +
> > +	if (argno + 1 < nargs && is_kfunc_arg_mem_size(meta->btf, &args[argno + 1], &regs[regno + 1]))
> > +		arg_mem_size = true;
> > +
> > +	/* This is the catch all argument type of register types supported by
> > +	 * check_helper_mem_access. However, we only allow when argument type is
> > +	 * pointer to scalar, or struct composed (recursively) of scalars. When
> > +	 * arg_mem_size is true, the pointer can be void *.
> > +	 */
> > +	if (!btf_type_is_scalar(ref_t) && !__btf_type_is_scalar_struct(env, meta->btf, ref_t, 0) &&
> > +	    (arg_mem_size ? !btf_type_is_void(ref_t) : 1)) {
> > +		verbose(env, "arg#%d pointer type %s %s must point to %sscalar, or struct with scalar\n",
> > +			argno, btf_type_str(ref_t), ref_tname, arg_mem_size ? "void, " : "");
> > +		return -EINVAL;
> > +	}
> > +	return arg_mem_size ? KF_ARG_PTR_TO_MEM_SIZE : KF_ARG_PTR_TO_MEM;
> > +}
> > +
> > +static int process_kf_arg_ptr_to_btf_id(struct bpf_verifier_env *env,
> > +					struct bpf_reg_state *reg,
> > +					const struct btf_type *ref_t,
> > +					const char *ref_tname, u32 ref_id,
> > +					struct bpf_kfunc_call_arg_meta *meta,
> > +					int argno)
> > +{
> > +	const struct btf_type *reg_ref_t;
> > +	bool strict_type_match = false;
> > +	const struct btf *reg_btf;
> > +	const char *reg_ref_tname;
> > +	u32 reg_ref_id;
> > +
> > +	if (reg->type == PTR_TO_BTF_ID) {
> > +		reg_btf = reg->btf;
> > +		reg_ref_id = reg->btf_id;
> > +	} else {
> > +		reg_btf = btf_vmlinux;
> > +		reg_ref_id = *reg2btf_ids[base_type(reg->type)];
> > +	}
> > +
> > +	if (is_kfunc_trusted_args(meta) || (is_kfunc_release(meta) && reg->ref_obj_id))
> > +		strict_type_match = true;
> > +
> > +	reg_ref_t = btf_type_skip_modifiers(reg_btf, reg_ref_id, &reg_ref_id);
> > +	reg_ref_tname = btf_name_by_offset(reg_btf, reg_ref_t->name_off);
> > +	if (!btf_struct_ids_match(&env->log, reg_btf, reg_ref_id, reg->off, meta->btf, ref_id, strict_type_match)) {
> > +		verbose(env, "kernel function %s args#%d expected pointer to %s %s but R%d has a pointer to %s %s\n",
> > +			meta->func_name, argno, btf_type_str(ref_t), ref_tname, argno + 1,
> > +			btf_type_str(reg_ref_t), reg_ref_tname);
> > +		return -EINVAL;
> > +	}
> > +	return 0;
> > +}
> > +
> > +static int process_kf_arg_ptr_to_kptr_strong(struct bpf_verifier_env *env,
> > +					     struct bpf_reg_state *reg,
> > +					     const struct btf_type *ref_t,
> > +					     const char *ref_tname,
> > +					     struct bpf_kfunc_call_arg_meta *meta,
> > +					     int argno)
> > +{
> > +	struct btf_field *kptr_field;
> > +
> > +	/* check_func_arg_reg_off allows var_off for
> > +	 * PTR_TO_MAP_VALUE, but we need fixed offset to find
> > +	 * off_desc.
> > +	 */
> > +	if (!tnum_is_const(reg->var_off)) {
> > +		verbose(env, "arg#0 must have constant offset\n");
> > +		return -EINVAL;
> > +	}
> > +
> > +	kptr_field = btf_record_find(reg->map_ptr->record, reg->off + reg->var_off.value, BPF_KPTR);
> > +	if (!kptr_field || kptr_field->type != BPF_KPTR_REF) {
> > +		verbose(env, "arg#0 no referenced kptr at map value offset=%llu\n",
> > +			reg->off + reg->var_off.value);
> > +		return -EINVAL;
> > +	}
> > +
> > +	if (!btf_struct_ids_match(&env->log, meta->btf, ref_t->type, 0, kptr_field->kptr.btf,
> > +				  kptr_field->kptr.btf_id, true)) {
> > +		verbose(env, "kernel function %s args#%d expected pointer to %s %s\n",
> > +			meta->func_name, argno, btf_type_str(ref_t), ref_tname);
> > +		return -EINVAL;
> > +	}
> > +	return 0;
> > +}
> > +
> > +static int check_kfunc_args(struct bpf_verifier_env *env, struct bpf_kfunc_call_arg_meta *meta)
> > +{
> > +	const char *func_name = meta->func_name, *ref_tname;
> > +	const struct btf *btf = meta->btf;
> > +	const struct btf_param *args;
> > +	u32 i, nargs;
> > +	int ret;
> > +
> > +	args = (const struct btf_param *)(meta->func_proto + 1);
> > +	nargs = btf_type_vlen(meta->func_proto);
> > +	if (nargs > MAX_BPF_FUNC_REG_ARGS) {
> > +		verbose(env, "Function %s has %d > %d args\n", func_name, nargs,
> > +			MAX_BPF_FUNC_REG_ARGS);
> > +		return -EINVAL;
> > +	}
> > +
> > +	/* Check that BTF function arguments match actual types that the
> > +	 * verifier sees.
> > +	 */
> > +	for (i = 0; i < nargs; i++) {
> > +		struct bpf_reg_state *regs = cur_regs(env), *reg = &regs[i + 1];
> > +		const struct btf_type *t, *ref_t, *resolve_ret;
> > +		enum bpf_arg_type arg_type = ARG_DONTCARE;
> > +		u32 regno = i + 1, ref_id, type_size;
> > +		bool is_ret_buf_sz = false;
> > +		int kf_arg_type;
> > +
> > +		t = btf_type_skip_modifiers(btf, args[i].type, NULL);
> > +		if (btf_type_is_scalar(t)) {
> > +			if (reg->type != SCALAR_VALUE) {
> > +				verbose(env, "R%d is not a scalar\n", regno);
> > +				return -EINVAL;
> > +			}
> > +			if (is_kfunc_arg_ret_buf_size(btf, &args[i], reg, "rdonly_buf_size")) {
> > +					meta->r0_rdonly = true;
> > +					is_ret_buf_sz = true;
> > +			} else if (is_kfunc_arg_ret_buf_size(btf, &args[i], reg, "rdwr_buf_size")) {
> > +					is_ret_buf_sz = true;
> > +			}
>
> is_kfunc_arg_ret_buf_size() is more generic than its name suggests.
> Maybe is_scalar_arg_with_name() ?
>

I named it is_kfunc_scalar_arg_with_name for consistency.
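
Roughly (a sketch; the scalar checks are now left to the caller, as noted
below):

static bool is_kfunc_scalar_arg_with_name(const struct btf *btf,
					  const struct btf_param *arg,
					  const char *name)
{
	const char *param_name;

	param_name = btf_name_by_offset(btf, arg->name_off);
	if (str_is_empty(param_name))
		return false;
	return !strcmp(param_name, name);
}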

> Also looks wrong to triple check inside it:
>  +	if (!btf_type_is_scalar(t) || reg->type != SCALAR_VALUE)
>  +		return false;
> when there was a check above.
>

Right, this was in the original code that I copied over, but it's no longer
needed given where it's called now.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH bpf-next v7 21/26] bpf: Add 'release on unlock' logic for bpf_list_push_{front,back}
  2022-11-14 19:15 ` [PATCH bpf-next v7 21/26] bpf: Add 'release on unlock' logic for bpf_list_push_{front,back} Kumar Kartikeya Dwivedi
@ 2022-11-15 16:52   ` Dave Marchevsky
  2022-11-15 17:01     ` Kumar Kartikeya Dwivedi
  0 siblings, 1 reply; 48+ messages in thread
From: Dave Marchevsky @ 2022-11-15 16:52 UTC (permalink / raw)
  To: Kumar Kartikeya Dwivedi, bpf
  Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann, Martin KaFai Lau

On 11/14/22 2:15 PM, Kumar Kartikeya Dwivedi wrote:
> This commit implements the delayed release logic for bpf_list_push_front
> and bpf_list_push_back.
> 
> Once a node has been added to the list, its pointer changes to
> PTR_UNTRUSTED. However, it is only released once the lock protecting the
> list is unlocked. For such PTR_TO_BTF_ID | MEM_ALLOC pointers with
> PTR_UNTRUSTED set but an active ref_obj_id, it is still permitted to
> read them as long as the lock is held. Writing to them is not allowed.
> 
> This allows having read access to push items we no longer own until we
> release the lock guarding the list, allowing a little more flexibility
> when working with these APIs.
> 
> Note that enabling write support has fairly tricky interactions with
> what happens inside the critical section. Just as an example, currently,
> bpf_obj_drop is not permitted, but if it were, being able to write to
> the PTR_UNTRUSTED pointer while the object gets released back to the
> memory allocator would violate safety properties we wish to guarantee
> (i.e. not crashing the kernel). The memory could be reused for a
> different type in the BPF program or even in the kernel as it gets
> eventually kfree'd.
> 
> Not enabling bpf_obj_drop inside the critical section would appear to
> prevent all of the above, but that is more of an artificial limitation
> right now. Since the write support is tangled with how we handle
> potential aliasing of nodes inside the critical section that may or may
> not be part of the list anymore, it has been deferred to a future patch.
> 
> Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
> ---

Can the two WARN_ON_ONCE in this patch be converted to
verifier-log-and-EFAULT? Looks like they're both in
functions with access to 'env' and are checking for
scenarios that should be considered bugs in the verifier.

Aside from that style nit, logic and patch summary updates
here LGTM.

Acked-by: Dave Marchevsky <davemarchevsky@fb.com>

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH bpf-next v7 17/26] bpf: Introduce bpf_obj_new
  2022-11-15  6:19   ` Alexei Starovoitov
@ 2022-11-15 16:53     ` Kumar Kartikeya Dwivedi
  0 siblings, 0 replies; 48+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2022-11-15 16:53 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Martin KaFai Lau, Dave Marchevsky

On Tue, Nov 15, 2022 at 11:49:12AM IST, Alexei Starovoitov wrote:
> On Tue, Nov 15, 2022 at 12:45:38AM +0530, Kumar Kartikeya Dwivedi wrote:
> > Introduce type safe memory allocator bpf_obj_new for BPF programs. The
> > kernel side kfunc is named bpf_obj_new_impl, as passing hidden arguments
> > to kfuncs still requires having them in prototype, unlike BPF helpers
> > which always take 5 arguments and have them checked using bpf_func_proto
> > in verifier, ignoring unset argument types.
> >
> > Introduce __ign suffix to ignore a specific kfunc argument during type
> > checks, then use this to introduce support for passing type metadata to
> > the bpf_obj_new_impl kfunc.
> >
> > The user passes the BTF ID of the type it wants to allocate in program BTF,
> > the verifier then rewrites the first argument as the size of this type,
> > after performing some sanity checks (to ensure it exists and it is a
> > struct type).
> >
> > The second argument is also fixed up and passed by the verifier. This is
> > the btf_struct_meta for the type being allocated. It would be needed
> > mostly for the offset array which is required for zero initializing
> > special fields while leaving the rest of storage in unitialized state.
> >
> > It would also be needed in the next patch to perform proper destruction
> > of the object's special fields.
> >
> > Under the hood, bpf_obj_new will call bpf_mem_alloc and bpf_mem_free,
> > using the any context BPF memory allocator introduced recently. To this
> > end, a global instance of the BPF memory allocator is initialized on
> > boot to be used for this purpose. This 'bpf_global_ma' serves all
> > allocations for bpf_obj_new. In the future, bpf_obj_new variants will
> > allow specifying a custom allocator.
> >
> > Note that now that bpf_obj_new can be used to allocate objects that can
> > be linked to BPF linked list (when future linked list helpers are
> > available), we need to also free the elements using bpf_mem_free.
> > However, since the draining of elements is done outside the
> > bpf_spin_lock, we need to do migrate_disable around the call since
> > bpf_list_head_free can be called from map free path where migration is
> > enabled. Otherwise, when called from BPF programs migration is already
> > disabled.
> >
> > A convenience macro is included in the bpf_experimental.h header to
> > hide the ugly details of the implementation, leading to user code
> > looking similar to a language level extension which allocates and
> > constructs fields of a user type.
> >
> > struct bar {
> > 	struct bpf_list_node node;
> > };
> >
> > struct foo {
> > 	struct bpf_spin_lock lock;
> > 	struct bpf_list_head head __contains(bar, node);
> > };
> >
> > void prog(void) {
> > 	struct foo *f;
> >
> > 	f = bpf_obj_new(typeof(*f));
> > 	if (!f)
> > 		return;
> > 	...
> > }
> >
> > A key piece of this story is still missing, i.e. the free function,
> > which will come in the next patch.
> >
> > Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
> > ---
> >  include/linux/bpf.h                           |  21 ++--
> >  include/linux/bpf_verifier.h                  |   2 +
> >  kernel/bpf/core.c                             |  16 +++
> >  kernel/bpf/helpers.c                          |  47 ++++++--
> >  kernel/bpf/verifier.c                         | 107 ++++++++++++++++--
> >  .../testing/selftests/bpf/bpf_experimental.h  |  25 ++++
> >  6 files changed, 195 insertions(+), 23 deletions(-)
> >  create mode 100644 tools/testing/selftests/bpf/bpf_experimental.h
> >
> > diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> > index 62a16b699e71..4635e31bd6fc 100644
> > --- a/include/linux/bpf.h
> > +++ b/include/linux/bpf.h
> > @@ -54,6 +54,8 @@ struct cgroup;
> >  extern struct idr btf_idr;
> >  extern spinlock_t btf_idr_lock;
> >  extern struct kobject *btf_kobj;
> > +extern struct bpf_mem_alloc bpf_global_ma;
> > +extern bool bpf_global_ma_set;
> >
> >  typedef u64 (*bpf_callback_t)(u64, u64, u64, u64, u64);
> >  typedef int (*bpf_iter_init_seq_priv_t)(void *private_data,
> > @@ -333,16 +335,19 @@ static inline bool btf_record_has_field(const struct btf_record *rec, enum btf_f
> >  	return rec->field_mask & type;
> >  }
> >
> > -static inline void check_and_init_map_value(struct bpf_map *map, void *dst)
> > +static inline void bpf_obj_init(const struct btf_field_offs *foffs, void *obj)
> >  {
> > -	if (!IS_ERR_OR_NULL(map->record)) {
> > -		struct btf_field *fields = map->record->fields;
> > -		u32 cnt = map->record->cnt;
> > -		int i;
> > +	int i;
> >
> > -		for (i = 0; i < cnt; i++)
> > -			memset(dst + fields[i].offset, 0, btf_field_type_size(fields[i].type));
> > -	}
> > +	if (!foffs)
> > +		return;
> > +	for (i = 0; i < foffs->cnt; i++)
> > +		memset(obj + foffs->field_off[i], 0, foffs->field_sz[i]);
> > +}
> > +
> > +static inline void check_and_init_map_value(struct bpf_map *map, void *dst)
> > +{
> > +	bpf_obj_init(map->field_offs, dst);
> >  }
> >
> >  /* memcpy that is used with 8-byte aligned pointers, power-of-8 size and
> > diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h
> > index 887fa4d922f6..306fc1d6cc4a 100644
> > --- a/include/linux/bpf_verifier.h
> > +++ b/include/linux/bpf_verifier.h
> > @@ -427,6 +427,8 @@ struct bpf_insn_aux_data {
> >  		 */
> >  		struct bpf_loop_inline_state loop_inline_state;
> >  	};
> > +	u64 obj_new_size; /* remember the size of type passed to bpf_obj_new to rewrite R1 */
> > +	struct btf_struct_meta *kptr_struct_meta;
> >  	u64 map_key_state; /* constant (32 bit) key tracking for maps */
> >  	int ctx_field_size; /* the ctx field size for load insn, maybe 0 */
> >  	u32 seen; /* this insn was processed by the verifier at env->pass_cnt */
> > diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
> > index 9c16338bcbe8..2e57fc839a5c 100644
> > --- a/kernel/bpf/core.c
> > +++ b/kernel/bpf/core.c
> > @@ -34,6 +34,7 @@
> >  #include <linux/log2.h>
> >  #include <linux/bpf_verifier.h>
> >  #include <linux/nodemask.h>
> > +#include <linux/bpf_mem_alloc.h>
> >
> >  #include <asm/barrier.h>
> >  #include <asm/unaligned.h>
> > @@ -60,6 +61,9 @@
> >  #define CTX	regs[BPF_REG_CTX]
> >  #define IMM	insn->imm
> >
> > +struct bpf_mem_alloc bpf_global_ma;
> > +bool bpf_global_ma_set;
> > +
> >  /* No hurry in this branch
> >   *
> >   * Exported for the bpf jit load helper.
> > @@ -2746,6 +2750,18 @@ int __weak bpf_arch_text_invalidate(void *dst, size_t len)
> >  	return -ENOTSUPP;
> >  }
> >
> > +#ifdef CONFIG_BPF_SYSCALL
> > +static int __init bpf_global_ma_init(void)
> > +{
> > +	int ret;
> > +
> > +	ret = bpf_mem_alloc_init(&bpf_global_ma, 0, false);
> > +	bpf_global_ma_set = !ret;
> > +	return ret;
> > +}
> > +late_initcall(bpf_global_ma_init);
> > +#endif
> > +
> >  DEFINE_STATIC_KEY_FALSE(bpf_stats_enabled_key);
> >  EXPORT_SYMBOL(bpf_stats_enabled_key);
> >
> > diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
> > index 5bc0b9f0f306..c4f1c22cc44c 100644
> > --- a/kernel/bpf/helpers.c
> > +++ b/kernel/bpf/helpers.c
> > @@ -19,6 +19,7 @@
> >  #include <linux/proc_ns.h>
> >  #include <linux/security.h>
> >  #include <linux/btf_ids.h>
> > +#include <linux/bpf_mem_alloc.h>
> >
> >  #include "../../lib/kstrtox.h"
> >
> > @@ -1735,25 +1736,57 @@ void bpf_list_head_free(const struct btf_field *field, void *list_head,
> >
> >  		obj -= field->list_head.node_offset;
> >  		head = head->next;
> > -		/* TODO: Rework later */
> > -		kfree(obj);
> > +		/* The contained type can also have resources, including a
> > +		 * bpf_list_head which needs to be freed.
> > +		 */
> > +		bpf_obj_free_fields(field->list_head.value_rec, obj);
> > +		/* bpf_mem_free requires migrate_disable(), since we can be
> > +		 * called from map free path as well apart from BPF program (as
> > +		 * part of map ops doing bpf_obj_free_fields).
> > +		 */
> > +		migrate_disable();
> > +		bpf_mem_free(&bpf_global_ma, obj);
> > +		migrate_enable();
> >  	}
> >  }
> >
> > -BTF_SET8_START(tracing_btf_ids)
> > +__diag_push();
> > +__diag_ignore_all("-Wmissing-prototypes",
> > +		  "Global functions as their definitions will be in vmlinux BTF");
> > +
> > +void *bpf_obj_new_impl(u64 local_type_id__k, void *meta__ign)
> > +{
> > +	struct btf_struct_meta *meta = meta__ign;
> > +	u64 size = local_type_id__k;
> > +	void *p;
> > +
> > +	if (unlikely(!bpf_global_ma_set))
> > +		return NULL;
> > +	p = bpf_mem_alloc(&bpf_global_ma, size);
> > +	if (!p)
> > +		return NULL;
> > +	if (meta)
> > +		bpf_obj_init(meta->field_offs, p);
> > +	return p;
> > +}
> > +
> > +__diag_pop();
> > +
> > +BTF_SET8_START(generic_btf_ids)
> >  #ifdef CONFIG_KEXEC_CORE
> >  BTF_ID_FLAGS(func, crash_kexec, KF_DESTRUCTIVE)
> >  #endif
> > -BTF_SET8_END(tracing_btf_ids)
> > +BTF_ID_FLAGS(func, bpf_obj_new_impl, KF_ACQUIRE | KF_RET_NULL)
> > +BTF_SET8_END(generic_btf_ids)
> >
> > -static const struct btf_kfunc_id_set tracing_kfunc_set = {
> > +static const struct btf_kfunc_id_set generic_kfunc_set = {
> >  	.owner = THIS_MODULE,
> > -	.set   = &tracing_btf_ids,
> > +	.set   = &generic_btf_ids,
> >  };
> >
> >  static int __init kfunc_init(void)
> >  {
> > -	return register_btf_kfunc_id_set(BPF_PROG_TYPE_TRACING, &tracing_kfunc_set);
> > +	return register_btf_kfunc_id_set(BPF_PROG_TYPE_TRACING, &generic_kfunc_set);
> >  }
> >
> >  late_initcall(kfunc_init);
> > diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> > index a4a1424b19a5..c7f5d83783db 100644
> > --- a/kernel/bpf/verifier.c
> > +++ b/kernel/bpf/verifier.c
> > @@ -7948,6 +7948,11 @@ static bool is_kfunc_arg_sfx_constant(const struct btf *btf, const struct btf_pa
> >  	return __kfunc_param_match_suffix(btf, arg, "__k");
> >  }
> >
> > +static bool is_kfunc_arg_sfx_ignore(const struct btf *btf, const struct btf_param *arg)
> > +{
> > +	return __kfunc_param_match_suffix(btf, arg, "__ign");
> > +}
> > +
> >  static bool is_kfunc_arg_ret_buf_size(const struct btf *btf,
> >  				      const struct btf_param *arg,
> >  				      const struct bpf_reg_state *reg,
> > @@ -8216,6 +8221,10 @@ static int check_kfunc_args(struct bpf_verifier_env *env, struct bpf_kfunc_call_
> >  		int kf_arg_type;
> >
> >  		t = btf_type_skip_modifiers(btf, args[i].type, NULL);
> > +
> > +		if (is_kfunc_arg_sfx_ignore(btf, &args[i]))
> > +			continue;
> > +
> >  		if (btf_type_is_scalar(t)) {
> >  			if (reg->type != SCALAR_VALUE) {
> >  				verbose(env, "R%d is not a scalar\n", regno);
> > @@ -8395,6 +8404,17 @@ static int check_kfunc_args(struct bpf_verifier_env *env, struct bpf_kfunc_call_
> >  	return 0;
> >  }
> >
> > +enum special_kfunc_type {
> > +	KF_bpf_obj_new_impl,
> > +};
> > +
> > +BTF_SET_START(special_kfunc_set)
> > +BTF_ID(func, bpf_obj_new_impl)
> > +BTF_SET_END(special_kfunc_set)
> > +
> > +BTF_ID_LIST(special_kfunc_list)
> > +BTF_ID(func, bpf_obj_new_impl)
> > +
> >  static int check_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
> >  			    int *insn_idx_p)
> >  {
> > @@ -8469,17 +8489,64 @@ static int check_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
> >  	t = btf_type_skip_modifiers(desc_btf, func_proto->type, NULL);
> >
> >  	if (is_kfunc_acquire(&meta) && !btf_type_is_struct_ptr(meta.btf, t)) {
> > -		verbose(env, "acquire kernel function does not return PTR_TO_BTF_ID\n");
> > -		return -EINVAL;
> > +		/* Only exception is bpf_obj_new_impl */
> > +		if (meta.btf != btf_vmlinux || meta.func_id != special_kfunc_list[KF_bpf_obj_new_impl]) {
> > +			verbose(env, "acquire kernel function does not return PTR_TO_BTF_ID\n");
> > +			return -EINVAL;
> > +		}
> >  	}
> >
> >  	if (btf_type_is_scalar(t)) {
> >  		mark_reg_unknown(env, regs, BPF_REG_0);
> >  		mark_btf_func_reg_size(env, BPF_REG_0, t->size);
> >  	} else if (btf_type_is_ptr(t)) {
> > -		ptr_type = btf_type_skip_modifiers(desc_btf, t->type,
> > -						   &ptr_type_id);
> > -		if (!btf_type_is_struct(ptr_type)) {
> > +		ptr_type = btf_type_skip_modifiers(desc_btf, t->type, &ptr_type_id);
> > +
> > +		if (meta.btf == btf_vmlinux && btf_id_set_contains(&special_kfunc_set, meta.func_id)) {
> > +			if (!btf_type_is_void(ptr_type)) {
> > +				verbose(env, "kernel function %s must have void * return type\n",
> > +					meta.func_name);
> > +				return -EINVAL;
> > +			}
>
> Here you're checking that void *bpf_obj_new_impl and obj_drop actually were
> declared with 'void *' return ?
> and wasting run-time cycles to check that??
> Please don't.
>
> Even if we miss that during code review there is no safety issue.
>

Right, I think the later linked list patch drops this hunk anyway. We
already fail if any of the other cases are unhandled.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH bpf-next v7 19/26] bpf: Permit NULL checking pointer with non-zero fixed offset
  2022-11-15  6:23   ` Alexei Starovoitov
@ 2022-11-15 16:57     ` Kumar Kartikeya Dwivedi
  0 siblings, 0 replies; 48+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2022-11-15 16:57 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Martin KaFai Lau, Dave Marchevsky

On Tue, Nov 15, 2022 at 11:53:19AM IST, Alexei Starovoitov wrote:
> On Tue, Nov 15, 2022 at 12:45:40AM +0530, Kumar Kartikeya Dwivedi wrote:
> >  	if (type_may_be_null(reg->type) && reg->id == id &&
> >  	    !WARN_ON_ONCE(!reg->id)) {
> > -		if (WARN_ON_ONCE(reg->smin_value || reg->smax_value ||
> > -				 !tnum_equals_const(reg->var_off, 0) ||
> > -				 reg->off)) {
> > +		if (reg->smin_value || reg->smax_value || !tnum_equals_const(reg->var_off, 0) || reg->off) {
> ....
> > +			if (WARN_ON_ONCE(reg->smin_value || reg->smax_value || !tnum_equals_const(reg->var_off, 0)))
>
> That is too much copy-paste between two lines.
> Please combine the checks.

I have rewritten it like this:

if (WARN_ON_ONCE(reg->smin_value || reg->smax_value || !tnum_equals_const(reg->var_off, 0)))
	  return;
if (reg->type != (PTR_TO_BTF_ID | MEM_ALLOC | PTR_MAYBE_NULL) && WARN_ON_ONCE(reg->off))
	  return;

I prefer to keep the WARN, as it would be pretty clearly a verifier bug that
would be silently missed since the return type is void.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH bpf-next v7 20/26] bpf: Introduce single ownership BPF linked list API
  2022-11-15  6:26   ` Alexei Starovoitov
@ 2022-11-15 16:59     ` Kumar Kartikeya Dwivedi
  2022-11-15 18:26       ` Alexei Starovoitov
  0 siblings, 1 reply; 48+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2022-11-15 16:59 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Martin KaFai Lau, Dave Marchevsky

On Tue, Nov 15, 2022 at 11:56:37AM IST, Alexei Starovoitov wrote:
> On Tue, Nov 15, 2022 at 12:45:41AM +0530, Kumar Kartikeya Dwivedi wrote:
> > +}
> > +
> > +static int process_kf_arg_ptr_to_list_node(struct bpf_verifier_env *env,
> > +					   struct bpf_reg_state *reg, u32 regno,
> > +					   struct bpf_kfunc_call_arg_meta *meta)
> > +{
> > +	struct btf_struct_meta *struct_meta;
> > +	struct btf_field *field;
> > +	struct btf_record *rec;
> > +	u32 list_node_off;
> > +
> > +	if (meta->btf != btf_vmlinux ||
> > +	    (meta->func_id != special_kfunc_list[KF_bpf_list_push_front] &&
> > +	     meta->func_id != special_kfunc_list[KF_bpf_list_push_back])) {
> > +		verbose(env, "verifier internal error: bpf_list_head argument for unknown kfunc\n");
>
> typo. bpf_list_node ?
>
> > +		return -EFAULT;
> > +	}
> > +
> > +	if (!tnum_is_const(reg->var_off)) {
> > +		verbose(env,
> > +			"R%d doesn't have constant offset. bpf_list_head has to be at the constant offset\n",
>
> same typo?
>

These two are typos.

> > +			regno);
> > +		return -EINVAL;
> > +	}
> > +
> > +	struct_meta = btf_find_struct_meta(reg->btf, reg->btf_id);
> > +	if (!struct_meta) {
> > +		verbose(env, "bpf_list_node not found for allocated object\n");
> > +		return -EINVAL;
> > +	}
> > +	rec = struct_meta->record;
> > +
> > +	list_node_off = reg->off + reg->var_off.value;
> > +	field = btf_record_find(rec, list_node_off, BPF_LIST_NODE);
> > +	if (!field || field->offset != list_node_off) {
> > +		verbose(env, "bpf_list_node not found at offset=%u\n", list_node_off);
> > +		return -EINVAL;
> > +	}
> > +
> > +	field = meta->arg_list_head.field;
> > +
> > +	if (!btf_struct_ids_match(&env->log, reg->btf, reg->btf_id, 0, field->list_head.btf,
> > +				  field->list_head.value_btf_id, true)) {
> > +		verbose(env, "bpf_list_head value type does not match arg#1\n");
>
> and the same typo again?!
>

This is probably just poorly worded.
The value type (__contains) of bpf_list_head does not match arg#1 (node).

What's better, maybe:
bpf_list_node type does not match bpf_list_head value type?

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH bpf-next v7 21/26] bpf: Add 'release on unlock' logic for bpf_list_push_{front,back}
  2022-11-15 16:52   ` Dave Marchevsky
@ 2022-11-15 17:01     ` Kumar Kartikeya Dwivedi
  0 siblings, 0 replies; 48+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2022-11-15 17:01 UTC (permalink / raw)
  To: Dave Marchevsky
  Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Martin KaFai Lau

On Tue, Nov 15, 2022 at 10:22:56PM IST, Dave Marchevsky wrote:
> On 11/14/22 2:15 PM, Kumar Kartikeya Dwivedi wrote:
> > This commit implements the delayed release logic for bpf_list_push_front
> > and bpf_list_push_back.
> >
> > Once a node has been added to the list, its pointer changes to
> > PTR_UNTRUSTED. However, it is only released once the lock protecting the
> > list is unlocked. For such PTR_TO_BTF_ID | MEM_ALLOC pointers with
> > PTR_UNTRUSTED set but an active ref_obj_id, it is still permitted to
> > read them as long as the lock is held. Writing to them is not allowed.
> >
> > This allows having read access to push items we no longer own until we
> > release the lock guarding the list, allowing a little more flexibility
> > when working with these APIs.
> >
> > Note that enabling write support has fairly tricky interactions with
> > what happens inside the critical section. Just as an example, currently,
> > bpf_obj_drop is not permitted, but if it were, being able to write to
> > the PTR_UNTRUSTED pointer while the object gets released back to the
> > memory allocator would violate safety properties we wish to guarantee
> > (i.e. not crashing the kernel). The memory could be reused for a
> > different type in the BPF program or even in the kernel as it gets
> > eventually kfree'd.
> >
> > Not enabling bpf_obj_drop inside the critical section would appear to
> > prevent all of the above, but that is more of an artificial limitation
> > right now. Since the write support is tangled with how we handle
> > potential aliasing of nodes inside the critical section that may or may
> > not be part of the list anymore, it has been deferred to a future patch.
> >
> > Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
> > ---
>
> Can the two WARN_ON_ONCE in this patch be converted to
> verifier-log-and-EFAULT? Looks like they're both in
> functions with access to 'env' and are checking for
> scenarios that should be considered bugs in the verifier.
>

Will do.
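
For the archive, the pattern being asked for looks like this (a sketch; the
condition and message are illustrative, not the actual WARN sites in the
patch):

	if (!reg->ref_obj_id) {
		verbose(env, "verifier internal error: ref_obj_id is zero for release_on_unlock\n");
		return -EFAULT;
	}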

> Aside from that style nit, logic and patch summary updates
> here LGTM.
>
> Acked-by: Dave Marchevsky <davemarchevsky@fb.com>

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH bpf-next v7 12/26] bpf: Allow locking bpf_spin_lock global variables
  2022-11-14 19:15 ` [PATCH bpf-next v7 12/26] bpf: Allow locking bpf_spin_lock global variables Kumar Kartikeya Dwivedi
@ 2022-11-15 17:33   ` Dave Marchevsky
  2022-11-15 19:37     ` Kumar Kartikeya Dwivedi
  0 siblings, 1 reply; 48+ messages in thread
From: Dave Marchevsky @ 2022-11-15 17:33 UTC (permalink / raw)
  To: Kumar Kartikeya Dwivedi, bpf
  Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann, Martin KaFai Lau

On 11/14/22 2:15 PM, Kumar Kartikeya Dwivedi wrote:
> Global variables reside in maps accessible using direct_value_addr
> callbacks, so giving each load instruction's rewrite a unique reg->id
> disallows us from holding locks which are global.
> 
> The reason for preserving reg->id as a unique value for registers that
> may point to spin lock is that two separate lookups are treated as two
> separate memory regions, and any possible aliasing is ignored for the
> purposes of spin lock correctness.
> 
> This is not great especially for the global variable case, which is
> served from maps that have max_entries == 1, i.e. every lookup always
> yields a pointer into the same map value.
> 
> So refactor the active_spin_lock into an 'active_lock' structure which
> represents the lock identity, and instead of the reg->id, remember two
> fields, a pointer and the reg->id. The pointer will store reg->map_ptr
> or reg->btf. It's only necessary to distinguish for the id == 0 case of
> global variables, but always setting the pointer to a non-NULL value and
> using the pointer to check whether the lock is held simplifies code in
> the verifier.
> 
> This is generic enough to allow it for global variables, map lookups,
> and allocated objects at the same time.
> 
> Note that while whether a lock is held can be answered by just comparing
> active_lock.ptr to NULL, determining whether the register is pointing
> to the same held lock requires comparing _both_ ptr and id.
> 
> Finally, as a result of this refactoring, pseudo load instructions are
> not given a unique reg->id, as they are doing lookup for the same map
> value (max_entries is never greater than 1).
> 
> Essentially, we consider that the tuple of (ptr, id) will always be
> unique for any kind of argument to bpf_spin_{lock,unlock}.
> 
> Note that this can be extended in the future to also remember offset
> used for locking, so that we can introduce multiple bpf_spin_lock fields
> in the same allocation.
> 
> Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
> ---
>  include/linux/bpf_verifier.h | 10 ++++++++-
>  kernel/bpf/verifier.c        | 41 ++++++++++++++++++++++++------------
>  2 files changed, 37 insertions(+), 14 deletions(-)
> 
> diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h
> index 1a32baa78ce2..fa738abea267 100644
> --- a/include/linux/bpf_verifier.h
> +++ b/include/linux/bpf_verifier.h
> @@ -323,7 +323,15 @@ struct bpf_verifier_state {
>  	u32 branches;
>  	u32 insn_idx;
>  	u32 curframe;
> -	u32 active_spin_lock;
> +	struct {
> +		/* This can either be reg->map_ptr or reg->btf, but it is only
> +		 * used to check whether the lock is held or not by comparing to
> +		 * NULL.
> +		 */
> +		void *ptr;
> +		/* This will be reg->id */
> +		u32 id;
> +	} active_lock;

I didn't get back to you re: naming here, but I think these names are clear,
especially with comments elaborating on the details. The first comment can
be clarified a bit, though. Sounds like "is only used to check whether lock is
held or not by comparing to NULL" is saying that active_lock.ptr is only
compared to NULL in this patch, but the process_spin_lock changes that
produce verbose(env, "bpf_spin_unlock of different lock\n") also compare
it to another ptr.

Maybe you're trying to say "if active_lock.ptr
is NULL, there's no active lock and other fields in this struct don't hold
anything valid. If non-NULL, there is an active lock held"?

Separately, the line in patch summary with "we consider that the tuple of 
(ptr, id) will always be unique" would help with clarity if it was in the
comments here.
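
E.g. one possible wording (a sketch):

	struct {
		/* This can either be reg->map_ptr or reg->btf. If ptr is NULL,
		 * there is no active lock held, and the other fields are not
		 * meaningful. If non-NULL, a lock is held, and the tuple of
		 * (ptr, id) uniquely identifies it among all possible
		 * arguments to bpf_spin_{lock,unlock}.
		 */
		void *ptr;
		/* This will be reg->id */
		u32 id;
	} active_lock;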

>  	bool speculative;
>  
>  	/* first and last insn idx of this verifier state */
> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> index 070d003a99f0..99b5edb56978 100644
> --- a/kernel/bpf/verifier.c
> +++ b/kernel/bpf/verifier.c
> @@ -1215,7 +1215,8 @@ static int copy_verifier_state(struct bpf_verifier_state *dst_state,
>  	}
>  	dst_state->speculative = src->speculative;
>  	dst_state->curframe = src->curframe;
> -	dst_state->active_spin_lock = src->active_spin_lock;
> +	dst_state->active_lock.ptr = src->active_lock.ptr;
> +	dst_state->active_lock.id = src->active_lock.id;
>  	dst_state->branches = src->branches;
>  	dst_state->parent = src->parent;
>  	dst_state->first_insn_idx = src->first_insn_idx;
> @@ -5587,7 +5588,7 @@ int check_kfunc_mem_size_reg(struct bpf_verifier_env *env, struct bpf_reg_state
>   * Since only one bpf_spin_lock is allowed the checks are simpler than
>   * reg_is_refcounted() logic. The verifier needs to remember only
>   * one spin_lock instead of array of acquired_refs.
> - * cur_state->active_spin_lock remembers which map value element got locked
> + * cur_state->active_lock remembers which map value element got locked

Maybe remove "map value" here? Since previous patch adds support for allocated
object. "remembers which element got locked" or "remembers which map value
or allocated object" better reflect current state of spin_lock support after
these patches.

>   * and clears it after bpf_spin_unlock.
>   */
>  static int process_spin_lock(struct bpf_verifier_env *env, int regno,
> @@ -5636,22 +5637,35 @@ static int process_spin_lock(struct bpf_verifier_env *env, int regno,
>  		return -EINVAL;
>  	}
>  	if (is_lock) {
> -		if (cur->active_spin_lock) {
> +		if (cur->active_lock.ptr) {
>  			verbose(env,
>  				"Locking two bpf_spin_locks are not allowed\n");
>  			return -EINVAL;
>  		}
> -		cur->active_spin_lock = reg->id;
> +		if (map)
> +			cur->active_lock.ptr = map;
> +		else
> +			cur->active_lock.ptr = btf;
> +		cur->active_lock.id = reg->id;
>  	} else {
> -		if (!cur->active_spin_lock) {
> +		void *ptr;
> +
> +		if (map)
> +			ptr = map;
> +		else
> +			ptr = btf;
> +
> +		if (!cur->active_lock.ptr) {
>  			verbose(env, "bpf_spin_unlock without taking a lock\n");
>  			return -EINVAL;
>  		}
> -		if (cur->active_spin_lock != reg->id) {
> +		if (cur->active_lock.ptr != ptr ||
> +		    cur->active_lock.id != reg->id) {
>  			verbose(env, "bpf_spin_unlock of different lock\n");
>  			return -EINVAL;
>  		}
> -		cur->active_spin_lock = 0;
> +		cur->active_lock.ptr = NULL;
> +		cur->active_lock.id = 0;
>  	}
>  	return 0;
>  }
> @@ -10582,8 +10596,8 @@ static int check_ld_imm(struct bpf_verifier_env *env, struct bpf_insn *insn)
>  	    insn->src_reg == BPF_PSEUDO_MAP_IDX_VALUE) {
>  		dst_reg->type = PTR_TO_MAP_VALUE;
>  		dst_reg->off = aux->map_off;
> -		if (btf_record_has_field(map->record, BPF_SPIN_LOCK))
> -			dst_reg->id = ++env->id_gen;
> +		WARN_ON_ONCE(map->max_entries != 1);
> +		/* We want reg->id to be same (0) as map_value is not distinct */
>  	} else if (insn->src_reg == BPF_PSEUDO_MAP_FD ||
>  		   insn->src_reg == BPF_PSEUDO_MAP_IDX) {
>  		dst_reg->type = CONST_PTR_TO_MAP;
> @@ -10661,7 +10675,7 @@ static int check_ld_abs(struct bpf_verifier_env *env, struct bpf_insn *insn)
>  		return err;
>  	}
>  
> -	if (env->cur_state->active_spin_lock) {
> +	if (env->cur_state->active_lock.ptr) {
>  		verbose(env, "BPF_LD_[ABS|IND] cannot be used inside bpf_spin_lock-ed region\n");
>  		return -EINVAL;
>  	}
> @@ -11927,7 +11941,8 @@ static bool states_equal(struct bpf_verifier_env *env,
>  	if (old->speculative && !cur->speculative)
>  		return false;
>  
> -	if (old->active_spin_lock != cur->active_spin_lock)
> +	if (old->active_lock.ptr != cur->active_lock.ptr ||
> +	    old->active_lock.id != cur->active_lock.id)
>  		return false;
>  
>  	/* for states to be equal callsites have to be the same
> @@ -12572,7 +12587,7 @@ static int do_check(struct bpf_verifier_env *env)
>  					return -EINVAL;
>  				}
>  
> -				if (env->cur_state->active_spin_lock &&
> +				if (env->cur_state->active_lock.ptr &&
>  				    (insn->src_reg == BPF_PSEUDO_CALL ||
>  				     insn->imm != BPF_FUNC_spin_unlock)) {
>  					verbose(env, "function calls are not allowed while holding a lock\n");
> @@ -12609,7 +12624,7 @@ static int do_check(struct bpf_verifier_env *env)
>  					return -EINVAL;
>  				}
>  
> -				if (env->cur_state->active_spin_lock) {
> +				if (env->cur_state->active_lock.ptr) {
>  					verbose(env, "bpf_spin_unlock is missing\n");
>  					return -EINVAL;
>  				}


* Re: [PATCH bpf-next v7 20/26] bpf: Introduce single ownership BPF linked list API
  2022-11-15 16:59     ` Kumar Kartikeya Dwivedi
@ 2022-11-15 18:26       ` Alexei Starovoitov
  2022-11-15 19:36         ` Kumar Kartikeya Dwivedi
  0 siblings, 1 reply; 48+ messages in thread
From: Alexei Starovoitov @ 2022-11-15 18:26 UTC (permalink / raw)
  To: Kumar Kartikeya Dwivedi
  Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Martin KaFai Lau, Dave Marchevsky

On Tue, Nov 15, 2022 at 10:29:51PM +0530, Kumar Kartikeya Dwivedi wrote:
> On Tue, Nov 15, 2022 at 11:56:37AM IST, Alexei Starovoitov wrote:
> > On Tue, Nov 15, 2022 at 12:45:41AM +0530, Kumar Kartikeya Dwivedi wrote:
> > > +}
> > > +
> > > +static int process_kf_arg_ptr_to_list_node(struct bpf_verifier_env *env,
> > > +					   struct bpf_reg_state *reg, u32 regno,
> > > +					   struct bpf_kfunc_call_arg_meta *meta)
> > > +{
> > > +	struct btf_struct_meta *struct_meta;
> > > +	struct btf_field *field;
> > > +	struct btf_record *rec;
> > > +	u32 list_node_off;
> > > +
> > > +	if (meta->btf != btf_vmlinux ||
> > > +	    (meta->func_id != special_kfunc_list[KF_bpf_list_push_front] &&
> > > +	     meta->func_id != special_kfunc_list[KF_bpf_list_push_back])) {
> > > +		verbose(env, "verifier internal error: bpf_list_head argument for unknown kfunc\n");
> >
> > typo. bpf_list_node ?
> >
> > > +		return -EFAULT;
> > > +	}
> > > +
> > > +	if (!tnum_is_const(reg->var_off)) {
> > > +		verbose(env,
> > > +			"R%d doesn't have constant offset. bpf_list_head has to be at the constant offset\n",
> >
> > same typo?
> >
> 
> These two are typos.
> 
> > > +			regno);
> > > +		return -EINVAL;
> > > +	}
> > > +
> > > +	struct_meta = btf_find_struct_meta(reg->btf, reg->btf_id);
> > > +	if (!struct_meta) {
> > > +		verbose(env, "bpf_list_node not found for allocated object\n");
> > > +		return -EINVAL;
> > > +	}
> > > +	rec = struct_meta->record;
> > > +
> > > +	list_node_off = reg->off + reg->var_off.value;
> > > +	field = btf_record_find(rec, list_node_off, BPF_LIST_NODE);
> > > +	if (!field || field->offset != list_node_off) {
> > > +		verbose(env, "bpf_list_node not found at offset=%u\n", list_node_off);
> > > +		return -EINVAL;
> > > +	}
> > > +
> > > +	field = meta->arg_list_head.field;
> > > +
> > > +	if (!btf_struct_ids_match(&env->log, reg->btf, reg->btf_id, 0, field->list_head.btf,
> > > +				  field->list_head.value_btf_id, true)) {
> > > +		verbose(env, "bpf_list_head value type does not match arg#1\n");
> >
> > and the same typo again?!
> >
> 
> This is probably just poorly worded.
> The value type (__contains) of bpf_list_head does not match arg#1 (node).
> 
> Maybe this is better:
> bpf_list_node type does not match bpf_list_head value type?

That would be the case when a user is trying to bpf_list_push_front
to a head whose __contains tag points to a different node type?
It feels like users will hit this error case from time to time,
so the most verbose message is best.
Both options above are a bit cryptic.
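
To spell the case out with hypothetical types (using the __contains and
bpf_obj_new additions from this series' bpf_experimental.h; the private()
macro is sketched from the selftests):

#include <vmlinux.h>
#include <bpf/bpf_helpers.h>
#include "bpf_experimental.h"

#define private(name) SEC(".data." #name) __hidden __attribute__((aligned(8)))

struct foo {
	int data;
	struct bpf_list_node node;
};

struct bar {
	int data;
	struct bpf_list_node node;
};

private(A) struct bpf_spin_lock glock;
private(A) struct bpf_list_head ghead __contains(foo, node);

SEC("tc")
int wrong_node_type(struct __sk_buff *ctx)
{
	struct bar *b = bpf_obj_new(typeof(*b));

	if (!b)
		return 0;
	bpf_spin_lock(&glock);
	/* ghead is declared __contains(foo, node), but this node is
	 * embedded in struct bar, so the verifier must reject the
	 * program at load time with the message being discussed.
	 */
	bpf_list_push_front(&ghead, &b->node);
	bpf_spin_unlock(&glock);
	return 0;
}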


* Re: [PATCH bpf-next v7 20/26] bpf: Introduce single ownership BPF linked list API
  2022-11-15 18:26       ` Alexei Starovoitov
@ 2022-11-15 19:36         ` Kumar Kartikeya Dwivedi
  2022-11-15 19:45           ` Alexei Starovoitov
  0 siblings, 1 reply; 48+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2022-11-15 19:36 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Martin KaFai Lau, Dave Marchevsky

On Tue, Nov 15, 2022 at 11:56:16PM IST, Alexei Starovoitov wrote:
> On Tue, Nov 15, 2022 at 10:29:51PM +0530, Kumar Kartikeya Dwivedi wrote:
> > On Tue, Nov 15, 2022 at 11:56:37AM IST, Alexei Starovoitov wrote:
> > > On Tue, Nov 15, 2022 at 12:45:41AM +0530, Kumar Kartikeya Dwivedi wrote:
> > > > +}
> > > > +
> > > > +static int process_kf_arg_ptr_to_list_node(struct bpf_verifier_env *env,
> > > > +					   struct bpf_reg_state *reg, u32 regno,
> > > > +					   struct bpf_kfunc_call_arg_meta *meta)
> > > > +{
> > > > +	struct btf_struct_meta *struct_meta;
> > > > +	struct btf_field *field;
> > > > +	struct btf_record *rec;
> > > > +	u32 list_node_off;
> > > > +
> > > > +	if (meta->btf != btf_vmlinux ||
> > > > +	    (meta->func_id != special_kfunc_list[KF_bpf_list_push_front] &&
> > > > +	     meta->func_id != special_kfunc_list[KF_bpf_list_push_back])) {
> > > > +		verbose(env, "verifier internal error: bpf_list_head argument for unknown kfunc\n");
> > >
> > > typo. bpf_list_node ?
> > >
> > > > +		return -EFAULT;
> > > > +	}
> > > > +
> > > > +	if (!tnum_is_const(reg->var_off)) {
> > > > +		verbose(env,
> > > > +			"R%d doesn't have constant offset. bpf_list_head has to be at the constant offset\n",
> > >
> > > same typo?
> > >
> >
> > These two are typos.
> >
> > > > +			regno);
> > > > +		return -EINVAL;
> > > > +	}
> > > > +
> > > > +	struct_meta = btf_find_struct_meta(reg->btf, reg->btf_id);
> > > > +	if (!struct_meta) {
> > > > +		verbose(env, "bpf_list_node not found for allocated object\n");
> > > > +		return -EINVAL;
> > > > +	}
> > > > +	rec = struct_meta->record;
> > > > +
> > > > +	list_node_off = reg->off + reg->var_off.value;
> > > > +	field = btf_record_find(rec, list_node_off, BPF_LIST_NODE);
> > > > +	if (!field || field->offset != list_node_off) {
> > > > +		verbose(env, "bpf_list_node not found at offset=%u\n", list_node_off);
> > > > +		return -EINVAL;
> > > > +	}
> > > > +
> > > > +	field = meta->arg_list_head.field;
> > > > +
> > > > +	if (!btf_struct_ids_match(&env->log, reg->btf, reg->btf_id, 0, field->list_head.btf,
> > > > +				  field->list_head.value_btf_id, true)) {
> > > > +		verbose(env, "bpf_list_head value type does not match arg#1\n");
> > >
> > > and the same typo again?!
> > >
> >
> > This is probably just poorly worded.
> > The value type (__contains) of bpf_list_head does not match arg#1 (node).
> >
> > Maybe this is better:
> > bpf_list_node type does not match bpf_list_head value type?
>
> That would be the case when a user is trying to bpf_list_push_front
> to a head whose __contains tag points to a different node type?

Right.

> It feels like users will hit this error case from time to time,
> so the most verbose message is best.
> Both options above are a bit cryptic.

How about something like this?

operation on bpf_list_head expects node at offset=X in struct foo, but
node is at offset=Y in struct bar


* Re: [PATCH bpf-next v7 12/26] bpf: Allow locking bpf_spin_lock global variables
  2022-11-15 17:33   ` Dave Marchevsky
@ 2022-11-15 19:37     ` Kumar Kartikeya Dwivedi
  0 siblings, 0 replies; 48+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2022-11-15 19:37 UTC (permalink / raw)
  To: Dave Marchevsky
  Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Martin KaFai Lau

On Tue, Nov 15, 2022 at 11:03:02PM IST, Dave Marchevsky wrote:
> On 11/14/22 2:15 PM, Kumar Kartikeya Dwivedi wrote:
> > Global variables reside in maps accessible using direct_value_addr
> > callbacks, so giving each load instruction's rewrite a unique reg->id
> > prevents us from holding locks that are global.
> >
> > The reason for preserving reg->id as a unique value for registers that
> > may point to spin lock is that two separate lookups are treated as two
> > separate memory regions, and any possible aliasing is ignored for the
> > purposes of spin lock correctness.
> >
> > This is especially not great for the global variable case: globals are
> > served from maps that have max_entries == 1, i.e. every lookup yields a
> > pointer into the same map value.
> >
> > So refactor the active_spin_lock into a 'active_lock' structure which
> > represents the lock identity, and instead of the reg->id, remember two
> > fields, a pointer and the reg->id. The pointer will store reg->map_ptr
> > or reg->btf. It's only necessary to distinguish for the id == 0 case of
> > global variables, but always setting the pointer to a non-NULL value and
> > using the pointer to check whether the lock is held simplifies code in
> > the verifier.
> >
> > This is generic enough to allow it for global variables, map lookups,
> > and allocated objects at the same time.
> >
> > Note that while whether a lock is held can be answered by just comparing
> > active_lock.ptr to NULL, to determine whether the register is pointing
> > to the same held lock requires comparing _both_ ptr and id.
> >
> > Finally, as a result of this refactoring, pseudo load instructions are
> > not given a unique reg->id, as they are doing lookup for the same map
> > value (max_entries is never greater than 1).
> >
> > Essentially, we consider that the tuple of (ptr, id) will always be
> > unique for any kind of argument to bpf_spin_{lock,unlock}.
> >
> > Note that this can be extended in the future to also remember offset
> > used for locking, so that we can introduce multiple bpf_spin_lock fields
> > in the same allocation.
> >
> > Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
> > ---
> >  include/linux/bpf_verifier.h | 10 ++++++++-
> >  kernel/bpf/verifier.c        | 41 ++++++++++++++++++++++++------------
> >  2 files changed, 37 insertions(+), 14 deletions(-)
> >
> > diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h
> > index 1a32baa78ce2..fa738abea267 100644
> > --- a/include/linux/bpf_verifier.h
> > +++ b/include/linux/bpf_verifier.h
> > @@ -323,7 +323,15 @@ struct bpf_verifier_state {
> >  	u32 branches;
> >  	u32 insn_idx;
> >  	u32 curframe;
> > -	u32 active_spin_lock;
> > +	struct {
> > +		/* This can either be reg->map_ptr or reg->btf, but it is only
> > +		 * used to check whether the lock is held or not by comparing to
> > +		 * NULL.
> > +		 */
> > +		void *ptr;
> > +		/* This will be reg->id */
> > +		u32 id;
> > +	} active_lock;
>
> I didn't get back to you re: naming here, but I think these names are clear,
> especially with comments elaborating on the details. The first comment could
> be clarified a bit, though: "is only used to check whether the lock is held or
> not by comparing to NULL" sounds like active_lock.ptr is only ever compared to
> NULL, but the process_spin_lock change that produces verbose(env,
> "bpf_spin_unlock of different lock\n") also compares it to another ptr.
>
> Maybe you're trying to say "if active_lock.ptr
> is NULL, there's no active lock and other fields in this struct don't hold
> anything valid. If non-NULL, there is an active lock held"?
>

Makes sense, I'll improve the comment.
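
Something along these lines, perhaps, folding in the (ptr, id) uniqueness
note from the summary:

	struct {
		/* This can either be reg->map_ptr or reg->btf. If ptr is
		 * NULL, there is no active lock held, and the other fields
		 * have no meaning. If non-NULL, a lock is held, and the
		 * tuple of (ptr, id) is considered unique for any kind of
		 * argument to bpf_spin_{lock,unlock}.
		 */
		void *ptr;
		/* This will be reg->id */
		u32 id;
	} active_lock;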

> Separately, the line in patch summary with "we consider that the tuple of
> (ptr, id) will always be unique" would help with clarity if it was in the
> comments here.
>

Ack.

> >  	bool speculative;
> >
> >  	/* first and last insn idx of this verifier state */
> > diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> > index 070d003a99f0..99b5edb56978 100644
> > --- a/kernel/bpf/verifier.c
> > +++ b/kernel/bpf/verifier.c
> > @@ -1215,7 +1215,8 @@ static int copy_verifier_state(struct bpf_verifier_state *dst_state,
> >  	}
> >  	dst_state->speculative = src->speculative;
> >  	dst_state->curframe = src->curframe;
> > -	dst_state->active_spin_lock = src->active_spin_lock;
> > +	dst_state->active_lock.ptr = src->active_lock.ptr;
> > +	dst_state->active_lock.id = src->active_lock.id;
> >  	dst_state->branches = src->branches;
> >  	dst_state->parent = src->parent;
> >  	dst_state->first_insn_idx = src->first_insn_idx;
> > @@ -5587,7 +5588,7 @@ int check_kfunc_mem_size_reg(struct bpf_verifier_env *env, struct bpf_reg_state
> >   * Since only one bpf_spin_lock is allowed the checks are simpler than
> >   * reg_is_refcounted() logic. The verifier needs to remember only
> >   * one spin_lock instead of array of acquired_refs.
> > - * cur_state->active_spin_lock remembers which map value element got locked
> > + * cur_state->active_lock remembers which map value element got locked
>
> Maybe remove "map value" here, since the previous patch adds support for
> allocated objects? "remembers which element got locked" or "remembers which
> map value or allocated object got locked" would better reflect the state of
> spin_lock support after these patches.
>

Ack, I should also fix the other stale references to the old scheme in this
whole comment block.
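
E.g. the block above process_spin_lock() could end up saying something like:

	/* Since only one bpf_spin_lock is allowed the checks are simpler than
	 * reg_is_refcounted() logic. The verifier needs to remember only
	 * one spin_lock instead of array of acquired_refs.
	 * cur_state->active_lock remembers which map value element or
	 * allocated object got locked and clears it after bpf_spin_unlock.
	 */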

Thanks for the review.


* Re: [PATCH bpf-next v7 20/26] bpf: Introduce single ownership BPF linked list API
  2022-11-15 19:36         ` Kumar Kartikeya Dwivedi
@ 2022-11-15 19:45           ` Alexei Starovoitov
  0 siblings, 0 replies; 48+ messages in thread
From: Alexei Starovoitov @ 2022-11-15 19:45 UTC (permalink / raw)
  To: Kumar Kartikeya Dwivedi
  Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Martin KaFai Lau, Dave Marchevsky

On Tue, Nov 15, 2022 at 11:36 AM Kumar Kartikeya Dwivedi
<memxor@gmail.com> wrote:
>
> > It feels like users will hit this error case from time to time,
> > so the most verbose message is best.
> > Both options above are a bit cryptic.
>
> How about something like this?
>
> operation on bpf_list_head expects node at offset=X in struct foo, but
> node is at offset=Y in struct bar

Excellent. That would be the most accurate error message.


Thread overview: 48+ messages
2022-11-14 19:15 [PATCH bpf-next v7 00/26] Allocated objects, BPF linked lists Kumar Kartikeya Dwivedi
2022-11-14 19:15 ` [PATCH bpf-next v7 01/26] bpf: Remove local kptr references in documentation Kumar Kartikeya Dwivedi
2022-11-14 19:15 ` [PATCH bpf-next v7 02/26] bpf: Remove BPF_MAP_OFF_ARR_MAX Kumar Kartikeya Dwivedi
2022-11-14 19:15 ` [PATCH bpf-next v7 03/26] bpf: Fix copy_map_value, zero_map_value Kumar Kartikeya Dwivedi
2022-11-14 19:15 ` [PATCH bpf-next v7 04/26] bpf: Support bpf_list_head in map values Kumar Kartikeya Dwivedi
2022-11-14 19:15 ` [PATCH bpf-next v7 05/26] bpf: Rename RET_PTR_TO_ALLOC_MEM Kumar Kartikeya Dwivedi
2022-11-14 19:15 ` [PATCH bpf-next v7 06/26] bpf: Rename MEM_ALLOC to MEM_RINGBUF Kumar Kartikeya Dwivedi
2022-11-14 19:15 ` [PATCH bpf-next v7 07/26] bpf: Refactor btf_struct_access Kumar Kartikeya Dwivedi
2022-11-14 19:15 ` [PATCH bpf-next v7 08/26] bpf: Introduce allocated objects support Kumar Kartikeya Dwivedi
2022-11-15  5:48   ` Alexei Starovoitov
2022-11-15 16:47     ` Kumar Kartikeya Dwivedi
2022-11-14 19:15 ` [PATCH bpf-next v7 09/26] bpf: Recognize lock and list fields in allocated objects Kumar Kartikeya Dwivedi
2022-11-15  5:50   ` Alexei Starovoitov
2022-11-15 16:48     ` Kumar Kartikeya Dwivedi
2022-11-14 19:15 ` [PATCH bpf-next v7 10/26] bpf: Verify ownership relationships for user BTF types Kumar Kartikeya Dwivedi
2022-11-14 19:15 ` [PATCH bpf-next v7 11/26] bpf: Allow locking bpf_spin_lock in allocated objects Kumar Kartikeya Dwivedi
2022-11-14 19:15 ` [PATCH bpf-next v7 12/26] bpf: Allow locking bpf_spin_lock global variables Kumar Kartikeya Dwivedi
2022-11-15 17:33   ` Dave Marchevsky
2022-11-15 19:37     ` Kumar Kartikeya Dwivedi
2022-11-14 19:15 ` [PATCH bpf-next v7 13/26] bpf: Allow locking bpf_spin_lock in inner map values Kumar Kartikeya Dwivedi
2022-11-14 19:15 ` [PATCH bpf-next v7 14/26] bpf: Rewrite kfunc argument handling Kumar Kartikeya Dwivedi
2022-11-15  6:10   ` Alexei Starovoitov
2022-11-15 16:51     ` Kumar Kartikeya Dwivedi
2022-11-14 19:15 ` [PATCH bpf-next v7 15/26] bpf: Drop kfunc bits from btf_check_func_arg_match Kumar Kartikeya Dwivedi
2022-11-14 19:15 ` [PATCH bpf-next v7 16/26] bpf: Support constant scalar arguments for kfuncs Kumar Kartikeya Dwivedi
2022-11-14 19:15 ` [PATCH bpf-next v7 17/26] bpf: Introduce bpf_obj_new Kumar Kartikeya Dwivedi
2022-11-15  6:19   ` Alexei Starovoitov
2022-11-15 16:53     ` Kumar Kartikeya Dwivedi
2022-11-14 19:15 ` [PATCH bpf-next v7 18/26] bpf: Introduce bpf_obj_drop Kumar Kartikeya Dwivedi
2022-11-14 19:15 ` [PATCH bpf-next v7 19/26] bpf: Permit NULL checking pointer with non-zero fixed offset Kumar Kartikeya Dwivedi
2022-11-15  6:23   ` Alexei Starovoitov
2022-11-15 16:57     ` Kumar Kartikeya Dwivedi
2022-11-14 19:15 ` [PATCH bpf-next v7 20/26] bpf: Introduce single ownership BPF linked list API Kumar Kartikeya Dwivedi
2022-11-15  6:26   ` Alexei Starovoitov
2022-11-15 16:59     ` Kumar Kartikeya Dwivedi
2022-11-15 18:26       ` Alexei Starovoitov
2022-11-15 19:36         ` Kumar Kartikeya Dwivedi
2022-11-15 19:45           ` Alexei Starovoitov
2022-11-14 19:15 ` [PATCH bpf-next v7 21/26] bpf: Add 'release on unlock' logic for bpf_list_push_{front,back} Kumar Kartikeya Dwivedi
2022-11-15 16:52   ` Dave Marchevsky
2022-11-15 17:01     ` Kumar Kartikeya Dwivedi
2022-11-14 19:15 ` [PATCH bpf-next v7 22/26] selftests/bpf: Add __contains macro to bpf_experimental.h Kumar Kartikeya Dwivedi
2022-11-15 16:40   ` Dave Marchevsky
2022-11-14 19:15 ` [PATCH bpf-next v7 23/26] selftests/bpf: Update spinlock selftest Kumar Kartikeya Dwivedi
2022-11-14 19:15 ` [PATCH bpf-next v7 24/26] selftests/bpf: Add failure test cases for spin lock pairing Kumar Kartikeya Dwivedi
2022-11-14 19:15 ` [PATCH bpf-next v7 25/26] selftests/bpf: Add BPF linked list API tests Kumar Kartikeya Dwivedi
2022-11-14 19:15 ` [PATCH bpf-next v7 26/26] selftests/bpf: Add BTF sanity tests Kumar Kartikeya Dwivedi
2022-11-15  6:00 ` [PATCH bpf-next v7 00/26] Allocated objects, BPF linked lists patchwork-bot+netdevbpf
