* [PATCH bpf-next v3 00/13] Introduce typed pointer support in BPF maps
@ 2022-03-20 15:54 Kumar Kartikeya Dwivedi
  2022-03-20 15:54 ` [PATCH bpf-next v3 01/13] bpf: Make btf_find_field more generic Kumar Kartikeya Dwivedi
                   ` (12 more replies)
  0 siblings, 13 replies; 44+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2022-03-20 15:54 UTC (permalink / raw)
  To: bpf
  Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Toke Høiland-Jørgensen, Jesper Dangaard Brouer

This set enables storing pointers of a certain type in a BPF map, and extends
the verifier to enforce type safety and lifetime correctness properties.

The infrastructure being added is generic enough to allow storing any kind of
pointer whose type is available using BTF (user or kernel) in the future
(e.g. strongly typed memory allocation in a BPF program). Such pointers are
internally tracked in the verifier as PTR_TO_BTF_ID, but for now the series
limits them to two kinds of pointers obtained from the kernel.

Obviously, use of this feature depends on map BTF.

1. Unreferenced kernel pointer

In this case, there are very few restrictions. The pointer type being stored
must match the type declared in the map value. However, such a pointer, when
loaded from the map, can only be dereferenced, not passed to any in-kernel
helpers or kernel functions available to the program. This is because while
the verifier's exception handling mechanism converts BPF_LDX to PROBE_MEM
loads, which are then handled specially by the JIT implementation, the same
liberty is not available to accesses inside the kernel. By the time the
pointer is passed into a helper, there are no lifetime guarantees about the
object it points to, and it may well be referencing invalid memory.
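
As a rough sketch of the intended usage (the map, program, and section names
are made up for this example; the __kptr macro is the one introduced later in
the series):

	#include <vmlinux.h>
	#include <bpf/bpf_helpers.h>

	#define __kptr __attribute__((btf_type_tag("kptr")))

	struct map_value {
		struct task_struct __kptr *task;
	};

	struct {
		__uint(type, BPF_MAP_TYPE_ARRAY);
		__uint(max_entries, 1);
		__type(key, int);
		__type(value, struct map_value);
	} array_map SEC(".maps");

	SEC("tc")
	int deref_unref_kptr(struct __sk_buff *ctx)
	{
		struct task_struct *t;
		struct map_value *v;
		int key = 0;

		v = bpf_map_lookup_elem(&array_map, &key);
		if (!v)
			return 0;
		/* The load of v->task marks t as PTR_TO_BTF_ID_OR_NULL; after
		 * the NULL check it may only be dereferenced (BPF_LDX becomes
		 * a PROBE_MEM load), not passed to helpers or kfuncs.
		 */
		t = v->task;
		if (t)
			bpf_printk("pid: %d", t->pid);
		return 0;
	}

	char _license[] SEC("license") = "GPL";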

2. Referenced kernel pointer

This case imposes many more restrictions on the programmer, to ensure safety.
To transfer ownership of a reference held by the BPF program to the map, the
user must use the bpf_kptr_xchg helper, which returns the old pointer
contained in the map as an acquired reference, and releases the verifier's
reference state for the pointer being exchanged, as it moves into the map.

The returned pointer is a normal PTR_TO_BTF_ID that can be used with
in-kernel helpers and kernel functions callable by the program.

However, if BPF_LDX is used to load a referenced pointer from the map, it is
still not permitted to pass it to in-kernel helpers or kernel functions. To
obtain a reference usable with helpers, the user must invoke a kfunc helper
which returns a usable reference (which also must be eventually released before
BPF_EXIT, or moved into a map).

Since the load of the pointer (preserving data dependency ordering) must happen
inside the RCU read section, the kfunc helper will take a pointer to the map
value, which must point to the actual pointer of the object whose reference is
to be raised. The type will be verified from the BTF information of the kfunc,
as the prototype must be:

	T *func(T **, ... /* other arguments */);

Then, the verifier checks whether the pointer at that offset in the map value
points to type T, and permits the call.

This convention is followed so that such helpers may also be called from
sleepable BPF programs, where the RCU read lock is not necessarily held in
the BPF program context; hence the need to pass in a pointer to the actual
pointer, so that the load can be performed inside the kfunc's own RCU read
section.
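
For illustration, a rough sketch of how such a kfunc might be used. struct
foo, foo_kptr_get and foo_release are hypothetical stand-ins for a kernel
type and its kfuncs, named only for this example; the __kptr_ref macro is the
one added to bpf_helpers.h later in the series, and the usual includes are
assumed:

	#define __kptr_ref __attribute__((btf_type_tag("kptr_ref")))

	/* Hypothetical kernel type and kfuncs, named only for this example */
	struct foo;
	extern struct foo *foo_kptr_get(struct foo **pp) __ksym;
	extern void foo_release(struct foo *p) __ksym;

	struct map_value {
		struct foo __kptr_ref *ptr;
	};

	static int use_stored_ref(struct map_value *v)
	{
		struct foo *p;

		/* Pass a pointer to the kptr slot itself; the kfunc performs
		 * the load and refcount increment inside its own RCU read
		 * section.
		 */
		p = foo_kptr_get(&v->ptr);
		if (!p)
			return 0;
		/* p is now an acquired reference, usable with helpers and
		 * kfuncs; it must be released (or moved into a map) before
		 * BPF_EXIT.
		 */
		foo_release(p);
		return 0;
	}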

Notes
-----

 * C selftests require https://reviews.llvm.org/D119799 to pass.
 * Unlike BPF timers, kptr is not reset or freed on map_release_uref.
 * Referenced kptr storage is always treated as unsigned long * on the kernel
   side, as the BPF side cannot mutate it. The storage (8 bytes) is sufficient
   for both 32-bit and 64-bit platforms.
 * Use of WRITE_ONCE to reset an unreferenced kptr on 32-bit systems is fine:
   the actual pointer is always word sized, so the store tearing into two
   32-bit stores is not a problem, as the other half is always zeroed out.

Changelog:
----------
v2 -> v3
v2: https://lore.kernel.org/bpf/20220317115957.3193097-1-memxor@gmail.com

 * Address comments from Alexei
   * Set name, sz, align in btf_find_field
   * Do idx >= info_cnt check in caller of btf_find_field_*
     * Use extra element in the info_arr to make this safe
   * Remove while loop, reject extra tags
   * Remove cases of defensive programming
   * Move bpf_capable() check to map_check_btf
   * Put check_ptr_off_reg reordering hunk into separate patch
   * Warn for ref_ptr once
   * Make the meta.ref_obj_id == 0 case simpler to read
   * Remove kptr_percpu and kptr_user support, remove their tests
   * Store size of field at offset in off_arr
 * Fix BPF_F_NO_PREALLOC set wrongly for hash map in C selftest
 * Add missing check_mem_reg call for kptr_get kfunc arg#0 check

v1 -> v2
v1: https://lore.kernel.org/bpf/20220220134813.3411982-1-memxor@gmail.com

 * Address comments from Alexei
   * Rename bpf_btf_find_by_name_kind_all to bpf_find_btf_id
   * Reduce indentation level in that function
   * Always take reference regardless of module or vmlinux BTF
   * Also made it the same for btf_get_module_btf
   * Use kptr, kptr_ref, kptr_percpu, kptr_user type tags
   * Don't reserve tag namespace
   * Refactor btf_find_field to be side effect free, allocate and populate
     kptr_off_tab in caller
   * Move module reference to dtor patch
   * Remove support for BPF_XCHG, BPF_CMPXCHG insn
   * Introduce bpf_kptr_xchg helper
   * Embed offset array in struct bpf_map, populate and sort it once
   * Adjust copy_map_value to memcpy directly using this offset array
   * Removed size member from offset array to save space
 * Fix some problems pointed out by kernel test robot
 * Tidy selftests
 * Lots of other minor fixes

Kumar Kartikeya Dwivedi (13):
  bpf: Make btf_find_field more generic
  bpf: Move check_ptr_off_reg before check_map_access
  bpf: Allow storing unreferenced kptr in map
  bpf: Indicate argument that will be released in bpf_func_proto
  bpf: Allow storing referenced kptr in map
  bpf: Prevent escaping of kptr loaded from maps
  bpf: Adapt copy_map_value for multiple offset case
  bpf: Populate pairs of btf_id and destructor kfunc in btf
  bpf: Wire up freeing of referenced kptr
  bpf: Teach verifier about kptr_get kfunc helpers
  libbpf: Add kptr type tag macros to bpf_helpers.h
  selftests/bpf: Add C tests for kptr
  selftests/bpf: Add verifier tests for kptr

 include/linux/bpf.h                           | 113 +++-
 include/linux/btf.h                           |  23 +
 include/uapi/linux/bpf.h                      |  12 +
 kernel/bpf/arraymap.c                         |  14 +-
 kernel/bpf/btf.c                              | 506 ++++++++++++++++--
 kernel/bpf/hashtab.c                          |  29 +-
 kernel/bpf/helpers.c                          |  22 +
 kernel/bpf/map_in_map.c                       |   5 +-
 kernel/bpf/ringbuf.c                          |   2 +
 kernel/bpf/syscall.c                          | 211 +++++++-
 kernel/bpf/verifier.c                         | 379 +++++++++++--
 net/bpf/test_run.c                            |  39 +-
 net/core/filter.c                             |   1 +
 tools/include/uapi/linux/bpf.h                |  12 +
 tools/lib/bpf/bpf_helpers.h                   |   2 +
 .../selftests/bpf/prog_tests/map_kptr.c       |  20 +
 tools/testing/selftests/bpf/progs/map_kptr.c  | 194 +++++++
 tools/testing/selftests/bpf/test_verifier.c   |  49 +-
 .../testing/selftests/bpf/verifier/map_kptr.c | 445 +++++++++++++++
 19 files changed, 1931 insertions(+), 147 deletions(-)
 create mode 100644 tools/testing/selftests/bpf/prog_tests/map_kptr.c
 create mode 100644 tools/testing/selftests/bpf/progs/map_kptr.c
 create mode 100644 tools/testing/selftests/bpf/verifier/map_kptr.c

-- 
2.35.1



* [PATCH bpf-next v3 01/13] bpf: Make btf_find_field more generic
  2022-03-20 15:54 [PATCH bpf-next v3 00/13] Introduce typed pointer support in BPF maps Kumar Kartikeya Dwivedi
@ 2022-03-20 15:54 ` Kumar Kartikeya Dwivedi
  2022-03-20 15:54 ` [PATCH bpf-next v3 02/13] bpf: Move check_ptr_off_reg before check_map_access Kumar Kartikeya Dwivedi
                   ` (11 subsequent siblings)
  12 siblings, 0 replies; 44+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2022-03-20 15:54 UTC (permalink / raw)
  To: bpf
  Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Toke Høiland-Jørgensen, Jesper Dangaard Brouer

The next commit's field type will not be a struct, but a pointer, and it will
not be limited to one offset, but multiple ones. Make the existing
btf_find_struct_field and btf_find_datasec_var functions amenable to use for
finding BTF ID pointers in a map value, by moving the spin_lock and timer
specific checks into their own function.

The alignment and name are checked before the function is called, so it is
the last point where we can skip the field or return an error before the next
loop iteration happens. The name parameter is now optional, and is only
checked if it is not NULL.

The size must be checked inside the function, because in the PTR case the
member type will instead refer to the underlying BTF ID it is pointing to
(possibly behind modifiers), so the check would be wrong to do outside the
function, and the base type has to be obtained by removing the modifiers.

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
---
 kernel/bpf/btf.c | 129 +++++++++++++++++++++++++++++++++++------------
 1 file changed, 96 insertions(+), 33 deletions(-)

diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c
index 6d9e711cb5d4..9e17af936a7a 100644
--- a/kernel/bpf/btf.c
+++ b/kernel/bpf/btf.c
@@ -3161,71 +3161,126 @@ static void btf_struct_log(struct btf_verifier_env *env,
 	btf_verifier_log(env, "size=%u vlen=%u", t->size, btf_type_vlen(t));
 }
 
+enum {
+	BTF_FIELD_SPIN_LOCK,
+	BTF_FIELD_TIMER,
+};
+
+struct btf_field_info {
+	u32 off;
+};
+
+static int btf_find_field_struct(const struct btf *btf, const struct btf_type *t,
+				 u32 off, int sz, struct btf_field_info *info)
+{
+	if (!__btf_type_is_struct(t))
+		return 0;
+	if (t->size != sz)
+		return 0;
+	if (info->off != -ENOENT)
+		/* only one such field is allowed */
+		return -E2BIG;
+	info->off = off;
+	return 0;
+}
+
 static int btf_find_struct_field(const struct btf *btf, const struct btf_type *t,
-				 const char *name, int sz, int align)
+				 const char *name, int sz, int align, int field_type,
+				 struct btf_field_info *info)
 {
 	const struct btf_member *member;
-	u32 i, off = -ENOENT;
+	u32 i, off;
+	int ret;
 
 	for_each_member(i, t, member) {
 		const struct btf_type *member_type = btf_type_by_id(btf,
 								    member->type);
-		if (!__btf_type_is_struct(member_type))
-			continue;
-		if (member_type->size != sz)
-			continue;
-		if (strcmp(__btf_name_by_offset(btf, member_type->name_off), name))
-			continue;
-		if (off != -ENOENT)
-			/* only one such field is allowed */
-			return -E2BIG;
+
 		off = __btf_member_bit_offset(t, member);
+
+		if (name && strcmp(__btf_name_by_offset(btf, member_type->name_off), name))
+			continue;
 		if (off % 8)
 			/* valid C code cannot generate such BTF */
 			return -EINVAL;
 		off /= 8;
 		if (off % align)
 			return -EINVAL;
+
+		switch (field_type) {
+		case BTF_FIELD_SPIN_LOCK:
+		case BTF_FIELD_TIMER:
+			ret = btf_find_field_struct(btf, member_type, off, sz, info);
+			if (ret < 0)
+				return ret;
+			break;
+		default:
+			return -EFAULT;
+		}
 	}
-	return off;
+	return 0;
 }
 
 static int btf_find_datasec_var(const struct btf *btf, const struct btf_type *t,
-				const char *name, int sz, int align)
+				const char *name, int sz, int align, int field_type,
+				struct btf_field_info *info)
 {
 	const struct btf_var_secinfo *vsi;
-	u32 i, off = -ENOENT;
+	u32 i, off;
+	int ret;
 
 	for_each_vsi(i, t, vsi) {
 		const struct btf_type *var = btf_type_by_id(btf, vsi->type);
 		const struct btf_type *var_type = btf_type_by_id(btf, var->type);
 
-		if (!__btf_type_is_struct(var_type))
-			continue;
-		if (var_type->size != sz)
+		off = vsi->offset;
+
+		if (name && strcmp(__btf_name_by_offset(btf, var_type->name_off), name))
 			continue;
 		if (vsi->size != sz)
 			continue;
-		if (strcmp(__btf_name_by_offset(btf, var_type->name_off), name))
-			continue;
-		if (off != -ENOENT)
-			/* only one such field is allowed */
-			return -E2BIG;
-		off = vsi->offset;
 		if (off % align)
 			return -EINVAL;
+
+		switch (field_type) {
+		case BTF_FIELD_SPIN_LOCK:
+		case BTF_FIELD_TIMER:
+			ret = btf_find_field_struct(btf, var_type, off, sz, info);
+			if (ret < 0)
+				return ret;
+			break;
+		default:
+			return -EFAULT;
+		}
 	}
-	return off;
+	return 0;
 }
 
 static int btf_find_field(const struct btf *btf, const struct btf_type *t,
-			  const char *name, int sz, int align)
+			  int field_type, struct btf_field_info *info)
 {
+	const char *name;
+	int sz, align;
+
+	switch (field_type) {
+	case BTF_FIELD_SPIN_LOCK:
+		name = "bpf_spin_lock";
+		sz = sizeof(struct bpf_spin_lock);
+		align = __alignof__(struct bpf_spin_lock);
+		break;
+	case BTF_FIELD_TIMER:
+		name = "bpf_timer";
+		sz = sizeof(struct bpf_timer);
+		align = __alignof__(struct bpf_timer);
+		break;
+	default:
+		return -EFAULT;
+	}
 
 	if (__btf_type_is_struct(t))
-		return btf_find_struct_field(btf, t, name, sz, align);
+		return btf_find_struct_field(btf, t, name, sz, align, field_type, info);
 	else if (btf_type_is_datasec(t))
-		return btf_find_datasec_var(btf, t, name, sz, align);
+		return btf_find_datasec_var(btf, t, name, sz, align, field_type, info);
 	return -EINVAL;
 }
 
@@ -3235,16 +3290,24 @@ static int btf_find_field(const struct btf *btf, const struct btf_type *t,
  */
 int btf_find_spin_lock(const struct btf *btf, const struct btf_type *t)
 {
-	return btf_find_field(btf, t, "bpf_spin_lock",
-			      sizeof(struct bpf_spin_lock),
-			      __alignof__(struct bpf_spin_lock));
+	struct btf_field_info info = { .off = -ENOENT };
+	int ret;
+
+	ret = btf_find_field(btf, t, BTF_FIELD_SPIN_LOCK, &info);
+	if (ret < 0)
+		return ret;
+	return info.off;
 }
 
 int btf_find_timer(const struct btf *btf, const struct btf_type *t)
 {
-	return btf_find_field(btf, t, "bpf_timer",
-			      sizeof(struct bpf_timer),
-			      __alignof__(struct bpf_timer));
+	struct btf_field_info info = { .off = -ENOENT };
+	int ret;
+
+	ret = btf_find_field(btf, t, BTF_FIELD_TIMER, &info);
+	if (ret < 0)
+		return ret;
+	return info.off;
 }
 
 static void __btf_struct_show(const struct btf *btf, const struct btf_type *t,
-- 
2.35.1



* [PATCH bpf-next v3 02/13] bpf: Move check_ptr_off_reg before check_map_access
  2022-03-20 15:54 [PATCH bpf-next v3 00/13] Introduce typed pointer support in BPF maps Kumar Kartikeya Dwivedi
  2022-03-20 15:54 ` [PATCH bpf-next v3 01/13] bpf: Make btf_find_field more generic Kumar Kartikeya Dwivedi
@ 2022-03-20 15:54 ` Kumar Kartikeya Dwivedi
  2022-03-20 15:55 ` [PATCH bpf-next v3 03/13] bpf: Allow storing unreferenced kptr in map Kumar Kartikeya Dwivedi
                   ` (10 subsequent siblings)
  12 siblings, 0 replies; 44+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2022-03-20 15:54 UTC (permalink / raw)
  To: bpf
  Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Toke Høiland-Jørgensen, Jesper Dangaard Brouer

Some functions in the next patch want to use this function, and those
functions will be called by check_map_access, hence move it before
check_map_access.

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
---
 kernel/bpf/verifier.c | 76 +++++++++++++++++++++----------------------
 1 file changed, 38 insertions(+), 38 deletions(-)

diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 0287176bfe9a..4ce9a528fb63 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -3469,6 +3469,44 @@ static int check_mem_region_access(struct bpf_verifier_env *env, u32 regno,
 	return 0;
 }
 
+static int __check_ptr_off_reg(struct bpf_verifier_env *env,
+			       const struct bpf_reg_state *reg, int regno,
+			       bool fixed_off_ok)
+{
+	/* Access to this pointer-typed register or passing it to a helper
+	 * is only allowed in its original, unmodified form.
+	 */
+
+	if (reg->off < 0) {
+		verbose(env, "negative offset %s ptr R%d off=%d disallowed\n",
+			reg_type_str(env, reg->type), regno, reg->off);
+		return -EACCES;
+	}
+
+	if (!fixed_off_ok && reg->off) {
+		verbose(env, "dereference of modified %s ptr R%d off=%d disallowed\n",
+			reg_type_str(env, reg->type), regno, reg->off);
+		return -EACCES;
+	}
+
+	if (!tnum_is_const(reg->var_off) || reg->var_off.value) {
+		char tn_buf[48];
+
+		tnum_strn(tn_buf, sizeof(tn_buf), reg->var_off);
+		verbose(env, "variable %s access var_off=%s disallowed\n",
+			reg_type_str(env, reg->type), tn_buf);
+		return -EACCES;
+	}
+
+	return 0;
+}
+
+int check_ptr_off_reg(struct bpf_verifier_env *env,
+		      const struct bpf_reg_state *reg, int regno)
+{
+	return __check_ptr_off_reg(env, reg, regno, false);
+}
+
 /* check read/write into a map element with possible variable offset */
 static int check_map_access(struct bpf_verifier_env *env, u32 regno,
 			    int off, int size, bool zero_size_allowed)
@@ -3980,44 +4018,6 @@ static int get_callee_stack_depth(struct bpf_verifier_env *env,
 }
 #endif
 
-static int __check_ptr_off_reg(struct bpf_verifier_env *env,
-			       const struct bpf_reg_state *reg, int regno,
-			       bool fixed_off_ok)
-{
-	/* Access to this pointer-typed register or passing it to a helper
-	 * is only allowed in its original, unmodified form.
-	 */
-
-	if (reg->off < 0) {
-		verbose(env, "negative offset %s ptr R%d off=%d disallowed\n",
-			reg_type_str(env, reg->type), regno, reg->off);
-		return -EACCES;
-	}
-
-	if (!fixed_off_ok && reg->off) {
-		verbose(env, "dereference of modified %s ptr R%d off=%d disallowed\n",
-			reg_type_str(env, reg->type), regno, reg->off);
-		return -EACCES;
-	}
-
-	if (!tnum_is_const(reg->var_off) || reg->var_off.value) {
-		char tn_buf[48];
-
-		tnum_strn(tn_buf, sizeof(tn_buf), reg->var_off);
-		verbose(env, "variable %s access var_off=%s disallowed\n",
-			reg_type_str(env, reg->type), tn_buf);
-		return -EACCES;
-	}
-
-	return 0;
-}
-
-int check_ptr_off_reg(struct bpf_verifier_env *env,
-		      const struct bpf_reg_state *reg, int regno)
-{
-	return __check_ptr_off_reg(env, reg, regno, false);
-}
-
 static int __check_buffer_access(struct bpf_verifier_env *env,
 				 const char *buf_info,
 				 const struct bpf_reg_state *reg,
-- 
2.35.1



* [PATCH bpf-next v3 03/13] bpf: Allow storing unreferenced kptr in map
  2022-03-20 15:54 [PATCH bpf-next v3 00/13] Introduce typed pointer support in BPF maps Kumar Kartikeya Dwivedi
  2022-03-20 15:54 ` [PATCH bpf-next v3 01/13] bpf: Make btf_find_field more generic Kumar Kartikeya Dwivedi
  2022-03-20 15:54 ` [PATCH bpf-next v3 02/13] bpf: Move check_ptr_off_reg before check_map_access Kumar Kartikeya Dwivedi
@ 2022-03-20 15:55 ` Kumar Kartikeya Dwivedi
  2022-03-21 23:39   ` Joanne Koong
                     ` (2 more replies)
  2022-03-20 15:55 ` [PATCH bpf-next v3 04/13] bpf: Indicate argument that will be released in bpf_func_proto Kumar Kartikeya Dwivedi
                   ` (9 subsequent siblings)
  12 siblings, 3 replies; 44+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2022-03-20 15:55 UTC (permalink / raw)
  To: bpf
  Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Toke Høiland-Jørgensen, Jesper Dangaard Brouer

This commit introduces a new pointer type 'kptr' which can be embedded in a
map value and holds a PTR_TO_BTF_ID stored by a BPF program during its
invocation. When storing to such a kptr, the BPF program's PTR_TO_BTF_ID
register must have the same type as in the map value's BTF, and loading a
kptr marks the destination register as PTR_TO_BTF_ID with the correct kernel
BTF and BTF ID.

Such kptrs are unreferenced, i.e. by the time another invocation of the BPF
program loads this pointer, the object which the pointer points to may no
longer exist. Since PTR_TO_BTF_ID loads (using BPF_LDX) are patched to
PROBE_MEM loads by the verifier, it is safe to allow the user to still access
such an invalid pointer, but passing such pointers into BPF helpers and
kfuncs should not be permitted. A future patch in this series will close this
gap.

The flexibility offered by allowing programs to dereference such possibly
invalid pointers while remaining safe at runtime frees the verifier from
doing complex lifetime tracking. As long as the user can ensure that the
object remains valid, it can also ensure that the data it reads from the
kernel object is valid.

The user indicates that a certain pointer must be treated as a kptr capable
of accepting stores of PTR_TO_BTF_ID of a certain type, by using the BTF type
tag 'kptr' on the pointed-to type of the pointer. This information is then
recorded in the object BTF, which will be passed into the kernel by way of
the map's BTF information. The name and kind from the map value BTF are used
to look up the in-kernel type, and the actual BTF and BTF ID are recorded in
the map struct in a new kptr_off_tab member. For now, only storing pointers
to structs is permitted.

An example of this specification is shown below:

	#define __kptr __attribute__((btf_type_tag("kptr")))

	struct map_value {
		...
		struct task_struct __kptr *task;
		...
	};

Then, in a BPF program, the user may store a PTR_TO_BTF_ID with type
task_struct into the map, and load it back later.

Note that the destination register is marked PTR_TO_BTF_ID_OR_NULL: as the
verifier cannot statically know whether the stored value is NULL, it must
treat all loads at that map value offset as loading a possibly NULL pointer.

Only BPF_LDX, BPF_STX, and BPF_ST with insn->imm = 0 (to denote NULL) are
allowed to access such a pointer. On BPF_LDX, the destination register is
updated to be a PTR_TO_BTF_ID, and on BPF_STX, it is checked whether the
source register type is a PTR_TO_BTF_ID with the same BTF type as specified
in the map BTF. The access size must always be BPF_DW.
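
As a rough sketch of the resulting usage (assuming the map_value layout above
is wired into an array map named 'array', the usual vmlinux.h and
bpf_helpers.h includes, and using bpf_get_current_task_btf() as one way to
obtain a PTR_TO_BTF_ID of type task_struct):

	SEC("tp/syscalls/sys_enter_getpid")
	int kptr_access(void *ctx)
	{
		struct task_struct *t;
		struct map_value *v;
		int key = 0;

		v = bpf_map_lookup_elem(&array, &key);
		if (!v)
			return 0;

		/* BPF_STX: source register is a PTR_TO_BTF_ID of task_struct,
		 * matching the type recorded for this offset.
		 */
		v->task = bpf_get_current_task_btf();

		/* BPF_LDX: t is marked PTR_TO_BTF_ID_OR_NULL */
		t = v->task;
		if (t)
			bpf_printk("pid: %d", t->pid);

		/* Storing NULL (BPF_ST with imm = 0, or BPF_STX of a NULL
		 * register) clears the kptr.
		 */
		v->task = NULL;
		return 0;
	}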

For the map in map support, the kptr_off_tab for the outer map is copied
from the inner map's kptr_off_tab. A deep copy was chosen instead of
introducing a refcount on kptr_off_tab, because the copy only needs to be
done when parameterizing using inner_map_fd in the map in map case, and would
hence be unnecessary for all other users.

It is not permitted to use the MAP_FREEZE command or mmap on a BPF map having
a kptr, similar to the bpf_timer case.

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
---
 include/linux/bpf.h     |  29 +++++++-
 include/linux/btf.h     |   2 +
 kernel/bpf/btf.c        | 161 ++++++++++++++++++++++++++++++++++------
 kernel/bpf/map_in_map.c |   5 +-
 kernel/bpf/syscall.c    | 112 +++++++++++++++++++++++++++-
 kernel/bpf/verifier.c   | 120 ++++++++++++++++++++++++++++++
 6 files changed, 401 insertions(+), 28 deletions(-)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 88449fbbe063..f35920d279dd 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -155,6 +155,22 @@ struct bpf_map_ops {
 	const struct bpf_iter_seq_info *iter_seq_info;
 };
 
+enum {
+	/* Support at most 8 pointers in a BPF map value */
+	BPF_MAP_VALUE_OFF_MAX = 8,
+};
+
+struct bpf_map_value_off_desc {
+	u32 offset;
+	u32 btf_id;
+	struct btf *btf;
+};
+
+struct bpf_map_value_off {
+	u32 nr_off;
+	struct bpf_map_value_off_desc off[];
+};
+
 struct bpf_map {
 	/* The first two cachelines with read-mostly members of which some
 	 * are also accessed in fast-path (e.g. ops, max_entries).
@@ -171,6 +187,7 @@ struct bpf_map {
 	u64 map_extra; /* any per-map-type extra fields */
 	u32 map_flags;
 	int spin_lock_off; /* >=0 valid offset, <0 error */
+	struct bpf_map_value_off *kptr_off_tab;
 	int timer_off; /* >=0 valid offset, <0 error */
 	u32 id;
 	int numa_node;
@@ -184,7 +201,7 @@ struct bpf_map {
 	char name[BPF_OBJ_NAME_LEN];
 	bool bypass_spec_v1;
 	bool frozen; /* write-once; write-protected by freeze_mutex */
-	/* 14 bytes hole */
+	/* 6 bytes hole */
 
 	/* The 3rd and 4th cacheline with misc members to avoid false sharing
 	 * particularly with refcounting.
@@ -217,6 +234,11 @@ static inline bool map_value_has_timer(const struct bpf_map *map)
 	return map->timer_off >= 0;
 }
 
+static inline bool map_value_has_kptr(const struct bpf_map *map)
+{
+	return !IS_ERR_OR_NULL(map->kptr_off_tab);
+}
+
 static inline void check_and_init_map_value(struct bpf_map *map, void *dst)
 {
 	if (unlikely(map_value_has_spin_lock(map)))
@@ -1497,6 +1519,11 @@ void bpf_prog_put(struct bpf_prog *prog);
 void bpf_prog_free_id(struct bpf_prog *prog, bool do_idr_lock);
 void bpf_map_free_id(struct bpf_map *map, bool do_idr_lock);
 
+struct bpf_map_value_off_desc *bpf_map_kptr_off_contains(struct bpf_map *map, u32 offset);
+void bpf_map_free_kptr_off_tab(struct bpf_map *map);
+struct bpf_map_value_off *bpf_map_copy_kptr_off_tab(const struct bpf_map *map);
+bool bpf_map_equal_kptr_off_tab(const struct bpf_map *map_a, const struct bpf_map *map_b);
+
 struct bpf_map *bpf_map_get(u32 ufd);
 struct bpf_map *bpf_map_get_with_uref(u32 ufd);
 struct bpf_map *__bpf_map_get(struct fd f);
diff --git a/include/linux/btf.h b/include/linux/btf.h
index 36bc09b8e890..5b578dc81c04 100644
--- a/include/linux/btf.h
+++ b/include/linux/btf.h
@@ -123,6 +123,8 @@ bool btf_member_is_reg_int(const struct btf *btf, const struct btf_type *s,
 			   u32 expected_offset, u32 expected_size);
 int btf_find_spin_lock(const struct btf *btf, const struct btf_type *t);
 int btf_find_timer(const struct btf *btf, const struct btf_type *t);
+struct bpf_map_value_off *btf_find_kptr(const struct btf *btf,
+					const struct btf_type *t);
 bool btf_type_is_void(const struct btf_type *t);
 s32 btf_find_by_name_kind(const struct btf *btf, const char *name, u8 kind);
 const struct btf_type *btf_type_skip_modifiers(const struct btf *btf,
diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c
index 9e17af936a7a..92afbec0a887 100644
--- a/kernel/bpf/btf.c
+++ b/kernel/bpf/btf.c
@@ -3164,9 +3164,16 @@ static void btf_struct_log(struct btf_verifier_env *env,
 enum {
 	BTF_FIELD_SPIN_LOCK,
 	BTF_FIELD_TIMER,
+	BTF_FIELD_KPTR,
+};
+
+enum {
+	BTF_FIELD_IGNORE = 0,
+	BTF_FIELD_FOUND  = 1,
 };
 
 struct btf_field_info {
+	const struct btf_type *type;
 	u32 off;
 };
 
@@ -3174,23 +3181,48 @@ static int btf_find_field_struct(const struct btf *btf, const struct btf_type *t
 				 u32 off, int sz, struct btf_field_info *info)
 {
 	if (!__btf_type_is_struct(t))
-		return 0;
+		return BTF_FIELD_IGNORE;
 	if (t->size != sz)
-		return 0;
-	if (info->off != -ENOENT)
-		/* only one such field is allowed */
-		return -E2BIG;
+		return BTF_FIELD_IGNORE;
 	info->off = off;
-	return 0;
+	return BTF_FIELD_FOUND;
+}
+
+static int btf_find_field_kptr(const struct btf *btf, const struct btf_type *t,
+			       u32 off, int sz, struct btf_field_info *info)
+{
+	/* For PTR, sz is always == 8 */
+	if (!btf_type_is_ptr(t))
+		return BTF_FIELD_IGNORE;
+	t = btf_type_by_id(btf, t->type);
+
+	if (!btf_type_is_type_tag(t))
+		return BTF_FIELD_IGNORE;
+	/* Reject extra tags */
+	if (btf_type_is_type_tag(btf_type_by_id(btf, t->type)))
+		return -EINVAL;
+	if (strcmp("kptr", __btf_name_by_offset(btf, t->name_off)))
+		return -EINVAL;
+
+	/* Get the base type */
+	if (btf_type_is_modifier(t))
+		t = btf_type_skip_modifiers(btf, t->type, NULL);
+	/* Only pointer to struct is allowed */
+	if (!__btf_type_is_struct(t))
+		return -EINVAL;
+
+	info->type = t;
+	info->off = off;
+	return BTF_FIELD_FOUND;
 }
 
 static int btf_find_struct_field(const struct btf *btf, const struct btf_type *t,
 				 const char *name, int sz, int align, int field_type,
-				 struct btf_field_info *info)
+				 struct btf_field_info *info, int info_cnt)
 {
 	const struct btf_member *member;
+	int ret, idx = 0;
 	u32 i, off;
-	int ret;
 
 	for_each_member(i, t, member) {
 		const struct btf_type *member_type = btf_type_by_id(btf,
@@ -3210,24 +3242,35 @@ static int btf_find_struct_field(const struct btf *btf, const struct btf_type *t
 		switch (field_type) {
 		case BTF_FIELD_SPIN_LOCK:
 		case BTF_FIELD_TIMER:
-			ret = btf_find_field_struct(btf, member_type, off, sz, info);
+			ret = btf_find_field_struct(btf, member_type, off, sz, &info[idx]);
+			if (ret < 0)
+				return ret;
+			break;
+		case BTF_FIELD_KPTR:
+			ret = btf_find_field_kptr(btf, member_type, off, sz, &info[idx]);
 			if (ret < 0)
 				return ret;
 			break;
 		default:
 			return -EFAULT;
 		}
+
+		if (ret == BTF_FIELD_FOUND && idx >= info_cnt)
+			return -E2BIG;
+		else if (ret == BTF_FIELD_IGNORE)
+			continue;
+		++idx;
 	}
-	return 0;
+	return idx;
 }
 
 static int btf_find_datasec_var(const struct btf *btf, const struct btf_type *t,
 				const char *name, int sz, int align, int field_type,
-				struct btf_field_info *info)
+				struct btf_field_info *info, int info_cnt)
 {
 	const struct btf_var_secinfo *vsi;
+	int ret, idx = 0;
 	u32 i, off;
-	int ret;
 
 	for_each_vsi(i, t, vsi) {
 		const struct btf_type *var = btf_type_by_id(btf, vsi->type);
@@ -3245,19 +3288,30 @@ static int btf_find_datasec_var(const struct btf *btf, const struct btf_type *t,
 		switch (field_type) {
 		case BTF_FIELD_SPIN_LOCK:
 		case BTF_FIELD_TIMER:
-			ret = btf_find_field_struct(btf, var_type, off, sz, info);
+			ret = btf_find_field_struct(btf, var_type, off, sz, &info[idx]);
+			if (ret < 0)
+				return ret;
+			break;
+		case BTF_FIELD_KPTR:
+			ret = btf_find_field_kptr(btf, var_type, off, sz, &info[idx]);
 			if (ret < 0)
 				return ret;
 			break;
 		default:
 			return -EFAULT;
 		}
+
+		if (ret == BTF_FIELD_FOUND && idx >= info_cnt)
+			return -E2BIG;
+		if (ret == BTF_FIELD_IGNORE)
+			continue;
+		++idx;
 	}
-	return 0;
+	return idx;
 }
 
 static int btf_find_field(const struct btf *btf, const struct btf_type *t,
-			  int field_type, struct btf_field_info *info)
+			  int field_type, struct btf_field_info *info, int info_cnt)
 {
 	const char *name;
 	int sz, align;
@@ -3273,14 +3327,20 @@ static int btf_find_field(const struct btf *btf, const struct btf_type *t,
 		sz = sizeof(struct bpf_timer);
 		align = __alignof__(struct bpf_timer);
 		break;
+	case BTF_FIELD_KPTR:
+		name = NULL;
+		sz = sizeof(u64);
+		align = __alignof__(u64);
+		break;
 	default:
 		return -EFAULT;
 	}
 
+	/* The maximum allowed fields of a certain type will be info_cnt - 1 */
 	if (__btf_type_is_struct(t))
-		return btf_find_struct_field(btf, t, name, sz, align, field_type, info);
+		return btf_find_struct_field(btf, t, name, sz, align, field_type, info, info_cnt - 1);
 	else if (btf_type_is_datasec(t))
-		return btf_find_datasec_var(btf, t, name, sz, align, field_type, info);
+		return btf_find_datasec_var(btf, t, name, sz, align, field_type, info, info_cnt - 1);
 	return -EINVAL;
 }
 
@@ -3290,24 +3350,79 @@ static int btf_find_field(const struct btf *btf, const struct btf_type *t,
  */
 int btf_find_spin_lock(const struct btf *btf, const struct btf_type *t)
 {
-	struct btf_field_info info = { .off = -ENOENT };
+	/* btf_find_field requires array of size max + 1 */
+	struct btf_field_info info_arr[2];
 	int ret;
 
-	ret = btf_find_field(btf, t, BTF_FIELD_SPIN_LOCK, &info);
+	ret = btf_find_field(btf, t, BTF_FIELD_SPIN_LOCK, info_arr, ARRAY_SIZE(info_arr));
 	if (ret < 0)
 		return ret;
-	return info.off;
+	if (!ret)
+		return -ENOENT;
+	return info_arr[0].off;
 }
 
 int btf_find_timer(const struct btf *btf, const struct btf_type *t)
 {
-	struct btf_field_info info = { .off = -ENOENT };
+	/* btf_find_field requires array of size max + 1 */
+	struct btf_field_info info_arr[2];
 	int ret;
 
-	ret = btf_find_field(btf, t, BTF_FIELD_TIMER, &info);
+	ret = btf_find_field(btf, t, BTF_FIELD_TIMER, info_arr, ARRAY_SIZE(info_arr));
 	if (ret < 0)
 		return ret;
-	return info.off;
+	if (!ret)
+		return -ENOENT;
+	return info_arr[0].off;
+}
+
+struct bpf_map_value_off *btf_find_kptr(const struct btf *btf,
+					const struct btf_type *t)
+{
+	/* btf_find_field requires array of size max + 1 */
+	struct btf_field_info info_arr[BPF_MAP_VALUE_OFF_MAX + 1];
+	struct bpf_map_value_off *tab;
+	int ret, i, nr_off;
+
+	/* Revisit stack usage when bumping BPF_MAP_VALUE_OFF_MAX */
+	BUILD_BUG_ON(BPF_MAP_VALUE_OFF_MAX != 8);
+
+	ret = btf_find_field(btf, t, BTF_FIELD_KPTR, info_arr, ARRAY_SIZE(info_arr));
+	if (ret < 0)
+		return ERR_PTR(ret);
+	if (!ret)
+		return NULL;
+
+	nr_off = ret;
+	tab = kzalloc(offsetof(struct bpf_map_value_off, off[nr_off]), GFP_KERNEL | __GFP_NOWARN);
+	if (!tab)
+		return ERR_PTR(-ENOMEM);
+
+	tab->nr_off = 0;
+	for (i = 0; i < nr_off; i++) {
+		const struct btf_type *t;
+		struct btf *off_btf;
+		s32 id;
+
+		t = info_arr[i].type;
+		id = bpf_find_btf_id(__btf_name_by_offset(btf, t->name_off), BTF_INFO_KIND(t->info),
+				     &off_btf);
+		if (id < 0) {
+			ret = id;
+			goto end;
+		}
+
+		tab->off[i].offset = info_arr[i].off;
+		tab->off[i].btf_id = id;
+		tab->off[i].btf = off_btf;
+		tab->nr_off = i + 1;
+	}
+	return tab;
+end:
+	while (tab->nr_off--)
+		btf_put(tab->off[tab->nr_off].btf);
+	kfree(tab);
+	return ERR_PTR(ret);
 }
 
 static void __btf_struct_show(const struct btf *btf, const struct btf_type *t,
diff --git a/kernel/bpf/map_in_map.c b/kernel/bpf/map_in_map.c
index 5cd8f5277279..135205d0d560 100644
--- a/kernel/bpf/map_in_map.c
+++ b/kernel/bpf/map_in_map.c
@@ -52,6 +52,7 @@ struct bpf_map *bpf_map_meta_alloc(int inner_map_ufd)
 	inner_map_meta->max_entries = inner_map->max_entries;
 	inner_map_meta->spin_lock_off = inner_map->spin_lock_off;
 	inner_map_meta->timer_off = inner_map->timer_off;
+	inner_map_meta->kptr_off_tab = bpf_map_copy_kptr_off_tab(inner_map);
 	if (inner_map->btf) {
 		btf_get(inner_map->btf);
 		inner_map_meta->btf = inner_map->btf;
@@ -71,6 +72,7 @@ struct bpf_map *bpf_map_meta_alloc(int inner_map_ufd)
 
 void bpf_map_meta_free(struct bpf_map *map_meta)
 {
+	bpf_map_free_kptr_off_tab(map_meta);
 	btf_put(map_meta->btf);
 	kfree(map_meta);
 }
@@ -83,7 +85,8 @@ bool bpf_map_meta_equal(const struct bpf_map *meta0,
 		meta0->key_size == meta1->key_size &&
 		meta0->value_size == meta1->value_size &&
 		meta0->timer_off == meta1->timer_off &&
-		meta0->map_flags == meta1->map_flags;
+		meta0->map_flags == meta1->map_flags &&
+		bpf_map_equal_kptr_off_tab(meta0, meta1);
 }
 
 void *bpf_map_fd_get_ptr(struct bpf_map *map,
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index cdaa1152436a..5990d6fa97ab 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -6,6 +6,7 @@
 #include <linux/bpf_trace.h>
 #include <linux/bpf_lirc.h>
 #include <linux/bpf_verifier.h>
+#include <linux/bsearch.h>
 #include <linux/btf.h>
 #include <linux/syscalls.h>
 #include <linux/slab.h>
@@ -473,12 +474,95 @@ static void bpf_map_release_memcg(struct bpf_map *map)
 }
 #endif
 
+static int bpf_map_kptr_off_cmp(const void *a, const void *b)
+{
+	const struct bpf_map_value_off_desc *off_desc1 = a, *off_desc2 = b;
+
+	if (off_desc1->offset < off_desc2->offset)
+		return -1;
+	else if (off_desc1->offset > off_desc2->offset)
+		return 1;
+	return 0;
+}
+
+struct bpf_map_value_off_desc *bpf_map_kptr_off_contains(struct bpf_map *map, u32 offset)
+{
+	/* Since members are iterated in btf_find_field in increasing order,
+	 * offsets appended to kptr_off_tab are in increasing order, so we can
+	 * do bsearch to find exact match.
+	 */
+	struct bpf_map_value_off *tab;
+
+	if (!map_value_has_kptr(map))
+		return NULL;
+	tab = map->kptr_off_tab;
+	return bsearch(&offset, tab->off, tab->nr_off, sizeof(tab->off[0]), bpf_map_kptr_off_cmp);
+}
+
+void bpf_map_free_kptr_off_tab(struct bpf_map *map)
+{
+	struct bpf_map_value_off *tab = map->kptr_off_tab;
+	int i;
+
+	if (!map_value_has_kptr(map))
+		return;
+	for (i = 0; i < tab->nr_off; i++) {
+		struct btf *btf = tab->off[i].btf;
+
+		btf_put(btf);
+	}
+	kfree(tab);
+	map->kptr_off_tab = NULL;
+}
+
+struct bpf_map_value_off *bpf_map_copy_kptr_off_tab(const struct bpf_map *map)
+{
+	struct bpf_map_value_off *tab = map->kptr_off_tab, *new_tab;
+	int size, i, ret;
+
+	if (!map_value_has_kptr(map))
+		return ERR_PTR(-ENOENT);
+	/* Do a deep copy of the kptr_off_tab */
+	for (i = 0; i < tab->nr_off; i++)
+		btf_get(tab->off[i].btf);
+
+	size = offsetof(struct bpf_map_value_off, off[tab->nr_off]);
+	new_tab = kzalloc(size, GFP_KERNEL | __GFP_NOWARN);
+	if (!new_tab) {
+		ret = -ENOMEM;
+		goto end;
+	}
+	memcpy(new_tab, tab, size);
+	return new_tab;
+end:
+	while (i--)
+		btf_put(tab->off[i].btf);
+	return ERR_PTR(ret);
+}
+
+bool bpf_map_equal_kptr_off_tab(const struct bpf_map *map_a, const struct bpf_map *map_b)
+{
+	struct bpf_map_value_off *tab_a = map_a->kptr_off_tab, *tab_b = map_b->kptr_off_tab;
+	bool a_has_kptr = map_value_has_kptr(map_a), b_has_kptr = map_value_has_kptr(map_b);
+	int size;
+
+	if (!a_has_kptr && !b_has_kptr)
+		return true;
+	if ((a_has_kptr && !b_has_kptr) || (!a_has_kptr && b_has_kptr))
+		return false;
+	if (tab_a->nr_off != tab_b->nr_off)
+		return false;
+	size = offsetof(struct bpf_map_value_off, off[tab_a->nr_off]);
+	return !memcmp(tab_a, tab_b, size);
+}
+
 /* called from workqueue */
 static void bpf_map_free_deferred(struct work_struct *work)
 {
 	struct bpf_map *map = container_of(work, struct bpf_map, work);
 
 	security_bpf_map_free(map);
+	bpf_map_free_kptr_off_tab(map);
 	bpf_map_release_memcg(map);
 	/* implementation dependent freeing */
 	map->ops->map_free(map);
@@ -640,7 +724,7 @@ static int bpf_map_mmap(struct file *filp, struct vm_area_struct *vma)
 	int err;
 
 	if (!map->ops->map_mmap || map_value_has_spin_lock(map) ||
-	    map_value_has_timer(map))
+	    map_value_has_timer(map) || map_value_has_kptr(map))
 		return -ENOTSUPP;
 
 	if (!(vma->vm_flags & VM_SHARED))
@@ -820,9 +904,31 @@ static int map_check_btf(struct bpf_map *map, const struct btf *btf,
 			return -EOPNOTSUPP;
 	}
 
-	if (map->ops->map_check_btf)
+	map->kptr_off_tab = btf_find_kptr(btf, value_type);
+	if (map_value_has_kptr(map)) {
+		if (!bpf_capable())
+			return -EPERM;
+		if (map->map_flags & BPF_F_RDONLY_PROG) {
+			ret = -EACCES;
+			goto free_map_tab;
+		}
+		if (map->map_type != BPF_MAP_TYPE_HASH &&
+		    map->map_type != BPF_MAP_TYPE_LRU_HASH &&
+		    map->map_type != BPF_MAP_TYPE_ARRAY) {
+			ret = -EOPNOTSUPP;
+			goto free_map_tab;
+		}
+	}
+
+	if (map->ops->map_check_btf) {
 		ret = map->ops->map_check_btf(map, btf, key_type, value_type);
+		if (ret < 0)
+			goto free_map_tab;
+	}
 
+	return ret;
+free_map_tab:
+	bpf_map_free_kptr_off_tab(map);
 	return ret;
 }
 
@@ -1639,7 +1745,7 @@ static int map_freeze(const union bpf_attr *attr)
 		return PTR_ERR(map);
 
 	if (map->map_type == BPF_MAP_TYPE_STRUCT_OPS ||
-	    map_value_has_timer(map)) {
+	    map_value_has_timer(map) || map_value_has_kptr(map)) {
 		fdput(f);
 		return -ENOTSUPP;
 	}
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 4ce9a528fb63..744b7362e52e 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -3507,6 +3507,94 @@ int check_ptr_off_reg(struct bpf_verifier_env *env,
 	return __check_ptr_off_reg(env, reg, regno, false);
 }
 
+static int map_kptr_match_type(struct bpf_verifier_env *env,
+			       struct bpf_map_value_off_desc *off_desc,
+			       struct bpf_reg_state *reg, u32 regno)
+{
+	const char *targ_name = kernel_type_name(off_desc->btf, off_desc->btf_id);
+	const char *reg_name = "";
+
+	if (reg->type != PTR_TO_BTF_ID && reg->type != PTR_TO_BTF_ID_OR_NULL)
+		goto bad_type;
+
+	if (!btf_is_kernel(reg->btf)) {
+		verbose(env, "R%d must point to kernel BTF\n", regno);
+		return -EINVAL;
+	}
+	/* We need to verify reg->type and reg->btf, before accessing reg->btf */
+	reg_name = kernel_type_name(reg->btf, reg->btf_id);
+
+	if (__check_ptr_off_reg(env, reg, regno, true))
+		return -EACCES;
+
+	if (!btf_struct_ids_match(&env->log, reg->btf, reg->btf_id, reg->off,
+				  off_desc->btf, off_desc->btf_id))
+		goto bad_type;
+	return 0;
+bad_type:
+	verbose(env, "invalid kptr access, R%d type=%s%s ", regno,
+		reg_type_str(env, reg->type), reg_name);
+	verbose(env, "expected=%s%s\n", reg_type_str(env, PTR_TO_BTF_ID), targ_name);
+	return -EINVAL;
+}
+
+/* Returns an error, or 0 if ignoring the access, or 1 if register state was
+ * updated, in which case later updates must be skipped.
+ */
+static int check_map_kptr_access(struct bpf_verifier_env *env, u32 regno,
+				 int off, int size, int value_regno,
+				 enum bpf_access_type t, int insn_idx)
+{
+	struct bpf_reg_state *reg = reg_state(env, regno), *val_reg;
+	struct bpf_insn *insn = &env->prog->insnsi[insn_idx];
+	struct bpf_map_value_off_desc *off_desc;
+	struct bpf_map *map = reg->map_ptr;
+	int class = BPF_CLASS(insn->code);
+
+	/* Things we already checked for in check_map_access:
+	 *  - Reject cases where variable offset may touch BTF ID pointer
+	 *  - size of access (must be BPF_DW)
+	 *  - off_desc->offset == off + reg->var_off.value
+	 */
+	if (!tnum_is_const(reg->var_off))
+		return 0;
+
+	off_desc = bpf_map_kptr_off_contains(map, off + reg->var_off.value);
+	if (!off_desc)
+		return 0;
+
+	/* Only BPF_[LDX,STX,ST] | BPF_MEM | BPF_DW is supported */
+	if (BPF_MODE(insn->code) != BPF_MEM)
+		goto end;
+
+	if (class == BPF_LDX) {
+		val_reg = reg_state(env, value_regno);
+		/* We can simply mark the value_regno receiving the pointer
+		 * value from map as PTR_TO_BTF_ID, with the correct type.
+		 */
+		mark_btf_ld_reg(env, cur_regs(env), value_regno, PTR_TO_BTF_ID, off_desc->btf,
+				off_desc->btf_id, PTR_MAYBE_NULL);
+		val_reg->id = ++env->id_gen;
+	} else if (class == BPF_STX) {
+		val_reg = reg_state(env, value_regno);
+		if (!register_is_null(val_reg) &&
+		    map_kptr_match_type(env, off_desc, val_reg, value_regno))
+			return -EACCES;
+	} else if (class == BPF_ST) {
+		if (insn->imm) {
+			verbose(env, "BPF_ST imm must be 0 when storing to kptr at off=%u\n",
+				off_desc->offset);
+			return -EACCES;
+		}
+	} else {
+		goto end;
+	}
+	return 1;
+end:
+	verbose(env, "kptr in map can only be accessed using BPF_LDX/BPF_STX/BPF_ST\n");
+	return -EACCES;
+}
+
 /* check read/write into a map element with possible variable offset */
 static int check_map_access(struct bpf_verifier_env *env, u32 regno,
 			    int off, int size, bool zero_size_allowed)
@@ -3545,6 +3633,32 @@ static int check_map_access(struct bpf_verifier_env *env, u32 regno,
 			return -EACCES;
 		}
 	}
+	if (map_value_has_kptr(map)) {
+		struct bpf_map_value_off *tab = map->kptr_off_tab;
+		int i;
+
+		for (i = 0; i < tab->nr_off; i++) {
+			u32 p = tab->off[i].offset;
+
+			if (reg->smin_value + off < p + sizeof(u64) &&
+			    p < reg->umax_value + off + size) {
+				if (!tnum_is_const(reg->var_off)) {
+					verbose(env, "kptr access cannot have variable offset\n");
+					return -EACCES;
+				}
+				if (p != off + reg->var_off.value) {
+					verbose(env, "kptr access misaligned expected=%u off=%llu\n",
+						p, off + reg->var_off.value);
+					return -EACCES;
+				}
+				if (size != bpf_size_to_bytes(BPF_DW)) {
+					verbose(env, "kptr access size must be BPF_DW\n");
+					return -EACCES;
+				}
+				break;
+			}
+		}
+	}
 	return err;
 }
 
@@ -4421,6 +4535,10 @@ static int check_mem_access(struct bpf_verifier_env *env, int insn_idx, u32 regn
 		if (err)
 			return err;
 		err = check_map_access(env, regno, off, size, false);
+		err = err ?: check_map_kptr_access(env, regno, off, size, value_regno, t, insn_idx);
+		if (err < 0)
+			return err;
+		/* if err == 0, check_map_kptr_access ignored the access */
 		if (!err && t == BPF_READ && value_regno >= 0) {
 			struct bpf_map *map = reg->map_ptr;
 
@@ -4442,6 +4560,8 @@ static int check_mem_access(struct bpf_verifier_env *env, int insn_idx, u32 regn
 				mark_reg_unknown(env, regs, value_regno);
 			}
 		}
+		/* clear err == 1 */
+		err = err < 0 ? err : 0;
 	} else if (base_type(reg->type) == PTR_TO_MEM) {
 		bool rdonly_mem = type_is_rdonly_mem(reg->type);
 
-- 
2.35.1



* [PATCH bpf-next v3 04/13] bpf: Indicate argument that will be released in bpf_func_proto
  2022-03-20 15:54 [PATCH bpf-next v3 00/13] Introduce typed pointer support in BPF maps Kumar Kartikeya Dwivedi
                   ` (2 preceding siblings ...)
  2022-03-20 15:55 ` [PATCH bpf-next v3 03/13] bpf: Allow storing unreferenced kptr in map Kumar Kartikeya Dwivedi
@ 2022-03-20 15:55 ` Kumar Kartikeya Dwivedi
  2022-03-22  1:47   ` Joanne Koong
  2022-03-20 15:55 ` [PATCH bpf-next v3 05/13] bpf: Allow storing referenced kptr in map Kumar Kartikeya Dwivedi
                   ` (8 subsequent siblings)
  12 siblings, 1 reply; 44+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2022-03-20 15:55 UTC (permalink / raw)
  To: bpf
  Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Toke Høiland-Jørgensen, Jesper Dangaard Brouer

Add a few fields for each arg (argN_release) that, when set to true, tell the
verifier that for a release function, that argument's register will be the
one for which meta.ref_obj_id will be set, and which will then be released
using release_reference. To capture the regno, introduce a release_regno
field in bpf_call_arg_meta.

This will be required in the next patch, where we may pass either NULL or a
refcounted pointer as an argument to the release function bpf_kptr_xchg.
Releasing only when meta.ref_obj_id is set is not enough, as there is a case
where the needed argument type matches, but the ref_obj_id is set to 0.
Hence, we must enforce that whenever meta.ref_obj_id is zero, the register
that is to be released can only be NULL for a release function.

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
---
 include/linux/bpf.h   | 10 ++++++++++
 kernel/bpf/ringbuf.c  |  2 ++
 kernel/bpf/verifier.c | 39 +++++++++++++++++++++++++++++++++------
 net/core/filter.c     |  1 +
 4 files changed, 46 insertions(+), 6 deletions(-)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index f35920d279dd..48ddde854d67 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -487,6 +487,16 @@ struct bpf_func_proto {
 		};
 		u32 *arg_btf_id[5];
 	};
+	union {
+		struct {
+			bool arg1_release;
+			bool arg2_release;
+			bool arg3_release;
+			bool arg4_release;
+			bool arg5_release;
+		};
+		bool arg_release[5];
+	};
 	int *ret_btf_id; /* return value btf_id */
 	bool (*allowed)(const struct bpf_prog *prog);
 };
diff --git a/kernel/bpf/ringbuf.c b/kernel/bpf/ringbuf.c
index 710ba9de12ce..f40ce718630e 100644
--- a/kernel/bpf/ringbuf.c
+++ b/kernel/bpf/ringbuf.c
@@ -405,6 +405,7 @@ const struct bpf_func_proto bpf_ringbuf_submit_proto = {
 	.func		= bpf_ringbuf_submit,
 	.ret_type	= RET_VOID,
 	.arg1_type	= ARG_PTR_TO_ALLOC_MEM,
+	.arg1_release	= true,
 	.arg2_type	= ARG_ANYTHING,
 };
 
@@ -418,6 +419,7 @@ const struct bpf_func_proto bpf_ringbuf_discard_proto = {
 	.func		= bpf_ringbuf_discard,
 	.ret_type	= RET_VOID,
 	.arg1_type	= ARG_PTR_TO_ALLOC_MEM,
+	.arg1_release	= true,
 	.arg2_type	= ARG_ANYTHING,
 };
 
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 744b7362e52e..b8cd34607215 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -245,6 +245,7 @@ struct bpf_call_arg_meta {
 	struct bpf_map *map_ptr;
 	bool raw_mode;
 	bool pkt_access;
+	u8 release_regno;
 	int regno;
 	int access_size;
 	int mem_size;
@@ -6101,12 +6102,31 @@ static bool check_btf_id_ok(const struct bpf_func_proto *fn)
 	return true;
 }
 
-static int check_func_proto(const struct bpf_func_proto *fn, int func_id)
+static bool check_release_regno(const struct bpf_func_proto *fn, int func_id,
+				struct bpf_call_arg_meta *meta)
+{
+	int i;
+
+	for (i = 0; i < ARRAY_SIZE(fn->arg_release); i++) {
+		if (fn->arg_release[i]) {
+			if (!is_release_function(func_id))
+				return false;
+			if (meta->release_regno)
+				return false;
+			meta->release_regno = i + 1;
+		}
+	}
+	return !is_release_function(func_id) || meta->release_regno;
+}
+
+static int check_func_proto(const struct bpf_func_proto *fn, int func_id,
+			    struct bpf_call_arg_meta *meta)
 {
 	return check_raw_mode_ok(fn) &&
 	       check_arg_pair_ok(fn) &&
 	       check_btf_id_ok(fn) &&
-	       check_refcount_ok(fn, func_id) ? 0 : -EINVAL;
+	       check_refcount_ok(fn, func_id) &&
+	       check_release_regno(fn, func_id, meta) ? 0 : -EINVAL;
 }
 
 /* Packet data might have moved, any old PTR_TO_PACKET[_META,_END]
@@ -6785,7 +6805,7 @@ static int check_helper_call(struct bpf_verifier_env *env, struct bpf_insn *insn
 	memset(&meta, 0, sizeof(meta));
 	meta.pkt_access = fn->pkt_access;
 
-	err = check_func_proto(fn, func_id);
+	err = check_func_proto(fn, func_id, &meta);
 	if (err) {
 		verbose(env, "kernel subsystem misconfigured func %s#%d\n",
 			func_id_name(func_id), func_id);
@@ -6818,8 +6838,17 @@ static int check_helper_call(struct bpf_verifier_env *env, struct bpf_insn *insn
 			return err;
 	}
 
+	regs = cur_regs(env);
+
 	if (is_release_function(func_id)) {
-		err = release_reference(env, meta.ref_obj_id);
+		err = -EINVAL;
+		if (meta.ref_obj_id)
+			err = release_reference(env, meta.ref_obj_id);
+		/* meta.ref_obj_id can only be 0 if register that is meant to be
+		 * released is NULL, which must be > R0.
+		 */
+		else if (meta.release_regno && register_is_null(&regs[meta.release_regno]))
+			err = 0;
 		if (err) {
 			verbose(env, "func %s#%d reference has not been acquired before\n",
 				func_id_name(func_id), func_id);
@@ -6827,8 +6856,6 @@ static int check_helper_call(struct bpf_verifier_env *env, struct bpf_insn *insn
 		}
 	}
 
-	regs = cur_regs(env);
-
 	switch (func_id) {
 	case BPF_FUNC_tail_call:
 		err = check_reference_leak(env);
diff --git a/net/core/filter.c b/net/core/filter.c
index 03655f2074ae..17eff4731b06 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -6622,6 +6622,7 @@ static const struct bpf_func_proto bpf_sk_release_proto = {
 	.gpl_only	= false,
 	.ret_type	= RET_INTEGER,
 	.arg1_type	= ARG_PTR_TO_BTF_ID_SOCK_COMMON,
+	.arg1_release   = true,
 };
 
 BPF_CALL_5(bpf_xdp_sk_lookup_udp, struct xdp_buff *, ctx,
-- 
2.35.1



* [PATCH bpf-next v3 05/13] bpf: Allow storing referenced kptr in map
  2022-03-20 15:54 [PATCH bpf-next v3 00/13] Introduce typed pointer support in BPF maps Kumar Kartikeya Dwivedi
                   ` (3 preceding siblings ...)
  2022-03-20 15:55 ` [PATCH bpf-next v3 04/13] bpf: Indicate argument that will be released in bpf_func_proto Kumar Kartikeya Dwivedi
@ 2022-03-20 15:55 ` Kumar Kartikeya Dwivedi
  2022-03-22 20:59   ` Martin KaFai Lau
  2022-03-20 15:55 ` [PATCH bpf-next v3 06/13] bpf: Prevent escaping of kptr loaded from maps Kumar Kartikeya Dwivedi
                   ` (7 subsequent siblings)
  12 siblings, 1 reply; 44+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2022-03-20 15:55 UTC (permalink / raw)
  To: bpf
  Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Toke Høiland-Jørgensen, Jesper Dangaard Brouer

Extending the code in the previous commit, introduce referenced kptr support,
which needs to be tagged using the 'kptr_ref' type tag instead. Unlike
unreferenced kptrs, referenced kptrs have a lot more restrictions. In
addition to the type matching, only the newly introduced bpf_kptr_xchg helper
is allowed to modify the map value at that offset. This transfers the
referenced pointer being stored into the map, releasing the reference state
for the program, returning the old value and creating new reference state for
the returned pointer.

Similar to the unreferenced pointer case, the return value for this case will
also be PTR_TO_BTF_ID_OR_NULL. The reference for the returned pointer must
eventually either be released by calling the corresponding release function,
or be transferred into another map.

It is also allowed to call bpf_kptr_xchg with a NULL pointer, to clear
the value, and obtain the old value if any.
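
A rough sketch of the intended usage (struct foo, foo_acquire and foo_release
are hypothetical stand-ins for a kernel type and its acquire/release kfuncs,
declared via __ksym; v is assumed to point to a map value containing a
'struct foo __kptr_ref *ptr' field, obtained from a map lookup, with the
updated UAPI/bpf_helpers headers providing bpf_kptr_xchg):

	static int xchg_ref(struct map_value *v)
	{
		struct foo *p, *old;

		p = foo_acquire();
		if (!p)
			return 0;
		/* Ownership of 'p' moves into the map; the old value (if any)
		 * comes back as an acquired reference that must be released
		 * or moved into another map.
		 */
		old = bpf_kptr_xchg(&v->ptr, p);
		if (old)
			foo_release(old);

		/* Passing NULL clears the slot and returns the stored
		 * pointer, if any.
		 */
		old = bpf_kptr_xchg(&v->ptr, NULL);
		if (old)
			foo_release(old);
		return 0;
	}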

BPF_LDX, BPF_STX, and BPF_ST cannot access a referenced kptr. A future commit
will permit using BPF_LDX for such pointers, while attempting to make it
safe, since the lifetime of the object won't be guaranteed.

There are valid reasons to enforce the restriction of permitting only
bpf_kptr_xchg to operate on a referenced kptr. The pointer value must be
consistent in the face of concurrent modification, and any prior value
contained in the map must also be released before a new one is moved into the
map. To ensure proper transfer of ownership, bpf_kptr_xchg returns the old
value, which the verifier requires the user to either free or move into
another map, and releases the reference held for the pointer being moved in.

In the future, the direct BPF_XCHG instruction may also be permitted to work
like the bpf_kptr_xchg helper.

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
---
 include/linux/bpf.h            |   7 ++
 include/uapi/linux/bpf.h       |  12 ++++
 kernel/bpf/btf.c               |  11 ++-
 kernel/bpf/helpers.c           |  22 ++++++
 kernel/bpf/verifier.c          | 128 ++++++++++++++++++++++++++++-----
 tools/include/uapi/linux/bpf.h |  12 ++++
 6 files changed, 175 insertions(+), 17 deletions(-)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 48ddde854d67..6814e4885fab 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -160,10 +160,15 @@ enum {
 	BPF_MAP_VALUE_OFF_MAX = 8,
 };
 
+enum {
+	BPF_MAP_VALUE_OFF_F_REF = (1U << 0),
+};
+
 struct bpf_map_value_off_desc {
 	u32 offset;
 	u32 btf_id;
 	struct btf *btf;
+	int flags;
 };
 
 struct bpf_map_value_off {
@@ -413,6 +418,7 @@ enum bpf_arg_type {
 	ARG_PTR_TO_STACK,	/* pointer to stack */
 	ARG_PTR_TO_CONST_STR,	/* pointer to a null terminated read-only string */
 	ARG_PTR_TO_TIMER,	/* pointer to bpf_timer */
+	ARG_PTR_TO_KPTR,	/* pointer to kptr */
 	__BPF_ARG_TYPE_MAX,
 
 	/* Extended arg_types. */
@@ -422,6 +428,7 @@ enum bpf_arg_type {
 	ARG_PTR_TO_SOCKET_OR_NULL	= PTR_MAYBE_NULL | ARG_PTR_TO_SOCKET,
 	ARG_PTR_TO_ALLOC_MEM_OR_NULL	= PTR_MAYBE_NULL | ARG_PTR_TO_ALLOC_MEM,
 	ARG_PTR_TO_STACK_OR_NULL	= PTR_MAYBE_NULL | ARG_PTR_TO_STACK,
+	ARG_PTR_TO_BTF_ID_OR_NULL	= PTR_MAYBE_NULL | ARG_PTR_TO_BTF_ID,
 
 	/* This must be the last entry. Its purpose is to ensure the enum is
 	 * wide enough to hold the higher bits reserved for bpf_type_flag.
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 7604e7d5438f..b4e89da75d77 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -5143,6 +5143,17 @@ union bpf_attr {
  *		The **hash_algo** is returned on success,
  *		**-EOPNOTSUP** if the hash calculation failed or **-EINVAL** if
  *		invalid arguments are passed.
+ *
+ * void *bpf_kptr_xchg(void *map_value, void *ptr)
+ *	Description
+ *		Exchange kptr at pointer *map_value* with *ptr*, and return the
+ *		old value. *ptr* can be NULL, otherwise it must be a referenced
+ *		pointer which will be released when this helper is called.
+ *	Return
+ *		The old value of kptr (which can be NULL). The returned pointer
+ *		if not NULL, is a reference which must be released using its
+ *		corresponding release function, or moved into a BPF map before
+ *		program exit.
  */
 #define __BPF_FUNC_MAPPER(FN)		\
 	FN(unspec),			\
@@ -5339,6 +5350,7 @@ union bpf_attr {
 	FN(copy_from_user_task),	\
 	FN(skb_set_tstamp),		\
 	FN(ima_file_hash),		\
+	FN(kptr_xchg),			\
 	/* */
 
 /* integer value in 'imm' field of BPF_CALL instruction selects which helper
diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c
index 92afbec0a887..e36ad26a5a6e 100644
--- a/kernel/bpf/btf.c
+++ b/kernel/bpf/btf.c
@@ -3175,6 +3175,7 @@ enum {
 struct btf_field_info {
 	const struct btf_type *type;
 	u32 off;
+	int flags;
 };
 
 static int btf_find_field_struct(const struct btf *btf, const struct btf_type *t,
@@ -3191,6 +3192,8 @@ static int btf_find_field_struct(const struct btf *btf, const struct btf_type *t
 static int btf_find_field_kptr(const struct btf *btf, const struct btf_type *t,
 			       u32 off, int sz, struct btf_field_info *info)
 {
+	int flags;
+
 	/* For PTR, sz is always == 8 */
 	if (!btf_type_is_ptr(t))
 		return BTF_FIELD_IGNORE;
@@ -3201,7 +3204,11 @@ static int btf_find_field_kptr(const struct btf *btf, const struct btf_type *t,
 	/* Reject extra tags */
 	if (btf_type_is_type_tag(btf_type_by_id(btf, t->type)))
 		return -EINVAL;
-	if (strcmp("kptr", __btf_name_by_offset(btf, t->name_off)))
+	if (!strcmp("kptr", __btf_name_by_offset(btf, t->name_off)))
+		flags = 0;
+	else if (!strcmp("kptr_ref", __btf_name_by_offset(btf, t->name_off)))
+		flags = BPF_MAP_VALUE_OFF_F_REF;
+	else
 		return -EINVAL;
 
 	/* Get the base type */
@@ -3213,6 +3220,7 @@ static int btf_find_field_kptr(const struct btf *btf, const struct btf_type *t,
 
 	info->type = t;
 	info->off = off;
+	info->flags = flags;
 	return BTF_FIELD_FOUND;
 }
 
@@ -3415,6 +3423,7 @@ struct bpf_map_value_off *btf_find_kptr(const struct btf *btf,
 		tab->off[i].offset = info_arr[i].off;
 		tab->off[i].btf_id = id;
 		tab->off[i].btf = off_btf;
+		tab->off[i].flags = info_arr[i].flags;
 		tab->nr_off = i + 1;
 	}
 	return tab;
diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
index 315053ef6a75..2e95f94d4efa 100644
--- a/kernel/bpf/helpers.c
+++ b/kernel/bpf/helpers.c
@@ -1374,6 +1374,26 @@ void bpf_timer_cancel_and_free(void *val)
 	kfree(t);
 }
 
+BPF_CALL_2(bpf_kptr_xchg, void *, map_value, void *, ptr)
+{
+	unsigned long *kptr = map_value;
+
+	return xchg(kptr, (unsigned long)ptr);
+}
+
+static u32 bpf_kptr_xchg_btf_id;
+
+const struct bpf_func_proto bpf_kptr_xchg_proto = {
+	.func         = bpf_kptr_xchg,
+	.gpl_only     = false,
+	.ret_type     = RET_PTR_TO_BTF_ID_OR_NULL,
+	.ret_btf_id   = &bpf_kptr_xchg_btf_id,
+	.arg1_type    = ARG_PTR_TO_KPTR,
+	.arg2_type    = ARG_PTR_TO_BTF_ID_OR_NULL,
+	.arg2_btf_id  = &bpf_kptr_xchg_btf_id,
+	.arg2_release = true,
+};
+
 const struct bpf_func_proto bpf_get_current_task_proto __weak;
 const struct bpf_func_proto bpf_get_current_task_btf_proto __weak;
 const struct bpf_func_proto bpf_probe_read_user_proto __weak;
@@ -1452,6 +1472,8 @@ bpf_base_func_proto(enum bpf_func_id func_id)
 		return &bpf_timer_start_proto;
 	case BPF_FUNC_timer_cancel:
 		return &bpf_timer_cancel_proto;
+	case BPF_FUNC_kptr_xchg:
+		return &bpf_kptr_xchg_proto;
 	default:
 		break;
 	}
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index b8cd34607215..f731a0b45acb 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -258,6 +258,7 @@ struct bpf_call_arg_meta {
 	struct btf *ret_btf;
 	u32 ret_btf_id;
 	u32 subprogno;
+	struct bpf_map_value_off_desc *kptr_off_desc;
 };
 
 struct btf *btf_vmlinux;
@@ -480,7 +481,8 @@ static bool is_release_function(enum bpf_func_id func_id)
 {
 	return func_id == BPF_FUNC_sk_release ||
 	       func_id == BPF_FUNC_ringbuf_submit ||
-	       func_id == BPF_FUNC_ringbuf_discard;
+	       func_id == BPF_FUNC_ringbuf_discard ||
+	       func_id == BPF_FUNC_kptr_xchg;
 }
 
 static bool may_be_acquire_function(enum bpf_func_id func_id)
@@ -500,7 +502,8 @@ static bool is_acquire_function(enum bpf_func_id func_id,
 	if (func_id == BPF_FUNC_sk_lookup_tcp ||
 	    func_id == BPF_FUNC_sk_lookup_udp ||
 	    func_id == BPF_FUNC_skc_lookup_tcp ||
-	    func_id == BPF_FUNC_ringbuf_reserve)
+	    func_id == BPF_FUNC_ringbuf_reserve ||
+	    func_id == BPF_FUNC_kptr_xchg)
 		return true;
 
 	if (func_id == BPF_FUNC_map_lookup_elem &&
@@ -3510,10 +3513,12 @@ int check_ptr_off_reg(struct bpf_verifier_env *env,
 
 static int map_kptr_match_type(struct bpf_verifier_env *env,
 			       struct bpf_map_value_off_desc *off_desc,
-			       struct bpf_reg_state *reg, u32 regno)
+			       struct bpf_reg_state *reg, u32 regno,
+			       bool ref_ptr)
 {
 	const char *targ_name = kernel_type_name(off_desc->btf, off_desc->btf_id);
 	const char *reg_name = "";
+	bool fixed_off_ok = true;
 
 	if (reg->type != PTR_TO_BTF_ID && reg->type != PTR_TO_BTF_ID_OR_NULL)
 		goto bad_type;
@@ -3525,7 +3530,26 @@ static int map_kptr_match_type(struct bpf_verifier_env *env,
 	/* We need to verify reg->type and reg->btf, before accessing reg->btf */
 	reg_name = kernel_type_name(reg->btf, reg->btf_id);
 
-	if (__check_ptr_off_reg(env, reg, regno, true))
+	if (ref_ptr) {
+		if (!reg->ref_obj_id) {
+			verbose(env, "R%d must be referenced %s%s\n", regno,
+				reg_type_str(env, PTR_TO_BTF_ID), targ_name);
+			return -EACCES;
+		}
+		/* reg->off can be used to store pointer to a certain type formed by
+		 * incrementing pointer of a parent structure the object is embedded in,
+		 * e.g. map may expect unreferenced struct path *, and user should be
+		 * allowed a store using &file->f_path. However, in the case of
+		 * referenced pointer, we cannot do this, because the reference is only
+		 * for the parent structure, not its embedded object(s), and because
+		 * the transfer of ownership happens for the original pointer to and
+		 * from the map (before its eventual release).
+		 */
+		if (reg->off)
+			fixed_off_ok = false;
+	}
+	/* var_off is rejected by __check_ptr_off_reg for PTR_TO_BTF_ID */
+	if (__check_ptr_off_reg(env, reg, regno, fixed_off_ok))
 		return -EACCES;
 
 	if (!btf_struct_ids_match(&env->log, reg->btf, reg->btf_id, reg->off,
@@ -3568,6 +3592,12 @@ static int check_map_kptr_access(struct bpf_verifier_env *env, u32 regno,
 	if (BPF_MODE(insn->code) != BPF_MEM)
 		goto end;
 
+	/* We cannot directly access kptr_ref */
+	if (off_desc->flags & BPF_MAP_VALUE_OFF_F_REF) {
+		verbose(env, "accessing referenced kptr disallowed\n");
+		return -EACCES;
+	}
+
 	if (class == BPF_LDX) {
 		val_reg = reg_state(env, value_regno);
 		/* We can simply mark the value_regno receiving the pointer
@@ -3579,7 +3609,7 @@ static int check_map_kptr_access(struct bpf_verifier_env *env, u32 regno,
 	} else if (class == BPF_STX) {
 		val_reg = reg_state(env, value_regno);
 		if (!register_is_null(val_reg) &&
-		    map_kptr_match_type(env, off_desc, val_reg, value_regno))
+		    map_kptr_match_type(env, off_desc, val_reg, value_regno, false))
 			return -EACCES;
 	} else if (class == BPF_ST) {
 		if (insn->imm) {
@@ -5255,6 +5285,59 @@ static int process_timer_func(struct bpf_verifier_env *env, int regno,
 	return 0;
 }
 
+static int process_kptr_func(struct bpf_verifier_env *env, int regno,
+			     struct bpf_call_arg_meta *meta)
+{
+	struct bpf_reg_state *regs = cur_regs(env), *reg = &regs[regno];
+	struct bpf_map_value_off_desc *off_desc;
+	struct bpf_map *map_ptr = reg->map_ptr;
+	u32 kptr_off;
+	int ret;
+
+	if (!tnum_is_const(reg->var_off)) {
+		verbose(env,
+			"R%d doesn't have constant offset. kptr has to be at the constant offset\n",
+			regno);
+		return -EINVAL;
+	}
+	if (!map_ptr->btf) {
+		verbose(env, "map '%s' has to have BTF in order to use bpf_kptr_xchg\n",
+			map_ptr->name);
+		return -EINVAL;
+	}
+	if (!map_value_has_kptr(map_ptr)) {
+		ret = PTR_ERR(map_ptr->kptr_off_tab);
+		if (ret == -E2BIG)
+			verbose(env, "map '%s' has more than %d kptr\n", map_ptr->name,
+				BPF_MAP_VALUE_OFF_MAX);
+		else if (ret == -EEXIST)
+			verbose(env, "map '%s' has repeating kptr BTF tags\n", map_ptr->name);
+		else
+			verbose(env, "map '%s' has no valid kptr\n", map_ptr->name);
+		return -EINVAL;
+	}
+
+	meta->map_ptr = map_ptr;
+	/* Check access for BPF_WRITE */
+	meta->raw_mode = true;
+	ret = check_helper_mem_access(env, regno, sizeof(u64), false, meta);
+	if (ret < 0)
+		return ret;
+
+	kptr_off = reg->off + reg->var_off.value;
+	off_desc = bpf_map_kptr_off_contains(map_ptr, kptr_off);
+	if (!off_desc) {
+		verbose(env, "off=%d doesn't point to kptr\n", kptr_off);
+		return -EACCES;
+	}
+	if (!(off_desc->flags & BPF_MAP_VALUE_OFF_F_REF)) {
+		verbose(env, "off=%d kptr isn't referenced kptr\n", kptr_off);
+		return -EACCES;
+	}
+	meta->kptr_off_desc = off_desc;
+	return 0;
+}
+
 static bool arg_type_is_mem_ptr(enum bpf_arg_type type)
 {
 	return base_type(type) == ARG_PTR_TO_MEM ||
@@ -5390,6 +5473,7 @@ static const struct bpf_reg_types func_ptr_types = { .types = { PTR_TO_FUNC } };
 static const struct bpf_reg_types stack_ptr_types = { .types = { PTR_TO_STACK } };
 static const struct bpf_reg_types const_str_ptr_types = { .types = { PTR_TO_MAP_VALUE } };
 static const struct bpf_reg_types timer_types = { .types = { PTR_TO_MAP_VALUE } };
+static const struct bpf_reg_types kptr_types = { .types = { PTR_TO_MAP_VALUE } };
 
 static const struct bpf_reg_types *compatible_reg_types[__BPF_ARG_TYPE_MAX] = {
 	[ARG_PTR_TO_MAP_KEY]		= &map_key_value_types,
@@ -5417,11 +5501,13 @@ static const struct bpf_reg_types *compatible_reg_types[__BPF_ARG_TYPE_MAX] = {
 	[ARG_PTR_TO_STACK]		= &stack_ptr_types,
 	[ARG_PTR_TO_CONST_STR]		= &const_str_ptr_types,
 	[ARG_PTR_TO_TIMER]		= &timer_types,
+	[ARG_PTR_TO_KPTR]		= &kptr_types,
 };
 
 static int check_reg_type(struct bpf_verifier_env *env, u32 regno,
 			  enum bpf_arg_type arg_type,
-			  const u32 *arg_btf_id)
+			  const u32 *arg_btf_id,
+			  struct bpf_call_arg_meta *meta)
 {
 	struct bpf_reg_state *regs = cur_regs(env), *reg = &regs[regno];
 	enum bpf_reg_type expected, type = reg->type;
@@ -5474,8 +5560,11 @@ static int check_reg_type(struct bpf_verifier_env *env, u32 regno,
 			arg_btf_id = compatible->btf_id;
 		}
 
-		if (!btf_struct_ids_match(&env->log, reg->btf, reg->btf_id, reg->off,
-					  btf_vmlinux, *arg_btf_id)) {
+		if (meta->func_id == BPF_FUNC_kptr_xchg) {
+			if (map_kptr_match_type(env, meta->kptr_off_desc, reg, regno, true))
+				return -EACCES;
+		} else if (!btf_struct_ids_match(&env->log, reg->btf, reg->btf_id, reg->off,
+						 btf_vmlinux, *arg_btf_id)) {
 			verbose(env, "R%d is of type %s but %s is expected\n",
 				regno, kernel_type_name(reg->btf, reg->btf_id),
 				kernel_type_name(btf_vmlinux, *arg_btf_id));
@@ -5585,7 +5674,7 @@ static int check_func_arg(struct bpf_verifier_env *env, u32 arg,
 		 */
 		goto skip_type_check;
 
-	err = check_reg_type(env, regno, arg_type, fn->arg_btf_id[arg]);
+	err = check_reg_type(env, regno, arg_type, fn->arg_btf_id[arg], meta);
 	if (err)
 		return err;
 
@@ -5750,6 +5839,9 @@ static int check_func_arg(struct bpf_verifier_env *env, u32 arg,
 			verbose(env, "string is not zero-terminated\n");
 			return -EINVAL;
 		}
+	} else if (arg_type == ARG_PTR_TO_KPTR) {
+		if (process_kptr_func(env, regno, meta))
+			return -EACCES;
 	}
 
 	return err;
@@ -6092,10 +6184,10 @@ static bool check_btf_id_ok(const struct bpf_func_proto *fn)
 	int i;
 
 	for (i = 0; i < ARRAY_SIZE(fn->arg_type); i++) {
-		if (fn->arg_type[i] == ARG_PTR_TO_BTF_ID && !fn->arg_btf_id[i])
+		if (base_type(fn->arg_type[i]) == ARG_PTR_TO_BTF_ID && !fn->arg_btf_id[i])
 			return false;
 
-		if (fn->arg_type[i] != ARG_PTR_TO_BTF_ID && fn->arg_btf_id[i])
+		if (base_type(fn->arg_type[i]) != ARG_PTR_TO_BTF_ID && fn->arg_btf_id[i])
 			return false;
 	}
 
@@ -6979,21 +7071,25 @@ static int check_helper_call(struct bpf_verifier_env *env, struct bpf_insn *insn
 			regs[BPF_REG_0].btf_id = meta.ret_btf_id;
 		}
 	} else if (base_type(ret_type) == RET_PTR_TO_BTF_ID) {
+		struct btf *ret_btf;
 		int ret_btf_id;
 
 		mark_reg_known_zero(env, regs, BPF_REG_0);
 		regs[BPF_REG_0].type = PTR_TO_BTF_ID | ret_flag;
-		ret_btf_id = *fn->ret_btf_id;
+		if (func_id == BPF_FUNC_kptr_xchg) {
+			ret_btf = meta.kptr_off_desc->btf;
+			ret_btf_id = meta.kptr_off_desc->btf_id;
+		} else {
+			ret_btf = btf_vmlinux;
+			ret_btf_id = *fn->ret_btf_id;
+		}
 		if (ret_btf_id == 0) {
 			verbose(env, "invalid return type %u of func %s#%d\n",
 				base_type(ret_type), func_id_name(func_id),
 				func_id);
 			return -EINVAL;
 		}
-		/* current BPF helper definitions are only coming from
-		 * built-in code with type IDs from  vmlinux BTF
-		 */
-		regs[BPF_REG_0].btf = btf_vmlinux;
+		regs[BPF_REG_0].btf = ret_btf;
 		regs[BPF_REG_0].btf_id = ret_btf_id;
 	} else {
 		verbose(env, "unknown return type %u of func %s#%d\n",
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index 7604e7d5438f..b4e89da75d77 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -5143,6 +5143,17 @@ union bpf_attr {
  *		The **hash_algo** is returned on success,
  *		**-EOPNOTSUP** if the hash calculation failed or **-EINVAL** if
  *		invalid arguments are passed.
+ *
+ * void *bpf_kptr_xchg(void *map_value, void *ptr)
+ *	Description
+ *		Exchange kptr at pointer *map_value* with *ptr*, and return the
+ *		old value. *ptr* can be NULL, otherwise it must be a referenced
+ *		pointer which will be released when this helper is called.
+ *	Return
+ *		The old value of kptr (which can be NULL). The returned pointer,
+ *		if not NULL, is a reference which must be released using its
+ *		corresponding release function, or moved into a BPF map before
+ *		program exit.
  */
 #define __BPF_FUNC_MAPPER(FN)		\
 	FN(unspec),			\
@@ -5339,6 +5350,7 @@ union bpf_attr {
 	FN(copy_from_user_task),	\
 	FN(skb_set_tstamp),		\
 	FN(ima_file_hash),		\
+	FN(kptr_xchg),			\
 	/* */
 
 /* integer value in 'imm' field of BPF_CALL instruction selects which helper
-- 
2.35.1


^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH bpf-next v3 06/13] bpf: Prevent escaping of kptr loaded from maps
  2022-03-20 15:54 [PATCH bpf-next v3 00/13] Introduce typed pointer support in BPF maps Kumar Kartikeya Dwivedi
                   ` (4 preceding siblings ...)
  2022-03-20 15:55 ` [PATCH bpf-next v3 05/13] bpf: Allow storing referenced kptr in map Kumar Kartikeya Dwivedi
@ 2022-03-20 15:55 ` Kumar Kartikeya Dwivedi
  2022-03-22  5:58   ` Andrii Nakryiko
  2022-03-20 15:55 ` [PATCH bpf-next v3 07/13] bpf: Adapt copy_map_value for multiple offset case Kumar Kartikeya Dwivedi
                   ` (6 subsequent siblings)
  12 siblings, 1 reply; 44+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2022-03-20 15:55 UTC (permalink / raw)
  To: bpf
  Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Toke Høiland-Jørgensen, Jesper Dangaard Brouer

While the case of the object an unreferenced kptr points to being freed
can be handled by the verifier's exception handling (normal loads are
patched to PROBE_MEM loads), we still cannot allow the user to pass
these pointers to BPF helpers and kfuncs, because the same exception
handling is not done for accesses inside the kernel. The same is true
when a referenced pointer is loaded using a normal load instruction:
since the reference is not guaranteed to be held while the pointer is
used, it must be marked as untrusted.

Hence introduce a new type flag, PTR_UNTRUSTED, which is used to mark
all registers loading unreferenced and referenced kptr from BPF maps,
and ensure they can never escape the BPF program and into the kernel by
way of calling stable/unstable helpers.

In check_ptr_to_btf_access, the !type_may_be_null check to reject type
flags is still correct, as apart from PTR_MAYBE_NULL, only MEM_USER,
MEM_PERCPU, and PTR_UNTRUSTED may be set for PTR_TO_BTF_ID. The first
two are checked inside the function and rejected using a proper error
message, but we still want to allow dereferencing in the untrusted case.

Also, we make sure to inherit PTR_UNTRUSTED when a chain of pointers is
walked, so that this flag is never dropped once it has been set on a
PTR_TO_BTF_ID (i.e. trusted to untrusted transition can only be in one
direction).

In convert_ctx_accesses, extend the switch case to consider untrusted
PTR_TO_BTF_ID in addition to normal PTR_TO_BTF_ID for PROBE_MEM
conversion for BPF_LDX.
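
To make the allowed and disallowed cases concrete, here is a minimal
program sketch (the struct, map and field names mirror the selftests
added later in this series, the __kptr tag macro comes from a later
libbpf patch, and the verifier error string is paraphrased):

	#include <vmlinux.h>
	#include <bpf/bpf_helpers.h>

	struct map_value {
		struct prog_test_ref_kfunc __kptr *unref_ptr;
	};

	struct {
		__uint(type, BPF_MAP_TYPE_ARRAY);
		__type(key, int);
		__type(value, struct map_value);
		__uint(max_entries, 1);
	} array_map SEC(".maps");

	SEC("tc")
	int load_unref_kptr(struct __sk_buff *ctx)
	{
		struct prog_test_ref_kfunc *p;
		struct map_value *v;
		int key = 0;

		v = bpf_map_lookup_elem(&array_map, &key);
		if (!v)
			return 0;
		/* p is PTR_TO_BTF_ID | PTR_MAYBE_NULL | PTR_UNTRUSTED */
		p = v->unref_ptr;
		if (!p)
			return 0;
		/* A plain dereference is fine; the load is patched to a
		 * PROBE_MEM load. Passing p to a helper or kfunc would
		 * instead be rejected, e.g. "R1 type=untrusted_ptr_ ...".
		 */
		return p->a;
	}

	char _license[] SEC("license") = "GPL";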

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
---
 include/linux/bpf.h   | 10 +++++++++-
 kernel/bpf/verifier.c | 34 +++++++++++++++++++++++++++-------
 2 files changed, 36 insertions(+), 8 deletions(-)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 6814e4885fab..9d424d567dd3 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -369,7 +369,15 @@ enum bpf_type_flag {
 	 */
 	MEM_PERCPU		= BIT(4 + BPF_BASE_TYPE_BITS),
 
-	__BPF_TYPE_LAST_FLAG	= MEM_PERCPU,
+	/* PTR is not trusted. This is only used with PTR_TO_BTF_ID, to mark
+	 * unreferenced and referenced kptr loaded from map value using a load
+	 * instruction, so that they can only be dereferenced but not escape the
+	 * BPF program into the kernel (i.e. cannot be passed as arguments to
+	 * kfunc or bpf helpers).
+	 */
+	PTR_UNTRUSTED		= BIT(5 + BPF_BASE_TYPE_BITS),
+
+	__BPF_TYPE_LAST_FLAG	= PTR_UNTRUSTED,
 };
 
 /* Max number of base types. */
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index f731a0b45acb..9c5c72ea1d98 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -579,6 +579,8 @@ static const char *reg_type_str(struct bpf_verifier_env *env,
 		strncpy(prefix, "user_", 32);
 	if (type & MEM_PERCPU)
 		strncpy(prefix, "percpu_", 32);
+	if (type & PTR_UNTRUSTED)
+		strncpy(prefix, "untrusted_", 32);
 
 	snprintf(env->type_str_buf, TYPE_STR_BUF_LEN, "%s%s%s",
 		 prefix, str[base_type(type)], postfix);
@@ -3520,8 +3522,17 @@ static int map_kptr_match_type(struct bpf_verifier_env *env,
 	const char *reg_name = "";
 	bool fixed_off_ok = true;
 
-	if (reg->type != PTR_TO_BTF_ID && reg->type != PTR_TO_BTF_ID_OR_NULL)
-		goto bad_type;
+	if (off_desc->flags & BPF_MAP_VALUE_OFF_F_REF) {
+		if (reg->type != PTR_TO_BTF_ID &&
+		    reg->type != (PTR_TO_BTF_ID | PTR_MAYBE_NULL))
+			goto bad_type;
+	} else { /* only unreferenced case accepts untrusted pointers */
+		if (reg->type != PTR_TO_BTF_ID &&
+		    reg->type != (PTR_TO_BTF_ID | PTR_MAYBE_NULL) &&
+		    reg->type != (PTR_TO_BTF_ID | PTR_UNTRUSTED) &&
+		    reg->type != (PTR_TO_BTF_ID | PTR_MAYBE_NULL | PTR_UNTRUSTED))
+			goto bad_type;
+	}
 
 	if (!btf_is_kernel(reg->btf)) {
 		verbose(env, "R%d must point to kernel BTF\n", regno);
@@ -3592,9 +3603,11 @@ static int check_map_kptr_access(struct bpf_verifier_env *env, u32 regno,
 	if (BPF_MODE(insn->code) != BPF_MEM)
 		goto end;
 
-	/* We cannot directly access kptr_ref */
-	if (off_desc->flags & BPF_MAP_VALUE_OFF_F_REF) {
-		verbose(env, "accessing referenced kptr disallowed\n");
+	/* We only allow loading referenced kptr, since it will be marked as
+	 * untrusted, similar to unreferenced kptr.
+	 */
+	if (class != BPF_LDX && (off_desc->flags & BPF_MAP_VALUE_OFF_F_REF)) {
+		verbose(env, "store to referenced kptr disallowed\n");
 		return -EACCES;
 	}
 
@@ -3604,7 +3617,7 @@ static int check_map_kptr_access(struct bpf_verifier_env *env, u32 regno,
 		 * value from map as PTR_TO_BTF_ID, with the correct type.
 		 */
 		mark_btf_ld_reg(env, cur_regs(env), value_regno, PTR_TO_BTF_ID, off_desc->btf,
-				off_desc->btf_id, PTR_MAYBE_NULL);
+				off_desc->btf_id, PTR_MAYBE_NULL | PTR_UNTRUSTED);
 		val_reg->id = ++env->id_gen;
 	} else if (class == BPF_STX) {
 		val_reg = reg_state(env, value_regno);
@@ -4369,6 +4382,12 @@ static int check_ptr_to_btf_access(struct bpf_verifier_env *env,
 	if (ret < 0)
 		return ret;
 
+	/* If this is an untrusted pointer, all pointers formed by walking it
+	 * also inherit the untrusted flag.
+	 */
+	if (type_flag(reg->type) & PTR_UNTRUSTED)
+		flag |= PTR_UNTRUSTED;
+
 	if (atype == BPF_READ && value_regno >= 0)
 		mark_btf_ld_reg(env, regs, value_regno, ret, reg->btf, btf_id, flag);
 
@@ -13065,7 +13084,7 @@ static int convert_ctx_accesses(struct bpf_verifier_env *env)
 		if (!ctx_access)
 			continue;
 
-		switch (env->insn_aux_data[i + delta].ptr_type) {
+		switch ((int)env->insn_aux_data[i + delta].ptr_type) {
 		case PTR_TO_CTX:
 			if (!ops->convert_ctx_access)
 				continue;
@@ -13082,6 +13101,7 @@ static int convert_ctx_accesses(struct bpf_verifier_env *env)
 			convert_ctx_access = bpf_xdp_sock_convert_ctx_access;
 			break;
 		case PTR_TO_BTF_ID:
+		case PTR_TO_BTF_ID | PTR_UNTRUSTED:
 			if (type == BPF_READ) {
 				insn->code = BPF_LDX | BPF_PROBE_MEM |
 					BPF_SIZE((insn)->code);
-- 
2.35.1


^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH bpf-next v3 07/13] bpf: Adapt copy_map_value for multiple offset case
  2022-03-20 15:54 [PATCH bpf-next v3 00/13] Introduce typed pointer support in BPF maps Kumar Kartikeya Dwivedi
                   ` (5 preceding siblings ...)
  2022-03-20 15:55 ` [PATCH bpf-next v3 06/13] bpf: Prevent escaping of kptr loaded from maps Kumar Kartikeya Dwivedi
@ 2022-03-20 15:55 ` Kumar Kartikeya Dwivedi
  2022-03-22 20:38   ` Andrii Nakryiko
  2022-03-20 15:55 ` [PATCH bpf-next v3 08/13] bpf: Populate pairs of btf_id and destructor kfunc in btf Kumar Kartikeya Dwivedi
                   ` (5 subsequent siblings)
  12 siblings, 1 reply; 44+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2022-03-20 15:55 UTC (permalink / raw)
  To: bpf
  Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Toke Høiland-Jørgensen, Jesper Dangaard Brouer

Since there can now be up to 10 offsets that need handling in
copy_map_value, the manual shuffling and special casing is no longer
going to work. Hence, let's generalise the copy_map_value function by
using a sorted array of offsets to skip the regions that must be avoided
while copying into and out of a map value.

When the map is created, we populate the offset array in struct bpf_map,
with one extra element for map->value_size, which is used as the final
offset to subtract the previous offset from. Then, copy_map_value uses
this sorted offset array to memcpy while skipping the timer, spin lock,
and kptr regions.
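
To make the resulting copy pattern concrete, here is a small userspace
model of the copy loop added below (the layout is hypothetical: a 40
byte value with a 4 byte bpf_spin_lock at offset 8 and one kptr at
offset 24):

	#include <stdio.h>
	#include <string.h>

	struct off_field { unsigned int off; unsigned char sz; };

	int main(void)
	{
		/* sorted offsets, terminated by the value_size sentinel */
		struct off_field field[] = { { 8, 4 }, { 24, 8 }, { 40, 0 } };
		unsigned int cnt = 3, i;
		char src[40], dst[40] = {0};

		memset(src, 0xab, sizeof(src));
		/* copies [0, 8), [12, 24) and [32, 40) only */
		memcpy(dst, src, field[0].off);
		for (i = 1; i < cnt; i++) {
			unsigned int curr = field[i - 1].off + field[i - 1].sz;

			memcpy(dst + curr, src + curr, field[i].off - curr);
		}
		printf("skipped lock byte still zero: %d\n", dst[8] == 0);
		return 0;
	}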

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
---
 include/linux/bpf.h  | 55 +++++++++++++++++++++++---------------------
 kernel/bpf/syscall.c | 52 +++++++++++++++++++++++++++++++++++++++++
 2 files changed, 81 insertions(+), 26 deletions(-)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 9d424d567dd3..6474d2d44b78 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -158,6 +158,10 @@ struct bpf_map_ops {
 enum {
 	/* Support at most 8 pointers in a BPF map value */
 	BPF_MAP_VALUE_OFF_MAX = 8,
+	BPF_MAP_OFF_ARR_MAX   = BPF_MAP_VALUE_OFF_MAX +
+				1 + /* for bpf_spin_lock */
+				1 + /* for bpf_timer */
+				1,  /* for map->value_size sentinel */
 };
 
 enum {
@@ -206,9 +210,17 @@ struct bpf_map {
 	char name[BPF_OBJ_NAME_LEN];
 	bool bypass_spec_v1;
 	bool frozen; /* write-once; write-protected by freeze_mutex */
-	/* 6 bytes hole */
-
-	/* The 3rd and 4th cacheline with misc members to avoid false sharing
+	/* 2 bytes hole */
+	struct {
+		struct {
+			u32 off;
+			u8 sz;
+		} field[BPF_MAP_OFF_ARR_MAX];
+		u32 cnt;
+	} off_arr;
+	/* 40 bytes hole */
+
+	/* The 4th and 5th cacheline with misc members to avoid false sharing
 	 * particularly with refcounting.
 	 */
 	atomic64_t refcnt ____cacheline_aligned;
@@ -250,36 +262,27 @@ static inline void check_and_init_map_value(struct bpf_map *map, void *dst)
 		memset(dst + map->spin_lock_off, 0, sizeof(struct bpf_spin_lock));
 	if (unlikely(map_value_has_timer(map)))
 		memset(dst + map->timer_off, 0, sizeof(struct bpf_timer));
+	if (unlikely(map_value_has_kptr(map))) {
+		struct bpf_map_value_off *tab = map->kptr_off_tab;
+		int i;
+
+		for (i = 0; i < tab->nr_off; i++)
+			*(u64 *)(dst + tab->off[i].offset) = 0;
+	}
 }
 
 /* copy everything but bpf_spin_lock and bpf_timer. There could be one of each. */
 static inline void copy_map_value(struct bpf_map *map, void *dst, void *src)
 {
-	u32 s_off = 0, s_sz = 0, t_off = 0, t_sz = 0;
+	int i;
 
-	if (unlikely(map_value_has_spin_lock(map))) {
-		s_off = map->spin_lock_off;
-		s_sz = sizeof(struct bpf_spin_lock);
-	}
-	if (unlikely(map_value_has_timer(map))) {
-		t_off = map->timer_off;
-		t_sz = sizeof(struct bpf_timer);
-	}
+	memcpy(dst, src, map->off_arr.field[0].off);
+	for (i = 1; i < map->off_arr.cnt; i++) {
+		u32 curr_off = map->off_arr.field[i - 1].off;
+		u32 next_off = map->off_arr.field[i].off;
 
-	if (unlikely(s_sz || t_sz)) {
-		if (s_off < t_off || !s_sz) {
-			swap(s_off, t_off);
-			swap(s_sz, t_sz);
-		}
-		memcpy(dst, src, t_off);
-		memcpy(dst + t_off + t_sz,
-		       src + t_off + t_sz,
-		       s_off - t_off - t_sz);
-		memcpy(dst + s_off + s_sz,
-		       src + s_off + s_sz,
-		       map->value_size - s_off - s_sz);
-	} else {
-		memcpy(dst, src, map->value_size);
+		curr_off += map->off_arr.field[i - 1].sz;
+		memcpy(dst + curr_off, src + curr_off, next_off - curr_off);
 	}
 }
 void copy_map_value_locked(struct bpf_map *map, void *dst, void *src,
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 5990d6fa97ab..7b32537bd81f 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -30,6 +30,7 @@
 #include <linux/pgtable.h>
 #include <linux/bpf_lsm.h>
 #include <linux/poll.h>
+#include <linux/sort.h>
 #include <linux/bpf-netns.h>
 #include <linux/rcupdate_trace.h>
 #include <linux/memcontrol.h>
@@ -851,6 +852,55 @@ int map_check_no_btf(const struct bpf_map *map,
 	return -ENOTSUPP;
 }
 
+static int map_off_arr_cmp(const void *_a, const void *_b)
+{
+	const u32 a = *(const u32 *)_a;
+	const u32 b = *(const u32 *)_b;
+
+	if (a < b)
+		return -1;
+	else if (a > b)
+		return 1;
+	return 0;
+}
+
+static void map_populate_off_arr(struct bpf_map *map)
+{
+	u32 i;
+
+	map->off_arr.cnt = 0;
+	if (map_value_has_spin_lock(map)) {
+		i = map->off_arr.cnt;
+
+		map->off_arr.field[i].off = map->spin_lock_off;
+		map->off_arr.field[i].sz = sizeof(struct bpf_spin_lock);
+		map->off_arr.cnt++;
+	}
+	if (map_value_has_timer(map)) {
+		i = map->off_arr.cnt;
+
+		map->off_arr.field[i].off = map->timer_off;
+		map->off_arr.field[i].sz = sizeof(struct bpf_timer);
+		map->off_arr.cnt++;
+	}
+	if (map_value_has_kptr(map)) {
+		struct bpf_map_value_off *tab = map->kptr_off_tab;
+		u32 j = map->off_arr.cnt;
+
+		for (i = 0; i < tab->nr_off; i++) {
+			map->off_arr.field[j + i].off = tab->off[i].offset;
+			map->off_arr.field[j + i].sz = sizeof(u64);
+		}
+		map->off_arr.cnt += tab->nr_off;
+	}
+
+	map->off_arr.field[map->off_arr.cnt++].off = map->value_size;
+	if (map->off_arr.cnt == 1)
+		return;
+	sort(map->off_arr.field, map->off_arr.cnt, sizeof(map->off_arr.field[0]),
+	     map_off_arr_cmp, NULL);
+}
+
 static int map_check_btf(struct bpf_map *map, const struct btf *btf,
 			 u32 btf_key_id, u32 btf_value_id)
 {
@@ -1018,6 +1068,8 @@ static int map_create(union bpf_attr *attr)
 			attr->btf_vmlinux_value_type_id;
 	}
 
+	map_populate_off_arr(map);
+
 	err = security_bpf_map_alloc(map);
 	if (err)
 		goto free_map;
-- 
2.35.1


^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH bpf-next v3 08/13] bpf: Populate pairs of btf_id and destructor kfunc in btf
  2022-03-20 15:54 [PATCH bpf-next v3 00/13] Introduce typed pointer support in BPF maps Kumar Kartikeya Dwivedi
                   ` (6 preceding siblings ...)
  2022-03-20 15:55 ` [PATCH bpf-next v3 07/13] bpf: Adapt copy_map_value for multiple offset case Kumar Kartikeya Dwivedi
@ 2022-03-20 15:55 ` Kumar Kartikeya Dwivedi
  2022-03-20 15:55 ` [PATCH bpf-next v3 09/13] bpf: Wire up freeing of referenced kptr Kumar Kartikeya Dwivedi
                   ` (4 subsequent siblings)
  12 siblings, 0 replies; 44+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2022-03-20 15:55 UTC (permalink / raw)
  To: bpf
  Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Toke Høiland-Jørgensen, Jesper Dangaard Brouer

To support storing referenced PTR_TO_BTF_ID in maps, we require
associating a specific BTF ID with a 'destructor' kfunc. This is because
we need to release a live referenced pointer at a certain offset in map
value from the map destruction path, otherwise we end up leaking
resources.

Hence, introduce support for passing an array of btf_id, kfunc_btf_id
pairs that denote a BTF ID and its associated release function. Then,
add an accessor 'btf_find_dtor_kfunc' which can be used to look up the
destructor kfunc of a certain BTF ID. If found, we can use it to free
the object from the map free path.

The registration of these pairs also serves as a whitelist of
structures that are allowed as a referenced PTR_TO_BTF_ID in a BPF map,
because without finding the destructor kfunc, we bail out and return an
error.
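
A hedged sketch of a registration site (struct foo_obj and
foo_obj_release are hypothetical; only register_btf_id_dtor_kfuncs and
struct btf_id_dtor_kfunc come from this patch, while the
BTF_ID_LIST/BTF_ID macros already exist in include/linux/btf_ids.h):

	#include <linux/btf.h>
	#include <linux/btf_ids.h>
	#include <linux/module.h>

	BTF_ID_LIST(foo_dtor_ids)
	BTF_ID(struct, foo_obj)
	BTF_ID(func, foo_obj_release)

	static const struct btf_id_dtor_kfunc foo_dtors[] = {
		{
			.btf_id       = foo_dtor_ids[0],
			.kfunc_btf_id = foo_dtor_ids[1],
		},
	};

	static int __init foo_init(void)
	{
		/* must be called from an initcall/module init function */
		return register_btf_id_dtor_kfuncs(foo_dtors,
						   ARRAY_SIZE(foo_dtors),
						   THIS_MODULE);
	}
	module_init(foo_init);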

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
---
 include/linux/btf.h |  17 +++++++
 kernel/bpf/btf.c    | 108 ++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 125 insertions(+)

diff --git a/include/linux/btf.h b/include/linux/btf.h
index 5b578dc81c04..ff4be49b7a26 100644
--- a/include/linux/btf.h
+++ b/include/linux/btf.h
@@ -40,6 +40,11 @@ struct btf_kfunc_id_set {
 	};
 };
 
+struct btf_id_dtor_kfunc {
+	u32 btf_id;
+	u32 kfunc_btf_id;
+};
+
 extern const struct file_operations btf_fops;
 
 void btf_get(struct btf *btf);
@@ -346,6 +351,9 @@ bool btf_kfunc_id_set_contains(const struct btf *btf,
 			       enum btf_kfunc_type type, u32 kfunc_btf_id);
 int register_btf_kfunc_id_set(enum bpf_prog_type prog_type,
 			      const struct btf_kfunc_id_set *s);
+s32 btf_find_dtor_kfunc(struct btf *btf, u32 btf_id);
+int register_btf_id_dtor_kfuncs(const struct btf_id_dtor_kfunc *dtors, u32 add_cnt,
+				struct module *owner);
 #else
 static inline const struct btf_type *btf_type_by_id(const struct btf *btf,
 						    u32 type_id)
@@ -369,6 +377,15 @@ static inline int register_btf_kfunc_id_set(enum bpf_prog_type prog_type,
 {
 	return 0;
 }
+static inline s32 btf_find_dtor_kfunc(struct btf *btf, u32 btf_id)
+{
+	return -ENOENT;
+}
+static inline int register_btf_id_dtor_kfuncs(const struct btf_id_dtor_kfunc *dtors,
+					      u32 add_cnt, struct module *owner)
+{
+	return 0;
+}
 #endif
 
 #endif
diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c
index e36ad26a5a6e..9cb6f61a50a7 100644
--- a/kernel/bpf/btf.c
+++ b/kernel/bpf/btf.c
@@ -207,12 +207,18 @@ enum btf_kfunc_hook {
 
 enum {
 	BTF_KFUNC_SET_MAX_CNT = 32,
+	BTF_DTOR_KFUNC_MAX_CNT = 256,
 };
 
 struct btf_kfunc_set_tab {
 	struct btf_id_set *sets[BTF_KFUNC_HOOK_MAX][BTF_KFUNC_TYPE_MAX];
 };
 
+struct btf_id_dtor_kfunc_tab {
+	u32 cnt;
+	struct btf_id_dtor_kfunc dtors[];
+};
+
 struct btf {
 	void *data;
 	struct btf_type **types;
@@ -228,6 +234,7 @@ struct btf {
 	u32 id;
 	struct rcu_head rcu;
 	struct btf_kfunc_set_tab *kfunc_set_tab;
+	struct btf_id_dtor_kfunc_tab *dtor_kfunc_tab;
 
 	/* split BTF support */
 	struct btf *base_btf;
@@ -1614,8 +1621,19 @@ static void btf_free_kfunc_set_tab(struct btf *btf)
 	btf->kfunc_set_tab = NULL;
 }
 
+static void btf_free_dtor_kfunc_tab(struct btf *btf)
+{
+	struct btf_id_dtor_kfunc_tab *tab = btf->dtor_kfunc_tab;
+
+	if (!tab)
+		return;
+	kfree(tab);
+	btf->dtor_kfunc_tab = NULL;
+}
+
 static void btf_free(struct btf *btf)
 {
+	btf_free_dtor_kfunc_tab(btf);
 	btf_free_kfunc_set_tab(btf);
 	kvfree(btf->types);
 	kvfree(btf->resolved_sizes);
@@ -7018,6 +7036,96 @@ int register_btf_kfunc_id_set(enum bpf_prog_type prog_type,
 }
 EXPORT_SYMBOL_GPL(register_btf_kfunc_id_set);
 
+s32 btf_find_dtor_kfunc(struct btf *btf, u32 btf_id)
+{
+	struct btf_id_dtor_kfunc_tab *tab = btf->dtor_kfunc_tab;
+	struct btf_id_dtor_kfunc *dtor;
+
+	if (!tab)
+		return -ENOENT;
+	/* Even though the size of tab->dtors[0] is > sizeof(u32), we only need
+	 * to compare the first u32 with btf_id, so we can reuse btf_id_cmp_func.
+	 */
+	BUILD_BUG_ON(offsetof(struct btf_id_dtor_kfunc, btf_id) != 0);
+	dtor = bsearch(&btf_id, tab->dtors, tab->cnt, sizeof(tab->dtors[0]), btf_id_cmp_func);
+	if (!dtor)
+		return -ENOENT;
+	return dtor->kfunc_btf_id;
+}
+
+/* This function must be invoked only from initcalls/module init functions */
+int register_btf_id_dtor_kfuncs(const struct btf_id_dtor_kfunc *dtors, u32 add_cnt,
+				struct module *owner)
+{
+	struct btf_id_dtor_kfunc_tab *tab;
+	struct btf *btf;
+	u32 tab_cnt;
+	int ret;
+
+	btf = btf_get_module_btf(owner);
+	if (!btf) {
+		if (!owner && IS_ENABLED(CONFIG_DEBUG_INFO_BTF)) {
+			pr_err("missing vmlinux BTF, cannot register dtor kfuncs\n");
+			return -ENOENT;
+		}
+		if (owner && IS_ENABLED(CONFIG_DEBUG_INFO_BTF_MODULES)) {
+			pr_err("missing module BTF, cannot register dtor kfuncs\n");
+			return -ENOENT;
+		}
+		return 0;
+	}
+	if (IS_ERR(btf))
+		return PTR_ERR(btf);
+
+	if (add_cnt >= BTF_DTOR_KFUNC_MAX_CNT) {
+		pr_err("cannot register more than %d kfunc destructors\n", BTF_DTOR_KFUNC_MAX_CNT);
+		ret = -E2BIG;
+		goto end;
+	}
+
+	tab = btf->dtor_kfunc_tab;
+	/* Only one call allowed for modules */
+	if (WARN_ON_ONCE(tab && btf_is_module(btf))) {
+		ret = -EINVAL;
+		goto end;
+	}
+
+	tab_cnt = tab ? tab->cnt : 0;
+	if (tab_cnt > U32_MAX - add_cnt) {
+		ret = -EOVERFLOW;
+		goto end;
+	}
+	if (tab_cnt + add_cnt >= BTF_DTOR_KFUNC_MAX_CNT) {
+		pr_err("cannot register more than %d kfunc destructors\n", BTF_DTOR_KFUNC_MAX_CNT);
+		ret = -E2BIG;
+		goto end;
+	}
+
+	tab = krealloc(btf->dtor_kfunc_tab,
+		       offsetof(struct btf_id_dtor_kfunc_tab, dtors[tab_cnt + add_cnt]),
+		       GFP_KERNEL | __GFP_NOWARN);
+	if (!tab) {
+		ret = -ENOMEM;
+		goto end;
+	}
+
+	if (!btf->dtor_kfunc_tab)
+		tab->cnt = 0;
+	btf->dtor_kfunc_tab = tab;
+
+	memcpy(tab->dtors + tab->cnt, dtors, add_cnt * sizeof(tab->dtors[0]));
+	tab->cnt += add_cnt;
+
+	sort(tab->dtors, tab->cnt, sizeof(tab->dtors[0]), btf_id_cmp_func, NULL);
+
+	return 0;
+end:
+	btf_free_dtor_kfunc_tab(btf);
+	btf_put(btf);
+	return ret;
+}
+EXPORT_SYMBOL_GPL(register_btf_id_dtor_kfuncs);
+
 #define MAX_TYPES_ARE_COMPAT_DEPTH 2
 
 static
-- 
2.35.1


^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH bpf-next v3 09/13] bpf: Wire up freeing of referenced kptr
  2022-03-20 15:54 [PATCH bpf-next v3 00/13] Introduce typed pointer support in BPF maps Kumar Kartikeya Dwivedi
                   ` (7 preceding siblings ...)
  2022-03-20 15:55 ` [PATCH bpf-next v3 08/13] bpf: Populate pairs of btf_id and destructor kfunc in btf Kumar Kartikeya Dwivedi
@ 2022-03-20 15:55 ` Kumar Kartikeya Dwivedi
  2022-03-22 20:51   ` Andrii Nakryiko
  2022-03-22 21:10   ` Alexei Starovoitov
  2022-03-20 15:55 ` [PATCH bpf-next v3 10/13] bpf: Teach verifier about kptr_get kfunc helpers Kumar Kartikeya Dwivedi
                   ` (3 subsequent siblings)
  12 siblings, 2 replies; 44+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2022-03-20 15:55 UTC (permalink / raw)
  To: bpf
  Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Toke Høiland-Jørgensen, Jesper Dangaard Brouer

A destructor kfunc can be defined as void func(type *), where type may
be void or any other pointer type as per convenience.
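
For example (names purely illustrative), either of these prototypes
satisfies the check added below:

	void foo_obj_release(struct foo_obj *obj);	/* typed argument */
	void bar_obj_release(void *obj);		/* void * argument */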

In this patch, we ensure that the type is sane and capture the function
pointer into the off_desc of the kptr_off_tab for the specific pointer
offset, with the invariant that the dtor pointer is always set when the
'kptr_ref' tag is applied to the pointer's pointee type, which is
indicated by the flag BPF_MAP_VALUE_OFF_F_REF.

Note that only BTF IDs whose destructor kfunc is registered are allowed
to be embedded as a referenced kptr. Hence, this lookup serves both to
find the dtor kfunc's BTF ID and to act as the whitelist check of BTF
IDs allowed for this purpose.

Finally, wire up the actual freeing of the referenced pointer, if any,
at all available offsets, so that no references are leaked after the BPF
map goes away while a BPF program had previously moved the ownership of
a referenced pointer into it.

The behavior is similar to BPF timers, where bpf_map_{update,delete}_elem
will free any existing referenced kptr. The same applies to the LRU map's
bpf_lru_push_free/htab_lru_push_free functions, which are extended to
reset unreferenced and free referenced kptrs.

Note that unlike BPF timers, kptr is not reset or freed when map uref
drops to zero.

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
---
 include/linux/bpf.h   |  4 ++
 include/linux/btf.h   |  2 +
 kernel/bpf/arraymap.c | 14 ++++++-
 kernel/bpf/btf.c      | 86 ++++++++++++++++++++++++++++++++++++++++++-
 kernel/bpf/hashtab.c  | 29 ++++++++++-----
 kernel/bpf/syscall.c  | 57 +++++++++++++++++++++++++---
 6 files changed, 173 insertions(+), 19 deletions(-)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 6474d2d44b78..ae52602fdfbf 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -23,6 +23,7 @@
 #include <linux/slab.h>
 #include <linux/percpu-refcount.h>
 #include <linux/bpfptr.h>
+#include <linux/btf.h>
 
 struct bpf_verifier_env;
 struct bpf_verifier_log;
@@ -172,6 +173,8 @@ struct bpf_map_value_off_desc {
 	u32 offset;
 	u32 btf_id;
 	struct btf *btf;
+	struct module *module;
+	btf_dtor_kfunc_t dtor;
 	int flags;
 };
 
@@ -1551,6 +1554,7 @@ struct bpf_map_value_off_desc *bpf_map_kptr_off_contains(struct bpf_map *map, u3
 void bpf_map_free_kptr_off_tab(struct bpf_map *map);
 struct bpf_map_value_off *bpf_map_copy_kptr_off_tab(const struct bpf_map *map);
 bool bpf_map_equal_kptr_off_tab(const struct bpf_map *map_a, const struct bpf_map *map_b);
+void bpf_map_free_kptr(struct bpf_map *map, void *map_value);
 
 struct bpf_map *bpf_map_get(u32 ufd);
 struct bpf_map *bpf_map_get_with_uref(u32 ufd);
diff --git a/include/linux/btf.h b/include/linux/btf.h
index ff4be49b7a26..8acf728c8616 100644
--- a/include/linux/btf.h
+++ b/include/linux/btf.h
@@ -45,6 +45,8 @@ struct btf_id_dtor_kfunc {
 	u32 kfunc_btf_id;
 };
 
+typedef void (*btf_dtor_kfunc_t)(void *);
+
 extern const struct file_operations btf_fops;
 
 void btf_get(struct btf *btf);
diff --git a/kernel/bpf/arraymap.c b/kernel/bpf/arraymap.c
index 7f145aefbff8..3cc2884321e7 100644
--- a/kernel/bpf/arraymap.c
+++ b/kernel/bpf/arraymap.c
@@ -287,10 +287,12 @@ static int array_map_get_next_key(struct bpf_map *map, void *key, void *next_key
 	return 0;
 }
 
-static void check_and_free_timer_in_array(struct bpf_array *arr, void *val)
+static void check_and_free_timer_and_kptr(struct bpf_array *arr, void *val)
 {
 	if (unlikely(map_value_has_timer(&arr->map)))
 		bpf_timer_cancel_and_free(val + arr->map.timer_off);
+	if (unlikely(map_value_has_kptr(&arr->map)))
+		bpf_map_free_kptr(&arr->map, val);
 }
 
 /* Called from syscall or from eBPF program */
@@ -327,7 +329,7 @@ static int array_map_update_elem(struct bpf_map *map, void *key, void *value,
 			copy_map_value_locked(map, val, value, false);
 		else
 			copy_map_value(map, val, value);
-		check_and_free_timer_in_array(array, val);
+		check_and_free_timer_and_kptr(array, val);
 	}
 	return 0;
 }
@@ -386,6 +388,7 @@ static void array_map_free_timers(struct bpf_map *map)
 	struct bpf_array *array = container_of(map, struct bpf_array, map);
 	int i;
 
+	/* We don't reset or free kptr on uref dropping to zero. */
 	if (likely(!map_value_has_timer(map)))
 		return;
 
@@ -398,6 +401,13 @@ static void array_map_free_timers(struct bpf_map *map)
 static void array_map_free(struct bpf_map *map)
 {
 	struct bpf_array *array = container_of(map, struct bpf_array, map);
+	int i;
+
+	if (unlikely(map_value_has_kptr(map))) {
+		for (i = 0; i < array->map.max_entries; i++)
+			bpf_map_free_kptr(map, array->value + array->elem_size * i);
+		bpf_map_free_kptr_off_tab(map);
+	}
 
 	if (array->map.map_type == BPF_MAP_TYPE_PERCPU_ARRAY)
 		bpf_array_free_percpu(array);
diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c
index 9cb6f61a50a7..6227c1be6326 100644
--- a/kernel/bpf/btf.c
+++ b/kernel/bpf/btf.c
@@ -3408,6 +3408,7 @@ struct bpf_map_value_off *btf_find_kptr(const struct btf *btf,
 	/* btf_find_field requires array of size max + 1 */
 	struct btf_field_info info_arr[BPF_MAP_VALUE_OFF_MAX + 1];
 	struct bpf_map_value_off *tab;
+	struct module *mod = NULL;
 	int ret, i, nr_off;
 
 	/* Revisit stack usage when bumping BPF_MAP_VALUE_OFF_MAX */
@@ -3438,16 +3439,99 @@ struct bpf_map_value_off *btf_find_kptr(const struct btf *btf,
 			goto end;
 		}
 
+		/* Find and stash the function pointer for the destruction function that
+		 * needs to be eventually invoked from the map free path.
+		 */
+		if (info_arr[i].flags & BPF_MAP_VALUE_OFF_F_REF) {
+			const struct btf_type *dtor_func, *dtor_func_proto;
+			const struct btf_param *args;
+			const char *dtor_func_name;
+			unsigned long addr;
+			s32 dtor_btf_id;
+			u32 nr_args;
+
+			/* This call also serves as a whitelist of allowed objects that
+			 * can be used as a referenced pointer and be stored in a map at
+			 * the same time.
+			 */
+			dtor_btf_id = btf_find_dtor_kfunc(off_btf, id);
+			if (dtor_btf_id < 0) {
+				ret = dtor_btf_id;
+				btf_put(off_btf);
+				goto end;
+			}
+
+			dtor_func = btf_type_by_id(off_btf, dtor_btf_id);
+			if (!dtor_func || !btf_type_is_func(dtor_func)) {
+				ret = -EINVAL;
+				btf_put(off_btf);
+				goto end;
+			}
+
+			dtor_func_proto = btf_type_by_id(off_btf, dtor_func->type);
+			if (!dtor_func_proto || !btf_type_is_func_proto(dtor_func_proto)) {
+				ret = -EINVAL;
+				btf_put(off_btf);
+				goto end;
+			}
+
+			/* Make sure the prototype of the destructor kfunc is 'void func(type *)' */
+			t = btf_type_by_id(off_btf, dtor_func_proto->type);
+			if (!t || !btf_type_is_void(t)) {
+				ret = -EINVAL;
+				btf_put(off_btf);
+				goto end;
+			}
+
+			nr_args = btf_type_vlen(dtor_func_proto);
+			args = btf_params(dtor_func_proto);
+
+			t = NULL;
+			if (nr_args)
+				t = btf_type_by_id(off_btf, args[0].type);
+			/* Allow any pointer type, as width on targets Linux supports
+			 * will be same for all pointer types (i.e. sizeof(void *))
+			 */
+			if (nr_args != 1 || !t || !btf_type_is_ptr(t)) {
+				ret = -EINVAL;
+				btf_put(off_btf);
+				goto end;
+			}
+
+			if (btf_is_module(btf)) {
+				mod = btf_try_get_module(off_btf);
+				if (!mod) {
+					ret = -ENXIO;
+					btf_put(off_btf);
+					goto end;
+				}
+			}
+
+			dtor_func_name = __btf_name_by_offset(off_btf, dtor_func->name_off);
+			addr = kallsyms_lookup_name(dtor_func_name);
+			if (!addr) {
+				ret = -EINVAL;
+				module_put(mod);
+				btf_put(off_btf);
+				goto end;
+			}
+			tab->off[i].dtor = (void *)addr;
+		}
+
 		tab->off[i].offset = info_arr[i].off;
 		tab->off[i].btf_id = id;
 		tab->off[i].btf = off_btf;
+		tab->off[i].module = mod;
 		tab->off[i].flags = info_arr[i].flags;
 		tab->nr_off = i + 1;
 	}
 	return tab;
 end:
-	while (tab->nr_off--)
+	while (tab->nr_off--) {
 		btf_put(tab->off[tab->nr_off].btf);
+		if (tab->off[tab->nr_off].module)
+			module_put(tab->off[tab->nr_off].module);
+	}
 	kfree(tab);
 	return ERR_PTR(ret);
 }
diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c
index 65877967f414..fa4a0a8754c5 100644
--- a/kernel/bpf/hashtab.c
+++ b/kernel/bpf/hashtab.c
@@ -725,12 +725,16 @@ static int htab_lru_map_gen_lookup(struct bpf_map *map,
 	return insn - insn_buf;
 }
 
-static void check_and_free_timer(struct bpf_htab *htab, struct htab_elem *elem)
+static void check_and_free_timer_and_kptr(struct bpf_htab *htab,
+					  struct htab_elem *elem,
+					  bool free_kptr)
 {
+	void *map_value = elem->key + round_up(htab->map.key_size, 8);
+
 	if (unlikely(map_value_has_timer(&htab->map)))
-		bpf_timer_cancel_and_free(elem->key +
-					  round_up(htab->map.key_size, 8) +
-					  htab->map.timer_off);
+		bpf_timer_cancel_and_free(map_value + htab->map.timer_off);
+	if (unlikely(map_value_has_kptr(&htab->map)) && free_kptr)
+		bpf_map_free_kptr(&htab->map, map_value);
 }
 
 /* It is called from the bpf_lru_list when the LRU needs to delete
@@ -757,7 +761,7 @@ static bool htab_lru_map_delete_node(void *arg, struct bpf_lru_node *node)
 	hlist_nulls_for_each_entry_rcu(l, n, head, hash_node)
 		if (l == tgt_l) {
 			hlist_nulls_del_rcu(&l->hash_node);
-			check_and_free_timer(htab, l);
+			check_and_free_timer_and_kptr(htab, l, true);
 			break;
 		}
 
@@ -829,7 +833,7 @@ static void htab_elem_free(struct bpf_htab *htab, struct htab_elem *l)
 {
 	if (htab->map.map_type == BPF_MAP_TYPE_PERCPU_HASH)
 		free_percpu(htab_elem_get_ptr(l, htab->map.key_size));
-	check_and_free_timer(htab, l);
+	check_and_free_timer_and_kptr(htab, l, true);
 	kfree(l);
 }
 
@@ -857,7 +861,7 @@ static void free_htab_elem(struct bpf_htab *htab, struct htab_elem *l)
 	htab_put_fd_value(htab, l);
 
 	if (htab_is_prealloc(htab)) {
-		check_and_free_timer(htab, l);
+		check_and_free_timer_and_kptr(htab, l, true);
 		__pcpu_freelist_push(&htab->freelist, &l->fnode);
 	} else {
 		atomic_dec(&htab->count);
@@ -1104,7 +1108,7 @@ static int htab_map_update_elem(struct bpf_map *map, void *key, void *value,
 		if (!htab_is_prealloc(htab))
 			free_htab_elem(htab, l_old);
 		else
-			check_and_free_timer(htab, l_old);
+			check_and_free_timer_and_kptr(htab, l_old, true);
 	}
 	ret = 0;
 err:
@@ -1114,7 +1118,7 @@ static int htab_map_update_elem(struct bpf_map *map, void *key, void *value,
 
 static void htab_lru_push_free(struct bpf_htab *htab, struct htab_elem *elem)
 {
-	check_and_free_timer(htab, elem);
+	check_and_free_timer_and_kptr(htab, elem, true);
 	bpf_lru_push_free(&htab->lru, &elem->lru_node);
 }
 
@@ -1420,7 +1424,10 @@ static void htab_free_malloced_timers(struct bpf_htab *htab)
 		struct htab_elem *l;
 
 		hlist_nulls_for_each_entry(l, n, head, hash_node)
-			check_and_free_timer(htab, l);
+			/* We don't reset or free kptr on uref dropping to zero,
+			 * hence set free_kptr to false.
+			 */
+			check_and_free_timer_and_kptr(htab, l, false);
 		cond_resched_rcu();
 	}
 	rcu_read_unlock();
@@ -1430,6 +1437,7 @@ static void htab_map_free_timers(struct bpf_map *map)
 {
 	struct bpf_htab *htab = container_of(map, struct bpf_htab, map);
 
+	/* We don't reset or free kptr on uref dropping to zero. */
 	if (likely(!map_value_has_timer(&htab->map)))
 		return;
 	if (!htab_is_prealloc(htab))
@@ -1458,6 +1466,7 @@ static void htab_map_free(struct bpf_map *map)
 	else
 		prealloc_destroy(htab);
 
+	bpf_map_free_kptr_off_tab(map);
 	free_percpu(htab->extra_elems);
 	bpf_map_area_free(htab->buckets);
 	for (i = 0; i < HASHTAB_MAP_LOCK_COUNT; i++)
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 7b32537bd81f..3901a049fe2a 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -508,8 +508,11 @@ void bpf_map_free_kptr_off_tab(struct bpf_map *map)
 	if (!map_value_has_kptr(map))
 		return;
 	for (i = 0; i < tab->nr_off; i++) {
+		struct module *mod = tab->off[i].module;
 		struct btf *btf = tab->off[i].btf;
 
+		if (mod)
+			module_put(mod);
 		btf_put(btf);
 	}
 	kfree(tab);
@@ -524,8 +527,16 @@ struct bpf_map_value_off *bpf_map_copy_kptr_off_tab(const struct bpf_map *map)
 	if (!map_value_has_kptr(map))
 		return ERR_PTR(-ENOENT);
 	/* Do a deep copy of the kptr_off_tab */
-	for (i = 0; i < tab->nr_off; i++)
-		btf_get(tab->off[i].btf);
+	for (i = 0; i < tab->nr_off; i++) {
+		struct module *mod = tab->off[i].module;
+		struct btf *btf = tab->off[i].btf;
+
+		if (mod && !try_module_get(mod)) {
+			ret = -ENXIO;
+			goto end;
+		}
+		btf_get(btf);
+	}
 
 	size = offsetof(struct bpf_map_value_off, off[tab->nr_off]);
 	new_tab = kzalloc(size, GFP_KERNEL | __GFP_NOWARN);
@@ -536,8 +547,14 @@ struct bpf_map_value_off *bpf_map_copy_kptr_off_tab(const struct bpf_map *map)
 	memcpy(new_tab, tab, size);
 	return new_tab;
 end:
-	while (i--)
-		btf_put(tab->off[i].btf);
+	while (i--) {
+		struct module *mod = tab->off[i].module;
+		struct btf *btf = tab->off[i].btf;
+
+		if (mod)
+			module_put(mod);
+		btf_put(btf);
+	}
 	return ERR_PTR(ret);
 }
 
@@ -557,15 +574,43 @@ bool bpf_map_equal_kptr_off_tab(const struct bpf_map *map_a, const struct bpf_ma
 	return !memcmp(tab_a, tab_b, size);
 }
 
+/* Caller must ensure map_value_has_kptr is true. Note that this function can be
+ * called on a map value while the map_value is visible to BPF programs, as it
+ * ensures the correct synchronization, and we already enforce the same using
+ * the verifier on the BPF program side, esp. for referenced pointers.
+ */
+void bpf_map_free_kptr(struct bpf_map *map, void *map_value)
+{
+	struct bpf_map_value_off *tab = map->kptr_off_tab;
+	unsigned long *btf_id_ptr;
+	int i;
+
+	for (i = 0; i < tab->nr_off; i++) {
+		struct bpf_map_value_off_desc *off_desc = &tab->off[i];
+		unsigned long old_ptr;
+
+		btf_id_ptr = map_value + off_desc->offset;
+		if (!(off_desc->flags & BPF_MAP_VALUE_OFF_F_REF)) {
+			u64 *p = (u64 *)btf_id_ptr;
+
+			WRITE_ONCE(p, 0);
+			continue;
+		}
+		old_ptr = xchg(btf_id_ptr, 0);
+		off_desc->dtor((void *)old_ptr);
+	}
+}
+
 /* called from workqueue */
 static void bpf_map_free_deferred(struct work_struct *work)
 {
 	struct bpf_map *map = container_of(work, struct bpf_map, work);
 
 	security_bpf_map_free(map);
-	bpf_map_free_kptr_off_tab(map);
 	bpf_map_release_memcg(map);
-	/* implementation dependent freeing */
+	/* implementation dependent freeing, map_free callback also does
+	 * bpf_map_free_kptr_off_tab, if needed.
+	 */
 	map->ops->map_free(map);
 }
 
-- 
2.35.1


^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH bpf-next v3 10/13] bpf: Teach verifier about kptr_get kfunc helpers
  2022-03-20 15:54 [PATCH bpf-next v3 00/13] Introduce typed pointer support in BPF maps Kumar Kartikeya Dwivedi
                   ` (8 preceding siblings ...)
  2022-03-20 15:55 ` [PATCH bpf-next v3 09/13] bpf: Wire up freeing of referenced kptr Kumar Kartikeya Dwivedi
@ 2022-03-20 15:55 ` Kumar Kartikeya Dwivedi
  2022-03-20 15:55 ` [PATCH bpf-next v3 11/13] libbpf: Add kptr type tag macros to bpf_helpers.h Kumar Kartikeya Dwivedi
                   ` (2 subsequent siblings)
  12 siblings, 0 replies; 44+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2022-03-20 15:55 UTC (permalink / raw)
  To: bpf
  Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Toke Høiland-Jørgensen, Jesper Dangaard Brouer

We introduce a new style of kfunc helpers, namely *_kptr_get, which
take a pointer to the map value that points to a referenced kernel
pointer contained in the map. Since this is referenced, only
bpf_kptr_xchg from the BPF side and xchg from the kernel side are
allowed to change the current value, and each pointer that resides in
that location will be referenced and RCU protected (this must be kept
in mind while adding kernel types embeddable as a referenced kptr in
BPF maps).

This means that if we do the load of the pointer value in an RCU read
section and find a live pointer, then as long as we hold the RCU read
lock, it won't be freed by a parallel xchg + release operation. This
allows us to implement a safe refcount increment scheme. Hence, enforce
that the first argument of all such kfuncs is a proper PTR_TO_MAP_VALUE
pointing at the right offset of the referenced pointer.
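
A hedged kernel-side sketch of such a kfunc (the type, refcount field
and function names are illustrative; this patch only mandates the
calling convention checked below, not this exact body):

	#include <linux/rcupdate.h>
	#include <linux/refcount.h>

	struct foo_obj {
		refcount_t ref;
		/* ... */
	};

	/* kptrp points into the map value, at the referenced kptr's offset */
	struct foo_obj *bpf_foo_kptr_get(struct foo_obj **kptrp)
	{
		struct foo_obj *obj;

		rcu_read_lock();
		/* load of the current pointer happens under RCU */
		obj = READ_ONCE(*kptrp);
		if (obj && !refcount_inc_not_zero(&obj->ref))
			obj = NULL;
		rcu_read_unlock();

		return obj; /* NULL, or an acquired reference */
	}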

The rest of the arguments are subjected to the typical kfunc argument
checks, hence allowing some flexibility in conveying more intent about
how the reference should be taken.

For instance, in the case of struct nf_conn, it is not freed until the
RCU grace period ends, but it can still be reused for another tuple once
its refcount has dropped to zero. Hence, a bpf_ct_kptr_get helper not
only needs to call refcount_inc_not_zero, but must also do a tuple match
after incrementing the reference, and when it fails to match, put the
reference again and return NULL.

This can be implemented easily if we allow passing additional parameters
to the bpf_ct_kptr_get kfunc, like a struct bpf_sock_tuple * and a
tuple__sz pair.

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
---
 include/linux/btf.h |  2 ++
 kernel/bpf/btf.c    | 61 +++++++++++++++++++++++++++++++++++++++++----
 2 files changed, 58 insertions(+), 5 deletions(-)

diff --git a/include/linux/btf.h b/include/linux/btf.h
index 8acf728c8616..d5d37bfde8df 100644
--- a/include/linux/btf.h
+++ b/include/linux/btf.h
@@ -17,6 +17,7 @@ enum btf_kfunc_type {
 	BTF_KFUNC_TYPE_ACQUIRE,
 	BTF_KFUNC_TYPE_RELEASE,
 	BTF_KFUNC_TYPE_RET_NULL,
+	BTF_KFUNC_TYPE_KPTR_ACQUIRE,
 	BTF_KFUNC_TYPE_MAX,
 };
 
@@ -35,6 +36,7 @@ struct btf_kfunc_id_set {
 			struct btf_id_set *acquire_set;
 			struct btf_id_set *release_set;
 			struct btf_id_set *ret_null_set;
+			struct btf_id_set *kptr_acquire_set;
 		};
 		struct btf_id_set *sets[BTF_KFUNC_TYPE_MAX];
 	};
diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c
index 6227c1be6326..e11e44f83301 100644
--- a/kernel/bpf/btf.c
+++ b/kernel/bpf/btf.c
@@ -6060,11 +6060,11 @@ static int btf_check_func_arg_match(struct bpf_verifier_env *env,
 	struct bpf_verifier_log *log = &env->log;
 	u32 i, nargs, ref_id, ref_obj_id = 0;
 	bool is_kfunc = btf_is_kernel(btf);
+	bool rel = false, kptr_get = false;
 	const char *func_name, *ref_tname;
 	const struct btf_type *t, *ref_t;
 	const struct btf_param *args;
 	int ref_regno = 0, ret;
-	bool rel = false;
 
 	t = btf_type_by_id(btf, func_id);
 	if (!t || !btf_type_is_func(t)) {
@@ -6090,10 +6090,14 @@ static int btf_check_func_arg_match(struct bpf_verifier_env *env,
 		return -EINVAL;
 	}
 
-	/* Only kfunc can be release func */
-	if (is_kfunc)
+	if (is_kfunc) {
+		/* Only kfunc can be release func */
 		rel = btf_kfunc_id_set_contains(btf, resolve_prog_type(env->prog),
 						BTF_KFUNC_TYPE_RELEASE, func_id);
+		kptr_get = btf_kfunc_id_set_contains(btf, resolve_prog_type(env->prog),
+						     BTF_KFUNC_TYPE_KPTR_ACQUIRE, func_id);
+	}
+
 	/* check that BTF function arguments match actual types that the
 	 * verifier sees.
 	 */
@@ -6122,8 +6126,55 @@ static int btf_check_func_arg_match(struct bpf_verifier_env *env,
 		if (ret < 0)
 			return ret;
 
-		if (btf_get_prog_ctx_type(log, btf, t,
-					  env->prog->type, i)) {
+		/* kptr_get is only true for kfunc */
+		if (i == 0 && kptr_get) {
+			struct bpf_map_value_off_desc *off_desc;
+
+			if (reg->type != PTR_TO_MAP_VALUE) {
+				bpf_log(log, "arg#0 expected pointer to map value\n");
+				return -EINVAL;
+			}
+
+			ret = check_mem_reg(env, reg, regno, sizeof(u64));
+			if (ret < 0)
+				return ret;
+
+			/* check_func_arg_reg_off allows var_off for
+			 * PTR_TO_MAP_VALUE, but we need fixed offset to find
+			 * off_desc.
+			 */
+			if (!tnum_is_const(reg->var_off)) {
+				bpf_log(log, "arg#0 must have constant offset\n");
+				return -EINVAL;
+			}
+
+			off_desc = bpf_map_kptr_off_contains(reg->map_ptr, reg->off + reg->var_off.value);
+			if (!off_desc || !(off_desc->flags & BPF_MAP_VALUE_OFF_F_REF)) {
+				bpf_log(log, "arg#0 no referenced kptr at map value offset=%llu\n",
+					reg->off + reg->var_off.value);
+				return -EINVAL;
+			}
+
+			if (!btf_type_is_ptr(ref_t)) {
+				bpf_log(log, "arg#0 BTF type must be a double pointer\n");
+				return -EINVAL;
+			}
+
+			ref_t = btf_type_skip_modifiers(btf, ref_t->type, &ref_id);
+			ref_tname = btf_name_by_offset(btf, ref_t->name_off);
+
+			if (!btf_type_is_struct(ref_t)) {
+				bpf_log(log, "kernel function %s args#%d pointer type %s %s is not supported\n",
+					func_name, i, btf_type_str(ref_t), ref_tname);
+				return -EINVAL;
+			}
+			if (!btf_struct_ids_match(log, btf, ref_id, 0, off_desc->btf, off_desc->btf_id)) {
+				bpf_log(log, "kernel function %s args#%d expected pointer to %s %s\n",
+					func_name, i, btf_type_str(ref_t), ref_tname);
+				return -EINVAL;
+			}
+			/* rest of the arguments can be anything, like normal kfunc */
+		} else if (btf_get_prog_ctx_type(log, btf, t, env->prog->type, i)) {
 			/* If function expects ctx type in BTF check that caller
 			 * is passing PTR_TO_CTX.
 			 */
-- 
2.35.1


^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH bpf-next v3 11/13] libbpf: Add kptr type tag macros to bpf_helpers.h
  2022-03-20 15:54 [PATCH bpf-next v3 00/13] Introduce typed pointer support in BPF maps Kumar Kartikeya Dwivedi
                   ` (9 preceding siblings ...)
  2022-03-20 15:55 ` [PATCH bpf-next v3 10/13] bpf: Teach verifier about kptr_get kfunc helpers Kumar Kartikeya Dwivedi
@ 2022-03-20 15:55 ` Kumar Kartikeya Dwivedi
  2022-03-20 15:55 ` [PATCH bpf-next v3 12/13] selftests/bpf: Add C tests for kptr Kumar Kartikeya Dwivedi
  2022-03-20 15:55 ` [PATCH bpf-next v3 13/13] selftests/bpf: Add verifier " Kumar Kartikeya Dwivedi
  12 siblings, 0 replies; 44+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2022-03-20 15:55 UTC (permalink / raw)
  To: bpf
  Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Toke Høiland-Jørgensen, Jesper Dangaard Brouer

Include convenience definitions:
__kptr:		Unreferenced BTF ID pointer
__kptr_ref:	Referenced BTF ID pointer

Users can use them to tag the pointer type meant to be used with the new
support directly in the map value definition.
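
For example, the map value type used by the selftests added in the next
patch tags its members as follows:

	struct map_value {
		struct prog_test_ref_kfunc __kptr *unref_ptr;
		struct prog_test_ref_kfunc __kptr_ref *ref_ptr;
	};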

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
---
 tools/lib/bpf/bpf_helpers.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/tools/lib/bpf/bpf_helpers.h b/tools/lib/bpf/bpf_helpers.h
index 44df982d2a5c..bbae9a057bc8 100644
--- a/tools/lib/bpf/bpf_helpers.h
+++ b/tools/lib/bpf/bpf_helpers.h
@@ -149,6 +149,8 @@ enum libbpf_tristate {
 
 #define __kconfig __attribute__((section(".kconfig")))
 #define __ksym __attribute__((section(".ksyms")))
+#define __kptr __attribute__((btf_type_tag("kptr")))
+#define __kptr_ref __attribute__((btf_type_tag("kptr_ref")))
 
 #ifndef ___bpf_concat
 #define ___bpf_concat(a, b) a ## b
-- 
2.35.1


^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH bpf-next v3 12/13] selftests/bpf: Add C tests for kptr
  2022-03-20 15:54 [PATCH bpf-next v3 00/13] Introduce typed pointer support in BPF maps Kumar Kartikeya Dwivedi
                   ` (10 preceding siblings ...)
  2022-03-20 15:55 ` [PATCH bpf-next v3 11/13] libbpf: Add kptr type tag macros to bpf_helpers.h Kumar Kartikeya Dwivedi
@ 2022-03-20 15:55 ` Kumar Kartikeya Dwivedi
  2022-03-22 21:00   ` Andrii Nakryiko
  2022-03-24  9:10   ` Jiri Olsa
  2022-03-20 15:55 ` [PATCH bpf-next v3 13/13] selftests/bpf: Add verifier " Kumar Kartikeya Dwivedi
  12 siblings, 2 replies; 44+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2022-03-20 15:55 UTC (permalink / raw)
  To: bpf
  Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Toke Høiland-Jørgensen, Jesper Dangaard Brouer

This uses the __kptr and __kptr_ref macros as well, and focuses on
exercising the cases that are supposed to work, since the negative
tests live in the test_verifier suite. Also include some code to test
map-in-map support, checking that the inner_map_meta matches the
kptr_off_tab of the map added as an element.

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
---
 .../selftests/bpf/prog_tests/map_kptr.c       |  20 ++
 tools/testing/selftests/bpf/progs/map_kptr.c  | 194 ++++++++++++++++++
 2 files changed, 214 insertions(+)
 create mode 100644 tools/testing/selftests/bpf/prog_tests/map_kptr.c
 create mode 100644 tools/testing/selftests/bpf/progs/map_kptr.c

diff --git a/tools/testing/selftests/bpf/prog_tests/map_kptr.c b/tools/testing/selftests/bpf/prog_tests/map_kptr.c
new file mode 100644
index 000000000000..688732295ce9
--- /dev/null
+++ b/tools/testing/selftests/bpf/prog_tests/map_kptr.c
@@ -0,0 +1,20 @@
+// SPDX-License-Identifier: GPL-2.0
+#include <test_progs.h>
+
+#include "map_kptr.skel.h"
+
+void test_map_kptr(void)
+{
+	struct map_kptr *skel;
+	char buf[24];
+	int key = 0;
+
+	skel = map_kptr__open_and_load();
+	if (!ASSERT_OK_PTR(skel, "map_kptr__open_and_load"))
+		return;
+	ASSERT_OK(bpf_map_update_elem(bpf_map__fd(skel->maps.hash_map), &key, buf, 0),
+		  "bpf_map_update_elem hash_map");
+	ASSERT_OK(bpf_map_update_elem(bpf_map__fd(skel->maps.hash_malloc_map), &key, buf, 0),
+		  "bpf_map_update_elem hash_malloc_map");
+	map_kptr__destroy(skel);
+}
diff --git a/tools/testing/selftests/bpf/progs/map_kptr.c b/tools/testing/selftests/bpf/progs/map_kptr.c
new file mode 100644
index 000000000000..75df3dc05db2
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/map_kptr.c
@@ -0,0 +1,194 @@
+// SPDX-License-Identifier: GPL-2.0
+#include <vmlinux.h>
+#include <bpf/bpf_tracing.h>
+#include <bpf/bpf_helpers.h>
+
+struct map_value {
+	struct prog_test_ref_kfunc __kptr *unref_ptr;
+	struct prog_test_ref_kfunc __kptr_ref *ref_ptr;
+};
+
+struct array_map {
+	__uint(type, BPF_MAP_TYPE_ARRAY);
+	__type(key, int);
+	__type(value, struct map_value);
+	__uint(max_entries, 1);
+} array_map SEC(".maps");
+
+struct hash_map {
+	__uint(type, BPF_MAP_TYPE_HASH);
+	__type(key, int);
+	__type(value, struct map_value);
+	__uint(max_entries, 1);
+} hash_map SEC(".maps");
+
+struct hash_malloc_map {
+	__uint(type, BPF_MAP_TYPE_HASH);
+	__type(key, int);
+	__type(value, struct map_value);
+	__uint(max_entries, 1);
+	__uint(map_flags, BPF_F_NO_PREALLOC);
+} hash_malloc_map SEC(".maps");
+
+struct lru_hash_map {
+	__uint(type, BPF_MAP_TYPE_LRU_HASH);
+	__type(key, int);
+	__type(value, struct map_value);
+	__uint(max_entries, 1);
+} lru_hash_map SEC(".maps");
+
+#define DEFINE_MAP_OF_MAP(map_type, inner_map_type, name)       \
+	struct {                                                \
+		__uint(type, map_type);                         \
+		__uint(max_entries, 1);                         \
+		__uint(key_size, sizeof(int));                  \
+		__uint(value_size, sizeof(int));                \
+		__array(values, struct inner_map_type);         \
+	} name SEC(".maps") = {                                 \
+		.values = { [0] = &inner_map_type },            \
+	}
+
+DEFINE_MAP_OF_MAP(BPF_MAP_TYPE_ARRAY_OF_MAPS, array_map, array_of_array_maps);
+DEFINE_MAP_OF_MAP(BPF_MAP_TYPE_ARRAY_OF_MAPS, hash_map, array_of_hash_maps);
+DEFINE_MAP_OF_MAP(BPF_MAP_TYPE_ARRAY_OF_MAPS, hash_malloc_map, array_of_hash_malloc_maps);
+DEFINE_MAP_OF_MAP(BPF_MAP_TYPE_ARRAY_OF_MAPS, lru_hash_map, array_of_lru_hash_maps);
+DEFINE_MAP_OF_MAP(BPF_MAP_TYPE_HASH_OF_MAPS, array_map, hash_of_array_maps);
+DEFINE_MAP_OF_MAP(BPF_MAP_TYPE_HASH_OF_MAPS, hash_map, hash_of_hash_maps);
+DEFINE_MAP_OF_MAP(BPF_MAP_TYPE_HASH_OF_MAPS, hash_malloc_map, hash_of_hash_malloc_maps);
+DEFINE_MAP_OF_MAP(BPF_MAP_TYPE_HASH_OF_MAPS, lru_hash_map, hash_of_lru_hash_maps);
+
+extern struct prog_test_ref_kfunc *bpf_kfunc_call_test_acquire(unsigned long *sp) __ksym;
+extern struct prog_test_ref_kfunc *
+bpf_kfunc_call_test_kptr_get(struct prog_test_ref_kfunc **p, int a, int b) __ksym;
+extern void bpf_kfunc_call_test_release(struct prog_test_ref_kfunc *p) __ksym;
+
+static __always_inline
+void test_kptr_unref(struct map_value *v)
+{
+	struct prog_test_ref_kfunc *p;
+
+	p = v->unref_ptr;
+	/* store untrusted_ptr_or_null_ */
+	v->unref_ptr = p;
+	if (!p)
+		return;
+	if (p->a + p->b > 100)
+		return;
+	/* store untrusted_ptr_ */
+	v->unref_ptr = p;
+	/* store NULL */
+	v->unref_ptr = NULL;
+}
+
+static __always_inline
+void test_kptr_ref(struct map_value *v)
+{
+	struct prog_test_ref_kfunc *p;
+
+	p = v->ref_ptr;
+	/* store ptr_or_null_ */
+	v->unref_ptr = p;
+	if (!p)
+		return;
+	if (p->a + p->b > 100)
+		return;
+	/* store NULL */
+	p = bpf_kptr_xchg(&v->ref_ptr, NULL);
+	if (!p)
+		return;
+	if (p->a + p->b > 100) {
+		bpf_kfunc_call_test_release(p);
+		return;
+	}
+	/* store ptr_ */
+	v->unref_ptr = p;
+	bpf_kfunc_call_test_release(p);
+
+	p = bpf_kfunc_call_test_acquire(&(unsigned long){0});
+	if (!p)
+		return;
+	/* store ptr_ */
+	p = bpf_kptr_xchg(&v->ref_ptr, p);
+	if (!p)
+		return;
+	if (p->a + p->b > 100) {
+		bpf_kfunc_call_test_release(p);
+		return;
+	}
+	bpf_kfunc_call_test_release(p);
+}
+
+static __always_inline
+void test_kptr_get(struct map_value *v)
+{
+	struct prog_test_ref_kfunc *p;
+
+	p = bpf_kfunc_call_test_kptr_get(&v->ref_ptr, 0, 0);
+	if (!p)
+		return;
+	if (p->a + p->b > 100) {
+		bpf_kfunc_call_test_release(p);
+		return;
+	}
+	bpf_kfunc_call_test_release(p);
+}
+
+static __always_inline
+void test_kptr(struct map_value *v)
+{
+	test_kptr_unref(v);
+	test_kptr_ref(v);
+	test_kptr_get(v);
+}
+
+SEC("tc")
+int test_map_kptr(struct __sk_buff *ctx)
+{
+	void *maps[] = {
+		&array_map,
+		&hash_map,
+		&hash_malloc_map,
+		&lru_hash_map,
+	};
+	struct map_value *v;
+	int i, key = 0;
+
+	for (i = 0; i < sizeof(maps) / sizeof(*maps); i++) {
+		v = bpf_map_lookup_elem(&array_map, &key);
+		if (!v)
+			return 0;
+		test_kptr(v);
+	}
+	return 0;
+}
+
+SEC("tc")
+int test_map_in_map_kptr(struct __sk_buff *ctx)
+{
+	void *map_of_maps[] = {
+		&array_of_array_maps,
+		&array_of_hash_maps,
+		&array_of_hash_malloc_maps,
+		&array_of_lru_hash_maps,
+		&hash_of_array_maps,
+		&hash_of_hash_maps,
+		&hash_of_hash_malloc_maps,
+		&hash_of_lru_hash_maps,
+	};
+	struct map_value *v;
+	int i, key = 0;
+	void *map;
+
+	for (i = 0; i < sizeof(map_of_maps) / sizeof(*map_of_maps); i++) {
+		map = bpf_map_lookup_elem(&array_of_array_maps, &key);
+		if (!map)
+			return 0;
+		v = bpf_map_lookup_elem(map, &key);
+		if (!v)
+			return 0;
+		test_kptr(v);
+	}
+	return 0;
+}
+
+char _license[] SEC("license") = "GPL";
-- 
2.35.1


^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH bpf-next v3 13/13] selftests/bpf: Add verifier tests for kptr
  2022-03-20 15:54 [PATCH bpf-next v3 00/13] Introduce typed pointer support in BPF maps Kumar Kartikeya Dwivedi
                   ` (11 preceding siblings ...)
  2022-03-20 15:55 ` [PATCH bpf-next v3 12/13] selftests/bpf: Add C tests for kptr Kumar Kartikeya Dwivedi
@ 2022-03-20 15:55 ` Kumar Kartikeya Dwivedi
  12 siblings, 0 replies; 44+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2022-03-20 15:55 UTC (permalink / raw)
  To: bpf
  Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Toke Høiland-Jørgensen, Jesper Dangaard Brouer

Reuse bpf_prog_test functions to test the support for PTR_TO_BTF_ID in
the BPF map case, including some tests that verify implementation
sanity and corner cases.

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
---
 net/bpf/test_run.c                            |  39 +-
 tools/testing/selftests/bpf/test_verifier.c   |  49 +-
 .../testing/selftests/bpf/verifier/map_kptr.c | 445 ++++++++++++++++++
 3 files changed, 526 insertions(+), 7 deletions(-)
 create mode 100644 tools/testing/selftests/bpf/verifier/map_kptr.c

diff --git a/net/bpf/test_run.c b/net/bpf/test_run.c
index e7b9c2636d10..be1cd7498a4e 100644
--- a/net/bpf/test_run.c
+++ b/net/bpf/test_run.c
@@ -584,6 +584,12 @@ noinline void bpf_kfunc_call_memb_release(struct prog_test_member *p)
 {
 }
 
+noinline struct prog_test_ref_kfunc *
+bpf_kfunc_call_test_kptr_get(struct prog_test_ref_kfunc **p, int a, int b)
+{
+	return &prog_test_struct;
+}
+
 struct prog_test_pass1 {
 	int x0;
 	struct {
@@ -669,6 +675,7 @@ BTF_ID(func, bpf_kfunc_call_test3)
 BTF_ID(func, bpf_kfunc_call_test_acquire)
 BTF_ID(func, bpf_kfunc_call_test_release)
 BTF_ID(func, bpf_kfunc_call_memb_release)
+BTF_ID(func, bpf_kfunc_call_test_kptr_get)
 BTF_ID(func, bpf_kfunc_call_test_pass_ctx)
 BTF_ID(func, bpf_kfunc_call_test_pass1)
 BTF_ID(func, bpf_kfunc_call_test_pass2)
@@ -682,6 +689,7 @@ BTF_SET_END(test_sk_check_kfunc_ids)
 
 BTF_SET_START(test_sk_acquire_kfunc_ids)
 BTF_ID(func, bpf_kfunc_call_test_acquire)
+BTF_ID(func, bpf_kfunc_call_test_kptr_get)
 BTF_SET_END(test_sk_acquire_kfunc_ids)
 
 BTF_SET_START(test_sk_release_kfunc_ids)
@@ -691,8 +699,13 @@ BTF_SET_END(test_sk_release_kfunc_ids)
 
 BTF_SET_START(test_sk_ret_null_kfunc_ids)
 BTF_ID(func, bpf_kfunc_call_test_acquire)
+BTF_ID(func, bpf_kfunc_call_test_kptr_get)
 BTF_SET_END(test_sk_ret_null_kfunc_ids)
 
+BTF_SET_START(test_sk_kptr_acquire_kfunc_ids)
+BTF_ID(func, bpf_kfunc_call_test_kptr_get)
+BTF_SET_END(test_sk_kptr_acquire_kfunc_ids)
+
 static void *bpf_test_init(const union bpf_attr *kattr, u32 user_size,
 			   u32 size, u32 headroom, u32 tailroom)
 {
@@ -1579,14 +1592,30 @@ int bpf_prog_test_run_syscall(struct bpf_prog *prog,
 
 static const struct btf_kfunc_id_set bpf_prog_test_kfunc_set = {
 	.owner        = THIS_MODULE,
-	.check_set    = &test_sk_check_kfunc_ids,
-	.acquire_set  = &test_sk_acquire_kfunc_ids,
-	.release_set  = &test_sk_release_kfunc_ids,
-	.ret_null_set = &test_sk_ret_null_kfunc_ids,
+	.check_set        = &test_sk_check_kfunc_ids,
+	.acquire_set      = &test_sk_acquire_kfunc_ids,
+	.release_set      = &test_sk_release_kfunc_ids,
+	.ret_null_set     = &test_sk_ret_null_kfunc_ids,
+	.kptr_acquire_set = &test_sk_kptr_acquire_kfunc_ids
 };
 
+BTF_ID_LIST(bpf_prog_test_dtor_kfunc_ids)
+BTF_ID(struct, prog_test_ref_kfunc)
+BTF_ID(func, bpf_kfunc_call_test_release)
+
 static int __init bpf_prog_test_run_init(void)
 {
-	return register_btf_kfunc_id_set(BPF_PROG_TYPE_SCHED_CLS, &bpf_prog_test_kfunc_set);
+	const struct btf_id_dtor_kfunc bpf_prog_test_dtor_kfunc[] = {
+		{
+		  .btf_id       = bpf_prog_test_dtor_kfunc_ids[0],
+		  .kfunc_btf_id = bpf_prog_test_dtor_kfunc_ids[1]
+		},
+	};
+	int ret;
+
+	ret = register_btf_kfunc_id_set(BPF_PROG_TYPE_SCHED_CLS, &bpf_prog_test_kfunc_set);
+	return ret ?: register_btf_id_dtor_kfuncs(bpf_prog_test_dtor_kfunc,
+						  ARRAY_SIZE(bpf_prog_test_dtor_kfunc),
+						  THIS_MODULE);
 }
 late_initcall(bpf_prog_test_run_init);
diff --git a/tools/testing/selftests/bpf/test_verifier.c b/tools/testing/selftests/bpf/test_verifier.c
index a2cd236c32eb..847402f570bd 100644
--- a/tools/testing/selftests/bpf/test_verifier.c
+++ b/tools/testing/selftests/bpf/test_verifier.c
@@ -53,7 +53,7 @@
 #define MAX_INSNS	BPF_MAXINSNS
 #define MAX_TEST_INSNS	1000000
 #define MAX_FIXUPS	8
-#define MAX_NR_MAPS	22
+#define MAX_NR_MAPS	23
 #define MAX_TEST_RUNS	8
 #define POINTER_VALUE	0xcafe4all
 #define TEST_DATA_LEN	64
@@ -101,6 +101,7 @@ struct bpf_test {
 	int fixup_map_reuseport_array[MAX_FIXUPS];
 	int fixup_map_ringbuf[MAX_FIXUPS];
 	int fixup_map_timer[MAX_FIXUPS];
+	int fixup_map_kptr[MAX_FIXUPS];
 	struct kfunc_btf_id_pair fixup_kfunc_btf_id[MAX_FIXUPS];
 	/* Expected verifier log output for result REJECT or VERBOSE_ACCEPT.
 	 * Can be a tab-separated sequence of expected strings. An empty string
@@ -621,8 +622,13 @@ static int create_cgroup_storage(bool percpu)
  * struct timer {
  *   struct bpf_timer t;
  * };
+ * struct btf_ptr {
+ *   struct prog_test_ref_kfunc __kptr *ptr;
+ *   struct prog_test_ref_kfunc __kptr_ref *ptr;
+ * }
  */
-static const char btf_str_sec[] = "\0bpf_spin_lock\0val\0cnt\0l\0bpf_timer\0timer\0t";
+static const char btf_str_sec[] = "\0bpf_spin_lock\0val\0cnt\0l\0bpf_timer\0timer\0t"
+				  "\0btf_ptr\0prog_test_ref_kfunc\0ptr\0kptr\0kptr_ref";
 static __u32 btf_raw_types[] = {
 	/* int */
 	BTF_TYPE_INT_ENC(0, BTF_INT_SIGNED, 0, 32, 4),  /* [1] */
@@ -638,6 +644,18 @@ static __u32 btf_raw_types[] = {
 	/* struct timer */                              /* [5] */
 	BTF_TYPE_ENC(35, BTF_INFO_ENC(BTF_KIND_STRUCT, 0, 1), 16),
 	BTF_MEMBER_ENC(41, 4, 0), /* struct bpf_timer t; */
+	/* struct prog_test_ref_kfunc */		/* [6] */
+	BTF_STRUCT_ENC(51, 0, 0),
+	/* type tag "kptr" */
+	BTF_TYPE_TAG_ENC(75, 6),			/* [7] */
+	/* type tag "kptr_ref" */
+	BTF_TYPE_TAG_ENC(80, 6),			/* [8] */
+	BTF_PTR_ENC(7),					/* [9] */
+	BTF_PTR_ENC(8),					/* [10] */
+	/* struct btf_ptr */				/* [11] */
+	BTF_STRUCT_ENC(43, 2, 16),
+	BTF_MEMBER_ENC(71, 9, 0), /* struct prog_test_ref_kfunc __kptr *ptr; */
+	BTF_MEMBER_ENC(71, 10, 64), /* struct prog_test_ref_kfunc __kptr_ref *ptr; */
 };
 
 static int load_btf(void)
@@ -727,6 +745,25 @@ static int create_map_timer(void)
 	return fd;
 }
 
+static int create_map_kptr(void)
+{
+	LIBBPF_OPTS(bpf_map_create_opts, opts,
+		.btf_key_type_id = 1,
+		.btf_value_type_id = 11,
+	);
+	int fd, btf_fd;
+
+	btf_fd = load_btf();
+	if (btf_fd < 0)
+		return -1;
+
+	opts.btf_fd = btf_fd;
+	fd = bpf_map_create(BPF_MAP_TYPE_ARRAY, "test_map", 4, 16, 1, &opts);
+	if (fd < 0)
+		printf("Failed to create map with btf_id pointer\n");
+	return fd;
+}
+
 static char bpf_vlog[UINT_MAX >> 8];
 
 static void do_test_fixup(struct bpf_test *test, enum bpf_prog_type prog_type,
@@ -754,6 +791,7 @@ static void do_test_fixup(struct bpf_test *test, enum bpf_prog_type prog_type,
 	int *fixup_map_reuseport_array = test->fixup_map_reuseport_array;
 	int *fixup_map_ringbuf = test->fixup_map_ringbuf;
 	int *fixup_map_timer = test->fixup_map_timer;
+	int *fixup_map_kptr = test->fixup_map_kptr;
 	struct kfunc_btf_id_pair *fixup_kfunc_btf_id = test->fixup_kfunc_btf_id;
 
 	if (test->fill_helper) {
@@ -947,6 +985,13 @@ static void do_test_fixup(struct bpf_test *test, enum bpf_prog_type prog_type,
 			fixup_map_timer++;
 		} while (*fixup_map_timer);
 	}
+	if (*fixup_map_kptr) {
+		map_fds[22] = create_map_kptr();
+		do {
+			prog[*fixup_map_kptr].imm = map_fds[22];
+			fixup_map_kptr++;
+		} while (*fixup_map_kptr);
+	}
 
 	/* Patch in kfunc BTF IDs */
 	if (fixup_kfunc_btf_id->kfunc) {
diff --git a/tools/testing/selftests/bpf/verifier/map_kptr.c b/tools/testing/selftests/bpf/verifier/map_kptr.c
new file mode 100644
index 000000000000..afca65491a18
--- /dev/null
+++ b/tools/testing/selftests/bpf/verifier/map_kptr.c
@@ -0,0 +1,445 @@
+/* Common tests */
+{
+	"map_kptr: BPF_ST imm != 0",
+	.insns = {
+	BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
+	BPF_LD_MAP_FD(BPF_REG_6, 0),
+	BPF_MOV64_REG(BPF_REG_1, BPF_REG_6),
+	BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -4),
+	BPF_MOV64_IMM(BPF_REG_0, 0),
+	BPF_ST_MEM(BPF_W, BPF_REG_2, 0, 0),
+	BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_map_lookup_elem),
+	BPF_JMP_IMM(BPF_JNE, BPF_REG_0, 0, 1),
+	BPF_EXIT_INSN(),
+	BPF_ST_MEM(BPF_DW, BPF_REG_0, 0, 1),
+	BPF_EXIT_INSN(),
+	},
+	.prog_type = BPF_PROG_TYPE_SCHED_CLS,
+	.fixup_map_kptr = { 1 },
+	.result = REJECT,
+	.errstr = "BPF_ST imm must be 0 when storing to kptr at off=0",
+},
+{
+	"map_kptr: size != bpf_size_to_bytes(BPF_DW)",
+	.insns = {
+	BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
+	BPF_LD_MAP_FD(BPF_REG_6, 0),
+	BPF_MOV64_REG(BPF_REG_1, BPF_REG_6),
+	BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -4),
+	BPF_MOV64_IMM(BPF_REG_0, 0),
+	BPF_ST_MEM(BPF_W, BPF_REG_2, 0, 0),
+	BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_map_lookup_elem),
+	BPF_JMP_IMM(BPF_JNE, BPF_REG_0, 0, 1),
+	BPF_EXIT_INSN(),
+	BPF_ST_MEM(BPF_W, BPF_REG_0, 0, 0),
+	BPF_EXIT_INSN(),
+	},
+	.prog_type = BPF_PROG_TYPE_SCHED_CLS,
+	.fixup_map_kptr = { 1 },
+	.result = REJECT,
+	.errstr = "kptr access size must be BPF_DW",
+},
+{
+	"map_kptr: map_value non-const var_off",
+	.insns = {
+	BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
+	BPF_LD_MAP_FD(BPF_REG_6, 0),
+	BPF_MOV64_REG(BPF_REG_1, BPF_REG_6),
+	BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -4),
+	BPF_MOV64_IMM(BPF_REG_0, 0),
+	BPF_ST_MEM(BPF_W, BPF_REG_2, 0, 0),
+	BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_map_lookup_elem),
+	BPF_JMP_IMM(BPF_JNE, BPF_REG_0, 0, 1),
+	BPF_EXIT_INSN(),
+	BPF_MOV64_REG(BPF_REG_3, BPF_REG_0),
+	BPF_LDX_MEM(BPF_DW, BPF_REG_2, BPF_REG_0, 0),
+	BPF_JMP_IMM(BPF_JNE, BPF_REG_2, 0, 1),
+	BPF_EXIT_INSN(),
+	BPF_LDX_MEM(BPF_W, BPF_REG_2, BPF_REG_2, 0),
+	BPF_JMP_IMM(BPF_JLE, BPF_REG_2, 4, 1),
+	BPF_EXIT_INSN(),
+	BPF_JMP_IMM(BPF_JGE, BPF_REG_2, 0, 1),
+	BPF_EXIT_INSN(),
+	BPF_ALU64_REG(BPF_ADD, BPF_REG_3, BPF_REG_2),
+	BPF_LDX_MEM(BPF_DW, BPF_REG_0, BPF_REG_3, 0),
+	BPF_EXIT_INSN(),
+	},
+	.prog_type = BPF_PROG_TYPE_SCHED_CLS,
+	.fixup_map_kptr = { 1 },
+	.result = REJECT,
+	.errstr = "kptr access cannot have variable offset",
+},
+{
+	"map_kptr: bpf_kptr_xchg non-const var_off",
+	.insns = {
+	BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
+	BPF_LD_MAP_FD(BPF_REG_6, 0),
+	BPF_MOV64_REG(BPF_REG_1, BPF_REG_6),
+	BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -4),
+	BPF_MOV64_IMM(BPF_REG_0, 0),
+	BPF_ST_MEM(BPF_W, BPF_REG_2, 0, 0),
+	BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_map_lookup_elem),
+	BPF_JMP_IMM(BPF_JNE, BPF_REG_0, 0, 1),
+	BPF_EXIT_INSN(),
+	BPF_MOV64_REG(BPF_REG_3, BPF_REG_0),
+	BPF_LDX_MEM(BPF_DW, BPF_REG_2, BPF_REG_0, 0),
+	BPF_JMP_IMM(BPF_JNE, BPF_REG_2, 0, 1),
+	BPF_EXIT_INSN(),
+	BPF_LDX_MEM(BPF_W, BPF_REG_2, BPF_REG_2, 0),
+	BPF_JMP_IMM(BPF_JLE, BPF_REG_2, 4, 1),
+	BPF_EXIT_INSN(),
+	BPF_JMP_IMM(BPF_JGE, BPF_REG_2, 0, 1),
+	BPF_EXIT_INSN(),
+	BPF_ALU64_REG(BPF_ADD, BPF_REG_3, BPF_REG_2),
+	BPF_MOV64_REG(BPF_REG_1, BPF_REG_3),
+	BPF_MOV64_IMM(BPF_REG_2, 0),
+	BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_kptr_xchg),
+	BPF_EXIT_INSN(),
+	},
+	.prog_type = BPF_PROG_TYPE_SCHED_CLS,
+	.fixup_map_kptr = { 1 },
+	.result = REJECT,
+	.errstr = "R1 doesn't have constant offset. kptr has to be at the constant offset",
+},
+{
+	"map_kptr: unaligned boundary load/store",
+	.insns = {
+	BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
+	BPF_LD_MAP_FD(BPF_REG_6, 0),
+	BPF_MOV64_REG(BPF_REG_1, BPF_REG_6),
+	BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -4),
+	BPF_MOV64_IMM(BPF_REG_0, 0),
+	BPF_ST_MEM(BPF_W, BPF_REG_2, 0, 0),
+	BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_map_lookup_elem),
+	BPF_JMP_IMM(BPF_JNE, BPF_REG_0, 0, 1),
+	BPF_EXIT_INSN(),
+	BPF_ALU64_IMM(BPF_ADD, BPF_REG_0, 7),
+	BPF_ST_MEM(BPF_DW, BPF_REG_0, 0, 0),
+	BPF_EXIT_INSN(),
+	},
+	.prog_type = BPF_PROG_TYPE_SCHED_CLS,
+	.fixup_map_kptr = { 1 },
+	.result = REJECT,
+	.errstr = "kptr access misaligned expected=0 off=7",
+},
+{
+	"map_kptr: reject var_off != 0",
+	.insns = {
+	BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
+	BPF_LD_MAP_FD(BPF_REG_6, 0),
+	BPF_MOV64_REG(BPF_REG_1, BPF_REG_6),
+	BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -4),
+	BPF_MOV64_IMM(BPF_REG_0, 0),
+	BPF_ST_MEM(BPF_W, BPF_REG_2, 0, 0),
+	BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_map_lookup_elem),
+	BPF_JMP_IMM(BPF_JNE, BPF_REG_0, 0, 1),
+	BPF_EXIT_INSN(),
+	BPF_LDX_MEM(BPF_DW, BPF_REG_1, BPF_REG_0, 0),
+	BPF_JMP_IMM(BPF_JNE, BPF_REG_1, 0, 1),
+	BPF_EXIT_INSN(),
+	BPF_LDX_MEM(BPF_W, BPF_REG_2, BPF_REG_1, 0),
+	BPF_JMP_IMM(BPF_JLE, BPF_REG_2, 4, 1),
+	BPF_EXIT_INSN(),
+	BPF_JMP_IMM(BPF_JGE, BPF_REG_2, 0, 1),
+	BPF_EXIT_INSN(),
+	BPF_ALU64_REG(BPF_ADD, BPF_REG_1, BPF_REG_2),
+	BPF_STX_MEM(BPF_DW, BPF_REG_0, BPF_REG_1, 0),
+	BPF_EXIT_INSN(),
+	},
+	.prog_type = BPF_PROG_TYPE_SCHED_CLS,
+	.fixup_map_kptr = { 1 },
+	.result = REJECT,
+	.errstr = "variable untrusted_ptr_ access var_off=(0x0; 0x7) disallowed",
+},
+/* Tests for unreferenced PTR_TO_BTF_ID */
+{
+	"map_kptr: unref: reject btf_struct_ids_match == false",
+	.insns = {
+	BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
+	BPF_LD_MAP_FD(BPF_REG_6, 0),
+	BPF_MOV64_REG(BPF_REG_1, BPF_REG_6),
+	BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -4),
+	BPF_MOV64_IMM(BPF_REG_0, 0),
+	BPF_ST_MEM(BPF_W, BPF_REG_2, 0, 0),
+	BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_map_lookup_elem),
+	BPF_JMP_IMM(BPF_JNE, BPF_REG_0, 0, 1),
+	BPF_EXIT_INSN(),
+	BPF_LDX_MEM(BPF_DW, BPF_REG_1, BPF_REG_0, 0),
+	BPF_JMP_IMM(BPF_JNE, BPF_REG_1, 0, 1),
+	BPF_EXIT_INSN(),
+	BPF_ALU64_IMM(BPF_ADD, BPF_REG_1, 4),
+	BPF_STX_MEM(BPF_DW, BPF_REG_0, BPF_REG_1, 0),
+	BPF_EXIT_INSN(),
+	},
+	.prog_type = BPF_PROG_TYPE_SCHED_CLS,
+	.fixup_map_kptr = { 1 },
+	.result = REJECT,
+	.errstr = "invalid kptr access, R1 type=untrusted_ptr_prog_test_ref_kfunc expected=ptr_prog_test",
+},
+{
+	"map_kptr: unref: loaded pointer marked as untrusted",
+	.insns = {
+	BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
+	BPF_LD_MAP_FD(BPF_REG_6, 0),
+	BPF_MOV64_REG(BPF_REG_1, BPF_REG_6),
+	BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -4),
+	BPF_MOV64_IMM(BPF_REG_0, 0),
+	BPF_ST_MEM(BPF_W, BPF_REG_2, 0, 0),
+	BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_map_lookup_elem),
+	BPF_JMP_IMM(BPF_JNE, BPF_REG_0, 0, 1),
+	BPF_EXIT_INSN(),
+	BPF_LDX_MEM(BPF_DW, BPF_REG_0, BPF_REG_0, 0),
+	BPF_LDX_MEM(BPF_W, BPF_REG_0, BPF_REG_0, 0),
+	BPF_EXIT_INSN(),
+	},
+	.prog_type = BPF_PROG_TYPE_SCHED_CLS,
+	.fixup_map_kptr = { 1 },
+	.result = REJECT,
+	.errstr = "R0 invalid mem access 'untrusted_ptr_or_null_'",
+},
+{
+	"map_kptr: unref: correct in kernel type size",
+	.insns = {
+	BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
+	BPF_LD_MAP_FD(BPF_REG_6, 0),
+	BPF_MOV64_REG(BPF_REG_1, BPF_REG_6),
+	BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -4),
+	BPF_MOV64_IMM(BPF_REG_0, 0),
+	BPF_ST_MEM(BPF_W, BPF_REG_2, 0, 0),
+	BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_map_lookup_elem),
+	BPF_JMP_IMM(BPF_JNE, BPF_REG_0, 0, 1),
+	BPF_EXIT_INSN(),
+	BPF_LDX_MEM(BPF_DW, BPF_REG_0, BPF_REG_0, 0),
+	BPF_JMP_IMM(BPF_JNE, BPF_REG_0, 0, 1),
+	BPF_EXIT_INSN(),
+	BPF_LDX_MEM(BPF_DW, BPF_REG_0, BPF_REG_0, 24),
+	BPF_EXIT_INSN(),
+	},
+	.prog_type = BPF_PROG_TYPE_SCHED_CLS,
+	.fixup_map_kptr = { 1 },
+	.result = REJECT,
+	.errstr = "access beyond struct prog_test_ref_kfunc at off 24 size 8",
+},
+{
+	"map_kptr: unref: inherit PTR_UNTRUSTED on struct walk",
+	.insns = {
+	BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
+	BPF_LD_MAP_FD(BPF_REG_6, 0),
+	BPF_MOV64_REG(BPF_REG_1, BPF_REG_6),
+	BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -4),
+	BPF_MOV64_IMM(BPF_REG_0, 0),
+	BPF_ST_MEM(BPF_W, BPF_REG_2, 0, 0),
+	BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_map_lookup_elem),
+	BPF_JMP_IMM(BPF_JNE, BPF_REG_0, 0, 1),
+	BPF_EXIT_INSN(),
+	BPF_LDX_MEM(BPF_DW, BPF_REG_0, BPF_REG_0, 0),
+	BPF_JMP_IMM(BPF_JNE, BPF_REG_0, 0, 1),
+	BPF_EXIT_INSN(),
+	BPF_LDX_MEM(BPF_DW, BPF_REG_1, BPF_REG_0, 16),
+	BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_this_cpu_ptr),
+	BPF_EXIT_INSN(),
+	},
+	.prog_type = BPF_PROG_TYPE_SCHED_CLS,
+	.fixup_map_kptr = { 1 },
+	.result = REJECT,
+	.errstr = "R1 type=untrusted_ptr_ expected=percpu_ptr_",
+},
+{
+	"map_kptr: unref: no reference state created",
+	.insns = {
+	BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
+	BPF_LD_MAP_FD(BPF_REG_6, 0),
+	BPF_MOV64_REG(BPF_REG_1, BPF_REG_6),
+	BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -4),
+	BPF_MOV64_IMM(BPF_REG_0, 0),
+	BPF_ST_MEM(BPF_W, BPF_REG_2, 0, 0),
+	BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_map_lookup_elem),
+	BPF_JMP_IMM(BPF_JNE, BPF_REG_0, 0, 1),
+	BPF_EXIT_INSN(),
+	BPF_LDX_MEM(BPF_DW, BPF_REG_0, BPF_REG_0, 0),
+	BPF_JMP_IMM(BPF_JNE, BPF_REG_0, 0, 1),
+	BPF_EXIT_INSN(),
+	BPF_EXIT_INSN(),
+	},
+	.prog_type = BPF_PROG_TYPE_SCHED_CLS,
+	.fixup_map_kptr = { 1 },
+	.result = ACCEPT,
+},
+{
+	"map_kptr: unref: bpf_kptr_xchg rejected",
+	.insns = {
+	BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
+	BPF_LD_MAP_FD(BPF_REG_6, 0),
+	BPF_MOV64_REG(BPF_REG_1, BPF_REG_6),
+	BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -4),
+	BPF_MOV64_IMM(BPF_REG_0, 0),
+	BPF_ST_MEM(BPF_W, BPF_REG_2, 0, 0),
+	BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_map_lookup_elem),
+	BPF_JMP_IMM(BPF_JNE, BPF_REG_0, 0, 1),
+	BPF_EXIT_INSN(),
+	BPF_MOV64_REG(BPF_REG_1, BPF_REG_0),
+	BPF_MOV64_IMM(BPF_REG_2, 0),
+	BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_kptr_xchg),
+	BPF_MOV64_IMM(BPF_REG_0, 0),
+	BPF_EXIT_INSN(),
+	},
+	.prog_type = BPF_PROG_TYPE_SCHED_CLS,
+	.fixup_map_kptr = { 1 },
+	.result = REJECT,
+	.errstr = "off=0 kptr isn't referenced kptr",
+},
+{
+	"map_kptr: unref: bpf_kfunc_call_test_kptr_get rejected",
+	.insns = {
+	BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
+	BPF_LD_MAP_FD(BPF_REG_6, 0),
+	BPF_MOV64_REG(BPF_REG_1, BPF_REG_6),
+	BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -4),
+	BPF_MOV64_IMM(BPF_REG_0, 0),
+	BPF_ST_MEM(BPF_W, BPF_REG_2, 0, 0),
+	BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_map_lookup_elem),
+	BPF_JMP_IMM(BPF_JNE, BPF_REG_0, 0, 1),
+	BPF_EXIT_INSN(),
+	BPF_MOV64_REG(BPF_REG_1, BPF_REG_0),
+	BPF_MOV64_IMM(BPF_REG_2, 0),
+	BPF_MOV64_IMM(BPF_REG_3, 0),
+	BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, BPF_PSEUDO_KFUNC_CALL, 0, 0),
+	BPF_MOV64_IMM(BPF_REG_0, 0),
+	BPF_EXIT_INSN(),
+	},
+	.prog_type = BPF_PROG_TYPE_SCHED_CLS,
+	.fixup_map_kptr = { 1 },
+	.result = REJECT,
+	.errstr = "arg#0 no referenced kptr at map value offset=0",
+	.fixup_kfunc_btf_id = {
+		{ "bpf_kfunc_call_test_kptr_get", 13 },
+	}
+},
+/* Tests for referenced PTR_TO_BTF_ID */
+{
+	"map_kptr: ref: loaded pointer marked as untrusted",
+	.insns = {
+	BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
+	BPF_LD_MAP_FD(BPF_REG_6, 0),
+	BPF_MOV64_REG(BPF_REG_1, BPF_REG_6),
+	BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -4),
+	BPF_MOV64_IMM(BPF_REG_0, 0),
+	BPF_ST_MEM(BPF_W, BPF_REG_2, 0, 0),
+	BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_map_lookup_elem),
+	BPF_JMP_IMM(BPF_JNE, BPF_REG_0, 0, 1),
+	BPF_EXIT_INSN(),
+	BPF_MOV64_IMM(BPF_REG_1, 0),
+	BPF_LDX_MEM(BPF_DW, BPF_REG_1, BPF_REG_0, 8),
+	BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_this_cpu_ptr),
+	BPF_EXIT_INSN(),
+	},
+	.prog_type = BPF_PROG_TYPE_SCHED_CLS,
+	.fixup_map_kptr = { 1 },
+	.result = REJECT,
+	.errstr = "R1 type=untrusted_ptr_or_null_ expected=percpu_ptr_",
+},
+{
+	"map_kptr: ref: reject off != 0",
+	.insns = {
+	BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
+	BPF_LD_MAP_FD(BPF_REG_6, 0),
+	BPF_MOV64_REG(BPF_REG_1, BPF_REG_6),
+	BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -4),
+	BPF_MOV64_IMM(BPF_REG_0, 0),
+	BPF_ST_MEM(BPF_W, BPF_REG_2, 0, 0),
+	BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_map_lookup_elem),
+	BPF_JMP_IMM(BPF_JNE, BPF_REG_0, 0, 1),
+	BPF_EXIT_INSN(),
+	BPF_ALU64_IMM(BPF_ADD, BPF_REG_0, 8),
+	BPF_MOV64_REG(BPF_REG_7, BPF_REG_0),
+	BPF_MOV64_REG(BPF_REG_1, BPF_REG_0),
+	BPF_MOV64_IMM(BPF_REG_2, 0),
+	BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_kptr_xchg),
+	BPF_JMP_IMM(BPF_JNE, BPF_REG_0, 0, 1),
+	BPF_EXIT_INSN(),
+	BPF_MOV64_REG(BPF_REG_1, BPF_REG_7),
+	BPF_ALU64_IMM(BPF_ADD, BPF_REG_0, 4),
+	BPF_MOV64_REG(BPF_REG_2, BPF_REG_0),
+	BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_kptr_xchg),
+	BPF_EXIT_INSN(),
+	},
+	.prog_type = BPF_PROG_TYPE_SCHED_CLS,
+	.fixup_map_kptr = { 1 },
+	.result = REJECT,
+	.errstr = "dereference of modified ptr_ ptr R2 off=4 disallowed",
+},
+{
+	"map_kptr: ref: reference state created and released on xchg",
+	.insns = {
+	BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
+	BPF_LD_MAP_FD(BPF_REG_6, 0),
+	BPF_MOV64_REG(BPF_REG_1, BPF_REG_6),
+	BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -4),
+	BPF_MOV64_IMM(BPF_REG_0, 0),
+	BPF_ST_MEM(BPF_W, BPF_REG_2, 0, 0),
+	BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_map_lookup_elem),
+	BPF_JMP_IMM(BPF_JNE, BPF_REG_0, 0, 1),
+	BPF_EXIT_INSN(),
+	BPF_ALU64_IMM(BPF_ADD, BPF_REG_0, 8),
+	BPF_MOV64_REG(BPF_REG_7, BPF_REG_0),
+	BPF_MOV64_REG(BPF_REG_1, BPF_REG_10),
+	BPF_ALU64_IMM(BPF_ADD, BPF_REG_1, -8),
+	BPF_ST_MEM(BPF_DW, BPF_REG_1, 0, 0),
+	BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, BPF_PSEUDO_KFUNC_CALL, 0, 0),
+	BPF_JMP_IMM(BPF_JNE, BPF_REG_0, 0, 1),
+	BPF_EXIT_INSN(),
+	BPF_MOV64_REG(BPF_REG_1, BPF_REG_7),
+	BPF_MOV64_REG(BPF_REG_2, BPF_REG_0),
+	BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_kptr_xchg),
+	BPF_MOV64_IMM(BPF_REG_0, 0),
+	BPF_EXIT_INSN(),
+	},
+	.prog_type = BPF_PROG_TYPE_SCHED_CLS,
+	.fixup_map_kptr = { 1 },
+	.result = REJECT,
+	.errstr = "Unreleased reference id=5 alloc_insn=20",
+	.fixup_kfunc_btf_id = {
+		{ "bpf_kfunc_call_test_acquire", 15 },
+	}
+},
+{
+	"map_kptr: ref: reject STX",
+	.insns = {
+	BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
+	BPF_LD_MAP_FD(BPF_REG_6, 0),
+	BPF_MOV64_REG(BPF_REG_1, BPF_REG_6),
+	BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -4),
+	BPF_MOV64_IMM(BPF_REG_0, 0),
+	BPF_ST_MEM(BPF_W, BPF_REG_2, 0, 0),
+	BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_map_lookup_elem),
+	BPF_JMP_IMM(BPF_JNE, BPF_REG_0, 0, 1),
+	BPF_EXIT_INSN(),
+	BPF_MOV64_REG(BPF_REG_1, 0),
+	BPF_STX_MEM(BPF_DW, BPF_REG_0, BPF_REG_1, 8),
+	BPF_EXIT_INSN(),
+	},
+	.prog_type = BPF_PROG_TYPE_SCHED_CLS,
+	.fixup_map_kptr = { 1 },
+	.result = REJECT,
+	.errstr = "store to referenced kptr disallowed",
+},
+{
+	"map_kptr: ref: reject ST",
+	.insns = {
+	BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
+	BPF_LD_MAP_FD(BPF_REG_6, 0),
+	BPF_MOV64_REG(BPF_REG_1, BPF_REG_6),
+	BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -4),
+	BPF_MOV64_IMM(BPF_REG_0, 0),
+	BPF_ST_MEM(BPF_W, BPF_REG_2, 0, 0),
+	BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_map_lookup_elem),
+	BPF_JMP_IMM(BPF_JNE, BPF_REG_0, 0, 1),
+	BPF_EXIT_INSN(),
+	BPF_ST_MEM(BPF_DW, BPF_REG_0, 8, 0),
+	BPF_EXIT_INSN(),
+	},
+	.prog_type = BPF_PROG_TYPE_SCHED_CLS,
+	.fixup_map_kptr = { 1 },
+	.result = REJECT,
+	.errstr = "store to referenced kptr disallowed",
+},
-- 
2.35.1


^ permalink raw reply related	[flat|nested] 44+ messages in thread

* Re: [PATCH bpf-next v3 03/13] bpf: Allow storing unreferenced kptr in map
  2022-03-20 15:55 ` [PATCH bpf-next v3 03/13] bpf: Allow storing unreferenced kptr in map Kumar Kartikeya Dwivedi
@ 2022-03-21 23:39   ` Joanne Koong
  2022-03-22  7:04     ` Kumar Kartikeya Dwivedi
  2022-03-22  5:45   ` Andrii Nakryiko
  2022-03-22 18:06   ` Martin KaFai Lau
  2 siblings, 1 reply; 44+ messages in thread
From: Joanne Koong @ 2022-03-21 23:39 UTC (permalink / raw)
  To: Kumar Kartikeya Dwivedi
  Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Toke Høiland-Jørgensen, Jesper Dangaard Brouer

On Sun, Mar 20, 2022 at 5:27 PM Kumar Kartikeya Dwivedi
<memxor@gmail.com> wrote:
>
> This commit introduces a new pointer type 'kptr' which can be embedded
> in a map value and holds a PTR_TO_BTF_ID stored by a BPF program during
> its invocation. When storing to such a kptr, the BPF program's
> PTR_TO_BTF_ID register must have the same type as in the map value's
> BTF, and loading a kptr marks the destination register as PTR_TO_BTF_ID
> with the correct kernel BTF and BTF ID.
>
> Such kptrs are unreferenced, i.e. by the time another invocation of the
> BPF program loads this pointer, the object which the pointer points to
> may no longer exist. Since PTR_TO_BTF_ID loads (using BPF_LDX) are
> patched to PROBE_MEM loads by the verifier, it would be safe to allow
> the user to still access such an invalid pointer, but passing such
> pointers into BPF helpers and kfuncs should not be permitted. A future
> patch in this series will close this gap.
>
> The flexibility offered by allowing programs to dereference such invalid
> pointers while being safe at runtime frees the verifier from doing
> complex lifetime tracking. As long as the user can ensure that the
> object remains valid, it can ensure that the data it reads from the
> kernel object is valid.
>
> The user indicates that a certain pointer must be treated as kptr
> capable of accepting stores of PTR_TO_BTF_ID of a certain type, by using
> a BTF type tag 'kptr' on the pointed to type of the pointer. Then, this
> information is recorded in the object BTF which will be passed into the
> kernel by way of the map's BTF information. The name and kind from the
> map value BTF are used to look up the in-kernel type, and the actual
> BTF and BTF ID are recorded in the map struct in a new kptr_off_tab
> member. For now, only storing pointers to structs is permitted.
>
> An example of this specification is shown below:
>
>         #define __kptr __attribute__((btf_type_tag("kptr")))
>
>         struct map_value {
>                 ...
>                 struct task_struct __kptr *task;
>                 ...
>         };
>
> Then, in a BPF program, user may store PTR_TO_BTF_ID with the type
> task_struct into the map, and then load it later.
>
> Note that the destination register is marked PTR_TO_BTF_ID_OR_NULL, as
> the verifier cannot statically know whether the value is NULL, so it
> must treat all potential loads at that map value offset as loading a
> possibly NULL pointer.
>
> Only BPF_LDX, BPF_STX, and BPF_ST with insn->imm = 0 (to denote NULL)
> are allowed instructions that can access such a pointer. On BPF_LDX, the
> destination register is updated to be a PTR_TO_BTF_ID, and on BPF_STX,
> it is checked whether the source register type is a PTR_TO_BTF_ID with
> same BTF type as specified in the map BTF. The access size must always
> be BPF_DW.
>
> For the map in map support, the kptr_off_tab for outer map is copied
> from the inner map's kptr_off_tab. It was chosen to do a deep copy
> instead of introducing a refcount to kptr_off_tab, because the copy only
> needs to be done when parameterizing using inner_map_fd in the map in map
> case, hence would be unnecessary for all other users.
>
> It is not permitted to use MAP_FREEZE command and mmap for BPF map
> having kptr, similar to the bpf_timer case.
>
> Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
> ---
>  include/linux/bpf.h     |  29 +++++++-
>  include/linux/btf.h     |   2 +
>  kernel/bpf/btf.c        | 161 ++++++++++++++++++++++++++++++++++------
>  kernel/bpf/map_in_map.c |   5 +-
>  kernel/bpf/syscall.c    | 112 +++++++++++++++++++++++++++-
>  kernel/bpf/verifier.c   | 120 ++++++++++++++++++++++++++++++
>  6 files changed, 401 insertions(+), 28 deletions(-)
>
[...]
> +
>  struct bpf_map *bpf_map_get(u32 ufd);
>  struct bpf_map *bpf_map_get_with_uref(u32 ufd);
>  struct bpf_map *__bpf_map_get(struct fd f);
> diff --git a/include/linux/btf.h b/include/linux/btf.h
> index 36bc09b8e890..5b578dc81c04 100644
> --- a/include/linux/btf.h
> +++ b/include/linux/btf.h
> @@ -123,6 +123,8 @@ bool btf_member_is_reg_int(const struct btf *btf, const struct btf_type *s,
>                            u32 expected_offset, u32 expected_size);
>  int btf_find_spin_lock(const struct btf *btf, const struct btf_type *t);
>  int btf_find_timer(const struct btf *btf, const struct btf_type *t);
> +struct bpf_map_value_off *btf_find_kptr(const struct btf *btf,
> +                                       const struct btf_type *t);

nit: given that "btf_find_kptr" allocates memory as well, maybe the
name "btf_parse_kptr" would be more reflective?

>  bool btf_type_is_void(const struct btf_type *t);
>  s32 btf_find_by_name_kind(const struct btf *btf, const char *name, u8 kind);
>  const struct btf_type *btf_type_skip_modifiers(const struct btf *btf,
> diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c
> index 9e17af936a7a..92afbec0a887 100644
> --- a/kernel/bpf/btf.c
> +++ b/kernel/bpf/btf.c
> @@ -3164,9 +3164,16 @@ static void btf_struct_log(struct btf_verifier_env *env,
>  enum {
>         BTF_FIELD_SPIN_LOCK,
>         BTF_FIELD_TIMER,
> +       BTF_FIELD_KPTR,
> +};
> +
> +enum {
> +       BTF_FIELD_IGNORE = 0,
> +       BTF_FIELD_FOUND  = 1,
>  };
>
>  struct btf_field_info {
> +       const struct btf_type *type;
>         u32 off;
>  };
>
> @@ -3174,23 +3181,48 @@ static int btf_find_field_struct(const struct btf *btf, const struct btf_type *t
>                                  u32 off, int sz, struct btf_field_info *info)
>  {
>         if (!__btf_type_is_struct(t))
> -               return 0;
> +               return BTF_FIELD_IGNORE;
>         if (t->size != sz)
> -               return 0;
> -       if (info->off != -ENOENT)
> -               /* only one such field is allowed */
> -               return -E2BIG;
> +               return BTF_FIELD_IGNORE;
>         info->off = off;
> -       return 0;
> +       return BTF_FIELD_FOUND;
> +}
> +
> +static int btf_find_field_kptr(const struct btf *btf, const struct btf_type *t,
> +                              u32 off, int sz, struct btf_field_info *info)
> +{
> +       /* For PTR, sz is always == 8 */
> +       if (!btf_type_is_ptr(t))
> +               return BTF_FIELD_IGNORE;
> +       t = btf_type_by_id(btf, t->type);
> +
> +       if (!btf_type_is_type_tag(t))
> +               return BTF_FIELD_IGNORE;
> +       /* Reject extra tags */
> +       if (btf_type_is_type_tag(btf_type_by_id(btf, t->type)))
> +               return -EINVAL;
> +       if (strcmp("kptr", __btf_name_by_offset(btf, t->name_off)))
> +               return -EINVAL;
> +
> +       /* Get the base type */
> +       if (btf_type_is_modifier(t))
> +               t = btf_type_skip_modifiers(btf, t->type, NULL);
> +       /* Only pointer to struct is allowed */
> +       if (!__btf_type_is_struct(t))
> +               return -EINVAL;
> +
> +       info->type = t;
> +       info->off = off;
> +       return BTF_FIELD_FOUND;
>  }
>
>  static int btf_find_struct_field(const struct btf *btf, const struct btf_type *t,
>                                  const char *name, int sz, int align, int field_type,
> -                                struct btf_field_info *info)
> +                                struct btf_field_info *info, int info_cnt)

From my understanding, this patch now modifies btf_find_struct_field
and btf_find_datasec_var such that the "info" that is passed in has to
be an array of size max possible + 1 while "info_cnt" is the max
possible count, or we risk writing beyond the "info" array passed in.
It seems like we could just modify the
btf_find_struct_field/btf_find_datasec_var logic so that the user can
just pass in info array of max possible size instead of max possible
size + 1 - or is your concern that this would require more idx >=
info_cnt checks inside the functions? Maybe we should include a
comment here and in btf_find_datasec_var to document that "info"
should always be max possible size + 1?

>  {
>         const struct btf_member *member;
> +       int ret, idx = 0;
>         u32 i, off;
> -       int ret;
>
>         for_each_member(i, t, member) {
>                 const struct btf_type *member_type = btf_type_by_id(btf,
> @@ -3210,24 +3242,35 @@ static int btf_find_struct_field(const struct btf *btf, const struct btf_type *t
>                 switch (field_type) {
>                 case BTF_FIELD_SPIN_LOCK:
>                 case BTF_FIELD_TIMER:
> -                       ret = btf_find_field_struct(btf, member_type, off, sz, info);
> +                       ret = btf_find_field_struct(btf, member_type, off, sz, &info[idx]);
> +                       if (ret < 0)
> +                               return ret;
> +                       break;
> +               case BTF_FIELD_KPTR:
> +                       ret = btf_find_field_kptr(btf, member_type, off, sz, &info[idx]);
>                         if (ret < 0)
>                                 return ret;
>                         break;
>                 default:
>                         return -EFAULT;
>                 }
> +
> +               if (ret == BTF_FIELD_FOUND && idx >= info_cnt)
> +                       return -E2BIG;
> +               else if (ret == BTF_FIELD_IGNORE)
> +                       continue;
> +               ++idx;
>         }
> -       return 0;
> +       return idx;
>  }
>
>  static int btf_find_datasec_var(const struct btf *btf, const struct btf_type *t,
>                                 const char *name, int sz, int align, int field_type,
> -                               struct btf_field_info *info)
> +                               struct btf_field_info *info, int info_cnt)
>  {
>         const struct btf_var_secinfo *vsi;
> +       int ret, idx = 0;
>         u32 i, off;
> -       int ret;
>
>         for_each_vsi(i, t, vsi) {
>                 const struct btf_type *var = btf_type_by_id(btf, vsi->type);
> @@ -3245,19 +3288,30 @@ static int btf_find_datasec_var(const struct btf *btf, const struct btf_type *t,
>                 switch (field_type) {
>                 case BTF_FIELD_SPIN_LOCK:
>                 case BTF_FIELD_TIMER:
> -                       ret = btf_find_field_struct(btf, var_type, off, sz, info);
> +                       ret = btf_find_field_struct(btf, var_type, off, sz, &info[idx]);
> +                       if (ret < 0)
> +                               return ret;
> +                       break;
> +               case BTF_FIELD_KPTR:
> +                       ret = btf_find_field_kptr(btf, var_type, off, sz, &info[idx]);
>                         if (ret < 0)
>                                 return ret;
>                         break;
>                 default:
>                         return -EFAULT;
>                 }
> +
> +               if (ret == BTF_FIELD_FOUND && idx >= info_cnt)
> +                       return -E2BIG;
> +               if (ret == BTF_FIELD_IGNORE)
> +                       continue;
> +               ++idx;
>         }
> -       return 0;
> +       return idx;
>  }
>
>  static int btf_find_field(const struct btf *btf, const struct btf_type *t,
> -                         int field_type, struct btf_field_info *info)
> +                         int field_type, struct btf_field_info *info, int info_cnt)
>  {
>         const char *name;
>         int sz, align;
> @@ -3273,14 +3327,20 @@ static int btf_find_field(const struct btf *btf, const struct btf_type *t,
>                 sz = sizeof(struct bpf_timer);
>                 align = __alignof__(struct bpf_timer);
>                 break;
> +       case BTF_FIELD_KPTR:
> +               name = NULL;
> +               sz = sizeof(u64);
> +               align = __alignof__(u64);
> +               break;
>         default:
>                 return -EFAULT;
>         }
>
> +       /* The maximum allowed fields of a certain type will be info_cnt - 1 */
>         if (__btf_type_is_struct(t))
> -               return btf_find_struct_field(btf, t, name, sz, align, field_type, info);
> +               return btf_find_struct_field(btf, t, name, sz, align, field_type, info, info_cnt - 1);
>         else if (btf_type_is_datasec(t))
> -               return btf_find_datasec_var(btf, t, name, sz, align, field_type, info);
> +               return btf_find_datasec_var(btf, t, name, sz, align, field_type, info, info_cnt - 1);
>         return -EINVAL;
>  }
>
> @@ -3290,24 +3350,79 @@ static int btf_find_field(const struct btf *btf, const struct btf_type *t,
>   */
>  int btf_find_spin_lock(const struct btf *btf, const struct btf_type *t)
>  {
> -       struct btf_field_info info = { .off = -ENOENT };
> +       /* btf_find_field requires array of size max + 1 */
> +       struct btf_field_info info_arr[2];
>         int ret;
>
> -       ret = btf_find_field(btf, t, BTF_FIELD_SPIN_LOCK, &info);
> +       ret = btf_find_field(btf, t, BTF_FIELD_SPIN_LOCK, info_arr, ARRAY_SIZE(info_arr));
>         if (ret < 0)
>                 return ret;
> -       return info.off;
> +       if (!ret)
> +               return -ENOENT;
> +       return info_arr[0].off;
>  }
>
>  int btf_find_timer(const struct btf *btf, const struct btf_type *t)
>  {
> -       struct btf_field_info info = { .off = -ENOENT };
> +       /* btf_find_field requires array of size max + 1 */
> +       struct btf_field_info info_arr[2];
>         int ret;
>
> -       ret = btf_find_field(btf, t, BTF_FIELD_TIMER, &info);
> +       ret = btf_find_field(btf, t, BTF_FIELD_TIMER, info_arr, ARRAY_SIZE(info_arr));
>         if (ret < 0)
>                 return ret;
> -       return info.off;
> +       if (!ret)
> +               return -ENOENT;
> +       return info_arr[0].off;
> +}
> +
> +struct bpf_map_value_off *btf_find_kptr(const struct btf *btf,
> +                                       const struct btf_type *t)
> +{
> +       /* btf_find_field requires array of size max + 1 */
> +       struct btf_field_info info_arr[BPF_MAP_VALUE_OFF_MAX + 1];
> +       struct bpf_map_value_off *tab;
> +       int ret, i, nr_off;
> +
> +       /* Revisit stack usage when bumping BPF_MAP_VALUE_OFF_MAX */
> +       BUILD_BUG_ON(BPF_MAP_VALUE_OFF_MAX != 8);
> +
> +       ret = btf_find_field(btf, t, BTF_FIELD_KPTR, info_arr, ARRAY_SIZE(info_arr));
> +       if (ret < 0)
> +               return ERR_PTR(ret);
> +       if (!ret)
> +               return NULL;
> +
> +       nr_off = ret;
> +       tab = kzalloc(offsetof(struct bpf_map_value_off, off[nr_off]), GFP_KERNEL | __GFP_NOWARN);
> +       if (!tab)
> +               return ERR_PTR(-ENOMEM);
> +
> +       tab->nr_off = 0;
> +       for (i = 0; i < nr_off; i++) {
> +               const struct btf_type *t;
> +               struct btf *off_btf;
> +               s32 id;
> +
> +               t = info_arr[i].type;
> +               id = bpf_find_btf_id(__btf_name_by_offset(btf, t->name_off), BTF_INFO_KIND(t->info),
> +                                    &off_btf);
> +               if (id < 0) {
> +                       ret = id;
> +                       goto end;
> +               }
> +
> +               tab->off[i].offset = info_arr[i].off;
> +               tab->off[i].btf_id = id;
> +               tab->off[i].btf = off_btf;
> +               tab->nr_off = i + 1;
> +       }
> +       return tab;
> +end:
> +       while (tab->nr_off--)
> +               btf_put(tab->off[tab->nr_off].btf);
> +       kfree(tab);
> +       return ERR_PTR(ret);
>  }
>
>  static void __btf_struct_show(const struct btf *btf, const struct btf_type *t,
> diff --git a/kernel/bpf/map_in_map.c b/kernel/bpf/map_in_map.c
> index 5cd8f5277279..135205d0d560 100644
> --- a/kernel/bpf/map_in_map.c
> +++ b/kernel/bpf/map_in_map.c
> @@ -52,6 +52,7 @@ struct bpf_map *bpf_map_meta_alloc(int inner_map_ufd)
>         inner_map_meta->max_entries = inner_map->max_entries;
>         inner_map_meta->spin_lock_off = inner_map->spin_lock_off;
>         inner_map_meta->timer_off = inner_map->timer_off;
> +       inner_map_meta->kptr_off_tab = bpf_map_copy_kptr_off_tab(inner_map);
>         if (inner_map->btf) {
>                 btf_get(inner_map->btf);
>                 inner_map_meta->btf = inner_map->btf;
> @@ -71,6 +72,7 @@ struct bpf_map *bpf_map_meta_alloc(int inner_map_ufd)
>
>  void bpf_map_meta_free(struct bpf_map *map_meta)
>  {
> +       bpf_map_free_kptr_off_tab(map_meta);
>         btf_put(map_meta->btf);
>         kfree(map_meta);
>  }
> @@ -83,7 +85,8 @@ bool bpf_map_meta_equal(const struct bpf_map *meta0,
>                 meta0->key_size == meta1->key_size &&
>                 meta0->value_size == meta1->value_size &&
>                 meta0->timer_off == meta1->timer_off &&
> -               meta0->map_flags == meta1->map_flags;
> +               meta0->map_flags == meta1->map_flags &&
> +               bpf_map_equal_kptr_off_tab(meta0, meta1);
>  }
>
>  void *bpf_map_fd_get_ptr(struct bpf_map *map,
> diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
> index cdaa1152436a..5990d6fa97ab 100644
> --- a/kernel/bpf/syscall.c
> +++ b/kernel/bpf/syscall.c
> @@ -6,6 +6,7 @@
>  #include <linux/bpf_trace.h>
>  #include <linux/bpf_lirc.h>
>  #include <linux/bpf_verifier.h>
> +#include <linux/bsearch.h>
>  #include <linux/btf.h>
>  #include <linux/syscalls.h>
>  #include <linux/slab.h>
> @@ -473,12 +474,95 @@ static void bpf_map_release_memcg(struct bpf_map *map)
>  }
>  #endif
>
> +static int bpf_map_kptr_off_cmp(const void *a, const void *b)
> +{
> +       const struct bpf_map_value_off_desc *off_desc1 = a, *off_desc2 = b;
> +
> +       if (off_desc1->offset < off_desc2->offset)
> +               return -1;
> +       else if (off_desc1->offset > off_desc2->offset)
> +               return 1;
> +       return 0;
> +}
> +
> +struct bpf_map_value_off_desc *bpf_map_kptr_off_contains(struct bpf_map *map, u32 offset)
> +{
> +       /* Since members are iterated in btf_find_field in increasing order,
> +        * offsets appended to kptr_off_tab are in increasing order, so we can
> +        * do bsearch to find exact match.
> +        */
> +       struct bpf_map_value_off *tab;
> +
> +       if (!map_value_has_kptr(map))
> +               return NULL;
> +       tab = map->kptr_off_tab;
> +       return bsearch(&offset, tab->off, tab->nr_off, sizeof(tab->off[0]), bpf_map_kptr_off_cmp);
> +}
> +
> +void bpf_map_free_kptr_off_tab(struct bpf_map *map)
> +{
> +       struct bpf_map_value_off *tab = map->kptr_off_tab;
> +       int i;
> +
> +       if (!map_value_has_kptr(map))
> +               return;
> +       for (i = 0; i < tab->nr_off; i++) {
> +               struct btf *btf = tab->off[i].btf;
> +
> +               btf_put(btf);
> +       }
> +       kfree(tab);
> +       map->kptr_off_tab = NULL;
> +}
> +
> +struct bpf_map_value_off *bpf_map_copy_kptr_off_tab(const struct bpf_map *map)
> +{
> +       struct bpf_map_value_off *tab = map->kptr_off_tab, *new_tab;
> +       int size, i, ret;
> +
> +       if (!map_value_has_kptr(map))
> +               return ERR_PTR(-ENOENT);
> +       /* Do a deep copy of the kptr_off_tab */
> +       for (i = 0; i < tab->nr_off; i++)
> +               btf_get(tab->off[i].btf);
> +
> +       size = offsetof(struct bpf_map_value_off, off[tab->nr_off]);
> +       new_tab = kzalloc(size, GFP_KERNEL | __GFP_NOWARN);
> +       if (!new_tab) {
> +               ret = -ENOMEM;
> +               goto end;
> +       }
> +       memcpy(new_tab, tab, size);
> +       return new_tab;
> +end:
> +       while (i--)
> +               btf_put(tab->off[i].btf);
> +       return ERR_PTR(ret);
> +}
> +
> +bool bpf_map_equal_kptr_off_tab(const struct bpf_map *map_a, const struct bpf_map *map_b)
> +{
> +       struct bpf_map_value_off *tab_a = map_a->kptr_off_tab, *tab_b = map_b->kptr_off_tab;
> +       bool a_has_kptr = map_value_has_kptr(map_a), b_has_kptr = map_value_has_kptr(map_b);
> +       int size;
> +
> +       if (!a_has_kptr && !b_has_kptr)
> +               return true;
> +       if ((a_has_kptr && !b_has_kptr) || (!a_has_kptr && b_has_kptr))

nit: I think we could simplify this second if check to
if (!a_has_kptr || !b_has_kptr)
    return false;

> +               return false;
> +       if (tab_a->nr_off != tab_b->nr_off)
> +               return false;
> +       size = offsetof(struct bpf_map_value_off, off[tab_a->nr_off]);
> +       return !memcmp(tab_a, tab_b, size);
> +}
> +
>  /* called from workqueue */
>  static void bpf_map_free_deferred(struct work_struct *work)
>  {
>         struct bpf_map *map = container_of(work, struct bpf_map, work);
>
>         security_bpf_map_free(map);
> +       bpf_map_free_kptr_off_tab(map);
>         bpf_map_release_memcg(map);
>         /* implementation dependent freeing */
>         map->ops->map_free(map);
> @@ -640,7 +724,7 @@ static int bpf_map_mmap(struct file *filp, struct vm_area_struct *vma)
>         int err;
>
>         if (!map->ops->map_mmap || map_value_has_spin_lock(map) ||
> -           map_value_has_timer(map))
> +           map_value_has_timer(map) || map_value_has_kptr(map))
>                 return -ENOTSUPP;
>
>         if (!(vma->vm_flags & VM_SHARED))
> @@ -820,9 +904,31 @@ static int map_check_btf(struct bpf_map *map, const struct btf *btf,
>                         return -EOPNOTSUPP;
>         }
>
> -       if (map->ops->map_check_btf)
> +       map->kptr_off_tab = btf_find_kptr(btf, value_type);
> +       if (map_value_has_kptr(map)) {
> +               if (!bpf_capable())
> +                       return -EPERM;
> +               if (map->map_flags & BPF_F_RDONLY_PROG) {
> +                       ret = -EACCES;
> +                       goto free_map_tab;
> +               }
> +               if (map->map_type != BPF_MAP_TYPE_HASH &&
> +                   map->map_type != BPF_MAP_TYPE_LRU_HASH &&
> +                   map->map_type != BPF_MAP_TYPE_ARRAY) {
> +                       ret = -EOPNOTSUPP;
> +                       goto free_map_tab;
> +               }
> +       }
> +
> +       if (map->ops->map_check_btf) {
>                 ret = map->ops->map_check_btf(map, btf, key_type, value_type);
> +               if (ret < 0)
> +                       goto free_map_tab;
> +       }
>
> +       return ret;
> +free_map_tab:
> +       bpf_map_free_kptr_off_tab(map);
>         return ret;
>  }
>
> @@ -1639,7 +1745,7 @@ static int map_freeze(const union bpf_attr *attr)
>                 return PTR_ERR(map);
>
>         if (map->map_type == BPF_MAP_TYPE_STRUCT_OPS ||
> -           map_value_has_timer(map)) {
> +           map_value_has_timer(map) || map_value_has_kptr(map)) {
>                 fdput(f);
>                 return -ENOTSUPP;
>         }
> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> index 4ce9a528fb63..744b7362e52e 100644
> --- a/kernel/bpf/verifier.c
> +++ b/kernel/bpf/verifier.c
> @@ -3507,6 +3507,94 @@ int check_ptr_off_reg(struct bpf_verifier_env *env,
>         return __check_ptr_off_reg(env, reg, regno, false);
>  }
>
> +static int map_kptr_match_type(struct bpf_verifier_env *env,
> +                              struct bpf_map_value_off_desc *off_desc,
> +                              struct bpf_reg_state *reg, u32 regno)
> +{
> +       const char *targ_name = kernel_type_name(off_desc->btf, off_desc->btf_id);
> +       const char *reg_name = "";
> +
> +       if (reg->type != PTR_TO_BTF_ID && reg->type != PTR_TO_BTF_ID_OR_NULL)
> +               goto bad_type;
> +
> +       if (!btf_is_kernel(reg->btf)) {
> +               verbose(env, "R%d must point to kernel BTF\n", regno);
> +               return -EINVAL;
> +       }
> +       /* We need to verify reg->type and reg->btf, before accessing reg->btf */
> +       reg_name = kernel_type_name(reg->btf, reg->btf_id);
> +
> +       if (__check_ptr_off_reg(env, reg, regno, true))
> +               return -EACCES;
> +
> +       if (!btf_struct_ids_match(&env->log, reg->btf, reg->btf_id, reg->off,
> +                                 off_desc->btf, off_desc->btf_id))
> +               goto bad_type;
> +       return 0;
> +bad_type:
> +       verbose(env, "invalid kptr access, R%d type=%s%s ", regno,
> +               reg_type_str(env, reg->type), reg_name);
> +       verbose(env, "expected=%s%s\n", reg_type_str(env, PTR_TO_BTF_ID), targ_name);
> +       return -EINVAL;
> +}
> +
> +/* Returns an error, or 0 if ignoring the access, or 1 if register state was
> + * updated, in which case later updates must be skipped.
> + */
> +static int check_map_kptr_access(struct bpf_verifier_env *env, u32 regno,
> +                                int off, int size, int value_regno,
> +                                enum bpf_access_type t, int insn_idx)

Did you mean to include the "enum bpf_access_type t" argument? I'm not
seeing where it's being used in this function

> +{
> +       struct bpf_reg_state *reg = reg_state(env, regno), *val_reg;
> +       struct bpf_insn *insn = &env->prog->insnsi[insn_idx];
> +       struct bpf_map_value_off_desc *off_desc;
> +       struct bpf_map *map = reg->map_ptr;
> +       int class = BPF_CLASS(insn->code);
> +
> +       /* Things we already checked for in check_map_access:
> +        *  - Reject cases where variable offset may touch BTF ID pointer
> +        *  - size of access (must be BPF_DW)
> +        *  - off_desc->offset == off + reg->var_off.value
> +        */
> +       if (!tnum_is_const(reg->var_off))
> +               return 0;
> +
> +       off_desc = bpf_map_kptr_off_contains(map, off + reg->var_off.value);
> +       if (!off_desc)
> +               return 0;
> +
> +       /* Only BPF_[LDX,STX,ST] | BPF_MEM | BPF_DW is supported */
> +       if (BPF_MODE(insn->code) != BPF_MEM)
> +               goto end;
> +
> +       if (class == BPF_LDX) {
> +               val_reg = reg_state(env, value_regno);
> +               /* We can simply mark the value_regno receiving the pointer
> +                * value from map as PTR_TO_BTF_ID, with the correct type.
> +                */
> +               mark_btf_ld_reg(env, cur_regs(env), value_regno, PTR_TO_BTF_ID, off_desc->btf,
> +                               off_desc->btf_id, PTR_MAYBE_NULL);
> +               val_reg->id = ++env->id_gen;
> +       } else if (class == BPF_STX) {
> +               val_reg = reg_state(env, value_regno);
> +               if (!register_is_null(val_reg) &&
> +                   map_kptr_match_type(env, off_desc, val_reg, value_regno))
> +                       return -EACCES;
> +       } else if (class == BPF_ST) {
> +               if (insn->imm) {
> +                       verbose(env, "BPF_ST imm must be 0 when storing to kptr at off=%u\n",
> +                               off_desc->offset);
> +                       return -EACCES;
> +               }
> +       } else {
> +               goto end;
> +       }
> +       return 1;
> +end:
> +       verbose(env, "kptr in map can only be accessed using BPF_LDX/BPF_STX/BPF_ST\n");
> +       return -EACCES;
> +}
> +
>  /* check read/write into a map element with possible variable offset */
>  static int check_map_access(struct bpf_verifier_env *env, u32 regno,
>                             int off, int size, bool zero_size_allowed)
> @@ -3545,6 +3633,32 @@ static int check_map_access(struct bpf_verifier_env *env, u32 regno,
>                         return -EACCES;
>                 }
>         }
> +       if (map_value_has_kptr(map)) {
> +               struct bpf_map_value_off *tab = map->kptr_off_tab;
> +               int i;
> +
> +               for (i = 0; i < tab->nr_off; i++) {
> +                       u32 p = tab->off[i].offset;
> +
> +                       if (reg->smin_value + off < p + sizeof(u64) &&
> +                           p < reg->umax_value + off + size) {
> +                               if (!tnum_is_const(reg->var_off)) {
> +                                       verbose(env, "kptr access cannot have variable offset\n");
> +                                       return -EACCES;
> +                               }
> +                               if (p != off + reg->var_off.value) {
> +                                       verbose(env, "kptr access misaligned expected=%u off=%llu\n",
> +                                               p, off + reg->var_off.value);
> +                                       return -EACCES;
> +                               }
> +                               if (size != bpf_size_to_bytes(BPF_DW)) {
> +                                       verbose(env, "kptr access size must be BPF_DW\n");
> +                                       return -EACCES;
> +                               }
> +                               break;
> +                       }
> +               }
> +       }
>         return err;
>  }
>
> @@ -4421,6 +4535,10 @@ static int check_mem_access(struct bpf_verifier_env *env, int insn_idx, u32 regn
>                 if (err)
>                         return err;
>                 err = check_map_access(env, regno, off, size, false);
> +               err = err ?: check_map_kptr_access(env, regno, off, size, value_regno, t, insn_idx);
> +               if (err < 0)
> +                       return err;
> +               /* if err == 0, check_map_kptr_access ignored the access */
>                 if (!err && t == BPF_READ && value_regno >= 0) {
>                         struct bpf_map *map = reg->map_ptr;
>
> @@ -4442,6 +4560,8 @@ static int check_mem_access(struct bpf_verifier_env *env, int insn_idx, u32 regn)
>                                 mark_reg_unknown(env, regs, value_regno);
>                         }
>                 }
> +               /* clear err == 1 */
> +               err = err < 0 ? err : 0;

I find this flow a bit unintuitive to follow. Would something like

    err = check_map_access(env, regno, off, size, false);
    if (err)
        return err;
    if (bpf_map_kptr_off_contains(map, off + reg->var_off.value, &off_desc)) {
        /* where check_map_kptr_access now returns 0 on success and an
         * error code otherwise
         */
        err = check_map_kptr_access(...);
    } else if (!err && t == BPF_READ && value_regno >= 0) {
        ...
    }

be clearer?


>         } else if (base_type(reg->type) == PTR_TO_MEM) {
>                 bool rdonly_mem = type_is_rdonly_mem(reg->type);
>
> --
> 2.35.1
>

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH bpf-next v3 04/13] bpf: Indicate argument that will be released in bpf_func_proto
  2022-03-20 15:55 ` [PATCH bpf-next v3 04/13] bpf: Indicate argument that will be released in bpf_func_proto Kumar Kartikeya Dwivedi
@ 2022-03-22  1:47   ` Joanne Koong
  2022-03-22  7:34     ` Kumar Kartikeya Dwivedi
  0 siblings, 1 reply; 44+ messages in thread
From: Joanne Koong @ 2022-03-22  1:47 UTC (permalink / raw)
  To: Kumar Kartikeya Dwivedi
  Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Toke Høiland-Jørgensen, Jesper Dangaard Brouer

On Sun, Mar 20, 2022 at 6:34 PM Kumar Kartikeya Dwivedi
<memxor@gmail.com> wrote:
>
> Add a few fields for each arg (argN_release) that when set to true,
> tells verifier that for a release function, that argument's register
> will be the one for which meta.ref_obj_id will be set, and which will
> then be released using release_reference. To capture the regno,
> introduce a release_regno field in bpf_call_arg_meta.
>
> This would be required in the next patch, where we may either pass NULL
> or a refcounted pointer as an argument to the release function
> bpf_kptr_xchg. Just releasing only when meta.ref_obj_id is set is not
> enough, as there is a case where the type of argument needed matches,
> but the ref_obj_id is set to 0. Hence, we must enforce that whenever
> meta.ref_obj_id is zero, the register that is to be released can only
> be NULL for a release function.
>
> Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
> ---
>  include/linux/bpf.h   | 10 ++++++++++
>  kernel/bpf/ringbuf.c  |  2 ++
>  kernel/bpf/verifier.c | 39 +++++++++++++++++++++++++++++++++------
>  net/core/filter.c     |  1 +
>  4 files changed, 46 insertions(+), 6 deletions(-)
>
> diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> index f35920d279dd..48ddde854d67 100644
> --- a/include/linux/bpf.h
> +++ b/include/linux/bpf.h
> @@ -487,6 +487,16 @@ struct bpf_func_proto {
>                 };
>                 u32 *arg_btf_id[5];
>         };
> +       union {
> +               struct {
> +                       bool arg1_release;
> +                       bool arg2_release;
> +                       bool arg3_release;
> +                       bool arg4_release;
> +                       bool arg5_release;
> +               };
> +               bool arg_release[5];
> +       };

Instead of having the new fields "argx_release" for each arg, what are
your thoughts on using PTR_RELEASE as an "enum bpf_type_flag" to the
existing "argx_type" field? For example, instead of

     .arg1_type      = ARG_PTR_TO_ALLOC_MEM,
     .arg1_release   = true,

could we do something like

     .arg1_type      = ARG_PTR_TO_ALLOC_MEM | PTR_RELEASE

In the verifier, we could determine whether an argument register
releases a reference by checking whether this PTR_RELEASE flag is set.

Would this be a little cleaner? Curious to hear your thoughts.
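
For concreteness, a rough sketch of what I'm picturing (the bit position,
flag name and check placement below are only illustrative, not a worked-out
patch):

    /* in enum bpf_type_flag, next to PTR_MAYBE_NULL / MEM_RDONLY */
    PTR_RELEASE = BIT(n + BPF_BASE_TYPE_BITS),

    /* and in check_func_arg(), roughly */
    if (arg_type & PTR_RELEASE) {
            if (meta->release_regno) {
                    verbose(env, "verifier internal error: more than one release argument\n");
                    return -EFAULT;
            }
            meta->release_regno = regno;
    }

That way each helper proto only needs the one line with the flag OR'ed in.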


>         int *ret_btf_id; /* return value btf_id */
>         bool (*allowed)(const struct bpf_prog *prog);
>  };
> diff --git a/kernel/bpf/ringbuf.c b/kernel/bpf/ringbuf.c
> index 710ba9de12ce..f40ce718630e 100644
> --- a/kernel/bpf/ringbuf.c
> +++ b/kernel/bpf/ringbuf.c
> @@ -405,6 +405,7 @@ const struct bpf_func_proto bpf_ringbuf_submit_proto = {
>         .func           = bpf_ringbuf_submit,
>         .ret_type       = RET_VOID,
>         .arg1_type      = ARG_PTR_TO_ALLOC_MEM,
> +       .arg1_release   = true,
>         .arg2_type      = ARG_ANYTHING,
>  };
>
> @@ -418,6 +419,7 @@ const struct bpf_func_proto bpf_ringbuf_discard_proto = {
>         .func           = bpf_ringbuf_discard,
>         .ret_type       = RET_VOID,
>         .arg1_type      = ARG_PTR_TO_ALLOC_MEM,
> +       .arg1_release   = true,
>         .arg2_type      = ARG_ANYTHING,
>  };
>
> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> index 744b7362e52e..b8cd34607215 100644
> --- a/kernel/bpf/verifier.c
> +++ b/kernel/bpf/verifier.c
> @@ -245,6 +245,7 @@ struct bpf_call_arg_meta {
>         struct bpf_map *map_ptr;
>         bool raw_mode;
>         bool pkt_access;
> +       u8 release_regno;
>         int regno;
>         int access_size;
>         int mem_size;
> @@ -6101,12 +6102,31 @@ static bool check_btf_id_ok(const struct bpf_func_proto *fn)
>         return true;
>  }
>
> -static int check_func_proto(const struct bpf_func_proto *fn, int func_id)
> +static bool check_release_regno(const struct bpf_func_proto *fn, int func_id,
> +                               struct bpf_call_arg_meta *meta)
> +{
> +       int i;
> +
> +       for (i = 0; i < ARRAY_SIZE(fn->arg_release); i++) {
> +               if (fn->arg_release[i]) {
> +                       if (!is_release_function(func_id))
> +                               return false;
> +                       if (meta->release_regno)
> +                               return false;
> +                       meta->release_regno = i + 1;
> +               }
> +       }
> +       return !is_release_function(func_id) || meta->release_regno;
> +}
> +
> +static int check_func_proto(const struct bpf_func_proto *fn, int func_id,
> +                           struct bpf_call_arg_meta *meta)
>  {
>         return check_raw_mode_ok(fn) &&
>                check_arg_pair_ok(fn) &&
>                check_btf_id_ok(fn) &&
> -              check_refcount_ok(fn, func_id) ? 0 : -EINVAL;
> +              check_refcount_ok(fn, func_id) &&
> +              check_release_regno(fn, func_id, meta) ? 0 : -EINVAL;
>  }
>
>  /* Packet data might have moved, any old PTR_TO_PACKET[_META,_END]
> @@ -6785,7 +6805,7 @@ static int check_helper_call(struct bpf_verifier_env *env, struct bpf_insn *insn
>         memset(&meta, 0, sizeof(meta));
>         meta.pkt_access = fn->pkt_access;
>
> -       err = check_func_proto(fn, func_id);
> +       err = check_func_proto(fn, func_id, &meta);
>         if (err) {
>                 verbose(env, "kernel subsystem misconfigured func %s#%d\n",
>                         func_id_name(func_id), func_id);
> @@ -6818,8 +6838,17 @@ static int check_helper_call(struct bpf_verifier_env *env, struct bpf_insn *insn
>                         return err;
>         }
>
> +       regs = cur_regs(env);
> +
>         if (is_release_function(func_id)) {
> -               err = release_reference(env, meta.ref_obj_id);
> +               err = -EINVAL;
> +               if (meta.ref_obj_id)
> +                       err = release_reference(env, meta.ref_obj_id);
> +               /* meta.ref_obj_id can only be 0 if register that is meant to be
> +                * released is NULL, which must be > R0.
> +                */
> +               else if (meta.release_regno && register_is_null(&regs[meta.release_regno]))
> +                       err = 0;

If I'm understanding this correctly, in this patch we will call
check_release_regno on every function to determine if any / which of
the argument registers release a reference. Given that in the majority
of cases the function will not be a release function, what are your
thoughts on moving that check to be within the scope of this if
block? So if it is a release function, and meta.ref_obj_id is not
set, then we do the checking for which argument register is a release
register and whether that register is null. Curious to hear your
thoughts.
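
Something like this is what I'm picturing (hand-wavy sketch just to show the
placement; find_release_regno() is a made-up helper):

    if (is_release_function(func_id)) {
            err = -EINVAL;
            if (meta.ref_obj_id) {
                    err = release_reference(env, meta.ref_obj_id);
            } else {
                    /* only scan fn->arg_release[] for release funcs */
                    int regno = find_release_regno(fn); /* hypothetical */

                    if (regno && register_is_null(&regs[regno]))
                            err = 0;
            }
            ...
    }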


>                 if (err) {
>                         verbose(env, "func %s#%d reference has not been acquired before\n",
>                                 func_id_name(func_id), func_id);
> @@ -6827,8 +6856,6 @@ static int check_helper_call(struct bpf_verifier_env *env, struct bpf_insn *insn
>                 }
>         }
>
> -       regs = cur_regs(env);
> -
>         switch (func_id) {
>         case BPF_FUNC_tail_call:
>                 err = check_reference_leak(env);
> diff --git a/net/core/filter.c b/net/core/filter.c
> index 03655f2074ae..17eff4731b06 100644
> --- a/net/core/filter.c
> +++ b/net/core/filter.c
> @@ -6622,6 +6622,7 @@ static const struct bpf_func_proto bpf_sk_release_proto = {
>         .gpl_only       = false,
>         .ret_type       = RET_INTEGER,
>         .arg1_type      = ARG_PTR_TO_BTF_ID_SOCK_COMMON,
> +       .arg1_release   = true,
>  };
>
>  BPF_CALL_5(bpf_xdp_sk_lookup_udp, struct xdp_buff *, ctx,
> --
> 2.35.1
>

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH bpf-next v3 03/13] bpf: Allow storing unreferenced kptr in map
  2022-03-20 15:55 ` [PATCH bpf-next v3 03/13] bpf: Allow storing unreferenced kptr in map Kumar Kartikeya Dwivedi
  2022-03-21 23:39   ` Joanne Koong
@ 2022-03-22  5:45   ` Andrii Nakryiko
  2022-03-22  7:16     ` Kumar Kartikeya Dwivedi
  2022-03-22 18:06   ` Martin KaFai Lau
  2 siblings, 1 reply; 44+ messages in thread
From: Andrii Nakryiko @ 2022-03-22  5:45 UTC (permalink / raw)
  To: Kumar Kartikeya Dwivedi
  Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Toke Høiland-Jørgensen, Jesper Dangaard Brouer

On Sun, Mar 20, 2022 at 8:55 AM Kumar Kartikeya Dwivedi
<memxor@gmail.com> wrote:
>
> This commit introduces a new pointer type 'kptr' which can be embedded
> in a map value as holds a PTR_TO_BTF_ID stored by a BPF program during
> its invocation. Storing to such a kptr, BPF program's PTR_TO_BTF_ID
> register must have the same type as in the map value's BTF, and loading
> a kptr marks the destination register as PTR_TO_BTF_ID with the correct
> kernel BTF and BTF ID.
>
> Such kptrs are unreferenced, i.e. by the time another invocation of the
> BPF program loads this pointer, the object which the pointer points to
> may no longer exist. Since PTR_TO_BTF_ID loads (using BPF_LDX) are
> patched to PROBE_MEM loads by the verifier, it would be safe to allow the
> user to still access such an invalid pointer, but passing such pointers into
> BPF helpers and kfuncs should not be permitted. A future patch in this
> series will close this gap.
>
> The flexibility offered by allowing programs to dereference such invalid
> pointers while being safe at runtime frees the verifier from doing
> complex lifetime tracking. As long as the user may ensure that the
> object remains valid, it can ensure data read by it from the kernel
> object is valid.
>
> The user indicates that a certain pointer must be treated as kptr
> capable of accepting stores of PTR_TO_BTF_ID of a certain type, by using
> a BTF type tag 'kptr' on the pointed to type of the pointer. Then, this
> information is recorded in the object BTF which will be passed into the
> kernel by way of map's BTF information. The name and kind from the map
> value BTF is used to look up the in-kernel type, and the actual BTF and
> BTF ID is recorded in the map struct in a new kptr_off_tab member. For
> now, only storing pointers to structs is permitted.
>
> An example of this specification is shown below:
>
>         #define __kptr __attribute__((btf_type_tag("kptr")))
>
>         struct map_value {
>                 ...
>                 struct task_struct __kptr *task;
>                 ...
>         };
>
> Then, in a BPF program, user may store PTR_TO_BTF_ID with the type
> task_struct into the map, and then load it later.
>
> Note that the destination register is marked PTR_TO_BTF_ID_OR_NULL, as
> the verifier cannot know whether the value is NULL or not statically, it
> must treat all potential loads at that map value offset as loading a
> possibly NULL pointer.
>
> Only BPF_LDX, BPF_STX, and BPF_ST with insn->imm = 0 (to denote NULL)
> are allowed instructions that can access such a pointer. On BPF_LDX, the
> destination register is updated to be a PTR_TO_BTF_ID, and on BPF_STX,
> it is checked whether the source register type is a PTR_TO_BTF_ID with
> same BTF type as specified in the map BTF. The access size must always
> be BPF_DW.
>
> For the map in map support, the kptr_off_tab for outer map is copied
> from the inner map's kptr_off_tab. It was chosen to do a deep copy
> instead of introducing a refcount to kptr_off_tab, because the copy only
> needs to be done when parameterizing using inner_map_fd in the map in map
> case, hence would be unnecessary for all other users.
>
> It is not permitted to use MAP_FREEZE command and mmap for BPF map
> having kptr, similar to the bpf_timer case.
>
> Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
> ---
>  include/linux/bpf.h     |  29 +++++++-
>  include/linux/btf.h     |   2 +
>  kernel/bpf/btf.c        | 161 ++++++++++++++++++++++++++++++++++------
>  kernel/bpf/map_in_map.c |   5 +-
>  kernel/bpf/syscall.c    | 112 +++++++++++++++++++++++++++-
>  kernel/bpf/verifier.c   | 120 ++++++++++++++++++++++++++++++
>  6 files changed, 401 insertions(+), 28 deletions(-)
>

[...]

> +static int btf_find_field_kptr(const struct btf *btf, const struct btf_type *t,
> +                              u32 off, int sz, struct btf_field_info *info)
> +{
> +       /* For PTR, sz is always == 8 */
> +       if (!btf_type_is_ptr(t))
> +               return BTF_FIELD_IGNORE;
> +       t = btf_type_by_id(btf, t->type);
> +
> +       if (!btf_type_is_type_tag(t))
> +               return BTF_FIELD_IGNORE;
> +       /* Reject extra tags */
> +       if (btf_type_is_type_tag(btf_type_by_id(btf, t->type)))
> +               return -EINVAL;

Can we have tag -> const -> tag -> volatile -> tag in BTF? Wouldn't
you assume there are no more tags with just this check?


> +       if (strcmp("kptr", __btf_name_by_offset(btf, t->name_off)))
> +               return -EINVAL;
> +
> +       /* Get the base type */
> +       if (btf_type_is_modifier(t))
> +               t = btf_type_skip_modifiers(btf, t->type, NULL);
> +       /* Only pointer to struct is allowed */
> +       if (!__btf_type_is_struct(t))
> +               return -EINVAL;
> +
> +       info->type = t;
> +       info->off = off;
> +       return BTF_FIELD_FOUND;
>  }
>
>  static int btf_find_struct_field(const struct btf *btf, const struct btf_type *t,
>                                  const char *name, int sz, int align, int field_type,
> -                                struct btf_field_info *info)
> +                                struct btf_field_info *info, int info_cnt)
>  {
>         const struct btf_member *member;
> +       int ret, idx = 0;
>         u32 i, off;
> -       int ret;
>
>         for_each_member(i, t, member) {
>                 const struct btf_type *member_type = btf_type_by_id(btf,
> @@ -3210,24 +3242,35 @@ static int btf_find_struct_field(const struct btf *btf, const struct btf_type *t
>                 switch (field_type) {
>                 case BTF_FIELD_SPIN_LOCK:
>                 case BTF_FIELD_TIMER:
> -                       ret = btf_find_field_struct(btf, member_type, off, sz, info);
> +                       ret = btf_find_field_struct(btf, member_type, off, sz, &info[idx]);
> +                       if (ret < 0)
> +                               return ret;
> +                       break;
> +               case BTF_FIELD_KPTR:
> +                       ret = btf_find_field_kptr(btf, member_type, off, sz, &info[idx]);
>                         if (ret < 0)
>                                 return ret;
>                         break;
>                 default:
>                         return -EFAULT;
>                 }
> +
> +               if (ret == BTF_FIELD_FOUND && idx >= info_cnt)

hm.. haven't you already written info[info_cnt] above by now? I see
that above you do (info_cnt - 1), but why such tricks if you can have
a temporary struct btf_field_info on the stack, write into it, and if
BTF_FIELD_FOUND and idx < info_cnt then write it into info[idx]?
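
I.e., something along these lines (sketch only):

    struct btf_field_info tmp;
    ...
    ret = btf_find_field_kptr(btf, member_type, off, sz, &tmp);
    if (ret < 0)
            return ret;
    if (ret == BTF_FIELD_IGNORE)
            continue;
    if (idx >= info_cnt)
            return -E2BIG;
    info[idx++] = tmp;

Then neither caller needs the "array of size max + 1" convention.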


> +                       return -E2BIG;
> +               else if (ret == BTF_FIELD_IGNORE)
> +                       continue;
> +               ++idx;
>         }
> -       return 0;
> +       return idx;
>  }
>
>  static int btf_find_datasec_var(const struct btf *btf, const struct btf_type *t,
>                                 const char *name, int sz, int align, int field_type,
> -                               struct btf_field_info *info)
> +                               struct btf_field_info *info, int info_cnt)
>  {
>         const struct btf_var_secinfo *vsi;
> +       int ret, idx = 0;
>         u32 i, off;
> -       int ret;
>
>         for_each_vsi(i, t, vsi) {
>                 const struct btf_type *var = btf_type_by_id(btf, vsi->type);
> @@ -3245,19 +3288,30 @@ static int btf_find_datasec_var(const struct btf *btf, const struct btf_type *t,
>                 switch (field_type) {
>                 case BTF_FIELD_SPIN_LOCK:
>                 case BTF_FIELD_TIMER:
> -                       ret = btf_find_field_struct(btf, var_type, off, sz, info);
> +                       ret = btf_find_field_struct(btf, var_type, off, sz, &info[idx]);
> +                       if (ret < 0)
> +                               return ret;
> +                       break;
> +               case BTF_FIELD_KPTR:
> +                       ret = btf_find_field_kptr(btf, var_type, off, sz, &info[idx]);
>                         if (ret < 0)
>                                 return ret;
>                         break;
>                 default:
>                         return -EFAULT;
>                 }
> +
> +               if (ret == BTF_FIELD_FOUND && idx >= info_cnt)

same, already writing past the end of array?

> +                       return -E2BIG;
> +               if (ret == BTF_FIELD_IGNORE)
> +                       continue;
> +               ++idx;
>         }
> -       return 0;
> +       return idx;
>  }
>
>  static int btf_find_field(const struct btf *btf, const struct btf_type *t,
> -                         int field_type, struct btf_field_info *info)
> +                         int field_type, struct btf_field_info *info, int info_cnt)
>  {
>         const char *name;
>         int sz, align;
> @@ -3273,14 +3327,20 @@ static int btf_find_field(const struct btf *btf, const struct btf_type *t,
>                 sz = sizeof(struct bpf_timer);
>                 align = __alignof__(struct bpf_timer);
>                 break;
> +       case BTF_FIELD_KPTR:
> +               name = NULL;
> +               sz = sizeof(u64);
> +               align = __alignof__(u64);

can be 4 on 32-bit arch, is that ok?

> +               break;
>         default:
>                 return -EFAULT;
>         }
>
> +       /* The maximum allowed fields of a certain type will be info_cnt - 1 */
>         if (__btf_type_is_struct(t))
> -               return btf_find_struct_field(btf, t, name, sz, align, field_type, info);
> +               return btf_find_struct_field(btf, t, name, sz, align, field_type, info, info_cnt - 1);

why -1, to avoid overwriting past the end of array?

>         else if (btf_type_is_datasec(t))
> -               return btf_find_datasec_var(btf, t, name, sz, align, field_type, info);
> +               return btf_find_datasec_var(btf, t, name, sz, align, field_type, info, info_cnt - 1);
>         return -EINVAL;
>  }
>
> @@ -3290,24 +3350,79 @@ static int btf_find_field(const struct btf *btf, const struct btf_type *t,
>   */
>  int btf_find_spin_lock(const struct btf *btf, const struct btf_type *t)
>  {
> -       struct btf_field_info info = { .off = -ENOENT };
> +       /* btf_find_field requires array of size max + 1 */

ok, right, as I expected above, but see also suggestion to not have
these weird implicit expectations

> +       struct btf_field_info info_arr[2];
>         int ret;
>
> -       ret = btf_find_field(btf, t, BTF_FIELD_SPIN_LOCK, &info);
> +       ret = btf_find_field(btf, t, BTF_FIELD_SPIN_LOCK, info_arr, ARRAY_SIZE(info_arr));
>         if (ret < 0)
>                 return ret;
> -       return info.off;
> +       if (!ret)
> +               return -ENOENT;
> +       return info_arr[0].off;
>  }
>
>  int btf_find_timer(const struct btf *btf, const struct btf_type *t)
>  {
> -       struct btf_field_info info = { .off = -ENOENT };
> +       /* btf_find_field requires array of size max + 1 */
> +       struct btf_field_info info_arr[2];
>         int ret;
>
> -       ret = btf_find_field(btf, t, BTF_FIELD_TIMER, &info);
> +       ret = btf_find_field(btf, t, BTF_FIELD_TIMER, info_arr, ARRAY_SIZE(info_arr));
>         if (ret < 0)
>                 return ret;
> -       return info.off;
> +       if (!ret)
> +               return -ENOENT;
> +       return info_arr[0].off;
> +}
> +
> +struct bpf_map_value_off *btf_find_kptr(const struct btf *btf,
> +                                       const struct btf_type *t)
> +{
> +       /* btf_find_field requires array of size max + 1 */
> +       struct btf_field_info info_arr[BPF_MAP_VALUE_OFF_MAX + 1];
> +       struct bpf_map_value_off *tab;
> +       int ret, i, nr_off;
> +
> +       /* Revisit stack usage when bumping BPF_MAP_VALUE_OFF_MAX */
> +       BUILD_BUG_ON(BPF_MAP_VALUE_OFF_MAX != 8);

you can store a u32 type_id instead of the full btf_type pointer; the type
lookup below in the loop is cheap and won't fail
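
I.e. something like (sketch):

    struct btf_field_info {
            u32 type_id; /* instead of const struct btf_type *type */
            u32 off;
    };

and then the loop in btf_find_kptr() can do btf_type_by_id(btf,
info_arr[i].type_id) when it needs the name and kind.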


> +
> +       ret = btf_find_field(btf, t, BTF_FIELD_KPTR, info_arr, ARRAY_SIZE(info_arr));
> +       if (ret < 0)
> +               return ERR_PTR(ret);
> +       if (!ret)
> +               return NULL;
> +

[...]

> +
> +bool bpf_map_equal_kptr_off_tab(const struct bpf_map *map_a, const struct bpf_map *map_b)
> +{
> +       struct bpf_map_value_off *tab_a = map_a->kptr_off_tab, *tab_b = map_b->kptr_off_tab;
> +       bool a_has_kptr = map_value_has_kptr(map_a), b_has_kptr = map_value_has_kptr(map_b);
> +       int size;
> +
> +       if (!a_has_kptr && !b_has_kptr)
> +               return true;
> +       if ((a_has_kptr && !b_has_kptr) || (!a_has_kptr && b_has_kptr))
> +               return false;

if (a_has_kptr != b_has_kptr)
    return false;

> +       if (tab_a->nr_off != tab_b->nr_off)
> +               return false;
> +       size = offsetof(struct bpf_map_value_off, off[tab_a->nr_off]);
> +       return !memcmp(tab_a, tab_b, size);
> +}
> +
>  /* called from workqueue */
>  static void bpf_map_free_deferred(struct work_struct *work)
>  {
>         struct bpf_map *map = container_of(work, struct bpf_map, work);
>
>         security_bpf_map_free(map);
> +       bpf_map_free_kptr_off_tab(map);
>         bpf_map_release_memcg(map);
>         /* implementation dependent freeing */
>         map->ops->map_free(map);
> @@ -640,7 +724,7 @@ static int bpf_map_mmap(struct file *filp, struct vm_area_struct *vma)
>         int err;
>
>         if (!map->ops->map_mmap || map_value_has_spin_lock(map) ||
> -           map_value_has_timer(map))
> +           map_value_has_timer(map) || map_value_has_kptr(map))
>                 return -ENOTSUPP;
>
>         if (!(vma->vm_flags & VM_SHARED))
> @@ -820,9 +904,31 @@ static int map_check_btf(struct bpf_map *map, const struct btf *btf,
>                         return -EOPNOTSUPP;
>         }
>
> -       if (map->ops->map_check_btf)
> +       map->kptr_off_tab = btf_find_kptr(btf, value_type);

btf_find_kptr() is so confusingly named. It certainly can find more
than one kptr, so at least it should be btf_find_kptrs(). Combining
with Joanne's suggestion, btf_parse_kptrs() would indeed be better.

> +       if (map_value_has_kptr(map)) {
> +               if (!bpf_capable())
> +                       return -EPERM;
> +               if (map->map_flags & BPF_F_RDONLY_PROG) {
> +                       ret = -EACCES;
> +                       goto free_map_tab;
> +               }
> +               if (map->map_type != BPF_MAP_TYPE_HASH &&
> +                   map->map_type != BPF_MAP_TYPE_LRU_HASH &&
> +                   map->map_type != BPF_MAP_TYPE_ARRAY) {

what about PERCPU_ARRAY, for instance? Is there something
fundamentally wrong with supporting it for local storage maps?

> +                       ret = -EOPNOTSUPP;
> +                       goto free_map_tab;
> +               }
> +       }
> +
> +       if (map->ops->map_check_btf) {
>                 ret = map->ops->map_check_btf(map, btf, key_type, value_type);
> +               if (ret < 0)
> +                       goto free_map_tab;
> +       }
>
> +       return ret;
> +free_map_tab:
> +       bpf_map_free_kptr_off_tab(map);
>         return ret;
>  }
>
> @@ -1639,7 +1745,7 @@ static int map_freeze(const union bpf_attr *attr)
>                 return PTR_ERR(map);
>
>         if (map->map_type == BPF_MAP_TYPE_STRUCT_OPS ||
> -           map_value_has_timer(map)) {
> +           map_value_has_timer(map) || map_value_has_kptr(map)) {
>                 fdput(f);
>                 return -ENOTSUPP;
>         }
> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> index 4ce9a528fb63..744b7362e52e 100644
> --- a/kernel/bpf/verifier.c
> +++ b/kernel/bpf/verifier.c
> @@ -3507,6 +3507,94 @@ int check_ptr_off_reg(struct bpf_verifier_env *env,
>         return __check_ptr_off_reg(env, reg, regno, false);
>  }
>
> +static int map_kptr_match_type(struct bpf_verifier_env *env,
> +                              struct bpf_map_value_off_desc *off_desc,
> +                              struct bpf_reg_state *reg, u32 regno)
> +{
> +       const char *targ_name = kernel_type_name(off_desc->btf, off_desc->btf_id);
> +       const char *reg_name = "";
> +
> +       if (reg->type != PTR_TO_BTF_ID && reg->type != PTR_TO_BTF_ID_OR_NULL)

base_type(reg->type) != PTR_TO_BTF_ID ?

> +               goto bad_type;
> +
> +       if (!btf_is_kernel(reg->btf)) {
> +               verbose(env, "R%d must point to kernel BTF\n", regno);
> +               return -EINVAL;
> +       }
> +       /* We need to verify reg->type and reg->btf, before accessing reg->btf */
> +       reg_name = kernel_type_name(reg->btf, reg->btf_id);
> +
> +       if (__check_ptr_off_reg(env, reg, regno, true))
> +               return -EACCES;
> +
> +       if (!btf_struct_ids_match(&env->log, reg->btf, reg->btf_id, reg->off,
> +                                 off_desc->btf, off_desc->btf_id))
> +               goto bad_type;
> +       return 0;
> +bad_type:
> +       verbose(env, "invalid kptr access, R%d type=%s%s ", regno,
> +               reg_type_str(env, reg->type), reg_name);
> +       verbose(env, "expected=%s%s\n", reg_type_str(env, PTR_TO_BTF_ID), targ_name);

why two separate verbose calls, you can easily combine them (and they
should be output on a single line given it's a single error)
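
i.e. something like:

    verbose(env, "invalid kptr access, R%d type=%s%s expected=%s%s\n", regno,
            reg_type_str(env, reg->type), reg_name,
            reg_type_str(env, PTR_TO_BTF_ID), targ_name);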

> +       return -EINVAL;
> +}
> +

[...]

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH bpf-next v3 06/13] bpf: Prevent escaping of kptr loaded from maps
  2022-03-20 15:55 ` [PATCH bpf-next v3 06/13] bpf: Prevent escaping of kptr loaded from maps Kumar Kartikeya Dwivedi
@ 2022-03-22  5:58   ` Andrii Nakryiko
  2022-03-22  7:18     ` Kumar Kartikeya Dwivedi
  0 siblings, 1 reply; 44+ messages in thread
From: Andrii Nakryiko @ 2022-03-22  5:58 UTC (permalink / raw)
  To: Kumar Kartikeya Dwivedi
  Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Toke Høiland-Jørgensen, Jesper Dangaard Brouer

On Sun, Mar 20, 2022 at 8:55 AM Kumar Kartikeya Dwivedi
<memxor@gmail.com> wrote:
>
> While we can guarantee that even for an unreferenced kptr, cases like the
> object it points to being freed etc. can be handled by the verifier's
> exception handling (normal load patching to PROBE_MEM loads), we still
> cannot allow the user to pass these pointers to BPF helpers and kfunc,
> because the same exception handling won't be done for accesses inside
> the kernel. The same is true if a referenced pointer is loaded using
> normal load instruction. Since the reference is not guaranteed to be
> held while the pointer is used, it must be marked as untrusted.
>
> Hence introduce a new type flag, PTR_UNTRUSTED, which is used to mark
> all registers loading unreferenced and referenced kptr from BPF maps,
> and ensure they can never escape the BPF program and into the kernel by
> way of calling stable/unstable helpers.
>
> In check_ptr_to_btf_access, the !type_may_be_null check to reject type
> flags is still correct, as apart from PTR_MAYBE_NULL, only MEM_USER,
> MEM_PERCPU, and PTR_UNTRUSTED may be set for PTR_TO_BTF_ID. The first
> two are checked inside the function and rejected using a proper error
> message, but we still want to allow dereference in the untrusted case.
>
> Also, we make sure to inherit PTR_UNTRUSTED when chain of pointers are
> walked, so that this flag is never dropped once it has been set on a
> PTR_TO_BTF_ID (i.e. trusted to untrusted transition can only be in one
> direction).
>
> In convert_ctx_accesses, extend the switch case to consider untrusted
> PTR_TO_BTF_ID in addition to normal PTR_TO_BTF_ID for PROBE_MEM
> conversion for BPF_LDX.
>
> Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
> ---
>  include/linux/bpf.h   | 10 +++++++++-
>  kernel/bpf/verifier.c | 34 +++++++++++++++++++++++++++-------
>  2 files changed, 36 insertions(+), 8 deletions(-)
>

[...]

> -       if (reg->type != PTR_TO_BTF_ID && reg->type != PTR_TO_BTF_ID_OR_NULL)
> -               goto bad_type;
> +       if (off_desc->flags & BPF_MAP_VALUE_OFF_F_REF) {
> +               if (reg->type != PTR_TO_BTF_ID &&
> +                   reg->type != (PTR_TO_BTF_ID | PTR_MAYBE_NULL))
> +                       goto bad_type;
> +       } else { /* only unreferenced case accepts untrusted pointers */
> +               if (reg->type != PTR_TO_BTF_ID &&
> +                   reg->type != (PTR_TO_BTF_ID | PTR_MAYBE_NULL) &&
> +                   reg->type != (PTR_TO_BTF_ID | PTR_UNTRUSTED) &&
> +                   reg->type != (PTR_TO_BTF_ID | PTR_MAYBE_NULL | PTR_UNTRUSTED))

use base_type(), Luke! ;)
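
Roughly (untested sketch, and you may still need to reject other flags like
MEM_USER/MEM_PERCPU explicitly):

    if (base_type(reg->type) != PTR_TO_BTF_ID)
            goto bad_type;
    /* only the unreferenced case may accept untrusted pointers */
    if ((off_desc->flags & BPF_MAP_VALUE_OFF_F_REF) &&
        (reg->type & PTR_UNTRUSTED))
            goto bad_type;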

> +                       goto bad_type;
> +       }
>
>         if (!btf_is_kernel(reg->btf)) {
>                 verbose(env, "R%d must point to kernel BTF\n", regno);

[...]

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH bpf-next v3 03/13] bpf: Allow storing unreferenced kptr in map
  2022-03-21 23:39   ` Joanne Koong
@ 2022-03-22  7:04     ` Kumar Kartikeya Dwivedi
  2022-03-22 20:22       ` Andrii Nakryiko
  0 siblings, 1 reply; 44+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2022-03-22  7:04 UTC (permalink / raw)
  To: Joanne Koong
  Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Toke Høiland-Jørgensen, Jesper Dangaard Brouer

On Tue, Mar 22, 2022 at 05:09:30AM IST, Joanne Koong wrote:
> On Sun, Mar 20, 2022 at 5:27 PM Kumar Kartikeya Dwivedi
> <memxor@gmail.com> wrote:
> >
> > This commit introduces a new pointer type 'kptr' which can be embedded
> > in a map value and holds a PTR_TO_BTF_ID stored by a BPF program during
> > its invocation. When storing to such a kptr, the BPF program's PTR_TO_BTF_ID
> > register must have the same type as in the map value's BTF, and loading
> > a kptr marks the destination register as PTR_TO_BTF_ID with the correct
> > kernel BTF and BTF ID.
> >
> > Such kptrs are unreferenced, i.e. by the time another invocation of the
> > BPF program loads this pointer, the object which the pointer points to
> > may no longer exist. Since PTR_TO_BTF_ID loads (using BPF_LDX) are
> > patched to PROBE_MEM loads by the verifier, it would be safe to allow the
> > user to still access such an invalid pointer, but passing such pointers into
> > BPF helpers and kfuncs should not be permitted. A future patch in this
> > series will close this gap.
> >
> > The flexibility offered by allowing programs to dereference such invalid
> > pointers while being safe at runtime frees the verifier from doing
> > complex lifetime tracking. As long as the user may ensure that the
> > object remains valid, it can ensure data read by it from the kernel
> > object is valid.
> >
> > The user indicates that a certain pointer must be treated as kptr
> > capable of accepting stores of PTR_TO_BTF_ID of a certain type, by using
> > a BTF type tag 'kptr' on the pointed to type of the pointer. Then, this
> > information is recorded in the object BTF which will be passed into the
> > kernel by way of map's BTF information. The name and kind from the map
> > value BTF is used to look up the in-kernel type, and the actual BTF and
> > BTF ID is recorded in the map struct in a new kptr_off_tab member. For
> > now, only storing pointers to structs is permitted.
> >
> > An example of this specification is shown below:
> >
> >         #define __kptr __attribute__((btf_type_tag("kptr")))
> >
> >         struct map_value {
> >                 ...
> >                 struct task_struct __kptr *task;
> >                 ...
> >         };
> >
> > Then, in a BPF program, user may store PTR_TO_BTF_ID with the type
> > task_struct into the map, and then load it later.
> >
> > Note that the destination register is marked PTR_TO_BTF_ID_OR_NULL, as
> > the verifier cannot know whether the value is NULL or not statically, it
> > must treat all potential loads at that map value offset as loading a
> > possibly NULL pointer.
> >
> > Only BPF_LDX, BPF_STX, and BPF_ST with insn->imm = 0 (to denote NULL)
> > are allowed instructions that can access such a pointer. On BPF_LDX, the
> > destination register is updated to be a PTR_TO_BTF_ID, and on BPF_STX,
> > it is checked whether the source register type is a PTR_TO_BTF_ID with
> > same BTF type as specified in the map BTF. The access size must always
> > be BPF_DW.
> >
> > For the map in map support, the kptr_off_tab for outer map is copied
> > from the inner map's kptr_off_tab. It was chosen to do a deep copy
> > instead of introducing a refcount to kptr_off_tab, because the copy only
> > needs to be done when parameterizing using inner_map_fd in the map in map
> > case, hence would be unnecessary for all other users.
> >
> > It is not permitted to use MAP_FREEZE command and mmap for BPF map
> > having kptr, similar to the bpf_timer case.
> >
> > Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
> > ---
> >  include/linux/bpf.h     |  29 +++++++-
> >  include/linux/btf.h     |   2 +
> >  kernel/bpf/btf.c        | 161 ++++++++++++++++++++++++++++++++++------
> >  kernel/bpf/map_in_map.c |   5 +-
> >  kernel/bpf/syscall.c    | 112 +++++++++++++++++++++++++++-
> >  kernel/bpf/verifier.c   | 120 ++++++++++++++++++++++++++++++
> >  6 files changed, 401 insertions(+), 28 deletions(-)
> >
> [...]
> > +
> >  struct bpf_map *bpf_map_get(u32 ufd);
> >  struct bpf_map *bpf_map_get_with_uref(u32 ufd);
> >  struct bpf_map *__bpf_map_get(struct fd f);
> > diff --git a/include/linux/btf.h b/include/linux/btf.h
> > index 36bc09b8e890..5b578dc81c04 100644
> > --- a/include/linux/btf.h
> > +++ b/include/linux/btf.h
> > @@ -123,6 +123,8 @@ bool btf_member_is_reg_int(const struct btf *btf, const struct btf_type *s,
> >                            u32 expected_offset, u32 expected_size);
> >  int btf_find_spin_lock(const struct btf *btf, const struct btf_type *t);
> >  int btf_find_timer(const struct btf *btf, const struct btf_type *t);
> > +struct bpf_map_value_off *btf_find_kptr(const struct btf *btf,
> > +                                       const struct btf_type *t);
>
> nit: given that "btf_find_kptr" allocates memory as well, maybe the
> name "btf_parse_kptr" would be more reflective?
>

Good point, will change.

> >  bool btf_type_is_void(const struct btf_type *t);
> >  s32 btf_find_by_name_kind(const struct btf *btf, const char *name, u8 kind);
> >  const struct btf_type *btf_type_skip_modifiers(const struct btf *btf,
> > diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c
> > index 9e17af936a7a..92afbec0a887 100644
> > --- a/kernel/bpf/btf.c
> > +++ b/kernel/bpf/btf.c
> > @@ -3164,9 +3164,16 @@ static void btf_struct_log(struct btf_verifier_env *env,
> >  enum {
> >         BTF_FIELD_SPIN_LOCK,
> >         BTF_FIELD_TIMER,
> > +       BTF_FIELD_KPTR,
> > +};
> > +
> > +enum {
> > +       BTF_FIELD_IGNORE = 0,
> > +       BTF_FIELD_FOUND  = 1,
> >  };
> >
> >  struct btf_field_info {
> > +       const struct btf_type *type;
> >         u32 off;
> >  };
> >
> > @@ -3174,23 +3181,48 @@ static int btf_find_field_struct(const struct btf *btf, const struct btf_type *t
> >                                  u32 off, int sz, struct btf_field_info *info)
> >  {
> >         if (!__btf_type_is_struct(t))
> > -               return 0;
> > +               return BTF_FIELD_IGNORE;
> >         if (t->size != sz)
> > -               return 0;
> > -       if (info->off != -ENOENT)
> > -               /* only one such field is allowed */
> > -               return -E2BIG;
> > +               return BTF_FIELD_IGNORE;
> >         info->off = off;
> > -       return 0;
> > +       return BTF_FIELD_FOUND;
> > +}
> > +
> > +static int btf_find_field_kptr(const struct btf *btf, const struct btf_type *t,
> > +                              u32 off, int sz, struct btf_field_info *info)
> > +{
> > +       /* For PTR, sz is always == 8 */
> > +       if (!btf_type_is_ptr(t))
> > +               return BTF_FIELD_IGNORE;
> > +       t = btf_type_by_id(btf, t->type);
> > +
> > +       if (!btf_type_is_type_tag(t))
> > +               return BTF_FIELD_IGNORE;
> > +       /* Reject extra tags */
> > +       if (btf_type_is_type_tag(btf_type_by_id(btf, t->type)))
> > +               return -EINVAL;
> > +       if (strcmp("kptr", __btf_name_by_offset(btf, t->name_off)))
> > +               return -EINVAL;
> > +
> > +       /* Get the base type */
> > +       if (btf_type_is_modifier(t))
> > +               t = btf_type_skip_modifiers(btf, t->type, NULL);
> > +       /* Only pointer to struct is allowed */
> > +       if (!__btf_type_is_struct(t))
> > +               return -EINVAL;
> > +
> > +       info->type = t;
> > +       info->off = off;
> > +       return BTF_FIELD_FOUND;
> >  }
> >
> >  static int btf_find_struct_field(const struct btf *btf, const struct btf_type *t,
> >                                  const char *name, int sz, int align, int field_type,
> > -                                struct btf_field_info *info)
> > +                                struct btf_field_info *info, int info_cnt)
>
> From my understanding, this patch now modifies btf_find_struct_field
> and btf_find_datasec_var such that the "info" that is passed in has to
> be an array of size max possible + 1 while "info_cnt" is the max
> possible count, or we risk writing beyond the "info" array passed in.
> It seems like we could just modify the
> btf_find_struct_field/btf_find_datasec_var logic so that the user can
> just pass in info array of max possible size instead of max possible
> size + 1 - or is your concern that this would require more idx >=
> info_cnt checks inside the functions? Maybe we should include a
> comment here and in btf_find_datasec_var to document that "info"
> should always be max possible size + 1?
>

So for some context on why this was changed, follow [0].

I agree it's pretty ugly. My first thought was to check it inside the functions,
but that is also not very great, so I went with this. One more suggestion from
Alexei was to split it into a find step and then a fill step, because the error
on idx >= info_cnt should only happen after we find something. Right now the
find and fill happen together, so to error out you need an extra element that
can be filled before
you bail for ARRAY_SIZE - 1 (which is the actual max).

TBH the find + fill split looks best to me, but open to more suggestions.

[0]: https://lore.kernel.org/bpf/20220319181538.nbqdkprjrzkxk7v4@ast-mbp.dhcp.thefacebook.com
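
If it helps to visualize, the find + fill split might look roughly like this
(btf_fill_field_kptr() is a made-up name):

    ret = btf_find_field_kptr(btf, member_type, off, sz); /* find only */
    if (ret < 0)
            return ret;
    if (ret == BTF_FIELD_IGNORE)
            continue;
    if (idx >= info_cnt)
            return -E2BIG;
    btf_fill_field_kptr(btf, member_type, off, &info[idx++]); /* fill */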

> >  {
> >         const struct btf_member *member;
> > +       int ret, idx = 0;
> >         u32 i, off;
> > -       int ret;
> >
> >         for_each_member(i, t, member) {
> >                 const struct btf_type *member_type = btf_type_by_id(btf,
> > @@ -3210,24 +3242,35 @@ static int btf_find_struct_field(const struct btf *btf, const struct btf_type *t
> >                 switch (field_type) {
> >                 case BTF_FIELD_SPIN_LOCK:
> >                 case BTF_FIELD_TIMER:
> > -                       ret = btf_find_field_struct(btf, member_type, off, sz, info);
> > +                       ret = btf_find_field_struct(btf, member_type, off, sz, &info[idx]);
> > +                       if (ret < 0)
> > +                               return ret;
> > +                       break;
> > +               case BTF_FIELD_KPTR:
> > +                       ret = btf_find_field_kptr(btf, member_type, off, sz, &info[idx]);
> >                         if (ret < 0)
> >                                 return ret;
> >                         break;
> >                 default:
> >                         return -EFAULT;
> >                 }
> > +
> > +               if (ret == BTF_FIELD_FOUND && idx >= info_cnt)
> > +                       return -E2BIG;
> > +               else if (ret == BTF_FIELD_IGNORE)
> > +                       continue;
> > +               ++idx;
> >         }
> > -       return 0;
> > +       return idx;
> >  }
> >
> >  static int btf_find_datasec_var(const struct btf *btf, const struct btf_type *t,
> >                                 const char *name, int sz, int align, int field_type,
> > -                               struct btf_field_info *info)
> > +                               struct btf_field_info *info, int info_cnt)
> >  {
> >         const struct btf_var_secinfo *vsi;
> > +       int ret, idx = 0;
> >         u32 i, off;
> > -       int ret;
> >
> >         for_each_vsi(i, t, vsi) {
> >                 const struct btf_type *var = btf_type_by_id(btf, vsi->type);
> > @@ -3245,19 +3288,30 @@ static int btf_find_datasec_var(const struct btf *btf, const struct btf_type *t,
> >                 switch (field_type) {
> >                 case BTF_FIELD_SPIN_LOCK:
> >                 case BTF_FIELD_TIMER:
> > -                       ret = btf_find_field_struct(btf, var_type, off, sz, info);
> > +                       ret = btf_find_field_struct(btf, var_type, off, sz, &info[idx]);
> > +                       if (ret < 0)
> > +                               return ret;
> > +                       break;
> > +               case BTF_FIELD_KPTR:
> > +                       ret = btf_find_field_kptr(btf, var_type, off, sz, &info[idx]);
> >                         if (ret < 0)
> >                                 return ret;
> >                         break;
> >                 default:
> >                         return -EFAULT;
> >                 }
> > +
> > +               if (ret == BTF_FIELD_FOUND && idx >= info_cnt)
> > +                       return -E2BIG;
> > +               if (ret == BTF_FIELD_IGNORE)
> > +                       continue;
> > +               ++idx;
> >         }
> > -       return 0;
> > +       return idx;
> >  }
> >
> >  static int btf_find_field(const struct btf *btf, const struct btf_type *t,
> > -                         int field_type, struct btf_field_info *info)
> > +                         int field_type, struct btf_field_info *info, int info_cnt)
> >  {
> >         const char *name;
> >         int sz, align;
> > @@ -3273,14 +3327,20 @@ static int btf_find_field(const struct btf *btf, const struct btf_type *t,
> >                 sz = sizeof(struct bpf_timer);
> >                 align = __alignof__(struct bpf_timer);
> >                 break;
> > +       case BTF_FIELD_KPTR:
> > +               name = NULL;
> > +               sz = sizeof(u64);
> > +               align = __alignof__(u64);
> > +               break;
> >         default:
> >                 return -EFAULT;
> >         }
> >
> > +       /* The maximum allowed fields of a certain type will be info_cnt - 1 */
> >         if (__btf_type_is_struct(t))
> > -               return btf_find_struct_field(btf, t, name, sz, align, field_type, info);
> > +               return btf_find_struct_field(btf, t, name, sz, align, field_type, info, info_cnt - 1);
> >         else if (btf_type_is_datasec(t))
> > -               return btf_find_datasec_var(btf, t, name, sz, align, field_type, info);
> > +               return btf_find_datasec_var(btf, t, name, sz, align, field_type, info, info_cnt - 1);
> >         return -EINVAL;
> >  }
> >
> > @@ -3290,24 +3350,79 @@ static int btf_find_field(const struct btf *btf, const struct btf_type *t,
> >   */
> >  int btf_find_spin_lock(const struct btf *btf, const struct btf_type *t)
> >  {
> > -       struct btf_field_info info = { .off = -ENOENT };
> > +       /* btf_find_field requires array of size max + 1 */
> > +       struct btf_field_info info_arr[2];
> >         int ret;
> >
> > -       ret = btf_find_field(btf, t, BTF_FIELD_SPIN_LOCK, &info);
> > +       ret = btf_find_field(btf, t, BTF_FIELD_SPIN_LOCK, info_arr, ARRAY_SIZE(info_arr));
> >         if (ret < 0)
> >                 return ret;
> > -       return info.off;
> > +       if (!ret)
> > +               return -ENOENT;
> > +       return info_arr[0].off;
> >  }
> >
> >  int btf_find_timer(const struct btf *btf, const struct btf_type *t)
> >  {
> > -       struct btf_field_info info = { .off = -ENOENT };
> > +       /* btf_find_field requires array of size max + 1 */
> > +       struct btf_field_info info_arr[2];
> >         int ret;
> >
> > -       ret = btf_find_field(btf, t, BTF_FIELD_TIMER, &info);
> > +       ret = btf_find_field(btf, t, BTF_FIELD_TIMER, info_arr, ARRAY_SIZE(info_arr));
> >         if (ret < 0)
> >                 return ret;
> > -       return info.off;
> > +       if (!ret)
> > +               return -ENOENT;
> > +       return info_arr[0].off;
> > +}
> > +
> > +struct bpf_map_value_off *btf_find_kptr(const struct btf *btf,
> > +                                       const struct btf_type *t)
> > +{
> > +       /* btf_find_field requires array of size max + 1 */
> > +       struct btf_field_info info_arr[BPF_MAP_VALUE_OFF_MAX + 1];
> > +       struct bpf_map_value_off *tab;
> > +       int ret, i, nr_off;
> > +
> > +       /* Revisit stack usage when bumping BPF_MAP_VALUE_OFF_MAX */
> > +       BUILD_BUG_ON(BPF_MAP_VALUE_OFF_MAX != 8);
> > +
> > +       ret = btf_find_field(btf, t, BTF_FIELD_KPTR, info_arr, ARRAY_SIZE(info_arr));
> > +       if (ret < 0)
> > +               return ERR_PTR(ret);
> > +       if (!ret)
> > +               return NULL;
> > +
> > +       nr_off = ret;
> > +       tab = kzalloc(offsetof(struct bpf_map_value_off, off[nr_off]), GFP_KERNEL | __GFP_NOWARN);
> > +       if (!tab)
> > +               return ERR_PTR(-ENOMEM);
> > +
> > +       tab->nr_off = 0;
> > +       for (i = 0; i < nr_off; i++) {
> > +               const struct btf_type *t;
> > +               struct btf *off_btf;
> > +               s32 id;
> > +
> > +               t = info_arr[i].type;
> > +               id = bpf_find_btf_id(__btf_name_by_offset(btf, t->name_off), BTF_INFO_KIND(t->info),
> > +                                    &off_btf);
> > +               if (id < 0) {
> > +                       ret = id;
> > +                       goto end;
> > +               }
> > +
> > +               tab->off[i].offset = info_arr[i].off;
> > +               tab->off[i].btf_id = id;
> > +               tab->off[i].btf = off_btf;
> > +               tab->nr_off = i + 1;
> > +       }
> > +       return tab;
> > +end:
> > +       while (tab->nr_off--)
> > +               btf_put(tab->off[tab->nr_off].btf);
> > +       kfree(tab);
> > +       return ERR_PTR(ret);
> >  }
> >
> >  static void __btf_struct_show(const struct btf *btf, const struct btf_type *t,
> > diff --git a/kernel/bpf/map_in_map.c b/kernel/bpf/map_in_map.c
> > index 5cd8f5277279..135205d0d560 100644
> > --- a/kernel/bpf/map_in_map.c
> > +++ b/kernel/bpf/map_in_map.c
> > @@ -52,6 +52,7 @@ struct bpf_map *bpf_map_meta_alloc(int inner_map_ufd)
> >         inner_map_meta->max_entries = inner_map->max_entries;
> >         inner_map_meta->spin_lock_off = inner_map->spin_lock_off;
> >         inner_map_meta->timer_off = inner_map->timer_off;
> > +       inner_map_meta->kptr_off_tab = bpf_map_copy_kptr_off_tab(inner_map);
> >         if (inner_map->btf) {
> >                 btf_get(inner_map->btf);
> >                 inner_map_meta->btf = inner_map->btf;
> > @@ -71,6 +72,7 @@ struct bpf_map *bpf_map_meta_alloc(int inner_map_ufd)
> >
> >  void bpf_map_meta_free(struct bpf_map *map_meta)
> >  {
> > +       bpf_map_free_kptr_off_tab(map_meta);
> >         btf_put(map_meta->btf);
> >         kfree(map_meta);
> >  }
> > @@ -83,7 +85,8 @@ bool bpf_map_meta_equal(const struct bpf_map *meta0,
> >                 meta0->key_size == meta1->key_size &&
> >                 meta0->value_size == meta1->value_size &&
> >                 meta0->timer_off == meta1->timer_off &&
> > -               meta0->map_flags == meta1->map_flags;
> > +               meta0->map_flags == meta1->map_flags &&
> > +               bpf_map_equal_kptr_off_tab(meta0, meta1);
> >  }
> >
> >  void *bpf_map_fd_get_ptr(struct bpf_map *map,
> > diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
> > index cdaa1152436a..5990d6fa97ab 100644
> > --- a/kernel/bpf/syscall.c
> > +++ b/kernel/bpf/syscall.c
> > @@ -6,6 +6,7 @@
> >  #include <linux/bpf_trace.h>
> >  #include <linux/bpf_lirc.h>
> >  #include <linux/bpf_verifier.h>
> > +#include <linux/bsearch.h>
> >  #include <linux/btf.h>
> >  #include <linux/syscalls.h>
> >  #include <linux/slab.h>
> > @@ -473,12 +474,95 @@ static void bpf_map_release_memcg(struct bpf_map *map)
> >  }
> >  #endif
> >
> > +static int bpf_map_kptr_off_cmp(const void *a, const void *b)
> > +{
> > +       const struct bpf_map_value_off_desc *off_desc1 = a, *off_desc2 = b;
> > +
> > +       if (off_desc1->offset < off_desc2->offset)
> > +               return -1;
> > +       else if (off_desc1->offset > off_desc2->offset)
> > +               return 1;
> > +       return 0;
> > +}
> > +
> > +struct bpf_map_value_off_desc *bpf_map_kptr_off_contains(struct bpf_map *map, u32 offset)
> > +{
> > +       /* Since members are iterated in btf_find_field in increasing order,
> > +        * offsets appended to kptr_off_tab are in increasing order, so we can
> > +        * do bsearch to find exact match.
> > +        */
> > +       struct bpf_map_value_off *tab;
> > +
> > +       if (!map_value_has_kptr(map))
> > +               return NULL;
> > +       tab = map->kptr_off_tab;
> > +       return bsearch(&offset, tab->off, tab->nr_off, sizeof(tab->off[0]), bpf_map_kptr_off_cmp);
> > +}
> > +
> > +void bpf_map_free_kptr_off_tab(struct bpf_map *map)
> > +{
> > +       struct bpf_map_value_off *tab = map->kptr_off_tab;
> > +       int i;
> > +
> > +       if (!map_value_has_kptr(map))
> > +               return;
> > +       for (i = 0; i < tab->nr_off; i++) {
> > +               struct btf *btf = tab->off[i].btf;
> > +
> > +               btf_put(btf);
> > +       }
> > +       kfree(tab);
> > +       map->kptr_off_tab = NULL;
> > +}
> > +
> > +struct bpf_map_value_off *bpf_map_copy_kptr_off_tab(const struct bpf_map *map)
> > +{
> > +       struct bpf_map_value_off *tab = map->kptr_off_tab, *new_tab;
> > +       int size, i, ret;
> > +
> > +       if (!map_value_has_kptr(map))
> > +               return ERR_PTR(-ENOENT);
> > +       /* Do a deep copy of the kptr_off_tab */
> > +       for (i = 0; i < tab->nr_off; i++)
> > +               btf_get(tab->off[i].btf);
> > +
> > +       size = offsetof(struct bpf_map_value_off, off[tab->nr_off]);
> > +       new_tab = kzalloc(size, GFP_KERNEL | __GFP_NOWARN);
> > +       if (!new_tab) {
> > +               ret = -ENOMEM;
> > +               goto end;
> > +       }
> > +       memcpy(new_tab, tab, size);
> > +       return new_tab;
> > +end:
> > +       while (i--)
> > +               btf_put(tab->off[i].btf);
> > +       return ERR_PTR(ret);
> > +}
> > +
> > +bool bpf_map_equal_kptr_off_tab(const struct bpf_map *map_a, const struct bpf_map *map_b)
> > +{
> > +       struct bpf_map_value_off *tab_a = map_a->kptr_off_tab, *tab_b = map_b->kptr_off_tab;
> > +       bool a_has_kptr = map_value_has_kptr(map_a), b_has_kptr = map_value_has_kptr(map_b);
> > +       int size;
> > +
> > +       if (!a_has_kptr && !b_has_kptr)
> > +               return true;
> > +       if ((a_has_kptr && !b_has_kptr) || (!a_has_kptr && b_has_kptr))
>
> nit: I think we could simplify this second if check to
> if (!a_has_kptr || !b_has_kptr)
>     return false;
>

Ack.

> > +               return false;
> > +       if (tab_a->nr_off != tab_b->nr_off)
> > +               return false;
> > +       size = offsetof(struct bpf_map_value_off, off[tab_a->nr_off]);
> > +       return !memcmp(tab_a, tab_b, size);
> > +}
> > +
> >  /* called from workqueue */
> >  static void bpf_map_free_deferred(struct work_struct *work)
> >  {
> >         struct bpf_map *map = container_of(work, struct bpf_map, work);
> >
> >         security_bpf_map_free(map);
> > +       bpf_map_free_kptr_off_tab(map);
> >         bpf_map_release_memcg(map);
> >         /* implementation dependent freeing */
> >         map->ops->map_free(map);
> > @@ -640,7 +724,7 @@ static int bpf_map_mmap(struct file *filp, struct vm_area_struct *vma)
> >         int err;
> >
> >         if (!map->ops->map_mmap || map_value_has_spin_lock(map) ||
> > -           map_value_has_timer(map))
> > +           map_value_has_timer(map) || map_value_has_kptr(map))
> >                 return -ENOTSUPP;
> >
> >         if (!(vma->vm_flags & VM_SHARED))
> > @@ -820,9 +904,31 @@ static int map_check_btf(struct bpf_map *map, const struct btf *btf,
> >                         return -EOPNOTSUPP;
> >         }
> >
> > -       if (map->ops->map_check_btf)
> > +       map->kptr_off_tab = btf_find_kptr(btf, value_type);
> > +       if (map_value_has_kptr(map)) {
> > +               if (!bpf_capable())
> > +                       return -EPERM;
> > +               if (map->map_flags & BPF_F_RDONLY_PROG) {
> > +                       ret = -EACCES;
> > +                       goto free_map_tab;
> > +               }
> > +               if (map->map_type != BPF_MAP_TYPE_HASH &&
> > +                   map->map_type != BPF_MAP_TYPE_LRU_HASH &&
> > +                   map->map_type != BPF_MAP_TYPE_ARRAY) {
> > +                       ret = -EOPNOTSUPP;
> > +                       goto free_map_tab;
> > +               }
> > +       }
> > +
> > +       if (map->ops->map_check_btf) {
> >                 ret = map->ops->map_check_btf(map, btf, key_type, value_type);
> > +               if (ret < 0)
> > +                       goto free_map_tab;
> > +       }
> >
> > +       return ret;
> > +free_map_tab:
> > +       bpf_map_free_kptr_off_tab(map);
> >         return ret;
> >  }
> >
> > @@ -1639,7 +1745,7 @@ static int map_freeze(const union bpf_attr *attr)
> >                 return PTR_ERR(map);
> >
> >         if (map->map_type == BPF_MAP_TYPE_STRUCT_OPS ||
> > -           map_value_has_timer(map)) {
> > +           map_value_has_timer(map) || map_value_has_kptr(map)) {
> >                 fdput(f);
> >                 return -ENOTSUPP;
> >         }
> > diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> > index 4ce9a528fb63..744b7362e52e 100644
> > --- a/kernel/bpf/verifier.c
> > +++ b/kernel/bpf/verifier.c
> > @@ -3507,6 +3507,94 @@ int check_ptr_off_reg(struct bpf_verifier_env *env,
> >         return __check_ptr_off_reg(env, reg, regno, false);
> >  }
> >
> > +static int map_kptr_match_type(struct bpf_verifier_env *env,
> > +                              struct bpf_map_value_off_desc *off_desc,
> > +                              struct bpf_reg_state *reg, u32 regno)
> > +{
> > +       const char *targ_name = kernel_type_name(off_desc->btf, off_desc->btf_id);
> > +       const char *reg_name = "";
> > +
> > +       if (reg->type != PTR_TO_BTF_ID && reg->type != PTR_TO_BTF_ID_OR_NULL)
> > +               goto bad_type;
> > +
> > +       if (!btf_is_kernel(reg->btf)) {
> > +               verbose(env, "R%d must point to kernel BTF\n", regno);
> > +               return -EINVAL;
> > +       }
> > +       /* We need to verify reg->type and reg->btf, before accessing reg->btf */
> > +       reg_name = kernel_type_name(reg->btf, reg->btf_id);
> > +
> > +       if (__check_ptr_off_reg(env, reg, regno, true))
> > +               return -EACCES;
> > +
> > +       if (!btf_struct_ids_match(&env->log, reg->btf, reg->btf_id, reg->off,
> > +                                 off_desc->btf, off_desc->btf_id))
> > +               goto bad_type;
> > +       return 0;
> > +bad_type:
> > +       verbose(env, "invalid kptr access, R%d type=%s%s ", regno,
> > +               reg_type_str(env, reg->type), reg_name);
> > +       verbose(env, "expected=%s%s\n", reg_type_str(env, PTR_TO_BTF_ID), targ_name);
> > +       return -EINVAL;
> > +}
> > +
> > +/* Returns an error, or 0 if ignoring the access, or 1 if register state was
> > + * updated, in which case later updates must be skipped.
> > + */
> > +static int check_map_kptr_access(struct bpf_verifier_env *env, u32 regno,
> > +                                int off, int size, int value_regno,
> > +                                enum bpf_access_type t, int insn_idx)
>
> Did you mean to include the "enum bpf_access_type t" argument? I'm not
> seeing where it's being used in this function
>

Good catch, this was only needed when I supported BPF_XCHG directly swapping
kptr_ref. Since that has been dropped, I will remove this unused parameter.

> > +{
> > +       struct bpf_reg_state *reg = reg_state(env, regno), *val_reg;
> > +       struct bpf_insn *insn = &env->prog->insnsi[insn_idx];
> > +       struct bpf_map_value_off_desc *off_desc;
> > +       struct bpf_map *map = reg->map_ptr;
> > +       int class = BPF_CLASS(insn->code);
> > +
> > +       /* Things we already checked for in check_map_access:
> > +        *  - Reject cases where variable offset may touch BTF ID pointer
> > +        *  - size of access (must be BPF_DW)
> > +        *  - off_desc->offset == off + reg->var_off.value
> > +        */
> > +       if (!tnum_is_const(reg->var_off))
> > +               return 0;
> > +
> > +       off_desc = bpf_map_kptr_off_contains(map, off + reg->var_off.value);
> > +       if (!off_desc)
> > +               return 0;
> > +
> > +       /* Only BPF_[LDX,STX,ST] | BPF_MEM | BPF_DW is supported */
> > +       if (BPF_MODE(insn->code) != BPF_MEM)
> > +               goto end;
> > +
> > +       if (class == BPF_LDX) {
> > +               val_reg = reg_state(env, value_regno);
> > +               /* We can simply mark the value_regno receiving the pointer
> > +                * value from map as PTR_TO_BTF_ID, with the correct type.
> > +                */
> > +               mark_btf_ld_reg(env, cur_regs(env), value_regno, PTR_TO_BTF_ID, off_desc->btf,
> > +                               off_desc->btf_id, PTR_MAYBE_NULL);
> > +               val_reg->id = ++env->id_gen;
> > +       } else if (class == BPF_STX) {
> > +               val_reg = reg_state(env, value_regno);
> > +               if (!register_is_null(val_reg) &&
> > +                   map_kptr_match_type(env, off_desc, val_reg, value_regno))
> > +                       return -EACCES;
> > +       } else if (class == BPF_ST) {
> > +               if (insn->imm) {
> > +                       verbose(env, "BPF_ST imm must be 0 when storing to kptr at off=%u\n",
> > +                               off_desc->offset);
> > +                       return -EACCES;
> > +               }
> > +       } else {
> > +               goto end;
> > +       }
> > +       return 1;
> > +end:
> > +       verbose(env, "kptr in map can only be accessed using BPF_LDX/BPF_STX/BPF_ST\n");
> > +       return -EACCES;
> > +}
> > +
> >  /* check read/write into a map element with possible variable offset */
> >  static int check_map_access(struct bpf_verifier_env *env, u32 regno,
> >                             int off, int size, bool zero_size_allowed)
> > @@ -3545,6 +3633,32 @@ static int check_map_access(struct bpf_verifier_env *env, u32 regno,
> >                         return -EACCES;
> >                 }
> >         }
> > +       if (map_value_has_kptr(map)) {
> > +               struct bpf_map_value_off *tab = map->kptr_off_tab;
> > +               int i;
> > +
> > +               for (i = 0; i < tab->nr_off; i++) {
> > +                       u32 p = tab->off[i].offset;
> > +
> > +                       if (reg->smin_value + off < p + sizeof(u64) &&
> > +                           p < reg->umax_value + off + size) {
> > +                               if (!tnum_is_const(reg->var_off)) {
> > +                                       verbose(env, "kptr access cannot have variable offset\n");
> > +                                       return -EACCES;
> > +                               }
> > +                               if (p != off + reg->var_off.value) {
> > +                                       verbose(env, "kptr access misaligned expected=%u off=%llu\n",
> > +                                               p, off + reg->var_off.value);
> > +                                       return -EACCES;
> > +                               }
> > +                               if (size != bpf_size_to_bytes(BPF_DW)) {
> > +                                       verbose(env, "kptr access size must be BPF_DW\n");
> > +                                       return -EACCES;
> > +                               }
> > +                               break;
> > +                       }
> > +               }
> > +       }
> >         return err;
> >  }
> >
> > @@ -4421,6 +4535,10 @@ static int check_mem_access(struct bpf_verifier_env *env, int insn_idx, u32 regn
> >                 if (err)
> >                         return err;
> >                 err = check_map_access(env, regno, off, size, false);
> > +               err = err ?: check_map_kptr_access(env, regno, off, size, value_regno, t, insn_idx);
> > +               if (err < 0)
> > +                       return err;
> > +               /* if err == 0, check_map_kptr_access ignored the access */
> >                 if (!err && t == BPF_READ && value_regno >= 0) {
> >                         struct bpf_map *map = reg->map_ptr;
> >
> > @@ -4442,6 +4560,8 @@ static int check_mem_access(struct bpf_verifier_env *env, int insn_idx, u32 regn)
> >                                 mark_reg_unknown(env, regs, value_regno);
> >                         }
> >                 }
> > +               /* clear err == 1 */
> > +               err = err < 0 ? err : 0;
>
> I find this flow a bit unintuitive to follow. Would something like
>
>     err = check_map_access(env, regno, off, size, false);
>     if (err)
>         return err;
>     if (bpf_map_kptr_off_contains(map, off + reg->var_off.value, &off_desc)) {
>         err = check_map_kptr_access(...) *where check_map_kptr_access
> now returns 0 on success and error code otherwise*
>     } else if (!err && t == BPF_READ && value_regno >= 0) {
>         ...
>     }
>
> be clearer?
>

Agreed, will change.

>
> >         } else if (base_type(reg->type) == PTR_TO_MEM) {
> >                 bool rdonly_mem = type_is_rdonly_mem(reg->type);
> >
> > --
> > 2.35.1
> >

--
Kartikeya

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH bpf-next v3 03/13] bpf: Allow storing unreferenced kptr in map
  2022-03-22  5:45   ` Andrii Nakryiko
@ 2022-03-22  7:16     ` Kumar Kartikeya Dwivedi
  2022-03-22  7:43       ` Kumar Kartikeya Dwivedi
  2022-03-22 18:52       ` Andrii Nakryiko
  0 siblings, 2 replies; 44+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2022-03-22  7:16 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Toke Høiland-Jørgensen, Jesper Dangaard Brouer

On Tue, Mar 22, 2022 at 11:15:42AM IST, Andrii Nakryiko wrote:
> On Sun, Mar 20, 2022 at 8:55 AM Kumar Kartikeya Dwivedi
> <memxor@gmail.com> wrote:
> >
> > This commit introduces a new pointer type 'kptr' which can be embedded
> > in a map value to hold a PTR_TO_BTF_ID stored by a BPF program during
> > its invocation. When storing to such a kptr, the BPF program's PTR_TO_BTF_ID
> > register must have the same type as in the map value's BTF, and loading
> > a kptr marks the destination register as PTR_TO_BTF_ID with the correct
> > kernel BTF and BTF ID.
> >
> > Such kptr are unreferenced, i.e. by the time another invocation of the
> > BPF program loads this pointer, the object which the pointer points to
> > may no longer exist. Since PTR_TO_BTF_ID loads (using BPF_LDX) are
> > patched to PROBE_MEM loads by the verifier, it would be safe to allow the
> > user to still access such an invalid pointer, but passing such pointers into
> > BPF helpers and kfuncs should not be permitted. A future patch in this
> > series will close this gap.
> >
> > The flexibility offered by allowing programs to dereference such invalid
> > pointers while being safe at runtime frees the verifier from doing
> > complex lifetime tracking. As long as the user may ensure that the
> > object remains valid, it can ensure data read by it from the kernel
> > object is valid.
> >
> > The user indicates that a certain pointer must be treated as kptr
> > capable of accepting stores of PTR_TO_BTF_ID of a certain type, by using
> > a BTF type tag 'kptr' on the pointed to type of the pointer. Then, this
> > information is recorded in the object BTF which will be passed into the
> > kernel by way of map's BTF information. The name and kind from the map
> > value BTF is used to look up the in-kernel type, and the actual BTF and
> > BTF ID is recorded in the map struct in a new kptr_off_tab member. For
> > now, only storing pointers to structs is permitted.
> >
> > An example of this specification is shown below:
> >
> >         #define __kptr __attribute__((btf_type_tag("kptr")))
> >
> >         struct map_value {
> >                 ...
> >                 struct task_struct __kptr *task;
> >                 ...
> >         };
> >
> > Then, in a BPF program, user may store PTR_TO_BTF_ID with the type
> > task_struct into the map, and then load it later.
> >
> > Note that the destination register is marked PTR_TO_BTF_ID_OR_NULL, as
> > the verifier cannot know whether the value is NULL or not statically, it
> > must treat all potential loads at that map value offset as loading a
> > possibly NULL pointer.
> >
> > Only BPF_LDX, BPF_STX, and BPF_ST with insn->imm = 0 (to denote NULL)
> > are allowed instructions that can access such a pointer. On BPF_LDX, the
> > destination register is updated to be a PTR_TO_BTF_ID, and on BPF_STX,
> > it is checked whether the source register type is a PTR_TO_BTF_ID with
> > same BTF type as specified in the map BTF. The access size must always
> > be BPF_DW.
> >
> > For the map in map support, the kptr_off_tab for outer map is copied
> > from the inner map's kptr_off_tab. It was chosen to do a deep copy
> > instead of introducing a refcount to kptr_off_tab, because the copy only
> > needs to be done when parameterizing using inner_map_fd in the map in map
> > case, hence would be unnecessary for all other users.
> >
> > It is not permitted to use MAP_FREEZE command and mmap for BPF map
> > having kptr, similar to the bpf_timer case.
> >
> > Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
> > ---
> >  include/linux/bpf.h     |  29 +++++++-
> >  include/linux/btf.h     |   2 +
> >  kernel/bpf/btf.c        | 161 ++++++++++++++++++++++++++++++++++------
> >  kernel/bpf/map_in_map.c |   5 +-
> >  kernel/bpf/syscall.c    | 112 +++++++++++++++++++++++++++-
> >  kernel/bpf/verifier.c   | 120 ++++++++++++++++++++++++++++++
> >  6 files changed, 401 insertions(+), 28 deletions(-)
> >
>
> [...]
>
> > +static int btf_find_field_kptr(const struct btf *btf, const struct btf_type *t,
> > +                              u32 off, int sz, struct btf_field_info *info)
> > +{
> > +       /* For PTR, sz is always == 8 */
> > +       if (!btf_type_is_ptr(t))
> > +               return BTF_FIELD_IGNORE;
> > +       t = btf_type_by_id(btf, t->type);
> > +
> > +       if (!btf_type_is_type_tag(t))
> > +               return BTF_FIELD_IGNORE;
> > +       /* Reject extra tags */
> > +       if (btf_type_is_type_tag(btf_type_by_id(btf, t->type)))
> > +               return -EINVAL;
>
> Can we have tag -> const -> tag -> volatile -> tag in BTF? Wouldn't
> you assume there are no more tags with just this check?
>

All tags are supposed to come before other modifiers, so the type tags always
appear first and contiguously in the type chain. See [0].

Alexei suggested to reject all other tags for now.

 [0]: https://lore.kernel.org/bpf/20220127154627.665163-1-yhs@fb.com
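
To illustrate (a rough sketch, not from the patch; the second tag is made up
just for the example):

	#define __kptr      __attribute__((btf_type_tag("kptr")))
	#define __other_tag __attribute__((btf_type_tag("foo")))

	struct map_value {
		/* PTR -> type_tag "kptr" -> struct: accepted */
		struct task_struct __kptr *a;
		/* two type tags in the chain: rejected with -EINVAL */
		struct task_struct __other_tag __kptr *b;
		/* untagged pointer: BTF_FIELD_IGNORE */
		struct task_struct *c;
	};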

>
> > +       if (strcmp("kptr", __btf_name_by_offset(btf, t->name_off)))
> > +               return -EINVAL;
> > +
> > +       /* Get the base type */
> > +       if (btf_type_is_modifier(t))
> > +               t = btf_type_skip_modifiers(btf, t->type, NULL);
> > +       /* Only pointer to struct is allowed */
> > +       if (!__btf_type_is_struct(t))
> > +               return -EINVAL;
> > +
> > +       info->type = t;
> > +       info->off = off;
> > +       return BTF_FIELD_FOUND;
> >  }
> >
> >  static int btf_find_struct_field(const struct btf *btf, const struct btf_type *t,
> >                                  const char *name, int sz, int align, int field_type,
> > -                                struct btf_field_info *info)
> > +                                struct btf_field_info *info, int info_cnt)
> >  {
> >         const struct btf_member *member;
> > +       int ret, idx = 0;
> >         u32 i, off;
> > -       int ret;
> >
> >         for_each_member(i, t, member) {
> >                 const struct btf_type *member_type = btf_type_by_id(btf,
> > @@ -3210,24 +3242,35 @@ static int btf_find_struct_field(const struct btf *btf, const struct btf_type *t
> >                 switch (field_type) {
> >                 case BTF_FIELD_SPIN_LOCK:
> >                 case BTF_FIELD_TIMER:
> > -                       ret = btf_find_field_struct(btf, member_type, off, sz, info);
> > +                       ret = btf_find_field_struct(btf, member_type, off, sz, &info[idx]);
> > +                       if (ret < 0)
> > +                               return ret;
> > +                       break;
> > +               case BTF_FIELD_KPTR:
> > +                       ret = btf_find_field_kptr(btf, member_type, off, sz, &info[idx]);
> >                         if (ret < 0)
> >                                 return ret;
> >                         break;
> >                 default:
> >                         return -EFAULT;
> >                 }
> > +
> > +               if (ret == BTF_FIELD_FOUND && idx >= info_cnt)
>
> hm.. haven't you already written info[info_cnt] above by now? I see
> that above you do (info_cnt - 1), but why such tricks if you can have
> a temporary struct btf_field_info on the stack, write into it, and if
> BTF_FIELD_FOUND and idx < info_cnt then write it into info[idx]?
>
>
> > +                       return -E2BIG;
> > +               else if (ret == BTF_FIELD_IGNORE)
> > +                       continue;
> > +               ++idx;
> >         }
> > -       return 0;
> > +       return idx;
> >  }
> >
> >  static int btf_find_datasec_var(const struct btf *btf, const struct btf_type *t,
> >                                 const char *name, int sz, int align, int field_type,
> > -                               struct btf_field_info *info)
> > +                               struct btf_field_info *info, int info_cnt)
> >  {
> >         const struct btf_var_secinfo *vsi;
> > +       int ret, idx = 0;
> >         u32 i, off;
> > -       int ret;
> >
> >         for_each_vsi(i, t, vsi) {
> >                 const struct btf_type *var = btf_type_by_id(btf, vsi->type);
> > @@ -3245,19 +3288,30 @@ static int btf_find_datasec_var(const struct btf *btf, const struct btf_type *t,
> >                 switch (field_type) {
> >                 case BTF_FIELD_SPIN_LOCK:
> >                 case BTF_FIELD_TIMER:
> > -                       ret = btf_find_field_struct(btf, var_type, off, sz, info);
> > +                       ret = btf_find_field_struct(btf, var_type, off, sz, &info[idx]);
> > +                       if (ret < 0)
> > +                               return ret;
> > +                       break;
> > +               case BTF_FIELD_KPTR:
> > +                       ret = btf_find_field_kptr(btf, var_type, off, sz, &info[idx]);
> >                         if (ret < 0)
> >                                 return ret;
> >                         break;
> >                 default:
> >                         return -EFAULT;
> >                 }
> > +
> > +               if (ret == BTF_FIELD_FOUND && idx >= info_cnt)
>
> same, already writing past the end of array?
>
> > +                       return -E2BIG;
> > +               if (ret == BTF_FIELD_IGNORE)
> > +                       continue;
> > +               ++idx;
> >         }
> > -       return 0;
> > +       return idx;
> >  }
> >
> >  static int btf_find_field(const struct btf *btf, const struct btf_type *t,
> > -                         int field_type, struct btf_field_info *info)
> > +                         int field_type, struct btf_field_info *info, int info_cnt)
> >  {
> >         const char *name;
> >         int sz, align;
> > @@ -3273,14 +3327,20 @@ static int btf_find_field(const struct btf *btf, const struct btf_type *t,
> >                 sz = sizeof(struct bpf_timer);
> >                 align = __alignof__(struct bpf_timer);
> >                 break;
> > +       case BTF_FIELD_KPTR:
> > +               name = NULL;
> > +               sz = sizeof(u64);
> > +               align = __alignof__(u64);
>
> can be 4 on 32-bit arch, is that ok?
>

Good catch, it must be 8, so will hardcode.
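
i.e. roughly (just a sketch of the change, not the final diff):

	case BTF_FIELD_KPTR:
		name = NULL;
		sz = sizeof(u64);
		/* pointers in the map value take 8 bytes and must be 8-byte
		 * aligned, even on 32-bit where __alignof__(u64) may be 4
		 */
		align = 8;
		break;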

> > +               break;
> >         default:
> >                 return -EFAULT;
> >         }
> >
> > +       /* The maximum allowed fields of a certain type will be info_cnt - 1 */
> >         if (__btf_type_is_struct(t))
> > -               return btf_find_struct_field(btf, t, name, sz, align, field_type, info);
> > +               return btf_find_struct_field(btf, t, name, sz, align, field_type, info, info_cnt - 1);
>
> why -1, to avoid overwriting past the end of array?
>

Yes, see my reply to Joanne, let's continue discussing it there.

> >         else if (btf_type_is_datasec(t))
> > -               return btf_find_datasec_var(btf, t, name, sz, align, field_type, info);
> > +               return btf_find_datasec_var(btf, t, name, sz, align, field_type, info, info_cnt - 1);
> >         return -EINVAL;
> >  }
> >
> > @@ -3290,24 +3350,79 @@ static int btf_find_field(const struct btf *btf, const struct btf_type *t,
> >   */
> >  int btf_find_spin_lock(const struct btf *btf, const struct btf_type *t)
> >  {
> > -       struct btf_field_info info = { .off = -ENOENT };
> > +       /* btf_find_field requires array of size max + 1 */
>
> ok, right, as I expected above, but see also suggestion to not have
> these weird implicit expectations
>
> > +       struct btf_field_info info_arr[2];
> >         int ret;
> >
> > -       ret = btf_find_field(btf, t, BTF_FIELD_SPIN_LOCK, &info);
> > +       ret = btf_find_field(btf, t, BTF_FIELD_SPIN_LOCK, info_arr, ARRAY_SIZE(info_arr));
> >         if (ret < 0)
> >                 return ret;
> > -       return info.off;
> > +       if (!ret)
> > +               return -ENOENT;
> > +       return info_arr[0].off;
> >  }
> >
> >  int btf_find_timer(const struct btf *btf, const struct btf_type *t)
> >  {
> > -       struct btf_field_info info = { .off = -ENOENT };
> > +       /* btf_find_field requires array of size max + 1 */
> > +       struct btf_field_info info_arr[2];
> >         int ret;
> >
> > -       ret = btf_find_field(btf, t, BTF_FIELD_TIMER, &info);
> > +       ret = btf_find_field(btf, t, BTF_FIELD_TIMER, info_arr, ARRAY_SIZE(info_arr));
> >         if (ret < 0)
> >                 return ret;
> > -       return info.off;
> > +       if (!ret)
> > +               return -ENOENT;
> > +       return info_arr[0].off;
> > +}
> > +
> > +struct bpf_map_value_off *btf_find_kptr(const struct btf *btf,
> > +                                       const struct btf_type *t)
> > +{
> > +       /* btf_find_field requires array of size max + 1 */
> > +       struct btf_field_info info_arr[BPF_MAP_VALUE_OFF_MAX + 1];
> > +       struct bpf_map_value_off *tab;
> > +       int ret, i, nr_off;
> > +
> > +       /* Revisit stack usage when bumping BPF_MAP_VALUE_OFF_MAX */
> > +       BUILD_BUG_ON(BPF_MAP_VALUE_OFF_MAX != 8);
>
> you can store u32 type_id instead of full btf_type pointer, type
> looking below in the loop is cheap and won't fail
>

Ok, will switch to type_id.
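
i.e. (sketch) btf_field_info would just carry the id:

	struct btf_field_info {
		u32 type_id;
		u32 off;
	};

and the callers would resolve it with btf_type_by_id() where the type is
actually needed.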

>
> > +
> > +       ret = btf_find_field(btf, t, BTF_FIELD_KPTR, info_arr, ARRAY_SIZE(info_arr));
> > +       if (ret < 0)
> > +               return ERR_PTR(ret);
> > +       if (!ret)
> > +               return NULL;
> > +
>
> [...]
>
> > +
> > +bool bpf_map_equal_kptr_off_tab(const struct bpf_map *map_a, const struct bpf_map *map_b)
> > +{
> > +       struct bpf_map_value_off *tab_a = map_a->kptr_off_tab, *tab_b = map_b->kptr_off_tab;
> > +       bool a_has_kptr = map_value_has_kptr(map_a), b_has_kptr = map_value_has_kptr(map_b);
> > +       int size;
> > +
> > +       if (!a_has_kptr && !b_has_kptr)
> > +               return true;
> > +       if ((a_has_kptr && !b_has_kptr) || (!a_has_kptr && b_has_kptr))
> > +               return false;
>
> if (a_has_kptr != b_has_kptr)
>     return false;
>

Ack.

> > +       if (tab_a->nr_off != tab_b->nr_off)
> > +               return false;
> > +       size = offsetof(struct bpf_map_value_off, off[tab_a->nr_off]);
> > +       return !memcmp(tab_a, tab_b, size);
> > +}
> > +
> >  /* called from workqueue */
> >  static void bpf_map_free_deferred(struct work_struct *work)
> >  {
> >         struct bpf_map *map = container_of(work, struct bpf_map, work);
> >
> >         security_bpf_map_free(map);
> > +       bpf_map_free_kptr_off_tab(map);
> >         bpf_map_release_memcg(map);
> >         /* implementation dependent freeing */
> >         map->ops->map_free(map);
> > @@ -640,7 +724,7 @@ static int bpf_map_mmap(struct file *filp, struct vm_area_struct *vma)
> >         int err;
> >
> >         if (!map->ops->map_mmap || map_value_has_spin_lock(map) ||
> > -           map_value_has_timer(map))
> > +           map_value_has_timer(map) || map_value_has_kptr(map))
> >                 return -ENOTSUPP;
> >
> >         if (!(vma->vm_flags & VM_SHARED))
> > @@ -820,9 +904,31 @@ static int map_check_btf(struct bpf_map *map, const struct btf *btf,
> >                         return -EOPNOTSUPP;
> >         }
> >
> > -       if (map->ops->map_check_btf)
> > +       map->kptr_off_tab = btf_find_kptr(btf, value_type);
>
> btf_find_kptr() is so confusingly named. It certainly can find more
> than one kptr, so at least it should be btf_find_kptrs(). Combining
> with Joanne's suggestion, btf_parse_kptrs() would indeed be better.
>

Ok.

> > +       if (map_value_has_kptr(map)) {
> > +               if (!bpf_capable())
> > +                       return -EPERM;
> > +               if (map->map_flags & BPF_F_RDONLY_PROG) {
> > +                       ret = -EACCES;
> > +                       goto free_map_tab;
> > +               }
> > +               if (map->map_type != BPF_MAP_TYPE_HASH &&
> > +                   map->map_type != BPF_MAP_TYPE_LRU_HASH &&
> > +                   map->map_type != BPF_MAP_TYPE_ARRAY) {
>
> what about PERCPU_ARRAY, for instance? Is there something
> fundamentally wrong to support it for local storage maps?
>

Plugging support into maps that already take timers was easier to begin with;
I can do percpu support as a follow up.

In case of local storage, I'm a little worried about how we prevent creating
reference cycles. There was a thread where find_get_task_by_pid was proposed as
an unstable helper; once we e.g. support embedding task_struct in a map and
allow storing such a pointer in task local storage, it would be pretty easy to
construct a circular reference cycle.
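
To make the concern concrete, a rough sketch (everything below is hypothetical:
it assumes kptr support in task local storage, the referenced-kptr tag from the
later patches in this series, and acquire/release kfuncs for task_struct that
don't exist yet):

	#include <vmlinux.h>
	#include <bpf/bpf_helpers.h>

	/* assumed from the later "referenced kptr" patch in this series */
	#define __kptr_ref __attribute__((btf_type_tag("kptr_ref")))

	/* hypothetical kfuncs, don't exist yet */
	extern struct task_struct *bpf_task_acquire(struct task_struct *p) __ksym;
	extern void bpf_task_release(struct task_struct *p) __ksym;

	struct map_value {
		struct task_struct __kptr_ref *task;
	};

	struct {
		__uint(type, BPF_MAP_TYPE_TASK_STORAGE);
		__uint(map_flags, BPF_F_NO_PREALLOC);
		__type(key, int);
		__type(value, struct map_value);
	} task_map SEC(".maps");

	SEC("tp_btf/sched_switch")
	int store_self(u64 *ctx)
	{
		struct task_struct *cur = bpf_get_current_task_btf();
		struct task_struct *old;
		struct map_value *v;

		v = bpf_task_storage_get(&task_map, cur, NULL,
					 BPF_LOCAL_STORAGE_GET_F_CREATE);
		if (!v)
			return 0;
		/* hypothetical acquire kfunc raising a reference on cur */
		cur = bpf_task_acquire(cur);
		if (!cur)
			return 0;
		/* The task's local storage now pins the task itself: the
		 * storage is only freed once the task is freed, but the task
		 * can't be freed while its storage holds a reference to it.
		 */
		old = bpf_kptr_xchg(&v->task, cur);
		if (old)
			bpf_task_release(old); /* hypothetical release kfunc */
		return 0;
	}

	char _license[] SEC("license") = "GPL";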

Should we think about this now, or should we worry about it when task_struct
is actually supported as a kptr? It's not only task_struct; the same applies
to sock.

There's a discussion to be had, hence I left it out for now.

> > +                       ret = -EOPNOTSUPP;
> > +                       goto free_map_tab;
> > +               }
> > +       }
> > +
> > +       if (map->ops->map_check_btf) {
> >                 ret = map->ops->map_check_btf(map, btf, key_type, value_type);
> > +               if (ret < 0)
> > +                       goto free_map_tab;
> > +       }
> >
> > +       return ret;
> > +free_map_tab:
> > +       bpf_map_free_kptr_off_tab(map);
> >         return ret;
> >  }
> >
> > @@ -1639,7 +1745,7 @@ static int map_freeze(const union bpf_attr *attr)
> >                 return PTR_ERR(map);
> >
> >         if (map->map_type == BPF_MAP_TYPE_STRUCT_OPS ||
> > -           map_value_has_timer(map)) {
> > +           map_value_has_timer(map) || map_value_has_kptr(map)) {
> >                 fdput(f);
> >                 return -ENOTSUPP;
> >         }
> > diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> > index 4ce9a528fb63..744b7362e52e 100644
> > --- a/kernel/bpf/verifier.c
> > +++ b/kernel/bpf/verifier.c
> > @@ -3507,6 +3507,94 @@ int check_ptr_off_reg(struct bpf_verifier_env *env,
> >         return __check_ptr_off_reg(env, reg, regno, false);
> >  }
> >
> > +static int map_kptr_match_type(struct bpf_verifier_env *env,
> > +                              struct bpf_map_value_off_desc *off_desc,
> > +                              struct bpf_reg_state *reg, u32 regno)
> > +{
> > +       const char *targ_name = kernel_type_name(off_desc->btf, off_desc->btf_id);
> > +       const char *reg_name = "";
> > +
> > +       if (reg->type != PTR_TO_BTF_ID && reg->type != PTR_TO_BTF_ID_OR_NULL)
>
> base_type(reg->type) != PTR_TO_BTF_ID ?
>
> > +               goto bad_type;
> > +
> > +       if (!btf_is_kernel(reg->btf)) {
> > +               verbose(env, "R%d must point to kernel BTF\n", regno);
> > +               return -EINVAL;
> > +       }
> > +       /* We need to verify reg->type and reg->btf, before accessing reg->btf */
> > +       reg_name = kernel_type_name(reg->btf, reg->btf_id);
> > +
> > +       if (__check_ptr_off_reg(env, reg, regno, true))
> > +               return -EACCES;
> > +
> > +       if (!btf_struct_ids_match(&env->log, reg->btf, reg->btf_id, reg->off,
> > +                                 off_desc->btf, off_desc->btf_id))
> > +               goto bad_type;
> > +       return 0;
> > +bad_type:
> > +       verbose(env, "invalid kptr access, R%d type=%s%s ", regno,
> > +               reg_type_str(env, reg->type), reg_name);
> > +       verbose(env, "expected=%s%s\n", reg_type_str(env, PTR_TO_BTF_ID), targ_name);
>
> why two separate verbose calls, you can easily combine them (and they
> should be output on a single line given it's a single error)
>

reg_type_str cannot be called more than once in the same statement, since it
formats into a single scratch buffer in bpf_verifier_env and returns a pointer
to it, so a second call would overwrite the result of the first.

> > +       return -EINVAL;
> > +}
> > +
>
> [...]

--
Kartikeya

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH bpf-next v3 06/13] bpf: Prevent escaping of kptr loaded from maps
  2022-03-22  5:58   ` Andrii Nakryiko
@ 2022-03-22  7:18     ` Kumar Kartikeya Dwivedi
  0 siblings, 0 replies; 44+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2022-03-22  7:18 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Toke Høiland-Jørgensen, Jesper Dangaard Brouer

On Tue, Mar 22, 2022 at 11:28:26AM IST, Andrii Nakryiko wrote:
> On Sun, Mar 20, 2022 at 8:55 AM Kumar Kartikeya Dwivedi
> <memxor@gmail.com> wrote:
> >
> > While we can guarantee that even for an unreferenced kptr, the object the
> > pointer points to being freed etc. can be handled by the verifier's
> > exception handling (normal load patching to PROBE_MEM loads), we still
> > cannot allow the user to pass these pointers to BPF helpers and kfuncs,
> > because the same exception handling won't be done for accesses inside
> > the kernel. The same is true if a referenced pointer is loaded using
> > normal load instruction. Since the reference is not guaranteed to be
> > held while the pointer is used, it must be marked as untrusted.
> >
> > Hence introduce a new type flag, PTR_UNTRUSTED, which is used to mark
> > all registers loading unreferenced and referenced kptr from BPF maps,
> > and ensure they can never escape the BPF program and into the kernel by
> > way of calling stable/unstable helpers.
> >
> > In check_ptr_to_btf_access, the !type_may_be_null check to reject type
> > flags is still correct, as apart from PTR_MAYBE_NULL, only MEM_USER,
> > MEM_PERCPU, and PTR_UNTRUSTED may be set for PTR_TO_BTF_ID. The first
> > two are checked inside the function and rejected using a proper error
> > message, but we still want to allow dereference of untrusted case.
> >
> > Also, we make sure to inherit PTR_UNTRUSTED when chain of pointers are
> > walked, so that this flag is never dropped once it has been set on a
> > PTR_TO_BTF_ID (i.e. trusted to untrusted transition can only be in one
> > direction).
> >
> > In convert_ctx_accesses, extend the switch case to consider untrusted
> > PTR_TO_BTF_ID in addition to normal PTR_TO_BTF_ID for PROBE_MEM
> > conversion for BPF_LDX.
> >
> > Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
> > ---
> >  include/linux/bpf.h   | 10 +++++++++-
> >  kernel/bpf/verifier.c | 34 +++++++++++++++++++++++++++-------
> >  2 files changed, 36 insertions(+), 8 deletions(-)
> >
>
> [...]
>
> > -       if (reg->type != PTR_TO_BTF_ID && reg->type != PTR_TO_BTF_ID_OR_NULL)
> > -               goto bad_type;
> > +       if (off_desc->flags & BPF_MAP_VALUE_OFF_F_REF) {
> > +               if (reg->type != PTR_TO_BTF_ID &&
> > +                   reg->type != (PTR_TO_BTF_ID | PTR_MAYBE_NULL))
> > +                       goto bad_type;
> > +       } else { /* only unreferenced case accepts untrusted pointers */
> > +               if (reg->type != PTR_TO_BTF_ID &&
> > +                   reg->type != (PTR_TO_BTF_ID | PTR_MAYBE_NULL) &&
> > +                   reg->type != (PTR_TO_BTF_ID | PTR_UNTRUSTED) &&
> > +                   reg->type != (PTR_TO_BTF_ID | PTR_MAYBE_NULL | PTR_UNTRUSTED))
>
> use base_type(), Luke! ;)
>

Ack, will switch.
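
Something along these lines (just a sketch of what I'd switch to):

	u32 perm_flags = PTR_MAYBE_NULL;

	/* only the unreferenced case accepts untrusted pointers */
	if (!(off_desc->flags & BPF_MAP_VALUE_OFF_F_REF))
		perm_flags |= PTR_UNTRUSTED;

	if (base_type(reg->type) != PTR_TO_BTF_ID ||
	    (type_flag(reg->type) & ~perm_flags))
		goto bad_type;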

> > +                       goto bad_type;
> > +       }
> >
> >         if (!btf_is_kernel(reg->btf)) {
> >                 verbose(env, "R%d must point to kernel BTF\n", regno);
>
> [...]

--
Kartikeya

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH bpf-next v3 04/13] bpf: Indicate argument that will be released in bpf_func_proto
  2022-03-22  1:47   ` Joanne Koong
@ 2022-03-22  7:34     ` Kumar Kartikeya Dwivedi
  0 siblings, 0 replies; 44+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2022-03-22  7:34 UTC (permalink / raw)
  To: Joanne Koong
  Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Toke Høiland-Jørgensen, Jesper Dangaard Brouer

On Tue, Mar 22, 2022 at 07:17:40AM IST, Joanne Koong wrote:
> On Sun, Mar 20, 2022 at 6:34 PM Kumar Kartikeya Dwivedi
> <memxor@gmail.com> wrote:
> >
> > Add a few fields for each arg (argN_release) that, when set to true,
> > tell the verifier that for a release function, that argument's register
> > will be the one for which meta.ref_obj_id will be set, and which will
> > then be released using release_reference. To capture the regno,
> > introduce a release_regno field in bpf_call_arg_meta.
> >
> > This would be required in the next patch, where we may either pass NULL
> > or a refcounted pointer as an argument to the release function
> > bpf_kptr_xchg. Just releasing only when meta.ref_obj_id is set is not
> > enough, as there is a case where the type of argument needed matches,
> > but the ref_obj_id is set to 0. Hence, we must enforce that whenever
> > meta.ref_obj_id is zero, the register that is to be released can only
> > be NULL for a release function.
> >
> > Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
> > ---
> >  include/linux/bpf.h   | 10 ++++++++++
> >  kernel/bpf/ringbuf.c  |  2 ++
> >  kernel/bpf/verifier.c | 39 +++++++++++++++++++++++++++++++++------
> >  net/core/filter.c     |  1 +
> >  4 files changed, 46 insertions(+), 6 deletions(-)
> >
> > diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> > index f35920d279dd..48ddde854d67 100644
> > --- a/include/linux/bpf.h
> > +++ b/include/linux/bpf.h
> > @@ -487,6 +487,16 @@ struct bpf_func_proto {
> >                 };
> >                 u32 *arg_btf_id[5];
> >         };
> > +       union {
> > +               struct {
> > +                       bool arg1_release;
> > +                       bool arg2_release;
> > +                       bool arg3_release;
> > +                       bool arg4_release;
> > +                       bool arg5_release;
> > +               };
> > +               bool arg_release[5];
> > +       };
>
> Instead of having the new fields "argx_release" for each arg, what are
> your thoughts on using PTR_RELEASE as an "enum bpf_type_flag" to the
> existing "argx_type" field? For example, instead of
>
>      .arg1_type      = ARG_PTR_TO_ALLOC_MEM,
>      .arg1_release   = true,
>
> could we do something like
>
>      .arg1_type      = ARG_PTR_TO_ALLOC_MEM | PTR_RELEASE
>
> In the verifier, we could determine whether an argument register
> releases a reference by checking whether this PTR_RELEASE flag is set.
>
> Would this be a little cleaner? Curious to hear your thoughts.
>

I don't dislike it; it's just a little more work to make sure the flag isn't
set in an arg_type where it isn't expected, so it would need some inspection of
existing code. It's certainly a bit better than having five bools.

I guess I'll try it out and see.
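
For reference, roughly what I understand the suggestion to look like (the name
and bit position of the flag are placeholders):

	/* new flag in enum bpf_type_flag, using whatever bit is free next */
	PTR_RELEASE		= BIT(6 + BPF_BASE_TYPE_BITS),

	/* helper protos then become e.g. */
	.arg1_type	= ARG_PTR_TO_ALLOC_MEM | PTR_RELEASE,

	/* and check_func_arg() could collect it, roughly: */
	if (arg_type & PTR_RELEASE) {
		if (meta->release_regno)
			return -EFAULT; /* misconfigured proto: two release args */
		meta->release_regno = regno;
	}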

>
> >         int *ret_btf_id; /* return value btf_id */
> >         bool (*allowed)(const struct bpf_prog *prog);
> >  };
> > diff --git a/kernel/bpf/ringbuf.c b/kernel/bpf/ringbuf.c
> > index 710ba9de12ce..f40ce718630e 100644
> > --- a/kernel/bpf/ringbuf.c
> > +++ b/kernel/bpf/ringbuf.c
> > @@ -405,6 +405,7 @@ const struct bpf_func_proto bpf_ringbuf_submit_proto = {
> >         .func           = bpf_ringbuf_submit,
> >         .ret_type       = RET_VOID,
> >         .arg1_type      = ARG_PTR_TO_ALLOC_MEM,
> > +       .arg1_release   = true,
> >         .arg2_type      = ARG_ANYTHING,
> >  };
> >
> > @@ -418,6 +419,7 @@ const struct bpf_func_proto bpf_ringbuf_discard_proto = {
> >         .func           = bpf_ringbuf_discard,
> >         .ret_type       = RET_VOID,
> >         .arg1_type      = ARG_PTR_TO_ALLOC_MEM,
> > +       .arg1_release   = true,
> >         .arg2_type      = ARG_ANYTHING,
> >  };
> >
> > diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> > index 744b7362e52e..b8cd34607215 100644
> > --- a/kernel/bpf/verifier.c
> > +++ b/kernel/bpf/verifier.c
> > @@ -245,6 +245,7 @@ struct bpf_call_arg_meta {
> >         struct bpf_map *map_ptr;
> >         bool raw_mode;
> >         bool pkt_access;
> > +       u8 release_regno;
> >         int regno;
> >         int access_size;
> >         int mem_size;
> > @@ -6101,12 +6102,31 @@ static bool check_btf_id_ok(const struct bpf_func_proto *fn)
> >         return true;
> >  }
> >
> > -static int check_func_proto(const struct bpf_func_proto *fn, int func_id)
> > +static bool check_release_regno(const struct bpf_func_proto *fn, int func_id,
> > +                               struct bpf_call_arg_meta *meta)
> > +{
> > +       int i;
> > +
> > +       for (i = 0; i < ARRAY_SIZE(fn->arg_release); i++) {
> > +               if (fn->arg_release[i]) {
> > +                       if (!is_release_function(func_id))
> > +                               return false;
> > +                       if (meta->release_regno)
> > +                               return false;
> > +                       meta->release_regno = i + 1;
> > +               }
> > +       }
> > +       return !is_release_function(func_id) || meta->release_regno;
> > +}
> > +
> > +static int check_func_proto(const struct bpf_func_proto *fn, int func_id,
> > +                           struct bpf_call_arg_meta *meta)
> >  {
> >         return check_raw_mode_ok(fn) &&
> >                check_arg_pair_ok(fn) &&
> >                check_btf_id_ok(fn) &&
> > -              check_refcount_ok(fn, func_id) ? 0 : -EINVAL;
> > +              check_refcount_ok(fn, func_id) &&
> > +              check_release_regno(fn, func_id, meta) ? 0 : -EINVAL;
> >  }
> >
> >  /* Packet data might have moved, any old PTR_TO_PACKET[_META,_END]
> > @@ -6785,7 +6805,7 @@ static int check_helper_call(struct bpf_verifier_env *env, struct bpf_insn *insn
> >         memset(&meta, 0, sizeof(meta));
> >         meta.pkt_access = fn->pkt_access;
> >
> > -       err = check_func_proto(fn, func_id);
> > +       err = check_func_proto(fn, func_id, &meta);
> >         if (err) {
> >                 verbose(env, "kernel subsystem misconfigured func %s#%d\n",
> >                         func_id_name(func_id), func_id);
> > @@ -6818,8 +6838,17 @@ static int check_helper_call(struct bpf_verifier_env *env, struct bpf_insn *insn
> >                         return err;
> >         }
> >
> > +       regs = cur_regs(env);
> > +
> >         if (is_release_function(func_id)) {
> > -               err = release_reference(env, meta.ref_obj_id);
> > +               err = -EINVAL;
> > +               if (meta.ref_obj_id)
> > +                       err = release_reference(env, meta.ref_obj_id);
> > +               /* meta.ref_obj_id can only be 0 if register that is meant to be
> > +                * released is NULL, which must be > R0.
> > +                */
> > +               else if (meta.release_regno && register_is_null(&regs[meta.release_regno]))
> > +                       err = 0;
>
> If I'm understanding this correctly, in this patch we will call
> check_release_regno on every function to determine if any / which of
> the argument registers release a reference. Given that in the majority
> of cases the function will not be a release function, what are your
> thoughts on moving that check to be within the scope of this if
> function? So if it is a release function, and meta.ref_obj_id is not
> set, then we do the checking for which argument register is a release
> register and whether that register is null. Curious to hear your
> thoughts.
>

The suggestion looks nice, as it saves a lot of work, but my preference was to
error out when the bpf_func_proto fields are incorrect (more than one arg has
argN_release == true). With your approach we can still detect that, but the
check sits behind 'if (is_release_function(...))', so it wouldn't catch an
incorrectly set up bpf_func_proto for non-release functions.

So whether to do it your way depends on whether detecting a badly set up
bpf_func_proto is considered valuable defensive programming (we already do it
for some other cases, so it's nothing new), particularly for this case.

>
> >                 if (err) {
> >                         verbose(env, "func %s#%d reference has not been acquired before\n",
> >                                 func_id_name(func_id), func_id);
> > @@ -6827,8 +6856,6 @@ static int check_helper_call(struct bpf_verifier_env *env, struct bpf_insn *insn
> >                 }
> >         }
> >
> > -       regs = cur_regs(env);
> > -
> >         switch (func_id) {
> >         case BPF_FUNC_tail_call:
> >                 err = check_reference_leak(env);
> > diff --git a/net/core/filter.c b/net/core/filter.c
> > index 03655f2074ae..17eff4731b06 100644
> > --- a/net/core/filter.c
> > +++ b/net/core/filter.c
> > @@ -6622,6 +6622,7 @@ static const struct bpf_func_proto bpf_sk_release_proto = {
> >         .gpl_only       = false,
> >         .ret_type       = RET_INTEGER,
> >         .arg1_type      = ARG_PTR_TO_BTF_ID_SOCK_COMMON,
> > +       .arg1_release   = true,
> >  };
> >
> >  BPF_CALL_5(bpf_xdp_sk_lookup_udp, struct xdp_buff *, ctx,
> > --
> > 2.35.1
> >

--
Kartikeya

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH bpf-next v3 03/13] bpf: Allow storing unreferenced kptr in map
  2022-03-22  7:16     ` Kumar Kartikeya Dwivedi
@ 2022-03-22  7:43       ` Kumar Kartikeya Dwivedi
  2022-03-22 18:52       ` Andrii Nakryiko
  1 sibling, 0 replies; 44+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2022-03-22  7:43 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Toke Høiland-Jørgensen, Jesper Dangaard Brouer

On Tue, Mar 22, 2022 at 12:46:40PM IST, Kumar Kartikeya Dwivedi wrote:
> On Tue, Mar 22, 2022 at 11:15:42AM IST, Andrii Nakryiko wrote:
> > On Sun, Mar 20, 2022 at 8:55 AM Kumar Kartikeya Dwivedi
> > <memxor@gmail.com> wrote:
> > >
> > > This commit introduces a new pointer type 'kptr' which can be embedded
> > > in a map value to hold a PTR_TO_BTF_ID stored by a BPF program during
> > > its invocation. When storing to such a kptr, the BPF program's PTR_TO_BTF_ID
> > > register must have the same type as in the map value's BTF, and loading
> > > a kptr marks the destination register as PTR_TO_BTF_ID with the correct
> > > kernel BTF and BTF ID.
> > >
> > > Such kptr are unreferenced, i.e. by the time another invocation of the
> > > BPF program loads this pointer, the object which the pointer points to
> > > may no longer exist. Since PTR_TO_BTF_ID loads (using BPF_LDX) are
> > > patched to PROBE_MEM loads by the verifier, it would be safe to allow the
> > > user to still access such an invalid pointer, but passing such pointers into
> > > BPF helpers and kfuncs should not be permitted. A future patch in this
> > > series will close this gap.
> > >
> > > The flexibility offered by allowing programs to dereference such invalid
> > > pointers while being safe at runtime frees the verifier from doing
> > > complex lifetime tracking. As long as the user may ensure that the
> > > object remains valid, it can ensure data read by it from the kernel
> > > object is valid.
> > >
> > > The user indicates that a certain pointer must be treated as kptr
> > > capable of accepting stores of PTR_TO_BTF_ID of a certain type, by using
> > > a BTF type tag 'kptr' on the pointed to type of the pointer. Then, this
> > > information is recorded in the object BTF which will be passed into the
> > > kernel by way of map's BTF information. The name and kind from the map
> > > value BTF is used to look up the in-kernel type, and the actual BTF and
> > > BTF ID is recorded in the map struct in a new kptr_off_tab member. For
> > > now, only storing pointers to structs is permitted.
> > >
> > > An example of this specification is shown below:
> > >
> > >         #define __kptr __attribute__((btf_type_tag("kptr")))
> > >
> > >         struct map_value {
> > >                 ...
> > >                 struct task_struct __kptr *task;
> > >                 ...
> > >         };
> > >
> > > Then, in a BPF program, user may store PTR_TO_BTF_ID with the type
> > > task_struct into the map, and then load it later.
> > >
> > > Note that the destination register is marked PTR_TO_BTF_ID_OR_NULL, as
> > > the verifier cannot know whether the value is NULL or not statically, it
> > > must treat all potential loads at that map value offset as loading a
> > > possibly NULL pointer.
> > >
> > > Only BPF_LDX, BPF_STX, and BPF_ST with insn->imm = 0 (to denote NULL)
> > > are allowed instructions that can access such a pointer. On BPF_LDX, the
> > > destination register is updated to be a PTR_TO_BTF_ID, and on BPF_STX,
> > > it is checked whether the source register type is a PTR_TO_BTF_ID with
> > > same BTF type as specified in the map BTF. The access size must always
> > > be BPF_DW.
> > >
> > > For the map in map support, the kptr_off_tab for outer map is copied
> > > from the inner map's kptr_off_tab. It was chosen to do a deep copy
> > > instead of introducing a refcount to kptr_off_tab, because the copy only
> > > needs to be done when parameterizing using inner_map_fd in the map in map
> > > case, hence would be unnecessary for all other users.
> > >
> > > It is not permitted to use MAP_FREEZE command and mmap for BPF map
> > > having kptr, similar to the bpf_timer case.
> > >
> > > Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
> > > ---
> > >  include/linux/bpf.h     |  29 +++++++-
> > >  include/linux/btf.h     |   2 +
> > >  kernel/bpf/btf.c        | 161 ++++++++++++++++++++++++++++++++++------
> > >  kernel/bpf/map_in_map.c |   5 +-
> > >  kernel/bpf/syscall.c    | 112 +++++++++++++++++++++++++++-
> > >  kernel/bpf/verifier.c   | 120 ++++++++++++++++++++++++++++++
> > >  6 files changed, 401 insertions(+), 28 deletions(-)
> > >
> >
> > [...]
> >
> > > +static int btf_find_field_kptr(const struct btf *btf, const struct btf_type *t,
> > > +                              u32 off, int sz, struct btf_field_info *info)
> > > +{
> > > +       /* For PTR, sz is always == 8 */
> > > +       if (!btf_type_is_ptr(t))
> > > +               return BTF_FIELD_IGNORE;
> > > +       t = btf_type_by_id(btf, t->type);
> > > +
> > > +       if (!btf_type_is_type_tag(t))
> > > +               return BTF_FIELD_IGNORE;
> > > +       /* Reject extra tags */
> > > +       if (btf_type_is_type_tag(btf_type_by_id(btf, t->type)))
> > > +               return -EINVAL;
> >
> > Can we have tag -> const -> tag -> volatile -> tag in BTF? Wouldn't
> > you assume there are no more tags with just this check?
> >
>
> All tags are supposed to come before other modifiers, so the type tags always
> appear first and contiguously in the type chain. See [0].
>
> Alexei suggested to reject all other tags for now.
>
>  [0]: https://lore.kernel.org/bpf/20220127154627.665163-1-yhs@fb.com
>
> >
> > > +       if (strcmp("kptr", __btf_name_by_offset(btf, t->name_off)))
> > > +               return -EINVAL;
> > > +
> > > +       /* Get the base type */
> > > +       if (btf_type_is_modifier(t))
> > > +               t = btf_type_skip_modifiers(btf, t->type, NULL);
> > > +       /* Only pointer to struct is allowed */
> > > +       if (!__btf_type_is_struct(t))
> > > +               return -EINVAL;
> > > +
> > > +       info->type = t;
> > > +       info->off = off;
> > > +       return BTF_FIELD_FOUND;
> > >  }
> > >
> > >  static int btf_find_struct_field(const struct btf *btf, const struct btf_type *t,
> > >                                  const char *name, int sz, int align, int field_type,
> > > -                                struct btf_field_info *info)
> > > +                                struct btf_field_info *info, int info_cnt)
> > >  {
> > >         const struct btf_member *member;
> > > +       int ret, idx = 0;
> > >         u32 i, off;
> > > -       int ret;
> > >
> > >         for_each_member(i, t, member) {
> > >                 const struct btf_type *member_type = btf_type_by_id(btf,
> > > @@ -3210,24 +3242,35 @@ static int btf_find_struct_field(const struct btf *btf, const struct btf_type *t
> > >                 switch (field_type) {
> > >                 case BTF_FIELD_SPIN_LOCK:
> > >                 case BTF_FIELD_TIMER:
> > > -                       ret = btf_find_field_struct(btf, member_type, off, sz, info);
> > > +                       ret = btf_find_field_struct(btf, member_type, off, sz, &info[idx]);
> > > +                       if (ret < 0)
> > > +                               return ret;
> > > +                       break;
> > > +               case BTF_FIELD_KPTR:
> > > +                       ret = btf_find_field_kptr(btf, member_type, off, sz, &info[idx]);
> > >                         if (ret < 0)
> > >                                 return ret;
> > >                         break;
> > >                 default:
> > >                         return -EFAULT;
> > >                 }
> > > +
> > > +               if (ret == BTF_FIELD_FOUND && idx >= info_cnt)
> >
> > hm.. haven't you already written info[info_cnt] above by now? I see
> > that above you do (info_cnt - 1), but why such tricks if you can have
> > a temporary struct btf_field_info on the stack, write into it, and if
> > BTF_FIELD_FOUND and idx < info_cnt then write it into info[idx]?
> >
> >
> > > +                       return -E2BIG;
> > > +               else if (ret == BTF_FIELD_IGNORE)
> > > +                       continue;
> > > +               ++idx;
> > >         }
> > > -       return 0;
> > > +       return idx;
> > >  }
> > >
> > >  static int btf_find_datasec_var(const struct btf *btf, const struct btf_type *t,
> > >                                 const char *name, int sz, int align, int field_type,
> > > -                               struct btf_field_info *info)
> > > +                               struct btf_field_info *info, int info_cnt)
> > >  {
> > >         const struct btf_var_secinfo *vsi;
> > > +       int ret, idx = 0;
> > >         u32 i, off;
> > > -       int ret;
> > >
> > >         for_each_vsi(i, t, vsi) {
> > >                 const struct btf_type *var = btf_type_by_id(btf, vsi->type);
> > > @@ -3245,19 +3288,30 @@ static int btf_find_datasec_var(const struct btf *btf, const struct btf_type *t,
> > >                 switch (field_type) {
> > >                 case BTF_FIELD_SPIN_LOCK:
> > >                 case BTF_FIELD_TIMER:
> > > -                       ret = btf_find_field_struct(btf, var_type, off, sz, info);
> > > +                       ret = btf_find_field_struct(btf, var_type, off, sz, &info[idx]);
> > > +                       if (ret < 0)
> > > +                               return ret;
> > > +                       break;
> > > +               case BTF_FIELD_KPTR:
> > > +                       ret = btf_find_field_kptr(btf, var_type, off, sz, &info[idx]);
> > >                         if (ret < 0)
> > >                                 return ret;
> > >                         break;
> > >                 default:
> > >                         return -EFAULT;
> > >                 }
> > > +
> > > +               if (ret == BTF_FIELD_FOUND && idx >= info_cnt)
> >
> > same, already writing past the end of array?
> >
> > > +                       return -E2BIG;
> > > +               if (ret == BTF_FIELD_IGNORE)
> > > +                       continue;
> > > +               ++idx;
> > >         }
> > > -       return 0;
> > > +       return idx;
> > >  }
> > >
> > >  static int btf_find_field(const struct btf *btf, const struct btf_type *t,
> > > -                         int field_type, struct btf_field_info *info)
> > > +                         int field_type, struct btf_field_info *info, int info_cnt)
> > >  {
> > >         const char *name;
> > >         int sz, align;
> > > @@ -3273,14 +3327,20 @@ static int btf_find_field(const struct btf *btf, const struct btf_type *t,
> > >                 sz = sizeof(struct bpf_timer);
> > >                 align = __alignof__(struct bpf_timer);
> > >                 break;
> > > +       case BTF_FIELD_KPTR:
> > > +               name = NULL;
> > > +               sz = sizeof(u64);
> > > +               align = __alignof__(u64);
> >
> > can be 4 on 32-bit arch, is that ok?
> >
>
> Good catch, it must be 8, so will hardcode.
>
> > > +               break;
> > >         default:
> > >                 return -EFAULT;
> > >         }
> > >
> > > +       /* The maximum allowed fields of a certain type will be info_cnt - 1 */
> > >         if (__btf_type_is_struct(t))
> > > -               return btf_find_struct_field(btf, t, name, sz, align, field_type, info);
> > > +               return btf_find_struct_field(btf, t, name, sz, align, field_type, info, info_cnt - 1);
> >
> > why -1, to avoid overwriting past the end of array?
> >
>
> Yes, see my reply to Joanne, let's continue discussing it there.
>
> > >         else if (btf_type_is_datasec(t))
> > > -               return btf_find_datasec_var(btf, t, name, sz, align, field_type, info);
> > > +               return btf_find_datasec_var(btf, t, name, sz, align, field_type, info, info_cnt - 1);
> > >         return -EINVAL;
> > >  }
> > >
> > > @@ -3290,24 +3350,79 @@ static int btf_find_field(const struct btf *btf, const struct btf_type *t,
> > >   */
> > >  int btf_find_spin_lock(const struct btf *btf, const struct btf_type *t)
> > >  {
> > > -       struct btf_field_info info = { .off = -ENOENT };
> > > +       /* btf_find_field requires array of size max + 1 */
> >
> > ok, right, as I expected above, but see also suggestion to not have
> > these weird implicit expectations
> >
> > > +       struct btf_field_info info_arr[2];
> > >         int ret;
> > >
> > > -       ret = btf_find_field(btf, t, BTF_FIELD_SPIN_LOCK, &info);
> > > +       ret = btf_find_field(btf, t, BTF_FIELD_SPIN_LOCK, info_arr, ARRAY_SIZE(info_arr));
> > >         if (ret < 0)
> > >                 return ret;
> > > -       return info.off;
> > > +       if (!ret)
> > > +               return -ENOENT;
> > > +       return info_arr[0].off;
> > >  }
> > >
> > >  int btf_find_timer(const struct btf *btf, const struct btf_type *t)
> > >  {
> > > -       struct btf_field_info info = { .off = -ENOENT };
> > > +       /* btf_find_field requires array of size max + 1 */
> > > +       struct btf_field_info info_arr[2];
> > >         int ret;
> > >
> > > -       ret = btf_find_field(btf, t, BTF_FIELD_TIMER, &info);
> > > +       ret = btf_find_field(btf, t, BTF_FIELD_TIMER, info_arr, ARRAY_SIZE(info_arr));
> > >         if (ret < 0)
> > >                 return ret;
> > > -       return info.off;
> > > +       if (!ret)
> > > +               return -ENOENT;
> > > +       return info_arr[0].off;
> > > +}
> > > +
> > > +struct bpf_map_value_off *btf_find_kptr(const struct btf *btf,
> > > +                                       const struct btf_type *t)
> > > +{
> > > +       /* btf_find_field requires array of size max + 1 */
> > > +       struct btf_field_info info_arr[BPF_MAP_VALUE_OFF_MAX + 1];
> > > +       struct bpf_map_value_off *tab;
> > > +       int ret, i, nr_off;
> > > +
> > > +       /* Revisit stack usage when bumping BPF_MAP_VALUE_OFF_MAX */
> > > +       BUILD_BUG_ON(BPF_MAP_VALUE_OFF_MAX != 8);
> >
> > you can store u32 type_id instead of full btf_type pointer, type
> > looking below in the loop is cheap and won't fail
> >
>
> Ok, will switch to type_id.
>
> >
> > > +
> > > +       ret = btf_find_field(btf, t, BTF_FIELD_KPTR, info_arr, ARRAY_SIZE(info_arr));
> > > +       if (ret < 0)
> > > +               return ERR_PTR(ret);
> > > +       if (!ret)
> > > +               return NULL;
> > > +
> >
> > [...]
> >
> > > +
> > > +bool bpf_map_equal_kptr_off_tab(const struct bpf_map *map_a, const struct bpf_map *map_b)
> > > +{
> > > +       struct bpf_map_value_off *tab_a = map_a->kptr_off_tab, *tab_b = map_b->kptr_off_tab;
> > > +       bool a_has_kptr = map_value_has_kptr(map_a), b_has_kptr = map_value_has_kptr(map_b);
> > > +       int size;
> > > +
> > > +       if (!a_has_kptr && !b_has_kptr)
> > > +               return true;
> > > +       if ((a_has_kptr && !b_has_kptr) || (!a_has_kptr && b_has_kptr))
> > > +               return false;
> >
> > if (a_has_kptr != b_has_kptr)
> >     return false;
> >
>
> Ack.
>
> > > +       if (tab_a->nr_off != tab_b->nr_off)
> > > +               return false;
> > > +       size = offsetof(struct bpf_map_value_off, off[tab_a->nr_off]);
> > > +       return !memcmp(tab_a, tab_b, size);
> > > +}
> > > +
> > >  /* called from workqueue */
> > >  static void bpf_map_free_deferred(struct work_struct *work)
> > >  {
> > >         struct bpf_map *map = container_of(work, struct bpf_map, work);
> > >
> > >         security_bpf_map_free(map);
> > > +       bpf_map_free_kptr_off_tab(map);
> > >         bpf_map_release_memcg(map);
> > >         /* implementation dependent freeing */
> > >         map->ops->map_free(map);
> > > @@ -640,7 +724,7 @@ static int bpf_map_mmap(struct file *filp, struct vm_area_struct *vma)
> > >         int err;
> > >
> > >         if (!map->ops->map_mmap || map_value_has_spin_lock(map) ||
> > > -           map_value_has_timer(map))
> > > +           map_value_has_timer(map) || map_value_has_kptr(map))
> > >                 return -ENOTSUPP;
> > >
> > >         if (!(vma->vm_flags & VM_SHARED))
> > > @@ -820,9 +904,31 @@ static int map_check_btf(struct bpf_map *map, const struct btf *btf,
> > >                         return -EOPNOTSUPP;
> > >         }
> > >
> > > -       if (map->ops->map_check_btf)
> > > +       map->kptr_off_tab = btf_find_kptr(btf, value_type);
> >
> > btf_find_kptr() is so confusingly named. It certainly can find more
> > than one kptr, so at least it should be btf_find_kptrs(). Combining
> > with Joanne's suggestion, btf_parse_kptrs() would indeed be better.
> >
>
> Ok.
>
> > > +       if (map_value_has_kptr(map)) {
> > > +               if (!bpf_capable())
> > > +                       return -EPERM;
> > > +               if (map->map_flags & BPF_F_RDONLY_PROG) {
> > > +                       ret = -EACCES;
> > > +                       goto free_map_tab;
> > > +               }
> > > +               if (map->map_type != BPF_MAP_TYPE_HASH &&
> > > +                   map->map_type != BPF_MAP_TYPE_LRU_HASH &&
> > > +                   map->map_type != BPF_MAP_TYPE_ARRAY) {
> >
> > what about PERCPU_ARRAY, for instance? Is there something
> > fundamentally wrong to support it for local storage maps?
> >
>
> Plugging support into maps that already take timers was easier to begin with; I
> can do percpu support as a follow-up.
>
> In the case of local storage, I'm a little worried about how we prevent creating
> reference cycles. There was a thread where find_get_task_by_pid was proposed as an
> unstable helper; once we e.g. support embedding task_struct in a map and allow
> storing such a pointer in task local storage, it would be pretty easy to construct
> a reference cycle.
>
> Should we think about this now, or should we worry about this when task_struct
> is actually supported as a kptr? It's not only task_struct; the same applies to sock.
>
> There's a discussion to be had, hence I left it out for now.
>
> > > +                       ret = -EOPNOTSUPP;
> > > +                       goto free_map_tab;
> > > +               }
> > > +       }
> > > +
> > > +       if (map->ops->map_check_btf) {
> > >                 ret = map->ops->map_check_btf(map, btf, key_type, value_type);
> > > +               if (ret < 0)
> > > +                       goto free_map_tab;
> > > +       }
> > >
> > > +       return ret;
> > > +free_map_tab:
> > > +       bpf_map_free_kptr_off_tab(map);
> > >         return ret;
> > >  }
> > >
> > > @@ -1639,7 +1745,7 @@ static int map_freeze(const union bpf_attr *attr)
> > >                 return PTR_ERR(map);
> > >
> > >         if (map->map_type == BPF_MAP_TYPE_STRUCT_OPS ||
> > > -           map_value_has_timer(map)) {
> > > +           map_value_has_timer(map) || map_value_has_kptr(map)) {
> > >                 fdput(f);
> > >                 return -ENOTSUPP;
> > >         }
> > > diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> > > index 4ce9a528fb63..744b7362e52e 100644
> > > --- a/kernel/bpf/verifier.c
> > > +++ b/kernel/bpf/verifier.c
> > > @@ -3507,6 +3507,94 @@ int check_ptr_off_reg(struct bpf_verifier_env *env,
> > >         return __check_ptr_off_reg(env, reg, regno, false);
> > >  }
> > >
> > > +static int map_kptr_match_type(struct bpf_verifier_env *env,
> > > +                              struct bpf_map_value_off_desc *off_desc,
> > > +                              struct bpf_reg_state *reg, u32 regno)
> > > +{
> > > +       const char *targ_name = kernel_type_name(off_desc->btf, off_desc->btf_id);
> > > +       const char *reg_name = "";
> > > +
> > > +       if (reg->type != PTR_TO_BTF_ID && reg->type != PTR_TO_BTF_ID_OR_NULL)
> >
> > base_type(reg->type) != PTR_TO_BTF_ID ?
> >
> > > +               goto bad_type;
> > > +
> > > +       if (!btf_is_kernel(reg->btf)) {
> > > +               verbose(env, "R%d must point to kernel BTF\n", regno);
> > > +               return -EINVAL;
> > > +       }
> > > +       /* We need to verify reg->type and reg->btf, before accessing reg->btf */
> > > +       reg_name = kernel_type_name(reg->btf, reg->btf_id);
> > > +
> > > +       if (__check_ptr_off_reg(env, reg, regno, true))
> > > +               return -EACCES;
> > > +
> > > +       if (!btf_struct_ids_match(&env->log, reg->btf, reg->btf_id, reg->off,
> > > +                                 off_desc->btf, off_desc->btf_id))
> > > +               goto bad_type;
> > > +       return 0;
> > > +bad_type:
> > > +       verbose(env, "invalid kptr access, R%d type=%s%s ", regno,
> > > +               reg_type_str(env, reg->type), reg_name);
> > > +       verbose(env, "expected=%s%s\n", reg_type_str(env, PTR_TO_BTF_ID), targ_name);
> >
> > why two separate verbose calls, you can easily combine them (and they
> > should be output on a single line given it's a single error)
> >
>
> reg_type_str cannot be called more than once in the same statement, since it
> reuses the same buffer.
>

I think to fix this we can use an array of buffers (e.g. max of 6 or so), and
then use i++ % ARRAY_SIZE(...), this would allow calling it twice in the same
statement.
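
Roughly something like this (untested sketch; buffer count, length and names
are made up, and in the verifier the buffers would live in bpf_verifier_env
rather than be static):

	#define TYPE_STR_NBUFS   6
	#define TYPE_STR_BUF_LEN 64

	static const char *type_str_example(const char *prefix, const char *name)
	{
		static char bufs[TYPE_STR_NBUFS][TYPE_STR_BUF_LEN];
		static unsigned int i;
		/* rotate buffers so two calls in one statement don't clobber
		 * each other's result
		 */
		char *buf = bufs[i++ % TYPE_STR_NBUFS];

		snprintf(buf, TYPE_STR_BUF_LEN, "%s%s", prefix, name);
		return buf;
	}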

> > > +       return -EINVAL;
> > > +}
> > > +
> >
> > [...]
>
> --
> Kartikeya

--
Kartikeya

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH bpf-next v3 03/13] bpf: Allow storing unreferenced kptr in map
  2022-03-20 15:55 ` [PATCH bpf-next v3 03/13] bpf: Allow storing unreferenced kptr in map Kumar Kartikeya Dwivedi
  2022-03-21 23:39   ` Joanne Koong
  2022-03-22  5:45   ` Andrii Nakryiko
@ 2022-03-22 18:06   ` Martin KaFai Lau
  2022-03-25 14:45     ` Kumar Kartikeya Dwivedi
  2 siblings, 1 reply; 44+ messages in thread
From: Martin KaFai Lau @ 2022-03-22 18:06 UTC (permalink / raw)
  To: Kumar Kartikeya Dwivedi
  Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Toke Høiland-Jørgensen, Jesper Dangaard Brouer

On Sun, Mar 20, 2022 at 09:25:00PM +0530, Kumar Kartikeya Dwivedi wrote:
> @@ -820,9 +904,31 @@ static int map_check_btf(struct bpf_map *map, const struct btf *btf,
>  			return -EOPNOTSUPP;
>  	}
>  
> -	if (map->ops->map_check_btf)
> +	map->kptr_off_tab = btf_find_kptr(btf, value_type);
> +	if (map_value_has_kptr(map)) {
> +		if (!bpf_capable())
> +			return -EPERM;
Not sure if this has been brought up.

No need to bpf_map_free_kptr_off_tab() in this case?

> +		if (map->map_flags & BPF_F_RDONLY_PROG) {
> +			ret = -EACCES;
> +			goto free_map_tab;
> +		}
> +		if (map->map_type != BPF_MAP_TYPE_HASH &&
> +		    map->map_type != BPF_MAP_TYPE_LRU_HASH &&
> +		    map->map_type != BPF_MAP_TYPE_ARRAY) {
> +			ret = -EOPNOTSUPP;
> +			goto free_map_tab;
> +		}
> +	}
If btf_find_kptr() returns an error, can it be ignored so that we continue?

btw, it is quite unusual to store an err ptr in the map.
How about only storing NULL or a valid ptr in map->kptr_off_tab?
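
i.e. something along these lines (untested sketch):

	struct bpf_map_value_off *tab;

	tab = btf_find_kptr(btf, value_type);
	if (IS_ERR(tab))
		tab = NULL;	/* ignore the error and continue without kptrs */
	map->kptr_off_tab = tab;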

> +
> +	if (map->ops->map_check_btf) {
>  		ret = map->ops->map_check_btf(map, btf, key_type, value_type);
> +		if (ret < 0)
> +			goto free_map_tab;
> +	}
>
> +	return ret;
> +free_map_tab:
> +	bpf_map_free_kptr_off_tab(map);
>  	return ret;
>  }

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH bpf-next v3 03/13] bpf: Allow storing unreferenced kptr in map
  2022-03-22  7:16     ` Kumar Kartikeya Dwivedi
  2022-03-22  7:43       ` Kumar Kartikeya Dwivedi
@ 2022-03-22 18:52       ` Andrii Nakryiko
  2022-03-25 14:42         ` Kumar Kartikeya Dwivedi
  1 sibling, 1 reply; 44+ messages in thread
From: Andrii Nakryiko @ 2022-03-22 18:52 UTC (permalink / raw)
  To: Kumar Kartikeya Dwivedi
  Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Toke Høiland-Jørgensen, Jesper Dangaard Brouer

On Tue, Mar 22, 2022 at 12:16 AM Kumar Kartikeya Dwivedi
<memxor@gmail.com> wrote:
>
> On Tue, Mar 22, 2022 at 11:15:42AM IST, Andrii Nakryiko wrote:
> > On Sun, Mar 20, 2022 at 8:55 AM Kumar Kartikeya Dwivedi
> > <memxor@gmail.com> wrote:
> > >
> > > This commit introduces a new pointer type 'kptr' which can be embedded
> > > in a map value and holds a PTR_TO_BTF_ID stored by a BPF program during
> > > its invocation. When storing to such a kptr, the BPF program's PTR_TO_BTF_ID
> > > register must have the same type as in the map value's BTF, and loading
> > > a kptr marks the destination register as PTR_TO_BTF_ID with the correct
> > > kernel BTF and BTF ID.
> > >
> > > Such kptr are unreferenced, i.e. by the time another invocation of the
> > > BPF program loads this pointer, the object which the pointer points to
> > > may no longer exist. Since PTR_TO_BTF_ID loads (using BPF_LDX) are
> > > patched to PROBE_MEM loads by the verifier, it would be safe to allow the user
> > > to still access such an invalid pointer, but passing such pointers into
> > > BPF helpers and kfuncs should not be permitted. A future patch in this
> > > series will close this gap.
> > >
> > > The flexibility offered by allowing programs to dereference such invalid
> > > pointers while being safe at runtime frees the verifier from doing
> > > complex lifetime tracking. As long as the user may ensure that the
> > > object remains valid, it can ensure data read by it from the kernel
> > > object is valid.
> > >
> > > The user indicates that a certain pointer must be treated as kptr
> > > capable of accepting stores of PTR_TO_BTF_ID of a certain type, by using
> > > a BTF type tag 'kptr' on the pointed to type of the pointer. Then, this
> > > information is recorded in the object BTF which will be passed into the
> > > kernel by way of map's BTF information. The name and kind from the map
> > > value BTF is used to look up the in-kernel type, and the actual BTF and
> > > BTF ID is recorded in the map struct in a new kptr_off_tab member. For
> > > now, only storing pointers to structs is permitted.
> > >
> > > An example of this specification is shown below:
> > >
> > >         #define __kptr __attribute__((btf_type_tag("kptr")))
> > >
> > >         struct map_value {
> > >                 ...
> > >                 struct task_struct __kptr *task;
> > >                 ...
> > >         };
> > >
> > > Then, in a BPF program, user may store PTR_TO_BTF_ID with the type
> > > task_struct into the map, and then load it later.
> > >
> > > Note that the destination register is marked PTR_TO_BTF_ID_OR_NULL, as
> > > the verifier cannot know whether the value is NULL or not statically, it
> > > must treat all potential loads at that map value offset as loading a
> > > possibly NULL pointer.
> > >
> > > Only BPF_LDX, BPF_STX, and BPF_ST with insn->imm = 0 (to denote NULL)
> > > are allowed instructions that can access such a pointer. On BPF_LDX, the
> > > destination register is updated to be a PTR_TO_BTF_ID, and on BPF_STX,
> > > it is checked whether the source register type is a PTR_TO_BTF_ID with
> > > same BTF type as specified in the map BTF. The access size must always
> > > be BPF_DW.
> > >
> > > For the map in map support, the kptr_off_tab for outer map is copied
> > > from the inner map's kptr_off_tab. It was chosen to do a deep copy
> > > instead of introducing a refcount to kptr_off_tab, because the copy only
> > > needs to be done when parameterizing using inner_map_fd in the map in map
> > > case, hence would be unnecessary for all other users.
> > >
> > > It is not permitted to use MAP_FREEZE command and mmap for BPF map
> > > having kptr, similar to the bpf_timer case.
> > >
> > > Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
> > > ---
> > >  include/linux/bpf.h     |  29 +++++++-
> > >  include/linux/btf.h     |   2 +
> > >  kernel/bpf/btf.c        | 161 ++++++++++++++++++++++++++++++++++------
> > >  kernel/bpf/map_in_map.c |   5 +-
> > >  kernel/bpf/syscall.c    | 112 +++++++++++++++++++++++++++-
> > >  kernel/bpf/verifier.c   | 120 ++++++++++++++++++++++++++++++
> > >  6 files changed, 401 insertions(+), 28 deletions(-)
> > >
> >
> > [...]
> >
> > > +static int btf_find_field_kptr(const struct btf *btf, const struct btf_type *t,
> > > +                              u32 off, int sz, struct btf_field_info *info)
> > > +{
> > > +       /* For PTR, sz is always == 8 */
> > > +       if (!btf_type_is_ptr(t))
> > > +               return BTF_FIELD_IGNORE;
> > > +       t = btf_type_by_id(btf, t->type);
> > > +
> > > +       if (!btf_type_is_type_tag(t))
> > > +               return BTF_FIELD_IGNORE;
> > > +       /* Reject extra tags */
> > > +       if (btf_type_is_type_tag(btf_type_by_id(btf, t->type)))
> > > +               return -EINVAL;
> >
> > Can we have tag -> const -> tag -> volatile -> tag in BTF? Wouldn't
> > you assume there are no more tags with just this check?
> >
>
> All tags are supposed to be before other modifiers, so tags come first, in
> continuity. See [0].

Doesn't seem like the kernel's BTF validator enforces this; we should
probably tighten that up a bit. Clang won't emit such BTF, but nothing
prevents a user from generating non-conformant BTF on their own either.
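
Something along these lines as an extra validation step, perhaps (very rough
and untested; typedefs and the exact placement in btf.c would need more
thought):

	/* type tags must come first in the modifier chain, so a const,
	 * volatile or restrict must never point at a TYPE_TAG
	 */
	if (btf_type_is_modifier(t) && !btf_type_is_type_tag(t)) {
		const struct btf_type *next = btf_type_by_id(btf, t->type);

		if (next && btf_type_is_type_tag(next))
			return -EINVAL;
	}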

>
> Alexei suggested to reject all other tags for now.
>
>  [0]: https://lore.kernel.org/bpf/20220127154627.665163-1-yhs@fb.com
>
> >
> > > +       if (strcmp("kptr", __btf_name_by_offset(btf, t->name_off)))
> > > +               return -EINVAL;
> > > +
> > > +       /* Get the base type */
> > > +       if (btf_type_is_modifier(t))
> > > +               t = btf_type_skip_modifiers(btf, t->type, NULL);
> > > +       /* Only pointer to struct is allowed */
> > > +       if (!__btf_type_is_struct(t))
> > > +               return -EINVAL;
> > > +
> > > +       info->type = t;
> > > +       info->off = off;
> > > +       return BTF_FIELD_FOUND;
> > >  }

[...]

> > > +       if (map_value_has_kptr(map)) {
> > > +               if (!bpf_capable())
> > > +                       return -EPERM;
> > > +               if (map->map_flags & BPF_F_RDONLY_PROG) {
> > > +                       ret = -EACCES;
> > > +                       goto free_map_tab;
> > > +               }
> > > +               if (map->map_type != BPF_MAP_TYPE_HASH &&
> > > +                   map->map_type != BPF_MAP_TYPE_LRU_HASH &&
> > > +                   map->map_type != BPF_MAP_TYPE_ARRAY) {
> >
> > what about PERCPU_ARRAY, for instance? Is there something
> > fundamentally wrong to support it for local storage maps?
> >
>
> Plugging support into maps that already take timers was easier to begin with; I
> can do percpu support as a follow-up.
>
> In the case of local storage, I'm a little worried about how we prevent creating
> reference cycles. There was a thread where find_get_task_by_pid was proposed as an
> unstable helper; once we e.g. support embedding task_struct in a map and allow
> storing such a pointer in task local storage, it would be pretty easy to construct
> a reference cycle.
>
> Should we think about this now, or should we worry about this when task_struct
> is actually supported as a kptr? It's not only task_struct; the same applies to sock.
>
> There's a discussion to be had, hence I left it out for now.

PERCPU_ARRAY seemed (and still seems) like a safe map to support (same
as PERCPU_HASH), which is why I asked. I see concerns about local
storage, though, thanks.

>
> > > +                       ret = -EOPNOTSUPP;
> > > +                       goto free_map_tab;
> > > +               }
> > > +       }
> > > +
> > > +       if (map->ops->map_check_btf) {
> > >                 ret = map->ops->map_check_btf(map, btf, key_type, value_type);
> > > +               if (ret < 0)
> > > +                       goto free_map_tab;
> > > +       }
> > >
> > > +       return ret;
> > > +free_map_tab:
> > > +       bpf_map_free_kptr_off_tab(map);
> > >         return ret;
> > >  }
> > >
> > > @@ -1639,7 +1745,7 @@ static int map_freeze(const union bpf_attr *attr)
> > >                 return PTR_ERR(map);
> > >
> > >         if (map->map_type == BPF_MAP_TYPE_STRUCT_OPS ||
> > > -           map_value_has_timer(map)) {
> > > +           map_value_has_timer(map) || map_value_has_kptr(map)) {
> > >                 fdput(f);
> > >                 return -ENOTSUPP;
> > >         }
> > > diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> > > index 4ce9a528fb63..744b7362e52e 100644
> > > --- a/kernel/bpf/verifier.c
> > > +++ b/kernel/bpf/verifier.c
> > > @@ -3507,6 +3507,94 @@ int check_ptr_off_reg(struct bpf_verifier_env *env,
> > >         return __check_ptr_off_reg(env, reg, regno, false);
> > >  }
> > >
> > > +static int map_kptr_match_type(struct bpf_verifier_env *env,
> > > +                              struct bpf_map_value_off_desc *off_desc,
> > > +                              struct bpf_reg_state *reg, u32 regno)
> > > +{
> > > +       const char *targ_name = kernel_type_name(off_desc->btf, off_desc->btf_id);
> > > +       const char *reg_name = "";
> > > +
> > > +       if (reg->type != PTR_TO_BTF_ID && reg->type != PTR_TO_BTF_ID_OR_NULL)
> >
> > base_type(reg->type) != PTR_TO_BTF_ID ?
> >
> > > +               goto bad_type;
> > > +
> > > +       if (!btf_is_kernel(reg->btf)) {
> > > +               verbose(env, "R%d must point to kernel BTF\n", regno);
> > > +               return -EINVAL;
> > > +       }
> > > +       /* We need to verify reg->type and reg->btf, before accessing reg->btf */
> > > +       reg_name = kernel_type_name(reg->btf, reg->btf_id);
> > > +
> > > +       if (__check_ptr_off_reg(env, reg, regno, true))
> > > +               return -EACCES;
> > > +
> > > +       if (!btf_struct_ids_match(&env->log, reg->btf, reg->btf_id, reg->off,
> > > +                                 off_desc->btf, off_desc->btf_id))
> > > +               goto bad_type;
> > > +       return 0;
> > > +bad_type:
> > > +       verbose(env, "invalid kptr access, R%d type=%s%s ", regno,
> > > +               reg_type_str(env, reg->type), reg_name);
> > > +       verbose(env, "expected=%s%s\n", reg_type_str(env, PTR_TO_BTF_ID), targ_name);
> >
> > why two separate verbose calls, you can easily combine them (and they
> > should be output on a single line given it's a single error)
> >
>
> reg_type_str cannot be called more than once in the same statement, since it
> reuses the same buffer.
>

ah, subtle, ok, never mind then, no big deal to have two verbose()
calls if there is a reason for it

> > > +       return -EINVAL;
> > > +}
> > > +
> >
> > [...]
>
> --
> Kartikeya

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH bpf-next v3 03/13] bpf: Allow storing unreferenced kptr in map
  2022-03-22  7:04     ` Kumar Kartikeya Dwivedi
@ 2022-03-22 20:22       ` Andrii Nakryiko
  2022-03-25 14:51         ` Kumar Kartikeya Dwivedi
  0 siblings, 1 reply; 44+ messages in thread
From: Andrii Nakryiko @ 2022-03-22 20:22 UTC (permalink / raw)
  To: Kumar Kartikeya Dwivedi
  Cc: Joanne Koong, bpf, Alexei Starovoitov, Andrii Nakryiko,
	Daniel Borkmann, Toke Høiland-Jørgensen,
	Jesper Dangaard Brouer

On Tue, Mar 22, 2022 at 12:05 AM Kumar Kartikeya Dwivedi
<memxor@gmail.com> wrote:
>
> On Tue, Mar 22, 2022 at 05:09:30AM IST, Joanne Koong wrote:
> > On Sun, Mar 20, 2022 at 5:27 PM Kumar Kartikeya Dwivedi
> > <memxor@gmail.com> wrote:
> > >
> > > This commit introduces a new pointer type 'kptr' which can be embedded
> > > in a map value and holds a PTR_TO_BTF_ID stored by a BPF program during
> > > its invocation. When storing to such a kptr, the BPF program's PTR_TO_BTF_ID
> > > register must have the same type as in the map value's BTF, and loading
> > > a kptr marks the destination register as PTR_TO_BTF_ID with the correct
> > > kernel BTF and BTF ID.
> > >
> > > Such kptr are unreferenced, i.e. by the time another invocation of the
> > > BPF program loads this pointer, the object which the pointer points to
> > > may no longer exist. Since PTR_TO_BTF_ID loads (using BPF_LDX) are
> > > patched to PROBE_MEM loads by the verifier, it would be safe to allow the user
> > > to still access such an invalid pointer, but passing such pointers into
> > > BPF helpers and kfuncs should not be permitted. A future patch in this
> > > series will close this gap.
> > >
> > > The flexibility offered by allowing programs to dereference such invalid
> > > pointers while being safe at runtime frees the verifier from doing
> > > complex lifetime tracking. As long as the user may ensure that the
> > > object remains valid, it can ensure data read by it from the kernel
> > > object is valid.
> > >
> > > The user indicates that a certain pointer must be treated as kptr
> > > capable of accepting stores of PTR_TO_BTF_ID of a certain type, by using
> > > a BTF type tag 'kptr' on the pointed to type of the pointer. Then, this
> > > information is recorded in the object BTF which will be passed into the
> > > kernel by way of map's BTF information. The name and kind from the map
> > > value BTF is used to look up the in-kernel type, and the actual BTF and
> > > BTF ID is recorded in the map struct in a new kptr_off_tab member. For
> > > now, only storing pointers to structs is permitted.
> > >
> > > An example of this specification is shown below:
> > >
> > >         #define __kptr __attribute__((btf_type_tag("kptr")))
> > >
> > >         struct map_value {
> > >                 ...
> > >                 struct task_struct __kptr *task;
> > >                 ...
> > >         };
> > >
> > > Then, in a BPF program, user may store PTR_TO_BTF_ID with the type
> > > task_struct into the map, and then load it later.
> > >
> > > Note that the destination register is marked PTR_TO_BTF_ID_OR_NULL, as
> > > the verifier cannot know whether the value is NULL or not statically, it
> > > must treat all potential loads at that map value offset as loading a
> > > possibly NULL pointer.
> > >
> > > Only BPF_LDX, BPF_STX, and BPF_ST with insn->imm = 0 (to denote NULL)
> > > are allowed instructions that can access such a pointer. On BPF_LDX, the
> > > destination register is updated to be a PTR_TO_BTF_ID, and on BPF_STX,
> > > it is checked whether the source register type is a PTR_TO_BTF_ID with
> > > same BTF type as specified in the map BTF. The access size must always
> > > be BPF_DW.
> > >
> > > For the map in map support, the kptr_off_tab for outer map is copied
> > > from the inner map's kptr_off_tab. It was chosen to do a deep copy
> > > instead of introducing a refcount to kptr_off_tab, because the copy only
> > > needs to be done when parameterizing using inner_map_fd in the map in map
> > > case, hence would be unnecessary for all other users.
> > >
> > > It is not permitted to use MAP_FREEZE command and mmap for BPF map
> > > having kptr, similar to the bpf_timer case.
> > >
> > > Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
> > > ---
> > >  include/linux/bpf.h     |  29 +++++++-
> > >  include/linux/btf.h     |   2 +
> > >  kernel/bpf/btf.c        | 161 ++++++++++++++++++++++++++++++++++------
> > >  kernel/bpf/map_in_map.c |   5 +-
> > >  kernel/bpf/syscall.c    | 112 +++++++++++++++++++++++++++-
> > >  kernel/bpf/verifier.c   | 120 ++++++++++++++++++++++++++++++
> > >  6 files changed, 401 insertions(+), 28 deletions(-)
> > >
> > [...]
> > > +
> > >  struct bpf_map *bpf_map_get(u32 ufd);
> > >  struct bpf_map *bpf_map_get_with_uref(u32 ufd);
> > >  struct bpf_map *__bpf_map_get(struct fd f);
> > > diff --git a/include/linux/btf.h b/include/linux/btf.h
> > > index 36bc09b8e890..5b578dc81c04 100644
> > > --- a/include/linux/btf.h
> > > +++ b/include/linux/btf.h
> > > @@ -123,6 +123,8 @@ bool btf_member_is_reg_int(const struct btf *btf, const struct btf_type *s,
> > >                            u32 expected_offset, u32 expected_size);
> > >  int btf_find_spin_lock(const struct btf *btf, const struct btf_type *t);
> > >  int btf_find_timer(const struct btf *btf, const struct btf_type *t);
> > > +struct bpf_map_value_off *btf_find_kptr(const struct btf *btf,
> > > +                                       const struct btf_type *t);
> >
> > nit: given that "btf_find_kptr" allocates memory as well, maybe the
> > name "btf_parse_kptr" would be more reflective?
> >
>
> Good point, will change.
>
> > >  bool btf_type_is_void(const struct btf_type *t);
> > >  s32 btf_find_by_name_kind(const struct btf *btf, const char *name, u8 kind);
> > >  const struct btf_type *btf_type_skip_modifiers(const struct btf *btf,
> > > diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c
> > > index 9e17af936a7a..92afbec0a887 100644
> > > --- a/kernel/bpf/btf.c
> > > +++ b/kernel/bpf/btf.c
> > > @@ -3164,9 +3164,16 @@ static void btf_struct_log(struct btf_verifier_env *env,
> > >  enum {
> > >         BTF_FIELD_SPIN_LOCK,
> > >         BTF_FIELD_TIMER,
> > > +       BTF_FIELD_KPTR,
> > > +};
> > > +
> > > +enum {
> > > +       BTF_FIELD_IGNORE = 0,
> > > +       BTF_FIELD_FOUND  = 1,
> > >  };
> > >
> > >  struct btf_field_info {
> > > +       const struct btf_type *type;
> > >         u32 off;
> > >  };
> > >
> > > @@ -3174,23 +3181,48 @@ static int btf_find_field_struct(const struct btf *btf, const struct btf_type *t
> > >                                  u32 off, int sz, struct btf_field_info *info)
> > >  {
> > >         if (!__btf_type_is_struct(t))
> > > -               return 0;
> > > +               return BTF_FIELD_IGNORE;
> > >         if (t->size != sz)
> > > -               return 0;
> > > -       if (info->off != -ENOENT)
> > > -               /* only one such field is allowed */
> > > -               return -E2BIG;
> > > +               return BTF_FIELD_IGNORE;
> > >         info->off = off;
> > > -       return 0;
> > > +       return BTF_FIELD_FOUND;
> > > +}
> > > +
> > > +static int btf_find_field_kptr(const struct btf *btf, const struct btf_type *t,
> > > +                              u32 off, int sz, struct btf_field_info *info)
> > > +{
> > > +       /* For PTR, sz is always == 8 */
> > > +       if (!btf_type_is_ptr(t))
> > > +               return BTF_FIELD_IGNORE;
> > > +       t = btf_type_by_id(btf, t->type);
> > > +
> > > +       if (!btf_type_is_type_tag(t))
> > > +               return BTF_FIELD_IGNORE;
> > > +       /* Reject extra tags */
> > > +       if (btf_type_is_type_tag(btf_type_by_id(btf, t->type)))
> > > +               return -EINVAL;
> > > +       if (strcmp("kptr", __btf_name_by_offset(btf, t->name_off)))
> > > +               return -EINVAL;
> > > +
> > > +       /* Get the base type */
> > > +       if (btf_type_is_modifier(t))
> > > +               t = btf_type_skip_modifiers(btf, t->type, NULL);
> > > +       /* Only pointer to struct is allowed */
> > > +       if (!__btf_type_is_struct(t))
> > > +               return -EINVAL;
> > > +
> > > +       info->type = t;
> > > +       info->off = off;
> > > +       return BTF_FIELD_FOUND;
> > >  }
> > >
> > >  static int btf_find_struct_field(const struct btf *btf, const struct btf_type *t,
> > >                                  const char *name, int sz, int align, int field_type,
> > > -                                struct btf_field_info *info)
> > > +                                struct btf_field_info *info, int info_cnt)
> >
> > From my understanding, this patch now modifies btf_find_struct_field
> > and btf_find_datasec_var such that the "info" that is passed in has to
> > be an array of size max possible + 1 while "info_cnt" is the max
> > possible count, or we risk writing beyond the "info" array passed in.
> > It seems like we could just modify the
> > btf_find_struct_field/btf_find_datasec_var logic so that the user can
> > just pass in info array of max possible size instead of max possible
> > size + 1 - or is your concern that this would require more idx >=
> > info_cnt checks inside the functions? Maybe we should include a
> > comment here and in btf_find_datasec_var to document that "info"
> > should always be max possible size + 1?
> >
>
> So for some context on why this was changed, follow [0].
>
> I agree it's pretty ugly. My first thought was to check it inside the functions,
> but that is also not very great. So I went with this; one more suggestion from
> Alexei was to split it into a find and then a fill of info, because the error on
> idx >= info_cnt should only happen after we find. Right now the find and fill
> happen together, so to error out, you need an extra element it can fill before
> you bail at ARRAY_SIZE - 1 (which is the actual max).
>
> TBH the find + fill split looks best to me, but open to more suggestions.

I think there is much simpler way that doesn't require unnecessary
copying or splitting anything:

struct btf_field_info tmp;

...

ret = btf_find_field_struct(btf, member_type, off, sz,
                            idx < info_cnt ? &info[idx] : &tmp);

...

That's it.

>
> [0]: https://lore.kernel.org/bpf/20220319181538.nbqdkprjrzkxk7v4@ast-mbp.dhcp.thefacebook.com
>

[...]

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH bpf-next v3 07/13] bpf: Adapt copy_map_value for multiple offset case
  2022-03-20 15:55 ` [PATCH bpf-next v3 07/13] bpf: Adapt copy_map_value for multiple offset case Kumar Kartikeya Dwivedi
@ 2022-03-22 20:38   ` Andrii Nakryiko
  2022-03-25 15:06     ` Kumar Kartikeya Dwivedi
  0 siblings, 1 reply; 44+ messages in thread
From: Andrii Nakryiko @ 2022-03-22 20:38 UTC (permalink / raw)
  To: Kumar Kartikeya Dwivedi
  Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Toke Høiland-Jørgensen, Jesper Dangaard Brouer

On Sun, Mar 20, 2022 at 8:55 AM Kumar Kartikeya Dwivedi
<memxor@gmail.com> wrote:
>
> Since now there might be at most 10 offsets that need handling in
> copy_map_value, the manual shuffling and special case is no longer going
> to work. Hence, let's generalise the copy_map_value function by using
> a sorted array of offsets to skip regions that must be avoided while
> copying into and out of a map value.
>
> When the map is created, we populate the offset array in struct map,
> with one extra element for map->value_size, which is used as the final
> offset to subtract previous offset from. Then, copy_map_value uses this
> sorted offset array is used to memcpy while skipping timer, spin lock,
> and kptr.
>
> Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
> ---
>  include/linux/bpf.h  | 55 +++++++++++++++++++++++---------------------
>  kernel/bpf/syscall.c | 52 +++++++++++++++++++++++++++++++++++++++++
>  2 files changed, 81 insertions(+), 26 deletions(-)
>
> diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> index 9d424d567dd3..6474d2d44b78 100644
> --- a/include/linux/bpf.h
> +++ b/include/linux/bpf.h
> @@ -158,6 +158,10 @@ struct bpf_map_ops {
>  enum {
>         /* Support at most 8 pointers in a BPF map value */
>         BPF_MAP_VALUE_OFF_MAX = 8,
> +       BPF_MAP_OFF_ARR_MAX   = BPF_MAP_VALUE_OFF_MAX +
> +                               1 + /* for bpf_spin_lock */
> +                               1 + /* for bpf_timer */
> +                               1,  /* for map->value_size sentinel */
>  };
>
>  enum {
> @@ -206,9 +210,17 @@ struct bpf_map {
>         char name[BPF_OBJ_NAME_LEN];
>         bool bypass_spec_v1;
>         bool frozen; /* write-once; write-protected by freeze_mutex */
> -       /* 6 bytes hole */
> -
> -       /* The 3rd and 4th cacheline with misc members to avoid false sharing
> +       /* 2 bytes hole */
> +       struct {
> +               struct {
> +                       u32 off;
> +                       u8 sz;

So here we are wasting 11 * 3 == 33 bytes of padding, right? And it
will only increase as we add bpf_dynptr support soon.

But if we split this struct into two arrays you won't be wasting any of that:

struct {
    u32 cnt;
    u32 field_offs[BPF_MAP_OFF_ARR_MAX];
    u8 szs[BPF_MAP_OFF_ARR_MAX];
} off_arr;

?

Further, given the majority of BPF maps in the system probably won't
use any of these special fields, would it make sense to dynamically
allocate this portion of struct bpf_map?

> +               } field[BPF_MAP_OFF_ARR_MAX];
> +               u32 cnt;
> +       } off_arr;
> +       /* 40 bytes hole */
> +
> +       /* The 4th and 5th cacheline with misc members to avoid false sharing
>          * particularly with refcounting.
>          */
>         atomic64_t refcnt ____cacheline_aligned;
> @@ -250,36 +262,27 @@ static inline void check_and_init_map_value(struct bpf_map *map, void *dst)
>                 memset(dst + map->spin_lock_off, 0, sizeof(struct bpf_spin_lock));
>         if (unlikely(map_value_has_timer(map)))
>                 memset(dst + map->timer_off, 0, sizeof(struct bpf_timer));
> +       if (unlikely(map_value_has_kptr(map))) {
> +               struct bpf_map_value_off *tab = map->kptr_off_tab;
> +               int i;
> +
> +               for (i = 0; i < tab->nr_off; i++)
> +                       *(u64 *)(dst + tab->off[i].offset) = 0;
> +       }
>  }
>
>  /* copy everything but bpf_spin_lock and bpf_timer. There could be one of each. */
>  static inline void copy_map_value(struct bpf_map *map, void *dst, void *src)
>  {
> -       u32 s_off = 0, s_sz = 0, t_off = 0, t_sz = 0;
> +       int i;
>
> -       if (unlikely(map_value_has_spin_lock(map))) {
> -               s_off = map->spin_lock_off;
> -               s_sz = sizeof(struct bpf_spin_lock);
> -       }
> -       if (unlikely(map_value_has_timer(map))) {
> -               t_off = map->timer_off;
> -               t_sz = sizeof(struct bpf_timer);
> -       }
> +       memcpy(dst, src, map->off_arr.field[0].off);
> +       for (i = 1; i < map->off_arr.cnt; i++) {
> +               u32 curr_off = map->off_arr.field[i - 1].off;
> +               u32 next_off = map->off_arr.field[i].off;
>
> -       if (unlikely(s_sz || t_sz)) {
> -               if (s_off < t_off || !s_sz) {
> -                       swap(s_off, t_off);
> -                       swap(s_sz, t_sz);
> -               }
> -               memcpy(dst, src, t_off);
> -               memcpy(dst + t_off + t_sz,
> -                      src + t_off + t_sz,
> -                      s_off - t_off - t_sz);
> -               memcpy(dst + s_off + s_sz,
> -                      src + s_off + s_sz,
> -                      map->value_size - s_off - s_sz);
> -       } else {
> -               memcpy(dst, src, map->value_size);
> +               curr_off += map->off_arr.field[i - 1].sz;
> +               memcpy(dst + curr_off, src + curr_off, next_off - curr_off);
>         }

We can also get away with value_size sentinel value if we rewrite this
logic as follows:

u32 cur_off = 0;
int i;

for (i = 0; i < map->off_arr.cnt; i++) {
    memcpy(dst + cur_off, src + cur_off, map->off_arr.field[i].off - cur_off);
    cur_off = map->off_arr.field[i].off + map->off_arr.field[i].sz;
}

memcpy(dst + cur_off, src + cur_off, map->value_size - cur_off);


It will be as optimal but won't require value_size sentinel.

>  }
>  void copy_map_value_locked(struct bpf_map *map, void *dst, void *src,
> diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
> index 5990d6fa97ab..7b32537bd81f 100644
> --- a/kernel/bpf/syscall.c
> +++ b/kernel/bpf/syscall.c
> @@ -30,6 +30,7 @@
>  #include <linux/pgtable.h>
>  #include <linux/bpf_lsm.h>
>  #include <linux/poll.h>
> +#include <linux/sort.h>
>  #include <linux/bpf-netns.h>
>  #include <linux/rcupdate_trace.h>
>  #include <linux/memcontrol.h>
> @@ -851,6 +852,55 @@ int map_check_no_btf(const struct bpf_map *map,
>         return -ENOTSUPP;
>  }
>
> +static int map_off_arr_cmp(const void *_a, const void *_b)
> +{
> +       const u32 a = *(const u32 *)_a;
> +       const u32 b = *(const u32 *)_b;
> +
> +       if (a < b)
> +               return -1;
> +       else if (a > b)
> +               return 1;
> +       return 0;
> +}
> +
> +static void map_populate_off_arr(struct bpf_map *map)
> +{
> +       u32 i;
> +
> +       map->off_arr.cnt = 0;
> +       if (map_value_has_spin_lock(map)) {
> +               i = map->off_arr.cnt;
> +
> +               map->off_arr.field[i].off = map->spin_lock_off;
> +               map->off_arr.field[i].sz = sizeof(struct bpf_spin_lock);
> +               map->off_arr.cnt++;
> +       }
> +       if (map_value_has_timer(map)) {
> +               i = map->off_arr.cnt;
> +
> +               map->off_arr.field[i].off = map->timer_off;
> +               map->off_arr.field[i].sz = sizeof(struct bpf_timer);
> +               map->off_arr.cnt++;
> +       }
> +       if (map_value_has_kptr(map)) {
> +               struct bpf_map_value_off *tab = map->kptr_off_tab;
> +               u32 j = map->off_arr.cnt;
> +
> +               for (i = 0; i < tab->nr_off; i++) {
> +                       map->off_arr.field[j + i].off = tab->off[i].offset;
> +                       map->off_arr.field[j + i].sz = sizeof(u64);
> +               }
> +               map->off_arr.cnt += tab->nr_off;
> +       }
> +
> +       map->off_arr.field[map->off_arr.cnt++].off = map->value_size;

Using a pointer for map->off_arr.field[j + i] and incrementing it
along the cnt would make this code more succinct, and possibly even a
bit more efficient. With my above suggestion to split offs from szs,
you'll need two pointers, but still might be cleaner.
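
e.g. for the kptr part, something like (sketch, assuming the field_offs/szs
split suggested above):

	if (map_value_has_kptr(map)) {
		struct bpf_map_value_off *tab = map->kptr_off_tab;
		u32 *off = &map->off_arr.field_offs[map->off_arr.cnt];
		u8 *sz = &map->off_arr.szs[map->off_arr.cnt];

		for (i = 0; i < tab->nr_off; i++) {
			*off++ = tab->off[i].offset;
			*sz++ = sizeof(u64);
		}
		map->off_arr.cnt += tab->nr_off;
	}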

> +       if (map->off_arr.cnt == 1)
> +               return;
> +       sort(map->off_arr.field, map->off_arr.cnt, sizeof(map->off_arr.field[0]),
> +            map_off_arr_cmp, NULL);

See how Jiri is using sort_r() to sort two related arrays and keep
them in sync w.r.t. order.
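
i.e. roughly (untested, again assuming the field_offs/szs split):

	static int map_off_arr_cmp(const void *_a, const void *_b, const void *priv)
	{
		const u32 a = *(const u32 *)_a;
		const u32 b = *(const u32 *)_b;

		if (a < b)
			return -1;
		else if (a > b)
			return 1;
		return 0;
	}

	static void map_off_arr_swap(void *_a, void *_b, int size, const void *priv)
	{
		struct bpf_map *map = (struct bpf_map *)priv;
		int a = (u32 *)_a - map->off_arr.field_offs;
		int b = (u32 *)_b - map->off_arr.field_offs;

		/* keep the sizes in sync with the offsets being sorted */
		swap(map->off_arr.field_offs[a], map->off_arr.field_offs[b]);
		swap(map->off_arr.szs[a], map->off_arr.szs[b]);
	}

	...
	sort_r(map->off_arr.field_offs, map->off_arr.cnt,
	       sizeof(map->off_arr.field_offs[0]),
	       map_off_arr_cmp, map_off_arr_swap, map);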

> +}
> +
>  static int map_check_btf(struct bpf_map *map, const struct btf *btf,
>                          u32 btf_key_id, u32 btf_value_id)
>  {
> @@ -1018,6 +1068,8 @@ static int map_create(union bpf_attr *attr)
>                         attr->btf_vmlinux_value_type_id;
>         }
>
> +       map_populate_off_arr(map);
> +
>         err = security_bpf_map_alloc(map);
>         if (err)
>                 goto free_map;
> --
> 2.35.1
>

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH bpf-next v3 09/13] bpf: Wire up freeing of referenced kptr
  2022-03-20 15:55 ` [PATCH bpf-next v3 09/13] bpf: Wire up freeing of referenced kptr Kumar Kartikeya Dwivedi
@ 2022-03-22 20:51   ` Andrii Nakryiko
  2022-03-25 14:50     ` Kumar Kartikeya Dwivedi
  2022-03-22 21:10   ` Alexei Starovoitov
  1 sibling, 1 reply; 44+ messages in thread
From: Andrii Nakryiko @ 2022-03-22 20:51 UTC (permalink / raw)
  To: Kumar Kartikeya Dwivedi
  Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Toke Høiland-Jørgensen, Jesper Dangaard Brouer

On Sun, Mar 20, 2022 at 8:55 AM Kumar Kartikeya Dwivedi
<memxor@gmail.com> wrote:
>
> A destructor kfunc can be defined as void func(type *), where type may
> be void or any other pointer type as per convenience.
>
> In this patch, we ensure that the type is sane and capture the function
> pointer into off_desc of ptr_off_tab for the specific pointer offset,
> with the invariant that the dtor pointer is always set when 'kptr_ref'
> tag is applied to the pointer's pointee type, which is indicated by the
> flag BPF_MAP_VALUE_OFF_F_REF.
>
> Note that only BTF IDs whose destructor kfunc is registered thus become
> the allowed BTF IDs for embedding as a referenced kptr. Hence it serves
> the purpose of finding the dtor kfunc BTF ID, as well as acting as a check
> against the whitelist of allowed BTF IDs for this purpose.
>
> Finally, wire up the actual freeing of the referenced pointer, if any, at
> all available offsets, so that no references are leaked after the BPF
> map goes away and the BPF program had previously moved the ownership of a
> referenced pointer into it.
>
> The behavior is similar to BPF timers, where bpf_map_{update,delete}_elem
> will free any existing referenced kptr. The same case is with LRU map's
> bpf_lru_push_free/htab_lru_push_free functions, which are extended to
> reset unreferenced and free referenced kptr.
>
> Note that unlike BPF timers, kptr is not reset or freed when map uref
> drops to zero.
>
> Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
> ---
>  include/linux/bpf.h   |  4 ++
>  include/linux/btf.h   |  2 +
>  kernel/bpf/arraymap.c | 14 ++++++-
>  kernel/bpf/btf.c      | 86 ++++++++++++++++++++++++++++++++++++++++++-
>  kernel/bpf/hashtab.c  | 29 ++++++++++-----
>  kernel/bpf/syscall.c  | 57 +++++++++++++++++++++++++---
>  6 files changed, 173 insertions(+), 19 deletions(-)
>

[...]

> +                       /* This call also serves as a whitelist of allowed objects that
> +                        * can be used as a referenced pointer and be stored in a map at
> +                        * the same time.
> +                        */
> +                       dtor_btf_id = btf_find_dtor_kfunc(off_btf, id);
> +                       if (dtor_btf_id < 0) {
> +                               ret = dtor_btf_id;
> +                               btf_put(off_btf);

do btf_put() in end section instead of copy/pasting it in every single
branch here and below?
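
i.e. make the error branches just set ret and jump, e.g. (rough sketch; the
module ref taken later would need the same kind of treatment):

	dtor_btf_id = btf_find_dtor_kfunc(off_btf, id);
	if (dtor_btf_id < 0) {
		ret = dtor_btf_id;
		goto end_btf;
	}
	...
end_btf:
	btf_put(off_btf);
end:
	while (tab->nr_off--)
		btf_put(tab->off[tab->nr_off].btf);
	kfree(tab);
	return ERR_PTR(ret);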

> +                               goto end;
> +                       }
> +
> +                       dtor_func = btf_type_by_id(off_btf, dtor_btf_id);
> +                       if (!dtor_func || !btf_type_is_func(dtor_func)) {
> +                               ret = -EINVAL;
> +                               btf_put(off_btf);
> +                               goto end;
> +                       }
> +

[...]

> -       while (tab->nr_off--)
> +       while (tab->nr_off--) {
>                 btf_put(tab->off[tab->nr_off].btf);
> +               if (tab->off[tab->nr_off].module)
> +                       module_put(tab->off[tab->nr_off].module);
> +       }
>         kfree(tab);
>         return ERR_PTR(ret);
>  }
> diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c
> index 65877967f414..fa4a0a8754c5 100644
> --- a/kernel/bpf/hashtab.c
> +++ b/kernel/bpf/hashtab.c
> @@ -725,12 +725,16 @@ static int htab_lru_map_gen_lookup(struct bpf_map *map,
>         return insn - insn_buf;
>  }
>
> -static void check_and_free_timer(struct bpf_htab *htab, struct htab_elem *elem)
> +static void check_and_free_timer_and_kptr(struct bpf_htab *htab,

we'll need to rename this to
check_and_free_timer_and_kptrs_and_dynptrs() pretty soon, so let's
figure out a more generic name now? :)

Don't know, something like "release_fields" or something?

> +                                         struct htab_elem *elem,
> +                                         bool free_kptr)
>  {
> +       void *map_value = elem->key + round_up(htab->map.key_size, 8);
> +
>         if (unlikely(map_value_has_timer(&htab->map)))
> -               bpf_timer_cancel_and_free(elem->key +
> -                                         round_up(htab->map.key_size, 8) +
> -                                         htab->map.timer_off);
> +               bpf_timer_cancel_and_free(map_value + htab->map.timer_off);
> +       if (unlikely(map_value_has_kptr(&htab->map)) && free_kptr)
> +               bpf_map_free_kptr(&htab->map, map_value);

kptrs (please use plural consistently for functions that actually
handle multiple kptrs).

>  }
>
>  /* It is called from the bpf_lru_list when the LRU needs to delete

[...]

>  static void htab_lru_push_free(struct bpf_htab *htab, struct htab_elem *elem)
>  {
> -       check_and_free_timer(htab, elem);
> +       check_and_free_timer_and_kptr(htab, elem, true);
>         bpf_lru_push_free(&htab->lru, &elem->lru_node);
>  }
>
> @@ -1420,7 +1424,10 @@ static void htab_free_malloced_timers(struct bpf_htab *htab)
>                 struct htab_elem *l;
>
>                 hlist_nulls_for_each_entry(l, n, head, hash_node)
> -                       check_and_free_timer(htab, l);
> +                       /* We don't reset or free kptr on uref dropping to zero,
> +                        * hence set free_kptr to false.
> +                        */
> +                       check_and_free_timer_and_kptr(htab, l, false);

ok, now reading this, I wonder if it's better to keep the timer and kptr
cleanups separate? And then dynptrs separate still, instead of adding
all these flags?

>                 cond_resched_rcu();
>         }
>         rcu_read_unlock();
> @@ -1430,6 +1437,7 @@ static void htab_map_free_timers(struct bpf_map *map)
>  {
>         struct bpf_htab *htab = container_of(map, struct bpf_htab, map);
>
> +       /* We don't reset or free kptr on uref dropping to zero. */
>         if (likely(!map_value_has_timer(&htab->map)))
>                 return;
>         if (!htab_is_prealloc(htab))

[...]

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH bpf-next v3 05/13] bpf: Allow storing referenced kptr in map
  2022-03-20 15:55 ` [PATCH bpf-next v3 05/13] bpf: Allow storing referenced kptr in map Kumar Kartikeya Dwivedi
@ 2022-03-22 20:59   ` Martin KaFai Lau
  2022-03-25 14:57     ` Kumar Kartikeya Dwivedi
  0 siblings, 1 reply; 44+ messages in thread
From: Martin KaFai Lau @ 2022-03-22 20:59 UTC (permalink / raw)
  To: Kumar Kartikeya Dwivedi
  Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Toke Høiland-Jørgensen, Jesper Dangaard Brouer

On Sun, Mar 20, 2022 at 09:25:02PM +0530, Kumar Kartikeya Dwivedi wrote:
>  static int map_kptr_match_type(struct bpf_verifier_env *env,
>  			       struct bpf_map_value_off_desc *off_desc,
> -			       struct bpf_reg_state *reg, u32 regno)
> +			       struct bpf_reg_state *reg, u32 regno,
> +			       bool ref_ptr)
>  {
>  	const char *targ_name = kernel_type_name(off_desc->btf, off_desc->btf_id);
>  	const char *reg_name = "";
> +	bool fixed_off_ok = true;
>  
>  	if (reg->type != PTR_TO_BTF_ID && reg->type != PTR_TO_BTF_ID_OR_NULL)
>  		goto bad_type;
> @@ -3525,7 +3530,26 @@ static int map_kptr_match_type(struct bpf_verifier_env *env,
>  	/* We need to verify reg->type and reg->btf, before accessing reg->btf */
>  	reg_name = kernel_type_name(reg->btf, reg->btf_id);
>  
> -	if (__check_ptr_off_reg(env, reg, regno, true))
> +	if (ref_ptr) {
> +		if (!reg->ref_obj_id) {
> +			verbose(env, "R%d must be referenced %s%s\n", regno,
> +				reg_type_str(env, PTR_TO_BTF_ID), targ_name);
> +			return -EACCES;
> +		}
Aren't the is_release_function() checks under check_helper_call()
the same?

> +		/* reg->off can be used to store pointer to a certain type formed by
> +		 * incrementing pointer of a parent structure the object is embedded in,
> +		 * e.g. map may expect unreferenced struct path *, and user should be
> +		 * allowed a store using &file->f_path. However, in the case of
> +		 * referenced pointer, we cannot do this, because the reference is only
> +		 * for the parent structure, not its embedded object(s), and because
> +		 * the transfer of ownership happens for the original pointer to and
> +		 * from the map (before its eventual release).
> +		 */
> +		if (reg->off)
> +			fixed_off_ok = false;
I thought the new check_func_arg_reg_off() is supposed to handle the
is_release_function() case.  The check_func_arg_reg_off() called
in check_func_arg() can not handle this case?

> +	}
> +	/* var_off is rejected by __check_ptr_off_reg for PTR_TO_BTF_ID */
> +	if (__check_ptr_off_reg(env, reg, regno, fixed_off_ok))
>  		return -EACCES;
>  
>  	if (!btf_struct_ids_match(&env->log, reg->btf, reg->btf_id, reg->off,

[ ... ]

> @@ -5390,6 +5473,7 @@ static const struct bpf_reg_types func_ptr_types = { .types = { PTR_TO_FUNC } };
>  static const struct bpf_reg_types stack_ptr_types = { .types = { PTR_TO_STACK } };
>  static const struct bpf_reg_types const_str_ptr_types = { .types = { PTR_TO_MAP_VALUE } };
>  static const struct bpf_reg_types timer_types = { .types = { PTR_TO_MAP_VALUE } };
> +static const struct bpf_reg_types kptr_types = { .types = { PTR_TO_MAP_VALUE } };
>  
>  static const struct bpf_reg_types *compatible_reg_types[__BPF_ARG_TYPE_MAX] = {
>  	[ARG_PTR_TO_MAP_KEY]		= &map_key_value_types,
> @@ -5417,11 +5501,13 @@ static const struct bpf_reg_types *compatible_reg_types[__BPF_ARG_TYPE_MAX] = {
>  	[ARG_PTR_TO_STACK]		= &stack_ptr_types,
>  	[ARG_PTR_TO_CONST_STR]		= &const_str_ptr_types,
>  	[ARG_PTR_TO_TIMER]		= &timer_types,
> +	[ARG_PTR_TO_KPTR]		= &kptr_types,
>  };
>  
>  static int check_reg_type(struct bpf_verifier_env *env, u32 regno,
>  			  enum bpf_arg_type arg_type,
> -			  const u32 *arg_btf_id)
> +			  const u32 *arg_btf_id,
> +			  struct bpf_call_arg_meta *meta)
>  {
>  	struct bpf_reg_state *regs = cur_regs(env), *reg = &regs[regno];
>  	enum bpf_reg_type expected, type = reg->type;
> @@ -5474,8 +5560,11 @@ static int check_reg_type(struct bpf_verifier_env *env, u32 regno,
>  			arg_btf_id = compatible->btf_id;
>  		}
>  
> -		if (!btf_struct_ids_match(&env->log, reg->btf, reg->btf_id, reg->off,
> -					  btf_vmlinux, *arg_btf_id)) {
> +		if (meta->func_id == BPF_FUNC_kptr_xchg) {
> +			if (map_kptr_match_type(env, meta->kptr_off_desc, reg, regno, true))
> +				return -EACCES;
> +		} else if (!btf_struct_ids_match(&env->log, reg->btf, reg->btf_id, reg->off,
> +						 btf_vmlinux, *arg_btf_id)) {
>  			verbose(env, "R%d is of type %s but %s is expected\n",
>  				regno, kernel_type_name(reg->btf, reg->btf_id),
>  				kernel_type_name(btf_vmlinux, *arg_btf_id));

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH bpf-next v3 12/13] selftests/bpf: Add C tests for kptr
  2022-03-20 15:55 ` [PATCH bpf-next v3 12/13] selftests/bpf: Add C tests for kptr Kumar Kartikeya Dwivedi
@ 2022-03-22 21:00   ` Andrii Nakryiko
  2022-03-25 14:52     ` Kumar Kartikeya Dwivedi
  2022-03-24  9:10   ` Jiri Olsa
  1 sibling, 1 reply; 44+ messages in thread
From: Andrii Nakryiko @ 2022-03-22 21:00 UTC (permalink / raw)
  To: Kumar Kartikeya Dwivedi
  Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Toke Høiland-Jørgensen, Jesper Dangaard Brouer

On Sun, Mar 20, 2022 at 8:56 AM Kumar Kartikeya Dwivedi
<memxor@gmail.com> wrote:
>
> This uses the __kptr and __kptr_ref macros as well, and tries to test
> the stuff that is supposed to work, since we have negative tests in the
> test_verifier suite. Also include some code to test map-in-map support,
> such that the inner_map_meta matches the kptr_off_tab of the map added as
> an element.
>
> Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
> ---
>  .../selftests/bpf/prog_tests/map_kptr.c       |  20 ++
>  tools/testing/selftests/bpf/progs/map_kptr.c  | 194 ++++++++++++++++++
>  2 files changed, 214 insertions(+)
>  create mode 100644 tools/testing/selftests/bpf/prog_tests/map_kptr.c
>  create mode 100644 tools/testing/selftests/bpf/progs/map_kptr.c
>
> diff --git a/tools/testing/selftests/bpf/prog_tests/map_kptr.c b/tools/testing/selftests/bpf/prog_tests/map_kptr.c
> new file mode 100644
> index 000000000000..688732295ce9
> --- /dev/null
> +++ b/tools/testing/selftests/bpf/prog_tests/map_kptr.c
> @@ -0,0 +1,20 @@
> +// SPDX-License-Identifier: GPL-2.0
> +#include <test_progs.h>
> +
> +#include "map_kptr.skel.h"
> +
> +void test_map_kptr(void)
> +{
> +       struct map_kptr *skel;
> +       char buf[24];
> +       int key = 0;
> +
> +       skel = map_kptr__open_and_load();
> +       if (!ASSERT_OK_PTR(skel, "map_kptr__open_and_load"))
> +               return;
> +       ASSERT_OK(bpf_map_update_elem(bpf_map__fd(skel->maps.hash_map), &key, buf, 0),
> +                 "bpf_map_update_elem hash_map");
> +       ASSERT_OK(bpf_map_update_elem(bpf_map__fd(skel->maps.hash_malloc_map), &key, buf, 0),
> +                 "bpf_map_update_elem hash_malloc_map");


nit: it's quite messy and verbose, please do the operation outside of
ASSERT_OK() and just validate error:

err = bpf_map_update_elem(...);
ASSERT_OK(err, "hash_map_update");

And keep those ASSERT_XXX() string descriptors relatively short (see
how they are used internally in ASSERT_XXX() macros).
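
i.e. applied to the above (sketch):

	int err;

	err = bpf_map_update_elem(bpf_map__fd(skel->maps.hash_map), &key, buf, 0);
	ASSERT_OK(err, "hash_map_update");
	err = bpf_map_update_elem(bpf_map__fd(skel->maps.hash_malloc_map), &key, buf, 0);
	ASSERT_OK(err, "hash_malloc_map_update");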

> +       map_kptr__destroy(skel);
> +}

[...]

> +
> +extern struct prog_test_ref_kfunc *bpf_kfunc_call_test_acquire(unsigned long *sp) __ksym;
> +extern struct prog_test_ref_kfunc *
> +bpf_kfunc_call_test_kptr_get(struct prog_test_ref_kfunc **p, int a, int b) __ksym;
> +extern void bpf_kfunc_call_test_release(struct prog_test_ref_kfunc *p) __ksym;
> +
> +static __always_inline

nit: no need for __always_inline everywhere, just `static void` will
do the right thing.

> +void test_kptr_unref(struct map_value *v)
> +{
> +       struct prog_test_ref_kfunc *p;
> +
> +       p = v->unref_ptr;
> +       /* store untrusted_ptr_or_null_ */
> +       v->unref_ptr = p;
> +       if (!p)
> +               return;
> +       if (p->a + p->b > 100)
> +               return;
> +       /* store untrusted_ptr_ */
> +       v->unref_ptr = p;
> +       /* store NULL */
> +       v->unref_ptr = NULL;
> +}
> +

[...]

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH bpf-next v3 09/13] bpf: Wire up freeing of referenced kptr
  2022-03-20 15:55 ` [PATCH bpf-next v3 09/13] bpf: Wire up freeing of referenced kptr Kumar Kartikeya Dwivedi
  2022-03-22 20:51   ` Andrii Nakryiko
@ 2022-03-22 21:10   ` Alexei Starovoitov
  2022-03-25 15:07     ` Kumar Kartikeya Dwivedi
  1 sibling, 1 reply; 44+ messages in thread
From: Alexei Starovoitov @ 2022-03-22 21:10 UTC (permalink / raw)
  To: Kumar Kartikeya Dwivedi
  Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Toke Høiland-Jørgensen, Jesper Dangaard Brouer

On Sun, Mar 20, 2022 at 8:55 AM Kumar Kartikeya Dwivedi
<memxor@gmail.com> wrote:
>
> +               /* Find and stash the function pointer for the destruction function that
> +                * needs to be eventually invoked from the map free path.
> +                */
> +               if (info_arr[i].flags & BPF_MAP_VALUE_OFF_F_REF) {
> +                       const struct btf_type *dtor_func, *dtor_func_proto;
> +                       const struct btf_param *args;
> +                       const char *dtor_func_name;
> +                       unsigned long addr;
> +                       s32 dtor_btf_id;
> +                       u32 nr_args;
> +
> +                       /* This call also serves as a whitelist of allowed objects that
> +                        * can be used as a referenced pointer and be stored in a map at
> +                        * the same time.
> +                        */
> +                       dtor_btf_id = btf_find_dtor_kfunc(off_btf, id);
> +                       if (dtor_btf_id < 0) {
> +                               ret = dtor_btf_id;
> +                               btf_put(off_btf);
> +                               goto end;
> +                       }
> +
> +                       dtor_func = btf_type_by_id(off_btf, dtor_btf_id);
> +                       if (!dtor_func || !btf_type_is_func(dtor_func)) {
> +                               ret = -EINVAL;
> +                               btf_put(off_btf);
> +                               goto end;
> +                       }
> +
> +                       dtor_func_proto = btf_type_by_id(off_btf, dtor_func->type);
> +                       if (!dtor_func_proto || !btf_type_is_func_proto(dtor_func_proto)) {
> +                               ret = -EINVAL;
> +                               btf_put(off_btf);
> +                               goto end;
> +                       }
> +
> +                       /* Make sure the prototype of the destructor kfunc is 'void func(type *)' */
> +                       t = btf_type_by_id(off_btf, dtor_func_proto->type);
> +                       if (!t || !btf_type_is_void(t)) {
> +                               ret = -EINVAL;
> +                               btf_put(off_btf);
> +                               goto end;
> +                       }
> +
> +                       nr_args = btf_type_vlen(dtor_func_proto);
> +                       args = btf_params(dtor_func_proto);
> +
> +                       t = NULL;
> +                       if (nr_args)
> +                               t = btf_type_by_id(off_btf, args[0].type);
> +                       /* Allow any pointer type, as width on targets Linux supports
> +                        * will be same for all pointer types (i.e. sizeof(void *))
> +                        */
> +                       if (nr_args != 1 || !t || !btf_type_is_ptr(t)) {
> +                               ret = -EINVAL;
> +                               btf_put(off_btf);
> +                               goto end;
> +                       }
> +
> +                       if (btf_is_module(btf)) {
> +                               mod = btf_try_get_module(off_btf);
> +                               if (!mod) {
> +                                       ret = -ENXIO;
> +                                       btf_put(off_btf);
> +                                       goto end;
> +                               }
> +                       }
> +
> +                       dtor_func_name = __btf_name_by_offset(off_btf, dtor_func->name_off);
> +                       addr = kallsyms_lookup_name(dtor_func_name);
> +                       if (!addr) {
> +                               ret = -EINVAL;
> +                               module_put(mod);
> +                               btf_put(off_btf);
> +                               goto end;
> +                       }
> +                       tab->off[i].dtor = (void *)addr;

Most of the above should probably be in register_btf_id_dtor_kfuncs().
It's best to fail early.
Here we'll just remember dtor function pointer to speed up release.
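
A rough sketch of the "fail early" idea, for illustration only (the helper name and the struct btf_id_dtor_kfunc layout are assumptions based on this series, not the final code): validate each destructor's prototype once, at registration time, so the map BTF parsing path only has to look up and stash the address.

static int btf_check_dtor_kfuncs(struct btf *btf,
				 const struct btf_id_dtor_kfunc *dtors, u32 cnt)
{
	const struct btf_type *dtor, *proto, *arg;
	u32 i;

	for (i = 0; i < cnt; i++) {
		dtor = btf_type_by_id(btf, dtors[i].kfunc_btf_id);
		if (!dtor || !btf_type_is_func(dtor))
			return -EINVAL;
		proto = btf_type_by_id(btf, dtor->type);
		if (!proto || !btf_type_is_func_proto(proto))
			return -EINVAL;
		/* return type must be void, with exactly one pointer argument */
		if (!btf_type_is_void(btf_type_by_id(btf, proto->type)))
			return -EINVAL;
		if (btf_type_vlen(proto) != 1)
			return -EINVAL;
		arg = btf_type_by_id(btf, btf_params(proto)[0].type);
		if (!arg || !btf_type_is_ptr(arg))
			return -EINVAL;
	}
	return 0;
}

register_btf_id_dtor_kfuncs() could call something like this before adding the set, leaving only the kallsyms lookup and module refcount handling in the path quoted above.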

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH bpf-next v3 12/13] selftests/bpf: Add C tests for kptr
  2022-03-20 15:55 ` [PATCH bpf-next v3 12/13] selftests/bpf: Add C tests for kptr Kumar Kartikeya Dwivedi
  2022-03-22 21:00   ` Andrii Nakryiko
@ 2022-03-24  9:10   ` Jiri Olsa
  2022-03-25 14:52     ` Kumar Kartikeya Dwivedi
  1 sibling, 1 reply; 44+ messages in thread
From: Jiri Olsa @ 2022-03-24  9:10 UTC (permalink / raw)
  To: Kumar Kartikeya Dwivedi
  Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Toke Høiland-Jørgensen, Jesper Dangaard Brouer

On Sun, Mar 20, 2022 at 09:25:09PM +0530, Kumar Kartikeya Dwivedi wrote:

SNIP

> +static __always_inline
> +void test_kptr(struct map_value *v)
> +{
> +	test_kptr_unref(v);
> +	test_kptr_ref(v);
> +	test_kptr_get(v);
> +}
> +
> +SEC("tc")
> +int test_map_kptr(struct __sk_buff *ctx)
> +{
> +	void *maps[] = {
> +		&array_map,
> +		&hash_map,
> +		&hash_malloc_map,
> +		&lru_hash_map,
> +	};
> +	struct map_value *v;
> +	int i, key = 0;
> +
> +	for (i = 0; i < sizeof(maps) / sizeof(*maps); i++) {
> +		v = bpf_map_lookup_elem(&array_map, &key);

hi,
I was just quickly checking on the usage, so I might be missing something,
but should this be a lookup of maps[i] instead of array_map?

similar below in test_map_in_map_kptr

jirka
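
For reference, a minimal sketch of the fix being suggested here (selftest identifiers as quoted above); only the lookup target changes:

	for (i = 0; i < sizeof(maps) / sizeof(*maps); i++) {
		/* look up the map currently being iterated, not array_map */
		v = bpf_map_lookup_elem(maps[i], &key);
		if (!v)
			return 0;
		test_kptr(v);
	}

and likewise map_of_maps[i] in test_map_in_map_kptr.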

> +		if (!v)
> +			return 0;
> +		test_kptr(v);
> +	}
> +	return 0;
> +}
> +
> +SEC("tc")
> +int test_map_in_map_kptr(struct __sk_buff *ctx)
> +{
> +	void *map_of_maps[] = {
> +		&array_of_array_maps,
> +		&array_of_hash_maps,
> +		&array_of_hash_malloc_maps,
> +		&array_of_lru_hash_maps,
> +		&hash_of_array_maps,
> +		&hash_of_hash_maps,
> +		&hash_of_hash_malloc_maps,
> +		&hash_of_lru_hash_maps,
> +	};
> +	struct map_value *v;
> +	int i, key = 0;
> +	void *map;
> +
> +	for (i = 0; i < sizeof(map_of_maps) / sizeof(*map_of_maps); i++) {
> +		map = bpf_map_lookup_elem(&array_of_array_maps, &key);
> +		if (!map)
> +			return 0;
> +		v = bpf_map_lookup_elem(map, &key);
> +		if (!v)
> +			return 0;
> +		test_kptr(v);
> +	}
> +	return 0;
> +}
> +
> +char _license[] SEC("license") = "GPL";
> -- 
> 2.35.1
> 

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH bpf-next v3 03/13] bpf: Allow storing unreferenced kptr in map
  2022-03-22 18:52       ` Andrii Nakryiko
@ 2022-03-25 14:42         ` Kumar Kartikeya Dwivedi
  2022-03-25 22:59           ` Andrii Nakryiko
  0 siblings, 1 reply; 44+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2022-03-25 14:42 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Toke Høiland-Jørgensen, Jesper Dangaard Brouer

On Wed, Mar 23, 2022 at 12:22:20AM IST, Andrii Nakryiko wrote:
> On Tue, Mar 22, 2022 at 12:16 AM Kumar Kartikeya Dwivedi
> <memxor@gmail.com> wrote:
> >
> > On Tue, Mar 22, 2022 at 11:15:42AM IST, Andrii Nakryiko wrote:
> > > On Sun, Mar 20, 2022 at 8:55 AM Kumar Kartikeya Dwivedi
> > > <memxor@gmail.com> wrote:
> > > >
> > > > This commit introduces a new pointer type 'kptr' which can be embedded
> > > > in a map value and holds a PTR_TO_BTF_ID stored by a BPF program during
> > > > its invocation. When storing to such a kptr, the BPF program's
> > > > PTR_TO_BTF_ID register must have the same type as in the map value's
> > > > BTF, and loading a kptr marks the destination register as PTR_TO_BTF_ID
> > > > with the correct kernel BTF and BTF ID.
> > > >
> > > > Such kptrs are unreferenced, i.e. by the time another invocation of the
> > > > BPF program loads this pointer, the object which the pointer points to
> > > > may no longer exist. Since PTR_TO_BTF_ID loads (using BPF_LDX) are
> > > > patched to PROBE_MEM loads by the verifier, it would be safe to allow
> > > > the user to still access such an invalid pointer, but passing such
> > > > pointers into BPF helpers and kfuncs should not be permitted. A future
> > > > patch in this series will close this gap.
> > > >
> > > > The flexibility offered by allowing programs to dereference such invalid
> > > > pointers while being safe at runtime frees the verifier from doing
> > > > complex lifetime tracking. As long as the user may ensure that the
> > > > object remains valid, it can ensure data read by it from the kernel
> > > > object is valid.
> > > >
> > > > The user indicates that a certain pointer must be treated as kptr
> > > > capable of accepting stores of PTR_TO_BTF_ID of a certain type, by using
> > > > a BTF type tag 'kptr' on the pointed to type of the pointer. Then, this
> > > > information is recorded in the object BTF which will be passed into the
> > > > kernel by way of map's BTF information. The name and kind from the map
> > > > value BTF is used to look up the in-kernel type, and the actual BTF and
> > > > BTF ID is recorded in the map struct in a new kptr_off_tab member. For
> > > > now, only storing pointers to structs is permitted.
> > > >
> > > > An example of this specification is shown below:
> > > >
> > > >         #define __kptr __attribute__((btf_type_tag("kptr")))
> > > >
> > > >         struct map_value {
> > > >                 ...
> > > >                 struct task_struct __kptr *task;
> > > >                 ...
> > > >         };
> > > >
> > > > Then, in a BPF program, user may store PTR_TO_BTF_ID with the type
> > > > task_struct into the map, and then load it later.
> > > >
> > > > Note that the destination register is marked PTR_TO_BTF_ID_OR_NULL, as
> > > > the verifier cannot know whether the value is NULL or not statically; it
> > > > must treat all potential loads at that map value offset as loading a
> > > > possibly NULL pointer.
> > > >
> > > > Only BPF_LDX, BPF_STX, and BPF_ST with insn->imm = 0 (to denote NULL)
> > > > are allowed instructions that can access such a pointer. On BPF_LDX, the
> > > > destination register is updated to be a PTR_TO_BTF_ID, and on BPF_STX,
> > > > it is checked whether the source register type is a PTR_TO_BTF_ID with
> > > > same BTF type as specified in the map BTF. The access size must always
> > > > be BPF_DW.
> > > >
> > > > For the map in map support, the kptr_off_tab for the outer map is copied
> > > > from the inner map's kptr_off_tab. It was chosen to do a deep copy
> > > > instead of introducing a refcount to kptr_off_tab, because the copy only
> > > > needs to be done when parameterizing using inner_map_fd in the map in map
> > > > case, hence would be unnecessary for all other users.
> > > >
> > > > It is not permitted to use MAP_FREEZE command and mmap for BPF map
> > > > having kptr, similar to the bpf_timer case.
> > > >
> > > > Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
> > > > ---
> > > >  include/linux/bpf.h     |  29 +++++++-
> > > >  include/linux/btf.h     |   2 +
> > > >  kernel/bpf/btf.c        | 161 ++++++++++++++++++++++++++++++++++------
> > > >  kernel/bpf/map_in_map.c |   5 +-
> > > >  kernel/bpf/syscall.c    | 112 +++++++++++++++++++++++++++-
> > > >  kernel/bpf/verifier.c   | 120 ++++++++++++++++++++++++++++++
> > > >  6 files changed, 401 insertions(+), 28 deletions(-)
> > > >
> > >
> > > [...]
> > >
> > > > +static int btf_find_field_kptr(const struct btf *btf, const struct btf_type *t,
> > > > +                              u32 off, int sz, struct btf_field_info *info)
> > > > +{
> > > > +       /* For PTR, sz is always == 8 */
> > > > +       if (!btf_type_is_ptr(t))
> > > > +               return BTF_FIELD_IGNORE;
> > > > +       t = btf_type_by_id(btf, t->type);
> > > > +
> > > > +       if (!btf_type_is_type_tag(t))
> > > > +               return BTF_FIELD_IGNORE;
> > > > +       /* Reject extra tags */
> > > > +       if (btf_type_is_type_tag(btf_type_by_id(btf, t->type)))
> > > > +               return -EINVAL;
> > >
> > > Can we have tag -> const -> tag -> volatile -> tag in BTF? Wouldn't
> > > you assume there are no more tags with just this check?
> > >
> >
> > All tags are supposed to come before other modifiers, so tags appear first,
> > contiguously. See [0].
>
> Doesn't seem like kernel's BTF validator enforces this, we should
> probably tighten that up a bit. Clang won't emit such BTF, but nothing
> prevents user from generating non-conformant BTF on its own either.
>

Right, what would be a good place to do this validation? When loading BTF
using the bpf(2) syscall, or when we do the btf_parse_kptrs parsing?
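
As a sketch of the kind of ordering check being discussed, assuming btf_type_is_type_tag() and btf_type_is_modifier() behave as in this series (illustration only, not a proposed patch):

/* Walk a pointee's modifier chain and reject a type tag that shows up after a
 * non-tag modifier, enforcing the "tags come first" convention from [0].
 */
static int btf_check_type_tag_order(const struct btf *btf, const struct btf_type *t)
{
	bool seen_non_tag = false;

	while (btf_type_is_modifier(t)) {
		if (btf_type_is_type_tag(t)) {
			if (seen_non_tag)
				return -EINVAL;
		} else {
			seen_non_tag = true;
		}
		t = btf_type_by_id(btf, t->type);
	}
	return 0;
}

Whether something like this runs during BTF load or inside the kptr parsing is the open question above.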

> >
> > Alexei suggested to reject all other tags for now.
> >
> >  [0]: https://lore.kernel.org/bpf/20220127154627.665163-1-yhs@fb.com
> >
> > >
> > > > +       if (strcmp("kptr", __btf_name_by_offset(btf, t->name_off)))
> > > > +               return -EINVAL;
> > > > +
> > > > +       /* Get the base type */
> > > > +       if (btf_type_is_modifier(t))
> > > > +               t = btf_type_skip_modifiers(btf, t->type, NULL);
> > > > +       /* Only pointer to struct is allowed */
> > > > +       if (!__btf_type_is_struct(t))
> > > > +               return -EINVAL;
> > > > +
> > > > +       info->type = t;
> > > > +       info->off = off;
> > > > +       return BTF_FIELD_FOUND;
> > > >  }
>
> [...]
>
> > > > +       if (map_value_has_kptr(map)) {
> > > > +               if (!bpf_capable())
> > > > +                       return -EPERM;
> > > > +               if (map->map_flags & BPF_F_RDONLY_PROG) {
> > > > +                       ret = -EACCES;
> > > > +                       goto free_map_tab;
> > > > +               }
> > > > +               if (map->map_type != BPF_MAP_TYPE_HASH &&
> > > > +                   map->map_type != BPF_MAP_TYPE_LRU_HASH &&
> > > > +                   map->map_type != BPF_MAP_TYPE_ARRAY) {
> > >
> > > what about PERCPU_ARRAY, for instance? Is there something
> > > fundamentally wrong to support it for local storage maps?
> > >
> >
> > Plugging support into maps that already take timers was easier to begin with;
> > I can do percpu support as a follow-up.
> >
> > In the case of local storage, I'm a little worried about how we prevent
> > creating reference cycles. There was a thread where find_get_task_by_pid was
> > proposed as an unstable helper; once we e.g. support embedding task_struct in
> > a map and allow storing such a pointer in task local storage, it would be
> > pretty easy to construct a reference cycle.
> >
> > Should we think about this now, or should we worry about it when task_struct
> > is actually supported as a kptr? It's not only task_struct; the same applies to sock.
> >
> > There's a discussion to be had, hence I left it out for now.
>
> PERCPU_ARRAY seemed (and still seems) like a safe map to support (same
> as PERCPU_HASH), which is why I asked. I see concerns about local
> storage, though, thanks.
>

I'll look into it after this lands.

> >
> > > > +                       ret = -EOPNOTSUPP;
> > > > +                       goto free_map_tab;
> > > > +               }
> > > > +       }
> > > > +
> > > > +       if (map->ops->map_check_btf) {
> > > >                 ret = map->ops->map_check_btf(map, btf, key_type, value_type);
> > > > +               if (ret < 0)
> > > > +                       goto free_map_tab;
> > > > +       }
> > > >
> > > > +       return ret;
> > > > +free_map_tab:
> > > > +       bpf_map_free_kptr_off_tab(map);
> > > >         return ret;
> > > >  }
> > > >
> > > > @@ -1639,7 +1745,7 @@ static int map_freeze(const union bpf_attr *attr)
> > > >                 return PTR_ERR(map);
> > > >
> > > >         if (map->map_type == BPF_MAP_TYPE_STRUCT_OPS ||
> > > > -           map_value_has_timer(map)) {
> > > > +           map_value_has_timer(map) || map_value_has_kptr(map)) {
> > > >                 fdput(f);
> > > >                 return -ENOTSUPP;
> > > >         }
> > > > diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> > > > index 4ce9a528fb63..744b7362e52e 100644
> > > > --- a/kernel/bpf/verifier.c
> > > > +++ b/kernel/bpf/verifier.c
> > > > @@ -3507,6 +3507,94 @@ int check_ptr_off_reg(struct bpf_verifier_env *env,
> > > >         return __check_ptr_off_reg(env, reg, regno, false);
> > > >  }
> > > >
> > > > +static int map_kptr_match_type(struct bpf_verifier_env *env,
> > > > +                              struct bpf_map_value_off_desc *off_desc,
> > > > +                              struct bpf_reg_state *reg, u32 regno)
> > > > +{
> > > > +       const char *targ_name = kernel_type_name(off_desc->btf, off_desc->btf_id);
> > > > +       const char *reg_name = "";
> > > > +
> > > > +       if (reg->type != PTR_TO_BTF_ID && reg->type != PTR_TO_BTF_ID_OR_NULL)
> > >
> > > base_type(reg->type) != PTR_TO_BTF_ID ?
> > >
> > > > +               goto bad_type;
> > > > +
> > > > +       if (!btf_is_kernel(reg->btf)) {
> > > > +               verbose(env, "R%d must point to kernel BTF\n", regno);
> > > > +               return -EINVAL;
> > > > +       }
> > > > +       /* We need to verify reg->type and reg->btf, before accessing reg->btf */
> > > > +       reg_name = kernel_type_name(reg->btf, reg->btf_id);
> > > > +
> > > > +       if (__check_ptr_off_reg(env, reg, regno, true))
> > > > +               return -EACCES;
> > > > +
> > > > +       if (!btf_struct_ids_match(&env->log, reg->btf, reg->btf_id, reg->off,
> > > > +                                 off_desc->btf, off_desc->btf_id))
> > > > +               goto bad_type;
> > > > +       return 0;
> > > > +bad_type:
> > > > +       verbose(env, "invalid kptr access, R%d type=%s%s ", regno,
> > > > +               reg_type_str(env, reg->type), reg_name);
> > > > +       verbose(env, "expected=%s%s\n", reg_type_str(env, PTR_TO_BTF_ID), targ_name);
> > >
> > > why two separate verbose calls, you can easily combine them (and they
> > > should be output on a single line given it's a single error)
> > >
> >
> > reg_type_str cannot be called more than once in the same statement, since it
> > reuses the same buffer.
> >
>
> ah, subtle, ok, never mind then, no big deal to have two verbose()
> calls if there is a reason for it
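
To illustrate the constraint, a simplified sketch (not the actual verifier code; the per-env string buffer name is an assumption of the layout):

/* reg_type_str() formats into a single buffer hanging off the verifier env, so
 * calling it twice within one statement clobbers the first result before
 * verbose() consumes it.
 */
static const char *reg_type_str_sketch(struct bpf_verifier_env *env,
				       enum bpf_reg_type type)
{
	/* type_str_buf is assumed here: one shared buffer per verifier env */
	snprintf(env->type_str_buf, sizeof(env->type_str_buf), "type=%d", type);
	return env->type_str_buf;
}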
>
> > > > +       return -EINVAL;
> > > > +}
> > > > +
> > >
> > > [...]
> >
> > --
> > Kartikeya

--
Kartikeya

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH bpf-next v3 03/13] bpf: Allow storing unreferenced kptr in map
  2022-03-22 18:06   ` Martin KaFai Lau
@ 2022-03-25 14:45     ` Kumar Kartikeya Dwivedi
  0 siblings, 0 replies; 44+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2022-03-25 14:45 UTC (permalink / raw)
  To: Martin KaFai Lau
  Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Toke Høiland-Jørgensen, Jesper Dangaard Brouer

On Tue, Mar 22, 2022 at 11:36:55PM IST, Martin KaFai Lau wrote:
> On Sun, Mar 20, 2022 at 09:25:00PM +0530, Kumar Kartikeya Dwivedi wrote:
> > @@ -820,9 +904,31 @@ static int map_check_btf(struct bpf_map *map, const struct btf *btf,
> >  			return -EOPNOTSUPP;
> >  	}
> >
> > -	if (map->ops->map_check_btf)
> > +	map->kptr_off_tab = btf_find_kptr(btf, value_type);
> > +	if (map_value_has_kptr(map)) {
> > +		if (!bpf_capable())
> > +			return -EPERM;
> Not sure if this has been brought up.
>
> No need to bpf_map_free_kptr_off_tab() in this case?
>

Good catch, it should indeed be freed.
For failures in map_check_btf's caller, I'm relying on the map_free callback to
handle the freeing.
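
A minimal sketch of the acknowledged fix, still using the btf_find_kptr name from the quoted patch: release the freshly parsed table on the capability-check error path as well.

	map->kptr_off_tab = btf_find_kptr(btf, value_type);
	if (map_value_has_kptr(map)) {
		if (!bpf_capable()) {
			/* free the just-parsed table instead of leaking it */
			ret = -EPERM;
			goto free_map_tab;
		}
		if (map->map_flags & BPF_F_RDONLY_PROG) {
			ret = -EACCES;
			goto free_map_tab;
		}
	}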

> > +		if (map->map_flags & BPF_F_RDONLY_PROG) {
> > +			ret = -EACCES;
> > +			goto free_map_tab;
> > +		}
> > +		if (map->map_type != BPF_MAP_TYPE_HASH &&
> > +		    map->map_type != BPF_MAP_TYPE_LRU_HASH &&
> > +		    map->map_type != BPF_MAP_TYPE_ARRAY) {
> > +			ret = -EOPNOTSUPP;
> > +			goto free_map_tab;
> > +		}
> > +	}
> If btf_find_kptr() returns an error, can it be ignored so we continue?
>
> btw, it is quite unusual to store an err ptr in a map.
> How about only storing NULL or a valid ptr in map->kptr_off_tab?
>

It allows us to report a clear error from process_kptr_func, similar to storing
an error in place of spin_lock_off and timer_off, so for consistency I kept it
similar to those. But IS_ERR_OR_NULL still means no kptr_off_tab is present; in
places where it matters we don't distinguish between the two (e.g.
map_value_has_kptr also checks for both).
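
A sketch of that convention as described, mirroring the quoted patch's naming (illustration only):

/* An ERR_PTR() or NULL kptr_off_tab both mean "no kptrs in this map value";
 * the stored error is only consulted when reporting why parsing failed.
 */
static inline bool map_value_has_kptr(const struct bpf_map *map)
{
	return !IS_ERR_OR_NULL(map->kptr_off_tab);
}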

> > +
> > +	if (map->ops->map_check_btf) {
> >  		ret = map->ops->map_check_btf(map, btf, key_type, value_type);
> > +		if (ret < 0)
> > +			goto free_map_tab;
> > +	}
> >
> > +	return ret;
> > +free_map_tab:
> > +	bpf_map_free_kptr_off_tab(map);
> >  	return ret;
> >  }

--
Kartikeya

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH bpf-next v3 09/13] bpf: Wire up freeing of referenced kptr
  2022-03-22 20:51   ` Andrii Nakryiko
@ 2022-03-25 14:50     ` Kumar Kartikeya Dwivedi
  0 siblings, 0 replies; 44+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2022-03-25 14:50 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Toke Høiland-Jørgensen, Jesper Dangaard Brouer

On Wed, Mar 23, 2022 at 02:21:56AM IST, Andrii Nakryiko wrote:
> On Sun, Mar 20, 2022 at 8:55 AM Kumar Kartikeya Dwivedi
> <memxor@gmail.com> wrote:
> >
> > A destructor kfunc can be defined as void func(type *), where type may
> > be void or any other pointer type as per convenience.
> >
> > In this patch, we ensure that the type is sane and capture the function
> > pointer into off_desc of ptr_off_tab for the specific pointer offset,
> > with the invariant that the dtor pointer is always set when 'kptr_ref'
> > tag is applied to the pointer's pointee type, which is indicated by the
> > flag BPF_MAP_VALUE_OFF_F_REF.
> >
> > Note that only BTF IDs whose destructor kfunc is registered become the
> > allowed BTF IDs for embedding as a referenced kptr. Hence this serves
> > the purpose of finding the dtor kfunc BTF ID, as well as acting as a check
> > against the whitelist of BTF IDs allowed for this purpose.
> >
> > Finally, wire up the actual freeing of the referenced pointer, if any, at
> > all available offsets, so that no references are leaked after the BPF
> > map goes away when a BPF program previously moved ownership of a
> > referenced pointer into it.
> >
> > The behavior is similar to BPF timers, where bpf_map_{update,delete}_elem
> > will free any existing referenced kptr. The same is the case with the LRU
> > map's bpf_lru_push_free/htab_lru_push_free functions, which are extended to
> > reset unreferenced and free referenced kptrs.
> >
> > Note that unlike BPF timers, kptr is not reset or freed when map uref
> > drops to zero.
> >
> > Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
> > ---
> >  include/linux/bpf.h   |  4 ++
> >  include/linux/btf.h   |  2 +
> >  kernel/bpf/arraymap.c | 14 ++++++-
> >  kernel/bpf/btf.c      | 86 ++++++++++++++++++++++++++++++++++++++++++-
> >  kernel/bpf/hashtab.c  | 29 ++++++++++-----
> >  kernel/bpf/syscall.c  | 57 +++++++++++++++++++++++++---
> >  6 files changed, 173 insertions(+), 19 deletions(-)
> >
>
> [...]
>
> > +                       /* This call also serves as a whitelist of allowed objects that
> > +                        * can be used as a referenced pointer and be stored in a map at
> > +                        * the same time.
> > +                        */
> > +                       dtor_btf_id = btf_find_dtor_kfunc(off_btf, id);
> > +                       if (dtor_btf_id < 0) {
> > +                               ret = dtor_btf_id;
> > +                               btf_put(off_btf);
>
> do btf_put() in end section instead of copy/pasting it in every single
> branch here and below?
>

Ok.
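
One way the error handling could be folded, as a fragment of the same loop as the quoted code (label name is illustrative; the BTF reference is only dropped on failure, since on success it stays referenced via the table):

		dtor_btf_id = btf_find_dtor_kfunc(off_btf, id);
		if (dtor_btf_id < 0) {
			ret = dtor_btf_id;
			goto end_btf;
		}

		dtor_func = btf_type_by_id(off_btf, dtor_btf_id);
		if (!dtor_func || !btf_type_is_func(dtor_func)) {
			ret = -EINVAL;
			goto end_btf;
		}

		/* the remaining prototype checks also use goto end_btf on failure */

		tab->off[i].dtor = (void *)addr;
		continue;
end_btf:
		btf_put(off_btf);
		goto end;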

> > +                               goto end;
> > +                       }
> > +
> > +                       dtor_func = btf_type_by_id(off_btf, dtor_btf_id);
> > +                       if (!dtor_func || !btf_type_is_func(dtor_func)) {
> > +                               ret = -EINVAL;
> > +                               btf_put(off_btf);
> > +                               goto end;
> > +                       }
> > +
>
> [...]
>
> > -       while (tab->nr_off--)
> > +       while (tab->nr_off--) {
> >                 btf_put(tab->off[tab->nr_off].btf);
> > +               if (tab->off[tab->nr_off].module)
> > +                       module_put(tab->off[tab->nr_off].module);
> > +       }
> >         kfree(tab);
> >         return ERR_PTR(ret);
> >  }
> > diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c
> > index 65877967f414..fa4a0a8754c5 100644
> > --- a/kernel/bpf/hashtab.c
> > +++ b/kernel/bpf/hashtab.c
> > @@ -725,12 +725,16 @@ static int htab_lru_map_gen_lookup(struct bpf_map *map,
> >         return insn - insn_buf;
> >  }
> >
> > -static void check_and_free_timer(struct bpf_htab *htab, struct htab_elem *elem)
> > +static void check_and_free_timer_and_kptr(struct bpf_htab *htab,
>
> we'll need to rename this to
> check_and_free_timer_and_kptrs_and_dynptrs() pretty soon, so we'd
> better figure out a more generic name now? :)
>
> Don't know, something like "release_fields" or something?
>

Ok, will change.

> > +                                         struct htab_elem *elem,
> > +                                         bool free_kptr)
> >  {
> > +       void *map_value = elem->key + round_up(htab->map.key_size, 8);
> > +
> >         if (unlikely(map_value_has_timer(&htab->map)))
> > -               bpf_timer_cancel_and_free(elem->key +
> > -                                         round_up(htab->map.key_size, 8) +
> > -                                         htab->map.timer_off);
> > +               bpf_timer_cancel_and_free(map_value + htab->map.timer_off);
> > +       if (unlikely(map_value_has_kptr(&htab->map)) && free_kptr)
> > +               bpf_map_free_kptr(&htab->map, map_value);
>
> kptrs (please use plural consistently for functions that actually
> handle multiple kptrs).
>

Ok, will audit all other places as well.

> >  }
> >
> >  /* It is called from the bpf_lru_list when the LRU needs to delete
>
> [...]
>
> >  static void htab_lru_push_free(struct bpf_htab *htab, struct htab_elem *elem)
> >  {
> > -       check_and_free_timer(htab, elem);
> > +       check_and_free_timer_and_kptr(htab, elem, true);
> >         bpf_lru_push_free(&htab->lru, &elem->lru_node);
> >  }
> >
> > @@ -1420,7 +1424,10 @@ static void htab_free_malloced_timers(struct bpf_htab *htab)
> >                 struct htab_elem *l;
> >
> >                 hlist_nulls_for_each_entry(l, n, head, hash_node)
> > -                       check_and_free_timer(htab, l);
> > +                       /* We don't reset or free kptr on uref dropping to zero,
> > +                        * hence set free_kptr to false.
> > +                        */
> > +                       check_and_free_timer_and_kptr(htab, l, false);
>
> ok, now reading this, I wonder if it's better to keep timer and kptrs
> clean ups separate? And then dynptrs separate still? Instead of adding
> all these flags.

Right, in the case of the array map we call bpf_timer_cancel_and_free directly
instead of going through a function named like this. I guess it makes sense to
do the same for the hash map, since I assume neither kptrs nor dynptrs should
be freed on map_release_uref, unlike timers.
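
A sketch of keeping the cleanup paths separate as discussed (helper names are assumptions; the plural bpf_map_free_kptrs follows the naming comment above):

static void check_and_free_timer(struct bpf_htab *htab, struct htab_elem *elem)
{
	void *map_value = elem->key + round_up(htab->map.key_size, 8);

	if (unlikely(map_value_has_timer(&htab->map)))
		bpf_timer_cancel_and_free(map_value + htab->map.timer_off);
}

static void check_and_free_kptrs(struct bpf_htab *htab, struct htab_elem *elem)
{
	void *map_value = elem->key + round_up(htab->map.key_size, 8);

	if (unlikely(map_value_has_kptr(&htab->map)))
		bpf_map_free_kptrs(&htab->map, map_value);
}

The map_release_uref path (htab_free_malloced_timers) would then only call check_and_free_timer(), while element delete/free paths call both.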

>
> >                 cond_resched_rcu();
> >         }
> >         rcu_read_unlock();
> > @@ -1430,6 +1437,7 @@ static void htab_map_free_timers(struct bpf_map *map)
> >  {
> >         struct bpf_htab *htab = container_of(map, struct bpf_htab, map);
> >
> > +       /* We don't reset or free kptr on uref dropping to zero. */
> >         if (likely(!map_value_has_timer(&htab->map)))
> >                 return;
> >         if (!htab_is_prealloc(htab))
>
> [...]

--
Kartikeya

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH bpf-next v3 03/13] bpf: Allow storing unreferenced kptr in map
  2022-03-22 20:22       ` Andrii Nakryiko
@ 2022-03-25 14:51         ` Kumar Kartikeya Dwivedi
  0 siblings, 0 replies; 44+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2022-03-25 14:51 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: Joanne Koong, bpf, Alexei Starovoitov, Andrii Nakryiko,
	Daniel Borkmann, Toke Høiland-Jørgensen,
	Jesper Dangaard Brouer

On Wed, Mar 23, 2022 at 01:52:52AM IST, Andrii Nakryiko wrote:
> On Tue, Mar 22, 2022 at 12:05 AM Kumar Kartikeya Dwivedi
> <memxor@gmail.com> wrote:
> >
> > On Tue, Mar 22, 2022 at 05:09:30AM IST, Joanne Koong wrote:
> > > On Sun, Mar 20, 2022 at 5:27 PM Kumar Kartikeya Dwivedi
> > > <memxor@gmail.com> wrote:
> > > >
> > > > This commit introduces a new pointer type 'kptr' which can be embedded
> > > > in a map value and holds a PTR_TO_BTF_ID stored by a BPF program during
> > > > its invocation. When storing to such a kptr, the BPF program's
> > > > PTR_TO_BTF_ID register must have the same type as in the map value's
> > > > BTF, and loading a kptr marks the destination register as PTR_TO_BTF_ID
> > > > with the correct kernel BTF and BTF ID.
> > > >
> > > > Such kptrs are unreferenced, i.e. by the time another invocation of the
> > > > BPF program loads this pointer, the object which the pointer points to
> > > > may no longer exist. Since PTR_TO_BTF_ID loads (using BPF_LDX) are
> > > > patched to PROBE_MEM loads by the verifier, it would be safe to allow
> > > > the user to still access such an invalid pointer, but passing such
> > > > pointers into BPF helpers and kfuncs should not be permitted. A future
> > > > patch in this series will close this gap.
> > > >
> > > > The flexibility offered by allowing programs to dereference such invalid
> > > > pointers while being safe at runtime frees the verifier from doing
> > > > complex lifetime tracking. As long as the user may ensure that the
> > > > object remains valid, it can ensure data read by it from the kernel
> > > > object is valid.
> > > >
> > > > The user indicates that a certain pointer must be treated as kptr
> > > > capable of accepting stores of PTR_TO_BTF_ID of a certain type, by using
> > > > a BTF type tag 'kptr' on the pointed to type of the pointer. Then, this
> > > > information is recorded in the object BTF which will be passed into the
> > > > kernel by way of map's BTF information. The name and kind from the map
> > > > value BTF is used to look up the in-kernel type, and the actual BTF and
> > > > BTF ID is recorded in the map struct in a new kptr_off_tab member. For
> > > > now, only storing pointers to structs is permitted.
> > > >
> > > > An example of this specification is shown below:
> > > >
> > > >         #define __kptr __attribute__((btf_type_tag("kptr")))
> > > >
> > > >         struct map_value {
> > > >                 ...
> > > >                 struct task_struct __kptr *task;
> > > >                 ...
> > > >         };
> > > >
> > > > Then, in a BPF program, user may store PTR_TO_BTF_ID with the type
> > > > task_struct into the map, and then load it later.
> > > >
> > > > Note that the destination register is marked PTR_TO_BTF_ID_OR_NULL, as
> > > > the verifier cannot know whether the value is NULL or not statically; it
> > > > must treat all potential loads at that map value offset as loading a
> > > > possibly NULL pointer.
> > > >
> > > > Only BPF_LDX, BPF_STX, and BPF_ST with insn->imm = 0 (to denote NULL)
> > > > are allowed instructions that can access such a pointer. On BPF_LDX, the
> > > > destination register is updated to be a PTR_TO_BTF_ID, and on BPF_STX,
> > > > it is checked whether the source register type is a PTR_TO_BTF_ID with
> > > > same BTF type as specified in the map BTF. The access size must always
> > > > be BPF_DW.
> > > >
> > > > For the map in map support, the kptr_off_tab for the outer map is copied
> > > > from the inner map's kptr_off_tab. It was chosen to do a deep copy
> > > > instead of introducing a refcount to kptr_off_tab, because the copy only
> > > > needs to be done when parameterizing using inner_map_fd in the map in map
> > > > case, hence would be unnecessary for all other users.
> > > >
> > > > It is not permitted to use MAP_FREEZE command and mmap for BPF map
> > > > having kptr, similar to the bpf_timer case.
> > > >
> > > > Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
> > > > ---
> > > >  include/linux/bpf.h     |  29 +++++++-
> > > >  include/linux/btf.h     |   2 +
> > > >  kernel/bpf/btf.c        | 161 ++++++++++++++++++++++++++++++++++------
> > > >  kernel/bpf/map_in_map.c |   5 +-
> > > >  kernel/bpf/syscall.c    | 112 +++++++++++++++++++++++++++-
> > > >  kernel/bpf/verifier.c   | 120 ++++++++++++++++++++++++++++++
> > > >  6 files changed, 401 insertions(+), 28 deletions(-)
> > > >
> > > [...]
> > > > +
> > > >  struct bpf_map *bpf_map_get(u32 ufd);
> > > >  struct bpf_map *bpf_map_get_with_uref(u32 ufd);
> > > >  struct bpf_map *__bpf_map_get(struct fd f);
> > > > diff --git a/include/linux/btf.h b/include/linux/btf.h
> > > > index 36bc09b8e890..5b578dc81c04 100644
> > > > --- a/include/linux/btf.h
> > > > +++ b/include/linux/btf.h
> > > > @@ -123,6 +123,8 @@ bool btf_member_is_reg_int(const struct btf *btf, const struct btf_type *s,
> > > >                            u32 expected_offset, u32 expected_size);
> > > >  int btf_find_spin_lock(const struct btf *btf, const struct btf_type *t);
> > > >  int btf_find_timer(const struct btf *btf, const struct btf_type *t);
> > > > +struct bpf_map_value_off *btf_find_kptr(const struct btf *btf,
> > > > +                                       const struct btf_type *t);
> > >
> > > nit: given that "btf_find_kptr" allocates memory as well, maybe the
> > > name "btf_parse_kptr" would be more reflective?
> > >
> >
> > Good point, will change.
> >
> > > >  bool btf_type_is_void(const struct btf_type *t);
> > > >  s32 btf_find_by_name_kind(const struct btf *btf, const char *name, u8 kind);
> > > >  const struct btf_type *btf_type_skip_modifiers(const struct btf *btf,
> > > > diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c
> > > > index 9e17af936a7a..92afbec0a887 100644
> > > > --- a/kernel/bpf/btf.c
> > > > +++ b/kernel/bpf/btf.c
> > > > @@ -3164,9 +3164,16 @@ static void btf_struct_log(struct btf_verifier_env *env,
> > > >  enum {
> > > >         BTF_FIELD_SPIN_LOCK,
> > > >         BTF_FIELD_TIMER,
> > > > +       BTF_FIELD_KPTR,
> > > > +};
> > > > +
> > > > +enum {
> > > > +       BTF_FIELD_IGNORE = 0,
> > > > +       BTF_FIELD_FOUND  = 1,
> > > >  };
> > > >
> > > >  struct btf_field_info {
> > > > +       const struct btf_type *type;
> > > >         u32 off;
> > > >  };
> > > >
> > > > @@ -3174,23 +3181,48 @@ static int btf_find_field_struct(const struct btf *btf, const struct btf_type *t
> > > >                                  u32 off, int sz, struct btf_field_info *info)
> > > >  {
> > > >         if (!__btf_type_is_struct(t))
> > > > -               return 0;
> > > > +               return BTF_FIELD_IGNORE;
> > > >         if (t->size != sz)
> > > > -               return 0;
> > > > -       if (info->off != -ENOENT)
> > > > -               /* only one such field is allowed */
> > > > -               return -E2BIG;
> > > > +               return BTF_FIELD_IGNORE;
> > > >         info->off = off;
> > > > -       return 0;
> > > > +       return BTF_FIELD_FOUND;
> > > > +}
> > > > +
> > > > +static int btf_find_field_kptr(const struct btf *btf, const struct btf_type *t,
> > > > +                              u32 off, int sz, struct btf_field_info *info)
> > > > +{
> > > > +       /* For PTR, sz is always == 8 */
> > > > +       if (!btf_type_is_ptr(t))
> > > > +               return BTF_FIELD_IGNORE;
> > > > +       t = btf_type_by_id(btf, t->type);
> > > > +
> > > > +       if (!btf_type_is_type_tag(t))
> > > > +               return BTF_FIELD_IGNORE;
> > > > +       /* Reject extra tags */
> > > > +       if (btf_type_is_type_tag(btf_type_by_id(btf, t->type)))
> > > > +               return -EINVAL;
> > > > +       if (strcmp("kptr", __btf_name_by_offset(btf, t->name_off)))
> > > > +               return -EINVAL;
> > > > +
> > > > +       /* Get the base type */
> > > > +       if (btf_type_is_modifier(t))
> > > > +               t = btf_type_skip_modifiers(btf, t->type, NULL);
> > > > +       /* Only pointer to struct is allowed */
> > > > +       if (!__btf_type_is_struct(t))
> > > > +               return -EINVAL;
> > > > +
> > > > +       info->type = t;
> > > > +       info->off = off;
> > > > +       return BTF_FIELD_FOUND;
> > > >  }
> > > >
> > > >  static int btf_find_struct_field(const struct btf *btf, const struct btf_type *t,
> > > >                                  const char *name, int sz, int align, int field_type,
> > > > -                                struct btf_field_info *info)
> > > > +                                struct btf_field_info *info, int info_cnt)
> > >
> > > From my understanding, this patch now modifies btf_find_struct_field
> > > and btf_find_datasec_var such that the "info" that is passed in has to
> > > be an array of size max possible + 1 while "info_cnt" is the max
> > > possible count, or we risk writing beyond the "info" array passed in.
> > > It seems like we could just modify the
> > > btf_find_struct_field/btf_find_datasec_var logic so that the user can
> > > just pass in info array of max possible size instead of max possible
> > > size + 1 - or is your concern that this would require more idx >=
> > > info_cnt checks inside the functions? Maybe we should include a
> > > comment here and in btf_find_datasec_var to document that "info"
> > > should always be max possible size + 1?
> > >
> >
> > So for some context on why this was changed, follow [0].
> >
> > I agree it's pretty ugly. My first thought was to check it inside the functions,
> > but that is also not very great. So I went with this; one more suggestion from
> > Alexei was to split it into a find step and then a fill of info, because the
> > error on idx >= info_cnt should only happen after we find. Right now the find
> > and fill happen together, so to error out you need an extra element it can
> > fill before you bail at ARRAY_SIZE - 1 (which is the actual max).
> >
> > TBH the find + fill split looks best to me, but open to more suggestions.
>
> I think there is much simpler way that doesn't require unnecessary
> copying or splitting anything:
>
> struct btf_field_info tmp;
>
> ...
>
> ret = btf_find_field_struct(btf, member_type, off, sz,
>                             idx < info_cnt ? &info[idx] : &tmp);
>
> ...
>
> That's it.
>

Indeed, not sure why I was overthinking this, it should work :).

> >
> > [0]: https://lore.kernel.org/bpf/20220319181538.nbqdkprjrzkxk7v4@ast-mbp.dhcp.thefacebook.com
> >
>
> [...]

--
Kartikeya

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH bpf-next v3 12/13] selftests/bpf: Add C tests for kptr
  2022-03-24  9:10   ` Jiri Olsa
@ 2022-03-25 14:52     ` Kumar Kartikeya Dwivedi
  0 siblings, 0 replies; 44+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2022-03-25 14:52 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Toke Høiland-Jørgensen, Jesper Dangaard Brouer

On Thu, Mar 24, 2022 at 02:40:14PM IST, Jiri Olsa wrote:
> On Sun, Mar 20, 2022 at 09:25:09PM +0530, Kumar Kartikeya Dwivedi wrote:
>
> SNIP
>
> > +static __always_inline
> > +void test_kptr(struct map_value *v)
> > +{
> > +	test_kptr_unref(v);
> > +	test_kptr_ref(v);
> > +	test_kptr_get(v);
> > +}
> > +
> > +SEC("tc")
> > +int test_map_kptr(struct __sk_buff *ctx)
> > +{
> > +	void *maps[] = {
> > +		&array_map,
> > +		&hash_map,
> > +		&hash_malloc_map,
> > +		&lru_hash_map,
> > +	};
> > +	struct map_value *v;
> > +	int i, key = 0;
> > +
> > +	for (i = 0; i < sizeof(maps) / sizeof(*maps); i++) {
> > +		v = bpf_map_lookup_elem(&array_map, &key);
>
> hi,
> I was just quickly checking on the usage, so I might be missing something,
> but should this be a lookup of maps[i] instead of array_map?
>
> similar below in test_map_in_map_kptr
>

My bad, it's a braino. Will fix in v4.
Thanks!

> jirka
>
> > +		if (!v)
> > +			return 0;
> > +		test_kptr(v);
> > +	}
> > +	return 0;
> > +}
> > +
> > +SEC("tc")
> > +int test_map_in_map_kptr(struct __sk_buff *ctx)
> > +{
> > +	void *map_of_maps[] = {
> > +		&array_of_array_maps,
> > +		&array_of_hash_maps,
> > +		&array_of_hash_malloc_maps,
> > +		&array_of_lru_hash_maps,
> > +		&hash_of_array_maps,
> > +		&hash_of_hash_maps,
> > +		&hash_of_hash_malloc_maps,
> > +		&hash_of_lru_hash_maps,
> > +	};
> > +	struct map_value *v;
> > +	int i, key = 0;
> > +	void *map;
> > +
> > +	for (i = 0; i < sizeof(map_of_maps) / sizeof(*map_of_maps); i++) {
> > +		map = bpf_map_lookup_elem(&array_of_array_maps, &key);
> > +		if (!map)
> > +			return 0;
> > +		v = bpf_map_lookup_elem(map, &key);
> > +		if (!v)
> > +			return 0;
> > +		test_kptr(v);
> > +	}
> > +	return 0;
> > +}
> > +
> > +char _license[] SEC("license") = "GPL";
> > --
> > 2.35.1
> >

--
Kartikeya

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH bpf-next v3 12/13] selftests/bpf: Add C tests for kptr
  2022-03-22 21:00   ` Andrii Nakryiko
@ 2022-03-25 14:52     ` Kumar Kartikeya Dwivedi
  0 siblings, 0 replies; 44+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2022-03-25 14:52 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Toke Høiland-Jørgensen, Jesper Dangaard Brouer

On Wed, Mar 23, 2022 at 02:30:57AM IST, Andrii Nakryiko wrote:
> On Sun, Mar 20, 2022 at 8:56 AM Kumar Kartikeya Dwivedi
> <memxor@gmail.com> wrote:
> >
> > This uses the __kptr and __kptr_ref macros as well, and tries to test
> > the stuff that is supposed to work, since we have negative tests in
> > the test_verifier suite. Also include some code to test map-in-map support,
> > such that the inner_map_meta matches the kptr_off_tab of the map added as
> > an element.
> >
> > Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
> > ---
> >  .../selftests/bpf/prog_tests/map_kptr.c       |  20 ++
> >  tools/testing/selftests/bpf/progs/map_kptr.c  | 194 ++++++++++++++++++
> >  2 files changed, 214 insertions(+)
> >  create mode 100644 tools/testing/selftests/bpf/prog_tests/map_kptr.c
> >  create mode 100644 tools/testing/selftests/bpf/progs/map_kptr.c
> >
> > diff --git a/tools/testing/selftests/bpf/prog_tests/map_kptr.c b/tools/testing/selftests/bpf/prog_tests/map_kptr.c
> > new file mode 100644
> > index 000000000000..688732295ce9
> > --- /dev/null
> > +++ b/tools/testing/selftests/bpf/prog_tests/map_kptr.c
> > @@ -0,0 +1,20 @@
> > +// SPDX-License-Identifier: GPL-2.0
> > +#include <test_progs.h>
> > +
> > +#include "map_kptr.skel.h"
> > +
> > +void test_map_kptr(void)
> > +{
> > +       struct map_kptr *skel;
> > +       char buf[24];
> > +       int key = 0;
> > +
> > +       skel = map_kptr__open_and_load();
> > +       if (!ASSERT_OK_PTR(skel, "map_kptr__open_and_load"))
> > +               return;
> > +       ASSERT_OK(bpf_map_update_elem(bpf_map__fd(skel->maps.hash_map), &key, buf, 0),
> > +                 "bpf_map_update_elem hash_map");
> > +       ASSERT_OK(bpf_map_update_elem(bpf_map__fd(skel->maps.hash_malloc_map), &key, buf, 0),
> > +                 "bpf_map_update_elem hash_malloc_map");
>
>
> nit: it's quite messy and verbose, please do the operation outside of
> ASSERT_OK() and just validate error:
>
> err = bpf_map_update_elem(...);
> ASSERT_OK(err, "hash_map_update");
>
> And keep those ASSERT_XXX() string descriptors relatively short (see
> how they are used internally in ASSERT_XXX() macros).
>

Ok.

> > +       map_kptr__destroy(skel);
> > +}
>
> [...]
>
> > +
> > +extern struct prog_test_ref_kfunc *bpf_kfunc_call_test_acquire(unsigned long *sp) __ksym;
> > +extern struct prog_test_ref_kfunc *
> > +bpf_kfunc_call_test_kptr_get(struct prog_test_ref_kfunc **p, int a, int b) __ksym;
> > +extern void bpf_kfunc_call_test_release(struct prog_test_ref_kfunc *p) __ksym;
> > +
> > +static __always_inline
>
> nit: no need for __always_inline everywhere, just `static void` will
> do the right thing.
>

Ok, will drop.

> > +void test_kptr_unref(struct map_value *v)
> > +{
> > +       struct prog_test_ref_kfunc *p;
> > +
> > +       p = v->unref_ptr;
> > +       /* store untrusted_ptr_or_null_ */
> > +       v->unref_ptr = p;
> > +       if (!p)
> > +               return;
> > +       if (p->a + p->b > 100)
> > +               return;
> > +       /* store untrusted_ptr_ */
> > +       v->unref_ptr = p;
> > +       /* store NULL */
> > +       v->unref_ptr = NULL;
> > +}
> > +
>
> [...]

--
Kartikeya

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH bpf-next v3 05/13] bpf: Allow storing referenced kptr in map
  2022-03-22 20:59   ` Martin KaFai Lau
@ 2022-03-25 14:57     ` Kumar Kartikeya Dwivedi
  2022-03-25 23:39       ` Martin KaFai Lau
  0 siblings, 1 reply; 44+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2022-03-25 14:57 UTC (permalink / raw)
  To: Martin KaFai Lau
  Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Toke Høiland-Jørgensen, Jesper Dangaard Brouer

On Wed, Mar 23, 2022 at 02:29:12AM IST, Martin KaFai Lau wrote:
> On Sun, Mar 20, 2022 at 09:25:02PM +0530, Kumar Kartikeya Dwivedi wrote:
> >  static int map_kptr_match_type(struct bpf_verifier_env *env,
> >  			       struct bpf_map_value_off_desc *off_desc,
> > -			       struct bpf_reg_state *reg, u32 regno)
> > +			       struct bpf_reg_state *reg, u32 regno,
> > +			       bool ref_ptr)
> >  {
> >  	const char *targ_name = kernel_type_name(off_desc->btf, off_desc->btf_id);
> >  	const char *reg_name = "";
> > +	bool fixed_off_ok = true;
> >
> >  	if (reg->type != PTR_TO_BTF_ID && reg->type != PTR_TO_BTF_ID_OR_NULL)
> >  		goto bad_type;
> > @@ -3525,7 +3530,26 @@ static int map_kptr_match_type(struct bpf_verifier_env *env,
> >  	/* We need to verify reg->type and reg->btf, before accessing reg->btf */
> >  	reg_name = kernel_type_name(reg->btf, reg->btf_id);
> >
> > -	if (__check_ptr_off_reg(env, reg, regno, true))
> > +	if (ref_ptr) {
> > +		if (!reg->ref_obj_id) {
> > +			verbose(env, "R%d must be referenced %s%s\n", regno,
> > +				reg_type_str(env, PTR_TO_BTF_ID), targ_name);
> > +			return -EACCES;
> > +		}
> Aren't the is_release_function() checks under check_helper_call()
> the same?
>
> > +		/* reg->off can be used to store pointer to a certain type formed by
> > +		 * incrementing pointer of a parent structure the object is embedded in,
> > +		 * e.g. map may expect unreferenced struct path *, and user should be
> > +		 * allowed a store using &file->f_path. However, in the case of
> > +		 * referenced pointer, we cannot do this, because the reference is only
> > +		 * for the parent structure, not its embedded object(s), and because
> > +		 * the transfer of ownership happens for the original pointer to and
> > +		 * from the map (before its eventual release).
> > +		 */
> > +		if (reg->off)
> > +			fixed_off_ok = false;
> I thought the new check_func_arg_reg_off() is supposed to handle the
> is_release_function() case.  Can the check_func_arg_reg_off() called
> in check_func_arg() not handle this case?
>

The difference there is that it wouldn't check for reg->off == 0 if reg->ref_obj_id
is 0. So in that case, I should probably check that reg->ref_obj_id is non-zero
when ref_ptr is true, and then call check_func_arg_reg_off, with a comment that
this register will eventually be an argument to the release function and should
therefore be checked the same way.

> > +	}
> > +	/* var_off is rejected by __check_ptr_off_reg for PTR_TO_BTF_ID */
> > +	if (__check_ptr_off_reg(env, reg, regno, fixed_off_ok))
> >  		return -EACCES;
> >
> >  	if (!btf_struct_ids_match(&env->log, reg->btf, reg->btf_id, reg->off,
>
> [ ... ]
>
> > @@ -5390,6 +5473,7 @@ static const struct bpf_reg_types func_ptr_types = { .types = { PTR_TO_FUNC } };
> >  static const struct bpf_reg_types stack_ptr_types = { .types = { PTR_TO_STACK } };
> >  static const struct bpf_reg_types const_str_ptr_types = { .types = { PTR_TO_MAP_VALUE } };
> >  static const struct bpf_reg_types timer_types = { .types = { PTR_TO_MAP_VALUE } };
> > +static const struct bpf_reg_types kptr_types = { .types = { PTR_TO_MAP_VALUE } };
> >
> >  static const struct bpf_reg_types *compatible_reg_types[__BPF_ARG_TYPE_MAX] = {
> >  	[ARG_PTR_TO_MAP_KEY]		= &map_key_value_types,
> > @@ -5417,11 +5501,13 @@ static const struct bpf_reg_types *compatible_reg_types[__BPF_ARG_TYPE_MAX] = {
> >  	[ARG_PTR_TO_STACK]		= &stack_ptr_types,
> >  	[ARG_PTR_TO_CONST_STR]		= &const_str_ptr_types,
> >  	[ARG_PTR_TO_TIMER]		= &timer_types,
> > +	[ARG_PTR_TO_KPTR]		= &kptr_types,
> >  };
> >
> >  static int check_reg_type(struct bpf_verifier_env *env, u32 regno,
> >  			  enum bpf_arg_type arg_type,
> > -			  const u32 *arg_btf_id)
> > +			  const u32 *arg_btf_id,
> > +			  struct bpf_call_arg_meta *meta)
> >  {
> >  	struct bpf_reg_state *regs = cur_regs(env), *reg = &regs[regno];
> >  	enum bpf_reg_type expected, type = reg->type;
> > @@ -5474,8 +5560,11 @@ static int check_reg_type(struct bpf_verifier_env *env, u32 regno,
> >  			arg_btf_id = compatible->btf_id;
> >  		}
> >
> > -		if (!btf_struct_ids_match(&env->log, reg->btf, reg->btf_id, reg->off,
> > -					  btf_vmlinux, *arg_btf_id)) {
> > +		if (meta->func_id == BPF_FUNC_kptr_xchg) {
> > +			if (map_kptr_match_type(env, meta->kptr_off_desc, reg, regno, true))
> > +				return -EACCES;
> > +		} else if (!btf_struct_ids_match(&env->log, reg->btf, reg->btf_id, reg->off,
> > +						 btf_vmlinux, *arg_btf_id)) {
> >  			verbose(env, "R%d is of type %s but %s is expected\n",
> >  				regno, kernel_type_name(reg->btf, reg->btf_id),
> >  				kernel_type_name(btf_vmlinux, *arg_btf_id));

--
Kartikeya

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH bpf-next v3 07/13] bpf: Adapt copy_map_value for multiple offset case
  2022-03-22 20:38   ` Andrii Nakryiko
@ 2022-03-25 15:06     ` Kumar Kartikeya Dwivedi
  0 siblings, 0 replies; 44+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2022-03-25 15:06 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Toke Høiland-Jørgensen, Jesper Dangaard Brouer

On Wed, Mar 23, 2022 at 02:08:36AM IST, Andrii Nakryiko wrote:
> On Sun, Mar 20, 2022 at 8:55 AM Kumar Kartikeya Dwivedi
> <memxor@gmail.com> wrote:
> >
> > Since now there might be at most 10 offsets that need handling in
> > copy_map_value, the manual shuffling and special case is no longer going
> > to work. Hence, let's generalise the copy_map_value function by using
> > a sorted array of offsets to skip regions that must be avoided while
> > copying into and out of a map value.
> >
> > When the map is created, we populate the offset array in struct map,
> > with one extra element for map->value_size, which is used as the final
> > offset to subtract the previous offset from. Then, copy_map_value uses this
> > sorted offset array to memcpy while skipping the timer, spin lock,
> > and kptrs.
> >
> > Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
> > ---
> >  include/linux/bpf.h  | 55 +++++++++++++++++++++++---------------------
> >  kernel/bpf/syscall.c | 52 +++++++++++++++++++++++++++++++++++++++++
> >  2 files changed, 81 insertions(+), 26 deletions(-)
> >
> > diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> > index 9d424d567dd3..6474d2d44b78 100644
> > --- a/include/linux/bpf.h
> > +++ b/include/linux/bpf.h
> > @@ -158,6 +158,10 @@ struct bpf_map_ops {
> >  enum {
> >         /* Support at most 8 pointers in a BPF map value */
> >         BPF_MAP_VALUE_OFF_MAX = 8,
> > +       BPF_MAP_OFF_ARR_MAX   = BPF_MAP_VALUE_OFF_MAX +
> > +                               1 + /* for bpf_spin_lock */
> > +                               1 + /* for bpf_timer */
> > +                               1,  /* for map->value_size sentinel */
> >  };
> >
> >  enum {
> > @@ -206,9 +210,17 @@ struct bpf_map {
> >         char name[BPF_OBJ_NAME_LEN];
> >         bool bypass_spec_v1;
> >         bool frozen; /* write-once; write-protected by freeze_mutex */
> > -       /* 6 bytes hole */
> > -
> > -       /* The 3rd and 4th cacheline with misc members to avoid false sharing
> > +       /* 2 bytes hole */
> > +       struct {
> > +               struct {
> > +                       u32 off;
> > +                       u8 sz;
>
> So here we are wasting 11 * 3 == 33 bytes of padding, right? And it
> will only increase as we add bpf_dynptr support soon.
>
> But if we split this struct into two arrays you won't be wasting any of that:
>
> struct {
>     u32 cnt;
>     u32 field_offs[BPF_MAP_OFF_ARR_MAX];
>     u8 szs[BPF_MAP_OFF_ARR_MAX]
> } off_arr;
>
> ?

Ok, will switch to this.

>
> Further, given the majority of BPF maps in the system probably won't
> use any of these special fields, would it make sense to dynamically
> allocate this portion of struct bpf_map?
>

Yes, dynamically allocating also makes sense. I'll go with that for v4.
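
A minimal sketch of what the dynamically allocated, split-array variant could look like (struct and field names are assumptions, not the eventual patch):

/* Maps without special fields keep map->off_arr == NULL and copy_map_value()
 * falls back to a single memcpy of the whole value.
 */
struct bpf_map_off_arr {
	u32 cnt;
	u32 field_off[BPF_MAP_OFF_ARR_MAX];
	u8  field_sz[BPF_MAP_OFF_ARR_MAX];
};

struct bpf_map {
	/* ... existing members ... */
	struct bpf_map_off_arr *off_arr;
};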

> > +               } field[BPF_MAP_OFF_ARR_MAX];
> > +               u32 cnt;
> > +       } off_arr;
> > +       /* 40 bytes hole */
> > +
> > +       /* The 4th and 5th cacheline with misc members to avoid false sharing
> >          * particularly with refcounting.
> >          */
> >         atomic64_t refcnt ____cacheline_aligned;
> > @@ -250,36 +262,27 @@ static inline void check_and_init_map_value(struct bpf_map *map, void *dst)
> >                 memset(dst + map->spin_lock_off, 0, sizeof(struct bpf_spin_lock));
> >         if (unlikely(map_value_has_timer(map)))
> >                 memset(dst + map->timer_off, 0, sizeof(struct bpf_timer));
> > +       if (unlikely(map_value_has_kptr(map))) {
> > +               struct bpf_map_value_off *tab = map->kptr_off_tab;
> > +               int i;
> > +
> > +               for (i = 0; i < tab->nr_off; i++)
> > +                       *(u64 *)(dst + tab->off[i].offset) = 0;
> > +       }
> >  }
> >
> >  /* copy everything but bpf_spin_lock and bpf_timer. There could be one of each. */
> >  static inline void copy_map_value(struct bpf_map *map, void *dst, void *src)
> >  {
> > -       u32 s_off = 0, s_sz = 0, t_off = 0, t_sz = 0;
> > +       int i;
> >
> > -       if (unlikely(map_value_has_spin_lock(map))) {
> > -               s_off = map->spin_lock_off;
> > -               s_sz = sizeof(struct bpf_spin_lock);
> > -       }
> > -       if (unlikely(map_value_has_timer(map))) {
> > -               t_off = map->timer_off;
> > -               t_sz = sizeof(struct bpf_timer);
> > -       }
> > +       memcpy(dst, src, map->off_arr.field[0].off);
> > +       for (i = 1; i < map->off_arr.cnt; i++) {
> > +               u32 curr_off = map->off_arr.field[i - 1].off;
> > +               u32 next_off = map->off_arr.field[i].off;
> >
> > -       if (unlikely(s_sz || t_sz)) {
> > -               if (s_off < t_off || !s_sz) {
> > -                       swap(s_off, t_off);
> > -                       swap(s_sz, t_sz);
> > -               }
> > -               memcpy(dst, src, t_off);
> > -               memcpy(dst + t_off + t_sz,
> > -                      src + t_off + t_sz,
> > -                      s_off - t_off - t_sz);
> > -               memcpy(dst + s_off + s_sz,
> > -                      src + s_off + s_sz,
> > -                      map->value_size - s_off - s_sz);
> > -       } else {
> > -               memcpy(dst, src, map->value_size);
> > +               curr_off += map->off_arr.field[i - 1].sz;
> > +               memcpy(dst + curr_off, src + curr_off, next_off - curr_off);
> >         }
>
> We can also do away with the value_size sentinel value if we rewrite this
> logic as follows:
>
> u32 cur_off = 0;
> int i;
>
> for (i = 0; i < map->off_arr.cnt; i++) {
>     memcpy(dst + cur_off, src + cur_off,  map->off_arr.field[i].off - cur_off);
>     cur_off += map->off_arr.field[i].sz;
> }
>
> memcpy(dst + cur_off, src + cur_off, map->value_size - cur_off);
>

Looks better, will switch.

>
> It will be as optimal but won't require the value_size sentinel.
>
> >  }
> >  void copy_map_value_locked(struct bpf_map *map, void *dst, void *src,
> > diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
> > index 5990d6fa97ab..7b32537bd81f 100644
> > --- a/kernel/bpf/syscall.c
> > +++ b/kernel/bpf/syscall.c
> > @@ -30,6 +30,7 @@
> >  #include <linux/pgtable.h>
> >  #include <linux/bpf_lsm.h>
> >  #include <linux/poll.h>
> > +#include <linux/sort.h>
> >  #include <linux/bpf-netns.h>
> >  #include <linux/rcupdate_trace.h>
> >  #include <linux/memcontrol.h>
> > @@ -851,6 +852,55 @@ int map_check_no_btf(const struct bpf_map *map,
> >         return -ENOTSUPP;
> >  }
> >
> > +static int map_off_arr_cmp(const void *_a, const void *_b)
> > +{
> > +       const u32 a = *(const u32 *)_a;
> > +       const u32 b = *(const u32 *)_b;
> > +
> > +       if (a < b)
> > +               return -1;
> > +       else if (a > b)
> > +               return 1;
> > +       return 0;
> > +}
> > +
> > +static void map_populate_off_arr(struct bpf_map *map)
> > +{
> > +       u32 i;
> > +
> > +       map->off_arr.cnt = 0;
> > +       if (map_value_has_spin_lock(map)) {
> > +               i = map->off_arr.cnt;
> > +
> > +               map->off_arr.field[i].off = map->spin_lock_off;
> > +               map->off_arr.field[i].sz = sizeof(struct bpf_spin_lock);
> > +               map->off_arr.cnt++;
> > +       }
> > +       if (map_value_has_timer(map)) {
> > +               i = map->off_arr.cnt;
> > +
> > +               map->off_arr.field[i].off = map->timer_off;
> > +               map->off_arr.field[i].sz = sizeof(struct bpf_timer);
> > +               map->off_arr.cnt++;
> > +       }
> > +       if (map_value_has_kptr(map)) {
> > +               struct bpf_map_value_off *tab = map->kptr_off_tab;
> > +               u32 j = map->off_arr.cnt;
> > +
> > +               for (i = 0; i < tab->nr_off; i++) {
> > +                       map->off_arr.field[j + i].off = tab->off[i].offset;
> > +                       map->off_arr.field[j + i].sz = sizeof(u64);
> > +               }
> > +               map->off_arr.cnt += tab->nr_off;
> > +       }
> > +
> > +       map->off_arr.field[map->off_arr.cnt++].off = map->value_size;
>
> Using a pointer for map->off_arr.field[j + i] and incrementing it
> along the cnt would make this code more succinct, and possibly even a
> bit more efficient. With my above suggestion to split offs from szs,
> you'll need two pointers, but still might be cleaner.
>

Ack.

> > +       if (map->off_arr.cnt == 1)
> > +               return;
> > +       sort(map->off_arr.field, map->off_arr.cnt, sizeof(map->off_arr.field[0]),
> > +            map_off_arr_cmp, NULL);
>
> See how Jiri is using sort_r() to sort two related arrays and keep
> them in sync w.r.t. order.
>

Thanks for the pointer.
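
For reference, a rough sketch of sorting the two split arrays in lockstep with sort_r(); the callback signatures here are my recollection of the priv-carrying variant and should be checked against include/linux/sort.h:

struct off_arr_sort_ctx {
	u32 *offs;
	u8  *szs;
};

static int off_arr_cmp(const void *a, const void *b, const void *priv)
{
	const u32 *off_a = a, *off_b = b;

	return *off_a < *off_b ? -1 : *off_a > *off_b;
}

static void off_arr_swap(void *a, void *b, int size, const void *priv)
{
	const struct off_arr_sort_ctx *ctx = priv;
	u32 i = (u32 *)a - ctx->offs, j = (u32 *)b - ctx->offs;

	/* swap the offsets and mirror the same swap in the sizes array */
	swap(ctx->offs[i], ctx->offs[j]);
	swap(ctx->szs[i], ctx->szs[j]);
}

/* usage: sort_r(ctx.offs, cnt, sizeof(ctx.offs[0]), off_arr_cmp, off_arr_swap, &ctx); */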

> > +}
> > +
> >  static int map_check_btf(struct bpf_map *map, const struct btf *btf,
> >                          u32 btf_key_id, u32 btf_value_id)
> >  {
> > @@ -1018,6 +1068,8 @@ static int map_create(union bpf_attr *attr)
> >                         attr->btf_vmlinux_value_type_id;
> >         }
> >
> > +       map_populate_off_arr(map);
> > +
> >         err = security_bpf_map_alloc(map);
> >         if (err)
> >                 goto free_map;
> > --
> > 2.35.1
> >

--
Kartikeya

* Re: [PATCH bpf-next v3 09/13] bpf: Wire up freeing of referenced kptr
  2022-03-22 21:10   ` Alexei Starovoitov
@ 2022-03-25 15:07     ` Kumar Kartikeya Dwivedi
  0 siblings, 0 replies; 44+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2022-03-25 15:07 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Toke Høiland-Jørgensen, Jesper Dangaard Brouer

On Wed, Mar 23, 2022 at 02:40:17AM IST, Alexei Starovoitov wrote:
> On Sun, Mar 20, 2022 at 8:55 AM Kumar Kartikeya Dwivedi
> <memxor@gmail.com> wrote:
> >
> > +               /* Find and stash the function pointer for the destruction function that
> > +                * needs to be eventually invoked from the map free path.
> > +                */
> > +               if (info_arr[i].flags & BPF_MAP_VALUE_OFF_F_REF) {
> > +                       const struct btf_type *dtor_func, *dtor_func_proto;
> > +                       const struct btf_param *args;
> > +                       const char *dtor_func_name;
> > +                       unsigned long addr;
> > +                       s32 dtor_btf_id;
> > +                       u32 nr_args;
> > +
> > +                       /* This call also serves as a whitelist of allowed objects that
> > +                        * can be used as a referenced pointer and be stored in a map at
> > +                        * the same time.
> > +                        */
> > +                       dtor_btf_id = btf_find_dtor_kfunc(off_btf, id);
> > +                       if (dtor_btf_id < 0) {
> > +                               ret = dtor_btf_id;
> > +                               btf_put(off_btf);
> > +                               goto end;
> > +                       }
> > +
> > +                       dtor_func = btf_type_by_id(off_btf, dtor_btf_id);
> > +                       if (!dtor_func || !btf_type_is_func(dtor_func)) {
> > +                               ret = -EINVAL;
> > +                               btf_put(off_btf);
> > +                               goto end;
> > +                       }
> > +
> > +                       dtor_func_proto = btf_type_by_id(off_btf, dtor_func->type);
> > +                       if (!dtor_func_proto || !btf_type_is_func_proto(dtor_func_proto)) {
> > +                               ret = -EINVAL;
> > +                               btf_put(off_btf);
> > +                               goto end;
> > +                       }
> > +
> > +                       /* Make sure the prototype of the destructor kfunc is 'void func(type *)' */
> > +                       t = btf_type_by_id(off_btf, dtor_func_proto->type);
> > +                       if (!t || !btf_type_is_void(t)) {
> > +                               ret = -EINVAL;
> > +                               btf_put(off_btf);
> > +                               goto end;
> > +                       }
> > +
> > +                       nr_args = btf_type_vlen(dtor_func_proto);
> > +                       args = btf_params(dtor_func_proto);
> > +
> > +                       t = NULL;
> > +                       if (nr_args)
> > +                               t = btf_type_by_id(off_btf, args[0].type);
> > +                       /* Allow any pointer type, as width on targets Linux supports
> > +                        * will be same for all pointer types (i.e. sizeof(void *))
> > +                        */
> > +                       if (nr_args != 1 || !t || !btf_type_is_ptr(t)) {
> > +                               ret = -EINVAL;
> > +                               btf_put(off_btf);
> > +                               goto end;
> > +                       }
> > +
> > +                       if (btf_is_module(btf)) {
> > +                               mod = btf_try_get_module(off_btf);
> > +                               if (!mod) {
> > +                                       ret = -ENXIO;
> > +                                       btf_put(off_btf);
> > +                                       goto end;
> > +                               }
> > +                       }
> > +
> > +                       dtor_func_name = __btf_name_by_offset(off_btf, dtor_func->name_off);
> > +                       addr = kallsyms_lookup_name(dtor_func_name);
> > +                       if (!addr) {
> > +                               ret = -EINVAL;
> > +                               module_put(mod);
> > +                               btf_put(off_btf);
> > +                               goto end;
> > +                       }
> > +                       tab->off[i].dtor = (void *)addr;
>
> Most of the above should probably be in register_btf_id_dtor_kfuncs().
> It's best to fail early.
> Here we'll just remember dtor function pointer to speed up release.

Ok, will move all of these checks to register_btf_id_dtor_kfuncs.
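
Then the map free path only needs to walk the kptr offset table and call the
stashed dtor on whatever is still stored there, roughly like this (sketch
only; the per-offset desc is assumed to carry the flags and the dtor pointer,
as in this series, and module refcount handling is omitted):

	void bpf_map_free_kptrs(struct bpf_map *map, void *map_value)
	{
		struct bpf_map_value_off *tab = map->kptr_off_tab;
		unsigned long old_ptr;
		int i;

		for (i = 0; i < tab->nr_off; i++) {
			struct bpf_map_value_off_desc *off_desc = &tab->off[i];

			/* Unreferenced kptrs need no cleanup, just forget them */
			if (!(off_desc->flags & BPF_MAP_VALUE_OFF_F_REF))
				continue;
			/* Clear the slot so a dangling pointer is never visible,
			 * then release the reference via the stashed dtor.
			 */
			old_ptr = xchg((unsigned long *)(map_value + off_desc->offset), 0);
			if (old_ptr)
				off_desc->dtor((void *)old_ptr);
		}
	}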

--
Kartikeya

* Re: [PATCH bpf-next v3 03/13] bpf: Allow storing unreferenced kptr in map
  2022-03-25 14:42         ` Kumar Kartikeya Dwivedi
@ 2022-03-25 22:59           ` Andrii Nakryiko
  0 siblings, 0 replies; 44+ messages in thread
From: Andrii Nakryiko @ 2022-03-25 22:59 UTC (permalink / raw)
  To: Kumar Kartikeya Dwivedi
  Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Toke Høiland-Jørgensen, Jesper Dangaard Brouer

On Fri, Mar 25, 2022 at 7:42 AM Kumar Kartikeya Dwivedi
<memxor@gmail.com> wrote:
>
> On Wed, Mar 23, 2022 at 12:22:20AM IST, Andrii Nakryiko wrote:
> > On Tue, Mar 22, 2022 at 12:16 AM Kumar Kartikeya Dwivedi
> > <memxor@gmail.com> wrote:
> > >
> > > On Tue, Mar 22, 2022 at 11:15:42AM IST, Andrii Nakryiko wrote:
> > > > On Sun, Mar 20, 2022 at 8:55 AM Kumar Kartikeya Dwivedi
> > > > <memxor@gmail.com> wrote:
> > > > >
> > > > > This commit introduces a new pointer type 'kptr' which can be embedded
> > > > > in a map value to hold a PTR_TO_BTF_ID stored by a BPF program during
> > > > > its invocation. To store to such a kptr, the BPF program's PTR_TO_BTF_ID
> > > > > register must have the same type as in the map value's BTF, and loading
> > > > > a kptr marks the destination register as PTR_TO_BTF_ID with the correct
> > > > > kernel BTF and BTF ID.
> > > > >
> > > > > Such kptrs are unreferenced, i.e. by the time another invocation of the
> > > > > BPF program loads this pointer, the object which the pointer points to
> > > > > may no longer exist. Since PTR_TO_BTF_ID loads (using BPF_LDX) are
> > > > > patched to PROBE_MEM loads by the verifier, it is safe to allow the user
> > > > > to still access such an invalid pointer, but passing such pointers into
> > > > > BPF helpers and kfuncs should not be permitted. A future patch in this
> > > > > series will close this gap.
> > > > >
> > > > > The flexibility offered by allowing programs to dereference such invalid
> > > > > pointers while being safe at runtime frees the verifier from doing
> > > > > complex lifetime tracking. As long as the user ensures that the object
> > > > > remains valid, the data the program reads from the kernel object will be
> > > > > valid.
> > > > >
> > > > > The user indicates that a certain pointer must be treated as a kptr
> > > > > capable of accepting stores of PTR_TO_BTF_ID of a certain type, by using
> > > > > a BTF type tag 'kptr' on the pointed-to type of the pointer. Then, this
> > > > > information is recorded in the object BTF which will be passed into the
> > > > > kernel by way of the map's BTF information. The name and kind from the
> > > > > map value BTF are used to look up the in-kernel type, and the actual BTF
> > > > > and BTF ID are recorded in the map struct in a new kptr_off_tab member.
> > > > > For now, only storing pointers to structs is permitted.
> > > > >
> > > > > An example of this specification is shown below:
> > > > >
> > > > >         #define __kptr __attribute__((btf_type_tag("kptr")))
> > > > >
> > > > >         struct map_value {
> > > > >                 ...
> > > > >                 struct task_struct __kptr *task;
> > > > >                 ...
> > > > >         };
> > > > >
> > > > > Then, in a BPF program, the user may store a PTR_TO_BTF_ID with the type
> > > > > task_struct into the map, and then load it later.
> > > > >
> > > > > Note that the destination register is marked PTR_TO_BTF_ID_OR_NULL; as
> > > > > the verifier cannot know statically whether the value is NULL or not, it
> > > > > must treat all potential loads at that map value offset as loading a
> > > > > possibly NULL pointer.
> > > > >
> > > > > Only BPF_LDX, BPF_STX, and BPF_ST with insn->imm = 0 (to denote NULL)
> > > > > are the instructions allowed to access such a pointer. On BPF_LDX, the
> > > > > destination register is updated to be a PTR_TO_BTF_ID, and on BPF_STX,
> > > > > it is checked whether the source register type is a PTR_TO_BTF_ID with
> > > > > the same BTF type as specified in the map BTF. The access size must
> > > > > always be BPF_DW.
> > > > >
> > > > > For the map in map support, the kptr_off_tab for the outer map is copied
> > > > > from the inner map's kptr_off_tab. It was chosen to do a deep copy
> > > > > instead of introducing a refcount to kptr_off_tab, because the copy only
> > > > > needs to be done when parameterizing using inner_map_fd in the map in map
> > > > > case, and hence would be unnecessary for all other users.
> > > > >
> > > > > It is not permitted to use the MAP_FREEZE command or mmap for a BPF map
> > > > > having kptrs, similar to the bpf_timer case.
> > > > >
> > > > > Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
> > > > > ---
> > > > >  include/linux/bpf.h     |  29 +++++++-
> > > > >  include/linux/btf.h     |   2 +
> > > > >  kernel/bpf/btf.c        | 161 ++++++++++++++++++++++++++++++++++------
> > > > >  kernel/bpf/map_in_map.c |   5 +-
> > > > >  kernel/bpf/syscall.c    | 112 +++++++++++++++++++++++++++-
> > > > >  kernel/bpf/verifier.c   | 120 ++++++++++++++++++++++++++++++
> > > > >  6 files changed, 401 insertions(+), 28 deletions(-)
> > > > >
> > > >
> > > > [...]
> > > >
> > > > > +static int btf_find_field_kptr(const struct btf *btf, const struct btf_type *t,
> > > > > +                              u32 off, int sz, struct btf_field_info *info)
> > > > > +{
> > > > > +       /* For PTR, sz is always == 8 */
> > > > > +       if (!btf_type_is_ptr(t))
> > > > > +               return BTF_FIELD_IGNORE;
> > > > > +       t = btf_type_by_id(btf, t->type);
> > > > > +
> > > > > +       if (!btf_type_is_type_tag(t))
> > > > > +               return BTF_FIELD_IGNORE;
> > > > > +       /* Reject extra tags */
> > > > > +       if (btf_type_is_type_tag(btf_type_by_id(btf, t->type)))
> > > > > +               return -EINVAL;
> > > >
> > > > Can we have tag -> const -> tag -> volatile -> tag in BTF? Wouldn't
> > > > you assume there are no more tags with just this check?
> > > >
> > >
> > > All tags are supposed to come before other modifiers, so tags come first,
> > > contiguously. See [0].
> >
> > Doesn't seem like the kernel's BTF validator enforces this, we should
> > probably tighten that up a bit. Clang won't emit such BTF, but nothing
> > prevents a user from generating non-conformant BTF on their own either.
> >
>
> Right, what would be a good place to do this validation? When loading BTF
> using the bpf(2) syscall, or when we do btf_parse_kptrs?

Given this is a generic BTF property we are expecting and enforcing, it
should be in the normal BTF validation logic.
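
E.g., something along these lines as part of BTF validation (rough sketch
only, function name made up):

	/* All type tags in a modifier chain must come first and be contiguous;
	 * reject a TYPE_TAG that shows up after any other modifier.
	 */
	static int btf_check_type_tag_order(const struct btf *btf, const struct btf_type *t)
	{
		bool seen_non_tag = false;

		while (btf_type_is_modifier(t)) {
			if (btf_type_is_type_tag(t)) {
				if (seen_non_tag)
					return -EINVAL;
			} else {
				seen_non_tag = true;
			}
			t = btf_type_by_id(btf, t->type);
		}
		return 0;
	}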

>
> > >
> > > Alexei suggested to reject all other tags for now.
> > >
> > >  [0]: https://lore.kernel.org/bpf/20220127154627.665163-1-yhs@fb.com
> > >
> > > >
> > > > > +       if (strcmp("kptr", __btf_name_by_offset(btf, t->name_off)))
> > > > > +               return -EINVAL;
> > > > > +
> > > > > +       /* Get the base type */
> > > > > +       if (btf_type_is_modifier(t))
> > > > > +               t = btf_type_skip_modifiers(btf, t->type, NULL);
> > > > > +       /* Only pointer to struct is allowed */
> > > > > +       if (!__btf_type_is_struct(t))
> > > > > +               return -EINVAL;
> > > > > +
> > > > > +       info->type = t;
> > > > > +       info->off = off;
> > > > > +       return BTF_FIELD_FOUND;
> > > > >  }
> >
> > [...]
> >
> > > > > +       if (map_value_has_kptr(map)) {
> > > > > +               if (!bpf_capable())
> > > > > +                       return -EPERM;
> > > > > +               if (map->map_flags & BPF_F_RDONLY_PROG) {
> > > > > +                       ret = -EACCES;
> > > > > +                       goto free_map_tab;
> > > > > +               }
> > > > > +               if (map->map_type != BPF_MAP_TYPE_HASH &&
> > > > > +                   map->map_type != BPF_MAP_TYPE_LRU_HASH &&
> > > > > +                   map->map_type != BPF_MAP_TYPE_ARRAY) {
> > > >
> > > > what about PERCPU_ARRAY, for instance? Is there something
> > > > fundamentally wrong to support it for local storage maps?
> > > >
> > >
> > > Plugging in support into maps that already take timers was easier to begin
> > > with; I can do percpu support as a follow-up.
> > >
> > > In the case of local storage, I'm a little worried about how we prevent
> > > creating reference cycles. There was a thread where find_get_task_by_pid
> > > was proposed as an unstable helper; once we e.g. support embedding
> > > task_struct in a map and allow storing such a pointer in task local
> > > storage, it would be pretty easy to construct a reference cycle.
> > >
> > > Should we think about this now, or should we worry about it when task_struct
> > > is actually supported as a kptr? It's not only task_struct, the same applies
> > > to sock.
> > >
> > > There's a discussion to be had, hence I left it out for now.
> >
> > PERCPU_ARRAY seemed (and still seems) like a safe map to support (same
> > as PERCPU_HASH), which is why I asked. I see concerns about local
> > storage, though, thanks.
> >
>
> I'll look into it after this lands.
>
> > >
> > > > > +                       ret = -EOPNOTSUPP;
> > > > > +                       goto free_map_tab;
> > > > > +               }
> > > > > +       }
> > > > > +

[...]

* Re: [PATCH bpf-next v3 05/13] bpf: Allow storing referenced kptr in map
  2022-03-25 14:57     ` Kumar Kartikeya Dwivedi
@ 2022-03-25 23:39       ` Martin KaFai Lau
  2022-03-26  1:01         ` Kumar Kartikeya Dwivedi
  0 siblings, 1 reply; 44+ messages in thread
From: Martin KaFai Lau @ 2022-03-25 23:39 UTC (permalink / raw)
  To: Kumar Kartikeya Dwivedi
  Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Toke Høiland-Jørgensen, Jesper Dangaard Brouer

On Fri, Mar 25, 2022 at 08:27:00PM +0530, Kumar Kartikeya Dwivedi wrote:
> On Wed, Mar 23, 2022 at 02:29:12AM IST, Martin KaFai Lau wrote:
> > On Sun, Mar 20, 2022 at 09:25:02PM +0530, Kumar Kartikeya Dwivedi wrote:
> > >  static int map_kptr_match_type(struct bpf_verifier_env *env,
> > >  			       struct bpf_map_value_off_desc *off_desc,
> > > -			       struct bpf_reg_state *reg, u32 regno)
> > > +			       struct bpf_reg_state *reg, u32 regno,
> > > +			       bool ref_ptr)
> > >  {
> > >  	const char *targ_name = kernel_type_name(off_desc->btf, off_desc->btf_id);
> > >  	const char *reg_name = "";
> > > +	bool fixed_off_ok = true;
> > >
> > >  	if (reg->type != PTR_TO_BTF_ID && reg->type != PTR_TO_BTF_ID_OR_NULL)
> > >  		goto bad_type;
> > > @@ -3525,7 +3530,26 @@ static int map_kptr_match_type(struct bpf_verifier_env *env,
> > >  	/* We need to verify reg->type and reg->btf, before accessing reg->btf */
> > >  	reg_name = kernel_type_name(reg->btf, reg->btf_id);
> > >
> > > -	if (__check_ptr_off_reg(env, reg, regno, true))
> > > +	if (ref_ptr) {
> > > +		if (!reg->ref_obj_id) {
> > > +			verbose(env, "R%d must be referenced %s%s\n", regno,
> > > +				reg_type_str(env, PTR_TO_BTF_ID), targ_name);
> > > +			return -EACCES;
> > > +		}
> > The is_release_function() checks under check_helper_call() are
> > not the same?
> >
> > > +		/* reg->off can be used to store pointer to a certain type formed by
> > > +		 * incrementing pointer of a parent structure the object is embedded in,
> > > +		 * e.g. map may expect unreferenced struct path *, and user should be
> > > +		 * allowed a store using &file->f_path. However, in the case of
> > > +		 * referenced pointer, we cannot do this, because the reference is only
> > > +		 * for the parent structure, not its embedded object(s), and because
> > > +		 * the transfer of ownership happens for the original pointer to and
> > > +		 * from the map (before its eventual release).
> > > +		 */
> > > +		if (reg->off)
> > > +			fixed_off_ok = false;
> > I thought the new check_func_arg_reg_off() is supposed to handle the
> > is_release_function() case.  The check_func_arg_reg_off() called
> > in check_func_arg() can not handle this case?
> >
> 
> The difference there is, it wouldn't check for reg->off == 0 if reg->ref_obj_id
> is 0.
If ref_obj_id is not 0, check_func_arg_reg_off() will reject reg->off.
check_func_arg_reg_off is called after check_reg_type().

If ref_obj_id is 0, the is_release_function() check in the
check_helper_call() should complain:
	verbose(env, "func %s#%d reference has not been acquired before\n",
		func_id_name(func_id), func_id);

I am quite confused why it needs special reg->off and
reg->ref_obj_id checking here for the map_kptr helper taking a
PTR_TO_BTF_ID arg but not for other helpers taking a PTR_TO_BTF_ID arg.
Are the existing checks for other helpers taking a PTR_TO_BTF_ID
arg not enough?

> So in that case, I should probably check reg->ref_obj_id to be non-zero
> when ref_ptr is true, and then call check_func_arg_reg_off, with the comment
> that this would eventually be an argument to the release function, so the
> argument should be checked with check_func_arg_reg_off.



> 
> > > +	}
> > > +	/* var_off is rejected by __check_ptr_off_reg for PTR_TO_BTF_ID */
> > > +	if (__check_ptr_off_reg(env, reg, regno, fixed_off_ok))
> > >  		return -EACCES;
> > >
> > >  	if (!btf_struct_ids_match(&env->log, reg->btf, reg->btf_id, reg->off,
> >
> > [ ... ]
> >
> > > @@ -5390,6 +5473,7 @@ static const struct bpf_reg_types func_ptr_types = { .types = { PTR_TO_FUNC } };
> > >  static const struct bpf_reg_types stack_ptr_types = { .types = { PTR_TO_STACK } };
> > >  static const struct bpf_reg_types const_str_ptr_types = { .types = { PTR_TO_MAP_VALUE } };
> > >  static const struct bpf_reg_types timer_types = { .types = { PTR_TO_MAP_VALUE } };
> > > +static const struct bpf_reg_types kptr_types = { .types = { PTR_TO_MAP_VALUE } };
> > >
> > >  static const struct bpf_reg_types *compatible_reg_types[__BPF_ARG_TYPE_MAX] = {
> > >  	[ARG_PTR_TO_MAP_KEY]		= &map_key_value_types,
> > > @@ -5417,11 +5501,13 @@ static const struct bpf_reg_types *compatible_reg_types[__BPF_ARG_TYPE_MAX] = {
> > >  	[ARG_PTR_TO_STACK]		= &stack_ptr_types,
> > >  	[ARG_PTR_TO_CONST_STR]		= &const_str_ptr_types,
> > >  	[ARG_PTR_TO_TIMER]		= &timer_types,
> > > +	[ARG_PTR_TO_KPTR]		= &kptr_types,
> > >  };
> > >
> > >  static int check_reg_type(struct bpf_verifier_env *env, u32 regno,
> > >  			  enum bpf_arg_type arg_type,
> > > -			  const u32 *arg_btf_id)
> > > +			  const u32 *arg_btf_id,
> > > +			  struct bpf_call_arg_meta *meta)
> > >  {
> > >  	struct bpf_reg_state *regs = cur_regs(env), *reg = &regs[regno];
> > >  	enum bpf_reg_type expected, type = reg->type;
> > > @@ -5474,8 +5560,11 @@ static int check_reg_type(struct bpf_verifier_env *env, u32 regno,
> > >  			arg_btf_id = compatible->btf_id;
> > >  		}
> > >
> > > -		if (!btf_struct_ids_match(&env->log, reg->btf, reg->btf_id, reg->off,
> > > -					  btf_vmlinux, *arg_btf_id)) {
> > > +		if (meta->func_id == BPF_FUNC_kptr_xchg) {
> > > +			if (map_kptr_match_type(env, meta->kptr_off_desc, reg, regno, true))
> > > +				return -EACCES;
> > > +		} else if (!btf_struct_ids_match(&env->log, reg->btf, reg->btf_id, reg->off,
> > > +						 btf_vmlinux, *arg_btf_id)) {
> > >  			verbose(env, "R%d is of type %s but %s is expected\n",
> > >  				regno, kernel_type_name(reg->btf, reg->btf_id),
> > >  				kernel_type_name(btf_vmlinux, *arg_btf_id));
> 
> --
> Kartikeya

* Re: [PATCH bpf-next v3 05/13] bpf: Allow storing referenced kptr in map
  2022-03-25 23:39       ` Martin KaFai Lau
@ 2022-03-26  1:01         ` Kumar Kartikeya Dwivedi
  0 siblings, 0 replies; 44+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2022-03-26  1:01 UTC (permalink / raw)
  To: Martin KaFai Lau
  Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Toke Høiland-Jørgensen, Jesper Dangaard Brouer

On Sat, Mar 26, 2022 at 05:09:52AM IST, Martin KaFai Lau wrote:
> On Fri, Mar 25, 2022 at 08:27:00PM +0530, Kumar Kartikeya Dwivedi wrote:
> > On Wed, Mar 23, 2022 at 02:29:12AM IST, Martin KaFai Lau wrote:
> > > On Sun, Mar 20, 2022 at 09:25:02PM +0530, Kumar Kartikeya Dwivedi wrote:
> > > >  static int map_kptr_match_type(struct bpf_verifier_env *env,
> > > >  			       struct bpf_map_value_off_desc *off_desc,
> > > > -			       struct bpf_reg_state *reg, u32 regno)
> > > > +			       struct bpf_reg_state *reg, u32 regno,
> > > > +			       bool ref_ptr)
> > > >  {
> > > >  	const char *targ_name = kernel_type_name(off_desc->btf, off_desc->btf_id);
> > > >  	const char *reg_name = "";
> > > > +	bool fixed_off_ok = true;
> > > >
> > > >  	if (reg->type != PTR_TO_BTF_ID && reg->type != PTR_TO_BTF_ID_OR_NULL)
> > > >  		goto bad_type;
> > > > @@ -3525,7 +3530,26 @@ static int map_kptr_match_type(struct bpf_verifier_env *env,
> > > >  	/* We need to verify reg->type and reg->btf, before accessing reg->btf */
> > > >  	reg_name = kernel_type_name(reg->btf, reg->btf_id);
> > > >
> > > > -	if (__check_ptr_off_reg(env, reg, regno, true))
> > > > +	if (ref_ptr) {
> > > > +		if (!reg->ref_obj_id) {
> > > > +			verbose(env, "R%d must be referenced %s%s\n", regno,
> > > > +				reg_type_str(env, PTR_TO_BTF_ID), targ_name);
> > > > +			return -EACCES;
> > > > +		}
> > > The is_release_function() checks under check_helper_call() are
> > > not the same?
> > >
> > > > +		/* reg->off can be used to store pointer to a certain type formed by
> > > > +		 * incrementing pointer of a parent structure the object is embedded in,
> > > > +		 * e.g. map may expect unreferenced struct path *, and user should be
> > > > +		 * allowed a store using &file->f_path. However, in the case of
> > > > +		 * referenced pointer, we cannot do this, because the reference is only
> > > > +		 * for the parent structure, not its embedded object(s), and because
> > > > +		 * the transfer of ownership happens for the original pointer to and
> > > > +		 * from the map (before its eventual release).
> > > > +		 */
> > > > +		if (reg->off)
> > > > +			fixed_off_ok = false;
> > > I thought the new check_func_arg_reg_off() is supposed to handle the
> > > is_release_function() case.  The check_func_arg_reg_off() called
> > > in check_func_arg() can not handle this case?
> > >
> >
> > The difference there is, it wouldn't check for reg->off == 0 if reg->ref_obj_id
> > is 0.
> If ref_obj_id is not 0, check_func_arg_reg_off() will reject reg->off.
> check_func_arg_reg_off is called after check_reg_type().
>
> If ref_obj_id is 0, the is_release_function() check in the
> check_helper_call() should complain:
> 	verbose(env, "func %s#%d reference has not been acquired before\n",
> 		func_id_name(func_id), func_id);
>
> I am quite confused why it needs special reg->off and
> reg->ref_obj_id checking here for the map_kptr helper taking a
> PTR_TO_BTF_ID arg but not for other helpers taking a PTR_TO_BTF_ID arg.
> Are the existing checks for other helpers taking a PTR_TO_BTF_ID
> arg not enough?
>

Yes, you're right, it should be enough. We just need to check for the normal
case here, with fixed_off_ok = true, since that store can come from BPF_STX. In
the referenced case it was earlier also possible to store using BPF_XCHG, but
not anymore, so now the checks for the helper should be enough to complain.

Will drop this in v4, and just keep __check_ptr_off_reg(env, reg, regno, false).

> > So in that case, I should probably check reg->ref_obj_id to be non-zero
> > when ref_ptr is true, and then call check_func_arg_reg_off, with the comment
> > that this would eventually be an argument to the release function, so the
> > argument should be checked with check_func_arg_reg_off.
>
>
>
> >
> > > > +	}
> > > > +	/* var_off is rejected by __check_ptr_off_reg for PTR_TO_BTF_ID */
> > > > +	if (__check_ptr_off_reg(env, reg, regno, fixed_off_ok))
> > > >  		return -EACCES;
> > > >
> > > >  	if (!btf_struct_ids_match(&env->log, reg->btf, reg->btf_id, reg->off,
> > >
> > > [ ... ]
> > >
> > > > @@ -5390,6 +5473,7 @@ static const struct bpf_reg_types func_ptr_types = { .types = { PTR_TO_FUNC } };
> > > >  static const struct bpf_reg_types stack_ptr_types = { .types = { PTR_TO_STACK } };
> > > >  static const struct bpf_reg_types const_str_ptr_types = { .types = { PTR_TO_MAP_VALUE } };
> > > >  static const struct bpf_reg_types timer_types = { .types = { PTR_TO_MAP_VALUE } };
> > > > +static const struct bpf_reg_types kptr_types = { .types = { PTR_TO_MAP_VALUE } };
> > > >
> > > >  static const struct bpf_reg_types *compatible_reg_types[__BPF_ARG_TYPE_MAX] = {
> > > >  	[ARG_PTR_TO_MAP_KEY]		= &map_key_value_types,
> > > > @@ -5417,11 +5501,13 @@ static const struct bpf_reg_types *compatible_reg_types[__BPF_ARG_TYPE_MAX] = {
> > > >  	[ARG_PTR_TO_STACK]		= &stack_ptr_types,
> > > >  	[ARG_PTR_TO_CONST_STR]		= &const_str_ptr_types,
> > > >  	[ARG_PTR_TO_TIMER]		= &timer_types,
> > > > +	[ARG_PTR_TO_KPTR]		= &kptr_types,
> > > >  };
> > > >
> > > >  static int check_reg_type(struct bpf_verifier_env *env, u32 regno,
> > > >  			  enum bpf_arg_type arg_type,
> > > > -			  const u32 *arg_btf_id)
> > > > +			  const u32 *arg_btf_id,
> > > > +			  struct bpf_call_arg_meta *meta)
> > > >  {
> > > >  	struct bpf_reg_state *regs = cur_regs(env), *reg = &regs[regno];
> > > >  	enum bpf_reg_type expected, type = reg->type;
> > > > @@ -5474,8 +5560,11 @@ static int check_reg_type(struct bpf_verifier_env *env, u32 regno,
> > > >  			arg_btf_id = compatible->btf_id;
> > > >  		}
> > > >
> > > > -		if (!btf_struct_ids_match(&env->log, reg->btf, reg->btf_id, reg->off,
> > > > -					  btf_vmlinux, *arg_btf_id)) {
> > > > +		if (meta->func_id == BPF_FUNC_kptr_xchg) {
> > > > +			if (map_kptr_match_type(env, meta->kptr_off_desc, reg, regno, true))
> > > > +				return -EACCES;
> > > > +		} else if (!btf_struct_ids_match(&env->log, reg->btf, reg->btf_id, reg->off,
> > > > +						 btf_vmlinux, *arg_btf_id)) {
> > > >  			verbose(env, "R%d is of type %s but %s is expected\n",
> > > >  				regno, kernel_type_name(reg->btf, reg->btf_id),
> > > >  				kernel_type_name(btf_vmlinux, *arg_btf_id));
> >
> > --
> > Kartikeya

--
Kartikeya

Thread overview: 44+ messages
2022-03-20 15:54 [PATCH bpf-next v3 00/13] Introduce typed pointer support in BPF maps Kumar Kartikeya Dwivedi
2022-03-20 15:54 ` [PATCH bpf-next v3 01/13] bpf: Make btf_find_field more generic Kumar Kartikeya Dwivedi
2022-03-20 15:54 ` [PATCH bpf-next v3 02/13] bpf: Move check_ptr_off_reg before check_map_access Kumar Kartikeya Dwivedi
2022-03-20 15:55 ` [PATCH bpf-next v3 03/13] bpf: Allow storing unreferenced kptr in map Kumar Kartikeya Dwivedi
2022-03-21 23:39   ` Joanne Koong
2022-03-22  7:04     ` Kumar Kartikeya Dwivedi
2022-03-22 20:22       ` Andrii Nakryiko
2022-03-25 14:51         ` Kumar Kartikeya Dwivedi
2022-03-22  5:45   ` Andrii Nakryiko
2022-03-22  7:16     ` Kumar Kartikeya Dwivedi
2022-03-22  7:43       ` Kumar Kartikeya Dwivedi
2022-03-22 18:52       ` Andrii Nakryiko
2022-03-25 14:42         ` Kumar Kartikeya Dwivedi
2022-03-25 22:59           ` Andrii Nakryiko
2022-03-22 18:06   ` Martin KaFai Lau
2022-03-25 14:45     ` Kumar Kartikeya Dwivedi
2022-03-20 15:55 ` [PATCH bpf-next v3 04/13] bpf: Indicate argument that will be released in bpf_func_proto Kumar Kartikeya Dwivedi
2022-03-22  1:47   ` Joanne Koong
2022-03-22  7:34     ` Kumar Kartikeya Dwivedi
2022-03-20 15:55 ` [PATCH bpf-next v3 05/13] bpf: Allow storing referenced kptr in map Kumar Kartikeya Dwivedi
2022-03-22 20:59   ` Martin KaFai Lau
2022-03-25 14:57     ` Kumar Kartikeya Dwivedi
2022-03-25 23:39       ` Martin KaFai Lau
2022-03-26  1:01         ` Kumar Kartikeya Dwivedi
2022-03-20 15:55 ` [PATCH bpf-next v3 06/13] bpf: Prevent escaping of kptr loaded from maps Kumar Kartikeya Dwivedi
2022-03-22  5:58   ` Andrii Nakryiko
2022-03-22  7:18     ` Kumar Kartikeya Dwivedi
2022-03-20 15:55 ` [PATCH bpf-next v3 07/13] bpf: Adapt copy_map_value for multiple offset case Kumar Kartikeya Dwivedi
2022-03-22 20:38   ` Andrii Nakryiko
2022-03-25 15:06     ` Kumar Kartikeya Dwivedi
2022-03-20 15:55 ` [PATCH bpf-next v3 08/13] bpf: Populate pairs of btf_id and destructor kfunc in btf Kumar Kartikeya Dwivedi
2022-03-20 15:55 ` [PATCH bpf-next v3 09/13] bpf: Wire up freeing of referenced kptr Kumar Kartikeya Dwivedi
2022-03-22 20:51   ` Andrii Nakryiko
2022-03-25 14:50     ` Kumar Kartikeya Dwivedi
2022-03-22 21:10   ` Alexei Starovoitov
2022-03-25 15:07     ` Kumar Kartikeya Dwivedi
2022-03-20 15:55 ` [PATCH bpf-next v3 10/13] bpf: Teach verifier about kptr_get kfunc helpers Kumar Kartikeya Dwivedi
2022-03-20 15:55 ` [PATCH bpf-next v3 11/13] libbpf: Add kptr type tag macros to bpf_helpers.h Kumar Kartikeya Dwivedi
2022-03-20 15:55 ` [PATCH bpf-next v3 12/13] selftests/bpf: Add C tests for kptr Kumar Kartikeya Dwivedi
2022-03-22 21:00   ` Andrii Nakryiko
2022-03-25 14:52     ` Kumar Kartikeya Dwivedi
2022-03-24  9:10   ` Jiri Olsa
2022-03-25 14:52     ` Kumar Kartikeya Dwivedi
2022-03-20 15:55 ` [PATCH bpf-next v3 13/13] selftests/bpf: Add verifier " Kumar Kartikeya Dwivedi
