* [PATCH bpf-next v5 00/13] Introduce typed pointer support in BPF maps
@ 2022-04-15 16:03 Kumar Kartikeya Dwivedi
  2022-04-15 16:03 ` [PATCH bpf-next v5 01/13] bpf: Make btf_find_field more generic Kumar Kartikeya Dwivedi
                   ` (13 more replies)
  0 siblings, 14 replies; 30+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2022-04-15 16:03 UTC (permalink / raw)
  To: bpf
  Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Joanne Koong, Toke Høiland-Jørgensen,
	Jesper Dangaard Brouer

This set enables storing pointers of a certain type in a BPF map, and extends
the verifier to enforce type safety and lifetime correctness properties.

The infrastructure being added is generic enough to allow storing any kind of
pointer whose type is available through BTF (user or kernel) in the future
(e.g. strongly typed memory allocation in BPF programs). Such pointers are
internally tracked in the verifier as PTR_TO_BTF_ID, but for now the series
limits them to two kinds of pointers obtained from the kernel.

Naturally, use of this feature requires the map to have BTF information.

1. Unreferenced kernel pointer

In this case, there are very few restrictions. The pointer type being stored
must match the type declared in the map value. However, such a pointer, once
loaded from the map, can only be dereferenced; it cannot be passed to any
in-kernel helpers or kernel functions available to the program. This is because
while the verifier's exception handling mechanism converts BPF_LDX into
PROBE_MEM loads, which are then handled specially by the JIT implementation,
the same liberty is not available to accesses inside the kernel. By the time
the pointer is passed into a helper, it carries no lifetime guarantees about
the object it points to, and may well be referencing invalid memory.
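
For illustration, a minimal BPF-side sketch of the unreferenced case could
look as follows (map and program names are made up, and the __kptr macro
matches the one added to bpf_helpers.h later in this series):

	#include <vmlinux.h>
	#include <bpf/bpf_helpers.h>

	#define __kptr __attribute__((btf_type_tag("kptr")))

	struct map_value {
		struct task_struct __kptr *task;
	};

	struct {
		__uint(type, BPF_MAP_TYPE_ARRAY);
		__uint(max_entries, 1);
		__type(key, int);
		__type(value, struct map_value);
	} array_map SEC(".maps");

	SEC("tc")
	int read_unref_kptr(struct __sk_buff *ctx)
	{
		struct task_struct *task;
		struct map_value *v;
		int key = 0;

		v = bpf_map_lookup_elem(&array_map, &key);
		if (!v)
			return 0;
		/* Load marks the register PTR_TO_BTF_ID_OR_NULL; the object
		 * may already be gone, but dereferences become PROBE_MEM
		 * loads, so reading fields is allowed.
		 */
		task = v->task;
		if (!task)
			return 0;
		bpf_printk("pid=%d", task->pid);
		/* Passing 'task' itself to helpers/kfuncs is rejected. */
		return 0;
	}

	char _license[] SEC("license") = "GPL";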

2. Referenced kernel pointer

This case imposes a lot of restrictions on the programmer, to ensure safety.
To transfer ownership of a reference held by the BPF program into the map, the
user must use the bpf_kptr_xchg helper, which returns the old pointer contained
in the map as an acquired reference, and releases the verifier state for the
referenced pointer being exchanged, as it moves into the map.

The returned pointer is a normal PTR_TO_BTF_ID that can be used with in-kernel
helpers and kernel functions callable by the program.
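
A sketch of the exchange pattern, reusing the includes and license from the
sketch in the previous section, with a 'kptr_ref' tagged field and the test
acquire/release kfuncs that the selftests in this series build on (the kfunc
names and struct prog_test_ref_kfunc are assumptions taken from those
selftests):

	#define __kptr_ref __attribute__((btf_type_tag("kptr_ref")))

	struct map_value {
		struct prog_test_ref_kfunc __kptr_ref *ref;
	};

	struct {
		__uint(type, BPF_MAP_TYPE_ARRAY);
		__uint(max_entries, 1);
		__type(key, int);
		__type(value, struct map_value);
	} ref_map SEC(".maps");

	extern struct prog_test_ref_kfunc *
	bpf_kfunc_call_test_acquire(unsigned long *sp) __ksym;
	extern void bpf_kfunc_call_test_release(struct prog_test_ref_kfunc *p) __ksym;

	SEC("tc")
	int xchg_ref_kptr(struct __sk_buff *ctx)
	{
		struct prog_test_ref_kfunc *p, *old;
		unsigned long sp = 0;
		struct map_value *v;
		int key = 0;

		v = bpf_map_lookup_elem(&ref_map, &key);
		if (!v)
			return 0;
		p = bpf_kfunc_call_test_acquire(&sp);	/* acquired reference */
		if (!p)
			return 0;
		/* Ownership of p moves into the map; the old value comes back
		 * as an acquired reference (or NULL) and must be released
		 * before BPF_EXIT.
		 */
		old = bpf_kptr_xchg(&v->ref, p);
		if (old)
			bpf_kfunc_call_test_release(old);
		return 0;
	}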

However, if BPF_LDX is used to load a referenced pointer from the map, it is
still not permitted to pass it to in-kernel helpers or kernel functions. To
obtain a reference usable with helpers, the user must invoke a kfunc helper
which returns a usable reference (which also must be eventually released before
BPF_EXIT, or moved into a map).

Since the load of the pointer (preserving data dependency ordering) must happen
inside the RCU read section, the kfunc helper will take a pointer to the map
value, which must point to the actual pointer of the object whose reference is
to be raised. The type will be verified from the BTF information of the kfunc,
as the prototype must be:

	T *func(T **, ... /* other arguments */);

The verifier then checks whether the pointer at the given offset in the map
value points to the type T, and permits the call.

This convention is followed so that such helpers may also be called from
sleepable BPF programs, where the RCU read lock is not necessarily held in the
BPF program context; hence the need to pass in a pointer to the actual
pointer, so the load can be performed inside the RCU read section.
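
Continuing the sketch above, usage of such a kptr_get style kfunc would look
roughly like this (the kfunc name is assumed from the test kfunc this series
adds in net/bpf/test_run.c; the extra scalar arguments are ignored here):

	extern struct prog_test_ref_kfunc *
	bpf_kfunc_call_test_kptr_get(struct prog_test_ref_kfunc **pp,
				     int a, int b) __ksym;

	...
		p = bpf_kfunc_call_test_kptr_get(&v->ref, 0, 0);
		if (p) {
			/* p is a normal acquired PTR_TO_BTF_ID, usable with
			 * helpers and kfuncs; it must be released before
			 * BPF_EXIT (or moved into a map).
			 */
			bpf_kfunc_call_test_release(p);
		}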

Notes
-----

 * C selftests require https://reviews.llvm.org/D119799 to pass.
 * Unlike BPF timers, kptr is not reset or freed on map_release_uref.
 * Referenced kptr storage is always treated as unsigned long * on kernel side,
   as BPF side cannot mutate it. The storage (8 bytes) is sufficient for both
   32-bit and 64-bit platforms.
 * Use of WRITE_ONCE to reset unreferenced kptr on 32-bit systems is fine, as
   the actual pointer is always word sized, so the store tearing into two 32-bit
   stores won't be a problem as the other half is always zeroed out.

Changelog:
----------
v4 -> v5
v4: https://lore.kernel.org/bpf/20220409093303.499196-1-memxor@gmail.com

 * Address comments from Joanne
   * Move __btf_member_bit_offset before strcmp
   * Move strcmp conditional on name to unref kptr patch
   * Directly return from btf_find_struct in patch 1
   * Use enum btf_field_type vs int field_type
   * Put btf and btf_id in off_desc in named struct 'kptr'
   * Switch order for BTF_FIELD_IGNORE check
   * Drop dead tab->nr_off = 0 store
   * Use i instead of tab->nr_off to btf_put on failure
   * Replace kzalloc + memcpy with kmemdup (kernel test robot)
   * Reject both BPF_F_RDONLY_PROG and BPF_F_WRONLY_PROG
   * Add logging statement when rejecting BPF_MODE(insn->code) != BPF_MEM
   * Rename off_desc -> kptr_off_desc in check_mem_access
   * Drop check for err, fallthrough to end of function
   * Remove is_release_function, use meta.release_regno to detect release
     function, release reference state, and remove check_release_regno
   * Drop off_desc->flags, use off_desc->type
   * Update comment for ARG_PTR_TO_KPTR
 * Distinguish between direct/indirect access to kptr
 * Drop check_helper_mem_access from process_kptr_func, check_mem_reg in kptr_get
 * Add verifier test for helper accessing kptr indirectly
 * Fix other misc nits, add Acked-by for patch 2

v3 -> v4
v3: https://lore.kernel.org/bpf/20220320155510.671497-1-memxor@gmail.com

 * Use btf_parse_kptrs, plural kptrs naming (Joanne, Andrii)
 * Remove unused parameters in check_map_kptr_access (Joanne)
 * Handle idx < info_cnt kludge using tmp variable (Andrii)
 * Validate tags always precede modifiers in BTF (Andrii)
   * Split out into https://lore.kernel.org/bpf/20220406004121.282699-1-memxor@gmail.com
 * Store u32 type_id in btf_field_info (Andrii)
 * Use base_type in map_kptr_match_type (Andrii)
 * Free kptr_off_tab when not bpf_capable (Martin)
 * Use PTR_RELEASE flag instead of bools in bpf_func_proto (Joanne)
 * Drop extra reg->off and reg->ref_obj_id checks in map_kptr_match_type (Martin)
 * Use separate u32 and u8 arrays for offs and sizes in off_arr (Andrii)
 * Simplify and remove map->value_size sentinel in copy_map_value (Andrii)
 * Use sort_r to keep both arrays in sync while sorting (Andrii)
 * Rename check_and_free_timers_and_kptr to check_and_free_fields (Andrii)
 * Move dtor prototype checks to registration phase (Alexei)
 * Use ret variable for checking ASSERT_XXX, use shorter strings (Andrii)
 * Fix missing checks for other maps (Jiri)
 * Fix various other nits, and bugs noticed during self review

v2 -> v3
v2: https://lore.kernel.org/bpf/20220317115957.3193097-1-memxor@gmail.com

 * Address comments from Alexei
   * Set name, sz, align in btf_find_field
   * Do idx >= info_cnt check in caller of btf_find_field_*
     * Use extra element in the info_arr to make this safe
   * Remove while loop, reject extra tags
   * Remove cases of defensive programming
   * Move bpf_capable() check to map_check_btf
   * Put check_ptr_off_reg reordering hunk into separate patch
   * Warn for ref_ptr once
   * Make the meta.ref_obj_id == 0 case simpler to read
   * Remove kptr_percpu and kptr_user support, remove their tests
   * Store size of field at offset in off_arr
 * Fix BPF_F_NO_PREALLOC set wrongly for hash map in C selftest
 * Add missing check_mem_reg call for kptr_get kfunc arg#0 check

v1 -> v2
v1: https://lore.kernel.org/bpf/20220220134813.3411982-1-memxor@gmail.com

 * Address comments from Alexei
   * Rename bpf_btf_find_by_name_kind_all to bpf_find_btf_id
   * Reduce indentation level in that function
   * Always take reference regardless of module or vmlinux BTF
   * Also made it the same for btf_get_module_btf
   * Use kptr, kptr_ref, kptr_percpu, kptr_user type tags
   * Don't reserve tag namespace
   * Refactor btf_find_field to be side effect free, allocate and populate
     kptr_off_tab in caller
   * Move module reference to dtor patch
   * Remove support for BPF_XCHG, BPF_CMPXCHG insn
   * Introduce bpf_kptr_xchg helper
   * Embed offset array in struct bpf_map, populate and sort it once
   * Adjust copy_map_value to memcpy directly using this offset array
   * Removed size member from offset array to save space
 * Fix some problems pointed out by kernel test robot
 * Tidy selftests
 * Lots of other minor fixes

Kumar Kartikeya Dwivedi (13):
  bpf: Make btf_find_field more generic
  bpf: Move check_ptr_off_reg before check_map_access
  bpf: Allow storing unreferenced kptr in map
  bpf: Tag argument to be released in bpf_func_proto
  bpf: Allow storing referenced kptr in map
  bpf: Prevent escaping of kptr loaded from maps
  bpf: Adapt copy_map_value for multiple offset case
  bpf: Populate pairs of btf_id and destructor kfunc in btf
  bpf: Wire up freeing of referenced kptr
  bpf: Teach verifier about kptr_get kfunc helpers
  libbpf: Add kptr type tag macros to bpf_helpers.h
  selftests/bpf: Add C tests for kptr
  selftests/bpf: Add verifier tests for kptr

 include/linux/bpf.h                           | 110 +++-
 include/linux/bpf_verifier.h                  |   3 +-
 include/linux/btf.h                           |  23 +
 include/uapi/linux/bpf.h                      |  12 +
 kernel/bpf/arraymap.c                         |  14 +-
 kernel/bpf/btf.c                              | 526 ++++++++++++++++--
 kernel/bpf/hashtab.c                          |  58 +-
 kernel/bpf/helpers.c                          |  21 +
 kernel/bpf/map_in_map.c                       |   5 +-
 kernel/bpf/ringbuf.c                          |   4 +-
 kernel/bpf/syscall.c                          | 248 ++++++++-
 kernel/bpf/verifier.c                         | 412 ++++++++++----
 net/bpf/test_run.c                            |  45 +-
 net/core/filter.c                             |   2 +-
 tools/include/uapi/linux/bpf.h                |  12 +
 tools/lib/bpf/bpf_helpers.h                   |   2 +
 .../selftests/bpf/prog_tests/map_kptr.c       |  37 ++
 tools/testing/selftests/bpf/progs/map_kptr.c  | 190 +++++++
 tools/testing/selftests/bpf/test_verifier.c   |  55 +-
 .../testing/selftests/bpf/verifier/map_kptr.c | 469 ++++++++++++++++
 .../selftests/bpf/verifier/ref_tracking.c     |   2 +-
 tools/testing/selftests/bpf/verifier/sock.c   |   6 +-
 22 files changed, 2057 insertions(+), 199 deletions(-)
 create mode 100644 tools/testing/selftests/bpf/prog_tests/map_kptr.c
 create mode 100644 tools/testing/selftests/bpf/progs/map_kptr.c
 create mode 100644 tools/testing/selftests/bpf/verifier/map_kptr.c

-- 
2.35.1



* [PATCH bpf-next v5 01/13] bpf: Make btf_find_field more generic
  2022-04-15 16:03 [PATCH bpf-next v5 00/13] Introduce typed pointer support in BPF maps Kumar Kartikeya Dwivedi
@ 2022-04-15 16:03 ` Kumar Kartikeya Dwivedi
  2022-04-15 16:03 ` [PATCH bpf-next v5 02/13] bpf: Move check_ptr_off_reg before check_map_access Kumar Kartikeya Dwivedi
                   ` (12 subsequent siblings)
  13 siblings, 0 replies; 30+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2022-04-15 16:03 UTC (permalink / raw)
  To: bpf
  Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Joanne Koong, Toke Høiland-Jørgensen,
	Jesper Dangaard Brouer

The next commit introduces the field type 'kptr', whose kind will not be
struct but pointer, and which will not be limited to a single offset but may
appear at multiple ones. Make the existing btf_find_struct_field and
btf_find_datasec_var functions amenable to finding kptrs in the map value by
moving the spin_lock and timer specific checks into their own function.

The alignment and name are checked before the function is called, so it is
the last point where we can skip the field or return an error before the next
loop iteration happens. The size and type of the field are meant to be checked
inside the function.

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
---
 kernel/bpf/btf.c | 120 +++++++++++++++++++++++++++++++++++------------
 1 file changed, 89 insertions(+), 31 deletions(-)

diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c
index 0918a39279f6..e2efc81a5ec3 100644
--- a/kernel/bpf/btf.c
+++ b/kernel/bpf/btf.c
@@ -3163,24 +3163,44 @@ static void btf_struct_log(struct btf_verifier_env *env,
 	btf_verifier_log(env, "size=%u vlen=%u", t->size, btf_type_vlen(t));
 }
 
+enum btf_field_type {
+	BTF_FIELD_SPIN_LOCK,
+	BTF_FIELD_TIMER,
+};
+
+struct btf_field_info {
+	u32 off;
+};
+
+static int btf_find_struct(const struct btf *btf, const struct btf_type *t,
+			   u32 off, int sz, struct btf_field_info *info)
+{
+	if (!__btf_type_is_struct(t))
+		return 0;
+	if (t->size != sz)
+		return 0;
+	if (info->off != -ENOENT)
+		/* only one such field is allowed */
+		return -E2BIG;
+	info->off = off;
+	return 0;
+}
+
 static int btf_find_struct_field(const struct btf *btf, const struct btf_type *t,
-				 const char *name, int sz, int align)
+				 const char *name, int sz, int align,
+				 enum btf_field_type field_type,
+				 struct btf_field_info *info)
 {
 	const struct btf_member *member;
-	u32 i, off = -ENOENT;
+	u32 i, off;
 
 	for_each_member(i, t, member) {
 		const struct btf_type *member_type = btf_type_by_id(btf,
 								    member->type);
-		if (!__btf_type_is_struct(member_type))
-			continue;
-		if (member_type->size != sz)
-			continue;
+
 		if (strcmp(__btf_name_by_offset(btf, member_type->name_off), name))
 			continue;
-		if (off != -ENOENT)
-			/* only one such field is allowed */
-			return -E2BIG;
+
 		off = __btf_member_bit_offset(t, member);
 		if (off % 8)
 			/* valid C code cannot generate such BTF */
@@ -3188,46 +3208,76 @@ static int btf_find_struct_field(const struct btf *btf, const struct btf_type *t
 		off /= 8;
 		if (off % align)
 			return -EINVAL;
+
+		switch (field_type) {
+		case BTF_FIELD_SPIN_LOCK:
+		case BTF_FIELD_TIMER:
+			return btf_find_struct(btf, member_type, off, sz, info);
+		default:
+			return -EFAULT;
+		}
 	}
-	return off;
+	return 0;
 }
 
 static int btf_find_datasec_var(const struct btf *btf, const struct btf_type *t,
-				const char *name, int sz, int align)
+				const char *name, int sz, int align,
+				enum btf_field_type field_type,
+				struct btf_field_info *info)
 {
 	const struct btf_var_secinfo *vsi;
-	u32 i, off = -ENOENT;
+	u32 i, off;
 
 	for_each_vsi(i, t, vsi) {
 		const struct btf_type *var = btf_type_by_id(btf, vsi->type);
 		const struct btf_type *var_type = btf_type_by_id(btf, var->type);
 
-		if (!__btf_type_is_struct(var_type))
-			continue;
-		if (var_type->size != sz)
+		off = vsi->offset;
+
+		if (strcmp(__btf_name_by_offset(btf, var_type->name_off), name))
 			continue;
 		if (vsi->size != sz)
 			continue;
-		if (strcmp(__btf_name_by_offset(btf, var_type->name_off), name))
-			continue;
-		if (off != -ENOENT)
-			/* only one such field is allowed */
-			return -E2BIG;
-		off = vsi->offset;
 		if (off % align)
 			return -EINVAL;
+
+		switch (field_type) {
+		case BTF_FIELD_SPIN_LOCK:
+		case BTF_FIELD_TIMER:
+			return btf_find_struct(btf, var_type, off, sz, info);
+		default:
+			return -EFAULT;
+		}
 	}
-	return off;
+	return 0;
 }
 
 static int btf_find_field(const struct btf *btf, const struct btf_type *t,
-			  const char *name, int sz, int align)
+			  enum btf_field_type field_type,
+			  struct btf_field_info *info)
 {
+	const char *name;
+	int sz, align;
+
+	switch (field_type) {
+	case BTF_FIELD_SPIN_LOCK:
+		name = "bpf_spin_lock";
+		sz = sizeof(struct bpf_spin_lock);
+		align = __alignof__(struct bpf_spin_lock);
+		break;
+	case BTF_FIELD_TIMER:
+		name = "bpf_timer";
+		sz = sizeof(struct bpf_timer);
+		align = __alignof__(struct bpf_timer);
+		break;
+	default:
+		return -EFAULT;
+	}
 
 	if (__btf_type_is_struct(t))
-		return btf_find_struct_field(btf, t, name, sz, align);
+		return btf_find_struct_field(btf, t, name, sz, align, field_type, info);
 	else if (btf_type_is_datasec(t))
-		return btf_find_datasec_var(btf, t, name, sz, align);
+		return btf_find_datasec_var(btf, t, name, sz, align, field_type, info);
 	return -EINVAL;
 }
 
@@ -3237,16 +3287,24 @@ static int btf_find_field(const struct btf *btf, const struct btf_type *t,
  */
 int btf_find_spin_lock(const struct btf *btf, const struct btf_type *t)
 {
-	return btf_find_field(btf, t, "bpf_spin_lock",
-			      sizeof(struct bpf_spin_lock),
-			      __alignof__(struct bpf_spin_lock));
+	struct btf_field_info info = { .off = -ENOENT };
+	int ret;
+
+	ret = btf_find_field(btf, t, BTF_FIELD_SPIN_LOCK, &info);
+	if (ret < 0)
+		return ret;
+	return info.off;
 }
 
 int btf_find_timer(const struct btf *btf, const struct btf_type *t)
 {
-	return btf_find_field(btf, t, "bpf_timer",
-			      sizeof(struct bpf_timer),
-			      __alignof__(struct bpf_timer));
+	struct btf_field_info info = { .off = -ENOENT };
+	int ret;
+
+	ret = btf_find_field(btf, t, BTF_FIELD_TIMER, &info);
+	if (ret < 0)
+		return ret;
+	return info.off;
 }
 
 static void __btf_struct_show(const struct btf *btf, const struct btf_type *t,
-- 
2.35.1



* [PATCH bpf-next v5 02/13] bpf: Move check_ptr_off_reg before check_map_access
  2022-04-15 16:03 [PATCH bpf-next v5 00/13] Introduce typed pointer support in BPF maps Kumar Kartikeya Dwivedi
  2022-04-15 16:03 ` [PATCH bpf-next v5 01/13] bpf: Make btf_find_field more generic Kumar Kartikeya Dwivedi
@ 2022-04-15 16:03 ` Kumar Kartikeya Dwivedi
  2022-04-21  4:30   ` Alexei Starovoitov
  2022-04-15 16:03 ` [PATCH bpf-next v5 03/13] bpf: Allow storing unreferenced kptr in map Kumar Kartikeya Dwivedi
                   ` (11 subsequent siblings)
  13 siblings, 1 reply; 30+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2022-04-15 16:03 UTC (permalink / raw)
  To: bpf
  Cc: Joanne Koong, Alexei Starovoitov, Andrii Nakryiko,
	Daniel Borkmann, Toke Høiland-Jørgensen,
	Jesper Dangaard Brouer

Functions added in the next patch want to use __check_ptr_off_reg, and they
will be called by check_map_access, hence move it before check_map_access.

Acked-by: Joanne Koong <joannelkoong@gmail.com>
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
---
 kernel/bpf/verifier.c | 76 +++++++++++++++++++++----------------------
 1 file changed, 38 insertions(+), 38 deletions(-)

diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 9c1a02b82ecd..71827d14724a 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -3469,6 +3469,44 @@ static int check_mem_region_access(struct bpf_verifier_env *env, u32 regno,
 	return 0;
 }
 
+static int __check_ptr_off_reg(struct bpf_verifier_env *env,
+			       const struct bpf_reg_state *reg, int regno,
+			       bool fixed_off_ok)
+{
+	/* Access to this pointer-typed register or passing it to a helper
+	 * is only allowed in its original, unmodified form.
+	 */
+
+	if (reg->off < 0) {
+		verbose(env, "negative offset %s ptr R%d off=%d disallowed\n",
+			reg_type_str(env, reg->type), regno, reg->off);
+		return -EACCES;
+	}
+
+	if (!fixed_off_ok && reg->off) {
+		verbose(env, "dereference of modified %s ptr R%d off=%d disallowed\n",
+			reg_type_str(env, reg->type), regno, reg->off);
+		return -EACCES;
+	}
+
+	if (!tnum_is_const(reg->var_off) || reg->var_off.value) {
+		char tn_buf[48];
+
+		tnum_strn(tn_buf, sizeof(tn_buf), reg->var_off);
+		verbose(env, "variable %s access var_off=%s disallowed\n",
+			reg_type_str(env, reg->type), tn_buf);
+		return -EACCES;
+	}
+
+	return 0;
+}
+
+int check_ptr_off_reg(struct bpf_verifier_env *env,
+		      const struct bpf_reg_state *reg, int regno)
+{
+	return __check_ptr_off_reg(env, reg, regno, false);
+}
+
 /* check read/write into a map element with possible variable offset */
 static int check_map_access(struct bpf_verifier_env *env, u32 regno,
 			    int off, int size, bool zero_size_allowed)
@@ -3980,44 +4018,6 @@ static int get_callee_stack_depth(struct bpf_verifier_env *env,
 }
 #endif
 
-static int __check_ptr_off_reg(struct bpf_verifier_env *env,
-			       const struct bpf_reg_state *reg, int regno,
-			       bool fixed_off_ok)
-{
-	/* Access to this pointer-typed register or passing it to a helper
-	 * is only allowed in its original, unmodified form.
-	 */
-
-	if (reg->off < 0) {
-		verbose(env, "negative offset %s ptr R%d off=%d disallowed\n",
-			reg_type_str(env, reg->type), regno, reg->off);
-		return -EACCES;
-	}
-
-	if (!fixed_off_ok && reg->off) {
-		verbose(env, "dereference of modified %s ptr R%d off=%d disallowed\n",
-			reg_type_str(env, reg->type), regno, reg->off);
-		return -EACCES;
-	}
-
-	if (!tnum_is_const(reg->var_off) || reg->var_off.value) {
-		char tn_buf[48];
-
-		tnum_strn(tn_buf, sizeof(tn_buf), reg->var_off);
-		verbose(env, "variable %s access var_off=%s disallowed\n",
-			reg_type_str(env, reg->type), tn_buf);
-		return -EACCES;
-	}
-
-	return 0;
-}
-
-int check_ptr_off_reg(struct bpf_verifier_env *env,
-		      const struct bpf_reg_state *reg, int regno)
-{
-	return __check_ptr_off_reg(env, reg, regno, false);
-}
-
 static int __check_buffer_access(struct bpf_verifier_env *env,
 				 const char *buf_info,
 				 const struct bpf_reg_state *reg,
-- 
2.35.1



* [PATCH bpf-next v5 03/13] bpf: Allow storing unreferenced kptr in map
  2022-04-15 16:03 [PATCH bpf-next v5 00/13] Introduce typed pointer support in BPF maps Kumar Kartikeya Dwivedi
  2022-04-15 16:03 ` [PATCH bpf-next v5 01/13] bpf: Make btf_find_field more generic Kumar Kartikeya Dwivedi
  2022-04-15 16:03 ` [PATCH bpf-next v5 02/13] bpf: Move check_ptr_off_reg before check_map_access Kumar Kartikeya Dwivedi
@ 2022-04-15 16:03 ` Kumar Kartikeya Dwivedi
  2022-04-21  4:15   ` Alexei Starovoitov
  2022-04-15 16:03 ` [PATCH bpf-next v5 04/13] bpf: Tag argument to be released in bpf_func_proto Kumar Kartikeya Dwivedi
                   ` (10 subsequent siblings)
  13 siblings, 1 reply; 30+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2022-04-15 16:03 UTC (permalink / raw)
  To: bpf
  Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Joanne Koong, Toke Høiland-Jørgensen,
	Jesper Dangaard Brouer

This commit introduces a new pointer type 'kptr' which can be embedded
in a map value to hold a PTR_TO_BTF_ID stored by a BPF program during
its invocation. When storing such a kptr, the BPF program's PTR_TO_BTF_ID
register must have the same type as in the map value's BTF, and loading
a kptr marks the destination register as PTR_TO_BTF_ID with the correct
kernel BTF and BTF ID.

Such kptrs are unreferenced, i.e. by the time another invocation of the BPF
program loads this pointer, the object which the pointer points to may no
longer exist. Since PTR_TO_BTF_ID loads (using BPF_LDX) are patched to
PROBE_MEM loads by the verifier, it is safe to still allow the user to access
such a possibly invalid pointer, but passing such pointers into BPF helpers
and kfuncs should not be permitted. A future patch in this series will close
this gap.

The flexibility offered by allowing programs to dereference such possibly
invalid pointers while remaining safe at runtime frees the verifier from doing
complex lifetime tracking. As long as the user can ensure that the object
remains valid, the data it reads from the kernel object will also be valid.

The user indicates that a certain pointer must be treated as a kptr capable
of accepting stores of PTR_TO_BTF_ID of a certain type by using the BTF type
tag 'kptr' on the pointed-to type of the pointer. This information is then
recorded in the object BTF which will be passed into the kernel by way of the
map's BTF information. The name and kind from the map value BTF are used to
look up the in-kernel type, and the actual BTF and BTF ID are recorded in the
map struct in a new kptr_off_tab member. For now, only storing pointers to
structs is permitted.

An example of this specification is shown below:

	#define __kptr __attribute__((btf_type_tag("kptr")))

	struct map_value {
		...
		struct task_struct __kptr *task;
		...
	};

Then, in a BPF program, the user may store a PTR_TO_BTF_ID of type
task_struct into the map, and load it back later.

Note that the destination register is marked PTR_TO_BTF_ID_OR_NULL: since the
verifier cannot statically know whether the value is NULL, it must treat all
potential loads at that map value offset as loading a possibly NULL pointer.

Only BPF_LDX, BPF_STX, and BPF_ST (with insn->imm = 0 to denote NULL) are the
instructions allowed to access such a pointer. On BPF_LDX, the
destination register is updated to be a PTR_TO_BTF_ID, and on BPF_STX,
it is checked whether the source register type is a PTR_TO_BTF_ID with
same BTF type as specified in the map BTF. The access size must always
be BPF_DW.
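
For illustration, a fragment exercising the three allowed forms on the
map_value above could look like this (a sketch; it assumes v was obtained
from bpf_map_lookup_elem and task is a PTR_TO_BTF_ID of type task_struct,
e.g. a tp_btf program argument):

	v->task = task;	/* BPF_STX of a matching PTR_TO_BTF_ID */
	task = v->task;	/* BPF_LDX, dst marked PTR_TO_BTF_ID_OR_NULL */
	if (task)
		bpf_printk("pid=%d", task->pid);
	v->task = NULL;	/* store of NULL (BPF_ST with imm=0) resets the kptr */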

For map in map support, the kptr_off_tab of the outer map is copied from the
inner map's kptr_off_tab. A deep copy was chosen instead of introducing a
refcount on kptr_off_tab, because the copy only needs to be done when
parameterizing using inner_map_fd in the map in map case, and would hence be
unnecessary for all other users.

It is not permitted to use the MAP_FREEZE command or mmap for a BPF map
having kptrs, similar to the bpf_timer case. A kptr also requires that the
BPF program has both read and write access to the map (hence both
BPF_F_RDONLY_PROG and BPF_F_WRONLY_PROG are disallowed).

Note that check_map_access must be called both from check_helper_mem_access
and for the BPF instructions, hence the kptr check must distinguish between
ACCESS_DIRECT and ACCESS_HELPER, and reject the ACCESS_HELPER cases. We rename
stack_access_src to bpf_access_src and reuse it for this purpose.

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
---
 include/linux/bpf.h     |  31 +++++++-
 include/linux/btf.h     |   2 +
 kernel/bpf/btf.c        | 167 +++++++++++++++++++++++++++++++++++-----
 kernel/bpf/map_in_map.c |   5 +-
 kernel/bpf/syscall.c    | 113 ++++++++++++++++++++++++++-
 kernel/bpf/verifier.c   | 139 ++++++++++++++++++++++++++++++---
 6 files changed, 421 insertions(+), 36 deletions(-)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index bdb5298735ce..ab86f4675db2 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -155,6 +155,24 @@ struct bpf_map_ops {
 	const struct bpf_iter_seq_info *iter_seq_info;
 };
 
+enum {
+	/* Support at most 8 pointers in a BPF map value */
+	BPF_MAP_VALUE_OFF_MAX = 8,
+};
+
+struct bpf_map_value_off_desc {
+	u32 offset;
+	struct {
+		struct btf *btf;
+		u32 btf_id;
+	} kptr;
+};
+
+struct bpf_map_value_off {
+	u32 nr_off;
+	struct bpf_map_value_off_desc off[];
+};
+
 struct bpf_map {
 	/* The first two cachelines with read-mostly members of which some
 	 * are also accessed in fast-path (e.g. ops, max_entries).
@@ -171,6 +189,7 @@ struct bpf_map {
 	u64 map_extra; /* any per-map-type extra fields */
 	u32 map_flags;
 	int spin_lock_off; /* >=0 valid offset, <0 error */
+	struct bpf_map_value_off *kptr_off_tab;
 	int timer_off; /* >=0 valid offset, <0 error */
 	u32 id;
 	int numa_node;
@@ -184,7 +203,7 @@ struct bpf_map {
 	char name[BPF_OBJ_NAME_LEN];
 	bool bypass_spec_v1;
 	bool frozen; /* write-once; write-protected by freeze_mutex */
-	/* 14 bytes hole */
+	/* 6 bytes hole */
 
 	/* The 3rd and 4th cacheline with misc members to avoid false sharing
 	 * particularly with refcounting.
@@ -217,6 +236,11 @@ static inline bool map_value_has_timer(const struct bpf_map *map)
 	return map->timer_off >= 0;
 }
 
+static inline bool map_value_has_kptrs(const struct bpf_map *map)
+{
+	return !IS_ERR_OR_NULL(map->kptr_off_tab);
+}
+
 static inline void check_and_init_map_value(struct bpf_map *map, void *dst)
 {
 	if (unlikely(map_value_has_spin_lock(map)))
@@ -1497,6 +1521,11 @@ void bpf_prog_put(struct bpf_prog *prog);
 void bpf_prog_free_id(struct bpf_prog *prog, bool do_idr_lock);
 void bpf_map_free_id(struct bpf_map *map, bool do_idr_lock);
 
+struct bpf_map_value_off_desc *bpf_map_kptr_off_contains(struct bpf_map *map, u32 offset);
+void bpf_map_free_kptr_off_tab(struct bpf_map *map);
+struct bpf_map_value_off *bpf_map_copy_kptr_off_tab(const struct bpf_map *map);
+bool bpf_map_equal_kptr_off_tab(const struct bpf_map *map_a, const struct bpf_map *map_b);
+
 struct bpf_map *bpf_map_get(u32 ufd);
 struct bpf_map *bpf_map_get_with_uref(u32 ufd);
 struct bpf_map *__bpf_map_get(struct fd f);
diff --git a/include/linux/btf.h b/include/linux/btf.h
index 36bc09b8e890..19c297f9a52f 100644
--- a/include/linux/btf.h
+++ b/include/linux/btf.h
@@ -123,6 +123,8 @@ bool btf_member_is_reg_int(const struct btf *btf, const struct btf_type *s,
 			   u32 expected_offset, u32 expected_size);
 int btf_find_spin_lock(const struct btf *btf, const struct btf_type *t);
 int btf_find_timer(const struct btf *btf, const struct btf_type *t);
+struct bpf_map_value_off *btf_parse_kptrs(const struct btf *btf,
+					  const struct btf_type *t);
 bool btf_type_is_void(const struct btf_type *t);
 s32 btf_find_by_name_kind(const struct btf *btf, const char *name, u8 kind);
 const struct btf_type *btf_type_skip_modifiers(const struct btf *btf,
diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c
index e2efc81a5ec3..be191df76ea4 100644
--- a/kernel/bpf/btf.c
+++ b/kernel/bpf/btf.c
@@ -3166,9 +3166,16 @@ static void btf_struct_log(struct btf_verifier_env *env,
 enum btf_field_type {
 	BTF_FIELD_SPIN_LOCK,
 	BTF_FIELD_TIMER,
+	BTF_FIELD_KPTR,
+};
+
+enum {
+	BTF_FIELD_IGNORE = 0,
+	BTF_FIELD_FOUND  = 1,
 };
 
 struct btf_field_info {
+	u32 type_id;
 	u32 off;
 };
 
@@ -3176,29 +3183,57 @@ static int btf_find_struct(const struct btf *btf, const struct btf_type *t,
 			   u32 off, int sz, struct btf_field_info *info)
 {
 	if (!__btf_type_is_struct(t))
-		return 0;
+		return BTF_FIELD_IGNORE;
 	if (t->size != sz)
-		return 0;
-	if (info->off != -ENOENT)
-		/* only one such field is allowed */
-		return -E2BIG;
+		return BTF_FIELD_IGNORE;
 	info->off = off;
-	return 0;
+	return BTF_FIELD_FOUND;
+}
+
+static int btf_find_kptr(const struct btf *btf, const struct btf_type *t,
+			 u32 off, int sz, struct btf_field_info *info)
+{
+	u32 res_id;
+
+	/* For PTR, sz is always == 8 */
+	if (!btf_type_is_ptr(t))
+		return BTF_FIELD_IGNORE;
+	t = btf_type_by_id(btf, t->type);
+
+	if (!btf_type_is_type_tag(t))
+		return BTF_FIELD_IGNORE;
+	/* Reject extra tags */
+	if (btf_type_is_type_tag(btf_type_by_id(btf, t->type)))
+		return -EINVAL;
+	if (strcmp("kptr", __btf_name_by_offset(btf, t->name_off)))
+		return -EINVAL;
+
+	/* Get the base type */
+	t = btf_type_skip_modifiers(btf, t->type, &res_id);
+	/* Only pointer to struct is allowed */
+	if (!__btf_type_is_struct(t))
+		return -EINVAL;
+
+	info->type_id = res_id;
+	info->off = off;
+	return BTF_FIELD_FOUND;
 }
 
 static int btf_find_struct_field(const struct btf *btf, const struct btf_type *t,
 				 const char *name, int sz, int align,
 				 enum btf_field_type field_type,
-				 struct btf_field_info *info)
+				 struct btf_field_info *info, int info_cnt)
 {
 	const struct btf_member *member;
+	struct btf_field_info tmp;
+	int ret, idx = 0;
 	u32 i, off;
 
 	for_each_member(i, t, member) {
 		const struct btf_type *member_type = btf_type_by_id(btf,
 								    member->type);
 
-		if (strcmp(__btf_name_by_offset(btf, member_type->name_off), name))
+		if (name && strcmp(__btf_name_by_offset(btf, member_type->name_off), name))
 			continue;
 
 		off = __btf_member_bit_offset(t, member);
@@ -3212,20 +3247,38 @@ static int btf_find_struct_field(const struct btf *btf, const struct btf_type *t
 		switch (field_type) {
 		case BTF_FIELD_SPIN_LOCK:
 		case BTF_FIELD_TIMER:
-			return btf_find_struct(btf, member_type, off, sz, info);
+			ret = btf_find_struct(btf, member_type, off, sz,
+					      idx < info_cnt ? &info[idx] : &tmp);
+			if (ret < 0)
+				return ret;
+			break;
+		case BTF_FIELD_KPTR:
+			ret = btf_find_kptr(btf, member_type, off, sz,
+					    idx < info_cnt ? &info[idx] : &tmp);
+			if (ret < 0)
+				return ret;
+			break;
 		default:
 			return -EFAULT;
 		}
+
+		if (ret == BTF_FIELD_IGNORE)
+			continue;
+		if (idx >= info_cnt)
+			return -E2BIG;
+		++idx;
 	}
-	return 0;
+	return idx;
 }
 
 static int btf_find_datasec_var(const struct btf *btf, const struct btf_type *t,
 				const char *name, int sz, int align,
 				enum btf_field_type field_type,
-				struct btf_field_info *info)
+				struct btf_field_info *info, int info_cnt)
 {
 	const struct btf_var_secinfo *vsi;
+	struct btf_field_info tmp;
+	int ret, idx = 0;
 	u32 i, off;
 
 	for_each_vsi(i, t, vsi) {
@@ -3234,7 +3287,7 @@ static int btf_find_datasec_var(const struct btf *btf, const struct btf_type *t,
 
 		off = vsi->offset;
 
-		if (strcmp(__btf_name_by_offset(btf, var_type->name_off), name))
+		if (name && strcmp(__btf_name_by_offset(btf, var_type->name_off), name))
 			continue;
 		if (vsi->size != sz)
 			continue;
@@ -3244,17 +3297,33 @@ static int btf_find_datasec_var(const struct btf *btf, const struct btf_type *t,
 		switch (field_type) {
 		case BTF_FIELD_SPIN_LOCK:
 		case BTF_FIELD_TIMER:
-			return btf_find_struct(btf, var_type, off, sz, info);
+			ret = btf_find_struct(btf, var_type, off, sz,
+					      idx < info_cnt ? &info[idx] : &tmp);
+			if (ret < 0)
+				return ret;
+			break;
+		case BTF_FIELD_KPTR:
+			ret = btf_find_kptr(btf, var_type, off, sz,
+					    idx < info_cnt ? &info[idx] : &tmp);
+			if (ret < 0)
+				return ret;
+			break;
 		default:
 			return -EFAULT;
 		}
+
+		if (ret == BTF_FIELD_IGNORE)
+			continue;
+		if (idx >= info_cnt)
+			return -E2BIG;
+		++idx;
 	}
-	return 0;
+	return idx;
 }
 
 static int btf_find_field(const struct btf *btf, const struct btf_type *t,
 			  enum btf_field_type field_type,
-			  struct btf_field_info *info)
+			  struct btf_field_info *info, int info_cnt)
 {
 	const char *name;
 	int sz, align;
@@ -3270,14 +3339,19 @@ static int btf_find_field(const struct btf *btf, const struct btf_type *t,
 		sz = sizeof(struct bpf_timer);
 		align = __alignof__(struct bpf_timer);
 		break;
+	case BTF_FIELD_KPTR:
+		name = NULL;
+		sz = sizeof(u64);
+		align = 8;
+		break;
 	default:
 		return -EFAULT;
 	}
 
 	if (__btf_type_is_struct(t))
-		return btf_find_struct_field(btf, t, name, sz, align, field_type, info);
+		return btf_find_struct_field(btf, t, name, sz, align, field_type, info, info_cnt);
 	else if (btf_type_is_datasec(t))
-		return btf_find_datasec_var(btf, t, name, sz, align, field_type, info);
+		return btf_find_datasec_var(btf, t, name, sz, align, field_type, info, info_cnt);
 	return -EINVAL;
 }
 
@@ -3287,26 +3361,77 @@ static int btf_find_field(const struct btf *btf, const struct btf_type *t,
  */
 int btf_find_spin_lock(const struct btf *btf, const struct btf_type *t)
 {
-	struct btf_field_info info = { .off = -ENOENT };
+	struct btf_field_info info;
 	int ret;
 
-	ret = btf_find_field(btf, t, BTF_FIELD_SPIN_LOCK, &info);
+	ret = btf_find_field(btf, t, BTF_FIELD_SPIN_LOCK, &info, 1);
 	if (ret < 0)
 		return ret;
+	if (!ret)
+		return -ENOENT;
 	return info.off;
 }
 
 int btf_find_timer(const struct btf *btf, const struct btf_type *t)
 {
-	struct btf_field_info info = { .off = -ENOENT };
+	struct btf_field_info info;
 	int ret;
 
-	ret = btf_find_field(btf, t, BTF_FIELD_TIMER, &info);
+	ret = btf_find_field(btf, t, BTF_FIELD_TIMER, &info, 1);
 	if (ret < 0)
 		return ret;
+	if (!ret)
+		return -ENOENT;
 	return info.off;
 }
 
+struct bpf_map_value_off *btf_parse_kptrs(const struct btf *btf,
+					  const struct btf_type *t)
+{
+	struct btf_field_info info_arr[BPF_MAP_VALUE_OFF_MAX];
+	struct bpf_map_value_off *tab;
+	int ret, i, nr_off;
+
+	/* Revisit stack usage when bumping BPF_MAP_VALUE_OFF_MAX */
+	BUILD_BUG_ON(BPF_MAP_VALUE_OFF_MAX != 8);
+
+	ret = btf_find_field(btf, t, BTF_FIELD_KPTR, info_arr, ARRAY_SIZE(info_arr));
+	if (ret < 0)
+		return ERR_PTR(ret);
+	if (!ret)
+		return NULL;
+
+	nr_off = ret;
+	tab = kzalloc(offsetof(struct bpf_map_value_off, off[nr_off]), GFP_KERNEL | __GFP_NOWARN);
+	if (!tab)
+		return ERR_PTR(-ENOMEM);
+
+	for (i = 0; i < nr_off; i++) {
+		const struct btf_type *t;
+		struct btf *off_btf;
+		s32 id;
+
+		t = btf_type_by_id(btf, info_arr[i].type_id);
+		id = bpf_find_btf_id(__btf_name_by_offset(btf, t->name_off), BTF_INFO_KIND(t->info),
+				     &off_btf);
+		if (id < 0) {
+			ret = id;
+			goto end;
+		}
+
+		tab->off[i].offset = info_arr[i].off;
+		tab->off[i].kptr.btf_id = id;
+		tab->off[i].kptr.btf = off_btf;
+	}
+	tab->nr_off = nr_off;
+	return tab;
+end:
+	while (i--)
+		btf_put(tab->off[i].kptr.btf);
+	kfree(tab);
+	return ERR_PTR(ret);
+}
+
 static void __btf_struct_show(const struct btf *btf, const struct btf_type *t,
 			      u32 type_id, void *data, u8 bits_offset,
 			      struct btf_show *show)
diff --git a/kernel/bpf/map_in_map.c b/kernel/bpf/map_in_map.c
index 5cd8f5277279..135205d0d560 100644
--- a/kernel/bpf/map_in_map.c
+++ b/kernel/bpf/map_in_map.c
@@ -52,6 +52,7 @@ struct bpf_map *bpf_map_meta_alloc(int inner_map_ufd)
 	inner_map_meta->max_entries = inner_map->max_entries;
 	inner_map_meta->spin_lock_off = inner_map->spin_lock_off;
 	inner_map_meta->timer_off = inner_map->timer_off;
+	inner_map_meta->kptr_off_tab = bpf_map_copy_kptr_off_tab(inner_map);
 	if (inner_map->btf) {
 		btf_get(inner_map->btf);
 		inner_map_meta->btf = inner_map->btf;
@@ -71,6 +72,7 @@ struct bpf_map *bpf_map_meta_alloc(int inner_map_ufd)
 
 void bpf_map_meta_free(struct bpf_map *map_meta)
 {
+	bpf_map_free_kptr_off_tab(map_meta);
 	btf_put(map_meta->btf);
 	kfree(map_meta);
 }
@@ -83,7 +85,8 @@ bool bpf_map_meta_equal(const struct bpf_map *meta0,
 		meta0->key_size == meta1->key_size &&
 		meta0->value_size == meta1->value_size &&
 		meta0->timer_off == meta1->timer_off &&
-		meta0->map_flags == meta1->map_flags;
+		meta0->map_flags == meta1->map_flags &&
+		bpf_map_equal_kptr_off_tab(meta0, meta1);
 }
 
 void *bpf_map_fd_get_ptr(struct bpf_map *map,
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index e9621cfa09f2..fba49f390ed5 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -6,6 +6,7 @@
 #include <linux/bpf_trace.h>
 #include <linux/bpf_lirc.h>
 #include <linux/bpf_verifier.h>
+#include <linux/bsearch.h>
 #include <linux/btf.h>
 #include <linux/syscalls.h>
 #include <linux/slab.h>
@@ -473,12 +474,94 @@ static void bpf_map_release_memcg(struct bpf_map *map)
 }
 #endif
 
+static int bpf_map_kptr_off_cmp(const void *a, const void *b)
+{
+	const struct bpf_map_value_off_desc *off_desc1 = a, *off_desc2 = b;
+
+	if (off_desc1->offset < off_desc2->offset)
+		return -1;
+	else if (off_desc1->offset > off_desc2->offset)
+		return 1;
+	return 0;
+}
+
+struct bpf_map_value_off_desc *bpf_map_kptr_off_contains(struct bpf_map *map, u32 offset)
+{
+	/* Since members are iterated in btf_find_field in increasing order,
+	 * offsets appended to kptr_off_tab are in increasing order, so we can
+	 * do bsearch to find exact match.
+	 */
+	struct bpf_map_value_off *tab;
+
+	if (!map_value_has_kptrs(map))
+		return NULL;
+	tab = map->kptr_off_tab;
+	return bsearch(&offset, tab->off, tab->nr_off, sizeof(tab->off[0]), bpf_map_kptr_off_cmp);
+}
+
+void bpf_map_free_kptr_off_tab(struct bpf_map *map)
+{
+	struct bpf_map_value_off *tab = map->kptr_off_tab;
+	int i;
+
+	if (!map_value_has_kptrs(map))
+		return;
+	for (i = 0; i < tab->nr_off; i++) {
+		struct btf *btf = tab->off[i].kptr.btf;
+
+		btf_put(btf);
+	}
+	kfree(tab);
+	map->kptr_off_tab = NULL;
+}
+
+struct bpf_map_value_off *bpf_map_copy_kptr_off_tab(const struct bpf_map *map)
+{
+	struct bpf_map_value_off *tab = map->kptr_off_tab, *new_tab;
+	int size, i, ret;
+
+	if (!map_value_has_kptrs(map))
+		return ERR_PTR(-ENOENT);
+	/* Do a deep copy of the kptr_off_tab */
+	for (i = 0; i < tab->nr_off; i++)
+		btf_get(tab->off[i].kptr.btf);
+
+	size = offsetof(struct bpf_map_value_off, off[tab->nr_off]);
+	new_tab = kmemdup(tab, size, GFP_KERNEL | __GFP_NOWARN);
+	if (!new_tab) {
+		ret = -ENOMEM;
+		goto end;
+	}
+	return new_tab;
+end:
+	while (i--)
+		btf_put(tab->off[i].kptr.btf);
+	return ERR_PTR(ret);
+}
+
+bool bpf_map_equal_kptr_off_tab(const struct bpf_map *map_a, const struct bpf_map *map_b)
+{
+	struct bpf_map_value_off *tab_a = map_a->kptr_off_tab, *tab_b = map_b->kptr_off_tab;
+	bool a_has_kptr = map_value_has_kptrs(map_a), b_has_kptr = map_value_has_kptrs(map_b);
+	int size;
+
+	if (!a_has_kptr && !b_has_kptr)
+		return true;
+	if (a_has_kptr != b_has_kptr)
+		return false;
+	if (tab_a->nr_off != tab_b->nr_off)
+		return false;
+	size = offsetof(struct bpf_map_value_off, off[tab_a->nr_off]);
+	return !memcmp(tab_a, tab_b, size);
+}
+
 /* called from workqueue */
 static void bpf_map_free_deferred(struct work_struct *work)
 {
 	struct bpf_map *map = container_of(work, struct bpf_map, work);
 
 	security_bpf_map_free(map);
+	bpf_map_free_kptr_off_tab(map);
 	bpf_map_release_memcg(map);
 	/* implementation dependent freeing */
 	map->ops->map_free(map);
@@ -640,7 +723,7 @@ static int bpf_map_mmap(struct file *filp, struct vm_area_struct *vma)
 	int err;
 
 	if (!map->ops->map_mmap || map_value_has_spin_lock(map) ||
-	    map_value_has_timer(map))
+	    map_value_has_timer(map) || map_value_has_kptrs(map))
 		return -ENOTSUPP;
 
 	if (!(vma->vm_flags & VM_SHARED))
@@ -820,9 +903,33 @@ static int map_check_btf(struct bpf_map *map, const struct btf *btf,
 			return -EOPNOTSUPP;
 	}
 
-	if (map->ops->map_check_btf)
+	map->kptr_off_tab = btf_parse_kptrs(btf, value_type);
+	if (map_value_has_kptrs(map)) {
+		if (!bpf_capable()) {
+			ret = -EPERM;
+			goto free_map_tab;
+		}
+		if (map->map_flags & (BPF_F_RDONLY_PROG | BPF_F_WRONLY_PROG)) {
+			ret = -EACCES;
+			goto free_map_tab;
+		}
+		if (map->map_type != BPF_MAP_TYPE_HASH &&
+		    map->map_type != BPF_MAP_TYPE_LRU_HASH &&
+		    map->map_type != BPF_MAP_TYPE_ARRAY) {
+			ret = -EOPNOTSUPP;
+			goto free_map_tab;
+		}
+	}
+
+	if (map->ops->map_check_btf) {
 		ret = map->ops->map_check_btf(map, btf, key_type, value_type);
+		if (ret < 0)
+			goto free_map_tab;
+	}
 
+	return ret;
+free_map_tab:
+	bpf_map_free_kptr_off_tab(map);
 	return ret;
 }
 
@@ -1639,7 +1746,7 @@ static int map_freeze(const union bpf_attr *attr)
 		return PTR_ERR(map);
 
 	if (map->map_type == BPF_MAP_TYPE_STRUCT_OPS ||
-	    map_value_has_timer(map)) {
+	    map_value_has_timer(map) || map_value_has_kptrs(map)) {
 		fdput(f);
 		return -ENOTSUPP;
 	}
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 71827d14724a..c802e51c4e18 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -3211,7 +3211,7 @@ static int check_stack_read_fixed_off(struct bpf_verifier_env *env,
 	return 0;
 }
 
-enum stack_access_src {
+enum bpf_access_src {
 	ACCESS_DIRECT = 1,  /* the access is performed by an instruction */
 	ACCESS_HELPER = 2,  /* the access is performed by a helper */
 };
@@ -3219,7 +3219,7 @@ enum stack_access_src {
 static int check_stack_range_initialized(struct bpf_verifier_env *env,
 					 int regno, int off, int access_size,
 					 bool zero_size_allowed,
-					 enum stack_access_src type,
+					 enum bpf_access_src type,
 					 struct bpf_call_arg_meta *meta);
 
 static struct bpf_reg_state *reg_state(struct bpf_verifier_env *env, int regno)
@@ -3507,9 +3507,87 @@ int check_ptr_off_reg(struct bpf_verifier_env *env,
 	return __check_ptr_off_reg(env, reg, regno, false);
 }
 
+static int map_kptr_match_type(struct bpf_verifier_env *env,
+			       struct bpf_map_value_off_desc *off_desc,
+			       struct bpf_reg_state *reg, u32 regno)
+{
+	const char *targ_name = kernel_type_name(off_desc->kptr.btf, off_desc->kptr.btf_id);
+	const char *reg_name = "";
+
+	if (base_type(reg->type) != PTR_TO_BTF_ID || type_flag(reg->type) != PTR_MAYBE_NULL)
+		goto bad_type;
+
+	if (!btf_is_kernel(reg->btf)) {
+		verbose(env, "R%d must point to kernel BTF\n", regno);
+		return -EINVAL;
+	}
+	/* We need to verify reg->type and reg->btf, before accessing reg->btf */
+	reg_name = kernel_type_name(reg->btf, reg->btf_id);
+
+	if (__check_ptr_off_reg(env, reg, regno, true))
+		return -EACCES;
+
+	if (!btf_struct_ids_match(&env->log, reg->btf, reg->btf_id, reg->off,
+				  off_desc->kptr.btf, off_desc->kptr.btf_id))
+		goto bad_type;
+	return 0;
+bad_type:
+	verbose(env, "invalid kptr access, R%d type=%s%s ", regno,
+		reg_type_str(env, reg->type), reg_name);
+	verbose(env, "expected=%s%s\n", reg_type_str(env, PTR_TO_BTF_ID), targ_name);
+	return -EINVAL;
+}
+
+static int check_map_kptr_access(struct bpf_verifier_env *env, u32 regno,
+				 int value_regno, int insn_idx,
+				 struct bpf_map_value_off_desc *off_desc)
+{
+	struct bpf_insn *insn = &env->prog->insnsi[insn_idx];
+	int class = BPF_CLASS(insn->code);
+	struct bpf_reg_state *val_reg;
+
+	/* Things we already checked for in check_map_access and caller:
+	 *  - Reject cases where variable offset may touch kptr
+	 *  - size of access (must be BPF_DW)
+	 *  - tnum_is_const(reg->var_off)
+	 *  - off_desc->offset == off + reg->var_off.value
+	 */
+	/* Only BPF_[LDX,STX,ST] | BPF_MEM | BPF_DW is supported */
+	if (BPF_MODE(insn->code) != BPF_MEM) {
+		verbose(env, "kptr in map can only be accessed using BPF_MEM instruction mode\n");
+		return -EACCES;
+	}
+
+	if (class == BPF_LDX) {
+		val_reg = reg_state(env, value_regno);
+		/* We can simply mark the value_regno receiving the pointer
+		 * value from map as PTR_TO_BTF_ID, with the correct type.
+		 */
+		mark_btf_ld_reg(env, cur_regs(env), value_regno, PTR_TO_BTF_ID, off_desc->kptr.btf,
+				off_desc->kptr.btf_id, PTR_MAYBE_NULL);
+		val_reg->id = ++env->id_gen;
+	} else if (class == BPF_STX) {
+		val_reg = reg_state(env, value_regno);
+		if (!register_is_null(val_reg) &&
+		    map_kptr_match_type(env, off_desc, val_reg, value_regno))
+			return -EACCES;
+	} else if (class == BPF_ST) {
+		if (insn->imm) {
+			verbose(env, "BPF_ST imm must be 0 when storing to kptr at off=%u\n",
+				off_desc->offset);
+			return -EACCES;
+		}
+	} else {
+		verbose(env, "kptr in map can only be accessed using BPF_LDX/BPF_STX/BPF_ST\n");
+		return -EACCES;
+	}
+	return 0;
+}
+
 /* check read/write into a map element with possible variable offset */
 static int check_map_access(struct bpf_verifier_env *env, u32 regno,
-			    int off, int size, bool zero_size_allowed)
+			    int off, int size, bool zero_size_allowed,
+			    enum bpf_access_src src)
 {
 	struct bpf_verifier_state *vstate = env->cur_state;
 	struct bpf_func_state *state = vstate->frame[vstate->curframe];
@@ -3545,6 +3623,36 @@ static int check_map_access(struct bpf_verifier_env *env, u32 regno,
 			return -EACCES;
 		}
 	}
+	if (map_value_has_kptrs(map)) {
+		struct bpf_map_value_off *tab = map->kptr_off_tab;
+		int i;
+
+		for (i = 0; i < tab->nr_off; i++) {
+			u32 p = tab->off[i].offset;
+
+			if (reg->smin_value + off < p + sizeof(u64) &&
+			    p < reg->umax_value + off + size) {
+				if (src != ACCESS_DIRECT) {
+					verbose(env, "kptr cannot be accessed indirectly by helper\n");
+					return -EACCES;
+				}
+				if (!tnum_is_const(reg->var_off)) {
+					verbose(env, "kptr access cannot have variable offset\n");
+					return -EACCES;
+				}
+				if (p != off + reg->var_off.value) {
+					verbose(env, "kptr access misaligned expected=%u off=%llu\n",
+						p, off + reg->var_off.value);
+					return -EACCES;
+				}
+				if (size != bpf_size_to_bytes(BPF_DW)) {
+					verbose(env, "kptr access size must be BPF_DW\n");
+					return -EACCES;
+				}
+				break;
+			}
+		}
+	}
 	return err;
 }
 
@@ -4316,7 +4424,7 @@ static int check_stack_slot_within_bounds(int off,
 static int check_stack_access_within_bounds(
 		struct bpf_verifier_env *env,
 		int regno, int off, int access_size,
-		enum stack_access_src src, enum bpf_access_type type)
+		enum bpf_access_src src, enum bpf_access_type type)
 {
 	struct bpf_reg_state *regs = cur_regs(env);
 	struct bpf_reg_state *reg = regs + regno;
@@ -4412,6 +4520,8 @@ static int check_mem_access(struct bpf_verifier_env *env, int insn_idx, u32 regn
 		if (value_regno >= 0)
 			mark_reg_unknown(env, regs, value_regno);
 	} else if (reg->type == PTR_TO_MAP_VALUE) {
+		struct bpf_map_value_off_desc *kptr_off_desc = NULL;
+
 		if (t == BPF_WRITE && value_regno >= 0 &&
 		    is_pointer_value(env, value_regno)) {
 			verbose(env, "R%d leaks addr into map\n", value_regno);
@@ -4420,8 +4530,16 @@ static int check_mem_access(struct bpf_verifier_env *env, int insn_idx, u32 regn
 		err = check_map_access_type(env, regno, off, size, t);
 		if (err)
 			return err;
-		err = check_map_access(env, regno, off, size, false);
-		if (!err && t == BPF_READ && value_regno >= 0) {
+		err = check_map_access(env, regno, off, size, false, ACCESS_DIRECT);
+		if (err)
+			return err;
+		if (tnum_is_const(reg->var_off))
+			kptr_off_desc = bpf_map_kptr_off_contains(reg->map_ptr,
+								  off + reg->var_off.value);
+		if (kptr_off_desc) {
+			err = check_map_kptr_access(env, regno, value_regno, insn_idx,
+						    kptr_off_desc);
+		} else if (t == BPF_READ && value_regno >= 0) {
 			struct bpf_map *map = reg->map_ptr;
 
 			/* if map is read-only, track its contents as scalars */
@@ -4724,7 +4842,7 @@ static int check_atomic(struct bpf_verifier_env *env, int insn_idx, struct bpf_i
 static int check_stack_range_initialized(
 		struct bpf_verifier_env *env, int regno, int off,
 		int access_size, bool zero_size_allowed,
-		enum stack_access_src type, struct bpf_call_arg_meta *meta)
+		enum bpf_access_src type, struct bpf_call_arg_meta *meta)
 {
 	struct bpf_reg_state *reg = reg_state(env, regno);
 	struct bpf_func_state *state = func(env, reg);
@@ -4874,7 +4992,7 @@ static int check_helper_mem_access(struct bpf_verifier_env *env, int regno,
 					  BPF_READ))
 			return -EACCES;
 		return check_map_access(env, regno, reg->off, access_size,
-					zero_size_allowed);
+					zero_size_allowed, ACCESS_HELPER);
 	case PTR_TO_MEM:
 		if (type_is_rdonly_mem(reg->type)) {
 			if (meta && meta->raw_mode) {
@@ -5642,7 +5760,8 @@ static int check_func_arg(struct bpf_verifier_env *env, u32 arg,
 		}
 
 		err = check_map_access(env, regno, reg->off,
-				       map->value_size - reg->off, false);
+				       map->value_size - reg->off, false,
+				       ACCESS_HELPER);
 		if (err)
 			return err;
 
@@ -7462,7 +7581,7 @@ static int sanitize_check_bounds(struct bpf_verifier_env *env,
 			return -EACCES;
 		break;
 	case PTR_TO_MAP_VALUE:
-		if (check_map_access(env, dst, dst_reg->off, 1, false)) {
+		if (check_map_access(env, dst, dst_reg->off, 1, false, ACCESS_HELPER)) {
 			verbose(env, "R%d pointer arithmetic of map value goes out of range, "
 				"prohibited for !root\n", dst);
 			return -EACCES;
-- 
2.35.1



* [PATCH bpf-next v5 04/13] bpf: Tag argument to be released in bpf_func_proto
  2022-04-15 16:03 [PATCH bpf-next v5 00/13] Introduce typed pointer support in BPF maps Kumar Kartikeya Dwivedi
                   ` (2 preceding siblings ...)
  2022-04-15 16:03 ` [PATCH bpf-next v5 03/13] bpf: Allow storing unreferenced kptr in map Kumar Kartikeya Dwivedi
@ 2022-04-15 16:03 ` Kumar Kartikeya Dwivedi
  2022-04-21  4:19   ` Alexei Starovoitov
  2022-04-15 16:03 ` [PATCH bpf-next v5 05/13] bpf: Allow storing referenced kptr in map Kumar Kartikeya Dwivedi
                   ` (9 subsequent siblings)
  13 siblings, 1 reply; 30+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2022-04-15 16:03 UTC (permalink / raw)
  To: bpf
  Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Joanne Koong, Toke Høiland-Jørgensen,
	Jesper Dangaard Brouer

Add a new type flag for bpf_arg_type that, when set, tells the verifier that
for a release function, that argument's register will be the one for which
meta.ref_obj_id will be set, and which will then be released using
release_reference. To capture the regno, introduce a new field release_regno
in bpf_call_arg_meta.

This will be required in the next patch, where we may pass either NULL or a
refcounted pointer as an argument to the release function bpf_kptr_xchg.
Releasing only when meta.ref_obj_id is set is not enough, as there is a case
where the type of the argument matches but its ref_obj_id is 0. Hence, we must
enforce that whenever meta.ref_obj_id is zero, the register that is to be
released can only be NULL for a release function.

Since we now indicate in bpf_func_proto itself whether an argument is to be
released, the is_release_function helper has lost its utility; hence refactor
the code to work without it and rely solely on meta.release_regno to know when
to release state for a ref_obj_id. Still, the restriction of one release
argument and only one ref_obj_id passed to a BPF helper or kfunc remains. This
may be lifted in the future.
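
As an illustration of the new flag, the bpf_kptr_xchg proto introduced in the
next patch can then mark its to-be-released argument roughly as below (a
sketch only; the member values here are assumptions, see the next patch for
the actual definition):

	const struct bpf_func_proto bpf_kptr_xchg_proto = {
		.func		= bpf_kptr_xchg,
		.gpl_only	= false,
		.ret_type	= RET_PTR_TO_BTF_ID_OR_NULL,
		.arg1_type	= ARG_PTR_TO_KPTR,
		.arg2_type	= ARG_PTR_TO_BTF_ID_OR_NULL | PTR_RELEASE,
	};

With this, a NULL second argument is accepted, while any non-NULL pointer
must carry a ref_obj_id, matching the rule enforced above.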

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
---
 include/linux/bpf.h                           |  5 +-
 include/linux/bpf_verifier.h                  |  3 +-
 kernel/bpf/btf.c                              |  9 ++-
 kernel/bpf/ringbuf.c                          |  4 +-
 kernel/bpf/verifier.c                         | 76 +++++++++++--------
 net/core/filter.c                             |  2 +-
 .../selftests/bpf/verifier/ref_tracking.c     |  2 +-
 tools/testing/selftests/bpf/verifier/sock.c   |  6 +-
 8 files changed, 60 insertions(+), 47 deletions(-)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index ab86f4675db2..f73a3f10e654 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -366,7 +366,10 @@ enum bpf_type_flag {
 	 */
 	MEM_PERCPU		= BIT(4 + BPF_BASE_TYPE_BITS),
 
-	__BPF_TYPE_LAST_FLAG	= MEM_PERCPU,
+	/* Indicates that the pointer argument will be released. */
+	PTR_RELEASE		= BIT(5 + BPF_BASE_TYPE_BITS),
+
+	__BPF_TYPE_LAST_FLAG	= PTR_RELEASE,
 };
 
 /* Max number of base types. */
diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h
index 3a9d2d7cc6b7..1f1e7f2ea967 100644
--- a/include/linux/bpf_verifier.h
+++ b/include/linux/bpf_verifier.h
@@ -523,8 +523,7 @@ int check_ptr_off_reg(struct bpf_verifier_env *env,
 		      const struct bpf_reg_state *reg, int regno);
 int check_func_arg_reg_off(struct bpf_verifier_env *env,
 			   const struct bpf_reg_state *reg, int regno,
-			   enum bpf_arg_type arg_type,
-			   bool is_release_func);
+			   enum bpf_arg_type arg_type);
 int check_kfunc_mem_size_reg(struct bpf_verifier_env *env, struct bpf_reg_state *reg,
 			     u32 regno);
 int check_mem_reg(struct bpf_verifier_env *env, struct bpf_reg_state *reg,
diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c
index be191df76ea4..7227a77a02f7 100644
--- a/kernel/bpf/btf.c
+++ b/kernel/bpf/btf.c
@@ -5993,6 +5993,7 @@ static int btf_check_func_arg_match(struct bpf_verifier_env *env,
 	 * verifier sees.
 	 */
 	for (i = 0; i < nargs; i++) {
+		enum bpf_arg_type arg_type = ARG_DONTCARE;
 		u32 regno = i + 1;
 		struct bpf_reg_state *reg = &regs[regno];
 
@@ -6013,7 +6014,9 @@ static int btf_check_func_arg_match(struct bpf_verifier_env *env,
 		ref_t = btf_type_skip_modifiers(btf, t->type, &ref_id);
 		ref_tname = btf_name_by_offset(btf, ref_t->name_off);
 
-		ret = check_func_arg_reg_off(env, reg, regno, ARG_DONTCARE, rel);
+		if (rel && reg->ref_obj_id)
+			arg_type |= PTR_RELEASE;
+		ret = check_func_arg_reg_off(env, reg, regno, arg_type);
 		if (ret < 0)
 			return ret;
 
@@ -6046,9 +6049,7 @@ static int btf_check_func_arg_match(struct bpf_verifier_env *env,
 				reg_btf = reg->btf;
 				reg_ref_id = reg->btf_id;
 				/* Ensure only one argument is referenced
-				 * PTR_TO_BTF_ID, check_func_arg_reg_off relies
-				 * on only one referenced register being allowed
-				 * for kfuncs.
+				 * PTR_TO_BTF_ID.
 				 */
 				if (reg->ref_obj_id) {
 					if (ref_obj_id) {
diff --git a/kernel/bpf/ringbuf.c b/kernel/bpf/ringbuf.c
index 710ba9de12ce..a22c21c0a7ef 100644
--- a/kernel/bpf/ringbuf.c
+++ b/kernel/bpf/ringbuf.c
@@ -404,7 +404,7 @@ BPF_CALL_2(bpf_ringbuf_submit, void *, sample, u64, flags)
 const struct bpf_func_proto bpf_ringbuf_submit_proto = {
 	.func		= bpf_ringbuf_submit,
 	.ret_type	= RET_VOID,
-	.arg1_type	= ARG_PTR_TO_ALLOC_MEM,
+	.arg1_type	= ARG_PTR_TO_ALLOC_MEM | PTR_RELEASE,
 	.arg2_type	= ARG_ANYTHING,
 };
 
@@ -417,7 +417,7 @@ BPF_CALL_2(bpf_ringbuf_discard, void *, sample, u64, flags)
 const struct bpf_func_proto bpf_ringbuf_discard_proto = {
 	.func		= bpf_ringbuf_discard,
 	.ret_type	= RET_VOID,
-	.arg1_type	= ARG_PTR_TO_ALLOC_MEM,
+	.arg1_type	= ARG_PTR_TO_ALLOC_MEM | PTR_RELEASE,
 	.arg2_type	= ARG_ANYTHING,
 };
 
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index c802e51c4e18..97f88d06f848 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -245,6 +245,7 @@ struct bpf_call_arg_meta {
 	struct bpf_map *map_ptr;
 	bool raw_mode;
 	bool pkt_access;
+	u8 release_regno;
 	int regno;
 	int access_size;
 	int mem_size;
@@ -471,17 +472,6 @@ static bool type_may_be_null(u32 type)
 	return type & PTR_MAYBE_NULL;
 }
 
-/* Determine whether the function releases some resources allocated by another
- * function call. The first reference type argument will be assumed to be
- * released by release_reference().
- */
-static bool is_release_function(enum bpf_func_id func_id)
-{
-	return func_id == BPF_FUNC_sk_release ||
-	       func_id == BPF_FUNC_ringbuf_submit ||
-	       func_id == BPF_FUNC_ringbuf_discard;
-}
-
 static bool may_be_acquire_function(enum bpf_func_id func_id)
 {
 	return func_id == BPF_FUNC_sk_lookup_tcp ||
@@ -5304,6 +5294,11 @@ static bool arg_type_is_int_ptr(enum bpf_arg_type type)
 	       type == ARG_PTR_TO_LONG;
 }
 
+static bool arg_type_is_release_ptr(enum bpf_arg_type type)
+{
+	return type & PTR_RELEASE;
+}
+
 static int int_ptr_type_to_size(enum bpf_arg_type type)
 {
 	if (type == ARG_PTR_TO_INT)
@@ -5514,11 +5509,10 @@ static int check_reg_type(struct bpf_verifier_env *env, u32 regno,
 
 int check_func_arg_reg_off(struct bpf_verifier_env *env,
 			   const struct bpf_reg_state *reg, int regno,
-			   enum bpf_arg_type arg_type,
-			   bool is_release_func)
+			   enum bpf_arg_type arg_type)
 {
-	bool fixed_off_ok = false, release_reg;
 	enum bpf_reg_type type = reg->type;
+	bool fixed_off_ok = false;
 
 	switch ((u32)type) {
 	case SCALAR_VALUE:
@@ -5536,7 +5530,7 @@ int check_func_arg_reg_off(struct bpf_verifier_env *env,
 		/* Some of the argument types nevertheless require a
 		 * zero register offset.
 		 */
-		if (arg_type != ARG_PTR_TO_ALLOC_MEM)
+		if (base_type(arg_type) != ARG_PTR_TO_ALLOC_MEM)
 			return 0;
 		break;
 	/* All the rest must be rejected, except PTR_TO_BTF_ID which allows
@@ -5544,19 +5538,17 @@ int check_func_arg_reg_off(struct bpf_verifier_env *env,
 	 */
 	case PTR_TO_BTF_ID:
 		/* When referenced PTR_TO_BTF_ID is passed to release function,
-		 * it's fixed offset must be 0. We rely on the property that
-		 * only one referenced register can be passed to BPF helpers and
-		 * kfuncs. In the other cases, fixed offset can be non-zero.
+		 * its fixed offset must be 0. In the other cases, fixed offset
+		 * can be non-zero.
 		 */
-		release_reg = is_release_func && reg->ref_obj_id;
-		if (release_reg && reg->off) {
+		if (arg_type_is_release_ptr(arg_type) && reg->off) {
 			verbose(env, "R%d must have zero offset when passed to release func\n",
 				regno);
 			return -EINVAL;
 		}
-		/* For release_reg == true, fixed_off_ok must be false, but we
-		 * already checked and rejected reg->off != 0 above, so set to
-		 * true to allow fixed offset for all other cases.
+		/* For arg is release pointer, fixed_off_ok must be false, but
+		 * we already checked and rejected reg->off != 0 above, so set
+		 * to true to allow fixed offset for all other cases.
 		 */
 		fixed_off_ok = true;
 		break;
@@ -5615,14 +5607,24 @@ static int check_func_arg(struct bpf_verifier_env *env, u32 arg,
 	if (err)
 		return err;
 
-	err = check_func_arg_reg_off(env, reg, regno, arg_type, is_release_function(meta->func_id));
+	err = check_func_arg_reg_off(env, reg, regno, arg_type);
 	if (err)
 		return err;
 
 skip_type_check:
-	/* check_func_arg_reg_off relies on only one referenced register being
-	 * allowed for BPF helpers.
-	 */
+	if (arg_type_is_release_ptr(arg_type)) {
+		if (!reg->ref_obj_id && !register_is_null(reg)) {
+			verbose(env, "R%d must be referenced when passed to release function\n",
+				regno);
+			return -EINVAL;
+		}
+		if (meta->release_regno) {
+			verbose(env, "verifier internal error: more than one release argument\n");
+			return -EFAULT;
+		}
+		meta->release_regno = regno;
+	}
+
 	if (reg->ref_obj_id) {
 		if (meta->ref_obj_id) {
 			verbose(env, "verifier internal error: more than one arg with ref_obj_id R%d %u %u\n",
@@ -6129,7 +6131,8 @@ static bool check_btf_id_ok(const struct bpf_func_proto *fn)
 	return true;
 }
 
-static int check_func_proto(const struct bpf_func_proto *fn, int func_id)
+static int check_func_proto(const struct bpf_func_proto *fn, int func_id,
+			    struct bpf_call_arg_meta *meta)
 {
 	return check_raw_mode_ok(fn) &&
 	       check_arg_pair_ok(fn) &&
@@ -6813,7 +6816,7 @@ static int check_helper_call(struct bpf_verifier_env *env, struct bpf_insn *insn
 	memset(&meta, 0, sizeof(meta));
 	meta.pkt_access = fn->pkt_access;
 
-	err = check_func_proto(fn, func_id);
+	err = check_func_proto(fn, func_id, &meta);
 	if (err) {
 		verbose(env, "kernel subsystem misconfigured func %s#%d\n",
 			func_id_name(func_id), func_id);
@@ -6846,8 +6849,17 @@ static int check_helper_call(struct bpf_verifier_env *env, struct bpf_insn *insn
 			return err;
 	}
 
-	if (is_release_function(func_id)) {
-		err = release_reference(env, meta.ref_obj_id);
+	regs = cur_regs(env);
+
+	if (meta.release_regno) {
+		err = -EINVAL;
+		if (meta.ref_obj_id)
+			err = release_reference(env, meta.ref_obj_id);
+		/* meta.ref_obj_id can only be 0 if register that is meant to be
+		 * released is NULL, which must be > R0.
+		 */
+		else if (register_is_null(&regs[meta.release_regno]))
+			err = 0;
 		if (err) {
 			verbose(env, "func %s#%d reference has not been acquired before\n",
 				func_id_name(func_id), func_id);
@@ -6855,8 +6867,6 @@ static int check_helper_call(struct bpf_verifier_env *env, struct bpf_insn *insn
 		}
 	}
 
-	regs = cur_regs(env);
-
 	switch (func_id) {
 	case BPF_FUNC_tail_call:
 		err = check_reference_leak(env);
diff --git a/net/core/filter.c b/net/core/filter.c
index 143f442a9505..8eb01a997476 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -6621,7 +6621,7 @@ static const struct bpf_func_proto bpf_sk_release_proto = {
 	.func		= bpf_sk_release,
 	.gpl_only	= false,
 	.ret_type	= RET_INTEGER,
-	.arg1_type	= ARG_PTR_TO_BTF_ID_SOCK_COMMON,
+	.arg1_type	= ARG_PTR_TO_BTF_ID_SOCK_COMMON | PTR_RELEASE,
 };
 
 BPF_CALL_5(bpf_xdp_sk_lookup_udp, struct xdp_buff *, ctx,
diff --git a/tools/testing/selftests/bpf/verifier/ref_tracking.c b/tools/testing/selftests/bpf/verifier/ref_tracking.c
index fbd682520e47..57a83d763ec1 100644
--- a/tools/testing/selftests/bpf/verifier/ref_tracking.c
+++ b/tools/testing/selftests/bpf/verifier/ref_tracking.c
@@ -796,7 +796,7 @@
 	},
 	.prog_type = BPF_PROG_TYPE_SCHED_CLS,
 	.result = REJECT,
-	.errstr = "reference has not been acquired before",
+	.errstr = "R1 must be referenced when passed to release function",
 },
 {
 	/* !bpf_sk_fullsock(sk) is checked but !bpf_tcp_sock(sk) is not checked */
diff --git a/tools/testing/selftests/bpf/verifier/sock.c b/tools/testing/selftests/bpf/verifier/sock.c
index 86b24cad27a7..d11d0b28be41 100644
--- a/tools/testing/selftests/bpf/verifier/sock.c
+++ b/tools/testing/selftests/bpf/verifier/sock.c
@@ -417,7 +417,7 @@
 	},
 	.prog_type = BPF_PROG_TYPE_SCHED_CLS,
 	.result = REJECT,
-	.errstr = "reference has not been acquired before",
+	.errstr = "R1 must be referenced when passed to release function",
 },
 {
 	"bpf_sk_release(bpf_sk_fullsock(skb->sk))",
@@ -436,7 +436,7 @@
 	},
 	.prog_type = BPF_PROG_TYPE_SCHED_CLS,
 	.result = REJECT,
-	.errstr = "reference has not been acquired before",
+	.errstr = "R1 must be referenced when passed to release function",
 },
 {
 	"bpf_sk_release(bpf_tcp_sock(skb->sk))",
@@ -455,7 +455,7 @@
 	},
 	.prog_type = BPF_PROG_TYPE_SCHED_CLS,
 	.result = REJECT,
-	.errstr = "reference has not been acquired before",
+	.errstr = "R1 must be referenced when passed to release function",
 },
 {
 	"sk_storage_get(map, skb->sk, NULL, 0): value == NULL",
-- 
2.35.1


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH bpf-next v5 05/13] bpf: Allow storing referenced kptr in map
  2022-04-15 16:03 [PATCH bpf-next v5 00/13] Introduce typed pointer support in BPF maps Kumar Kartikeya Dwivedi
                   ` (3 preceding siblings ...)
  2022-04-15 16:03 ` [PATCH bpf-next v5 04/13] bpf: Tag argument to be released in bpf_func_proto Kumar Kartikeya Dwivedi
@ 2022-04-15 16:03 ` Kumar Kartikeya Dwivedi
  2022-04-21  4:21   ` Alexei Starovoitov
  2022-04-15 16:03 ` [PATCH bpf-next v5 06/13] bpf: Prevent escaping of kptr loaded from maps Kumar Kartikeya Dwivedi
                   ` (8 subsequent siblings)
  13 siblings, 1 reply; 30+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2022-04-15 16:03 UTC (permalink / raw)
  To: bpf
  Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Joanne Koong, Toke Høiland-Jørgensen,
	Jesper Dangaard Brouer

Extending the code in previous commits, introduce referenced kptr
support, which must be tagged using the 'kptr_ref' type tag instead.
Unlike unreferenced kptrs, referenced kptrs have many more restrictions.
In addition to the type matching, only the newly introduced bpf_kptr_xchg
helper is allowed to modify the map value at that offset. It transfers
the referenced pointer being stored into the map, releasing the
program's reference state for it, and returns the old value, creating
new reference state for the returned pointer.

Similar to the unreferenced pointer case, the return value here is also
PTR_TO_BTF_ID_OR_NULL. The reference for the returned pointer must
eventually either be released by calling the corresponding release
function, or be transferred into another map.

It is also allowed to call bpf_kptr_xchg with a NULL pointer, to clear
the value, and obtain the old value if any.
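
As a rough usage sketch (hedged: this is loosely modelled on the
selftests added later in this series, not on this patch itself; the
__kptr_ref macro, the array_map layout, and the prog_test_ref_kfunc
acquire/release test kfuncs are assumptions here, and bpf_kptr_xchg
needs an updated bpf_helper_defs.h to be visible to the program):

  #include <vmlinux.h>
  #include <bpf/bpf_helpers.h>

  /* Assumed convenience macro for the BTF type tag. */
  #define __kptr_ref __attribute__((btf_type_tag("kptr_ref")))

  struct map_value {
  	struct prog_test_ref_kfunc __kptr_ref *ref_ptr;
  };

  struct {
  	__uint(type, BPF_MAP_TYPE_ARRAY);
  	__uint(max_entries, 1);
  	__type(key, int);
  	__type(value, struct map_value);
  } array_map SEC(".maps");

  extern struct prog_test_ref_kfunc *
  bpf_kfunc_call_test_acquire(unsigned long *sp) __ksym;
  extern void bpf_kfunc_call_test_release(struct prog_test_ref_kfunc *p) __ksym;

  SEC("tc")
  int kptr_xchg_example(struct __sk_buff *ctx)
  {
  	struct prog_test_ref_kfunc *p, *old;
  	unsigned long arg = 0;
  	struct map_value *v;
  	int key = 0;

  	v = bpf_map_lookup_elem(&array_map, &key);
  	if (!v)
  		return 0;
  	p = bpf_kfunc_call_test_acquire(&arg);
  	if (!p)
  		return 0;
  	/* Move ownership of p into the map; the old value, if any, comes
  	 * back as a newly acquired reference and must be released (or
  	 * moved into a map) before exit.
  	 */
  	old = bpf_kptr_xchg(&v->ref_ptr, p);
  	if (old)
  		bpf_kfunc_call_test_release(old);
  	return 0;
  }

  char _license[] SEC("license") = "GPL";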

BPF_LDX, BPF_STX, and BPF_ST cannot access a referenced kptr. A future
commit will permit using BPF_LDX for such pointers, while attempting to
make it safe, since the lifetime of the object won't be guaranteed.

There are valid reasons to enforce the restriction of permitting only
bpf_kptr_xchg to operate on a referenced kptr. The pointer value must be
consistent in the face of concurrent modification, and any prior value
contained in the map must also be released before a new one is moved
into the map. To ensure proper transfer of ownership, bpf_kptr_xchg
returns the old value, which the verifier requires the user to either
free or move into another map, and releases the reference held for the
pointer being moved in.

In the future, direct BPF_XCHG instruction may also be permitted to work
like bpf_kptr_xchg helper.

Note that process_kptr_func doesn't have to call
check_helper_mem_access, since we already disallow rdonly/wronly flags
for the map (which is what check_map_access_type checks), and we already
ensure the PTR_TO_MAP_VALUE refers to a kptr by obtaining its off_desc,
so check_map_access is not required either.

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
---
 include/linux/bpf.h            |  8 +++
 include/uapi/linux/bpf.h       | 12 +++++
 kernel/bpf/btf.c               | 10 +++-
 kernel/bpf/helpers.c           | 21 ++++++++
 kernel/bpf/verifier.c          | 98 +++++++++++++++++++++++++++++-----
 tools/include/uapi/linux/bpf.h | 12 +++++
 6 files changed, 148 insertions(+), 13 deletions(-)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index f73a3f10e654..61f83a23980f 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -160,8 +160,14 @@ enum {
 	BPF_MAP_VALUE_OFF_MAX = 8,
 };
 
+enum bpf_map_off_desc_type {
+	BPF_MAP_OFF_DESC_TYPE_UNREF_KPTR,
+	BPF_MAP_OFF_DESC_TYPE_REF_KPTR,
+};
+
 struct bpf_map_value_off_desc {
 	u32 offset;
+	enum bpf_map_off_desc_type type;
 	struct {
 		struct btf *btf;
 		u32 btf_id;
@@ -418,6 +424,7 @@ enum bpf_arg_type {
 	ARG_PTR_TO_STACK,	/* pointer to stack */
 	ARG_PTR_TO_CONST_STR,	/* pointer to a null terminated read-only string */
 	ARG_PTR_TO_TIMER,	/* pointer to bpf_timer */
+	ARG_PTR_TO_KPTR,	/* pointer to referenced kptr */
 	__BPF_ARG_TYPE_MAX,
 
 	/* Extended arg_types. */
@@ -427,6 +434,7 @@ enum bpf_arg_type {
 	ARG_PTR_TO_SOCKET_OR_NULL	= PTR_MAYBE_NULL | ARG_PTR_TO_SOCKET,
 	ARG_PTR_TO_ALLOC_MEM_OR_NULL	= PTR_MAYBE_NULL | ARG_PTR_TO_ALLOC_MEM,
 	ARG_PTR_TO_STACK_OR_NULL	= PTR_MAYBE_NULL | ARG_PTR_TO_STACK,
+	ARG_PTR_TO_BTF_ID_OR_NULL	= PTR_MAYBE_NULL | ARG_PTR_TO_BTF_ID,
 
 	/* This must be the last entry. Its purpose is to ensure the enum is
 	 * wide enough to hold the higher bits reserved for bpf_type_flag.
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index d14b10b85e51..444fe6f1cf35 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -5143,6 +5143,17 @@ union bpf_attr {
  *		The **hash_algo** is returned on success,
  *		**-EOPNOTSUP** if the hash calculation failed or **-EINVAL** if
  *		invalid arguments are passed.
+ *
+ * void *bpf_kptr_xchg(void *map_value, void *ptr)
+ *	Description
+ *		Exchange kptr at pointer *map_value* with *ptr*, and return the
+ *		old value. *ptr* can be NULL, otherwise it must be a referenced
+ *		pointer which will be released when this helper is called.
+ *	Return
+ *		The old value of kptr (which can be NULL). The returned pointer
+ *		if not NULL, is a reference which must be released using its
+ *		corresponding release function, or moved into a BPF map before
+ *		program exit.
  */
 #define __BPF_FUNC_MAPPER(FN)		\
 	FN(unspec),			\
@@ -5339,6 +5350,7 @@ union bpf_attr {
 	FN(copy_from_user_task),	\
 	FN(skb_set_tstamp),		\
 	FN(ima_file_hash),		\
+	FN(kptr_xchg),			\
 	/* */
 
 /* integer value in 'imm' field of BPF_CALL instruction selects which helper
diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c
index 7227a77a02f7..0c5559157c77 100644
--- a/kernel/bpf/btf.c
+++ b/kernel/bpf/btf.c
@@ -3177,6 +3177,7 @@ enum {
 struct btf_field_info {
 	u32 type_id;
 	u32 off;
+	enum bpf_map_off_desc_type type;
 };
 
 static int btf_find_struct(const struct btf *btf, const struct btf_type *t,
@@ -3193,6 +3194,7 @@ static int btf_find_struct(const struct btf *btf, const struct btf_type *t,
 static int btf_find_kptr(const struct btf *btf, const struct btf_type *t,
 			 u32 off, int sz, struct btf_field_info *info)
 {
+	enum bpf_map_off_desc_type type;
 	u32 res_id;
 
 	/* For PTR, sz is always == 8 */
@@ -3205,7 +3207,11 @@ static int btf_find_kptr(const struct btf *btf, const struct btf_type *t,
 	/* Reject extra tags */
 	if (btf_type_is_type_tag(btf_type_by_id(btf, t->type)))
 		return -EINVAL;
-	if (strcmp("kptr", __btf_name_by_offset(btf, t->name_off)))
+	if (!strcmp("kptr", __btf_name_by_offset(btf, t->name_off)))
+		type = BPF_MAP_OFF_DESC_TYPE_UNREF_KPTR;
+	else if (!strcmp("kptr_ref", __btf_name_by_offset(btf, t->name_off)))
+		type = BPF_MAP_OFF_DESC_TYPE_REF_KPTR;
+	else
 		return -EINVAL;
 
 	/* Get the base type */
@@ -3216,6 +3222,7 @@ static int btf_find_kptr(const struct btf *btf, const struct btf_type *t,
 
 	info->type_id = res_id;
 	info->off = off;
+	info->type = type;
 	return BTF_FIELD_FOUND;
 }
 
@@ -3420,6 +3427,7 @@ struct bpf_map_value_off *btf_parse_kptrs(const struct btf *btf,
 		}
 
 		tab->off[i].offset = info_arr[i].off;
+		tab->off[i].type = info_arr[i].type;
 		tab->off[i].kptr.btf_id = id;
 		tab->off[i].kptr.btf = off_btf;
 	}
diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
index 315053ef6a75..a437d0f0458a 100644
--- a/kernel/bpf/helpers.c
+++ b/kernel/bpf/helpers.c
@@ -1374,6 +1374,25 @@ void bpf_timer_cancel_and_free(void *val)
 	kfree(t);
 }
 
+BPF_CALL_2(bpf_kptr_xchg, void *, map_value, void *, ptr)
+{
+	unsigned long *kptr = map_value;
+
+	return xchg(kptr, (unsigned long)ptr);
+}
+
+static u32 bpf_kptr_xchg_btf_id;
+
+const struct bpf_func_proto bpf_kptr_xchg_proto = {
+	.func         = bpf_kptr_xchg,
+	.gpl_only     = false,
+	.ret_type     = RET_PTR_TO_BTF_ID_OR_NULL,
+	.ret_btf_id   = &bpf_kptr_xchg_btf_id,
+	.arg1_type    = ARG_PTR_TO_KPTR,
+	.arg2_type    = ARG_PTR_TO_BTF_ID_OR_NULL | PTR_RELEASE,
+	.arg2_btf_id  = &bpf_kptr_xchg_btf_id,
+};
+
 const struct bpf_func_proto bpf_get_current_task_proto __weak;
 const struct bpf_func_proto bpf_get_current_task_btf_proto __weak;
 const struct bpf_func_proto bpf_probe_read_user_proto __weak;
@@ -1452,6 +1471,8 @@ bpf_base_func_proto(enum bpf_func_id func_id)
 		return &bpf_timer_start_proto;
 	case BPF_FUNC_timer_cancel:
 		return &bpf_timer_cancel_proto;
+	case BPF_FUNC_kptr_xchg:
+		return &bpf_kptr_xchg_proto;
 	default:
 		break;
 	}
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 97f88d06f848..aa5c0d1c8495 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -258,6 +258,7 @@ struct bpf_call_arg_meta {
 	struct btf *ret_btf;
 	u32 ret_btf_id;
 	u32 subprogno;
+	struct bpf_map_value_off_desc *kptr_off_desc;
 };
 
 struct btf *btf_vmlinux;
@@ -489,7 +490,8 @@ static bool is_acquire_function(enum bpf_func_id func_id,
 	if (func_id == BPF_FUNC_sk_lookup_tcp ||
 	    func_id == BPF_FUNC_sk_lookup_udp ||
 	    func_id == BPF_FUNC_skc_lookup_tcp ||
-	    func_id == BPF_FUNC_ringbuf_reserve)
+	    func_id == BPF_FUNC_ringbuf_reserve ||
+	    func_id == BPF_FUNC_kptr_xchg)
 		return true;
 
 	if (func_id == BPF_FUNC_map_lookup_elem &&
@@ -3514,6 +3516,12 @@ static int map_kptr_match_type(struct bpf_verifier_env *env,
 	/* We need to verify reg->type and reg->btf, before accessing reg->btf */
 	reg_name = kernel_type_name(reg->btf, reg->btf_id);
 
+	/* For ref_ptr case, release function check should ensure we get one
+	 * referenced PTR_TO_BTF_ID, and that its fixed offset is 0. For the
+	 * normal store of unreferenced kptr, we must ensure var_off is zero.
+	 * Since ref_ptr cannot be accessed directly by BPF insns, checks for
+	 * reg->off and reg->ref_obj_id are not needed here.
+	 */
 	if (__check_ptr_off_reg(env, reg, regno, true))
 		return -EACCES;
 
@@ -3548,6 +3556,12 @@ static int check_map_kptr_access(struct bpf_verifier_env *env, u32 regno,
 		return -EACCES;
 	}
 
+	/* We cannot directly access kptr_ref */
+	if (off_desc->type == BPF_MAP_OFF_DESC_TYPE_REF_KPTR) {
+		verbose(env, "accessing referenced kptr disallowed\n");
+		return -EACCES;
+	}
+
 	if (class == BPF_LDX) {
 		val_reg = reg_state(env, value_regno);
 		/* We can simply mark the value_regno receiving the pointer
@@ -5271,6 +5285,53 @@ static int process_timer_func(struct bpf_verifier_env *env, int regno,
 	return 0;
 }
 
+static int process_kptr_func(struct bpf_verifier_env *env, int regno,
+			     struct bpf_call_arg_meta *meta)
+{
+	struct bpf_reg_state *regs = cur_regs(env), *reg = &regs[regno];
+	struct bpf_map_value_off_desc *off_desc;
+	struct bpf_map *map_ptr = reg->map_ptr;
+	u32 kptr_off;
+	int ret;
+
+	if (!tnum_is_const(reg->var_off)) {
+		verbose(env,
+			"R%d doesn't have constant offset. kptr has to be at the constant offset\n",
+			regno);
+		return -EINVAL;
+	}
+	if (!map_ptr->btf) {
+		verbose(env, "map '%s' has to have BTF in order to use bpf_kptr_xchg\n",
+			map_ptr->name);
+		return -EINVAL;
+	}
+	if (!map_value_has_kptrs(map_ptr)) {
+		ret = PTR_ERR(map_ptr->kptr_off_tab);
+		if (ret == -E2BIG)
+			verbose(env, "map '%s' has more than %d kptr\n", map_ptr->name,
+				BPF_MAP_VALUE_OFF_MAX);
+		else if (ret == -EEXIST)
+			verbose(env, "map '%s' has repeating kptr BTF tags\n", map_ptr->name);
+		else
+			verbose(env, "map '%s' has no valid kptr\n", map_ptr->name);
+		return -EINVAL;
+	}
+
+	meta->map_ptr = map_ptr;
+	kptr_off = reg->off + reg->var_off.value;
+	off_desc = bpf_map_kptr_off_contains(map_ptr, kptr_off);
+	if (!off_desc) {
+		verbose(env, "off=%d doesn't point to kptr\n", kptr_off);
+		return -EACCES;
+	}
+	if (off_desc->type != BPF_MAP_OFF_DESC_TYPE_REF_KPTR) {
+		verbose(env, "off=%d kptr isn't referenced kptr\n", kptr_off);
+		return -EACCES;
+	}
+	meta->kptr_off_desc = off_desc;
+	return 0;
+}
+
 static bool arg_type_is_mem_ptr(enum bpf_arg_type type)
 {
 	return base_type(type) == ARG_PTR_TO_MEM ||
@@ -5411,6 +5472,7 @@ static const struct bpf_reg_types func_ptr_types = { .types = { PTR_TO_FUNC } };
 static const struct bpf_reg_types stack_ptr_types = { .types = { PTR_TO_STACK } };
 static const struct bpf_reg_types const_str_ptr_types = { .types = { PTR_TO_MAP_VALUE } };
 static const struct bpf_reg_types timer_types = { .types = { PTR_TO_MAP_VALUE } };
+static const struct bpf_reg_types kptr_types = { .types = { PTR_TO_MAP_VALUE } };
 
 static const struct bpf_reg_types *compatible_reg_types[__BPF_ARG_TYPE_MAX] = {
 	[ARG_PTR_TO_MAP_KEY]		= &map_key_value_types,
@@ -5438,11 +5500,13 @@ static const struct bpf_reg_types *compatible_reg_types[__BPF_ARG_TYPE_MAX] = {
 	[ARG_PTR_TO_STACK]		= &stack_ptr_types,
 	[ARG_PTR_TO_CONST_STR]		= &const_str_ptr_types,
 	[ARG_PTR_TO_TIMER]		= &timer_types,
+	[ARG_PTR_TO_KPTR]		= &kptr_types,
 };
 
 static int check_reg_type(struct bpf_verifier_env *env, u32 regno,
 			  enum bpf_arg_type arg_type,
-			  const u32 *arg_btf_id)
+			  const u32 *arg_btf_id,
+			  struct bpf_call_arg_meta *meta)
 {
 	struct bpf_reg_state *regs = cur_regs(env), *reg = &regs[regno];
 	enum bpf_reg_type expected, type = reg->type;
@@ -5495,8 +5559,11 @@ static int check_reg_type(struct bpf_verifier_env *env, u32 regno,
 			arg_btf_id = compatible->btf_id;
 		}
 
-		if (!btf_struct_ids_match(&env->log, reg->btf, reg->btf_id, reg->off,
-					  btf_vmlinux, *arg_btf_id)) {
+		if (meta->func_id == BPF_FUNC_kptr_xchg) {
+			if (map_kptr_match_type(env, meta->kptr_off_desc, reg, regno))
+				return -EACCES;
+		} else if (!btf_struct_ids_match(&env->log, reg->btf, reg->btf_id, reg->off,
+						 btf_vmlinux, *arg_btf_id)) {
 			verbose(env, "R%d is of type %s but %s is expected\n",
 				regno, kernel_type_name(reg->btf, reg->btf_id),
 				kernel_type_name(btf_vmlinux, *arg_btf_id));
@@ -5603,7 +5670,7 @@ static int check_func_arg(struct bpf_verifier_env *env, u32 arg,
 		 */
 		goto skip_type_check;
 
-	err = check_reg_type(env, regno, arg_type, fn->arg_btf_id[arg]);
+	err = check_reg_type(env, regno, arg_type, fn->arg_btf_id[arg], meta);
 	if (err)
 		return err;
 
@@ -5779,6 +5846,9 @@ static int check_func_arg(struct bpf_verifier_env *env, u32 arg,
 			verbose(env, "string is not zero-terminated\n");
 			return -EINVAL;
 		}
+	} else if (arg_type == ARG_PTR_TO_KPTR) {
+		if (process_kptr_func(env, regno, meta))
+			return -EACCES;
 	}
 
 	return err;
@@ -6121,10 +6191,10 @@ static bool check_btf_id_ok(const struct bpf_func_proto *fn)
 	int i;
 
 	for (i = 0; i < ARRAY_SIZE(fn->arg_type); i++) {
-		if (fn->arg_type[i] == ARG_PTR_TO_BTF_ID && !fn->arg_btf_id[i])
+		if (base_type(fn->arg_type[i]) == ARG_PTR_TO_BTF_ID && !fn->arg_btf_id[i])
 			return false;
 
-		if (fn->arg_type[i] != ARG_PTR_TO_BTF_ID && fn->arg_btf_id[i])
+		if (base_type(fn->arg_type[i]) != ARG_PTR_TO_BTF_ID && fn->arg_btf_id[i])
 			return false;
 	}
 
@@ -6990,21 +7060,25 @@ static int check_helper_call(struct bpf_verifier_env *env, struct bpf_insn *insn
 			regs[BPF_REG_0].btf_id = meta.ret_btf_id;
 		}
 	} else if (base_type(ret_type) == RET_PTR_TO_BTF_ID) {
+		struct btf *ret_btf;
 		int ret_btf_id;
 
 		mark_reg_known_zero(env, regs, BPF_REG_0);
 		regs[BPF_REG_0].type = PTR_TO_BTF_ID | ret_flag;
-		ret_btf_id = *fn->ret_btf_id;
+		if (func_id == BPF_FUNC_kptr_xchg) {
+			ret_btf = meta.kptr_off_desc->kptr.btf;
+			ret_btf_id = meta.kptr_off_desc->kptr.btf_id;
+		} else {
+			ret_btf = btf_vmlinux;
+			ret_btf_id = *fn->ret_btf_id;
+		}
 		if (ret_btf_id == 0) {
 			verbose(env, "invalid return type %u of func %s#%d\n",
 				base_type(ret_type), func_id_name(func_id),
 				func_id);
 			return -EINVAL;
 		}
-		/* current BPF helper definitions are only coming from
-		 * built-in code with type IDs from  vmlinux BTF
-		 */
-		regs[BPF_REG_0].btf = btf_vmlinux;
+		regs[BPF_REG_0].btf = ret_btf;
 		regs[BPF_REG_0].btf_id = ret_btf_id;
 	} else {
 		verbose(env, "unknown return type %u of func %s#%d\n",
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index d14b10b85e51..444fe6f1cf35 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -5143,6 +5143,17 @@ union bpf_attr {
  *		The **hash_algo** is returned on success,
  *		**-EOPNOTSUP** if the hash calculation failed or **-EINVAL** if
  *		invalid arguments are passed.
+ *
+ * void *bpf_kptr_xchg(void *map_value, void *ptr)
+ *	Description
+ *		Exchange kptr at pointer *map_value* with *ptr*, and return the
+ *		old value. *ptr* can be NULL, otherwise it must be a referenced
+ *		pointer which will be released when this helper is called.
+ *	Return
+ *		The old value of kptr (which can be NULL). The returned pointer
+ *		if not NULL, is a reference which must be released using its
+ *		corresponding release function, or moved into a BPF map before
+ *		program exit.
  */
 #define __BPF_FUNC_MAPPER(FN)		\
 	FN(unspec),			\
@@ -5339,6 +5350,7 @@ union bpf_attr {
 	FN(copy_from_user_task),	\
 	FN(skb_set_tstamp),		\
 	FN(ima_file_hash),		\
+	FN(kptr_xchg),			\
 	/* */
 
 /* integer value in 'imm' field of BPF_CALL instruction selects which helper
-- 
2.35.1


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH bpf-next v5 06/13] bpf: Prevent escaping of kptr loaded from maps
  2022-04-15 16:03 [PATCH bpf-next v5 00/13] Introduce typed pointer support in BPF maps Kumar Kartikeya Dwivedi
                   ` (4 preceding siblings ...)
  2022-04-15 16:03 ` [PATCH bpf-next v5 05/13] bpf: Allow storing referenced kptr in map Kumar Kartikeya Dwivedi
@ 2022-04-15 16:03 ` Kumar Kartikeya Dwivedi
  2022-04-18 23:48   ` Joanne Koong
  2022-04-15 16:03 ` [PATCH bpf-next v5 07/13] bpf: Adapt copy_map_value for multiple offset case Kumar Kartikeya Dwivedi
                   ` (7 subsequent siblings)
  13 siblings, 1 reply; 30+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2022-04-15 16:03 UTC (permalink / raw)
  To: bpf
  Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Joanne Koong, Toke Høiland-Jørgensen,
	Jesper Dangaard Brouer

While we can guarantee that, even for an unreferenced kptr, cases like
the object it points to being freed can be handled by the verifier's
exception handling (normal loads are patched to PROBE_MEM loads), we
still cannot allow the user to pass these pointers to BPF helpers and
kfuncs, because the same exception handling won't be done for accesses
inside the kernel. The same is true if a referenced pointer is loaded
using a normal load instruction. Since the reference is not guaranteed
to be held while the pointer is used, it must be marked as untrusted.

Hence introduce a new type flag, PTR_UNTRUSTED, which is used to mark
all registers that load an unreferenced or referenced kptr from a BPF
map, and ensure they can never escape the BPF program into the kernel
by way of calling stable/unstable helpers.

In check_ptr_to_btf_access, the !type_may_be_null check to reject type
flags is still correct, as apart from PTR_MAYBE_NULL, only MEM_USER,
MEM_PERCPU, and PTR_UNTRUSTED may be set for PTR_TO_BTF_ID. The first
two are checked inside the function and rejected with a proper error
message, but we still want to allow dereference in the untrusted case.

Also, we make sure to inherit PTR_UNTRUSTED when a chain of pointers is
walked, so that this flag is never dropped once it has been set on a
PTR_TO_BTF_ID (i.e. the trusted to untrusted transition can only go in
one direction).

In convert_ctx_accesses, extend the switch case to consider untrusted
PTR_TO_BTF_ID in addition to normal PTR_TO_BTF_ID for PROBE_MEM
conversion for BPF_LDX.
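
To illustrate what this enables and forbids, a minimal sketch (the map,
the __kptr tag macro, and the field choices are assumptions for
illustration, not part of this patch):

  #include <vmlinux.h>
  #include <bpf/bpf_helpers.h>

  /* Assumed convenience macro for the BTF type tag. */
  #define __kptr __attribute__((btf_type_tag("kptr")))

  struct map_value {
  	struct task_struct __kptr *ptr;	/* unreferenced kptr */
  };

  struct {
  	__uint(type, BPF_MAP_TYPE_ARRAY);
  	__uint(max_entries, 1);
  	__type(key, int);
  	__type(value, struct map_value);
  } array_map SEC(".maps");

  SEC("tc")
  int deref_only(struct __sk_buff *ctx)
  {
  	struct task_struct *t, *leader;
  	struct map_value *v;
  	int key = 0;

  	v = bpf_map_lookup_elem(&array_map, &key);
  	if (!v)
  		return 0;
  	/* BPF_LDX: t is PTR_TO_BTF_ID | PTR_MAYBE_NULL | PTR_UNTRUSTED */
  	t = v->ptr;
  	if (!t)
  		return 0;
  	/* Dereference is allowed; faults are handled via PROBE_MEM. */
  	bpf_printk("pid=%d", t->pid);
  	/* Pointer walk inherits PTR_UNTRUSTED. */
  	leader = t->group_leader;
  	bpf_printk("leader pid=%d", leader->pid);
  	/* Passing t (or leader) to a helper or kfunc expecting a trusted
  	 * PTR_TO_BTF_ID would be rejected by the verifier.
  	 */
  	return 0;
  }

  char _license[] SEC("license") = "GPL";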

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
---
 include/linux/bpf.h   | 10 +++++++++-
 kernel/bpf/verifier.c | 35 ++++++++++++++++++++++++++++-------
 2 files changed, 37 insertions(+), 8 deletions(-)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 61f83a23980f..7e2ac2a26bdb 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -375,7 +375,15 @@ enum bpf_type_flag {
 	/* Indicates that the pointer argument will be released. */
 	PTR_RELEASE		= BIT(5 + BPF_BASE_TYPE_BITS),
 
-	__BPF_TYPE_LAST_FLAG	= PTR_RELEASE,
+	/* PTR is not trusted. This is only used with PTR_TO_BTF_ID, to mark
+	 * unreferenced and referenced kptr loaded from map value using a load
+	 * instruction, so that they can only be dereferenced but not escape the
+	 * BPF program into the kernel (i.e. cannot be passed as arguments to
+	 * kfunc or bpf helpers).
+	 */
+	PTR_UNTRUSTED		= BIT(6 + BPF_BASE_TYPE_BITS),
+
+	__BPF_TYPE_LAST_FLAG	= PTR_UNTRUSTED,
 };
 
 /* Max number of base types. */
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index aa5c0d1c8495..3b89dc8d41ce 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -567,6 +567,8 @@ static const char *reg_type_str(struct bpf_verifier_env *env,
 		strncpy(prefix, "user_", 32);
 	if (type & MEM_PERCPU)
 		strncpy(prefix, "percpu_", 32);
+	if (type & PTR_UNTRUSTED)
+		strncpy(prefix, "untrusted_", 32);
 
 	snprintf(env->type_str_buf, TYPE_STR_BUF_LEN, "%s%s%s",
 		 prefix, str[base_type(type)], postfix);
@@ -3504,9 +3506,14 @@ static int map_kptr_match_type(struct bpf_verifier_env *env,
 			       struct bpf_reg_state *reg, u32 regno)
 {
 	const char *targ_name = kernel_type_name(off_desc->kptr.btf, off_desc->kptr.btf_id);
+	int perm_flags = PTR_MAYBE_NULL;
 	const char *reg_name = "";
 
-	if (base_type(reg->type) != PTR_TO_BTF_ID || type_flag(reg->type) != PTR_MAYBE_NULL)
+	/* Only unreferenced case accepts untrusted pointers */
+	if (off_desc->type == BPF_MAP_OFF_DESC_TYPE_UNREF_KPTR)
+		perm_flags |= PTR_UNTRUSTED;
+
+	if (base_type(reg->type) != PTR_TO_BTF_ID || (type_flag(reg->type) & ~perm_flags))
 		goto bad_type;
 
 	if (!btf_is_kernel(reg->btf)) {
@@ -3532,7 +3539,12 @@ static int map_kptr_match_type(struct bpf_verifier_env *env,
 bad_type:
 	verbose(env, "invalid kptr access, R%d type=%s%s ", regno,
 		reg_type_str(env, reg->type), reg_name);
-	verbose(env, "expected=%s%s\n", reg_type_str(env, PTR_TO_BTF_ID), targ_name);
+	verbose(env, "expected=%s%s", reg_type_str(env, PTR_TO_BTF_ID), targ_name);
+	if (off_desc->type == BPF_MAP_OFF_DESC_TYPE_UNREF_KPTR)
+		verbose(env, " or %s%s\n", reg_type_str(env, PTR_TO_BTF_ID | PTR_UNTRUSTED),
+			targ_name);
+	else
+		verbose(env, "\n");
 	return -EINVAL;
 }
 
@@ -3556,9 +3568,11 @@ static int check_map_kptr_access(struct bpf_verifier_env *env, u32 regno,
 		return -EACCES;
 	}
 
-	/* We cannot directly access kptr_ref */
-	if (off_desc->type == BPF_MAP_OFF_DESC_TYPE_REF_KPTR) {
-		verbose(env, "accessing referenced kptr disallowed\n");
+	/* We only allow loading referenced kptr, since it will be marked as
+	 * untrusted, similar to unreferenced kptr.
+	 */
+	if (class != BPF_LDX && off_desc->type == BPF_MAP_OFF_DESC_TYPE_REF_KPTR) {
+		verbose(env, "store to referenced kptr disallowed\n");
 		return -EACCES;
 	}
 
@@ -3568,7 +3582,7 @@ static int check_map_kptr_access(struct bpf_verifier_env *env, u32 regno,
 		 * value from map as PTR_TO_BTF_ID, with the correct type.
 		 */
 		mark_btf_ld_reg(env, cur_regs(env), value_regno, PTR_TO_BTF_ID, off_desc->kptr.btf,
-				off_desc->kptr.btf_id, PTR_MAYBE_NULL);
+				off_desc->kptr.btf_id, PTR_MAYBE_NULL | PTR_UNTRUSTED);
 		val_reg->id = ++env->id_gen;
 	} else if (class == BPF_STX) {
 		val_reg = reg_state(env, value_regno);
@@ -4336,6 +4350,12 @@ static int check_ptr_to_btf_access(struct bpf_verifier_env *env,
 	if (ret < 0)
 		return ret;
 
+	/* If this is an untrusted pointer, all pointers formed by walking it
+	 * also inherit the untrusted flag.
+	 */
+	if (type_flag(reg->type) & PTR_UNTRUSTED)
+		flag |= PTR_UNTRUSTED;
+
 	if (atype == BPF_READ && value_regno >= 0)
 		mark_btf_ld_reg(env, regs, value_regno, ret, reg->btf, btf_id, flag);
 
@@ -13054,7 +13074,7 @@ static int convert_ctx_accesses(struct bpf_verifier_env *env)
 		if (!ctx_access)
 			continue;
 
-		switch (env->insn_aux_data[i + delta].ptr_type) {
+		switch ((int)env->insn_aux_data[i + delta].ptr_type) {
 		case PTR_TO_CTX:
 			if (!ops->convert_ctx_access)
 				continue;
@@ -13071,6 +13091,7 @@ static int convert_ctx_accesses(struct bpf_verifier_env *env)
 			convert_ctx_access = bpf_xdp_sock_convert_ctx_access;
 			break;
 		case PTR_TO_BTF_ID:
+		case PTR_TO_BTF_ID | PTR_UNTRUSTED:
 			if (type == BPF_READ) {
 				insn->code = BPF_LDX | BPF_PROBE_MEM |
 					BPF_SIZE((insn)->code);
-- 
2.35.1


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH bpf-next v5 07/13] bpf: Adapt copy_map_value for multiple offset case
  2022-04-15 16:03 [PATCH bpf-next v5 00/13] Introduce typed pointer support in BPF maps Kumar Kartikeya Dwivedi
                   ` (5 preceding siblings ...)
  2022-04-15 16:03 ` [PATCH bpf-next v5 06/13] bpf: Prevent escaping of kptr loaded from maps Kumar Kartikeya Dwivedi
@ 2022-04-15 16:03 ` Kumar Kartikeya Dwivedi
  2022-04-15 16:03 ` [PATCH bpf-next v5 08/13] bpf: Populate pairs of btf_id and destructor kfunc in btf Kumar Kartikeya Dwivedi
                   ` (6 subsequent siblings)
  13 siblings, 0 replies; 30+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2022-04-15 16:03 UTC (permalink / raw)
  To: bpf
  Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Joanne Koong, Toke Høiland-Jørgensen,
	Jesper Dangaard Brouer

Since there can now be at most 10 offsets that need handling in
copy_map_value, the manual shuffling and special casing is no longer
going to work. Hence, generalise the copy_map_value function by using a
sorted array of offsets to skip regions that must be avoided while
copying into and out of a map value.

When the map is created, we populate the offset array in struct
bpf_map. Then, copy_map_value uses this sorted offset array to memcpy
while skipping the timer, spin lock, and kptr fields. The array is
allocated separately because in most cases none of these special fields
is present in the map value, so we save space in the common case by not
embedding the entire object inside the bpf_map struct.
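
As a worked example (a hypothetical map value layout on a 64-bit
kernel; the __kptr tag macro is an assumed convenience define):

  struct map_value {
  	struct bpf_spin_lock lock;	/* offset  0, size  4, skipped */
  	long counter;			/* offset  8, copied           */
  	struct bpf_timer timer;		/* offset 16, size 16, skipped */
  	struct task_struct __kptr *t;	/* offset 32, size  8, skipped */
  	char data[24];			/* offset 40, copied           */
  };

  /* bpf_map_alloc_off_arr() would produce the sorted array:
   *	off_arr->cnt       = 3
   *	off_arr->field_off = {  0, 16, 32 }
   *	off_arr->field_sz  = {  4, 16,  8 }
   * and copy_map_value() then only memcpy()s the gaps [4, 16) and
   * [40, 64), leaving the spin lock, timer and kptr untouched.
   */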

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
---
 include/linux/bpf.h  | 56 +++++++++++++++-------------
 kernel/bpf/syscall.c | 88 +++++++++++++++++++++++++++++++++++++++++++-
 2 files changed, 117 insertions(+), 27 deletions(-)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 7e2ac2a26bdb..165d2a38eb97 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -158,6 +158,9 @@ struct bpf_map_ops {
 enum {
 	/* Support at most 8 pointers in a BPF map value */
 	BPF_MAP_VALUE_OFF_MAX = 8,
+	BPF_MAP_OFF_ARR_MAX   = BPF_MAP_VALUE_OFF_MAX +
+				1 + /* for bpf_spin_lock */
+				1,  /* for bpf_timer */
 };
 
 enum bpf_map_off_desc_type {
@@ -179,6 +182,12 @@ struct bpf_map_value_off {
 	struct bpf_map_value_off_desc off[];
 };
 
+struct bpf_map_off_arr {
+	u32 cnt;
+	u32 field_off[BPF_MAP_OFF_ARR_MAX];
+	u8 field_sz[BPF_MAP_OFF_ARR_MAX];
+};
+
 struct bpf_map {
 	/* The first two cachelines with read-mostly members of which some
 	 * are also accessed in fast-path (e.g. ops, max_entries).
@@ -207,10 +216,7 @@ struct bpf_map {
 	struct mem_cgroup *memcg;
 #endif
 	char name[BPF_OBJ_NAME_LEN];
-	bool bypass_spec_v1;
-	bool frozen; /* write-once; write-protected by freeze_mutex */
-	/* 6 bytes hole */
-
+	struct bpf_map_off_arr *off_arr;
 	/* The 3rd and 4th cacheline with misc members to avoid false sharing
 	 * particularly with refcounting.
 	 */
@@ -230,6 +236,8 @@ struct bpf_map {
 		bool jited;
 		bool xdp_has_frags;
 	} owner;
+	bool bypass_spec_v1;
+	bool frozen; /* write-once; write-protected by freeze_mutex */
 };
 
 static inline bool map_value_has_spin_lock(const struct bpf_map *map)
@@ -253,37 +261,33 @@ static inline void check_and_init_map_value(struct bpf_map *map, void *dst)
 		memset(dst + map->spin_lock_off, 0, sizeof(struct bpf_spin_lock));
 	if (unlikely(map_value_has_timer(map)))
 		memset(dst + map->timer_off, 0, sizeof(struct bpf_timer));
+	if (unlikely(map_value_has_kptrs(map))) {
+		struct bpf_map_value_off *tab = map->kptr_off_tab;
+		int i;
+
+		for (i = 0; i < tab->nr_off; i++)
+			*(u64 *)(dst + tab->off[i].offset) = 0;
+	}
 }
 
 /* copy everything but bpf_spin_lock and bpf_timer. There could be one of each. */
 static inline void copy_map_value(struct bpf_map *map, void *dst, void *src)
 {
-	u32 s_off = 0, s_sz = 0, t_off = 0, t_sz = 0;
+	u32 curr_off = 0;
+	int i;
 
-	if (unlikely(map_value_has_spin_lock(map))) {
-		s_off = map->spin_lock_off;
-		s_sz = sizeof(struct bpf_spin_lock);
-	}
-	if (unlikely(map_value_has_timer(map))) {
-		t_off = map->timer_off;
-		t_sz = sizeof(struct bpf_timer);
+	if (likely(!map->off_arr)) {
+		memcpy(dst, src, map->value_size);
+		return;
 	}
 
-	if (unlikely(s_sz || t_sz)) {
-		if (s_off < t_off || !s_sz) {
-			swap(s_off, t_off);
-			swap(s_sz, t_sz);
-		}
-		memcpy(dst, src, t_off);
-		memcpy(dst + t_off + t_sz,
-		       src + t_off + t_sz,
-		       s_off - t_off - t_sz);
-		memcpy(dst + s_off + s_sz,
-		       src + s_off + s_sz,
-		       map->value_size - s_off - s_sz);
-	} else {
-		memcpy(dst, src, map->value_size);
+	for (i = 0; i < map->off_arr->cnt; i++) {
+		u32 next_off = map->off_arr->field_off[i];
+
+		memcpy(dst + curr_off, src + curr_off, next_off - curr_off);
+		curr_off = next_off + map->off_arr->field_sz[i];
 	}
+	memcpy(dst + curr_off, src + curr_off, map->value_size - curr_off);
 }
 void copy_map_value_locked(struct bpf_map *map, void *dst, void *src,
 			   bool lock_src);
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index fba49f390ed5..1b1497b94303 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -30,6 +30,7 @@
 #include <linux/pgtable.h>
 #include <linux/bpf_lsm.h>
 #include <linux/poll.h>
+#include <linux/sort.h>
 #include <linux/bpf-netns.h>
 #include <linux/rcupdate_trace.h>
 #include <linux/memcontrol.h>
@@ -561,6 +562,7 @@ static void bpf_map_free_deferred(struct work_struct *work)
 	struct bpf_map *map = container_of(work, struct bpf_map, work);
 
 	security_bpf_map_free(map);
+	kfree(map->off_arr);
 	bpf_map_free_kptr_off_tab(map);
 	bpf_map_release_memcg(map);
 	/* implementation dependent freeing */
@@ -850,6 +852,84 @@ int map_check_no_btf(const struct bpf_map *map,
 	return -ENOTSUPP;
 }
 
+static int map_off_arr_cmp(const void *_a, const void *_b, const void *priv)
+{
+	const u32 a = *(const u32 *)_a;
+	const u32 b = *(const u32 *)_b;
+
+	if (a < b)
+		return -1;
+	else if (a > b)
+		return 1;
+	return 0;
+}
+
+static void map_off_arr_swap(void *_a, void *_b, int size, const void *priv)
+{
+	struct bpf_map *map = (struct bpf_map *)priv;
+	u32 *off_base = map->off_arr->field_off;
+	u32 *a = _a, *b = _b;
+	u8 *sz_a, *sz_b;
+
+	sz_a = map->off_arr->field_sz + (a - off_base);
+	sz_b = map->off_arr->field_sz + (b - off_base);
+
+	swap(*a, *b);
+	swap(*sz_a, *sz_b);
+}
+
+static int bpf_map_alloc_off_arr(struct bpf_map *map)
+{
+	bool has_spin_lock = map_value_has_spin_lock(map);
+	bool has_timer = map_value_has_timer(map);
+	bool has_kptrs = map_value_has_kptrs(map);
+	struct bpf_map_off_arr *off_arr;
+	u32 i;
+
+	if (!has_spin_lock && !has_timer && !has_kptrs) {
+		map->off_arr = NULL;
+		return 0;
+	}
+
+	off_arr = kmalloc(sizeof(*map->off_arr), GFP_KERNEL | __GFP_NOWARN);
+	if (!off_arr)
+		return -ENOMEM;
+	map->off_arr = off_arr;
+
+	off_arr->cnt = 0;
+	if (has_spin_lock) {
+		i = off_arr->cnt;
+
+		off_arr->field_off[i] = map->spin_lock_off;
+		off_arr->field_sz[i] = sizeof(struct bpf_spin_lock);
+		off_arr->cnt++;
+	}
+	if (has_timer) {
+		i = off_arr->cnt;
+
+		off_arr->field_off[i] = map->timer_off;
+		off_arr->field_sz[i] = sizeof(struct bpf_timer);
+		off_arr->cnt++;
+	}
+	if (has_kptrs) {
+		struct bpf_map_value_off *tab = map->kptr_off_tab;
+		u32 *off = &off_arr->field_off[off_arr->cnt];
+		u8 *sz = &off_arr->field_sz[off_arr->cnt];
+
+		for (i = 0; i < tab->nr_off; i++) {
+			*off++ = tab->off[i].offset;
+			*sz++ = sizeof(u64);
+		}
+		off_arr->cnt += tab->nr_off;
+	}
+
+	if (off_arr->cnt == 1)
+		return 0;
+	sort_r(off_arr->field_off, off_arr->cnt, sizeof(off_arr->field_off[0]),
+	       map_off_arr_cmp, map_off_arr_swap, map);
+	return 0;
+}
+
 static int map_check_btf(struct bpf_map *map, const struct btf *btf,
 			 u32 btf_key_id, u32 btf_value_id)
 {
@@ -1019,10 +1099,14 @@ static int map_create(union bpf_attr *attr)
 			attr->btf_vmlinux_value_type_id;
 	}
 
-	err = security_bpf_map_alloc(map);
+	err = bpf_map_alloc_off_arr(map);
 	if (err)
 		goto free_map;
 
+	err = security_bpf_map_alloc(map);
+	if (err)
+		goto free_map_off_arr;
+
 	err = bpf_map_alloc_id(map);
 	if (err)
 		goto free_map_sec;
@@ -1045,6 +1129,8 @@ static int map_create(union bpf_attr *attr)
 
 free_map_sec:
 	security_bpf_map_free(map);
+free_map_off_arr:
+	kfree(map->off_arr);
 free_map:
 	btf_put(map->btf);
 	map->ops->map_free(map);
-- 
2.35.1


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH bpf-next v5 08/13] bpf: Populate pairs of btf_id and destructor kfunc in btf
  2022-04-15 16:03 [PATCH bpf-next v5 00/13] Introduce typed pointer support in BPF maps Kumar Kartikeya Dwivedi
                   ` (6 preceding siblings ...)
  2022-04-15 16:03 ` [PATCH bpf-next v5 07/13] bpf: Adapt copy_map_value for multiple offset case Kumar Kartikeya Dwivedi
@ 2022-04-15 16:03 ` Kumar Kartikeya Dwivedi
  2022-04-15 16:03 ` [PATCH bpf-next v5 09/13] bpf: Wire up freeing of referenced kptr Kumar Kartikeya Dwivedi
                   ` (5 subsequent siblings)
  13 siblings, 0 replies; 30+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2022-04-15 16:03 UTC (permalink / raw)
  To: bpf
  Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Joanne Koong, Toke Høiland-Jørgensen,
	Jesper Dangaard Brouer

To support storing referenced PTR_TO_BTF_ID in maps, we require
associating a specific BTF ID with a 'destructor' kfunc. This is because
we need to release, from the map destruction path, a live referenced
pointer at a certain offset in the map value, otherwise we end up
leaking resources.

Hence, introduce support for passing an array of btf_id, kfunc_btf_id
pairs that denote a BTF ID and its associated release function. Then,
add an accessor 'btf_find_dtor_kfunc' which can be used to look up the
destructor kfunc of a certain BTF ID. If found, we can use it to free
the object from the map free path.

The registration of these pairs also serves as a whitelist of structures
that are allowed as referenced PTR_TO_BTF_ID in a BPF map, because
without finding the destructor kfunc, we bail and return an error.
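
For illustration, a registration from an initcall might look roughly
like this (sketch only; the chosen type and its release kfunc mirror
the test kfuncs wired up later in the series, and the other names are
illustrative):

  #include <linux/btf.h>
  #include <linux/btf_ids.h>
  #include <linux/init.h>
  #include <linux/kernel.h>

  BTF_ID_LIST(my_dtor_ids)
  BTF_ID(struct, prog_test_ref_kfunc)
  BTF_ID(func, bpf_kfunc_call_test_release)

  static int __init my_dtor_init(void)
  {
  	const struct btf_id_dtor_kfunc dtors[] = {
  		{
  			.btf_id       = my_dtor_ids[0],
  			.kfunc_btf_id = my_dtor_ids[1],
  		},
  	};

  	/* NULL owner registers against vmlinux BTF; a module would pass
  	 * THIS_MODULE instead.
  	 */
  	return register_btf_id_dtor_kfuncs(dtors, ARRAY_SIZE(dtors), NULL);
  }
  late_initcall(my_dtor_init);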

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
---
 include/linux/btf.h |  17 +++++++
 kernel/bpf/btf.c    | 108 ++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 125 insertions(+)

diff --git a/include/linux/btf.h b/include/linux/btf.h
index 19c297f9a52f..fea424681d66 100644
--- a/include/linux/btf.h
+++ b/include/linux/btf.h
@@ -40,6 +40,11 @@ struct btf_kfunc_id_set {
 	};
 };
 
+struct btf_id_dtor_kfunc {
+	u32 btf_id;
+	u32 kfunc_btf_id;
+};
+
 extern const struct file_operations btf_fops;
 
 void btf_get(struct btf *btf);
@@ -346,6 +351,9 @@ bool btf_kfunc_id_set_contains(const struct btf *btf,
 			       enum btf_kfunc_type type, u32 kfunc_btf_id);
 int register_btf_kfunc_id_set(enum bpf_prog_type prog_type,
 			      const struct btf_kfunc_id_set *s);
+s32 btf_find_dtor_kfunc(struct btf *btf, u32 btf_id);
+int register_btf_id_dtor_kfuncs(const struct btf_id_dtor_kfunc *dtors, u32 add_cnt,
+				struct module *owner);
 #else
 static inline const struct btf_type *btf_type_by_id(const struct btf *btf,
 						    u32 type_id)
@@ -369,6 +377,15 @@ static inline int register_btf_kfunc_id_set(enum bpf_prog_type prog_type,
 {
 	return 0;
 }
+static inline s32 btf_find_dtor_kfunc(struct btf *btf, u32 btf_id)
+{
+	return -ENOENT;
+}
+static inline int register_btf_id_dtor_kfuncs(const struct btf_id_dtor_kfunc *dtors,
+					      u32 add_cnt, struct module *owner)
+{
+	return 0;
+}
 #endif
 
 #endif
diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c
index 0c5559157c77..fdb4d4971a2a 100644
--- a/kernel/bpf/btf.c
+++ b/kernel/bpf/btf.c
@@ -207,12 +207,18 @@ enum btf_kfunc_hook {
 
 enum {
 	BTF_KFUNC_SET_MAX_CNT = 32,
+	BTF_DTOR_KFUNC_MAX_CNT = 256,
 };
 
 struct btf_kfunc_set_tab {
 	struct btf_id_set *sets[BTF_KFUNC_HOOK_MAX][BTF_KFUNC_TYPE_MAX];
 };
 
+struct btf_id_dtor_kfunc_tab {
+	u32 cnt;
+	struct btf_id_dtor_kfunc dtors[];
+};
+
 struct btf {
 	void *data;
 	struct btf_type **types;
@@ -228,6 +234,7 @@ struct btf {
 	u32 id;
 	struct rcu_head rcu;
 	struct btf_kfunc_set_tab *kfunc_set_tab;
+	struct btf_id_dtor_kfunc_tab *dtor_kfunc_tab;
 
 	/* split BTF support */
 	struct btf *base_btf;
@@ -1616,8 +1623,19 @@ static void btf_free_kfunc_set_tab(struct btf *btf)
 	btf->kfunc_set_tab = NULL;
 }
 
+static void btf_free_dtor_kfunc_tab(struct btf *btf)
+{
+	struct btf_id_dtor_kfunc_tab *tab = btf->dtor_kfunc_tab;
+
+	if (!tab)
+		return;
+	kfree(tab);
+	btf->dtor_kfunc_tab = NULL;
+}
+
 static void btf_free(struct btf *btf)
 {
+	btf_free_dtor_kfunc_tab(btf);
 	btf_free_kfunc_set_tab(btf);
 	kvfree(btf->types);
 	kvfree(btf->resolved_sizes);
@@ -7024,6 +7042,96 @@ int register_btf_kfunc_id_set(enum bpf_prog_type prog_type,
 }
 EXPORT_SYMBOL_GPL(register_btf_kfunc_id_set);
 
+s32 btf_find_dtor_kfunc(struct btf *btf, u32 btf_id)
+{
+	struct btf_id_dtor_kfunc_tab *tab = btf->dtor_kfunc_tab;
+	struct btf_id_dtor_kfunc *dtor;
+
+	if (!tab)
+		return -ENOENT;
+	/* Even though the size of tab->dtors[0] is > sizeof(u32), we only need
+	 * to compare the first u32 with btf_id, so we can reuse btf_id_cmp_func.
+	 */
+	BUILD_BUG_ON(offsetof(struct btf_id_dtor_kfunc, btf_id) != 0);
+	dtor = bsearch(&btf_id, tab->dtors, tab->cnt, sizeof(tab->dtors[0]), btf_id_cmp_func);
+	if (!dtor)
+		return -ENOENT;
+	return dtor->kfunc_btf_id;
+}
+
+/* This function must be invoked only from initcalls/module init functions */
+int register_btf_id_dtor_kfuncs(const struct btf_id_dtor_kfunc *dtors, u32 add_cnt,
+				struct module *owner)
+{
+	struct btf_id_dtor_kfunc_tab *tab;
+	struct btf *btf;
+	u32 tab_cnt;
+	int ret;
+
+	btf = btf_get_module_btf(owner);
+	if (!btf) {
+		if (!owner && IS_ENABLED(CONFIG_DEBUG_INFO_BTF)) {
+			pr_err("missing vmlinux BTF, cannot register dtor kfuncs\n");
+			return -ENOENT;
+		}
+		if (owner && IS_ENABLED(CONFIG_DEBUG_INFO_BTF_MODULES)) {
+			pr_err("missing module BTF, cannot register dtor kfuncs\n");
+			return -ENOENT;
+		}
+		return 0;
+	}
+	if (IS_ERR(btf))
+		return PTR_ERR(btf);
+
+	if (add_cnt >= BTF_DTOR_KFUNC_MAX_CNT) {
+		pr_err("cannot register more than %d kfunc destructors\n", BTF_DTOR_KFUNC_MAX_CNT);
+		ret = -E2BIG;
+		goto end;
+	}
+
+	tab = btf->dtor_kfunc_tab;
+	/* Only one call allowed for modules */
+	if (WARN_ON_ONCE(tab && btf_is_module(btf))) {
+		ret = -EINVAL;
+		goto end;
+	}
+
+	tab_cnt = tab ? tab->cnt : 0;
+	if (tab_cnt > U32_MAX - add_cnt) {
+		ret = -EOVERFLOW;
+		goto end;
+	}
+	if (tab_cnt + add_cnt >= BTF_DTOR_KFUNC_MAX_CNT) {
+		pr_err("cannot register more than %d kfunc destructors\n", BTF_DTOR_KFUNC_MAX_CNT);
+		ret = -E2BIG;
+		goto end;
+	}
+
+	tab = krealloc(btf->dtor_kfunc_tab,
+		       offsetof(struct btf_id_dtor_kfunc_tab, dtors[tab_cnt + add_cnt]),
+		       GFP_KERNEL | __GFP_NOWARN);
+	if (!tab) {
+		ret = -ENOMEM;
+		goto end;
+	}
+
+	if (!btf->dtor_kfunc_tab)
+		tab->cnt = 0;
+	btf->dtor_kfunc_tab = tab;
+
+	memcpy(tab->dtors + tab->cnt, dtors, add_cnt * sizeof(tab->dtors[0]));
+	tab->cnt += add_cnt;
+
+	sort(tab->dtors, tab->cnt, sizeof(tab->dtors[0]), btf_id_cmp_func, NULL);
+
+	return 0;
+end:
+	btf_free_dtor_kfunc_tab(btf);
+	btf_put(btf);
+	return ret;
+}
+EXPORT_SYMBOL_GPL(register_btf_id_dtor_kfuncs);
+
 #define MAX_TYPES_ARE_COMPAT_DEPTH 2
 
 static
-- 
2.35.1


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH bpf-next v5 09/13] bpf: Wire up freeing of referenced kptr
  2022-04-15 16:03 [PATCH bpf-next v5 00/13] Introduce typed pointer support in BPF maps Kumar Kartikeya Dwivedi
                   ` (7 preceding siblings ...)
  2022-04-15 16:03 ` [PATCH bpf-next v5 08/13] bpf: Populate pairs of btf_id and destructor kfunc in btf Kumar Kartikeya Dwivedi
@ 2022-04-15 16:03 ` Kumar Kartikeya Dwivedi
  2022-04-21  4:26   ` Alexei Starovoitov
  2022-04-15 16:03 ` [PATCH bpf-next v5 10/13] bpf: Teach verifier about kptr_get kfunc helpers Kumar Kartikeya Dwivedi
                   ` (4 subsequent siblings)
  13 siblings, 1 reply; 30+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2022-04-15 16:03 UTC (permalink / raw)
  To: bpf
  Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Joanne Koong, Toke Høiland-Jørgensen,
	Jesper Dangaard Brouer

A destructor kfunc can be defined as void func(type *), where type may
be void or any other pointer type as per convenience.
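
For instance, a hypothetical object and its destructor kfunc could look
like this (names are illustrative, not from this series):

  #include <linux/refcount.h>
  #include <linux/slab.h>

  struct my_obj {
  	refcount_t refcnt;
  	/* ... */
  };

  /* Prototype is 'void func(type *)'; 'void *' is used here, but the
   * object's own pointer type would be accepted just as well.
   */
  void bpf_my_obj_release(void *p)
  {
  	struct my_obj *obj = p;

  	if (obj && refcount_dec_and_test(&obj->refcnt))
  		kfree(obj);
  }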

In this patch, we ensure that the type is sane and capture the function
pointer into the off_desc of kptr_off_tab for the specific pointer
offset, with the invariant that the dtor pointer is always set when the
'kptr_ref' tag is applied to the pointer's pointee type, which is
indicated by the off_desc type BPF_MAP_OFF_DESC_TYPE_REF_KPTR.

Note that only BTF IDs whose destructor kfunc is registered become
allowed BTF IDs for embedding as a referenced kptr. Hence, the lookup
serves the purpose of finding the dtor kfunc's BTF ID, as well as acting
as a check against the whitelist of BTF IDs allowed for this purpose.

Finally, wire up the actual freeing of the referenced pointer, if any,
at all available offsets, so that no references are leaked after the BPF
map goes away, even when the BPF program previously moved ownership of a
referenced pointer into it.

The behavior is similar to BPF timers, where bpf_map_{update,delete}_elem
will free any existing referenced kptr. The same is the case for the LRU
map's bpf_lru_push_free/htab_lru_push_free functions, which are extended
to reset unreferenced and free referenced kptrs.

Note that unlike BPF timers, kptr is not reset or freed when map uref
drops to zero.

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
---
 include/linux/bpf.h   |   4 ++
 include/linux/btf.h   |   2 +
 kernel/bpf/arraymap.c |  14 +++++-
 kernel/bpf/btf.c      | 100 +++++++++++++++++++++++++++++++++++++++++-
 kernel/bpf/hashtab.c  |  58 ++++++++++++++++++------
 kernel/bpf/syscall.c  |  57 +++++++++++++++++++++---
 6 files changed, 212 insertions(+), 23 deletions(-)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 165d2a38eb97..0d416a1e0a6c 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -23,6 +23,7 @@
 #include <linux/slab.h>
 #include <linux/percpu-refcount.h>
 #include <linux/bpfptr.h>
+#include <linux/btf.h>
 
 struct bpf_verifier_env;
 struct bpf_verifier_log;
@@ -173,6 +174,8 @@ struct bpf_map_value_off_desc {
 	enum bpf_map_off_desc_type type;
 	struct {
 		struct btf *btf;
+		struct module *module;
+		btf_dtor_kfunc_t dtor;
 		u32 btf_id;
 	} kptr;
 };
@@ -1548,6 +1551,7 @@ struct bpf_map_value_off_desc *bpf_map_kptr_off_contains(struct bpf_map *map, u3
 void bpf_map_free_kptr_off_tab(struct bpf_map *map);
 struct bpf_map_value_off *bpf_map_copy_kptr_off_tab(const struct bpf_map *map);
 bool bpf_map_equal_kptr_off_tab(const struct bpf_map *map_a, const struct bpf_map *map_b);
+void bpf_map_free_kptrs(struct bpf_map *map, void *map_value);
 
 struct bpf_map *bpf_map_get(u32 ufd);
 struct bpf_map *bpf_map_get_with_uref(u32 ufd);
diff --git a/include/linux/btf.h b/include/linux/btf.h
index fea424681d66..f70625dd5bb4 100644
--- a/include/linux/btf.h
+++ b/include/linux/btf.h
@@ -45,6 +45,8 @@ struct btf_id_dtor_kfunc {
 	u32 kfunc_btf_id;
 };
 
+typedef void (*btf_dtor_kfunc_t)(void *);
+
 extern const struct file_operations btf_fops;
 
 void btf_get(struct btf *btf);
diff --git a/kernel/bpf/arraymap.c b/kernel/bpf/arraymap.c
index 7f145aefbff8..a84bbca55336 100644
--- a/kernel/bpf/arraymap.c
+++ b/kernel/bpf/arraymap.c
@@ -287,10 +287,12 @@ static int array_map_get_next_key(struct bpf_map *map, void *key, void *next_key
 	return 0;
 }
 
-static void check_and_free_timer_in_array(struct bpf_array *arr, void *val)
+static void check_and_free_fields(struct bpf_array *arr, void *val)
 {
 	if (unlikely(map_value_has_timer(&arr->map)))
 		bpf_timer_cancel_and_free(val + arr->map.timer_off);
+	if (unlikely(map_value_has_kptrs(&arr->map)))
+		bpf_map_free_kptrs(&arr->map, val);
 }
 
 /* Called from syscall or from eBPF program */
@@ -327,7 +329,7 @@ static int array_map_update_elem(struct bpf_map *map, void *key, void *value,
 			copy_map_value_locked(map, val, value, false);
 		else
 			copy_map_value(map, val, value);
-		check_and_free_timer_in_array(array, val);
+		check_and_free_fields(array, val);
 	}
 	return 0;
 }
@@ -386,6 +388,7 @@ static void array_map_free_timers(struct bpf_map *map)
 	struct bpf_array *array = container_of(map, struct bpf_array, map);
 	int i;
 
+	/* We don't reset or free kptr on uref dropping to zero. */
 	if (likely(!map_value_has_timer(map)))
 		return;
 
@@ -398,6 +401,13 @@ static void array_map_free_timers(struct bpf_map *map)
 static void array_map_free(struct bpf_map *map)
 {
 	struct bpf_array *array = container_of(map, struct bpf_array, map);
+	int i;
+
+	if (unlikely(map_value_has_kptrs(map))) {
+		for (i = 0; i < array->map.max_entries; i++)
+			bpf_map_free_kptrs(map, array->value + array->elem_size * i);
+		bpf_map_free_kptr_off_tab(map);
+	}
 
 	if (array->map.map_type == BPF_MAP_TYPE_PERCPU_ARRAY)
 		bpf_array_free_percpu(array);
diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c
index fdb4d4971a2a..062a751c1595 100644
--- a/kernel/bpf/btf.c
+++ b/kernel/bpf/btf.c
@@ -3415,6 +3415,8 @@ struct bpf_map_value_off *btf_parse_kptrs(const struct btf *btf,
 {
 	struct btf_field_info info_arr[BPF_MAP_VALUE_OFF_MAX];
 	struct bpf_map_value_off *tab;
+	struct btf *off_btf = NULL;
+	struct module *mod = NULL;
 	int ret, i, nr_off;
 
 	/* Revisit stack usage when bumping BPF_MAP_VALUE_OFF_MAX */
@@ -3433,7 +3435,6 @@ struct bpf_map_value_off *btf_parse_kptrs(const struct btf *btf,
 
 	for (i = 0; i < nr_off; i++) {
 		const struct btf_type *t;
-		struct btf *off_btf;
 		s32 id;
 
 		t = btf_type_by_id(btf, info_arr[i].type_id);
@@ -3444,16 +3445,69 @@ struct bpf_map_value_off *btf_parse_kptrs(const struct btf *btf,
 			goto end;
 		}
 
+		/* Find and stash the function pointer for the destruction function that
+		 * needs to be eventually invoked from the map free path.
+		 */
+		if (info_arr[i].type == BPF_MAP_OFF_DESC_TYPE_REF_KPTR) {
+			const struct btf_type *dtor_func;
+			const char *dtor_func_name;
+			unsigned long addr;
+			s32 dtor_btf_id;
+
+			/* This call also serves as a whitelist of allowed objects that
+			 * can be used as a referenced pointer and be stored in a map at
+			 * the same time.
+			 */
+			dtor_btf_id = btf_find_dtor_kfunc(off_btf, id);
+			if (dtor_btf_id < 0) {
+				ret = dtor_btf_id;
+				goto end_btf;
+			}
+
+			dtor_func = btf_type_by_id(off_btf, dtor_btf_id);
+			if (!dtor_func) {
+				ret = -ENOENT;
+				goto end_btf;
+			}
+
+			if (btf_is_module(btf)) {
+				mod = btf_try_get_module(off_btf);
+				if (!mod) {
+					ret = -ENXIO;
+					goto end_btf;
+				}
+			}
+
+			/* We already verified dtor_func to be btf_type_is_func
+			 * in register_btf_id_dtor_kfuncs.
+			 */
+			dtor_func_name = __btf_name_by_offset(off_btf, dtor_func->name_off);
+			addr = kallsyms_lookup_name(dtor_func_name);
+			if (!addr) {
+				ret = -EINVAL;
+				goto end_mod;
+			}
+			tab->off[i].kptr.dtor = (void *)addr;
+		}
+
 		tab->off[i].offset = info_arr[i].off;
 		tab->off[i].type = info_arr[i].type;
 		tab->off[i].kptr.btf_id = id;
 		tab->off[i].kptr.btf = off_btf;
+		tab->off[i].kptr.module = mod;
 	}
 	tab->nr_off = nr_off;
 	return tab;
+end_mod:
+	module_put(mod);
+end_btf:
+	btf_put(off_btf);
 end:
-	while (i--)
+	while (i--) {
 		btf_put(tab->off[i].kptr.btf);
+		if (tab->off[i].kptr.module)
+			module_put(tab->off[i].kptr.module);
+	}
 	kfree(tab);
 	return ERR_PTR(ret);
 }
@@ -7059,6 +7113,43 @@ s32 btf_find_dtor_kfunc(struct btf *btf, u32 btf_id)
 	return dtor->kfunc_btf_id;
 }
 
+static int btf_check_dtor_kfuncs(struct btf *btf, const struct btf_id_dtor_kfunc *dtors, u32 cnt)
+{
+	const struct btf_type *dtor_func, *dtor_func_proto, *t;
+	const struct btf_param *args;
+	s32 dtor_btf_id;
+	u32 nr_args, i;
+
+	for (i = 0; i < cnt; i++) {
+		dtor_btf_id = dtors[i].kfunc_btf_id;
+
+		dtor_func = btf_type_by_id(btf, dtor_btf_id);
+		if (!dtor_func || !btf_type_is_func(dtor_func))
+			return -EINVAL;
+
+		dtor_func_proto = btf_type_by_id(btf, dtor_func->type);
+		if (!dtor_func_proto || !btf_type_is_func_proto(dtor_func_proto))
+			return -EINVAL;
+
+		/* Make sure the prototype of the destructor kfunc is 'void func(type *)' */
+		t = btf_type_by_id(btf, dtor_func_proto->type);
+		if (!t || !btf_type_is_void(t))
+			return -EINVAL;
+
+		nr_args = btf_type_vlen(dtor_func_proto);
+		if (nr_args != 1)
+			return -EINVAL;
+		args = btf_params(dtor_func_proto);
+		t = btf_type_by_id(btf, args[0].type);
+		/* Allow any pointer type, as width on targets Linux supports
+		 * will be same for all pointer types (i.e. sizeof(void *))
+		 */
+		if (!t || !btf_type_is_ptr(t))
+			return -EINVAL;
+	}
+	return 0;
+}
+
 /* This function must be invoked only from initcalls/module init functions */
 int register_btf_id_dtor_kfuncs(const struct btf_id_dtor_kfunc *dtors, u32 add_cnt,
 				struct module *owner)
@@ -7089,6 +7180,11 @@ int register_btf_id_dtor_kfuncs(const struct btf_id_dtor_kfunc *dtors, u32 add_c
 		goto end;
 	}
 
+	/* Ensure that the prototype of dtor kfuncs being registered is sane */
+	ret = btf_check_dtor_kfuncs(btf, dtors, add_cnt);
+	if (ret < 0)
+		goto end;
+
 	tab = btf->dtor_kfunc_tab;
 	/* Only one call allowed for modules */
 	if (WARN_ON_ONCE(tab && btf_is_module(btf))) {
diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c
index c68fbebc8c00..2bc9416096ca 100644
--- a/kernel/bpf/hashtab.c
+++ b/kernel/bpf/hashtab.c
@@ -254,6 +254,25 @@ static void htab_free_prealloced_timers(struct bpf_htab *htab)
 	}
 }
 
+static void htab_free_prealloced_kptrs(struct bpf_htab *htab)
+{
+	u32 num_entries = htab->map.max_entries;
+	int i;
+
+	if (likely(!map_value_has_kptrs(&htab->map)))
+		return;
+	if (htab_has_extra_elems(htab))
+		num_entries += num_possible_cpus();
+
+	for (i = 0; i < num_entries; i++) {
+		struct htab_elem *elem;
+
+		elem = get_htab_elem(htab, i);
+		bpf_map_free_kptrs(&htab->map, elem->key + round_up(htab->map.key_size, 8));
+		cond_resched();
+	}
+}
+
 static void htab_free_elems(struct bpf_htab *htab)
 {
 	int i;
@@ -725,12 +744,15 @@ static int htab_lru_map_gen_lookup(struct bpf_map *map,
 	return insn - insn_buf;
 }
 
-static void check_and_free_timer(struct bpf_htab *htab, struct htab_elem *elem)
+static void check_and_free_fields(struct bpf_htab *htab,
+				  struct htab_elem *elem)
 {
+	void *map_value = elem->key + round_up(htab->map.key_size, 8);
+
 	if (unlikely(map_value_has_timer(&htab->map)))
-		bpf_timer_cancel_and_free(elem->key +
-					  round_up(htab->map.key_size, 8) +
-					  htab->map.timer_off);
+		bpf_timer_cancel_and_free(map_value + htab->map.timer_off);
+	if (unlikely(map_value_has_kptrs(&htab->map)))
+		bpf_map_free_kptrs(&htab->map, map_value);
 }
 
 /* It is called from the bpf_lru_list when the LRU needs to delete
@@ -757,7 +779,7 @@ static bool htab_lru_map_delete_node(void *arg, struct bpf_lru_node *node)
 	hlist_nulls_for_each_entry_rcu(l, n, head, hash_node)
 		if (l == tgt_l) {
 			hlist_nulls_del_rcu(&l->hash_node);
-			check_and_free_timer(htab, l);
+			check_and_free_fields(htab, l);
 			break;
 		}
 
@@ -829,7 +851,7 @@ static void htab_elem_free(struct bpf_htab *htab, struct htab_elem *l)
 {
 	if (htab->map.map_type == BPF_MAP_TYPE_PERCPU_HASH)
 		free_percpu(htab_elem_get_ptr(l, htab->map.key_size));
-	check_and_free_timer(htab, l);
+	check_and_free_fields(htab, l);
 	kfree(l);
 }
 
@@ -857,7 +879,7 @@ static void free_htab_elem(struct bpf_htab *htab, struct htab_elem *l)
 	htab_put_fd_value(htab, l);
 
 	if (htab_is_prealloc(htab)) {
-		check_and_free_timer(htab, l);
+		check_and_free_fields(htab, l);
 		__pcpu_freelist_push(&htab->freelist, &l->fnode);
 	} else {
 		atomic_dec(&htab->count);
@@ -1104,7 +1126,7 @@ static int htab_map_update_elem(struct bpf_map *map, void *key, void *value,
 		if (!htab_is_prealloc(htab))
 			free_htab_elem(htab, l_old);
 		else
-			check_and_free_timer(htab, l_old);
+			check_and_free_fields(htab, l_old);
 	}
 	ret = 0;
 err:
@@ -1114,7 +1136,7 @@ static int htab_map_update_elem(struct bpf_map *map, void *key, void *value,
 
 static void htab_lru_push_free(struct bpf_htab *htab, struct htab_elem *elem)
 {
-	check_and_free_timer(htab, elem);
+	check_and_free_fields(htab, elem);
 	bpf_lru_push_free(&htab->lru, &elem->lru_node);
 }
 
@@ -1419,8 +1441,14 @@ static void htab_free_malloced_timers(struct bpf_htab *htab)
 		struct hlist_nulls_node *n;
 		struct htab_elem *l;
 
-		hlist_nulls_for_each_entry(l, n, head, hash_node)
-			check_and_free_timer(htab, l);
+		hlist_nulls_for_each_entry(l, n, head, hash_node) {
+			/* We don't reset or free kptr on uref dropping to zero,
+			 * hence just free timer.
+			 */
+			bpf_timer_cancel_and_free(l->key +
+						  round_up(htab->map.key_size, 8) +
+						  htab->map.timer_off);
+		}
 		cond_resched_rcu();
 	}
 	rcu_read_unlock();
@@ -1430,6 +1458,7 @@ static void htab_map_free_timers(struct bpf_map *map)
 {
 	struct bpf_htab *htab = container_of(map, struct bpf_htab, map);
 
+	/* We don't reset or free kptr on uref dropping to zero. */
 	if (likely(!map_value_has_timer(&htab->map)))
 		return;
 	if (!htab_is_prealloc(htab))
@@ -1453,11 +1482,14 @@ static void htab_map_free(struct bpf_map *map)
 	 * not have executed. Wait for them.
 	 */
 	rcu_barrier();
-	if (!htab_is_prealloc(htab))
+	if (!htab_is_prealloc(htab)) {
 		delete_all_elements(htab);
-	else
+	} else {
+		htab_free_prealloced_kptrs(htab);
 		prealloc_destroy(htab);
+	}
 
+	bpf_map_free_kptr_off_tab(map);
 	free_percpu(htab->extra_elems);
 	bpf_map_area_free(htab->buckets);
 	for (i = 0; i < HASHTAB_MAP_LOCK_COUNT; i++)
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 1b1497b94303..518acf39b40c 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -508,8 +508,11 @@ void bpf_map_free_kptr_off_tab(struct bpf_map *map)
 	if (!map_value_has_kptrs(map))
 		return;
 	for (i = 0; i < tab->nr_off; i++) {
+		struct module *mod = tab->off[i].kptr.module;
 		struct btf *btf = tab->off[i].kptr.btf;
 
+		if (mod)
+			module_put(mod);
 		btf_put(btf);
 	}
 	kfree(tab);
@@ -524,8 +527,16 @@ struct bpf_map_value_off *bpf_map_copy_kptr_off_tab(const struct bpf_map *map)
 	if (!map_value_has_kptrs(map))
 		return ERR_PTR(-ENOENT);
 	/* Do a deep copy of the kptr_off_tab */
-	for (i = 0; i < tab->nr_off; i++)
-		btf_get(tab->off[i].kptr.btf);
+	for (i = 0; i < tab->nr_off; i++) {
+		struct module *mod = tab->off[i].kptr.module;
+		struct btf *btf = tab->off[i].kptr.btf;
+
+		if (mod && !try_module_get(mod)) {
+			ret = -ENXIO;
+			goto end;
+		}
+		btf_get(btf);
+	}
 
 	size = offsetof(struct bpf_map_value_off, off[tab->nr_off]);
 	new_tab = kmemdup(tab, size, GFP_KERNEL | __GFP_NOWARN);
@@ -535,8 +546,14 @@ struct bpf_map_value_off *bpf_map_copy_kptr_off_tab(const struct bpf_map *map)
 	}
 	return new_tab;
 end:
-	while (i--)
-		btf_put(tab->off[i].kptr.btf);
+	while (i--) {
+		struct module *mod = tab->off[i].kptr.module;
+		struct btf *btf = tab->off[i].kptr.btf;
+
+		if (mod)
+			module_put(mod);
+		btf_put(btf);
+	}
 	return ERR_PTR(ret);
 }
 
@@ -556,6 +573,33 @@ bool bpf_map_equal_kptr_off_tab(const struct bpf_map *map_a, const struct bpf_ma
 	return !memcmp(tab_a, tab_b, size);
 }
 
+/* Caller must ensure map_value_has_kptrs is true. Note that this function can
+ * be called on a map value while the map_value is visible to BPF programs, as
+ * it ensures the correct synchronization, and we already enforce the same using
+ * the bpf_kptr_xchg helper on the BPF program side for referenced kptrs.
+ */
+void bpf_map_free_kptrs(struct bpf_map *map, void *map_value)
+{
+	struct bpf_map_value_off *tab = map->kptr_off_tab;
+	unsigned long *btf_id_ptr;
+	int i;
+
+	for (i = 0; i < tab->nr_off; i++) {
+		struct bpf_map_value_off_desc *off_desc = &tab->off[i];
+		unsigned long old_ptr;
+
+		btf_id_ptr = map_value + off_desc->offset;
+		if (off_desc->type == BPF_MAP_OFF_DESC_TYPE_UNREF_KPTR) {
+			u64 *p = (u64 *)btf_id_ptr;
+
+			WRITE_ONCE(*p, 0);
+			continue;
+		}
+		old_ptr = xchg(btf_id_ptr, 0);
+		off_desc->kptr.dtor((void *)old_ptr);
+	}
+}
+
 /* called from workqueue */
 static void bpf_map_free_deferred(struct work_struct *work)
 {
@@ -563,9 +607,10 @@ static void bpf_map_free_deferred(struct work_struct *work)
 
 	security_bpf_map_free(map);
 	kfree(map->off_arr);
-	bpf_map_free_kptr_off_tab(map);
 	bpf_map_release_memcg(map);
-	/* implementation dependent freeing */
+	/* implementation dependent freeing, map_free callback also does
+	 * bpf_map_free_kptr_off_tab, if needed.
+	 */
 	map->ops->map_free(map);
 }
 
-- 
2.35.1


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH bpf-next v5 10/13] bpf: Teach verifier about kptr_get kfunc helpers
  2022-04-15 16:03 [PATCH bpf-next v5 00/13] Introduce typed pointer support in BPF maps Kumar Kartikeya Dwivedi
                   ` (8 preceding siblings ...)
  2022-04-15 16:03 ` [PATCH bpf-next v5 09/13] bpf: Wire up freeing of referenced kptr Kumar Kartikeya Dwivedi
@ 2022-04-15 16:03 ` Kumar Kartikeya Dwivedi
  2022-04-15 16:03 ` [PATCH bpf-next v5 11/13] libbpf: Add kptr type tag macros to bpf_helpers.h Kumar Kartikeya Dwivedi
                   ` (3 subsequent siblings)
  13 siblings, 0 replies; 30+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2022-04-15 16:03 UTC (permalink / raw)
  To: bpf
  Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Joanne Koong, Toke Høiland-Jørgensen,
	Jesper Dangaard Brouer

We introduce a new style of kfunc helpers, namely *_kptr_get, which take
a pointer to the map value that points to a referenced kernel pointer
contained in the map. Since this is referenced, only bpf_kptr_xchg from
the BPF side and xchg from the kernel side are allowed to change the
current value, and each pointer that resides in that location will be
referenced and RCU protected (this must be kept in mind while adding
kernel types embeddable as referenced kptrs in BPF maps).

This means that if we do the load of the pointer value in an RCU read
section and find a live pointer, then as long as we hold the RCU read
lock, it won't be freed by a parallel xchg + release operation. This
allows us to implement a safe refcount increment scheme. Hence, enforce
that the first argument of all such kfuncs is a proper PTR_TO_MAP_VALUE
pointing at the right offset to the referenced pointer.

The rest of the arguments are subjected to the typical kfunc argument
checks, which allows some flexibility in passing more intent into how
the reference should be taken.

For instance, in the case of struct nf_conn, it is not freed until the
RCU grace period ends, but it can still be reused for another tuple once
its refcount has dropped to zero. Hence, a bpf_ct_kptr_get helper not
only needs to call refcount_inc_not_zero, but must also perform a tuple
match after incrementing the reference, and when the match fails, put
the reference again and return NULL.

This can be implemented easily if we allow passing additional parameters
to the bpf_ct_kptr_get kfunc, like a struct bpf_sock_tuple * and a
tuple__sz pair.
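
As a purely illustrative sketch (not part of this patch), a kptr_get
style kfunc for a made-up object could look roughly like this; struct
foo and its refcnt member are hypothetical stand-ins, the point is only
the RCU read section plus refcount_inc_not_zero pattern described above
(primitives from linux/rcupdate.h and linux/refcount.h):

	struct foo {
		refcount_t refcnt;
		/* ... object payload ... */
	};

	/* Returns an acquired reference or NULL; the dependency-ordered
	 * load of the stashed pointer happens inside the RCU read
	 * section, so a live pointer cannot be freed under us by a
	 * parallel xchg + release.
	 */
	struct foo *bpf_foo_kptr_get(struct foo **pp)
	{
		struct foo *p;

		rcu_read_lock();
		p = READ_ONCE(*pp);
		if (p && !refcount_inc_not_zero(&p->refcnt))
			p = NULL;
		rcu_read_unlock();
		return p;
	}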

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
---
 include/linux/btf.h |  2 ++
 kernel/bpf/btf.c    | 58 +++++++++++++++++++++++++++++++++++++++++----
 2 files changed, 55 insertions(+), 5 deletions(-)

diff --git a/include/linux/btf.h b/include/linux/btf.h
index f70625dd5bb4..2611cea2c2b6 100644
--- a/include/linux/btf.h
+++ b/include/linux/btf.h
@@ -17,6 +17,7 @@ enum btf_kfunc_type {
 	BTF_KFUNC_TYPE_ACQUIRE,
 	BTF_KFUNC_TYPE_RELEASE,
 	BTF_KFUNC_TYPE_RET_NULL,
+	BTF_KFUNC_TYPE_KPTR_ACQUIRE,
 	BTF_KFUNC_TYPE_MAX,
 };
 
@@ -35,6 +36,7 @@ struct btf_kfunc_id_set {
 			struct btf_id_set *acquire_set;
 			struct btf_id_set *release_set;
 			struct btf_id_set *ret_null_set;
+			struct btf_id_set *kptr_acquire_set;
 		};
 		struct btf_id_set *sets[BTF_KFUNC_TYPE_MAX];
 	};
diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c
index 062a751c1595..7155874f1902 100644
--- a/kernel/bpf/btf.c
+++ b/kernel/bpf/btf.c
@@ -6035,11 +6035,11 @@ static int btf_check_func_arg_match(struct bpf_verifier_env *env,
 	struct bpf_verifier_log *log = &env->log;
 	u32 i, nargs, ref_id, ref_obj_id = 0;
 	bool is_kfunc = btf_is_kernel(btf);
+	bool rel = false, kptr_get = false;
 	const char *func_name, *ref_tname;
 	const struct btf_type *t, *ref_t;
 	const struct btf_param *args;
 	int ref_regno = 0, ret;
-	bool rel = false;
 
 	t = btf_type_by_id(btf, func_id);
 	if (!t || !btf_type_is_func(t)) {
@@ -6065,10 +6065,14 @@ static int btf_check_func_arg_match(struct bpf_verifier_env *env,
 		return -EINVAL;
 	}
 
-	/* Only kfunc can be release func */
-	if (is_kfunc)
+	if (is_kfunc) {
+		/* Only kfunc can be release func */
 		rel = btf_kfunc_id_set_contains(btf, resolve_prog_type(env->prog),
 						BTF_KFUNC_TYPE_RELEASE, func_id);
+		kptr_get = btf_kfunc_id_set_contains(btf, resolve_prog_type(env->prog),
+						     BTF_KFUNC_TYPE_KPTR_ACQUIRE, func_id);
+	}
+
 	/* check that BTF function arguments match actual types that the
 	 * verifier sees.
 	 */
@@ -6100,8 +6104,52 @@ static int btf_check_func_arg_match(struct bpf_verifier_env *env,
 		if (ret < 0)
 			return ret;
 
-		if (btf_get_prog_ctx_type(log, btf, t,
-					  env->prog->type, i)) {
+		/* kptr_get is only true for kfunc */
+		if (i == 0 && kptr_get) {
+			struct bpf_map_value_off_desc *off_desc;
+
+			if (reg->type != PTR_TO_MAP_VALUE) {
+				bpf_log(log, "arg#0 expected pointer to map value\n");
+				return -EINVAL;
+			}
+
+			/* check_func_arg_reg_off allows var_off for
+			 * PTR_TO_MAP_VALUE, but we need fixed offset to find
+			 * off_desc.
+			 */
+			if (!tnum_is_const(reg->var_off)) {
+				bpf_log(log, "arg#0 must have constant offset\n");
+				return -EINVAL;
+			}
+
+			off_desc = bpf_map_kptr_off_contains(reg->map_ptr, reg->off + reg->var_off.value);
+			if (!off_desc || off_desc->type != BPF_MAP_OFF_DESC_TYPE_REF_KPTR) {
+				bpf_log(log, "arg#0 no referenced kptr at map value offset=%llu\n",
+					reg->off + reg->var_off.value);
+				return -EINVAL;
+			}
+
+			if (!btf_type_is_ptr(ref_t)) {
+				bpf_log(log, "arg#0 BTF type must be a double pointer\n");
+				return -EINVAL;
+			}
+
+			ref_t = btf_type_skip_modifiers(btf, ref_t->type, &ref_id);
+			ref_tname = btf_name_by_offset(btf, ref_t->name_off);
+
+			if (!btf_type_is_struct(ref_t)) {
+				bpf_log(log, "kernel function %s args#%d pointer type %s %s is not supported\n",
+					func_name, i, btf_type_str(ref_t), ref_tname);
+				return -EINVAL;
+			}
+			if (!btf_struct_ids_match(log, btf, ref_id, 0, off_desc->kptr.btf,
+						  off_desc->kptr.btf_id)) {
+				bpf_log(log, "kernel function %s args#%d expected pointer to %s %s\n",
+					func_name, i, btf_type_str(ref_t), ref_tname);
+				return -EINVAL;
+			}
+			/* rest of the arguments can be anything, like normal kfunc */
+		} else if (btf_get_prog_ctx_type(log, btf, t, env->prog->type, i)) {
 			/* If function expects ctx type in BTF check that caller
 			 * is passing PTR_TO_CTX.
 			 */
-- 
2.35.1


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH bpf-next v5 11/13] libbpf: Add kptr type tag macros to bpf_helpers.h
  2022-04-15 16:03 [PATCH bpf-next v5 00/13] Introduce typed pointer support in BPF maps Kumar Kartikeya Dwivedi
                   ` (9 preceding siblings ...)
  2022-04-15 16:03 ` [PATCH bpf-next v5 10/13] bpf: Teach verifier about kptr_get kfunc helpers Kumar Kartikeya Dwivedi
@ 2022-04-15 16:03 ` Kumar Kartikeya Dwivedi
  2022-04-15 16:03 ` [PATCH bpf-next v5 12/13] selftests/bpf: Add C tests for kptr Kumar Kartikeya Dwivedi
                   ` (2 subsequent siblings)
  13 siblings, 0 replies; 30+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2022-04-15 16:03 UTC (permalink / raw)
  To: bpf
  Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Joanne Koong, Toke Høiland-Jørgensen,
	Jesper Dangaard Brouer

Include convenience definitions:
__kptr:	Unreferenced kptr
__kptr_ref: Referenced kptr

Users can use them to tag the pointer type meant to be used with the new
support directly in the map value definition.
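
As an illustration, a map value carrying both kinds of kptrs can then be
declared like this (struct prog_test_ref_kfunc is simply the selftest
type reused here as an example):

	struct map_value {
		struct prog_test_ref_kfunc __kptr *unref_ptr;
		struct prog_test_ref_kfunc __kptr_ref *ref_ptr;
	};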

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
---
 tools/lib/bpf/bpf_helpers.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/tools/lib/bpf/bpf_helpers.h b/tools/lib/bpf/bpf_helpers.h
index 44df982d2a5c..bbae9a057bc8 100644
--- a/tools/lib/bpf/bpf_helpers.h
+++ b/tools/lib/bpf/bpf_helpers.h
@@ -149,6 +149,8 @@ enum libbpf_tristate {
 
 #define __kconfig __attribute__((section(".kconfig")))
 #define __ksym __attribute__((section(".ksyms")))
+#define __kptr __attribute__((btf_type_tag("kptr")))
+#define __kptr_ref __attribute__((btf_type_tag("kptr_ref")))
 
 #ifndef ___bpf_concat
 #define ___bpf_concat(a, b) a ## b
-- 
2.35.1


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH bpf-next v5 12/13] selftests/bpf: Add C tests for kptr
  2022-04-15 16:03 [PATCH bpf-next v5 00/13] Introduce typed pointer support in BPF maps Kumar Kartikeya Dwivedi
                   ` (10 preceding siblings ...)
  2022-04-15 16:03 ` [PATCH bpf-next v5 11/13] libbpf: Add kptr type tag macros to bpf_helpers.h Kumar Kartikeya Dwivedi
@ 2022-04-15 16:03 ` Kumar Kartikeya Dwivedi
  2022-04-15 16:03 ` [PATCH bpf-next v5 13/13] selftests/bpf: Add verifier " Kumar Kartikeya Dwivedi
  2022-04-21  4:40 ` [PATCH bpf-next v5 00/13] Introduce typed pointer support in BPF maps patchwork-bot+netdevbpf
  13 siblings, 0 replies; 30+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2022-04-15 16:03 UTC (permalink / raw)
  To: bpf
  Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Joanne Koong, Toke Høiland-Jørgensen,
	Jesper Dangaard Brouer

This uses the __kptr and __kptr_ref macros as well, and exercises the
functionality that is supposed to work, since the negative tests live in
the test_verifier suite. Also include some code to test map-in-map
support, checking that the inner_map_meta matches the kptr_off_tab of
the map added as an element.

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
---
 .../selftests/bpf/prog_tests/map_kptr.c       |  37 ++++
 tools/testing/selftests/bpf/progs/map_kptr.c  | 190 ++++++++++++++++++
 2 files changed, 227 insertions(+)
 create mode 100644 tools/testing/selftests/bpf/prog_tests/map_kptr.c
 create mode 100644 tools/testing/selftests/bpf/progs/map_kptr.c

diff --git a/tools/testing/selftests/bpf/prog_tests/map_kptr.c b/tools/testing/selftests/bpf/prog_tests/map_kptr.c
new file mode 100644
index 000000000000..9e2fbda64a65
--- /dev/null
+++ b/tools/testing/selftests/bpf/prog_tests/map_kptr.c
@@ -0,0 +1,37 @@
+// SPDX-License-Identifier: GPL-2.0
+#include <test_progs.h>
+
+#include "map_kptr.skel.h"
+
+void test_map_kptr(void)
+{
+	struct map_kptr *skel;
+	int key = 0, ret;
+	char buf[24];
+
+	skel = map_kptr__open_and_load();
+	if (!ASSERT_OK_PTR(skel, "map_kptr__open_and_load"))
+		return;
+
+	ret = bpf_map_update_elem(bpf_map__fd(skel->maps.array_map), &key, buf, 0);
+	ASSERT_OK(ret, "array_map update");
+	ret = bpf_map_update_elem(bpf_map__fd(skel->maps.array_map), &key, buf, 0);
+	ASSERT_OK(ret, "array_map update2");
+
+	ret = bpf_map_update_elem(bpf_map__fd(skel->maps.hash_map), &key, buf, 0);
+	ASSERT_OK(ret, "hash_map update");
+	ret = bpf_map_delete_elem(bpf_map__fd(skel->maps.hash_map), &key);
+	ASSERT_OK(ret, "hash_map delete");
+
+	ret = bpf_map_update_elem(bpf_map__fd(skel->maps.hash_malloc_map), &key, buf, 0);
+	ASSERT_OK(ret, "hash_malloc_map update");
+	ret = bpf_map_delete_elem(bpf_map__fd(skel->maps.hash_malloc_map), &key);
+	ASSERT_OK(ret, "hash_malloc_map delete");
+
+	ret = bpf_map_update_elem(bpf_map__fd(skel->maps.lru_hash_map), &key, buf, 0);
+	ASSERT_OK(ret, "lru_hash_map update");
+	ret = bpf_map_delete_elem(bpf_map__fd(skel->maps.lru_hash_map), &key);
+	ASSERT_OK(ret, "lru_hash_map delete");
+
+	map_kptr__destroy(skel);
+}
diff --git a/tools/testing/selftests/bpf/progs/map_kptr.c b/tools/testing/selftests/bpf/progs/map_kptr.c
new file mode 100644
index 000000000000..1b0e0409eaa5
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/map_kptr.c
@@ -0,0 +1,190 @@
+// SPDX-License-Identifier: GPL-2.0
+#include <vmlinux.h>
+#include <bpf/bpf_tracing.h>
+#include <bpf/bpf_helpers.h>
+
+struct map_value {
+	struct prog_test_ref_kfunc __kptr *unref_ptr;
+	struct prog_test_ref_kfunc __kptr_ref *ref_ptr;
+};
+
+struct array_map {
+	__uint(type, BPF_MAP_TYPE_ARRAY);
+	__type(key, int);
+	__type(value, struct map_value);
+	__uint(max_entries, 1);
+} array_map SEC(".maps");
+
+struct hash_map {
+	__uint(type, BPF_MAP_TYPE_HASH);
+	__type(key, int);
+	__type(value, struct map_value);
+	__uint(max_entries, 1);
+} hash_map SEC(".maps");
+
+struct hash_malloc_map {
+	__uint(type, BPF_MAP_TYPE_HASH);
+	__type(key, int);
+	__type(value, struct map_value);
+	__uint(max_entries, 1);
+	__uint(map_flags, BPF_F_NO_PREALLOC);
+} hash_malloc_map SEC(".maps");
+
+struct lru_hash_map {
+	__uint(type, BPF_MAP_TYPE_LRU_HASH);
+	__type(key, int);
+	__type(value, struct map_value);
+	__uint(max_entries, 1);
+} lru_hash_map SEC(".maps");
+
+#define DEFINE_MAP_OF_MAP(map_type, inner_map_type, name)       \
+	struct {                                                \
+		__uint(type, map_type);                         \
+		__uint(max_entries, 1);                         \
+		__uint(key_size, sizeof(int));                  \
+		__uint(value_size, sizeof(int));                \
+		__array(values, struct inner_map_type);         \
+	} name SEC(".maps") = {                                 \
+		.values = { [0] = &inner_map_type },            \
+	}
+
+DEFINE_MAP_OF_MAP(BPF_MAP_TYPE_ARRAY_OF_MAPS, array_map, array_of_array_maps);
+DEFINE_MAP_OF_MAP(BPF_MAP_TYPE_ARRAY_OF_MAPS, hash_map, array_of_hash_maps);
+DEFINE_MAP_OF_MAP(BPF_MAP_TYPE_ARRAY_OF_MAPS, hash_malloc_map, array_of_hash_malloc_maps);
+DEFINE_MAP_OF_MAP(BPF_MAP_TYPE_ARRAY_OF_MAPS, lru_hash_map, array_of_lru_hash_maps);
+DEFINE_MAP_OF_MAP(BPF_MAP_TYPE_HASH_OF_MAPS, array_map, hash_of_array_maps);
+DEFINE_MAP_OF_MAP(BPF_MAP_TYPE_HASH_OF_MAPS, hash_map, hash_of_hash_maps);
+DEFINE_MAP_OF_MAP(BPF_MAP_TYPE_HASH_OF_MAPS, hash_malloc_map, hash_of_hash_malloc_maps);
+DEFINE_MAP_OF_MAP(BPF_MAP_TYPE_HASH_OF_MAPS, lru_hash_map, hash_of_lru_hash_maps);
+
+extern struct prog_test_ref_kfunc *bpf_kfunc_call_test_acquire(unsigned long *sp) __ksym;
+extern struct prog_test_ref_kfunc *
+bpf_kfunc_call_test_kptr_get(struct prog_test_ref_kfunc **p, int a, int b) __ksym;
+extern void bpf_kfunc_call_test_release(struct prog_test_ref_kfunc *p) __ksym;
+
+static void test_kptr_unref(struct map_value *v)
+{
+	struct prog_test_ref_kfunc *p;
+
+	p = v->unref_ptr;
+	/* store untrusted_ptr_or_null_ */
+	v->unref_ptr = p;
+	if (!p)
+		return;
+	if (p->a + p->b > 100)
+		return;
+	/* store untrusted_ptr_ */
+	v->unref_ptr = p;
+	/* store NULL */
+	v->unref_ptr = NULL;
+}
+
+static void test_kptr_ref(struct map_value *v)
+{
+	struct prog_test_ref_kfunc *p;
+
+	p = v->ref_ptr;
+	/* store ptr_or_null_ */
+	v->unref_ptr = p;
+	if (!p)
+		return;
+	if (p->a + p->b > 100)
+		return;
+	/* store NULL */
+	p = bpf_kptr_xchg(&v->ref_ptr, NULL);
+	if (!p)
+		return;
+	if (p->a + p->b > 100) {
+		bpf_kfunc_call_test_release(p);
+		return;
+	}
+	/* store ptr_ */
+	v->unref_ptr = p;
+	bpf_kfunc_call_test_release(p);
+
+	p = bpf_kfunc_call_test_acquire(&(unsigned long){0});
+	if (!p)
+		return;
+	/* store ptr_ */
+	p = bpf_kptr_xchg(&v->ref_ptr, p);
+	if (!p)
+		return;
+	if (p->a + p->b > 100) {
+		bpf_kfunc_call_test_release(p);
+		return;
+	}
+	bpf_kfunc_call_test_release(p);
+}
+
+static void test_kptr_get(struct map_value *v)
+{
+	struct prog_test_ref_kfunc *p;
+
+	p = bpf_kfunc_call_test_kptr_get(&v->ref_ptr, 0, 0);
+	if (!p)
+		return;
+	if (p->a + p->b > 100) {
+		bpf_kfunc_call_test_release(p);
+		return;
+	}
+	bpf_kfunc_call_test_release(p);
+}
+
+static void test_kptr(struct map_value *v)
+{
+	test_kptr_unref(v);
+	test_kptr_ref(v);
+	test_kptr_get(v);
+}
+
+SEC("tc")
+int test_map_kptr(struct __sk_buff *ctx)
+{
+	struct map_value *v;
+	int i, key = 0;
+
+#define TEST(map)					\
+	v = bpf_map_lookup_elem(&map, &key);		\
+	if (!v)						\
+		return 0;				\
+	test_kptr(v)
+
+	TEST(array_map);
+	TEST(hash_map);
+	TEST(hash_malloc_map);
+	TEST(lru_hash_map);
+
+#undef TEST
+	return 0;
+}
+
+SEC("tc")
+int test_map_in_map_kptr(struct __sk_buff *ctx)
+{
+	struct map_value *v;
+	int i, key = 0;
+	void *map;
+
+#define TEST(map_in_map)                                \
+	map = bpf_map_lookup_elem(&map_in_map, &key);   \
+	if (!map)                                       \
+		return 0;                               \
+	v = bpf_map_lookup_elem(map, &key);		\
+	if (!v)						\
+		return 0;				\
+	test_kptr(v)
+
+	TEST(array_of_array_maps);
+	TEST(array_of_hash_maps);
+	TEST(array_of_hash_malloc_maps);
+	TEST(array_of_lru_hash_maps);
+	TEST(hash_of_array_maps);
+	TEST(hash_of_hash_maps);
+	TEST(hash_of_hash_malloc_maps);
+	TEST(hash_of_lru_hash_maps);
+
+#undef TEST
+	return 0;
+}
+
+char _license[] SEC("license") = "GPL";
-- 
2.35.1


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH bpf-next v5 13/13] selftests/bpf: Add verifier tests for kptr
  2022-04-15 16:03 [PATCH bpf-next v5 00/13] Introduce typed pointer support in BPF maps Kumar Kartikeya Dwivedi
                   ` (11 preceding siblings ...)
  2022-04-15 16:03 ` [PATCH bpf-next v5 12/13] selftests/bpf: Add C tests for kptr Kumar Kartikeya Dwivedi
@ 2022-04-15 16:03 ` Kumar Kartikeya Dwivedi
  2022-04-21  4:40 ` [PATCH bpf-next v5 00/13] Introduce typed pointer support in BPF maps patchwork-bot+netdevbpf
  13 siblings, 0 replies; 30+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2022-04-15 16:03 UTC (permalink / raw)
  To: bpf
  Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Joanne Koong, Toke Høiland-Jørgensen,
	Jesper Dangaard Brouer

Reuse the bpf_prog_test functions to test the support for PTR_TO_BTF_ID
in the BPF map case, including some tests that verify implementation
sanity and corner cases.

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
---
 net/bpf/test_run.c                            |  45 +-
 tools/testing/selftests/bpf/test_verifier.c   |  55 +-
 .../testing/selftests/bpf/verifier/map_kptr.c | 469 ++++++++++++++++++
 3 files changed, 562 insertions(+), 7 deletions(-)
 create mode 100644 tools/testing/selftests/bpf/verifier/map_kptr.c

diff --git a/net/bpf/test_run.c b/net/bpf/test_run.c
index e7b9c2636d10..29fe32821e7e 100644
--- a/net/bpf/test_run.c
+++ b/net/bpf/test_run.c
@@ -584,6 +584,12 @@ noinline void bpf_kfunc_call_memb_release(struct prog_test_member *p)
 {
 }
 
+noinline struct prog_test_ref_kfunc *
+bpf_kfunc_call_test_kptr_get(struct prog_test_ref_kfunc **p, int a, int b)
+{
+	return &prog_test_struct;
+}
+
 struct prog_test_pass1 {
 	int x0;
 	struct {
@@ -669,6 +675,7 @@ BTF_ID(func, bpf_kfunc_call_test3)
 BTF_ID(func, bpf_kfunc_call_test_acquire)
 BTF_ID(func, bpf_kfunc_call_test_release)
 BTF_ID(func, bpf_kfunc_call_memb_release)
+BTF_ID(func, bpf_kfunc_call_test_kptr_get)
 BTF_ID(func, bpf_kfunc_call_test_pass_ctx)
 BTF_ID(func, bpf_kfunc_call_test_pass1)
 BTF_ID(func, bpf_kfunc_call_test_pass2)
@@ -682,6 +689,7 @@ BTF_SET_END(test_sk_check_kfunc_ids)
 
 BTF_SET_START(test_sk_acquire_kfunc_ids)
 BTF_ID(func, bpf_kfunc_call_test_acquire)
+BTF_ID(func, bpf_kfunc_call_test_kptr_get)
 BTF_SET_END(test_sk_acquire_kfunc_ids)
 
 BTF_SET_START(test_sk_release_kfunc_ids)
@@ -691,8 +699,13 @@ BTF_SET_END(test_sk_release_kfunc_ids)
 
 BTF_SET_START(test_sk_ret_null_kfunc_ids)
 BTF_ID(func, bpf_kfunc_call_test_acquire)
+BTF_ID(func, bpf_kfunc_call_test_kptr_get)
 BTF_SET_END(test_sk_ret_null_kfunc_ids)
 
+BTF_SET_START(test_sk_kptr_acquire_kfunc_ids)
+BTF_ID(func, bpf_kfunc_call_test_kptr_get)
+BTF_SET_END(test_sk_kptr_acquire_kfunc_ids)
+
 static void *bpf_test_init(const union bpf_attr *kattr, u32 user_size,
 			   u32 size, u32 headroom, u32 tailroom)
 {
@@ -1579,14 +1592,36 @@ int bpf_prog_test_run_syscall(struct bpf_prog *prog,
 
 static const struct btf_kfunc_id_set bpf_prog_test_kfunc_set = {
 	.owner        = THIS_MODULE,
-	.check_set    = &test_sk_check_kfunc_ids,
-	.acquire_set  = &test_sk_acquire_kfunc_ids,
-	.release_set  = &test_sk_release_kfunc_ids,
-	.ret_null_set = &test_sk_ret_null_kfunc_ids,
+	.check_set        = &test_sk_check_kfunc_ids,
+	.acquire_set      = &test_sk_acquire_kfunc_ids,
+	.release_set      = &test_sk_release_kfunc_ids,
+	.ret_null_set     = &test_sk_ret_null_kfunc_ids,
+	.kptr_acquire_set = &test_sk_kptr_acquire_kfunc_ids
 };
 
+BTF_ID_LIST(bpf_prog_test_dtor_kfunc_ids)
+BTF_ID(struct, prog_test_ref_kfunc)
+BTF_ID(func, bpf_kfunc_call_test_release)
+BTF_ID(struct, prog_test_member)
+BTF_ID(func, bpf_kfunc_call_memb_release)
+
 static int __init bpf_prog_test_run_init(void)
 {
-	return register_btf_kfunc_id_set(BPF_PROG_TYPE_SCHED_CLS, &bpf_prog_test_kfunc_set);
+	const struct btf_id_dtor_kfunc bpf_prog_test_dtor_kfunc[] = {
+		{
+		  .btf_id       = bpf_prog_test_dtor_kfunc_ids[0],
+		  .kfunc_btf_id = bpf_prog_test_dtor_kfunc_ids[1]
+		},
+		{
+		  .btf_id	= bpf_prog_test_dtor_kfunc_ids[2],
+		  .kfunc_btf_id = bpf_prog_test_dtor_kfunc_ids[3],
+		},
+	};
+	int ret;
+
+	ret = register_btf_kfunc_id_set(BPF_PROG_TYPE_SCHED_CLS, &bpf_prog_test_kfunc_set);
+	return ret ?: register_btf_id_dtor_kfuncs(bpf_prog_test_dtor_kfunc,
+						  ARRAY_SIZE(bpf_prog_test_dtor_kfunc),
+						  THIS_MODULE);
 }
 late_initcall(bpf_prog_test_run_init);
diff --git a/tools/testing/selftests/bpf/test_verifier.c b/tools/testing/selftests/bpf/test_verifier.c
index a2cd236c32eb..372579c9f45e 100644
--- a/tools/testing/selftests/bpf/test_verifier.c
+++ b/tools/testing/selftests/bpf/test_verifier.c
@@ -53,7 +53,7 @@
 #define MAX_INSNS	BPF_MAXINSNS
 #define MAX_TEST_INSNS	1000000
 #define MAX_FIXUPS	8
-#define MAX_NR_MAPS	22
+#define MAX_NR_MAPS	23
 #define MAX_TEST_RUNS	8
 #define POINTER_VALUE	0xcafe4all
 #define TEST_DATA_LEN	64
@@ -101,6 +101,7 @@ struct bpf_test {
 	int fixup_map_reuseport_array[MAX_FIXUPS];
 	int fixup_map_ringbuf[MAX_FIXUPS];
 	int fixup_map_timer[MAX_FIXUPS];
+	int fixup_map_kptr[MAX_FIXUPS];
 	struct kfunc_btf_id_pair fixup_kfunc_btf_id[MAX_FIXUPS];
 	/* Expected verifier log output for result REJECT or VERBOSE_ACCEPT.
 	 * Can be a tab-separated sequence of expected strings. An empty string
@@ -621,8 +622,15 @@ static int create_cgroup_storage(bool percpu)
  * struct timer {
  *   struct bpf_timer t;
  * };
+ * struct btf_ptr {
+ *   struct prog_test_ref_kfunc __kptr *ptr;
+ *   struct prog_test_ref_kfunc __kptr_ref *ptr;
+ *   struct prog_test_member __kptr_ref *ptr;
+ * }
  */
-static const char btf_str_sec[] = "\0bpf_spin_lock\0val\0cnt\0l\0bpf_timer\0timer\0t";
+static const char btf_str_sec[] = "\0bpf_spin_lock\0val\0cnt\0l\0bpf_timer\0timer\0t"
+				  "\0btf_ptr\0prog_test_ref_kfunc\0ptr\0kptr\0kptr_ref"
+				  "\0prog_test_member";
 static __u32 btf_raw_types[] = {
 	/* int */
 	BTF_TYPE_INT_ENC(0, BTF_INT_SIGNED, 0, 32, 4),  /* [1] */
@@ -638,6 +646,22 @@ static __u32 btf_raw_types[] = {
 	/* struct timer */                              /* [5] */
 	BTF_TYPE_ENC(35, BTF_INFO_ENC(BTF_KIND_STRUCT, 0, 1), 16),
 	BTF_MEMBER_ENC(41, 4, 0), /* struct bpf_timer t; */
+	/* struct prog_test_ref_kfunc */		/* [6] */
+	BTF_STRUCT_ENC(51, 0, 0),
+	BTF_STRUCT_ENC(89, 0, 0),			/* [7] */
+	/* type tag "kptr" */
+	BTF_TYPE_TAG_ENC(75, 6),			/* [8] */
+	/* type tag "kptr_ref" */
+	BTF_TYPE_TAG_ENC(80, 6),			/* [9] */
+	BTF_TYPE_TAG_ENC(80, 7),			/* [10] */
+	BTF_PTR_ENC(8),					/* [11] */
+	BTF_PTR_ENC(9),					/* [12] */
+	BTF_PTR_ENC(10),				/* [13] */
+	/* struct btf_ptr */				/* [14] */
+	BTF_STRUCT_ENC(43, 3, 24),
+	BTF_MEMBER_ENC(71, 11, 0), /* struct prog_test_ref_kfunc __kptr *ptr; */
+	BTF_MEMBER_ENC(71, 12, 64), /* struct prog_test_ref_kfunc __kptr_ref *ptr; */
+	BTF_MEMBER_ENC(71, 13, 128), /* struct prog_test_member __kptr_ref *ptr; */
 };
 
 static int load_btf(void)
@@ -727,6 +751,25 @@ static int create_map_timer(void)
 	return fd;
 }
 
+static int create_map_kptr(void)
+{
+	LIBBPF_OPTS(bpf_map_create_opts, opts,
+		.btf_key_type_id = 1,
+		.btf_value_type_id = 14,
+	);
+	int fd, btf_fd;
+
+	btf_fd = load_btf();
+	if (btf_fd < 0)
+		return -1;
+
+	opts.btf_fd = btf_fd;
+	fd = bpf_map_create(BPF_MAP_TYPE_ARRAY, "test_map", 4, 24, 1, &opts);
+	if (fd < 0)
+		printf("Failed to create map with btf_id pointer\n");
+	return fd;
+}
+
 static char bpf_vlog[UINT_MAX >> 8];
 
 static void do_test_fixup(struct bpf_test *test, enum bpf_prog_type prog_type,
@@ -754,6 +797,7 @@ static void do_test_fixup(struct bpf_test *test, enum bpf_prog_type prog_type,
 	int *fixup_map_reuseport_array = test->fixup_map_reuseport_array;
 	int *fixup_map_ringbuf = test->fixup_map_ringbuf;
 	int *fixup_map_timer = test->fixup_map_timer;
+	int *fixup_map_kptr = test->fixup_map_kptr;
 	struct kfunc_btf_id_pair *fixup_kfunc_btf_id = test->fixup_kfunc_btf_id;
 
 	if (test->fill_helper) {
@@ -947,6 +991,13 @@ static void do_test_fixup(struct bpf_test *test, enum bpf_prog_type prog_type,
 			fixup_map_timer++;
 		} while (*fixup_map_timer);
 	}
+	if (*fixup_map_kptr) {
+		map_fds[22] = create_map_kptr();
+		do {
+			prog[*fixup_map_kptr].imm = map_fds[22];
+			fixup_map_kptr++;
+		} while (*fixup_map_kptr);
+	}
 
 	/* Patch in kfunc BTF IDs */
 	if (fixup_kfunc_btf_id->kfunc) {
diff --git a/tools/testing/selftests/bpf/verifier/map_kptr.c b/tools/testing/selftests/bpf/verifier/map_kptr.c
new file mode 100644
index 000000000000..501a5d31ef35
--- /dev/null
+++ b/tools/testing/selftests/bpf/verifier/map_kptr.c
@@ -0,0 +1,469 @@
+/* Common tests */
+{
+	"map_kptr: BPF_ST imm != 0",
+	.insns = {
+	BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
+	BPF_LD_MAP_FD(BPF_REG_6, 0),
+	BPF_MOV64_REG(BPF_REG_1, BPF_REG_6),
+	BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -4),
+	BPF_MOV64_IMM(BPF_REG_0, 0),
+	BPF_ST_MEM(BPF_W, BPF_REG_2, 0, 0),
+	BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_map_lookup_elem),
+	BPF_JMP_IMM(BPF_JNE, BPF_REG_0, 0, 1),
+	BPF_EXIT_INSN(),
+	BPF_ST_MEM(BPF_DW, BPF_REG_0, 0, 1),
+	BPF_EXIT_INSN(),
+	},
+	.prog_type = BPF_PROG_TYPE_SCHED_CLS,
+	.fixup_map_kptr = { 1 },
+	.result = REJECT,
+	.errstr = "BPF_ST imm must be 0 when storing to kptr at off=0",
+},
+{
+	"map_kptr: size != bpf_size_to_bytes(BPF_DW)",
+	.insns = {
+	BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
+	BPF_LD_MAP_FD(BPF_REG_6, 0),
+	BPF_MOV64_REG(BPF_REG_1, BPF_REG_6),
+	BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -4),
+	BPF_MOV64_IMM(BPF_REG_0, 0),
+	BPF_ST_MEM(BPF_W, BPF_REG_2, 0, 0),
+	BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_map_lookup_elem),
+	BPF_JMP_IMM(BPF_JNE, BPF_REG_0, 0, 1),
+	BPF_EXIT_INSN(),
+	BPF_ST_MEM(BPF_W, BPF_REG_0, 0, 0),
+	BPF_EXIT_INSN(),
+	},
+	.prog_type = BPF_PROG_TYPE_SCHED_CLS,
+	.fixup_map_kptr = { 1 },
+	.result = REJECT,
+	.errstr = "kptr access size must be BPF_DW",
+},
+{
+	"map_kptr: map_value non-const var_off",
+	.insns = {
+	BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
+	BPF_LD_MAP_FD(BPF_REG_6, 0),
+	BPF_MOV64_REG(BPF_REG_1, BPF_REG_6),
+	BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -4),
+	BPF_MOV64_IMM(BPF_REG_0, 0),
+	BPF_ST_MEM(BPF_W, BPF_REG_2, 0, 0),
+	BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_map_lookup_elem),
+	BPF_JMP_IMM(BPF_JNE, BPF_REG_0, 0, 1),
+	BPF_EXIT_INSN(),
+	BPF_MOV64_REG(BPF_REG_3, BPF_REG_0),
+	BPF_LDX_MEM(BPF_DW, BPF_REG_2, BPF_REG_0, 0),
+	BPF_JMP_IMM(BPF_JNE, BPF_REG_2, 0, 1),
+	BPF_EXIT_INSN(),
+	BPF_LDX_MEM(BPF_W, BPF_REG_2, BPF_REG_2, 0),
+	BPF_JMP_IMM(BPF_JLE, BPF_REG_2, 4, 1),
+	BPF_EXIT_INSN(),
+	BPF_JMP_IMM(BPF_JGE, BPF_REG_2, 0, 1),
+	BPF_EXIT_INSN(),
+	BPF_ALU64_REG(BPF_ADD, BPF_REG_3, BPF_REG_2),
+	BPF_LDX_MEM(BPF_DW, BPF_REG_0, BPF_REG_3, 0),
+	BPF_EXIT_INSN(),
+	},
+	.prog_type = BPF_PROG_TYPE_SCHED_CLS,
+	.fixup_map_kptr = { 1 },
+	.result = REJECT,
+	.errstr = "kptr access cannot have variable offset",
+},
+{
+	"map_kptr: bpf_kptr_xchg non-const var_off",
+	.insns = {
+	BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
+	BPF_LD_MAP_FD(BPF_REG_6, 0),
+	BPF_MOV64_REG(BPF_REG_1, BPF_REG_6),
+	BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -4),
+	BPF_MOV64_IMM(BPF_REG_0, 0),
+	BPF_ST_MEM(BPF_W, BPF_REG_2, 0, 0),
+	BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_map_lookup_elem),
+	BPF_JMP_IMM(BPF_JNE, BPF_REG_0, 0, 1),
+	BPF_EXIT_INSN(),
+	BPF_MOV64_REG(BPF_REG_3, BPF_REG_0),
+	BPF_LDX_MEM(BPF_DW, BPF_REG_2, BPF_REG_0, 0),
+	BPF_JMP_IMM(BPF_JNE, BPF_REG_2, 0, 1),
+	BPF_EXIT_INSN(),
+	BPF_LDX_MEM(BPF_W, BPF_REG_2, BPF_REG_2, 0),
+	BPF_JMP_IMM(BPF_JLE, BPF_REG_2, 4, 1),
+	BPF_EXIT_INSN(),
+	BPF_JMP_IMM(BPF_JGE, BPF_REG_2, 0, 1),
+	BPF_EXIT_INSN(),
+	BPF_ALU64_REG(BPF_ADD, BPF_REG_3, BPF_REG_2),
+	BPF_MOV64_REG(BPF_REG_1, BPF_REG_3),
+	BPF_MOV64_IMM(BPF_REG_2, 0),
+	BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_kptr_xchg),
+	BPF_EXIT_INSN(),
+	},
+	.prog_type = BPF_PROG_TYPE_SCHED_CLS,
+	.fixup_map_kptr = { 1 },
+	.result = REJECT,
+	.errstr = "R1 doesn't have constant offset. kptr has to be at the constant offset",
+},
+{
+	"map_kptr: unaligned boundary load/store",
+	.insns = {
+	BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
+	BPF_LD_MAP_FD(BPF_REG_6, 0),
+	BPF_MOV64_REG(BPF_REG_1, BPF_REG_6),
+	BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -4),
+	BPF_MOV64_IMM(BPF_REG_0, 0),
+	BPF_ST_MEM(BPF_W, BPF_REG_2, 0, 0),
+	BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_map_lookup_elem),
+	BPF_JMP_IMM(BPF_JNE, BPF_REG_0, 0, 1),
+	BPF_EXIT_INSN(),
+	BPF_ALU64_IMM(BPF_ADD, BPF_REG_0, 7),
+	BPF_ST_MEM(BPF_DW, BPF_REG_0, 0, 0),
+	BPF_EXIT_INSN(),
+	},
+	.prog_type = BPF_PROG_TYPE_SCHED_CLS,
+	.fixup_map_kptr = { 1 },
+	.result = REJECT,
+	.errstr = "kptr access misaligned expected=0 off=7",
+},
+{
+	"map_kptr: reject var_off != 0",
+	.insns = {
+	BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
+	BPF_LD_MAP_FD(BPF_REG_6, 0),
+	BPF_MOV64_REG(BPF_REG_1, BPF_REG_6),
+	BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -4),
+	BPF_MOV64_IMM(BPF_REG_0, 0),
+	BPF_ST_MEM(BPF_W, BPF_REG_2, 0, 0),
+	BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_map_lookup_elem),
+	BPF_JMP_IMM(BPF_JNE, BPF_REG_0, 0, 1),
+	BPF_EXIT_INSN(),
+	BPF_LDX_MEM(BPF_DW, BPF_REG_1, BPF_REG_0, 0),
+	BPF_JMP_IMM(BPF_JNE, BPF_REG_1, 0, 1),
+	BPF_EXIT_INSN(),
+	BPF_LDX_MEM(BPF_W, BPF_REG_2, BPF_REG_1, 0),
+	BPF_JMP_IMM(BPF_JLE, BPF_REG_2, 4, 1),
+	BPF_EXIT_INSN(),
+	BPF_JMP_IMM(BPF_JGE, BPF_REG_2, 0, 1),
+	BPF_EXIT_INSN(),
+	BPF_ALU64_REG(BPF_ADD, BPF_REG_1, BPF_REG_2),
+	BPF_STX_MEM(BPF_DW, BPF_REG_0, BPF_REG_1, 0),
+	BPF_EXIT_INSN(),
+	},
+	.prog_type = BPF_PROG_TYPE_SCHED_CLS,
+	.fixup_map_kptr = { 1 },
+	.result = REJECT,
+	.errstr = "variable untrusted_ptr_ access var_off=(0x0; 0x7) disallowed",
+},
+/* Tests for unreferenced PTR_TO_BTF_ID */
+{
+	"map_kptr: unref: reject btf_struct_ids_match == false",
+	.insns = {
+	BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
+	BPF_LD_MAP_FD(BPF_REG_6, 0),
+	BPF_MOV64_REG(BPF_REG_1, BPF_REG_6),
+	BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -4),
+	BPF_MOV64_IMM(BPF_REG_0, 0),
+	BPF_ST_MEM(BPF_W, BPF_REG_2, 0, 0),
+	BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_map_lookup_elem),
+	BPF_JMP_IMM(BPF_JNE, BPF_REG_0, 0, 1),
+	BPF_EXIT_INSN(),
+	BPF_LDX_MEM(BPF_DW, BPF_REG_1, BPF_REG_0, 0),
+	BPF_JMP_IMM(BPF_JNE, BPF_REG_1, 0, 1),
+	BPF_EXIT_INSN(),
+	BPF_ALU64_IMM(BPF_ADD, BPF_REG_1, 4),
+	BPF_STX_MEM(BPF_DW, BPF_REG_0, BPF_REG_1, 0),
+	BPF_EXIT_INSN(),
+	},
+	.prog_type = BPF_PROG_TYPE_SCHED_CLS,
+	.fixup_map_kptr = { 1 },
+	.result = REJECT,
+	.errstr = "invalid kptr access, R1 type=untrusted_ptr_prog_test_ref_kfunc expected=ptr_prog_test",
+},
+{
+	"map_kptr: unref: loaded pointer marked as untrusted",
+	.insns = {
+	BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
+	BPF_LD_MAP_FD(BPF_REG_6, 0),
+	BPF_MOV64_REG(BPF_REG_1, BPF_REG_6),
+	BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -4),
+	BPF_MOV64_IMM(BPF_REG_0, 0),
+	BPF_ST_MEM(BPF_W, BPF_REG_2, 0, 0),
+	BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_map_lookup_elem),
+	BPF_JMP_IMM(BPF_JNE, BPF_REG_0, 0, 1),
+	BPF_EXIT_INSN(),
+	BPF_LDX_MEM(BPF_DW, BPF_REG_0, BPF_REG_0, 0),
+	BPF_LDX_MEM(BPF_W, BPF_REG_0, BPF_REG_0, 0),
+	BPF_EXIT_INSN(),
+	},
+	.prog_type = BPF_PROG_TYPE_SCHED_CLS,
+	.fixup_map_kptr = { 1 },
+	.result = REJECT,
+	.errstr = "R0 invalid mem access 'untrusted_ptr_or_null_'",
+},
+{
+	"map_kptr: unref: correct in kernel type size",
+	.insns = {
+	BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
+	BPF_LD_MAP_FD(BPF_REG_6, 0),
+	BPF_MOV64_REG(BPF_REG_1, BPF_REG_6),
+	BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -4),
+	BPF_MOV64_IMM(BPF_REG_0, 0),
+	BPF_ST_MEM(BPF_W, BPF_REG_2, 0, 0),
+	BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_map_lookup_elem),
+	BPF_JMP_IMM(BPF_JNE, BPF_REG_0, 0, 1),
+	BPF_EXIT_INSN(),
+	BPF_LDX_MEM(BPF_DW, BPF_REG_0, BPF_REG_0, 0),
+	BPF_JMP_IMM(BPF_JNE, BPF_REG_0, 0, 1),
+	BPF_EXIT_INSN(),
+	BPF_LDX_MEM(BPF_DW, BPF_REG_0, BPF_REG_0, 24),
+	BPF_EXIT_INSN(),
+	},
+	.prog_type = BPF_PROG_TYPE_SCHED_CLS,
+	.fixup_map_kptr = { 1 },
+	.result = REJECT,
+	.errstr = "access beyond struct prog_test_ref_kfunc at off 24 size 8",
+},
+{
+	"map_kptr: unref: inherit PTR_UNTRUSTED on struct walk",
+	.insns = {
+	BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
+	BPF_LD_MAP_FD(BPF_REG_6, 0),
+	BPF_MOV64_REG(BPF_REG_1, BPF_REG_6),
+	BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -4),
+	BPF_MOV64_IMM(BPF_REG_0, 0),
+	BPF_ST_MEM(BPF_W, BPF_REG_2, 0, 0),
+	BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_map_lookup_elem),
+	BPF_JMP_IMM(BPF_JNE, BPF_REG_0, 0, 1),
+	BPF_EXIT_INSN(),
+	BPF_LDX_MEM(BPF_DW, BPF_REG_0, BPF_REG_0, 0),
+	BPF_JMP_IMM(BPF_JNE, BPF_REG_0, 0, 1),
+	BPF_EXIT_INSN(),
+	BPF_LDX_MEM(BPF_DW, BPF_REG_1, BPF_REG_0, 16),
+	BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_this_cpu_ptr),
+	BPF_EXIT_INSN(),
+	},
+	.prog_type = BPF_PROG_TYPE_SCHED_CLS,
+	.fixup_map_kptr = { 1 },
+	.result = REJECT,
+	.errstr = "R1 type=untrusted_ptr_ expected=percpu_ptr_",
+},
+{
+	"map_kptr: unref: no reference state created",
+	.insns = {
+	BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
+	BPF_LD_MAP_FD(BPF_REG_6, 0),
+	BPF_MOV64_REG(BPF_REG_1, BPF_REG_6),
+	BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -4),
+	BPF_MOV64_IMM(BPF_REG_0, 0),
+	BPF_ST_MEM(BPF_W, BPF_REG_2, 0, 0),
+	BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_map_lookup_elem),
+	BPF_JMP_IMM(BPF_JNE, BPF_REG_0, 0, 1),
+	BPF_EXIT_INSN(),
+	BPF_LDX_MEM(BPF_DW, BPF_REG_0, BPF_REG_0, 0),
+	BPF_JMP_IMM(BPF_JNE, BPF_REG_0, 0, 1),
+	BPF_EXIT_INSN(),
+	BPF_EXIT_INSN(),
+	},
+	.prog_type = BPF_PROG_TYPE_SCHED_CLS,
+	.fixup_map_kptr = { 1 },
+	.result = ACCEPT,
+},
+{
+	"map_kptr: unref: bpf_kptr_xchg rejected",
+	.insns = {
+	BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
+	BPF_LD_MAP_FD(BPF_REG_6, 0),
+	BPF_MOV64_REG(BPF_REG_1, BPF_REG_6),
+	BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -4),
+	BPF_MOV64_IMM(BPF_REG_0, 0),
+	BPF_ST_MEM(BPF_W, BPF_REG_2, 0, 0),
+	BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_map_lookup_elem),
+	BPF_JMP_IMM(BPF_JNE, BPF_REG_0, 0, 1),
+	BPF_EXIT_INSN(),
+	BPF_MOV64_REG(BPF_REG_1, BPF_REG_0),
+	BPF_MOV64_IMM(BPF_REG_2, 0),
+	BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_kptr_xchg),
+	BPF_MOV64_IMM(BPF_REG_0, 0),
+	BPF_EXIT_INSN(),
+	},
+	.prog_type = BPF_PROG_TYPE_SCHED_CLS,
+	.fixup_map_kptr = { 1 },
+	.result = REJECT,
+	.errstr = "off=0 kptr isn't referenced kptr",
+},
+{
+	"map_kptr: unref: bpf_kfunc_call_test_kptr_get rejected",
+	.insns = {
+	BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
+	BPF_LD_MAP_FD(BPF_REG_6, 0),
+	BPF_MOV64_REG(BPF_REG_1, BPF_REG_6),
+	BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -4),
+	BPF_MOV64_IMM(BPF_REG_0, 0),
+	BPF_ST_MEM(BPF_W, BPF_REG_2, 0, 0),
+	BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_map_lookup_elem),
+	BPF_JMP_IMM(BPF_JNE, BPF_REG_0, 0, 1),
+	BPF_EXIT_INSN(),
+	BPF_MOV64_REG(BPF_REG_1, BPF_REG_0),
+	BPF_MOV64_IMM(BPF_REG_2, 0),
+	BPF_MOV64_IMM(BPF_REG_3, 0),
+	BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, BPF_PSEUDO_KFUNC_CALL, 0, 0),
+	BPF_MOV64_IMM(BPF_REG_0, 0),
+	BPF_EXIT_INSN(),
+	},
+	.prog_type = BPF_PROG_TYPE_SCHED_CLS,
+	.fixup_map_kptr = { 1 },
+	.result = REJECT,
+	.errstr = "arg#0 no referenced kptr at map value offset=0",
+	.fixup_kfunc_btf_id = {
+		{ "bpf_kfunc_call_test_kptr_get", 13 },
+	}
+},
+/* Tests for referenced PTR_TO_BTF_ID */
+{
+	"map_kptr: ref: loaded pointer marked as untrusted",
+	.insns = {
+	BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
+	BPF_LD_MAP_FD(BPF_REG_6, 0),
+	BPF_MOV64_REG(BPF_REG_1, BPF_REG_6),
+	BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -4),
+	BPF_MOV64_IMM(BPF_REG_0, 0),
+	BPF_ST_MEM(BPF_W, BPF_REG_2, 0, 0),
+	BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_map_lookup_elem),
+	BPF_JMP_IMM(BPF_JNE, BPF_REG_0, 0, 1),
+	BPF_EXIT_INSN(),
+	BPF_MOV64_IMM(BPF_REG_1, 0),
+	BPF_LDX_MEM(BPF_DW, BPF_REG_1, BPF_REG_0, 8),
+	BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_this_cpu_ptr),
+	BPF_EXIT_INSN(),
+	},
+	.prog_type = BPF_PROG_TYPE_SCHED_CLS,
+	.fixup_map_kptr = { 1 },
+	.result = REJECT,
+	.errstr = "R1 type=untrusted_ptr_or_null_ expected=percpu_ptr_",
+},
+{
+	"map_kptr: ref: reject off != 0",
+	.insns = {
+	BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
+	BPF_LD_MAP_FD(BPF_REG_6, 0),
+	BPF_MOV64_REG(BPF_REG_1, BPF_REG_6),
+	BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -4),
+	BPF_MOV64_IMM(BPF_REG_0, 0),
+	BPF_ST_MEM(BPF_W, BPF_REG_2, 0, 0),
+	BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_map_lookup_elem),
+	BPF_JMP_IMM(BPF_JNE, BPF_REG_0, 0, 1),
+	BPF_EXIT_INSN(),
+	BPF_ALU64_IMM(BPF_ADD, BPF_REG_0, 8),
+	BPF_MOV64_REG(BPF_REG_7, BPF_REG_0),
+	BPF_MOV64_REG(BPF_REG_1, BPF_REG_0),
+	BPF_MOV64_IMM(BPF_REG_2, 0),
+	BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_kptr_xchg),
+	BPF_JMP_IMM(BPF_JNE, BPF_REG_0, 0, 1),
+	BPF_EXIT_INSN(),
+	BPF_MOV64_REG(BPF_REG_1, BPF_REG_7),
+	BPF_ALU64_IMM(BPF_ADD, BPF_REG_1, 8),
+	BPF_ALU64_IMM(BPF_ADD, BPF_REG_0, 8),
+	BPF_MOV64_REG(BPF_REG_2, BPF_REG_0),
+	BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_kptr_xchg),
+	BPF_EXIT_INSN(),
+	},
+	.prog_type = BPF_PROG_TYPE_SCHED_CLS,
+	.fixup_map_kptr = { 1 },
+	.result = REJECT,
+	.errstr = "R2 must have zero offset when passed to release func",
+},
+{
+	"map_kptr: ref: reference state created and released on xchg",
+	.insns = {
+	BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
+	BPF_LD_MAP_FD(BPF_REG_6, 0),
+	BPF_MOV64_REG(BPF_REG_1, BPF_REG_6),
+	BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -4),
+	BPF_MOV64_IMM(BPF_REG_0, 0),
+	BPF_ST_MEM(BPF_W, BPF_REG_2, 0, 0),
+	BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_map_lookup_elem),
+	BPF_JMP_IMM(BPF_JNE, BPF_REG_0, 0, 1),
+	BPF_EXIT_INSN(),
+	BPF_ALU64_IMM(BPF_ADD, BPF_REG_0, 8),
+	BPF_MOV64_REG(BPF_REG_7, BPF_REG_0),
+	BPF_MOV64_REG(BPF_REG_1, BPF_REG_10),
+	BPF_ALU64_IMM(BPF_ADD, BPF_REG_1, -8),
+	BPF_ST_MEM(BPF_DW, BPF_REG_1, 0, 0),
+	BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, BPF_PSEUDO_KFUNC_CALL, 0, 0),
+	BPF_JMP_IMM(BPF_JNE, BPF_REG_0, 0, 1),
+	BPF_EXIT_INSN(),
+	BPF_MOV64_REG(BPF_REG_1, BPF_REG_7),
+	BPF_MOV64_REG(BPF_REG_2, BPF_REG_0),
+	BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_kptr_xchg),
+	BPF_MOV64_IMM(BPF_REG_0, 0),
+	BPF_EXIT_INSN(),
+	},
+	.prog_type = BPF_PROG_TYPE_SCHED_CLS,
+	.fixup_map_kptr = { 1 },
+	.result = REJECT,
+	.errstr = "Unreleased reference id=5 alloc_insn=20",
+	.fixup_kfunc_btf_id = {
+		{ "bpf_kfunc_call_test_acquire", 15 },
+	}
+},
+{
+	"map_kptr: ref: reject STX",
+	.insns = {
+	BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
+	BPF_LD_MAP_FD(BPF_REG_6, 0),
+	BPF_MOV64_REG(BPF_REG_1, BPF_REG_6),
+	BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -4),
+	BPF_MOV64_IMM(BPF_REG_0, 0),
+	BPF_ST_MEM(BPF_W, BPF_REG_2, 0, 0),
+	BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_map_lookup_elem),
+	BPF_JMP_IMM(BPF_JNE, BPF_REG_0, 0, 1),
+	BPF_EXIT_INSN(),
+	BPF_MOV64_REG(BPF_REG_1, 0),
+	BPF_STX_MEM(BPF_DW, BPF_REG_0, BPF_REG_1, 8),
+	BPF_EXIT_INSN(),
+	},
+	.prog_type = BPF_PROG_TYPE_SCHED_CLS,
+	.fixup_map_kptr = { 1 },
+	.result = REJECT,
+	.errstr = "store to referenced kptr disallowed",
+},
+{
+	"map_kptr: ref: reject ST",
+	.insns = {
+	BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
+	BPF_LD_MAP_FD(BPF_REG_6, 0),
+	BPF_MOV64_REG(BPF_REG_1, BPF_REG_6),
+	BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -4),
+	BPF_MOV64_IMM(BPF_REG_0, 0),
+	BPF_ST_MEM(BPF_W, BPF_REG_2, 0, 0),
+	BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_map_lookup_elem),
+	BPF_JMP_IMM(BPF_JNE, BPF_REG_0, 0, 1),
+	BPF_EXIT_INSN(),
+	BPF_ST_MEM(BPF_DW, BPF_REG_0, 8, 0),
+	BPF_EXIT_INSN(),
+	},
+	.prog_type = BPF_PROG_TYPE_SCHED_CLS,
+	.fixup_map_kptr = { 1 },
+	.result = REJECT,
+	.errstr = "store to referenced kptr disallowed",
+},
+{
+	"map_kptr: reject helper access to kptr",
+	.insns = {
+	BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
+	BPF_LD_MAP_FD(BPF_REG_6, 0),
+	BPF_MOV64_REG(BPF_REG_1, BPF_REG_6),
+	BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -4),
+	BPF_MOV64_IMM(BPF_REG_0, 0),
+	BPF_ST_MEM(BPF_W, BPF_REG_2, 0, 0),
+	BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_map_lookup_elem),
+	BPF_JMP_IMM(BPF_JNE, BPF_REG_0, 0, 1),
+	BPF_EXIT_INSN(),
+	BPF_MOV64_REG(BPF_REG_1, BPF_REG_6),
+	BPF_ALU64_IMM(BPF_ADD, BPF_REG_0, 2),
+	BPF_MOV64_REG(BPF_REG_2, BPF_REG_0),
+	BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_map_delete_elem),
+	BPF_EXIT_INSN(),
+	},
+	.prog_type = BPF_PROG_TYPE_SCHED_CLS,
+	.fixup_map_kptr = { 1 },
+	.result = REJECT,
+	.errstr = "kptr cannot be accessed indirectly by helper",
+},
-- 
2.35.1


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* Re: [PATCH bpf-next v5 06/13] bpf: Prevent escaping of kptr loaded from maps
  2022-04-15 16:03 ` [PATCH bpf-next v5 06/13] bpf: Prevent escaping of kptr loaded from maps Kumar Kartikeya Dwivedi
@ 2022-04-18 23:48   ` Joanne Koong
  2022-04-19  2:47     ` Kumar Kartikeya Dwivedi
  0 siblings, 1 reply; 30+ messages in thread
From: Joanne Koong @ 2022-04-18 23:48 UTC (permalink / raw)
  To: Kumar Kartikeya Dwivedi
  Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Toke Høiland-Jørgensen, Jesper Dangaard Brouer

On Fri, Apr 15, 2022 at 9:04 AM Kumar Kartikeya Dwivedi
<memxor@gmail.com> wrote:
>
> While we can guarantee that even for unreferenced kptr, the object
> pointer points to being freed etc. can be handled by the verifier's
> exception handling (normal load patching to PROBE_MEM loads), we still
> cannot allow the user to pass these pointers to BPF helpers and kfunc,
> because the same exception handling won't be done for accesses inside
> the kernel. The same is true if a referenced pointer is loaded using
> normal load instruction. Since the reference is not guaranteed to be
> held while the pointer is used, it must be marked as untrusted.
>
> Hence introduce a new type flag, PTR_UNTRUSTED, which is used to mark
> all registers loading unreferenced and referenced kptr from BPF maps,
> and ensure they can never escape the BPF program and into the kernel by
> way of calling stable/unstable helpers.
To me, it seems more clear / straightforward if loads are prohibited
altogether and the only way to get a referenced kptr from a BPF map is
through the *_kptr_get function, instead of allowing loads but
prohibiting the loaded value from going to bpf helpers + kfuncs. To me
it seems like 1) using the kptr in kfuncs / helper funcs will be a
significant portion of use cases, 2) as a user, I think it's
non-intuitive that I'm able to retrieve it and get a direct reference
to it but not be able to use it in a kfunc/helper func, and 3) this
would simplify this logic in the verifier where we don't need to add
PTR_UNTRUSTED.
What are your thoughts?

>
> In check_ptr_to_btf_access, the !type_may_be_null check to reject type
> flags is still correct, as apart from PTR_MAYBE_NULL, only MEM_USER,
> MEM_PERCPU, and PTR_UNTRUSTED may be set for PTR_TO_BTF_ID. The first
> two are checked inside the function and rejected using a proper error
> message, but we still want to allow dereference of untrusted case.
>
> Also, we make sure to inherit PTR_UNTRUSTED when chain of pointers are
> walked, so that this flag is never dropped once it has been set on a
> PTR_TO_BTF_ID (i.e. trusted to untrusted transition can only be in one
> direction).
>
> In convert_ctx_accesses, extend the switch case to consider untrusted
> PTR_TO_BTF_ID in addition to normal PTR_TO_BTF_ID for PROBE_MEM
> conversion for BPF_LDX.
>
> Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
> ---
>  include/linux/bpf.h   | 10 +++++++++-
>  kernel/bpf/verifier.c | 35 ++++++++++++++++++++++++++++-------
>  2 files changed, 37 insertions(+), 8 deletions(-)
>
> diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> index 61f83a23980f..7e2ac2a26bdb 100644
> --- a/include/linux/bpf.h
> +++ b/include/linux/bpf.h
> @@ -375,7 +375,15 @@ enum bpf_type_flag {
>         /* Indicates that the pointer argument will be released. */
>         PTR_RELEASE             = BIT(5 + BPF_BASE_TYPE_BITS),
>
> -       __BPF_TYPE_LAST_FLAG    = PTR_RELEASE,
> +       /* PTR is not trusted. This is only used with PTR_TO_BTF_ID, to mark
> +        * unreferenced and referenced kptr loaded from map value using a load
> +        * instruction, so that they can only be dereferenced but not escape the
> +        * BPF program into the kernel (i.e. cannot be passed as arguments to
> +        * kfunc or bpf helpers).
> +        */
> +       PTR_UNTRUSTED           = BIT(6 + BPF_BASE_TYPE_BITS),
> +
> +       __BPF_TYPE_LAST_FLAG    = PTR_UNTRUSTED,
>  };
>
>  /* Max number of base types. */
> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> index aa5c0d1c8495..3b89dc8d41ce 100644
> --- a/kernel/bpf/verifier.c
> +++ b/kernel/bpf/verifier.c
> @@ -567,6 +567,8 @@ static const char *reg_type_str(struct bpf_verifier_env *env,
>                 strncpy(prefix, "user_", 32);
>         if (type & MEM_PERCPU)
>                 strncpy(prefix, "percpu_", 32);
> +       if (type & PTR_UNTRUSTED)
> +               strncpy(prefix, "untrusted_", 32);
>
>         snprintf(env->type_str_buf, TYPE_STR_BUF_LEN, "%s%s%s",
>                  prefix, str[base_type(type)], postfix);
> @@ -3504,9 +3506,14 @@ static int map_kptr_match_type(struct bpf_verifier_env *env,
>                                struct bpf_reg_state *reg, u32 regno)
>  {
>         const char *targ_name = kernel_type_name(off_desc->kptr.btf, off_desc->kptr.btf_id);
> +       int perm_flags = PTR_MAYBE_NULL;
>         const char *reg_name = "";
>
> -       if (base_type(reg->type) != PTR_TO_BTF_ID || type_flag(reg->type) != PTR_MAYBE_NULL)
> +       /* Only unreferenced case accepts untrusted pointers */
> +       if (off_desc->type == BPF_MAP_OFF_DESC_TYPE_UNREF_KPTR)
> +               perm_flags |= PTR_UNTRUSTED;
> +
> +       if (base_type(reg->type) != PTR_TO_BTF_ID || (type_flag(reg->type) & ~perm_flags))
>                 goto bad_type;
>
>         if (!btf_is_kernel(reg->btf)) {
> @@ -3532,7 +3539,12 @@ static int map_kptr_match_type(struct bpf_verifier_env *env,
>  bad_type:
>         verbose(env, "invalid kptr access, R%d type=%s%s ", regno,
>                 reg_type_str(env, reg->type), reg_name);
> -       verbose(env, "expected=%s%s\n", reg_type_str(env, PTR_TO_BTF_ID), targ_name);
> +       verbose(env, "expected=%s%s", reg_type_str(env, PTR_TO_BTF_ID), targ_name);
> +       if (off_desc->type == BPF_MAP_OFF_DESC_TYPE_UNREF_KPTR)
> +               verbose(env, " or %s%s\n", reg_type_str(env, PTR_TO_BTF_ID | PTR_UNTRUSTED),
> +                       targ_name);
> +       else
> +               verbose(env, "\n");
>         return -EINVAL;
>  }
>
> @@ -3556,9 +3568,11 @@ static int check_map_kptr_access(struct bpf_verifier_env *env, u32 regno,
>                 return -EACCES;
>         }
>
> -       /* We cannot directly access kptr_ref */
> -       if (off_desc->type == BPF_MAP_OFF_DESC_TYPE_REF_KPTR) {
> -               verbose(env, "accessing referenced kptr disallowed\n");
> +       /* We only allow loading referenced kptr, since it will be marked as
> +        * untrusted, similar to unreferenced kptr.
> +        */
> +       if (class != BPF_LDX && off_desc->type == BPF_MAP_OFF_DESC_TYPE_REF_KPTR) {
> +               verbose(env, "store to referenced kptr disallowed\n");
>                 return -EACCES;
>         }
>
> @@ -3568,7 +3582,7 @@ static int check_map_kptr_access(struct bpf_verifier_env *env, u32 regno,
>                  * value from map as PTR_TO_BTF_ID, with the correct type.
>                  */
>                 mark_btf_ld_reg(env, cur_regs(env), value_regno, PTR_TO_BTF_ID, off_desc->kptr.btf,
> -                               off_desc->kptr.btf_id, PTR_MAYBE_NULL);
> +                               off_desc->kptr.btf_id, PTR_MAYBE_NULL | PTR_UNTRUSTED);
>                 val_reg->id = ++env->id_gen;
>         } else if (class == BPF_STX) {
>                 val_reg = reg_state(env, value_regno);
> @@ -4336,6 +4350,12 @@ static int check_ptr_to_btf_access(struct bpf_verifier_env *env,
>         if (ret < 0)
>                 return ret;
>
> +       /* If this is an untrusted pointer, all pointers formed by walking it
> +        * also inherit the untrusted flag.
> +        */
> +       if (type_flag(reg->type) & PTR_UNTRUSTED)
> +               flag |= PTR_UNTRUSTED;
> +
>         if (atype == BPF_READ && value_regno >= 0)
>                 mark_btf_ld_reg(env, regs, value_regno, ret, reg->btf, btf_id, flag);
>
> @@ -13054,7 +13074,7 @@ static int convert_ctx_accesses(struct bpf_verifier_env *env)
>                 if (!ctx_access)
>                         continue;
>
> -               switch (env->insn_aux_data[i + delta].ptr_type) {
> +               switch ((int)env->insn_aux_data[i + delta].ptr_type) {
>                 case PTR_TO_CTX:
>                         if (!ops->convert_ctx_access)
>                                 continue;
> @@ -13071,6 +13091,7 @@ static int convert_ctx_accesses(struct bpf_verifier_env *env)
>                         convert_ctx_access = bpf_xdp_sock_convert_ctx_access;
>                         break;
>                 case PTR_TO_BTF_ID:
> +               case PTR_TO_BTF_ID | PTR_UNTRUSTED:
>                         if (type == BPF_READ) {
>                                 insn->code = BPF_LDX | BPF_PROBE_MEM |
>                                         BPF_SIZE((insn)->code);
> --
> 2.35.1
>

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH bpf-next v5 06/13] bpf: Prevent escaping of kptr loaded from maps
  2022-04-18 23:48   ` Joanne Koong
@ 2022-04-19  2:47     ` Kumar Kartikeya Dwivedi
  2022-04-19 17:35       ` Joanne Koong
  0 siblings, 1 reply; 30+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2022-04-19  2:47 UTC (permalink / raw)
  To: Joanne Koong
  Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Toke Høiland-Jørgensen, Jesper Dangaard Brouer

On Tue, Apr 19, 2022 at 05:18:38AM IST, Joanne Koong wrote:
> On Fri, Apr 15, 2022 at 9:04 AM Kumar Kartikeya Dwivedi
> <memxor@gmail.com> wrote:
> >
> > While we can guarantee that even for unreferenced kptr, the object the
> > pointer points to being freed etc. can be handled by the verifier's
> > exception handling (normal load patching to PROBE_MEM loads), we still
> > cannot allow the user to pass these pointers to BPF helpers and kfunc,
> > because the same exception handling won't be done for accesses inside
> > the kernel. The same is true if a referenced pointer is loaded using
> > normal load instruction. Since the reference is not guaranteed to be
> > held while the pointer is used, it must be marked as untrusted.
> >
> > Hence introduce a new type flag, PTR_UNTRUSTED, which is used to mark
> > all registers loading unreferenced and referenced kptr from BPF maps,
> > and ensure they can never escape the BPF program and into the kernel by
> > way of calling stable/unstable helpers.
> To me, it seems more clear / straightforward if loads are prohibited
> altogether and the only way to get a referenced kptr from a BPF map is
> through the *_kptr_get function, instead of allowing loads but
> prohibiting the loaded value from going to bpf helpers + kfuncs. To me
> it seems like 1) using the kptr in kfuncs / helper funcs will be a
> significant portion of use cases, 2) as a user, I think it's
> non-intuitive that I'm able to retrieve it and get a direct reference
> to it but not be able to use it in a kfunc/helper func, and 3) this
> would simplify this logic in the verifier where we don't need to add
> PTR_UNTRUSTED.
> What are your thoughts?
>

Given this is at least needed for the unreferenced case, the flag needs to
stay; but considering just the referenced kptr:

1) is true, but in many use cases just reading from the object is also enough;
in those cases imposing the cost of kptr_get is too much, I think. If there are
reasonable guarantees that the object won't go away, or some way to detect that
the pointer changed (e.g. by detecting writer presence [0]), it should be safe
to permit reads from such an untrusted pointer without requiring the user to
hold a refcount. You can imagine a case where you have programs attached to a
callchain: you stash a ref in a map in an invocation earlier in the chain, then
inspect the data somewhere in the middle, and eventually drop the ref, etc. The
fact that this can be made safe using the exception handling is a great feature
IMO.

2) It can certainly be a bit surprising, but I think kptr_ref is already special
enough that the user needs to carefully understand the semantics when making use
of it. Even now, you have to use kptr_get to be able to get a normal
PTR_TO_BTF_ID you can pass to helpers; the untrusted pointer is for cases where
you know what you are doing (and know that what you'll read is still valid at a
later point, depending on how that data will be used).

3) We already need this flag for this case, and eventually also for making it
the default for the majority of cases where we cannot prove PTR_TO_BTF_ID is
safe (e.g. in tracing or LSM ctx). See [1] for some background. There are going
to be a lot more cases going forward where dereference is safe (hence allowed)
but passing to helpers or kfuncs is not.

 [0]: https://lore.kernel.org/bpf/20220222082129.yivvpm6yo3474dp3@apollo.legion
 [1]: https://lore.kernel.org/bpf/CAADnVQJF8yQgKRQH2CqXuB9JR-p3fQeiGRxB0+N_V7uTH2iOeA@mail.gmail.com
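
To make the above concrete, here is a rough sketch of what I have in mind. It
is purely illustrative (not from the selftests): it assumes a __kptr_ref macro
wrapping the btf_type_tag("kptr_ref") annotation used by this series, obj_put()
stands in for whatever release kfunc is registered for the type, and the attach
point is chosen arbitrarily.

#include "vmlinux.h"
#include <bpf/bpf_helpers.h>

#define __kptr_ref __attribute__((btf_type_tag("kptr_ref")))

struct map_value {
	struct task_struct __kptr_ref *task;
};

struct {
	__uint(type, BPF_MAP_TYPE_ARRAY);
	__uint(max_entries, 1);
	__type(key, int);
	__type(value, struct map_value);
} array_map SEC(".maps");

SEC("tc")
int inspect_kptr(struct __sk_buff *ctx)
{
	struct task_struct *t, *old;
	struct map_value *v;
	int key = 0;

	v = bpf_map_lookup_elem(&array_map, &key);
	if (!v)
		return 0;

	/* Plain BPF_LDX load: t is PTR_TO_BTF_ID | PTR_UNTRUSTED, so direct
	 * reads are allowed (patched to PROBE_MEM loads), but passing t to
	 * helpers or kfuncs is rejected by the verifier.
	 */
	t = v->task;
	if (t)
		bpf_printk("pid=%d", t->pid);

	/* To actually take ownership out of the map, use bpf_kptr_xchg; the
	 * returned pointer is a real reference that must be released (or
	 * moved into a map) before BPF_EXIT.
	 */
	old = bpf_kptr_xchg(&v->task, NULL);
	if (old)
		obj_put(old);
	return 0;
}

char _license[] SEC("license") = "GPL";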

> >
> > In check_ptr_to_btf_access, the !type_may_be_null check to reject type
> > flags is still correct, as apart from PTR_MAYBE_NULL, only MEM_USER,
> > MEM_PERCPU, and PTR_UNTRUSTED may be set for PTR_TO_BTF_ID. The first
> > two are checked inside the function and rejected using a proper error
> > message, but we still want to allow dereference of untrusted case.
> >
> > Also, we make sure to inherit PTR_UNTRUSTED when chain of pointers are
> > walked, so that this flag is never dropped once it has been set on a
> > PTR_TO_BTF_ID (i.e. trusted to untrusted transition can only be in one
> > direction).
> >
> > In convert_ctx_accesses, extend the switch case to consider untrusted
> > PTR_TO_BTF_ID in addition to normal PTR_TO_BTF_ID for PROBE_MEM
> > conversion for BPF_LDX.
> >
> > Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
> > ---
> >  include/linux/bpf.h   | 10 +++++++++-
> >  kernel/bpf/verifier.c | 35 ++++++++++++++++++++++++++++-------
> >  2 files changed, 37 insertions(+), 8 deletions(-)
> >
> > diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> > index 61f83a23980f..7e2ac2a26bdb 100644
> > --- a/include/linux/bpf.h
> > +++ b/include/linux/bpf.h
> > @@ -375,7 +375,15 @@ enum bpf_type_flag {
> >         /* Indicates that the pointer argument will be released. */
> >         PTR_RELEASE             = BIT(5 + BPF_BASE_TYPE_BITS),
> >
> > -       __BPF_TYPE_LAST_FLAG    = PTR_RELEASE,
> > +       /* PTR is not trusted. This is only used with PTR_TO_BTF_ID, to mark
> > +        * unreferenced and referenced kptr loaded from map value using a load
> > +        * instruction, so that they can only be dereferenced but not escape the
> > +        * BPF program into the kernel (i.e. cannot be passed as arguments to
> > +        * kfunc or bpf helpers).
> > +        */
> > +       PTR_UNTRUSTED           = BIT(6 + BPF_BASE_TYPE_BITS),
> > +
> > +       __BPF_TYPE_LAST_FLAG    = PTR_UNTRUSTED,
> >  };
> >
> >  /* Max number of base types. */
> > diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> > index aa5c0d1c8495..3b89dc8d41ce 100644
> > --- a/kernel/bpf/verifier.c
> > +++ b/kernel/bpf/verifier.c
> > @@ -567,6 +567,8 @@ static const char *reg_type_str(struct bpf_verifier_env *env,
> >                 strncpy(prefix, "user_", 32);
> >         if (type & MEM_PERCPU)
> >                 strncpy(prefix, "percpu_", 32);
> > +       if (type & PTR_UNTRUSTED)
> > +               strncpy(prefix, "untrusted_", 32);
> >
> >         snprintf(env->type_str_buf, TYPE_STR_BUF_LEN, "%s%s%s",
> >                  prefix, str[base_type(type)], postfix);
> > @@ -3504,9 +3506,14 @@ static int map_kptr_match_type(struct bpf_verifier_env *env,
> >                                struct bpf_reg_state *reg, u32 regno)
> >  {
> >         const char *targ_name = kernel_type_name(off_desc->kptr.btf, off_desc->kptr.btf_id);
> > +       int perm_flags = PTR_MAYBE_NULL;
> >         const char *reg_name = "";
> >
> > -       if (base_type(reg->type) != PTR_TO_BTF_ID || type_flag(reg->type) != PTR_MAYBE_NULL)
> > +       /* Only unreferenced case accepts untrusted pointers */
> > +       if (off_desc->type == BPF_MAP_OFF_DESC_TYPE_UNREF_KPTR)
> > +               perm_flags |= PTR_UNTRUSTED;
> > +
> > +       if (base_type(reg->type) != PTR_TO_BTF_ID || (type_flag(reg->type) & ~perm_flags))
> >                 goto bad_type;
> >
> >         if (!btf_is_kernel(reg->btf)) {
> > @@ -3532,7 +3539,12 @@ static int map_kptr_match_type(struct bpf_verifier_env *env,
> >  bad_type:
> >         verbose(env, "invalid kptr access, R%d type=%s%s ", regno,
> >                 reg_type_str(env, reg->type), reg_name);
> > -       verbose(env, "expected=%s%s\n", reg_type_str(env, PTR_TO_BTF_ID), targ_name);
> > +       verbose(env, "expected=%s%s", reg_type_str(env, PTR_TO_BTF_ID), targ_name);
> > +       if (off_desc->type == BPF_MAP_OFF_DESC_TYPE_UNREF_KPTR)
> > +               verbose(env, " or %s%s\n", reg_type_str(env, PTR_TO_BTF_ID | PTR_UNTRUSTED),
> > +                       targ_name);
> > +       else
> > +               verbose(env, "\n");
> >         return -EINVAL;
> >  }
> >
> > @@ -3556,9 +3568,11 @@ static int check_map_kptr_access(struct bpf_verifier_env *env, u32 regno,
> >                 return -EACCES;
> >         }
> >
> > -       /* We cannot directly access kptr_ref */
> > -       if (off_desc->type == BPF_MAP_OFF_DESC_TYPE_REF_KPTR) {
> > -               verbose(env, "accessing referenced kptr disallowed\n");
> > +       /* We only allow loading referenced kptr, since it will be marked as
> > +        * untrusted, similar to unreferenced kptr.
> > +        */
> > +       if (class != BPF_LDX && off_desc->type == BPF_MAP_OFF_DESC_TYPE_REF_KPTR) {
> > +               verbose(env, "store to referenced kptr disallowed\n");
> >                 return -EACCES;
> >         }
> >
> > @@ -3568,7 +3582,7 @@ static int check_map_kptr_access(struct bpf_verifier_env *env, u32 regno,
> >                  * value from map as PTR_TO_BTF_ID, with the correct type.
> >                  */
> >                 mark_btf_ld_reg(env, cur_regs(env), value_regno, PTR_TO_BTF_ID, off_desc->kptr.btf,
> > -                               off_desc->kptr.btf_id, PTR_MAYBE_NULL);
> > +                               off_desc->kptr.btf_id, PTR_MAYBE_NULL | PTR_UNTRUSTED);
> >                 val_reg->id = ++env->id_gen;
> >         } else if (class == BPF_STX) {
> >                 val_reg = reg_state(env, value_regno);
> > @@ -4336,6 +4350,12 @@ static int check_ptr_to_btf_access(struct bpf_verifier_env *env,
> >         if (ret < 0)
> >                 return ret;
> >
> > +       /* If this is an untrusted pointer, all pointers formed by walking it
> > +        * also inherit the untrusted flag.
> > +        */
> > +       if (type_flag(reg->type) & PTR_UNTRUSTED)
> > +               flag |= PTR_UNTRUSTED;
> > +
> >         if (atype == BPF_READ && value_regno >= 0)
> >                 mark_btf_ld_reg(env, regs, value_regno, ret, reg->btf, btf_id, flag);
> >
> > @@ -13054,7 +13074,7 @@ static int convert_ctx_accesses(struct bpf_verifier_env *env)
> >                 if (!ctx_access)
> >                         continue;
> >
> > -               switch (env->insn_aux_data[i + delta].ptr_type) {
> > +               switch ((int)env->insn_aux_data[i + delta].ptr_type) {
> >                 case PTR_TO_CTX:
> >                         if (!ops->convert_ctx_access)
> >                                 continue;
> > @@ -13071,6 +13091,7 @@ static int convert_ctx_accesses(struct bpf_verifier_env *env)
> >                         convert_ctx_access = bpf_xdp_sock_convert_ctx_access;
> >                         break;
> >                 case PTR_TO_BTF_ID:
> > +               case PTR_TO_BTF_ID | PTR_UNTRUSTED:
> >                         if (type == BPF_READ) {
> >                                 insn->code = BPF_LDX | BPF_PROBE_MEM |
> >                                         BPF_SIZE((insn)->code);
> > --
> > 2.35.1
> >

--
Kartikeya

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH bpf-next v5 06/13] bpf: Prevent escaping of kptr loaded from maps
  2022-04-19  2:47     ` Kumar Kartikeya Dwivedi
@ 2022-04-19 17:35       ` Joanne Koong
  0 siblings, 0 replies; 30+ messages in thread
From: Joanne Koong @ 2022-04-19 17:35 UTC (permalink / raw)
  To: Kumar Kartikeya Dwivedi
  Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Toke Høiland-Jørgensen, Jesper Dangaard Brouer

On Mon, Apr 18, 2022 at 7:46 PM Kumar Kartikeya Dwivedi
<memxor@gmail.com> wrote:
>
> On Tue, Apr 19, 2022 at 05:18:38AM IST, Joanne Koong wrote:
> > On Fri, Apr 15, 2022 at 9:04 AM Kumar Kartikeya Dwivedi
> > <memxor@gmail.com> wrote:
> > >
> > > While we can guarantee that even for unreferenced kptr, the object the
> > > pointer points to being freed etc. can be handled by the verifier's
> > > exception handling (normal load patching to PROBE_MEM loads), we still
> > > cannot allow the user to pass these pointers to BPF helpers and kfunc,
> > > because the same exception handling won't be done for accesses inside
> > > the kernel. The same is true if a referenced pointer is loaded using
> > > normal load instruction. Since the reference is not guaranteed to be
> > > held while the pointer is used, it must be marked as untrusted.
> > >
> > > Hence introduce a new type flag, PTR_UNTRUSTED, which is used to mark
> > > all registers loading unreferenced and referenced kptr from BPF maps,
> > > and ensure they can never escape the BPF program and into the kernel by
> > > way of calling stable/unstable helpers.
> > To me, it seems more clear / straightforward if loads are prohibited
> > altogether and the only way to get a referenced kptr from a BPF map is
> > through the *_kptr_get function, instead of allowing loads but
> > prohibiting the loaded value from going to bpf helpers + kfuncs. To me
> > it seems like 1) using the kptr in kfuncs / helper funcs will be a
> > significant portion of use cases, 2) as a user, I think it's
> > non-intuitive that I'm able to retrieve it and get a direct reference
> > to it but not be able to use it in a kfunc/helper func, and 3) this
> > would simplify this logic in the verifier where we don't need to add
> > PTR_UNTRUSTED.
> > What are your thoughts?
> >
>
> Given this is at least needed for the unreferenced case, so the flag needs to
> stay, but considering just referenced kptr:
>
Oh I see. I was thinking about the referenced case mostly and wasn't
considering the unreferenced kptr in map case. I agree then - we'll
need it for the unreferenced case so we might as well also have it for
the referenced case.

> 1) is true, but in many use cases just reading from the object is also enough,
> in those cases imposing the cost of kptr_get is too much, I think. If there are
> reasonable guarantees that the object won't go away, or some way to detect that
> the pointer changed (e.g. by detecting writer presence [0]), it should be safe
> to permit reads from such untrusted pointer without ensuring user holds a
> refcount. You can imagine case where you have programs attached to a callchain,
> and you stash a ref in a map in an invocation earlier in the chain, then inspect
> the data somewhere in the middle, and eventually drop the ref, etc. The fact
> that this can be made safe using the exception handling is a great feature IMO.
>
> 2) It can certainly be a bit surprising, but I think kptr_ref is already special
> enough that the user needs to carefully understand the semantics when making use
> of them. Even now, you will have to use kptr_get to be able to get a normal
> PTR_TO_BTF_ID they can pass to helpers, the untrusted pointer is for cases where
> you know what you are doing (and know that what you'll read is still valid at a
> later point, depending on how that data will be used).
>
> 3) We already need this flag, for this case and eventually also making this the
> default for majority of cases where we cannot prove PTR_TO_BTF_ID is safe (e.g.
> in tracing or LSM ctx). See [1] for some background. There are going to be a lot
> more cases going forward where dereference is safe (hence allowed) but passing
> to helpers or kfunc is not.
Gotcha. Thanks for the context.
>
>  [0]: https://lore.kernel.org/bpf/20220222082129.yivvpm6yo3474dp3@apollo.legion
>  [1]: https://lore.kernel.org/bpf/CAADnVQJF8yQgKRQH2CqXuB9JR-p3fQeiGRxB0+N_V7uTH2iOeA@mail.gmail.com

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH bpf-next v5 03/13] bpf: Allow storing unreferenced kptr in map
  2022-04-15 16:03 ` [PATCH bpf-next v5 03/13] bpf: Allow storing unreferenced kptr in map Kumar Kartikeya Dwivedi
@ 2022-04-21  4:15   ` Alexei Starovoitov
  2022-04-21 19:36     ` Kumar Kartikeya Dwivedi
  0 siblings, 1 reply; 30+ messages in thread
From: Alexei Starovoitov @ 2022-04-21  4:15 UTC (permalink / raw)
  To: Kumar Kartikeya Dwivedi
  Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Joanne Koong, Toke Høiland-Jørgensen,
	Jesper Dangaard Brouer

On Fri, Apr 15, 2022 at 09:33:44PM +0530, Kumar Kartikeya Dwivedi wrote:
> +struct bpf_map_value_off *btf_parse_kptrs(const struct btf *btf,
> +					  const struct btf_type *t)
> +{
> +	struct btf_field_info info_arr[BPF_MAP_VALUE_OFF_MAX];
> +	struct bpf_map_value_off *tab;
> +	int ret, i, nr_off;
> +
> +	/* Revisit stack usage when bumping BPF_MAP_VALUE_OFF_MAX */
> +	BUILD_BUG_ON(BPF_MAP_VALUE_OFF_MAX != 8);

Pls drop this line and comment. It's establishing a false sense of safety,
as if this were the only place to worry about when the enum changes.
Any time an enum or #define is changed, all code that uses it has to be audited.
This stack increase is a minor concern compared to all the other side effects
that increasing BPF_MAP_VALUE_OFF_MAX would have.

> +
> +	ret = btf_find_field(btf, t, BTF_FIELD_KPTR, info_arr, ARRAY_SIZE(info_arr));
> +	if (ret < 0)
> +		return ERR_PTR(ret);
> +	if (!ret)
> +		return NULL;
> +
> +	nr_off = ret;
> +	tab = kzalloc(offsetof(struct bpf_map_value_off, off[nr_off]), GFP_KERNEL | __GFP_NOWARN);
> +	if (!tab)
> +		return ERR_PTR(-ENOMEM);
> +
> +	for (i = 0; i < nr_off; i++) {
> +		const struct btf_type *t;
> +		struct btf *off_btf;

off_btf is an odd name here. Call it kernel_btf ?

> +		s32 id;
> +
> +		t = btf_type_by_id(btf, info_arr[i].type_id);

pls add a comment here to make it clear that the above 'btf' is the prog's btf
and the search below is trying to find the same type in kernel or module btf-s.
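
e.g. something like (together with the rename suggested above):

		/* 'btf' here is the map's (program supplied) BTF; look up the
		 * same-named type in the kernel or module BTFs to resolve the
		 * target btf + btf_id for this kptr.
		 */
		t = btf_type_by_id(btf, info_arr[i].type_id);
		id = bpf_find_btf_id(__btf_name_by_offset(btf, t->name_off),
				     BTF_INFO_KIND(t->info), &kernel_btf);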

> +		id = bpf_find_btf_id(__btf_name_by_offset(btf, t->name_off), BTF_INFO_KIND(t->info),
> +				     &off_btf);
> +		if (id < 0) {
> +			ret = id;
> +			goto end;
> +		}
> +
> +		tab->off[i].offset = info_arr[i].off;
> +		tab->off[i].kptr.btf_id = id;
> +		tab->off[i].kptr.btf = off_btf;
> +	}
> +	tab->nr_off = nr_off;
> +	return tab;
> +end:
> +	while (i--)
> +		btf_put(tab->off[i].kptr.btf);
> +	kfree(tab);
> +	return ERR_PTR(ret);
> +}
> +
>  static void __btf_struct_show(const struct btf *btf, const struct btf_type *t,
>  			      u32 type_id, void *data, u8 bits_offset,
>  			      struct btf_show *show)
> diff --git a/kernel/bpf/map_in_map.c b/kernel/bpf/map_in_map.c
> index 5cd8f5277279..135205d0d560 100644
> --- a/kernel/bpf/map_in_map.c
> +++ b/kernel/bpf/map_in_map.c
> @@ -52,6 +52,7 @@ struct bpf_map *bpf_map_meta_alloc(int inner_map_ufd)
>  	inner_map_meta->max_entries = inner_map->max_entries;
>  	inner_map_meta->spin_lock_off = inner_map->spin_lock_off;
>  	inner_map_meta->timer_off = inner_map->timer_off;
> +	inner_map_meta->kptr_off_tab = bpf_map_copy_kptr_off_tab(inner_map);
>  	if (inner_map->btf) {
>  		btf_get(inner_map->btf);
>  		inner_map_meta->btf = inner_map->btf;
> @@ -71,6 +72,7 @@ struct bpf_map *bpf_map_meta_alloc(int inner_map_ufd)
>  
>  void bpf_map_meta_free(struct bpf_map *map_meta)
>  {
> +	bpf_map_free_kptr_off_tab(map_meta);
>  	btf_put(map_meta->btf);
>  	kfree(map_meta);
>  }
> @@ -83,7 +85,8 @@ bool bpf_map_meta_equal(const struct bpf_map *meta0,
>  		meta0->key_size == meta1->key_size &&
>  		meta0->value_size == meta1->value_size &&
>  		meta0->timer_off == meta1->timer_off &&
> -		meta0->map_flags == meta1->map_flags;
> +		meta0->map_flags == meta1->map_flags &&
> +		bpf_map_equal_kptr_off_tab(meta0, meta1);
>  }
>  
>  void *bpf_map_fd_get_ptr(struct bpf_map *map,
> diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
> index e9621cfa09f2..fba49f390ed5 100644
> --- a/kernel/bpf/syscall.c
> +++ b/kernel/bpf/syscall.c
> @@ -6,6 +6,7 @@
>  #include <linux/bpf_trace.h>
>  #include <linux/bpf_lirc.h>
>  #include <linux/bpf_verifier.h>
> +#include <linux/bsearch.h>
>  #include <linux/btf.h>
>  #include <linux/syscalls.h>
>  #include <linux/slab.h>
> @@ -473,12 +474,94 @@ static void bpf_map_release_memcg(struct bpf_map *map)
>  }
>  #endif
>  
> +static int bpf_map_kptr_off_cmp(const void *a, const void *b)
> +{
> +	const struct bpf_map_value_off_desc *off_desc1 = a, *off_desc2 = b;
> +
> +	if (off_desc1->offset < off_desc2->offset)
> +		return -1;
> +	else if (off_desc1->offset > off_desc2->offset)
> +		return 1;
> +	return 0;
> +}
> +
> +struct bpf_map_value_off_desc *bpf_map_kptr_off_contains(struct bpf_map *map, u32 offset)
> +{
> +	/* Since members are iterated in btf_find_field in increasing order,
> +	 * offsets appended to kptr_off_tab are in increasing order, so we can
> +	 * do bsearch to find exact match.
> +	 */
> +	struct bpf_map_value_off *tab;
> +
> +	if (!map_value_has_kptrs(map))
> +		return NULL;
> +	tab = map->kptr_off_tab;
> +	return bsearch(&offset, tab->off, tab->nr_off, sizeof(tab->off[0]), bpf_map_kptr_off_cmp);
> +}
> +
> +void bpf_map_free_kptr_off_tab(struct bpf_map *map)
> +{
> +	struct bpf_map_value_off *tab = map->kptr_off_tab;
> +	int i;
> +
> +	if (!map_value_has_kptrs(map))
> +		return;
> +	for (i = 0; i < tab->nr_off; i++) {
> +		struct btf *btf = tab->off[i].kptr.btf;
> +
> +		btf_put(btf);

why not do btf_put(tab->off[i].kptr.btf); directly?

> +	}
> +	kfree(tab);
> +	map->kptr_off_tab = NULL;
> +}
> +
> +struct bpf_map_value_off *bpf_map_copy_kptr_off_tab(const struct bpf_map *map)
> +{
> +	struct bpf_map_value_off *tab = map->kptr_off_tab, *new_tab;
> +	int size, i, ret;
> +
> +	if (!map_value_has_kptrs(map))
> +		return ERR_PTR(-ENOENT);
> +	/* Do a deep copy of the kptr_off_tab */
> +	for (i = 0; i < tab->nr_off; i++)
> +		btf_get(tab->off[i].kptr.btf);
> +
> +	size = offsetof(struct bpf_map_value_off, off[tab->nr_off]);
> +	new_tab = kmemdup(tab, size, GFP_KERNEL | __GFP_NOWARN);
> +	if (!new_tab) {
> +		ret = -ENOMEM;
> +		goto end;
> +	}
> +	return new_tab;
> +end:
> +	while (i--)
> +		btf_put(tab->off[i].kptr.btf);

Why do this get/put dance?
Isn't it equivalent to do kmemdup first and then for() btf_get?
kptr_off_tab is not going away and btfs are not going away either.
There is no race.
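
i.e. something along these lines (untested sketch):

	size = offsetof(struct bpf_map_value_off, off[tab->nr_off]);
	new_tab = kmemdup(tab, size, GFP_KERNEL | __GFP_NOWARN);
	if (!new_tab)
		return ERR_PTR(-ENOMEM);
	/* tab and the btfs it references are stable here, so taking the refs
	 * after the copy is fine and the error path needs no puts.
	 */
	for (i = 0; i < tab->nr_off; i++)
		btf_get(tab->off[i].kptr.btf);
	return new_tab;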

> +	return ERR_PTR(ret);
> +}
> +
> +bool bpf_map_equal_kptr_off_tab(const struct bpf_map *map_a, const struct bpf_map *map_b)
> +{
> +	struct bpf_map_value_off *tab_a = map_a->kptr_off_tab, *tab_b = map_b->kptr_off_tab;
> +	bool a_has_kptr = map_value_has_kptrs(map_a), b_has_kptr = map_value_has_kptrs(map_b);
> +	int size;
> +
> +	if (!a_has_kptr && !b_has_kptr)
> +		return true;
> +	if (a_has_kptr != b_has_kptr)
> +		return false;
> +	if (tab_a->nr_off != tab_b->nr_off)
> +		return false;
> +	size = offsetof(struct bpf_map_value_off, off[tab_a->nr_off]);
> +	return !memcmp(tab_a, tab_b, size);
> +}
> +
>  /* called from workqueue */
>  static void bpf_map_free_deferred(struct work_struct *work)
>  {
>  	struct bpf_map *map = container_of(work, struct bpf_map, work);
>  
>  	security_bpf_map_free(map);
> +	bpf_map_free_kptr_off_tab(map);
>  	bpf_map_release_memcg(map);
>  	/* implementation dependent freeing */
>  	map->ops->map_free(map);
> @@ -640,7 +723,7 @@ static int bpf_map_mmap(struct file *filp, struct vm_area_struct *vma)
>  	int err;
>  
>  	if (!map->ops->map_mmap || map_value_has_spin_lock(map) ||
> -	    map_value_has_timer(map))
> +	    map_value_has_timer(map) || map_value_has_kptrs(map))
>  		return -ENOTSUPP;
>  
>  	if (!(vma->vm_flags & VM_SHARED))
> @@ -820,9 +903,33 @@ static int map_check_btf(struct bpf_map *map, const struct btf *btf,
>  			return -EOPNOTSUPP;
>  	}
>  
> -	if (map->ops->map_check_btf)
> +	map->kptr_off_tab = btf_parse_kptrs(btf, value_type);
> +	if (map_value_has_kptrs(map)) {
> +		if (!bpf_capable()) {
> +			ret = -EPERM;
> +			goto free_map_tab;
> +		}
> +		if (map->map_flags & (BPF_F_RDONLY_PROG | BPF_F_WRONLY_PROG)) {
> +			ret = -EACCES;
> +			goto free_map_tab;
> +		}
> +		if (map->map_type != BPF_MAP_TYPE_HASH &&
> +		    map->map_type != BPF_MAP_TYPE_LRU_HASH &&
> +		    map->map_type != BPF_MAP_TYPE_ARRAY) {
> +			ret = -EOPNOTSUPP;
> +			goto free_map_tab;
> +		}
> +	}
> +
> +	if (map->ops->map_check_btf) {
>  		ret = map->ops->map_check_btf(map, btf, key_type, value_type);
> +		if (ret < 0)
> +			goto free_map_tab;
> +	}
>  
> +	return ret;
> +free_map_tab:
> +	bpf_map_free_kptr_off_tab(map);
>  	return ret;
>  }
>  
> @@ -1639,7 +1746,7 @@ static int map_freeze(const union bpf_attr *attr)
>  		return PTR_ERR(map);
>  
>  	if (map->map_type == BPF_MAP_TYPE_STRUCT_OPS ||
> -	    map_value_has_timer(map)) {
> +	    map_value_has_timer(map) || map_value_has_kptrs(map)) {
>  		fdput(f);
>  		return -ENOTSUPP;
>  	}
> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> index 71827d14724a..c802e51c4e18 100644
> --- a/kernel/bpf/verifier.c
> +++ b/kernel/bpf/verifier.c
> @@ -3211,7 +3211,7 @@ static int check_stack_read_fixed_off(struct bpf_verifier_env *env,
>  	return 0;
>  }
>  
> -enum stack_access_src {
> +enum bpf_access_src {
>  	ACCESS_DIRECT = 1,  /* the access is performed by an instruction */
>  	ACCESS_HELPER = 2,  /* the access is performed by a helper */
>  };
> @@ -3219,7 +3219,7 @@ enum stack_access_src {
>  static int check_stack_range_initialized(struct bpf_verifier_env *env,
>  					 int regno, int off, int access_size,
>  					 bool zero_size_allowed,
> -					 enum stack_access_src type,
> +					 enum bpf_access_src type,
>  					 struct bpf_call_arg_meta *meta);
>  
>  static struct bpf_reg_state *reg_state(struct bpf_verifier_env *env, int regno)
> @@ -3507,9 +3507,87 @@ int check_ptr_off_reg(struct bpf_verifier_env *env,
>  	return __check_ptr_off_reg(env, reg, regno, false);
>  }
>  
> +static int map_kptr_match_type(struct bpf_verifier_env *env,
> +			       struct bpf_map_value_off_desc *off_desc,
> +			       struct bpf_reg_state *reg, u32 regno)
> +{
> +	const char *targ_name = kernel_type_name(off_desc->kptr.btf, off_desc->kptr.btf_id);
> +	const char *reg_name = "";
> +
> +	if (base_type(reg->type) != PTR_TO_BTF_ID || type_flag(reg->type) != PTR_MAYBE_NULL)
> +		goto bad_type;
> +
> +	if (!btf_is_kernel(reg->btf)) {
> +		verbose(env, "R%d must point to kernel BTF\n", regno);
> +		return -EINVAL;
> +	}
> +	/* We need to verify reg->type and reg->btf, before accessing reg->btf */
> +	reg_name = kernel_type_name(reg->btf, reg->btf_id);
> +
> +	if (__check_ptr_off_reg(env, reg, regno, true))
> +		return -EACCES;
> +
> +	if (!btf_struct_ids_match(&env->log, reg->btf, reg->btf_id, reg->off,
> +				  off_desc->kptr.btf, off_desc->kptr.btf_id))
> +		goto bad_type;

Is full type comparison really needed?
reg->btf should be the same pointer as off_desc->kptr.btf
and btf_id should match exactly.
Is this future proofing for some day when registers with PTR_TO_BTF_ID type
will start pointing to the prog's btf?
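
i.e. wouldn't a direct comparison be enough, something like:

	if (reg->btf != off_desc->kptr.btf || reg->btf_id != off_desc->kptr.btf_id)
		goto bad_type;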

> +	return 0;
> +bad_type:
> +	verbose(env, "invalid kptr access, R%d type=%s%s ", regno,
> +		reg_type_str(env, reg->type), reg_name);
> +	verbose(env, "expected=%s%s\n", reg_type_str(env, PTR_TO_BTF_ID), targ_name);
> +	return -EINVAL;
> +}
> +
> +static int check_map_kptr_access(struct bpf_verifier_env *env, u32 regno,
> +				 int value_regno, int insn_idx,
> +				 struct bpf_map_value_off_desc *off_desc)
> +{
> +	struct bpf_insn *insn = &env->prog->insnsi[insn_idx];
> +	int class = BPF_CLASS(insn->code);
> +	struct bpf_reg_state *val_reg;
> +
> +	/* Things we already checked for in check_map_access and caller:
> +	 *  - Reject cases where variable offset may touch kptr
> +	 *  - size of access (must be BPF_DW)
> +	 *  - tnum_is_const(reg->var_off)
> +	 *  - off_desc->offset == off + reg->var_off.value
> +	 */
> +	/* Only BPF_[LDX,STX,ST] | BPF_MEM | BPF_DW is supported */
> +	if (BPF_MODE(insn->code) != BPF_MEM) {
> +		verbose(env, "kptr in map can only be accessed using BPF_MEM instruction mode\n");
> +		return -EACCES;
> +	}
> +
> +	if (class == BPF_LDX) {
> +		val_reg = reg_state(env, value_regno);
> +		/* We can simply mark the value_regno receiving the pointer
> +		 * value from map as PTR_TO_BTF_ID, with the correct type.
> +		 */
> +		mark_btf_ld_reg(env, cur_regs(env), value_regno, PTR_TO_BTF_ID, off_desc->kptr.btf,
> +				off_desc->kptr.btf_id, PTR_MAYBE_NULL);
> +		val_reg->id = ++env->id_gen;

why is the non-zero id needed here?

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH bpf-next v5 04/13] bpf: Tag argument to be released in bpf_func_proto
  2022-04-15 16:03 ` [PATCH bpf-next v5 04/13] bpf: Tag argument to be released in bpf_func_proto Kumar Kartikeya Dwivedi
@ 2022-04-21  4:19   ` Alexei Starovoitov
  2022-04-21 19:38     ` Kumar Kartikeya Dwivedi
  0 siblings, 1 reply; 30+ messages in thread
From: Alexei Starovoitov @ 2022-04-21  4:19 UTC (permalink / raw)
  To: Kumar Kartikeya Dwivedi
  Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Joanne Koong, Toke Høiland-Jørgensen,
	Jesper Dangaard Brouer

On Fri, Apr 15, 2022 at 09:33:45PM +0530, Kumar Kartikeya Dwivedi wrote:
> Add a new type flag for bpf_arg_type that when set tells verifier that
> for a release function, that argument's register will be the one for
> which meta.ref_obj_id will be set, and which will then be released
> using release_reference. To capture the regno, introduce a new field
> release_regno in bpf_call_arg_meta.
> 
> This would be required in the next patch, where we may either pass NULL
> or a refcounted pointer as an argument to the release function
> bpf_kptr_xchg. Just releasing only when meta.ref_obj_id is set is not
> enough, as there is a case where the type of argument needed matches,
> but the ref_obj_id is set to 0. Hence, we must enforce that whenever
> meta.ref_obj_id is zero, the register that is to be released can only
> be NULL for a release function.
> 
> Since we now indicate whether an argument is to be released in
> bpf_func_proto itself, the is_release_function helper has lost its utility,
> hence refactor code to work without it, and just rely on
> meta.release_regno to know when to release state for a ref_obj_id.
> Still, the restriction of one release argument and only one ref_obj_id
> passed to BPF helper or kfunc remains. This may be lifted in the future.
> 
> Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
> ---
>  include/linux/bpf.h                           |  5 +-
>  include/linux/bpf_verifier.h                  |  3 +-
>  kernel/bpf/btf.c                              |  9 ++-
>  kernel/bpf/ringbuf.c                          |  4 +-
>  kernel/bpf/verifier.c                         | 76 +++++++++++--------
>  net/core/filter.c                             |  2 +-
>  .../selftests/bpf/verifier/ref_tracking.c     |  2 +-
>  tools/testing/selftests/bpf/verifier/sock.c   |  6 +-
>  8 files changed, 60 insertions(+), 47 deletions(-)
> 
> diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> index ab86f4675db2..f73a3f10e654 100644
> --- a/include/linux/bpf.h
> +++ b/include/linux/bpf.h
> @@ -366,7 +366,10 @@ enum bpf_type_flag {
>  	 */
>  	MEM_PERCPU		= BIT(4 + BPF_BASE_TYPE_BITS),
>  
> -	__BPF_TYPE_LAST_FLAG	= MEM_PERCPU,
> +	/* Indicates that the pointer argument will be released. */
> +	PTR_RELEASE		= BIT(5 + BPF_BASE_TYPE_BITS),

I think OBJ_RELEASE, as Joanne used in her patch, is a better name.

"pointer release" is not quite correct.
It's the object that the pointer points to that will be released.

> +
> +	__BPF_TYPE_LAST_FLAG	= PTR_RELEASE,
>  };
>  
>  /* Max number of base types. */
> diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h
> index 3a9d2d7cc6b7..1f1e7f2ea967 100644
> --- a/include/linux/bpf_verifier.h
> +++ b/include/linux/bpf_verifier.h
> @@ -523,8 +523,7 @@ int check_ptr_off_reg(struct bpf_verifier_env *env,
>  		      const struct bpf_reg_state *reg, int regno);
>  int check_func_arg_reg_off(struct bpf_verifier_env *env,
>  			   const struct bpf_reg_state *reg, int regno,
> -			   enum bpf_arg_type arg_type,
> -			   bool is_release_func);
> +			   enum bpf_arg_type arg_type);
>  int check_kfunc_mem_size_reg(struct bpf_verifier_env *env, struct bpf_reg_state *reg,
>  			     u32 regno);
>  int check_mem_reg(struct bpf_verifier_env *env, struct bpf_reg_state *reg,
> diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c
> index be191df76ea4..7227a77a02f7 100644
> --- a/kernel/bpf/btf.c
> +++ b/kernel/bpf/btf.c
> @@ -5993,6 +5993,7 @@ static int btf_check_func_arg_match(struct bpf_verifier_env *env,
>  	 * verifier sees.
>  	 */
>  	for (i = 0; i < nargs; i++) {
> +		enum bpf_arg_type arg_type = ARG_DONTCARE;
>  		u32 regno = i + 1;
>  		struct bpf_reg_state *reg = &regs[regno];
>  
> @@ -6013,7 +6014,9 @@ static int btf_check_func_arg_match(struct bpf_verifier_env *env,
>  		ref_t = btf_type_skip_modifiers(btf, t->type, &ref_id);
>  		ref_tname = btf_name_by_offset(btf, ref_t->name_off);
>  
> -		ret = check_func_arg_reg_off(env, reg, regno, ARG_DONTCARE, rel);
> +		if (rel && reg->ref_obj_id)
> +			arg_type |= PTR_RELEASE;

Don't get it. Why ?

> +		ret = check_func_arg_reg_off(env, reg, regno, arg_type);
>  		if (ret < 0)
>  			return ret;
>  
> @@ -6046,9 +6049,7 @@ static int btf_check_func_arg_match(struct bpf_verifier_env *env,
>  				reg_btf = reg->btf;
>  				reg_ref_id = reg->btf_id;
>  				/* Ensure only one argument is referenced
> -				 * PTR_TO_BTF_ID, check_func_arg_reg_off relies
> -				 * on only one referenced register being allowed
> -				 * for kfuncs.
> +				 * PTR_TO_BTF_ID.

/* Ensure only one argument is referenced PTR_TO_BTF_ID.

>  				 */
>  				if (reg->ref_obj_id) {
>  					if (ref_obj_id) {
> diff --git a/kernel/bpf/ringbuf.c b/kernel/bpf/ringbuf.c
> index 710ba9de12ce..a22c21c0a7ef 100644
> --- a/kernel/bpf/ringbuf.c
> +++ b/kernel/bpf/ringbuf.c
> @@ -404,7 +404,7 @@ BPF_CALL_2(bpf_ringbuf_submit, void *, sample, u64, flags)
>  const struct bpf_func_proto bpf_ringbuf_submit_proto = {
>  	.func		= bpf_ringbuf_submit,
>  	.ret_type	= RET_VOID,
> -	.arg1_type	= ARG_PTR_TO_ALLOC_MEM,
> +	.arg1_type	= ARG_PTR_TO_ALLOC_MEM | PTR_RELEASE,
>  	.arg2_type	= ARG_ANYTHING,
>  };
>  
> @@ -417,7 +417,7 @@ BPF_CALL_2(bpf_ringbuf_discard, void *, sample, u64, flags)
>  const struct bpf_func_proto bpf_ringbuf_discard_proto = {
>  	.func		= bpf_ringbuf_discard,
>  	.ret_type	= RET_VOID,
> -	.arg1_type	= ARG_PTR_TO_ALLOC_MEM,
> +	.arg1_type	= ARG_PTR_TO_ALLOC_MEM | PTR_RELEASE,
>  	.arg2_type	= ARG_ANYTHING,
>  };
>  
> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> index c802e51c4e18..97f88d06f848 100644
> --- a/kernel/bpf/verifier.c
> +++ b/kernel/bpf/verifier.c
> @@ -245,6 +245,7 @@ struct bpf_call_arg_meta {
>  	struct bpf_map *map_ptr;
>  	bool raw_mode;
>  	bool pkt_access;
> +	u8 release_regno;
>  	int regno;

release_regno and regno are always equal.
Why go with a u8 instead of a bool flag?

>  	int access_size;
>  	int mem_size;
> @@ -471,17 +472,6 @@ static bool type_may_be_null(u32 type)
>  	return type & PTR_MAYBE_NULL;
>  }
>  
> -/* Determine whether the function releases some resources allocated by another
> - * function call. The first reference type argument will be assumed to be
> - * released by release_reference().
> - */
> -static bool is_release_function(enum bpf_func_id func_id)
> -{
> -	return func_id == BPF_FUNC_sk_release ||
> -	       func_id == BPF_FUNC_ringbuf_submit ||
> -	       func_id == BPF_FUNC_ringbuf_discard;
> -}
> -
>  static bool may_be_acquire_function(enum bpf_func_id func_id)
>  {
>  	return func_id == BPF_FUNC_sk_lookup_tcp ||
> @@ -5304,6 +5294,11 @@ static bool arg_type_is_int_ptr(enum bpf_arg_type type)
>  	       type == ARG_PTR_TO_LONG;
>  }
>  
> +static bool arg_type_is_release_ptr(enum bpf_arg_type type)

arg_type_is_release()?

> +{
> +	return type & PTR_RELEASE;
> +}
> +
>  static int int_ptr_type_to_size(enum bpf_arg_type type)
>  {
>  	if (type == ARG_PTR_TO_INT)
> @@ -5514,11 +5509,10 @@ static int check_reg_type(struct bpf_verifier_env *env, u32 regno,
>  
>  int check_func_arg_reg_off(struct bpf_verifier_env *env,
>  			   const struct bpf_reg_state *reg, int regno,
> -			   enum bpf_arg_type arg_type,
> -			   bool is_release_func)
> +			   enum bpf_arg_type arg_type)
>  {
> -	bool fixed_off_ok = false, release_reg;
>  	enum bpf_reg_type type = reg->type;
> +	bool fixed_off_ok = false;
>  
>  	switch ((u32)type) {
>  	case SCALAR_VALUE:
> @@ -5536,7 +5530,7 @@ int check_func_arg_reg_off(struct bpf_verifier_env *env,
>  		/* Some of the argument types nevertheless require a
>  		 * zero register offset.
>  		 */
> -		if (arg_type != ARG_PTR_TO_ALLOC_MEM)
> +		if (base_type(arg_type) != ARG_PTR_TO_ALLOC_MEM)
>  			return 0;
>  		break;
>  	/* All the rest must be rejected, except PTR_TO_BTF_ID which allows
> @@ -5544,19 +5538,17 @@ int check_func_arg_reg_off(struct bpf_verifier_env *env,
>  	 */
>  	case PTR_TO_BTF_ID:
>  		/* When referenced PTR_TO_BTF_ID is passed to release function,
> -		 * it's fixed offset must be 0. We rely on the property that
> -		 * only one referenced register can be passed to BPF helpers and
> -		 * kfuncs. In the other cases, fixed offset can be non-zero.
> +		 * it's fixed offset must be 0.	In the other cases, fixed offset
> +		 * can be non-zero.
>  		 */
> -		release_reg = is_release_func && reg->ref_obj_id;
> -		if (release_reg && reg->off) {
> +		if (arg_type_is_release_ptr(arg_type) && reg->off) {
>  			verbose(env, "R%d must have zero offset when passed to release func\n",
>  				regno);
>  			return -EINVAL;
>  		}
> -		/* For release_reg == true, fixed_off_ok must be false, but we
> -		 * already checked and rejected reg->off != 0 above, so set to
> -		 * true to allow fixed offset for all other cases.
> +		/* For arg is release pointer, fixed_off_ok must be false, but
> +		 * we already checked and rejected reg->off != 0 above, so set
> +		 * to true to allow fixed offset for all other cases.
>  		 */
>  		fixed_off_ok = true;
>  		break;
> @@ -5615,14 +5607,24 @@ static int check_func_arg(struct bpf_verifier_env *env, u32 arg,
>  	if (err)
>  		return err;
>  
> -	err = check_func_arg_reg_off(env, reg, regno, arg_type, is_release_function(meta->func_id));
> +	err = check_func_arg_reg_off(env, reg, regno, arg_type);
>  	if (err)
>  		return err;
>  
>  skip_type_check:
> -	/* check_func_arg_reg_off relies on only one referenced register being
> -	 * allowed for BPF helpers.
> -	 */
> +	if (arg_type_is_release_ptr(arg_type)) {
> +		if (!reg->ref_obj_id && !register_is_null(reg)) {
> +			verbose(env, "R%d must be referenced when passed to release function\n",
> +				regno);
> +			return -EINVAL;
> +		}
> +		if (meta->release_regno) {
> +			verbose(env, "verifier internal error: more than one release argument\n");
> +			return -EFAULT;
> +		}
> +		meta->release_regno = regno;
> +	}
> +
>  	if (reg->ref_obj_id) {
>  		if (meta->ref_obj_id) {
>  			verbose(env, "verifier internal error: more than one arg with ref_obj_id R%d %u %u\n",
> @@ -6129,7 +6131,8 @@ static bool check_btf_id_ok(const struct bpf_func_proto *fn)
>  	return true;
>  }
>  
> -static int check_func_proto(const struct bpf_func_proto *fn, int func_id)
> +static int check_func_proto(const struct bpf_func_proto *fn, int func_id,
> +			    struct bpf_call_arg_meta *meta)
>  {
>  	return check_raw_mode_ok(fn) &&
>  	       check_arg_pair_ok(fn) &&
> @@ -6813,7 +6816,7 @@ static int check_helper_call(struct bpf_verifier_env *env, struct bpf_insn *insn
>  	memset(&meta, 0, sizeof(meta));
>  	meta.pkt_access = fn->pkt_access;
>  
> -	err = check_func_proto(fn, func_id);
> +	err = check_func_proto(fn, func_id, &meta);
>  	if (err) {
>  		verbose(env, "kernel subsystem misconfigured func %s#%d\n",
>  			func_id_name(func_id), func_id);
> @@ -6846,8 +6849,17 @@ static int check_helper_call(struct bpf_verifier_env *env, struct bpf_insn *insn
>  			return err;
>  	}
>  
> -	if (is_release_function(func_id)) {
> -		err = release_reference(env, meta.ref_obj_id);
> +	regs = cur_regs(env);
> +
> +	if (meta.release_regno) {
> +		err = -EINVAL;
> +		if (meta.ref_obj_id)
> +			err = release_reference(env, meta.ref_obj_id);
> +		/* meta.ref_obj_id can only be 0 if register that is meant to be
> +		 * released is NULL, which must be > R0.
> +		 */
> +		else if (register_is_null(&regs[meta.release_regno]))
> +			err = 0;
>  		if (err) {
>  			verbose(env, "func %s#%d reference has not been acquired before\n",
>  				func_id_name(func_id), func_id);
> @@ -6855,8 +6867,6 @@ static int check_helper_call(struct bpf_verifier_env *env, struct bpf_insn *insn
>  		}
>  	}
>  
> -	regs = cur_regs(env);
> -
>  	switch (func_id) {
>  	case BPF_FUNC_tail_call:
>  		err = check_reference_leak(env);
> diff --git a/net/core/filter.c b/net/core/filter.c
> index 143f442a9505..8eb01a997476 100644
> --- a/net/core/filter.c
> +++ b/net/core/filter.c
> @@ -6621,7 +6621,7 @@ static const struct bpf_func_proto bpf_sk_release_proto = {
>  	.func		= bpf_sk_release,
>  	.gpl_only	= false,
>  	.ret_type	= RET_INTEGER,
> -	.arg1_type	= ARG_PTR_TO_BTF_ID_SOCK_COMMON,
> +	.arg1_type	= ARG_PTR_TO_BTF_ID_SOCK_COMMON | PTR_RELEASE,
>  };
>  
>  BPF_CALL_5(bpf_xdp_sk_lookup_udp, struct xdp_buff *, ctx,
> diff --git a/tools/testing/selftests/bpf/verifier/ref_tracking.c b/tools/testing/selftests/bpf/verifier/ref_tracking.c
> index fbd682520e47..57a83d763ec1 100644
> --- a/tools/testing/selftests/bpf/verifier/ref_tracking.c
> +++ b/tools/testing/selftests/bpf/verifier/ref_tracking.c
> @@ -796,7 +796,7 @@
>  	},
>  	.prog_type = BPF_PROG_TYPE_SCHED_CLS,
>  	.result = REJECT,
> -	.errstr = "reference has not been acquired before",
> +	.errstr = "R1 must be referenced when passed to release function",
>  },
>  {
>  	/* !bpf_sk_fullsock(sk) is checked but !bpf_tcp_sock(sk) is not checked */
> diff --git a/tools/testing/selftests/bpf/verifier/sock.c b/tools/testing/selftests/bpf/verifier/sock.c
> index 86b24cad27a7..d11d0b28be41 100644
> --- a/tools/testing/selftests/bpf/verifier/sock.c
> +++ b/tools/testing/selftests/bpf/verifier/sock.c
> @@ -417,7 +417,7 @@
>  	},
>  	.prog_type = BPF_PROG_TYPE_SCHED_CLS,
>  	.result = REJECT,
> -	.errstr = "reference has not been acquired before",
> +	.errstr = "R1 must be referenced when passed to release function",
>  },
>  {
>  	"bpf_sk_release(bpf_sk_fullsock(skb->sk))",
> @@ -436,7 +436,7 @@
>  	},
>  	.prog_type = BPF_PROG_TYPE_SCHED_CLS,
>  	.result = REJECT,
> -	.errstr = "reference has not been acquired before",
> +	.errstr = "R1 must be referenced when passed to release function",
>  },
>  {
>  	"bpf_sk_release(bpf_tcp_sock(skb->sk))",
> @@ -455,7 +455,7 @@
>  	},
>  	.prog_type = BPF_PROG_TYPE_SCHED_CLS,
>  	.result = REJECT,
> -	.errstr = "reference has not been acquired before",
> +	.errstr = "R1 must be referenced when passed to release function",
>  },
>  {
>  	"sk_storage_get(map, skb->sk, NULL, 0): value == NULL",
> -- 
> 2.35.1
> 

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH bpf-next v5 05/13] bpf: Allow storing referenced kptr in map
  2022-04-15 16:03 ` [PATCH bpf-next v5 05/13] bpf: Allow storing referenced kptr in map Kumar Kartikeya Dwivedi
@ 2022-04-21  4:21   ` Alexei Starovoitov
  2022-04-21 19:38     ` Kumar Kartikeya Dwivedi
  0 siblings, 1 reply; 30+ messages in thread
From: Alexei Starovoitov @ 2022-04-21  4:21 UTC (permalink / raw)
  To: Kumar Kartikeya Dwivedi
  Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Joanne Koong, Toke Høiland-Jørgensen,
	Jesper Dangaard Brouer

On Fri, Apr 15, 2022 at 09:33:46PM +0530, Kumar Kartikeya Dwivedi wrote:
> Extending the code in previous commits, introduce referenced kptr
> support, which needs to be tagged using 'kptr_ref' tag instead. Unlike
> unreferenced kptr, referenced kptr have a lot more restrictions. In
> addition to the type matching, only a newly introduced bpf_kptr_xchg
> helper is allowed to modify the map value at that offset. This transfers
> the referenced pointer being stored into the map, releasing the
> references state for the program, and returning the old value and
> creating new reference state for the returned pointer.
> 
> Similar to unreferenced pointer case, return value for this case will
> also be PTR_TO_BTF_ID_OR_NULL. The reference for the returned pointer
> must either be eventually released by calling the corresponding release
> function, otherwise it must be transferred into another map.
> 
> It is also allowed to call bpf_kptr_xchg with a NULL pointer, to clear
> the value, and obtain the old value if any.
> 
> BPF_LDX, BPF_STX, and BPF_ST cannot access referenced kptr. A future
> commit will permit using BPF_LDX for such pointers, while attempting to
> make it safe, since the lifetime of the object won't be guaranteed.
> 
> There are valid reasons to enforce the restriction of permitting only
> bpf_kptr_xchg to operate on referenced kptr. The pointer value must be
> consistent in face of concurrent modification, and any prior values
> contained in the map must also be released before a new one is moved
> into the map. To ensure proper transfer of this ownership, bpf_kptr_xchg
> returns the old value, which the verifier would require the user to
> either free or move into another map, and releases the reference held
> for the pointer being moved in.
> 
> In the future, direct BPF_XCHG instruction may also be permitted to work
> like bpf_kptr_xchg helper.
> 
> Note that process_kptr_func doesn't have to call
> check_helper_mem_access, since we already disallow rdonly/wronly flags
> for map, which is what check_map_access_type checks, and we already
> ensure the PTR_TO_MAP_VALUE refers to kptr by obtaining its off_desc,
> so check_map_access is also not required.
> 
> Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
> ---
>  include/linux/bpf.h            |  8 +++
>  include/uapi/linux/bpf.h       | 12 +++++
>  kernel/bpf/btf.c               | 10 +++-
>  kernel/bpf/helpers.c           | 21 ++++++++
>  kernel/bpf/verifier.c          | 98 +++++++++++++++++++++++++++++-----
>  tools/include/uapi/linux/bpf.h | 12 +++++
>  6 files changed, 148 insertions(+), 13 deletions(-)
> 
> diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> index f73a3f10e654..61f83a23980f 100644
> --- a/include/linux/bpf.h
> +++ b/include/linux/bpf.h
> @@ -160,8 +160,14 @@ enum {
>  	BPF_MAP_VALUE_OFF_MAX = 8,
>  };
>  
> +enum bpf_map_off_desc_type {
> +	BPF_MAP_OFF_DESC_TYPE_UNREF_KPTR,
> +	BPF_MAP_OFF_DESC_TYPE_REF_KPTR,

Those are verbose names, and the MAP_OFF_DESC part doesn't add value.
Maybe:
enum bpf_kptr_type {
 BPF_KPTR_UNREF,
 BPF_KPTR_REF
};

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH bpf-next v5 09/13] bpf: Wire up freeing of referenced kptr
  2022-04-15 16:03 ` [PATCH bpf-next v5 09/13] bpf: Wire up freeing of referenced kptr Kumar Kartikeya Dwivedi
@ 2022-04-21  4:26   ` Alexei Starovoitov
  2022-04-21 19:39     ` Kumar Kartikeya Dwivedi
  0 siblings, 1 reply; 30+ messages in thread
From: Alexei Starovoitov @ 2022-04-21  4:26 UTC (permalink / raw)
  To: Kumar Kartikeya Dwivedi
  Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Joanne Koong, Toke Høiland-Jørgensen,
	Jesper Dangaard Brouer

On Fri, Apr 15, 2022 at 09:33:50PM +0530, Kumar Kartikeya Dwivedi wrote:
>  	return 0;
>  }
> @@ -386,6 +388,7 @@ static void array_map_free_timers(struct bpf_map *map)
>  	struct bpf_array *array = container_of(map, struct bpf_array, map);
>  	int i;
>  
> +	/* We don't reset or free kptr on uref dropping to zero. */
>  	if (likely(!map_value_has_timer(map)))

It was a copy paste mistake of mine to use likely() here in a cold
function. Let's not repeat it.

>  		return;
>  
> @@ -398,6 +401,13 @@ static void array_map_free_timers(struct bpf_map *map)
>  static void array_map_free(struct bpf_map *map)
>  {
>  	struct bpf_array *array = container_of(map, struct bpf_array, map);
> +	int i;
> +
> +	if (unlikely(map_value_has_kptrs(map))) {

Don't add unlikely() here.

> +		for (i = 0; i < array->map.max_entries; i++)
> +			bpf_map_free_kptrs(map, array->value + array->elem_size * i);
> +		bpf_map_free_kptr_off_tab(map);
> +	}
>  
>  	if (array->map.map_type == BPF_MAP_TYPE_PERCPU_ARRAY)
>  		bpf_array_free_percpu(array);
> diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c
> index fdb4d4971a2a..062a751c1595 100644
> --- a/kernel/bpf/btf.c
> +++ b/kernel/bpf/btf.c
> @@ -3415,6 +3415,8 @@ struct bpf_map_value_off *btf_parse_kptrs(const struct btf *btf,
>  {
>  	struct btf_field_info info_arr[BPF_MAP_VALUE_OFF_MAX];
>  	struct bpf_map_value_off *tab;
> +	struct btf *off_btf = NULL;
> +	struct module *mod = NULL;
>  	int ret, i, nr_off;
>  
>  	/* Revisit stack usage when bumping BPF_MAP_VALUE_OFF_MAX */
> @@ -3433,7 +3435,6 @@ struct bpf_map_value_off *btf_parse_kptrs(const struct btf *btf,
>  
>  	for (i = 0; i < nr_off; i++) {
>  		const struct btf_type *t;
> -		struct btf *off_btf;
>  		s32 id;
>  
>  		t = btf_type_by_id(btf, info_arr[i].type_id);
> @@ -3444,16 +3445,69 @@ struct bpf_map_value_off *btf_parse_kptrs(const struct btf *btf,
>  			goto end;
>  		}
>  
> +		/* Find and stash the function pointer for the destruction function that
> +		 * needs to be eventually invoked from the map free path.
> +		 */
> +		if (info_arr[i].type == BPF_MAP_OFF_DESC_TYPE_REF_KPTR) {
> +			const struct btf_type *dtor_func;
> +			const char *dtor_func_name;
> +			unsigned long addr;
> +			s32 dtor_btf_id;
> +
> +			/* This call also serves as a whitelist of allowed objects that
> +			 * can be used as a referenced pointer and be stored in a map at
> +			 * the same time.
> +			 */
> +			dtor_btf_id = btf_find_dtor_kfunc(off_btf, id);
> +			if (dtor_btf_id < 0) {
> +				ret = dtor_btf_id;
> +				goto end_btf;
> +			}
> +
> +			dtor_func = btf_type_by_id(off_btf, dtor_btf_id);
> +			if (!dtor_func) {
> +				ret = -ENOENT;
> +				goto end_btf;
> +			}
> +
> +			if (btf_is_module(btf)) {
> +				mod = btf_try_get_module(off_btf);
> +				if (!mod) {
> +					ret = -ENXIO;
> +					goto end_btf;
> +				}
> +			}
> +
> +			/* We already verified dtor_func to be btf_type_is_func
> +			 * in register_btf_id_dtor_kfuncs.
> +			 */
> +			dtor_func_name = __btf_name_by_offset(off_btf, dtor_func->name_off);
> +			addr = kallsyms_lookup_name(dtor_func_name);
> +			if (!addr) {
> +				ret = -EINVAL;
> +				goto end_mod;
> +			}
> +			tab->off[i].kptr.dtor = (void *)addr;
> +		}
> +
>  		tab->off[i].offset = info_arr[i].off;
>  		tab->off[i].type = info_arr[i].type;
>  		tab->off[i].kptr.btf_id = id;
>  		tab->off[i].kptr.btf = off_btf;
> +		tab->off[i].kptr.module = mod;
>  	}
>  	tab->nr_off = nr_off;
>  	return tab;
> +end_mod:
> +	module_put(mod);
> +end_btf:
> +	btf_put(off_btf);
>  end:
> -	while (i--)
> +	while (i--) {
>  		btf_put(tab->off[i].kptr.btf);
> +		if (tab->off[i].kptr.module)
> +			module_put(tab->off[i].kptr.module);
> +	}
>  	kfree(tab);
>  	return ERR_PTR(ret);
>  }
> @@ -7059,6 +7113,43 @@ s32 btf_find_dtor_kfunc(struct btf *btf, u32 btf_id)
>  	return dtor->kfunc_btf_id;
>  }
>  
> +static int btf_check_dtor_kfuncs(struct btf *btf, const struct btf_id_dtor_kfunc *dtors, u32 cnt)
> +{
> +	const struct btf_type *dtor_func, *dtor_func_proto, *t;
> +	const struct btf_param *args;
> +	s32 dtor_btf_id;
> +	u32 nr_args, i;
> +
> +	for (i = 0; i < cnt; i++) {
> +		dtor_btf_id = dtors[i].kfunc_btf_id;
> +
> +		dtor_func = btf_type_by_id(btf, dtor_btf_id);
> +		if (!dtor_func || !btf_type_is_func(dtor_func))
> +			return -EINVAL;
> +
> +		dtor_func_proto = btf_type_by_id(btf, dtor_func->type);
> +		if (!dtor_func_proto || !btf_type_is_func_proto(dtor_func_proto))
> +			return -EINVAL;
> +
> +		/* Make sure the prototype of the destructor kfunc is 'void func(type *)' */
> +		t = btf_type_by_id(btf, dtor_func_proto->type);
> +		if (!t || !btf_type_is_void(t))
> +			return -EINVAL;
> +
> +		nr_args = btf_type_vlen(dtor_func_proto);
> +		if (nr_args != 1)
> +			return -EINVAL;
> +		args = btf_params(dtor_func_proto);
> +		t = btf_type_by_id(btf, args[0].type);
> +		/* Allow any pointer type, as width on targets Linux supports
> +		 * will be same for all pointer types (i.e. sizeof(void *))
> +		 */
> +		if (!t || !btf_type_is_ptr(t))
> +			return -EINVAL;
> +	}
> +	return 0;
> +}
> +
>  /* This function must be invoked only from initcalls/module init functions */
>  int register_btf_id_dtor_kfuncs(const struct btf_id_dtor_kfunc *dtors, u32 add_cnt,
>  				struct module *owner)
> @@ -7089,6 +7180,11 @@ int register_btf_id_dtor_kfuncs(const struct btf_id_dtor_kfunc *dtors, u32 add_c
>  		goto end;
>  	}
>  
> +	/* Ensure that the prototype of dtor kfuncs being registered is sane */
> +	ret = btf_check_dtor_kfuncs(btf, dtors, add_cnt);
> +	if (ret < 0)
> +		goto end;
> +
>  	tab = btf->dtor_kfunc_tab;
>  	/* Only one call allowed for modules */
>  	if (WARN_ON_ONCE(tab && btf_is_module(btf))) {
> diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c
> index c68fbebc8c00..2bc9416096ca 100644
> --- a/kernel/bpf/hashtab.c
> +++ b/kernel/bpf/hashtab.c
> @@ -254,6 +254,25 @@ static void htab_free_prealloced_timers(struct bpf_htab *htab)
>  	}
>  }
>  
> +static void htab_free_prealloced_kptrs(struct bpf_htab *htab)
> +{
> +	u32 num_entries = htab->map.max_entries;
> +	int i;
> +
> +	if (likely(!map_value_has_kptrs(&htab->map)))

drop it here too.

> +		return;
> +	if (htab_has_extra_elems(htab))
> +		num_entries += num_possible_cpus();
> +
> +	for (i = 0; i < num_entries; i++) {
> +		struct htab_elem *elem;
> +
> +		elem = get_htab_elem(htab, i);
> +		bpf_map_free_kptrs(&htab->map, elem->key + round_up(htab->map.key_size, 8));
> +		cond_resched();
> +	}
> +}
> +
>  static void htab_free_elems(struct bpf_htab *htab)
>  {
>  	int i;
> @@ -725,12 +744,15 @@ static int htab_lru_map_gen_lookup(struct bpf_map *map,
>  	return insn - insn_buf;
>  }
>  
> -static void check_and_free_timer(struct bpf_htab *htab, struct htab_elem *elem)
> +static void check_and_free_fields(struct bpf_htab *htab,
> +				  struct htab_elem *elem)
>  {
> +	void *map_value = elem->key + round_up(htab->map.key_size, 8);
> +
>  	if (unlikely(map_value_has_timer(&htab->map)))

remove my copy-paste error pls.

> -		bpf_timer_cancel_and_free(elem->key +
> -					  round_up(htab->map.key_size, 8) +
> -					  htab->map.timer_off);
> +		bpf_timer_cancel_and_free(map_value + htab->map.timer_off);
> +	if (unlikely(map_value_has_kptrs(&htab->map)))

don't add it.

> +		bpf_map_free_kptrs(&htab->map, map_value);
>  }
>  
>  /* It is called from the bpf_lru_list when the LRU needs to delete
> @@ -757,7 +779,7 @@ static bool htab_lru_map_delete_node(void *arg, struct bpf_lru_node *node)
>  	hlist_nulls_for_each_entry_rcu(l, n, head, hash_node)
>  		if (l == tgt_l) {
>  			hlist_nulls_del_rcu(&l->hash_node);
> -			check_and_free_timer(htab, l);
> +			check_and_free_fields(htab, l);
>  			break;
>  		}
>  
> @@ -829,7 +851,7 @@ static void htab_elem_free(struct bpf_htab *htab, struct htab_elem *l)
>  {
>  	if (htab->map.map_type == BPF_MAP_TYPE_PERCPU_HASH)
>  		free_percpu(htab_elem_get_ptr(l, htab->map.key_size));
> -	check_and_free_timer(htab, l);
> +	check_and_free_fields(htab, l);
>  	kfree(l);
>  }
>  
> @@ -857,7 +879,7 @@ static void free_htab_elem(struct bpf_htab *htab, struct htab_elem *l)
>  	htab_put_fd_value(htab, l);
>  
>  	if (htab_is_prealloc(htab)) {
> -		check_and_free_timer(htab, l);
> +		check_and_free_fields(htab, l);
>  		__pcpu_freelist_push(&htab->freelist, &l->fnode);
>  	} else {
>  		atomic_dec(&htab->count);
> @@ -1104,7 +1126,7 @@ static int htab_map_update_elem(struct bpf_map *map, void *key, void *value,
>  		if (!htab_is_prealloc(htab))
>  			free_htab_elem(htab, l_old);
>  		else
> -			check_and_free_timer(htab, l_old);
> +			check_and_free_fields(htab, l_old);
>  	}
>  	ret = 0;
>  err:
> @@ -1114,7 +1136,7 @@ static int htab_map_update_elem(struct bpf_map *map, void *key, void *value,
>  
>  static void htab_lru_push_free(struct bpf_htab *htab, struct htab_elem *elem)
>  {
> -	check_and_free_timer(htab, elem);
> +	check_and_free_fields(htab, elem);
>  	bpf_lru_push_free(&htab->lru, &elem->lru_node);
>  }
>  
> @@ -1419,8 +1441,14 @@ static void htab_free_malloced_timers(struct bpf_htab *htab)
>  		struct hlist_nulls_node *n;
>  		struct htab_elem *l;
>  
> -		hlist_nulls_for_each_entry(l, n, head, hash_node)
> -			check_and_free_timer(htab, l);
> +		hlist_nulls_for_each_entry(l, n, head, hash_node) {
> +			/* We don't reset or free kptr on uref dropping to zero,
> +			 * hence just free timer.
> +			 */
> +			bpf_timer_cancel_and_free(l->key +
> +						  round_up(htab->map.key_size, 8) +
> +						  htab->map.timer_off);
> +		}
>  		cond_resched_rcu();
>  	}
>  	rcu_read_unlock();
> @@ -1430,6 +1458,7 @@ static void htab_map_free_timers(struct bpf_map *map)
>  {
>  	struct bpf_htab *htab = container_of(map, struct bpf_htab, map);
>  
> +	/* We don't reset or free kptr on uref dropping to zero. */
>  	if (likely(!map_value_has_timer(&htab->map)))

pls remove.

>  		return;
>  	if (!htab_is_prealloc(htab))
> @@ -1453,11 +1482,14 @@ static void htab_map_free(struct bpf_map *map)
>  	 * not have executed. Wait for them.
>  	 */
>  	rcu_barrier();
> -	if (!htab_is_prealloc(htab))
> +	if (!htab_is_prealloc(htab)) {
>  		delete_all_elements(htab);
> -	else
> +	} else {
> +		htab_free_prealloced_kptrs(htab);
>  		prealloc_destroy(htab);
> +	}
>  
> +	bpf_map_free_kptr_off_tab(map);
>  	free_percpu(htab->extra_elems);
>  	bpf_map_area_free(htab->buckets);
>  	for (i = 0; i < HASHTAB_MAP_LOCK_COUNT; i++)
> diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
> index 1b1497b94303..518acf39b40c 100644
> --- a/kernel/bpf/syscall.c
> +++ b/kernel/bpf/syscall.c
> @@ -508,8 +508,11 @@ void bpf_map_free_kptr_off_tab(struct bpf_map *map)
>  	if (!map_value_has_kptrs(map))
>  		return;
>  	for (i = 0; i < tab->nr_off; i++) {
> +		struct module *mod = tab->off[i].kptr.module;
>  		struct btf *btf = tab->off[i].kptr.btf;
>  
> +		if (mod)
> +			module_put(mod);
>  		btf_put(btf);
>  	}
>  	kfree(tab);
> @@ -524,8 +527,16 @@ struct bpf_map_value_off *bpf_map_copy_kptr_off_tab(const struct bpf_map *map)
>  	if (!map_value_has_kptrs(map))
>  		return ERR_PTR(-ENOENT);
>  	/* Do a deep copy of the kptr_off_tab */
> -	for (i = 0; i < tab->nr_off; i++)
> -		btf_get(tab->off[i].kptr.btf);
> +	for (i = 0; i < tab->nr_off; i++) {
> +		struct module *mod = tab->off[i].kptr.module;
> +		struct btf *btf = tab->off[i].kptr.btf;
> +
> +		if (mod && !try_module_get(mod)) {
> +			ret = -ENXIO;
> +			goto end;
> +		}
> +		btf_get(btf);
> +	}
>  
>  	size = offsetof(struct bpf_map_value_off, off[tab->nr_off]);
>  	new_tab = kmemdup(tab, size, GFP_KERNEL | __GFP_NOWARN);
> @@ -535,8 +546,14 @@ struct bpf_map_value_off *bpf_map_copy_kptr_off_tab(const struct bpf_map *map)
>  	}
>  	return new_tab;
>  end:
> -	while (i--)
> -		btf_put(tab->off[i].kptr.btf);
> +	while (i--) {
> +		struct module *mod = tab->off[i].kptr.module;
> +		struct btf *btf = tab->off[i].kptr.btf;
> +
> +		if (mod)
> +			module_put(mod);
> +		btf_put(btf);
> +	}
>  	return ERR_PTR(ret);
>  }
>  
> @@ -556,6 +573,33 @@ bool bpf_map_equal_kptr_off_tab(const struct bpf_map *map_a, const struct bpf_ma
>  	return !memcmp(tab_a, tab_b, size);
>  }
>  
> +/* Caller must ensure map_value_has_kptrs is true. Note that this function can
> + * be called on a map value while the map_value is visible to BPF programs, as
> + * it ensures the correct synchronization, and we already enforce the same using
> + * the bpf_kptr_xchg helper on the BPF program side for referenced kptrs.
> + */
> +void bpf_map_free_kptrs(struct bpf_map *map, void *map_value)
> +{
> +	struct bpf_map_value_off *tab = map->kptr_off_tab;
> +	unsigned long *btf_id_ptr;
> +	int i;
> +
> +	for (i = 0; i < tab->nr_off; i++) {
> +		struct bpf_map_value_off_desc *off_desc = &tab->off[i];
> +		unsigned long old_ptr;
> +
> +		btf_id_ptr = map_value + off_desc->offset;
> +		if (off_desc->type == BPF_MAP_OFF_DESC_TYPE_UNREF_KPTR) {
> +			u64 *p = (u64 *)btf_id_ptr;
> +
> +			WRITE_ONCE(p, 0);
> +			continue;
> +		}
> +		old_ptr = xchg(btf_id_ptr, 0);
> +		off_desc->kptr.dtor((void *)old_ptr);
> +	}
> +}
> +
>  /* called from workqueue */
>  static void bpf_map_free_deferred(struct work_struct *work)
>  {
> @@ -563,9 +607,10 @@ static void bpf_map_free_deferred(struct work_struct *work)
>  
>  	security_bpf_map_free(map);
>  	kfree(map->off_arr);
> -	bpf_map_free_kptr_off_tab(map);
>  	bpf_map_release_memcg(map);
> -	/* implementation dependent freeing */
> +	/* implementation dependent freeing, map_free callback also does
> +	 * bpf_map_free_kptr_off_tab, if needed.
> +	 */
>  	map->ops->map_free(map);
>  }
>  
> -- 
> 2.35.1
> 
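
For reference, the registration side that btf_find_dtor_kfunc() and
btf_check_dtor_kfuncs() above validate looks roughly like the sketch below.
Pairing prog_test_ref_kfunc with bpf_kfunc_call_test_release() here is an
assumption for illustration, borrowed from the series' selftest objects.

#include <linux/btf.h>
#include <linux/btf_ids.h>
#include <linux/init.h>
#include <linux/module.h>

BTF_ID_LIST(prog_test_dtor_ids)
BTF_ID(struct, prog_test_ref_kfunc)
BTF_ID(func, bpf_kfunc_call_test_release)

/* Pair the object's btf_id with the btf_id of its destructor kfunc. */
static const struct btf_id_dtor_kfunc prog_test_dtors[] = {
	{
		.btf_id       = prog_test_dtor_ids[0],
		.kfunc_btf_id = prog_test_dtor_ids[1],
	},
};

static int __init prog_test_dtor_init(void)
{
	/* The dtor prototype must be 'void func(type *)', which
	 * btf_check_dtor_kfuncs() enforces at registration time.
	 */
	return register_btf_id_dtor_kfuncs(prog_test_dtors,
					   ARRAY_SIZE(prog_test_dtors),
					   THIS_MODULE);
}
late_initcall(prog_test_dtor_init);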


* Re: [PATCH bpf-next v5 02/13] bpf: Move check_ptr_off_reg before check_map_access
  2022-04-15 16:03 ` [PATCH bpf-next v5 02/13] bpf: Move check_ptr_off_reg before check_map_access Kumar Kartikeya Dwivedi
@ 2022-04-21  4:30   ` Alexei Starovoitov
  0 siblings, 0 replies; 30+ messages in thread
From: Alexei Starovoitov @ 2022-04-21  4:30 UTC (permalink / raw)
  To: Kumar Kartikeya Dwivedi
  Cc: bpf, Joanne Koong, Alexei Starovoitov, Andrii Nakryiko,
	Daniel Borkmann, Toke Høiland-Jørgensen,
	Jesper Dangaard Brouer

On Fri, Apr 15, 2022 at 09:33:43PM +0530, Kumar Kartikeya Dwivedi wrote:
> Some functions in next patch want to use this function, and those
> functions will be called by check_map_access, hence move it before
> check_map_access.
> 
> Acked-by: Joanne Koong <joannelkoong@gmail.com>
> Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>

I've applied the first two patches.


* Re: [PATCH bpf-next v5 00/13] Introduce typed pointer support in BPF maps
  2022-04-15 16:03 [PATCH bpf-next v5 00/13] Introduce typed pointer support in BPF maps Kumar Kartikeya Dwivedi
                   ` (12 preceding siblings ...)
  2022-04-15 16:03 ` [PATCH bpf-next v5 13/13] selftests/bpf: Add verifier " Kumar Kartikeya Dwivedi
@ 2022-04-21  4:40 ` patchwork-bot+netdevbpf
  13 siblings, 0 replies; 30+ messages in thread
From: patchwork-bot+netdevbpf @ 2022-04-21  4:40 UTC (permalink / raw)
  To: Kumar Kartikeya Dwivedi
  Cc: bpf, ast, andrii, daniel, joannelkoong, toke, brouer

Hello:

This series was applied to bpf/bpf-next.git (master)
by Alexei Starovoitov <ast@kernel.org>:

On Fri, 15 Apr 2022 21:33:41 +0530 you wrote:
> This set enables storing pointers of a certain type in BPF map, and extends the
> verifier to enforce type safety and lifetime correctness properties.
> 
> The infrastructure being added is generic enough for allowing storing any kind
> of pointers whose type is available using BTF (user or kernel) in the future
> (e.g. strongly typed memory allocation in BPF program), which are internally
> tracked in the verifier as PTR_TO_BTF_ID, but for now the series limits them to
> two kinds of pointers obtained from the kernel.
> 
> [...]

Here is the summary with links:
  - [bpf-next,v5,01/13] bpf: Make btf_find_field more generic
    https://git.kernel.org/bpf/bpf-next/c/91af2fc8739e
  - [bpf-next,v5,02/13] bpf: Move check_ptr_off_reg before check_map_access
    https://git.kernel.org/bpf/bpf-next/c/0ed6ff597f2d
  - [bpf-next,v5,03/13] bpf: Allow storing unreferenced kptr in map
    (no matching commit)
  - [bpf-next,v5,04/13] bpf: Tag argument to be released in bpf_func_proto
    (no matching commit)
  - [bpf-next,v5,05/13] bpf: Allow storing referenced kptr in map
    (no matching commit)
  - [bpf-next,v5,06/13] bpf: Prevent escaping of kptr loaded from maps
    (no matching commit)
  - [bpf-next,v5,07/13] bpf: Adapt copy_map_value for multiple offset case
    (no matching commit)
  - [bpf-next,v5,08/13] bpf: Populate pairs of btf_id and destructor kfunc in btf
    (no matching commit)
  - [bpf-next,v5,09/13] bpf: Wire up freeing of referenced kptr
    (no matching commit)
  - [bpf-next,v5,10/13] bpf: Teach verifier about kptr_get kfunc helpers
    (no matching commit)
  - [bpf-next,v5,11/13] libbpf: Add kptr type tag macros to bpf_helpers.h
    (no matching commit)
  - [bpf-next,v5,12/13] selftests/bpf: Add C tests for kptr
    (no matching commit)
  - [bpf-next,v5,13/13] selftests/bpf: Add verifier tests for kptr
    (no matching commit)

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html




* Re: [PATCH bpf-next v5 03/13] bpf: Allow storing unreferenced kptr in map
  2022-04-21  4:15   ` Alexei Starovoitov
@ 2022-04-21 19:36     ` Kumar Kartikeya Dwivedi
  2022-04-21 22:26       ` Alexei Starovoitov
  0 siblings, 1 reply; 30+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2022-04-21 19:36 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Joanne Koong, Toke Høiland-Jørgensen,
	Jesper Dangaard Brouer

On Thu, Apr 21, 2022 at 09:45:28AM IST, Alexei Starovoitov wrote:
> On Fri, Apr 15, 2022 at 09:33:44PM +0530, Kumar Kartikeya Dwivedi wrote:
> > +struct bpf_map_value_off *btf_parse_kptrs(const struct btf *btf,
> > +					  const struct btf_type *t)
> > +{
> > +	struct btf_field_info info_arr[BPF_MAP_VALUE_OFF_MAX];
> > +	struct bpf_map_value_off *tab;
> > +	int ret, i, nr_off;
> > +
> > +	/* Revisit stack usage when bumping BPF_MAP_VALUE_OFF_MAX */
> > +	BUILD_BUG_ON(BPF_MAP_VALUE_OFF_MAX != 8);
>
> Pls drop this line and comment. It's establishing a false sense of safety
> that this is the place to worry about when enum is changing.
> Any time an enum or #define is changed all code that uses it has to be audited.
> This stack increase is a minor concern compared to all the other side effects
> that increasing BPF_MAP_VALUE_OFF_MAX would have.
>

Ok.

> > +
> > +	ret = btf_find_field(btf, t, BTF_FIELD_KPTR, info_arr, ARRAY_SIZE(info_arr));
> > +	if (ret < 0)
> > +		return ERR_PTR(ret);
> > +	if (!ret)
> > +		return NULL;
> > +
> > +	nr_off = ret;
> > +	tab = kzalloc(offsetof(struct bpf_map_value_off, off[nr_off]), GFP_KERNEL | __GFP_NOWARN);
> > +	if (!tab)
> > +		return ERR_PTR(-ENOMEM);
> > +
> > +	for (i = 0; i < nr_off; i++) {
> > +		const struct btf_type *t;
> > +		struct btf *off_btf;
>
> off_btf is an odd name here. Call it kernel_btf ?
>

Ok, will rename.

> > +		s32 id;
> > +
> > +		t = btf_type_by_id(btf, info_arr[i].type_id);
>
> pls add a comment here to make it clear that above 'btf' is a prog's btf
> and below search is trying to find it in kernel or module btf-s.
>

Ok.

> > +		id = bpf_find_btf_id(__btf_name_by_offset(btf, t->name_off), BTF_INFO_KIND(t->info),
> > +				     &off_btf);
> > +		if (id < 0) {
> > +			ret = id;
> > +			goto end;
> > +		}
> > +
> > +		tab->off[i].offset = info_arr[i].off;
> > +		tab->off[i].kptr.btf_id = id;
> > +		tab->off[i].kptr.btf = off_btf;
> > +	}
> > +	tab->nr_off = nr_off;
> > +	return tab;
> > +end:
> > +	while (i--)
> > +		btf_put(tab->off[i].kptr.btf);
> > +	kfree(tab);
> > +	return ERR_PTR(ret);
> > +}
> > +
> >  static void __btf_struct_show(const struct btf *btf, const struct btf_type *t,
> >  			      u32 type_id, void *data, u8 bits_offset,
> >  			      struct btf_show *show)
> > diff --git a/kernel/bpf/map_in_map.c b/kernel/bpf/map_in_map.c
> > index 5cd8f5277279..135205d0d560 100644
> > --- a/kernel/bpf/map_in_map.c
> > +++ b/kernel/bpf/map_in_map.c
> > @@ -52,6 +52,7 @@ struct bpf_map *bpf_map_meta_alloc(int inner_map_ufd)
> >  	inner_map_meta->max_entries = inner_map->max_entries;
> >  	inner_map_meta->spin_lock_off = inner_map->spin_lock_off;
> >  	inner_map_meta->timer_off = inner_map->timer_off;
> > +	inner_map_meta->kptr_off_tab = bpf_map_copy_kptr_off_tab(inner_map);
> >  	if (inner_map->btf) {
> >  		btf_get(inner_map->btf);
> >  		inner_map_meta->btf = inner_map->btf;
> > @@ -71,6 +72,7 @@ struct bpf_map *bpf_map_meta_alloc(int inner_map_ufd)
> >
> >  void bpf_map_meta_free(struct bpf_map *map_meta)
> >  {
> > +	bpf_map_free_kptr_off_tab(map_meta);
> >  	btf_put(map_meta->btf);
> >  	kfree(map_meta);
> >  }
> > @@ -83,7 +85,8 @@ bool bpf_map_meta_equal(const struct bpf_map *meta0,
> >  		meta0->key_size == meta1->key_size &&
> >  		meta0->value_size == meta1->value_size &&
> >  		meta0->timer_off == meta1->timer_off &&
> > -		meta0->map_flags == meta1->map_flags;
> > +		meta0->map_flags == meta1->map_flags &&
> > +		bpf_map_equal_kptr_off_tab(meta0, meta1);
> >  }
> >
> >  void *bpf_map_fd_get_ptr(struct bpf_map *map,
> > diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
> > index e9621cfa09f2..fba49f390ed5 100644
> > --- a/kernel/bpf/syscall.c
> > +++ b/kernel/bpf/syscall.c
> > @@ -6,6 +6,7 @@
> >  #include <linux/bpf_trace.h>
> >  #include <linux/bpf_lirc.h>
> >  #include <linux/bpf_verifier.h>
> > +#include <linux/bsearch.h>
> >  #include <linux/btf.h>
> >  #include <linux/syscalls.h>
> >  #include <linux/slab.h>
> > @@ -473,12 +474,94 @@ static void bpf_map_release_memcg(struct bpf_map *map)
> >  }
> >  #endif
> >
> > +static int bpf_map_kptr_off_cmp(const void *a, const void *b)
> > +{
> > +	const struct bpf_map_value_off_desc *off_desc1 = a, *off_desc2 = b;
> > +
> > +	if (off_desc1->offset < off_desc2->offset)
> > +		return -1;
> > +	else if (off_desc1->offset > off_desc2->offset)
> > +		return 1;
> > +	return 0;
> > +}
> > +
> > +struct bpf_map_value_off_desc *bpf_map_kptr_off_contains(struct bpf_map *map, u32 offset)
> > +{
> > +	/* Since members are iterated in btf_find_field in increasing order,
> > +	 * offsets appended to kptr_off_tab are in increasing order, so we can
> > +	 * do bsearch to find exact match.
> > +	 */
> > +	struct bpf_map_value_off *tab;
> > +
> > +	if (!map_value_has_kptrs(map))
> > +		return NULL;
> > +	tab = map->kptr_off_tab;
> > +	return bsearch(&offset, tab->off, tab->nr_off, sizeof(tab->off[0]), bpf_map_kptr_off_cmp);
> > +}
> > +
> > +void bpf_map_free_kptr_off_tab(struct bpf_map *map)
> > +{
> > +	struct bpf_map_value_off *tab = map->kptr_off_tab;
> > +	int i;
> > +
> > +	if (!map_value_has_kptrs(map))
> > +		return;
> > +	for (i = 0; i < tab->nr_off; i++) {
> > +		struct btf *btf = tab->off[i].kptr.btf;
> > +
> > +		btf_put(btf);
>
> why not to do: btf_put(tab->off[i].kptr.btf);
>

Ok.

> > +	}
> > +	kfree(tab);
> > +	map->kptr_off_tab = NULL;
> > +}
> > +
> > +struct bpf_map_value_off *bpf_map_copy_kptr_off_tab(const struct bpf_map *map)
> > +{
> > +	struct bpf_map_value_off *tab = map->kptr_off_tab, *new_tab;
> > +	int size, i, ret;
> > +
> > +	if (!map_value_has_kptrs(map))
> > +		return ERR_PTR(-ENOENT);
> > +	/* Do a deep copy of the kptr_off_tab */
> > +	for (i = 0; i < tab->nr_off; i++)
> > +		btf_get(tab->off[i].kptr.btf);
> > +
> > +	size = offsetof(struct bpf_map_value_off, off[tab->nr_off]);
> > +	new_tab = kmemdup(tab, size, GFP_KERNEL | __GFP_NOWARN);
> > +	if (!new_tab) {
> > +		ret = -ENOMEM;
> > +		goto end;
> > +	}
> > +	return new_tab;
> > +end:
> > +	while (i--)
> > +		btf_put(tab->off[i].kptr.btf);
>
> Why do this get/put dance?
> Isn't it equivalent to do kmemdup first and then for() btf_get?
> kptr_off_tab is not going away and btfs are not going away either.
> There is no race.
>

You are right; we should be able to just do the kmemdup first and then the btf_get loop.
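
Roughly, the copy path would then become something like the sketch below
(module references, which a later patch in this series adds, are left out):

/* Sketch for kernel/bpf/syscall.c: duplicate the table first, then take the
 * BTF references; nothing can race with us here, so no rollback is needed.
 */
struct bpf_map_value_off *bpf_map_copy_kptr_off_tab(const struct bpf_map *map)
{
	struct bpf_map_value_off *tab = map->kptr_off_tab, *new_tab;
	int size, i;

	if (!map_value_has_kptrs(map))
		return ERR_PTR(-ENOENT);
	size = offsetof(struct bpf_map_value_off, off[tab->nr_off]);
	new_tab = kmemdup(tab, size, GFP_KERNEL | __GFP_NOWARN);
	if (!new_tab)
		return ERR_PTR(-ENOMEM);
	for (i = 0; i < tab->nr_off; i++)
		btf_get(new_tab->off[i].kptr.btf);
	return new_tab;
}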

> > +	return ERR_PTR(ret);
> > +}
> > +
> > +bool bpf_map_equal_kptr_off_tab(const struct bpf_map *map_a, const struct bpf_map *map_b)
> > +{
> > +	struct bpf_map_value_off *tab_a = map_a->kptr_off_tab, *tab_b = map_b->kptr_off_tab;
> > +	bool a_has_kptr = map_value_has_kptrs(map_a), b_has_kptr = map_value_has_kptrs(map_b);
> > +	int size;
> > +
> > +	if (!a_has_kptr && !b_has_kptr)
> > +		return true;
> > +	if (a_has_kptr != b_has_kptr)
> > +		return false;
> > +	if (tab_a->nr_off != tab_b->nr_off)
> > +		return false;
> > +	size = offsetof(struct bpf_map_value_off, off[tab_a->nr_off]);
> > +	return !memcmp(tab_a, tab_b, size);
> > +}
> > +
> >  /* called from workqueue */
> >  static void bpf_map_free_deferred(struct work_struct *work)
> >  {
> >  	struct bpf_map *map = container_of(work, struct bpf_map, work);
> >
> >  	security_bpf_map_free(map);
> > +	bpf_map_free_kptr_off_tab(map);
> >  	bpf_map_release_memcg(map);
> >  	/* implementation dependent freeing */
> >  	map->ops->map_free(map);
> > @@ -640,7 +723,7 @@ static int bpf_map_mmap(struct file *filp, struct vm_area_struct *vma)
> >  	int err;
> >
> >  	if (!map->ops->map_mmap || map_value_has_spin_lock(map) ||
> > -	    map_value_has_timer(map))
> > +	    map_value_has_timer(map) || map_value_has_kptrs(map))
> >  		return -ENOTSUPP;
> >
> >  	if (!(vma->vm_flags & VM_SHARED))
> > @@ -820,9 +903,33 @@ static int map_check_btf(struct bpf_map *map, const struct btf *btf,
> >  			return -EOPNOTSUPP;
> >  	}
> >
> > -	if (map->ops->map_check_btf)
> > +	map->kptr_off_tab = btf_parse_kptrs(btf, value_type);
> > +	if (map_value_has_kptrs(map)) {
> > +		if (!bpf_capable()) {
> > +			ret = -EPERM;
> > +			goto free_map_tab;
> > +		}
> > +		if (map->map_flags & (BPF_F_RDONLY_PROG | BPF_F_WRONLY_PROG)) {
> > +			ret = -EACCES;
> > +			goto free_map_tab;
> > +		}
> > +		if (map->map_type != BPF_MAP_TYPE_HASH &&
> > +		    map->map_type != BPF_MAP_TYPE_LRU_HASH &&
> > +		    map->map_type != BPF_MAP_TYPE_ARRAY) {
> > +			ret = -EOPNOTSUPP;
> > +			goto free_map_tab;
> > +		}
> > +	}
> > +
> > +	if (map->ops->map_check_btf) {
> >  		ret = map->ops->map_check_btf(map, btf, key_type, value_type);
> > +		if (ret < 0)
> > +			goto free_map_tab;
> > +	}
> >
> > +	return ret;
> > +free_map_tab:
> > +	bpf_map_free_kptr_off_tab(map);
> >  	return ret;
> >  }
> >
> > @@ -1639,7 +1746,7 @@ static int map_freeze(const union bpf_attr *attr)
> >  		return PTR_ERR(map);
> >
> >  	if (map->map_type == BPF_MAP_TYPE_STRUCT_OPS ||
> > -	    map_value_has_timer(map)) {
> > +	    map_value_has_timer(map) || map_value_has_kptrs(map)) {
> >  		fdput(f);
> >  		return -ENOTSUPP;
> >  	}
> > diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> > index 71827d14724a..c802e51c4e18 100644
> > --- a/kernel/bpf/verifier.c
> > +++ b/kernel/bpf/verifier.c
> > @@ -3211,7 +3211,7 @@ static int check_stack_read_fixed_off(struct bpf_verifier_env *env,
> >  	return 0;
> >  }
> >
> > -enum stack_access_src {
> > +enum bpf_access_src {
> >  	ACCESS_DIRECT = 1,  /* the access is performed by an instruction */
> >  	ACCESS_HELPER = 2,  /* the access is performed by a helper */
> >  };
> > @@ -3219,7 +3219,7 @@ enum stack_access_src {
> >  static int check_stack_range_initialized(struct bpf_verifier_env *env,
> >  					 int regno, int off, int access_size,
> >  					 bool zero_size_allowed,
> > -					 enum stack_access_src type,
> > +					 enum bpf_access_src type,
> >  					 struct bpf_call_arg_meta *meta);
> >
> >  static struct bpf_reg_state *reg_state(struct bpf_verifier_env *env, int regno)
> > @@ -3507,9 +3507,87 @@ int check_ptr_off_reg(struct bpf_verifier_env *env,
> >  	return __check_ptr_off_reg(env, reg, regno, false);
> >  }
> >
> > +static int map_kptr_match_type(struct bpf_verifier_env *env,
> > +			       struct bpf_map_value_off_desc *off_desc,
> > +			       struct bpf_reg_state *reg, u32 regno)
> > +{
> > +	const char *targ_name = kernel_type_name(off_desc->kptr.btf, off_desc->kptr.btf_id);
> > +	const char *reg_name = "";
> > +
> > +	if (base_type(reg->type) != PTR_TO_BTF_ID || type_flag(reg->type) != PTR_MAYBE_NULL)
> > +		goto bad_type;
> > +
> > +	if (!btf_is_kernel(reg->btf)) {
> > +		verbose(env, "R%d must point to kernel BTF\n", regno);
> > +		return -EINVAL;
> > +	}
> > +	/* We need to verify reg->type and reg->btf, before accessing reg->btf */
> > +	reg_name = kernel_type_name(reg->btf, reg->btf_id);
> > +
> > +	if (__check_ptr_off_reg(env, reg, regno, true))
> > +		return -EACCES;
> > +
> > +	if (!btf_struct_ids_match(&env->log, reg->btf, reg->btf_id, reg->off,
> > +				  off_desc->kptr.btf, off_desc->kptr.btf_id))
> > +		goto bad_type;
>
> Is full type comparison really needed?

Yes.

> reg->btf should be the same pointer as off_desc->kptr.btf
> and btf_id should match exactly.

This is not true, it can be vmlinux or module BTF. But if you mean just
comparing the pointer and btf_id, we still need to handle reg->off.

We want to support cases like:

struct foo {
	struct bar br;
	struct baz bz;
};

struct foo *v = func(); // PTR_TO_BTF_ID
map->foo = v;	   // reg->off is zero, btf and btf_id matches type.
map->bar = &v->br; // reg->off is still zero, but we need to walk and retry with
		   // first member type of struct after comparison fails.
map->baz = &v->bz; // reg->off is non-zero, so struct needs to be walked to
		   // match type.
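
To spell the same thing out in BPF C, a rough sketch could look like the
following. The __kptr tag macro mirrors what the bpf_helpers.h patch later in
this series adds, and foo/bar/baz stand in for kernel types that would
normally come from vmlinux.h; both are assumptions for illustration.

#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

#ifndef __kptr
#define __kptr __attribute__((btf_type_tag("kptr")))
#endif

/* Stand-ins for kernel types; a kptr's type must resolve in kernel or module
 * BTF, so in a real program these declarations come from vmlinux.h.
 */
struct bar { int x; };
struct baz { int y; };
struct foo { struct bar br; struct baz bz; };

struct map_value {
	struct foo __kptr *foo; /* unreferenced kptr fields */
	struct bar __kptr *bar;
	struct baz __kptr *baz;
};

struct {
	__uint(type, BPF_MAP_TYPE_ARRAY);
	__uint(max_entries, 1);
	__type(key, int);
	__type(value, struct map_value);
} array_map SEC(".maps");

/* Hypothetical kfunc returning a PTR_TO_BTF_ID to struct foo. */
extern struct foo *func(void) __ksym;

SEC("tc")
int store_unref_kptrs(struct __sk_buff *ctx)
{
	struct map_value *v;
	struct foo *f;
	int key = 0;

	v = bpf_map_lookup_elem(&array_map, &key);
	if (!v)
		return 0;
	f = func();
	if (!f)
		return 0;
	v->foo = f;      /* off 0, exact type match */
	v->bar = &f->br; /* off 0, matched by walking the first member */
	v->baz = &f->bz; /* non-zero off, struct walked to match the type */
	return 0;
}

char _license[] SEC("license") = "GPL";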

In the ref case, the argument's offset will always be 0, so third case is not
going to work, but in the unref case, we want to allow storing pointers to
structs embedded inside parent struct.

Please let me know if I misunderstood what you meant.

> Is this a feature proofing for some day when registers with PTR_TO_BTF_ID type
> will start pointing to prog's btf?
>
> > +	return 0;
> > +bad_type:
> > +	verbose(env, "invalid kptr access, R%d type=%s%s ", regno,
> > +		reg_type_str(env, reg->type), reg_name);
> > +	verbose(env, "expected=%s%s\n", reg_type_str(env, PTR_TO_BTF_ID), targ_name);
> > +	return -EINVAL;
> > +}
> > +
> > +static int check_map_kptr_access(struct bpf_verifier_env *env, u32 regno,
> > +				 int value_regno, int insn_idx,
> > +				 struct bpf_map_value_off_desc *off_desc)
> > +{
> > +	struct bpf_insn *insn = &env->prog->insnsi[insn_idx];
> > +	int class = BPF_CLASS(insn->code);
> > +	struct bpf_reg_state *val_reg;
> > +
> > +	/* Things we already checked for in check_map_access and caller:
> > +	 *  - Reject cases where variable offset may touch kptr
> > +	 *  - size of access (must be BPF_DW)
> > +	 *  - tnum_is_const(reg->var_off)
> > +	 *  - off_desc->offset == off + reg->var_off.value
> > +	 */
> > +	/* Only BPF_[LDX,STX,ST] | BPF_MEM | BPF_DW is supported */
> > +	if (BPF_MODE(insn->code) != BPF_MEM) {
> > +		verbose(env, "kptr in map can only be accessed using BPF_MEM instruction mode\n");
> > +		return -EACCES;
> > +	}
> > +
> > +	if (class == BPF_LDX) {
> > +		val_reg = reg_state(env, value_regno);
> > +		/* We can simply mark the value_regno receiving the pointer
> > +		 * value from map as PTR_TO_BTF_ID, with the correct type.
> > +		 */
> > +		mark_btf_ld_reg(env, cur_regs(env), value_regno, PTR_TO_BTF_ID, off_desc->kptr.btf,
> > +				off_desc->kptr.btf_id, PTR_MAYBE_NULL);
> > +		val_reg->id = ++env->id_gen;
>
> why is a non-zero id needed?

For mark_ptr_or_null_reg. I'll add a comment.

--
Kartikeya


* Re: [PATCH bpf-next v5 04/13] bpf: Tag argument to be released in bpf_func_proto
  2022-04-21  4:19   ` Alexei Starovoitov
@ 2022-04-21 19:38     ` Kumar Kartikeya Dwivedi
  2022-04-24 21:57       ` Kumar Kartikeya Dwivedi
  0 siblings, 1 reply; 30+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2022-04-21 19:38 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Joanne Koong, Toke Høiland-Jørgensen,
	Jesper Dangaard Brouer

On Thu, Apr 21, 2022 at 09:49:54AM IST, Alexei Starovoitov wrote:
> On Fri, Apr 15, 2022 at 09:33:45PM +0530, Kumar Kartikeya Dwivedi wrote:
> > Add a new type flag for bpf_arg_type that when set tells verifier that
> > for a release function, that argument's register will be the one for
> > which meta.ref_obj_id will be set, and which will then be released
> > using release_reference. To capture the regno, introduce a new field
> > release_regno in bpf_call_arg_meta.
> >
> > This would be required in the next patch, where we may either pass NULL
> > or a refcounted pointer as an argument to the release function
> > bpf_kptr_xchg. Just releasing only when meta.ref_obj_id is set is not
> > enough, as there is a case where the type of argument needed matches,
> > but the ref_obj_id is set to 0. Hence, we must enforce that whenever
> > meta.ref_obj_id is zero, the register that is to be released can only
> > be NULL for a release function.
> >
> > Since we now indicate whether an argument is to be released in
> > bpf_func_proto itself, the is_release_function helper has lost its utility,
> > hence refactor code to work without it, and just rely on
> > meta.release_regno to know when to release state for a ref_obj_id.
> > Still, the restriction of one release argument and only one ref_obj_id
> > passed to BPF helper or kfunc remains. This may be lifted in the future.
> >
> > Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
> > ---
> >  include/linux/bpf.h                           |  5 +-
> >  include/linux/bpf_verifier.h                  |  3 +-
> >  kernel/bpf/btf.c                              |  9 ++-
> >  kernel/bpf/ringbuf.c                          |  4 +-
> >  kernel/bpf/verifier.c                         | 76 +++++++++++--------
> >  net/core/filter.c                             |  2 +-
> >  .../selftests/bpf/verifier/ref_tracking.c     |  2 +-
> >  tools/testing/selftests/bpf/verifier/sock.c   |  6 +-
> >  8 files changed, 60 insertions(+), 47 deletions(-)
> >
> > diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> > index ab86f4675db2..f73a3f10e654 100644
> > --- a/include/linux/bpf.h
> > +++ b/include/linux/bpf.h
> > @@ -366,7 +366,10 @@ enum bpf_type_flag {
> >  	 */
> >  	MEM_PERCPU		= BIT(4 + BPF_BASE_TYPE_BITS),
> >
> > -	__BPF_TYPE_LAST_FLAG	= MEM_PERCPU,
> > +	/* Indicates that the pointer argument will be released. */
> > +	PTR_RELEASE		= BIT(5 + BPF_BASE_TYPE_BITS),
>
> I think OBJ_RELEASE as Joanne did it in her patch is a better name.
>
> "pointer release" is not quite correct.
> It's the object that the pointer is pointing to that will be released.
>

Ok, will rename.

> > +
> > +	__BPF_TYPE_LAST_FLAG	= PTR_RELEASE,
> >  };
> >
> >  /* Max number of base types. */
> > diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h
> > index 3a9d2d7cc6b7..1f1e7f2ea967 100644
> > --- a/include/linux/bpf_verifier.h
> > +++ b/include/linux/bpf_verifier.h
> > @@ -523,8 +523,7 @@ int check_ptr_off_reg(struct bpf_verifier_env *env,
> >  		      const struct bpf_reg_state *reg, int regno);
> >  int check_func_arg_reg_off(struct bpf_verifier_env *env,
> >  			   const struct bpf_reg_state *reg, int regno,
> > -			   enum bpf_arg_type arg_type,
> > -			   bool is_release_func);
> > +			   enum bpf_arg_type arg_type);
> >  int check_kfunc_mem_size_reg(struct bpf_verifier_env *env, struct bpf_reg_state *reg,
> >  			     u32 regno);
> >  int check_mem_reg(struct bpf_verifier_env *env, struct bpf_reg_state *reg,
> > diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c
> > index be191df76ea4..7227a77a02f7 100644
> > --- a/kernel/bpf/btf.c
> > +++ b/kernel/bpf/btf.c
> > @@ -5993,6 +5993,7 @@ static int btf_check_func_arg_match(struct bpf_verifier_env *env,
> >  	 * verifier sees.
> >  	 */
> >  	for (i = 0; i < nargs; i++) {
> > +		enum bpf_arg_type arg_type = ARG_DONTCARE;
> >  		u32 regno = i + 1;
> >  		struct bpf_reg_state *reg = &regs[regno];
> >
> > @@ -6013,7 +6014,9 @@ static int btf_check_func_arg_match(struct bpf_verifier_env *env,
> >  		ref_t = btf_type_skip_modifiers(btf, t->type, &ref_id);
> >  		ref_tname = btf_name_by_offset(btf, ref_t->name_off);
> >
> > -		ret = check_func_arg_reg_off(env, reg, regno, ARG_DONTCARE, rel);
> > +		if (rel && reg->ref_obj_id)
> > +			arg_type |= PTR_RELEASE;
>
> Don't get it. Why ?
>

check_func_arg_reg_off uses arg_type_is_release_ptr, so we set this flag to
indicate that this is the release argument.

> > +		ret = check_func_arg_reg_off(env, reg, regno, arg_type);
> >  		if (ret < 0)
> >  			return ret;
> >
> > @@ -6046,9 +6049,7 @@ static int btf_check_func_arg_match(struct bpf_verifier_env *env,
> >  				reg_btf = reg->btf;
> >  				reg_ref_id = reg->btf_id;
> >  				/* Ensure only one argument is referenced
> > -				 * PTR_TO_BTF_ID, check_func_arg_reg_off relies
> > -				 * on only one referenced register being allowed
> > -				 * for kfuncs.
> > +				 * PTR_TO_BTF_ID.
>
> /* Ensure only one argument is referenced PTR_TO_BTF_ID.
>

Ok.

> >  				 */
> >  				if (reg->ref_obj_id) {
> >  					if (ref_obj_id) {
> > diff --git a/kernel/bpf/ringbuf.c b/kernel/bpf/ringbuf.c
> > index 710ba9de12ce..a22c21c0a7ef 100644
> > --- a/kernel/bpf/ringbuf.c
> > +++ b/kernel/bpf/ringbuf.c
> > @@ -404,7 +404,7 @@ BPF_CALL_2(bpf_ringbuf_submit, void *, sample, u64, flags)
> >  const struct bpf_func_proto bpf_ringbuf_submit_proto = {
> >  	.func		= bpf_ringbuf_submit,
> >  	.ret_type	= RET_VOID,
> > -	.arg1_type	= ARG_PTR_TO_ALLOC_MEM,
> > +	.arg1_type	= ARG_PTR_TO_ALLOC_MEM | PTR_RELEASE,
> >  	.arg2_type	= ARG_ANYTHING,
> >  };
> >
> > @@ -417,7 +417,7 @@ BPF_CALL_2(bpf_ringbuf_discard, void *, sample, u64, flags)
> >  const struct bpf_func_proto bpf_ringbuf_discard_proto = {
> >  	.func		= bpf_ringbuf_discard,
> >  	.ret_type	= RET_VOID,
> > -	.arg1_type	= ARG_PTR_TO_ALLOC_MEM,
> > +	.arg1_type	= ARG_PTR_TO_ALLOC_MEM | PTR_RELEASE,
> >  	.arg2_type	= ARG_ANYTHING,
> >  };
> >
> > diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> > index c802e51c4e18..97f88d06f848 100644
> > --- a/kernel/bpf/verifier.c
> > +++ b/kernel/bpf/verifier.c
> > @@ -245,6 +245,7 @@ struct bpf_call_arg_meta {
> >  	struct bpf_map *map_ptr;
> >  	bool raw_mode;
> >  	bool pkt_access;
> > +	u8 release_regno;
> >  	int regno;
>
> release_regno and regno are always equal.
> Why go with u8 instead of bool flag?
>

Didn't realise that. I will change it.

> >  	int access_size;
> >  	int mem_size;
> > @@ -471,17 +472,6 @@ static bool type_may_be_null(u32 type)
> >  	return type & PTR_MAYBE_NULL;
> >  }
> >
> > -/* Determine whether the function releases some resources allocated by another
> > - * function call. The first reference type argument will be assumed to be
> > - * released by release_reference().
> > - */
> > -static bool is_release_function(enum bpf_func_id func_id)
> > -{
> > -	return func_id == BPF_FUNC_sk_release ||
> > -	       func_id == BPF_FUNC_ringbuf_submit ||
> > -	       func_id == BPF_FUNC_ringbuf_discard;
> > -}
> > -
> >  static bool may_be_acquire_function(enum bpf_func_id func_id)
> >  {
> >  	return func_id == BPF_FUNC_sk_lookup_tcp ||
> > @@ -5304,6 +5294,11 @@ static bool arg_type_is_int_ptr(enum bpf_arg_type type)
> >  	       type == ARG_PTR_TO_LONG;
> >  }
> >
> > +static bool arg_type_is_release_ptr(enum bpf_arg_type type)
>
> arg_type_is_release() ?
>

Ok.

> > +{
> > +	return type & PTR_RELEASE;
> > +}
> > +
> >  static int int_ptr_type_to_size(enum bpf_arg_type type)
> >  {
> >  	if (type == ARG_PTR_TO_INT)
> > @@ -5514,11 +5509,10 @@ static int check_reg_type(struct bpf_verifier_env *env, u32 regno,
> >
> >  int check_func_arg_reg_off(struct bpf_verifier_env *env,
> >  			   const struct bpf_reg_state *reg, int regno,
> > -			   enum bpf_arg_type arg_type,
> > -			   bool is_release_func)
> > +			   enum bpf_arg_type arg_type)
> >  {
> > -	bool fixed_off_ok = false, release_reg;
> >  	enum bpf_reg_type type = reg->type;
> > +	bool fixed_off_ok = false;
> >
> >  	switch ((u32)type) {
> >  	case SCALAR_VALUE:
> > @@ -5536,7 +5530,7 @@ int check_func_arg_reg_off(struct bpf_verifier_env *env,
> >  		/* Some of the argument types nevertheless require a
> >  		 * zero register offset.
> >  		 */
> > -		if (arg_type != ARG_PTR_TO_ALLOC_MEM)
> > +		if (base_type(arg_type) != ARG_PTR_TO_ALLOC_MEM)
> >  			return 0;
> >  		break;
> >  	/* All the rest must be rejected, except PTR_TO_BTF_ID which allows
> > @@ -5544,19 +5538,17 @@ int check_func_arg_reg_off(struct bpf_verifier_env *env,
> >  	 */
> >  	case PTR_TO_BTF_ID:
> >  		/* When referenced PTR_TO_BTF_ID is passed to release function,
> > -		 * it's fixed offset must be 0. We rely on the property that
> > -		 * only one referenced register can be passed to BPF helpers and
> > -		 * kfuncs. In the other cases, fixed offset can be non-zero.
> > +		 * it's fixed offset must be 0.	In the other cases, fixed offset
> > +		 * can be non-zero.
> >  		 */
> > -		release_reg = is_release_func && reg->ref_obj_id;
> > -		if (release_reg && reg->off) {
> > +		if (arg_type_is_release_ptr(arg_type) && reg->off) {
> >  			verbose(env, "R%d must have zero offset when passed to release func\n",
> >  				regno);
> >  			return -EINVAL;
> >  		}
> > -		/* For release_reg == true, fixed_off_ok must be false, but we
> > -		 * already checked and rejected reg->off != 0 above, so set to
> > -		 * true to allow fixed offset for all other cases.
> > +		/* For arg is release pointer, fixed_off_ok must be false, but
> > +		 * we already checked and rejected reg->off != 0 above, so set
> > +		 * to true to allow fixed offset for all other cases.
> >  		 */
> >  		fixed_off_ok = true;
> >  		break;
> > @@ -5615,14 +5607,24 @@ static int check_func_arg(struct bpf_verifier_env *env, u32 arg,
> >  	if (err)
> >  		return err;
> >
> > -	err = check_func_arg_reg_off(env, reg, regno, arg_type, is_release_function(meta->func_id));
> > +	err = check_func_arg_reg_off(env, reg, regno, arg_type);
> >  	if (err)
> >  		return err;
> >
> >  skip_type_check:
> > -	/* check_func_arg_reg_off relies on only one referenced register being
> > -	 * allowed for BPF helpers.
> > -	 */
> > +	if (arg_type_is_release_ptr(arg_type)) {
> > +		if (!reg->ref_obj_id && !register_is_null(reg)) {
> > +			verbose(env, "R%d must be referenced when passed to release function\n",
> > +				regno);
> > +			return -EINVAL;
> > +		}
> > +		if (meta->release_regno) {
> > +			verbose(env, "verifier internal error: more than one release argument\n");
> > +			return -EFAULT;
> > +		}
> > +		meta->release_regno = regno;
> > +	}
> > +
> >  	if (reg->ref_obj_id) {
> >  		if (meta->ref_obj_id) {
> >  			verbose(env, "verifier internal error: more than one arg with ref_obj_id R%d %u %u\n",
> > @@ -6129,7 +6131,8 @@ static bool check_btf_id_ok(const struct bpf_func_proto *fn)
> >  	return true;
> >  }
> >
> > -static int check_func_proto(const struct bpf_func_proto *fn, int func_id)
> > +static int check_func_proto(const struct bpf_func_proto *fn, int func_id,
> > +			    struct bpf_call_arg_meta *meta)
> >  {
> >  	return check_raw_mode_ok(fn) &&
> >  	       check_arg_pair_ok(fn) &&
> > @@ -6813,7 +6816,7 @@ static int check_helper_call(struct bpf_verifier_env *env, struct bpf_insn *insn
> >  	memset(&meta, 0, sizeof(meta));
> >  	meta.pkt_access = fn->pkt_access;
> >
> > -	err = check_func_proto(fn, func_id);
> > +	err = check_func_proto(fn, func_id, &meta);
> >  	if (err) {
> >  		verbose(env, "kernel subsystem misconfigured func %s#%d\n",
> >  			func_id_name(func_id), func_id);
> > @@ -6846,8 +6849,17 @@ static int check_helper_call(struct bpf_verifier_env *env, struct bpf_insn *insn
> >  			return err;
> >  	}
> >
> > -	if (is_release_function(func_id)) {
> > -		err = release_reference(env, meta.ref_obj_id);
> > +	regs = cur_regs(env);
> > +
> > +	if (meta.release_regno) {
> > +		err = -EINVAL;
> > +		if (meta.ref_obj_id)
> > +			err = release_reference(env, meta.ref_obj_id);
> > +		/* meta.ref_obj_id can only be 0 if register that is meant to be
> > +		 * released is NULL, which must be > R0.
> > +		 */
> > +		else if (register_is_null(&regs[meta.release_regno]))
> > +			err = 0;
> >  		if (err) {
> >  			verbose(env, "func %s#%d reference has not been acquired before\n",
> >  				func_id_name(func_id), func_id);
> > @@ -6855,8 +6867,6 @@ static int check_helper_call(struct bpf_verifier_env *env, struct bpf_insn *insn
> >  		}
> >  	}
> >
> > -	regs = cur_regs(env);
> > -
> >  	switch (func_id) {
> >  	case BPF_FUNC_tail_call:
> >  		err = check_reference_leak(env);
> > diff --git a/net/core/filter.c b/net/core/filter.c
> > index 143f442a9505..8eb01a997476 100644
> > --- a/net/core/filter.c
> > +++ b/net/core/filter.c
> > @@ -6621,7 +6621,7 @@ static const struct bpf_func_proto bpf_sk_release_proto = {
> >  	.func		= bpf_sk_release,
> >  	.gpl_only	= false,
> >  	.ret_type	= RET_INTEGER,
> > -	.arg1_type	= ARG_PTR_TO_BTF_ID_SOCK_COMMON,
> > +	.arg1_type	= ARG_PTR_TO_BTF_ID_SOCK_COMMON | PTR_RELEASE,
> >  };
> >
> >  BPF_CALL_5(bpf_xdp_sk_lookup_udp, struct xdp_buff *, ctx,
> > diff --git a/tools/testing/selftests/bpf/verifier/ref_tracking.c b/tools/testing/selftests/bpf/verifier/ref_tracking.c
> > index fbd682520e47..57a83d763ec1 100644
> > --- a/tools/testing/selftests/bpf/verifier/ref_tracking.c
> > +++ b/tools/testing/selftests/bpf/verifier/ref_tracking.c
> > @@ -796,7 +796,7 @@
> >  	},
> >  	.prog_type = BPF_PROG_TYPE_SCHED_CLS,
> >  	.result = REJECT,
> > -	.errstr = "reference has not been acquired before",
> > +	.errstr = "R1 must be referenced when passed to release function",
> >  },
> >  {
> >  	/* !bpf_sk_fullsock(sk) is checked but !bpf_tcp_sock(sk) is not checked */
> > diff --git a/tools/testing/selftests/bpf/verifier/sock.c b/tools/testing/selftests/bpf/verifier/sock.c
> > index 86b24cad27a7..d11d0b28be41 100644
> > --- a/tools/testing/selftests/bpf/verifier/sock.c
> > +++ b/tools/testing/selftests/bpf/verifier/sock.c
> > @@ -417,7 +417,7 @@
> >  	},
> >  	.prog_type = BPF_PROG_TYPE_SCHED_CLS,
> >  	.result = REJECT,
> > -	.errstr = "reference has not been acquired before",
> > +	.errstr = "R1 must be referenced when passed to release function",
> >  },
> >  {
> >  	"bpf_sk_release(bpf_sk_fullsock(skb->sk))",
> > @@ -436,7 +436,7 @@
> >  	},
> >  	.prog_type = BPF_PROG_TYPE_SCHED_CLS,
> >  	.result = REJECT,
> > -	.errstr = "reference has not been acquired before",
> > +	.errstr = "R1 must be referenced when passed to release function",
> >  },
> >  {
> >  	"bpf_sk_release(bpf_tcp_sock(skb->sk))",
> > @@ -455,7 +455,7 @@
> >  	},
> >  	.prog_type = BPF_PROG_TYPE_SCHED_CLS,
> >  	.result = REJECT,
> > -	.errstr = "reference has not been acquired before",
> > +	.errstr = "R1 must be referenced when passed to release function",
> >  },
> >  {
> >  	"sk_storage_get(map, skb->sk, NULL, 0): value == NULL",
> > --
> > 2.35.1
> >

--
Kartikeya


* Re: [PATCH bpf-next v5 05/13] bpf: Allow storing referenced kptr in map
  2022-04-21  4:21   ` Alexei Starovoitov
@ 2022-04-21 19:38     ` Kumar Kartikeya Dwivedi
  0 siblings, 0 replies; 30+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2022-04-21 19:38 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Joanne Koong, Toke Høiland-Jørgensen,
	Jesper Dangaard Brouer

On Thu, Apr 21, 2022 at 09:51:47AM IST, Alexei Starovoitov wrote:
> On Fri, Apr 15, 2022 at 09:33:46PM +0530, Kumar Kartikeya Dwivedi wrote:
> > Extending the code in previous commits, introduce referenced kptr
> > support, which needs to be tagged using 'kptr_ref' tag instead. Unlike
> > unreferenced kptr, referenced kptr have a lot more restrictions. In
> > addition to the type matching, only a newly introduced bpf_kptr_xchg
> > helper is allowed to modify the map value at that offset. This transfers
> > the referenced pointer being stored into the map, releasing the
> > reference state for the program, and returning the old value and
> > creating new reference state for the returned pointer.
> >
> > Similar to unreferenced pointer case, return value for this case will
> > also be PTR_TO_BTF_ID_OR_NULL. The reference for the returned pointer
> > must either be eventually released by calling the corresponding release
> > function, otherwise it must be transferred into another map.
> >
> > It is also allowed to call bpf_kptr_xchg with a NULL pointer, to clear
> > the value, and obtain the old value if any.
> >
> > BPF_LDX, BPF_STX, and BPF_ST cannot access referenced kptr. A future
> > commit will permit using BPF_LDX for such pointers, but attempt at
> > making it safe, since the lifetime of object won't be guaranteed.
> >
> > There are valid reasons to enforce the restriction of permitting only
> > bpf_kptr_xchg to operate on referenced kptr. The pointer value must be
> > consistent in face of concurrent modification, and any prior values
> > contained in the map must also be released before a new one is moved
> > into the map. To ensure proper transfer of this ownership, bpf_kptr_xchg
> > returns the old value, which the verifier would require the user to
> > either free or move into another map, and releases the reference held
> > for the pointer being moved in.
> >
> > In the future, direct BPF_XCHG instruction may also be permitted to work
> > like bpf_kptr_xchg helper.
> >
> > Note that process_kptr_func doesn't have to call
> > check_helper_mem_access, since we already disallow rdonly/wronly flags
> > for map, which is what check_map_access_type checks, and we already
> > ensure the PTR_TO_MAP_VALUE refers to kptr by obtaining its off_desc,
> > so check_map_access is also not required.
> >
> > Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
> > ---
> >  include/linux/bpf.h            |  8 +++
> >  include/uapi/linux/bpf.h       | 12 +++++
> >  kernel/bpf/btf.c               | 10 +++-
> >  kernel/bpf/helpers.c           | 21 ++++++++
> >  kernel/bpf/verifier.c          | 98 +++++++++++++++++++++++++++++-----
> >  tools/include/uapi/linux/bpf.h | 12 +++++
> >  6 files changed, 148 insertions(+), 13 deletions(-)
> >
> > diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> > index f73a3f10e654..61f83a23980f 100644
> > --- a/include/linux/bpf.h
> > +++ b/include/linux/bpf.h
> > @@ -160,8 +160,14 @@ enum {
> >  	BPF_MAP_VALUE_OFF_MAX = 8,
> >  };
> >
> > +enum bpf_map_off_desc_type {
> > +	BPF_MAP_OFF_DESC_TYPE_UNREF_KPTR,
> > +	BPF_MAP_OFF_DESC_TYPE_REF_KPTR,
>
> Those are verbose names and MAP_OFF_DESC part doesn't add value.
> Maybe:
> enum bpf_kptr_type {
>  BPF_KPTR_UNREF,
>  BPF_KPTR_REF
> };

Ok, will rename.
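
For completeness, the flow described in the commit message above would look
roughly like this on the BPF side. This is only a sketch: the __kptr_ref macro
mirrors what the bpf_helpers.h patch later in this series adds, and the
acquire/release kfunc names and signatures are assumed from the selftests.

#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

#ifndef __kptr_ref
#define __kptr_ref __attribute__((btf_type_tag("kptr_ref")))
#endif

struct prog_test_ref_kfunc; /* kernel type, normally from vmlinux.h */

struct map_value {
	struct prog_test_ref_kfunc __kptr_ref *ref;
};

struct {
	__uint(type, BPF_MAP_TYPE_ARRAY);
	__uint(max_entries, 1);
	__type(key, int);
	__type(value, struct map_value);
} array_map SEC(".maps");

extern struct prog_test_ref_kfunc *bpf_kfunc_call_test_acquire(unsigned long *sp) __ksym;
extern void bpf_kfunc_call_test_release(struct prog_test_ref_kfunc *p) __ksym;

SEC("tc")
int kptr_xchg_sketch(struct __sk_buff *ctx)
{
	struct prog_test_ref_kfunc *p, *old;
	unsigned long sp = 0;
	struct map_value *v;
	int key = 0;

	v = bpf_map_lookup_elem(&array_map, &key);
	if (!v)
		return 0;
	p = bpf_kfunc_call_test_acquire(&sp);
	if (!p)
		return 0;
	/* Move our reference into the map; the old value comes back as an
	 * acquired reference and must be released or moved into a map.
	 */
	old = bpf_kptr_xchg(&v->ref, p);
	if (old)
		bpf_kfunc_call_test_release(old);
	return 0;
}

char _license[] SEC("license") = "GPL";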

--
Kartikeya


* Re: [PATCH bpf-next v5 09/13] bpf: Wire up freeing of referenced kptr
  2022-04-21  4:26   ` Alexei Starovoitov
@ 2022-04-21 19:39     ` Kumar Kartikeya Dwivedi
  0 siblings, 0 replies; 30+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2022-04-21 19:39 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Joanne Koong, Toke Høiland-Jørgensen,
	Jesper Dangaard Brouer

On Thu, Apr 21, 2022 at 09:56:51AM IST, Alexei Starovoitov wrote:
> On Fri, Apr 15, 2022 at 09:33:50PM +0530, Kumar Kartikeya Dwivedi wrote:
> >  	return 0;
> >  }
> > @@ -386,6 +388,7 @@ static void array_map_free_timers(struct bpf_map *map)
> >  	struct bpf_array *array = container_of(map, struct bpf_array, map);
> >  	int i;
> >
> > +	/* We don't reset or free kptr on uref dropping to zero. */
> >  	if (likely(!map_value_has_timer(map)))
>
> It was a copy paste mistake of mine to use likely() here in a cold
> function. Let's not repeat it.
>

Ok, will remove this and all the following ones that you pointed out.

> > [...]

--
Kartikeya


* Re: [PATCH bpf-next v5 03/13] bpf: Allow storing unreferenced kptr in map
  2022-04-21 19:36     ` Kumar Kartikeya Dwivedi
@ 2022-04-21 22:26       ` Alexei Starovoitov
  2022-04-24 21:50         ` Kumar Kartikeya Dwivedi
  0 siblings, 1 reply; 30+ messages in thread
From: Alexei Starovoitov @ 2022-04-21 22:26 UTC (permalink / raw)
  To: Kumar Kartikeya Dwivedi
  Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Joanne Koong, Toke Høiland-Jørgensen,
	Jesper Dangaard Brouer

On Thu, Apr 21, 2022 at 12:36 PM Kumar Kartikeya Dwivedi
<memxor@gmail.com> wrote:
> > > +
> > > +   if (!btf_struct_ids_match(&env->log, reg->btf, reg->btf_id, reg->off,
> > > +                             off_desc->kptr.btf, off_desc->kptr.btf_id))
> > > +           goto bad_type;
> >
> > Is full type comparison really needed?
>
> Yes.
>
> > reg->btf should be the same pointer as off_desc->kptr.btf
> > and btf_id should match exactly.
>
> This is not true, it can be vmlinux or module BTF. But if you mean just
> comparing the pointer and btf_id, we still need to handle reg->off.
>
> We want to support cases like:
>
> struct foo {
>         struct bar br;
>         struct baz bz;
> };
>
> struct foo *v = func(); // PTR_TO_BTF_ID
> map->foo = v;      // reg->off is zero, btf and btf_id matches type.
> map->bar = &v->br; // reg->off is still zero, but we need to walk and retry with
>                    // first member type of struct after comparison fails.
> map->baz = &v->bz; // reg->off is non-zero, so struct needs to be walked to
>                    // match type.
>
> In the ref case, the argument's offset will always be 0, so third case is not
> going to work, but in the unref case, we want to allow storing pointers to
> structs embedded inside parent struct.
>
> Please let me know if I misunderstood what you meant.

Makes sense.
Please add this comment to the code.

> > Is this a feature proofing for some day when registers with PTR_TO_BTF_ID type
> > will start pointing to prog's btf?
> >
> > > +   return 0;
> > > +bad_type:
> > > +   verbose(env, "invalid kptr access, R%d type=%s%s ", regno,
> > > +           reg_type_str(env, reg->type), reg_name);
> > > +   verbose(env, "expected=%s%s\n", reg_type_str(env, PTR_TO_BTF_ID), targ_name);
> > > +   return -EINVAL;
> > > +}
> > > +
> > > +static int check_map_kptr_access(struct bpf_verifier_env *env, u32 regno,
> > > +                            int value_regno, int insn_idx,
> > > +                            struct bpf_map_value_off_desc *off_desc)
> > > +{
> > > +   struct bpf_insn *insn = &env->prog->insnsi[insn_idx];
> > > +   int class = BPF_CLASS(insn->code);
> > > +   struct bpf_reg_state *val_reg;
> > > +
> > > +   /* Things we already checked for in check_map_access and caller:
> > > +    *  - Reject cases where variable offset may touch kptr
> > > +    *  - size of access (must be BPF_DW)
> > > +    *  - tnum_is_const(reg->var_off)
> > > +    *  - off_desc->offset == off + reg->var_off.value
> > > +    */
> > > +   /* Only BPF_[LDX,STX,ST] | BPF_MEM | BPF_DW is supported */
> > > +   if (BPF_MODE(insn->code) != BPF_MEM) {
> > > +           verbose(env, "kptr in map can only be accessed using BPF_MEM instruction mode\n");
> > > +           return -EACCES;
> > > +   }
> > > +
> > > +   if (class == BPF_LDX) {
> > > +           val_reg = reg_state(env, value_regno);
> > > +           /* We can simply mark the value_regno receiving the pointer
> > > +            * value from map as PTR_TO_BTF_ID, with the correct type.
> > > +            */
> > > +           mark_btf_ld_reg(env, cur_regs(env), value_regno, PTR_TO_BTF_ID, off_desc->kptr.btf,
> > > +                           off_desc->kptr.btf_id, PTR_MAYBE_NULL);
> > > +           val_reg->id = ++env->id_gen;
> >
> > why is a non-zero id needed?
>
> For mark_ptr_or_null_reg. I'll add a comment.

Ahh. It's because it's not a plain PTR_TO_BTF_ID,
but the one with PTR_MAYBE_NULL.
Makes sense.


* Re: [PATCH bpf-next v5 03/13] bpf: Allow storing unreferenced kptr in map
  2022-04-21 22:26       ` Alexei Starovoitov
@ 2022-04-24 21:50         ` Kumar Kartikeya Dwivedi
  0 siblings, 0 replies; 30+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2022-04-24 21:50 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Joanne Koong, Toke Høiland-Jørgensen,
	Jesper Dangaard Brouer

On Fri, Apr 22, 2022 at 03:56:44AM IST, Alexei Starovoitov wrote:
> On Thu, Apr 21, 2022 at 12:36 PM Kumar Kartikeya Dwivedi
> <memxor@gmail.com> wrote:
> > > > +
> > > > +   if (!btf_struct_ids_match(&env->log, reg->btf, reg->btf_id, reg->off,
> > > > +                             off_desc->kptr.btf, off_desc->kptr.btf_id))
> > > > +           goto bad_type;
> > >
> > > Is full type comparison really needed?
> >
> > Yes.
> >
> > > reg->btf should be the same pointer as off_desc->kptr.btf
> > > and btf_id should match exactly.
> >
> > This is not true, it can be vmlinux or module BTF. But if you mean just
> > comparing the pointer and btf_id, we still need to handle reg->off.
> >
> > We want to support cases like:
> >
> > struct foo {
> >         struct bar br;
> >         struct baz bz;
> > };
> >
> > struct foo *v = func(); // PTR_TO_BTF_ID
> > map->foo = v;      // reg->off is zero, btf and btf_id matches type.
> > map->bar = &v->br; // reg->off is still zero, but we need to walk and retry with
> >                    // first member type of struct after comparison fails.
> > map->baz = &v->bz; // reg->off is non-zero, so struct needs to be walked to
> >                    // match type.
> >
> > In the ref case, the argument's offset will always be 0, so third case is not
> > going to work, but in the unref case, we want to allow storing pointers to
> > structs embedded inside parent struct.
> >
> > Please let me know if I misunderstood what you meant.
>
> Makes sense.
> Please add this comment to the code.
>

I took a closer look at this, and I think we're missing one extra corner case
from the ones covered in 24d5bb806c7e, i.e. when reg->off is zero and the struct
is walked to match the type. This would be incorrect for the release/kptr_ref
case; even though it is unlikely to occur in practice, it should be rejected by
default. I included a patch + selftest for this in v6, ptal.

--
Kartikeya


* Re: [PATCH bpf-next v5 04/13] bpf: Tag argument to be released in bpf_func_proto
  2022-04-21 19:38     ` Kumar Kartikeya Dwivedi
@ 2022-04-24 21:57       ` Kumar Kartikeya Dwivedi
  0 siblings, 0 replies; 30+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2022-04-24 21:57 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Joanne Koong, Toke Høiland-Jørgensen,
	Jesper Dangaard Brouer

On Fri, Apr 22, 2022 at 01:08:08AM IST, Kumar Kartikeya Dwivedi wrote:
> On Thu, Apr 21, 2022 at 09:49:54AM IST, Alexei Starovoitov wrote:
> > > diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> > > index c802e51c4e18..97f88d06f848 100644
> > > --- a/kernel/bpf/verifier.c
> > > +++ b/kernel/bpf/verifier.c
> > > @@ -245,6 +245,7 @@ struct bpf_call_arg_meta {
> > >  	struct bpf_map *map_ptr;
> > >  	bool raw_mode;
> > >  	bool pkt_access;
> > > +	u8 release_regno;
> > >  	int regno;
> >
> > release_regno and regno are always equal.
> > Why go with u8 instead of bool flag?
> >
>
> Didn't realise that. I will change it.
>

Actually, I think regno may not equal release_regno. It is set by
check_stack_range_initialized only when meta->raw_mode is true, along with
meta.access_size. So I skipped this change in v6.

--
Kartikeya

