* [PATCH bpf-next v1 00/15] Introduce typed pointer support in BPF maps
@ 2022-02-20 13:47 Kumar Kartikeya Dwivedi
  2022-02-20 13:47 ` [PATCH bpf-next v1 01/15] bpf: Factor out fd returning from bpf_btf_find_by_name_kind Kumar Kartikeya Dwivedi
                   ` (15 more replies)
  0 siblings, 16 replies; 38+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2022-02-20 13:47 UTC (permalink / raw)
  To: bpf
  Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Toke Høiland-Jørgensen, Jesper Dangaard Brouer,
	netfilter-devel, netdev

Introduction
------------

This set enables storing pointers of a certain type in a BPF map, and extends
the verifier to enforce type safety and lifetime correctness properties.

The infrastructure being added is generic enough to allow storing any kind of
pointer whose type is available via BTF (user or kernel) in the future
(e.g. strongly typed memory allocation in a BPF program). Such pointers are
internally tracked in the verifier as PTR_TO_BTF_ID, but for now the series
limits them to four kinds of pointers obtained from the kernel.

Obviously, use of this feature depends on the map having BTF information.

1. Unreferenced kernel pointer

In this case, there are very few restrictions. The pointer type being stored
must match the type declared in the map value. However, such a pointer when
loaded from the map can only be dereferenced, but not passed to any in-kernel
helpers or kernel functions available to the program. This is because while the
verifier's exception handling mechanism converts BPF_LDX to PROBE_MEM loads,
which are then handled specially by the JIT implementation, the same liberty is
not available to accesses inside the kernel. By the time such a pointer is
passed into a helper, there are no lifetime guarantees about the object it
points to, and it may well be referencing invalid memory.
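
A minimal sketch of declaring and using such an unreferenced pointer is shown
below. The "kernel.bpf.btf_id" tag name is taken from the patches in this
series, but the __kptr macro spelling, map layout and program body are only
illustrative (the real convenience macros are added to bpf_helpers.h later in
the series):

	/* Sketch only: tag name from this series, macro name made up */
	#include <vmlinux.h>
	#include <bpf/bpf_helpers.h>

	#define __kptr __attribute__((btf_type_tag("kernel.bpf.btf_id")))

	struct map_value {
		struct task_struct __kptr *task; /* unreferenced kernel pointer */
	};

	struct {
		__uint(type, BPF_MAP_TYPE_ARRAY);
		__uint(max_entries, 1);
		__type(key, int);
		__type(value, struct map_value);
	} array_map SEC(".maps");

	SEC("tc")
	int kptr_demo(struct __sk_buff *ctx)
	{
		struct map_value *v;
		int key = 0;

		v = bpf_map_lookup_elem(&array_map, &key);
		if (!v)
			return 0;
		/* The load marks the register as PTR_TO_BTF_ID (maybe NULL);
		 * it may only be dereferenced, not passed to helpers/kfuncs.
		 */
		if (v->task)
			bpf_printk("stored task pid=%d", v->task->pid);
		return 0;
	}

	char _license[] SEC("license") = "GPL";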

2. Referenced kernel pointer

This case imposes a lot of restrictions on the programmer, to ensure safety. To
transfer ownership of a reference held by the BPF program to the map, the user
must use the BPF_XCHG instruction, which returns the old pointer contained in
the map as an acquired reference, and releases the verifier state for the
referenced pointer being exchanged, as it moves into the map.
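
As a rough illustration (not taken verbatim from the selftests in this
series), exchanging a referenced pointer from C could look like the following;
this assumes clang emits a BPF_XCHG instruction for __sync_lock_test_and_set()
when atomics are enabled (-mcpu=v3), and bpf_ct_release() stands in for
whichever release kfunc matches the stored type:

	struct nf_conn *old;

	/* ownership of 'ct' (an acquired reference) moves into the map */
	old = __sync_lock_test_and_set(&v->ct, ct);
	if (old)
		/* the program now owns the old pointer; it must be released
		 * (or moved into a map) before BPF_EXIT
		 */
		bpf_ct_release(old);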

This is a normal PTR_TO_BTF_ID that can be used with in-kernel helpers and
kernel functions callable by the program.

However, if BPF_LDX is used to load a referenced pointer from the map, it is
still not permitted to pass it to in-kernel helpers or kernel functions. To
obtain a reference usable with helpers, the user must invoke a kfunc helper
which returns a usable reference (which also must be eventually released before
BPF_EXIT, or moved into a map).

Since the load of the pointer (preserving data dependency ordering) must happen
inside the RCU read section, the kfunc helper will take a pointer to the map
value, which must point to the actual pointer of the object whose reference is
to be raised. The type will be verified from the BTF information of the kfunc,
as the prototype must be:

	T *func(T **, ... /* other arguments */);

The verifier then checks whether the pointer at the given offset of the map
value points to type T, and permits the call.

This convention is followed so that such helpers may also be called from
sleepable BPF programs, where the RCU read lock is not necessarily held in the
BPF program context, hence the need to pass in a pointer to the actual pointer
so the load can be performed inside the RCU read section.
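
A sketch of how a program might use such a kfunc on a map value field;
bpf_ct_kptr_get() is added by a later patch in this series, but its exact
argument list is not shown here, so the extra arguments are elided:

	struct nf_conn *ct;

	/* The kfunc takes a pointer to the pointer stored in the map value,
	 * performs the load inside the RCU read section, and returns an
	 * acquired reference or NULL.
	 */
	ct = bpf_ct_kptr_get(&v->ct /*, other lookup arguments */);
	if (ct) {
		/* ct can now be passed to helpers/kfuncs, and must be
		 * released (or moved into a map) before BPF_EXIT
		 */
		bpf_ct_release(ct);
	}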

3. per-CPU kernel pointer

These have very few restrictions. The user can store a PTR_TO_PERCPU_BTF_ID
into the map, and when loading from the map, they must NULL check it before use,
because while a non-zero value stored into the map should always be valid, it can
still be reset to zero on updates. After checking it to be non-NULL, it can be
passed to the bpf_per_cpu_ptr and bpf_this_cpu_ptr helpers to obtain a
PTR_TO_BTF_ID to the underlying per-CPU object.

It is also permitted to write 0 and reset the value.
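
A short sketch of these rules (the field name and per-CPU object type below
are illustrative; bpf_this_cpu_ptr/bpf_per_cpu_ptr are existing helpers):

	struct rq *pcpu = v->pcpu;	/* load per-CPU pointer from map value */

	if (pcpu) {			/* must be NULL checked before use */
		/* yields a PTR_TO_BTF_ID to this CPU's copy of the object */
		struct rq *r = bpf_this_cpu_ptr(pcpu);

		/* ... read fields of *r ... */
	}
	v->pcpu = NULL;			/* writing 0 to reset is also allowed */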

4. Userspace pointer

The verifier recently gained support for annotating BTF with the __user type
tag. This indicates pointers to memory which must be read using the
bpf_probe_read_user helper to obtain correct results. This set also permits
storing such pointers in a BPF map, and ensures a user pointer cannot be
stored into the other kinds of pointers mentioned above.

When loaded from the map, the only thing that can be done is to pass this
pointer to bpf_probe_read_user. No dereference is allowed.
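
For illustration, using clang's btf_type_tag spelling of the __user tag (the
struct and field names below are made up):

	struct user_data {
		int len;
	};

	struct map_value {
		struct user_data __attribute__((btf_type_tag("user"))) *uptr;
	};

	/* ... inside the program, with v pointing to the map value ... */
	struct user_data tmp;

	if (v->uptr)
		/* the only permitted use: copy the data in from userspace */
		bpf_probe_read_user(&tmp, sizeof(tmp), v->uptr);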

Notes
-----

This set requires the following LLVM fix to pass the BPF CI:

  https://reviews.llvm.org/D119799

Also, I applied Alexei's suggestion of removing the callback from
btf_find_field, but that 'ugly' cleanup is still required, since bad offset
alignment etc. can return an error, and we don't want to leave a partial
ptr_off_tab around in that case. The other option is freeing inside
btf_find_field, but that would be more code conditional on BTF_FIELD_KPTR,
when the caller can do it based on ret < 0.

TODO
----

Needs a lot more testing, especially for stuff apart from verifier correctness.
Will work on that in parallel during v1 review. The idea was to get a little
more feedback (esp. for kptr_get stuff) before moving forward with adding more
tests. Posting it now to just get discussion started. The verifier tests fairly
comprehensively test many edge cases I could think of.

Kumar Kartikeya Dwivedi (15):
  bpf: Factor out fd returning from bpf_btf_find_by_name_kind
  bpf: Make btf_find_field more generic
  bpf: Allow storing PTR_TO_BTF_ID in map
  bpf: Allow storing referenced PTR_TO_BTF_ID in map
  bpf: Allow storing PTR_TO_PERCPU_BTF_ID in map
  bpf: Allow storing __user PTR_TO_BTF_ID in map
  bpf: Prevent escaping of pointers loaded from maps
  bpf: Adapt copy_map_value for multiple offset case
  bpf: Populate pairs of btf_id and destructor kfunc in btf
  bpf: Wire up freeing of referenced PTR_TO_BTF_ID in map
  bpf: Teach verifier about kptr_get style kfunc helpers
  net/netfilter: Add bpf_ct_kptr_get helper
  libbpf: Add __kptr* macros to bpf_helpers.h
  selftests/bpf: Add C tests for PTR_TO_BTF_ID in map
  selftests/bpf: Add verifier tests for PTR_TO_BTF_ID in map

 include/linux/bpf.h                           |  90 ++-
 include/linux/btf.h                           |  24 +
 include/net/netfilter/nf_conntrack_core.h     |  17 +
 kernel/bpf/arraymap.c                         |  13 +-
 kernel/bpf/btf.c                              | 565 ++++++++++++++--
 kernel/bpf/hashtab.c                          |  27 +-
 kernel/bpf/map_in_map.c                       |   5 +-
 kernel/bpf/syscall.c                          | 227 ++++++-
 kernel/bpf/verifier.c                         | 311 ++++++++-
 net/bpf/test_run.c                            |  17 +-
 net/netfilter/nf_conntrack_bpf.c              | 132 +++-
 net/netfilter/nf_conntrack_core.c             |  17 -
 tools/lib/bpf/bpf_helpers.h                   |   4 +
 .../selftests/bpf/prog_tests/map_btf_ptr.c    |  13 +
 .../testing/selftests/bpf/progs/map_btf_ptr.c | 105 +++
 .../testing/selftests/bpf/progs/test_bpf_nf.c |  31 +
 tools/testing/selftests/bpf/test_verifier.c   |  57 +-
 .../selftests/bpf/verifier/map_btf_ptr.c      | 624 ++++++++++++++++++
 18 files changed, 2144 insertions(+), 135 deletions(-)
 create mode 100644 tools/testing/selftests/bpf/prog_tests/map_btf_ptr.c
 create mode 100644 tools/testing/selftests/bpf/progs/map_btf_ptr.c
 create mode 100644 tools/testing/selftests/bpf/verifier/map_btf_ptr.c

-- 
2.35.1



* [PATCH bpf-next v1 01/15] bpf: Factor out fd returning from bpf_btf_find_by_name_kind
  2022-02-20 13:47 [PATCH bpf-next v1 00/15] Introduce typed pointer support in BPF maps Kumar Kartikeya Dwivedi
@ 2022-02-20 13:47 ` Kumar Kartikeya Dwivedi
  2022-02-22  5:28   ` Alexei Starovoitov
  2022-02-20 13:48 ` [PATCH bpf-next v1 02/15] bpf: Make btf_find_field more generic Kumar Kartikeya Dwivedi
                   ` (14 subsequent siblings)
  15 siblings, 1 reply; 38+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2022-02-20 13:47 UTC (permalink / raw)
  To: bpf
  Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Toke Høiland-Jørgensen, Jesper Dangaard Brouer,
	netfilter-devel, netdev

In the next few patches, we need a helper that searches all kernel BTFs
(vmlinux and module BTFs) and finds the type denoted by 'name' and
'kind'. It turns out bpf_btf_find_by_name_kind already does the same
thing, but it instead returns a BTF ID and optionally an fd (for module
BTF). This is used for relocating ksyms in BPF loader code (bpftool gen
skel -L).

We extract the core code out into a new helper
btf_find_by_name_kind_all, which returns the BTF ID, and the BTF pointer
in an out parameter. The reference for the returned BTF pointer is only
bumped if it is a module BTF; this needs to be kept in mind when using
this helper.

Hence, the user must release the BTF reference iff btf_is_module is
true, otherwise transfer the ownership to e.g. an fd.

In the case of the helper, the fd is only allocated for module BTFs, so
no extra handling for the btf_vmlinux case is required.
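
To illustrate the contract (a sketch, not code from this patch; "nf_conn" is
just an example type name):

	struct btf *btf;
	s32 id;

	id = btf_find_by_name_kind_all("nf_conn", BTF_KIND_STRUCT, &btf);
	if (id > 0) {
		/* ... use btf and id ... */
		if (btf_is_module(btf))
			btf_put(btf);	/* reference taken only for module BTF */
	}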

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
---
 kernel/bpf/btf.c | 47 +++++++++++++++++++++++++++++++----------------
 1 file changed, 31 insertions(+), 16 deletions(-)

diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c
index 2c4c5dbe2abe..3645d8c14a18 100644
--- a/kernel/bpf/btf.c
+++ b/kernel/bpf/btf.c
@@ -6545,16 +6545,10 @@ static struct btf *btf_get_module_btf(const struct module *module)
 	return btf;
 }
 
-BPF_CALL_4(bpf_btf_find_by_name_kind, char *, name, int, name_sz, u32, kind, int, flags)
+static s32 btf_find_by_name_kind_all(const char *name, u32 kind, struct btf **btfp)
 {
 	struct btf *btf;
-	long ret;
-
-	if (flags)
-		return -EINVAL;
-
-	if (name_sz <= 1 || name[name_sz - 1])
-		return -EINVAL;
+	s32 ret;
 
 	btf = bpf_get_btf_vmlinux();
 	if (IS_ERR(btf))
@@ -6580,19 +6574,40 @@ BPF_CALL_4(bpf_btf_find_by_name_kind, char *, name, int, name_sz, u32, kind, int
 			spin_unlock_bh(&btf_idr_lock);
 			ret = btf_find_by_name_kind(mod_btf, name, kind);
 			if (ret > 0) {
-				int btf_obj_fd;
-
-				btf_obj_fd = __btf_new_fd(mod_btf);
-				if (btf_obj_fd < 0) {
-					btf_put(mod_btf);
-					return btf_obj_fd;
-				}
-				return ret | (((u64)btf_obj_fd) << 32);
+				*btfp = mod_btf;
+				return ret;
 			}
 			spin_lock_bh(&btf_idr_lock);
 			btf_put(mod_btf);
 		}
 		spin_unlock_bh(&btf_idr_lock);
+	} else {
+		*btfp = btf;
+	}
+	return ret;
+}
+
+BPF_CALL_4(bpf_btf_find_by_name_kind, char *, name, int, name_sz, u32, kind, int, flags)
+{
+	struct btf *btf = NULL;
+	int btf_obj_fd = 0;
+	long ret;
+
+	if (flags)
+		return -EINVAL;
+
+	if (name_sz <= 1 || name[name_sz - 1])
+		return -EINVAL;
+
+	ret = btf_find_by_name_kind_all(name, kind, &btf);
+	if (ret > 0 && btf_is_module(btf)) {
+		/* reference for btf is only raised if module BTF */
+		btf_obj_fd = __btf_new_fd(btf);
+		if (btf_obj_fd < 0) {
+			btf_put(btf);
+			return btf_obj_fd;
+		}
+		return ret | (((u64)btf_obj_fd) << 32);
 	}
 	return ret;
 }
-- 
2.35.1



* [PATCH bpf-next v1 02/15] bpf: Make btf_find_field more generic
  2022-02-20 13:47 [PATCH bpf-next v1 00/15] Introduce typed pointer support in BPF maps Kumar Kartikeya Dwivedi
  2022-02-20 13:47 ` [PATCH bpf-next v1 01/15] bpf: Factor out fd returning from bpf_btf_find_by_name_kind Kumar Kartikeya Dwivedi
@ 2022-02-20 13:48 ` Kumar Kartikeya Dwivedi
  2022-02-20 13:48 ` [PATCH bpf-next v1 03/15] bpf: Allow storing PTR_TO_BTF_ID in map Kumar Kartikeya Dwivedi
                   ` (13 subsequent siblings)
  15 siblings, 0 replies; 38+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2022-02-20 13:48 UTC (permalink / raw)
  To: bpf
  Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Toke Høiland-Jørgensen, Jesper Dangaard Brouer,
	netfilter-devel, netdev

The next commit's field type will not be a struct, but a pointer, and it
will not be limited to one offset, but multiple ones. Make the existing
btf_find_struct_field and btf_find_datasec_var functions amenable to use
for finding BTF ID pointers in the map value, by moving the spin_lock and
timer specific checks into their own function.

The alignment and name are checked before the function is called, so it
is the last point where we can skip the field or return an error before
the next loop iteration happens. This is important, because we'll
potentially be reallocating memory inside this function in the next
commit, so being able to do that when everything else is in order is
going to be more convenient.

The name parameter is now optional, and only checked if it is not NULL.

The size must be checked inside the function, because in the case of PTR
the member type will instead point to the underlying BTF ID (or
modifiers), so doing the check outside the function would be wrong; the
base type has to be obtained by skipping the modifiers first.

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
---
 kernel/bpf/btf.c | 119 +++++++++++++++++++++++++++++++++--------------
 1 file changed, 85 insertions(+), 34 deletions(-)

diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c
index 3645d8c14a18..55f6ccac3388 100644
--- a/kernel/bpf/btf.c
+++ b/kernel/bpf/btf.c
@@ -3119,71 +3119,108 @@ static void btf_struct_log(struct btf_verifier_env *env,
 	btf_verifier_log(env, "size=%u vlen=%u", t->size, btf_type_vlen(t));
 }
 
+enum {
+	BTF_FIELD_SPIN_LOCK,
+	BTF_FIELD_TIMER,
+};
+
+static int btf_find_field_struct(const struct btf *btf, const struct btf_type *t,
+				 u32 off, int sz, void *data)
+{
+	u32 *offp = data;
+
+	if (!__btf_type_is_struct(t))
+		return 0;
+	if (t->size != sz)
+		return 0;
+	if (*offp != -ENOENT)
+		/* only one such field is allowed */
+		return -E2BIG;
+	*offp = off;
+	return 0;
+}
+
 static int btf_find_struct_field(const struct btf *btf, const struct btf_type *t,
-				 const char *name, int sz, int align)
+				 const char *name, int sz, int align, int field_type,
+				 void *data)
 {
 	const struct btf_member *member;
-	u32 i, off = -ENOENT;
+	u32 i, off;
+	int ret;
 
 	for_each_member(i, t, member) {
 		const struct btf_type *member_type = btf_type_by_id(btf,
 								    member->type);
-		if (!__btf_type_is_struct(member_type))
-			continue;
-		if (member_type->size != sz)
-			continue;
-		if (strcmp(__btf_name_by_offset(btf, member_type->name_off), name))
-			continue;
-		if (off != -ENOENT)
-			/* only one such field is allowed */
-			return -E2BIG;
+
 		off = __btf_member_bit_offset(t, member);
+
+		if (name && strcmp(__btf_name_by_offset(btf, member_type->name_off), name))
+			continue;
 		if (off % 8)
 			/* valid C code cannot generate such BTF */
 			return -EINVAL;
 		off /= 8;
 		if (off % align)
 			return -EINVAL;
+
+		switch (field_type) {
+		case BTF_FIELD_SPIN_LOCK:
+		case BTF_FIELD_TIMER:
+			ret = btf_find_field_struct(btf, member_type, off, sz, data);
+			if (ret < 0)
+				return ret;
+			break;
+		default:
+			pr_err("verifier bug: unknown field type requested\n");
+			return -EFAULT;
+		}
 	}
-	return off;
+	return 0;
 }
 
 static int btf_find_datasec_var(const struct btf *btf, const struct btf_type *t,
-				const char *name, int sz, int align)
+				const char *name, int sz, int align, int field_type,
+				void *data)
 {
 	const struct btf_var_secinfo *vsi;
-	u32 i, off = -ENOENT;
+	u32 i, off;
+	int ret;
 
 	for_each_vsi(i, t, vsi) {
 		const struct btf_type *var = btf_type_by_id(btf, vsi->type);
 		const struct btf_type *var_type = btf_type_by_id(btf, var->type);
 
-		if (!__btf_type_is_struct(var_type))
-			continue;
-		if (var_type->size != sz)
+		off = vsi->offset;
+
+		if (name && strcmp(__btf_name_by_offset(btf, var_type->name_off), name))
 			continue;
 		if (vsi->size != sz)
 			continue;
-		if (strcmp(__btf_name_by_offset(btf, var_type->name_off), name))
-			continue;
-		if (off != -ENOENT)
-			/* only one such field is allowed */
-			return -E2BIG;
-		off = vsi->offset;
 		if (off % align)
 			return -EINVAL;
+
+		switch (field_type) {
+		case BTF_FIELD_SPIN_LOCK:
+		case BTF_FIELD_TIMER:
+			ret = btf_find_field_struct(btf, var_type, off, sz, data);
+			if (ret < 0)
+				return ret;
+			break;
+		default:
+			return -EFAULT;
+		}
 	}
-	return off;
+	return 0;
 }
 
 static int btf_find_field(const struct btf *btf, const struct btf_type *t,
-			  const char *name, int sz, int align)
+			  const char *name, int sz, int align, int field_type,
+			  void *data)
 {
-
 	if (__btf_type_is_struct(t))
-		return btf_find_struct_field(btf, t, name, sz, align);
+		return btf_find_struct_field(btf, t, name, sz, align, field_type, data);
 	else if (btf_type_is_datasec(t))
-		return btf_find_datasec_var(btf, t, name, sz, align);
+		return btf_find_datasec_var(btf, t, name, sz, align, field_type, data);
 	return -EINVAL;
 }
 
@@ -3193,16 +3230,30 @@ static int btf_find_field(const struct btf *btf, const struct btf_type *t,
  */
 int btf_find_spin_lock(const struct btf *btf, const struct btf_type *t)
 {
-	return btf_find_field(btf, t, "bpf_spin_lock",
-			      sizeof(struct bpf_spin_lock),
-			      __alignof__(struct bpf_spin_lock));
+	u32 off = -ENOENT;
+	int ret;
+
+	ret = btf_find_field(btf, t, "bpf_spin_lock",
+			     sizeof(struct bpf_spin_lock),
+			     __alignof__(struct bpf_spin_lock),
+			     BTF_FIELD_SPIN_LOCK, &off);
+	if (ret < 0)
+		return ret;
+	return off;
 }
 
 int btf_find_timer(const struct btf *btf, const struct btf_type *t)
 {
-	return btf_find_field(btf, t, "bpf_timer",
-			      sizeof(struct bpf_timer),
-			      __alignof__(struct bpf_timer));
+	u32 off = -ENOENT;
+	int ret;
+
+	ret = btf_find_field(btf, t, "bpf_timer",
+			     sizeof(struct bpf_timer),
+			     __alignof__(struct bpf_timer),
+			     BTF_FIELD_TIMER, &off);
+	if (ret < 0)
+		return ret;
+	return off;
 }
 
 static void __btf_struct_show(const struct btf *btf, const struct btf_type *t,
-- 
2.35.1



* [PATCH bpf-next v1 03/15] bpf: Allow storing PTR_TO_BTF_ID in map
  2022-02-20 13:47 [PATCH bpf-next v1 00/15] Introduce typed pointer support in BPF maps Kumar Kartikeya Dwivedi
  2022-02-20 13:47 ` [PATCH bpf-next v1 01/15] bpf: Factor out fd returning from bpf_btf_find_by_name_kind Kumar Kartikeya Dwivedi
  2022-02-20 13:48 ` [PATCH bpf-next v1 02/15] bpf: Make btf_find_field more generic Kumar Kartikeya Dwivedi
@ 2022-02-20 13:48 ` Kumar Kartikeya Dwivedi
  2022-02-22  6:46   ` Alexei Starovoitov
  2022-02-20 13:48 ` [PATCH bpf-next v1 04/15] bpf: Allow storing referenced " Kumar Kartikeya Dwivedi
                   ` (12 subsequent siblings)
  15 siblings, 1 reply; 38+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2022-02-20 13:48 UTC (permalink / raw)
  To: bpf
  Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Toke Høiland-Jørgensen, Jesper Dangaard Brouer,
	netfilter-devel, netdev

This patch allows the user to embed a PTR_TO_BTF_ID in the map value,
such that loading it marks the destination register as having the
appropriate register type; such a pointer can be dereferenced like a
usual PTR_TO_BTF_ID and be passed to various BPF helpers.

This feature can be useful to store an object in a map for a long time,
and then inspect it later. Since PTR_TO_BTF_ID is safe against invalid
access, the verifier doesn't need to perform any complex lifetime checks.
It can be useful in cases where the user already knows the pointer will
remain valid, so any dereference at a later time (possibly in an entirely
different BPF program invocation) will yield correct results as far as
the data read from kernel memory is concerned.

Note that it is quite possible such a BTF ID pointer is invalid; in this
case the verifier's built-in exception handling mechanism, which converts
loads of PTR_TO_BTF_ID into PROBE_MEM loads, handles the invalid case.
The next patch, which adds referenced PTR_TO_BTF_ID, needs to take more
care in ensuring a correct value is stored in the BPF map.

The user indicates that a certain pointer must be treated as
PTR_TO_BTF_ID by using the BTF type tag 'btf_id' on the pointed-to type
of the pointer. This information is recorded in the object BTF, which is
passed into the kernel by way of the map's BTF information.

The kernel then records the type and offset of all such pointers, and
finds their corresponding built-in kernel type by name and BTF kind.

Later, during verification, this information is used to ensure that
access to such pointers is sized correctly, and done at a proper offset
into the map value. Only BPF_LDX, BPF_STX, and BPF_ST with imm 0 (to
denote NULL) are allowed to access such a pointer. On BPF_LDX, the
destination register is updated to be a PTR_TO_BTF_ID, and on BPF_STX,
it is checked whether the source register type is the same
PTR_TO_BTF_ID, and whether its BTF ID (reg->btf and reg->btf_id) matches
the type specified in the map value's definition.
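
In C terms, the allowed accesses roughly look like this (v points to the map
value and 'task' is a field whose pointed-to type carries the btf_id tag; the
names are illustrative):

	struct task_struct *t;

	t = v->task;	/* BPF_LDX: t is marked PTR_TO_BTF_ID (may be NULL) */
	v->task = t;	/* BPF_STX: source register type must match declared type */
	v->task = NULL;	/* BPF_ST with imm 0: clearing the pointer is allowed */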

Hence, the verifier allows flexible access to kernel data across program
invocations in a type safe manner, without compromising on the runtime
safety of the kernel.

Next patch will extend this support to referenced PTR_TO_BTF_ID.

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
---
 include/linux/bpf.h     |  30 +++++++-
 include/linux/btf.h     |   3 +
 kernel/bpf/btf.c        | 127 ++++++++++++++++++++++++++++++++++
 kernel/bpf/map_in_map.c |   5 +-
 kernel/bpf/syscall.c    | 137 ++++++++++++++++++++++++++++++++++++-
 kernel/bpf/verifier.c   | 148 ++++++++++++++++++++++++++++++++++++++++
 6 files changed, 446 insertions(+), 4 deletions(-)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index f19abc59b6cd..ce45ffb79f82 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -155,6 +155,23 @@ struct bpf_map_ops {
 	const struct bpf_iter_seq_info *iter_seq_info;
 };
 
+enum {
+	/* Support at most 8 pointers in a BPF map value */
+	BPF_MAP_VALUE_OFF_MAX = 8,
+};
+
+struct bpf_map_value_off_desc {
+	u32 offset;
+	u32 btf_id;
+	struct btf *btf;
+	struct module *module;
+};
+
+struct bpf_map_value_off {
+	u32 nr_off;
+	struct bpf_map_value_off_desc off[];
+};
+
 struct bpf_map {
 	/* The first two cachelines with read-mostly members of which some
 	 * are also accessed in fast-path (e.g. ops, max_entries).
@@ -171,6 +188,7 @@ struct bpf_map {
 	u64 map_extra; /* any per-map-type extra fields */
 	u32 map_flags;
 	int spin_lock_off; /* >=0 valid offset, <0 error */
+	struct bpf_map_value_off *ptr_off_tab;
 	int timer_off; /* >=0 valid offset, <0 error */
 	u32 id;
 	int numa_node;
@@ -184,7 +202,7 @@ struct bpf_map {
 	char name[BPF_OBJ_NAME_LEN];
 	bool bypass_spec_v1;
 	bool frozen; /* write-once; write-protected by freeze_mutex */
-	/* 14 bytes hole */
+	/* 6 bytes hole */
 
 	/* The 3rd and 4th cacheline with misc members to avoid false sharing
 	 * particularly with refcounting.
@@ -217,6 +235,11 @@ static inline bool map_value_has_timer(const struct bpf_map *map)
 	return map->timer_off >= 0;
 }
 
+static inline bool map_value_has_ptr_to_btf_id(const struct bpf_map *map)
+{
+	return !IS_ERR_OR_NULL(map->ptr_off_tab);
+}
+
 static inline void check_and_init_map_value(struct bpf_map *map, void *dst)
 {
 	if (unlikely(map_value_has_spin_lock(map)))
@@ -1490,6 +1513,11 @@ void bpf_prog_put(struct bpf_prog *prog);
 void bpf_prog_free_id(struct bpf_prog *prog, bool do_idr_lock);
 void bpf_map_free_id(struct bpf_map *map, bool do_idr_lock);
 
+struct bpf_map_value_off_desc *bpf_map_ptr_off_contains(struct bpf_map *map, u32 offset);
+void bpf_map_free_ptr_off_tab(struct bpf_map *map);
+struct bpf_map_value_off *bpf_map_copy_ptr_off_tab(const struct bpf_map *map);
+bool bpf_map_equal_ptr_off_tab(const struct bpf_map *map_a, const struct bpf_map *map_b);
+
 struct bpf_map *bpf_map_get(u32 ufd);
 struct bpf_map *bpf_map_get_with_uref(u32 ufd);
 struct bpf_map *__bpf_map_get(struct fd f);
diff --git a/include/linux/btf.h b/include/linux/btf.h
index 36bc09b8e890..6592183aeb23 100644
--- a/include/linux/btf.h
+++ b/include/linux/btf.h
@@ -26,6 +26,7 @@ struct btf_type;
 union bpf_attr;
 struct btf_show;
 struct btf_id_set;
+struct bpf_map;
 
 struct btf_kfunc_id_set {
 	struct module *owner;
@@ -123,6 +124,8 @@ bool btf_member_is_reg_int(const struct btf *btf, const struct btf_type *s,
 			   u32 expected_offset, u32 expected_size);
 int btf_find_spin_lock(const struct btf *btf, const struct btf_type *t);
 int btf_find_timer(const struct btf *btf, const struct btf_type *t);
+int btf_find_ptr_to_btf_id(const struct btf *btf, const struct btf_type *t,
+			   struct bpf_map *map);
 bool btf_type_is_void(const struct btf_type *t);
 s32 btf_find_by_name_kind(const struct btf *btf, const char *name, u8 kind);
 const struct btf_type *btf_type_skip_modifiers(const struct btf *btf,
diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c
index 55f6ccac3388..1edb5710e155 100644
--- a/kernel/bpf/btf.c
+++ b/kernel/bpf/btf.c
@@ -3122,6 +3122,7 @@ static void btf_struct_log(struct btf_verifier_env *env,
 enum {
 	BTF_FIELD_SPIN_LOCK,
 	BTF_FIELD_TIMER,
+	BTF_FIELD_KPTR,
 };
 
 static int btf_find_field_struct(const struct btf *btf, const struct btf_type *t,
@@ -3140,6 +3141,106 @@ static int btf_find_field_struct(const struct btf *btf, const struct btf_type *t
 	return 0;
 }
 
+static s32 btf_find_by_name_kind_all(const char *name, u32 kind, struct btf **btfp);
+
+static int btf_find_field_kptr(const struct btf *btf, const struct btf_type *t,
+			       u32 off, int sz, void *data)
+{
+	struct bpf_map_value_off *tab;
+	struct bpf_map *map = data;
+	struct module *mod = NULL;
+	bool btf_id_tag = false;
+	struct btf *kernel_btf;
+	int nr_off, ret;
+	s32 id;
+
+	/* For PTR, sz is always == 8 */
+	if (!btf_type_is_ptr(t))
+		return 0;
+	t = btf_type_by_id(btf, t->type);
+
+	while (btf_type_is_type_tag(t)) {
+		if (!strcmp("kernel.bpf.btf_id", __btf_name_by_offset(btf, t->name_off))) {
+			/* repeated tag */
+			if (btf_id_tag) {
+				ret = -EINVAL;
+				goto end;
+			}
+			btf_id_tag = true;
+		} else if (!strncmp("kernel.", __btf_name_by_offset(btf, t->name_off),
+			   sizeof("kernel.") - 1)) {
+			/* TODO: Should we reject these when loading BTF? */
+			/* Unavailable tag in reserved tag namespace */
+			ret = -EACCES;
+			goto end;
+		}
+		/* Look for next tag */
+		t = btf_type_by_id(btf, t->type);
+	}
+	if (!btf_id_tag)
+		return 0;
+
+	/* Get the base type */
+	if (btf_type_is_modifier(t))
+		t = btf_type_skip_modifiers(btf, t->type, NULL);
+	/* Only pointer to struct is allowed */
+	if (!__btf_type_is_struct(t)) {
+		ret = -EINVAL;
+		goto end;
+	}
+
+	id = btf_find_by_name_kind_all(__btf_name_by_offset(btf, t->name_off),
+				       BTF_INFO_KIND(t->info), &kernel_btf);
+	if (id < 0) {
+		ret = id;
+		goto end;
+	}
+
+	nr_off = map->ptr_off_tab ? map->ptr_off_tab->nr_off : 0;
+	if (nr_off == BPF_MAP_VALUE_OFF_MAX) {
+		ret = -E2BIG;
+		goto end_btf;
+	}
+
+	tab = krealloc(map->ptr_off_tab, offsetof(struct bpf_map_value_off, off[nr_off + 1]),
+		       GFP_KERNEL | __GFP_NOWARN);
+	if (!tab) {
+		ret = -ENOMEM;
+		goto end_btf;
+	}
+	/* Initialize nr_off for newly allocated ptr_off_tab */
+	if (!map->ptr_off_tab)
+		tab->nr_off = 0;
+	map->ptr_off_tab = tab;
+
+	/* We take reference to make sure valid pointers into module data don't
+	 * become invalid across program invocation.
+	 */
+	if (btf_is_module(kernel_btf)) {
+		mod = btf_try_get_module(kernel_btf);
+		if (!mod) {
+			ret = -ENXIO;
+			goto end_btf;
+		}
+	}
+
+	tab->off[nr_off].offset = off;
+	tab->off[nr_off].btf_id = id;
+	tab->off[nr_off].btf    = kernel_btf;
+	tab->off[nr_off].module = mod;
+	tab->nr_off++;
+
+	return 0;
+end_btf:
+	/* Reference is only raised for module BTF */
+	if (btf_is_module(kernel_btf))
+		btf_put(kernel_btf);
+end:
+	bpf_map_free_ptr_off_tab(map);
+	map->ptr_off_tab = ERR_PTR(ret);
+	return ret;
+}
+
 static int btf_find_struct_field(const struct btf *btf, const struct btf_type *t,
 				 const char *name, int sz, int align, int field_type,
 				 void *data)
@@ -3170,6 +3271,11 @@ static int btf_find_struct_field(const struct btf *btf, const struct btf_type *t
 			if (ret < 0)
 				return ret;
 			break;
+		case BTF_FIELD_KPTR:
+			ret = btf_find_field_kptr(btf, member_type, off, sz, data);
+			if (ret < 0)
+				return ret;
+			break;
 		default:
 			pr_err("verifier bug: unknown field type requested\n");
 			return -EFAULT;
@@ -3206,6 +3312,11 @@ static int btf_find_datasec_var(const struct btf *btf, const struct btf_type *t,
 			if (ret < 0)
 				return ret;
 			break;
+		case BTF_FIELD_KPTR:
+			ret = btf_find_field_kptr(btf, var_type, off, sz, data);
+			if (ret < 0)
+				return ret;
+			break;
 		default:
 			return -EFAULT;
 		}
@@ -3256,6 +3367,22 @@ int btf_find_timer(const struct btf *btf, const struct btf_type *t)
 	return off;
 }
 
+int btf_find_ptr_to_btf_id(const struct btf *btf, const struct btf_type *t,
+			   struct bpf_map *map)
+{
+	int ret;
+
+	ret = btf_find_field(btf, t, NULL, sizeof(u64), __alignof__(u64),
+			     BTF_FIELD_KPTR, map);
+	/* While btf_find_field_kptr cleans up after itself, later iterations
+	 * can still return error without calling it, so call free function
+	 * again.
+	 */
+	if (ret < 0)
+		bpf_map_free_ptr_off_tab(map);
+	return ret;
+}
+
 static void __btf_struct_show(const struct btf *btf, const struct btf_type *t,
 			      u32 type_id, void *data, u8 bits_offset,
 			      struct btf_show *show)
diff --git a/kernel/bpf/map_in_map.c b/kernel/bpf/map_in_map.c
index 5cd8f5277279..293e41a4f0b3 100644
--- a/kernel/bpf/map_in_map.c
+++ b/kernel/bpf/map_in_map.c
@@ -52,6 +52,7 @@ struct bpf_map *bpf_map_meta_alloc(int inner_map_ufd)
 	inner_map_meta->max_entries = inner_map->max_entries;
 	inner_map_meta->spin_lock_off = inner_map->spin_lock_off;
 	inner_map_meta->timer_off = inner_map->timer_off;
+	inner_map_meta->ptr_off_tab = bpf_map_copy_ptr_off_tab(inner_map);
 	if (inner_map->btf) {
 		btf_get(inner_map->btf);
 		inner_map_meta->btf = inner_map->btf;
@@ -71,6 +72,7 @@ struct bpf_map *bpf_map_meta_alloc(int inner_map_ufd)
 
 void bpf_map_meta_free(struct bpf_map *map_meta)
 {
+	bpf_map_free_ptr_off_tab(map_meta);
 	btf_put(map_meta->btf);
 	kfree(map_meta);
 }
@@ -83,7 +85,8 @@ bool bpf_map_meta_equal(const struct bpf_map *meta0,
 		meta0->key_size == meta1->key_size &&
 		meta0->value_size == meta1->value_size &&
 		meta0->timer_off == meta1->timer_off &&
-		meta0->map_flags == meta1->map_flags;
+		meta0->map_flags == meta1->map_flags &&
+		bpf_map_equal_ptr_off_tab(meta0, meta1);
 }
 
 void *bpf_map_fd_get_ptr(struct bpf_map *map,
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 9c7a72b65eee..beb96866f34d 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -6,6 +6,7 @@
 #include <linux/bpf_trace.h>
 #include <linux/bpf_lirc.h>
 #include <linux/bpf_verifier.h>
+#include <linux/bsearch.h>
 #include <linux/btf.h>
 #include <linux/syscalls.h>
 #include <linux/slab.h>
@@ -472,12 +473,123 @@ static void bpf_map_release_memcg(struct bpf_map *map)
 }
 #endif
 
+static int bpf_map_ptr_off_cmp(const void *a, const void *b)
+{
+	const struct bpf_map_value_off_desc *off_desc1 = a, *off_desc2 = b;
+
+	if (off_desc1->offset < off_desc2->offset)
+		return -1;
+	else if (off_desc1->offset > off_desc2->offset)
+		return 1;
+	return 0;
+}
+
+struct bpf_map_value_off_desc *bpf_map_ptr_off_contains(struct bpf_map *map, u32 offset)
+{
+	/* Since members are iterated in btf_find_field in increasing order,
+	 * offsets appended to ptr_off_tab are in increasing order, so we can
+	 * do bsearch to find exact match.
+	 */
+	struct bpf_map_value_off *tab;
+
+	if (!map_value_has_ptr_to_btf_id(map))
+		return NULL;
+	tab = map->ptr_off_tab;
+	return bsearch(&offset, tab->off, tab->nr_off, sizeof(tab->off[0]), bpf_map_ptr_off_cmp);
+}
+
+void bpf_map_free_ptr_off_tab(struct bpf_map *map)
+{
+	struct bpf_map_value_off *tab = map->ptr_off_tab;
+	int i;
+
+	if (IS_ERR_OR_NULL(tab))
+		return;
+	for (i = 0; i < tab->nr_off; i++) {
+		struct module *mod = tab->off[i].module;
+		struct btf *btf = tab->off[i].btf;
+
+		/* off[i].btf is obtained from bpf_btf_find_by_name_kind_all,
+		 * which only takes reference for module BTF, not vmlinux BTF.
+		 */
+		if (btf_is_module(btf)) {
+			module_put(mod);
+			btf_put(btf);
+		}
+	}
+	kfree(tab);
+	map->ptr_off_tab = NULL;
+}
+
+struct bpf_map_value_off *bpf_map_copy_ptr_off_tab(const struct bpf_map *map)
+{
+	struct bpf_map_value_off *tab = map->ptr_off_tab, *new_tab;
+	int size, i, ret;
+
+	if (IS_ERR_OR_NULL(tab))
+		return tab;
+	/* Increment references that we have to transfer into the new
+	 * ptr_off_tab.
+	 */
+	for (i = 0; i < tab->nr_off; i++) {
+		struct btf *btf = tab->off[i].btf;
+
+		if (btf_is_module(btf)) {
+			if (!btf_try_get_module(btf)) {
+				ret = -ENXIO;
+				/* No references for off_desc at index 'i' have
+				 * been taken at this point, so the cleanup loop
+				 * at 'end' will start releasing from previous
+				 * index.
+				 */
+				goto end;
+			}
+			btf_get(btf);
+		}
+	}
+
+	size = offsetof(struct bpf_map_value_off, off[tab->nr_off]);
+	new_tab = kmalloc(size, GFP_KERNEL | __GFP_NOWARN);
+	if (!new_tab) {
+		ret = -ENOMEM;
+		goto end;
+	}
+	memcpy(new_tab, tab, size);
+	return new_tab;
+end:
+	while (i--) {
+		if (btf_is_module(tab->off[i].btf)) {
+			module_put(tab->off[i].module);
+			btf_put(tab->off[i].btf);
+		}
+	}
+	return ERR_PTR(ret);
+}
+
+bool bpf_map_equal_ptr_off_tab(const struct bpf_map *map_a, const struct bpf_map *map_b)
+{
+	struct bpf_map_value_off *tab_a = map_a->ptr_off_tab, *tab_b = map_b->ptr_off_tab;
+	int size;
+
+	if (IS_ERR(tab_a) || IS_ERR(tab_b))
+		return false;
+	if (!tab_a && !tab_b)
+		return true;
+	if ((!tab_a && tab_b) || (tab_a && !tab_b))
+		return false;
+	if (tab_a->nr_off != tab_b->nr_off)
+		return false;
+	size = offsetof(struct bpf_map_value_off, off[tab_a->nr_off]);
+	return !memcmp(tab_a, tab_b, size);
+}
+
 /* called from workqueue */
 static void bpf_map_free_deferred(struct work_struct *work)
 {
 	struct bpf_map *map = container_of(work, struct bpf_map, work);
 
 	security_bpf_map_free(map);
+	bpf_map_free_ptr_off_tab(map);
 	bpf_map_release_memcg(map);
 	/* implementation dependent freeing */
 	map->ops->map_free(map);
@@ -639,7 +751,7 @@ static int bpf_map_mmap(struct file *filp, struct vm_area_struct *vma)
 	int err;
 
 	if (!map->ops->map_mmap || map_value_has_spin_lock(map) ||
-	    map_value_has_timer(map))
+	    map_value_has_timer(map) || map_value_has_ptr_to_btf_id(map))
 		return -ENOTSUPP;
 
 	if (!(vma->vm_flags & VM_SHARED))
@@ -819,9 +931,30 @@ static int map_check_btf(struct bpf_map *map, const struct btf *btf,
 			return -EOPNOTSUPP;
 	}
 
-	if (map->ops->map_check_btf)
+	/* We can ignore the return value */
+	btf_find_ptr_to_btf_id(btf, value_type, map);
+	if (map_value_has_ptr_to_btf_id(map)) {
+		if (map->map_flags & BPF_F_RDONLY_PROG) {
+			ret = -EACCES;
+			goto free_map_tab;
+		}
+		if (map->map_type != BPF_MAP_TYPE_HASH &&
+		    map->map_type != BPF_MAP_TYPE_LRU_HASH &&
+		    map->map_type != BPF_MAP_TYPE_ARRAY) {
+			ret = -EOPNOTSUPP;
+			goto free_map_tab;
+		}
+	}
+
+	if (map->ops->map_check_btf) {
 		ret = map->ops->map_check_btf(map, btf, key_type, value_type);
+		if (ret < 0)
+			goto free_map_tab;
+	}
 
+	return ret;
+free_map_tab:
+	bpf_map_free_ptr_off_tab(map);
 	return ret;
 }
 
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index d7473fee247c..1ffefddebaea 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -3465,6 +3465,118 @@ static int check_mem_region_access(struct bpf_verifier_env *env, u32 regno,
 	return 0;
 }
 
+static int map_ptr_to_btf_id_match_type(struct bpf_verifier_env *env,
+					struct bpf_map_value_off_desc *off_desc,
+					struct bpf_reg_state *reg, u32 regno)
+{
+	const char *targ_name = kernel_type_name(off_desc->btf, off_desc->btf_id);
+	const char *reg_name = "";
+
+	if (reg->type != PTR_TO_BTF_ID && reg->type != PTR_TO_BTF_ID_OR_NULL)
+		goto end;
+
+	if (!btf_is_kernel(reg->btf)) {
+		verbose(env, "R%d must point to kernel BTF\n", regno);
+		return -EINVAL;
+	}
+	/* We need to verify reg->type and reg->btf, before accessing reg->btf */
+	reg_name = kernel_type_name(reg->btf, reg->btf_id);
+
+	if (reg->off < 0) {
+		verbose(env,
+			"R%d is ptr_%s invalid negative access: off=%d\n",
+			regno, reg_name, reg->off);
+		return -EINVAL;
+	}
+
+	if (!tnum_is_const(reg->var_off) || reg->var_off.value) {
+		char tn_buf[48];
+
+		tnum_strn(tn_buf, sizeof(tn_buf), reg->var_off);
+		verbose(env,
+			"R%d is ptr_%s invalid variable offset: off=%d, var_off=%s\n",
+			regno, reg_name, reg->off, tn_buf);
+		return -EINVAL;
+	}
+
+	if (!btf_struct_ids_match(&env->log, reg->btf, reg->btf_id, reg->off,
+				  off_desc->btf, off_desc->btf_id))
+		goto end;
+	return 0;
+end:
+	verbose(env, "invalid btf_id pointer access, R%d type=%s%s ", regno,
+		reg_type_str(env, reg->type), reg_name);
+	verbose(env, "expected=%s%s\n", reg_type_str(env, PTR_TO_BTF_ID), targ_name);
+	return -EINVAL;
+}
+
+/* Returns an error, or 0 if ignoring the access, or 1 if register state was
+ * updated, in which case later updates must be skipped.
+ */
+static int check_map_ptr_to_btf_id(struct bpf_verifier_env *env, u32 regno, int off, int size,
+				   int value_regno, enum bpf_access_type t, int insn_idx)
+{
+	struct bpf_reg_state *reg = reg_state(env, regno), *val_reg;
+	struct bpf_insn *insn = &env->prog->insnsi[insn_idx];
+	struct bpf_map_value_off_desc *off_desc;
+	int insn_class = BPF_CLASS(insn->code);
+	struct bpf_map *map = reg->map_ptr;
+
+	/* Things we already checked for in check_map_access:
+	 *  - Reject cases where variable offset may touch BTF ID pointer
+	 *  - size of access (must be BPF_DW)
+	 *  - off_desc->offset == off + reg->var_off.value
+	 */
+	if (!tnum_is_const(reg->var_off))
+		return 0;
+
+	off_desc = bpf_map_ptr_off_contains(map, off + reg->var_off.value);
+	if (!off_desc)
+		return 0;
+
+	if (WARN_ON_ONCE(size != bpf_size_to_bytes(BPF_DW)))
+		return -EACCES;
+
+	if (BPF_MODE(insn->code) != BPF_MEM)
+		goto end;
+
+	if (!env->bpf_capable) {
+		verbose(env, "btf_id pointer in map only allowed for CAP_BPF and CAP_SYS_ADMIN\n");
+		return -EPERM;
+	}
+
+	if (insn_class == BPF_LDX) {
+		if (WARN_ON_ONCE(value_regno < 0))
+			return -EFAULT;
+		val_reg = reg_state(env, value_regno);
+		/* We can simply mark the value_regno receiving the pointer
+		 * value from map as PTR_TO_BTF_ID, with the correct type.
+		 */
+		mark_btf_ld_reg(env, cur_regs(env), value_regno, PTR_TO_BTF_ID, off_desc->btf,
+				off_desc->btf_id, PTR_MAYBE_NULL);
+		val_reg->id = ++env->id_gen;
+	} else if (insn_class == BPF_STX) {
+		if (WARN_ON_ONCE(value_regno < 0))
+			return -EFAULT;
+		val_reg = reg_state(env, value_regno);
+		if (!register_is_null(val_reg) &&
+		    map_ptr_to_btf_id_match_type(env, off_desc, val_reg, value_regno))
+			return -EACCES;
+	} else if (insn_class == BPF_ST) {
+		if (insn->imm) {
+			verbose(env, "BPF_ST imm must be 0 when writing to btf_id pointer at off=%u\n",
+				off_desc->offset);
+			return -EACCES;
+		}
+	} else {
+		goto end;
+	}
+	return 1;
+end:
+	verbose(env, "btf_id pointer in map can only be accessed using BPF_LDX/BPF_STX/BPF_ST\n");
+	return -EACCES;
+}
+
 /* check read/write into a map element with possible variable offset */
 static int check_map_access(struct bpf_verifier_env *env, u32 regno,
 			    int off, int size, bool zero_size_allowed)
@@ -3503,6 +3615,36 @@ static int check_map_access(struct bpf_verifier_env *env, u32 regno,
 			return -EACCES;
 		}
 	}
+	if (map_value_has_ptr_to_btf_id(map)) {
+		struct bpf_map_value_off *tab = map->ptr_off_tab;
+		bool known_off = tnum_is_const(reg->var_off);
+		int i;
+
+		for (i = 0; i < tab->nr_off; i++) {
+			u32 p = tab->off[i].offset;
+
+			if (reg->smin_value + off < p + sizeof(u64) &&
+			    p < reg->umax_value + off + size) {
+				if (!known_off) {
+					verbose(env, "btf_id pointer cannot be accessed by variable offset load/store\n");
+					return -EACCES;
+				}
+				if (p != off + reg->var_off.value) {
+					verbose(env, "btf_id pointer offset incorrect\n");
+					return -EACCES;
+				}
+				if (size != sizeof(u64)) {
+					verbose(env, "btf_id pointer load/store size must be 8\n");
+					return -EACCES;
+				}
+				break;
+			}
+		}
+	} else if (IS_ERR(map->ptr_off_tab)) {
+		/* Reject program using map with incorrectly tagged btf_id pointer */
+		verbose(env, "invalid btf_id pointer tagging in map value\n");
+		return PTR_ERR(map->ptr_off_tab);
+	}
 	return err;
 }
 
@@ -4404,6 +4546,10 @@ static int check_mem_access(struct bpf_verifier_env *env, int insn_idx, u32 regn
 		if (err)
 			return err;
 		err = check_map_access(env, regno, off, size, false);
+		if (!err)
+			err = check_map_ptr_to_btf_id(env, regno, off, size, value_regno,
+						      t, insn_idx);
+		/* if err == 0, check_map_ptr_to_btf_id ignored the access */
 		if (!err && t == BPF_READ && value_regno >= 0) {
 			struct bpf_map *map = reg->map_ptr;
 
@@ -4425,6 +4571,8 @@ static int check_mem_access(struct bpf_verifier_env *env, int insn_idx, u32 regn
 				mark_reg_unknown(env, regs, value_regno);
 			}
 		}
+		if (err == 1)
+			err = 0;
 	} else if (base_type(reg->type) == PTR_TO_MEM) {
 		bool rdonly_mem = type_is_rdonly_mem(reg->type);
 
-- 
2.35.1



* [PATCH bpf-next v1 04/15] bpf: Allow storing referenced PTR_TO_BTF_ID in map
  2022-02-20 13:47 [PATCH bpf-next v1 00/15] Introduce typed pointer support in BPF maps Kumar Kartikeya Dwivedi
                   ` (2 preceding siblings ...)
  2022-02-20 13:48 ` [PATCH bpf-next v1 03/15] bpf: Allow storing PTR_TO_BTF_ID in map Kumar Kartikeya Dwivedi
@ 2022-02-20 13:48 ` Kumar Kartikeya Dwivedi
  2022-02-22  6:53   ` Alexei Starovoitov
  2022-02-20 13:48 ` [PATCH bpf-next v1 05/15] bpf: Allow storing PTR_TO_PERCPU_BTF_ID " Kumar Kartikeya Dwivedi
                   ` (11 subsequent siblings)
  15 siblings, 1 reply; 38+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2022-02-20 13:48 UTC (permalink / raw)
  To: bpf
  Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Toke Høiland-Jørgensen, Jesper Dangaard Brouer,
	netfilter-devel, netdev

This commit enables storing referenced PTR_TO_BTF_ID pointers in maps,
with some restrictions to ensure the value of the pointer remains
consistent, as it needs to be eventually freed using a release function,
either by the BPF program itself in a later invocation, or by the map's
free path.

Such a pointer must be tagged using both the 'btf_id' and 'ref' type tags
on the type being pointed to by the pointer, in the map value's BTF, as
sketched below. The verifier will only permit updating such pointers
using the BPF_XCHG instruction.
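
As a sketch, using clang's btf_type_tag attribute with the two tag names
introduced by this patch (the convenience macro name below is made up; the
real macros are added to bpf_helpers.h later in the series):

	#define __kptr_ref						\
		__attribute__((btf_type_tag("kernel.bpf.btf_id")))	\
		__attribute__((btf_type_tag("kernel.bpf.ref")))

	struct map_value {
		/* referenced kernel pointer, only updatable via BPF_XCHG */
		struct nf_conn __kptr_ref *ct;
	};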

There are valid reasons to enforce this restriction:
- The pointer value must be consistent in the face of concurrent
  modification, and any prior value contained in the map must also be
  released before a new one is moved into the map. To ensure proper
  transfer of this ownership, BPF_XCHG returns the old value, which the
  verifier requires the user to either release or move into another
  map, and it releases the reference held for the pointer being moved in.

  Hence, after a BPF_XCHG on a map value, the user releases the
  reference for the pointer they are moving in, and acquires a reference
  for the old pointer returned by the exchange instruction (in src_reg).

  In case of unreferenced PTR_TO_BTF_ID in maps, losing the old value on
  a store had no adverse effect. The only thing we need to ensure is
  that the actual word sized pointer is written and read concurrently
  using word sized instructions on the actual architecture. Even though
  the BPF ABI has 64-bit pointers, the underlying pointer value on
  32-bit systems will be 32-bit, so emitting a load or store as two
  32-bit sized loads or stores would still be valid there; doing the
  same on a 64-bit system would be wrong, as the pointer value being
  read could be inconsistent.

  This is because, as far as pointer dereference inside a BPF program
  is concerned, the verifier patches loads to PROBE_MEM loads, which
  support exception handling of faulting loads, but PTR_TO_BTF_ID can
  also be passed into BPF helpers and kernel functions, which do not
  have the same liberty.

- This also ensures that BPF programs executing concurrently never end
  up in a state where a certain pointer value was lost due to
  manipulations of the map value, thus leaking the reference that was
  moved in.

  There is always an entity which eventually releases the reference: it
  will either be the map's free path (which will detect and release
  live pointers in the map), or the BPF program itself, which can
  exchange a referenced pointer with NULL and release the old reference.

- In case of multiple such pointers, doing many BPF_XCHG can be a bit
  costly, especially if those pointers are already protected by a BPF
  spin lock against concurrent modification. In the future, this support
  can be extended so that a single spin lock protects multiple such
  pointers and this move operation can be enforced using a helper, while
  also ensuring linearizability of the pointer move operations. This
  will amortize the cost of each individual BPF_XCHG that would be
  needed otherwise using a spin lock. However, this is work that has
  been left as an exercise for a future patch.

  This mechanism would also require the user to indicate to the verifier
  which members of the map value are protected by the BPF spin lock, by
  using annotation in map's BTF.

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
---
 include/linux/bpf.h   |   5 ++
 kernel/bpf/btf.c      |  22 +++++++-
 kernel/bpf/verifier.c | 117 ++++++++++++++++++++++++++++++++++++------
 3 files changed, 126 insertions(+), 18 deletions(-)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index ce45ffb79f82..923b9f36c275 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -160,11 +160,16 @@ enum {
 	BPF_MAP_VALUE_OFF_MAX = 8,
 };
 
+enum {
+	BPF_MAP_VALUE_OFF_F_REF = (1U << 0),
+};
+
 struct bpf_map_value_off_desc {
 	u32 offset;
 	u32 btf_id;
 	struct btf *btf;
 	struct module *module;
+	int flags;
 };
 
 struct bpf_map_value_off {
diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c
index 1edb5710e155..20124f4a421c 100644
--- a/kernel/bpf/btf.c
+++ b/kernel/bpf/btf.c
@@ -3146,10 +3146,10 @@ static s32 btf_find_by_name_kind_all(const char *name, u32 kind, struct btf **bt
 static int btf_find_field_kptr(const struct btf *btf, const struct btf_type *t,
 			       u32 off, int sz, void *data)
 {
+	bool btf_id_tag = false, ref_tag = false;
 	struct bpf_map_value_off *tab;
 	struct bpf_map *map = data;
 	struct module *mod = NULL;
-	bool btf_id_tag = false;
 	struct btf *kernel_btf;
 	int nr_off, ret;
 	s32 id;
@@ -3167,6 +3167,13 @@ static int btf_find_field_kptr(const struct btf *btf, const struct btf_type *t,
 				goto end;
 			}
 			btf_id_tag = true;
+		} else if (!strcmp("kernel.bpf.ref", __btf_name_by_offset(btf, t->name_off))) {
+			/* repeated tag */
+			if (ref_tag) {
+				ret = -EINVAL;
+				goto end;
+			}
+			ref_tag = true;
 		} else if (!strncmp("kernel.", __btf_name_by_offset(btf, t->name_off),
 			   sizeof("kernel.") - 1)) {
 			/* TODO: Should we reject these when loading BTF? */
@@ -3177,8 +3184,14 @@ static int btf_find_field_kptr(const struct btf *btf, const struct btf_type *t,
 		/* Look for next tag */
 		t = btf_type_by_id(btf, t->type);
 	}
-	if (!btf_id_tag)
+	if (!btf_id_tag) {
+		/* 'ref' tag must be specified together with 'btf_id' tag */
+		if (ref_tag) {
+			ret = -EINVAL;
+			goto end;
+		}
 		return 0;
+	}
 
 	/* Get the base type */
 	if (btf_type_is_modifier(t))
@@ -3215,6 +3228,10 @@ static int btf_find_field_kptr(const struct btf *btf, const struct btf_type *t,
 
 	/* We take reference to make sure valid pointers into module data don't
 	 * become invalid across program invocation.
+	 *
+	 * We also need to hold a reference to the module, which corresponds to
+	 * the referenced type, as it has the destructor function we need to
+	 * call when map goes away and a live pointer exists at offset.
 	 */
 	if (btf_is_module(kernel_btf)) {
 		mod = btf_try_get_module(kernel_btf);
@@ -3228,6 +3245,7 @@ static int btf_find_field_kptr(const struct btf *btf, const struct btf_type *t,
 	tab->off[nr_off].btf_id = id;
 	tab->off[nr_off].btf    = kernel_btf;
 	tab->off[nr_off].module = mod;
+	tab->off[nr_off].flags  = ref_tag ? BPF_MAP_VALUE_OFF_F_REF : 0;
 	tab->nr_off++;
 
 	return 0;
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 1ffefddebaea..a9d8c0d3c919 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -521,6 +521,13 @@ static bool is_ptr_cast_function(enum bpf_func_id func_id)
 		func_id == BPF_FUNC_skc_to_tcp_request_sock;
 }
 
+static bool is_xchg_insn(const struct bpf_insn *insn)
+{
+	return BPF_CLASS(insn->code) == BPF_STX &&
+	       BPF_MODE(insn->code) == BPF_ATOMIC &&
+	       insn->imm == BPF_XCHG;
+}
+
 static bool is_cmpxchg_insn(const struct bpf_insn *insn)
 {
 	return BPF_CLASS(insn->code) == BPF_STX &&
@@ -3467,7 +3474,8 @@ static int check_mem_region_access(struct bpf_verifier_env *env, u32 regno,
 
 static int map_ptr_to_btf_id_match_type(struct bpf_verifier_env *env,
 					struct bpf_map_value_off_desc *off_desc,
-					struct bpf_reg_state *reg, u32 regno)
+					struct bpf_reg_state *reg, u32 regno,
+					bool ref_ptr)
 {
 	const char *targ_name = kernel_type_name(off_desc->btf, off_desc->btf_id);
 	const char *reg_name = "";
@@ -3498,6 +3506,20 @@ static int map_ptr_to_btf_id_match_type(struct bpf_verifier_env *env,
 			regno, reg_name, reg->off, tn_buf);
 		return -EINVAL;
 	}
+	/* reg->off can be used to store pointer to a certain type formed by
+	 * incrementing pointer of a parent structure the object is embedded in,
+	 * e.g. map may expect unreferenced struct path *, and user should be
+	 * allowed a store using &file->f_path. However, in the case of
+	 * referenced pointer, we cannot do this, because the reference is only
+	 * for the parent structure, not its embedded object(s), and because
+	 * the transfer of ownership happens for the original pointer to and
+	 * from the map (before its eventual release).
+	 */
+	if (reg->off && ref_ptr) {
+		verbose(env, "R%d stored to referenced btf_id pointer cannot have non-zero offset\n",
+			regno);
+		return -EINVAL;
+	}
 
 	if (!btf_struct_ids_match(&env->log, reg->btf, reg->btf_id, reg->off,
 				  off_desc->btf, off_desc->btf_id))
@@ -3510,17 +3532,23 @@ static int map_ptr_to_btf_id_match_type(struct bpf_verifier_env *env,
 	return -EINVAL;
 }
 
+static int release_reference(struct bpf_verifier_env *env, int ref_obj_id);
+
 /* Returns an error, or 0 if ignoring the access, or 1 if register state was
  * updated, in which case later updates must be skipped.
  */
 static int check_map_ptr_to_btf_id(struct bpf_verifier_env *env, u32 regno, int off, int size,
-				   int value_regno, enum bpf_access_type t, int insn_idx)
+				   int value_regno, enum bpf_access_type t, int insn_idx,
+				   struct bpf_reg_state *atomic_load_reg)
 {
 	struct bpf_reg_state *reg = reg_state(env, regno), *val_reg;
 	struct bpf_insn *insn = &env->prog->insnsi[insn_idx];
 	struct bpf_map_value_off_desc *off_desc;
 	int insn_class = BPF_CLASS(insn->code);
 	struct bpf_map *map = reg->map_ptr;
+	bool ref_ptr = false;
+	u32 ref_obj_id = 0;
+	int ret;
 
 	/* Things we already checked for in check_map_access:
 	 *  - Reject cases where variable offset may touch BTF ID pointer
@@ -3533,11 +3561,12 @@ static int check_map_ptr_to_btf_id(struct bpf_verifier_env *env, u32 regno, int
 	off_desc = bpf_map_ptr_off_contains(map, off + reg->var_off.value);
 	if (!off_desc)
 		return 0;
+	ref_ptr = off_desc->flags & BPF_MAP_VALUE_OFF_F_REF;
 
 	if (WARN_ON_ONCE(size != bpf_size_to_bytes(BPF_DW)))
 		return -EACCES;
 
-	if (BPF_MODE(insn->code) != BPF_MEM)
+	if (BPF_MODE(insn->code) != BPF_MEM && BPF_MODE(insn->code) != BPF_ATOMIC)
 		goto end;
 
 	if (!env->bpf_capable) {
@@ -3545,10 +3574,50 @@ static int check_map_ptr_to_btf_id(struct bpf_verifier_env *env, u32 regno, int
 		return -EPERM;
 	}
 
-	if (insn_class == BPF_LDX) {
+	if (is_xchg_insn(insn)) {
+		/* We do checks and updates during register fill call for fetch case */
+		if (t != BPF_READ || value_regno < 0)
+			return 1;
+		val_reg = reg_state(env, value_regno);
+		if (!register_is_null(atomic_load_reg) &&
+		    map_ptr_to_btf_id_match_type(env, off_desc, atomic_load_reg, value_regno, ref_ptr))
+			return -EACCES;
+		/* Acquire new reference state for old pointer, and release
+		 * current reference state for exchanged pointer.
+		 */
+		if (ref_ptr) {
+			if (!register_is_null(atomic_load_reg)) {
+				if (!atomic_load_reg->ref_obj_id) {
+					verbose(env, "R%d type=%s%s must be referenced\n",
+						value_regno, reg_type_str(env, atomic_load_reg->type),
+						kernel_type_name(reg->btf, reg->btf_id));
+					return -EACCES;
+				}
+				ret = release_reference(env, atomic_load_reg->ref_obj_id);
+				if (ret < 0)
+					return ret;
+			}
+			ret = acquire_reference_state(env, insn_idx);
+			if (ret < 0)
+				return ret;
+			ref_obj_id = ret;
+		}
+		/* val_reg might be NULL at this point */
+		mark_btf_ld_reg(env, cur_regs(env), value_regno, PTR_TO_BTF_ID, off_desc->btf,
+				off_desc->btf_id, PTR_MAYBE_NULL);
+		/* __mark_ptr_or_null_regs needs ref_obj_id == id to clear
+		 * reference state for ptr == NULL branch.
+		 */
+		val_reg->id = ref_obj_id ?: ++env->id_gen;
+		val_reg->ref_obj_id = ref_obj_id;
+	} else if (insn_class == BPF_LDX) {
 		if (WARN_ON_ONCE(value_regno < 0))
 			return -EFAULT;
 		val_reg = reg_state(env, value_regno);
+		if (ref_ptr) {
+			verbose(env, "referenced btf_id pointer can only be accessed using BPF_XCHG\n");
+			return -EACCES;
+		}
 		/* We can simply mark the value_regno receiving the pointer
 		 * value from map as PTR_TO_BTF_ID, with the correct type.
 		 */
@@ -3559,10 +3628,18 @@ static int check_map_ptr_to_btf_id(struct bpf_verifier_env *env, u32 regno, int
 		if (WARN_ON_ONCE(value_regno < 0))
 			return -EFAULT;
 		val_reg = reg_state(env, value_regno);
+		if (ref_ptr) {
+			verbose(env, "referenced btf_id pointer can only be accessed using BPF_XCHG\n");
+			return -EACCES;
+		}
 		if (!register_is_null(val_reg) &&
-		    map_ptr_to_btf_id_match_type(env, off_desc, val_reg, value_regno))
+		    map_ptr_to_btf_id_match_type(env, off_desc, val_reg, value_regno, false))
 			return -EACCES;
 	} else if (insn_class == BPF_ST) {
+		if (ref_ptr) {
+			verbose(env, "referenced btf_id pointer can only be accessed using BPF_XCHG\n");
+			return -EACCES;
+		}
 		if (insn->imm) {
 			verbose(env, "BPF_ST imm must be 0 when writing to btf_id pointer at off=%u\n",
 				off_desc->offset);
@@ -3573,7 +3650,7 @@ static int check_map_ptr_to_btf_id(struct bpf_verifier_env *env, u32 regno, int
 	}
 	return 1;
 end:
-	verbose(env, "btf_id pointer in map can only be accessed using BPF_LDX/BPF_STX/BPF_ST\n");
+	verbose(env, "btf_id pointer in map can only be accessed using BPF_LDX, BPF_STX, BPF_ST, BPF_XCHG\n");
 	return -EACCES;
 }
 
@@ -4505,7 +4582,8 @@ static int check_stack_access_within_bounds(
  */
 static int check_mem_access(struct bpf_verifier_env *env, int insn_idx, u32 regno,
 			    int off, int bpf_size, enum bpf_access_type t,
-			    int value_regno, bool strict_alignment_once)
+			    int value_regno, bool strict_alignment_once,
+			    struct bpf_reg_state *atomic_load_reg)
 {
 	struct bpf_reg_state *regs = cur_regs(env);
 	struct bpf_reg_state *reg = regs + regno;
@@ -4548,7 +4626,7 @@ static int check_mem_access(struct bpf_verifier_env *env, int insn_idx, u32 regn
 		err = check_map_access(env, regno, off, size, false);
 		if (!err)
 			err = check_map_ptr_to_btf_id(env, regno, off, size, value_regno,
-						      t, insn_idx);
+						      t, insn_idx, atomic_load_reg);
 		/* if err == 0, check_map_ptr_to_btf_id ignored the access */
 		if (!err && t == BPF_READ && value_regno >= 0) {
 			struct bpf_map *map = reg->map_ptr;
@@ -4743,9 +4821,12 @@ static int check_mem_access(struct bpf_verifier_env *env, int insn_idx, u32 regn
 
 static int check_atomic(struct bpf_verifier_env *env, int insn_idx, struct bpf_insn *insn)
 {
+	struct bpf_reg_state atomic_load_reg;
 	int load_reg;
 	int err;
 
+	__mark_reg_unknown(env, &atomic_load_reg);
+
 	switch (insn->imm) {
 	case BPF_ADD:
 	case BPF_ADD | BPF_FETCH:
@@ -4813,6 +4894,7 @@ static int check_atomic(struct bpf_verifier_env *env, int insn_idx, struct bpf_i
 		else
 			load_reg = insn->src_reg;
 
+		atomic_load_reg = *reg_state(env, load_reg);
 		/* check and record load of old value */
 		err = check_reg_arg(env, load_reg, DST_OP);
 		if (err)
@@ -4825,20 +4907,21 @@ static int check_atomic(struct bpf_verifier_env *env, int insn_idx, struct bpf_i
 	}
 
 	/* Check whether we can read the memory, with second call for fetch
-	 * case to simulate the register fill.
+	 * case to simulate the register fill, which also triggers checks
+	 * for manipulation of BTF ID pointers embedded in BPF maps.
 	 */
 	err = check_mem_access(env, insn_idx, insn->dst_reg, insn->off,
-			       BPF_SIZE(insn->code), BPF_READ, -1, true);
+			       BPF_SIZE(insn->code), BPF_READ, -1, true, NULL);
 	if (!err && load_reg >= 0)
 		err = check_mem_access(env, insn_idx, insn->dst_reg, insn->off,
 				       BPF_SIZE(insn->code), BPF_READ, load_reg,
-				       true);
+				       true, load_reg >= 0 ? &atomic_load_reg : NULL);
 	if (err)
 		return err;
 
 	/* Check whether we can write into the same memory. */
 	err = check_mem_access(env, insn_idx, insn->dst_reg, insn->off,
-			       BPF_SIZE(insn->code), BPF_WRITE, -1, true);
+			       BPF_SIZE(insn->code), BPF_WRITE, -1, true, NULL);
 	if (err)
 		return err;
 
@@ -6797,7 +6880,7 @@ static int check_helper_call(struct bpf_verifier_env *env, struct bpf_insn *insn
 	 */
 	for (i = 0; i < meta.access_size; i++) {
 		err = check_mem_access(env, insn_idx, meta.regno, i, BPF_B,
-				       BPF_WRITE, -1, false);
+				       BPF_WRITE, -1, false, NULL);
 		if (err)
 			return err;
 	}
@@ -11662,7 +11745,8 @@ static int do_check(struct bpf_verifier_env *env)
 			 */
 			err = check_mem_access(env, env->insn_idx, insn->src_reg,
 					       insn->off, BPF_SIZE(insn->code),
-					       BPF_READ, insn->dst_reg, false);
+					       BPF_READ, insn->dst_reg, false,
+					       NULL);
 			if (err)
 				return err;
 
@@ -11717,7 +11801,8 @@ static int do_check(struct bpf_verifier_env *env)
 			/* check that memory (dst_reg + off) is writeable */
 			err = check_mem_access(env, env->insn_idx, insn->dst_reg,
 					       insn->off, BPF_SIZE(insn->code),
-					       BPF_WRITE, insn->src_reg, false);
+					       BPF_WRITE, insn->src_reg, false,
+					       NULL);
 			if (err)
 				return err;
 
@@ -11751,7 +11836,7 @@ static int do_check(struct bpf_verifier_env *env)
 			/* check that memory (dst_reg + off) is writeable */
 			err = check_mem_access(env, env->insn_idx, insn->dst_reg,
 					       insn->off, BPF_SIZE(insn->code),
-					       BPF_WRITE, -1, false);
+					       BPF_WRITE, -1, false, NULL);
 			if (err)
 				return err;
 
-- 
2.35.1



* [PATCH bpf-next v1 05/15] bpf: Allow storing PTR_TO_PERCPU_BTF_ID in map
  2022-02-20 13:47 [PATCH bpf-next v1 00/15] Introduce typed pointer support in BPF maps Kumar Kartikeya Dwivedi
                   ` (3 preceding siblings ...)
  2022-02-20 13:48 ` [PATCH bpf-next v1 04/15] bpf: Allow storing referenced " Kumar Kartikeya Dwivedi
@ 2022-02-20 13:48 ` Kumar Kartikeya Dwivedi
  2022-02-20 20:40   ` kernel test robot
  2022-02-20 13:48 ` [PATCH bpf-next v1 06/15] bpf: Allow storing __user PTR_TO_BTF_ID " Kumar Kartikeya Dwivedi
                   ` (10 subsequent siblings)
  15 siblings, 1 reply; 38+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2022-02-20 13:48 UTC (permalink / raw)
  To: bpf
  Cc: Hao Luo, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Toke Høiland-Jørgensen, Jesper Dangaard Brouer,
	netfilter-devel, netdev

Make adjustments to the code to allow storing PTR_TO_PERCPU_BTF_ID in a
map. Note that percpu pointers are not yet supported as referenced
pointers, so that combination is explicitly rejected during BTF tag
parsing. Similar to the 'ref' tag, a new 'percpu' tag composes with the
'btf_id' tag on the pointed-to type to hint that it is a percpu btf_id
pointer.
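
For illustration only, a map value embedding such a pointer could be
declared roughly as below. This is a hypothetical sketch: the
"kernel.bpf.btf_id" tag name is assumed from the rest of the series
(only the "kernel.bpf.percpu" tag is added here), and the struct names
are placeholders.

	struct map_value {
		/* percpu btf_id pointer: both tags apply to the pointed-to type */
		struct my_percpu_type __attribute__((btf_type_tag("kernel.bpf.btf_id")))
				      __attribute__((btf_type_tag("kernel.bpf.percpu"))) *pptr;
	};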

Cc: Hao Luo <haoluo@google.com>
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
---
 include/linux/bpf.h   |  3 ++-
 kernel/bpf/btf.c      | 27 ++++++++++++++++++++++-----
 kernel/bpf/verifier.c | 37 ++++++++++++++++++++++++++++---------
 3 files changed, 52 insertions(+), 15 deletions(-)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 923b9f36c275..843c8c01cf9d 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -161,7 +161,8 @@ enum {
 };
 
 enum {
-	BPF_MAP_VALUE_OFF_F_REF = (1U << 0),
+	BPF_MAP_VALUE_OFF_F_REF    = (1U << 0),
+	BPF_MAP_VALUE_OFF_F_PERCPU = (1U << 1),
 };
 
 struct bpf_map_value_off_desc {
diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c
index 20124f4a421c..eb57584ee0a8 100644
--- a/kernel/bpf/btf.c
+++ b/kernel/bpf/btf.c
@@ -3146,12 +3146,12 @@ static s32 btf_find_by_name_kind_all(const char *name, u32 kind, struct btf **bt
 static int btf_find_field_kptr(const struct btf *btf, const struct btf_type *t,
 			       u32 off, int sz, void *data)
 {
-	bool btf_id_tag = false, ref_tag = false;
+	bool btf_id_tag = false, ref_tag = false, percpu_tag = false;
 	struct bpf_map_value_off *tab;
 	struct bpf_map *map = data;
+	int nr_off, ret, flags = 0;
 	struct module *mod = NULL;
 	struct btf *kernel_btf;
-	int nr_off, ret;
 	s32 id;
 
 	/* For PTR, sz is always == 8 */
@@ -3174,6 +3174,13 @@ static int btf_find_field_kptr(const struct btf *btf, const struct btf_type *t,
 				goto end;
 			}
 			ref_tag = true;
+		} else if (!strcmp("kernel.bpf.percpu", __btf_name_by_offset(btf, t->name_off))) {
+			/* repeated tag */
+			if (percpu_tag) {
+				ret = -EINVAL;
+				goto end;
+			}
+			percpu_tag = true;
 		} else if (!strncmp("kernel.", __btf_name_by_offset(btf, t->name_off),
 			   sizeof("kernel.") - 1)) {
 			/* TODO: Should we reject these when loading BTF? */
@@ -3185,13 +3192,18 @@ static int btf_find_field_kptr(const struct btf *btf, const struct btf_type *t,
 		t = btf_type_by_id(btf, t->type);
 	}
 	if (!btf_id_tag) {
-		/* 'ref' tag must be specified together with 'btf_id' tag */
-		if (ref_tag) {
+		/* 'ref' or 'percpu' tag must be specified together with 'btf_id' tag */
+		if (ref_tag || percpu_tag) {
 			ret = -EINVAL;
 			goto end;
 		}
 		return 0;
 	}
+	/* referenced percpu btf_id pointer is not yet supported */
+	if (ref_tag && percpu_tag) {
+		ret = -EINVAL;
+		goto end;
+	}
 
 	/* Get the base type */
 	if (btf_type_is_modifier(t))
@@ -3241,11 +3253,16 @@ static int btf_find_field_kptr(const struct btf *btf, const struct btf_type *t,
 		}
 	}
 
+	if (ref_tag)
+		flags |= BPF_MAP_VALUE_OFF_F_REF;
+	else if (percpu_tag)
+		flags |= BPF_MAP_VALUE_OFF_F_PERCPU;
+
 	tab->off[nr_off].offset = off;
 	tab->off[nr_off].btf_id = id;
 	tab->off[nr_off].btf    = kernel_btf;
 	tab->off[nr_off].module = mod;
-	tab->off[nr_off].flags  = ref_tag ? BPF_MAP_VALUE_OFF_F_REF : 0;
+	tab->off[nr_off].flags  = flags;
 	tab->nr_off++;
 
 	return 0;
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index a9d8c0d3c919..00d6ab49033d 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -1559,12 +1559,13 @@ static void mark_btf_ld_reg(struct bpf_verifier_env *env,
 			    struct btf *btf, u32 btf_id,
 			    enum bpf_type_flag flag)
 {
-	if (reg_type == SCALAR_VALUE) {
+	if (reg_type == SCALAR_VALUE ||
+	    WARN_ON_ONCE(reg_type != PTR_TO_BTF_ID && reg_type != PTR_TO_PERCPU_BTF_ID)) {
 		mark_reg_unknown(env, regs, regno);
 		return;
 	}
 	mark_reg_known_zero(env, regs, regno);
-	regs[regno].type = PTR_TO_BTF_ID | flag;
+	regs[regno].type = reg_type | flag;
 	regs[regno].btf = btf;
 	regs[regno].btf_id = btf_id;
 }
@@ -3478,10 +3479,18 @@ static int map_ptr_to_btf_id_match_type(struct bpf_verifier_env *env,
 					bool ref_ptr)
 {
 	const char *targ_name = kernel_type_name(off_desc->btf, off_desc->btf_id);
+	enum bpf_reg_type reg_type;
 	const char *reg_name = "";
 
-	if (reg->type != PTR_TO_BTF_ID && reg->type != PTR_TO_BTF_ID_OR_NULL)
-		goto end;
+	if (off_desc->flags & BPF_MAP_VALUE_OFF_F_PERCPU) {
+		if (reg->type != PTR_TO_PERCPU_BTF_ID &&
+		    reg->type != (PTR_TO_PERCPU_BTF_ID | PTR_MAYBE_NULL))
+			goto end;
+	} else { /* referenced and unreferenced case */
+		if (reg->type != PTR_TO_BTF_ID &&
+		    reg->type != (PTR_TO_BTF_ID | PTR_MAYBE_NULL))
+			goto end;
+	}
 
 	if (!btf_is_kernel(reg->btf)) {
 		verbose(env, "R%d must point to kernel BTF\n", regno);
@@ -3524,11 +3533,16 @@ static int map_ptr_to_btf_id_match_type(struct bpf_verifier_env *env,
 	if (!btf_struct_ids_match(&env->log, reg->btf, reg->btf_id, reg->off,
 				  off_desc->btf, off_desc->btf_id))
 		goto end;
+
 	return 0;
 end:
+	if (off_desc->flags & BPF_MAP_VALUE_OFF_F_PERCPU)
+		reg_type = PTR_TO_PERCPU_BTF_ID | PTR_MAYBE_NULL;
+	else
+		reg_type = PTR_TO_BTF_ID | PTR_MAYBE_NULL;
 	verbose(env, "invalid btf_id pointer access, R%d type=%s%s ", regno,
 		reg_type_str(env, reg->type), reg_name);
-	verbose(env, "expected=%s%s\n", reg_type_str(env, PTR_TO_BTF_ID), targ_name);
+	verbose(env, "expected=%s%s\n", reg_type_str(env, reg_type), targ_name);
 	return -EINVAL;
 }
 
@@ -3543,10 +3557,11 @@ static int check_map_ptr_to_btf_id(struct bpf_verifier_env *env, u32 regno, int
 {
 	struct bpf_reg_state *reg = reg_state(env, regno), *val_reg;
 	struct bpf_insn *insn = &env->prog->insnsi[insn_idx];
+	enum bpf_reg_type reg_type = PTR_TO_BTF_ID;
+	bool ref_ptr = false, percpu_ptr = false;
 	struct bpf_map_value_off_desc *off_desc;
 	int insn_class = BPF_CLASS(insn->code);
 	struct bpf_map *map = reg->map_ptr;
-	bool ref_ptr = false;
 	u32 ref_obj_id = 0;
 	int ret;
 
@@ -3561,7 +3576,6 @@ static int check_map_ptr_to_btf_id(struct bpf_verifier_env *env, u32 regno, int
 	off_desc = bpf_map_ptr_off_contains(map, off + reg->var_off.value);
 	if (!off_desc)
 		return 0;
-	ref_ptr = off_desc->flags & BPF_MAP_VALUE_OFF_F_REF;
 
 	if (WARN_ON_ONCE(size != bpf_size_to_bytes(BPF_DW)))
 		return -EACCES;
@@ -3574,6 +3588,11 @@ static int check_map_ptr_to_btf_id(struct bpf_verifier_env *env, u32 regno, int
 		return -EPERM;
 	}
 
+	ref_ptr = off_desc->flags & BPF_MAP_VALUE_OFF_F_REF;
+	percpu_ptr = off_desc->flags & BPF_MAP_VALUE_OFF_F_PERCPU;
+	if (percpu_ptr)
+		reg_type = PTR_TO_PERCPU_BTF_ID;
+
 	if (is_xchg_insn(insn)) {
 		/* We do checks and updates during register fill call for fetch case */
 		if (t != BPF_READ || value_regno < 0)
@@ -3603,7 +3622,7 @@ static int check_map_ptr_to_btf_id(struct bpf_verifier_env *env, u32 regno, int
 			ref_obj_id = ret;
 		}
 		/* val_reg might be NULL at this point */
-		mark_btf_ld_reg(env, cur_regs(env), value_regno, PTR_TO_BTF_ID, off_desc->btf,
+		mark_btf_ld_reg(env, cur_regs(env), value_regno, reg_type, off_desc->btf,
 				off_desc->btf_id, PTR_MAYBE_NULL);
 		/* __mark_ptr_or_null_regs needs ref_obj_id == id to clear
 		 * reference state for ptr == NULL branch.
@@ -3621,7 +3640,7 @@ static int check_map_ptr_to_btf_id(struct bpf_verifier_env *env, u32 regno, int
 		/* We can simply mark the value_regno receiving the pointer
 		 * value from map as PTR_TO_BTF_ID, with the correct type.
 		 */
-		mark_btf_ld_reg(env, cur_regs(env), value_regno, PTR_TO_BTF_ID, off_desc->btf,
+		mark_btf_ld_reg(env, cur_regs(env), value_regno, reg_type, off_desc->btf,
 				off_desc->btf_id, PTR_MAYBE_NULL);
 		val_reg->id = ++env->id_gen;
 	} else if (insn_class == BPF_STX) {
-- 
2.35.1



* [PATCH bpf-next v1 06/15] bpf: Allow storing __user PTR_TO_BTF_ID in map
  2022-02-20 13:47 [PATCH bpf-next v1 00/15] Introduce typed pointer support in BPF maps Kumar Kartikeya Dwivedi
                   ` (4 preceding siblings ...)
  2022-02-20 13:48 ` [PATCH bpf-next v1 05/15] bpf: Allow storing PTR_TO_PERCPU_BTF_ID " Kumar Kartikeya Dwivedi
@ 2022-02-20 13:48 ` Kumar Kartikeya Dwivedi
  2022-02-20 13:48 ` [PATCH bpf-next v1 07/15] bpf: Prevent escaping of pointers loaded from maps Kumar Kartikeya Dwivedi
                   ` (9 subsequent siblings)
  15 siblings, 0 replies; 38+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2022-02-20 13:48 UTC (permalink / raw)
  To: bpf
  Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Toke Høiland-Jørgensen, Jesper Dangaard Brouer,
	netfilter-devel, netdev

Recently, the verifier gained __user annotation support [0], which
prevents a BPF program from directly dereferencing a user memory pointer
in the kernel, and instead requires use of bpf_probe_read_user. We can
allow the user to also store these pointers in BPF maps, with the logic
that whenever such a pointer is loaded from the BPF map, the destination
register gets marked as MEM_USER.

  [0]: https://lore.kernel.org/bpf/20220127154555.650886-1-yhs@fb.com
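
As an illustrative sketch only (the tag names other than
"kernel.bpf.user" are assumed from the rest of the series, and the
struct/field names are made up), storing and using such a pointer could
look like:

	struct map_value {
		struct my_user_struct __attribute__((btf_type_tag("kernel.bpf.btf_id")))
				      __attribute__((btf_type_tag("kernel.bpf.user"))) *uptr;
	};

	/* In the BPF program, after a map lookup returning v. Direct
	 * dereference of u is rejected; bpf_probe_read_user must be used.
	 */
	struct my_user_struct *u = v->uptr;	/* marked PTR_TO_BTF_ID | MEM_USER */
	u64 word;

	if (u)
		bpf_probe_read_user(&word, sizeof(word), u);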

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
---
 include/linux/bpf.h   |  1 +
 kernel/bpf/btf.c      | 20 +++++++++++++++-----
 kernel/bpf/verifier.c | 21 +++++++++++++++------
 3 files changed, 31 insertions(+), 11 deletions(-)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 843c8c01cf9d..37ca92f4c7b7 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -163,6 +163,7 @@ enum {
 enum {
 	BPF_MAP_VALUE_OFF_F_REF    = (1U << 0),
 	BPF_MAP_VALUE_OFF_F_PERCPU = (1U << 1),
+	BPF_MAP_VALUE_OFF_F_USER   = (1U << 2),
 };
 
 struct bpf_map_value_off_desc {
diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c
index eb57584ee0a8..bafceae90c32 100644
--- a/kernel/bpf/btf.c
+++ b/kernel/bpf/btf.c
@@ -3146,7 +3146,7 @@ static s32 btf_find_by_name_kind_all(const char *name, u32 kind, struct btf **bt
 static int btf_find_field_kptr(const struct btf *btf, const struct btf_type *t,
 			       u32 off, int sz, void *data)
 {
-	bool btf_id_tag = false, ref_tag = false, percpu_tag = false;
+	bool btf_id_tag = false, ref_tag = false, percpu_tag = false, user_tag = false;
 	struct bpf_map_value_off *tab;
 	struct bpf_map *map = data;
 	int nr_off, ret, flags = 0;
@@ -3181,6 +3181,13 @@ static int btf_find_field_kptr(const struct btf *btf, const struct btf_type *t,
 				goto end;
 			}
 			percpu_tag = true;
+		} else if (!strcmp("kernel.bpf.user", __btf_name_by_offset(btf, t->name_off))) {
+			/* repeated tag */
+			if (user_tag) {
+				ret = -EINVAL;
+				goto end;
+			}
+			user_tag = true;
 		} else if (!strncmp("kernel.", __btf_name_by_offset(btf, t->name_off),
 			   sizeof("kernel.") - 1)) {
 			/* TODO: Should we reject these when loading BTF? */
@@ -3192,15 +3199,16 @@ static int btf_find_field_kptr(const struct btf *btf, const struct btf_type *t,
 		t = btf_type_by_id(btf, t->type);
 	}
 	if (!btf_id_tag) {
-		/* 'ref' or 'percpu' tag must be specified together with 'btf_id' tag */
-		if (ref_tag || percpu_tag) {
+		/* 'ref', 'percpu', 'user' tag must be specified together with 'btf_id' tag */
+		if (ref_tag || percpu_tag || user_tag) {
 			ret = -EINVAL;
 			goto end;
 		}
 		return 0;
 	}
-	/* referenced percpu btf_id pointer is not yet supported */
-	if (ref_tag && percpu_tag) {
+	/* All three are mutually exclusive */
+	ret = ref_tag + percpu_tag + user_tag;
+	if (ret > 1) {
 		ret = -EINVAL;
 		goto end;
 	}
@@ -3257,6 +3265,8 @@ static int btf_find_field_kptr(const struct btf *btf, const struct btf_type *t,
 		flags |= BPF_MAP_VALUE_OFF_F_REF;
 	else if (percpu_tag)
 		flags |= BPF_MAP_VALUE_OFF_F_PERCPU;
+	else if (user_tag)
+		flags |= BPF_MAP_VALUE_OFF_F_USER;
 
 	tab->off[nr_off].offset = off;
 	tab->off[nr_off].btf_id = id;
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 00d6ab49033d..28da858bb921 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -3482,7 +3482,11 @@ static int map_ptr_to_btf_id_match_type(struct bpf_verifier_env *env,
 	enum bpf_reg_type reg_type;
 	const char *reg_name = "";
 
-	if (off_desc->flags & BPF_MAP_VALUE_OFF_F_PERCPU) {
+	if (off_desc->flags & BPF_MAP_VALUE_OFF_F_USER) {
+		if (reg->type != (PTR_TO_BTF_ID | MEM_USER) &&
+		    reg->type != (PTR_TO_BTF_ID | PTR_MAYBE_NULL | MEM_USER))
+			goto end;
+	} else if (off_desc->flags & BPF_MAP_VALUE_OFF_F_PERCPU) {
 		if (reg->type != PTR_TO_PERCPU_BTF_ID &&
 		    reg->type != (PTR_TO_PERCPU_BTF_ID | PTR_MAYBE_NULL))
 			goto end;
@@ -3536,7 +3540,9 @@ static int map_ptr_to_btf_id_match_type(struct bpf_verifier_env *env,
 
 	return 0;
 end:
-	if (off_desc->flags & BPF_MAP_VALUE_OFF_F_PERCPU)
+	if (off_desc->flags & BPF_MAP_VALUE_OFF_F_USER)
+		reg_type = PTR_TO_BTF_ID | PTR_MAYBE_NULL | MEM_USER;
+	else if (off_desc->flags & BPF_MAP_VALUE_OFF_F_PERCPU)
 		reg_type = PTR_TO_PERCPU_BTF_ID | PTR_MAYBE_NULL;
 	else
 		reg_type = PTR_TO_BTF_ID | PTR_MAYBE_NULL;
@@ -3556,14 +3562,14 @@ static int check_map_ptr_to_btf_id(struct bpf_verifier_env *env, u32 regno, int
 				   struct bpf_reg_state *atomic_load_reg)
 {
 	struct bpf_reg_state *reg = reg_state(env, regno), *val_reg;
+	bool ref_ptr = false, percpu_ptr = false, user_ptr = false;
 	struct bpf_insn *insn = &env->prog->insnsi[insn_idx];
 	enum bpf_reg_type reg_type = PTR_TO_BTF_ID;
-	bool ref_ptr = false, percpu_ptr = false;
 	struct bpf_map_value_off_desc *off_desc;
 	int insn_class = BPF_CLASS(insn->code);
+	int ret, reg_flags = PTR_MAYBE_NULL;
 	struct bpf_map *map = reg->map_ptr;
 	u32 ref_obj_id = 0;
-	int ret;
 
 	/* Things we already checked for in check_map_access:
 	 *  - Reject cases where variable offset may touch BTF ID pointer
@@ -3590,8 +3596,11 @@ static int check_map_ptr_to_btf_id(struct bpf_verifier_env *env, u32 regno, int
 
 	ref_ptr = off_desc->flags & BPF_MAP_VALUE_OFF_F_REF;
 	percpu_ptr = off_desc->flags & BPF_MAP_VALUE_OFF_F_PERCPU;
+	user_ptr = off_desc->flags & BPF_MAP_VALUE_OFF_F_USER;
 	if (percpu_ptr)
 		reg_type = PTR_TO_PERCPU_BTF_ID;
+	else if (user_ptr)
+		reg_flags |= MEM_USER;
 
 	if (is_xchg_insn(insn)) {
 		/* We do checks and updates during register fill call for fetch case */
@@ -3623,7 +3632,7 @@ static int check_map_ptr_to_btf_id(struct bpf_verifier_env *env, u32 regno, int
 		}
 		/* val_reg might be NULL at this point */
 		mark_btf_ld_reg(env, cur_regs(env), value_regno, reg_type, off_desc->btf,
-				off_desc->btf_id, PTR_MAYBE_NULL);
+				off_desc->btf_id, reg_flags);
 		/* __mark_ptr_or_null_regs needs ref_obj_id == id to clear
 		 * reference state for ptr == NULL branch.
 		 */
@@ -3641,7 +3650,7 @@ static int check_map_ptr_to_btf_id(struct bpf_verifier_env *env, u32 regno, int
 		 * value from map as PTR_TO_BTF_ID, with the correct type.
 		 */
 		mark_btf_ld_reg(env, cur_regs(env), value_regno, reg_type, off_desc->btf,
-				off_desc->btf_id, PTR_MAYBE_NULL);
+				off_desc->btf_id, reg_flags);
 		val_reg->id = ++env->id_gen;
 	} else if (insn_class == BPF_STX) {
 		if (WARN_ON_ONCE(value_regno < 0))
-- 
2.35.1


^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH bpf-next v1 07/15] bpf: Prevent escaping of pointers loaded from maps
  2022-02-20 13:47 [PATCH bpf-next v1 00/15] Introduce typed pointer support in BPF maps Kumar Kartikeya Dwivedi
                   ` (5 preceding siblings ...)
  2022-02-20 13:48 ` [PATCH bpf-next v1 06/15] bpf: Allow storing __user PTR_TO_BTF_ID " Kumar Kartikeya Dwivedi
@ 2022-02-20 13:48 ` Kumar Kartikeya Dwivedi
  2022-02-20 13:48 ` [PATCH bpf-next v1 08/15] bpf: Adapt copy_map_value for multiple offset case Kumar Kartikeya Dwivedi
                   ` (8 subsequent siblings)
  15 siblings, 0 replies; 38+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2022-02-20 13:48 UTC (permalink / raw)
  To: bpf
  Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Toke Høiland-Jørgensen, Jesper Dangaard Brouer,
	netfilter-devel, netdev

While we can guarantee that, even for an unreferenced pointer, the
object it points to being freed etc. is handled by the verifier's
exception handling (normal loads are patched to PROBE_MEM loads), we
still cannot allow the user to pass these pointers to BPF helpers and
kfuncs, because the same exception handling is not done for accesses
inside the kernel.

Hence, introduce a new type flag, PTR_UNTRUSTED, which is used to mark
all registers loading an unreferenced PTR_TO_BTF_ID from a BPF map, and
ensure such pointers can never escape the BPF program into the kernel by
way of calling stable/unstable helpers.

Adjust the check in check_mem_access so that check_ptr_to_btf_access is
only called when either no type flag or only the PTR_UNTRUSTED flag is
set.
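
To illustrate the intended effect (a rough sketch, with made-up type and
kfunc names; not taken from the selftests in this series):

	/* v->ptr is an unreferenced btf_id pointer declared in the map value */
	struct my_kernel_type *p = v->ptr;	/* PTR_TO_BTF_ID | PTR_MAYBE_NULL | PTR_UNTRUSTED */

	if (p) {
		x = p->field;		/* allowed: load is converted to a PROBE_MEM load */
		my_type_kfunc(p);	/* rejected: untrusted pointer cannot escape the program */
	}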

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
---
 include/linux/bpf.h   |  7 +++++++
 kernel/bpf/verifier.c | 29 ++++++++++++++++++++++++++---
 2 files changed, 33 insertions(+), 3 deletions(-)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 37ca92f4c7b7..ae599aaf8d4c 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -364,6 +364,13 @@ enum bpf_type_flag {
 	/* MEM is in user address space. */
 	MEM_USER		= BIT(3 + BPF_BASE_TYPE_BITS),
 
+	/* PTR is not trusted. This is only used with PTR_TO_BTF_ID, to mark
+	 * unrefcounted pointers loaded from map value, so that they can only
+	 * be dereferenced but not escape the BPF program into the kernel (i.e.
+	 * cannot be passed as arguments to kfunc or bpf helpers).
+	 */
+	PTR_UNTRUSTED		= BIT(4 + BPF_BASE_TYPE_BITS),
+
 	__BPF_TYPE_LAST_FLAG	= MEM_USER,
 };
 
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 28da858bb921..0a2cd21d9ec1 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -582,6 +582,8 @@ static const char *reg_type_str(struct bpf_verifier_env *env,
 		strncpy(prefix, "alloc_", 32);
 	if (type & MEM_USER)
 		strncpy(prefix, "user_", 32);
+	if (type & PTR_UNTRUSTED)
+		strncpy(prefix, "untrusted_", 32);
 
 	snprintf(env->type_str_buf, TYPE_STR_BUF_LEN, "%s%s%s",
 		 prefix, str[base_type(type)], postfix);
@@ -3490,10 +3492,17 @@ static int map_ptr_to_btf_id_match_type(struct bpf_verifier_env *env,
 		if (reg->type != PTR_TO_PERCPU_BTF_ID &&
 		    reg->type != (PTR_TO_PERCPU_BTF_ID | PTR_MAYBE_NULL))
 			goto end;
-	} else { /* referenced and unreferenced case */
+	} else if (off_desc->flags & BPF_MAP_VALUE_OFF_F_REF) {
+		/* register state (ref_obj_id) must be checked by caller */
 		if (reg->type != PTR_TO_BTF_ID &&
 		    reg->type != (PTR_TO_BTF_ID | PTR_MAYBE_NULL))
 			goto end;
+	} else { /* only unreferenced case accepts untrusted pointers */
+		if (reg->type != PTR_TO_BTF_ID &&
+		    reg->type != (PTR_TO_BTF_ID | PTR_MAYBE_NULL) &&
+		    reg->type != (PTR_TO_BTF_ID | PTR_UNTRUSTED) &&
+		    reg->type != (PTR_TO_BTF_ID | PTR_MAYBE_NULL | PTR_UNTRUSTED))
+			goto end;
 	}
 
 	if (!btf_is_kernel(reg->btf)) {
@@ -3597,10 +3606,13 @@ static int check_map_ptr_to_btf_id(struct bpf_verifier_env *env, u32 regno, int
 	ref_ptr = off_desc->flags & BPF_MAP_VALUE_OFF_F_REF;
 	percpu_ptr = off_desc->flags & BPF_MAP_VALUE_OFF_F_PERCPU;
 	user_ptr = off_desc->flags & BPF_MAP_VALUE_OFF_F_USER;
+
 	if (percpu_ptr)
 		reg_type = PTR_TO_PERCPU_BTF_ID;
 	else if (user_ptr)
 		reg_flags |= MEM_USER;
+	else
+		reg_flags |= PTR_UNTRUSTED;
 
 	if (is_xchg_insn(insn)) {
 		/* We do checks and updates during register fill call for fetch case */
@@ -3629,6 +3641,10 @@ static int check_map_ptr_to_btf_id(struct bpf_verifier_env *env, u32 regno, int
 			if (ret < 0)
 				return ret;
 			ref_obj_id = ret;
+			/* Unset PTR_UNTRUSTED, so that it can be passed to bpf
+			 * helpers or kfunc.
+			 */
+			reg_flags &= ~PTR_UNTRUSTED;
 		}
 		/* val_reg might be NULL at this point */
 		mark_btf_ld_reg(env, cur_regs(env), value_regno, reg_type, off_desc->btf,
@@ -4454,6 +4470,12 @@ static int check_ptr_to_btf_access(struct bpf_verifier_env *env,
 	if (ret < 0)
 		return ret;
 
+	/* If this is an untrusted pointer, all btf_id pointers formed by
+	 * walking it also inherit the untrusted flag.
+	 */
+	if (type_flag(reg->type) & PTR_UNTRUSTED)
+		flag |= PTR_UNTRUSTED;
+
 	if (atype == BPF_READ && value_regno >= 0)
 		mark_btf_ld_reg(env, regs, value_regno, ret, reg->btf, btf_id, flag);
 
@@ -4804,7 +4826,7 @@ static int check_mem_access(struct bpf_verifier_env *env, int insn_idx, u32 regn
 		err = check_tp_buffer_access(env, reg, regno, off, size);
 		if (!err && t == BPF_READ && value_regno >= 0)
 			mark_reg_unknown(env, regs, value_regno);
-	} else if (reg->type == PTR_TO_BTF_ID) {
+	} else if (base_type(reg->type) == PTR_TO_BTF_ID && !(type_flag(reg->type) & ~PTR_UNTRUSTED)) {
 		err = check_ptr_to_btf_access(env, regs, regno, off, size, t,
 					      value_regno);
 	} else if (reg->type == CONST_PTR_TO_MAP) {
@@ -13041,7 +13063,7 @@ static int convert_ctx_accesses(struct bpf_verifier_env *env)
 		if (!ctx_access)
 			continue;
 
-		switch (env->insn_aux_data[i + delta].ptr_type) {
+		switch ((int)env->insn_aux_data[i + delta].ptr_type) {
 		case PTR_TO_CTX:
 			if (!ops->convert_ctx_access)
 				continue;
@@ -13058,6 +13080,7 @@ static int convert_ctx_accesses(struct bpf_verifier_env *env)
 			convert_ctx_access = bpf_xdp_sock_convert_ctx_access;
 			break;
 		case PTR_TO_BTF_ID:
+		case PTR_TO_BTF_ID | PTR_UNTRUSTED:
 			if (type == BPF_READ) {
 				insn->code = BPF_LDX | BPF_PROBE_MEM |
 					BPF_SIZE((insn)->code);
-- 
2.35.1



* [PATCH bpf-next v1 08/15] bpf: Adapt copy_map_value for multiple offset case
  2022-02-20 13:47 [PATCH bpf-next v1 00/15] Introduce typed pointer support in BPF maps Kumar Kartikeya Dwivedi
                   ` (6 preceding siblings ...)
  2022-02-20 13:48 ` [PATCH bpf-next v1 07/15] bpf: Prevent escaping of pointers loaded from maps Kumar Kartikeya Dwivedi
@ 2022-02-20 13:48 ` Kumar Kartikeya Dwivedi
  2022-02-22  7:04   ` Alexei Starovoitov
  2022-02-20 13:48 ` [PATCH bpf-next v1 09/15] bpf: Populate pairs of btf_id and destructor kfunc in btf Kumar Kartikeya Dwivedi
                   ` (7 subsequent siblings)
  15 siblings, 1 reply; 38+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2022-02-20 13:48 UTC (permalink / raw)
  To: bpf
  Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Toke Høiland-Jørgensen, Jesper Dangaard Brouer,
	netfilter-devel, netdev

The changes in this patch deserve a closer look, so they have been split
into their own independent patch. While earlier we only had to skip at
most two objects while copying in and out of the map, now we potentially
have many objects (at most 8 + 2 = 10, due to the BPF_MAP_VALUE_OFF_MAX
limit).

Hence, divide the copy_map_value function into an inlined fast path and
a function call to the slow path. The slow path handles the case of more
than 3 offsets, while the most common cases (0, 1, 2, or 3 offsets) are
handled in the inline function itself.

In copy_map_value_slow, we use up to 11 offsets: map->value_size is
appended as a final sentinel offset, so that the for loop copying the
value is free of edge cases for the last offset, as the size of each
region to copy is always computed from the next offset in the array.
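
For illustration, here is a minimal userspace model of the sentinel
trick (not kernel code; the offsets and sizes are made up): two 8-byte
special fields at offsets 16 and 40 are skipped while copying a 64-byte
value, and the value size 64 acts as the sentinel offset that lets the
loop copy the tail without a special case.

	#include <stdio.h>
	#include <string.h>

	int main(void)
	{
		/* cnt = 2 real offsets, plus the sentinel at index cnt */
		struct { unsigned int off, sz; } off_arr[] = { {16, 8}, {40, 8}, {64, 0} };
		unsigned char src[64], dst[64] = {};
		unsigned int i, cnt = 2;

		memset(src, 0xaa, sizeof(src));
		memcpy(dst, src, off_arr[0].off);		/* copy [0, 16) */
		for (i = 1; i <= cnt; i++) {			/* copy the gaps between fields */
			unsigned int cur = off_arr[i - 1].off + off_arr[i - 1].sz;

			memcpy(dst + cur, src + cur, off_arr[i].off - cur);
		}
		for (i = 0; i < sizeof(dst); i++)		/* skipped bytes stay zero */
			printf("%02x%c", dst[i], (i % 16 == 15) ? '\n' : ' ');
		return 0;
	}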

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
---
 include/linux/bpf.h  | 43 +++++++++++++++++++++++++++++++---
 kernel/bpf/syscall.c | 55 ++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 95 insertions(+), 3 deletions(-)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index ae599aaf8d4c..5d845ca02eba 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -253,12 +253,22 @@ static inline void check_and_init_map_value(struct bpf_map *map, void *dst)
 		memset(dst + map->spin_lock_off, 0, sizeof(struct bpf_spin_lock));
 	if (unlikely(map_value_has_timer(map)))
 		memset(dst + map->timer_off, 0, sizeof(struct bpf_timer));
+	if (unlikely(map_value_has_ptr_to_btf_id(map))) {
+		struct bpf_map_value_off *tab = map->ptr_off_tab;
+		int i;
+
+		for (i = 0; i < tab->nr_off; i++)
+			*(u64 *)(dst + tab->off[i].offset) = 0;
+	}
 }
 
+void copy_map_value_slow(struct bpf_map *map, void *dst, void *src, u32 s_off,
+			 u32 s_sz, u32 t_off, u32 t_sz);
+
 /* copy everything but bpf_spin_lock and bpf_timer. There could be one of each. */
 static inline void copy_map_value(struct bpf_map *map, void *dst, void *src)
 {
-	u32 s_off = 0, s_sz = 0, t_off = 0, t_sz = 0;
+	u32 s_off = 0, s_sz = 0, t_off = 0, t_sz = 0, p_off = 0, p_sz = 0;
 
 	if (unlikely(map_value_has_spin_lock(map))) {
 		s_off = map->spin_lock_off;
@@ -268,13 +278,40 @@ static inline void copy_map_value(struct bpf_map *map, void *dst, void *src)
 		t_off = map->timer_off;
 		t_sz = sizeof(struct bpf_timer);
 	}
+	/* Multiple offset case is slow, offload to function */
+	if (unlikely(map_value_has_ptr_to_btf_id(map))) {
+		struct bpf_map_value_off *tab = map->ptr_off_tab;
+
+		/* Inline the likely common case */
+		if (likely(tab->nr_off == 1)) {
+			p_off = tab->off[0].offset;
+			p_sz = sizeof(u64);
+		} else {
+			copy_map_value_slow(map, dst, src, s_off, s_sz, t_off, t_sz);
+			return;
+		}
+	}
+
+	if (unlikely(s_sz || t_sz || p_sz)) {
+		/* The order is p_off, t_off, s_off, use insertion sort */
 
-	if (unlikely(s_sz || t_sz)) {
+		if (t_off < p_off || !t_sz) {
+			swap(t_off, p_off);
+			swap(t_sz, p_sz);
+		}
 		if (s_off < t_off || !s_sz) {
 			swap(s_off, t_off);
 			swap(s_sz, t_sz);
+			if (t_off < p_off || !t_sz) {
+				swap(t_off, p_off);
+				swap(t_sz, p_sz);
+			}
 		}
-		memcpy(dst, src, t_off);
+
+		memcpy(dst, src, p_off);
+		memcpy(dst + p_off + p_sz,
+		       src + p_off + p_sz,
+		       t_off - p_off - p_sz);
 		memcpy(dst + t_off + t_sz,
 		       src + t_off + t_sz,
 		       s_off - t_off - t_sz);
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index beb96866f34d..83d71d6912f5 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -30,6 +30,7 @@
 #include <linux/pgtable.h>
 #include <linux/bpf_lsm.h>
 #include <linux/poll.h>
+#include <linux/sort.h>
 #include <linux/bpf-netns.h>
 #include <linux/rcupdate_trace.h>
 #include <linux/memcontrol.h>
@@ -230,6 +231,60 @@ static int bpf_map_update_value(struct bpf_map *map, struct fd f, void *key,
 	return err;
 }
 
+static int copy_map_value_cmp(const void *_a, const void *_b)
+{
+	const u32 a = *(const u32 *)_a;
+	const u32 b = *(const u32 *)_b;
+
+	/* We only need to sort based on offset */
+	if (a < b)
+		return -1;
+	else if (a > b)
+		return 1;
+	return 0;
+}
+
+void copy_map_value_slow(struct bpf_map *map, void *dst, void *src, u32 s_off,
+			 u32 s_sz, u32 t_off, u32 t_sz)
+{
+	struct bpf_map_value_off *tab = map->ptr_off_tab; /* already set to non-NULL */
+	/* 3 = 2 for bpf_timer, bpf_spin_lock, 1 for map->value_size sentinel */
+	struct {
+		u32 off;
+		u32 sz;
+	} off_arr[BPF_MAP_VALUE_OFF_MAX + 3];
+	int i, cnt = 0;
+
+	/* Reconsider stack usage when bumping BPF_MAP_VALUE_OFF_MAX */
+	BUILD_BUG_ON(sizeof(off_arr) != 88);
+
+	for (i = 0; i < tab->nr_off; i++) {
+		off_arr[cnt].off = tab->off[i].offset;
+		off_arr[cnt++].sz = sizeof(u64);
+	}
+	if (s_sz) {
+		off_arr[cnt].off = s_off;
+		off_arr[cnt++].sz = s_sz;
+	}
+	if (t_sz) {
+		off_arr[cnt].off = t_off;
+		off_arr[cnt++].sz = t_sz;
+	}
+	off_arr[cnt].off = map->value_size;
+
+	sort(off_arr, cnt, sizeof(off_arr[0]), copy_map_value_cmp, NULL);
+
+	/* There is always at least one element */
+	memcpy(dst, src, off_arr[0].off);
+	/* Copy the rest, while skipping other regions */
+	for (i = 1; i <= cnt; i++) {
+		u32 curr_off = off_arr[i - 1].off + off_arr[i - 1].sz;
+		u32 next_off = off_arr[i].off;
+
+		memcpy(dst + curr_off, src + curr_off, next_off - curr_off);
+	}
+}
+
 static int bpf_map_copy_value(struct bpf_map *map, void *key, void *value,
 			      __u64 flags)
 {
-- 
2.35.1



* [PATCH bpf-next v1 09/15] bpf: Populate pairs of btf_id and destructor kfunc in btf
  2022-02-20 13:47 [PATCH bpf-next v1 00/15] Introduce typed pointer support in BPF maps Kumar Kartikeya Dwivedi
                   ` (7 preceding siblings ...)
  2022-02-20 13:48 ` [PATCH bpf-next v1 08/15] bpf: Adapt copy_map_value for multiple offset case Kumar Kartikeya Dwivedi
@ 2022-02-20 13:48 ` Kumar Kartikeya Dwivedi
  2022-02-20 13:48 ` [PATCH bpf-next v1 10/15] bpf: Wire up freeing of referenced PTR_TO_BTF_ID in map Kumar Kartikeya Dwivedi
                   ` (6 subsequent siblings)
  15 siblings, 0 replies; 38+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2022-02-20 13:48 UTC (permalink / raw)
  To: bpf
  Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Toke Høiland-Jørgensen, Jesper Dangaard Brouer,
	netfilter-devel, netdev

To support storing referenced PTR_TO_BTF_ID in maps, we require
associating a specific BTF ID with a 'destructor' kfunc. This is because
we need to release a live referenced pointer at a certain offset in the
map value from the map destruction path, otherwise we end up leaking
resources.

Hence, introduce support for passing an array of btf_id, kfunc_btf_id
pairs that denote a BTF ID and its associated release function. Then,
add an accessor 'btf_find_dtor_kfunc' which can be used to look up the
destructor kfunc of a certain BTF ID. If found, we can use it to free
the object from the map free path.

The registration of these pairs also serves as a whitelist of structures
which are allowed as referenced PTR_TO_BTF_ID in a BPF map, because
without finding the destructor kfunc, we will bail and return an error.
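
Registration from a subsystem or module init could look roughly like the
sketch below. This is illustrative only: the type and destructor names
are placeholders, and the BTF_ID_LIST/BTF_ID macros from
<linux/btf_ids.h> are used here merely as one way to obtain the IDs.

	BTF_ID_LIST(my_dtor_ids)
	BTF_ID(struct, my_kernel_type)
	BTF_ID(func, my_kernel_type_release)

	static int __init my_subsys_init(void)
	{
		struct btf_id_dtor_kfunc dtors[] = {
			{ .btf_id = my_dtor_ids[0], .kfunc_btf_id = my_dtor_ids[1] },
		};

		return register_btf_id_dtor_kfuncs(dtors, ARRAY_SIZE(dtors), THIS_MODULE);
	}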

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
---
 include/linux/btf.h |  17 +++++++
 kernel/bpf/btf.c    | 109 ++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 126 insertions(+)

diff --git a/include/linux/btf.h b/include/linux/btf.h
index 6592183aeb23..a304a1ea39d9 100644
--- a/include/linux/btf.h
+++ b/include/linux/btf.h
@@ -41,6 +41,11 @@ struct btf_kfunc_id_set {
 	};
 };
 
+struct btf_id_dtor_kfunc {
+	u32 btf_id;
+	u32 kfunc_btf_id;
+};
+
 extern const struct file_operations btf_fops;
 
 void btf_get(struct btf *btf);
@@ -347,6 +352,9 @@ bool btf_kfunc_id_set_contains(const struct btf *btf,
 			       enum btf_kfunc_type type, u32 kfunc_btf_id);
 int register_btf_kfunc_id_set(enum bpf_prog_type prog_type,
 			      const struct btf_kfunc_id_set *s);
+s32 btf_find_dtor_kfunc(struct btf *btf, u32 btf_id);
+int register_btf_id_dtor_kfuncs(const struct btf_id_dtor_kfunc *dtors, u32 add_cnt,
+				struct module *owner);
 #else
 static inline const struct btf_type *btf_type_by_id(const struct btf *btf,
 						    u32 type_id)
@@ -370,6 +378,15 @@ static inline int register_btf_kfunc_id_set(enum bpf_prog_type prog_type,
 {
 	return 0;
 }
+static inline s32 btf_find_dtor_kfunc(struct btf *btf, u32 btf_id)
+{
+	return -ENOENT;
+}
+static inline int register_btf_id_dtor_kfuncs(const struct btf_id_dtor_kfunc *dtors,
+					      u32 add_cnt, struct module *owner)
+{
+	return 0;
+}
 #endif
 
 #endif
diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c
index bafceae90c32..8a6ec1847f17 100644
--- a/kernel/bpf/btf.c
+++ b/kernel/bpf/btf.c
@@ -207,12 +207,18 @@ enum btf_kfunc_hook {
 
 enum {
 	BTF_KFUNC_SET_MAX_CNT = 32,
+	BTF_DTOR_KFUNC_MAX_CNT = 256,
 };
 
 struct btf_kfunc_set_tab {
 	struct btf_id_set *sets[BTF_KFUNC_HOOK_MAX][BTF_KFUNC_TYPE_MAX];
 };
 
+struct btf_id_dtor_kfunc_tab {
+	u32 cnt;
+	struct btf_id_dtor_kfunc dtors[];
+};
+
 struct btf {
 	void *data;
 	struct btf_type **types;
@@ -228,6 +234,7 @@ struct btf {
 	u32 id;
 	struct rcu_head rcu;
 	struct btf_kfunc_set_tab *kfunc_set_tab;
+	struct btf_id_dtor_kfunc_tab *dtor_kfunc_tab;
 
 	/* split BTF support */
 	struct btf *base_btf;
@@ -1572,8 +1579,19 @@ static void btf_free_kfunc_set_tab(struct btf *btf)
 	btf->kfunc_set_tab = NULL;
 }
 
+static void btf_free_dtor_kfunc_tab(struct btf *btf)
+{
+	struct btf_id_dtor_kfunc_tab *tab = btf->dtor_kfunc_tab;
+
+	if (!tab)
+		return;
+	kfree(tab);
+	btf->dtor_kfunc_tab = NULL;
+}
+
 static void btf_free(struct btf *btf)
 {
+	btf_free_dtor_kfunc_tab(btf);
 	btf_free_kfunc_set_tab(btf);
 	kvfree(btf->types);
 	kvfree(btf->resolved_sizes);
@@ -7037,6 +7055,97 @@ int register_btf_kfunc_id_set(enum bpf_prog_type prog_type,
 }
 EXPORT_SYMBOL_GPL(register_btf_kfunc_id_set);
 
+s32 btf_find_dtor_kfunc(struct btf *btf, u32 btf_id)
+{
+	struct btf_id_dtor_kfunc_tab *tab = btf->dtor_kfunc_tab;
+	struct btf_id_dtor_kfunc *dtor;
+
+	if (!tab)
+		return -ENOENT;
+	/* Even though the size of tab->dtors[0] is > sizeof(u32), we only need
+	 * to compare the first u32 with btf_id, so we can reuse btf_id_cmp_func.
+	 */
+	BUILD_BUG_ON(offsetof(struct btf_id_dtor_kfunc, btf_id) != 0);
+	dtor = bsearch(&btf_id, tab->dtors, tab->cnt, sizeof(tab->dtors[0]), btf_id_cmp_func);
+	if (!dtor)
+		return -ENOENT;
+	return dtor->kfunc_btf_id;
+}
+
+/* This function must be invoked only from initcalls/module init functions */
+int register_btf_id_dtor_kfuncs(const struct btf_id_dtor_kfunc *dtors, u32 add_cnt,
+				struct module *owner)
+{
+	struct btf_id_dtor_kfunc_tab *tab;
+	struct btf *btf;
+	u32 tab_cnt;
+	int ret;
+
+	btf = btf_get_module_btf(owner);
+	if (!btf) {
+		if (!owner && IS_ENABLED(CONFIG_DEBUG_INFO_BTF)) {
+			pr_err("missing vmlinux BTF, cannot register dtor kfuncs\n");
+			return -ENOENT;
+		}
+		if (owner && IS_ENABLED(CONFIG_DEBUG_INFO_BTF_MODULES)) {
+			pr_err("missing module BTF, cannot register dtor kfuncs\n");
+			return -ENOENT;
+		}
+		return 0;
+	}
+	if (IS_ERR(btf))
+		return PTR_ERR(btf);
+
+	if (add_cnt >= BTF_DTOR_KFUNC_MAX_CNT) {
+		pr_err("cannot register more than %d kfunc destructors\n", BTF_DTOR_KFUNC_MAX_CNT);
+		ret = -E2BIG;
+		goto end;
+	}
+
+	tab = btf->dtor_kfunc_tab;
+	/* Only one call allowed for modules */
+	if (WARN_ON_ONCE(tab && btf_is_module(btf))) {
+		ret = -EINVAL;
+		goto end;
+	}
+
+	tab_cnt = tab ? tab->cnt : 0;
+	if (tab_cnt > U32_MAX - add_cnt) {
+		ret = -EOVERFLOW;
+		goto end;
+	}
+	if (tab_cnt + add_cnt >= BTF_DTOR_KFUNC_MAX_CNT) {
+		pr_err("cannot register more than %d kfunc destructors\n", BTF_DTOR_KFUNC_MAX_CNT);
+		ret = -E2BIG;
+		goto end;
+	}
+
+	tab = krealloc(btf->dtor_kfunc_tab,
+		       offsetof(struct btf_id_dtor_kfunc_tab, dtors[tab_cnt + add_cnt]),
+		       GFP_KERNEL | __GFP_NOWARN);
+	if (!tab) {
+		ret = -ENOMEM;
+		goto end;
+	}
+
+	if (!btf->dtor_kfunc_tab)
+		tab->cnt = 0;
+	btf->dtor_kfunc_tab = tab;
+
+	memcpy(tab->dtors + tab->cnt, dtors, add_cnt * sizeof(tab->dtors[0]));
+	tab->cnt += add_cnt;
+
+	sort(tab->dtors, tab->cnt, sizeof(tab->dtors[0]), btf_id_cmp_func, NULL);
+
+	return 0;
+end:
+	btf_free_dtor_kfunc_tab(btf);
+	if (btf_is_module(btf))
+		btf_put(btf);
+	return ret;
+}
+EXPORT_SYMBOL_GPL(register_btf_id_dtor_kfuncs);
+
 #define MAX_TYPES_ARE_COMPAT_DEPTH 2
 
 static
-- 
2.35.1



* [PATCH bpf-next v1 10/15] bpf: Wire up freeing of referenced PTR_TO_BTF_ID in map
  2022-02-20 13:47 [PATCH bpf-next v1 00/15] Introduce typed pointer support in BPF maps Kumar Kartikeya Dwivedi
                   ` (8 preceding siblings ...)
  2022-02-20 13:48 ` [PATCH bpf-next v1 09/15] bpf: Populate pairs of btf_id and destructor kfunc in btf Kumar Kartikeya Dwivedi
@ 2022-02-20 13:48 ` Kumar Kartikeya Dwivedi
  2022-02-20 21:43   ` kernel test robot
                     ` (2 more replies)
  2022-02-20 13:48 ` [PATCH bpf-next v1 11/15] bpf: Teach verifier about kptr_get style kfunc helpers Kumar Kartikeya Dwivedi
                   ` (5 subsequent siblings)
  15 siblings, 3 replies; 38+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2022-02-20 13:48 UTC (permalink / raw)
  To: bpf
  Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Toke Høiland-Jørgensen, Jesper Dangaard Brouer,
	netfilter-devel, netdev

A destructor kfunc can be defined as void func(type *), where type may
be void or any other type, as convenient. The kfunc doesn't have to
worry about the map-side pointer width, as it will be passed a proper
pointer after the u64 address embedded in the map is converted.
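
For example, a conforming destructor could be shaped like the sketch
below (illustrative only; the type, its fields, and the release logic
are placeholders):

	void my_kernel_type_release(void *obj)
	{
		struct my_kernel_type *p = obj;

		/* drop the reference that was moved into the map */
		if (refcount_dec_and_test(&p->refcnt))
			kfree_rcu(p, rcu);
	}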

In this patch, we ensure that the type is sane and capture the function
pointer into off_desc of ptr_off_tab for the specific pointer offset,
with the invariant that the dtor pointer is always set when the 'ref'
tag is applied to the pointer's pointee type, which is indicated by the
flag BPF_MAP_VALUE_OFF_F_REF.

Note that only BTF IDs whose destructor kfunc is registered become
allowed BTF IDs for embedding as a referenced PTR_TO_BTF_ID. Hence,
btf_find_dtor_kfunc serves both to find the dtor kfunc BTF ID and to act
as a check against the whitelist of allowed BTF IDs for this purpose.

Finally, wire up the actual freeing of the referenced pointer, if any,
at all available offsets, so that no references are leaked after the BPF
map goes away when the BPF program previously moved ownership of a
referenced pointer into it.

The behavior is similar to BPF timers, where bpf_map_{update,delete}_elem
will free any existing referenced PTR_TO_BTF_ID. The same applies to the
LRU map's bpf_lru_push_free/htab_lru_push_free functions, which are
extended to reset and free referenced pointers.

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
---
 include/linux/bpf.h   |  3 ++
 include/linux/btf.h   |  2 ++
 kernel/bpf/arraymap.c | 13 ++++++--
 kernel/bpf/btf.c      | 72 ++++++++++++++++++++++++++++++++++++++++++-
 kernel/bpf/hashtab.c  | 27 ++++++++++------
 kernel/bpf/syscall.c  | 37 ++++++++++++++++++++--
 6 files changed, 139 insertions(+), 15 deletions(-)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 5d845ca02eba..744f1886cf91 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -23,6 +23,7 @@
 #include <linux/slab.h>
 #include <linux/percpu-refcount.h>
 #include <linux/bpfptr.h>
+#include <linux/btf.h>
 
 struct bpf_verifier_env;
 struct bpf_verifier_log;
@@ -171,6 +172,7 @@ struct bpf_map_value_off_desc {
 	u32 btf_id;
 	struct btf *btf;
 	struct module *module;
+	btf_dtor_kfunc_t dtor; /* only set when flags & BPF_MAP_VALUE_OFF_F_REF is true */
 	int flags;
 };
 
@@ -1568,6 +1570,7 @@ struct bpf_map_value_off_desc *bpf_map_ptr_off_contains(struct bpf_map *map, u32
 void bpf_map_free_ptr_off_tab(struct bpf_map *map);
 struct bpf_map_value_off *bpf_map_copy_ptr_off_tab(const struct bpf_map *map);
 bool bpf_map_equal_ptr_off_tab(const struct bpf_map *map_a, const struct bpf_map *map_b);
+void bpf_map_free_ptr_to_btf_id(struct bpf_map *map, void *map_value);
 
 struct bpf_map *bpf_map_get(u32 ufd);
 struct bpf_map *bpf_map_get_with_uref(u32 ufd);
diff --git a/include/linux/btf.h b/include/linux/btf.h
index a304a1ea39d9..c7e75be9637f 100644
--- a/include/linux/btf.h
+++ b/include/linux/btf.h
@@ -46,6 +46,8 @@ struct btf_id_dtor_kfunc {
 	u32 kfunc_btf_id;
 };
 
+typedef void (*btf_dtor_kfunc_t)(void *);
+
 extern const struct file_operations btf_fops;
 
 void btf_get(struct btf *btf);
diff --git a/kernel/bpf/arraymap.c b/kernel/bpf/arraymap.c
index 7f145aefbff8..de4baca3edd7 100644
--- a/kernel/bpf/arraymap.c
+++ b/kernel/bpf/arraymap.c
@@ -287,10 +287,12 @@ static int array_map_get_next_key(struct bpf_map *map, void *key, void *next_key
 	return 0;
 }
 
-static void check_and_free_timer_in_array(struct bpf_array *arr, void *val)
+static void check_and_free_timer_and_ptr_in_array(struct bpf_array *arr, void *val)
 {
 	if (unlikely(map_value_has_timer(&arr->map)))
 		bpf_timer_cancel_and_free(val + arr->map.timer_off);
+	if (unlikely(map_value_has_ptr_to_btf_id(&arr->map)))
+		bpf_map_free_ptr_to_btf_id(&arr->map, val);
 }
 
 /* Called from syscall or from eBPF program */
@@ -327,7 +329,7 @@ static int array_map_update_elem(struct bpf_map *map, void *key, void *value,
 			copy_map_value_locked(map, val, value, false);
 		else
 			copy_map_value(map, val, value);
-		check_and_free_timer_in_array(array, val);
+		check_and_free_timer_and_ptr_in_array(array, val);
 	}
 	return 0;
 }
@@ -398,6 +400,13 @@ static void array_map_free_timers(struct bpf_map *map)
 static void array_map_free(struct bpf_map *map)
 {
 	struct bpf_array *array = container_of(map, struct bpf_array, map);
+	int i;
+
+	if (unlikely(map_value_has_ptr_to_btf_id(map))) {
+		for (i = 0; i < array->map.max_entries; i++)
+			bpf_map_free_ptr_to_btf_id(map, array->value + array->elem_size * i);
+		bpf_map_free_ptr_off_tab(map);
+	}
 
 	if (array->map.map_type == BPF_MAP_TYPE_PERCPU_ARRAY)
 		bpf_array_free_percpu(array);
diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c
index 8a6ec1847f17..f322967da54b 100644
--- a/kernel/bpf/btf.c
+++ b/kernel/bpf/btf.c
@@ -3170,7 +3170,7 @@ static int btf_find_field_kptr(const struct btf *btf, const struct btf_type *t,
 	int nr_off, ret, flags = 0;
 	struct module *mod = NULL;
 	struct btf *kernel_btf;
-	s32 id;
+	s32 id, dtor_btf_id;
 
 	/* For PTR, sz is always == 8 */
 	if (!btf_type_is_ptr(t))
@@ -3291,9 +3291,79 @@ static int btf_find_field_kptr(const struct btf *btf, const struct btf_type *t,
 	tab->off[nr_off].btf    = kernel_btf;
 	tab->off[nr_off].module = mod;
 	tab->off[nr_off].flags  = flags;
+
+	/* Find and stash the function pointer for the destruction function that
+	 * needs to be eventually invoked from the map free path.
+	 *
+	 * Note that we already took module reference, and the map free path
+	 * always invoked the destructor for BTF ID before freeing ptr_off_tab,
+	 * so calling the function should be safe in that context.
+	 */
+	if (ref_tag) {
+		const struct btf_type *dtor_func, *dtor_func_proto, *t;
+		const struct btf_param *args;
+		const char *dtor_func_name;
+		unsigned long addr;
+		u32 nr_args;
+
+		/* This call also serves as a whitelist of allowed objects that
+		 * can be used as a referenced pointer and be stored in a map at
+		 * the same time.
+		 */
+		dtor_btf_id = btf_find_dtor_kfunc(kernel_btf, id);
+		if (dtor_btf_id < 0) {
+			ret = dtor_btf_id;
+			goto end_mod;
+		}
+
+		dtor_func = btf_type_by_id(kernel_btf, dtor_btf_id);
+		if (!dtor_func || !btf_type_is_func(dtor_func)) {
+			ret = -EINVAL;
+			goto end_mod;
+		}
+
+		dtor_func_proto = btf_type_by_id(kernel_btf, dtor_func->type);
+		if (!dtor_func_proto || !btf_type_is_func_proto(dtor_func_proto)) {
+			ret = -EINVAL;
+			goto end_mod;
+		}
+
+		/* Make sure the prototype of the destructor kfunc is 'void func(type *)' */
+		t = btf_type_by_id(kernel_btf, dtor_func_proto->type);
+		if (!t || !btf_type_is_void(t)) {
+			ret = -EINVAL;
+			goto end_mod;
+		}
+
+		nr_args = btf_type_vlen(dtor_func_proto);
+		args = btf_params(dtor_func_proto);
+
+		t = NULL;
+		if (nr_args)
+			t = btf_type_by_id(kernel_btf, args[0].type);
+		/* Allow any pointer type, as width on targets Linux supports
+		 * will be same for all pointer types (i.e. sizeof(void *))
+		 */
+		if (nr_args != 1 || !t || !btf_type_is_ptr(t)) {
+			ret = -EINVAL;
+			goto end_mod;
+		}
+
+		dtor_func_name = __btf_name_by_offset(kernel_btf, dtor_func->name_off);
+		addr = kallsyms_lookup_name(dtor_func_name);
+		if (!addr) {
+			ret = -EINVAL;
+			goto end_mod;
+		}
+		tab->off[nr_off].dtor = (void *)addr;
+	}
+
 	tab->nr_off++;
 
 	return 0;
+end_mod:
+	if (mod)
+		module_put(mod);
 end_btf:
 	/* Reference is only raised for module BTF */
 	if (btf_is_module(kernel_btf))
diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c
index d29af9988f37..3c33b58e8d3e 100644
--- a/kernel/bpf/hashtab.c
+++ b/kernel/bpf/hashtab.c
@@ -725,12 +725,15 @@ static int htab_lru_map_gen_lookup(struct bpf_map *map,
 	return insn - insn_buf;
 }
 
-static void check_and_free_timer(struct bpf_htab *htab, struct htab_elem *elem)
+static void check_and_free_timer_and_ptr(struct bpf_htab *htab,
+					 struct htab_elem *elem, bool free_ptr)
 {
+	void *map_value = elem->key + round_up(htab->map.key_size, 8);
+
 	if (unlikely(map_value_has_timer(&htab->map)))
-		bpf_timer_cancel_and_free(elem->key +
-					  round_up(htab->map.key_size, 8) +
-					  htab->map.timer_off);
+		bpf_timer_cancel_and_free(map_value + htab->map.timer_off);
+	if (unlikely(map_value_has_ptr_to_btf_id(&htab->map)) && free_ptr)
+		bpf_map_free_ptr_to_btf_id(&htab->map, map_value);
 }
 
 /* It is called from the bpf_lru_list when the LRU needs to delete
@@ -757,7 +760,7 @@ static bool htab_lru_map_delete_node(void *arg, struct bpf_lru_node *node)
 	hlist_nulls_for_each_entry_rcu(l, n, head, hash_node)
 		if (l == tgt_l) {
 			hlist_nulls_del_rcu(&l->hash_node);
-			check_and_free_timer(htab, l);
+			check_and_free_timer_and_ptr(htab, l, true);
 			break;
 		}
 
@@ -829,7 +832,7 @@ static void htab_elem_free(struct bpf_htab *htab, struct htab_elem *l)
 {
 	if (htab->map.map_type == BPF_MAP_TYPE_PERCPU_HASH)
 		free_percpu(htab_elem_get_ptr(l, htab->map.key_size));
-	check_and_free_timer(htab, l);
+	check_and_free_timer_and_ptr(htab, l, true);
 	kfree(l);
 }
 
@@ -857,7 +860,7 @@ static void free_htab_elem(struct bpf_htab *htab, struct htab_elem *l)
 	htab_put_fd_value(htab, l);
 
 	if (htab_is_prealloc(htab)) {
-		check_and_free_timer(htab, l);
+		check_and_free_timer_and_ptr(htab, l, true);
 		__pcpu_freelist_push(&htab->freelist, &l->fnode);
 	} else {
 		atomic_dec(&htab->count);
@@ -1104,7 +1107,7 @@ static int htab_map_update_elem(struct bpf_map *map, void *key, void *value,
 		if (!htab_is_prealloc(htab))
 			free_htab_elem(htab, l_old);
 		else
-			check_and_free_timer(htab, l_old);
+			check_and_free_timer_and_ptr(htab, l_old, true);
 	}
 	ret = 0;
 err:
@@ -1114,7 +1117,7 @@ static int htab_map_update_elem(struct bpf_map *map, void *key, void *value,
 
 static void htab_lru_push_free(struct bpf_htab *htab, struct htab_elem *elem)
 {
-	check_and_free_timer(htab, elem);
+	check_and_free_timer_and_ptr(htab, elem, true);
 	bpf_lru_push_free(&htab->lru, &elem->lru_node);
 }
 
@@ -1420,7 +1423,10 @@ static void htab_free_malloced_timers(struct bpf_htab *htab)
 		struct htab_elem *l;
 
 		hlist_nulls_for_each_entry(l, n, head, hash_node)
-			check_and_free_timer(htab, l);
+			/* We are called from map_release_uref, so we don't free
+			 * ref'd pointers.
+			 */
+			check_and_free_timer_and_ptr(htab, l, false);
 		cond_resched_rcu();
 	}
 	rcu_read_unlock();
@@ -1458,6 +1464,7 @@ static void htab_map_free(struct bpf_map *map)
 	else
 		prealloc_destroy(htab);
 
+	bpf_map_free_ptr_off_tab(map);
 	free_percpu(htab->extra_elems);
 	bpf_map_area_free(htab->buckets);
 	for (i = 0; i < HASHTAB_MAP_LOCK_COUNT; i++)
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 83d71d6912f5..925e8c615ad2 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -638,15 +638,48 @@ bool bpf_map_equal_ptr_off_tab(const struct bpf_map *map_a, const struct bpf_map
 	return !memcmp(tab_a, tab_b, size);
 }
 
+/* Caller must ensure map_value_has_ptr_to_btf_id is true. Note that this
+ * function can be called on a map value while the map_value is visible to BPF
+ * programs, as it ensures the correct synchronization, and we already enforce
+ * the same using the verifier on the BPF program side, esp. for referenced
+ * pointers.
+ */
+void bpf_map_free_ptr_to_btf_id(struct bpf_map *map, void *map_value)
+{
+	struct bpf_map_value_off *tab = map->ptr_off_tab;
+	u64 *btf_id_ptr;
+	int i;
+
+	for (i = 0; i < tab->nr_off; i++) {
+		struct bpf_map_value_off_desc *off_desc = &tab->off[i];
+		u64 old_ptr;
+
+		btf_id_ptr = map_value + off_desc->offset;
+		if (!(off_desc->flags & BPF_MAP_VALUE_OFF_F_REF)) {
+			/* On 32-bit platforms, WRITE_ONCE 64-bit store tearing
+			 * into two 32-bit stores is fine for us, as we only
+			 * permit pointer values to be stored at this address,
+			 * which are word sized, so the other half of 64-bit
+			 * value will always be zeroed.
+			 */
+			WRITE_ONCE(*btf_id_ptr, 0);
+			continue;
+		}
+		old_ptr = xchg(btf_id_ptr, 0);
+		off_desc->dtor((void *)old_ptr);
+	}
+}
+
 /* called from workqueue */
 static void bpf_map_free_deferred(struct work_struct *work)
 {
 	struct bpf_map *map = container_of(work, struct bpf_map, work);
 
 	security_bpf_map_free(map);
-	bpf_map_free_ptr_off_tab(map);
 	bpf_map_release_memcg(map);
-	/* implementation dependent freeing */
+	/* implementation dependent freeing, map_free callback also does
+	 * bpf_map_free_ptr_off_tab, if needed.
+	 */
 	map->ops->map_free(map);
 }
 
-- 
2.35.1



* [PATCH bpf-next v1 11/15] bpf: Teach verifier about kptr_get style kfunc helpers
  2022-02-20 13:47 [PATCH bpf-next v1 00/15] Introduce typed pointer support in BPF maps Kumar Kartikeya Dwivedi
                   ` (9 preceding siblings ...)
  2022-02-20 13:48 ` [PATCH bpf-next v1 10/15] bpf: Wire up freeing of referenced PTR_TO_BTF_ID in map Kumar Kartikeya Dwivedi
@ 2022-02-20 13:48 ` Kumar Kartikeya Dwivedi
  2022-02-20 13:48 ` [PATCH bpf-next v1 12/15] net/netfilter: Add bpf_ct_kptr_get helper Kumar Kartikeya Dwivedi
                   ` (4 subsequent siblings)
  15 siblings, 0 replies; 38+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2022-02-20 13:48 UTC (permalink / raw)
  To: bpf
  Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Toke Høiland-Jørgensen, Jesper Dangaard Brouer,
	netfilter-devel, netdev

We introduce a new style of kfunc helpers, namely *_kptr_get, which take
a pointer to the map value that holds a referenced kernel pointer
contained in the map. Since this is a referenced pointer, only BPF_XCHG
from the kernel and BPF side is allowed to change the current value, and
each pointer that resides in that location is referenced and RCU
protected (which must be kept in mind while adding kernel types
embeddable as a referenced kptr in BPF maps).

This means that if we do the load of the pointer value in an RCU read
section, and find a live pointer, then as long as we hold the RCU read
lock, it won't be freed by a parallel xchg + release operation. This
allows us to implement a safe refcount increment scheme. Hence, enforce
that the first argument of all such kfuncs is a proper PTR_TO_MAP_VALUE
pointing at the right offset of the referenced pointer.

The rest of the arguments are subjected to the typical kfunc argument
checks, hence allowing some flexibility in passing more intent into how
the reference should be taken.

For instance, in the case of struct nf_conn, it is not freed until the
RCU grace period ends, but it can still be reused for another tuple once
its refcount has dropped to zero. Hence, a bpf_ct_kptr_get helper not
only needs to call refcount_inc_not_zero, but must also do a tuple match
after incrementing the reference, and when the match fails, put the
reference again and return NULL.

This can be implemented easily if we allow passing additional parameters
to the bpf_ct_kptr_get kfunc, like a struct bpf_sock_tuple * and a
tuple__sz pair.
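
A minimal sketch of the kernel-side shape of such a helper (placeholder
type and field names; the actual bpf_ct_kptr_get added later in this
series also performs the tuple match described above):

	struct my_kernel_type *bpf_my_type_kptr_get(struct my_kernel_type **kptrp)
	{
		struct my_kernel_type *p;

		rcu_read_lock();
		/* load of the map value slot, paired with BPF_XCHG stores */
		p = READ_ONCE(*kptrp);
		if (p && !refcount_inc_not_zero(&p->refcnt))
			p = NULL;
		rcu_read_unlock();
		return p;
	}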

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
---
 include/linux/btf.h   |  2 ++
 kernel/bpf/btf.c      | 48 +++++++++++++++++++++++++++++++++++++++++--
 kernel/bpf/verifier.c |  9 ++++----
 3 files changed, 53 insertions(+), 6 deletions(-)

diff --git a/include/linux/btf.h b/include/linux/btf.h
index c7e75be9637f..10918ac0e55f 100644
--- a/include/linux/btf.h
+++ b/include/linux/btf.h
@@ -17,6 +17,7 @@ enum btf_kfunc_type {
 	BTF_KFUNC_TYPE_ACQUIRE,
 	BTF_KFUNC_TYPE_RELEASE,
 	BTF_KFUNC_TYPE_RET_NULL,
+	BTF_KFUNC_TYPE_KPTR_ACQUIRE,
 	BTF_KFUNC_TYPE_MAX,
 };
 
@@ -36,6 +37,7 @@ struct btf_kfunc_id_set {
 			struct btf_id_set *acquire_set;
 			struct btf_id_set *release_set;
 			struct btf_id_set *ret_null_set;
+			struct btf_id_set *kptr_acquire_set;
 		};
 		struct btf_id_set *sets[BTF_KFUNC_TYPE_MAX];
 	};
diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c
index f322967da54b..1d112db4c124 100644
--- a/kernel/bpf/btf.c
+++ b/kernel/bpf/btf.c
@@ -6034,11 +6034,11 @@ static int btf_check_func_arg_match(struct bpf_verifier_env *env,
 	struct bpf_verifier_log *log = &env->log;
 	u32 i, nargs, ref_id, ref_obj_id = 0;
 	bool is_kfunc = btf_is_kernel(btf);
+	bool rel = false, kptr_get = false;
 	const char *func_name, *ref_tname;
 	const struct btf_type *t, *ref_t;
 	const struct btf_param *args;
 	int ref_regno = 0;
-	bool rel = false;
 
 	t = btf_type_by_id(btf, func_id);
 	if (!t || !btf_type_is_func(t)) {
@@ -6064,6 +6064,9 @@ static int btf_check_func_arg_match(struct bpf_verifier_env *env,
 		return -EINVAL;
 	}
 
+	if (is_kfunc)
+		kptr_get = btf_kfunc_id_set_contains(btf, resolve_prog_type(env->prog),
+						     BTF_KFUNC_TYPE_KPTR_ACQUIRE, func_id);
 	/* check that BTF function arguments match actual types that the
 	 * verifier sees.
 	 */
@@ -6087,7 +6090,48 @@ static int btf_check_func_arg_match(struct bpf_verifier_env *env,
 
 		ref_t = btf_type_skip_modifiers(btf, t->type, &ref_id);
 		ref_tname = btf_name_by_offset(btf, ref_t->name_off);
-		if (btf_get_prog_ctx_type(log, btf, t,
+		if (i == 0 && is_kfunc && kptr_get) {
+			struct bpf_map_value_off_desc *off_desc;
+
+			if (reg->type != PTR_TO_MAP_VALUE) {
+				bpf_log(log, "arg#0 expected pointer to map value, but got %s\n",
+					btf_type_str(t));
+				return -EINVAL;
+			}
+
+			if (!tnum_is_const(reg->var_off)) {
+				bpf_log(log, "arg#0 cannot have variable offset\n");
+				return -EINVAL;
+			}
+
+			off_desc = bpf_map_ptr_off_contains(reg->map_ptr, reg->off + reg->var_off.value);
+			if (!off_desc || !(off_desc->flags & BPF_MAP_VALUE_OFF_F_REF)) {
+				bpf_log(log, "arg#0 no referenced pointer at map value offset=%llu\n",
+					reg->off + reg->var_off.value);
+				return -EINVAL;
+			}
+
+			if (!btf_type_is_ptr(ref_t)) {
+				bpf_log(log, "arg#0 type must be a double pointer\n");
+				return -EINVAL;
+			}
+
+			ref_t = btf_type_skip_modifiers(btf, ref_t->type, &ref_id);
+			ref_tname = btf_name_by_offset(btf, ref_t->name_off);
+
+			if (!btf_type_is_struct(ref_t)) {
+				bpf_log(log, "kernel function %s args#%d pointer type %s %s is not supported\n",
+					func_name, i, btf_type_str(ref_t), ref_tname);
+				return -EINVAL;
+			}
+			if (!btf_struct_ids_match(log, btf, ref_id, 0, off_desc->btf, off_desc->btf_id)) {
+				bpf_log(log, "kernel function %s args#%d expected pointer to %s %s\n",
+					func_name, i, btf_type_str(ref_t), ref_tname);
+				return -EINVAL;
+			}
+
+			/* rest of the arguments can be anything, like normal kfunc */
+		} else if (btf_get_prog_ctx_type(log, btf, t,
 					  env->prog->type, i)) {
 			/* If function expects ctx type in BTF check that caller
 			 * is passing PTR_TO_CTX.
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 0a2cd21d9ec1..a4ff951ea46f 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -3657,11 +3657,12 @@ static int check_map_ptr_to_btf_id(struct bpf_verifier_env *env, u32 regno, int
 	} else if (insn_class == BPF_LDX) {
 		if (WARN_ON_ONCE(value_regno < 0))
 			return -EFAULT;
+		/* We allow loading referenced pointer, but mark it as
+		 * untrusted. User needs to use a kptr_get helper to obtain a
+		 * trusted refcounted PTR_TO_BTF_ID by passing in the map
+		 * value pointing to the referenced pointer.
+		 */
 		val_reg = reg_state(env, value_regno);
-		if (ref_ptr) {
-			verbose(env, "referenced btf_id pointer can only be accessed using BPF_XCHG\n");
-			return -EACCES;
-		}
 		/* We can simply mark the value_regno receiving the pointer
 		 * value from map as PTR_TO_BTF_ID, with the correct type.
 		 */
-- 
2.35.1


^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH bpf-next v1 12/15] net/netfilter: Add bpf_ct_kptr_get helper
  2022-02-20 13:47 [PATCH bpf-next v1 00/15] Introduce typed pointer support in BPF maps Kumar Kartikeya Dwivedi
                   ` (10 preceding siblings ...)
  2022-02-20 13:48 ` [PATCH bpf-next v1 11/15] bpf: Teach verifier about kptr_get style kfunc helpers Kumar Kartikeya Dwivedi
@ 2022-02-20 13:48 ` Kumar Kartikeya Dwivedi
  2022-02-21  4:35   ` kernel test robot
  2022-02-20 13:48 ` [PATCH bpf-next v1 13/15] libbpf: Add __kptr* macros to bpf_helpers.h Kumar Kartikeya Dwivedi
                   ` (3 subsequent siblings)
  15 siblings, 1 reply; 38+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2022-02-20 13:48 UTC (permalink / raw)
  To: bpf
  Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Toke Høiland-Jørgensen, Jesper Dangaard Brouer,
	netfilter-devel, netdev

This still needs some more feedback on whether the approach is OK, before
refactoring the netfilter functions to share the code that increments the
reference and matches the tuple. It also probably needs work to allow
taking a reference to struct net *, to save another lookup inside this
function.
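
The selftest added later in this series exercises the new helper; the
expected usage from a BPF program looks roughly as follows (a sketch, with
the map lookup and tuple setup omitted):

	struct nf_map_value {
		struct nf_conn __kptr_ref *ct;
	};

	/* v points to a map value of the above type, tuple is a
	 * struct bpf_sock_tuple describing the connection.
	 */
	ct = bpf_ct_kptr_get(&v->ct, &tuple, sizeof(tuple.ipv4),
			     IPPROTO_TCP, IP_CT_DIR_ORIGINAL);
	if (!ct)
		return 0;
	/* ct is now a trusted, acquired reference usable with helpers */
	bpf_ct_release(ct);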

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
---
 include/net/netfilter/nf_conntrack_core.h |  17 +++
 net/netfilter/nf_conntrack_bpf.c          | 132 +++++++++++++++++-----
 net/netfilter/nf_conntrack_core.c         |  17 ---
 3 files changed, 119 insertions(+), 47 deletions(-)

diff --git a/include/net/netfilter/nf_conntrack_core.h b/include/net/netfilter/nf_conntrack_core.h
index 13807ea94cd2..09389769dce3 100644
--- a/include/net/netfilter/nf_conntrack_core.h
+++ b/include/net/netfilter/nf_conntrack_core.h
@@ -51,6 +51,23 @@ nf_conntrack_find_get(struct net *net,
 
 int __nf_conntrack_confirm(struct sk_buff *skb);
 
+static inline bool
+nf_ct_key_equal(struct nf_conntrack_tuple_hash *h,
+		const struct nf_conntrack_tuple *tuple,
+		const struct nf_conntrack_zone *zone,
+		const struct net *net)
+{
+	struct nf_conn *ct = nf_ct_tuplehash_to_ctrack(h);
+
+	/* A conntrack can be recreated with the equal tuple,
+	 * so we need to check that the conntrack is confirmed
+	 */
+	return nf_ct_tuple_equal(tuple, &h->tuple) &&
+	       nf_ct_zone_equal(ct, zone, NF_CT_DIRECTION(h)) &&
+	       nf_ct_is_confirmed(ct) &&
+	       net_eq(net, nf_ct_net(ct));
+}
+
 /* Confirm a connection: returns NF_DROP if packet must be dropped. */
 static inline int nf_conntrack_confirm(struct sk_buff *skb)
 {
diff --git a/net/netfilter/nf_conntrack_bpf.c b/net/netfilter/nf_conntrack_bpf.c
index 8ad3f52579f3..26211a5ec0c4 100644
--- a/net/netfilter/nf_conntrack_bpf.c
+++ b/net/netfilter/nf_conntrack_bpf.c
@@ -52,6 +52,30 @@ enum {
 	NF_BPF_CT_OPTS_SZ = 12,
 };
 
+static int bpf_fill_nf_tuple(struct nf_conntrack_tuple *tuple,
+			     struct bpf_sock_tuple *bpf_tuple, u32 tuple_len)
+{
+	switch (tuple_len) {
+	case sizeof(bpf_tuple->ipv4):
+		tuple->src.l3num = AF_INET;
+		tuple->src.u3.ip = bpf_tuple->ipv4.saddr;
+		tuple->src.u.tcp.port = bpf_tuple->ipv4.sport;
+		tuple->dst.u3.ip = bpf_tuple->ipv4.daddr;
+		tuple->dst.u.tcp.port = bpf_tuple->ipv4.dport;
+		break;
+	case sizeof(bpf_tuple->ipv6):
+		tuple->src.l3num = AF_INET6;
+		memcpy(tuple->src.u3.ip6, bpf_tuple->ipv6.saddr, sizeof(bpf_tuple->ipv6.saddr));
+		tuple->src.u.tcp.port = bpf_tuple->ipv6.sport;
+		memcpy(tuple->dst.u3.ip6, bpf_tuple->ipv6.daddr, sizeof(bpf_tuple->ipv6.daddr));
+		tuple->dst.u.tcp.port = bpf_tuple->ipv6.dport;
+		break;
+	default:
+		return -EAFNOSUPPORT;
+	}
+	return 0;
+}
+
 static struct nf_conn *__bpf_nf_ct_lookup(struct net *net,
 					  struct bpf_sock_tuple *bpf_tuple,
 					  u32 tuple_len, u8 protonum,
@@ -59,6 +83,7 @@ static struct nf_conn *__bpf_nf_ct_lookup(struct net *net,
 {
 	struct nf_conntrack_tuple_hash *hash;
 	struct nf_conntrack_tuple tuple;
+	int ret;
 
 	if (unlikely(protonum != IPPROTO_TCP && protonum != IPPROTO_UDP))
 		return ERR_PTR(-EPROTO);
@@ -66,25 +91,9 @@ static struct nf_conn *__bpf_nf_ct_lookup(struct net *net,
 		return ERR_PTR(-EINVAL);
 
 	memset(&tuple, 0, sizeof(tuple));
-	switch (tuple_len) {
-	case sizeof(bpf_tuple->ipv4):
-		tuple.src.l3num = AF_INET;
-		tuple.src.u3.ip = bpf_tuple->ipv4.saddr;
-		tuple.src.u.tcp.port = bpf_tuple->ipv4.sport;
-		tuple.dst.u3.ip = bpf_tuple->ipv4.daddr;
-		tuple.dst.u.tcp.port = bpf_tuple->ipv4.dport;
-		break;
-	case sizeof(bpf_tuple->ipv6):
-		tuple.src.l3num = AF_INET6;
-		memcpy(tuple.src.u3.ip6, bpf_tuple->ipv6.saddr, sizeof(bpf_tuple->ipv6.saddr));
-		tuple.src.u.tcp.port = bpf_tuple->ipv6.sport;
-		memcpy(tuple.dst.u3.ip6, bpf_tuple->ipv6.daddr, sizeof(bpf_tuple->ipv6.daddr));
-		tuple.dst.u.tcp.port = bpf_tuple->ipv6.dport;
-		break;
-	default:
-		return ERR_PTR(-EAFNOSUPPORT);
-	}
-
+	ret = bpf_fill_nf_tuple(&tuple, bpf_tuple, tuple_len);
+	if (ret < 0)
+		return ERR_PTR(ret);
 	tuple.dst.protonum = protonum;
 
 	if (netns_id >= 0) {
@@ -208,50 +217,113 @@ void bpf_ct_release(struct nf_conn *nfct)
 	nf_ct_put(nfct);
 }
 
+/* TODO: Just a PoC, need to reuse code in __nf_conntrack_find_get for this */
+struct nf_conn *bpf_ct_kptr_get(struct nf_conn **ptr, struct bpf_sock_tuple *bpf_tuple,
+				u32 tuple__sz, u8 protonum, u8 direction)
+{
+	struct nf_conntrack_tuple tuple;
+	struct nf_conn *nfct;
+	struct net *net;
+	u64 *nfct_p;
+	int ret;
+
+	WARN_ON_ONCE(!rcu_read_lock_held());
+
+	if ((protonum != IPPROTO_TCP && protonum != IPPROTO_UDP) ||
+	    (direction != IP_CT_DIR_ORIGINAL && direction != IP_CT_DIR_REPLY))
+		return NULL;
+
+	/* ptr is actually pointer to u64 having address, hence recast u64 load
+	 * to native pointer width.
+	 */
+	nfct_p = (u64 *)ptr;
+	nfct = (struct nf_conn *)READ_ONCE(*nfct_p);
+	if (!nfct || unlikely(!refcount_inc_not_zero(&nfct->ct_general.use)))
+		return NULL;
+
+	memset(&tuple, 0, sizeof(tuple));
+	ret = bpf_fill_nf_tuple(&tuple, bpf_tuple, tuple__sz);
+	if (ret < 0)
+		goto end;
+	tuple.dst.protonum = protonum;
+
+	/* XXX: Need to allow passing in struct net *, or take netns_id, this is non-sense */
+	net = nf_ct_net(nfct);
+	if (!nf_ct_key_equal(&nfct->tuplehash[direction], &tuple,
+			     &nf_ct_zone_dflt, nf_ct_net(nfct)))
+		goto end;
+	return nfct;
+end:
+	nf_ct_put(nfct);
+	return NULL;
+}
+
 __diag_pop()
 
 BTF_SET_START(nf_ct_xdp_check_kfunc_ids)
 BTF_ID(func, bpf_xdp_ct_lookup)
+BTF_ID(func, bpf_ct_kptr_get)
 BTF_ID(func, bpf_ct_release)
 BTF_SET_END(nf_ct_xdp_check_kfunc_ids)
 
 BTF_SET_START(nf_ct_tc_check_kfunc_ids)
 BTF_ID(func, bpf_skb_ct_lookup)
+BTF_ID(func, bpf_ct_kptr_get)
 BTF_ID(func, bpf_ct_release)
 BTF_SET_END(nf_ct_tc_check_kfunc_ids)
 
 BTF_SET_START(nf_ct_acquire_kfunc_ids)
 BTF_ID(func, bpf_xdp_ct_lookup)
 BTF_ID(func, bpf_skb_ct_lookup)
+BTF_ID(func, bpf_ct_kptr_get)
 BTF_SET_END(nf_ct_acquire_kfunc_ids)
 
 BTF_SET_START(nf_ct_release_kfunc_ids)
 BTF_ID(func, bpf_ct_release)
 BTF_SET_END(nf_ct_release_kfunc_ids)
 
+BTF_SET_START(nf_ct_kptr_acquire_kfunc_ids)
+BTF_ID(func, bpf_ct_kptr_get)
+BTF_SET_END(nf_ct_kptr_acquire_kfunc_ids)
+
 /* Both sets are identical */
 #define nf_ct_ret_null_kfunc_ids nf_ct_acquire_kfunc_ids
 
 static const struct btf_kfunc_id_set nf_conntrack_xdp_kfunc_set = {
-	.owner        = THIS_MODULE,
-	.check_set    = &nf_ct_xdp_check_kfunc_ids,
-	.acquire_set  = &nf_ct_acquire_kfunc_ids,
-	.release_set  = &nf_ct_release_kfunc_ids,
-	.ret_null_set = &nf_ct_ret_null_kfunc_ids,
+	.owner            = THIS_MODULE,
+	.check_set        = &nf_ct_xdp_check_kfunc_ids,
+	.acquire_set      = &nf_ct_acquire_kfunc_ids,
+	.release_set      = &nf_ct_release_kfunc_ids,
+	.ret_null_set     = &nf_ct_ret_null_kfunc_ids,
+	.kptr_acquire_set = &nf_ct_kptr_acquire_kfunc_ids,
 };
 
 static const struct btf_kfunc_id_set nf_conntrack_tc_kfunc_set = {
-	.owner        = THIS_MODULE,
-	.check_set    = &nf_ct_tc_check_kfunc_ids,
-	.acquire_set  = &nf_ct_acquire_kfunc_ids,
-	.release_set  = &nf_ct_release_kfunc_ids,
-	.ret_null_set = &nf_ct_ret_null_kfunc_ids,
+	.owner            = THIS_MODULE,
+	.check_set        = &nf_ct_tc_check_kfunc_ids,
+	.acquire_set      = &nf_ct_acquire_kfunc_ids,
+	.release_set      = &nf_ct_release_kfunc_ids,
+	.ret_null_set     = &nf_ct_ret_null_kfunc_ids,
+	.kptr_acquire_set = &nf_ct_kptr_acquire_kfunc_ids,
 };
 
+BTF_ID_LIST(nf_conntrack_dtor_kfunc_ids)
+BTF_ID(struct, nf_conn)
+BTF_ID(func, bpf_ct_release)
+
 int register_nf_conntrack_bpf(void)
 {
+	const struct btf_id_dtor_kfunc nf_conntrack_dtor_kfunc[] = {
+		{
+			.btf_id       = nf_conntrack_dtor_kfunc_ids[0],
+			.kfunc_btf_id = nf_conntrack_dtor_kfunc_ids[1],
+		}
+	};
 	int ret;
 
-	ret = register_btf_kfunc_id_set(BPF_PROG_TYPE_XDP, &nf_conntrack_xdp_kfunc_set);
+	ret = register_btf_id_dtor_kfuncs(nf_conntrack_dtor_kfunc,
+					  ARRAY_SIZE(nf_conntrack_dtor_kfunc),
+					  THIS_MODULE);
+	ret = ret ?: register_btf_kfunc_id_set(BPF_PROG_TYPE_XDP, &nf_conntrack_xdp_kfunc_set);
 	return ret ?: register_btf_kfunc_id_set(BPF_PROG_TYPE_SCHED_CLS, &nf_conntrack_tc_kfunc_set);
 }
diff --git a/net/netfilter/nf_conntrack_core.c b/net/netfilter/nf_conntrack_core.c
index 9b7f9c966f73..0aae98f60769 100644
--- a/net/netfilter/nf_conntrack_core.c
+++ b/net/netfilter/nf_conntrack_core.c
@@ -710,23 +710,6 @@ bool nf_ct_delete(struct nf_conn *ct, u32 portid, int report)
 }
 EXPORT_SYMBOL_GPL(nf_ct_delete);
 
-static inline bool
-nf_ct_key_equal(struct nf_conntrack_tuple_hash *h,
-		const struct nf_conntrack_tuple *tuple,
-		const struct nf_conntrack_zone *zone,
-		const struct net *net)
-{
-	struct nf_conn *ct = nf_ct_tuplehash_to_ctrack(h);
-
-	/* A conntrack can be recreated with the equal tuple,
-	 * so we need to check that the conntrack is confirmed
-	 */
-	return nf_ct_tuple_equal(tuple, &h->tuple) &&
-	       nf_ct_zone_equal(ct, zone, NF_CT_DIRECTION(h)) &&
-	       nf_ct_is_confirmed(ct) &&
-	       net_eq(net, nf_ct_net(ct));
-}
-
 static inline bool
 nf_ct_match(const struct nf_conn *ct1, const struct nf_conn *ct2)
 {
-- 
2.35.1


^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH bpf-next v1 13/15] libbpf: Add __kptr* macros to bpf_helpers.h
  2022-02-20 13:47 [PATCH bpf-next v1 00/15] Introduce typed pointer support in BPF maps Kumar Kartikeya Dwivedi
                   ` (11 preceding siblings ...)
  2022-02-20 13:48 ` [PATCH bpf-next v1 12/15] net/netfilter: Add bpf_ct_kptr_get helper Kumar Kartikeya Dwivedi
@ 2022-02-20 13:48 ` Kumar Kartikeya Dwivedi
  2022-02-20 13:48 ` [PATCH bpf-next v1 14/15] selftests/bpf: Add C tests for PTR_TO_BTF_ID in map Kumar Kartikeya Dwivedi
                   ` (2 subsequent siblings)
  15 siblings, 0 replies; 38+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2022-02-20 13:48 UTC (permalink / raw)
  To: bpf
  Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Toke Høiland-Jørgensen, Jesper Dangaard Brouer,
	netfilter-devel, netdev

Include convenience definitions:
__kptr:		Unreferenced BTF ID pointer
__kptr_ref:	Referenced BTF ID pointer
__kptr_percpu:	per-CPU BTF ID pointer
__kptr_user:	Userspace BTF ID pointer

Users can use these to tag a pointer in the map value definition as one
meant to be used with the new support. Note that these attributes require
https://reviews.llvm.org/D119799 in order to be emitted correctly into BPF
object BTF when applied to a non-builtin type.
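
As a minimal usage sketch (prog_test_ref_kfunc is the test type also used
by the selftests later in this series):

	struct map_value {
		struct prog_test_ref_kfunc __kptr *unref_ptr;
		struct prog_test_ref_kfunc __kptr_ref *ref_ptr;
	};

	struct {
		__uint(type, BPF_MAP_TYPE_ARRAY);
		__type(key, int);
		__type(value, struct map_value);
		__uint(max_entries, 1);
	} array_map SEC(".maps");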

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
---
 tools/lib/bpf/bpf_helpers.h | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/tools/lib/bpf/bpf_helpers.h b/tools/lib/bpf/bpf_helpers.h
index 44df982d2a5c..feb311fe1c72 100644
--- a/tools/lib/bpf/bpf_helpers.h
+++ b/tools/lib/bpf/bpf_helpers.h
@@ -149,6 +149,10 @@ enum libbpf_tristate {
 
 #define __kconfig __attribute__((section(".kconfig")))
 #define __ksym __attribute__((section(".ksyms")))
+#define __kptr __attribute__((btf_type_tag("kernel.bpf.btf_id")))
+#define __kptr_ref __kptr __attribute__((btf_type_tag("kernel.bpf.ref")))
+#define __kptr_percpu __kptr __attribute__((btf_type_tag("kernel.bpf.percpu")))
+#define __kptr_user __kptr __attribute__((btf_type_tag("kernel.bpf.user")))
 
 #ifndef ___bpf_concat
 #define ___bpf_concat(a, b) a ## b
-- 
2.35.1


^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH bpf-next v1 14/15] selftests/bpf: Add C tests for PTR_TO_BTF_ID in map
  2022-02-20 13:47 [PATCH bpf-next v1 00/15] Introduce typed pointer support in BPF maps Kumar Kartikeya Dwivedi
                   ` (12 preceding siblings ...)
  2022-02-20 13:48 ` [PATCH bpf-next v1 13/15] libbpf: Add __kptr* macros to bpf_helpers.h Kumar Kartikeya Dwivedi
@ 2022-02-20 13:48 ` Kumar Kartikeya Dwivedi
  2022-02-20 13:48 ` [PATCH bpf-next v1 15/15] selftests/bpf: Add verifier " Kumar Kartikeya Dwivedi
  2022-02-22  6:05 ` [PATCH bpf-next v1 00/15] Introduce typed pointer support in BPF maps Song Liu
  15 siblings, 0 replies; 38+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2022-02-20 13:48 UTC (permalink / raw)
  To: bpf
  Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Toke Høiland-Jørgensen, Jesper Dangaard Brouer,
	netfilter-devel, netdev

This uses the __kptr* macros as well, and tests the cases that are
supposed to work, since the negative tests live in the test_verifier
suite.

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
---
 .../selftests/bpf/prog_tests/map_btf_ptr.c    |  13 +++
 .../testing/selftests/bpf/progs/map_btf_ptr.c | 105 ++++++++++++++++++
 .../testing/selftests/bpf/progs/test_bpf_nf.c |  31 ++++++
 3 files changed, 149 insertions(+)
 create mode 100644 tools/testing/selftests/bpf/prog_tests/map_btf_ptr.c
 create mode 100644 tools/testing/selftests/bpf/progs/map_btf_ptr.c

diff --git a/tools/testing/selftests/bpf/prog_tests/map_btf_ptr.c b/tools/testing/selftests/bpf/prog_tests/map_btf_ptr.c
new file mode 100644
index 000000000000..8fb6acf1b89d
--- /dev/null
+++ b/tools/testing/selftests/bpf/prog_tests/map_btf_ptr.c
@@ -0,0 +1,13 @@
+#include <test_progs.h>
+
+#include "map_btf_ptr.skel.h"
+
+void test_map_btf_ptr(void)
+{
+	struct map_btf_ptr *skel;
+
+	skel = map_btf_ptr__open_and_load();
+	if (!ASSERT_OK_PTR(skel, "map_btf_ptr__open_and_load"))
+		return;
+	map_btf_ptr__destroy(skel);
+}
diff --git a/tools/testing/selftests/bpf/progs/map_btf_ptr.c b/tools/testing/selftests/bpf/progs/map_btf_ptr.c
new file mode 100644
index 000000000000..b0c2ba595290
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/map_btf_ptr.c
@@ -0,0 +1,105 @@
+#include <vmlinux.h>
+#include <bpf/bpf_tracing.h>
+#include <bpf/bpf_helpers.h>
+
+#define xchg(dst, src) __sync_lock_test_and_set(&(dst), (src))
+
+struct map_value {
+	struct prog_test_ref_kfunc __kptr *unref_ptr;
+	/* Workarounds for https://lore.kernel.org/bpf/20220220071333.sltv4jrwniool2qy@apollo.legion */
+	struct prog_test_ref_kfunc __kptr __attribute__((btf_type_tag("kernel.bpf.ref"))) *ref_ptr;
+	struct prog_test_ref_kfunc __kptr __attribute__((btf_type_tag("kernel.bpf.percpu"))) *percpu_ptr;
+	struct prog_test_ref_kfunc __kptr __attribute__((btf_type_tag("kernel.bpf.user"))) *user_ptr;
+};
+
+struct {
+	__uint(type, BPF_MAP_TYPE_ARRAY);
+	__type(key, int);
+	__type(value, struct map_value);
+	__uint(max_entries, 1);
+} array_map SEC(".maps");
+
+extern struct prog_test_ref_kfunc *bpf_kfunc_call_test_acquire(unsigned long *sp) __ksym;
+extern void bpf_kfunc_call_test_release(struct prog_test_ref_kfunc *p) __ksym;
+
+SEC("tc")
+int map_btf_ptr(struct __sk_buff *ctx)
+{
+	struct prog_test_ref_kfunc *p;
+	char buf[sizeof(*p)];
+	struct map_value *v;
+
+	v = bpf_map_lookup_elem(&array_map, &(int){0});
+	if (!v)
+		return 0;
+	p = v->unref_ptr;
+	/* store untrusted_ptr_or_null_ */
+	v->unref_ptr = p;
+	if (!p)
+		return 0;
+	if (p->a + p->b > 100)
+		return 1;
+	/* store untrusted_ptr_ */
+	v->unref_ptr = p;
+	/* store NULL */
+	v->unref_ptr = NULL;
+
+	p = v->ref_ptr;
+	/* store ptr_or_null_ */
+	v->unref_ptr = p;
+	if (!p)
+		return 0;
+	if (p->a + p->b > 100)
+		return 1;
+	/* store NULL */
+	p = xchg(v->ref_ptr, NULL);
+	if (!p)
+		return 0;
+	if (p->a + p->b > 100) {
+		bpf_kfunc_call_test_release(p);
+		return 1;
+	}
+	/* store ptr_ */
+	v->unref_ptr = p;
+	bpf_kfunc_call_test_release(p);
+
+	p = bpf_kfunc_call_test_acquire(&(unsigned long){0});
+	if (!p)
+		return 0;
+	/* store ptr_ */
+	p = xchg(v->ref_ptr, p);
+	if (!p)
+		return 0;
+	if (p->a + p->b > 100) {
+		bpf_kfunc_call_test_release(p);
+		return 1;
+	}
+	bpf_kfunc_call_test_release(p);
+
+	p = v->percpu_ptr;
+	/* store percpu_ptr_or_null_ */
+	v->percpu_ptr = p;
+	if (!p)
+		return 0;
+	p = bpf_this_cpu_ptr(p);
+	if (p->a + p->b > 100)
+		return 1;
+	/* store percpu_ptr_ */
+	v->percpu_ptr = p;
+	/* store NULL */
+	v->percpu_ptr = NULL;
+
+	p = v->user_ptr;
+	/* store user_ptr_or_null_ */
+	v->user_ptr = p;
+	if (!p)
+		return 0;
+	bpf_probe_read_user(buf, sizeof(buf), p);
+	/* store user_ptr_ */
+	v->user_ptr = p;
+	/* store NULL */
+	v->user_ptr = NULL;
+	return 0;
+}
+
+char _license[] SEC("license") = "GPL";
diff --git a/tools/testing/selftests/bpf/progs/test_bpf_nf.c b/tools/testing/selftests/bpf/progs/test_bpf_nf.c
index f00a9731930e..74e3892be544 100644
--- a/tools/testing/selftests/bpf/progs/test_bpf_nf.c
+++ b/tools/testing/selftests/bpf/progs/test_bpf_nf.c
@@ -30,8 +30,21 @@ struct nf_conn *bpf_xdp_ct_lookup(struct xdp_md *, struct bpf_sock_tuple *, u32,
 				  struct bpf_ct_opts___local *, u32) __ksym;
 struct nf_conn *bpf_skb_ct_lookup(struct __sk_buff *, struct bpf_sock_tuple *, u32,
 				  struct bpf_ct_opts___local *, u32) __ksym;
+struct nf_conn *bpf_ct_kptr_get(struct nf_conn **, struct bpf_sock_tuple *, u32,
+				u8, u8) __ksym;
 void bpf_ct_release(struct nf_conn *) __ksym;
 
+struct nf_map_value {
+	struct nf_conn __kptr_ref *ct;
+};
+
+struct {
+	__uint(type, BPF_MAP_TYPE_ARRAY);
+	__type(key, int);
+	__type(value, struct nf_map_value);
+	__uint(max_entries, 1);
+} array_map SEC(".maps");
+
 static __always_inline void
 nf_ct_test(struct nf_conn *(*func)(void *, struct bpf_sock_tuple *, u32,
 				   struct bpf_ct_opts___local *, u32),
@@ -101,10 +114,27 @@ nf_ct_test(struct nf_conn *(*func)(void *, struct bpf_sock_tuple *, u32,
 		test_eafnosupport = opts_def.error;
 }
 
+static __always_inline void
+nf_ct_test_kptr(void)
+{
+	struct bpf_sock_tuple tuple = {};
+	struct nf_map_value *v;
+	struct nf_conn *ct;
+
+	v = bpf_map_lookup_elem(&array_map, &(int){0});
+	if (!v)
+		return;
+	ct = bpf_ct_kptr_get(&v->ct, &tuple, sizeof(tuple.ipv4), IPPROTO_TCP, IP_CT_DIR_ORIGINAL);
+	if (!ct)
+		return;
+	bpf_ct_release(ct);
+}
+
 SEC("xdp")
 int nf_xdp_ct_test(struct xdp_md *ctx)
 {
 	nf_ct_test((void *)bpf_xdp_ct_lookup, ctx);
+	nf_ct_test_kptr();
 	return 0;
 }
 
@@ -112,6 +142,7 @@ SEC("tc")
 int nf_skb_ct_test(struct __sk_buff *ctx)
 {
 	nf_ct_test((void *)bpf_skb_ct_lookup, ctx);
+	nf_ct_test_kptr();
 	return 0;
 }
 
-- 
2.35.1


^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH bpf-next v1 15/15] selftests/bpf: Add verifier tests for PTR_TO_BTF_ID in map
  2022-02-20 13:47 [PATCH bpf-next v1 00/15] Introduce typed pointer support in BPF maps Kumar Kartikeya Dwivedi
                   ` (13 preceding siblings ...)
  2022-02-20 13:48 ` [PATCH bpf-next v1 14/15] selftests/bpf: Add C tests for PTR_TO_BTF_ID in map Kumar Kartikeya Dwivedi
@ 2022-02-20 13:48 ` Kumar Kartikeya Dwivedi
  2022-02-22  6:05 ` [PATCH bpf-next v1 00/15] Introduce typed pointer support in BPF maps Song Liu
  15 siblings, 0 replies; 38+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2022-02-20 13:48 UTC (permalink / raw)
  To: bpf
  Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Toke Høiland-Jørgensen, Jesper Dangaard Brouer,
	netfilter-devel, netdev

Reuse the bpf_prog_test functions to test support for PTR_TO_BTF_ID in
the BPF map case, including tests that verify implementation sanity and
corner cases.

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
---
 net/bpf/test_run.c                            |  17 +-
 tools/testing/selftests/bpf/test_verifier.c   |  57 +-
 .../selftests/bpf/verifier/map_btf_ptr.c      | 624 ++++++++++++++++++
 3 files changed, 695 insertions(+), 3 deletions(-)
 create mode 100644 tools/testing/selftests/bpf/verifier/map_btf_ptr.c

diff --git a/net/bpf/test_run.c b/net/bpf/test_run.c
index f08034500813..caa289f63849 100644
--- a/net/bpf/test_run.c
+++ b/net/bpf/test_run.c
@@ -1263,8 +1263,23 @@ static const struct btf_kfunc_id_set bpf_prog_test_kfunc_set = {
 	.ret_null_set = &test_sk_ret_null_kfunc_ids,
 };
 
+BTF_ID_LIST(bpf_prog_test_dtor_kfunc_ids)
+BTF_ID(struct, prog_test_ref_kfunc)
+BTF_ID(func, bpf_kfunc_call_test_release)
+
 static int __init bpf_prog_test_run_init(void)
 {
-	return register_btf_kfunc_id_set(BPF_PROG_TYPE_SCHED_CLS, &bpf_prog_test_kfunc_set);
+	const struct btf_id_dtor_kfunc bpf_prog_test_dtor_kfunc[] = {
+		{
+		  .btf_id       = bpf_prog_test_dtor_kfunc_ids[0],
+		  .kfunc_btf_id = bpf_prog_test_dtor_kfunc_ids[1]
+		},
+	};
+	int ret;
+
+	ret = register_btf_kfunc_id_set(BPF_PROG_TYPE_SCHED_CLS, &bpf_prog_test_kfunc_set);
+	return ret ?: register_btf_id_dtor_kfuncs(bpf_prog_test_dtor_kfunc,
+						  ARRAY_SIZE(bpf_prog_test_dtor_kfunc),
+						  THIS_MODULE);
 }
 late_initcall(bpf_prog_test_run_init);
diff --git a/tools/testing/selftests/bpf/test_verifier.c b/tools/testing/selftests/bpf/test_verifier.c
index 92e3465fbae8..9ec0c4457396 100644
--- a/tools/testing/selftests/bpf/test_verifier.c
+++ b/tools/testing/selftests/bpf/test_verifier.c
@@ -54,7 +54,7 @@
 #define MAX_INSNS	BPF_MAXINSNS
 #define MAX_TEST_INSNS	1000000
 #define MAX_FIXUPS	8
-#define MAX_NR_MAPS	22
+#define MAX_NR_MAPS	23
 #define MAX_TEST_RUNS	8
 #define POINTER_VALUE	0xcafe4all
 #define TEST_DATA_LEN	64
@@ -98,6 +98,7 @@ struct bpf_test {
 	int fixup_map_reuseport_array[MAX_FIXUPS];
 	int fixup_map_ringbuf[MAX_FIXUPS];
 	int fixup_map_timer[MAX_FIXUPS];
+	int fixup_map_btf_ptr[MAX_FIXUPS];
 	struct kfunc_btf_id_pair fixup_kfunc_btf_id[MAX_FIXUPS];
 	/* Expected verifier log output for result REJECT or VERBOSE_ACCEPT.
 	 * Can be a tab-separated sequence of expected strings. An empty string
@@ -618,8 +619,13 @@ static int create_cgroup_storage(bool percpu)
  * struct timer {
  *   struct bpf_timer t;
  * };
+ * struct btf_ptr {
+ *   struct prog_test_ref_kfunc __btf_id *ptr;
+ * }
  */
-static const char btf_str_sec[] = "\0bpf_spin_lock\0val\0cnt\0l\0bpf_timer\0timer\0t";
+static const char btf_str_sec[] = "\0bpf_spin_lock\0val\0cnt\0l\0bpf_timer\0timer\0t"
+				  "\0btf_ptr\0prog_test_ref_kfunc\0ptr\0kernel.bpf.btf_id"
+				  "\0kernel.bpf.ref\0kernel.bpf.percpu\0kernel.bpf.user";
 static __u32 btf_raw_types[] = {
 	/* int */
 	BTF_TYPE_INT_ENC(0, BTF_INT_SIGNED, 0, 32, 4),  /* [1] */
@@ -635,6 +641,26 @@ static __u32 btf_raw_types[] = {
 	/* struct timer */                              /* [5] */
 	BTF_TYPE_ENC(35, BTF_INFO_ENC(BTF_KIND_STRUCT, 0, 1), 16),
 	BTF_MEMBER_ENC(41, 4, 0), /* struct bpf_timer t; */
+	/* struct prog_test_ref_kfunc */		/* [6] */
+	BTF_STRUCT_ENC(51, 0, 0),
+	/* type tag "kernel.bpf.btf_id" */
+	BTF_TYPE_TAG_ENC(75, 6),			/* [7] */
+	/* type tag "kernel.bpf.ref" */
+	BTF_TYPE_TAG_ENC(93, 7),			/* [8] */
+	/* type tag "kernel.bpf.percpu" */
+	BTF_TYPE_TAG_ENC(108, 7),			/* [9] */
+	/* type tag "kernel.bpf.user" */
+	BTF_TYPE_TAG_ENC(126, 7),			/* [10] */
+	BTF_PTR_ENC(7),					/* [11] */
+	BTF_PTR_ENC(8),					/* [12] */
+	BTF_PTR_ENC(9),					/* [13] */
+	BTF_PTR_ENC(10),				/* [14] */
+	/* struct btf_ptr */				/* [15] */
+	BTF_STRUCT_ENC(43, 4, 32),
+	BTF_MEMBER_ENC(71, 11, 0), /* struct prog_test_ref_kfunc __kptr *ptr; */
+	BTF_MEMBER_ENC(71, 12, 64), /* struct prog_test_ref_kfunc __kptr_ref *ptr; */
+	BTF_MEMBER_ENC(71, 13, 128), /* struct prog_test_ref_kfunc __kptr_percpu *ptr; */
+	BTF_MEMBER_ENC(71, 14, 192), /* struct prog_test_ref_kfunc __kptr_user *ptr; */
 };
 
 static int load_btf(void)
@@ -724,6 +750,25 @@ static int create_map_timer(void)
 	return fd;
 }
 
+static int create_map_btf_ptr(void)
+{
+	LIBBPF_OPTS(bpf_map_create_opts, opts,
+		.btf_key_type_id = 1,
+		.btf_value_type_id = 15,
+	);
+	int fd, btf_fd;
+
+	btf_fd = load_btf();
+	if (btf_fd < 0)
+		return -1;
+
+	opts.btf_fd = btf_fd;
+	fd = bpf_map_create(BPF_MAP_TYPE_ARRAY, "test_map", 4, 32, 1, &opts);
+	if (fd < 0)
+		printf("Failed to create map with btf_id pointer\n");
+	return fd;
+}
+
 static char bpf_vlog[UINT_MAX >> 8];
 
 static void do_test_fixup(struct bpf_test *test, enum bpf_prog_type prog_type,
@@ -751,6 +796,7 @@ static void do_test_fixup(struct bpf_test *test, enum bpf_prog_type prog_type,
 	int *fixup_map_reuseport_array = test->fixup_map_reuseport_array;
 	int *fixup_map_ringbuf = test->fixup_map_ringbuf;
 	int *fixup_map_timer = test->fixup_map_timer;
+	int *fixup_map_btf_ptr = test->fixup_map_btf_ptr;
 	struct kfunc_btf_id_pair *fixup_kfunc_btf_id = test->fixup_kfunc_btf_id;
 
 	if (test->fill_helper) {
@@ -944,6 +990,13 @@ static void do_test_fixup(struct bpf_test *test, enum bpf_prog_type prog_type,
 			fixup_map_timer++;
 		} while (*fixup_map_timer);
 	}
+	if (*fixup_map_btf_ptr) {
+		map_fds[22] = create_map_btf_ptr();
+		do {
+			prog[*fixup_map_btf_ptr].imm = map_fds[22];
+			fixup_map_btf_ptr++;
+		} while (*fixup_map_btf_ptr);
+	}
 
 	/* Patch in kfunc BTF IDs */
 	if (fixup_kfunc_btf_id->kfunc) {
diff --git a/tools/testing/selftests/bpf/verifier/map_btf_ptr.c b/tools/testing/selftests/bpf/verifier/map_btf_ptr.c
new file mode 100644
index 000000000000..89d854ce90eb
--- /dev/null
+++ b/tools/testing/selftests/bpf/verifier/map_btf_ptr.c
@@ -0,0 +1,624 @@
+/* Common tests */
+{
+	"map_btf_ptr: BPF_ST imm != 0",
+	.insns = {
+	BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
+	BPF_LD_MAP_FD(BPF_REG_6, 0),
+	BPF_MOV64_REG(BPF_REG_1, BPF_REG_6),
+	BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -4),
+	BPF_MOV64_IMM(BPF_REG_0, 0),
+	BPF_ST_MEM(BPF_W, BPF_REG_2, 0, 0),
+	BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_map_lookup_elem),
+	BPF_JMP_IMM(BPF_JNE, BPF_REG_0, 0, 1),
+	BPF_EXIT_INSN(),
+	BPF_ST_MEM(BPF_DW, BPF_REG_0, 0, 1),
+	BPF_EXIT_INSN(),
+	},
+	.fixup_map_btf_ptr = { 1 },
+	.result_unpriv = REJECT,
+	.result = REJECT,
+	.errstr = "BPF_ST imm must be 0 when writing to btf_id pointer at off=0",
+},
+{
+	"map_btf_ptr: size != bpf_size_to_bytes(BPF_DW)",
+	.insns = {
+	BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
+	BPF_LD_MAP_FD(BPF_REG_6, 0),
+	BPF_MOV64_REG(BPF_REG_1, BPF_REG_6),
+	BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -4),
+	BPF_MOV64_IMM(BPF_REG_0, 0),
+	BPF_ST_MEM(BPF_W, BPF_REG_2, 0, 0),
+	BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_map_lookup_elem),
+	BPF_JMP_IMM(BPF_JNE, BPF_REG_0, 0, 1),
+	BPF_EXIT_INSN(),
+	BPF_ST_MEM(BPF_W, BPF_REG_0, 0, 0),
+	BPF_EXIT_INSN(),
+	},
+	.fixup_map_btf_ptr = { 1 },
+	.result_unpriv = REJECT,
+	.result = REJECT,
+	.errstr = "btf_id pointer load/store size must be 8",
+},
+{
+	"map_btf_ptr: map_value non-const var_off",
+	.insns = {
+	BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
+	BPF_LD_MAP_FD(BPF_REG_6, 0),
+	BPF_MOV64_REG(BPF_REG_1, BPF_REG_6),
+	BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -4),
+	BPF_MOV64_IMM(BPF_REG_0, 0),
+	BPF_ST_MEM(BPF_W, BPF_REG_2, 0, 0),
+	BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_map_lookup_elem),
+	BPF_JMP_IMM(BPF_JNE, BPF_REG_0, 0, 1),
+	BPF_EXIT_INSN(),
+	BPF_MOV64_REG(BPF_REG_3, BPF_REG_0),
+	BPF_LDX_MEM(BPF_DW, BPF_REG_2, BPF_REG_0, 0),
+	BPF_JMP_IMM(BPF_JNE, BPF_REG_2, 0, 1),
+	BPF_EXIT_INSN(),
+	BPF_LDX_MEM(BPF_W, BPF_REG_2, BPF_REG_2, 0),
+	BPF_JMP_IMM(BPF_JLE, BPF_REG_2, 4, 1),
+	BPF_EXIT_INSN(),
+	BPF_JMP_IMM(BPF_JGE, BPF_REG_2, 0, 1),
+	BPF_EXIT_INSN(),
+	BPF_ALU64_REG(BPF_ADD, BPF_REG_3, BPF_REG_2),
+	BPF_LDX_MEM(BPF_DW, BPF_REG_0, BPF_REG_3, 0),
+	BPF_EXIT_INSN(),
+	},
+	.fixup_map_btf_ptr = { 1 },
+	.result_unpriv = REJECT,
+	.result = REJECT,
+	.errstr = "btf_id pointer cannot be accessed by variable offset load/store",
+},
+{
+	"map_btf_ptr: unaligned boundary load/store",
+	.insns = {
+	BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
+	BPF_LD_MAP_FD(BPF_REG_6, 0),
+	BPF_MOV64_REG(BPF_REG_1, BPF_REG_6),
+	BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -4),
+	BPF_MOV64_IMM(BPF_REG_0, 0),
+	BPF_ST_MEM(BPF_W, BPF_REG_2, 0, 0),
+	BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_map_lookup_elem),
+	BPF_JMP_IMM(BPF_JNE, BPF_REG_0, 0, 1),
+	BPF_EXIT_INSN(),
+	BPF_ALU64_IMM(BPF_ADD, BPF_REG_0, 7),
+	BPF_ST_MEM(BPF_DW, BPF_REG_0, 0, 0),
+	BPF_EXIT_INSN(),
+	},
+	.fixup_map_btf_ptr = { 1 },
+	.result_unpriv = REJECT,
+	.result = REJECT,
+	.errstr = "btf_id pointer offset incorrect",
+},
+{
+	"map_btf_ptr: reject var_off != 0",
+	.insns = {
+	BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
+	BPF_LD_MAP_FD(BPF_REG_6, 0),
+	BPF_MOV64_REG(BPF_REG_1, BPF_REG_6),
+	BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -4),
+	BPF_MOV64_IMM(BPF_REG_0, 0),
+	BPF_ST_MEM(BPF_W, BPF_REG_2, 0, 0),
+	BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_map_lookup_elem),
+	BPF_JMP_IMM(BPF_JNE, BPF_REG_0, 0, 1),
+	BPF_EXIT_INSN(),
+	BPF_LDX_MEM(BPF_DW, BPF_REG_1, BPF_REG_0, 0),
+	BPF_JMP_IMM(BPF_JNE, BPF_REG_1, 0, 1),
+	BPF_EXIT_INSN(),
+	BPF_LDX_MEM(BPF_W, BPF_REG_2, BPF_REG_1, 0),
+	BPF_JMP_IMM(BPF_JLE, BPF_REG_2, 4, 1),
+	BPF_EXIT_INSN(),
+	BPF_JMP_IMM(BPF_JGE, BPF_REG_2, 0, 1),
+	BPF_EXIT_INSN(),
+	BPF_ALU64_REG(BPF_ADD, BPF_REG_1, BPF_REG_2),
+	BPF_STX_MEM(BPF_DW, BPF_REG_0, BPF_REG_1, 0),
+	BPF_EXIT_INSN(),
+	},
+	.fixup_map_btf_ptr = { 1 },
+	.result_unpriv = REJECT,
+	.result = REJECT,
+	.errstr = "R1 is ptr_prog_test_ref_kfunc invalid variable offset: off=0, var_off=(0x0; 0x7)",
+},
+/* Tests for unreferenced PTR_TO_BTF_ID */
+{
+	"map_btf_ptr: unref: reject btf_struct_ids_match == false",
+	.insns = {
+	BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
+	BPF_LD_MAP_FD(BPF_REG_6, 0),
+	BPF_MOV64_REG(BPF_REG_1, BPF_REG_6),
+	BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -4),
+	BPF_MOV64_IMM(BPF_REG_0, 0),
+	BPF_ST_MEM(BPF_W, BPF_REG_2, 0, 0),
+	BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_map_lookup_elem),
+	BPF_JMP_IMM(BPF_JNE, BPF_REG_0, 0, 1),
+	BPF_EXIT_INSN(),
+	BPF_LDX_MEM(BPF_DW, BPF_REG_1, BPF_REG_0, 0),
+	BPF_JMP_IMM(BPF_JNE, BPF_REG_1, 0, 1),
+	BPF_EXIT_INSN(),
+	BPF_ALU64_IMM(BPF_ADD, BPF_REG_1, 4),
+	BPF_STX_MEM(BPF_DW, BPF_REG_0, BPF_REG_1, 0),
+	BPF_EXIT_INSN(),
+	},
+	.fixup_map_btf_ptr = { 1 },
+	.result_unpriv = REJECT,
+	.result = REJECT,
+	.errstr = "invalid btf_id pointer access, R1 type=untrusted_ptr_prog_test_ref_kfunc expected=ptr_or_null_prog_test",
+},
+{
+	"map_btf_ptr: unref: loaded pointer marked as untrusted",
+	.insns = {
+	BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
+	BPF_LD_MAP_FD(BPF_REG_6, 0),
+	BPF_MOV64_REG(BPF_REG_1, BPF_REG_6),
+	BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -4),
+	BPF_MOV64_IMM(BPF_REG_0, 0),
+	BPF_ST_MEM(BPF_W, BPF_REG_2, 0, 0),
+	BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_map_lookup_elem),
+	BPF_JMP_IMM(BPF_JNE, BPF_REG_0, 0, 1),
+	BPF_EXIT_INSN(),
+	BPF_LDX_MEM(BPF_DW, BPF_REG_0, BPF_REG_0, 0),
+	BPF_LDX_MEM(BPF_W, BPF_REG_0, BPF_REG_0, 0),
+	BPF_EXIT_INSN(),
+	},
+	.fixup_map_btf_ptr = { 1 },
+	.result_unpriv = REJECT,
+	.result = REJECT,
+	.errstr = "R0 invalid mem access 'untrusted_ptr_or_null_'",
+},
+{
+	"map_btf_ptr: unref: correct in kernel type size",
+	.insns = {
+	BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
+	BPF_LD_MAP_FD(BPF_REG_6, 0),
+	BPF_MOV64_REG(BPF_REG_1, BPF_REG_6),
+	BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -4),
+	BPF_MOV64_IMM(BPF_REG_0, 0),
+	BPF_ST_MEM(BPF_W, BPF_REG_2, 0, 0),
+	BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_map_lookup_elem),
+	BPF_JMP_IMM(BPF_JNE, BPF_REG_0, 0, 1),
+	BPF_EXIT_INSN(),
+	BPF_LDX_MEM(BPF_DW, BPF_REG_0, BPF_REG_0, 0),
+	BPF_JMP_IMM(BPF_JNE, BPF_REG_0, 0, 1),
+	BPF_EXIT_INSN(),
+	BPF_LDX_MEM(BPF_DW, BPF_REG_0, BPF_REG_0, 16),
+	BPF_EXIT_INSN(),
+	},
+	.fixup_map_btf_ptr = { 1 },
+	.result_unpriv = REJECT,
+	.result = REJECT,
+	.errstr = "access beyond struct prog_test_ref_kfunc at off 16 size 8",
+},
+{
+	"map_btf_ptr: unref: inherit PTR_UNTRUSTED on struct walk",
+	.insns = {
+	BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
+	BPF_LD_MAP_FD(BPF_REG_6, 0),
+	BPF_MOV64_REG(BPF_REG_1, BPF_REG_6),
+	BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -4),
+	BPF_MOV64_IMM(BPF_REG_0, 0),
+	BPF_ST_MEM(BPF_W, BPF_REG_2, 0, 0),
+	BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_map_lookup_elem),
+	BPF_JMP_IMM(BPF_JNE, BPF_REG_0, 0, 1),
+	BPF_EXIT_INSN(),
+	BPF_LDX_MEM(BPF_DW, BPF_REG_0, BPF_REG_0, 0),
+	BPF_JMP_IMM(BPF_JNE, BPF_REG_0, 0, 1),
+	BPF_EXIT_INSN(),
+	BPF_LDX_MEM(BPF_DW, BPF_REG_1, BPF_REG_0, 8),
+	BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_this_cpu_ptr),
+	BPF_EXIT_INSN(),
+	},
+	.fixup_map_btf_ptr = { 1 },
+	.result_unpriv = REJECT,
+	.result = REJECT,
+	.errstr = "R1 type=untrusted_ptr_ expected=percpu_ptr_",
+},
+{
+	"map_btf_ptr: unref: no reference state created",
+	.insns = {
+	BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
+	BPF_LD_MAP_FD(BPF_REG_6, 0),
+	BPF_MOV64_REG(BPF_REG_1, BPF_REG_6),
+	BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -4),
+	BPF_MOV64_IMM(BPF_REG_0, 0),
+	BPF_ST_MEM(BPF_W, BPF_REG_2, 0, 0),
+	BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_map_lookup_elem),
+	BPF_JMP_IMM(BPF_JNE, BPF_REG_0, 0, 1),
+	BPF_EXIT_INSN(),
+	BPF_LDX_MEM(BPF_DW, BPF_REG_0, BPF_REG_0, 0),
+	BPF_JMP_IMM(BPF_JNE, BPF_REG_0, 0, 1),
+	BPF_EXIT_INSN(),
+	BPF_EXIT_INSN(),
+	},
+	.fixup_map_btf_ptr = { 1 },
+	.result_unpriv = REJECT,
+	.result = ACCEPT,
+},
+{
+	"map_btf_ptr: unref: xchg no reference state created",
+	.insns = {
+	BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
+	BPF_LD_MAP_FD(BPF_REG_6, 0),
+	BPF_MOV64_REG(BPF_REG_1, BPF_REG_6),
+	BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -4),
+	BPF_MOV64_IMM(BPF_REG_0, 0),
+	BPF_ST_MEM(BPF_W, BPF_REG_2, 0, 0),
+	BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_map_lookup_elem),
+	BPF_JMP_IMM(BPF_JNE, BPF_REG_0, 0, 1),
+	BPF_EXIT_INSN(),
+	BPF_MOV64_IMM(BPF_REG_1, 0),
+	BPF_ATOMIC_OP(BPF_DW, BPF_XCHG, BPF_REG_0, BPF_REG_1, 0),
+	BPF_MOV64_IMM(BPF_REG_0, 0),
+	BPF_JMP_IMM(BPF_JNE, BPF_REG_1, 0, 1),
+	BPF_EXIT_INSN(),
+	BPF_EXIT_INSN(),
+	},
+	.fixup_map_btf_ptr = { 1 },
+	.result_unpriv = REJECT,
+	.result = ACCEPT,
+},
+/* Tests for referenced PTR_TO_BTF_ID */
+{
+	"map_btf_ptr: ref: loaded pointer marked as untrusted",
+	.insns = {
+	BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
+	BPF_LD_MAP_FD(BPF_REG_6, 0),
+	BPF_MOV64_REG(BPF_REG_1, BPF_REG_6),
+	BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -4),
+	BPF_MOV64_IMM(BPF_REG_0, 0),
+	BPF_ST_MEM(BPF_W, BPF_REG_2, 0, 0),
+	BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_map_lookup_elem),
+	BPF_JMP_IMM(BPF_JNE, BPF_REG_0, 0, 1),
+	BPF_EXIT_INSN(),
+	BPF_MOV64_IMM(BPF_REG_1, 0),
+	BPF_LDX_MEM(BPF_DW, BPF_REG_1, BPF_REG_0, 8),
+	BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_this_cpu_ptr),
+	BPF_EXIT_INSN(),
+	},
+	.fixup_map_btf_ptr = { 1 },
+	.result_unpriv = REJECT,
+	.result = REJECT,
+	.errstr = "R1 type=untrusted_ptr_or_null_ expected=percpu_ptr_",
+},
+{
+	"map_btf_ptr: ref: reject off != 0",
+	.insns = {
+	BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
+	BPF_LD_MAP_FD(BPF_REG_6, 0),
+	BPF_MOV64_REG(BPF_REG_1, BPF_REG_6),
+	BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -4),
+	BPF_MOV64_IMM(BPF_REG_0, 0),
+	BPF_ST_MEM(BPF_W, BPF_REG_2, 0, 0),
+	BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_map_lookup_elem),
+	BPF_JMP_IMM(BPF_JNE, BPF_REG_0, 0, 1),
+	BPF_EXIT_INSN(),
+	BPF_MOV64_IMM(BPF_REG_1, 0),
+	BPF_ATOMIC_OP(BPF_DW, BPF_XCHG, BPF_REG_0, BPF_REG_1, 8),
+	BPF_JMP_IMM(BPF_JNE, BPF_REG_1, 0, 1),
+	BPF_EXIT_INSN(),
+	BPF_ALU64_IMM(BPF_ADD, BPF_REG_1, 4),
+	BPF_ATOMIC_OP(BPF_DW, BPF_XCHG, BPF_REG_0, BPF_REG_1, 8),
+	BPF_EXIT_INSN(),
+	},
+	.fixup_map_btf_ptr = { 1 },
+	.result_unpriv = REJECT,
+	.result = REJECT,
+	.errstr = "R1 stored to referenced btf_id pointer cannot have non-zero offset",
+},
+{
+	"map_btf_ptr: ref: reference state created on xchg",
+	.insns = {
+	BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
+	BPF_LD_MAP_FD(BPF_REG_6, 0),
+	BPF_MOV64_REG(BPF_REG_1, BPF_REG_6),
+	BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -4),
+	BPF_MOV64_IMM(BPF_REG_0, 0),
+	BPF_ST_MEM(BPF_W, BPF_REG_2, 0, 0),
+	BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_map_lookup_elem),
+	BPF_JMP_IMM(BPF_JNE, BPF_REG_0, 0, 1),
+	BPF_EXIT_INSN(),
+	BPF_MOV64_REG(BPF_REG_7, BPF_REG_0),
+	BPF_MOV64_REG(BPF_REG_1, BPF_REG_10),
+	BPF_ALU64_IMM(BPF_ADD, BPF_REG_1, -8),
+	BPF_ST_MEM(BPF_DW, BPF_REG_1, 0, 0),
+	BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, BPF_PSEUDO_KFUNC_CALL, 0, 0),
+	BPF_JMP_IMM(BPF_JNE, BPF_REG_0, 0, 1),
+	BPF_EXIT_INSN(),
+	BPF_ATOMIC_OP(BPF_DW, BPF_XCHG, BPF_REG_7, BPF_REG_0, 8),
+	BPF_JMP_IMM(BPF_JNE, BPF_REG_0, 0, 1),
+	BPF_EXIT_INSN(),
+	BPF_MOV64_IMM(BPF_REG_0, 0),
+	BPF_EXIT_INSN(),
+	},
+	.prog_type = BPF_PROG_TYPE_SCHED_CLS,
+	.fixup_map_btf_ptr = { 1 },
+	.result_unpriv = REJECT,
+	.result = REJECT,
+	.errstr = "Unreleased reference id=4 alloc_insn=17",
+	.fixup_kfunc_btf_id = {
+		{ "bpf_kfunc_call_test_acquire", 14 },
+	}
+},
+{
+	"map_btf_ptr: ref: reference state cleared for src_reg",
+	.insns = {
+	BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
+	BPF_LD_MAP_FD(BPF_REG_6, 0),
+	BPF_MOV64_REG(BPF_REG_1, BPF_REG_6),
+	BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -4),
+	BPF_MOV64_IMM(BPF_REG_0, 0),
+	BPF_ST_MEM(BPF_W, BPF_REG_2, 0, 0),
+	BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_map_lookup_elem),
+	BPF_JMP_IMM(BPF_JNE, BPF_REG_0, 0, 1),
+	BPF_EXIT_INSN(),
+	BPF_MOV64_REG(BPF_REG_7, BPF_REG_0),
+	BPF_MOV64_REG(BPF_REG_1, BPF_REG_10),
+	BPF_ALU64_IMM(BPF_ADD, BPF_REG_1, -8),
+	BPF_ST_MEM(BPF_DW, BPF_REG_1, 0, 0),
+	BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, BPF_PSEUDO_KFUNC_CALL, 0, 0),
+	BPF_JMP_IMM(BPF_JNE, BPF_REG_0, 0, 1),
+	BPF_EXIT_INSN(),
+	BPF_ATOMIC_OP(BPF_DW, BPF_XCHG, BPF_REG_7, BPF_REG_0, 8),
+	BPF_JMP_IMM(BPF_JNE, BPF_REG_0, 0, 1),
+	BPF_EXIT_INSN(),
+	BPF_MOV64_REG(BPF_REG_1, BPF_REG_0),
+	BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, BPF_PSEUDO_KFUNC_CALL, 0, 0),
+	BPF_MOV64_IMM(BPF_REG_0, 0),
+	BPF_EXIT_INSN(),
+	},
+	.prog_type = BPF_PROG_TYPE_SCHED_CLS,
+	.fixup_map_btf_ptr = { 1 },
+	.result_unpriv = REJECT,
+	.result = ACCEPT,
+	.fixup_kfunc_btf_id = {
+		{ "bpf_kfunc_call_test_acquire", 14 },
+		{ "bpf_kfunc_call_test_release", 21 },
+	}
+},
+{
+	"map_btf_ptr: ref: reject STX",
+	.insns = {
+	BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
+	BPF_LD_MAP_FD(BPF_REG_6, 0),
+	BPF_MOV64_REG(BPF_REG_1, BPF_REG_6),
+	BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -4),
+	BPF_MOV64_IMM(BPF_REG_0, 0),
+	BPF_ST_MEM(BPF_W, BPF_REG_2, 0, 0),
+	BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_map_lookup_elem),
+	BPF_JMP_IMM(BPF_JNE, BPF_REG_0, 0, 1),
+	BPF_EXIT_INSN(),
+	BPF_MOV64_REG(BPF_REG_1, 0),
+	BPF_STX_MEM(BPF_DW, BPF_REG_0, BPF_REG_1, 8),
+	BPF_EXIT_INSN(),
+	},
+	.fixup_map_btf_ptr = { 1 },
+	.result_unpriv = REJECT,
+	.result = REJECT,
+	.errstr = "referenced btf_id pointer can only be accessed using BPF_XCHG",
+},
+{
+	"map_btf_ptr: ref: reject ST",
+	.insns = {
+	BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
+	BPF_LD_MAP_FD(BPF_REG_6, 0),
+	BPF_MOV64_REG(BPF_REG_1, BPF_REG_6),
+	BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -4),
+	BPF_MOV64_IMM(BPF_REG_0, 0),
+	BPF_ST_MEM(BPF_W, BPF_REG_2, 0, 0),
+	BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_map_lookup_elem),
+	BPF_JMP_IMM(BPF_JNE, BPF_REG_0, 0, 1),
+	BPF_EXIT_INSN(),
+	BPF_ST_MEM(BPF_DW, BPF_REG_0, 8, 0),
+	BPF_EXIT_INSN(),
+	},
+	.fixup_map_btf_ptr = { 1 },
+	.result_unpriv = REJECT,
+	.result = REJECT,
+	.errstr = "referenced btf_id pointer can only be accessed using BPF_XCHG",
+},
+/* Tests for PTR_TO_PERCPU_BTF_ID */
+{
+	"map_btf_ptr: percpu: loaded pointer marked as percpu",
+	.insns = {
+	BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
+	BPF_LD_MAP_FD(BPF_REG_6, 0),
+	BPF_MOV64_REG(BPF_REG_1, BPF_REG_6),
+	BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -4),
+	BPF_MOV64_IMM(BPF_REG_0, 0),
+	BPF_ST_MEM(BPF_W, BPF_REG_2, 0, 0),
+	BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_map_lookup_elem),
+	BPF_JMP_IMM(BPF_JNE, BPF_REG_0, 0, 1),
+	BPF_EXIT_INSN(),
+	BPF_LDX_MEM(BPF_DW, BPF_REG_1, BPF_REG_0, 16),
+	BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_this_cpu_ptr),
+	BPF_EXIT_INSN(),
+	},
+	.fixup_map_btf_ptr = { 1 },
+	.result_unpriv = REJECT,
+	.result = REJECT,
+	.errstr = "R1 type=percpu_ptr_or_null_ expected=percpu_ptr_",
+},
+{
+	"map_btf_ptr: percpu: reject store of untrusted_ptr_",
+	.insns = {
+	BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
+	BPF_LD_MAP_FD(BPF_REG_6, 0),
+	BPF_MOV64_REG(BPF_REG_1, BPF_REG_6),
+	BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -4),
+	BPF_MOV64_IMM(BPF_REG_0, 0),
+	BPF_ST_MEM(BPF_W, BPF_REG_2, 0, 0),
+	BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_map_lookup_elem),
+	BPF_JMP_IMM(BPF_JNE, BPF_REG_0, 0, 1),
+	BPF_EXIT_INSN(),
+	BPF_LDX_MEM(BPF_DW, BPF_REG_1, BPF_REG_0, 8),
+	BPF_JMP_IMM(BPF_JNE, BPF_REG_1, 0, 1),
+	BPF_EXIT_INSN(),
+	BPF_STX_MEM(BPF_DW, BPF_REG_0, BPF_REG_1, 16),
+	BPF_EXIT_INSN(),
+	},
+	.fixup_map_btf_ptr = { 1 },
+	.result_unpriv = REJECT,
+	.result = REJECT,
+	.errstr = "invalid btf_id pointer access, R1 type=untrusted_ptr_ expected=percpu_ptr_or_null_",
+},
+{
+	"map_btf_ptr: percpu: reject store of ptr_",
+	.insns = {
+	BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
+	BPF_LD_MAP_FD(BPF_REG_6, 0),
+	BPF_MOV64_REG(BPF_REG_1, BPF_REG_6),
+	BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -4),
+	BPF_MOV64_IMM(BPF_REG_0, 0),
+	BPF_ST_MEM(BPF_W, BPF_REG_2, 0, 0),
+	BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_map_lookup_elem),
+	BPF_JMP_IMM(BPF_JNE, BPF_REG_0, 0, 1),
+	BPF_EXIT_INSN(),
+	BPF_MOV64_IMM(BPF_REG_1, 0),
+	BPF_ATOMIC_OP(BPF_DW, BPF_XCHG, BPF_REG_0, BPF_REG_1, 8),
+	BPF_JMP_IMM(BPF_JNE, BPF_REG_1, 0, 1),
+	BPF_EXIT_INSN(),
+	BPF_STX_MEM(BPF_DW, BPF_REG_0, BPF_REG_1, 16),
+	BPF_EXIT_INSN(),
+	},
+	.fixup_map_btf_ptr = { 1 },
+	.result_unpriv = REJECT,
+	.result = REJECT,
+	.errstr = "invalid btf_id pointer access, R1 type=ptr_ expected=percpu_ptr_or_null_",
+},
+{
+	"map_btf_ptr: percpu: reject store of user_ptr_",
+	.insns = {
+	BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
+	BPF_LD_MAP_FD(BPF_REG_6, 0),
+	BPF_MOV64_REG(BPF_REG_1, BPF_REG_6),
+	BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -4),
+	BPF_MOV64_IMM(BPF_REG_0, 0),
+	BPF_ST_MEM(BPF_W, BPF_REG_2, 0, 0),
+	BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_map_lookup_elem),
+	BPF_JMP_IMM(BPF_JNE, BPF_REG_0, 0, 1),
+	BPF_EXIT_INSN(),
+	BPF_LDX_MEM(BPF_DW, BPF_REG_1, BPF_REG_0, 24),
+	BPF_JMP_IMM(BPF_JNE, BPF_REG_1, 0, 1),
+	BPF_EXIT_INSN(),
+	BPF_STX_MEM(BPF_DW, BPF_REG_0, BPF_REG_1, 16),
+	BPF_EXIT_INSN(),
+	},
+	.fixup_map_btf_ptr = { 1 },
+	.result_unpriv = REJECT,
+	.result = REJECT,
+	.errstr = "invalid btf_id pointer access, R1 type=user_ptr_ expected=percpu_ptr_or_null_",
+},
+/* Tests for PTR_TO_BTF_ID | MEM_USER */
+{
+	"map_btf_ptr: user: loaded pointer marked as user",
+	.insns = {
+	BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
+	BPF_LD_MAP_FD(BPF_REG_6, 0),
+	BPF_MOV64_REG(BPF_REG_1, BPF_REG_6),
+	BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -4),
+	BPF_MOV64_IMM(BPF_REG_0, 0),
+	BPF_ST_MEM(BPF_W, BPF_REG_2, 0, 0),
+	BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_map_lookup_elem),
+	BPF_JMP_IMM(BPF_JNE, BPF_REG_0, 0, 1),
+	BPF_EXIT_INSN(),
+	BPF_LDX_MEM(BPF_DW, BPF_REG_1, BPF_REG_0, 24),
+	BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_this_cpu_ptr),
+	BPF_EXIT_INSN(),
+	},
+	.fixup_map_btf_ptr = { 1 },
+	.result_unpriv = REJECT,
+	.result = REJECT,
+	.errstr = "R1 type=user_ptr_or_null_ expected=percpu_ptr_",
+},
+{
+	"map_btf_ptr: user: reject user pointer deref",
+	.insns = {
+	BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
+	BPF_LD_MAP_FD(BPF_REG_6, 0),
+	BPF_MOV64_REG(BPF_REG_1, BPF_REG_6),
+	BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -4),
+	BPF_MOV64_IMM(BPF_REG_0, 0),
+	BPF_ST_MEM(BPF_W, BPF_REG_2, 0, 0),
+	BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_map_lookup_elem),
+	BPF_JMP_IMM(BPF_JNE, BPF_REG_0, 0, 1),
+	BPF_EXIT_INSN(),
+	BPF_LDX_MEM(BPF_DW, BPF_REG_1, BPF_REG_0, 24),
+	BPF_JMP_IMM(BPF_JNE, BPF_REG_1, 0, 1),
+	BPF_EXIT_INSN(),
+	BPF_LDX_MEM(BPF_W, BPF_REG_1, BPF_REG_1, 8),
+	BPF_EXIT_INSN(),
+	},
+	.fixup_map_btf_ptr = { 1 },
+	.result_unpriv = REJECT,
+	.result = REJECT,
+	.errstr = "R1 invalid mem access 'user_ptr_'",
+},
+{
+	"map_btf_ptr: user: reject store of untrusted_ptr_",
+	.insns = {
+	BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
+	BPF_LD_MAP_FD(BPF_REG_6, 0),
+	BPF_MOV64_REG(BPF_REG_1, BPF_REG_6),
+	BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -4),
+	BPF_MOV64_IMM(BPF_REG_0, 0),
+	BPF_ST_MEM(BPF_W, BPF_REG_2, 0, 0),
+	BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_map_lookup_elem),
+	BPF_JMP_IMM(BPF_JNE, BPF_REG_0, 0, 1),
+	BPF_EXIT_INSN(),
+	BPF_LDX_MEM(BPF_DW, BPF_REG_1, BPF_REG_0, 8),
+	BPF_JMP_IMM(BPF_JNE, BPF_REG_1, 0, 1),
+	BPF_EXIT_INSN(),
+	BPF_STX_MEM(BPF_DW, BPF_REG_0, BPF_REG_1, 24),
+	BPF_EXIT_INSN(),
+	},
+	.fixup_map_btf_ptr = { 1 },
+	.result_unpriv = REJECT,
+	.result = REJECT,
+	.errstr = "invalid btf_id pointer access, R1 type=untrusted_ptr_ expected=user_ptr_or_null_",
+},
+{
+	"map_btf_ptr: user: reject store of ptr_",
+	.insns = {
+	BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
+	BPF_LD_MAP_FD(BPF_REG_6, 0),
+	BPF_MOV64_REG(BPF_REG_1, BPF_REG_6),
+	BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -4),
+	BPF_MOV64_IMM(BPF_REG_0, 0),
+	BPF_ST_MEM(BPF_W, BPF_REG_2, 0, 0),
+	BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_map_lookup_elem),
+	BPF_JMP_IMM(BPF_JNE, BPF_REG_0, 0, 1),
+	BPF_EXIT_INSN(),
+	BPF_MOV64_IMM(BPF_REG_1, 0),
+	BPF_ATOMIC_OP(BPF_DW, BPF_XCHG, BPF_REG_0, BPF_REG_1, 8),
+	BPF_JMP_IMM(BPF_JNE, BPF_REG_1, 0, 1),
+	BPF_EXIT_INSN(),
+	BPF_STX_MEM(BPF_DW, BPF_REG_0, BPF_REG_1, 24),
+	BPF_EXIT_INSN(),
+	},
+	.fixup_map_btf_ptr = { 1 },
+	.result_unpriv = REJECT,
+	.result = REJECT,
+	.errstr = "invalid btf_id pointer access, R1 type=ptr_ expected=user_ptr_or_null_",
+},
+{
+	"map_btf_ptr: user: reject store of percpu_ptr_",
+	.insns = {
+	BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
+	BPF_LD_MAP_FD(BPF_REG_6, 0),
+	BPF_MOV64_REG(BPF_REG_1, BPF_REG_6),
+	BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -4),
+	BPF_MOV64_IMM(BPF_REG_0, 0),
+	BPF_ST_MEM(BPF_W, BPF_REG_2, 0, 0),
+	BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_map_lookup_elem),
+	BPF_JMP_IMM(BPF_JNE, BPF_REG_0, 0, 1),
+	BPF_EXIT_INSN(),
+	BPF_LDX_MEM(BPF_DW, BPF_REG_1, BPF_REG_0, 16),
+	BPF_JMP_IMM(BPF_JNE, BPF_REG_1, 0, 1),
+	BPF_EXIT_INSN(),
+	BPF_STX_MEM(BPF_DW, BPF_REG_0, BPF_REG_1, 24),
+	BPF_EXIT_INSN(),
+	},
+	.fixup_map_btf_ptr = { 1 },
+	.result_unpriv = REJECT,
+	.result = REJECT,
+	.errstr = "invalid btf_id pointer access, R1 type=percpu_ptr_ expected=user_ptr_or_null_",
+},
-- 
2.35.1


^ permalink raw reply related	[flat|nested] 38+ messages in thread

* Re: [PATCH bpf-next v1 05/15] bpf: Allow storing PTR_TO_PERCPU_BTF_ID in map
  2022-02-20 13:48 ` [PATCH bpf-next v1 05/15] bpf: Allow storing PTR_TO_PERCPU_BTF_ID " Kumar Kartikeya Dwivedi
@ 2022-02-20 20:40   ` kernel test robot
  0 siblings, 0 replies; 38+ messages in thread
From: kernel test robot @ 2022-02-20 20:40 UTC (permalink / raw)
  To: Kumar Kartikeya Dwivedi, bpf
  Cc: kbuild-all, Hao Luo, Alexei Starovoitov, Andrii Nakryiko,
	Daniel Borkmann, Toke Høiland-Jørgensen,
	Jesper Dangaard Brouer, netfilter-devel, netdev

Hi Kumar,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on next-20220217]
[cannot apply to bpf-next/master bpf/master linus/master v5.17-rc4 v5.17-rc3 v5.17-rc2 v5.17-rc4]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/0day-ci/linux/commits/Kumar-Kartikeya-Dwivedi/Introduce-typed-pointer-support-in-BPF-maps/20220220-215105
base:    3c30cf91b5ecc7272b3d2942ae0505dd8320b81c
config: openrisc-randconfig-s032-20220220 (https://download.01.org/0day-ci/archive/20220221/202202210444.8UyLf80r-lkp@intel.com/config)
compiler: or1k-linux-gcc (GCC) 11.2.0
reproduce:
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # apt-get install sparse
        # sparse version: v0.6.4-dirty
        # https://github.com/0day-ci/linux/commit/255d8431d2cae10fb3ac6abd44b1bf73f15dd060
        git remote add linux-review https://github.com/0day-ci/linux
        git fetch --no-tags linux-review Kumar-Kartikeya-Dwivedi/Introduce-typed-pointer-support-in-BPF-maps/20220220-215105
        git checkout 255d8431d2cae10fb3ac6abd44b1bf73f15dd060
        # save the config file to linux build tree
        mkdir build_dir
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-11.2.0 make.cross C=1 CF='-fdiagnostic-prefix -D__CHECK_ENDIAN__' O=build_dir ARCH=openrisc SHELL=/bin/bash kernel/bpf/

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>


sparse warnings: (new ones prefixed by >>)
>> kernel/bpf/verifier.c:1568:39: sparse: sparse: mixing different enum types:
>> kernel/bpf/verifier.c:1568:39: sparse:    unsigned int enum bpf_reg_type
>> kernel/bpf/verifier.c:1568:39: sparse:    unsigned int enum bpf_type_flag
   kernel/bpf/verifier.c:13916:38: sparse: sparse: subtraction of functions? Share your drugs
   kernel/bpf/verifier.c: note: in included file (through include/linux/bpf.h, include/linux/bpf-cgroup.h):
   include/linux/bpfptr.h:52:47: sparse: sparse: cast to non-scalar
   include/linux/bpfptr.h:52:47: sparse: sparse: cast from non-scalar
   include/linux/bpfptr.h:63:40: sparse: sparse: cast to non-scalar
   include/linux/bpfptr.h:63:40: sparse: sparse: cast from non-scalar
   include/linux/bpfptr.h:52:47: sparse: sparse: cast to non-scalar
   include/linux/bpfptr.h:52:47: sparse: sparse: cast from non-scalar
   include/linux/bpfptr.h:63:40: sparse: sparse: cast to non-scalar
   include/linux/bpfptr.h:63:40: sparse: sparse: cast from non-scalar
   include/linux/bpfptr.h:52:47: sparse: sparse: cast to non-scalar
   include/linux/bpfptr.h:52:47: sparse: sparse: cast from non-scalar
   include/linux/bpfptr.h:63:40: sparse: sparse: cast to non-scalar
   include/linux/bpfptr.h:63:40: sparse: sparse: cast from non-scalar
   include/linux/bpfptr.h:52:47: sparse: sparse: cast to non-scalar
   include/linux/bpfptr.h:52:47: sparse: sparse: cast from non-scalar
   include/linux/bpfptr.h:52:47: sparse: sparse: cast to non-scalar
   include/linux/bpfptr.h:52:47: sparse: sparse: cast from non-scalar

vim +1568 kernel/bpf/verifier.c

  1555	
  1556	static void mark_btf_ld_reg(struct bpf_verifier_env *env,
  1557				    struct bpf_reg_state *regs, u32 regno,
  1558				    enum bpf_reg_type reg_type,
  1559				    struct btf *btf, u32 btf_id,
  1560				    enum bpf_type_flag flag)
  1561	{
  1562		if (reg_type == SCALAR_VALUE ||
  1563		    WARN_ON_ONCE(reg_type != PTR_TO_BTF_ID && reg_type != PTR_TO_PERCPU_BTF_ID)) {
  1564			mark_reg_unknown(env, regs, regno);
  1565			return;
  1566		}
  1567		mark_reg_known_zero(env, regs, regno);
> 1568		regs[regno].type = reg_type | flag;
  1569		regs[regno].btf = btf;
  1570		regs[regno].btf_id = btf_id;
  1571	}
  1572	

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH bpf-next v1 10/15] bpf: Wire up freeing of referenced PTR_TO_BTF_ID in map
  2022-02-20 13:48 ` [PATCH bpf-next v1 10/15] bpf: Wire up freeing of referenced PTR_TO_BTF_ID in map Kumar Kartikeya Dwivedi
@ 2022-02-20 21:43   ` kernel test robot
  2022-02-20 22:55   ` kernel test robot
  2022-02-21  0:39   ` kernel test robot
  2 siblings, 0 replies; 38+ messages in thread
From: kernel test robot @ 2022-02-20 21:43 UTC (permalink / raw)
  To: Kumar Kartikeya Dwivedi, bpf
  Cc: kbuild-all, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Toke Høiland-Jørgensen, Jesper Dangaard Brouer,
	netfilter-devel, netdev

Hi Kumar,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on next-20220217]
[cannot apply to bpf-next/master bpf/master linus/master v5.17-rc4 v5.17-rc3 v5.17-rc2 v5.17-rc4]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/0day-ci/linux/commits/Kumar-Kartikeya-Dwivedi/Introduce-typed-pointer-support-in-BPF-maps/20220220-215105
base:    3c30cf91b5ecc7272b3d2942ae0505dd8320b81c
config: openrisc-randconfig-s032-20220220 (https://download.01.org/0day-ci/archive/20220221/202202210547.JnjWSpPA-lkp@intel.com/config)
compiler: or1k-linux-gcc (GCC) 11.2.0
reproduce:
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # apt-get install sparse
        # sparse version: v0.6.4-dirty
        # https://github.com/0day-ci/linux/commit/09a47522ec608218eb6aabd5011316d78ad245e0
        git remote add linux-review https://github.com/0day-ci/linux
        git fetch --no-tags linux-review Kumar-Kartikeya-Dwivedi/Introduce-typed-pointer-support-in-BPF-maps/20220220-215105
        git checkout 09a47522ec608218eb6aabd5011316d78ad245e0
        # save the config file to linux build tree
        mkdir build_dir
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-11.2.0 make.cross C=1 CF='-fdiagnostic-prefix -D__CHECK_ENDIAN__' O=build_dir ARCH=openrisc SHELL=/bin/bash kernel/bpf/

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All error/warnings (new ones prefixed by >>):

   kernel/bpf/syscall.c: In function 'bpf_map_free_ptr_to_btf_id':
>> kernel/bpf/syscall.c:669:32: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast]
     669 |                 off_desc->dtor((void *)old_ptr);
         |                                ^
   In file included from arch/openrisc/include/asm/atomic.h:131,
                    from include/linux/atomic.h:7,
                    from include/asm-generic/bitops/lock.h:5,
                    from arch/openrisc/include/asm/bitops.h:41,
                    from include/linux/bitops.h:33,
                    from include/linux/log2.h:12,
                    from include/asm-generic/div64.h:55,
                    from ./arch/openrisc/include/generated/asm/div64.h:1,
                    from include/linux/math.h:6,
                    from include/linux/math64.h:6,
                    from include/linux/time.h:6,
                    from include/linux/ktime.h:24,
                    from include/linux/timer.h:6,
                    from include/linux/workqueue.h:9,
                    from include/linux/bpf.h:9,
                    from kernel/bpf/syscall.c:4:
   In function '__xchg',
       inlined from 'bpf_map_free_ptr_to_btf_id' at kernel/bpf/syscall.c:668:13:
>> arch/openrisc/include/asm/cmpxchg.h:160:24: error: call to '__xchg_called_with_bad_pointer' declared with attribute error: Bad argument size for xchg
     160 |                 return __xchg_called_with_bad_pointer();
         |                        ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~


vim +669 kernel/bpf/syscall.c

   640	
   641	/* Caller must ensure map_value_has_ptr_to_btf_id is true. Note that this
   642	 * function can be called on a map value while the map_value is visible to BPF
   643	 * programs, as it ensures the correct synchronization, and we already enforce
   644	 * the same using the verifier on the BPF program side, esp. for referenced
   645	 * pointers.
   646	 */
   647	void bpf_map_free_ptr_to_btf_id(struct bpf_map *map, void *map_value)
   648	{
   649		struct bpf_map_value_off *tab = map->ptr_off_tab;
   650		u64 *btf_id_ptr;
   651		int i;
   652	
   653		for (i = 0; i < tab->nr_off; i++) {
   654			struct bpf_map_value_off_desc *off_desc = &tab->off[i];
   655			u64 old_ptr;
   656	
   657			btf_id_ptr = map_value + off_desc->offset;
   658			if (!(off_desc->flags & BPF_MAP_VALUE_OFF_F_REF)) {
   659				/* On 32-bit platforms, WRITE_ONCE 64-bit store tearing
   660				 * into two 32-bit stores is fine for us, as we only
   661				 * permit pointer values to be stored at this address,
   662				 * which are word sized, so the other half of 64-bit
   663				 * value will always be zeroed.
   664				 */
   665				WRITE_ONCE(*btf_id_ptr, 0);
   666				continue;
   667			}
   668			old_ptr = xchg(btf_id_ptr, 0);
 > 669			off_desc->dtor((void *)old_ptr);
   670		}
   671	}
   672	

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH bpf-next v1 10/15] bpf: Wire up freeing of referenced PTR_TO_BTF_ID in map
  2022-02-20 13:48 ` [PATCH bpf-next v1 10/15] bpf: Wire up freeing of referenced PTR_TO_BTF_ID in map Kumar Kartikeya Dwivedi
  2022-02-20 21:43   ` kernel test robot
@ 2022-02-20 22:55   ` kernel test robot
  2022-02-21  0:39   ` kernel test robot
  2 siblings, 0 replies; 38+ messages in thread
From: kernel test robot @ 2022-02-20 22:55 UTC (permalink / raw)
  To: Kumar Kartikeya Dwivedi, bpf
  Cc: llvm, kbuild-all, Alexei Starovoitov, Andrii Nakryiko,
	Daniel Borkmann, Toke Høiland-Jørgensen,
	Jesper Dangaard Brouer, netfilter-devel, netdev

Hi Kumar,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on next-20220217]
[cannot apply to bpf-next/master bpf/master linus/master v5.17-rc4 v5.17-rc3 v5.17-rc2 v5.17-rc5]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/0day-ci/linux/commits/Kumar-Kartikeya-Dwivedi/Introduce-typed-pointer-support-in-BPF-maps/20220220-215105
base:    3c30cf91b5ecc7272b3d2942ae0505dd8320b81c
config: mips-randconfig-r012-20220220 (https://download.01.org/0day-ci/archive/20220221/202202210651.wyTgHcwt-lkp@intel.com/config)
compiler: clang version 15.0.0 (https://github.com/llvm/llvm-project d271fc04d5b97b12e6b797c6067d3c96a8d7470e)
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # install mips cross compiling tool for clang build
        # apt-get install binutils-mips-linux-gnu
        # https://github.com/0day-ci/linux/commit/09a47522ec608218eb6aabd5011316d78ad245e0
        git remote add linux-review https://github.com/0day-ci/linux
        git fetch --no-tags linux-review Kumar-Kartikeya-Dwivedi/Introduce-typed-pointer-support-in-BPF-maps/20220220-215105
        git checkout 09a47522ec608218eb6aabd5011316d78ad245e0
        # save the config file to linux build tree
        mkdir build_dir
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross W=1 O=build_dir ARCH=mips SHELL=/bin/bash kernel/bpf/

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All errors (new ones prefixed by >>):

   In file included from kernel/bpf/syscall.c:4:
   In file included from include/linux/bpf.h:9:
   In file included from include/linux/workqueue.h:9:
   In file included from include/linux/timer.h:6:
   In file included from include/linux/ktime.h:24:
   In file included from include/linux/time.h:60:
   In file included from include/linux/time32.h:13:
   In file included from include/linux/timex.h:65:
   In file included from arch/mips/include/asm/timex.h:19:
   In file included from arch/mips/include/asm/cpu-type.h:12:
   In file included from include/linux/smp.h:13:
   In file included from include/linux/cpumask.h:13:
   In file included from include/linux/atomic.h:7:
   In file included from arch/mips/include/asm/atomic.h:23:
>> arch/mips/include/asm/cmpxchg.h:83:11: error: call to __xchg_called_with_bad_pointer declared with 'error' attribute: Bad argument size for xchg
                           return __xchg_called_with_bad_pointer();
                                  ^
   1 error generated.

Kconfig warnings: (for reference only)
   WARNING: unmet direct dependencies detected for OMAP_GPMC
   Depends on MEMORY && OF_ADDRESS
   Selected by
   - MTD_NAND_OMAP2 && MTD && MTD_RAW_NAND && (ARCH_OMAP2PLUS || ARCH_KEYSTONE || ARCH_K3 || COMPILE_TEST && HAS_IOMEM


vim +/error +83 arch/mips/include/asm/cmpxchg.h

5154f3b4194910 Paul Burton         2017-06-09  66  
b70eb30056dc84 Paul Burton         2017-06-09  67  extern unsigned long __xchg_small(volatile void *ptr, unsigned long val,
b70eb30056dc84 Paul Burton         2017-06-09  68  				  unsigned int size);
b70eb30056dc84 Paul Burton         2017-06-09  69  
46f1619500d022 Thomas Bogendoerfer 2019-10-09  70  static __always_inline
46f1619500d022 Thomas Bogendoerfer 2019-10-09  71  unsigned long __xchg(volatile void *ptr, unsigned long x, int size)
b81947c646bfef David Howells       2012-03-28  72  {
b81947c646bfef David Howells       2012-03-28  73  	switch (size) {
b70eb30056dc84 Paul Burton         2017-06-09  74  	case 1:
b70eb30056dc84 Paul Burton         2017-06-09  75  	case 2:
b70eb30056dc84 Paul Burton         2017-06-09  76  		return __xchg_small(ptr, x, size);
b70eb30056dc84 Paul Burton         2017-06-09  77  
b81947c646bfef David Howells       2012-03-28  78  	case 4:
62c6081dca75d6 Paul Burton         2017-06-09  79  		return __xchg_asm("ll", "sc", (volatile u32 *)ptr, x);
62c6081dca75d6 Paul Burton         2017-06-09  80  
b81947c646bfef David Howells       2012-03-28  81  	case 8:
62c6081dca75d6 Paul Burton         2017-06-09  82  		if (!IS_ENABLED(CONFIG_64BIT))
62c6081dca75d6 Paul Burton         2017-06-09 @83  			return __xchg_called_with_bad_pointer();
62c6081dca75d6 Paul Burton         2017-06-09  84  
62c6081dca75d6 Paul Burton         2017-06-09  85  		return __xchg_asm("lld", "scd", (volatile u64 *)ptr, x);
62c6081dca75d6 Paul Burton         2017-06-09  86  
d15dc68c1143e2 Paul Burton         2017-06-09  87  	default:
d15dc68c1143e2 Paul Burton         2017-06-09  88  		return __xchg_called_with_bad_pointer();
b81947c646bfef David Howells       2012-03-28  89  	}
b81947c646bfef David Howells       2012-03-28  90  }
b81947c646bfef David Howells       2012-03-28  91  

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH bpf-next v1 10/15] bpf: Wire up freeing of referenced PTR_TO_BTF_ID in map
  2022-02-20 13:48 ` [PATCH bpf-next v1 10/15] bpf: Wire up freeing of referenced PTR_TO_BTF_ID in map Kumar Kartikeya Dwivedi
  2022-02-20 21:43   ` kernel test robot
  2022-02-20 22:55   ` kernel test robot
@ 2022-02-21  0:39   ` kernel test robot
  2 siblings, 0 replies; 38+ messages in thread
From: kernel test robot @ 2022-02-21  0:39 UTC (permalink / raw)
  To: Kumar Kartikeya Dwivedi, bpf
  Cc: kbuild-all, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Toke Høiland-Jørgensen, Jesper Dangaard Brouer,
	netfilter-devel, netdev

Hi Kumar,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on next-20220217]
[cannot apply to bpf-next/master bpf/master linus/master v5.17-rc4 v5.17-rc3 v5.17-rc2 v5.17-rc5]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/0day-ci/linux/commits/Kumar-Kartikeya-Dwivedi/Introduce-typed-pointer-support-in-BPF-maps/20220220-215105
base:    3c30cf91b5ecc7272b3d2942ae0505dd8320b81c
config: microblaze-randconfig-r022-20220220 (https://download.01.org/0day-ci/archive/20220221/202202210811.0jZyZUP1-lkp@intel.com/config)
compiler: microblaze-linux-gcc (GCC) 11.2.0
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # https://github.com/0day-ci/linux/commit/09a47522ec608218eb6aabd5011316d78ad245e0
        git remote add linux-review https://github.com/0day-ci/linux
        git fetch --no-tags linux-review Kumar-Kartikeya-Dwivedi/Introduce-typed-pointer-support-in-BPF-maps/20220220-215105
        git checkout 09a47522ec608218eb6aabd5011316d78ad245e0
        # save the config file to linux build tree
        mkdir build_dir
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-11.2.0 make.cross O=build_dir ARCH=microblaze SHELL=/bin/bash

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All errors (new ones prefixed by >>):

   microblaze-linux-ld: kernel/bpf/syscall.o: in function `bpf_map_free_ptr_to_btf_id':
>> (.text+0x555c): undefined reference to `__generic_xchg_called_with_bad_pointer'

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH bpf-next v1 12/15] net/netfilter: Add bpf_ct_kptr_get helper
  2022-02-20 13:48 ` [PATCH bpf-next v1 12/15] net/netfilter: Add bpf_ct_kptr_get helper Kumar Kartikeya Dwivedi
@ 2022-02-21  4:35   ` kernel test robot
  0 siblings, 0 replies; 38+ messages in thread
From: kernel test robot @ 2022-02-21  4:35 UTC (permalink / raw)
  To: Kumar Kartikeya Dwivedi, bpf
  Cc: kbuild-all, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Toke Høiland-Jørgensen, Jesper Dangaard Brouer,
	netfilter-devel, netdev

Hi Kumar,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on next-20220217]
[cannot apply to bpf-next/master bpf/master linus/master v5.17-rc4 v5.17-rc3 v5.17-rc2 v5.17-rc5]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/0day-ci/linux/commits/Kumar-Kartikeya-Dwivedi/Introduce-typed-pointer-support-in-BPF-maps/20220220-215105
base:    3c30cf91b5ecc7272b3d2942ae0505dd8320b81c
config: s390-defconfig (https://download.01.org/0day-ci/archive/20220221/202202211228.CO4wFX0Q-lkp@intel.com/config)
compiler: s390-linux-gcc (GCC) 11.2.0
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # https://github.com/0day-ci/linux/commit/79e35d4e4ee33a7692f0612065012307a361cd56
        git remote add linux-review https://github.com/0day-ci/linux
        git fetch --no-tags linux-review Kumar-Kartikeya-Dwivedi/Introduce-typed-pointer-support-in-BPF-maps/20220220-215105
        git checkout 79e35d4e4ee33a7692f0612065012307a361cd56
        # save the config file to linux build tree
        mkdir build_dir
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-11.2.0 make.cross O=build_dir ARCH=s390 SHELL=/bin/bash net/netfilter/

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All warnings (new ones prefixed by >>):

   net/netfilter/nf_conntrack_bpf.c: In function 'bpf_ct_kptr_get':
>> net/netfilter/nf_conntrack_bpf.c:226:21: warning: variable 'net' set but not used [-Wunused-but-set-variable]
     226 |         struct net *net;
         |                     ^~~
   net/netfilter/nf_conntrack_bpf.c: At top level:
   net/netfilter/nf_conntrack_bpf.c:314:5: warning: no previous prototype for 'register_nf_conntrack_bpf' [-Wmissing-prototypes]
     314 | int register_nf_conntrack_bpf(void)
         |     ^~~~~~~~~~~~~~~~~~~~~~~~~


vim +/net +226 net/netfilter/nf_conntrack_bpf.c

   219	
   220	/* TODO: Just a PoC, need to reuse code in __nf_conntrack_find_get for this */
   221	struct nf_conn *bpf_ct_kptr_get(struct nf_conn **ptr, struct bpf_sock_tuple *bpf_tuple,
   222					u32 tuple__sz, u8 protonum, u8 direction)
   223	{
   224		struct nf_conntrack_tuple tuple;
   225		struct nf_conn *nfct;
 > 226		struct net *net;
   227		u64 *nfct_p;
   228		int ret;
   229	
   230		WARN_ON_ONCE(!rcu_read_lock_held());
   231	
   232		if ((protonum != IPPROTO_TCP && protonum != IPPROTO_UDP) ||
   233		    (direction != IP_CT_DIR_ORIGINAL && direction != IP_CT_DIR_REPLY))
   234			return NULL;
   235	
   236		/* ptr is actually pointer to u64 having address, hence recast u64 load
   237		 * to native pointer width.
   238		 */
   239		nfct_p = (u64 *)ptr;
   240		nfct = (struct nf_conn *)READ_ONCE(*nfct_p);
   241		if (!nfct || unlikely(!refcount_inc_not_zero(&nfct->ct_general.use)))
   242			return NULL;
   243	
   244		memset(&tuple, 0, sizeof(tuple));
   245		ret = bpf_fill_nf_tuple(&tuple, bpf_tuple, tuple__sz);
   246		if (ret < 0)
   247			goto end;
   248		tuple.dst.protonum = protonum;
   249	
   250		/* XXX: Need to allow passing in struct net *, or take netns_id, this is non-sense */
   251		net = nf_ct_net(nfct);
   252		if (!nf_ct_key_equal(&nfct->tuplehash[direction], &tuple,
   253				     &nf_ct_zone_dflt, nf_ct_net(nfct)))
   254			goto end;
   255		return nfct;
   256	end:
   257		nf_ct_put(nfct);
   258		return NULL;
   259	}
   260	

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH bpf-next v1 01/15] bpf: Factor out fd returning from bpf_btf_find_by_name_kind
  2022-02-20 13:47 ` [PATCH bpf-next v1 01/15] bpf: Factor out fd returning from bpf_btf_find_by_name_kind Kumar Kartikeya Dwivedi
@ 2022-02-22  5:28   ` Alexei Starovoitov
  2022-02-23  3:05     ` Kumar Kartikeya Dwivedi
  0 siblings, 1 reply; 38+ messages in thread
From: Alexei Starovoitov @ 2022-02-22  5:28 UTC (permalink / raw)
  To: Kumar Kartikeya Dwivedi
  Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Toke Høiland-Jørgensen, Jesper Dangaard Brouer,
	netfilter-devel, netdev

On Sun, Feb 20, 2022 at 07:17:59PM +0530, Kumar Kartikeya Dwivedi wrote:
> In next few patches, we need a helper that searches all kernel BTFs
> (vmlinux and module BTFs), and finds the type denoted by 'name' and
> 'kind'. Turns out bpf_btf_find_by_name_kind already does the same thing,
> but it instead returns a BTF ID and optionally fd (if module BTF). This
> is used for relocating ksyms in BPF loader code (bpftool gen skel -L).
> 
> We extract the core code out into a new helper
> btf_find_by_name_kind_all, which returns the BTF ID and BTF pointer in
> an out parameter. The reference for the returned BTF pointer is only
> bumped if it is a module BTF, this needs to be kept in mind when using
> this helper.
> 
> Hence, the user must release the BTF reference iff btf_is_module is
> true, otherwise transfer the ownership to e.g. an fd.
> 
> In case of the helper, the fd is only allocated for module BTFs, so no
> extra handling for btf_vmlinux case is required.
> 
> Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
> ---
>  kernel/bpf/btf.c | 47 +++++++++++++++++++++++++++++++----------------
>  1 file changed, 31 insertions(+), 16 deletions(-)
> 
> diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c
> index 2c4c5dbe2abe..3645d8c14a18 100644
> --- a/kernel/bpf/btf.c
> +++ b/kernel/bpf/btf.c
> @@ -6545,16 +6545,10 @@ static struct btf *btf_get_module_btf(const struct module *module)
>  	return btf;
>  }
>  
> -BPF_CALL_4(bpf_btf_find_by_name_kind, char *, name, int, name_sz, u32, kind, int, flags)
> +static s32 btf_find_by_name_kind_all(const char *name, u32 kind, struct btf **btfp)

The name is getting too long.
How about bpf_find_btf_id() ?

>  {
>  	struct btf *btf;
> -	long ret;
> -
> -	if (flags)
> -		return -EINVAL;
> -
> -	if (name_sz <= 1 || name[name_sz - 1])
> -		return -EINVAL;
> +	s32 ret;
>  
>  	btf = bpf_get_btf_vmlinux();
>  	if (IS_ERR(btf))
> @@ -6580,19 +6574,40 @@ BPF_CALL_4(bpf_btf_find_by_name_kind, char *, name, int, name_sz, u32, kind, int
>  			spin_unlock_bh(&btf_idr_lock);
>  			ret = btf_find_by_name_kind(mod_btf, name, kind);
>  			if (ret > 0) {
> -				int btf_obj_fd;
> -
> -				btf_obj_fd = __btf_new_fd(mod_btf);
> -				if (btf_obj_fd < 0) {
> -					btf_put(mod_btf);
> -					return btf_obj_fd;
> -				}
> -				return ret | (((u64)btf_obj_fd) << 32);
> +				*btfp = mod_btf;
> +				return ret;
>  			}
>  			spin_lock_bh(&btf_idr_lock);
>  			btf_put(mod_btf);
>  		}
>  		spin_unlock_bh(&btf_idr_lock);
> +	} else {
> +		*btfp = btf;
> +	}

Since we're refactoring let's drop the indent.
How about
  if (ret > 0) {
    *btfp = btf;
    return ret;
  }
  idr_for_each_entry().

and move the func right after btf_find_by_name_kind(),
so that later patch doesn't need to do:
static s32 bpf_find_btf_id();
Eventually this helper might become global with this name.

Also, maybe do btf_get() for vmlinux_btf too?
In case it reduces 'if (btf_is_module())' checks.

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH bpf-next v1 00/15] Introduce typed pointer support in BPF maps
  2022-02-20 13:47 [PATCH bpf-next v1 00/15] Introduce typed pointer support in BPF maps Kumar Kartikeya Dwivedi
                   ` (14 preceding siblings ...)
  2022-02-20 13:48 ` [PATCH bpf-next v1 15/15] selftests/bpf: Add verifier " Kumar Kartikeya Dwivedi
@ 2022-02-22  6:05 ` Song Liu
  2022-02-22  8:21   ` Kumar Kartikeya Dwivedi
  15 siblings, 1 reply; 38+ messages in thread
From: Song Liu @ 2022-02-22  6:05 UTC (permalink / raw)
  To: Kumar Kartikeya Dwivedi
  Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Toke Høiland-Jørgensen, Jesper Dangaard Brouer,
	netfilter-devel, Networking

On Sun, Feb 20, 2022 at 5:48 AM Kumar Kartikeya Dwivedi
<memxor@gmail.com> wrote:
>
> Introduction
> ------------
>
> This set enables storing pointers of a certain type in BPF map, and extends the
> verifier to enforce type safety and lifetime correctness properties.
>
> The infrastructure being added is generic enough for allowing storing any kind
> of pointers whose type is available using BTF (user or kernel) in the future
> (e.g. strongly typed memory allocation in BPF program), which are internally
> tracked in the verifier as PTR_TO_BTF_ID, but for now the series limits them to
> four kinds of pointers obtained from the kernel.
>
> Obviously, use of this feature depends on map BTF.
>
> 1. Unreferenced kernel pointer
>
> In this case, there are very few restrictions. The pointer type being stored
> must match the type declared in the map value. However, such a pointer when
> loaded from the map can only be dereferenced, but not passed to any in-kernel
> helpers or kernel functions available to the program. This is because while the
> verifier's exception handling mechanism converts BPF_LDX to PROBE_MEM loads,
> which are then handled specially by the JIT implementation, the same liberty is
> not available to accesses inside the kernel. The pointer by the time it is
> passed into a helper has no lifetime related guarantees about the object it is
> pointing to, and may well be referencing invalid memory.
>
> 2. Referenced kernel pointer
>
> This case imposes a lot of restrictions on the programmer, to ensure safety. To
> transfer the ownership of a reference in the BPF program to the map, the user
> must use the BPF_XCHG instruction, which returns the old pointer contained in
> the map, as an acquired reference, and releases verifier state for the
> referenced pointer being exchanged, as it moves into the map.
>
> This is a normal PTR_TO_BTF_ID that can be used with in-kernel helpers and kernel
> functions callable by the program.
>
> However, if BPF_LDX is used to load a referenced pointer from the map, it is
> still not permitted to pass it to in-kernel helpers or kernel functions. To
> obtain a reference usable with helpers, the user must invoke a kfunc helper
> which returns a usable reference (which also must be eventually released before
> BPF_EXIT, or moved into a map).
>
> Since the load of the pointer (preserving data dependency ordering) must happen
> inside the RCU read section, the kfunc helper will take a pointer to the map
> value, which must point to the actual pointer of the object whose reference is
> to be raised. The type will be verified from the BTF information of the kfunc,
> as the prototype must be:
>
>         T *func(T **, ... /* other arguments */);
>
> Then, the verifier checks whether pointer at offset of the map value points to
> the type T, and permits the call.
>
> This convention is followed so that such helpers may also be called from
> sleepable BPF programs, where RCU read lock is not necessarily held in the BPF
> program context, hence necessitating the need to pass in a pointer to the actual
> pointer to perform the load inside the RCU read section.
>
> 3. per-CPU kernel pointer
>
> These have very few restrictions. The user can store a PTR_TO_PERCPU_BTF_ID
> into the map, and when loading from the map, they must NULL check it before use,
> because while a non-zero value stored into the map should always be valid, it can
> still be reset to zero on updates. After checking it to be non-NULL, it can be
> passed to bpf_per_cpu_ptr and bpf_this_cpu_ptr helpers to obtain a PTR_TO_BTF_ID
> to underlying per-CPU object.
>
> It is also permitted to write 0 and reset the value.
>
> 4. Userspace pointer
>
> The verifier recently gained support for annotating BTF with __user type tag.
> This indicates pointers pointing to memory which must be read using the
> bpf_probe_read_user helper to ensure correct results. The set also permits
> storing them into the BPF map, and ensures user pointer cannot be stored
> into other kinds of pointers mentioned above.
>
> When loaded from the map, the only thing that can be done is to pass this
> pointer to bpf_probe_read_user. No dereference is allowed.
>

I guess I missed some context here. Could you please provide some reference
to the use cases of these features?

For Unreferenced kernel pointer and userspace pointer, it seems that there is
no guarantee the pointer will still be valid during access (we only know it is
valid when it is stored in the map). Is this correct?

Thanks,
Song

[...]

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH bpf-next v1 03/15] bpf: Allow storing PTR_TO_BTF_ID in map
  2022-02-20 13:48 ` [PATCH bpf-next v1 03/15] bpf: Allow storing PTR_TO_BTF_ID in map Kumar Kartikeya Dwivedi
@ 2022-02-22  6:46   ` Alexei Starovoitov
  2022-02-23  3:09     ` Kumar Kartikeya Dwivedi
  0 siblings, 1 reply; 38+ messages in thread
From: Alexei Starovoitov @ 2022-02-22  6:46 UTC (permalink / raw)
  To: Kumar Kartikeya Dwivedi
  Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Toke Høiland-Jørgensen, Jesper Dangaard Brouer,
	netfilter-devel, netdev

On Sun, Feb 20, 2022 at 07:18:01PM +0530, Kumar Kartikeya Dwivedi wrote:
> This patch allows user to embed PTR_TO_BTF_ID in map value, such that
> loading it marks the destination register as having the appropriate
> register type and such a pointer can be dereferenced like usual
> PTR_TO_BTF_ID and be passed to various BPF helpers.
> 
> This feature can be useful to store an object in a map for a long time,
> and then inspect it later. Since PTR_TO_BTF_ID is safe against invalid
> access, verifier doesn't need to perform any complex lifetime checks. It
> can be useful in cases where user already knows pointer will remain
> valid, so any dereference at a later time (possibly in entirely
> different BPF program invocation) will yield correct results as far the
> data read from kernel memory is concerned.
> 
> Note that it is quite possible such BTF ID pointer is invalid, in this
> case the verifier's built-in exception handling mechanism where it
> converts loads into PTR_TO_BTF_ID into PROBE_MEM loads, would handle the
> invalid case. Next patch which adds referenced PTR_TO_BTF_ID would need
> to take more care in ensuring a correct value is stored in the BPF map.
> 
> The user indicates that a certain pointer must be treated as
> PTR_TO_BTF_ID by using a BTF type tag 'btf_id' on the pointed to type of
> the pointer. Then, this information is recorded in the object BTF which
> will be passed into the kernel by way of map's BTF information.
> 
> The kernel then records the type, and offset of all such pointers, and
> finds their corresponding built-in kernel type by the name and BTF kind.
> 
> Later, during verification this information is used that access to such
> pointers is sized correctly, and done at a proper offset into the map
> value. Only BPF_LDX, BPF_STX, and BPF_ST with 0 (to denote NULL) are
> allowed instructions that can access such a pointer. On BPF_LDX, the
> destination register is updated to be a PTR_TO_BTF_ID, and on BPF_STX,
> it is checked whether the source register type is same PTR_TO_BTF_ID,
> and whether the BTF ID (reg->btf and reg->btf_id) matches the type
> specified in the map value's definition.
> 
> Hence, the verifier allows flexible access to kernel data across program
> invocations in a type safe manner, without compromising on the runtime
> safety of the kernel.
> 
> Next patch will extend this support to referenced PTR_TO_BTF_ID.
> 
> Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
> ---
>  include/linux/bpf.h     |  30 +++++++-
>  include/linux/btf.h     |   3 +
>  kernel/bpf/btf.c        | 127 ++++++++++++++++++++++++++++++++++
>  kernel/bpf/map_in_map.c |   5 +-
>  kernel/bpf/syscall.c    | 137 ++++++++++++++++++++++++++++++++++++-
>  kernel/bpf/verifier.c   | 148 ++++++++++++++++++++++++++++++++++++++++
>  6 files changed, 446 insertions(+), 4 deletions(-)
> 
> diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> index f19abc59b6cd..ce45ffb79f82 100644
> --- a/include/linux/bpf.h
> +++ b/include/linux/bpf.h
> @@ -155,6 +155,23 @@ struct bpf_map_ops {
>  	const struct bpf_iter_seq_info *iter_seq_info;
>  };
>  
> +enum {
> +	/* Support at most 8 pointers in a BPF map value */
> +	BPF_MAP_VALUE_OFF_MAX = 8,
> +};
> +
> +struct bpf_map_value_off_desc {
> +	u32 offset;
> +	u32 btf_id;
> +	struct btf *btf;
> +	struct module *module;
> +};
> +
> +struct bpf_map_value_off {
> +	u32 nr_off;
> +	struct bpf_map_value_off_desc off[];
> +};
> +
>  struct bpf_map {
>  	/* The first two cachelines with read-mostly members of which some
>  	 * are also accessed in fast-path (e.g. ops, max_entries).
> @@ -171,6 +188,7 @@ struct bpf_map {
>  	u64 map_extra; /* any per-map-type extra fields */
>  	u32 map_flags;
>  	int spin_lock_off; /* >=0 valid offset, <0 error */
> +	struct bpf_map_value_off *ptr_off_tab;
>  	int timer_off; /* >=0 valid offset, <0 error */
>  	u32 id;
>  	int numa_node;
> @@ -184,7 +202,7 @@ struct bpf_map {
>  	char name[BPF_OBJ_NAME_LEN];
>  	bool bypass_spec_v1;
>  	bool frozen; /* write-once; write-protected by freeze_mutex */
> -	/* 14 bytes hole */
> +	/* 6 bytes hole */
>  
>  	/* The 3rd and 4th cacheline with misc members to avoid false sharing
>  	 * particularly with refcounting.
> @@ -217,6 +235,11 @@ static inline bool map_value_has_timer(const struct bpf_map *map)
>  	return map->timer_off >= 0;
>  }
>  
> +static inline bool map_value_has_ptr_to_btf_id(const struct bpf_map *map)
> +{
> +	return !IS_ERR_OR_NULL(map->ptr_off_tab);
> +}
> +
>  static inline void check_and_init_map_value(struct bpf_map *map, void *dst)
>  {
>  	if (unlikely(map_value_has_spin_lock(map)))
> @@ -1490,6 +1513,11 @@ void bpf_prog_put(struct bpf_prog *prog);
>  void bpf_prog_free_id(struct bpf_prog *prog, bool do_idr_lock);
>  void bpf_map_free_id(struct bpf_map *map, bool do_idr_lock);
>  
> +struct bpf_map_value_off_desc *bpf_map_ptr_off_contains(struct bpf_map *map, u32 offset);
> +void bpf_map_free_ptr_off_tab(struct bpf_map *map);
> +struct bpf_map_value_off *bpf_map_copy_ptr_off_tab(const struct bpf_map *map);
> +bool bpf_map_equal_ptr_off_tab(const struct bpf_map *map_a, const struct bpf_map *map_b);
> +
>  struct bpf_map *bpf_map_get(u32 ufd);
>  struct bpf_map *bpf_map_get_with_uref(u32 ufd);
>  struct bpf_map *__bpf_map_get(struct fd f);
> diff --git a/include/linux/btf.h b/include/linux/btf.h
> index 36bc09b8e890..6592183aeb23 100644
> --- a/include/linux/btf.h
> +++ b/include/linux/btf.h
> @@ -26,6 +26,7 @@ struct btf_type;
>  union bpf_attr;
>  struct btf_show;
>  struct btf_id_set;
> +struct bpf_map;
>  
>  struct btf_kfunc_id_set {
>  	struct module *owner;
> @@ -123,6 +124,8 @@ bool btf_member_is_reg_int(const struct btf *btf, const struct btf_type *s,
>  			   u32 expected_offset, u32 expected_size);
>  int btf_find_spin_lock(const struct btf *btf, const struct btf_type *t);
>  int btf_find_timer(const struct btf *btf, const struct btf_type *t);
> +int btf_find_ptr_to_btf_id(const struct btf *btf, const struct btf_type *t,
> +			   struct bpf_map *map);
>  bool btf_type_is_void(const struct btf_type *t);
>  s32 btf_find_by_name_kind(const struct btf *btf, const char *name, u8 kind);
>  const struct btf_type *btf_type_skip_modifiers(const struct btf *btf,
> diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c
> index 55f6ccac3388..1edb5710e155 100644
> --- a/kernel/bpf/btf.c
> +++ b/kernel/bpf/btf.c
> @@ -3122,6 +3122,7 @@ static void btf_struct_log(struct btf_verifier_env *env,
>  enum {
>  	BTF_FIELD_SPIN_LOCK,
>  	BTF_FIELD_TIMER,
> +	BTF_FIELD_KPTR,
>  };
>  
>  static int btf_find_field_struct(const struct btf *btf, const struct btf_type *t,
> @@ -3140,6 +3141,106 @@ static int btf_find_field_struct(const struct btf *btf, const struct btf_type *t
>  	return 0;
>  }
>  
> +static s32 btf_find_by_name_kind_all(const char *name, u32 kind, struct btf **btfp);
> +
> +static int btf_find_field_kptr(const struct btf *btf, const struct btf_type *t,
> +			       u32 off, int sz, void *data)
> +{
> +	struct bpf_map_value_off *tab;
> +	struct bpf_map *map = data;
> +	struct module *mod = NULL;
> +	bool btf_id_tag = false;
> +	struct btf *kernel_btf;
> +	int nr_off, ret;
> +	s32 id;
> +
> +	/* For PTR, sz is always == 8 */
> +	if (!btf_type_is_ptr(t))
> +		return 0;
> +	t = btf_type_by_id(btf, t->type);
> +
> +	while (btf_type_is_type_tag(t)) {
> +		if (!strcmp("kernel.bpf.btf_id", __btf_name_by_offset(btf, t->name_off))) {

All of these strings consume space.
Multiple tags consume space too.
I would just do:
#define __kptr __attribute__((btf_type_tag("kptr")))
#define __kptr_ref __attribute__((btf_type_tag("kptr_ref")))
#define __kptr_percpu __attribute__((btf_type_tag("kptr_percpu")))
#define __kptr_user __attribute__((btf_type_tag("kptr_user")))

> +			/* repeated tag */
> +			if (btf_id_tag) {
> +				ret = -EINVAL;
> +				goto end;
> +			}
> +			btf_id_tag = true;
> +		} else if (!strncmp("kernel.", __btf_name_by_offset(btf, t->name_off),
> +			   sizeof("kernel.") - 1)) {
> +			/* TODO: Should we reject these when loading BTF? */
> +			/* Unavailable tag in reserved tag namespace */

I don't think we need to reserve the tag space.
There is little risk to break progs with future tags.
I would just drop this 'if'.

> +			ret = -EACCES;
> +			goto end;
> +		}
> +		/* Look for next tag */
> +		t = btf_type_by_id(btf, t->type);
> +	}
> +	if (!btf_id_tag)
> +		return 0;
> +
> +	/* Get the base type */
> +	if (btf_type_is_modifier(t))
> +		t = btf_type_skip_modifiers(btf, t->type, NULL);
> +	/* Only pointer to struct is allowed */
> +	if (!__btf_type_is_struct(t)) {
> +		ret = -EINVAL;
> +		goto end;
> +	}
> +
> +	id = btf_find_by_name_kind_all(__btf_name_by_offset(btf, t->name_off),
> +				       BTF_INFO_KIND(t->info), &kernel_btf);
> +	if (id < 0) {
> +		ret = id;
> +		goto end;
> +	}
> +
> +	nr_off = map->ptr_off_tab ? map->ptr_off_tab->nr_off : 0;
> +	if (nr_off == BPF_MAP_VALUE_OFF_MAX) {
> +		ret = -E2BIG;
> +		goto end_btf;
> +	}
> +
> +	tab = krealloc(map->ptr_off_tab, offsetof(struct bpf_map_value_off, off[nr_off + 1]),
> +		       GFP_KERNEL | __GFP_NOWARN);

Argh.
If the function is called btf_find_field() it should do 'find' and only 'find'.
It should be side effect free and should find _one_ field.
If you want a function with side effects it should be called something like btf_walk_fields.

For this case how about side effect free btf_find_fieldS() that will populate array
struct bpf_field_info {
  struct btf *type; /* set for spin_lock, timer, kptr */
  u32 off;
  int flags; /* ref|percpu|user for kptr */
};

cnt = btf_find_fields(prog_btf, value_type, BTF_FIELD_SPIN_LOCK|TIMER|KPTR, fields);

btf_find_struct_field/btf_find_datasec_var will keep the count and will error
when it reaches BPF_MAP_VALUE_OFF_MAX. 
switch (field_type) {
case BTF_FIELD_SPIN_LOCK:
   btf_find_field_struct(... "bpf_spin_lock",
                             sizeof(struct bpf_spin_lock),
                             __alignof__(struct bpf_spin_lock),
                             fields + i);
case BTF_FIELD_TIMER:
   btf_find_field_struct(... "bpf_timer", sizeof, alignof, fields + i);
case BTF_FIELD_KPTR:
   btf_find_field_kptr(... fields + i);
}

btf_find_by_name_kind_all (or new name bpf_find_btf_id)
will be done after btf_find_fields() is over.
dtor will be found after as well.
struct bpf_map_value_off will be allocated once.

> +	if (!tab) {
> +		ret = -ENOMEM;
> +		goto end_btf;
> +	}
> +	/* Initialize nr_off for newly allocated ptr_off_tab */
> +	if (!map->ptr_off_tab)
> +		tab->nr_off = 0;
> +	map->ptr_off_tab = tab;
> +
> +	/* We take reference to make sure valid pointers into module data don't
> +	 * become invalid across program invocation.
> +	 */

what is the point of grabbing mod ref?
This patch needs btf only and its refcnt will be incremented by bpf_find_btf_id.
Is that because of future dtor ?
Then it should be part of that patch.

> +	if (btf_is_module(kernel_btf)) {
> +		mod = btf_try_get_module(kernel_btf);
> +		if (!mod) {
> +			ret = -ENXIO;
> +			goto end_btf;
> +		}
> +	}
> +
> +	tab->off[nr_off].offset = off;
> +	tab->off[nr_off].btf_id = id;
> +	tab->off[nr_off].btf    = kernel_btf;
> +	tab->off[nr_off].module = mod;
> +	tab->nr_off++;
> +
> +	return 0;
> +end_btf:
> +	/* Reference is only raised for module BTF */
> +	if (btf_is_module(kernel_btf))
> +		btf_put(kernel_btf);

see earlier suggestion. this 'if' can be dropped if we btf_get for vmlinux_btf too.

> +end:
> +	bpf_map_free_ptr_off_tab(map);
> +	map->ptr_off_tab = ERR_PTR(ret);
> +	return ret;
> +}

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH bpf-next v1 04/15] bpf: Allow storing referenced PTR_TO_BTF_ID in map
  2022-02-20 13:48 ` [PATCH bpf-next v1 04/15] bpf: Allow storing referenced " Kumar Kartikeya Dwivedi
@ 2022-02-22  6:53   ` Alexei Starovoitov
  2022-02-22  7:10     ` Kumar Kartikeya Dwivedi
  0 siblings, 1 reply; 38+ messages in thread
From: Alexei Starovoitov @ 2022-02-22  6:53 UTC (permalink / raw)
  To: Kumar Kartikeya Dwivedi
  Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Toke Høiland-Jørgensen, Jesper Dangaard Brouer,
	netfilter-devel, netdev

On Sun, Feb 20, 2022 at 07:18:02PM +0530, Kumar Kartikeya Dwivedi wrote:
>  static int check_mem_access(struct bpf_verifier_env *env, int insn_idx, u32 regno,
>  			    int off, int bpf_size, enum bpf_access_type t,
> -			    int value_regno, bool strict_alignment_once)
> +			    int value_regno, bool strict_alignment_once,
> +			    struct bpf_reg_state *atomic_load_reg)

No new side effects please.
value_regno is not pretty already.
At least it's known ugliness that we need to clean up one day.

>  static int check_atomic(struct bpf_verifier_env *env, int insn_idx, struct bpf_insn *insn)
>  {
> +	struct bpf_reg_state atomic_load_reg;
>  	int load_reg;
>  	int err;
>  
> +	__mark_reg_unknown(env, &atomic_load_reg);
> +
>  	switch (insn->imm) {
>  	case BPF_ADD:
>  	case BPF_ADD | BPF_FETCH:
> @@ -4813,6 +4894,7 @@ static int check_atomic(struct bpf_verifier_env *env, int insn_idx, struct bpf_i
>  		else
>  			load_reg = insn->src_reg;
>  
> +		atomic_load_reg = *reg_state(env, load_reg);
>  		/* check and record load of old value */
>  		err = check_reg_arg(env, load_reg, DST_OP);
>  		if (err)
> @@ -4825,20 +4907,21 @@ static int check_atomic(struct bpf_verifier_env *env, int insn_idx, struct bpf_i
>  	}
>  
>  	/* Check whether we can read the memory, with second call for fetch
> -	 * case to simulate the register fill.
> +	 * case to simulate the register fill, which also triggers checks
> +	 * for manipulation of BTF ID pointers embedded in BPF maps.
>  	 */
>  	err = check_mem_access(env, insn_idx, insn->dst_reg, insn->off,
> -			       BPF_SIZE(insn->code), BPF_READ, -1, true);
> +			       BPF_SIZE(insn->code), BPF_READ, -1, true, NULL);
>  	if (!err && load_reg >= 0)
>  		err = check_mem_access(env, insn_idx, insn->dst_reg, insn->off,
>  				       BPF_SIZE(insn->code), BPF_READ, load_reg,
> -				       true);
> +				       true, load_reg >= 0 ? &atomic_load_reg : NULL);

Special xchg logic should be done outside of check_mem_access()
instead of hidden by layers of calls.

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH bpf-next v1 08/15] bpf: Adapt copy_map_value for multiple offset case
  2022-02-20 13:48 ` [PATCH bpf-next v1 08/15] bpf: Adapt copy_map_value for multiple offset case Kumar Kartikeya Dwivedi
@ 2022-02-22  7:04   ` Alexei Starovoitov
  2022-02-23  3:13     ` Kumar Kartikeya Dwivedi
  0 siblings, 1 reply; 38+ messages in thread
From: Alexei Starovoitov @ 2022-02-22  7:04 UTC (permalink / raw)
  To: Kumar Kartikeya Dwivedi
  Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Toke Høiland-Jørgensen, Jesper Dangaard Brouer,
	netfilter-devel, netdev

On Sun, Feb 20, 2022 at 07:18:06PM +0530, Kumar Kartikeya Dwivedi wrote:
> The changes in this patch deserve closer look, so it has been split into
> its own independent patch. While earlier we just had to skip two objects
> at most while copying in and out of map, now we have potentially many
> objects (at most 8 + 2 = 10, due to the BPF_MAP_VALUE_OFF_MAX limit).
> 
> Hence, divide the copy_map_value function into an inlined fast path and
> function call to slowpath. The slowpath handles the case of > 3 offsets,
> while we handle the most common cases (0, 1, 2, or 3 offsets) in the
> inline function itself.
> 
> In copy_map_value_slow, we use 11 offsets, just to make the for loop
> that copies the value free of edge cases for the last offset, by using
> map->value_size as final offset to subtract remaining area to copy from.
> 
> Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
> ---
>  include/linux/bpf.h  | 43 +++++++++++++++++++++++++++++++---
>  kernel/bpf/syscall.c | 55 ++++++++++++++++++++++++++++++++++++++++++++
>  2 files changed, 95 insertions(+), 3 deletions(-)
> 
> diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> index ae599aaf8d4c..5d845ca02eba 100644
> --- a/include/linux/bpf.h
> +++ b/include/linux/bpf.h
> @@ -253,12 +253,22 @@ static inline void check_and_init_map_value(struct bpf_map *map, void *dst)
>  		memset(dst + map->spin_lock_off, 0, sizeof(struct bpf_spin_lock));
>  	if (unlikely(map_value_has_timer(map)))
>  		memset(dst + map->timer_off, 0, sizeof(struct bpf_timer));
> +	if (unlikely(map_value_has_ptr_to_btf_id(map))) {
> +		struct bpf_map_value_off *tab = map->ptr_off_tab;
> +		int i;
> +
> +		for (i = 0; i < tab->nr_off; i++)
> +			*(u64 *)(dst + tab->off[i].offset) = 0;
> +	}
>  }
>  
> +void copy_map_value_slow(struct bpf_map *map, void *dst, void *src, u32 s_off,
> +			 u32 s_sz, u32 t_off, u32 t_sz);
> +
>  /* copy everything but bpf_spin_lock and bpf_timer. There could be one of each. */
>  static inline void copy_map_value(struct bpf_map *map, void *dst, void *src)
>  {
> -	u32 s_off = 0, s_sz = 0, t_off = 0, t_sz = 0;
> +	u32 s_off = 0, s_sz = 0, t_off = 0, t_sz = 0, p_off = 0, p_sz = 0;
>  
>  	if (unlikely(map_value_has_spin_lock(map))) {
>  		s_off = map->spin_lock_off;
> @@ -268,13 +278,40 @@ static inline void copy_map_value(struct bpf_map *map, void *dst, void *src)
>  		t_off = map->timer_off;
>  		t_sz = sizeof(struct bpf_timer);
>  	}
> +	/* Multiple offset case is slow, offload to function */
> +	if (unlikely(map_value_has_ptr_to_btf_id(map))) {
> +		struct bpf_map_value_off *tab = map->ptr_off_tab;
> +
> +		/* Inline the likely common case */
> +		if (likely(tab->nr_off == 1)) {
> +			p_off = tab->off[0].offset;
> +			p_sz = sizeof(u64);
> +		} else {
> +			copy_map_value_slow(map, dst, src, s_off, s_sz, t_off, t_sz);
> +			return;
> +		}
> +	}
> +
> +	if (unlikely(s_sz || t_sz || p_sz)) {
> +		/* The order is p_off, t_off, s_off, use insertion sort */
>  
> -	if (unlikely(s_sz || t_sz)) {
> +		if (t_off < p_off || !t_sz) {
> +			swap(t_off, p_off);
> +			swap(t_sz, p_sz);
> +		}
>  		if (s_off < t_off || !s_sz) {
>  			swap(s_off, t_off);
>  			swap(s_sz, t_sz);
> +			if (t_off < p_off || !t_sz) {
> +				swap(t_off, p_off);
> +				swap(t_sz, p_sz);
> +			}
>  		}
> -		memcpy(dst, src, t_off);
> +
> +		memcpy(dst, src, p_off);
> +		memcpy(dst + p_off + p_sz,
> +		       src + p_off + p_sz,
> +		       t_off - p_off - p_sz);
>  		memcpy(dst + t_off + t_sz,
>  		       src + t_off + t_sz,
>  		       s_off - t_off - t_sz);
> diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
> index beb96866f34d..83d71d6912f5 100644
> --- a/kernel/bpf/syscall.c
> +++ b/kernel/bpf/syscall.c
> @@ -30,6 +30,7 @@
>  #include <linux/pgtable.h>
>  #include <linux/bpf_lsm.h>
>  #include <linux/poll.h>
> +#include <linux/sort.h>
>  #include <linux/bpf-netns.h>
>  #include <linux/rcupdate_trace.h>
>  #include <linux/memcontrol.h>
> @@ -230,6 +231,60 @@ static int bpf_map_update_value(struct bpf_map *map, struct fd f, void *key,
>  	return err;
>  }
>  
> +static int copy_map_value_cmp(const void *_a, const void *_b)
> +{
> +	const u32 a = *(const u32 *)_a;
> +	const u32 b = *(const u32 *)_b;
> +
> +	/* We only need to sort based on offset */
> +	if (a < b)
> +		return -1;
> +	else if (a > b)
> +		return 1;
> +	return 0;
> +}
> +
> +void copy_map_value_slow(struct bpf_map *map, void *dst, void *src, u32 s_off,
> +			 u32 s_sz, u32 t_off, u32 t_sz)
> +{
> +	struct bpf_map_value_off *tab = map->ptr_off_tab; /* already set to non-NULL */
> +	/* 3 = 2 for bpf_timer, bpf_spin_lock, 1 for map->value_size sentinel */
> +	struct {
> +		u32 off;
> +		u32 sz;
> +	} off_arr[BPF_MAP_VALUE_OFF_MAX + 3];
> +	int i, cnt = 0;
> +
> +	/* Reconsider stack usage when bumping BPF_MAP_VALUE_OFF_MAX */
> +	BUILD_BUG_ON(sizeof(off_arr) != 88);
> +
> +	for (i = 0; i < tab->nr_off; i++) {
> +		off_arr[cnt].off = tab->off[i].offset;
> +		off_arr[cnt++].sz = sizeof(u64);
> +	}
> +	if (s_sz) {
> +		off_arr[cnt].off = s_off;
> +		off_arr[cnt++].sz = s_sz;
> +	}
> +	if (t_sz) {
> +		off_arr[cnt].off = t_off;
> +		off_arr[cnt++].sz = t_sz;
> +	}
> +	off_arr[cnt].off = map->value_size;
> +
> +	sort(off_arr, cnt, sizeof(off_arr[0]), copy_map_value_cmp, NULL);

Ouch. sort every time we need to copy map value?
sort it once please. 88 bytes in a map are worth it.
Especially since "slow" version will trigger with just 2 kptrs.
(if I understand this correctly).
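
A rough sketch of the one-time sort (all names below are illustrative, not
the actual implementation): the merged (offset, size) table is built and
sorted once, when the map value layout is known, so copy_map_value() only
has to walk it afterwards.

	struct bpf_map_off_desc {
		u32 off;
		u8 sz;
	};

	/* kptr offsets + bpf_spin_lock + bpf_timer */
	struct bpf_map_off_arr {
		u32 cnt;
		struct bpf_map_off_desc field[BPF_MAP_VALUE_OFF_MAX + 2];
	};

	static int bpf_map_off_cmp(const void *a, const void *b)
	{
		const struct bpf_map_off_desc *x = a, *y = b;

		return x->off < y->off ? -1 : x->off > y->off ? 1 : 0;
	}

	/* Called once at map creation time, e.g. right after ptr_off_tab,
	 * spin_lock_off and timer_off have been parsed from the map BTF.
	 */
	static void bpf_map_sort_off_arr(struct bpf_map_off_arr *arr)
	{
		sort(arr->field, arr->cnt, sizeof(arr->field[0]),
		     bpf_map_off_cmp, NULL);
	}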

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH bpf-next v1 04/15] bpf: Allow storing referenced PTR_TO_BTF_ID in map
  2022-02-22  6:53   ` Alexei Starovoitov
@ 2022-02-22  7:10     ` Kumar Kartikeya Dwivedi
  2022-02-22 16:20       ` Alexei Starovoitov
  0 siblings, 1 reply; 38+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2022-02-22  7:10 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Toke Høiland-Jørgensen, Jesper Dangaard Brouer,
	netfilter-devel, netdev

On Tue, Feb 22, 2022 at 12:23:49PM IST, Alexei Starovoitov wrote:
> On Sun, Feb 20, 2022 at 07:18:02PM +0530, Kumar Kartikeya Dwivedi wrote:
> >  static int check_mem_access(struct bpf_verifier_env *env, int insn_idx, u32 regno,
> >  			    int off, int bpf_size, enum bpf_access_type t,
> > -			    int value_regno, bool strict_alignment_once)
> > +			    int value_regno, bool strict_alignment_once,
> > +			    struct bpf_reg_state *atomic_load_reg)
>
> No new side effects please.
> value_regno is not pretty already.
> At least its known ugliness that we need to clean up one day.
>
> >  static int check_atomic(struct bpf_verifier_env *env, int insn_idx, struct bpf_insn *insn)
> >  {
> > +	struct bpf_reg_state atomic_load_reg;
> >  	int load_reg;
> >  	int err;
> >
> > +	__mark_reg_unknown(env, &atomic_load_reg);
> > +
> >  	switch (insn->imm) {
> >  	case BPF_ADD:
> >  	case BPF_ADD | BPF_FETCH:
> > @@ -4813,6 +4894,7 @@ static int check_atomic(struct bpf_verifier_env *env, int insn_idx, struct bpf_i
> >  		else
> >  			load_reg = insn->src_reg;
> >
> > +		atomic_load_reg = *reg_state(env, load_reg);
> >  		/* check and record load of old value */
> >  		err = check_reg_arg(env, load_reg, DST_OP);
> >  		if (err)
> > @@ -4825,20 +4907,21 @@ static int check_atomic(struct bpf_verifier_env *env, int insn_idx, struct bpf_i
> >  	}
> >
> >  	/* Check whether we can read the memory, with second call for fetch
> > -	 * case to simulate the register fill.
> > +	 * case to simulate the register fill, which also triggers checks
> > +	 * for manipulation of BTF ID pointers embedded in BPF maps.
> >  	 */
> >  	err = check_mem_access(env, insn_idx, insn->dst_reg, insn->off,
> > -			       BPF_SIZE(insn->code), BPF_READ, -1, true);
> > +			       BPF_SIZE(insn->code), BPF_READ, -1, true, NULL);
> >  	if (!err && load_reg >= 0)
> >  		err = check_mem_access(env, insn_idx, insn->dst_reg, insn->off,
> >  				       BPF_SIZE(insn->code), BPF_READ, load_reg,
> > -				       true);
> > +				       true, load_reg >= 0 ? &atomic_load_reg : NULL);
>
> > Special xchg logic should be done outside of check_mem_access()
> instead of hidden by layers of calls.

Right, it's ugly, but if we don't capture the reg state before that
check_reg_arg(env, load_reg, DST_OP), it's not possible to see the actual
PTR_TO_BTF_ID being moved into the map, since check_reg_arg will do a
mark_reg_unknown for value_regno. Any other ideas on what I can do?

37086bfdc737 ("bpf: Propagate stack bounds to registers in atomics w/ BPF_FETCH")
changed the order of check_mem_access and DST_OP check_reg_arg.

--
Kartikeya

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH bpf-next v1 00/15] Introduce typed pointer support in BPF maps
  2022-02-22  6:05 ` [PATCH bpf-next v1 00/15] Introduce typed pointer support in BPF maps Song Liu
@ 2022-02-22  8:21   ` Kumar Kartikeya Dwivedi
  2022-02-23  7:29     ` Song Liu
  0 siblings, 1 reply; 38+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2022-02-22  8:21 UTC (permalink / raw)
  To: Song Liu
  Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Toke Høiland-Jørgensen, Jesper Dangaard Brouer,
	netfilter-devel, Networking

On Tue, Feb 22, 2022 at 11:35:14AM IST, Song Liu wrote:
> On Sun, Feb 20, 2022 at 5:48 AM Kumar Kartikeya Dwivedi
> <memxor@gmail.com> wrote:
> >
> > Introduction
> > ------------
> >
> > This set enables storing pointers of a certain type in BPF map, and extends the
> > verifier to enforce type safety and lifetime correctness properties.
> >
> > The infrastructure being added is generic enough for allowing storing any kind
> > of pointers whose type is available using BTF (user or kernel) in the future
> > (e.g. strongly typed memory allocation in BPF program), which are internally
> > tracked in the verifier as PTR_TO_BTF_ID, but for now the series limits them to
> > four kinds of pointers obtained from the kernel.
> >
> > Obviously, use of this feature depends on map BTF.
> >
> > 1. Unreferenced kernel pointer
> >
> > In this case, there are very few restrictions. The pointer type being stored
> > must match the type declared in the map value. However, such a pointer when
> > loaded from the map can only be dereferenced, but not passed to any in-kernel
> > helpers or kernel functions available to the program. This is because while the
> > verifier's exception handling mechanism converts BPF_LDX to PROBE_MEM loads,
> > which are then handled specially by the JIT implementation, the same liberty is
> > not available to accesses inside the kernel. The pointer by the time it is
> > passed into a helper has no lifetime related guarantees about the object it is
> > pointing to, and may well be referencing invalid memory.
> >
> > 2. Referenced kernel pointer
> >
> > This case imposes a lot of restrictions on the programmer, to ensure safety. To
> > transfer the ownership of a reference in the BPF program to the map, the user
> > must use the BPF_XCHG instruction, which returns the old pointer contained in
> > the map, as an acquired reference, and releases verifier state for the
> > referenced pointer being exchanged, as it moves into the map.
> >
> > This is a normal PTR_TO_BTF_ID that can be used with in-kernel helpers and kernel
> > functions callable by the program.
> >
> > However, if BPF_LDX is used to load a referenced pointer from the map, it is
> > still not permitted to pass it to in-kernel helpers or kernel functions. To
> > obtain a reference usable with helpers, the user must invoke a kfunc helper
> > which returns a usable reference (which also must be eventually released before
> > BPF_EXIT, or moved into a map).
> >
> > Since the load of the pointer (preserving data dependency ordering) must happen
> > inside the RCU read section, the kfunc helper will take a pointer to the map
> > value, which must point to the actual pointer of the object whose reference is
> > to be raised. The type will be verified from the BTF information of the kfunc,
> > as the prototype must be:
> >
> >         T *func(T **, ... /* other arguments */);
> >
> > Then, the verifier checks whether pointer at offset of the map value points to
> > the type T, and permits the call.
> >
> > This convention is followed so that such helpers may also be called from
> > sleepable BPF programs, where RCU read lock is not necessarily held in the BPF
> > program context, hence necessitating the need to pass in a pointer to the actual
> > pointer to perform the load inside the RCU read section.
> >
> > 3. per-CPU kernel pointer
> >
> > These have very few restrictions. The user can store a PTR_TO_PERCPU_BTF_ID
> > into the map, and when loading from the map, they must NULL check it before use,
> > because while a non-zero value stored into the map should always be valid, it can
> > still be reset to zero on updates. After checking it to be non-NULL, it can be
> > passed to bpf_per_cpu_ptr and bpf_this_cpu_ptr helpers to obtain a PTR_TO_BTF_ID
> > to underlying per-CPU object.
> >
> > It is also permitted to write 0 and reset the value.
> >
> > 4. Userspace pointer
> >
> > The verifier recently gained support for annotating BTF with __user type tag.
> > This indicates pointers pointing to memory which must be read using the
> > bpf_probe_read_user helper to ensure correct results. The set also permits
> > storing them into the BPF map, and ensures user pointer cannot be stored
> > into other kinds of pointers mentioned above.
> >
> > When loaded from the map, the only thing that can be done is to pass this
> > pointer to bpf_probe_read_user. No dereference is allowed.
> >
>
> I guess I missed some context here. Could you please provide some reference
> to the use cases of these features?
>

The common use case is caching references to objects inside BPF maps, to avoid
costly lookups, and being able to raise the reference once for the duration of
a program invocation when passing the object to multiple helpers (to avoid
further re-lookups). Storing references also allows you to control object lifetime.

One other use case is enabling xdp_frame queueing in XDP using this, but that
still needs some integration work after this lands, so it's a bit early to
comment on the specifics.

Other than that, I think Alexei already mentioned this could be easily extended
to do memory allocation returning a PTR_TO_BTF_ID in a BPF program [0] in the
future.

  [0]: https://lore.kernel.org/bpf/20220216230615.po6huyrgkswk7u67@ast-mbp.dhcp.thefacebook.com

> For Unreferenced kernel pointer and userspace pointer, it seems that there is
> no guarantee the pointer will still be valid during access (we only know it is
> valid when it is stored in the map). Is this correct?
>

That is correct. In the case of unreferenced and referenced kernel pointers,
when you do a BPF_LDX, both are marked as PTR_UNTRUSTED, and it is not allowed
to pass them into helpers or kfuncs, because from that point onwards we cannot
claim that the object is still alive when the pointer is used later. Still,
dereference is permitted because the verifier handles faults for bad accesses
using PROBE_MEM conversion for PTR_TO_BTF_ID loads in convert_ctx_accesses
(which the JIT later detects to build the exception table used by the
exception handler).

When reading an unreferenced pointer, in some cases you know that the pointer
will stay valid, so you can just store it in the map, load it later, and access
it directly; this imposes very few restrictions.
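
As a rough illustration of that unreferenced case (the "kernel.bpf.btf_id"
type tag is the one used in this version of the series; the map layout,
program type and field names are made up for the example):

	#include <vmlinux.h>
	#include <bpf/bpf_helpers.h>

	#define __kptr __attribute__((btf_type_tag("kernel.bpf.btf_id")))

	struct map_value {
		struct task_struct __kptr *task;
	};

	struct {
		__uint(type, BPF_MAP_TYPE_ARRAY);
		__uint(max_entries, 1);
		__type(key, int);
		__type(value, struct map_value);
	} ptr_map SEC(".maps");

	SEC("tp_btf/sched_switch")
	int cache_and_read(void *ctx)
	{
		struct map_value *v;
		int key = 0;

		v = bpf_map_lookup_elem(&ptr_map, &key);
		if (!v)
			return 0;
		/* Plain store of a matching PTR_TO_BTF_ID is allowed for the
		 * unreferenced case. */
		v->task = bpf_get_current_task_btf();
		/* Plain load + dereference; a stale pointer is handled by the
		 * verifier's PROBE_MEM exception mechanism mentioned above. */
		if (v->task)
			bpf_printk("cached pid %d", v->task->pid);
		return 0;
	}

	char LICENSE[] SEC("license") = "GPL";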

For the referenced case, BPF_LDX marking it as PTR_UNTRUSTED arguably makes it
a lot less useful, because even if the BPF program already holds a reference,
just to make sure I _read valid data_ I still have to use the kptr_get style
helper to raise and put a reference, to ensure the object is alive while it is
accessed.

So in that case, for RCU protected objects, the actual release should still
wait for the BPF program to hit BPF_EXIT, but in other cases, like sleepable
programs, or objects whose lifetime is managed by the refcount alone, you can
also detect the presence of a writer in another BPF program (i.e. detect that
the pointer was xchg'd out during our access) using a seqlock style scheme:

	v = bpf_map_lookup_elem(&map, ...);
	if (!v)
		return 0;
	seq_begin = v->seq;
	atomic_thread_fence(memory_order_acquire); // A
	<do access>
	atomic_thread_fence(memory_order_acquire); // B
	seq_end = v->seq;
	if (seq_begin & 1 || seq_begin != seq_end)
		goto bad_read;
	<use data>

Of course, barriers are not yet in BPF, but you get the idea (it should work on
x86). The updater BPF program will increment v->seq before and after the xchg,
ensuring proper ordering. v->seq starts as 0, so an odd seq indicates that a
writer update is in progress.

This would allow you to avoid raising the refcount, while still ensuring that
the object was valid for as long as it was accessed between A and B. Even if
raising an uncontended refcount is cheap, this is much cheaper.
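
The updater side described above would then look roughly like this (again
only a sketch mirroring the pseudocode above; the atomic increment, the
fences and the release of the old reference are all illustrative):

	/* updater/writer BPF program */
	v = bpf_map_lookup_elem(&map, ...);
	if (!v)
		return 0;
	__sync_fetch_and_add(&v->seq, 1);           /* seq becomes odd: update in progress */
	atomic_thread_fence(memory_order_release);  /* C */
	old = xchg(&v->ptr, new_ptr);               /* BPF_XCHG moves the new reference into the map */
	atomic_thread_fence(memory_order_release);  /* D */
	__sync_fetch_and_add(&v->seq, 1);           /* seq becomes even again */
	if (old)
		release_ref(old);                   /* hypothetical release kfunc for the old reference */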

The case of a userspace pointer is different: it sets the MEM_USER flag, so the
only useful thing to do is call bpf_probe_read_user; you can't even dereference
it. You are right that in most cases a userspace pointer won't be useful, but
for some cooperative cases between a BPF program and a userspace thread, it can
act as a way to share certain thread-local areas/userspace memory, which the
BPF program can then store keyed by the task_struct *, where using a BPF map to
share memory is not always possible.
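
A rough sketch of that cooperative pattern ("user" is the type tag behind the
__user annotation; the map layout, key choice and names are made up for the
example):

	#include <vmlinux.h>
	#include <bpf/bpf_helpers.h>

	#define __uptr __attribute__((btf_type_tag("user")))

	struct thread_area {
		u64 __uptr *scratch;	/* userspace address registered by the thread */
	};

	struct {
		__uint(type, BPF_MAP_TYPE_HASH);
		__uint(max_entries, 1024);
		__type(key, u32);	/* e.g. thread id of the cooperating thread */
		__type(value, struct thread_area);
	} shared_areas SEC(".maps");

	SEC("tp_btf/sched_switch")
	int read_thread_area(void *ctx)
	{
		struct thread_area *a;
		u32 tid;
		u64 val;

		tid = (u32)bpf_get_current_pid_tgid();
		a = bpf_map_lookup_elem(&shared_areas, &tid);
		if (!a || !a->scratch)
			return 0;
		/* No dereference allowed; the only valid use is probe-reading. */
		if (bpf_probe_read_user(&val, sizeof(val), a->scratch))
			return 0;
		bpf_printk("user value %llu", val);
		return 0;
	}

	char LICENSE[] SEC("license") = "GPL";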

> Thanks,
> Song
>
> [...]

--
Kartikeya

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH bpf-next v1 04/15] bpf: Allow storing referenced PTR_TO_BTF_ID in map
  2022-02-22  7:10     ` Kumar Kartikeya Dwivedi
@ 2022-02-22 16:20       ` Alexei Starovoitov
  2022-02-23  3:04         ` Kumar Kartikeya Dwivedi
  0 siblings, 1 reply; 38+ messages in thread
From: Alexei Starovoitov @ 2022-02-22 16:20 UTC (permalink / raw)
  To: Kumar Kartikeya Dwivedi
  Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Toke Høiland-Jørgensen, Jesper Dangaard Brouer,
	netfilter-devel, Network Development

On Mon, Feb 21, 2022 at 11:10 PM Kumar Kartikeya Dwivedi
<memxor@gmail.com> wrote:
>
> On Tue, Feb 22, 2022 at 12:23:49PM IST, Alexei Starovoitov wrote:
> > On Sun, Feb 20, 2022 at 07:18:02PM +0530, Kumar Kartikeya Dwivedi wrote:
> > >  static int check_mem_access(struct bpf_verifier_env *env, int insn_idx, u32 regno,
> > >                         int off, int bpf_size, enum bpf_access_type t,
> > > -                       int value_regno, bool strict_alignment_once)
> > > +                       int value_regno, bool strict_alignment_once,
> > > +                       struct bpf_reg_state *atomic_load_reg)
> >
> > No new side effects please.
> > value_regno is not pretty already.
> > At least its known ugliness that we need to clean up one day.
> >
> > >  static int check_atomic(struct bpf_verifier_env *env, int insn_idx, struct bpf_insn *insn)
> > >  {
> > > +   struct bpf_reg_state atomic_load_reg;
> > >     int load_reg;
> > >     int err;
> > >
> > > +   __mark_reg_unknown(env, &atomic_load_reg);
> > > +
> > >     switch (insn->imm) {
> > >     case BPF_ADD:
> > >     case BPF_ADD | BPF_FETCH:
> > > @@ -4813,6 +4894,7 @@ static int check_atomic(struct bpf_verifier_env *env, int insn_idx, struct bpf_i
> > >             else
> > >                     load_reg = insn->src_reg;
> > >
> > > +           atomic_load_reg = *reg_state(env, load_reg);
> > >             /* check and record load of old value */
> > >             err = check_reg_arg(env, load_reg, DST_OP);
> > >             if (err)
> > > @@ -4825,20 +4907,21 @@ static int check_atomic(struct bpf_verifier_env *env, int insn_idx, struct bpf_i
> > >     }
> > >
> > >     /* Check whether we can read the memory, with second call for fetch
> > > -    * case to simulate the register fill.
> > > +    * case to simulate the register fill, which also triggers checks
> > > +    * for manipulation of BTF ID pointers embedded in BPF maps.
> > >      */
> > >     err = check_mem_access(env, insn_idx, insn->dst_reg, insn->off,
> > > -                          BPF_SIZE(insn->code), BPF_READ, -1, true);
> > > +                          BPF_SIZE(insn->code), BPF_READ, -1, true, NULL);
> > >     if (!err && load_reg >= 0)
> > >             err = check_mem_access(env, insn_idx, insn->dst_reg, insn->off,
> > >                                    BPF_SIZE(insn->code), BPF_READ, load_reg,
> > > -                                  true);
> > > +                                  true, load_reg >= 0 ? &atomic_load_reg : NULL);
> >
> > Special xchg logic should be down outside of check_mem_access()
> > instead of hidden by layers of calls.
>
> Right, it's ugly, but if we don't capture the reg state before that
> check_reg_arg(env, load_reg, DST_OP), it's not possible to see the actual
> PTR_TO_BTF_ID being moved into the map, since check_reg_arg will do a
> mark_reg_unknown for value_regno. Any other ideas on what I can do?
>
> 37086bfdc737 ("bpf: Propagate stack bounds to registers in atomics w/ BPF_FETCH")
> changed the order of check_mem_access and DST_OP check_reg_arg.

That highlights my point that side effects are bad.
That commit tries to work around that behavior and makes things
harder to extend like you found out with xchg logic.
Another option would be to add bpf_kptr_xchg() helper
instead of dealing with insn. It will be tiny bit slower,
but it will work on all architectures. While xchg bpf jit is
on x86,s390,mips so far.
We need to think more on how to refactor check_mem_access without
digging ourselves into an even bigger hole.

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH bpf-next v1 04/15] bpf: Allow storing referenced PTR_TO_BTF_ID in map
  2022-02-22 16:20       ` Alexei Starovoitov
@ 2022-02-23  3:04         ` Kumar Kartikeya Dwivedi
  2022-02-23 21:52           ` Alexei Starovoitov
  0 siblings, 1 reply; 38+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2022-02-23  3:04 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Toke Høiland-Jørgensen, Jesper Dangaard Brouer,
	netfilter-devel, Network Development

On Tue, Feb 22, 2022 at 09:50:00PM IST, Alexei Starovoitov wrote:
> On Mon, Feb 21, 2022 at 11:10 PM Kumar Kartikeya Dwivedi
> <memxor@gmail.com> wrote:
> >
> > On Tue, Feb 22, 2022 at 12:23:49PM IST, Alexei Starovoitov wrote:
> > > On Sun, Feb 20, 2022 at 07:18:02PM +0530, Kumar Kartikeya Dwivedi wrote:
> > > >  static int check_mem_access(struct bpf_verifier_env *env, int insn_idx, u32 regno,
> > > >                         int off, int bpf_size, enum bpf_access_type t,
> > > > -                       int value_regno, bool strict_alignment_once)
> > > > +                       int value_regno, bool strict_alignment_once,
> > > > +                       struct bpf_reg_state *atomic_load_reg)
> > >
> > > No new side effects please.
> > > value_regno is not pretty already.
> > > At least its known ugliness that we need to clean up one day.
> > >
> > > >  static int check_atomic(struct bpf_verifier_env *env, int insn_idx, struct bpf_insn *insn)
> > > >  {
> > > > +   struct bpf_reg_state atomic_load_reg;
> > > >     int load_reg;
> > > >     int err;
> > > >
> > > > +   __mark_reg_unknown(env, &atomic_load_reg);
> > > > +
> > > >     switch (insn->imm) {
> > > >     case BPF_ADD:
> > > >     case BPF_ADD | BPF_FETCH:
> > > > @@ -4813,6 +4894,7 @@ static int check_atomic(struct bpf_verifier_env *env, int insn_idx, struct bpf_i
> > > >             else
> > > >                     load_reg = insn->src_reg;
> > > >
> > > > +           atomic_load_reg = *reg_state(env, load_reg);
> > > >             /* check and record load of old value */
> > > >             err = check_reg_arg(env, load_reg, DST_OP);
> > > >             if (err)
> > > > @@ -4825,20 +4907,21 @@ static int check_atomic(struct bpf_verifier_env *env, int insn_idx, struct bpf_i
> > > >     }
> > > >
> > > >     /* Check whether we can read the memory, with second call for fetch
> > > > -    * case to simulate the register fill.
> > > > +    * case to simulate the register fill, which also triggers checks
> > > > +    * for manipulation of BTF ID pointers embedded in BPF maps.
> > > >      */
> > > >     err = check_mem_access(env, insn_idx, insn->dst_reg, insn->off,
> > > > -                          BPF_SIZE(insn->code), BPF_READ, -1, true);
> > > > +                          BPF_SIZE(insn->code), BPF_READ, -1, true, NULL);
> > > >     if (!err && load_reg >= 0)
> > > >             err = check_mem_access(env, insn_idx, insn->dst_reg, insn->off,
> > > >                                    BPF_SIZE(insn->code), BPF_READ, load_reg,
> > > > -                                  true);
> > > > +                                  true, load_reg >= 0 ? &atomic_load_reg : NULL);
> > >
> > > Special xchg logic should be down outside of check_mem_access()
> > > instead of hidden by layers of calls.
> >
> > Right, it's ugly, but if we don't capture the reg state before that
> > check_reg_arg(env, load_reg, DST_OP), it's not possible to see the actual
> > PTR_TO_BTF_ID being moved into the map, since check_reg_arg will do a
> > mark_reg_unknown for value_regno. Any other ideas on what I can do?
> >
> > 37086bfdc737 ("bpf: Propagate stack bounds to registers in atomics w/ BPF_FETCH")
> > changed the order of check_mem_access and DST_OP check_reg_arg.
>
> That highlights my point that side effects are bad.
> That commit tries to work around that behavior and makes things
> harder to extend like you found out with xchg logic.
> Another option would be to add bpf_kptr_xchg() helper
> instead of dealing with insn. It will be tiny bit slower,
> but it will work on all architectures. While xchg bpf jit is
> on x86,s390,mips so far.

Right, but kfunc is currently limited to x86, which is required to obtain a
refcounted PTR_TO_BTF_ID that you can move into the map, so it wouldn't make
much of a difference.

> We need to think more on how to refactor check_mem_acess without
> digging ourselves into an even bigger hole.

So I'm ok with working on untangling check_mem_access as a follow up, but for
now should we go forward with how it is? Just looking at it yesterday makes me
think it's going to require a fair amount of refactoring and discussion.

Also, do you have any ideas on how to change it? Do you want it to work like how
is_valid_access callbacks work? So passing something like a bpf_insn_access_aux
into the call, where it sets how it'd like to update the register, and then
actual updates take place in caller context?
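
(For reference, I mean something in the spirit of the existing struct, which
today looks roughly like:

	struct bpf_insn_access_aux {
		enum bpf_reg_type reg_type;
		union {
			int ctx_field_size;
			struct {
				struct btf *btf;
				u32 btf_id;
			};
		};
		struct bpf_verifier_log *log; /* for verbose logs */
	};

i.e. the callee fills in how the register should look, and the caller applies
it.)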

--
Kartikeya

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH bpf-next v1 01/15] bpf: Factor out fd returning from bpf_btf_find_by_name_kind
  2022-02-22  5:28   ` Alexei Starovoitov
@ 2022-02-23  3:05     ` Kumar Kartikeya Dwivedi
  0 siblings, 0 replies; 38+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2022-02-23  3:05 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Toke Høiland-Jørgensen, Jesper Dangaard Brouer,
	netfilter-devel, netdev

On Tue, Feb 22, 2022 at 10:58:11AM IST, Alexei Starovoitov wrote:
> On Sun, Feb 20, 2022 at 07:17:59PM +0530, Kumar Kartikeya Dwivedi wrote:
> > In next few patches, we need a helper that searches all kernel BTFs
> > (vmlinux and module BTFs), and finds the type denoted by 'name' and
> > 'kind'. Turns out bpf_btf_find_by_name_kind already does the same thing,
> > but it instead returns a BTF ID and optionally fd (if module BTF). This
> > is used for relocating ksyms in BPF loader code (bpftool gen skel -L).
> >
> > We extract the core code out into a new helper
> > btf_find_by_name_kind_all, which returns the BTF ID and BTF pointer in
> > an out parameter. The reference for the returned BTF pointer is only
> > bumped if it is a module BTF, this needs to be kept in mind when using
> > this helper.
> >
> > Hence, the user must release the BTF reference iff btf_is_module is
> > true, otherwise transfer the ownership to e.g. an fd.
> >
> > In case of the helper, the fd is only allocated for module BTFs, so no
> > extra handling for btf_vmlinux case is required.
> >
> > Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
> > ---
> >  kernel/bpf/btf.c | 47 +++++++++++++++++++++++++++++++----------------
> >  1 file changed, 31 insertions(+), 16 deletions(-)
> >
> > diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c
> > index 2c4c5dbe2abe..3645d8c14a18 100644
> > --- a/kernel/bpf/btf.c
> > +++ b/kernel/bpf/btf.c
> > @@ -6545,16 +6545,10 @@ static struct btf *btf_get_module_btf(const struct module *module)
> >  	return btf;
> >  }
> >
> > -BPF_CALL_4(bpf_btf_find_by_name_kind, char *, name, int, name_sz, u32, kind, int, flags)
> > +static s32 btf_find_by_name_kind_all(const char *name, u32 kind, struct btf **btfp)
>
> The name is getting too long.
> How about bpf_find_btf_id() ?
>
> >  {
> >  	struct btf *btf;
> > -	long ret;
> > -
> > -	if (flags)
> > -		return -EINVAL;
> > -
> > -	if (name_sz <= 1 || name[name_sz - 1])
> > -		return -EINVAL;
> > +	s32 ret;
> >
> >  	btf = bpf_get_btf_vmlinux();
> >  	if (IS_ERR(btf))
> > @@ -6580,19 +6574,40 @@ BPF_CALL_4(bpf_btf_find_by_name_kind, char *, name, int, name_sz, u32, kind, int
> >  			spin_unlock_bh(&btf_idr_lock);
> >  			ret = btf_find_by_name_kind(mod_btf, name, kind);
> >  			if (ret > 0) {
> > -				int btf_obj_fd;
> > -
> > -				btf_obj_fd = __btf_new_fd(mod_btf);
> > -				if (btf_obj_fd < 0) {
> > -					btf_put(mod_btf);
> > -					return btf_obj_fd;
> > -				}
> > -				return ret | (((u64)btf_obj_fd) << 32);
> > +				*btfp = mod_btf;
> > +				return ret;
> >  			}
> >  			spin_lock_bh(&btf_idr_lock);
> >  			btf_put(mod_btf);
> >  		}
> >  		spin_unlock_bh(&btf_idr_lock);
> > +	} else {
> > +		*btfp = btf;
> > +	}
>
> Since we're refactoring let's drop the indent.
> How about
>   if (ret > 0) {
>     *btfp = btf;
>     return ret;
>   }
>   idr_for_each_entry().
>
> and move the func right after btf_find_by_name_kind(),
> so that later patch doesn't need to do:
> static s32 bpf_find_btf_id();
> Eventually this helper might become global with this name.
>

Ok, will change.

> Also may be do btf_get() for vmlinux_btf too?
> In case it reduces 'if (btf_is_module())' checks.

Right, should also change this for btf_get_module_btf then, to make things
consistent.

--
Kartikeya

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH bpf-next v1 03/15] bpf: Allow storing PTR_TO_BTF_ID in map
  2022-02-22  6:46   ` Alexei Starovoitov
@ 2022-02-23  3:09     ` Kumar Kartikeya Dwivedi
  2022-02-23 21:46       ` Alexei Starovoitov
  0 siblings, 1 reply; 38+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2022-02-23  3:09 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Toke Høiland-Jørgensen, Jesper Dangaard Brouer,
	netfilter-devel, netdev

On Tue, Feb 22, 2022 at 12:16:19PM IST, Alexei Starovoitov wrote:
> On Sun, Feb 20, 2022 at 07:18:01PM +0530, Kumar Kartikeya Dwivedi wrote:
> > This patch allows user to embed PTR_TO_BTF_ID in map value, such that
> > loading it marks the destination register as having the appropriate
> > register type and such a pointer can be dereferenced like usual
> > PTR_TO_BTF_ID and be passed to various BPF helpers.
> >
> > This feature can be useful to store an object in a map for a long time,
> > and then inspect it later. Since PTR_TO_BTF_ID is safe against invalid
> > access, verifier doesn't need to perform any complex lifetime checks. It
> > can be useful in cases where user already knows pointer will remain
> > valid, so any dereference at a later time (possibly in entirely
> > different BPF program invocation) will yield correct results as far the
> > data read from kernel memory is concerned.
> >
> > Note that it is quite possible such BTF ID pointer is invalid, in this
> > case the verifier's built-in exception handling mechanism where it
> > converts loads into PTR_TO_BTF_ID into PROBE_MEM loads, would handle the
> > invalid case. Next patch which adds referenced PTR_TO_BTF_ID would need
> > to take more care in ensuring a correct value is stored in the BPF map.
> >
> > The user indicates that a certain pointer must be treated as
> > PTR_TO_BTF_ID by using a BTF type tag 'btf_id' on the pointed to type of
> > the pointer. Then, this information is recorded in the object BTF which
> > will be passed into the kernel by way of map's BTF information.
> >
> > The kernel then records the type, and offset of all such pointers, and
> > finds their corresponding built-in kernel type by the name and BTF kind.
> >
> > Later, during verification this information is used that access to such
> > pointers is sized correctly, and done at a proper offset into the map
> > value. Only BPF_LDX, BPF_STX, and BPF_ST with 0 (to denote NULL) are
> > allowed instructions that can access such a pointer. On BPF_LDX, the
> > destination register is updated to be a PTR_TO_BTF_ID, and on BPF_STX,
> > it is checked whether the source register type is same PTR_TO_BTF_ID,
> > and whether the BTF ID (reg->btf and reg->btf_id) matches the type
> > specified in the map value's definition.
> >
> > Hence, the verifier allows flexible access to kernel data across program
> > invocations in a type safe manner, without compromising on the runtime
> > safety of the kernel.
> >
> > Next patch will extend this support to referenced PTR_TO_BTF_ID.
> >
> > Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
> > ---
> >  include/linux/bpf.h     |  30 +++++++-
> >  include/linux/btf.h     |   3 +
> >  kernel/bpf/btf.c        | 127 ++++++++++++++++++++++++++++++++++
> >  kernel/bpf/map_in_map.c |   5 +-
> >  kernel/bpf/syscall.c    | 137 ++++++++++++++++++++++++++++++++++++-
> >  kernel/bpf/verifier.c   | 148 ++++++++++++++++++++++++++++++++++++++++
> >  6 files changed, 446 insertions(+), 4 deletions(-)
> >
> > diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> > index f19abc59b6cd..ce45ffb79f82 100644
> > --- a/include/linux/bpf.h
> > +++ b/include/linux/bpf.h
> > @@ -155,6 +155,23 @@ struct bpf_map_ops {
> >  	const struct bpf_iter_seq_info *iter_seq_info;
> >  };
> >
> > +enum {
> > +	/* Support at most 8 pointers in a BPF map value */
> > +	BPF_MAP_VALUE_OFF_MAX = 8,
> > +};
> > +
> > +struct bpf_map_value_off_desc {
> > +	u32 offset;
> > +	u32 btf_id;
> > +	struct btf *btf;
> > +	struct module *module;
> > +};
> > +
> > +struct bpf_map_value_off {
> > +	u32 nr_off;
> > +	struct bpf_map_value_off_desc off[];
> > +};
> > +
> >  struct bpf_map {
> >  	/* The first two cachelines with read-mostly members of which some
> >  	 * are also accessed in fast-path (e.g. ops, max_entries).
> > @@ -171,6 +188,7 @@ struct bpf_map {
> >  	u64 map_extra; /* any per-map-type extra fields */
> >  	u32 map_flags;
> >  	int spin_lock_off; /* >=0 valid offset, <0 error */
> > +	struct bpf_map_value_off *ptr_off_tab;
> >  	int timer_off; /* >=0 valid offset, <0 error */
> >  	u32 id;
> >  	int numa_node;
> > @@ -184,7 +202,7 @@ struct bpf_map {
> >  	char name[BPF_OBJ_NAME_LEN];
> >  	bool bypass_spec_v1;
> >  	bool frozen; /* write-once; write-protected by freeze_mutex */
> > -	/* 14 bytes hole */
> > +	/* 6 bytes hole */
> >
> >  	/* The 3rd and 4th cacheline with misc members to avoid false sharing
> >  	 * particularly with refcounting.
> > @@ -217,6 +235,11 @@ static inline bool map_value_has_timer(const struct bpf_map *map)
> >  	return map->timer_off >= 0;
> >  }
> >
> > +static inline bool map_value_has_ptr_to_btf_id(const struct bpf_map *map)
> > +{
> > +	return !IS_ERR_OR_NULL(map->ptr_off_tab);
> > +}
> > +
> >  static inline void check_and_init_map_value(struct bpf_map *map, void *dst)
> >  {
> >  	if (unlikely(map_value_has_spin_lock(map)))
> > @@ -1490,6 +1513,11 @@ void bpf_prog_put(struct bpf_prog *prog);
> >  void bpf_prog_free_id(struct bpf_prog *prog, bool do_idr_lock);
> >  void bpf_map_free_id(struct bpf_map *map, bool do_idr_lock);
> >
> > +struct bpf_map_value_off_desc *bpf_map_ptr_off_contains(struct bpf_map *map, u32 offset);
> > +void bpf_map_free_ptr_off_tab(struct bpf_map *map);
> > +struct bpf_map_value_off *bpf_map_copy_ptr_off_tab(const struct bpf_map *map);
> > +bool bpf_map_equal_ptr_off_tab(const struct bpf_map *map_a, const struct bpf_map *map_b);
> > +
> >  struct bpf_map *bpf_map_get(u32 ufd);
> >  struct bpf_map *bpf_map_get_with_uref(u32 ufd);
> >  struct bpf_map *__bpf_map_get(struct fd f);
> > diff --git a/include/linux/btf.h b/include/linux/btf.h
> > index 36bc09b8e890..6592183aeb23 100644
> > --- a/include/linux/btf.h
> > +++ b/include/linux/btf.h
> > @@ -26,6 +26,7 @@ struct btf_type;
> >  union bpf_attr;
> >  struct btf_show;
> >  struct btf_id_set;
> > +struct bpf_map;
> >
> >  struct btf_kfunc_id_set {
> >  	struct module *owner;
> > @@ -123,6 +124,8 @@ bool btf_member_is_reg_int(const struct btf *btf, const struct btf_type *s,
> >  			   u32 expected_offset, u32 expected_size);
> >  int btf_find_spin_lock(const struct btf *btf, const struct btf_type *t);
> >  int btf_find_timer(const struct btf *btf, const struct btf_type *t);
> > +int btf_find_ptr_to_btf_id(const struct btf *btf, const struct btf_type *t,
> > +			   struct bpf_map *map);
> >  bool btf_type_is_void(const struct btf_type *t);
> >  s32 btf_find_by_name_kind(const struct btf *btf, const char *name, u8 kind);
> >  const struct btf_type *btf_type_skip_modifiers(const struct btf *btf,
> > diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c
> > index 55f6ccac3388..1edb5710e155 100644
> > --- a/kernel/bpf/btf.c
> > +++ b/kernel/bpf/btf.c
> > @@ -3122,6 +3122,7 @@ static void btf_struct_log(struct btf_verifier_env *env,
> >  enum {
> >  	BTF_FIELD_SPIN_LOCK,
> >  	BTF_FIELD_TIMER,
> > +	BTF_FIELD_KPTR,
> >  };
> >
> >  static int btf_find_field_struct(const struct btf *btf, const struct btf_type *t,
> > @@ -3140,6 +3141,106 @@ static int btf_find_field_struct(const struct btf *btf, const struct btf_type *t
> >  	return 0;
> >  }
> >
> > +static s32 btf_find_by_name_kind_all(const char *name, u32 kind, struct btf **btfp);
> > +
> > +static int btf_find_field_kptr(const struct btf *btf, const struct btf_type *t,
> > +			       u32 off, int sz, void *data)
> > +{
> > +	struct bpf_map_value_off *tab;
> > +	struct bpf_map *map = data;
> > +	struct module *mod = NULL;
> > +	bool btf_id_tag = false;
> > +	struct btf *kernel_btf;
> > +	int nr_off, ret;
> > +	s32 id;
> > +
> > +	/* For PTR, sz is always == 8 */
> > +	if (!btf_type_is_ptr(t))
> > +		return 0;
> > +	t = btf_type_by_id(btf, t->type);
> > +
> > +	while (btf_type_is_type_tag(t)) {
> > +		if (!strcmp("kernel.bpf.btf_id", __btf_name_by_offset(btf, t->name_off))) {
>
> All of these strings consume space.
> Multiple tags consume space too.
> I would just do:
> #define __kptr __attribute__((btf_type_tag("kptr")))
> #define __kptr_ref __attribute__((btf_type_tag("kptr_ref")))
> #define __kptr_percpu __attribute__((btf_type_tag("kptr_percpu")))
> #define __kptr_user __attribute__((btf_type_tag("kptr_user")))
>

Ok.

> > +			/* repeated tag */
> > +			if (btf_id_tag) {
> > +				ret = -EINVAL;
> > +				goto end;
> > +			}
> > +			btf_id_tag = true;
> > +		} else if (!strncmp("kernel.", __btf_name_by_offset(btf, t->name_off),
> > +			   sizeof("kernel.") - 1)) {
> > +			/* TODO: Should we reject these when loading BTF? */
> > +			/* Unavailable tag in reserved tag namespace */
>
> I don't think we need to reserve the tag space.
> There is little risk to break progs with future tags.
> I would just drop this 'if'.
>

Fine with dropping, but what is the expected behavior when userspace has set a
tag in map value BTF that we give some meaning in the kernel later?

> > +			ret = -EACCES;
> > +			goto end;
> > +		}
> > +		/* Look for next tag */
> > +		t = btf_type_by_id(btf, t->type);
> > +	}
> > +	if (!btf_id_tag)
> > +		return 0;
> > +
> > +	/* Get the base type */
> > +	if (btf_type_is_modifier(t))
> > +		t = btf_type_skip_modifiers(btf, t->type, NULL);
> > +	/* Only pointer to struct is allowed */
> > +	if (!__btf_type_is_struct(t)) {
> > +		ret = -EINVAL;
> > +		goto end;
> > +	}
> > +
> > +	id = btf_find_by_name_kind_all(__btf_name_by_offset(btf, t->name_off),
> > +				       BTF_INFO_KIND(t->info), &kernel_btf);
> > +	if (id < 0) {
> > +		ret = id;
> > +		goto end;
> > +	}
> > +
> > +	nr_off = map->ptr_off_tab ? map->ptr_off_tab->nr_off : 0;
> > +	if (nr_off == BPF_MAP_VALUE_OFF_MAX) {
> > +		ret = -E2BIG;
> > +		goto end_btf;
> > +	}
> > +
> > +	tab = krealloc(map->ptr_off_tab, offsetof(struct bpf_map_value_off, off[nr_off + 1]),
> > +		       GFP_KERNEL | __GFP_NOWARN);
>
> Argh.
> If the function is called btf_find_field() it should do 'find' and only 'find'.
> It should be side effect free and should find _one_ field.
> If you want a function with side effcts it should be called something like btf_walk_fields.
>
> For this case how about side effect free btf_find_fieldS() that will populate array
> struct bpf_field_info {
>   struct btf *type; /* set for spin_lock, timer, kptr */
>   u32 off;
>   int flags; /* ref|percpu|user for kptr */
> };
>
> cnt = btf_find_fields(prog_btf, value_type, BTF_FIELD_SPIN_LOCK|TIMER|KPTR, fields);
>
> btf_find_struct_field/btf_find_datasec_var will keep the count and will error
> when it reaches BPF_MAP_VALUE_OFF_MAX.
> switch (field_type) {
> case BTF_FIELD_SPIN_LOCK:
>    btf_find_field_struct(... "bpf_spin_lock",
>                              sizeof(struct bpf_spin_lock),
>                              __alignof__(struct bpf_spin_lock),
>                              fields + i);
> case BTF_FIELD_TIMER:
>    btf_find_field_struct(... "bpf_timer", sizeof, alignof, fields + i);
> case BTF_FIELD_KPTR:
>    btf_find_field_kptr(... fields + i);
> }
>
> btf_find_by_name_kind_all (or new name bpf_find_btf_id)
> will be done after btf_find_fields() is over.
> dtor will be found after as well.
> struct bpf_map_value_off will be allocated once.
>

Ack, sounds good.

> > +	if (!tab) {
> > +		ret = -ENOMEM;
> > +		goto end_btf;
> > +	}
> > +	/* Initialize nr_off for newly allocated ptr_off_tab */
> > +	if (!map->ptr_off_tab)
> > +		tab->nr_off = 0;
> > +	map->ptr_off_tab = tab;
> > +
> > +	/* We take reference to make sure valid pointers into module data don't
> > +	 * become invalid across program invocation.
> > +	 */
>
> what is the point of grabbing mod ref?
> This patch needs btf only and its refcnt will be incremented by bpf_find_btf_id.
> Is that because of future dtor ?
> Then it should be part of that patch.
>

Right, screwed it up while rebasing. Will fix.

> > +	if (btf_is_module(kernel_btf)) {
> > +		mod = btf_try_get_module(kernel_btf);
> > +		if (!mod) {
> > +			ret = -ENXIO;
> > +			goto end_btf;
> > +		}
> > +	}
> > +
> > +	tab->off[nr_off].offset = off;
> > +	tab->off[nr_off].btf_id = id;
> > +	tab->off[nr_off].btf    = kernel_btf;
> > +	tab->off[nr_off].module = mod;
> > +	tab->nr_off++;
> > +
> > +	return 0;
> > +end_btf:
> > +	/* Reference is only raised for module BTF */
> > +	if (btf_is_module(kernel_btf))
> > +		btf_put(kernel_btf);
>
> see earlier suggestion. this 'if' can be dropped if we btf_get for vmlinux_btf too.
>
> > +end:
> > +	bpf_map_free_ptr_off_tab(map);
> > +	map->ptr_off_tab = ERR_PTR(ret);
> > +	return ret;
> > +}

--
Kartikeya

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH bpf-next v1 08/15] bpf: Adapt copy_map_value for multiple offset case
  2022-02-22  7:04   ` Alexei Starovoitov
@ 2022-02-23  3:13     ` Kumar Kartikeya Dwivedi
  2022-02-23 21:41       ` Alexei Starovoitov
  0 siblings, 1 reply; 38+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2022-02-23  3:13 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Toke Høiland-Jørgensen, Jesper Dangaard Brouer,
	netfilter-devel, netdev

On Tue, Feb 22, 2022 at 12:34:05PM IST, Alexei Starovoitov wrote:
> On Sun, Feb 20, 2022 at 07:18:06PM +0530, Kumar Kartikeya Dwivedi wrote:
> > The changes in this patch deserve closer look, so it has been split into
> > its own independent patch. While earlier we just had to skip two objects
> > at most while copying in and out of map, now we have potentially many
> > objects (at most 8 + 2 = 10, due to the BPF_MAP_VALUE_OFF_MAX limit).
> >
> > Hence, divide the copy_map_value function into an inlined fast path and
> > function call to slowpath. The slowpath handles the case of > 3 offsets,
> > while we handle the most common cases (0, 1, 2, or 3 offsets) in the
> > inline function itself.
> >
> > In copy_map_value_slow, we use 11 offsets, just to make the for loop
> > that copies the value free of edge cases for the last offset, by using
> > map->value_size as final offset to subtract remaining area to copy from.
> >
> > Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
> > ---
> >  include/linux/bpf.h  | 43 +++++++++++++++++++++++++++++++---
> >  kernel/bpf/syscall.c | 55 ++++++++++++++++++++++++++++++++++++++++++++
> >  2 files changed, 95 insertions(+), 3 deletions(-)
> >
> > diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> > index ae599aaf8d4c..5d845ca02eba 100644
> > --- a/include/linux/bpf.h
> > +++ b/include/linux/bpf.h
> > @@ -253,12 +253,22 @@ static inline void check_and_init_map_value(struct bpf_map *map, void *dst)
> >  		memset(dst + map->spin_lock_off, 0, sizeof(struct bpf_spin_lock));
> >  	if (unlikely(map_value_has_timer(map)))
> >  		memset(dst + map->timer_off, 0, sizeof(struct bpf_timer));
> > +	if (unlikely(map_value_has_ptr_to_btf_id(map))) {
> > +		struct bpf_map_value_off *tab = map->ptr_off_tab;
> > +		int i;
> > +
> > +		for (i = 0; i < tab->nr_off; i++)
> > +			*(u64 *)(dst + tab->off[i].offset) = 0;
> > +	}
> >  }
> >
> > +void copy_map_value_slow(struct bpf_map *map, void *dst, void *src, u32 s_off,
> > +			 u32 s_sz, u32 t_off, u32 t_sz);
> > +
> >  /* copy everything but bpf_spin_lock and bpf_timer. There could be one of each. */
> >  static inline void copy_map_value(struct bpf_map *map, void *dst, void *src)
> >  {
> > -	u32 s_off = 0, s_sz = 0, t_off = 0, t_sz = 0;
> > +	u32 s_off = 0, s_sz = 0, t_off = 0, t_sz = 0, p_off = 0, p_sz = 0;
> >
> >  	if (unlikely(map_value_has_spin_lock(map))) {
> >  		s_off = map->spin_lock_off;
> > @@ -268,13 +278,40 @@ static inline void copy_map_value(struct bpf_map *map, void *dst, void *src)
> >  		t_off = map->timer_off;
> >  		t_sz = sizeof(struct bpf_timer);
> >  	}
> > +	/* Multiple offset case is slow, offload to function */
> > +	if (unlikely(map_value_has_ptr_to_btf_id(map))) {
> > +		struct bpf_map_value_off *tab = map->ptr_off_tab;
> > +
> > +		/* Inline the likely common case */
> > +		if (likely(tab->nr_off == 1)) {
> > +			p_off = tab->off[0].offset;
> > +			p_sz = sizeof(u64);
> > +		} else {
> > +			copy_map_value_slow(map, dst, src, s_off, s_sz, t_off, t_sz);
> > +			return;
> > +		}
> > +	}
> > +
> > +	if (unlikely(s_sz || t_sz || p_sz)) {
> > +		/* The order is p_off, t_off, s_off, use insertion sort */
> >
> > -	if (unlikely(s_sz || t_sz)) {
> > +		if (t_off < p_off || !t_sz) {
> > +			swap(t_off, p_off);
> > +			swap(t_sz, p_sz);
> > +		}
> >  		if (s_off < t_off || !s_sz) {
> >  			swap(s_off, t_off);
> >  			swap(s_sz, t_sz);
> > +			if (t_off < p_off || !t_sz) {
> > +				swap(t_off, p_off);
> > +				swap(t_sz, p_sz);
> > +			}
> >  		}
> > -		memcpy(dst, src, t_off);
> > +
> > +		memcpy(dst, src, p_off);
> > +		memcpy(dst + p_off + p_sz,
> > +		       src + p_off + p_sz,
> > +		       t_off - p_off - p_sz);
> >  		memcpy(dst + t_off + t_sz,
> >  		       src + t_off + t_sz,
> >  		       s_off - t_off - t_sz);
> > diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
> > index beb96866f34d..83d71d6912f5 100644
> > --- a/kernel/bpf/syscall.c
> > +++ b/kernel/bpf/syscall.c
> > @@ -30,6 +30,7 @@
> >  #include <linux/pgtable.h>
> >  #include <linux/bpf_lsm.h>
> >  #include <linux/poll.h>
> > +#include <linux/sort.h>
> >  #include <linux/bpf-netns.h>
> >  #include <linux/rcupdate_trace.h>
> >  #include <linux/memcontrol.h>
> > @@ -230,6 +231,60 @@ static int bpf_map_update_value(struct bpf_map *map, struct fd f, void *key,
> >  	return err;
> >  }
> >
> > +static int copy_map_value_cmp(const void *_a, const void *_b)
> > +{
> > +	const u32 a = *(const u32 *)_a;
> > +	const u32 b = *(const u32 *)_b;
> > +
> > +	/* We only need to sort based on offset */
> > +	if (a < b)
> > +		return -1;
> > +	else if (a > b)
> > +		return 1;
> > +	return 0;
> > +}
> > +
> > +void copy_map_value_slow(struct bpf_map *map, void *dst, void *src, u32 s_off,
> > +			 u32 s_sz, u32 t_off, u32 t_sz)
> > +{
> > +	struct bpf_map_value_off *tab = map->ptr_off_tab; /* already set to non-NULL */
> > +	/* 3 = 2 for bpf_timer, bpf_spin_lock, 1 for map->value_size sentinel */
> > +	struct {
> > +		u32 off;
> > +		u32 sz;
> > +	} off_arr[BPF_MAP_VALUE_OFF_MAX + 3];
> > +	int i, cnt = 0;
> > +
> > +	/* Reconsider stack usage when bumping BPF_MAP_VALUE_OFF_MAX */
> > +	BUILD_BUG_ON(sizeof(off_arr) != 88);
> > +
> > +	for (i = 0; i < tab->nr_off; i++) {
> > +		off_arr[cnt].off = tab->off[i].offset;
> > +		off_arr[cnt++].sz = sizeof(u64);
> > +	}
> > +	if (s_sz) {
> > +		off_arr[cnt].off = s_off;
> > +		off_arr[cnt++].sz = s_sz;
> > +	}
> > +	if (t_sz) {
> > +		off_arr[cnt].off = t_off;
> > +		off_arr[cnt++].sz = t_sz;
> > +	}
> > +	off_arr[cnt].off = map->value_size;
> > +
> > +	sort(off_arr, cnt, sizeof(off_arr[0]), copy_map_value_cmp, NULL);
>
> Ouch. sort every time we need to copy map value?
> sort it once please. 88 bytes in a map are worth it.
> Especially since "slow" version will trigger with just 2 kptrs.
> (if I understand this correctly).

Ok, I also think we can reduce the 88 bytes down to 55 bytes (32-bit off +
8-bit size), and embed it in struct bpf_map. Then the shuffling needed for
timer and spin lock should also be gone.
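
Something along these lines (sketch), sorted once at map creation time:

	/* 11 entries: 8 kptrs + timer + spin_lock + value_size sentinel */
	struct {
		u32 off[BPF_MAP_VALUE_OFF_MAX + 3];
		u8 sz[BPF_MAP_VALUE_OFF_MAX + 3];
	} off_arr;	/* 11 * 4 + 11 * 1 = 55 bytes */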

--
Kartikeya

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH bpf-next v1 00/15] Introduce typed pointer support in BPF maps
  2022-02-22  8:21   ` Kumar Kartikeya Dwivedi
@ 2022-02-23  7:29     ` Song Liu
  0 siblings, 0 replies; 38+ messages in thread
From: Song Liu @ 2022-02-23  7:29 UTC (permalink / raw)
  To: Kumar Kartikeya Dwivedi
  Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Toke Høiland-Jørgensen, Jesper Dangaard Brouer,
	netfilter-devel, Networking

On Tue, Feb 22, 2022 at 12:21 AM Kumar Kartikeya Dwivedi
<memxor@gmail.com> wrote:
>
[...]


> >
> > I guess I missed some context here. Could you please provide some reference
> > to the use cases of these features?
> >
>
> The common usecase is caching references to objects inside BPF maps, to avoid
> costly lookups, and being able to raise it once for the duration of program
> invocation when passing it to multiple helpers (to avoid further re-lookups).
> Storing references also allows you to control object lifetime.
>
> One other use case is enabling xdp_frame queueing in XDP using this, but that
> still needs some integration work after this lands, so it's a bit early to
> comment on the specifics.
>
> Other than that, I think Alexei already mentioned this could be easily extended
> to do memory allocation returning a PTR_TO_BTF_ID in a BPF program [0] in the
> future.
>
>   [0]: https://lore.kernel.org/bpf/20220216230615.po6huyrgkswk7u67@ast-mbp.dhcp.thefacebook.com
>
> > For Unreferenced kernel pointer and userspace pointer, it seems that there is
> > no guarantee the pointer will still be valid during access (we only know it is
> > valid when it is stored in the map). Is this correct?
> >
>
> That is correct. In the case of unreferenced and referenced kernel pointers,
> when you do a BPF_LDX, both are marked as PTR_UNTRUSTED, and it is not allowed
> to pass them into helpers or kfuncs, because from that point onwards we cannot
> claim that the object is still alive when pointer is used later. Still,
> dereference is permitted because verifier handles faults for bad accesses using
> PROBE_MEM conversion for PTR_TO_BTF_ID loads in convert_ctx_accesses (which is
> then later detected by JIT to build exception table used by exception handler).
>
> In the case of reading an unreferenced pointer, you sometimes know that the
> pointer will stay valid, so you can just store it in the map, load it later,
> and access it directly; it imposes very few restrictions.
>
> For the referenced case, with BPF_LDX marking the result as PTR_UNTRUSTED, you
> could say that this makes it a lot less useful: even though the BPF program
> already holds a reference, just to make sure I _read valid data_ I still have
> to use the kptr_get style helper to raise and put a reference, to ensure the
> object is alive while it is accessed.
>
> So in that case, for RCU protected objects, the actual release should still
> wait for the BPF program to hit BPF_EXIT, but for other cases, like sleepable
> programs or objects whose lifetime is managed by refcount alone, you can also
> detect the presence of a writer in another BPF program (i.e. whether the
> pointer was xchg'd out during our access) using a seqlock style scheme:
>
>         v = bpf_map_lookup_elem(&map, ...);
>         if (!v)
>                 return 0;
>         seq_begin = v->seq;
>         atomic_thread_fence(memory_order_acquire); // A
>         <do access>
>         atomic_thread_fence(memory_order_acquire); // B
>         seq_end = v->seq;
>         if (seq_begin & 1 || seq_begin != seq_end)
>                 goto bad_read;
>         <use data>
>
> Of course, barriers are not yet in BPF, but you get the idea (it should work
> on x86). The updater BPF program will increment v->seq before and after the
> xchg, ensuring proper ordering. v->seq starts as 0, so an odd seq indicates
> that a writer update is in progress.
>
> This would allow you to avoid raising the refcount, while still ensuring that
> whenever the object was accessed, it was valid between A and B. Even though
> raising an uncontended refcount is cheap, this is much cheaper.
>
> The case of a userspace pointer is different: it sets the MEM_USER flag, so
> the only useful thing to do is call bpf_probe_read_user; you can't even
> dereference it. You are right that in most cases a userspace pointer won't be
> useful, but for some cooperative cases between a BPF program and a userspace
> thread, it can act as a way to share certain thread-local areas/userspace
> memory that the BPF program can then store keyed by the task_struct *, in
> situations where using a BPF map to share memory is not always possible.

Thanks for the explanation! I can see the referenced kernel pointer being very
powerful in many use cases. The per-cpu pointer is also interesting.

Song

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH bpf-next v1 08/15] bpf: Adapt copy_map_value for multiple offset case
  2022-02-23  3:13     ` Kumar Kartikeya Dwivedi
@ 2022-02-23 21:41       ` Alexei Starovoitov
  0 siblings, 0 replies; 38+ messages in thread
From: Alexei Starovoitov @ 2022-02-23 21:41 UTC (permalink / raw)
  To: Kumar Kartikeya Dwivedi
  Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Toke Høiland-Jørgensen, Jesper Dangaard Brouer,
	netfilter-devel, Network Development

On Tue, Feb 22, 2022 at 7:13 PM Kumar Kartikeya Dwivedi
<memxor@gmail.com> wrote:
>
> On Tue, Feb 22, 2022 at 12:34:05PM IST, Alexei Starovoitov wrote:
> > On Sun, Feb 20, 2022 at 07:18:06PM +0530, Kumar Kartikeya Dwivedi wrote:
> > > The changes in this patch deserve closer look, so it has been split into
> > > its own independent patch. While earlier we just had to skip two objects
> > > at most while copying in and out of map, now we have potentially many
> > > objects (at most 8 + 2 = 10, due to the BPF_MAP_VALUE_OFF_MAX limit).
> > >
> > > Hence, divide the copy_map_value function into an inlined fast path and
> > > function call to slowpath. The slowpath handles the case of > 3 offsets,
> > > while we handle the most common cases (0, 1, 2, or 3 offsets) in the
> > > inline function itself.
> > >
> > > In copy_map_value_slow, we use 11 offsets, just to make the for loop
> > > that copies the value free of edge cases for the last offset, by using
> > > map->value_size as final offset to subtract remaining area to copy from.
> > >
> > > Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
> > > ---
> > >  include/linux/bpf.h  | 43 +++++++++++++++++++++++++++++++---
> > >  kernel/bpf/syscall.c | 55 ++++++++++++++++++++++++++++++++++++++++++++
> > >  2 files changed, 95 insertions(+), 3 deletions(-)
> > >
> > > diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> > > index ae599aaf8d4c..5d845ca02eba 100644
> > > --- a/include/linux/bpf.h
> > > +++ b/include/linux/bpf.h
> > > @@ -253,12 +253,22 @@ static inline void check_and_init_map_value(struct bpf_map *map, void *dst)
> > >             memset(dst + map->spin_lock_off, 0, sizeof(struct bpf_spin_lock));
> > >     if (unlikely(map_value_has_timer(map)))
> > >             memset(dst + map->timer_off, 0, sizeof(struct bpf_timer));
> > > +   if (unlikely(map_value_has_ptr_to_btf_id(map))) {
> > > +           struct bpf_map_value_off *tab = map->ptr_off_tab;
> > > +           int i;
> > > +
> > > +           for (i = 0; i < tab->nr_off; i++)
> > > +                   *(u64 *)(dst + tab->off[i].offset) = 0;
> > > +   }
> > >  }
> > >
> > > +void copy_map_value_slow(struct bpf_map *map, void *dst, void *src, u32 s_off,
> > > +                    u32 s_sz, u32 t_off, u32 t_sz);
> > > +
> > >  /* copy everything but bpf_spin_lock and bpf_timer. There could be one of each. */
> > >  static inline void copy_map_value(struct bpf_map *map, void *dst, void *src)
> > >  {
> > > -   u32 s_off = 0, s_sz = 0, t_off = 0, t_sz = 0;
> > > +   u32 s_off = 0, s_sz = 0, t_off = 0, t_sz = 0, p_off = 0, p_sz = 0;
> > >
> > >     if (unlikely(map_value_has_spin_lock(map))) {
> > >             s_off = map->spin_lock_off;
> > > @@ -268,13 +278,40 @@ static inline void copy_map_value(struct bpf_map *map, void *dst, void *src)
> > >             t_off = map->timer_off;
> > >             t_sz = sizeof(struct bpf_timer);
> > >     }
> > > +   /* Multiple offset case is slow, offload to function */
> > > +   if (unlikely(map_value_has_ptr_to_btf_id(map))) {
> > > +           struct bpf_map_value_off *tab = map->ptr_off_tab;
> > > +
> > > +           /* Inline the likely common case */
> > > +           if (likely(tab->nr_off == 1)) {
> > > +                   p_off = tab->off[0].offset;
> > > +                   p_sz = sizeof(u64);
> > > +           } else {
> > > +                   copy_map_value_slow(map, dst, src, s_off, s_sz, t_off, t_sz);
> > > +                   return;
> > > +           }
> > > +   }
> > > +
> > > +   if (unlikely(s_sz || t_sz || p_sz)) {
> > > +           /* The order is p_off, t_off, s_off, use insertion sort */
> > >
> > > -   if (unlikely(s_sz || t_sz)) {
> > > +           if (t_off < p_off || !t_sz) {
> > > +                   swap(t_off, p_off);
> > > +                   swap(t_sz, p_sz);
> > > +           }
> > >             if (s_off < t_off || !s_sz) {
> > >                     swap(s_off, t_off);
> > >                     swap(s_sz, t_sz);
> > > +                   if (t_off < p_off || !t_sz) {
> > > +                           swap(t_off, p_off);
> > > +                           swap(t_sz, p_sz);
> > > +                   }
> > >             }
> > > -           memcpy(dst, src, t_off);
> > > +
> > > +           memcpy(dst, src, p_off);
> > > +           memcpy(dst + p_off + p_sz,
> > > +                  src + p_off + p_sz,
> > > +                  t_off - p_off - p_sz);
> > >             memcpy(dst + t_off + t_sz,
> > >                    src + t_off + t_sz,
> > >                    s_off - t_off - t_sz);
> > > diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
> > > index beb96866f34d..83d71d6912f5 100644
> > > --- a/kernel/bpf/syscall.c
> > > +++ b/kernel/bpf/syscall.c
> > > @@ -30,6 +30,7 @@
> > >  #include <linux/pgtable.h>
> > >  #include <linux/bpf_lsm.h>
> > >  #include <linux/poll.h>
> > > +#include <linux/sort.h>
> > >  #include <linux/bpf-netns.h>
> > >  #include <linux/rcupdate_trace.h>
> > >  #include <linux/memcontrol.h>
> > > @@ -230,6 +231,60 @@ static int bpf_map_update_value(struct bpf_map *map, struct fd f, void *key,
> > >     return err;
> > >  }
> > >
> > > +static int copy_map_value_cmp(const void *_a, const void *_b)
> > > +{
> > > +   const u32 a = *(const u32 *)_a;
> > > +   const u32 b = *(const u32 *)_b;
> > > +
> > > +   /* We only need to sort based on offset */
> > > +   if (a < b)
> > > +           return -1;
> > > +   else if (a > b)
> > > +           return 1;
> > > +   return 0;
> > > +}
> > > +
> > > +void copy_map_value_slow(struct bpf_map *map, void *dst, void *src, u32 s_off,
> > > +                    u32 s_sz, u32 t_off, u32 t_sz)
> > > +{
> > > +   struct bpf_map_value_off *tab = map->ptr_off_tab; /* already set to non-NULL */
> > > +   /* 3 = 2 for bpf_timer, bpf_spin_lock, 1 for map->value_size sentinel */
> > > +   struct {
> > > +           u32 off;
> > > +           u32 sz;
> > > +   } off_arr[BPF_MAP_VALUE_OFF_MAX + 3];
> > > +   int i, cnt = 0;
> > > +
> > > +   /* Reconsider stack usage when bumping BPF_MAP_VALUE_OFF_MAX */
> > > +   BUILD_BUG_ON(sizeof(off_arr) != 88);
> > > +
> > > +   for (i = 0; i < tab->nr_off; i++) {
> > > +           off_arr[cnt].off = tab->off[i].offset;
> > > +           off_arr[cnt++].sz = sizeof(u64);
> > > +   }
> > > +   if (s_sz) {
> > > +           off_arr[cnt].off = s_off;
> > > +           off_arr[cnt++].sz = s_sz;
> > > +   }
> > > +   if (t_sz) {
> > > +           off_arr[cnt].off = t_off;
> > > +           off_arr[cnt++].sz = t_sz;
> > > +   }
> > > +   off_arr[cnt].off = map->value_size;
> > > +
> > > +   sort(off_arr, cnt, sizeof(off_arr[0]), copy_map_value_cmp, NULL);
> >
> > Ouch. sort every time we need to copy map value?
> > sort it once please. 88 bytes in a map are worth it.
> > Especially since "slow" version will trigger with just 2 kptrs.
> > (if I understand this correctly).
>
> Ok, I also think we can reduce the 88 bytes down to 55 bytes (32-bit off +
> 8-bit size), and embed it in struct bpf_map. Then the shuffling needed for
> timer and spin lock should also be gone.

We can probably make this copy_map_value_slow the only function.
I suspect
+       /* There is always at least one element */
+       memcpy(dst, src, off_arr[0].off);
+       /* Copy the rest, while skipping other regions */
+       for (i = 1; i < cnt; i++) {
+               u32 curr_off = off_arr[i - 1].off + off_arr[i - 1].sz;
+               u32 next_off = off_arr[i].off;
+
+               memcpy(dst + curr_off, src + curr_off, next_off - curr_off);
+       }
is faster than the dance we currently do in copy_map_value with a bunch of if-s.

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH bpf-next v1 03/15] bpf: Allow storing PTR_TO_BTF_ID in map
  2022-02-23  3:09     ` Kumar Kartikeya Dwivedi
@ 2022-02-23 21:46       ` Alexei Starovoitov
  0 siblings, 0 replies; 38+ messages in thread
From: Alexei Starovoitov @ 2022-02-23 21:46 UTC (permalink / raw)
  To: Kumar Kartikeya Dwivedi
  Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Toke Høiland-Jørgensen, Jesper Dangaard Brouer,
	netfilter-devel, Network Development

On Tue, Feb 22, 2022 at 7:09 PM Kumar Kartikeya Dwivedi
<memxor@gmail.com> wrote:
> > > +                   }
> > > +                   btf_id_tag = true;
> > > +           } else if (!strncmp("kernel.", __btf_name_by_offset(btf, t->name_off),
> > > +                      sizeof("kernel.") - 1)) {
> > > +                   /* TODO: Should we reject these when loading BTF? */
> > > +                   /* Unavailable tag in reserved tag namespace */
> >
> > I don't think we need to reserve the tag space.
> > There is little risk to break progs with future tags.
> > I would just drop this 'if'.
> >
>
> Fine with dropping, but what is the expected behavior when userspace has set a
> tag in map value BTF that we give some meaning in the kernel later?

All of these features fall into the unstable category.
kfuncs can disappear. kernel data structs can get renamed.
dtor, kptr_get functions not only can change, but can be removed.
When bpf progs are so tightly interacting with the kernel they
have to change and adjust.
Eventually we might bolt a bit of CO-RE like logic to kfunc and kptr
to make things more portable, but it's too early to reserve a btf_tag prefix.
Progs will change. We're not saving users any headache with reservation.
Tracing bpf progs are not user space. That's the key.

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH bpf-next v1 04/15] bpf: Allow storing referenced PTR_TO_BTF_ID in map
  2022-02-23  3:04         ` Kumar Kartikeya Dwivedi
@ 2022-02-23 21:52           ` Alexei Starovoitov
  2022-02-24  8:43             ` Kumar Kartikeya Dwivedi
  0 siblings, 1 reply; 38+ messages in thread
From: Alexei Starovoitov @ 2022-02-23 21:52 UTC (permalink / raw)
  To: Kumar Kartikeya Dwivedi
  Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Toke Høiland-Jørgensen, Jesper Dangaard Brouer,
	netfilter-devel, Network Development

On Tue, Feb 22, 2022 at 7:04 PM Kumar Kartikeya Dwivedi
<memxor@gmail.com> wrote:
>
> On Tue, Feb 22, 2022 at 09:50:00PM IST, Alexei Starovoitov wrote:
> > On Mon, Feb 21, 2022 at 11:10 PM Kumar Kartikeya Dwivedi
> > <memxor@gmail.com> wrote:
> > >
> > > On Tue, Feb 22, 2022 at 12:23:49PM IST, Alexei Starovoitov wrote:
> > > > On Sun, Feb 20, 2022 at 07:18:02PM +0530, Kumar Kartikeya Dwivedi wrote:
> > > > >  static int check_mem_access(struct bpf_verifier_env *env, int insn_idx, u32 regno,
> > > > >                         int off, int bpf_size, enum bpf_access_type t,
> > > > > -                       int value_regno, bool strict_alignment_once)
> > > > > +                       int value_regno, bool strict_alignment_once,
> > > > > +                       struct bpf_reg_state *atomic_load_reg)
> > > >
> > > > No new side effects please.
> > > > value_regno is not pretty already.
> > > > At least its known ugliness that we need to clean up one day.
> > > >
> > > > >  static int check_atomic(struct bpf_verifier_env *env, int insn_idx, struct bpf_insn *insn)
> > > > >  {
> > > > > +   struct bpf_reg_state atomic_load_reg;
> > > > >     int load_reg;
> > > > >     int err;
> > > > >
> > > > > +   __mark_reg_unknown(env, &atomic_load_reg);
> > > > > +
> > > > >     switch (insn->imm) {
> > > > >     case BPF_ADD:
> > > > >     case BPF_ADD | BPF_FETCH:
> > > > > @@ -4813,6 +4894,7 @@ static int check_atomic(struct bpf_verifier_env *env, int insn_idx, struct bpf_i
> > > > >             else
> > > > >                     load_reg = insn->src_reg;
> > > > >
> > > > > +           atomic_load_reg = *reg_state(env, load_reg);
> > > > >             /* check and record load of old value */
> > > > >             err = check_reg_arg(env, load_reg, DST_OP);
> > > > >             if (err)
> > > > > @@ -4825,20 +4907,21 @@ static int check_atomic(struct bpf_verifier_env *env, int insn_idx, struct bpf_i
> > > > >     }
> > > > >
> > > > >     /* Check whether we can read the memory, with second call for fetch
> > > > > -    * case to simulate the register fill.
> > > > > +    * case to simulate the register fill, which also triggers checks
> > > > > +    * for manipulation of BTF ID pointers embedded in BPF maps.
> > > > >      */
> > > > >     err = check_mem_access(env, insn_idx, insn->dst_reg, insn->off,
> > > > > -                          BPF_SIZE(insn->code), BPF_READ, -1, true);
> > > > > +                          BPF_SIZE(insn->code), BPF_READ, -1, true, NULL);
> > > > >     if (!err && load_reg >= 0)
> > > > >             err = check_mem_access(env, insn_idx, insn->dst_reg, insn->off,
> > > > >                                    BPF_SIZE(insn->code), BPF_READ, load_reg,
> > > > > -                                  true);
> > > > > +                                  true, load_reg >= 0 ? &atomic_load_reg : NULL);
> > > >
> > > > Special xchg logic should be down outside of check_mem_access()
> > > > instead of hidden by layers of calls.
> > >
> > > Right, it's ugly, but if we don't capture the reg state before that
> > > check_reg_arg(env, load_reg, DST_OP), it's not possible to see the actual
> > > PTR_TO_BTF_ID being moved into the map, since check_reg_arg will do a
> > > mark_reg_unknown for value_regno. Any other ideas on what I can do?
> > >
> > > 37086bfdc737 ("bpf: Propagate stack bounds to registers in atomics w/ BPF_FETCH")
> > > changed the order of check_mem_access and DST_OP check_reg_arg.
> >
> > That highlights my point that side effects are bad.
> > That commit tries to work around that behavior and makes things
> > harder to extend like you found out with xchg logic.
> > Another option would be to add bpf_kptr_xchg() helper
> > instead of dealing with insn. It will be tiny bit slower,
> > but it will work on all architectures. While xchg bpf jit is
> > on x86,s390,mips so far.
>
> Right, but kfunc is currently limited to x86, which is required to obtain a
> refcounted PTR_TO_BTF_ID that you can move into the map, so it wouldn't make
> much of a difference.

Well the patches to add trampoline support to powerpc were already posted.

> > We need to think more on how to refactor check_mem_access without
> > digging ourselves into an even bigger hole.
>
> So I'm ok with working on untangling check_mem_access as a follow up, but for
> now should we go forward with how it is? Just looking at it yesterday makes me
> think it's going to require a fair amount of refactoring and discussion.
>
> Also, do you have any ideas on how to change it? Do you want it to work like how
> is_valid_access callbacks work? So passing something like a bpf_insn_access_aux
> into the call, where it sets how it'd like to update the register, and then
> actual updates take place in caller context?

I don't like callbacks in general.
They're fine for walk_the_tree, for_each_elem accessors,
but passing a callback into check_mem_access is not great.
Do you mind going with a bpf_kptr_xchg() helper for now
and optimizing into direct xchg insn later?
It's not clear whether it's going to be noticeably faster.
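
From the prog side it would be something like (sketch only, signature not
final, field names made up):

	/* void *bpf_kptr_xchg(void *map_value_ptr, void *ptr); */
	old = bpf_kptr_xchg(&v->task, NULL); /* take ownership out of the map */
	if (old) {
		/* old is now a reference held by the prog: release it
		 * (dtor/kfunc) or move it back before exit
		 */
	}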

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH bpf-next v1 04/15] bpf: Allow storing referenced PTR_TO_BTF_ID in map
  2022-02-23 21:52           ` Alexei Starovoitov
@ 2022-02-24  8:43             ` Kumar Kartikeya Dwivedi
  0 siblings, 0 replies; 38+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2022-02-24  8:43 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Toke Høiland-Jørgensen, Jesper Dangaard Brouer,
	netfilter-devel, Network Development

On Thu, Feb 24, 2022 at 03:22:43AM IST, Alexei Starovoitov wrote:
> On Tue, Feb 22, 2022 at 7:04 PM Kumar Kartikeya Dwivedi
> <memxor@gmail.com> wrote:
> >
> > On Tue, Feb 22, 2022 at 09:50:00PM IST, Alexei Starovoitov wrote:
> > > On Mon, Feb 21, 2022 at 11:10 PM Kumar Kartikeya Dwivedi
> > > <memxor@gmail.com> wrote:
> > > >
> > > > On Tue, Feb 22, 2022 at 12:23:49PM IST, Alexei Starovoitov wrote:
> > > > > On Sun, Feb 20, 2022 at 07:18:02PM +0530, Kumar Kartikeya Dwivedi wrote:
> > > > > >  static int check_mem_access(struct bpf_verifier_env *env, int insn_idx, u32 regno,
> > > > > >                         int off, int bpf_size, enum bpf_access_type t,
> > > > > > -                       int value_regno, bool strict_alignment_once)
> > > > > > +                       int value_regno, bool strict_alignment_once,
> > > > > > +                       struct bpf_reg_state *atomic_load_reg)
> > > > >
> > > > > No new side effects please.
> > > > > value_regno is not pretty already.
> > > > > At least its known ugliness that we need to clean up one day.
> > > > >
> > > > > >  static int check_atomic(struct bpf_verifier_env *env, int insn_idx, struct bpf_insn *insn)
> > > > > >  {
> > > > > > +   struct bpf_reg_state atomic_load_reg;
> > > > > >     int load_reg;
> > > > > >     int err;
> > > > > >
> > > > > > +   __mark_reg_unknown(env, &atomic_load_reg);
> > > > > > +
> > > > > >     switch (insn->imm) {
> > > > > >     case BPF_ADD:
> > > > > >     case BPF_ADD | BPF_FETCH:
> > > > > > @@ -4813,6 +4894,7 @@ static int check_atomic(struct bpf_verifier_env *env, int insn_idx, struct bpf_i
> > > > > >             else
> > > > > >                     load_reg = insn->src_reg;
> > > > > >
> > > > > > +           atomic_load_reg = *reg_state(env, load_reg);
> > > > > >             /* check and record load of old value */
> > > > > >             err = check_reg_arg(env, load_reg, DST_OP);
> > > > > >             if (err)
> > > > > > @@ -4825,20 +4907,21 @@ static int check_atomic(struct bpf_verifier_env *env, int insn_idx, struct bpf_i
> > > > > >     }
> > > > > >
> > > > > >     /* Check whether we can read the memory, with second call for fetch
> > > > > > -    * case to simulate the register fill.
> > > > > > +    * case to simulate the register fill, which also triggers checks
> > > > > > +    * for manipulation of BTF ID pointers embedded in BPF maps.
> > > > > >      */
> > > > > >     err = check_mem_access(env, insn_idx, insn->dst_reg, insn->off,
> > > > > > -                          BPF_SIZE(insn->code), BPF_READ, -1, true);
> > > > > > +                          BPF_SIZE(insn->code), BPF_READ, -1, true, NULL);
> > > > > >     if (!err && load_reg >= 0)
> > > > > >             err = check_mem_access(env, insn_idx, insn->dst_reg, insn->off,
> > > > > >                                    BPF_SIZE(insn->code), BPF_READ, load_reg,
> > > > > > -                                  true);
> > > > > > +                                  true, load_reg >= 0 ? &atomic_load_reg : NULL);
> > > > >
> > > > > Special xchg logic should be down outside of check_mem_access()
> > > > > instead of hidden by layers of calls.
> > > >
> > > > Right, it's ugly, but if we don't capture the reg state before that
> > > > check_reg_arg(env, load_reg, DST_OP), it's not possible to see the actual
> > > > PTR_TO_BTF_ID being moved into the map, since check_reg_arg will do a
> > > > mark_reg_unknown for value_regno. Any other ideas on what I can do?
> > > >
> > > > 37086bfdc737 ("bpf: Propagate stack bounds to registers in atomics w/ BPF_FETCH")
> > > > changed the order of check_mem_access and DST_OP check_reg_arg.
> > >
> > > That highlights my point that side effects are bad.
> > > That commit tries to work around that behavior and makes things
> > > harder to extend like you found out with xchg logic.
> > > Another option would be to add bpf_kptr_xchg() helper
> > > instead of dealing with insn. It will be tiny bit slower,
> > > but it will work on all architectures. While xchg bpf jit is
> > > on x86,s390,mips so far.
> >
> > Right, but kfunc is currently limited to x86, which is required to obtain a
> > refcounted PTR_TO_BTF_ID that you can move into the map, so it wouldn't make
> > much of a difference.
>
> Well the patches to add trampoline support to powerpc were already posted.
>
> > > We need to think more on how to refactor check_mem_access without
> > > digging ourselves into an even bigger hole.
> >
> > So I'm ok with working on untangling check_mem_access as a follow-up, but for
> > now, should we go forward with it as is? Just looking at it yesterday makes me
> > think it's going to require a fair amount of refactoring and discussion.
> >
> > Also, do you have any ideas on how to change it? Do you want it to work the
> > way the is_valid_access callbacks do? That is, passing something like a
> > bpf_insn_access_aux into the call, where it records how it'd like to update
> > the register, and then the actual updates take place in the caller's context?
>
> I don't like callbacks in general.
> They're fine for walk_the_tree, for_each_elem accessors,
> but passing a callback into check_mem_access is not great.

I didn't mean passing a callback. I meant passing a struct, like the one you
mentioned in a previous comment on another patch (btf_field_info), in which
check_mem_access can record the state that must be updated for the register;
the updates are then done by the caller, separating the 'side effects' from
the other checks. The is_valid_access verifier callbacks receive a similar
bpf_insn_access_aux parameter, which is then used to update register state.
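
Concretely, something along these lines, just to show the shape of the idea
(bpf_map_ptr_access_aux and apply_map_ptr_access_aux are made-up names,
nothing like this exists in the verifier today):

	/* Hypothetical aux struct, filled in by check_mem_access() instead
	 * of it updating register state directly:
	 */
	struct bpf_map_ptr_access_aux {
		enum bpf_reg_type reg_type;	/* type the value reg should get */
		struct btf *btf;
		u32 btf_id;
		bool ref_obj;			/* a reference moved out of the map */
	};

	/* ... the caller (check_atomic/do_check) then applies the update: */
	struct bpf_map_ptr_access_aux aux = {};

	err = check_mem_access(env, insn_idx, insn->dst_reg, insn->off,
			       BPF_SIZE(insn->code), BPF_READ, load_reg, true,
			       &aux);
	if (!err && load_reg >= 0)
		apply_map_ptr_access_aux(env, load_reg, &aux);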

> Do you mind going with a bpf_kptr_xchg() helper for now
> and optimizing it into a direct xchg insn later?

I don't have a problem with that. I just didn't see any advantages (except the
wider architecture support that you pointed out). We still have to special-case
some places in check_helper_call (since it needs to transfer R1's btf_id to R0,
and work with all PTR_TO_BTF_ID types, not just one), so the implementation is
similar.
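
For completeness, from the program side the two options would look roughly
like this (a sketch only: __kptr_ref is the annotation from patch 13,
bpf_kptr_xchg is the proposed helper name, ct is a referenced pointer acquired
earlier, e.g. via the kfunc from patch 12, array_map is whatever map uses this
value layout, and -mcpu=v3 is needed so clang emits BPF_XCHG for the __sync
builtin):

	struct map_value {
		struct nf_conn __kptr_ref *ct;
	};

	struct map_value *v;
	struct nf_conn *old;
	u32 key = 0;

	v = bpf_map_lookup_elem(&array_map, &key);
	if (!v)
		return 0;

	/* (a) direct xchg insn */
	old = __sync_lock_test_and_set(&v->ct, ct);

	/* (b) proposed helper, also usable on archs without an xchg JIT */
	old = bpf_kptr_xchg(&v->ct, ct);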

I guess for most use cases it wouldn't matter much.

> It's not clear whether it's going to be noticeably faster.

Just out of curiosity, I measured a loop of 5000 xchg ops, one version using
the BPF_XCHG insn and one using bpf_kptr_xchg. In the simple case (uncontended,
raw cost of both operations), the xchg insn is at ~4 nsecs and bpf_kptr_xchg is
at ~8 nsecs (single socket 8 core Intel i5 @ 2.5GHz). I'm guessing that in a
more complicated case, spill/fill of caller-saved regs will also come into play
for the helper case.
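
The loop itself was nothing fancy, roughly like this (a sketch, not the exact
program I ran; v->ptr is the kptr slot in the map value, p the pointer being
exchanged, built with -mcpu=v3 so the __sync builtin becomes a BPF_XCHG insn):

	u64 start, delta;
	int i;

	start = bpf_ktime_get_ns();
	for (i = 0; i < 5000; i++)
		p = __sync_lock_test_and_set(&v->ptr, p);
	delta = bpf_ktime_get_ns() - start;
	/* second variant: same loop, with p = bpf_kptr_xchg(&v->ptr, p) */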

--
Kartikeya

end of thread, other threads:[~2022-02-24  8:43 UTC | newest]

Thread overview: 38+ messages
2022-02-20 13:47 [PATCH bpf-next v1 00/15] Introduce typed pointer support in BPF maps Kumar Kartikeya Dwivedi
2022-02-20 13:47 ` [PATCH bpf-next v1 01/15] bpf: Factor out fd returning from bpf_btf_find_by_name_kind Kumar Kartikeya Dwivedi
2022-02-22  5:28   ` Alexei Starovoitov
2022-02-23  3:05     ` Kumar Kartikeya Dwivedi
2022-02-20 13:48 ` [PATCH bpf-next v1 02/15] bpf: Make btf_find_field more generic Kumar Kartikeya Dwivedi
2022-02-20 13:48 ` [PATCH bpf-next v1 03/15] bpf: Allow storing PTR_TO_BTF_ID in map Kumar Kartikeya Dwivedi
2022-02-22  6:46   ` Alexei Starovoitov
2022-02-23  3:09     ` Kumar Kartikeya Dwivedi
2022-02-23 21:46       ` Alexei Starovoitov
2022-02-20 13:48 ` [PATCH bpf-next v1 04/15] bpf: Allow storing referenced " Kumar Kartikeya Dwivedi
2022-02-22  6:53   ` Alexei Starovoitov
2022-02-22  7:10     ` Kumar Kartikeya Dwivedi
2022-02-22 16:20       ` Alexei Starovoitov
2022-02-23  3:04         ` Kumar Kartikeya Dwivedi
2022-02-23 21:52           ` Alexei Starovoitov
2022-02-24  8:43             ` Kumar Kartikeya Dwivedi
2022-02-20 13:48 ` [PATCH bpf-next v1 05/15] bpf: Allow storing PTR_TO_PERCPU_BTF_ID " Kumar Kartikeya Dwivedi
2022-02-20 20:40   ` kernel test robot
2022-02-20 13:48 ` [PATCH bpf-next v1 06/15] bpf: Allow storing __user PTR_TO_BTF_ID " Kumar Kartikeya Dwivedi
2022-02-20 13:48 ` [PATCH bpf-next v1 07/15] bpf: Prevent escaping of pointers loaded from maps Kumar Kartikeya Dwivedi
2022-02-20 13:48 ` [PATCH bpf-next v1 08/15] bpf: Adapt copy_map_value for multiple offset case Kumar Kartikeya Dwivedi
2022-02-22  7:04   ` Alexei Starovoitov
2022-02-23  3:13     ` Kumar Kartikeya Dwivedi
2022-02-23 21:41       ` Alexei Starovoitov
2022-02-20 13:48 ` [PATCH bpf-next v1 09/15] bpf: Populate pairs of btf_id and destructor kfunc in btf Kumar Kartikeya Dwivedi
2022-02-20 13:48 ` [PATCH bpf-next v1 10/15] bpf: Wire up freeing of referenced PTR_TO_BTF_ID in map Kumar Kartikeya Dwivedi
2022-02-20 21:43   ` kernel test robot
2022-02-20 22:55   ` kernel test robot
2022-02-21  0:39   ` kernel test robot
2022-02-20 13:48 ` [PATCH bpf-next v1 11/15] bpf: Teach verifier about kptr_get style kfunc helpers Kumar Kartikeya Dwivedi
2022-02-20 13:48 ` [PATCH bpf-next v1 12/15] net/netfilter: Add bpf_ct_kptr_get helper Kumar Kartikeya Dwivedi
2022-02-21  4:35   ` kernel test robot
2022-02-20 13:48 ` [PATCH bpf-next v1 13/15] libbpf: Add __kptr* macros to bpf_helpers.h Kumar Kartikeya Dwivedi
2022-02-20 13:48 ` [PATCH bpf-next v1 14/15] selftests/bpf: Add C tests for PTR_TO_BTF_ID in map Kumar Kartikeya Dwivedi
2022-02-20 13:48 ` [PATCH bpf-next v1 15/15] selftests/bpf: Add verifier " Kumar Kartikeya Dwivedi
2022-02-22  6:05 ` [PATCH bpf-next v1 00/15] Introduce typed pointer support in BPF maps Song Liu
2022-02-22  8:21   ` Kumar Kartikeya Dwivedi
2022-02-23  7:29     ` Song Liu
