* [PATCH bpf-next v7 0/3] Support storing struct task_struct objects as kptrs
@ 2022-11-17  3:23 David Vernet
  2022-11-17  3:24 ` [PATCH bpf-next v7 1/3] bpf: Allow trusted pointers to be passed to KF_TRUSTED_ARGS kfuncs David Vernet
                   ` (3 more replies)
  0 siblings, 4 replies; 22+ messages in thread
From: David Vernet @ 2022-11-17  3:23 UTC (permalink / raw)
  To: bpf
  Cc: ast, andrii, daniel, martin.lau, memxor, yhs, song, sdf,
	john.fastabend, kpsingh, jolsa, haoluo, tj, kernel-team,
	linux-kernel

Now that BPF supports adding new kernel functions with kfuncs, and
storing kernel objects in maps with kptrs, we can add a set of kfuncs
which allow struct task_struct objects to be stored in maps as
referenced kptrs.

The possible use cases for doing this are plentiful.  During tracing,
for example, it would be useful to be able to collect some tasks that
performed a certain operation, and then periodically summarize who they
are, which cgroup they're in, how much CPU time they've utilized, etc.
Doing this now would require storing the tasks' pids along with some
relevant data to be exported to user space, and later associating the
pids to tasks in other event handlers where the data is recorded.
Another useful by-product of this is that it allows a program to pin a
task in a BPF program, and by proxy therefore also e.g. pin its task
local storage.
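
For example, using the kfuncs added later in this series, a tracepoint
program could stash a newly created task in a map as a referenced
kptr. Here is a minimal sketch that closely mirrors the selftests in
patch 3/3 (names are illustrative, and the usual vmlinux.h /
bpf_helpers.h / bpf_tracing.h includes are assumed):

struct task_map_value {
	struct task_struct __kptr_ref *task;
};

struct {
	__uint(type, BPF_MAP_TYPE_HASH);
	__type(key, int);
	__type(value, struct task_map_value);
	__uint(max_entries, 1);
} tasks SEC(".maps");

struct task_struct *bpf_task_acquire(struct task_struct *p) __ksym;
void bpf_task_release(struct task_struct *p) __ksym;

SEC("tp_btf/task_newtask")
int BPF_PROG(pin_new_task, struct task_struct *task, u64 clone_flags)
{
	struct task_map_value *v;
	struct task_struct *acquired, *old;
	int key = 0;

	v = bpf_map_lookup_elem(&tasks, &key);
	if (!v)
		return 0;

	/* 'task' is trusted: it was passed directly as a tracepoint arg. */
	acquired = bpf_task_acquire(task);
	old = bpf_kptr_xchg(&v->task, acquired);
	if (old)
		/* Displaced a previously stored task; drop its reference. */
		bpf_task_release(old);

	return 0;
}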

In order to support this, we'll need to expand KF_TRUSTED_ARGS to
support receiving trusted, non-refcounted pointers. It currently only
supports either PTR_TO_CTX pointers, or refcounted pointers. In terms
of the implementation, this means that btf_check_func_arg_match() must
add another condition to its check for whether a pointer needs a
refcount: the requirement is waived for pointers carrying an
appropriate type modifier, such as the new PTR_TRUSTED modifier we're
adding (described below). Note that PTR_UNTRUSTED is insufficient for
this purpose, as it does not cover all of the possible pointers we
need to watch out for. For example, a pointer obtained from walking a
struct is considered "trusted" (or at least, not PTR_UNTRUSTED). To
account for this, and to enable us to expand KF_TRUSTED_ARGS to
include allow-listed arguments such as those passed by the kernel to
tracepoints and struct_ops callbacks, this patch set also introduces a
new PTR_TRUSTED type flag modifier which records whether a pointer was
passed from the kernel in a trusted context.

In closing, this patch set:

1. Adds the new PTR_TRUSTED register type modifier flag, and updates the
   verifier and existing selftests accordingly.
2. Expands KF_TRUSTED_ARGS to also include trusted pointers that were
   not obtained from walking structs. 
3. Adds a new set of kfuncs that allows struct task_struct* objects to be
   used as kptrs.
4. Adds a new selftest suite to validate these new task kfuncs.

--
Changelog:
v6 -> v7:
- Removed the PTR_WALKED type modifier, and instead define a new
  PTR_TRUSTED type modifier which is set on registers containing
  pointers passed from trusted contexts (i.e. as tracepoint or
  struct_ops callback args) (Alexei)
- Remove the new KF_OWNED_ARGS kfunc flag. This can be accomplished
  by defining a new type that wraps an existing type, such as with
  struct nf_conn___init (Alexei)
- Add a test_task_current_acquire_release testcase which verifies we can
  acquire a task struct returned from bpf_get_current_task_btf().
- Make bpf_task_acquire() no longer return NULL, as it can no longer be
  called with a NULL task.
- Removed unnecessary is_test_kfunc_task() checks from failure
  testcases.

v5 -> v6:
- Add a new KF_OWNED_ARGS kfunc flag which may be used by kfuncs to
  express that they require trusted, refcounted args (Kumar)
- Rename PTR_NESTED -> PTR_WALKED in the verifier (Kumar)
- Convert reg_type_str() prefixes to use snprintf() instead of strncpy()
  (Kumar)
- Add PTR_TO_BTF_ID | PTR_WALKED to missing struct btf_reg_type
  instances -- specifically btf_id_sock_common_types, and
  percpu_btf_ptr_types.
- Add a missing PTR_TO_BTF_ID | PTR_WALKED switch case entry in
  check_func_arg_reg_off(), which is required when validating helper
  calls (Kumar)
- Update reg_type_mismatch_ok() to check base types for the registers
  (i.e. to accommodate type modifiers). Additionally, add a lengthy
  comment that explains why this is being done (Kumar)
- Update convert_ctx_accesses() to also issue probe reads for
  PTR_TO_BTF_ID | PTR_WALKED (Kumar)
- Update selftests to expect new prefix reg type strings.
- Rename task_kfunc_acquire_trusted_nested testcase to
  task_kfunc_acquire_trusted_walked, and fix a comment (Kumar)
- Remove KF_TRUSTED_ARGS from bpf_task_release(), which already includes
  KF_RELEASE (Kumar)
- Add bpf-next in patch subject lines (Kumar)

v4 -> v5:
- Fix an improperly formatted patch title.

v3 -> v4:
- Remove an unnecessary check from my repository that I forgot to remove
  after debugging something.

v2 -> v3:
- Make bpf_task_acquire() check for NULL, and include KF_RET_NULL
  (Martin)
- Include new PTR_NESTED register modifier type flag which specifies
  whether a pointer was obtained from walking a struct. Use this to
  expand the meaning of KF_TRUSTED_ARGS to include trusted pointers that
  were passed from the kernel (Kumar)
- Add more selftests to the task_kfunc selftest suite which verify that
  you cannot pass a walked pointer to bpf_task_acquire().
- Update bpf_task_acquire() to also specify KF_TRUSTED_ARGS.

v1 -> v2:
- Rename tracing_btf_ids to generic_kfunc_btf_ids, and add the new
  kfuncs to that list instead of making a separate btf id list (Alexei).
- Don't run the new selftest suite on s390x, which doesn't appear to
  support invoking kfuncs.
- Add a missing __diag_ignore block for -Wmissing-prototypes
  (lkp@intel.com).
- Fix formatting on some of the SPDX-License-Identifier tags.
- Clarified the function header comment a bit on bpf_task_kptr_get().

David Vernet (3):
  bpf: Allow trusted pointers to be passed to KF_TRUSTED_ARGS kfuncs
  bpf: Add kfuncs for storing struct task_struct * as a kptr
  bpf/selftests: Add selftests for new task kfuncs

 Documentation/bpf/kfuncs.rst                  |  28 +-
 include/linux/bpf.h                           |  25 ++
 include/linux/btf.h                           |  66 ++--
 kernel/bpf/btf.c                              |  44 ++-
 kernel/bpf/helpers.c                          |  83 ++++-
 kernel/bpf/verifier.c                         |  45 ++-
 kernel/trace/bpf_trace.c                      |   2 +-
 net/ipv4/bpf_tcp_ca.c                         |   4 +-
 tools/testing/selftests/bpf/DENYLIST.s390x    |   1 +
 .../selftests/bpf/prog_tests/task_kfunc.c     | 160 +++++++++
 .../selftests/bpf/progs/task_kfunc_common.h   |  81 +++++
 .../selftests/bpf/progs/task_kfunc_failure.c  | 304 ++++++++++++++++++
 .../selftests/bpf/progs/task_kfunc_success.c  | 127 ++++++++
 tools/testing/selftests/bpf/verifier/calls.c  |   4 +-
 .../selftests/bpf/verifier/ref_tracking.c     |   4 +-
 15 files changed, 906 insertions(+), 72 deletions(-)
 create mode 100644 tools/testing/selftests/bpf/prog_tests/task_kfunc.c
 create mode 100644 tools/testing/selftests/bpf/progs/task_kfunc_common.h
 create mode 100644 tools/testing/selftests/bpf/progs/task_kfunc_failure.c
 create mode 100644 tools/testing/selftests/bpf/progs/task_kfunc_success.c

-- 
2.38.1


* [PATCH bpf-next v7 1/3] bpf: Allow trusted pointers to be passed to KF_TRUSTED_ARGS kfuncs
  2022-11-17  3:23 [PATCH bpf-next v7 0/3] Support storing struct task_struct objects as kptrs David Vernet
@ 2022-11-17  3:24 ` David Vernet
  2022-11-18  2:26   ` Alexei Starovoitov
  2022-11-17  3:24 ` [PATCH bpf-next v7 2/3] bpf: Add kfuncs for storing struct task_struct * as a kptr David Vernet
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 22+ messages in thread
From: David Vernet @ 2022-11-17  3:24 UTC (permalink / raw)
  To: bpf
  Cc: ast, andrii, daniel, martin.lau, memxor, yhs, song, sdf,
	john.fastabend, kpsingh, jolsa, haoluo, tj, kernel-team,
	linux-kernel

Kfuncs currently support specifying the KF_TRUSTED_ARGS flag to signal
to the verifier that it should enforce that a BPF program passes it a
"safe", trusted pointer. Currently, "safe" means that the pointer is
either PTR_TO_CTX, or is refcounted. There may be cases, however, where
the kernel passes a BPF program a safe / trusted pointer to an object
that the BPF program wishes to use as a kptr, but because the object
does not yet have a ref_obj_id from the perspective of the verifier, the
program would be unable to pass it to a KF_ACQUIRE | KF_TRUSTED_ARGS
kfunc.

The solution is to expand the set of pointers that are considered
trusted according to KF_TRUSTED_ARGS, so that programs can invoke kfuncs
with these pointers without getting rejected by the verifier.

There is already a PTR_UNTRUSTED flag that is set in some scenarios,
such as when a BPF program reads a kptr directly from a map
without performing a bpf_kptr_xchg() call. These pointers of course can
and should be rejected by the verifier. Unfortunately, however,
PTR_UNTRUSTED does not cover all of the unsafe cases that must be
handled to adequately protect kfuncs. Specifically, pointers
obtained by a BPF program "walking" a struct are _not_ considered
PTR_UNTRUSTED according to BPF. For example, say that we were to add a
kfunc called bpf_task_acquire(), with KF_ACQUIRE | KF_TRUSTED_ARGS, to
acquire a struct task_struct *. If we only used PTR_UNTRUSTED to signal
that a task was unsafe to pass to a kfunc, the verifier would mistakenly
allow the following unsafe BPF program to be loaded:

SEC("tp_btf/task_newtask")
int BPF_PROG(unsafe_acquire_task,
             struct task_struct *task,
             u64 clone_flags)
{
        struct task_struct *acquired, *nested;

        nested = task->last_wakee;

        /* Would not be rejected by the verifier. */
        acquired = bpf_task_acquire(nested);
        if (!acquired)
                return 0;

        bpf_task_release(acquired);
        return 0;
}

To address this, this patch defines a new type flag called PTR_TRUSTED
which tracks whether a PTR_TO_BTF_ID pointer is safe to pass to a
KF_TRUSTED_ARGS kfunc or a BPF helper function. PTR_TRUSTED pointers are
passed directly from the kernel as a tracepoint or struct_ops callback
argument. Any nested pointer that is obtained from walking a PTR_TRUSTED
pointer is no longer PTR_TRUSTED. From the example above, the struct
task_struct *task argument is PTR_TRUSTED, but the 'nested' pointer
obtained from 'task->last_wakee' is not PTR_TRUSTED.
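
For contrast, the trusted variant of the same program would be
accepted with this change applied. A minimal sketch, assuming the
bpf_task_acquire() / bpf_task_release() kfuncs added in the next
patch:

SEC("tp_btf/task_newtask")
int BPF_PROG(safe_acquire_task,
             struct task_struct *task,
             u64 clone_flags)
{
        struct task_struct *acquired;

        /* 'task' is PTR_TRUSTED: passed directly as a tracepoint arg. */
        acquired = bpf_task_acquire(task);
        bpf_task_release(acquired);
        return 0;
}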

A subsequent patch will add kfuncs for storing a task kfunc as a kptr,
and then another patch will add selftests to validate.

Signed-off-by: David Vernet <void@manifault.com>
---
 Documentation/bpf/kfuncs.rst                  | 30 ++++-----
 include/linux/bpf.h                           | 30 +++++++++
 include/linux/btf.h                           | 65 ++++++++++++-------
 kernel/bpf/btf.c                              | 38 +++++++++--
 kernel/bpf/verifier.c                         | 45 ++++++++-----
 kernel/trace/bpf_trace.c                      |  2 +-
 net/ipv4/bpf_tcp_ca.c                         |  4 +-
 tools/testing/selftests/bpf/verifier/calls.c  |  2 +-
 .../selftests/bpf/verifier/ref_tracking.c     |  4 +-
 9 files changed, 154 insertions(+), 66 deletions(-)

diff --git a/Documentation/bpf/kfuncs.rst b/Documentation/bpf/kfuncs.rst
index 0f858156371d..67b7e2f46ec6 100644
--- a/Documentation/bpf/kfuncs.rst
+++ b/Documentation/bpf/kfuncs.rst
@@ -137,22 +137,20 @@ KF_ACQUIRE and KF_RET_NULL flags.
 --------------------------
 
 The KF_TRUSTED_ARGS flag is used for kfuncs taking pointer arguments. It
-indicates that the all pointer arguments will always have a guaranteed lifetime,
-and pointers to kernel objects are always passed to helpers in their unmodified
-form (as obtained from acquire kfuncs).
-
-It can be used to enforce that a pointer to a refcounted object acquired from a
-kfunc or BPF helper is passed as an argument to this kfunc without any
-modifications (e.g. pointer arithmetic) such that it is trusted and points to
-the original object.
-
-Meanwhile, it is also allowed pass pointers to normal memory to such kfuncs,
-but those can have a non-zero offset.
-
-This flag is often used for kfuncs that operate (change some property, perform
-some operation) on an object that was obtained using an acquire kfunc. Such
-kfuncs need an unchanged pointer to ensure the integrity of the operation being
-performed on the expected object.
+indicates that all pointer arguments are valid, and that all pointers to
+BTF objects have been passed in their unmodified form (that is, at a zero
+offset, and without having been obtained from walking another pointer).
+
+There are two types of pointers to kernel objects which are considered "valid":
+
+1. Pointers which are passed as tracepoint or struct_ops callback arguments.
+2. Pointers which were returned from a KF_ACQUIRE or KF_KPTR_GET kfunc.
+
+Pointers to non-BTF objects (e.g. scalar pointers) may also be passed to
+KF_TRUSTED_ARGS kfuncs, and may have a non-zero offset.
+
+The definition of "valid" pointers is subject to change at any time, and has
+absolutely no ABI stability guarantees.
 
 2.4.6 KF_SLEEPABLE flag
 -----------------------
diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 54462dd28824..763ae250693e 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -524,6 +524,35 @@ enum bpf_type_flag {
 	/* Size is known at compile time. */
 	MEM_FIXED_SIZE		= BIT(10 + BPF_BASE_TYPE_BITS),
 
+	/* PTR was passed from the kernel in a trusted context, and may be
+	 * passed to KF_TRUSTED_ARGS kfuncs or BPF helper functions.
+	 * Confusingly, this is _not_ the opposite of PTR_UNTRUSTED above.
+	 * PTR_UNTRUSTED refers to a kptr that was read directly from a map
+	 * without invoking bpf_kptr_xchg(). What we really need to know is
+	 * whether a pointer is safe to pass to a kfunc or BPF helper function.
+	 * While PTR_UNTRUSTED pointers are unsafe to pass to kfuncs and BPF
+	 * helpers, they do not cover all possible instances of unsafe
+	 * pointers. For example, a pointer that was obtained from walking a
+	 * struct will _not_ get the PTR_UNTRUSTED type modifier, despite the
+	 * fact that it may be NULL, invalid, etc. This is due to backwards
+	 * compatibility requirements, as this was the behavior that was first
+	 * introduced when kptrs were added. The behavior is now considered
+	 * deprecated, and PTR_UNTRUSTED will eventually be removed.
+	 *
+	 * PTR_TRUSTED, on the other hand, is a pointer that the kernel
+	 * guarantees to be valid and safe to pass to kfuncs and BPF helpers.
+	 * For example, pointers passed to tracepoint arguments are considered
+	 * PTR_TRUSTED, as are pointers that are passed to struct_ops
+	 * callbacks. As alluded to above, pointers that are obtained from
+	 * walking PTR_TRUSTED pointers are _not_ trusted. For example, if a
+	 * struct task_struct *task is PTR_TRUSTED, then accessing
+	 * task->last_wakee will lose the PTR_TRUSTED modifier when it's stored
+	 * in a BPF register. Similarly, pointers passed to certain program
+	 * types such as kretprobes are not guaranteed to be valid, as they
+	 * may, for example, point to an object that was recently freed.
+	 */
+	PTR_TRUSTED		= BIT(11 + BPF_BASE_TYPE_BITS),
+
 	__BPF_TYPE_FLAG_MAX,
 	__BPF_TYPE_LAST_FLAG	= __BPF_TYPE_FLAG_MAX - 1,
 };
@@ -617,6 +646,7 @@ enum bpf_return_type {
 	RET_PTR_TO_RINGBUF_MEM_OR_NULL	= PTR_MAYBE_NULL | MEM_RINGBUF | RET_PTR_TO_MEM,
 	RET_PTR_TO_DYNPTR_MEM_OR_NULL	= PTR_MAYBE_NULL | RET_PTR_TO_MEM,
 	RET_PTR_TO_BTF_ID_OR_NULL	= PTR_MAYBE_NULL | RET_PTR_TO_BTF_ID,
+	RET_PTR_TO_BTF_ID_TRUSTED	= PTR_TRUSTED	 | RET_PTR_TO_BTF_ID,
 
 	/* This must be the last entry. Its purpose is to ensure the enum is
 	 * wide enough to hold the higher bits reserved for bpf_type_flag.
diff --git a/include/linux/btf.h b/include/linux/btf.h
index d80345fa566b..13b969e74d3b 100644
--- a/include/linux/btf.h
+++ b/include/linux/btf.h
@@ -17,36 +17,53 @@
 #define KF_RELEASE	(1 << 1) /* kfunc is a release function */
 #define KF_RET_NULL	(1 << 2) /* kfunc returns a pointer that may be NULL */
 #define KF_KPTR_GET	(1 << 3) /* kfunc returns reference to a kptr */
-/* Trusted arguments are those which are meant to be referenced arguments with
- * unchanged offset. It is used to enforce that pointers obtained from acquire
- * kfuncs remain unmodified when being passed to helpers taking trusted args.
+/* Trusted arguments are those which are guaranteed to be valid when passed to
+ * the kfunc. It is used to enforce that pointers obtained from either acquire
+ * kfuncs, or from the main kernel on a tracepoint or struct_ops callback
+ * invocation, remain unmodified when being passed to helpers taking trusted
+ * args.
  *
- * Consider
- *	struct foo {
- *		int data;
- *		struct foo *next;
- *	};
+ * Consider, for example, the following new task tracepoint:
  *
- *	struct bar {
- *		int data;
- *		struct foo f;
- *	};
+ *	SEC("tp_btf/task_newtask")
+ *	int BPF_PROG(new_task_tp, struct task_struct *task, u64 clone_flags)
+ *	{
+ *		...
+ *	}
  *
- *	struct foo *f = alloc_foo(); // Acquire kfunc
- *	struct bar *b = alloc_bar(); // Acquire kfunc
+ * And the following kfunc:
  *
- * If a kfunc set_foo_data() wants to operate only on the allocated object, it
- * will set the KF_TRUSTED_ARGS flag, which will prevent unsafe usage like:
+ *	BTF_ID_FLAGS(func, bpf_task_acquire, KF_ACQUIRE | KF_TRUSTED_ARGS)
  *
- *	set_foo_data(f, 42);	   // Allowed
- *	set_foo_data(f->next, 42); // Rejected, non-referenced pointer
- *	set_foo_data(&f->next, 42);// Rejected, referenced, but wrong type
- *	set_foo_data(&b->f, 42);   // Rejected, referenced, but bad offset
+ * All invocations to the kfunc must pass the unmodified, unwalked task:
  *
- * In the final case, usually for the purposes of type matching, it is deduced
- * by looking at the type of the member at the offset, but due to the
- * requirement of trusted argument, this deduction will be strict and not done
- * for this case.
+ *	bpf_task_acquire(task);		    // Allowed
+ *	bpf_task_acquire(task->last_wakee); // Rejected, walked task
+ *
+ * Programs may also pass referenced tasks directly to the kfunc:
+ *
+ *	struct task_struct *acquired;
+ *
+ *	acquired = bpf_task_acquire(task);	// Allowed, same as above
+ *	bpf_task_acquire(acquired);		// Allowed
+ *	bpf_task_acquire(task);			// Allowed
+ *	bpf_task_acquire(acquired->last_wakee); // Rejected, walked task
+ *
+ * Programs may _not_, however, pass a task from an arbitrary fentry/fexit, or
+ * kprobe/kretprobe to the kfunc, as BPF cannot guarantee that these pointers
+ * are safe. For example, the following BPF program
+ * would be rejected:
+ *
+ * SEC("kretprobe/free_task")
+ * int BPF_PROG(free_task_probe, struct task_struct *tsk)
+ * {
+ *	struct task_struct *acquired;
+ *
+ *	acquired = bpf_task_acquire(tsk); // Rejected, not a trusted pointer
+ *	bpf_task_release(acquired);
+ *
+ *	return 0;
+ * }
  */
 #define KF_TRUSTED_ARGS (1 << 4) /* kfunc only takes trusted pointer arguments */
 #define KF_SLEEPABLE    (1 << 5) /* kfunc may sleep */
diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c
index 875355ff3718..8291f2911624 100644
--- a/kernel/bpf/btf.c
+++ b/kernel/bpf/btf.c
@@ -5579,6 +5579,11 @@ static u32 get_ctx_arg_idx(struct btf *btf, const struct btf_type *func_proto,
 	return nr_args + 1;
 }
 
+static bool prog_type_args_trusted(enum bpf_prog_type prog_type)
+{
+	return prog_type == BPF_PROG_TYPE_TRACING || prog_type == BPF_PROG_TYPE_STRUCT_OPS;
+}
+
 bool btf_ctx_access(int off, int size, enum bpf_access_type type,
 		    const struct bpf_prog *prog,
 		    struct bpf_insn_access_aux *info)
@@ -5722,6 +5727,9 @@ bool btf_ctx_access(int off, int size, enum bpf_access_type type,
 	}
 
 	info->reg_type = PTR_TO_BTF_ID;
+	if (prog_type_args_trusted(prog->type))
+		info->reg_type |= PTR_TRUSTED;
+
 	if (tgt_prog) {
 		enum bpf_prog_type tgt_type;
 
@@ -6558,15 +6566,26 @@ static int btf_check_func_arg_match(struct bpf_verifier_env *env,
 		/* These register types have special constraints wrt ref_obj_id
 		 * and offset checks. The rest of trusted args don't.
 		 */
-		obj_ptr = reg->type == PTR_TO_CTX || reg->type == PTR_TO_BTF_ID ||
+		obj_ptr = reg->type == PTR_TO_CTX ||
+			  base_type(reg->type) == PTR_TO_BTF_ID ||
 			  reg2btf_ids[base_type(reg->type)];
 
 		/* Check if argument must be a referenced pointer, args + i has
 		 * been verified to be a pointer (after skipping modifiers).
 		 * PTR_TO_CTX is ok without having non-zero ref_obj_id.
+		 *
+		 * All object pointers must be refcounted, other than:
+		 * - PTR_TO_CTX
+		 * - PTR_TRUSTED pointers
 		 */
-		if (is_kfunc && trusted_args && (obj_ptr && reg->type != PTR_TO_CTX) && !reg->ref_obj_id) {
-			bpf_log(log, "R%d must be referenced\n", regno);
+		if (is_kfunc &&
+		    trusted_args &&
+		    obj_ptr &&
+		    base_type(reg->type) != PTR_TO_CTX &&
+		    (!(type_flag(reg->type) & PTR_TRUSTED) ||
+		     (type_flag(reg->type) & ~PTR_TRUSTED)) &&
+		    !reg->ref_obj_id) {
+			bpf_log(log, "R%d must be referenced or trusted\n", regno);
 			return -EINVAL;
 		}
 
@@ -6646,8 +6665,8 @@ static int btf_check_func_arg_match(struct bpf_verifier_env *env,
 					i, btf_type_str(t));
 				return -EINVAL;
 			}
-		} else if (is_kfunc && (reg->type == PTR_TO_BTF_ID ||
-			   (reg2btf_ids[base_type(reg->type)] && !type_flag(reg->type)))) {
+		} else if (is_kfunc && (base_type(reg->type) == PTR_TO_BTF_ID ||
+			   (reg2btf_ids[base_type(reg->type)]))) {
 			const struct btf_type *reg_ref_t;
 			const struct btf *reg_btf;
 			const char *reg_ref_tname;
@@ -6660,7 +6679,13 @@ static int btf_check_func_arg_match(struct bpf_verifier_env *env,
 				return -EINVAL;
 			}
 
-			if (reg->type == PTR_TO_BTF_ID) {
+			if ((type_flag(reg->type) & ~PTR_TRUSTED)) {
+				bpf_log(log, "kernel function %s arg#%d pointer had unexpected modifiers %d\n",
+					func_name, i, type_flag(reg->type));
+				return -EINVAL;
+			}
+
+			if (base_type(reg->type) == PTR_TO_BTF_ID) {
 				reg_btf = reg->btf;
 				reg_ref_id = reg->btf_id;
 			} else {
@@ -6988,6 +7013,7 @@ int btf_prepare_func_args(struct bpf_verifier_env *env, int subprog,
 			}
 
 			reg->type = PTR_TO_MEM | PTR_MAYBE_NULL;
+
 			reg->id = ++env->id_gen;
 
 			continue;
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 0312d9ce292f..f5b6b1f969d9 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -543,7 +543,7 @@ static bool is_cmpxchg_insn(const struct bpf_insn *insn)
 static const char *reg_type_str(struct bpf_verifier_env *env,
 				enum bpf_reg_type type)
 {
-	char postfix[16] = {0}, prefix[32] = {0};
+	char postfix[16] = {0}, prefix[64] = {0};
 	static const char * const str[] = {
 		[NOT_INIT]		= "?",
 		[SCALAR_VALUE]		= "scalar",
@@ -575,16 +575,14 @@ static const char *reg_type_str(struct bpf_verifier_env *env,
 			strncpy(postfix, "_or_null", 16);
 	}
 
-	if (type & MEM_RDONLY)
-		strncpy(prefix, "rdonly_", 32);
-	if (type & MEM_RINGBUF)
-		strncpy(prefix, "ringbuf_", 32);
-	if (type & MEM_USER)
-		strncpy(prefix, "user_", 32);
-	if (type & MEM_PERCPU)
-		strncpy(prefix, "percpu_", 32);
-	if (type & PTR_UNTRUSTED)
-		strncpy(prefix, "untrusted_", 32);
+	snprintf(prefix, sizeof(prefix), "%s%s%s%s%s%s",
+		 type & MEM_RDONLY ? "rdonly_" : "",
+		 type & MEM_RINGBUF ? "ringbuf_" : "",
+		 type & MEM_USER ? "user_" : "",
+		 type & MEM_PERCPU ? "percpu_" : "",
+		 type & PTR_UNTRUSTED ? "untrusted_" : "",
+		 type & PTR_TRUSTED ? "trusted_" : ""
+	);
 
 	snprintf(env->type_str_buf, TYPE_STR_BUF_LEN, "%s%s%s",
 		 prefix, str[base_type(type)], postfix);
@@ -3844,7 +3842,7 @@ static int map_kptr_match_type(struct bpf_verifier_env *env,
 			       struct bpf_reg_state *reg, u32 regno)
 {
 	const char *targ_name = kernel_type_name(kptr_field->kptr.btf, kptr_field->kptr.btf_id);
-	int perm_flags = PTR_MAYBE_NULL;
+	int perm_flags = PTR_MAYBE_NULL | PTR_TRUSTED;
 	const char *reg_name = "";
 
 	/* Only unreferenced case accepts untrusted pointers */
@@ -4707,6 +4705,9 @@ static int check_ptr_to_btf_access(struct bpf_verifier_env *env,
 	if (type_flag(reg->type) & PTR_UNTRUSTED)
 		flag |= PTR_UNTRUSTED;
 
+	/* Any pointer obtained from walking a trusted pointer is no longer trusted. */
+	flag &= ~PTR_TRUSTED;
+
 	if (atype == BPF_READ && value_regno >= 0)
 		mark_btf_ld_reg(env, regs, value_regno, ret, reg->btf, btf_id, flag);
 
@@ -5774,6 +5775,7 @@ static const struct bpf_reg_types btf_id_sock_common_types = {
 		PTR_TO_TCP_SOCK,
 		PTR_TO_XDP_SOCK,
 		PTR_TO_BTF_ID,
+		PTR_TO_BTF_ID | PTR_TRUSTED,
 	},
 	.btf_id = &btf_sock_ids[BTF_SOCK_TYPE_SOCK_COMMON],
 };
@@ -5807,9 +5809,19 @@ static const struct bpf_reg_types scalar_types = { .types = { SCALAR_VALUE } };
 static const struct bpf_reg_types context_types = { .types = { PTR_TO_CTX } };
 static const struct bpf_reg_types ringbuf_mem_types = { .types = { PTR_TO_MEM | MEM_RINGBUF } };
 static const struct bpf_reg_types const_map_ptr_types = { .types = { CONST_PTR_TO_MAP } };
-static const struct bpf_reg_types btf_ptr_types = { .types = { PTR_TO_BTF_ID } };
+static const struct bpf_reg_types btf_ptr_types = {
+	.types = {
+		PTR_TO_BTF_ID,
+		PTR_TO_BTF_ID | PTR_TRUSTED,
+	},
+};
 static const struct bpf_reg_types spin_lock_types = { .types = { PTR_TO_MAP_VALUE } };
-static const struct bpf_reg_types percpu_btf_ptr_types = { .types = { PTR_TO_BTF_ID | MEM_PERCPU } };
+static const struct bpf_reg_types percpu_btf_ptr_types = {
+	.types = {
+		PTR_TO_BTF_ID | MEM_PERCPU,
+		PTR_TO_BTF_ID | MEM_PERCPU | PTR_TRUSTED,
+	}
+};
 static const struct bpf_reg_types func_ptr_types = { .types = { PTR_TO_FUNC } };
 static const struct bpf_reg_types stack_ptr_types = { .types = { PTR_TO_STACK } };
 static const struct bpf_reg_types const_str_ptr_types = { .types = { PTR_TO_MAP_VALUE } };
@@ -5897,7 +5909,7 @@ static int check_reg_type(struct bpf_verifier_env *env, u32 regno,
 	return -EACCES;
 
 found:
-	if (reg->type == PTR_TO_BTF_ID) {
+	if (base_type(reg->type) == PTR_TO_BTF_ID && !(type_flag(reg->type) & ~PTR_TRUSTED)) {
 		/* For bpf_sk_release, it needs to match against first member
 		 * 'struct sock_common', hence make an exception for it. This
 		 * allows bpf_sk_release to work for multiple socket types.
@@ -5973,6 +5985,7 @@ int check_func_arg_reg_off(struct bpf_verifier_env *env,
 	 * fixed offset.
 	 */
 	case PTR_TO_BTF_ID:
+	case PTR_TO_BTF_ID | PTR_TRUSTED:
 		/* When referenced PTR_TO_BTF_ID is passed to release function,
 		 * it's fixed offset must be 0.	In the other cases, fixed offset
 		 * can be non-zero.
@@ -13690,6 +13703,8 @@ static int convert_ctx_accesses(struct bpf_verifier_env *env)
 			break;
 		case PTR_TO_BTF_ID:
 		case PTR_TO_BTF_ID | PTR_UNTRUSTED:
+		case PTR_TO_BTF_ID | PTR_TRUSTED:
+		case PTR_TO_BTF_ID | PTR_UNTRUSTED | PTR_TRUSTED:
 			if (type == BPF_READ) {
 				insn->code = BPF_LDX | BPF_PROBE_MEM |
 					BPF_SIZE((insn)->code);
diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
index f2d8d070d024..5b9008bc597b 100644
--- a/kernel/trace/bpf_trace.c
+++ b/kernel/trace/bpf_trace.c
@@ -774,7 +774,7 @@ BPF_CALL_0(bpf_get_current_task_btf)
 const struct bpf_func_proto bpf_get_current_task_btf_proto = {
 	.func		= bpf_get_current_task_btf,
 	.gpl_only	= true,
-	.ret_type	= RET_PTR_TO_BTF_ID,
+	.ret_type	= RET_PTR_TO_BTF_ID_TRUSTED,
 	.ret_btf_id	= &btf_tracing_ids[BTF_TRACING_TYPE_TASK],
 };
 
diff --git a/net/ipv4/bpf_tcp_ca.c b/net/ipv4/bpf_tcp_ca.c
index d15c91de995f..0006b5438ff7 100644
--- a/net/ipv4/bpf_tcp_ca.c
+++ b/net/ipv4/bpf_tcp_ca.c
@@ -61,7 +61,9 @@ static bool bpf_tcp_ca_is_valid_access(int off, int size,
 	if (!bpf_tracing_btf_ctx_access(off, size, type, prog, info))
 		return false;
 
-	if (info->reg_type == PTR_TO_BTF_ID && info->btf_id == sock_id)
+	if (base_type(info->reg_type) == PTR_TO_BTF_ID &&
+	    !(type_flag(info->reg_type) & ~PTR_TRUSTED) &&
+	    info->btf_id == sock_id)
 		/* promote it to tcp_sock */
 		info->btf_id = tcp_sock_id;
 
diff --git a/tools/testing/selftests/bpf/verifier/calls.c b/tools/testing/selftests/bpf/verifier/calls.c
index e1a937277b54..7ac947f00df4 100644
--- a/tools/testing/selftests/bpf/verifier/calls.c
+++ b/tools/testing/selftests/bpf/verifier/calls.c
@@ -109,7 +109,7 @@
 	},
 	.prog_type = BPF_PROG_TYPE_SCHED_CLS,
 	.result = REJECT,
-	.errstr = "arg#0 pointer type STRUCT prog_test_ref_kfunc must point",
+	.errstr = "arg#0 pointer had unexpected modifiers",
 	.fixup_kfunc_btf_id = {
 		{ "bpf_kfunc_call_test_acquire", 3 },
 		{ "bpf_kfunc_call_test_release", 5 },
diff --git a/tools/testing/selftests/bpf/verifier/ref_tracking.c b/tools/testing/selftests/bpf/verifier/ref_tracking.c
index fd683a32a276..d9367f2894b9 100644
--- a/tools/testing/selftests/bpf/verifier/ref_tracking.c
+++ b/tools/testing/selftests/bpf/verifier/ref_tracking.c
@@ -142,7 +142,7 @@
 	.kfunc = "bpf",
 	.expected_attach_type = BPF_LSM_MAC,
 	.flags = BPF_F_SLEEPABLE,
-	.errstr = "arg#0 pointer type STRUCT bpf_key must point to scalar, or struct with scalar",
+	.errstr = "arg#0 pointer had unexpected modifiers",
 	.fixup_kfunc_btf_id = {
 		{ "bpf_lookup_user_key", 2 },
 		{ "bpf_key_put", 4 },
@@ -163,7 +163,7 @@
 	.kfunc = "bpf",
 	.expected_attach_type = BPF_LSM_MAC,
 	.flags = BPF_F_SLEEPABLE,
-	.errstr = "arg#0 pointer type STRUCT bpf_key must point to scalar, or struct with scalar",
+	.errstr = "arg#0 pointer had unexpected modifiers",
 	.fixup_kfunc_btf_id = {
 		{ "bpf_lookup_system_key", 1 },
 		{ "bpf_key_put", 3 },
-- 
2.38.1



* [PATCH bpf-next v7 2/3] bpf: Add kfuncs for storing struct task_struct * as a kptr
  2022-11-17  3:23 [PATCH bpf-next v7 0/3] Support storing struct task_struct objects as kptrs David Vernet
  2022-11-17  3:24 ` [PATCH bpf-next v7 1/3] bpf: Allow trusted pointers to be passed to KF_TRUSTED_ARGS kfuncs David Vernet
@ 2022-11-17  3:24 ` David Vernet
  2022-11-17  3:24 ` [PATCH bpf-next v7 3/3] bpf/selftests: Add selftests for new task kfuncs David Vernet
  2022-11-17 21:03 ` [PATCH bpf-next v7 0/3] Support storing struct task_struct objects as kptrs John Fastabend
  3 siblings, 0 replies; 22+ messages in thread
From: David Vernet @ 2022-11-17  3:24 UTC (permalink / raw)
  To: bpf
  Cc: ast, andrii, daniel, martin.lau, memxor, yhs, song, sdf,
	john.fastabend, kpsingh, jolsa, haoluo, tj, kernel-team,
	linux-kernel

Now that BPF supports adding new kernel functions with kfuncs, and
storing kernel objects in maps with kptrs, we can add a set of kfuncs
which allow struct task_struct objects to be stored in maps as
referenced kptrs. The possible use cases for doing this are plentiful.
During tracing, for example, it would be useful to be able to collect
some tasks that performed a certain operation, and then periodically
summarize who they are, which cgroup they're in, how much CPU time
they've utilized, etc.

In order to enable this, this patch adds three new kfuncs:

struct task_struct *bpf_task_acquire(struct task_struct *p);
struct task_struct *bpf_task_kptr_get(struct task_struct **pp);
void bpf_task_release(struct task_struct *p);

A follow-on patch will add selftests validating these kfuncs.
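
As a brief usage sketch (mirroring those selftests, where 'v' is
assumed to point to a map value containing a struct task_struct
__kptr_ref *task field), a program can take its own reference on a
task kptr stored in a map without removing it:

	struct task_struct *kptr;

	/* Take a new reference on the kptr while it stays in the map. */
	kptr = bpf_task_kptr_get(&v->task);
	if (!kptr)
		return 0;

	bpf_task_release(kptr);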

Signed-off-by: David Vernet <void@manifault.com>
---
 kernel/bpf/helpers.c | 83 +++++++++++++++++++++++++++++++++++++++++---
 1 file changed, 78 insertions(+), 5 deletions(-)

diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
index 7bc71995f17c..532063c9a3a2 100644
--- a/kernel/bpf/helpers.c
+++ b/kernel/bpf/helpers.c
@@ -1738,20 +1738,93 @@ void bpf_list_head_free(const struct btf_field *field, void *list_head,
 	}
 }
 
-BTF_SET8_START(tracing_btf_ids)
+__diag_push();
+__diag_ignore_all("-Wmissing-prototypes",
+		  "Global functions as their definitions will be in vmlinux BTF");
+
+/**
+ * bpf_task_acquire - Acquire a reference to a task. A task acquired by this
+ * kfunc which is not stored in a map as a kptr must be released by calling
+ * bpf_task_release().
+ * @p: The task on which a reference is being acquired.
+ */
+__used noinline
+struct task_struct *bpf_task_acquire(struct task_struct *p)
+{
+	refcount_inc(&p->rcu_users);
+	return p;
+}
+
+/**
+ * bpf_task_kptr_get - Acquire a reference on a struct task_struct kptr. A task
+ * kptr acquired by this kfunc which is not subsequently stored in a map must
+ * be released by calling bpf_task_release().
+ * @pp: A pointer to a task kptr on which a reference is being acquired.
+ */
+__used noinline
+struct task_struct *bpf_task_kptr_get(struct task_struct **pp)
+{
+	struct task_struct *p;
+
+	rcu_read_lock();
+	p = READ_ONCE(*pp);
+	if (p && !refcount_inc_not_zero(&p->rcu_users))
+		p = NULL;
+	rcu_read_unlock();
+
+	return p;
+}
+
+/**
+ * bpf_task_release - Release the reference acquired on a struct task_struct *.
+ * If this kfunc is invoked in an RCU read region, the task_struct is
+ * guaranteed to not be freed until the current grace period has ended, even if
+ * its refcount drops to 0.
+ * @p: The task on which a reference is being released.
+ */
+__used noinline void bpf_task_release(struct task_struct *p)
+{
+	if (!p)
+		return;
+
+	put_task_struct_rcu_user(p);
+}
+
+__diag_pop();
+
+BTF_SET8_START(generic_kfunc_btf_ids)
 #ifdef CONFIG_KEXEC_CORE
 BTF_ID_FLAGS(func, crash_kexec, KF_DESTRUCTIVE)
 #endif
-BTF_SET8_END(tracing_btf_ids)
+BTF_ID_FLAGS(func, bpf_task_acquire, KF_ACQUIRE | KF_TRUSTED_ARGS)
+BTF_ID_FLAGS(func, bpf_task_kptr_get, KF_ACQUIRE | KF_KPTR_GET | KF_RET_NULL)
+BTF_ID_FLAGS(func, bpf_task_release, KF_RELEASE)
+BTF_SET8_END(generic_kfunc_btf_ids)
 
-static const struct btf_kfunc_id_set tracing_kfunc_set = {
+static const struct btf_kfunc_id_set generic_kfunc_set = {
 	.owner = THIS_MODULE,
-	.set   = &tracing_btf_ids,
+	.set   = &generic_kfunc_btf_ids,
 };
 
+BTF_ID_LIST(generic_kfunc_dtor_ids)
+BTF_ID(struct, task_struct)
+BTF_ID(func, bpf_task_release)
+
 static int __init kfunc_init(void)
 {
-	return register_btf_kfunc_id_set(BPF_PROG_TYPE_TRACING, &tracing_kfunc_set);
+	int ret;
+	const struct btf_id_dtor_kfunc generic_kfunc_dtors[] = {
+		{
+			.btf_id       = generic_kfunc_dtor_ids[0],
+			.kfunc_btf_id = generic_kfunc_dtor_ids[1]
+		},
+	};
+
+	ret = register_btf_kfunc_id_set(BPF_PROG_TYPE_TRACING, &generic_kfunc_set);
+	ret = ret ?: register_btf_kfunc_id_set(BPF_PROG_TYPE_STRUCT_OPS, &generic_kfunc_set);
+	return ret ?: register_btf_id_dtor_kfuncs(generic_kfunc_dtors,
+						  ARRAY_SIZE(generic_kfunc_dtors),
+						  THIS_MODULE);
 }
 
 late_initcall(kfunc_init);
-- 
2.38.1



* [PATCH bpf-next v7 3/3] bpf/selftests: Add selftests for new task kfuncs
  2022-11-17  3:23 [PATCH bpf-next v7 0/3] Support storing struct task_struct objects as kptrs David Vernet
  2022-11-17  3:24 ` [PATCH bpf-next v7 1/3] bpf: Allow trusted pointers to be passed to KF_TRUSTED_ARGS kfuncs David Vernet
  2022-11-17  3:24 ` [PATCH bpf-next v7 2/3] bpf: Add kfuncs for storing struct task_struct * as a kptr David Vernet
@ 2022-11-17  3:24 ` David Vernet
  2022-11-18  2:21   ` Alexei Starovoitov
  2022-11-17 21:03 ` [PATCH bpf-next v7 0/3] Support storing struct task_struct objects as kptrs John Fastabend
  3 siblings, 1 reply; 22+ messages in thread
From: David Vernet @ 2022-11-17  3:24 UTC (permalink / raw)
  To: bpf
  Cc: ast, andrii, daniel, martin.lau, memxor, yhs, song, sdf,
	john.fastabend, kpsingh, jolsa, haoluo, tj, kernel-team,
	linux-kernel

A previous change added a series of kfuncs for storing struct
task_struct objects as referenced kptrs. This patch adds a new
task_kfunc test suite for validating their expected behavior.
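
The suite can be run with the usual test_progs runner, e.g.:

	cd tools/testing/selftests/bpf
	make
	sudo ./test_progs -t task_kfunc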

Signed-off-by: David Vernet <void@manifault.com>
---
 tools/testing/selftests/bpf/DENYLIST.s390x    |   1 +
 .../selftests/bpf/prog_tests/task_kfunc.c     | 159 +++++++++++
 .../selftests/bpf/progs/task_kfunc_common.h   |  71 +++++
 .../selftests/bpf/progs/task_kfunc_failure.c  | 259 ++++++++++++++++++
 .../selftests/bpf/progs/task_kfunc_success.c  | 149 ++++++++++
 5 files changed, 639 insertions(+)
 create mode 100644 tools/testing/selftests/bpf/prog_tests/task_kfunc.c
 create mode 100644 tools/testing/selftests/bpf/progs/task_kfunc_common.h
 create mode 100644 tools/testing/selftests/bpf/progs/task_kfunc_failure.c
 create mode 100644 tools/testing/selftests/bpf/progs/task_kfunc_success.c

diff --git a/tools/testing/selftests/bpf/DENYLIST.s390x b/tools/testing/selftests/bpf/DENYLIST.s390x
index be4e3d47ea3e..97e522828d6c 100644
--- a/tools/testing/selftests/bpf/DENYLIST.s390x
+++ b/tools/testing/selftests/bpf/DENYLIST.s390x
@@ -53,6 +53,7 @@ skc_to_unix_sock                         # could not attach BPF object unexpecte
 socket_cookie                            # prog_attach unexpected error: -524                                          (trampoline)
 stacktrace_build_id                      # compare_map_keys stackid_hmap vs. stackmap err -2 errno 2                   (?)
 tailcalls                                # tail_calls are not allowed in non-JITed programs with bpf-to-bpf calls      (?)
+task_kfunc                               # JIT does not support calling kernel function
 task_local_storage                       # failed to auto-attach program 'trace_exit_creds': -524                      (trampoline)
 test_bpffs                               # bpffs test  failed 255                                                      (iterator)
 test_bprm_opts                           # failed to auto-attach program 'secure_exec': -524                           (trampoline)
diff --git a/tools/testing/selftests/bpf/prog_tests/task_kfunc.c b/tools/testing/selftests/bpf/prog_tests/task_kfunc.c
new file mode 100644
index 000000000000..a3aa36a4beb3
--- /dev/null
+++ b/tools/testing/selftests/bpf/prog_tests/task_kfunc.c
@@ -0,0 +1,159 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2022 Meta Platforms, Inc. and affiliates. */
+
+#define _GNU_SOURCE
+#include <sys/wait.h>
+#include <test_progs.h>
+#include <unistd.h>
+
+#include "task_kfunc_failure.skel.h"
+#include "task_kfunc_success.skel.h"
+
+static size_t log_buf_sz = 1 << 20; /* 1 MB */
+static char obj_log_buf[1048576];
+
+static struct task_kfunc_success *open_load_task_kfunc_skel(void)
+{
+	struct task_kfunc_success *skel;
+	int err;
+
+	skel = task_kfunc_success__open();
+	if (!ASSERT_OK_PTR(skel, "skel_open"))
+		return NULL;
+
+	skel->bss->pid = getpid();
+
+	err = task_kfunc_success__load(skel);
+	if (!ASSERT_OK(err, "skel_load"))
+		goto cleanup;
+
+	return skel;
+
+cleanup:
+	task_kfunc_success__destroy(skel);
+	return NULL;
+}
+
+static void run_success_test(const char *prog_name)
+{
+	struct task_kfunc_success *skel;
+	int status;
+	pid_t child_pid;
+	struct bpf_program *prog;
+	struct bpf_link *link = NULL;
+
+	skel = open_load_task_kfunc_skel();
+	if (!ASSERT_OK_PTR(skel, "open_load_skel"))
+		return;
+
+	if (!ASSERT_OK(skel->bss->err, "pre_spawn_err"))
+		goto cleanup;
+
+	prog = bpf_object__find_program_by_name(skel->obj, prog_name);
+	if (!ASSERT_OK_PTR(prog, "bpf_object__find_program_by_name"))
+		goto cleanup;
+
+	link = bpf_program__attach(prog);
+	if (!ASSERT_OK_PTR(link, "attached_link"))
+		goto cleanup;
+
+	child_pid = fork();
+	if (!ASSERT_GT(child_pid, -1, "child_pid"))
+		goto cleanup;
+	if (child_pid == 0)
+		_exit(0);
+	waitpid(child_pid, &status, 0);
+
+	ASSERT_OK(skel->bss->err, "post_wait_err");
+
+cleanup:
+	bpf_link__destroy(link);
+	task_kfunc_success__destroy(skel);
+}
+
+static const char * const success_tests[] = {
+	"test_task_acquire_release_argument",
+	"test_task_acquire_release_current",
+	"test_task_acquire_leave_in_map",
+	"test_task_xchg_release",
+	"test_task_get_release",
+	"test_task_current_acquire_release",
+};
+
+static struct {
+	const char *prog_name;
+	const char *expected_err_msg;
+} failure_tests[] = {
+	{"task_kfunc_acquire_untrusted", "R1 must be referenced or trusted"},
+	{"task_kfunc_acquire_fp", "arg#0 pointer type STRUCT task_struct must point"},
+	{"task_kfunc_acquire_unsafe_kretprobe", "reg type unsupported for arg#0 function"},
+	{"task_kfunc_acquire_trusted_walked", "R1 must be referenced or trusted"},
+	{"task_kfunc_acquire_null", "arg#0 pointer type STRUCT task_struct must point"},
+	{"task_kfunc_acquire_unreleased", "Unreleased reference"},
+	{"task_kfunc_get_non_kptr_param", "arg#0 expected pointer to map value"},
+	{"task_kfunc_get_non_kptr_acquired", "arg#0 expected pointer to map value"},
+	{"task_kfunc_get_null", "arg#0 expected pointer to map value"},
+	{"task_kfunc_xchg_unreleased", "Unreleased reference"},
+	{"task_kfunc_get_unreleased", "Unreleased reference"},
+	{"task_kfunc_release_untrusted", "bpf_task_release arg#0 pointer had unexpected modifiers"},
+	{"task_kfunc_release_fp", "arg#0 pointer type STRUCT task_struct must point"},
+	{"task_kfunc_release_null", "bpf_task_release arg#0 pointer had unexpected modifiers"},
+	{"task_kfunc_release_unacquired", "release kernel function bpf_task_release expects"},
+};
+
+static void verify_fail(const char *prog_name, const char *expected_err_msg)
+{
+	LIBBPF_OPTS(bpf_object_open_opts, opts);
+	struct task_kfunc_failure *skel;
+	int err, i;
+
+	opts.kernel_log_buf = obj_log_buf;
+	opts.kernel_log_size = log_buf_sz;
+	opts.kernel_log_level = 1;
+
+	skel = task_kfunc_failure__open_opts(&opts);
+	if (!ASSERT_OK_PTR(skel, "task_kfunc_failure__open_opts"))
+		goto cleanup;
+
+	for (i = 0; i < ARRAY_SIZE(failure_tests); i++) {
+		struct bpf_program *prog;
+		const char *curr_name = failure_tests[i].prog_name;
+
+		prog = bpf_object__find_program_by_name(skel->obj, curr_name);
+		if (!ASSERT_OK_PTR(prog, "bpf_object__find_program_by_name"))
+			goto cleanup;
+
+		bpf_program__set_autoload(prog, !strcmp(curr_name, prog_name));
+	}
+
+	err = task_kfunc_failure__load(skel);
+	if (!ASSERT_ERR(err, "unexpected load success"))
+		goto cleanup;
+
+	if (!ASSERT_OK_PTR(strstr(obj_log_buf, expected_err_msg), "expected_err_msg")) {
+		fprintf(stderr, "Expected err_msg: %s\n", expected_err_msg);
+		fprintf(stderr, "Verifier output: %s\n", obj_log_buf);
+	}
+
+cleanup:
+	task_kfunc_failure__destroy(skel);
+}
+
+void test_task_kfunc(void)
+{
+	int i;
+
+	for (i = 0; i < ARRAY_SIZE(success_tests); i++) {
+		if (!test__start_subtest(success_tests[i]))
+			continue;
+
+		run_success_test(success_tests[i]);
+	}
+
+	for (i = 0; i < ARRAY_SIZE(failure_tests); i++) {
+		if (!test__start_subtest(failure_tests[i].prog_name))
+			continue;
+
+		verify_fail(failure_tests[i].prog_name, failure_tests[i].expected_err_msg);
+	}
+}
diff --git a/tools/testing/selftests/bpf/progs/task_kfunc_common.h b/tools/testing/selftests/bpf/progs/task_kfunc_common.h
new file mode 100644
index 000000000000..8489b952dc49
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/task_kfunc_common.h
@@ -0,0 +1,71 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/* Copyright (c) 2022 Meta Platforms, Inc. and affiliates. */
+
+#ifndef _TASK_KFUNC_COMMON_H
+#define _TASK_KFUNC_COMMON_H
+
+#include <errno.h>
+#include <vmlinux.h>
+#include <bpf/bpf_helpers.h>
+#include <bpf/bpf_tracing.h>
+
+struct __tasks_kfunc_map_value {
+	struct task_struct __kptr_ref * task;
+};
+
+struct hash_map {
+	__uint(type, BPF_MAP_TYPE_HASH);
+	__type(key, int);
+	__type(value, struct __tasks_kfunc_map_value);
+	__uint(max_entries, 1);
+} __tasks_kfunc_map SEC(".maps");
+
+struct task_struct *bpf_task_acquire(struct task_struct *p) __ksym;
+struct task_struct *bpf_task_kptr_get(struct task_struct **pp) __ksym;
+void bpf_task_release(struct task_struct *p) __ksym;
+
+static inline struct __tasks_kfunc_map_value *tasks_kfunc_map_value_lookup(struct task_struct *p)
+{
+	s32 pid;
+	long status;
+
+	status = bpf_probe_read_kernel(&pid, sizeof(pid), &p->pid);
+	if (status)
+		return NULL;
+
+	return bpf_map_lookup_elem(&__tasks_kfunc_map, &pid);
+}
+
+static inline int tasks_kfunc_map_insert(struct task_struct *p)
+{
+	struct __tasks_kfunc_map_value local, *v;
+	long status;
+	struct task_struct *acquired, *old;
+	s32 pid;
+
+	status = bpf_probe_read_kernel(&pid, sizeof(pid), &p->pid);
+	if (status)
+		return status;
+
+	local.task = NULL;
+	status = bpf_map_update_elem(&__tasks_kfunc_map, &pid, &local, BPF_NOEXIST);
+	if (status)
+		return status;
+
+	v = bpf_map_lookup_elem(&__tasks_kfunc_map, &pid);
+	if (!v) {
+		bpf_map_delete_elem(&__tasks_kfunc_map, &pid);
+		return status;
+	}
+
+	acquired = bpf_task_acquire(p);
+	old = bpf_kptr_xchg(&v->task, acquired);
+	if (old) {
+		bpf_task_release(old);
+		return -EEXIST;
+	}
+
+	return 0;
+}
+
+#endif /* _TASK_KFUNC_COMMON_H */
diff --git a/tools/testing/selftests/bpf/progs/task_kfunc_failure.c b/tools/testing/selftests/bpf/progs/task_kfunc_failure.c
new file mode 100644
index 000000000000..4c2e8a8f3544
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/task_kfunc_failure.c
@@ -0,0 +1,259 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2022 Meta Platforms, Inc. and affiliates. */
+
+#include <vmlinux.h>
+#include <bpf/bpf_tracing.h>
+#include <bpf/bpf_helpers.h>
+
+#include "task_kfunc_common.h"
+
+char _license[] SEC("license") = "GPL";
+
+/* Prototype for all of the program trace events below:
+ *
+ * TRACE_EVENT(task_newtask,
+ *         TP_PROTO(struct task_struct *p, u64 clone_flags)
+ */
+
+static struct __tasks_kfunc_map_value *insert_lookup_task(struct task_struct *task)
+{
+	int status;
+
+	status = tasks_kfunc_map_insert(task);
+	if (status)
+		return NULL;
+
+	return tasks_kfunc_map_value_lookup(task);
+}
+
+SEC("tp_btf/task_newtask")
+int BPF_PROG(task_kfunc_acquire_untrusted, struct task_struct *task, u64 clone_flags)
+{
+	struct task_struct *acquired;
+	struct __tasks_kfunc_map_value *v;
+
+	v = insert_lookup_task(task);
+	if (!v)
+		return 0;
+
+	/* Can't invoke bpf_task_acquire() on an untrusted pointer. */
+	acquired = bpf_task_acquire(v->task);
+	bpf_task_release(acquired);
+
+	return 0;
+}
+
+SEC("tp_btf/task_newtask")
+int BPF_PROG(task_kfunc_acquire_fp, struct task_struct *task, u64 clone_flags)
+{
+	struct task_struct *acquired, *stack_task = (struct task_struct *)&clone_flags;
+
+	/* Can't invoke bpf_task_acquire() on a random frame pointer. */
+	acquired = bpf_task_acquire((struct task_struct *)&stack_task);
+	bpf_task_release(acquired);
+
+	return 0;
+}
+
+SEC("kretprobe/free_task")
+int BPF_PROG(task_kfunc_acquire_unsafe_kretprobe, struct task_struct *task, u64 clone_flags)
+{
+	struct task_struct *acquired;
+
+	/* Can't invoke bpf_task_acquire() on an untrusted task in a kretprobe. */
+	acquired = bpf_task_acquire(task);
+	bpf_task_release(acquired);
+
+	return 0;
+}
+
+SEC("tp_btf/task_newtask")
+int BPF_PROG(task_kfunc_acquire_trusted_walked, struct task_struct *task, u64 clone_flags)
+{
+	struct task_struct *acquired;
+
+	/* Can't invoke bpf_task_acquire() on a pointer obtained by walking a trusted struct. */
+	acquired = bpf_task_acquire(task->last_wakee);
+	bpf_task_release(acquired);
+
+	return 0;
+}
+
+
+SEC("tp_btf/task_newtask")
+int BPF_PROG(task_kfunc_acquire_null, struct task_struct *task, u64 clone_flags)
+{
+	struct task_struct *acquired;
+
+	/* Can't invoke bpf_task_acquire() on a NULL pointer. */
+	acquired = bpf_task_acquire(NULL);
+	if (!acquired)
+		return 0;
+	bpf_task_release(acquired);
+
+	return 0;
+}
+
+SEC("tp_btf/task_newtask")
+int BPF_PROG(task_kfunc_acquire_unreleased, struct task_struct *task, u64 clone_flags)
+{
+	struct task_struct *acquired;
+
+	acquired = bpf_task_acquire(task);
+
+	/* Acquired task is never released. */
+
+	return 0;
+}
+
+SEC("tp_btf/task_newtask")
+int BPF_PROG(task_kfunc_get_non_kptr_param, struct task_struct *task, u64 clone_flags)
+{
+	struct task_struct *kptr;
+
+	/* Cannot use bpf_task_kptr_get() on a non-kptr, even on a valid task. */
+	kptr = bpf_task_kptr_get(&task);
+	if (!kptr)
+		return 0;
+
+	bpf_task_release(kptr);
+
+	return 0;
+}
+
+SEC("tp_btf/task_newtask")
+int BPF_PROG(task_kfunc_get_non_kptr_acquired, struct task_struct *task, u64 clone_flags)
+{
+	struct task_struct *kptr, *acquired;
+
+	acquired = bpf_task_acquire(task);
+
+	/* Cannot use bpf_task_kptr_get() on a non-kptr, even if it was acquired. */
+	kptr = bpf_task_kptr_get(&acquired);
+	bpf_task_release(acquired);
+	if (!kptr)
+		return 0;
+
+	bpf_task_release(kptr);
+
+	return 0;
+}
+
+SEC("tp_btf/task_newtask")
+int BPF_PROG(task_kfunc_get_null, struct task_struct *task, u64 clone_flags)
+{
+	struct task_struct *kptr;
+
+	/* Cannot use bpf_task_kptr_get() on a NULL pointer. */
+	kptr = bpf_task_kptr_get(NULL);
+	if (!kptr)
+		return 0;
+
+	bpf_task_release(kptr);
+
+	return 0;
+}
+
+SEC("tp_btf/task_newtask")
+int BPF_PROG(task_kfunc_xchg_unreleased, struct task_struct *task, u64 clone_flags)
+{
+	struct task_struct *kptr;
+	struct __tasks_kfunc_map_value *v;
+
+	v = insert_lookup_task(task);
+	if (!v)
+		return 0;
+
+	kptr = bpf_kptr_xchg(&v->task, NULL);
+	if (!kptr)
+		return 0;
+
+	/* Kptr retrieved from map is never released. */
+
+	return 0;
+}
+
+SEC("tp_btf/task_newtask")
+int BPF_PROG(task_kfunc_get_unreleased, struct task_struct *task, u64 clone_flags)
+{
+	struct task_struct *kptr;
+	struct __tasks_kfunc_map_value *v;
+
+	v = insert_lookup_task(task);
+	if (!v)
+		return 0;
+
+	kptr = bpf_task_kptr_get(&v->task);
+	if (!kptr)
+		return 0;
+
+	/* Kptr acquired above is never released. */
+
+	return 0;
+}
+
+SEC("tp_btf/task_newtask")
+int BPF_PROG(task_kfunc_release_untrusted, struct task_struct *task, u64 clone_flags)
+{
+	struct __tasks_kfunc_map_value *v;
+
+	v = insert_lookup_task(task);
+	if (!v)
+		return 0;
+
+	/* Can't invoke bpf_task_release() on an untrusted pointer. */
+	bpf_task_release(v->task);
+
+	return 0;
+}
+
+SEC("tp_btf/task_newtask")
+int BPF_PROG(task_kfunc_release_fp, struct task_struct *task, u64 clone_flags)
+{
+	struct task_struct *acquired = (struct task_struct *)&clone_flags;
+
+	/* Cannot release random frame pointer. */
+	bpf_task_release(acquired);
+
+	return 0;
+}
+
+SEC("tp_btf/task_newtask")
+int BPF_PROG(task_kfunc_release_null, struct task_struct *task, u64 clone_flags)
+{
+	struct __tasks_kfunc_map_value local, *v;
+	long status;
+	struct task_struct *acquired, *old;
+	s32 pid;
+
+	status = bpf_probe_read_kernel(&pid, sizeof(pid), &task->pid);
+	if (status)
+		return 0;
+
+	local.task = NULL;
+	status = bpf_map_update_elem(&__tasks_kfunc_map, &pid, &local, BPF_NOEXIST);
+	if (status)
+		return status;
+
+	v = bpf_map_lookup_elem(&__tasks_kfunc_map, &pid);
+	if (!v)
+		return status;
+
+	acquired = bpf_task_acquire(task);
+
+	old = bpf_kptr_xchg(&v->task, acquired);
+
+	/* old cannot be passed to bpf_task_release() without a NULL check. */
+	bpf_task_release(old);
+
+	return 0;
+}
+
+SEC("tp_btf/task_newtask")
+int BPF_PROG(task_kfunc_release_unacquired, struct task_struct *task, u64 clone_flags)
+{
+	/* Cannot release trusted task pointer which was not acquired. */
+	bpf_task_release(task);
+
+	return 0;
+}
diff --git a/tools/testing/selftests/bpf/progs/task_kfunc_success.c b/tools/testing/selftests/bpf/progs/task_kfunc_success.c
new file mode 100644
index 000000000000..be4534b5ba2e
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/task_kfunc_success.c
@@ -0,0 +1,149 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2022 Meta Platforms, Inc. and affiliates. */
+
+#include <vmlinux.h>
+#include <bpf/bpf_tracing.h>
+#include <bpf/bpf_helpers.h>
+
+#include "task_kfunc_common.h"
+
+char _license[] SEC("license") = "GPL";
+
+int err, pid;
+
+/* Prototype for all of the program trace events below:
+ *
+ * TRACE_EVENT(task_newtask,
+ *         TP_PROTO(struct task_struct *p, u64 clone_flags)
+ */
+
+static bool is_test_kfunc_task(void)
+{
+	int cur_pid = bpf_get_current_pid_tgid() >> 32;
+
+	return pid == cur_pid;
+}
+
+static int test_acquire_release(struct task_struct *task)
+{
+	struct task_struct *acquired;
+
+	acquired = bpf_task_acquire(task);
+	bpf_task_release(acquired);
+
+	return 0;
+}
+
+SEC("tp_btf/task_newtask")
+int BPF_PROG(test_task_acquire_release_argument, struct task_struct *task, u64 clone_flags)
+{
+	if (!is_test_kfunc_task())
+		return 0;
+
+	return test_acquire_release(task);
+}
+
+SEC("tp_btf/task_newtask")
+int BPF_PROG(test_task_acquire_release_current, struct task_struct *task, u64 clone_flags)
+{
+	if (!is_test_kfunc_task())
+		return 0;
+
+	return test_acquire_release(bpf_get_current_task_btf());
+}
+
+SEC("tp_btf/task_newtask")
+int BPF_PROG(test_task_acquire_leave_in_map, struct task_struct *task, u64 clone_flags)
+{
+	long status;
+
+	if (!is_test_kfunc_task())
+		return 0;
+
+	status = tasks_kfunc_map_insert(task);
+	if (status)
+		err = 1;
+
+	return 0;
+}
+
+SEC("tp_btf/task_newtask")
+int BPF_PROG(test_task_xchg_release, struct task_struct *task, u64 clone_flags)
+{
+	struct task_struct *kptr;
+	struct __tasks_kfunc_map_value *v;
+	long status;
+
+	if (!is_test_kfunc_task())
+		return 0;
+
+	status = tasks_kfunc_map_insert(task);
+	if (status) {
+		err = 1;
+		return 0;
+	}
+
+	v = tasks_kfunc_map_value_lookup(task);
+	if (!v) {
+		err = 2;
+		return 0;
+	}
+
+	kptr = bpf_kptr_xchg(&v->task, NULL);
+	if (!kptr) {
+		err = 3;
+		return 0;
+	}
+
+	bpf_task_release(kptr);
+
+	return 0;
+}
+
+SEC("tp_btf/task_newtask")
+int BPF_PROG(test_task_get_release, struct task_struct *task, u64 clone_flags)
+{
+	struct task_struct *kptr;
+	struct __tasks_kfunc_map_value *v;
+	long status;
+
+	if (!is_test_kfunc_task())
+		return 0;
+
+	status = tasks_kfunc_map_insert(task);
+	if (status) {
+		err = 1;
+		return 0;
+	}
+
+	v = tasks_kfunc_map_value_lookup(task);
+	if (!v) {
+		err = 2;
+		return 0;
+	}
+
+	kptr = bpf_task_kptr_get(&v->task);
+	if (!kptr) {
+		err = 3;
+		return 0;
+	}
+
+	bpf_task_release(kptr);
+
+	return 0;
+}
+
+SEC("tp_btf/task_newtask")
+int BPF_PROG(test_task_current_acquire_release, struct task_struct *task, u64 clone_flags)
+{
+	struct task_struct *current, *acquired;
+
+	if (!is_test_kfunc_task())
+		return 0;
+
+	current = bpf_get_current_task_btf();
+	acquired = bpf_task_acquire(current);
+	bpf_task_release(acquired);
+
+	return 0;
+}
-- 
2.38.1



* RE: [PATCH bpf-next v7 0/3] Support storing struct task_struct objects as kptrs
  2022-11-17  3:23 [PATCH bpf-next v7 0/3] Support storing struct task_struct objects as kptrs David Vernet
                   ` (2 preceding siblings ...)
  2022-11-17  3:24 ` [PATCH bpf-next v7 3/3] bpf/selftests: Add selftests for new task kfuncs David Vernet
@ 2022-11-17 21:03 ` John Fastabend
  2022-11-17 21:54   ` David Vernet
  3 siblings, 1 reply; 22+ messages in thread
From: John Fastabend @ 2022-11-17 21:03 UTC (permalink / raw)
  To: David Vernet, bpf
  Cc: ast, andrii, daniel, martin.lau, memxor, yhs, song, sdf,
	john.fastabend, kpsingh, jolsa, haoluo, tj, kernel-team,
	linux-kernel

David Vernet wrote:
> Now that BPF supports adding new kernel functions with kfuncs, and
> storing kernel objects in maps with kptrs, we can add a set of kfuncs
> which allow struct task_struct objects to be stored in maps as
> referenced kptrs.
> 
> The possible use cases for doing this are plentiful.  During tracing,
> for example, it would be useful to be able to collect some tasks that
> performed a certain operation, and then periodically summarize who they
> are, which cgroup they're in, how much CPU time they've utilized, etc.
> Doing this now would require storing the tasks' pids along with some
> relevant data to be exported to user space, and later associating the
> pids to tasks in other event handlers where the data is recorded.
> Another useful by-product of this is that it allows a program to pin a
> task in a BPF program, and by proxy therefore also e.g. pin its task
> local storage.

Sorry, this wasn't obvious to me (late to the party, so apologies if
it was described in some other v*). Can we say something about the
life cycle of these acquired task_structs? Because they increment the
ref cnt on the task struct, they have the potential to impact the
system. I know we've had at least a few bugs in our task struct
tracking that have led to leaked references. In our case we didn't
pin the kernel object, so the leak was just BPF memory and user space
memory; still sort of bad, because we would hit memory limits and get
OOM'd. Leaking kernel task structs is worse though.

Quick question: if you put an acquired task struct in a map, what
happens if the user side deletes the entry? Presumably this causes the
release to happen and the task_struct is good to go. Did I miss the
logic? I was thinking you would have something in bpf_map_free_kptrs
and a type callback to release() the refcnt?

> 
> In order to support this, we'll need to expand KF_TRUSTED_ARGS to
> support receiving trusted, non-refcounted pointers. It currently only
> supports either PTR_TO_CTX pointers, or refcounted pointers. What this
> means in terms of the implementation is that btf_check_func_arg_match()
> would have to add another condition to its logic for checking if
> a ptr needs a refcount to also require that the pointer has at least one
> type modifier, such as a new modifier we're adding called PTR_TRUSTED
> (described below). Note that PTR_UNTRUSTED is insufficient for this
> purpose, as it does not cover all of the possible pointers we need to
> watch out for, though. For example, a pointer obtained from walking a
> struct is considered "trusted" (or at least, not PTR_UNTRUSTED). To
> account for this and enable us to expand KF_TRUSTED_ARGS to include
> allow-listed arguments such as those passed by the kernel to tracepoints
> and struct_ops callbacks, this patch set also introduces a new
> PTR_TRUSTED type flag modifier which records if a pointer was obtained
> passed from the kernel in a trusted context.
> 
> In closing, this patch set:
> 
> 1. Adds the new PTR_TRUSTED register type modifier flag, and updates the
>    verifier and existing selftests accordingly.
> 2. Expands KF_TRUSTED_ARGS to also include trusted pointers that were
>    not obtained from walking structs. 
> 3. Adds a new set of kfuncs that allows struct task_struct* objects to be
>    used as kptrs.
> 4. Adds a new selftest suite to validate these new task kfuncs.
> 
> --
> Changelog:
> v6 -> v7:
> - Removed the PTR_WALKED type modifier, and instead define a new
>   PTR_TRUSTED type modifier which is set on registers containing
>   pointers passed from trusted contexts (i.e. as tracepoint or
>   struct_ops callback args) (Alexei)
> - Remove the new KF_OWNED_ARGS kfunc flag. This can be accomplished
>   by defining a new type that wraps an existing type, such as with
>   struct nf_conn___init (Alexei)
> - Add a test_task_current_acquire_release testcase which verifies we can
>   acquire a task struct returned from bpf_get_current_task_btf().
> - Make bpf_task_acquire() no longer return NULL, as it can no longer be
>   called with a NULL task.
> - Removed unnecessary is_test_kfunc_task() checks from failure
>   testcases.
> 
> v5 -> v6:
> - Add a new KF_OWNED_ARGS kfunc flag which may be used by kfuncs to
>   express that they require trusted, refcounted args (Kumar)
> - Rename PTR_NESTED -> PTR_WALKED in the verifier (Kumar)
> - Convert reg_type_str() prefixes to use snprintf() instead of strncpy()
>   (Kumar)
> - Add PTR_TO_BTF_ID | PTR_WALKED to missing struct btf_reg_type
>   instances -- specifically btf_id_sock_common_types, and
>   percpu_btf_ptr_types.
> - Add a missing PTR_TO_BTF_ID | PTR_WALKED switch case entry in
>   check_func_arg_reg_off(), which is required when validating helper
>   calls (Kumar)
> - Update reg_type_mismatch_ok() to check base types for the registers
>   (i.e. to accommodate type modifiers). Additionally, add a lengthy
>   comment that explains why this is being done (Kumar)
> - Update convert_ctx_accesses() to also issue probe reads for
>   PTR_TO_BTF_ID | PTR_WALKED (Kumar)
> - Update selftests to expect new prefix reg type strings.
> - Rename task_kfunc_acquire_trusted_nested testcase to
>   task_kfunc_acquire_trusted_walked, and fix a comment (Kumar)
> - Remove KF_TRUSTED_ARGS from bpf_task_release(), which already includes
>   KF_RELEASE (Kumar)
> - Add bpf-next in patch subject lines (Kumar)
> 
> v4 -> v5:
> - Fix an improperly formatted patch title.
> 
> v3 -> v4:
> - Remove an unnecessary check from my repository that I forgot to remove
>   after debugging something.
> 
> v2 -> v3:
> - Make bpf_task_acquire() check for NULL, and include KF_RET_NULL
>   (Martin)
> - Include new PTR_NESTED register modifier type flag which specifies
>   whether a pointer was obtained from walking a struct. Use this to
>   expand the meaning of KF_TRUSTED_ARGS to include trusted pointers that
>   were passed from the kernel (Kumar)
> - Add more selftests to the task_kfunc selftest suite which verify that
>   you cannot pass a walked pointer to bpf_task_acquire().
> - Update bpf_task_acquire() to also specify KF_TRUSTED_ARGS.
> 
> v1 -> v2:
> - Rename tracing_btf_ids to generic_kfunc_btf_ids, and add the new
>   kfuncs to that list instead of making a separate btf id list (Alexei).
> - Don't run the new selftest suite on s390x, which doesn't appear to
>   support invoking kfuncs.
> - Add a missing __diag_ignore block for -Wmissing-prototypes
>   (lkp@intel.com).
> - Fix formatting on some of the SPDX-License-Identifier tags.
> - Clarified the function header comment a bit on bpf_task_kptr_get().
> 
> David Vernet (3):
>   bpf: Allow trusted pointers to be passed to KF_TRUSTED_ARGS kfuncs
>   bpf: Add kfuncs for storing struct task_struct * as a kptr
>   bpf/selftests: Add selftests for new task kfuncs
> 
>  Documentation/bpf/kfuncs.rst                  |  28 +-
>  include/linux/bpf.h                           |  25 ++
>  include/linux/btf.h                           |  66 ++--
>  kernel/bpf/btf.c                              |  44 ++-
>  kernel/bpf/helpers.c                          |  83 ++++-
>  kernel/bpf/verifier.c                         |  45 ++-
>  kernel/trace/bpf_trace.c                      |   2 +-
>  net/ipv4/bpf_tcp_ca.c                         |   4 +-
>  tools/testing/selftests/bpf/DENYLIST.s390x    |   1 +
>  .../selftests/bpf/prog_tests/task_kfunc.c     | 160 +++++++++
>  .../selftests/bpf/progs/task_kfunc_common.h   |  81 +++++
>  .../selftests/bpf/progs/task_kfunc_failure.c  | 304 ++++++++++++++++++
>  .../selftests/bpf/progs/task_kfunc_success.c  | 127 ++++++++
>  tools/testing/selftests/bpf/verifier/calls.c  |   4 +-
>  .../selftests/bpf/verifier/ref_tracking.c     |   4 +-
>  15 files changed, 906 insertions(+), 72 deletions(-)
>  create mode 100644 tools/testing/selftests/bpf/prog_tests/task_kfunc.c
>  create mode 100644 tools/testing/selftests/bpf/progs/task_kfunc_common.h
>  create mode 100644 tools/testing/selftests/bpf/progs/task_kfunc_failure.c
>  create mode 100644 tools/testing/selftests/bpf/progs/task_kfunc_success.c
> 
> -- 
> 2.38.1



^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH bpf-next v7 0/3] Support storing struct task_struct objects as kptrs
  2022-11-17 21:03 ` [PATCH bpf-next v7 0/3] Support storing struct task_struct objects as kptrs John Fastabend
@ 2022-11-17 21:54   ` David Vernet
  2022-11-17 22:36     ` John Fastabend
  0 siblings, 1 reply; 22+ messages in thread
From: David Vernet @ 2022-11-17 21:54 UTC (permalink / raw)
  To: John Fastabend
  Cc: bpf, ast, andrii, daniel, martin.lau, memxor, yhs, song, sdf,
	kpsingh, jolsa, haoluo, tj, kernel-team, linux-kernel

On Thu, Nov 17, 2022 at 01:03:45PM -0800, John Fastabend wrote:
> David Vernet wrote:
> > Now that BPF supports adding new kernel functions with kfuncs, and
> > storing kernel objects in maps with kptrs, we can add a set of kfuncs
> > which allow struct task_struct objects to be stored in maps as
> > referenced kptrs.
> > 
> > The possible use cases for doing this are plentiful.  During tracing,
> > for example, it would be useful to be able to collect some tasks that
> > performed a certain operation, and then periodically summarize who they
> > are, which cgroup they're in, how much CPU time they've utilized, etc.
> > Doing this now would require storing the tasks' pids along with some
> > relevant data to be exported to user space, and later associating the
> > pids to tasks in other event handlers where the data is recorded.
> > Another useful by-product of this is that it allows a program to pin a
> > task in a BPF program, and by proxy therefore also e.g. pin its task
> > local storage.
> 
> Sorry, this wasn't obvious to me (late to the party, so apologies if it
> was described in some other v*). Can we say something about the life
> cycle of these acquired task_structs? Because they increment the ref
> cnt on the task struct, they have the potential to impact the system.

We should probably add an entire docs page which describes how kptrs
work, and I am happy to do that (ideally in a follow-on patch set if
that's OK with you). In general I think it would be useful to include
docs for any general-purpose kfuncs like the ones proposed in this set.

In regards to your specific question about the task lifecycle, nothing
being proposed in this patch set differs from how kptr lifecycles work
in general. The idea is that the BPF program:

1. Gets a "kptr_ref" kptr from an "acquire" kfunc.
2. Stores it in a map with bpf_kptr_xchg().

The program can then either manually extract it from the map later
(again with bpf_kptr_xchg()) and release it, or, if the reference is
never removed from the map, let it be released automatically when the
map is destroyed. See [0] and [1] for a bit more information.

[0]: https://docs.kernel.org/bpf/kfuncs.html?highlight=kptr#kf-acquire-flag
[1]: https://lwn.net/Articles/900749/
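
As a minimal sketch (reusing the __tasks_kfunc_map and kfunc
declarations from the selftests in patch 3; the program name is made
up for illustration), the lifecycle looks roughly like this:

SEC("tp_btf/task_newtask")
int BPF_PROG(pin_task_sketch, struct task_struct *task, u64 clone_flags)
{
	struct __tasks_kfunc_map_value *v;
	struct task_struct *acquired, *old;
	s32 pid = task->pid;

	v = bpf_map_lookup_elem(&__tasks_kfunc_map, &pid);
	if (!v)
		return 0;

	/* 1. Take a reference on the task with the acquire kfunc. */
	acquired = bpf_task_acquire(task);

	/* 2. Store it in the map. bpf_kptr_xchg() hands back whatever
	 * kptr was previously stored in the field, which we now own
	 * and must release.
	 */
	old = bpf_kptr_xchg(&v->task, acquired);
	if (old)
		bpf_task_release(old);

	return 0;
}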

> I know that we, at least, have had a few bugs in our task struct
> tracking that have led to leaked references. In our case we didn't pin
> the kernel object, so the leak was just BPF memory and user space
> memory; still sort of bad, because we would hit memory limits and get
> OOM'd. Leaking kernel task structs is worse though.

I don't think we need to worry about leaks. The verifier should fail to
load any program that doesn't properly release a kptr, and if it fails
to identify programs that improperly hold refcounts, that's a bug that
needs to be fixed. Similarly, if any map implementation (described
below) fails to properly free references at the appropriate time (when
an element or the map itself is destroyed), those are just bugs that
need to be fixed.

I think the relevant tradeoff here is really the possible side effects
of keeping a task pinned and avoiding it being reaped. I agree that's an
important consideration, but I think that would arguably apply to any
kptr (modulo the size of the object being pinned, which is certainly
relevant as well). We already have kptrs for e.g. struct nf_conn [2].
Granted, struct task_struct is significantly larger, but bpf_kptr_xchg()
is only enabled for privileged programs, so it seems like a reasonable
operation to allow.

[2]: https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git/tree/net/netfilter/nf_conntrack_bpf.c#n253

> Quick question: if you put an acquired task struct in a map, what
> happens when the user side deletes the entry? Presumably this causes
> the release to happen and the task_struct is good to go. Did I miss
> the logic? I was thinking you would have something in bpf_map_free_kptrs
> and a type callback to release() the refcnt?

Someone else can chime in here to correct me if I'm wrong, but AFAIU
this is handled by the map implementations calling out to
bpf_obj_free_fields() to invoke the kptr destructor when the element is
destroyed. See [3] and [4] for examples of where they're called from the
arraymap and hashmap logic respectively. This is how the destructors are
similarly invoked when the maps are destroyed.

[3]: https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git/tree/kernel/bpf/arraymap.c#n431
[4]: https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git/tree/kernel/bpf/hashtab.c#n764
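
Put differently, a simplified sketch of the element-free path (not the
exact kernel code; see [3] and [4] above for the real call sites, and
note that element_value just stands in for the pointer to the
element's value area):

	/* Invoked when an element is deleted or the whole map is freed.
	 * This walks the element's recorded kptr fields and calls the
	 * destructor registered for each kptr's type -- for a struct
	 * task_struct kptr, that's bpf_task_release().
	 */
	bpf_obj_free_fields(map->record, element_value);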

[...]

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH bpf-next v7 0/3] Support storing struct task_struct objects as kptrs
  2022-11-17 21:54   ` David Vernet
@ 2022-11-17 22:36     ` John Fastabend
  2022-11-18  1:41       ` David Vernet
  0 siblings, 1 reply; 22+ messages in thread
From: John Fastabend @ 2022-11-17 22:36 UTC (permalink / raw)
  To: David Vernet, John Fastabend
  Cc: bpf, ast, andrii, daniel, martin.lau, memxor, yhs, song, sdf,
	kpsingh, jolsa, haoluo, tj, kernel-team, linux-kernel

David Vernet wrote:
> On Thu, Nov 17, 2022 at 01:03:45PM -0800, John Fastabend wrote:
> > David Vernet wrote:
> > > Now that BPF supports adding new kernel functions with kfuncs, and
> > > storing kernel objects in maps with kptrs, we can add a set of kfuncs
> > > which allow struct task_struct objects to be stored in maps as
> > > referenced kptrs.
> > > 
> > > The possible use cases for doing this are plentiful.  During tracing,
> > > for example, it would be useful to be able to collect some tasks that
> > > performed a certain operation, and then periodically summarize who they
> > > are, which cgroup they're in, how much CPU time they've utilized, etc.
> > > Doing this now would require storing the tasks' pids along with some
> > > relevant data to be exported to user space, and later associating the
> > > pids to tasks in other event handlers where the data is recorded.
> > > Another useful by-product of this is that it allows a program to pin a
> > > task in a BPF program, and by proxy therefore also e.g. pin its task
> > > local storage.
> > 
> > Sorry, this wasn't obvious to me (late to the party, so apologies if it
> > was described in some other v*). Can we say something about the life
> > cycle of these acquired task_structs? Because they increment the ref
> > cnt on the task struct, they have the potential to impact the system.
> 
> We should probably add an entire docs page which describes how kptrs
> work, and I am happy to do that (ideally in a follow-on patch set if
> that's OK with you). In general I think it would be useful to include
> docs for any general-purpose kfuncs like the ones proposed in this set.

Sure, I wouldn't require that for your series though fwiw.

> 
> In regards to your specific question about the task lifecycle, nothing
> being proposed in this patch set differs from how kptr lifecycles work
> in general. The idea is that the BPF program:
> 
> 1. Gets a "kptr_ref" kptr from an "acquire" kfunc.
> 2. Stores it in a map with bpf_kptr_xchg().
> 
> The program can then either manually extract it from the map later
> (again with bpf_kptr_xchg()) and release it, or, if the reference is
> never removed from the map, let it be released automatically when the
> map is destroyed. See [0] and [1] for a bit more information.

Yep, as long as the ref is decremented on map destroy and elem delete,
all good.

> 
> [0]: https://docs.kernel.org/bpf/kfuncs.html?highlight=kptr#kf-acquire-flag
> [1]: https://lwn.net/Articles/900749/
> 
> > I know that we, at least, have had a few bugs in our task struct
> > tracking that have led to leaked references. In our case we didn't pin
> > the kernel object, so the leak was just BPF memory and user space
> > memory; still sort of bad, because we would hit memory limits and get
> > OOM'd. Leaking kernel task structs is worse though.
> 
> I don't think we need to worry about leaks. The verifier should fail to
> load any program that doesn't properly release a kptr, and if it fails
> to identify programs that improperly hold refcounts, that's a bug that
> needs to be fixed. Similarly, if any map implementation (described
> below) fails to properly free references at the appropriate time (when
> an element or the map itself is destroyed), those are just bugs that
> need to be fixed.
> 
> I think the relevant tradeoff here is really the possible side effects
> of keeping a task pinned and avoiding it being reaped. I agree that's an
> important consideration, but I think that would arguably apply to any
> kptr (modulo the size of the object being pinned, which is certainly
> relevant as well). We already have kptrs for e.g. struct nf_conn [2].
> Granted, struct task_struct is significantly larger, but bpf_kptr_xchg()
> is only enabled for privileged programs, so it seems like a reasonable
> operation to allow.

No, I'm not arguing that it shouldn't be possible; I just didn't see
the release hook.

> 
> [2]: https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git/tree/net/netfilter/nf_conntrack_bpf.c#n253
> 
> > Quick question: if you put an acquired task struct in a map, what
> > happens when the user side deletes the entry? Presumably this causes
> > the release to happen and the task_struct is good to go. Did I miss
> > the logic? I was thinking you would have something in bpf_map_free_kptrs
> > and a type callback to release() the refcnt?
> 
> Someone else can chime in here to correct me if I'm wrong, but AFAIU
> this is handled by the map implementations calling out to
> bpf_obj_free_fields() to invoke the kptr destructor when the element is
> destroyed. See [3] and [4] for examples of where they're called from the
> arraymap and hashmap logic respectively. This is how the destructors are
> similarly invoked when the maps are destroyed.

Yep, I found where the dtor() gets populated in btf.c; apparently I
needed to re-pull my local tree because I had missed it. Thanks for
the detailed response.

And the last thing I was checking: because KF_SLEEPABLE is not set,
this should be blocked from running in sleepable progs, which would
break the call_rcu in the destructor. Maybe a small nit, and not sure
it's worth it, but it might be nice to annotate the helper description
with a note, "will not work on sleepable progs" or something to that
effect.

Thanks.

> 
> [3]: https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git/tree/kernel/bpf/arraymap.c#n431
> [4]: https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git/tree/kernel/bpf/hashtab.c#n764
> 
> [...]



^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH bpf-next v7 0/3] Support storing struct task_struct objects as kptrs
  2022-11-17 22:36     ` John Fastabend
@ 2022-11-18  1:41       ` David Vernet
  2022-11-18  6:04         ` John Fastabend
  0 siblings, 1 reply; 22+ messages in thread
From: David Vernet @ 2022-11-18  1:41 UTC (permalink / raw)
  To: John Fastabend
  Cc: bpf, ast, andrii, daniel, martin.lau, memxor, yhs, song, sdf,
	kpsingh, jolsa, haoluo, tj, kernel-team, linux-kernel

On Thu, Nov 17, 2022 at 02:36:50PM -0800, John Fastabend wrote:
> David Vernet wrote:
> > On Thu, Nov 17, 2022 at 01:03:45PM -0800, John Fastabend wrote:
> > > David Vernet wrote:
> > > > Now that BPF supports adding new kernel functions with kfuncs, and
> > > > storing kernel objects in maps with kptrs, we can add a set of kfuncs
> > > > which allow struct task_struct objects to be stored in maps as
> > > > referenced kptrs.
> > > > 
> > > > The possible use cases for doing this are plentiful.  During tracing,
> > > > for example, it would be useful to be able to collect some tasks that
> > > > performed a certain operation, and then periodically summarize who they
> > > > are, which cgroup they're in, how much CPU time they've utilized, etc.
> > > > Doing this now would require storing the tasks' pids along with some
> > > > relevant data to be exported to user space, and later associating the
> > > > pids to tasks in other event handlers where the data is recorded.
> > > > Another useful by-product of this is that it allows a program to pin a
> > > > task in a BPF program, and by proxy therefore also e.g. pin its task
> > > > local storage.
> > > 
> > > Sorry, this wasn't obvious to me (late to the party, so apologies if it
> > > was described in some other v*). Can we say something about the life
> > > cycle of these acquired task_structs? Because they increment the ref
> > > cnt on the task struct, they have the potential to impact the system.
> > 
> > We should probably add an entire docs page which describes how kptrs
> > work, and I am happy to do that (ideally in a follow-on patch set if
> > that's OK with you). In general I think it would be useful to include
> > docs for any general-purpose kfuncs like the ones proposed in this set.
> 
> Sure, I wouldn't require that for your series though fwiw.

Sounds good to me

[...]

> > > Quick question: if you put an acquired task struct in a map, what
> > > happens when the user side deletes the entry? Presumably this causes
> > > the release to happen and the task_struct is good to go. Did I miss
> > > the logic? I was thinking you would have something in bpf_map_free_kptrs
> > > and a type callback to release() the refcnt?
> > 
> > Someone else can chime in here to correct me if I'm wrong, but AFAIU
> > this is handled by the map implementations calling out to
> > bpf_obj_free_fields() to invoke the kptr destructor when the element is
> > destroyed. See [3] and [4] for examples of where they're called from the
> > arraymap and hashmap logic respectively. This is how the destructors are
> > similarly invoked when the maps are destroyed.
> 
> Yep, I found where the dtor() gets populated in btf.c; apparently I
> needed to re-pull my local tree because I had missed it. Thanks for
> the detailed response.
> 
> And the last thing I was checking: because KF_SLEEPABLE is not set,
> this should be blocked from running in sleepable progs, which would
> break the call_rcu in the destructor. Maybe a small nit, and not sure
> it's worth it, but it might be nice to annotate the helper description
> with a note, "will not work on sleepable progs" or something to that
> effect.

KF_SLEEPABLE is used to indicate whether the kfunc _itself_ may sleep,
not whether the calling program can be sleepable. call_rcu() doesn't
block, so no need to mark the kfunc as KF_SLEEPABLE. The key is that if
a kfunc is sleepable, non-sleepable programs are not able to call it
(and this is enforced in the verifier).
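
As an illustrative sketch (bpf_example_sleep() is a made-up kfunc, not
something in this series):

	/* KF_SLEEPABLE marks the kfunc itself as possibly sleeping, so
	 * the verifier rejects calls to it from non-sleepable programs.
	 */
	BTF_ID_FLAGS(func, bpf_example_sleep, KF_SLEEPABLE)

	/* The task kfuncs in this series don't block (call_rcu() in the
	 * destructor path doesn't block either), so they aren't marked
	 * KF_SLEEPABLE, and non-sleepable programs can call them.
	 */
	BTF_ID_FLAGS(func, bpf_task_acquire, KF_ACQUIRE | KF_TRUSTED_ARGS)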

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH bpf-next v7 3/3] bpf/selftests: Add selftests for new task kfuncs
  2022-11-17  3:24 ` [PATCH bpf-next v7 3/3] bpf/selftests: Add selftests for new task kfuncs David Vernet
@ 2022-11-18  2:21   ` Alexei Starovoitov
  2022-11-18 14:49     ` David Vernet
  0 siblings, 1 reply; 22+ messages in thread
From: Alexei Starovoitov @ 2022-11-18  2:21 UTC (permalink / raw)
  To: David Vernet
  Cc: bpf, ast, andrii, daniel, martin.lau, memxor, yhs, song, sdf,
	john.fastabend, kpsingh, jolsa, haoluo, tj, kernel-team,
	linux-kernel

On Wed, Nov 16, 2022 at 09:24:02PM -0600, David Vernet wrote:
> A previous change added a series of kfuncs for storing struct
> task_struct objects as referenced kptrs. This patch adds a new
> task_kfunc test suite for validating their expected behavior.
> 
> Signed-off-by: David Vernet <void@manifault.com>
> ---
>  tools/testing/selftests/bpf/DENYLIST.s390x    |   1 +
>  .../selftests/bpf/prog_tests/task_kfunc.c     | 159 +++++++++++
>  .../selftests/bpf/progs/task_kfunc_common.h   |  71 +++++
>  .../selftests/bpf/progs/task_kfunc_failure.c  | 259 ++++++++++++++++++
>  .../selftests/bpf/progs/task_kfunc_success.c  | 149 ++++++++++
>  5 files changed, 639 insertions(+)
>  create mode 100644 tools/testing/selftests/bpf/prog_tests/task_kfunc.c
>  create mode 100644 tools/testing/selftests/bpf/progs/task_kfunc_common.h
>  create mode 100644 tools/testing/selftests/bpf/progs/task_kfunc_failure.c
>  create mode 100644 tools/testing/selftests/bpf/progs/task_kfunc_success.c
> 
> diff --git a/tools/testing/selftests/bpf/DENYLIST.s390x b/tools/testing/selftests/bpf/DENYLIST.s390x
> index be4e3d47ea3e..97e522828d6c 100644
> --- a/tools/testing/selftests/bpf/DENYLIST.s390x
> +++ b/tools/testing/selftests/bpf/DENYLIST.s390x
> @@ -53,6 +53,7 @@ skc_to_unix_sock                         # could not attach BPF object unexpecte
>  socket_cookie                            # prog_attach unexpected error: -524                                          (trampoline)
>  stacktrace_build_id                      # compare_map_keys stackid_hmap vs. stackmap err -2 errno 2                   (?)
>  tailcalls                                # tail_calls are not allowed in non-JITed programs with bpf-to-bpf calls      (?)
> +task_kfunc                               # JIT does not support calling kernel function
>  task_local_storage                       # failed to auto-attach program 'trace_exit_creds': -524                      (trampoline)
>  test_bpffs                               # bpffs test  failed 255                                                      (iterator)
>  test_bprm_opts                           # failed to auto-attach program 'secure_exec': -524                           (trampoline)
> diff --git a/tools/testing/selftests/bpf/prog_tests/task_kfunc.c b/tools/testing/selftests/bpf/prog_tests/task_kfunc.c
> new file mode 100644
> index 000000000000..a3aa36a4beb3
> --- /dev/null
> +++ b/tools/testing/selftests/bpf/prog_tests/task_kfunc.c
> @@ -0,0 +1,159 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/* Copyright (c) 2022 Meta Platforms, Inc. and affiliates. */
> +
> +#define _GNU_SOURCE
> +#include <sys/wait.h>
> +#include <test_progs.h>
> +#include <unistd.h>
> +
> +#include "task_kfunc_failure.skel.h"
> +#include "task_kfunc_success.skel.h"
> +
> +static size_t log_buf_sz = 1 << 20; /* 1 MB */
> +static char obj_log_buf[1048576];
> +
> +static struct task_kfunc_success *open_load_task_kfunc_skel(void)
> +{
> +	struct task_kfunc_success *skel;
> +	int err;
> +
> +	skel = task_kfunc_success__open();
> +	if (!ASSERT_OK_PTR(skel, "skel_open"))
> +		return NULL;
> +
> +	skel->bss->pid = getpid();
> +
> +	err = task_kfunc_success__load(skel);
> +	if (!ASSERT_OK(err, "skel_load"))
> +		goto cleanup;
> +
> +	return skel;
> +
> +cleanup:
> +	task_kfunc_success__destroy(skel);
> +	return NULL;
> +}
> +
> +static void run_success_test(const char *prog_name)
> +{
> +	struct task_kfunc_success *skel;
> +	int status;
> +	pid_t child_pid;
> +	struct bpf_program *prog;
> +	struct bpf_link *link = NULL;
> +
> +	skel = open_load_task_kfunc_skel();
> +	if (!ASSERT_OK_PTR(skel, "open_load_skel"))
> +		return;
> +
> +	if (!ASSERT_OK(skel->bss->err, "pre_spawn_err"))
> +		goto cleanup;
> +
> +	prog = bpf_object__find_program_by_name(skel->obj, prog_name);
> +	if (!ASSERT_OK_PTR(prog, "bpf_object__find_program_by_name"))
> +		goto cleanup;
> +
> +	link = bpf_program__attach(prog);
> +	if (!ASSERT_OK_PTR(link, "attached_link"))
> +		goto cleanup;
> +
> +	child_pid = fork();
> +	if (!ASSERT_GT(child_pid, -1, "child_pid"))
> +		goto cleanup;
> +	if (child_pid == 0)
> +		_exit(0);
> +	waitpid(child_pid, &status, 0);
> +
> +	ASSERT_OK(skel->bss->err, "post_wait_err");
> +
> +cleanup:
> +	bpf_link__destroy(link);
> +	task_kfunc_success__destroy(skel);
> +}
> +
> +static const char * const success_tests[] = {
> +	"test_task_acquire_release_argument",
> +	"test_task_acquire_release_current",
> +	"test_task_acquire_leave_in_map",
> +	"test_task_xchg_release",
> +	"test_task_get_release",
> +	"test_task_current_acquire_release",
> +};
> +
> +static struct {
> +	const char *prog_name;
> +	const char *expected_err_msg;
> +} failure_tests[] = {
> +	{"task_kfunc_acquire_untrusted", "R1 must be referenced or trusted"},
> +	{"task_kfunc_acquire_fp", "arg#0 pointer type STRUCT task_struct must point"},
> +	{"task_kfunc_acquire_unsafe_kretprobe", "reg type unsupported for arg#0 function"},
> +	{"task_kfunc_acquire_trusted_walked", "R1 must be referenced or trusted"},
> +	{"task_kfunc_acquire_null", "arg#0 pointer type STRUCT task_struct must point"},
> +	{"task_kfunc_acquire_unreleased", "Unreleased reference"},
> +	{"task_kfunc_get_non_kptr_param", "arg#0 expected pointer to map value"},
> +	{"task_kfunc_get_non_kptr_acquired", "arg#0 expected pointer to map value"},
> +	{"task_kfunc_get_null", "arg#0 expected pointer to map value"},
> +	{"task_kfunc_xchg_unreleased", "Unreleased reference"},
> +	{"task_kfunc_get_unreleased", "Unreleased reference"},
> +	{"task_kfunc_release_untrusted", "bpf_task_release arg#0 pointer had unexpected modifiers"},
> +	{"task_kfunc_release_fp", "arg#0 pointer type STRUCT task_struct must point"},
> +	{"task_kfunc_release_null", "bpf_task_release arg#0 pointer had unexpected modifiers"},
> +	{"task_kfunc_release_unacquired", "release kernel function bpf_task_release expects"},
> +};
> +
> +static void verify_fail(const char *prog_name, const char *expected_err_msg)
> +{
> +	LIBBPF_OPTS(bpf_object_open_opts, opts);
> +	struct task_kfunc_failure *skel;
> +	int err, i;
> +
> +	opts.kernel_log_buf = obj_log_buf;
> +	opts.kernel_log_size = log_buf_sz;
> +	opts.kernel_log_level = 1;
> +
> +	skel = task_kfunc_failure__open_opts(&opts);
> +	if (!ASSERT_OK_PTR(skel, "task_kfunc_failure__open_opts"))
> +		goto cleanup;
> +
> +	for (i = 0; i < ARRAY_SIZE(failure_tests); i++) {
> +		struct bpf_program *prog;
> +		const char *curr_name = failure_tests[i].prog_name;
> +
> +		prog = bpf_object__find_program_by_name(skel->obj, curr_name);
> +		if (!ASSERT_OK_PTR(prog, "bpf_object__find_program_by_name"))
> +			goto cleanup;
> +
> +		bpf_program__set_autoload(prog, !strcmp(curr_name, prog_name));
> +	}
> +
> +	err = task_kfunc_failure__load(skel);
> +	if (!ASSERT_ERR(err, "unexpected load success"))
> +		goto cleanup;
> +
> +	if (!ASSERT_OK_PTR(strstr(obj_log_buf, expected_err_msg), "expected_err_msg")) {
> +		fprintf(stderr, "Expected err_msg: %s\n", expected_err_msg);
> +		fprintf(stderr, "Verifier output: %s\n", obj_log_buf);
> +	}
> +
> +cleanup:
> +	task_kfunc_failure__destroy(skel);
> +}
> +
> +void test_task_kfunc(void)
> +{
> +	int i;
> +
> +	for (i = 0; i < ARRAY_SIZE(success_tests); i++) {
> +		if (!test__start_subtest(success_tests[i]))
> +			continue;
> +
> +		run_success_test(success_tests[i]);
> +	}
> +
> +	for (i = 0; i < ARRAY_SIZE(failure_tests); i++) {
> +		if (!test__start_subtest(failure_tests[i].prog_name))
> +			continue;
> +
> +		verify_fail(failure_tests[i].prog_name, failure_tests[i].expected_err_msg);
> +	}
> +}
> diff --git a/tools/testing/selftests/bpf/progs/task_kfunc_common.h b/tools/testing/selftests/bpf/progs/task_kfunc_common.h
> new file mode 100644
> index 000000000000..8489b952dc49
> --- /dev/null
> +++ b/tools/testing/selftests/bpf/progs/task_kfunc_common.h
> @@ -0,0 +1,71 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +/* Copyright (c) 2022 Meta Platforms, Inc. and affiliates. */
> +
> +#ifndef _TASK_KFUNC_COMMON_H
> +#define _TASK_KFUNC_COMMON_H
> +
> +#include <errno.h>
> +#include <vmlinux.h>
> +#include <bpf/bpf_helpers.h>
> +#include <bpf/bpf_tracing.h>
> +
> +struct __tasks_kfunc_map_value {
> +	struct task_struct __kptr_ref * task;
> +};
> +
> +struct hash_map {
> +	__uint(type, BPF_MAP_TYPE_HASH);
> +	__type(key, int);
> +	__type(value, struct __tasks_kfunc_map_value);
> +	__uint(max_entries, 1);
> +} __tasks_kfunc_map SEC(".maps");
> +
> +struct task_struct *bpf_task_acquire(struct task_struct *p) __ksym;
> +struct task_struct *bpf_task_kptr_get(struct task_struct **pp) __ksym;
> +void bpf_task_release(struct task_struct *p) __ksym;
> +
> +static inline struct __tasks_kfunc_map_value *tasks_kfunc_map_value_lookup(struct task_struct *p)
> +{
> +	s32 pid;
> +	long status;
> +
> +	status = bpf_probe_read_kernel(&pid, sizeof(pid), &p->pid);
> +	if (status)
> +		return NULL;
> +
> +	return bpf_map_lookup_elem(&__tasks_kfunc_map, &pid);
> +}
> +
> +static inline int tasks_kfunc_map_insert(struct task_struct *p)
> +{
> +	struct __tasks_kfunc_map_value local, *v;
> +	long status;
> +	struct task_struct *acquired, *old;
> +	s32 pid;
> +
> +	status = bpf_probe_read_kernel(&pid, sizeof(pid), &p->pid);
> +	if (status)
> +		return status;
> +
> +	local.task = NULL;
> +	status = bpf_map_update_elem(&__tasks_kfunc_map, &pid, &local, BPF_NOEXIST);
> +	if (status)
> +		return status;
> +
> +	v = bpf_map_lookup_elem(&__tasks_kfunc_map, &pid);
> +	if (!v) {
> +		bpf_map_delete_elem(&__tasks_kfunc_map, &pid);
> +		return status;

here status is known to be 0, so this returns success; should it be
returning an error instead?
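
For example (-ENOENT is just a suggestion, any error would do):

	if (!v) {
		bpf_map_delete_elem(&__tasks_kfunc_map, &pid);
		return -ENOENT;
	}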

> +	}
> +
> +	acquired = bpf_task_acquire(p);
> +	old = bpf_kptr_xchg(&v->task, acquired);
> +	if (old) {
> +		bpf_task_release(old);
> +		return -EEXIST;
> +	}
> +
> +	return 0;
> +}
> +
> +#endif /* _TASK_KFUNC_COMMON_H */
> diff --git a/tools/testing/selftests/bpf/progs/task_kfunc_failure.c b/tools/testing/selftests/bpf/progs/task_kfunc_failure.c
> new file mode 100644
> index 000000000000..4c2e8a8f3544
> --- /dev/null
> +++ b/tools/testing/selftests/bpf/progs/task_kfunc_failure.c
> @@ -0,0 +1,259 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/* Copyright (c) 2022 Meta Platforms, Inc. and affiliates. */
> +
> +#include <vmlinux.h>
> +#include <bpf/bpf_tracing.h>
> +#include <bpf/bpf_helpers.h>
> +
> +#include "task_kfunc_common.h"
> +
> +char _license[] SEC("license") = "GPL";
> +
> +/* Prototype for all of the program trace events below:
> + *
> + * TRACE_EVENT(task_newtask,
> + *         TP_PROTO(struct task_struct *p, u64 clone_flags)
> + */
> +
> +static struct __tasks_kfunc_map_value *insert_lookup_task(struct task_struct *task)
> +{
> +	int status;
> +
> +	status = tasks_kfunc_map_insert(task);
> +	if (status)
> +		return NULL;
> +
> +	return tasks_kfunc_map_value_lookup(task);
> +}
> +
> +SEC("tp_btf/task_newtask")
> +int BPF_PROG(task_kfunc_acquire_untrusted, struct task_struct *task, u64 clone_flags)
> +{
> +	struct task_struct *acquired;
> +	struct __tasks_kfunc_map_value *v;
> +
> +	v = insert_lookup_task(task);
> +	if (!v)
> +		return 0;
> +
> +	/* Can't invoke bpf_task_acquire() on an untrusted pointer. */
> +	acquired = bpf_task_acquire(v->task);
> +	bpf_task_release(acquired);
> +
> +	return 0;
> +}
> +
> +SEC("tp_btf/task_newtask")
> +int BPF_PROG(task_kfunc_acquire_fp, struct task_struct *task, u64 clone_flags)
> +{
> +	struct task_struct *acquired, *stack_task = (struct task_struct *)&clone_flags;
> +
> +	/* Can't invoke bpf_task_acquire() on a random frame pointer. */
> +	acquired = bpf_task_acquire((struct task_struct *)&stack_task);
> +	bpf_task_release(acquired);
> +
> +	return 0;
> +}
> +
> +SEC("kretprobe/free_task")
> +int BPF_PROG(task_kfunc_acquire_unsafe_kretprobe, struct task_struct *task, u64 clone_flags)
> +{
> +	struct task_struct *acquired;
> +
> +	acquired = bpf_task_acquire(task);
> +	/* The acquire above is rejected: a kretprobe arg is not a trusted pointer. */
> +	bpf_task_release(acquired);
> +
> +	return 0;
> +}
> +
> +SEC("tp_btf/task_newtask")
> +int BPF_PROG(task_kfunc_acquire_trusted_walked, struct task_struct *task, u64 clone_flags)
> +{
> +	struct task_struct *acquired;
> +
> +	/* Can't invoke bpf_task_acquire() on a trusted pointer obtained from walking a struct. */
> +	acquired = bpf_task_acquire(task->last_wakee);
> +	bpf_task_release(acquired);
> +
> +	return 0;
> +}
> +
> +
> +SEC("tp_btf/task_newtask")
> +int BPF_PROG(task_kfunc_acquire_null, struct task_struct *task, u64 clone_flags)
> +{
> +	struct task_struct *acquired;
> +
> +	/* Can't invoke bpf_task_acquire() on a NULL pointer. */
> +	acquired = bpf_task_acquire(NULL);
> +	if (!acquired)
> +		return 0;
> +	bpf_task_release(acquired);
> +
> +	return 0;
> +}
> +
> +SEC("tp_btf/task_newtask")
> +int BPF_PROG(task_kfunc_acquire_unreleased, struct task_struct *task, u64 clone_flags)
> +{
> +	struct task_struct *acquired;
> +
> +	acquired = bpf_task_acquire(task);
> +
> +	/* Acquired task is never released. */
> +
> +	return 0;
> +}
> +
> +SEC("tp_btf/task_newtask")
> +int BPF_PROG(task_kfunc_get_non_kptr_param, struct task_struct *task, u64 clone_flags)
> +{
> +	struct task_struct *kptr;
> +
> +	/* Cannot use bpf_task_kptr_get() on a non-kptr, even on a valid task. */
> +	kptr = bpf_task_kptr_get(&task);
> +	if (!kptr)
> +		return 0;
> +
> +	bpf_task_release(kptr);
> +
> +	return 0;
> +}
> +
> +SEC("tp_btf/task_newtask")
> +int BPF_PROG(task_kfunc_get_non_kptr_acquired, struct task_struct *task, u64 clone_flags)
> +{
> +	struct task_struct *kptr, *acquired;
> +
> +	acquired = bpf_task_acquire(task);
> +
> +	/* Cannot use bpf_task_kptr_get() on a non-kptr, even if it was acquired. */
> +	kptr = bpf_task_kptr_get(&acquired);
> +	bpf_task_release(acquired);
> +	if (!kptr)
> +		return 0;
> +
> +	bpf_task_release(kptr);
> +
> +	return 0;
> +}
> +
> +SEC("tp_btf/task_newtask")
> +int BPF_PROG(task_kfunc_get_null, struct task_struct *task, u64 clone_flags)
> +{
> +	struct task_struct *kptr;
> +
> +	/* Cannot use bpf_task_kptr_get() on a NULL pointer. */
> +	kptr = bpf_task_kptr_get(NULL);
> +	if (!kptr)
> +		return 0;
> +
> +	bpf_task_release(kptr);
> +
> +	return 0;
> +}
> +
> +SEC("tp_btf/task_newtask")
> +int BPF_PROG(task_kfunc_xchg_unreleased, struct task_struct *task, u64 clone_flags)
> +{
> +	struct task_struct *kptr;
> +	struct __tasks_kfunc_map_value *v;
> +
> +	v = insert_lookup_task(task);
> +	if (!v)
> +		return 0;
> +
> +	kptr = bpf_kptr_xchg(&v->task, NULL);
> +	if (!kptr)
> +		return 0;
> +
> +	/* Kptr retrieved from map is never released. */
> +
> +	return 0;
> +}
> +
> +SEC("tp_btf/task_newtask")
> +int BPF_PROG(task_kfunc_get_unreleased, struct task_struct *task, u64 clone_flags)
> +{
> +	struct task_struct *kptr;
> +	struct __tasks_kfunc_map_value *v;
> +
> +	v = insert_lookup_task(task);
> +	if (!v)
> +		return 0;
> +
> +	kptr = bpf_task_kptr_get(&v->task);
> +	if (!kptr)
> +		return 0;
> +
> +	/* Kptr acquired above is never released. */
> +
> +	return 0;
> +}
> +
> +SEC("tp_btf/task_newtask")
> +int BPF_PROG(task_kfunc_release_untrusted, struct task_struct *task, u64 clone_flags)
> +{
> +	struct __tasks_kfunc_map_value *v;
> +
> +	v = insert_lookup_task(task);
> +	if (!v)
> +		return 0;
> +
> +	/* Can't invoke bpf_task_release() on an untrusted pointer. */
> +	bpf_task_release(v->task);
> +
> +	return 0;
> +}
> +
> +SEC("tp_btf/task_newtask")
> +int BPF_PROG(task_kfunc_release_fp, struct task_struct *task, u64 clone_flags)
> +{
> +	struct task_struct *acquired = (struct task_struct *)&clone_flags;
> +
> +	/* Cannot release random frame pointer. */
> +	bpf_task_release(acquired);
> +
> +	return 0;
> +}
> +
> +SEC("tp_btf/task_newtask")
> +int BPF_PROG(task_kfunc_release_null, struct task_struct *task, u64 clone_flags)
> +{
> +	struct __tasks_kfunc_map_value local, *v;
> +	long status;
> +	struct task_struct *acquired, *old;
> +	s32 pid;
> +
> +	status = bpf_probe_read_kernel(&pid, sizeof(pid), &task->pid);
> +	if (status)
> +		return 0;
> +
> +	local.task = NULL;
> +	status = bpf_map_update_elem(&__tasks_kfunc_map, &pid, &local, BPF_NOEXIST);
> +	if (status)
> +		return status;
> +
> +	v = bpf_map_lookup_elem(&__tasks_kfunc_map, &pid);
> +	if (!v)
> +		return status;

same issue as above: status is known to be 0 here, so this returns
success; should it be returning an error instead?
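
For example (again, -ENOENT is just a suggestion):

	if (!v)
		return -ENOENT;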

> +
> +	acquired = bpf_task_acquire(task);
> +
> +	old = bpf_kptr_xchg(&v->task, acquired);
> +
> +	/* old cannot be passed to bpf_task_release() without a NULL check. */
> +	bpf_task_release(old);
> +
> +	return 0;
> +}
> +
> +SEC("tp_btf/task_newtask")
> +int BPF_PROG(task_kfunc_release_unacquired, struct task_struct *task, u64 clone_flags)
> +{
> +	/* Cannot release trusted task pointer which was not acquired. */
> +	bpf_task_release(task);
> +
> +	return 0;
> +}
> diff --git a/tools/testing/selftests/bpf/progs/task_kfunc_success.c b/tools/testing/selftests/bpf/progs/task_kfunc_success.c
> new file mode 100644
> index 000000000000..be4534b5ba2e
> --- /dev/null
> +++ b/tools/testing/selftests/bpf/progs/task_kfunc_success.c
> @@ -0,0 +1,149 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/* Copyright (c) 2022 Meta Platforms, Inc. and affiliates. */
> +
> +#include <vmlinux.h>
> +#include <bpf/bpf_tracing.h>
> +#include <bpf/bpf_helpers.h>
> +
> +#include "task_kfunc_common.h"
> +
> +char _license[] SEC("license") = "GPL";
> +
> +int err, pid;
> +
> +/* Prototype for all of the program trace events below:
> + *
> + * TRACE_EVENT(task_newtask,
> + *         TP_PROTO(struct task_struct *p, u64 clone_flags)
> + */
> +
> +static bool is_test_kfunc_task(void)
> +{
> +	int cur_pid = bpf_get_current_pid_tgid() >> 32;
> +
> +	return pid == cur_pid;
> +}
> +
> +static int test_acquire_release(struct task_struct *task)
> +{
> +	struct task_struct *acquired;
> +
> +	acquired = bpf_task_acquire(task);
> +	bpf_task_release(acquired);
> +
> +	return 0;
> +}
> +
> +SEC("tp_btf/task_newtask")
> +int BPF_PROG(test_task_acquire_release_argument, struct task_struct *task, u64 clone_flags)
> +{
> +	if (!is_test_kfunc_task())
> +		return 0;
> +
> +	return test_acquire_release(task);
> +}
> +
> +SEC("tp_btf/task_newtask")
> +int BPF_PROG(test_task_acquire_release_current, struct task_struct *task, u64 clone_flags)
> +{
> +	if (!is_test_kfunc_task())
> +		return 0;
> +
> +	return test_acquire_release(bpf_get_current_task_btf());
> +}
> +
> +SEC("tp_btf/task_newtask")
> +int BPF_PROG(test_task_acquire_leave_in_map, struct task_struct *task, u64 clone_flags)
> +{
> +	long status;
> +
> +	if (!is_test_kfunc_task())
> +		return 0;
> +
> +	status = tasks_kfunc_map_insert(task);
> +	if (status)
> +		err = 1;
> +
> +	return 0;
> +}
> +
> +SEC("tp_btf/task_newtask")
> +int BPF_PROG(test_task_xchg_release, struct task_struct *task, u64 clone_flags)
> +{
> +	struct task_struct *kptr;
> +	struct __tasks_kfunc_map_value *v;
> +	long status;
> +
> +	if (!is_test_kfunc_task())
> +		return 0;
> +
> +	status = tasks_kfunc_map_insert(task);
> +	if (status) {
> +		err = 1;
> +		return 0;
> +	}
> +
> +	v = tasks_kfunc_map_value_lookup(task);
> +	if (!v) {
> +		err = 2;
> +		return 0;
> +	}
> +
> +	kptr = bpf_kptr_xchg(&v->task, NULL);
> +	if (!kptr) {
> +		err = 3;
> +		return 0;
> +	}
> +
> +	bpf_task_release(kptr);
> +
> +	return 0;
> +}
> +
> +SEC("tp_btf/task_newtask")
> +int BPF_PROG(test_task_get_release, struct task_struct *task, u64 clone_flags)
> +{
> +	struct task_struct *kptr;
> +	struct __tasks_kfunc_map_value *v;
> +	long status;
> +
> +	if (!is_test_kfunc_task())
> +		return 0;
> +
> +	status = tasks_kfunc_map_insert(task);
> +	if (status) {
> +		err = 1;
> +		return 0;
> +	}
> +
> +	v = tasks_kfunc_map_value_lookup(task);
> +	if (!v) {
> +		err = 2;
> +		return 0;
> +	}
> +
> +	kptr = bpf_task_kptr_get(&v->task);
> +	if (!kptr) {
> +		err = 3;
> +		return 0;
> +	}
> +
> +	bpf_task_release(kptr);
> +
> +	return 0;
> +}
> +
> +SEC("tp_btf/task_newtask")
> +int BPF_PROG(test_task_current_acquire_release, struct task_struct *task, u64 clone_flags)
> +{
> +	struct task_struct *current, *acquired;
> +
> +	if (!is_test_kfunc_task())
> +		return 0;
> +
> +	current = bpf_get_current_task_btf();
> +	acquired = bpf_task_acquire(current);
> +	bpf_task_release(acquired);
> +
> +	return 0;
> +}
> -- 
> 2.38.1
> 

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH bpf-next v7 1/3] bpf: Allow trusted pointers to be passed to KF_TRUSTED_ARGS kfuncs
  2022-11-17  3:24 ` [PATCH bpf-next v7 1/3] bpf: Allow trusted pointers to be passed to KF_TRUSTED_ARGS kfuncs David Vernet
@ 2022-11-18  2:26   ` Alexei Starovoitov
  2022-11-18 14:45     ` David Vernet
  0 siblings, 1 reply; 22+ messages in thread
From: Alexei Starovoitov @ 2022-11-18  2:26 UTC (permalink / raw)
  To: David Vernet
  Cc: bpf, ast, andrii, daniel, martin.lau, memxor, yhs, song, sdf,
	john.fastabend, kpsingh, jolsa, haoluo, tj, kernel-team,
	linux-kernel

On Wed, Nov 16, 2022 at 09:24:00PM -0600, David Vernet wrote:
> Kfuncs currently support specifying the KF_TRUSTED_ARGS flag to signal
> to the verifier that it should enforce that a BPF program passes it a
> "safe", trusted pointer. Currently, "safe" means that the pointer is
> either PTR_TO_CTX, or is refcounted. There may be cases, however, where
> the kernel passes a BPF program a safe / trusted pointer to an object
> that the BPF program wishes to use as a kptr, but because the object
> does not yet have a ref_obj_id from the perspective of the verifier, the
> program would be unable to pass it to a KF_ACQUIRE | KF_TRUSTED_ARGS
> kfunc.
> 
> The solution is to expand the set of pointers that are considered
> trusted according to KF_TRUSTED_ARGS, so that programs can invoke kfuncs
> with these pointers without getting rejected by the verifier.
> 
> There is already a PTR_UNTRUSTED flag that is set in some scenarios,
> such as when a BPF program reads a kptr directly from a map
> without performing a bpf_kptr_xchg() call. These pointers of course can
> and should be rejected by the verifier. Unfortunately, however,
> PTR_UNTRUSTED does not cover all the cases for safety that need to
> be addressed to adequately protect kfuncs. Specifically, pointers
> obtained by a BPF program "walking" a struct are _not_ considered
> PTR_UNTRUSTED according to BPF. For example, say that we were to add a
> kfunc called bpf_task_acquire(), with KF_ACQUIRE | KF_TRUSTED_ARGS, to
> acquire a struct task_struct *. If we only used PTR_UNTRUSTED to signal
> that a task was unsafe to pass to a kfunc, the verifier would mistakenly
> allow the following unsafe BPF program to be loaded:
> 
> SEC("tp_btf/task_newtask")
> int BPF_PROG(unsafe_acquire_task,
>              struct task_struct *task,
>              u64 clone_flags)
> {
>         struct task_struct *acquired, *nested;
> 
>         nested = task->last_wakee;
> 
>         /* Would not be rejected by the verifier. */
>         acquired = bpf_task_acquire(nested);
>         if (!acquired)
>                 return 0;
> 
>         bpf_task_release(acquired);
>         return 0;
> }
> 
> To address this, this patch defines a new type flag called PTR_TRUSTED
> which tracks whether a PTR_TO_BTF_ID pointer is safe to pass to a
> KF_TRUSTED_ARGS kfunc or a BPF helper function. PTR_TRUSTED pointers are
> passed directly from the kernel as a tracepoint or struct_ops callback
> argument. Any nested pointer that is obtained from walking a PTR_TRUSTED
> pointer is no longer PTR_TRUSTED. From the example above, the struct
> task_struct *task argument is PTR_TRUSTED, but the 'nested' pointer
> obtained from 'task->last_wakee' is not PTR_TRUSTED.
> 
> A subsequent patch will add kfuncs for storing a task kfunc as a kptr,
> and then another patch will add selftests to validate.
> 
> Signed-off-by: David Vernet <void@manifault.com>
> ---
>  Documentation/bpf/kfuncs.rst                  | 30 ++++-----
>  include/linux/bpf.h                           | 30 +++++++++
>  include/linux/btf.h                           | 65 ++++++++++++-------
>  kernel/bpf/btf.c                              | 38 +++++++++--
>  kernel/bpf/verifier.c                         | 45 ++++++++-----
>  kernel/trace/bpf_trace.c                      |  2 +-
>  net/ipv4/bpf_tcp_ca.c                         |  4 +-
>  tools/testing/selftests/bpf/verifier/calls.c  |  2 +-
>  .../selftests/bpf/verifier/ref_tracking.c     |  4 +-
>  9 files changed, 154 insertions(+), 66 deletions(-)
> 
> diff --git a/Documentation/bpf/kfuncs.rst b/Documentation/bpf/kfuncs.rst
> index 0f858156371d..67b7e2f46ec6 100644
> --- a/Documentation/bpf/kfuncs.rst
> +++ b/Documentation/bpf/kfuncs.rst
> @@ -137,22 +137,20 @@ KF_ACQUIRE and KF_RET_NULL flags.
>  --------------------------
>  
>  The KF_TRUSTED_ARGS flag is used for kfuncs taking pointer arguments. It
> -indicates that the all pointer arguments will always have a guaranteed lifetime,
> -and pointers to kernel objects are always passed to helpers in their unmodified
> -form (as obtained from acquire kfuncs).
> -
> -It can be used to enforce that a pointer to a refcounted object acquired from a
> -kfunc or BPF helper is passed as an argument to this kfunc without any
> -modifications (e.g. pointer arithmetic) such that it is trusted and points to
> -the original object.
> -
> -Meanwhile, it is also allowed pass pointers to normal memory to such kfuncs,
> -but those can have a non-zero offset.
> -
> -This flag is often used for kfuncs that operate (change some property, perform
> -some operation) on an object that was obtained using an acquire kfunc. Such
> -kfuncs need an unchanged pointer to ensure the integrity of the operation being
> -performed on the expected object.
> +indicates that all pointer arguments are valid, and that all pointers to
> +BTF objects have been passed in their unmodified form (that is, at a zero
> +offset, and without having been obtained from walking another pointer).
> +
> +There are two types of pointers to kernel objects which are considered "valid":
> +
> +1. Pointers which are passed as tracepoint or struct_ops callback arguments.
> +2. Pointers which were returned from a KF_ACQUIRE or KF_KPTR_GET kfunc.
> +
> +Pointers to non-BTF objects (e.g. scalar pointers) may also be passed to
> +KF_TRUSTED_ARGS kfuncs, and may have a non-zero offset.
> +
> +The definition of "valid" pointers is subject to change at any time, and has
> +absolutely no ABI stability guarantees.
>  
>  2.4.6 KF_SLEEPABLE flag
>  -----------------------
> diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> index 54462dd28824..763ae250693e 100644
> --- a/include/linux/bpf.h
> +++ b/include/linux/bpf.h
> @@ -524,6 +524,35 @@ enum bpf_type_flag {
>  	/* Size is known at compile time. */
>  	MEM_FIXED_SIZE		= BIT(10 + BPF_BASE_TYPE_BITS),
>  
> +	/* PTR was passed from the kernel in a trusted context, and may be
> +	 * passed to KF_TRUSTED_ARGS kfuncs or BPF helper functions.
> +	 * Confusingly, this is _not_ the opposite of PTR_UNTRUSTED above.
> +	 * PTR_UNTRUSTED refers to a kptr that was read directly from a map
> +	 * without invoking bpf_kptr_xchg(). What we really need to know is
> +	 * whether a pointer is safe to pass to a kfunc or BPF helper function.
> +	 * While PTR_UNTRUSTED pointers are unsafe to pass to kfuncs and BPF
> +	 * helpers, they do not cover all possible instances of unsafe
> +	 * pointers. For example, a pointer that was obtained from walking a
> +	 * struct will _not_ get the PTR_UNTRUSTED type modifier, despite the
> +	 * fact that it may be NULL, invalid, etc. This is due to backwards
> +	 * compatibility requirements, as this was the behavior that was first
> +	 * introduced when kptrs were added. The behavior is now considered
> +	 * deprecated, and PTR_UNTRUSTED will eventually be removed.
> +	 *
> +	 * PTR_TRUSTED, on the other hand, is a pointer that the kernel
> +	 * guarantees to be valid and safe to pass to kfuncs and BPF helpers.
> +	 * For example, pointers passed to tracepoint arguments are considered
> +	 * PTR_TRUSTED, as are pointers that are passed to struct_ops
> +	 * callbacks. As alluded to above, pointers that are obtained from
> +	 * walking PTR_TRUSTED pointers are _not_ trusted. For example, if a
> +	 * struct task_struct *task is PTR_TRUSTED, then accessing
> +	 * task->last_wakee will lose the PTR_TRUSTED modifier when it's stored
> +	 * in a BPF register. Similarly, pointers passed to certain programs
> +	 * types such as kretprobes are not guaranteed to be valid, as they may
> +	 * for example contain an object that was recently freed.
> +	 */
> +	PTR_TRUSTED		= BIT(11 + BPF_BASE_TYPE_BITS),
> +
>  	__BPF_TYPE_FLAG_MAX,
>  	__BPF_TYPE_LAST_FLAG	= __BPF_TYPE_FLAG_MAX - 1,
>  };
> @@ -617,6 +646,7 @@ enum bpf_return_type {
>  	RET_PTR_TO_RINGBUF_MEM_OR_NULL	= PTR_MAYBE_NULL | MEM_RINGBUF | RET_PTR_TO_MEM,
>  	RET_PTR_TO_DYNPTR_MEM_OR_NULL	= PTR_MAYBE_NULL | RET_PTR_TO_MEM,
>  	RET_PTR_TO_BTF_ID_OR_NULL	= PTR_MAYBE_NULL | RET_PTR_TO_BTF_ID,
> +	RET_PTR_TO_BTF_ID_TRUSTED	= PTR_TRUSTED	 | RET_PTR_TO_BTF_ID,
>  
>  	/* This must be the last entry. Its purpose is to ensure the enum is
>  	 * wide enough to hold the higher bits reserved for bpf_type_flag.
> diff --git a/include/linux/btf.h b/include/linux/btf.h
> index d80345fa566b..13b969e74d3b 100644
> --- a/include/linux/btf.h
> +++ b/include/linux/btf.h
> @@ -17,36 +17,53 @@
>  #define KF_RELEASE	(1 << 1) /* kfunc is a release function */
>  #define KF_RET_NULL	(1 << 2) /* kfunc returns a pointer that may be NULL */
>  #define KF_KPTR_GET	(1 << 3) /* kfunc returns reference to a kptr */
> -/* Trusted arguments are those which are meant to be referenced arguments with
> - * unchanged offset. It is used to enforce that pointers obtained from acquire
> - * kfuncs remain unmodified when being passed to helpers taking trusted args.
> +/* Trusted arguments are those which are guaranteed to be valid when passed to
> + * the kfunc. It is used to enforce that pointers obtained from either acquire
> + * kfuncs, or from the main kernel on a tracepoint or struct_ops callback
> + * invocation, remain unmodified when being passed to helpers taking trusted
> + * args.
>   *
> - * Consider
> - *	struct foo {
> - *		int data;
> - *		struct foo *next;
> - *	};
> + * Consider, for example, the following new task tracepoint:
>   *
> - *	struct bar {
> - *		int data;
> - *		struct foo f;
> - *	};
> + *	SEC("tp_btf/task_newtask")
> + *	int BPF_PROG(new_task_tp, struct task_struct *task, u64 clone_flags)
> + *	{
> + *		...
> + *	}
>   *
> - *	struct foo *f = alloc_foo(); // Acquire kfunc
> - *	struct bar *b = alloc_bar(); // Acquire kfunc
> + * And the following kfunc:
>   *
> - * If a kfunc set_foo_data() wants to operate only on the allocated object, it
> - * will set the KF_TRUSTED_ARGS flag, which will prevent unsafe usage like:
> + *	BTF_ID_FLAGS(func, bpf_task_acquire, KF_ACQUIRE | KF_TRUSTED_ARGS)
>   *
> - *	set_foo_data(f, 42);	   // Allowed
> - *	set_foo_data(f->next, 42); // Rejected, non-referenced pointer
> - *	set_foo_data(&f->next, 42);// Rejected, referenced, but wrong type
> - *	set_foo_data(&b->f, 42);   // Rejected, referenced, but bad offset
> + * All invocations to the kfunc must pass the unmodified, unwalked task:
>   *
> - * In the final case, usually for the purposes of type matching, it is deduced
> - * by looking at the type of the member at the offset, but due to the
> - * requirement of trusted argument, this deduction will be strict and not done
> - * for this case.
> + *	bpf_task_acquire(task);		    // Allowed
> + *	bpf_task_acquire(task->last_wakee); // Rejected, walked task
> + *
> + * Programs may also pass referenced tasks directly to the kfunc:
> + *
> + *	struct task_struct *acquired;
> + *
> + *	acquired = bpf_task_acquire(task);	// Allowed, same as above
> + *	bpf_task_acquire(acquired);		// Allowed
> + *	bpf_task_acquire(task);			// Allowed
> + *	bpf_task_acquire(acquired->last_wakee); // Rejected, walked task
> + *
> + * Programs may _not_, however, pass a task from an arbitrary fentry/fexit, or
> + * kprobe/kretprobe to the kfunc, as BPF cannot guarantee that all of these
> + * pointers are guaranteed to be safe. For example, the following BPF program
> + * would be rejected:
> + *
> + * SEC("kretprobe/free_task")
> + * int BPF_PROG(free_task_probe, struct task_struct *tsk)
> + * {
> + *	struct task_struct *acquired;
> + *
> + *	acquired = bpf_task_acquire(tsk); // Rejected, not a trusted pointer
> + *	bpf_task_release(acquired);
> + *
> + *	return 0;
> + * }
>   */
>  #define KF_TRUSTED_ARGS (1 << 4) /* kfunc only takes trusted pointer arguments */
>  #define KF_SLEEPABLE    (1 << 5) /* kfunc may sleep */
> diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c
> index 875355ff3718..8291f2911624 100644
> --- a/kernel/bpf/btf.c
> +++ b/kernel/bpf/btf.c
> @@ -5579,6 +5579,11 @@ static u32 get_ctx_arg_idx(struct btf *btf, const struct btf_type *func_proto,
>  	return nr_args + 1;
>  }
>  
> +static bool prog_type_args_trusted(enum bpf_prog_type prog_type)
> +{
> +	return prog_type == BPF_PROG_TYPE_TRACING || prog_type == BPF_PROG_TYPE_STRUCT_OPS;
> +}
> +
>  bool btf_ctx_access(int off, int size, enum bpf_access_type type,
>  		    const struct bpf_prog *prog,
>  		    struct bpf_insn_access_aux *info)
> @@ -5722,6 +5727,9 @@ bool btf_ctx_access(int off, int size, enum bpf_access_type type,
>  	}
>  
>  	info->reg_type = PTR_TO_BTF_ID;
> +	if (prog_type_args_trusted(prog->type))
> +		info->reg_type |= PTR_TRUSTED;
> +
>  	if (tgt_prog) {
>  		enum bpf_prog_type tgt_type;
>  
> @@ -6558,15 +6566,26 @@ static int btf_check_func_arg_match(struct bpf_verifier_env *env,
>  		/* These register types have special constraints wrt ref_obj_id
>  		 * and offset checks. The rest of trusted args don't.
>  		 */
> -		obj_ptr = reg->type == PTR_TO_CTX || reg->type == PTR_TO_BTF_ID ||
> +		obj_ptr = reg->type == PTR_TO_CTX ||
> +			  base_type(reg->type) == PTR_TO_BTF_ID ||
>  			  reg2btf_ids[base_type(reg->type)];
>  
>  		/* Check if argument must be a referenced pointer, args + i has
>  		 * been verified to be a pointer (after skipping modifiers).
>  		 * PTR_TO_CTX is ok without having non-zero ref_obj_id.
> +		 *
> +		 * All object pointers must be refcounted, other than:
> +		 * - PTR_TO_CTX
> +		 * - PTR_TRUSTED pointers
>  		 */
> -		if (is_kfunc && trusted_args && (obj_ptr && reg->type != PTR_TO_CTX) && !reg->ref_obj_id) {
> -			bpf_log(log, "R%d must be referenced\n", regno);
> +		if (is_kfunc &&
> +		    trusted_args &&
> +		    obj_ptr &&
> +		    base_type(reg->type) != PTR_TO_CTX &&
> +		    (!(type_flag(reg->type) & PTR_TRUSTED) ||
> +		     (type_flag(reg->type) & ~PTR_TRUSTED)) &&
> +		    !reg->ref_obj_id) {

This is pretty hard to read.
Is this checking:
!(reg->type == PTR_TO_BTF_ID || reg->type == (PTR_TO_BTF_ID | PTR_TRUSTED))
?

If so, why not use that form directly?
Similar in other places... type_flag(reg->type) & ~PTR_TRUSTED is not
easy to follow. Maybe add a helper along these lines:

static bool reg_is_trusted_btf_id(const struct bpf_reg_state *reg)
{
	return reg->type == PTR_TO_BTF_ID ||
	       reg->type == (PTR_TO_BTF_ID | PTR_TRUSTED);
}

?

> +			bpf_log(log, "R%d must be referenced or trusted\n", regno);
>  			return -EINVAL;
>  		}
>  
> @@ -6646,8 +6665,8 @@ static int btf_check_func_arg_match(struct bpf_verifier_env *env,
>  					i, btf_type_str(t));
>  				return -EINVAL;
>  			}
> -		} else if (is_kfunc && (reg->type == PTR_TO_BTF_ID ||
> -			   (reg2btf_ids[base_type(reg->type)] && !type_flag(reg->type)))) {
> +		} else if (is_kfunc && (base_type(reg->type) == PTR_TO_BTF_ID ||
> +			   (reg2btf_ids[base_type(reg->type)]))) {
>  			const struct btf_type *reg_ref_t;
>  			const struct btf *reg_btf;
>  			const char *reg_ref_tname;
> @@ -6660,7 +6679,13 @@ static int btf_check_func_arg_match(struct bpf_verifier_env *env,
>  				return -EINVAL;
>  			}
>  
> -			if (reg->type == PTR_TO_BTF_ID) {
> +			if ((type_flag(reg->type) & ~PTR_TRUSTED)) {

and use that helper here?

> +				bpf_log(log, "kernel function %s arg#%d pointer had unexpected modifiers %d\n",
> +					func_name, i, type_flag(reg->type));
> +				return -EINVAL;
> +			}
> +
> +			if (base_type(reg->type) == PTR_TO_BTF_ID) {
>  				reg_btf = reg->btf;
>  				reg_ref_id = reg->btf_id;
>  			} else {
> @@ -6988,6 +7013,7 @@ int btf_prepare_func_args(struct bpf_verifier_env *env, int subprog,
>  			}
>  
>  			reg->type = PTR_TO_MEM | PTR_MAYBE_NULL;
> +
>  			reg->id = ++env->id_gen;
>  
>  			continue;
> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> index 0312d9ce292f..f5b6b1f969d9 100644
> --- a/kernel/bpf/verifier.c
> +++ b/kernel/bpf/verifier.c
> @@ -543,7 +543,7 @@ static bool is_cmpxchg_insn(const struct bpf_insn *insn)
>  static const char *reg_type_str(struct bpf_verifier_env *env,
>  				enum bpf_reg_type type)
>  {
> -	char postfix[16] = {0}, prefix[32] = {0};
> +	char postfix[16] = {0}, prefix[64] = {0};
>  	static const char * const str[] = {
>  		[NOT_INIT]		= "?",
>  		[SCALAR_VALUE]		= "scalar",
> @@ -575,16 +575,14 @@ static const char *reg_type_str(struct bpf_verifier_env *env,
>  			strncpy(postfix, "_or_null", 16);
>  	}
>  
> -	if (type & MEM_RDONLY)
> -		strncpy(prefix, "rdonly_", 32);
> -	if (type & MEM_RINGBUF)
> -		strncpy(prefix, "ringbuf_", 32);
> -	if (type & MEM_USER)
> -		strncpy(prefix, "user_", 32);
> -	if (type & MEM_PERCPU)
> -		strncpy(prefix, "percpu_", 32);
> -	if (type & PTR_UNTRUSTED)
> -		strncpy(prefix, "untrusted_", 32);
> +	snprintf(prefix, sizeof(prefix), "%s%s%s%s%s%s",
> +		 type & MEM_RDONLY ? "rdonly_" : "",
> +		 type & MEM_RINGBUF ? "ringbuf_" : "",
> +		 type & MEM_USER ? "user_" : "",
> +		 type & MEM_PERCPU ? "percpu_" : "",
> +		 type & PTR_UNTRUSTED ? "untrusted_" : "",
> +		 type & PTR_TRUSTED ? "trusted_" : ""
> +	);
>  
>  	snprintf(env->type_str_buf, TYPE_STR_BUF_LEN, "%s%s%s",
>  		 prefix, str[base_type(type)], postfix);
> @@ -3844,7 +3842,7 @@ static int map_kptr_match_type(struct bpf_verifier_env *env,
>  			       struct bpf_reg_state *reg, u32 regno)
>  {
>  	const char *targ_name = kernel_type_name(kptr_field->kptr.btf, kptr_field->kptr.btf_id);
> -	int perm_flags = PTR_MAYBE_NULL;
> +	int perm_flags = PTR_MAYBE_NULL | PTR_TRUSTED;
>  	const char *reg_name = "";
>  
>  	/* Only unreferenced case accepts untrusted pointers */
> @@ -4707,6 +4705,9 @@ static int check_ptr_to_btf_access(struct bpf_verifier_env *env,
>  	if (type_flag(reg->type) & PTR_UNTRUSTED)
>  		flag |= PTR_UNTRUSTED;
>  
> +	/* Any pointer obtained from walking a trusted pointer is no longer trusted. */
> +	flag &= ~PTR_TRUSTED;
> +
>  	if (atype == BPF_READ && value_regno >= 0)
>  		mark_btf_ld_reg(env, regs, value_regno, ret, reg->btf, btf_id, flag);
>  
> @@ -5774,6 +5775,7 @@ static const struct bpf_reg_types btf_id_sock_common_types = {
>  		PTR_TO_TCP_SOCK,
>  		PTR_TO_XDP_SOCK,
>  		PTR_TO_BTF_ID,
> +		PTR_TO_BTF_ID | PTR_TRUSTED,
>  	},
>  	.btf_id = &btf_sock_ids[BTF_SOCK_TYPE_SOCK_COMMON],
>  };
> @@ -5807,9 +5809,19 @@ static const struct bpf_reg_types scalar_types = { .types = { SCALAR_VALUE } };
>  static const struct bpf_reg_types context_types = { .types = { PTR_TO_CTX } };
>  static const struct bpf_reg_types ringbuf_mem_types = { .types = { PTR_TO_MEM | MEM_RINGBUF } };
>  static const struct bpf_reg_types const_map_ptr_types = { .types = { CONST_PTR_TO_MAP } };
> -static const struct bpf_reg_types btf_ptr_types = { .types = { PTR_TO_BTF_ID } };
> +static const struct bpf_reg_types btf_ptr_types = {
> +	.types = {
> +		PTR_TO_BTF_ID,
> +		PTR_TO_BTF_ID | PTR_TRUSTED,
> +	},
> +};
>  static const struct bpf_reg_types spin_lock_types = { .types = { PTR_TO_MAP_VALUE } };
> -static const struct bpf_reg_types percpu_btf_ptr_types = { .types = { PTR_TO_BTF_ID | MEM_PERCPU } };
> +static const struct bpf_reg_types percpu_btf_ptr_types = {
> +	.types = {
> +		PTR_TO_BTF_ID | MEM_PERCPU,
> +		PTR_TO_BTF_ID | MEM_PERCPU | PTR_TRUSTED,
> +	}
> +};
>  static const struct bpf_reg_types func_ptr_types = { .types = { PTR_TO_FUNC } };
>  static const struct bpf_reg_types stack_ptr_types = { .types = { PTR_TO_STACK } };
>  static const struct bpf_reg_types const_str_ptr_types = { .types = { PTR_TO_MAP_VALUE } };
> @@ -5897,7 +5909,7 @@ static int check_reg_type(struct bpf_verifier_env *env, u32 regno,
>  	return -EACCES;
>  
>  found:
> -	if (reg->type == PTR_TO_BTF_ID) {
> +	if (base_type(reg->type) == PTR_TO_BTF_ID && !(type_flag(reg->type) & ~PTR_TRUSTED)) {
>  		/* For bpf_sk_release, it needs to match against first member
>  		 * 'struct sock_common', hence make an exception for it. This
>  		 * allows bpf_sk_release to work for multiple socket types.
> @@ -5973,6 +5985,7 @@ int check_func_arg_reg_off(struct bpf_verifier_env *env,
>  	 * fixed offset.
>  	 */
>  	case PTR_TO_BTF_ID:
> +	case PTR_TO_BTF_ID | PTR_TRUSTED:
>  		/* When referenced PTR_TO_BTF_ID is passed to release function,
>  		 * it's fixed offset must be 0.	In the other cases, fixed offset
>  		 * can be non-zero.
> @@ -13690,6 +13703,8 @@ static int convert_ctx_accesses(struct bpf_verifier_env *env)
>  			break;
>  		case PTR_TO_BTF_ID:
>  		case PTR_TO_BTF_ID | PTR_UNTRUSTED:
> +		case PTR_TO_BTF_ID | PTR_TRUSTED:
> +		case PTR_TO_BTF_ID | PTR_UNTRUSTED | PTR_TRUSTED:
>  			if (type == BPF_READ) {
>  				insn->code = BPF_LDX | BPF_PROBE_MEM |
>  					BPF_SIZE((insn)->code);
> diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
> index f2d8d070d024..5b9008bc597b 100644
> --- a/kernel/trace/bpf_trace.c
> +++ b/kernel/trace/bpf_trace.c
> @@ -774,7 +774,7 @@ BPF_CALL_0(bpf_get_current_task_btf)
>  const struct bpf_func_proto bpf_get_current_task_btf_proto = {
>  	.func		= bpf_get_current_task_btf,
>  	.gpl_only	= true,
> -	.ret_type	= RET_PTR_TO_BTF_ID,
> +	.ret_type	= RET_PTR_TO_BTF_ID_TRUSTED,
>  	.ret_btf_id	= &btf_tracing_ids[BTF_TRACING_TYPE_TASK],
>  };
>  
> diff --git a/net/ipv4/bpf_tcp_ca.c b/net/ipv4/bpf_tcp_ca.c
> index d15c91de995f..0006b5438ff7 100644
> --- a/net/ipv4/bpf_tcp_ca.c
> +++ b/net/ipv4/bpf_tcp_ca.c
> @@ -61,7 +61,9 @@ static bool bpf_tcp_ca_is_valid_access(int off, int size,
>  	if (!bpf_tracing_btf_ctx_access(off, size, type, prog, info))
>  		return false;
>  
> -	if (info->reg_type == PTR_TO_BTF_ID && info->btf_id == sock_id)
> +	if (base_type(info->reg_type) == PTR_TO_BTF_ID &&
> +	    !(type_flag(info->reg_type) & ~PTR_TRUSTED) &&
> +	    info->btf_id == sock_id)
>  		/* promote it to tcp_sock */
>  		info->btf_id = tcp_sock_id;
>  
> diff --git a/tools/testing/selftests/bpf/verifier/calls.c b/tools/testing/selftests/bpf/verifier/calls.c
> index e1a937277b54..7ac947f00df4 100644
> --- a/tools/testing/selftests/bpf/verifier/calls.c
> +++ b/tools/testing/selftests/bpf/verifier/calls.c
> @@ -109,7 +109,7 @@
>  	},
>  	.prog_type = BPF_PROG_TYPE_SCHED_CLS,
>  	.result = REJECT,
> -	.errstr = "arg#0 pointer type STRUCT prog_test_ref_kfunc must point",
> +	.errstr = "arg#0 pointer had unexpected modifiers",
>  	.fixup_kfunc_btf_id = {
>  		{ "bpf_kfunc_call_test_acquire", 3 },
>  		{ "bpf_kfunc_call_test_release", 5 },
> diff --git a/tools/testing/selftests/bpf/verifier/ref_tracking.c b/tools/testing/selftests/bpf/verifier/ref_tracking.c
> index fd683a32a276..d9367f2894b9 100644
> --- a/tools/testing/selftests/bpf/verifier/ref_tracking.c
> +++ b/tools/testing/selftests/bpf/verifier/ref_tracking.c
> @@ -142,7 +142,7 @@
>  	.kfunc = "bpf",
>  	.expected_attach_type = BPF_LSM_MAC,
>  	.flags = BPF_F_SLEEPABLE,
> -	.errstr = "arg#0 pointer type STRUCT bpf_key must point to scalar, or struct with scalar",
> +	.errstr = "arg#0 pointer had unexpected modifiers",
>  	.fixup_kfunc_btf_id = {
>  		{ "bpf_lookup_user_key", 2 },
>  		{ "bpf_key_put", 4 },
> @@ -163,7 +163,7 @@
>  	.kfunc = "bpf",
>  	.expected_attach_type = BPF_LSM_MAC,
>  	.flags = BPF_F_SLEEPABLE,
> -	.errstr = "arg#0 pointer type STRUCT bpf_key must point to scalar, or struct with scalar",
> +	.errstr = "arg#0 pointer had unexpected modifiers",
>  	.fixup_kfunc_btf_id = {
>  		{ "bpf_lookup_system_key", 1 },
>  		{ "bpf_key_put", 3 },
> -- 
> 2.38.1
> 


* Re: [PATCH bpf-next v7 0/3] Support storing struct task_struct objects as kptrs
  2022-11-18  1:41       ` David Vernet
@ 2022-11-18  6:04         ` John Fastabend
  2022-11-18 15:08           ` David Vernet
  0 siblings, 1 reply; 22+ messages in thread
From: John Fastabend @ 2022-11-18  6:04 UTC (permalink / raw)
  To: David Vernet, John Fastabend
  Cc: bpf, ast, andrii, daniel, martin.lau, memxor, yhs, song, sdf,
	kpsingh, jolsa, haoluo, tj, kernel-team, linux-kernel

David Vernet wrote:
> On Thu, Nov 17, 2022 at 02:36:50PM -0800, John Fastabend wrote:
> > David Vernet wrote:
> > > On Thu, Nov 17, 2022 at 01:03:45PM -0800, John Fastabend wrote:
> > > > David Vernet wrote:
> > > > > Now that BPF supports adding new kernel functions with kfuncs, and
> > > > > storing kernel objects in maps with kptrs, we can add a set of kfuncs
> > > > > which allow struct task_struct objects to be stored in maps as
> > > > > referenced kptrs.
> > > > > 
> > > > > The possible use cases for doing this are plentiful.  During tracing,
> > > > > for example, it would be useful to be able to collect some tasks that
> > > > > performed a certain operation, and then periodically summarize who they
> > > > > are, which cgroup they're in, how much CPU time they've utilized, etc.
> > > > > Doing this now would require storing the tasks' pids along with some
> > > > > relevant data to be exported to user space, and later associating the
> > > > > pids to tasks in other event handlers where the data is recorded.
> > > > > Another useful by-product of this is that it allows a program to pin a
> > > > > task in a BPF program, and by proxy therefore also e.g. pin its task
> > > > > local storage.
> > > > 
> > > > Sorry, this wasn't obvious to me (I'm late to the party, so apologies
> > > > if it was described in some other v*). Can we say something about the
> > > > life cycle of these acquired task_structs? Because they increment the
> > > > refcnt on the task struct, they have the potential to impact the system.
> > > 
> > > We should probably add an entire docs page which describes how kptrs
> > > work, and I am happy to do that (ideally in a follow-on patch set if
> > > that's OK with you). In general I think it would be useful to include
> > > docs for any general-purpose kfuncs like the ones proposed in this set.
> > 
> > Sure, I wouldn't require that for your series though fwiw.
> 
> Sounds good to me
> 
> [...]
> 
> > > > Quick question: if you put an acquired task struct in a map, what
> > > > happens if the user side deletes the entry? Presumably this causes the
> > > > release to happen and the task_struct is good to go. Did I miss
> > > > the logic? I was thinking you would have something in bpf_map_free_kptrs
> > > > and a type callback to release() the refcnt.
> > > 
> > > Someone else can chime in here to correct me if I'm wrong, but AFAIU
> > > this is handled by the map implementations calling out to
> > > bpf_obj_free_fields() to invoke the kptr destructor when the element is
> > > destroyed. See [3] and [4] for examples of where they're called from the
> > > arraymap and hashmap logic respectively. This is how the destructors are
> > > similarly invoked when the maps are destroyed.
> > 
> > Yep, I found that the dtor() gets populated in btf.c; apparently I needed
> > to re-pull my local tree because I missed it. Thanks for the detailed
> > response.
> > 
> > And the last thing I was checking: because KF_SLEEPABLE is not set,
> > this should be blocked from running on sleepable progs, which would
> > break the call_rcu in the destructor. Maybe a small nit, not sure
> > it's worth it, but it might be nice to annotate the helper description
> > with a note, "will not work on sleepable progs" or something to
> > that effect.
> 
> KF_SLEEPABLE is used to indicate whether the kfunc _itself_ may sleep,
> not whether the calling program can be sleepable. call_rcu() doesn't
> block, so no need to mark the kfunc as KF_SLEEPABLE. The key is that if
> a kfunc is sleepable, non-sleepable programs are not able to call it
> (and this is enforced in the verifier).

OK but should these helpers be allowed in sleepable progs? I think
not. What stops this (using your helpers):

  cpu0                                       cpu1
  ----
  v = insert_lookup_task(task)
  kptr = bpf_kptr_xchg(&v->task, NULL);
  if (!kptr)
    return 0;
                                            map_delete_elem()
                                               put_task()
                                                 rcu_call
  do_something_might_sleep()
                                                    put_task_struct
                                                      ... free  
  kptr->[free'd memory]
 
The insert_lookup_task() call will bump the refcnt via the acquire on map
insert. But the lookup doesn't do anything to the refcnt, and
map_delete_elem() will delete it. We have a check for spin_lock
types to stop them from being used in sleepable progs. Did I miss a
similar check for these?

Thanks again


* Re: [PATCH bpf-next v7 1/3] bpf: Allow trusted pointers to be passed to KF_TRUSTED_ARGS kfuncs
  2022-11-18  2:26   ` Alexei Starovoitov
@ 2022-11-18 14:45     ` David Vernet
  2022-11-18 16:45       ` David Vernet
  0 siblings, 1 reply; 22+ messages in thread
From: David Vernet @ 2022-11-18 14:45 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: bpf, ast, andrii, daniel, martin.lau, memxor, yhs, song, sdf,
	john.fastabend, kpsingh, jolsa, haoluo, tj, kernel-team,
	linux-kernel

On Thu, Nov 17, 2022 at 06:26:40PM -0800, Alexei Starovoitov wrote:
> On Wed, Nov 16, 2022 at 09:24:00PM -0600, David Vernet wrote:
> > Kfuncs currently support specifying the KF_TRUSTED_ARGS flag to signal
> > to the verifier that it should enforce that a BPF program passes it a
> > "safe", trusted pointer. Currently, "safe" means that the pointer is
> > either PTR_TO_CTX, or is refcounted. There may be cases, however, where
> > the kernel passes a BPF program a safe / trusted pointer to an object
> > that the BPF program wishes to use as a kptr, but because the object
> > does not yet have a ref_obj_id from the perspective of the verifier, the
> > program would be unable to pass it to a KF_ACQUIRE | KF_TRUSTED_ARGS
> > kfunc.
> > 
> > The solution is to expand the set of pointers that are considered
> > trusted according to KF_TRUSTED_ARGS, so that programs can invoke kfuncs
> > with these pointers without getting rejected by the verifier.
> > 
> > There is already a PTR_UNTRUSTED flag that is set in some scenarios,
> > such as when a BPF program reads a kptr directly from a map
> > without performing a bpf_kptr_xchg() call. These pointers of course can
> > and should be rejected by the verifier. Unfortunately, however,
> > PTR_UNTRUSTED does not cover all the cases for safety that need to
> > be addressed to adequately protect kfuncs. Specifically, pointers
> > obtained by a BPF program "walking" a struct are _not_ considered
> > PTR_UNTRUSTED according to BPF. For example, say that we were to add a
> > kfunc called bpf_task_acquire(), with KF_ACQUIRE | KF_TRUSTED_ARGS, to
> > acquire a struct task_struct *. If we only used PTR_UNTRUSTED to signal
> > that a task was unsafe to pass to a kfunc, the verifier would mistakenly
> > allow the following unsafe BPF program to be loaded:
> > 
> > SEC("tp_btf/task_newtask")
> > int BPF_PROG(unsafe_acquire_task,
> >              struct task_struct *task,
> >              u64 clone_flags)
> > {
> >         struct task_struct *acquired, *nested;
> > 
> >         nested = task->last_wakee;
> > 
> >         /* Would not be rejected by the verifier. */
> >         acquired = bpf_task_acquire(nested);
> >         if (!acquired)
> >                 return 0;
> > 
> >         bpf_task_release(acquired);
> >         return 0;
> > }
> > 
> > To address this, this patch defines a new type flag called PTR_TRUSTED
> > which tracks whether a PTR_TO_BTF_ID pointer is safe to pass to a
> > KF_TRUSTED_ARGS kfunc or a BPF helper function. PTR_TRUSTED pointers are
> > passed directly from the kernel as a tracepoint or struct_ops callback
> > argument. Any nested pointer that is obtained from walking a PTR_TRUSTED
> > pointer is no longer PTR_TRUSTED. From the example above, the struct
> > task_struct *task argument is PTR_TRUSTED, but the 'nested' pointer
> > obtained from 'task->last_wakee' is not PTR_TRUSTED.
> > 
> > A subsequent patch will add kfuncs for storing a task kfunc as a kptr,
> > and then another patch will add selftests to validate.
> > 
> > Signed-off-by: David Vernet <void@manifault.com>
> > ---
> >  Documentation/bpf/kfuncs.rst                  | 30 ++++-----
> >  include/linux/bpf.h                           | 30 +++++++++
> >  include/linux/btf.h                           | 65 ++++++++++++-------
> >  kernel/bpf/btf.c                              | 38 +++++++++--
> >  kernel/bpf/verifier.c                         | 45 ++++++++-----
> >  kernel/trace/bpf_trace.c                      |  2 +-
> >  net/ipv4/bpf_tcp_ca.c                         |  4 +-
> >  tools/testing/selftests/bpf/verifier/calls.c  |  2 +-
> >  .../selftests/bpf/verifier/ref_tracking.c     |  4 +-
> >  9 files changed, 154 insertions(+), 66 deletions(-)
> > 
> > diff --git a/Documentation/bpf/kfuncs.rst b/Documentation/bpf/kfuncs.rst
> > index 0f858156371d..67b7e2f46ec6 100644
> > --- a/Documentation/bpf/kfuncs.rst
> > +++ b/Documentation/bpf/kfuncs.rst
> > @@ -137,22 +137,20 @@ KF_ACQUIRE and KF_RET_NULL flags.
> >  --------------------------
> >  
> >  The KF_TRUSTED_ARGS flag is used for kfuncs taking pointer arguments. It
> > -indicates that the all pointer arguments will always have a guaranteed lifetime,
> > -and pointers to kernel objects are always passed to helpers in their unmodified
> > -form (as obtained from acquire kfuncs).
> > -
> > -It can be used to enforce that a pointer to a refcounted object acquired from a
> > -kfunc or BPF helper is passed as an argument to this kfunc without any
> > -modifications (e.g. pointer arithmetic) such that it is trusted and points to
> > -the original object.
> > -
> > -Meanwhile, it is also allowed pass pointers to normal memory to such kfuncs,
> > -but those can have a non-zero offset.
> > -
> > -This flag is often used for kfuncs that operate (change some property, perform
> > -some operation) on an object that was obtained using an acquire kfunc. Such
> > -kfuncs need an unchanged pointer to ensure the integrity of the operation being
> > -performed on the expected object.
> > +indicates that all pointer arguments are valid, and that all pointers to
> > +BTF objects have been passed in their unmodified form (that is, at a zero
> > +offset, and without having been obtained from walking another pointer).
> > +
> > +There are two types of pointers to kernel objects which are considered "valid":
> > +
> > +1. Pointers which are passed as tracepoint or struct_ops callback arguments.
> > +2. Pointers which were returned from a KF_ACQUIRE or KF_KPTR_GET kfunc.
> > +
> > +Pointers to non-BTF objects (e.g. scalar pointers) may also be passed to
> > +KF_TRUSTED_ARGS kfuncs, and may have a non-zero offset.
> > +
> > +The definition of "valid" pointers is subject to change at any time, and has
> > +absolutely no ABI stability guarantees.
> >  
> >  2.4.6 KF_SLEEPABLE flag
> >  -----------------------
> > diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> > index 54462dd28824..763ae250693e 100644
> > --- a/include/linux/bpf.h
> > +++ b/include/linux/bpf.h
> > @@ -524,6 +524,35 @@ enum bpf_type_flag {
> >  	/* Size is known at compile time. */
> >  	MEM_FIXED_SIZE		= BIT(10 + BPF_BASE_TYPE_BITS),
> >  
> > +	/* PTR was passed from the kernel in a trusted context, and may be
> > +	 * passed to KF_TRUSTED_ARGS kfuncs or BPF helper functions.
> > +	 * Confusingly, this is _not_ the opposite of PTR_UNTRUSTED above.
> > +	 * PTR_UNTRUSTED refers to a kptr that was read directly from a map
> > +	 * without invoking bpf_kptr_xchg(). What we really need to know is
> > +	 * whether a pointer is safe to pass to a kfunc or BPF helper function.
> > +	 * While PTR_UNTRUSTED pointers are unsafe to pass to kfuncs and BPF
> > +	 * helpers, they do not cover all possible instances of unsafe
> > +	 * pointers. For example, a pointer that was obtained from walking a
> > +	 * struct will _not_ get the PTR_UNTRUSTED type modifier, despite the
> > +	 * fact that it may be NULL, invalid, etc. This is due to backwards
> > +	 * compatibility requirements, as this was the behavior that was first
> > +	 * introduced when kptrs were added. The behavior is now considered
> > +	 * deprecated, and PTR_UNTRUSTED will eventually be removed.
> > +	 *
> > +	 * PTR_TRUSTED, on the other hand, is a pointer that the kernel
> > +	 * guarantees to be valid and safe to pass to kfuncs and BPF helpers.
> > +	 * For example, pointers passed to tracepoint arguments are considered
> > +	 * PTR_TRUSTED, as are pointers that are passed to struct_ops
> > +	 * callbacks. As alluded to above, pointers that are obtained from
> > +	 * walking PTR_TRUSTED pointers are _not_ trusted. For example, if a
> > +	 * struct task_struct *task is PTR_TRUSTED, then accessing
> > +	 * task->last_wakee will lose the PTR_TRUSTED modifier when it's stored
> > + * in a BPF register. Similarly, pointers passed to certain program
> > + * types such as kretprobes are not guaranteed to be valid, as they may
> > +	 * for example contain an object that was recently freed.
> > +	 */
> > +	PTR_TRUSTED		= BIT(11 + BPF_BASE_TYPE_BITS),
> > +
> >  	__BPF_TYPE_FLAG_MAX,
> >  	__BPF_TYPE_LAST_FLAG	= __BPF_TYPE_FLAG_MAX - 1,
> >  };
> > @@ -617,6 +646,7 @@ enum bpf_return_type {
> >  	RET_PTR_TO_RINGBUF_MEM_OR_NULL	= PTR_MAYBE_NULL | MEM_RINGBUF | RET_PTR_TO_MEM,
> >  	RET_PTR_TO_DYNPTR_MEM_OR_NULL	= PTR_MAYBE_NULL | RET_PTR_TO_MEM,
> >  	RET_PTR_TO_BTF_ID_OR_NULL	= PTR_MAYBE_NULL | RET_PTR_TO_BTF_ID,
> > +	RET_PTR_TO_BTF_ID_TRUSTED	= PTR_TRUSTED	 | RET_PTR_TO_BTF_ID,
> >  
> >  	/* This must be the last entry. Its purpose is to ensure the enum is
> >  	 * wide enough to hold the higher bits reserved for bpf_type_flag.
> > diff --git a/include/linux/btf.h b/include/linux/btf.h
> > index d80345fa566b..13b969e74d3b 100644
> > --- a/include/linux/btf.h
> > +++ b/include/linux/btf.h
> > @@ -17,36 +17,53 @@
> >  #define KF_RELEASE	(1 << 1) /* kfunc is a release function */
> >  #define KF_RET_NULL	(1 << 2) /* kfunc returns a pointer that may be NULL */
> >  #define KF_KPTR_GET	(1 << 3) /* kfunc returns reference to a kptr */
> > -/* Trusted arguments are those which are meant to be referenced arguments with
> > - * unchanged offset. It is used to enforce that pointers obtained from acquire
> > - * kfuncs remain unmodified when being passed to helpers taking trusted args.
> > +/* Trusted arguments are those which are guaranteed to be valid when passed to
> > + * the kfunc. This flag is used to enforce that pointers obtained from either acquire
> > + * kfuncs, or from the main kernel on a tracepoint or struct_ops callback
> > + * invocation, remain unmodified when being passed to helpers taking trusted
> > + * args.
> >   *
> > - * Consider
> > - *	struct foo {
> > - *		int data;
> > - *		struct foo *next;
> > - *	};
> > + * Consider, for example, the following new task tracepoint:
> >   *
> > - *	struct bar {
> > - *		int data;
> > - *		struct foo f;
> > - *	};
> > + *	SEC("tp_btf/task_newtask")
> > + *	int BPF_PROG(new_task_tp, struct task_struct *task, u64 clone_flags)
> > + *	{
> > + *		...
> > + *	}
> >   *
> > - *	struct foo *f = alloc_foo(); // Acquire kfunc
> > - *	struct bar *b = alloc_bar(); // Acquire kfunc
> > + * And the following kfunc:
> >   *
> > - * If a kfunc set_foo_data() wants to operate only on the allocated object, it
> > - * will set the KF_TRUSTED_ARGS flag, which will prevent unsafe usage like:
> > + *	BTF_ID_FLAGS(func, bpf_task_acquire, KF_ACQUIRE | KF_TRUSTED_ARGS)
> >   *
> > - *	set_foo_data(f, 42);	   // Allowed
> > - *	set_foo_data(f->next, 42); // Rejected, non-referenced pointer
> > - *	set_foo_data(&f->next, 42);// Rejected, referenced, but wrong type
> > - *	set_foo_data(&b->f, 42);   // Rejected, referenced, but bad offset
> > + * All invocations to the kfunc must pass the unmodified, unwalked task:
> >   *
> > - * In the final case, usually for the purposes of type matching, it is deduced
> > - * by looking at the type of the member at the offset, but due to the
> > - * requirement of trusted argument, this deduction will be strict and not done
> > - * for this case.
> > + *	bpf_task_acquire(task);		    // Allowed
> > + *	bpf_task_acquire(task->last_wakee); // Rejected, walked task
> > + *
> > + * Programs may also pass referenced tasks directly to the kfunc:
> > + *
> > + *	struct task_struct *acquired;
> > + *
> > + *	acquired = bpf_task_acquire(task);	// Allowed, same as above
> > + *	bpf_task_acquire(acquired);		// Allowed
> > + *	bpf_task_acquire(task);			// Allowed
> > + *	bpf_task_acquire(acquired->last_wakee); // Rejected, walked task
> > + *
> > + * Programs may _not_, however, pass a task from an arbitrary fentry/fexit,
> > + * or kprobe/kretprobe to the kfunc, as BPF cannot guarantee that such
> > + * pointers are safe. For example, the following BPF program would be
> > + * rejected:
> > + *
> > + * SEC("kretprobe/free_task")
> > + * int BPF_PROG(free_task_probe, struct task_struct *tsk)
> > + * {
> > + *	struct task_struct *acquired;
> > + *
> > + *	acquired = bpf_task_acquire(tsk);	// Rejected, not a trusted pointer
> > + *	bpf_task_release(acquired);
> > + *
> > + *	return 0;
> > + * }
> >   */
> >  #define KF_TRUSTED_ARGS (1 << 4) /* kfunc only takes trusted pointer arguments */
> >  #define KF_SLEEPABLE    (1 << 5) /* kfunc may sleep */
> > diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c
> > index 875355ff3718..8291f2911624 100644
> > --- a/kernel/bpf/btf.c
> > +++ b/kernel/bpf/btf.c
> > @@ -5579,6 +5579,11 @@ static u32 get_ctx_arg_idx(struct btf *btf, const struct btf_type *func_proto,
> >  	return nr_args + 1;
> >  }
> >  
> > +static bool prog_type_args_trusted(enum bpf_prog_type prog_type)
> > +{
> > +	return prog_type == BPF_PROG_TYPE_TRACING || prog_type == BPF_PROG_TYPE_STRUCT_OPS;
> > +}
> > +
> >  bool btf_ctx_access(int off, int size, enum bpf_access_type type,
> >  		    const struct bpf_prog *prog,
> >  		    struct bpf_insn_access_aux *info)
> > @@ -5722,6 +5727,9 @@ bool btf_ctx_access(int off, int size, enum bpf_access_type type,
> >  	}
> >  
> >  	info->reg_type = PTR_TO_BTF_ID;
> > +	if (prog_type_args_trusted(prog->type))
> > +		info->reg_type |= PTR_TRUSTED;
> > +
> >  	if (tgt_prog) {
> >  		enum bpf_prog_type tgt_type;
> >  
> > @@ -6558,15 +6566,26 @@ static int btf_check_func_arg_match(struct bpf_verifier_env *env,
> >  		/* These register types have special constraints wrt ref_obj_id
> >  		 * and offset checks. The rest of trusted args don't.
> >  		 */
> > -		obj_ptr = reg->type == PTR_TO_CTX || reg->type == PTR_TO_BTF_ID ||
> > +		obj_ptr = reg->type == PTR_TO_CTX ||
> > +			  base_type(reg->type) == PTR_TO_BTF_ID ||
> >  			  reg2btf_ids[base_type(reg->type)];
> >  
> >  		/* Check if argument must be a referenced pointer, args + i has
> >  		 * been verified to be a pointer (after skipping modifiers).
> >  		 * PTR_TO_CTX is ok without having non-zero ref_obj_id.
> > +		 *
> > +		 * All object pointers must be refcounted, other than:
> > +		 * - PTR_TO_CTX
> > +		 * - PTR_TRUSTED pointers
> >  		 */
> > -		if (is_kfunc && trusted_args && (obj_ptr && reg->type != PTR_TO_CTX) && !reg->ref_obj_id) {
> > -			bpf_log(log, "R%d must be referenced\n", regno);
> > +		if (is_kfunc &&
> > +		    trusted_args &&
> > +		    obj_ptr &&
> > +		    base_type(reg->type) != PTR_TO_CTX &&
> > +		    (!(type_flag(reg->type) & PTR_TRUSTED) ||
> > +		     (type_flag(reg->type) & ~PTR_TRUSTED)) &&
> > +		    !reg->ref_obj_id) {
> 
> This is pretty hard to read.
> Is this checking:
> !(reg->type == PTR_TO_BTF_ID || reg->type == (PTR_TO_BTF_ID | PTR_TRUSTED))
> ?
> 
> Why not use the above?

Agreed, this is more readable. I'll do this for v8 (via a helper, as you
suggested).

> Similar in other places... type_flag(reg->type) & ~PTR_TRUSTED is not easy to read.
> Maybe add a helper that will do
> bool ff(reg)
> { 
>   return reg->type == PTR_TO_BTF_ID || reg->type == (PTR_TO_BTF_ID | PTR_TRUSTED);
> }
> 
> ?

Sure, will do.

> > +			bpf_log(log, "R%d must be referenced or trusted\n", regno);
> >  			return -EINVAL;
> >  		}
> >  
> > @@ -6646,8 +6665,8 @@ static int btf_check_func_arg_match(struct bpf_verifier_env *env,
> >  					i, btf_type_str(t));
> >  				return -EINVAL;
> >  			}
> > -		} else if (is_kfunc && (reg->type == PTR_TO_BTF_ID ||
> > -			   (reg2btf_ids[base_type(reg->type)] && !type_flag(reg->type)))) {
> > +		} else if (is_kfunc && (base_type(reg->type) == PTR_TO_BTF_ID ||
> > +			   (reg2btf_ids[base_type(reg->type)]))) {
> >  			const struct btf_type *reg_ref_t;
> >  			const struct btf *reg_btf;
> >  			const char *reg_ref_tname;
> > @@ -6660,7 +6679,13 @@ static int btf_check_func_arg_match(struct bpf_verifier_env *env,
> >  				return -EINVAL;
> >  			}
> >  
> > -			if (reg->type == PTR_TO_BTF_ID) {
> > +			if ((type_flag(reg->type) & ~PTR_TRUSTED)) {
> 
> and use that helper here?

I don't think that specific helper would work here because we also need
to verify that no type modifiers other than PTR_TRUSTED are present for
when reg2btf_ids[base_type(reg->type)] is non-NULL.

> 
> > +				bpf_log(log, "kernel function %s arg#%d pointer had unexpected modifiers %d\n",
> > +					func_name, i, type_flag(reg->type));
> > +				return -EINVAL;
> > +			}
> > +
> > +			if (base_type(reg->type) == PTR_TO_BTF_ID) {
> >  				reg_btf = reg->btf;
> >  				reg_ref_id = reg->btf_id;
> >  			} else {
> > @@ -6988,6 +7013,7 @@ int btf_prepare_func_args(struct bpf_verifier_env *env, int subprog,
> >  			}
> >  
> >  			reg->type = PTR_TO_MEM | PTR_MAYBE_NULL;
> > +
> >  			reg->id = ++env->id_gen;
> >  
> >  			continue;
> > diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> > index 0312d9ce292f..f5b6b1f969d9 100644
> > --- a/kernel/bpf/verifier.c
> > +++ b/kernel/bpf/verifier.c
> > @@ -543,7 +543,7 @@ static bool is_cmpxchg_insn(const struct bpf_insn *insn)
> >  static const char *reg_type_str(struct bpf_verifier_env *env,
> >  				enum bpf_reg_type type)
> >  {
> > -	char postfix[16] = {0}, prefix[32] = {0};
> > +	char postfix[16] = {0}, prefix[64] = {0};
> >  	static const char * const str[] = {
> >  		[NOT_INIT]		= "?",
> >  		[SCALAR_VALUE]		= "scalar",
> > @@ -575,16 +575,14 @@ static const char *reg_type_str(struct bpf_verifier_env *env,
> >  			strncpy(postfix, "_or_null", 16);
> >  	}
> >  
> > -	if (type & MEM_RDONLY)
> > -		strncpy(prefix, "rdonly_", 32);
> > -	if (type & MEM_RINGBUF)
> > -		strncpy(prefix, "ringbuf_", 32);
> > -	if (type & MEM_USER)
> > -		strncpy(prefix, "user_", 32);
> > -	if (type & MEM_PERCPU)
> > -		strncpy(prefix, "percpu_", 32);
> > -	if (type & PTR_UNTRUSTED)
> > -		strncpy(prefix, "untrusted_", 32);
> > +	snprintf(prefix, sizeof(prefix), "%s%s%s%s%s%s",
> > +		 type & MEM_RDONLY ? "rdonly_" : "",
> > +		 type & MEM_RINGBUF ? "ringbuf_" : "",
> > +		 type & MEM_USER ? "user_" : "",
> > +		 type & MEM_PERCPU ? "percpu_" : "",
> > +		 type & PTR_UNTRUSTED ? "untrusted_" : "",
> > +		 type & PTR_TRUSTED ? "trusted_" : ""
> > +	);
> >  
> >  	snprintf(env->type_str_buf, TYPE_STR_BUF_LEN, "%s%s%s",
> >  		 prefix, str[base_type(type)], postfix);
> > @@ -3844,7 +3842,7 @@ static int map_kptr_match_type(struct bpf_verifier_env *env,
> >  			       struct bpf_reg_state *reg, u32 regno)
> >  {
> >  	const char *targ_name = kernel_type_name(kptr_field->kptr.btf, kptr_field->kptr.btf_id);
> > -	int perm_flags = PTR_MAYBE_NULL;
> > +	int perm_flags = PTR_MAYBE_NULL | PTR_TRUSTED;
> >  	const char *reg_name = "";
> >  
> >  	/* Only unreferenced case accepts untrusted pointers */
> > @@ -4707,6 +4705,9 @@ static int check_ptr_to_btf_access(struct bpf_verifier_env *env,
> >  	if (type_flag(reg->type) & PTR_UNTRUSTED)
> >  		flag |= PTR_UNTRUSTED;
> >  
> > +	/* Any pointer obtained from walking a trusted pointer is no longer trusted. */
> > +	flag &= ~PTR_TRUSTED;
> > +
> >  	if (atype == BPF_READ && value_regno >= 0)
> >  		mark_btf_ld_reg(env, regs, value_regno, ret, reg->btf, btf_id, flag);
> >  
> > @@ -5774,6 +5775,7 @@ static const struct bpf_reg_types btf_id_sock_common_types = {
> >  		PTR_TO_TCP_SOCK,
> >  		PTR_TO_XDP_SOCK,
> >  		PTR_TO_BTF_ID,
> > +		PTR_TO_BTF_ID | PTR_TRUSTED,
> >  	},
> >  	.btf_id = &btf_sock_ids[BTF_SOCK_TYPE_SOCK_COMMON],
> >  };
> > @@ -5807,9 +5809,19 @@ static const struct bpf_reg_types scalar_types = { .types = { SCALAR_VALUE } };
> >  static const struct bpf_reg_types context_types = { .types = { PTR_TO_CTX } };
> >  static const struct bpf_reg_types ringbuf_mem_types = { .types = { PTR_TO_MEM | MEM_RINGBUF } };
> >  static const struct bpf_reg_types const_map_ptr_types = { .types = { CONST_PTR_TO_MAP } };
> > -static const struct bpf_reg_types btf_ptr_types = { .types = { PTR_TO_BTF_ID } };
> > +static const struct bpf_reg_types btf_ptr_types = {
> > +	.types = {
> > +		PTR_TO_BTF_ID,
> > +		PTR_TO_BTF_ID | PTR_TRUSTED,
> > +	},
> > +};
> >  static const struct bpf_reg_types spin_lock_types = { .types = { PTR_TO_MAP_VALUE } };
> > -static const struct bpf_reg_types percpu_btf_ptr_types = { .types = { PTR_TO_BTF_ID | MEM_PERCPU } };
> > +static const struct bpf_reg_types percpu_btf_ptr_types = {
> > +	.types = {
> > +		PTR_TO_BTF_ID | MEM_PERCPU,
> > +		PTR_TO_BTF_ID | MEM_PERCPU | PTR_TRUSTED,
> > +	}
> > +};
> >  static const struct bpf_reg_types func_ptr_types = { .types = { PTR_TO_FUNC } };
> >  static const struct bpf_reg_types stack_ptr_types = { .types = { PTR_TO_STACK } };
> >  static const struct bpf_reg_types const_str_ptr_types = { .types = { PTR_TO_MAP_VALUE } };
> > @@ -5897,7 +5909,7 @@ static int check_reg_type(struct bpf_verifier_env *env, u32 regno,
> >  	return -EACCES;
> >  
> >  found:
> > -	if (reg->type == PTR_TO_BTF_ID) {
> > +	if (base_type(reg->type) == PTR_TO_BTF_ID && !(type_flag(reg->type) & ~PTR_TRUSTED)) {
> >  		/* For bpf_sk_release, it needs to match against first member
> >  		 * 'struct sock_common', hence make an exception for it. This
> >  		 * allows bpf_sk_release to work for multiple socket types.
> > @@ -5973,6 +5985,7 @@ int check_func_arg_reg_off(struct bpf_verifier_env *env,
> >  	 * fixed offset.
> >  	 */
> >  	case PTR_TO_BTF_ID:
> > +	case PTR_TO_BTF_ID | PTR_TRUSTED:
> >  		/* When referenced PTR_TO_BTF_ID is passed to release function,
> >  		 * it's fixed offset must be 0.	In the other cases, fixed offset
> >  		 * can be non-zero.
> > @@ -13690,6 +13703,8 @@ static int convert_ctx_accesses(struct bpf_verifier_env *env)
> >  			break;
> >  		case PTR_TO_BTF_ID:
> >  		case PTR_TO_BTF_ID | PTR_UNTRUSTED:
> > +		case PTR_TO_BTF_ID | PTR_TRUSTED:
> > +		case PTR_TO_BTF_ID | PTR_UNTRUSTED | PTR_TRUSTED:
> >  			if (type == BPF_READ) {
> >  				insn->code = BPF_LDX | BPF_PROBE_MEM |
> >  					BPF_SIZE((insn)->code);
> > diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
> > index f2d8d070d024..5b9008bc597b 100644
> > --- a/kernel/trace/bpf_trace.c
> > +++ b/kernel/trace/bpf_trace.c
> > @@ -774,7 +774,7 @@ BPF_CALL_0(bpf_get_current_task_btf)
> >  const struct bpf_func_proto bpf_get_current_task_btf_proto = {
> >  	.func		= bpf_get_current_task_btf,
> >  	.gpl_only	= true,
> > -	.ret_type	= RET_PTR_TO_BTF_ID,
> > +	.ret_type	= RET_PTR_TO_BTF_ID_TRUSTED,
> >  	.ret_btf_id	= &btf_tracing_ids[BTF_TRACING_TYPE_TASK],
> >  };
> >  
> > diff --git a/net/ipv4/bpf_tcp_ca.c b/net/ipv4/bpf_tcp_ca.c
> > index d15c91de995f..0006b5438ff7 100644
> > --- a/net/ipv4/bpf_tcp_ca.c
> > +++ b/net/ipv4/bpf_tcp_ca.c
> > @@ -61,7 +61,9 @@ static bool bpf_tcp_ca_is_valid_access(int off, int size,
> >  	if (!bpf_tracing_btf_ctx_access(off, size, type, prog, info))
> >  		return false;
> >  
> > -	if (info->reg_type == PTR_TO_BTF_ID && info->btf_id == sock_id)
> > +	if (base_type(info->reg_type) == PTR_TO_BTF_ID &&
> > +	    !(type_flag(info->reg_type) & ~PTR_TRUSTED) &&
> > +	    info->btf_id == sock_id)
> >  		/* promote it to tcp_sock */
> >  		info->btf_id = tcp_sock_id;
> >  
> > diff --git a/tools/testing/selftests/bpf/verifier/calls.c b/tools/testing/selftests/bpf/verifier/calls.c
> > index e1a937277b54..7ac947f00df4 100644
> > --- a/tools/testing/selftests/bpf/verifier/calls.c
> > +++ b/tools/testing/selftests/bpf/verifier/calls.c
> > @@ -109,7 +109,7 @@
> >  	},
> >  	.prog_type = BPF_PROG_TYPE_SCHED_CLS,
> >  	.result = REJECT,
> > -	.errstr = "arg#0 pointer type STRUCT prog_test_ref_kfunc must point",
> > +	.errstr = "arg#0 pointer had unexpected modifiers",
> >  	.fixup_kfunc_btf_id = {
> >  		{ "bpf_kfunc_call_test_acquire", 3 },
> >  		{ "bpf_kfunc_call_test_release", 5 },
> > diff --git a/tools/testing/selftests/bpf/verifier/ref_tracking.c b/tools/testing/selftests/bpf/verifier/ref_tracking.c
> > index fd683a32a276..d9367f2894b9 100644
> > --- a/tools/testing/selftests/bpf/verifier/ref_tracking.c
> > +++ b/tools/testing/selftests/bpf/verifier/ref_tracking.c
> > @@ -142,7 +142,7 @@
> >  	.kfunc = "bpf",
> >  	.expected_attach_type = BPF_LSM_MAC,
> >  	.flags = BPF_F_SLEEPABLE,
> > -	.errstr = "arg#0 pointer type STRUCT bpf_key must point to scalar, or struct with scalar",
> > +	.errstr = "arg#0 pointer had unexpected modifiers",
> >  	.fixup_kfunc_btf_id = {
> >  		{ "bpf_lookup_user_key", 2 },
> >  		{ "bpf_key_put", 4 },
> > @@ -163,7 +163,7 @@
> >  	.kfunc = "bpf",
> >  	.expected_attach_type = BPF_LSM_MAC,
> >  	.flags = BPF_F_SLEEPABLE,
> > -	.errstr = "arg#0 pointer type STRUCT bpf_key must point to scalar, or struct with scalar",
> > +	.errstr = "arg#0 pointer had unexpected modifiers",
> >  	.fixup_kfunc_btf_id = {
> >  		{ "bpf_lookup_system_key", 1 },
> >  		{ "bpf_key_put", 3 },
> > -- 
> > 2.38.1
> > 


* Re: [PATCH bpf-next v7 3/3] bpf/selftests: Add selftests for new task kfuncs
  2022-11-18  2:21   ` Alexei Starovoitov
@ 2022-11-18 14:49     ` David Vernet
  0 siblings, 0 replies; 22+ messages in thread
From: David Vernet @ 2022-11-18 14:49 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: bpf, ast, andrii, daniel, martin.lau, memxor, yhs, song, sdf,
	john.fastabend, kpsingh, jolsa, haoluo, tj, kernel-team,
	linux-kernel

On Thu, Nov 17, 2022 at 06:21:00PM -0800, Alexei Starovoitov wrote:

[...]

> > +static inline int tasks_kfunc_map_insert(struct task_struct *p)
> > +{
> > +	struct __tasks_kfunc_map_value local, *v;
> > +	long status;
> > +	struct task_struct *acquired, *old;
> > +	s32 pid;
> > +
> > +	status = bpf_probe_read_kernel(&pid, sizeof(pid), &p->pid);
> > +	if (status)
> > +		return status;
> > +
> > +	local.task = NULL;
> > +	status = bpf_map_update_elem(&__tasks_kfunc_map, &pid, &local, BPF_NOEXIST);
> > +	if (status)
> > +		return status;
> > +
> > +	v = bpf_map_lookup_elem(&__tasks_kfunc_map, &pid);
> > +	if (!v) {
> > +		bpf_map_delete_elem(&__tasks_kfunc_map, &pid);
> > +		return status;
> 
> here it will return 0, but probably should be returning error?

Ah, yes this should be returning -ENOENT. Thanks for catching this.

[...]

> > +SEC("tp_btf/task_newtask")
> > +int BPF_PROG(task_kfunc_release_null, struct task_struct *task, u64 clone_flags)
> > +{
> > +	struct __tasks_kfunc_map_value local, *v;
> > +	long status;
> > +	struct task_struct *acquired, *old;
> > +	s32 pid;
> > +
> > +	status = bpf_probe_read_kernel(&pid, sizeof(pid), &task->pid);
> > +	if (status)
> > +		return 0;
> > +
> > +	local.task = NULL;
> > +	status = bpf_map_update_elem(&__tasks_kfunc_map, &pid, &local, BPF_NOEXIST);
> > +	if (status)
> > +		return status;
> > +
> > +	v = bpf_map_lookup_elem(&__tasks_kfunc_map, &pid);
> > +	if (!v)
> > +		return status;
> 
> should be return error instead?

Yep, here as well.

I'll fix both of these in v8.
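Roughly like this for the first one (just a sketch of the fix; the second
testcase would analogously return -ENOENT from its failed-lookup branch):

	v = bpf_map_lookup_elem(&__tasks_kfunc_map, &pid);
	if (!v) {
		bpf_map_delete_elem(&__tasks_kfunc_map, &pid);
		return -ENOENT;
	}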

[...]


* Re: [PATCH bpf-next v7 0/3] Support storing struct task_struct objects as kptrs
  2022-11-18  6:04         ` John Fastabend
@ 2022-11-18 15:08           ` David Vernet
  2022-11-18 18:31             ` Alexei Starovoitov
  0 siblings, 1 reply; 22+ messages in thread
From: David Vernet @ 2022-11-18 15:08 UTC (permalink / raw)
  To: John Fastabend
  Cc: bpf, ast, andrii, daniel, martin.lau, memxor, yhs, song, sdf,
	kpsingh, jolsa, haoluo, tj, kernel-team, linux-kernel

On Thu, Nov 17, 2022 at 10:04:27PM -0800, John Fastabend wrote:

[...]

> > > And the last thing I was checking: because KF_SLEEPABLE is not set,
> > > this should be blocked from running on sleepable progs, which would
> > > break the call_rcu in the destructor. Maybe a small nit, not sure
> > > it's worth it, but it might be nice to annotate the helper description
> > > with a note, "will not work on sleepable progs" or something to
> > > that effect.
> > 
> > KF_SLEEPABLE is used to indicate whether the kfunc _itself_ may sleep,
> > not whether the calling program can be sleepable. call_rcu() doesn't
> > block, so no need to mark the kfunc as KF_SLEEPABLE. The key is that if
> > a kfunc is sleepable, non-sleepable programs are not able to call it
> > (and this is enforced in the verifier).
> 
> OK but should these helpers be allowed in sleepable progs? I think
> not. What stops this (using your helpers):
> 
>   cpu0                                       cpu1
>   ----
>   v = insert_lookup_task(task)
>   kptr = bpf_kptr_xchg(&v->task, NULL);
>   if (!kptr)
>     return 0;
>                                             map_delete_elem()
>                                                put_task()
>                                                  rcu_call
>   do_something_might_sleep()
>                                                     put_task_struct
>                                                       ... free  
>   kptr->[free'd memory]
>  
> The insert_lookup_task() call will bump the refcnt via the acquire on map
> insert. But the lookup doesn't do anything to the refcnt, and
> map_delete_elem() will delete it. We have a check for spin_lock
> types to stop them from being used in sleepable progs. Did I miss a
> similar check for these?

So, in your example above, bpf_kptr_xchg(&v->task, NULL) will atomically
xchg the kptr from the map, and so the map_delete_elem() call would fail
with (something like) -ENOENT. In general, the semantics are similar to
std::unique_ptr::swap() in C++.
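To make that concrete, here's a sketch of the pattern (assuming a map
value type with a struct task_struct __kptr_ref *task field, as in the
selftests in this series):

	struct task_struct *kptr;

	/* Atomically take ownership of the kptr stored in the map
	 * value. After this, the map no longer holds the reference, so
	 * a racing map_delete_elem() has nothing left to release.
	 */
	kptr = bpf_kptr_xchg(&v->task, NULL);
	if (!kptr)
		/* Someone else xchg'd or deleted it first. */
		return 0;

	/* We now own the reference, and must eventually release it. */
	bpf_task_release(kptr);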

FWIW, I think KF_KPTR_GET kfuncs are the more complex / racy kfuncs to
reason about. The reason is that we're passing a pointer to the map
value containing a kptr directly to the kfunc (with the attempt of
acquiring an additional reference if a kptr was already present in the
map) rather than doing an xchg which atomically gets us the unique
pointer if nobody else xchgs it in first. So with KF_KPTR_GET, someone
else could come along and delete the kptr from the map while the kfunc
is trying to acquire that additional reference. The race looks something
like this:

   cpu0                                       cpu1
   ----
   v = insert_lookup_task(task)
   kptr = bpf_task_kptr_get(&v->task);
                                             map_delete_elem()
                                                put_task()
                                                  rcu_call
                                                     put_task_struct
                                                       ... free  
   if (!kptr)
     /* In this race example, this path will be taken. */
     return 0;

The difference is that here, we're not doing an atomic xchg of the kptr
out of the map. Instead, we're passing a pointer to the map value
containing the kptr directly to bpf_task_kptr_get(), which itself tries
to acquire an additional reference on the task to return to the program
as a kptr. This is still safe, however, as bpf_task_kptr_get() uses RCU
and refcount_inc_not_zero() in the bpf_task_kptr_get() kfunc to ensure
that it can't hit a UAF, and that it won't return a dying task to the
caller:

/**
 * bpf_task_kptr_get - Acquire a reference on a struct task_struct kptr. A task
 * kptr acquired by this kfunc which is not subsequently stored in a map, must
 * be released by calling bpf_task_release().
 * @pp: A pointer to a task kptr on which a reference is being acquired.
 */
__used noinline
struct task_struct *bpf_task_kptr_get(struct task_struct **pp)
{
        struct task_struct *p;

        rcu_read_lock();
        p = READ_ONCE(*pp);

	/* <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
	 * cpu1 could remove the element from the map here, and invoke
	 * put_task_struct_rcu_user(). We're in an RCU read region
	 * though, so the task won't be freed until at the very
	 * earliest, the rcu_read_unlock() below.
	 * >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
	 */

        if (p && !refcount_inc_not_zero(&p->rcu_users))
		/* <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
		 * refcount_inc_not_zero() will return false, as cpu1
		 * deleted the element from the map and dropped its last
		 * refcount. So we just return NULL as the task will be
		 * deleted once an RCU gp has elapsed.
		 * >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
		 */
                p = NULL;
        rcu_read_unlock();

        return p;
}

Let me know if that makes sense. This stuff is tricky, and I plan to
document it clearly and thoroughly on the kptr docs page once this patch
set lands.


* Re: [PATCH bpf-next v7 1/3] bpf: Allow trusted pointers to be passed to KF_TRUSTED_ARGS kfuncs
  2022-11-18 14:45     ` David Vernet
@ 2022-11-18 16:45       ` David Vernet
  2022-11-18 18:45         ` Alexei Starovoitov
  0 siblings, 1 reply; 22+ messages in thread
From: David Vernet @ 2022-11-18 16:45 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: bpf, ast, andrii, daniel, martin.lau, memxor, yhs, song, sdf,
	john.fastabend, kpsingh, jolsa, haoluo, tj, kernel-team,
	linux-kernel

On Fri, Nov 18, 2022 at 08:45:44AM -0600, David Vernet wrote:

[...]

> > >  bool btf_ctx_access(int off, int size, enum bpf_access_type type,
> > >  		    const struct bpf_prog *prog,
> > >  		    struct bpf_insn_access_aux *info)
> > > @@ -5722,6 +5727,9 @@ bool btf_ctx_access(int off, int size, enum bpf_access_type type,
> > >  	}
> > >  
> > >  	info->reg_type = PTR_TO_BTF_ID;
> > > +	if (prog_type_args_trusted(prog->type))
> > > +		info->reg_type |= PTR_TRUSTED;
> > > +
> > >  	if (tgt_prog) {
> > >  		enum bpf_prog_type tgt_type;
> > >  
> > > @@ -6558,15 +6566,26 @@ static int btf_check_func_arg_match(struct bpf_verifier_env *env,
> > >  		/* These register types have special constraints wrt ref_obj_id
> > >  		 * and offset checks. The rest of trusted args don't.
> > >  		 */
> > > -		obj_ptr = reg->type == PTR_TO_CTX || reg->type == PTR_TO_BTF_ID ||
> > > +		obj_ptr = reg->type == PTR_TO_CTX ||
> > > +			  base_type(reg->type) == PTR_TO_BTF_ID ||
> > >  			  reg2btf_ids[base_type(reg->type)];
> > >  
> > >  		/* Check if argument must be a referenced pointer, args + i has
> > >  		 * been verified to be a pointer (after skipping modifiers).
> > >  		 * PTR_TO_CTX is ok without having non-zero ref_obj_id.
> > > +		 *
> > > +		 * All object pointers must be refcounted, other than:
> > > +		 * - PTR_TO_CTX
> > > +		 * - PTR_TRUSTED pointers
> > >  		 */
> > > -		if (is_kfunc && trusted_args && (obj_ptr && reg->type != PTR_TO_CTX) && !reg->ref_obj_id) {
> > > -			bpf_log(log, "R%d must be referenced\n", regno);
> > > +		if (is_kfunc &&
> > > +		    trusted_args &&
> > > +		    obj_ptr &&
> > > +		    base_type(reg->type) != PTR_TO_CTX &&
> > > +		    (!(type_flag(reg->type) & PTR_TRUSTED) ||
> > > +		     (type_flag(reg->type) & ~PTR_TRUSTED)) &&
> > > +		    !reg->ref_obj_id) {
> > 
> > This is pretty hard to read.
> > Is this checking:
> > !(reg->type == PTR_TO_BTF_ID || reg->type == (PTR_TO_BTF_ID | PTR_TRUSTED))
> > ?
> > 
> > Why not to use the above?
> 
> Agreed this is more readable, I'll do this for v8 (from a helper as you
> suggested).

Sorry, my initial response was incorrect. After thinking about this
more, I don't think this conditional would be correct here:

	!(reg->type == PTR_TO_BTF_ID || reg->type == (PTR_TO_BTF_ID | PTR_TRUSTED))

That conditional is saying, "If it's PTR_TO_BTF_ID, and either no
modifiers are set, or PTR_TRUSTED is set". Or in other words, "If
PTR_TO_BTF_ID is set, we don't need a refcount check unless a modifier
other than PTR_TRUSTED is set on the register." This is incorrect, as it
would short-circuit out of the check before !reg->ref_obj_id for
reg->type == PTR_TO_BTF_ID, so we would skip the reference requirement
for normal, unmodified PTR_TO_BTF_ID objects. It would also cause us to
incorrectly _not_ skip the ref_obj_id > 0 check for when a
reg2btf_ids[base_type(reg->type)] register has the PTR_TRUSTED modifier.

What we really need is a check that encodes, "Don't require a refcount
if PTR_TRUSTED is present and no other type modifiers are present",
i.e.:

	!(type_flag(reg->type) & PTR_TRUSTED) || (type_flag(reg->type) & ~PTR_TRUSTED)

My intention was to be conservative here and say "only trust PTR_TRUSTED
if no other type modifiers are set". I think this is necessary because
other type modifiers such as PTR_UNTRUSTED could theoretically be set on
the register as well. Clearly this code is pretty difficult to reason
about though, so I'm open to suggestions for how to simplify it.
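One option might be a helper that spells out exactly that condition
(just a sketch, not necessarily what will land in v8):

/* True iff PTR_TRUSTED is set and no other type modifier is. */
static bool is_trusted_reg(const struct bpf_reg_state *reg)
{
	return type_flag(reg->type) == PTR_TRUSTED;
}

The refcount check above would then read
"... && !is_trusted_reg(reg) && !reg->ref_obj_id".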

I'll point out specifically that it's difficult to reason about when
modifiers are or are not safe to allow. For example, we definitely don't
want to skip the refcount check for OBJ_RELEASE | PTR_TRUSTED, because
if it's a release arg it should always have a refcount on it.
PTR_UNTRUSTED | PTR_TRUSTED would also make no sense. MEM_FIXED_SIZE
though seems fine? In general, I thought it was prudent for us to take
the most conservative possible approach here, which is that PTR_TRUSTED
only applies when no other modifiers are present, and it applies for all
obj_ptr types (other than PTR_TO_CTX which does its own thing).

Note as well that this check is different from the one you pointed out
below, which is verifying that PTR_TRUSTED is the only modifier for both
reg2btf_ids[base_type(reg->type)] and base_type(reg->type) ==
PTR_TO_BTF_ID. Additionally, the check is different from the check in
check_reg_type(), which I'll highlight below where the code is actually
modified.

> 
> > Similar in other places... type_flag(reg->type) & ~PTR_TRUSTED is not easy.
> > Maybe add a helper that will do
> > bool ff(reg)
> > { 
> >   return reg->type == PTR_TO_BTF_ID || reg->type == (PTR_TO_BTF_ID | PTR_TRUSTED);
> > }
> > 
> > ?
> 
> Sure, will do.
> 
> > > +			bpf_log(log, "R%d must be referenced or trusted\n", regno);
> > >  			return -EINVAL;
> > >  		}
> > >  
> > > @@ -6646,8 +6665,8 @@ static int btf_check_func_arg_match(struct bpf_verifier_env *env,
> > >  					i, btf_type_str(t));
> > >  				return -EINVAL;
> > >  			}
> > > -		} else if (is_kfunc && (reg->type == PTR_TO_BTF_ID ||
> > > -			   (reg2btf_ids[base_type(reg->type)] && !type_flag(reg->type)))) {
> > > +		} else if (is_kfunc && (base_type(reg->type) == PTR_TO_BTF_ID ||
> > > +			   (reg2btf_ids[base_type(reg->type)]))) {
> > >  			const struct btf_type *reg_ref_t;
> > >  			const struct btf *reg_btf;
> > >  			const char *reg_ref_tname;
> > > @@ -6660,7 +6679,13 @@ static int btf_check_func_arg_match(struct bpf_verifier_env *env,
> > >  				return -EINVAL;
> > >  			}
> > >  
> > > -			if (reg->type == PTR_TO_BTF_ID) {
> > > +			if ((type_flag(reg->type) & ~PTR_TRUSTED)) {
> > 
> > and use that helper here?
> 
> I don't think that specific helper would work here because we also need
> to verify that no type modifiers other than PTR_TRUSTED are present for
> when reg2btf_ids[base_type(reg->type)] is non-NULL.

As mentioned above, this check is slightly different than the one which
would be encapsulated in the helper suggested.

> 
> > 
> > > +				bpf_log(log, "kernel function %s arg#%d pointer had unexpected modifiers %d\n",
> > > +					func_name, i, type_flag(reg->type));
> > > +				return -EINVAL;
> > > +			}
> > > +
> > > +			if (base_type(reg->type) == PTR_TO_BTF_ID) {
> > >  				reg_btf = reg->btf;
> > >  				reg_ref_id = reg->btf_id;
> > >  			} else {
> > > @@ -6988,6 +7013,7 @@ int btf_prepare_func_args(struct bpf_verifier_env *env, int subprog,
> > >  			}
> > >  
> > >  			reg->type = PTR_TO_MEM | PTR_MAYBE_NULL;
> > > +
> > >  			reg->id = ++env->id_gen;
> > >  
> > >  			continue;
> > > diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> > > index 0312d9ce292f..f5b6b1f969d9 100644
> > > --- a/kernel/bpf/verifier.c
> > > +++ b/kernel/bpf/verifier.c
> > > @@ -543,7 +543,7 @@ static bool is_cmpxchg_insn(const struct bpf_insn *insn)
> > >  static const char *reg_type_str(struct bpf_verifier_env *env,
> > >  				enum bpf_reg_type type)
> > >  {
> > > -	char postfix[16] = {0}, prefix[32] = {0};
> > > +	char postfix[16] = {0}, prefix[64] = {0};
> > >  	static const char * const str[] = {
> > >  		[NOT_INIT]		= "?",
> > >  		[SCALAR_VALUE]		= "scalar",
> > > @@ -575,16 +575,14 @@ static const char *reg_type_str(struct bpf_verifier_env *env,
> > >  			strncpy(postfix, "_or_null", 16);
> > >  	}
> > >  
> > > -	if (type & MEM_RDONLY)
> > > -		strncpy(prefix, "rdonly_", 32);
> > > -	if (type & MEM_RINGBUF)
> > > -		strncpy(prefix, "ringbuf_", 32);
> > > -	if (type & MEM_USER)
> > > -		strncpy(prefix, "user_", 32);
> > > -	if (type & MEM_PERCPU)
> > > -		strncpy(prefix, "percpu_", 32);
> > > -	if (type & PTR_UNTRUSTED)
> > > -		strncpy(prefix, "untrusted_", 32);
> > > +	snprintf(prefix, sizeof(prefix), "%s%s%s%s%s%s",
> > > +		 type & MEM_RDONLY ? "rdonly_" : "",
> > > +		 type & MEM_RINGBUF ? "ringbuf_" : "",
> > > +		 type & MEM_USER ? "user_" : "",
> > > +		 type & MEM_PERCPU ? "percpu_" : "",
> > > +		 type & PTR_UNTRUSTED ? "untrusted_" : "",
> > > +		 type & PTR_TRUSTED ? "trusted_" : ""
> > > +	);
> > >  
> > >  	snprintf(env->type_str_buf, TYPE_STR_BUF_LEN, "%s%s%s",
> > >  		 prefix, str[base_type(type)], postfix);
> > > @@ -3844,7 +3842,7 @@ static int map_kptr_match_type(struct bpf_verifier_env *env,
> > >  			       struct bpf_reg_state *reg, u32 regno)
> > >  {
> > >  	const char *targ_name = kernel_type_name(kptr_field->kptr.btf, kptr_field->kptr.btf_id);
> > > -	int perm_flags = PTR_MAYBE_NULL;
> > > +	int perm_flags = PTR_MAYBE_NULL | PTR_TRUSTED;
> > >  	const char *reg_name = "";
> > >  
> > >  	/* Only unreferenced case accepts untrusted pointers */
> > > @@ -4707,6 +4705,9 @@ static int check_ptr_to_btf_access(struct bpf_verifier_env *env,
> > >  	if (type_flag(reg->type) & PTR_UNTRUSTED)
> > >  		flag |= PTR_UNTRUSTED;
> > >  
> > > +	/* Any pointer obtained from walking a trusted pointer is no longer trusted. */
> > > +	flag &= ~PTR_TRUSTED;
> > > +
> > >  	if (atype == BPF_READ && value_regno >= 0)
> > >  		mark_btf_ld_reg(env, regs, value_regno, ret, reg->btf, btf_id, flag);
> > >  
> > > @@ -5774,6 +5775,7 @@ static const struct bpf_reg_types btf_id_sock_common_types = {
> > >  		PTR_TO_TCP_SOCK,
> > >  		PTR_TO_XDP_SOCK,
> > >  		PTR_TO_BTF_ID,
> > > +		PTR_TO_BTF_ID | PTR_TRUSTED,
> > >  	},
> > >  	.btf_id = &btf_sock_ids[BTF_SOCK_TYPE_SOCK_COMMON],
> > >  };
> > > @@ -5807,9 +5809,19 @@ static const struct bpf_reg_types scalar_types = { .types = { SCALAR_VALUE } };
> > >  static const struct bpf_reg_types context_types = { .types = { PTR_TO_CTX } };
> > >  static const struct bpf_reg_types ringbuf_mem_types = { .types = { PTR_TO_MEM | MEM_RINGBUF } };
> > >  static const struct bpf_reg_types const_map_ptr_types = { .types = { CONST_PTR_TO_MAP } };
> > > -static const struct bpf_reg_types btf_ptr_types = { .types = { PTR_TO_BTF_ID } };
> > > +static const struct bpf_reg_types btf_ptr_types = {
> > > +	.types = {
> > > +		PTR_TO_BTF_ID,
> > > +		PTR_TO_BTF_ID | PTR_TRUSTED,
> > > +	},
> > > +};
> > >  static const struct bpf_reg_types spin_lock_types = { .types = { PTR_TO_MAP_VALUE } };
> > > -static const struct bpf_reg_types percpu_btf_ptr_types = { .types = { PTR_TO_BTF_ID | MEM_PERCPU } };
> > > +static const struct bpf_reg_types percpu_btf_ptr_types = {
> > > +	.types = {
> > > +		PTR_TO_BTF_ID | MEM_PERCPU,
> > > +		PTR_TO_BTF_ID | MEM_PERCPU | PTR_TRUSTED,
> > > +	}
> > > +};
> > >  static const struct bpf_reg_types func_ptr_types = { .types = { PTR_TO_FUNC } };
> > >  static const struct bpf_reg_types stack_ptr_types = { .types = { PTR_TO_STACK } };
> > >  static const struct bpf_reg_types const_str_ptr_types = { .types = { PTR_TO_MAP_VALUE } };
> > > @@ -5897,7 +5909,7 @@ static int check_reg_type(struct bpf_verifier_env *env, u32 regno,
> > >  	return -EACCES;
> > >  
> > >  found:
> > > -	if (reg->type == PTR_TO_BTF_ID) {
> > > +	if (base_type(reg->type) == PTR_TO_BTF_ID && !(type_flag(reg->type) & ~PTR_TRUSTED)) {

As mentioned above, this check is different than the one we're doing in
btf_ctx_access() when determining if the reg requires a ref_obj_id > 0.
This check is actually doing what you originally suggested above:

if (reg->type == PTR_TO_BTF_ID || reg->type == (PTR_TO_BTF_ID | PTR_TRUSTED))

I think what you wrote is more readable and am happy to apply it to this
check in v8, but unfortunately I don't think we really have an
opportunity to avoid code duplication here with a helper (though a
helper may still improve readability).
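
For illustration, such a hypothetical helper (name made up) could look
like:

static bool reg_is_trusted_ptr_to_btf_id(const struct bpf_reg_state *reg)
{
	return reg->type == PTR_TO_BTF_ID ||
	       reg->type == (PTR_TO_BTF_ID | PTR_TRUSTED);
}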

Let me know your thoughts. I'll wait to post v8 until we've resolved
this.

> > >  		/* For bpf_sk_release, it needs to match against first member
> > >  		 * 'struct sock_common', hence make an exception for it. This
> > >  		 * allows bpf_sk_release to work for multiple socket types.
> > > @@ -5973,6 +5985,7 @@ int check_func_arg_reg_off(struct bpf_verifier_env *env,
> > >  	 * fixed offset.
> > >  	 */
> > >  	case PTR_TO_BTF_ID:
> > > +	case PTR_TO_BTF_ID | PTR_TRUSTED:
> > >  		/* When referenced PTR_TO_BTF_ID is passed to release function,
> > >  		 * it's fixed offset must be 0.	In the other cases, fixed offset
> > >  		 * can be non-zero.
> > > @@ -13690,6 +13703,8 @@ static int convert_ctx_accesses(struct bpf_verifier_env *env)
> > >  			break;
> > >  		case PTR_TO_BTF_ID:
> > >  		case PTR_TO_BTF_ID | PTR_UNTRUSTED:
> > > +		case PTR_TO_BTF_ID | PTR_TRUSTED:
> > > +		case PTR_TO_BTF_ID | PTR_UNTRUSTED | PTR_TRUSTED:
> > >  			if (type == BPF_READ) {
> > >  				insn->code = BPF_LDX | BPF_PROBE_MEM |
> > >  					BPF_SIZE((insn)->code);
> > > diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
> > > index f2d8d070d024..5b9008bc597b 100644
> > > --- a/kernel/trace/bpf_trace.c
> > > +++ b/kernel/trace/bpf_trace.c
> > > @@ -774,7 +774,7 @@ BPF_CALL_0(bpf_get_current_task_btf)
> > >  const struct bpf_func_proto bpf_get_current_task_btf_proto = {
> > >  	.func		= bpf_get_current_task_btf,
> > >  	.gpl_only	= true,
> > > -	.ret_type	= RET_PTR_TO_BTF_ID,
> > > +	.ret_type	= RET_PTR_TO_BTF_ID_TRUSTED,
> > >  	.ret_btf_id	= &btf_tracing_ids[BTF_TRACING_TYPE_TASK],
> > >  };
> > >  
> > > diff --git a/net/ipv4/bpf_tcp_ca.c b/net/ipv4/bpf_tcp_ca.c
> > > index d15c91de995f..0006b5438ff7 100644
> > > --- a/net/ipv4/bpf_tcp_ca.c
> > > +++ b/net/ipv4/bpf_tcp_ca.c
> > > @@ -61,7 +61,9 @@ static bool bpf_tcp_ca_is_valid_access(int off, int size,
> > >  	if (!bpf_tracing_btf_ctx_access(off, size, type, prog, info))
> > >  		return false;
> > >  
> > > -	if (info->reg_type == PTR_TO_BTF_ID && info->btf_id == sock_id)
> > > +	if (base_type(info->reg_type) == PTR_TO_BTF_ID &&
> > > +	    !(type_flag(info->reg_type) & ~PTR_TRUSTED) &&
> > > +	    info->btf_id == sock_id)
> > >  		/* promote it to tcp_sock */
> > >  		info->btf_id = tcp_sock_id;
> > >  
> > > diff --git a/tools/testing/selftests/bpf/verifier/calls.c b/tools/testing/selftests/bpf/verifier/calls.c
> > > index e1a937277b54..7ac947f00df4 100644
> > > --- a/tools/testing/selftests/bpf/verifier/calls.c
> > > +++ b/tools/testing/selftests/bpf/verifier/calls.c
> > > @@ -109,7 +109,7 @@
> > >  	},
> > >  	.prog_type = BPF_PROG_TYPE_SCHED_CLS,
> > >  	.result = REJECT,
> > > -	.errstr = "arg#0 pointer type STRUCT prog_test_ref_kfunc must point",
> > > +	.errstr = "arg#0 pointer had unexpected modifiers",
> > >  	.fixup_kfunc_btf_id = {
> > >  		{ "bpf_kfunc_call_test_acquire", 3 },
> > >  		{ "bpf_kfunc_call_test_release", 5 },
> > > diff --git a/tools/testing/selftests/bpf/verifier/ref_tracking.c b/tools/testing/selftests/bpf/verifier/ref_tracking.c
> > > index fd683a32a276..d9367f2894b9 100644
> > > --- a/tools/testing/selftests/bpf/verifier/ref_tracking.c
> > > +++ b/tools/testing/selftests/bpf/verifier/ref_tracking.c
> > > @@ -142,7 +142,7 @@
> > >  	.kfunc = "bpf",
> > >  	.expected_attach_type = BPF_LSM_MAC,
> > >  	.flags = BPF_F_SLEEPABLE,
> > > -	.errstr = "arg#0 pointer type STRUCT bpf_key must point to scalar, or struct with scalar",
> > > +	.errstr = "arg#0 pointer had unexpected modifiers",
> > >  	.fixup_kfunc_btf_id = {
> > >  		{ "bpf_lookup_user_key", 2 },
> > >  		{ "bpf_key_put", 4 },
> > > @@ -163,7 +163,7 @@
> > >  	.kfunc = "bpf",
> > >  	.expected_attach_type = BPF_LSM_MAC,
> > >  	.flags = BPF_F_SLEEPABLE,
> > > -	.errstr = "arg#0 pointer type STRUCT bpf_key must point to scalar, or struct with scalar",
> > > +	.errstr = "arg#0 pointer had unexpected modifiers",
> > >  	.fixup_kfunc_btf_id = {
> > >  		{ "bpf_lookup_system_key", 1 },
> > >  		{ "bpf_key_put", 3 },
> > > -- 
> > > 2.38.1
> > > 


* Re: [PATCH bpf-next v7 0/3] Support storing struct task_struct objects as kptrs
  2022-11-18 15:08           ` David Vernet
@ 2022-11-18 18:31             ` Alexei Starovoitov
  2022-11-19  6:09               ` John Fastabend
  0 siblings, 1 reply; 22+ messages in thread
From: Alexei Starovoitov @ 2022-11-18 18:31 UTC (permalink / raw)
  To: David Vernet
  Cc: John Fastabend, bpf, ast, andrii, daniel, martin.lau, memxor,
	yhs, song, sdf, kpsingh, jolsa, haoluo, tj, kernel-team,
	linux-kernel

On Fri, Nov 18, 2022 at 09:08:12AM -0600, David Vernet wrote:
> On Thu, Nov 17, 2022 at 10:04:27PM -0800, John Fastabend wrote:
> 
> [...]
> 
> > > > And last thing I was checking is because KF_SLEEPABLE is not set
> > > > this should be blocked from running on sleepable progs which would
> > > > break the call_rcu in the destructor. Maybe small nit, not sure
> > > > its worth it but might be nice to annotate the helper description
> > > > with a note, "will not work on sleepable progs" or something to
> > > > that effect.
> > > 
> > > KF_SLEEPABLE is used to indicate whether the kfunc _itself_ may sleep,
> > > not whether the calling program can be sleepable. call_rcu() doesn't
> > > block, so no need to mark the kfunc as KF_SLEEPABLE. The key is that if
> > > a kfunc is sleepable, non-sleepable programs are not able to call it
> > > (and this is enforced in the verifier).
> > 
> > OK but should these helpers be allowed in sleepable progs? I think
> > not. What stops this, (using your helpers):
> > 
> >   cpu0                                       cpu1
> >   ----
> >   v = insert_lookup_task(task)
> >   kptr = bpf_kptr_xchg(&v->task, NULL);
> >   if (!kptr)
> >     return 0;
> >                                             map_delete_elem()
> >                                                put_task()
> >                                                  rcu_call
> >   do_something_might_sleep()
> >                                                     put_task_struct
> >                                                       ... free  

the free won't happen here, because the kptr on cpu0 holds the refcnt.
The BPF side never does a direct free of a kptr; it only incs/decs the refcnt via kfuncs.
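
As a rough sketch of that model using the kfuncs from this series
(assuming task is a trusted struct task_struct * and v points to a map
value with a task kptr field; error handling elided):

	struct task_struct *acquired, *old;

	acquired = bpf_task_acquire(task);		/* refcnt++ */
	if (acquired) {
		/* the map value now owns the reference */
		old = bpf_kptr_xchg(&v->task, acquired);
		if (old)
			bpf_task_release(old);		/* refcnt-- */
	}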

> >   kptr->[free'd memory]
> >  
> > the insert_lookup_task will bump the refcnt on the acquire on map
> > insert. But the lookup doesn't do anything to the refcnt and the

A lookup from the map doesn't touch kptrs in the value.
Just reading v->kptr produces a PTR_UNTRUSTED pointer with probe_mem protection.
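
For example (a sketch; the map and value layout are made up):

	struct map_value *v;
	struct task_struct *t;
	pid_t pid = 0;

	v = bpf_map_lookup_elem(&tasks, &key);
	if (!v)
		return 0;
	t = v->task;	/* PTR_TO_BTF_ID | PTR_UNTRUSTED */
	if (t)
		/* loads through t are emitted as BPF_PROBE_MEM: they
		 * cannot fault, they just read zeroes if the object is
		 * already gone
		 */
		pid = t->pid;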

> > map_delete_elem will delete it. We have a check for spin_lock
> > types to stop them from being in sleepable progs. Did I miss a
> > similar check for these?
> 
> So, in your example above, bpf_kptr_xchg(&v->task, NULL) will atomically
> xchg the kptr from the map, and so the map_delete_elem() call would fail
> with (something like) -ENOENT. In general, the semantics are similar to
> std::unique_ptr::swap() in C++.
> 
> FWIW, I think KF_KPTR_GET kfuncs are the more complex / racy kfuncs to
> reason about. The reason is that we're passing a pointer to the map
> value containing a kptr directly to the kfunc (with the attempt of
> acquiring an additional reference if a kptr was already present in the
> map) rather than doing an xchg which atomically gets us the unique
> pointer if nobody else xchgs it in first. So with KF_KPTR_GET, someone
> else could come along and delete the kptr from the map while the kfunc
> is trying to acquire that additional reference. The race looks something
> like this:
> 
>    cpu0                                       cpu1
>    ----
>    v = insert_lookup_task(task)
>    kptr = bpf_task_kptr_get(&v->task);
>                                              map_delete_elem()
>                                                 put_task()
>                                                   rcu_call
>                                                      put_task_struct
>                                                        ... free  
>    if (!kptr)
>      /* In this race example, this path will be taken. */
>      return 0;
> 
> The difference is that here, we're not doing an atomic xchg of the kptr
> out of the map. Instead, we're passing a pointer to the map value
> containing the kptr directly to bpf_task_kptr_get(), which itself tries
> to acquire an additional reference on the task to return to the program
> as a kptr. This is still safe, however, as bpf_task_kptr_get() uses RCU
> and refcount_inc_not_zero() in the bpf_task_kptr_get() kfunc to ensure
> that it can't hit a UAF, and that it won't return a dying task to the
> caller:
> 
> /**
>  * bpf_task_kptr_get - Acquire a reference on a struct task_struct kptr. A task
>  * kptr acquired by this kfunc which is not subsequently stored in a map, must
>  * be released by calling bpf_task_release().
>  * @pp: A pointer to a task kptr on which a reference is being acquired.
>  */
> __used noinline
> struct task_struct *bpf_task_kptr_get(struct task_struct **pp)
> {
>         struct task_struct *p;
> 
>         rcu_read_lock();
>         p = READ_ONCE(*pp);
> 
> 	/* <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
> 	 * cpu1 could remove the element from the map here, and invoke
> 	 * put_task_struct_rcu_user(). We're in an RCU read region
> 	 * though, so the task won't be freed until at the very
> 	 * earliest, the rcu_read_unlock() below.
> 	 * >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> 	 */
> 
>         if (p && !refcount_inc_not_zero(&p->rcu_users))
> 		/* <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
> 		 * refcount_inc_not_zero() will return false, as cpu1
> 		 * deleted the element from the map and dropped its last
> 		 * refcount. So we just return NULL as the task will be
> 		 * deleted once an RCU gp has elapsed.
> 		 * >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> 		 */
>                 p = NULL;
>         rcu_read_unlock();
> 
>         return p;
> }
> 
> Let me know if that makes sense. This stuff is tricky, and I plan to
> clearly / thoroughly add it to that kptr docs page once this patch set
> lands.

All correct. Probably worth adding this comment directly in bpf_task_kptr_get.


* Re: [PATCH bpf-next v7 1/3] bpf: Allow trusted pointers to be passed to KF_TRUSTED_ARGS kfuncs
  2022-11-18 16:45       ` David Vernet
@ 2022-11-18 18:45         ` Alexei Starovoitov
  2022-11-18 21:44           ` David Vernet
  0 siblings, 1 reply; 22+ messages in thread
From: Alexei Starovoitov @ 2022-11-18 18:45 UTC (permalink / raw)
  To: David Vernet
  Cc: bpf, ast, andrii, daniel, martin.lau, memxor, yhs, song, sdf,
	john.fastabend, kpsingh, jolsa, haoluo, tj, kernel-team,
	linux-kernel

On Fri, Nov 18, 2022 at 10:45:37AM -0600, David Vernet wrote:
> On Fri, Nov 18, 2022 at 08:45:44AM -0600, David Vernet wrote:
> 
> [...]
> 
> > > >  bool btf_ctx_access(int off, int size, enum bpf_access_type type,
> > > >  		    const struct bpf_prog *prog,
> > > >  		    struct bpf_insn_access_aux *info)
> > > > @@ -5722,6 +5727,9 @@ bool btf_ctx_access(int off, int size, enum bpf_access_type type,
> > > >  	}
> > > >  
> > > >  	info->reg_type = PTR_TO_BTF_ID;
> > > > +	if (prog_type_args_trusted(prog->type))
> > > > +		info->reg_type |= PTR_TRUSTED;
> > > > +
> > > >  	if (tgt_prog) {
> > > >  		enum bpf_prog_type tgt_type;
> > > >  
> > > > @@ -6558,15 +6566,26 @@ static int btf_check_func_arg_match(struct bpf_verifier_env *env,
> > > >  		/* These register types have special constraints wrt ref_obj_id
> > > >  		 * and offset checks. The rest of trusted args don't.
> > > >  		 */
> > > > -		obj_ptr = reg->type == PTR_TO_CTX || reg->type == PTR_TO_BTF_ID ||
> > > > +		obj_ptr = reg->type == PTR_TO_CTX ||
> > > > +			  base_type(reg->type) == PTR_TO_BTF_ID ||
> > > >  			  reg2btf_ids[base_type(reg->type)];
> > > >  
> > > >  		/* Check if argument must be a referenced pointer, args + i has
> > > >  		 * been verified to be a pointer (after skipping modifiers).
> > > >  		 * PTR_TO_CTX is ok without having non-zero ref_obj_id.
> > > > +		 *
> > > > +		 * All object pointers must be refcounted, other than:
> > > > +		 * - PTR_TO_CTX
> > > > +		 * - PTR_TRUSTED pointers
> > > >  		 */
> > > > -		if (is_kfunc && trusted_args && (obj_ptr && reg->type != PTR_TO_CTX) && !reg->ref_obj_id) {
> > > > -			bpf_log(log, "R%d must be referenced\n", regno);
> > > > +		if (is_kfunc &&
> > > > +		    trusted_args &&
> > > > +		    obj_ptr &&
> > > > +		    base_type(reg->type) != PTR_TO_CTX &&
> > > > +		    (!(type_flag(reg->type) & PTR_TRUSTED) ||
> > > > +		     (type_flag(reg->type) & ~PTR_TRUSTED)) &&
> > > > +		    !reg->ref_obj_id) {
> > > 
> > > This is pretty hard to read.
> > > Is this checking:
> > > !(reg->type == PTR_TO_BTF_ID || reg->type == (PTR_TO_BTF_ID | PTR_TRUSTED))
> > > ?
> > > 
> > > Why not to use the above?
> > 
> > Agreed this is more readable, I'll do this for v8 (from a helper as you
> > suggested).
> 
> Sorry, my initial response was incorrect. After thinking about this
> more, I don't think this conditional would be correct here:
> 
> 	!(reg->type == PTR_TO_BTF_ID || reg->type == (PTR_TO_BTF_ID | PTR_TRUSTED))
> 
> That conditional is saying, "If it's PTR_TO_BTF_ID, and either no
> modifiers are set, or PTR_TRUSTED is set". Or in other words, "If
> PTR_TO_BTF_ID is set, we don't need a refcount check unless a modifier
> other than PTR_TRUSTED is set on the register." This is incorrect, as it
> would short-circuit out of the check before !reg->ref_obj_id for
> reg->type == PTR_TO_BTF_ID, so we would skip the reference requirement
> for normal, unmodified PTR_TO_BTF_ID objects. It would also cause us to
> incorrectly _not_ skip the ref_obj_id > 0 check for when a
> reg2btf_ids[base_type(reg->type)] register has the PTR_TRUSTED modifier.
> 
> What we really need is a check that encodes, "Don't require a refcount
> if PTR_TRUSTED is present and no other type modifiers are present",
> i.e.:
> 
> 	!(type_flag(reg->type) & PTR_TRUSTED) || (type_flag(reg->type) & ~PTR_TRUSTED)
> 
> My intention was to be conservative here and say "only trust PTR_TRUSTED
> if no other type modifiers are set". I think this is necessary because
> other type modifiers such as PTR_UNTRUSTED could theoretically be set on
> the register as well. Clearly this code is pretty difficult to reason
> about though, so I'm open to suggestions for how to simplify it.
> 
> I'll point out specifically that it's difficult to reason about when
> modifiers are or are not safe to allow. For example, we definitely don't
> want to skip the refcount check for OBJ_RELEASE | PTR_TRUSTED, because

OBJ_RELEASE cannot be part of reg flag.
It's only in arg_type.

Anyway, now that Kumar's refactoring has been applied, the code in question looks different.
It would fall into this part:
case KF_ARG_PTR_TO_BTF_ID:
        /* Only base_type is checked, further checks are done here */
        if (reg->type != PTR_TO_BTF_ID &&
            (!reg2btf_ids[base_type(reg->type)] || type_flag(reg->type))) {
                verbose(env, "arg#%d expected pointer to btf or socket\n", i);
                return -EINVAL;
        }
        ret = process_kf_arg_ptr_to_btf_id(env, reg, ref_t, ref_tname, ref_id, meta, i);

> if it's a release arg it should always have a refcount on it.
> PTR_UNTRUSTED | PTR_TRUSTED would also make no sense. MEM_FIXED_SIZE
> though seems fine? In general, I thought it was prudent for us to take
> the most conservative possible approach here, which is that PTR_TRUSTED
> only applies when no other modifiers are present, and it applies for all
> obj_ptr types (other than PTR_TO_CTX which does its own thing).

Probably worth refining when PTR_TRUSTED is cleared.
For example adding PTR_UNTRUSTED should definitely clear it.
MEM_ALLOC flag is probably equivalent to PTR_TRUSTED.
Maybe the bit:
regs[BPF_REG_0].type = PTR_TO_BTF_ID | MEM_ALLOC;
should set PTR_TRUSTED as well?
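
i.e., presumably something like:

regs[BPF_REG_0].type = PTR_TO_BTF_ID | MEM_ALLOC | PTR_TRUSTED;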

> 
> Note as well that this check is different from the one you pointed out
> below, which is verifying that PTR_TRUSTED is the only modifier for both
> reg2btf_ids[base_type(reg->type)] and base_type(reg->type) ==
> PTR_TO_BTF_ID.  Additionally, the check is different than the check in
> check_reg_type(), which I'll highlight below where the code is actually
> modified.

I'm mainly objecting to logic:
!(type_flag(reg->type) & PTR_TRUSTED) || (type_flag(reg->type) & ~PTR_TRUSTED)

which looks like a 'catch-all'.
Like it will error on MEM_ALLOC, which is probably not correct.
In other words it's 'too conservative', meaning it's rejecting valid code.
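
For instance (hypothetical values, just to make the objection concrete):

u32 flags = type_flag(reg->type); /* just MEM_ALLOC for a bpf_obj_new() pointer */
bool rejected = !(flags & PTR_TRUSTED) || (flags & ~PTR_TRUSTED); /* true */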

> > > >  
> > > >  found:
> > > > -	if (reg->type == PTR_TO_BTF_ID) {
> > > > +	if (base_type(reg->type) == PTR_TO_BTF_ID && !(type_flag(reg->type) & ~PTR_TRUSTED)) {
> 
> As mentioned above, this check is different than the one we're doing in
> btf_ctx_access() when determining if the reg requires a ref_obj_id > 0.
> This check is actually doing what you originally suggested above:
> 
> if (reg->type == PTR_TO_BTF_ID || reg->type == (PTR_TO_BTF_ID | PTR_TRUSTED))
> 
> I think what you wrote is more readable and am happy to apply it to this
> check in v8, but unfortunately I don't think we really have an
> opportunity to avoid code duplication here with a helper (though a
> helper may still improve readability).

ok. forget the helper. open coding all conditions is probably cleaner,
since they will be different in every case.



* Re: [PATCH bpf-next v7 1/3] bpf: Allow trusted pointers to be passed to KF_TRUSTED_ARGS kfuncs
  2022-11-18 18:45         ` Alexei Starovoitov
@ 2022-11-18 21:44           ` David Vernet
  2022-11-19  4:13             ` Alexei Starovoitov
  0 siblings, 1 reply; 22+ messages in thread
From: David Vernet @ 2022-11-18 21:44 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: bpf, ast, andrii, daniel, martin.lau, memxor, yhs, song, sdf,
	john.fastabend, kpsingh, jolsa, haoluo, tj, kernel-team,
	linux-kernel

On Fri, Nov 18, 2022 at 10:45:00AM -0800, Alexei Starovoitov wrote:
> On Fri, Nov 18, 2022 at 10:45:37AM -0600, David Vernet wrote:
> > On Fri, Nov 18, 2022 at 08:45:44AM -0600, David Vernet wrote:
> > 
> > [...]
> > 
> > > > >  bool btf_ctx_access(int off, int size, enum bpf_access_type type,
> > > > >  		    const struct bpf_prog *prog,
> > > > >  		    struct bpf_insn_access_aux *info)
> > > > > @@ -5722,6 +5727,9 @@ bool btf_ctx_access(int off, int size, enum bpf_access_type type,
> > > > >  	}
> > > > >  
> > > > >  	info->reg_type = PTR_TO_BTF_ID;
> > > > > +	if (prog_type_args_trusted(prog->type))
> > > > > +		info->reg_type |= PTR_TRUSTED;
> > > > > +
> > > > >  	if (tgt_prog) {
> > > > >  		enum bpf_prog_type tgt_type;
> > > > >  
> > > > > @@ -6558,15 +6566,26 @@ static int btf_check_func_arg_match(struct bpf_verifier_env *env,
> > > > >  		/* These register types have special constraints wrt ref_obj_id
> > > > >  		 * and offset checks. The rest of trusted args don't.
> > > > >  		 */
> > > > > -		obj_ptr = reg->type == PTR_TO_CTX || reg->type == PTR_TO_BTF_ID ||
> > > > > +		obj_ptr = reg->type == PTR_TO_CTX ||
> > > > > +			  base_type(reg->type) == PTR_TO_BTF_ID ||
> > > > >  			  reg2btf_ids[base_type(reg->type)];
> > > > >  
> > > > >  		/* Check if argument must be a referenced pointer, args + i has
> > > > >  		 * been verified to be a pointer (after skipping modifiers).
> > > > >  		 * PTR_TO_CTX is ok without having non-zero ref_obj_id.
> > > > > +		 *
> > > > > +		 * All object pointers must be refcounted, other than:
> > > > > +		 * - PTR_TO_CTX
> > > > > +		 * - PTR_TRUSTED pointers
> > > > >  		 */
> > > > > -		if (is_kfunc && trusted_args && (obj_ptr && reg->type != PTR_TO_CTX) && !reg->ref_obj_id) {
> > > > > -			bpf_log(log, "R%d must be referenced\n", regno);
> > > > > +		if (is_kfunc &&
> > > > > +		    trusted_args &&
> > > > > +		    obj_ptr &&
> > > > > +		    base_type(reg->type) != PTR_TO_CTX &&
> > > > > +		    (!(type_flag(reg->type) & PTR_TRUSTED) ||
> > > > > +		     (type_flag(reg->type) & ~PTR_TRUSTED)) &&
> > > > > +		    !reg->ref_obj_id) {
> > > > 
> > > > This is pretty hard to read.
> > > > Is this checking:
> > > > !(reg->type == PTR_TO_BTF_ID || reg->type == (PTR_TO_BTF_ID | PTR_TRUSTED))
> > > > ?
> > > > 
> > > > Why not to use the above?
> > > 
> > > Agreed this is more readable, I'll do this for v8 (from a helper as you
> > > suggested).
> > 
> > Sorry, my initial response was incorrect. After thinking about this
> > more, I don't think this conditional would be correct here:
> > 
> > 	!(reg->type == PTR_TO_BTF_ID || reg->type == (PTR_TO_BTF_ID | PTR_TRUSTED))
> > 
> > That conditional is saying, "If it's PTR_TO_BTF_ID, and either no
> > modifiers are set, or PTR_TRUSTED is set". Or in other words, "If
> > PTR_TO_BTF_ID is set, we don't need a refcount check unless a modifier
> > other than PTR_TRUSTED is set on the register." This is incorrect, as it
> > would short-circuit out of the check before !reg->ref_obj_id for
> > reg->type == PTR_TO_BTF_ID, so we would skip the reference requirement
> > for normal, unmodified PTR_TO_BTF_ID objects. It would also cause us to
> > incorrectly _not_ skip the ref_obj_id > 0 check for when a
> > reg2btf_ids[base_type(reg->type)] register has the PTR_TRUSTED modifier.
> > 
> > What we really need is a check that encodes, "Don't require a refcount
> > if PTR_TRUSTED is present and no other type modifiers are present",
> > i.e.:
> > 
> > 	!(type_flag(reg->type) & PTR_TRUSTED) || (type_flag(reg->type) & ~PTR_TRUSTED)
> > 
> > My intention was to be conservative here and say "only trust PTR_TRUSTED
> > if no other type modifiers are set". I think this is necessary because
> > other type modifiers such as PTR_UNTRUSTED could theoretically be set on
> > the register as well. Clearly this code is pretty difficult to reason
> > about though, so I'm open to suggestions for how to simplify it.
> > 
> > I'll point out specifically that it's difficult to reason about when
> > modifiers are or are not safe to allow. For example, we definitely don't
> > want to skip the refcount check for OBJ_RELEASE | PTR_TRUSTED, because
> 
> OBJ_RELEASE cannot be part of reg flag.
> It's only in arg_type.

Ah yeah, fair enough. Got confused because it's part of the same
bpf_type_flag enum. I think the point in general stands though.

> Anyway Kumar's refactoring was applied the code in question looks different now:
> It would fall into this part:

Great, that's a nice simplification.

> case KF_ARG_PTR_TO_BTF_ID:
>         /* Only base_type is checked, further checks are done here */
>         if (reg->type != PTR_TO_BTF_ID &&
>             (!reg2btf_ids[base_type(reg->type)] || type_flag(reg->type))) {
>                 verbose(env, "arg#%d expected pointer to btf or socket\n", i);
>                 return -EINVAL;
>         }
>         ret = process_kf_arg_ptr_to_btf_id(env, reg, ref_t, ref_tname, ref_id, meta, i);
> 
> > if it's a release arg it should always have a refcount on it.
> > PTR_UNTRUSTED | PTR_TRUSTED would also make no sense. MEM_FIXED_SIZE
> > though seems fine? In general, I thought it was prudent for us to take
> > the most conservative possible approach here, which is that PTR_TRUSTED
> > only applies when no other modifiers are present, and it applies for all
> > obj_ptr types (other than PTR_TO_CTX which does its own thing).
> 
> Probably worth refining when PTR_TRUSTED is cleared.
> For example adding PTR_UNTRUSTED should definitely clear it.

That makes sense for PTR_UNTRUSTED, but what about the other type
modifiers like PTR_MAYBE_NULL? We set and unset PTR_MAYBE_NULL as a ptr
is NULL-checked throughout a function, so we'd have to record whether it
was previously trusted in order to properly re-OR PTR_TRUSTED after a
NULL check.
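
To illustrate the bookkeeping concern (not actual verifier code, just a
sketch):

	/* If PTR_MAYBE_NULL were to clear trust... */
	reg->type |= PTR_MAYBE_NULL;
	reg->type &= ~PTR_TRUSTED;
	...
	/* ...then once the prog NULL-checks the pointer and the
	 * verifier clears PTR_MAYBE_NULL again, some saved state would
	 * be needed to know whether PTR_TRUSTED should be restored.
	 */
	reg->type &= ~PTR_MAYBE_NULL;
	if (was_trusted)	/* hypothetical saved bit */
		reg->type |= PTR_TRUSTED;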

> MEM_ALLOC flag is probably equivalent to PTR_TRUSTED.
> Maybe the bit:
> regs[BPF_REG_0].type = PTR_TO_BTF_ID | MEM_ALLOC;
> should set PTR_TRUSTED as well?

We could, but that changes the meaning of PTR_TRUSTED and IMO makes it
harder to reason about. Before it was just "the kernel passed this arg
to the program and promises the program that it was trusted when it was
first passed". Now it's that, plus "it could point to an allocated
object from bpf_obj_new()". IMO we should keep all of these modifiers
separate so that the presence of a modifier has a well-defined meaning
that we can interpret in each context as needed. In this case, we can
make trust opt-in, so a KF_TRUSTED_ARGS BTF pointer must satisfy either
of the following:

1. reg->ref_obj_id > 0
2. Either one of the PTR_TRUSTED | MEM_ALLOC type modifiers is set, and
   no others.
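
As a sketch (hypothetical helper, just to make the rule concrete):

static bool kfunc_btf_ptr_is_trusted(const struct bpf_reg_state *reg)
{
	u32 flags = type_flag(reg->type);

	if (reg->ref_obj_id > 0)
		return true;
	/* opt-in: PTR_TRUSTED and/or MEM_ALLOC, and nothing else */
	return (flags & (PTR_TRUSTED | MEM_ALLOC)) &&
	       !(flags & ~(PTR_TRUSTED | MEM_ALLOC));
}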

Let me know if that sounds OK to you.

> > 
> > Note as well that this check is different from the one you pointed out
> > below, which is verifying that PTR_TRUSTED is the only modifier for both
> > reg2btf_ids[base_type(reg->type)] and base_type(reg->type) ==
> > PTR_TO_BTF_ID.  Additionally, the check is different than the check in
> > check_reg_type(), which I'll highlight below where the code is actually
> > modified.
> 
> I'm mainly objecting to logic:
> !(type_flag(reg->type) & PTR_TRUSTED) || (type_flag(reg->type) & ~PTR_TRUSTED)
> 
> which looks like 'catch-all'.
> Like it will error on MEM_ALLOC which probably not correct.
> In other words it's 'too conservative'. Meaning it's rejecting valid code.

Agreed that after the rebase this would no longer be correct. I think we
should make it opt-in, though. PTR_TRUSTED | MEM_ALLOC is fine.
PTR_TRUSTED | MEM_ALLOC | PTR_MAYBE_NULL would not be.

> 
> > > > >  
> > > > >  found:
> > > > > -	if (reg->type == PTR_TO_BTF_ID) {
> > > > > +	if (base_type(reg->type) == PTR_TO_BTF_ID && !(type_flag(reg->type) & ~PTR_TRUSTED)) {
> > 
> > As mentioned above, this check is different than the one we're doing in
> > btf_ctx_access() when determining if the reg requires a ref_obj_id > 0.
> > This check is actually doing what you originally suggested above:
> > 
> > if (reg->type == PTR_TO_BTF_ID || reg->type == (PTR_TO_BTF_ID | PTR_TRUSTED))
> > 
> > I think what you wrote is more readable and am happy to apply it to this
> > check in v8, but unfortunately I don't think we really have an
> > opportunity to avoid code duplication here with a helper (though a
> > helper may still improve readability).
> 
> ok. forget the helper. open coding all conditions is probably cleaner,
> since they will be different in every case.

Ack


* Re: [PATCH bpf-next v7 1/3] bpf: Allow trusted pointers to be passed to KF_TRUSTED_ARGS kfuncs
  2022-11-18 21:44           ` David Vernet
@ 2022-11-19  4:13             ` Alexei Starovoitov
  2022-11-19  5:14               ` David Vernet
  0 siblings, 1 reply; 22+ messages in thread
From: Alexei Starovoitov @ 2022-11-19  4:13 UTC (permalink / raw)
  To: David Vernet
  Cc: bpf, ast, andrii, daniel, martin.lau, memxor, yhs, song, sdf,
	john.fastabend, kpsingh, jolsa, haoluo, tj, kernel-team,
	linux-kernel

On Fri, Nov 18, 2022 at 03:44:42PM -0600, David Vernet wrote:
> > > if it's a release arg it should always have a refcount on it.
> > > PTR_UNTRUSTED | PTR_TRUSTED would also make no sense. MEM_FIXED_SIZE
> > > though seems fine? In general, I thought it was prudent for us to take
> > > the most conservative possible approach here, which is that PTR_TRUSTED
> > > only applies when no other modifiers are present, and it applies for all
> > > obj_ptr types (other than PTR_TO_CTX which does its own thing).
> > 
> > Probably worth refining when PTR_TRUSTED is cleared.
> > For example adding PTR_UNTRUSTED should definitely clear it.
> 
> That makes sense for PTR_UNTRUSTED, what about the other type modifiers
> like PTR_MAYBE_NULL? We set and unset if a ptr is NULL throughout a
> function, so we'd have to record if it was previously trusted in order
> to properly re-OR after a NULL check.

PTR_MAYBE_NULL is another bit and I don't think it conflicts with PTR_TRUSTED.
PTR_TO_BTF_ID | PTR_TRUSTED is a valid pointer.
PTR_TO_BTF_ID | PTR_TRUSTED | PTR_MAYBE_NULL is a valid pointer or NULL.

PTR_TO_BTF_ID | PTR_MAYBE_NULL is a legacy "valid pointer" or NULL.
That legacy pointer cannot be passed to KF_TRUSTED_ARGS kfuncs.

KF_TRUSTED_ARGS kfuncs should not accept PTR_TO_BTF_ID | PTR_TRUSTED | PTR_MAYBE_NULL.
It's the job of the prog to do the != NULL check.
Otherwise all such != NULL checks would need to move inside kfuncs, which is not good.
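
From the program side that looks something like this sketch, using
bpf_task_acquire() from this series as the KF_TRUSTED_ARGS kfunc:

	struct task_struct *acquired;

	/* t: PTR_TO_BTF_ID | PTR_TRUSTED | PTR_MAYBE_NULL */
	if (!t)
		return 0;
	/* the verifier clears PTR_MAYBE_NULL here, so t is now
	 * PTR_TO_BTF_ID | PTR_TRUSTED and may be passed to a
	 * KF_TRUSTED_ARGS kfunc
	 */
	acquired = bpf_task_acquire(t);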

> > MEM_ALLOC flag is probably equivalent to PTR_TRUSTED.
> > Maybe the bit:
> > regs[BPF_REG_0].type = PTR_TO_BTF_ID | MEM_ALLOC;
> > should set PTR_TRUSTED as well?
> 
> We could, but that changes the meaning of PTR_TRUSTED and IMO makes it
> harder to reason about. Before it was just "the kernel passed this arg
> to the program and promises the program that it was trusted when it was
> first passed". Now it's that plus it could mean that it points to an
> allocated object from bpf_obj_new()". IMO we should keep all of these
> modifiers separate so that the presence of a modifier has a well-defined
> meaning that we can interpret in each context as needed.  In this case,
> we can make trust opt-in, so a KF_TRUSTED_ARGS BTF pointer either of the
> following:
> 
> 1. reg->ref_obj_id > 0
> 2. Either one of PTR_TRUSTED | MEM_ALLOC type modifiers are set, and no
>    others.

I don't think MEM_ALLOC conflicts with PTR_TRUSTED.
The MEM_ALLOC flag means that it came from bpf_obj_new(), and that's what
bpf_spin_lock and bpf_obj_drop() want to see.

Adding PTR_TRUSTED to MEM_ALLOC looks necessary to me.
It doesn't have to be done right now, but eventually feels right.

I've been thinking about whether the reg->ref_obj_id > 0 condition should
be converted to PTR_TRUSTED too...
On one hand, it will simplify the check for KF_TRUSTED_ARGS.
The only thing check_kfunc_args() would need to do is:
is_kfunc_trusted_args(meta)
&& type_flag(reg->type) & PTR_TRUSTED
&& !(type_flag(reg->type) & PTR_MAYBE_NULL)

On the other hand, fixing all the places where we set ref_obj_id
and adding |= PTR_TRUSTED may be too cumbersome?

Right now we're saying PTR_TO_CTX is implicitly trusted, but we can OR
PTR_TO_CTX with PTR_TRUSTED to make it explicit and truly generalize the check.

> Agreed that after the rebase this would no longer be correct. I think we
> should make it opt-in, though. PTR_TRUSTED | MEM_ALLOC is fine.
> PTR_TRUSTED | MEM_ALLOC | PTR_MAYBE_NULL would not be.

to pass into KF_TRUSTED_ARGS kfunc? Agree.
I guess we can tighten the check a bit:
is_kfunc_trusted_args(meta)
&& type_flag(reg->type) & PTR_TRUSTED
&& !(type_flag(reg->type) & ~(PTR_TRUSTED | MEM_ALLOC))

In English: the pointer should have the PTR_TRUSTED flag set, and no
flags other than PTR_TRUSTED and MEM_ALLOC should be set.


* Re: [PATCH bpf-next v7 1/3] bpf: Allow trusted pointers to be passed to KF_TRUSTED_ARGS kfuncs
  2022-11-19  4:13             ` Alexei Starovoitov
@ 2022-11-19  5:14               ` David Vernet
  2022-11-19 16:48                 ` Alexei Starovoitov
  0 siblings, 1 reply; 22+ messages in thread
From: David Vernet @ 2022-11-19  5:14 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: bpf, ast, andrii, daniel, martin.lau, memxor, yhs, song, sdf,
	john.fastabend, kpsingh, jolsa, haoluo, tj, kernel-team,
	linux-kernel

On Fri, Nov 18, 2022 at 08:13:37PM -0800, Alexei Starovoitov wrote:
> On Fri, Nov 18, 2022 at 03:44:42PM -0600, David Vernet wrote:
> > > > if it's a release arg it should always have a refcount on it.
> > > > PTR_UNTRUSTED | PTR_TRUSTED would also make no sense. MEM_FIXED_SIZE
> > > > though seems fine? In general, I thought it was prudent for us to take
> > > > the most conservative possible approach here, which is that PTR_TRUSTED
> > > > only applies when no other modifiers are present, and it applies for all
> > > > obj_ptr types (other than PTR_TO_CTX which does its own thing).
> > > 
> > > Probably worth refining when PTR_TRUSTED is cleared.
> > > For example adding PTR_UNTRUSTED should definitely clear it.



> > 
> > That makes sense for PTR_UNTRUSTED, what about the other type modifiers
> > like PTR_MAYBE_NULL? We set and unset if a ptr is NULL throughout a
> > function, so we'd have to record if it was previously trusted in order
> > to properly re-OR after a NULL check.
> 
> PTR_MAYBE_NULL is another bit and I don't think it conflicts with PTR_TRUSTED.
> PTR_TO_BTF_ID | PTR_TRUSTED is a valid pointer.
> PTR_TO_BTF_ID | PTR_TRUSTED | PTR_MAYBE_NULL is a valid pointer or NULL.
> 
> PTR_TO_BTF_ID | PTR_MAYBE_NULL is a legacy "valid pointer" or NULL.
> That legacy pointer cannot be passed to KF_TRUSTED_ARGS kfuncs.
> 
> KF_TRUSTED_ARGS kfuncs should not accept PTR_TO_BTF_ID | PTR_TRUSTED | PTR_MAYBE_NULL.

Indeed -- my point was that I don't think e.g. clearing PTR_TRUSTED when
we set PTR_UNTRUSTED will work, at least not yet. It's still too tricky
to find all the places where we'd have to &= ~PTR_TRUSTED or |=
PTR_TRUSTED when setting specific type modifiers. As described below, we
first have to clarify the general workflow to enable the presence of
PTR_TRUSTED to be the single source of truth for trust.

> It's a job of the prog to do != NULL check.
> Otherwise all such != NULL checks would need to move inside kfuncs which is not good.
> 
> > > MEM_ALLOC flag is probably equivalent to PTR_TRUSTED.
> > > Maybe the bit:
> > > regs[BPF_REG_0].type = PTR_TO_BTF_ID | MEM_ALLOC;
> > > should set PTR_TRUSTED as well?
> > 
> > We could, but that changes the meaning of PTR_TRUSTED and IMO makes it
> > harder to reason about. Before it was just "the kernel passed this arg
> > to the program and promises the program that it was trusted when it was
> > first passed". Now it's that plus it could mean that it points to an
> > allocated object from bpf_obj_new()". IMO we should keep all of these
> > modifiers separate so that the presence of a modifier has a well-defined
> > meaning that we can interpret in each context as needed.  In this case,
> > we can make trust opt-in, so a KF_TRUSTED_ARGS BTF pointer either of the
> > following:
> > 
> > 1. reg->ref_obj_id > 0
> > 2. Either one of PTR_TRUSTED | MEM_ALLOC type modifiers are set, and no
> >    others.
> 
> I don't think MEM_ALLOC conflicts with PTR_TRUSTED.
> MEM_ALLOC flags means that it came from bpf_obj_new() and that's what
> bpf_spin_lock and bpf_obj_drop() want to see.
> 
> Adding PTR_TRUSTED to MEM_ALLOC looks necessary to me.
> It doesn't have to be done right now, but eventually feels right.

I think I agree. MEM_ALLOC should always imply PTR_TRUSTED. Ideally we
shouldn't have to check MEM_ALLOC for KF_TRUSTED_ARGS at all, and
PTR_TRUSTED should be the only modifier representing whether something
is safe. For now I'd prefer to keep them separate until we have a clear
plan, especially with respect to clearing PTR_TRUSTED when something
unsafe happens, like PTR_UNTRUSTED or PTR_MAYBE_NULL being set. It's
all still too muddied.

> I've been thinking whether reg->ref_obj_id > 0 condition should be converted
> to PTR_TRUSTED too...
> On one side it will simplify the check for KF_TRUSTED_ARGS.
> The only thing check_kfunc_args() would need to do is:
> is_kfunc_trusted_args(meta)
> && type_flag(reg->type) & PTR_TRUSTED
> && !(type_flag(reg->type) & PTR_MAYBE_NULL)
> 
> On the other side fixing all places where we set ref_obj_id
> and adding |= PTR_TRUSTED may be too cumbersome ?

I think it's probably too cumbersome now, but yeah, as mentioned above,
I think it's the right direction. I think it will require a lot of
thought to do it right, though. With the code the way that it is now, I
can't convince myself that we wouldn't do something like |= PTR_TRUSTED
when we observe ref_obj_id > 0, and then later &= ~PTR_TRUSTED when
setting PTR_MAYBE_NULL. I think Kumar's latest patch set is a nice step
towards achieving this clearer state. Hopefully we can continue to
improve.

> Right now we're saying PTR_TO_CTX is implicitly trusted, but we can OR
> PTR_TO_CTX with PTR_TRUSTED to make it explicit and truly generalize the check.

Further agreed, this is the correct longer-term direction.

> > Agreed that after the rebase this would no longer be correct. I think we
> > should make it opt-in, though. PTR_TRUSTED | MEM_ALLOC is fine.
> > PTR_TRUSTED | MEM_ALLOC | PTR_MAYBE_NULL would not be.
> 
> to pass into KF_TRUSTED_ARGS kfunc? Agree.
> I guess we can tighten the check a bit:
> is_kfunc_trusted_args(meta)
> && type_flag(reg->type) & PTR_TRUSTED
> && !(type_flag(reg->type) & ~(PTR_TRUSTED | MEM_ALLOC))
> 
> In english: the pointer should have PTR_TRUSTED flag and
> no other flags other than PTR_TRUSTED and MEM_ALLOC should be set.

Yeah, I think this is the correct way to model this for now. Later on
once we refactor things, the presence of PTR_TRUSTED on its own should
be sufficient. A good north star to aim towards.

I'll send this out as v8 tomorrow.


* Re: [PATCH bpf-next v7 0/3] Support storing struct task_struct objects as kptrs
  2022-11-18 18:31             ` Alexei Starovoitov
@ 2022-11-19  6:09               ` John Fastabend
  0 siblings, 0 replies; 22+ messages in thread
From: John Fastabend @ 2022-11-19  6:09 UTC (permalink / raw)
  To: Alexei Starovoitov, David Vernet
  Cc: John Fastabend, bpf, ast, andrii, daniel, martin.lau, memxor,
	yhs, song, sdf, kpsingh, jolsa, haoluo, tj, kernel-team,
	linux-kernel

Alexei Starovoitov wrote:
> On Fri, Nov 18, 2022 at 09:08:12AM -0600, David Vernet wrote:
> > On Thu, Nov 17, 2022 at 10:04:27PM -0800, John Fastabend wrote:
> > 
> > [...]
> > 
> > > > > And last thing I was checking is because KF_SLEEPABLE is not set
> > > > > this should be blocked from running on sleepable progs which would
> > > > > break the call_rcu in the destructor. Maybe small nit, not sure
> > > > > its worth it but might be nice to annotate the helper description
> > > > > with a note, "will not work on sleepable progs" or something to
> > > > > that effect.
> > > > 
> > > > KF_SLEEPABLE is used to indicate whether the kfunc _itself_ may sleep,
> > > > not whether the calling program can be sleepable. call_rcu() doesn't
> > > > block, so no need to mark the kfunc as KF_SLEEPABLE. The key is that if
> > > > a kfunc is sleepable, non-sleepable programs are not able to call it
> > > > (and this is enforced in the verifier).
> > > 
> > > OK but should these helpers be allowed in sleepable progs? I think
> > > not. What stops this, (using your helpers):
> > > 
> > >   cpu0                                       cpu1
> > >   ----
> > >   v = insert_lookup_task(task)
> > >   kptr = bpf_kptr_xchg(&v->task, NULL);
> > >   if (!kptr)
> > >     return 0;
> > >                                             map_delete_elem()
> > >                                                put_task()
> > >                                                  rcu_call
> > >   do_something_might_sleep()
> > >                                                     put_task_struct
> > >                                                       ... free  
> 
> the free won't happen here, because the kptr on cpu0 holds the refcnt.
> bpf side never does direct free of kptr. It only inc/dec refcnt via kfuncs.
> 
> > >   kptr->[free'd memory]
> > >  
> > > the insert_lookup_task will bump the refcnt on the acquire on map
> > > insert. But the lookup doesn't do anything to the refcnt and the
> 
> lookup from map doesn't touch kptrs in the value.
> just reading v->kptr becomes PTR_UNTRUSTED with probe_mem protection.
> 
> > > map_delete_elem will delete it. We have a check for spin_lock
> > > types to stop them from being in sleepable progs. Did I miss a
> > > similar check for these?
> > 
> > So, in your example above, bpf_kptr_xchg(&v->task, NULL) will atomically
> > xchg the kptr from the map, and so the map_delete_elem() call would fail
> > with (something like) -ENOENT. In general, the semantics are similar to
> > std::unique_ptr::swap() in C++.
> > 
> > FWIW, I think KF_KPTR_GET kfuncs are the more complex / racy kfuncs to
> > reason about. The reason is that we're passing a pointer to the map
> > value containing a kptr directly to the kfunc (with the attempt of
> > acquiring an additional reference if a kptr was already present in the
> > map) rather than doing an xchg which atomically gets us the unique
> > pointer if nobody else xchgs it in first. So with KF_KPTR_GET, someone
> > else could come along and delete the kptr from the map while the kfunc
> > is trying to acquire that additional reference. The race looks something
> > like this:
> > 
> >    cpu0                                       cpu1
> >    ----
> >    v = insert_lookup_task(task)
> >    kptr = bpf_task_kptr_get(&v->task);
> >                                              map_delete_elem()
> >                                                 put_task()
> >                                                   rcu_call
> >                                                      put_task_struct
> >                                                        ... free  
> >    if (!kptr)
> >      /* In this race example, this path will be taken. */
> >      return 0;
> > 
> > The difference is that here, we're not doing an atomic xchg of the kptr
> > out of the map. Instead, we're passing a pointer to the map value
> > containing the kptr directly to bpf_task_kptr_get(), which itself tries
> > to acquire an additional reference on the task to return to the program
> > as a kptr. This is still safe, however, as bpf_task_kptr_get() uses RCU
> > and refcount_inc_not_zero() in the bpf_task_kptr_get() kfunc to ensure
> > that it can't hit a UAF, and that it won't return a dying task to the
> > caller:
> > 
> > /**
> >  * bpf_task_kptr_get - Acquire a reference on a struct task_struct kptr. A task
> >  * kptr acquired by this kfunc which is not subsequently stored in a map, must
> >  * be released by calling bpf_task_release().
> >  * @pp: A pointer to a task kptr on which a reference is being acquired.
> >  */
> > __used noinline
> > struct task_struct *bpf_task_kptr_get(struct task_struct **pp)
> > {
> >         struct task_struct *p;
> > 
> >         rcu_read_lock();
> >         p = READ_ONCE(*pp);
> > 
> > 	/* <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
> > 	 * cpu1 could remove the element from the map here, and invoke
> > 	 * put_task_struct_rcu_user(). We're in an RCU read region
> > 	 * though, so the task won't be freed until at the very
> > 	 * earliest, the rcu_read_unlock() below.
> > 	 * >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > 	 */
> > 
> >         if (p && !refcount_inc_not_zero(&p->rcu_users))
> > 		/* <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
> > 		 * refcount_inc_not_zero() will return false, as cpu1
> > 		 * deleted the element from the map and dropped its last
> > 		 * refcount. So we just return NULL as the task will be
> > 		 * deleted once an RCU gp has elapsed.
> > 		 * >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > 		 */
> >                 p = NULL;
> >         rcu_read_unlock();
> > 
> >         return p;
> > }
> > 
> > Let me know if that makes sense. This stuff is tricky, and I plan to
> > clearly / thoroughly add it to that kptr docs page once this patch set
> > lands.
> 
> All correct. Probably worth adding this comment directly in bpf_task_kptr_get.

Yes, also agree, thanks for the details. Spent some time trying to break
it this evening, but didn't find anything.

Thanks.


* Re: [PATCH bpf-next v7 1/3] bpf: Allow trusted pointers to be passed to KF_TRUSTED_ARGS kfuncs
  2022-11-19  5:14               ` David Vernet
@ 2022-11-19 16:48                 ` Alexei Starovoitov
  0 siblings, 0 replies; 22+ messages in thread
From: Alexei Starovoitov @ 2022-11-19 16:48 UTC (permalink / raw)
  To: David Vernet
  Cc: bpf, ast, andrii, daniel, martin.lau, memxor, yhs, song, sdf,
	john.fastabend, kpsingh, jolsa, haoluo, tj, kernel-team,
	linux-kernel

On Fri, Nov 18, 2022 at 11:14:03PM -0600, David Vernet wrote:
> On Fri, Nov 18, 2022 at 08:13:37PM -0800, Alexei Starovoitov wrote:
> > On Fri, Nov 18, 2022 at 03:44:42PM -0600, David Vernet wrote:
> > > > > if it's a release arg it should always have a refcount on it.
> > > > > PTR_UNTRUSTED | PTR_TRUSTED would also make no sense. MEM_FIXED_SIZE
> > > > > though seems fine? In general, I thought it was prudent for us to take
> > > > > the most conservative possible approach here, which is that PTR_TRUSTED
> > > > > only applies when no other modifiers are present, and it applies for all
> > > > > obj_ptr types (other than PTR_TO_CTX which does its own thing).
> > > > 
> > > > Probably worth refining when PTR_TRUSTED is cleared.
> > > > For example adding PTR_UNTRUSTED should definitely clear it.
> 
> 
> 
> > > 
> > > That makes sense for PTR_UNTRUSTED, what about the other type modifiers
> > > like PTR_MAYBE_NULL? We set and unset if a ptr is NULL throughout a
> > > function, so we'd have to record if it was previously trusted in order
> > > to properly re-OR after a NULL check.
> > 
> > PTR_MAYBE_NULL is another bit and I don't think it conflicts with PTR_TRUSTED.
> > PTR_TO_BTF_ID | PTR_TRUSTED is a valid pointer.
> > PTR_TO_BTF_ID | PTR_TRUSTED | PTR_MAYBE_NULL is a valid pointer or NULL.
> > 
> > PTR_TO_BTF_ID | PTR_MAYBE_NULL is a legacy "valid pointer" or NULL.
> > That legacy pointer cannot be passed to KF_TRUSTED_ARGS kfuncs.
> > 
> > KF_TRUSTED_ARGS kfuncs should not accept PTR_TO_BTF_ID | PTR_TRUSTED | PTR_MAYBE_NULL.
> 
> Indeed -- my point was that I don't think e.g. clearing PTR_TRUSTED when
> we set PTR_UNTRUSTED will work, at least not yet. It's still too tricky
> to find all the places where we'd have to &= ~PTR_TRUSTED or |=
> PTR_TRUSTED when setting specific type modifiers. As described below, we
> first have to clarify the general workflow to enable the presence of
> PTR_TRUSTED to be the single source of truth for trust.

Agree. A reg->type with both PTR_TRUSTED and PTR_UNTRUSTED would be a bug,
but let's fix it when we get there.
Even if such a bug hits us, we can protect against it by making sure
that we treat PTR_UNTRUSTED as the logically stronger flag.
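
A defensive sketch of that precedence (an assumption, not code from the
series):

static bool reg_is_trusted(const struct bpf_reg_state *reg)
{
	u32 flags = type_flag(reg->type);

	/* PTR_UNTRUSTED is the logically stronger flag: it wins even
	 * if PTR_TRUSTED was erroneously left set alongside it
	 */
	if (flags & PTR_UNTRUSTED)
		return false;
	return flags & PTR_TRUSTED;
}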

> > It's a job of the prog to do != NULL check.
> > Otherwise all such != NULL checks would need to move inside kfuncs which is not good.
> > 
> > > > MEM_ALLOC flag is probably equivalent to PTR_TRUSTED.
> > > > Maybe the bit:
> > > > regs[BPF_REG_0].type = PTR_TO_BTF_ID | MEM_ALLOC;
> > > > should set PTR_TRUSTED as well?
> > > 
> > > We could, but that changes the meaning of PTR_TRUSTED and IMO makes it
> > > harder to reason about. Before it was just "the kernel passed this arg
> > > to the program and promises the program that it was trusted when it was
> > > first passed". Now it's that plus it could mean that it points to an
> > > allocated object from bpf_obj_new()". IMO we should keep all of these
> > > modifiers separate so that the presence of a modifier has a well-defined
> > > meaning that we can interpret in each context as needed.  In this case,
> > > we can make trust opt-in, so a KF_TRUSTED_ARGS BTF pointer either of the
> > > following:
> > > 
> > > 1. reg->ref_obj_id > 0
> > > 2. Either one of PTR_TRUSTED | MEM_ALLOC type modifiers are set, and no
> > >    others.
> > 
> > I don't think MEM_ALLOC conflicts with PTR_TRUSTED.
> > MEM_ALLOC flags means that it came from bpf_obj_new() and that's what
> > bpf_spin_lock and bpf_obj_drop() want to see.
> > 
> > Adding PTR_TRUSTED to MEM_ALLOC looks necessary to me.
> > It doesn't have to be done right now, but eventually feels right.
> 
> I think I agree. MEM_ALLOC should always imply PTR_TRUSTED. Ideally we
> shouldn't have to check MEM_ALLOC for KF_TRUSTED_ARGS at all, and
> PTR_TRUSTED should be the only modifier representing if something is
> safe. 

exactly.

> For now I'd prefer to keep them separate until we have a clear
> plan, especially with respect to clearing PTR_TRUSTED for when something
> unsafe happens like PTR_UNTRUSTED or PTR_MAYBE_NULL. It's all too
> muddied still.

sure. we can do that in the follow up.
A bit more technical debt to address later.

> 
> > I've been thinking whether reg->ref_obj_id > 0 condition should be converted
> > to PTR_TRUSTED too...
> > On one side it will simplify the check for KF_TRUSTED_ARGS.
> > The only thing check_kfunc_args() would need to do is:
> > is_kfunc_trusted_args(meta)
> > && type_flag(reg->type) & PTR_TRUSTED
> > && !(type_flag(reg->type) & PTR_MAYBE_NULL)
> > 
> > On the other side fixing all places where we set ref_obj_id
> > and adding |= PTR_TRUSTED may be too cumbersome ?
> 
> I think it's probably too cumbersome now, but yeah, as mentioned above,
> I think it's the right direction. I think it will require a lot of
> thought to do it right, though. With the code the way that it is now, I
> can't convince myself that we wouldn't do something like |= PTR_TRUSTED
> when we observe ref_obj_id > 0, and then later &= ~PTR_TRUSTED when
> setting PTR_MAYBE_NULL. I think Kumar's latest patch set is a nice step
> towards achieving this clearer state. Hopefully we can continue to
> improve.
> 
> > Right now we're saying PTR_TO_CTX is implicitly trusted, but we can OR
> > PTR_TO_CTX with PTR_TRUSTED to make it explicit and truly generalize the check.
> 
> Further agreed, this is the correct longer-term direction.
> 
> > > Agreed that after the rebase this would no longer be correct. I think we
> > > should make it opt-in, though. PTR_TRUSTED | MEM_ALLOC is fine.
> > > PTR_TRUSTED | MEM_ALLOC | PTR_MAYBE_NULL would not be.
> > 
> > to pass into KF_TRUSTED_ARGS kfunc? Agree.
> > I guess we can tighten the check a bit:
> > is_kfunc_trusted_args(meta)
> > && type_flag(reg->type) & PTR_TRUSTED
> > && !(type_flag(reg->type) & ~(PTR_TRUSTED | MEM_ALLOC))
> > 
> > In english: the pointer should have PTR_TRUSTED flag and
> > no other flags other than PTR_TRUSTED and MEM_ALLOC should be set.
> 
> Yeah, I think this is the correct way to model this for now. Later on
> once we refactor things, the presence of PTR_TRUSTED on its own should
> be sufficient. A good north star to aim towards.
> 
> I'll send this out as v8 tomorrow.

Perfect. Looking forward.
