* [RFC PATCH v1 00/14] Exceptions - Resource Cleanup
@ 2024-02-01  4:20 Kumar Kartikeya Dwivedi
  2024-02-01  4:20 ` [RFC PATCH v1 01/14] bpf: Mark subprogs as throw reachable before do_check pass Kumar Kartikeya Dwivedi
                   ` (14 more replies)
  0 siblings, 15 replies; 53+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2024-02-01  4:20 UTC (permalink / raw)
  To: bpf
  Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Martin KaFai Lau, David Vernet, Tejun Heo, Raj Sahu,
	Dan Williams, Rishabh Iyer, Sanidhya Kashyap

This set implements the second part of the exceptions work, adding
support for releasing resources at runtime during the unwinding phase.
This allows programs to throw an exception at any point and terminate
their execution immediately.

Currently, any acquired resources held by the program will cause a
thrown exception to fail during verification. This is because safely
unwinding the stack requires releasing these resources, which represent
kernel objects and locks. Skipping this step and continuing the stack
unwinding would destroy the stack frames containing such objects,
leading to all kinds of issues and violating BPF's safety properties.

Note that while the current mechanism only supports throwing exceptions
synchronously, the unwinding mechanism only requires a valid frame
descriptor to perform the cleanup. Thus, in a followup, we can build on
top of this series to introduce support for preempting execution of BPF
programs and terminating them, in cases where termination is not
statically provable and the program exceeds some runtime threshold.
This can also allow holding multiple locks at the same time, detecting
deadlocks and aborting programs, etc.

In this set, we implement support that allows the kernel to release
such resources when performing the unwinding step for the program and
aborting it. To enable this, the kernel needs to be made aware of the
layout of the program stack (for each BPF frame) and the type of each
kernel object residing therein. This information is retrieved at
runtime by tying this metadata to the program counter.

Every time a bpf_throw call is processed by the verifier, it generates
a frame descriptor for the caller and all of its caller frames, and
ties them to the program counter. At runtime, the kernel will only
process unwinding requests for such program counters, and this mapping
from program counter to frame descriptor allows discovering the
resources held in each frame.
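
To sketch the idea (find_frame_desc, release_entry, and frame_ptr below
are illustrative names, not the actual kernel API), the unwinder
conceptually does the following for each frame it walks:

	/* The program counter of the throw site is the lookup key. */
	desc = find_frame_desc(prog->aux->fdtab, pc);
	if (!desc)
		return -EINVAL; /* not a verified throw site, abort */
	for (int i = 0; i < desc->stack_cnt; i++)
		/* off is negative, relative to the frame pointer */
		release_entry(frame_ptr + desc->stack[i].off, &desc->stack[i]);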

Note that at the same program point, there cannot be a case where the
verifier may have to produce distinct frame descriptors. If such a case
is encountered, verification will fail. This is an uncommon case where,
depending on the program path taken at runtime, the same stack slot may
contain pointers of different types. In most such cases, the program
would not pass the verifier anyway, since the value that has to be
freed would be lost from verifier state and could not be recovered.
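
For instance (an illustrative sketch, assuming the compiler assigns
both pointers to the same stack slot):

	struct foo *f;
	void *rb_mem;

	if (cond)
		f = bpf_obj_new(typeof(*f));
	else
		rb_mem = bpf_ringbuf_reserve(&rb, 8, 0);
	...
	bpf_throw(0);

Here the same slot would need descriptors of two different types at the
bpf_throw site, which cannot be merged, so verification fails.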

A special provision is made for cases where a stack slot contains NULL
in one program path and a pointer in another. It is quite common for a
resource to be acquired conditionally, with the release occurring later
in the program guarded by the same conditional.
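
For example (the same shape as the example in patch 5):

	struct foo *p = NULL;

	if (x)
		p = bpf_obj_new(typeof(*p));
	if (y)
		bpf_throw(0); /* p is NULL or a valid pointer here */
	if (p)
		bpf_obj_drop(p);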

Notes
=====

 * Releasing bpf_spin_lock and RCU read locks is not supported in this
   RFC, but will be done as a follow up or added to the next revision
   on top of this set.
 * There are a few known rough edges/minor bugs which will be fixed in
   the next version.
 * More tests for corner cases will be added.

Kumar Kartikeya Dwivedi (14):
  bpf: Mark subprogs as throw reachable before do_check pass
  bpf: Process global subprog's exception propagation
  selftests/bpf: Add test for throwing global subprog with acquired refs
  bpf: Refactor check_pseudo_btf_id's BTF reference bump
  bpf: Implement BPF exception frame descriptor generation
  bpf: Adjust frame descriptor pc on instruction patching
  bpf: Use hidden subprog trampoline for bpf_throw
  bpf: Compute used callee saved registers for subprogs
  bpf, x86: Fix up pc offsets for frame descriptor entries
  bpf, x86: Implement runtime resource cleanup for exceptions
  bpf: Release references in verifier state when throwing exceptions
  bpf: Register cleanup dtors for runtime unwinding
  bpf: Make bpf_throw available to all program types
  selftests/bpf: Add tests for exceptions runtime cleanup

 arch/x86/net/bpf_jit_comp.c                   | 116 ++-
 drivers/hid/bpf/hid_bpf_dispatch.c            |  17 +
 include/linux/bpf.h                           |  57 ++
 include/linux/bpf_verifier.h                  |   9 +-
 include/linux/btf.h                           |  10 +-
 include/linux/filter.h                        |   3 +
 kernel/bpf/btf.c                              |  11 +-
 kernel/bpf/core.c                             |  18 +
 kernel/bpf/cpumask.c                          |   3 +-
 kernel/bpf/helpers.c                          | 165 +++-
 kernel/bpf/verifier.c                         | 714 +++++++++++++++++-
 kernel/trace/bpf_trace.c                      |  16 +
 net/bpf/test_run.c                            |   4 +-
 net/core/filter.c                             |   5 +
 net/netfilter/nf_conntrack_bpf.c              |  14 +-
 net/xfrm/xfrm_state_bpf.c                     |  16 +
 tools/testing/selftests/bpf/DENYLIST.aarch64  |   1 +
 tools/testing/selftests/bpf/DENYLIST.s390x    |   1 +
 .../bpf/prog_tests/exceptions_cleanup.c       |  65 ++
 .../selftests/bpf/progs/exceptions_cleanup.c  | 468 ++++++++++++
 .../bpf/progs/exceptions_cleanup_fail.c       | 154 ++++
 .../selftests/bpf/progs/exceptions_fail.c     |  38 +-
 22 files changed, 1817 insertions(+), 88 deletions(-)
 create mode 100644 tools/testing/selftests/bpf/prog_tests/exceptions_cleanup.c
 create mode 100644 tools/testing/selftests/bpf/progs/exceptions_cleanup.c
 create mode 100644 tools/testing/selftests/bpf/progs/exceptions_cleanup_fail.c


base-commit: 77326a4a06e1e97432322f403cb439880871d34d
-- 
2.40.1



* [RFC PATCH v1 01/14] bpf: Mark subprogs as throw reachable before do_check pass
  2024-02-01  4:20 [RFC PATCH v1 00/14] Exceptions - Resource Cleanup Kumar Kartikeya Dwivedi
@ 2024-02-01  4:20 ` Kumar Kartikeya Dwivedi
  2024-02-12 19:35   ` David Vernet
  2024-02-15  1:01   ` Eduard Zingerman
  2024-02-01  4:20 ` [RFC PATCH v1 02/14] bpf: Process global subprog's exception propagation Kumar Kartikeya Dwivedi
                   ` (13 subsequent siblings)
  14 siblings, 2 replies; 53+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2024-02-01  4:20 UTC (permalink / raw)
  To: bpf
  Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Martin KaFai Lau, David Vernet, Tejun Heo, Raj Sahu,
	Dan Williams, Rishabh Iyer, Sanidhya Kashyap

The motivation of this patch is to figure out which subprogs
participate in exception propagation, i.e., which subprogs' execution
can lead to an exception being thrown, either directly or indirectly
(by way of calling other subprogs).

With the current exceptions support, the runtime performs stack
unwinding when bpf_throw is called. For now, any resources acquired by
the program cannot be released; therefore, bpf_throw calls made with
non-zero acquired references must be rejected during verification.

However, there currently exists a loophole in this restriction due to
the way the verification procedure is structured. The verifier will
first walk over the main subprog's instructions, but not descend into
subprog calls to ones with global linkage. These global subprogs will
then be independently verified instead. Therefore, in a situation where
a global subprog ends up throwing an exception (either directly by
calling bpf_throw, or indirectly by way of calling another subprog that
does so), the verifier will fail to notice this fact and may permit
throwing BPF exceptions with non-zero acquired references.

Therefore, to fix this, we add a summarization pass before the
do_check stage which walks all call chains of the program and marks all
of the subprogs from which a bpf_throw call (which unwinds the program
stack) is reachable.

We only do so if we actually see a bpf_throw call in the program
though, since we do not want to walk all instructions unless we need
to. Once we have analyzed all possible call chains of the program, we
mark the subprogs on them as 'is_throw_reachable' in their
subprog_info.
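
To illustrate with a hypothetical call graph:

	main -> f1 -> f2 (global) -> bpf_throw
	     -> f3

Here f2, f1, and main would be marked is_throw_reachable, while f3
would not.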

After performing this step, we need to make another change as to how
subprog call verification occurs. In case of a global subprog, we will
need to explore an alternate program path where the call instruction to
the global subprog immediately throws an exception. We thus simulate a
normal path without any exceptions, and one where the exception is
thrown and the program proceeds no further. In this way, the verifier
will be able to detect whether any acquired references or locks exist
in the verifier state and reject the program if needed.

Fixes: f18b03fabaa9 ("bpf: Implement BPF exceptions")
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
---
 include/linux/bpf_verifier.h |  2 +
 kernel/bpf/verifier.c        | 86 ++++++++++++++++++++++++++++++++++++
 2 files changed, 88 insertions(+)

diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h
index 0dcde339dc7e..1d666b6c21e6 100644
--- a/include/linux/bpf_verifier.h
+++ b/include/linux/bpf_verifier.h
@@ -626,6 +626,7 @@ struct bpf_subprog_info {
 	bool is_async_cb: 1;
 	bool is_exception_cb: 1;
 	bool args_cached: 1;
+	bool is_throw_reachable: 1;
 
 	u8 arg_cnt;
 	struct bpf_subprog_arg_info args[MAX_BPF_FUNC_REG_ARGS];
@@ -691,6 +692,7 @@ struct bpf_verifier_env {
 	bool bypass_spec_v4;
 	bool seen_direct_write;
 	bool seen_exception;
+	bool seen_throw_insn;
 	struct bpf_insn_aux_data *insn_aux_data; /* array of per-insn state */
 	const struct bpf_line_info *prev_linfo;
 	struct bpf_verifier_log log;
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index cd4d780e5400..bba53c4e3a0c 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -2941,6 +2941,8 @@ static int check_subprogs(struct bpf_verifier_env *env)
 		    insn[i].src_reg == 0 &&
 		    insn[i].imm == BPF_FUNC_tail_call)
 			subprog[cur_subprog].has_tail_call = true;
+		if (!env->seen_throw_insn && is_bpf_throw_kfunc(&insn[i]))
+			env->seen_throw_insn = true;
 		if (BPF_CLASS(code) == BPF_LD &&
 		    (BPF_MODE(code) == BPF_ABS || BPF_MODE(code) == BPF_IND))
 			subprog[cur_subprog].has_ld_abs = true;
@@ -5866,6 +5868,9 @@ static int check_max_stack_depth_subprog(struct bpf_verifier_env *env, int idx)
 
 			if (!is_bpf_throw_kfunc(insn + i))
 				continue;
+			/* When this is allowed, don't forget to update logic for sync and
+			 * async callbacks in mark_exception_reachable_subprogs.
+			 */
 			if (subprog[idx].is_cb)
 				err = true;
 			for (int c = 0; c < frame && !err; c++) {
@@ -16205,6 +16210,83 @@ static int check_btf_info(struct bpf_verifier_env *env,
 	return 0;
 }
 
+/* We walk the call graph of the program in this function, and mark everything in
+ * the call chain as 'is_throw_reachable'. This allows us to know which subprog
+ * calls may propagate an exception and generate exception frame descriptors for
+ * those call instructions. We already do that for bpf_throw calls made directly,
+ * but we need to mark the subprogs as we won't be able to see the call chains
+ * during symbolic execution in do_check_common due to global subprogs.
+ *
+ * Note that unlike check_max_stack_depth, we don't explore the async callbacks
+ * apart from main subprogs, as we don't support throwing from them for now,
+ * but we still follow them, since at this stage they cannot be distinguished
+ * from sync callbacks (see the comment in the walk below).
+ */
+static int mark_exception_reachable_subprogs(struct bpf_verifier_env *env)
+{
+	struct bpf_subprog_info *subprog = env->subprog_info;
+	struct bpf_insn *insn = env->prog->insnsi;
+	int idx = 0, frame = 0, i, subprog_end;
+	int ret_insn[MAX_CALL_FRAMES];
+	int ret_prog[MAX_CALL_FRAMES];
+
+	/* No need if we never saw any bpf_throw() call in the program. */
+	if (!env->seen_throw_insn)
+		return 0;
+
+	i = subprog[idx].start;
+restart:
+	subprog_end = subprog[idx + 1].start;
+	for (; i < subprog_end; i++) {
+		int next_insn, sidx;
+
+		if (bpf_pseudo_kfunc_call(insn + i) && !insn[i].off) {
+			if (!is_bpf_throw_kfunc(insn + i))
+				continue;
+			subprog[idx].is_throw_reachable = true;
+			for (int j = 0; j < frame; j++)
+				subprog[ret_prog[j]].is_throw_reachable = true;
+		}
+
+		if (!bpf_pseudo_call(insn + i) && !bpf_pseudo_func(insn + i))
+			continue;
+		/* remember insn and function to return to */
+		ret_insn[frame] = i + 1;
+		ret_prog[frame] = idx;
+
+		/* find the callee */
+		next_insn = i + insn[i].imm + 1;
+		sidx = find_subprog(env, next_insn);
+		if (sidx < 0) {
+			WARN_ONCE(1, "verifier bug. No program starts at insn %d\n", next_insn);
+			return -EFAULT;
+		}
+		/* We cannot distinguish between sync or async cb, so we need to follow
+		 * both.  Async callbacks don't really propagate exceptions but calling
+		 * bpf_throw from them is not allowed anyway, so there is no harm in
+		 * exploring them.
+		 * TODO: To address this properly, we will have to move is_cb,
+		 * is_async_cb markings to the stage before do_check.
+		 */
+		i = next_insn;
+		idx = sidx;
+
+		frame++;
+		if (frame >= MAX_CALL_FRAMES) {
+			verbose(env, "the call stack of %d frames is too deep !\n", frame);
+			return -E2BIG;
+		}
+		goto restart;
+	}
+	/* end of for() loop means the last insn of the 'subprog'
+	 * was reached. Doesn't matter whether it was JA or EXIT
+	 */
+	if (frame == 0)
+		return 0;
+	frame--;
+	i = ret_insn[frame];
+	idx = ret_prog[frame];
+	goto restart;
+}
+
 /* check %cur's range satisfies %old's */
 static bool range_within(struct bpf_reg_state *old,
 			 struct bpf_reg_state *cur)
@@ -20939,6 +21021,10 @@ int bpf_check(struct bpf_prog **prog, union bpf_attr *attr, bpfptr_t uattr, __u3
 	if (ret < 0)
 		goto skip_full_check;
 
+	ret = mark_exception_reachable_subprogs(env);
+	if (ret < 0)
+		goto skip_full_check;
+
 	ret = do_check_main(env);
 	ret = ret ?: do_check_subprogs(env);
 
-- 
2.40.1



* [RFC PATCH v1 02/14] bpf: Process global subprog's exception propagation
  2024-02-01  4:20 [RFC PATCH v1 00/14] Exceptions - Resource Cleanup Kumar Kartikeya Dwivedi
  2024-02-01  4:20 ` [RFC PATCH v1 01/14] bpf: Mark subprogs as throw reachable before do_check pass Kumar Kartikeya Dwivedi
@ 2024-02-01  4:20 ` Kumar Kartikeya Dwivedi
  2024-02-15  1:10   ` Eduard Zingerman
  2024-02-01  4:20 ` [RFC PATCH v1 03/14] selftests/bpf: Add test for throwing global subprog with acquired refs Kumar Kartikeya Dwivedi
                   ` (12 subsequent siblings)
  14 siblings, 1 reply; 53+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2024-02-01  4:20 UTC (permalink / raw)
  To: bpf
  Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Martin KaFai Lau, David Vernet, Tejun Heo, Raj Sahu,
	Dan Williams, Rishabh Iyer, Sanidhya Kashyap

Global subprogs are not descended into during symbolic execution, but
we summarized whether they can throw an exception (i.e., whether a
bpf_throw call is reachable from them, directly or through another
throwing subprog) in mark_exception_reachable_subprogs, added by the
previous patch.

We must now ensure that we explore the path of the program where
invoking the call instruction leads to an exception being thrown, so
that we can correctly reject programs where it is not permissible to
throw an exception.  For instance, it might be permissible to throw from
a global subprog, but its caller may hold references. Without this
patch, the verifier will accept such programs.
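
For example (roughly the shape of the selftest added in the next
patch):

	__noinline int throwing_global_subprog(struct __sk_buff *ctx)
	{
		if (ctx->len)
			bpf_throw(0);
		return 0;
	}

	SEC("?tc")
	int caller(struct __sk_buff *ctx)
	{
		struct { long a; } *p = bpf_obj_new(typeof(*p));

		if (!p)
			return 0;
		throwing_global_subprog(ctx); /* may throw with p held */
		bpf_obj_drop(p);
		return 0;
	}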

To do this, we use push_stack to push a separate branch into the
branch stack of the verifier, with the same current and previous
insn_idx. Then, we set a bit in the verifier state of the branch to
indicate that the next instruction it processes is a global subprog
call which will throw an exception. When we encounter this instruction,
the bit is cleared.

Special care must be taken to update the state pruning logic, as
without any changes, it is possible that we end up pruning when popping
the exception throwing state for exploration. Therefore, while we can
never have the 'global_subprog_call_exception' bit set in an explored
(cached) verifier state, we will see it in the current state, and use
this to reject pruning requests and continue its exploration.

Note that we process the exception after processing the call
instruction, similar to how we do a process_bpf_exit_full jump in case
of bpf_throw kfuncs.

Fixes: f18b03fabaa9 ("bpf: Implement BPF exceptions")
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
---
 include/linux/bpf_verifier.h |  1 +
 kernel/bpf/verifier.c        | 22 ++++++++++++++++++++++
 2 files changed, 23 insertions(+)

diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h
index 1d666b6c21e6..5482701e6ad9 100644
--- a/include/linux/bpf_verifier.h
+++ b/include/linux/bpf_verifier.h
@@ -426,6 +426,7 @@ struct bpf_verifier_state {
 	 * while they are still in use.
 	 */
 	bool used_as_loop_entry;
+	bool global_subprog_call_exception;
 
 	/* first and last insn idx of this verifier state */
 	u32 first_insn_idx;
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index bba53c4e3a0c..622c638b123b 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -1418,6 +1418,7 @@ static int copy_verifier_state(struct bpf_verifier_state *dst_state,
 	dst_state->dfs_depth = src->dfs_depth;
 	dst_state->callback_unroll_depth = src->callback_unroll_depth;
 	dst_state->used_as_loop_entry = src->used_as_loop_entry;
+	dst_state->global_subprog_call_exception = src->global_subprog_call_exception;
 	for (i = 0; i <= src->curframe; i++) {
 		dst = dst_state->frame[i];
 		if (!dst) {
@@ -9497,6 +9498,15 @@ static int check_func_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
 
 		verbose(env, "Func#%d ('%s') is global and assumed valid.\n",
 			subprog, sub_name);
+		if (subprog_info(env, subprog)->is_throw_reachable && !env->cur_state->global_subprog_call_exception) {
+			struct bpf_verifier_state *branch = push_stack(env, env->insn_idx, env->prev_insn_idx, false);
+
+			if (!branch) {
+				verbose(env, "verifier internal error: cannot push branch to explore exception of global subprog\n");
+				return -EFAULT;
+			}
+			branch->global_subprog_call_exception = true;
+		}
 		/* mark global subprog for verifying after main prog */
 		subprog_aux(env, subprog)->called = true;
 		clear_caller_saved_regs(env, caller->regs);
@@ -9505,6 +9515,9 @@ static int check_func_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
 		mark_reg_unknown(env, caller->regs, BPF_REG_0);
 		caller->regs[BPF_REG_0].subreg_def = DEF_NOT_SUBREG;
 
+		if (env->cur_state->global_subprog_call_exception)
+			verbose(env, "Func#%d ('%s') may throw exception, exploring program path where exception is thrown\n",
+				subprog, sub_name);
 		/* continue with next insn after call */
 		return 0;
 	}
@@ -16784,6 +16797,10 @@ static bool states_equal(struct bpf_verifier_env *env,
 	if (old->active_rcu_lock != cur->active_rcu_lock)
 		return false;
 
+	/* Prevent pruning to explore state where global subprog call throws an exception. */
+	if (cur->global_subprog_call_exception)
+		return false;
+
 	/* for states to be equal callsites have to be the same
 	 * and all frame states need to be equivalent
 	 */
@@ -17675,6 +17692,11 @@ static int do_check(struct bpf_verifier_env *env)
 				}
 				if (insn->src_reg == BPF_PSEUDO_CALL) {
 					err = check_func_call(env, insn, &env->insn_idx);
+					if (!err && env->cur_state->global_subprog_call_exception) {
+						env->cur_state->global_subprog_call_exception = false;
+						exception_exit = true;
+						goto process_bpf_exit_full;
+					}
 				} else if (insn->src_reg == BPF_PSEUDO_KFUNC_CALL) {
 					err = check_kfunc_call(env, insn, &env->insn_idx);
 					if (!err && is_bpf_throw_kfunc(insn)) {
-- 
2.40.1



* [RFC PATCH v1 03/14] selftests/bpf: Add test for throwing global subprog with acquired refs
  2024-02-01  4:20 [RFC PATCH v1 00/14] Exceptions - Resource Cleanup Kumar Kartikeya Dwivedi
  2024-02-01  4:20 ` [RFC PATCH v1 01/14] bpf: Mark subprogs as throw reachable before do_check pass Kumar Kartikeya Dwivedi
  2024-02-01  4:20 ` [RFC PATCH v1 02/14] bpf: Process global subprog's exception propagation Kumar Kartikeya Dwivedi
@ 2024-02-01  4:20 ` Kumar Kartikeya Dwivedi
  2024-02-15  1:10   ` Eduard Zingerman
  2024-02-01  4:20 ` [RFC PATCH v1 04/14] bpf: Refactor check_pseudo_btf_id's BTF reference bump Kumar Kartikeya Dwivedi
                   ` (11 subsequent siblings)
  14 siblings, 1 reply; 53+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2024-02-01  4:20 UTC (permalink / raw)
  To: bpf
  Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Martin KaFai Lau, David Vernet, Tejun Heo, Raj Sahu,
	Dan Williams, Rishabh Iyer, Sanidhya Kashyap

Add a test case to exercise verifier logic where a global function
that may potentially throw an exception is invoked from the main
subprog, such that during exploration, the reference state is not
visible when the bpf_throw instruction is explored. Without the fixes
in prior commits, the verifier will not complain about unreleased
resources lingering in the program when a possible exception may be
thrown.

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
---
 .../selftests/bpf/progs/exceptions_fail.c     | 21 +++++++++++++++++++
 1 file changed, 21 insertions(+)

diff --git a/tools/testing/selftests/bpf/progs/exceptions_fail.c b/tools/testing/selftests/bpf/progs/exceptions_fail.c
index 9cceb6521143..28602f905d7d 100644
--- a/tools/testing/selftests/bpf/progs/exceptions_fail.c
+++ b/tools/testing/selftests/bpf/progs/exceptions_fail.c
@@ -146,6 +146,13 @@ __noinline static int throwing_subprog(struct __sk_buff *ctx)
 	return 0;
 }
 
+__noinline int throwing_global_subprog(struct __sk_buff *ctx)
+{
+	if (ctx->len)
+		bpf_throw(0);
+	return 0;
+}
+
 SEC("?tc")
 __failure __msg("bpf_rcu_read_unlock is missing")
 int reject_subprog_with_rcu_read_lock(void *ctx)
@@ -346,4 +353,18 @@ int reject_exception_throw_cb_diff(struct __sk_buff *ctx)
 	return 0;
 }
 
+SEC("?tc")
+__failure __msg("exploring program path where exception is thrown")
+int reject_exception_throw_ref_call_throwing_global(struct __sk_buff *ctx)
+{
+	struct { long a; } *p = bpf_obj_new(typeof(*p));
+
+	if (!p)
+		return 0;
+	if (ctx->protocol)
+		throwing_global_subprog(ctx);
+	bpf_obj_drop(p);
+	return 0;
+}
+
 char _license[] SEC("license") = "GPL";
-- 
2.40.1


^ permalink raw reply related	[flat|nested] 53+ messages in thread

* [RFC PATCH v1 04/14] bpf: Refactor check_pseudo_btf_id's BTF reference bump
  2024-02-01  4:20 [RFC PATCH v1 00/14] Exceptions - Resource Cleanup Kumar Kartikeya Dwivedi
                   ` (2 preceding siblings ...)
  2024-02-01  4:20 ` [RFC PATCH v1 03/14] selftests/bpf: Add test for throwing global subprog with acquired refs Kumar Kartikeya Dwivedi
@ 2024-02-01  4:20 ` Kumar Kartikeya Dwivedi
  2024-02-15  1:11   ` Eduard Zingerman
  2024-02-01  4:21 ` [RFC PATCH v1 05/14] bpf: Implement BPF exception frame descriptor generation Kumar Kartikeya Dwivedi
                   ` (10 subsequent siblings)
  14 siblings, 1 reply; 53+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2024-02-01  4:20 UTC (permalink / raw)
  To: bpf
  Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Martin KaFai Lau, David Vernet, Tejun Heo, Raj Sahu,
	Dan Williams, Rishabh Iyer, Sanidhya Kashyap

Refactor check_pseudo_btf_id's code which adds a new BTF reference to
the used_btfs into a separate helper function called add_used_btf. This
will be useful later in exception frame generation to take BTF
references along with their modules, so that we can keep alive the
modules whose functions may be required to unwind a given BPF program
when it eventually throws an exception.

While typically module references should already be held in such a case,
since the program will have used a kfunc to acquire a reference that it
did not clean up before throwing an exception, there are corner cases
where this may not be true (e.g. one program producing the object, and
another simply using bpf_kptr_xchg, and not having a kfunc call into the
module). Therefore, it is more prudent to simply bump the reference
whenever we encounter such cases for exception frame generation.

add_used_btf takes an input BTF object with its reference count
already raised, and consumes that reference count in case of successful
insertion. In case of an error, the caller is responsible for releasing
the reference.
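
A caller is thus expected to follow this pattern (as the frame
descriptor generation code added later in this series does):

	btf_get(btf);
	err = add_used_btf(env, btf);
	if (err < 0) {
		btf_put(btf); /* on error, we still own the reference */
		return err;
	}
	/* on success, the reference is owned by env->used_btfs */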

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
---
 kernel/bpf/verifier.c | 70 ++++++++++++++++++++++++-------------------
 1 file changed, 40 insertions(+), 30 deletions(-)

diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 622c638b123b..03ad9a9d47c9 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -17861,6 +17861,42 @@ static int find_btf_percpu_datasec(struct btf *btf)
 	return -ENOENT;
 }
 
+static int add_used_btf(struct bpf_verifier_env *env, struct btf *btf)
+{
+	struct btf_mod_pair *btf_mod;
+	int i, err;
+
+	/* check whether we recorded this BTF (and maybe module) already */
+	for (i = 0; i < env->used_btf_cnt; i++) {
+		if (env->used_btfs[i].btf == btf) {
+			btf_put(btf);
+			return 0;
+		}
+	}
+
+	if (env->used_btf_cnt >= MAX_USED_BTFS) {
+		err = -E2BIG;
+		goto err;
+	}
+
+	btf_mod = &env->used_btfs[env->used_btf_cnt];
+	btf_mod->btf = btf;
+	btf_mod->module = NULL;
+
+	/* if we reference variables from kernel module, bump its refcount */
+	if (btf_is_module(btf)) {
+		btf_mod->module = btf_try_get_module(btf);
+		if (!btf_mod->module) {
+			err = -ENXIO;
+			goto err;
+		}
+	}
+	env->used_btf_cnt++;
+	return 0;
+err:
+	return err;
+}
+
 /* replace pseudo btf_id with kernel symbol address */
 static int check_pseudo_btf_id(struct bpf_verifier_env *env,
 			       struct bpf_insn *insn,
@@ -17868,7 +17904,6 @@ static int check_pseudo_btf_id(struct bpf_verifier_env *env,
 {
 	const struct btf_var_secinfo *vsi;
 	const struct btf_type *datasec;
-	struct btf_mod_pair *btf_mod;
 	const struct btf_type *t;
 	const char *sym_name;
 	bool percpu = false;
@@ -17921,7 +17956,7 @@ static int check_pseudo_btf_id(struct bpf_verifier_env *env,
 	if (btf_type_is_func(t)) {
 		aux->btf_var.reg_type = PTR_TO_MEM | MEM_RDONLY;
 		aux->btf_var.mem_size = 0;
-		goto check_btf;
+		goto add_btf;
 	}
 
 	datasec_id = find_btf_percpu_datasec(btf);
@@ -17962,35 +17997,10 @@ static int check_pseudo_btf_id(struct bpf_verifier_env *env,
 		aux->btf_var.btf = btf;
 		aux->btf_var.btf_id = type;
 	}
-check_btf:
-	/* check whether we recorded this BTF (and maybe module) already */
-	for (i = 0; i < env->used_btf_cnt; i++) {
-		if (env->used_btfs[i].btf == btf) {
-			btf_put(btf);
-			return 0;
-		}
-	}
-
-	if (env->used_btf_cnt >= MAX_USED_BTFS) {
-		err = -E2BIG;
+add_btf:
+	err = add_used_btf(env, btf);
+	if (err < 0)
 		goto err_put;
-	}
-
-	btf_mod = &env->used_btfs[env->used_btf_cnt];
-	btf_mod->btf = btf;
-	btf_mod->module = NULL;
-
-	/* if we reference variables from kernel module, bump its refcount */
-	if (btf_is_module(btf)) {
-		btf_mod->module = btf_try_get_module(btf);
-		if (!btf_mod->module) {
-			err = -ENXIO;
-			goto err_put;
-		}
-	}
-
-	env->used_btf_cnt++;
-
 	return 0;
 err_put:
 	btf_put(btf);
-- 
2.40.1



* [RFC PATCH v1 05/14] bpf: Implement BPF exception frame descriptor generation
  2024-02-01  4:20 [RFC PATCH v1 00/14] Exceptions - Resource Cleanup Kumar Kartikeya Dwivedi
                   ` (3 preceding siblings ...)
  2024-02-01  4:20 ` [RFC PATCH v1 04/14] bpf: Refactor check_pseudo_btf_id's BTF reference bump Kumar Kartikeya Dwivedi
@ 2024-02-01  4:21 ` Kumar Kartikeya Dwivedi
  2024-02-15 18:24   ` Eduard Zingerman
  2024-02-01  4:21 ` [RFC PATCH v1 06/14] bpf: Adjust frame descriptor pc on instruction patching Kumar Kartikeya Dwivedi
                   ` (9 subsequent siblings)
  14 siblings, 1 reply; 53+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2024-02-01  4:21 UTC (permalink / raw)
  To: bpf
  Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Martin KaFai Lau, David Vernet, Tejun Heo, Raj Sahu,
	Dan Williams, Rishabh Iyer, Sanidhya Kashyap

Introduce support in the verifier to generate a set of descriptors for
each BPF frame, describing the register and stack state. These can be
used to reason about the resources acquired by the program at a
particular program point, and subsequent patches introduce support to
unwind a given program when it throws an exception while holding
ownership of such resources.

Descriptors generated for each frame are then tied back to the subprog
they belong to, and attached to the bpf_prog instance during the JIT
phase, with the program counter serving as a key to find the descriptor
of interest for a given subprog.

Logically, during the unwinding phase, for each frame, we will use the
program counter and bpf_prog object to figure out how we should release
acquired resources if any in the frame.

Let's study how the frame descriptor generation algorithm works.
Whenever an exception throwing instruction is encountered, that is, a
call to a global subprog which is throw reachable, or the bpf_throw
kfunc, we call gen_exception_frame_descs.

This function will start with the current frame, and explore the
registers and other objects on the current stack. We work at an 8-byte
granularity, since all registers spilled on the stack and objects like
dynptr and iter are 8-byte aligned. For each such stack entry, we
inspect the slot_type and figure out whether it is a spilled register
or a dynptr/iter object.

For any acquired resources on the stack, we insert entries representing
them into a frame descriptor table for the current subprog at the
current instruction index.

The same steps are repeated for registers that are callee saved, as
these may be spilled on the stack of one of the frames in the call
chain and would have to be located in order to be freed.

In case of registers (spilled or callee saved), we make a special
provision for register_is_null scalar values, to increase the chances of
merging frame descriptors where the only divergence is NULL in one state
being replaced with a valid pointer in another.

The next important step is the logic to merge the frame descriptors. It
is possible that the verifier reaches the same instruction index in a
program from multiple paths, and has to generate frame descriptors for
them at that program counter. In such a case, we always ensure that
after generating the frame descriptor, we attempt to "merge" it with an
existing one.

The merging rules are fairly simple except for a few caveats. First,
if the layout and type of objects on the stack and in registers is the
same, we have a successful merge. Next, in case of registers (spilled
or callee saved), we have a special case where if the old entry has
NULL, the new type (non-NULL) replaces it, and if the new entry has
NULL, it satisfies the merge rules with the old entry (which can be of
any type).

This helps in cases where we have an optional value held in a register
or stack slot in one program path, which is replaced by the actual
value in the other program path. This can also happen with
conditionals, where the verifier may see acquired references in the
verifier state depending on whether a condition is true (therefore, not
in all of the program paths traversing the same instruction).
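
Summarizing the merge rules for reg/stack entries (dynptr and iter
entries always require an exact match):

	old entry	new entry	result
	---------	---------	------
	type T		type T		merge succeeds (no change)
	NULL		type T		merge succeeds (old becomes T)
	type T		NULL		merge succeeds (old stays T)
	type T		type U		merge fails, program rejected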

To illustrate with an example, in the following program:

struct foo *p = NULL;
if (x)
	p = bpf_obj_new(typeof(*p));
if (y)
	bpf_throw(0);
if (p)
	bpf_obj_drop(p);

In such a case, bpf_throw may be reached for x == 0, y == 1 and x == 1,
y == 1, with two possible values of p. As long as both can be passed
into the release function (i.e. NULL or a valid pointer value), we can
satisfy the merge.

TODO: We need to reserve a slot for STACK_ZERO as well.
TODO: Improve the error message in case we have pointer and misc instead of zero.

Currently, we only consider resources which are modelled as acquired
references in verifier state. In particular, this excludes resources
like held spinlocks and RCU read sections. For now, both of these will
not be handled, and the verifier will continue to complain when
exceptions are thrown in their presence.

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
---
 include/linux/bpf.h                           |  27 ++
 include/linux/bpf_verifier.h                  |   2 +
 kernel/bpf/core.c                             |  13 +
 kernel/bpf/verifier.c                         | 368 ++++++++++++++++++
 .../selftests/bpf/progs/exceptions_fail.c     |   4 +-
 5 files changed, 412 insertions(+), 2 deletions(-)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 1ebbee1d648e..463c8d22ad72 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -1424,6 +1424,7 @@ struct btf_mod_pair {
 };
 
 struct bpf_kfunc_desc_tab;
+struct bpf_exception_frame_desc_tab;
 
 struct bpf_prog_aux {
 	atomic64_t refcnt;
@@ -1518,6 +1519,7 @@ struct bpf_prog_aux {
 	struct module *mod;
 	u32 num_exentries;
 	struct exception_table_entry *extable;
+	struct bpf_exception_frame_desc_tab *fdtab;
 	union {
 		struct work_struct work;
 		struct rcu_head	rcu;
@@ -3367,4 +3369,29 @@ static inline bool bpf_is_subprog(const struct bpf_prog *prog)
 	return prog->aux->func_idx != 0;
 }
 
+struct bpf_frame_desc_reg_entry {
+	u32 type;
+	s16 spill_type;
+	union {
+		s16 off;
+		u16 regno;
+	};
+	struct btf *btf;
+	u32 btf_id;
+};
+
+struct bpf_exception_frame_desc {
+	u64 pc;
+	u32 stack_cnt;
+	struct bpf_frame_desc_reg_entry regs[4];
+	struct bpf_frame_desc_reg_entry stack[];
+};
+
+struct bpf_exception_frame_desc_tab {
+	u32 cnt;
+	struct bpf_exception_frame_desc **desc;
+};
+
+void bpf_exception_frame_desc_tab_free(struct bpf_exception_frame_desc_tab *fdtab);
+
 #endif /* _LINUX_BPF_H */
diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h
index 5482701e6ad9..0113a3a940e2 100644
--- a/include/linux/bpf_verifier.h
+++ b/include/linux/bpf_verifier.h
@@ -631,6 +631,8 @@ struct bpf_subprog_info {
 
 	u8 arg_cnt;
 	struct bpf_subprog_arg_info args[MAX_BPF_FUNC_REG_ARGS];
+
+	struct bpf_exception_frame_desc_tab *fdtab;
 };
 
 struct bpf_verifier_env;
diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
index 71c459a51d9e..995a4dcfa970 100644
--- a/kernel/bpf/core.c
+++ b/kernel/bpf/core.c
@@ -2734,6 +2734,14 @@ static void bpf_free_used_btfs(struct bpf_prog_aux *aux)
 	kfree(aux->used_btfs);
 }
 
+void bpf_exception_frame_desc_tab_free(struct bpf_exception_frame_desc_tab *fdtab)
+{
+	if (!fdtab)
+		return;
+	for (int i = 0; i < fdtab->cnt; i++)
+		kfree(fdtab->desc[i]);
+	kfree(fdtab->desc);
+	kfree(fdtab);
+}
+
 static void bpf_prog_free_deferred(struct work_struct *work)
 {
 	struct bpf_prog_aux *aux;
@@ -2747,6 +2755,11 @@ static void bpf_prog_free_deferred(struct work_struct *work)
 	if (aux->cgroup_atype != CGROUP_BPF_ATTACH_TYPE_INVALID)
 		bpf_cgroup_atype_put(aux->cgroup_atype);
 #endif
+	/* Free all exception frame descriptors */
+	for (int i = 0; i < aux->func_cnt; i++) {
+		bpf_exception_frame_desc_tab_free(aux->func[i]->aux->fdtab);
+		aux->func[i]->aux->fdtab = NULL;
+	}
 	bpf_free_used_maps(aux);
 	bpf_free_used_btfs(aux);
 	if (bpf_prog_is_dev_bound(aux))
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 03ad9a9d47c9..27233c308d83 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -10004,6 +10004,366 @@ record_func_key(struct bpf_verifier_env *env, struct bpf_call_arg_meta *meta,
 	return 0;
 }
 
+static void print_frame_desc_reg_entry(struct bpf_verifier_env *env, struct bpf_frame_desc_reg_entry *fd, const char *pfx)
+{
+	const char *type = fd->off < 0 ? "stack" : "reg";
+	const char *key = fd->off < 0 ? "off" : "regno";
+	const char *spill_type;
+
+	switch (fd->spill_type) {
+	case STACK_INVALID:
+		spill_type = "<unknown>";
+		break;
+	case STACK_SPILL:
+		spill_type = "reg";
+		break;
+	case STACK_DYNPTR:
+		spill_type = "dynptr";
+		break;
+	case STACK_ITER:
+		spill_type = "iter";
+		break;
+	default:
+		spill_type = "???";
+		break;
+	}
+	verbose(env, "frame_desc: %s%s fde: %s=%d spill_type=%s ", pfx, type, key, fd->off, spill_type);
+	if (fd->btf) {
+		const struct btf_type *t = btf_type_by_id(fd->btf, fd->btf_id);
+
+		verbose(env, "type=%s%s btf=%s btf_id=%d\n", fd->off < 0 ? "" : "ptr_",
+			btf_name_by_offset(fd->btf, t->name_off), btf_get_name(fd->btf), fd->btf_id);
+	} else {
+		verbose(env, "type=%s\n", fd->spill_type == STACK_DYNPTR ? "ringbuf" : reg_type_str(env, fd->type));
+	}
+}
+
+static int merge_frame_desc(struct bpf_verifier_env *env, struct bpf_frame_desc_reg_entry *ofd, struct bpf_frame_desc_reg_entry *fd)
+{
+	int ofd_type, fd_type;
+
+	/* If ofd->off/regno is 0, this is uninitialized reg entry, just merge new entry. */
+	if (!ofd->off)
+		goto merge_new;
+	/* Exact merge for dynptr, iter, reg, stack. */
+	if (!memcmp(ofd, fd, sizeof(*ofd)))
+		goto none;
+	/* First, for a successful merge, both spill_type fields should be the same. */
+	if (ofd->spill_type != fd->spill_type)
+		goto fail;
+	/* Then, both should correspond to a reg or stack entry for non-exact merge. */
+	if (ofd->spill_type != STACK_SPILL && ofd->spill_type != STACK_INVALID)
+		goto fail;
+	ofd_type = ofd->type;
+	fd_type = fd->type;
+	/* One of the old or new entry must be NULL, if both are not same. */
+	if (ofd_type == fd_type)
+		goto none;
+	if (ofd_type != SCALAR_VALUE && fd_type != SCALAR_VALUE)
+		goto fail;
+	if (fd_type == SCALAR_VALUE)
+		goto none;
+	verbose(env, "frame_desc: merge: merging new frame desc entry into old\n");
+	print_frame_desc_reg_entry(env, ofd, "old ");
+merge_new:
+	if (!ofd->off)
+		verbose(env, "frame_desc: merge: creating new frame desc entry\n");
+	print_frame_desc_reg_entry(env, fd, "new ");
+	*ofd = *fd;
+	return 0;
+none:
+	verbose(env, "frame_desc: merge: no merging needed of new frame desc entry into old\n");
+	print_frame_desc_reg_entry(env, ofd, "old ");
+	print_frame_desc_reg_entry(env, fd, "new ");
+	return 0;
+fail:
+	verbose(env, "frame_desc: merge: failed to merge old and new frame desc entry\n");
+	print_frame_desc_reg_entry(env, ofd, "old ");
+	print_frame_desc_reg_entry(env, fd, "new ");
+	return -EINVAL;
+}
+
+static int find_and_merge_frame_desc(struct bpf_verifier_env *env, struct bpf_exception_frame_desc_tab *fdtab, u64 pc, struct bpf_frame_desc_reg_entry *fd)
+{
+	struct bpf_exception_frame_desc **descs = NULL, *desc = NULL, *p;
+	int ret = 0;
+
+	for (int i = 0; i < fdtab->cnt; i++) {
+		if (pc != fdtab->desc[i]->pc)
+			continue;
+		descs = &fdtab->desc[i];
+		desc = fdtab->desc[i];
+		break;
+	}
+
+	if (!desc) {
+		verbose(env, "frame_desc: find_and_merge: cannot find frame descriptor for pc=%llu, creating new entry\n", pc);
+		return -ENOENT;
+	}
+
+	if (fd->off < 0)
+		goto stack;
+	/* For registers, the entry's slot is fixed by regno, so merge directly. */
+	return merge_frame_desc(env, &desc->regs[fd->regno - BPF_REG_6], fd);
+
+stack:
+	for (int i = 0; i < desc->stack_cnt; i++) {
+		struct bpf_frame_desc_reg_entry *ofd = desc->stack + i;
+
+		if (ofd->off != fd->off)
+			continue;
+		ret = merge_frame_desc(env, ofd, fd);
+		if (ret < 0)
+			return ret;
+		return 0;
+	}
+	p = krealloc(desc, offsetof(typeof(*desc), stack[desc->stack_cnt + 1]), GFP_USER | __GFP_ZERO);
+	if (!p) {
+		return -ENOMEM;
+	}
+	verbose(env, "frame_desc: merge: creating new frame desc entry\n");
+	print_frame_desc_reg_entry(env, fd, "new ");
+	desc = p;
+	desc->stack[desc->stack_cnt] = *fd;
+	desc->stack_cnt++;
+	*descs = desc;
+	return 0;
+}
+
+/* Implementation details:
+ * This function is responsible for pushing a prepared bpf_frame_desc_reg_entry
+ * into the frame descriptor array tied to each subprog. The first step is
+ * ensuring the array is allocated and has enough capacity. Second, we must find
+ * if there is an existing descriptor already for the program counter under
+ * consideration, and try to report an error if we see conflicting frame
+ * descriptor generation requests for the same instruction in the program.
+ * Note that by default, we let NULL registers and stack slots occupy an entry.
+ * This is done so that any future non-NULL registers or stack slots at the same
+ * regno or offset can be satisfied by changing the type of entry to a "stronger"
+ * pointer type. The release handler can deal with NULL or valid values,
+ * therefore such a logic allows handling cases where the program may only have
+ * a pointer in some of the program paths and NULL in others while reaching the
+ * same instruction that causes an exception to be thrown.
+ * Likewise, a NULL entry merges into the stronger pointer type entry when a
+ * frame descriptor already exists before pushing a new one.
+ */
+static int push_exception_frame_desc(struct bpf_verifier_env *env, int frameno, struct bpf_frame_desc_reg_entry *fd)
+{
+	struct bpf_func_state *frame = env->cur_state->frame[frameno], *curframe = cur_func(env);
+	struct bpf_subprog_info *si = subprog_info(env, frame->subprogno);
+	struct bpf_exception_frame_desc_tab *fdtab = si->fdtab;
+	struct bpf_exception_frame_desc **desc;
+	u64 pc = env->insn_idx;
+	int ret;
+
+	/* If this is not the current frame, then we need to figure out the callsite
+	 * for its callee to identify the pc.
+	 */
+	if (frameno != curframe->frameno)
+		pc = env->cur_state->frame[frameno + 1]->callsite;
+
+	if (!fdtab) {
+		fdtab = kzalloc(sizeof(*si->fdtab), GFP_USER);
+		if (!fdtab)
+			return -ENOMEM;
+		fdtab->desc = kzalloc(sizeof(*fdtab->desc), GFP_USER);
+		if (!fdtab->desc) {
+			kfree(fdtab);
+			return -ENOMEM;
+		}
+		si->fdtab = fdtab;
+	}
+
+	ret = find_and_merge_frame_desc(env, fdtab, pc, fd);
+	if (!ret)
+		return 0;
+	if (ret < 0 && ret != -ENOENT)
+		return ret;
+	/* We didn't find a frame descriptor for pc, grow the array and insert it. */
+	desc = realloc_array(fdtab->desc, fdtab->cnt ?: 1, fdtab->cnt + 1, sizeof(*fdtab->desc));
+	if (!desc) {
+		return -ENOMEM;
+	}
+	fdtab->desc = desc;
+	fdtab->desc[fdtab->cnt] = kzalloc(sizeof(*fdtab->desc[0]), GFP_USER);
+	if (!fdtab->desc[fdtab->cnt])
+		return -ENOMEM;
+	fdtab->desc[fdtab->cnt]->pc = pc;
+	fdtab->cnt++;
+	return find_and_merge_frame_desc(env, fdtab, pc, fd);
+}
+
+static int gen_exception_frame_desc_reg_entry(struct bpf_verifier_env *env, struct bpf_reg_state *reg, int off, int frameno)
+{
+	struct bpf_frame_desc_reg_entry fd = {};
+
+	if ((!reg->ref_obj_id && reg->type != NOT_INIT) || reg->type == SCALAR_VALUE)
+		return 0;
+	if (base_type(reg->type) == PTR_TO_BTF_ID)
+		fd.btf = reg->btf;
+	fd.type = reg->type & ~PTR_MAYBE_NULL;
+	fd.btf_id = fd.btf ? reg->btf_id : 0;
+	fd.spill_type = off < 0 ? STACK_SPILL : STACK_INVALID;
+	fd.off = off;
+	verbose(env, "frame_desc: frame%d: insn_idx=%d %s=%d size=%d ref_obj_id=%d type=%s\n",
+		frameno, env->insn_idx, off < 0 ? "off" : "regno", off, BPF_REG_SIZE, reg->ref_obj_id, reg_type_str(env, reg->type));
+	return push_exception_frame_desc(env, frameno, &fd);
+}
+
+static int gen_exception_frame_desc_dynptr_entry(struct bpf_verifier_env *env, struct bpf_reg_state *reg, int off, int frameno)
+{
+	struct bpf_frame_desc_reg_entry fd = {};
+	int type = reg->dynptr.type;
+
+	/* We only need to generate an entry when the dynptr is refcounted,
+	 * otherwise it encapsulates no resource that needs to be released.
+	 */
+	if (!dynptr_type_refcounted(type))
+		return 0;
+	switch (type) {
+	case BPF_DYNPTR_TYPE_RINGBUF:
+		fd.type = BPF_DYNPTR_TYPE_RINGBUF;
+		fd.spill_type = STACK_DYNPTR;
+		fd.off = off;
+		verbose(env, "frame_desc: frame%d: insn_idx=%d off=%d size=%lu dynptr_ringbuf\n", frameno, env->insn_idx, off,
+			BPF_DYNPTR_NR_SLOTS * BPF_REG_SIZE);
+		break;
+	default:
+		verbose(env, "verifier internal error: refcounted dynptr type unhandled for exception frame descriptor entry\n");
+		return -EFAULT;
+	}
+	return push_exception_frame_desc(env, frameno, &fd);
+}
+
+static int add_used_btf(struct bpf_verifier_env *env, struct btf *btf);
+
+static int gen_exception_frame_desc_iter_entry(struct bpf_verifier_env *env, struct bpf_reg_state *reg, int off, int frameno)
+{
+	struct bpf_frame_desc_reg_entry fd = {};
+	struct btf *btf = reg->iter.btf;
+	u32 btf_id = reg->iter.btf_id;
+	const struct btf_type *t;
+	int ret;
+
+	fd.btf = btf;
+	fd.type = reg->type;
+	fd.btf_id = btf_id;
+	fd.spill_type = STACK_ITER;
+	fd.off = off;
+	t = btf_type_by_id(btf, btf_id);
+	verbose(env, "frame_desc: frame%d: insn_idx=%d off=%d size=%u ref_obj_id=%d iter_%s\n",
+		frameno, env->insn_idx, off, t->size, reg->ref_obj_id, btf_name_by_offset(btf, t->name_off));
+	btf_get(btf);
+	ret = add_used_btf(env, btf);
+	if (ret < 0) {
+		btf_put(btf);
+		return ret;
+	}
+	return push_exception_frame_desc(env, frameno, &fd);
+}
+
+static int gen_exception_frame_desc_stack_entry(struct bpf_verifier_env *env, struct bpf_func_state *frame, int stack_off)
+{
+	int spi = stack_off / BPF_REG_SIZE, off = -stack_off - 1;
+	struct bpf_reg_state *reg, not_init_reg, null_reg;
+	int slot_type, ret;
+
+	__mark_reg_not_init(env, &not_init_reg);
+	__mark_reg_known_zero(&null_reg);
+
+	slot_type = frame->stack[spi].slot_type[BPF_REG_SIZE - 1];
+	reg = &frame->stack[spi].spilled_ptr;
+
+	switch (slot_type) {
+	case STACK_SPILL:
+		/* We skip all kinds of scalar registers, except NULL values, which consume a slot. */
+		if (is_spilled_scalar_reg(&frame->stack[spi]) && !register_is_null(&frame->stack[spi].spilled_ptr))
+			break;
+		ret = gen_exception_frame_desc_reg_entry(env, reg, off, frame->frameno);
+		if (ret < 0)
+			return ret;
+		break;
+	case STACK_DYNPTR:
+		/* Keep iterating until we find the first slot. */
+		if (!reg->dynptr.first_slot)
+			break;
+		ret = gen_exception_frame_desc_dynptr_entry(env, reg, off, frame->frameno);
+		if (ret < 0)
+			return ret;
+		break;
+	case STACK_ITER:
+		/* Keep iterating until we find the first slot. */
+		if (!reg->ref_obj_id)
+			break;
+		ret = gen_exception_frame_desc_iter_entry(env, reg, off, frame->frameno);
+		if (ret < 0)
+			return ret;
+		break;
+	case STACK_MISC:
+	case STACK_INVALID:
+		/* Create an invalid entry for MISC and INVALID */
+		ret = gen_exception_frame_desc_reg_entry(env, &not_init_reg, off, frame->frameno);
+		if (ret < 0)
+			return ret;
+		break;
+	case STACK_ZERO:
+		reg = &null_reg;
+		for (int i = BPF_REG_SIZE - 1; i >= 0; i--) {
+			if (frame->stack[spi].slot_type[i] != STACK_ZERO)
+				reg = &not_init_reg;
+		}
+		ret = gen_exception_frame_desc_reg_entry(env, reg, off, frame->frameno);
+		if (ret < 0)
+			return ret;
+		break;
+	default:
+		verbose(env, "verifier internal error: frame%d stack off=%d slot_type=%d missing handling for exception frame generation\n",
+			frame->frameno, off, slot_type);
+		return -EFAULT;
+	}
+	return 0;
+}
+
+/* We generate exception descriptors for all frames at the current program
+ * counter.  For caller frames, we use their callsite as their program counter,
+ * and we go on generating it until the main frame.
+ *
+ * It's necessary to detect whether the stack layout is different, in that case
+ * frame descriptor generation should fail and we cannot really support runtime
+ * unwinding in that case.
+ */
+static int gen_exception_frame_descs(struct bpf_verifier_env *env)
+{
+	struct bpf_reg_state not_init_reg;
+	int ret;
+
+	__mark_reg_not_init(env, &not_init_reg);
+
+	for (int frameno = env->cur_state->curframe; frameno >= 0; frameno--) {
+		struct bpf_func_state *frame = env->cur_state->frame[frameno];
+
+		verbose(env, "frame_desc: frame%d: Stack:\n", frameno);
+		for (int i = BPF_REG_SIZE - 1; i < frame->allocated_stack; i += BPF_REG_SIZE) {
+			ret = gen_exception_frame_desc_stack_entry(env, frame, i);
+			if (ret < 0)
+				return ret;
+		}
+
+		verbose(env, "frame_desc: frame%d: Registers:\n", frameno);
+		for (int i = BPF_REG_6; i < BPF_REG_FP; i++) {
+			struct bpf_reg_state *reg = &frame->regs[i];
+
+			/* Treat havoc scalars as incompatible type. */
+			if (reg->type == SCALAR_VALUE && !register_is_null(reg))
+				reg = &not_init_reg;
+			ret = gen_exception_frame_desc_reg_entry(env, reg, i, frame->frameno);
+			if (ret < 0)
+				return ret;
+		}
+	}
+	return 0;
+}
+
 static int check_reference_leak(struct bpf_verifier_env *env, bool exception_exit)
 {
 	struct bpf_func_state *state = cur_func(env);
@@ -17694,12 +18054,18 @@ static int do_check(struct bpf_verifier_env *env)
 					err = check_func_call(env, insn, &env->insn_idx);
 					if (!err && env->cur_state->global_subprog_call_exception) {
 						env->cur_state->global_subprog_call_exception = false;
+						err = gen_exception_frame_descs(env);
+						if (err < 0)
+							return err;
 						exception_exit = true;
 						goto process_bpf_exit_full;
 					}
 				} else if (insn->src_reg == BPF_PSEUDO_KFUNC_CALL) {
 					err = check_kfunc_call(env, insn, &env->insn_idx);
 					if (!err && is_bpf_throw_kfunc(insn)) {
+						err = gen_exception_frame_descs(env);
+						if (err < 0)
+							return err;
 						exception_exit = true;
 						goto process_bpf_exit_full;
 					}
@@ -21184,6 +21550,8 @@ int bpf_check(struct bpf_prog **prog, union bpf_attr *attr, bpfptr_t uattr, __u3
 		mutex_unlock(&bpf_verifier_lock);
 	vfree(env->insn_aux_data);
 err_free_env:
+	for (int i = 0; i < env->subprog_cnt; i++)
+		bpf_exception_frame_desc_tab_free(env->subprog_info[i].fdtab);
 	kfree(env);
 	return ret;
 }
diff --git a/tools/testing/selftests/bpf/progs/exceptions_fail.c b/tools/testing/selftests/bpf/progs/exceptions_fail.c
index 28602f905d7d..5a517065b4e6 100644
--- a/tools/testing/selftests/bpf/progs/exceptions_fail.c
+++ b/tools/testing/selftests/bpf/progs/exceptions_fail.c
@@ -213,7 +213,7 @@ __noinline static int subprog_cb_ref(u32 i, void *ctx)
 }
 
 SEC("?tc")
-__failure __msg("Unreleased reference")
+__failure __msg("cannot be called from callback subprog 1")
 int reject_with_cb_reference(void *ctx)
 {
 	struct foo *f;
@@ -235,7 +235,7 @@ int reject_with_cb(void *ctx)
 }
 
 SEC("?tc")
-__failure __msg("Unreleased reference")
+__success
 int reject_with_subprog_reference(void *ctx)
 {
 	return subprog_ref(ctx) + 1;
-- 
2.40.1



* [RFC PATCH v1 06/14] bpf: Adjust frame descriptor pc on instruction patching
  2024-02-01  4:20 [RFC PATCH v1 00/14] Exceptions - Resource Cleanup Kumar Kartikeya Dwivedi
                   ` (4 preceding siblings ...)
  2024-02-01  4:21 ` [RFC PATCH v1 05/14] bpf: Implement BPF exception frame descriptor generation Kumar Kartikeya Dwivedi
@ 2024-02-01  4:21 ` Kumar Kartikeya Dwivedi
  2024-02-15 16:31   ` Eduard Zingerman
  2024-02-01  4:21 ` [RFC PATCH v1 07/14] bpf: Use hidden subprog trampoline for bpf_throw Kumar Kartikeya Dwivedi
                   ` (8 subsequent siblings)
  14 siblings, 1 reply; 53+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2024-02-01  4:21 UTC (permalink / raw)
  To: bpf
  Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Martin KaFai Lau, David Vernet, Tejun Heo, Raj Sahu,
	Dan Williams, Rishabh Iyer, Sanidhya Kashyap

When instruction patching (addition or removal) occurs, the fdtab
attached to each subprog, and the program counter in its descriptors,
will be out of sync with respect to the relative position in the
program. To fix this, we need to adjust the pc, free any fdtab and
descriptors that are no longer needed, and ensure the entries
correspond to the correct instruction offsets.
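
As an illustrative case: patching the instruction at offset off = 10
into len = 3 instructions shifts every descriptor with pc > 10 by
len - 1 = 2, mirroring what adjust_subprog_starts() does for subprog
start offsets:

	before: pc = 10, pc = 12	after: pc = 10, pc = 14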

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
---
 kernel/bpf/verifier.c | 50 +++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 50 insertions(+)

diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 27233c308d83..e5b1db1db679 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -18737,6 +18737,23 @@ static void adjust_subprog_starts(struct bpf_verifier_env *env, u32 off, u32 len
 	}
 }
 
+static void adjust_subprog_frame_descs(struct bpf_verifier_env *env, u32 off, u32 len)
+{
+	if (len == 1)
+		return;
+	for (int i = 0; i <= env->subprog_cnt; i++) {
+		struct bpf_exception_frame_desc_tab *fdtab = subprog_info(env, i)->fdtab;
+
+		if (!fdtab)
+			continue;
+		for (int j = 0; j < fdtab->cnt; j++) {
+			if (fdtab->desc[j]->pc <= off)
+				continue;
+			fdtab->desc[j]->pc += len - 1;
+		}
+	}
+}
+
 static void adjust_poke_descs(struct bpf_prog *prog, u32 off, u32 len)
 {
 	struct bpf_jit_poke_descriptor *tab = prog->aux->poke_tab;
@@ -18775,6 +18792,7 @@ static struct bpf_prog *bpf_patch_insn_data(struct bpf_verifier_env *env, u32 of
 	}
 	adjust_insn_aux_data(env, new_data, new_prog, off, len);
 	adjust_subprog_starts(env, off, len);
+	adjust_subprog_frame_descs(env, off, len);
 	adjust_poke_descs(new_prog, off, len);
 	return new_prog;
 }
@@ -18805,6 +18823,10 @@ static int adjust_subprog_starts_after_remove(struct bpf_verifier_env *env,
 		/* move fake 'exit' subprog as well */
 		move = env->subprog_cnt + 1 - j;
 
+		/* Free fdtab for subprog_info that we are going to destroy. */
+		for (int k = i; k < j; k++)
+			bpf_exception_frame_desc_tab_free(env->subprog_info[k].fdtab);
+
 		memmove(env->subprog_info + i,
 			env->subprog_info + j,
 			sizeof(*env->subprog_info) * move);
@@ -18835,6 +18857,30 @@ static int adjust_subprog_starts_after_remove(struct bpf_verifier_env *env,
 	return 0;
 }
 
+static int adjust_subprog_frame_descs_after_remove(struct bpf_verifier_env *env, u32 off, u32 cnt)
+{
+	for (int i = 0; i < env->subprog_cnt; i++) {
+		struct bpf_exception_frame_desc_tab *fdtab = subprog_info(env, i)->fdtab;
+
+		if (!fdtab)
+			continue;
+		for (int j = 0; j < fdtab->cnt; j++) {
+			/* Part of a subprog_info whose instructions were removed partially, but the fdtab remained. */
+			if (fdtab->desc[j]->pc >= off && fdtab->desc[j]->pc < off + cnt) {
+				void *p = fdtab->desc[j];
+
+				if (j < fdtab->cnt - 1)
+					memmove(fdtab->desc + j, fdtab->desc + j + 1, sizeof(fdtab->desc[0]) * (fdtab->cnt - j - 1));
+				kfree(p);
+				fdtab->cnt--;
+				j--;
+				/* Skip the pc adjustment below: desc[j] now refers to an
+				 * already visited entry, and j may even be -1 here.
+				 */
+				continue;
+			}
+			if (fdtab->desc[j]->pc >= off + cnt)
+				fdtab->desc[j]->pc -= cnt;
+		}
+	}
+	return 0;
+}
+
 static int bpf_adj_linfo_after_remove(struct bpf_verifier_env *env, u32 off,
 				      u32 cnt)
 {
@@ -18916,6 +18962,10 @@ static int verifier_remove_insns(struct bpf_verifier_env *env, u32 off, u32 cnt)
 	if (err)
 		return err;
 
+	err = adjust_subprog_frame_descs_after_remove(env, off, cnt);
+	if (err)
+		return err;
+
 	err = bpf_adj_linfo_after_remove(env, off, cnt);
 	if (err)
 		return err;
-- 
2.40.1



* [RFC PATCH v1 07/14] bpf: Use hidden subprog trampoline for bpf_throw
  2024-02-01  4:20 [RFC PATCH v1 00/14] Exceptions - Resource Cleanup Kumar Kartikeya Dwivedi
                   ` (5 preceding siblings ...)
  2024-02-01  4:21 ` [RFC PATCH v1 06/14] bpf: Adjust frame descriptor pc on instruction patching Kumar Kartikeya Dwivedi
@ 2024-02-01  4:21 ` Kumar Kartikeya Dwivedi
  2024-02-15 22:11   ` Eduard Zingerman
  2024-02-01  4:21 ` [RFC PATCH v1 08/14] bpf: Compute used callee saved registers for subprogs Kumar Kartikeya Dwivedi
                   ` (7 subsequent siblings)
  14 siblings, 1 reply; 53+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2024-02-01  4:21 UTC (permalink / raw)
  To: bpf
  Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Martin KaFai Lau, David Vernet, Tejun Heo, Raj Sahu,
	Dan Williams, Rishabh Iyer, Sanidhya Kashyap

When we perform a bpf_throw kfunc call, callee saved registers in the
BPF calling convention (R6-R9) may end up getting saved and clobbered
by bpf_throw. Typically, the kernel will restore these registers before
returning back to the BPF program, but in case of bpf_throw, the
function will never return. Therefore, any acquired resources sitting
in these registers will be lost if not saved on the stack, with no
cleanup happening for them.

Also, when in a BPF call chain, caller frames may have held acquired
resources in R6-R9 and called their subprogs, which may have spilled
these on their stack frame to reuse these registers before entering the
bpf_throw kfunc. Thus, we also need to locate and free these callee
saved registers for each frame.

It is thus necessary to save these registers somewhere before we call
into the bpf_throw kfunc. Instead of adding spills everywhere bpf_throw
is called, we can use a new hidden subprog that saves R6-R9 on the stack
and then calls into bpf_throw. This way, all of the bpf_throw call sites
can be turned into call instructions for this subprog, and the hidden
subprog in turn will save the callee-saved registers before calling into
the bpf_throw kfunc.

In this way, when unwinding the stack, we can locate the callee saved
registers on the hidden subprog stack frame and perform their cleanup.
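
To illustrate (a rough sketch; the actual sequence is constructed as a
bpf_insn array in do_misc_fixups in this patch), the hidden subprog is
equivalent to:

  bpf_throw_tramp:
    r6 = 0          /* scratch R6-R9 so the JIT spills them on entry */
    r7 = 0
    r8 = 0
    r9 = 0
    call bpf_throw  /* never returns */

and every bpf_throw call site in the program is rewritten into a call
to this subprog.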

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
---
 arch/x86/net/bpf_jit_comp.c  | 24 ++++++++--------
 include/linux/bpf.h          |  5 ++++
 include/linux/bpf_verifier.h |  3 +-
 kernel/bpf/verifier.c        | 55 ++++++++++++++++++++++++++++++++++--
 4 files changed, 71 insertions(+), 16 deletions(-)

diff --git a/arch/x86/net/bpf_jit_comp.c b/arch/x86/net/bpf_jit_comp.c
index e1390d1e331b..87692d983ffd 100644
--- a/arch/x86/net/bpf_jit_comp.c
+++ b/arch/x86/net/bpf_jit_comp.c
@@ -640,9 +640,10 @@ static void emit_bpf_tail_call_indirect(struct bpf_prog *bpf_prog,
 	offset = ctx->tail_call_indirect_label - (prog + 2 - start);
 	EMIT2(X86_JE, offset);                    /* je out */
 
-	if (bpf_prog->aux->exception_boundary) {
+	if (bpf_prog->aux->exception_boundary || bpf_prog->aux->bpf_throw_tramp) {
 		pop_callee_regs(&prog, all_callee_regs_used);
-		pop_r12(&prog);
+		if (bpf_prog->aux->exception_boundary)
+			pop_r12(&prog);
 	} else {
 		pop_callee_regs(&prog, callee_regs_used);
 	}
@@ -699,9 +700,10 @@ static void emit_bpf_tail_call_direct(struct bpf_prog *bpf_prog,
 	emit_jump(&prog, (u8 *)poke->tailcall_target + X86_PATCH_SIZE,
 		  poke->tailcall_bypass);
 
-	if (bpf_prog->aux->exception_boundary) {
+	if (bpf_prog->aux->exception_boundary || bpf_prog->aux->bpf_throw_tramp) {
 		pop_callee_regs(&prog, all_callee_regs_used);
-		pop_r12(&prog);
+		if (bpf_prog->aux->exception_boundary)
+			pop_r12(&prog);
 	} else {
 		pop_callee_regs(&prog, callee_regs_used);
 	}
@@ -1164,12 +1166,9 @@ static int do_jit(struct bpf_prog *bpf_prog, int *addrs, u8 *image, u8 *rw_image
 	/* Exception callback will clobber callee regs for its own use, and
 	 * restore the original callee regs from main prog's stack frame.
 	 */
-	if (bpf_prog->aux->exception_boundary) {
-		/* We also need to save r12, which is not mapped to any BPF
-		 * register, as we throw after entry into the kernel, which may
-		 * overwrite r12.
-		 */
-		push_r12(&prog);
+	if (bpf_prog->aux->exception_boundary || bpf_prog->aux->bpf_throw_tramp) {
+		if (bpf_prog->aux->exception_boundary)
+			push_r12(&prog);
 		push_callee_regs(&prog, all_callee_regs_used);
 	} else {
 		push_callee_regs(&prog, callee_regs_used);
@@ -2031,9 +2030,10 @@ st:			if (is_imm8(insn->off))
 			seen_exit = true;
 			/* Update cleanup_addr */
 			ctx->cleanup_addr = proglen;
-			if (bpf_prog->aux->exception_boundary) {
+			if (bpf_prog->aux->exception_boundary || bpf_prog->aux->bpf_throw_tramp) {
 				pop_callee_regs(&prog, all_callee_regs_used);
-				pop_r12(&prog);
+				if (bpf_prog->aux->exception_boundary)
+					pop_r12(&prog);
 			} else {
 				pop_callee_regs(&prog, callee_regs_used);
 			}
diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 463c8d22ad72..83cff18a1b66 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -3369,6 +3369,11 @@ static inline bool bpf_is_subprog(const struct bpf_prog *prog)
 	return prog->aux->func_idx != 0;
 }
 
+static inline bool bpf_is_hidden_subprog(const struct bpf_prog *prog)
+{
+	return prog->aux->func_idx >= prog->aux->func_cnt;
+}
+
 struct bpf_frame_desc_reg_entry {
 	u32 type;
 	s16 spill_type;
diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h
index 0113a3a940e2..04e27fce33d6 100644
--- a/include/linux/bpf_verifier.h
+++ b/include/linux/bpf_verifier.h
@@ -683,6 +683,7 @@ struct bpf_verifier_env {
 	u32 id_gen;			/* used to generate unique reg IDs */
 	u32 hidden_subprog_cnt;		/* number of hidden subprogs */
 	int exception_callback_subprog;
+	int bpf_throw_tramp_subprog;
 	bool explore_alu_limits;
 	bool allow_ptr_leaks;
 	/* Allow access to uninitialized stack memory. Writes with fixed offset are
@@ -699,7 +700,7 @@ struct bpf_verifier_env {
 	struct bpf_insn_aux_data *insn_aux_data; /* array of per-insn state */
 	const struct bpf_line_info *prev_linfo;
 	struct bpf_verifier_log log;
-	struct bpf_subprog_info subprog_info[BPF_MAX_SUBPROGS + 2]; /* max + 2 for the fake and exception subprogs */
+	struct bpf_subprog_info subprog_info[BPF_MAX_SUBPROGS + 3]; /* max + 3 for the fake, exception, and bpf_throw_tramp subprogs */
 	union {
 		struct bpf_idmap idmap_scratch;
 		struct bpf_idset idset_scratch;
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index e5b1db1db679..942243cba9f1 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -19836,9 +19836,9 @@ static int add_hidden_subprog(struct bpf_verifier_env *env, struct bpf_insn *pat
 	int cnt = env->subprog_cnt;
 	struct bpf_prog *prog;
 
-	/* We only reserve one slot for hidden subprogs in subprog_info. */
-	if (env->hidden_subprog_cnt) {
-		verbose(env, "verifier internal error: only one hidden subprog supported\n");
+	/* We only reserve two slots for hidden subprogs in subprog_info. */
+	if (env->hidden_subprog_cnt == 2) {
+		verbose(env, "verifier internal error: only two hidden subprogs supported\n");
 		return -EFAULT;
 	}
 	/* We're not patching any existing instruction, just appending the new
@@ -19892,6 +19892,42 @@ static int do_misc_fixups(struct bpf_verifier_env *env)
 		mark_subprog_exc_cb(env, env->exception_callback_subprog);
 	}
 
+	if (env->seen_exception) {
+		struct bpf_insn patch[] = {
+			/* Use the correct insn_cnt here, as we want to append past the hidden subprog above. */
+			env->prog->insnsi[env->prog->len - 1],
+			/* Scratch R6-R9 so that the JIT spills them to the stack on entry. */
+			BPF_MOV64_IMM(BPF_REG_6, 0),
+			BPF_MOV64_IMM(BPF_REG_7, 0),
+			BPF_MOV64_IMM(BPF_REG_8, 0),
+			BPF_MOV64_IMM(BPF_REG_9, 0),
+			BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, BPF_PSEUDO_KFUNC_CALL, 0, special_kfunc_list[KF_bpf_throw]),
+		};
+		const bool all_callee_regs_used[4] = {true, true, true, true};
+
+		ret = add_hidden_subprog(env, patch, ARRAY_SIZE(patch));
+		if (ret < 0)
+			return ret;
+		prog = env->prog;
+		insn = prog->insnsi;
+
+		env->bpf_throw_tramp_subprog = env->subprog_cnt - 1;
+		/* Make sure to mark callee_regs_used, so that we can collect any saved_regs if necessary. */
+		memcpy(env->subprog_info[env->bpf_throw_tramp_subprog].callee_regs_used, all_callee_regs_used, sizeof(all_callee_regs_used));
+		/* Certainly, we have seen a bpf_throw call in this program, as
+		 * seen_exception is true, therefore the bpf_kfunc_desc entry for it must
+		 * be populated and found here. We need to do the fixup now, otherwise
+		 * the loop over insn_cnt below won't see this kfunc call.
+		 */
+		ret = fixup_kfunc_call(env, &prog->insnsi[prog->len - 1], insn_buf, prog->len - 1, &cnt);
+		if (ret < 0)
+			return ret;
+		if (cnt != 0) {
+			verbose(env, "verifier internal error: unhandled patching for bpf_throw fixup in bpf_throw_tramp subprog\n");
+			return -EFAULT;
+		}
+	}
+
 	for (i = 0; i < insn_cnt; i++, insn++) {
 		/* Make divide-by-zero exceptions impossible. */
 		if (insn->code == (BPF_ALU64 | BPF_MOD | BPF_X) ||
@@ -20012,6 +20048,19 @@ static int do_misc_fixups(struct bpf_verifier_env *env)
 		if (insn->src_reg == BPF_PSEUDO_CALL)
 			continue;
 		if (insn->src_reg == BPF_PSEUDO_KFUNC_CALL) {
+			/* All bpf_throw calls in this program must be patched to call the
+			 * bpf_throw_tramp subprog instead.  This ensures we correctly save
+			 * the R6-R9 before entry into kernel, and can clean them up if
+			 * needed.
+			 * Note: seen_exception must be set, otherwise no bpf_throw_tramp is
+			 * generated.
+			 */
+			if (env->seen_exception && is_bpf_throw_kfunc(insn)) {
+				*insn = BPF_CALL_REL(0);
+				insn->imm = (int)env->subprog_info[env->bpf_throw_tramp_subprog].start - (i + delta) - 1;
+				continue;
+			}
+
 			ret = fixup_kfunc_call(env, insn, insn_buf, i + delta, &cnt);
 			if (ret)
 				return ret;
-- 
2.40.1



* [RFC PATCH v1 08/14] bpf: Compute used callee saved registers for subprogs
  2024-02-01  4:20 [RFC PATCH v1 00/14] Exceptions - Resource Cleanup Kumar Kartikeya Dwivedi
                   ` (6 preceding siblings ...)
  2024-02-01  4:21 ` [RFC PATCH v1 07/14] bpf: Use hidden subprog trampoline for bpf_throw Kumar Kartikeya Dwivedi
@ 2024-02-01  4:21 ` Kumar Kartikeya Dwivedi
  2024-02-15 22:12   ` Eduard Zingerman
  2024-02-01  4:21 ` [RFC PATCH v1 09/14] bpf, x86: Fix up pc offsets for frame descriptor entries Kumar Kartikeya Dwivedi
                   ` (6 subsequent siblings)
  14 siblings, 1 reply; 53+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2024-02-01  4:21 UTC (permalink / raw)
  To: bpf
  Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Martin KaFai Lau, David Vernet, Tejun Heo, Raj Sahu,
	Dan Williams, Rishabh Iyer, Sanidhya Kashyap

During runtime unwinding and cleanup, we will need to figure out where
the callee-saved registers are stored on the stack, so that when a
bpf_throw call is made, all frames can release the resources held in
their callee-saved registers by finding the saved copies on the stacks
of callee frames.

While the previous patch ensured that any BPF callee-saved registers
are saved on the hidden subprog's stack frame before entry into the
kernel (where we would not know their location if spilled), there are
cases where a subprog's R6-R9 are not spilled by its immediate callee,
but much later by some callee further down the call chain. As such,
while walking down the stack, we need to figure out which frames have
spilled their incoming callee-saved regs, and keep track of where the
latest spill happened with respect to a given frame in the stack trace.
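
As a concrete sketch, consider the hypothetical call chain:

  main (holds a resource in R6)
    -> subprog1 (does not use R6-R9, spills nothing)
      -> subprog2 (uses R6, spills main's R6 in its prologue)
        -> bpf_throw_tramp (spills R6-R9)

main's R6 is only saved two frames below it, so the unwinder must
remember, per register, the most recent frame that spilled it as it
walks the stack.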

To perform this, we need to know, during the unwinding phase at
runtime, which callee-saved registers a given subprog uses. Right now,
the x86 JIT conveniently figures this out in detect_reg_usage. Utilize
the same logic in the verifier core, and copy this information into the
bpf_prog_aux struct before the JIT step, so that it is preserved at
runtime.

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
---
 include/linux/bpf.h          |  1 +
 include/linux/bpf_verifier.h |  1 +
 kernel/bpf/verifier.c        | 10 ++++++++++
 3 files changed, 12 insertions(+)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 83cff18a1b66..4ac6add0cec8 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -1460,6 +1460,7 @@ struct bpf_prog_aux {
 	bool xdp_has_frags;
 	bool exception_cb;
 	bool exception_boundary;
+	bool callee_regs_used[4];
 	/* BTF_KIND_FUNC_PROTO for valid attach_btf_id */
 	const struct btf_type *attach_func_proto;
 	/* function name for valid attach_btf_id */
diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h
index 04e27fce33d6..e08ff540ec44 100644
--- a/include/linux/bpf_verifier.h
+++ b/include/linux/bpf_verifier.h
@@ -620,6 +620,7 @@ struct bpf_subprog_info {
 	u32 start; /* insn idx of function entry point */
 	u32 linfo_idx; /* The idx to the main_prog->aux->linfo */
 	u16 stack_depth; /* max. stack depth used by this function */
+	bool callee_regs_used[4];
 	bool has_tail_call: 1;
 	bool tail_call_reachable: 1;
 	bool has_ld_abs: 1;
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 942243cba9f1..aeaf97b0a749 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -2942,6 +2942,15 @@ static int check_subprogs(struct bpf_verifier_env *env)
 		    insn[i].src_reg == 0 &&
 		    insn[i].imm == BPF_FUNC_tail_call)
 			subprog[cur_subprog].has_tail_call = true;
+		/* Collect callee regs used in the subprog. */
+		if (insn[i].dst_reg == BPF_REG_6 || insn[i].src_reg == BPF_REG_6)
+			subprog[cur_subprog].callee_regs_used[0] = true;
+		if (insn[i].dst_reg == BPF_REG_7 || insn[i].src_reg == BPF_REG_7)
+			subprog[cur_subprog].callee_regs_used[1] = true;
+		if (insn[i].dst_reg == BPF_REG_8 || insn[i].src_reg == BPF_REG_8)
+			subprog[cur_subprog].callee_regs_used[2] = true;
+		if (insn[i].dst_reg == BPF_REG_9 || insn[i].src_reg == BPF_REG_9)
+			subprog[cur_subprog].callee_regs_used[3] = true;
 		if (!env->seen_throw_insn && is_bpf_throw_kfunc(&insn[i]))
 			env->seen_throw_insn = true;
 		if (BPF_CLASS(code) == BPF_LD &&
@@ -19501,6 +19510,7 @@ static int jit_subprogs(struct bpf_verifier_env *env)
 		}
 		func[i]->aux->num_exentries = num_exentries;
 		func[i]->aux->tail_call_reachable = env->subprog_info[i].tail_call_reachable;
+		memcpy(&func[i]->aux->callee_regs_used, env->subprog_info[i].callee_regs_used, sizeof(func[i]->aux->callee_regs_used));
 		func[i]->aux->exception_cb = env->subprog_info[i].is_exception_cb;
 		if (!i)
 			func[i]->aux->exception_boundary = env->seen_exception;
-- 
2.40.1



* [RFC PATCH v1 09/14] bpf, x86: Fix up pc offsets for frame descriptor entries
  2024-02-01  4:20 [RFC PATCH v1 00/14] Exceptions - Resource Cleanup Kumar Kartikeya Dwivedi
                   ` (7 preceding siblings ...)
  2024-02-01  4:21 ` [RFC PATCH v1 08/14] bpf: Compute used callee saved registers for subprogs Kumar Kartikeya Dwivedi
@ 2024-02-01  4:21 ` Kumar Kartikeya Dwivedi
  2024-02-15 22:12   ` Eduard Zingerman
  2024-02-01  4:21 ` [RFC PATCH v1 10/14] bpf, x86: Implement runtime resource cleanup for exceptions Kumar Kartikeya Dwivedi
                   ` (5 subsequent siblings)
  14 siblings, 1 reply; 53+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2024-02-01  4:21 UTC (permalink / raw)
  To: bpf
  Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Martin KaFai Lau, David Vernet, Tejun Heo, Raj Sahu,
	Dan Williams, Rishabh Iyer, Sanidhya Kashyap

Until now, the program counter value stored in frame descriptor entries
was the global instruction index of the throwing instruction and of the
callsites going down the frames in a call chain. However, at runtime,
the program counter will be a pointer to the next instruction, so the
stored value needs to be converted into a position-independent form
that can be matched against the runtime program counter to find the
frame descriptor when unwinding.

To do this, we first convert the global instruction index into an
instruction index relative to the start of a subprog, and add 1 to it
(to reflect that at runtime, the program counter points to the next
instruction). Then, we modify the JIT (for now, x86) to convert them
to instruction offsets relative to the start of the JIT image, which is
the prog->bpf_func of the subprog in question at runtime.

Later, subtracting the prog->bpf_func pointer from the runtime program
counter will yield the same offset, allowing us to figure out the
corresponding frame descriptor entry.

Note that we have to mark a frame descriptor table as 'final', because
bpf_int_jit_compile can be called multiple times, and we would
otherwise attempt to convert already converted pc values again.
Therefore, once we do the conversion, remember it and do not repeat it.
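
As a sketch with made-up numbers: for a frame descriptor created at
global insn 42 in a subprog starting at insn 40, the verifier stores
pc = 42 - 40 + 1 = 3. The JIT then rewrites this to addrs[3], the byte
offset corresponding to insn 3 within the subprog's image. At runtime,
the program counter minus prog->bpf_func yields the same byte offset,
which is matched against the frame descriptor entries.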

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
---
 arch/x86/net/bpf_jit_comp.c | 11 +++++++++++
 include/linux/bpf.h         |  2 ++
 kernel/bpf/verifier.c       | 15 +++++++++++++++
 3 files changed, 28 insertions(+)

diff --git a/arch/x86/net/bpf_jit_comp.c b/arch/x86/net/bpf_jit_comp.c
index 87692d983ffd..0dd0791c6ee0 100644
--- a/arch/x86/net/bpf_jit_comp.c
+++ b/arch/x86/net/bpf_jit_comp.c
@@ -3112,6 +3112,17 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *prog)
 		prog = orig_prog;
 	}
 
+	if (prog->aux->fdtab && !prog->aux->fdtab->final && image) {
+		struct bpf_exception_frame_desc_tab *fdtab = prog->aux->fdtab;
+
+		for (int i = 0; i < fdtab->cnt; i++) {
+			struct bpf_exception_frame_desc *desc = fdtab->desc[i];
+
+			desc->pc = addrs[desc->pc];
+		}
+		prog->aux->fdtab->final = true;
+	}
+
 	if (!image || !prog->is_func || extra_pass) {
 		if (image)
 			bpf_prog_fill_jited_linfo(prog, addrs + 1);
diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 4ac6add0cec8..e310d3ceb14e 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -1460,6 +1460,7 @@ struct bpf_prog_aux {
 	bool xdp_has_frags;
 	bool exception_cb;
 	bool exception_boundary;
+	bool bpf_throw_tramp;
 	bool callee_regs_used[4];
 	/* BTF_KIND_FUNC_PROTO for valid attach_btf_id */
 	const struct btf_type *attach_func_proto;
@@ -3395,6 +3396,7 @@ struct bpf_exception_frame_desc {
 
 struct bpf_exception_frame_desc_tab {
 	u32 cnt;
+	bool final;
 	struct bpf_exception_frame_desc **desc;
 };
 
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index aeaf97b0a749..ec9acadc9ea8 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -19514,6 +19514,20 @@ static int jit_subprogs(struct bpf_verifier_env *env)
 		func[i]->aux->exception_cb = env->subprog_info[i].is_exception_cb;
 		if (!i)
 			func[i]->aux->exception_boundary = env->seen_exception;
+		if (i == env->bpf_throw_tramp_subprog)
+			func[i]->aux->bpf_throw_tramp = true;
+		/* Fix up pc of fdtab entries to be relative to subprog start before JIT. */
+		if (env->subprog_info[i].fdtab) {
+			for (int k = 0; k < env->subprog_info[i].fdtab->cnt; k++) {
+				struct bpf_exception_frame_desc *desc = env->subprog_info[i].fdtab->desc[k];
+				/* Add 1 to point to the next instruction, which will be the PC at runtime. */
+				desc->pc = desc->pc - subprog_start + 1;
+			}
+		}
+		/* Transfer fdtab to subprog->aux */
+		func[i]->aux->fdtab = env->subprog_info[i].fdtab;
+		env->subprog_info[i].fdtab = NULL;
+
 		func[i] = bpf_int_jit_compile(func[i]);
 		if (!func[i]->jited) {
 			err = -ENOTSUPP;
@@ -19604,6 +19618,7 @@ static int jit_subprogs(struct bpf_verifier_env *env)
 	prog->aux->real_func_cnt = env->subprog_cnt;
 	prog->aux->bpf_exception_cb = (void *)func[env->exception_callback_subprog]->bpf_func;
 	prog->aux->exception_boundary = func[0]->aux->exception_boundary;
+	prog->aux->fdtab = func[0]->aux->fdtab;
 	bpf_prog_jit_attempt_done(prog);
 	return 0;
 out_free:
-- 
2.40.1



* [RFC PATCH v1 10/14] bpf, x86: Implement runtime resource cleanup for exceptions
  2024-02-01  4:20 [RFC PATCH v1 00/14] Exceptions - Resource Cleanup Kumar Kartikeya Dwivedi
                   ` (8 preceding siblings ...)
  2024-02-01  4:21 ` [RFC PATCH v1 09/14] bpf, x86: Fix up pc offsets for frame descriptor entries Kumar Kartikeya Dwivedi
@ 2024-02-01  4:21 ` Kumar Kartikeya Dwivedi
  2024-02-03  9:02   ` kernel test robot
  2024-02-16 12:02   ` Eduard Zingerman
  2024-02-01  4:21 ` [RFC PATCH v1 11/14] bpf: Release references in verifier state when throwing exceptions Kumar Kartikeya Dwivedi
                   ` (4 subsequent siblings)
  14 siblings, 2 replies; 53+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2024-02-01  4:21 UTC (permalink / raw)
  To: bpf
  Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Martin KaFai Lau, David Vernet, Tejun Heo, Raj Sahu,
	Dan Williams, Rishabh Iyer, Sanidhya Kashyap

Finally, tie everything together and implement the functionality to
process a frame descriptor at runtime for each frame when bpf_throw is
called, and release the resources present in the program's stack
frames.

For each frame, we call arch_bpf_cleanup_frame_resource, which uses the
instruction pointer at runtime to find the right frame descriptor
entry. After this, we walk all the stack slots and registers recorded
in the descriptor and call their respective cleanup procedures.

Next, if the frame corresponds to a subprog, we also record the
locations where it has spilled its caller's R6-R9 registers, and save
their values in the unwinding context. Only doing this when a frame has
actually scratched the register in question allows us to arrive at the
set of values needed during the freeing step, regardless of how many
callees exist and where the spilled callee-saved registers are located.
These registers may also lie across different frames, but will be
collected top down as we arrive at each frame.

Finally, after doing the cleanup, we go on to execute the exception
callback and finish unwinding the stack.
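
For reference, the runtime flow implemented in this patch is roughly:

  bpf_throw()
    -> arch_bpf_stack_walk(bpf_stack_walker, ctx)
         for each BPF frame, innermost first:
           bpf_stack_walker()
             -> arch_bpf_cleanup_frame_resource(prog, ctx, ip, sp, bp)
                  - match ip - prog->bpf_func against fdtab entries
                  - bpf_cleanup_resource() on each stack/register entry
                  - record spilled caller R6-R9 into ctx->saved_reg[]
    -> execute the exception callback and finish unwinding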

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
---
 arch/x86/net/bpf_jit_comp.c |  81 ++++++++++++++++++++++++
 include/linux/bpf.h         |  22 +++++++
 include/linux/filter.h      |   3 +
 kernel/bpf/core.c           |   5 ++
 kernel/bpf/helpers.c        | 121 +++++++++++++++++++++++++++++++++---
 kernel/bpf/verifier.c       |  20 ++++++
 net/core/filter.c           |   5 ++
 7 files changed, 247 insertions(+), 10 deletions(-)

diff --git a/arch/x86/net/bpf_jit_comp.c b/arch/x86/net/bpf_jit_comp.c
index 0dd0791c6ee0..26a96fee2f4e 100644
--- a/arch/x86/net/bpf_jit_comp.c
+++ b/arch/x86/net/bpf_jit_comp.c
@@ -3191,6 +3191,11 @@ bool bpf_jit_supports_exceptions(void)
 	return IS_ENABLED(CONFIG_UNWINDER_ORC);
 }
 
+bool bpf_jit_supports_exceptions_cleanup(void)
+{
+	return bpf_jit_supports_exceptions();
+}
+
 void arch_bpf_stack_walk(bool (*consume_fn)(void *cookie, u64 ip, u64 sp, u64 bp), void *cookie)
 {
 #if defined(CONFIG_UNWINDER_ORC)
@@ -3208,6 +3213,82 @@ void arch_bpf_stack_walk(bool (*consume_fn)(void *cookie, u64 ip, u64 sp, u64 bp
 	WARN(1, "verification of programs using bpf_throw should have failed\n");
 }
 
+static int bpf_frame_spilled_caller_reg_off(struct bpf_prog *prog, int regno)
+{
+	int off = 0;
+
+	for (int i = BPF_REG_9; i >= BPF_REG_6; i--) {
+		if (regno == i)
+			return off;
+		if (prog->aux->callee_regs_used[i - BPF_REG_6])
+			off += sizeof(u64);
+	}
+	WARN_ON_ONCE(1);
+	return 0;
+}
+
+void arch_bpf_cleanup_frame_resource(struct bpf_prog *prog, struct bpf_throw_ctx *ctx, u64 ip, u64 sp, u64 bp)
+{
+	struct bpf_exception_frame_desc_tab *fdtab = prog->aux->fdtab;
+	struct bpf_exception_frame_desc *fd = NULL;
+	u64 ip_off = ip - (u64)prog->bpf_func;
+
+	/* Hidden subprogs and subprogs without fdtab do not need cleanup. */
+	if (bpf_is_hidden_subprog(prog) || !fdtab)
+		goto end;
+
+	for (int i = 0; i < fdtab->cnt; i++) {
+		if (ip_off != fdtab->desc[i]->pc)
+			continue;
+		fd = fdtab->desc[i];
+		break;
+	}
+	/* This should always be found, but let's bail if we cannot find it. */
+	if (WARN_ON_ONCE(!fd))
+		return;
+
+	for (int i = 0; i < fd->stack_cnt; i++) {
+		void *ptr = (void *)((s64)bp + fd->stack[i].off);
+
+		bpf_cleanup_resource(fd->stack + i, ptr);
+	}
+
+	for (int i = 0; i < ARRAY_SIZE(fd->regs); i++) {
+		void *ptr;
+
+		if (!fd->regs[i].regno || fd->regs[i].type == NOT_INIT || fd->regs[i].type == SCALAR_VALUE)
+			continue;
+		/* Our sp will be bp of new frame before caller regs are spilled, so offset is relative to our sp. */
+		WARN_ON_ONCE(!ctx->saved_reg[i]);
+		ptr = (void *)&ctx->saved_reg[i];
+		bpf_cleanup_resource(fd->regs + i, ptr);
+		ctx->saved_reg[i] = 0;
+	}
+end:
+	/* There could be a case where we have something in main R6, R7, R8, R9 that
+	 * needs releasing, and the callchain is as follows:
+	 * main -> subprog1 -> subprog2 -> bpf_throw_tramp -> bpf_throw
+	 * In such a case, subprog1 may use only R6, R7 and subprog2 may use R8, R9,
+	 * these remaining unscratched until subprog2 calls bpf_throw, at which point
+	 * R6-R9 are spilled in the bpf_throw_tramp frame. The loop below, run as we
+	 * are called for each subprog in order, ensures we have the correct saved_reg
+	 * from the PoV of the current bpf_prog corresponding to a frame.
+	 * E.g. in the chain main -> s1 -> s2 -> bpf_throw_tramp -> bpf_throw
+	 * Let's say R6-R9 have values A, B, C, D in main when calling subprog1.
+	 * Below, we show the computed saved_regs values as we walk the stack:
+	 * For bpf_throw_tramp, saved_regs = { 0, 0, 0, 0 }
+	 * For s2, saved_regs = { 0, 0, 0, D } // D loaded from bpf_throw_tramp frame
+	 * For s1, saved_regs = { 0, 0, C, D } // C loaded from subprog2 frame
+	 * For main, saved_regs = { A, B, C, D } // A, B loaded from subprog1 frame
+	 * Thus, for main, we have the correct saved_regs values even though they
+	 * were spilled in multiple callee stack frames down the call chain.
+	 */
+	if (bpf_is_subprog(prog)) {
+		for (int i = 0; i < ARRAY_SIZE(prog->aux->callee_regs_used); i++) {
+			if (prog->aux->callee_regs_used[i])
+				ctx->saved_reg[i] = *(u64 *)((s64)sp + bpf_frame_spilled_caller_reg_off(prog, BPF_REG_6 + i));
+		}
+	}
+}
+
 void bpf_arch_poke_desc_update(struct bpf_jit_poke_descriptor *poke,
 			       struct bpf_prog *new, struct bpf_prog *old)
 {
diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index e310d3ceb14e..a7c8c118c534 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -3402,4 +3402,26 @@ struct bpf_exception_frame_desc_tab {
 
 void bpf_exception_frame_desc_tab_free(struct bpf_exception_frame_desc_tab *fdtab);
 
+struct bpf_throw_ctx {
+	struct bpf_prog_aux *aux;
+	u64 sp;
+	u64 bp;
+	union {
+		struct {
+			u64 saved_r6;
+			u64 saved_r7;
+			u64 saved_r8;
+			u64 saved_r9;
+		};
+		u64 saved_reg[4];
+	};
+	int cnt;
+};
+
+void arch_bpf_cleanup_frame_resource(struct bpf_prog *prog, struct bpf_throw_ctx *ctx, u64 ip, u64 sp, u64 bp);
+void bpf_cleanup_resource(struct bpf_frame_desc_reg_entry *fd, void *ptr);
+int bpf_cleanup_resource_reg(struct bpf_frame_desc_reg_entry *fd, void *ptr);
+int bpf_cleanup_resource_dynptr(struct bpf_frame_desc_reg_entry *fd, void *ptr);
+int bpf_cleanup_resource_iter(struct bpf_frame_desc_reg_entry *fd, void *ptr);
+
 #endif /* _LINUX_BPF_H */
diff --git a/include/linux/filter.h b/include/linux/filter.h
index fee070b9826e..9779d8281a59 100644
--- a/include/linux/filter.h
+++ b/include/linux/filter.h
@@ -955,6 +955,7 @@ bool bpf_jit_supports_subprog_tailcalls(void);
 bool bpf_jit_supports_kfunc_call(void);
 bool bpf_jit_supports_far_kfunc_call(void);
 bool bpf_jit_supports_exceptions(void);
+bool bpf_jit_supports_exceptions_cleanup(void);
 bool bpf_jit_supports_ptr_xchg(void);
 void arch_bpf_stack_walk(bool (*consume_fn)(void *cookie, u64 ip, u64 sp, u64 bp), void *cookie);
 bool bpf_helper_changes_pkt_data(void *func);
@@ -1624,4 +1625,6 @@ static inline void bpf_xdp_copy_buf(struct xdp_buff *xdp, unsigned long off, voi
 }
 #endif /* CONFIG_NET */
 
+void bpf_sk_release_dtor(void *ptr);
+
 #endif /* __LINUX_FILTER_H__ */
diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
index 995a4dcfa970..6e6260c1e926 100644
--- a/kernel/bpf/core.c
+++ b/kernel/bpf/core.c
@@ -2979,6 +2979,11 @@ bool __weak bpf_jit_supports_exceptions(void)
 	return false;
 }
 
+bool __weak bpf_jit_supports_exceptions_cleanup(void)
+{
+	return false;
+}
+
 void __weak arch_bpf_stack_walk(bool (*consume_fn)(void *cookie, u64 ip, u64 sp, u64 bp), void *cookie)
 {
 }
diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
index 4db1c658254c..304fe26cba65 100644
--- a/kernel/bpf/helpers.c
+++ b/kernel/bpf/helpers.c
@@ -2,6 +2,7 @@
 /* Copyright (c) 2011-2014 PLUMgrid, http://plumgrid.com
  */
 #include <linux/bpf.h>
+#include <linux/bpf_verifier.h>
 #include <linux/btf.h>
 #include <linux/bpf-cgroup.h>
 #include <linux/cgroup.h>
@@ -2499,12 +2500,113 @@ __bpf_kfunc void bpf_rcu_read_unlock(void)
 	rcu_read_unlock();
 }
 
-struct bpf_throw_ctx {
-	struct bpf_prog_aux *aux;
-	u64 sp;
-	u64 bp;
-	int cnt;
-};
+int bpf_cleanup_resource_reg(struct bpf_frame_desc_reg_entry *fd, void *ptr)
+{
+	u64 reg_value = ptr ? *(u64 *)ptr : 0;
+	struct btf_struct_meta *meta;
+	const struct btf_type *t;
+	s32 dtor_id;	/* btf_find_dtor_kfunc() may return -ENOENT */
+
+	switch (fd->type) {
+	case PTR_TO_SOCKET:
+	case PTR_TO_TCP_SOCK:
+	case PTR_TO_SOCK_COMMON:
+		if (reg_value)
+			bpf_sk_release_dtor((void *)reg_value);
+		return 0;
+	case PTR_TO_MEM | MEM_RINGBUF:
+		if (reg_value)
+			bpf_ringbuf_discard_proto.func(reg_value, 0, 0, 0, 0);
+		return 0;
+	case PTR_TO_BTF_ID | MEM_ALLOC:
+	case PTR_TO_BTF_ID | MEM_ALLOC | MEM_PERCPU:
+		if (!reg_value)
+			return 0;
+		meta = btf_find_struct_meta(fd->btf, fd->btf_id);
+		if (fd->type & MEM_PERCPU)
+			bpf_percpu_obj_drop_impl((void *)reg_value, meta);
+		else
+			bpf_obj_drop_impl((void *)reg_value, meta);
+		return 0;
+	case PTR_TO_BTF_ID:
+#ifdef CONFIG_NET
+		if (bsearch(&fd->btf_id, btf_sock_ids, MAX_BTF_SOCK_TYPE, sizeof(btf_sock_ids[0]), btf_id_cmp_func)) {
+			if (reg_value)
+				bpf_sk_release_dtor((void *)reg_value);
+			return 0;
+		}
+#endif
+		dtor_id = btf_find_dtor_kfunc(fd->btf, fd->btf_id, BPF_DTOR_KPTR | BPF_DTOR_CLEANUP);
+		if (dtor_id < 0)
+			return -EINVAL;
+		t = btf_type_by_id(fd->btf, dtor_id);
+		if (!t)
+			return -EINVAL;
+		if (reg_value) {
+			void (*dtor)(void *) = (void *)kallsyms_lookup_name(btf_name_by_offset(fd->btf, t->name_off));
+			dtor((void *)reg_value);
+		}
+		return 0;
+	case SCALAR_VALUE:
+	case NOT_INIT:
+		return 0;
+	default:
+		break;
+	}
+	return -EINVAL;
+}
+
+int bpf_cleanup_resource_dynptr(struct bpf_frame_desc_reg_entry *fd, void *ptr)
+{
+	switch (fd->type) {
+	case BPF_DYNPTR_TYPE_RINGBUF:
+		if (ptr)
+			bpf_ringbuf_discard_dynptr_proto.func((u64)ptr, 0, 0, 0, 0);
+		return 0;
+	default:
+		break;
+	}
+	return -EINVAL;
+}
+
+int bpf_cleanup_resource_iter(struct bpf_frame_desc_reg_entry *fd, void *ptr)
+{
+	const struct btf_type *t;
+	void (*dtor)(void *);
+	s32 dtor_id;	/* btf_find_dtor_kfunc() may return -ENOENT */
+
+	dtor_id = btf_find_dtor_kfunc(fd->btf, fd->btf_id, BPF_DTOR_CLEANUP);
+	if (dtor_id < 0)
+		return -EINVAL;
+	t = btf_type_by_id(fd->btf, dtor_id);
+	if (!t)
+		return -EINVAL;
+	dtor = (void *)kallsyms_lookup_name(btf_name_by_offset(fd->btf, t->name_off));
+	if (ptr)
+		dtor(ptr);
+	return 0;
+}
+
+void bpf_cleanup_resource(struct bpf_frame_desc_reg_entry *fd, void *ptr)
+{
+	if (!ptr)
+		return;
+	switch (fd->spill_type) {
+	case STACK_DYNPTR:
+		bpf_cleanup_resource_dynptr(fd, ptr);
+		break;
+	case STACK_ITER:
+		bpf_cleanup_resource_iter(fd, ptr);
+		break;
+	case STACK_SPILL:
+	case STACK_INVALID:
+		bpf_cleanup_resource_reg(fd, ptr);
+		break;
+	default:
+		WARN_ON_ONCE(1);
+		break;
+	}
+}
 
 static bool bpf_stack_walker(void *cookie, u64 ip, u64 sp, u64 bp)
 {
@@ -2514,13 +2616,12 @@ static bool bpf_stack_walker(void *cookie, u64 ip, u64 sp, u64 bp)
 	if (!is_bpf_text_address(ip))
 		return !ctx->cnt;
 	prog = bpf_prog_ksym_find(ip);
-	ctx->cnt++;
-	if (bpf_is_subprog(prog))
-		return true;
 	ctx->aux = prog->aux;
 	ctx->sp = sp;
 	ctx->bp = bp;
-	return false;
+	arch_bpf_cleanup_frame_resource(prog, ctx, ip, sp, bp);
+	ctx->cnt++;
+	return bpf_is_subprog(prog);
 }
 
 __bpf_kfunc void bpf_throw(u64 cookie)
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index ec9acadc9ea8..3e3b8a20451c 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -10216,6 +10216,11 @@ static int gen_exception_frame_desc_reg_entry(struct bpf_verifier_env *env, stru
 	fd.off = off;
 	verbose(env, "frame_desc: frame%d: insn_idx=%d %s=%d size=%d ref_obj_id=%d type=%s\n",
 		frameno, env->insn_idx, off < 0 ? "off" : "regno", off, BPF_REG_SIZE, reg->ref_obj_id, reg_type_str(env, reg->type));
+
+	if (bpf_cleanup_resource_reg(&fd, NULL)) {
+		verbose(env, "frame_desc: frame%d: failed to simulate cleanup for frame desc entry\n", frameno);
+		return -EFAULT;
+	}
 	return push_exception_frame_desc(env, frameno, &fd);
 }
 
@@ -10241,6 +10246,11 @@ static int gen_exception_frame_desc_dynptr_entry(struct bpf_verifier_env *env, s
 		verbose(env, "verifier internal error: refcounted dynptr type unhandled for exception frame descriptor entry\n");
 		return -EFAULT;
 	}
+
+	if (bpf_cleanup_resource_dynptr(&fd, NULL)) {
+		verbose(env, "frame_desc: frame%d: failed to simulate cleanup for frame desc entry\n", frameno);
+		return -EFAULT;
+	}
 	return push_exception_frame_desc(env, frameno, &fd);
 }
 
@@ -10268,6 +10278,11 @@ static int gen_exception_frame_desc_iter_entry(struct bpf_verifier_env *env, str
 		btf_put(btf);
 		return ret;
 	}
+
+	if (bpf_cleanup_resource_iter(&fd, NULL)) {
+		verbose(env, "frame_desc: frame%d: failed to simulate cleanup for frame desc entry\n", frameno);
+		return -EFAULT;
+	}
 	return push_exception_frame_desc(env, frameno, &fd);
 }
 
@@ -10348,6 +10363,11 @@ static int gen_exception_frame_descs(struct bpf_verifier_env *env)
 
 	__mark_reg_not_init(env, &not_init_reg);
 
+	if (!bpf_jit_supports_exceptions_cleanup()) {
+		verbose(env, "JIT does not support cleanup of resources when throwing exceptions\n");
+		return -ENOTSUPP;
+	}
+
 	for (int frameno = env->cur_state->curframe; frameno >= 0; frameno--) {
 		struct bpf_func_state *frame = env->cur_state->frame[frameno];
 
diff --git a/net/core/filter.c b/net/core/filter.c
index 524adf1fa6d0..789e36f79f4c 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -6912,6 +6912,11 @@ static const struct bpf_func_proto bpf_sk_release_proto = {
 	.arg1_type	= ARG_PTR_TO_BTF_ID_SOCK_COMMON | OBJ_RELEASE,
 };
 
+void bpf_sk_release_dtor(void *ptr)
+{
+	bpf_sk_release((u64)ptr, 0, 0, 0, 0);
+}
+
 BPF_CALL_5(bpf_xdp_sk_lookup_udp, struct xdp_buff *, ctx,
 	   struct bpf_sock_tuple *, tuple, u32, len, u32, netns_id, u64, flags)
 {
-- 
2.40.1



* [RFC PATCH v1 11/14] bpf: Release references in verifier state when throwing exceptions
  2024-02-01  4:20 [RFC PATCH v1 00/14] Exceptions - Resource Cleanup Kumar Kartikeya Dwivedi
                   ` (9 preceding siblings ...)
  2024-02-01  4:21 ` [RFC PATCH v1 10/14] bpf, x86: Implement runtime resource cleanup for exceptions Kumar Kartikeya Dwivedi
@ 2024-02-01  4:21 ` Kumar Kartikeya Dwivedi
  2024-02-16 12:21   ` Eduard Zingerman
  2024-02-01  4:21 ` [RFC PATCH v1 12/14] bpf: Register cleanup dtors for runtime unwinding Kumar Kartikeya Dwivedi
                   ` (3 subsequent siblings)
  14 siblings, 1 reply; 53+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2024-02-01  4:21 UTC (permalink / raw)
  To: bpf
  Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Martin KaFai Lau, David Vernet, Tejun Heo, Raj Sahu,
	Dan Williams, Rishabh Iyer, Sanidhya Kashyap

Reflect in the verifier state that references will be released whenever
we throw a BPF exception. Now that we support generating frame
descriptors and performing the runtime cleanup, make sure that whenever
we process an entry corresponding to an acquired reference, we also
release its reference state. Note that we only release this state for
the current frame, as the acquired refs are only checked against that
frame when processing an exceptional exit.

This ensures that for acquired resources apart from locks and RCU read
sections, BPF programs never fail verification due to lingering
resources when throwing an exception.

While at it, we can tweak check_reference_leak to drop the
exception_exit parameter, and fix selftests that will fail due to the
changed behaviour.
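
As a sketch of the user-visible effect (cf. the selftest updated
below), a program fragment such as the following previously failed
verification with "Unreleased reference", and is now accepted, with the
object dropped by the runtime unwinder:

	struct { long a; } *p = bpf_obj_new(typeof(*p));

	if (!p)
		return 0;
	bpf_throw(0);	/* p is released during unwinding */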

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
---
 kernel/bpf/verifier.c                          | 18 ++++++++++++------
 .../selftests/bpf/progs/exceptions_fail.c      |  2 +-
 2 files changed, 13 insertions(+), 7 deletions(-)

diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 3e3b8a20451c..8edefcd999ea 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -10221,6 +10221,8 @@ static int gen_exception_frame_desc_reg_entry(struct bpf_verifier_env *env, stru
 		verbose(env, "frame_desc: frame%d: failed to simulate cleanup for frame desc entry\n", frameno);
 		return -EFAULT;
 	}
+	if (reg->ref_obj_id && frameno == cur_func(env)->frameno)
+		WARN_ON_ONCE(release_reference(env, reg->ref_obj_id));
 	return push_exception_frame_desc(env, frameno, &fd);
 }
 
@@ -10251,6 +10253,8 @@ static int gen_exception_frame_desc_dynptr_entry(struct bpf_verifier_env *env, s
 		verbose(env, "frame_desc: frame%d: failed to simulate cleanup for frame desc entry\n", frameno);
 		return -EFAULT;
 	}
+	if (frameno == cur_func(env)->frameno)
+		WARN_ON_ONCE(release_reference(env, reg->ref_obj_id));
 	return push_exception_frame_desc(env, frameno, &fd);
 }
 
@@ -10283,6 +10287,8 @@ static int gen_exception_frame_desc_iter_entry(struct bpf_verifier_env *env, str
 		verbose(env, "frame_desc: frame%d: failed to simulate cleanup for frame desc entry\n", frameno);
 		return -EFAULT;
 	}
+	if (frameno == cur_func(env)->frameno)
+		WARN_ON_ONCE(release_reference(env, reg->ref_obj_id));
 	return push_exception_frame_desc(env, frameno, &fd);
 }
 
@@ -10393,17 +10399,17 @@ static int gen_exception_frame_descs(struct bpf_verifier_env *env)
 	return 0;
 }
 
-static int check_reference_leak(struct bpf_verifier_env *env, bool exception_exit)
+static int check_reference_leak(struct bpf_verifier_env *env)
 {
 	struct bpf_func_state *state = cur_func(env);
 	bool refs_lingering = false;
 	int i;
 
-	if (!exception_exit && state->frameno && !state->in_callback_fn)
+	if (state->frameno && !state->in_callback_fn)
 		return 0;
 
 	for (i = 0; i < state->acquired_refs; i++) {
-		if (!exception_exit && state->in_callback_fn && state->refs[i].callback_ref != state->frameno)
+		if (state->in_callback_fn && state->refs[i].callback_ref != state->frameno)
 			continue;
 		verbose(env, "Unreleased reference id=%d alloc_insn=%d\n",
 			state->refs[i].id, state->refs[i].insn_idx);
@@ -10658,7 +10664,7 @@ static int check_helper_call(struct bpf_verifier_env *env, struct bpf_insn *insn
 
 	switch (func_id) {
 	case BPF_FUNC_tail_call:
-		err = check_reference_leak(env, false);
+		err = check_reference_leak(env);
 		if (err) {
 			verbose(env, "tail_call would lead to reference leak\n");
 			return err;
@@ -15593,7 +15599,7 @@ static int check_ld_abs(struct bpf_verifier_env *env, struct bpf_insn *insn)
 	 * gen_ld_abs() may terminate the program at runtime, leading to
 	 * reference leak.
 	 */
-	err = check_reference_leak(env, false);
+	err = check_reference_leak(env);
 	if (err) {
 		verbose(env, "BPF_LD_[ABS|IND] cannot be mixed with socket references\n");
 		return err;
@@ -18149,7 +18155,7 @@ static int do_check(struct bpf_verifier_env *env)
 				 * function, for which reference_state must
 				 * match caller reference state when it exits.
 				 */
-				err = check_reference_leak(env, exception_exit);
+				err = check_reference_leak(env);
 				if (err)
 					return err;
 
diff --git a/tools/testing/selftests/bpf/progs/exceptions_fail.c b/tools/testing/selftests/bpf/progs/exceptions_fail.c
index 5a517065b4e6..dfd164a7a261 100644
--- a/tools/testing/selftests/bpf/progs/exceptions_fail.c
+++ b/tools/testing/selftests/bpf/progs/exceptions_fail.c
@@ -354,7 +354,7 @@ int reject_exception_throw_cb_diff(struct __sk_buff *ctx)
 }
 
 SEC("?tc")
-__failure __msg("exploring program path where exception is thrown")
+__success __log_level(2) __msg("exploring program path where exception is thrown")
 int reject_exception_throw_ref_call_throwing_global(struct __sk_buff *ctx)
 {
 	struct { long a; } *p = bpf_obj_new(typeof(*p));
-- 
2.40.1



* [RFC PATCH v1 12/14] bpf: Register cleanup dtors for runtime unwinding
  2024-02-01  4:20 [RFC PATCH v1 00/14] Exceptions - Resource Cleanup Kumar Kartikeya Dwivedi
                   ` (10 preceding siblings ...)
  2024-02-01  4:21 ` [RFC PATCH v1 11/14] bpf: Release references in verifier state when throwing exceptions Kumar Kartikeya Dwivedi
@ 2024-02-01  4:21 ` Kumar Kartikeya Dwivedi
  2024-02-01  4:21 ` [RFC PATCH v1 13/14] bpf: Make bpf_throw available to all program types Kumar Kartikeya Dwivedi
                   ` (2 subsequent siblings)
  14 siblings, 0 replies; 53+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2024-02-01  4:21 UTC (permalink / raw)
  To: bpf
  Cc: Jiri Kosina, Benjamin Tissoires, Pablo Neira Ayuso,
	Florian Westphal, Alexei Starovoitov, Andrii Nakryiko,
	Daniel Borkmann, Martin KaFai Lau, David Vernet, Tejun Heo,
	Raj Sahu, Dan Williams, Rishabh Iyer, Sanidhya Kashyap

Reuse the existing BTF dtor infrastructure to also include dtor kfuncs
that can be used to release PTR_TO_BTF_ID pointers and other BTF
objects (iterators). For this purpose, we extend the btf_id_dtor_kfunc
object with a flags field, and ensure that entries that cannot work as
kptrs are not allowed to be embedded in map values.

Prior to this change, btf_id_dtor_kfunc served the dual role of acting
as an allowlist for kptrs and of locating their dtors. To separate
these roles, we must now explicitly pass BPF_DTOR_KPTR when looking up
kptr dtors, to ensure we don't find other cleanup kfuncs in the dtor
table.

Finally, set up iterator and other objects that can be acquired to be
released by adding their cleanup kfunc dtor entries and registering them
with the BTF.

Cc: Jiri Kosina <jikos@kernel.org>
Cc: Benjamin Tissoires <benjamin.tissoires@redhat.com>
Cc: Pablo Neira Ayuso <pablo@netfilter.org>
Cc: Florian Westphal <fw@strlen.de>
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
---
 drivers/hid/bpf/hid_bpf_dispatch.c | 17 ++++++++++++
 include/linux/btf.h                | 10 +++++--
 kernel/bpf/btf.c                   | 11 +++++---
 kernel/bpf/cpumask.c               |  3 ++-
 kernel/bpf/helpers.c               | 43 +++++++++++++++++++++++++++---
 kernel/trace/bpf_trace.c           | 16 +++++++++++
 net/bpf/test_run.c                 |  4 ++-
 net/netfilter/nf_conntrack_bpf.c   | 14 +++++++++-
 net/xfrm/xfrm_state_bpf.c          | 16 +++++++++++
 9 files changed, 123 insertions(+), 11 deletions(-)

diff --git a/drivers/hid/bpf/hid_bpf_dispatch.c b/drivers/hid/bpf/hid_bpf_dispatch.c
index 02c441aaa217..eea1699b91cc 100644
--- a/drivers/hid/bpf/hid_bpf_dispatch.c
+++ b/drivers/hid/bpf/hid_bpf_dispatch.c
@@ -452,6 +452,10 @@ static const struct btf_kfunc_id_set hid_bpf_syscall_kfunc_set = {
 	.set   = &hid_bpf_syscall_kfunc_ids,
 };
 
+BTF_ID_LIST(hid_bpf_dtor_id_list)
+BTF_ID(struct, hid_bpf_ctx)
+BTF_ID(func, hid_bpf_release_context)
+
 int hid_bpf_connect_device(struct hid_device *hdev)
 {
 	struct hid_bpf_prog_list *prog_list;
@@ -496,6 +500,13 @@ EXPORT_SYMBOL_GPL(hid_bpf_device_init);
 
 static int __init hid_bpf_init(void)
 {
+	const struct btf_id_dtor_kfunc dtors[] = {
+		{
+			.btf_id = hid_bpf_dtor_id_list[0],
+			.kfunc_btf_id = hid_bpf_dtor_id_list[1],
+			.flags = BPF_DTOR_CLEANUP,
+		},
+	};
 	int err;
 
 	/* Note: if we exit with an error any time here, we would entirely break HID, which
@@ -505,6 +516,12 @@ static int __init hid_bpf_init(void)
 	 * will not be available, so nobody will be able to use the functionality.
 	 */
 
+	err = register_btf_id_dtor_kfuncs(dtors, ARRAY_SIZE(dtors), THIS_MODULE);
+	if (err) {
+		pr_warn("error while registering hid_bpf cleanup dtors: %d", err);
+		return 0;
+	}
+
 	err = register_btf_fmodret_id_set(&hid_bpf_fmodret_set);
 	if (err) {
 		pr_warn("error while registering fmodret entrypoints: %d", err);
diff --git a/include/linux/btf.h b/include/linux/btf.h
index 1ee8977b8c95..219cc4a5d22d 100644
--- a/include/linux/btf.h
+++ b/include/linux/btf.h
@@ -120,9 +120,15 @@ struct btf_kfunc_id_set {
 	btf_kfunc_filter_t filter;
 };
 
+enum {
+	BPF_DTOR_KPTR	 = (1 << 0),
+	BPF_DTOR_CLEANUP = (1 << 1),
+};
+
 struct btf_id_dtor_kfunc {
 	u32 btf_id;
 	u32 kfunc_btf_id;
+	u32 flags;
 };
 
 struct btf_struct_meta {
@@ -521,7 +527,7 @@ u32 *btf_kfunc_is_modify_return(const struct btf *btf, u32 kfunc_btf_id,
 int register_btf_kfunc_id_set(enum bpf_prog_type prog_type,
 			      const struct btf_kfunc_id_set *s);
 int register_btf_fmodret_id_set(const struct btf_kfunc_id_set *kset);
-s32 btf_find_dtor_kfunc(struct btf *btf, u32 btf_id);
+s32 btf_find_dtor_kfunc(struct btf *btf, u32 btf_id, u32 flags);
 int register_btf_id_dtor_kfuncs(const struct btf_id_dtor_kfunc *dtors, u32 add_cnt,
 				struct module *owner);
 struct btf_struct_meta *btf_find_struct_meta(const struct btf *btf, u32 btf_id);
@@ -555,7 +561,7 @@ static inline int register_btf_kfunc_id_set(enum bpf_prog_type prog_type,
 {
 	return 0;
 }
-static inline s32 btf_find_dtor_kfunc(struct btf *btf, u32 btf_id)
+static inline s32 btf_find_dtor_kfunc(struct btf *btf, u32 btf_id, u32 flags)
 {
 	return -ENOENT;
 }
diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c
index ef380e546952..17b9c04a71dd 100644
--- a/kernel/bpf/btf.c
+++ b/kernel/bpf/btf.c
@@ -3657,7 +3657,7 @@ static int btf_parse_kptr(const struct btf *btf, struct btf_field *field,
 		 * can be used as a referenced pointer and be stored in a map at
 		 * the same time.
 		 */
-		dtor_btf_id = btf_find_dtor_kfunc(kptr_btf, id);
+		dtor_btf_id = btf_find_dtor_kfunc(kptr_btf, id, BPF_DTOR_KPTR);
 		if (dtor_btf_id < 0) {
 			ret = dtor_btf_id;
 			goto end_btf;
@@ -8144,7 +8144,7 @@ int register_btf_fmodret_id_set(const struct btf_kfunc_id_set *kset)
 }
 EXPORT_SYMBOL_GPL(register_btf_fmodret_id_set);
 
-s32 btf_find_dtor_kfunc(struct btf *btf, u32 btf_id)
+s32 btf_find_dtor_kfunc(struct btf *btf, u32 btf_id, u32 flags)
 {
 	struct btf_id_dtor_kfunc_tab *tab = btf->dtor_kfunc_tab;
 	struct btf_id_dtor_kfunc *dtor;
@@ -8156,7 +8156,7 @@ s32 btf_find_dtor_kfunc(struct btf *btf, u32 btf_id)
 	 */
 	BUILD_BUG_ON(offsetof(struct btf_id_dtor_kfunc, btf_id) != 0);
 	dtor = bsearch(&btf_id, tab->dtors, tab->cnt, sizeof(tab->dtors[0]), btf_id_cmp_func);
-	if (!dtor)
+	if (!dtor || !(dtor->flags & flags))
 		return -ENOENT;
 	return dtor->kfunc_btf_id;
 }
@@ -8171,6 +8171,11 @@ static int btf_check_dtor_kfuncs(struct btf *btf, const struct btf_id_dtor_kfunc
 	for (i = 0; i < cnt; i++) {
 		dtor_btf_id = dtors[i].kfunc_btf_id;
 
+		if (!dtors[i].flags) {
+			pr_err("missing flag for btf_id_dtor_kfunc entry\n");
+			return -EINVAL;
+		}
+
 		dtor_func = btf_type_by_id(btf, dtor_btf_id);
 		if (!dtor_func || !btf_type_is_func(dtor_func))
 			return -EINVAL;
diff --git a/kernel/bpf/cpumask.c b/kernel/bpf/cpumask.c
index dad0fb1c8e87..7209adc1af6b 100644
--- a/kernel/bpf/cpumask.c
+++ b/kernel/bpf/cpumask.c
@@ -467,7 +467,8 @@ static int __init cpumask_kfunc_init(void)
 	const struct btf_id_dtor_kfunc cpumask_dtors[] = {
 		{
 			.btf_id	      = cpumask_dtor_ids[0],
-			.kfunc_btf_id = cpumask_dtor_ids[1]
+			.kfunc_btf_id = cpumask_dtor_ids[1],
+			.flags = BPF_DTOR_KPTR | BPF_DTOR_CLEANUP,
 		},
 	};
 
diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
index 304fe26cba65..e1dfc4053f45 100644
--- a/kernel/bpf/helpers.c
+++ b/kernel/bpf/helpers.c
@@ -2685,9 +2685,19 @@ static const struct btf_kfunc_id_set generic_kfunc_set = {
 BTF_ID_LIST(generic_dtor_ids)
 BTF_ID(struct, task_struct)
 BTF_ID(func, bpf_task_release_dtor)
+BTF_ID(struct, bpf_iter_num)
+BTF_ID(func, bpf_iter_num_destroy)
+BTF_ID(struct, bpf_iter_task)
+BTF_ID(func, bpf_iter_task_destroy)
+BTF_ID(struct, bpf_iter_task_vma)
+BTF_ID(func, bpf_iter_task_vma_destroy)
 #ifdef CONFIG_CGROUPS
 BTF_ID(struct, cgroup)
 BTF_ID(func, bpf_cgroup_release_dtor)
+BTF_ID(struct, bpf_iter_css)
+BTF_ID(func, bpf_iter_css_destroy)
+BTF_ID(struct, bpf_iter_css_task)
+BTF_ID(func, bpf_iter_css_task_destroy)
 #endif
 
 BTF_KFUNCS_START(common_btf_ids)
@@ -2732,12 +2742,39 @@ static int __init kfunc_init(void)
 	const struct btf_id_dtor_kfunc generic_dtors[] = {
 		{
 			.btf_id       = generic_dtor_ids[0],
-			.kfunc_btf_id = generic_dtor_ids[1]
+			.kfunc_btf_id = generic_dtor_ids[1],
+			.flags        = BPF_DTOR_KPTR | BPF_DTOR_CLEANUP,
 		},
-#ifdef CONFIG_CGROUPS
 		{
 			.btf_id       = generic_dtor_ids[2],
-			.kfunc_btf_id = generic_dtor_ids[3]
+			.kfunc_btf_id = generic_dtor_ids[3],
+			.flags        = BPF_DTOR_CLEANUP,
+		},
+		{
+			.btf_id       = generic_dtor_ids[4],
+			.kfunc_btf_id = generic_dtor_ids[5],
+			.flags        = BPF_DTOR_CLEANUP,
+		},
+		{
+			.btf_id       = generic_dtor_ids[6],
+			.kfunc_btf_id = generic_dtor_ids[7],
+			.flags        = BPF_DTOR_CLEANUP,
+		},
+#ifdef CONFIG_CGROUPS
+		{
+			.btf_id       = generic_dtor_ids[8],
+			.kfunc_btf_id = generic_dtor_ids[9],
+			.flags        = BPF_DTOR_KPTR | BPF_DTOR_CLEANUP,
+		},
+		{
+			.btf_id       = generic_dtor_ids[10],
+			.kfunc_btf_id = generic_dtor_ids[11],
+			.flags        = BPF_DTOR_CLEANUP,
+		},
+		{
+			.btf_id       = generic_dtor_ids[12],
+			.kfunc_btf_id = generic_dtor_ids[13],
+			.flags        = BPF_DTOR_CLEANUP,
 		},
 #endif
 	};
diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
index 241ddf5e3895..7a4bab3e698c 100644
--- a/kernel/trace/bpf_trace.c
+++ b/kernel/trace/bpf_trace.c
@@ -1426,8 +1426,24 @@ static const struct btf_kfunc_id_set bpf_key_sig_kfunc_set = {
 	.set = &key_sig_kfunc_set,
 };
 
+BTF_ID_LIST(bpf_key_dtor_id_list)
+BTF_ID(struct, bpf_key)
+BTF_ID(func, bpf_key_put)
+
 static int __init bpf_key_sig_kfuncs_init(void)
 {
+	const struct btf_id_dtor_kfunc dtors[] = {
+		{
+			.btf_id = bpf_key_dtor_id_list[0],
+			.kfunc_btf_id = bpf_key_dtor_id_list[1],
+			.flags = BPF_DTOR_CLEANUP,
+		},
+	};
+	int ret;
+
+	ret = register_btf_id_dtor_kfuncs(dtors, ARRAY_SIZE(dtors), THIS_MODULE);
+	if (ret < 0)
+		return 0;
 	return register_btf_kfunc_id_set(BPF_PROG_TYPE_TRACING,
 					 &bpf_key_sig_kfunc_set);
 }
diff --git a/net/bpf/test_run.c b/net/bpf/test_run.c
index 5535f9adc658..4f506b27bb13 100644
--- a/net/bpf/test_run.c
+++ b/net/bpf/test_run.c
@@ -1691,11 +1691,13 @@ static int __init bpf_prog_test_run_init(void)
 	const struct btf_id_dtor_kfunc bpf_prog_test_dtor_kfunc[] = {
 		{
 		  .btf_id       = bpf_prog_test_dtor_kfunc_ids[0],
-		  .kfunc_btf_id = bpf_prog_test_dtor_kfunc_ids[1]
+		  .kfunc_btf_id = bpf_prog_test_dtor_kfunc_ids[1],
+		  .flags = BPF_DTOR_KPTR,
 		},
 		{
 		  .btf_id	= bpf_prog_test_dtor_kfunc_ids[2],
 		  .kfunc_btf_id = bpf_prog_test_dtor_kfunc_ids[3],
+		  .flags = BPF_DTOR_KPTR,
 		},
 	};
 	int ret;
diff --git a/net/netfilter/nf_conntrack_bpf.c b/net/netfilter/nf_conntrack_bpf.c
index d2492d050fe6..00eb111d9c1a 100644
--- a/net/netfilter/nf_conntrack_bpf.c
+++ b/net/netfilter/nf_conntrack_bpf.c
@@ -485,11 +485,23 @@ static const struct btf_kfunc_id_set nf_conntrack_kfunc_set = {
 	.set   = &nf_ct_kfunc_set,
 };
 
+BTF_ID_LIST(nf_dtor_id_list)
+BTF_ID(struct, nf_conn)
+BTF_ID(func, bpf_ct_release)
+
 int register_nf_conntrack_bpf(void)
 {
+	const struct btf_id_dtor_kfunc dtors[] = {
+		{
+			.btf_id = nf_dtor_id_list[0],
+			.kfunc_btf_id = nf_dtor_id_list[1],
+			.flags = BPF_DTOR_CLEANUP,
+		},
+	};
 	int ret;
 
-	ret = register_btf_kfunc_id_set(BPF_PROG_TYPE_XDP, &nf_conntrack_kfunc_set);
+	ret = register_btf_id_dtor_kfuncs(dtors, ARRAY_SIZE(dtors), THIS_MODULE);
+	ret = ret ?: register_btf_kfunc_id_set(BPF_PROG_TYPE_XDP, &nf_conntrack_kfunc_set);
 	ret = ret ?: register_btf_kfunc_id_set(BPF_PROG_TYPE_SCHED_CLS, &nf_conntrack_kfunc_set);
 	if (!ret) {
 		mutex_lock(&nf_conn_btf_access_lock);
diff --git a/net/xfrm/xfrm_state_bpf.c b/net/xfrm/xfrm_state_bpf.c
index 2248eda741f8..fdf6c22d145f 100644
--- a/net/xfrm/xfrm_state_bpf.c
+++ b/net/xfrm/xfrm_state_bpf.c
@@ -127,8 +127,24 @@ static const struct btf_kfunc_id_set xfrm_state_xdp_kfunc_set = {
 	.set   = &xfrm_state_kfunc_set,
 };
 
+BTF_ID_LIST(dtor_id_list)
+BTF_ID(struct, xfrm_state)
+BTF_ID(func, bpf_xdp_xfrm_state_release)
+
 int __init register_xfrm_state_bpf(void)
 {
+	const struct btf_id_dtor_kfunc dtors[] = {
+		{
+			.btf_id = dtor_id_list[0],
+			.kfunc_btf_id = dtor_id_list[1],
+			.flags = BPF_DTOR_CLEANUP,
+		},
+	};
+	int ret;
+
+	ret = register_btf_id_dtor_kfuncs(dtors, ARRAY_SIZE(dtors), THIS_MODULE);
+	if (ret < 0)
+		return ret;
 	return register_btf_kfunc_id_set(BPF_PROG_TYPE_XDP,
 					 &xfrm_state_xdp_kfunc_set);
 }
-- 
2.40.1



* [RFC PATCH v1 13/14] bpf: Make bpf_throw available to all program types
  2024-02-01  4:20 [RFC PATCH v1 00/14] Exceptions - Resource Cleanup Kumar Kartikeya Dwivedi
                   ` (11 preceding siblings ...)
  2024-02-01  4:21 ` [RFC PATCH v1 12/14] bpf: Register cleanup dtors for runtime unwinding Kumar Kartikeya Dwivedi
@ 2024-02-01  4:21 ` Kumar Kartikeya Dwivedi
  2024-02-01  4:21 ` [RFC PATCH v1 14/14] selftests/bpf: Add tests for exceptions runtime cleanup Kumar Kartikeya Dwivedi
  2024-03-14 11:08 ` [RFC PATCH v1 00/14] Exceptions - Resource Cleanup Eduard Zingerman
  14 siblings, 0 replies; 53+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2024-02-01  4:21 UTC (permalink / raw)
  To: bpf
  Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Martin KaFai Lau, David Vernet, Tejun Heo, Raj Sahu,
	Dan Williams, Rishabh Iyer, Sanidhya Kashyap

Move the bpf_throw kfunc to common_kfunc_set so that any program type
can utilize it to throw exceptions. This will also be useful for
exercising the cleanup logic over a wider variety of program types.
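
As a minimal sketch of what this enables (a hypothetical program, not
taken from the selftests), e.g. a tracing program may now throw:

	SEC("kprobe/do_unlinkat")
	int throw_on_error(void *ctx)
	{
		/* arbitrary condition, for illustration only */
		if (bpf_get_smp_processor_id() > 64)
			bpf_throw(0 /* cookie */);
		return 0;
	}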

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
---
 kernel/bpf/helpers.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
index e1dfc4053f45..388174f34a9b 100644
--- a/kernel/bpf/helpers.c
+++ b/kernel/bpf/helpers.c
@@ -2729,6 +2729,7 @@ BTF_ID_FLAGS(func, bpf_dynptr_is_null)
 BTF_ID_FLAGS(func, bpf_dynptr_is_rdonly)
 BTF_ID_FLAGS(func, bpf_dynptr_size)
 BTF_ID_FLAGS(func, bpf_dynptr_clone)
+BTF_ID_FLAGS(func, bpf_throw)
 BTF_KFUNCS_END(common_btf_ids)
 
 static const struct btf_kfunc_id_set common_kfunc_set = {
-- 
2.40.1



* [RFC PATCH v1 14/14] selftests/bpf: Add tests for exceptions runtime cleanup
  2024-02-01  4:20 [RFC PATCH v1 00/14] Exceptions - Resource Cleanup Kumar Kartikeya Dwivedi
                   ` (12 preceding siblings ...)
  2024-02-01  4:21 ` [RFC PATCH v1 13/14] bpf: Make bpf_throw available to all program types Kumar Kartikeya Dwivedi
@ 2024-02-01  4:21 ` Kumar Kartikeya Dwivedi
  2024-02-12 20:53   ` David Vernet
  2024-03-14 11:08 ` [RFC PATCH v1 00/14] Exceptions - Resource Cleanup Eduard Zingerman
  14 siblings, 1 reply; 53+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2024-02-01  4:21 UTC (permalink / raw)
  To: bpf
  Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Martin KaFai Lau, David Vernet, Tejun Heo, Raj Sahu,
	Dan Williams, Rishabh Iyer, Sanidhya Kashyap

Add tests for the runtime cleanup support for exceptions, ensuring that
resources are correctly identified and released when an exception is
thrown. Also, we add negative tests to exercise corner cases the
verifier should reject.

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
---
 tools/testing/selftests/bpf/DENYLIST.aarch64  |   1 +
 tools/testing/selftests/bpf/DENYLIST.s390x    |   1 +
 .../bpf/prog_tests/exceptions_cleanup.c       |  65 +++
 .../selftests/bpf/progs/exceptions_cleanup.c  | 468 ++++++++++++++++++
 .../bpf/progs/exceptions_cleanup_fail.c       | 154 ++++++
 .../selftests/bpf/progs/exceptions_fail.c     |  13 -
 6 files changed, 689 insertions(+), 13 deletions(-)
 create mode 100644 tools/testing/selftests/bpf/prog_tests/exceptions_cleanup.c
 create mode 100644 tools/testing/selftests/bpf/progs/exceptions_cleanup.c
 create mode 100644 tools/testing/selftests/bpf/progs/exceptions_cleanup_fail.c

diff --git a/tools/testing/selftests/bpf/DENYLIST.aarch64 b/tools/testing/selftests/bpf/DENYLIST.aarch64
index 5c2cc7e8c5d0..6fc79727cd14 100644
--- a/tools/testing/selftests/bpf/DENYLIST.aarch64
+++ b/tools/testing/selftests/bpf/DENYLIST.aarch64
@@ -1,6 +1,7 @@
 bpf_cookie/multi_kprobe_attach_api               # kprobe_multi_link_api_subtest:FAIL:fentry_raw_skel_load unexpected error: -3
 bpf_cookie/multi_kprobe_link_api                 # kprobe_multi_link_api_subtest:FAIL:fentry_raw_skel_load unexpected error: -3
 exceptions					 # JIT does not support calling kfunc bpf_throw: -524
+exceptions_unwind				 # JIT does not support calling kfunc bpf_throw: -524
 fexit_sleep                                      # The test never returns. The remaining tests cannot start.
 kprobe_multi_bench_attach                        # needs CONFIG_FPROBE
 kprobe_multi_test                                # needs CONFIG_FPROBE
diff --git a/tools/testing/selftests/bpf/DENYLIST.s390x b/tools/testing/selftests/bpf/DENYLIST.s390x
index 1a63996c0304..f09a73dee72c 100644
--- a/tools/testing/selftests/bpf/DENYLIST.s390x
+++ b/tools/testing/selftests/bpf/DENYLIST.s390x
@@ -1,5 +1,6 @@
 # TEMPORARY
 # Alphabetical order
 exceptions				 # JIT does not support calling kfunc bpf_throw				       (exceptions)
+exceptions_unwind			 # JIT does not support calling kfunc bpf_throw				       (exceptions)
 get_stack_raw_tp                         # user_stack corrupted user stack                                             (no backchain userspace)
 stacktrace_build_id                      # compare_map_keys stackid_hmap vs. stackmap err -2 errno 2                   (?)
diff --git a/tools/testing/selftests/bpf/prog_tests/exceptions_cleanup.c b/tools/testing/selftests/bpf/prog_tests/exceptions_cleanup.c
new file mode 100644
index 000000000000..78df037b60ea
--- /dev/null
+++ b/tools/testing/selftests/bpf/prog_tests/exceptions_cleanup.c
@@ -0,0 +1,65 @@
+#include "bpf/bpf.h"
+#include "exceptions.skel.h"
+#include <test_progs.h>
+#include <network_helpers.h>
+
+#include "exceptions_cleanup.skel.h"
+#include "exceptions_cleanup_fail.skel.h"
+
+static void test_exceptions_cleanup_fail(void)
+{
+	RUN_TESTS(exceptions_cleanup_fail);
+}
+
+void test_exceptions_cleanup(void)
+{
+	LIBBPF_OPTS(bpf_test_run_opts, ropts,
+		.data_in = &pkt_v4,
+		.data_size_in = sizeof(pkt_v4),
+		.repeat = 1,
+	);
+	struct exceptions_cleanup *skel;
+	int ret;
+
+	if (test__start_subtest("exceptions_cleanup_fail"))
+		test_exceptions_cleanup_fail();
+
+	skel = exceptions_cleanup__open_and_load();
+	if (!ASSERT_OK_PTR(skel, "exceptions_cleanup__open_and_load"))
+		return;
+
+	ret = exceptions_cleanup__attach(skel);
+	if (!ASSERT_OK(ret, "exceptions_cleanup__attach"))
+		return;
+
+#define RUN_EXC_CLEANUP_TEST(name)                                      \
+	ret = bpf_prog_test_run_opts(bpf_program__fd(skel->progs.name), \
+				     &ropts);                           \
+	if (!ASSERT_OK(ret, #name ": return value"))                    \
+		return;                                                 \
+	if (!ASSERT_EQ(ropts.retval, 0xeB9F, #name ": opts.retval"))    \
+		return;                                                 \
+	ret = bpf_prog_test_run_opts(                                   \
+		bpf_program__fd(skel->progs.exceptions_cleanup_check),  \
+		&ropts);                                                \
+	if (!ASSERT_OK(ret, #name " CHECK: return value"))              \
+		return;                                                 \
+	if (!ASSERT_EQ(ropts.retval, 0, #name " CHECK: opts.retval"))   \
+		return;                                                 \
+	skel->bss->only_count = 0;
+
+	RUN_EXC_CLEANUP_TEST(exceptions_cleanup_prog_num_iter);
+	RUN_EXC_CLEANUP_TEST(exceptions_cleanup_prog_num_iter_mult);
+	RUN_EXC_CLEANUP_TEST(exceptions_cleanup_prog_dynptr_iter);
+	RUN_EXC_CLEANUP_TEST(exceptions_cleanup_obj);
+	RUN_EXC_CLEANUP_TEST(exceptions_cleanup_percpu_obj);
+	RUN_EXC_CLEANUP_TEST(exceptions_cleanup_ringbuf);
+	RUN_EXC_CLEANUP_TEST(exceptions_cleanup_reg);
+	RUN_EXC_CLEANUP_TEST(exceptions_cleanup_null_or_ptr_do_ptr);
+	RUN_EXC_CLEANUP_TEST(exceptions_cleanup_null_or_ptr_do_null);
+	RUN_EXC_CLEANUP_TEST(exceptions_cleanup_callee_saved);
+	RUN_EXC_CLEANUP_TEST(exceptions_cleanup_frame);
+	RUN_EXC_CLEANUP_TEST(exceptions_cleanup_loop_iterations);
+	RUN_EXC_CLEANUP_TEST(exceptions_cleanup_dead_code_elim);
+	RUN_EXC_CLEANUP_TEST(exceptions_cleanup_frame_dce);
+}
diff --git a/tools/testing/selftests/bpf/progs/exceptions_cleanup.c b/tools/testing/selftests/bpf/progs/exceptions_cleanup.c
new file mode 100644
index 000000000000..ccf14fe6bd1b
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/exceptions_cleanup.c
@@ -0,0 +1,468 @@
+// SPDX-License-Identifier: GPL-2.0
+#include <vmlinux.h>
+#include <bpf/bpf_tracing.h>
+#include <bpf/bpf_helpers.h>
+#include <bpf/bpf_core_read.h>
+#include <bpf/bpf_endian.h>
+#include "bpf_misc.h"
+#include "bpf_kfuncs.h"
+#include "bpf_experimental.h"
+
+struct {
+    __uint(type, BPF_MAP_TYPE_RINGBUF);
+    __uint(max_entries, 8);
+} ringbuf SEC(".maps");
+
+enum {
+    RES_DYNPTR,
+    RES_ITER,
+    RES_REG,
+    RES_SPILL,
+    __RES_MAX,
+};
+
+struct bpf_resource {
+    int type;
+};
+
+struct {
+    __uint(type, BPF_MAP_TYPE_HASH);
+    __uint(max_entries, 1024);
+    __type(key, int);
+    __type(value, struct bpf_resource);
+} hashmap SEC(".maps");
+
+const volatile bool always_false = false;
+bool only_count = false;
+int res_count = 0;
+
+#define MARK_RESOURCE(ptr, type) ({ res_count++; bpf_map_update_elem(&hashmap, &(void *){ptr}, &(struct bpf_resource){type}, 0); });
+#define FIND_RESOURCE(ptr) ((struct bpf_resource *)bpf_map_lookup_elem(&hashmap, &(void *){ptr}) ?: &(struct bpf_resource){__RES_MAX})
+#define FREE_RESOURCE(ptr) bpf_map_delete_elem(&hashmap, &(void *){ptr})
+#define VAL 0xeB9F
+
+SEC("fentry/bpf_cleanup_resource")
+int BPF_PROG(exception_cleanup_mark_free, struct bpf_frame_desc_reg_entry *fd, void *ptr)
+{
+    if (fd->spill_type == STACK_INVALID)
+        bpf_probe_read_kernel(&ptr, sizeof(ptr), ptr);
+    if (only_count) {
+        res_count--;
+        return 0;
+    }
+    switch (fd->spill_type) {
+    case STACK_SPILL:
+        if (FIND_RESOURCE(ptr)->type == RES_SPILL)
+            FREE_RESOURCE(ptr);
+        break;
+    case STACK_INVALID:
+        if (FIND_RESOURCE(ptr)->type == RES_REG)
+            FREE_RESOURCE(ptr);
+        break;
+    case STACK_ITER:
+        if (FIND_RESOURCE(ptr)->type == RES_ITER)
+            FREE_RESOURCE(ptr);
+        break;
+    case STACK_DYNPTR:
+        if (FIND_RESOURCE(ptr)->type == RES_DYNPTR)
+            FREE_RESOURCE(ptr);
+        break;
+    }
+    return 0;
+}
+
+static long map_cb(struct bpf_map *map, void *key, void *value, void *ctx)
+{
+    int *cnt = ctx;
+
+    (*cnt)++;
+    return 0;
+}
+
+SEC("tc")
+int exceptions_cleanup_check(struct __sk_buff *ctx)
+{
+    int cnt = 0;
+
+    if (only_count)
+        return res_count;
+    bpf_for_each_map_elem(&hashmap, map_cb, &cnt, 0);
+    return cnt;
+}
+
+SEC("tc")
+int exceptions_cleanup_prog_num_iter(struct __sk_buff *ctx)
+{
+    int i;
+
+    bpf_for(i, 0, 10) {
+        MARK_RESOURCE(&___it, RES_ITER);
+        bpf_throw(VAL);
+    }
+    return 0;
+}
+
+SEC("tc")
+int exceptions_cleanup_prog_num_iter_mult(struct __sk_buff *ctx)
+{
+    int i, j, k;
+
+    bpf_for(i, 0, 10) {
+        MARK_RESOURCE(&___it, RES_ITER);
+        bpf_for(j, 0, 10) {
+            MARK_RESOURCE(&___it, RES_ITER);
+            bpf_for(k, 0, 10) {
+                MARK_RESOURCE(&___it, RES_ITER);
+                bpf_throw(VAL);
+            }
+        }
+    }
+    return 0;
+}
+
+__noinline
+static int exceptions_cleanup_subprog(struct __sk_buff *ctx)
+{
+    int i;
+
+    bpf_for(i, 0, 10) {
+        MARK_RESOURCE(&___it, RES_ITER);
+        bpf_throw(VAL);
+    }
+    return ctx->len;
+}
+
+SEC("tc")
+int exceptions_cleanup_prog_dynptr_iter(struct __sk_buff *ctx)
+{
+    struct bpf_dynptr rbuf;
+    int ret = 0;
+
+    bpf_ringbuf_reserve_dynptr(&ringbuf, 8, 0, &rbuf);
+    MARK_RESOURCE(&rbuf, RES_DYNPTR);
+    if (ctx->protocol)
+        ret = exceptions_cleanup_subprog(ctx);
+    bpf_ringbuf_discard_dynptr(&rbuf, 0);
+    return ret;
+}
+
+SEC("tc")
+int exceptions_cleanup_obj(struct __sk_buff *ctx)
+{
+    struct { int i; } *p;
+
+    p = bpf_obj_new(typeof(*p));
+    MARK_RESOURCE(&p, RES_SPILL);
+    bpf_throw(VAL);
+    return p->i;
+}
+
+SEC("tc")
+int exceptions_cleanup_percpu_obj(struct __sk_buff *ctx)
+{
+    struct { int i; } *p;
+
+    p = bpf_percpu_obj_new(typeof(*p));
+    MARK_RESOURCE(&p, RES_SPILL);
+    bpf_throw(VAL);
+    return !p;
+}
+
+SEC("tc")
+int exceptions_cleanup_ringbuf(struct __sk_buff *ctx)
+{
+    void *p;
+
+    p = bpf_ringbuf_reserve(&ringbuf, 8, 0);
+    MARK_RESOURCE(&p, RES_SPILL);
+    bpf_throw(VAL);
+    return 0;
+}
+
+SEC("tc")
+int exceptions_cleanup_reg(struct __sk_buff *ctx)
+{
+    void *p;
+
+    p = bpf_ringbuf_reserve(&ringbuf, 8, 0);
+    MARK_RESOURCE(p, RES_REG);
+    bpf_throw(VAL);
+    if (p)
+        bpf_ringbuf_discard(p, 0);
+    return 0;
+}
+
+SEC("tc")
+int exceptions_cleanup_null_or_ptr_do_ptr(struct __sk_buff *ctx)
+{
+    union {
+        void *p;
+        char buf[8];
+    } volatile p;
+    u64 z = 0;
+
+    __builtin_memcpy((void *)&p.p, &z, sizeof(z));
+    MARK_RESOURCE((void *)&p.p, RES_SPILL);
+    if (ctx->len)
+        p.p = bpf_ringbuf_reserve(&ringbuf, 8, 0);
+    bpf_throw(VAL);
+    return 0;
+}
+
+SEC("tc")
+int exceptions_cleanup_null_or_ptr_do_null(struct __sk_buff *ctx)
+{
+    union {
+        void *p;
+        char buf[8];
+    } volatile p;
+
+    p.p = 0;
+    MARK_RESOURCE((void *)p.buf, RES_SPILL);
+    if (!ctx->len)
+        p.p = bpf_ringbuf_reserve(&ringbuf, 8, 0);
+    bpf_throw(VAL);
+    return 0;
+}
+
+__noinline static int mark_resource_subprog(u64 a, u64 b, u64 c, u64 d)
+{
+    MARK_RESOURCE((void *)a, RES_REG);
+    MARK_RESOURCE((void *)b, RES_REG);
+    MARK_RESOURCE((void *)c, RES_REG);
+    MARK_RESOURCE((void *)d, RES_REG);
+    return 0;
+}
+
+SEC("tc")
+int exceptions_cleanup_callee_saved(struct __sk_buff *ctx)
+{
+    asm volatile (
+       "r1 = %[ringbuf] ll;             \
+        r2 = 8;                         \
+        r3 = 0;                         \
+        call %[bpf_ringbuf_reserve];    \
+        r6 = r0;                        \
+        r1 = %[ringbuf] ll;             \
+        r2 = 8;                         \
+        r3 = 0;                         \
+        call %[bpf_ringbuf_reserve];    \
+        r7 = r0;                        \
+        r1 = %[ringbuf] ll;             \
+        r2 = 8;                         \
+        r3 = 0;                         \
+        call %[bpf_ringbuf_reserve];    \
+        r8 = r0;                        \
+        r1 = %[ringbuf] ll;             \
+        r2 = 8;                         \
+        r3 = 0;                         \
+        call %[bpf_ringbuf_reserve];    \
+        r9 = r0;                        \
+        r1 = r6;                        \
+        r2 = r7;                        \
+        r3 = r8;                        \
+        r4 = r9;                        \
+        call mark_resource_subprog;     \
+        r1 = 0xeB9F;                    \
+        call bpf_throw;                 \
+    " : : __imm(bpf_ringbuf_reserve),
+          __imm_addr(ringbuf)
+      : __clobber_all);
+    mark_resource_subprog(0, 0, 0, 0);
+    return 0;
+}
+
+SEC("tc")
+int exceptions_cleanup_callee_saved_noopt(struct __sk_buff *ctx)
+{
+    mark_resource_subprog(1, 2, 3, 4);
+    return 0;
+}
+
+__noinline int global_subprog_throw(struct __sk_buff *ctx)
+{
+    u64 *p = bpf_ringbuf_reserve(&ringbuf, 8, 0);
+    bpf_throw(VAL);
+    return p ? *p : 0 + ctx->len;
+}
+
+__noinline int global_subprog(struct __sk_buff *ctx)
+{
+    u64 *p = bpf_ringbuf_reserve(&ringbuf, 8, 0);
+    if (!p)
+        return ctx->len;
+    global_subprog_throw(ctx);
+    bpf_ringbuf_discard(p, 0);
+    return !!p + ctx->len;
+}
+
+__noinline static int static_subprog(struct __sk_buff *ctx)
+{
+    struct bpf_dynptr rbuf;
+    u64 *p, r = 0;
+
+    bpf_ringbuf_reserve_dynptr(&ringbuf, 8, 0, &rbuf);
+    p = bpf_dynptr_data(&rbuf, 0, 8);
+    if (!p)
+        goto end;
+    *p = global_subprog(ctx);
+    r += *p;
+end:
+    bpf_ringbuf_discard_dynptr(&rbuf, 0);
+    return r + ctx->len;
+}
+
+SEC("tc")
+int exceptions_cleanup_frame(struct __sk_buff *ctx)
+{
+    struct foo { int i; } *p = bpf_obj_new(typeof(*p));
+    int i;
+    only_count = 1;
+    res_count = 4;
+    if (!p)
+        return 1;
+    p->i = static_subprog(ctx);
+    i = p->i;
+    bpf_obj_drop(p);
+    return i + ctx->len;
+}
+
+SEC("tc")
+__success
+int exceptions_cleanup_loop_iterations(struct __sk_buff *ctx)
+{
+    struct { int i; } *f[50] = {};
+    int i;
+
+    only_count = true;
+
+    for (i = 0; i < 50; i++) {
+        f[i] = bpf_obj_new(typeof(*f[0]));
+        if (!f[i])
+            goto end;
+        res_count++;
+        if (i == 49) {
+            bpf_throw(VAL);
+        }
+    }
+end:
+    for (i = 0; i < 50; i++) {
+        if (!f[i])
+            continue;
+        bpf_obj_drop(f[i]);
+    }
+    return 0;
+}
+
+SEC("tc")
+int exceptions_cleanup_dead_code_elim(struct __sk_buff *ctx)
+{
+    void *p;
+
+    p = bpf_ringbuf_reserve(&ringbuf, 8, 0);
+    if (!p)
+        return 0;
+    asm volatile (
+        "r0 = r0;        \
+         r0 = r0;        \
+         r0 = r0;        \
+         r0 = r0;        \
+         r0 = r0;        \
+         r0 = r0;        \
+         r0 = r0;        \
+    " ::: "r0");
+    bpf_throw(VAL);
+    bpf_ringbuf_discard(p, 0);
+    return 0;
+}
+
+__noinline int global_subprog_throw_dce(struct __sk_buff *ctx)
+{
+    u64 *p = bpf_ringbuf_reserve(&ringbuf, 8, 0);
+    bpf_throw(VAL);
+    return p ? *p : 0 + ctx->len;
+}
+
+__noinline int global_subprog_dce(struct __sk_buff *ctx)
+{
+    u64 *p = bpf_ringbuf_reserve(&ringbuf, 8, 0);
+    if (!p)
+        return ctx->len;
+    asm volatile (
+        "r0 = r0;        \
+         r0 = r0;        \
+         r0 = r0;        \
+         r0 = r0;        \
+         r0 = r0;        \
+         r0 = r0;        \
+         r0 = r0;        \
+    " ::: "r0");
+    global_subprog_throw_dce(ctx);
+    bpf_ringbuf_discard(p, 0);
+    return !!p + ctx->len;
+}
+
+__noinline static int static_subprog_dce(struct __sk_buff *ctx)
+{
+    struct bpf_dynptr rbuf;
+    u64 *p, r = 0;
+
+    bpf_ringbuf_reserve_dynptr(&ringbuf, 8, 0, &rbuf);
+    p = bpf_dynptr_data(&rbuf, 0, 8);
+    if (!p)
+        goto end;
+    asm volatile (
+        "r0 = r0;        \
+         r0 = r0;        \
+         r0 = r0;        \
+         r0 = r0;        \
+         r0 = r0;        \
+         r0 = r0;        \
+         r0 = r0;        \
+    " ::: "r0");
+    *p = global_subprog_dce(ctx);
+    r += *p;
+end:
+    bpf_ringbuf_discard_dynptr(&rbuf, 0);
+    return r + ctx->len;
+}
+
+SEC("tc")
+int exceptions_cleanup_frame_dce(struct __sk_buff *ctx)
+{
+    struct foo { int i; } *p = bpf_obj_new(typeof(*p));
+    int i;
+    only_count = 1;
+    res_count = 4;
+    if (!p)
+        return 1;
+    p->i = static_subprog_dce(ctx);
+    i = p->i;
+    bpf_obj_drop(p);
+    return i + ctx->len;
+}
+
+SEC("tc")
+int reject_slot_with_zero_vs_ptr_ok(struct __sk_buff *ctx)
+{
+    asm volatile (
+       "r7 = *(u32 *)(r1 + 0);          \
+        r0 = 0;                         \
+        *(u64 *)(r10 - 8) = r0;         \
+        r1 = %[ringbuf] ll;             \
+        r2 = 8;                         \
+        r3 = 0;                         \
+        if r7 != 0 goto jump4;          \
+        call %[bpf_ringbuf_reserve];    \
+        *(u64 *)(r10 - 8) = r0;         \
+    jump4:                              \
+        r0 = 0;                         \
+        r1 = 0;                         \
+        call bpf_throw;                 \
+    " : : __imm(bpf_ringbuf_reserve),
+          __imm_addr(ringbuf)
+      : "memory");
+    return 0;
+}
+
+char _license[] SEC("license") = "GPL";
diff --git a/tools/testing/selftests/bpf/progs/exceptions_cleanup_fail.c b/tools/testing/selftests/bpf/progs/exceptions_cleanup_fail.c
new file mode 100644
index 000000000000..b3c70f92b35f
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/exceptions_cleanup_fail.c
@@ -0,0 +1,154 @@
+// SPDX-License-Identifier: GPL-2.0
+#include <vmlinux.h>
+#include <bpf/bpf_tracing.h>
+#include <bpf/bpf_helpers.h>
+#include <bpf/bpf_core_read.h>
+
+#include "bpf_misc.h"
+#include "bpf_experimental.h"
+
+struct {
+    __uint(type, BPF_MAP_TYPE_RINGBUF);
+    __uint(max_entries, 8);
+} ringbuf SEC(".maps");
+
+SEC("?tc")
+__failure __msg("Unreleased reference")
+int reject_with_reference(void *ctx)
+{
+	struct { int i; } *f;
+
+	f = bpf_obj_new(typeof(*f));
+	if (!f)
+		return 0;
+	bpf_throw(0);
+	return 0;
+}
+
+SEC("?tc")
+__failure __msg("frame_desc: merge: failed to merge old and new frame desc entry")
+int reject_slot_with_distinct_ptr(struct __sk_buff *ctx)
+{
+    void *p;
+
+    if (ctx->len) {
+        p = bpf_ringbuf_reserve(&ringbuf, 8, 0);
+    } else {
+        p = bpf_obj_new(typeof(struct { int i; }));
+    }
+    bpf_throw(0);
+    return !p;
+}
+
+SEC("?tc")
+__failure __msg("frame_desc: merge: failed to merge old and new frame desc entry")
+int reject_slot_with_distinct_ptr_old(struct __sk_buff *ctx)
+{
+    void *p;
+
+    if (ctx->len) {
+        p = bpf_obj_new(typeof(struct { int i; }));
+    } else {
+        p = bpf_ringbuf_reserve(&ringbuf, 8, 0);
+    }
+    bpf_throw(0);
+    return !p;
+}
+
+SEC("?tc")
+__failure __msg("frame_desc: merge: failed to merge old and new frame desc entry")
+int reject_slot_with_misc_vs_ptr(struct __sk_buff *ctx)
+{
+    void *p = (void *)bpf_ktime_get_ns();
+
+    if (ctx->protocol)
+        p = bpf_ringbuf_reserve(&ringbuf, 8, 0);
+    bpf_throw(0);
+    return !p;
+}
+
+SEC("?tc")
+__failure __msg("Unreleased reference")
+int reject_slot_with_misc_vs_ptr_old(struct __sk_buff *ctx)
+{
+    void *p = bpf_ringbuf_reserve(&ringbuf, 8, 0);
+
+    if (ctx->protocol)
+        p = (void *)bpf_ktime_get_ns();
+    bpf_throw(0);
+    return !p;
+}
+
+SEC("?tc")
+__failure __msg("frame_desc: merge: failed to merge old and new frame desc entry")
+int reject_slot_with_invalid_vs_ptr(struct __sk_buff *ctx)
+{
+    asm volatile (
+       "r7 = r1;                        \
+        r1 = %[ringbuf] ll;             \
+        r2 = 8;                         \
+        r3 = 0;                         \
+        r4 = *(u32 *)(r7 + 0);          \
+        r6 = *(u64 *)(r10 - 8);         \
+        if r4 == 0 goto jump;           \
+        call %[bpf_ringbuf_reserve];    \
+        r6 = r0;                        \
+    jump:                               \
+        r0 = 0;                         \
+        r1 = 0;                         \
+        call bpf_throw;                 \
+    " : : __imm(bpf_ringbuf_reserve),
+          __imm_addr(ringbuf)
+      : "memory");
+    return 0;
+}
+
+SEC("?tc")
+__failure __msg("Unreleased reference")
+int reject_slot_with_invalid_vs_ptr_old(struct __sk_buff *ctx)
+{
+    asm volatile (
+       "r7 = r1;                        \
+        r1 = %[ringbuf] ll;             \
+        r2 = 8;                         \
+        r3 = 0;                         \
+        call %[bpf_ringbuf_reserve];    \
+        r6 = r0;                        \
+        r4 = *(u32 *)(r7 + 0);          \
+        if r4 == 0 goto jump2;          \
+        r6 = *(u64 *)(r10 - 8);         \
+    jump2:                              \
+        r0 = 0;                         \
+        r1 = 0;                         \
+        call bpf_throw;                 \
+    " : : __imm(bpf_ringbuf_reserve),
+          __imm_addr(ringbuf)
+      : "memory");
+    return 0;
+}
+
+SEC("?tc")
+__failure __msg("Unreleased reference")
+int reject_slot_with_zero_vs_ptr(struct __sk_buff *ctx)
+{
+    asm volatile (
+       "r7 = *(u32 *)(r1 + 0);          \
+        r1 = %[ringbuf] ll;             \
+        r2 = 8;                         \
+        r3 = 0;                         \
+        call %[bpf_ringbuf_reserve];    \
+        *(u64 *)(r10 - 8) = r0;         \
+        r0 = 0;                         \
+        if r7 != 0 goto jump3;          \
+        *(u64 *)(r10 - 8) = r0;         \
+    jump3:                              \
+        r0 = 0;                         \
+        r1 = 0;                         \
+        call bpf_throw;                 \
+    " : : __imm(bpf_ringbuf_reserve),
+          __imm_addr(ringbuf)
+      : "memory");
+    return 0;
+}
+
+char _license[] SEC("license") = "GPL";
diff --git a/tools/testing/selftests/bpf/progs/exceptions_fail.c b/tools/testing/selftests/bpf/progs/exceptions_fail.c
index dfd164a7a261..1e73200c6276 100644
--- a/tools/testing/selftests/bpf/progs/exceptions_fail.c
+++ b/tools/testing/selftests/bpf/progs/exceptions_fail.c
@@ -182,19 +182,6 @@ int reject_with_rbtree_add_throw(void *ctx)
 	return 0;
 }
 
-SEC("?tc")
-__failure __msg("Unreleased reference")
-int reject_with_reference(void *ctx)
-{
-	struct foo *f;
-
-	f = bpf_obj_new(typeof(*f));
-	if (!f)
-		return 0;
-	bpf_throw(0);
-	return 0;
-}
-
 __noinline static int subprog_ref(struct __sk_buff *ctx)
 {
 	struct foo *f;
-- 
2.40.1


^ permalink raw reply related	[flat|nested] 53+ messages in thread

* Re: [RFC PATCH v1 10/14] bpf, x86: Implement runtime resource cleanup for exceptions
  2024-02-01  4:21 ` [RFC PATCH v1 10/14] bpf, x86: Implement runtime resource cleanup for exceptions Kumar Kartikeya Dwivedi
@ 2024-02-03  9:02   ` kernel test robot
  2024-02-16 12:02   ` Eduard Zingerman
  1 sibling, 0 replies; 53+ messages in thread
From: kernel test robot @ 2024-02-03  9:02 UTC (permalink / raw)
  To: Kumar Kartikeya Dwivedi; +Cc: oe-kbuild-all

Hi Kumar,

[This is a private test report for your RFC patch.]
kernel test robot noticed the following build warnings:

[auto build test WARNING on 77326a4a06e1e97432322f403cb439880871d34d]

url:    https://github.com/intel-lab-lkp/linux/commits/Kumar-Kartikeya-Dwivedi/bpf-Mark-subprogs-as-throw-reachable-before-do_check-pass/20240201-162356
base:   77326a4a06e1e97432322f403cb439880871d34d
patch link:    https://lore.kernel.org/r/20240201042109.1150490-11-memxor%40gmail.com
patch subject: [RFC PATCH v1 10/14] bpf, x86: Implement runtime resource cleanup for exceptions
config: m68k-allyesconfig (https://download.01.org/0day-ci/archive/20240203/202402031647.EPHvf6YH-lkp@intel.com/config)
compiler: m68k-linux-gcc (GCC) 13.2.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20240203/202402031647.EPHvf6YH-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202402031647.EPHvf6YH-lkp@intel.com/

All warnings (new ones prefixed by >>):

   In file included from include/linux/btf.h:10,
                    from include/linux/bpf.h:28,
                    from include/linux/bpf_verifier.h:7,
                    from net/core/filter.c:21:
   include/linux/btf_ids.h:231: warning: "BTF_KFUNCS_START" redefined
     231 | #define BTF_KFUNCS_START(name) static struct btf_id_set8 __maybe_unused name = { .flags = BTF_SET8_KFUNCS };
         | 
   include/linux/btf_ids.h:230: note: this is the location of the previous definition
     230 | #define BTF_KFUNCS_START(name) static struct btf_id_set8 __maybe_unused name = { 0 };
         | 
   net/core/filter.c: In function 'bpf_sk_release_dtor':
>> net/core/filter.c:6917:24: warning: cast from pointer to integer of different size [-Wpointer-to-int-cast]
    6917 |         bpf_sk_release((u64)ptr, 0, 0, 0, 0);
         |                        ^
--
   In file included from include/linux/btf.h:10,
                    from include/linux/bpf.h:28,
                    from kernel/bpf/helpers.c:4:
   include/linux/btf_ids.h:231: warning: "BTF_KFUNCS_START" redefined
     231 | #define BTF_KFUNCS_START(name) static struct btf_id_set8 __maybe_unused name = { .flags = BTF_SET8_KFUNCS };
         | 
   include/linux/btf_ids.h:230: note: this is the location of the previous definition
     230 | #define BTF_KFUNCS_START(name) static struct btf_id_set8 __maybe_unused name = { 0 };
         | 
   kernel/bpf/helpers.c: In function 'bpf_cleanup_resource_reg':
   kernel/bpf/helpers.c:2515:45: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast]
    2515 |                         bpf_sk_release_dtor((void *)reg_value);
         |                                             ^
   kernel/bpf/helpers.c:2527:50: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast]
    2527 |                         bpf_percpu_obj_drop_impl((void *)reg_value, meta);
         |                                                  ^
   kernel/bpf/helpers.c:2529:43: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast]
    2529 |                         bpf_obj_drop_impl((void *)reg_value, meta);
         |                                           ^
   kernel/bpf/helpers.c:2535:53: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast]
    2535 |                                 bpf_sk_release_dtor((void *)reg_value);
         |                                                     ^
   kernel/bpf/helpers.c:2539:68: error: 'BPF_DTOR_KPTR' undeclared (first use in this function); did you mean 'BPF_KPTR'?
    2539 |                 dtor_id = btf_find_dtor_kfunc(fd->btf, fd->btf_id, BPF_DTOR_KPTR | BPF_DTOR_CLEANUP);
         |                                                                    ^~~~~~~~~~~~~
         |                                                                    BPF_KPTR
   kernel/bpf/helpers.c:2539:68: note: each undeclared identifier is reported only once for each function it appears in
   kernel/bpf/helpers.c:2539:84: error: 'BPF_DTOR_CLEANUP' undeclared (first use in this function)
    2539 |                 dtor_id = btf_find_dtor_kfunc(fd->btf, fd->btf_id, BPF_DTOR_KPTR | BPF_DTOR_CLEANUP);
         |                                                                                    ^~~~~~~~~~~~~~~~
   kernel/bpf/helpers.c:2539:27: error: too many arguments to function 'btf_find_dtor_kfunc'
    2539 |                 dtor_id = btf_find_dtor_kfunc(fd->btf, fd->btf_id, BPF_DTOR_KPTR | BPF_DTOR_CLEANUP);
         |                           ^~~~~~~~~~~~~~~~~~~
   include/linux/btf.h:524:5: note: declared here
     524 | s32 btf_find_dtor_kfunc(struct btf *btf, u32 btf_id);
         |     ^~~~~~~~~~~~~~~~~~~
   kernel/bpf/helpers.c:2547:30: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast]
    2547 |                         dtor((void *)reg_value);
         |                              ^
   kernel/bpf/helpers.c: In function 'bpf_cleanup_resource_dynptr':
>> kernel/bpf/helpers.c:2564:63: warning: cast from pointer to integer of different size [-Wpointer-to-int-cast]
    2564 |                         bpf_ringbuf_discard_dynptr_proto.func((u64)ptr, 0, 0, 0, 0);
         |                                                               ^
   kernel/bpf/helpers.c: In function 'bpf_cleanup_resource_iter':
   kernel/bpf/helpers.c:2578:60: error: 'BPF_DTOR_CLEANUP' undeclared (first use in this function)
    2578 |         dtor_id = btf_find_dtor_kfunc(fd->btf, fd->btf_id, BPF_DTOR_CLEANUP);
         |                                                            ^~~~~~~~~~~~~~~~
   kernel/bpf/helpers.c:2578:19: error: too many arguments to function 'btf_find_dtor_kfunc'
    2578 |         dtor_id = btf_find_dtor_kfunc(fd->btf, fd->btf_id, BPF_DTOR_CLEANUP);
         |                   ^~~~~~~~~~~~~~~~~~~
   include/linux/btf.h:524:5: note: declared here
     524 | s32 btf_find_dtor_kfunc(struct btf *btf, u32 btf_id);
         |     ^~~~~~~~~~~~~~~~~~~


vim +6917 net/core/filter.c

  6914	
  6915	void bpf_sk_release_dtor(void *ptr)
  6916	{
> 6917		bpf_sk_release((u64)ptr, 0, 0, 0, 0);
  6918	}
  6919	
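
A common fix for such size-mismatch warnings on 32-bit targets is to
cast through unsigned long first, e.g. (untested sketch):

	bpf_sk_release((u64)(unsigned long)ptr, 0, 0, 0, 0);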

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [RFC PATCH v1 01/14] bpf: Mark subprogs as throw reachable before do_check pass
  2024-02-01  4:20 ` [RFC PATCH v1 01/14] bpf: Mark subprogs as throw reachable before do_check pass Kumar Kartikeya Dwivedi
@ 2024-02-12 19:35   ` David Vernet
  2024-02-12 22:28     ` Kumar Kartikeya Dwivedi
  2024-02-15  1:01   ` Eduard Zingerman
  1 sibling, 1 reply; 53+ messages in thread
From: David Vernet @ 2024-02-12 19:35 UTC (permalink / raw)
  To: Kumar Kartikeya Dwivedi
  Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Martin KaFai Lau, Tejun Heo, Raj Sahu, Dan Williams,
	Rishabh Iyer, Sanidhya Kashyap

[-- Attachment #1: Type: text/plain, Size: 8424 bytes --]

On Thu, Feb 01, 2024 at 04:20:56AM +0000, Kumar Kartikeya Dwivedi wrote:
> The motivation of this patch is to figure out which subprogs participate
> in exception propagation. In other words, whichever subprog's execution
> can lead to an exception being thrown either directly or indirectly (by
> way of calling other subprogs).
> 
> With the current exceptions support, the runtime performs stack
> unwinding when bpf_throw is called. For now, any resources acquired by
> the program cannot be released, therefore bpf_throw calls made with
> non-zero acquired references must be rejected during verification.
> 
> However, there currently exists a loophole in this restriction due to
> the way the verification procedure is structured. The verifier will
> first walk over the main subprog's instructions, but not descend into
> subprog calls to ones with global linkage. These global subprogs will
> then be independently verified instead. Therefore, in a situation where
> a global subprog ends up throwing an exception (either directly by
> calling bpf_throw, or indirectly by way of calling another subprog that
> does so), the verifier will fail to notice this fact and may permit
> throwing BPF exceptions with non-zero acquired references.
> 
> Therefore, to fix this, we add a summarization pass before the do_check
> stage which walks all call chains of the program and marks all of the
> subprogs that are reachable from a bpf_throw call which unwinds the
> program stack.
> 
> We only do so if we actually see a bpf_throw call in the program though,
> since we do not want to walk all instructions unless we need to.  One we

s/One/Once

> analyze all possible call chains of the program, we will be able to mark
> them as 'is_throw_reachable' in their subprog_info.
> 
> After performing this step, we need to make another change as to how
> subprog call verification occurs. In case of global subprog, we will
> need to explore an alternate program path where the call instruction
> processing of a global subprog's call will immediately throw an
> exception. We will thus simulate a normal path without any exceptions,
> and one where the exception is thrown and the program proceeds no
> further. In this way, the verifier will be able to detect the whether
> any acquired references or locks exist in the verifier state and thus
> reject the program if needed.
> 
> Fixes: f18b03fabaa9 ("bpf: Implement BPF exceptions")
> Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>

Just had a few nits and one question. Looks reasonable to me overall.

> ---
>  include/linux/bpf_verifier.h |  2 +
>  kernel/bpf/verifier.c        | 86 ++++++++++++++++++++++++++++++++++++
>  2 files changed, 88 insertions(+)
> 
> diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h
> index 0dcde339dc7e..1d666b6c21e6 100644
> --- a/include/linux/bpf_verifier.h
> +++ b/include/linux/bpf_verifier.h
> @@ -626,6 +626,7 @@ struct bpf_subprog_info {
>  	bool is_async_cb: 1;
>  	bool is_exception_cb: 1;
>  	bool args_cached: 1;
> +	bool is_throw_reachable: 1;
>  
>  	u8 arg_cnt;
>  	struct bpf_subprog_arg_info args[MAX_BPF_FUNC_REG_ARGS];
> @@ -691,6 +692,7 @@ struct bpf_verifier_env {
>  	bool bypass_spec_v4;
>  	bool seen_direct_write;
>  	bool seen_exception;
> +	bool seen_throw_insn;
>  	struct bpf_insn_aux_data *insn_aux_data; /* array of per-insn state */
>  	const struct bpf_line_info *prev_linfo;
>  	struct bpf_verifier_log log;
> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> index cd4d780e5400..bba53c4e3a0c 100644
> --- a/kernel/bpf/verifier.c
> +++ b/kernel/bpf/verifier.c
> @@ -2941,6 +2941,8 @@ static int check_subprogs(struct bpf_verifier_env *env)
>  		    insn[i].src_reg == 0 &&
>  		    insn[i].imm == BPF_FUNC_tail_call)
>  			subprog[cur_subprog].has_tail_call = true;
> +		if (!env->seen_throw_insn && is_bpf_throw_kfunc(&insn[i]))
> +			env->seen_throw_insn = true;
>  		if (BPF_CLASS(code) == BPF_LD &&
>  		    (BPF_MODE(code) == BPF_ABS || BPF_MODE(code) == BPF_IND))
>  			subprog[cur_subprog].has_ld_abs = true;
> @@ -5866,6 +5868,9 @@ static int check_max_stack_depth_subprog(struct bpf_verifier_env *env, int idx)
>  
>  			if (!is_bpf_throw_kfunc(insn + i))
>  				continue;
> +			/* When this is allowed, don't forget to update logic for sync and
> +			 * async callbacks in mark_exception_reachable_subprogs.
> +			 */
>  			if (subprog[idx].is_cb)
>  				err = true;
>  			for (int c = 0; c < frame && !err; c++) {
> @@ -16205,6 +16210,83 @@ static int check_btf_info(struct bpf_verifier_env *env,
>  	return 0;
>  }
>  
> +/* We walk the call graph of the program in this function, and mark everything in
> + * the call chain as 'is_throw_reachable'. This allows us to know which subprog
> + * calls may propagate an exception and generate exception frame descriptors for
> + * those call instructions. We already do that for bpf_throw calls made directly,
> + * but we need to mark the subprogs as we won't be able to see the call chains
> + * during symbolic execution in do_check_common due to global subprogs.
> + *
> + * Note that unlike check_max_stack_depth, we don't explore the async callbacks
> + * apart from main subprogs, as we don't support throwing from them for now, but

Comment ending prematurely

> + */
> +static int mark_exception_reachable_subprogs(struct bpf_verifier_env *env)
> +{
> +	struct bpf_subprog_info *subprog = env->subprog_info;
> +	struct bpf_insn *insn = env->prog->insnsi;
> +	int idx = 0, frame = 0, i, subprog_end;
> +	int ret_insn[MAX_CALL_FRAMES];
> +	int ret_prog[MAX_CALL_FRAMES];
> +
> +	/* No need if we never saw any bpf_throw() call in the program. */
> +	if (!env->seen_throw_insn)
> +		return 0;
> +
> +	i = subprog[idx].start;
> +restart:
> +	subprog_end = subprog[idx + 1].start;
> +	for (; i < subprog_end; i++) {
> +		int next_insn, sidx;
> +
> +		if (bpf_pseudo_kfunc_call(insn + i) && !insn[i].off) {

When should a kfunc call ever have a nonzero offset? We use the
immediate for the BTF ID, don't we?

> +			if (!is_bpf_throw_kfunc(insn + i))
> +				continue;
> +			subprog[idx].is_throw_reachable = true;
> +			for (int j = 0; j < frame; j++)
> +				subprog[ret_prog[j]].is_throw_reachable = true;
> +		}
> +
> +		if (!bpf_pseudo_call(insn + i) && !bpf_pseudo_func(insn + i))
> +			continue;
> +		/* remember insn and function to return to */
> +		ret_insn[frame] = i + 1;
> +		ret_prog[frame] = idx;
> +
> +		/* find the callee */
> +		next_insn = i + insn[i].imm + 1;
> +		sidx = find_subprog(env, next_insn);
> +		if (sidx < 0) {
> +			WARN_ONCE(1, "verifier bug. No program starts at insn %d\n", next_insn);
> +			return -EFAULT;
> +		}
> +		/* We cannot distinguish between sync or async cb, so we need to follow
> +		 * both.  Async callbacks don't really propagate exceptions but calling
> +		 * bpf_throw from them is not allowed anyway, so there is no harm in
> +		 * exploring them.
> +		 * TODO: To address this properly, we will have to move is_cb,
> +		 * is_async_cb markings to the stage before do_check.
> +		 */
> +		i = next_insn;
> +		idx = sidx;
> +
> +		frame++;
> +		if (frame >= MAX_CALL_FRAMES) {
> +			verbose(env, "the call stack of %d frames is too deep !\n", frame);
> +			return -E2BIG;
> +		}
> +		goto restart;
> +	}
> +	/* end of for() loop means the last insn of the 'subprog'
> +	 * was reached. Doesn't matter whether it was JA or EXIT
> +	 */
> +	if (frame == 0)
> +		return 0;
> +	frame--;
> +	i = ret_insn[frame];
> +	idx = ret_prog[frame];
> +	goto restart;
> +}

If you squint your eyes, there's a non-trivial amount of duplicated
intent / logic here compared to check_max_stack_depth_subprog(). Do you
think it would be possible to combine them somehow?
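
Perhaps both could be layered on top of a single walk of the call graph
parameterized by a visitor callback. A standalone toy sketch of that
shape (recursive for brevity, whereas the verifier versions would stay
iterative; all names below are invented, nothing here is kernel code):

	#include <stdio.h>

	#define MAX_FRAMES	8
	#define MAX_SUBPROGS	4

	/* edges[i][j] != 0 means subprog i calls subprog j */
	static int edges[MAX_SUBPROGS][MAX_SUBPROGS] = {
		{ 0, 1, 0, 0 },	/* main calls subprog 1 */
		{ 0, 0, 1, 0 },	/* subprog 1 calls subprog 2 */
	};

	static int throws[MAX_SUBPROGS] = { 0, 0, 1, 0 };
	static int reachable[MAX_SUBPROGS];

	/* visit() sees the current subprog and its chain of callers */
	typedef void (*visit_fn)(int idx, const int *chain, int depth);

	static void mark_throw(int idx, const int *chain, int depth)
	{
		if (!throws[idx])
			return;
		/* mark the throwing subprog and every caller on the chain */
		reachable[idx] = 1;
		for (int i = 0; i < depth; i++)
			reachable[chain[i]] = 1;
	}

	static void walk(int idx, int *chain, int depth, visit_fn visit)
	{
		if (depth >= MAX_FRAMES)
			return;
		visit(idx, chain, depth);
		chain[depth] = idx;
		for (int j = 0; j < MAX_SUBPROGS; j++)
			if (edges[idx][j])
				walk(j, chain, depth + 1, visit);
	}

	int main(void)
	{
		int chain[MAX_FRAMES];

		walk(0, chain, 0, mark_throw);
		for (int i = 0; i < MAX_SUBPROGS; i++)
			printf("subprog %d throw reachable: %d\n",
			       i, reachable[i]);
		return 0;
	}

A stack-depth check could then be another visit_fn accumulating frame
sizes over the same walk.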

> +
>  /* check %cur's range satisfies %old's */
>  static bool range_within(struct bpf_reg_state *old,
>  			 struct bpf_reg_state *cur)
> @@ -20939,6 +21021,10 @@ int bpf_check(struct bpf_prog **prog, union bpf_attr *attr, bpfptr_t uattr, __u3
>  	if (ret < 0)
>  		goto skip_full_check;
>  
> +	ret = mark_exception_reachable_subprogs(env);
> +	if (ret < 0)
> +		goto skip_full_check;
> +
>  	ret = do_check_main(env);
>  	ret = ret ?: do_check_subprogs(env);
>  
> -- 
> 2.40.1
> 

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 228 bytes --]

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [RFC PATCH v1 14/14] selftests/bpf: Add tests for exceptions runtime cleanup
  2024-02-01  4:21 ` [RFC PATCH v1 14/14] selftests/bpf: Add tests for exceptions runtime cleanup Kumar Kartikeya Dwivedi
@ 2024-02-12 20:53   ` David Vernet
  2024-02-12 22:43     ` Kumar Kartikeya Dwivedi
  0 siblings, 1 reply; 53+ messages in thread
From: David Vernet @ 2024-02-12 20:53 UTC (permalink / raw)
  To: Kumar Kartikeya Dwivedi
  Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Martin KaFai Lau, Tejun Heo, Raj Sahu, Dan Williams,
	Rishabh Iyer, Sanidhya Kashyap

[-- Attachment #1: Type: text/plain, Size: 25667 bytes --]

On Thu, Feb 01, 2024 at 04:21:09AM +0000, Kumar Kartikeya Dwivedi wrote:
> Add tests for the runtime cleanup support for exceptions, ensuring that
> resources are correctly identified and released when an exception is
> thrown. Also, we add negative tests to exercise corner cases the
> verifier should reject.
> 
> Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
> ---
>  tools/testing/selftests/bpf/DENYLIST.aarch64  |   1 +
>  tools/testing/selftests/bpf/DENYLIST.s390x    |   1 +
>  .../bpf/prog_tests/exceptions_cleanup.c       |  65 +++
>  .../selftests/bpf/progs/exceptions_cleanup.c  | 468 ++++++++++++++++++
>  .../bpf/progs/exceptions_cleanup_fail.c       | 154 ++++++
>  .../selftests/bpf/progs/exceptions_fail.c     |  13 -
>  6 files changed, 689 insertions(+), 13 deletions(-)
>  create mode 100644 tools/testing/selftests/bpf/prog_tests/exceptions_cleanup.c
>  create mode 100644 tools/testing/selftests/bpf/progs/exceptions_cleanup.c
>  create mode 100644 tools/testing/selftests/bpf/progs/exceptions_cleanup_fail.c
> 
> diff --git a/tools/testing/selftests/bpf/DENYLIST.aarch64 b/tools/testing/selftests/bpf/DENYLIST.aarch64
> index 5c2cc7e8c5d0..6fc79727cd14 100644
> --- a/tools/testing/selftests/bpf/DENYLIST.aarch64
> +++ b/tools/testing/selftests/bpf/DENYLIST.aarch64
> @@ -1,6 +1,7 @@
>  bpf_cookie/multi_kprobe_attach_api               # kprobe_multi_link_api_subtest:FAIL:fentry_raw_skel_load unexpected error: -3
>  bpf_cookie/multi_kprobe_link_api                 # kprobe_multi_link_api_subtest:FAIL:fentry_raw_skel_load unexpected error: -3
>  exceptions					 # JIT does not support calling kfunc bpf_throw: -524
> +exceptions_unwind				 # JIT does not support calling kfunc bpf_throw: -524
>  fexit_sleep                                      # The test never returns. The remaining tests cannot start.
>  kprobe_multi_bench_attach                        # needs CONFIG_FPROBE
>  kprobe_multi_test                                # needs CONFIG_FPROBE
> diff --git a/tools/testing/selftests/bpf/DENYLIST.s390x b/tools/testing/selftests/bpf/DENYLIST.s390x
> index 1a63996c0304..f09a73dee72c 100644
> --- a/tools/testing/selftests/bpf/DENYLIST.s390x
> +++ b/tools/testing/selftests/bpf/DENYLIST.s390x
> @@ -1,5 +1,6 @@
>  # TEMPORARY
>  # Alphabetical order
>  exceptions				 # JIT does not support calling kfunc bpf_throw				       (exceptions)
> +exceptions_unwind			 # JIT does not support calling kfunc bpf_throw				       (exceptions)
>  get_stack_raw_tp                         # user_stack corrupted user stack                                             (no backchain userspace)
>  stacktrace_build_id                      # compare_map_keys stackid_hmap vs. stackmap err -2 errno 2                   (?)
> diff --git a/tools/testing/selftests/bpf/prog_tests/exceptions_cleanup.c b/tools/testing/selftests/bpf/prog_tests/exceptions_cleanup.c
> new file mode 100644
> index 000000000000..78df037b60ea
> --- /dev/null
> +++ b/tools/testing/selftests/bpf/prog_tests/exceptions_cleanup.c
> @@ -0,0 +1,65 @@
> +#include "bpf/bpf.h"
> +#include "exceptions.skel.h"
> +#include <test_progs.h>
> +#include <network_helpers.h>
> +
> +#include "exceptions_cleanup.skel.h"
> +#include "exceptions_cleanup_fail.skel.h"
> +
> +static void test_exceptions_cleanup_fail(void)
> +{
> +	RUN_TESTS(exceptions_cleanup_fail);
> +}
> +
> +void test_exceptions_cleanup(void)
> +{
> +	LIBBPF_OPTS(bpf_test_run_opts, ropts,
> +		.data_in = &pkt_v4,
> +		.data_size_in = sizeof(pkt_v4),
> +		.repeat = 1,
> +	);
> +	struct exceptions_cleanup *skel;
> +	int ret;
> +
> +	if (test__start_subtest("exceptions_cleanup_fail"))
> +		test_exceptions_cleanup_fail();

RUN_TESTS takes care of doing test__start_subtest(), etc. You should be
able to just call RUN_TESTS(exceptions_cleanup_fail) directly here.
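
i.e. drop the wrapper and the subtest check and do just (untested):

	RUN_TESTS(exceptions_cleanup_fail);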

> +
> +	skel = exceptions_cleanup__open_and_load();
> +	if (!ASSERT_OK_PTR(skel, "exceptions_cleanup__open_and_load"))
> +		return;
> +
> +	ret = exceptions_cleanup__attach(skel);
> +	if (!ASSERT_OK(ret, "exceptions_cleanup__attach"))
> +		return;
> +
> +#define RUN_EXC_CLEANUP_TEST(name)                                      \

Should we add a call to if (test__start_subtest(#name)) to this macro?
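
Something like this, perhaps (untested sketch; the follow-up run of
exceptions_cleanup_check would move inside the same block):

	#define RUN_EXC_CLEANUP_TEST(name)                                      \
		if (test__start_subtest(#name)) {                               \
			ret = bpf_prog_test_run_opts(                           \
				bpf_program__fd(skel->progs.name), &ropts);     \
			if (!ASSERT_OK(ret, #name ": return value"))            \
				return;                                         \
			if (!ASSERT_EQ(ropts.retval, 0xeB9F, #name ": retval")) \
				return;                                         \
			skel->bss->only_count = 0;                              \
		}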

> +	ret = bpf_prog_test_run_opts(bpf_program__fd(skel->progs.name), \
> +				     &ropts);                           \
> +	if (!ASSERT_OK(ret, #name ": return value"))                    \
> +		return;                                                 \
> +	if (!ASSERT_EQ(ropts.retval, 0xeB9F, #name ": opts.retval"))    \
> +		return;                                                 \
> +	ret = bpf_prog_test_run_opts(                                   \
> +		bpf_program__fd(skel->progs.exceptions_cleanup_check),  \
> +		&ropts);                                                \
> +	if (!ASSERT_OK(ret, #name " CHECK: return value"))              \
> +		return;                                                 \
> +	if (!ASSERT_EQ(ropts.retval, 0, #name " CHECK: opts.retval"))   \
> +		return;                                                 \
> +	skel->bss->only_count = 0;
> +
> +	RUN_EXC_CLEANUP_TEST(exceptions_cleanup_prog_num_iter);
> +	RUN_EXC_CLEANUP_TEST(exceptions_cleanup_prog_num_iter_mult);
> +	RUN_EXC_CLEANUP_TEST(exceptions_cleanup_prog_dynptr_iter);
> +	RUN_EXC_CLEANUP_TEST(exceptions_cleanup_obj);
> +	RUN_EXC_CLEANUP_TEST(exceptions_cleanup_percpu_obj);
> +	RUN_EXC_CLEANUP_TEST(exceptions_cleanup_ringbuf);
> +	RUN_EXC_CLEANUP_TEST(exceptions_cleanup_reg);
> +	RUN_EXC_CLEANUP_TEST(exceptions_cleanup_null_or_ptr_do_ptr);
> +	RUN_EXC_CLEANUP_TEST(exceptions_cleanup_null_or_ptr_do_null);
> +	RUN_EXC_CLEANUP_TEST(exceptions_cleanup_callee_saved);
> +	RUN_EXC_CLEANUP_TEST(exceptions_cleanup_frame);
> +	RUN_EXC_CLEANUP_TEST(exceptions_cleanup_loop_iterations);
> +	RUN_EXC_CLEANUP_TEST(exceptions_cleanup_dead_code_elim);
> +	RUN_EXC_CLEANUP_TEST(exceptions_cleanup_frame_dce);
> +}
> diff --git a/tools/testing/selftests/bpf/progs/exceptions_cleanup.c b/tools/testing/selftests/bpf/progs/exceptions_cleanup.c
> new file mode 100644
> index 000000000000..ccf14fe6bd1b
> --- /dev/null
> +++ b/tools/testing/selftests/bpf/progs/exceptions_cleanup.c
> @@ -0,0 +1,468 @@
> +// SPDX-License-Identifier: GPL-2.0
> +#include <vmlinux.h>
> +#include <bpf/bpf_tracing.h>
> +#include <bpf/bpf_helpers.h>
> +#include <bpf/bpf_core_read.h>
> +#include <bpf/bpf_endian.h>
> +#include "bpf_misc.h"
> +#include "bpf_kfuncs.h"
> +#include "bpf_experimental.h"
> +
> +struct {
> +    __uint(type, BPF_MAP_TYPE_RINGBUF);
> +    __uint(max_entries, 8);
> +} ringbuf SEC(".maps");
> +
> +enum {
> +    RES_DYNPTR,
> +    RES_ITER,
> +    RES_REG,
> +    RES_SPILL,
> +    __RES_MAX,
> +};
> +
> +struct bpf_resource {
> +    int type;
> +};
> +
> +struct {
> +    __uint(type, BPF_MAP_TYPE_HASH);
> +    __uint(max_entries, 1024);
> +    __type(key, int);
> +    __type(value, struct bpf_resource);
> +} hashmap SEC(".maps");
> +
> +const volatile bool always_false = false;
> +bool only_count = false;
> +int res_count = 0;
> +
> +#define MARK_RESOURCE(ptr, type) ({ res_count++; bpf_map_update_elem(&hashmap, &(void *){ptr}, &(struct bpf_resource){type}, 0); });
> +#define FIND_RESOURCE(ptr) ((struct bpf_resource *)bpf_map_lookup_elem(&hashmap, &(void *){ptr}) ?: &(struct bpf_resource){__RES_MAX})
> +#define FREE_RESOURCE(ptr) bpf_map_delete_elem(&hashmap, &(void *){ptr})
> +#define VAL 0xeB9F
> +
> +SEC("fentry/bpf_cleanup_resource")
> +int BPF_PROG(exception_cleanup_mark_free, struct bpf_frame_desc_reg_entry *fd, void *ptr)
> +{
> +    if (fd->spill_type == STACK_INVALID)
> +        bpf_probe_read_kernel(&ptr, sizeof(ptr), ptr);
> +    if (only_count) {
> +        res_count--;
> +        return 0;
> +    }
> +    switch (fd->spill_type) {
> +    case STACK_SPILL:
> +        if (FIND_RESOURCE(ptr)->type == RES_SPILL)
> +            FREE_RESOURCE(ptr);
> +        break;
> +    case STACK_INVALID:
> +        if (FIND_RESOURCE(ptr)->type == RES_REG)
> +            FREE_RESOURCE(ptr);
> +        break;
> +    case STACK_ITER:
> +        if (FIND_RESOURCE(ptr)->type == RES_ITER)
> +            FREE_RESOURCE(ptr);
> +        break;
> +    case STACK_DYNPTR:
> +        if (FIND_RESOURCE(ptr)->type == RES_DYNPTR)
> +            FREE_RESOURCE(ptr);
> +        break;
> +    }
> +    return 0;
> +}
> +
> +static long map_cb(struct bpf_map *map, void *key, void *value, void *ctx)
> +{
> +    int *cnt = ctx;
> +
> +    (*cnt)++;
> +    return 0;
> +}
> +
> +SEC("tc")
> +int exceptions_cleanup_check(struct __sk_buff *ctx)
> +{
> +    int cnt = 0;
> +
> +    if (only_count)
> +        return res_count;
> +    bpf_for_each_map_elem(&hashmap, map_cb, &cnt, 0);
> +    return cnt;
> +}
> +
> +SEC("tc")
> +int exceptions_cleanup_prog_num_iter(struct __sk_buff *ctx)
> +{
> +    int i;
> +
> +    bpf_for(i, 0, 10) {
> +        MARK_RESOURCE(&___it, RES_ITER);
> +        bpf_throw(VAL);
> +    }
> +    return 0;
> +}
> +
> +SEC("tc")
> +int exceptions_cleanup_prog_num_iter_mult(struct __sk_buff *ctx)
> +{
> +    int i, j, k;
> +
> +    bpf_for(i, 0, 10) {
> +        MARK_RESOURCE(&___it, RES_ITER);
> +        bpf_for(j, 0, 10) {
> +            MARK_RESOURCE(&___it, RES_ITER);
> +            bpf_for(k, 0, 10) {
> +                MARK_RESOURCE(&___it, RES_ITER);
> +                bpf_throw(VAL);
> +            }
> +        }
> +    }
> +    return 0;
> +}
> +
> +__noinline
> +static int exceptions_cleanup_subprog(struct __sk_buff *ctx)
> +{
> +    int i;
> +
> +    bpf_for(i, 0, 10) {
> +        MARK_RESOURCE(&___it, RES_ITER);
> +        bpf_throw(VAL);
> +    }
> +    return ctx->len;
> +}
> +
> +SEC("tc")
> +int exceptions_cleanup_prog_dynptr_iter(struct __sk_buff *ctx)
> +{
> +    struct bpf_dynptr rbuf;
> +    int ret = 0;
> +
> +    bpf_ringbuf_reserve_dynptr(&ringbuf, 8, 0, &rbuf);
> +    MARK_RESOURCE(&rbuf, RES_DYNPTR);
> +    if (ctx->protocol)
> +        ret = exceptions_cleanup_subprog(ctx);
> +    bpf_ringbuf_discard_dynptr(&rbuf, 0);
> +    return ret;
> +}
> +
> +SEC("tc")
> +int exceptions_cleanup_obj(struct __sk_buff *ctx)
> +{
> +    struct { int i; } *p;
> +
> +    p = bpf_obj_new(typeof(*p));
> +    MARK_RESOURCE(&p, RES_SPILL);
> +    bpf_throw(VAL);
> +    return p->i;
> +}
> +
> +SEC("tc")
> +int exceptions_cleanup_percpu_obj(struct __sk_buff *ctx)
> +{
> +    struct { int i; } *p;
> +
> +    p = bpf_percpu_obj_new(typeof(*p));
> +    MARK_RESOURCE(&p, RES_SPILL);
> +    bpf_throw(VAL);

It would be neat if we could have the bpf_throw() kfunc signature be
marked as __attribute__((noreturn)) and have things work correctly;
meaning you wouldn't have to even return a value here. The verifier
should know that bpf_throw() is terminal, so it should be able to prune
any subsequent instructions as unreachable anyways.
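
i.e. something along these lines wherever bpf_throw() is declared
(untested; assuming the declaration lives in bpf_experimental.h):

	extern void bpf_throw(u64 cookie) __attribute__((noreturn)) __ksym;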

> +    return !p;
> +}
> +
> +SEC("tc")
> +int exceptions_cleanup_ringbuf(struct __sk_buff *ctx)
> +{
> +    void *p;
> +
> +    p = bpf_ringbuf_reserve(&ringbuf, 8, 0);
> +    MARK_RESOURCE(&p, RES_SPILL);
> +    bpf_throw(VAL);
> +    return 0;
> +}
> +
> +SEC("tc")
> +int exceptions_cleanup_reg(struct __sk_buff *ctx)
> +{
> +    void *p;
> +
> +    p = bpf_ringbuf_reserve(&ringbuf, 8, 0);
> +    MARK_RESOURCE(p, RES_REG);
> +    bpf_throw(VAL);
> +    if (p)
> +        bpf_ringbuf_discard(p, 0);

Does the prog fail to load if you don't have this bpf_ringbuf_discard()
check? I assume not given that in
exceptions_cleanup_null_or_ptr_do_ptr() and elsewhere we do a reserve
without discarding. Is there some subtle stack state difference here or
something?

> +    return 0;
> +}
> +
> +SEC("tc")
> +int exceptions_cleanup_null_or_ptr_do_ptr(struct __sk_buff *ctx)
> +{
> +    union {
> +        void *p;
> +        char buf[8];
> +    } volatile p;
> +    u64 z = 0;
> +
> +    __builtin_memcpy((void *)&p.p, &z, sizeof(z));
> +    MARK_RESOURCE((void *)&p.p, RES_SPILL);
> +    if (ctx->len)
> +        p.p = bpf_ringbuf_reserve(&ringbuf, 8, 0);
> +    bpf_throw(VAL);
> +    return 0;
> +}
> +
> +SEC("tc")
> +int exceptions_cleanup_null_or_ptr_do_null(struct __sk_buff *ctx)
> +{
> +    union {
> +        void *p;
> +        char buf[8];
> +    } volatile p;
> +
> +    p.p = 0;
> +    MARK_RESOURCE((void *)p.buf, RES_SPILL);
> +    if (!ctx->len)
> +        p.p = bpf_ringbuf_reserve(&ringbuf, 8, 0);
> +    bpf_throw(VAL);
> +    return 0;
> +}
> +
> +__noinline static int mark_resource_subprog(u64 a, u64 b, u64 c, u64 d)
> +{
> +    MARK_RESOURCE((void *)a, RES_REG);
> +    MARK_RESOURCE((void *)b, RES_REG);
> +    MARK_RESOURCE((void *)c, RES_REG);
> +    MARK_RESOURCE((void *)d, RES_REG);
> +    return 0;
> +}
> +
> +SEC("tc")
> +int exceptions_cleanup_callee_saved(struct __sk_buff *ctx)
> +{
> +    asm volatile (
> +       "r1 = %[ringbuf] ll;             \
> +        r2 = 8;                         \
> +        r3 = 0;                         \
> +        call %[bpf_ringbuf_reserve];    \
> +        r6 = r0;                        \
> +        r1 = %[ringbuf] ll;             \
> +        r2 = 8;                         \
> +        r3 = 0;                         \
> +        call %[bpf_ringbuf_reserve];    \
> +        r7 = r0;                        \
> +        r1 = %[ringbuf] ll;             \
> +        r2 = 8;                         \
> +        r3 = 0;                         \
> +        call %[bpf_ringbuf_reserve];    \
> +        r8 = r0;                        \
> +        r1 = %[ringbuf] ll;             \
> +        r2 = 8;                         \
> +        r3 = 0;                         \
> +        call %[bpf_ringbuf_reserve];    \
> +        r9 = r0;                        \
> +        r1 = r6;                        \
> +        r2 = r7;                        \
> +        r3 = r8;                        \
> +        r4 = r9;                        \
> +        call mark_resource_subprog;     \
> +        r1 = 0xeB9F;                    \
> +        call bpf_throw;                 \
> +    " : : __imm(bpf_ringbuf_reserve),
> +          __imm_addr(ringbuf)
> +      : __clobber_all);
> +    mark_resource_subprog(0, 0, 0, 0);
> +    return 0;
> +}
> +
> +SEC("tc")
> +int exceptions_cleanup_callee_saved_noopt(struct __sk_buff *ctx)
> +{
> +    mark_resource_subprog(1, 2, 3, 4);
> +    return 0;
> +}
> +
> +__noinline int global_subprog_throw(struct __sk_buff *ctx)
> +{
> +    u64 *p = bpf_ringbuf_reserve(&ringbuf, 8, 0);
> +    bpf_throw(VAL);
> +    return p ? *p : 0 + ctx->len;
> +}
> +
> +__noinline int global_subprog(struct __sk_buff *ctx)
> +{
> +    u64 *p = bpf_ringbuf_reserve(&ringbuf, 8, 0);
> +    if (!p)
> +        return ctx->len;
> +    global_subprog_throw(ctx);
> +    bpf_ringbuf_discard(p, 0);
> +    return !!p + ctx->len;
> +}
> +
> +__noinline static int static_subprog(struct __sk_buff *ctx)
> +{
> +    struct bpf_dynptr rbuf;
> +    u64 *p, r = 0;
> +
> +    bpf_ringbuf_reserve_dynptr(&ringbuf, 8, 0, &rbuf);
> +    p = bpf_dynptr_data(&rbuf, 0, 8);
> +    if (!p)
> +        goto end;
> +    *p = global_subprog(ctx);
> +    r += *p;
> +end:
> +    bpf_ringbuf_discard_dynptr(&rbuf, 0);
> +    return r + ctx->len;
> +}
> +
> +SEC("tc")
> +int exceptions_cleanup_frame(struct __sk_buff *ctx)
> +{
> +    struct foo { int i; } *p = bpf_obj_new(typeof(*p));
> +    int i;
> +    only_count = 1;
> +    res_count = 4;
> +    if (!p)
> +        return 1;
> +    p->i = static_subprog(ctx);
> +    i = p->i;
> +    bpf_obj_drop(p);
> +    return i + ctx->len;
> +}
> +
> +SEC("tc")
> +__success
> +int exceptions_cleanup_loop_iterations(struct __sk_buff *ctx)
> +{
> +    struct { int i; } *f[50] = {};
> +    int i;
> +
> +    only_count = true;
> +
> +    for (i = 0; i < 50; i++) {
> +        f[i] = bpf_obj_new(typeof(*f[0]));
> +        if (!f[i])
> +            goto end;
> +        res_count++;
> +        if (i == 49) {
> +            bpf_throw(VAL);
> +        }
> +    }
> +end:
> +    for (i = 0; i < 50; i++) {
> +        if (!f[i])
> +            continue;
> +        bpf_obj_drop(f[i]);
> +    }
> +    return 0;
> +}
> +
> +SEC("tc")
> +int exceptions_cleanup_dead_code_elim(struct __sk_buff *ctx)
> +{
> +    void *p;
> +
> +    p = bpf_ringbuf_reserve(&ringbuf, 8, 0);
> +    if (!p)
> +        return 0;
> +    asm volatile (
> +        "r0 = r0;        \
> +         r0 = r0;        \
> +         r0 = r0;        \
> +         r0 = r0;        \
> +         r0 = r0;        \
> +         r0 = r0;        \
> +         r0 = r0;        \
> +    " ::: "r0");
> +    bpf_throw(VAL);
> +    bpf_ringbuf_discard(p, 0);
> +    return 0;
> +}
> +
> +__noinline int global_subprog_throw_dce(struct __sk_buff *ctx)
> +{
> +    u64 *p = bpf_ringbuf_reserve(&ringbuf, 8, 0);
> +    bpf_throw(VAL);
> +    return p ? *p : 0 + ctx->len;
> +}
> +
> +__noinline int global_subprog_dce(struct __sk_buff *ctx)
> +{
> +    u64 *p = bpf_ringbuf_reserve(&ringbuf, 8, 0);
> +    if (!p)
> +        return ctx->len;
> +    asm volatile (
> +        "r0 = r0;        \
> +         r0 = r0;        \
> +         r0 = r0;        \
> +         r0 = r0;        \
> +         r0 = r0;        \
> +         r0 = r0;        \
> +         r0 = r0;        \
> +    " ::: "r0");
> +    global_subprog_throw_dce(ctx);
> +    bpf_ringbuf_discard(p, 0);
> +    return !!p + ctx->len;
> +}
> +
> +__noinline static int static_subprog_dce(struct __sk_buff *ctx)
> +{
> +    struct bpf_dynptr rbuf;
> +    u64 *p, r = 0;
> +
> +    bpf_ringbuf_reserve_dynptr(&ringbuf, 8, 0, &rbuf);
> +    p = bpf_dynptr_data(&rbuf, 0, 8);
> +    if (!p)
> +        goto end;
> +    asm volatile (
> +        "r0 = r0;        \
> +         r0 = r0;        \
> +         r0 = r0;        \
> +         r0 = r0;        \
> +         r0 = r0;        \
> +         r0 = r0;        \
> +         r0 = r0;        \
> +    " ::: "r0");
> +    *p = global_subprog_dce(ctx);
> +    r += *p;
> +end:
> +    bpf_ringbuf_discard_dynptr(&rbuf, 0);
> +    return r + ctx->len;
> +}
> +
> +SEC("tc")
> +int exceptions_cleanup_frame_dce(struct __sk_buff *ctx)
> +{
> +    struct foo { int i; } *p = bpf_obj_new(typeof(*p));
> +    int i;
> +    only_count = 1;
> +    res_count = 4;
> +    if (!p)
> +        return 1;
> +    p->i = static_subprog_dce(ctx);
> +    i = p->i;
> +    bpf_obj_drop(p);
> +    return i + ctx->len;
> +}
> +
> +SEC("tc")
> +int reject_slot_with_zero_vs_ptr_ok(struct __sk_buff *ctx)
> +{
> +    asm volatile (
> +       "r7 = *(u32 *)(r1 + 0);          \
> +        r0 = 0;                         \
> +        *(u64 *)(r10 - 8) = r0;         \
> +        r1 = %[ringbuf] ll;             \
> +        r2 = 8;                         \
> +        r3 = 0;                         \
> +        if r7 != 0 goto jump4;          \
> +        call %[bpf_ringbuf_reserve];    \
> +        *(u64 *)(r10 - 8) = r0;         \
> +    jump4:                              \
> +        r0 = 0;                         \
> +        r1 = 0;                         \
> +        call bpf_throw;                 \
> +    " : : __imm(bpf_ringbuf_reserve),
> +          __imm_addr(ringbuf)
> +      : "memory");
> +    return 0;
> +}
> +
> +char _license[] SEC("license") = "GPL";
> diff --git a/tools/testing/selftests/bpf/progs/exceptions_cleanup_fail.c b/tools/testing/selftests/bpf/progs/exceptions_cleanup_fail.c
> new file mode 100644
> index 000000000000..b3c70f92b35f
> --- /dev/null
> +++ b/tools/testing/selftests/bpf/progs/exceptions_cleanup_fail.c
> @@ -0,0 +1,154 @@
> +// SPDX-License-Identifier: GPL-2.0
> +#include <vmlinux.h>
> +#include <bpf/bpf_tracing.h>
> +#include <bpf/bpf_helpers.h>
> +#include <bpf/bpf_core_read.h>
> +
> +#include "bpf_misc.h"
> +#include "bpf_experimental.h"
> +
> +struct {
> +    __uint(type, BPF_MAP_TYPE_RINGBUF);
> +    __uint(max_entries, 8);
> +} ringbuf SEC(".maps");
> +
> +SEC("?tc")
> +__failure __msg("Unreleased reference")
> +int reject_with_reference(void *ctx)
> +{
> +	struct { int i; } *f;
> +
> +	f = bpf_obj_new(typeof(*f));
> +	if (!f)
> +		return 0;
> +	bpf_throw(0);
> +	return 0;
> +}
> +
> +SEC("?tc")
> +__failure __msg("frame_desc: merge: failed to merge old and new frame desc entry")
> +int reject_slot_with_distinct_ptr(struct __sk_buff *ctx)
> +{
> +    void *p;
> +
> +    if (ctx->len) {
> +        p = bpf_ringbuf_reserve(&ringbuf, 8, 0);
> +    } else {
> +        p = bpf_obj_new(typeof(struct { int i; }));
> +    }
> +    bpf_throw(0);
> +    return !p;
> +}
> +
> +SEC("?tc")
> +__failure __msg("frame_desc: merge: failed to merge old and new frame desc entry")
> +int reject_slot_with_distinct_ptr_old(struct __sk_buff *ctx)
> +{
> +    void *p;
> +
> +    if (ctx->len) {
> +        p = bpf_obj_new(typeof(struct { int i; }));
> +    } else {
> +        p = bpf_ringbuf_reserve(&ringbuf, 8, 0);
> +    }
> +    bpf_throw(0);
> +    return !p;
> +}
> +
> +SEC("?tc")
> +__failure __msg("frame_desc: merge: failed to merge old and new frame desc entry")
> +int reject_slot_with_misc_vs_ptr(struct __sk_buff *ctx)
> +{
> +    void *p = (void *)bpf_ktime_get_ns();
> +
> +    if (ctx->protocol)
> +        p = bpf_ringbuf_reserve(&ringbuf, 8, 0);
> +    bpf_throw(0);
> +    return !p;
> +}
> +
> +SEC("?tc")
> +__failure __msg("Unreleased reference")
> +int reject_slot_with_misc_vs_ptr_old(struct __sk_buff *ctx)
> +{
> +    void *p = bpf_ringbuf_reserve(&ringbuf, 8, 0);
> +
> +    if (ctx->protocol)
> +        p = (void *)bpf_ktime_get_ns();
> +    bpf_throw(0);
> +    return !p;
> +}
> +
> +SEC("?tc")
> +__failure __msg("frame_desc: merge: failed to merge old and new frame desc entry")
> +int reject_slot_with_invalid_vs_ptr(struct __sk_buff *ctx)
> +{
> +    asm volatile (
> +       "r7 = r1;                        \
> +        r1 = %[ringbuf] ll;             \
> +        r2 = 8;                         \
> +        r3 = 0;                         \
> +        r4 = *(u32 *)(r7 + 0);          \
> +        r6 = *(u64 *)(r10 - 8);         \
> +        if r4 == 0 goto jump;           \
> +        call %[bpf_ringbuf_reserve];    \
> +        r6 = r0;                        \
> +    jump:                               \
> +        r0 = 0;                         \
> +        r1 = 0;                         \
> +        call bpf_throw;                 \
> +    " : : __imm(bpf_ringbuf_reserve),
> +          __imm_addr(ringbuf)
> +      : "memory");
> +    return 0;
> +}
> +
> +SEC("?tc")
> +__failure __msg("Unreleased reference")
> +int reject_slot_with_invalid_vs_ptr_old(struct __sk_buff *ctx)
> +{
> +    asm volatile (
> +       "r7 = r1;                        \
> +        r1 = %[ringbuf] ll;             \
> +        r2 = 8;                         \
> +        r3 = 0;                         \
> +        call %[bpf_ringbuf_reserve];    \
> +        r6 = r0;                        \
> +        r4 = *(u32 *)(r7 + 0);          \
> +        if r4 == 0 goto jump2;          \
> +        r6 = *(u64 *)(r10 - 8);         \
> +    jump2:                              \
> +        r0 = 0;                         \
> +        r1 = 0;                         \
> +        call bpf_throw;                 \
> +    " : : __imm(bpf_ringbuf_reserve),
> +          __imm_addr(ringbuf)
> +      : "memory");
> +    return 0;
> +}
> +
> +SEC("?tc")
> +__failure __msg("Unreleased reference")
> +int reject_slot_with_zero_vs_ptr(struct __sk_buff *ctx)
> +{
> +    asm volatile (
> +       "r7 = *(u32 *)(r1 + 0);          \
> +        r1 = %[ringbuf] ll;             \
> +        r2 = 8;                         \
> +        r3 = 0;                         \
> +        call %[bpf_ringbuf_reserve];    \
> +        *(u64 *)(r10 - 8) = r0;         \
> +        r0 = 0;                         \
> +        if r7 != 0 goto jump3;          \
> +        *(u64 *)(r10 - 8) = r0;         \
> +    jump3:                              \
> +        r0 = 0;                         \
> +        r1 = 0;                         \
> +        call bpf_throw;                 \
> +    " : : __imm(bpf_ringbuf_reserve),
> +          __imm_addr(ringbuf)
> +      : "memory");
> +    return 0;
> +}
> +
> +char _license[] SEC("license") = "GPL";
> diff --git a/tools/testing/selftests/bpf/progs/exceptions_fail.c b/tools/testing/selftests/bpf/progs/exceptions_fail.c
> index dfd164a7a261..1e73200c6276 100644
> --- a/tools/testing/selftests/bpf/progs/exceptions_fail.c
> +++ b/tools/testing/selftests/bpf/progs/exceptions_fail.c
> @@ -182,19 +182,6 @@ int reject_with_rbtree_add_throw(void *ctx)
>  	return 0;
>  }
>  
> -SEC("?tc")
> -__failure __msg("Unreleased reference")
> -int reject_with_reference(void *ctx)
> -{
> -	struct foo *f;
> -
> -	f = bpf_obj_new(typeof(*f));
> -	if (!f)
> -		return 0;
> -	bpf_throw(0);

Hmm, so why is this a memory leak exactly? Apologies if this is already
explained clearly elsewhere in the stack.

> -	return 0;
> -}
> -
>  __noinline static int subprog_ref(struct __sk_buff *ctx)
>  {
>  	struct foo *f;
> -- 
> 2.40.1
> 


* Re: [RFC PATCH v1 01/14] bpf: Mark subprogs as throw reachable before do_check pass
  2024-02-12 19:35   ` David Vernet
@ 2024-02-12 22:28     ` Kumar Kartikeya Dwivedi
  0 siblings, 0 replies; 53+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2024-02-12 22:28 UTC (permalink / raw)
  To: David Vernet
  Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Martin KaFai Lau, Tejun Heo, Raj Sahu, Dan Williams,
	Rishabh Iyer, Sanidhya Kashyap

On Mon, 12 Feb 2024 at 20:35, David Vernet <void@manifault.com> wrote:
>
> On Thu, Feb 01, 2024 at 04:20:56AM +0000, Kumar Kartikeya Dwivedi wrote:
> > The motivation of this patch is to figure out which subprogs participate
> > in exception propagation. In other words, whichever subprog's execution
> > can lead to an exception being thrown either directly or indirectly (by
> > way of calling other subprogs).
> >
> > With the current exceptions support, the runtime performs stack
> > unwinding when bpf_throw is called. For now, any resources acquired by
> > the program cannot be released, therefore bpf_throw calls made with
> > non-zero acquired references must be rejected during verification.
> >
> > However, there currently exists a loophole in this restriction due to
> > the way the verification procedure is structured. The verifier will
> > first walk over the main subprog's instructions, but not descend into
> > subprog calls to ones with global linkage. These global subprogs will
> > then be independently verified instead. Therefore, in a situation where
> > a global subprog ends up throwing an exception (either directly by
> > calling bpf_throw, or indirectly by way of calling another subprog that
> > does so), the verifier will fail to notice this fact and may permit
> > throwing BPF exceptions with non-zero acquired references.
> >
> > Therefore, to fix this, we add a summarization pass before the do_check
> > stage which walks all call chains of the program and marks all of the
> > subprogs that are reachable from a bpf_throw call which unwinds the
> > program stack.
> >
> > We only do so if we actually see a bpf_throw call in the program though,
> > since we do not want to walk all instructions unless we need to.  One we
>
> s/One/Once
>

Ack, will fix.

> > analyze all possible call chains of the program, we will be able to mark
> > them as 'is_throw_reachable' in their subprog_info.
> >
> > After performing this step, we need to make another change as to how
> > subprog call verification occurs. In case of global subprog, we will
> > need to explore an alternate program path where the call instruction
> > processing of a global subprog's call will immediately throw an
> > exception. We will thus simulate a normal path without any exceptions,
> > and one where the exception is thrown and the program proceeds no
> > further. In this way, the verifier will be able to detect whether
> > any acquired references or locks exist in the verifier state and thus
> > reject the program if needed.
> >
> > Fixes: f18b03fabaa9 ("bpf: Implement BPF exceptions")
> > Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
>
> Just had a few nits and one question. Looks reasonable to me overall.
>
> > ---
> >  include/linux/bpf_verifier.h |  2 +
> >  kernel/bpf/verifier.c        | 86 ++++++++++++++++++++++++++++++++++++
> >  2 files changed, 88 insertions(+)
> >
> > diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h
> > index 0dcde339dc7e..1d666b6c21e6 100644
> > --- a/include/linux/bpf_verifier.h
> > +++ b/include/linux/bpf_verifier.h
> > @@ -626,6 +626,7 @@ struct bpf_subprog_info {
> >       bool is_async_cb: 1;
> >       bool is_exception_cb: 1;
> >       bool args_cached: 1;
> > +     bool is_throw_reachable: 1;
> >
> >       u8 arg_cnt;
> >       struct bpf_subprog_arg_info args[MAX_BPF_FUNC_REG_ARGS];
> > @@ -691,6 +692,7 @@ struct bpf_verifier_env {
> >       bool bypass_spec_v4;
> >       bool seen_direct_write;
> >       bool seen_exception;
> > +     bool seen_throw_insn;
> >       struct bpf_insn_aux_data *insn_aux_data; /* array of per-insn state */
> >       const struct bpf_line_info *prev_linfo;
> >       struct bpf_verifier_log log;
> > diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> > index cd4d780e5400..bba53c4e3a0c 100644
> > --- a/kernel/bpf/verifier.c
> > +++ b/kernel/bpf/verifier.c
> > @@ -2941,6 +2941,8 @@ static int check_subprogs(struct bpf_verifier_env *env)
> >                   insn[i].src_reg == 0 &&
> >                   insn[i].imm == BPF_FUNC_tail_call)
> >                       subprog[cur_subprog].has_tail_call = true;
> > +             if (!env->seen_throw_insn && is_bpf_throw_kfunc(&insn[i]))
> > +                     env->seen_throw_insn = true;
> >               if (BPF_CLASS(code) == BPF_LD &&
> >                   (BPF_MODE(code) == BPF_ABS || BPF_MODE(code) == BPF_IND))
> >                       subprog[cur_subprog].has_ld_abs = true;
> > @@ -5866,6 +5868,9 @@ static int check_max_stack_depth_subprog(struct bpf_verifier_env *env, int idx)
> >
> >                       if (!is_bpf_throw_kfunc(insn + i))
> >                               continue;
> > +                     /* When this is allowed, don't forget to update logic for sync and
> > +                      * async callbacks in mark_exception_reachable_subprogs.
> > +                      */
> >                       if (subprog[idx].is_cb)
> >                               err = true;
> >                       for (int c = 0; c < frame && !err; c++) {
> > @@ -16205,6 +16210,83 @@ static int check_btf_info(struct bpf_verifier_env *env,
> >       return 0;
> >  }
> >
> > +/* We walk the call graph of the program in this function, and mark everything in
> > + * the call chain as 'is_throw_reachable'. This allows us to know which subprog
> > + * calls may propagate an exception and generate exception frame descriptors for
> > + * those call instructions. We already do that for bpf_throw calls made directly,
> > + * but we need to mark the subprogs as we won't be able to see the call chains
> > + * during symbolic execution in do_check_common due to global subprogs.
> > + *
> > + * Note that unlike check_max_stack_depth, we don't explore the async callbacks
> > + * apart from main subprogs, as we don't support throwing from them for now, but
>
> Comment ending prematurely
>

Ack.

> > + */
> > +static int mark_exception_reachable_subprogs(struct bpf_verifier_env *env)
> > +{
> > +     struct bpf_subprog_info *subprog = env->subprog_info;
> > +     struct bpf_insn *insn = env->prog->insnsi;
> > +     int idx = 0, frame = 0, i, subprog_end;
> > +     int ret_insn[MAX_CALL_FRAMES];
> > +     int ret_prog[MAX_CALL_FRAMES];
> > +
> > +     /* No need if we never saw any bpf_throw() call in the program. */
> > +     if (!env->seen_throw_insn)
> > +             return 0;
> > +
> > +     i = subprog[idx].start;
> > +restart:
> > +     subprog_end = subprog[idx + 1].start;
> > +     for (; i < subprog_end; i++) {
> > +             int next_insn, sidx;
> > +
> > +             if (bpf_pseudo_kfunc_call(insn + i) && !insn[i].off) {
>
> When should a kfunc call ever have a nonzero offset? We use the
> immediate for the BTF ID, don't we?
>

So in kfuncs, insn.off is used to indicate a vmlinux vs module kfunc.
If off is non-zero, it points to the index in the bpf_attr::fd_array
of the module BTF from which this kfunc comes.
But I think it might be easier to just remove this extra test and do
is_bpf_throw_kfunc directly.
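
i.e., something like this (untested sketch), relying on the fact that
is_bpf_throw_kfunc() already matches on the exact kfunc:

		if (is_bpf_throw_kfunc(insn + i)) {
			subprog[idx].is_throw_reachable = true;
			for (int j = 0; j < frame; j++)
				subprog[ret_prog[j]].is_throw_reachable = true;
		}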

> > +                     if (!is_bpf_throw_kfunc(insn + i))
> > +                             continue;
> > +                     subprog[idx].is_throw_reachable = true;
> > +                     for (int j = 0; j < frame; j++)
> > +                             subprog[ret_prog[j]].is_throw_reachable = true;
> > +             }
> > +
> > +             if (!bpf_pseudo_call(insn + i) && !bpf_pseudo_func(insn + i))
> > +                     continue;
> > +             /* remember insn and function to return to */
> > +             ret_insn[frame] = i + 1;
> > +             ret_prog[frame] = idx;
> > +
> > +             /* find the callee */
> > +             next_insn = i + insn[i].imm + 1;
> > +             sidx = find_subprog(env, next_insn);
> > +             if (sidx < 0) {
> > +                     WARN_ONCE(1, "verifier bug. No program starts at insn %d\n", next_insn);
> > +                     return -EFAULT;
> > +             }
> > +             /* We cannot distinguish between sync or async cb, so we need to follow
> > +              * both.  Async callbacks don't really propagate exceptions but calling
> > +              * bpf_throw from them is not allowed anyway, so there is no harm in
> > +              * exploring them.
> > +              * TODO: To address this properly, we will have to move is_cb,
> > +              * is_async_cb markings to the stage before do_check.
> > +              */
> > +             i = next_insn;
> > +             idx = sidx;
> > +
> > +             frame++;
> > +             if (frame >= MAX_CALL_FRAMES) {
> > +                     verbose(env, "the call stack of %d frames is too deep !\n", frame);
> > +                     return -E2BIG;
> > +             }
> > +             goto restart;
> > +     }
> > +     /* end of for() loop means the last insn of the 'subprog'
> > +      * was reached. Doesn't matter whether it was JA or EXIT
> > +      */
> > +     if (frame == 0)
> > +             return 0;
> > +     frame--;
> > +     i = ret_insn[frame];
> > +     idx = ret_prog[frame];
> > +     goto restart;
> > +}
>
> If you squint your eyes, there's a non-trivial amount of duplicated
> intent / logic here compared to check_max_stack_depth_subprog(). Do you
> think it would be possible to combine them somehow?
>

I agree, this function is mostly a copy-paste of that function with
some modifications.
I will take a stab at unifying both in the next version, though I
think they will end up calling some common logic from different points
as we need to do this marking before verification, and stack depth
checking after verification. Also, stack depth checks have some other
exception_cb and async_cb related checks as well, so this would
probably be a good opportunity to refactor that code as well.

Basically, just have a common way to iterate over all instructions and
call chains starting from some instruction in a subprog.
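
Roughly, the shared interface could look like this (untested sketch,
names made up):

static int walk_call_chains(struct bpf_verifier_env *env, int start_subprog,
			    int (*visit_insn)(struct bpf_verifier_env *env,
					      int idx, int insn_idx,
					      int frame, int *ret_prog));

with the bpf_throw marking pass and the stack depth checks both becoming
visit_insn callbacks over the same ret_insn/ret_prog walking loop.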

> > [...]
> >


* Re: [RFC PATCH v1 14/14] selftests/bpf: Add tests for exceptions runtime cleanup
  2024-02-12 20:53   ` David Vernet
@ 2024-02-12 22:43     ` Kumar Kartikeya Dwivedi
  2024-02-13 19:33       ` David Vernet
  0 siblings, 1 reply; 53+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2024-02-12 22:43 UTC (permalink / raw)
  To: David Vernet
  Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Martin KaFai Lau, Tejun Heo, Raj Sahu, Dan Williams,
	Rishabh Iyer, Sanidhya Kashyap

On Mon, 12 Feb 2024 at 21:53, David Vernet <void@manifault.com> wrote:
>
> On Thu, Feb 01, 2024 at 04:21:09AM +0000, Kumar Kartikeya Dwivedi wrote:
> > Add tests for the runtime cleanup support for exceptions, ensuring that
> > resources are correctly identified and released when an exception is
> > thrown. Also, we add negative tests to exercise corner cases the
> > verifier should reject.
> >
> > Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
> > ---
> >  tools/testing/selftests/bpf/DENYLIST.aarch64  |   1 +
> >  tools/testing/selftests/bpf/DENYLIST.s390x    |   1 +
> >  .../bpf/prog_tests/exceptions_cleanup.c       |  65 +++
> >  .../selftests/bpf/progs/exceptions_cleanup.c  | 468 ++++++++++++++++++
> >  .../bpf/progs/exceptions_cleanup_fail.c       | 154 ++++++
> >  .../selftests/bpf/progs/exceptions_fail.c     |  13 -
> >  6 files changed, 689 insertions(+), 13 deletions(-)
> >  create mode 100644 tools/testing/selftests/bpf/prog_tests/exceptions_cleanup.c
> >  create mode 100644 tools/testing/selftests/bpf/progs/exceptions_cleanup.c
> >  create mode 100644 tools/testing/selftests/bpf/progs/exceptions_cleanup_fail.c
> >
> > diff --git a/tools/testing/selftests/bpf/DENYLIST.aarch64 b/tools/testing/selftests/bpf/DENYLIST.aarch64
> > index 5c2cc7e8c5d0..6fc79727cd14 100644
> > --- a/tools/testing/selftests/bpf/DENYLIST.aarch64
> > +++ b/tools/testing/selftests/bpf/DENYLIST.aarch64
> > @@ -1,6 +1,7 @@
> >  bpf_cookie/multi_kprobe_attach_api               # kprobe_multi_link_api_subtest:FAIL:fentry_raw_skel_load unexpected error: -3
> >  bpf_cookie/multi_kprobe_link_api                 # kprobe_multi_link_api_subtest:FAIL:fentry_raw_skel_load unexpected error: -3
> >  exceptions                                    # JIT does not support calling kfunc bpf_throw: -524
> > +exceptions_unwind                             # JIT does not support calling kfunc bpf_throw: -524
> >  fexit_sleep                                      # The test never returns. The remaining tests cannot start.
> >  kprobe_multi_bench_attach                        # needs CONFIG_FPROBE
> >  kprobe_multi_test                                # needs CONFIG_FPROBE
> > diff --git a/tools/testing/selftests/bpf/DENYLIST.s390x b/tools/testing/selftests/bpf/DENYLIST.s390x
> > index 1a63996c0304..f09a73dee72c 100644
> > --- a/tools/testing/selftests/bpf/DENYLIST.s390x
> > +++ b/tools/testing/selftests/bpf/DENYLIST.s390x
> > @@ -1,5 +1,6 @@
> >  # TEMPORARY
> >  # Alphabetical order
> >  exceptions                            # JIT does not support calling kfunc bpf_throw                                (exceptions)
> > +exceptions_unwind                     # JIT does not support calling kfunc bpf_throw                                (exceptions)
> >  get_stack_raw_tp                         # user_stack corrupted user stack                                             (no backchain userspace)
> >  stacktrace_build_id                      # compare_map_keys stackid_hmap vs. stackmap err -2 errno 2                   (?)
> > diff --git a/tools/testing/selftests/bpf/prog_tests/exceptions_cleanup.c b/tools/testing/selftests/bpf/prog_tests/exceptions_cleanup.c
> > new file mode 100644
> > index 000000000000..78df037b60ea
> > --- /dev/null
> > +++ b/tools/testing/selftests/bpf/prog_tests/exceptions_cleanup.c
> > @@ -0,0 +1,65 @@
> > +#include "bpf/bpf.h"
> > +#include "exceptions.skel.h"
> > +#include <test_progs.h>
> > +#include <network_helpers.h>
> > +
> > +#include "exceptions_cleanup.skel.h"
> > +#include "exceptions_cleanup_fail.skel.h"
> > +
> > +static void test_exceptions_cleanup_fail(void)
> > +{
> > +     RUN_TESTS(exceptions_cleanup_fail);
> > +}
> > +
> > +void test_exceptions_cleanup(void)
> > +{
> > +     LIBBPF_OPTS(bpf_test_run_opts, ropts,
> > +             .data_in = &pkt_v4,
> > +             .data_size_in = sizeof(pkt_v4),
> > +             .repeat = 1,
> > +     );
> > +     struct exceptions_cleanup *skel;
> > +     int ret;
> > +
> > +     if (test__start_subtest("exceptions_cleanup_fail"))
> > +             test_exceptions_cleanup_fail();
>
> RUN_TESTS takes care of doing test__start_subtest(), etc. You should be
> able to just call RUN_TESTS(exceptions_cleanup_fail) directly here.
>

Ack, will fix.

> > +
> > +     skel = exceptions_cleanup__open_and_load();
> > +     if (!ASSERT_OK_PTR(skel, "exceptions_cleanup__open_and_load"))
> > +             return;
> > +
> > +     ret = exceptions_cleanup__attach(skel);
> > +     if (!ASSERT_OK(ret, "exceptions_cleanup__attach"))
> > +             return;
> > +
> > +#define RUN_EXC_CLEANUP_TEST(name)                                      \
>
> Should we add a call to if (test__start_subtest(#name)) to this macro?
>

Makes sense, will change this.
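
Something along these lines (sketch; the macro body below is only
illustrative, since the full macro is elided in the quote above):

#define RUN_EXC_CLEANUP_TEST(name)                                      \
	do {                                                            \
		if (!test__start_subtest(#name))                        \
			break;                                          \
		ret = bpf_prog_test_run_opts(                           \
			bpf_program__fd(skel->progs.name), &ropts);     \
		if (!ASSERT_OK(ret, #name ": test_run"))                \
			break;                                          \
	} while (0)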

> > [...]
> > +
> > +SEC("tc")
> > +int exceptions_cleanup_percpu_obj(struct __sk_buff *ctx)
> > +{
> > +    struct { int i; } *p;
> > +
> > +    p = bpf_percpu_obj_new(typeof(*p));
> > +    MARK_RESOURCE(&p, RES_SPILL);
> > +    bpf_throw(VAL);
>
> It would be neat if we could have the bpf_throw() kfunc signature be
> marked as __attribute__((noreturn)) and have things work correctly;
> meaning you wouldn't have to even return a value here. The verifier
> should know that bpf_throw() is terminal, so it should be able to prune
> any subsequent instructions as unreachable anyways.
>

Originally, I was tagging the kfunc as noreturn, but Alexei advised
against it in
https://lore.kernel.org/bpf/CAADnVQJtUD6+gYinr+6ensj58qt2LeBj4dvT7Cyu-aBCafsP5g@mail.gmail.com
... so I have dropped it since.

Right now, the verifier will do dead code elimination of course, but
sometimes the compiler does generate code that is tricky or unexpected
(like putting the bpf_throw instruction as the final one instead of
exit or jmp if somehow it can prove that bpf_throw will be taken by
all paths) for the verifier if the bpf_throw is noreturn. Even though
this would have the same effect at runtime (if the analysis of the
compiler is not wrong), there were some places we would have to modify
so that the compiler does not get confused.

Overall I'm not opposed to this, but I think we need more consensus
before flipping the flag. Since this can be changed later and the
necessary changes can be made in the verifier (just a couple of places
which expect exit or jmp to final insns), I decided to move ahead
without noreturn.
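
For reference, the change itself would just be the declaration (sketch,
based on the current bpf_experimental.h prototype):

extern void bpf_throw(u64 cookie) __attribute__((noreturn)) __ksym;

after which callers would no longer need the dummy return after the call.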

> > +    return !p;
> > +}
> > +
> > +SEC("tc")
> > +int exceptions_cleanup_ringbuf(struct __sk_buff *ctx)
> > +{
> > +    void *p;
> > +
> > +    p = bpf_ringbuf_reserve(&ringbuf, 8, 0);
> > +    MARK_RESOURCE(&p, RES_SPILL);
> > +    bpf_throw(VAL);
> > +    return 0;
> > +}
> > +
> > +SEC("tc")
> > +int exceptions_cleanup_reg(struct __sk_buff *ctx)
> > +{
> > +    void *p;
> > +
> > +    p = bpf_ringbuf_reserve(&ringbuf, 8, 0);
> > +    MARK_RESOURCE(p, RES_REG);
> > +    bpf_throw(VAL);
> > +    if (p)
> > +        bpf_ringbuf_discard(p, 0);
>
> Does the prog fail to load if you don't have this bpf_ringbuf_discard()
> check? I assume not given that in
> exceptions_cleanup_null_or_ptr_do_ptr() and elsewhere we do a reserve
> without discarding. Is there some subtle stack state difference here or
> something?
>

So I will add comments explaining this, since I realized this confused
you in a couple of places, but basically if I didn't do a discard
here, the compiler wouldn't save the value of p across the bpf_throw
call. So it may end up in some caller-saved register (R1-R5) and since
bpf_throw needs things to be either saved in the stack or in
callee-saved regs (R6-R9) to be able to do the stack unwinding, we
would not be able to test the case where the resource is held in
R6-R9.

In a correctly written program, in the path where bpf_throw is not
done, you will always have some cleanup code (otherwise your program
wouldn't pass), so the value should always end up being preserved
across a bpf_throw call (this is kind of why Alexei was sort of
worried about noreturn, because in that case the compiler may decide
to not preserve it for the bpf_throw path).
You cannot just leak a resource acquired before bpf_throw in the path
where exception is not thrown.

Also,  I think the test is a bit fragile, I should probably rewrite it
in inline assembly, because while the compiler chooses to hold it in a
register here, it is not bound to do so in this case.
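
Something in this direction (untested sketch, mirroring the asm style
used elsewhere in the patch) would pin the pointer in a callee-saved
register explicitly:

    asm volatile (
       "r1 = %[ringbuf] ll;             \
        r2 = 8;                         \
        r3 = 0;                         \
        call %[bpf_ringbuf_reserve];    \
        r6 = r0;                        \
        r0 = 0;                         \
        r1 = 0;                         \
        call bpf_throw;                 \
    " : : __imm(bpf_ringbuf_reserve),
          __imm_addr(ringbuf)
      : "memory");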

> >  [...]
> >
> > -SEC("?tc")
> > -__failure __msg("Unreleased reference")
> > -int reject_with_reference(void *ctx)
> > -{
> > -     struct foo *f;
> > -
> > -     f = bpf_obj_new(typeof(*f));
> > -     if (!f)
> > -             return 0;
> > -     bpf_throw(0);
>
> Hmm, so why is this a memory leak exactly? Apologies if this is already
> explained clearly elsewhere in the stack.
>

I will add comments around some of these to better explain this in the
non-RFC v1.
Basically, this program is sort of unrealistic (since it's always
throwing, and not really cleaning up the object since there is no
other path except the one with bpf_throw). So the compiler ends up
putting 'f' in a caller-saved register, during release_reference we
don't find it after bpf_throw has been processed (since caller-saved
regs have been cleared due to kfunc processing, and we generate frame
descriptors after check_kfunc_call, basically simulating the state
where only preserved state after the call is observed at runtime), but
the reference state still lingers around for 'f', so you get this
"Unreleased reference" error later when check_reference_leak is hit.

It's just trying to exercise the case where the pointer tied to a
reference state has been lost in verifier state, and that we return an
error in such a case and don't succeed in verifying the program
accidentally (because there is no way we can recover the value to free
at runtime).
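
A more realistic variant (sketch; the throw condition is made up) would
release the object on the non-throwing path, so the pointer stays live
across the bpf_throw call:

	f = bpf_obj_new(typeof(*f));
	if (!f)
		return 0;
	if (ctx_should_abort)	/* hypothetical condition */
		bpf_throw(0);
	bpf_obj_drop(f);
	return 0;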

> [...]


* Re: [RFC PATCH v1 14/14] selftests/bpf: Add tests for exceptions runtime cleanup
  2024-02-12 22:43     ` Kumar Kartikeya Dwivedi
@ 2024-02-13 19:33       ` David Vernet
  2024-02-13 20:51         ` Kumar Kartikeya Dwivedi
  0 siblings, 1 reply; 53+ messages in thread
From: David Vernet @ 2024-02-13 19:33 UTC (permalink / raw)
  To: Kumar Kartikeya Dwivedi
  Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Martin KaFai Lau, Tejun Heo, Raj Sahu, Dan Williams,
	Rishabh Iyer, Sanidhya Kashyap

On Mon, Feb 12, 2024 at 11:43:42PM +0100, Kumar Kartikeya Dwivedi wrote:
> On Mon, 12 Feb 2024 at 21:53, David Vernet <void@manifault.com> wrote:
> >
> > On Thu, Feb 01, 2024 at 04:21:09AM +0000, Kumar Kartikeya Dwivedi wrote:
> > > Add tests for the runtime cleanup support for exceptions, ensuring that
> > > resources are correctly identified and released when an exception is
> > > thrown. Also, we add negative tests to exercise corner cases the
> > > verifier should reject.
> > >
> > > Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
> > > ---
> > >  tools/testing/selftests/bpf/DENYLIST.aarch64  |   1 +
> > >  tools/testing/selftests/bpf/DENYLIST.s390x    |   1 +
> > >  .../bpf/prog_tests/exceptions_cleanup.c       |  65 +++
> > >  .../selftests/bpf/progs/exceptions_cleanup.c  | 468 ++++++++++++++++++
> > >  .../bpf/progs/exceptions_cleanup_fail.c       | 154 ++++++
> > >  .../selftests/bpf/progs/exceptions_fail.c     |  13 -
> > >  6 files changed, 689 insertions(+), 13 deletions(-)
> > >  create mode 100644 tools/testing/selftests/bpf/prog_tests/exceptions_cleanup.c
> > >  create mode 100644 tools/testing/selftests/bpf/progs/exceptions_cleanup.c
> > >  create mode 100644 tools/testing/selftests/bpf/progs/exceptions_cleanup_fail.c
> > >
> > > diff --git a/tools/testing/selftests/bpf/DENYLIST.aarch64 b/tools/testing/selftests/bpf/DENYLIST.aarch64
> > > index 5c2cc7e8c5d0..6fc79727cd14 100644
> > > --- a/tools/testing/selftests/bpf/DENYLIST.aarch64
> > > +++ b/tools/testing/selftests/bpf/DENYLIST.aarch64
> > > @@ -1,6 +1,7 @@
> > >  bpf_cookie/multi_kprobe_attach_api               # kprobe_multi_link_api_subtest:FAIL:fentry_raw_skel_load unexpected error: -3
> > >  bpf_cookie/multi_kprobe_link_api                 # kprobe_multi_link_api_subtest:FAIL:fentry_raw_skel_load unexpected error: -3
> > >  exceptions                                    # JIT does not support calling kfunc bpf_throw: -524
> > > +exceptions_unwind                             # JIT does not support calling kfunc bpf_throw: -524
> > >  fexit_sleep                                      # The test never returns. The remaining tests cannot start.
> > >  kprobe_multi_bench_attach                        # needs CONFIG_FPROBE
> > >  kprobe_multi_test                                # needs CONFIG_FPROBE
> > > diff --git a/tools/testing/selftests/bpf/DENYLIST.s390x b/tools/testing/selftests/bpf/DENYLIST.s390x
> > > index 1a63996c0304..f09a73dee72c 100644
> > > --- a/tools/testing/selftests/bpf/DENYLIST.s390x
> > > +++ b/tools/testing/selftests/bpf/DENYLIST.s390x
> > > @@ -1,5 +1,6 @@
> > >  # TEMPORARY
> > >  # Alphabetical order
> > >  exceptions                            # JIT does not support calling kfunc bpf_throw                                (exceptions)
> > > +exceptions_unwind                     # JIT does not support calling kfunc bpf_throw                                (exceptions)
> > >  get_stack_raw_tp                         # user_stack corrupted user stack                                             (no backchain userspace)
> > >  stacktrace_build_id                      # compare_map_keys stackid_hmap vs. stackmap err -2 errno 2                   (?)
> > > diff --git a/tools/testing/selftests/bpf/prog_tests/exceptions_cleanup.c b/tools/testing/selftests/bpf/prog_tests/exceptions_cleanup.c
> > > new file mode 100644
> > > index 000000000000..78df037b60ea
> > > --- /dev/null
> > > +++ b/tools/testing/selftests/bpf/prog_tests/exceptions_cleanup.c
> > > @@ -0,0 +1,65 @@
> > > +#include "bpf/bpf.h"
> > > +#include "exceptions.skel.h"
> > > +#include <test_progs.h>
> > > +#include <network_helpers.h>
> > > +
> > > +#include "exceptions_cleanup.skel.h"
> > > +#include "exceptions_cleanup_fail.skel.h"
> > > +
> > > +static void test_exceptions_cleanup_fail(void)
> > > +{
> > > +     RUN_TESTS(exceptions_cleanup_fail);
> > > +}
> > > +
> > > +void test_exceptions_cleanup(void)
> > > +{
> > > +     LIBBPF_OPTS(bpf_test_run_opts, ropts,
> > > +             .data_in = &pkt_v4,
> > > +             .data_size_in = sizeof(pkt_v4),
> > > +             .repeat = 1,
> > > +     );
> > > +     struct exceptions_cleanup *skel;
> > > +     int ret;
> > > +
> > > +     if (test__start_subtest("exceptions_cleanup_fail"))
> > > +             test_exceptions_cleanup_fail();
> >
> > RUN_TESTS takes care of doing test__start_subtest(), etc. You should be
> > able to just call RUN_TESTS(exceptions_cleanup_fail) directly here.
> >
> 
> Ack, will fix.
> 
> > > +
> > > +     skel = exceptions_cleanup__open_and_load();
> > > +     if (!ASSERT_OK_PTR(skel, "exceptions_cleanup__open_and_load"))
> > > +             return;
> > > +
> > > +     ret = exceptions_cleanup__attach(skel);
> > > +     if (!ASSERT_OK(ret, "exceptions_cleanup__attach"))
> > > +             return;
> > > +
> > > +#define RUN_EXC_CLEANUP_TEST(name)                                      \
> >
> > Should we add a call to if (test__start_subtest(#name)) to this macro?
> >
> 
> Makes sense, will change this.
> 
> > > [...]
> > > +
> > > +SEC("tc")
> > > +int exceptions_cleanup_percpu_obj(struct __sk_buff *ctx)
> > > +{
> > > +    struct { int i; } *p;
> > > +
> > > +    p = bpf_percpu_obj_new(typeof(*p));
> > > +    MARK_RESOURCE(&p, RES_SPILL);
> > > +    bpf_throw(VAL);
> >
> > It would be neat if we could have the bpf_throw() kfunc signature be
> > marked as __attribute__((noreturn)) and have things work correctly;
> > meaning you wouldn't have to even return a value here. The verifier
> > should know that bpf_throw() is terminal, so it should be able to prune
> > any subsequent instructions as unreachable anyways.
> >
> 
> Originally, I was tagging the kfunc as noreturn, but Alexei advised
> against it in
> https://lore.kernel.org/bpf/CAADnVQJtUD6+gYinr+6ensj58qt2LeBj4dvT7Cyu-aBCafsP5g@mail.gmail.com
> ... so I have dropped it since.

I see. Ok, we can ignore this for now, though I think we should consider
revisiting this at some point once we've clarified the rules behind the
implicit prologue/epilogue. Being able to actually specify noreturn
really can make a difference in performance in some cases.

> Right now, the verifier will do dead code elimination of course, but
> sometimes the compiler does generate code that is tricky or unexpected
> (like putting the bpf_throw instruction as the final one instead of
> exit or jmp if somehow it can prove that bpf_throw will be taken by
> all paths) for the verifier if the bpf_throw is noreturn. Even though

Got it. As long as the verifier does dead-code elimination on that path,
that's really the most important thing.

> this would have the same effect at runtime (if the analysis of the
> compiler is not wrong), there were some places we would have to modify
> so that the compiler does not get confused.
> 
> Overall I'm not opposed to this, but I think we need more consensus
> before flipping the flag. Since this can be changed later and the
> necessary changes can be made in the verifier (just a couple of places
> which expect exit or jmp to final insns), I decided to move ahead
> without noreturn.

Understood, thanks for explaining. Leaving off noreturn for now is fine
with me.

> > > +    return !p;
> > > +}
> > > +
> > > +SEC("tc")
> > > +int exceptions_cleanup_ringbuf(struct __sk_buff *ctx)
> > > +{
> > > +    void *p;
> > > +
> > > +    p = bpf_ringbuf_reserve(&ringbuf, 8, 0);
> > > +    MARK_RESOURCE(&p, RES_SPILL);
> > > +    bpf_throw(VAL);
> > > +    return 0;
> > > +}
> > > +
> > > +SEC("tc")
> > > +int exceptions_cleanup_reg(struct __sk_buff *ctx)
> > > +{
> > > +    void *p;
> > > +
> > > +    p = bpf_ringbuf_reserve(&ringbuf, 8, 0);
> > > +    MARK_RESOURCE(p, RES_REG);
> > > +    bpf_throw(VAL);
> > > +    if (p)
> > > +        bpf_ringbuf_discard(p, 0);
> >
> > Does the prog fail to load if you don't have this bpf_ringbuf_discard()
> > check? I assume not given that in
> > exceptions_cleanup_null_or_ptr_do_ptr() and elsewhere we do a reserve
> > without discarding. Is there some subtle stack state difference here or
> > something?
> >
> 
> So I will add comments explaining this, since I realized this confused
> you in a couple of places, but basically if I didn't do a discard
> here, the compiler wouldn't save the value of p across the bpf_throw
> call. So it may end up in some caller-saved register (R1-R5) and since
> bpf_throw needs things to be either saved in the stack or in
> callee-saved regs (R6-R9) to be able to do the stack unwinding, we
> would not be able to test the case where the resource is held in
> R6-R9.
> 
> In a correctly written program, in the path where bpf_throw is not
> done, you will always have some cleanup code (otherwise your program
> wouldn't pass), so the value should always end up being preserved
> across a bpf_throw call (this is kind of why Alexei was sort of
> worried about noreturn, because in that case the compiler may decide
> to not preserve it for the bpf_throw path).
> You cannot just leak a resource acquired before bpf_throw in the path
> where exception is not thrown.

Ok, that makes sense. I suppose another way to frame this would be to
consider it in a typical scheduling scenario:

struct task_ctx *lookup_task_ctx(struct task_struct *p)
{
	struct task_ctx *taskc;
	s32 pid = p->pid;

	taskc = bpf_map_lookup_elem(&task_data, &pid);
	if (!taskc)
		bpf_throw(-ENOENT); // Verifier 

	return taskc;
}

void BPF_STRUCT_OPS(sched_stopping, struct task_struct *p, bool runnable)
{
	struct task_ctx *taskc;

	taskc = lookup_task_ctx(p);

	/* scale the execution time by the inverse of the weight and charge */
	p->scx.dsq_vtime +=
		(bpf_ktime_get_ns() - taskc->running_at) * 100 / p->scx.weight;
}

We're not dropping a reference here, but taskc is preserved across the
bpf_throw() path, so the same idea applies.

> Also,  I think the test is a bit fragile, I should probably rewrite it
> in inline assembly, because while the compiler chooses to hold it in a
> register here, it is not bound to do so in this case.

To that point, I wonder if it would be useful or possible to come up with some
kind of a macro that allows us to specify a list of variables that must be
preserved after a bpf_throw() call? Not sure how or if that would work exactly.
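
Maybe something as simple as forcing each listed variable into stack
memory would do, e.g. (untested sketch, single-variable form):

#define BPF_PRESERVE_ACROSS_THROW(var) asm volatile("" : "+m"(var))

	p = bpf_ringbuf_reserve(&ringbuf, 8, 0);
	BPF_PRESERVE_ACROSS_THROW(p);
	bpf_throw(VAL);

since anything spilled to the stack (or held in R6-R9) should be
visible to the unwinder.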

> > >  [...]
> > >
> > > -SEC("?tc")
> > > -__failure __msg("Unreleased reference")
> > > -int reject_with_reference(void *ctx)
> > > -{
> > > -     struct foo *f;
> > > -
> > > -     f = bpf_obj_new(typeof(*f));
> > > -     if (!f)
> > > -             return 0;
> > > -     bpf_throw(0);
> >
> > Hmm, so why is this a memory leak exactly? Apologies if this is already
> > explained clearly elsewhere in the stack.
> >
> 
> I will add comments around some of these to better explain this in the
> non-RFC v1.
> Basically, this program is sort of unrealistic (since it's always
> throwing, and not really cleaning up the object since there is no
> other path except the one with bpf_throw). So the compiler ends up
> putting 'f' in a caller-saved register, during release_reference we
> don't find it after bpf_throw has been processed (since caller-saved
> regs have been cleared due to kfunc processing, and we generate frame
> descriptors after check_kfunc_call, basically simulating the state
> where only preserved state after the call is observed at runtime), but
> the reference state still lingers around for 'f', so you get this
> "Unreleased reference" error later when check_reference_leak is hit.
> 
> It's just trying to exercise the case where the pointer tied to a
> reference state has been lost in verifier state, and that we return an
> error in such a case and don't succeed in verifying the program
> accidently (because there is no way we can recover the value to free
> at runtime).

Makes total sense, thanks a lot for explaining!

This looks great, I'm really excited to use it.


* Re: [RFC PATCH v1 14/14] selftests/bpf: Add tests for exceptions runtime cleanup
  2024-02-13 19:33       ` David Vernet
@ 2024-02-13 20:51         ` Kumar Kartikeya Dwivedi
  0 siblings, 0 replies; 53+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2024-02-13 20:51 UTC (permalink / raw)
  To: David Vernet
  Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Martin KaFai Lau, Tejun Heo, Raj Sahu, Dan Williams,
	Rishabh Iyer, Sanidhya Kashyap

On Tue, 13 Feb 2024 at 20:33, David Vernet <void@manifault.com> wrote:
>
> On Mon, Feb 12, 2024 at 11:43:42PM +0100, Kumar Kartikeya Dwivedi wrote:
> > On Mon, 12 Feb 2024 at 21:53, David Vernet <void@manifault.com> wrote:
> > >
> > > On Thu, Feb 01, 2024 at 04:21:09AM +0000, Kumar Kartikeya Dwivedi wrote:
> > > > Add tests for the runtime cleanup support for exceptions, ensuring that
> > > > resources are correctly identified and released when an exception is
> > > > thrown. Also, we add negative tests to exercise corner cases the
> > > > verifier should reject.
> > > >
> > > > Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
> > > > ---
> > > >  tools/testing/selftests/bpf/DENYLIST.aarch64  |   1 +
> > > >  tools/testing/selftests/bpf/DENYLIST.s390x    |   1 +
> > > >  .../bpf/prog_tests/exceptions_cleanup.c       |  65 +++
> > > >  .../selftests/bpf/progs/exceptions_cleanup.c  | 468 ++++++++++++++++++
> > > >  .../bpf/progs/exceptions_cleanup_fail.c       | 154 ++++++
> > > >  .../selftests/bpf/progs/exceptions_fail.c     |  13 -
> > > >  6 files changed, 689 insertions(+), 13 deletions(-)
> > > >  create mode 100644 tools/testing/selftests/bpf/prog_tests/exceptions_cleanup.c
> > > >  create mode 100644 tools/testing/selftests/bpf/progs/exceptions_cleanup.c
> > > >  create mode 100644 tools/testing/selftests/bpf/progs/exceptions_cleanup_fail.c
> > > >
> > > > diff --git a/tools/testing/selftests/bpf/DENYLIST.aarch64 b/tools/testing/selftests/bpf/DENYLIST.aarch64
> > > > index 5c2cc7e8c5d0..6fc79727cd14 100644
> > > > --- a/tools/testing/selftests/bpf/DENYLIST.aarch64
> > > > +++ b/tools/testing/selftests/bpf/DENYLIST.aarch64
> > > > @@ -1,6 +1,7 @@
> > > >  bpf_cookie/multi_kprobe_attach_api               # kprobe_multi_link_api_subtest:FAIL:fentry_raw_skel_load unexpected error: -3
> > > >  bpf_cookie/multi_kprobe_link_api                 # kprobe_multi_link_api_subtest:FAIL:fentry_raw_skel_load unexpected error: -3
> > > >  exceptions                                    # JIT does not support calling kfunc bpf_throw: -524
> > > > +exceptions_unwind                             # JIT does not support calling kfunc bpf_throw: -524
> > > >  fexit_sleep                                      # The test never returns. The remaining tests cannot start.
> > > >  kprobe_multi_bench_attach                        # needs CONFIG_FPROBE
> > > >  kprobe_multi_test                                # needs CONFIG_FPROBE
> > > > diff --git a/tools/testing/selftests/bpf/DENYLIST.s390x b/tools/testing/selftests/bpf/DENYLIST.s390x
> > > > index 1a63996c0304..f09a73dee72c 100644
> > > > --- a/tools/testing/selftests/bpf/DENYLIST.s390x
> > > > +++ b/tools/testing/selftests/bpf/DENYLIST.s390x
> > > > @@ -1,5 +1,6 @@
> > > >  # TEMPORARY
> > > >  # Alphabetical order
> > > >  exceptions                            # JIT does not support calling kfunc bpf_throw                                (exceptions)
> > > > +exceptions_unwind                     # JIT does not support calling kfunc bpf_throw                                (exceptions)
> > > >  get_stack_raw_tp                         # user_stack corrupted user stack                                             (no backchain userspace)
> > > >  stacktrace_build_id                      # compare_map_keys stackid_hmap vs. stackmap err -2 errno 2                   (?)
> > > > diff --git a/tools/testing/selftests/bpf/prog_tests/exceptions_cleanup.c b/tools/testing/selftests/bpf/prog_tests/exceptions_cleanup.c
> > > > new file mode 100644
> > > > index 000000000000..78df037b60ea
> > > > --- /dev/null
> > > > +++ b/tools/testing/selftests/bpf/prog_tests/exceptions_cleanup.c
> > > > @@ -0,0 +1,65 @@
> > > > +#include "bpf/bpf.h"
> > > > +#include "exceptions.skel.h"
> > > > +#include <test_progs.h>
> > > > +#include <network_helpers.h>
> > > > +
> > > > +#include "exceptions_cleanup.skel.h"
> > > > +#include "exceptions_cleanup_fail.skel.h"
> > > > +
> > > > +static void test_exceptions_cleanup_fail(void)
> > > > +{
> > > > +     RUN_TESTS(exceptions_cleanup_fail);
> > > > +}
> > > > +
> > > > +void test_exceptions_cleanup(void)
> > > > +{
> > > > +     LIBBPF_OPTS(bpf_test_run_opts, ropts,
> > > > +             .data_in = &pkt_v4,
> > > > +             .data_size_in = sizeof(pkt_v4),
> > > > +             .repeat = 1,
> > > > +     );
> > > > +     struct exceptions_cleanup *skel;
> > > > +     int ret;
> > > > +
> > > > +     if (test__start_subtest("exceptions_cleanup_fail"))
> > > > +             test_exceptions_cleanup_fail();
> > >
> > > RUN_TESTS takes care of doing test__start_subtest(), etc. You should be
> > > able to just call RUN_TESTS(exceptions_cleanup_fail) directly here.
> > >
> >
> > Ack, will fix.
> >
> > > > +
> > > > +     skel = exceptions_cleanup__open_and_load();
> > > > +     if (!ASSERT_OK_PTR(skel, "exceptions_cleanup__open_and_load"))
> > > > +             return;
> > > > +
> > > > +     ret = exceptions_cleanup__attach(skel);
> > > > +     if (!ASSERT_OK(ret, "exceptions_cleanup__attach"))
> > > > +             return;
> > > > +
> > > > +#define RUN_EXC_CLEANUP_TEST(name)                                      \
> > >
> > > Should we add a call to if (test__start_subtest(#name)) to this macro?
> > >
> >
> > Makes sense, will change this.
> >
> > > > [...]
> > > > +
> > > > +SEC("tc")
> > > > +int exceptions_cleanup_percpu_obj(struct __sk_buff *ctx)
> > > > +{
> > > > +    struct { int i; } *p;
> > > > +
> > > > +    p = bpf_percpu_obj_new(typeof(*p));
> > > > +    MARK_RESOURCE(&p, RES_SPILL);
> > > > +    bpf_throw(VAL);
> > >
> > > It would be neat if we could have the bpf_throw() kfunc signature be
> > > marked as __attribute__((noreturn)) and have things work correctly;
> > > meaning you wouldn't have to even return a value here. The verifier
> > > should know that bpf_throw() is terminal, so it should be able to prune
> > > any subsequent instructions as unreachable anyways.
> > >
> >
> > Originally, I was tagging the kfunc as noreturn, but Alexei advised
> > against it in
> > https://lore.kernel.org/bpf/CAADnVQJtUD6+gYinr+6ensj58qt2LeBj4dvT7Cyu-aBCafsP5g@mail.gmail.com
> > ... so I have dropped it since.
>
> I see. Ok, we can ignore this for now, though I think we should consider
> revisiting this at some point once we've clarified the rules behind the
> implicit prologue/epilogue. Being able to actually specify noreturn
> really can make a difference in performance in some cases.
>

I agree. I will add this to my TODO list to explore after this set is merged.

> [...]
> > > > +
> > > > +SEC("tc")
> > > > +int exceptions_cleanup_reg(struct __sk_buff *ctx)
> > > > +{
> > > > +    void *p;
> > > > +
> > > > +    p = bpf_ringbuf_reserve(&ringbuf, 8, 0);
> > > > +    MARK_RESOURCE(p, RES_REG);
> > > > +    bpf_throw(VAL);
> > > > +    if (p)
> > > > +        bpf_ringbuf_discard(p, 0);
> > >
> > > Does the prog fail to load if you don't have this bpf_ringbuf_discard()
> > > check? I assume not given that in
> > > exceptions_cleanup_null_or_ptr_do_ptr() and elsewhere we do a reserve
> > > without discarding. Is there some subtle stack state difference here or
> > > something?
> > >
> >
> > So I will add comments explaining this, since I realized this confused
> > you in a couple of places, but basically if I didn't do a discard
> > here, the compiler wouldn't save the value of p across the bpf_throw
> > call. So it may end up in some caller-saved register (R1-R5) and since
> > bpf_throw needs things to be either saved in the stack or in
> > callee-saved regs (R6-R9) to be able to do the stack unwinding, we
> > would not be able to test the case where the resource is held in
> > R6-R9.
> >
> > In a correctly written program, in the path where bpf_throw is not
> > done, you will always have some cleanup code (otherwise your program
> > wouldn't pass), so the value should always end up being preserved
> > across a bpf_throw call (this is kind of why Alexei was sort of
> > worried about noreturn, because in that case the compiler may decide
> > to not preserve it for the bpf_throw path).
> > You cannot just leak a resource acquired before bpf_throw in the path
> > where exception is not thrown.
>
> Ok, that makes sense. I suppose another way to frame this would be to
> consider it in a typical scheduling scenario:
>
> struct task_ctx *lookup_task_ctx(struct task_struct *p)
> {
>         struct task_ctx *taskc;
>         s32 pid = p->pid;
>
>         taskc = bpf_map_lookup_elem(&task_data, &pid);
>         if (!taskc)
>                 bpf_throw(-ENOENT); // Verifier
>
>         return taskc;
> }
>
> void BPF_STRUCT_OPS(sched_stopping, struct task_struct *p, bool runnable)
> {
>         struct task_ctx *taskc;
>
>         taskc = lookup_task_ctx(p);
>
>         /* scale the execution time by the inverse of the weight and charge */
>         p->scx.dsq_vtime +=
>                 (bpf_ktime_get_ns() - taskc->running_at) * 100 / p->scx.weight;
> }
>
> We're not dropping a reference here, but taskc is preserved across the
> bpf_throw() path, so the same idea applies.
>

Yeah, I will add an example like this to the selftests to make sure we
also exercise such a pattern.

> > Also,  I think the test is a bit fragile, I should probably rewrite it
> > in inline assembly, because while the compiler chooses to hold it in a
> > register here, it is not bound to do so in this case.
>
> To that point, I wonder if it would be useful or possible to come up with some
> kind of a macro that allows us to specify a list of variables that must be
> preserved after a bpf_throw() call? Not sure how or if that would work exactly.
>

I think it can be useful; presumably, if we can force the compiler to
do a spill to the stack, that will be enough to enable unwinding.
But we should probably come back to this in case we see there are
certain compiler optimizations causing trouble.
Otherwise it's unnecessary cognitive overhead for someone writing a
program to have to explicitly mark variables like this.

> > > >  [...]
> > > >
> > > > -SEC("?tc")
> > > > -__failure __msg("Unreleased reference")
> > > > -int reject_with_reference(void *ctx)
> > > > -{
> > > > -     struct foo *f;
> > > > -
> > > > -     f = bpf_obj_new(typeof(*f));
> > > > -     if (!f)
> > > > -             return 0;
> > > > -     bpf_throw(0);
> > >
> > > Hmm, so why is this a memory leak exactly? Apologies if this is already
> > > explained clearly elsewhere in the stack.
> > >
> >
> > I will add comments around some of these to better explain this in the
> > non-RFC v1.
> > Basically, this program is sort of unrealistic (since it's always
> > throwing, and not really cleaning up the object since there is no
> > other path except the one with bpf_throw). So the compiler ends up
> > putting 'f' in a caller-saved register, during release_reference we
> > don't find it after bpf_throw has been processed (since caller-saved
> > regs have been cleared due to kfunc processing, and we generate frame
> > descriptors after check_kfunc_call, basically simulating the state
> > where only preserved state after the call is observed at runtime), but
> > the reference state still lingers around for 'f', so you get this
> > "Unreleased reference" error later when check_reference_leak is hit.
> >
> > It's just trying to exercise the case where the pointer tied to a
> > reference state has been lost in verifier state, and that we return an
> > error in such a case and don't succeed in verifying the program
> > accidentally (because there is no way we can recover the value to free
> > at runtime).
>
> Makes total sense, thanks a lot for explaining!
>
> This looks great, I'm really excited to use it.

Thanks for the review! Will respin addressing your comments after
waiting for a day or two.


* Re: [RFC PATCH v1 01/14] bpf: Mark subprogs as throw reachable before do_check pass
  2024-02-01  4:20 ` [RFC PATCH v1 01/14] bpf: Mark subprogs as throw reachable before do_check pass Kumar Kartikeya Dwivedi
  2024-02-12 19:35   ` David Vernet
@ 2024-02-15  1:01   ` Eduard Zingerman
  2024-02-16 21:34     ` Kumar Kartikeya Dwivedi
  1 sibling, 1 reply; 53+ messages in thread
From: Eduard Zingerman @ 2024-02-15  1:01 UTC (permalink / raw)
  To: Kumar Kartikeya Dwivedi, bpf
  Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Martin KaFai Lau, David Vernet, Tejun Heo, Raj Sahu,
	Dan Williams, Rishabh Iyer, Sanidhya Kashyap

On Thu, 2024-02-01 at 04:20 +0000, Kumar Kartikeya Dwivedi wrote:

[...]

> +static int mark_exception_reachable_subprogs(struct bpf_verifier_env *env)
> +{

[...]

> +restart:
> +	subprog_end = subprog[idx + 1].start;
> +	for (; i < subprog_end; i++) {

[...]

> +		if (!bpf_pseudo_call(insn + i) && !bpf_pseudo_func(insn + i))
> +			continue;
> +		/* remember insn and function to return to */
> +		ret_insn[frame] = i + 1;
> +		ret_prog[frame] = idx;
> +
> +		/* find the callee */
> +		next_insn = i + insn[i].imm + 1;
> +		sidx = find_subprog(env, next_insn);
> +		if (sidx < 0) {
> +			WARN_ONCE(1, "verifier bug. No program starts at insn %d\n", next_insn);
> +			return -EFAULT;
> +		}

For programs like:

  foo():
    bar()
    bar()

this algorithm would scan bar() multiple times.
Would it be possible to remember if a subprogram has been scanned
already and reuse the collected .is_throw_reachable info?
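
E.g. with a hypothetical 'throw_scanned' bit in bpf_subprog_info,
something like this (untested sketch, placed right after the
find_subprog() lookup):

		if (subprog[sidx].throw_scanned) {
			if (subprog[sidx].is_throw_reachable) {
				subprog[idx].is_throw_reachable = true;
				for (int j = 0; j < frame; j++)
					subprog[ret_prog[j]].is_throw_reachable = true;
			}
			continue; /* reuse the summary, don't descend again */
		}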

[...]




* Re: [RFC PATCH v1 02/14] bpf: Process global subprog's exception propagation
  2024-02-01  4:20 ` [RFC PATCH v1 02/14] bpf: Process global subprog's exception propagation Kumar Kartikeya Dwivedi
@ 2024-02-15  1:10   ` Eduard Zingerman
  2024-02-16 21:50     ` Kumar Kartikeya Dwivedi
  0 siblings, 1 reply; 53+ messages in thread
From: Eduard Zingerman @ 2024-02-15  1:10 UTC (permalink / raw)
  To: Kumar Kartikeya Dwivedi, bpf
  Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Martin KaFai Lau, David Vernet, Tejun Heo, Raj Sahu,
	Dan Williams, Rishabh Iyer, Sanidhya Kashyap

On Thu, 2024-02-01 at 04:20 +0000, Kumar Kartikeya Dwivedi wrote:
> Global subprogs are not descended during symbolic execution, but we
> summarized whether they can throw an exception (reachable from another
> exception throwing subprog) in mark_exception_reachable_subprogs added
> by the previous patch.

[...]

> Fixes: f18b03fabaa9 ("bpf: Implement BPF exceptions")
> Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
> ---

Acked-by: Eduard Zingerman <eddyz87@gmail.com>

Also, did you consider global subprograms that always throw?
E.g. do some logging and unconditionally call bpf_throw().
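
Something like (hypothetical):

__noinline int log_and_throw(struct __sk_buff *ctx)
{
	bpf_printk("aborting, len=%d", ctx->len);
	bpf_throw(0);
	return 0;
}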

[...]

> @@ -9505,6 +9515,9 @@ static int check_func_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
>  		mark_reg_unknown(env, caller->regs, BPF_REG_0);
>  		caller->regs[BPF_REG_0].subreg_def = DEF_NOT_SUBREG;
>  
> +		if (env->cur_state->global_subprog_call_exception)
> +			verbose(env, "Func#%d ('%s') may throw exception, exploring program path where exception is thrown\n",
> +				subprog, sub_name);

Nit: Maybe move this log entry to do_check?
     It would be printed right before returning to do_check() anyways.
     Maybe add a log level check?
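
     E.g. (sketch):

	if (env->log.level & BPF_LOG_LEVEL2)
		verbose(env, "Func#%d ('%s') may throw exception, exploring program path where exception is thrown\n",
			subprog, sub_name);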

>  		/* continue with next insn after call */
>  		return 0;
>  	}

[...]

> @@ -17675,6 +17692,11 @@ static int do_check(struct bpf_verifier_env *env)
>  				}
>  				if (insn->src_reg == BPF_PSEUDO_CALL) {
>  					err = check_func_call(env, insn, &env->insn_idx);
> +					if (!err && env->cur_state->global_subprog_call_exception) {
> +						env->cur_state->global_subprog_call_exception = false;
> +						exception_exit = true;
> +						goto process_bpf_exit_full;
> +					}
>  				} else if (insn->src_reg == BPF_PSEUDO_KFUNC_CALL) {
>  					err = check_kfunc_call(env, insn, &env->insn_idx);
>  					if (!err && is_bpf_throw_kfunc(insn)) {





* Re: [RFC PATCH v1 03/14] selftests/bpf: Add test for throwing global subprog with acquired refs
  2024-02-01  4:20 ` [RFC PATCH v1 03/14] selftests/bpf: Add test for throwing global subprog with acquired refs Kumar Kartikeya Dwivedi
@ 2024-02-15  1:10   ` Eduard Zingerman
  0 siblings, 0 replies; 53+ messages in thread
From: Eduard Zingerman @ 2024-02-15  1:10 UTC (permalink / raw)
  To: Kumar Kartikeya Dwivedi, bpf
  Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Martin KaFai Lau, David Vernet, Tejun Heo, Raj Sahu,
	Dan Williams, Rishabh Iyer, Sanidhya Kashyap

On Thu, 2024-02-01 at 04:20 +0000, Kumar Kartikeya Dwivedi wrote:
> Add a test case to exercise verifier logic where a global function that
> may potentially throw an exception is invoked from the main subprog,
> such that during exploration, the reference state is not visible when
> the bpf_throw instruction is explored. Without the fixes in prior
> commits, bpf_throw will not complain when unreleased resources are
> lingering in the program when a possible exception may be thrown.
> 
> Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
> ---

Acked-by: Eduard Zingerman <eddyz87@gmail.com>



* Re: [RFC PATCH v1 04/14] bpf: Refactor check_pseudo_btf_id's BTF reference bump
  2024-02-01  4:20 ` [RFC PATCH v1 04/14] bpf: Refactor check_pseudo_btf_id's BTF reference bump Kumar Kartikeya Dwivedi
@ 2024-02-15  1:11   ` Eduard Zingerman
  2024-02-16 21:50     ` Kumar Kartikeya Dwivedi
  0 siblings, 1 reply; 53+ messages in thread
From: Eduard Zingerman @ 2024-02-15  1:11 UTC (permalink / raw)
  To: Kumar Kartikeya Dwivedi, bpf
  Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Martin KaFai Lau, David Vernet, Tejun Heo, Raj Sahu,
	Dan Williams, Rishabh Iyer, Sanidhya Kashyap

On Thu, 2024-02-01 at 04:20 +0000, Kumar Kartikeya Dwivedi wrote:
> Refactor check_pseudo_btf_id's code which adds a new BTF reference to
> the used_btfs into a separate helper function called add_used_btf. This
> will be later useful in exception frame generation to take BTF
> references with their modules, so that we can keep the modules alive
> whose functions may be required to unwind a given BPF program when it
> eventually throws an exception.

[...]

> Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
> ---

Acked-by: Eduard Zingerman <eddyz87@gmail.com>

[...]

> +static int add_used_btf(struct bpf_verifier_env *env, struct btf *btf)

[...]

> +	if (env->used_btf_cnt >= MAX_USED_BTFS) {
> +		err = -E2BIG;
> +		goto err;

Nit: could be "return -E2BIG"

> +	}
> +
> +	btf_mod = &env->used_btfs[env->used_btf_cnt];
> +	btf_mod->btf = btf;
> +	btf_mod->module = NULL;
> +
> +	/* if we reference variables from kernel module, bump its refcount */
> +	if (btf_is_module(btf)) {
> +		btf_mod->module = btf_try_get_module(btf);
> +		if (!btf_mod->module) {
> +			err = -ENXIO;
> +			goto err;

Nit: could be "return -ENXIO"

> +		}
> +	}
> +	env->used_btf_cnt++;
> +	return 0;
> +err:
> +	return err;
> +}
> +
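
FWIW, with both nits applied, the tail of the helper collapses to early
returns, along these lines (sketch, not compile-tested):

static int add_used_btf(struct bpf_verifier_env *env, struct btf *btf)
{
	struct btf_mod_pair *btf_mod;

	/* dedup/search over env->used_btfs elided, as above */
	if (env->used_btf_cnt >= MAX_USED_BTFS)
		return -E2BIG;

	btf_mod = &env->used_btfs[env->used_btf_cnt];
	btf_mod->btf = btf;
	btf_mod->module = NULL;

	/* if we reference variables from kernel module, bump its refcount */
	if (btf_is_module(btf)) {
		btf_mod->module = btf_try_get_module(btf);
		if (!btf_mod->module)
			return -ENXIO;
	}
	env->used_btf_cnt++;
	return 0;
}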



^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [RFC PATCH v1 06/14] bpf: Adjust frame descriptor pc on instruction patching
  2024-02-01  4:21 ` [RFC PATCH v1 06/14] bpf: Adjust frame descriptor pc on instruction patching Kumar Kartikeya Dwivedi
@ 2024-02-15 16:31   ` Eduard Zingerman
  2024-02-16 21:52     ` Kumar Kartikeya Dwivedi
  0 siblings, 1 reply; 53+ messages in thread
From: Eduard Zingerman @ 2024-02-15 16:31 UTC (permalink / raw)
  To: Kumar Kartikeya Dwivedi, bpf
  Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Martin KaFai Lau, David Vernet, Tejun Heo, Raj Sahu,
	Dan Williams, Rishabh Iyer, Sanidhya Kashyap

On Thu, 2024-02-01 at 04:21 +0000, Kumar Kartikeya Dwivedi wrote:

[...]

> +static int adjust_subprog_frame_descs_after_remove(struct bpf_verifier_env *env, u32 off, u32 cnt)
> +{
> +	for (int i = 0; i < env->subprog_cnt; i++) {
> +		struct bpf_exception_frame_desc_tab *fdtab = subprog_info(env, i)->fdtab;
> +
> +		if (!fdtab)
> +			continue;
> +		for (int j = 0; j < fdtab->cnt; j++) {
> +			/* Part of a subprog_info whose instructions were removed partially, but the fdtab remained. */
> +			if (fdtab->desc[j]->pc >= off && fdtab->desc[j]->pc < off + cnt) {
> +				void *p = fdtab->desc[j];
> +				if (j < fdtab->cnt - 1)
> +					memmove(fdtab->desc + j, fdtab->desc + j + 1, sizeof(fdtab->desc[0]) * (fdtab->cnt - j - 1));
> +				kfree(p);

Is it necessary to release btf references for desc entries that are removed?
Those that were grabbed by add_used_btf() in gen_exception_frame_desc_iter_entry().

> +				fdtab->cnt--;
> +				j--;
> +			}
> +			if (fdtab->desc[j]->pc >= off + cnt)
> +				fdtab->desc[j]->pc -= cnt;
> +		}
> +	}
> +	return 0;
> +}
> +

[...]

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [RFC PATCH v1 05/14] bpf: Implement BPF exception frame descriptor generation
  2024-02-01  4:21 ` [RFC PATCH v1 05/14] bpf: Implement BPF exception frame descriptor generation Kumar Kartikeya Dwivedi
@ 2024-02-15 18:24   ` Eduard Zingerman
  2024-02-16 11:23     ` Eduard Zingerman
  2024-02-16 22:24     ` Kumar Kartikeya Dwivedi
  0 siblings, 2 replies; 53+ messages in thread
From: Eduard Zingerman @ 2024-02-15 18:24 UTC (permalink / raw)
  To: Kumar Kartikeya Dwivedi, bpf
  Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Martin KaFai Lau, David Vernet, Tejun Heo, Raj Sahu,
	Dan Williams, Rishabh Iyer, Sanidhya Kashyap

On Thu, 2024-02-01 at 04:21 +0000, Kumar Kartikeya Dwivedi wrote:

Question: are there any real-life programs adapted to use exceptions
with the cleanup feature? It would be interesting to see how robust
one-descriptor-per-pc is in practice, and also how it affects memory
consumption during verification.

The algorithm makes sense to me, a few comments/nits below.

[...]

> +static int find_and_merge_frame_desc(struct bpf_verifier_env *env, struct bpf_exception_frame_desc_tab *fdtab, u64 pc, struct bpf_frame_desc_reg_entry *fd)
> +{
> +	struct bpf_exception_frame_desc **descs = NULL, *desc = NULL, *p;
> +	int ret = 0;
> +
> +	for (int i = 0; i < fdtab->cnt; i++) {
> +		if (pc != fdtab->desc[i]->pc)
> +			continue;
> +		descs = &fdtab->desc[i];
> +		desc = fdtab->desc[i];
> +		break;
> +	}
> +
> +	if (!desc) {
> +		verbose(env, "frame_desc: find_and_merge: cannot find frame descriptor for pc=%llu, creating new entry\n", pc);
> +		return -ENOENT;
> +	}
> +
> +	if (fd->off < 0)
> +		goto stack;

Nit: maybe write it down as

	if (fd->off >= 0)
		return merge_frame_desc(...);

     and avoid goto?

[...]

> +static int gen_exception_frame_desc_stack_entry(struct bpf_verifier_env *env, struct bpf_func_state *frame, int stack_off)
> +{
> +	int spi = stack_off / BPF_REG_SIZE, off = -stack_off - 1;
> +	struct bpf_reg_state *reg, not_init_reg, null_reg;
> +	int slot_type, ret;
> +
> +	__mark_reg_not_init(env, &not_init_reg);
> +	__mark_reg_known_zero(&null_reg);

__mark_reg_known_zero() does not set .type field,
thus null_reg.type value is undefined.

> +
> +	slot_type = frame->stack[spi].slot_type[BPF_REG_SIZE - 1];
> +	reg = &frame->stack[spi].spilled_ptr;
> +
> +	switch (slot_type) {
> +	case STACK_SPILL:
> +		/* We skip all kinds of scalar registers, except NULL values, which consume a slot. */
> +		if (is_spilled_scalar_reg(&frame->stack[spi]) && !register_is_null(&frame->stack[spi].spilled_ptr))
> +			break;
> +		ret = gen_exception_frame_desc_reg_entry(env, reg, off, frame->frameno);
> +		if (ret < 0)
> +			return ret;
> +		break;
> +	case STACK_DYNPTR:
> +		/* Keep iterating until we find the first slot. */
> +		if (!reg->dynptr.first_slot)
> +			break;
> +		ret = gen_exception_frame_desc_dynptr_entry(env, reg, off, frame->frameno);
> +		if (ret < 0)
> +			return ret;
> +		break;
> +	case STACK_ITER:
> +		/* Keep iterating until we find the first slot. */
> +		if (!reg->ref_obj_id)
> +			break;
> +		ret = gen_exception_frame_desc_iter_entry(env, reg, off, frame->frameno);
> +		if (ret < 0)
> +			return ret;
> +		break;
> +	case STACK_MISC:
> +	case STACK_INVALID:
> +		/* Create an invalid entry for MISC and INVALID */
> +		ret = gen_exception_frame_desc_reg_entry(env, &not_init_reg, off, frame->frameno);
> +		if (ret < 0)
> +			return 0;

No tests are failing if I comment out this block.
Looking at the merge_frame_desc() logic it appears to me that fd
entries with fd->type == NOT_INIT would only be merged with other
NOT_INIT entries. What is the point of having such entries at all?

> +		break;
> +	case STACK_ZERO:
> +		reg = &null_reg;
> +		for (int i = BPF_REG_SIZE - 1; i >= 0; i--) {
> +			if (frame->stack[spi].slot_type[i] != STACK_ZERO)
> +				reg = &not_init_reg;
> +		}
> +		ret = gen_exception_frame_desc_reg_entry(env, &null_reg, off, frame->frameno);
> +		if (ret < 0)
> +			return ret;

Same here, no tests fail if the STACK_ZERO block is commented out.
In general, what is the point of adding STACK_ZERO entries?
There is logic in place to merge NULL and non-NULL entries,
but how is that different from not adding NULL entries in the first place?
find_and_merge_frame_desc() does a linear scan over bpf_exception_frame_desc->stack
and does not rely on entries being sorted by the .off field.

> +		break;
> +	default:
> +		verbose(env, "verifier internal error: frame%d stack off=%d slot_type=%d missing handling for exception frame generation\n",
> +			frame->frameno, off, slot_type);
> +		return -EFAULT;
> +	}
> +	return 0;
> +}

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [RFC PATCH v1 07/14] bpf: Use hidden subprog trampoline for bpf_throw
  2024-02-01  4:21 ` [RFC PATCH v1 07/14] bpf: Use hidden subprog trampoline for bpf_throw Kumar Kartikeya Dwivedi
@ 2024-02-15 22:11   ` Eduard Zingerman
  2024-02-16 21:59     ` Kumar Kartikeya Dwivedi
  0 siblings, 1 reply; 53+ messages in thread
From: Eduard Zingerman @ 2024-02-15 22:11 UTC (permalink / raw)
  To: Kumar Kartikeya Dwivedi, bpf
  Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Martin KaFai Lau, David Vernet, Tejun Heo, Raj Sahu,
	Dan Williams, Rishabh Iyer, Sanidhya Kashyap

On Thu, 2024-02-01 at 04:21 +0000, Kumar Kartikeya Dwivedi wrote:
> When we perform a bpf_throw kfunc call, callee saved registers in BPF
> calling convention (R6-R9) may end up getting saved and clobbered by
> bpf_throw. Typically, the kernel will restore the registers before
> returning back to the BPF program, but in case of bpf_throw, the
> function will never return. Therefore, any acquired resources sitting in
> these registers will end up getting destroyed if not saved on the
> stack, without any cleanup happening for them.

Could you please rephrase this description a bit?
It took me a while to figure out the difference between regular bpf
calls and kfunc calls. Something like:

  - For regular bpf subprogram calls jit emits code that pushes R6-R9 to stack
    before jumping into callee.
  - For kfunc calls jit emits instructions that do not guarantee that
    R6-R9 would be preserved on stack. E.g. for x86 kfunc call is translated
    as "call" instruction, which only pushes return address to stack.

--

Also, what do you think about the following hack:
- declare a hidden kfunc "bpf_throw_r(u64 r6, u64 r7, u64 r8, u64 r9)";
- replace all calls to bpf_throw() with calls to bpf_throw_r()
  (r1-r5 do not have to be preserved anyways).
Thus avoiding the need to introduce the trampoline.

[...]



^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [RFC PATCH v1 08/14] bpf: Compute used callee saved registers for subprogs
  2024-02-01  4:21 ` [RFC PATCH v1 08/14] bpf: Compute used callee saved registers for subprogs Kumar Kartikeya Dwivedi
@ 2024-02-15 22:12   ` Eduard Zingerman
  2024-02-16 22:02     ` Kumar Kartikeya Dwivedi
  0 siblings, 1 reply; 53+ messages in thread
From: Eduard Zingerman @ 2024-02-15 22:12 UTC (permalink / raw)
  To: Kumar Kartikeya Dwivedi, bpf
  Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Martin KaFai Lau, David Vernet, Tejun Heo, Raj Sahu,
	Dan Williams, Rishabh Iyer, Sanidhya Kashyap

On Thu, 2024-02-01 at 04:21 +0000, Kumar Kartikeya Dwivedi wrote:

[...]

> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> index 942243cba9f1..aeaf97b0a749 100644
> --- a/kernel/bpf/verifier.c
> +++ b/kernel/bpf/verifier.c
> @@ -2942,6 +2942,15 @@ static int check_subprogs(struct bpf_verifier_env *env)
>  		    insn[i].src_reg == 0 &&
>  		    insn[i].imm == BPF_FUNC_tail_call)
>  			subprog[cur_subprog].has_tail_call = true;
> +		/* Collect callee regs used in the subprog. */
> +		if (insn[i].dst_reg == BPF_REG_6 || insn[i].src_reg == BPF_REG_6)
> +			subprog[cur_subprog].callee_regs_used[0] = true;
> +		if (insn[i].dst_reg == BPF_REG_7 || insn[i].src_reg == BPF_REG_7)
> +			subprog[cur_subprog].callee_regs_used[1] = true;
> +		if (insn[i].dst_reg == BPF_REG_8 || insn[i].src_reg == BPF_REG_8)
> +			subprog[cur_subprog].callee_regs_used[2] = true;
> +		if (insn[i].dst_reg == BPF_REG_9 || insn[i].src_reg == BPF_REG_9)
> +			subprog[cur_subprog].callee_regs_used[3] = true;

Nit: Maybe move bpf_jit_comp.c:detect_reg_usage() to some place available to
     both verifier and jit? Just to keep all related code in one place.
     E.g. technically nothing prevents the x86 jit from doing this detection
     in a more precise manner, as a "fixed point" computation.
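
Concretely, the shared helper could be as small as (sketch):

static void detect_reg_usage(struct bpf_insn *insn, bool *callee_regs_used)
{
	/* R6-R9 are contiguous in the register enum */
	if (insn->dst_reg >= BPF_REG_6 && insn->dst_reg <= BPF_REG_9)
		callee_regs_used[insn->dst_reg - BPF_REG_6] = true;
	if (insn->src_reg >= BPF_REG_6 && insn->src_reg <= BPF_REG_9)
		callee_regs_used[insn->src_reg - BPF_REG_6] = true;
}

which both check_subprogs() and the jits could then call per instruction.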

>  		if (!env->seen_throw_insn && is_bpf_throw_kfunc(&insn[i]))
>  			env->seen_throw_insn = true;
>  		if (BPF_CLASS(code) == BPF_LD &&

[...]



^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [RFC PATCH v1 09/14] bpf, x86: Fix up pc offsets for frame descriptor entries
  2024-02-01  4:21 ` [RFC PATCH v1 09/14] bpf, x86: Fix up pc offsets for frame descriptor entries Kumar Kartikeya Dwivedi
@ 2024-02-15 22:12   ` Eduard Zingerman
  2024-02-16 13:33     ` Eduard Zingerman
  0 siblings, 1 reply; 53+ messages in thread
From: Eduard Zingerman @ 2024-02-15 22:12 UTC (permalink / raw)
  To: Kumar Kartikeya Dwivedi, bpf
  Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Martin KaFai Lau, David Vernet, Tejun Heo, Raj Sahu,
	Dan Williams, Rishabh Iyer, Sanidhya Kashyap

On Thu, 2024-02-01 at 04:21 +0000, Kumar Kartikeya Dwivedi wrote:
> Until now, the program counter value stored in frame descriptor entries
> was the instruction index of the BPF program's insn and callsites when
> going down the frames in a call chain. However, at runtime, the program
> counter will be the pointer to the next instruction, and thus needs to
> be computed in a position independent way to tally it at runtime to find
> the frame descriptor when unwinding.
> 
> To do this, we first convert the global instruction index into an
> instruction index relative to the start of a subprog, and add 1 to it
> (to reflect that at runtime, the program counter points to the next
> instruction). Then, we modify the JIT (for now, x86) to convert them
> to instruction offsets relative to the start of the JIT image, which is
> the prog->bpf_func of the subprog in question at runtime.
> 
> Later, subtracting the prog->bpf_func pointer from runtime program
> counter will yield the same offset, and allow us to figure out the
> corresponding frame descriptor entry.
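
If I read this right, the runtime side then reduces to something like
the following lookup (sketch; how the fdtab is reached from the subprog
at runtime is elided):

static struct bpf_exception_frame_desc *
find_frame_desc(struct bpf_exception_frame_desc_tab *fdtab,
		struct bpf_prog *subprog, unsigned long pc)
{
	u64 off = pc - (unsigned long)subprog->bpf_func;

	for (int i = 0; i < fdtab->cnt; i++) {
		if (fdtab->desc[i]->pc == off)
			return fdtab->desc[i];
	}
	return NULL;
}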

Would it be possible to instead embed an address (or index)
of the corresponding frame descriptor in instruction stream itself?
E.g. do LD_imm64 and pass it as a first parameter for bpf_throw()?
Thus avoiding the necessity to track ip changes.

[...]



^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [RFC PATCH v1 05/14] bpf: Implement BPF exception frame descriptor generation
  2024-02-15 18:24   ` Eduard Zingerman
@ 2024-02-16 11:23     ` Eduard Zingerman
  2024-02-16 22:06       ` Kumar Kartikeya Dwivedi
  2024-02-16 22:24     ` Kumar Kartikeya Dwivedi
  1 sibling, 1 reply; 53+ messages in thread
From: Eduard Zingerman @ 2024-02-16 11:23 UTC (permalink / raw)
  To: Kumar Kartikeya Dwivedi, bpf
  Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Martin KaFai Lau, David Vernet, Tejun Heo, Raj Sahu,
	Dan Williams, Rishabh Iyer, Sanidhya Kashyap

On Thu, 2024-02-15 at 20:24 +0200, Eduard Zingerman wrote:
[...]

> > +	case STACK_MISC:
> > +	case STACK_INVALID:
> > +		/* Create an invalid entry for MISC and INVALID */
> > +		ret = gen_exception_frame_desc_reg_entry(env, &not_init_reg, off, frame->frameno);
> > +		if (ret < 0)
> > +			return 0;
> 
> No tests are failing if I comment out this block.
> Looking at the merge_frame_desc() logic it appears to me that fd
> entries with fd->type == NOT_INIT would only be merged with other
> NOT_INIT entries. What is the point of having such entries at all?

Oh, I got it, it's a mark that ensures that no merge would happen with
e.g. some resource pointer. Makes sense, sorry for the noise.

Still, I think that my STACK_ZERO remark was correct.

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [RFC PATCH v1 10/14] bpf, x86: Implement runtime resource cleanup for exceptions
  2024-02-01  4:21 ` [RFC PATCH v1 10/14] bpf, x86: Implement runtime resource cleanup for exceptions Kumar Kartikeya Dwivedi
  2024-02-03  9:02   ` kernel test robot
@ 2024-02-16 12:02   ` Eduard Zingerman
  2024-02-16 22:28     ` Kumar Kartikeya Dwivedi
  1 sibling, 1 reply; 53+ messages in thread
From: Eduard Zingerman @ 2024-02-16 12:02 UTC (permalink / raw)
  To: Kumar Kartikeya Dwivedi, bpf
  Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Martin KaFai Lau, David Vernet, Tejun Heo, Raj Sahu,
	Dan Williams, Rishabh Iyer, Sanidhya Kashyap

On Thu, 2024-02-01 at 04:21 +0000, Kumar Kartikeya Dwivedi wrote:

[...]

> +int bpf_cleanup_resource_reg(struct bpf_frame_desc_reg_entry *fd, void *ptr)
> +{

Question:
Maybe I missed something in frame descriptor construction process,
but it appears like there is nothing guarding against double cleanup.
E.g. consider a program like below:

   r6 = ... PTR_TO_SOCKET ...
   r7 = r6
   *(u64 *)(r10 - 16) = r6
   call bpf_throw()

Would bpf_cleanup_resource_reg() be called for all r6, r7 and fp[-16],
thus executing destructor for the same object multiple times?

> +	u64 reg_value = ptr ? *(u64 *)ptr : 0;
> +	struct btf_struct_meta *meta;
> +	const struct btf_type *t;
> +	u32 dtor_id;
> +
> +	switch (fd->type) {
> +	case PTR_TO_SOCKET:
> +	case PTR_TO_TCP_SOCK:
> +	case PTR_TO_SOCK_COMMON:
> +		if (reg_value)
> +			bpf_sk_release_dtor((void *)reg_value);
> +		return 0;

[...]

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [RFC PATCH v1 11/14] bpf: Release references in verifier state when throwing exceptions
  2024-02-01  4:21 ` [RFC PATCH v1 11/14] bpf: Release references in verifier state when throwing exceptions Kumar Kartikeya Dwivedi
@ 2024-02-16 12:21   ` Eduard Zingerman
  0 siblings, 0 replies; 53+ messages in thread
From: Eduard Zingerman @ 2024-02-16 12:21 UTC (permalink / raw)
  To: Kumar Kartikeya Dwivedi, bpf
  Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Martin KaFai Lau, David Vernet, Tejun Heo, Raj Sahu,
	Dan Williams, Rishabh Iyer, Sanidhya Kashyap

On Thu, 2024-02-01 at 04:21 +0000, Kumar Kartikeya Dwivedi wrote:
> Reflect in the verifier state that references would be released whenever
> we throw a BPF exception. Now that we support generating frame
> descriptors, and performing the runtime cleanup, whenever processing an
> entry corresponding to an acquired reference, make sure we release its
> reference state. Note that we only release this state for the current
> frame, as the acquired refs are only checked against that when
> processing an exceptional exit.
> 
> This would ensure that for acquired resources apart from locks and RCU
> read sections, BPF programs never fail in case of lingering resources
> during verification.
> 
> While at it, we can tweak check_reference_leak to drop the
> exception_exit parameter, and fix selftests that will fail due to the
> changed behaviour.
> 
> Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
> ---

Acked-by: Eduard Zingerman <eddyz87@gmail.com>

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [RFC PATCH v1 09/14] bpf, x86: Fix up pc offsets for frame descriptor entries
  2024-02-15 22:12   ` Eduard Zingerman
@ 2024-02-16 13:33     ` Eduard Zingerman
  0 siblings, 0 replies; 53+ messages in thread
From: Eduard Zingerman @ 2024-02-16 13:33 UTC (permalink / raw)
  To: Kumar Kartikeya Dwivedi, bpf
  Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Martin KaFai Lau, David Vernet, Tejun Heo, Raj Sahu,
	Dan Williams, Rishabh Iyer, Sanidhya Kashyap

On Fri, 2024-02-16 at 00:12 +0200, Eduard Zingerman wrote:
[...]

> Would it be possible to instead embed an address (or index)
> of the corresponding frame descriptor in instruction stream itself?
> E.g. do LD_imm64 and pass it as a first parameter for bpf_throw()?
> Thus avoiding the necessity to track ip changes.

This won't work for calls upper in the chain.
Oh, well...

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [RFC PATCH v1 01/14] bpf: Mark subprogs as throw reachable before do_check pass
  2024-02-15  1:01   ` Eduard Zingerman
@ 2024-02-16 21:34     ` Kumar Kartikeya Dwivedi
  0 siblings, 0 replies; 53+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2024-02-16 21:34 UTC (permalink / raw)
  To: Eduard Zingerman
  Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Martin KaFai Lau, David Vernet, Tejun Heo, Raj Sahu,
	Dan Williams, Rishabh Iyer, Sanidhya Kashyap

On Thu, 15 Feb 2024 at 02:01, Eduard Zingerman <eddyz87@gmail.com> wrote:
>
> On Thu, 2024-02-01 at 04:20 +0000, Kumar Kartikeya Dwivedi wrote:
>
> [...]
>
> > +static int mark_exception_reachable_subprogs(struct bpf_verifier_env *env)
> > +{
>
> [...]
>
> > +restart:
> > +     subprog_end = subprog[idx + 1].start;
> > +     for (; i < subprog_end; i++) {
>
> [...]
>
> > +             if (!bpf_pseudo_call(insn + i) && !bpf_pseudo_func(insn + i))
> > +                     continue;
> > +             /* remember insn and function to return to */
> > +             ret_insn[frame] = i + 1;
> > +             ret_prog[frame] = idx;
> > +
> > +             /* find the callee */
> > +             next_insn = i + insn[i].imm + 1;
> > +             sidx = find_subprog(env, next_insn);
> > +             if (sidx < 0) {
> > +                     WARN_ONCE(1, "verifier bug. No program starts at insn %d\n", next_insn);
> > +                     return -EFAULT;
> > +             }
>
> For programs like:
>
>   foo():
>     bar()
>     bar()
>
> this algorithm would scan bar() multiple times.
> Would it be possible to remember if subprogram had been scanned
> already and reuse collected .is_throw_reachable info?
>

Good idea, I will look into avoiding this. I think
check_max_stack_depth would also benefit from such a change, and since
I plan on consolidating both to use similar logic, I will make the
change for both.
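
For reference, the shape I have in mind is the classic memoized
reachability walk; a simplified recursive sketch with made-up names
(the real code would stay iterative, with the explicit
ret_insn/ret_prog stack):

/* calls[i] lists subprogs directly called from subprog i; throws[] is
 * pre-seeded for subprogs that contain a bpf_throw call themselves */
static bool throw_reachable(int idx, bool *throws, bool *visited,
			    int (*calls)[16], const int *ncalls)
{
	if (visited[idx])
		return throws[idx];
	visited[idx] = true;
	for (int i = 0; i < ncalls[idx]; i++) {
		if (throw_reachable(calls[idx][i], throws, visited,
				    calls, ncalls))
			throws[idx] = true;
	}
	return throws[idx];
}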

> [...]
>
>

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [RFC PATCH v1 02/14] bpf: Process global subprog's exception propagation
  2024-02-15  1:10   ` Eduard Zingerman
@ 2024-02-16 21:50     ` Kumar Kartikeya Dwivedi
  2024-02-17 14:04       ` Eduard Zingerman
  0 siblings, 1 reply; 53+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2024-02-16 21:50 UTC (permalink / raw)
  To: Eduard Zingerman
  Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Martin KaFai Lau, David Vernet, Tejun Heo, Raj Sahu,
	Dan Williams, Rishabh Iyer, Sanidhya Kashyap

On Thu, 15 Feb 2024 at 02:10, Eduard Zingerman <eddyz87@gmail.com> wrote:
>
> On Thu, 2024-02-01 at 04:20 +0000, Kumar Kartikeya Dwivedi wrote:
> > Global subprogs are not descended during symbolic execution, but we
> > summarized whether they can throw an exception (reachable from another
> > exception throwing subprog) in mark_exception_reachable_subprogs added
> > by the previous patch.
>
> [...]
>
> > Fixes: f18b03fabaa9 ("bpf: Implement BPF exceptions")
> > Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
> > ---
>
> Acked-by: Eduard Zingerman <eddyz87@gmail.com>
>
> Also, did you consider global subprograms that always throw?
> E.g. do some logging and unconditionally call bpf_throw().
>

I have an example of that in the exceptions test suite, but I will add
a test for it with lingering references around.

> [...]
>
> > @@ -9505,6 +9515,9 @@ static int check_func_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
> >               mark_reg_unknown(env, caller->regs, BPF_REG_0);
> >               caller->regs[BPF_REG_0].subreg_def = DEF_NOT_SUBREG;
> >
> > +             if (env->cur_state->global_subprog_call_exception)
> > +                     verbose(env, "Func#%d ('%s') may throw exception, exploring program path where exception is thrown\n",
> > +                             subprog, sub_name);
>
> Nit: Maybe move this log entry to do_check?
>      It would be printed right before returning to do_check() anyways.
>      Maybe add a log level check?
>

Hmm, true. I was actually even considering whether all frame_desc logs
should also be LOG_LEVEL2?

> >               /* continue with next insn after call */
> >               return 0;
> >       }
>
> [...]

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [RFC PATCH v1 04/14] bpf: Refactor check_pseudo_btf_id's BTF reference bump
  2024-02-15  1:11   ` Eduard Zingerman
@ 2024-02-16 21:50     ` Kumar Kartikeya Dwivedi
  0 siblings, 0 replies; 53+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2024-02-16 21:50 UTC (permalink / raw)
  To: Eduard Zingerman
  Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Martin KaFai Lau, David Vernet, Tejun Heo, Raj Sahu,
	Dan Williams, Rishabh Iyer, Sanidhya Kashyap

On Thu, 15 Feb 2024 at 02:11, Eduard Zingerman <eddyz87@gmail.com> wrote:
>
> On Thu, 2024-02-01 at 04:20 +0000, Kumar Kartikeya Dwivedi wrote:
> > Refactor check_pseudo_btf_id's code which adds a new BTF reference to
> > the used_btfs into a separate helper function called add_used_btfs. This
> > will be later useful in exception frame generation to take BTF
> > references with their modules, so that we can keep the modules alive
> > whose functions may be required to unwind a given BPF program when it
> > eventually throws an exception.
>
> [...]
>
> > Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
> > ---
>
> Acked-by: Eduard Zingerman <eddyz87@gmail.com>
>
> [...]
>
> > +static int add_used_btf(struct bpf_verifier_env *env, struct btf *btf)
>
> [...]
>
> > +     if (env->used_btf_cnt >= MAX_USED_BTFS) {
> > +             err = -E2BIG;
> > +             goto err;
>
> Nit: could be "return -E2BIG"
>

Ack, will fix.

> > +     }
> > +
> > +     btf_mod = &env->used_btfs[env->used_btf_cnt];
> > +     btf_mod->btf = btf;
> > +     btf_mod->module = NULL;
> > +
> > +     /* if we reference variables from kernel module, bump its refcount */
> > +     if (btf_is_module(btf)) {
> > +             btf_mod->module = btf_try_get_module(btf);
> > +             if (!btf_mod->module) {
> > +                     err = -ENXIO;
> > +                     goto err;
>
> Nit: could be "return -ENXIO"
>

Ack.

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [RFC PATCH v1 06/14] bpf: Adjust frame descriptor pc on instruction patching
  2024-02-15 16:31   ` Eduard Zingerman
@ 2024-02-16 21:52     ` Kumar Kartikeya Dwivedi
  2024-02-17 14:08       ` Eduard Zingerman
  0 siblings, 1 reply; 53+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2024-02-16 21:52 UTC (permalink / raw)
  To: Eduard Zingerman
  Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Martin KaFai Lau, David Vernet, Tejun Heo, Raj Sahu,
	Dan Williams, Rishabh Iyer, Sanidhya Kashyap

On Thu, 15 Feb 2024 at 17:31, Eduard Zingerman <eddyz87@gmail.com> wrote:
>
> On Thu, 2024-02-01 at 04:21 +0000, Kumar Kartikeya Dwivedi wrote:
>
> [...]
>
> > +static int adjust_subprog_frame_descs_after_remove(struct bpf_verifier_env *env, u32 off, u32 cnt)
> > +{
> > +     for (int i = 0; i < env->subprog_cnt; i++) {
> > +             struct bpf_exception_frame_desc_tab *fdtab = subprog_info(env, i)->fdtab;
> > +
> > +             if (!fdtab)
> > +                     continue;
> > +             for (int j = 0; j < fdtab->cnt; j++) {
> > +                     /* Part of a subprog_info whose instructions were removed partially, but the fdtab remained. */
> > +                     if (fdtab->desc[j]->pc >= off && fdtab->desc[j]->pc < off + cnt) {
> > +                             void *p = fdtab->desc[j];
> > +                             if (j < fdtab->cnt - 1)
> > +                                     memmove(fdtab->desc + j, fdtab->desc + j + 1, sizeof(fdtab->desc[0]) * (fdtab->cnt - j - 1));
> > +                             kfree(p);
>
> Is it necessary to release btf references for desc entries that are removed?
> Those that were grabbed by add_used_btf() in gen_exception_frame_desc_iter_entry().
>

I think these btf pointers are just a view; the real owner is the
used_btfs array. In case of failure, the references are dropped as part
of bpf_verifier_env cleanup; in case of success, they are transferred to
the bpf_prog struct and released on bpf_prog cleanup.
So I think it should be ok, but I will recheck.

> [...]

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [RFC PATCH v1 07/14] bpf: Use hidden subprog trampoline for bpf_throw
  2024-02-15 22:11   ` Eduard Zingerman
@ 2024-02-16 21:59     ` Kumar Kartikeya Dwivedi
  2024-02-17 14:22       ` Eduard Zingerman
  0 siblings, 1 reply; 53+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2024-02-16 21:59 UTC (permalink / raw)
  To: Eduard Zingerman
  Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Martin KaFai Lau, David Vernet, Tejun Heo, Raj Sahu,
	Dan Williams, Rishabh Iyer, Sanidhya Kashyap

On Thu, 15 Feb 2024 at 23:11, Eduard Zingerman <eddyz87@gmail.com> wrote:
>
> On Thu, 2024-02-01 at 04:21 +0000, Kumar Kartikeya Dwivedi wrote:
> > When we perform a bpf_throw kfunc call, callee saved registers in BPF
> > calling convention (R6-R9) may end up getting saved and clobbered by
> > bpf_throw. Typically, the kernel will restore the registers before
> > returning back to the BPF program, but in case of bpf_throw, the
> > function will never return. Therefore, any acquired resources sitting in
> > these registers will end up getting destroyed if not saved on the
> > stack, without any cleanup happening for them.
>
> Could you please rephrase this description a bit?
> It took me a while to figure out the difference between regular bpf
> calls and kfunc calls. Something like:
>
>   - For regular bpf subprogram calls jit emits code that pushes R6-R9 to stack
>     before jumping into callee.
>   - For kfunc calls jit emits instructions that do not guarantee that
>     R6-R9 would be preserved on stack. E.g. for x86 kfunc call is translated
>     as "call" instruction, which only pushes return address to stack.
>

Will rephrase like this, thanks for the suggestions.

> --
>
> Also, what do you think about the following hack:
> - declare a hidden kfunc "bpf_throw_r(u64 r6, u64 r7, u64 r8, u64 r9)";
> - replace all calls to bpf_throw() with calls to bpf_throw_r()
>   (r1-r5 do not have to be preserved anyways).
> Thus avoiding the need to introduce the trampoline.
>

I think we can do such a thing as well, but there are other tradeoffs.

Do you mean that R6 to R9 would be copied to R1 to R5? We would have to
special-case such calls in each architecture's JIT and add extra code
to handle them, since fixups from the verifier would also need to pass
a 6th argument, the cookie value for the bpf_throw call, which does not
fit in the 5-argument limit for existing kfuncs. I did contemplate
this solution but then decided against it for these reasons.

One of the advantages of the bpf_throw_tramp approach is that it does
not increase code size for all callees, since the saving is done only
when the subprog is actually called. We could do something similar for
bpf_throw_r, but it would live in architecture-specific code in the JIT
or in some arch_bpf_throw_r instead.

Let me know if you suggested something different from what I understood above.

> [...]
>
>

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [RFC PATCH v1 08/14] bpf: Compute used callee saved registers for subprogs
  2024-02-15 22:12   ` Eduard Zingerman
@ 2024-02-16 22:02     ` Kumar Kartikeya Dwivedi
  2024-02-17 14:26       ` Eduard Zingerman
  0 siblings, 1 reply; 53+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2024-02-16 22:02 UTC (permalink / raw)
  To: Eduard Zingerman
  Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Martin KaFai Lau, David Vernet, Tejun Heo, Raj Sahu,
	Dan Williams, Rishabh Iyer, Sanidhya Kashyap

On Thu, 15 Feb 2024 at 23:12, Eduard Zingerman <eddyz87@gmail.com> wrote:
>
> On Thu, 2024-02-01 at 04:21 +0000, Kumar Kartikeya Dwivedi wrote:
>
> [...]
>
> > diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> > index 942243cba9f1..aeaf97b0a749 100644
> > --- a/kernel/bpf/verifier.c
> > +++ b/kernel/bpf/verifier.c
> > @@ -2942,6 +2942,15 @@ static int check_subprogs(struct bpf_verifier_env *env)
> >                   insn[i].src_reg == 0 &&
> >                   insn[i].imm == BPF_FUNC_tail_call)
> >                       subprog[cur_subprog].has_tail_call = true;
> > +             /* Collect callee regs used in the subprog. */
> > +             if (insn[i].dst_reg == BPF_REG_6 || insn[i].src_reg == BPF_REG_6)
> > +                     subprog[cur_subprog].callee_regs_used[0] = true;
> > +             if (insn[i].dst_reg == BPF_REG_7 || insn[i].src_reg == BPF_REG_7)
> > +                     subprog[cur_subprog].callee_regs_used[1] = true;
> > +             if (insn[i].dst_reg == BPF_REG_8 || insn[i].src_reg == BPF_REG_8)
> > +                     subprog[cur_subprog].callee_regs_used[2] = true;
> > +             if (insn[i].dst_reg == BPF_REG_9 || insn[i].src_reg == BPF_REG_9)
> > +                     subprog[cur_subprog].callee_regs_used[3] = true;
>
> Nit: Maybe move bpf_jit_comp.c:detect_reg_usage() to some place available to
>      both verifier and jit? Just to keep all related code in one place.
>      E.g. technically nothing prevents the x86 jit from doing this detection
>      in a more precise manner, as a "fixed point" computation.
>

Hm, I remember I did this and then decided against it for some reason,
but I can't remember now.
I will make this change though; if I remember why I didn't go ahead
with it, I will reply again.

Also, what did you mean by the final sentence?

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [RFC PATCH v1 05/14] bpf: Implement BPF exception frame descriptor generation
  2024-02-16 11:23     ` Eduard Zingerman
@ 2024-02-16 22:06       ` Kumar Kartikeya Dwivedi
  2024-02-17 17:14         ` Eduard Zingerman
  0 siblings, 1 reply; 53+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2024-02-16 22:06 UTC (permalink / raw)
  To: Eduard Zingerman
  Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Martin KaFai Lau, David Vernet, Tejun Heo, Raj Sahu,
	Dan Williams, Rishabh Iyer, Sanidhya Kashyap

On Fri, 16 Feb 2024 at 12:23, Eduard Zingerman <eddyz87@gmail.com> wrote:
>
> On Thu, 2024-02-15 at 20:24 +0200, Eduard Zingerman wrote:
> [...]
>
> > > +   case STACK_MISC:
> > > +   case STACK_INVALID:
> > > +           /* Create an invalid entry for MISC and INVALID */
> > > +           ret = gen_exception_frame_desc_reg_entry(env, &not_init_reg, off, frame->frameno);
> > > +           if (ret < 0)
> > > +                   return 0;
> >
> > No tests are failing if I comment out this block.
> > Looking at the merge_frame_desc() logic it appears to me that fd
> > entries with fd->type == NOT_INIT would only be merged with other
> > NOT_INIT entries. What is the point of having such entries at all?
>
> Oh, I got it, it's a mark that ensures that no merge would happen with
> e.g. some resource pointer. Makes sense, sorry for the noise.
>
> Still, I think that my STACK_ZERO remark was correct.

I will add tests to exercise these cases, but the idea for STACK_ZERO
was to treat it as if we had a NULL value in that stack slot, thus
allowing merging with other resource pointers. Sometimes, when
NULL-initializing something, the slot can be marked as STACK_ZERO in the
verifier state. Therefore, we would prefer to treat it the same as the
case where a scalar reg known to be zero is spilled to the stack (that
is what we do by using a fake struct bpf_reg_state).
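
For illustration, the kind of program this is meant to keep working is
(schematic, not one of the selftests; assumes the compiler actually
spills p to the stack):

#include <vmlinux.h>
#include <bpf/bpf_helpers.h>
#include "bpf_experimental.h"

struct {
	__uint(type, BPF_MAP_TYPE_RINGBUF);
	__uint(max_entries, 4096);
} ringbuf SEC(".maps");

SEC("tc")
int cond_acquire(struct __sk_buff *ctx)
{
	void *p = NULL;	/* slot is STACK_ZERO/NULL on this path */

	if (ctx->len > 42)
		p = bpf_ringbuf_reserve(&ringbuf, 8, 0);
	/* the same stack slot holds NULL on one path and ringbuf memory
	 * on the other; both paths can reach the same bpf_throw */
	if (ctx->mark == 7)
		bpf_throw(0);
	if (p)
		bpf_ringbuf_discard(p, 0);
	return 0;
}

char _license[] SEC("license") = "GPL";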

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [RFC PATCH v1 05/14] bpf: Implement BPF exception frame descriptor generation
  2024-02-15 18:24   ` Eduard Zingerman
  2024-02-16 11:23     ` Eduard Zingerman
@ 2024-02-16 22:24     ` Kumar Kartikeya Dwivedi
  1 sibling, 0 replies; 53+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2024-02-16 22:24 UTC (permalink / raw)
  To: Eduard Zingerman
  Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Martin KaFai Lau, David Vernet, Tejun Heo, Raj Sahu,
	Dan Williams, Rishabh Iyer, Sanidhya Kashyap

On Thu, 15 Feb 2024 at 19:24, Eduard Zingerman <eddyz87@gmail.com> wrote:
>
> On Thu, 2024-02-01 at 04:21 +0000, Kumar Kartikeya Dwivedi wrote:
>
> Question: are there any real-life programs adapted to use exceptions
> with the cleanup feature? It would be interesting to see how robust
> one-descriptor-per-pc is in practice, and also how it affects memory
> consumption during verification.
>

I tried this series on sched-ext schedulers with some helper macros
and it worked well.
I can post stats on memory consumption with the next version (or make
it part of veristat, whichever is more desirable).

I think the downside of this approach shows when you have different
resources in the same register/slot on two different paths which then
hit the same bpf_throw. Granted, it will not work in that case the way
things are set up right now. There are ways to address that (ranging
from compiler passes that hoist the throw call into the predecessor
basic blocks of the separate paths when we detect such a pattern, to
emitting path hints about resource types during JIT which bpf_throw can
pick up), but I didn't encounter any examples so far where this came
up, and when it does, you can mostly rewrite things a bit differently
to make it work.

One of the changes, though, that I plan to make in the next posting is
keeping duplicate entries for the same resource around when the first
frame descriptor is created for a pc. This increases the chances of
merging correctly in case the second path intersects with us only in
some slots; for the purposes of releasing resources, that intersection
is sufficient.

For example, you could have the same resource in R8 and R9, while the
second path has it only in R9 and not R8; both can then merge if we
erase R8 (or any other duplicates) in the original frame desc.

Right now, we simply do a release_reference, so whenever we encounter
R8, we release its ref_obj_id and mark everything sharing the same id
invalid, meaning we never encounter R9 in the verifier state. In the
above case, these frame descs would not match, but they could have if
we had discovered R9 before R8.

If you think this is not sufficient, please let me know. It is
certainly tricky and I might be underestimating the difficulty of
getting this right.

> The algorithm makes sense to me, a few comments/nits below.
>
> [...]
>
> > +static int find_and_merge_frame_desc(struct bpf_verifier_env *env, struct bpf_exception_frame_desc_tab *fdtab, u64 pc, struct bpf_frame_desc_reg_entry *fd)
> > +{
> > +     struct bpf_exception_frame_desc **descs = NULL, *desc = NULL, *p;
> > +     int ret = 0;
> > +
> > +     for (int i = 0; i < fdtab->cnt; i++) {
> > +             if (pc != fdtab->desc[i]->pc)
> > +                     continue;
> > +             descs = &fdtab->desc[i];
> > +             desc = fdtab->desc[i];
> > +             break;
> > +     }
> > +
> > +     if (!desc) {
> > +             verbose(env, "frame_desc: find_and_merge: cannot find frame descriptor for pc=%llu, creating new entry\n", pc);
> > +             return -ENOENT;
> > +     }
> > +
> > +     if (fd->off < 0)
> > +             goto stack;
>
> Nit: maybe write it down as
>
>         if (fd->off >= 0)
>                 return merge_frame_desc(...);
>
>      and avoid goto?
>

Ack, will fix.

> [...]
>
> > +static int gen_exception_frame_desc_stack_entry(struct bpf_verifier_env *env, struct bpf_func_state *frame, int stack_off)
> > +{
> > +     int spi = stack_off / BPF_REG_SIZE, off = -stack_off - 1;
> > +     struct bpf_reg_state *reg, not_init_reg, null_reg;
> > +     int slot_type, ret;
> > +
> > +     __mark_reg_not_init(env, &not_init_reg);
> > +     __mark_reg_known_zero(&null_reg);
>
> __mark_reg_known_zero() does not set .type field,
> thus null_reg.type value is undefined.
>

Hmm, good catch, will fix it.

> > +
> > +     slot_type = frame->stack[spi].slot_type[BPF_REG_SIZE - 1];
> > +     reg = &frame->stack[spi].spilled_ptr;
> > +
> > +     switch (slot_type) {
> > +     case STACK_SPILL:
> > +             /* We skip all kinds of scalar registers, except NULL values, which consume a slot. */
> > +             if (is_spilled_scalar_reg(&frame->stack[spi]) && !register_is_null(&frame->stack[spi].spilled_ptr))
> > +                     break;
> > +             ret = gen_exception_frame_desc_reg_entry(env, reg, off, frame->frameno);
> > +             if (ret < 0)
> > +                     return ret;
> > +             break;
> > +     case STACK_DYNPTR:
> > +             /* Keep iterating until we find the first slot. */
> > +             if (!reg->dynptr.first_slot)
> > +                     break;
> > +             ret = gen_exception_frame_desc_dynptr_entry(env, reg, off, frame->frameno);
> > +             if (ret < 0)
> > +                     return ret;
> > +             break;
> > +     case STACK_ITER:
> > +             /* Keep iterating until we find the first slot. */
> > +             if (!reg->ref_obj_id)
> > +                     break;
> > +             ret = gen_exception_frame_desc_iter_entry(env, reg, off, frame->frameno);
> > +             if (ret < 0)
> > +                     return ret;
> > +             break;
> > +     case STACK_MISC:
> > +     case STACK_INVALID:
> > +             /* Create an invalid entry for MISC and INVALID */
> > +             ret = gen_exception_frame_desc_reg_entry(env, &not_init_reg, off, frame->frameno);
> > +             if (ret < 0)
> > +                     return 0;
>
> No tests are failing if I comment out this block.
> Looking at the merge_frame_desc() logic it appears to me that fd
> entries with fd->type == NOT_INIT would only be merged with other
> NOT_INIT entries. What is the point of having such entries at all?
>

I assume you figured it out based on the other email.
Basically, we create entries so that no merge can occur for the slot.

> > +             break;
> > +     case STACK_ZERO:
> > +             reg = &null_reg;
> > +             for (int i = BPF_REG_SIZE - 1; i >= 0; i--) {
> > +                     if (frame->stack[spi].slot_type[i] != STACK_ZERO)
> > +                             reg = &not_init_reg;
> > +             }
> > +             ret = gen_exception_frame_desc_reg_entry(env, &null_reg, off, frame->frameno);
> > +             if (ret < 0)
> > +                     return ret;
>
> Same here, no tests fail if the STACK_ZERO block is commented out.
> In general, what is the point of adding STACK_ZERO entries?
> There is logic in place to merge NULL and non-NULL entries,
> but how is that different from not adding NULL entries in the first place?
> find_and_merge_frame_desc() does a linear scan over bpf_exception_frame_desc->stack
> and does not rely on entries being sorted by the .off field.
>

Answered in the other email as well. I will add more tests for this case.
STACK_ZERO also creates NULL entries; we just treat them the same as a
NULL reg and insert an entry using a fake reg initialized to zero.
Also, I think we need to pass reg, not &null_reg, in this case. Will fix
this part of the code and add tests.

While it does not rely on entries being sorted, we still find the one
with the same offset.
I can sort the table to enable a binary search at runtime; I was mostly
worried that constant re-sorting during verification may end up costing
more when the number of frame descriptors is low, but it makes sense at
least for runtime, when the array is only searched.
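
Once sorted, the runtime lookup would be something like (sketch):

#include <linux/bsearch.h>

static int frame_desc_cmp(const void *key, const void *elt)
{
	u64 pc = *(const u64 *)key;
	const struct bpf_exception_frame_desc *desc =
		*(struct bpf_exception_frame_desc * const *)elt;

	if (pc < desc->pc)
		return -1;
	return pc > desc->pc;
}

static struct bpf_exception_frame_desc *
find_frame_desc_sorted(struct bpf_exception_frame_desc_tab *fdtab, u64 pc)
{
	struct bpf_exception_frame_desc **descp;

	descp = bsearch(&pc, fdtab->desc, fdtab->cnt,
			sizeof(fdtab->desc[0]), frame_desc_cmp);
	return descp ? *descp : NULL;
}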

> > +             break;
> > +     default:
> > +             verbose(env, "verifier internal error: frame%d stack off=%d slot_type=%d missing handling for exception frame generation\n",
> > +                     frame->frameno, off, slot_type);
> > +             return -EFAULT;
> > +     }
> > +     return 0;
> > +}

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [RFC PATCH v1 10/14] bpf, x86: Implement runtime resource cleanup for exceptions
  2024-02-16 12:02   ` Eduard Zingerman
@ 2024-02-16 22:28     ` Kumar Kartikeya Dwivedi
  2024-02-19 12:01       ` Eduard Zingerman
  0 siblings, 1 reply; 53+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2024-02-16 22:28 UTC (permalink / raw)
  To: Eduard Zingerman
  Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Martin KaFai Lau, David Vernet, Tejun Heo, Raj Sahu,
	Dan Williams, Rishabh Iyer, Sanidhya Kashyap

On Fri, 16 Feb 2024 at 13:02, Eduard Zingerman <eddyz87@gmail.com> wrote:
>
> On Thu, 2024-02-01 at 04:21 +0000, Kumar Kartikeya Dwivedi wrote:
>
> [...]
>
> > +int bpf_cleanup_resource_reg(struct bpf_frame_desc_reg_entry *fd, void *ptr)
> > +{
>
> Question:
> Maybe I missed something in frame descriptor construction process,
> but it appears like there is nothing guarding against double cleanup.
> E.g. consider a program like below:
>
>    r6 = ... PTR_TO_SOCKET ...
>    r7 = r6
>    *(u64 *)(r10 - 16) = r6
>    call bpf_throw()
>
> Would bpf_cleanup_resource_reg() be called for all r6, r7 and fp[-16],
> thus executing destructor for the same object multiple times?

Good observation. My idea was to rely on release_reference so that
duplicate resources get erased from verifier state in such a way that
we don't go over the same ref_obj_id twice. IIUC, we start from the
current frame, and since bpf_for_each_reg_in_vstate iterates over all
frames, every register/stack slot sharing the ref_obj_id is destroyed,
so we wouldn't encounter the same resource again; hence the frame
descriptor should have at most one entry per resource. We iterate over
the stack frame first since the location of stack slots holding
resources is relatively stable, which increases the chances of merging
across different paths.
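
That is, roughly what release_reference() already does today
(paraphrased sketch):

struct bpf_verifier_state *vstate = env->cur_state;
struct bpf_func_state *state;
struct bpf_reg_state *reg;

/* wipe every copy sharing the same ref_obj_id, so the walk generating
 * frame descriptor entries sees each resource exactly once */
bpf_for_each_reg_in_vstate(vstate, state, reg, ({
	if (reg->ref_obj_id == ref_obj_id)
		mark_reg_invalid(env, reg);
}));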

>
> > +     u64 reg_value = ptr ? *(u64 *)ptr : 0;
> > +     struct btf_struct_meta *meta;
> > +     const struct btf_type *t;
> > +     u32 dtor_id;
> > +
> > +     switch (fd->type) {
> > +     case PTR_TO_SOCKET:
> > +     case PTR_TO_TCP_SOCK:
> > +     case PTR_TO_SOCK_COMMON:
> > +             if (reg_value)
> > +                     bpf_sk_release_dtor((void *)reg_value);
> > +             return 0;
>
> [...]

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [RFC PATCH v1 02/14] bpf: Process global subprog's exception propagation
  2024-02-16 21:50     ` Kumar Kartikeya Dwivedi
@ 2024-02-17 14:04       ` Eduard Zingerman
  0 siblings, 0 replies; 53+ messages in thread
From: Eduard Zingerman @ 2024-02-17 14:04 UTC (permalink / raw)
  To: Kumar Kartikeya Dwivedi
  Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Martin KaFai Lau, David Vernet, Tejun Heo, Raj Sahu,
	Dan Williams, Rishabh Iyer, Sanidhya Kashyap

On Fri, 2024-02-16 at 22:50 +0100, Kumar Kartikeya Dwivedi wrote:
[...]

> > Also, did you consider global subprograms that always throw?
> > E.g. do some logging and unconditionally call bpf_throw().
> > 
> 
> I have an example of that in the exceptions test suite, but I will add
> a test for it with lingering references around.

I meant that for some global subprograms it is not necessary to
explore both throwing and non-throwing branches, e.g.:

  void my_throw(void) {
    bpf_printk("oh no, oh no, oh no-no-no-...");
    bpf_throw(0);
  }

  SEC("tp")
  int foo(...) {
    ...
    if (a > 10)
      my_throw();
    ... here 'a' is always <= 10 ...
  }

[...]

> > Nit: Maybe move this log entry to do_check?
> >      It would be printed right before returning to do_check() anyways.
> >      Maybe add a log level check?
> > 
> 
> Hmm, true. I was actually even considering whether all frame_desc logs
> should also be LOG_LEVEL2?

Right, makes sense.

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [RFC PATCH v1 06/14] bpf: Adjust frame descriptor pc on instruction patching
  2024-02-16 21:52     ` Kumar Kartikeya Dwivedi
@ 2024-02-17 14:08       ` Eduard Zingerman
  0 siblings, 0 replies; 53+ messages in thread
From: Eduard Zingerman @ 2024-02-17 14:08 UTC (permalink / raw)
  To: Kumar Kartikeya Dwivedi
  Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Martin KaFai Lau, David Vernet, Tejun Heo, Raj Sahu,
	Dan Williams, Rishabh Iyer, Sanidhya Kashyap

On Fri, 2024-02-16 at 22:52 +0100, Kumar Kartikeya Dwivedi wrote:
[...]

> I think these btf pointers are just a view; the real owner is the
> used_btfs array. In case of failure, the references are dropped as part
> of bpf_verifier_env cleanup; in case of success, they are transferred to
> the bpf_prog struct and released on bpf_prog cleanup.
> So I think it should be ok, but I will recheck.

You are correct and I'm wrong,
add_used_btf() indeed pushes link to env->used_btfs,
sorry for the noise.

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [RFC PATCH v1 07/14] bpf: Use hidden subprog trampoline for bpf_throw
  2024-02-16 21:59     ` Kumar Kartikeya Dwivedi
@ 2024-02-17 14:22       ` Eduard Zingerman
  0 siblings, 0 replies; 53+ messages in thread
From: Eduard Zingerman @ 2024-02-17 14:22 UTC (permalink / raw)
  To: Kumar Kartikeya Dwivedi
  Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Martin KaFai Lau, David Vernet, Tejun Heo, Raj Sahu,
	Dan Williams, Rishabh Iyer, Sanidhya Kashyap

On Fri, 2024-02-16 at 22:59 +0100, Kumar Kartikeya Dwivedi wrote:
[...]

> > Also, what do you think about the following hack:
> > - declare a hidden kfunc "bpf_throw_r(u64 r6, u64 r7, u64 r8, u64 r9)";
> > - replace all calls to bpf_throw() with calls to bpf_throw_r()
> >   (r1-r5 do not have to be preserved anyways).
> > Thus avoiding the need to introduce the trampoline.
> > 
> 
> I think we can do such a thing as well, but there are other tradeoffs.
> 
> Do you mean that R6 to R9 would be copied to R1 to R5? We would have to
> special-case such calls in each architecture's JIT and add extra code
> to handle them, since fixups from the verifier would also need to pass
> a 6th argument, the cookie value for the bpf_throw call, which does not
> fit in the 5-argument limit for existing kfuncs. I did contemplate
> this solution but then decided against it for these reasons.
> 
> One of the advantages of the bpf_throw_tramp approach is that it does
> not increase code size for all callees, since the saving is done only
> when the subprog is actually called. We could do something similar for
> bpf_throw_r, but it would live in architecture-specific code in the JIT
> or in some arch_bpf_throw_r instead.
> 
> Let me know if you suggested something different from what I understood above.

Forgot about the cookie; however, R6-R9 fit in R2-R5, so the cookie
would be fine. arch_bpf_throw_r() that saves R6-R9 right after the call
is probably better than plain bpf register copying.

But you are correct that the trampoline allows uniform processing in
arch_bpf_cleanup_frame_resource(), so it would be less C code to
implement this feature in the end.
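
For the record, the rewrite I had in mind was along these lines
(sketch emitted during fixup; bpf_throw_r and its btf id are
hypothetical):

struct bpf_insn fixup[] = {
	/* R1 already holds the cookie passed to bpf_throw() */
	BPF_MOV64_REG(BPF_REG_2, BPF_REG_6),
	BPF_MOV64_REG(BPF_REG_3, BPF_REG_7),
	BPF_MOV64_REG(BPF_REG_4, BPF_REG_8),
	BPF_MOV64_REG(BPF_REG_5, BPF_REG_9),
	/* call bpf_throw_r(cookie, r6, r7, r8, r9) */
	BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, BPF_PSEUDO_KFUNC_CALL, 0,
		     bpf_throw_r_btf_id),
};

The moves are plain BPF insns, so every jit handles them for free; the
cost is four extra insns at each throw site instead of one shared
trampoline.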

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [RFC PATCH v1 08/14] bpf: Compute used callee saved registers for subprogs
  2024-02-16 22:02     ` Kumar Kartikeya Dwivedi
@ 2024-02-17 14:26       ` Eduard Zingerman
  0 siblings, 0 replies; 53+ messages in thread
From: Eduard Zingerman @ 2024-02-17 14:26 UTC (permalink / raw)
  To: Kumar Kartikeya Dwivedi
  Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Martin KaFai Lau, David Vernet, Tejun Heo, Raj Sahu,
	Dan Williams, Rishabh Iyer, Sanidhya Kashyap

On Fri, 2024-02-16 at 23:02 +0100, Kumar Kartikeya Dwivedi wrote:
[...]

> > Nit: Maybe move bpf_jit_comp.c:detect_reg_usage() to some place available to
> >      both verifier and jit? Just to keep all related code in one place.
> >      E.g. technically nothing prevents the x86 jit from doing this detection
> >      in a more precise manner, as a "fixed point" computation.
> > 
> 
> Hm, I remember I did this and then decided against it for some reason,
> but I can't remember now.
> I will make this change though, if I remember why I didn't go ahead
> with it, I will reply again.
> 
> Also, what did you mean by the final sentence?

I tried to give some reasoning for why the x86 jit implementation
might change. On second thought, it is not the best reasoning, so
please ignore it. My main point here was about duplication of coupled
code: if one copy were changed, the other would have to be changed too.

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [RFC PATCH v1 05/14] bpf: Implement BPF exception frame descriptor generation
  2024-02-16 22:06       ` Kumar Kartikeya Dwivedi
@ 2024-02-17 17:14         ` Eduard Zingerman
  2024-02-20 21:58           ` Kumar Kartikeya Dwivedi
  0 siblings, 1 reply; 53+ messages in thread
From: Eduard Zingerman @ 2024-02-17 17:14 UTC (permalink / raw)
  To: Kumar Kartikeya Dwivedi
  Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Martin KaFai Lau, David Vernet, Tejun Heo, Raj Sahu,
	Dan Williams, Rishabh Iyer, Sanidhya Kashyap

On Fri, 2024-02-16 at 23:06 +0100, Kumar Kartikeya Dwivedi wrote:
[...]

> I will add tests to exercise these cases, but the idea for STACK_ZERO
> was to treat it as if we had a NULL value in that stack slot, thus
> allowing merging with other resource pointers. Sometimes, when
> NULL-initializing something, the slot can be marked as STACK_ZERO in the
> verifier state. Therefore, we would prefer to treat it the same as the
> case where a scalar reg known to be zero is spilled to the stack (that
> is what we do by using a fake struct bpf_reg_state).

Agree that it is useful to merge 0/NULL/STACK_ZERO with PTR_TO_SOMETHING.
What I meant is that merging with 0 is a noop and there is no need to
add a new fd entry. However, I missed the following check:

+ static int gen_exception_frame_desc_reg_entry(struct bpf_verifier_env *env, struct bpf_reg_state *reg, int off, int frameno)
+ {
+ 	struct bpf_frame_desc_reg_entry fd = {};
+ 
+ 	if ((!reg->ref_obj_id && reg->type != NOT_INIT) || reg->type == SCALAR_VALUE)
+ 		return 0;

So, the intent is to skip adding fd entry if register has scalar type, right?
I tried the following test case:

SEC("?tc")
__success
int test(struct __sk_buff *ctx)
{
    asm volatile (
       "r7 = *(u32 *)(r1 + 0);		\
	r1 = %[ringbuf] ll;		\
	r2 = 8;				\
	r3 = 0;				\
	r0 = 0;				\
	if r7 > 42 goto +1;		\
	call %[bpf_ringbuf_reserve];	\
	*(u64 *)(r10 - 8) = r0;		\
	r0 = 0;				\
	r1 = 0;				\
	call bpf_throw;			\
    "	:
	: __imm(bpf_ringbuf_reserve),
	  __imm_addr(ringbuf)
	: "memory");
    return 0;
}

And it adds fp[-8] entry for one branch and skips fp[-8] for the other.
However, the following test passes as well (note 'r0 = 7'):

SEC("?tc")
__success
int same_resource_many_regs(struct __sk_buff *ctx)
{
    asm volatile (
       "r7 = *(u32 *)(r1 + 0);		\
	r1 = %[ringbuf] ll;		\
	r2 = 8;				\
	r3 = 0;				\
	r0 = 7;	/* !!! */		\
	if r7 > 42 goto +1;		\
	call %[bpf_ringbuf_reserve];	\
	*(u64 *)(r10 - 8) = r0;		\
	r0 = 0;				\
	r1 = 0;				\
	call bpf_throw;			\
    "	:
	: __imm(bpf_ringbuf_reserve),
	  __imm_addr(ringbuf)
	: "memory");
    return 0;
}

Which is probably not correct, as scalar 7 would be used as a
parameter for ringbuf destructor, right?

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [RFC PATCH v1 10/14] bpf, x86: Implement runtime resource cleanup for exceptions
  2024-02-16 22:28     ` Kumar Kartikeya Dwivedi
@ 2024-02-19 12:01       ` Eduard Zingerman
  0 siblings, 0 replies; 53+ messages in thread
From: Eduard Zingerman @ 2024-02-19 12:01 UTC (permalink / raw)
  To: Kumar Kartikeya Dwivedi
  Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Martin KaFai Lau, David Vernet, Tejun Heo, Raj Sahu,
	Dan Williams, Rishabh Iyer, Sanidhya Kashyap

On Fri, 2024-02-16 at 23:28 +0100, Kumar Kartikeya Dwivedi wrote:
[...]

> > Question:
> > Maybe I missed something in frame descriptor construction process,
> > but it appears like there is nothing guarding against double cleanup.
> > E.g. consider a program like below:
> > 
> >    r6 = ... PTR_TO_SOCKET ...
> >    r7 = r6
> >    *(u64 *)(r10 - 16) = r6
> >    call bpf_throw()
> > 
> > Would bpf_cleanup_resource_reg() be called for all r6, r7 and fp[-16],
> > thus executing destructor for the same object multiple times?
> 
> Good observation. My idea was to rely on release_reference so that
> duplicate resources get erased from verifier state in such a way that
> we don't go over the same ref_obj_id twice. IIUC, we start from the
> current frame, and since bpf_for_each_reg_in_vstate iterates over all
> frames, every register/stack slot sharing the ref_obj_id is destroyed,
> so we wouldn't encounter the same resource again; hence the frame
> descriptor should have at most one entry per resource. We iterate over
> the stack frame first since the location of stack slots holding
> resources is relatively stable, which increases the chances of merging
> across different paths.

Oh, right, thank you for explaining.
At first I thought that release_reference was only for the verifier to
keep track of whether there are resources that bpf_throw() can't clean
up at the moment.

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [RFC PATCH v1 05/14] bpf: Implement BPF exception frame descriptor generation
  2024-02-17 17:14         ` Eduard Zingerman
@ 2024-02-20 21:58           ` Kumar Kartikeya Dwivedi
  0 siblings, 0 replies; 53+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2024-02-20 21:58 UTC (permalink / raw)
  To: Eduard Zingerman
  Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Martin KaFai Lau, David Vernet, Tejun Heo, Raj Sahu,
	Dan Williams, Rishabh Iyer, Sanidhya Kashyap

On Sat, 17 Feb 2024 at 18:14, Eduard Zingerman <eddyz87@gmail.com> wrote:
>
> On Fri, 2024-02-16 at 23:06 +0100, Kumar Kartikeya Dwivedi wrote:
> [...]
>
> > I will add tests to exercise these cases, but the idea for STACK_ZERO
> > was to treat it as if we had a NULL value in that stack slot, thus
> > allowing merging with other resource pointers. Sometimes, when
> > NULL-initializing something, the slot can be marked as STACK_ZERO in the
> > verifier state. Therefore, we would prefer to treat it the same as the
> > case where a scalar reg known to be zero is spilled to the stack (that
> > is what we do by using a fake struct bpf_reg_state).
>
> Agree that it is useful to merge 0/NULL/STACK_ZERO with PTR_TO_SOMETHING.
> What I meant is that merging with 0 is a noop and there is no need to
> add a new fd entry. However, I missed the following check:
>
> + static int gen_exception_frame_desc_reg_entry(struct bpf_verifier_env *env, struct bpf_reg_state *reg, int off, int frameno)
> + {
> +       struct bpf_frame_desc_reg_entry fd = {};
> +
> +       if ((!reg->ref_obj_id && reg->type != NOT_INIT) || reg->type == SCALAR_VALUE)
> +               return 0;
>
> So, the intent is to skip adding an fd entry if the register has a scalar type, right?
> I tried the following test case:
>
> SEC("?tc")
> __success
> int test(struct __sk_buff *ctx)
> {
>     asm volatile (
>        "r7 = *(u32 *)(r1 + 0);          \
>         r1 = %[ringbuf] ll;             \
>         r2 = 8;                         \
>         r3 = 0;                         \
>         r0 = 0;                         \
>         if r7 > 42 goto +1;             \
>         call %[bpf_ringbuf_reserve];    \
>         *(u64 *)(r10 - 8) = r0;         \
>         r0 = 0;                         \
>         r1 = 0;                         \
>         call bpf_throw;                 \
>     "   :
>         : __imm(bpf_ringbuf_reserve),
>           __imm_addr(ringbuf)
>         : "memory");
>     return 0;
> }
>
> And it adds an fp[-8] entry for one branch and skips fp[-8] for the other.
> However, the following test passes as well (note 'r0 = 7'):
>
> SEC("?tc")
> __success
> int same_resource_many_regs(struct __sk_buff *ctx)
> {
>     asm volatile (
>        "r7 = *(u32 *)(r1 + 0);          \
>         r1 = %[ringbuf] ll;             \
>         r2 = 8;                         \
>         r3 = 0;                         \
>         r0 = 7; /* !!! */               \
>         if r7 > 42 goto +1;             \
>         call %[bpf_ringbuf_reserve];    \
>         *(u64 *)(r10 - 8) = r0;         \
>         r0 = 0;                         \
>         r1 = 0;                         \
>         call bpf_throw;                 \
>     "   :
>         : __imm(bpf_ringbuf_reserve),
>           __imm_addr(ringbuf)
>         : "memory");
>     return 0;
> }
>
> Which is probably not correct, as the scalar 7 would be used as a
> parameter for the ringbuf destructor, right?

I think you are right; we probably need to create an unmergeable slot
when we find a non-zero scalar value in the stack as well.
I will fix this and add tests.
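
Roughly something like this on top of the check in
gen_exception_frame_desc_reg_entry (untested; gen_unmergeable_entry is
a stand-in name for whatever ends up marking the slot as conflicting):

+       if (reg->type == SCALAR_VALUE) {
+               /* A scalar known to be zero acts as NULL and may merge
+                * with a resource pointer; any other scalar must create
+                * an unmergeable entry for this slot.
+                */
+               if (tnum_is_const(reg->var_off) && reg->var_off.value == 0)
+                       return 0;
+               return gen_unmergeable_entry(env, off, frameno);
+       }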

Thanks a lot for your thorough reviews! Really appreciate it.


* Re: [RFC PATCH v1 00/14] Exceptions - Resource Cleanup
  2024-02-01  4:20 [RFC PATCH v1 00/14] Exceptions - Resource Cleanup Kumar Kartikeya Dwivedi
                   ` (13 preceding siblings ...)
  2024-02-01  4:21 ` [RFC PATCH v1 14/14] selftests/bpf: Add tests for exceptions runtime cleanup Kumar Kartikeya Dwivedi
@ 2024-03-14 11:08 ` Eduard Zingerman
  2024-03-18  5:40   ` Kumar Kartikeya Dwivedi
  14 siblings, 1 reply; 53+ messages in thread
From: Eduard Zingerman @ 2024-03-14 11:08 UTC (permalink / raw)
  To: Kumar Kartikeya Dwivedi, bpf
  Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Martin KaFai Lau, David Vernet, Tejun Heo, Raj Sahu,
	Dan Williams, Rishabh Iyer, Sanidhya Kashyap

Hi Kumar,

Sorry for the delayed response. The theoretical possibility that frame
merging might fail because of e.g. a clang version change bugs me, so I
thought a bit about what alternatives there are.
For the sake of discussion, what do you think about a runtime-based
approach:
- for each throwing sub-program, reserve a stack region where
  references to currently acquired resources (and their types)
  would be stored;
- upon a call to something that acquires a resource, push a
  pointer/type pair to this region;
- upon a call to something that releases a resource, delete the
  pointer/type pair from this region;
- when bpf_throw() is executed, walk this region for each active frame
  and execute the corresponding destructors;
- (alternatively, reserve one region for all frames).

Technically, this could be implemented as follows:
- during the main verification pass:
  - when the verifier processes a call to something that acquires a
    resource, mark the call instruction and resource type in
    insn_aux_data;
  - same for processing of a call to something that releases a
    resource;
  - keep track of the maximal number of simultaneously acquired
    resources;
- after the main verification pass:
  - bump the stack size of each subprogram by the amount necessary to
    hold the acquired resource table, and assume that this table is at
    the end of the subprogram stack;
  - after each acquire call, insert a kfunc call that adds the
    resource reference to the table;
  - after each release call, insert a kfunc call that removes the
    resource reference from the table.
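
Roughly, the runtime part I have in mind (a toy model with made-up
names and layout, not a proposed API):

#include <stdint.h>

/* Toy model of the per-frame acquired-resource table; in the real
 * thing this region would live at the end of the subprogram stack and
 * the push/pop helpers would be hidden kfunc calls inserted by the
 * verifier after each acquire/release.
 */
#define RES_MAX 16 /* max simultaneously acquired resources */

struct res_entry {
	void *ptr;
	uint32_t type; /* index into a destructor table */
};

struct res_table {
	uint32_t cnt;
	struct res_entry ent[RES_MAX];
};

typedef void (*res_dtor_t)(void *ptr);

static void res_push(struct res_table *t, void *ptr, uint32_t type)
{
	t->ent[t->cnt].ptr = ptr;
	t->ent[t->cnt].type = type;
	t->cnt++;
}

static void res_pop(struct res_table *t, void *ptr)
{
	for (uint32_t i = 0; i < t->cnt; i++) {
		if (t->ent[i].ptr != ptr)
			continue;
		t->ent[i] = t->ent[--t->cnt]; /* swap with last entry */
		return;
	}
}

/* bpf_throw() would walk the table of each active frame and run the
 * destructor for every entry that is still live.
 */
static void res_unwind(struct res_table *t, const res_dtor_t *dtors)
{
	for (uint32_t i = 0; i < t->cnt; i++)
		dtors[t->ent[i].type](t->ent[i].ptr);
	t->cnt = 0;
}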

On the surface, this approach appears to have a few advantages:
- seems simpler than frame descriptor tracking and merging
  (but only an implementation would tell if this is so);
- should be resilient to program compilation changes;
- abort is possible at any program point, which might be interesting for:
  - cancelable BPF programs (where an abort might be needed in the
    middle of e.g. a loop);
  - arenas, where it might be desirable to stop the program after e.g.
    a faulting arena access.

The obvious disadvantage is the incurred runtime cost.
On the other hand, it might not be that big.

What do you think?

Thanks,
Eduard


* Re: [RFC PATCH v1 00/14] Exceptions - Resource Cleanup
  2024-03-14 11:08 ` [RFC PATCH v1 00/14] Exceptions - Resource Cleanup Eduard Zingerman
@ 2024-03-18  5:40   ` Kumar Kartikeya Dwivedi
  0 siblings, 0 replies; 53+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2024-03-18  5:40 UTC (permalink / raw)
  To: Eduard Zingerman
  Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Martin KaFai Lau, David Vernet, Tejun Heo, Raj Sahu,
	Dan Williams, Rishabh Iyer, Sanidhya Kashyap

On Thu, 14 Mar 2024 at 12:08, Eduard Zingerman <eddyz87@gmail.com> wrote:
>
> Hi Kumar,
>

Hello Eduard,

> Sorry for the delayed response. The theoretical possibility that frame
> merging might fail because of e.g. a clang version change bugs me, so I
> thought a bit about what alternatives there are.

I spent some time this weekend addressing feedback and prepping the
next version. As you point out (and as I explained in one of my earlier
replies to you), the possibility of the same stack slot / register
containing conflicting resource types can cause merge problems. I had a
similar thought to what you describe below (emitting resource hints in
such cases) to address this, but I slept on your idea and have a few
thoughts. Please let me know what you think.

> For the sake of discussion, what do you think about a runtime-based
> approach:
> - for each throwing sub-program, reserve a stack region where
>   references to currently acquired resources (and their types)
>   would be stored;
> - upon a call to something that acquires a resource, push a
>   pointer/type pair to this region;
> - upon a call to something that releases a resource, delete the
>   pointer/type pair from this region;
> - when bpf_throw() is executed, walk this region for each active frame
>   and execute the corresponding destructors;
> - (alternatively, reserve one region for all frames).
>

I went over this option early on but rejected it due to the typical
downsides you observed. The maximum stack usage due to this will be the
peak number of simultaneously acquired resources during the runtime of
a subprogram (N) x 16 bytes, and the overall maximum will be the sum of
that over all subprogs in a call chain. This seems a bit problematic to
me, given it can greatly exceed typical program stack usage; compared
to the extra stack slot usage of bpf_loop inlining and the may_goto
instruction, this appears to be too much. We can mitigate it by using a
more space-efficient encoding to identify resources, but the core
problem still remains. Apart from that, there's the overhead of zeroing
this stack portion on entry every time.
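
To put rough numbers on it (figures invented for illustration, with
16 bytes per table entry):

  peak of 8 live resources in one subprog:  8 * 16  = 128 bytes
  4 such subprogs in a call chain:          4 * 128 = 512 bytes

i.e. the tables alone could consume the entire 512-byte stack budget
the verifier allows for a call chain.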

I just think this is all unnecessary, especially since most of the time
the exception is not going to be thrown anyway, so it's all cost
incurred for a case that will practically never happen in a correct
program.

> Technically, this could be implemented as follows:
> - during the main verification pass:
>   - when the verifier processes a call to something that acquires a
>     resource, mark the call instruction and resource type in
>     insn_aux_data;
>   - same for processing of a call to something that releases a
>     resource;
>   - keep track of the maximal number of simultaneously acquired
>     resources;
> - after the main verification pass:
>   - bump the stack size of each subprogram by the amount necessary to
>     hold the acquired resource table, and assume that this table is at
>     the end of the subprogram stack;
>   - after each acquire call, insert a kfunc call that adds the
>     resource reference to the table;
>   - after each release call, insert a kfunc call that removes the
>     resource reference from the table.
>
> On the surface, this approach appears to have a few advantages:
> - seems simpler than frame descriptor tracking and merging
>   (but only an implementation would tell if this is so);
> - should be resilient to program compilation changes;
> - abort is possible at any program point, which might be interesting for:
>   - cancelable BPF programs (where an abort might be needed in the
>     middle of e.g. a loop);

Let us discuss cancellation in another set that I plan to send some
time after this one, but I think aborting arbitrarily at any point of
the program is not the way to go. I have also reviewed literature on
how this happens in other language runtimes, and the broad consensus is
to have explicit cancellation points in programs, and to only allow
cancellation at them. Whether cancellation is synchronous or
asynchronous is irrelevant to the mechanism; it usually happens at
explicit program points.

Not only can you encounter a program executing in the middle of a loop,
you can also interrupt it within a kfunc; in such a case you cannot
really abort it immediately, as that may not be safe. It is thus better
to designate cancellation points in the program, which are visited even
in cases where the program may be stuck for a long time (like the
iterator next kfunc), and to only do cancellation when executing them.

Anyway, let's continue this discussion once I send the RFC out for
cancellation. I will try to also include a PoC for arena fault
aborting.

>   - arenas, where it might be desirable to stop the program after e.g.
>     a faulting arena access.
>
> The obvious disadvantage is the incurred runtime cost.
> On the other hand, it might not be that big.
>
> What do you think?

Let's go back to the problem we can encounter with frame descriptors:
the case of multiple paths having distinct resource types in a common
stack slot.

What I would do is force a spill of the specific resource (right after
the pc acquires it) only if its stack slot is unmergeable. Since we go
over the stack first, any conflicts in register types will be resolved,
as all states for that ref_obj_id will be erased. For conflicts in
registers, we'd do something similar. A new 8-byte entry would be
reserved for the pc after max_stack_depth, and all these entries would
be zeroed upon entry. A similar thing could be done by the compiler
itself, so the verifier wouldn't have to do all this, but for now let's
just do it in the verifier.

In this way, when we encounter the unlikely case where the same slot
has conflicting entries, we erase state by storing the resource in the
stack slot reserved for its pc and clearing it from the verifier state,
allowing all respective entries for the same resource to merge easily.
The same can be repeated for each resource if all of them conflict.
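
In pseudo-code, the resolution would look roughly like this (a toy
model, all names invented; nothing here is actual verifier code):

#include <stdio.h>

struct frame_model {
	int max_stack_depth; /* in bytes */
};

static void emit_spill_after_acquire(int pc, int off)
{
	printf("pc %d: spill acquired resource to fp%d\n", pc, off);
}

static int resolve_fd_conflict(struct frame_model *f, int pc)
{
	/* reserve a fresh 8-byte slot for this pc, zeroed on entry */
	f->max_stack_depth += 8;
	int off = -f->max_stack_depth;

	emit_spill_after_acquire(pc, off);
	/* ...then erase the resource from the verifier state so the
	 * conflicting frame descriptor entries can merge, and record
	 * the new slot in the descriptor instead.
	 */
	return off;
}

int main(void)
{
	struct frame_model f = { .max_stack_depth = 64 };

	resolve_fd_conflict(&f, 42);
	return 0;
}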

So far, playing around with this series and applying it to program
cancellation and sched-ext cases, I haven't encountered merging issues,
so it seems like an edge case that is unlikely to occur often in
practice. But when it does, it can be resolved by the logic above. This
should address all the issues and should be resilient to changes
triggered by different clang versions.

Let me know what you think.

>
> Thanks,
> Eduard



Thread overview: 53+ messages
2024-02-01  4:20 [RFC PATCH v1 00/14] Exceptions - Resource Cleanup Kumar Kartikeya Dwivedi
2024-02-01  4:20 ` [RFC PATCH v1 01/14] bpf: Mark subprogs as throw reachable before do_check pass Kumar Kartikeya Dwivedi
2024-02-12 19:35   ` David Vernet
2024-02-12 22:28     ` Kumar Kartikeya Dwivedi
2024-02-15  1:01   ` Eduard Zingerman
2024-02-16 21:34     ` Kumar Kartikeya Dwivedi
2024-02-01  4:20 ` [RFC PATCH v1 02/14] bpf: Process global subprog's exception propagation Kumar Kartikeya Dwivedi
2024-02-15  1:10   ` Eduard Zingerman
2024-02-16 21:50     ` Kumar Kartikeya Dwivedi
2024-02-17 14:04       ` Eduard Zingerman
2024-02-01  4:20 ` [RFC PATCH v1 03/14] selftests/bpf: Add test for throwing global subprog with acquired refs Kumar Kartikeya Dwivedi
2024-02-15  1:10   ` Eduard Zingerman
2024-02-01  4:20 ` [RFC PATCH v1 04/14] bpf: Refactor check_pseudo_btf_id's BTF reference bump Kumar Kartikeya Dwivedi
2024-02-15  1:11   ` Eduard Zingerman
2024-02-16 21:50     ` Kumar Kartikeya Dwivedi
2024-02-01  4:21 ` [RFC PATCH v1 05/14] bpf: Implement BPF exception frame descriptor generation Kumar Kartikeya Dwivedi
2024-02-15 18:24   ` Eduard Zingerman
2024-02-16 11:23     ` Eduard Zingerman
2024-02-16 22:06       ` Kumar Kartikeya Dwivedi
2024-02-17 17:14         ` Eduard Zingerman
2024-02-20 21:58           ` Kumar Kartikeya Dwivedi
2024-02-16 22:24     ` Kumar Kartikeya Dwivedi
2024-02-01  4:21 ` [RFC PATCH v1 06/14] bpf: Adjust frame descriptor pc on instruction patching Kumar Kartikeya Dwivedi
2024-02-15 16:31   ` Eduard Zingerman
2024-02-16 21:52     ` Kumar Kartikeya Dwivedi
2024-02-17 14:08       ` Eduard Zingerman
2024-02-01  4:21 ` [RFC PATCH v1 07/14] bpf: Use hidden subprog trampoline for bpf_throw Kumar Kartikeya Dwivedi
2024-02-15 22:11   ` Eduard Zingerman
2024-02-16 21:59     ` Kumar Kartikeya Dwivedi
2024-02-17 14:22       ` Eduard Zingerman
2024-02-01  4:21 ` [RFC PATCH v1 08/14] bpf: Compute used callee saved registers for subprogs Kumar Kartikeya Dwivedi
2024-02-15 22:12   ` Eduard Zingerman
2024-02-16 22:02     ` Kumar Kartikeya Dwivedi
2024-02-17 14:26       ` Eduard Zingerman
2024-02-01  4:21 ` [RFC PATCH v1 09/14] bpf, x86: Fix up pc offsets for frame descriptor entries Kumar Kartikeya Dwivedi
2024-02-15 22:12   ` Eduard Zingerman
2024-02-16 13:33     ` Eduard Zingerman
2024-02-01  4:21 ` [RFC PATCH v1 10/14] bpf, x86: Implement runtime resource cleanup for exceptions Kumar Kartikeya Dwivedi
2024-02-03  9:02   ` kernel test robot
2024-02-16 12:02   ` Eduard Zingerman
2024-02-16 22:28     ` Kumar Kartikeya Dwivedi
2024-02-19 12:01       ` Eduard Zingerman
2024-02-01  4:21 ` [RFC PATCH v1 11/14] bpf: Release references in verifier state when throwing exceptions Kumar Kartikeya Dwivedi
2024-02-16 12:21   ` Eduard Zingerman
2024-02-01  4:21 ` [RFC PATCH v1 12/14] bpf: Register cleanup dtors for runtime unwinding Kumar Kartikeya Dwivedi
2024-02-01  4:21 ` [RFC PATCH v1 13/14] bpf: Make bpf_throw available to all program types Kumar Kartikeya Dwivedi
2024-02-01  4:21 ` [RFC PATCH v1 14/14] selftests/bpf: Add tests for exceptions runtime cleanup Kumar Kartikeya Dwivedi
2024-02-12 20:53   ` David Vernet
2024-02-12 22:43     ` Kumar Kartikeya Dwivedi
2024-02-13 19:33       ` David Vernet
2024-02-13 20:51         ` Kumar Kartikeya Dwivedi
2024-03-14 11:08 ` [RFC PATCH v1 00/14] Exceptions - Resource Cleanup Eduard Zingerman
2024-03-18  5:40   ` Kumar Kartikeya Dwivedi
