bpf.vger.kernel.org archive mirror
* [PATCH RFC bpf-next v1 0/9] Exceptions - 1/2
@ 2023-04-05  0:42 Kumar Kartikeya Dwivedi
  2023-04-05  0:42 ` [PATCH RFC bpf-next v1 1/9] bpf: Fix kfunc callback handling Kumar Kartikeya Dwivedi
                   ` (8 more replies)
  0 siblings, 9 replies; 30+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2023-04-05  0:42 UTC (permalink / raw)
  To: bpf
  Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Martin KaFai Lau, David Vernet

This series implements the bare minimum support for basic BPF
exceptions. This is a feature that allows a BPF program to simply throw
a valueless exception to abort its execution.
Automatic cleanup of held resources and generation of landing pads to
unwind program state will be done in the part 2 set.

The exception state is part of the current task's task_struct's
exception_thrown array, which is a set of 4 boolean variables,
representing the exception state for each kernel context (Task, SoftIRQ,
HardIRQ, NMI). This allows a program interrupting the execution of
another program to not corrupt its exception state when throwing.
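
For reference, the task_struct state and the accessors added later in the
series (patch 3) look roughly like this; the slot is selected using
interrupt_context_level():

	/* include/linux/sched.h */
	bool bpf_exception_thrown[4];

	/* kernel/bpf/helpers.c */
	notrace void bpf_throw(void)
	{
		current->bpf_exception_thrown[interrupt_context_level()] = true;
	}

	notrace u64 bpf_get_exception(void)
	{
		return current->bpf_exception_thrown[interrupt_context_level()];
	}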

During program nesting, i.e. when a program is invoked in the same
context while another program is active, an exception can never already
be in the 'thrown' state when the nested program starts. As long as the
nested program clears its own exception state when exiting and
returning to the kernel, the caller program will be fine.

Hence, the conditions are that a program receives unset exception state
on entry, and preserves the unset exception state on exit. Only BPF
subprogs avoid resetting exception state on exit, so that they can
propagate the modified exception state to their caller. This can be
another caller subprog, the caller main prog, or the kernel.

BPF helpers executing callbacks in the kernel are modified to catch and
exit early when they detect a thrown exception from the callback. The
program will then catch this exception and unwind.

BPF helpers and kfuncs may also throw BPF exceptions; they simply
declare their intent to throw using annotations (e.g. a predicate
function with the list of throwing helpers, KF_THROW for kfuncs, etc.).
The program
will automatically be instrumented to detect a thrown exception and
unwind the stack.

The following two kfuncs are introduced:

// Throw a BPF exception, terminating execution of the program.
void bpf_throw(void);

// Set an exception handler, invoked after the unwinding of the stack is
// finished. The handler's return value becomes the program's return
// value when an exception is thrown; by default it is 0.
void bpf_set_exception_handler(int (*handler)(void));
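
A minimal usage sketch (hedged: the SEC() name, trigger condition, and
handler return value are illustrative, and the two declarations above are
assumed to be available to the program, e.g. via bpf_experimental.h):

// Invoked after the stack has been unwound; its return value becomes the
// program's return value when an exception was thrown.
static int handle_exception(void)
{
	return 42;
}

SEC("tc")
int prog(struct __sk_buff *ctx)
{
	bpf_set_exception_handler(handle_exception);
	if (ctx->len == 0)
		bpf_throw();	// unwind and return via handle_exception()
	return 1;
}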

Dedicated exception state in task_struct vs bpf_run_ctx was chosen
primarily for the following reasons:
 - Synchronous callbacks semantically execute within programs and affect
   their state. Hence, exceptions thrown by them should be propagated
   when invoked from helpers and kfuncs.
 - For async callbacks that terminate execution safely using bpf_throw,
   this would require having a bpf_run_ctx by default (with the same
   semantics as the current solution) or one set up by the invoking
   context for each cb invocation (which breaks cb == func pointer
   semantics
   assumed currently within the kernel).
 - Avoid setting up bpf_run_ctx for program types (esp. XDP) that don't
   need it, and no changes are needed for programs that don't use
   exceptions. Whatever minor overhead there is, is only paid when
   exceptions are actually used.

Restrictions could be imposed on the program to make the bpf_run_ctx
solution viable again (like forbidding callbacks from using bpf_throw).
However, generic use of bpf_throw in all possible cases is given
preference, so that bpf_assert (its primary consumer) isn't surprising
to use because it's unsupported in certain contexts.

The overhead of calling helpers to deal with exception state can be
easily avoided by getting direct access to the current task pointer to
touch exception state, or by using a dedicated callee-saved register to
hold it within the program (while relying on task_struct state across
the BPF-kernel boundary).

Verification
------------

We rely on the main verifier pass to decide how to rewrite the BPF
program to handle thrown exceptions.

The first step of verification in the main pass (do_check) is symbolic
execution of the global subprograms. These subprograms are verified
independently, and hence they are not explored again whenever a call
to a global subprog is encountered.

We first do 'can_throw' marking for each global subprogram by following
its execution, detecting any bpf_throw calls made in any of the paths
explored by the verifier. This, however, does not have full visibility
into thrown exceptions, and only attains markings for exceptions it can
directly see being thrown by static subprogs, helpers, kfuncs, etc.

For instance:

GF1:
	call GF2
	exit
GF2:
	call GF3
	exit
GF3:
	call bpf_throw

If all of these are explored in order, only GF3 receives the can_throw
marking. To remedy this, we do another pass and follow BPF_PSEUDO_CALL
edges to global subprogs in the call graph of global subprogs, and
propagate the direct throws in some global subprogs to other global
subprogs. Each caller then receives can_throw marking and is marked for
rewrite later.
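
The propagation itself is a small fixpoint computation over the call
graph of global subprogs. A self-contained sketch of the idea (not the
verifier's actual data structures or helper names):

#include <stdbool.h>

#define MAX_SUBPROGS 64

/* calls[a][b]: global subprog a contains a BPF_PSEUDO_CALL to subprog b */
static bool calls[MAX_SUBPROGS][MAX_SUBPROGS];
static bool can_throw[MAX_SUBPROGS];

static void propagate_can_throw(int nr_subprogs)
{
	bool changed = true;

	/* Iterate until no caller picks up a new can_throw marking. */
	while (changed) {
		changed = false;
		for (int a = 0; a < nr_subprogs; a++) {
			if (can_throw[a])
				continue;
			for (int b = 0; b < nr_subprogs; b++) {
				if (calls[a][b] && can_throw[b]) {
					can_throw[a] = true;
					changed = true;
					break;
				}
			}
		}
	}
}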

Now, all global functions are annotated correctly with the right
can_throw markings, and any calls to them in the main subprog can also
be marked for appropriate rewrites later. We now go through the main
subprog, and since the verifier's path awareness gives us full
visibility into which paths may throw, we use this to selectively mark
every instruction which has throwing semantics: calls to throwing
subprogs (global or static), calls to helpers or kfuncs taking callbacks
which throw, calls to kfuncs which throw, etc.

If a certain static subprog only throws when called from one point in
the program, and does not throw when called from another, we avoid
marking the non-throwing callsite as throwing. This is unlike global
subprogs, which we do not explore, hence we cannot make this distinction
for them. This also means that two calls to the same static subprog may
be rewritten differently, and thus may or may not handle the exception.
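
A hedged illustration of this callsite-specific marking (the names and
the fields used to derive the argument are made up):

static __noinline int clamp_idx(int idx)
{
	if (idx < 0)
		bpf_throw();
	return idx;
}

SEC("tc")
int prog(struct __sk_buff *ctx)
{
	int a = clamp_idx((int)ctx->mark);	/* may be negative: callsite marked as throwing */
	int b = clamp_idx(ctx->len & 0x7f);	/* provably >= 0: throw path dead, no handling */

	return a + b;
}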

This becomes a problem if we allow extension programs to attach to such
static subprogs. Usually, the use case is to replace global subprogs,
and static subprogs are rejected right now. A test case is included to
catch this if things change, and to prompt appropriate checks in
check_ext_prog (since not all callsites are prepared to handle
exceptions).

Optimizations
-------------

Currently, exception handling code is generated inline. This was done to
keep things simple for now, and since the generated code is typically
constant for each type of call instruction.

Generation of dedicated landing pads to unwind the program stack, and
moving them out of line to a separate 'invented' subprog or to the end
of the subprog, has been split into the future set, where the BPF
runtime needs to release resources present on the program stack.

Another minor annoyance is the calls to bpf_get_exception to fetch
exception state. This call needs to be made unconditionally, regardless
of whether an exception was thrown or not. It would be much more
convenient to have a hidden callee-saved BPF register which holds the
exception state, but I'm not sure whether using R12 for that (e.g. on
x86) hurts some other use case.

Then, checking exceptions thrown within the program becomes much more
lightweight, and the use of bpf_get_exception is limited to exceptions
thrown by helpers and kfuncs (around their call, after which the
exception register can propagate it to callers).

Callbacks invoked by the kernel synchronously will then set both
exception register and task-local exception state.

Known issues
------------

* Since bpf_throw is marked noreturn, the compiler may sometimes determine
  that a function always throws and emit the final instruction as a call
  to it without emitting an exit in the caller. This leads to an error
  where the verifier complains about the final instruction not being a
  jump, exit, or bpf_throw call (which gets treated as an exit). This is
  unlikely to occur, as bpf_throw wouldn't be used when the condition is
  already known at compile time, but I did see it when testing with
  always-throwing subprogs and calling into them.

* Just asm volatile ("call bpf_throw" :::) does not emit a DATASEC .ksyms
  entry for bpf_throw; there needs to be an explicit call in C for clang
  to emit the DATASEC info in BTF, otherwise there are errors during
  compilation.
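
  A sketch of the issue and of the workaround (illustrative only,
  assuming the bpf_experimental.h declaration of bpf_throw):

	static void throw_via_asm(void)
	{
		/* This alone does not get a .ksyms DATASEC entry emitted: */
		asm volatile ("call bpf_throw" :::);
	}

	static void throw_via_c(int err)
	{
		/* An explicit C call makes clang emit the DATASEC info in BTF: */
		if (err)
			bpf_throw();
	}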

Kumar Kartikeya Dwivedi (9):
  bpf: Fix kfunc callback handling
  bpf: Refactor and generalize optimize_bpf_loop
  bpf: Implement bpf_throw kfunc
  bpf: Handle throwing BPF callbacks in helpers and kfuncs
  bpf: Add pass to fixup global function throw information
  bpf: Add KF_THROW annotation for kfuncs
  bpf: Introduce bpf_set_exception_callback kfunc
  bpf: Introduce BPF assertion macros
  selftests/bpf: Add tests for BPF exceptions

 include/linux/bpf.h                           |   9 +-
 include/linux/bpf_verifier.h                  |  20 +-
 include/linux/btf.h                           |   1 +
 include/linux/sched.h                         |   1 +
 kernel/bpf/arraymap.c                         |  13 +-
 kernel/bpf/bpf_iter.c                         |   2 +
 kernel/bpf/hashtab.c                          |   4 +-
 kernel/bpf/helpers.c                          |  46 +-
 kernel/bpf/ringbuf.c                          |   4 +
 kernel/bpf/syscall.c                          |  10 +
 kernel/bpf/task_iter.c                        |   2 +
 kernel/bpf/trampoline.c                       |   4 +-
 kernel/bpf/verifier.c                         | 692 +++++++++++++++++-
 net/bpf/test_run.c                            |  12 +
 .../testing/selftests/bpf/bpf_experimental.h  |  37 +
 .../selftests/bpf/prog_tests/exceptions.c     | 240 ++++++
 .../testing/selftests/bpf/progs/exceptions.c  | 218 ++++++
 .../selftests/bpf/progs/exceptions_ext.c      |  42 ++
 .../selftests/bpf/progs/exceptions_fail.c     | 267 +++++++
 19 files changed, 1576 insertions(+), 48 deletions(-)
 create mode 100644 tools/testing/selftests/bpf/prog_tests/exceptions.c
 create mode 100644 tools/testing/selftests/bpf/progs/exceptions.c
 create mode 100644 tools/testing/selftests/bpf/progs/exceptions_ext.c
 create mode 100644 tools/testing/selftests/bpf/progs/exceptions_fail.c


base-commit: d099f594ad5650e8d8232b5f31f5f90104e65def
-- 
2.40.0



* [PATCH RFC bpf-next v1 1/9] bpf: Fix kfunc callback handling
  2023-04-05  0:42 [PATCH RFC bpf-next v1 0/9] Exceptions - 1/2 Kumar Kartikeya Dwivedi
@ 2023-04-05  0:42 ` Kumar Kartikeya Dwivedi
  2023-04-05  0:42 ` [PATCH RFC bpf-next v1 2/9] bpf: Refactor and generalize optimize_bpf_loop Kumar Kartikeya Dwivedi
                   ` (7 subsequent siblings)
  8 siblings, 0 replies; 30+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2023-04-05  0:42 UTC (permalink / raw)
  To: bpf
  Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Martin KaFai Lau, David Vernet

The kfunc code to handle KF_ARG_PTR_TO_CALLBACK does not check the reg
type before using reg->subprogno. This can accidentally permit invalid
pointers to be passed into callback helpers (e.g. silently from
different paths). We need to reject any type other than PTR_TO_FUNC.

Fixes: 5d92ddc3de1b ("bpf: Add callback validation to kfunc verifier logic")
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
---
 kernel/bpf/verifier.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 56f569811f70..693aeddc9fe2 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -10562,6 +10562,10 @@ static int check_kfunc_args(struct bpf_verifier_env *env, struct bpf_kfunc_call_
 			break;
 		}
 		case KF_ARG_PTR_TO_CALLBACK:
+			if (reg->type != PTR_TO_FUNC) {
+				verbose(env, "arg%d expected pointer to func\n", i);
+				return -EINVAL;
+			}
 			meta->subprogno = reg->subprogno;
 			break;
 		}
-- 
2.40.0



* [PATCH RFC bpf-next v1 2/9] bpf: Refactor and generalize optimize_bpf_loop
  2023-04-05  0:42 [PATCH RFC bpf-next v1 0/9] Exceptions - 1/2 Kumar Kartikeya Dwivedi
  2023-04-05  0:42 ` [PATCH RFC bpf-next v1 1/9] bpf: Fix kfunc callback handling Kumar Kartikeya Dwivedi
@ 2023-04-05  0:42 ` Kumar Kartikeya Dwivedi
  2023-04-05  0:42 ` [PATCH RFC bpf-next v1 3/9] bpf: Implement bpf_throw kfunc Kumar Kartikeya Dwivedi
                   ` (6 subsequent siblings)
  8 siblings, 0 replies; 30+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2023-04-05  0:42 UTC (permalink / raw)
  To: bpf
  Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Martin KaFai Lau, David Vernet

The optimize_bpf_loop pass is currently used to transform calls to the
bpf_loop helper function into an inlined instruction sequence which
elides the helper call, does the looping directly, and emits the call
to the loop callback.

The future patches for exception propagation for BPF programs will
require similar rewriting which needs access to extra stack space to
spill registers which may be clobbered in the emitted sequence.

We wish to reuse the logic of updating subprog stack depth by tracking
the extra stack depth needed across all rewrites in a subprog in
subsequent patches. Hence, refactor the code to make it amenable to
plugging in extra rewrite passes over other instructions. Note that
stack_depth_extra is now set to the max of the existing and the required
extra stack depth. This allows rewrites to share the extra stack depth
among themselves, by figuring out the maximum depth needed for a subprog.

Note that we only do one rewrite in one loop iteration, and thus
new_prog is set only once. This can be used to pull out shared updates
of delta, env->prog, etc. into common code that runs after all possible
rewrite cases are examined and, if applicable, performed.

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
---
 kernel/bpf/verifier.c | 28 +++++++++++++++++-----------
 1 file changed, 17 insertions(+), 11 deletions(-)

diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 693aeddc9fe2..8ecd5df73b07 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -18024,23 +18024,26 @@ static struct bpf_prog *inline_bpf_loop(struct bpf_verifier_env *env,
 	return new_prog;
 }
 
-static bool is_bpf_loop_call(struct bpf_insn *insn)
+static bool is_inlineable_bpf_loop_call(struct bpf_insn *insn,
+					struct bpf_insn_aux_data *aux)
 {
 	return insn->code == (BPF_JMP | BPF_CALL) &&
 		insn->src_reg == 0 &&
-		insn->imm == BPF_FUNC_loop;
+		insn->imm == BPF_FUNC_loop &&
+		aux->loop_inline_state.fit_for_inline;
 }
 
 /* For all sub-programs in the program (including main) check
- * insn_aux_data to see if there are bpf_loop calls that require
- * inlining. If such calls are found the calls are replaced with a
+ * insn_aux_data to see if there are any instructions that need to be
+ * transformed into an instruction sequence. E.g. bpf_loop calls that
+ * require inlining. If such calls are found the calls are replaced with a
  * sequence of instructions produced by `inline_bpf_loop` function and
  * subprog stack_depth is increased by the size of 3 registers.
  * This stack space is used to spill values of the R6, R7, R8.  These
  * registers are used to store the loop bound, counter and context
  * variables.
  */
-static int optimize_bpf_loop(struct bpf_verifier_env *env)
+static int do_misc_rewrites(struct bpf_verifier_env *env)
 {
 	struct bpf_subprog_info *subprogs = env->subprog_info;
 	int i, cur_subprog = 0, cnt, delta = 0;
@@ -18051,13 +18054,14 @@ static int optimize_bpf_loop(struct bpf_verifier_env *env)
 	u16 stack_depth_extra = 0;
 
 	for (i = 0; i < insn_cnt; i++, insn++) {
-		struct bpf_loop_inline_state *inline_state =
-			&env->insn_aux_data[i + delta].loop_inline_state;
+		struct bpf_insn_aux_data *insn_aux = &env->insn_aux_data[i + delta];
+		struct bpf_prog *new_prog = NULL;
 
-		if (is_bpf_loop_call(insn) && inline_state->fit_for_inline) {
-			struct bpf_prog *new_prog;
+		if (is_inlineable_bpf_loop_call(insn, insn_aux)) {
+			struct bpf_loop_inline_state *inline_state = &insn_aux->loop_inline_state;
 
-			stack_depth_extra = BPF_REG_SIZE * 3 + stack_depth_roundup;
+			stack_depth_extra = max_t(u16, stack_depth_extra,
+						  BPF_REG_SIZE * 3 + stack_depth_roundup);
 			new_prog = inline_bpf_loop(env,
 						   i + delta,
 						   -(stack_depth + stack_depth_extra),
@@ -18065,7 +18069,9 @@ static int optimize_bpf_loop(struct bpf_verifier_env *env)
 						   &cnt);
 			if (!new_prog)
 				return -ENOMEM;
+		}
 
+		if (new_prog) {
 			delta     += cnt - 1;
 			env->prog  = new_prog;
 			insn       = new_prog->insnsi + i + delta;
@@ -18876,7 +18882,7 @@ int bpf_check(struct bpf_prog **prog, union bpf_attr *attr, bpfptr_t uattr)
 
 	/* instruction rewrites happen after this point */
 	if (ret == 0)
-		ret = optimize_bpf_loop(env);
+		ret = do_misc_rewrites(env);
 
 	if (is_priv) {
 		if (ret == 0)
-- 
2.40.0



* [PATCH RFC bpf-next v1 3/9] bpf: Implement bpf_throw kfunc
  2023-04-05  0:42 [PATCH RFC bpf-next v1 0/9] Exceptions - 1/2 Kumar Kartikeya Dwivedi
  2023-04-05  0:42 ` [PATCH RFC bpf-next v1 1/9] bpf: Fix kfunc callback handling Kumar Kartikeya Dwivedi
  2023-04-05  0:42 ` [PATCH RFC bpf-next v1 2/9] bpf: Refactor and generalize optimize_bpf_loop Kumar Kartikeya Dwivedi
@ 2023-04-05  0:42 ` Kumar Kartikeya Dwivedi
  2023-04-06  2:16   ` Alexei Starovoitov
  2023-04-05  0:42 ` [PATCH RFC bpf-next v1 4/9] bpf: Handle throwing BPF callbacks in helpers and kfuncs Kumar Kartikeya Dwivedi
                   ` (5 subsequent siblings)
  8 siblings, 1 reply; 30+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2023-04-05  0:42 UTC (permalink / raw)
  To: bpf
  Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Martin KaFai Lau, David Vernet

Introduce support for exceptions in the BPF runtime. The bpf_throw kfunc
is the main helper to be used by programs to throw an exception. In the
verifier domain, it is processed as an immediate exit from the program,
ending exploration of that path.

This proves to be a powerful property: a program can perform a condition
check and call bpf_throw for a case that will never occur at runtime,
but that it would otherwise still need to prove safe to the verifier.
The unwinding of the program stack and of any held resources will
automatically be performed by the BPF runtime.
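
A hedged example of what this enables (the bound and names are
illustrative, and bpf_throw is assumed to be declared as in
bpf_experimental.h):

SEC("tc")
int prog(struct __sk_buff *ctx)
{
	__u32 vals[16] = {};
	__u32 idx = ctx->mark;

	if (idx >= 16)
		bpf_throw();	/* treated as an exit: the idx >= 16 path ends here */
	return vals[idx];	/* provably in bounds on the only remaining path */
}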

For now, we fail if we see lingering references, locks, etc., but a
future patch will extend the current infrastructure to generate the
cleanup code for those too.

Programs that do not use exceptions today have no change in their .text
and performance, as all extra code generated to throw, propagate, unwind
the stack, etc. only applies to programs that use this new facility.

- The exception state is represented using four booleans in the
  task_struct of current task. Each boolean corresponds to the exception
  state for each kernel context. This allows BPF programs to be
  interrupted and still not clobber the other's exception state.
- The other vexing case is of recursion. If a program calls into another
  program (e.g. call into helper which invokes tracing program
  eventually), it may throw and clobber the current exception state. To
  avoid this, an invariant is maintained across the implementation:
	Exception state is always cleared on entry and exit of the main
	BPF program.
  This implies that if recursion occurs, the BPF program will clear the
  current exception state on entry and exit. However, callbacks do not
  do the same, because they are subprograms. The case for propagating
  exceptions of callbacks invoked by the kernel back to the BPF program
  is handled in the next commit. This is also the main reason to clear
  exception state on entry: asynchronous callbacks can clobber exception
  state even though we make sure it's always set to 0 within the
  kernel.
  Anyhow, the only other thing to be kept in mind is to never allow a
  BPF program to execute while the program is being unwound. This
  implies that every function involved in this path must be notrace,
  which is the case for bpf_throw, bpf_get_exception and
  bpf_reset_exception.
- Rewrites happen for bpf_throw and call instructions to subprogs.
  The instructions which are executed in the main frame of the main
  program (thus, not global functions and extension programs, which end
  up executing in frame > 0) need to be rewritten differently. This is
  tracked using BPF_THROW_OUTER vs BPF_THROW_INNER (a sketch of both
  rewrites follows this list). If not done, a
  recursing tracing program may set exception state which the main
  program is instrumented to handle eventually, causing it to unwind
  when it shouldn't.
- Callsite-specific marking is done, to reduce the amount of
  instrumentation needed. Only calls to global subprogs always need to
  be rewritten to handle thrown exceptions; for each callsite of a
  static subprog, the verifier's path awareness allows us to skip the
  handling if all possible paths taken through that callsite never
  throw. This propagates into all callers, and the prog may end up
  having throws_exception as false. Typically this reduces the amount of
  instrumentation when throwing subprogs are deeply nested and only
  throw under specific conditions.
- BPF_PROG_TYPE_EXT is special in that it replaces global functions in
  other BPF programs. A check is added after we know the exception
  specification of a prog (throws_exception) to ensure we don't attach a
  throwing extension to a program not instrumented to handle them, or
  to the main subprog, which has BPF_THROW_OUTER handling as opposed to
  the extension prog's BPF_THROW_INNER handling.
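
For reference, the rewrite done by rewrite_bpf_throw_call() below is
roughly the following. A 'call bpf_throw' marked BPF_THROW_OUTER (frame 0
of the main prog) becomes:

	r0 = 0
	exit

while one marked BPF_THROW_INNER (subprogs, extension progs) keeps the
call so that the task-local exception state is set for callers to detect:

	call bpf_throw
	r0 = 0
	exit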

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
---
 include/linux/bpf.h                           |   9 +-
 include/linux/bpf_verifier.h                  |  13 +
 include/linux/sched.h                         |   1 +
 kernel/bpf/arraymap.c                         |   9 +-
 kernel/bpf/helpers.c                          |  22 ++
 kernel/bpf/syscall.c                          |  10 +
 kernel/bpf/trampoline.c                       |   4 +-
 kernel/bpf/verifier.c                         | 241 ++++++++++++++++--
 .../testing/selftests/bpf/bpf_experimental.h  |   9 +
 9 files changed, 299 insertions(+), 19 deletions(-)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 002a811b6b90..04b81f5fe809 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -1287,6 +1287,7 @@ static inline bool bpf_prog_has_trampoline(const struct bpf_prog *prog)
 struct bpf_func_info_aux {
 	u16 linkage;
 	bool unreliable;
+	bool throws_exception;
 };
 
 enum bpf_jit_poke_reason {
@@ -1430,7 +1431,8 @@ struct bpf_prog {
 				enforce_expected_attach_type:1, /* Enforce expected_attach_type checking at attach time */
 				call_get_stack:1, /* Do we call bpf_get_stack() or bpf_get_stackid() */
 				call_get_func_ip:1, /* Do we call get_func_ip() */
-				tstamp_type_access:1; /* Accessed __sk_buff->tstamp_type */
+				tstamp_type_access:1, /* Accessed __sk_buff->tstamp_type */
+				throws_exception:1; /* Does this program throw exceptions? */
 	enum bpf_prog_type	type;		/* Type of BPF program */
 	enum bpf_attach_type	expected_attach_type; /* For some prog types */
 	u32			len;		/* Number of filter blocks */
@@ -3035,4 +3037,9 @@ static inline gfp_t bpf_memcg_flags(gfp_t flags)
 	return flags;
 }
 
+/* BPF Exception helpers */
+void bpf_reset_exception(void);
+u64 bpf_get_exception(void);
+void bpf_throw(void);
+
 #endif /* _LINUX_BPF_H */
diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h
index 81d525d057c7..bc067223d3ee 100644
--- a/include/linux/bpf_verifier.h
+++ b/include/linux/bpf_verifier.h
@@ -430,6 +430,17 @@ struct bpf_loop_inline_state {
 	u32 callback_subprogno; /* valid when fit_for_inline is true */
 };
 
+enum {
+	BPF_THROW_NONE,
+	BPF_THROW_OUTER,
+	BPF_THROW_INNER,
+};
+
+struct bpf_throw_state {
+	int type;
+	bool check_helper_ret_code;
+};
+
 /* Possible states for alu_state member. */
 #define BPF_ALU_SANITIZE_SRC		(1U << 0)
 #define BPF_ALU_SANITIZE_DST		(1U << 1)
@@ -464,6 +475,7 @@ struct bpf_insn_aux_data {
 		 */
 		struct bpf_loop_inline_state loop_inline_state;
 	};
+	struct bpf_throw_state throw_state;
 	u64 obj_new_size; /* remember the size of type passed to bpf_obj_new to rewrite R1 */
 	struct btf_struct_meta *kptr_struct_meta;
 	u64 map_key_state; /* constant (32 bit) key tracking for maps */
@@ -537,6 +549,7 @@ struct bpf_subprog_info {
 	bool tail_call_reachable;
 	bool has_ld_abs;
 	bool is_async_cb;
+	bool can_throw;
 };
 
 /* single container for all structs
diff --git a/include/linux/sched.h b/include/linux/sched.h
index b11b4517760f..a568245b59a2 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1480,6 +1480,7 @@ struct task_struct {
 	struct bpf_local_storage __rcu	*bpf_storage;
 	/* Used for BPF run context */
 	struct bpf_run_ctx		*bpf_ctx;
+	bool				bpf_exception_thrown[4];
 #endif
 
 #ifdef CONFIG_GCC_PLUGIN_STACKLEAK
diff --git a/kernel/bpf/arraymap.c b/kernel/bpf/arraymap.c
index 2058e89b5ddd..de0eadf8706f 100644
--- a/kernel/bpf/arraymap.c
+++ b/kernel/bpf/arraymap.c
@@ -905,7 +905,14 @@ static void *prog_fd_array_get_ptr(struct bpf_map *map,
 	if (IS_ERR(prog))
 		return prog;
 
-	if (!bpf_prog_map_compatible(map, prog)) {
+	/* Programs which throw exceptions are not allowed to be tail call
+	 * targets. This is because it forces us to be conservative for each
+	 * bpf_tail_call invocation and assume it may throw, since we do not
+	 * know what the target program may do, thus causing us to propagate the
+	 * exception and mark calling prog as potentially throwing. Just be
+	 * restrictive for now and disallow this.
+	 */
+	if (prog->throws_exception || !bpf_prog_map_compatible(map, prog)) {
 		bpf_prog_put(prog);
 		return ERR_PTR(-EINVAL);
 	}
diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
index 6be16db9f188..89e70907257c 100644
--- a/kernel/bpf/helpers.c
+++ b/kernel/bpf/helpers.c
@@ -1879,6 +1879,20 @@ void bpf_rb_root_free(const struct btf_field *field, void *rb_root,
 	}
 }
 
+notrace void bpf_reset_exception(void)
+{
+	int i = interrupt_context_level();
+
+	current->bpf_exception_thrown[i] = false;
+}
+
+notrace u64 bpf_get_exception(void)
+{
+	int i = interrupt_context_level();
+
+	return current->bpf_exception_thrown[i];
+}
+
 __diag_push();
 __diag_ignore_all("-Wmissing-prototypes",
 		  "Global functions as their definitions will be in vmlinux BTF");
@@ -2295,6 +2309,13 @@ __bpf_kfunc void bpf_rcu_read_unlock(void)
 	rcu_read_unlock();
 }
 
+__bpf_kfunc notrace void bpf_throw(void)
+{
+	int i = interrupt_context_level();
+
+	current->bpf_exception_thrown[i] = true;
+}
+
 __diag_pop();
 
 BTF_SET8_START(generic_btf_ids)
@@ -2321,6 +2342,7 @@ BTF_ID_FLAGS(func, bpf_cgroup_ancestor, KF_ACQUIRE | KF_RCU | KF_RET_NULL)
 BTF_ID_FLAGS(func, bpf_cgroup_from_id, KF_ACQUIRE | KF_RET_NULL)
 #endif
 BTF_ID_FLAGS(func, bpf_task_from_pid, KF_ACQUIRE | KF_RET_NULL)
+BTF_ID_FLAGS(func, bpf_throw)
 BTF_SET8_END(generic_btf_ids)
 
 static const struct btf_kfunc_id_set generic_kfunc_set = {
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index e18ac7fdc210..f82e7a174d6a 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -3144,6 +3144,16 @@ static int bpf_tracing_prog_attach(struct bpf_prog *prog,
 		tgt_prog = prog->aux->dst_prog;
 	}
 
+	/* Don't allow tracing programs to attach to fexit and clear exception
+	 * state when we are unwinding the program.
+	 */
+	if (prog->type == BPF_PROG_TYPE_TRACING &&
+	    (prog->expected_attach_type == BPF_TRACE_FEXIT) &&
+	    tgt_prog && tgt_prog->throws_exception && prog->throws_exception) {
+		err = -EINVAL;
+		goto out_unlock;
+	}
+
 	err = bpf_link_prime(&link->link.link, &link_primer);
 	if (err)
 		goto out_unlock;
diff --git a/kernel/bpf/trampoline.c b/kernel/bpf/trampoline.c
index f61d5138b12b..e9f9dd52f16c 100644
--- a/kernel/bpf/trampoline.c
+++ b/kernel/bpf/trampoline.c
@@ -514,7 +514,9 @@ static int __bpf_trampoline_link_prog(struct bpf_tramp_link *link, struct bpf_tr
 	kind = bpf_attach_type_to_tramp(link->link.prog);
 	if (tr->extension_prog)
 		/* cannot attach fentry/fexit if extension prog is attached.
-		 * cannot overwrite extension prog either.
+		 * cannot overwrite extension prog either. We rely on this to
+		 * not check extension prog's exception specification (since
+		 * throwing extension may not replace non-throwing).
 		 */
 		return -EBUSY;
 
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 8ecd5df73b07..6981d8817c71 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -2787,6 +2787,8 @@ static int add_subprog_and_kfunc(struct bpf_verifier_env *env)
 	return 0;
 }
 
+static bool is_bpf_throw_call(struct bpf_insn *insn);
+
 static int check_subprogs(struct bpf_verifier_env *env)
 {
 	int i, subprog_start, subprog_end, off, cur_subprog = 0;
@@ -2820,11 +2822,12 @@ static int check_subprogs(struct bpf_verifier_env *env)
 		if (i == subprog_end - 1) {
 			/* to avoid fall-through from one subprog into another
 			 * the last insn of the subprog should be either exit
-			 * or unconditional jump back
+			 * or unconditional jump back or bpf_throw call
 			 */
 			if (code != (BPF_JMP | BPF_EXIT) &&
-			    code != (BPF_JMP | BPF_JA)) {
-				verbose(env, "last insn is not an exit or jmp\n");
+			    code != (BPF_JMP | BPF_JA) &&
+			    !is_bpf_throw_call(insn + i)) {
+				verbose(env, "last insn is not an exit or jmp or bpf_throw call\n");
 				return -EINVAL;
 			}
 			subprog_start = subprog_end;
@@ -8200,6 +8203,7 @@ static int set_callee_state(struct bpf_verifier_env *env,
 			    struct bpf_func_state *callee, int insn_idx);
 
 static bool is_callback_calling_kfunc(u32 btf_id);
+static int mark_chain_throw(struct bpf_verifier_env *env, int insn_idx);
 
 static int __check_func_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
 			     int *insn_idx, int subprog,
@@ -8247,6 +8251,12 @@ static int __check_func_call(struct bpf_verifier_env *env, struct bpf_insn *insn
 			caller->regs[BPF_REG_0].subreg_def = DEF_NOT_SUBREG;
 
 			/* continue with next insn after call */
+
+			/* We don't explore the global function, but if it
+			 * throws, mark the callchain as throwing.
+			 */
+			if (env->subprog_info[subprog].can_throw)
+				return mark_chain_throw(env, *insn_idx);
 			return 0;
 		}
 	}
@@ -8382,6 +8392,53 @@ static int set_callee_state(struct bpf_verifier_env *env,
 	return 0;
 }
 
+static int set_throw_state_type(struct bpf_verifier_env *env, int insn_idx,
+				int frame, int subprog)
+{
+	struct bpf_throw_state *ts = &env->insn_aux_data[insn_idx].throw_state;
+	int type;
+
+	if (!frame && !subprog && env->prog->type != BPF_PROG_TYPE_EXT)
+		type = BPF_THROW_OUTER;
+	else
+		type = BPF_THROW_INNER;
+	if (ts->type != BPF_THROW_NONE) {
+		if (ts->type != type) {
+			verbose(env,
+				"conflicting rewrite type for throwing call insn %d: %d and %d\n",
+				insn_idx, ts->type, type);
+			return -EINVAL;
+		}
+	}
+	ts->type = type;
+	return 0;
+}
+
+static int mark_chain_throw(struct bpf_verifier_env *env, int insn_idx) {
+	struct bpf_func_info_aux *func_info_aux = env->prog->aux->func_info_aux;
+	struct bpf_subprog_info *subprog = env->subprog_info;
+	struct bpf_verifier_state *state = env->cur_state;
+	struct bpf_func_state **frame = state->frame;
+	u32 cur_subprogno;
+	int ret;
+
+	/* Mark all callsites leading up to this throw and their corresponding
+	 * subprogs and update their func_info_aux table.
+	 */
+	for (int i = 1; i <= state->curframe; i++) {
+		u32 subprogno = frame[i - 1]->subprogno;
+
+		func_info_aux[subprogno].throws_exception = subprog[subprogno].can_throw = true;
+		ret = set_throw_state_type(env, frame[i]->callsite, i - 1, subprogno);
+		if (ret < 0)
+			return ret;
+	}
+	/* Now mark actual instruction which caused the throw */
+	cur_subprogno = frame[state->curframe]->subprogno;
+	func_info_aux[cur_subprogno].throws_exception = subprog[cur_subprogno].can_throw = true;
+	return set_throw_state_type(env, insn_idx, state->curframe, cur_subprogno);
+}
+
 static int check_func_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
 			   int *insn_idx)
 {
@@ -8394,7 +8451,6 @@ static int check_func_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
 			target_insn);
 		return -EFAULT;
 	}
-
 	return __check_func_call(env, insn, insn_idx, subprog, set_callee_state);
 }
 
@@ -8755,17 +8811,17 @@ record_func_key(struct bpf_verifier_env *env, struct bpf_call_arg_meta *meta,
 	return 0;
 }
 
-static int check_reference_leak(struct bpf_verifier_env *env)
+static int check_reference_leak(struct bpf_verifier_env *env, bool exception_exit)
 {
 	struct bpf_func_state *state = cur_func(env);
 	bool refs_lingering = false;
 	int i;
 
-	if (state->frameno && !state->in_callback_fn)
+	if (!exception_exit && state->frameno && !state->in_callback_fn)
 		return 0;
 
 	for (i = 0; i < state->acquired_refs; i++) {
-		if (state->in_callback_fn && state->refs[i].callback_ref != state->frameno)
+		if (!exception_exit && state->in_callback_fn && state->refs[i].callback_ref != state->frameno)
 			continue;
 		verbose(env, "Unreleased reference id=%d alloc_insn=%d\n",
 			state->refs[i].id, state->refs[i].insn_idx);
@@ -8999,7 +9055,7 @@ static int check_helper_call(struct bpf_verifier_env *env, struct bpf_insn *insn
 
 	switch (func_id) {
 	case BPF_FUNC_tail_call:
-		err = check_reference_leak(env);
+		err = check_reference_leak(env, false);
 		if (err) {
 			verbose(env, "tail_call would lead to reference leak\n");
 			return err;
@@ -9615,6 +9671,7 @@ enum special_kfunc_type {
 	KF_bpf_dynptr_from_xdp,
 	KF_bpf_dynptr_slice,
 	KF_bpf_dynptr_slice_rdwr,
+	KF_bpf_throw,
 };
 
 BTF_SET_START(special_kfunc_set)
@@ -9633,6 +9690,7 @@ BTF_ID(func, bpf_dynptr_from_skb)
 BTF_ID(func, bpf_dynptr_from_xdp)
 BTF_ID(func, bpf_dynptr_slice)
 BTF_ID(func, bpf_dynptr_slice_rdwr)
+BTF_ID(func, bpf_throw)
 BTF_SET_END(special_kfunc_set)
 
 BTF_ID_LIST(special_kfunc_list)
@@ -9653,6 +9711,7 @@ BTF_ID(func, bpf_dynptr_from_skb)
 BTF_ID(func, bpf_dynptr_from_xdp)
 BTF_ID(func, bpf_dynptr_slice)
 BTF_ID(func, bpf_dynptr_slice_rdwr)
+BTF_ID(func, bpf_throw)
 
 static bool is_kfunc_bpf_rcu_read_lock(struct bpf_kfunc_call_arg_meta *meta)
 {
@@ -10736,6 +10795,13 @@ static int check_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
 		}
 	}
 
+	if (meta.btf == btf_vmlinux && meta.func_id == special_kfunc_list[KF_bpf_throw]) {
+		err = mark_chain_throw(env, insn_idx);
+		if (err < 0)
+			return err;
+		return 1;
+	}
+
 	for (i = 0; i < CALLER_SAVED_REGS; i++)
 		mark_reg_not_init(env, regs, caller_saved[i]);
 
@@ -13670,7 +13736,7 @@ static int check_ld_abs(struct bpf_verifier_env *env, struct bpf_insn *insn)
 	 * gen_ld_abs() may terminate the program at runtime, leading to
 	 * reference leak.
 	 */
-	err = check_reference_leak(env);
+	err = check_reference_leak(env, false);
 	if (err) {
 		verbose(env, "BPF_LD_[ABS|IND] cannot be mixed with socket references\n");
 		return err;
@@ -14075,6 +14141,10 @@ static int visit_insn(int t, struct bpf_verifier_env *env)
 		if (insn->src_reg == BPF_PSEUDO_KFUNC_CALL) {
 			struct bpf_kfunc_call_arg_meta meta;
 
+			/* 'call bpf_throw' has no fallthrough edge, same as BPF_EXIT */
+			if (is_bpf_throw_call(insn))
+				return DONE_EXPLORING;
+
 			ret = fetch_kfunc_meta(env, insn, &meta, NULL);
 			if (ret == 0 && is_iter_next_kfunc(&meta)) {
 				mark_prune_point(env, t);
@@ -14738,7 +14808,7 @@ static bool regs_exact(const struct bpf_reg_state *rold,
 		       const struct bpf_reg_state *rcur,
 		       struct bpf_id_pair *idmap)
 {
-	return memcmp(rold, rcur, offsetof(struct bpf_reg_state, id)) == 0 && 
+	return memcmp(rold, rcur, offsetof(struct bpf_reg_state, id)) == 0 &&
 	       check_ids(rold->id, rcur->id, idmap) &&
 	       check_ids(rold->ref_obj_id, rcur->ref_obj_id, idmap);
 }
@@ -15617,6 +15687,7 @@ static int do_check(struct bpf_verifier_env *env)
 	int prev_insn_idx = -1;
 
 	for (;;) {
+		bool exception_exit = false;
 		struct bpf_insn *insn;
 		u8 class;
 		int err;
@@ -15830,12 +15901,18 @@ static int do_check(struct bpf_verifier_env *env)
 						return -EINVAL;
 					}
 				}
-				if (insn->src_reg == BPF_PSEUDO_CALL)
+				if (insn->src_reg == BPF_PSEUDO_CALL) {
 					err = check_func_call(env, insn, &env->insn_idx);
-				else if (insn->src_reg == BPF_PSEUDO_KFUNC_CALL)
+				} else if (insn->src_reg == BPF_PSEUDO_KFUNC_CALL) {
 					err = check_kfunc_call(env, insn, &env->insn_idx);
-				else
+					if (err == 1) {
+						err = 0;
+						exception_exit = true;
+						goto process_bpf_exit_full;
+					}
+				} else {
 					err = check_helper_call(env, insn, &env->insn_idx);
+				}
 				if (err)
 					return err;
 
@@ -15863,6 +15940,7 @@ static int do_check(struct bpf_verifier_env *env)
 					return -EINVAL;
 				}
 
+process_bpf_exit_full:
 				if (env->cur_state->active_lock.ptr &&
 				    !in_rbtree_lock_required_cb(env)) {
 					verbose(env, "bpf_spin_unlock is missing\n");
@@ -15880,10 +15958,23 @@ static int do_check(struct bpf_verifier_env *env)
 				 * function, for which reference_state must
 				 * match caller reference state when it exits.
 				 */
-				err = check_reference_leak(env);
+				err = check_reference_leak(env, exception_exit);
 				if (err)
 					return err;
 
+				/* The side effect of the prepare_func_exit
+				 * which is being skipped is that it frees
+				 * bpf_func_state. Typically, process_bpf_exit
+				 * will only be hit with outermost exit.
+				 * copy_verifier_state in pop_stack will handle
+				 * freeing of any extra bpf_func_state left over
+				 * from not processing all nested function
+				 * exits. We also skip return code checks as
+				 * they are not needed for exceptional exits.
+				 */
+				if (exception_exit)
+					goto process_bpf_exit;
+
 				if (state->curframe) {
 					/* exit from nested function */
 					err = prepare_func_exit(env, &env->insn_idx);
@@ -17438,6 +17529,33 @@ static int do_misc_fixups(struct bpf_verifier_env *env)
 	int i, ret, cnt, delta = 0;
 
 	for (i = 0; i < insn_cnt; i++, insn++) {
+		/* Typically, exception state is always cleared on entry and we
+		 * ensure to clear it before exiting, but in some cases, our
+		 * invocation can occur after a BPF callback has been executed
+		 * asynchronously in the context of the current task, which may
+		 * clobber the state (think of BPF timer callbacks). Callbacks
+		 * never reset exception state (as they may be called from
+		 * within a program). Thus, if we rely on seeing the exception
+		 * state, always clear it on entry.
+		 */
+		if (i == 0 && prog->throws_exception) {
+			struct bpf_insn entry_insns[] = {
+				BPF_MOV64_REG(BPF_REG_6, BPF_REG_1),
+				BPF_EMIT_CALL(bpf_reset_exception),
+				BPF_MOV64_REG(BPF_REG_1, BPF_REG_6),
+				insn[i],
+			};
+
+			cnt = ARRAY_SIZE(entry_insns);
+			new_prog = bpf_patch_insn_data(env, i + delta, entry_insns, cnt);
+			if (!new_prog)
+				return -ENOMEM;
+
+			delta    += cnt - 1;
+			env->prog = new_prog;
+			insn      = new_prog->insnsi + i + delta;
+		}
+
 		/* Make divide-by-zero exceptions impossible. */
 		if (insn->code == (BPF_ALU64 | BPF_MOD | BPF_X) ||
 		    insn->code == (BPF_ALU64 | BPF_DIV | BPF_X) ||
@@ -18030,7 +18148,33 @@ static bool is_inlineable_bpf_loop_call(struct bpf_insn *insn,
 	return insn->code == (BPF_JMP | BPF_CALL) &&
 		insn->src_reg == 0 &&
 		insn->imm == BPF_FUNC_loop &&
-		aux->loop_inline_state.fit_for_inline;
+		aux->loop_inline_state.fit_for_inline &&
+		aux->throw_state.type == BPF_THROW_NONE;
+}
+
+static struct bpf_prog *rewrite_bpf_throw_call(struct bpf_verifier_env *env,
+					       int position,
+					       struct bpf_throw_state *tstate,
+					       u32 *cnt)
+{
+	struct bpf_insn insn_buf[] = {
+		env->prog->insnsi[position],
+		BPF_MOV64_IMM(BPF_REG_0, 0),
+		BPF_EXIT_INSN(),
+	};
+
+	*cnt = ARRAY_SIZE(insn_buf);
+	/* We don't need the call instruction for throws in frame 0 */
+	if (tstate->type == BPF_THROW_OUTER)
+		return bpf_patch_insn_data(env, position, insn_buf + 1, *cnt - 1);
+	return bpf_patch_insn_data(env, position, insn_buf, *cnt);
+}
+
+static bool is_bpf_throw_call(struct bpf_insn *insn)
+{
+	return insn->code == (BPF_JMP | BPF_CALL) &&
+	       insn->src_reg == BPF_PSEUDO_KFUNC_CALL &&
+	       insn->off == 0 && insn->imm == special_kfunc_list[KF_bpf_throw];
 }
 
 /* For all sub-programs in the program (including main) check
@@ -18069,8 +18213,24 @@ static int do_misc_rewrites(struct bpf_verifier_env *env)
 						   &cnt);
 			if (!new_prog)
 				return -ENOMEM;
+		} else if (is_bpf_throw_call(insn)) {
+			struct bpf_throw_state *throw_state = &insn_aux->throw_state;
+
+			/* The verifier was able to prove that the bpf_throw
+			 * call was unreachable, hence it must have not been
+			 * seen and will be removed by opt_remove_dead_code.
+			 */
+			if (throw_state->type == BPF_THROW_NONE) {
+				WARN_ON_ONCE(insn_aux->seen);
+				goto skip;
+			}
+
+			new_prog = rewrite_bpf_throw_call(env, i + delta, throw_state, &cnt);
+			if (!new_prog)
+				return -ENOMEM;
 		}
 
+skip:
 		if (new_prog) {
 			delta     += cnt - 1;
 			env->prog  = new_prog;
@@ -18240,6 +18400,12 @@ static int do_check_subprogs(struct bpf_verifier_env *env)
 				"Func#%d is safe for any args that match its prototype\n",
 				i);
 		}
+		/* Only reliable functions from BTF PoV can be extended, hence
+		 * remember their exception specification to check that we don't
+		 * replace non-throwing subprog with throwing subprog. The
+		 * opposite is fine though.
+		 */
+		aux->func_info_aux[i].throws_exception = env->subprog_info[i].can_throw;
 	}
 	return 0;
 }
@@ -18250,8 +18416,12 @@ static int do_check_main(struct bpf_verifier_env *env)
 
 	env->insn_idx = 0;
 	ret = do_check_common(env, 0);
-	if (!ret)
+	if (!ret) {
 		env->prog->aux->stack_depth = env->subprog_info[0].stack_depth;
+		env->prog->throws_exception = env->subprog_info[0].can_throw;
+		if (env->prog->aux->func_info)
+			env->prog->aux->func_info_aux[0].throws_exception = env->prog->throws_exception;
+	}
 	return ret;
 }
 
@@ -18753,6 +18923,42 @@ struct btf *bpf_get_btf_vmlinux(void)
 	return btf_vmlinux;
 }
 
+static int check_ext_prog(struct bpf_verifier_env *env)
+{
+	struct bpf_prog *tgt_prog = env->prog->aux->dst_prog;
+	u32 btf_id = env->prog->aux->attach_btf_id;
+	struct bpf_prog *prog = env->prog;
+	int subprog = -1;
+
+	if (prog->type != BPF_PROG_TYPE_EXT)
+		return 0;
+	for (int i = 0; i < tgt_prog->aux->func_info_cnt; i++) {
+		if (tgt_prog->aux->func_info[i].type_id == btf_id) {
+			subprog = i;
+			break;
+		}
+	}
+	if (subprog == -1) {
+		verbose(env, "verifier internal error: extension prog's subprog not found\n");
+		return -EFAULT;
+	}
+	/* BPF_THROW_OUTER rewrites won't match BPF_PROG_TYPE_EXT's
+	 * BPF_THROW_INNER rewrites.
+	 */
+	if (!subprog && prog->throws_exception) {
+		verbose(env, "Cannot attach throwing extension to main subprog\n");
+		return -EINVAL;
+	}
+	/* Overwriting extensions is not allowed, so we can simply check
+	 * the specification of the subprog we are replacing.
+	 */
+	if (!tgt_prog->aux->func_info_aux[subprog].throws_exception && prog->throws_exception) {
+		verbose(env, "Cannot attach throwing extension to non-throwing subprog\n");
+		return -EINVAL;
+	}
+	return 0;
+}
+
 int bpf_check(struct bpf_prog **prog, union bpf_attr *attr, bpfptr_t uattr)
 {
 	u64 start_time = ktime_get_ns();
@@ -18871,6 +19077,9 @@ int bpf_check(struct bpf_prog **prog, union bpf_attr *attr, bpfptr_t uattr)
 	ret = do_check_subprogs(env);
 	ret = ret ?: do_check_main(env);
 
+	ret = ret ?: check_ext_prog(env);
+
+
 	if (ret == 0 && bpf_prog_is_offloaded(env->prog->aux))
 		ret = bpf_prog_offload_finalize(env);
 
diff --git a/tools/testing/selftests/bpf/bpf_experimental.h b/tools/testing/selftests/bpf/bpf_experimental.h
index dbd2c729781a..d5de9251e775 100644
--- a/tools/testing/selftests/bpf/bpf_experimental.h
+++ b/tools/testing/selftests/bpf/bpf_experimental.h
@@ -89,4 +89,13 @@ extern void bpf_rbtree_add(struct bpf_rb_root *root, struct bpf_rb_node *node,
  */
 extern struct bpf_rb_node *bpf_rbtree_first(struct bpf_rb_root *root) __ksym;
 
+/* Description
+ *  Throw an exception, terminating the execution of the program immediately.
+ *  The eBPF runtime unwinds the stack automatically and exits the program with
+ *  the default return value of 0.
+ * Returns
+ *  This function never returns.
+ */
+extern void bpf_throw(void) __attribute__((noreturn)) __ksym;
+
 #endif
-- 
2.40.0



* [PATCH RFC bpf-next v1 4/9] bpf: Handle throwing BPF callbacks in helpers and kfuncs
  2023-04-05  0:42 [PATCH RFC bpf-next v1 0/9] Exceptions - 1/2 Kumar Kartikeya Dwivedi
                   ` (2 preceding siblings ...)
  2023-04-05  0:42 ` [PATCH RFC bpf-next v1 3/9] bpf: Implement bpf_throw kfunc Kumar Kartikeya Dwivedi
@ 2023-04-05  0:42 ` Kumar Kartikeya Dwivedi
  2023-04-06  2:21   ` Alexei Starovoitov
  2023-04-05  0:42 ` [PATCH RFC bpf-next v1 5/9] bpf: Add pass to fixup global function throw information Kumar Kartikeya Dwivedi
                   ` (4 subsequent siblings)
  8 siblings, 1 reply; 30+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2023-04-05  0:42 UTC (permalink / raw)
  To: bpf
  Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Martin KaFai Lau, David Vernet

Enable handling of callbacks that throw within helpers and kfuncs taking
them. The problem with helpers is that the exception state can be
manipulated by a tracing program which is invoked between the check for
an exception after calling the callback and the return back to the
program. Hence, helpers always return -EJUKEBOX whenever they detect an
exception. Only one kfunc takes a callback (bpf_rbtree_add), so it is
updated to be notrace to avoid this pitfall.

This allows us to use bpf_get_exception to detect the thrown case, and
in case we miss the exception event, we use the return code to unwind.
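
The pattern applied to the callback-calling helpers in this patch is
roughly the following (sketch; see the bpf_for_each_array_elem hunk below
for a real call site):

	ret = callback_fn((u64)(long)map, (u64)(long)&key,
			  (u64)(long)val, (u64)(long)callback_ctx, 0);
	/* A throw in the callback is surfaced via the helper's return code,
	 * because a tracing prog running between this check and the helper's
	 * return could clear the task-local exception state.
	 */
	if (bpf_get_exception())
		ret = -EJUKEBOX;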

TODO: It might be possible to simply check the return code in the case
of helpers or kfuncs taking callbacks with check_helper_ret_code = true,
and not bother with exception state. This should lead to less code being
generated per callsite. For all other cases, we can rely on
bpf_get_exception, and ensure that the helper/kfunc uses notrace to
avoid invocation of tracing programs that clobber exception state on the
return path. But make this change in v2 after ensuring the
current->bpf_exception_thrown approach is acceptable.

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
---
 include/linux/bpf_verifier.h |   1 +
 kernel/bpf/arraymap.c        |   4 +-
 kernel/bpf/bpf_iter.c        |   2 +
 kernel/bpf/hashtab.c         |   4 +-
 kernel/bpf/helpers.c         |  18 +++--
 kernel/bpf/ringbuf.c         |   4 ++
 kernel/bpf/task_iter.c       |   2 +
 kernel/bpf/verifier.c        | 129 +++++++++++++++++++++++++++++++++++
 8 files changed, 156 insertions(+), 8 deletions(-)

diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h
index bc067223d3ee..a5346a2b7e68 100644
--- a/include/linux/bpf_verifier.h
+++ b/include/linux/bpf_verifier.h
@@ -485,6 +485,7 @@ struct bpf_insn_aux_data {
 	bool zext_dst; /* this insn zero extends dst reg */
 	bool storage_get_func_atomic; /* bpf_*_storage_get() with atomic memory alloc */
 	bool is_iter_next; /* bpf_iter_<type>_next() kfunc call */
+	bool skip_patch_call_imm; /* Skip patch_call_imm phase in do_misc_fixups */
 	u8 alu_state; /* used in combination with alu_limit */
 
 	/* below fields are initialized once */
diff --git a/kernel/bpf/arraymap.c b/kernel/bpf/arraymap.c
index de0eadf8706f..6c0c5e726ebf 100644
--- a/kernel/bpf/arraymap.c
+++ b/kernel/bpf/arraymap.c
@@ -711,6 +711,8 @@ static long bpf_for_each_array_elem(struct bpf_map *map, bpf_callback_t callback
 		key = i;
 		ret = callback_fn((u64)(long)map, (u64)(long)&key,
 				  (u64)(long)val, (u64)(long)callback_ctx, 0);
+		if (bpf_get_exception())
+			ret = -EJUKEBOX;
 		/* return value: 0 - continue, 1 - stop and return */
 		if (ret)
 			break;
@@ -718,7 +720,7 @@ static long bpf_for_each_array_elem(struct bpf_map *map, bpf_callback_t callback
 
 	if (is_percpu)
 		migrate_enable();
-	return num_elems;
+	return ret == -EJUKEBOX ? ret : num_elems;
 }
 
 static u64 array_map_mem_usage(const struct bpf_map *map)
diff --git a/kernel/bpf/bpf_iter.c b/kernel/bpf/bpf_iter.c
index 96856f130cbf..6e4e4b6213f8 100644
--- a/kernel/bpf/bpf_iter.c
+++ b/kernel/bpf/bpf_iter.c
@@ -759,6 +759,8 @@ BPF_CALL_4(bpf_loop, u32, nr_loops, void *, callback_fn, void *, callback_ctx,
 
 	for (i = 0; i < nr_loops; i++) {
 		ret = callback((u64)i, (u64)(long)callback_ctx, 0, 0, 0);
+		if (bpf_get_exception())
+			return -EJUKEBOX;
 		/* return value: 0 - continue, 1 - stop and return */
 		if (ret)
 			return i + 1;
diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c
index 00c253b84bf5..5e70151e0414 100644
--- a/kernel/bpf/hashtab.c
+++ b/kernel/bpf/hashtab.c
@@ -2178,6 +2178,8 @@ static long bpf_for_each_hash_elem(struct bpf_map *map, bpf_callback_t callback_
 			num_elems++;
 			ret = callback_fn((u64)(long)map, (u64)(long)key,
 					  (u64)(long)val, (u64)(long)callback_ctx, 0);
+			if (bpf_get_exception())
+				ret = -EJUKEBOX;
 			/* return value: 0 - continue, 1 - stop and return */
 			if (ret) {
 				rcu_read_unlock();
@@ -2189,7 +2191,7 @@ static long bpf_for_each_hash_elem(struct bpf_map *map, bpf_callback_t callback_
 out:
 	if (is_percpu)
 		migrate_enable();
-	return num_elems;
+	return ret == -EJUKEBOX ? ret : num_elems;
 }
 
 static u64 htab_map_mem_usage(const struct bpf_map *map)
diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
index 89e70907257c..82db3a64fa3f 100644
--- a/kernel/bpf/helpers.c
+++ b/kernel/bpf/helpers.c
@@ -1982,10 +1982,11 @@ __bpf_kfunc struct bpf_rb_node *bpf_rbtree_remove(struct bpf_rb_root *root,
 }
 
 /* Need to copy rbtree_add_cached's logic here because our 'less' is a BPF
- * program
+ * program.
+ * Marked notrace to avoid clobbering of exception state in current by BPF
+ * programs.
  */
-static void __bpf_rbtree_add(struct bpf_rb_root *root, struct bpf_rb_node *node,
-			     void *less)
+static notrace void __bpf_rbtree_add(struct bpf_rb_root *root, struct bpf_rb_node *node, void *less)
 {
 	struct rb_node **link = &((struct rb_root_cached *)root)->rb_root.rb_node;
 	bpf_callback_t cb = (bpf_callback_t)less;
@@ -1993,8 +1994,13 @@ static void __bpf_rbtree_add(struct bpf_rb_root *root, struct bpf_rb_node *node,
 	bool leftmost = true;
 
 	while (*link) {
+		u64 cb_res;
+
 		parent = *link;
-		if (cb((uintptr_t)node, (uintptr_t)parent, 0, 0, 0)) {
+		cb_res = cb((uintptr_t)node, (uintptr_t)parent, 0, 0, 0);
+		if (bpf_get_exception())
+			return;
+		if (cb_res) {
 			link = &parent->rb_left;
 		} else {
 			link = &parent->rb_right;
@@ -2007,8 +2013,8 @@ static void __bpf_rbtree_add(struct bpf_rb_root *root, struct bpf_rb_node *node,
 			       (struct rb_root_cached *)root, leftmost);
 }
 
-__bpf_kfunc void bpf_rbtree_add(struct bpf_rb_root *root, struct bpf_rb_node *node,
-				bool (less)(struct bpf_rb_node *a, const struct bpf_rb_node *b))
+__bpf_kfunc notrace void bpf_rbtree_add(struct bpf_rb_root *root, struct bpf_rb_node *node,
+					bool (less)(struct bpf_rb_node *a, const struct bpf_rb_node *b))
 {
 	__bpf_rbtree_add(root, node, (void *)less);
 }
diff --git a/kernel/bpf/ringbuf.c b/kernel/bpf/ringbuf.c
index 875ac9b698d9..7f6764ae4fff 100644
--- a/kernel/bpf/ringbuf.c
+++ b/kernel/bpf/ringbuf.c
@@ -766,6 +766,10 @@ BPF_CALL_4(bpf_user_ringbuf_drain, struct bpf_map *, map,
 
 		bpf_dynptr_init(&dynptr, sample, BPF_DYNPTR_TYPE_LOCAL, 0, size);
 		ret = callback((uintptr_t)&dynptr, (uintptr_t)callback_ctx, 0, 0, 0);
+		if (bpf_get_exception()) {
+			ret = -EJUKEBOX;
+			goto schedule_work_return;
+		}
 		__bpf_user_ringbuf_sample_release(rb, size, flags);
 	}
 	ret = samples - discarded_samples;
diff --git a/kernel/bpf/task_iter.c b/kernel/bpf/task_iter.c
index c4ab9d6cdbe9..6e8667f03784 100644
--- a/kernel/bpf/task_iter.c
+++ b/kernel/bpf/task_iter.c
@@ -807,6 +807,8 @@ BPF_CALL_5(bpf_find_vma, struct task_struct *, task, u64, start,
 		callback_fn((u64)(long)task, (u64)(long)vma,
 			    (u64)(long)callback_ctx, 0, 0);
 		ret = 0;
+		if (bpf_get_exception())
+			ret = -EJUKEBOX;
 	}
 	bpf_mmap_unlock_mm(work, mm);
 	return ret;
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 6981d8817c71..07d808b05044 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -9053,6 +9053,24 @@ static int check_helper_call(struct bpf_verifier_env *env, struct bpf_insn *insn
 		}
 	}
 
+	/* For each helper call which invokes a callback which may throw, it
+	 * will propagate the thrown exception to us. For helpers, we check the
+	 * return code in addition to exception state, as it may be reset
+	 * between detection and return within kernel. Note that we don't
+	 * include async callbacks (passed to bpf_timer_set_callback) because
+	 * exceptions won't be propagated.
+	 */
+	if (is_callback_calling_function(meta.func_id) &&
+	    meta.func_id != BPF_FUNC_timer_set_callback) {
+		struct bpf_throw_state *ts = &env->insn_aux_data[insn_idx].throw_state;
+		/* Check for -EJUKEBOX in case exception state is clobbered by
+		 * some other program executing between bpf_get_exception and
+		 * return from helper.
+		 */
+		if (base_type(fn->ret_type) == RET_INTEGER)
+			ts->check_helper_ret_code = true;
+	}
+
 	switch (func_id) {
 	case BPF_FUNC_tail_call:
 		err = check_reference_leak(env, false);
@@ -17691,6 +17709,9 @@ static int do_misc_fixups(struct bpf_verifier_env *env)
 			continue;
 		}
 
+		if (env->insn_aux_data[i + delta].skip_patch_call_imm)
+			continue;
+
 		if (insn->imm == BPF_FUNC_get_route_realm)
 			prog->dst_needed = 1;
 		if (insn->imm == BPF_FUNC_get_prandom_u32)
@@ -18177,6 +18198,94 @@ static bool is_bpf_throw_call(struct bpf_insn *insn)
 	       insn->off == 0 && insn->imm == special_kfunc_list[KF_bpf_throw];
 }
 
+static struct bpf_prog *rewrite_bpf_call(struct bpf_verifier_env *env,
+					 int position,
+					 s32 stack_base,
+					 struct bpf_throw_state *tstate,
+					 u32 *cnt)
+{
+	s32 r0_offset = stack_base + 0 * BPF_REG_SIZE;
+	struct bpf_insn_aux_data *aux_data;
+	struct bpf_insn insn_buf[] = {
+		env->prog->insnsi[position],
+		BPF_STX_MEM(BPF_DW, BPF_REG_10, BPF_REG_0, r0_offset),
+		BPF_EMIT_CALL(bpf_get_exception),
+		BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 3),
+		BPF_MOV64_IMM(BPF_REG_0, 0),
+		BPF_MOV64_IMM(BPF_REG_0, 0),
+		BPF_EXIT_INSN(),
+		BPF_LDX_MEM(BPF_DW, BPF_REG_0, BPF_REG_10, r0_offset),
+		BPF_JMP32_IMM(BPF_JNE, BPF_REG_0, -EJUKEBOX, 3),
+		BPF_MOV64_IMM(BPF_REG_0, 0),
+		BPF_MOV64_IMM(BPF_REG_0, 0),
+		BPF_EXIT_INSN(),
+	};
+	struct bpf_prog *new_prog;
+	int type, tsubprog = -1;
+	u32 callback_start;
+	u32 call_insn_offset;
+	s32 callback_offset;
+	bool ret_code;
+
+	type = tstate->type;
+	ret_code = tstate->check_helper_ret_code;
+	if (type == BPF_THROW_OUTER)
+		insn_buf[4] = insn_buf[9] = BPF_EMIT_CALL(bpf_reset_exception);
+	if (type == BPF_THROW_INNER)
+		insn_buf[9] = BPF_EMIT_CALL(bpf_throw);
+
+	/* We need to fix offset of the pseudo call after patching.
+	 * Note: The actual call instruction is at insn_buf[0]
+	 */
+	if (bpf_pseudo_call(&insn_buf[0])) {
+		tsubprog = find_subprog(env, position + insn_buf[0].imm + 1);
+		if (WARN_ON_ONCE(tsubprog < 0))
+			return NULL;
+	}
+	/* For helpers, the code path between checking bpf_get_exception and
+	 * returning may involve invocation of tracing progs which reset
+	 * exception state, so also use the return value to invoke exception
+	 * path. Otherwise, exception event from callback is lost.
+	 */
+	if (ret_code)
+		*cnt = ARRAY_SIZE(insn_buf);
+	else
+		*cnt = ARRAY_SIZE(insn_buf) - 4;
+	new_prog = bpf_patch_insn_data(env, position, insn_buf, *cnt);
+	if (!new_prog)
+		return new_prog;
+
+	/* Note: The actual call instruction is at insn_buf[0] */
+	if (bpf_pseudo_call(&insn_buf[0])) {
+		callback_start = env->subprog_info[tsubprog].start;
+		call_insn_offset = position + 0;
+		callback_offset = callback_start - call_insn_offset - 1;
+		new_prog->insnsi[call_insn_offset].imm = callback_offset;
+	}
+
+	aux_data = env->insn_aux_data;
+	/* Note: We already patched in call at insn_buf[2], insn_buf[9]. */
+	aux_data[position + 2].skip_patch_call_imm = true;
+	if (ret_code)
+		aux_data[position + 9].skip_patch_call_imm = true;
+	/* Note: For BPF_THROW_OUTER, we already patched in call at insn_buf[4] */
+	if (type == BPF_THROW_OUTER)
+		aux_data[position + 4].skip_patch_call_imm = true;
+	return new_prog;
+}
+
+static bool is_throwing_bpf_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
+				 struct bpf_insn_aux_data *insn_aux)
+{
+	if (insn->code != (BPF_JMP | BPF_CALL))
+		return false;
+	if (insn->src_reg == BPF_PSEUDO_CALL ||
+	    insn->src_reg == BPF_PSEUDO_KFUNC_CALL ||
+	    insn->src_reg == 0)
+		return insn_aux->throw_state.type != BPF_THROW_NONE;
+	return false;
+}
+
 /* For all sub-programs in the program (including main) check
  * insn_aux_data to see if there are any instructions that need to be
  * transformed into an instruction sequence. E.g. bpf_loop calls that
@@ -18228,6 +18337,26 @@ static int do_misc_rewrites(struct bpf_verifier_env *env)
 			new_prog = rewrite_bpf_throw_call(env, i + delta, throw_state, &cnt);
 			if (!new_prog)
 				return -ENOMEM;
+		} else if (is_throwing_bpf_call(env, insn, insn_aux)) {
+			struct bpf_throw_state *throw_state = &insn_aux->throw_state;
+
+			stack_depth_extra = max_t(u16, stack_depth_extra,
+						  BPF_REG_SIZE * 1 + stack_depth_roundup);
+
+			/* The verifier was able to prove that the throwing call
+			 * was unreachable, hence it must not have been seen and
+			 * will be removed by opt_remove_dead_code.
+			 */
+			if (throw_state->type == BPF_THROW_NONE) {
+				WARN_ON_ONCE(insn_aux->seen);
+				goto skip;
+			}
+
+			new_prog = rewrite_bpf_call(env, i + delta,
+						    -(stack_depth + stack_depth_extra),
+						    throw_state, &cnt);
+			if (!new_prog)
+				return -ENOMEM;
 		}
 
 skip:
-- 
2.40.0


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH RFC bpf-next v1 5/9] bpf: Add pass to fixup global function throw information
  2023-04-05  0:42 [PATCH RFC bpf-next v1 0/9] Exceptions - 1/2 Kumar Kartikeya Dwivedi
                   ` (3 preceding siblings ...)
  2023-04-05  0:42 ` [PATCH RFC bpf-next v1 4/9] bpf: Handle throwing BPF callbacks in helpers and kfuncs Kumar Kartikeya Dwivedi
@ 2023-04-05  0:42 ` Kumar Kartikeya Dwivedi
  2023-04-05  0:42 ` [PATCH RFC bpf-next v1 6/9] bpf: Add KF_THROW annotation for kfuncs Kumar Kartikeya Dwivedi
                   ` (3 subsequent siblings)
  8 siblings, 0 replies; 30+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2023-04-05  0:42 UTC (permalink / raw)
  To: bpf
  Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Martin KaFai Lau, David Vernet

Within the do_check pass, we made a core assumption that we have correct
can_throw info about all global subprogs and simply used
mark_chain_throw without entering them to mark callsites leading up to
their call. However, the do_check_subprogs pass of the verifier is
iterative and does not propagate can_throw information across global
subprogs which call into each other. We need an extra pass through all
of them to propagate can_throw information from throwing global
subprogs into the global subprogs that call into them.

After doing this pass, do_check_main will directly use mark_chain_throw
again and have the correct information about all global subprogs which
are called by it.
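
For illustration, the problematic shape at the C level is roughly the
following (the function names are made up; both are global, i.e.
non-static, subprogs like those used in the selftests later in the
series):

	__noinline int gfunc2(struct __sk_buff *ctx)
	{
		if (ctx->len)
			bpf_throw();
		return 0;
	}

	__noinline int gfunc1(struct __sk_buff *ctx)
	{
		/* Verified before gfunc2 is known to throw; the extra pass
		 * must propagate can_throw from gfunc2 back to this callsite.
		 */
		return gfunc2(ctx);
	}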

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
---
 kernel/bpf/verifier.c | 118 +++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 117 insertions(+), 1 deletion(-)

diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 07d808b05044..acfcaadca3b6 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -13664,6 +13664,12 @@ static int check_ld_imm(struct bpf_verifier_env *env, struct bpf_insn *insn)
 			verbose(env, "missing btf func_info\n");
 			return -EINVAL;
 		}
+		/* NOTE: Do not change this directly, as we rely on only
+		 * BTF_FUNC_STATIC being allowed as BPF_PSEUDO_FUNC targets in
+		 * do_check_subprogs, see comment about propagating exception
+		 * information across global functions. When changing this, add
+		 * bpf_pseudo_func handling to the propagating loop as well.
+		 */
 		if (aux->func_info_aux[subprogno].linkage != BTF_FUNC_STATIC) {
 			verbose(env, "callback function not static\n");
 			return -EINVAL;
@@ -18491,6 +18497,110 @@ static int do_check_common(struct bpf_verifier_env *env, int subprog)
 	return ret;
 }
 
+/* We have gone through all global subprogs, and we know which ones were seen as
+ * throwing exceptions. Since calls to other global functions are not explored
+ * and we simply continue exploration at the next instruction, we may have not
+ * fully propagated can_throw information. E.g. consider the case below, where 1
+ * and 2 are verified in order.
+ *
+ * gfunc 1:
+ *	call gfunc2
+ *	exit
+ * gfunc 2:
+ *	call bpf_throw
+ *
+ * At this point, gfunc1 is not marked as throwing, but it calls gfunc2 which
+ * actually throws. The only thing we need to do is go through every global
+ * function, and propagate the information back to their callers. We only care
+ * about BPF_PSEUDO_CALL, as BPF_PSEUDO_FUNC loads cannot have global functions
+ * as targets.
+ *
+ * Logic mostly mimics check_max_stack_depth, but adjusted and simplified for
+ * our use case.
+ */
+static int fixup_global_subprog_throw_info(struct bpf_verifier_env *env)
+{
+	struct bpf_func_info_aux *func_info_aux = env->prog->aux->func_info_aux;
+	struct bpf_subprog_info *subprog = env->subprog_info;
+	int frame = 0, idx = 0, i = 0, subprog_end;
+	struct bpf_insn *insn = env->prog->insnsi;
+	int ret_insn[MAX_CALL_FRAMES];
+	int ret_prog[MAX_CALL_FRAMES];
+	bool can_throw;
+	int j, ret;
+
+	/* Start at first global subprog */
+	for (int s = 1; s < env->subprog_cnt; s++) {
+		if (func_info_aux[s].linkage != BTF_FUNC_GLOBAL)
+			continue;
+		idx = s;
+		break;
+	}
+	if (!idx)
+		return -EFAULT;
+	i = subprog[idx].start;
+continue_func:
+	can_throw = false;
+	subprog_end = subprog[idx + 1].start;
+	for (; i < subprog_end; i++) {
+		int next_insn, tidx;
+
+		if (!bpf_pseudo_call(insn + i))
+			continue;
+		/* remember insn and function to return to */
+		ret_insn[frame] = i + 1;
+		ret_prog[frame] = idx;
+
+		/* find the callee */
+		next_insn = i + insn[i].imm + 1;
+		tidx = find_subprog(env, next_insn);
+		if (tidx < 0) {
+			WARN_ONCE(1, "verifier bug. No program starts at insn %d\n", next_insn);
+			return -EFAULT;
+		}
+
+		/* Only follow global subprog calls */
+		if (func_info_aux[tidx].linkage != BTF_FUNC_GLOBAL)
+			continue;
+		/* If this subprog already throws, mark all callers and continue
+		 * with next instruction in current subprog.
+		 */
+		if (subprog[tidx].can_throw) {
+			/* Include current frame info when marking */
+			for (j = frame; j >= 0; j--) {
+				func_info_aux[ret_prog[j]].throws_exception = subprog[ret_prog[j]].can_throw = true;
+				/* Exception subprog cannot be set in global
+				 * function context, so set_throw_state_type
+				 * will always mark type as BPF_THROW_INNER
+				 * and subprog as -1.
+				 */
+				ret = set_throw_state_type(env, ret_insn[j] - 1, j, ret_prog[j]);
+				if (ret < 0)
+					return ret;
+			}
+			continue;
+		}
+		i = next_insn;
+		idx = tidx;
+		frame++;
+		if (frame >= MAX_CALL_FRAMES) {
+			verbose(env, "the call stack of %d frames is too deep !\n",
+				frame);
+			return -E2BIG;
+		}
+		goto continue_func;
+	}
+	/* end of for() loop means the last insn of the 'subprog'
+	 * was reached. Doesn't matter whether it was JA or EXIT
+	 */
+	if (frame == 0)
+		return 0;
+	frame--;
+	i = ret_insn[frame];
+	idx = ret_prog[frame];
+	goto continue_func;
+}
+
 /* Verify all global functions in a BPF program one by one based on their BTF.
  * All global functions must pass verification. Otherwise the whole program is rejected.
  * Consider:
@@ -18511,6 +18621,7 @@ static int do_check_common(struct bpf_verifier_env *env, int subprog)
 static int do_check_subprogs(struct bpf_verifier_env *env)
 {
 	struct bpf_prog_aux *aux = env->prog->aux;
+	bool does_anyone_throw = false;
 	int i, ret;
 
 	if (!aux->func_info)
@@ -18535,8 +18646,13 @@ static int do_check_subprogs(struct bpf_verifier_env *env)
 		 * opposite is fine though.
 		 */
 		aux->func_info_aux[i].throws_exception = env->subprog_info[i].can_throw;
+		if (!does_anyone_throw && env->subprog_info[i].can_throw)
+			does_anyone_throw = true;
 	}
-	return 0;
+
+	if (!does_anyone_throw)
+		return 0;
+	return fixup_global_subprog_throw_info(env);
 }
 
 static int do_check_main(struct bpf_verifier_env *env)
-- 
2.40.0


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH RFC bpf-next v1 6/9] bpf: Add KF_THROW annotation for kfuncs
  2023-04-05  0:42 [PATCH RFC bpf-next v1 0/9] Exceptions - 1/2 Kumar Kartikeya Dwivedi
                   ` (4 preceding siblings ...)
  2023-04-05  0:42 ` [PATCH RFC bpf-next v1 5/9] bpf: Add pass to fixup global function throw information Kumar Kartikeya Dwivedi
@ 2023-04-05  0:42 ` Kumar Kartikeya Dwivedi
  2023-04-05  0:42 ` [PATCH RFC bpf-next v1 7/9] bpf: Introduce bpf_set_exception_callback kfunc Kumar Kartikeya Dwivedi
                   ` (2 subsequent siblings)
  8 siblings, 0 replies; 30+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2023-04-05  0:42 UTC (permalink / raw)
  To: bpf
  Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Martin KaFai Lau, David Vernet

Add KF_THROW annotation to kfuncs to indicate that they may throw. This
is mostly for testing for now, but in the future kfuncs could use it to
throw on invalid arguments or invalid conditions, aborting the program
and simplifying the overall user experience of kfuncs for the happy
case, since callers no longer have to deal with corner cases that never
occur at runtime.
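
As a rough sketch of that future use (the kfunc name and its check are
hypothetical, only meant to show the intended pattern):

	__bpf_kfunc void bpf_example_assert_in_range(u32 idx, u32 max)
	{
		/* Abort the calling program instead of returning an error
		 * code that every caller would have to check.
		 */
		if (idx >= max)
			bpf_throw();
	}

	/* registered with BTF_ID_FLAGS(func, bpf_example_assert_in_range, KF_THROW) */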

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
---
 include/linux/btf.h   |  1 +
 kernel/bpf/verifier.c | 12 ++++++++++--
 net/bpf/test_run.c    | 12 ++++++++++++
 3 files changed, 23 insertions(+), 2 deletions(-)

diff --git a/include/linux/btf.h b/include/linux/btf.h
index d53b10cc55f2..8dfa4113822b 100644
--- a/include/linux/btf.h
+++ b/include/linux/btf.h
@@ -75,6 +75,7 @@
 #define KF_ITER_NEW     (1 << 8) /* kfunc implements BPF iter constructor */
 #define KF_ITER_NEXT    (1 << 9) /* kfunc implements BPF iter next method */
 #define KF_ITER_DESTROY (1 << 10) /* kfunc implements BPF iter destructor */
+#define KF_THROW	(1 << 11) /* kfunc may throw a BPF exception */
 
 /*
  * Tag marking a kernel function as a kfunc. This is meant to minimize the
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index acfcaadca3b6..b9f4b1849647 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -9454,6 +9454,11 @@ static bool is_kfunc_arg_kptr_get(struct bpf_kfunc_call_arg_meta *meta, int arg)
 	return arg == 0 && (meta->kfunc_flags & KF_KPTR_GET);
 }
 
+static bool is_kfunc_throwing(struct bpf_kfunc_call_arg_meta *meta)
+{
+	return meta->kfunc_flags & KF_THROW;
+}
+
 static bool __kfunc_param_match_suffix(const struct btf *btf,
 				       const struct btf_param *arg,
 				       const char *suffix)
@@ -10813,11 +10818,14 @@ static int check_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
 		}
 	}
 
-	if (meta.btf == btf_vmlinux && meta.func_id == special_kfunc_list[KF_bpf_throw]) {
+	if (is_kfunc_throwing(&meta) ||
+	    (meta.btf == btf_vmlinux && meta.func_id == special_kfunc_list[KF_bpf_throw])) {
 		err = mark_chain_throw(env, insn_idx);
 		if (err < 0)
 			return err;
-		return 1;
+		/* Halt exploration only for bpf_throw */
+		if (!is_kfunc_throwing(&meta))
+			return 1;
 	}
 
 	for (i = 0; i < CALLER_SAVED_REGS; i++)
diff --git a/net/bpf/test_run.c b/net/bpf/test_run.c
index f1652f5fbd2e..31f76ee4218b 100644
--- a/net/bpf/test_run.c
+++ b/net/bpf/test_run.c
@@ -766,6 +766,16 @@ __bpf_kfunc static u32 bpf_kfunc_call_test_static_unused_arg(u32 arg, u32 unused
 	return arg;
 }
 
+__bpf_kfunc notrace void bpf_kfunc_call_test_always_throws(void)
+{
+	bpf_throw();
+}
+
+__bpf_kfunc notrace void bpf_kfunc_call_test_never_throws(void)
+{
+	return;
+}
+
 __diag_pop();
 
 BTF_SET8_START(bpf_test_modify_return_ids)
@@ -806,6 +816,8 @@ BTF_ID_FLAGS(func, bpf_kfunc_call_test_ref, KF_TRUSTED_ARGS | KF_RCU)
 BTF_ID_FLAGS(func, bpf_kfunc_call_test_destructive, KF_DESTRUCTIVE)
 BTF_ID_FLAGS(func, bpf_kfunc_call_test_static_unused_arg)
 BTF_ID_FLAGS(func, bpf_kfunc_call_test_offset)
+BTF_ID_FLAGS(func, bpf_kfunc_call_test_always_throws, KF_THROW)
+BTF_ID_FLAGS(func, bpf_kfunc_call_test_never_throws, KF_THROW)
 BTF_SET8_END(test_sk_check_kfunc_ids)
 
 static void *bpf_test_init(const union bpf_attr *kattr, u32 user_size,
-- 
2.40.0


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH RFC bpf-next v1 7/9] bpf: Introduce bpf_set_exception_callback kfunc
  2023-04-05  0:42 [PATCH RFC bpf-next v1 0/9] Exceptions - 1/2 Kumar Kartikeya Dwivedi
                   ` (5 preceding siblings ...)
  2023-04-05  0:42 ` [PATCH RFC bpf-next v1 6/9] bpf: Add KF_THROW annotation for kfuncs Kumar Kartikeya Dwivedi
@ 2023-04-05  0:42 ` Kumar Kartikeya Dwivedi
  2023-04-05  0:42 ` [PATCH RFC bpf-next v1 8/9] bpf: Introduce BPF assertion macros Kumar Kartikeya Dwivedi
  2023-04-05  0:42 ` [PATCH RFC bpf-next v1 9/9] selftests/bpf: Add tests for BPF exceptions Kumar Kartikeya Dwivedi
  8 siblings, 0 replies; 30+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2023-04-05  0:42 UTC (permalink / raw)
  To: bpf
  Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Martin KaFai Lau, David Vernet

This patch allows the BPF program to queue the invocation of a callback
after an exception has been thrown and the BPF runtime has completed the
unwinding of the program stack. This is the last point before the
program returns control back to the kernel.

In the earlier patches, by default, whenever an exception is thrown, we
return 0 to the kernel context. However, this is not always desirable
and the program may want to customize the default return code or perform
some program-specific action when an exception is thrown. This commit
sets up the infrastructure to allow this. The semantics and
implementation notes are as follows (a minimal usage sketch follows the
notes).

- The program may use bpf_set_exception_callback to set up the exception
  callback only once.
- Since this callback is invoked during unwinding, it cannot throw
  itself.
- The exception state has been reset before it is invoked, hence any
  program attachments (fentry/fexit) are allowed as usual. The check that
  disallows attaching a throwing fexit program to a throwing program is
  relaxed to consider the specific subprog it is attaching to.
- Exception callbacks are disallowed to call bpf_set_exception_callback.
- bpf_set_exception_callback may not be called in global functions or
  BPF_PROG_TYPE_EXT (because they propagate exception outwards and do
  not perform the final exit back to the kernel, which is where the
  callback call is inserted). This can be supported in the future (by
  verifying that all paths which throw have the same callback set, and
  remember this in the global subprog/ext prog summary in func_info_aux,
  but is skipped for simplicity).
- For a given outermost throwing instruction, it cannot see different
  exception callbacks from different paths, since it has to hardcode one
  during the rewrite phase.
- From stack depth check point of view, async and exception callbacks
  are similar in the sense that they don't contribute to current
  callchain's stack_depth, but the sum of the main subprog's stack depth
  and the exception callback's stack depth must stay within
  MAX_BPF_STACK. The variable in subprog_info is named to reflect
  this combined meaning, and appropriate handling is introduced.
- frame->in_exception_callback_fn is used to subject the exception
  callback to the same return code checks as the BPF program's main
  exit, so that it can return a meaningful return code based on the
  program type and is not limited to some fixed callback return range.
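
A minimal usage sketch (assuming the usual vmlinux.h, bpf_helpers.h and
bpf_experimental.h includes used by the selftests later in the series):

	static __noinline int exception_cb(void)
	{
		/* Becomes the program's return value once an exception is
		 * thrown after this callback has been set.
		 */
		return 16;
	}

	SEC("tc")
	int prog(struct __sk_buff *ctx)
	{
		bpf_set_exception_callback(exception_cb);
		if (!ctx->len)
			bpf_throw();
		return 0;
	}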

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
---
 include/linux/bpf_verifier.h                  |   6 +-
 kernel/bpf/helpers.c                          |   6 +
 kernel/bpf/verifier.c                         | 184 ++++++++++++++++--
 .../testing/selftests/bpf/bpf_experimental.h  |  11 ++
 4 files changed, 188 insertions(+), 19 deletions(-)

diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h
index a5346a2b7e68..301318ed04f5 100644
--- a/include/linux/bpf_verifier.h
+++ b/include/linux/bpf_verifier.h
@@ -293,6 +293,7 @@ struct bpf_func_state {
 	bool in_callback_fn;
 	struct tnum callback_ret_range;
 	bool in_async_callback_fn;
+	bool in_exception_callback_fn;
 
 	/* The following fields should be last. See copy_func_state() */
 	int acquired_refs;
@@ -370,6 +371,7 @@ struct bpf_verifier_state {
 	struct bpf_active_lock active_lock;
 	bool speculative;
 	bool active_rcu_lock;
+	s32 exception_callback_subprog;
 
 	/* first and last insn idx of this verifier state */
 	u32 first_insn_idx;
@@ -439,6 +441,7 @@ enum {
 struct bpf_throw_state {
 	int type;
 	bool check_helper_ret_code;
+	s32 subprog;
 };
 
 /* Possible states for alu_state member. */
@@ -549,7 +552,8 @@ struct bpf_subprog_info {
 	bool has_tail_call;
 	bool tail_call_reachable;
 	bool has_ld_abs;
-	bool is_async_cb;
+	bool is_async_or_exception_cb;
+	bool is_exception_cb;
 	bool can_throw;
 };
 
diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
index 82db3a64fa3f..e6f15da8f154 100644
--- a/kernel/bpf/helpers.c
+++ b/kernel/bpf/helpers.c
@@ -2322,6 +2322,11 @@ __bpf_kfunc notrace void bpf_throw(void)
 	current->bpf_exception_thrown[i] = true;
 }
 
+__bpf_kfunc notrace void bpf_set_exception_callback(int (*cb)(void))
+{
+	WARN_ON_ONCE(1);
+}
+
 __diag_pop();
 
 BTF_SET8_START(generic_btf_ids)
@@ -2349,6 +2354,7 @@ BTF_ID_FLAGS(func, bpf_cgroup_from_id, KF_ACQUIRE | KF_RET_NULL)
 #endif
 BTF_ID_FLAGS(func, bpf_task_from_pid, KF_ACQUIRE | KF_RET_NULL)
 BTF_ID_FLAGS(func, bpf_throw)
+BTF_ID_FLAGS(func, bpf_set_exception_callback)
 BTF_SET8_END(generic_btf_ids)
 
 static const struct btf_kfunc_id_set generic_kfunc_set = {
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index b9f4b1849647..5015abf246b1 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -1736,6 +1736,7 @@ static int copy_verifier_state(struct bpf_verifier_state *dst_state,
 	}
 	dst_state->speculative = src->speculative;
 	dst_state->active_rcu_lock = src->active_rcu_lock;
+	dst_state->exception_callback_subprog = src->exception_callback_subprog;
 	dst_state->curframe = src->curframe;
 	dst_state->active_lock.ptr = src->active_lock.ptr;
 	dst_state->active_lock.id = src->active_lock.id;
@@ -5178,10 +5179,16 @@ static int check_max_stack_depth(struct bpf_verifier_env *env)
 				  next_insn);
 			return -EFAULT;
 		}
-		if (subprog[idx].is_async_cb) {
+		if (subprog[idx].is_async_or_exception_cb) {
 			if (subprog[idx].has_tail_call) {
-				verbose(env, "verifier bug. subprog has tail_call and async cb\n");
+				verbose(env, "verifier bug. subprog has tail_call and async or exception cb\n");
 				return -EFAULT;
+			}
+			if (subprog[idx].is_exception_cb) {
+				if (subprog[0].stack_depth + subprog[idx].stack_depth > MAX_BPF_STACK) {
+					verbose(env, "combined stack size of main and exception calls is %d. Too large\n", depth);
+					return -EACCES;
+				}
 			}
 			 /* async callbacks don't increase bpf prog stack size */
 			continue;
@@ -8203,6 +8210,7 @@ static int set_callee_state(struct bpf_verifier_env *env,
 			    struct bpf_func_state *callee, int insn_idx);
 
 static bool is_callback_calling_kfunc(u32 btf_id);
+static bool is_set_exception_cb_kfunc(struct bpf_insn *insn);
 static int mark_chain_throw(struct bpf_verifier_env *env, int insn_idx);
 
 static int __check_func_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
@@ -8279,13 +8287,16 @@ static int __check_func_call(struct bpf_verifier_env *env, struct bpf_insn *insn
 		}
 	}
 
-	if (insn->code == (BPF_JMP | BPF_CALL) &&
-	    insn->src_reg == 0 &&
-	    insn->imm == BPF_FUNC_timer_set_callback) {
+	if ((insn->code == (BPF_JMP | BPF_CALL) &&
+	     insn->src_reg == 0 &&
+	     insn->imm == BPF_FUNC_timer_set_callback) ||
+	     is_set_exception_cb_kfunc(insn)) {
 		struct bpf_verifier_state *async_cb;
 
 		/* there is no real recursion here. timer callbacks are async */
-		env->subprog_info[subprog].is_async_cb = true;
+		env->subprog_info[subprog].is_async_or_exception_cb = true;
+		if (is_set_exception_cb_kfunc(insn))
+			env->subprog_info[subprog].is_exception_cb = true;
 		async_cb = push_async_cb(env, env->subprog_info[subprog].start,
 					 *insn_idx, subprog);
 		if (!async_cb)
@@ -8396,12 +8407,15 @@ static int set_throw_state_type(struct bpf_verifier_env *env, int insn_idx,
 				int frame, int subprog)
 {
 	struct bpf_throw_state *ts = &env->insn_aux_data[insn_idx].throw_state;
-	int type;
+	int exception_subprog, type;
 
-	if (!frame && !subprog && env->prog->type != BPF_PROG_TYPE_EXT)
+	if (!frame && !subprog && env->prog->type != BPF_PROG_TYPE_EXT) {
 		type = BPF_THROW_OUTER;
-	else
+		exception_subprog = env->cur_state->exception_callback_subprog;
+	} else {
 		type = BPF_THROW_INNER;
+		exception_subprog = -1;
+	}
 	if (ts->type != BPF_THROW_NONE) {
 		if (ts->type != type) {
 			verbose(env,
@@ -8409,8 +8423,14 @@ static int set_throw_state_type(struct bpf_verifier_env *env, int insn_idx,
 				insn_idx, ts->type, type);
 			return -EINVAL;
 		}
+		if (ts->subprog != exception_subprog) {
+			verbose(env, "different exception callback subprogs for same insn %d: %d and %d\n",
+				insn_idx, ts->subprog, exception_subprog);
+			return -EINVAL;
+		}
 	}
 	ts->type = type;
+	ts->subprog = exception_subprog;
 	return 0;
 }
 
@@ -8432,9 +8452,23 @@ static int mark_chain_throw(struct bpf_verifier_env *env, int insn_idx) {
 		ret = set_throw_state_type(env, frame[i]->callsite, i - 1, subprogno);
 		if (ret < 0)
 			return ret;
+		/* Have we seen this being used as exception cb? Reject! */
+		if (subprog[subprogno].is_exception_cb) {
+			verbose(env,
+				"subprog %d (at insn %d) is used as exception callback, cannot throw\n",
+				subprogno, subprog[subprogno].start);
+			return -EACCES;
+		}
 	}
 	/* Now mark actual instruction which caused the throw */
 	cur_subprogno = frame[state->curframe]->subprogno;
+	/* Have we seen this being used as exception cb? Reject! */
+	if (subprog[cur_subprogno].is_exception_cb) {
+		verbose(env,
+			"subprog %d (at insn %d) is used as exception callback, cannot throw\n",
+			cur_subprogno, subprog[cur_subprogno].start);
+		return -EACCES;
+	}
 	func_info_aux[cur_subprogno].throws_exception = subprog[cur_subprogno].can_throw = true;
 	return set_throw_state_type(env, insn_idx, state->curframe, cur_subprogno);
 }
@@ -8619,6 +8653,23 @@ static int set_rbtree_add_callback_state(struct bpf_verifier_env *env,
 	return 0;
 }
 
+static int set_exception_callback_state(struct bpf_verifier_env *env,
+					struct bpf_func_state *caller,
+					struct bpf_func_state *callee,
+					int insn_idx)
+{
+	/* void bpf_exception_callback(int (*cb)(void)); */
+
+	__mark_reg_not_init(env, &callee->regs[BPF_REG_1]);
+	__mark_reg_not_init(env, &callee->regs[BPF_REG_2]);
+	__mark_reg_not_init(env, &callee->regs[BPF_REG_3]);
+	__mark_reg_not_init(env, &callee->regs[BPF_REG_4]);
+	__mark_reg_not_init(env, &callee->regs[BPF_REG_5]);
+	callee->in_exception_callback_fn = true;
+	callee->callback_ret_range = tnum_range(0, 0);
+	return 0;
+}
+
 static bool is_rbtree_lock_required_kfunc(u32 btf_id);
 
 /* Are we currently verifying the callback for a rbtree helper that must
@@ -9695,6 +9746,7 @@ enum special_kfunc_type {
 	KF_bpf_dynptr_slice,
 	KF_bpf_dynptr_slice_rdwr,
 	KF_bpf_throw,
+	KF_bpf_set_exception_callback,
 };
 
 BTF_SET_START(special_kfunc_set)
@@ -9714,6 +9766,7 @@ BTF_ID(func, bpf_dynptr_from_xdp)
 BTF_ID(func, bpf_dynptr_slice)
 BTF_ID(func, bpf_dynptr_slice_rdwr)
 BTF_ID(func, bpf_throw)
+BTF_ID(func, bpf_set_exception_callback)
 BTF_SET_END(special_kfunc_set)
 
 BTF_ID_LIST(special_kfunc_list)
@@ -9735,6 +9788,7 @@ BTF_ID(func, bpf_dynptr_from_xdp)
 BTF_ID(func, bpf_dynptr_slice)
 BTF_ID(func, bpf_dynptr_slice_rdwr)
 BTF_ID(func, bpf_throw)
+BTF_ID(func, bpf_set_exception_callback)
 
 static bool is_kfunc_bpf_rcu_read_lock(struct bpf_kfunc_call_arg_meta *meta)
 {
@@ -10080,7 +10134,14 @@ static bool is_bpf_graph_api_kfunc(u32 btf_id)
 
 static bool is_callback_calling_kfunc(u32 btf_id)
 {
-	return btf_id == special_kfunc_list[KF_bpf_rbtree_add];
+	return btf_id == special_kfunc_list[KF_bpf_rbtree_add] ||
+	       btf_id == special_kfunc_list[KF_bpf_set_exception_callback];
+}
+
+static bool is_set_exception_cb_kfunc(struct bpf_insn *insn)
+{
+	return bpf_pseudo_kfunc_call(insn) && insn->off == 0 &&
+	       insn->imm == special_kfunc_list[KF_bpf_set_exception_callback];
 }
 
 static bool is_rbtree_lock_required_kfunc(u32 btf_id)
@@ -10704,6 +10765,9 @@ static int fetch_kfunc_meta(struct bpf_verifier_env *env,
 	return 0;
 }
 
+#define BPF_EXCEPTION_CB_CAN_SET (-1)
+#define BPF_EXCEPTION_CB_CANNOT_SET (-2)
+
 static int check_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
 			    int *insn_idx_p)
 {
@@ -10818,6 +10882,33 @@ static int check_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
 		}
 	}
 
+	if (meta.btf == btf_vmlinux && meta.func_id == special_kfunc_list[KF_bpf_set_exception_callback]) {
+		if (env->cur_state->exception_callback_subprog == BPF_EXCEPTION_CB_CANNOT_SET) {
+			verbose(env, "exception callback cannot be set within global function or extension program\n");
+			return -EINVAL;
+		}
+		if (env->cur_state->frame[env->cur_state->curframe]->in_exception_callback_fn) {
+			verbose(env, "exception callback cannot be set from within exception callback\n");
+			return -EINVAL;
+		}
+		/* If we didn't explore and mark can_throw yet, we will see it
+		 * when we pop_stack for the pushed async cb which gets the
+		 * is_exception_cb marking and is caught in mark_chain_throw.
+		 */
+		if (env->subprog_info[meta.subprogno].can_throw) {
+			verbose(env, "exception callback can throw, which is not allowed\n");
+			return -EINVAL;
+		}
+		err = __check_func_call(env, insn, insn_idx_p, meta.subprogno,
+					set_exception_callback_state);
+		if (err) {
+			verbose(env, "kfunc %s#%d failed callback verification\n",
+				func_name, meta.func_id);
+			return err;
+		}
+		env->cur_state->exception_callback_subprog = meta.subprogno;
+	}
+
 	if (is_kfunc_throwing(&meta) ||
 	    (meta.btf == btf_vmlinux && meta.func_id == special_kfunc_list[KF_bpf_throw])) {
 		err = mark_chain_throw(env, insn_idx);
@@ -13829,7 +13920,7 @@ static int check_return_code(struct bpf_verifier_env *env)
 	const bool is_subprog = frame->subprogno;
 
 	/* LSM and struct_ops func-ptr's return type could be "void" */
-	if (!is_subprog) {
+	if (!is_subprog || frame->in_exception_callback_fn) {
 		switch (prog_type) {
 		case BPF_PROG_TYPE_LSM:
 			if (prog->expected_attach_type == BPF_LSM_CGROUP)
@@ -13877,7 +13968,7 @@ static int check_return_code(struct bpf_verifier_env *env)
 		return 0;
 	}
 
-	if (is_subprog) {
+	if (is_subprog && !frame->in_exception_callback_fn) {
 		if (reg->type != SCALAR_VALUE) {
 			verbose(env, "At subprogram exit the register R0 is not a scalar value (%s)\n",
 				reg_type_str(env, reg->type));
@@ -15134,6 +15225,9 @@ static bool states_equal(struct bpf_verifier_env *env,
 	if (old->active_rcu_lock != cur->active_rcu_lock)
 		return false;
 
+	if (old->exception_callback_subprog != cur->exception_callback_subprog)
+		return false;
+
 	/* for states to be equal callsites have to be the same
 	 * and all frame states need to be equivalent
 	 */
@@ -17538,6 +17632,9 @@ static int fixup_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
 		 * may_access_direct_pkt_data mutates it
 		 */
 		env->seen_direct_write = seen_direct_write;
+	} else if (desc->func_id == special_kfunc_list[KF_bpf_set_exception_callback]) {
+		insn_buf[0] = BPF_JMP_IMM(BPF_JA, 0, 0, 0);
+		*cnt = 1;
 	}
 	return 0;
 }
@@ -18194,15 +18291,35 @@ static struct bpf_prog *rewrite_bpf_throw_call(struct bpf_verifier_env *env,
 {
 	struct bpf_insn insn_buf[] = {
 		env->prog->insnsi[position],
-		BPF_MOV64_IMM(BPF_REG_0, 0),
 		BPF_EXIT_INSN(),
 	};
+	struct bpf_prog *new_prog;
+	u32 callback_start;
+	u32 call_insn_offset;
+	s32 callback_offset;
+	int type, esubprog;
 
+	type = tstate->type;
+	esubprog = tstate->subprog;
 	*cnt = ARRAY_SIZE(insn_buf);
 	/* We don't need the call instruction for throws in frame 0 */
-	if (tstate->type == BPF_THROW_OUTER)
-		return bpf_patch_insn_data(env, position, insn_buf + 1, *cnt - 1);
-	return bpf_patch_insn_data(env, position, insn_buf, *cnt);
+	if (type == BPF_THROW_OUTER) {
+		/* We need to return r0 of exception callback from outermost frame */
+		if (esubprog != -1)
+			insn_buf[0] = BPF_CALL_REL(0);
+		else
+			insn_buf[0] = BPF_MOV64_IMM(BPF_REG_0, 0);
+	}
+	new_prog = bpf_patch_insn_data(env, position, insn_buf, *cnt);
+	if (!new_prog || esubprog == -1)
+		return new_prog;
+
+	callback_start = env->subprog_info[esubprog].start;
+	/* Note: insn_buf[0] is an offset of BPF_CALL_REL instruction */
+	call_insn_offset = position + 0;
+	callback_offset = callback_start - call_insn_offset - 1;
+	new_prog->insnsi[call_insn_offset].imm = callback_offset;
+	return new_prog;
 }
 
 static bool is_bpf_throw_call(struct bpf_insn *insn)
@@ -18234,17 +18351,25 @@ static struct bpf_prog *rewrite_bpf_call(struct bpf_verifier_env *env,
 		BPF_MOV64_IMM(BPF_REG_0, 0),
 		BPF_EXIT_INSN(),
 	};
+	int type, tsubprog = -1, esubprog;
 	struct bpf_prog *new_prog;
-	int type, tsubprog = -1;
 	u32 callback_start;
 	u32 call_insn_offset;
 	s32 callback_offset;
 	bool ret_code;
 
 	type = tstate->type;
+	esubprog = tstate->subprog;
 	ret_code = tstate->check_helper_ret_code;
-	if (type == BPF_THROW_OUTER)
+	if (type == BPF_THROW_OUTER) {
 		insn_buf[4] = insn_buf[9] = BPF_EMIT_CALL(bpf_reset_exception);
+		/* Note that we allow progs to attach to exception callbacks;
+		 * even if they do, they won't clobber any exception state that
+		 * we care about at this point.
+		 */
+		if (esubprog != -1)
+			insn_buf[5] = insn_buf[10] = BPF_CALL_REL(0);
+	}
 	if (type == BPF_THROW_INNER)
 		insn_buf[9] = BPF_EMIT_CALL(bpf_throw);
 
@@ -18285,6 +18410,25 @@ static struct bpf_prog *rewrite_bpf_call(struct bpf_verifier_env *env,
 	/* Note: For BPF_THROW_OUTER, we already patched in call at insn_buf[4] */
 	if (type == BPF_THROW_OUTER)
 		aux_data[position + 4].skip_patch_call_imm = true;
+
+	/* Fixups for exception callback begin here */
+	if (esubprog == -1)
+		return new_prog;
+	callback_start = env->subprog_info[esubprog].start;
+
+	/* Note: insn_buf[5] is an offset of BPF_CALL_REL instruction */
+	call_insn_offset = position + 5;
+	callback_offset = callback_start - call_insn_offset - 1;
+	new_prog->insnsi[call_insn_offset].imm = callback_offset;
+
+	if (!ret_code)
+		return new_prog;
+
+	/* Note: insn_buf[10] is an offset of BPF_CALL_REL instruction */
+	call_insn_offset = position + 10;
+	callback_offset = callback_start - call_insn_offset - 1;
+	new_prog->insnsi[call_insn_offset].imm = callback_offset;
+
 	return new_prog;
 }
 
@@ -18439,6 +18583,10 @@ static int do_check_common(struct bpf_verifier_env *env, int subprog)
 		return -ENOMEM;
 	state->curframe = 0;
 	state->speculative = false;
+	if (subprog || env->prog->type == BPF_PROG_TYPE_EXT)
+		state->exception_callback_subprog = BPF_EXCEPTION_CB_CANNOT_SET;
+	else
+		state->exception_callback_subprog = BPF_EXCEPTION_CB_CAN_SET;
 	state->branches = 1;
 	state->frame[0] = kzalloc(sizeof(struct bpf_func_state), GFP_KERNEL);
 	if (!state->frame[0]) {
diff --git a/tools/testing/selftests/bpf/bpf_experimental.h b/tools/testing/selftests/bpf/bpf_experimental.h
index d5de9251e775..a9c75270e49b 100644
--- a/tools/testing/selftests/bpf/bpf_experimental.h
+++ b/tools/testing/selftests/bpf/bpf_experimental.h
@@ -98,4 +98,15 @@ extern struct bpf_rb_node *bpf_rbtree_first(struct bpf_rb_root *root) __ksym;
  */
 extern void bpf_throw(void) __attribute__((noreturn)) __ksym;
 
+/*
+ * Description
+ *  Set the callback which will be invoked after an exception is thrown and the
+ *  eBPF runtime has completely unwound the program stack. The return value of
+ *  this callback is treated as the return value of the program when the
+ *  exception is thrown.
+ * Returns
+ *  Void
+ */
+extern void bpf_set_exception_callback(int (*)(void)) __ksym;
+
 #endif
-- 
2.40.0


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH RFC bpf-next v1 8/9] bpf: Introduce BPF assertion macros
  2023-04-05  0:42 [PATCH RFC bpf-next v1 0/9] Exceptions - 1/2 Kumar Kartikeya Dwivedi
                   ` (6 preceding siblings ...)
  2023-04-05  0:42 ` [PATCH RFC bpf-next v1 7/9] bpf: Introduce bpf_set_exception_callback kfunc Kumar Kartikeya Dwivedi
@ 2023-04-05  0:42 ` Kumar Kartikeya Dwivedi
  2023-04-05  0:42 ` [PATCH RFC bpf-next v1 9/9] selftests/bpf: Add tests for BPF exceptions Kumar Kartikeya Dwivedi
  8 siblings, 0 replies; 30+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2023-04-05  0:42 UTC (permalink / raw)
  To: bpf
  Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Martin KaFai Lau, David Vernet

Implement macros that assert a condition on a given register and
constant value in order to prove that condition to the verifier, and
safely abort the program in case the condition is not true. The verifier
can still perform dead code elimination of the bpf_throw call if it can
actually prove the condition based on the seen data flow during path
exploration, and the function may not be marked as throwing.
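
A brief usage sketch (the program and the bound are made up; it assumes
the same includes as the exceptions selftests in the next patch):

	SEC("tc")
	int prog(struct __sk_buff *ctx)
	{
		volatile int len = ctx->len;

		/* Emits "if len < 256 goto +1; call bpf_throw": the program
		 * aborts unless len < 256. The verifier may still eliminate
		 * the bpf_throw call if it can prove the condition.
		 */
		bpf_assert_lt(len, 256);
		return len;
	}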

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
---
 tools/testing/selftests/bpf/bpf_experimental.h | 17 +++++++++++++++++
 1 file changed, 17 insertions(+)

diff --git a/tools/testing/selftests/bpf/bpf_experimental.h b/tools/testing/selftests/bpf/bpf_experimental.h
index a9c75270e49b..aae358a8db4b 100644
--- a/tools/testing/selftests/bpf/bpf_experimental.h
+++ b/tools/testing/selftests/bpf/bpf_experimental.h
@@ -109,4 +109,21 @@ extern void bpf_throw(void) __attribute__((noreturn)) __ksym;
  */
 extern void bpf_set_exception_callback(int (*)(void)) __ksym;
 
+#define __bpf_assert_op(LHS, op, RHS)								   \
+	_Static_assert(sizeof(&(LHS)), "1st argument must be an lvalue expression");		   \
+	_Static_assert(__builtin_constant_p((RHS)), "2nd argument must be a constant expression"); \
+	asm volatile ("if %[lhs] " op " %[rhs] goto +1; call bpf_throw"				   \
+		      : : [lhs] "r"(LHS) , [rhs] "i"(RHS) :)
+
+#define bpf_assert_eq(LHS, RHS) __bpf_assert_op(LHS, "==", RHS)
+#define bpf_assert_ne(LHS, RHS) __bpf_assert_op(LHS, "!=", RHS)
+#define bpf_assert_lt(LHS, RHS) __bpf_assert_op(LHS, "<", RHS)
+#define bpf_assert_gt(LHS, RHS) __bpf_assert_op(LHS, ">", RHS)
+#define bpf_assert_le(LHS, RHS) __bpf_assert_op(LHS, "<=", RHS)
+#define bpf_assert_ge(LHS, RHS) __bpf_assert_op(LHS, ">=", RHS)
+#define bpf_assert_slt(LHS, RHS) __bpf_assert_op(LHS, "s<", RHS)
+#define bpf_assert_sgt(LHS, RHS) __bpf_assert_op(LHS, "s>", RHS)
+#define bpf_assert_sle(LHS, RHS) __bpf_assert_op(LHS, "s<=", RHS)
+#define bpf_assert_sge(LHS, RHS) __bpf_assert_op(LHS, "s>=", RHS)
+
 #endif
-- 
2.40.0


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH RFC bpf-next v1 9/9] selftests/bpf: Add tests for BPF exceptions
  2023-04-05  0:42 [PATCH RFC bpf-next v1 0/9] Exceptions - 1/2 Kumar Kartikeya Dwivedi
                   ` (7 preceding siblings ...)
  2023-04-05  0:42 ` [PATCH RFC bpf-next v1 8/9] bpf: Introduce BPF assertion macros Kumar Kartikeya Dwivedi
@ 2023-04-05  0:42 ` Kumar Kartikeya Dwivedi
  2023-04-06  2:38   ` Alexei Starovoitov
  8 siblings, 1 reply; 30+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2023-04-05  0:42 UTC (permalink / raw)
  To: bpf
  Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Martin KaFai Lau, David Vernet

Add selftests to cover success and failure cases of API usage, runtime
behavior and invariants that need to be maintained for implementation
correctness.

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
---
 .../selftests/bpf/prog_tests/exceptions.c     | 240 ++++++++++++++++
 .../testing/selftests/bpf/progs/exceptions.c  | 218 ++++++++++++++
 .../selftests/bpf/progs/exceptions_ext.c      |  42 +++
 .../selftests/bpf/progs/exceptions_fail.c     | 267 ++++++++++++++++++
 4 files changed, 767 insertions(+)
 create mode 100644 tools/testing/selftests/bpf/prog_tests/exceptions.c
 create mode 100644 tools/testing/selftests/bpf/progs/exceptions.c
 create mode 100644 tools/testing/selftests/bpf/progs/exceptions_ext.c
 create mode 100644 tools/testing/selftests/bpf/progs/exceptions_fail.c

diff --git a/tools/testing/selftests/bpf/prog_tests/exceptions.c b/tools/testing/selftests/bpf/prog_tests/exceptions.c
new file mode 100644
index 000000000000..342f44a12c65
--- /dev/null
+++ b/tools/testing/selftests/bpf/prog_tests/exceptions.c
@@ -0,0 +1,240 @@
+// SPDX-License-Identifier: GPL-2.0
+#include <test_progs.h>
+#include <network_helpers.h>
+
+#include "exceptions.skel.h"
+#include "exceptions_ext.skel.h"
+#include "exceptions_fail.skel.h"
+
+static char log_buf[1024 * 1024];
+
+static void test_exceptions_failure(void)
+{
+	RUN_TESTS(exceptions_fail);
+}
+
+static void test_exceptions_success(void)
+{
+	LIBBPF_OPTS(bpf_test_run_opts, ropts,
+		.data_in = &pkt_v4,
+		.data_size_in = sizeof(pkt_v4),
+		.repeat = 1,
+	);
+	struct exceptions_ext *eskel = NULL;
+	struct exceptions *skel;
+	int ret;
+
+	skel = exceptions__open_and_load();
+	if (!ASSERT_OK_PTR(skel, "exceptions__open_and_load"))
+		return;
+
+#define RUN_SUCCESS(_prog, return_val)						  \
+	ret = bpf_prog_test_run_opts(bpf_program__fd(skel->progs._prog), &ropts); \
+	ASSERT_OK(ret, #_prog " prog run ret");					  \
+	ASSERT_EQ(ropts.retval, return_val, #_prog " prog run retval");
+
+	RUN_SUCCESS(exception_throw_subprog, 16);
+	RUN_SUCCESS(exception_throw, 0);
+	RUN_SUCCESS(exception_throw_gfunc1, 1);
+	RUN_SUCCESS(exception_throw_gfunc2, 0);
+	RUN_SUCCESS(exception_throw_gfunc3, 1);
+	RUN_SUCCESS(exception_throw_gfunc4, 0);
+	RUN_SUCCESS(exception_throw_gfunc5, 1);
+	RUN_SUCCESS(exception_throw_gfunc6, 16);
+	RUN_SUCCESS(exception_throw_func1, 1);
+	RUN_SUCCESS(exception_throw_func2, 0);
+	RUN_SUCCESS(exception_throw_func3, 1);
+	RUN_SUCCESS(exception_throw_func4, 0);
+	RUN_SUCCESS(exception_throw_func5, 1);
+	RUN_SUCCESS(exception_throw_func6, 16);
+	RUN_SUCCESS(exception_throw_cb1, 0);
+	RUN_SUCCESS(exception_throw_cb2, 16);
+	RUN_SUCCESS(exception_throw_cb_diff, 16);
+	RUN_SUCCESS(exception_throw_kfunc1, 0);
+	RUN_SUCCESS(exception_throw_kfunc2, 1);
+
+#define RUN_EXT(load_ret, attach_err, expr, msg)				  \
+	{									  \
+		LIBBPF_OPTS(bpf_object_open_opts, o, .kernel_log_buf = log_buf,		 \
+						     .kernel_log_size = sizeof(log_buf), \
+						     .kernel_log_level = 2);		 \
+		exceptions_ext__destroy(eskel);					  \
+		eskel = exceptions_ext__open_opts(&o);				  \
+		struct bpf_program *prog = NULL;				  \
+		struct bpf_link *link = NULL;					  \
+		if (!ASSERT_OK_PTR(eskel, "exceptions_ext__open"))		  \
+			goto done;						  \
+		(expr);								  \
+		ASSERT_OK_PTR(bpf_program__name(prog), bpf_program__name(prog));  \
+		if (!ASSERT_EQ(exceptions_ext__load(eskel), load_ret,		  \
+			       "exceptions_ext__load"))	{			  \
+			printf("%s\n", log_buf);				  \
+			goto done;						  \
+		}								  \
+		if (load_ret != 0) {						  \
+			printf("%s\n", log_buf);				  \
+			if (!ASSERT_OK_PTR(strstr(log_buf, msg), "strstr"))	  \
+				goto done;					  \
+		}								  \
+		if (!load_ret && attach_err) {					  \
+			if (!ASSERT_ERR_PTR(link = bpf_program__attach(prog), "attach err")) \
+				goto done;					  \
+		} else if (!load_ret) {						  \
+			if (!ASSERT_OK_PTR(link = bpf_program__attach(prog), "attach ok"))  \
+				goto done;					  \
+			bpf_link__destroy(link);				  \
+		}								  \
+	}
+
+	/* non-throwing fexit -> non-throwing subprog : OK */
+	RUN_EXT(0, false, ({
+		prog = eskel->progs.pfexit;
+		bpf_program__set_autoload(prog, true);
+		if (!ASSERT_OK(bpf_program__set_attach_target(prog,
+			       bpf_program__fd(skel->progs.exception_throw_subprog),
+			       "subprog"), "set_attach_target"))
+			goto done;
+	}), "");
+
+	/* throwing fexit -> non-throwing subprog : BAD */
+	RUN_EXT(0, true, ({
+		prog = eskel->progs.throwing_fexit;
+		bpf_program__set_autoload(prog, true);
+		if (!ASSERT_OK(bpf_program__set_attach_target(prog,
+			       bpf_program__fd(skel->progs.exception_throw_subprog),
+			       "subprog"), "set_attach_target"))
+			goto done;
+	}), "");
+
+	/* non-throwing fexit -> throwing subprog : OK */
+	RUN_EXT(0, false, ({
+		prog = eskel->progs.pfexit;
+		bpf_program__set_autoload(prog, true);
+		if (!ASSERT_OK(bpf_program__set_attach_target(prog,
+			       bpf_program__fd(skel->progs.exception_throw_subprog),
+			       "throwing_subprog"), "set_attach_target"))
+			goto done;
+	}), "");
+
+	/* throwing fexit -> throwing subprog : BAD */
+	RUN_EXT(0, true, ({
+		prog = eskel->progs.throwing_fexit;
+		bpf_program__set_autoload(prog, true);
+		if (!ASSERT_OK(bpf_program__set_attach_target(prog,
+			       bpf_program__fd(skel->progs.exception_throw_subprog),
+			       "throwing_subprog"), "set_attach_target"))
+			goto done;
+	}), "");
+
+	/* fmod_ret not allowed for subprog - Check so we remember to handle its
+	 * throwing specification compatibility with target when supported.
+	 */
+	RUN_EXT(-EINVAL, false, ({
+		prog = eskel->progs.pfmod_ret;
+		bpf_program__set_autoload(prog, true);
+		if (!ASSERT_OK(bpf_program__set_attach_target(prog,
+			       bpf_program__fd(skel->progs.exception_throw_subprog),
+			       "subprog"), "set_attach_target"))
+			goto done;
+	}), "can't modify return codes of BPF program");
+
+	/* fmod_ret not allowed for global subprog - Check so we remember to
+	 * handle its throwing specification compatibility with target when
+	 * supported.
+	 */
+	RUN_EXT(-EINVAL, false, ({
+		prog = eskel->progs.pfmod_ret;
+		bpf_program__set_autoload(prog, true);
+		if (!ASSERT_OK(bpf_program__set_attach_target(prog,
+			       bpf_program__fd(skel->progs.exception_throw_subprog),
+			       "global_subprog"), "set_attach_target"))
+			goto done;
+	}), "can't modify return codes of BPF program");
+
+	/* non-throwing extension -> non-throwing subprog : BAD (!global)
+	 * If/when extending static subprogs is supported, we will need to
+	 * reject throwing extensions for them, as not all callsites are
+	 * marked to handle a thrown exception.
+	 */
+	RUN_EXT(-EINVAL, true, ({
+		prog = eskel->progs.extension;
+		bpf_program__set_autoload(prog, true);
+		if (!ASSERT_OK(bpf_program__set_attach_target(prog,
+			       bpf_program__fd(skel->progs.exception_throw_subprog),
+			       "subprog"), "set_attach_target"))
+			goto done;
+	}), "subprog() is not a global function");
+
+	/* non-throwing extension -> throwing subprog : BAD (!global)
+	 * If/when extending static subprogs is supported, we will need to
+	 * reject throwing extensions for them, as not all callsites are
+	 * marked to handle a thrown exception.
+	 */
+	RUN_EXT(-EINVAL, true, ({
+		prog = eskel->progs.extension;
+		bpf_program__set_autoload(prog, true);
+		if (!ASSERT_OK(bpf_program__set_attach_target(prog,
+			       bpf_program__fd(skel->progs.exception_throw_subprog),
+			       "throwing_subprog"), "set_attach_target"))
+			goto done;
+	}), "throwing_subprog() is not a global function");
+
+	/* non-throwing extension -> non-throwing global subprog : OK */
+	RUN_EXT(0, false, ({
+		prog = eskel->progs.extension;
+		bpf_program__set_autoload(prog, true);
+		if (!ASSERT_OK(bpf_program__set_attach_target(prog,
+			       bpf_program__fd(skel->progs.exception_throw_subprog),
+			       "global_subprog"), "set_attach_target"))
+			goto done;
+	}), "");
+
+	/* non-throwing extension -> throwing global subprog : OK */
+	RUN_EXT(0, false, ({
+		prog = eskel->progs.extension;
+		bpf_program__set_autoload(prog, true);
+		if (!ASSERT_OK(bpf_program__set_attach_target(prog,
+			       bpf_program__fd(skel->progs.exception_throw_subprog),
+			       "throwing_global_subprog"), "set_attach_target"))
+			goto done;
+	}), "");
+
+	/* throwing extension -> throwing global subprog : OK */
+	RUN_EXT(0, false, ({
+		prog = eskel->progs.throwing_extension;
+		bpf_program__set_autoload(prog, true);
+		if (!ASSERT_OK(bpf_program__set_attach_target(prog,
+			       bpf_program__fd(skel->progs.exception_throw_subprog),
+			       "throwing_global_subprog"), "set_attach_target"))
+			goto done;
+	}), "");
+
+	/* throwing extension -> main subprog : BAD (OUTER vs INNER mismatch) */
+	RUN_EXT(-EINVAL, false, ({
+		prog = eskel->progs.throwing_extension;
+		bpf_program__set_autoload(prog, true);
+		if (!ASSERT_OK(bpf_program__set_attach_target(prog,
+			       bpf_program__fd(skel->progs.exception_throw_subprog),
+			       "exception_throw_subprog"), "set_attach_target"))
+			goto done;
+	}), "Cannot attach throwing extension to main subprog");
+
+	/* throwing extension -> non-throwing global subprog : BAD */
+	RUN_EXT(-EINVAL, false, ({
+		prog = eskel->progs.throwing_extension;
+		bpf_program__set_autoload(prog, true);
+		if (!ASSERT_OK(bpf_program__set_attach_target(prog,
+			       bpf_program__fd(skel->progs.exception_throw_subprog),
+			       "global_subprog"), "set_attach_target"))
+			goto done;
+	}), "Cannot attach throwing extension to non-throwing subprog");
+done:
+	exceptions_ext__destroy(eskel);
+	exceptions__destroy(skel);
+}
+
+void test_exceptions(void)
+{
+	test_exceptions_failure();
+	test_exceptions_success();
+}
diff --git a/tools/testing/selftests/bpf/progs/exceptions.c b/tools/testing/selftests/bpf/progs/exceptions.c
new file mode 100644
index 000000000000..9a33f88e7e2c
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/exceptions.c
@@ -0,0 +1,218 @@
+// SPDX-License-Identifier: GPL-2.0
+#include <vmlinux.h>
+#include <bpf/bpf_tracing.h>
+#include <bpf/bpf_helpers.h>
+#include <bpf/bpf_core_read.h>
+#include "bpf_misc.h"
+#include "bpf_experimental.h"
+
+SEC("tc")
+int exception_throw(struct __sk_buff *ctx)
+{
+	if (ctx->data)
+		bpf_throw();
+	return 1;
+}
+
+
+static __noinline int subprog(struct __sk_buff *ctx)
+{
+	return ctx->len;
+}
+
+static __noinline int throwing_subprog(struct __sk_buff *ctx)
+{
+	if (ctx)
+		bpf_throw();
+	return 0;
+}
+
+__noinline int global_subprog(struct __sk_buff *ctx)
+{
+	return subprog(ctx) + 1;
+}
+
+__noinline int throwing_global_subprog(struct __sk_buff *ctx)
+{
+	if (ctx)
+		bpf_throw();
+	return 0;
+}
+
+static __noinline int exception_cb(void)
+{
+	return 16;
+}
+
+SEC("tc")
+int exception_throw_subprog(struct __sk_buff *ctx)
+{
+	volatile int i;
+
+	exception_cb();
+	bpf_set_exception_callback(exception_cb);
+	i = subprog(ctx);
+	i += global_subprog(ctx) - 1;
+	if (!i)
+		return throwing_global_subprog(ctx);
+	else
+		return throwing_subprog(ctx);
+	bpf_throw();
+	return 0;
+}
+
+__noinline int throwing_gfunc(volatile int i)
+{
+	bpf_assert_eq(i, 0);
+	return 1;
+}
+
+__noinline static int throwing_func(volatile int i)
+{
+	bpf_assert_lt(i, 1);
+	return 1;
+}
+
+SEC("tc")
+int exception_throw_gfunc1(void *ctx)
+{
+	return throwing_gfunc(0);
+}
+
+SEC("tc")
+__noinline int exception_throw_gfunc2()
+{
+	return throwing_gfunc(1);
+}
+
+__noinline int throwing_gfunc_2(volatile int i)
+{
+	return throwing_gfunc(i);
+}
+
+SEC("tc")
+int exception_throw_gfunc3(void *ctx)
+{
+	return throwing_gfunc_2(0);
+}
+
+SEC("tc")
+int exception_throw_gfunc4(void *ctx)
+{
+	return throwing_gfunc_2(1);
+}
+
+SEC("tc")
+int exception_throw_gfunc5(void *ctx)
+{
+	bpf_set_exception_callback(exception_cb);
+	return throwing_gfunc_2(0);
+}
+
+SEC("tc")
+int exception_throw_gfunc6(void *ctx)
+{
+	bpf_set_exception_callback(exception_cb);
+	return throwing_gfunc_2(1);
+}
+
+
+SEC("tc")
+int exception_throw_func1(void *ctx)
+{
+	return throwing_func(0);
+}
+
+SEC("tc")
+int exception_throw_func2(void *ctx)
+{
+	return throwing_func(1);
+}
+
+__noinline static int throwing_func_2(volatile int i)
+{
+	return throwing_func(i);
+}
+
+SEC("tc")
+int exception_throw_func3(void *ctx)
+{
+	return throwing_func_2(0);
+}
+
+SEC("tc")
+int exception_throw_func4(void *ctx)
+{
+	return throwing_func_2(1);
+}
+
+SEC("tc")
+int exception_throw_func5(void *ctx)
+{
+	bpf_set_exception_callback(exception_cb);
+	return throwing_func_2(0);
+}
+
+SEC("tc")
+int exception_throw_func6(void *ctx)
+{
+	bpf_set_exception_callback(exception_cb);
+	return throwing_func_2(1);
+}
+
+__noinline static int loop_cb1(u32 index, int *ctx)
+{
+	bpf_throw();
+	return 0;
+}
+
+__noinline static int loop_cb2(u32 index, int *ctx)
+{
+	bpf_throw();
+	return 0;
+}
+
+SEC("tc")
+int exception_throw_cb1(struct __sk_buff *ctx)
+{
+	bpf_loop(5, loop_cb1, NULL, 0);
+	return 1;
+}
+
+SEC("tc")
+int exception_throw_cb2(struct __sk_buff *ctx)
+{
+	bpf_set_exception_callback(exception_cb);
+	bpf_loop(5, loop_cb1, NULL, 0);
+	return 0;
+}
+
+SEC("tc")
+int exception_throw_cb_diff(struct __sk_buff *ctx)
+{
+	bpf_set_exception_callback(exception_cb);
+	if (ctx->protocol)
+		bpf_loop(5, loop_cb1, NULL, 0);
+	else
+		bpf_loop(5, loop_cb2, NULL, 0);
+	return 1;
+}
+
+extern void bpf_kfunc_call_test_always_throws(void) __ksym;
+extern void bpf_kfunc_call_test_never_throws(void) __ksym;
+
+SEC("tc")
+int exception_throw_kfunc1(struct __sk_buff *ctx)
+{
+	bpf_kfunc_call_test_always_throws();
+	return 1;
+}
+
+SEC("tc")
+int exception_throw_kfunc2(struct __sk_buff *ctx)
+{
+	bpf_kfunc_call_test_never_throws();
+	return 1;
+}
+
+char _license[] SEC("license") = "GPL";
diff --git a/tools/testing/selftests/bpf/progs/exceptions_ext.c b/tools/testing/selftests/bpf/progs/exceptions_ext.c
new file mode 100644
index 000000000000..d3b9e32681ec
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/exceptions_ext.c
@@ -0,0 +1,42 @@
+// SPDX-License-Identifier: GPL-2.0
+#include <vmlinux.h>
+#include <bpf/bpf_helpers.h>
+#include "bpf_experimental.h"
+
+SEC("?freplace")
+int extension(struct __sk_buff *ctx)
+{
+	return 0;
+}
+
+SEC("?freplace")
+int throwing_extension(struct __sk_buff *ctx)
+{
+	bpf_throw();
+}
+
+SEC("?fexit")
+int pfexit(void *ctx)
+{
+	return 0;
+}
+
+SEC("?fexit")
+int throwing_fexit(void *ctx)
+{
+	bpf_throw();
+}
+
+SEC("?fmod_ret")
+int pfmod_ret(void *ctx)
+{
+	return 1;
+}
+
+SEC("?fmod_ret")
+int throwing_fmod_ret(void *ctx)
+{
+	bpf_throw();
+}
+
+char _license[] SEC("license") = "GPL";
diff --git a/tools/testing/selftests/bpf/progs/exceptions_fail.c b/tools/testing/selftests/bpf/progs/exceptions_fail.c
new file mode 100644
index 000000000000..d8459c3840e2
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/exceptions_fail.c
@@ -0,0 +1,267 @@
+// SPDX-License-Identifier: GPL-2.0
+#include <vmlinux.h>
+#include <bpf/bpf_tracing.h>
+#include <bpf/bpf_helpers.h>
+#include <bpf/bpf_core_read.h>
+
+#include "bpf_misc.h"
+#include "bpf_experimental.h"
+
+extern void bpf_rcu_read_lock(void) __ksym;
+
+#define private(name) SEC(".bss." #name) __hidden __attribute__((aligned(8)))
+
+struct foo {
+	struct bpf_rb_node node;
+};
+
+private(A) struct bpf_spin_lock lock;
+private(A) struct bpf_rb_root rbtree __contains(foo, node);
+
+__noinline static int subprog_lock(struct __sk_buff *ctx)
+{
+	bpf_spin_lock(&lock);
+	if (ctx->len)
+		bpf_throw();
+	return 0;
+}
+
+SEC("?tc")
+__failure __msg("function calls are not allowed while holding a lock")
+int reject_with_lock(void *ctx)
+{
+	bpf_spin_lock(&lock);
+	bpf_throw();
+}
+
+SEC("?tc")
+__failure __msg("function calls are not allowed while holding a lock")
+int reject_subprog_with_lock(void *ctx)
+{
+	return subprog_lock(ctx);
+}
+
+SEC("?tc")
+__failure __msg("bpf_rcu_read_unlock is missing")
+int reject_with_rcu_read_lock(void *ctx)
+{
+	bpf_rcu_read_lock();
+	bpf_throw();
+}
+
+__noinline static int throwing_subprog(struct __sk_buff *ctx)
+{
+	if (ctx->len)
+		bpf_throw();
+	return 0;
+}
+
+SEC("?tc")
+__failure __msg("bpf_rcu_read_unlock is missing")
+int reject_subprog_with_rcu_read_lock(void *ctx)
+{
+	bpf_rcu_read_lock();
+	return throwing_subprog(ctx);
+}
+
+static bool rbless(struct bpf_rb_node *n1, const struct bpf_rb_node *n2)
+{
+	bpf_throw();
+}
+
+SEC("?tc")
+__failure __msg("function calls are not allowed while holding a lock")
+int reject_with_rbtree_add_throw(void *ctx)
+{
+	struct foo *f;
+
+	f = bpf_obj_new(typeof(*f));
+	if (!f)
+		return 0;
+	bpf_spin_lock(&lock);
+	bpf_rbtree_add(&rbtree, &f->node, rbless);
+	return 0;
+}
+
+SEC("?tc")
+__failure __msg("Unreleased reference")
+int reject_with_reference(void *ctx)
+{
+	struct foo *f;
+
+	f = bpf_obj_new(typeof(*f));
+	if (!f)
+		return 0;
+	bpf_throw();
+}
+
+__noinline static int subprog_ref(struct __sk_buff *ctx)
+{
+	struct foo *f;
+
+	f = bpf_obj_new(typeof(*f));
+	if (!f)
+		return 0;
+	bpf_throw();
+}
+
+__noinline static int subprog_cb_ref(u32 i, void *ctx)
+{
+	bpf_throw();
+}
+
+SEC("?tc")
+__failure __msg("Unreleased reference")
+int reject_with_cb_reference(void *ctx)
+{
+	struct foo *f;
+
+	f = bpf_obj_new(typeof(*f));
+	if (!f)
+		return 0;
+	bpf_loop(5, subprog_cb_ref, NULL, 0);
+	return 0;
+}
+
+SEC("?tc")
+__failure __msg("Unreleased reference")
+int reject_with_subprog_reference(void *ctx)
+{
+	return subprog_ref(ctx) + 1;
+}
+
+static __noinline int throwing_exception_cb(void)
+{
+	int i = 0;
+
+	bpf_assert_ne(i, 0);
+	return i;
+}
+
+static __noinline int exception_cb1(void)
+{
+	int i = 0;
+
+	bpf_assert_eq(i, 0);
+	return i;
+}
+
+static __noinline int exception_cb2(void)
+{
+	int i = 0;
+
+	bpf_assert_eq(i, 0);
+	return i;
+}
+
+__noinline int throwing_exception_gfunc(void)
+{
+	return throwing_exception_cb();
+}
+
+SEC("?tc")
+__failure __msg("is used as exception callback, cannot throw")
+int reject_throwing_exception_cb_1(struct __sk_buff *ctx)
+{
+	bpf_set_exception_callback(throwing_exception_cb);
+	return 0;
+}
+
+SEC("?tc")
+__failure __msg("exception callback can throw, which is not allowed")
+int reject_throwing_exception_cb_2(struct __sk_buff *ctx)
+{
+	throwing_exception_gfunc();
+	bpf_set_exception_callback(throwing_exception_cb);
+	return 0;
+}
+
+SEC("?tc")
+__failure __msg("different exception callback subprogs for same insn 7: 2 and 1")
+int reject_throwing_exception_cb_3(struct __sk_buff *ctx)
+{
+	if (ctx->protocol)
+		bpf_set_exception_callback(exception_cb1);
+	else
+		bpf_set_exception_callback(exception_cb2);
+	bpf_throw();
+}
+
+__noinline int gfunc_set_exception_cb(void)
+{
+	bpf_set_exception_callback(exception_cb1);
+	return 0;
+}
+
+SEC("?tc")
+__failure __msg("exception callback cannot be set within global function or extension program")
+int reject_set_exception_cb_gfunc(struct __sk_buff *ctx)
+{
+	gfunc_set_exception_cb();
+	return 0;
+}
+
+static __noinline int exception_cb_rec(void)
+{
+	bpf_set_exception_callback(exception_cb_rec);
+	return 0;
+}
+
+SEC("?tc")
+__failure __msg("exception callback cannot be set from within exception callback")
+int reject_set_exception_cb_rec1(struct __sk_buff *ctx)
+{
+	bpf_set_exception_callback(exception_cb_rec);
+	return 0;
+}
+
+static __noinline int exception_cb_rec2(void);
+
+static __noinline int exception_cb_rec1(void)
+{
+	bpf_set_exception_callback(exception_cb_rec2);
+	return 0;
+}
+
+static __noinline int exception_cb_rec2(void)
+{
+	bpf_set_exception_callback(exception_cb_rec2);
+	return 0;
+}
+
+SEC("?tc")
+__failure __msg("exception callback cannot be set from within exception callback")
+int reject_set_exception_cb_rec2(struct __sk_buff *ctx)
+{
+	bpf_set_exception_callback(exception_cb_rec1);
+	return 0;
+}
+
+static __noinline int exception_cb_rec3(void)
+{
+	bpf_set_exception_callback(exception_cb1);
+	return 0;
+}
+
+SEC("?tc")
+__failure __msg("exception callback cannot be set from within exception callback")
+int reject_set_exception_cb_rec3(struct __sk_buff *ctx)
+{
+	bpf_set_exception_callback(exception_cb_rec3);
+	return 0;
+}
+
+static __noinline int exception_cb_bad_ret(void)
+{
+	return 4242;
+}
+
+SEC("?fentry/bpf_check")
+__failure __msg("At program exit the register R0 has value")
+int reject_set_exception_cb_bad_ret(void *ctx)
+{
+	bpf_set_exception_callback(exception_cb_bad_ret);
+	return 0;
+}
+
+char _license[] SEC("license") = "GPL";
-- 
2.40.0


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* Re: [PATCH RFC bpf-next v1 3/9] bpf: Implement bpf_throw kfunc
  2023-04-05  0:42 ` [PATCH RFC bpf-next v1 3/9] bpf: Implement bpf_throw kfunc Kumar Kartikeya Dwivedi
@ 2023-04-06  2:16   ` Alexei Starovoitov
  2023-04-06 23:54     ` Kumar Kartikeya Dwivedi
  0 siblings, 1 reply; 30+ messages in thread
From: Alexei Starovoitov @ 2023-04-06  2:16 UTC (permalink / raw)
  To: Kumar Kartikeya Dwivedi
  Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Martin KaFai Lau, David Vernet

On Wed, Apr 05, 2023 at 02:42:33AM +0200, Kumar Kartikeya Dwivedi wrote:
> 
> - The exception state is represented using four booleans in the
>   task_struct of current task. Each boolean corresponds to the exception
>   state for each kernel context. This allows BPF programs to be
>   interrupted and still not clobber the other's exception state.

that doesn't work for sleepable bpf progs and in RT for regular progs too.

> - The other vexing case is of recursion. If a program calls into another
>   program (e.g. call into helper which invokes tracing program
>   eventually), it may throw and clobber the current exception state. To
>   avoid this, an invariant is maintained across the implementation:
> 	Exception state is always cleared on entry and exit of the main
> 	BPF program.
>   This implies that if recursion occurs, the BPF program will clear the
>   current exception state on entry and exit. However, callbacks do not
>   do the same, because they are subprograms. The case for propagating
>   exceptions of callbacks invoked by the kernel back to the BPF program
>   is handled in the next commit. This is also the main reason to clear
>   exception state on entry, asynchronous callbacks can clobber exception
>   state even though we make sure it's always set to be 0 within the
>   kernel.
>   Anyhow, the only other thing to be kept in mind is to never allow a
>   BPF program to execute when the program is being unwinded. This
>   implies that every function involved in this path must be notrace,
>   which is the case for bpf_throw, bpf_get_exception and
>   bpf_reset_exception.

...

> +			struct bpf_insn entry_insns[] = {
> +				BPF_MOV64_REG(BPF_REG_6, BPF_REG_1),
> +				BPF_EMIT_CALL(bpf_reset_exception),
> +				BPF_MOV64_REG(BPF_REG_1, BPF_REG_6),
> +				insn[i],
> +			};

Is not correct in global bpf progs that take more than 1 argument.

How about using a scratch space in prog->aux->exception[] instead of current task?

> +notrace u64 bpf_get_exception(void)
> +{
> +	int i = interrupt_context_level();
> +
> +	return current->bpf_exception_thrown[i];
> +}

this is too slow to be acceptable.
it needs to be single load plus branch.
with prog->aux->exception approach we can achieve that.
Instead of inserting a call to bpf_get_exception() we can do load+cmp.
We probably should pass prog->aux into exception callback, so it
can know where throw came from.
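
Roughly, the rewrite after each potentially-throwing call could be (sketch
only; the exception field is hypothetical and the placeholder address/offsets
would be filled in at rewrite time):

	struct bpf_insn check_insns[] = {
		/* prog->aux is stable across bpf_patch_insn_data(), so the
		 * address of a hypothetical prog->aux->exception field can be
		 * baked in as an imm64 (placeholder 0 below)
		 */
		BPF_LD_IMM64(BPF_REG_AX, 0 /* &prog->aux->exception */),
		BPF_LDX_MEM(BPF_DW, BPF_REG_AX, BPF_REG_AX, 0),
		/* nothing thrown: skip the branch to the unwind path */
		BPF_JMP_IMM(BPF_JEQ, BPF_REG_AX, 0, 1),
		/* jump to the unwind/exception cb path goes here */
	};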

> - Rewrites happen for bpf_throw and call instructions to subprogs.
>   The instructions which are executed in the main frame of the main
>   program (thus, not global functions and extension programs, which end
>   up executing in frame > 0) need to be rewritten differently. This is
>   tracked using BPF_THROW_OUTER vs BPF_THROW_INNER. If not done, a

how about BPF_THROW_OUTER vs BPF_THROW_ANY_INNER ?
would it be more precise ?

> +__bpf_kfunc notrace void bpf_throw(void)
> +{
> +	int i = interrupt_context_level();
> +
> +	current->bpf_exception_thrown[i] = true;
> +}

I think this needs to take u64 or couple u64 args and store them
in the scratch area.
bpf_assert* macros also need a way for bpf prog to specify
the reason for the assertion.
Otherwise there won't be any way to debug what happened.
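
E.g. on the program side it could look roughly like this (sketch; the cookie
plumbing and the extra macro argument are made up here):

extern void bpf_throw(u64 cookie) __ksym;

#define bpf_assert_eq(x, y, cookie)			\
	do {						\
		if ((x) != (y))				\
			bpf_throw(cookie);		\
	} while (0)

with bpf_throw() stashing the cookie in the scratch area so the exception cb
(and whoever reads it) can tell which assertion fired.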

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH RFC bpf-next v1 4/9] bpf: Handle throwing BPF callbacks in helpers and kfuncs
  2023-04-05  0:42 ` [PATCH RFC bpf-next v1 4/9] bpf: Handle throwing BPF callbacks in helpers and kfuncs Kumar Kartikeya Dwivedi
@ 2023-04-06  2:21   ` Alexei Starovoitov
  2023-04-07  0:07     ` Kumar Kartikeya Dwivedi
  0 siblings, 1 reply; 30+ messages in thread
From: Alexei Starovoitov @ 2023-04-06  2:21 UTC (permalink / raw)
  To: Kumar Kartikeya Dwivedi
  Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Martin KaFai Lau, David Vernet

On Wed, Apr 05, 2023 at 02:42:34AM +0200, Kumar Kartikeya Dwivedi wrote:
> @@ -759,6 +759,8 @@ BPF_CALL_4(bpf_loop, u32, nr_loops, void *, callback_fn, void *, callback_ctx,
>  
>  	for (i = 0; i < nr_loops; i++) {
>  		ret = callback((u64)i, (u64)(long)callback_ctx, 0, 0, 0);
> +		if (bpf_get_exception())
> +			return -EJUKEBOX;

This is too slow.
We cannot afford a call and conditional here.
Some time ago folks tried bpf_loop() and went back to bounded loop, because
the overhead of indirect call was not acceptable.
After that we've added inlining of bpf_loop() to reduce the overhead to the minimum.
With prog->aux->exception[] approach it might be ok-ish,
but my preference would be to disallow throw in callbacks.
timer cb, rbtree_add cb are typically small.
bpf_loop cb can be big, but we have open coded iterators now.
So disabling asserts in cb-s is probably acceptable trade-off.
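
E.g. with the bpf_for() open coded iterator macro from selftests the loop body
runs in the caller's frame, so an assert there is an ordinary throw site rather
than a throw from a callback (sketch; nr is a placeholder):

	int arr[64] = {}, i;

	bpf_for(i, 0, nr) {
		bpf_assert_lt(i, 64);	/* ordinary throw site, same frame */
		arr[i] += i;		/* verifier sees i < 64 past the assert */
	}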

The choice of error name is odd, tbh.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH RFC bpf-next v1 9/9] selftests/bpf: Add tests for BPF exceptions
  2023-04-05  0:42 ` [PATCH RFC bpf-next v1 9/9] selftests/bpf: Add tests for BPF exceptions Kumar Kartikeya Dwivedi
@ 2023-04-06  2:38   ` Alexei Starovoitov
  2023-04-07  0:42     ` Kumar Kartikeya Dwivedi
  0 siblings, 1 reply; 30+ messages in thread
From: Alexei Starovoitov @ 2023-04-06  2:38 UTC (permalink / raw)
  To: Kumar Kartikeya Dwivedi
  Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Martin KaFai Lau, David Vernet

On Wed, Apr 05, 2023 at 02:42:39AM +0200, Kumar Kartikeya Dwivedi wrote:
> +static __noinline int throwing_subprog(struct __sk_buff *ctx)
> +{
> +	if (ctx)
> +		bpf_throw();
> +	return 0;
> +}
> +
> +__noinline int global_subprog(struct __sk_buff *ctx)
> +{
> +	return subprog(ctx) + 1;
> +}
> +
> +__noinline int throwing_global_subprog(struct __sk_buff *ctx)
> +{
> +	if (ctx)
> +		bpf_throw();
> +	return 0;
> +}
> +
> +static __noinline int exception_cb(void)
> +{
> +	return 16;
> +}
> +
> +SEC("tc")
> +int exception_throw_subprog(struct __sk_buff *ctx)
> +{
> +	volatile int i;
> +
> +	exception_cb();
> +	bpf_set_exception_callback(exception_cb);
> +	i = subprog(ctx);
> +	i += global_subprog(ctx) - 1;
> +	if (!i)
> +		return throwing_global_subprog(ctx);
> +	else
> +		return throwing_subprog(ctx);
> +	bpf_throw();
> +	return 0;
> +}
> +
> +__noinline int throwing_gfunc(volatile int i)
> +{
> +	bpf_assert_eq(i, 0);
> +	return 1;
> +}
> +
> +__noinline static int throwing_func(volatile int i)
> +{
> +	bpf_assert_lt(i, 1);
> +	return 1;
> +}

exception_cb() has no way of knowing which assert statement threw the exception.
How about extending a macro:
bpf_assert_eq(i, 0, MY_INT_ERR);
or
bpf_assert_eq(i, 0) {bpf_throw(MY_INT_ERR);}

bpf_throw can store it in prog->aux->exception pass the address to cb.

Also I think we shouldn't complicate the verifier with auto release of resources.
If the user really wants to assert when spin_lock is held it should be user's
job to specify what resources should be released.
Can we make it look like:

bpf_spin_lock(&lock);
bpf_assert_eq(i, 0) {
  bpf_spin_unlock(&lock);
  bpf_throw(MY_INT_ERR);
}
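
For illustration, one way the braced form could expand (sketch only, just to
show the shape; a real macro would need to handle the no-block case, dangling
else, etc.):

#define bpf_assert_eq(x, y)				\
	if (__builtin_expect((x) != (y), 0))

so that bpf_assert_eq(i, 0) { ... } becomes a plain 'if' whose body only runs
when the assertion fails.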

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH RFC bpf-next v1 3/9] bpf: Implement bpf_throw kfunc
  2023-04-06  2:16   ` Alexei Starovoitov
@ 2023-04-06 23:54     ` Kumar Kartikeya Dwivedi
  2023-04-07  2:11       ` Alexei Starovoitov
  0 siblings, 1 reply; 30+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2023-04-06 23:54 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Martin KaFai Lau, David Vernet

On Thu, Apr 06, 2023 at 04:16:22AM CEST, Alexei Starovoitov wrote:
> On Wed, Apr 05, 2023 at 02:42:33AM +0200, Kumar Kartikeya Dwivedi wrote:
> >
> > - The exception state is represented using four booleans in the
> >   task_struct of current task. Each boolean corresponds to the exception
> >   state for each kernel context. This allows BPF programs to be
> >   interrupted and still not clobber the other's exception state.
>
> that doesn't work for sleepable bpf progs and in RT for regular progs too.
>

Can you elaborate? If a sleepable program blocks, that means the task is
scheduled out, so the next program will use the other task's task_struct.
Same for preemption for normal progs (under RT or not).

Is there something else that I'm missing?

> > - The other vexing case is of recursion. If a program calls into another
> >   program (e.g. call into helper which invokes tracing program
> >   eventually), it may throw and clobber the current exception state. To
> >   avoid this, an invariant is maintained across the implementation:
> > 	Exception state is always cleared on entry and exit of the main
> > 	BPF program.
> >   This implies that if recursion occurs, the BPF program will clear the
> >   current exception state on entry and exit. However, callbacks do not
> >   do the same, because they are subprograms. The case for propagating
> >   exceptions of callbacks invoked by the kernel back to the BPF program
> >   is handled in the next commit. This is also the main reason to clear
> >   exception state on entry, asynchronous callbacks can clobber exception
> >   state even though we make sure it's always set to be 0 within the
> >   kernel.
> >   Anyhow, the only other thing to be kept in mind is to never allow a
> >   BPF program to execute when the program is being unwinded. This
> >   implies that every function involved in this path must be notrace,
> >   which is the case for bpf_throw, bpf_get_exception and
> >   bpf_reset_exception.
>
> ...
>
> > +			struct bpf_insn entry_insns[] = {
> > +				BPF_MOV64_REG(BPF_REG_6, BPF_REG_1),
> > +				BPF_EMIT_CALL(bpf_reset_exception),
> > +				BPF_MOV64_REG(BPF_REG_1, BPF_REG_6),
> > +				insn[i],
> > +			};
>
> Is not correct in global bpf progs that take more than 1 argument.
>

But this is not done for global subprogs, only for the main subprog; it only
needs to be done when we enter the program from the kernel.

> How about using a scratch space in prog->aux->exception[] instead of current task?
>

I actually had this thought. It's even better because we can hardcode the
address of the exception state right in the program (since prog->aux remains
stable during bpf_patch_insn_data). However, concurrent program invocations on
multiple CPUs don't work well with this: one program sets the state
while the other tries to check it. It can be per-CPU but then we have to disable
preemption (which cannot be done).

Unfortunately per-task state seemed like the only choice which works without
complicating things too much.

> > +notrace u64 bpf_get_exception(void)
> > +{
> > +	int i = interrupt_context_level();
> > +
> > +	return current->bpf_exception_thrown[i];
> > +}
>
> this is too slow to be acceptable.

I agree, also partly why I still marked this an RFC.

> it needs to be single load plus branch.
> with prog->aux->exception approach we can achieve that.
> Instead of inserting a call to bpf_get_exception() we can do load+cmp.
> We probably should pass prog->aux into exception callback, so it
> can know where throw came from.
>

IMO prog->aux->exception won't work either (unless I'm missing some way which
you can point out). The other option would be that we spill pointer to the
per-task exception state to the program's stack on entry, and then every check
loads the value and performs the check. It will be a load from stack, load from
memory and then a jump instruction. Still not as good as a direct load though,
which we'd have with prog->aux, but much better than the current state.
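
i.e. roughly the following per check (sketch; the stack slot, scratch register
and jump offset are placeholders):

	struct bpf_insn check_insns[] = {
		/* fp[-8] holds &current->bpf_exception_thrown[lvl],
		 * spilled once on program entry
		 */
		BPF_LDX_MEM(BPF_DW, BPF_REG_AX, BPF_REG_FP, -8),
		BPF_LDX_MEM(BPF_B, BPF_REG_AX, BPF_REG_AX, 0),
		/* offset 0 stands in for the jump to the unwind path */
		BPF_JMP_IMM(BPF_JNE, BPF_REG_AX, 0, 0),
	};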

> > - Rewrites happen for bpf_throw and call instructions to subprogs.
> >   The instructions which are executed in the main frame of the main
> >   program (thus, not global functions and extension programs, which end
> >   up executing in frame > 0) need to be rewritten differently. This is
> >   tracked using BPF_THROW_OUTER vs BPF_THROW_INNER. If not done, a
>
> how about BPF_THROW_OUTER vs BPF_THROW_ANY_INNER ?
> would it be more precise ?

I'm fine with the renaming. The only thing the type signifies is if we need to
do the rewrite for frame 0 vs frame N.

>
> > +__bpf_kfunc notrace void bpf_throw(void)
> > +{
> > +	int i = interrupt_context_level();
> > +
> > +	current->bpf_exception_thrown[i] = true;
> > +}
>
> I think this needs to take u64 or couple u64 args and store them
> in the scratch area.
> bpf_assert* macros also need a way for bpf prog to specify
> the reason for the assertion.
> Otherwise there won't be any way to debug what happened.

I agree. Should we force it to be a constant value? Then we can hardcode it in
the .text without having to save and restore it, but that might end up being a
little too restrictive?

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH RFC bpf-next v1 4/9] bpf: Handle throwing BPF callbacks in helpers and kfuncs
  2023-04-06  2:21   ` Alexei Starovoitov
@ 2023-04-07  0:07     ` Kumar Kartikeya Dwivedi
  2023-04-07  2:15       ` Alexei Starovoitov
  0 siblings, 1 reply; 30+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2023-04-07  0:07 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Martin KaFai Lau, David Vernet

On Thu, Apr 06, 2023 at 04:21:39AM CEST, Alexei Starovoitov wrote:
> On Wed, Apr 05, 2023 at 02:42:34AM +0200, Kumar Kartikeya Dwivedi wrote:
> > @@ -759,6 +759,8 @@ BPF_CALL_4(bpf_loop, u32, nr_loops, void *, callback_fn, void *, callback_ctx,
> >
> >  	for (i = 0; i < nr_loops; i++) {
> >  		ret = callback((u64)i, (u64)(long)callback_ctx, 0, 0, 0);
> > +		if (bpf_get_exception())
> > +			return -EJUKEBOX;
>
> This is too slow.
> We cannot afford a call and conditional here.

There are two more options here: have two variants, one with and without the
check (always_inline template and bpf_loop vs bpf_loop_except calling functions
which pass false/true) and dispatch to the appropriate one based on whether the
callback throws or not (so the cost is not paid for current users at all). Secondly, we
can avoid repeated calls by hoisting the call out and saving the pointer to
exception state, then it's a bit less costly.
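
For the first option, roughly (sketch; the template/wrapper names and the
may_throw parameter are made up):

static __always_inline int bpf_loop_impl(u32 nr_loops, bpf_callback_t callback,
					 void *callback_ctx, bool may_throw)
{
	u64 ret;
	u32 i;

	for (i = 0; i < nr_loops; i++) {
		ret = callback((u64)i, (u64)(long)callback_ctx, 0, 0, 0);
		if (may_throw && bpf_get_exception())
			return -EJUKEBOX;
		if (ret)
			return i + 1;
	}
	return i;
}

so the existing bpf_loop() helper passes may_throw = false and the check is
compiled out, while a bpf_loop_except() variant (used only when the verifier
sees a throwing callback) passes true.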

> Some time ago folks tried bpf_loop() and went back to bounded loop, because
> the overhead of indirect call was not acceptable.
> After that we've added inlining of bpf_loop() to make overhead to the minimum.
> With prog->aux->exception[] approach it might be ok-ish,
> but my preference would be to disallow throw in callbacks.
> timer cb, rbtree_add cb are typically small.
> bpf_loop cb can be big, but we have open coded iterators now.
> So disabling asserts in cb-s is probably acceptable trade-off.

If the only reason to avoid them is the added performance cost, we can work
towards eliminating that when bpf_throw is not used (see above). I agree that
supporting it everywhere means thinking about a lot more corner cases, but I
feel it would be less surprising if doing bpf_assert simply worked everywhere.
One of the other reasons is that if it's being used within a shared static
function that both main program and callbacks call into, it will be a bit
annoying that it doesn't work in one context.

>
> The choice of error name is odd, tbh.

I was trying to choose something that helpers will never pass back to the
program to avoid conflicts. Open to other suggestions (but it may not matter
depending on the discussion above).

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH RFC bpf-next v1 9/9] selftests/bpf: Add tests for BPF exceptions
  2023-04-06  2:38   ` Alexei Starovoitov
@ 2023-04-07  0:42     ` Kumar Kartikeya Dwivedi
  2023-04-07  2:30       ` Alexei Starovoitov
  0 siblings, 1 reply; 30+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2023-04-07  0:42 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Martin KaFai Lau, David Vernet

On Thu, Apr 06, 2023 at 04:38:09AM CEST, Alexei Starovoitov wrote:
> On Wed, Apr 05, 2023 at 02:42:39AM +0200, Kumar Kartikeya Dwivedi wrote:
> > +static __noinline int throwing_subprog(struct __sk_buff *ctx)
> > +{
> > +	if (ctx)
> > +		bpf_throw();
> > +	return 0;
> > +}
> > +
> > +__noinline int global_subprog(struct __sk_buff *ctx)
> > +{
> > +	return subprog(ctx) + 1;
> > +}
> > +
> > +__noinline int throwing_global_subprog(struct __sk_buff *ctx)
> > +{
> > +	if (ctx)
> > +		bpf_throw();
> > +	return 0;
> > +}
> > +
> > +static __noinline int exception_cb(void)
> > +{
> > +	return 16;
> > +}
> > +
> > +SEC("tc")
> > +int exception_throw_subprog(struct __sk_buff *ctx)
> > +{
> > +	volatile int i;
> > +
> > +	exception_cb();
> > +	bpf_set_exception_callback(exception_cb);
> > +	i = subprog(ctx);
> > +	i += global_subprog(ctx) - 1;
> > +	if (!i)
> > +		return throwing_global_subprog(ctx);
> > +	else
> > +		return throwing_subprog(ctx);
> > +	bpf_throw();
> > +	return 0;
> > +}
> > +
> > +__noinline int throwing_gfunc(volatile int i)
> > +{
> > +	bpf_assert_eq(i, 0);
> > +	return 1;
> > +}
> > +
> > +__noinline static int throwing_func(volatile int i)
> > +{
> > +	bpf_assert_lt(i, 1);
> > +	return 1;
> > +}
>
> exception_cb() has no way of knowing which assert statement threw the exception.
> How about extending a macro:
> bpf_assert_eq(i, 0, MY_INT_ERR);
> or
> bpf_assert_eq(i, 0) {bpf_throw(MY_INT_ERR);}
>
> bpf_throw can store it in prog->aux->exception pass the address to cb.
>

I agree and will add passing of a value that gets passed to the callback
(probably just set it in the exception state), but I don't think prog->aux will
work, see previous mails.

> Also I think we shouldn't complicate the verifier with auto release of resources.
> If the user really wants to assert when spin_lock is held it should be user's
> job to specify what resources should be released.
> Can we make it look like:
>
> bpf_spin_lock(&lock);
> bpf_assert_eq(i, 0) {
>   bpf_spin_unlock(&lock);
>   bpf_throw(MY_INT_ERR);
> }

Do you mean just locks or all resources? Then it kind of undermines the point of
having something like bpf_throw IMO. Since it's easy to insert code from the
point of throw but it's not possible to do the same in callers (unless we add a
way to 'catch' throws), so it only works for some particular cases where callers
don't hold references (or in the main subprog).

There are also other ways to go about this whole thing, like having the compiler
emit calls to intrinsics which the BPF runtime provides (or have the call
configurable through compiler switches), and it already emits the landing pad
code to release stuff and we simply receive the table of pads indexed by each
throwing instruction, perform necessary checks to ensure everything is actually
released correctly when control flow goes through them (e.g. when exploring
multiple paths through the same instruction), and unwind frame by frame. That
reduces the burden on both the verifier and user, but then it would be probably
need to be BPF C++, or have to be a new language extension for BPF C. E.g. there
was something about defer, panic, recover etc. in wg14
https://www.open-std.org/jtc1/sc22/wg14/www/docs/n2542.pdf . Having the compiler
do it is also probably easier if we want 'catch' style handlers.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH RFC bpf-next v1 3/9] bpf: Implement bpf_throw kfunc
  2023-04-06 23:54     ` Kumar Kartikeya Dwivedi
@ 2023-04-07  2:11       ` Alexei Starovoitov
  2023-04-07  2:46         ` Kumar Kartikeya Dwivedi
  0 siblings, 1 reply; 30+ messages in thread
From: Alexei Starovoitov @ 2023-04-07  2:11 UTC (permalink / raw)
  To: Kumar Kartikeya Dwivedi
  Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Martin KaFai Lau, David Vernet

On Fri, Apr 07, 2023 at 01:54:03AM +0200, Kumar Kartikeya Dwivedi wrote:
> On Thu, Apr 06, 2023 at 04:16:22AM CEST, Alexei Starovoitov wrote:
> > On Wed, Apr 05, 2023 at 02:42:33AM +0200, Kumar Kartikeya Dwivedi wrote:
> > >
> > > - The exception state is represented using four booleans in the
> > >   task_struct of current task. Each boolean corresponds to the exception
> > >   state for each kernel context. This allows BPF programs to be
> > >   interrupted and still not clobber the other's exception state.
> >
> > that doesn't work for sleepable bpf progs and in RT for regular progs too.
> >
> 
> Can you elaborate? If a sleepable program blocks, that means the task is
> scheduled out, so the next program will use the other task's task_struct.
> Same for preemption for normal progs (under RT or not).

I was worried about the case of the same task but different code paths
in the kernel with tracing prog stepping on preempted lsm.s prog.
I think your point is that in this case they've got to be in different interrupt_context_levels.
I need to think it through a bit more.

> > How about using a scratch space in prog->aux->exception[] instead of current task?
> >
> 
> I actually had this thought. It's even better because we can hardcode the
> address of the exception state right in the program (since prog->aux remains
> stable during bpf_patch_insn_data). 

exactly.

> However, concurrent program invocations on
> multiple CPUs doesn't work well with this. It's like, one program sets the state
> while the other tries to check it. 

Right. If it asserts on one cpu all other cpus will unwind as well,
since we're saying bpf_assert is for exceptions when user cannot convince
the verifier that the program is correct.
So it doesn't matter that it aborted everywhere. It's probably a good thing too.

> It can be per-CPU but then we have to disable
> preemption (which cannot be done).

I was thinking to propose a per-cpu prog->aux->exception[] area.
Sleepable and not are in migrate_disable(). bpf progs never migrate.
So we can do it, but we'd need new pseudo this_cpu_ptr instruction and
corresponding JIT support which felt like overkill.

Another idea I contemplated is to do preempt_disable() and local_irq_save()
into special field in prog->aux->exception first thing in bpf_throw()
and then re-enable everything before entering exception cb.
To avoid races with other progs, but NMI can still happen, so it's pointless.
Just non-per-cpu prog->aux->exception seems good enough.

> > > - Rewrites happen for bpf_throw and call instructions to subprogs.
> > >   The instructions which are executed in the main frame of the main
> > >   program (thus, not global functions and extension programs, which end
> > >   up executing in frame > 0) need to be rewritten differently. This is
> > >   tracked using BPF_THROW_OUTER vs BPF_THROW_INNER. If not done, a
> >
> > how about BPF_THROW_OUTER vs BPF_THROW_ANY_INNER ?
> > would it be more precise ?
> 
> I'm fine with the renaming. The only thing the type signifies is if we need to
> do the rewrite for frame 0 vs frame N.

Sure. BPF_THROW_FRAME_ZERO and BPF_THROW_FRAME_NON_ZERO also works.
Or any name that shows that 2nd includes multiple cases.

> >
> > > +__bpf_kfunc notrace void bpf_throw(void)
> > > +{
> > > +	int i = interrupt_context_level();
> > > +
> > > +	current->bpf_exception_thrown[i] = true;
> > > +}
> >
> > I think this needs to take u64 or couple u64 args and store them
> > in the scratch area.
> > bpf_assert* macros also need a way for bpf prog to specify
> > the reason for the assertion.
> > Otherwise there won't be any way to debug what happened.
> 
> I agree. Should we force it to be a constant value? Then we can hardcode it in
> the .text without having to save and restore it, but that might end up being a
> little too restrictive?

with prog->aux->exception approach run-time values works too.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH RFC bpf-next v1 4/9] bpf: Handle throwing BPF callbacks in helpers and kfuncs
  2023-04-07  0:07     ` Kumar Kartikeya Dwivedi
@ 2023-04-07  2:15       ` Alexei Starovoitov
  2023-04-07  2:57         ` Kumar Kartikeya Dwivedi
  0 siblings, 1 reply; 30+ messages in thread
From: Alexei Starovoitov @ 2023-04-07  2:15 UTC (permalink / raw)
  To: Kumar Kartikeya Dwivedi
  Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Martin KaFai Lau, David Vernet

On Fri, Apr 07, 2023 at 02:07:06AM +0200, Kumar Kartikeya Dwivedi wrote:
> On Thu, Apr 06, 2023 at 04:21:39AM CEST, Alexei Starovoitov wrote:
> > On Wed, Apr 05, 2023 at 02:42:34AM +0200, Kumar Kartikeya Dwivedi wrote:
> > > @@ -759,6 +759,8 @@ BPF_CALL_4(bpf_loop, u32, nr_loops, void *, callback_fn, void *, callback_ctx,
> > >
> > >  	for (i = 0; i < nr_loops; i++) {
> > >  		ret = callback((u64)i, (u64)(long)callback_ctx, 0, 0, 0);
> > > +		if (bpf_get_exception())
> > > +			return -EJUKEBOX;
> >
> > This is too slow.
> > We cannot afford a call and conditional here.
> 
> There are two more options here: have two variants, one with and without the
> check (always_inline template and bpf_loop vs bpf_loop_except calling functions
> which pass false/true) and dispatch to the appropriate one based on if callback
> throws or not (so the cost is not paid for current users at all). Secondly, we
> can avoid repeated calls by hoisting the call out and save the pointer to
> exception state, then it's a bit less costly.
> 
> > Some time ago folks tried bpf_loop() and went back to bounded loop, because
> > the overhead of indirect call was not acceptable.
> > After that we've added inlining of bpf_loop() to make overhead to the minimum.
> > With prog->aux->exception[] approach it might be ok-ish,
> > but my preference would be to disallow throw in callbacks.
> > timer cb, rbtree_add cb are typically small.
> > bpf_loop cb can be big, but we have open coded iterators now.
> > So disabling asserts in cb-s is probably acceptable trade-off.
> 
> If the only reason to avoid them is the added performance cost, we can work
> towards eliminating that when bpf_throw is not used (see above). I agree that
> supporting it everywhere means thinking about a lot more corner cases, but I
> feel it would be less surprising if doing bpf_assert simply worked everywhere.
> One of the other reasons is that if it's being used within a shared static
> function that both main program and callbacks call into, it will be a bit
> annoying that it doesn't work in one context.

I hope with open coded iterators the only use case for callbacks will be timers,
exception cb and rbtree_add. All three are special cases.
There is nothing to unwind in the timer case.
Certainly not allowed to rethrow in exception cb.
Lastly, rbtree_add should be tiny. And it's gotta be fast.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH RFC bpf-next v1 9/9] selftests/bpf: Add tests for BPF exceptions
  2023-04-07  0:42     ` Kumar Kartikeya Dwivedi
@ 2023-04-07  2:30       ` Alexei Starovoitov
  2023-04-07  3:12         ` Kumar Kartikeya Dwivedi
  0 siblings, 1 reply; 30+ messages in thread
From: Alexei Starovoitov @ 2023-04-07  2:30 UTC (permalink / raw)
  To: Kumar Kartikeya Dwivedi
  Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Martin KaFai Lau, David Vernet

On Fri, Apr 07, 2023 at 02:42:14AM +0200, Kumar Kartikeya Dwivedi wrote:
> On Thu, Apr 06, 2023 at 04:38:09AM CEST, Alexei Starovoitov wrote:
> > On Wed, Apr 05, 2023 at 02:42:39AM +0200, Kumar Kartikeya Dwivedi wrote:
> > > +static __noinline int throwing_subprog(struct __sk_buff *ctx)
> > > +{
> > > +	if (ctx)
> > > +		bpf_throw();
> > > +	return 0;
> > > +}
> > > +
> > > +__noinline int global_subprog(struct __sk_buff *ctx)
> > > +{
> > > +	return subprog(ctx) + 1;
> > > +}
> > > +
> > > +__noinline int throwing_global_subprog(struct __sk_buff *ctx)
> > > +{
> > > +	if (ctx)
> > > +		bpf_throw();
> > > +	return 0;
> > > +}
> > > +
> > > +static __noinline int exception_cb(void)
> > > +{
> > > +	return 16;
> > > +}
> > > +
> > > +SEC("tc")
> > > +int exception_throw_subprog(struct __sk_buff *ctx)
> > > +{
> > > +	volatile int i;
> > > +
> > > +	exception_cb();
> > > +	bpf_set_exception_callback(exception_cb);
> > > +	i = subprog(ctx);
> > > +	i += global_subprog(ctx) - 1;
> > > +	if (!i)
> > > +		return throwing_global_subprog(ctx);
> > > +	else
> > > +		return throwing_subprog(ctx);
> > > +	bpf_throw();
> > > +	return 0;
> > > +}
> > > +
> > > +__noinline int throwing_gfunc(volatile int i)
> > > +{
> > > +	bpf_assert_eq(i, 0);
> > > +	return 1;
> > > +}
> > > +
> > > +__noinline static int throwing_func(volatile int i)
> > > +{
> > > +	bpf_assert_lt(i, 1);
> > > +	return 1;
> > > +}
> >
> > exception_cb() has no way of knowing which assert statement threw the exception.
> > How about extending a macro:
> > bpf_assert_eq(i, 0, MY_INT_ERR);
> > or
> > bpf_assert_eq(i, 0) {bpf_throw(MY_INT_ERR);}
> >
> > bpf_throw can store it in prog->aux->exception pass the address to cb.
> >
> 
> I agree and will add passing of a value that gets passed to the callback
> (probably just set it in the exception state), but I don't think prog->aux will
> work, see previous mails.
> 
> > Also I think we shouldn't complicate the verifier with auto release of resources.
> > If the user really wants to assert when spin_lock is held it should be user's
> > job to specify what resources should be released.
> > Can we make it look like:
> >
> > bpf_spin_lock(&lock);
> > bpf_assert_eq(i, 0) {
> >   bpf_spin_unlock(&lock);
> >   bpf_throw(MY_INT_ERR);
> > }
> 
> Do you mean just locks or all resources? Then it kind of undermines the point of
> having something like bpf_throw IMO. Since it's easy to insert code from the
> point of throw but it's not possible to do the same in callers (unless we add a
> way to 'catch' throws), so it only works for some particular cases where callers
> don't hold references (or in the main subprog).

That's a good point. This approach will force callers of functions that can
throw to not hold any referenced objects (spin_locks, sk_lookup, obj_new, etc).
That indeed might be too restrictive.
It would be great if we could come up with a way for bpf prog to release resources
explicitly though. I'm not a fan of magic release of spin locks.

> There are also other ways to go about this whole thing, like having the compiler
> > emit calls to intrinsics which the BPF runtime provides (or have the call
> configurable through compiler switches), and it already emits the landing pad
> code to release stuff and we simply receive the table of pads indexed by each
> throwing instruction, perform necessary checks to ensure everything is actually
> released correctly when control flow goes through them (e.g. when exploring
> multiple paths through the same instruction), and unwind frame by frame. That
> reduces the burden on both the verifier and user, but then it would be probably
> need to be BPF C++, or have to be a new language extension for BPF C. E.g. there
> was something about defer, panic, recover etc. in wg14
> https://www.open-std.org/jtc1/sc22/wg14/www/docs/n2542.pdf . Having the compiler
> do it is also probably easier if we want 'catch' style handlers.

Interesting idea. __attribute__((cleanup(..))) may be handy, but that's a ton of work.
I'm in a camp of developers who enforce -fno-exceptions in C++ projects.
bpf_assert is not analogous to C++ exceptions. It shouldn't be seen as a mechanism
to simplify control flow and error handling.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH RFC bpf-next v1 3/9] bpf: Implement bpf_throw kfunc
  2023-04-07  2:11       ` Alexei Starovoitov
@ 2023-04-07  2:46         ` Kumar Kartikeya Dwivedi
  2023-04-12 19:36           ` Alexei Starovoitov
  0 siblings, 1 reply; 30+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2023-04-07  2:46 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Martin KaFai Lau, David Vernet

On Fri, Apr 07, 2023 at 04:11:36AM CEST, Alexei Starovoitov wrote:
> On Fri, Apr 07, 2023 at 01:54:03AM +0200, Kumar Kartikeya Dwivedi wrote:
> > On Thu, Apr 06, 2023 at 04:16:22AM CEST, Alexei Starovoitov wrote:
> > > On Wed, Apr 05, 2023 at 02:42:33AM +0200, Kumar Kartikeya Dwivedi wrote:
> > > >
> > > > - The exception state is represented using four booleans in the
> > > >   task_struct of current task. Each boolean corresponds to the exception
> > > >   state for each kernel context. This allows BPF programs to be
> > > >   interrupted and still not clobber the other's exception state.
> > >
> > > that doesn't work for sleepable bpf progs and in RT for regular progs too.
> > >
> >
> > Can you elaborate? If a sleepable program blocks, that means the task is
> > scheduled out, so the next program will use the other task's task_struct.
> > Same for preemption for normal progs (under RT or not).
>
> I was worried about the case of the same task but different code paths
> in the kernel with tracing prog stepping on preempted lsm.s prog.
> I think you point that in this case they gotta be in different interrupt_context_level.
> I need to think it through a bit more.
>

If there is nesting, the programs always clear their exception state on exit, so
the prog that calls into the kernel which then calls into the tracing prog etc.
won't see its exception state on return. The only path where attaching programs
would screw things up is when we see a thrown exception and start unwinding
(where clearing would be a problem since its done frame-by-frame). For that, I
already prevent _throwing_ fexit programs from attaching to subprogs in this
series (normal ones are still ok and supported, because fentry/fexit is
important for stats etc.). There might be some other corner cases I missed but
ensuring this property alone in general should make things work correctly.

> > > How about using a scratch space in prog->aux->exception[] instead of current task?
> > >
> >
> > I actually had this thought. It's even better because we can hardcode the
> > address of the exception state right in the program (since prog->aux remains
> > stable during bpf_patch_insn_data).
>
> exactly.
>
> > However, concurrent program invocations on
> > multiple CPUs doesn't work well with this. It's like, one program sets the state
> > while the other tries to check it.
>
> Right. If it asserts on one cpu all other cpus will unwind as well,
> since we're saying bpf_assert is for exceptions when user cannot convince
> the verifier that the program is correct.
> So it doesn't matter that it aborted everywhere. It's probably a good thing too.
>

We can discuss the semantics (this makes bpf_assert stronger and basically
poisons the program globally in some sense), but implementation-wise it's going
to be a lot more tricky to reason about correctness.

Right now, the verifier follows paths and knows what resources are held when we
throw from a nested call chain (to complain on leaks). Callers doing the check
for exception state at runtime expect only certain throwing points to trigger
the check and rely on that for leak freedom.

With a global prog->aux->exception, things will be ok for the CPU on which the
exception was thrown, but some other CPU will see the check returning true in a
caller even if the callee subprog for it did not throw and was possibly
transferring its acquired references to the caller after completing execution,
which now causes leaks (because subprogs are allowed to acquire and return to
their caller).

The way to handle this would be that we assume every callee which throws may
also notionally throw right when returning (due to some other CPU's throw which
we may see). Then every exit from throwing callees may be processed as throwing
if we see the global state as set.

However, this completely prevents subprogs from transferring some acquired
resource to their caller (which I think is too restrictive). If I'm acquiring
memory from a static subprog and returning it to my caller, I can't any longer, since
I notionally throw when exiting, and holding resources when doing bpf_throw is
disallowed, so transfer is out of the question.

In the current scenario it would work, because I threw early on in my subprog
when checking some condition (or proving something to the verifier) and after
that just chugged along and did my work.
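
Concretely, the kind of transfer I mean (sketch; struct and function names are
made up):

struct foo { int data; };

static __noinline struct foo *alloc_foo(int i)
{
	struct foo *p;

	bpf_assert_eq(i, 0);	/* throw site, nothing acquired yet */
	p = bpf_obj_new(typeof(*p));
	return p;		/* acquired ref is transferred to the caller */
}

With per-task state only the bpf_assert_eq() above is a throw site; with a
global prog->aux->exception the exit of alloc_foo() itself would also have to
be treated as potentially throwing while the ref is still held.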

> > It can be per-CPU but then we have to disable
> > preemption (which cannot be done).
>
> I was thinking to propose a per-cpu prog->aux->exception[] area.
> Sleepable and not are in migrate_disable(). bpf progs never migrate.
> So we can do it, but we'd need new pseudo this_cpu_ptr instruction and
> corresponding JIT support which felt like overkill.
>
> Another idea I contemplated is to do preempt_disable() and local_irq_save()
> into special field in prog->aux->exception first thing in bpf_throw()
> and then re-enable everything before entering exception cb.
> To avoid races with other progs, but NMI can still happen, so it's pointless.
> Just non-per-cpu prog->aux->exception seems good enough.
>
> > > > - Rewrites happen for bpf_throw and call instructions to subprogs.
> > > >   The instructions which are executed in the main frame of the main
> > > >   program (thus, not global functions and extension programs, which end
> > > >   up executing in frame > 0) need to be rewritten differently. This is
> > > >   tracked using BPF_THROW_OUTER vs BPF_THROW_INNER. If not done, a
> > >
> > > how about BPF_THROW_OUTER vs BPF_THROW_ANY_INNER ?
> > > would it be more precise ?
> >
> > I'm fine with the renaming. The only thing the type signifies is if we need to
> > do the rewrite for frame 0 vs frame N.
>
> Sure. BPF_THROW_FRAME_ZERO and BPF_THROW_FRAME_NON_ZERO also works.
> Or any name that shows that 2nd includes multiple cases.
>

Ack.

> > >
> > > > +__bpf_kfunc notrace void bpf_throw(void)
> > > > +{
> > > > +	int i = interrupt_context_level();
> > > > +
> > > > +	current->bpf_exception_thrown[i] = true;
> > > > +}
> > >
> > > I think this needs to take u64 or couple u64 args and store them
> > > in the scratch area.
> > > bpf_assert* macros also need a way for bpf prog to specify
> > > the reason for the assertion.
> > > Otherwise there won't be any way to debug what happened.
> >
> > I agree. Should we force it to be a constant value? Then we can hardcode it in
> > the .text without having to save and restore it, but that might end up being a
> > little too restrictive?
>
> with prog->aux->exception approach run-time values works too.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH RFC bpf-next v1 4/9] bpf: Handle throwing BPF callbacks in helpers and kfuncs
  2023-04-07  2:15       ` Alexei Starovoitov
@ 2023-04-07  2:57         ` Kumar Kartikeya Dwivedi
  2023-04-12 19:43           ` Alexei Starovoitov
  0 siblings, 1 reply; 30+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2023-04-07  2:57 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Martin KaFai Lau, David Vernet

On Fri, Apr 07, 2023 at 04:15:19AM CEST, Alexei Starovoitov wrote:
> On Fri, Apr 07, 2023 at 02:07:06AM +0200, Kumar Kartikeya Dwivedi wrote:
> > On Thu, Apr 06, 2023 at 04:21:39AM CEST, Alexei Starovoitov wrote:
> > > On Wed, Apr 05, 2023 at 02:42:34AM +0200, Kumar Kartikeya Dwivedi wrote:
> > > > @@ -759,6 +759,8 @@ BPF_CALL_4(bpf_loop, u32, nr_loops, void *, callback_fn, void *, callback_ctx,
> > > >
> > > >  	for (i = 0; i < nr_loops; i++) {
> > > >  		ret = callback((u64)i, (u64)(long)callback_ctx, 0, 0, 0);
> > > > +		if (bpf_get_exception())
> > > > +			return -EJUKEBOX;
> > >
> > > This is too slow.
> > > We cannot afford a call and conditional here.
> >
> > There are two more options here: have two variants, one with and without the
> > check (always_inline template and bpf_loop vs bpf_loop_except calling functions
> > which pass false/true) and dispatch to the appropriate one based on if callback
> > throws or not (so the cost is not paid for current users at all). Secondly, we
> > can avoid repeated calls by hoisting the call out and save the pointer to
> > exception state, then it's a bit less costly.
> >
> > > Some time ago folks tried bpf_loop() and went back to bounded loop, because
> > > the overhead of indirect call was not acceptable.
> > > After that we've added inlining of bpf_loop() to make overhead to the minimum.
> > > With prog->aux->exception[] approach it might be ok-ish,
> > > but my preference would be to disallow throw in callbacks.
> > > timer cb, rbtree_add cb are typically small.
> > > bpf_loop cb can be big, but we have open coded iterators now.
> > > So disabling asserts in cb-s is probably acceptable trade-off.
> >
> > If the only reason to avoid them is the added performance cost, we can work
> > towards eliminating that when bpf_throw is not used (see above). I agree that
> > supporting it everywhere means thinking about a lot more corner cases, but I
> > feel it would be less surprising if doing bpf_assert simply worked everywhere.
> > One of the other reasons is that if it's being used within a shared static
> > function that both main program and callbacks call into, it will be a bit
> > annoying that it doesn't work in one context.
>
> I hope with open coded iterators the only use case for callbacks will be timers,
> exception cb and rbtree_add. All three are special cases.

There's also bpf_for_each_map_elem, bpf_find_vma, bpf_user_ringbuf_drain.

> There is nothing to unwind in the timer case.
> Certinaly not allowed to rethrow in exception cb.
> less-like rbtree_add should be tiny. And it's gotta to be fast.

I agree for some of the cases above they do not matter too much. The main one
was bpf_loop, and the ones I listed (like for_each_map) seem to be those where
people may do significant work in the callback.

I think we can make it zero cost for programs that don't use it (even for the
kernel code), I was just hoping to be able to support it everywhere as a generic
helper to abort program execution without any special corner cases during usage.
The other main point was about code sharing of functions which makes use of
assertions. But I'm ok with dropping support for callbacks if you think it's not
worth it in the end.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH RFC bpf-next v1 9/9] selftests/bpf: Add tests for BPF exceptions
  2023-04-07  2:30       ` Alexei Starovoitov
@ 2023-04-07  3:12         ` Kumar Kartikeya Dwivedi
  2023-04-12 19:46           ` Alexei Starovoitov
  0 siblings, 1 reply; 30+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2023-04-07  3:12 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Martin KaFai Lau, David Vernet

On Fri, Apr 07, 2023 at 04:30:07AM CEST, Alexei Starovoitov wrote:
> On Fri, Apr 07, 2023 at 02:42:14AM +0200, Kumar Kartikeya Dwivedi wrote:
> > On Thu, Apr 06, 2023 at 04:38:09AM CEST, Alexei Starovoitov wrote:
> > > On Wed, Apr 05, 2023 at 02:42:39AM +0200, Kumar Kartikeya Dwivedi wrote:
> > > > +static __noinline int throwing_subprog(struct __sk_buff *ctx)
> > > > +{
> > > > +	if (ctx)
> > > > +		bpf_throw();
> > > > +	return 0;
> > > > +}
> > > > +
> > > > +__noinline int global_subprog(struct __sk_buff *ctx)
> > > > +{
> > > > +	return subprog(ctx) + 1;
> > > > +}
> > > > +
> > > > +__noinline int throwing_global_subprog(struct __sk_buff *ctx)
> > > > +{
> > > > +	if (ctx)
> > > > +		bpf_throw();
> > > > +	return 0;
> > > > +}
> > > > +
> > > > +static __noinline int exception_cb(void)
> > > > +{
> > > > +	return 16;
> > > > +}
> > > > +
> > > > +SEC("tc")
> > > > +int exception_throw_subprog(struct __sk_buff *ctx)
> > > > +{
> > > > +	volatile int i;
> > > > +
> > > > +	exception_cb();
> > > > +	bpf_set_exception_callback(exception_cb);
> > > > +	i = subprog(ctx);
> > > > +	i += global_subprog(ctx) - 1;
> > > > +	if (!i)
> > > > +		return throwing_global_subprog(ctx);
> > > > +	else
> > > > +		return throwing_subprog(ctx);
> > > > +	bpf_throw();
> > > > +	return 0;
> > > > +}
> > > > +
> > > > +__noinline int throwing_gfunc(volatile int i)
> > > > +{
> > > > +	bpf_assert_eq(i, 0);
> > > > +	return 1;
> > > > +}
> > > > +
> > > > +__noinline static int throwing_func(volatile int i)
> > > > +{
> > > > +	bpf_assert_lt(i, 1);
> > > > +	return 1;
> > > > +}
> > >
> > > exception_cb() has no way of knowing which assert statement threw the exception.
> > > How about extending a macro:
> > > bpf_assert_eq(i, 0, MY_INT_ERR);
> > > or
> > > bpf_assert_eq(i, 0) {bpf_throw(MY_INT_ERR);}
> > >
> > > bpf_throw can store it in prog->aux->exception pass the address to cb.
> > >
> >
> > I agree and will add passing of a value that gets passed to the callback
> > (probably just set it in the exception state), but I don't think prog->aux will
> > work, see previous mails.
> >
> > > Also I think we shouldn't complicate the verifier with auto release of resources.
> > > If the user really wants to assert when spin_lock is held it should be user's
> > > job to specify what resources should be released.
> > > Can we make it look like:
> > >
> > > bpf_spin_lock(&lock);
> > > bpf_assert_eq(i, 0) {
> > >   bpf_spin_unlock(&lock);
> > >   bpf_throw(MY_INT_ERR);
> > > }
> >
> > Do you mean just locks or all resources? Then it kind of undermines the point of
> > having something like bpf_throw IMO. Since it's easy to insert code from the
> > point of throw but it's not possible to do the same in callers (unless we add a
> > way to 'catch' throws), so it only works for some particular cases where callers
> > don't hold references (or in the main subprog).
>
> That's a good point. This approach will force caller of functions that can
> throw not hold any referenced objects (spin_locks, sk_lookup, obj_new, etc)
> That indeed might be too restrictive.
> It would be great if we could come up with a way for bpf prog to release resources
> explicitly though. I'm not a fan of magic release of spin locks.
>

I think to realize that, we need a way to 'catch' thrown exceptions in the
caller. The user will write a block of code that handles release of resources in
the current scope and resume unwinding (implicitly by calling into some
intrinsics). When we discussed this earlier, you rightly mentioned problems
associated with introducing control flow into the program not seen by the
compiler (unlike try/catch in something like C++ which has clear semantics).

But to take a step back, and about what you've said below:

> bpf_assert is not analogous to C++ exceptions. It shouldn't be seen a mechanism
> to simplify control flow and error handling.

When programs get explicit control and handle the 'exceptional' condition, it
begins to mimic more of an alternative 'slow path' of the program when it
encounters an error, and starts resembling an alternative error handling
mechanism pretty much like C++ exceptions.

I think that was not the goal here, and the automatic release of resources is
simply supposed to happen to ensure that the kernel's resource safety is not
violated when a program is aborting. An assertion should never trigger at
runtime 99.9999% of the time, but when it does, the BPF runtime emits code to
ensure that the aborting program does not leave the kernel in an inconsistent
state (memory leaks, held locks that other programs can never take again, etc.).

So this will involve clean up of every resource it had acquired when it threw.
If we cannot ensure that we can safely release resources (e.g. throwing in NMI
context where a kptr_xchg'd pointer cannot be freed), we will bail during
verification.

> > There are also other ways to go about this whole thing, like having the compiler
> > emit calls to intrinsics which the BPF runtime provides (or have the call
> > configurable through compiler switches), and it already emits the landing pad
> > code to release stuff and we simply receive the table of pads indexed by each
> > throwing instruction, perform necessary checks to ensure everything is actually
> > released correctly when control flow goes through them (e.g. when exploring
> > multiple paths through the same instruction), and unwind frame by frame. That
> > reduces the burden on both the verifier and user, but then it would be probably
> > need to be BPF C++, or have to be a new language extension for BPF C. E.g. there
> > was something about defer, panic, recover etc. in wg14
> > https://www.open-std.org/jtc1/sc22/wg14/www/docs/n2542.pdf . Having the compiler
> > do it is also probably easier if we want 'catch' style handlers.
>
> Interesting idea. __attribute__((cleanup(..))) may be handy, but that's a ton of work.
> I'm in a camp of developers who enforce -fno-exceptions in C++ projects.
> bpf_assert is not analogous to C++ exceptions. It shouldn't be seen a mechanism
> to simplify control flow and error handling.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH RFC bpf-next v1 3/9] bpf: Implement bpf_throw kfunc
  2023-04-07  2:46         ` Kumar Kartikeya Dwivedi
@ 2023-04-12 19:36           ` Alexei Starovoitov
  2023-04-13 17:05             ` Kumar Kartikeya Dwivedi
  0 siblings, 1 reply; 30+ messages in thread
From: Alexei Starovoitov @ 2023-04-12 19:36 UTC (permalink / raw)
  To: Kumar Kartikeya Dwivedi
  Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Martin KaFai Lau, David Vernet

On Fri, Apr 07, 2023 at 04:46:55AM +0200, Kumar Kartikeya Dwivedi wrote:
> On Fri, Apr 07, 2023 at 04:11:36AM CEST, Alexei Starovoitov wrote:
> > On Fri, Apr 07, 2023 at 01:54:03AM +0200, Kumar Kartikeya Dwivedi wrote:
> > > On Thu, Apr 06, 2023 at 04:16:22AM CEST, Alexei Starovoitov wrote:
> > > > On Wed, Apr 05, 2023 at 02:42:33AM +0200, Kumar Kartikeya Dwivedi wrote:
> > > > >
> > > > > - The exception state is represented using four booleans in the
> > > > >   task_struct of current task. Each boolean corresponds to the exception
> > > > >   state for each kernel context. This allows BPF programs to be
> > > > >   interrupted and still not clobber the other's exception state.
> > > >
> > > > that doesn't work for sleepable bpf progs and in RT for regular progs too.
> > > >
> > >
> > > Can you elaborate? If a sleepable program blocks, that means the task is
> > > scheduled out, so the next program will use the other task's task_struct.
> > > Same for preemption for normal progs (under RT or not).
> >
> > I was worried about the case of the same task but different code paths
> > in the kernel with tracing prog stepping on preempted lsm.s prog.
> > I think you point that in this case they gotta be in different interrupt_context_level.
> > I need to think it through a bit more.
> >
> 
> If there is nesting, the programs always clear their exception state on exit, so
> the prog that calls into the kernel which then calls into the tracing prog etc.
> won't see its exception state on return. The only path where attaching programs
> would screw things up is when we see a thrown exception and start unwinding
> (where clearing would be a problem since its done frame-by-frame). For that, I
> already prevent _throwing_ fexit programs from attaching to subprogs in this
> series (normal ones are still ok and supported, because fentry/fexit is
> important for stats etc.). There might be some other corner cases I missed but
> ensuring this property alone in general should make things work correctly.
> 
> > > > How about using a scratch space in prog->aux->exception[] instead of current task?
> > > >
> > >
> > > I actually had this thought. It's even better because we can hardcode the
> > > address of the exception state right in the program (since prog->aux remains
> > > stable during bpf_patch_insn_data).
> >
> > exactly.
> >
> > > However, concurrent program invocations on
> > > multiple CPUs doesn't work well with this. It's like, one program sets the state
> > > while the other tries to check it.
> >
> > Right. If it asserts on one cpu all other cpus will unwind as well,
> > since we're saying bpf_assert is for exceptions when user cannot convince
> > the verifier that the program is correct.
> > So it doesn't matter that it aborted everywhere. It's probably a good thing too.
> >
> 
> We can discuss the semantics (this makes bpf_assert more stronger and basically
> poisons the program globally in some sense), but implementation wise it's going
> to be a lot more tricky to reason about correctness.
> 
> Right now, the verifier follows paths and knows what resources are held when we
> throw from a nested call chain (to complain on leaks). Callers doing the check
> for exception state at runtime expect only certain throwing points to trigger
> the check and rely on that for leak freedom.
> 
> With a global prog->aux->exception, things will be ok for the CPU on which the
> exception was thrown, but some other CPU will see the check returning true in a
> caller even if the callee subprog for it did not throw and was possibly
> transferring its acquired references to the caller after completing execution,
> which now causes leaks (because subprogs are allowed to acquire and return to
> their caller).
> 
> The way to handle this would be that we assume every callee which throws may
> also notionally throw right when returning (due to some other CPU's throw which
> we may see). Then every exit from throwing callees may be processed as throwing
> if we see the global state as set.
> 
> However, this completely prevents subprogs from transferring some acquired
> resource to their caller (which I think is too restrictive). If I'm acquiring
> memory from static subprog and returning to my caller, I can't any longer since
> I notionally throw when exiting and holding resources when doing bpf_throw is
> disallowed, so transfer is out of the question.

I was under the impression that subprogs cannot acquire refs and transfer them
to caller.
Looks like your commit 9d9d00ac29d0 ("bpf: Fix reference state management for synchronous callbacks")
allowed too much.
I don't think it's a good idea to support coding patterns like:
void my_alloc_foo(struct foo **ptr)
{
  struct foo *p = bpf_obj_new(typeof(*p));
  *ptr = p;
}

It's correct C, of course, but do we really want to support such code?
I don't think the verifier can fully support it anyway.
That commit of yours allowed some of it in theory, but the above example probably won't work,
since 'transfer' isn't understood by the verifier.

Regardless of whether we tighten the verifier now or later, such subprogs shouldn't be throwing.
So I don't see an issue doing global prog->aux->exception.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH RFC bpf-next v1 4/9] bpf: Handle throwing BPF callbacks in helpers and kfuncs
  2023-04-07  2:57         ` Kumar Kartikeya Dwivedi
@ 2023-04-12 19:43           ` Alexei Starovoitov
  2023-04-13 17:13             ` Kumar Kartikeya Dwivedi
  0 siblings, 1 reply; 30+ messages in thread
From: Alexei Starovoitov @ 2023-04-12 19:43 UTC (permalink / raw)
  To: Kumar Kartikeya Dwivedi
  Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Martin KaFai Lau, David Vernet

On Fri, Apr 07, 2023 at 04:57:48AM +0200, Kumar Kartikeya Dwivedi wrote:
> On Fri, Apr 07, 2023 at 04:15:19AM CEST, Alexei Starovoitov wrote:
> > On Fri, Apr 07, 2023 at 02:07:06AM +0200, Kumar Kartikeya Dwivedi wrote:
> > > On Thu, Apr 06, 2023 at 04:21:39AM CEST, Alexei Starovoitov wrote:
> > > > On Wed, Apr 05, 2023 at 02:42:34AM +0200, Kumar Kartikeya Dwivedi wrote:
> > > > > @@ -759,6 +759,8 @@ BPF_CALL_4(bpf_loop, u32, nr_loops, void *, callback_fn, void *, callback_ctx,
> > > > >
> > > > >  	for (i = 0; i < nr_loops; i++) {
> > > > >  		ret = callback((u64)i, (u64)(long)callback_ctx, 0, 0, 0);
> > > > > +		if (bpf_get_exception())
> > > > > +			return -EJUKEBOX;
> > > >
> > > > This is too slow.
> > > > We cannot afford a call and conditional here.
> > >
> > > There are two more options here: have two variants, one with and without the
> > > check (always_inline template and bpf_loop vs bpf_loop_except calling functions
> > > which pass false/true) and dispatch to the appropriate one based on if callback
> > > throws or not (so the cost is not paid for current users at all). Secondly, we
> > > can avoid repeated calls by hoisting the call out and save the pointer to
> > > exception state, then it's a bit less costly.
> > >
> > > > Some time ago folks tried bpf_loop() and went back to bounded loop, because
> > > > the overhead of indirect call was not acceptable.
> > > > After that we've added inlining of bpf_loop() to make overhead to the minimum.
> > > > With prog->aux->exception[] approach it might be ok-ish,
> > > > but my preference would be to disallow throw in callbacks.
> > > > timer cb, rbtree_add cb are typically small.
> > > > bpf_loop cb can be big, but we have open coded iterators now.
> > > > So disabling asserts in cb-s is probably acceptable trade-off.
> > >
> > > If the only reason to avoid them is the added performance cost, we can work
> > > towards eliminating that when bpf_throw is not used (see above). I agree that
> > > supporting it everywhere means thinking about a lot more corner cases, but I
> > > feel it would be less surprising if doing bpf_assert simply worked everywhere.
> > > One of the other reasons is that if it's being used within a shared static
> > > function that both main program and callbacks call into, it will be a bit
> > > annoying that it doesn't work in one context.
> >
> > I hope with open coded iterators the only use case for callbacks will be timers,
> > exception cb and rbtree_add. All three are special cases.
> 
> There's also bpf_for_each_map_elem, bpf_find_vma, bpf_user_ringbuf_drain.
> 
> > There is nothing to unwind in the timer case.
> > Certainly not allowed to rethrow in exception cb.
> > less-like rbtree_add should be tiny. And it's gotta be fast.
> 
> I agree for some of the cases above they do not matter too much. The main one
> was bpf_loop, and the ones I listed (like for_each_map) seem to be those where
> people may do significant work in the callback.
> 
> I think we can make it zero cost for programs that don't use it (even for the
> kernel code), I was just hoping to be able to support it everywhere as a generic
> helper to abort program execution without any special corner cases during usage.
> The other main point was about code sharing of functions which makes use of
> assertions. But I'm ok with dropping support for callbacks if you think it's not
> worth it in the end.

I think the unexpected run-time slowdown due to checks is a bigger problem.
Since bpf_assert is a 'bad program detector' it should be as close to zero run-time cost
as possible. Hence I'm advocating for the single load+jmp approach of prog->aux->exception.
The run-time matters more than the ability to use assert in all conditions.
I think we'd need two flavors of bpf_assert() asm macros: with and without bpf_throw().
They seem to be useful even without an actual throw. They will help the verifier understand
the ranges of variables in both cases. The macros are 99% of the feature.
The actual throw mechanism is 1%. The unwinding and release of resources is the cost
of using the macros without explicit control flow in the bpf programs.
The macros without throw might be good enough in many cases.
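
As a rough illustration (not the actual macros from this series; the names and
bodies are made up, and the real thing would likely use inline asm so the
compiler cannot optimize the check away), the two flavors could look like:

/* Flavor 1: aborts the whole program via bpf_throw() on failure. */
#define bpf_assert_eq_throw(x, y)		\
	do {					\
		if ((x) != (y))			\
			bpf_throw();		\
	} while (0)

/* Flavor 2: no throw; just exits the current prog early on failure.
 * The verifier still learns that x == y holds on the fall-through path. */
#define bpf_assert_eq_noret(x, y)		\
	do {					\
		if ((x) != (y))			\
			return 0;		\
	} while (0)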

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH RFC bpf-next v1 9/9] selftests/bpf: Add tests for BPF exceptions
  2023-04-07  3:12         ` Kumar Kartikeya Dwivedi
@ 2023-04-12 19:46           ` Alexei Starovoitov
  0 siblings, 0 replies; 30+ messages in thread
From: Alexei Starovoitov @ 2023-04-12 19:46 UTC (permalink / raw)
  To: Kumar Kartikeya Dwivedi
  Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Martin KaFai Lau, David Vernet

On Fri, Apr 07, 2023 at 05:12:24AM +0200, Kumar Kartikeya Dwivedi wrote:
> On Fri, Apr 07, 2023 at 04:30:07AM CEST, Alexei Starovoitov wrote:
> > On Fri, Apr 07, 2023 at 02:42:14AM +0200, Kumar Kartikeya Dwivedi wrote:
> > > On Thu, Apr 06, 2023 at 04:38:09AM CEST, Alexei Starovoitov wrote:
> > > > On Wed, Apr 05, 2023 at 02:42:39AM +0200, Kumar Kartikeya Dwivedi wrote:
> > > > > +static __noinline int throwing_subprog(struct __sk_buff *ctx)
> > > > > +{
> > > > > +	if (ctx)
> > > > > +		bpf_throw();
> > > > > +	return 0;
> > > > > +}
> > > > > +
> > > > > +__noinline int global_subprog(struct __sk_buff *ctx)
> > > > > +{
> > > > > +	return subprog(ctx) + 1;
> > > > > +}
> > > > > +
> > > > > +__noinline int throwing_global_subprog(struct __sk_buff *ctx)
> > > > > +{
> > > > > +	if (ctx)
> > > > > +		bpf_throw();
> > > > > +	return 0;
> > > > > +}
> > > > > +
> > > > > +static __noinline int exception_cb(void)
> > > > > +{
> > > > > +	return 16;
> > > > > +}
> > > > > +
> > > > > +SEC("tc")
> > > > > +int exception_throw_subprog(struct __sk_buff *ctx)
> > > > > +{
> > > > > +	volatile int i;
> > > > > +
> > > > > +	exception_cb();
> > > > > +	bpf_set_exception_callback(exception_cb);
> > > > > +	i = subprog(ctx);
> > > > > +	i += global_subprog(ctx) - 1;
> > > > > +	if (!i)
> > > > > +		return throwing_global_subprog(ctx);
> > > > > +	else
> > > > > +		return throwing_subprog(ctx);
> > > > > +	bpf_throw();
> > > > > +	return 0;
> > > > > +}
> > > > > +
> > > > > +__noinline int throwing_gfunc(volatile int i)
> > > > > +{
> > > > > +	bpf_assert_eq(i, 0);
> > > > > +	return 1;
> > > > > +}
> > > > > +
> > > > > +__noinline static int throwing_func(volatile int i)
> > > > > +{
> > > > > +	bpf_assert_lt(i, 1);
> > > > > +	return 1;
> > > > > +}
> > > >
> > > > exception_cb() has no way of knowing which assert statement threw the exception.
> > > > How about extending a macro:
> > > > bpf_assert_eq(i, 0, MY_INT_ERR);
> > > > or
> > > > bpf_assert_eq(i, 0) {bpf_throw(MY_INT_ERR);}
> > > >
> > > > bpf_throw can store it in prog->aux->exception and pass the address to cb.
> > > >
> > >
> > > I agree and will add passing of a value that gets passed to the callback
> > > (probably just set it in the exception state), but I don't think prog->aux will
> > > work, see previous mails.
> > >
> > > > Also I think we shouldn't complicate the verifier with auto release of resources.
> > > > If the user really wants to assert when spin_lock is held it should be user's
> > > > job to specify what resources should be released.
> > > > Can we make it look like:
> > > >
> > > > bpf_spin_lock(&lock);
> > > > bpf_assert_eq(i, 0) {
> > > >   bpf_spin_unlock(&lock);
> > > >   bpf_throw(MY_INT_ERR);
> > > > }
> > >
> > > Do you mean just locks or all resources? Then it kind of undermines the point of
> > > having something like bpf_throw IMO. Since it's easy to insert code from the
> > > point of throw but it's not possible to do the same in callers (unless we add a
> > > way to 'catch' throws), so it only works for some particular cases where callers
> > > don't hold references (or in the main subprog).
> >
> > That's a good point. This approach will force callers of functions that can
> > throw to not hold any referenced objects (spin_locks, sk_lookup, obj_new, etc).
> > That indeed might be too restrictive.
> > It would be great if we could come up with a way for bpf prog to release resources
> > explicitly though. I'm not a fan of magic release of spin locks.
> >
> 
> I think to realize that, we need a way to 'catch' thrown exceptions in the
> caller. The user will write a block of code that handles release of resources in
> the current scope and resume unwinding (implicitly by calling into some
> intrinsics). 

or we simply don't allow calling subprogs that can throw while holding resources.

> When we discussed this earlier, you rightly mentioned problems
> associated with introducing control flow into the program not seen by the
> compiler (unlike try/catch in something like C++ which has clear semantics).
> 
> But to take a step back, and about what you've said below:
> 
> > bpf_assert is not analogous to C++ exceptions. It shouldn't be seen a mechanism
> > to simplify control flow and error handling.
> 
> When programs get explicit control and handle the 'exceptional' condition, it
> begins to mimic more of an alternative 'slow path' of the program when it
> encounters an error, and starts resembling an alternative error handling
> mechanism pretty much like C++ exceptions.
> 
> I think that was not the goal here, and the automatic release of resources is
> simply supposed to happen to ensure that the kernel's resource safety is not
> violated when a program is aborting. An assertion failure should never occur at
> runtime 99.9999% of the time, but when it does, the BPF runtime emits code to
> ensure that the aborting program does not leave the kernel in an inconsistent
> state (memory leaks, held locks that other programs can never take again, etc.).
> 
> So this will involve clean up of every resource it had acquired when it threw.
> If we cannot ensure that we can safely release resources (e.g. throwing in NMI
> context where a kptr_xchg'd pointer cannot be freed), we will bail during
> verification.

exactly. +1 to all of the above. the progs shouldn't be paying a run-time cost
for this 0.0001% case.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH RFC bpf-next v1 3/9] bpf: Implement bpf_throw kfunc
  2023-04-12 19:36           ` Alexei Starovoitov
@ 2023-04-13 17:05             ` Kumar Kartikeya Dwivedi
  0 siblings, 0 replies; 30+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2023-04-13 17:05 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Martin KaFai Lau, David Vernet

On Wed, Apr 12, 2023 at 09:36:12PM CEST, Alexei Starovoitov wrote:
> On Fri, Apr 07, 2023 at 04:46:55AM +0200, Kumar Kartikeya Dwivedi wrote:
> > On Fri, Apr 07, 2023 at 04:11:36AM CEST, Alexei Starovoitov wrote:
> > > On Fri, Apr 07, 2023 at 01:54:03AM +0200, Kumar Kartikeya Dwivedi wrote:
> > > > On Thu, Apr 06, 2023 at 04:16:22AM CEST, Alexei Starovoitov wrote:
> > > > > On Wed, Apr 05, 2023 at 02:42:33AM +0200, Kumar Kartikeya Dwivedi wrote:
> > > > > >
> > > > > > - The exception state is represented using four booleans in the
> > > > > >   task_struct of current task. Each boolean corresponds to the exception
> > > > > >   state for each kernel context. This allows BPF programs to be
> > > > > >   interrupted and still not clobber the other's exception state.
> > > > >
> > > > > that doesn't work for sleepable bpf progs and in RT for regular progs too.
> > > > >
> > > >
> > > > Can you elaborate? If a sleepable program blocks, that means the task is
> > > > scheduled out, so the next program will use the other task's task_struct.
> > > > Same for preemption for normal progs (under RT or not).
> > >
> > > I was worried about the case of the same task but different code paths
> > > in the kernel with tracing prog stepping on preempted lsm.s prog.
> > > I think your point is that in this case they gotta be in different interrupt_context_level.
> > > I need to think it through a bit more.
> > >
> >
> > If there is nesting, the programs always clear their exception state on exit, so
> > the prog that calls into the kernel which then calls into the tracing prog etc.
> > won't see its exception state on return. The only path where attaching programs
> > would screw things up is when we see a thrown exception and start unwinding
> > (where clearing would be a problem since its done frame-by-frame). For that, I
> > already prevent _throwing_ fexit programs from attaching to subprogs in this
> > series (normal ones are still ok and supported, because fentry/fexit is
> > important for stats etc.). There might be some other corner cases I missed but
> > ensuring this property alone in general should make things work correctly.
> >
> > > > > How about using a scratch space in prog->aux->exception[] instead of current task?
> > > > >
> > > >
> > > > I actually had this thought. It's even better because we can hardcode the
> > > > address of the exception state right in the program (since prog->aux remains
> > > > stable during bpf_patch_insn_data).
> > >
> > > exactly.
> > >
> > > > However, concurrent program invocations on
> > > > multiple CPUs doesn't work well with this. It's like, one program sets the state
> > > > while the other tries to check it.
> > >
> > > Right. If it asserts on one cpu all other cpus will unwind as well,
> > > since we're saying bpf_assert is for exceptions when user cannot convince
> > > the verifier that the program is correct.
> > > So it doesn't matter that it aborted everywhere. It's probably a good thing too.
> > >
> >
> > We can discuss the semantics (this makes bpf_assert stronger and basically
> > poisons the program globally in some sense), but implementation-wise it's going
> > to be a lot more tricky to reason about correctness.
> >
> > Right now, the verifier follows paths and knows what resources are held when we
> > throw from a nested call chain (to complain on leaks). Callers doing the check
> > for exception state at runtime expect only certain throwing points to trigger
> > the check and rely on that for leak freedom.
> >
> > With a global prog->aux->exception, things will be ok for the CPU on which the
> > exception was thrown, but some other CPU will see the check returning true in a
> > caller even if the callee subprog for it did not throw and was possibly
> > transferring its acquired references to the caller after completing execution,
> > which now causes leaks (because subprogs are allowed to acquire and return to
> > their caller).
> >
> > The way to handle this would be that we assume every callee which throws may
> > also notionally throw right when returning (due to some other CPU's throw which
> > we may see). Then every exit from throwing callees may be processed as throwing
> > if we see the global state as set.
> >
> > However, this completely prevents subprogs from transferring some acquired
> > resource to their caller (which I think is too restrictive). If I'm acquiring
> > memory in a static subprog and returning it to my caller, I can no longer do so,
> > since I notionally throw when exiting, and holding resources when calling
> > bpf_throw is disallowed, so transfer is out of the question.
>
> I was under impression that subprogs cannot acquire refs and transfer them
> to caller.
> Looks like your commit 9d9d00ac29d0 ("bpf: Fix reference state management for synchronous callbacks")
> allowed too much.

I think you misunderstood the change in that commit. It was about restricting
callback functions from acquiring references and not releasing them before their
BPF_EXIT (since our handling is not completely correct for more than one
iteration). The verifier has always allowed acquiring references and
transferring them to the caller for subprog calls.
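
For reference, the return-value form of that transfer, which the verifier does
track today, looks roughly like this (illustrative sketch only, assuming the
usual vmlinux.h/bpf_helpers.h includes; bpf_obj_new/bpf_obj_drop are the
existing kfunc wrappers, everything else is made up):

struct foo { int data; };

static __noinline struct foo *my_alloc_foo(void)
{
	struct foo *p = bpf_obj_new(typeof(*p));

	/* the acquired reference is transferred to the caller via the return value */
	return p;
}

SEC("tc")
int transfer_prog(struct __sk_buff *ctx)
{
	struct foo *p = my_alloc_foo();

	if (!p)
		return 0;
	bpf_obj_drop(p);	/* the caller now owns and releases the object */
	return 0;
}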

> I don't think it's a good idea to support coding patterns like:
> void my_alloc_foo(struct foo **ptr)
> {
>   struct foo *p = bpf_obj_new(typeof(*p));
>   *ptr = p;
> }
>
> It's a correct C, of course, but do we really want to support such code?
> I don't think the verifier can fully support it anyway.
> That commit of yours allowed some of it in theory, but above example probably won't work,
> since 'transfer' isn't understood by the verifier.

I have no strong opinions about restricting it (I think the code for handling
transfers is sane and correct, we just transfer the modified reference state,
and it's a natural, valid way of writing programs), especially since static
subprogs do not have the same limitations as global subprogs, and you really don't
want to inline everything all the time.
But I think we may end up breaking existing code/programs if we do; I have no
data yet to predict whether that fallout would be small.

>
> Regardless whether we tighten the verifier now or later such subprogs shouldn't be throwing.
> So I don't see an issue doing global prog->aux->exception.

That's certainly an option, but I still think we need to be a bit careful. The
reason is that during analysis, we need to determine whether, whenever a subprog
call exits, we are in a state where we can safely unwind. It might end up
restricting a large set of use cases, but I can only say with certainty after I
try it out.

Right now, I heavily rely on the assumption that the checks only become true
when something throws (to also minimize rewrites, but that's a minor reason).
The core reason is being able to argue about correctness. With global exception
state, they can become true anytime, so we need to be a lot more conservative
even if we e.g. didn't see a subprog as throwing from all callsites.

call subprog(A) // will be rewritten, as using R1=A can throw
call subprog(B) // not rewritten, as using R1=B does not throw
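
To make the per-callsite idea concrete with a helper-callback example
(illustrative only; the callback and program names are made up, bpf_loop is the
existing helper, and bpf_assert_lt is the macro from this series):

static int cb_throws(__u64 i, void *ctx)
{
	bpf_assert_lt(i, 100);	/* may end up calling bpf_throw() */
	return 0;
}

static int cb_plain(__u64 i, void *ctx)
{
	return 0;
}

SEC("tc")
int per_callsite(struct __sk_buff *ctx)
{
	bpf_loop(10, cb_throws, NULL, 0);	/* this call site would get the post-call check */
	bpf_loop(10, cb_plain, NULL, 0);	/* this one would be left untouched */
	return 0;
}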

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH RFC bpf-next v1 4/9] bpf: Handle throwing BPF callbacks in helpers and kfuncs
  2023-04-12 19:43           ` Alexei Starovoitov
@ 2023-04-13 17:13             ` Kumar Kartikeya Dwivedi
  2023-04-13 23:41               ` Alexei Starovoitov
  0 siblings, 1 reply; 30+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2023-04-13 17:13 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Martin KaFai Lau, David Vernet

On Wed, Apr 12, 2023 at 09:43:06PM CEST, Alexei Starovoitov wrote:
> On Fri, Apr 07, 2023 at 04:57:48AM +0200, Kumar Kartikeya Dwivedi wrote:
> > On Fri, Apr 07, 2023 at 04:15:19AM CEST, Alexei Starovoitov wrote:
> > > On Fri, Apr 07, 2023 at 02:07:06AM +0200, Kumar Kartikeya Dwivedi wrote:
> > > > On Thu, Apr 06, 2023 at 04:21:39AM CEST, Alexei Starovoitov wrote:
> > > > > On Wed, Apr 05, 2023 at 02:42:34AM +0200, Kumar Kartikeya Dwivedi wrote:
> > > > > > @@ -759,6 +759,8 @@ BPF_CALL_4(bpf_loop, u32, nr_loops, void *, callback_fn, void *, callback_ctx,
> > > > > >
> > > > > >  	for (i = 0; i < nr_loops; i++) {
> > > > > >  		ret = callback((u64)i, (u64)(long)callback_ctx, 0, 0, 0);
> > > > > > +		if (bpf_get_exception())
> > > > > > +			return -EJUKEBOX;
> > > > >
> > > > > This is too slow.
> > > > > We cannot afford a call and conditional here.
> > > >
> > > > There are two more options here: have two variants, one with and without the
> > > > check (always_inline template and bpf_loop vs bpf_loop_except calling functions
> > > > which pass false/true) and dispatch to the appropriate one based on if callback
> > > > throws or not (so the cost is not paid for current users at all). Secondly, we
> > > > can avoid repeated calls by hoisting the call out and save the pointer to
> > > > exception state, then it's a bit less costly.
> > > >
> > > > > Some time ago folks tried bpf_loop() and went back to bounded loop, because
> > > > > the overhead of indirect call was not acceptable.
> > > > > After that we've added inlining of bpf_loop() to make overhead to the minimum.
> > > > > With prog->aux->exception[] approach it might be ok-ish,
> > > > > but my preference would be to disallow throw in callbacks.
> > > > > timer cb, rbtree_add cb are typically small.
> > > > > bpf_loop cb can be big, but we have open coded iterators now.
> > > > > So disabling asserts in cb-s is probably acceptable trade-off.
> > > >
> > > > If the only reason to avoid them is the added performance cost, we can work
> > > > towards eliminating that when bpf_throw is not used (see above). I agree that
> > > > supporting it everywhere means thinking about a lot more corner cases, but I
> > > > feel it would be less surprising if doing bpf_assert simply worked everywhere.
> > > > One of the other reasons is that if it's being used within a shared static
> > > > function that both main program and callbacks call into, it will be a bit
> > > > annoying that it doesn't work in one context.
> > >
> > > I hope with open coded iterators the only use case for callbacks will be timers,
> > > exception cb and rbtree_add. All three are special cases.
> >
> > There's also bpf_for_each_map_elem, bpf_find_vma, bpf_user_ringbuf_drain.
> >
> > > There is nothing to unwind in the timer case.
> > > Certainly not allowed to rethrow in exception cb.
> > > less-like rbtree_add should be tiny. And it's gotta be fast.
> >
> > I agree for some of the cases above they do not matter too much. The main one
> > was bpf_loop, and the ones I listed (like for_each_map) seem to be those where
> > people may do significant work in the callback.
> >
> > I think we can make it zero cost for programs that don't use it (even for the
> > kernel code), I was just hoping to be able to support it everywhere as a generic
> > helper to abort program execution without any special corner cases during usage.
> > The other main point was about code sharing of functions which makes use of
> > assertions. But I'm ok with dropping support for callbacks if you think it's not
> > worth it in the end.
>
> I think the unexpected run-time slowdown due to checks is a bigger problem.
> Since bpf_assert is a 'bad program detector' it should be as zero cost to run-time
> as possible. Hence I'm advocating for single load+jmp approach of prog->aux->exception.

I 100% agree, slowdown is a big problem and a downside of the current version.
They need to be as lightweight as possible. It's too costly right now in this
set.

> The run-time matters more than the ability to use assert in all conditions. I think
> we'd need two flavors of bpf_assert() asm macros: with and without
> bpf_throw(). They seem to be useful without actual throw. They will help the
> verifier understand the ranges of variables in both cases. The macros are the
> 99% of the feature. The actual throw mechanism is 1%. The unwinding and
> release of resource is a cost of using macros without explicit control flow in
> the bpf programs. The macros without throw might be good enough in many cases.

Ok, I'll update to allow for both variants. But just to confirm, do you want to
shelve automatic cleanup (part 2) for now, or do you want to add it? I'm not clear
on whether you agree or disagree that it's necessary.

I'll try the prog->aux->exception route, but I must say that we're sort of
trading off simpler semantics/behavior for speedup in that case (which is
necessary, but it is a cost IMO). I'll post a version using that, but I'll also
add comparisons with the variant where we spill ptr to exception state and
load+jmp. It's an extra instruction but I will try to benchmark and see how much
difference it causes in practice (probably over the XDP benchmark using such
exceptions, since that's one of the most important performance-critical use
cases). If you agree, let's double down on whatever approach we choose after
analysing the difference?

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH RFC bpf-next v1 4/9] bpf: Handle throwing BPF callbacks in helpers and kfuncs
  2023-04-13 17:13             ` Kumar Kartikeya Dwivedi
@ 2023-04-13 23:41               ` Alexei Starovoitov
  2023-04-16  4:45                 ` Kumar Kartikeya Dwivedi
  0 siblings, 1 reply; 30+ messages in thread
From: Alexei Starovoitov @ 2023-04-13 23:41 UTC (permalink / raw)
  To: Kumar Kartikeya Dwivedi
  Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Martin KaFai Lau, David Vernet

On Thu, Apr 13, 2023 at 07:13:22PM +0200, Kumar Kartikeya Dwivedi wrote:
> On Wed, Apr 12, 2023 at 09:43:06PM CEST, Alexei Starovoitov wrote:
> > On Fri, Apr 07, 2023 at 04:57:48AM +0200, Kumar Kartikeya Dwivedi wrote:
> > > On Fri, Apr 07, 2023 at 04:15:19AM CEST, Alexei Starovoitov wrote:
> > > > On Fri, Apr 07, 2023 at 02:07:06AM +0200, Kumar Kartikeya Dwivedi wrote:
> > > > > On Thu, Apr 06, 2023 at 04:21:39AM CEST, Alexei Starovoitov wrote:
> > > > > > On Wed, Apr 05, 2023 at 02:42:34AM +0200, Kumar Kartikeya Dwivedi wrote:
> > > > > > > @@ -759,6 +759,8 @@ BPF_CALL_4(bpf_loop, u32, nr_loops, void *, callback_fn, void *, callback_ctx,
> > > > > > >
> > > > > > >  	for (i = 0; i < nr_loops; i++) {
> > > > > > >  		ret = callback((u64)i, (u64)(long)callback_ctx, 0, 0, 0);
> > > > > > > +		if (bpf_get_exception())
> > > > > > > +			return -EJUKEBOX;
> > > > > >
> > > > > > This is too slow.
> > > > > > We cannot afford a call and conditional here.
> > > > >
> > > > > There are two more options here: have two variants, one with and without the
> > > > > check (always_inline template and bpf_loop vs bpf_loop_except calling functions
> > > > > which pass false/true) and dispatch to the appropriate one based on if callback
> > > > > throws or not (so the cost is not paid for current users at all). Secondly, we
> > > > > can avoid repeated calls by hoisting the call out and save the pointer to
> > > > > exception state, then it's a bit less costly.
> > > > >
> > > > > > Some time ago folks tried bpf_loop() and went back to bounded loop, because
> > > > > > the overhead of indirect call was not acceptable.
> > > > > > After that we've added inlining of bpf_loop() to make overhead to the minimum.
> > > > > > With prog->aux->exception[] approach it might be ok-ish,
> > > > > > but my preference would be to disallow throw in callbacks.
> > > > > > timer cb, rbtree_add cb are typically small.
> > > > > > bpf_loop cb can be big, but we have open coded iterators now.
> > > > > > So disabling asserts in cb-s is probably acceptable trade-off.
> > > > >
> > > > > If the only reason to avoid them is the added performance cost, we can work
> > > > > towards eliminating that when bpf_throw is not used (see above). I agree that
> > > > > supporting it everywhere means thinking about a lot more corner cases, but I
> > > > > feel it would be less surprising if doing bpf_assert simply worked everywhere.
> > > > > One of the other reasons is that if it's being used within a shared static
> > > > > function that both main program and callbacks call into, it will be a bit
> > > > > annoying that it doesn't work in one context.
> > > >
> > > > I hope with open coded iterators the only use case for callbacks will be timers,
> > > > exception cb and rbtree_add. All three are special cases.
> > >
> > > There's also bpf_for_each_map_elem, bpf_find_vma, bpf_user_ringbuf_drain.
> > >
> > > > There is nothing to unwind in the timer case.
> > > > Certainly not allowed to rethrow in exception cb.
> > > > less-like rbtree_add should be tiny. And it's gotta be fast.
> > >
> > > I agree for some of the cases above they do not matter too much. The main one
> > > was bpf_loop, and the ones I listed (like for_each_map) seem to be those where
> > > people may do significant work in the callback.
> > >
> > > I think we can make it zero cost for programs that don't use it (even for the
> > > kernel code), I was just hoping to be able to support it everywhere as a generic
> > > helper to abort program execution without any special corner cases during usage.
> > > The other main point was about code sharing of functions which makes use of
> > > assertions. But I'm ok with dropping support for callbacks if you think it's not
> > > worth it in the end.
> >
> > I think the unexpected run-time slowdown due to checks is a bigger problem.
> > Since bpf_assert is a 'bad program detector' it should be as zero cost to run-time
> > as possible. Hence I'm advocating for single load+jmp approach of prog->aux->exception.
> 
> I 100% agree, slow down is a big problem and a downside of the current version.
> They need to be as lightweight as possible. It's too costly right now in this
> set.
> 
> > The run-time matters more than the ability to use assert in all conditions. I think
> > we'd need two flavors of bpf_assert() asm macros: with and without
> > bpf_throw(). They seem to be useful without actual throw. They will help the
> > verifier understand the ranges of variables in both cases. The macros are the
> > 99% of the feature. The actual throw mechanism is 1%. The unwinding and
> > release of resource is a cost of using macros without explicit control flow in
> > the bpf programs. The macros without throw might be good enough in many cases.
> 
> Ok, I'll update to allow for both variants. But just to confirm, do you want to
> shelve automatic cleanup (part 2 for now), or want to add it? I'm not clear on
> whether you agree or disagree it's necessary.
> 
> I'll try the prog->aux->exception route, but I must say that we're sort of
> trading off simpler semantics/behavior for speedup in that case (which is
> necessary, but it is a cost IMO). I'll post a version using that, but I'll also
> add comparisons with the variant where we spill ptr to exception state and
> load+jmp. It's an extra instruction but I will try to benchmark and see how much
> difference it causes in practice (probably over the XDP benchmark using such
> exceptions, since that's one of the most important performance-critical use
> cases). If you agree, let's double down on whatever approach we choose after
> analysing the difference?

I think performance considerations outweigh implementation complexity and ease of use.
Could you describe what 'spill to exception state' would look like?
I think the check after bpf_call insn has to be no more than LD + JMP.
I was thinking whether we can do static_key like patching of the code.
bpf_throw will know all locations that should be converted from nop into check
and will do text_poke_bp before throwing.
Maybe we can consider offline unwind and release too. The verifier will prep
release tables and throw will execute them. BPF progs always have frame pointers,
so walking the stack back is relatively easy. Release per callsite is hard.
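
Roughly, the kind of per-IP release table and unwind walk that implies could
look like this (purely illustrative pseudo-kernel code; none of these names,
types, or layouts exist in this series):

struct release_entry {
	unsigned long ip;		/* return address of a throwing call site */
	int fp_off;			/* frame-pointer offset of the live resource */
	void (*release)(void *res);	/* how the verifier says to drop it */
};

/* Conceptual slow path run by bpf_throw(): walk the BPF frames via the frame
 * pointer, release whatever the verifier recorded as live at each call site,
 * then land in the exception callback of the main frame. */
static void unwind_and_release(const struct release_entry *tbl, int n,
			       const unsigned long *frame_ptrs,
			       const unsigned long *ret_ips, int depth)
{
	for (int d = 0; d < depth; d++) {
		for (int i = 0; i < n; i++) {
			if (tbl[i].ip != ret_ips[d])
				continue;
			tbl[i].release(*(void **)(frame_ptrs[d] + tbl[i].fp_off));
		}
	}
}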

As far as benchmarking I'd use selftests/bpf/bench. No need for real network XDP.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH RFC bpf-next v1 4/9] bpf: Handle throwing BPF callbacks in helpers and kfuncs
  2023-04-13 23:41               ` Alexei Starovoitov
@ 2023-04-16  4:45                 ` Kumar Kartikeya Dwivedi
  2023-04-17 23:20                   ` Alexei Starovoitov
  0 siblings, 1 reply; 30+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2023-04-16  4:45 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Martin KaFai Lau, David Vernet

On Fri, Apr 14, 2023 at 01:41:52AM CEST, Alexei Starovoitov wrote:
> On Thu, Apr 13, 2023 at 07:13:22PM +0200, Kumar Kartikeya Dwivedi wrote:
> > On Wed, Apr 12, 2023 at 09:43:06PM CEST, Alexei Starovoitov wrote:
> > > On Fri, Apr 07, 2023 at 04:57:48AM +0200, Kumar Kartikeya Dwivedi wrote:
> > > > On Fri, Apr 07, 2023 at 04:15:19AM CEST, Alexei Starovoitov wrote:
> > > > > On Fri, Apr 07, 2023 at 02:07:06AM +0200, Kumar Kartikeya Dwivedi wrote:
> > > > > > On Thu, Apr 06, 2023 at 04:21:39AM CEST, Alexei Starovoitov wrote:
> > > > > > > On Wed, Apr 05, 2023 at 02:42:34AM +0200, Kumar Kartikeya Dwivedi wrote:
> > > > > > > > @@ -759,6 +759,8 @@ BPF_CALL_4(bpf_loop, u32, nr_loops, void *, callback_fn, void *, callback_ctx,
> > > > > > > >
> > > > > > > >  	for (i = 0; i < nr_loops; i++) {
> > > > > > > >  		ret = callback((u64)i, (u64)(long)callback_ctx, 0, 0, 0);
> > > > > > > > +		if (bpf_get_exception())
> > > > > > > > +			return -EJUKEBOX;
> > > > > > >
> > > > > > > This is too slow.
> > > > > > > We cannot afford a call and conditional here.
> > > > > >
> > > > > > There are two more options here: have two variants, one with and without the
> > > > > > check (always_inline template and bpf_loop vs bpf_loop_except calling functions
> > > > > > which pass false/true) and dispatch to the appropriate one based on if callback
> > > > > > throws or not (so the cost is not paid for current users at all). Secondly, we
> > > > > > can avoid repeated calls by hoisting the call out and save the pointer to
> > > > > > exception state, then it's a bit less costly.
> > > > > >
> > > > > > > Some time ago folks tried bpf_loop() and went back to bounded loop, because
> > > > > > > the overhead of indirect call was not acceptable.
> > > > > > > After that we've added inlining of bpf_loop() to make overhead to the minimum.
> > > > > > > With prog->aux->exception[] approach it might be ok-ish,
> > > > > > > but my preference would be to disallow throw in callbacks.
> > > > > > > timer cb, rbtree_add cb are typically small.
> > > > > > > bpf_loop cb can be big, but we have open coded iterators now.
> > > > > > > So disabling asserts in cb-s is probably acceptable trade-off.
> > > > > >
> > > > > > If the only reason to avoid them is the added performance cost, we can work
> > > > > > towards eliminating that when bpf_throw is not used (see above). I agree that
> > > > > > supporting it everywhere means thinking about a lot more corner cases, but I
> > > > > > feel it would be less surprising if doing bpf_assert simply worked everywhere.
> > > > > > One of the other reasons is that if it's being used within a shared static
> > > > > > function that both main program and callbacks call into, it will be a bit
> > > > > > annoying that it doesn't work in one context.
> > > > >
> > > > > I hope with open coded iterators the only use case for callbacks will be timers,
> > > > > exception cb and rbtree_add. All three are special cases.
> > > >
> > > > There's also bpf_for_each_map_elem, bpf_find_vma, bpf_user_ringbuf_drain.
> > > >
> > > > > There is nothing to unwind in the timer case.
> > > > > Certainly not allowed to rethrow in exception cb.
> > > > > less-like rbtree_add should be tiny. And it's gotta be fast.
> > > >
> > > > I agree for some of the cases above they do not matter too much. The main one
> > > > was bpf_loop, and the ones I listed (like for_each_map) seem to be those where
> > > > people may do significant work in the callback.
> > > >
> > > > I think we can make it zero cost for programs that don't use it (even for the
> > > > kernel code), I was just hoping to be able to support it everywhere as a generic
> > > > helper to abort program execution without any special corner cases during usage.
> > > > The other main point was about code sharing of functions which makes use of
> > > > assertions. But I'm ok with dropping support for callbacks if you think it's not
> > > > worth it in the end.
> > >
> > > I think the unexpected run-time slowdown due to checks is a bigger problem.
> > > Since bpf_assert is a 'bad program detector' it should be as zero cost to run-time
> > > as possible. Hence I'm advocating for single load+jmp approach of prog->aux->exception.
> >
> > I 100% agree, slow down is a big problem and a downside of the current version.
> > They need to be as lightweight as possible. It's too costly right now in this
> > set.
> >
> > > The run-time matters more than the ability to use assert in all conditions. I think
> > > we'd need two flavors of bpf_assert() asm macros: with and without
> > > bpf_throw(). They seem to be useful without actual throw. They will help the
> > > verifier understand the ranges of variables in both cases. The macros are the
> > > 99% of the feature. The actual throw mechanism is 1%. The unwinding and
> > > release of resource is a cost of using macros without explicit control flow in
> > > the bpf programs. The macros without throw might be good enough in many cases.
> >
> > Ok, I'll update to allow for both variants. But just to confirm, do you want to
> > shelve automatic cleanup (part 2 for now), or want to add it? I'm not clear on
> > whether you agree or disagree it's necessary.
> >
> > I'll try the prog->aux->exception route, but I must say that we're sort of
> > trading off simpler semantics/behavior for speedup in that case (which is
> > necessary, but it is a cost IMO). I'll post a version using that, but I'll also
> > add comparisons with the variant where we spill ptr to exception state and
> > load+jmp. It's an extra instruction but I will try to benchmark and see how much
> > difference it causes in practice (probably over the XDP benchmark using such
> > exceptions, since that's one of the most important performance-critical use
> > cases). If you agree, let's double down on whatever approach we choose after
> > analysing the difference?
>
> I think performance considerations dominate implementation and ease of use.
> Could you describe how 'spill to exception state' will look like?

It was about spilling a pointer to the exception state on entry to the program (so
just the main subprog) at a known offset on the stack; then, instead of LD + JMP,
the check becomes LD from stack + LD + JMP where it is needed.
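
Roughly, the two check shapes being compared look like this at the C level
(illustrative only; the verifier would emit BPF instructions, not C, and all
names and offsets here are made up):

struct exc_state { int thrown; };

/* prog->aux->exception route: the state address is a constant patched into
 * the program, so the post-call check is a single LD + JMP. */
static int check_global(struct exc_state *prog_exc)
{
	return prog_exc->thrown;
}

/* spill route: a pointer to the per-context state is stored at a fixed stack
 * slot on program entry, so each check is LD from stack + LD + JMP. */
static int check_spilled(char *frame, int slot_off)
{
	struct exc_state *st = *(struct exc_state **)(frame + slot_off);

	return st->thrown;
}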

> I think the check after bpf_call insn has to be no more than LD + JMP.
> I was thinking whether we can do static_key like patching of the code.
> bpf_throw will know all locations that should be converted from nop into check
> and will do text_poke_bp before throwing.
> Maybe we can consider offline unwind and release too. The verifier will prep
> release tables and throw will execute them. BPF progs always have frame pointers,
> so walking the stack back is relatively easy. Release per callsite is hard.
>

After some thought, I think offline unwinding is the way to go. That means no
rewrites for the existing code, and we just offload all the cost to the slow
path (bpf_throw call) as it should be. There would be no cost at runtime (except
the conditional branch, which should be well predicted). The current approach
was a bit simpler so I gave it a shot first but I think it's not the way to go.
I will rework the set.

> As far as benchmarking I'd use selftests/bpf/bench. No need for real network XDP.

Got it.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH RFC bpf-next v1 4/9] bpf: Handle throwing BPF callbacks in helpers and kfuncs
  2023-04-16  4:45                 ` Kumar Kartikeya Dwivedi
@ 2023-04-17 23:20                   ` Alexei Starovoitov
  0 siblings, 0 replies; 30+ messages in thread
From: Alexei Starovoitov @ 2023-04-17 23:20 UTC (permalink / raw)
  To: Kumar Kartikeya Dwivedi
  Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Martin KaFai Lau, David Vernet

On Sat, Apr 15, 2023 at 9:45 PM Kumar Kartikeya Dwivedi
<memxor@gmail.com> wrote:
>
> > I think the check after bpf_call insn has to be no more than LD + JMP.
> > I was thinking whether we can do static_key like patching of the code.
> > bpf_throw will know all locations that should be converted from nop into check
> > and will do text_poke_bp before throwing.
> > Maybe we can consider offline unwind and release too. The verifier will prep
> > release tables and throw will execute them. BPF progs always have frame pointers,
> > so walking the stack back is relatively easy. Release per callsite is hard.
> >
>
> After some thought, I think offline unwinding is the way to go. That means no
> rewrites for the existing code, and we just offload all the cost to the slow
> path (bpf_throw call) as it should be. There would be no cost at runtime (except
> the conditional branch, which should be well predicted). The current approach
> was a bit simpler so I gave it a shot first but I think it's not the way to go.
> I will rework the set.

It seems so indeed. Offline unwinding is more complex for sure.
The challenge is to make it mostly arch independent.
Something like get_perf_callchain() followed by lookup IP->release_table
and final setjmp() to bpf callback in the last bpf frame.
is_bpf_text_address() will help.
We shouldn't gate it by HAVE_RELIABLE_STACKTRACE,
since bpf prog stack walking is reliable on all archs where JIT is enabled.
Unwinding won't work reliably in interpreted mode though and it's ok.
bpf_throw is a kfunc and it needs prog->jit_requested anyway.

^ permalink raw reply	[flat|nested] 30+ messages in thread

end of thread, other threads:[~2023-04-17 23:20 UTC | newest]

Thread overview: 30+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-04-05  0:42 [PATCH RFC bpf-next v1 0/9] Exceptions - 1/2 Kumar Kartikeya Dwivedi
2023-04-05  0:42 ` [PATCH RFC bpf-next v1 1/9] bpf: Fix kfunc callback handling Kumar Kartikeya Dwivedi
2023-04-05  0:42 ` [PATCH RFC bpf-next v1 2/9] bpf: Refactor and generalize optimize_bpf_loop Kumar Kartikeya Dwivedi
2023-04-05  0:42 ` [PATCH RFC bpf-next v1 3/9] bpf: Implement bpf_throw kfunc Kumar Kartikeya Dwivedi
2023-04-06  2:16   ` Alexei Starovoitov
2023-04-06 23:54     ` Kumar Kartikeya Dwivedi
2023-04-07  2:11       ` Alexei Starovoitov
2023-04-07  2:46         ` Kumar Kartikeya Dwivedi
2023-04-12 19:36           ` Alexei Starovoitov
2023-04-13 17:05             ` Kumar Kartikeya Dwivedi
2023-04-05  0:42 ` [PATCH RFC bpf-next v1 4/9] bpf: Handle throwing BPF callbacks in helpers and kfuncs Kumar Kartikeya Dwivedi
2023-04-06  2:21   ` Alexei Starovoitov
2023-04-07  0:07     ` Kumar Kartikeya Dwivedi
2023-04-07  2:15       ` Alexei Starovoitov
2023-04-07  2:57         ` Kumar Kartikeya Dwivedi
2023-04-12 19:43           ` Alexei Starovoitov
2023-04-13 17:13             ` Kumar Kartikeya Dwivedi
2023-04-13 23:41               ` Alexei Starovoitov
2023-04-16  4:45                 ` Kumar Kartikeya Dwivedi
2023-04-17 23:20                   ` Alexei Starovoitov
2023-04-05  0:42 ` [PATCH RFC bpf-next v1 5/9] bpf: Add pass to fixup global function throw information Kumar Kartikeya Dwivedi
2023-04-05  0:42 ` [PATCH RFC bpf-next v1 6/9] bpf: Add KF_THROW annotation for kfuncs Kumar Kartikeya Dwivedi
2023-04-05  0:42 ` [PATCH RFC bpf-next v1 7/9] bpf: Introduce bpf_set_exception_callback kfunc Kumar Kartikeya Dwivedi
2023-04-05  0:42 ` [PATCH RFC bpf-next v1 8/9] bpf: Introduce BPF assertion macros Kumar Kartikeya Dwivedi
2023-04-05  0:42 ` [PATCH RFC bpf-next v1 9/9] selftests/bpf: Add tests for BPF exceptions Kumar Kartikeya Dwivedi
2023-04-06  2:38   ` Alexei Starovoitov
2023-04-07  0:42     ` Kumar Kartikeya Dwivedi
2023-04-07  2:30       ` Alexei Starovoitov
2023-04-07  3:12         ` Kumar Kartikeya Dwivedi
2023-04-12 19:46           ` Alexei Starovoitov

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).