* [PATCH bpf-next] bpf: Restrict attachment of bpf program to some tracepoints
@ 2022-11-21 21:31 Jiri Olsa
  2022-11-24  0:41 ` Daniel Borkmann
  0 siblings, 1 reply; 24+ messages in thread
From: Jiri Olsa @ 2022-11-21 21:31 UTC (permalink / raw)
  To: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko
  Cc: Hao Sun, bpf, Martin KaFai Lau, Song Liu, Yonghong Song,
	John Fastabend, KP Singh, Stanislav Fomichev, Hao Luo

We hit the following issues [1] [2] when we attach a bpf program that
calls the bpf_trace_printk helper to the contention_begin tracepoint.

As described in [3], multiple bpf programs that call the bpf_trace_printk
helper and are attached to the contention_begin tracepoint might exhaust
the printk buffer or cause a deadlock [2].

There's also another possible deadlock when multiple bpf programs attach
to the bpf_trace_printk tracepoint and call one of the printk bpf helpers.

This change denies attachment of a bpf program to the contention_begin
and bpf_trace_printk tracepoints if the bpf program calls one of the
printk bpf helpers.

Also add a verifier check for tp_btf programs, so this can be caught
at program load time with an error message like:

  Can't attach program with bpf_trace_printk#6 helper to contention_begin tracepoint.

[1] https://lore.kernel.org/bpf/CACkBjsakT_yWxnSWr4r-0TpPvbKm9-OBmVUhJb7hV3hY8fdCkw@mail.gmail.com/
[2] https://lore.kernel.org/bpf/CACkBjsaCsTovQHFfkqJKto6S4Z8d02ud1D7MPESrHa1cVNNTrw@mail.gmail.com/
[3] https://lore.kernel.org/bpf/Y2j6ivTwFmA0FtvY@krava/

Reported-by: Hao Sun <sunhao.th@gmail.com>
Suggested-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
---
 include/linux/bpf.h          |  1 +
 include/linux/bpf_verifier.h |  2 ++
 kernel/bpf/syscall.c         |  3 +++
 kernel/bpf/verifier.c        | 46 ++++++++++++++++++++++++++++++++++++
 4 files changed, 52 insertions(+)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index c9eafa67f2a2..3ccabede0f50 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -1319,6 +1319,7 @@ struct bpf_prog {
 				enforce_expected_attach_type:1, /* Enforce expected_attach_type checking at attach time */
 				call_get_stack:1, /* Do we call bpf_get_stack() or bpf_get_stackid() */
 				call_get_func_ip:1, /* Do we call get_func_ip() */
+				call_printk:1, /* Do we call trace_printk/trace_vprintk  */
 				tstamp_type_access:1; /* Accessed __sk_buff->tstamp_type */
 	enum bpf_prog_type	type;		/* Type of BPF program */
 	enum bpf_attach_type	expected_attach_type; /* For some prog types */
diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h
index 545152ac136c..7118c2fda59d 100644
--- a/include/linux/bpf_verifier.h
+++ b/include/linux/bpf_verifier.h
@@ -618,6 +618,8 @@ bool is_dynptr_type_expected(struct bpf_verifier_env *env,
 			     struct bpf_reg_state *reg,
 			     enum bpf_arg_type arg_type);
 
+int bpf_check_tp_printk_denylist(const char *name, struct bpf_prog *prog);
+
 /* this lives here instead of in bpf.h because it needs to dereference tgt_prog */
 static inline u64 bpf_trampoline_compute_key(const struct bpf_prog *tgt_prog,
 					     struct btf *btf, u32 btf_id)
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 35972afb6850..9a69bda7d62b 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -3329,6 +3329,9 @@ static int bpf_raw_tp_link_attach(struct bpf_prog *prog,
 		return -EINVAL;
 	}
 
+	if (bpf_check_tp_printk_denylist(tp_name, prog))
+		return -EACCES;
+
 	btp = bpf_get_raw_tracepoint(tp_name);
 	if (!btp)
 		return -ENOENT;
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index f07bec227fef..b662bc851e1c 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -7472,6 +7472,47 @@ static void update_loop_inline_state(struct bpf_verifier_env *env, u32 subprogno
 				 state->callback_subprogno == subprogno);
 }
 
+int bpf_check_tp_printk_denylist(const char *name, struct bpf_prog *prog)
+{
+	static const char * const denylist[] = {
+		"contention_begin",
+		"bpf_trace_printk",
+	};
+	int i;
+
+	/* Do not allow attachment to denylist[] tracepoints,
+	 * if the program calls some of the printk helpers,
+	 * because there's possibility of deadlock.
+	 */
+	if (!prog->call_printk)
+		return 0;
+
+	for (i = 0; i < ARRAY_SIZE(denylist); i++) {
+		if (!strcmp(denylist[i], name))
+			return 1;
+	}
+	return 0;
+}
+
+static int check_tp_printk_denylist(struct bpf_verifier_env *env, int func_id)
+{
+	struct bpf_prog *prog = env->prog;
+
+	if (prog->type != BPF_PROG_TYPE_TRACING ||
+	    prog->expected_attach_type != BPF_TRACE_RAW_TP)
+		return 0;
+
+	if (WARN_ON_ONCE(!prog->aux->attach_func_name))
+		return -EINVAL;
+
+	if (!bpf_check_tp_printk_denylist(prog->aux->attach_func_name, prog))
+		return 0;
+
+	verbose(env, "Can't attach program with %s#%d helper to %s tracepoint.\n",
+		func_id_name(func_id), func_id, prog->aux->attach_func_name);
+	return -EACCES;
+}
+
 static int check_helper_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
 			     int *insn_idx_p)
 {
@@ -7675,6 +7716,11 @@ static int check_helper_call(struct bpf_verifier_env *env, struct bpf_insn *insn
 		err = __check_func_call(env, insn, insn_idx_p, meta.subprogno,
 					set_user_ringbuf_callback_state);
 		break;
+	case BPF_FUNC_trace_printk:
+	case BPF_FUNC_trace_vprintk:
+		env->prog->call_printk = 1;
+		err = check_tp_printk_denylist(env, func_id);
+		break;
 	}
 
 	if (err)
-- 
2.38.1



* Re: [PATCH bpf-next] bpf: Restrict attachment of bpf program to some tracepoints
  2022-11-21 21:31 [PATCH bpf-next] bpf: Restrict attachment of bpf program to some tracepoints Jiri Olsa
@ 2022-11-24  0:41 ` Daniel Borkmann
  2022-11-24  9:42   ` Jiri Olsa
  0 siblings, 1 reply; 24+ messages in thread
From: Daniel Borkmann @ 2022-11-24  0:41 UTC (permalink / raw)
  To: Jiri Olsa, Alexei Starovoitov, Andrii Nakryiko
  Cc: Hao Sun, bpf, Martin KaFai Lau, Song Liu, Yonghong Song,
	John Fastabend, KP Singh, Stanislav Fomichev, Hao Luo

On 11/21/22 10:31 PM, Jiri Olsa wrote:
> We hit following issues [1] [2] when we attach bpf program that calls
> bpf_trace_printk helper to the contention_begin tracepoint.
> 
> As described in [3] with multiple bpf programs that call bpf_trace_printk
> helper attached to the contention_begin might result in exhaustion of
> printk buffer or cause a deadlock [2].
> 
> There's also another possible deadlock when multiple bpf programs attach
> to bpf_trace_printk tracepoint and call one of the printk bpf helpers.
> 
> This change denies the attachment of bpf program to contention_begin
> and bpf_trace_printk tracepoints if the bpf program calls one of the
> printk bpf helpers.
> 
> Adding also verifier check for tb_btf programs, so this can be cought
> in program loading time with error message like:
> 
>    Can't attach program with bpf_trace_printk#6 helper to contention_begin tracepoint.
> 
> [1] https://lore.kernel.org/bpf/CACkBjsakT_yWxnSWr4r-0TpPvbKm9-OBmVUhJb7hV3hY8fdCkw@mail.gmail.com/
> [2] https://lore.kernel.org/bpf/CACkBjsaCsTovQHFfkqJKto6S4Z8d02ud1D7MPESrHa1cVNNTrw@mail.gmail.com/
> [3] https://lore.kernel.org/bpf/Y2j6ivTwFmA0FtvY@krava/
> 
> Reported-by: Hao Sun <sunhao.th@gmail.com>
> Suggested-by: Alexei Starovoitov <ast@kernel.org>
> Signed-off-by: Jiri Olsa <jolsa@kernel.org>
> ---
>   include/linux/bpf.h          |  1 +
>   include/linux/bpf_verifier.h |  2 ++
>   kernel/bpf/syscall.c         |  3 +++
>   kernel/bpf/verifier.c        | 46 ++++++++++++++++++++++++++++++++++++
>   4 files changed, 52 insertions(+)
> 
> diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> index c9eafa67f2a2..3ccabede0f50 100644
> --- a/include/linux/bpf.h
> +++ b/include/linux/bpf.h
> @@ -1319,6 +1319,7 @@ struct bpf_prog {
>   				enforce_expected_attach_type:1, /* Enforce expected_attach_type checking at attach time */
>   				call_get_stack:1, /* Do we call bpf_get_stack() or bpf_get_stackid() */
>   				call_get_func_ip:1, /* Do we call get_func_ip() */
> +				call_printk:1, /* Do we call trace_printk/trace_vprintk  */
>   				tstamp_type_access:1; /* Accessed __sk_buff->tstamp_type */
>   	enum bpf_prog_type	type;		/* Type of BPF program */
>   	enum bpf_attach_type	expected_attach_type; /* For some prog types */
> diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h
> index 545152ac136c..7118c2fda59d 100644
> --- a/include/linux/bpf_verifier.h
> +++ b/include/linux/bpf_verifier.h
> @@ -618,6 +618,8 @@ bool is_dynptr_type_expected(struct bpf_verifier_env *env,
>   			     struct bpf_reg_state *reg,
>   			     enum bpf_arg_type arg_type);
>   
> +int bpf_check_tp_printk_denylist(const char *name, struct bpf_prog *prog);
> +
>   /* this lives here instead of in bpf.h because it needs to dereference tgt_prog */
>   static inline u64 bpf_trampoline_compute_key(const struct bpf_prog *tgt_prog,
>   					     struct btf *btf, u32 btf_id)
> diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
> index 35972afb6850..9a69bda7d62b 100644
> --- a/kernel/bpf/syscall.c
> +++ b/kernel/bpf/syscall.c
> @@ -3329,6 +3329,9 @@ static int bpf_raw_tp_link_attach(struct bpf_prog *prog,
>   		return -EINVAL;
>   	}
>   
> +	if (bpf_check_tp_printk_denylist(tp_name, prog))
> +		return -EACCES;
> +
>   	btp = bpf_get_raw_tracepoint(tp_name);
>   	if (!btp)
>   		return -ENOENT;
> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> index f07bec227fef..b662bc851e1c 100644
> --- a/kernel/bpf/verifier.c
> +++ b/kernel/bpf/verifier.c
> @@ -7472,6 +7472,47 @@ static void update_loop_inline_state(struct bpf_verifier_env *env, u32 subprogno
>   				 state->callback_subprogno == subprogno);
>   }
>   
> +int bpf_check_tp_printk_denylist(const char *name, struct bpf_prog *prog)
> +{
> +	static const char * const denylist[] = {
> +		"contention_begin",
> +		"bpf_trace_printk",
> +	};
> +	int i;
> +
> +	/* Do not allow attachment to denylist[] tracepoints,
> +	 * if the program calls some of the printk helpers,
> +	 * because there's possibility of deadlock.
> +	 */

What if that prog doesn't but tail calls into another one which calls printk helpers?

> +	if (!prog->call_printk)
> +		return 0;
> +
> +	for (i = 0; i < ARRAY_SIZE(denylist); i++) {
> +		if (!strcmp(denylist[i], name))
> +			return 1;
> +	}
> +	return 0;
> +}
> +
> +static int check_tp_printk_denylist(struct bpf_verifier_env *env, int func_id)
> +{
> +	struct bpf_prog *prog = env->prog;
> +
> +	if (prog->type != BPF_PROG_TYPE_TRACING ||
> +	    prog->expected_attach_type != BPF_TRACE_RAW_TP)
> +		return 0;
> +
> +	if (WARN_ON_ONCE(!prog->aux->attach_func_name))
> +		return -EINVAL;
> +
> +	if (!bpf_check_tp_printk_denylist(prog->aux->attach_func_name, prog))
> +		return 0;
> +
> +	verbose(env, "Can't attach program with %s#%d helper to %s tracepoint.\n",
> +		func_id_name(func_id), func_id, prog->aux->attach_func_name);
> +	return -EACCES;
> +}
> +
>   static int check_helper_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
>   			     int *insn_idx_p)
>   {
> @@ -7675,6 +7716,11 @@ static int check_helper_call(struct bpf_verifier_env *env, struct bpf_insn *insn
>   		err = __check_func_call(env, insn, insn_idx_p, meta.subprogno,
>   					set_user_ringbuf_callback_state);
>   		break;
> +	case BPF_FUNC_trace_printk:
> +	case BPF_FUNC_trace_vprintk:
> +		env->prog->call_printk = 1;
> +		err = check_tp_printk_denylist(env, func_id);
> +		break;
>   	}
>   
>   	if (err)
> 



* Re: [PATCH bpf-next] bpf: Restrict attachment of bpf program to some tracepoints
  2022-11-24  0:41 ` Daniel Borkmann
@ 2022-11-24  9:42   ` Jiri Olsa
  2022-11-24 17:17     ` Alexei Starovoitov
  0 siblings, 1 reply; 24+ messages in thread
From: Jiri Olsa @ 2022-11-24  9:42 UTC (permalink / raw)
  To: Daniel Borkmann
  Cc: Alexei Starovoitov, Andrii Nakryiko, Hao Sun, bpf,
	Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend,
	KP Singh, Stanislav Fomichev, Hao Luo

On Thu, Nov 24, 2022 at 01:41:23AM +0100, Daniel Borkmann wrote:
> On 11/21/22 10:31 PM, Jiri Olsa wrote:
> > We hit following issues [1] [2] when we attach bpf program that calls
> > bpf_trace_printk helper to the contention_begin tracepoint.
> > 
> > As described in [3] with multiple bpf programs that call bpf_trace_printk
> > helper attached to the contention_begin might result in exhaustion of
> > printk buffer or cause a deadlock [2].
> > 
> > There's also another possible deadlock when multiple bpf programs attach
> > to bpf_trace_printk tracepoint and call one of the printk bpf helpers.
> > 
> > This change denies the attachment of bpf program to contention_begin
> > and bpf_trace_printk tracepoints if the bpf program calls one of the
> > printk bpf helpers.
> > 
> > Adding also verifier check for tb_btf programs, so this can be cought
> > in program loading time with error message like:
> > 
> >    Can't attach program with bpf_trace_printk#6 helper to contention_begin tracepoint.
> > 
> > [1] https://lore.kernel.org/bpf/CACkBjsakT_yWxnSWr4r-0TpPvbKm9-OBmVUhJb7hV3hY8fdCkw@mail.gmail.com/
> > [2] https://lore.kernel.org/bpf/CACkBjsaCsTovQHFfkqJKto6S4Z8d02ud1D7MPESrHa1cVNNTrw@mail.gmail.com/
> > [3] https://lore.kernel.org/bpf/Y2j6ivTwFmA0FtvY@krava/
> > 
> > Reported-by: Hao Sun <sunhao.th@gmail.com>
> > Suggested-by: Alexei Starovoitov <ast@kernel.org>
> > Signed-off-by: Jiri Olsa <jolsa@kernel.org>
> > ---
> >   include/linux/bpf.h          |  1 +
> >   include/linux/bpf_verifier.h |  2 ++
> >   kernel/bpf/syscall.c         |  3 +++
> >   kernel/bpf/verifier.c        | 46 ++++++++++++++++++++++++++++++++++++
> >   4 files changed, 52 insertions(+)
> > 
> > diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> > index c9eafa67f2a2..3ccabede0f50 100644
> > --- a/include/linux/bpf.h
> > +++ b/include/linux/bpf.h
> > @@ -1319,6 +1319,7 @@ struct bpf_prog {
> >   				enforce_expected_attach_type:1, /* Enforce expected_attach_type checking at attach time */
> >   				call_get_stack:1, /* Do we call bpf_get_stack() or bpf_get_stackid() */
> >   				call_get_func_ip:1, /* Do we call get_func_ip() */
> > +				call_printk:1, /* Do we call trace_printk/trace_vprintk  */
> >   				tstamp_type_access:1; /* Accessed __sk_buff->tstamp_type */
> >   	enum bpf_prog_type	type;		/* Type of BPF program */
> >   	enum bpf_attach_type	expected_attach_type; /* For some prog types */
> > diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h
> > index 545152ac136c..7118c2fda59d 100644
> > --- a/include/linux/bpf_verifier.h
> > +++ b/include/linux/bpf_verifier.h
> > @@ -618,6 +618,8 @@ bool is_dynptr_type_expected(struct bpf_verifier_env *env,
> >   			     struct bpf_reg_state *reg,
> >   			     enum bpf_arg_type arg_type);
> > +int bpf_check_tp_printk_denylist(const char *name, struct bpf_prog *prog);
> > +
> >   /* this lives here instead of in bpf.h because it needs to dereference tgt_prog */
> >   static inline u64 bpf_trampoline_compute_key(const struct bpf_prog *tgt_prog,
> >   					     struct btf *btf, u32 btf_id)
> > diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
> > index 35972afb6850..9a69bda7d62b 100644
> > --- a/kernel/bpf/syscall.c
> > +++ b/kernel/bpf/syscall.c
> > @@ -3329,6 +3329,9 @@ static int bpf_raw_tp_link_attach(struct bpf_prog *prog,
> >   		return -EINVAL;
> >   	}
> > +	if (bpf_check_tp_printk_denylist(tp_name, prog))
> > +		return -EACCES;
> > +
> >   	btp = bpf_get_raw_tracepoint(tp_name);
> >   	if (!btp)
> >   		return -ENOENT;
> > diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> > index f07bec227fef..b662bc851e1c 100644
> > --- a/kernel/bpf/verifier.c
> > +++ b/kernel/bpf/verifier.c
> > @@ -7472,6 +7472,47 @@ static void update_loop_inline_state(struct bpf_verifier_env *env, u32 subprogno
> >   				 state->callback_subprogno == subprogno);
> >   }
> > +int bpf_check_tp_printk_denylist(const char *name, struct bpf_prog *prog)
> > +{
> > +	static const char * const denylist[] = {
> > +		"contention_begin",
> > +		"bpf_trace_printk",
> > +	};
> > +	int i;
> > +
> > +	/* Do not allow attachment to denylist[] tracepoints,
> > +	 * if the program calls some of the printk helpers,
> > +	 * because there's possibility of deadlock.
> > +	 */
> 
> What if that prog doesn't but tail calls into another one which calls printk helpers?

right, I'll deny that for all BPF_PROG_TYPE_RAW_TRACEPOINT* programs,
because I don't see an easy way to check for that

we can leave the printk check for tracing BPF_TRACE_RAW_TP programs,
because the verifier already knows the exact tracepoint

thanks,
jirka

> 
> > +	if (!prog->call_printk)
> > +		return 0;
> > +
> > +	for (i = 0; i < ARRAY_SIZE(denylist); i++) {
> > +		if (!strcmp(denylist[i], name))
> > +			return 1;
> > +	}
> > +	return 0;
> > +}
> > +
> > +static int check_tp_printk_denylist(struct bpf_verifier_env *env, int func_id)
> > +{
> > +	struct bpf_prog *prog = env->prog;
> > +
> > +	if (prog->type != BPF_PROG_TYPE_TRACING ||
> > +	    prog->expected_attach_type != BPF_TRACE_RAW_TP)
> > +		return 0;
> > +
> > +	if (WARN_ON_ONCE(!prog->aux->attach_func_name))
> > +		return -EINVAL;
> > +
> > +	if (!bpf_check_tp_printk_denylist(prog->aux->attach_func_name, prog))
> > +		return 0;
> > +
> > +	verbose(env, "Can't attach program with %s#%d helper to %s tracepoint.\n",
> > +		func_id_name(func_id), func_id, prog->aux->attach_func_name);
> > +	return -EACCES;
> > +}
> > +
> >   static int check_helper_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
> >   			     int *insn_idx_p)
> >   {
> > @@ -7675,6 +7716,11 @@ static int check_helper_call(struct bpf_verifier_env *env, struct bpf_insn *insn
> >   		err = __check_func_call(env, insn, insn_idx_p, meta.subprogno,
> >   					set_user_ringbuf_callback_state);
> >   		break;
> > +	case BPF_FUNC_trace_printk:
> > +	case BPF_FUNC_trace_vprintk:
> > +		env->prog->call_printk = 1;
> > +		err = check_tp_printk_denylist(env, func_id);
> > +		break;
> >   	}
> >   	if (err)
> > 
> 


* Re: [PATCH bpf-next] bpf: Restrict attachment of bpf program to some tracepoints
  2022-11-24  9:42   ` Jiri Olsa
@ 2022-11-24 17:17     ` Alexei Starovoitov
  2022-11-25  9:35       ` Jiri Olsa
  2022-12-03 17:42       ` Namhyung Kim
  0 siblings, 2 replies; 24+ messages in thread
From: Alexei Starovoitov @ 2022-11-24 17:17 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: Daniel Borkmann, Alexei Starovoitov, Andrii Nakryiko, Hao Sun,
	bpf, Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend,
	KP Singh, Stanislav Fomichev, Hao Luo

On Thu, Nov 24, 2022 at 1:42 AM Jiri Olsa <olsajiri@gmail.com> wrote:
>
> On Thu, Nov 24, 2022 at 01:41:23AM +0100, Daniel Borkmann wrote:
> > On 11/21/22 10:31 PM, Jiri Olsa wrote:
> > > We hit following issues [1] [2] when we attach bpf program that calls
> > > bpf_trace_printk helper to the contention_begin tracepoint.
> > >
> > > As described in [3] with multiple bpf programs that call bpf_trace_printk
> > > helper attached to the contention_begin might result in exhaustion of
> > > printk buffer or cause a deadlock [2].
> > >
> > > There's also another possible deadlock when multiple bpf programs attach
> > > to bpf_trace_printk tracepoint and call one of the printk bpf helpers.
> > >
> > > This change denies the attachment of bpf program to contention_begin
> > > and bpf_trace_printk tracepoints if the bpf program calls one of the
> > > printk bpf helpers.
> > >
> > > Adding also verifier check for tb_btf programs, so this can be cought
> > > in program loading time with error message like:
> > >
> > >    Can't attach program with bpf_trace_printk#6 helper to contention_begin tracepoint.
> > >
> > > [1] https://lore.kernel.org/bpf/CACkBjsakT_yWxnSWr4r-0TpPvbKm9-OBmVUhJb7hV3hY8fdCkw@mail.gmail.com/
> > > [2] https://lore.kernel.org/bpf/CACkBjsaCsTovQHFfkqJKto6S4Z8d02ud1D7MPESrHa1cVNNTrw@mail.gmail.com/
> > > [3] https://lore.kernel.org/bpf/Y2j6ivTwFmA0FtvY@krava/
> > >
> > > Reported-by: Hao Sun <sunhao.th@gmail.com>
> > > Suggested-by: Alexei Starovoitov <ast@kernel.org>
> > > Signed-off-by: Jiri Olsa <jolsa@kernel.org>
> > > ---
> > >   include/linux/bpf.h          |  1 +
> > >   include/linux/bpf_verifier.h |  2 ++
> > >   kernel/bpf/syscall.c         |  3 +++
> > >   kernel/bpf/verifier.c        | 46 ++++++++++++++++++++++++++++++++++++
> > >   4 files changed, 52 insertions(+)
> > >
> > > diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> > > index c9eafa67f2a2..3ccabede0f50 100644
> > > --- a/include/linux/bpf.h
> > > +++ b/include/linux/bpf.h
> > > @@ -1319,6 +1319,7 @@ struct bpf_prog {
> > >                             enforce_expected_attach_type:1, /* Enforce expected_attach_type checking at attach time */
> > >                             call_get_stack:1, /* Do we call bpf_get_stack() or bpf_get_stackid() */
> > >                             call_get_func_ip:1, /* Do we call get_func_ip() */
> > > +                           call_printk:1, /* Do we call trace_printk/trace_vprintk  */
> > >                             tstamp_type_access:1; /* Accessed __sk_buff->tstamp_type */
> > >     enum bpf_prog_type      type;           /* Type of BPF program */
> > >     enum bpf_attach_type    expected_attach_type; /* For some prog types */
> > > diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h
> > > index 545152ac136c..7118c2fda59d 100644
> > > --- a/include/linux/bpf_verifier.h
> > > +++ b/include/linux/bpf_verifier.h
> > > @@ -618,6 +618,8 @@ bool is_dynptr_type_expected(struct bpf_verifier_env *env,
> > >                          struct bpf_reg_state *reg,
> > >                          enum bpf_arg_type arg_type);
> > > +int bpf_check_tp_printk_denylist(const char *name, struct bpf_prog *prog);
> > > +
> > >   /* this lives here instead of in bpf.h because it needs to dereference tgt_prog */
> > >   static inline u64 bpf_trampoline_compute_key(const struct bpf_prog *tgt_prog,
> > >                                          struct btf *btf, u32 btf_id)
> > > diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
> > > index 35972afb6850..9a69bda7d62b 100644
> > > --- a/kernel/bpf/syscall.c
> > > +++ b/kernel/bpf/syscall.c
> > > @@ -3329,6 +3329,9 @@ static int bpf_raw_tp_link_attach(struct bpf_prog *prog,
> > >             return -EINVAL;
> > >     }
> > > +   if (bpf_check_tp_printk_denylist(tp_name, prog))
> > > +           return -EACCES;
> > > +
> > >     btp = bpf_get_raw_tracepoint(tp_name);
> > >     if (!btp)
> > >             return -ENOENT;
> > > diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> > > index f07bec227fef..b662bc851e1c 100644
> > > --- a/kernel/bpf/verifier.c
> > > +++ b/kernel/bpf/verifier.c
> > > @@ -7472,6 +7472,47 @@ static void update_loop_inline_state(struct bpf_verifier_env *env, u32 subprogno
> > >                              state->callback_subprogno == subprogno);
> > >   }
> > > +int bpf_check_tp_printk_denylist(const char *name, struct bpf_prog *prog)
> > > +{
> > > +   static const char * const denylist[] = {
> > > +           "contention_begin",
> > > +           "bpf_trace_printk",
> > > +   };
> > > +   int i;
> > > +
> > > +   /* Do not allow attachment to denylist[] tracepoints,
> > > +    * if the program calls some of the printk helpers,
> > > +    * because there's possibility of deadlock.
> > > +    */
> >
> > What if that prog doesn't but tail calls into another one which calls printk helpers?
>
> right, I'll deny that for all BPF_PROG_TYPE_RAW_TRACEPOINT* programs,
> because I don't see easy way to check on that
>
> we can leave printk check for tracing BPF_TRACE_RAW_TP programs,
> because verifier known the exact tracepoint already

This is all fragile and merely a stop gap.
It doesn't sound like the issue is limited to bpf_trace_printk


* Re: [PATCH bpf-next] bpf: Restrict attachment of bpf program to some tracepoints
  2022-11-24 17:17     ` Alexei Starovoitov
@ 2022-11-25  9:35       ` Jiri Olsa
  2022-11-30 23:29         ` Andrii Nakryiko
  2022-12-03 17:42       ` Namhyung Kim
  1 sibling, 1 reply; 24+ messages in thread
From: Jiri Olsa @ 2022-11-25  9:35 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Jiri Olsa, Daniel Borkmann, Alexei Starovoitov, Andrii Nakryiko,
	Hao Sun, bpf, Martin KaFai Lau, Song Liu, Yonghong Song,
	John Fastabend, KP Singh, Stanislav Fomichev, Hao Luo

On Thu, Nov 24, 2022 at 09:17:22AM -0800, Alexei Starovoitov wrote:
> On Thu, Nov 24, 2022 at 1:42 AM Jiri Olsa <olsajiri@gmail.com> wrote:
> >
> > On Thu, Nov 24, 2022 at 01:41:23AM +0100, Daniel Borkmann wrote:
> > > On 11/21/22 10:31 PM, Jiri Olsa wrote:
> > > > We hit following issues [1] [2] when we attach bpf program that calls
> > > > bpf_trace_printk helper to the contention_begin tracepoint.
> > > >
> > > > As described in [3] with multiple bpf programs that call bpf_trace_printk
> > > > helper attached to the contention_begin might result in exhaustion of
> > > > printk buffer or cause a deadlock [2].
> > > >
> > > > There's also another possible deadlock when multiple bpf programs attach
> > > > to bpf_trace_printk tracepoint and call one of the printk bpf helpers.
> > > >
> > > > This change denies the attachment of bpf program to contention_begin
> > > > and bpf_trace_printk tracepoints if the bpf program calls one of the
> > > > printk bpf helpers.
> > > >
> > > > Adding also verifier check for tb_btf programs, so this can be cought
> > > > in program loading time with error message like:
> > > >
> > > >    Can't attach program with bpf_trace_printk#6 helper to contention_begin tracepoint.
> > > >
> > > > [1] https://lore.kernel.org/bpf/CACkBjsakT_yWxnSWr4r-0TpPvbKm9-OBmVUhJb7hV3hY8fdCkw@mail.gmail.com/
> > > > [2] https://lore.kernel.org/bpf/CACkBjsaCsTovQHFfkqJKto6S4Z8d02ud1D7MPESrHa1cVNNTrw@mail.gmail.com/
> > > > [3] https://lore.kernel.org/bpf/Y2j6ivTwFmA0FtvY@krava/
> > > >
> > > > Reported-by: Hao Sun <sunhao.th@gmail.com>
> > > > Suggested-by: Alexei Starovoitov <ast@kernel.org>
> > > > Signed-off-by: Jiri Olsa <jolsa@kernel.org>
> > > > ---
> > > >   include/linux/bpf.h          |  1 +
> > > >   include/linux/bpf_verifier.h |  2 ++
> > > >   kernel/bpf/syscall.c         |  3 +++
> > > >   kernel/bpf/verifier.c        | 46 ++++++++++++++++++++++++++++++++++++
> > > >   4 files changed, 52 insertions(+)
> > > >
> > > > diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> > > > index c9eafa67f2a2..3ccabede0f50 100644
> > > > --- a/include/linux/bpf.h
> > > > +++ b/include/linux/bpf.h
> > > > @@ -1319,6 +1319,7 @@ struct bpf_prog {
> > > >                             enforce_expected_attach_type:1, /* Enforce expected_attach_type checking at attach time */
> > > >                             call_get_stack:1, /* Do we call bpf_get_stack() or bpf_get_stackid() */
> > > >                             call_get_func_ip:1, /* Do we call get_func_ip() */
> > > > +                           call_printk:1, /* Do we call trace_printk/trace_vprintk  */
> > > >                             tstamp_type_access:1; /* Accessed __sk_buff->tstamp_type */
> > > >     enum bpf_prog_type      type;           /* Type of BPF program */
> > > >     enum bpf_attach_type    expected_attach_type; /* For some prog types */
> > > > diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h
> > > > index 545152ac136c..7118c2fda59d 100644
> > > > --- a/include/linux/bpf_verifier.h
> > > > +++ b/include/linux/bpf_verifier.h
> > > > @@ -618,6 +618,8 @@ bool is_dynptr_type_expected(struct bpf_verifier_env *env,
> > > >                          struct bpf_reg_state *reg,
> > > >                          enum bpf_arg_type arg_type);
> > > > +int bpf_check_tp_printk_denylist(const char *name, struct bpf_prog *prog);
> > > > +
> > > >   /* this lives here instead of in bpf.h because it needs to dereference tgt_prog */
> > > >   static inline u64 bpf_trampoline_compute_key(const struct bpf_prog *tgt_prog,
> > > >                                          struct btf *btf, u32 btf_id)
> > > > diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
> > > > index 35972afb6850..9a69bda7d62b 100644
> > > > --- a/kernel/bpf/syscall.c
> > > > +++ b/kernel/bpf/syscall.c
> > > > @@ -3329,6 +3329,9 @@ static int bpf_raw_tp_link_attach(struct bpf_prog *prog,
> > > >             return -EINVAL;
> > > >     }
> > > > +   if (bpf_check_tp_printk_denylist(tp_name, prog))
> > > > +           return -EACCES;
> > > > +
> > > >     btp = bpf_get_raw_tracepoint(tp_name);
> > > >     if (!btp)
> > > >             return -ENOENT;
> > > > diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> > > > index f07bec227fef..b662bc851e1c 100644
> > > > --- a/kernel/bpf/verifier.c
> > > > +++ b/kernel/bpf/verifier.c
> > > > @@ -7472,6 +7472,47 @@ static void update_loop_inline_state(struct bpf_verifier_env *env, u32 subprogno
> > > >                              state->callback_subprogno == subprogno);
> > > >   }
> > > > +int bpf_check_tp_printk_denylist(const char *name, struct bpf_prog *prog)
> > > > +{
> > > > +   static const char * const denylist[] = {
> > > > +           "contention_begin",
> > > > +           "bpf_trace_printk",
> > > > +   };
> > > > +   int i;
> > > > +
> > > > +   /* Do not allow attachment to denylist[] tracepoints,
> > > > +    * if the program calls some of the printk helpers,
> > > > +    * because there's possibility of deadlock.
> > > > +    */
> > >
> > > What if that prog doesn't but tail calls into another one which calls printk helpers?
> >
> > right, I'll deny that for all BPF_PROG_TYPE_RAW_TRACEPOINT* programs,
> > because I don't see easy way to check on that
> >
> > we can leave printk check for tracing BPF_TRACE_RAW_TP programs,
> > because verifier known the exact tracepoint already
> 
> This is all fragile and merely a stop gap.
> Doesn't sound that the issue is limited to bpf_trace_printk

hm, I don't have a better idea of how to fix that.. I can't deny
contention_begin completely, because we use it in perf via
tp_btf/contention_begin (perf lock contention) and I don't
think there's another way for perf to do that

fwiw the latest version below denies BPF_PROG_TYPE_RAW_TRACEPOINT
programs completely, and tracing BPF_TRACE_RAW_TP programs only
when they use printks
with selftest:
  https://git.kernel.org/pub/scm/linux/kernel/git/jolsa/perf.git/commit/?h=bpf/tp_deny_list&id=9a44d23187a699e6cd088d397f6801a1078361bc

we can add a global tracepoint deny list if we see other issues in the future

jirka


---
diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h
index 545152ac136c..7118c2fda59d 100644
--- a/include/linux/bpf_verifier.h
+++ b/include/linux/bpf_verifier.h
@@ -618,6 +618,8 @@ bool is_dynptr_type_expected(struct bpf_verifier_env *env,
 			     struct bpf_reg_state *reg,
 			     enum bpf_arg_type arg_type);
 
+int bpf_check_tp_printk_denylist(const char *name, struct bpf_prog *prog);
+
 /* this lives here instead of in bpf.h because it needs to dereference tgt_prog */
 static inline u64 bpf_trampoline_compute_key(const struct bpf_prog *tgt_prog,
 					     struct btf *btf, u32 btf_id)
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 35972afb6850..0ef1aaaf7a45 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -3324,6 +3324,9 @@ static int bpf_raw_tp_link_attach(struct bpf_prog *prog,
 			return -EFAULT;
 		buf[sizeof(buf) - 1] = 0;
 		tp_name = buf;
+
+		if (bpf_check_tp_printk_denylist(tp_name, prog))
+			return -EACCES;
 		break;
 	default:
 		return -EINVAL;
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 9528a066cfa5..847fdaa8a67b 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -7476,6 +7476,40 @@ static void update_loop_inline_state(struct bpf_verifier_env *env, u32 subprogno
 				 state->callback_subprogno == subprogno);
 }
 
+int bpf_check_tp_printk_denylist(const char *name, struct bpf_prog *prog)
+{
+	static const char * const denylist[] = {
+		"contention_begin",
+		"bpf_trace_printk",
+	};
+	int i;
+
+	for (i = 0; i < ARRAY_SIZE(denylist); i++) {
+		if (!strcmp(denylist[i], name))
+			return 1;
+	}
+	return 0;
+}
+
+static int check_tp_printk_denylist(struct bpf_verifier_env *env, int func_id)
+{
+	struct bpf_prog *prog = env->prog;
+
+	if (prog->type != BPF_PROG_TYPE_TRACING ||
+	    prog->expected_attach_type != BPF_TRACE_RAW_TP)
+		return 0;
+
+	if (WARN_ON_ONCE(!prog->aux->attach_func_name))
+		return -EINVAL;
+
+	if (!bpf_check_tp_printk_denylist(prog->aux->attach_func_name, prog))
+		return 0;
+
+	verbose(env, "Can't attach program with %s#%d helper to %s tracepoint.\n",
+		func_id_name(func_id), func_id, prog->aux->attach_func_name);
+	return -EACCES;
+}
+
 static int check_helper_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
 			     int *insn_idx_p)
 {
@@ -7679,6 +7713,10 @@ static int check_helper_call(struct bpf_verifier_env *env, struct bpf_insn *insn
 		err = __check_func_call(env, insn, insn_idx_p, meta.subprogno,
 					set_user_ringbuf_callback_state);
 		break;
+	case BPF_FUNC_trace_printk:
+	case BPF_FUNC_trace_vprintk:
+		err = check_tp_printk_denylist(env, func_id);
+		break;
 	}
 
 	if (err)

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* Re: [PATCH bpf-next] bpf: Restrict attachment of bpf program to some tracepoints
  2022-11-25  9:35       ` Jiri Olsa
@ 2022-11-30 23:29         ` Andrii Nakryiko
  2022-12-03 17:58           ` Namhyung Kim
  2022-12-04 21:44           ` Jiri Olsa
  0 siblings, 2 replies; 24+ messages in thread
From: Andrii Nakryiko @ 2022-11-30 23:29 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: Alexei Starovoitov, Daniel Borkmann, Alexei Starovoitov,
	Andrii Nakryiko, Hao Sun, bpf, Martin KaFai Lau, Song Liu,
	Yonghong Song, John Fastabend, KP Singh, Stanislav Fomichev,
	Hao Luo

On Fri, Nov 25, 2022 at 1:35 AM Jiri Olsa <olsajiri@gmail.com> wrote:
>
> On Thu, Nov 24, 2022 at 09:17:22AM -0800, Alexei Starovoitov wrote:
> > On Thu, Nov 24, 2022 at 1:42 AM Jiri Olsa <olsajiri@gmail.com> wrote:
> > >
> > > On Thu, Nov 24, 2022 at 01:41:23AM +0100, Daniel Borkmann wrote:
> > > > On 11/21/22 10:31 PM, Jiri Olsa wrote:
> > > > > We hit the following issues [1] [2] when we attach a bpf program that
> > > > > calls the bpf_trace_printk helper to the contention_begin tracepoint.
> > > > >
> > > > > As described in [3], multiple bpf programs that call the bpf_trace_printk
> > > > > helper attached to contention_begin might exhaust the printk buffer
> > > > > or cause a deadlock [2].
> > > > >
> > > > > There's also another possible deadlock when multiple bpf programs attach
> > > > > to the bpf_trace_printk tracepoint and call one of the printk bpf helpers.
> > > > >
> > > > > This change denies the attachment of a bpf program to the contention_begin
> > > > > and bpf_trace_printk tracepoints if the bpf program calls one of the
> > > > > printk bpf helpers.
> > > > >
> > > > > Also add a verifier check for tp_btf programs, so this can be caught
> > > > > at program load time with an error message like:
> > > > >
> > > > >    Can't attach program with bpf_trace_printk#6 helper to contention_begin tracepoint.
> > > > >
> > > > > [1] https://lore.kernel.org/bpf/CACkBjsakT_yWxnSWr4r-0TpPvbKm9-OBmVUhJb7hV3hY8fdCkw@mail.gmail.com/
> > > > > [2] https://lore.kernel.org/bpf/CACkBjsaCsTovQHFfkqJKto6S4Z8d02ud1D7MPESrHa1cVNNTrw@mail.gmail.com/
> > > > > [3] https://lore.kernel.org/bpf/Y2j6ivTwFmA0FtvY@krava/
> > > > >
> > > > > Reported-by: Hao Sun <sunhao.th@gmail.com>
> > > > > Suggested-by: Alexei Starovoitov <ast@kernel.org>
> > > > > Signed-off-by: Jiri Olsa <jolsa@kernel.org>
> > > > > ---
> > > > >   include/linux/bpf.h          |  1 +
> > > > >   include/linux/bpf_verifier.h |  2 ++
> > > > >   kernel/bpf/syscall.c         |  3 +++
> > > > >   kernel/bpf/verifier.c        | 46 ++++++++++++++++++++++++++++++++++++
> > > > >   4 files changed, 52 insertions(+)
> > > > >
> > > > > diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> > > > > index c9eafa67f2a2..3ccabede0f50 100644
> > > > > --- a/include/linux/bpf.h
> > > > > +++ b/include/linux/bpf.h
> > > > > @@ -1319,6 +1319,7 @@ struct bpf_prog {
> > > > >                             enforce_expected_attach_type:1, /* Enforce expected_attach_type checking at attach time */
> > > > >                             call_get_stack:1, /* Do we call bpf_get_stack() or bpf_get_stackid() */
> > > > >                             call_get_func_ip:1, /* Do we call get_func_ip() */
> > > > > +                           call_printk:1, /* Do we call trace_printk/trace_vprintk  */
> > > > >                             tstamp_type_access:1; /* Accessed __sk_buff->tstamp_type */
> > > > >     enum bpf_prog_type      type;           /* Type of BPF program */
> > > > >     enum bpf_attach_type    expected_attach_type; /* For some prog types */
> > > > > diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h
> > > > > index 545152ac136c..7118c2fda59d 100644
> > > > > --- a/include/linux/bpf_verifier.h
> > > > > +++ b/include/linux/bpf_verifier.h
> > > > > @@ -618,6 +618,8 @@ bool is_dynptr_type_expected(struct bpf_verifier_env *env,
> > > > >                          struct bpf_reg_state *reg,
> > > > >                          enum bpf_arg_type arg_type);
> > > > > +int bpf_check_tp_printk_denylist(const char *name, struct bpf_prog *prog);
> > > > > +
> > > > >   /* this lives here instead of in bpf.h because it needs to dereference tgt_prog */
> > > > >   static inline u64 bpf_trampoline_compute_key(const struct bpf_prog *tgt_prog,
> > > > >                                          struct btf *btf, u32 btf_id)
> > > > > diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
> > > > > index 35972afb6850..9a69bda7d62b 100644
> > > > > --- a/kernel/bpf/syscall.c
> > > > > +++ b/kernel/bpf/syscall.c
> > > > > @@ -3329,6 +3329,9 @@ static int bpf_raw_tp_link_attach(struct bpf_prog *prog,
> > > > >             return -EINVAL;
> > > > >     }
> > > > > +   if (bpf_check_tp_printk_denylist(tp_name, prog))
> > > > > +           return -EACCES;
> > > > > +
> > > > >     btp = bpf_get_raw_tracepoint(tp_name);
> > > > >     if (!btp)
> > > > >             return -ENOENT;
> > > > > diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> > > > > index f07bec227fef..b662bc851e1c 100644
> > > > > --- a/kernel/bpf/verifier.c
> > > > > +++ b/kernel/bpf/verifier.c
> > > > > @@ -7472,6 +7472,47 @@ static void update_loop_inline_state(struct bpf_verifier_env *env, u32 subprogno
> > > > >                              state->callback_subprogno == subprogno);
> > > > >   }
> > > > > +int bpf_check_tp_printk_denylist(const char *name, struct bpf_prog *prog)
> > > > > +{
> > > > > +   static const char * const denylist[] = {
> > > > > +           "contention_begin",
> > > > > +           "bpf_trace_printk",
> > > > > +   };
> > > > > +   int i;
> > > > > +
> > > > > +   /* Do not allow attachment to denylist[] tracepoints,
> > > > > +    * if the program calls some of the printk helpers,
> > > > > +    * because there's possibility of deadlock.
> > > > > +    */
> > > >
> > > > What if that prog doesn't but tail calls into another one which calls printk helpers?
> > >
> > > right, I'll deny that for all BPF_PROG_TYPE_RAW_TRACEPOINT* programs,
> > > because I don't see an easy way to check for that
> > >
> > > we can leave the printk check for tracing BPF_TRACE_RAW_TP programs,
> > > because the verifier already knows the exact tracepoint
> >
> > This is all fragile and merely a stop gap.
> > Doesn't sound that the issue is limited to bpf_trace_printk
>
> hm, I don't have a better idea how to fix that.. I can't deny
> contention_begin completely, because we use it in perf via
> tp_btf/contention_begin (perf lock contention) and I don't
> think there's another way for perf to do that
>
> fwiw the last version below denies BPF_PROG_TYPE_RAW_TRACEPOINT
> programs completely and tracing BPF_TRACE_RAW_TP with printks
>

I think disabling the bpf_trace_printk() tracepoint for any BPF program
is totally fine. This tracepoint was never intended to be attached to.

But as for the general bpf_trace_printk() deadlocking: should we
discuss how to make it not deadlock instead of starting to denylist
things left and right?

Do I understand that we take trace_printk_lock only to protect that
static char buf[]? Can we just make this buf per-CPU and do a trylock
instead? We'll only fail to bpf_trace_printk() something if we have
nested BPF programs (rare) or NMI (also rare).

And it's a printk(), it's never mission-critical, so if we drop some
message in rare case it's totally fine.


> with selftest:
>   https://git.kernel.org/pub/scm/linux/kernel/git/jolsa/perf.git/commit/?h=bpf/tp_deny_list&id=9a44d23187a699e6cd088d397f6801a1078361bc
>
> we can add global tracepoint deny list if we see other issues in future
>
> jirka
>
>
> ---
> diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h
> index 545152ac136c..7118c2fda59d 100644
> --- a/include/linux/bpf_verifier.h
> +++ b/include/linux/bpf_verifier.h
> @@ -618,6 +618,8 @@ bool is_dynptr_type_expected(struct bpf_verifier_env *env,
>                              struct bpf_reg_state *reg,
>                              enum bpf_arg_type arg_type);
>
> +int bpf_check_tp_printk_denylist(const char *name, struct bpf_prog *prog);
> +
>  /* this lives here instead of in bpf.h because it needs to dereference tgt_prog */
>  static inline u64 bpf_trampoline_compute_key(const struct bpf_prog *tgt_prog,
>                                              struct btf *btf, u32 btf_id)
> diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
> index 35972afb6850..0ef1aaaf7a45 100644
> --- a/kernel/bpf/syscall.c
> +++ b/kernel/bpf/syscall.c
> @@ -3324,6 +3324,9 @@ static int bpf_raw_tp_link_attach(struct bpf_prog *prog,
>                         return -EFAULT;
>                 buf[sizeof(buf) - 1] = 0;
>                 tp_name = buf;
> +
> +               if (bpf_check_tp_printk_denylist(tp_name, prog))
> +                       return -EACCES;
>                 break;
>         default:
>                 return -EINVAL;
> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> index 9528a066cfa5..847fdaa8a67b 100644
> --- a/kernel/bpf/verifier.c
> +++ b/kernel/bpf/verifier.c
> @@ -7476,6 +7476,40 @@ static void update_loop_inline_state(struct bpf_verifier_env *env, u32 subprogno
>                                  state->callback_subprogno == subprogno);
>  }
>
> +int bpf_check_tp_printk_denylist(const char *name, struct bpf_prog *prog)
> +{
> +       static const char * const denylist[] = {
> +               "contention_begin",
> +               "bpf_trace_printk",
> +       };
> +       int i;
> +
> +       for (i = 0; i < ARRAY_SIZE(denylist); i++) {
> +               if (!strcmp(denylist[i], name))
> +                       return 1;
> +       }
> +       return 0;
> +}
> +
> +static int check_tp_printk_denylist(struct bpf_verifier_env *env, int func_id)
> +{
> +       struct bpf_prog *prog = env->prog;
> +
> +       if (prog->type != BPF_PROG_TYPE_TRACING ||
> +           prog->expected_attach_type != BPF_TRACE_RAW_TP)
> +               return 0;
> +
> +       if (WARN_ON_ONCE(!prog->aux->attach_func_name))
> +               return -EINVAL;
> +
> +       if (!bpf_check_tp_printk_denylist(prog->aux->attach_func_name, prog))
> +               return 0;
> +
> +       verbose(env, "Can't attach program with %s#%d helper to %s tracepoint.\n",
> +               func_id_name(func_id), func_id, prog->aux->attach_func_name);
> +       return -EACCES;
> +}
> +
>  static int check_helper_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
>                              int *insn_idx_p)
>  {
> @@ -7679,6 +7713,10 @@ static int check_helper_call(struct bpf_verifier_env *env, struct bpf_insn *insn
>                 err = __check_func_call(env, insn, insn_idx_p, meta.subprogno,
>                                         set_user_ringbuf_callback_state);
>                 break;
> +       case BPF_FUNC_trace_printk:
> +       case BPF_FUNC_trace_vprintk:
> +               err = check_tp_printk_denylist(env, func_id);
> +               break;
>         }
>
>         if (err)


* Re: [PATCH bpf-next] bpf: Restrict attachment of bpf program to some tracepoints
  2022-11-24 17:17     ` Alexei Starovoitov
  2022-11-25  9:35       ` Jiri Olsa
@ 2022-12-03 17:42       ` Namhyung Kim
  1 sibling, 0 replies; 24+ messages in thread
From: Namhyung Kim @ 2022-12-03 17:42 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Jiri Olsa, Daniel Borkmann, Alexei Starovoitov, Andrii Nakryiko,
	Hao Sun, bpf, Martin KaFai Lau, Song Liu, Yonghong Song,
	John Fastabend, KP Singh, Stanislav Fomichev, Hao Luo

On Thu, Nov 24, 2022 at 09:17:22AM -0800, Alexei Starovoitov wrote:
> On Thu, Nov 24, 2022 at 1:42 AM Jiri Olsa <olsajiri@gmail.com> wrote:
> >
> > On Thu, Nov 24, 2022 at 01:41:23AM +0100, Daniel Borkmann wrote:
> > > On 11/21/22 10:31 PM, Jiri Olsa wrote:
> > > > We hit the following issues [1] [2] when we attach a bpf program that
> > > > calls the bpf_trace_printk helper to the contention_begin tracepoint.
> > > >
> > > > As described in [3], multiple bpf programs that call the bpf_trace_printk
> > > > helper attached to contention_begin might exhaust the printk buffer
> > > > or cause a deadlock [2].
> > > >
> > > > There's also another possible deadlock when multiple bpf programs attach
> > > > to the bpf_trace_printk tracepoint and call one of the printk bpf helpers.
> > > >
> > > > This change denies the attachment of a bpf program to the contention_begin
> > > > and bpf_trace_printk tracepoints if the bpf program calls one of the
> > > > printk bpf helpers.
> > > >
> > > > Also add a verifier check for tp_btf programs, so this can be caught
> > > > at program load time with an error message like:
> > > >
> > > >    Can't attach program with bpf_trace_printk#6 helper to contention_begin tracepoint.
> > > >
> > > > [1] https://lore.kernel.org/bpf/CACkBjsakT_yWxnSWr4r-0TpPvbKm9-OBmVUhJb7hV3hY8fdCkw@mail.gmail.com/
> > > > [2] https://lore.kernel.org/bpf/CACkBjsaCsTovQHFfkqJKto6S4Z8d02ud1D7MPESrHa1cVNNTrw@mail.gmail.com/
> > > > [3] https://lore.kernel.org/bpf/Y2j6ivTwFmA0FtvY@krava/
> > > >
> > > > Reported-by: Hao Sun <sunhao.th@gmail.com>
> > > > Suggested-by: Alexei Starovoitov <ast@kernel.org>
> > > > Signed-off-by: Jiri Olsa <jolsa@kernel.org>
> > > > ---
> > > >   include/linux/bpf.h          |  1 +
> > > >   include/linux/bpf_verifier.h |  2 ++
> > > >   kernel/bpf/syscall.c         |  3 +++
> > > >   kernel/bpf/verifier.c        | 46 ++++++++++++++++++++++++++++++++++++
> > > >   4 files changed, 52 insertions(+)
> > > >
> > > > diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> > > > index c9eafa67f2a2..3ccabede0f50 100644
> > > > --- a/include/linux/bpf.h
> > > > +++ b/include/linux/bpf.h
> > > > @@ -1319,6 +1319,7 @@ struct bpf_prog {
> > > >                             enforce_expected_attach_type:1, /* Enforce expected_attach_type checking at attach time */
> > > >                             call_get_stack:1, /* Do we call bpf_get_stack() or bpf_get_stackid() */
> > > >                             call_get_func_ip:1, /* Do we call get_func_ip() */
> > > > +                           call_printk:1, /* Do we call trace_printk/trace_vprintk  */
> > > >                             tstamp_type_access:1; /* Accessed __sk_buff->tstamp_type */
> > > >     enum bpf_prog_type      type;           /* Type of BPF program */
> > > >     enum bpf_attach_type    expected_attach_type; /* For some prog types */
> > > > diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h
> > > > index 545152ac136c..7118c2fda59d 100644
> > > > --- a/include/linux/bpf_verifier.h
> > > > +++ b/include/linux/bpf_verifier.h
> > > > @@ -618,6 +618,8 @@ bool is_dynptr_type_expected(struct bpf_verifier_env *env,
> > > >                          struct bpf_reg_state *reg,
> > > >                          enum bpf_arg_type arg_type);
> > > > +int bpf_check_tp_printk_denylist(const char *name, struct bpf_prog *prog);
> > > > +
> > > >   /* this lives here instead of in bpf.h because it needs to dereference tgt_prog */
> > > >   static inline u64 bpf_trampoline_compute_key(const struct bpf_prog *tgt_prog,
> > > >                                          struct btf *btf, u32 btf_id)
> > > > diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
> > > > index 35972afb6850..9a69bda7d62b 100644
> > > > --- a/kernel/bpf/syscall.c
> > > > +++ b/kernel/bpf/syscall.c
> > > > @@ -3329,6 +3329,9 @@ static int bpf_raw_tp_link_attach(struct bpf_prog *prog,
> > > >             return -EINVAL;
> > > >     }
> > > > +   if (bpf_check_tp_printk_denylist(tp_name, prog))
> > > > +           return -EACCES;
> > > > +
> > > >     btp = bpf_get_raw_tracepoint(tp_name);
> > > >     if (!btp)
> > > >             return -ENOENT;
> > > > diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> > > > index f07bec227fef..b662bc851e1c 100644
> > > > --- a/kernel/bpf/verifier.c
> > > > +++ b/kernel/bpf/verifier.c
> > > > @@ -7472,6 +7472,47 @@ static void update_loop_inline_state(struct bpf_verifier_env *env, u32 subprogno
> > > >                              state->callback_subprogno == subprogno);
> > > >   }
> > > > +int bpf_check_tp_printk_denylist(const char *name, struct bpf_prog *prog)
> > > > +{
> > > > +   static const char * const denylist[] = {
> > > > +           "contention_begin",
> > > > +           "bpf_trace_printk",
> > > > +   };
> > > > +   int i;
> > > > +
> > > > +   /* Do not allow attachment to denylist[] tracepoints,
> > > > +    * if the program calls some of the printk helpers,
> > > > +    * because there's possibility of deadlock.
> > > > +    */
> > >
> > > What if that prog doesn't but tail calls into another one which calls printk helpers?
> >
> > right, I'll deny that for all BPF_PROG_TYPE_RAW_TRACEPOINT* programs,
> > because I don't see an easy way to check for that
> >
> > we can leave the printk check for tracing BPF_TRACE_RAW_TP programs,
> > because the verifier already knows the exact tracepoint
> 
> This is all fragile and merely a stop gap.
> Doesn't sound that the issue is limited to bpf_trace_printk

Right, contention_begin has had problems with memory allocators too
(via task_local_storage) and potentially any code that grabs a lock.

Thanks,
Namhyung


* Re: [PATCH bpf-next] bpf: Restrict attachment of bpf program to some tracepoints
  2022-11-30 23:29         ` Andrii Nakryiko
@ 2022-12-03 17:58           ` Namhyung Kim
  2022-12-05 12:28             ` Jiri Olsa
  2022-12-04 21:44           ` Jiri Olsa
  1 sibling, 1 reply; 24+ messages in thread
From: Namhyung Kim @ 2022-12-03 17:58 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: Jiri Olsa, Alexei Starovoitov, Daniel Borkmann,
	Alexei Starovoitov, Andrii Nakryiko, Hao Sun, bpf,
	Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend,
	KP Singh, Stanislav Fomichev, Hao Luo

On Wed, Nov 30, 2022 at 03:29:39PM -0800, Andrii Nakryiko wrote:
> On Fri, Nov 25, 2022 at 1:35 AM Jiri Olsa <olsajiri@gmail.com> wrote:
> >
> > On Thu, Nov 24, 2022 at 09:17:22AM -0800, Alexei Starovoitov wrote:
> > > On Thu, Nov 24, 2022 at 1:42 AM Jiri Olsa <olsajiri@gmail.com> wrote:
> > > >
> > > > On Thu, Nov 24, 2022 at 01:41:23AM +0100, Daniel Borkmann wrote:
> > > > > On 11/21/22 10:31 PM, Jiri Olsa wrote:
> > > > > > We hit the following issues [1] [2] when we attach a bpf program that
> > > > > > calls the bpf_trace_printk helper to the contention_begin tracepoint.
> > > > > >
> > > > > > As described in [3], multiple bpf programs that call the bpf_trace_printk
> > > > > > helper attached to contention_begin might exhaust the printk buffer
> > > > > > or cause a deadlock [2].
> > > > > >
> > > > > > There's also another possible deadlock when multiple bpf programs attach
> > > > > > to the bpf_trace_printk tracepoint and call one of the printk bpf helpers.
> > > > > >
> > > > > > This change denies the attachment of a bpf program to the contention_begin
> > > > > > and bpf_trace_printk tracepoints if the bpf program calls one of the
> > > > > > printk bpf helpers.
> > > > > >
> > > > > > Also add a verifier check for tp_btf programs, so this can be caught
> > > > > > at program load time with an error message like:
> > > > > >
> > > > > >    Can't attach program with bpf_trace_printk#6 helper to contention_begin tracepoint.
> > > > > >
> > > > > > [1] https://lore.kernel.org/bpf/CACkBjsakT_yWxnSWr4r-0TpPvbKm9-OBmVUhJb7hV3hY8fdCkw@mail.gmail.com/
> > > > > > [2] https://lore.kernel.org/bpf/CACkBjsaCsTovQHFfkqJKto6S4Z8d02ud1D7MPESrHa1cVNNTrw@mail.gmail.com/
> > > > > > [3] https://lore.kernel.org/bpf/Y2j6ivTwFmA0FtvY@krava/
> > > > > >
> > > > > > Reported-by: Hao Sun <sunhao.th@gmail.com>
> > > > > > Suggested-by: Alexei Starovoitov <ast@kernel.org>
> > > > > > Signed-off-by: Jiri Olsa <jolsa@kernel.org>
> > > > > > ---
> > > > > >   include/linux/bpf.h          |  1 +
> > > > > >   include/linux/bpf_verifier.h |  2 ++
> > > > > >   kernel/bpf/syscall.c         |  3 +++
> > > > > >   kernel/bpf/verifier.c        | 46 ++++++++++++++++++++++++++++++++++++
> > > > > >   4 files changed, 52 insertions(+)
> > > > > >
> > > > > > diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> > > > > > index c9eafa67f2a2..3ccabede0f50 100644
> > > > > > --- a/include/linux/bpf.h
> > > > > > +++ b/include/linux/bpf.h
> > > > > > @@ -1319,6 +1319,7 @@ struct bpf_prog {
> > > > > >                             enforce_expected_attach_type:1, /* Enforce expected_attach_type checking at attach time */
> > > > > >                             call_get_stack:1, /* Do we call bpf_get_stack() or bpf_get_stackid() */
> > > > > >                             call_get_func_ip:1, /* Do we call get_func_ip() */
> > > > > > +                           call_printk:1, /* Do we call trace_printk/trace_vprintk  */
> > > > > >                             tstamp_type_access:1; /* Accessed __sk_buff->tstamp_type */
> > > > > >     enum bpf_prog_type      type;           /* Type of BPF program */
> > > > > >     enum bpf_attach_type    expected_attach_type; /* For some prog types */
> > > > > > diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h
> > > > > > index 545152ac136c..7118c2fda59d 100644
> > > > > > --- a/include/linux/bpf_verifier.h
> > > > > > +++ b/include/linux/bpf_verifier.h
> > > > > > @@ -618,6 +618,8 @@ bool is_dynptr_type_expected(struct bpf_verifier_env *env,
> > > > > >                          struct bpf_reg_state *reg,
> > > > > >                          enum bpf_arg_type arg_type);
> > > > > > +int bpf_check_tp_printk_denylist(const char *name, struct bpf_prog *prog);
> > > > > > +
> > > > > >   /* this lives here instead of in bpf.h because it needs to dereference tgt_prog */
> > > > > >   static inline u64 bpf_trampoline_compute_key(const struct bpf_prog *tgt_prog,
> > > > > >                                          struct btf *btf, u32 btf_id)
> > > > > > diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
> > > > > > index 35972afb6850..9a69bda7d62b 100644
> > > > > > --- a/kernel/bpf/syscall.c
> > > > > > +++ b/kernel/bpf/syscall.c
> > > > > > @@ -3329,6 +3329,9 @@ static int bpf_raw_tp_link_attach(struct bpf_prog *prog,
> > > > > >             return -EINVAL;
> > > > > >     }
> > > > > > +   if (bpf_check_tp_printk_denylist(tp_name, prog))
> > > > > > +           return -EACCES;
> > > > > > +
> > > > > >     btp = bpf_get_raw_tracepoint(tp_name);
> > > > > >     if (!btp)
> > > > > >             return -ENOENT;
> > > > > > diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> > > > > > index f07bec227fef..b662bc851e1c 100644
> > > > > > --- a/kernel/bpf/verifier.c
> > > > > > +++ b/kernel/bpf/verifier.c
> > > > > > @@ -7472,6 +7472,47 @@ static void update_loop_inline_state(struct bpf_verifier_env *env, u32 subprogno
> > > > > >                              state->callback_subprogno == subprogno);
> > > > > >   }
> > > > > > +int bpf_check_tp_printk_denylist(const char *name, struct bpf_prog *prog)
> > > > > > +{
> > > > > > +   static const char * const denylist[] = {
> > > > > > +           "contention_begin",
> > > > > > +           "bpf_trace_printk",
> > > > > > +   };
> > > > > > +   int i;
> > > > > > +
> > > > > > +   /* Do not allow attachment to denylist[] tracepoints,
> > > > > > +    * if the program calls some of the printk helpers,
> > > > > > +    * because there's possibility of deadlock.
> > > > > > +    */
> > > > >
> > > > > What if that prog doesn't but tail calls into another one which calls printk helpers?
> > > >
> > > > right, I'll deny that for all BPF_PROG_TYPE_RAW_TRACEPOINT* programs,
> > > > because I don't see an easy way to check for that
> > > >
> > > > we can leave the printk check for tracing BPF_TRACE_RAW_TP programs,
> > > > because the verifier already knows the exact tracepoint
> > >
> > > This is all fragile and merely a stop gap.
> > > Doesn't sound that the issue is limited to bpf_trace_printk
> >
> > hm, I don't have a better idea how to fix that.. I can't deny
> > contention_begin completely, because we use it in perf via
> > tp_btf/contention_begin (perf lock contention) and I don't
> > think there's another way for perf to do that
> >
> > fwiw the last version below denies BPF_PROG_TYPE_RAW_TRACEPOINT
> > programs completely and tracing BPF_TRACE_RAW_TP with printks
> >
> 
> I think disabling the bpf_trace_printk() tracepoint for any BPF program
> is totally fine. This tracepoint was never intended to be attached to.
>
> But as for the general bpf_trace_printk() deadlocking: should we
> discuss how to make it not deadlock instead of starting to denylist
> things left and right?
> 
> Do I understand that we take trace_printk_lock only to protect that
> static char buf[]? Can we just make this buf per-CPU and do a trylock
> instead? We'll only fail to bpf_trace_printk() something if we have
> nested BPF programs (rare) or NMI (also rare).
> 
> And it's a printk(), it's never mission-critical, so if we drop some
> message in rare case it's totally fine.

What about contention_begin?  I wonder if we can disallow recursion
for the tracepoints in the deny list, e.g. by using bpf_prog_active..

Thanks,
Namhyung



* Re: [PATCH bpf-next] bpf: Restrict attachment of bpf program to some tracepoints
  2022-11-30 23:29         ` Andrii Nakryiko
  2022-12-03 17:58           ` Namhyung Kim
@ 2022-12-04 21:44           ` Jiri Olsa
  2022-12-07 13:39             ` Jiri Olsa
  1 sibling, 1 reply; 24+ messages in thread
From: Jiri Olsa @ 2022-12-04 21:44 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: Jiri Olsa, Alexei Starovoitov, Daniel Borkmann,
	Alexei Starovoitov, Andrii Nakryiko, Hao Sun, bpf,
	Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend,
	KP Singh, Stanislav Fomichev, Hao Luo

On Wed, Nov 30, 2022 at 03:29:39PM -0800, Andrii Nakryiko wrote:
> On Fri, Nov 25, 2022 at 1:35 AM Jiri Olsa <olsajiri@gmail.com> wrote:
> >
> > On Thu, Nov 24, 2022 at 09:17:22AM -0800, Alexei Starovoitov wrote:
> > > On Thu, Nov 24, 2022 at 1:42 AM Jiri Olsa <olsajiri@gmail.com> wrote:
> > > >
> > > > On Thu, Nov 24, 2022 at 01:41:23AM +0100, Daniel Borkmann wrote:
> > > > > On 11/21/22 10:31 PM, Jiri Olsa wrote:
> > > > > > We hit the following issues [1] [2] when we attach a bpf program that
> > > > > > calls the bpf_trace_printk helper to the contention_begin tracepoint.
> > > > > >
> > > > > > As described in [3], multiple bpf programs that call the bpf_trace_printk
> > > > > > helper attached to contention_begin might exhaust the printk buffer
> > > > > > or cause a deadlock [2].
> > > > > >
> > > > > > There's also another possible deadlock when multiple bpf programs attach
> > > > > > to the bpf_trace_printk tracepoint and call one of the printk bpf helpers.
> > > > > >
> > > > > > This change denies the attachment of a bpf program to the contention_begin
> > > > > > and bpf_trace_printk tracepoints if the bpf program calls one of the
> > > > > > printk bpf helpers.
> > > > > >
> > > > > > Also add a verifier check for tp_btf programs, so this can be caught
> > > > > > at program load time with an error message like:
> > > > > >
> > > > > >    Can't attach program with bpf_trace_printk#6 helper to contention_begin tracepoint.
> > > > > >
> > > > > > [1] https://lore.kernel.org/bpf/CACkBjsakT_yWxnSWr4r-0TpPvbKm9-OBmVUhJb7hV3hY8fdCkw@mail.gmail.com/
> > > > > > [2] https://lore.kernel.org/bpf/CACkBjsaCsTovQHFfkqJKto6S4Z8d02ud1D7MPESrHa1cVNNTrw@mail.gmail.com/
> > > > > > [3] https://lore.kernel.org/bpf/Y2j6ivTwFmA0FtvY@krava/
> > > > > >
> > > > > > Reported-by: Hao Sun <sunhao.th@gmail.com>
> > > > > > Suggested-by: Alexei Starovoitov <ast@kernel.org>
> > > > > > Signed-off-by: Jiri Olsa <jolsa@kernel.org>
> > > > > > ---
> > > > > >   include/linux/bpf.h          |  1 +
> > > > > >   include/linux/bpf_verifier.h |  2 ++
> > > > > >   kernel/bpf/syscall.c         |  3 +++
> > > > > >   kernel/bpf/verifier.c        | 46 ++++++++++++++++++++++++++++++++++++
> > > > > >   4 files changed, 52 insertions(+)
> > > > > >
> > > > > > diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> > > > > > index c9eafa67f2a2..3ccabede0f50 100644
> > > > > > --- a/include/linux/bpf.h
> > > > > > +++ b/include/linux/bpf.h
> > > > > > @@ -1319,6 +1319,7 @@ struct bpf_prog {
> > > > > >                             enforce_expected_attach_type:1, /* Enforce expected_attach_type checking at attach time */
> > > > > >                             call_get_stack:1, /* Do we call bpf_get_stack() or bpf_get_stackid() */
> > > > > >                             call_get_func_ip:1, /* Do we call get_func_ip() */
> > > > > > +                           call_printk:1, /* Do we call trace_printk/trace_vprintk  */
> > > > > >                             tstamp_type_access:1; /* Accessed __sk_buff->tstamp_type */
> > > > > >     enum bpf_prog_type      type;           /* Type of BPF program */
> > > > > >     enum bpf_attach_type    expected_attach_type; /* For some prog types */
> > > > > > diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h
> > > > > > index 545152ac136c..7118c2fda59d 100644
> > > > > > --- a/include/linux/bpf_verifier.h
> > > > > > +++ b/include/linux/bpf_verifier.h
> > > > > > @@ -618,6 +618,8 @@ bool is_dynptr_type_expected(struct bpf_verifier_env *env,
> > > > > >                          struct bpf_reg_state *reg,
> > > > > >                          enum bpf_arg_type arg_type);
> > > > > > +int bpf_check_tp_printk_denylist(const char *name, struct bpf_prog *prog);
> > > > > > +
> > > > > >   /* this lives here instead of in bpf.h because it needs to dereference tgt_prog */
> > > > > >   static inline u64 bpf_trampoline_compute_key(const struct bpf_prog *tgt_prog,
> > > > > >                                          struct btf *btf, u32 btf_id)
> > > > > > diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
> > > > > > index 35972afb6850..9a69bda7d62b 100644
> > > > > > --- a/kernel/bpf/syscall.c
> > > > > > +++ b/kernel/bpf/syscall.c
> > > > > > @@ -3329,6 +3329,9 @@ static int bpf_raw_tp_link_attach(struct bpf_prog *prog,
> > > > > >             return -EINVAL;
> > > > > >     }
> > > > > > +   if (bpf_check_tp_printk_denylist(tp_name, prog))
> > > > > > +           return -EACCES;
> > > > > > +
> > > > > >     btp = bpf_get_raw_tracepoint(tp_name);
> > > > > >     if (!btp)
> > > > > >             return -ENOENT;
> > > > > > diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> > > > > > index f07bec227fef..b662bc851e1c 100644
> > > > > > --- a/kernel/bpf/verifier.c
> > > > > > +++ b/kernel/bpf/verifier.c
> > > > > > @@ -7472,6 +7472,47 @@ static void update_loop_inline_state(struct bpf_verifier_env *env, u32 subprogno
> > > > > >                              state->callback_subprogno == subprogno);
> > > > > >   }
> > > > > > +int bpf_check_tp_printk_denylist(const char *name, struct bpf_prog *prog)
> > > > > > +{
> > > > > > +   static const char * const denylist[] = {
> > > > > > +           "contention_begin",
> > > > > > +           "bpf_trace_printk",
> > > > > > +   };
> > > > > > +   int i;
> > > > > > +
> > > > > > +   /* Do not allow attachment to denylist[] tracepoints,
> > > > > > +    * if the program calls some of the printk helpers,
> > > > > > +    * because there's possibility of deadlock.
> > > > > > +    */
> > > > >
> > > > > What if that prog doesn't but tail calls into another one which calls printk helpers?
> > > >
> > > > right, I'll deny that for all BPF_PROG_TYPE_RAW_TRACEPOINT* programs,
> > > > because I don't see an easy way to check for that
> > > >
> > > > we can leave the printk check for tracing BPF_TRACE_RAW_TP programs,
> > > > because the verifier already knows the exact tracepoint
> > >
> > > This is all fragile and merely a stopgap.
> > > It doesn't sound like the issue is limited to bpf_trace_printk
> >
> > hm, I don't have a better idea how to fix that.. I can't deny
> > contention_begin completely, because we use it in perf via
> > tp_btf/contention_begin (perf lock contention) and I don't
> > think there's another way for perf to do that
> >
> > fwiw the last version below denies BPF_PROG_TYPE_RAW_TRACEPOINT
> > programs completely and tracing BPF_TRACE_RAW_TP with printks
> >
> 
> I think disabling bpf_trace_printk() tracepoint for any BPF program is
> totally fine. This tracepoint was never intended to be attached to.
> 
> But as for the general bpf_trace_printk() deadlocking. Should we
> discuss how to make it not deadlock instead of starting to denylist
> things left and right?
> 
> Do I understand that we take trace_printk_lock only to protect that
> static char buf[]? Can we just make this buf per-CPU and do a trylock
> instead? We'll only fail to bpf_trace_printk() something if we have
> nested BPF programs (rare) or NMI (also rare).

ugh, sorry I overlooked your reply :-\

sounds good.. if it'd be acceptable to use a trylock, we'd avoid
triggering the contention_begin tracepoint at all, which was what
caused the deadlock

jirka

> 
> And it's a printk(), it's never mission-critical, so if we drop some
> message in rare case it's totally fine.
> 
> 
> > with selftest:
> >   https://git.kernel.org/pub/scm/linux/kernel/git/jolsa/perf.git/commit/?h=bpf/tp_deny_list&id=9a44d23187a699e6cd088d397f6801a1078361bc
> >
> > we can add a global tracepoint deny list if we see other issues in the future
> >
> > jirka
> >
> >
> > ---
> > diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h
> > index 545152ac136c..7118c2fda59d 100644
> > --- a/include/linux/bpf_verifier.h
> > +++ b/include/linux/bpf_verifier.h
> > @@ -618,6 +618,8 @@ bool is_dynptr_type_expected(struct bpf_verifier_env *env,
> >                              struct bpf_reg_state *reg,
> >                              enum bpf_arg_type arg_type);
> >
> > +int bpf_check_tp_printk_denylist(const char *name, struct bpf_prog *prog);
> > +
> >  /* this lives here instead of in bpf.h because it needs to dereference tgt_prog */
> >  static inline u64 bpf_trampoline_compute_key(const struct bpf_prog *tgt_prog,
> >                                              struct btf *btf, u32 btf_id)
> > diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
> > index 35972afb6850..0ef1aaaf7a45 100644
> > --- a/kernel/bpf/syscall.c
> > +++ b/kernel/bpf/syscall.c
> > @@ -3324,6 +3324,9 @@ static int bpf_raw_tp_link_attach(struct bpf_prog *prog,
> >                         return -EFAULT;
> >                 buf[sizeof(buf) - 1] = 0;
> >                 tp_name = buf;
> > +
> > +               if (bpf_check_tp_printk_denylist(tp_name, prog))
> > +                       return -EACCES;
> >                 break;
> >         default:
> >                 return -EINVAL;
> > diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> > index 9528a066cfa5..847fdaa8a67b 100644
> > --- a/kernel/bpf/verifier.c
> > +++ b/kernel/bpf/verifier.c
> > @@ -7476,6 +7476,40 @@ static void update_loop_inline_state(struct bpf_verifier_env *env, u32 subprogno
> >                                  state->callback_subprogno == subprogno);
> >  }
> >
> > +int bpf_check_tp_printk_denylist(const char *name, struct bpf_prog *prog)
> > +{
> > +       static const char * const denylist[] = {
> > +               "contention_begin",
> > +               "bpf_trace_printk",
> > +       };
> > +       int i;
> > +
> > +       for (i = 0; i < ARRAY_SIZE(denylist); i++) {
> > +               if (!strcmp(denylist[i], name))
> > +                       return 1;
> > +       }
> > +       return 0;
> > +}
> > +
> > +static int check_tp_printk_denylist(struct bpf_verifier_env *env, int func_id)
> > +{
> > +       struct bpf_prog *prog = env->prog;
> > +
> > +       if (prog->type != BPF_PROG_TYPE_TRACING ||
> > +           prog->expected_attach_type != BPF_TRACE_RAW_TP)
> > +               return 0;
> > +
> > +       if (WARN_ON_ONCE(!prog->aux->attach_func_name))
> > +               return -EINVAL;
> > +
> > +       if (!bpf_check_tp_printk_denylist(prog->aux->attach_func_name, prog))
> > +               return 0;
> > +
> > +       verbose(env, "Can't attach program with %s#%d helper to %s tracepoint.\n",
> > +               func_id_name(func_id), func_id, prog->aux->attach_func_name);
> > +       return -EACCES;
> > +}
> > +
> >  static int check_helper_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
> >                              int *insn_idx_p)
> >  {
> > @@ -7679,6 +7713,10 @@ static int check_helper_call(struct bpf_verifier_env *env, struct bpf_insn *insn
> >                 err = __check_func_call(env, insn, insn_idx_p, meta.subprogno,
> >                                         set_user_ringbuf_callback_state);
> >                 break;
> > +       case BPF_FUNC_trace_printk:
> > +       case BPF_FUNC_trace_vprintk:
> > +               err = check_tp_printk_denylist(env, func_id);
> > +               break;
> >         }
> >
> >         if (err)

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH bpf-next] bpf: Restrict attachment of bpf program to some tracepoints
  2022-12-03 17:58           ` Namhyung Kim
@ 2022-12-05 12:28             ` Jiri Olsa
  2022-12-06  4:00               ` Namhyung Kim
  2022-12-06 20:09               ` Alexei Starovoitov
  0 siblings, 2 replies; 24+ messages in thread
From: Jiri Olsa @ 2022-12-05 12:28 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Andrii Nakryiko, Jiri Olsa, Alexei Starovoitov, Daniel Borkmann,
	Alexei Starovoitov, Andrii Nakryiko, Hao Sun, bpf,
	Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend,
	KP Singh, Stanislav Fomichev, Hao Luo

On Sat, Dec 03, 2022 at 09:58:34AM -0800, Namhyung Kim wrote:
> On Wed, Nov 30, 2022 at 03:29:39PM -0800, Andrii Nakryiko wrote:
> > On Fri, Nov 25, 2022 at 1:35 AM Jiri Olsa <olsajiri@gmail.com> wrote:
> > >
> > > On Thu, Nov 24, 2022 at 09:17:22AM -0800, Alexei Starovoitov wrote:
> > > > On Thu, Nov 24, 2022 at 1:42 AM Jiri Olsa <olsajiri@gmail.com> wrote:
> > > > >
> > > > > On Thu, Nov 24, 2022 at 01:41:23AM +0100, Daniel Borkmann wrote:
> > > > > > On 11/21/22 10:31 PM, Jiri Olsa wrote:
> > > > > > > We hit the following issues [1] [2] when we attach a bpf program that
> > > > > > > calls the bpf_trace_printk helper to the contention_begin tracepoint.
> > > > > > >
> > > > > > > As described in [3], multiple bpf programs that call the bpf_trace_printk
> > > > > > > helper attached to contention_begin might exhaust the printk buffer or
> > > > > > > cause a deadlock [2].
> > > > > > >
> > > > > > > There's also another possible deadlock when multiple bpf programs attach
> > > > > > > to the bpf_trace_printk tracepoint and call one of the printk bpf helpers.
> > > > > > >
> > > > > > > This change denies attachment of a bpf program to the contention_begin
> > > > > > > and bpf_trace_printk tracepoints if the program calls one of the
> > > > > > > printk bpf helpers.
> > > > > > >
> > > > > > > Also adding a verifier check for tp_btf programs, so this can be caught
> > > > > > > at program load time with an error message like:
> > > > > > >
> > > > > > >    Can't attach program with bpf_trace_printk#6 helper to contention_begin tracepoint.
> > > > > > >
> > > > > > > [1] https://lore.kernel.org/bpf/CACkBjsakT_yWxnSWr4r-0TpPvbKm9-OBmVUhJb7hV3hY8fdCkw@mail.gmail.com/
> > > > > > > [2] https://lore.kernel.org/bpf/CACkBjsaCsTovQHFfkqJKto6S4Z8d02ud1D7MPESrHa1cVNNTrw@mail.gmail.com/
> > > > > > > [3] https://lore.kernel.org/bpf/Y2j6ivTwFmA0FtvY@krava/
> > > > > > >
> > > > > > > Reported-by: Hao Sun <sunhao.th@gmail.com>
> > > > > > > Suggested-by: Alexei Starovoitov <ast@kernel.org>
> > > > > > > Signed-off-by: Jiri Olsa <jolsa@kernel.org>
> > > > > > > ---
> > > > > > >   include/linux/bpf.h          |  1 +
> > > > > > >   include/linux/bpf_verifier.h |  2 ++
> > > > > > >   kernel/bpf/syscall.c         |  3 +++
> > > > > > >   kernel/bpf/verifier.c        | 46 ++++++++++++++++++++++++++++++++++++
> > > > > > >   4 files changed, 52 insertions(+)
> > > > > > >
> > > > > > > diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> > > > > > > index c9eafa67f2a2..3ccabede0f50 100644
> > > > > > > --- a/include/linux/bpf.h
> > > > > > > +++ b/include/linux/bpf.h
> > > > > > > @@ -1319,6 +1319,7 @@ struct bpf_prog {
> > > > > > >                             enforce_expected_attach_type:1, /* Enforce expected_attach_type checking at attach time */
> > > > > > >                             call_get_stack:1, /* Do we call bpf_get_stack() or bpf_get_stackid() */
> > > > > > >                             call_get_func_ip:1, /* Do we call get_func_ip() */
> > > > > > > +                           call_printk:1, /* Do we call trace_printk/trace_vprintk  */
> > > > > > >                             tstamp_type_access:1; /* Accessed __sk_buff->tstamp_type */
> > > > > > >     enum bpf_prog_type      type;           /* Type of BPF program */
> > > > > > >     enum bpf_attach_type    expected_attach_type; /* For some prog types */
> > > > > > > diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h
> > > > > > > index 545152ac136c..7118c2fda59d 100644
> > > > > > > --- a/include/linux/bpf_verifier.h
> > > > > > > +++ b/include/linux/bpf_verifier.h
> > > > > > > @@ -618,6 +618,8 @@ bool is_dynptr_type_expected(struct bpf_verifier_env *env,
> > > > > > >                          struct bpf_reg_state *reg,
> > > > > > >                          enum bpf_arg_type arg_type);
> > > > > > > +int bpf_check_tp_printk_denylist(const char *name, struct bpf_prog *prog);
> > > > > > > +
> > > > > > >   /* this lives here instead of in bpf.h because it needs to dereference tgt_prog */
> > > > > > >   static inline u64 bpf_trampoline_compute_key(const struct bpf_prog *tgt_prog,
> > > > > > >                                          struct btf *btf, u32 btf_id)
> > > > > > > diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
> > > > > > > index 35972afb6850..9a69bda7d62b 100644
> > > > > > > --- a/kernel/bpf/syscall.c
> > > > > > > +++ b/kernel/bpf/syscall.c
> > > > > > > @@ -3329,6 +3329,9 @@ static int bpf_raw_tp_link_attach(struct bpf_prog *prog,
> > > > > > >             return -EINVAL;
> > > > > > >     }
> > > > > > > +   if (bpf_check_tp_printk_denylist(tp_name, prog))
> > > > > > > +           return -EACCES;
> > > > > > > +
> > > > > > >     btp = bpf_get_raw_tracepoint(tp_name);
> > > > > > >     if (!btp)
> > > > > > >             return -ENOENT;
> > > > > > > diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> > > > > > > index f07bec227fef..b662bc851e1c 100644
> > > > > > > --- a/kernel/bpf/verifier.c
> > > > > > > +++ b/kernel/bpf/verifier.c
> > > > > > > @@ -7472,6 +7472,47 @@ static void update_loop_inline_state(struct bpf_verifier_env *env, u32 subprogno
> > > > > > >                              state->callback_subprogno == subprogno);
> > > > > > >   }
> > > > > > > +int bpf_check_tp_printk_denylist(const char *name, struct bpf_prog *prog)
> > > > > > > +{
> > > > > > > +   static const char * const denylist[] = {
> > > > > > > +           "contention_begin",
> > > > > > > +           "bpf_trace_printk",
> > > > > > > +   };
> > > > > > > +   int i;
> > > > > > > +
> > > > > > > +   /* Do not allow attachment to denylist[] tracepoints,
> > > > > > > +    * if the program calls some of the printk helpers,
> > > > > > > +    * because there's possibility of deadlock.
> > > > > > > +    */
> > > > > >
> > > > > > What if that prog doesn't but tail calls into another one which calls printk helpers?
> > > > >
> > > > > right, I'll deny that for all BPF_PROG_TYPE_RAW_TRACEPOINT* programs,
> > > > > because I don't see an easy way to check for that
> > > > >
> > > > > we can leave the printk check for tracing BPF_TRACE_RAW_TP programs,
> > > > > because the verifier already knows the exact tracepoint
> > > >
> > > > This is all fragile and merely a stopgap.
> > > > It doesn't sound like the issue is limited to bpf_trace_printk
> > >
> > > hm, I don't have a better idea how to fix that.. I can't deny
> > > contention_begin completely, because we use it in perf via
> > > tp_btf/contention_begin (perf lock contention) and I don't
> > > think there's another way for perf to do that
> > >
> > > fwiw the last version below denies BPF_PROG_TYPE_RAW_TRACEPOINT
> > > programs completely and tracing BPF_TRACE_RAW_TP with printks
> > >
> > 
> > I think disabling bpf_trace_printk() tracepoint for any BPF program is
> > totally fine. This tracepoint was never intended to be attached to.
> > 
> > But as for the general bpf_trace_printk() deadlocking. Should we
> > discuss how to make it not deadlock instead of starting to denylist
> > things left and right?
> > 
> > Do I understand that we take trace_printk_lock only to protect that
> > static char buf[]? Can we just make this buf per-CPU and do a trylock
> > instead? We'll only fail to bpf_trace_printk() something if we have
> > nested BPF programs (rare) or NMI (also rare).
> > 
> > And it's a printk(), it's never mission-critical, so if we drop some
> > message in rare case it's totally fine.
> 
> What about contention_begin?  I wonder if we can disallow recursion
> for those in the deny list, e.g. using bpf_prog_active..

I was testing the change below, which allows checking recursion just
for the contention_begin tracepoint

for the reported issue we might be ok with the change that Andrii
suggested, but we could have the change below as an extra precaution

jirka


---
diff --git a/include/linux/trace_events.h b/include/linux/trace_events.h
index 20749bd9db71..1c89d4292374 100644
--- a/include/linux/trace_events.h
+++ b/include/linux/trace_events.h
@@ -740,8 +740,8 @@ unsigned int trace_call_bpf(struct trace_event_call *call, void *ctx);
 int perf_event_attach_bpf_prog(struct perf_event *event, struct bpf_prog *prog, u64 bpf_cookie);
 void perf_event_detach_bpf_prog(struct perf_event *event);
 int perf_event_query_prog_array(struct perf_event *event, void __user *info);
-int bpf_probe_register(struct bpf_raw_event_map *btp, struct bpf_prog *prog);
-int bpf_probe_unregister(struct bpf_raw_event_map *btp, struct bpf_prog *prog);
+int bpf_probe_register(struct bpf_raw_event_map *btp, struct bpf_raw_event_data *data);
+int bpf_probe_unregister(struct bpf_raw_event_map *btp, struct bpf_raw_event_data *data);
 struct bpf_raw_event_map *bpf_get_raw_tracepoint(const char *name);
 void bpf_put_raw_tracepoint(struct bpf_raw_event_map *btp);
 int bpf_get_perf_event_info(const struct perf_event *event, u32 *prog_id,
@@ -873,31 +873,31 @@ void *perf_trace_buf_alloc(int size, struct pt_regs **regs, int *rctxp);
 int perf_event_set_bpf_prog(struct perf_event *event, struct bpf_prog *prog, u64 bpf_cookie);
 void perf_event_free_bpf_prog(struct perf_event *event);
 
-void bpf_trace_run1(struct bpf_prog *prog, u64 arg1);
-void bpf_trace_run2(struct bpf_prog *prog, u64 arg1, u64 arg2);
-void bpf_trace_run3(struct bpf_prog *prog, u64 arg1, u64 arg2,
+void bpf_trace_run1(struct bpf_raw_event_data *data, u64 arg1);
+void bpf_trace_run2(struct bpf_raw_event_data *data, u64 arg1, u64 arg2);
+void bpf_trace_run3(struct bpf_raw_event_data *data, u64 arg1, u64 arg2,
 		    u64 arg3);
-void bpf_trace_run4(struct bpf_prog *prog, u64 arg1, u64 arg2,
+void bpf_trace_run4(struct bpf_raw_event_data *data, u64 arg1, u64 arg2,
 		    u64 arg3, u64 arg4);
-void bpf_trace_run5(struct bpf_prog *prog, u64 arg1, u64 arg2,
+void bpf_trace_run5(struct bpf_raw_event_data *data, u64 arg1, u64 arg2,
 		    u64 arg3, u64 arg4, u64 arg5);
-void bpf_trace_run6(struct bpf_prog *prog, u64 arg1, u64 arg2,
+void bpf_trace_run6(struct bpf_raw_event_data *data, u64 arg1, u64 arg2,
 		    u64 arg3, u64 arg4, u64 arg5, u64 arg6);
-void bpf_trace_run7(struct bpf_prog *prog, u64 arg1, u64 arg2,
+void bpf_trace_run7(struct bpf_raw_event_data *data, u64 arg1, u64 arg2,
 		    u64 arg3, u64 arg4, u64 arg5, u64 arg6, u64 arg7);
-void bpf_trace_run8(struct bpf_prog *prog, u64 arg1, u64 arg2,
+void bpf_trace_run8(struct bpf_raw_event_data *data, u64 arg1, u64 arg2,
 		    u64 arg3, u64 arg4, u64 arg5, u64 arg6, u64 arg7,
 		    u64 arg8);
-void bpf_trace_run9(struct bpf_prog *prog, u64 arg1, u64 arg2,
+void bpf_trace_run9(struct bpf_raw_event_data *data, u64 arg1, u64 arg2,
 		    u64 arg3, u64 arg4, u64 arg5, u64 arg6, u64 arg7,
 		    u64 arg8, u64 arg9);
-void bpf_trace_run10(struct bpf_prog *prog, u64 arg1, u64 arg2,
+void bpf_trace_run10(struct bpf_raw_event_data *data, u64 arg1, u64 arg2,
 		     u64 arg3, u64 arg4, u64 arg5, u64 arg6, u64 arg7,
 		     u64 arg8, u64 arg9, u64 arg10);
-void bpf_trace_run11(struct bpf_prog *prog, u64 arg1, u64 arg2,
+void bpf_trace_run11(struct bpf_raw_event_data *data, u64 arg1, u64 arg2,
 		     u64 arg3, u64 arg4, u64 arg5, u64 arg6, u64 arg7,
 		     u64 arg8, u64 arg9, u64 arg10, u64 arg11);
-void bpf_trace_run12(struct bpf_prog *prog, u64 arg1, u64 arg2,
+void bpf_trace_run12(struct bpf_raw_event_data *data, u64 arg1, u64 arg2,
 		     u64 arg3, u64 arg4, u64 arg5, u64 arg6, u64 arg7,
 		     u64 arg8, u64 arg9, u64 arg10, u64 arg11, u64 arg12);
 void perf_trace_run_bpf_submit(void *raw_data, int size, int rctx,
diff --git a/include/linux/tracepoint-defs.h b/include/linux/tracepoint-defs.h
index e7c2276be33e..5312a8b149c0 100644
--- a/include/linux/tracepoint-defs.h
+++ b/include/linux/tracepoint-defs.h
@@ -46,6 +46,11 @@ typedef const int tracepoint_ptr_t;
 typedef struct tracepoint * const tracepoint_ptr_t;
 #endif
 
+struct bpf_raw_event_data {
+	struct bpf_prog *prog;
+	int __percpu *recursion;
+};
+
 struct bpf_raw_event_map {
 	struct tracepoint	*tp;
 	void			*bpf_func;
diff --git a/include/trace/bpf_probe.h b/include/trace/bpf_probe.h
index 6a13220d2d27..a8f9c3c7c447 100644
--- a/include/trace/bpf_probe.h
+++ b/include/trace/bpf_probe.h
@@ -81,8 +81,8 @@
 static notrace void							\
 __bpf_trace_##call(void *__data, proto)					\
 {									\
-	struct bpf_prog *prog = __data;					\
-	CONCATENATE(bpf_trace_run, COUNT_ARGS(args))(prog, CAST_TO_U64(args));	\
+	struct bpf_raw_event_data *____data = __data;			\
+	CONCATENATE(bpf_trace_run, COUNT_ARGS(args))(____data, CAST_TO_U64(args));	\
 }
 
 #undef DECLARE_EVENT_CLASS
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 35972afb6850..5dcb32cd24e6 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -3141,9 +3141,36 @@ static int bpf_tracing_prog_attach(struct bpf_prog *prog,
 	return err;
 }
 
+static bool needs_recursion_check(struct bpf_raw_event_map *btp)
+{
+	return !strcmp(btp->tp->name, "contention_begin");
+}
+
+static int bpf_raw_event_data_init(struct bpf_raw_event_data *data,
+				   struct bpf_raw_event_map *btp,
+				   struct bpf_prog *prog)
+{
+	int __percpu *recursion = NULL;
+
+	if (needs_recursion_check(btp)) {
+		recursion = alloc_percpu_gfp(int, GFP_KERNEL);
+		if (!recursion)
+			return -ENOMEM;
+	}
+	data->recursion = recursion;
+	data->prog = prog;
+	return 0;
+}
+
+static void bpf_raw_event_data_release(struct bpf_raw_event_data *data)
+{
+	free_percpu(data->recursion);
+}
+
 struct bpf_raw_tp_link {
 	struct bpf_link link;
 	struct bpf_raw_event_map *btp;
+	struct bpf_raw_event_data data;
 };
 
 static void bpf_raw_tp_link_release(struct bpf_link *link)
@@ -3151,7 +3178,8 @@ static void bpf_raw_tp_link_release(struct bpf_link *link)
 	struct bpf_raw_tp_link *raw_tp =
 		container_of(link, struct bpf_raw_tp_link, link);
 
-	bpf_probe_unregister(raw_tp->btp, raw_tp->link.prog);
+	bpf_probe_unregister(raw_tp->btp, &raw_tp->data);
+	bpf_raw_event_data_release(&raw_tp->data);
 	bpf_put_raw_tracepoint(raw_tp->btp);
 }
 
@@ -3338,17 +3366,23 @@ static int bpf_raw_tp_link_attach(struct bpf_prog *prog,
 		err = -ENOMEM;
 		goto out_put_btp;
 	}
+	if (bpf_raw_event_data_init(&link->data, btp, prog)) {
+		err = -ENOMEM;
+		kfree(link);
+		goto out_put_btp;
+	}
 	bpf_link_init(&link->link, BPF_LINK_TYPE_RAW_TRACEPOINT,
 		      &bpf_raw_tp_link_lops, prog);
 	link->btp = btp;
 
 	err = bpf_link_prime(&link->link, &link_primer);
 	if (err) {
+		bpf_raw_event_data_release(&link->data);
 		kfree(link);
 		goto out_put_btp;
 	}
 
-	err = bpf_probe_register(link->btp, prog);
+	err = bpf_probe_register(link->btp, &link->data);
 	if (err) {
 		bpf_link_cleanup(&link_primer);
 		goto out_put_btp;
diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
index 3bbd3f0c810c..d27b7dc77894 100644
--- a/kernel/trace/bpf_trace.c
+++ b/kernel/trace/bpf_trace.c
@@ -2252,9 +2252,8 @@ void bpf_put_raw_tracepoint(struct bpf_raw_event_map *btp)
 }
 
 static __always_inline
-void __bpf_trace_run(struct bpf_prog *prog, u64 *args)
+void __bpf_trace_prog_run(struct bpf_prog *prog, u64 *args)
 {
-	cant_sleep();
 	if (unlikely(this_cpu_inc_return(*(prog->active)) != 1)) {
 		bpf_prog_inc_misses_counter(prog);
 		goto out;
@@ -2266,6 +2265,22 @@ void __bpf_trace_run(struct bpf_prog *prog, u64 *args)
 	this_cpu_dec(*(prog->active));
 }
 
+static __always_inline
+void __bpf_trace_run(struct bpf_raw_event_data *data, u64 *args)
+{
+	struct bpf_prog *prog = data->prog;
+
+	cant_sleep();
+	if (unlikely(!data->recursion))
+		return __bpf_trace_prog_run(prog, args);
+
+	if (unlikely(this_cpu_inc_return(*(data->recursion)) != 1))
+		goto out;
+	__bpf_trace_prog_run(prog, args);
+out:
+	this_cpu_dec(*(data->recursion));
+}
+
 #define UNPACK(...)			__VA_ARGS__
 #define REPEAT_1(FN, DL, X, ...)	FN(X)
 #define REPEAT_2(FN, DL, X, ...)	FN(X) UNPACK DL REPEAT_1(FN, DL, __VA_ARGS__)
@@ -2290,12 +2305,12 @@ void __bpf_trace_run(struct bpf_prog *prog, u64 *args)
 #define __SEQ_0_11	0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11
 
 #define BPF_TRACE_DEFN_x(x)						\
-	void bpf_trace_run##x(struct bpf_prog *prog,			\
+	void bpf_trace_run##x(struct bpf_raw_event_data *data,		\
 			      REPEAT(x, SARG, __DL_COM, __SEQ_0_11))	\
 	{								\
 		u64 args[x];						\
 		REPEAT(x, COPY, __DL_SEM, __SEQ_0_11);			\
-		__bpf_trace_run(prog, args);				\
+		__bpf_trace_run(data, args);				\
 	}								\
 	EXPORT_SYMBOL_GPL(bpf_trace_run##x)
 BPF_TRACE_DEFN_x(1);
@@ -2311,8 +2326,9 @@ BPF_TRACE_DEFN_x(10);
 BPF_TRACE_DEFN_x(11);
 BPF_TRACE_DEFN_x(12);
 
-static int __bpf_probe_register(struct bpf_raw_event_map *btp, struct bpf_prog *prog)
+static int __bpf_probe_register(struct bpf_raw_event_map *btp, struct bpf_raw_event_data *data)
 {
+	struct bpf_prog *prog = data->prog;
 	struct tracepoint *tp = btp->tp;
 
 	/*
@@ -2326,17 +2342,17 @@ static int __bpf_probe_register(struct bpf_raw_event_map *btp, struct bpf_prog *
 		return -EINVAL;
 
 	return tracepoint_probe_register_may_exist(tp, (void *)btp->bpf_func,
-						   prog);
+						   data);
 }
 
-int bpf_probe_register(struct bpf_raw_event_map *btp, struct bpf_prog *prog)
+int bpf_probe_register(struct bpf_raw_event_map *btp, struct bpf_raw_event_data *data)
 {
-	return __bpf_probe_register(btp, prog);
+	return __bpf_probe_register(btp, data);
 }
 
-int bpf_probe_unregister(struct bpf_raw_event_map *btp, struct bpf_prog *prog)
+int bpf_probe_unregister(struct bpf_raw_event_map *btp, struct bpf_raw_event_data *data)
 {
-	return tracepoint_probe_unregister(btp->tp, (void *)btp->bpf_func, prog);
+	return tracepoint_probe_unregister(btp->tp, (void *)btp->bpf_func, data);
 }
 
 int bpf_get_perf_event_info(const struct perf_event *event, u32 *prog_id,


* Re: [PATCH bpf-next] bpf: Restrict attachment of bpf program to some tracepoints
  2022-12-05 12:28             ` Jiri Olsa
@ 2022-12-06  4:00               ` Namhyung Kim
  2022-12-06  8:14                 ` Jiri Olsa
  2022-12-06 20:09               ` Alexei Starovoitov
  1 sibling, 1 reply; 24+ messages in thread
From: Namhyung Kim @ 2022-12-06  4:00 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: Andrii Nakryiko, Alexei Starovoitov, Daniel Borkmann,
	Alexei Starovoitov, Andrii Nakryiko, Hao Sun, bpf,
	Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend,
	KP Singh, Stanislav Fomichev, Hao Luo

On Mon, Dec 5, 2022 at 4:28 AM Jiri Olsa <olsajiri@gmail.com> wrote:
>
> On Sat, Dec 03, 2022 at 09:58:34AM -0800, Namhyung Kim wrote:
> > On Wed, Nov 30, 2022 at 03:29:39PM -0800, Andrii Nakryiko wrote:
> > > On Fri, Nov 25, 2022 at 1:35 AM Jiri Olsa <olsajiri@gmail.com> wrote:
> > > >
> > > > On Thu, Nov 24, 2022 at 09:17:22AM -0800, Alexei Starovoitov wrote:
> > > > > On Thu, Nov 24, 2022 at 1:42 AM Jiri Olsa <olsajiri@gmail.com> wrote:
> > > > > >
> > > > > > On Thu, Nov 24, 2022 at 01:41:23AM +0100, Daniel Borkmann wrote:
> > > > > > > On 11/21/22 10:31 PM, Jiri Olsa wrote:
> > > > > > > > We hit the following issues [1] [2] when we attach a bpf program that
> > > > > > > > calls the bpf_trace_printk helper to the contention_begin tracepoint.
> > > > > > > >
> > > > > > > > As described in [3], multiple bpf programs that call the bpf_trace_printk
> > > > > > > > helper attached to contention_begin might exhaust the printk buffer or
> > > > > > > > cause a deadlock [2].
> > > > > > > >
> > > > > > > > There's also another possible deadlock when multiple bpf programs attach
> > > > > > > > to the bpf_trace_printk tracepoint and call one of the printk bpf helpers.
> > > > > > > >
> > > > > > > > This change denies attachment of a bpf program to the contention_begin
> > > > > > > > and bpf_trace_printk tracepoints if the program calls one of the
> > > > > > > > printk bpf helpers.
> > > > > > > >
> > > > > > > > Also adding a verifier check for tp_btf programs, so this can be caught
> > > > > > > > at program load time with an error message like:
> > > > > > > >
> > > > > > > >    Can't attach program with bpf_trace_printk#6 helper to contention_begin tracepoint.
> > > > > > > >
> > > > > > > > [1] https://lore.kernel.org/bpf/CACkBjsakT_yWxnSWr4r-0TpPvbKm9-OBmVUhJb7hV3hY8fdCkw@mail.gmail.com/
> > > > > > > > [2] https://lore.kernel.org/bpf/CACkBjsaCsTovQHFfkqJKto6S4Z8d02ud1D7MPESrHa1cVNNTrw@mail.gmail.com/
> > > > > > > > [3] https://lore.kernel.org/bpf/Y2j6ivTwFmA0FtvY@krava/
> > > > > > > >
> > > > > > > > Reported-by: Hao Sun <sunhao.th@gmail.com>
> > > > > > > > Suggested-by: Alexei Starovoitov <ast@kernel.org>
> > > > > > > > Signed-off-by: Jiri Olsa <jolsa@kernel.org>
> > > > > > > > ---
> > > > > > > >   include/linux/bpf.h          |  1 +
> > > > > > > >   include/linux/bpf_verifier.h |  2 ++
> > > > > > > >   kernel/bpf/syscall.c         |  3 +++
> > > > > > > >   kernel/bpf/verifier.c        | 46 ++++++++++++++++++++++++++++++++++++
> > > > > > > >   4 files changed, 52 insertions(+)
> > > > > > > >
> > > > > > > > diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> > > > > > > > index c9eafa67f2a2..3ccabede0f50 100644
> > > > > > > > --- a/include/linux/bpf.h
> > > > > > > > +++ b/include/linux/bpf.h
> > > > > > > > @@ -1319,6 +1319,7 @@ struct bpf_prog {
> > > > > > > >                             enforce_expected_attach_type:1, /* Enforce expected_attach_type checking at attach time */
> > > > > > > >                             call_get_stack:1, /* Do we call bpf_get_stack() or bpf_get_stackid() */
> > > > > > > >                             call_get_func_ip:1, /* Do we call get_func_ip() */
> > > > > > > > +                           call_printk:1, /* Do we call trace_printk/trace_vprintk  */
> > > > > > > >                             tstamp_type_access:1; /* Accessed __sk_buff->tstamp_type */
> > > > > > > >     enum bpf_prog_type      type;           /* Type of BPF program */
> > > > > > > >     enum bpf_attach_type    expected_attach_type; /* For some prog types */
> > > > > > > > diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h
> > > > > > > > index 545152ac136c..7118c2fda59d 100644
> > > > > > > > --- a/include/linux/bpf_verifier.h
> > > > > > > > +++ b/include/linux/bpf_verifier.h
> > > > > > > > @@ -618,6 +618,8 @@ bool is_dynptr_type_expected(struct bpf_verifier_env *env,
> > > > > > > >                          struct bpf_reg_state *reg,
> > > > > > > >                          enum bpf_arg_type arg_type);
> > > > > > > > +int bpf_check_tp_printk_denylist(const char *name, struct bpf_prog *prog);
> > > > > > > > +
> > > > > > > >   /* this lives here instead of in bpf.h because it needs to dereference tgt_prog */
> > > > > > > >   static inline u64 bpf_trampoline_compute_key(const struct bpf_prog *tgt_prog,
> > > > > > > >                                          struct btf *btf, u32 btf_id)
> > > > > > > > diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
> > > > > > > > index 35972afb6850..9a69bda7d62b 100644
> > > > > > > > --- a/kernel/bpf/syscall.c
> > > > > > > > +++ b/kernel/bpf/syscall.c
> > > > > > > > @@ -3329,6 +3329,9 @@ static int bpf_raw_tp_link_attach(struct bpf_prog *prog,
> > > > > > > >             return -EINVAL;
> > > > > > > >     }
> > > > > > > > +   if (bpf_check_tp_printk_denylist(tp_name, prog))
> > > > > > > > +           return -EACCES;
> > > > > > > > +
> > > > > > > >     btp = bpf_get_raw_tracepoint(tp_name);
> > > > > > > >     if (!btp)
> > > > > > > >             return -ENOENT;
> > > > > > > > diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> > > > > > > > index f07bec227fef..b662bc851e1c 100644
> > > > > > > > --- a/kernel/bpf/verifier.c
> > > > > > > > +++ b/kernel/bpf/verifier.c
> > > > > > > > @@ -7472,6 +7472,47 @@ static void update_loop_inline_state(struct bpf_verifier_env *env, u32 subprogno
> > > > > > > >                              state->callback_subprogno == subprogno);
> > > > > > > >   }
> > > > > > > > +int bpf_check_tp_printk_denylist(const char *name, struct bpf_prog *prog)
> > > > > > > > +{
> > > > > > > > +   static const char * const denylist[] = {
> > > > > > > > +           "contention_begin",
> > > > > > > > +           "bpf_trace_printk",
> > > > > > > > +   };
> > > > > > > > +   int i;
> > > > > > > > +
> > > > > > > > +   /* Do not allow attachment to denylist[] tracepoints,
> > > > > > > > +    * if the program calls some of the printk helpers,
> > > > > > > > +    * because there's possibility of deadlock.
> > > > > > > > +    */
> > > > > > >
> > > > > > > What if that prog doesn't but tail calls into another one which calls printk helpers?
> > > > > >
> > > > > > right, I'll deny that for all BPF_PROG_TYPE_RAW_TRACEPOINT* programs,
> > > > > > because I don't see an easy way to check for that
> > > > > >
> > > > > > we can leave the printk check for tracing BPF_TRACE_RAW_TP programs,
> > > > > > because the verifier already knows the exact tracepoint
> > > > >
> > > > > This is all fragile and merely a stopgap.
> > > > > It doesn't sound like the issue is limited to bpf_trace_printk
> > > >
> > > > hm, I don't have a better idea how to fix that.. I can't deny
> > > > contention_begin completely, because we use it in perf via
> > > > tp_btf/contention_begin (perf lock contention) and I don't
> > > > think there's another way for perf to do that
> > > >
> > > > fwiw the last version below denies BPF_PROG_TYPE_RAW_TRACEPOINT
> > > > programs completely and tracing BPF_TRACE_RAW_TP with printks
> > > >
> > >
> > > I think disabling bpf_trace_printk() tracepoint for any BPF program is
> > > totally fine. This tracepoint was never intended to be attached to.
> > >
> > > But as for the general bpf_trace_printk() deadlocking. Should we
> > > discuss how to make it not deadlock instead of starting to denylist
> > > things left and right?
> > >
> > > Do I understand that we take trace_printk_lock only to protect that
> > > static char buf[]? Can we just make this buf per-CPU and do a trylock
> > > instead? We'll only fail to bpf_trace_printk() something if we have
> > > nested BPF programs (rare) or NMI (also rare).
> > >
> > > And it's a printk(), it's never mission-critical, so if we drop some
> > > message in rare case it's totally fine.
> >
> > What about contention_begin?  I wonder if we can disallow recursions
> > for those in the deny list like using bpf_prog_active..
>
> I was testing change below which allows to check recursion just
> for contention_begin tracepoint
>
> for the reported issue we might be ok with the change that Andrii
> suggested, but we could have the change below as extra precaution

Looks ok to me.  But it seems it'd add the recursion check to every
tracepoint.  Can we just change the affected tracepoints only by
using a kind of wrapped btp->bpf_func with some macro magic? ;-)

>
> ---

[SNIP]
> diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
> index 3bbd3f0c810c..d27b7dc77894 100644
> --- a/kernel/trace/bpf_trace.c
> +++ b/kernel/trace/bpf_trace.c
> @@ -2252,9 +2252,8 @@ void bpf_put_raw_tracepoint(struct bpf_raw_event_map *btp)
>  }
>
>  static __always_inline
> -void __bpf_trace_run(struct bpf_prog *prog, u64 *args)
> +void __bpf_trace_prog_run(struct bpf_prog *prog, u64 *args)
>  {
> -       cant_sleep();
>         if (unlikely(this_cpu_inc_return(*(prog->active)) != 1)) {
>                 bpf_prog_inc_misses_counter(prog);
>                 goto out;
> @@ -2266,6 +2265,22 @@ void __bpf_trace_run(struct bpf_prog *prog, u64 *args)
>         this_cpu_dec(*(prog->active));
>  }
>
> +static __always_inline
> +void __bpf_trace_run(struct bpf_raw_event_data *data, u64 *args)
> +{
> +       struct bpf_prog *prog = data->prog;
> +
> +       cant_sleep();
> +       if (unlikely(!data->recursion))

likely ?

Thanks,
Namhyung


> +               return __bpf_trace_prog_run(prog, args);
> +
> +       if (unlikely(this_cpu_inc_return(*(data->recursion)) != 1))
> +               goto out;
> +       __bpf_trace_prog_run(prog, args);
> +out:
> +       this_cpu_dec(*(data->recursion));
> +}
> +
>  #define UNPACK(...)                    __VA_ARGS__
>  #define REPEAT_1(FN, DL, X, ...)       FN(X)
>  #define REPEAT_2(FN, DL, X, ...)       FN(X) UNPACK DL REPEAT_1(FN, DL, __VA_ARGS__)
> @@ -2290,12 +2305,12 @@ void __bpf_trace_run(struct bpf_prog *prog, u64 *args)
>  #define __SEQ_0_11     0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11
>
>  #define BPF_TRACE_DEFN_x(x)                                            \
> -       void bpf_trace_run##x(struct bpf_prog *prog,                    \
> +       void bpf_trace_run##x(struct bpf_raw_event_data *data,          \
>                               REPEAT(x, SARG, __DL_COM, __SEQ_0_11))    \
>         {                                                               \
>                 u64 args[x];                                            \
>                 REPEAT(x, COPY, __DL_SEM, __SEQ_0_11);                  \
> -               __bpf_trace_run(prog, args);                            \
> +               __bpf_trace_run(data, args);                            \
>         }                                                               \
>         EXPORT_SYMBOL_GPL(bpf_trace_run##x)
>  BPF_TRACE_DEFN_x(1);
> @@ -2311,8 +2326,9 @@ BPF_TRACE_DEFN_x(10);
>  BPF_TRACE_DEFN_x(11);
>  BPF_TRACE_DEFN_x(12);
>
> -static int __bpf_probe_register(struct bpf_raw_event_map *btp, struct bpf_prog *prog)
> +static int __bpf_probe_register(struct bpf_raw_event_map *btp, struct bpf_raw_event_data *data)
>  {
> +       struct bpf_prog *prog = data->prog;
>         struct tracepoint *tp = btp->tp;
>
>         /*
> @@ -2326,17 +2342,17 @@ static int __bpf_probe_register(struct bpf_raw_event_map *btp, struct bpf_prog *
>                 return -EINVAL;
>
>         return tracepoint_probe_register_may_exist(tp, (void *)btp->bpf_func,
> -                                                  prog);
> +                                                  data);
>  }
>
> -int bpf_probe_register(struct bpf_raw_event_map *btp, struct bpf_prog *prog)
> +int bpf_probe_register(struct bpf_raw_event_map *btp, struct bpf_raw_event_data *data)
>  {
> -       return __bpf_probe_register(btp, prog);
> +       return __bpf_probe_register(btp, data);
>  }
>
> -int bpf_probe_unregister(struct bpf_raw_event_map *btp, struct bpf_prog *prog)
> +int bpf_probe_unregister(struct bpf_raw_event_map *btp, struct bpf_raw_event_data *data)
>  {
> -       return tracepoint_probe_unregister(btp->tp, (void *)btp->bpf_func, prog);
> +       return tracepoint_probe_unregister(btp->tp, (void *)btp->bpf_func, data);
>  }
>
>  int bpf_get_perf_event_info(const struct perf_event *event, u32 *prog_id,

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH bpf-next] bpf: Restrict attachment of bpf program to some tracepoints
  2022-12-06  4:00               ` Namhyung Kim
@ 2022-12-06  8:14                 ` Jiri Olsa
  2022-12-06 18:20                   ` Namhyung Kim
  0 siblings, 1 reply; 24+ messages in thread
From: Jiri Olsa @ 2022-12-06  8:14 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Jiri Olsa, Andrii Nakryiko, Alexei Starovoitov, Daniel Borkmann,
	Alexei Starovoitov, Andrii Nakryiko, Hao Sun, bpf,
	Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend,
	KP Singh, Stanislav Fomichev, Hao Luo

On Mon, Dec 05, 2022 at 08:00:16PM -0800, Namhyung Kim wrote:
> On Mon, Dec 5, 2022 at 4:28 AM Jiri Olsa <olsajiri@gmail.com> wrote:
> >
> > On Sat, Dec 03, 2022 at 09:58:34AM -0800, Namhyung Kim wrote:
> > > On Wed, Nov 30, 2022 at 03:29:39PM -0800, Andrii Nakryiko wrote:
> > > > On Fri, Nov 25, 2022 at 1:35 AM Jiri Olsa <olsajiri@gmail.com> wrote:
> > > > >
> > > > > On Thu, Nov 24, 2022 at 09:17:22AM -0800, Alexei Starovoitov wrote:
> > > > > > On Thu, Nov 24, 2022 at 1:42 AM Jiri Olsa <olsajiri@gmail.com> wrote:
> > > > > > >
> > > > > > > On Thu, Nov 24, 2022 at 01:41:23AM +0100, Daniel Borkmann wrote:
> > > > > > > > On 11/21/22 10:31 PM, Jiri Olsa wrote:
> > > > > > > > > We hit following issues [1] [2] when we attach bpf program that calls
> > > > > > > > > bpf_trace_printk helper to the contention_begin tracepoint.
> > > > > > > > >
> > > > > > > > > As described in [3] with multiple bpf programs that call bpf_trace_printk
> > > > > > > > > helper attached to the contention_begin might result in exhaustion of
> > > > > > > > > printk buffer or cause a deadlock [2].
> > > > > > > > >
> > > > > > > > > There's also another possible deadlock when multiple bpf programs attach
> > > > > > > > > to bpf_trace_printk tracepoint and call one of the printk bpf helpers.
> > > > > > > > >
> > > > > > > > > This change denies the attachment of bpf program to contention_begin
> > > > > > > > > and bpf_trace_printk tracepoints if the bpf program calls one of the
> > > > > > > > > printk bpf helpers.
> > > > > > > > >
> > > > > > > > > Also adding a verifier check for tp_btf programs, so this can be caught
> > > > > > > > > at program load time with an error message like:
> > > > > > > > >
> > > > > > > > >    Can't attach program with bpf_trace_printk#6 helper to contention_begin tracepoint.
> > > > > > > > >
> > > > > > > > > [1] https://lore.kernel.org/bpf/CACkBjsakT_yWxnSWr4r-0TpPvbKm9-OBmVUhJb7hV3hY8fdCkw@mail.gmail.com/
> > > > > > > > > [2] https://lore.kernel.org/bpf/CACkBjsaCsTovQHFfkqJKto6S4Z8d02ud1D7MPESrHa1cVNNTrw@mail.gmail.com/
> > > > > > > > > [3] https://lore.kernel.org/bpf/Y2j6ivTwFmA0FtvY@krava/
> > > > > > > > >
> > > > > > > > > Reported-by: Hao Sun <sunhao.th@gmail.com>
> > > > > > > > > Suggested-by: Alexei Starovoitov <ast@kernel.org>
> > > > > > > > > Signed-off-by: Jiri Olsa <jolsa@kernel.org>
> > > > > > > > > ---
> > > > > > > > >   include/linux/bpf.h          |  1 +
> > > > > > > > >   include/linux/bpf_verifier.h |  2 ++
> > > > > > > > >   kernel/bpf/syscall.c         |  3 +++
> > > > > > > > >   kernel/bpf/verifier.c        | 46 ++++++++++++++++++++++++++++++++++++
> > > > > > > > >   4 files changed, 52 insertions(+)
> > > > > > > > >
> > > > > > > > > diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> > > > > > > > > index c9eafa67f2a2..3ccabede0f50 100644
> > > > > > > > > --- a/include/linux/bpf.h
> > > > > > > > > +++ b/include/linux/bpf.h
> > > > > > > > > @@ -1319,6 +1319,7 @@ struct bpf_prog {
> > > > > > > > >                             enforce_expected_attach_type:1, /* Enforce expected_attach_type checking at attach time */
> > > > > > > > >                             call_get_stack:1, /* Do we call bpf_get_stack() or bpf_get_stackid() */
> > > > > > > > >                             call_get_func_ip:1, /* Do we call get_func_ip() */
> > > > > > > > > +                           call_printk:1, /* Do we call trace_printk/trace_vprintk  */
> > > > > > > > >                             tstamp_type_access:1; /* Accessed __sk_buff->tstamp_type */
> > > > > > > > >     enum bpf_prog_type      type;           /* Type of BPF program */
> > > > > > > > >     enum bpf_attach_type    expected_attach_type; /* For some prog types */
> > > > > > > > > diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h
> > > > > > > > > index 545152ac136c..7118c2fda59d 100644
> > > > > > > > > --- a/include/linux/bpf_verifier.h
> > > > > > > > > +++ b/include/linux/bpf_verifier.h
> > > > > > > > > @@ -618,6 +618,8 @@ bool is_dynptr_type_expected(struct bpf_verifier_env *env,
> > > > > > > > >                          struct bpf_reg_state *reg,
> > > > > > > > >                          enum bpf_arg_type arg_type);
> > > > > > > > > +int bpf_check_tp_printk_denylist(const char *name, struct bpf_prog *prog);
> > > > > > > > > +
> > > > > > > > >   /* this lives here instead of in bpf.h because it needs to dereference tgt_prog */
> > > > > > > > >   static inline u64 bpf_trampoline_compute_key(const struct bpf_prog *tgt_prog,
> > > > > > > > >                                          struct btf *btf, u32 btf_id)
> > > > > > > > > diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
> > > > > > > > > index 35972afb6850..9a69bda7d62b 100644
> > > > > > > > > --- a/kernel/bpf/syscall.c
> > > > > > > > > +++ b/kernel/bpf/syscall.c
> > > > > > > > > @@ -3329,6 +3329,9 @@ static int bpf_raw_tp_link_attach(struct bpf_prog *prog,
> > > > > > > > >             return -EINVAL;
> > > > > > > > >     }
> > > > > > > > > +   if (bpf_check_tp_printk_denylist(tp_name, prog))
> > > > > > > > > +           return -EACCES;
> > > > > > > > > +
> > > > > > > > >     btp = bpf_get_raw_tracepoint(tp_name);
> > > > > > > > >     if (!btp)
> > > > > > > > >             return -ENOENT;
> > > > > > > > > diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> > > > > > > > > index f07bec227fef..b662bc851e1c 100644
> > > > > > > > > --- a/kernel/bpf/verifier.c
> > > > > > > > > +++ b/kernel/bpf/verifier.c
> > > > > > > > > @@ -7472,6 +7472,47 @@ static void update_loop_inline_state(struct bpf_verifier_env *env, u32 subprogno
> > > > > > > > >                              state->callback_subprogno == subprogno);
> > > > > > > > >   }
> > > > > > > > > +int bpf_check_tp_printk_denylist(const char *name, struct bpf_prog *prog)
> > > > > > > > > +{
> > > > > > > > > +   static const char * const denylist[] = {
> > > > > > > > > +           "contention_begin",
> > > > > > > > > +           "bpf_trace_printk",
> > > > > > > > > +   };
> > > > > > > > > +   int i;
> > > > > > > > > +
> > > > > > > > > +   /* Do not allow attachment to denylist[] tracepoints,
> > > > > > > > > +    * if the program calls some of the printk helpers,
> > > > > > > > > +    * because there's possibility of deadlock.
> > > > > > > > > +    */
> > > > > > > >
> > > > > > > > What if that prog doesn't but tail calls into another one which calls printk helpers?
> > > > > > >
> > > > > > > right, I'll deny that for all BPF_PROG_TYPE_RAW_TRACEPOINT* programs,
> > > > > > > because I don't see easy way to check on that
> > > > > > >
> > > > > > > we can leave printk check for tracing BPF_TRACE_RAW_TP programs,
> > > > > > > because verifier knows the exact tracepoint already
> > > > > >
> > > > > > This is all fragile and merely a stop gap.
> > > > > > Doesn't sound like the issue is limited to bpf_trace_printk
> > > > >
> > > > > hm, I don't have a better idea how to fix that.. I can't deny
> > > > > contention_begin completely, because we use it in perf via
> > > > > tp_btf/contention_begin (perf lock contention) and I don't
> > > > > think there's another way for perf to do that
> > > > >
> > > > > fwiw the last version below denies BPF_PROG_TYPE_RAW_TRACEPOINT
> > > > > programs completely and tracing BPF_TRACE_RAW_TP with printks
> > > > >
> > > >
> > > > I think disabling bpf_trace_printk() tracepoint for any BPF program is
> > > > totally fine. This tracepoint was never intended to be attached to.
> > > >
> > > > But as for the general bpf_trace_printk() deadlocking. Should we
> > > > discuss how to make it not deadlock instead of starting to denylist
> > > > things left and right?
> > > >
> > > > Do I understand that we take trace_printk_lock only to protect that
> > > > static char buf[]? Can we just make this buf per-CPU and do a trylock
> > > > instead? We'll only fail to bpf_trace_printk() something if we have
> > > > nested BPF programs (rare) or NMI (also rare).
> > > >
> > > > And it's a printk(), it's never mission-critical, so if we drop some
> > > > message in rare case it's totally fine.
> > >
> > > What about contention_begin?  I wonder if we can disallow recursions
> > > for those in the deny list like using bpf_prog_active..
> >
> > I was testing change below which allows to check recursion just
> > for contention_begin tracepoint
> >
> > for the reported issue we might be ok with the change that Andrii
> > suggested, but we could have the change below as extra precaution
> 
> Looks ok to me.  But it seems it'd add the recursion check to every

hm, it should allocate recursion variable just for the contention_begin
tracepoint, rest should see NULL pointer

> tracepoint.  Can we just change the affected tracepoints only by
> using a kind of wrapped btp->bpf_func with some macro magic? ;-)

I tried that and the only other ways I found are:

  - add something like TRACE_EVENT_FLAGS macro and have __init call
    for specific tracepoint that sets the flag

  - add extra new 'bpf_func' that checks the re-entry, but that'd mean
    around 1000 extra mostly unused small functions

> 
> >
> > ---
> 
> [SNIP]
> > diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
> > index 3bbd3f0c810c..d27b7dc77894 100644
> > --- a/kernel/trace/bpf_trace.c
> > +++ b/kernel/trace/bpf_trace.c
> > @@ -2252,9 +2252,8 @@ void bpf_put_raw_tracepoint(struct bpf_raw_event_map *btp)
> >  }
> >
> >  static __always_inline
> > -void __bpf_trace_run(struct bpf_prog *prog, u64 *args)
> > +void __bpf_trace_prog_run(struct bpf_prog *prog, u64 *args)
> >  {
> > -       cant_sleep();
> >         if (unlikely(this_cpu_inc_return(*(prog->active)) != 1)) {
> >                 bpf_prog_inc_misses_counter(prog);
> >                 goto out;
> > @@ -2266,6 +2265,22 @@ void __bpf_trace_run(struct bpf_prog *prog, u64 *args)
> >         this_cpu_dec(*(prog->active));
> >  }
> >
> > +static __always_inline
> > +void __bpf_trace_run(struct bpf_raw_event_data *data, u64 *args)
> > +{
> > +       struct bpf_prog *prog = data->prog;
> > +
> > +       cant_sleep();
> > +       if (unlikely(!data->recursion))
> 
> likely ?

right, thanks

jirka


* Re: [PATCH bpf-next] bpf: Restrict attachment of bpf program to some tracepoints
  2022-12-06  8:14                 ` Jiri Olsa
@ 2022-12-06 18:20                   ` Namhyung Kim
  0 siblings, 0 replies; 24+ messages in thread
From: Namhyung Kim @ 2022-12-06 18:20 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: Andrii Nakryiko, Alexei Starovoitov, Daniel Borkmann,
	Alexei Starovoitov, Andrii Nakryiko, Hao Sun, bpf,
	Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend,
	KP Singh, Stanislav Fomichev, Hao Luo

On Tue, Dec 6, 2022 at 12:14 AM Jiri Olsa <olsajiri@gmail.com> wrote:
>
> On Mon, Dec 05, 2022 at 08:00:16PM -0800, Namhyung Kim wrote:
> > On Mon, Dec 5, 2022 at 4:28 AM Jiri Olsa <olsajiri@gmail.com> wrote:
> > >
> > > On Sat, Dec 03, 2022 at 09:58:34AM -0800, Namhyung Kim wrote:
> > > > What about contention_begin?  I wonder if we can disallow recursions
> > > > for those in the deny list like using bpf_prog_active..
> > >
> > > I was testing change below which allows to check recursion just
> > > for contention_begin tracepoint
> > >
> > > for the reported issue we might be ok with the change that Andrii
> > > suggested, but we could have the change below as extra precaution
> >
> > Looks ok to me.  But it seems it'd add the recursion check to every
>
> hm, it should allocate recursion variable just for the contention_begin
> tracepoint, rest should see NULL pointer

Oh, right.  I meant the NULL check.

>
> > tracepoint.  Can we just change the affected tracepoints only by
> > using a kind of wrapped btp->bpf_func with some macro magic? ;-)
>
> I tried that and the only other ways I found are:
>
>   - add something like TRACE_EVENT_FLAGS macro and have __init call
>     for specific tracepoint that sets the flag
>
>   - add extra new 'bpf_func' that checks the re-entry, but that'd mean
>     around 1000 extra mostly unused small functions

Hmm.. ok, that's not what I want.  I'm fine with the patch then.
With the 'likely' change,

Acked-by: Namhyung Kim <namhyung@kernel.org>

Thanks,
Namhyung


* Re: [PATCH bpf-next] bpf: Restrict attachment of bpf program to some tracepoints
  2022-12-05 12:28             ` Jiri Olsa
  2022-12-06  4:00               ` Namhyung Kim
@ 2022-12-06 20:09               ` Alexei Starovoitov
  2022-12-07  2:14                 ` Namhyung Kim
  1 sibling, 1 reply; 24+ messages in thread
From: Alexei Starovoitov @ 2022-12-06 20:09 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: Namhyung Kim, Andrii Nakryiko, Daniel Borkmann,
	Alexei Starovoitov, Andrii Nakryiko, Hao Sun, bpf,
	Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend,
	KP Singh, Stanislav Fomichev, Hao Luo

On Mon, Dec 5, 2022 at 4:28 AM Jiri Olsa <olsajiri@gmail.com> wrote:
>
> On Sat, Dec 03, 2022 at 09:58:34AM -0800, Namhyung Kim wrote:
> > On Wed, Nov 30, 2022 at 03:29:39PM -0800, Andrii Nakryiko wrote:
> > > On Fri, Nov 25, 2022 at 1:35 AM Jiri Olsa <olsajiri@gmail.com> wrote:
> > > >
> > > > On Thu, Nov 24, 2022 at 09:17:22AM -0800, Alexei Starovoitov wrote:
> > > > > On Thu, Nov 24, 2022 at 1:42 AM Jiri Olsa <olsajiri@gmail.com> wrote:
> > > > > >
> > > > > > On Thu, Nov 24, 2022 at 01:41:23AM +0100, Daniel Borkmann wrote:
> > > > > > > On 11/21/22 10:31 PM, Jiri Olsa wrote:
> > > > > > > > We hit following issues [1] [2] when we attach bpf program that calls
> > > > > > > > bpf_trace_printk helper to the contention_begin tracepoint.
> > > > > > > >
> > > > > > > > As described in [3] with multiple bpf programs that call bpf_trace_printk
> > > > > > > > helper attached to the contention_begin might result in exhaustion of
> > > > > > > > printk buffer or cause a deadlock [2].
> > > > > > > >
> > > > > > > > There's also another possible deadlock when multiple bpf programs attach
> > > > > > > > to bpf_trace_printk tracepoint and call one of the printk bpf helpers.
> > > > > > > >
> > > > > > > > This change denies the attachment of bpf program to contention_begin
> > > > > > > > and bpf_trace_printk tracepoints if the bpf program calls one of the
> > > > > > > > printk bpf helpers.
> > > > > > > >
> > > > > > > > Also adding a verifier check for tp_btf programs, so this can be caught
> > > > > > > > at program load time with an error message like:
> > > > > > > >
> > > > > > > >    Can't attach program with bpf_trace_printk#6 helper to contention_begin tracepoint.
> > > > > > > >
> > > > > > > > [1] https://lore.kernel.org/bpf/CACkBjsakT_yWxnSWr4r-0TpPvbKm9-OBmVUhJb7hV3hY8fdCkw@mail.gmail.com/
> > > > > > > > [2] https://lore.kernel.org/bpf/CACkBjsaCsTovQHFfkqJKto6S4Z8d02ud1D7MPESrHa1cVNNTrw@mail.gmail.com/
> > > > > > > > [3] https://lore.kernel.org/bpf/Y2j6ivTwFmA0FtvY@krava/
> > > > > > > >
> > > > > > > > Reported-by: Hao Sun <sunhao.th@gmail.com>
> > > > > > > > Suggested-by: Alexei Starovoitov <ast@kernel.org>
> > > > > > > > Signed-off-by: Jiri Olsa <jolsa@kernel.org>
> > > > > > > > ---
> > > > > > > >   include/linux/bpf.h          |  1 +
> > > > > > > >   include/linux/bpf_verifier.h |  2 ++
> > > > > > > >   kernel/bpf/syscall.c         |  3 +++
> > > > > > > >   kernel/bpf/verifier.c        | 46 ++++++++++++++++++++++++++++++++++++
> > > > > > > >   4 files changed, 52 insertions(+)
> > > > > > > >
> > > > > > > > diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> > > > > > > > index c9eafa67f2a2..3ccabede0f50 100644
> > > > > > > > --- a/include/linux/bpf.h
> > > > > > > > +++ b/include/linux/bpf.h
> > > > > > > > @@ -1319,6 +1319,7 @@ struct bpf_prog {
> > > > > > > >                             enforce_expected_attach_type:1, /* Enforce expected_attach_type checking at attach time */
> > > > > > > >                             call_get_stack:1, /* Do we call bpf_get_stack() or bpf_get_stackid() */
> > > > > > > >                             call_get_func_ip:1, /* Do we call get_func_ip() */
> > > > > > > > +                           call_printk:1, /* Do we call trace_printk/trace_vprintk  */
> > > > > > > >                             tstamp_type_access:1; /* Accessed __sk_buff->tstamp_type */
> > > > > > > >     enum bpf_prog_type      type;           /* Type of BPF program */
> > > > > > > >     enum bpf_attach_type    expected_attach_type; /* For some prog types */
> > > > > > > > diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h
> > > > > > > > index 545152ac136c..7118c2fda59d 100644
> > > > > > > > --- a/include/linux/bpf_verifier.h
> > > > > > > > +++ b/include/linux/bpf_verifier.h
> > > > > > > > @@ -618,6 +618,8 @@ bool is_dynptr_type_expected(struct bpf_verifier_env *env,
> > > > > > > >                          struct bpf_reg_state *reg,
> > > > > > > >                          enum bpf_arg_type arg_type);
> > > > > > > > +int bpf_check_tp_printk_denylist(const char *name, struct bpf_prog *prog);
> > > > > > > > +
> > > > > > > >   /* this lives here instead of in bpf.h because it needs to dereference tgt_prog */
> > > > > > > >   static inline u64 bpf_trampoline_compute_key(const struct bpf_prog *tgt_prog,
> > > > > > > >                                          struct btf *btf, u32 btf_id)
> > > > > > > > diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
> > > > > > > > index 35972afb6850..9a69bda7d62b 100644
> > > > > > > > --- a/kernel/bpf/syscall.c
> > > > > > > > +++ b/kernel/bpf/syscall.c
> > > > > > > > @@ -3329,6 +3329,9 @@ static int bpf_raw_tp_link_attach(struct bpf_prog *prog,
> > > > > > > >             return -EINVAL;
> > > > > > > >     }
> > > > > > > > +   if (bpf_check_tp_printk_denylist(tp_name, prog))
> > > > > > > > +           return -EACCES;
> > > > > > > > +
> > > > > > > >     btp = bpf_get_raw_tracepoint(tp_name);
> > > > > > > >     if (!btp)
> > > > > > > >             return -ENOENT;
> > > > > > > > diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> > > > > > > > index f07bec227fef..b662bc851e1c 100644
> > > > > > > > --- a/kernel/bpf/verifier.c
> > > > > > > > +++ b/kernel/bpf/verifier.c
> > > > > > > > @@ -7472,6 +7472,47 @@ static void update_loop_inline_state(struct bpf_verifier_env *env, u32 subprogno
> > > > > > > >                              state->callback_subprogno == subprogno);
> > > > > > > >   }
> > > > > > > > +int bpf_check_tp_printk_denylist(const char *name, struct bpf_prog *prog)
> > > > > > > > +{
> > > > > > > > +   static const char * const denylist[] = {
> > > > > > > > +           "contention_begin",
> > > > > > > > +           "bpf_trace_printk",
> > > > > > > > +   };
> > > > > > > > +   int i;
> > > > > > > > +
> > > > > > > > +   /* Do not allow attachment to denylist[] tracepoints,
> > > > > > > > +    * if the program calls some of the printk helpers,
> > > > > > > > +    * because there's possibility of deadlock.
> > > > > > > > +    */
> > > > > > >
> > > > > > > What if that prog doesn't but tail calls into another one which calls printk helpers?
> > > > > >
> > > > > > right, I'll deny that for all BPF_PROG_TYPE_RAW_TRACEPOINT* programs,
> > > > > > because I don't see easy way to check on that
> > > > > >
> > > > > > we can leave printk check for tracing BPF_TRACE_RAW_TP programs,
> > > > > > because verifier knows the exact tracepoint already
> > > > >
> > > > > This is all fragile and merely a stop gap.
> > > > > Doesn't sound like the issue is limited to bpf_trace_printk
> > > >
> > > > hm, I don't have a better idea how to fix that.. I can't deny
> > > > contention_begin completely, because we use it in perf via
> > > > tp_btf/contention_begin (perf lock contention) and I don't
> > > > think there's another way for perf to do that
> > > >
> > > > fwiw the last version below denies BPF_PROG_TYPE_RAW_TRACEPOINT
> > > > programs completely and tracing BPF_TRACE_RAW_TP with printks
> > > >
> > >
> > > I think disabling bpf_trace_printk() tracepoint for any BPF program is
> > > totally fine. This tracepoint was never intended to be attached to.
> > >
> > > But as for the general bpf_trace_printk() deadlocking. Should we
> > > discuss how to make it not deadlock instead of starting to denylist
> > > things left and right?
> > >
> > > Do I understand that we take trace_printk_lock only to protect that
> > > static char buf[]? Can we just make this buf per-CPU and do a trylock
> > > instead? We'll only fail to bpf_trace_printk() something if we have
> > > nested BPF programs (rare) or NMI (also rare).
> > >
> > > And it's a printk(), it's never mission-critical, so if we drop some
> > > message in rare case it's totally fine.
> >
> > What about contention_begin?  I wonder if we can disallow recursions
> > for those in the deny list like using bpf_prog_active..
>
> I was testing change below which allows to check recursion just
> for contention_begin tracepoint
>
> for the reported issue we might be ok with the change that Andrii
> suggested, but we could have the change below as extra precaution
>
> jirka
>
>
> ---
> diff --git a/include/linux/trace_events.h b/include/linux/trace_events.h
> index 20749bd9db71..1c89d4292374 100644
> --- a/include/linux/trace_events.h
> +++ b/include/linux/trace_events.h
> @@ -740,8 +740,8 @@ unsigned int trace_call_bpf(struct trace_event_call *call, void *ctx);
>  int perf_event_attach_bpf_prog(struct perf_event *event, struct bpf_prog *prog, u64 bpf_cookie);
>  void perf_event_detach_bpf_prog(struct perf_event *event);
>  int perf_event_query_prog_array(struct perf_event *event, void __user *info);
> -int bpf_probe_register(struct bpf_raw_event_map *btp, struct bpf_prog *prog);
> -int bpf_probe_unregister(struct bpf_raw_event_map *btp, struct bpf_prog *prog);
> +int bpf_probe_register(struct bpf_raw_event_map *btp, struct bpf_raw_event_data *data);
> +int bpf_probe_unregister(struct bpf_raw_event_map *btp, struct bpf_raw_event_data *data);
>  struct bpf_raw_event_map *bpf_get_raw_tracepoint(const char *name);
>  void bpf_put_raw_tracepoint(struct bpf_raw_event_map *btp);
>  int bpf_get_perf_event_info(const struct perf_event *event, u32 *prog_id,
> @@ -873,31 +873,31 @@ void *perf_trace_buf_alloc(int size, struct pt_regs **regs, int *rctxp);
>  int perf_event_set_bpf_prog(struct perf_event *event, struct bpf_prog *prog, u64 bpf_cookie);
>  void perf_event_free_bpf_prog(struct perf_event *event);
>
> -void bpf_trace_run1(struct bpf_prog *prog, u64 arg1);
> -void bpf_trace_run2(struct bpf_prog *prog, u64 arg1, u64 arg2);
> -void bpf_trace_run3(struct bpf_prog *prog, u64 arg1, u64 arg2,
> +void bpf_trace_run1(struct bpf_raw_event_data *data, u64 arg1);
> +void bpf_trace_run2(struct bpf_raw_event_data *data, u64 arg1, u64 arg2);
> +void bpf_trace_run3(struct bpf_raw_event_data *data, u64 arg1, u64 arg2,
>                     u64 arg3);
> -void bpf_trace_run4(struct bpf_prog *prog, u64 arg1, u64 arg2,
> +void bpf_trace_run4(struct bpf_raw_event_data *data, u64 arg1, u64 arg2,
>                     u64 arg3, u64 arg4);
> -void bpf_trace_run5(struct bpf_prog *prog, u64 arg1, u64 arg2,
> +void bpf_trace_run5(struct bpf_raw_event_data *data, u64 arg1, u64 arg2,
>                     u64 arg3, u64 arg4, u64 arg5);
> -void bpf_trace_run6(struct bpf_prog *prog, u64 arg1, u64 arg2,
> +void bpf_trace_run6(struct bpf_raw_event_data *data, u64 arg1, u64 arg2,
>                     u64 arg3, u64 arg4, u64 arg5, u64 arg6);
> -void bpf_trace_run7(struct bpf_prog *prog, u64 arg1, u64 arg2,
> +void bpf_trace_run7(struct bpf_raw_event_data *data, u64 arg1, u64 arg2,
>                     u64 arg3, u64 arg4, u64 arg5, u64 arg6, u64 arg7);
> -void bpf_trace_run8(struct bpf_prog *prog, u64 arg1, u64 arg2,
> +void bpf_trace_run8(struct bpf_raw_event_data *data, u64 arg1, u64 arg2,
>                     u64 arg3, u64 arg4, u64 arg5, u64 arg6, u64 arg7,
>                     u64 arg8);
> -void bpf_trace_run9(struct bpf_prog *prog, u64 arg1, u64 arg2,
> +void bpf_trace_run9(struct bpf_raw_event_data *data, u64 arg1, u64 arg2,
>                     u64 arg3, u64 arg4, u64 arg5, u64 arg6, u64 arg7,
>                     u64 arg8, u64 arg9);
> -void bpf_trace_run10(struct bpf_prog *prog, u64 arg1, u64 arg2,
> +void bpf_trace_run10(struct bpf_raw_event_data *data, u64 arg1, u64 arg2,
>                      u64 arg3, u64 arg4, u64 arg5, u64 arg6, u64 arg7,
>                      u64 arg8, u64 arg9, u64 arg10);
> -void bpf_trace_run11(struct bpf_prog *prog, u64 arg1, u64 arg2,
> +void bpf_trace_run11(struct bpf_raw_event_data *data, u64 arg1, u64 arg2,
>                      u64 arg3, u64 arg4, u64 arg5, u64 arg6, u64 arg7,
>                      u64 arg8, u64 arg9, u64 arg10, u64 arg11);
> -void bpf_trace_run12(struct bpf_prog *prog, u64 arg1, u64 arg2,
> +void bpf_trace_run12(struct bpf_raw_event_data *data, u64 arg1, u64 arg2,
>                      u64 arg3, u64 arg4, u64 arg5, u64 arg6, u64 arg7,
>                      u64 arg8, u64 arg9, u64 arg10, u64 arg11, u64 arg12);
>  void perf_trace_run_bpf_submit(void *raw_data, int size, int rctx,
> diff --git a/include/linux/tracepoint-defs.h b/include/linux/tracepoint-defs.h
> index e7c2276be33e..5312a8b149c0 100644
> --- a/include/linux/tracepoint-defs.h
> +++ b/include/linux/tracepoint-defs.h
> @@ -46,6 +46,11 @@ typedef const int tracepoint_ptr_t;
>  typedef struct tracepoint * const tracepoint_ptr_t;
>  #endif
>
> +struct bpf_raw_event_data {
> +       struct bpf_prog *prog;
> +       int __percpu *recursion;
> +};
> +
>  struct bpf_raw_event_map {
>         struct tracepoint       *tp;
>         void                    *bpf_func;
> diff --git a/include/trace/bpf_probe.h b/include/trace/bpf_probe.h
> index 6a13220d2d27..a8f9c3c7c447 100644
> --- a/include/trace/bpf_probe.h
> +++ b/include/trace/bpf_probe.h
> @@ -81,8 +81,8 @@
>  static notrace void                                                    \
>  __bpf_trace_##call(void *__data, proto)                                        \
>  {                                                                      \
> -       struct bpf_prog *prog = __data;                                 \
> -       CONCATENATE(bpf_trace_run, COUNT_ARGS(args))(prog, CAST_TO_U64(args));  \
> +       struct bpf_raw_event_data *____data = __data;                   \
> +       CONCATENATE(bpf_trace_run, COUNT_ARGS(args))(____data, CAST_TO_U64(args));      \
>  }
>
>  #undef DECLARE_EVENT_CLASS
> diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
> index 35972afb6850..5dcb32cd24e6 100644
> --- a/kernel/bpf/syscall.c
> +++ b/kernel/bpf/syscall.c
> @@ -3141,9 +3141,36 @@ static int bpf_tracing_prog_attach(struct bpf_prog *prog,
>         return err;
>  }
>
> +static bool needs_recursion_check(struct bpf_raw_event_map *btp)
> +{
> +       return !strcmp(btp->tp->name, "contention_begin");
> +}
> +
> +static int bpf_raw_event_data_init(struct bpf_raw_event_data *data,
> +                                  struct bpf_raw_event_map *btp,
> +                                  struct bpf_prog *prog)
> +{
> +       int __percpu *recursion = NULL;
> +
> +       if (needs_recursion_check(btp)) {
> +               recursion = alloc_percpu_gfp(int, GFP_KERNEL);
> +               if (!recursion)
> +                       return -ENOMEM;
> +       }
> +       data->recursion = recursion;
> +       data->prog = prog;
> +       return 0;
> +}
> +
> +static void bpf_raw_event_data_release(struct bpf_raw_event_data *data)
> +{
> +       free_percpu(data->recursion);
> +}
> +
>  struct bpf_raw_tp_link {
>         struct bpf_link link;
>         struct bpf_raw_event_map *btp;
> +       struct bpf_raw_event_data data;
>  };
>
>  static void bpf_raw_tp_link_release(struct bpf_link *link)
> @@ -3151,7 +3178,8 @@ static void bpf_raw_tp_link_release(struct bpf_link *link)
>         struct bpf_raw_tp_link *raw_tp =
>                 container_of(link, struct bpf_raw_tp_link, link);
>
> -       bpf_probe_unregister(raw_tp->btp, raw_tp->link.prog);
> +       bpf_probe_unregister(raw_tp->btp, &raw_tp->data);
> +       bpf_raw_event_data_release(&raw_tp->data);
>         bpf_put_raw_tracepoint(raw_tp->btp);
>  }
>
> @@ -3338,17 +3366,23 @@ static int bpf_raw_tp_link_attach(struct bpf_prog *prog,
>                 err = -ENOMEM;
>                 goto out_put_btp;
>         }
> +       if (bpf_raw_event_data_init(&link->data, btp, prog)) {
> +               err = -ENOMEM;
> +               kfree(link);
> +               goto out_put_btp;
> +       }
>         bpf_link_init(&link->link, BPF_LINK_TYPE_RAW_TRACEPOINT,
>                       &bpf_raw_tp_link_lops, prog);
>         link->btp = btp;
>
>         err = bpf_link_prime(&link->link, &link_primer);
>         if (err) {
> +               bpf_raw_event_data_release(&link->data);
>                 kfree(link);
>                 goto out_put_btp;
>         }
>
> -       err = bpf_probe_register(link->btp, prog);
> +       err = bpf_probe_register(link->btp, &link->data);
>         if (err) {
>                 bpf_link_cleanup(&link_primer);
>                 goto out_put_btp;
> diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
> index 3bbd3f0c810c..d27b7dc77894 100644
> --- a/kernel/trace/bpf_trace.c
> +++ b/kernel/trace/bpf_trace.c
> @@ -2252,9 +2252,8 @@ void bpf_put_raw_tracepoint(struct bpf_raw_event_map *btp)
>  }
>
>  static __always_inline
> -void __bpf_trace_run(struct bpf_prog *prog, u64 *args)
> +void __bpf_trace_prog_run(struct bpf_prog *prog, u64 *args)
>  {
> -       cant_sleep();
>         if (unlikely(this_cpu_inc_return(*(prog->active)) != 1)) {
>                 bpf_prog_inc_misses_counter(prog);
>                 goto out;
> @@ -2266,6 +2265,22 @@ void __bpf_trace_run(struct bpf_prog *prog, u64 *args)
>         this_cpu_dec(*(prog->active));
>  }
>
> +static __always_inline
> +void __bpf_trace_run(struct bpf_raw_event_data *data, u64 *args)
> +{
> +       struct bpf_prog *prog = data->prog;
> +
> +       cant_sleep();
> +       if (unlikely(!data->recursion))
> +               return __bpf_trace_prog_run(prog, args);
> +
> +       if (unlikely(this_cpu_inc_return(*(data->recursion)) != 1))
> +               goto out;
> +       __bpf_trace_prog_run(prog, args);
> +out:
> +       this_cpu_dec(*(data->recursion));
> +}

This is way too much run-time and memory overhead to address
this corner case. Pls come up with some other approach.
Sorry I don't have decent suggestions at the moment.
For now we can simply disallow attaching to contention_begin.


* Re: [PATCH bpf-next] bpf: Restrict attachment of bpf program to some tracepoints
  2022-12-06 20:09               ` Alexei Starovoitov
@ 2022-12-07  2:14                 ` Namhyung Kim
  2022-12-07  5:23                   ` Hao Sun
  2022-12-07  8:18                   ` Jiri Olsa
  0 siblings, 2 replies; 24+ messages in thread
From: Namhyung Kim @ 2022-12-07  2:14 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Jiri Olsa, Andrii Nakryiko, Daniel Borkmann, Alexei Starovoitov,
	Andrii Nakryiko, Hao Sun, bpf, Martin KaFai Lau, Song Liu,
	Yonghong Song, John Fastabend, KP Singh, Stanislav Fomichev,
	Hao Luo

On Tue, Dec 06, 2022 at 12:09:51PM -0800, Alexei Starovoitov wrote:
> On Mon, Dec 5, 2022 at 4:28 AM Jiri Olsa <olsajiri@gmail.com> wrote:
> > diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
> > index 3bbd3f0c810c..d27b7dc77894 100644
> > --- a/kernel/trace/bpf_trace.c
> > +++ b/kernel/trace/bpf_trace.c
> > @@ -2252,9 +2252,8 @@ void bpf_put_raw_tracepoint(struct bpf_raw_event_map *btp)
> >  }
> >
> >  static __always_inline
> > -void __bpf_trace_run(struct bpf_prog *prog, u64 *args)
> > +void __bpf_trace_prog_run(struct bpf_prog *prog, u64 *args)
> >  {
> > -       cant_sleep();
> >         if (unlikely(this_cpu_inc_return(*(prog->active)) != 1)) {
> >                 bpf_prog_inc_misses_counter(prog);
> >                 goto out;
> > @@ -2266,6 +2265,22 @@ void __bpf_trace_run(struct bpf_prog *prog, u64 *args)
> >         this_cpu_dec(*(prog->active));
> >  }
> >
> > +static __always_inline
> > +void __bpf_trace_run(struct bpf_raw_event_data *data, u64 *args)
> > +{
> > +       struct bpf_prog *prog = data->prog;
> > +
> > +       cant_sleep();
> > +       if (unlikely(!data->recursion))
> > +               return __bpf_trace_prog_run(prog, args);
> > +
> > +       if (unlikely(this_cpu_inc_return(*(data->recursion)) != 1))
> > +               goto out;
> > +       __bpf_trace_prog_run(prog, args);
> > +out:
> > +       this_cpu_dec(*(data->recursion));
> > +}
> 
> This is way too much run-time and memory overhead to address
> this corner case. Pls come up with some other approach.
> Sorry I don't have decent suggestions at the moment.
> For now we can simply disallow attaching to contention_begin.
> 

How about this?  It seems to work for me.

Thanks,
Namhyung

---
 include/linux/trace_events.h    | 14 +++++++
 include/linux/tracepoint-defs.h |  5 +++
 kernel/bpf/syscall.c            | 18 ++++++++-
 kernel/trace/bpf_trace.c        | 65 ++++++++++++++++++++++++++++++---
 4 files changed, 95 insertions(+), 7 deletions(-)

diff --git a/include/linux/trace_events.h b/include/linux/trace_events.h
index 20749bd9db71..461468210a77 100644
--- a/include/linux/trace_events.h
+++ b/include/linux/trace_events.h
@@ -742,6 +742,10 @@ void perf_event_detach_bpf_prog(struct perf_event *event);
 int perf_event_query_prog_array(struct perf_event *event, void __user *info);
 int bpf_probe_register(struct bpf_raw_event_map *btp, struct bpf_prog *prog);
 int bpf_probe_unregister(struct bpf_raw_event_map *btp, struct bpf_prog *prog);
+int bpf_probe_register_norecurse(struct bpf_raw_event_map *btp, struct bpf_prog *prog,
+				 struct bpf_raw_event_data *data);
+int bpf_probe_unregister_norecurse(struct bpf_raw_event_map *btp,
+				   struct bpf_raw_event_data *data);
 struct bpf_raw_event_map *bpf_get_raw_tracepoint(const char *name);
 void bpf_put_raw_tracepoint(struct bpf_raw_event_map *btp);
 int bpf_get_perf_event_info(const struct perf_event *event, u32 *prog_id,
@@ -775,6 +779,16 @@ static inline int bpf_probe_unregister(struct bpf_raw_event_map *btp, struct bpf
 {
 	return -EOPNOTSUPP;
 }
+static inline int bpf_probe_register_norecurse(struct bpf_raw_event_map *btp, struct bpf_prog *p,
+					       struct bpf_raw_event_data *data)
+{
+	return -EOPNOTSUPP;
+}
+static inline int bpf_probe_unregister_norecurse(struct bpf_raw_event_map *btp,
+						 struct bpf_raw_event_data *data)
+{
+	return -EOPNOTSUPP;
+}
 static inline struct bpf_raw_event_map *bpf_get_raw_tracepoint(const char *name)
 {
 	return NULL;
diff --git a/include/linux/tracepoint-defs.h b/include/linux/tracepoint-defs.h
index e7c2276be33e..e5adfe606888 100644
--- a/include/linux/tracepoint-defs.h
+++ b/include/linux/tracepoint-defs.h
@@ -53,6 +53,11 @@ struct bpf_raw_event_map {
 	u32			writable_size;
 } __aligned(32);
 
+struct bpf_raw_event_data {
+	struct bpf_prog		*prog;
+	int __percpu		*active;
+};
+
 /*
  * If a tracepoint needs to be called from a header file, it is not
  * recommended to call it directly, as tracepoints in header files
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 35972afb6850..a8be9c443306 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -3144,14 +3144,24 @@ static int bpf_tracing_prog_attach(struct bpf_prog *prog,
 struct bpf_raw_tp_link {
 	struct bpf_link link;
 	struct bpf_raw_event_map *btp;
+	struct bpf_raw_event_data data;
 };
 
+static bool needs_recursion_check(struct bpf_raw_event_map *btp)
+{
+	return !strcmp(btp->tp->name, "contention_begin");
+}
+
 static void bpf_raw_tp_link_release(struct bpf_link *link)
 {
 	struct bpf_raw_tp_link *raw_tp =
 		container_of(link, struct bpf_raw_tp_link, link);
 
-	bpf_probe_unregister(raw_tp->btp, raw_tp->link.prog);
+	if (needs_recursion_check(raw_tp->btp))
+		bpf_probe_unregister_norecurse(raw_tp->btp, &raw_tp->data);
+	else
+		bpf_probe_unregister(raw_tp->btp, raw_tp->link.prog);
+
 	bpf_put_raw_tracepoint(raw_tp->btp);
 }
 
@@ -3348,7 +3358,11 @@ static int bpf_raw_tp_link_attach(struct bpf_prog *prog,
 		goto out_put_btp;
 	}
 
-	err = bpf_probe_register(link->btp, prog);
+	if (needs_recursion_check(link->btp))
+		err = bpf_probe_register_norecurse(link->btp, prog, &link->data);
+	else
+		err = bpf_probe_register(link->btp, prog);
+
 	if (err) {
 		bpf_link_cleanup(&link_primer);
 		goto out_put_btp;
diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
index 3bbd3f0c810c..edbfeff029aa 100644
--- a/kernel/trace/bpf_trace.c
+++ b/kernel/trace/bpf_trace.c
@@ -2297,7 +2297,20 @@ void __bpf_trace_run(struct bpf_prog *prog, u64 *args)
 		REPEAT(x, COPY, __DL_SEM, __SEQ_0_11);			\
 		__bpf_trace_run(prog, args);				\
 	}								\
-	EXPORT_SYMBOL_GPL(bpf_trace_run##x)
+	EXPORT_SYMBOL_GPL(bpf_trace_run##x);				\
+									\
+	static void bpf_trace_run_norecurse##x(struct bpf_raw_event_data *data,	\
+			      REPEAT(x, SARG, __DL_COM, __SEQ_0_11))	\
+	{								\
+		u64 args[x];						\
+		if (unlikely(this_cpu_inc_return(*(data->active)) != 1)) \
+			goto out;					\
+		REPEAT(x, COPY, __DL_SEM, __SEQ_0_11);			\
+		__bpf_trace_run(data->prog, args);			\
+	out:								\
+		this_cpu_dec(*(data->active));				\
+	}
+
 BPF_TRACE_DEFN_x(1);
 BPF_TRACE_DEFN_x(2);
 BPF_TRACE_DEFN_x(3);
@@ -2311,7 +2324,23 @@ BPF_TRACE_DEFN_x(10);
 BPF_TRACE_DEFN_x(11);
 BPF_TRACE_DEFN_x(12);
 
-static int __bpf_probe_register(struct bpf_raw_event_map *btp, struct bpf_prog *prog)
+static void *bpf_trace_norecurse_funcs[12] = {
+	(void *)bpf_trace_run_norecurse1,
+	(void *)bpf_trace_run_norecurse2,
+	(void *)bpf_trace_run_norecurse3,
+	(void *)bpf_trace_run_norecurse4,
+	(void *)bpf_trace_run_norecurse5,
+	(void *)bpf_trace_run_norecurse6,
+	(void *)bpf_trace_run_norecurse7,
+	(void *)bpf_trace_run_norecurse8,
+	(void *)bpf_trace_run_norecurse9,
+	(void *)bpf_trace_run_norecurse10,
+	(void *)bpf_trace_run_norecurse11,
+	(void *)bpf_trace_run_norecurse12,
+};
+
+static int __bpf_probe_register(struct bpf_raw_event_map *btp, struct bpf_prog *prog,
+				void *func, void *data)
 {
 	struct tracepoint *tp = btp->tp;
 
@@ -2325,13 +2354,12 @@ static int __bpf_probe_register(struct bpf_raw_event_map *btp, struct bpf_prog *
 	if (prog->aux->max_tp_access > btp->writable_size)
 		return -EINVAL;
 
-	return tracepoint_probe_register_may_exist(tp, (void *)btp->bpf_func,
-						   prog);
+	return tracepoint_probe_register_may_exist(tp, func, data);
 }
 
 int bpf_probe_register(struct bpf_raw_event_map *btp, struct bpf_prog *prog)
 {
-	return __bpf_probe_register(btp, prog);
+	return __bpf_probe_register(btp, prog, btp->bpf_func, prog);
 }
 
 int bpf_probe_unregister(struct bpf_raw_event_map *btp, struct bpf_prog *prog)
@@ -2339,6 +2367,33 @@ int bpf_probe_unregister(struct bpf_raw_event_map *btp, struct bpf_prog *prog)
 	return tracepoint_probe_unregister(btp->tp, (void *)btp->bpf_func, prog);
 }
 
+int bpf_probe_register_norecurse(struct bpf_raw_event_map *btp, struct bpf_prog *prog,
+				 struct bpf_raw_event_data *data)
+{
+	void *bpf_func;
+
+	data->active = alloc_percpu_gfp(int, GFP_KERNEL);
+	if (!data->active)
+		return -ENOMEM;
+
+	data->prog = prog;
> +	bpf_func = bpf_trace_norecurse_funcs[btp->num_args - 1];
+	return __bpf_probe_register(btp, prog, bpf_func, data);
+}
+
+int bpf_probe_unregister_norecurse(struct bpf_raw_event_map *btp,
+				   struct bpf_raw_event_data *data)
+{
+	int err;
+	void *bpf_func;
+
> +	bpf_func = bpf_trace_norecurse_funcs[btp->num_args - 1];
+	err = tracepoint_probe_unregister(btp->tp, bpf_func, data);
+	free_percpu(data->active);
+
+	return err;
+}
+
 int bpf_get_perf_event_info(const struct perf_event *event, u32 *prog_id,
 			    u32 *fd_type, const char **buf,
 			    u64 *probe_offset, u64 *probe_addr)
-- 
2.39.0.rc0.267.gcb52ba06e7-goog



* Re: [PATCH bpf-next] bpf: Restrict attachment of bpf program to some tracepoints
  2022-12-07  2:14                 ` Namhyung Kim
@ 2022-12-07  5:23                   ` Hao Sun
  2022-12-07 22:58                     ` Namhyung Kim
  2022-12-07  8:18                   ` Jiri Olsa
  1 sibling, 1 reply; 24+ messages in thread
From: Hao Sun @ 2022-12-07  5:23 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Alexei Starovoitov, Jiri Olsa, Andrii Nakryiko, Daniel Borkmann,
	Alexei Starovoitov, Andrii Nakryiko, bpf, Martin KaFai Lau,
	Song Liu, Yonghong Song, John Fastabend, KP Singh,
	Stanislav Fomichev, Hao Luo



> On 7 Dec 2022, at 10:14 AM, Namhyung Kim <namhyung@gmail.com> wrote:
> 
> On Tue, Dec 06, 2022 at 12:09:51PM -0800, Alexei Starovoitov wrote:
>> On Mon, Dec 5, 2022 at 4:28 AM Jiri Olsa <olsajiri@gmail.com> wrote:
>>> diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
>>> index 3bbd3f0c810c..d27b7dc77894 100644
>>> --- a/kernel/trace/bpf_trace.c
>>> +++ b/kernel/trace/bpf_trace.c
>>> @@ -2252,9 +2252,8 @@ void bpf_put_raw_tracepoint(struct bpf_raw_event_map *btp)
>>> }
>>> 
>>> static __always_inline
>>> -void __bpf_trace_run(struct bpf_prog *prog, u64 *args)
>>> +void __bpf_trace_prog_run(struct bpf_prog *prog, u64 *args)
>>> {
>>> -       cant_sleep();
>>>        if (unlikely(this_cpu_inc_return(*(prog->active)) != 1)) {
>>>                bpf_prog_inc_misses_counter(prog);
>>>                goto out;
>>> @@ -2266,6 +2265,22 @@ void __bpf_trace_run(struct bpf_prog *prog, u64 *args)
>>>        this_cpu_dec(*(prog->active));
>>> }
>>> 
>>> +static __always_inline
>>> +void __bpf_trace_run(struct bpf_raw_event_data *data, u64 *args)
>>> +{
>>> +       struct bpf_prog *prog = data->prog;
>>> +
>>> +       cant_sleep();
>>> +       if (unlikely(!data->recursion))
>>> +               return __bpf_trace_prog_run(prog, args);
>>> +
>>> +       if (unlikely(this_cpu_inc_return(*(data->recursion)) != 1))
>>> +               goto out;
>>> +       __bpf_trace_prog_run(prog, args);
>>> +out:
>>> +       this_cpu_dec(*(data->recursion));
>>> +}
>> 
>> This is way too much run-time and memory overhead to address
>> this corner case. Pls come up with some other approach.
>> Sorry I don't have decent suggestions at the moment.
>> For now we can simply disallow attaching to contention_begin.
>> 
> 
> How about this?  It seems to work for me.

How about progs that are attached via kprobe?
See this one:
https://lore.kernel.org/bpf/CACkBjsb3GRw5aiTT=RCUs3H5aum_QN+B0ZqZA=MvjspUP6NFMg@mail.gmail.com/T/#u

SNIP




* Re: [PATCH bpf-next] bpf: Restrict attachment of bpf program to some tracepoints
  2022-12-07  2:14                 ` Namhyung Kim
  2022-12-07  5:23                   ` Hao Sun
@ 2022-12-07  8:18                   ` Jiri Olsa
  2022-12-07 19:08                     ` Namhyung Kim
  1 sibling, 1 reply; 24+ messages in thread
From: Jiri Olsa @ 2022-12-07  8:18 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Alexei Starovoitov, Jiri Olsa, Andrii Nakryiko, Daniel Borkmann,
	Alexei Starovoitov, Andrii Nakryiko, Hao Sun, bpf,
	Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend,
	KP Singh, Stanislav Fomichev, Hao Luo

On Tue, Dec 06, 2022 at 06:14:06PM -0800, Namhyung Kim wrote:

SNIP

> -static int __bpf_probe_register(struct bpf_raw_event_map *btp, struct bpf_prog *prog)
> +static void *bpf_trace_norecurse_funcs[12] = {
> +	(void *)bpf_trace_run_norecurse1,
> +	(void *)bpf_trace_run_norecurse2,
> +	(void *)bpf_trace_run_norecurse3,
> +	(void *)bpf_trace_run_norecurse4,
> +	(void *)bpf_trace_run_norecurse5,
> +	(void *)bpf_trace_run_norecurse6,
> +	(void *)bpf_trace_run_norecurse7,
> +	(void *)bpf_trace_run_norecurse8,
> +	(void *)bpf_trace_run_norecurse9,
> +	(void *)bpf_trace_run_norecurse10,
> +	(void *)bpf_trace_run_norecurse11,
> +	(void *)bpf_trace_run_norecurse12,
> +};
> +
> +static int __bpf_probe_register(struct bpf_raw_event_map *btp, struct bpf_prog *prog,
> +				void *func, void *data)
>  {
>  	struct tracepoint *tp = btp->tp;
>  
> @@ -2325,13 +2354,12 @@ static int __bpf_probe_register(struct bpf_raw_event_map *btp, struct bpf_prog *
>  	if (prog->aux->max_tp_access > btp->writable_size)
>  		return -EINVAL;
>  
> -	return tracepoint_probe_register_may_exist(tp, (void *)btp->bpf_func,
> -						   prog);
> +	return tracepoint_probe_register_may_exist(tp, func, data);
>  }
>  
>  int bpf_probe_register(struct bpf_raw_event_map *btp, struct bpf_prog *prog)
>  {
> -	return __bpf_probe_register(btp, prog);
> +	return __bpf_probe_register(btp, prog, btp->bpf_func, prog);
>  }
>  
>  int bpf_probe_unregister(struct bpf_raw_event_map *btp, struct bpf_prog *prog)
> @@ -2339,6 +2367,33 @@ int bpf_probe_unregister(struct bpf_raw_event_map *btp, struct bpf_prog *prog)
>  	return tracepoint_probe_unregister(btp->tp, (void *)btp->bpf_func, prog);
>  }
>  
> +int bpf_probe_register_norecurse(struct bpf_raw_event_map *btp, struct bpf_prog *prog,
> +				 struct bpf_raw_event_data *data)
> +{
> +	void *bpf_func;
> +
> +	data->active = alloc_percpu_gfp(int, GFP_KERNEL);
> +	if (!data->active)
> +		return -ENOMEM;
> +
> +	data->prog = prog;
> +	bpf_func = bpf_trace_norecurse_funcs[btp->num_args - 1];
> +	return __bpf_probe_register(btp, prog, bpf_func, data);

I don't think we can do that, because it won't do the arg -> u64 conversion
that __bpf_trace_##call functions are doing:

	__bpf_trace_##call(void *__data, proto)                                 \
	{                                                                       \
		struct bpf_prog *prog = __data;                                 \
		CONCATENATE(bpf_trace_run, COUNT_ARGS(args))(prog, CAST_TO_U64(args));  \
	}

like for 'old_pid' arg in sched_process_exec tracepoint:

	ffffffff811959e0 <__bpf_trace_sched_process_exec>:
	ffffffff811959e0:       89 d2                   mov    %edx,%edx
	ffffffff811959e2:       e9 a9 07 14 00          jmp    ffffffff812d6190 <bpf_trace_run3>
	ffffffff811959e7:       66 0f 1f 84 00 00 00    nopw   0x0(%rax,%rax,1)
	ffffffff811959ee:       00 00

the bpf program could see trash in the upper bits of any arg narrower than u64

we'd need to add a 'recursion' variant for all the __bpf_trace_##call functions

jirka



SNIP


* Re: [PATCH bpf-next] bpf: Restrict attachment of bpf program to some tracepoints
  2022-12-04 21:44           ` Jiri Olsa
@ 2022-12-07 13:39             ` Jiri Olsa
  2022-12-07 19:10               ` Alexei Starovoitov
  2022-12-08  2:47               ` Hao Sun
  0 siblings, 2 replies; 24+ messages in thread
From: Jiri Olsa @ 2022-12-07 13:39 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: Andrii Nakryiko, Alexei Starovoitov, Daniel Borkmann,
	Alexei Starovoitov, Andrii Nakryiko, Hao Sun, bpf,
	Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend,
	KP Singh, Stanislav Fomichev, Hao Luo

On Sun, Dec 04, 2022 at 10:44:52PM +0100, Jiri Olsa wrote:
> On Wed, Nov 30, 2022 at 03:29:39PM -0800, Andrii Nakryiko wrote:
> > On Fri, Nov 25, 2022 at 1:35 AM Jiri Olsa <olsajiri@gmail.com> wrote:
> > >
> > > On Thu, Nov 24, 2022 at 09:17:22AM -0800, Alexei Starovoitov wrote:
> > > > On Thu, Nov 24, 2022 at 1:42 AM Jiri Olsa <olsajiri@gmail.com> wrote:
> > > > >
> > > > > On Thu, Nov 24, 2022 at 01:41:23AM +0100, Daniel Borkmann wrote:
> > > > > > On 11/21/22 10:31 PM, Jiri Olsa wrote:
> > > > > > > We hit following issues [1] [2] when we attach bpf program that calls
> > > > > > > bpf_trace_printk helper to the contention_begin tracepoint.
> > > > > > >
> > > > > > > As described in [3] with multiple bpf programs that call bpf_trace_printk
> > > > > > > helper attached to the contention_begin might result in exhaustion of
> > > > > > > printk buffer or cause a deadlock [2].
> > > > > > >
> > > > > > > There's also another possible deadlock when multiple bpf programs attach
> > > > > > > to bpf_trace_printk tracepoint and call one of the printk bpf helpers.
> > > > > > >
> > > > > > > This change denies the attachment of bpf program to contention_begin
> > > > > > > and bpf_trace_printk tracepoints if the bpf program calls one of the
> > > > > > > printk bpf helpers.
> > > > > > >
> > > > > > > Adding also verifier check for tb_btf programs, so this can be caught
> > > > > > > in program loading time with error message like:
> > > > > > >
> > > > > > >    Can't attach program with bpf_trace_printk#6 helper to contention_begin tracepoint.
> > > > > > >
> > > > > > > [1] https://lore.kernel.org/bpf/CACkBjsakT_yWxnSWr4r-0TpPvbKm9-OBmVUhJb7hV3hY8fdCkw@mail.gmail.com/
> > > > > > > [2] https://lore.kernel.org/bpf/CACkBjsaCsTovQHFfkqJKto6S4Z8d02ud1D7MPESrHa1cVNNTrw@mail.gmail.com/
> > > > > > > [3] https://lore.kernel.org/bpf/Y2j6ivTwFmA0FtvY@krava/
> > > > > > >
> > > > > > > Reported-by: Hao Sun <sunhao.th@gmail.com>
> > > > > > > Suggested-by: Alexei Starovoitov <ast@kernel.org>
> > > > > > > Signed-off-by: Jiri Olsa <jolsa@kernel.org>
> > > > > > > ---
> > > > > > >   include/linux/bpf.h          |  1 +
> > > > > > >   include/linux/bpf_verifier.h |  2 ++
> > > > > > >   kernel/bpf/syscall.c         |  3 +++
> > > > > > >   kernel/bpf/verifier.c        | 46 ++++++++++++++++++++++++++++++++++++
> > > > > > >   4 files changed, 52 insertions(+)
> > > > > > >
> > > > > > > diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> > > > > > > index c9eafa67f2a2..3ccabede0f50 100644
> > > > > > > --- a/include/linux/bpf.h
> > > > > > > +++ b/include/linux/bpf.h
> > > > > > > @@ -1319,6 +1319,7 @@ struct bpf_prog {
> > > > > > >                             enforce_expected_attach_type:1, /* Enforce expected_attach_type checking at attach time */
> > > > > > >                             call_get_stack:1, /* Do we call bpf_get_stack() or bpf_get_stackid() */
> > > > > > >                             call_get_func_ip:1, /* Do we call get_func_ip() */
> > > > > > > +                           call_printk:1, /* Do we call trace_printk/trace_vprintk  */
> > > > > > >                             tstamp_type_access:1; /* Accessed __sk_buff->tstamp_type */
> > > > > > >     enum bpf_prog_type      type;           /* Type of BPF program */
> > > > > > >     enum bpf_attach_type    expected_attach_type; /* For some prog types */
> > > > > > > diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h
> > > > > > > index 545152ac136c..7118c2fda59d 100644
> > > > > > > --- a/include/linux/bpf_verifier.h
> > > > > > > +++ b/include/linux/bpf_verifier.h
> > > > > > > @@ -618,6 +618,8 @@ bool is_dynptr_type_expected(struct bpf_verifier_env *env,
> > > > > > >                          struct bpf_reg_state *reg,
> > > > > > >                          enum bpf_arg_type arg_type);
> > > > > > > +int bpf_check_tp_printk_denylist(const char *name, struct bpf_prog *prog);
> > > > > > > +
> > > > > > >   /* this lives here instead of in bpf.h because it needs to dereference tgt_prog */
> > > > > > >   static inline u64 bpf_trampoline_compute_key(const struct bpf_prog *tgt_prog,
> > > > > > >                                          struct btf *btf, u32 btf_id)
> > > > > > > diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
> > > > > > > index 35972afb6850..9a69bda7d62b 100644
> > > > > > > --- a/kernel/bpf/syscall.c
> > > > > > > +++ b/kernel/bpf/syscall.c
> > > > > > > @@ -3329,6 +3329,9 @@ static int bpf_raw_tp_link_attach(struct bpf_prog *prog,
> > > > > > >             return -EINVAL;
> > > > > > >     }
> > > > > > > +   if (bpf_check_tp_printk_denylist(tp_name, prog))
> > > > > > > +           return -EACCES;
> > > > > > > +
> > > > > > >     btp = bpf_get_raw_tracepoint(tp_name);
> > > > > > >     if (!btp)
> > > > > > >             return -ENOENT;
> > > > > > > diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> > > > > > > index f07bec227fef..b662bc851e1c 100644
> > > > > > > --- a/kernel/bpf/verifier.c
> > > > > > > +++ b/kernel/bpf/verifier.c
> > > > > > > @@ -7472,6 +7472,47 @@ static void update_loop_inline_state(struct bpf_verifier_env *env, u32 subprogno
> > > > > > >                              state->callback_subprogno == subprogno);
> > > > > > >   }
> > > > > > > +int bpf_check_tp_printk_denylist(const char *name, struct bpf_prog *prog)
> > > > > > > +{
> > > > > > > +   static const char * const denylist[] = {
> > > > > > > +           "contention_begin",
> > > > > > > +           "bpf_trace_printk",
> > > > > > > +   };
> > > > > > > +   int i;
> > > > > > > +
> > > > > > > +   /* Do not allow attachment to denylist[] tracepoints,
> > > > > > > +    * if the program calls some of the printk helpers,
> > > > > > > +    * because there's possibility of deadlock.
> > > > > > > +    */
> > > > > >
> > > > > > What if that prog doesn't but tail calls into another one which calls printk helpers?
> > > > >
> > > > > right, I'll deny that for all BPF_PROG_TYPE_RAW_TRACEPOINT* programs,
> > > > > because I don't see an easy way to check for that
> > > > >
> > > > > we can keep the printk check for tracing BPF_TRACE_RAW_TP programs,
> > > > > because the verifier already knows the exact tracepoint
> > > >
> > > > This is all fragile and merely a stop gap.
> > > > Doesn't sound like the issue is limited to bpf_trace_printk
> > >
> > > hm, I don't have a better idea how to fix that.. I can't deny
> > > contention_begin completely, because we use it in perf via
> > > tp_btf/contention_begin (perf lock contention) and I don't
> > > think there's another way for perf to do that
> > >
> > > fwiw the last version below denies BPF_PROG_TYPE_RAW_TRACEPOINT
> > > programs completely and tracing BPF_TRACE_RAW_TP with printks
> > >
> > 
> > I think disabling bpf_trace_printk() tracepoint for any BPF program is
> > totally fine. This tracepoint was never intended to be attached to.
> > 
> > But as for the general bpf_trace_printk() deadlocking. Should we
> > discuss how to make it not deadlock instead of starting to denylist
> > things left and right?
> > 
> > Do I understand that we take trace_printk_lock only to protect that
> > static char buf[]? Can we just make this buf per-CPU and do a trylock
> > instead? We'll only fail to bpf_trace_printk() something if we have
> > nested BPF programs (rare) or NMI (also rare).
> 
> ugh, sorry I overlooked your reply :-\
> 
> sounds good.. if it'd be acceptable to use trylock, we'd get rid of the
> contention_begin tracepoint being triggered, which was what caused the deadlock

looks like we can remove the spinlock completely by using the
same nested-level buffer approach as in bpf_bprintf_prepare

it gets rid of the contention_begin tracepoint, so I'm no longer
able to trigger the issue in my test

jirka


---
diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
index 3bbd3f0c810c..d6afde7311f8 100644
--- a/kernel/trace/bpf_trace.c
+++ b/kernel/trace/bpf_trace.c
@@ -369,33 +369,60 @@ static const struct bpf_func_proto *bpf_get_probe_write_proto(void)
 	return &bpf_probe_write_user_proto;
 }
 
-static DEFINE_RAW_SPINLOCK(trace_printk_lock);
-
 #define MAX_TRACE_PRINTK_VARARGS	3
 #define BPF_TRACE_PRINTK_SIZE		1024
+#define BPF_TRACE_PRINTK_NEST		3
+
+struct trace_printk_buf {
+	char data[BPF_TRACE_PRINTK_NEST][BPF_TRACE_PRINTK_SIZE];
+	int nest;
+};
+static DEFINE_PER_CPU(struct trace_printk_buf, printk_buf);
+
+static void put_printk_buf(struct trace_printk_buf __percpu *buf)
+{
+	this_cpu_dec(buf->nest);
+	preempt_enable();
+}
+
+static bool get_printk_buf(struct trace_printk_buf __percpu *buf, char **data)
+{
+	int nest;
+
+	preempt_disable();
+	nest = this_cpu_inc_return(buf->nest);
+	if (nest > BPF_TRACE_PRINTK_NEST) {
+		put_printk_buf(buf);
+		return false;
+	}
+	*data = (char *) this_cpu_ptr(&buf->data[nest - 1]);
+	return true;
+}
 
 BPF_CALL_5(bpf_trace_printk, char *, fmt, u32, fmt_size, u64, arg1,
 	   u64, arg2, u64, arg3)
 {
 	u64 args[MAX_TRACE_PRINTK_VARARGS] = { arg1, arg2, arg3 };
 	u32 *bin_args;
-	static char buf[BPF_TRACE_PRINTK_SIZE];
-	unsigned long flags;
+	char *buf;
 	int ret;
 
+	if (!get_printk_buf(&printk_buf, &buf))
+		return -EBUSY;
+
 	ret = bpf_bprintf_prepare(fmt, fmt_size, args, &bin_args,
 				  MAX_TRACE_PRINTK_VARARGS);
 	if (ret < 0)
-		return ret;
+		goto out;
 
-	raw_spin_lock_irqsave(&trace_printk_lock, flags);
-	ret = bstr_printf(buf, sizeof(buf), fmt, bin_args);
+	ret = bstr_printf(buf, BPF_TRACE_PRINTK_SIZE, fmt, bin_args);
 
 	trace_bpf_trace_printk(buf);
-	raw_spin_unlock_irqrestore(&trace_printk_lock, flags);
 
 	bpf_bprintf_cleanup();
 
+out:
+	put_printk_buf(&printk_buf);
 	return ret;
 }
 
@@ -427,31 +454,35 @@ const struct bpf_func_proto *bpf_get_trace_printk_proto(void)
 	return &bpf_trace_printk_proto;
 }
 
+static DEFINE_PER_CPU(struct trace_printk_buf, vprintk_buf);
+
 BPF_CALL_4(bpf_trace_vprintk, char *, fmt, u32, fmt_size, const void *, data,
 	   u32, data_len)
 {
-	static char buf[BPF_TRACE_PRINTK_SIZE];
-	unsigned long flags;
 	int ret, num_args;
 	u32 *bin_args;
+	char *buf;
 
 	if (data_len & 7 || data_len > MAX_BPRINTF_VARARGS * 8 ||
 	    (data_len && !data))
 		return -EINVAL;
 	num_args = data_len / 8;
 
+	if (!get_printk_buf(&vprintk_buf, &buf))
+		return -EBUSY;
+
 	ret = bpf_bprintf_prepare(fmt, fmt_size, data, &bin_args, num_args);
 	if (ret < 0)
-		return ret;
+		goto out;
 
-	raw_spin_lock_irqsave(&trace_printk_lock, flags);
-	ret = bstr_printf(buf, sizeof(buf), fmt, bin_args);
+	ret = bstr_printf(buf, BPF_TRACE_PRINTK_SIZE, fmt, bin_args);
 
 	trace_bpf_trace_printk(buf);
-	raw_spin_unlock_irqrestore(&trace_printk_lock, flags);
 
 	bpf_bprintf_cleanup();
 
+out:
+	put_printk_buf(&vprintk_buf);
 	return ret;
 }
 

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* Re: [PATCH bpf-next] bpf: Restrict attachment of bpf program to some tracepoints
  2022-12-07  8:18                   ` Jiri Olsa
@ 2022-12-07 19:08                     ` Namhyung Kim
  2022-12-08  6:15                       ` Namhyung Kim
  0 siblings, 1 reply; 24+ messages in thread
From: Namhyung Kim @ 2022-12-07 19:08 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Alexei Starovoitov, Andrii Nakryiko, Hao Sun, bpf,
	Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend,
	KP Singh, Stanislav Fomichev, Hao Luo

On Wed, Dec 7, 2022 at 12:18 AM Jiri Olsa <olsajiri@gmail.com> wrote:
>
> On Tue, Dec 06, 2022 at 06:14:06PM -0800, Namhyung Kim wrote:
>
> SNIP
>
> > -static int __bpf_probe_register(struct bpf_raw_event_map *btp, struct bpf_prog *prog)
> > +static void *bpf_trace_norecurse_funcs[12] = {
> > +     (void *)bpf_trace_run_norecurse1,
> > +     (void *)bpf_trace_run_norecurse2,
> > +     (void *)bpf_trace_run_norecurse3,
> > +     (void *)bpf_trace_run_norecurse4,
> > +     (void *)bpf_trace_run_norecurse5,
> > +     (void *)bpf_trace_run_norecurse6,
> > +     (void *)bpf_trace_run_norecurse7,
> > +     (void *)bpf_trace_run_norecurse8,
> > +     (void *)bpf_trace_run_norecurse9,
> > +     (void *)bpf_trace_run_norecurse10,
> > +     (void *)bpf_trace_run_norecurse11,
> > +     (void *)bpf_trace_run_norecurse12,
> > +};
> > +
> > +static int __bpf_probe_register(struct bpf_raw_event_map *btp, struct bpf_prog *prog,
> > +                             void *func, void *data)
> >  {
> >       struct tracepoint *tp = btp->tp;
> >
> > @@ -2325,13 +2354,12 @@ static int __bpf_probe_register(struct bpf_raw_event_map *btp, struct bpf_prog *
> >       if (prog->aux->max_tp_access > btp->writable_size)
> >               return -EINVAL;
> >
> > -     return tracepoint_probe_register_may_exist(tp, (void *)btp->bpf_func,
> > -                                                prog);
> > +     return tracepoint_probe_register_may_exist(tp, func, data);
> >  }
> >
> >  int bpf_probe_register(struct bpf_raw_event_map *btp, struct bpf_prog *prog)
> >  {
> > -     return __bpf_probe_register(btp, prog);
> > +     return __bpf_probe_register(btp, prog, btp->bpf_func, prog);
> >  }
> >
> >  int bpf_probe_unregister(struct bpf_raw_event_map *btp, struct bpf_prog *prog)
> > @@ -2339,6 +2367,33 @@ int bpf_probe_unregister(struct bpf_raw_event_map *btp, struct bpf_prog *prog)
> >       return tracepoint_probe_unregister(btp->tp, (void *)btp->bpf_func, prog);
> >  }
> >
> > +int bpf_probe_register_norecurse(struct bpf_raw_event_map *btp, struct bpf_prog *prog,
> > +                              struct bpf_raw_event_data *data)
> > +{
> > +     void *bpf_func;
> > +
> > +     data->active = alloc_percpu_gfp(int, GFP_KERNEL);
> > +     if (!data->active)
> > +             return -ENOMEM;
> > +
> > +     data->prog = prog;
> > +     bpf_func = bpf_trace_norecurse_funcs[btp->num_args];
> > +     return __bpf_probe_register(btp, prog, bpf_func, data);
>
> I don't think we can do that, because it won't do the arg -> u64 conversion
> that __bpf_trace_##call functions are doing:
>
>         __bpf_trace_##call(void *__data, proto)                                 \
>         {                                                                       \
>                 struct bpf_prog *prog = __data;                                 \
>                 CONCATENATE(bpf_trace_run, COUNT_ARGS(args))(prog, CAST_TO_U64(args));  \
>         }
>
> like for 'old_pid' arg in sched_process_exec tracepoint:
>
>         ffffffff811959e0 <__bpf_trace_sched_process_exec>:
>         ffffffff811959e0:       89 d2                   mov    %edx,%edx
>         ffffffff811959e2:       e9 a9 07 14 00          jmp    ffffffff812d6190 <bpf_trace_run3>
>         ffffffff811959e7:       66 0f 1f 84 00 00 00    nopw   0x0(%rax,%rax,1)
>         ffffffff811959ee:       00 00
>
> bpf program could see some trash in args < u64
>
> we'd need to add 'recursion' variant for all __bpf_trace_##call functions

Ah, ok.  So the 'contention_begin' tracepoint has an unsigned int flags argument.
The perf lock contention BPF program properly uses only the lower 4 bytes of flags,
but others might access the whole 8 bytes and then see garbage.
Is that your concern?

Hmm.. I think we can use BTF to get the size of each argument then do
the conversion.  Let me see..

Thanks,
Namhyung


* Re: [PATCH bpf-next] bpf: Restrict attachment of bpf program to some tracepoints
  2022-12-07 13:39             ` Jiri Olsa
@ 2022-12-07 19:10               ` Alexei Starovoitov
  2022-12-08  2:47               ` Hao Sun
  1 sibling, 0 replies; 24+ messages in thread
From: Alexei Starovoitov @ 2022-12-07 19:10 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: Andrii Nakryiko, Daniel Borkmann, Alexei Starovoitov,
	Andrii Nakryiko, Hao Sun, bpf, Martin KaFai Lau, Song Liu,
	Yonghong Song, John Fastabend, KP Singh, Stanislav Fomichev,
	Hao Luo

On Wed, Dec 7, 2022 at 5:39 AM Jiri Olsa <olsajiri@gmail.com> wrote:
>
> looks like we can remove the spinlock completely by using the
> same nested-level buffer approach as in bpf_bprintf_prepare

imo that is a much better path forward.

> it gets rid of the contention_begin tracepoint, so I'm no longer
> able to trigger the issue in my test
>
> jirka
>
>
> ---
> diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
> index 3bbd3f0c810c..d6afde7311f8 100644
> --- a/kernel/trace/bpf_trace.c
> +++ b/kernel/trace/bpf_trace.c
> @@ -369,33 +369,60 @@ static const struct bpf_func_proto *bpf_get_probe_write_proto(void)
>         return &bpf_probe_write_user_proto;
>  }
>
> -static DEFINE_RAW_SPINLOCK(trace_printk_lock);
> -
>  #define MAX_TRACE_PRINTK_VARARGS       3
>  #define BPF_TRACE_PRINTK_SIZE          1024
> +#define BPF_TRACE_PRINTK_NEST          3
> +
> +struct trace_printk_buf {
> +       char data[BPF_TRACE_PRINTK_NEST][BPF_TRACE_PRINTK_SIZE];
> +       int nest;
> +};
> +static DEFINE_PER_CPU(struct trace_printk_buf, printk_buf);
> +
> +static void put_printk_buf(struct trace_printk_buf __percpu *buf)
> +{
> +       this_cpu_dec(buf->nest);
> +       preempt_enable();
> +}
> +
> +static bool get_printk_buf(struct trace_printk_buf __percpu *buf, char **data)
> +{
> +       int nest;
> +
> +       preempt_disable();
> +       nest = this_cpu_inc_return(buf->nest);
> +       if (nest > BPF_TRACE_PRINTK_NEST) {
> +               put_printk_buf(buf);
> +               return false;
> +       }
> +       *data = (char *) this_cpu_ptr(&buf->data[nest - 1]);
> +       return true;
> +}
>
>  BPF_CALL_5(bpf_trace_printk, char *, fmt, u32, fmt_size, u64, arg1,
>            u64, arg2, u64, arg3)
>  {
>         u64 args[MAX_TRACE_PRINTK_VARARGS] = { arg1, arg2, arg3 };
>         u32 *bin_args;
> -       static char buf[BPF_TRACE_PRINTK_SIZE];
> -       unsigned long flags;
> +       char *buf;
>         int ret;
>
> +       if (!get_printk_buf(&printk_buf, &buf))
> +               return -EBUSY;
> +
>         ret = bpf_bprintf_prepare(fmt, fmt_size, args, &bin_args,
>                                   MAX_TRACE_PRINTK_VARARGS);
>         if (ret < 0)
> -               return ret;
> +               goto out;
>
> -       raw_spin_lock_irqsave(&trace_printk_lock, flags);
> -       ret = bstr_printf(buf, sizeof(buf), fmt, bin_args);
> +       ret = bstr_printf(buf, BPF_TRACE_PRINTK_SIZE, fmt, bin_args);
>
>         trace_bpf_trace_printk(buf);
> -       raw_spin_unlock_irqrestore(&trace_printk_lock, flags);
>
>         bpf_bprintf_cleanup();
>
> +out:
> +       put_printk_buf(&printk_buf);
>         return ret;
>  }
>
> @@ -427,31 +454,35 @@ const struct bpf_func_proto *bpf_get_trace_printk_proto(void)
>         return &bpf_trace_printk_proto;
>  }
>
> +static DEFINE_PER_CPU(struct trace_printk_buf, vprintk_buf);
> +
>  BPF_CALL_4(bpf_trace_vprintk, char *, fmt, u32, fmt_size, const void *, data,
>            u32, data_len)
>  {
> -       static char buf[BPF_TRACE_PRINTK_SIZE];
> -       unsigned long flags;
>         int ret, num_args;
>         u32 *bin_args;
> +       char *buf;
>
>         if (data_len & 7 || data_len > MAX_BPRINTF_VARARGS * 8 ||
>             (data_len && !data))
>                 return -EINVAL;
>         num_args = data_len / 8;
>
> +       if (!get_printk_buf(&vprintk_buf, &buf))
> +               return -EBUSY;
> +
>         ret = bpf_bprintf_prepare(fmt, fmt_size, data, &bin_args, num_args);
>         if (ret < 0)
> -               return ret;
> +               goto out;
>
> -       raw_spin_lock_irqsave(&trace_printk_lock, flags);
> -       ret = bstr_printf(buf, sizeof(buf), fmt, bin_args);
> +       ret = bstr_printf(buf, BPF_TRACE_PRINTK_SIZE, fmt, bin_args);
>
>         trace_bpf_trace_printk(buf);
> -       raw_spin_unlock_irqrestore(&trace_printk_lock, flags);
>
>         bpf_bprintf_cleanup();
>
> +out:
> +       put_printk_buf(&vprintk_buf);
>         return ret;
>  }
>


* Re: [PATCH bpf-next] bpf: Restrict attachment of bpf program to some tracepoints
  2022-12-07  5:23                   ` Hao Sun
@ 2022-12-07 22:58                     ` Namhyung Kim
  0 siblings, 0 replies; 24+ messages in thread
From: Namhyung Kim @ 2022-12-07 22:58 UTC (permalink / raw)
  To: Hao Sun
  Cc: Alexei Starovoitov, Jiri Olsa, Andrii Nakryiko, Daniel Borkmann,
	Alexei Starovoitov, Andrii Nakryiko, bpf, Martin KaFai Lau,
	Song Liu, Yonghong Song, John Fastabend, KP Singh,
	Stanislav Fomichev, Hao Luo

On Tue, Dec 6, 2022 at 9:23 PM Hao Sun <sunhao.th@gmail.com> wrote:
>
>
>
> > On 7 Dec 2022, at 10:14 AM, Namhyung Kim <namhyung@gmail.com> wrote:
> >
> > On Tue, Dec 06, 2022 at 12:09:51PM -0800, Alexei Starovoitov wrote:
> >> On Mon, Dec 5, 2022 at 4:28 AM Jiri Olsa <olsajiri@gmail.com> wrote:
> >>> diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
> >>> index 3bbd3f0c810c..d27b7dc77894 100644
> >>> --- a/kernel/trace/bpf_trace.c
> >>> +++ b/kernel/trace/bpf_trace.c
> >>> @@ -2252,9 +2252,8 @@ void bpf_put_raw_tracepoint(struct bpf_raw_event_map *btp)
> >>> }
> >>>
> >>> static __always_inline
> >>> -void __bpf_trace_run(struct bpf_prog *prog, u64 *args)
> >>> +void __bpf_trace_prog_run(struct bpf_prog *prog, u64 *args)
> >>> {
> >>> -       cant_sleep();
> >>>        if (unlikely(this_cpu_inc_return(*(prog->active)) != 1)) {
> >>>                bpf_prog_inc_misses_counter(prog);
> >>>                goto out;
> >>> @@ -2266,6 +2265,22 @@ void __bpf_trace_run(struct bpf_prog *prog, u64 *args)
> >>>        this_cpu_dec(*(prog->active));
> >>> }
> >>>
> >>> +static __always_inline
> >>> +void __bpf_trace_run(struct bpf_raw_event_data *data, u64 *args)
> >>> +{
> >>> +       struct bpf_prog *prog = data->prog;
> >>> +
> >>> +       cant_sleep();
> >>> +       if (unlikely(!data->recursion))
> >>> +               return __bpf_trace_prog_run(prog, args);
> >>> +
> >>> +       if (unlikely(this_cpu_inc_return(*(data->recursion))))
> >>> +               goto out;
> >>> +       __bpf_trace_prog_run(prog, args);
> >>> +out:
> >>> +       this_cpu_dec(*(data->recursion));
> >>> +}
> >>
> >> This is way too much run-time and memory overhead to address
> >> this corner case. Pls come up with some other approach.
> >> Sorry I don't have decent suggestions at the moment.
> >> For now we can simply disallow attaching to contention_begin.
> >>
> >
> > How about this?  It seems to work for me.
>
> How about progs that are attached with kprobe?
> See this one:
> https://lore.kernel.org/bpf/CACkBjsb3GRw5aiTT=RCUs3H5aum_QN+B0ZqZA=MvjspUP6NFMg@mail.gmail.com/T/#u

Oh sorry, I'm just talking about the lock contention tracepoints.
For kprobe + printk, I don't have a good solution and I think it needs
some rework to use trylock as Andrii mentioned.

Thanks,
Namhyung


* Re: [PATCH bpf-next] bpf: Restrict attachment of bpf program to some tracepoints
  2022-12-07 13:39             ` Jiri Olsa
  2022-12-07 19:10               ` Alexei Starovoitov
@ 2022-12-08  2:47               ` Hao Sun
  1 sibling, 0 replies; 24+ messages in thread
From: Hao Sun @ 2022-12-08  2:47 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: Andrii Nakryiko, Alexei Starovoitov, Daniel Borkmann,
	Alexei Starovoitov, Andrii Nakryiko, bpf, Martin KaFai Lau,
	Song Liu, Yonghong Song, John Fastabend, KP Singh,
	Stanislav Fomichev, Hao Luo



> On 7 Dec 2022, at 9:39 PM, Jiri Olsa <olsajiri@gmail.com> wrote:
> 
> On Sun, Dec 04, 2022 at 10:44:52PM +0100, Jiri Olsa wrote:
>> On Wed, Nov 30, 2022 at 03:29:39PM -0800, Andrii Nakryiko wrote:
>>> On Fri, Nov 25, 2022 at 1:35 AM Jiri Olsa <olsajiri@gmail.com> wrote:
>>>> 
>>>> On Thu, Nov 24, 2022 at 09:17:22AM -0800, Alexei Starovoitov wrote:
>>>>> On Thu, Nov 24, 2022 at 1:42 AM Jiri Olsa <olsajiri@gmail.com> wrote:
>>>>>> 
>>>>>> On Thu, Nov 24, 2022 at 01:41:23AM +0100, Daniel Borkmann wrote:
>>>>>>> On 11/21/22 10:31 PM, Jiri Olsa wrote:
>>>>>>>> We hit following issues [1] [2] when we attach bpf program that calls
>>>>>>>> bpf_trace_printk helper to the contention_begin tracepoint.
>>>>>>>> 
>>>>>>>> As described in [3] with multiple bpf programs that call bpf_trace_printk
>>>>>>>> helper attached to the contention_begin might result in exhaustion of
>>>>>>>> printk buffer or cause a deadlock [2].
>>>>>>>> 
>>>>>>>> There's also another possible deadlock when multiple bpf programs attach
>>>>>>>> to bpf_trace_printk tracepoint and call one of the printk bpf helpers.
>>>>>>>> 
>>>>>>>> This change denies the attachment of bpf program to contention_begin
>>>>>>>> and bpf_trace_printk tracepoints if the bpf program calls one of the
>>>>>>>> printk bpf helpers.
>>>>>>>> 
>>>>>>>> Also adding a verifier check for tp_btf programs, so this can be caught
>>>>>>>> at program load time with an error message like:
>>>>>>>> 
>>>>>>>>   Can't attach program with bpf_trace_printk#6 helper to contention_begin tracepoint.
>>>>>>>> 
>>>>>>>> [1] https://lore.kernel.org/bpf/CACkBjsakT_yWxnSWr4r-0TpPvbKm9-OBmVUhJb7hV3hY8fdCkw@mail.gmail.com/
>>>>>>>> [2] https://lore.kernel.org/bpf/CACkBjsaCsTovQHFfkqJKto6S4Z8d02ud1D7MPESrHa1cVNNTrw@mail.gmail.com/
>>>>>>>> [3] https://lore.kernel.org/bpf/Y2j6ivTwFmA0FtvY@krava/
>>>>>>>> 
>>>>>>>> Reported-by: Hao Sun <sunhao.th@gmail.com>
>>>>>>>> Suggested-by: Alexei Starovoitov <ast@kernel.org>
>>>>>>>> Signed-off-by: Jiri Olsa <jolsa@kernel.org>
>>>>>>>> ---
>>>>>>>>  include/linux/bpf.h          |  1 +
>>>>>>>>  include/linux/bpf_verifier.h |  2 ++
>>>>>>>>  kernel/bpf/syscall.c         |  3 +++
>>>>>>>>  kernel/bpf/verifier.c        | 46 ++++++++++++++++++++++++++++++++++++
>>>>>>>>  4 files changed, 52 insertions(+)
>>>>>>>> 
>>>>>>>> diff --git a/include/linux/bpf.h b/include/linux/bpf.h
>>>>>>>> index c9eafa67f2a2..3ccabede0f50 100644
>>>>>>>> --- a/include/linux/bpf.h
>>>>>>>> +++ b/include/linux/bpf.h
>>>>>>>> @@ -1319,6 +1319,7 @@ struct bpf_prog {
>>>>>>>>                            enforce_expected_attach_type:1, /* Enforce expected_attach_type checking at attach time */
>>>>>>>>                            call_get_stack:1, /* Do we call bpf_get_stack() or bpf_get_stackid() */
>>>>>>>>                            call_get_func_ip:1, /* Do we call get_func_ip() */
>>>>>>>> +                           call_printk:1, /* Do we call trace_printk/trace_vprintk  */
>>>>>>>>                            tstamp_type_access:1; /* Accessed __sk_buff->tstamp_type */
>>>>>>>>    enum bpf_prog_type      type;           /* Type of BPF program */
>>>>>>>>    enum bpf_attach_type    expected_attach_type; /* For some prog types */
>>>>>>>> diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h
>>>>>>>> index 545152ac136c..7118c2fda59d 100644
>>>>>>>> --- a/include/linux/bpf_verifier.h
>>>>>>>> +++ b/include/linux/bpf_verifier.h
>>>>>>>> @@ -618,6 +618,8 @@ bool is_dynptr_type_expected(struct bpf_verifier_env *env,
>>>>>>>>                         struct bpf_reg_state *reg,
>>>>>>>>                         enum bpf_arg_type arg_type);
>>>>>>>> +int bpf_check_tp_printk_denylist(const char *name, struct bpf_prog *prog);
>>>>>>>> +
>>>>>>>>  /* this lives here instead of in bpf.h because it needs to dereference tgt_prog */
>>>>>>>>  static inline u64 bpf_trampoline_compute_key(const struct bpf_prog *tgt_prog,
>>>>>>>>                                         struct btf *btf, u32 btf_id)
>>>>>>>> diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
>>>>>>>> index 35972afb6850..9a69bda7d62b 100644
>>>>>>>> --- a/kernel/bpf/syscall.c
>>>>>>>> +++ b/kernel/bpf/syscall.c
>>>>>>>> @@ -3329,6 +3329,9 @@ static int bpf_raw_tp_link_attach(struct bpf_prog *prog,
>>>>>>>>            return -EINVAL;
>>>>>>>>    }
>>>>>>>> +   if (bpf_check_tp_printk_denylist(tp_name, prog))
>>>>>>>> +           return -EACCES;
>>>>>>>> +
>>>>>>>>    btp = bpf_get_raw_tracepoint(tp_name);
>>>>>>>>    if (!btp)
>>>>>>>>            return -ENOENT;
>>>>>>>> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
>>>>>>>> index f07bec227fef..b662bc851e1c 100644
>>>>>>>> --- a/kernel/bpf/verifier.c
>>>>>>>> +++ b/kernel/bpf/verifier.c
>>>>>>>> @@ -7472,6 +7472,47 @@ static void update_loop_inline_state(struct bpf_verifier_env *env, u32 subprogno
>>>>>>>>                             state->callback_subprogno == subprogno);
>>>>>>>>  }
>>>>>>>> +int bpf_check_tp_printk_denylist(const char *name, struct bpf_prog *prog)
>>>>>>>> +{
>>>>>>>> +   static const char * const denylist[] = {
>>>>>>>> +           "contention_begin",
>>>>>>>> +           "bpf_trace_printk",
>>>>>>>> +   };
>>>>>>>> +   int i;
>>>>>>>> +
>>>>>>>> +   /* Do not allow attachment to denylist[] tracepoints,
>>>>>>>> +    * if the program calls some of the printk helpers,
>>>>>>>> +    * because there's possibility of deadlock.
>>>>>>>> +    */
>>>>>>> 
>>>>>>> What if that prog doesn't but tail calls into another one which calls printk helpers?
>>>>>> 
>>>>>> right, I'll deny that for all BPF_PROG_TYPE_RAW_TRACEPOINT* programs,
>>>>>> because I don't see an easy way to check for that
>>>>>>
>>>>>> we can keep the printk check for tracing BPF_TRACE_RAW_TP programs,
>>>>>> because the verifier already knows the exact tracepoint
>>>>> 
>>>>> This is all fragile and merely a stop gap.
>>>>> Doesn't sound like the issue is limited to bpf_trace_printk
>>>> 
>>>> hm, I don't have a better idea how to fix that.. I can't deny
>>>> contention_begin completely, because we use it in perf via
>>>> tp_btf/contention_begin (perf lock contention) and I don't
>>>> think there's another way for perf to do that
>>>> 
>>>> fwiw the last version below denies BPF_PROG_TYPE_RAW_TRACEPOINT
>>>> programs completely and tracing BPF_TRACE_RAW_TP with printks
>>>> 
>>> 
>>> I think disabling bpf_trace_printk() tracepoint for any BPF program is
>>> totally fine. This tracepoint was never intended to be attached to.
>>> 
>>> But as for the general bpf_trace_printk() deadlocking. Should we
>>> discuss how to make it not deadlock instead of starting to denylist
>>> things left and right?
>>> 
>>> Do I understand that we take trace_printk_lock only to protect that
>>> static char buf[]? Can we just make this buf per-CPU and do a trylock
>>> instead? We'll only fail to bpf_trace_printk() something if we have
>>> nested BPF programs (rare) or NMI (also rare).
>> 
>> ugh, sorry I overlooked your reply :-\
>> 
>> sounds good.. if it'd be acceptable to use trylock, we'd get rid of the
>> contention_begin tracepoint being triggered, which was what caused the deadlock
> 
> looks like we can remove the spinlock completely by using the
> same nested-level buffer approach as in bpf_bprintf_prepare
> 
> it gets rid of the contention_begin tracepoint, so I'm no longer
> able to trigger the issue in my test
> 
> jirka
> 
> 
> ---
> diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
> index 3bbd3f0c810c..d6afde7311f8 100644
> --- a/kernel/trace/bpf_trace.c
> +++ b/kernel/trace/bpf_trace.c
> @@ -369,33 +369,60 @@ static const struct bpf_func_proto *bpf_get_probe_write_proto(void)
> return &bpf_probe_write_user_proto;
> }
> 
> -static DEFINE_RAW_SPINLOCK(trace_printk_lock);
> -
> #define MAX_TRACE_PRINTK_VARARGS 3
> #define BPF_TRACE_PRINTK_SIZE 1024
> +#define BPF_TRACE_PRINTK_NEST 3
> +
> +struct trace_printk_buf {
> + char data[BPF_TRACE_PRINTK_NEST][BPF_TRACE_PRINTK_SIZE];
> + int nest;
> +};
> +static DEFINE_PER_CPU(struct trace_printk_buf, printk_buf);
> +
> +static void put_printk_buf(struct trace_printk_buf __percpu *buf)
> +{
> + this_cpu_dec(buf->nest);
> + preempt_enable();
> +}
> +
> +static bool get_printk_buf(struct trace_printk_buf __percpu *buf, char **data)
> +{
> + int nest;
> +
> + preempt_disable();
> + nest = this_cpu_inc_return(buf->nest);
> + if (nest > BPF_TRACE_PRINTK_NEST) {
> + put_printk_buf(buf);
> + return false;
> + }
> + *data = (char *) this_cpu_ptr(&buf->data[nest - 1]);
> + return true;
> +}
> 
> BPF_CALL_5(bpf_trace_printk, char *, fmt, u32, fmt_size, u64, arg1,
>    u64, arg2, u64, arg3)
> {
> u64 args[MAX_TRACE_PRINTK_VARARGS] = { arg1, arg2, arg3 };
> u32 *bin_args;
> - static char buf[BPF_TRACE_PRINTK_SIZE];
> - unsigned long flags;
> + char *buf;
> int ret;
> 
> + if (!get_printk_buf(&printk_buf, &buf))
> + return -EBUSY;
> +
> ret = bpf_bprintf_prepare(fmt, fmt_size, args, &bin_args,
>   MAX_TRACE_PRINTK_VARARGS);
> if (ret < 0)
> - return ret;
> + goto out;
> 
> - raw_spin_lock_irqsave(&trace_printk_lock, flags);
> - ret = bstr_printf(buf, sizeof(buf), fmt, bin_args);
> + ret = bstr_printf(buf, BPF_TRACE_PRINTK_SIZE, fmt, bin_args);
> 
> trace_bpf_trace_printk(buf);
> - raw_spin_unlock_irqrestore(&trace_printk_lock, flags);
> 
> bpf_bprintf_cleanup();
> 
> +out:
> + put_printk_buf(&printk_buf);
> return ret;
> }
> 
> @@ -427,31 +454,35 @@ const struct bpf_func_proto *bpf_get_trace_printk_proto(void)
> return &bpf_trace_printk_proto;
> }
> 
> +static DEFINE_PER_CPU(struct trace_printk_buf, vprintk_buf);
> +
> BPF_CALL_4(bpf_trace_vprintk, char *, fmt, u32, fmt_size, const void *, data,
>    u32, data_len)
> {
> - static char buf[BPF_TRACE_PRINTK_SIZE];
> - unsigned long flags;
> int ret, num_args;
> u32 *bin_args;
> + char *buf;
> 
> if (data_len & 7 || data_len > MAX_BPRINTF_VARARGS * 8 ||
>     (data_len && !data))
> return -EINVAL;
> num_args = data_len / 8;
> 
> + if (!get_printk_buf(&vprintk_buf, &buf))
> + return -EBUSY;
> +
> ret = bpf_bprintf_prepare(fmt, fmt_size, data, &bin_args, num_args);
> if (ret < 0)
> - return ret;
> + goto out;
> 
> - raw_spin_lock_irqsave(&trace_printk_lock, flags);
> - ret = bstr_printf(buf, sizeof(buf), fmt, bin_args);
> + ret = bstr_printf(buf, BPF_TRACE_PRINTK_SIZE, fmt, bin_args);
> 
> trace_bpf_trace_printk(buf);
> - raw_spin_unlock_irqrestore(&trace_printk_lock, flags);
> 
> bpf_bprintf_cleanup();
> 
> +out:
> + put_printk_buf(&vprintk_buf);
> return ret;
> }

Tested on the latest bpf-next build, I can confirm the patch also fixes
this issue [1].

[1] https://lore.kernel.org/bpf/CACkBjsb3GRw5aiTT=RCUs3H5aum_QN+B0ZqZA=MvjspUP6NFMg@mail.gmail.com/T/#u


* Re: [PATCH bpf-next] bpf: Restrict attachment of bpf program to some tracepoints
  2022-12-07 19:08                     ` Namhyung Kim
@ 2022-12-08  6:15                       ` Namhyung Kim
  2022-12-08 12:04                         ` Jiri Olsa
  0 siblings, 1 reply; 24+ messages in thread
From: Namhyung Kim @ 2022-12-08  6:15 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Alexei Starovoitov, Andrii Nakryiko, Hao Sun, bpf,
	Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend,
	KP Singh, Stanislav Fomichev, Hao Luo

On Wed, Dec 07, 2022 at 11:08:40AM -0800, Namhyung Kim wrote:
> On Wed, Dec 7, 2022 at 12:18 AM Jiri Olsa <olsajiri@gmail.com> wrote:
> >
> > On Tue, Dec 06, 2022 at 06:14:06PM -0800, Namhyung Kim wrote:
> >
> > SNIP
> >
> > > -static int __bpf_probe_register(struct bpf_raw_event_map *btp, struct bpf_prog *prog)
> > > +static void *bpf_trace_norecurse_funcs[12] = {
> > > +     (void *)bpf_trace_run_norecurse1,
> > > +     (void *)bpf_trace_run_norecurse2,
> > > +     (void *)bpf_trace_run_norecurse3,
> > > +     (void *)bpf_trace_run_norecurse4,
> > > +     (void *)bpf_trace_run_norecurse5,
> > > +     (void *)bpf_trace_run_norecurse6,
> > > +     (void *)bpf_trace_run_norecurse7,
> > > +     (void *)bpf_trace_run_norecurse8,
> > > +     (void *)bpf_trace_run_norecurse9,
> > > +     (void *)bpf_trace_run_norecurse10,
> > > +     (void *)bpf_trace_run_norecurse11,
> > > +     (void *)bpf_trace_run_norecurse12,
> > > +};
> > > +
> > > +static int __bpf_probe_register(struct bpf_raw_event_map *btp, struct bpf_prog *prog,
> > > +                             void *func, void *data)
> > >  {
> > >       struct tracepoint *tp = btp->tp;
> > >
> > > @@ -2325,13 +2354,12 @@ static int __bpf_probe_register(struct bpf_raw_event_map *btp, struct bpf_prog *
> > >       if (prog->aux->max_tp_access > btp->writable_size)
> > >               return -EINVAL;
> > >
> > > -     return tracepoint_probe_register_may_exist(tp, (void *)btp->bpf_func,
> > > -                                                prog);
> > > +     return tracepoint_probe_register_may_exist(tp, func, data);
> > >  }
> > >
> > >  int bpf_probe_register(struct bpf_raw_event_map *btp, struct bpf_prog *prog)
> > >  {
> > > -     return __bpf_probe_register(btp, prog);
> > > +     return __bpf_probe_register(btp, prog, btp->bpf_func, prog);
> > >  }
> > >
> > >  int bpf_probe_unregister(struct bpf_raw_event_map *btp, struct bpf_prog *prog)
> > > @@ -2339,6 +2367,33 @@ int bpf_probe_unregister(struct bpf_raw_event_map *btp, struct bpf_prog *prog)
> > >       return tracepoint_probe_unregister(btp->tp, (void *)btp->bpf_func, prog);
> > >  }
> > >
> > > +int bpf_probe_register_norecurse(struct bpf_raw_event_map *btp, struct bpf_prog *prog,
> > > +                              struct bpf_raw_event_data *data)
> > > +{
> > > +     void *bpf_func;
> > > +
> > > +     data->active = alloc_percpu_gfp(int, GFP_KERNEL);
> > > +     if (!data->active)
> > > +             return -ENOMEM;
> > > +
> > > +     data->prog = prog;
> > > +     bpf_func = bpf_trace_norecurse_funcs[btp->num_args];
> > > +     return __bpf_probe_register(btp, prog, bpf_func, data);
> >
> > I don't think we can do that, because it won't do the arg -> u64 conversion
> > that __bpf_trace_##call functions are doing:
> >
> >         __bpf_trace_##call(void *__data, proto)                                 \
> >         {                                                                       \
> >                 struct bpf_prog *prog = __data;                                 \
> >                 CONCATENATE(bpf_trace_run, COUNT_ARGS(args))(prog, CAST_TO_U64(args));  \
> >         }
> >
> > like for 'old_pid' arg in sched_process_exec tracepoint:
> >
> >         ffffffff811959e0 <__bpf_trace_sched_process_exec>:
> >         ffffffff811959e0:       89 d2                   mov    %edx,%edx
> >         ffffffff811959e2:       e9 a9 07 14 00          jmp    ffffffff812d6190 <bpf_trace_run3>
> >         ffffffff811959e7:       66 0f 1f 84 00 00 00    nopw   0x0(%rax,%rax,1)
> >         ffffffff811959ee:       00 00
> >
> > the bpf program could see trash in the upper bits of args narrower than u64
> >
> > we'd need to add 'recursion' variant for all __bpf_trace_##call functions
> 
> Ah, ok.  So the 'contention_begin' tracepoint has an unsigned int flags
> argument.  The perf lock contention BPF program properly uses only the lower
> 4 bytes of flags, but other programs might read the whole 8 bytes and then
> see garbage.  Is that your concern?
> 
> Hmm.. I think we can use BTF to get the size of each argument then do
> the conversion.  Let me see..

Maybe something like this?  But I'm not sure if I did cast_to_u64() right.

Thanks,
Namhyung

---
 include/linux/trace_events.h    |  14 ++++
 include/linux/tracepoint-defs.h |   6 ++
 kernel/bpf/syscall.c            |  18 ++++-
 kernel/trace/bpf_trace.c        | 119 ++++++++++++++++++++++++++++++--
 4 files changed, 150 insertions(+), 7 deletions(-)

diff --git a/include/linux/trace_events.h b/include/linux/trace_events.h
index f14d41bc7342..73bcc0378719 100644
--- a/include/linux/trace_events.h
+++ b/include/linux/trace_events.h
@@ -481,6 +481,10 @@ void perf_event_detach_bpf_prog(struct perf_event *event);
 int perf_event_query_prog_array(struct perf_event *event, void __user *info);
 int bpf_probe_register(struct bpf_raw_event_map *btp, struct bpf_prog *prog);
 int bpf_probe_unregister(struct bpf_raw_event_map *btp, struct bpf_prog *prog);
+int bpf_probe_register_norecurse(struct bpf_raw_event_map *btp, struct bpf_prog *prog,
+				 struct bpf_raw_event_data *data);
+int bpf_probe_unregister_norecurse(struct bpf_raw_event_map *btp,
+				   struct bpf_raw_event_data *data);
 struct bpf_raw_event_map *bpf_get_raw_tracepoint(const char *name);
 void bpf_put_raw_tracepoint(struct bpf_raw_event_map *btp);
 int bpf_get_perf_event_info(const struct perf_event *event, u32 *prog_id,
@@ -514,6 +518,16 @@ static inline int bpf_probe_unregister(struct bpf_raw_event_map *btp, struct bpf
 {
 	return -EOPNOTSUPP;
 }
+static inline int bpf_probe_register_norecurse(struct bpf_raw_event_map *btp, struct bpf_prog *p,
+					       struct bpf_raw_event_data *data)
+{
+	return -EOPNOTSUPP;
+}
+static inline int bpf_probe_unregister_norecurse(struct bpf_raw_event_map *btp,
+						 struct bpf_raw_event_data *data)
+{
+	return -EOPNOTSUPP;
+}
 static inline struct bpf_raw_event_map *bpf_get_raw_tracepoint(const char *name)
 {
 	return NULL;
diff --git a/include/linux/tracepoint-defs.h b/include/linux/tracepoint-defs.h
index 0279bf79f113..a8f93cf9c471 100644
--- a/include/linux/tracepoint-defs.h
+++ b/include/linux/tracepoint-defs.h
@@ -42,6 +42,12 @@ struct bpf_raw_event_map {
 	u32			writable_size;
 } __aligned(32);
 
+struct bpf_raw_event_data {
+	struct bpf_prog		*prog;
+	int __percpu		*active;
+	u8			arg_sizes[12];
+};
+
 /*
  * If a tracepoint needs to be called from a header file, it is not
  * recommended to call it directly, as tracepoints in header files
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 71aa93697afa..1a4483e33ff3 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -3156,14 +3156,24 @@ static int bpf_tracing_prog_attach(struct bpf_prog *prog,
 struct bpf_raw_tp_link {
 	struct bpf_link link;
 	struct bpf_raw_event_map *btp;
+	struct bpf_raw_event_data data;
 };
 
+static bool needs_recursion_check(struct bpf_raw_event_map *btp)
+{
+	return !strcmp(btp->tp->name, "contention_begin");
+}
+
 static void bpf_raw_tp_link_release(struct bpf_link *link)
 {
 	struct bpf_raw_tp_link *raw_tp =
 		container_of(link, struct bpf_raw_tp_link, link);
 
-	bpf_probe_unregister(raw_tp->btp, raw_tp->link.prog);
+	if (needs_recursion_check(raw_tp->btp))
+		bpf_probe_unregister_norecurse(raw_tp->btp, &raw_tp->data);
+	else
+		bpf_probe_unregister(raw_tp->btp, raw_tp->link.prog);
+
 	bpf_put_raw_tracepoint(raw_tp->btp);
 }
 
@@ -3360,7 +3370,11 @@ static int bpf_raw_tp_link_attach(struct bpf_prog *prog,
 		goto out_put_btp;
 	}
 
-	err = bpf_probe_register(link->btp, prog);
+	if (needs_recursion_check(link->btp))
+		err = bpf_probe_register_norecurse(link->btp, prog, &link->data);
+	else
+		err = bpf_probe_register(link->btp, prog);
+
 	if (err) {
 		bpf_link_cleanup(&link_primer);
 		goto out_put_btp;
diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
index fc956d7bdff7..10048955c982 100644
--- a/kernel/trace/bpf_trace.c
+++ b/kernel/trace/bpf_trace.c
@@ -2069,6 +2069,13 @@ void __bpf_trace_run(struct bpf_prog *prog, u64 *args)
 	this_cpu_dec(*(prog->active));
 }
 
+/* Actual *arg* is not in u64, copy arg to dst with a proper size */
+static void cast_to_u64(u64 *dst, u64 arg, u8 size)
+{
+	*dst = 0;
+	memcpy(dst, &arg, size);
+}
+
 #define UNPACK(...)			__VA_ARGS__
 #define REPEAT_1(FN, DL, X, ...)	FN(X)
 #define REPEAT_2(FN, DL, X, ...)	FN(X) UNPACK DL REPEAT_1(FN, DL, __VA_ARGS__)
@@ -2086,6 +2093,7 @@ void __bpf_trace_run(struct bpf_prog *prog, u64 *args)
 
 #define SARG(X)		u64 arg##X
 #define COPY(X)		args[X] = arg##X
+#define CAST(X)		cast_to_u64(&args[X], arg##X, data->arg_sizes[X])
 
 #define __DL_COM	(,)
 #define __DL_SEM	(;)
@@ -2100,7 +2108,20 @@ void __bpf_trace_run(struct bpf_prog *prog, u64 *args)
 		REPEAT(x, COPY, __DL_SEM, __SEQ_0_11);			\
 		__bpf_trace_run(prog, args);				\
 	}								\
-	EXPORT_SYMBOL_GPL(bpf_trace_run##x)
+	EXPORT_SYMBOL_GPL(bpf_trace_run##x);				\
+									\
+	static void bpf_trace_run_norecurse##x(struct bpf_raw_event_data *data,	\
+			      REPEAT(x, SARG, __DL_COM, __SEQ_0_11))	\
+	{								\
+		u64 args[x];						\
+		if (unlikely(this_cpu_inc_return(*(data->active)) != 1)) \
+			goto out;					\
+		REPEAT(x, CAST, __DL_SEM, __SEQ_0_11);			\
+		__bpf_trace_run(data->prog, args);			\
+	out:								\
+		this_cpu_dec(*(data->active));				\
+	}
+
 BPF_TRACE_DEFN_x(1);
 BPF_TRACE_DEFN_x(2);
 BPF_TRACE_DEFN_x(3);
@@ -2114,7 +2135,23 @@ BPF_TRACE_DEFN_x(10);
 BPF_TRACE_DEFN_x(11);
 BPF_TRACE_DEFN_x(12);
 
-static int __bpf_probe_register(struct bpf_raw_event_map *btp, struct bpf_prog *prog)
+static void *bpf_trace_norecurse_funcs[12] = {
+	(void *)bpf_trace_run_norecurse1,
+	(void *)bpf_trace_run_norecurse2,
+	(void *)bpf_trace_run_norecurse3,
+	(void *)bpf_trace_run_norecurse4,
+	(void *)bpf_trace_run_norecurse5,
+	(void *)bpf_trace_run_norecurse6,
+	(void *)bpf_trace_run_norecurse7,
+	(void *)bpf_trace_run_norecurse8,
+	(void *)bpf_trace_run_norecurse9,
+	(void *)bpf_trace_run_norecurse10,
+	(void *)bpf_trace_run_norecurse11,
+	(void *)bpf_trace_run_norecurse12,
+};
+
+static int __bpf_probe_register(struct bpf_raw_event_map *btp, struct bpf_prog *prog,
+				void *func, void *data)
 {
 	struct tracepoint *tp = btp->tp;
 
@@ -2128,13 +2165,12 @@ static int __bpf_probe_register(struct bpf_raw_event_map *btp, struct bpf_prog *
 	if (prog->aux->max_tp_access > btp->writable_size)
 		return -EINVAL;
 
-	return tracepoint_probe_register_may_exist(tp, (void *)btp->bpf_func,
-						   prog);
+	return tracepoint_probe_register_may_exist(tp, func, data);
 }
 
 int bpf_probe_register(struct bpf_raw_event_map *btp, struct bpf_prog *prog)
 {
-	return __bpf_probe_register(btp, prog);
+	return __bpf_probe_register(btp, prog, btp->bpf_func, prog);
 }
 
 int bpf_probe_unregister(struct bpf_raw_event_map *btp, struct bpf_prog *prog)
@@ -2142,6 +2178,79 @@ int bpf_probe_unregister(struct bpf_raw_event_map *btp, struct bpf_prog *prog)
 	return tracepoint_probe_unregister(btp->tp, (void *)btp->bpf_func, prog);
 }
 
+int bpf_probe_register_norecurse(struct bpf_raw_event_map *btp, struct bpf_prog *prog,
+				 struct bpf_raw_event_data *data)
+{
+	const struct btf *btf;
+	const struct btf_type *t;
+	struct btf_param *p;
+	char *tp_typedef_name;
+	void *bpf_func;
+	s32 type_id;
+	u32 i, size;
+
+	btf = bpf_get_btf_vmlinux();
+	if (IS_ERR_OR_NULL(btf))
+		return btf ? PTR_ERR(btf) : -EINVAL;
+
+	tp_typedef_name = kasprintf(GFP_KERNEL, "btf_trace_%s", btp->tp->name);
+	if (tp_typedef_name == NULL)
+		return -ENOMEM;
+
+	type_id = btf_find_by_name_kind(btf, tp_typedef_name, BTF_KIND_TYPEDEF);
+	kfree(tp_typedef_name);
+
+	if (type_id == -ENOENT)
+		return -EINVAL;
+
+	t = btf_type_by_id(btf, type_id);
+	if (t == NULL)
+		return -EINVAL;
+
+	t = btf_type_by_id(btf, t->type);
+	if (t == NULL || !btf_is_ptr(t))
+		return -EINVAL;
+
+	t = btf_type_by_id(btf, t->type);
+	if (t == NULL || !btf_type_is_func_proto(t))
+		return -EINVAL;
+
+	WARN_ON_ONCE(btp->num_args != btf_vlen(t));
+
+	for (i = 0, p = btf_params(t); i < btp->num_args; i++, p++) {
+		t = btf_type_by_id(btf, p->type);
+		if (t == NULL)
+			return -EINVAL;
+
+		btf_resolve_size(btf, t, &size);
+		if (size > 8)
+			return -EINVAL;
+
+		data->arg_sizes[i] = size;
+	}
+
+	data->active = alloc_percpu_gfp(int, GFP_KERNEL);
+	if (!data->active)
+		return -ENOMEM;
+
+	data->prog = prog;
+	bpf_func = bpf_trace_norecurse_funcs[btp->num_args];
+	return __bpf_probe_register(btp, prog, bpf_func, data);
+}
+
+int bpf_probe_unregister_norecurse(struct bpf_raw_event_map *btp,
+				   struct bpf_raw_event_data *data)
+{
+	int err;
+	void *bpf_func;
+
+	bpf_func = bpf_trace_norecurse_funcs[btp->num_args];
+	err = tracepoint_probe_unregister(btp->tp, bpf_func, data);
+	free_percpu(data->active);
+
+	return err;
+}
+
 int bpf_get_perf_event_info(const struct perf_event *event, u32 *prog_id,
 			    u32 *fd_type, const char **buf,
 			    u64 *probe_offset, u64 *probe_addr)
-- 
2.39.0.rc1.256.g54fd8350bd-goog


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* Re: [PATCH bpf-next] bpf: Restrict attachment of bpf program to some tracepoints
  2022-12-08  6:15                       ` Namhyung Kim
@ 2022-12-08 12:04                         ` Jiri Olsa
  0 siblings, 0 replies; 24+ messages in thread
From: Jiri Olsa @ 2022-12-08 12:04 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Jiri Olsa, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Alexei Starovoitov, Andrii Nakryiko, Hao Sun, bpf,
	Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend,
	KP Singh, Stanislav Fomichev, Hao Luo

On Wed, Dec 07, 2022 at 10:15:12PM -0800, Namhyung Kim wrote:
> On Wed, Dec 07, 2022 at 11:08:40AM -0800, Namhyung Kim wrote:
> > On Wed, Dec 7, 2022 at 12:18 AM Jiri Olsa <olsajiri@gmail.com> wrote:
> > >
> > > On Tue, Dec 06, 2022 at 06:14:06PM -0800, Namhyung Kim wrote:
> > >
> > > SNIP
> > >
> > > > -static int __bpf_probe_register(struct bpf_raw_event_map *btp, struct bpf_prog *prog)
> > > > +static void *bpf_trace_norecurse_funcs[12] = {
> > > > +     (void *)bpf_trace_run_norecurse1,
> > > > +     (void *)bpf_trace_run_norecurse2,
> > > > +     (void *)bpf_trace_run_norecurse3,
> > > > +     (void *)bpf_trace_run_norecurse4,
> > > > +     (void *)bpf_trace_run_norecurse5,
> > > > +     (void *)bpf_trace_run_norecurse6,
> > > > +     (void *)bpf_trace_run_norecurse7,
> > > > +     (void *)bpf_trace_run_norecurse8,
> > > > +     (void *)bpf_trace_run_norecurse9,
> > > > +     (void *)bpf_trace_run_norecurse10,
> > > > +     (void *)bpf_trace_run_norecurse11,
> > > > +     (void *)bpf_trace_run_norecurse12,
> > > > +};
> > > > +
> > > > +static int __bpf_probe_register(struct bpf_raw_event_map *btp, struct bpf_prog *prog,
> > > > +                             void *func, void *data)
> > > >  {
> > > >       struct tracepoint *tp = btp->tp;
> > > >
> > > > @@ -2325,13 +2354,12 @@ static int __bpf_probe_register(struct bpf_raw_event_map *btp, struct bpf_prog *
> > > >       if (prog->aux->max_tp_access > btp->writable_size)
> > > >               return -EINVAL;
> > > >
> > > > -     return tracepoint_probe_register_may_exist(tp, (void *)btp->bpf_func,
> > > > -                                                prog);
> > > > +     return tracepoint_probe_register_may_exist(tp, func, data);
> > > >  }
> > > >
> > > >  int bpf_probe_register(struct bpf_raw_event_map *btp, struct bpf_prog *prog)
> > > >  {
> > > > -     return __bpf_probe_register(btp, prog);
> > > > +     return __bpf_probe_register(btp, prog, btp->bpf_func, prog);
> > > >  }
> > > >
> > > >  int bpf_probe_unregister(struct bpf_raw_event_map *btp, struct bpf_prog *prog)
> > > > @@ -2339,6 +2367,33 @@ int bpf_probe_unregister(struct bpf_raw_event_map *btp, struct bpf_prog *prog)
> > > >       return tracepoint_probe_unregister(btp->tp, (void *)btp->bpf_func, prog);
> > > >  }
> > > >
> > > > +int bpf_probe_register_norecurse(struct bpf_raw_event_map *btp, struct bpf_prog *prog,
> > > > +                              struct bpf_raw_event_data *data)
> > > > +{
> > > > +     void *bpf_func;
> > > > +
> > > > +     data->active = alloc_percpu_gfp(int, GFP_KERNEL);
> > > > +     if (!data->active)
> > > > +             return -ENOMEM;
> > > > +
> > > > +     data->prog = prog;
> > > > +     bpf_func = bpf_trace_norecurse_funcs[btp->num_args];
> > > > +     return __bpf_probe_register(btp, prog, bpf_func, data);
> > >
> > > I don't think we can do that, because it won't do the arg -> u64 conversion
> > > that __bpf_trace_##call functions are doing:
> > >
> > >         __bpf_trace_##call(void *__data, proto)                                 \
> > >         {                                                                       \
> > >                 struct bpf_prog *prog = __data;                                 \
> > >                 CONCATENATE(bpf_trace_run, COUNT_ARGS(args))(prog, CAST_TO_U64(args));  \
> > >         }
> > >
> > > like for 'old_pid' arg in sched_process_exec tracepoint:
> > >
> > >         ffffffff811959e0 <__bpf_trace_sched_process_exec>:
> > >         ffffffff811959e0:       89 d2                   mov    %edx,%edx
> > >         ffffffff811959e2:       e9 a9 07 14 00          jmp    ffffffff812d6190 <bpf_trace_run3>
> > >         ffffffff811959e7:       66 0f 1f 84 00 00 00    nopw   0x0(%rax,%rax,1)
> > >         ffffffff811959ee:       00 00
> > >
> > > the bpf program could see trash in the upper bits of args narrower than u64
> > >
> > > we'd need to add 'recursion' variant for all __bpf_trace_##call functions
> > 
> > Ah, ok.  So the 'contention_begin' tracepoint has an unsigned int flags
> > argument.  The perf lock contention BPF program properly uses only the lower
> > 4 bytes of flags, but other programs might read the whole 8 bytes and then
> > see garbage.  Is that your concern?
> > 
> > Hmm.. I think we can use BTF to get the size of each argument then do
> > the conversion.  Let me see..
> 
> Maybe something like this?  But I'm not sure if I did cast_to_u64() right.

I guess that would work, but now I like the idea of fixing the original
issue by removing the spinlock from the printk helpers completely.

we might need to come back to something like this in the future if we hit
a similar issue and don't have a better way to fix it

jirka

^ permalink raw reply	[flat|nested] 24+ messages in thread

end of thread, other threads:[~2022-12-08 12:04 UTC | newest]

Thread overview: 24+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-11-21 21:31 [PATCH bpf-next] bpf: Restrict attachment of bpf program to some tracepoints Jiri Olsa
2022-11-24  0:41 ` Daniel Borkmann
2022-11-24  9:42   ` Jiri Olsa
2022-11-24 17:17     ` Alexei Starovoitov
2022-11-25  9:35       ` Jiri Olsa
2022-11-30 23:29         ` Andrii Nakryiko
2022-12-03 17:58           ` Namhyung Kim
2022-12-05 12:28             ` Jiri Olsa
2022-12-06  4:00               ` Namhyung Kim
2022-12-06  8:14                 ` Jiri Olsa
2022-12-06 18:20                   ` Namhyung Kim
2022-12-06 20:09               ` Alexei Starovoitov
2022-12-07  2:14                 ` Namhyung Kim
2022-12-07  5:23                   ` Hao Sun
2022-12-07 22:58                     ` Namhyung Kim
2022-12-07  8:18                   ` Jiri Olsa
2022-12-07 19:08                     ` Namhyung Kim
2022-12-08  6:15                       ` Namhyung Kim
2022-12-08 12:04                         ` Jiri Olsa
2022-12-04 21:44           ` Jiri Olsa
2022-12-07 13:39             ` Jiri Olsa
2022-12-07 19:10               ` Alexei Starovoitov
2022-12-08  2:47               ` Hao Sun
2022-12-03 17:42       ` Namhyung Kim
