All of lore.kernel.org
 help / color / mirror / Atom feed
From: Alexei Starovoitov <ast@fb.com>
To: Steven Rostedt <rostedt@goodmis.org>
Cc: <davem@davemloft.net>, <daniel@iogearbox.net>,
	<torvalds@linux-foundation.org>, <peterz@infradead.org>,
	<netdev@vger.kernel.org>, <kernel-team@fb.com>,
	<linux-api@vger.kernel.org>,
	Mathieu Desnoyers <mathieu.desnoyers@efficios.com>,
	Kees Cook <keescook@chromium.org>
Subject: Re: [PATCH v6 bpf-next 08/11] bpf: introduce BPF_RAW_TRACEPOINT
Date: Tue, 27 Mar 2018 11:45:34 -0700	[thread overview]
Message-ID: <f071b59d-5d67-0eb6-c5b1-68094b0a5118@fb.com> (raw)
In-Reply-To: <20180327130211.284c8924@gandalf.local.home>

On 3/27/18 10:02 AM, Steven Rostedt wrote:
> On Mon, 26 Mar 2018 19:47:03 -0700
> Alexei Starovoitov <ast@fb.com> wrote:
>
>
>> Ctrl-C of tracing daemon or cmdline tool that uses this feature
>> will automatically detach bpf program, unload it and
>> unregister tracepoint probe.
>>
>> On the kernel side for_each_kernel_tracepoint() is used
>
> You need to update the change log to state
> kernel_tracepoint_find_by_name().

ahh. right. will do.

>> +#undef DECLARE_EVENT_CLASS
>> +#define DECLARE_EVENT_CLASS(call, proto, args, tstruct, assign, print)	\
>> +/* no 'static' here. The bpf probe functions are global */		\
>> +notrace void								\
>
> I'm curious to why you have notrace here? Since it is separate from
> perf and ftrace, for debugging purposes, it could be useful to allow
> function tracing to this function.

To avoid unnecessary overhead. And I don't think it's useful to trace 
them. They're tiny jump functions of one or two instructions.
Really no point wasting mentry on them.


>> +static int bpf_raw_tracepoint_open(const union bpf_attr *attr)
>> +{
>> +	struct bpf_raw_tracepoint *raw_tp;
>> +	struct tracepoint *tp;
>> +	struct bpf_prog *prog;
>> +	char tp_name[128];
>> +	int tp_fd, err;
>> +
>> +	if (strncpy_from_user(tp_name, u64_to_user_ptr(attr->raw_tracepoint.name),
>> +			      sizeof(tp_name) - 1) < 0)
>> +		return -EFAULT;
>> +	tp_name[sizeof(tp_name) - 1] = 0;
>> +
>> +	tp = kernel_tracepoint_find_by_name(tp_name);
>> +	if (!tp)
>> +		return -ENOENT;
>> +
>> +	raw_tp = kmalloc(sizeof(*raw_tp), GFP_USER | __GFP_ZERO);
>
> Please use kzalloc(), instead of open coding the "__GPF_ZERO"

right. will do

>
> Could you add some comments here to explain what the below is doing.

To write a proper comment I need to understand it and I don't.
That's the reason why I didn't put in in common header,
because it would require proper comment on what it is and
how one can use it.
I'm expecting Daniel to follow up on this.

>> +#define UNPACK(...)			__VA_ARGS__
>> +#define REPEAT_1(FN, DL, X, ...)	FN(X)
>> +#define REPEAT_2(FN, DL, X, ...)	FN(X) UNPACK DL REPEAT_1(FN, DL, __VA_ARGS__)
>> +#define REPEAT_3(FN, DL, X, ...)	FN(X) UNPACK DL REPEAT_2(FN, DL, __VA_ARGS__)
>> +#define REPEAT_4(FN, DL, X, ...)	FN(X) UNPACK DL REPEAT_3(FN, DL, __VA_ARGS__)
>> +#define REPEAT_5(FN, DL, X, ...)	FN(X) UNPACK DL REPEAT_4(FN, DL, __VA_ARGS__)
>> +#define REPEAT_6(FN, DL, X, ...)	FN(X) UNPACK DL REPEAT_5(FN, DL, __VA_ARGS__)
>> +#define REPEAT_7(FN, DL, X, ...)	FN(X) UNPACK DL REPEAT_6(FN, DL, __VA_ARGS__)
>> +#define REPEAT_8(FN, DL, X, ...)	FN(X) UNPACK DL REPEAT_7(FN, DL, __VA_ARGS__)
>> +#define REPEAT_9(FN, DL, X, ...)	FN(X) UNPACK DL REPEAT_8(FN, DL, __VA_ARGS__)
>> +#define REPEAT_10(FN, DL, X, ...)	FN(X) UNPACK DL REPEAT_9(FN, DL, __VA_ARGS__)
>> +#define REPEAT_11(FN, DL, X, ...)	FN(X) UNPACK DL REPEAT_10(FN, DL, __VA_ARGS__)
>> +#define REPEAT_12(FN, DL, X, ...)	FN(X) UNPACK DL REPEAT_11(FN, DL, __VA_ARGS__)
>> +#define REPEAT(X, FN, DL, ...)		REPEAT_##X(FN, DL, __VA_ARGS__)
>> +

>> +
>> +	snprintf(buf, sizeof(buf), "__bpf_trace_%s", tp->name);
>> +	addr = kallsyms_lookup_name(buf);
>> +	if (!addr)
>> +		return -ENOENT;
>> +
>> +	return tracepoint_probe_register(tp, (void *)addr, prog);
>
> You are putting in a hell of a lot of trust with kallsyms returning
> properly. I can see this being very fragile. This is calling a function
> based on the result of kallsyms. I'm sure the security folks would love
> this.
>
> There's a few things to make this a bit more robust. One is to add a
> table that points to all __bpf_trace_* functions, and verify that the
> result from kallsyms is in that table.
>
> Honestly, I think this is too much of a short cut and a hack. I know
> you want to keep it "simple" and save space, but you really should do
> it the same way ftrace and perf do it. That is, create a section and
> have all tracepoints create a structure that holds a pointer to the
> tracepoint and to the bpf probe function. Then you don't even need the
> kernel_tracepoint_find_by_name(), you just iterate over your table and
> you get the tracepoint and the bpf function associated to it.
>
> Relying on kallsyms to return an address to execute is just way too
> extreme and fragile for my liking.

Wasting extra 8bytes * number_of_tracepoints just for lack of trust
in kallsyms doesn't sound like good trade off to me.
If kallsyms are inaccurate all sorts of things will break:
kprobes, livepatch, etc.
I'd rather suggest for ftrace to use kallsyms approach as well
and reduce memory footprint.

WARNING: multiple messages have this Message-ID (diff)
From: Alexei Starovoitov <ast@fb.com>
To: Steven Rostedt <rostedt@goodmis.org>
Cc: davem@davemloft.net, daniel@iogearbox.net,
	torvalds@linux-foundation.org, peterz@infradead.org,
	netdev@vger.kernel.org, kernel-team@fb.com,
	linux-api@vger.kernel.org,
	Mathieu Desnoyers <mathieu.desnoyers@efficios.com>,
	Kees Cook <keescook@chromium.org>
Subject: Re: [PATCH v6 bpf-next 08/11] bpf: introduce BPF_RAW_TRACEPOINT
Date: Tue, 27 Mar 2018 11:45:34 -0700	[thread overview]
Message-ID: <f071b59d-5d67-0eb6-c5b1-68094b0a5118@fb.com> (raw)
In-Reply-To: <20180327130211.284c8924@gandalf.local.home>

On 3/27/18 10:02 AM, Steven Rostedt wrote:
> On Mon, 26 Mar 2018 19:47:03 -0700
> Alexei Starovoitov <ast@fb.com> wrote:
>
>
>> Ctrl-C of tracing daemon or cmdline tool that uses this feature
>> will automatically detach bpf program, unload it and
>> unregister tracepoint probe.
>>
>> On the kernel side for_each_kernel_tracepoint() is used
>
> You need to update the change log to state
> kernel_tracepoint_find_by_name().

ahh. right. will do.

>> +#undef DECLARE_EVENT_CLASS
>> +#define DECLARE_EVENT_CLASS(call, proto, args, tstruct, assign, print)	\
>> +/* no 'static' here. The bpf probe functions are global */		\
>> +notrace void								\
>
> I'm curious to why you have notrace here? Since it is separate from
> perf and ftrace, for debugging purposes, it could be useful to allow
> function tracing to this function.

To avoid unnecessary overhead. And I don't think it's useful to trace 
them. They're tiny jump functions of one or two instructions.
Really no point wasting mentry on them.


>> +static int bpf_raw_tracepoint_open(const union bpf_attr *attr)
>> +{
>> +	struct bpf_raw_tracepoint *raw_tp;
>> +	struct tracepoint *tp;
>> +	struct bpf_prog *prog;
>> +	char tp_name[128];
>> +	int tp_fd, err;
>> +
>> +	if (strncpy_from_user(tp_name, u64_to_user_ptr(attr->raw_tracepoint.name),
>> +			      sizeof(tp_name) - 1) < 0)
>> +		return -EFAULT;
>> +	tp_name[sizeof(tp_name) - 1] = 0;
>> +
>> +	tp = kernel_tracepoint_find_by_name(tp_name);
>> +	if (!tp)
>> +		return -ENOENT;
>> +
>> +	raw_tp = kmalloc(sizeof(*raw_tp), GFP_USER | __GFP_ZERO);
>
> Please use kzalloc(), instead of open coding the "__GPF_ZERO"

right. will do

>
> Could you add some comments here to explain what the below is doing.

To write a proper comment I need to understand it and I don't.
That's the reason why I didn't put in in common header,
because it would require proper comment on what it is and
how one can use it.
I'm expecting Daniel to follow up on this.

>> +#define UNPACK(...)			__VA_ARGS__
>> +#define REPEAT_1(FN, DL, X, ...)	FN(X)
>> +#define REPEAT_2(FN, DL, X, ...)	FN(X) UNPACK DL REPEAT_1(FN, DL, __VA_ARGS__)
>> +#define REPEAT_3(FN, DL, X, ...)	FN(X) UNPACK DL REPEAT_2(FN, DL, __VA_ARGS__)
>> +#define REPEAT_4(FN, DL, X, ...)	FN(X) UNPACK DL REPEAT_3(FN, DL, __VA_ARGS__)
>> +#define REPEAT_5(FN, DL, X, ...)	FN(X) UNPACK DL REPEAT_4(FN, DL, __VA_ARGS__)
>> +#define REPEAT_6(FN, DL, X, ...)	FN(X) UNPACK DL REPEAT_5(FN, DL, __VA_ARGS__)
>> +#define REPEAT_7(FN, DL, X, ...)	FN(X) UNPACK DL REPEAT_6(FN, DL, __VA_ARGS__)
>> +#define REPEAT_8(FN, DL, X, ...)	FN(X) UNPACK DL REPEAT_7(FN, DL, __VA_ARGS__)
>> +#define REPEAT_9(FN, DL, X, ...)	FN(X) UNPACK DL REPEAT_8(FN, DL, __VA_ARGS__)
>> +#define REPEAT_10(FN, DL, X, ...)	FN(X) UNPACK DL REPEAT_9(FN, DL, __VA_ARGS__)
>> +#define REPEAT_11(FN, DL, X, ...)	FN(X) UNPACK DL REPEAT_10(FN, DL, __VA_ARGS__)
>> +#define REPEAT_12(FN, DL, X, ...)	FN(X) UNPACK DL REPEAT_11(FN, DL, __VA_ARGS__)
>> +#define REPEAT(X, FN, DL, ...)		REPEAT_##X(FN, DL, __VA_ARGS__)
>> +

>> +
>> +	snprintf(buf, sizeof(buf), "__bpf_trace_%s", tp->name);
>> +	addr = kallsyms_lookup_name(buf);
>> +	if (!addr)
>> +		return -ENOENT;
>> +
>> +	return tracepoint_probe_register(tp, (void *)addr, prog);
>
> You are putting in a hell of a lot of trust with kallsyms returning
> properly. I can see this being very fragile. This is calling a function
> based on the result of kallsyms. I'm sure the security folks would love
> this.
>
> There's a few things to make this a bit more robust. One is to add a
> table that points to all __bpf_trace_* functions, and verify that the
> result from kallsyms is in that table.
>
> Honestly, I think this is too much of a short cut and a hack. I know
> you want to keep it "simple" and save space, but you really should do
> it the same way ftrace and perf do it. That is, create a section and
> have all tracepoints create a structure that holds a pointer to the
> tracepoint and to the bpf probe function. Then you don't even need the
> kernel_tracepoint_find_by_name(), you just iterate over your table and
> you get the tracepoint and the bpf function associated to it.
>
> Relying on kallsyms to return an address to execute is just way too
> extreme and fragile for my liking.

Wasting extra 8bytes * number_of_tracepoints just for lack of trust
in kallsyms doesn't sound like good trade off to me.
If kallsyms are inaccurate all sorts of things will break:
kprobes, livepatch, etc.
I'd rather suggest for ftrace to use kallsyms approach as well
and reduce memory footprint.

  parent reply	other threads:[~2018-03-27 18:46 UTC|newest]

Thread overview: 57+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-03-27  2:46 [PATCH v6 bpf-next 00/11] bpf, tracing: introduce bpf raw tracepoints Alexei Starovoitov
2018-03-27  2:46 ` Alexei Starovoitov
2018-03-27  2:46 ` [PATCH v6 bpf-next 01/11] treewide: remove large struct-pass-by-value from tracepoint arguments Alexei Starovoitov
2018-03-27  2:46   ` Alexei Starovoitov
2018-03-27  2:46 ` [PATCH v6 bpf-next 02/11] net/mediatek: disambiguate mt76 vs mt7601u trace events Alexei Starovoitov
2018-03-27  2:46   ` Alexei Starovoitov
2018-03-27  2:46 ` [PATCH v6 bpf-next 03/11] net/mac802154: disambiguate mac80215 vs mac802154 " Alexei Starovoitov
2018-03-27  2:46   ` Alexei Starovoitov
2018-03-27  2:46 ` [PATCH v6 bpf-next 04/11] net/wireless/iwlwifi: fix iwlwifi_dev_ucode_error tracepoint Alexei Starovoitov
2018-03-27  2:46   ` Alexei Starovoitov
2018-03-27  2:47 ` [PATCH v6 bpf-next 05/11] macro: introduce COUNT_ARGS() macro Alexei Starovoitov
2018-03-27  2:47   ` Alexei Starovoitov
2018-03-27  2:47 ` [PATCH v6 bpf-next 06/11] tracepoint: compute num_args at build time Alexei Starovoitov
2018-03-27  2:47   ` Alexei Starovoitov
2018-03-27 15:15   ` Steven Rostedt
2018-03-27 15:15     ` Steven Rostedt
2018-03-27  2:47 ` [PATCH v6 bpf-next 07/11] tracepoint: introduce kernel_tracepoint_find_by_name Alexei Starovoitov
2018-03-27  2:47   ` Alexei Starovoitov
2018-03-27 14:07   ` Steven Rostedt
2018-03-27 14:07     ` Steven Rostedt
2018-03-27 14:18     ` Mathieu Desnoyers
2018-03-27 14:42       ` Steven Rostedt
2018-03-27 15:53         ` Alexei Starovoitov
2018-03-27 16:09           ` Mathieu Desnoyers
2018-03-27 16:36           ` Daniel Borkmann
2018-03-27  2:47 ` [PATCH v6 bpf-next 08/11] bpf: introduce BPF_RAW_TRACEPOINT Alexei Starovoitov
2018-03-27  2:47   ` Alexei Starovoitov
2018-03-27 17:02   ` Steven Rostedt
2018-03-27 17:02     ` Steven Rostedt
2018-03-27 17:11     ` Steven Rostedt
2018-03-27 17:11       ` Steven Rostedt
2018-03-27 18:58       ` Steven Rostedt
2018-03-27 18:58         ` Steven Rostedt
2018-03-27 21:04         ` Steven Rostedt
2018-03-27 21:04           ` Steven Rostedt
2018-03-27 22:48           ` Alexei Starovoitov
2018-03-27 22:48             ` Alexei Starovoitov
2018-03-27 23:13             ` Mathieu Desnoyers
2018-03-28  0:00               ` Alexei Starovoitov
2018-03-28  0:44                 ` Mathieu Desnoyers
2018-03-28  0:51                   ` Alexei Starovoitov
2018-03-28 14:06                     ` Steven Rostedt
2018-03-27 18:45     ` Alexei Starovoitov [this message]
2018-03-27 18:45       ` Alexei Starovoitov
2018-03-27 19:00       ` Steven Rostedt
2018-03-27 19:00         ` Steven Rostedt
2018-03-27 19:07         ` Steven Rostedt
2018-03-27 19:07           ` Steven Rostedt
2018-03-27 19:10         ` Steven Rostedt
2018-03-27 19:10           ` Steven Rostedt
2018-03-27 19:10         ` Mathieu Desnoyers
2018-03-27  2:47 ` [PATCH v6 bpf-next 09/11] libbpf: add bpf_raw_tracepoint_open helper Alexei Starovoitov
2018-03-27  2:47   ` Alexei Starovoitov
2018-03-27  2:47 ` [PATCH v6 bpf-next 10/11] samples/bpf: raw tracepoint test Alexei Starovoitov
2018-03-27  2:47   ` Alexei Starovoitov
2018-03-27  2:47 ` [PATCH v6 bpf-next 11/11] selftests/bpf: test for bpf_get_stackid() from raw tracepoints Alexei Starovoitov
2018-03-27  2:47   ` Alexei Starovoitov

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=f071b59d-5d67-0eb6-c5b1-68094b0a5118@fb.com \
    --to=ast@fb.com \
    --cc=daniel@iogearbox.net \
    --cc=davem@davemloft.net \
    --cc=keescook@chromium.org \
    --cc=kernel-team@fb.com \
    --cc=linux-api@vger.kernel.org \
    --cc=mathieu.desnoyers@efficios.com \
    --cc=netdev@vger.kernel.org \
    --cc=peterz@infradead.org \
    --cc=rostedt@goodmis.org \
    --cc=torvalds@linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.