* [RFC PATCH bpf-next 0/2] bpf: keep track of prog verification stats
@ 2021-09-20 15:11 Dave Marchevsky
  2021-09-20 15:11 ` [RFC PATCH bpf-next 1/2] bpf: add verifier stats to bpf_prog_info and fdinfo Dave Marchevsky
                   ` (2 more replies)
  0 siblings, 3 replies; 12+ messages in thread
From: Dave Marchevsky @ 2021-09-20 15:11 UTC (permalink / raw)
  To: bpf
  Cc: netdev, Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Yonghong Song, Dave Marchevsky

The verifier currently logs some useful statistics in
print_verification_stats. Although the text log is an effective feedback
tool for an engineer iterating on a single application, it would also be
useful to enable tracking these stats in a more structured form for
fleetwide or historical analysis, which this patchset attempts to do.

A concrete motivating usecase which came up in recent weeks:

A team owns a complex BPF program, with various folks extending its
functionality over the years. An engineer tries to make a relatively
simple addition but encounters "BPF program is too large. Processed
1000001 insn". 

Their changes bumped the processed insns from 700k to over the limit and
there's no obvious way to simplify. They must now consider a large
refactor in order to incorporate the new feature. What if there was some
previous change which bumped processed insns from 200k->700k which
_could_ be modified to stress verifier less? Tracking historical
verifier stats for each version of the program over the years would
reduce manual work necessary to find such a change.


Although parsing the text log could work for this scenario, a solution
that's resilient to log format and other verifier changes would be
preferable.

This patchset adds a bpf_prog_verif_stats struct - containing the same
data logged by print_verification_stats - which can be retrieved as part
of bpf_prog_info; a usage sketch follows the list below. Looking for
general feedback on approach and a few specific areas before fleshing it
out further:

* None of my usecases require storing verif_stats for the lifetime of a
  loaded prog, but adding to bpf_prog_aux felt more correct than trying
  to pass verif_stats back as part of BPF_PROG_LOAD
* The verif_stats are probably not generally useful enough to warrant
  inclusion in fdinfo, but hoping to get confirmation before removing
  that change in patch 1
* insn_processed, verification_time, and total_states are immediately
  useful for me; the rest were added for parity with
  print_verification_stats. Can remove.
* Perhaps a version field would be useful in verif_stats in case future
  verifier changes make some current stats meaningless
* Note: stack_depth stat was intentionally skipped to keep patch 1
  simple. Will add if approach looks good.
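
An illustrative sketch (not part of the patches) of how a userspace
collector might read these stats once the bpf_prog_info change from
patch 1 is in place; it only assumes the proposed verif_stats member
plus the existing libbpf bpf_obj_get_info_by_fd() helper:

#include <errno.h>
#include <stdio.h>
#include <bpf/bpf.h>
#include <linux/bpf.h>

/* Sketch only: requires a kernel and UAPI headers with patch 1 applied,
 * so that struct bpf_prog_info carries the proposed verif_stats field.
 */
static int dump_verif_stats(int prog_fd)
{
	struct bpf_prog_info info = {};
	__u32 len = sizeof(info);

	if (bpf_obj_get_info_by_fd(prog_fd, &info, &len))
		return -errno;

	printf("insn_processed=%u total_states=%u verification_time=%llu\n",
	       info.verif_stats.insn_processed,
	       info.verif_stats.total_states,
	       (unsigned long long)info.verif_stats.verification_time);
	return 0;
}

Such a helper could run right after BPF_PROG_LOAD, or against prog fds
obtained via BPF_PROG_GET_FD_BY_ID when sweeping already-loaded programs.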

Dave Marchevsky (2):
  bpf: add verifier stats to bpf_prog_info and fdinfo
  selftests/bpf: add verif_stats test

 include/linux/bpf.h                           |  1 +
 include/uapi/linux/bpf.h                      | 10 ++++++
 kernel/bpf/syscall.c                          | 20 +++++++++--
 kernel/bpf/verifier.c                         | 13 +++++++
 tools/include/uapi/linux/bpf.h                | 10 ++++++
 .../selftests/bpf/prog_tests/verif_stats.c    | 34 +++++++++++++++++++
 6 files changed, 86 insertions(+), 2 deletions(-)
 create mode 100644 tools/testing/selftests/bpf/prog_tests/verif_stats.c

-- 
2.30.2



* [RFC PATCH bpf-next 1/2] bpf: add verifier stats to bpf_prog_info and fdinfo
  2021-09-20 15:11 [RFC PATCH bpf-next 0/2] bpf: keep track of prog verification stats Dave Marchevsky
@ 2021-09-20 15:11 ` Dave Marchevsky
  2021-09-20 15:11 ` [RFC PATCH bpf-next 2/2] selftests/bpf: add verif_stats test Dave Marchevsky
  2021-09-23 20:51 ` [RFC PATCH bpf-next 0/2] bpf: keep track of prog verification stats Alexei Starovoitov
  2 siblings, 0 replies; 12+ messages in thread
From: Dave Marchevsky @ 2021-09-20 15:11 UTC (permalink / raw)
  To: bpf
  Cc: netdev, Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Yonghong Song, Dave Marchevsky

These stats are currently printed in the verifier log and not stored
anywhere. To ease consumption of this data, add a bpf_prog_verif_stats
struct to bpf_prog_aux so they can be exposed via BPF_OBJ_GET_INFO_BY_FD
and fdinfo.

Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
---
 include/linux/bpf.h            |  1 +
 include/uapi/linux/bpf.h       | 10 ++++++++++
 kernel/bpf/syscall.c           | 20 ++++++++++++++++++--
 kernel/bpf/verifier.c          | 13 +++++++++++++
 tools/include/uapi/linux/bpf.h | 10 ++++++++++
 5 files changed, 52 insertions(+), 2 deletions(-)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index b6c45a6cbbba..206c19b253b7 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -876,6 +876,7 @@ struct bpf_prog_aux {
 	struct bpf_kfunc_desc_tab *kfunc_tab;
 	u32 size_poke_tab;
 	struct bpf_ksym ksym;
+	struct bpf_prog_verif_stats verif_stats;
 	const struct bpf_prog_ops *ops;
 	struct bpf_map **used_maps;
 	struct mutex used_maps_mutex; /* mutex for used_maps and used_map_cnt */
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 6fc59d61937a..cb0fa49e62d7 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -5576,6 +5576,15 @@ struct sk_reuseport_md {
 
 #define BPF_TAG_SIZE	8
 
+struct bpf_prog_verif_stats {
+	__u64 verification_time;
+	__u32 insn_processed;
+	__u32 max_states_per_insn;
+	__u32 total_states;
+	__u32 peak_states;
+	__u32 longest_mark_read_walk;
+};
+
 struct bpf_prog_info {
 	__u32 type;
 	__u32 id;
@@ -5613,6 +5622,7 @@ struct bpf_prog_info {
 	__u64 run_time_ns;
 	__u64 run_cnt;
 	__u64 recursion_misses;
+	struct bpf_prog_verif_stats verif_stats;
 } __attribute__((aligned(8)));
 
 struct bpf_map_info {
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 4e50c0bfdb7d..0fa95ebd4276 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -1836,9 +1836,11 @@ static void bpf_prog_show_fdinfo(struct seq_file *m, struct file *filp)
 {
 	const struct bpf_prog *prog = filp->private_data;
 	char prog_tag[sizeof(prog->tag) * 2 + 1] = { };
+	struct bpf_prog_verif_stats *verif_stats;
 	struct bpf_prog_stats stats;
 
 	bpf_prog_get_stats(prog, &stats);
+	verif_stats = &prog->aux->verif_stats;
 	bin2hex(prog_tag, prog->tag, sizeof(prog->tag));
 	seq_printf(m,
 		   "prog_type:\t%u\n"
@@ -1848,7 +1850,13 @@ static void bpf_prog_show_fdinfo(struct seq_file *m, struct file *filp)
 		   "prog_id:\t%u\n"
 		   "run_time_ns:\t%llu\n"
 		   "run_cnt:\t%llu\n"
-		   "recursion_misses:\t%llu\n",
+		   "recursion_misses:\t%llu\n"
+		   "verification_time:\t%llu\n"
+		   "verif_insn_processed:\t%u\n"
+		   "verif_max_states_per_insn:\t%u\n"
+		   "verif_total_states:\t%u\n"
+		   "verif_peak_states:\t%u\n"
+		   "verif_longest_mark_read_walk:\t%u\n",
 		   prog->type,
 		   prog->jited,
 		   prog_tag,
@@ -1856,7 +1864,13 @@ static void bpf_prog_show_fdinfo(struct seq_file *m, struct file *filp)
 		   prog->aux->id,
 		   stats.nsecs,
 		   stats.cnt,
-		   stats.misses);
+		   stats.misses,
+		   verif_stats->verification_time,
+		   verif_stats->insn_processed,
+		   verif_stats->max_states_per_insn,
+		   verif_stats->total_states,
+		   verif_stats->peak_states,
+		   verif_stats->longest_mark_read_walk);
 }
 #endif
 
@@ -3625,6 +3639,8 @@ static int bpf_prog_get_info_by_fd(struct file *file,
 	info.run_cnt = stats.cnt;
 	info.recursion_misses = stats.misses;
 
+	info.verif_stats = prog->aux->verif_stats;
+
 	if (!bpf_capable()) {
 		info.jited_prog_len = 0;
 		info.xlated_prog_len = 0;
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index e76b55917905..97cd3b71d5ae 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -13245,6 +13245,18 @@ static void print_verification_stats(struct bpf_verifier_env *env)
 		env->peak_states, env->longest_mark_read_walk);
 }
 
+static void populate_aux_verif_stats(struct bpf_verifier_env *env)
+{
+	struct bpf_prog_verif_stats *verif_stats = &env->prog->aux->verif_stats;
+
+	verif_stats->verification_time = env->verification_time;
+	verif_stats->insn_processed = env->insn_processed;
+	verif_stats->max_states_per_insn = env->max_states_per_insn;
+	verif_stats->total_states = env->total_states;
+	verif_stats->peak_states = env->peak_states;
+	verif_stats->longest_mark_read_walk = env->longest_mark_read_walk;
+}
+
 static int check_struct_ops_btf_id(struct bpf_verifier_env *env)
 {
 	const struct btf_type *t, *func_proto;
@@ -13826,6 +13838,7 @@ int bpf_check(struct bpf_prog **prog, union bpf_attr *attr, bpfptr_t uattr)
 
 	env->verification_time = ktime_get_ns() - start_time;
 	print_verification_stats(env);
+	populate_aux_verif_stats(env);
 
 	if (log->level && bpf_verifier_log_full(log))
 		ret = -ENOSPC;
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index 6fc59d61937a..cb0fa49e62d7 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -5576,6 +5576,15 @@ struct sk_reuseport_md {
 
 #define BPF_TAG_SIZE	8
 
+struct bpf_prog_verif_stats {
+	__u64 verification_time;
+	__u32 insn_processed;
+	__u32 max_states_per_insn;
+	__u32 total_states;
+	__u32 peak_states;
+	__u32 longest_mark_read_walk;
+};
+
 struct bpf_prog_info {
 	__u32 type;
 	__u32 id;
@@ -5613,6 +5622,7 @@ struct bpf_prog_info {
 	__u64 run_time_ns;
 	__u64 run_cnt;
 	__u64 recursion_misses;
+	struct bpf_prog_verif_stats verif_stats;
 } __attribute__((aligned(8)));
 
 struct bpf_map_info {
-- 
2.30.2



* [RFC PATCH bpf-next 2/2] selftests/bpf: add verif_stats test
  2021-09-20 15:11 [RFC PATCH bpf-next 0/2] bpf: keep track of prog verification stats Dave Marchevsky
  2021-09-20 15:11 ` [RFC PATCH bpf-next 1/2] bpf: add verifier stats to bpf_prog_info and fdinfo Dave Marchevsky
@ 2021-09-20 15:11 ` Dave Marchevsky
  2021-09-23 20:51 ` [RFC PATCH bpf-next 0/2] bpf: keep track of prog verification stats Alexei Starovoitov
  2 siblings, 0 replies; 12+ messages in thread
From: Dave Marchevsky @ 2021-09-20 15:11 UTC (permalink / raw)
  To: bpf
  Cc: netdev, Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Yonghong Song, Dave Marchevsky

verif_stats fields were added to response of bpf_obj_get_info_by_fd
call on a prog. Confirm that they're being populated by loading a simple
program and asking for its info.

Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
---
 .../selftests/bpf/prog_tests/verif_stats.c    | 34 +++++++++++++++++++
 1 file changed, 34 insertions(+)
 create mode 100644 tools/testing/selftests/bpf/prog_tests/verif_stats.c

diff --git a/tools/testing/selftests/bpf/prog_tests/verif_stats.c b/tools/testing/selftests/bpf/prog_tests/verif_stats.c
new file mode 100644
index 000000000000..7bd9ccb0efb8
--- /dev/null
+++ b/tools/testing/selftests/bpf/prog_tests/verif_stats.c
@@ -0,0 +1,34 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2021 Facebook */
+
+#include <test_progs.h>
+
+#include "trace_vprintk.lskel.h"
+
+void test_verif_stats(void)
+{
+	__u32 len = sizeof(struct bpf_prog_info);
+	struct bpf_prog_info info = {};
+	struct trace_vprintk *skel;
+	int err;
+
+	skel = trace_vprintk__open_and_load();
+	if (!ASSERT_OK_PTR(skel, "trace_vprintk__open_and_load"))
+		goto cleanup;
+
+	if (!ASSERT_GT(skel->progs.sys_enter.prog_fd, 0, "sys_enter_fd > 0"))
+		goto cleanup;
+
+	err = bpf_obj_get_info_by_fd(skel->progs.sys_enter.prog_fd, &info, &len);
+	if (!ASSERT_OK(err, "bpf_obj_get_info_by_fd"))
+		goto cleanup;
+
+	if (!ASSERT_GT(info.verif_stats.insn_processed, 0, "verif_stats.insn_processed"))
+		goto cleanup;
+
+	if (!ASSERT_GT(info.verif_stats.total_states, 0, "verif_stats.total_states"))
+		goto cleanup;
+
+cleanup:
+	trace_vprintk__destroy(skel);
+}
-- 
2.30.2



* Re: [RFC PATCH bpf-next 0/2] bpf: keep track of prog verification stats
  2021-09-20 15:11 [RFC PATCH bpf-next 0/2] bpf: keep track of prog verification stats Dave Marchevsky
  2021-09-20 15:11 ` [RFC PATCH bpf-next 1/2] bpf: add verifier stats to bpf_prog_info and fdinfo Dave Marchevsky
  2021-09-20 15:11 ` [RFC PATCH bpf-next 2/2] selftests/bpf: add verif_stats test Dave Marchevsky
@ 2021-09-23 20:51 ` Alexei Starovoitov
  2021-09-24  1:27   ` Dave Marchevsky
  2 siblings, 1 reply; 12+ messages in thread
From: Alexei Starovoitov @ 2021-09-23 20:51 UTC (permalink / raw)
  To: Dave Marchevsky
  Cc: bpf, netdev, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, Yonghong Song

On Mon, Sep 20, 2021 at 08:11:10AM -0700, Dave Marchevsky wrote:
> The verifier currently logs some useful statistics in
> print_verification_stats. Although the text log is an effective feedback
> tool for an engineer iterating on a single application, it would also be
> useful to enable tracking these stats in a more structured form for
> fleetwide or historical analysis, which this patchset attempts to do.
> 
> A concrete motivating usecase which came up in recent weeks:
> 
> A team owns a complex BPF program, with various folks extending its
> functionality over the years. An engineer tries to make a relatively
> simple addition but encounters "BPF program is too large. Processed
> 1000001 insn". 
> 
> Their changes bumped the processed insns from 700k to over the limit and
> there's no obvious way to simplify. They must now consider a large
> refactor in order to incorporate the new feature. What if there was some
> previous change which bumped processed insns from 200k->700k which
> _could_ be modified to stress verifier less? Tracking historical
> verifier stats for each version of the program over the years would
> reduce manual work necessary to find such a change.
> 
> 
> Although parsing the text log could work for this scenario, a solution
> that's resilient to log format and other verifier changes would be
> preferable.
> 
> This patchset adds a bpf_prog_verif_stats struct - containing the same
> data logged by print_verification_stats - which can be retrieved as part
> of bpf_prog_info. Looking for general feedback on approach and a few
> specific areas before fleshing it out further:
> 
> * None of my usecases require storing verif_stats for the lifetime of a
>   loaded prog, but adding to bpf_prog_aux felt more correct than trying
>   to pass verif_stats back as part of BPF_PROG_LOAD
> * The verif_stats are probably not generally useful enough to warrant
>   inclusion in fdinfo, but hoping to get confirmation before removing
>   that change in patch 1
> * processed_insn, verification_time, and total_states are immediately
>   useful for me, rest were added for parity with
> 	print_verification_stats. Can remove.
> * Perhaps a version field would be useful in verif_stats in case future
>   verifier changes make some current stats meaningless
> * Note: stack_depth stat was intentionally skipped to keep patch 1
>   simple. Will add if approach looks good.

Sorry for the delay. LPC consumes a lot of mental energy :)

I see the value of exposing some of the verification stats as prog_info.
Let's look at the list:
struct bpf_prog_verif_stats {
       __u64 verification_time;
       __u32 insn_processed;
       __u32 max_states_per_insn;
       __u32 total_states;
       __u32 peak_states;
       __u32 longest_mark_read_walk;
};
verification_time is non-deterministic. It varies with CPU frequency and
from run to run. I don't see how alerting tools can use it.

insn_processed is indeed the main verification metric.
By now it's well known and understood.

max_states_per_insn, total_states, etc. were the metrics I've studied
carefully with pruning, backtracking and pretty much every significant
change I did or reviewed in the verifier. They're useful to humans
and developers, but I don't see how alerting tools will use them.

So it feels to me that insn_processed alone will be enough to address the
monitoring goal.
It can be exposed in fdinfo and printed by bpftool.
If/when it changes with some future verifier algorithm we should be able
to approximate it.


* Re: [RFC PATCH bpf-next 0/2] bpf: keep track of prog verification stats
  2021-09-23 20:51 ` [RFC PATCH bpf-next 0/2] bpf: keep track of prog verification stats Alexei Starovoitov
@ 2021-09-24  1:27   ` Dave Marchevsky
  2021-09-24  2:02     ` Andrii Nakryiko
  0 siblings, 1 reply; 12+ messages in thread
From: Dave Marchevsky @ 2021-09-24  1:27 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: bpf, netdev, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, Yonghong Song

On 9/23/21 4:51 PM, Alexei Starovoitov wrote:   
> On Mon, Sep 20, 2021 at 08:11:10AM -0700, Dave Marchevsky wrote:
>> The verifier currently logs some useful statistics in
>> print_verification_stats. Although the text log is an effective feedback
>> tool for an engineer iterating on a single application, it would also be
>> useful to enable tracking these stats in a more structured form for
>> fleetwide or historical analysis, which this patchset attempts to do.
>>
>> A concrete motivating usecase which came up in recent weeks:
>>
>> A team owns a complex BPF program, with various folks extending its
>> functionality over the years. An engineer tries to make a relatively
>> simple addition but encounters "BPF program is too large. Processed
>> 1000001 insn". 
>>
>> Their changes bumped the processed insns from 700k to over the limit and
>> there's no obvious way to simplify. They must now consider a large
>> refactor in order to incorporate the new feature. What if there was some
>> previous change which bumped processed insns from 200k->700k which
>> _could_ be modified to stress verifier less? Tracking historical
>> verifier stats for each version of the program over the years would
>> reduce manual work necessary to find such a change.
>>
>>
>> Although parsing the text log could work for this scenario, a solution
>> that's resilient to log format and other verifier changes would be
>> preferable.
>>
>> This patchset adds a bpf_prog_verif_stats struct - containing the same
>> data logged by print_verification_stats - which can be retrieved as part
>> of bpf_prog_info. Looking for general feedback on approach and a few
>> specific areas before fleshing it out further:
>>
>> * None of my usecases require storing verif_stats for the lifetime of a
>>   loaded prog, but adding to bpf_prog_aux felt more correct than trying
>>   to pass verif_stats back as part of BPF_PROG_LOAD
>> * The verif_stats are probably not generally useful enough to warrant
>>   inclusion in fdinfo, but hoping to get confirmation before removing
>>   that change in patch 1
>> * processed_insn, verification_time, and total_states are immediately
>>   useful for me, rest were added for parity with
>> 	print_verification_stats. Can remove.
>> * Perhaps a version field would be useful in verif_stats in case future
>>   verifier changes make some current stats meaningless
>> * Note: stack_depth stat was intentionally skipped to keep patch 1
>>   simple. Will add if approach looks good.
> 
> Sorry for the delay. LPC consumes a lot of mental energy :)
> 
> I see the value of exposing some of the verification stats as prog_info.
> Let's look at the list:
> struct bpf_prog_verif_stats {
>        __u64 verification_time;
>        __u32 insn_processed;
>        __u32 max_states_per_insn;
>        __u32 total_states;
>        __u32 peak_states;
>        __u32 longest_mark_read_walk;
> };
> verification_time is non deterministic. It varies with frequency
> and run-to-run. I don't see how alerting tools can use it.

Makes sense to me, will get rid of it.

> insn_processed is indeed the main verification metric.
> By now it's well known and understood.
> 
> max_states_per_insn, total_states, etc were the metrics I've studied
> carefully with pruning, back tracking and pretty much every significant
> change I did or reviewed in the verifier. They're useful to humans
> and developers, but I don't see how alerting tools will use them.
> 
> So it feels to me that insn_processed alone will be enough to address the
> monitoring goal.

For the concrete usecase in my original message insn_processed would be 
enough. For the others - I thought there might be value in gathering
those "fleetwide" to inform verifier development, e.g.:

"Hmm, this team's libbpf program has been regressing total_states over
past few {kernel, llvm} rollouts, but they haven't been modifying it. 
Let's try to get a minimal repro, send to bpf@vger, and contribute to 
selftests if it is indeed hitting a weird verifier edge case"

So for those I'm not expecting them to be useful to alert on or be a
number that the average BPF program writer needs to care about.

Of course this is hypothetical as I haven't tried to gather such data
and look for interesting patterns. But these metrics being useful to
you when looking at significant verifier changes is a good sign. 

> It can be exposed to fd_info and printed by bpftool.
> If/when it changes with some future verifier algorithm we should be able
> to approximate it.
> 



* Re: [RFC PATCH bpf-next 0/2] bpf: keep track of prog verification stats
  2021-09-24  1:27   ` Dave Marchevsky
@ 2021-09-24  2:02     ` Andrii Nakryiko
  2021-09-24 18:24       ` Dave Marchevsky
  0 siblings, 1 reply; 12+ messages in thread
From: Andrii Nakryiko @ 2021-09-24  2:02 UTC (permalink / raw)
  To: Dave Marchevsky
  Cc: Alexei Starovoitov, bpf, Networking, Alexei Starovoitov,
	Daniel Borkmann, Andrii Nakryiko, Yonghong Song

On Thu, Sep 23, 2021 at 6:27 PM Dave Marchevsky <davemarchevsky@fb.com> wrote:
>
> On 9/23/21 4:51 PM, Alexei Starovoitov wrote:
> > On Mon, Sep 20, 2021 at 08:11:10AM -0700, Dave Marchevsky wrote:
> >> The verifier currently logs some useful statistics in
> >> print_verification_stats. Although the text log is an effective feedback
> >> tool for an engineer iterating on a single application, it would also be
> >> useful to enable tracking these stats in a more structured form for
> >> fleetwide or historical analysis, which this patchset attempts to do.
> >>
> >> A concrete motivating usecase which came up in recent weeks:
> >>
> >> A team owns a complex BPF program, with various folks extending its
> >> functionality over the years. An engineer tries to make a relatively
> >> simple addition but encounters "BPF program is too large. Processed
> >> 1000001 insn".
> >>
> >> Their changes bumped the processed insns from 700k to over the limit and
> >> there's no obvious way to simplify. They must now consider a large
> >> refactor in order to incorporate the new feature. What if there was some
> >> previous change which bumped processed insns from 200k->700k which
> >> _could_ be modified to stress verifier less? Tracking historical
> >> verifier stats for each version of the program over the years would
> >> reduce manual work necessary to find such a change.
> >>
> >>
> >> Although parsing the text log could work for this scenario, a solution
> >> that's resilient to log format and other verifier changes would be
> >> preferable.
> >>
> >> This patchset adds a bpf_prog_verif_stats struct - containing the same
> >> data logged by print_verification_stats - which can be retrieved as part
> >> of bpf_prog_info. Looking for general feedback on approach and a few
> >> specific areas before fleshing it out further:
> >>
> >> * None of my usecases require storing verif_stats for the lifetime of a
> >>   loaded prog, but adding to bpf_prog_aux felt more correct than trying
> >>   to pass verif_stats back as part of BPF_PROG_LOAD
> >> * The verif_stats are probably not generally useful enough to warrant
> >>   inclusion in fdinfo, but hoping to get confirmation before removing
> >>   that change in patch 1
> >> * processed_insn, verification_time, and total_states are immediately
> >>   useful for me, rest were added for parity with
> >>      print_verification_stats. Can remove.
> >> * Perhaps a version field would be useful in verif_stats in case future
> >>   verifier changes make some current stats meaningless
> >> * Note: stack_depth stat was intentionally skipped to keep patch 1
> >>   simple. Will add if approach looks good.
> >
> > Sorry for the delay. LPC consumes a lot of mental energy :)
> >
> > I see the value of exposing some of the verification stats as prog_info.
> > Let's look at the list:
> > struct bpf_prog_verif_stats {
> >        __u64 verification_time;
> >        __u32 insn_processed;
> >        __u32 max_states_per_insn;
> >        __u32 total_states;
> >        __u32 peak_states;
> >        __u32 longest_mark_read_walk;
> > };
> > verification_time is non deterministic. It varies with frequency
> > and run-to-run. I don't see how alerting tools can use it.
>
> Makes sense to me, will get rid of it.
>
> > insn_processed is indeed the main verification metric.
> > By now it's well known and understood.
> >
> > max_states_per_insn, total_states, etc were the metrics I've studied
> > carefully with pruning, back tracking and pretty much every significant
> > change I did or reviewed in the verifier. They're useful to humans
> > and developers, but I don't see how alerting tools will use them.
> >
> > So it feels to me that insn_processed alone will be enough to address the
> > monitoring goal.
>
> For the concrete usecase in my original message insn_processed would be
> enough. For the others - I thought there might be value in gathering
> those "fleetwide" to inform verifier development, e.g.:
>
> "Hmm, this team's libbpf program has been regressing total_states over
> past few {kernel, llvm} rollouts, but they haven't been modifying it.
> Let's try to get a minimal repro, send to bpf@vger, and contribute to
> selftests if it is indeed hitting a weird verifier edge case"
>
> So for those I'm not expecting them to be useful to alert on or be a
> number that the average BPF program writer needs to care about.
>
> Of course this is hypothetical as I haven't tried to gather such data
> and look for interesting patterns. But these metrics being useful to
> you when looking at significant verifier changes is a good sign.

One reason not to add all those fields is to avoid ending up with
meaningless stats (in the future) in UAPI. One way to work around that
is to make the data "unstable" by providing it through a raw tracepoint
as an internal kernel struct.

Basically, the proposal would be: add a new tracepoint for when a BPF
program is verified, either successfully or not. As one of the
parameters, provide a stats struct which is internal to the BPF verifier
and is not exposed through UAPI.
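
A minimal sketch of what that could look like (the tracepoint name,
arguments and placement below are assumptions for discussion, not an
existing API). A "bare" tracepoint via DECLARE_TRACE keeps it out of the
stable ftrace format while still being attachable from BPF:

/* e.g. in include/trace/events/bpf.h (hypothetical) */
DECLARE_TRACE(bpf_prog_verified,
	TP_PROTO(struct bpf_prog *prog, struct bpf_verifier_env *env, int ret),
	TP_ARGS(prog, env, ret)
);

/* ...emitted from the tail of bpf_check(), next to
 * print_verification_stats(), passing the verifier-internal env:
 */
	trace_bpf_prog_verified(env->prog, env, ret);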

Such a tracepoint would actually be useful more generally as well, e.g.,
to monitor which programs are verified in the fleet, what the rate of
success/failure is (to detect verifier regressions), what the stats are
(verification time would actually be good to have there, again for stats
and detecting regressions), etc., etc.

WDYT?

>
> > It can be exposed to fd_info and printed by bpftool.
> > If/when it changes with some future verifier algorithm we should be able
> > to approximate it.
> >
>


* Re: [RFC PATCH bpf-next 0/2] bpf: keep track of prog verification stats
  2021-09-24  2:02     ` Andrii Nakryiko
@ 2021-09-24 18:24       ` Dave Marchevsky
  2021-09-27 18:20         ` John Fastabend
  0 siblings, 1 reply; 12+ messages in thread
From: Dave Marchevsky @ 2021-09-24 18:24 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: Alexei Starovoitov, bpf, Networking, Alexei Starovoitov,
	Daniel Borkmann, Andrii Nakryiko, Yonghong Song

On 9/23/21 10:02 PM, Andrii Nakryiko wrote:   
> On Thu, Sep 23, 2021 at 6:27 PM Dave Marchevsky <davemarchevsky@fb.com> wrote:
>>
>> On 9/23/21 4:51 PM, Alexei Starovoitov wrote:
>>> On Mon, Sep 20, 2021 at 08:11:10AM -0700, Dave Marchevsky wrote:
>>>> The verifier currently logs some useful statistics in
>>>> print_verification_stats. Although the text log is an effective feedback
>>>> tool for an engineer iterating on a single application, it would also be
>>>> useful to enable tracking these stats in a more structured form for
>>>> fleetwide or historical analysis, which this patchset attempts to do.
>>>>
>>>> A concrete motivating usecase which came up in recent weeks:
>>>>
>>>> A team owns a complex BPF program, with various folks extending its
>>>> functionality over the years. An engineer tries to make a relatively
>>>> simple addition but encounters "BPF program is too large. Processed
>>>> 1000001 insn".
>>>>
>>>> Their changes bumped the processed insns from 700k to over the limit and
>>>> there's no obvious way to simplify. They must now consider a large
>>>> refactor in order to incorporate the new feature. What if there was some
>>>> previous change which bumped processed insns from 200k->700k which
>>>> _could_ be modified to stress verifier less? Tracking historical
>>>> verifier stats for each version of the program over the years would
>>>> reduce manual work necessary to find such a change.
>>>>
>>>>
>>>> Although parsing the text log could work for this scenario, a solution
>>>> that's resilient to log format and other verifier changes would be
>>>> preferable.
>>>>
>>>> This patchset adds a bpf_prog_verif_stats struct - containing the same
>>>> data logged by print_verification_stats - which can be retrieved as part
>>>> of bpf_prog_info. Looking for general feedback on approach and a few
>>>> specific areas before fleshing it out further:
>>>>
>>>> * None of my usecases require storing verif_stats for the lifetime of a
>>>>   loaded prog, but adding to bpf_prog_aux felt more correct than trying
>>>>   to pass verif_stats back as part of BPF_PROG_LOAD
>>>> * The verif_stats are probably not generally useful enough to warrant
>>>>   inclusion in fdinfo, but hoping to get confirmation before removing
>>>>   that change in patch 1
>>>> * processed_insn, verification_time, and total_states are immediately
>>>>   useful for me, rest were added for parity with
>>>>      print_verification_stats. Can remove.
>>>> * Perhaps a version field would be useful in verif_stats in case future
>>>>   verifier changes make some current stats meaningless
>>>> * Note: stack_depth stat was intentionally skipped to keep patch 1
>>>>   simple. Will add if approach looks good.
>>>
>>> Sorry for the delay. LPC consumes a lot of mental energy :)
>>>
>>> I see the value of exposing some of the verification stats as prog_info.
>>> Let's look at the list:
>>> struct bpf_prog_verif_stats {
>>>        __u64 verification_time;
>>>        __u32 insn_processed;
>>>        __u32 max_states_per_insn;
>>>        __u32 total_states;
>>>        __u32 peak_states;
>>>        __u32 longest_mark_read_walk;
>>> };
>>> verification_time is non deterministic. It varies with frequency
>>> and run-to-run. I don't see how alerting tools can use it.
>>
>> Makes sense to me, will get rid of it.
>>
>>> insn_processed is indeed the main verification metric.
>>> By now it's well known and understood.
>>>
>>> max_states_per_insn, total_states, etc were the metrics I've studied
>>> carefully with pruning, back tracking and pretty much every significant
>>> change I did or reviewed in the verifier. They're useful to humans
>>> and developers, but I don't see how alerting tools will use them.
>>>
>>> So it feels to me that insn_processed alone will be enough to address the
>>> monitoring goal.
>>
>> For the concrete usecase in my original message insn_processed would be
>> enough. For the others - I thought there might be value in gathering
>> those "fleetwide" to inform verifier development, e.g.:
>>
>> "Hmm, this team's libbpf program has been regressing total_states over
>> past few {kernel, llvm} rollouts, but they haven't been modifying it.
>> Let's try to get a minimal repro, send to bpf@vger, and contribute to
>> selftests if it is indeed hitting a weird verifier edge case"
>>
>> So for those I'm not expecting them to be useful to alert on or be a
>> number that the average BPF program writer needs to care about.
>>
>> Of course this is hypothetical as I haven't tried to gather such data
>> and look for interesting patterns. But these metrics being useful to
>> you when looking at significant verifier changes is a good sign.
> 
> One reason to not add all those fields is to not end up with
> meaningless stats (in the future) in UAPI. One way to work around that
> is to make it "unstable" by providing it through raw_tracepoint as
> internal kernel struct.
> 
> Basically, the proposal would be: add new tracepoint for when BPF
> program is verified, either successfully or not. As one of the
> parameters provide stats struct which is internal to BPF verifier and
> is not exposed through UAPI.
> 
> Such tracepoint actually would be useful more generally as well, e.g.,
> to monitor which programs are verified in the fleet, what's the rate
> of success/failure (to detect verifier regression), what are the stats
> (verification time actually would be good to have there, again for
> stats and detecting regression), etc, etc.
> 
> WDYT?
> 

Seems reasonable to me - and attaching a BPF program to the tracepoint to
grab data is delightfully meta :)

I'll do a pass on an alternate implementation with _just_ the tracepoint,
no prog_info or fdinfo; can add minimal or full stats to those later if
necessary.
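
A sketch of the consumer side, assuming the hypothetical
bpf_prog_verified tracepoint from the proposal above actually existed
(none of these names are real today):

/* verif_stats_collector.bpf.c - sketch only */
#include "vmlinux.h"
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_tracing.h>

char LICENSE[] SEC("license") = "GPL";

SEC("tp_btf/bpf_prog_verified")
int BPF_PROG(on_prog_verified, struct bpf_prog *prog,
	     struct bpf_verifier_env *env, int ret)
{
	/* Ship to userspace however is convenient; printk keeps it short. */
	bpf_printk("prog %u verified: ret=%d insn_processed=%u",
		   prog->aux->id, ret, env->insn_processed);
	return 0;
}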

>>
>>> It can be exposed to fd_info and printed by bpftool.
>>> If/when it changes with some future verifier algorithm we should be able
>>> to approximate it.
>>>
>>



* Re: [RFC PATCH bpf-next 0/2] bpf: keep track of prog verification stats
  2021-09-24 18:24       ` Dave Marchevsky
@ 2021-09-27 18:20         ` John Fastabend
  2021-09-28  0:41           ` Andrii Nakryiko
  0 siblings, 1 reply; 12+ messages in thread
From: John Fastabend @ 2021-09-27 18:20 UTC (permalink / raw)
  To: Dave Marchevsky, Andrii Nakryiko
  Cc: Alexei Starovoitov, bpf, Networking, Alexei Starovoitov,
	Daniel Borkmann, Andrii Nakryiko, Yonghong Song

Dave Marchevsky wrote:
> On 9/23/21 10:02 PM, Andrii Nakryiko wrote:   
> > On Thu, Sep 23, 2021 at 6:27 PM Dave Marchevsky <davemarchevsky@fb.com> wrote:
> >>
> >> On 9/23/21 4:51 PM, Alexei Starovoitov wrote:
> >>> On Mon, Sep 20, 2021 at 08:11:10AM -0700, Dave Marchevsky wrote:
> >>>> The verifier currently logs some useful statistics in
> >>>> print_verification_stats. Although the text log is an effective feedback
> >>>> tool for an engineer iterating on a single application, it would also be
> >>>> useful to enable tracking these stats in a more structured form for
> >>>> fleetwide or historical analysis, which this patchset attempts to do.
> >>>>
> >>>> A concrete motivating usecase which came up in recent weeks:
> >>>>
> >>>> A team owns a complex BPF program, with various folks extending its
> >>>> functionality over the years. An engineer tries to make a relatively
> >>>> simple addition but encounters "BPF program is too large. Processed
> >>>> 1000001 insn".
> >>>>
> >>>> Their changes bumped the processed insns from 700k to over the limit and
> >>>> there's no obvious way to simplify. They must now consider a large
> >>>> refactor in order to incorporate the new feature. What if there was some
> >>>> previous change which bumped processed insns from 200k->700k which
> >>>> _could_ be modified to stress verifier less? Tracking historical
> >>>> verifier stats for each version of the program over the years would
> >>>> reduce manual work necessary to find such a change.
> >>>>
> >>>>
> >>>> Although parsing the text log could work for this scenario, a solution
> >>>> that's resilient to log format and other verifier changes would be
> >>>> preferable.
> >>>>
> >>>> This patchset adds a bpf_prog_verif_stats struct - containing the same
> >>>> data logged by print_verification_stats - which can be retrieved as part
> >>>> of bpf_prog_info. Looking for general feedback on approach and a few
> >>>> specific areas before fleshing it out further:
> >>>>
> >>>> * None of my usecases require storing verif_stats for the lifetime of a
> >>>>   loaded prog, but adding to bpf_prog_aux felt more correct than trying
> >>>>   to pass verif_stats back as part of BPF_PROG_LOAD
> >>>> * The verif_stats are probably not generally useful enough to warrant
> >>>>   inclusion in fdinfo, but hoping to get confirmation before removing
> >>>>   that change in patch 1
> >>>> * processed_insn, verification_time, and total_states are immediately
> >>>>   useful for me, rest were added for parity with
> >>>>      print_verification_stats. Can remove.
> >>>> * Perhaps a version field would be useful in verif_stats in case future
> >>>>   verifier changes make some current stats meaningless
> >>>> * Note: stack_depth stat was intentionally skipped to keep patch 1
> >>>>   simple. Will add if approach looks good.
> >>>
> >>> Sorry for the delay. LPC consumes a lot of mental energy :)
> >>>
> >>> I see the value of exposing some of the verification stats as prog_info.
> >>> Let's look at the list:
> >>> struct bpf_prog_verif_stats {
> >>>        __u64 verification_time;
> >>>        __u32 insn_processed;
> >>>        __u32 max_states_per_insn;
> >>>        __u32 total_states;
> >>>        __u32 peak_states;
> >>>        __u32 longest_mark_read_walk;
> >>> };
> >>> verification_time is non deterministic. It varies with frequency
> >>> and run-to-run. I don't see how alerting tools can use it.
> >>
> >> Makes sense to me, will get rid of it.
> >>
> >>> insn_processed is indeed the main verification metric.
> >>> By now it's well known and understood.
> >>>
> >>> max_states_per_insn, total_states, etc were the metrics I've studied
> >>> carefully with pruning, back tracking and pretty much every significant
> >>> change I did or reviewed in the verifier. They're useful to humans
> >>> and developers, but I don't see how alerting tools will use them.
> >>>
> >>> So it feels to me that insn_processed alone will be enough to address the
> >>> monitoring goal.
> >>
> >> For the concrete usecase in my original message insn_processed would be
> >> enough. For the others - I thought there might be value in gathering
> >> those "fleetwide" to inform verifier development, e.g.:
> >>
> >> "Hmm, this team's libbpf program has been regressing total_states over
> >> past few {kernel, llvm} rollouts, but they haven't been modifying it.
> >> Let's try to get a minimal repro, send to bpf@vger, and contribute to
> >> selftests if it is indeed hitting a weird verifier edge case"
> >>
> >> So for those I'm not expecting them to be useful to alert on or be a
> >> number that the average BPF program writer needs to care about.
> >>
> >> Of course this is hypothetical as I haven't tried to gather such data
> >> and look for interesting patterns. But these metrics being useful to
> >> you when looking at significant verifier changes is a good sign.
> > 
> > One reason to not add all those fields is to not end up with
> > meaningless stats (in the future) in UAPI. One way to work around that
> > is to make it "unstable" by providing it through raw_tracepoint as
> > internal kernel struct.
> > 
> > Basically, the proposal would be: add new tracepoint for when BPF
> > program is verified, either successfully or not. As one of the
> > parameters provide stats struct which is internal to BPF verifier and
> > is not exposed through UAPI.
> > 
> > Such tracepoint actually would be useful more generally as well, e.g.,
> > to monitor which programs are verified in the fleet, what's the rate
> > of success/failure (to detect verifier regression), what are the stats
> > (verification time actually would be good to have there, again for
> > stats and detecting regression), etc, etc.
> > 
> > WDYT?
> > 
> 
> Seems reasonable to me - and attaching a BPF program to the tracepoint to
> grab data is delightfully meta :)
> 
> I'll do a pass on alternate implementation with _just_ tracepoint, no 
> prog_info or fdinfo, can add minimal or full stats to those later if
> necessary.

We can also use a hook point here to enforce policy on whether to allow
the BPF program to load, using the stats here. For now the basic insn
count is a good start for allowing larger/smaller programs to be loaded,
but we might add other info like a call bitmask, features, types, etc.
If one of the arguments is the bpf_attr struct we can just read
lots of useful program info out directly.

We would need something different from a tracepoint, though, to let
it return a reject|accept code. How about a new hook type, similar to
sockops, that lets us just return an accept or reject code?

By doing this we can check loader signatures here to be sure the
loader is signed or otherwise has correct permissions to be loading
whatever type of bpf program is here.
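
For the pre-verification side something close to this already exists: a
BPF LSM program on the bpf() syscall hook sees the bpf_attr and can veto
the load. A sketch (the policy values are made up; note this hook runs
before the verifier, so none of the verification stats are available at
that point):

#include "vmlinux.h"
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_tracing.h>

char LICENSE[] SEC("license") = "GPL";

SEC("lsm/bpf")
int BPF_PROG(gate_prog_load, int cmd, union bpf_attr *attr, unsigned int size)
{
	if (cmd != BPF_PROG_LOAD)
		return 0;

	/* Example policy: cap the raw instruction count of loaded programs.
	 * Note this is attr->insn_cnt, not the verifier's insn_processed.
	 */
	if (attr->insn_cnt > 100000)
		return -EPERM;

	return 0;
}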

Thanks,
John


* Re: [RFC PATCH bpf-next 0/2] bpf: keep track of prog verification stats
  2021-09-27 18:20         ` John Fastabend
@ 2021-09-28  0:41           ` Andrii Nakryiko
  2021-09-28  1:33             ` John Fastabend
  0 siblings, 1 reply; 12+ messages in thread
From: Andrii Nakryiko @ 2021-09-28  0:41 UTC (permalink / raw)
  To: John Fastabend
  Cc: Dave Marchevsky, Alexei Starovoitov, bpf, Networking,
	Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Yonghong Song

On Mon, Sep 27, 2021 at 11:20 AM John Fastabend
<john.fastabend@gmail.com> wrote:
>
> Dave Marchevsky wrote:
> > On 9/23/21 10:02 PM, Andrii Nakryiko wrote:
> > > On Thu, Sep 23, 2021 at 6:27 PM Dave Marchevsky <davemarchevsky@fb.com> wrote:
> > >>
> > >> On 9/23/21 4:51 PM, Alexei Starovoitov wrote:
> > >>> On Mon, Sep 20, 2021 at 08:11:10AM -0700, Dave Marchevsky wrote:
> > >>>> The verifier currently logs some useful statistics in
> > >>>> print_verification_stats. Although the text log is an effective feedback
> > >>>> tool for an engineer iterating on a single application, it would also be
> > >>>> useful to enable tracking these stats in a more structured form for
> > >>>> fleetwide or historical analysis, which this patchset attempts to do.
> > >>>>
> > >>>> A concrete motivating usecase which came up in recent weeks:
> > >>>>
> > >>>> A team owns a complex BPF program, with various folks extending its
> > >>>> functionality over the years. An engineer tries to make a relatively
> > >>>> simple addition but encounters "BPF program is too large. Processed
> > >>>> 1000001 insn".
> > >>>>
> > >>>> Their changes bumped the processed insns from 700k to over the limit and
> > >>>> there's no obvious way to simplify. They must now consider a large
> > >>>> refactor in order to incorporate the new feature. What if there was some
> > >>>> previous change which bumped processed insns from 200k->700k which
> > >>>> _could_ be modified to stress verifier less? Tracking historical
> > >>>> verifier stats for each version of the program over the years would
> > >>>> reduce manual work necessary to find such a change.
> > >>>>
> > >>>>
> > >>>> Although parsing the text log could work for this scenario, a solution
> > >>>> that's resilient to log format and other verifier changes would be
> > >>>> preferable.
> > >>>>
> > >>>> This patchset adds a bpf_prog_verif_stats struct - containing the same
> > >>>> data logged by print_verification_stats - which can be retrieved as part
> > >>>> of bpf_prog_info. Looking for general feedback on approach and a few
> > >>>> specific areas before fleshing it out further:
> > >>>>
> > >>>> * None of my usecases require storing verif_stats for the lifetime of a
> > >>>>   loaded prog, but adding to bpf_prog_aux felt more correct than trying
> > >>>>   to pass verif_stats back as part of BPF_PROG_LOAD
> > >>>> * The verif_stats are probably not generally useful enough to warrant
> > >>>>   inclusion in fdinfo, but hoping to get confirmation before removing
> > >>>>   that change in patch 1
> > >>>> * processed_insn, verification_time, and total_states are immediately
> > >>>>   useful for me, rest were added for parity with
> > >>>>      print_verification_stats. Can remove.
> > >>>> * Perhaps a version field would be useful in verif_stats in case future
> > >>>>   verifier changes make some current stats meaningless
> > >>>> * Note: stack_depth stat was intentionally skipped to keep patch 1
> > >>>>   simple. Will add if approach looks good.
> > >>>
> > >>> Sorry for the delay. LPC consumes a lot of mental energy :)
> > >>>
> > >>> I see the value of exposing some of the verification stats as prog_info.
> > >>> Let's look at the list:
> > >>> struct bpf_prog_verif_stats {
> > >>>        __u64 verification_time;
> > >>>        __u32 insn_processed;
> > >>>        __u32 max_states_per_insn;
> > >>>        __u32 total_states;
> > >>>        __u32 peak_states;
> > >>>        __u32 longest_mark_read_walk;
> > >>> };
> > >>> verification_time is non deterministic. It varies with frequency
> > >>> and run-to-run. I don't see how alerting tools can use it.
> > >>
> > >> Makes sense to me, will get rid of it.
> > >>
> > >>> insn_processed is indeed the main verification metric.
> > >>> By now it's well known and understood.
> > >>>
> > >>> max_states_per_insn, total_states, etc were the metrics I've studied
> > >>> carefully with pruning, back tracking and pretty much every significant
> > >>> change I did or reviewed in the verifier. They're useful to humans
> > >>> and developers, but I don't see how alerting tools will use them.
> > >>>
> > >>> So it feels to me that insn_processed alone will be enough to address the
> > >>> monitoring goal.
> > >>
> > >> For the concrete usecase in my original message insn_processed would be
> > >> enough. For the others - I thought there might be value in gathering
> > >> those "fleetwide" to inform verifier development, e.g.:
> > >>
> > >> "Hmm, this team's libbpf program has been regressing total_states over
> > >> past few {kernel, llvm} rollouts, but they haven't been modifying it.
> > >> Let's try to get a minimal repro, send to bpf@vger, and contribute to
> > >> selftests if it is indeed hitting a weird verifier edge case"
> > >>
> > >> So for those I'm not expecting them to be useful to alert on or be a
> > >> number that the average BPF program writer needs to care about.
> > >>
> > >> Of course this is hypothetical as I haven't tried to gather such data
> > >> and look for interesting patterns. But these metrics being useful to
> > >> you when looking at significant verifier changes is a good sign.
> > >
> > > One reason to not add all those fields is to not end up with
> > > meaningless stats (in the future) in UAPI. One way to work around that
> > > is to make it "unstable" by providing it through raw_tracepoint as
> > > internal kernel struct.
> > >
> > > Basically, the proposal would be: add new tracepoint for when BPF
> > > program is verified, either successfully or not. As one of the
> > > parameters provide stats struct which is internal to BPF verifier and
> > > is not exposed through UAPI.
> > >
> > > Such tracepoint actually would be useful more generally as well, e.g.,
> > > to monitor which programs are verified in the fleet, what's the rate
> > > of success/failure (to detect verifier regression), what are the stats
> > > (verification time actually would be good to have there, again for
> > > stats and detecting regression), etc, etc.
> > >
> > > WDYT?
> > >
> >
> > Seems reasonable to me - and attaching a BPF program to the tracepoint to
> > grab data is delightfully meta :)
> >
> > I'll do a pass on alternate implementation with _just_ tracepoint, no
> > prog_info or fdinfo, can add minimal or full stats to those later if
> > necessary.
>
> We can also use a hook point here to enforce policy on allowing the
> BPF program to load or not using the stats here. For now basic
> insn is a good start to allow larger/smaller programs to be loaded,
> but we might add other info like call bitmask, features, types, etc.
> If one of the arguments is the bpf_attr struct we can just read
> lots of useful program info out directly.
>
> We would need something different from a tracepoint though to let
> it return a reject|accept code. How about a new hook type that
> has something similar to sockops that lets us just return an
> accept or reject code?
>
> By doing this we can check loader signatures here to be sure the
> loader is signed or otherwise has correct permissions to be loading
> whatever type of bpf program is here.

For signing and generally preventing some BPF programs from loading
(e.g., if there is some malicious BPF program that takes tons of
memory to be validated), wouldn't you want to check that before the BPF
verifier has spent all those resources on verification? So maybe there
will be another hook before the BPF prog is validated for that? Basically,
if you don't trust any BPF program unless it is signed, I'd expect you to
check the signature before the BPF verifier does its heavy job.

>
> Thanks,
> John


* Re: [RFC PATCH bpf-next 0/2] bpf: keep track of prog verification stats
  2021-09-28  0:41           ` Andrii Nakryiko
@ 2021-09-28  1:33             ` John Fastabend
  2021-10-07  9:06               ` Dave Marchevsky
  0 siblings, 1 reply; 12+ messages in thread
From: John Fastabend @ 2021-09-28  1:33 UTC (permalink / raw)
  To: Andrii Nakryiko, John Fastabend
  Cc: Dave Marchevsky, Alexei Starovoitov, bpf, Networking,
	Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Yonghong Song

Andrii Nakryiko wrote:
> On Mon, Sep 27, 2021 at 11:20 AM John Fastabend
> <john.fastabend@gmail.com> wrote:
> >
> > Dave Marchevsky wrote:
> > > On 9/23/21 10:02 PM, Andrii Nakryiko wrote:
> > > > On Thu, Sep 23, 2021 at 6:27 PM Dave Marchevsky <davemarchevsky@fb.com> wrote:
> > > >>
> > > >> On 9/23/21 4:51 PM, Alexei Starovoitov wrote:
> > > >>> On Mon, Sep 20, 2021 at 08:11:10AM -0700, Dave Marchevsky wrote:
> > > >>>> The verifier currently logs some useful statistics in
> > > >>>> print_verification_stats. Although the text log is an effective feedback
> > > >>>> tool for an engineer iterating on a single application, it would also be
> > > >>>> useful to enable tracking these stats in a more structured form for
> > > >>>> fleetwide or historical analysis, which this patchset attempts to do.
> > > >>>>

[...] 

> > >
> > > Seems reasonable to me - and attaching a BPF program to the tracepoint to
> > > grab data is delightfully meta :)
> > >
> > > I'll do a pass on alternate implementation with _just_ tracepoint, no
> > > prog_info or fdinfo, can add minimal or full stats to those later if
> > > necessary.
> >
> > We can also use a hook point here to enforce policy on allowing the
> > BPF program to load or not using the stats here. For now basic
> > insn is a good start to allow larger/smaller programs to be loaded,
> > but we might add other info like call bitmask, features, types, etc.
> > If one of the arguments is the bpf_attr struct we can just read
> > lots of useful program info out directly.
> >
> > We would need something different from a tracepoint though to let
> > it return a reject|accept code. How about a new hook type that
> > has something similar to sockops that lets us just return an
> > accept or reject code?
> >
> > By doing this we can check loader signatures here to be sure the
> > loader is signed or otherwise has correct permissions to be loading
> > whatever type of bpf program is here.
> 
> For signing and generally preventing some BPF programs from loading
> (e.g., if there is some malicious BPF program that takes tons of
> memory to be validated), wouldn't you want to check that before BPF
> verifier spent all those resources on verification? So maybe there
> will be another hook before BPF prog is validated for that? Basically,
> if you don't trust any BPF program unless it is signed, I'd expect you
> check signature before BPF verifier does its heavy job.

Agree, for a basic sig check or anything that just wants to look at
the task_struct storage for some attributes, checking before we verify is
more efficient. The only reason I suggested after is if we wanted
to start auditing/enforcing on calls or map reads/writes, etc.; these
we would need the verifier to help tabulate.

When I hacked it in for experimenting, I put the hook in the sys_bpf
load path before the verifier runs. That seemed to work for
the simpler sig check cases I was running.

OTOH, if we have a system with lots of failed BPF loads, that would
indicate a more serious problem that an admin should fix, so it might be
nicer code-wise to just have a single hook after the verifier vs.
optimizing into two, one in front and one after.

> 
> >
> > Thanks,
> > John




* Re: [RFC PATCH bpf-next 0/2] bpf: keep track of prog verification stats
  2021-09-28  1:33             ` John Fastabend
@ 2021-10-07  9:06               ` Dave Marchevsky
  2021-10-08 15:50                 ` John Fastabend
  0 siblings, 1 reply; 12+ messages in thread
From: Dave Marchevsky @ 2021-10-07  9:06 UTC (permalink / raw)
  To: John Fastabend, Andrii Nakryiko
  Cc: Alexei Starovoitov, bpf, Networking, Alexei Starovoitov,
	Daniel Borkmann, Andrii Nakryiko, Yonghong Song

On 9/27/21 9:33 PM, John Fastabend wrote:   
> Andrii Nakryiko wrote:
>> On Mon, Sep 27, 2021 at 11:20 AM John Fastabend
>> <john.fastabend@gmail.com> wrote:
>>>
>>> Dave Marchevsky wrote:
>>>> On 9/23/21 10:02 PM, Andrii Nakryiko wrote:
>>>>> On Thu, Sep 23, 2021 at 6:27 PM Dave Marchevsky <davemarchevsky@fb.com> wrote:
>>>>>>
>>>>>> On 9/23/21 4:51 PM, Alexei Starovoitov wrote:
>>>>>>> On Mon, Sep 20, 2021 at 08:11:10AM -0700, Dave Marchevsky wrote:
>>>>>>>> The verifier currently logs some useful statistics in
>>>>>>>> print_verification_stats. Although the text log is an effective feedback
>>>>>>>> tool for an engineer iterating on a single application, it would also be
>>>>>>>> useful to enable tracking these stats in a more structured form for
>>>>>>>> fleetwide or historical analysis, which this patchset attempts to do.
>>>>>>>>
> 
> [...] 
> 
>>>>
>>>> Seems reasonable to me - and attaching a BPF program to the tracepoint to
>>>> grab data is delightfully meta :)
>>>>
>>>> I'll do a pass on alternate implementation with _just_ tracepoint, no
>>>> prog_info or fdinfo, can add minimal or full stats to those later if
>>>> necessary.

Actually I ended up pushing a simple addition of insn_processed to
prog_info and fdinfo instead of a bare tracepoint. The more general
discussion here is interesting - if we can inject some custom logic at
various points in the verification process, we can gather arbitrary stats
or make policy decisions from the same attach points.

>>>
>>> We can also use a hook point here to enforce policy on allowing the
>>> BPF program to load or not using the stats here. For now basic
>>> insn is a good start to allow larger/smaller programs to be loaded,
>>> but we might add other info like call bitmask, features, types, etc.
>>> If one of the arguments is the bpf_attr struct we can just read
>>> lots of useful program info out directly.
>>>
>>> We would need something different from a tracepoint though to let
>>> it return a reject|accept code. How about a new hook type that
>>> has something similar to sockops that lets us just return an
>>> accept or reject code?
>>>
>>> By doing this we can check loader signatures here to be sure the
>>> loader is signed or otherwise has correct permissions to be loading
>>> whatever type of bpf program is here.
>>
>> For signing and generally preventing some BPF programs from loading
>> (e.g., if there is some malicious BPF program that takes tons of
>> memory to be validated), wouldn't you want to check that before BPF
>> verifier spent all those resources on verification? So maybe there
>> will be another hook before BPF prog is validated for that? Basically,
>> if you don't trust any BPF program unless it is signed, I'd expect you
>> check signature before BPF verifier does its heavy job.
> 
> Agree, for basic sig check or anything that just wants to look at
> the task_struct storage for some attributes before we verify is
> more efficient. The only reason I suggested after is if we wanted
> to start auditing/enforcing on calls or map read/writes, etc. these
> we would need the verifier to help tabulate.

This is the "Bob isn't signed, so ensure that Bob can only read from 
Alice's maps" case from your recent talk, right? 

I'd like to add another illustrative usecase: "progs of type X can 
use 4k of stack, while type Y can only use 128 bytes of stack". For
the 4k case, a single attach point after verification is complete 
wouldn't work, since the prog would already have been eagerly rejected
against the default 512-byte stack limit.

Alexei suggested adding some empty noinline functions with 
ALLOW_ERROR_INJECTION at potential attach points, then attaching 
BPF_MODIFY_RETURN progs to inject custom logic. This would allow 
arbitrarily digging around while optionally affecting the return result.
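
Roughly what I'm picturing, to make it concrete (sketch only - the hook
name is made up, where exactly it would be called from in verifier.c is
TBD, and I haven't checked which dereferences the verifier would let an
fmod_ret prog do here):

/* Hypothetical hook somewhere in kernel/bpf/verifier.c, called once
 * env->prog is populated. Returning 0 means "allow"; a
 * BPF_MODIFY_RETURN prog can override that with an -errno to reject.
 */
noinline int bpf_verifier_policy(struct bpf_verifier_env *env)
{
	return 0;
}
ALLOW_ERROR_INJECTION(bpf_verifier_policy, ERRNO);

and on the BPF side:

#include "vmlinux.h"
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_tracing.h>

/* Example policy: reject anything over an arbitrary insn budget. */
SEC("fmod_ret/bpf_verifier_policy")
int BPF_PROG(enforce_insn_budget, struct bpf_verifier_env *env, int ret)
{
	if (env->prog->len > 100000)
		return -EPERM;	/* reject the load */
	return 0;		/* keep the default: allow */
}

char LICENSE[] SEC("license") = "GPL";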

WDYT?

> When I hacked it in for experimenting I put the hook in the sys
> bpf load path before the verifier runs. That seemed to work for
> the simpler sig check cases I was running.
> 
> OTOH though if we have a system with lots of BPF failed loads this
> would indicate a more serious problem that an admin should fix
> so might be nicer code-wise to just have a single hook after verifier
> vs optimizing to two one in front and one after. 
> 
>>
>>>
>>> Thanks,
>>> John
> 
> 


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [RFC PATCH bpf-next 0/2] bpf: keep track of prog verification stats
  2021-10-07  9:06               ` Dave Marchevsky
@ 2021-10-08 15:50                 ` John Fastabend
  0 siblings, 0 replies; 12+ messages in thread
From: John Fastabend @ 2021-10-08 15:50 UTC (permalink / raw)
  To: Dave Marchevsky, John Fastabend, Andrii Nakryiko
  Cc: Alexei Starovoitov, bpf, Networking, Alexei Starovoitov,
	Daniel Borkmann, Andrii Nakryiko, Yonghong Song

Dave Marchevsky wrote:
> On 9/27/21 9:33 PM, John Fastabend wrote:   
> > Andrii Nakryiko wrote:
> >> On Mon, Sep 27, 2021 at 11:20 AM John Fastabend
> >> <john.fastabend@gmail.com> wrote:
> >>>
> >>> Dave Marchevsky wrote:
> >>>> On 9/23/21 10:02 PM, Andrii Nakryiko wrote:
> >>>>> On Thu, Sep 23, 2021 at 6:27 PM Dave Marchevsky <davemarchevsky@fb.com> wrote:
> >>>>>>
> >>>>>> On 9/23/21 4:51 PM, Alexei Starovoitov wrote:
> >>>>>>> On Mon, Sep 20, 2021 at 08:11:10AM -0700, Dave Marchevsky wrote:
> >>>>>>>> The verifier currently logs some useful statistics in
> >>>>>>>> print_verification_stats. Although the text log is an effective feedback
> >>>>>>>> tool for an engineer iterating on a single application, it would also be
> >>>>>>>> useful to enable tracking these stats in a more structured form for
> >>>>>>>> fleetwide or historical analysis, which this patchset attempts to do.
> >>>>>>>>
> > 
> > [...] 
> > 
> >>>>
> >>>> Seems reasonable to me - and attaching a BPF program to the tracepoint to
> >>>> grab data is delightfully meta :)
> >>>>
> >>>> I'll do a pass on alternate implementation with _just_ tracepoint, no
> >>>> prog_info or fdinfo, can add minimal or full stats to those later if
> >>>> necessary.
> 
> Actually I ended up pushing a simple addition of insn_processed to prog_info
> and fdinfo instead of a bare tracepoint. The more general discussion here is
> interesting, though - if we can inject custom logic at various points in the
> verification process, we can gather arbitrary stats or make policy decisions
> from the same attach points.
> 
> >>>
> >>> We can also use a hook point here to enforce policy on allowing the
> >>> BPF program to load or not using the stats here. For now basic
> >>> insn is a good start to allow larger/smaller programs to be loaded,
> >>> but we might add other info like call bitmask, features, types, etc.
> >>> If one of the arguments is the bpf_attr struct we can just read
> >>> lots of useful program info out directly.
> >>>
> >>> We would need something different from a tracepoint though to let
> >>> it return a reject|accept code. How about a new hook type that
> >>> has something similar to sockops that lets us just return an
> >>> accept or reject code?
> >>>
> >>> By doing this we can check loader signatures here to be sure the
> >>> loader is signed or otherwise has correct permissions to be loading
> >>> whatever type of bpf program is here.
> >>
> >> For signing and generally preventing some BPF programs from loading
> >> (e.g., if there is some malicious BPF program that takes tons of
> >> memory to be validated), wouldn't you want to check that before BPF
> >> verifier spent all those resources on verification? So maybe there
> >> will be another hook before BPF prog is validated for that? Basically,
> >> if you don't trust any BPF program unless it is signed, I'd expect you
> >> check signature before BPF verifier does its heavy job.
> > 
> > Agree, for basic sig check or anything that just wants to look at
> > the task_struct storage for some attributes before we verify is
> > more efficient. The only reason I suggested after is if we wanted
> > to start auditing/enforcing on calls or map read/writes, etc. these
> > we would need the verifier to help tabulate.
> 
> This is the "Bob isn't signed, so ensure that Bob can only read from 
> Alice's maps" case from your recent talk, right? 

Correct.

> 
> I'd like to add another illustrative usecase: "progs of type X can 
> use 4k of stack, while type Y can only use 128 bytes of stack". For
> the 4k case, a single attach point after verification is complete 
> wouldn't work, since the prog would already have been eagerly rejected
> against the default 512-byte stack limit.

Makes sense. Another use case would be allowing more tail calls. The
limit of 32 has become constraining for some of our use cases where
the insn counts are relatively small but tail call chains deeper than
32 can happen. If we bumped the limit to 128, for example, we might
want to allow that only for specific prog types.

> 
> Alexei suggested adding some empty noinline functions with 
> ALLOW_ERROR_INJECTION at potential attach points, then attaching 
> BPF_MODIFY_RETURN progs to inject custom logic. This would allow 
> arbitrarily digging around while optionally affecting the return result.
> 
> WDYT?

That is exactly what I had in mind as well, and what I did to get
the example in the talk working.

We would also want a few more of these noinline verdict locations,
mostly to avoid having to set up a full LSM environment when we just
want some simple BPF verdict actions at key hooks, and/or to trigger
our CI on edge cases that are tricky to hit by luck.

> 
> > When I hacked it in for experimenting I put the hook in the sys
> > bpf load path before the verifier runs. That seemed to work for
> > the simpler sig check cases I was running.
> > 
> > OTOH though if we have a system with lots of BPF failed loads this
> > would indicate a more serious problem that an admin should fix
> > so might be nicer code-wise to just have a single hook after verifier
> > vs optimizing to two one in front and one after. 
> > 
> >>
> >>>
> >>> Thanks,
> >>> John
> > 
> > 

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread

Thread overview: 12+ messages
2021-09-20 15:11 [RFC PATCH bpf-next 0/2] bpf: keep track of prog verification stats Dave Marchevsky
2021-09-20 15:11 ` [RFC PATCH bpf-next 1/2] bpf: add verifier stats to bpf_prog_info and fdinfo Dave Marchevsky
2021-09-20 15:11 ` [RFC PATCH bpf-next 2/2] selftests/bpf: add verif_stats test Dave Marchevsky
2021-09-23 20:51 ` [RFC PATCH bpf-next 0/2] bpf: keep track of prog verification stats Alexei Starovoitov
2021-09-24  1:27   ` Dave Marchevsky
2021-09-24  2:02     ` Andrii Nakryiko
2021-09-24 18:24       ` Dave Marchevsky
2021-09-27 18:20         ` John Fastabend
2021-09-28  0:41           ` Andrii Nakryiko
2021-09-28  1:33             ` John Fastabend
2021-10-07  9:06               ` Dave Marchevsky
2021-10-08 15:50                 ` John Fastabend
