netdev.vger.kernel.org archive mirror
From: Daniel Borkmann <daniel@iogearbox.net>
To: Roman Gushchin <guro@fb.com>, Alexei Starovoitov <ast@kernel.org>
Cc: "davem@davemloft.net" <davem@davemloft.net>,
	"netdev@vger.kernel.org" <netdev@vger.kernel.org>,
	"bpf@vger.kernel.org" <bpf@vger.kernel.org>,
	Kernel Team <Kernel-team@fb.com>
Subject: Re: [PATCH bpf-next 1/4] bpf: enable program stats
Date: Sat, 23 Feb 2019 02:06:56 +0100	[thread overview]
Message-ID: <8de20612-c32a-9370-254d-9ebf1df78efa@iogearbox.net> (raw)
In-Reply-To: <20190223003434.GA17559@tower.DHCP.thefacebook.com>

On 02/23/2019 01:34 AM, Roman Gushchin wrote:
> On Fri, Feb 22, 2019 at 03:36:41PM -0800, Alexei Starovoitov wrote:
>> JITed BPF programs are indistinguishable from kernel functions, but
>> unlike kernel code, BPF code can change often.
>> The typical approach of "perf record" + "perf report" profiling and
>> tuning of kernel code works just as well for BPF programs, but kernel
>> code doesn't need to be monitored whereas BPF programs do.
>> Users load and run large numbers of BPF programs.
>> These BPF stats allow tools to monitor the usage of BPF on the server.
>> The monitoring tools will turn sysctl kernel.bpf_stats_enabled on and
>> off for a few seconds to sample the average cost of the programs.
>> Aggregated data over hours and days will provide an insight into the
>> cost of BPF, and alarms can trigger in case a given program suddenly
>> gets more expensive.
>>
>> The cost of two sched_clock() calls per program invocation adds ~20 nsec.
>> Fast BPF progs (like selftests/bpf/progs/test_pkt_access.c) will slow down
>> from ~40 nsec to ~60 nsec.
>> A static_key minimizes the cost of the stats collection.
>> There is no measurable difference before/after this patch
>> with kernel.bpf_stats_enabled=0.
>>
>> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
>> ---
>>  include/linux/bpf.h    |  7 +++++++
>>  include/linux/filter.h | 16 +++++++++++++++-
>>  kernel/bpf/core.c      | 13 +++++++++++++
>>  kernel/bpf/syscall.c   | 24 ++++++++++++++++++++++--
>>  kernel/bpf/verifier.c  |  5 +++++
>>  kernel/sysctl.c        | 34 ++++++++++++++++++++++++++++++++++
>>  6 files changed, 96 insertions(+), 3 deletions(-)
>>
>> diff --git a/include/linux/bpf.h b/include/linux/bpf.h
>> index de18227b3d95..14260674bc57 100644
>> --- a/include/linux/bpf.h
>> +++ b/include/linux/bpf.h
>> @@ -340,6 +340,11 @@ enum bpf_cgroup_storage_type {
>>  
>>  #define MAX_BPF_CGROUP_STORAGE_TYPE __BPF_CGROUP_STORAGE_MAX
>>  
>> +struct bpf_prog_stats {
>> +	u64 cnt;
>> +	u64 nsecs;
>> +};
>> +
>>  struct bpf_prog_aux {
>>  	atomic_t refcnt;
>>  	u32 used_map_cnt;
>> @@ -389,6 +394,7 @@ struct bpf_prog_aux {
>>  	 * main prog always has linfo_idx == 0
>>  	 */
>>  	u32 linfo_idx;
>> +	struct bpf_prog_stats __percpu *stats;
>>  	union {
>>  		struct work_struct work;
>>  		struct rcu_head	rcu;
>> @@ -559,6 +565,7 @@ void bpf_map_area_free(void *base);
>>  void bpf_map_init_from_attr(struct bpf_map *map, union bpf_attr *attr);
>>  
>>  extern int sysctl_unprivileged_bpf_disabled;
>> +extern int sysctl_bpf_stats_enabled;
>>  
>>  int bpf_map_new_fd(struct bpf_map *map, int flags);
>>  int bpf_prog_new_fd(struct bpf_prog *prog);
>> diff --git a/include/linux/filter.h b/include/linux/filter.h
>> index f32b3eca5a04..7b14235b6f7d 100644
>> --- a/include/linux/filter.h
>> +++ b/include/linux/filter.h
>> @@ -533,7 +533,21 @@ struct sk_filter {
>>  	struct bpf_prog	*prog;
>>  };
>>  
>> -#define BPF_PROG_RUN(filter, ctx)  ({ cant_sleep(); (*(filter)->bpf_func)(ctx, (filter)->insnsi); })
>> +DECLARE_STATIC_KEY_FALSE(bpf_stats_enabled_key);
>> +
>> +#define BPF_PROG_RUN(prog, ctx)	({				\
>> +	u32 ret;						\
>> +	cant_sleep();						\
>> +	if (static_branch_unlikely(&bpf_stats_enabled_key)) {	\
>> +		u64 start = sched_clock();			\
>> +		ret = (*(prog)->bpf_func)(ctx, (prog)->insnsi);	\
>> +		this_cpu_inc(prog->aux->stats->cnt);		\
>> +		this_cpu_add(prog->aux->stats->nsecs,		\
>> +			     sched_clock() - start);		\
>> +	} else {						\
>> +		ret = (*(prog)->bpf_func)(ctx, (prog)->insnsi);	\
>> +	}							\
>> +	ret; })
> 
> Can we bump "cnt" unconditionally and gate only the "nsecs" calculation?
> 
> Why might it be useful?
> If the performance cost is too high to keep the statistics gathering
> always enabled, it will be possible to measure the average cost
> periodically and extrapolate.

In general, having some stats and timing info would be useful, but I
guess people might want to customize it even further in future, beyond
number of runs + time it takes. One thing that would be super useful
is to have some notion of __attribute__((constructor)) and
__attribute__((destructor)) support in BPF, which would then get
inlined into the prologue / epilogue of the program. E.g. such
infrastructure would allow one to mark an skb and measure the time it
takes through the BPF prog till it hits an exit point somewhere
(without having to explicitly code this into the program everywhere).
Other examples might be histograms or p99 latencies. Feels like for
monitoring, more ways to program this would be nice, moving it into
the BPF insn sequence itself (e.g. enforced globally, or by choice of
prog as another option)? Thoughts?


Thread overview: 16+ messages
2019-02-22 23:36 [PATCH bpf-next 0/4] bpf: per program stats Alexei Starovoitov
2019-02-22 23:36 ` [PATCH bpf-next 1/4] bpf: enable " Alexei Starovoitov
2019-02-23  0:05   ` Eric Dumazet
2019-02-23  2:29     ` Alexei Starovoitov
2019-02-23  0:34   ` Roman Gushchin
2019-02-23  1:06     ` Daniel Borkmann [this message]
2019-02-23  2:38       ` Alexei Starovoitov
2019-02-25 10:36         ` Daniel Borkmann
2019-02-26  4:27           ` Alexei Starovoitov
2019-02-26 15:29             ` Daniel Borkmann
2019-02-27  4:05               ` Alexei Starovoitov
2019-02-23  2:31     ` Alexei Starovoitov
2019-02-22 23:36 ` [PATCH bpf-next 2/4] bpf: expose program stats via bpf_prog_info Alexei Starovoitov
2019-02-22 23:36 ` [PATCH bpf-next 3/4] tools/bpf: sync bpf.h into tools Alexei Starovoitov
2019-02-22 23:36 ` [PATCH bpf-next 4/4] tools/bpftool: recognize bpf_prog_info runtime and runcnt Alexei Starovoitov
2019-02-23  4:17   ` Andrii Nakryiko
