From: Song Liu <songliubraving@fb.com>
To: Stanislav Fomichev <sdf@fomichev.me>
Cc: Daniel Borkmann <daniel@iogearbox.net>,
"linux-fsdevel@vger.kernel.org" <linux-fsdevel@vger.kernel.org>,
Networking <netdev@vger.kernel.org>,
"bpf@vger.kernel.org" <bpf@vger.kernel.org>,
Kernel Team <Kernel-team@fb.com>,
"ast@kernel.org" <ast@kernel.org>,
"mcgrof@kernel.org" <mcgrof@kernel.org>,
"keescook@chromium.org" <keescook@chromium.org>,
"yzaikin@google.com" <yzaikin@google.com>
Subject: Re: [PATCH v2 bpf-next] bpf: sharing bpf runtime stats with /dev/bpf_stats
Date: Wed, 18 Mar 2020 23:45:32 +0000 [thread overview]
Message-ID: <B98810F3-EA8F-432B-A66D-7111B779AC1C@fb.com> (raw)
In-Reply-To: <20200318222953.GA2507308@mini-arch.hsd1.ca.comcast.net>
> On Mar 18, 2020, at 3:29 PM, Stanislav Fomichev <sdf@fomichev.me> wrote:
>
> On 03/18, Song Liu wrote:
>>
>>
>>> On Mar 18, 2020, at 1:58 PM, Daniel Borkmann <daniel@iogearbox.net> wrote:
>>>
>>> On 3/18/20 7:33 AM, Song Liu wrote:
>>>>> On Mar 17, 2020, at 4:08 PM, Song Liu <songliubraving@fb.com> wrote:
>>>>>> On Mar 17, 2020, at 2:47 PM, Daniel Borkmann <daniel@iogearbox.net> wrote:
>>>>>>>>
>>>>>>>> Hm, true as well. Wouldn't long-term extending "bpftool prog profile" fentry/fexit
>>>>>>>> programs supersede this old bpf_stats infrastructure? Iow, can't we implement the
>>>>>>>> same (or even more elaborate stats aggregation) in BPF via fentry/fexit and then
>>>>>>>> potentially deprecate bpf_stats counters?
>>>>>>> I think run_time_ns has its own value as a simple monitoring framework. We can
>>>>>>> use it in tools like top (and variations). It will be easier for these tools to
>>>>>>> adopt run_time_ns than using fentry/fexit.
>>>>>>
>>>>>> Agree that this is easier; I presume there is no such official integration today
>>>>>> in tools like top, right, or is there anything planned?
>>>>>
>>>>> Yes, we do want more supports in different tools to increase the visibility.
>>>>> Here is the effort for atop: https://github.com/Atoptool/atop/pull/88 .
>>>>>
>>>>> I wasn't pushing push hard on this one mostly because the sysctl interface requires
>>>>> a user space "owner".
>>>>>
>>>>>>> On the other hand, in long term, we may include a few fentry/fexit based programs
>>>>>>> in the kernel binary (or the rpm), so that these tools can use them easily. At
>>>>>>> that time, we can fully deprecate run_time_ns. Maybe this is not too far away?
>>>>>>
>>>>>> Did you check how feasible it is to have something like `bpftool prog profile top`
>>>>>> which then enables fentry/fexit for /all/ existing BPF programs in the system? It
>>>>>> could then sort the sample interval by run_cnt, cycles, cache misses, aggregated
>>>>>> runtime, etc in a top-like output. Wdyt?
>>>>>
>>>>> I wonder whether we can achieve this with one bpf prog (or a trampoline) that covers
>>>>> all BPF programs, like a trampoline inside __BPF_PROG_RUN()?
>>>>>
>>>>> For long term direction, I think we could compare two different approaches: add new
>>>>> tools (like bpftool prog profile top) vs. add BPF support to existing tools. The
>>>>> first approach is easier. The latter approach would show BPF information to users
>>>>> who are not expecting BPF programs in the systems. For many sysadmins, seeing BPF
>>>>> programs in top/ps, and controlling them via kill is more natural than learning
>>>>> bpftool. What's your thought on this?
>>>> More thoughts on this.
>>>> If we have a special trampoline that attach to all BPF programs at once, we really
>>>> don't need the run_time_ns stats anymore. Eventually, tools that monitor BPF
>>>> programs will depend on libbpf, so using fentry/fexit to monitor BPF programs doesn't
>>>> introduce extra dependency. I guess we also need a way to include BPF program in
>>>> libbpf.
>>>> To summarize this plan, we need:
>>>> 1) A global trampoline that attaches to all BPF programs at once;
>>>
>>> Overall sounds good, I think the `at once` part might be tricky, at least it would
>>> need to patch one prog after another, each prog also needs to store its own metrics
>>> somewhere for later collection. The start-to-sample could be a shared global var (aka
>>> shared map between all the programs) which would flip the switch though.
>>
>> I was thinking about adding bpf_global_trampoline and use it in __BPF_PROG_RUN.
>> Something like:
>>
>> diff --git i/include/linux/filter.h w/include/linux/filter.h
>> index 9b5aa5c483cc..ac9497d1fa7b 100644
>> --- i/include/linux/filter.h
>> +++ w/include/linux/filter.h
>> @@ -559,9 +559,14 @@ struct sk_filter {
>>
>> DECLARE_STATIC_KEY_FALSE(bpf_stats_enabled_key);
>>
>> +extern struct bpf_trampoline *bpf_global_trampoline;
>> +DECLARE_STATIC_KEY_FALSE(bpf_global_tr_active);
>> +
>> #define __BPF_PROG_RUN(prog, ctx, dfunc) ({ \
>> u32 ret; \
>> cant_migrate(); \
>> + if (static_branch_unlikely(&bpf_global_tr_active)) \
>> + run_the_trampoline(); \
>> if (static_branch_unlikely(&bpf_stats_enabled_key)) { \
>> struct bpf_prog_stats *stats; \
>> u64 start = sched_clock(); \
>>
>>
>> I am not 100% sure this is OK.
>>
>> I am also not sure whether this is an overkill. Do we really want more complex
>> metric for all BPF programs? Or run_time_ns is enough?
> I was thinking about exporting a real distribution of the prog runtimes
> instead of doing an average. It would be interesting to see
> 50%/95%/99%/max stats.
Good point. Distribution logic fits well in fentry/fexit programs.
Let me think more about\x10 this.
Thanks,
Song
next prev parent reply other threads:[~2020-03-18 23:46 UTC|newest]
Thread overview: 13+ messages / expand[flat|nested] mbox.gz Atom feed top
2020-03-16 20:33 [PATCH v2 bpf-next] bpf: sharing bpf runtime stats with /dev/bpf_stats Song Liu
2020-03-17 19:30 ` Daniel Borkmann
2020-03-17 19:54 ` Song Liu
2020-03-17 20:03 ` Daniel Borkmann
2020-03-17 20:13 ` Song Liu
2020-03-17 21:47 ` Daniel Borkmann
2020-03-17 23:08 ` Song Liu
2020-03-18 6:33 ` Song Liu
2020-03-18 20:58 ` Daniel Borkmann
2020-03-18 21:20 ` Song Liu
2020-03-18 22:29 ` Stanislav Fomichev
2020-03-18 23:45 ` Song Liu [this message]
2020-03-17 20:04 ` Song Liu
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=B98810F3-EA8F-432B-A66D-7111B779AC1C@fb.com \
--to=songliubraving@fb.com \
--cc=Kernel-team@fb.com \
--cc=ast@kernel.org \
--cc=bpf@vger.kernel.org \
--cc=daniel@iogearbox.net \
--cc=keescook@chromium.org \
--cc=linux-fsdevel@vger.kernel.org \
--cc=mcgrof@kernel.org \
--cc=netdev@vger.kernel.org \
--cc=sdf@fomichev.me \
--cc=yzaikin@google.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).