linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Song Liu <songliubraving@fb.com>
To: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Cc: "linux-fsdevel@vger.kernel.org" <linux-fsdevel@vger.kernel.org>,
	linux-kernel <linux-kernel@vger.kernel.org>,
	"netdev@vger.kernel.org" <netdev@vger.kernel.org>,
	"bpf@vger.kernel.org" <bpf@vger.kernel.org>,
	"Kernel Team" <Kernel-team@fb.com>,
	"ast@kernel.org" <ast@kernel.org>,
	"daniel@iogearbox.net" <daniel@iogearbox.net>,
	"mcgrof@kernel.org" <mcgrof@kernel.org>,
	"keescook@chromium.org" <keescook@chromium.org>,
	"yzaikin@google.com" <yzaikin@google.com>,
	"peterz@infradead.org" <peterz@infradead.org>,
	"bristot@redhat.com" <bristot@redhat.com>,
	"mingo@kernel.org" <mingo@kernel.org>
Subject: Re: [RFC bpf-next 0/2] sharing bpf runtime stats with /dev/bpf_stats
Date: Sat, 14 Mar 2020 03:47:50 +0000	[thread overview]
Message-ID: <E7BBB6E4-F911-47D4-A4BC-3DF3D29B557B@fb.com> (raw)
In-Reply-To: <20200314024322.vymr6qkxsf6nzpum@ast-mbp.dhcp.thefacebook.com>



> On Mar 13, 2020, at 7:43 PM, Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote:
> 
> On Fri, Mar 13, 2020 at 05:35:16PM -0700, Song Liu wrote:
>> Motivation (copied from 2/2):
>> 
>> ======================= 8< =======================
>> Currently, sysctl kernel.bpf_stats_enabled controls BPF runtime stats.
>> Typical userspace tools use kernel.bpf_stats_enabled as follows:
>> 
>>  1. Enable kernel.bpf_stats_enabled;
>>  2. Check program run_time_ns;
>>  3. Sleep for the monitoring period;
>>  4. Check program run_time_ns again, calculate the difference;
>>  5. Disable kernel.bpf_stats_enabled.
>> 
>> The problem with this approach is that only one userspace tool can toggle
>> this sysctl. If multiple tools toggle the sysctl at the same time, the
>> measurement may be inaccurate.
>> 
>> To fix this problem while keep backward compatibility, introduce
>> /dev/bpf_stats. sysctl kernel.bpf_stats_enabled will only change the
>> lowest bit of the static key. /dev/bpf_stats, on the other hand, adds 2
>> to the static key for each open fd. The runtime stats is enabled when
>> kernel.bpf_stats_enabled == 1 or there is open fd to /dev/bpf_stats.
>> 
>> With /dev/bpf_stats, user space tool would have the following flow:
>> 
>>  1. Open a fd to /dev/bpf_stats;
>>  2. Check program run_time_ns;
>>  3. Sleep for the monitoring period;
>>  4. Check program run_time_ns again, calculate the difference;
>>  5. Close the fd.
>> ======================= 8< =======================
>> 
>> 1/2 adds a few new API to jump_label.
>> 2/2 adds the /dev/bpf_stats and adjust kernel.bpf_stats_enabled handler.
>> 
>> Please share your comments.
> 
> Conceptually makes sense to me. Few comments:
> 1. I don't understand why +2 logic is necessary.
> Just do +1 for every FD and change proc_do_static_key() from doing
> explicit enable/disable to do +1/-1 as well on transition from 0->1 and 1->0.
> The handler would need to check that 1->1 and 0->0 is a nop.

With the +2/-2 logic, we use the lowest bit of the counter to remember 
the value of the sysctl. Otherwise, we cannot tell whether we are making
0->1 transition or 1->1 transition. 

> 
> 2. /dev is kinda awkward. May be introduce a new bpf command that returns fd?

Yeah, I also feel /dev is awkward. fd from bpf command sounds great. 

> 
> 3. Instead of 1 and 2 tweak sysctl to do ++/-- unconditionally?
> Like repeated sysctl kernel.bpf_stats_enabled=1 will keep incrementing it
> and would need equal amount of sysctl kernel.bpf_stats_enabled=0 to get
> it back to zero where it will stay zero even if users keep spamming
> sysctl kernel.bpf_stats_enabled=0.
> This way current services that use sysctl will keep working as-is.
> Multiple services that currently collide on sysctl will magically start
> working without any changes to them. It is still backwards compatible.

I think this is not fully backwards compatible. With current logic, the 
following sequence disables stats eventually. 

  sysctl kernel.bpf_stats_enabled=1
  sysctl kernel.bpf_stats_enabled=1
  sysctl kernel.bpf_stats_enabled=0

The same sequence will not disable stats with the ++/-- sysctl. 

Thanks,
Song




  reply	other threads:[~2020-03-14  3:48 UTC|newest]

Thread overview: 4+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <20200314003518.3114452-1-songliubraving@fb.com>
2020-03-14  2:43 ` [RFC bpf-next 0/2] sharing bpf runtime stats with /dev/bpf_stats Alexei Starovoitov
2020-03-14  3:47   ` Song Liu [this message]
2020-03-14 15:57     ` Alexei Starovoitov
2020-03-16  5:52       ` Song Liu

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=E7BBB6E4-F911-47D4-A4BC-3DF3D29B557B@fb.com \
    --to=songliubraving@fb.com \
    --cc=Kernel-team@fb.com \
    --cc=alexei.starovoitov@gmail.com \
    --cc=ast@kernel.org \
    --cc=bpf@vger.kernel.org \
    --cc=bristot@redhat.com \
    --cc=daniel@iogearbox.net \
    --cc=keescook@chromium.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mcgrof@kernel.org \
    --cc=mingo@kernel.org \
    --cc=netdev@vger.kernel.org \
    --cc=peterz@infradead.org \
    --cc=yzaikin@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).