From: "Wangnan (F)" <wangnan0@huawei.com>
To: Alexei Starovoitov <ast@plumgrid.com>,
	Kaixu Xia <xiakaixu@huawei.com>, <davem@davemloft.net>,
	<acme@kernel.org>, <mingo@redhat.com>, <a.p.zijlstra@chello.nl>,
	<masami.hiramatsu.pt@hitachi.com>, <jolsa@kernel.org>,
	<daniel@iogearbox.net>
Cc: <linux-kernel@vger.kernel.org>, <pi3orama@163.com>,
	<hekuang@huawei.com>, <netdev@vger.kernel.org>
Subject: Re: [RFC PATCH 2/2] bpf: Implement bpf_perf_event_sample_enable/disable() helpers
Date: Tue, 13 Oct 2015 14:57:59 +0800
Message-ID: <561CAB77.7050406@huawei.com>
In-Reply-To: <561C9361.6090104@plumgrid.com>



On 2015/10/13 13:15, Alexei Starovoitov wrote:
> On 10/12/15 9:34 PM, Wangnan (F) wrote:
>>
>>
>> On 2015/10/13 12:16, Alexei Starovoitov wrote:
>>> On 10/12/15 8:51 PM, Wangnan (F) wrote:
>>>>> why 'set disable' is needed ?
>>>>> the example given in cover letter shows the use case where you want
>>>>> to receive samples only within sys_write() syscall.
>>>>> The example makes sense, but sys_write() is running on this cpu, so
>>>>> just disabling it on the current one is enough.
>>>>>
>>>>
>>>> Our real use case is control of system-wide sampling. For example,
>>>> we need to sample all CPUs when a smartphone starts refreshing its
>>>> display. We need all CPUs because on Android plenty of threads
>>>> get involved in this behavior. We can't achieve this by controlling
>>>> sampling on only one CPU. This is the reason we need 'set enable'
>>>> and 'set disable'.
>>>
>>> ok, but that use case may have different enable/disable pattern.
>>> In the sys_write example ultra-fast enable/disable is a must-have, since
>>> the whole syscall is fast and overhead should be minimal.
>>> but for display refresh? we're talking milliseconds, no?
>>> Can you just ioctl() it from user space?
>>> If cost of enable/disable is high or the time range between toggling is
>>> long, then doing it from the bpf program doesn't make sense. Instead
>>> the program can do bpf_perf_event_output() to send a notification to
>>> user space that condition is met and the user space can ioctl() events.
>>>
>>
>> OK. I think I understand your design principle: everything inside BPF
>> should be as fast as possible.
>>
>> Making userspace control events using ioctl makes things harder. You
>> know that 'perf record' itself doesn't care much about the events it
>> receives. It only copies data to perf.data, but what we want is to use
>> perf record simply like this:
>>
>>   # perf record -e evt=cycles -e control.o/pmu=evt/ -a sleep 100
>>
>> And in control.o we create uprobe points to mark the start and finish
>> of a frame:
>>
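>>   /* uprobe at the start of a frame: turn sampling on */
>>   SEC("target=/a/b/c.o\nstartFrame=0x123456")
>>   int startFrame(void *ctx) {
>>     bpf_pmu_enable(pmu);
>>     return 1;
>>   }
>>
>>   /* uprobe at the end of a frame: turn sampling off */
>>   SEC("target=/a/b/c.o\nfinishFrame=0x234568")
>>   int finishFrame(void *ctx) {
>>     bpf_pmu_disable(pmu);
>>     return 1;
>>   }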
>>
>> I think this makes sense too.
>
> yes. that looks quite useful,
> but did you consider re-entrant startFrame() ?
> start << here sampling starts
>   start
>   finish << here all samples disabled?!
> finish
> and startFrame()/finishFrame() running on all cpus of that user app ?
> One cpu entering startFrame() while another cpu is doing finishFrame():
> what should the behavior be? sampling is still enabled on all cpus? or off?
> Either case doesn't seem to work with simple enable/disable.
> A few emails back in this thread, I mentioned inc/dec of a flag
> to solve that.

Correct.

>
>> What about using an implementation similar to
>> PERF_EVENT_IOC_SET_OUTPUT: create a new ioctl like
>> PERF_EVENT_IOC_SET_ENABLER, then let perf select an event as the
>> 'enabler'. BPF can then still control one atomic variable to
>> enable/disable a set of events.
>
> you lost me on that last sentence. How will this 'enabler' work?

Like what we did in this patchset: add an atomic flag to perf_event, and
connect all the perf_events to this enabler with PERF_EVENT_IOC_SET_ENABLER.
At run time, each event checks the enabler's atomic flag, so one atomic
variable controls a set of perf_events. Finally, create a BPF helper
function to control that atomic variable.
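
To make the idea concrete, here is a minimal userspace sketch of the
enabler model, not kernel code. All names in it (struct enabler,
struct event, event_set_enabler(), enabler_set(), event_sample()) are
hypothetical and only stand in for perf_event, the proposed
PERF_EVENT_IOC_SET_ENABLER ioctl, the BPF helper and the sample output
path. It assumes the flag means "enabled"; the real patch may use the
opposite polarity.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the event selected as 'enabler'. */
struct enabler {
    atomic_int enabled;        /* 1: emit samples, 0: drop them */
};

/* Hypothetical stand-in for one sampling perf_event. */
struct event {
    struct enabler *enabler;   /* set once, as the proposed ioctl would */
    unsigned long emitted;     /* samples actually written out */
};

/* Models PERF_EVENT_IOC_SET_ENABLER: connect an event to an enabler. */
static void event_set_enabler(struct event *ev, struct enabler *en)
{
    ev->enabler = en;
}

/* Models the BPF helper: a single atomic store switches the whole set. */
static void enabler_set(struct enabler *en, bool on)
{
    atomic_store(&en->enabled, on ? 1 : 0);
}

/* Models the sample path: check the shared flag before emitting. */
static bool event_sample(struct event *ev)
{
    if (ev->enabler && !atomic_load(&ev->enabler->enabled))
        return false;          /* sampling is off: drop this sample */
    ev->emitted++;
    return true;
}

/* Two "per-cpu" events follow one enabler; returns total emitted samples. */
static unsigned long enabler_demo(void)
{
    struct enabler en;
    struct event cpu0 = { NULL, 0 }, cpu1 = { NULL, 0 };
    bool a, b, c, d;

    atomic_init(&en.enabled, 0);
    event_set_enabler(&cpu0, &en);
    event_set_enabler(&cpu1, &en);

    a = event_sample(&cpu0);   /* flag is 0: dropped */
    enabler_set(&en, true);    /* e.g. called from startFrame() */
    b = event_sample(&cpu0);   /* emitted */
    c = event_sample(&cpu1);   /* emitted, same flag */
    enabler_set(&en, false);   /* e.g. called from finishFrame() */
    d = event_sample(&cpu1);   /* dropped again */

    assert(!a && b && c && !d);
    return cpu0.emitted + cpu1.emitted;
}
```

Note that in this sketch the events keep firing and only the output is
suppressed, which matches the light-weight behavior discussed in this
thread; the PMU itself is not stopped.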

> Also I'm still missing what's wrong with perf doing ioctl() on
> events on all cpus manually when bpf program tells it to do so.
> Is it speed you are concerned about or extra work in perf ?
>

I think both speed and extra work are concerns.

Say we use perf to enable/disable sampling. Using the above example:
when the smartphone starts refreshing its display, we write something
into the ring buffer, and then the display refresh starts. We have to wait
for perf to be scheduled in, parse the events it gets (perf record doesn't
do this currently), discover the trigger event, and then enable the
sampling perf events on all cpus. This makes trigger and action
asynchronous. I'm not sure how many ns or ms that needs, and I believe
asynchrony itself introduces complexity, which I think should be avoided
unless we can explain the advantages asynchrony brings.

But yes, a perf-based implementation can shut down the PMU completely, which
is better than the current light-weight implementation.

In summary:

  - In the next version we will use a counter-based flag instead of the
    current 0/1 switch, to handle the re-entrancy problem.

  - I think we both agree we need a light-weight solution with which we can
    enable/disable sampling at function level. This light-weight solution
    can be applied to only one perf event.

  - Our disagreement is whether to introduce a heavy-weight solution based
    on perf to enable/disable a group of perf events. For me, the perf-based
    solution can shut down the PMU completely, which is good. However, it
    introduces asynchrony and extra work in perf. I think we can
    do it in a much simpler, fully BPF way. The enabler solution I mentioned
    above is a candidate.
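
For the first point, the counter-based flag could look roughly like the
userspace sketch below. It is only an illustration under assumptions:
frame_start(), frame_finish() and sampling_on() are hypothetical names,
and the counter is assumed to mean "number of frames currently open",
so sampling stays on while it is above zero. Unbalanced finish calls
are not handled here.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Number of frames currently open; static storage starts at 0. */
static atomic_int sample_depth;

/* What startFrame() would do: one more frame is in flight. */
static void frame_start(void)
{
    atomic_fetch_add(&sample_depth, 1);
}

/* What finishFrame() would do: one frame is done. */
static void frame_finish(void)
{
    atomic_fetch_sub(&sample_depth, 1);
}

/* What the sample path would check before emitting. */
static bool sampling_on(void)
{
    return atomic_load(&sample_depth) > 0;
}

/*
 * Replays the nested start/start/finish/finish case from this thread.
 * Returns 1 if sampling is still on after the inner finish.
 */
static int nested_demo(void)
{
    int on_after_inner_finish;

    frame_start();                 /* outer start: sampling begins */
    frame_start();                 /* inner start (re-entrant or other cpu) */
    frame_finish();                /* inner finish */
    on_after_inner_finish = sampling_on();
    frame_finish();                /* outer finish: sampling ends */

    assert(!sampling_on());
    return on_after_inner_finish;
}
```

With a plain 0/1 switch the inner finish would already turn sampling
off; with the counter it stays on until the last finish, and the same
holds when start and finish run on different cpus.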

Thank you.


