From: Wang Nan <wangnan0@huawei.com>
To: Alexei Starovoitov <ast@plumgrid.com>, <davem@davemloft.net>,
	<acme@kernel.org>, <mingo@redhat.com>, <a.p.zijlstra@chello.nl>,
	<masami.hiramatsu.pt@hitachi.com>, <jolsa@kernel.org>
Cc: <linux-kernel@vger.kernel.org>, <pi3orama@163.com>,
	<hekuang@huawei.com>, <bgregg@netflix.com>
Subject: Re: [RFC PATCH 00/22] perf tools: introduce 'perf bpf' command to load eBPF programs.
Date: Tue, 5 May 2015 14:14:23 +0800
Message-ID: <55485FBF.20306@huawei.com>
In-Reply-To: <554859CD.4090206@plumgrid.com>

On 2015/5/5 13:49, Alexei Starovoitov wrote:
> On 5/4/15 9:41 PM, Wang Nan wrote:
>>
>> That's great. Could you please add a description of 'llvm -s' to your README
>> or comments? It cost me a lot of time to dump eBPF instructions, so I decided to
>> add it to perf...
> 
> sure. it's just -filetype=asm flag to llc instead of -filetype=obj.
> Eventually it will work as normal 'clang -S file.c' when few more
> llvm commits are accepted upstream.
> 
>>>> My colleague He Kuang is working on variable access. Probing inside a function body
>>>> and accessing its local variables will be supported like this:
>>>>
>>>>    SEC("config") char _prog_config[] = "prog: func_name:1234 vara=localvara"
>>>>    int prog(struct pt_regs *ctx, unsigned long vara) {
>>>>       // vara is the value of localvara of function func_name
>>>>    }
>>>
>>> that would be great. I'm not sure though how you can achieve that
>>> without changing C front-end ?
>>
>> It's not very difficult. He is trying to generate the loader of vara
>> as a prologue, then paste the prologue and the main eBPF program together.
>> From the viewpoint of the kernel bpf verifier, there is only one param (ctx); the
>> prologue program fetches the value of vara and puts it into the proper register,
>> then the main program runs.
> 
> got it. I think that's much cleaner than what I was proposing.
> The only question is then:
> char _prog_config[] = "prog: func_name:1234 vara=localvara"
> should actually be something like "... r2=localvara", right?
> since prologue would need to assign into r2.
> Otherwise I don't see where you find out about 'vara' inside
> compiled bpf code.
>

I think the calling convention can tell us which variable should go into which
register. In the case of

 SEC("config") char _prog_config[] = "prog: func_name:1234 vara=localvara varb=globalvarb";
 int prog(struct pt_regs *ctx, unsigned long vara, unsigned long varb) { ... }

llvm should compile 'prog' according to the calling convention, so the body of that
program assumes vara is in r2 and varb is in r3. The prologue also puts the vars into
r2 and r3 according to the calling convention. Therefore, after pasting them together, the
final program should run properly. There is no need to describe register numbers explicitly.
What do you think?
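
A minimal sketch of that rule in user-space C, to make the idea concrete. It
assumes the prologue generator binds the "name=source" tokens of the config
string to r2, r3, ... in order of appearance (r1 being ctx, with r1..r5 the
eBPF argument registers). The names var_binding and assign_arg_regs are
hypothetical and are not part of the patch set.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* eBPF passes up to five arguments in R1..R5; R1 holds ctx here. */
#define BPF_CTX_REG      1
#define BPF_LAST_ARG_REG 5

struct var_binding {
	char param[32];   /* parameter name in the eBPF program, e.g. "vara" */
	char source[32];  /* variable fetched at the probe point, e.g. "localvara" */
	int  reg;         /* BPF register the prologue would have to fill */
};

/*
 * Hypothetical helper: walk the "name=source" tokens of a config string
 * and bind them to R2, R3, ... in order of appearance. Returns the number
 * of bindings, or -1 if the list does not fit in R2..R5 or in out[].
 */
static int assign_arg_regs(const char *config, struct var_binding *out, int max)
{
	char buf[256];
	int n = 0;

	assert(config && out);
	if (strlen(config) >= sizeof(buf))
		return -1;
	strcpy(buf, config);

	for (char *tok = strtok(buf, " "); tok; tok = strtok(NULL, " ")) {
		char *eq = strchr(tok, '=');

		if (!eq)
			continue;  /* skips "prog:" and "func_name:1234" */

		int reg = BPF_CTX_REG + 1 + n;

		if (reg > BPF_LAST_ARG_REG || n >= max)
			return -1;
		*eq = '\0';
		snprintf(out[n].param, sizeof(out[n].param), "%s", tok);
		snprintf(out[n].source, sizeof(out[n].source), "%s", eq + 1);
		out[n].reg = reg;
		n++;
	}
	return n;
}
```

For the config string above this would bind vara to r2 and varb to r3, which
is the same assignment llvm makes for the second and third parameters of
'prog'.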


> Would be nice if this can be done without debug info.
> Like in tracex2_kern.c I have:
> SEC("kprobe/sys_write")
> int bpf_prog(struct pt_regs *ctx)
> {
>         long wr_size = ctx->dx; /* arg3 */
> 
> with your prolog generator the above can be rewritten as:
> SEC("kprobe/sys_write")
> int bpf_prog(struct pt_regs *unused, int fd, char *buf, size_t wr_size)
> {
>         /* use wr_size */
> 
> that will improve ease of use a lot.
>

It is possible when probing at the entry of a function. However, when probing inside
a function body, there still needs to be a way to pass the list of variables required
by the program to perf so that it can generate the correct prologue. We'd like to
implement the generic form (listing vars in the config string) first, then add function
parameter access as syntactic sugar.
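
For the entry-probe case, a rough sketch of what that sugar could expand to on
x86-64, assuming the System V calling convention (first six integer arguments
in rdi, rsi, rdx, rcx, r8, r9, i.e. pt_regs fields di, si, dx, cx, r8, r9).
The helper names are hypothetical; this only illustrates the mapping, it is
not code from the patch set.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* pt_regs field names for the first six integer arguments on x86-64. */
static const char *const x86_64_arg_fields[] = {
	"di", "si", "dx", "cx", "r8", "r9"
};

/*
 * Hypothetical helper: pt_regs field that holds argument `argno`
 * (1-based) at function entry, or NULL if it is passed on the stack.
 */
static const char *entry_arg_field(int argno)
{
	size_t n = sizeof(x86_64_arg_fields) / sizeof(x86_64_arg_fields[0]);

	assert(argno >= 1);
	if ((size_t)argno > n)
		return NULL;
	return x86_64_arg_fields[argno - 1];
}

/*
 * Hypothetical helper: BPF register that receives the probed function's
 * argument `argno` when the eBPF program is declared as
 * prog(ctx, arg1, arg2, ...). R1 is ctx, so argument N lands in R(N+1).
 * Returns -1 when it does not fit in R2..R5.
 */
static int entry_arg_bpf_reg(int argno)
{
	assert(argno >= 1);
	return argno + 1 <= 5 ? argno + 1 : -1;
}
```

With this mapping, wr_size in the sys_write example is argument 3, so the
prologue would copy ctx->dx into r4.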

>> Another possible solution is to change the protocol between kprobes and the eBPF
>> program: make kprobes call the fetchers and pass the results to the eBPF program as
>> a second param (group all varx together).
>> A prologue may still be needed in this case to load each param into the correct
>> register.
> 
> you mean grouping varx together in some other struct and embedding it
> together with pt_regs into new container struct?
> doable, but your first approach is quite clean already. why bother.
> 

The second approach lets us reuse the fetcher code that is already in the
kernel. Furthermore, if a new type of fetcher appears (for example, a fetcher
for PMU counters), we support it automatically.

>> Could you please consider the following problem?
>>
>> We find there are several __lock_page() calls that last a very long time. We are going
>> to find the corresponding __unlock_page() so we can know what blocks them. We want to
>> insert an eBPF program before io_schedule() in __lock_page(), and also add an eBPF program
>> at the entry of __unlock_page(), so we can compute the interval between page locking and
>> unlocking. If the time is longer than a threshold, let __unlock_page() trigger a perf sample
>> so we get its call stack. In this case, the eBPF program acts as a trace filter.
> 
> all makes sense and your use case fits quite well into existing
> bpf+kprobe model. I'm not sure why you're calling a 'problem'.
> A problem of how to display that call stack from perf?
> I would say it fits better as a sample than a trace.
> If you dump it as a trace, it won't be easy to decipher, whereas if you
> treat it as a sampling event, the perf record/report facility will pick it
> up and display it nicely. Meaning that one sample == lock_page/unlock_page
> latency > N. Then the existing sample_callchain flag should work.
> 

Quite well. Do we have an eBPF function like

static int (*bpf_perf_sample)(const char *fmt, int fmt_size, ...) = BPF_FUNC_perf_sample;

so we can use it in the program probing the body of __unlock_page(), like this:

 ...
 if (latency > 0.5s)
    bpf_perf_sample("page=%p, latency=%d", sizeof(...), page, latency);
 ...

Thank you.

