linux-kernel.vger.kernel.org archive mirror
* [RFC] bpf: Suggestion on bpf syscall interface
@ 2015-03-28 11:36 He Kuang
  2015-03-28 17:21 ` Alexei Starovoitov
  0 siblings, 1 reply; 5+ messages in thread
From: He Kuang @ 2015-03-28 11:36 UTC (permalink / raw)
  To: ast@plumgrid.com >> Alexei Starovoitov
  Cc: wangnan0, linux-kernel@vger.kernel.org >> LKML

Hi, Alexei

In our end-to-end IO module project, we use bpf maps to record
configurations. With the current bpf syscall interface, we
must specify a map_fd to look up or update bpf maps, so we are
restricted to doing the configuration in the same user program.

My suggestion is to export this kind of operation to sysfs, so
we can load and attach bpf progs and configure them separately. We
implemented this feature in our demo project. What's your opinion
on this?



* Re: [RFC] bpf: Suggestion on bpf syscall interface
  2015-03-28 11:36 [RFC] bpf: Suggestion on bpf syscall interface He Kuang
@ 2015-03-28 17:21 ` Alexei Starovoitov
  2015-03-28 22:16   ` Daniel Borkmann
  2015-03-30  3:13   ` He Kuang
  0 siblings, 2 replies; 5+ messages in thread
From: Alexei Starovoitov @ 2015-03-28 17:21 UTC (permalink / raw)
  To: He Kuang; +Cc: wangnan0, LKML

On 3/28/15 4:36 AM, He Kuang wrote:
> Hi, Alexei
>
> In our end-to-end IO module project, we use bpf maps to record
> configurations. With the current bpf syscall interface, we
> must specify a map_fd to look up or update bpf maps, so we are
> restricted to doing the configuration in the same user program.

You can pass map_fd and prog_fd from one process to another via the normal
SCM_RIGHTS mechanism.
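
For reference, passing a descriptor with SCM_RIGHTS over an AF_UNIX socket
can be sketched roughly as below. The helper names send_fd() and recv_fd()
are made up for illustration and are not part of any bpf API; the same code
should apply to a bpf map or prog fd as to any other file descriptor.

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Hypothetical helper: send one fd over a connected AF_UNIX socket.
 * Returns 0 on success, -1 on error. */
int send_fd(int sock, int fd)
{
    char byte = 'F';   /* one byte of ordinary payload rides along with the fd */
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union {            /* union guarantees cmsghdr alignment for the buffer */
        char buf[CMSG_SPACE(sizeof(int))];
        struct cmsghdr align;
    } ctrl;
    struct msghdr msg;
    struct cmsghdr *cmsg;

    memset(&ctrl, 0, sizeof(ctrl));
    memset(&msg, 0, sizeof(msg));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctrl.buf;
    msg.msg_controllen = sizeof(ctrl.buf);

    cmsg = CMSG_FIRSTHDR(&msg);
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SCM_RIGHTS;
    cmsg->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

    return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Hypothetical helper: receive one fd sent by send_fd().
 * Returns the new descriptor, or -1 on error. */
int recv_fd(int sock)
{
    char byte;
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union {
        char buf[CMSG_SPACE(sizeof(int))];
        struct cmsghdr align;
    } ctrl;
    struct msghdr msg;
    struct cmsghdr *cmsg;
    int fd = -1;

    memset(&ctrl, 0, sizeof(ctrl));
    memset(&msg, 0, sizeof(msg));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctrl.buf;
    msg.msg_controllen = sizeof(ctrl.buf);

    if (recvmsg(sock, &msg, 0) <= 0)
        return -1;

    cmsg = CMSG_FIRSTHDR(&msg);
    if (!cmsg || cmsg->cmsg_level != SOL_SOCKET ||
        cmsg->cmsg_type != SCM_RIGHTS ||
        cmsg->cmsg_len != CMSG_LEN(sizeof(int)))
        return -1;

    memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
    return fd;
}
```

A loader would call send_fd(sock, map_fd) after creating the map, and a
separate configuration tool would call recv_fd(sock) and then use the
returned descriptor with the usual bpf syscall commands.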

> My suggestion is to export this kind of operation to sysfs, so
> we can load and attach bpf progs and configure them separately. We
> implemented this feature in our demo project. What's your opinion
> on this?

Eventually we may use a single sysfs file for lsmod-like listings, but I
definitely don't want to create a parallel interface to maps via sysfs.
It's way too expensive and not really suitable for binary key/values.




* Re: [RFC] bpf: Suggestion on bpf syscall interface
  2015-03-28 17:21 ` Alexei Starovoitov
@ 2015-03-28 22:16   ` Daniel Borkmann
  2015-03-30  3:13   ` He Kuang
  1 sibling, 0 replies; 5+ messages in thread
From: Daniel Borkmann @ 2015-03-28 22:16 UTC (permalink / raw)
  To: Alexei Starovoitov, He Kuang; +Cc: wangnan0, LKML

On 03/28/2015 06:21 PM, Alexei Starovoitov wrote:
> On 3/28/15 4:36 AM, He Kuang wrote:
>> Hi, Alexei
>>
>> In our end-to-end IO module project, we use bpf maps to record
>> configurations. With the current bpf syscall interface, we
>> must specify a map_fd to look up or update bpf maps, so we are
>> restricted to doing the configuration in the same user program.
>
> You can pass map_fd and prog_fd from one process to another via the normal
> SCM_RIGHTS mechanism.

+1, I've just tried that out in the context of different work, and it
works like a charm.

>> My suggestion is to export this kind of operation to sysfs, so
>> we can load and attach bpf progs and configure them separately. We
>> implemented this feature in our demo project. What's your opinion
>> on this?
>
> Eventually we may use a single sysfs file for lsmod-like listings, but I
> definitely don't want to create a parallel interface to maps via sysfs.

Yes, that would be a bad design decision. Btw, even more lightweight
on the kernel side would be to just implement .show_fdinfo() for the anon
inodes on the map/prog store and have some meta information exported
from there. You can then grab that via /proc/<pid>/fdinfo/<fd>. I
would consider such a thing a slow-path operation anyway, and you would
also get the info about which app is using it for free.
> It's way too expensive and not really suitable for binary key/values.

+1


* Re: [RFC] bpf: Suggestion on bpf syscall interface
  2015-03-28 17:21 ` Alexei Starovoitov
  2015-03-28 22:16   ` Daniel Borkmann
@ 2015-03-30  3:13   ` He Kuang
  2015-03-31  3:23     ` Alexei Starovoitov
  1 sibling, 1 reply; 5+ messages in thread
From: He Kuang @ 2015-03-30  3:13 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: wangnan0, LKML, daniel@iogearbox.net >> Daniel Borkmann


On 2015/3/29 1:21, Alexei Starovoitov wrote:
> On 3/28/15 4:36 AM, He Kuang wrote:
>> Hi, Alexei
>>
>> In our end-to-end IO module project, we use bpf maps to record
>> configurations. With the current bpf syscall interface, we
>> must specify a map_fd to look up or update bpf maps, so we are
>> restricted to doing the configuration in the same user program.
>
> You can pass map_fd and prog_fd from one process to another via the normal
> SCM_RIGHTS mechanism.

In our current use case, we add a bpf probe point in sys_write()
as the entry of the IO procedure. This bpf point returns true under
some conditions and then triggers the bpf chain on the IO path, for
example:

SEC("kprobe/sys_write")
int NODE_sys_write(struct pt_regs *ctx) {
...
    struct parameters *param =
        bpf_map_lookup_elem(&parameters_map, &index);
    if (param->num_samples % param->sample_rate != 0)
        return 0;
...
    /* extract characters from this sampled point, fill it to another map */
    bpf_map_update_elem(&TRIGGER_mpage_submit_page_HASH,
                        (void**)__b__buf, &value, BPF_ANY);
    return 1;
...
SEC("kprobe/mpage_submit_page")
int NODE_mpage_submit_page(struct pt_regs *ctx) {
...
    /* lookup filter table */
    value = (struct table_value*)
        bpf_map_lookup_elem(&TRIGGER_mpage_submit_page_HASH,
                            (void**)__b__buf);
    if (!value)
        return 0;
...

With the current bpf syscalls, we have to keep the program that
attaches the bpf programs running in the background, and use it, or
some other process that communicates with it, to adjust map
parameters such as the sample rate for sys_write.
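
To make the adjustment step concrete, here is a minimal sketch of how a
process holding the map fd might change the sample rate through the bpf
syscall. The layout of struct parameters (two 64-bit fields) and the
helper names map_update() and set_sample_rate() are assumptions for
illustration; only the field names num_samples and sample_rate come from
the program above.

```c
#include <linux/bpf.h>
#include <stdint.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Assumed layout: only the two field names are known, the types are a guess. */
struct parameters {
    uint64_t num_samples;
    uint64_t sample_rate;
};

/* Thin wrapper around bpf(BPF_MAP_UPDATE_ELEM).
 * Returns 0 on success, -1 with errno set on failure. */
int map_update(int map_fd, const void *key, const void *value, uint64_t flags)
{
    union bpf_attr attr;

    memset(&attr, 0, sizeof(attr));
    attr.map_fd = (uint32_t)map_fd;
    attr.key = (uint64_t)(uintptr_t)key;
    attr.value = (uint64_t)(uintptr_t)value;
    attr.flags = flags;

    return (int)syscall(__NR_bpf, BPF_MAP_UPDATE_ELEM, &attr, sizeof(attr));
}

/* Hypothetical helper: overwrite the parameters entry at `index`.
 * For simplicity this also resets num_samples to 0; a real tool would
 * probably look the entry up first and preserve the counter. */
int set_sample_rate(int map_fd, uint32_t index, uint64_t rate)
{
    struct parameters p = { .num_samples = 0, .sample_rate = rate };

    return map_update(map_fd, &index, &p, BPF_ANY);
}
```

The map_fd here can be one the process created itself or one it received
from the loader.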

What we hope is to use bpf maps/progs like kernel modules or
kprobes: one process inserts them into the kernel, then they detach
from that process and can be configured through sysfs. For
example:

$ perf probe --add='sys_write'
$ perf record -e probe:sys_write -aR sleep 1

In the current implementation, we have to use a large and relatively
heavy daemon to handle loading, configuration, adjustment and
unloading all together.

Thanks.
>
>> My suggestion is to export this kind of operation to sysfs, so
>> we can load and attach bpf progs and configure them separately. We
>> implemented this feature in our demo project. What's your opinion
>> on this?
>
> Eventually we may use a single sysfs file for lsmod-like listings, but I
> definitely don't want to create a parallel interface to maps via sysfs.
> It's way too expensive and not really suitable for binary key/values.

* Re: [RFC] bpf: Suggestion on bpf syscall interface
  2015-03-30  3:13   ` He Kuang
@ 2015-03-31  3:23     ` Alexei Starovoitov
  0 siblings, 0 replies; 5+ messages in thread
From: Alexei Starovoitov @ 2015-03-31  3:23 UTC (permalink / raw)
  To: He Kuang; +Cc: wangnan0, LKML, daniel@iogearbox.net >> Daniel Borkmann

On 3/29/15 8:13 PM, He Kuang wrote:
>
> With the current bpf syscalls, we have to keep the program that
> attaches the bpf programs running in the background, and use it, or
> some other process that communicates with it, to adjust map
> parameters such as the sample rate for sys_write.

You can do all of the above by passing fds between processes. I still
don't see a need for sysfs.

> In the current implementation, we have to use a large and relatively
> heavy daemon to handle loading, configuration, adjustment and
> unloading all together.

This daemon is actually small and simple.
Just take a look at how Daniel did it for tc:
http://patchwork.ozlabs.org/patch/456387/
In that example, 3 programs share maps and a single bpf_agent
monitors the maps. Note that tc loaded the programs and exited while
the agent keeps running. Very straightforward.


