linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* Re: [PATCH v2 00/15] tracing: 'hist' triggers
@ 2015-03-03  2:31 Alexei Starovoitov
  2015-03-03 15:47 ` Tom Zanussi
  0 siblings, 1 reply; 23+ messages in thread
From: Alexei Starovoitov @ 2015-03-03  2:31 UTC (permalink / raw)
  To: Tom Zanussi
  Cc: Steven Rostedt, Masami Hiramatsu, Namhyung Kim, Andi Kleen, LKML,
	Ingo Molnar, Arnaldo Carvalho de Melo

On Mon, Mar 2, 2015 at 5:18 PM, Tom Zanussi <tom.zanussi@linux.intel.com> wrote:
>>
>> I'm saying keep the command line version of hist, but let
>> user space process it.
>> I don't buy the argument that you must run it in busybox
>> without any extra tools.
>> If you're in busybox, the system is likely idle, so nothing
>> to trace/analyze. If you have some user space apps,
>> then it equally easy to add 'hist->bpf' tool.
>>
>
> How about systems that run a single statically linked process with no
> shell (but a service that can read and write files like/event/trigger
> and event/hist)?  We'd still like to be able to trace those systems.

hmm, there is no shell and one cannot do
'echo hist.. > trigger; cat hist' , but there is a demon
that can accept remote commands ?
Then would make even more sense to pass bpf programs
from remote host and consume aggregated data remotely.
This on-host demon will be tiny wrapper on top of bpf syscall.
Quite a bit more flexible :)

> Well, I'd say writing BPF 'assembly' to do anything isn't something more
> than a few users in the world would even consider, so that's completely
> out. Which means the only practical way to use it is via the C
> interface.  But getting that set up properly doesn't seem
> straightforward either - it isn't something the Makefile will help with,
> and there's no documentation on how one might do it.

I'm not proposing to use asm or C for this 'hist->bpf' tool.
Keep proposed 'hist:keys=...:vals=...' syntax and generate
bpf program on the fly based on this string.
So this user tool will take string, generate program, load
and attach it. Everything based on this single string input.
With the examples you mentioned in docs, it's not hard.
It will be more involved than patch 12, but way more generic
and easily extensible when 'hist:keys=...' string would need
to be extended.

> So I tweaked the Makefile to get samples/bpf in the build (I mean the
> directory is there under samples/, so why do I need to add it to the
> Makefile myself?) and tried building which failed until I tweaked
> something else to get it to find the right headers, etc.  Finally I got
> it building the userspace stuff but then found out I needed my own llvm
> to get the kernel modules built, so searched and found your llvm tree
> which I thought would configure the bpf backend automatically, but
> apparently not, since it then failed with llc: invalid target 'bpf'
> which is where I gave up.  Do I need to configure with --target=bpf or
> something like that?  I don't know, and know nothing about llvm, so am
> kind of stuck.

hmm. 'make samples/bpf/' should work out of the box.
only llvm needs to installed.
To install llvm:
$git clone http://llvm.org/git/llvm.git
$mkdir llvm/bld; cd llvm/bld
if you like cmake:
$cmake -DLLVM_EXPERIMENTAL_TARGETS_TO_BUILD=BPF ..
$make
if you like autoconf:
$../configure --enable-experimental-targets=BPF
$make

That's typical for any new backend. Hopefully soon it will lose
'experimental' status.

> I really do want to try doing something with it, and I understand that
> you're working on improving the user experience, but at this point it
> seems users have to jump through a lot of hoops just to get a minimally
> working setup.  Even a small paragraph with some basic instructions
> would help.  Or maybe it's just me, and it works for everyone else out
> of the box.

Thank you for feedback. We'll add llvm build instructions to the doc.
The goal for llvm is to be able to do
'apt-get install llvm-3.7' and bpf will be part of default install.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2 00/15] tracing: 'hist' triggers
  2015-03-03  2:31 [PATCH v2 00/15] tracing: 'hist' triggers Alexei Starovoitov
@ 2015-03-03 15:47 ` Tom Zanussi
  2015-03-04  2:22   ` Alexei Starovoitov
  0 siblings, 1 reply; 23+ messages in thread
From: Tom Zanussi @ 2015-03-03 15:47 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Steven Rostedt, Masami Hiramatsu, Namhyung Kim, Andi Kleen, LKML,
	Ingo Molnar, Arnaldo Carvalho de Melo

On Mon, 2015-03-02 at 18:31 -0800, Alexei Starovoitov wrote:
> On Mon, Mar 2, 2015 at 5:18 PM, Tom Zanussi <tom.zanussi@linux.intel.com> wrote:
> >>
> >> I'm saying keep the command line version of hist, but let
> >> user space process it.
> >> I don't buy the argument that you must run it in busybox
> >> without any extra tools.
> >> If you're in busybox, the system is likely idle, so nothing
> >> to trace/analyze. If you have some user space apps,
> >> then it equally easy to add 'hist->bpf' tool.
> >>
> >
> > How about systems that run a single statically linked process with no
> > shell (but a service that can read and write files like/event/trigger
> > and event/hist)?  We'd still like to be able to trace those systems.
> 
> hmm, there is no shell and one cannot do
> 'echo hist.. > trigger; cat hist' , but there is a demon
> that can accept remote commands ?
> Then would make even more sense to pass bpf programs
> from remote host and consume aggregated data remotely.
> This on-host demon will be tiny wrapper on top of bpf syscall.
> Quite a bit more flexible :)
> 
> > Well, I'd say writing BPF 'assembly' to do anything isn't something more
> > than a few users in the world would even consider, so that's completely
> > out. Which means the only practical way to use it is via the C
> > interface.  But getting that set up properly doesn't seem
> > straightforward either - it isn't something the Makefile will help with,
> > and there's no documentation on how one might do it.
> 
> I'm not proposing to use asm or C for this 'hist->bpf' tool.
> Keep proposed 'hist:keys=...:vals=...' syntax and generate
> bpf program on the fly based on this string.
> So this user tool will take string, generate program, load
> and attach it. Everything based on this single string input.
> With the examples you mentioned in docs, it's not hard.
> It will be more involved than patch 12, but way more generic
> and easily extensible when 'hist:keys=...' string would need
> to be extended.
> 

Hmm, this seems silly to me - doing all that work just to get back to
where we already are.

I don't think you can start requiring all new tracing functionality to
embed a code generator/compiler, or as a result force any particular
interface on users.  If it's possible to factor out a common
infrastructure that can accommodate different user interfaces and
preferences, don't you think it makes sense to do that?

In any case, outside of the boilerplate tracing infrastructure code, it
seems to me that a lot of the code in the main hist trigger patches is
code that you'd also need for a tracepoint/bpf interface.  I think we've
already agreed that it would be good to work towards being able to share
that, so for the next version, I'll see what I can come up with.

Tom



> > So I tweaked the Makefile to get samples/bpf in the build (I mean the
> > directory is there under samples/, so why do I need to add it to the
> > Makefile myself?) and tried building which failed until I tweaked
> > something else to get it to find the right headers, etc.  Finally I got
> > it building the userspace stuff but then found out I needed my own llvm
> > to get the kernel modules built, so searched and found your llvm tree
> > which I thought would configure the bpf backend automatically, but
> > apparently not, since it then failed with llc: invalid target 'bpf'
> > which is where I gave up.  Do I need to configure with --target=bpf or
> > something like that?  I don't know, and know nothing about llvm, so am
> > kind of stuck.
> 
> hmm. 'make samples/bpf/' should work out of the box.
> only llvm needs to installed.
> To install llvm:
> $git clone http://llvm.org/git/llvm.git
> $mkdir llvm/bld; cd llvm/bld
> if you like cmake:
> $cmake -DLLVM_EXPERIMENTAL_TARGETS_TO_BUILD=BPF ..
> $make
> if you like autoconf:
> $../configure --enable-experimental-targets=BPF
> $make
> 
> That's typical for any new backend. Hopefully soon it will lose
> 'experimental' status.
> 
> > I really do want to try doing something with it, and I understand that
> > you're working on improving the user experience, but at this point it
> > seems users have to jump through a lot of hoops just to get a minimally
> > working setup.  Even a small paragraph with some basic instructions
> > would help.  Or maybe it's just me, and it works for everyone else out
> > of the box.
> 
> Thank you for feedback. We'll add llvm build instructions to the doc.
> The goal for llvm is to be able to do
> 'apt-get install llvm-3.7' and bpf will be part of default install.



^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2 00/15] tracing: 'hist' triggers
  2015-03-03 15:47 ` Tom Zanussi
@ 2015-03-04  2:22   ` Alexei Starovoitov
  2015-03-04  5:01     ` Tom Zanussi
  0 siblings, 1 reply; 23+ messages in thread
From: Alexei Starovoitov @ 2015-03-04  2:22 UTC (permalink / raw)
  To: Tom Zanussi
  Cc: Steven Rostedt, Masami Hiramatsu, Namhyung Kim, Andi Kleen, LKML,
	Ingo Molnar, Arnaldo Carvalho de Melo

On 3/3/15 7:47 AM, Tom Zanussi wrote:
> On Mon, 2015-03-02 at 18:31 -0800, Alexei Starovoitov wrote:
>> On Mon, Mar 2, 2015 at 5:18 PM, Tom Zanussi <tom.zanussi@linux.intel.com> wrote:

>> I'm not proposing to use asm or C for this 'hist->bpf' tool.
>> Keep proposed 'hist:keys=...:vals=...' syntax and generate
>> bpf program on the fly based on this string.
>> So this user tool will take string, generate program, load
>> and attach it. Everything based on this single string input.
>> With the examples you mentioned in docs, it's not hard.
>> It will be more involved than patch 12, but way more generic
>> and easily extensible when 'hist:keys=...' string would need
>> to be extended.
>>
>
> Hmm, this seems silly to me - doing all that work just to get back to
> where we already are.

your 'hist->bpf' tool could have been first to solve 'bpf hard to use'
problem and overtime it could have evolved into full dtrace alternative.
Whereas by adding 'hist:keys=..' parsing to kernel you'll be stuck
with it and somebody else's dtrace-like tool will supersede it.

> I don't think you can start requiring all new tracing functionality to
> embed a code generator/compiler, or as a result force any particular
> interface on users.

force interface on users?!
I think I've explained enough that bpf would not be a user visible
interface. It's kernel to user land interface. In my proposal of
'hist->bpf' tool users won't even see that bpf is being generated
and run underneath. Same with dtrace-like scripting. Users will
see scripting language and won't care how it's compiled and
executed inside kernel. Multiple different languages and interfaces
are possible. Including hist-like text that's not a language.

 > If it's possible to factor out a common
> infrastructure that can accommodate different user interfaces and
> preferences, don't you think it makes sense to do that?
>
> In any case, outside of the boilerplate tracing infrastructure code, it
> seems to me that a lot of the code in the main hist trigger patches is
> code that you'd also need for a tracepoint/bpf interface.  I think we've
> already agreed that it would be good to work towards being able to share
> that, so for the next version, I'll see what I can come up with.

yes. let's see what pieces can be shared. Though I don't want to
sacrifice long term goals for short term solution.
In case of bpf_maps, the objective is to share data between
kernel and user space by single abstraction with different
implementations underneath. In some use cases kernel programs only
do lookups into maps, while user space concurrently adding/deleting
entries (so no allocations from programs)
In your patch 4 you're adding kernel only access to maps but planning
to use hash type only and also not exposing these maps to user space
at all... that defeats bpf_map purpose and you only reusing ~200
lines of hashtab.c code, but also need to hack it for pre-allocate
and make config_bpf_syscall a strong dependency for 'hist'. why?
'hist' patch set would have been much shorter if you just used
hashtable.h or rhashtable.h or even rbtree.h (so you don't need
to do sorting in patch 13). The actual hash insert/walk code
is tiny and trivial.

I've looked through the patches again and they seem to have serious
bugs when it comes to object life time:
- patch 5 that adds clear_map is completely broken, calling
delete_all_elements() by wrapping it in rcu_read_lock() ?!
- patch 6 adds 'free_elem' callback that is called right from
htab_map_delete_elem() while we're still under rcu.. ?!
- patch 12 does
struct hist_trigger_entry *entry;
tracing_map_lookup_elem(hist_data->map, next_key, &entry);
hist_trigger_entry_print(m, hist_data, next_key, entry);
so when you're storing a pointer inside the map the rcu protection
of map_lookup protects only the value of 'entry' pointer.
The actual contents of *entry are not protected by anything,
so even if you fix 5 and 6, you'll still have severe memory corruptions,
since nothing guarantees that contents of *entry are valid when
hist_trigger_entry_print() is trying to access it.

I think you'd be better off using vanilla hashtable.h

Thanks


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2 00/15] tracing: 'hist' triggers
  2015-03-04  2:22   ` Alexei Starovoitov
@ 2015-03-04  5:01     ` Tom Zanussi
  0 siblings, 0 replies; 23+ messages in thread
From: Tom Zanussi @ 2015-03-04  5:01 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Steven Rostedt, Masami Hiramatsu, Namhyung Kim, Andi Kleen, LKML,
	Ingo Molnar, Arnaldo Carvalho de Melo

On Tue, 2015-03-03 at 18:22 -0800, Alexei Starovoitov wrote:
> On 3/3/15 7:47 AM, Tom Zanussi wrote:
> > On Mon, 2015-03-02 at 18:31 -0800, Alexei Starovoitov wrote:
> >> On Mon, Mar 2, 2015 at 5:18 PM, Tom Zanussi <tom.zanussi@linux.intel.com> wrote:
> 
> >> I'm not proposing to use asm or C for this 'hist->bpf' tool.
> >> Keep proposed 'hist:keys=...:vals=...' syntax and generate
> >> bpf program on the fly based on this string.
> >> So this user tool will take string, generate program, load
> >> and attach it. Everything based on this single string input.
> >> With the examples you mentioned in docs, it's not hard.
> >> It will be more involved than patch 12, but way more generic
> >> and easily extensible when 'hist:keys=...' string would need
> >> to be extended.
> >>
> >
> > Hmm, this seems silly to me - doing all that work just to get back to
> > where we already are.
> 
> your 'hist->bpf' tool could have been first to solve 'bpf hard to use'
> problem and overtime it could have evolved into full dtrace alternative.
> Whereas by adding 'hist:keys=..' parsing to kernel you'll be stuck
> with it and somebody else's dtrace-like tool will supersede it.
> 

I think there's some misunderstanding there - it was never my intent to
create a full dtrace alternative, really the idea was (and has been,
even before there was any such thing as ebpf in the kernel) only to
provide access to some low-hanging fruit via the standard read/write
interfaces people are used to with ftrace.  The dtrace-like tools were
what I thought ebpf was all about, where users could go after exhausting
the possibilities of a simple table-driven first approach to a problem. 

> > I don't think you can start requiring all new tracing functionality to
> > embed a code generator/compiler, or as a result force any particular
> > interface on users.
> 
> force interface on users?!
> I think I've explained enough that bpf would not be a user visible
> interface. It's kernel to user land interface. In my proposal of
> 'hist->bpf' tool users won't even see that bpf is being generated
> and run underneath. Same with dtrace-like scripting. Users will
> see scripting language and won't care how it's compiled and
> executed inside kernel. Multiple different languages and interfaces
> are possible. Including hist-like text that's not a language.
> 

What about the kernel to user land interface that users have already
been asking for, pseudo-files in debugfs/tracefs?  If you can't do that,
then you're forcing interface on users.  

>  > If it's possible to factor out a common
> > infrastructure that can accommodate different user interfaces and
> > preferences, don't you think it makes sense to do that?
> >
> > In any case, outside of the boilerplate tracing infrastructure code, it
> > seems to me that a lot of the code in the main hist trigger patches is
> > code that you'd also need for a tracepoint/bpf interface.  I think we've
> > already agreed that it would be good to work towards being able to share
> > that, so for the next version, I'll see what I can come up with.
> 
> yes. let's see what pieces can be shared. Though I don't want to
> sacrifice long term goals for short term solution.
> In case of bpf_maps, the objective is to share data between
> kernel and user space by single abstraction with different
> implementations underneath. In some use cases kernel programs only
> do lookups into maps, while user space concurrently adding/deleting
> entries (so no allocations from programs)
> In your patch 4 you're adding kernel only access to maps but planning
> to use hash type only and also not exposing these maps to user space
> at all... that defeats bpf_map purpose and you only reusing ~200
> lines of hashtab.c code, but also need to hack it for pre-allocate
> and make config_bpf_syscall a strong dependency for 'hist'. why?
> 'hist' patch set would have been much shorter if you just used
> hashtable.h or rhashtable.h or even rbtree.h (so you don't need
> to do sorting in patch 13). The actual hash insert/walk code
> is tiny and trivial.

Well, I kind of hinted in the cover letter that ultimately I thought the
tracing_map common code would move out of bpf/syscall.c, and thus break
the dependency on bpf_syscall.  I didn't want to spend time actually
doing that before knowing how the general idea would fly, and am glad
now that I didn't.

But the reason I thought of using it in the first place was that I saw a
map implementation specially designed for tracing and thought it would
make sense to reuse it if possible for what I thought was a
complementary tracing tool.  I did find it useful, and using it did
simplify my overall code, but it seems it's more trouble than it's worth
and not particularly welcome, so yeah, I probably will just look at
something else.

Thanks for your input,

Tom

> I've looked through the patches again and they seem to have serious
> bugs when it comes to object life time:
> - patch 5 that adds clear_map is completely broken, calling
> delete_all_elements() by wrapping it in rcu_read_lock() ?!
> - patch 6 adds 'free_elem' callback that is called right from
> htab_map_delete_elem() while we're still under rcu.. ?!
> - patch 12 does
> struct hist_trigger_entry *entry;
> tracing_map_lookup_elem(hist_data->map, next_key, &entry);
> hist_trigger_entry_print(m, hist_data, next_key, entry);
> so when you're storing a pointer inside the map the rcu protection
> of map_lookup protects only the value of 'entry' pointer.
> The actual contents of *entry are not protected by anything,
> so even if you fix 5 and 6, you'll still have severe memory corruptions,
> since nothing guarantees that contents of *entry are valid when
> hist_trigger_entry_print() is trying to access it.
> 
> I think you'd be better off using vanilla hashtable.h
> 
> Thanks
> 



^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2 00/15] tracing: 'hist' triggers
  2015-03-02 19:55 ` Tom Zanussi
@ 2015-03-09 11:38   ` He Kuang
  0 siblings, 0 replies; 23+ messages in thread
From: He Kuang @ 2015-03-09 11:38 UTC (permalink / raw)
  To: Tom Zanussi, Alexei Starovoitov
  Cc: Steven Rostedt, Masami Hiramatsu, Namhyung Kim, Andi Kleen, LKML,
	Ingo Molnar, Arnaldo Carvalho de Melo

hi, Tom

On 2015/3/3 3:55, Tom Zanussi wrote:
> BTW, I've actually tried to play around with the BPF samples/, but it 
> seems they're not actually hooked into the system i.e. the samples 
> Makefile doesn't build them, and it even looks for tools/llvm that's 
> not there. I got as far as getting the latest llvm from the location 
> mentioened in one of the bpf commit messages, but gave up after it 
> told me 'invalid target bpf'. And I couldn't find any documentation on 
> how to set it all up - did I just miss that? Tom 

I'd also encountered the same problem:

--enable-experimental-targets=BPF, 'BPF' letters must be uppercase. Have 
a try.
>> Thanks
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/
>
>



^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2 00/15] tracing: 'hist' triggers
  2015-03-03  2:25 ` Masami Hiramatsu
@ 2015-03-03 14:47   ` Tom Zanussi
  0 siblings, 0 replies; 23+ messages in thread
From: Tom Zanussi @ 2015-03-03 14:47 UTC (permalink / raw)
  To: Masami Hiramatsu; +Cc: rostedt, namhyung, andi, ast, linux-kernel

On Tue, 2015-03-03 at 11:25 +0900, Masami Hiramatsu wrote:
> (2015/03/03 1:00), Tom Zanussi wrote:
> > This is v2 of my previously posted 'hashtriggers' patchset [1], but
> > renamed to 'hist triggers' following feedback from v1.
> 
> This is what I need :) The trigger interface gives us better flexibility
> for environment. With this series I believe the 80% use of "scripting
> tracing" can be replaced with just "echo'ing tracing" via tracefs :)
> 

Glad you like it, thanks!

> > 
> > Since then, the kernel has gained a tracing map implementation in the
> > form of bpf_map, which this patchset makes a bit more generic, exports
> > and uses (as tracing_map_*, still in the bpf syscall file however).
> > 
> > A large part of the initial hash triggers implementation was devoted
> > to a map implementation and general-purpose hashing functions, which
> > have now been subsumed by the bpf maps.  I've completely redone the
> > trigger patches themselves to work on top of tracing_map.  The result
> > is a much simpler and easier-to-review patchset that's able to focus
> > more directly on the problem at hand.
> > 
> > The new version addresses all the comments from the previous review,
> > including changing the name from hash->hist, adding separate 'hist'
> > files for the output, and moving the examples into Documentation.
> > 
> > This patchset also includes a couple other new and related triggers,
> > enable_hist and disable_hist, very similar to the existing
> > enable_event/disable_event triggers used to automatically enable and
> > disable events based on a triggering condition, but in this case
> > allowing hist triggers to be enabled and disabled in the same way.
> > 
> > The only problem with using the bpf_map implementation for this is
> > that it uses kmalloc internally, which causes problems when trying to
> > trace kmalloc itself.  I'm guessing the ebpf tracing code would also
> > share this problem e.g. when using bpf_maps from probes on kmalloc().
> > This patchset attempts a solution to that problem (by adding a
> > gfp_flag and changing the kmem memory allocation tracepoints to
> > conditional variants) for checking for it in for but I'm not sure it's
> > the best way to address it.
> 
> That is not a solution for kprobe-based events, nor the events on
> interrupt context.
> Can we reserve some amount of memory for bpf_map? and If it is exceeded
> the reserved memory we can choose (A) disable hist or (B) continue
> to do with kmalloc.
> 

Yeah, the non-bpf_map v1 did (A) with reserved memory.  I'll take a look
at doing that again for the next version.   

> > 
> > There are a couple of important bits of functionality that were
> > present in v1 but dropped in v2 mainly because I'm still trying to
> > figure out the best way to accomplish those things using the bpf_map
> > implementation.
> > 
> > The first is support for compound keys.  Currently, maps can only be
> > keyed on a single event field, whereas in v1 they could be keyed on
> > multiple keys.  With support for compound keys, you can create much
> > more interesting output, such as for example per-pid lists of
> > syscalls or read counts e.g.:
> > 
> >   # echo 'hist:keys=common_pid.execname,id.syscall:vals=hitcount' > \
> >         /sys/kernel/debug/tracing/events/raw_syscalls/sys_enter/trigger
> > 
> >   # cat /sys/kernel/debug/tracing/events/raw_syscalls/sys_enter/hist
> > 
> >   key: common_pid:bash[3112], id:sys_write		       vals: count:69
> >   key: common_pid:bash[3112], id:sys_rt_sigprocmask	       vals: count:218
> > 
> >   key: common_pid:update-notifier[3164], id:sys_poll	       vals: count:37
> >   key: common_pid:update-notifier[3164], id:sys_recvfrom       vals: count:118
> > 
> >   key: common_pid:deja-dup-monito[3194], id:sys_sendto	       vals: count:1
> >   key: common_pid:deja-dup-monito[3194], id:sys_read	       vals: count:4
> >   key: common_pid:deja-dup-monito[3194], id:sys_poll	       vals: count:8
> >   key: common_pid:deja-dup-monito[3194], id:sys_recvmsg	       vals: count:8
> >   key: common_pid:deja-dup-monito[3194], id:sys_getegid	       vals: count:8
> > 
> >   key: common_pid:emacs[3275], id:sys_fsync		       vals: count:1
> >   key: common_pid:emacs[3275], id:sys_open		       vals: count:1
> >   key: common_pid:emacs[3275], id:sys_symlink		       vals: count:2
> >   key: common_pid:emacs[3275], id:sys_poll		       vals: count:23
> >   key: common_pid:emacs[3275], id:sys_select		       vals: count:23
> >   key: common_pid:emacs[3275], id:unknown_syscall	       vals: count:34
> >   key: common_pid:emacs[3275], id:sys_ioctl		       vals: count:60
> >   key: common_pid:emacs[3275], id:sys_rt_sigprocmask	       vals: count:116
> > 
> >   key: common_pid:cat[3323], id:sys_munmap		       vals: count:1
> >   key: common_pid:cat[3323], id:sys_fadvise64		       vals: count:1
> 
> Very impressive! :)
> 

Thanks!

Tom

> Thank you,
> 
> > 
> > Related to that is support for sorting on multiple fields.  Currently,
> > you can sort using only a primary key.  Being able to sort on multiple
> > or at least a secondary key is indispensible for seeing trends when
> > displaying multiple values.
> > 
> > [1] http://thread.gmane.org/gmane.linux.kernel/1673551
> > 
> > Changes from v1:
> >  - completely rewritten on top of tracing_map (renamed and exported bpf_map)
> >  - added map clearing and client ops to tracing_map
> >  - changed the name from 'hash' triggers to 'hist' triggers
> >  - added new trigger 'pause' feature
> >  - added new enable_hist and disable_hist triggers
> >  - added usage for hist/enable_hist/disable hist to tracing/README
> >  - moved examples into Documentation/trace/event.txt
> >  - added ___GFP_NOTRACE, kmalloc/kfree macros, and conditional kmem tracepoints
> > 
> > The following changes since commit 49058038a12cfd9044146a1bf4b286781268d5c9:
> > 
> >   ring-buffer: Do not wake up a splice waiter when page is not full (2015-02-24 14:00:41 -0600)
> > 
> > are available in the git repository at:
> > 
> >   git://git.yoctoproject.org/linux-yocto-contrib.git tzanussi/hist-triggers-v2
> >   http://git.yoctoproject.org/cgit/cgit.cgi/linux-yocto-contrib/log/?h=tzanussi/hist-triggers-v2
> > 
> > Tom Zanussi (15):
> >   tracing: Make ftrace_event_field checking functions available
> >   tracing: Add event record param to trigger_ops.func()
> >   tracing: Add get_syscall_name()
> >   bpf: Export bpf map functionality as trace_map_*
> >   bpf: Export a map-clearing function
> >   bpf: Add tracing_map client ops
> >   mm: Add ___GFP_NOTRACE
> >   tracing: Make kmem memory allocation tracepoints conditional
> >   tracing: Add kmalloc/kfree macros
> >   bpf: Make tracing_map use kmalloc/kfree_notrace()
> >   tracing: Add a per-event-trigger 'paused' field
> >   tracing: Add 'hist' event trigger command
> >   tracing: Add sorting to hist triggers
> >   tracing: Add enable_hist/disable_hist triggers
> >   tracing: Add 'hist' trigger Documentation
> > 
> >  Documentation/trace/events.txt      |  870 +++++++++++++++++++++
> >  include/linux/bpf.h                 |   15 +
> >  include/linux/ftrace_event.h        |    9 +-
> >  include/linux/gfp.h                 |    3 +-
> >  include/linux/slab.h                |   61 +-
> >  include/trace/events/kmem.h         |   28 +-
> >  kernel/bpf/arraymap.c               |   16 +
> >  kernel/bpf/hashtab.c                |   39 +-
> >  kernel/bpf/syscall.c                |  193 ++++-
> >  kernel/trace/trace.c                |   48 ++
> >  kernel/trace/trace.h                |   25 +-
> >  kernel/trace/trace_events.c         |    3 +
> >  kernel/trace/trace_events_filter.c  |   15 +-
> >  kernel/trace/trace_events_trigger.c | 1466 ++++++++++++++++++++++++++++++++++-
> >  kernel/trace/trace_syscalls.c       |   11 +
> >  mm/slab.c                           |   45 +-
> >  mm/slob.c                           |   45 +-
> >  mm/slub.c                           |   47 +-
> >  18 files changed, 2795 insertions(+), 144 deletions(-)
> > 
> 
> 



^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2 00/15] tracing: 'hist' triggers
  2015-03-02 16:00 Tom Zanussi
@ 2015-03-03  2:25 ` Masami Hiramatsu
  2015-03-03 14:47   ` Tom Zanussi
  0 siblings, 1 reply; 23+ messages in thread
From: Masami Hiramatsu @ 2015-03-03  2:25 UTC (permalink / raw)
  To: Tom Zanussi; +Cc: rostedt, namhyung, andi, ast, linux-kernel

(2015/03/03 1:00), Tom Zanussi wrote:
> This is v2 of my previously posted 'hashtriggers' patchset [1], but
> renamed to 'hist triggers' following feedback from v1.

This is what I need :) The trigger interface gives us better flexibility
for environment. With this series I believe the 80% use of "scripting
tracing" can be replaced with just "echo'ing tracing" via tracefs :)

> 
> Since then, the kernel has gained a tracing map implementation in the
> form of bpf_map, which this patchset makes a bit more generic, exports
> and uses (as tracing_map_*, still in the bpf syscall file however).
> 
> A large part of the initial hash triggers implementation was devoted
> to a map implementation and general-purpose hashing functions, which
> have now been subsumed by the bpf maps.  I've completely redone the
> trigger patches themselves to work on top of tracing_map.  The result
> is a much simpler and easier-to-review patchset that's able to focus
> more directly on the problem at hand.
> 
> The new version addresses all the comments from the previous review,
> including changing the name from hash->hist, adding separate 'hist'
> files for the output, and moving the examples into Documentation.
> 
> This patchset also includes a couple other new and related triggers,
> enable_hist and disable_hist, very similar to the existing
> enable_event/disable_event triggers used to automatically enable and
> disable events based on a triggering condition, but in this case
> allowing hist triggers to be enabled and disabled in the same way.
> 
> The only problem with using the bpf_map implementation for this is
> that it uses kmalloc internally, which causes problems when trying to
> trace kmalloc itself.  I'm guessing the ebpf tracing code would also
> share this problem e.g. when using bpf_maps from probes on kmalloc().
> This patchset attempts a solution to that problem (by adding a
> gfp_flag and changing the kmem memory allocation tracepoints to
> conditional variants) for checking for it in for but I'm not sure it's
> the best way to address it.

That is not a solution for kprobe-based events, nor the events on
interrupt context.
Can we reserve some amount of memory for bpf_map? and If it is exceeded
the reserved memory we can choose (A) disable hist or (B) continue
to do with kmalloc.

> 
> There are a couple of important bits of functionality that were
> present in v1 but dropped in v2 mainly because I'm still trying to
> figure out the best way to accomplish those things using the bpf_map
> implementation.
> 
> The first is support for compound keys.  Currently, maps can only be
> keyed on a single event field, whereas in v1 they could be keyed on
> multiple keys.  With support for compound keys, you can create much
> more interesting output, such as for example per-pid lists of
> syscalls or read counts e.g.:
> 
>   # echo 'hist:keys=common_pid.execname,id.syscall:vals=hitcount' > \
>         /sys/kernel/debug/tracing/events/raw_syscalls/sys_enter/trigger
> 
>   # cat /sys/kernel/debug/tracing/events/raw_syscalls/sys_enter/hist
> 
>   key: common_pid:bash[3112], id:sys_write		       vals: count:69
>   key: common_pid:bash[3112], id:sys_rt_sigprocmask	       vals: count:218
> 
>   key: common_pid:update-notifier[3164], id:sys_poll	       vals: count:37
>   key: common_pid:update-notifier[3164], id:sys_recvfrom       vals: count:118
> 
>   key: common_pid:deja-dup-monito[3194], id:sys_sendto	       vals: count:1
>   key: common_pid:deja-dup-monito[3194], id:sys_read	       vals: count:4
>   key: common_pid:deja-dup-monito[3194], id:sys_poll	       vals: count:8
>   key: common_pid:deja-dup-monito[3194], id:sys_recvmsg	       vals: count:8
>   key: common_pid:deja-dup-monito[3194], id:sys_getegid	       vals: count:8
> 
>   key: common_pid:emacs[3275], id:sys_fsync		       vals: count:1
>   key: common_pid:emacs[3275], id:sys_open		       vals: count:1
>   key: common_pid:emacs[3275], id:sys_symlink		       vals: count:2
>   key: common_pid:emacs[3275], id:sys_poll		       vals: count:23
>   key: common_pid:emacs[3275], id:sys_select		       vals: count:23
>   key: common_pid:emacs[3275], id:unknown_syscall	       vals: count:34
>   key: common_pid:emacs[3275], id:sys_ioctl		       vals: count:60
>   key: common_pid:emacs[3275], id:sys_rt_sigprocmask	       vals: count:116
> 
>   key: common_pid:cat[3323], id:sys_munmap		       vals: count:1
>   key: common_pid:cat[3323], id:sys_fadvise64		       vals: count:1

Very impressive! :)

Thank you,

> 
> Related to that is support for sorting on multiple fields.  Currently,
> you can sort using only a primary key.  Being able to sort on multiple
> or at least a secondary key is indispensible for seeing trends when
> displaying multiple values.
> 
> [1] http://thread.gmane.org/gmane.linux.kernel/1673551
> 
> Changes from v1:
>  - completely rewritten on top of tracing_map (renamed and exported bpf_map)
>  - added map clearing and client ops to tracing_map
>  - changed the name from 'hash' triggers to 'hist' triggers
>  - added new trigger 'pause' feature
>  - added new enable_hist and disable_hist triggers
>  - added usage for hist/enable_hist/disable hist to tracing/README
>  - moved examples into Documentation/trace/event.txt
>  - added ___GFP_NOTRACE, kmalloc/kfree macros, and conditional kmem tracepoints
> 
> The following changes since commit 49058038a12cfd9044146a1bf4b286781268d5c9:
> 
>   ring-buffer: Do not wake up a splice waiter when page is not full (2015-02-24 14:00:41 -0600)
> 
> are available in the git repository at:
> 
>   git://git.yoctoproject.org/linux-yocto-contrib.git tzanussi/hist-triggers-v2
>   http://git.yoctoproject.org/cgit/cgit.cgi/linux-yocto-contrib/log/?h=tzanussi/hist-triggers-v2
> 
> Tom Zanussi (15):
>   tracing: Make ftrace_event_field checking functions available
>   tracing: Add event record param to trigger_ops.func()
>   tracing: Add get_syscall_name()
>   bpf: Export bpf map functionality as trace_map_*
>   bpf: Export a map-clearing function
>   bpf: Add tracing_map client ops
>   mm: Add ___GFP_NOTRACE
>   tracing: Make kmem memory allocation tracepoints conditional
>   tracing: Add kmalloc/kfree macros
>   bpf: Make tracing_map use kmalloc/kfree_notrace()
>   tracing: Add a per-event-trigger 'paused' field
>   tracing: Add 'hist' event trigger command
>   tracing: Add sorting to hist triggers
>   tracing: Add enable_hist/disable_hist triggers
>   tracing: Add 'hist' trigger Documentation
> 
>  Documentation/trace/events.txt      |  870 +++++++++++++++++++++
>  include/linux/bpf.h                 |   15 +
>  include/linux/ftrace_event.h        |    9 +-
>  include/linux/gfp.h                 |    3 +-
>  include/linux/slab.h                |   61 +-
>  include/trace/events/kmem.h         |   28 +-
>  kernel/bpf/arraymap.c               |   16 +
>  kernel/bpf/hashtab.c                |   39 +-
>  kernel/bpf/syscall.c                |  193 ++++-
>  kernel/trace/trace.c                |   48 ++
>  kernel/trace/trace.h                |   25 +-
>  kernel/trace/trace_events.c         |    3 +
>  kernel/trace/trace_events_filter.c  |   15 +-
>  kernel/trace/trace_events_trigger.c | 1466 ++++++++++++++++++++++++++++++++++-
>  kernel/trace/trace_syscalls.c       |   11 +
>  mm/slab.c                           |   45 +-
>  mm/slob.c                           |   45 +-
>  mm/slub.c                           |   47 +-
>  18 files changed, 2795 insertions(+), 144 deletions(-)
> 


-- 
Masami HIRAMATSU
Software Platform Research Dept. Linux Technology Research Center
Hitachi, Ltd., Yokohama Research Laboratory
E-mail: masami.hiramatsu.pt@hitachi.com



^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2 00/15] tracing: 'hist' triggers
  2015-03-03  0:01 Alexei Starovoitov
@ 2015-03-03  1:18 ` Tom Zanussi
  0 siblings, 0 replies; 23+ messages in thread
From: Tom Zanussi @ 2015-03-03  1:18 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Steven Rostedt, Masami Hiramatsu, Namhyung Kim, Andi Kleen, LKML,
	Ingo Molnar, Arnaldo Carvalho de Melo

On Mon, 2015-03-02 at 16:01 -0800, Alexei Starovoitov wrote:
> On Mon, Mar 2, 2015 at 11:55 AM, Tom Zanussi
> <tom.zanussi@linux.intel.com> wrote:
> >
> > I disagree that it would be rarely used.  In fact, it would probably
> > cover about 80% of the use cases that people initially use things like
> > systemtap or dtrace for, which I guess is what ebpf is shooting for.
> 
> 'hist' style won't solve any of the use cases I'm targeting with bpf.
> So, imo, 'hist' being 80% of dtrace is far from reality...
> but let's agree to disagree. it's not that important.
> I'm not saying don't do 'hist' at all.
> I'm only suggesting to do it differently.
> 
> > I'm also looking at systems that have very little memory and 8Mb of
> > storage to work with, so streaming it all to userspace and
> > post-processing won't really work on those systems.
> 
> I'm not suggesting to post-process. Quite the opposite.
> Let programs do ++ in the kernel, since that's what
> your patch 12 is doing, but in a hard coded way.
> 
> > With some thought, though, I think the ebpf system/interpreter could be
> > made smart enough to recognize the simple patterns represented by the
> > hist triggers, and reuse them internally.  So ftrace users get their
> > command-line version and it's also something ebpf can reuse.
> 
> I'm saying keep the command line version of hist, but let
> user space process it.
> I don't buy the argument that you must run it in busybox
> without any extra tools.
> If you're in busybox, the system is likely idle, so nothing
> to trace/analyze. If you have some user space apps,
> then it equally easy to add 'hist->bpf' tool.
> 

How about systems that run a single statically linked process with no
shell (but a service that can read and write files like/event/trigger
and event/hist)?  We'd still like to be able to trace those systems.

> >> to embedded argument. Why add this to kernel
> >> when bpf programs can do the same on demand?
> >
> > Because this demonstrates that all those things can be done without
> > introducing an interpreter into the mix, so why bother with the
> > interpreter?
> 
> because interpreter is done once for all use cases,
> whereas custom 'hist' code is doing only one thing for one use case.
> 

I agree that the hist functionality is a subset of what can be done with
a full-blown interpreter, but it's not doing just one thing for one use
case - it covers a whole set of use cases.

> >> the kernel ABI exposure is much smaller.
> >> So would you consider working together on adding
> >> clean bpf+tracepoints infra and corresponding
> >> user space bits?
> >> We can have small user space parser/compiler for
> >> 'hist:keys=common_pid.execname,id.syscall:vals=hitcount'
> >> strings that will convert it into bpf program and you'll
> >> be able to use it in embedded setups ?
> >
> > Yeah, wouldn't be averse to working together to create a clean bpf
> > +tracepoints infrastructure - I think creating a reusable component like
> > this would be a good first step.
> 
> great.
> From the program you can emit the same text format
> as in your 'cat hist' example.
> But it will not be a part of stable kernel ABI, which I think is
> one of the main advantages to do such printing from programs
> instead of kernel C code.
> If you decide to extend what is being printed, you can
> tweak 'hist->bpf' tool and print something else.
> No one will complain, whereas when you would want
> to extend the format of 'hist' file printed by kernel, you'd
> need to consider all user tools that are parsing it.
> Like we saw in systrace example...
> 
> > BTW, I've actually tried to play around with the BPF samples/, but it
> > seems they're not actually hooked into the system i.e. the samples
> > Makefile doesn't build them, and it even looks for tools/llvm that's not
> > there.  I got as far as getting the latest llvm from the location
> > mentioened in one of the bpf commit messages, but gave up after it told
> > me 'invalid target bpf'.  And I couldn't find any documentation on how
> > to set it all up - did I just miss that?
> 
> the comment next to 'tool/llvm' says 'point to your llvm' :)
> so yes, to build C examples one need to install latest llvm trunk.
> If you're saying that existing bpf stuff is hard to use, then yes.

Well, I'd say writing BPF 'assembly' to do anything isn't something more
than a few users in the world would even consider, so that's completely
out. Which means the only practical way to use it is via the C
interface.  But getting that set up properly doesn't seem
straightforward either - it isn't something the Makefile will help with,
and there's no documentation on how one might do it.

So I tweaked the Makefile to get samples/bpf in the build (I mean the
directory is there under samples/, so why do I need to add it to the
Makefile myself?) and tried building which failed until I tweaked
something else to get it to find the right headers, etc.  Finally I got
it building the userspace stuff but then found out I needed my own llvm
to get the kernel modules built, so searched and found your llvm tree
which I thought would configure the bpf backend automatically, but
apparently not, since it then failed with llc: invalid target 'bpf'
which is where I gave up.  Do I need to configure with --target=bpf or
something like that?  I don't know, and know nothing about llvm, so am
kind of stuck.

I really do want to try doing something with it, and I understand that
you're working on improving the user experience, but at this point it
seems users have to jump through a lot of hoops just to get a minimally
working setup.  Even a small paragraph with some basic instructions
would help.  Or maybe it's just me, and it works for everyone else out
of the box.

Tom

> I completely agree. It is hard to use. We're working on it.
> The user bits can be improved gradually unlike kernel/user
> boundary. Once you set it to be 'hist' file format it will stay
> forever.



^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2 00/15] tracing: 'hist' triggers
@ 2015-03-03  0:01 Alexei Starovoitov
  2015-03-03  1:18 ` Tom Zanussi
  0 siblings, 1 reply; 23+ messages in thread
From: Alexei Starovoitov @ 2015-03-03  0:01 UTC (permalink / raw)
  To: Tom Zanussi
  Cc: Steven Rostedt, Masami Hiramatsu, Namhyung Kim, Andi Kleen, LKML,
	Ingo Molnar, Arnaldo Carvalho de Melo

On Mon, Mar 2, 2015 at 11:55 AM, Tom Zanussi
<tom.zanussi@linux.intel.com> wrote:
>
> I disagree that it would be rarely used.  In fact, it would probably
> cover about 80% of the use cases that people initially use things like
> systemtap or dtrace for, which I guess is what ebpf is shooting for.

'hist' style won't solve any of the use cases I'm targeting with bpf.
So, imo, 'hist' being 80% of dtrace is far from reality...
but let's agree to disagree. it's not that important.
I'm not saying don't do 'hist' at all.
I'm only suggesting to do it differently.

> I'm also looking at systems that have very little memory and 8Mb of
> storage to work with, so streaming it all to userspace and
> post-processing won't really work on those systems.

I'm not suggesting to post-process. Quite the opposite.
Let programs do ++ in the kernel, since that's what
your patch 12 is doing, but in a hard coded way.

> With some thought, though, I think the ebpf system/interpreter could be
> made smart enough to recognize the simple patterns represented by the
> hist triggers, and reuse them internally.  So ftrace users get their
> command-line version and it's also something ebpf can reuse.

I'm saying keep the command line version of hist, but let
user space process it.
I don't buy the argument that you must run it in busybox
without any extra tools.
If you're in busybox, the system is likely idle, so nothing
to trace/analyze. If you have some user space apps,
then it equally easy to add 'hist->bpf' tool.

>> to embedded argument. Why add this to kernel
>> when bpf programs can do the same on demand?
>
> Because this demonstrates that all those things can be done without
> introducing an interpreter into the mix, so why bother with the
> interpreter?

because interpreter is done once for all use cases,
whereas custom 'hist' code is doing only one thing for one use case.

>> the kernel ABI exposure is much smaller.
>> So would you consider working together on adding
>> clean bpf+tracepoints infra and corresponding
>> user space bits?
>> We can have small user space parser/compiler for
>> 'hist:keys=common_pid.execname,id.syscall:vals=hitcount'
>> strings that will convert it into bpf program and you'll
>> be able to use it in embedded setups ?
>
> Yeah, wouldn't be averse to working together to create a clean bpf
> +tracepoints infrastructure - I think creating a reusable component like
> this would be a good first step.

great.
>From the program you can emit the same text format
as in your 'cat hist' example.
But it will not be a part of stable kernel ABI, which I think is
one of the main advantages to do such printing from programs
instead of kernel C code.
If you decide to extend what is being printed, you can
tweak 'hist->bpf' tool and print something else.
No one will complain, whereas when you would want
to extend the format of 'hist' file printed by kernel, you'd
need to consider all user tools that are parsing it.
Like we saw in systrace example...

> BTW, I've actually tried to play around with the BPF samples/, but it
> seems they're not actually hooked into the system i.e. the samples
> Makefile doesn't build them, and it even looks for tools/llvm that's not
> there.  I got as far as getting the latest llvm from the location
> mentioened in one of the bpf commit messages, but gave up after it told
> me 'invalid target bpf'.  And I couldn't find any documentation on how
> to set it all up - did I just miss that?

the comment next to 'tool/llvm' says 'point to your llvm' :)
so yes, to build C examples one need to install latest llvm trunk.
If you're saying that existing bpf stuff is hard to use, then yes.
I completely agree. It is hard to use. We're working on it.
The user bits can be improved gradually unlike kernel/user
boundary. Once you set it to be 'hist' file format it will stay
forever.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2 00/15] tracing: 'hist' triggers
  2015-03-02 20:48           ` Alexei Starovoitov
  2015-03-02 20:52             ` Karim Yaghmour
@ 2015-03-02 20:54             ` Karim Yaghmour
  1 sibling, 0 replies; 23+ messages in thread
From: Karim Yaghmour @ 2015-03-02 20:54 UTC (permalink / raw)
  To: Alexei Starovoitov, Steven Rostedt
  Cc: Tom Zanussi, Masami Hiramatsu, Namhyung Kim, Andi Kleen, LKML,
	Ingo Molnar, Arnaldo Carvalho de Melo


On 15-03-02 03:48 PM, Alexei Starovoitov wrote:
> good. thanks for explaining.
> all makes sense now.
> 
> btw, that fancy systrace seems to be parsing text from trace_pipe
> https://android.googlesource.com/platform/external/chromium-trace/+/jb-dev/src/tracing/linux_perf_importer.js
> with a bunch of regex...
> including sched_switch: next_prio...

Oh, and you can't use Firefox to view the traces it generates. You can
only use Chrome ... even though the documentation says: "... you can
open the report using a web browser." Which I guess means that "browser
= chrome" at Google.

-- 
Karim Yaghmour
CEO - Opersys inc. / www.opersys.com
http://twitter.com/karimyaghmour


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2 00/15] tracing: 'hist' triggers
  2015-03-02 20:48           ` Alexei Starovoitov
@ 2015-03-02 20:52             ` Karim Yaghmour
  2015-03-02 20:54             ` Karim Yaghmour
  1 sibling, 0 replies; 23+ messages in thread
From: Karim Yaghmour @ 2015-03-02 20:52 UTC (permalink / raw)
  To: Alexei Starovoitov, Steven Rostedt
  Cc: Tom Zanussi, Masami Hiramatsu, Namhyung Kim, Andi Kleen, LKML,
	Ingo Molnar, Arnaldo Carvalho de Melo


On 15-03-02 03:48 PM, Alexei Starovoitov wrote:
> good. thanks for explaining.
> all makes sense now.
> 
> btw, that fancy systrace seems to be parsing text from trace_pipe
> https://android.googlesource.com/platform/external/chromium-trace/+/jb-dev/src/tracing/linux_perf_importer.js
> with a bunch of regex...
> including sched_switch: next_prio...

Yes, it does. This is why it's not meant for analyzing large traces.

-- 
Karim Yaghmour
CEO - Opersys inc. / www.opersys.com
http://twitter.com/karimyaghmour


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2 00/15] tracing: 'hist' triggers
  2015-03-02 20:43         ` Steven Rostedt
@ 2015-03-02 20:48           ` Alexei Starovoitov
  2015-03-02 20:52             ` Karim Yaghmour
  2015-03-02 20:54             ` Karim Yaghmour
  0 siblings, 2 replies; 23+ messages in thread
From: Alexei Starovoitov @ 2015-03-02 20:48 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Karim Yaghmour, Tom Zanussi, Masami Hiramatsu, Namhyung Kim,
	Andi Kleen, LKML, Ingo Molnar, Arnaldo Carvalho de Melo

On Mon, Mar 2, 2015 at 12:43 PM, Steven Rostedt <rostedt@goodmis.org> wrote:
> On Mon, 2 Mar 2015 15:39:48 -0500
> Steven Rostedt <rostedt@goodmis.org> wrote:
>
>> On Mon, 2 Mar 2015 12:33:34 -0800
>> Alexei Starovoitov <ast@plumgrid.com> wrote:
>>
>> > On Mon, Mar 2, 2015 at 11:49 AM, Karim Yaghmour
>> > <karim.yaghmour@opersys.com> wrote:
>> > >
>> > > On 15-03-02 02:45 PM, Steven Rostedt wrote:
>> > >> Interesting. The Android devices I have still have it enabled (rooted,
>> > >> but still running the stock system).
>> > >
>> > > I don't know that there's any policy to disable tracing on Android. The
>> > > Android framework in fact has generally been instrumented by Google
>> > > itself to output trace info into trace_marker. And the systrace/atrace
>> > > tools made available to app developers need to get access to this
>> > > tracing info. So, if Android had tracing disabled, systrace/atrace
>> > > wouldn't work.
>> > > https://developer.android.com/tools/debugging/systrace.html
>> >
>> > that's interesting. thanks for the link.
>> >
>> > I don't see tracing being explicitly enabled in defconfig:
>> > https://source.android.com/devices/tech/kernel.html
>>
>> CONFIG_SCHED_TRACER=y
>>
>> That alone will enable tracing.
>>
>> -- Steve
>>
>> > or here:
>> > https://android.googlesource.com/kernel/common/+/android-3.10/android/configs/android-recommended.cfg
>>
>
> CONFIG_ENABLE_DEFAULT_TRACERS=y
>
> And so will that.

good. thanks for explaining.
all makes sense now.

btw, that fancy systrace seems to be parsing text from trace_pipe
https://android.googlesource.com/platform/external/chromium-trace/+/jb-dev/src/tracing/linux_perf_importer.js
with a bunch of regex...
including sched_switch: next_prio...

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2 00/15] tracing: 'hist' triggers
  2015-03-02 20:39       ` Steven Rostedt
@ 2015-03-02 20:43         ` Steven Rostedt
  2015-03-02 20:48           ` Alexei Starovoitov
  0 siblings, 1 reply; 23+ messages in thread
From: Steven Rostedt @ 2015-03-02 20:43 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Karim Yaghmour, Tom Zanussi, Masami Hiramatsu, Namhyung Kim,
	Andi Kleen, LKML, Ingo Molnar, Arnaldo Carvalho de Melo

On Mon, 2 Mar 2015 15:39:48 -0500
Steven Rostedt <rostedt@goodmis.org> wrote:

> On Mon, 2 Mar 2015 12:33:34 -0800
> Alexei Starovoitov <ast@plumgrid.com> wrote:
> 
> > On Mon, Mar 2, 2015 at 11:49 AM, Karim Yaghmour
> > <karim.yaghmour@opersys.com> wrote:
> > >
> > > On 15-03-02 02:45 PM, Steven Rostedt wrote:
> > >> Interesting. The Android devices I have still have it enabled (rooted,
> > >> but still running the stock system).
> > >
> > > I don't know that there's any policy to disable tracing on Android. The
> > > Android framework in fact has generally been instrumented by Google
> > > itself to output trace info into trace_marker. And the systrace/atrace
> > > tools made available to app developers need to get access to this
> > > tracing info. So, if Android had tracing disabled, systrace/atrace
> > > wouldn't work.
> > > https://developer.android.com/tools/debugging/systrace.html
> > 
> > that's interesting. thanks for the link.
> > 
> > I don't see tracing being explicitly enabled in defconfig:
> > https://source.android.com/devices/tech/kernel.html
> 
> CONFIG_SCHED_TRACER=y
> 
> That alone will enable tracing.
> 
> -- Steve
> 
> > or here:
> > https://android.googlesource.com/kernel/common/+/android-3.10/android/configs/android-recommended.cfg
> 

CONFIG_ENABLE_DEFAULT_TRACERS=y

And so will that.

-- Steve


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2 00/15] tracing: 'hist' triggers
  2015-03-02 20:33     ` Alexei Starovoitov
  2015-03-02 20:37       ` Karim Yaghmour
@ 2015-03-02 20:39       ` Steven Rostedt
  2015-03-02 20:43         ` Steven Rostedt
  1 sibling, 1 reply; 23+ messages in thread
From: Steven Rostedt @ 2015-03-02 20:39 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Karim Yaghmour, Tom Zanussi, Masami Hiramatsu, Namhyung Kim,
	Andi Kleen, LKML, Ingo Molnar, Arnaldo Carvalho de Melo

On Mon, 2 Mar 2015 12:33:34 -0800
Alexei Starovoitov <ast@plumgrid.com> wrote:

> On Mon, Mar 2, 2015 at 11:49 AM, Karim Yaghmour
> <karim.yaghmour@opersys.com> wrote:
> >
> > On 15-03-02 02:45 PM, Steven Rostedt wrote:
> >> Interesting. The Android devices I have still have it enabled (rooted,
> >> but still running the stock system).
> >
> > I don't know that there's any policy to disable tracing on Android. The
> > Android framework in fact has generally been instrumented by Google
> > itself to output trace info into trace_marker. And the systrace/atrace
> > tools made available to app developers need to get access to this
> > tracing info. So, if Android had tracing disabled, systrace/atrace
> > wouldn't work.
> > https://developer.android.com/tools/debugging/systrace.html
> 
> that's interesting. thanks for the link.
> 
> I don't see tracing being explicitly enabled in defconfig:
> https://source.android.com/devices/tech/kernel.html

CONFIG_SCHED_TRACER=y

That alone will enable tracing.

-- Steve

> or here:
> https://android.googlesource.com/kernel/common/+/android-3.10/android/configs/android-recommended.cfg


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2 00/15] tracing: 'hist' triggers
  2015-03-02 20:33     ` Alexei Starovoitov
@ 2015-03-02 20:37       ` Karim Yaghmour
  2015-03-02 20:39       ` Steven Rostedt
  1 sibling, 0 replies; 23+ messages in thread
From: Karim Yaghmour @ 2015-03-02 20:37 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Steven Rostedt, Tom Zanussi, Masami Hiramatsu, Namhyung Kim,
	Andi Kleen, LKML, Ingo Molnar, Arnaldo Carvalho de Melo


On 15-03-02 03:33 PM, Alexei Starovoitov wrote:
> that's interesting. thanks for the link.
> 
> I don't see tracing being explicitly enabled in defconfig:
> https://source.android.com/devices/tech/kernel.html
> or here:
> https://android.googlesource.com/kernel/common/+/android-3.10/android/configs/android-recommended.cfg

I don't know that either of these is "authoritative". I know of both of
these, but I've never looked at them as being the reference for what
manufacturers ship. Instead, most manufacturers get their default
kernels from SoC vendors. So it's much likelier that an Androidized
kernel tree from Qualcomm or Intel is closer to what gets really shipped
than the two links above.

-- 
Karim Yaghmour
CEO - Opersys inc. / www.opersys.com
http://twitter.com/karimyaghmour


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2 00/15] tracing: 'hist' triggers
  2015-03-02 19:49   ` Karim Yaghmour
@ 2015-03-02 20:33     ` Alexei Starovoitov
  2015-03-02 20:37       ` Karim Yaghmour
  2015-03-02 20:39       ` Steven Rostedt
  0 siblings, 2 replies; 23+ messages in thread
From: Alexei Starovoitov @ 2015-03-02 20:33 UTC (permalink / raw)
  To: Karim Yaghmour
  Cc: Steven Rostedt, Tom Zanussi, Masami Hiramatsu, Namhyung Kim,
	Andi Kleen, LKML, Ingo Molnar, Arnaldo Carvalho de Melo

On Mon, Mar 2, 2015 at 11:49 AM, Karim Yaghmour
<karim.yaghmour@opersys.com> wrote:
>
> On 15-03-02 02:45 PM, Steven Rostedt wrote:
>> Interesting. The Android devices I have still have it enabled (rooted,
>> but still running the stock system).
>
> I don't know that there's any policy to disable tracing on Android. The
> Android framework in fact has generally been instrumented by Google
> itself to output trace info into trace_marker. And the systrace/atrace
> tools made available to app developers need to get access to this
> tracing info. So, if Android had tracing disabled, systrace/atrace
> wouldn't work.
> https://developer.android.com/tools/debugging/systrace.html

that's interesting. thanks for the link.

I don't see tracing being explicitly enabled in defconfig:
https://source.android.com/devices/tech/kernel.html
or here:
https://android.googlesource.com/kernel/common/+/android-3.10/android/configs/android-recommended.cfg

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2 00/15] tracing: 'hist' triggers
  2015-03-02 19:14 Alexei Starovoitov
  2015-03-02 19:31 ` Steven Rostedt
@ 2015-03-02 19:55 ` Tom Zanussi
  2015-03-09 11:38   ` He Kuang
  1 sibling, 1 reply; 23+ messages in thread
From: Tom Zanussi @ 2015-03-02 19:55 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Steven Rostedt, Masami Hiramatsu, Namhyung Kim, Andi Kleen, LKML,
	Ingo Molnar, Arnaldo Carvalho de Melo

Hi Alexei,

On Mon, 2015-03-02 at 11:14 -0800, Alexei Starovoitov wrote:
> On Mon, Mar 2, 2015 at 8:00 AM, Tom Zanussi <tom.zanussi@linux.intel.com> wrote:
> >
> >   # echo 'hist:keys=common_pid.execname,id.syscall:vals=hitcount' > \
> >         /sys/kernel/debug/tracing/events/raw_syscalls/sys_enter/trigger
> >
> >   # cat /sys/kernel/debug/tracing/events/raw_syscalls/sys_enter/hist
> >
> >   key: common_pid:bash[3112], id:sys_write                     vals: count:69
> >   key: common_pid:bash[3112], id:sys_rt_sigprocmask            vals: count:218
> 
> Hi Tom,
> 
> I think we both want to see in-kernel aggregation.
> This 'hist' stuff is trying to do counting and even map sorting
> in the kernel, whereas with bpf programs I'm moving
> all of these decisions to user space.
> I understand your desire to avoid any user level scripts
> and do everything via 'cat' and debugfs, but imo that's
> very limiting. I think it's better to do slim user space

It's consistent with the whole host of other ftrace tools, so I don't
know why that model would be any more limiting for this.

> scripting language that can translate to bpf even in
> embedded setups. Then users will be able to aggregate
> whatever they like, whereas with 'hist' approach
> they're limited to simple counters.
> trace_events_trigger.c - 1466 lines - that's quite a bit
> of code that will be rarely used. Kinda goes counter

I disagree that it would be rarely used.  In fact, it would probably
cover about 80% of the use cases that people initially use things like
systemtap or dtrace for, which I guess is what ebpf is shooting for.

I'm also looking at systems that have very little memory and 8Mb of
storage to work with, so streaming it all to userspace and
post-processing won't really work on those systems.

With some thought, though, I think the ebpf system/interpreter could be
made smart enough to recognize the simple patterns represented by the
hist triggers, and reuse them internally.  So ftrace users get their
command-line version and it's also something ebpf can reuse.

> to embedded argument. Why add this to kernel
> when bpf programs can do the same on demand?

Because this demonstrates that all those things can be done without
introducing an interpreter into the mix, so why bother with the
interpreter?

> Also the arguments about stable ABI apply as well.
> The format of 'hist' file would need to be stable, so will
> be hard to extend it. With bpf programs doing aggregation

Well, the format is very regular - keys and values and summary lines,
not much to break there.

> the kernel ABI exposure is much smaller.
> So would you consider working together on adding
> clean bpf+tracepoints infra and corresponding
> user space bits?
> We can have small user space parser/compiler for
> 'hist:keys=common_pid.execname,id.syscall:vals=hitcount'
> strings that will convert it into bpf program and you'll
> be able to use it in embedded setups ?

Yeah, wouldn't be averse to working together to create a clean bpf
+tracepoints infrastructure - I think creating a reusable component like
this would be a good first step.

BTW, I've actually tried to play around with the BPF samples/, but it
seems they're not actually hooked into the system i.e. the samples
Makefile doesn't build them, and it even looks for tools/llvm that's not
there.  I got as far as getting the latest llvm from the location
mentioened in one of the bpf commit messages, but gave up after it told
me 'invalid target bpf'.  And I couldn't find any documentation on how
to set it all up - did I just miss that?

Tom

> 
> Thanks



^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2 00/15] tracing: 'hist' triggers
  2015-03-02 19:45 ` Steven Rostedt
@ 2015-03-02 19:49   ` Karim Yaghmour
  2015-03-02 20:33     ` Alexei Starovoitov
  0 siblings, 1 reply; 23+ messages in thread
From: Karim Yaghmour @ 2015-03-02 19:49 UTC (permalink / raw)
  To: Steven Rostedt, Alexei Starovoitov
  Cc: Tom Zanussi, Masami Hiramatsu, Namhyung Kim, Andi Kleen, LKML,
	Ingo Molnar, Arnaldo Carvalho de Melo


On 15-03-02 02:45 PM, Steven Rostedt wrote:
> Interesting. The Android devices I have still have it enabled (rooted,
> but still running the stock system).

I don't know that there's any policy to disable tracing on Android. The
Android framework in fact has generally been instrumented by Google
itself to output trace info into trace_marker. And the systrace/atrace
tools made available to app developers need to get access to this
tracing info. So, if Android had tracing disabled, systrace/atrace
wouldn't work.
https://developer.android.com/tools/debugging/systrace.html

-- 
Karim Yaghmour
CEO - Opersys inc. / www.opersys.com
http://twitter.com/karimyaghmour


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2 00/15] tracing: 'hist' triggers
  2015-03-02 19:36 Alexei Starovoitov
@ 2015-03-02 19:45 ` Steven Rostedt
  2015-03-02 19:49   ` Karim Yaghmour
  0 siblings, 1 reply; 23+ messages in thread
From: Steven Rostedt @ 2015-03-02 19:45 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Tom Zanussi, Masami Hiramatsu, Namhyung Kim, Andi Kleen, LKML,
	Ingo Molnar, Arnaldo Carvalho de Melo, Karim Yaghmour

On Mon, 2 Mar 2015 11:36:36 -0800
Alexei Starovoitov <ast@plumgrid.com> wrote:

> > Make sure it also works on android.
> 
> the sad part is that tracing is off on android...
> so all of tracefs is actually compiled out there...
> and probably will be if we keep adding rarely used code to it.

Interesting. The Android devices I have still have it enabled (rooted,
but still running the stock system).

-- Steve


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2 00/15] tracing: 'hist' triggers
@ 2015-03-02 19:36 Alexei Starovoitov
  2015-03-02 19:45 ` Steven Rostedt
  0 siblings, 1 reply; 23+ messages in thread
From: Alexei Starovoitov @ 2015-03-02 19:36 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Tom Zanussi, Masami Hiramatsu, Namhyung Kim, Andi Kleen, LKML,
	Ingo Molnar, Arnaldo Carvalho de Melo

On Mon, Mar 2, 2015 at 11:31 AM, Steven Rostedt <rostedt@goodmis.org> wrote:
> On Mon, 2 Mar 2015 11:14:54 -0800
> Alexei Starovoitov <ast@plumgrid.com> wrote:
>
>> I think we both want to see in-kernel aggregation.
>> This 'hist' stuff is trying to do counting and even map sorting
>> in the kernel, whereas with bpf programs I'm moving
>> all of these decisions to user space.
>> I understand your desire to avoid any user level scripts
>> and do everything via 'cat' and debugfs, but imo that's
>> very limiting. I think it's better to do slim user space
>> scripting language that can translate to bpf even in
>> embedded setups. Then users will be able to aggregate
>> whatever they like, whereas with 'hist' approach
>> they're limited to simple counters.
>> trace_events_trigger.c - 1466 lines - that's quite a bit
>> of code that will be rarely used. Kinda goes counter
>> to embedded argument. Why add this to kernel
>> when bpf programs can do the same on demand?
>
> At Collab, a lot of people came to me telling me they love the debugfs
> system. Because it's everywhere they go. You remove that, you remove
> most users (including myself).
>
>
>> Also the arguments about stable ABI apply as well.
>> The format of 'hist' file would need to be stable, so will
>> be hard to extend it. With bpf programs doing aggregation
>> the kernel ABI exposure is much smaller.
>
> I disagree with this statement.
>
>> So would you consider working together on adding
>> clean bpf+tracepoints infra and corresponding
>> user space bits?
>> We can have small user space parser/compiler for
>> 'hist:keys=common_pid.execname,id.syscall:vals=hitcount'
>> strings that will convert it into bpf program and you'll
>> be able to use it in embedded setups ?
>
> Make sure it also works on android.

the sad part is that tracing is off on android...
so all of tracefs is actually compiled out there...
and probably will be if we keep adding rarely used code to it.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2 00/15] tracing: 'hist' triggers
  2015-03-02 19:14 Alexei Starovoitov
@ 2015-03-02 19:31 ` Steven Rostedt
  2015-03-02 19:55 ` Tom Zanussi
  1 sibling, 0 replies; 23+ messages in thread
From: Steven Rostedt @ 2015-03-02 19:31 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Tom Zanussi, Masami Hiramatsu, Namhyung Kim, Andi Kleen, LKML,
	Ingo Molnar, Arnaldo Carvalho de Melo

On Mon, 2 Mar 2015 11:14:54 -0800
Alexei Starovoitov <ast@plumgrid.com> wrote:
 
> I think we both want to see in-kernel aggregation.
> This 'hist' stuff is trying to do counting and even map sorting
> in the kernel, whereas with bpf programs I'm moving
> all of these decisions to user space.
> I understand your desire to avoid any user level scripts
> and do everything via 'cat' and debugfs, but imo that's
> very limiting. I think it's better to do slim user space
> scripting language that can translate to bpf even in
> embedded setups. Then users will be able to aggregate
> whatever they like, whereas with 'hist' approach
> they're limited to simple counters.
> trace_events_trigger.c - 1466 lines - that's quite a bit
> of code that will be rarely used. Kinda goes counter
> to embedded argument. Why add this to kernel
> when bpf programs can do the same on demand?

At Collab, a lot of people came to me telling me they love the debugfs
system. Because it's everywhere they go. You remove that, you remove
most users (including myself).


> Also the arguments about stable ABI apply as well.
> The format of 'hist' file would need to be stable, so will
> be hard to extend it. With bpf programs doing aggregation
> the kernel ABI exposure is much smaller.

I disagree with this statement.

> So would you consider working together on adding
> clean bpf+tracepoints infra and corresponding
> user space bits?
> We can have small user space parser/compiler for
> 'hist:keys=common_pid.execname,id.syscall:vals=hitcount'
> strings that will convert it into bpf program and you'll
> be able to use it in embedded setups ?

Make sure it also works on android.

-- Steve

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2 00/15] tracing: 'hist' triggers
@ 2015-03-02 19:14 Alexei Starovoitov
  2015-03-02 19:31 ` Steven Rostedt
  2015-03-02 19:55 ` Tom Zanussi
  0 siblings, 2 replies; 23+ messages in thread
From: Alexei Starovoitov @ 2015-03-02 19:14 UTC (permalink / raw)
  To: Tom Zanussi
  Cc: Steven Rostedt, Masami Hiramatsu, Namhyung Kim, Andi Kleen, LKML,
	Ingo Molnar, Arnaldo Carvalho de Melo

On Mon, Mar 2, 2015 at 8:00 AM, Tom Zanussi <tom.zanussi@linux.intel.com> wrote:
>
>   # echo 'hist:keys=common_pid.execname,id.syscall:vals=hitcount' > \
>         /sys/kernel/debug/tracing/events/raw_syscalls/sys_enter/trigger
>
>   # cat /sys/kernel/debug/tracing/events/raw_syscalls/sys_enter/hist
>
>   key: common_pid:bash[3112], id:sys_write                     vals: count:69
>   key: common_pid:bash[3112], id:sys_rt_sigprocmask            vals: count:218

Hi Tom,

I think we both want to see in-kernel aggregation.
This 'hist' stuff is trying to do counting and even map sorting
in the kernel, whereas with bpf programs I'm moving
all of these decisions to user space.
I understand your desire to avoid any user level scripts
and do everything via 'cat' and debugfs, but imo that's
very limiting. I think it's better to do slim user space
scripting language that can translate to bpf even in
embedded setups. Then users will be able to aggregate
whatever they like, whereas with 'hist' approach
they're limited to simple counters.
trace_events_trigger.c - 1466 lines - that's quite a bit
of code that will be rarely used. Kinda goes counter
to embedded argument. Why add this to kernel
when bpf programs can do the same on demand?
Also the arguments about stable ABI apply as well.
The format of 'hist' file would need to be stable, so will
be hard to extend it. With bpf programs doing aggregation
the kernel ABI exposure is much smaller.
So would you consider working together on adding
clean bpf+tracepoints infra and corresponding
user space bits?
We can have small user space parser/compiler for
'hist:keys=common_pid.execname,id.syscall:vals=hitcount'
strings that will convert it into bpf program and you'll
be able to use it in embedded setups ?

Thanks

^ permalink raw reply	[flat|nested] 23+ messages in thread

* [PATCH v2 00/15] tracing: 'hist' triggers
@ 2015-03-02 16:00 Tom Zanussi
  2015-03-03  2:25 ` Masami Hiramatsu
  0 siblings, 1 reply; 23+ messages in thread
From: Tom Zanussi @ 2015-03-02 16:00 UTC (permalink / raw)
  To: rostedt
  Cc: masami.hiramatsu.pt, namhyung, andi, ast, linux-kernel, Tom Zanussi

This is v2 of my previously posted 'hashtriggers' patchset [1], but
renamed to 'hist triggers' following feedback from v1.

Since then, the kernel has gained a tracing map implementation in the
form of bpf_map, which this patchset makes a bit more generic, exports
and uses (as tracing_map_*, still in the bpf syscall file however).

A large part of the initial hash triggers implementation was devoted
to a map implementation and general-purpose hashing functions, which
have now been subsumed by the bpf maps.  I've completely redone the
trigger patches themselves to work on top of tracing_map.  The result
is a much simpler and easier-to-review patchset that's able to focus
more directly on the problem at hand.

The new version addresses all the comments from the previous review,
including changing the name from hash->hist, adding separate 'hist'
files for the output, and moving the examples into Documentation.

This patchset also includes a couple other new and related triggers,
enable_hist and disable_hist, very similar to the existing
enable_event/disable_event triggers used to automatically enable and
disable events based on a triggering condition, but in this case
allowing hist triggers to be enabled and disabled in the same way.

The only problem with using the bpf_map implementation for this is
that it uses kmalloc internally, which causes problems when trying to
trace kmalloc itself.  I'm guessing the ebpf tracing code would also
share this problem e.g. when using bpf_maps from probes on kmalloc().
This patchset attempts a solution to that problem (by adding a
gfp_flag and changing the kmem memory allocation tracepoints to
conditional variants) for checking for it in for but I'm not sure it's
the best way to address it.

There are a couple of important bits of functionality that were
present in v1 but dropped in v2 mainly because I'm still trying to
figure out the best way to accomplish those things using the bpf_map
implementation.

The first is support for compound keys.  Currently, maps can only be
keyed on a single event field, whereas in v1 they could be keyed on
multiple keys.  With support for compound keys, you can create much
more interesting output, such as for example per-pid lists of
syscalls or read counts e.g.:

  # echo 'hist:keys=common_pid.execname,id.syscall:vals=hitcount' > \
        /sys/kernel/debug/tracing/events/raw_syscalls/sys_enter/trigger

  # cat /sys/kernel/debug/tracing/events/raw_syscalls/sys_enter/hist

  key: common_pid:bash[3112], id:sys_write		       vals: count:69
  key: common_pid:bash[3112], id:sys_rt_sigprocmask	       vals: count:218

  key: common_pid:update-notifier[3164], id:sys_poll	       vals: count:37
  key: common_pid:update-notifier[3164], id:sys_recvfrom       vals: count:118

  key: common_pid:deja-dup-monito[3194], id:sys_sendto	       vals: count:1
  key: common_pid:deja-dup-monito[3194], id:sys_read	       vals: count:4
  key: common_pid:deja-dup-monito[3194], id:sys_poll	       vals: count:8
  key: common_pid:deja-dup-monito[3194], id:sys_recvmsg	       vals: count:8
  key: common_pid:deja-dup-monito[3194], id:sys_getegid	       vals: count:8

  key: common_pid:emacs[3275], id:sys_fsync		       vals: count:1
  key: common_pid:emacs[3275], id:sys_open		       vals: count:1
  key: common_pid:emacs[3275], id:sys_symlink		       vals: count:2
  key: common_pid:emacs[3275], id:sys_poll		       vals: count:23
  key: common_pid:emacs[3275], id:sys_select		       vals: count:23
  key: common_pid:emacs[3275], id:unknown_syscall	       vals: count:34
  key: common_pid:emacs[3275], id:sys_ioctl		       vals: count:60
  key: common_pid:emacs[3275], id:sys_rt_sigprocmask	       vals: count:116

  key: common_pid:cat[3323], id:sys_munmap		       vals: count:1
  key: common_pid:cat[3323], id:sys_fadvise64		       vals: count:1

Related to that is support for sorting on multiple fields.  Currently,
you can sort using only a primary key.  Being able to sort on multiple
or at least a secondary key is indispensible for seeing trends when
displaying multiple values.

[1] http://thread.gmane.org/gmane.linux.kernel/1673551

Changes from v1:
 - completely rewritten on top of tracing_map (renamed and exported bpf_map)
 - added map clearing and client ops to tracing_map
 - changed the name from 'hash' triggers to 'hist' triggers
 - added new trigger 'pause' feature
 - added new enable_hist and disable_hist triggers
 - added usage for hist/enable_hist/disable hist to tracing/README
 - moved examples into Documentation/trace/event.txt
 - added ___GFP_NOTRACE, kmalloc/kfree macros, and conditional kmem tracepoints

The following changes since commit 49058038a12cfd9044146a1bf4b286781268d5c9:

  ring-buffer: Do not wake up a splice waiter when page is not full (2015-02-24 14:00:41 -0600)

are available in the git repository at:

  git://git.yoctoproject.org/linux-yocto-contrib.git tzanussi/hist-triggers-v2
  http://git.yoctoproject.org/cgit/cgit.cgi/linux-yocto-contrib/log/?h=tzanussi/hist-triggers-v2

Tom Zanussi (15):
  tracing: Make ftrace_event_field checking functions available
  tracing: Add event record param to trigger_ops.func()
  tracing: Add get_syscall_name()
  bpf: Export bpf map functionality as trace_map_*
  bpf: Export a map-clearing function
  bpf: Add tracing_map client ops
  mm: Add ___GFP_NOTRACE
  tracing: Make kmem memory allocation tracepoints conditional
  tracing: Add kmalloc/kfree macros
  bpf: Make tracing_map use kmalloc/kfree_notrace()
  tracing: Add a per-event-trigger 'paused' field
  tracing: Add 'hist' event trigger command
  tracing: Add sorting to hist triggers
  tracing: Add enable_hist/disable_hist triggers
  tracing: Add 'hist' trigger Documentation

 Documentation/trace/events.txt      |  870 +++++++++++++++++++++
 include/linux/bpf.h                 |   15 +
 include/linux/ftrace_event.h        |    9 +-
 include/linux/gfp.h                 |    3 +-
 include/linux/slab.h                |   61 +-
 include/trace/events/kmem.h         |   28 +-
 kernel/bpf/arraymap.c               |   16 +
 kernel/bpf/hashtab.c                |   39 +-
 kernel/bpf/syscall.c                |  193 ++++-
 kernel/trace/trace.c                |   48 ++
 kernel/trace/trace.h                |   25 +-
 kernel/trace/trace_events.c         |    3 +
 kernel/trace/trace_events_filter.c  |   15 +-
 kernel/trace/trace_events_trigger.c | 1466 ++++++++++++++++++++++++++++++++++-
 kernel/trace/trace_syscalls.c       |   11 +
 mm/slab.c                           |   45 +-
 mm/slob.c                           |   45 +-
 mm/slub.c                           |   47 +-
 18 files changed, 2795 insertions(+), 144 deletions(-)

-- 
1.9.3


^ permalink raw reply	[flat|nested] 23+ messages in thread

end of thread, other threads:[~2015-03-09 11:41 UTC | newest]

Thread overview: 23+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-03-03  2:31 [PATCH v2 00/15] tracing: 'hist' triggers Alexei Starovoitov
2015-03-03 15:47 ` Tom Zanussi
2015-03-04  2:22   ` Alexei Starovoitov
2015-03-04  5:01     ` Tom Zanussi
  -- strict thread matches above, loose matches on Subject: below --
2015-03-03  0:01 Alexei Starovoitov
2015-03-03  1:18 ` Tom Zanussi
2015-03-02 19:36 Alexei Starovoitov
2015-03-02 19:45 ` Steven Rostedt
2015-03-02 19:49   ` Karim Yaghmour
2015-03-02 20:33     ` Alexei Starovoitov
2015-03-02 20:37       ` Karim Yaghmour
2015-03-02 20:39       ` Steven Rostedt
2015-03-02 20:43         ` Steven Rostedt
2015-03-02 20:48           ` Alexei Starovoitov
2015-03-02 20:52             ` Karim Yaghmour
2015-03-02 20:54             ` Karim Yaghmour
2015-03-02 19:14 Alexei Starovoitov
2015-03-02 19:31 ` Steven Rostedt
2015-03-02 19:55 ` Tom Zanussi
2015-03-09 11:38   ` He Kuang
2015-03-02 16:00 Tom Zanussi
2015-03-03  2:25 ` Masami Hiramatsu
2015-03-03 14:47   ` Tom Zanussi

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).