All of lore.kernel.org
 help / color / mirror / Atom feed
* perf trace: file names, strace groups
@ 2015-07-15 13:02 Milian Wolff
  2015-07-15 14:53 ` Arnaldo Carvalho de Melo
  2015-07-15 17:08 ` Frank Ch. Eigler
  0 siblings, 2 replies; 6+ messages in thread
From: Milian Wolff @ 2015-07-15 13:02 UTC (permalink / raw)
  To: Perf Users

Hey all,

are there any plans to improve the `perf trace` utility to make it a drop-in 
replacement for `strace`? `man strace` says "A traced process runs slowly.", 
and if I remember correctly, Brendan Gregg said this is due to strace using 
the PTrace subsystem and not the faster alternative that `perf trace` is based 
upon. But, currently, `perf trace` is far from capable of replacing `strace` 
for my common usecases:

$ touch test

$ strace -e file cat test
execve("/usr/bin/cat", ["cat", "file"], [/* 90 vars */]) = 0
access("/etc/ld.so.preload", R_OK)      = -1 ENOENT (No such file or 
directory)
open("/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3
open("/usr/lib/libc.so.6", O_RDONLY|O_CLOEXEC) = 3
open("/usr/lib/locale/locale-archive", O_RDONLY|O_CLOEXEC) = 3
open("file", O_RDONLY)                  = 3
+++ exited with 0 +++

$ perf trace -e file cat file
perf: Segmentation fault
Obtained 10 stack frames.
perf(dump_stack+0x32) [0x4d8652]
perf(sighandler_dump_stack+0x29) [0x4d86e9]
/usr/lib/libc.so.6(+0x335b0) [0x7f15bba065b0]
perf(perf_evlist__mmap_read+0x35) [0x487445]
perf(cmd_trace+0xe4b) [0x4582cb]
perf() [0x47b593]
perf(main+0x622) [0x427fe2]
/usr/lib/libc.so.6(__libc_start_main+0xf0) [0x7f15bb9f3790]
perf(_start+0x29) [0x428109]
[(nil)]

Two things to notice: perf version 4.1.0, and every trace call ends with the 
above crash. But the file event group is not known to perf list, and thus 
nothing gets traced. I'd expect an error message instead, i.e.:

$ strace -e asdf cat file
strace: invalid system call 'asdf'

Now, if I instead do this:

$ perf trace -e open cat file
     0.070 ( 0.003 ms): cat/8663 open(filename: 0x7f04b76a4a05, flags: CLOEXEC                         
) = 3
     0.092 ( 0.003 ms): cat/8663 open(filename: 0x7f04b78acef0, flags: CLOEXEC                         
) = 3
     0.245 ( 0.005 ms): cat/8663 open(filename: 0x7f04b7453310, flags: CLOEXEC                         
) = 3
     0.277 ( 0.002 ms): cat/8663 open(filename: 0x7ffdba0902f8, mode: 131072                           
) = 3

This is nice, esp. the time measurements. But why are the file names not 
printed as strings? The pointer address does not help one at all. Also, the 
O_RDONLY flag is not properly handled (compare to the strace output above).

I'd love to see someone working on this gem, as it really promises to be a 
good replacement for strace.

Bye
-- 
Milian Wolff
mail@milianw.de
http://milianw.de

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: perf trace: file names, strace groups
  2015-07-15 13:02 perf trace: file names, strace groups Milian Wolff
@ 2015-07-15 14:53 ` Arnaldo Carvalho de Melo
  2015-07-15 15:41   ` Milian Wolff
  2015-07-15 17:08 ` Frank Ch. Eigler
  1 sibling, 1 reply; 6+ messages in thread
From: Arnaldo Carvalho de Melo @ 2015-07-15 14:53 UTC (permalink / raw)
  To: Milian Wolff; +Cc: Perf Users, David Ahern

Executive summary:

. Check the latest sources, i.e. 4.2-rc

. Some of the stuff you report was implemented or fixed already

. Details about some of the limitations and advantages of 'perf trace' below

. Including an auto-tracing that will fix one of the problems you reported

Em Wed, Jul 15, 2015 at 03:02:21PM +0200, Milian Wolff escreveu:
> Hey all,
> 
> are there any plans to improve the `perf trace` utility to make it a drop-in 
> replacement for `strace`? `man strace` says "A traced process runs slowly.", 

Yes, 'perf trace' should go in that direction.

> and if I remember correctly, Brendan Gregg said this is due to strace using 
> the PTrace subsystem and not the faster alternative that `perf trace` is based 
> upon.

Right

> But, currently, `perf trace` is far from capable of replacing `strace` 
> for my common usecases:
> 
> $ touch test
> 
> $ strace -e file cat test
> execve("/usr/bin/cat", ["cat", "file"], [/* 90 vars */]) = 0
> access("/etc/ld.so.preload", R_OK)      = -1 ENOENT (No such file or 
> directory)
> open("/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3
> open("/usr/lib/libc.so.6", O_RDONLY|O_CLOEXEC) = 3
> open("/usr/lib/locale/locale-archive", O_RDONLY|O_CLOEXEC) = 3
> open("file", O_RDONLY)                  = 3
> +++ exited with 0 +++
> 
> $ perf trace -e file cat file
> perf: Segmentation fault
> Obtained 10 stack frames.
> perf(dump_stack+0x32) [0x4d8652]
> perf(sighandler_dump_stack+0x29) [0x4d86e9]
> /usr/lib/libc.so.6(+0x335b0) [0x7f15bba065b0]
> perf(perf_evlist__mmap_read+0x35) [0x487445]
> perf(cmd_trace+0xe4b) [0x4582cb]
> perf() [0x47b593]
> perf(main+0x622) [0x427fe2]
> /usr/lib/libc.so.6(__libc_start_main+0xf0) [0x7f15bb9f3790]
> perf(_start+0x29) [0x428109]
> [(nil)]

[acme@zoo ~]$ touch test
[acme@zoo ~]$ trace -e file cat file
Error:	Invalid syscall file
Hint:	try 'perf list syscalls:sys_enter_*'
Hint:	and: 'man syscalls'
[acme@zoo ~]$

When I use 'trace', it is the same as 'perf trace', it is just a
hardlink and it checks argv[0] to use what is in
tools/perf/builtin-trace.c, etc.
 
> Two things to notice: perf version 4.1.0, and every trace call ends with the 
> above crash. But the file event group is not known to perf list, and thus 
> nothing gets traced. I'd expect an error message instead, i.e.:
> 
> $ strace -e asdf cat file
> strace: invalid system call 'asdf'

[acme@zoo ~]$ trace -e asdf cat file
Error:	Invalid syscall asdf
Hint:	try 'perf list syscalls:sys_enter_*'
Hint:	and: 'man syscalls'
[acme@zoo ~]$
 
> Now, if I instead do this:
> 
> $ perf trace -e open cat file
>      0.070 ( 0.003 ms): cat/8663 open(filename: 0x7f04b76a4a05, flags: CLOEXEC                         
> ) = 3
>      0.092 ( 0.003 ms): cat/8663 open(filename: 0x7f04b78acef0, flags: CLOEXEC                         
> ) = 3
>      0.245 ( 0.005 ms): cat/8663 open(filename: 0x7f04b7453310, flags: CLOEXEC                         
> ) = 3
>      0.277 ( 0.002 ms): cat/8663 open(filename: 0x7ffdba0902f8, mode: 131072                           
> ) = 3

[acme@zoo ~]$ trace -e open cat file
Error:	No permissions to read
/sys/kernel/debug/tracing/events/raw_syscalls/sys_(enter|exit)
Hint:	Try 'sudo mount -o remount,mode=755 /sys/kernel/debug'

[acme@zoo ~]$ sudo mount -o remount,mode=755 /sys/kernel/debug
[sudo] password for acme: 
[acme@zoo ~]$ trace -e open cat file
Error:	No permissions to read
/sys/kernel/debug/tracing/events/raw_syscalls/sys_(enter|exit)
Hint:	Try 'sudo mount -o remount,mode=755 /sys/kernel/debug/tracing'

[acme@zoo ~]$ sudo mount -o remount,mode=755 /sys/kernel/debug/tracing
[acme@zoo ~]$ trace -e open cat file
cat: file: No such file or directory
  0.325 ( 0.032 ms): cat/7295 open(filename: 0x7f5e1f049b92, flags: CLOEXEC) = 3
  0.361 ( 0.011 ms): cat/7295 open(filename: 0x7f5e1f24adc8, flags: CLOEXEC) = 3
  0.948 ( 0.019 ms): cat/7295 open(filename: 0x7f5e1edf86b0, flags: CLOEXEC) = 3
  1.099 ( 0.021 ms): cat/7295 open(filename: 0x7fff1460d22f, mode: 65536) = -1 ENOENT No such file or directory
  1.209 ( 0.024 ms): cat/7295 open(filename: 0x7fff1460a550, flags: CLOEXEC) = 3
  1.339 ( 0.017 ms): cat/7295 open(filename: 0xb86100, mode: 1) = -1 ENOENT No such file or directory
  1.345 ( 0.004 ms): cat/7295 open(filename: 0xb87190, mode: 12083520) = -1 ENOENT No such file or directory
  1.352 ( 0.005 ms): cat/7295 open(filename: 0xb86190, mode: 12087760) = -1 ENOENT No such file or directory
  1.359 ( 0.004 ms): cat/7295 open(filename: 0xb87110, mode: 12088032) = -1 ENOENT No such file or directory
  1.365 ( 0.004 ms): cat/7295 open(filename: 0xb87220, mode: 12087632) = -1 ENOENT No such file or directory
  1.385 ( 0.019 ms): cat/7295 open(filename: 0xb86210, mode: 12087904) = -1 ENOENT No such file or directory
[acme@zoo ~]$

 
> This is nice, esp. the time measurements. But why are the file names not 
> printed as strings? The pointer address does not help one at all. Also, the 

Because we don't use ptrace to stop the process and copy the string from
userspace, etc.

We have access, right now, to only what the tracepoints capture, and
they do not follow pointers.

So, there is already provision to either use a vfs_getname 'wannabe
tracepoint', i.e. a 'perf probe' put in the routine that copies stuff
from userspace to kernel space, and that works if you do something like:

  [root@zoo ~]# perf probe 'vfs_getname=getname_flags:72 pathname=filename:string'
  Added new event:
    probe:vfs_getname    (on getname_flags:72 with pathname=filename:string)

  You can now use it in all perf tools, such as:

	perf record -e probe:vfs_getname -aR sleep 1

  [root@zoo ~]#

So, after this if we try again:

[acme@zoo linux]$ trace -e open cat file
Error:	Operation not permitted.
Hint:	Check /proc/sys/kernel/perf_event_paranoid setting.
Hint:	For system wide tracing it needs to be set to -1.
Hint:	Try: 'sudo sh -c "echo -1 > /proc/sys/kernel/perf_event_paranoid"'
Hint:	The current value is 1.
[acme@zoo linux]$

Ok, now it sees the probe:vfs_getname tracepoint and tries to use it, but
normal users can't connect to kprobes... So, if we open the doors wide:

[acme@zoo linux]$ trace -e open cat file
cat: file: No such file or directory
     0.159 ( 0.012 ms): cat/7662 open(filename: 0x7f65e224cb92, flags: CLOEXEC) = 3

It still doesn't work, but at least it managed to connect to the probe, probably,
IIRC, open doesn't use getname_flags() to transfer the contents pointed by
'filename' in the open syscall from userspace to kernel space, or the logic to
connect the vfs_getname payload to the outstanding open syscall has issues, lets see:

[acme@zoo linux]$ perf record -e probe:vfs_getname cat file
cat: file: No such file or directory
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.017 MB perf.data (12 samples) ]
[acme@zoo linux]$ perf script
  cat 7682 [0] 1.093735: probe:vfs_getname: (ffffffff812230bc) pathname="/etc/ld.so.preload"
  cat 7682 [0] 1.093747: probe:vfs_getname: (ffffffff812230bc) pathname="/etc/ld.so.cache"
  cat 7682 [0] 1.093767: probe:vfs_getname: (ffffffff812230bc) pathname="/lib64/libc.so.6"
  cat 7682 [0] 1.094051: probe:vfs_getname: (ffffffff812230bc) pathname="/usr/lib/locale/local
  cat 7682 [0] 1.094113: probe:vfs_getname: (ffffffff812230bc) pathname="file"
  cat 7682 [0] 1.094165: probe:vfs_getname: (ffffffff812230bc) pathname="/usr/share/locale/loc
  cat 7682 [0] 1.094229: probe:vfs_getname: (ffffffff812230bc) pathname="/usr/share/locale/en_
  cat 7682 [0] 1.094233: probe:vfs_getname: (ffffffff812230bc) pathname="/usr/share/locale/en_
  cat 7682 [0] 1.094236: probe:vfs_getname: (ffffffff812230bc) pathname="/usr/share/locale/en_
  cat 7682 [0] 1.094242: probe:vfs_getname: (ffffffff812230bc) pathname="/usr/share/locale/en.
  cat 7682 [0] 1.094245: probe:vfs_getname: (ffffffff812230bc) pathname="/usr/share/locale/en.
  cat 7682 [0] 1.094248: probe:vfs_getname: (ffffffff812230bc) pathname="/usr/share/locale/en/
(END)

So, good news, I guess some tweaking to the open logic in 'builtin-trace.c' will
do the trick for the open syscall as well, lets see again, using trace itself...

[acme@zoo linux]$ trace -e open --event probe:vfs_getname cat file
cat: file: No such file or directory
 0.163 (        ): probe:vfs_getname:(ffffffff812230bc) pathname="/etc/ld.so.preload")
 0.180 (0.004 ms): cat/7754 open(filename: 0x7fe6c4b86b92, flags: CLOEXEC                         ) ...
 0.180 (        ): probe:vfs_getname:(ffffffff812230bc) pathname="/etc/ld.so.cache")
 0.186 (0.010 ms): cat/7754  ... [continued]: open()) = 3
 0.206 (0.003 ms): cat/7754 open(filename: 0x7fe6c4d87dc8, flags: CLOEXEC                         ) ...
 0.206 (        ): probe:vfs_getname:(ffffffff812230bc) pathname="/lib64/libc.so.6")
 0.212 (0.009 ms): cat/7754  ... [continued]: open()) = 3
 0.477 (0.005 ms): cat/7754 open(filename: 0x7fe6c49356b0, flags: CLOEXEC                         ) ...
 0.477 (        ): probe:vfs_getname:(ffffffff812230bc) pathname="/usr/lib/locale/locale-archive")
 0.485 (0.013 ms): cat/7754  ... [continued]: open()) = 3
 0.540 (0.003 ms): cat/7754 open(filename: 0x7fff70d9220f, mode: 65536                            ) ...
 0.540 (        ): probe:vfs_getname:(ffffffff812230bc) pathname="file")
 0.545 (0.008 ms): cat/7754  ... [continued]: open()) = -1 ENOENT No such file or directory
 0.604 (0.005 ms): cat/7754 open(filename: 0x7fff70d8f610, flags: CLOEXEC                         ) ...
 0.604 (        ): probe:vfs_getname:(ffffffff812230bc) pathname="/usr/share/locale/locale.alias")
 0.612 (0.013 ms): cat/7754  ... [continued]: open()) = 3
 0.701 (0.006 ms): cat/7754 open(filename: 0x1315100, mode: 1                                     ) ...
 0.701 (        ): probe:vfs_getname:(ffffffff812230bc) pathname="/usr/share/locale/en_US.UTF-8/LC_MESSAGES/libc.mo")
 0.707 (0.012 ms): cat/7754  ... [continued]: open()) = -1 ENOENT No such file or directory
 0.711 (0.003 ms): cat/7754 open(filename: 0x1316190, mode: 20009280                              ) ...
 0.711 (        ): probe:vfs_getname:(ffffffff812230bc) pathname="/usr/share/locale/en_US.utf8/LC_MESSAGES/libc.mo")
 0.716 (0.007 ms): cat/7754  ... [continued]: open()) = -1 ENOENT No such file or directory
 0.720 (0.003 ms): cat/7754 open(filename: 0x1315190, mode: 20013520                              ) ...
 0.720 (        ): probe:vfs_getname:(ffffffff812230bc) pathname="/usr/share/locale/en_US/LC_MESSAGES/libc.mo")
 0.724 (0.007 ms): cat/7754  ... [continued]: open()) = -1 ENOENT No such file or directory
 0.729 (0.003 ms): cat/7754 open(filename: 0x1316110, mode: 20013792                              ) ...
 0.729 (        ): probe:vfs_getname:(ffffffff812230bc) pathname="/usr/share/locale/en.UTF-8/LC_MESSAGES/libc.mo")
 0.732 (0.006 ms): cat/7754  ... [continued]: open()) = -1 ENOENT No such file or directory
 0.736 (0.003 ms): cat/7754 open(filename: 0x1316220, mode: 20013392                              ) ...
 0.736 (        ): probe:vfs_getname:(ffffffff812230bc) pathname="/usr/share/locale/en.utf8/LC_MESSAGES/libc.mo")
 0.740 (0.006 ms): cat/7754  ... [continued]: open()) = -1 ENOENT No such file or directory
 0.744 (0.003 ms): cat/7754 open(filename: 0x1315210, mode: 20013664                              ) ...
 0.744 (        ): probe:vfs_getname:(ffffffff812230bc) pathname="/usr/share/locale/en/LC_MESSAGES/libc.mo")
 0.749 (0.007 ms): cat/7754  ... [continued]: open()) = -1 ENOENT No such file or directory
[acme@zoo linux]$

Ok, that was good, using trace to trace trace and fix it, after lunch I'll give
it a try.

One not: -e expects syscall names, --event (or --ev, as it accepts the minimum
non-ambiguous part of a long option) expects perf events, but I think I'll make
both accept either "events", to avoid confusion with other perf tools, i.e.

  -e sched:*,open,close

Should do the same as:

  -e open,close --ev sched:*

Etc.

> O_RDONLY flag is not properly handled (compare to the strace output above).

It is zero, right? yeap:

#define O_RDONLY        00000000

syscall arguments with a zero payload are removed trying to make the output
more compact, and also for cases where some of the args are optional, to reduce
the logic needed to know when things are optional, just don't print zeroes.

This shortcut has issues, as in this case, where an exception seems to be
needed, i.e. we should print O_RDONLY, d'accord?
 
> I'd love to see someone working on this gem, as it really promises to be a 
> good replacement for strace.

Well, you're doing some work, getting some discussion going, I can code some
more if someone like you tests it and tries to use it in your use cases.

Bonus points go to whoever follows these discussions and writes some
documentation or blogs about it ;-)

Ok, now to juggle this with the eBPF enablement of perf... I can see
some synergies, like: please, trace network traffic to host foo.bar,
please... :-)

- Arnaldo

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: perf trace: file names, strace groups
  2015-07-15 14:53 ` Arnaldo Carvalho de Melo
@ 2015-07-15 15:41   ` Milian Wolff
  2015-07-15 19:43     ` Arnaldo Carvalho de Melo
  0 siblings, 1 reply; 6+ messages in thread
From: Milian Wolff @ 2015-07-15 15:41 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo; +Cc: Perf Users, David Ahern

On Wednesday 15 July 2015 11:53:09 Arnaldo Carvalho de Melo wrote:
> Executive summary:
> 
> . Check the latest sources, i.e. 4.2-rc

I could also try to compile perf from git, if you tell me what branch and 
remote I should use.

> . Some of the stuff you report was implemented or fixed already
> 
> . Details about some of the limitations and advantages of 'perf trace' below
> 
> . Including an auto-tracing that will fix one of the problems you reported

Nice, thanks for that!

<snip>

> [acme@zoo ~]$ trace -e file cat file
> Error:	Invalid syscall file
> Hint:	try 'perf list syscalls:sys_enter_*'
> Hint:	and: 'man syscalls'

Just to be clear: Supporting the strace groups as a collection of syscalls 
will be implemented? Or is that not possible, as it goes straight down to the 
perf event subsystem which does not know this group?

<snip>

> Ok, that was good, using trace to trace trace and fix it, after lunch I'll
> give it a try.

Cool, so you'll add that internally such that the hoops are not necessary on 
the user side to get the filename? Excellent! I also notice that 
close/read/write and others /sometimes/ write out the filename:

    10.404 ( 0.001 ms): kmimetypefinde/8217 read(fd: 6</etc/localtime>, buf: 
0x7ffe2b6d1319, count: 15671         ) = 0

    10.529 ( 0.004 ms): kmimetypefinde/8494 read(fd: 6, buf: 0x7ffc61c45b40, 
count: 8192                          ) = 4

This is also pretty odd.

<snip>

> > O_RDONLY flag is not properly handled (compare to the strace output
> > above).
> 
> It is zero, right? yeap:
> 
> #define O_RDONLY        00000000
> 
> syscall arguments with a zero payload are removed trying to make the output
> more compact, and also for cases where some of the args are optional, to
> reduce the logic needed to know when things are optional, just don't print
> zeroes.
> 
> This shortcut has issues, as in this case, where an exception seems to be
> needed, i.e. we should print O_RDONLY, d'accord?

Yes, that would be good to have here, I guess. In general, the closer the 
output gets to `strace`, the better.

> > I'd love to see someone working on this gem, as it really promises to be a
> > good replacement for strace.
> 
> Well, you're doing some work, getting some discussion going, I can code some
> more if someone like you tests it and tries to use it in your use cases.
> 
> Bonus points go to whoever follows these discussions and writes some
> documentation or blogs about it ;-)
> 
> Ok, now to juggle this with the eBPF enablement of perf... I can see
> some synergies, like: please, trace network traffic to host foo.bar,
> please... :-)

Sounds cool :)
-- 
Milian Wolff
mail@milianw.de
http://milianw.de

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: perf trace: file names, strace groups
  2015-07-15 13:02 perf trace: file names, strace groups Milian Wolff
  2015-07-15 14:53 ` Arnaldo Carvalho de Melo
@ 2015-07-15 17:08 ` Frank Ch. Eigler
  1 sibling, 0 replies; 6+ messages in thread
From: Frank Ch. Eigler @ 2015-07-15 17:08 UTC (permalink / raw)
  To: Milian Wolff; +Cc: Perf Users


mail wrote:

> [...]
> are there any plans to improve the `perf trace` utility to make it a drop-in 
> replacement for `strace`? [...]

By the way, see also systemtap's "strace.stp" script:
https://sourceware.org/systemtap/examples/#process/strace.stp

- FChE

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: perf trace: file names, strace groups
  2015-07-15 15:41   ` Milian Wolff
@ 2015-07-15 19:43     ` Arnaldo Carvalho de Melo
  2015-07-16 15:00       ` Arnaldo Carvalho de Melo
  0 siblings, 1 reply; 6+ messages in thread
From: Arnaldo Carvalho de Melo @ 2015-07-15 19:43 UTC (permalink / raw)
  To: Milian Wolff; +Cc: Perf Users, David Ahern

Em Wed, Jul 15, 2015 at 05:41:53PM +0200, Milian Wolff escreveu:
> On Wednesday 15 July 2015 11:53:09 Arnaldo Carvalho de Melo wrote:
> > Executive summary:
> > 
> > . Check the latest sources, i.e. 4.2-rc
> 
> I could also try to compile perf from git, if you tell me what branch and 
> remote I should use.

The branch I use for new features, for the next merge window (the kernel
one, that we tag along) is here:

git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux.git  perf/core

> > . Some of the stuff you report was implemented or fixed already

> > . Details about some of the limitations and advantages of 'perf trace' below

> > . Including an auto-tracing that will fix one of the problems you reported
> 
> Nice, thanks for that!
> 
> <snip>
> 
> > [acme@zoo ~]$ trace -e file cat file
> > Error:	Invalid syscall file
> > Hint:	try 'perf list syscalls:sys_enter_*'
> > Hint:	and: 'man syscalls'
 
> Just to be clear: Supporting the strace groups as a collection of syscalls 
> will be implemented? Or is that not possible, as it goes straight down to the 
> perf event subsystem which does not know this group?

Right, I'll implement a strlist__new_with_expansions(), to break a CSV
list and look for some specific keys, that when found will expand into
other keys, like what is done for strace -e file, i.e. file becomes
open,write,close, making sure there are no duplicates.

Then it will be just a matter of providing the expansors for those
strace groups.

> <snip>
> 
> > Ok, that was good, using trace to trace trace and fix it, after lunch I'll
> > give it a try.
> 
> Cool, so you'll add that internally such that the hoops are not necessary on 
> the user side to get the filename? Excellent! I also notice that 

Yes, one other plan is to get bits and pieces of 'perf probe' to be more usable
by other tools, so that we can add these probes, but this is fragile, being
useful just for some areas that are not that much in flux, this vfs_getname
one, for isntance has changed multiple times, i.e. the place where we have to
insert the probe and also the variable name we need to get (the one where the
copy from userspace was already done, etc).

> close/read/write and others /sometimes/ write out the filename:
> 
>     10.404 ( 0.001 ms): kmimetypefinde/8217 read(fd: 6</etc/localtime>, buf: 
> 0x7ffe2b6d1319, count: 15671         ) = 0
> 
>     10.529 ( 0.004 ms): kmimetypefinde/8494 read(fd: 6, buf: 0x7ffc61c45b40, 
> count: 8192                          ) = 4
> 
> This is also pretty odd.

Well, when it intercepts the vfs_getname it stores the filename in a per
thread storage, then, when it gets an open syscall return, the fd, it
associates, in an array, per thread, the filename to the fd, and then,
when that fd appears again in another syscall, it expands it, showing
the name.

When vfs_getname is not used, it still works, to some degree, but by generating
some noise in the system by reading the /proc/PID/fd/ symlinks, which is racy,
etc.

Which reminds me what needs to be done for open: to defer printing the
syscall args till we get the vfs_getname tracepoint, when we realize
what is the filename, matching the pointer to the contents... But that
has some complications, as when open itself is interrupted by some other
event... I.e. we need to somehow indicate that, for instance:

 0.605 (0.002 ms): cat/13780 open(filename: 0x78a220, mode: 7905616                                ) ...
 0.605 (        ): probe:vfs_getname:(ffffffff812230bc) pathname="/usr/share/locale/en.utf8/LC_MESSAGES/libc.mo")
 0.608 (0.004 ms): cat/13780  ... [continued]: open()) = -1 ENOENT No such file or directory

I.e. normally we print the first line with what is in the
raw_syscall:sys_enter, and that is just the filename pointer, as you
see above, but then, we will do the association of that pointer to
its contents ("/usr/share/locale/en.utf8/LC_MESSAGES/libc.mo") when
we get that probe:vfs_getname...

 0.605 (0.002 ms): cat/13780 open(filename: 0x78a220, mode: 7905616                                ) ...
 0.605 (        ): probe:vfs_getname:(ffffffff812230bc) pathname="/usr/share/locale/en.utf8/LC_MESSAGES/libc.mo")
 0.608 (0.004 ms): cat/13780  ... [continued]: open()) = -1 ENOENT No such file or directory

We could transform that into:

 0.605 (0.002 ms): cat/13780 open( ... interrupted
 0.605 (        ): probe:vfs_getname:(ffffffff812230bc) pathname="/usr/share/locale/en.utf8/LC_MESSAGES/libc.mo")
 0.608 (0.004 ms): cat/13780  ... [continued]: open(filename = "/usr/share/locale/en.utf8/LC_MESSAGES/libc.mo", mode: 7905616)) = -1 ENOENT No such file or directory

That, when no interruption happens, i.e. no event from syscall enter to syscall
exit happens, i.e. when "probe:vfs_getname" isn't one of the events select, for
instance,  would result into:

 0.605 (0.002 ms): cat/13780 open(filename = "/usr/share/locale/en.utf8/LC_MESSAGES/libc.mo", mode: 7905616)) = -1 ENOENT No such file or directory

Now we need just to write the code for that ;-)

> <snip>
> 
> > > O_RDONLY flag is not properly handled (compare to the strace output
> > > above).
> > 
> > It is zero, right? yeap:
> > 
> > #define O_RDONLY        00000000
> > 
> > syscall arguments with a zero payload are removed trying to make the output
> > more compact, and also for cases where some of the args are optional, to
> > reduce the logic needed to know when things are optional, just don't print
> > zeroes.
> > 
> > This shortcut has issues, as in this case, where an exception seems to be
> > needed, i.e. we should print O_RDONLY, d'accord?
> 
> Yes, that would be good to have here, I guess. In general, the closer the 
> output gets to `strace`, the better.

Ok.
 
- Arnaldo

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: perf trace: file names, strace groups
  2015-07-15 19:43     ` Arnaldo Carvalho de Melo
@ 2015-07-16 15:00       ` Arnaldo Carvalho de Melo
  0 siblings, 0 replies; 6+ messages in thread
From: Arnaldo Carvalho de Melo @ 2015-07-16 15:00 UTC (permalink / raw)
  To: Milian Wolff; +Cc: Perf Users, David Ahern

Em Wed, Jul 15, 2015 at 04:43:15PM -0300, Arnaldo Carvalho de Melo escreveu:
> Em Wed, Jul 15, 2015 at 05:41:53PM +0200, Milian Wolff escreveu:
> > On Wednesday 15 July 2015 11:53:09 Arnaldo Carvalho de Melo wrote:
> > > [acme@zoo ~]$ trace -e file cat file
> > > Error:	Invalid syscall file
> > > Hint:	try 'perf list syscalls:sys_enter_*'
> > > Hint:	and: 'man syscalls'
  
> > Just to be clear: Supporting the strace groups as a collection of syscalls 
> > will be implemented? Or is that not possible, as it goes straight down to the 
> > perf event subsystem which does not know this group?
 
> Right, I'll implement a strlist__new_with_expansions(), to break a CSV
> list and look for some specific keys, that when found will expand into
> other keys, like what is done for strace -e file, i.e. file becomes
> open,write,close, making sure there are no duplicates.
 
> Then it will be just a matter of providing the expansors for those
> strace groups.

Did it a little bit differently, see patch below, needs some more
polishing, will do after some other stuff I need to do now:

[root@zoo ~]# trace -e file ls
     0.153 ( 0.019 ms): ls/26666 access(filename: 0x7fe31a181270, mode: R) = -1 ENOENT No such file or directory
     0.166 ( 0.008 ms): ls/26666 open(filename: 0x7fe31a17fb92, flags: CLOEXEC) = 3
     0.192 ( 0.008 ms): ls/26666 open(filename: 0x7fe31a37883c, flags: CLOEXEC) = 3
     0.258 ( 0.010 ms): ls/26666 open(filename: 0x7fe31a380b2f, flags: CLOEXEC) = 3
     0.304 ( 0.008 ms): ls/26666 open(filename: 0x7fe31a381a39, flags: CLOEXEC) = 3
     0.351 ( 0.009 ms): ls/26666 open(filename: 0x7fe31a380dc8, flags: CLOEXEC) = 3
     0.409 ( 0.008 ms): ls/26666 open(filename: 0x7fe31a379d5a, flags: CLOEXEC) = 3
     0.455 ( 0.008 ms): ls/26666 open(filename: 0x7fe31a37fc6b, flags: CLOEXEC) = 3
     0.502 ( 0.008 ms): ls/26666 open(filename: 0x7fe31a3815d4, flags: CLOEXEC) = 3
     0.545 ( 0.008 ms): ls/26666 open(filename: 0x7fe31a379648, flags: CLOEXEC) = 3
anaconda-ks.cfg  b  bin  lib64	libexec  new  old  perf.data  perf.data.old  stream_test  tg.run
     1.124 ( 0.258 ms): ls/26666 statfs(pathname: 0x7fe319f57654, buf: 0x7ffd15dc4810) = 0
     1.142 ( 0.015 ms): ls/26666 statfs(pathname: 0x7fe319f57654, buf: 0x7ffd15dc4700) = 0
     1.234 ( 0.011 ms): ls/26666 open(filename: 0x7fe3198fb6b0, flags: CLOEXEC) = 3
     1.320 ( 0.008 ms): ls/26666 openat(dfd: CWD, filename: 0x11f2c10, flags: CLOEXEC|DIRECTORY|NONBLOCK) = 3
[root@zoo ~]#

But should we have all those below? And if so, wouldn't be better to use
that criteria (in the grep+set below) as a future proof "file" group?
Anyway, having groups as extensions on files in a dir adds the strace
feature as soon as someone provides me the groups files :-)

[root@zoo ~]# grep "field:.*name" /sys/kernel/debug/tracing/events/syscalls/*/format | sed
's/.*sys_enter_\([a-z_0-9]\+\).*/\1/g'
access
acct
chdir
chmod
chown
chroot
creat
delete_module
faccessat
fanotify_mark
fchmodat
fchownat
fgetxattr
fremovexattr
fsetxattr
futimesat
getsockopt
getxattr
getxattr
inotify_add_watch
lchown
lgetxattr
lgetxattr
linkat
linkat
link
link
listxattr
llistxattr
lremovexattr
lremovexattr
lsetxattr
lsetxattr
memfd_create
mkdirat
mkdir
mknodat
mknod
mount
mount
mq_open
mq_unlink
name_to_handle_at
newfstatat
newlstat
newstat
newuname
openat
open
readlinkat
removexattr
removexattr
renameat2
renameat2
renameat
renameat
rename
rename
rmdir
setdomainname
sethostname
setsockopt
setxattr
setxattr
statfs
symlinkat
symlinkat
symlink
symlink
umount
unlinkat
unlink
utime
utimensat
utimes
[root@zoo ~]# 


diff --git a/tools/perf/builtin-trace.c b/tools/perf/builtin-trace.c
index 0ebf55bf20b3..4dc50fdba5dd 100644
--- a/tools/perf/builtin-trace.c
+++ b/tools/perf/builtin-trace.c
@@ -2931,7 +2931,7 @@ int cmd_trace(int argc, const char **argv, const char *prefix __maybe_unused)
 		trace.not_ev_qualifier = *s == '!';
 		if (trace.not_ev_qualifier)
 			++s;
-		trace.ev_qualifier = strlist__new(true, s);
+		trace.ev_qualifier = strlist__new_subst_dir(true, s, "/home/acme/git/linux/tools/perf/trace/strace/groups/");
 		if (trace.ev_qualifier == NULL) {
 			fputs("Not enough memory to parse event qualifier",
 			      trace.output);
diff --git a/tools/perf/util/strlist.c b/tools/perf/util/strlist.c
index 68ae673f9fb6..2e4534a1fffe 100644
--- a/tools/perf/util/strlist.c
+++ b/tools/perf/util/strlist.c
@@ -108,32 +108,52 @@ struct str_node *strlist__find(struct strlist *slist, const char *entry)
 	return snode;
 }
 
-static int strlist__parse_list_entry(struct strlist *slist, const char *s)
+static int strlist__parse_list_entry(struct strlist *slist, const char *s,
+				     const char *subst_dir)
 {
+	int err;
+
 	if (strncmp(s, "file://", 7) == 0)
 		return strlist__load(slist, s + 7);
 
-	return strlist__add(slist, s);
+	if (subst_dir) {
+		char *subst;
+
+		err = -ENOMEM;
+		if (asprintf(&subst, "%s/%s", subst_dir, s) < 0)
+			goto out;
+
+		err = 0;
+		if (access(subst, F_OK) == 0)
+			err = strlist__load(slist, subst);
+
+		free(subst);
+	} else {
+		err = strlist__add(slist, s);
+	}
+out:
+	return err;
 }
 
-int strlist__parse_list(struct strlist *slist, const char *s)
+int strlist__parse_list(struct strlist *slist, const char *s, const char *subst_dir)
 {
 	char *sep;
 	int err;
 
 	while ((sep = strchr(s, ',')) != NULL) {
 		*sep = '\0';
-		err = strlist__parse_list_entry(slist, s);
+		err = strlist__parse_list_entry(slist, s, subst_dir);
 		*sep = ',';
 		if (err != 0)
 			return err;
 		s = sep + 1;
 	}
 
-	return *s ? strlist__parse_list_entry(slist, s) : 0;
+	return *s ? strlist__parse_list_entry(slist, s, subst_dir) : 0;
 }
 
-struct strlist *strlist__new(bool dupstr, const char *list)
+struct strlist *strlist__new_subst_dir(bool dupstr, const char *list,
+				       const char *dirname)
 {
 	struct strlist *slist = malloc(sizeof(*slist));
 
@@ -144,7 +164,7 @@ struct strlist *strlist__new(bool dupstr, const char *list)
 		slist->rblist.node_delete = strlist__node_delete;
 
 		slist->dupstr	 = dupstr;
-		if (list && strlist__parse_list(slist, list) != 0)
+		if (list && strlist__parse_list(slist, list, dirname) != 0)
 			goto out_error;
 	}
 
@@ -154,6 +174,11 @@ out_error:
 	return NULL;
 }
 
+struct strlist *strlist__new(bool dupstr, const char *list)
+{
+	return strlist__new_subst_dir(dupstr, list, NULL);
+}
+
 void strlist__delete(struct strlist *slist)
 {
 	if (slist != NULL)
diff --git a/tools/perf/util/strlist.h b/tools/perf/util/strlist.h
index 5c7f87069d9c..4bcffdfb9f29 100644
--- a/tools/perf/util/strlist.h
+++ b/tools/perf/util/strlist.h
@@ -17,6 +17,8 @@ struct strlist {
 };
 
 struct strlist *strlist__new(bool dupstr, const char *slist);
+struct strlist *strlist__new_subst_dir(bool dupstr, const char *slist,
+				       const char *dirname);
 void strlist__delete(struct strlist *slist);
 
 void strlist__remove(struct strlist *slist, struct str_node *sn);
@@ -75,5 +77,5 @@ static inline struct str_node *strlist__next(struct str_node *sn)
 	for (pos = strlist__first(slist), n = strlist__next(pos); pos;\
 	     pos = n, n = strlist__next(n))
 
-int strlist__parse_list(struct strlist *slist, const char *s);
+int strlist__parse_list(struct strlist *slist, const char *s, const char *subst_dir);
 #endif /* __PERF_STRLIST_H */

^ permalink raw reply related	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2015-07-16 15:00 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-07-15 13:02 perf trace: file names, strace groups Milian Wolff
2015-07-15 14:53 ` Arnaldo Carvalho de Melo
2015-07-15 15:41   ` Milian Wolff
2015-07-15 19:43     ` Arnaldo Carvalho de Melo
2015-07-16 15:00       ` Arnaldo Carvalho de Melo
2015-07-15 17:08 ` Frank Ch. Eigler

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.