All of lore.kernel.org
 help / color / mirror / Atom feed
* timing information with perf
@ 2015-11-12 21:07 Kornievskaia, Olga
  2015-11-12 21:57 ` Arnaldo Carvalho de Melo
  2015-11-16 18:08 ` Frank Ch. Eigler
  0 siblings, 2 replies; 18+ messages in thread
From: Kornievskaia, Olga @ 2015-11-12 21:07 UTC (permalink / raw)
  To: linux-perf-users

Hi, 

Can somebody tell me if perf can do “timing-of-operation” type of measurement? By that I mean, say there are two tracepoints “foobar_enter” and “foobar_exit”. Each tracepoint logs a timestamp. I’d like to be able to say that on average it takes that many time-units between “enter” and “exit” tracepoints. I haven’t been able to find something like that with “perf”.

Thank you.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: timing information with perf
  2015-11-12 21:07 timing information with perf Kornievskaia, Olga
@ 2015-11-12 21:57 ` Arnaldo Carvalho de Melo
  2015-11-12 22:04   ` Arnaldo Carvalho de Melo
  2015-11-12 23:09   ` Kornievskaia, Olga
  2015-11-16 18:08 ` Frank Ch. Eigler
  1 sibling, 2 replies; 18+ messages in thread
From: Arnaldo Carvalho de Melo @ 2015-11-12 21:57 UTC (permalink / raw)
  To: Kornievskaia, Olga; +Cc: linux-perf-users

Em Thu, Nov 12, 2015 at 09:07:20PM +0000, Kornievskaia, Olga escreveu:
> Hi, 
> 
> Can somebody tell me if perf can do “timing-of-operation” type of measurement? By that I mean, say there are two tracepoints “foobar_enter” and “foobar_exit”. Each tracepoint logs a timestamp. I’d like to be able to say that on average it takes that many time-units between “enter” and “exit” tracepoints. I haven’t been able to find something like that with “perf”.

Well, you can do:

  # perf record -a -e sched:*waking*,sched:*wakeup* sleep 2
  [ perf record: Woken up 1 times to write data ]
  [ perf record: Captured and wrote 0.199 MB perf.data (156 samples) ]
  # perf script -g python
  generated Python script: perf-script.py
  # grep '^def ' perf-script.py 
  def trace_begin():
  def trace_end():
  def sched__sched_wakeup_new(event_name, context, common_cpu,
  def sched__sched_wakeup(event_name, context, common_cpu,
  def sched__sched_waking(event_name, context, common_cpu,
  def trace_unhandled(event_name, context, event_fields_dict):
  def print_header(event_name, cpu, secs, nsecs, pid, comm):
  #

The prototype for sched__sched_wakeup(), for instance is:

def sched__sched_wakeup(event_name, context, common_cpu,
        common_secs, common_nsecs, common_pid, common_comm,
        common_callchain, comm, pid, prio, success,
        target_cpu):

Then do some minimal coding for what you want, because you need
to match it using some key, i.e. same pid is enough for some cases, for
others you need pid + fd, etc.

	If you prefer perl, just use 'perf script -g perl' instead.

	If all you want is for syscalls, then 'perf trace' does get
enter+exit and does the math for you:

  # perf trace usleep 1 2>&1 | head -10
     0.083 ( 0.003 ms): usleep/5487 brk(                                                                  ) = 0x560cf53db000
     0.100 ( 0.003 ms): usleep/5487 mmap(len: 4096, prot: READ|WRITE, flags: PRIVATE|ANONYMOUS, fd: -1    ) = 0x7f38f3617000
     0.108 ( 0.005 ms): usleep/5487 access(filename: 0xf3415270, mode: R                                  ) = -1 ENOENT No such file or directory
     0.118 ( 0.005 ms): usleep/5487 open(filename: 0xf3413b92, flags: CLOEXEC                             ) = 3
     0.122 ( 0.003 ms): usleep/5487 fstat(fd: 3, statbuf: 0x7ffeaa0ad9a0                                  ) = 0
     0.126 ( 0.003 ms): usleep/5487 mmap(len: 111605, prot: READ, flags: PRIVATE, fd: 3                   ) = 0x7f38f35fb000
     0.128 ( 0.001 ms): usleep/5487 close(fd: 3                                                           ) = 0
     0.139 ( 0.004 ms): usleep/5487 open(filename: 0xf360c37c, flags: CLOEXEC                             ) = 3
     0.142 ( 0.002 ms): usleep/5487 read(fd: 3, buf: 0x7ffeaa0adb30, count: 832                           ) = 832
     0.145 ( 0.001 ms): usleep/5487 fstat(fd: 3, statbuf: 0x7ffeaa0ad9d0                                  ) = 0
  # 

And you can specify, just like with strace, what are the syscalls you want (or
the ones you don't, prefixing with !):

  # perf trace -e nanosleep sleep 1
  1000.744 (1000.239 ms): sleep/5661 nanosleep(rqtp: 0x7ffc4dce2960                                        ) = 0
  # 

You can also mix it with any other tracepoint, including any you have created
with 'perf probe', for kernel or userspace:

  # perf trace --event sched:*exit,sched:*switch,kmem:kmalloc -e nanosleep sleep 1 2>&1 | tail -5
     0.343 (           ): kmem:kmalloc:(perf_event_mmap+0xc7) call_site=ffffffff81194617 ptr=0xffff88022a4ad000 bytes_req=4096 bytes_alloc=4096 gfp_flags=GFP_KERNEL)
     0.484 (   0.006 ms): sleep/6081 nanosleep(rqtp: 0x7ffdbcc9b6e0                                        ) ...
     0.484 (           ): sched:sched_switch:sleep:6081 [120] S ==> tail:6080 [120])
  1000.664 (1000.186 ms): sleep/6081  ... [continued]: nanosleep()) = 0
  1000.800 (           ): sched:sched_process_exit:comm=sleep pid=6081 prio=120)
  #

- Arnaldo

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: timing information with perf
  2015-11-12 21:57 ` Arnaldo Carvalho de Melo
@ 2015-11-12 22:04   ` Arnaldo Carvalho de Melo
  2015-11-12 23:19     ` David Ahern
  2015-11-12 23:09   ` Kornievskaia, Olga
  1 sibling, 1 reply; 18+ messages in thread
From: Arnaldo Carvalho de Melo @ 2015-11-12 22:04 UTC (permalink / raw)
  To: Kornievskaia, Olga; +Cc: linux-perf-users

Em Thu, Nov 12, 2015 at 06:57:28PM -0300, Arnaldo Carvalho de Melo escreveu:
> Em Thu, Nov 12, 2015 at 09:07:20PM +0000, Kornievskaia, Olga escreveu:

> > Can somebody tell me if perf can do “timing-of-operation” type of
> > measurement? By that I mean, say there are two tracepoints
> > “foobar_enter” and “foobar_exit”. Each tracepoint logs a timestamp.
> > I’d like to be able to say that on average it takes that many
> > time-units between “enter” and “exit” tracepoints. I haven’t been
> > able to find something like that with “perf”.
 
> Well, you can do:
> 
>   # perf record -a -e sched:*waking*,sched:*wakeup* sleep 2
>   [ perf record: Woken up 1 times to write data ]
>   [ perf record: Captured and wrote 0.199 MB perf.data (156 samples) ]
>   # perf script -g python
>   generated Python script: perf-script.py
>   # grep '^def ' perf-script.py 
>   def trace_begin():
>   def trace_end():
>   def sched__sched_wakeup_new(event_name, context, common_cpu,
>   def sched__sched_wakeup(event_name, context, common_cpu,
>   def sched__sched_waking(event_name, context, common_cpu,
>   def trace_unhandled(event_name, context, event_fields_dict):
>   def print_header(event_name, cpu, secs, nsecs, pid, comm):
>   #
> 
> The prototype for sched__sched_wakeup(), for instance is:
> 
> def sched__sched_wakeup(event_name, context, common_cpu,
>         common_secs, common_nsecs, common_pid, common_comm,
>         common_callchain, comm, pid, prio, success,
>         target_cpu):
> 
> Then do some minimal coding for what you want, because you need
> to match it using some key, i.e. same pid is enough for some cases, for
> others you need pid + fd, etc.

If you do not change that script, then running it will produce:

  # perf script -s perf-script.p
  in trace_begin
  sched__sched_waking 2 09156.170326579 16181 perf comm=perf, pid=16182, prio=120, success=1, target_cpu=3
  sched__sched_wakeup 2 09156.170329797 16181 perf comm=perf, pid=16182, prio=120, success=1, target_cpu=3
  sched__sched_waking 3 09156.170446653 16182 perf comm=migration/3, pid=28, prio=0, success=1, target_cpu=3
<SNIP>
  in trace_end
 
- Arnaldo

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: timing information with perf
  2015-11-12 21:57 ` Arnaldo Carvalho de Melo
  2015-11-12 22:04   ` Arnaldo Carvalho de Melo
@ 2015-11-12 23:09   ` Kornievskaia, Olga
  2015-11-12 23:22     ` David Ahern
  2015-11-12 23:34     ` Arnaldo Carvalho de Melo
  1 sibling, 2 replies; 18+ messages in thread
From: Kornievskaia, Olga @ 2015-11-12 23:09 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo; +Cc: Kornievskaia, Olga, linux-perf-users


> On Nov 12, 2015, at 4:57 PM, Arnaldo Carvalho de Melo <acme@kernel.org> wrote:
> 
> Em Thu, Nov 12, 2015 at 09:07:20PM +0000, Kornievskaia, Olga escreveu:
>> Hi, 
>> 
>> Can somebody tell me if perf can do “timing-of-operation” type of measurement? By that I mean, say there are two tracepoints “foobar_enter” and “foobar_exit”. Each tracepoint logs a timestamp. I’d like to be able to say that on average it takes that many time-units between “enter” and “exit” tracepoints. I haven’t been able to find something like that with “perf”.
> 
> Well, you can do:
> 
>  # perf record -a -e sched:*waking*,sched:*wakeup* sleep 2
>  [ perf record: Woken up 1 times to write data ]
>  [ perf record: Captured and wrote 0.199 MB perf.data (156 samples) ]
>  # perf script -g python
>  generated Python script: perf-script.py
>  # grep '^def ' perf-script.py 
>  def trace_begin():
>  def trace_end():
>  def sched__sched_wakeup_new(event_name, context, common_cpu,
>  def sched__sched_wakeup(event_name, context, common_cpu,
>  def sched__sched_waking(event_name, context, common_cpu,
>  def trace_unhandled(event_name, context, event_fields_dict):
>  def print_header(event_name, cpu, secs, nsecs, pid, comm):
>  #
> 
> The prototype for sched__sched_wakeup(), for instance is:
> 
> def sched__sched_wakeup(event_name, context, common_cpu,
>        common_secs, common_nsecs, common_pid, common_comm,
>        common_callchain, comm, pid, prio, success,
>        target_cpu):
> 
> Then do some minimal coding for what you want, because you need
> to match it using some key, i.e. same pid is enough for some cases, for
> others you need pid + fd, etc.
> 
> 	If you prefer perl, just use 'perf script -g perl' instead.
> 
> 	If all you want is for syscalls, then 'perf trace' does get
> enter+exit and does the math for you:
> 
>  # perf trace usleep 1 2>&1 | head -10
>     0.083 ( 0.003 ms): usleep/5487 brk(                                                                  ) = 0x560cf53db000
>     0.100 ( 0.003 ms): usleep/5487 mmap(len: 4096, prot: READ|WRITE, flags: PRIVATE|ANONYMOUS, fd: -1    ) = 0x7f38f3617000
>     0.108 ( 0.005 ms): usleep/5487 access(filename: 0xf3415270, mode: R                                  ) = -1 ENOENT No such file or directory
>     0.118 ( 0.005 ms): usleep/5487 open(filename: 0xf3413b92, flags: CLOEXEC                             ) = 3
>     0.122 ( 0.003 ms): usleep/5487 fstat(fd: 3, statbuf: 0x7ffeaa0ad9a0                                  ) = 0
>     0.126 ( 0.003 ms): usleep/5487 mmap(len: 111605, prot: READ, flags: PRIVATE, fd: 3                   ) = 0x7f38f35fb000
>     0.128 ( 0.001 ms): usleep/5487 close(fd: 3                                                           ) = 0
>     0.139 ( 0.004 ms): usleep/5487 open(filename: 0xf360c37c, flags: CLOEXEC                             ) = 3
>     0.142 ( 0.002 ms): usleep/5487 read(fd: 3, buf: 0x7ffeaa0adb30, count: 832                           ) = 832
>     0.145 ( 0.001 ms): usleep/5487 fstat(fd: 3, statbuf: 0x7ffeaa0ad9d0                                  ) = 0
>  # 
> 
> And you can specify, just like with strace, what are the syscalls you want (or
> the ones you don't, prefixing with !):
> 
>  # perf trace -e nanosleep sleep 1
>  1000.744 (1000.239 ms): sleep/5661 nanosleep(rqtp: 0x7ffc4dce2960                                        ) = 0
>  # 
> 
> You can also mix it with any other tracepoint, including any you have created
> with 'perf probe', for kernel or userspace:
> 
>  # perf trace --event sched:*exit,sched:*switch,kmem:kmalloc -e nanosleep sleep 1 2>&1 | tail -5
>     0.343 (           ): kmem:kmalloc:(perf_event_mmap+0xc7) call_site=ffffffff81194617 ptr=0xffff88022a4ad000 bytes_req=4096 bytes_alloc=4096 gfp_flags=GFP_KERNEL)
>     0.484 (   0.006 ms): sleep/6081 nanosleep(rqtp: 0x7ffdbcc9b6e0                                        ) ...
>     0.484 (           ): sched:sched_switch:sleep:6081 [120] S ==> tail:6080 [120])
>  1000.664 (1000.186 ms): sleep/6081  ... [continued]: nanosleep()) = 0
>  1000.800 (           ): sched:sched_process_exit:comm=sleep pid=6081 prio=120)
>  #

Thank you Arnaldo for the answer. 

Just to elaborate further, what I’m interested in timings of NFS operations, so I’m not using any new tracepoints, but just the ones that are present in the code. However since NFS doesn’t various things outside of the process calling into NFS then will “perf trace” still work? 

I’m using “perf record -e ‘nfs4:*’,’nfs:*’ -ag” command to generate tracing info. That captures tracepoints executed by the application as well as the kernel threads that do asynchronous IO. 

> 
> - Arnaldo


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: timing information with perf
  2015-11-12 22:04   ` Arnaldo Carvalho de Melo
@ 2015-11-12 23:19     ` David Ahern
  0 siblings, 0 replies; 18+ messages in thread
From: David Ahern @ 2015-11-12 23:19 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo, Kornievskaia, Olga; +Cc: linux-perf-users

On 11/12/15 3:04 PM, Arnaldo Carvalho de Melo wrote:
> Em Thu, Nov 12, 2015 at 06:57:28PM -0300, Arnaldo Carvalho de Melo escreveu:
>> Em Thu, Nov 12, 2015 at 09:07:20PM +0000, Kornievskaia, Olga escreveu:
>
>>> Can somebody tell me if perf can do “timing-of-operation” type of
>>> measurement? By that I mean, say there are two tracepoints
>>> “foobar_enter” and “foobar_exit”. Each tracepoint logs a timestamp.
>>> I’d like to be able to say that on average it takes that many
>>> time-units between “enter” and “exit” tracepoints. I haven’t been
>>> able to find something like that with “perf”.
>
>> Well, you can do:
>>
>>    # perf record -a -e sched:*waking*,sched:*wakeup* sleep 2
>>    [ perf record: Woken up 1 times to write data ]
>>    [ perf record: Captured and wrote 0.199 MB perf.data (156 samples) ]
>>    # perf script -g python
>>    generated Python script: perf-script.py
>>    # grep '^def ' perf-script.py
>>    def trace_begin():
>>    def trace_end():
>>    def sched__sched_wakeup_new(event_name, context, common_cpu,
>>    def sched__sched_wakeup(event_name, context, common_cpu,
>>    def sched__sched_waking(event_name, context, common_cpu,
>>    def trace_unhandled(event_name, context, event_fields_dict):
>>    def print_header(event_name, cpu, secs, nsecs, pid, comm):
>>    #
>>
>> The prototype for sched__sched_wakeup(), for instance is:
>>
>> def sched__sched_wakeup(event_name, context, common_cpu,
>>          common_secs, common_nsecs, common_pid, common_comm,
>>          common_callchain, comm, pid, prio, success,
>>          target_cpu):
>>
>> Then do some minimal coding for what you want, because you need
>> to match it using some key, i.e. same pid is enough for some cases, for
>> others you need pid + fd, etc.
>
> If you do not change that script, then running it will produce:
>
>    # perf script -s perf-script.p
>    in trace_begin
>    sched__sched_waking 2 09156.170326579 16181 perf comm=perf, pid=16182, prio=120, success=1, target_cpu=3
>    sched__sched_wakeup 2 09156.170329797 16181 perf comm=perf, pid=16182, prio=120, success=1, target_cpu=3
>    sched__sched_waking 3 09156.170446653 16182 perf comm=migration/3, pid=28, prio=0, success=1, target_cpu=3
> <SNIP>
>    in trace_end

I think he wants to average the time difference between 2 tracepoints. 
You should be able to run the output of perf script through an awk 
script to get it.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: timing information with perf
  2015-11-12 23:09   ` Kornievskaia, Olga
@ 2015-11-12 23:22     ` David Ahern
  2015-11-12 23:34     ` Arnaldo Carvalho de Melo
  1 sibling, 0 replies; 18+ messages in thread
From: David Ahern @ 2015-11-12 23:22 UTC (permalink / raw)
  To: Kornievskaia, Olga, Arnaldo Carvalho de Melo; +Cc: linux-perf-users

On 11/12/15 4:09 PM, Kornievskaia, Olga wrote:
> Just to elaborate further, what I’m interested in timings of NFS operations, so I’m not using any new tracepoints, but just the ones that are present in the code. However since NFS doesn’t various things outside of the process calling into NFS then will “perf trace” still work?
>
> I’m using “perf record -e ‘nfs4:*’,’nfs:*’ -ag” command to generate tracing info. That captures tracepoints executed by the application as well as the kernel threads that do asynchronous IO.

Do you really need callchains? (-g option)

After the perf-record run perf-script to dump the samples. Pipe that 
into an awk script (or python or bash or ...) and you can compute what 
you want.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: timing information with perf
  2015-11-12 23:09   ` Kornievskaia, Olga
  2015-11-12 23:22     ` David Ahern
@ 2015-11-12 23:34     ` Arnaldo Carvalho de Melo
  2015-11-13 17:01       ` Kornievskaia, Olga
  1 sibling, 1 reply; 18+ messages in thread
From: Arnaldo Carvalho de Melo @ 2015-11-12 23:34 UTC (permalink / raw)
  To: Kornievskaia, Olga; +Cc: linux-perf-users

Em Thu, Nov 12, 2015 at 11:09:48PM +0000, Kornievskaia, Olga escreveu:
> Just to elaborate further, what I’m interested in timings of NFS
> operations, so I’m not using any new tracepoints, but just the ones
> that are present in the code. However since NFS doesn’t various things
> outside of the process calling into NFS then will “perf trace” still
> work? 
 
> I’m using “perf record -e ‘nfs4:*’,’nfs:*’ -ag” command to generate
> tracing info. That captures tracepoints executed by the application as
> well as the kernel threads that do asynchronous IO. 

So, you can replace 'perf record' with 'perf trace' and what will be
captured will be the same, its just that 'perf trace' still doesn't
support callchains (-g), but as David Ahern asked, do you want those?

Also, using -a you're not copying just the apps, but the whole system,
perhaps you want something like:

[root@felicio /]# perf trace --no-syscalls --event nfs4:*’,’nfs:*’ cp /etc/passwd /wb
     0.877 nfs:nfs_getattr_enter:fileid=00:26:2359297 fhandle=0x675bf9ee version=6216388797857602399 )
     0.891 nfs:nfs_revalidate_inode_enter:fileid=00:26:2359297 fhandle=0x675bf9ee version=6216388797857602399 )
     1.440 nfs4:nfs4_map_name_to_uid:error=0 id=0 name=0)
     1.445 nfs4:nfs4_map_group_to_gid:error=0 id=0 name=0)
     1.455 nfs4:nfs4_getattr:error=0 (ACCESS) fileid=00:26:2359297 fhandle=0x675bf9ee valid=TYPE|MODE|NLINK|OWNER|GROUP|RDEV|SIZE|FSID|FILEID|ATIME|MTIME|CTIME|CHANGE)
     1.460 nfs:nfs_refresh_inode_enter:fileid=00:26:2359297 fhandle=0x675bf9ee version=6216388797857602399 )
     1.466 nfs:nfs_refresh_inode_exit:error=0 fileid=00:26:2359297 fhandle=0x675bf9ee type=4 (DIR) version=6216388797857602399 size=16384 cache_validity=0 () nfs_flags=4 ())
     1.471 nfs:nfs_revalidate_inode_exit:error=0 fileid=00:26:2359297 fhandle=0x675bf9ee type=4 (DIR) version=6216388797857602399 size=16384 cache_validity=0 () nfs_flags=4 ())
     1.476 nfs:nfs_getattr_exit:error=0 fileid=00:26:2359297 fhandle=0x675bf9ee type=4 (DIR) version=6216388797857602399 size=16384 cache_validity=0 () nfs_flags=4 ())
     1.493 nfs:nfs_access_enter:fileid=00:26:2359297 fhandle=0x675bf9ee version=6216388797857602399 )
     1.498 nfs:nfs_access_exit:error=0 fileid=00:26:2359297 fhandle=0x675bf9ee type=4 (DIR) version=6216388797857602399 size=16384 cache_validity=0 () nfs_flags=4 ())
     1.504 nfs:nfs_lookup_enter:flags=1 () name=00:26:2359297/passwd)
     1.940 nfs4:nfs4_lookup:error=-2 () name=00:26:2359297/passwd)
     1.948 nfs:nfs_lookup_exit:error=-2 flags=1 () name=00:26:2359297/passwd)
     1.966 nfs:nfs_access_enter:fileid=00:26:2359297 fhandle=0x675bf9ee version=6216388797857602399 )
     1.971 nfs:nfs_access_exit:error=0 fileid=00:26:2359297 fhandle=0x675bf9ee type=4 (DIR) version=6216388797857602399 size=16384 cache_validity=0 () nfs_flags=4 ())
     2.124 nfs:nfs_atomic_open_enter:flags=32961 (O_CREAT|O_EXCL) fmode=WRITE name=00:26:2359297/passwd)
     8.402 nfs4:nfs4_map_name_to_uid:error=0 id=0 name=0)
     8.421 nfs4:nfs4_map_group_to_gid:error=0 id=0 name=0)
    13.156 nfs4:nfs4_map_name_to_uid:error=0 id=0 name=0)
    13.175 nfs4:nfs4_map_group_to_gid:error=0 id=0 name=0)
    13.187 nfs4:nfs4_setattr:error=0 (ACCESS) fileid=00:26:2360376 fhandle=0xfd52be06)
    13.199 nfs4:nfs4_open_file:error=0 (ACCESS) flags=32961 (O_CREAT|O_EXCL) fmode=WRITE fileid=00:26:2360376 fhandle=0xfd52be06 name=00:26:2359297/passwd)
    13.212 nfs:nfs_atomic_open_exit:error=0 flags=32961 (O_CREAT|O_EXCL) fmode=WRITE name=00:26:2359297/passwd)
    13.223 nfs:nfs_getattr_enter:fileid=00:26:2360376 fhandle=0xfd52be06 version=6216410651367497762 )
    13.229 nfs:nfs_writeback_inode_enter:fileid=00:26:2360376 fhandle=0xfd52be06 version=6216410651367497762 )
    13.235 nfs:nfs_writeback_inode_exit:error=0 fileid=00:26:2360376 fhandle=0xfd52be06 type=8 (REG) version=6216410651367497762 size=0 cache_validity=24 (INVALID_ACCESS|INVALID_ACL) nfs_flags=0 ())
    13.239 nfs:nfs_getattr_exit:error=0 fileid=00:26:2360376 fhandle=0xfd52be06 type=8 (REG) version=6216410651367497762 size=0 cache_validity=24 (INVALID_ACCESS|INVALID_ACL) nfs_flags=0 ())
    13.275 nfs:nfs_fsync_enter:fileid=00:26:2360376 fhandle=0xfd52be06 version=6216410651367497762 )
    18.359 nfs:nfs_fsync_exit:error=0 fileid=00:26:2360376 fhandle=0xfd52be06 type=8 (REG) version=6216410651372497681 size=1576 cache_validity=25 (INVALID_ATTR|INVALID_ACCESS|INVALID_ACL) nfs_flags=0 ())
[root@felicio /]# 

If you remove the --no-syscalls, it gets even nicer, here is a slice:

    10.909 ( 7.182 ms): cp/14149  ... [continued]: open()) = 4
    10.922 ( 0.005 ms): cp/14149 fstat(fd: 4, statbuf: 0x7ffe38a34be0                                  ) ...
    10.922 (         ): nfs:nfs_getattr_enter:fileid=00:26:2360376 fhandle=0xfd52be06 version=6216410999163535246 )
    10.927 (         ): nfs:nfs_writeback_inode_enter:fileid=00:26:2360376 fhandle=0xfd52be06 version=6216410999163535246 )
    10.931 (         ): nfs:nfs_writeback_inode_exit:error=0 fileid=00:26:2360376 fhandle=0xfd52be06 type=8 (REG) version=6216410999163535246 size=0 cache_validity=24 (INVALID_ACCESS|INVALID_ACL) nfs_flags=4 ())
    10.936 (         ): nfs:nfs_getattr_exit:error=0 fileid=00:26:2360376 fhandle=0xfd52be06 type=8 (REG) version=6216410999163535246 size=0 cache_validity=24 (INVALID_ACCESS|INVALID_ACL) nfs_flags=4 ())
    10.948 ( 0.031 ms): cp/14149  ... [continued]: fstat()) = 0
    10.959 ( 0.005 ms): cp/14149 fadvise64(fd: 3</etc/passwd>, advice: 2                               ) = 0
    10.978 ( 0.013 ms): cp/14149 mmap(len: 1056768, prot: READ|WRITE, flags: PRIVATE|ANONYMOUS, fd: -1 ) = 0x7f4de0fd3000
    10.994 ( 0.008 ms): cp/14149 read(fd: 3</etc/passwd>, buf: 0x7f4de0fd4000, count: 1048576          ) = 1576
    11.012 ( 0.013 ms): cp/14149 write(fd: 4, buf: 0x7f4de0fd4000, count: 1576                         ) = 1576
    11.022 ( 0.005 ms): cp/14149 read(fd: 3</etc/passwd>, buf: 0x7f4de0fd4000, count: 1048576          ) = 0
    11.033 ( 0.005 ms): cp/14149 close(fd: 4                                                           ) ...
    11.033 (         ): nfs:nfs_fsync_enter:fileid=00:26:2360376 fhandle=0xfd52be06 version=6216410999163535246 )
    15.535 (         ): nfs:nfs_fsync_exit:error=0 fileid=00:26:2360376 fhandle=0xfd52be06 type=8 (REG) version=6216410999169535149 size=1576 cache_validity=25 (INVALID_ATTR|INVALID_ACCESS|INVALID_ACL) nfs_flags=4 ())
    15.544 ( 4.517 ms): cp/14149  ... [continued]: close()) = 0
    16.058 ( 0.011 ms): cp/14149 close(fd: 3</etc/passwd>                                              ) = 0

Humm, so you want to pair things like:

    10.927 (         ): nfs:nfs_writeback_inode_enter:fileid=00:26:2360376 fhandle=0xfd52be06 version=6216410999163535246 )
    10.931 (         ): nfs:nfs_writeback_inode_exit:error=0 fileid=00:26:2360376 fhandle=0xfd52be06 type=8 (REG) version=6216410999163535246 size=0 cache_validity=24 (INVALID_ACCESS|INVALID_ACL) nfs_flags=4 ())

Using fileid as the key i addition to the common_pid, etc? That would be a nice
feature to have in 'perf trace'... :-)

There is also:

    11.033 (         ): nfs:nfs_fsync_enter:fileid=00:26:2360376 fhandle=0xfd52be06 version=6216410999163535246 )
    15.535 (         ): nfs:nfs_fsync_exit:error=0 fileid=00:26:2360376 fhandle=0xfd52be06 type=8 (REG) version=6216410999169535149 size=1576 cache_validity=25 (INVALID_ATTR|INVALID_ACCESS|INVALID_ACL) nfs_flags=4 ())

Well, you also can just get these and do some scripting, no, in this case:

   15.535 - 11.033 = 4.502ms

for that nfs_fsync :-)

- Arnaldo

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: timing information with perf
  2015-11-12 23:34     ` Arnaldo Carvalho de Melo
@ 2015-11-13 17:01       ` Kornievskaia, Olga
  2015-11-13 19:47         ` Arnaldo Carvalho de Melo
  0 siblings, 1 reply; 18+ messages in thread
From: Kornievskaia, Olga @ 2015-11-13 17:01 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo; +Cc: Kornievskaia, Olga, linux-perf-users


> On Nov 12, 2015, at 6:34 PM, Arnaldo Carvalho de Melo <acme@kernel.org> wrote:
> 
> Em Thu, Nov 12, 2015 at 11:09:48PM +0000, Kornievskaia, Olga escreveu:
>> Just to elaborate further, what I’m interested in timings of NFS
>> operations, so I’m not using any new tracepoints, but just the ones
>> that are present in the code. However since NFS doesn’t various things
>> outside of the process calling into NFS then will “perf trace” still
>> work? 
> 
>> I’m using “perf record -e ‘nfs4:*’,’nfs:*’ -ag” command to generate
>> tracing info. That captures tracepoints executed by the application as
>> well as the kernel threads that do asynchronous IO. 
> 
> So, you can replace 'perf record' with 'perf trace' and what will be
> captured will be the same, its just that 'perf trace' still doesn't
> support callchains (-g), but as David Ahern asked, do you want those?
> 
> Also, using -a you're not copying just the apps, but the whole system,
> perhaps you want something like:
> 
> [root@felicio /]# perf trace --no-syscalls --event nfs4:*’,’nfs:*’ cp /etc/passwd /wb
>     0.877 nfs:nfs_getattr_enter:fileid=00:26:2359297 fhandle=0x675bf9ee version=6216388797857602399 )
>     0.891 nfs:nfs_revalidate_inode_enter:fileid=00:26:2359297 fhandle=0x675bf9ee version=6216388797857602399 )
>     1.440 nfs4:nfs4_map_name_to_uid:error=0 id=0 name=0)
>     1.445 nfs4:nfs4_map_group_to_gid:error=0 id=0 name=0)
>     1.455 nfs4:nfs4_getattr:error=0 (ACCESS) fileid=00:26:2359297 fhandle=0x675bf9ee valid=TYPE|MODE|NLINK|OWNER|GROUP|RDEV|SIZE|FSID|FILEID|ATIME|MTIME|CTIME|CHANGE)
>     1.460 nfs:nfs_refresh_inode_enter:fileid=00:26:2359297 fhandle=0x675bf9ee version=6216388797857602399 )
>     1.466 nfs:nfs_refresh_inode_exit:error=0 fileid=00:26:2359297 fhandle=0x675bf9ee type=4 (DIR) version=6216388797857602399 size=16384 cache_validity=0 () nfs_flags=4 ())
>     1.471 nfs:nfs_revalidate_inode_exit:error=0 fileid=00:26:2359297 fhandle=0x675bf9ee type=4 (DIR) version=6216388797857602399 size=16384 cache_validity=0 () nfs_flags=4 ())
>     1.476 nfs:nfs_getattr_exit:error=0 fileid=00:26:2359297 fhandle=0x675bf9ee type=4 (DIR) version=6216388797857602399 size=16384 cache_validity=0 () nfs_flags=4 ())
>     1.493 nfs:nfs_access_enter:fileid=00:26:2359297 fhandle=0x675bf9ee version=6216388797857602399 )
>     1.498 nfs:nfs_access_exit:error=0 fileid=00:26:2359297 fhandle=0x675bf9ee type=4 (DIR) version=6216388797857602399 size=16384 cache_validity=0 () nfs_flags=4 ())
>     1.504 nfs:nfs_lookup_enter:flags=1 () name=00:26:2359297/passwd)
>     1.940 nfs4:nfs4_lookup:error=-2 () name=00:26:2359297/passwd)
>     1.948 nfs:nfs_lookup_exit:error=-2 flags=1 () name=00:26:2359297/passwd)
>     1.966 nfs:nfs_access_enter:fileid=00:26:2359297 fhandle=0x675bf9ee version=6216388797857602399 )
>     1.971 nfs:nfs_access_exit:error=0 fileid=00:26:2359297 fhandle=0x675bf9ee type=4 (DIR) version=6216388797857602399 size=16384 cache_validity=0 () nfs_flags=4 ())
>     2.124 nfs:nfs_atomic_open_enter:flags=32961 (O_CREAT|O_EXCL) fmode=WRITE name=00:26:2359297/passwd)
>     8.402 nfs4:nfs4_map_name_to_uid:error=0 id=0 name=0)
>     8.421 nfs4:nfs4_map_group_to_gid:error=0 id=0 name=0)
>    13.156 nfs4:nfs4_map_name_to_uid:error=0 id=0 name=0)
>    13.175 nfs4:nfs4_map_group_to_gid:error=0 id=0 name=0)
>    13.187 nfs4:nfs4_setattr:error=0 (ACCESS) fileid=00:26:2360376 fhandle=0xfd52be06)
>    13.199 nfs4:nfs4_open_file:error=0 (ACCESS) flags=32961 (O_CREAT|O_EXCL) fmode=WRITE fileid=00:26:2360376 fhandle=0xfd52be06 name=00:26:2359297/passwd)
>    13.212 nfs:nfs_atomic_open_exit:error=0 flags=32961 (O_CREAT|O_EXCL) fmode=WRITE name=00:26:2359297/passwd)
>    13.223 nfs:nfs_getattr_enter:fileid=00:26:2360376 fhandle=0xfd52be06 version=6216410651367497762 )
>    13.229 nfs:nfs_writeback_inode_enter:fileid=00:26:2360376 fhandle=0xfd52be06 version=6216410651367497762 )
>    13.235 nfs:nfs_writeback_inode_exit:error=0 fileid=00:26:2360376 fhandle=0xfd52be06 type=8 (REG) version=6216410651367497762 size=0 cache_validity=24 (INVALID_ACCESS|INVALID_ACL) nfs_flags=0 ())
>    13.239 nfs:nfs_getattr_exit:error=0 fileid=00:26:2360376 fhandle=0xfd52be06 type=8 (REG) version=6216410651367497762 size=0 cache_validity=24 (INVALID_ACCESS|INVALID_ACL) nfs_flags=0 ())
>    13.275 nfs:nfs_fsync_enter:fileid=00:26:2360376 fhandle=0xfd52be06 version=6216410651367497762 )
>    18.359 nfs:nfs_fsync_exit:error=0 fileid=00:26:2360376 fhandle=0xfd52be06 type=8 (REG) version=6216410651372497681 size=1576 cache_validity=25 (INVALID_ATTR|INVALID_ACCESS|INVALID_ACL) nfs_flags=0 ())
> [root@felicio /]# 
> 
> If you remove the --no-syscalls, it gets even nicer, here is a slice:
> 
>    10.909 ( 7.182 ms): cp/14149  ... [continued]: open()) = 4
>    10.922 ( 0.005 ms): cp/14149 fstat(fd: 4, statbuf: 0x7ffe38a34be0                                  ) ...
>    10.922 (         ): nfs:nfs_getattr_enter:fileid=00:26:2360376 fhandle=0xfd52be06 version=6216410999163535246 )
>    10.927 (         ): nfs:nfs_writeback_inode_enter:fileid=00:26:2360376 fhandle=0xfd52be06 version=6216410999163535246 )
>    10.931 (         ): nfs:nfs_writeback_inode_exit:error=0 fileid=00:26:2360376 fhandle=0xfd52be06 type=8 (REG) version=6216410999163535246 size=0 cache_validity=24 (INVALID_ACCESS|INVALID_ACL) nfs_flags=4 ())
>    10.936 (         ): nfs:nfs_getattr_exit:error=0 fileid=00:26:2360376 fhandle=0xfd52be06 type=8 (REG) version=6216410999163535246 size=0 cache_validity=24 (INVALID_ACCESS|INVALID_ACL) nfs_flags=4 ())
>    10.948 ( 0.031 ms): cp/14149  ... [continued]: fstat()) = 0
>    10.959 ( 0.005 ms): cp/14149 fadvise64(fd: 3</etc/passwd>, advice: 2                               ) = 0
>    10.978 ( 0.013 ms): cp/14149 mmap(len: 1056768, prot: READ|WRITE, flags: PRIVATE|ANONYMOUS, fd: -1 ) = 0x7f4de0fd3000
>    10.994 ( 0.008 ms): cp/14149 read(fd: 3</etc/passwd>, buf: 0x7f4de0fd4000, count: 1048576          ) = 1576
>    11.012 ( 0.013 ms): cp/14149 write(fd: 4, buf: 0x7f4de0fd4000, count: 1576                         ) = 1576
>    11.022 ( 0.005 ms): cp/14149 read(fd: 3</etc/passwd>, buf: 0x7f4de0fd4000, count: 1048576          ) = 0
>    11.033 ( 0.005 ms): cp/14149 close(fd: 4                                                           ) ...
>    11.033 (         ): nfs:nfs_fsync_enter:fileid=00:26:2360376 fhandle=0xfd52be06 version=6216410999163535246 )
>    15.535 (         ): nfs:nfs_fsync_exit:error=0 fileid=00:26:2360376 fhandle=0xfd52be06 type=8 (REG) version=6216410999169535149 size=1576 cache_validity=25 (INVALID_ATTR|INVALID_ACCESS|INVALID_ACL) nfs_flags=4 ())
>    15.544 ( 4.517 ms): cp/14149  ... [continued]: close()) = 0
>    16.058 ( 0.011 ms): cp/14149 close(fd: 3</etc/passwd>                                              ) = 0
> 
> Humm, so you want to pair things like:
> 
>    10.927 (         ): nfs:nfs_writeback_inode_enter:fileid=00:26:2360376 fhandle=0xfd52be06 version=6216410999163535246 )
>    10.931 (         ): nfs:nfs_writeback_inode_exit:error=0 fileid=00:26:2360376 fhandle=0xfd52be06 type=8 (REG) version=6216410999163535246 size=0 cache_validity=24 (INVALID_ACCESS|INVALID_ACL) nfs_flags=4 ())
> 
> Using fileid as the key i addition to the common_pid, etc? That would be a nice
> feature to have in 'perf trace'... :-)
> 
> There is also:
> 
>    11.033 (         ): nfs:nfs_fsync_enter:fileid=00:26:2360376 fhandle=0xfd52be06 version=6216410999163535246 )
>    15.535 (         ): nfs:nfs_fsync_exit:error=0 fileid=00:26:2360376 fhandle=0xfd52be06 type=8 (REG) version=6216410999169535149 size=1576 cache_validity=25 (INVALID_ATTR|INVALID_ACCESS|INVALID_ACL) nfs_flags=4 ())
> 
> Well, you also can just get these and do some scripting, no, in this case:
> 
>   15.535 - 11.033 = 4.502ms
> 
> for that nfs_fsync :-)

David and Arnaldo thank you your replies. I can see that to acquire the timings for non-syscall events requires scripting. Would I be missing something by not using “perf” and instead doing scripting on the output that I get from the trace_pipe? I guess instead of the numeric timestamps that are present in the raw tracepoint information, I’ll get a converted value in seconds from perf. Is there something else I’d be missing?

Thank you.



^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: timing information with perf
  2015-11-13 17:01       ` Kornievskaia, Olga
@ 2015-11-13 19:47         ` Arnaldo Carvalho de Melo
  2015-11-13 21:57           ` Kornievskaia, Olga
  0 siblings, 1 reply; 18+ messages in thread
From: Arnaldo Carvalho de Melo @ 2015-11-13 19:47 UTC (permalink / raw)
  To: Kornievskaia, Olga; +Cc: linux-perf-users

Em Fri, Nov 13, 2015 at 05:01:57PM +0000, Kornievskaia, Olga escreveu:
> > On Nov 12, 2015, at 6:34 PM, Arnaldo Carvalho de Melo <acme@kernel.org> wrote:
> > There is also:
> > 
> >    11.033 (         ): nfs:nfs_fsync_enter:fileid=00:26:2360376 fhandle=0xfd52be06 version=6216410999163535246 )
> >    15.535 (         ): nfs:nfs_fsync_exit:error=0 fileid=00:26:2360376 fhandle=0xfd52be06 type=8 (REG) version=6216410999169535149 size=1576 cache_validity=25 (INVALID_ATTR|INVALID_ACCESS|INVALID_ACL) nfs_flags=4 ())
> > 
> > Well, you also can just get these and do some scripting, no, in this case:
> > 
> >   15.535 - 11.033 = 4.502ms
> > 
> > for that nfs_fsync :-)

> David and Arnaldo thank you your replies. I can see that to acquire
> the timings for non-syscall events requires scripting. Would I be
> missing something by not using “perf” and instead doing scripting on
> the output that I get from the trace_pipe? I guess instead of the

Both will present you with the timestamp and parameters for the enabled
tracepoints, so I think that at this state, yes, you will not get
anything better from perf than what you'll get from ftrace via debugfs.

But this comes from time to time, the logic used for
raw_syscalls:sys_enter -> raw_syscalls:sys_exit, using some set of keys
to match one to the other should be extended to be used with non-syscall
events, such as the ones you mention here.

i.e. whereas raw_syscalls:sys_exit has a "ret" variable to tell the
result of the sys_enter+sys_exit operation, nfs_fsync_exit has "error",
I guess, for the same purpose :)

- Arnaldo

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: timing information with perf
  2015-11-13 19:47         ` Arnaldo Carvalho de Melo
@ 2015-11-13 21:57           ` Kornievskaia, Olga
  0 siblings, 0 replies; 18+ messages in thread
From: Kornievskaia, Olga @ 2015-11-13 21:57 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo; +Cc: Kornievskaia, Olga, linux-perf-users


> On Nov 13, 2015, at 2:47 PM, Arnaldo Carvalho de Melo <acme@kernel.org> wrote:
> 
> Em Fri, Nov 13, 2015 at 05:01:57PM +0000, Kornievskaia, Olga escreveu:
>>> On Nov 12, 2015, at 6:34 PM, Arnaldo Carvalho de Melo <acme@kernel.org> wrote:
>>> There is also:
>>> 
>>>   11.033 (         ): nfs:nfs_fsync_enter:fileid=00:26:2360376 fhandle=0xfd52be06 version=6216410999163535246 )
>>>   15.535 (         ): nfs:nfs_fsync_exit:error=0 fileid=00:26:2360376 fhandle=0xfd52be06 type=8 (REG) version=6216410999169535149 size=1576 cache_validity=25 (INVALID_ATTR|INVALID_ACCESS|INVALID_ACL) nfs_flags=4 ())
>>> 
>>> Well, you also can just get these and do some scripting, no, in this case:
>>> 
>>>  15.535 - 11.033 = 4.502ms
>>> 
>>> for that nfs_fsync :-)
> 
>> David and Arnaldo thank you your replies. I can see that to acquire
>> the timings for non-syscall events requires scripting. Would I be
>> missing something by not using “perf” and instead doing scripting on
>> the output that I get from the trace_pipe? I guess instead of the
> 
> Both will present you with the timestamp and parameters for the enabled
> tracepoints, so I think that at this state, yes, you will not get
> anything better from perf than what you'll get from ftrace via debugfs.

Actually one thing that came to mind that I would like to clarification about. I know that tracepoints are “lossy” and that if the “reader” of the pipe is not consuming fast enough and the ring buffers get full then tracepoint information gets dropped and pipe logs that it lost events. When I run “perf record”, is it also recording every tracepoint or is it sampling (which I don’t exactly understand but I think of it as it just periodically captures the tracepoint info which basically means it’s losing them intentionally)?

> But this comes from time to time, the logic used for
> raw_syscalls:sys_enter -> raw_syscalls:sys_exit, using some set of keys
> to match one to the other should be extended to be used with non-syscall
> events, such as the ones you mention here.
> 
> i.e. whereas raw_syscalls:sys_exit has a "ret" variable to tell the
> result of the sys_enter+sys_exit operation, nfs_fsync_exit has "error",
> I guess, for the same purpose :)
> 

I don’t exactly follow what you are suggesting here. Are you saying I should look at the perf code that does parsing for raw sys calls and modify and use it for nfs tracepoints?



^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: timing information with perf
  2015-11-12 21:07 timing information with perf Kornievskaia, Olga
  2015-11-12 21:57 ` Arnaldo Carvalho de Melo
@ 2015-11-16 18:08 ` Frank Ch. Eigler
  2015-11-17  0:12   ` Arnaldo Carvalho de Melo
  1 sibling, 1 reply; 18+ messages in thread
From: Frank Ch. Eigler @ 2015-11-16 18:08 UTC (permalink / raw)
  To: Kornievskaia, Olga; +Cc: linux-perf-users

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #1: Type: text/plain; charset=utf-8, Size: 677 bytes --]


Olga.Kornievskaia wrote:

> [...] say there are two tracepoints üfoobar_enterý and
> üfoobar_exitý. Each tracepoint logs a timestamp. Iùd like to be able
> to say that on average it takes that many time-units between üenterý
> and üexitý tracepoints. [...]

For completeness, a complete systemtap implementation of this could look like:

# stap -e '
   global s, t%
   probe kernel.trace("foobar_enter") {
       t[tid()]=gettimeofday_us() 
   }
   probe kernel.trace("foobar_exit") {
       if (tid() in t) { s <<< gettimeofday_us() - t[tid()] } 
   }
   probe timer.s(5),end {
       printf("cumulative average us: %d\n", @avg(s)) 
   } 
'


- FChE

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: timing information with perf
  2015-11-16 18:08 ` Frank Ch. Eigler
@ 2015-11-17  0:12   ` Arnaldo Carvalho de Melo
  2015-11-17  0:30     ` Frank Ch. Eigler
  2015-11-17  3:57     ` David Ahern
  0 siblings, 2 replies; 18+ messages in thread
From: Arnaldo Carvalho de Melo @ 2015-11-17  0:12 UTC (permalink / raw)
  To: Frank Ch. Eigler; +Cc: Kornievskaia, Olga, linux-perf-users

Em Mon, Nov 16, 2015 at 01:08:15PM -0500, Frank Ch. Eigler escreveu:
> Olga.Kornievskaia wrote:
 
> > [...] say there are two tracepoints ?foobar_enter? and
> > ?foobar_exit?. Each tracepoint logs a timestamp. I?d like to be able
> > to say that on average it takes that many time-units between ?enter?
> > and ?exit? tracepoints. [...]
 
> For completeness, a complete systemtap implementation of this could look like:
 
> # stap -e '
>    global s, t%
>    probe kernel.trace("foobar_enter") {
>        t[tid()]=gettimeofday_us() 
>    }
>    probe kernel.trace("foobar_exit") {
>        if (tid() in t) { s <<< gettimeofday_us() - t[tid()] } 
>    }
>    probe timer.s(5),end {
>        printf("cumulative average us: %d\n", @avg(s)) 
>    } 
> '

Is that done syncronously?

- Arnaldo

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: timing information with perf
  2015-11-17  0:12   ` Arnaldo Carvalho de Melo
@ 2015-11-17  0:30     ` Frank Ch. Eigler
  2015-11-17  1:31       ` Arnaldo Carvalho de Melo
  2015-11-17  3:57     ` David Ahern
  1 sibling, 1 reply; 18+ messages in thread
From: Frank Ch. Eigler @ 2015-11-17  0:30 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo; +Cc: Kornievskaia, Olga, linux-perf-users

Hi -


acme wrote:

> [...]
> > For completeness, a complete systemtap implementation of this could look like:
>  
> > # stap -e '
> >    global s, t%
> >    probe kernel.trace("foobar_enter") {
> >        t[tid()]=gettimeofday_us() 
> >    } [...]
> > '
> 
> Is that done syncronously?

Meaning - calculated live as the tracepoints hit?  Yes, right during
the respective callbacks.  No trace files nor postprocessing.


- FChE

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: timing information with perf
  2015-11-17  0:30     ` Frank Ch. Eigler
@ 2015-11-17  1:31       ` Arnaldo Carvalho de Melo
  0 siblings, 0 replies; 18+ messages in thread
From: Arnaldo Carvalho de Melo @ 2015-11-17  1:31 UTC (permalink / raw)
  To: Frank Ch. Eigler; +Cc: Kornievskaia, Olga, linux-perf-users

Em Mon, Nov 16, 2015 at 07:30:20PM -0500, Frank Ch. Eigler escreveu:
> acme wrote:
 
> > [...]
> > > For completeness, a complete systemtap implementation of this could look like:
> >  
> > > # stap -e '
> > >    global s, t%
> > >    probe kernel.trace("foobar_enter") {
> > >        t[tid()]=gettimeofday_us() 
> > >    } [...]
> > > '

> > Is that done syncronously?

> Meaning - calculated live as the tracepoints hit?  Yes, right during
> the respective callbacks.  No trace files nor postprocessing.

Right, 'perf trace' is in between those extremes, i.e. no synchronous
processing, no trace file for postprocessing. Just the ring buffer, that
is continuously processed.

- Arnaldo

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: timing information with perf
  2015-11-17  0:12   ` Arnaldo Carvalho de Melo
  2015-11-17  0:30     ` Frank Ch. Eigler
@ 2015-11-17  3:57     ` David Ahern
  2015-11-17 13:31       ` Frank Ch. Eigler
  1 sibling, 1 reply; 18+ messages in thread
From: David Ahern @ 2015-11-17  3:57 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo, Frank Ch. Eigler
  Cc: Kornievskaia, Olga, linux-perf-users

On 11/16/15 5:12 PM, Arnaldo Carvalho de Melo wrote:
> Em Mon, Nov 16, 2015 at 01:08:15PM -0500, Frank Ch. Eigler escreveu:
>> Olga.Kornievskaia wrote:
>
>>> [...] say there are two tracepoints ?foobar_enter? and
>>> ?foobar_exit?. Each tracepoint logs a timestamp. I?d like to be able
>>> to say that on average it takes that many time-units between ?enter?
>>> and ?exit? tracepoints. [...]
>
>> For completeness, a complete systemtap implementation of this could look like:
>
>> # stap -e '
>>     global s, t%
>>     probe kernel.trace("foobar_enter") {
>>         t[tid()]=gettimeofday_us()
>>     }
>>     probe kernel.trace("foobar_exit") {
>>         if (tid() in t) { s <<< gettimeofday_us() - t[tid()] }
>>     }
>>     probe timer.s(5),end {
>>         printf("cumulative average us: %d\n", @avg(s))
>>     }
>> '
>
> Is that done syncronously?

Interesting solution. How does it scale with CPUs and tasks?

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: timing information with perf
  2015-11-17  3:57     ` David Ahern
@ 2015-11-17 13:31       ` Frank Ch. Eigler
  2015-11-17 13:39         ` Arnaldo Carvalho de Melo
  0 siblings, 1 reply; 18+ messages in thread
From: Frank Ch. Eigler @ 2015-11-17 13:31 UTC (permalink / raw)
  To: David Ahern
  Cc: Arnaldo Carvalho de Melo, Kornievskaia, Olga, linux-perf-users

Hi -

> >>    global s, t%
> >>    probe kernel.trace("foobar_enter") {
> >>        t[tid()]=gettimeofday_us()
> >>    }
> >>    probe kernel.trace("foobar_exit") {
> >>        if (tid() in t) { s <<< gettimeofday_us() - t[tid()] }
> >>    }
> >>    probe timer.s(5),end {
> >>        printf("cumulative average us: %d\n", @avg(s))
> >>    }
> > [...]
> 
> Interesting solution. How does it scale with CPUs and tasks?

It's not a simple question, as there are several factors, but
generally fine.

systemtap compiles down to native code, so execution is very fast and
CPU consumption is not a problem (better than streaming to userspace &
processing there).

Depending on actual runtime behavior, there may be contention over the
t[] array that tracks start-time.  systemtap automagically safely
locks these for correctness.  If contention is high, trylock/timeouts
can fail and some probes will be missed.  Fail too many and the script
will self-terminate.  (See [man stap] for some of the related
configurables.)

There is no contention over the s scalar value tracks elapsed time,
since it is implicitly aggregated on a per-cpu basis.  That scales
without problem.

- FChE

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: timing information with perf
  2015-11-17 13:31       ` Frank Ch. Eigler
@ 2015-11-17 13:39         ` Arnaldo Carvalho de Melo
  2015-11-17 14:04           ` Frank Ch. Eigler
  0 siblings, 1 reply; 18+ messages in thread
From: Arnaldo Carvalho de Melo @ 2015-11-17 13:39 UTC (permalink / raw)
  To: Frank Ch. Eigler; +Cc: David Ahern, Kornievskaia, Olga, linux-perf-users

Em Tue, Nov 17, 2015 at 08:31:11AM -0500, Frank Ch. Eigler escreveu:
> Hi -
> 
> > >>    global s, t%
> > >>    probe kernel.trace("foobar_enter") {
> > >>        t[tid()]=gettimeofday_us()
> > >>    }
> > >>    probe kernel.trace("foobar_exit") {
> > >>        if (tid() in t) { s <<< gettimeofday_us() - t[tid()] }
> > >>    }
> > >>    probe timer.s(5),end {
> > >>        printf("cumulative average us: %d\n", @avg(s))
> > >>    }
> > > [...]
> > 
> > Interesting solution. How does it scale with CPUs and tasks?
> 
> It's not a simple question, as there are several factors, but
> generally fine.
> 
> systemtap compiles down to native code, so execution is very fast and
> CPU consumption is not a problem (better than streaming to userspace &
> processing there).

That "better' is also not a simple answer, as there are several factors
:-)

Anyway, those are different methods, applicable to different scenarios
and sets of constraints, sometimes only one of these tools can be used,
due to these constraints, sometimes one can try several and see how they
fare.

- Arnaldo
 
> Depending on actual runtime behavior, there may be contention over the
> t[] array that tracks start-time.  systemtap automagically safely
> locks these for correctness.  If contention is high, trylock/timeouts
> can fail and some probes will be missed.  Fail too many and the script
> will self-terminate.  (See [man stap] for some of the related
> configurables.)
> 
> There is no contention over the s scalar value tracks elapsed time,
> since it is implicitly aggregated on a per-cpu basis.  That scales
> without problem.
> 
> - FChE

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: timing information with perf
  2015-11-17 13:39         ` Arnaldo Carvalho de Melo
@ 2015-11-17 14:04           ` Frank Ch. Eigler
  0 siblings, 0 replies; 18+ messages in thread
From: Frank Ch. Eigler @ 2015-11-17 14:04 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: David Ahern, Kornievskaia, Olga, linux-perf-users

Hi -

> > systemtap compiles down to native code, so execution is very fast and
> > CPU consumption is not a problem (better than streaming to userspace &
> > processing there).
> 
> That "better' is also not a simple answer, as there are several factors
> :-)

(I was referring only to CPU consumption in that context.)


> Anyway, those are different methods, applicable to different scenarios
> and sets of constraints, sometimes only one of these tools can be used,
> due to these constraints, sometimes one can try several and see how they
> fare.

That's 100% right.


- FChE

^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2015-11-17 14:04 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-11-12 21:07 timing information with perf Kornievskaia, Olga
2015-11-12 21:57 ` Arnaldo Carvalho de Melo
2015-11-12 22:04   ` Arnaldo Carvalho de Melo
2015-11-12 23:19     ` David Ahern
2015-11-12 23:09   ` Kornievskaia, Olga
2015-11-12 23:22     ` David Ahern
2015-11-12 23:34     ` Arnaldo Carvalho de Melo
2015-11-13 17:01       ` Kornievskaia, Olga
2015-11-13 19:47         ` Arnaldo Carvalho de Melo
2015-11-13 21:57           ` Kornievskaia, Olga
2015-11-16 18:08 ` Frank Ch. Eigler
2015-11-17  0:12   ` Arnaldo Carvalho de Melo
2015-11-17  0:30     ` Frank Ch. Eigler
2015-11-17  1:31       ` Arnaldo Carvalho de Melo
2015-11-17  3:57     ` David Ahern
2015-11-17 13:31       ` Frank Ch. Eigler
2015-11-17 13:39         ` Arnaldo Carvalho de Melo
2015-11-17 14:04           ` Frank Ch. Eigler

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.