linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [ANNOUNCE] New utility: 'trace'
@ 2010-11-16 21:04 Thomas Gleixner
  2010-11-16 21:27 ` Darren Hart
                   ` (6 more replies)
  0 siblings, 7 replies; 88+ messages in thread
From: Thomas Gleixner @ 2010-11-16 21:04 UTC (permalink / raw)
  To: LKML
  Cc: Linus Torvalds, Andrew Morton, Ingo Molnar, Peter Zijlstra,
	Steven Rostedt, Arjan van de Ven, Arnaldo Carvalho de Melo,
	Frederic Weisbecker, Masami Hiramatsu, Tom Zanussi,
	Mathieu Desnoyers, Li Zefan, Jason Baron, David S. Miller,
	Christoph Hellwig, Pekka Enberg, Lai Jiangshan, Eric Dumazet

We are pleased to announce a new tracing tool called 'trace'.

This tool follows up on Linus's (not so) recent observations that
Linux tracing is still not really offering good tracing tools to
ordinary users. There's no easy to use workflow and the tools that
exist are not expressive enough to solve everyday problem.

After years of efforts we have not succeeded in meeting (let alone
exceeding) the utility of decades-old user-space tracing tools such as
strace - except for a few new good tools such as PowerTop and
LatencyTop.

'trace' is our shot at improving the situation: it aims at providing a
simple to use and straightforward tracing tool based on the perf
infrastructure and on the well-known perf profiling workflow:

   trace record firefox          # trace firefox and all child tasks,
                                   put it into trace.data

   trace summary                 # show quick overview of the trace,
                                   based on trace.data

   trace report                  # detailed traces, based on trace.data

The main workflow approach is to trace once, and then to analyze the
trace in an iterative fashion, by drilling down from a high-level
overview to more and more detailed trace events - with numerous
filtering capabilities available to make this progression effective
and intuitive. (further below there are a number of real examples of
what you can do - but try out the tool to see all the capabilities.)

To try it out, pick up the -tip tree:

   http://people.redhat.com/mingo/tip.git/README

   ( NOTE: you will need to install the audit-libs-dev /
     audit-libs-devel package if you dont have it installed. We'll
     remove this library dependency in a future iteration of the
     tool. )

And do:

   cd tools/perf/
   make -j install

Once you've done that you should have the 'trace' tool available.

Build the -tip kernel and reboot into it to get the most out of the
tool - but even on default kernels it will try to cope. Syscall events
(CONFIG_FTRACE_SYSCALLS=y) is a must for most methods of analysis
though. (with the above -tip tree you'll get it all configured by
default)

If you've booted the new kernel you can run 'trace check' to double
check that all events used by the tool are available:

 $ trace check

 Checking whether the kernel has all required events ...
  ... Checking event  raw_syscalls:sys_enter:r      : ok
  ...
  ... Checking event  sched:sched_stat_runtime:r    : ok
 Good: all required event types are supported by this kernel.
 The 'trace' utility will be fully functional.

The combo diffstat of the tool is appended at the end of the mail. The
overwhelming majority of changes is on the tooling side - it uses
existing perf events facilities and features to implement the tool.

Here's a bit more general description of the 'trace' tool's capabilities:

     - Trace sessions (single tasks or groups of processes/threads
                       (auto-follow), system-wide and per cpu
                       tracing).

     - Unintrusive tracing

     - It will record filterable information about the session:

        * syscalls
        * task lifetime events (fork/clone/exit)
        * pagefaults
	* various mm events (mmap)
	* scheduling events

        * (support for more events is being worked on)

     - Iterative analysis of an existing session/trace without having
       to re-run the trace, with different filters and options (trace
       data is saved on disk)

     - Concurrent and multi-user safe trace recording: multiple users
       can trace, or the same user can trace in multiple different
       tracing sessions.

     - Available to plain users - the permission model is not
       completely done yet, it needs a temporary workaround currently:

	echo -1 > /proc/sys/kernel/perf_event_paranoid

Todo list:

     - Improve output formatting
     - Utilize more events: networking, block IO
     - Add more higher level trace points
     - Implement 'trace diff' utility to simplify the comparison of traces
     - Improve the permission model (work in progress)
     - Implement -p NNN (trace existing pid) option

Main differences to strace:

     - Low impact recording

     - Arguments are printed in plain hex representation, except for
       important arguments like filenames, which are resolved after
       the syscall is recorded.  'trace' wont do full decoding like
       strace does.

     - Ability to record pagefaults and resolve them fully back to
       within the corresponding mapping:

        * file : offset
        * [heap] : offset
        * [stack] : offset
        * [anon] : offset
        * [vdso] : offset

     - Ability to repeat analysis with different filters and options

Here are a few real-life examples of how to use the 'trace' tool.

Firstly, to record a trace, simply prefix whatever command you'd like
to be traced with 'trace record':

  $ trace record firefox
  # trace recorded [2.570 MB] - try 'trace summary' to get an overview

The new trace.data file in the current directory contains the tracing
session, and it can be analyzed after it has been recorded - without
having to repeat tracing again.

For example, you can get a summary of all events:

  $ trace summary

    .--------------------------------.
 __(    Summary of tasks traced       )_______________________________

               [ task - pid ]     [ events ] [ ratio ]  [ runtime ]
 _____________________________________________________________________

       run-mozilla.sh - 1740   :     14108   [ 57.2% ]     37.341 ms
              firefox - 1741   :       851   [  3.5% ]      0.708 ms
                uname - 1742   :       842   [  3.4% ]      0.527 ms
              firefox - 1747   :      1793   [  7.3% ]      3.568 ms
                mkdir - 1748   :       713   [  2.9% ]      1.289 ms
              firefox - 1750   :       497   [  2.0% ]      2.050 ms
                :1751 - 1751   :       368   [  1.5% ]      0.000 ms
                  sed - 1752   :      1034   [  4.2% ]      0.479 ms
                :1753 - 1753   :       494   [  2.0% ]      2.059 ms
                :1754 - 1754   :       377   [  1.5% ]      0.331 ms
                  sed - 1755   :      1035   [  4.2% ]      0.378 ms
       run-mozilla.sh - 1756   :       848   [  3.4% ]      0.467 ms
              dirname - 1757   :       845   [  3.4% ]      0.531 ms
       run-mozilla.sh - 1758   :       840   [  3.4% ]      0.501 ms
 ____________________________________________________________________

                   14   tasks  :     24645   [100.0% ]     50.228 ms
  $

( Note that the table is augmented with colors to highligh
  high-overhead entries and other information. Those colors are not
  visible in the cut & paste screens here :-) )

To get to the detailed trace (see further below about subsystem level
filtering and higher level overview):

 $ trace report

 #
 # trace events of 'firefox':
 #
             firefox/28132 ( 0.000 ms):  ... [continued]: execve() => 0x0
             firefox/28132 ( 0.001 ms): brk(brk: 0x0)                                                          => 0x6e2000
             firefox/28132 ( 0.010 ms): mmap(addr: 0x0, len: 0x1000, prot: 0x3, flags: 0x22, fd: 0xffffffff, off: 0x0) => 0x7fb3f442c000
             firefox/28132 ( 0.006 ms): access(filename: 0x7fb3f422aa60, mode: 0x4) (fpath: /etc/ld.so.preload)  => 0xfffffffffffffffe
             firefox/28132 ( 0.004 ms): open(filename: 0x7fb3f4227f4b, flags: 0x0, mode: 0x1) (fpath: /etc/ld.so.cache)  => 0x3
             firefox/28132 ( 0.001 ms): fstat(3:</etc/ld.so.cache>, statbuf: 0x7fffafe4ef80)                   => 0x0

To see the events of the 'mozilla-xremote' task alone, the -t option
can be used:

  $ trace report -t mozilla-xremote

 ( 0.000 ms):  ... [continued]: clone() => 0x0
 ( 0.003 ms): close(255:<...>)                                                       => 0x0
 ( 0.002 ms): rt_sigprocmask(how: 0x2, set: 0x6e1a60, oset: 0x0, sigsetsize: 0x8)    => 0x0
 ( 0.001 ms): rt_sigaction(sig: 0x14, act: 0x7fffcc4caa80, oact: 0x7fffcc4ca9e0, sigsetsize: 0x8) => 0x0
 ( 0.001 ms): rt_sigaction(sig: 0x15, act: 0x7fffcc4caa80, oact: 0x7fffcc4ca9e0, sigsetsize: 0x8) => 0x0
 ( 0.001 ms): rt_sigaction(sig: 0x16, act: 0x7fffcc4caa90, oact: 0x7fffcc4ca9f0, sigsetsize: 0x8) => 0x0
 ( 0.001 ms): rt_sigaction(sig: 0x2, act: 0x7fffcc4cabb0, oact: 0x7fffcc4cab10, sigsetsize: 0x8) => 0x0
 ( 0.001 ms): rt_sigaction(sig: 0x3, act: 0x7fffcc4cabb0, oact: 0x7fffcc4cab10, sigsetsize: 0x8) => 0x0
 ( 0.001 ms): rt_sigaction(sig: 0x11, act: 0x7fffcc4cabb0, oact: 0x7fffcc4cab10, sigsetsize: 0x8) => 0x0
 ( 0.010 ms): open(filename: 0x6e95b0, flags: 0x241, mode: 0x1b6) (fpath: /dev/null)  => 0x3
 ( 0.001 ms): dup2(oldfd: 0x3, newfd: 0x1)                                           => 0x1
 ( 0.001 ms): close(3:</dev/null>)                                                   => 0x0

Note how the comm/pid column has been collapsed - because there's only
a single task to display - this keeps the output simpler.

But such traces are generally still too large and detailed to be
really suitable for at-a-glance analysis, so 'trace' offers further
filters which allow you to drill down into any traced workload.

For example subsystem filters will output activities of a given kernel
subsystem.

Current subsystems tags are:

   aio, arch-x86, events, fd, fs, fs-attr, IO, IO-locking,
   IPC-locking, IPC-mm, IPC-net, locking, misc, mm, net, process,
   sched, security, signal, stat, system, task, timer, tty.

This is a first rough categorization of events - more is possible and
more will be done. (Suggestions are welcome!)

For example to utilize the 'process' event category, simply add it
after 'trace report':

  $ trace report process

 # (restricting syscalls to: process)
             firefox/ 6722 ( 0.000 ms):  ... [continued]: execve() => 0x0
             firefox/ 6722 ( 0.257 ms): clone(0x1200011, 0x0, 0x0, 0x7f1aa32a69f0, 0x0, 0x1a42)                => 0x1a43
            basename/ 6723 ( 0.000 ms):  ... [continued]: clone() => 0x0
            basename/ 6723 ( 0.215 ms): execve(0x6e9c40, 0x6e8d20, 0x6e8b00, 0x7fffcc4ca980, 0x0, 0x0) (fpath: /bin/basename)  => 0x0
            basename/ 6723 ( 0.000 ms): exit_group(error_code: 0x0)                                           
             firefox/ 6722 ( 0.021 ms): wait4(upid: 0xffffffffffffffff, stat_addr: 0x7fffcc4cad6c, options: 0x0, ru: 0x0) => 0x1a43
             firefox/ 6722 ( 0.001 ms): wait4(upid: 0xffffffffffffffff, stat_addr: 0x7fffcc4ca8dc, options: 0x1, ru: 0x0) => 0xfffffffffffffff6
             firefox/ 6722 ( 0.267 ms): clone(0x1200011, 0x0, 0x0, 0x7f1aa32a69f0, 0x0, 0x1a42)                => 0x1a44
      run-mozilla.sh/ 6724 ( 0.000 ms):  ... [continued]: clone() => 0x0
      run-mozilla.sh/ 6724 ( 0.166 ms): execve(0x6ea150, 0x6ea400, 0x6ec660, 0x7fffcc4ca830, 0x0, 0x7f1aa2a611e8) (fpath: /bin/uname)  => 0x0
      run-mozilla.sh/ 6724 ( 0.000 ms): exit_group(error_code: 0x0)                                           
             firefox/ 6722 ( 0.020 ms): wait4(upid: 0xffffffffffffffff, stat_addr: 0x7fffcc4cac1c, options: 0x0, ru: 0x0) => 0x1a44
             firefox/ 6722 ( 0.001 ms): wait4(upid: 0xffffffffffffffff, stat_addr: 0x7fffcc4ca79c, options: 0x1, ru: 0x0) => 0xfffffffffffffff6
             firefox/ 6722 ( 0.246 ms): clone(0x1200011, 0x0, 0x0, 0x7f1aa32a69f0, 0x0, 0x1a42)                => 0x1a45
     mozilla-xremote/ 6725 ( 0.000 ms):  ... [continued]: clone() => 0x0
     mozilla-xremote/ 6725 ( 0.204 ms): execve(0x6ef210, 0x6e95d0, 0x6ecda0, 0x7fffcc4caa30, 0x1, 0x2a) (fpath: /usr/lib64//xulrunner-2/mozilla-xremote-client)  => 0x0
     mozilla-xremote/ 6725 ( 0.000 ms): exit_group(error_code: 0x1)                                           
             firefox/ 6722 ( 3.262 ms): wait4(upid: 0xffffffffffffffff, stat_addr: 0x7fffcc4cac7c, options: 0x0, ru: 0x0) => 0x1a45
             firefox/ 6722 ( 0.001 ms): wait4(upid: 0xffffffffffffffff, stat_addr: 0x7fffcc4ca7dc, options: 0x1, ru: 0x0) => 0xfffffffffffffff6
             firefox/ 6722 ( 0.247 ms): clone(0x1200011, 0x0, 0x0, 0x7f1aa32a69f0, 0x0, 0x1a42)                => 0x1a46
      run-mozilla.sh/ 6726 ( 0.000 ms):  ... [continued]: clone() => 0x0
      run-mozilla.sh/ 6726 ( 0.150 ms): execve(0x6edca0, 0x6e9ab0, 0x6ecda0, 0x7fffcc4c9ca0, 0x1, 0x7f1aa2a611e8) (fpath: /bin/mkdir)  => 0x0
      run-mozilla.sh/ 6726 ( 0.000 ms): exit_group(error_code: 0x0)                                           
             firefox/ 6722 ( 1.148 ms): wait4(upid: 0xffffffffffffffff, stat_addr: 0x7fffcc4c9eec, options: 0x0, ru: 0x0) => 0x1a46
             firefox/ 6722 ( 0.001 ms): wait4(upid: 0xffffffffffffffff, stat_addr: 0x7fffcc4c9a5c, options: 0x1, ru: 0x0) => 0xfffffffffffffff6
             firefox/ 6722 ( 0.243 ms): clone(0x1200011, 0x0, 0x0, 0x7f1aa32a69f0, 0x0, 0x1a42)                => 0x1a47
             firefox/ 6727 ( 0.000 ms):  ... [continued]: clone() => 0x0
             firefox/ 6727 ( 0.289 ms): clone(0x1200011, 0x0, 0x0, 0x7f1aa32a69f0, 0x1a47, 0x1a47)             => 0x1a48
             firefox/ 6728 ( 0.000 ms):  ... [continued]: clone() => 0x0
             firefox/ 6727 ( 0.259 ms): clone(0x1200011, 0x0, 0x0, 0x7f1aa32a69f0, 0x1a47, 0x1a47)             => 0x1a49
                 sed/ 6729 ( 0.000 ms):  ... [continued]: clone() => 0x0
             firefox/ 6728 ( 0.000 ms): exit_group(error_code: 0x0)                                           
             firefox/ 6727 ( 0.151 ms): wait4(upid: 0xffffffffffffffff, stat_addr: 0x7fffcc4c9dfc, options: 0x0, ru: 0x0) => 0x1a48
                 sed/ 6729 ( 0.161 ms): execve(0x6e9ed0, 0x6f25e0, 0x6ecda0, 0x7fffcc4c9bb0, 0x0, 0x0) (fpath: /bin/sed)  => 0x0
                 sed/ 6729 ( 0.000 ms): exit_group(error_code: 0x0)                                           
             firefox/ 6727 ( 1.413 ms): wait4(upid: 0xffffffffffffffff, stat_addr: 0x7fffcc4c9dfc, options: 0x0, ru: 0x0) => 0x1a49
             firefox/ 6727 ( 0.001 ms): wait4(upid: 0xffffffffffffffff, stat_addr: 0x7fffcc4c9c1c, options: 0x1, ru: 0x0) => 0xfffffffffffffff6
             firefox/ 6727 ( 0.000 ms): exit_group(error_code: 0x0)                                           
             firefox/ 6722 ( 0.041 ms): wait4(upid: 0xffffffffffffffff, stat_addr: 0x7fffcc4ca2bc, options: 0x0, ru: 0x0) => 0x1a47
             firefox/ 6722 ( 0.001 ms): wait4(upid: 0xffffffffffffffff, stat_addr: 0x7fffcc4c9e1c, options: 0x1, ru: 0x0) => 0xfffffffffffffff6
             firefox/ 6722 ( 0.264 ms): clone(0x1200011, 0x0, 0x0, 0x7f1aa32a69f0, 0x0, 0x1a42)                => 0x1a4a
      run-mozilla.sh/ 6730 ( 0.000 ms):  ... [continued]: clone() => 0x0
      run-mozilla.sh/ 6730 ( 0.277 ms): clone(0x1200011, 0x0, 0x0, 0x7f1aa32a69f0, 0x1a4a, 0x1a4a)             => 0x1a4b
               :6731/ 6731 ( 0.000 ms):  ... [continued]: clone() => 0x0
      run-mozilla.sh/ 6730 ( 0.248 ms): clone(0x1200011, 0x0, 0x0, 0x7f1aa32a69f0, 0x1a4a, 0x1a4a)             => 0x1a4c
                 sed/ 6732 ( 0.000 ms):  ... [continued]: clone() => 0x0
               :6731/ 6731 ( 0.000 ms): exit_group(error_code: 0x0)                                           
      run-mozilla.sh/ 6730 ( 0.214 ms): wait4(upid: 0xffffffffffffffff, stat_addr: 0x7fffcc4c9fdc, options: 0x0, ru: 0x0) => 0x1a4b
                 sed/ 6732 ( 0.229 ms): execve(0x6e9e50, 0x6f25e0, 0x6ecda0, 0x7fffcc4c9d90, 0x0, 0x0) (fpath: /bin/sed)  => 0x0
                 sed/ 6732 ( 0.000 ms): exit_group(error_code: 0x0)                                           
      run-mozilla.sh/ 6730 ( 1.413 ms): wait4(upid: 0xffffffffffffffff, stat_addr: 0x7fffcc4c9fdc, options: 0x0, ru: 0x0) => 0x1a4c
      run-mozilla.sh/ 6730 ( 0.001 ms): wait4(upid: 0xffffffffffffffff, stat_addr: 0x7fffcc4c9ddc, options: 0x1, ru: 0x0) => 0xfffffffffffffff6
      run-mozilla.sh/ 6730 ( 0.000 ms): exit_group(error_code: 0x0)                                           
             firefox/ 6722 ( 0.043 ms): wait4(upid: 0xffffffffffffffff, stat_addr: 0x7fffcc4ca49c, options: 0x0, ru: 0x0) => 0x1a4a
             firefox/ 6722 ( 0.001 ms): wait4(upid: 0xffffffffffffffff, stat_addr: 0x7fffcc4ca01c, options: 0x1, ru: 0x0) => 0xfffffffffffffff6
             firefox/ 6722 ( 0.174 ms): execve(0x6f0ee0, 0x6f0550, 0x6ef580, 0x7fffcc4cae50, 0x0, 0x0) (fpath: /usr/lib64/firefox-4/run-mozilla.sh)  => 0x0
             firefox/ 6722 ( 0.489 ms): clone(0x1200011, 0x0, 0x0, 0x7f2b57e7d9f0, 0x0, 0x1a42)                => 0x1a4d
            basename/ 6733 ( 0.000 ms):  ... [continued]: clone() => 0x0
            basename/ 6733 ( 0.154 ms): execve(0x6ea830, 0x6eacd0, 0x6e9070, 0x7fffc740ac50, 0x0, 0x7f2b576381e8) (fpath: /bin/basename)  => 0x0
            basename/ 6733 ( 0.000 ms): exit_group(error_code: 0x0)                                           
             firefox/ 6722 ( 0.020 ms): wait4(upid: 0xffffffffffffffff, stat_addr: 0x7fffc740b03c, options: 0x0, ru: 0x0) => 0x1a4d
             firefox/ 6722 ( 0.001 ms): wait4(upid: 0xffffffffffffffff, stat_addr: 0x7fffc740ab9c, options: 0x1, ru: 0x0) => 0xfffffffffffffff6
             firefox/ 6722 ( 0.243 ms): clone(0x1200011, 0x0, 0x0, 0x7f2b57e7d9f0, 0x0, 0x1a42)                => 0x1a4e
      run-mozilla.sh/ 6734 ( 0.000 ms):  ... [continued]: clone() => 0x0
      run-mozilla.sh/ 6734 ( 0.154 ms): execve(0x6ea790, 0x6eabd0, 0x6e9070, 0x7fffc740ac50, 0x0, 0x7f2b576381e8) (fpath: /usr/bin/dirname)  => 0x0
      run-mozilla.sh/ 6734 ( 0.000 ms): exit_group(error_code: 0x0)                                           
             firefox/ 6722 ( 0.074 ms): wait4(upid: 0xffffffffffffffff, stat_addr: 0x7fffc740b03c, options: 0x0, ru: 0x0) => 0x1a4e
             firefox/ 6722 ( 0.001 ms): wait4(upid: 0xffffffffffffffff, stat_addr: 0x7fffc740ab9c, options: 0x1, ru: 0x0) => 0xfffffffffffffff6
             firefox/ 6722 ( 0.237 ms): clone(0x1200011, 0x0, 0x0, 0x7f2b57e7d9f0, 0x0, 0x1a42)                => 0x1a4f
               uname/ 6735 ( 0.000 ms):  ... [continued]: clone() => 0x0
               uname/ 6735 ( 0.158 ms): execve(0x6e9780, 0x6e9b00, 0x6e97a0, 0x7fffc740a330, 0x0, 0x7f2b576381e8) (fpath: /bin/uname)  => 0x0
               uname/ 6735 ( 0.000 ms): exit_group(error_code: 0x0)                                           
             firefox/ 6722 ( 0.020 ms): wait4(upid: 0xffffffffffffffff, stat_addr: 0x7fffc740a71c, options: 0x0, ru: 0x0) => 0x1a4f
             firefox/ 6722 ( 0.001 ms): wait4(upid: 0xffffffffffffffff, stat_addr: 0x7fffc740a29c, options: 0x1, ru: 0x0) => 0xfffffffffffffff6
             firefox/ 6722 ( 0.194 ms): execve(0x6f8ae0, 0x6f9c00, 0x6f8ed0, 0x7fffc740a8f0, 0x0, 0x40) (fpath: /usr/lib64/firefox-4/firefox)  => 0x0
             firefox/ 6722 ( 0.000 ms): exit_group(error_code: 0x1)                                           

(Multiple categories can be specified too - for example 'process,io'.)

Note that this is the complete output already, no abbreviations - this
dump alone is already short enough and suitable to get an overview of
what happened in this Firefox session.

Other useful filters are 'trace report io' - this filters out IO
events (syscalls for the time being), or 'trace report fs' to see FS
related activities.

Another common approach to analyzing traces is to figure out which
portion of a trace took the most time. 'trace' allows such analysis
via a duration filter. For example, to see all events that took longer
than 0.5 milliseconds:

  $ trace report -d 0.5

             firefox/ 6722 ( 1.273 ms): read(3:</usr/bin/firefox>, buf: 0x7fffcc4caf40, count: 0x80)           => 0x8
             firefox/ 6722 ( 1.213 ms): read(3:</usr/bin/firefox>, buf: 0x7fffcc4cadf0, count: 0x80)           => 0x7
             firefox/ 6722 ( 3.262 ms): wait4(upid: 0xffffffffffffffff, stat_addr: 0x7fffcc4cac7c, options: 0x0, ru: 0x0) => 0x1a45
             firefox/ 6722 ( 1.148 ms): wait4(upid: 0xffffffffffffffff, stat_addr: 0x7fffcc4c9eec, options: 0x0, ru: 0x0) => 0x1a46
             firefox/ 6722 ( 2.360 ms): read(3:</usr/bin/firefox>, buf: 0x7fffcc4ca490, count: 0x80)           => 0x3
             firefox/ 6727 ( 1.413 ms): wait4(upid: 0xffffffffffffffff, stat_addr: 0x7fffcc4c9dfc, options: 0x0, ru: 0x0) => 0x1a49
             firefox/ 6722 ( 2.431 ms): read(3:</usr/bin/firefox>, buf: 0x7fffcc4ca670, count: 0x80)           => 0x6
      run-mozilla.sh/ 6730 ( 1.413 ms): wait4(upid: 0xffffffffffffffff, stat_addr: 0x7fffcc4c9fdc, options: 0x0, ru: 0x0) => 0x1a4c
             firefox/ 6722 ( 1.054 ms): read(3:</usr/lib64/firefox-4/run-mozilla.sh>, buf: 0x7fffc740b210, count: 0x80) => 0xf
             firefox/ 6722 ( 1.053 ms): read(3:</usr/lib64/firefox-4/run-mozilla.sh>, buf: 0x7fffc740b210, count: 0x80) => 0x15
             firefox/ 6722 ( 1.157 ms): read(3:</usr/lib64/firefox-4/run-mozilla.sh>, buf: 0x7fffc740a8f0, count: 0x80) => 0x6

Another thing that is recorded are page fault events. These can be analyzed via the 
-p option:

  $ trace report -p

  [...]
             firefox/ 6722 ( 0.001 ms):      #PF       4793: [                    g_str_hash]: => /usr/lib64/libgdk-x11-2.0.so.0.2302.0 offset: 0xa5000 page: 165 (R.U)
             firefox/ 6722 ( 0.001 ms):      #PF       4794: [         gdk_event_handler_set]: => /usr/lib64/libgdk-x11-2.0.so.0.2302.0 offset: 0x25bc0 page: 37 (R.U)
             firefox/ 6722 ( 0.001 ms): rt_sigaction(sig: 0xd, act: 0x7fff163c34c0, oact: 0x7fff163c3420, sigsetsize: 0x8) => 0x0
             firefox/ 6722 ( 0.001 ms):      #PF       4795: [                  7f4dddedd860]: => /usr/lib64/libgtk-x11-2.0.so.0.2302.0 offset: 0x72860 page: 114 (R.U)
             firefox/ 6722 ( 0.001 ms):      #PF       4796: [                  7f4dde0163a0]: => /usr/lib64/libgtk-x11-2.0.so.0.2302.0 offset: 0x1ab3a0 page: 427 (R.U)
             firefox/ 6722 ( 0.001 ms):      #PF       4797: [                  7f4dde0163a4]: => [anon 0x7f4dde4ee000] offset: 0x220 page: 0 (R.U)
             firefox/ 6722 ( 0.002 ms):      #PF       4798: [                  7f4dde0163b0]: => [anon 0x7f4dde4ee000] offset: 0x220 page: 0 (W.U)
             firefox/ 6722 ( 0.001 ms):      #PF       4799: [                  7f4dde011c50]: => /usr/lib64/libgtk-x11-2.0.so.0.2302.0 offset: 0x1a6c50 page: 422 (R.U)
  [...]

Note that both data faults and code faults are traced and interpreted
- and if the fault lies in an executable area (binary or shared
library) then the tool will figure out the function that caused the
fault.

The file name of the faulting address is displayed as well, and the
offset into that file.

Here is an example of the 'duration filter', where only events that
took longer than 20 msecs are displayed, including absolute timestamps
(with syscalls, pagefaults and scheduler events included):

 $ trace report -t -P -s -d 20
#
# trace events of 'firefox':
# (duration filter: 20.000 msecs)
#
    29.205       run-mozilla.sh/11425 (22.723 ms): => blocked [state: 0x02 cputime: 0.044 ms #PF: 0] stat(filename: 0x1734860, statbuf: 0x7fff313370b0) (fpath: /usr/lib64/firefox-3.6/firefox)
    32.347       run-mozilla.sh/11425 (25.864 ms): stat(filename: 0x1734860, statbuf: 0x7fff313370b0) (fpath: /usr/lib64/firefox-3.6/firefox)  => 0x0
    69.612      mozilla-plugin-/11428 (23.321 ms): => blocked [state: 0x02 cputime: 0.041 ms #PF: 0] stat(filename: 0x8f1ba0, statbuf: 0x7fff06a44a40) (fpath: /usr/lib64/nspluginwrapper/plugi
    74.800      mozilla-plugin-/11428 (28.509 ms): stat(filename: 0x8f1ba0, statbuf: 0x7fff06a44a40) (fpath: /usr/lib64/nspluginwrapper/plugin-config)  => 0x0
   307.225      mozilla-plugin-/11428 (216.985 ms): => unblock [PF: 0]
   307.238      mozilla-plugin-/11428 (217.002 ms): wait4(upid: 0xffffffffffffffff, stat_addr: 0x7fff06a44c7c, options: 0x0, ru: 0x0) => 0x2ca6
   307.459       run-mozilla.sh/11425 (274.198 ms): => unblock [PF: 0]
   307.466       run-mozilla.sh/11425 (274.209 ms): wait4(upid: 0xffffffffffffffff, stat_addr: 0x7fff313372ec, options: 0x0, ru: 0x0) => 0x2ca4
   534.117       run-mozilla.sh/11425 (221.398 ms): => unblock [PF: 0]
   534.128       run-mozilla.sh/11425 (221.413 ms): wait4(upid: 0xffffffffffffffff, stat_addr: 0x7fff313370ec, options: 0x0, ru: 0x0) => 0x2ca7
   556.522       run-mozilla.sh/11425 (21.759 ms): execve(0x173c110, 0x173b800, 0x173acc0, 0x7fff31337040, 0x0, 0x0) (fpath: /usr/lib64/firefox-3.6/run-mozilla.sh)  => 0x0
   597.105       run-mozilla.sh/11435 (23.718 ms):      #PF         64: [                      __execve]: => /usr/lib64/firefox-3.6/firefox offset: 0x6d8 page: 0 (W.K)
   597.230       run-mozilla.sh/11435 (30.228 ms): execve(0x1a7c860, 0x1a81560, 0x1a88140, 0x7fffd937b050, 0x0, 0x7f30498921e8) (fpath: /usr/lib64/firefox-3.6/firefox)  => 0x0
   628.920       run-mozilla.sh/11435 (20.541 ms): open(filename: 0x7fff35246ec4, flags: 0x0, mode: 0x1b6) (fpath: /etc/gre.d/gre64.conf)  => 0x3
   767.480       run-mozilla.sh/11435 (29.628 ms):      #PF        579: [                        memset]: => /usr/lib64/xulrunner-1.9.2/libxul.so offset: 0x1ce7a0 page: 462 (W.U)
  1179.170       run-mozilla.sh/11435 (26.734 ms): => unblock [PF: 1]
  1179.270       run-mozilla.sh/11435 (26.864 ms):      #PF       2944: [                  7fb7a42db4e0]: => /usr/lib64/xulrunner-1.9.2/libxul.so offset: 0xe814e0 page: 3713 (R.U) ... [unfini
  1179.270       run-mozilla.sh/11435 (26.864 ms): => blocked [state: 0x02 cputime: 0.010 ms #PF: 1] 
  1193.124       run-mozilla.sh/11435 (40.718 ms):      #PF       2944: [                  7fb7a42db4e0]: => /usr/lib64/xulrunner-1.9.2/libxul.so offset: 0xe814e0 page: 3713 (R.U)
  1730.468       run-mozilla.sh/11435 (20.649 ms): access(filename: 0x7fb7a5981040, mode: 0x4) (fpath: /usr/share/X11/locale/en_US.UTF-8/XLC_LOCALE)  => 0x0
  1984.050       run-mozilla.sh/11425 (1417.131 ms): => unblock [PF: 0]
  1984.070       run-mozilla.sh/11425 (1417.155 ms): wait4(upid: 0xffffffffffffffff, stat_addr: 0x7fffd937b32c, options: 0x0, ru: 0x0) => 0x2cab

Here is an example of looking at a specific task (pid 11435),
scheduler events in a timeframe around an interesting event which we
picked from the above duration filtered output.

 $ trace report -t -P -s -f 11435 -T 1152.4,1196
 #
 # trace events of 'firefox':
 #
  1152.401 ( 0.002 ms):      #PF       2942: [                  7fb7a3005e20]: => /usr/lib64/libssl3.so offset: 0x29e20 page: 41 (R.U)
  1152.404 ( 0.001 ms):      #PF       2943: [                  7fb7a39375d0]: => /usr/lib64/xulrunner-1.9.2/libxul.so offset: 0x4dd5d0 page: 1245 (R.U)
  1152.436 ( 0.031 ms):      #PF       2944: [                  7fb7a42db4e0]: => /usr/lib64/xulrunner-1.9.2/libxul.so offset: 0xe814e0 page: 3713 (R.U) ... [unfinished]
  1152.436 ( 0.031 ms): => blocked [state: 0x02 cputime: 0.216 ms #PF: 1] 
  1179.170 (26.734 ms): => unblock [PF: 1]
  1179.270 (26.864 ms): => blocked [state: 0x02 cputime: 0.010 ms #PF: 1] 
  1193.120 (13.850 ms): => unblock [PF: 1]
  1193.124 (40.718 ms):      #PF       2944: [                  7fb7a42db4e0]: => /usr/lib64/xulrunner-1.9.2/libxul.so offset: 0xe814e0 page: 3713 (R.U)
  1193.187 ( 0.061 ms):      #PF       2945: [                  7fb7a42c270c]: => /usr/lib64/xulrunner-1.9.2/libxul.so offset: 0xe6870c page: 3688 (R.U) ... [unfinished]
  1193.187 ( 0.061 ms): => blocked [state: 0x02 cputime: 0.067 ms #PF: 1] 

At this point we'd like to ask for feedback from other users of tracing tools,
which workflow components would you like to see in addition to what the 'trace'
tool is offering?

Comments, bug reports, suggestions welcome,

Thanks,

	Thomas, Ingo
---
 arch/x86/mm/fault.c                                |   32 +-
 fs/namei.c                                         |    7 +-
 include/trace/events/kmem.h                        |   39 +
 include/trace/events/sched.h                       |   80 +
 include/trace/events/vfs.h                         |   32 +
 init/Kconfig                                       |    3 +
 kernel/perf_event.c                                |    2 +-
 kernel/sched.c                                     |    2 +
 kernel/trace/trace.c                               |   17 +
 .../{perf-trace-perl.txt => perf-script-perl.txt}  |   28 +-
 ...erf-trace-python.txt => perf-script-python.txt} |   88 +-
 tools/perf/Documentation/perf-script.txt           |  111 ++
 tools/perf/Documentation/perf-trace.txt            |   96 +-
 tools/perf/Makefile                                |   14 +-
 tools/perf/builtin-lock.c                          |    6 +-
 tools/perf/builtin-record.c                        |   91 +-
 tools/perf/builtin-sched.c                         |    6 +-
 tools/perf/builtin-script.c                        |  826 +++++++++
 tools/perf/builtin-trace.c                         | 1839 +++++++++++++-------
 tools/perf/builtin.h                               |    3 +-
 tools/perf/command-list.txt                        |    1 +
 tools/perf/perf.c                                  |   13 +-
 .../perf/scripts/python/Perf-Trace-Util/Context.c  |    2 +-
 tools/perf/syscall-attr.h                          |  303 ++++
 tools/perf/util/event.c                            |    6 +-
 tools/perf/util/header.c                           |   76 +-
 tools/perf/util/header.h                           |    1 +
 tools/perf/util/map.c                              |   26 +-
 tools/perf/util/parse-events.c                     |   18 +-
 tools/perf/util/parse-events.h                     |    1 +
 .../perf/util/scripting-engines/trace-event-perl.c |    6 +-
 .../util/scripting-engines/trace-event-python.c    |    4 +-
 tools/perf/util/session.c                          |   28 +-
 tools/perf/util/session.h                          |    5 +
 34 files changed, 2969 insertions(+), 843 deletions(-)


^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [ANNOUNCE] New utility: 'trace'
  2010-11-16 21:04 [ANNOUNCE] New utility: 'trace' Thomas Gleixner
@ 2010-11-16 21:27 ` Darren Hart
  2010-11-16 21:59   ` Ingo Molnar
  2010-11-16 22:17   ` Frederic Weisbecker
  2010-11-16 21:35 ` Ingo Molnar
                   ` (5 subsequent siblings)
  6 siblings, 2 replies; 88+ messages in thread
From: Darren Hart @ 2010-11-16 21:27 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: LKML, Linus Torvalds, Andrew Morton, Ingo Molnar, Peter Zijlstra,
	Steven Rostedt, Arjan van de Ven, Arnaldo Carvalho de Melo,
	Frederic Weisbecker, Masami Hiramatsu, Tom Zanussi,
	Mathieu Desnoyers, Li Zefan, Jason Baron, David S. Miller,
	Christoph Hellwig, Pekka Enberg, Lai Jiangshan, Eric Dumazet

On 11/16/2010 01:04 PM, Thomas Gleixner wrote:
> We are pleased to announce a new tracing tool called 'trace'.
>

Hi Thomas, Ingo - Congrats and Thanks!

> At this point we'd like to ask for feedback from other users of tracing tools,
> which workflow components would you like to see in addition to what the 'trace'
> tool is offering?
>
> Comments, bug reports, suggestions welcome,

The first thing that comes to mind is trace_marker:

echo "Test point A" > /sys/kernel/debug/tracing/trace_marker

I've found this sort of markup to be useful using ftrace (with C 
equivalents embedded in the test case). Is this supported?

Thanks,

-- 
Darren Hart
Yocto Linux Kernel

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [ANNOUNCE] New utility: 'trace'
  2010-11-16 21:04 [ANNOUNCE] New utility: 'trace' Thomas Gleixner
  2010-11-16 21:27 ` Darren Hart
@ 2010-11-16 21:35 ` Ingo Molnar
  2010-11-17  1:37 ` Ted Ts'o
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 88+ messages in thread
From: Ingo Molnar @ 2010-11-16 21:35 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: LKML, Linus Torvalds, Andrew Morton, Peter Zijlstra,
	Steven Rostedt, Arjan van de Ven, Arnaldo Carvalho de Melo,
	Frederic Weisbecker, Masami Hiramatsu, Tom Zanussi,
	Mathieu Desnoyers, Li Zefan, Jason Baron, David S. Miller,
	Christoph Hellwig, Pekka Enberg, Lai Jiangshan, Eric Dumazet


* Thomas Gleixner <tglx@linutronix.de> wrote:

> Todo list:
>
> [...]
> 
>      - Implement -p NNN (trace existing pid) option

Update, 'trace -p <pid> ...' is now implemented via:

  da46d9f09699: trace: Add -p <pid> recording

I have just pushed it out into -tip.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [ANNOUNCE] New utility: 'trace'
  2010-11-16 21:27 ` Darren Hart
@ 2010-11-16 21:59   ` Ingo Molnar
  2010-11-16 22:03     ` Peter Zijlstra
  2010-11-16 22:07     ` Darren Hart
  2010-11-16 22:17   ` Frederic Weisbecker
  1 sibling, 2 replies; 88+ messages in thread
From: Ingo Molnar @ 2010-11-16 21:59 UTC (permalink / raw)
  To: Darren Hart
  Cc: Thomas Gleixner, LKML, Linus Torvalds, Andrew Morton,
	Peter Zijlstra, Steven Rostedt, Arjan van de Ven,
	Arnaldo Carvalho de Melo, Frederic Weisbecker, Masami Hiramatsu,
	Tom Zanussi, Mathieu Desnoyers, Li Zefan, Jason Baron,
	David S. Miller, Christoph Hellwig, Pekka Enberg, Lai Jiangshan,
	Eric Dumazet


* Darren Hart <dvhart@linux.intel.com> wrote:

> On 11/16/2010 01:04 PM, Thomas Gleixner wrote:
> >We are pleased to announce a new tracing tool called 'trace'.
> >
> 
> Hi Thomas, Ingo - Congrats and Thanks!

You are welcome :)

> > At this point we'd like to ask for feedback from other users of tracing tools, 
> > which workflow components would you like to see in addition to what the 'trace' 
> > tool is offering?
> >
> > Comments, bug reports, suggestions welcome,
> 
> The first thing that comes to mind is trace_marker:
> 
>   echo "Test point A" > /sys/kernel/debug/tracing/trace_marker
> 
> I've found this sort of markup to be useful using ftrace (with C equivalents 
> embedded in the test case). Is this supported?

Yes, LatencyTop uses something similar IIRC, via a prctl() hack: it uses 
prctl(PR_SET_NAME) to inject up to 16 characters into the comm - and then every perf 
event gets that comm. You can utilize that straight away if you need a marker 
solution right now.

A cleaner, more generic approach would be a more free-form ASCII event facility for 
this: a generic 'user-space triggered event' with injection capabilities.

To inject it, we could put that into a prctl() as well, for easy, single-syscall 
access from C. We already have two perf prctls: PR_TASK_PERF_EVENTS_DISABLE, 
PR_TASK_PERF_EVENTS_ENABLE - and this would be a third one: PR_TASK_PERF_USER_EVENT.

This would, like the existing perf_event_task_enable()/disable() methods, loop 
through current events and inject a string into matching user events.

Ideally i'd like to see basically user-space access to trace_printk(), and proper 
integration into /debug/tracing/events/ enumeration and availability - not a 
ftrace-specific and admin-only hack like /sys/kernel/debug/tracing/trace_marker is 
today.

Would you be interested in helping out with (and testing) such a more generic 
approach?

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [ANNOUNCE] New utility: 'trace'
  2010-11-16 21:59   ` Ingo Molnar
@ 2010-11-16 22:03     ` Peter Zijlstra
  2010-11-16 22:08       ` Ingo Molnar
  2010-11-16 22:09       ` Darren Hart
  2010-11-16 22:07     ` Darren Hart
  1 sibling, 2 replies; 88+ messages in thread
From: Peter Zijlstra @ 2010-11-16 22:03 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Darren Hart, Thomas Gleixner, LKML, Linus Torvalds,
	Andrew Morton, Steven Rostedt, Arjan van de Ven,
	Arnaldo Carvalho de Melo, Frederic Weisbecker, Masami Hiramatsu,
	Tom Zanussi, Mathieu Desnoyers, Li Zefan, Jason Baron,
	David S. Miller, Christoph Hellwig, Pekka Enberg, Lai Jiangshan,
	Eric Dumazet

On Tue, 2010-11-16 at 22:59 +0100, Ingo Molnar wrote:
> 
> 
> Would you be interested in helping out with (and testing) such a more generic 
> approach? 

how about having perf-record open a named pipe and rewriting everything
that comes in through that as an event and storing it in the buffer?

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [ANNOUNCE] New utility: 'trace'
  2010-11-16 21:59   ` Ingo Molnar
  2010-11-16 22:03     ` Peter Zijlstra
@ 2010-11-16 22:07     ` Darren Hart
  1 sibling, 0 replies; 88+ messages in thread
From: Darren Hart @ 2010-11-16 22:07 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Thomas Gleixner, LKML, Linus Torvalds, Andrew Morton,
	Peter Zijlstra, Steven Rostedt, Arjan van de Ven,
	Arnaldo Carvalho de Melo, Frederic Weisbecker, Masami Hiramatsu,
	Tom Zanussi, Mathieu Desnoyers, Li Zefan, Jason Baron,
	David S. Miller, Christoph Hellwig, Pekka Enberg, Lai Jiangshan,
	Eric Dumazet

On 11/16/2010 01:59 PM, Ingo Molnar wrote:
>
> * Darren Hart<dvhart@linux.intel.com>  wrote:
>
>> On 11/16/2010 01:04 PM, Thomas Gleixner wrote:
>>> We are pleased to announce a new tracing tool called 'trace'.
>>>
>>
>> Hi Thomas, Ingo - Congrats and Thanks!
>
> You are welcome :)
>
>>> At this point we'd like to ask for feedback from other users of tracing tools,
>>> which workflow components would you like to see in addition to what the 'trace'
>>> tool is offering?
>>>
>>> Comments, bug reports, suggestions welcome,
>>
>> The first thing that comes to mind is trace_marker:
>>
>>    echo "Test point A">  /sys/kernel/debug/tracing/trace_marker
>>
>> I've found this sort of markup to be useful using ftrace (with C equivalents
>> embedded in the test case). Is this supported?
>
> Yes, LatencyTop uses something similar IIRC, via a prctl() hack: it uses
> prctl(PR_SET_NAME) to inject up to 16 characters into the comm - and then every perf
> event gets that comm. You can utilize that straight away if you need a marker
> solution right now.

OK, definitely much more limited than trace_marker.

>
> A cleaner, more generic approach would be a more free-form ASCII event facility for
> this: a generic 'user-space triggered event' with injection capabilities.
>
> To inject it, we could put that into a prctl() as well, for easy, single-syscall
> access from C. We already have two perf prctls: PR_TASK_PERF_EVENTS_DISABLE,
> PR_TASK_PERF_EVENTS_ENABLE - and this would be a third one: PR_TASK_PERF_USER_EVENT.
>
> This would, like the existing perf_event_task_enable()/disable() methods, loop
> through current events and inject a string into matching user events.
>
> Ideally i'd like to see basically user-space access to trace_printk(), and proper

Right, this would be ideal.

> integration into /debug/tracing/events/ enumeration and availability - not a
> ftrace-specific and admin-only hack like /sys/kernel/debug/tracing/trace_marker is
> today.
>
> Would you be interested in helping out with (and testing) such a more generic
> approach?

I would. I need some time to familiarize myself with perf first (I've 
been a relatively happy ftrace/trace-cmd user for the last couple years).

-- 
Darren Hart
Yocto Linux Kernel

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [ANNOUNCE] New utility: 'trace'
  2010-11-16 22:03     ` Peter Zijlstra
@ 2010-11-16 22:08       ` Ingo Molnar
  2010-11-16 22:09       ` Darren Hart
  1 sibling, 0 replies; 88+ messages in thread
From: Ingo Molnar @ 2010-11-16 22:08 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Darren Hart, Thomas Gleixner, LKML, Linus Torvalds,
	Andrew Morton, Steven Rostedt, Arjan van de Ven,
	Arnaldo Carvalho de Melo, Frederic Weisbecker, Masami Hiramatsu,
	Tom Zanussi, Mathieu Desnoyers, Li Zefan, Jason Baron,
	David S. Miller, Christoph Hellwig, Pekka Enberg, Lai Jiangshan,
	Eric Dumazet


* Peter Zijlstra <peterz@infradead.org> wrote:

> On Tue, 2010-11-16 at 22:59 +0100, Ingo Molnar wrote:
> > 
> > 
> > Would you be interested in helping out with (and testing) such a more generic 
> > approach?
> 
> how about having perf-record open a named pipe and rewriting everything that comes 
> in through that as an event and storing it in the buffer?

Yeah, that would work too in most cases - except if there's no VFS connection 
between tracer and tracee. (as it might be in the case of say chroot environments, 
etc.)

I think it's better to have kernel solutions that dont modify the VFS beyond the 
trace.data session info. A task could thus generate a user-space event without 
having to negotiate with the tracer.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [ANNOUNCE] New utility: 'trace'
  2010-11-16 22:03     ` Peter Zijlstra
  2010-11-16 22:08       ` Ingo Molnar
@ 2010-11-16 22:09       ` Darren Hart
  2010-11-16 22:48         ` Darren Hart
  1 sibling, 1 reply; 88+ messages in thread
From: Darren Hart @ 2010-11-16 22:09 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, Thomas Gleixner, LKML, Linus Torvalds,
	Andrew Morton, Steven Rostedt, Arjan van de Ven,
	Arnaldo Carvalho de Melo, Frederic Weisbecker, Masami Hiramatsu,
	Tom Zanussi, Mathieu Desnoyers, Li Zefan, Jason Baron,
	David S. Miller, Christoph Hellwig, Pekka Enberg, Lai Jiangshan,
	Eric Dumazet

On 11/16/2010 02:03 PM, Peter Zijlstra wrote:
> On Tue, 2010-11-16 at 22:59 +0100, Ingo Molnar wrote:
>>
>>
>> Would you be interested in helping out with (and testing) such a more generic
>> approach?
>
> how about having perf-record open a named pipe and rewriting everything
> that comes in through that as an event and storing it in the buffer?

When I'm tracing an app or test-case, I want to be able to quickly add 
and change strings. I like the trace_printk sort of access as it avoids 
having to do a bunch of string formatting (sprintf()) calls in the test app.

-- 
Darren Hart
Yocto Linux Kernel

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [ANNOUNCE] New utility: 'trace'
  2010-11-16 21:27 ` Darren Hart
  2010-11-16 21:59   ` Ingo Molnar
@ 2010-11-16 22:17   ` Frederic Weisbecker
  2010-11-17  8:30     ` Ingo Molnar
  1 sibling, 1 reply; 88+ messages in thread
From: Frederic Weisbecker @ 2010-11-16 22:17 UTC (permalink / raw)
  To: Darren Hart
  Cc: Thomas Gleixner, LKML, Linus Torvalds, Andrew Morton,
	Ingo Molnar, Peter Zijlstra, Steven Rostedt, Arjan van de Ven,
	Arnaldo Carvalho de Melo, Masami Hiramatsu, Tom Zanussi,
	Mathieu Desnoyers, Li Zefan, Jason Baron, David S. Miller,
	Christoph Hellwig, Pekka Enberg, Lai Jiangshan, Eric Dumazet

On Tue, Nov 16, 2010 at 01:27:35PM -0800, Darren Hart wrote:
> On 11/16/2010 01:04 PM, Thomas Gleixner wrote:
>> We are pleased to announce a new tracing tool called 'trace'.
>>
>
> Hi Thomas, Ingo - Congrats and Thanks!
>
>> At this point we'd like to ask for feedback from other users of tracing tools,
>> which workflow components would you like to see in addition to what the 'trace'
>> tool is offering?
>>
>> Comments, bug reports, suggestions welcome,
>
> The first thing that comes to mind is trace_marker:
>
> echo "Test point A" > /sys/kernel/debug/tracing/trace_marker
>
> I've found this sort of markup to be useful using ftrace (with C  
> equivalents embedded in the test case). Is this supported?


No, the trace_marker file sends trace_printk events. And trace_printk
events needs to be converted to use the unified trace event interface
(ie: implement a struct ftrace_event_call).

This shouldn't be so hard, and there are several ways we could do this.

An idea is to reproduce the kernel file hierarchy in a "printk" event
subsystem, but this implies to allow subsystems nesting.

Imagine you have two trace_printk(), one in kernel/sched.c:117
and one in kernel/time/tick-sched.c:228
The result would be:

$ ls /sys/kernel/debug/tracing/events/printk

/sys/kernel/debug/tracing/events/printk:
enable  filter  kernel/

/sys/kernel/debug/tracing/events/printk/kernel:
enable  filter sched.c/ time/

/sys/kernel/debug/tracing/events/printk/kernel/sched.c:
enable  filter  117/

/sys/kernel/debug/tracing/events/printk/kernel/sched.c/117:
enable filter format id

/sys/kernel/debug/tracing/events/printk/kernel/time:
enable  filter tick-sched.c/

/sys/kernel/debug/tracing/events/printk/kernel/time/tick-sched.c:
enable  filter  228/

/sys/kernel/debug/tracing/events/printk/kernel/time/tick-sched.c/228:
enable filter format id


The format would be quite easy, and only one field for the whole string (strloc).

Not sure what to do with trace_marker. I just think we can add it as:
/sys/kernel/debug/tracing/events/printk/sys/kernel/debug/tracing/trace_marker

<:o)


But may be the whole idea is just fancy and nobody will care, and would just
a simple single event in a printk subsystem, on which you can use a kind of
virtual filter:

	echo "path != kernel/whatever.c:226" > /sys/kernel/debug/tracing/events/printk/filter

would turn on every trace_printk() but the one in the given path.

Dunno, I like both ideas. I prefer the first one which looks to me more flexible: could be
useful to list the user his trace_printks for example and implement an interface for him
to quickly select the trace_printk he wants.

For example I'm currently working with dozens of trace_printk() and I would be very happy
to turn some of them off half of the time.


^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [ANNOUNCE] New utility: 'trace'
  2010-11-16 22:09       ` Darren Hart
@ 2010-11-16 22:48         ` Darren Hart
  0 siblings, 0 replies; 88+ messages in thread
From: Darren Hart @ 2010-11-16 22:48 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, Thomas Gleixner, LKML, Linus Torvalds,
	Andrew Morton, Steven Rostedt, Arjan van de Ven,
	Arnaldo Carvalho de Melo, Frederic Weisbecker, Masami Hiramatsu,
	Tom Zanussi, Mathieu Desnoyers, Li Zefan, Jason Baron,
	David S. Miller, Christoph Hellwig, Pekka Enberg, Lai Jiangshan,
	Eric Dumazet

On 11/16/2010 02:09 PM, Darren Hart wrote:
> On 11/16/2010 02:03 PM, Peter Zijlstra wrote:
>> On Tue, 2010-11-16 at 22:59 +0100, Ingo Molnar wrote:
>>>
>>>
>>> Would you be interested in helping out with (and testing) such a more
>>> generic
>>> approach?
>>
>> how about having perf-record open a named pipe and rewriting everything
>> that comes in through that as an event and storing it in the buffer?
>
> When I'm tracing an app or test-case, I want to be able to quickly add
> and change strings. I like the trace_printk sort of access as it avoids
> having to do a bunch of string formatting (sprintf()) calls in the test
> app.

Sorry - fprintf would work for Peter's suggestion. I had the PR_SET_NAME 
with a short string or another call with jsut a plain C string - those 
would require the user doing the string formatting first.

-- 
Darren Hart
Yocto Linux Kernel

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [ANNOUNCE] New utility: 'trace'
  2010-11-16 21:04 [ANNOUNCE] New utility: 'trace' Thomas Gleixner
  2010-11-16 21:27 ` Darren Hart
  2010-11-16 21:35 ` Ingo Molnar
@ 2010-11-17  1:37 ` Ted Ts'o
  2010-11-17  1:47   ` Steven Rostedt
  2010-11-17 13:24   ` Ingo Molnar
  2010-11-17  9:49 ` Philipp Marek
                   ` (3 subsequent siblings)
  6 siblings, 2 replies; 88+ messages in thread
From: Ted Ts'o @ 2010-11-17  1:37 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: LKML, Linus Torvalds, Andrew Morton, Ingo Molnar, Peter Zijlstra,
	Steven Rostedt, Arjan van de Ven, Arnaldo Carvalho de Melo,
	Frederic Weisbecker, Masami Hiramatsu, Tom Zanussi,
	Mathieu Desnoyers, Li Zefan, Jason Baron, David S. Miller,
	Christoph Hellwig, Pekka Enberg, Lai Jiangshan, Eric Dumazet

On Tue, Nov 16, 2010 at 10:04:40PM +0100, Thomas Gleixner wrote:
> If you've booted the new kernel you can run 'trace check' to double
> check that all events used by the tool are available:
> 
>  $ trace check
> 
>  Checking whether the kernel has all required events ...
>   ... Checking event  raw_syscalls:sys_enter:r      : ok
>   ...
>   ... Checking event  sched:sched_stat_runtime:r    : ok
>  Good: all required event types are supported by this kernel.
>  The 'trace' utility will be fully functional.

For the benefit of people who create tracepoints, what restrictions
does trace have with respect to event types, and is this anticipated
to change in the future?

> The combo diffstat of the tool is appended at the end of the mail. The
> overwhelming majority of changes is on the tooling side - it uses
> existing perf events facilities and features to implement the tool.

What about the filtering and other general features/functionality of
ftrace?  Is the anticipation that this will be ported over to perf?
What about things like blktrace?

Not that I expect all of this will be working with this initial
release, but I'm curious about the long-term roadmap of this
interface.  (Obviously subject to change as we learn more, etc.  But
I'd love to hear what your current thoughts and plans are.)

Thanks,

						- Ted  

P.S.  What about sysrq-z?  Is that going to stay with ftrace side of
the tracing architecture?

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [ANNOUNCE] New utility: 'trace'
  2010-11-17  1:37 ` Ted Ts'o
@ 2010-11-17  1:47   ` Steven Rostedt
  2010-11-17  3:16     ` Ted Ts'o
  2010-11-17 13:24   ` Ingo Molnar
  1 sibling, 1 reply; 88+ messages in thread
From: Steven Rostedt @ 2010-11-17  1:47 UTC (permalink / raw)
  To: Ted Ts'o
  Cc: Thomas Gleixner, LKML, Linus Torvalds, Andrew Morton,
	Ingo Molnar, Peter Zijlstra, Arjan van de Ven,
	Arnaldo Carvalho de Melo, Frederic Weisbecker, Masami Hiramatsu,
	Tom Zanussi, Mathieu Desnoyers, Li Zefan, Jason Baron,
	David S. Miller, Christoph Hellwig, Pekka Enberg, Lai Jiangshan,
	Eric Dumazet

On Tue, 2010-11-16 at 20:37 -0500, Ted Ts'o wrote:

> For the benefit of people who create tracepoints, what restrictions
> does trace have with respect to event types, and is this anticipated
> to change in the future?

Note, I just posted an RFC stable event patch set. I would like any tool
that does general analysis, to use stable events and not random raw
events created by the perspective maintainers.

The tool could tap into the raw events, but I don't like the "trace
check" I think anything that the trace needs should be guaranteed there
(with the stable tracepoints).

The tracepoints used for general analysis should be stable, anything
else is just shear bonus.

-- Steve



^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [ANNOUNCE] New utility: 'trace'
  2010-11-17  1:47   ` Steven Rostedt
@ 2010-11-17  3:16     ` Ted Ts'o
  2010-11-17  3:34       ` Steven Rostedt
  0 siblings, 1 reply; 88+ messages in thread
From: Ted Ts'o @ 2010-11-17  3:16 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Thomas Gleixner, LKML, Linus Torvalds, Andrew Morton,
	Ingo Molnar, Peter Zijlstra, Arjan van de Ven,
	Arnaldo Carvalho de Melo, Frederic Weisbecker, Masami Hiramatsu,
	Tom Zanussi, Mathieu Desnoyers, Li Zefan, Jason Baron,
	David S. Miller, Christoph Hellwig, Pekka Enberg, Lai Jiangshan,
	Eric Dumazet

On Tue, Nov 16, 2010 at 08:47:40PM -0500, Steven Rostedt wrote:
> 
> Note, I just posted an RFC stable event patch set. I would like any tool
> that does general analysis, to use stable events and not random raw
> events created by the perspective maintainers.
> 
> The tool could tap into the raw events, but I don't like the "trace
> check" I think anything that the trace needs should be guaranteed there
> (with the stable tracepoints).
> 
> The tracepoints used for general analysis should be stable, anything
> else is just shear bonus.

Which is fine with me, so long as it is accepted that the 'trace' tool
is not targetted at kernel developers (who would probably like to use
a combination of stable and unstable tracepoints).

Do we all agree on what the intended target audience is for this
'trace' tool?

My one concern of having a tool that doesn't support the unstable
tracepoints is that if the kernel developers aren't using it for their
day-to-day use, it won't get attention/love, and it risks the fate of
'systemtap' --- ignored by kernel developers and treated as if it
doesn't exist.

Maybe the answer is there's a #ifdef, and there's one version of the
tool that is intended for use by kernel developers, and one which is
restricted to the stable interface that we give to the hoi polloi?

	      	  	 	   	   - Ted

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [ANNOUNCE] New utility: 'trace'
  2010-11-17  3:16     ` Ted Ts'o
@ 2010-11-17  3:34       ` Steven Rostedt
  2010-11-17 19:00         ` Bob Copeland
  0 siblings, 1 reply; 88+ messages in thread
From: Steven Rostedt @ 2010-11-17  3:34 UTC (permalink / raw)
  To: Ted Ts'o
  Cc: Thomas Gleixner, LKML, Linus Torvalds, Andrew Morton,
	Ingo Molnar, Peter Zijlstra, Arjan van de Ven,
	Arnaldo Carvalho de Melo, Frederic Weisbecker, Masami Hiramatsu,
	Tom Zanussi, Mathieu Desnoyers, Li Zefan, Jason Baron,
	David S. Miller, Christoph Hellwig, Pekka Enberg, Lai Jiangshan,
	Eric Dumazet

On Tue, 2010-11-16 at 22:16 -0500, Ted Ts'o wrote:
> On Tue, Nov 16, 2010 at 08:47:40PM -0500, Steven Rostedt wrote:
> > 
> > Note, I just posted an RFC stable event patch set. I would like any tool
> > that does general analysis, to use stable events and not random raw
> > events created by the perspective maintainers.
> > 
> > The tool could tap into the raw events, but I don't like the "trace
> > check" I think anything that the trace needs should be guaranteed there
> > (with the stable tracepoints).
> > 
> > The tracepoints used for general analysis should be stable, anything
> > else is just shear bonus.
> 
> Which is fine with me, so long as it is accepted that the 'trace' tool
> is not targetted at kernel developers (who would probably like to use
> a combination of stable and unstable tracepoints).
> 
> Do we all agree on what the intended target audience is for this
> 'trace' tool?
> 
> My one concern of having a tool that doesn't support the unstable
> tracepoints is that if the kernel developers aren't using it for their
> day-to-day use, it won't get attention/love, and it risks the fate of
> 'systemtap' --- ignored by kernel developers and treated as if it
> doesn't exist.
> 
> Maybe the answer is there's a #ifdef, and there's one version of the
> tool that is intended for use by kernel developers, and one which is
> restricted to the stable interface that we give to the hoi polloi?

I was not involved in making of this tool, but I would envision that it
would probably have special options to enable the raw events. That is,
basic users would never see the events that kernel developers use, but
with a simple switch (-r?) it would enable all events (both stable and
raw). It just wont have the ability to do any analysis of these events.
That is, it can just report the events (much like what trace-cmd does
today), but they will just exist as an event, not anything the tool can
give true meaning to. But the kernel developer should be able to
understand it.(*)

If we get a consensus on stable events, I would like to add
a /debug/events directory that will be similar to
the /debug/tracing/events, except that it will have the new format for
all raw events.

-- Steve

(*) Currently in trace-cmd I have plugins. These plugins are simple
functions to read events and either output better displays of the event
or do something special (Avi wrote some for his kvm events). I would
like to create a tools/trace/plugins directory that this trace tool
could use. A kernel developer could write the code to pretty print their
event and show much better information that reading a bunch of hex
fields. But more on this later, I would like to work on getting stable
events done first.



^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [ANNOUNCE] New utility: 'trace'
  2010-11-16 22:17   ` Frederic Weisbecker
@ 2010-11-17  8:30     ` Ingo Molnar
  2010-11-17 11:35       ` Peter Zijlstra
  2010-11-18 23:23       ` Joe Perches
  0 siblings, 2 replies; 88+ messages in thread
From: Ingo Molnar @ 2010-11-17  8:30 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Darren Hart, Thomas Gleixner, LKML, Linus Torvalds,
	Andrew Morton, Peter Zijlstra, Steven Rostedt, Arjan van de Ven,
	Arnaldo Carvalho de Melo, Masami Hiramatsu, Tom Zanussi,
	Mathieu Desnoyers, Li Zefan, Jason Baron, David S. Miller,
	Christoph Hellwig, Pekka Enberg, Lai Jiangshan, Eric Dumazet


* Frederic Weisbecker <fweisbec@gmail.com> wrote:

> On Tue, Nov 16, 2010 at 01:27:35PM -0800, Darren Hart wrote:
> > On 11/16/2010 01:04 PM, Thomas Gleixner wrote:
> >> We are pleased to announce a new tracing tool called 'trace'.
> >>
> >
> > Hi Thomas, Ingo - Congrats and Thanks!
> >
> >> At this point we'd like to ask for feedback from other users of tracing tools,
> >> which workflow components would you like to see in addition to what the 'trace'
> >> tool is offering?
> >>
> >> Comments, bug reports, suggestions welcome,
> >
> > The first thing that comes to mind is trace_marker:
> >
> > echo "Test point A" > /sys/kernel/debug/tracing/trace_marker
> >
> > I've found this sort of markup to be useful using ftrace (with C  
> > equivalents embedded in the test case). Is this supported?
> 
> 
> No, the trace_marker file sends trace_printk events. And trace_printk
> events needs to be converted to use the unified trace event interface
> (ie: implement a struct ftrace_event_call).
> 
> This shouldn't be so hard, and there are several ways we could do this.
> 
> An idea is to reproduce the kernel file hierarchy in a "printk" event
> subsystem, but this implies to allow subsystems nesting.
> 
> Imagine you have two trace_printk(), one in kernel/sched.c:117
> and one in kernel/time/tick-sched.c:228
> The result would be:
> 
> $ ls /sys/kernel/debug/tracing/events/printk
> 
> /sys/kernel/debug/tracing/events/printk:
> enable  filter  kernel/
> 
> /sys/kernel/debug/tracing/events/printk/kernel:
> enable  filter sched.c/ time/
> 
> /sys/kernel/debug/tracing/events/printk/kernel/sched.c:
> enable  filter  117/
> 
> /sys/kernel/debug/tracing/events/printk/kernel/sched.c/117:
> enable filter format id
> 
> /sys/kernel/debug/tracing/events/printk/kernel/time:
> enable  filter tick-sched.c/
> 
> /sys/kernel/debug/tracing/events/printk/kernel/time/tick-sched.c:
> enable  filter  228/
> 
> /sys/kernel/debug/tracing/events/printk/kernel/time/tick-sched.c/228:
> enable filter format id
> 
> 
> The format would be quite easy, and only one field for the whole string (strloc).
> 
> Not sure what to do with trace_marker. I just think we can add it as:
> /sys/kernel/debug/tracing/events/printk/sys/kernel/debug/tracing/trace_marker
> 
> <:o)
> 
> 
> But may be the whole idea is just fancy and nobody will care, [...]

I think it's an excellent idea, and it could also give us the framework to integrate 
the dynamic_printk points.

> [...] and would just a simple single event in a printk subsystem, on which you can 
> use a kind of virtual filter:
> 
> 	echo "path != kernel/whatever.c:226" > /sys/kernel/debug/tracing/events/printk/filter
> 
> would turn on every trace_printk() but the one in the given path.
> 
> Dunno, I like both ideas. I prefer the first one which looks to me more flexible: 
> could be useful to list the user his trace_printks for example and implement an 
> interface for him to quickly select the trace_printk he wants.

I prefer the first one too.

> For example I'm currently working with dozens of trace_printk() and I would be 
> very happy to turn some of them off half of the time.

I guess we could try such a patch. If you send a prototype i'd be interested in 
testing it out.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [ANNOUNCE] New utility: 'trace'
  2010-11-16 21:04 [ANNOUNCE] New utility: 'trace' Thomas Gleixner
                   ` (2 preceding siblings ...)
  2010-11-17  1:37 ` Ted Ts'o
@ 2010-11-17  9:49 ` Philipp Marek
       [not found] ` <4CE38C53.8090606@kernel.org>
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 88+ messages in thread
From: Philipp Marek @ 2010-11-17  9:49 UTC (permalink / raw)
  To: linux-kernel

Hello Thomas!

Thomas Gleixner <tglx <at> linutronix.de> writes:
> We are pleased to announce a new tracing tool called 'trace'.
...
>   read(3:</usr/bin/firefox>, buf: 0x7fffcc4caf40, count: 0x80)

Is there a way to get (parts of) the data being read/written/sent/received?
Something like the -s flag for strace would be nice, but that already seems
taken for scheduler events.


Regards,

Phil



^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [ANNOUNCE] New utility: 'trace'
  2010-11-17  8:30     ` Ingo Molnar
@ 2010-11-17 11:35       ` Peter Zijlstra
  2010-11-17 12:53         ` Frederic Weisbecker
  2010-11-18 23:23       ` Joe Perches
  1 sibling, 1 reply; 88+ messages in thread
From: Peter Zijlstra @ 2010-11-17 11:35 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Frederic Weisbecker, Darren Hart, Thomas Gleixner, LKML,
	Linus Torvalds, Andrew Morton, Steven Rostedt, Arjan van de Ven,
	Arnaldo Carvalho de Melo, Masami Hiramatsu, Tom Zanussi,
	Mathieu Desnoyers, Li Zefan, Jason Baron, David S. Miller,
	Christoph Hellwig, Pekka Enberg, Lai Jiangshan, Eric Dumazet

On Wed, 2010-11-17 at 09:30 +0100, Ingo Molnar wrote:
> * Frederic Weisbecker <fweisbec@gmail.com> wrote:
> 
> > On Tue, Nov 16, 2010 at 01:27:35PM -0800, Darren Hart wrote:
> > > On 11/16/2010 01:04 PM, Thomas Gleixner wrote:
> > >> We are pleased to announce a new tracing tool called 'trace'.
> > >>
> > >
> > > Hi Thomas, Ingo - Congrats and Thanks!
> > >
> > >> At this point we'd like to ask for feedback from other users of tracing tools,
> > >> which workflow components would you like to see in addition to what the 'trace'
> > >> tool is offering?
> > >>
> > >> Comments, bug reports, suggestions welcome,
> > >
> > > The first thing that comes to mind is trace_marker:
> > >
> > > echo "Test point A" > /sys/kernel/debug/tracing/trace_marker
> > >
> > > I've found this sort of markup to be useful using ftrace (with C  
> > > equivalents embedded in the test case). Is this supported?
> > 
> > 
> > No, the trace_marker file sends trace_printk events. And trace_printk
> > events needs to be converted to use the unified trace event interface
> > (ie: implement a struct ftrace_event_call).
> > 
> > This shouldn't be so hard, and there are several ways we could do this.
> > 
> > An idea is to reproduce the kernel file hierarchy in a "printk" event
> > subsystem, but this implies to allow subsystems nesting.
> > 
> > Imagine you have two trace_printk(), one in kernel/sched.c:117
> > and one in kernel/time/tick-sched.c:228
> > The result would be:
> > 
> > $ ls /sys/kernel/debug/tracing/events/printk
> > 
> > /sys/kernel/debug/tracing/events/printk:
> > enable  filter  kernel/
> > 
> > /sys/kernel/debug/tracing/events/printk/kernel:
> > enable  filter sched.c/ time/
> > 
> > /sys/kernel/debug/tracing/events/printk/kernel/sched.c:
> > enable  filter  117/
> > 
> > /sys/kernel/debug/tracing/events/printk/kernel/sched.c/117:
> > enable filter format id
> > 
> > /sys/kernel/debug/tracing/events/printk/kernel/time:
> > enable  filter tick-sched.c/
> > 
> > /sys/kernel/debug/tracing/events/printk/kernel/time/tick-sched.c:
> > enable  filter  228/
> > 
> > /sys/kernel/debug/tracing/events/printk/kernel/time/tick-sched.c/228:
> > enable filter format id
> > 
> > 
> > The format would be quite easy, and only one field for the whole string (strloc).
> > 
> > Not sure what to do with trace_marker. I just think we can add it as:
> > /sys/kernel/debug/tracing/events/printk/sys/kernel/debug/tracing/trace_marker
> > 
> > <:o)
> > 
> > 
> > But may be the whole idea is just fancy and nobody will care, [...]
> 
> I think it's an excellent idea, and it could also give us the framework to integrate 
> the dynamic_printk points.
> 
> > [...] and would just a simple single event in a printk subsystem, on which you can 
> > use a kind of virtual filter:
> > 
> > 	echo "path != kernel/whatever.c:226" > /sys/kernel/debug/tracing/events/printk/filter
> > 
> > would turn on every trace_printk() but the one in the given path.
> > 
> > Dunno, I like both ideas. I prefer the first one which looks to me more flexible: 
> > could be useful to list the user his trace_printks for example and implement an 
> > interface for him to quickly select the trace_printk he wants.
> 
> I prefer the first one too.
> 
> > For example I'm currently working with dozens of trace_printk() and I would be 
> > very happy to turn some of them off half of the time.
> 
> I guess we could try such a patch. If you send a prototype i'd be interested in 
> testing it out.

I don't see the point, the kernel shouldn't contain any trace_printk()s
to begin with..

^ permalink raw reply	[flat|nested] 88+ messages in thread

* [patch] trace: Add user-space event tracing/injection
       [not found] ` <4CE38C53.8090606@kernel.org>
@ 2010-11-17 12:07   ` Ingo Molnar
  2010-11-17 12:17     ` Pekka Enberg
                       ` (3 more replies)
  0 siblings, 4 replies; 88+ messages in thread
From: Ingo Molnar @ 2010-11-17 12:07 UTC (permalink / raw)
  To: Pekka Enberg
  Cc: Thomas Gleixner, Peter Zijlstra, Steven Rostedt,
	Arjan van de Ven, Arnaldo Carvalho de Melo, Frederic Weisbecker,
	linux-kernel, Linus Torvalds, Andrew Morton, Darren Hart,
	Arjan van de Ven


* Pekka Enberg <penberg@kernel.org> wrote:

> (Trimming CC for sanity.)

[ Added back Linus (in case he wants to object to the new prctl()) and lkml. ]

> On 11/16/10 11:04 PM, Thomas Gleixner wrote:
> >'trace' is our shot at improving the situation: it aims at providing a
> >simple to use and straightforward tracing tool based on the perf
> >infrastructure and on the well-known perf profiling workflow:
> >
> >    trace record firefox          # trace firefox and all child tasks,
> >                                    put it into trace.data
> >
> >    trace summary                 # show quick overview of the trace,
> >                                    based on trace.data
> >
> >    trace report                  # detailed traces, based on trace.data
> 
> Nice work guys!
> 
> Does this concept lend itself to tracing latencies in userspace applications that 
> run in virtual machines (e.g. the Java kind)? I'm of course interested in this 
> because of Jato [1] where bunch of interesting things can cause jitter: JIT 
> compilation, GC, kernel, and the actual application doing something (in either 
> native code or JIT'd code). It's important to be able to measure where "slowness" 
> to desktop applications and certain class of server applications comes from to be 
> able to improve things.

Makes quite a bit of sense.

How about the attached patch? It works fine with the simple testcase included in the 
changelog. There's a common-sense limit on the message size - but otherwise it adds 
support for apps to generate a free-form string trace event.

Thanks,

	Ingo

---------------------------------->
Subject: trace: Add user-space event tracing/injection
From: Ingo Molnar <mingo@elte.hu>
Date: Wed Nov 17 10:11:53 CET 2010

This feature (suggested by Darren Hart and Pekka Engberg) allows user-space
programs to print trace events in a very simple and self-contained way:

 #include <sys/prctl.h>
 #include <string.h>

 #define PR_TASK_PERF_USER_TRACE 35

 int main(void)
 {
         char *msg = "Hello World!\n";

         prctl(PR_TASK_PERF_USER_TRACE, msg, strlen(msg));

         return 0;
 }

These show up in 'trace' output as:

 $ trace report
 #
 # trace events of 'sleep 1':
 #
        testit/ 6006 ( 0.002 ms): <"Hello World!">
        testit/ 6006 ( 0.002 ms): <"Hello World!">

Suggested-by: Darren Hart <dvhart@linux.intel.com>
Suggested-by: Pekka Enberg <penberg@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
---
 include/linux/prctl.h       |    2 ++
 include/trace/events/user.h |   32 ++++++++++++++++++++++++++++++++
 kernel/sys.c                |   23 +++++++++++++++++++++++
 kernel/trace/trace_events.c |   34 +++++++++++-----------------------
 tools/perf/builtin-trace.c  |   41 +++++++++++++++++++++++++++++++++++++++++
 5 files changed, 109 insertions(+), 23 deletions(-)

Index: linux/include/linux/prctl.h
===================================================================
--- linux.orig/include/linux/prctl.h
+++ linux/include/linux/prctl.h
@@ -102,4 +102,6 @@
 
 #define PR_MCE_KILL_GET 34
 
+#define PR_TASK_PERF_USER_TRACE			35
+
 #endif /* _LINUX_PRCTL_H */
Index: linux/include/trace/events/user.h
===================================================================
--- /dev/null
+++ linux/include/trace/events/user.h
@@ -0,0 +1,32 @@
+#undef TRACE_SYSTEM
+#define TRACE_SYSTEM user
+
+#if !defined(_TRACE_USER_H) || defined(TRACE_HEADER_MULTI_READ)
+#define _TRACE_USER_H_
+
+#include <linux/tracepoint.h>
+#include <linux/ftrace.h>
+
+TRACE_EVENT(user,
+
+	TP_PROTO(const char *message),
+
+	TP_ARGS(message),
+
+	TP_STRUCT__entry(
+		__string(	message, message);
+	),
+
+	TP_fast_assign(
+		__assign_str(message, message);
+	),
+
+	TP_printk("user %s", __get_str(message))
+);
+
+#undef NO_DEV
+
+#endif /* _TRACE_USER_H_ */
+
+/* This part must be outside protection */
+#include <trace/define_trace.h>
Index: linux/kernel/sys.c
===================================================================
--- linux.orig/kernel/sys.c
+++ linux/kernel/sys.c
@@ -47,6 +47,11 @@
 #include <asm/io.h>
 #include <asm/unistd.h>
 
+#define MAX_USER_TRACE_SIZE 128
+
+#define CREATE_TRACE_POINTS
+#include <trace/events/user.h>
+
 #ifndef SET_UNALIGN_CTL
 # define SET_UNALIGN_CTL(a,b)	(-EINVAL)
 #endif
@@ -1681,6 +1686,24 @@ SYSCALL_DEFINE5(prctl, int, option, unsi
 		case PR_TASK_PERF_EVENTS_ENABLE:
 			error = perf_event_task_enable();
 			break;
+		/*
+		 * Inject a trace event into the current tracing context:
+		 */
+		case PR_TASK_PERF_USER_TRACE:
+		{
+			void __user *uevent_ptr = (void *)arg2;
+			char kstring[MAX_USER_TRACE_SIZE+1];
+			unsigned long uevent_len = arg3;
+
+			if (uevent_len > MAX_USER_TRACE_SIZE)
+				return -EINVAL;
+			if (copy_from_user(kstring, uevent_ptr, uevent_len))
+				return -EFAULT;
+			kstring[uevent_len] = 0;
+
+			trace_user(kstring);
+			return 0;
+		}
 		case PR_GET_TIMERSLACK:
 			error = current->timer_slack_ns;
 			break;
Index: linux/kernel/trace/trace_events.c
===================================================================
--- linux.orig/kernel/trace/trace_events.c
+++ linux/kernel/trace/trace_events.c
@@ -989,8 +989,7 @@ static struct dentry *event_trace_events
 
 	d_events = debugfs_create_dir("events", d_tracer);
 	if (!d_events)
-		pr_warning("Could not create debugfs "
-			   "'events' directory\n");
+		pr_warning("Could not create debugfs 'events' directory\n");
 
 	return d_events;
 }
@@ -1014,15 +1013,13 @@ event_subsystem_dir(const char *name, st
 	/* need to create new entry */
 	system = kmalloc(sizeof(*system), GFP_KERNEL);
 	if (!system) {
-		pr_warning("No memory to create event subsystem %s\n",
-			   name);
+		pr_warning("No memory to create event subsystem %s\n", name);
 		return d_events;
 	}
 
 	system->entry = debugfs_create_dir(name, d_events);
 	if (!system->entry) {
-		pr_warning("Could not create event subsystem %s\n",
-			   name);
+		pr_warning("Could not create event subsystem %s\n", name);
 		kfree(system);
 		return d_events;
 	}
@@ -1041,8 +1038,7 @@ event_subsystem_dir(const char *name, st
 
 	system->filter = kzalloc(sizeof(struct event_filter), GFP_KERNEL);
 	if (!system->filter) {
-		pr_warning("Could not allocate filter for subsystem "
-			   "'%s'\n", name);
+		pr_warning("Could not allocate filter for subsystem '%s'\n", name);
 		return system->entry;
 	}
 
@@ -1051,8 +1047,7 @@ event_subsystem_dir(const char *name, st
 	if (!entry) {
 		kfree(system->filter);
 		system->filter = NULL;
-		pr_warning("Could not create debugfs "
-			   "'%s/filter' entry\n", name);
+		pr_warning("Could not create debugfs '%s/filter' entry\n", name);
 	}
 
 	trace_create_file("enable", 0644, system->entry,
@@ -1081,19 +1076,16 @@ event_create_dir(struct ftrace_event_cal
 
 	call->dir = debugfs_create_dir(call->name, d_events);
 	if (!call->dir) {
-		pr_warning("Could not create debugfs "
-			   "'%s' directory\n", call->name);
+		pr_warning("Could not create debugfs '%s' directory\n", call->name);
 		return -1;
 	}
 
 	if (call->class->reg)
-		trace_create_file("enable", 0644, call->dir, call,
-				  enable);
+		trace_create_file("enable", 0644, call->dir, call, enable);
 
 #ifdef CONFIG_PERF_EVENTS
 	if (call->event.type && call->class->reg)
-		trace_create_file("id", 0444, call->dir, call,
-		 		  id);
+		trace_create_file("id", 0444, call->dir, call, id);
 #endif
 
 	/*
@@ -1104,16 +1096,12 @@ event_create_dir(struct ftrace_event_cal
 	if (list_empty(head)) {
 		ret = call->class->define_fields(call);
 		if (ret < 0) {
-			pr_warning("Could not initialize trace point"
-				   " events/%s\n", call->name);
+			pr_warning("Could not initialize trace point events/%s\n", call->name);
 			return ret;
 		}
 	}
-	trace_create_file("filter", 0644, call->dir, call,
-			  filter);
-
-	trace_create_file("format", 0444, call->dir, call,
-			  format);
+	trace_create_file("filter", 0644, call->dir, call, filter);
+	trace_create_file("format", 0444, call->dir, call, format);
 
 	return 0;
 }
Index: linux/tools/perf/builtin-trace.c
===================================================================
--- linux.orig/tools/perf/builtin-trace.c
+++ linux/tools/perf/builtin-trace.c
@@ -719,6 +719,44 @@ static void vfs_getname(void *data,
 	}
 }
 
+static void user_event(void *data,
+			struct event *event __used,
+			int cpu __used,
+			u64 timestamp __used,
+			struct thread *thread __used)
+{
+	struct thread_data *tdata = get_thread_data(thread);
+	unsigned int i;
+	char *message;
+	u64 t = 0;
+
+	if (!tdata->enabled)
+		return;
+	if (time_filtered(timestamp))
+		return;
+	if (cpu_filtered(cpu))
+		return;
+	if (tdata->entry_time) {
+		t = timestamp - tdata->entry_time;
+		if (filter_duration(t))
+			return;
+	} else if (duration_filter)
+		return;
+
+	print_entry_head(thread, timestamp, t, cpu);
+	message = strdup(raw_field_ptr(event, "message", data));
+
+	/* Sanitize the string a bit before printing it: */
+	for (i = 0; i < strlen(message); i++) {
+		if (message[i] == '\n')
+			message[i] = ' ';
+	}
+
+	color_fprintf(stdout, PERF_COLOR_GREEN, "<\"%s\">\n", message);
+
+	free(message);
+}
+
 static int pagefault_preprocess_sample(const event_t *self,
 				       struct addr_location *al,
 				       struct sample_data *data,
@@ -1036,6 +1074,8 @@ process_raw_event(event_t *self, void *d
 		pagefault_exit(data, event, cpu, timestamp, thread);
 	if (!strcmp(event->name, "vfs_getname"))
 		vfs_getname(data, event, cpu, timestamp, thread);
+	if (!strcmp(event->name, "user"))
+		user_event(data, event, cpu, timestamp, thread);
 
 	if (!scheduler_events && !scheduler_all_events)
 		return;
@@ -1226,6 +1266,7 @@ static const char *record_args[] = {
 	"-e", "sched:sched_stat_sleep:r",
 	"-e", "sched:sched_stat_iowait:r",
 	"-e", "sched:sched_stat_runtime:r",
+	"-e", "user:user:r",
 };
 
 static int __cmd_record(int argc, const char **argv)


^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [patch] trace: Add user-space event tracing/injection
  2010-11-17 12:07   ` [patch] trace: Add user-space event tracing/injection Ingo Molnar
@ 2010-11-17 12:17     ` Pekka Enberg
  2010-11-17 12:30       ` Ingo Molnar
  2010-11-17 12:24     ` Ingo Molnar
                       ` (2 subsequent siblings)
  3 siblings, 1 reply; 88+ messages in thread
From: Pekka Enberg @ 2010-11-17 12:17 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Thomas Gleixner, Peter Zijlstra, Steven Rostedt,
	Arjan van de Ven, Arnaldo Carvalho de Melo, Frederic Weisbecker,
	linux-kernel, Linus Torvalds, Andrew Morton, Darren Hart,
	Arjan van de Ven

Hi Ingo,

On 11/17/10 2:07 PM, Ingo Molnar wrote:
> * Pekka Enberg<penberg@kernel.org>  wrote:
>
>> (Trimming CC for sanity.)
> [ Added back Linus (in case he wants to object to the new prctl()) and lkml. ]
>
>> On 11/16/10 11:04 PM, Thomas Gleixner wrote:
>>> 'trace' is our shot at improving the situation: it aims at providing a
>>> simple to use and straightforward tracing tool based on the perf
>>> infrastructure and on the well-known perf profiling workflow:
>>>
>>>     trace record firefox          # trace firefox and all child tasks,
>>>                                     put it into trace.data
>>>
>>>     trace summary                 # show quick overview of the trace,
>>>                                     based on trace.data
>>>
>>>     trace report                  # detailed traces, based on trace.data
>> Nice work guys!
>>
>> Does this concept lend itself to tracing latencies in userspace applications that
>> run in virtual machines (e.g. the Java kind)? I'm of course interested in this
>> because of Jato [1] where bunch of interesting things can cause jitter: JIT
>> compilation, GC, kernel, and the actual application doing something (in either
>> native code or JIT'd code). It's important to be able to measure where "slowness"
>> to desktop applications and certain class of server applications comes from to be
>> able to improve things.
> Makes quite a bit of sense.
>
> How about the attached patch? It works fine with the simple testcase included in the
> changelog. There's a common-sense limit on the message size - but otherwise it adds
> support for apps to generate a free-form string trace event.
>
> Thanks,
>
> 	Ingo
>
> ---------------------------------->
> Subject: trace: Add user-space event tracing/injection
> From: Ingo Molnar<mingo@elte.hu>
> Date: Wed Nov 17 10:11:53 CET 2010
>
> This feature (suggested by Darren Hart and Pekka Engberg) allows user-space
> programs to print trace events in a very simple and self-contained way:
>
>   #include<sys/prctl.h>
>   #include<string.h>
>
>   #define PR_TASK_PERF_USER_TRACE 35
>
>   int main(void)
>   {
>           char *msg = "Hello World!\n";
>
>           prctl(PR_TASK_PERF_USER_TRACE, msg, strlen(msg));
>
>           return 0;
>   }
>
> These show up in 'trace' output as:
>
>   $ trace report
>   #
>   # trace events of 'sleep 1':
>   #
>          testit/ 6006 ( 0.002 ms):<"Hello World!">
>          testit/ 6006 ( 0.002 ms):<"Hello World!">

Wow! This looks really nice!

What does the duration in milliseconds mean there? For things like GC 
and JIT, I want something like:

void gc(void)
{
         prctl(PR_TASK_PERF_USER_TRACE_START, ...)

         collect();

         prctl(PR_TASK_PERF_USER_TRACE_END, ...)
}

So that it's clear from the tracing output that the VM was busy doing GC 
for n milliseconds. Barring background JIT'ing and pauseless GC, I'd 
also be interested in showing how much time the VM was actually 
_blocking_ the running application (which can happen with signals too, 
btw, for things like accessing data that's lazily initialized).

             Pekka

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [patch] trace: Add user-space event tracing/injection
  2010-11-17 12:07   ` [patch] trace: Add user-space event tracing/injection Ingo Molnar
  2010-11-17 12:17     ` Pekka Enberg
@ 2010-11-17 12:24     ` Ingo Molnar
  2010-11-17 12:30       ` Pekka Enberg
  2010-11-17 12:29     ` Peter Zijlstra
  2010-11-18  1:18     ` Darren Hart
  3 siblings, 1 reply; 88+ messages in thread
From: Ingo Molnar @ 2010-11-17 12:24 UTC (permalink / raw)
  To: Pekka Enberg
  Cc: Thomas Gleixner, Peter Zijlstra, Steven Rostedt,
	Arjan van de Ven, Arnaldo Carvalho de Melo, Frederic Weisbecker,
	linux-kernel, Linus Torvalds, Andrew Morton, Darren Hart,
	Arjan van de Ven


* Ingo Molnar <mingo@elte.hu> wrote:

> > Does this concept lend itself to tracing latencies in userspace applications 
> > that run in virtual machines (e.g. the Java kind)? I'm of course interested in 
> > this because of Jato [1] where bunch of interesting things can cause jitter: JIT 
> > compilation, GC, kernel, and the actual application doing something (in either 
> > native code or JIT'd code). It's important to be able to measure where 
> > "slowness" to desktop applications and certain class of server applications 
> > comes from to be able to improve things.
> 
> Makes quite a bit of sense.
> 
> How about the attached patch? It works fine with the simple testcase included in 
> the changelog. There's a common-sense limit on the message size - but otherwise it 
> adds support for apps to generate a free-form string trace event.

The entirely untested Jato patch below adds support for this to Jato's user-space 
tracer. Btw., you have _hundreds_ of tracepoints in Jato, wow!

The prctl() approach is very attractive because it's very simple to integrate. It's 
also reasonably fast, there's no fd baggage in prctl(). It is also arguably a 
'process/task event' so fits prctl()'s original design (if it ever had one ...).

Note, i kept the original Jato buffering as well, and the prctl() does finegrained 
events, one event per trace_printf() line printed.

I think it makes sense to generate a separate event for all trace_printf() calls, 
because that way the events propagate immediately. OTOH i dont know how large the 
trace messages are - if there's really big tables printed (or lines are constructed 
out of many trace_printf() calls) then it may make sense to buffer them a bit.

Anyway, this demonstrates the concept.

Thanks,

	Ingo

diff --git a/vm/trace.c b/vm/trace.c
index 0192de6..64030ff 100644
--- a/vm/trace.c
+++ b/vm/trace.c
@@ -24,10 +24,13 @@
  * Please refer to the file LICENSE for details.
  */
 
+#include <sys/prctl.h>
 #include <stdlib.h>
 #include <string.h>
 #include <stdio.h>
 
+#define PR_TASK_PERF_USER_TRACE 35
+
 #include "jit/compiler.h"
 #include "lib/string.h"
 #include "vm/thread.h"
@@ -50,15 +53,24 @@ static void setup_trace_buffer(void)
 
 int trace_printf(const char *fmt, ...)
 {
+	unsigned long curr_pos;
 	va_list args;
 	int err;
 
 	setup_trace_buffer();
 
+	curr_pos = trace_buffer->length;
+
 	va_start(args, fmt);
 	err = str_vappend(trace_buffer, fmt, args);
 	va_end(args);
 
+	/*
+	 * Send the trace buffer to perf, it will show up as user:user events:
+	 * (on non-perf kernels this will produce a harmless -EINVAL)
+	 */
+	prctl(PR_TASK_PERF_USER_TRACE, trace_buffer->value + curr_pos, trace_buffer->length - curr_pos);
+
 	return err;
 }
 

^ permalink raw reply related	[flat|nested] 88+ messages in thread

* Re: [patch] trace: Add user-space event tracing/injection
  2010-11-17 12:07   ` [patch] trace: Add user-space event tracing/injection Ingo Molnar
  2010-11-17 12:17     ` Pekka Enberg
  2010-11-17 12:24     ` Ingo Molnar
@ 2010-11-17 12:29     ` Peter Zijlstra
  2010-11-17 12:37       ` Peter Zijlstra
  2010-11-17 13:10       ` Frederic Weisbecker
  2010-11-18  1:18     ` Darren Hart
  3 siblings, 2 replies; 88+ messages in thread
From: Peter Zijlstra @ 2010-11-17 12:29 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Pekka Enberg, Thomas Gleixner, Steven Rostedt, Arjan van de Ven,
	Arnaldo Carvalho de Melo, Frederic Weisbecker, linux-kernel,
	Linus Torvalds, Andrew Morton, Darren Hart, Arjan van de Ven

On Wed, 2010-11-17 at 13:07 +0100, Ingo Molnar wrote:

> Subject: trace: Add user-space event tracing/injection
> From: Ingo Molnar <mingo@elte.hu>
> Date: Wed Nov 17 10:11:53 CET 2010
> 
> This feature (suggested by Darren Hart and Pekka Engberg) allows user-space
> programs to print trace events in a very simple and self-contained way:
> 
>  #include <sys/prctl.h>
>  #include <string.h>
> 
>  #define PR_TASK_PERF_USER_TRACE 35
> 
>  int main(void)
>  {
>          char *msg = "Hello World!\n";
> 
>          prctl(PR_TASK_PERF_USER_TRACE, msg, strlen(msg));
> 
>          return 0;
>  }
> 
> These show up in 'trace' output as:
> 
>  $ trace report
>  #
>  # trace events of 'sleep 1':
>  #
>         testit/ 6006 ( 0.002 ms): <"Hello World!">
>         testit/ 6006 ( 0.002 ms): <"Hello World!">
> 
> Suggested-by: Darren Hart <dvhart@linux.intel.com>
> Suggested-by: Pekka Enberg <penberg@kernel.org>
> Signed-off-by: Ingo Molnar <mingo@elte.hu>

I really dislike abusing prctl(), I understand your reasons, but it
still sucks.

Also, the naming doesn't work, you've implemented a trace event, that's
got nothing to do with perf, so the PR_TASK_PERF_ prefix is incorrect.


> Index: linux/include/linux/prctl.h
> ===================================================================
> --- linux.orig/include/linux/prctl.h
> +++ linux/include/linux/prctl.h
> @@ -102,4 +102,6 @@
>  
>  #define PR_MCE_KILL_GET 34
>  
> +#define PR_TASK_PERF_USER_TRACE			35
> +
>  #endif /* _LINUX_PRCTL_H */
> Index: linux/include/trace/events/user.h
> ===================================================================
> --- /dev/null
> +++ linux/include/trace/events/user.h
> @@ -0,0 +1,32 @@
> +#undef TRACE_SYSTEM
> +#define TRACE_SYSTEM user
> +
> +#if !defined(_TRACE_USER_H) || defined(TRACE_HEADER_MULTI_READ)
> +#define _TRACE_USER_H_
> +
> +#include <linux/tracepoint.h>
> +#include <linux/ftrace.h>
> +
> +TRACE_EVENT(user,
> +
> +	TP_PROTO(const char *message),
> +
> +	TP_ARGS(message),
> +
> +	TP_STRUCT__entry(
> +		__string(	message, message);
> +	),
> +
> +	TP_fast_assign(
> +		__assign_str(message, message);
> +	),
> +
> +	TP_printk("user %s", __get_str(message))
> +);
> +
> +#undef NO_DEV
> +
> +#endif /* _TRACE_USER_H_ */
> +
> +/* This part must be outside protection */
> +#include <trace/define_trace.h>
> Index: linux/kernel/sys.c
> ===================================================================
> --- linux.orig/kernel/sys.c
> +++ linux/kernel/sys.c
> @@ -47,6 +47,11 @@
>  #include <asm/io.h>
>  #include <asm/unistd.h>
>  
> +#define MAX_USER_TRACE_SIZE 128
> +
> +#define CREATE_TRACE_POINTS
> +#include <trace/events/user.h>
> +
>  #ifndef SET_UNALIGN_CTL
>  # define SET_UNALIGN_CTL(a,b)	(-EINVAL)
>  #endif
> @@ -1681,6 +1686,24 @@ SYSCALL_DEFINE5(prctl, int, option, unsi
>  		case PR_TASK_PERF_EVENTS_ENABLE:
>  			error = perf_event_task_enable();
>  			break;
> +		/*
> +		 * Inject a trace event into the current tracing context:
> +		 */
> +		case PR_TASK_PERF_USER_TRACE:
> +		{
> +			void __user *uevent_ptr = (void *)arg2;
> +			char kstring[MAX_USER_TRACE_SIZE+1];
> +			unsigned long uevent_len = arg3;
> +
> +			if (uevent_len > MAX_USER_TRACE_SIZE)
> +				return -EINVAL;
> +			if (copy_from_user(kstring, uevent_ptr, uevent_len))
> +				return -EFAULT;
> +			kstring[uevent_len] = 0;
> +
> +			trace_user(kstring);
> +			return 0;
> +		}
>  		case PR_GET_TIMERSLACK:
>  			error = current->timer_slack_ns;
>  			break;


^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [patch] trace: Add user-space event tracing/injection
  2010-11-17 12:24     ` Ingo Molnar
@ 2010-11-17 12:30       ` Pekka Enberg
  0 siblings, 0 replies; 88+ messages in thread
From: Pekka Enberg @ 2010-11-17 12:30 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Thomas Gleixner, Peter Zijlstra, Steven Rostedt,
	Arjan van de Ven, Arnaldo Carvalho de Melo, Frederic Weisbecker,
	linux-kernel, Linus Torvalds, Andrew Morton, Darren Hart,
	Arjan van de Ven

On Wed, Nov 17, 2010 at 2:24 PM, Ingo Molnar <mingo@elte.hu> wrote:
>
> * Ingo Molnar <mingo@elte.hu> wrote:
>
>> > Does this concept lend itself to tracing latencies in userspace applications
>> > that run in virtual machines (e.g. the Java kind)? I'm of course interested in
>> > this because of Jato [1] where bunch of interesting things can cause jitter: JIT
>> > compilation, GC, kernel, and the actual application doing something (in either
>> > native code or JIT'd code). It's important to be able to measure where
>> > "slowness" to desktop applications and certain class of server applications
>> > comes from to be able to improve things.
>>
>> Makes quite a bit of sense.
>>
>> How about the attached patch? It works fine with the simple testcase included in
>> the changelog. There's a common-sense limit on the message size - but otherwise it
>> adds support for apps to generate a free-form string trace event.
>
> The entirely untested Jato patch below adds support for this to Jato's user-space
> tracer. Btw., you have _hundreds_ of tracepoints in Jato, wow!
>
> The prctl() approach is very attractive because it's very simple to integrate. It's
> also reasonably fast, there's no fd baggage in prctl(). It is also arguably a
> 'process/task event' so fits prctl()'s original design (if it ever had one ...).
>
> Note, i kept the original Jato buffering as well, and the prctl() does finegrained
> events, one event per trace_printf() line printed.
>
> I think it makes sense to generate a separate event for all trace_printf() calls,
> because that way the events propagate immediately. OTOH i dont know how large the
> trace messages are - if there's really big tables printed (or lines are constructed
> out of many trace_printf() calls) then it may make sense to buffer them a bit.

There's two kinds of tracing: simple (almost) one-liners (e.g.
trace_invoke() function) that are similar to trace points and data
dumps (e.g. trace_lir() function). I don't think we want too hook the
latter to PR_TASK_PERF_USER_TRACE but for the former, it definitely
makes tons of sense for the former!

                        Pekka

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [patch] trace: Add user-space event tracing/injection
  2010-11-17 12:17     ` Pekka Enberg
@ 2010-11-17 12:30       ` Ingo Molnar
  2010-11-17 12:33         ` Pekka Enberg
  0 siblings, 1 reply; 88+ messages in thread
From: Ingo Molnar @ 2010-11-17 12:30 UTC (permalink / raw)
  To: Pekka Enberg
  Cc: Thomas Gleixner, Peter Zijlstra, Steven Rostedt,
	Arjan van de Ven, Arnaldo Carvalho de Melo, Frederic Weisbecker,
	linux-kernel, Linus Torvalds, Andrew Morton, Darren Hart,
	Arjan van de Ven


* Pekka Enberg <penberg@kernel.org> wrote:

> >  $ trace report
> >  #
> >  # trace events of 'sleep 1':
> >  #
> >         testit/ 6006 ( 0.002 ms):<"Hello World!">
> >         testit/ 6006 ( 0.002 ms):<"Hello World!">
> 
> Wow! This looks really nice!
> 
> What does the duration in milliseconds mean there? For things like
> GC and JIT, I want something like:
> 
> void gc(void)
> {
>         prctl(PR_TASK_PERF_USER_TRACE_START, ...)
> 
>         collect();
> 
>         prctl(PR_TASK_PERF_USER_TRACE_END, ...)
> }
> 
> So that it's clear from the tracing output that the VM was busy
> doing GC for n milliseconds. Barring background JIT'ing and
> pauseless GC, I'd also be interested in showing how much time the VM
> was actually _blocking_ the running application (which can happen
> with signals too, btw, for things like accessing data that's lazily
> initialized).

We can add two events: user_event_entry/user_event_exit - or we could use the string 
to differentiate, and start it with:

  "entry: ..."
  "exit: ..."

And then the event timestamps (which are absolute and are available) could be used 
to calculate the duration of this period.

'trace' could even be taught to treat such entry:/exit: strings in a special way, so 
that you dont have to write Jato specific trace decoding bits?

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [patch] trace: Add user-space event tracing/injection
  2010-11-17 12:30       ` Ingo Molnar
@ 2010-11-17 12:33         ` Pekka Enberg
  2010-11-17 12:42           ` Peter Zijlstra
  0 siblings, 1 reply; 88+ messages in thread
From: Pekka Enberg @ 2010-11-17 12:33 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Thomas Gleixner, Peter Zijlstra, Steven Rostedt,
	Arjan van de Ven, Arnaldo Carvalho de Melo, Frederic Weisbecker,
	linux-kernel, Linus Torvalds, Andrew Morton, Darren Hart,
	Arjan van de Ven

On 11/17/10 2:30 PM, Ingo Molnar wrote:
> What does the duration in milliseconds mean there? For things like
>> GC and JIT, I want something like:
>>
>> void gc(void)
>> {
>>          prctl(PR_TASK_PERF_USER_TRACE_START, ...)
>>
>>          collect();
>>
>>          prctl(PR_TASK_PERF_USER_TRACE_END, ...)
>> }
>>
>> So that it's clear from the tracing output that the VM was busy
>> doing GC for n milliseconds. Barring background JIT'ing and
>> pauseless GC, I'd also be interested in showing how much time the VM
>> was actually _blocking_ the running application (which can happen
>> with signals too, btw, for things like accessing data that's lazily
>> initialized).
> We can add two events: user_event_entry/user_event_exit - or we could use the string
> to differentiate, and start it with:
>
>    "entry: ..."
>    "exit: ..."
>
> And then the event timestamps (which are absolute and are available) could be used
> to calculate the duration of this period.
>
> 'trace' could even be taught to treat such entry:/exit: strings in a special way, so
> that you dont have to write Jato specific trace decoding bits?

Yes, makes sense. I like the API so lets convince others that it's 
important enough to be merged. :-)

             Pekka

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [patch] trace: Add user-space event tracing/injection
  2010-11-17 12:29     ` Peter Zijlstra
@ 2010-11-17 12:37       ` Peter Zijlstra
  2010-11-17 13:10       ` Frederic Weisbecker
  1 sibling, 0 replies; 88+ messages in thread
From: Peter Zijlstra @ 2010-11-17 12:37 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Pekka Enberg, Thomas Gleixner, Steven Rostedt, Arjan van de Ven,
	Arnaldo Carvalho de Melo, Frederic Weisbecker, linux-kernel,
	Linus Torvalds, Andrew Morton, Darren Hart, Arjan van de Ven

On Wed, 2010-11-17 at 13:29 +0100, Peter Zijlstra wrote:
> > +             case PR_TASK_PERF_USER_TRACE:
> > +             {
> > +                     void __user *uevent_ptr = (void *)arg2;
> > +                     char kstring[MAX_USER_TRACE_SIZE+1];
> > +                     unsigned long uevent_len = arg3;
> > +
> > +                     if (uevent_len > MAX_USER_TRACE_SIZE)
> > +                             return -EINVAL;
> > +                     if (copy_from_user(kstring, uevent_ptr, uevent_len))
> > +                             return -EFAULT;
> > +                     kstring[uevent_len] = 0;
> > +
> > +                     trace_user(kstring);
> > +                     return 0;
> > +             } 

That implementation can be improved upon, no need to actually do the
copy_from_user() when the tracepoint isn't enabled.

Also, do we really need a max size here? I know ftrace (and the perf
tracepoint interface) are currently limited to 1 page of trace data but
that can be fixed (although not trivially).

Also, -EINVAL isn't really a helpful error code in that case, -EMSGSIZE
seems like a good candidate.

And the uevent name is already taken for something else, re-using it
here might cause confusion.

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [patch] trace: Add user-space event tracing/injection
  2010-11-17 12:33         ` Pekka Enberg
@ 2010-11-17 12:42           ` Peter Zijlstra
  2010-11-17 12:58             ` Ingo Molnar
  0 siblings, 1 reply; 88+ messages in thread
From: Peter Zijlstra @ 2010-11-17 12:42 UTC (permalink / raw)
  To: Pekka Enberg
  Cc: Ingo Molnar, Thomas Gleixner, Steven Rostedt, Arjan van de Ven,
	Arnaldo Carvalho de Melo, Frederic Weisbecker, linux-kernel,
	Linus Torvalds, Andrew Morton, Darren Hart, Arjan van de Ven

On Wed, 2010-11-17 at 14:33 +0200, Pekka Enberg wrote:
> On 11/17/10 2:30 PM, Ingo Molnar wrote:
> > What does the duration in milliseconds mean there? For things like
> >> GC and JIT, I want something like:
> >>
> >> void gc(void)
> >> {
> >>          prctl(PR_TASK_PERF_USER_TRACE_START, ...)
> >>
> >>          collect();
> >>
> >>          prctl(PR_TASK_PERF_USER_TRACE_END, ...)
> >> }
> >>
> >> So that it's clear from the tracing output that the VM was busy
> >> doing GC for n milliseconds. Barring background JIT'ing and
> >> pauseless GC, I'd also be interested in showing how much time the VM
> >> was actually _blocking_ the running application (which can happen
> >> with signals too, btw, for things like accessing data that's lazily
> >> initialized).
> > We can add two events: user_event_entry/user_event_exit - or we could use the string
> > to differentiate, and start it with:
> >
> >    "entry: ..."
> >    "exit: ..."
> >
> > And then the event timestamps (which are absolute and are available) could be used
> > to calculate the duration of this period.
> >
> > 'trace' could even be taught to treat such entry:/exit: strings in a special way, so
> > that you dont have to write Jato specific trace decoding bits?
> 
> Yes, makes sense. I like the API so lets convince others that it's 
> important enough to be merged. :-)

I don't much like it, Jato already does its own tracing for the anon_vma
symbols, it might as well write its own event log too (would need a
proper VDSO clock thingy though).

Both these, the anon_vma symbols and the events, should be implemented
in a libperf.so or the like so as not to rely on the crummy /tmp file
interface we've been using so far.

Apps can link against libperf_dummy.so to get weak stubs and LD_PRELOAD
libperf.so to get the real ones (perf record can do that LD_PRELOAD
automagically).

perf report will need to be taught to not only pick up the anon_vma
symbols but also look for the events to be merge sorted.

We already have this communication channel for Jato (due to the anon_vma
muck), might as well extend it.



^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [ANNOUNCE] New utility: 'trace'
  2010-11-16 21:04 [ANNOUNCE] New utility: 'trace' Thomas Gleixner
                   ` (4 preceding siblings ...)
       [not found] ` <4CE38C53.8090606@kernel.org>
@ 2010-11-17 12:47 ` Török Edwin
  2010-11-17 12:51   ` Peter Zijlstra
  2010-11-17 13:32   ` Frederic Weisbecker
  2010-11-18  0:47 ` Ian Munsie
  6 siblings, 2 replies; 88+ messages in thread
From: Török Edwin @ 2010-11-17 12:47 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: LKML, Linus Torvalds, Andrew Morton, Ingo Molnar, Peter Zijlstra,
	Steven Rostedt, Arjan van de Ven

On Tue, 16 Nov 2010 22:04:40 +0100 (CET)
Thomas Gleixner <tglx@linutronix.de> wrote:

> After years of efforts we have not succeeded in meeting (let alone
> exceeding) the utility of decades-old user-space tracing tools such as
> strace - except for a few new good tools such as PowerTop and
> LatencyTop.
> 
> 'trace' is our shot at improving the situation: it aims at providing a
> simple to use and straightforward tracing tool based on the perf
> infrastructure and on the well-known perf profiling workflow:

Interesting, I just tried it using the -tip kernel.
Looks like a good start, but there are some features I'm missing, see below.

> The combo diffstat of the tool is appended at the end of the mail.

How did you generate that? In other words is there a git command I can
use to apply the mm_pagefault/vfs_getname/trace patches on top of
2.6.36? Doing a git merge tip/tmp.perf/trace merged 2.6.37 for me.
For now I'm running a -tip kernel to test.

>      - Unintrusive tracing

Nice! At a quick glance it didn't seem to impact performance much, I'll
have to do some measurements.

> 
>      - It will record filterable information about the session:
> 
>         * syscalls

There is no strace-like "not finished":
               clamd/11086 ( 0.000 ms): pread(0x9, 0x7f142f7f1000, 0x2000, 0xa8000, 0x2000, 0xa8)              => 0x1
               clamd/11087 ( 0.014 ms): futex(uaddr: 0x7f143fad3260, op: 0x80, val: 0x2, utime: 0x0, uaddr2: 0x7f143fad3260, val3: 0x2b4f) => 0x0
               clamd/11087 ( 0.001 ms): futex(uaddr: 0x7f143fad3260, op: 0x81, val: 0x1, utime: 0xfe0, uaddr2: 0x7f143fad3260, val3: 0x2b4f) => 0x0
               clamd/11086 ( 0.011 ms):  ... [continued]: pread() =>          0x2000

I assume that the first pread with 0ms time got interrupted, and it resumed/continued later. Is that the case?
Is the 0x1 return value bogus then? If so it would be less confusing if you output what strace usually does (".. not finished" IIRC).

>         * task lifetime events (fork/clone/exit)
>         * pagefaults
> 	* various mm events (mmap)
> 	* scheduling events
> 
>         * (support for more events is being worked on)

Is it possible to use 'trace record' and 'perf record' at the same time?
I.e to do both strace-like (trace) and oprofile-like (perf record)
recording?

> 
>      - Iterative analysis of an existing session/trace without having
>        to re-run the trace, with different filters and options (trace
>        data is saved on disk)
> 
>      - Concurrent and multi-user safe trace recording: multiple users
>        can trace, or the same user can trace in multiple different
>        tracing sessions.
> 
>      - Available to plain users - the permission model is not
>        completely done yet, it needs a temporary workaround currently:
> 
> 	echo -1 > /proc/sys/kernel/perf_event_paranoid

I see --call-graph in the trace record -h output, but it doesn't seem to work on x86_64 for me.
Do I need to pass some additional parameters to trace report, do I need additional CONFIG_ turned on?
I have:
CONFIG_STACKTRACE_SUPPORT=y
CONFIG_HAVE_REGS_AND_STACK_ACCESS_API=y
# CONFIG_CC_STACKPROTECTOR is not set
CONFIG_STACKTRACE=y
CONFIG_USER_STACKTRACE_SUPPORT=y
# CONFIG_STACK_TRACER is not set
# CONFIG_DEBUG_STACKOVERFLOW is not set
# CONFIG_DEBUG_STACK_USAGE is not set

Does it use the same code for callgraphs as ftrace, so I could choose userspace stacktrace?

> 
> Todo list:
> 
>      - Improve output formatting
>      - Utilize more events: networking, block IO
>      - Add more higher level trace points
>      - Implement 'trace diff' utility to simplify the comparison of
> traces
>      - Improve the permission model (work in progress)
>      - Implement -p NNN (trace existing pid) option

1. I miss sorting. 
Sure I can do something like this: trace report -d <N> |
sort -g -k2 -t\(, but I have to choose <N> first, and it has to output
all the data before sorting can begin (including the text part).
It would be better if I 'trace' itself could sort based on the syscall
latency time, and show me the slowest syscalls first.

2. Another feature I miss is ability to sum up syscall times per syscall, and output a summary.

3. The -s output is not very intuitive, the sched tracer output in /sys/debug/tracing was more intuitive IIRC,
showing that you swithced from process A to process B. I would like to see that in the -s output
(and if switching to a process not traced, it should say so).

4. Also it would be useful to know if during a long syscall (futex, pread, etc.) the CPU time was wasted, or 
another thread of same process got scheduled in. Calculating the overhead of doing that would be interesting too.
I think this could be done with a tool that postprocesses the results, but maybe its easier to do in the tool itself.
For example:
 thread1: futex(...) [... not finished]
 thread2: (delta1 ms) ... scheduled in place of thread1 (switch thread1 -> thread2)
 thread2: ... do stuff ...
 thread1: (delta2 ms) ... scheduled in place of thread2 (switch thread2 -> thread1)
 thread1: (delta3 ms) [... continued] futex() => ...

I would be interested to see delta1, and delta2, perhaps totalled up per syscall, per thread or per process. 

5. Also it would be good if it could tell what a futex is used for:
  1122.567                clamd/11082 (139.120 ms): futex(uaddr: 0x1b59d44, op: 0x80, val: 0x883, utime: 0x0, uaddr2: 0x0, val3: 0x441) => 0x0
  1712.922                clamd/11082 (309.925 ms): futex(uaddr: 0x1b59d44, op: 0x80, val: 0x891, utime: 0x0, uaddr2: 0x0, val3: 0x448) => 0x0
  2014.289                clamd/11082 (244.312 ms): futex(uaddr: 0x1b59d44, op: 0x80, val: 0x895, utime: 0x0, uaddr2: 0x0, val3: 0x44a) => 0x0
  3639.956                clamd/11082 (104.370 ms): futex(uaddr: 0x1b59d44, op: 0x80, val: 0x919, utime: 0x0, uaddr2: 0x0, val3: 0x48c) => 0x0
  4371.086                clamd/11082 (158.304 ms): futex(uaddr: 0x1b59d44, op: 0x80, val: 0x951, utime: 0x0, uaddr2: 0x0, val3: 0x4a8) => 0x0
  5204.773                clamd/11082 (133.566 ms): futex(uaddr: 0x1b59d44, op: 0x80, val: 0x999, utime: 0x0, uaddr2: 0x0, val3: 0x4cc) => 0x0

Using /proc/`pidof clamd`/maps I found out that 0x1b59d44 is on the heap. Such large delays are probably
due to a pthread_cond_wait/cond_timed_wait, and after spending a bit of time in gdb I found out exactly which condition var it is:
(gdb) p &thr_pool->queueable_bulk_cond.__data.__futex
$10 = (unsigned int *) 0x1b59d44

But I don't want to repeat that for each futex call with a different address.
It would be good if 'trace' could figure out what kind of futex wait this is (i.e. mutex lock or condition variable wait), I hope
the kernel has a wait to tell them apart. 
Large delays for waiting on condition variables is expected, and I want to exclude them from my trace.
However I do want to see if there are large delays on mutex locks, that is definetely something I am interested in.
Now of course 'strace' can't make the difference between these two, but 'trace' being a kernel feature might have more information.

Of course it'd be even better if it could show the name of the condition variable, but if the condition variable is on the heap that is hard to do.
I'll have to see if there is a way to script gdb to lookup the variable associated with an address (by looking through the local and global vars on the current thread).

> 
> Main differences to strace:
> 
>      - Low impact recording
> 
>      - Arguments are printed in plain hex representation, except for
>        important arguments like filenames, which are resolved after
>        the syscall is recorded.  'trace' wont do full decoding like
>        strace does.

Did you consider providing a tool to post-process the output and
perform that decoding?

> Another thing that is recorded are page fault events. These can be
> analyzed via the -p option:

  Error: switch `p' requires a value
-P works though.

Best regards,
--Edwin

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [ANNOUNCE] New utility: 'trace'
  2010-11-17 12:47 ` [ANNOUNCE] New utility: 'trace' Török Edwin
@ 2010-11-17 12:51   ` Peter Zijlstra
  2010-11-17 13:05     ` Török Edwin
  2010-11-17 13:32   ` Frederic Weisbecker
  1 sibling, 1 reply; 88+ messages in thread
From: Peter Zijlstra @ 2010-11-17 12:51 UTC (permalink / raw)
  To: Török Edwin
  Cc: Thomas Gleixner, LKML, Linus Torvalds, Andrew Morton,
	Ingo Molnar, Steven Rostedt, Arjan van de Ven

On Wed, 2010-11-17 at 14:47 +0200, Török Edwin wrote:
> I see --call-graph in the trace record -h output, but it doesn't seem
> to work on x86_64 for me.

If you want to unwind userspace you need to build everything with
framepointers -- sadly some people have been lobbying to remove
framepointers from all distro builds, even though on x86_64 its nearly
free (i386 does have a significant performance benefit, sadly).

There's some work on-going to simply copy out the top of stack and let
dwarves go wild at it in userspace, but that isn't done yet.


^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [ANNOUNCE] New utility: 'trace'
  2010-11-17 11:35       ` Peter Zijlstra
@ 2010-11-17 12:53         ` Frederic Weisbecker
  2010-11-17 13:02           ` Peter Zijlstra
  0 siblings, 1 reply; 88+ messages in thread
From: Frederic Weisbecker @ 2010-11-17 12:53 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, Darren Hart, Thomas Gleixner, LKML, Linus Torvalds,
	Andrew Morton, Steven Rostedt, Arjan van de Ven,
	Arnaldo Carvalho de Melo, Masami Hiramatsu, Tom Zanussi,
	Mathieu Desnoyers, Li Zefan, Jason Baron, David S. Miller,
	Christoph Hellwig, Pekka Enberg, Lai Jiangshan, Eric Dumazet

On Wed, Nov 17, 2010 at 12:35:50PM +0100, Peter Zijlstra wrote:
> On Wed, 2010-11-17 at 09:30 +0100, Ingo Molnar wrote:
> > > For example I'm currently working with dozens of trace_printk() and I would be 
> > > very happy to turn some of them off half of the time.
> > 
> > I guess we could try such a patch. If you send a prototype i'd be interested in 
> > testing it out.
> 
> I don't see the point, the kernel shouldn't contain any trace_printk()s
> to begin with..


It's oriented toward developers. Those who use dozens of tracepoints in
their tree because they are debugging something or developing a new feature,
they might to deactivate/reactivate some of these independant points.

This can also apply to dynamic_printk of course.

Well, the very first and main point is to standardize trace_printk into
a trace event so that it gets usable by perf tools. I have been asked many
times "how to use trace_printk() with perf?".


^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [patch] trace: Add user-space event tracing/injection
  2010-11-17 12:42           ` Peter Zijlstra
@ 2010-11-17 12:58             ` Ingo Molnar
  2010-11-17 13:09               ` Peter Zijlstra
  0 siblings, 1 reply; 88+ messages in thread
From: Ingo Molnar @ 2010-11-17 12:58 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Pekka Enberg, Thomas Gleixner, Steven Rostedt, Arjan van de Ven,
	Arnaldo Carvalho de Melo, Frederic Weisbecker, linux-kernel,
	Linus Torvalds, Andrew Morton, Darren Hart, Arjan van de Ven


* Peter Zijlstra <peterz@infradead.org> wrote:

> On Wed, 2010-11-17 at 14:33 +0200, Pekka Enberg wrote:
> > On 11/17/10 2:30 PM, Ingo Molnar wrote:
> > > What does the duration in milliseconds mean there? For things like
> > >> GC and JIT, I want something like:
> > >>
> > >> void gc(void)
> > >> {
> > >>          prctl(PR_TASK_PERF_USER_TRACE_START, ...)
> > >>
> > >>          collect();
> > >>
> > >>          prctl(PR_TASK_PERF_USER_TRACE_END, ...)
> > >> }
> > >>
> > >> So that it's clear from the tracing output that the VM was busy
> > >> doing GC for n milliseconds. Barring background JIT'ing and
> > >> pauseless GC, I'd also be interested in showing how much time the VM
> > >> was actually _blocking_ the running application (which can happen
> > >> with signals too, btw, for things like accessing data that's lazily
> > >> initialized).
> > > We can add two events: user_event_entry/user_event_exit - or we could use the string
> > > to differentiate, and start it with:
> > >
> > >    "entry: ..."
> > >    "exit: ..."
> > >
> > > And then the event timestamps (which are absolute and are available) could be used
> > > to calculate the duration of this period.
> > >
> > > 'trace' could even be taught to treat such entry:/exit: strings in a special way, so
> > > that you dont have to write Jato specific trace decoding bits?
> > 
> > Yes, makes sense. I like the API so lets convince others that it's 
> > important enough to be merged. :-)
> 
> I don't much like it, Jato already does its own tracing for the anon_vma
> symbols, it might as well write its own event log too (would need a
> proper VDSO clock thingy though).

The problem is that it then does not properly mix with other events outside of the 
control of the application.

For example if there are two apps both generating user events, but there's no 
connection with them, a system-wide tracer wont get a properly ordered set of events 
- both apps will trace into their own buffers. So if we have:

  CPU1

  app1: "user event X"
  app2: "user event Y"

Then a 'trace --all' system-wide tracing session will not get proper ordering 
between app1 and app2's events. It only gets timestamps - which may or may not be 
correct.

User-space tracing schemes tend to be clumsy and limiting. There's other 
disadvantages as well: approaches that expose a named pipe in /tmp or an shmem 
region are not transparent and robust either: if user-space owns a pending buffer 
then bugs in the apps can corrupt the trace buffer, can prevent its flushing when 
the app goes down due to an app bug (and when the trace would be the most useful), 
etc. etc.

Also, in general their deployment isnt particularly fast nor lightweight - while 
prctl() is available everywhere.

And when it comes to tracing/instrumentation, if we make deployment too complex, 
people will simply not use it - and we all use. A prctl() isnt particularly sexy 
design, but it's a task/process event that we are generating (so related to prctls), 
plus it's available everywhere and is very easy to deploy.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [ANNOUNCE] New utility: 'trace'
  2010-11-17 12:53         ` Frederic Weisbecker
@ 2010-11-17 13:02           ` Peter Zijlstra
  2010-11-17 13:10             ` Ingo Molnar
  2010-11-17 16:49             ` Darren Hart
  0 siblings, 2 replies; 88+ messages in thread
From: Peter Zijlstra @ 2010-11-17 13:02 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Ingo Molnar, Darren Hart, Thomas Gleixner, LKML, Linus Torvalds,
	Andrew Morton, Steven Rostedt, Arjan van de Ven,
	Arnaldo Carvalho de Melo, Masami Hiramatsu, Tom Zanussi,
	Mathieu Desnoyers, Li Zefan, Jason Baron, David S. Miller,
	Christoph Hellwig, Pekka Enberg, Lai Jiangshan, Eric Dumazet

On Wed, 2010-11-17 at 13:53 +0100, Frederic Weisbecker wrote:
> On Wed, Nov 17, 2010 at 12:35:50PM +0100, Peter Zijlstra wrote:
> > On Wed, 2010-11-17 at 09:30 +0100, Ingo Molnar wrote:
> > > > For example I'm currently working with dozens of trace_printk() and I would be 
> > > > very happy to turn some of them off half of the time.
> > > 
> > > I guess we could try such a patch. If you send a prototype i'd be interested in 
> > > testing it out.
> > 
> > I don't see the point, the kernel shouldn't contain any trace_printk()s
> > to begin with..
> 
> 
> It's oriented toward developers. Those who use dozens of tracepoints in
> their tree because they are debugging something or developing a new feature,
> they might to deactivate/reactivate some of these independant points.
> 
> This can also apply to dynamic_printk of course.
> 
> Well, the very first and main point is to standardize trace_printk into
> a trace event so that it gets usable by perf tools. I have been asked many
> times "how to use trace_printk() with perf?".

Thing is, since its these dev who add the trace_printk()s to begin with,
I don't see the point in splitting them out, if you didn't want them why
did you add them to begin with?!

As to the trace_printk() to perf interface, you could do like mingo did
and create a fake event and use the regular tracepoint interface, or
hook it up directly and create a PERF_RECORD_TEXT field.

Personally I like the trace_printk() as a TRACE_EVENT(printk), it also
allows removing lots of the special casing concerning trace_printk from
ftrace.



^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [ANNOUNCE] New utility: 'trace'
  2010-11-17 12:51   ` Peter Zijlstra
@ 2010-11-17 13:05     ` Török Edwin
  2010-11-17 13:10       ` Peter Zijlstra
  0 siblings, 1 reply; 88+ messages in thread
From: Török Edwin @ 2010-11-17 13:05 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Thomas Gleixner, LKML, Linus Torvalds, Andrew Morton,
	Ingo Molnar, Steven Rostedt, Arjan van de Ven

On Wed, 17 Nov 2010 13:51:52 +0100
Peter Zijlstra <peterz@infradead.org> wrote:

> On Wed, 2010-11-17 at 14:47 +0200, Török Edwin wrote:
> > I see --call-graph in the trace record -h output, but it doesn't
> > seem to work on x86_64 for me.
> 
> If you want to unwind userspace you need to build everything with
> framepointers -- sadly some people have been lobbying to remove
> framepointers from all distro builds, even though on x86_64 its nearly
> free (i386 does have a significant performance benefit, sadly).
> 
> There's some work on-going to simply copy out the top of stack and let
> dwarves go wild at it in userspace, but that isn't done yet.

Even without frame-pointers I should see at least the first caller in
userspace, since that can be obtained by reading the return address of
the syscall, right?
I don't see any difference in the output of 'trace report' thats why
I'm asking it is implemented at all right now, or I need additional
flags to see it.

I do see some difference in output of 'perf record -g/perf report'
though.

Best regards,
--Edwin



^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [patch] trace: Add user-space event tracing/injection
  2010-11-17 12:58             ` Ingo Molnar
@ 2010-11-17 13:09               ` Peter Zijlstra
  2010-11-17 13:36                 ` Ingo Molnar
  0 siblings, 1 reply; 88+ messages in thread
From: Peter Zijlstra @ 2010-11-17 13:09 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Pekka Enberg, Thomas Gleixner, Steven Rostedt, Arjan van de Ven,
	Arnaldo Carvalho de Melo, Frederic Weisbecker, linux-kernel,
	Linus Torvalds, Andrew Morton, Darren Hart, Arjan van de Ven

On Wed, 2010-11-17 at 13:58 +0100, Ingo Molnar wrote:
> * Peter Zijlstra <peterz@infradead.org> wrote:

> > I don't much like it, Jato already does its own tracing for the anon_vma
> > symbols, it might as well write its own event log too (would need a
> > proper VDSO clock thingy though).
> 
> The problem is that it then does not properly mix with other events outside of the 
> control of the application.
> 
> For example if there are two apps both generating user events, but there's no 
> connection with them, a system-wide tracer wont get a properly ordered set of events 
> - both apps will trace into their own buffers. So if we have:
> 
>   CPU1
> 
>   app1: "user event X"
>   app2: "user event Y"
> 
> Then a 'trace --all' system-wide tracing session will not get proper ordering 
> between app1 and app2's events. It only gets timestamps - which may or may not be 
> correct.

I claim we can do a VDSO to the quality of the kernel/sched_clock.c
code, which basically means we can do it as good as the kernel can.

> User-space tracing schemes tend to be clumsy and limiting. There's other 
> disadvantages as well: approaches that expose a named pipe in /tmp or an shmem 
> region are not transparent and robust either: if user-space owns a pending buffer 
> then bugs in the apps can corrupt the trace buffer, can prevent its flushing when 
> the app goes down due to an app bug (and when the trace would be the most useful), 
> etc. etc.

Sure, but you're not considering the fact that Jato already needs an
interface to communicate its generated symbols, also writing its own
events really isn't a big deal after that.

> Also, in general their deployment isnt particularly fast nor lightweight - while 
> prctl() is available everywhere.

I know your reasoning, but deployment isn't everything. Technical sanity
does, I hope, still count for something as well.

> And when it comes to tracing/instrumentation, if we make deployment too complex, 
> people will simply not use it - and we all use. A prctl() isnt particularly sexy 
> design, but it's a task/process event that we are generating (so related to prctls), 
> plus it's available everywhere and is very easy to deploy.

Different tools for different people, complex applications like JITs can
use a more complex interface to communicate all their various data.

A simple printk() style interface through a syscall (preferably not
prctl) if fine too, it just doesn't suffice for everything, nor should
we want it to.

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [ANNOUNCE] New utility: 'trace'
  2010-11-17 13:05     ` Török Edwin
@ 2010-11-17 13:10       ` Peter Zijlstra
  0 siblings, 0 replies; 88+ messages in thread
From: Peter Zijlstra @ 2010-11-17 13:10 UTC (permalink / raw)
  To: Török Edwin
  Cc: Thomas Gleixner, LKML, Linus Torvalds, Andrew Morton,
	Ingo Molnar, Steven Rostedt, Arjan van de Ven

On Wed, 2010-11-17 at 15:05 +0200, Török Edwin wrote:
> On Wed, 17 Nov 2010 13:51:52 +0100
> Peter Zijlstra <peterz@infradead.org> wrote:
> 
> > On Wed, 2010-11-17 at 14:47 +0200, Török Edwin wrote:
> > > I see --call-graph in the trace record -h output, but it doesn't
> > > seem to work on x86_64 for me.
> > 
> > If you want to unwind userspace you need to build everything with
> > framepointers -- sadly some people have been lobbying to remove
> > framepointers from all distro builds, even though on x86_64 its nearly
> > free (i386 does have a significant performance benefit, sadly).
> > 
> > There's some work on-going to simply copy out the top of stack and let
> > dwarves go wild at it in userspace, but that isn't done yet.
> 
> Even without frame-pointers I should see at least the first caller in
> userspace, since that can be obtained by reading the return address of
> the syscall, right?

Right.

> I don't see any difference in the output of 'trace report' thats why
> I'm asking it is implemented at all right now, or I need additional
> flags to see it.

Ah, I haven't actually looked at any of the perf-trace code yet, so I'm
afraid I can't answer that particular question.

> I do see some difference in output of 'perf record -g/perf report'
> though.

ok.



^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [ANNOUNCE] New utility: 'trace'
  2010-11-17 13:02           ` Peter Zijlstra
@ 2010-11-17 13:10             ` Ingo Molnar
  2010-11-17 13:36               ` Peter Zijlstra
  2010-11-17 16:49             ` Darren Hart
  1 sibling, 1 reply; 88+ messages in thread
From: Ingo Molnar @ 2010-11-17 13:10 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Frederic Weisbecker, Darren Hart, Thomas Gleixner, LKML,
	Linus Torvalds, Andrew Morton, Steven Rostedt, Arjan van de Ven,
	Arnaldo Carvalho de Melo, Masami Hiramatsu, Tom Zanussi,
	Mathieu Desnoyers, Li Zefan, Jason Baron, David S. Miller,
	Christoph Hellwig, Pekka Enberg, Lai Jiangshan, Eric Dumazet


* Peter Zijlstra <peterz@infradead.org> wrote:

> On Wed, 2010-11-17 at 13:53 +0100, Frederic Weisbecker wrote:
> > On Wed, Nov 17, 2010 at 12:35:50PM +0100, Peter Zijlstra wrote:
> > > On Wed, 2010-11-17 at 09:30 +0100, Ingo Molnar wrote:
> > > > > For example I'm currently working with dozens of trace_printk() and I would be 
> > > > > very happy to turn some of them off half of the time.
> > > > 
> > > > I guess we could try such a patch. If you send a prototype i'd be interested in 
> > > > testing it out.
> > > 
> > > I don't see the point, the kernel shouldn't contain any trace_printk()s
> > > to begin with..
> > 
> > 
> > It's oriented toward developers. Those who use dozens of tracepoints in
> > their tree because they are debugging something or developing a new feature,
> > they might to deactivate/reactivate some of these independant points.
> > 
> > This can also apply to dynamic_printk of course.
> > 
> > Well, the very first and main point is to standardize trace_printk into
> > a trace event so that it gets usable by perf tools. I have been asked many
> > times "how to use trace_printk() with perf?".
> 
> Thing is, since its these dev who add the trace_printk()s to begin with, I don't 
> see the point in splitting them out, if you didn't want them why did you add them 
> to begin with?!

That's a common workflow: lots of printks (trace_printk's) put all around the code - 
and sometimes one set of tracepoints is needed, one time another set.

_If_ we succeed in presenting them like Frederic suggested it, and if we make the 
turning on/off _simpler_ (no kernel modification) and faster (no kernel reboot) via 
the tooling, people like Frederic might start using it.

I dont think we should fight the workflow itself - it makes sense.

The only question is whether we can represent it all in a nicer fashion than 'modify 
the source code and reboot'. If we cannot then there's no point - but i'm not sure 
about it and Frederic seems to be convinced too that he can make such a switch 
on/off facility intuitive. We'll only see if we try it.

Also, i dont see any harm - do you?

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [patch] trace: Add user-space event tracing/injection
  2010-11-17 12:29     ` Peter Zijlstra
  2010-11-17 12:37       ` Peter Zijlstra
@ 2010-11-17 13:10       ` Frederic Weisbecker
  2010-11-17 13:31         ` Peter Zijlstra
  1 sibling, 1 reply; 88+ messages in thread
From: Frederic Weisbecker @ 2010-11-17 13:10 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, Pekka Enberg, Thomas Gleixner, Steven Rostedt,
	Arjan van de Ven, Arnaldo Carvalho de Melo, linux-kernel,
	Linus Torvalds, Andrew Morton, Darren Hart, Arjan van de Ven

On Wed, Nov 17, 2010 at 01:29:54PM +0100, Peter Zijlstra wrote:
> On Wed, 2010-11-17 at 13:07 +0100, Ingo Molnar wrote:
> 
> > Subject: trace: Add user-space event tracing/injection
> > From: Ingo Molnar <mingo@elte.hu>
> > Date: Wed Nov 17 10:11:53 CET 2010
> > 
> > This feature (suggested by Darren Hart and Pekka Engberg) allows user-space
> > programs to print trace events in a very simple and self-contained way:
> > 
> >  #include <sys/prctl.h>
> >  #include <string.h>
> > 
> >  #define PR_TASK_PERF_USER_TRACE 35
> > 
> >  int main(void)
> >  {
> >          char *msg = "Hello World!\n";
> > 
> >          prctl(PR_TASK_PERF_USER_TRACE, msg, strlen(msg));
> > 
> >          return 0;
> >  }
> > 
> > These show up in 'trace' output as:
> > 
> >  $ trace report
> >  #
> >  # trace events of 'sleep 1':
> >  #
> >         testit/ 6006 ( 0.002 ms): <"Hello World!">
> >         testit/ 6006 ( 0.002 ms): <"Hello World!">
> > 
> > Suggested-by: Darren Hart <dvhart@linux.intel.com>
> > Suggested-by: Pekka Enberg <penberg@kernel.org>
> > Signed-off-by: Ingo Molnar <mingo@elte.hu>
> 
> I really dislike abusing prctl(), I understand your reasons, but it
> still sucks.
> 
> Also, the naming doesn't work, you've implemented a trace event, that's
> got nothing to do with perf, so the PR_TASK_PERF_ prefix is incorrect.


Why not using write() on the perf event fd?
So that you don't only limit to a custom new simple event (which should
match most needs) but you can also inject whatever existing event you like.


^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [ANNOUNCE] New utility: 'trace'
  2010-11-17  1:37 ` Ted Ts'o
  2010-11-17  1:47   ` Steven Rostedt
@ 2010-11-17 13:24   ` Ingo Molnar
  2010-11-17 13:38     ` Peter Zijlstra
  1 sibling, 1 reply; 88+ messages in thread
From: Ingo Molnar @ 2010-11-17 13:24 UTC (permalink / raw)
  To: Ted Ts'o, Thomas Gleixner, LKML, Linus Torvalds,
	Andrew Morton, Peter Zijlstra, Steven Rostedt, Arjan van de Ven,
	Arnaldo Carvalho de Melo, Frederic Weisbecker, Masami Hiramatsu,
	Tom Zanussi, Mathieu Desnoyers, Li Zefan, Jason Baron,
	David S. Miller, Christoph Hellwig, Pekka Enberg, Lai Jiangshan,
	Eric Dumazet


* Ted Ts'o <tytso@mit.edu> wrote:

> On Tue, Nov 16, 2010 at 10:04:40PM +0100, Thomas Gleixner wrote:
> > If you've booted the new kernel you can run 'trace check' to double
> > check that all events used by the tool are available:
> > 
> >  $ trace check
> > 
> >  Checking whether the kernel has all required events ...
> >   ... Checking event  raw_syscalls:sys_enter:r      : ok
> >   ...
> >   ... Checking event  sched:sched_stat_runtime:r    : ok
> >  Good: all required event types are supported by this kernel.
> >  The 'trace' utility will be fully functional.
> 
> For the benefit of people who create tracepoints, what restrictions
> does trace have with respect to event types, and is this anticipated
> to change in the future?

There are no conceptual restrictions - this v1 version of the tool uses a (small) 
subset of tracepoints right now.

ext4 tracepoints will be used once someone writes a trace code (or plugin) for it - 
i think beyond simply displaying that trace data it would also be useful to 
interpret it to a certain degree? They could also interact with the block events: so 
we could track which inode (or other ext4 data structure) generates what IO, what 
the latencies are and which task originated things. We could track what the wait 
reasons are.

At that point (when ext4 event support is added) we'd add 'trace check' test as 
well, and would generally try to make sure that if distros enable events that all 
commonly used tracepoints are available as well.

I.e. the long term goal would be to create a widely available tool, which, amongst 
other things, can be used to record and report ext4 events as well - with no kernel 
reboots or tool rebuilds required.

> > The combo diffstat of the tool is appended at the end of the mail. The
> > overwhelming majority of changes is on the tooling side - it uses
> > existing perf events facilities and features to implement the tool.
> 
> What about the filtering and other general features/functionality of
> ftrace?  Is the anticipation that this will be ported over to perf?
> What about things like blktrace?

Yeah, i'd love to see all that available. Filtering is available already on the 
kernel perf event side and could be added as an option.

> Not that I expect all of this will be working with this initial
> release, but I'm curious about the long-term roadmap of this
> interface.  (Obviously subject to change as we learn more, etc.  But
> I'd love to hear what your current thoughts and plans are.)
>
> P.S.  What about sysrq-z?  Is that going to stay with ftrace side of
> the tracing architecture?

What we are working towards is to unify the whole tracing infrastructure. IMHO it's 
not really acceptable that there is an 'ftrace side' and a 'perf side', with feature 
overlap and feature mismatch and general confusion.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [patch] trace: Add user-space event tracing/injection
  2010-11-17 13:10       ` Frederic Weisbecker
@ 2010-11-17 13:31         ` Peter Zijlstra
  2010-11-17 13:33           ` Frederic Weisbecker
  0 siblings, 1 reply; 88+ messages in thread
From: Peter Zijlstra @ 2010-11-17 13:31 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Ingo Molnar, Pekka Enberg, Thomas Gleixner, Steven Rostedt,
	Arjan van de Ven, Arnaldo Carvalho de Melo, linux-kernel,
	Linus Torvalds, Andrew Morton, Darren Hart, Arjan van de Ven

On Wed, 2010-11-17 at 14:10 +0100, Frederic Weisbecker wrote:
> 
> 
> 
> Why not using write() on the perf event fd?
> So that you don't only limit to a custom new simple event (which should
> match most needs) but you can also inject whatever existing event you like.

Because you don't have the perf event fd, the monitoring application has
it.

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [ANNOUNCE] New utility: 'trace'
  2010-11-17 12:47 ` [ANNOUNCE] New utility: 'trace' Török Edwin
  2010-11-17 12:51   ` Peter Zijlstra
@ 2010-11-17 13:32   ` Frederic Weisbecker
  1 sibling, 0 replies; 88+ messages in thread
From: Frederic Weisbecker @ 2010-11-17 13:32 UTC (permalink / raw)
  To: Török Edwin
  Cc: Thomas Gleixner, LKML, Linus Torvalds, Andrew Morton,
	Ingo Molnar, Peter Zijlstra, Steven Rostedt, Arjan van de Ven,
	Arnaldo Carvalho de Melo, Masami Hiramatsu, Tom Zanussi,
	Mathieu Desnoyers, Li Zefan, Jason Baron, David S. Miller,
	Christoph Hellwig, Pekka Enberg, Lai Jiangshan, Eric Dumazet

(Please keep the initial Cc list)


On Wed, Nov 17, 2010 at 02:47:58PM +0200, Török Edwin wrote:
> I see --call-graph in the trace record -h output, but it doesn't seem to work on x86_64 for me.
> Do I need to pass some additional parameters to trace report, do I need additional CONFIG_ turned on?
> I have:
> CONFIG_STACKTRACE_SUPPORT=y
> CONFIG_HAVE_REGS_AND_STACK_ACCESS_API=y
> # CONFIG_CC_STACKPROTECTOR is not set
> CONFIG_STACKTRACE=y
> CONFIG_USER_STACKTRACE_SUPPORT=y
> # CONFIG_STACK_TRACER is not set
> # CONFIG_DEBUG_STACKOVERFLOW is not set
> # CONFIG_DEBUG_STACK_USAGE is not set
> 
> Does it use the same code for callgraphs as ftrace, so I could choose userspace stacktrace?


So, this option comes from the fact this new tools wraps on perf record
command line. So yeah, your callchains are recorded, they are just not
reported by this new tool (yet).

Only perf report know how to deal with them for now.

You can play with that though, recording with "trace record -g" and look
at the result with "perf report", you'll get nice stats about your
callchains.

Now the tracing has yet to make use of it.

Ah and it doesn't use the same code than ftrace, but perf does kernel and
user callchain snapshot too. "-g" just record both.


^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [patch] trace: Add user-space event tracing/injection
  2010-11-17 13:31         ` Peter Zijlstra
@ 2010-11-17 13:33           ` Frederic Weisbecker
  0 siblings, 0 replies; 88+ messages in thread
From: Frederic Weisbecker @ 2010-11-17 13:33 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, Pekka Enberg, Thomas Gleixner, Steven Rostedt,
	Arjan van de Ven, Arnaldo Carvalho de Melo, linux-kernel,
	Linus Torvalds, Andrew Morton, Darren Hart, Arjan van de Ven

On Wed, Nov 17, 2010 at 02:31:27PM +0100, Peter Zijlstra wrote:
> On Wed, 2010-11-17 at 14:10 +0100, Frederic Weisbecker wrote:
> > 
> > 
> > 
> > Why not using write() on the perf event fd?
> > So that you don't only limit to a custom new simple event (which should
> > match most needs) but you can also inject whatever existing event you like.
> 
> Because you don't have the perf event fd, the monitoring application has
> it.


Ah, right.


^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [patch] trace: Add user-space event tracing/injection
  2010-11-17 13:09               ` Peter Zijlstra
@ 2010-11-17 13:36                 ` Ingo Molnar
  0 siblings, 0 replies; 88+ messages in thread
From: Ingo Molnar @ 2010-11-17 13:36 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Pekka Enberg, Thomas Gleixner, Steven Rostedt, Arjan van de Ven,
	Arnaldo Carvalho de Melo, Frederic Weisbecker, linux-kernel,
	Linus Torvalds, Andrew Morton, Darren Hart, Arjan van de Ven


* Peter Zijlstra <peterz@infradead.org> wrote:

> > User-space tracing schemes tend to be clumsy and limiting. There's other 
> > disadvantages as well: approaches that expose a named pipe in /tmp or an shmem 
> > region are not transparent and robust either: if user-space owns a pending 
> > buffer then bugs in the apps can corrupt the trace buffer, can prevent its 
> > flushing when the app goes down due to an app bug (and when the trace would be 
> > the most useful), etc. etc.
> 
> Sure, but you're not considering the fact that Jato already needs an interface to 
> communicate its generated symbols, also writing its own events really isn't a big 
> deal after that.

But Jato is special there (it's a special execution machine with its own symbol 
space) - and most apps that generate trace events are not such.

Also, while it's not a big deal to not get symbols, it's a big deal to not get trace 
events _exactly when they are needed most_: when the app crashes or corrupts itself.

I.e. the kernel does us a real and useful service of extracting and then protecting 
data.

> > Also, in general their deployment isnt particularly fast nor lightweight - while 
> > prctl() is available everywhere.
> 
> I know your reasoning, but deployment isn't everything. Technical sanity does, I 
> hope, still count for something as well.

I agree that a prctl() isnt particularly nice - a new syscall would be nicer, if it 
wasnt such a PITA to get new syscalls supported by widely available libraries like 
glibc.

But i disagree that there should be pending buffers in the tracee context. Having 
app-side data buffering introduces the sorts of problems i outlined, that the data 
can be lost or corrupted when we need _reliable_ (and non-corrupted) trace data the 
most.

We could use the vDSO approach for super-fast and super-voluminous tracing needs, 
although i really doubt that it's the common case.

Availability is the biggest issue by far - and availability is inverse proportional 
to deployment complexity.

> > And when it comes to tracing/instrumentation, if we make deployment too complex, 
> > people will simply not use it - and we all use. A prctl() isnt particularly sexy 
> > design, but it's a task/process event that we are generating (so related to 
> > prctls), plus it's available everywhere and is very easy to deploy.
> 
> Different tools for different people, complex applications like JITs can use a 
> more complex interface to communicate all their various data.

Yes but i dont want complex interfaces at all - i want rich trace data from many 
apps, so that tracing tools start to make sense.

> A simple printk() style interface through a syscall (preferably not prctl) is fine 
> too, it just doesn't suffice for everything, nor should we want it to.

Well, it covers about 80-90% of the needs, so it was the first thing i considered.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [ANNOUNCE] New utility: 'trace'
  2010-11-17 13:10             ` Ingo Molnar
@ 2010-11-17 13:36               ` Peter Zijlstra
  2010-11-17 13:43                 ` Frederic Weisbecker
  0 siblings, 1 reply; 88+ messages in thread
From: Peter Zijlstra @ 2010-11-17 13:36 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Frederic Weisbecker, Darren Hart, Thomas Gleixner, LKML,
	Linus Torvalds, Andrew Morton, Steven Rostedt, Arjan van de Ven,
	Arnaldo Carvalho de Melo, Masami Hiramatsu, Tom Zanussi,
	Mathieu Desnoyers, Li Zefan, Jason Baron, David S. Miller,
	Christoph Hellwig, Pekka Enberg, Lai Jiangshan, Eric Dumazet

On Wed, 2010-11-17 at 14:10 +0100, Ingo Molnar wrote:
> * Peter Zijlstra <peterz@infradead.org> wrote:
> 
> > On Wed, 2010-11-17 at 13:53 +0100, Frederic Weisbecker wrote:
> > > On Wed, Nov 17, 2010 at 12:35:50PM +0100, Peter Zijlstra wrote:
> > > > On Wed, 2010-11-17 at 09:30 +0100, Ingo Molnar wrote:
> > > > > > For example I'm currently working with dozens of trace_printk() and I would be 
> > > > > > very happy to turn some of them off half of the time.
> > > > > 
> > > > > I guess we could try such a patch. If you send a prototype i'd be interested in 
> > > > > testing it out.
> > > > 
> > > > I don't see the point, the kernel shouldn't contain any trace_printk()s
> > > > to begin with..
> > > 
> > > 
> > > It's oriented toward developers. Those who use dozens of tracepoints in
> > > their tree because they are debugging something or developing a new feature,
> > > they might to deactivate/reactivate some of these independant points.
> > > 
> > > This can also apply to dynamic_printk of course.
> > > 
> > > Well, the very first and main point is to standardize trace_printk into
> > > a trace event so that it gets usable by perf tools. I have been asked many
> > > times "how to use trace_printk() with perf?".
> > 
> > Thing is, since its these dev who add the trace_printk()s to begin with, I don't 
> > see the point in splitting them out, if you didn't want them why did you add them 
> > to begin with?!
> 
> That's a common workflow: lots of printks (trace_printk's) put all around the code - 
> and sometimes one set of tracepoints is needed, one time another set.
> 
> _If_ we succeed in presenting them like Frederic suggested it, and if we make the 
> turning on/off _simpler_ (no kernel modification) and faster (no kernel reboot) via 
> the tooling, people like Frederic might start using it.
> 
> I dont think we should fight the workflow itself - it makes sense.
> 
> The only question is whether we can represent it all in a nicer fashion than 'modify 
> the source code and reboot'. If we cannot then there's no point - but i'm not sure 
> about it and Frederic seems to be convinced too that he can make such a switch 
> on/off facility intuitive. We'll only see if we try it.
> 
> Also, i dont see any harm - do you?

Yes, trace_printk() is a pure debug interface, solely meant for the edit
+ reboot cycle.

If you want anything more than that we've got tracepoints. The rule up
until now has been to never merge a trace_printk() user.

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [ANNOUNCE] New utility: 'trace'
  2010-11-17 13:24   ` Ingo Molnar
@ 2010-11-17 13:38     ` Peter Zijlstra
  2010-11-17 14:00       ` Ingo Molnar
  0 siblings, 1 reply; 88+ messages in thread
From: Peter Zijlstra @ 2010-11-17 13:38 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Ted Ts'o, Thomas Gleixner, LKML, Linus Torvalds,
	Andrew Morton, Steven Rostedt, Arjan van de Ven,
	Arnaldo Carvalho de Melo, Frederic Weisbecker, Masami Hiramatsu,
	Tom Zanussi, Mathieu Desnoyers, Li Zefan, Jason Baron,
	David S. Miller, Christoph Hellwig, Pekka Enberg, Lai Jiangshan,
	Eric Dumazet

On Wed, 2010-11-17 at 14:24 +0100, Ingo Molnar wrote:
> > What about the filtering and other general features/functionality of
> > ftrace?  Is the anticipation that this will be ported over to perf?
> > What about things like blktrace?
> 
> Yeah, i'd love to see all that available. Filtering is available already on the 
> kernel perf event side and could be added as an option. 

Except I utterly detest the existing filter code.. it runs _after_ we do
all the hard work and then makes us roll back when it decides we
shouldn't output after all.

But yes, it is functional.

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [ANNOUNCE] New utility: 'trace'
  2010-11-17 13:36               ` Peter Zijlstra
@ 2010-11-17 13:43                 ` Frederic Weisbecker
  2010-11-17 13:53                   ` Peter Zijlstra
  0 siblings, 1 reply; 88+ messages in thread
From: Frederic Weisbecker @ 2010-11-17 13:43 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, Darren Hart, Thomas Gleixner, LKML, Linus Torvalds,
	Andrew Morton, Steven Rostedt, Arjan van de Ven,
	Arnaldo Carvalho de Melo, Masami Hiramatsu, Tom Zanussi,
	Mathieu Desnoyers, Li Zefan, Jason Baron, David S. Miller,
	Christoph Hellwig, Pekka Enberg, Lai Jiangshan, Eric Dumazet

On Wed, Nov 17, 2010 at 02:36:16PM +0100, Peter Zijlstra wrote:
> On Wed, 2010-11-17 at 14:10 +0100, Ingo Molnar wrote:
> > * Peter Zijlstra <peterz@infradead.org> wrote:
> > 
> > > On Wed, 2010-11-17 at 13:53 +0100, Frederic Weisbecker wrote:
> > > > On Wed, Nov 17, 2010 at 12:35:50PM +0100, Peter Zijlstra wrote:
> > > > > On Wed, 2010-11-17 at 09:30 +0100, Ingo Molnar wrote:
> > > > > > > For example I'm currently working with dozens of trace_printk() and I would be 
> > > > > > > very happy to turn some of them off half of the time.
> > > > > > 
> > > > > > I guess we could try such a patch. If you send a prototype i'd be interested in 
> > > > > > testing it out.
> > > > > 
> > > > > I don't see the point, the kernel shouldn't contain any trace_printk()s
> > > > > to begin with..
> > > > 
> > > > 
> > > > It's oriented toward developers. Those who use dozens of tracepoints in
> > > > their tree because they are debugging something or developing a new feature,
> > > > they might to deactivate/reactivate some of these independant points.
> > > > 
> > > > This can also apply to dynamic_printk of course.
> > > > 
> > > > Well, the very first and main point is to standardize trace_printk into
> > > > a trace event so that it gets usable by perf tools. I have been asked many
> > > > times "how to use trace_printk() with perf?".
> > > 
> > > Thing is, since its these dev who add the trace_printk()s to begin with, I don't 
> > > see the point in splitting them out, if you didn't want them why did you add them 
> > > to begin with?!
> > 
> > That's a common workflow: lots of printks (trace_printk's) put all around the code - 
> > and sometimes one set of tracepoints is needed, one time another set.
> > 
> > _If_ we succeed in presenting them like Frederic suggested it, and if we make the 
> > turning on/off _simpler_ (no kernel modification) and faster (no kernel reboot) via 
> > the tooling, people like Frederic might start using it.
> > 
> > I dont think we should fight the workflow itself - it makes sense.
> > 
> > The only question is whether we can represent it all in a nicer fashion than 'modify 
> > the source code and reboot'. If we cannot then there's no point - but i'm not sure 
> > about it and Frederic seems to be convinced too that he can make such a switch 
> > on/off facility intuitive. We'll only see if we try it.
> > 
> > Also, i dont see any harm - do you?
> 
> Yes, trace_printk() is a pure debug interface, solely meant for the edit
> + reboot cycle.

So why prevent from making it even more handy?


> If you want anything more than that we've got tracepoints. The rule up
> until now has been to never merge a trace_printk() user.

Sure, that doesn't change the core idea of trace_prink(): none of them must
be merged. That new event interface would just make private uses of trace_printk()
more convenient.


^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [ANNOUNCE] New utility: 'trace'
  2010-11-17 13:43                 ` Frederic Weisbecker
@ 2010-11-17 13:53                   ` Peter Zijlstra
  2010-11-17 14:10                     ` Frederic Weisbecker
  0 siblings, 1 reply; 88+ messages in thread
From: Peter Zijlstra @ 2010-11-17 13:53 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Ingo Molnar, Darren Hart, Thomas Gleixner, LKML, Linus Torvalds,
	Andrew Morton, Steven Rostedt, Arjan van de Ven,
	Arnaldo Carvalho de Melo, Masami Hiramatsu, Tom Zanussi,
	Mathieu Desnoyers, Li Zefan, Jason Baron, David S. Miller,
	Christoph Hellwig, Pekka Enberg, Lai Jiangshan, Eric Dumazet

On Wed, 2010-11-17 at 14:43 +0100, Frederic Weisbecker wrote:
> 
> > Yes, trace_printk() is a pure debug interface, solely meant for the edit
> > + reboot cycle.
> 
> So why prevent from making it even more handy?
> 
> 
> > If you want anything more than that we've got tracepoints. The rule up
> > until now has been to never merge a trace_printk() user.
> 
> Sure, that doesn't change the core idea of trace_prink(): none of them must
> be merged. That new event interface would just make private uses of trace_printk()
> more convenient. 

I don't get it, if you don't want it, why put it there in the first
place?

I've never found myself thinking, oh damn, I didn't want to see that
one!, only damn, I should have added more :-)

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [ANNOUNCE] New utility: 'trace'
  2010-11-17 13:38     ` Peter Zijlstra
@ 2010-11-17 14:00       ` Ingo Molnar
  2010-11-17 14:11         ` Peter Zijlstra
  0 siblings, 1 reply; 88+ messages in thread
From: Ingo Molnar @ 2010-11-17 14:00 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ted Ts'o, Thomas Gleixner, LKML, Linus Torvalds,
	Andrew Morton, Steven Rostedt, Arjan van de Ven,
	Arnaldo Carvalho de Melo, Frederic Weisbecker, Masami Hiramatsu,
	Tom Zanussi, Mathieu Desnoyers, Li Zefan, Jason Baron,
	David S. Miller, Christoph Hellwig, Pekka Enberg, Lai Jiangshan,
	Eric Dumazet


* Peter Zijlstra <peterz@infradead.org> wrote:

> On Wed, 2010-11-17 at 14:24 +0100, Ingo Molnar wrote:
> > > What about the filtering and other general features/functionality of
> > > ftrace?  Is the anticipation that this will be ported over to perf?
> > > What about things like blktrace?
> > 
> > Yeah, i'd love to see all that available. Filtering is available already on the 
> > kernel perf event side and could be added as an option. 
> 
> Except I utterly detest the existing filter code.. it runs _after_ we do all the 
> hard work and then makes us roll back when it decides we shouldn't output after 
> all.
> 
> But yes, it is functional.

I suspect that is what matters mostly - unless you think that it's impossible to 
have a sane implementation of it, if the users come.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [ANNOUNCE] New utility: 'trace'
  2010-11-17 13:53                   ` Peter Zijlstra
@ 2010-11-17 14:10                     ` Frederic Weisbecker
  2010-11-17 15:02                       ` Steven Rostedt
  2010-11-17 18:13                       ` Ted Ts'o
  0 siblings, 2 replies; 88+ messages in thread
From: Frederic Weisbecker @ 2010-11-17 14:10 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, Darren Hart, Thomas Gleixner, LKML, Linus Torvalds,
	Andrew Morton, Steven Rostedt, Arjan van de Ven,
	Arnaldo Carvalho de Melo, Masami Hiramatsu, Tom Zanussi,
	Mathieu Desnoyers, Li Zefan, Jason Baron, David S. Miller,
	Christoph Hellwig, Pekka Enberg, Lai Jiangshan, Eric Dumazet

On Wed, Nov 17, 2010 at 02:53:49PM +0100, Peter Zijlstra wrote:
> On Wed, 2010-11-17 at 14:43 +0100, Frederic Weisbecker wrote:
> > 
> > > Yes, trace_printk() is a pure debug interface, solely meant for the edit
> > > + reboot cycle.
> > 
> > So why prevent from making it even more handy?
> > 
> > 
> > > If you want anything more than that we've got tracepoints. The rule up
> > > until now has been to never merge a trace_printk() user.
> > 
> > Sure, that doesn't change the core idea of trace_prink(): none of them must
> > be merged. That new event interface would just make private uses of trace_printk()
> > more convenient. 
> 
> I don't get it, if you don't want it, why put it there in the first
> place?
> 
> I've never found myself thinking, oh damn, I didn't want to see that
> one!, only damn, I should have added more :-)


Hehe :)

Yeah I have a strange workflow. I'm working on that CPU isolation thing
and I have dozens of trace_printk all over the place for tons of
things. And everytime I remove one to unwind some output or to focus
on another one, I often have to restore it later because I need it
again. Usually I even just comment it out instead of removing it.

If I could make this dynamically on a per line filtering, or sometimes on
a per file granularity (as both are equally often the case for me), I would
probably win some time.

I just don't know how many developers have a similar workflow than mine.


^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [ANNOUNCE] New utility: 'trace'
  2010-11-17 14:00       ` Ingo Molnar
@ 2010-11-17 14:11         ` Peter Zijlstra
  2010-11-17 14:18           ` Peter Zijlstra
                             ` (2 more replies)
  0 siblings, 3 replies; 88+ messages in thread
From: Peter Zijlstra @ 2010-11-17 14:11 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Ted Ts'o, Thomas Gleixner, LKML, Linus Torvalds,
	Andrew Morton, Steven Rostedt, Arjan van de Ven,
	Arnaldo Carvalho de Melo, Frederic Weisbecker, Masami Hiramatsu,
	Tom Zanussi, Mathieu Desnoyers, Li Zefan, Jason Baron,
	David S. Miller, Christoph Hellwig, Pekka Enberg, Lai Jiangshan,
	Eric Dumazet

On Wed, 2010-11-17 at 15:00 +0100, Ingo Molnar wrote:
> > But yes, it is functional.
> 
> I suspect that is what matters mostly - unless you think that it's impossible to 
> have a sane implementation of it, if the users come. 

I'm not sure, I raised the point several times, and I think Steve and
Tom though it was possible to implement sanely, but I'm not sure if it
was expression invariant (the fix that is).

It would be very good to have a definite answer on that.

The idea was to not let the filter engine work on the trace data (once
its gathered) but on the trace argument right at the beginning of the
tracepoint callchain, since some of the trace data is an expression of
the trace argument (say next->prio instead of next), the trace
expression wouldn't stay invariant, you'd have to write a different
filter for the same effect.

So I think it would be wise to make this change sooner rather than
later.

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [ANNOUNCE] New utility: 'trace'
  2010-11-17 14:11         ` Peter Zijlstra
@ 2010-11-17 14:18           ` Peter Zijlstra
  2010-11-17 14:37             ` Thomas Gleixner
  2010-11-17 15:10           ` Steven Rostedt
  2010-11-17 15:33           ` Tom Zanussi
  2 siblings, 1 reply; 88+ messages in thread
From: Peter Zijlstra @ 2010-11-17 14:18 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Ted Ts'o, Thomas Gleixner, LKML, Linus Torvalds,
	Andrew Morton, Steven Rostedt, Arjan van de Ven,
	Arnaldo Carvalho de Melo, Frederic Weisbecker, Masami Hiramatsu,
	Tom Zanussi, Mathieu Desnoyers, Li Zefan, Jason Baron,
	David S. Miller, Christoph Hellwig, Pekka Enberg, Lai Jiangshan,
	Eric Dumazet

On Wed, 2010-11-17 at 15:11 +0100, Peter Zijlstra wrote:
> The idea was to not let the filter engine work on the trace data (once
> its gathered) but on the trace argument right at the beginning of the
> tracepoint callchain, since some of the trace data is an expression of
> the trace argument (say next->prio instead of next), the trace
> expression wouldn't stay invariant, you'd have to write a different
> filter for the same effect.
> 
> So I think it would be wise to make this change sooner rather than
> later. 

Also, I see a lot of overlap with the dynamic probes stuff which needs
the help of magic dwarves to come up with the right data to gather.

Merging the dynamic tracepoint and filter stuff would be nice, there's
no way you can express next->prio without the help of these short
buggers with large axes.

The trouble is that the dynamic tracepoint stuff is privileged for good
reasons, try next+0x1000000 and you're out in the woods, priv. only
filters otoh just doesn't sound too hot.

Another nasty thing is that you actually need to have these dwarves
present, which means having the -debug package installed.



^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [ANNOUNCE] New utility: 'trace'
  2010-11-17 14:18           ` Peter Zijlstra
@ 2010-11-17 14:37             ` Thomas Gleixner
  2010-11-17 14:41               ` Peter Zijlstra
  2010-11-17 15:41               ` Tom Zanussi
  0 siblings, 2 replies; 88+ messages in thread
From: Thomas Gleixner @ 2010-11-17 14:37 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, Ted Ts'o, LKML, Linus Torvalds, Andrew Morton,
	Steven Rostedt, Arjan van de Ven, Arnaldo Carvalho de Melo,
	Frederic Weisbecker, Masami Hiramatsu, Tom Zanussi,
	Mathieu Desnoyers, Li Zefan, Jason Baron, David S. Miller,
	Christoph Hellwig, Pekka Enberg, Lai Jiangshan, Eric Dumazet

On Wed, 17 Nov 2010, Peter Zijlstra wrote:

> On Wed, 2010-11-17 at 15:11 +0100, Peter Zijlstra wrote:
> > The idea was to not let the filter engine work on the trace data (once
> > its gathered) but on the trace argument right at the beginning of the
> > tracepoint callchain, since some of the trace data is an expression of
> > the trace argument (say next->prio instead of next), the trace
> > expression wouldn't stay invariant, you'd have to write a different
> > filter for the same effect.
> > 
> > So I think it would be wise to make this change sooner rather than
> > later. 
> 
> Also, I see a lot of overlap with the dynamic probes stuff which needs
> the help of magic dwarves to come up with the right data to gather.
> 
> Merging the dynamic tracepoint and filter stuff would be nice, there's
> no way you can express next->prio without the help of these short
> buggers with large axes.
> 
> The trouble is that the dynamic tracepoint stuff is privileged for good
> reasons, try next+0x1000000 and you're out in the woods, priv. only
> filters otoh just doesn't sound too hot.
> 
> Another nasty thing is that you actually need to have these dwarves
> present, which means having the -debug package installed.

That sounds utterly insane for the basic use case where you trace
application context. There is no point to filter out the tracepoints,
really. Postprocessing can do that just fine.

I consider myself a power user of tracing, but hell I never used any
of those filters in a real use case. awk, grep, scripting languages do
just a better job and you don't miss any data just because you got
your filter expression slightly wrong. Postprocessing always wins in
that regard.

The only reason why these filters might make sense is to reduce the
trace volume on system wide traces in production environments, but
that's a completely different story. These scenarios can do with the
dynamic tracepoint stuff and the custom filtering w/o putting the
burden on the majority of users.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [ANNOUNCE] New utility: 'trace'
  2010-11-17 14:37             ` Thomas Gleixner
@ 2010-11-17 14:41               ` Peter Zijlstra
  2010-11-17 15:41               ` Tom Zanussi
  1 sibling, 0 replies; 88+ messages in thread
From: Peter Zijlstra @ 2010-11-17 14:41 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Ingo Molnar, Ted Ts'o, LKML, Linus Torvalds, Andrew Morton,
	Steven Rostedt, Arjan van de Ven, Arnaldo Carvalho de Melo,
	Frederic Weisbecker, Masami Hiramatsu, Tom Zanussi,
	Mathieu Desnoyers, Li Zefan, Jason Baron, David S. Miller,
	Christoph Hellwig, Pekka Enberg, Lai Jiangshan, Eric Dumazet

On Wed, 2010-11-17 at 15:37 +0100, Thomas Gleixner wrote:
> 
> The only reason why these filters might make sense is to reduce the
> trace volume on system wide traces in production environments, but
> that's a completely different story. These scenarios can do with the
> dynamic tracepoint stuff and the custom filtering w/o putting the
> burden on the majority of users. 

I've been hearing people claim 'enterprise' and custom are like mutually
exclusive ;-)

But yeah, I totally agree, I've never used any of that filter stuff
either, I always script the post-processing.

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [ANNOUNCE] New utility: 'trace'
  2010-11-17 14:10                     ` Frederic Weisbecker
@ 2010-11-17 15:02                       ` Steven Rostedt
  2010-11-17 18:13                       ` Ted Ts'o
  1 sibling, 0 replies; 88+ messages in thread
From: Steven Rostedt @ 2010-11-17 15:02 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Peter Zijlstra, Ingo Molnar, Darren Hart, Thomas Gleixner, LKML,
	Linus Torvalds, Andrew Morton, Arjan van de Ven,
	Arnaldo Carvalho de Melo, Masami Hiramatsu, Tom Zanussi,
	Mathieu Desnoyers, Li Zefan, Jason Baron, David S. Miller,
	Christoph Hellwig, Pekka Enberg, Lai Jiangshan, Eric Dumazet

On Wed, 2010-11-17 at 15:10 +0100, Frederic Weisbecker wrote:

> Yeah I have a strange workflow. I'm working on that CPU isolation thing
> and I have dozens of trace_printk all over the place for tons of
> things. And everytime I remove one to unwind some output or to focus
> on another one, I often have to restore it later because I need it
> again. Usually I even just comment it out instead of removing it.
> 
> If I could make this dynamically on a per line filtering, or sometimes on
> a per file granularity (as both are equally often the case for me), I would
> probably win some time.
> 
> I just don't know how many developers have a similar workflow than mine.

I usually wrap trace_printk() (or printk sometimes) in my own wrapper:

#define sprintk(on, x...) do { if (on) { trace_prinkt(x); } } while (0)

Then have things like:

#define SDR_DEBUG_IRQS 0
#define SDR_DEBUG_SCHED 1
[...]

and in the irq code I'll have:

	sprintk(SDR_DEBUG_IRQS, "printme"...);

and the sched code:

	sprintk(SDR_DEBUG_SCHED, "printthis"...);

And I can easily enable or disable the prints I want. Sometimes I'll
define it as a variable, and enable them either in certain code paths,
or export them in the debugfs system and enable or disable them from
userspace.

extern int sdr_debug_irqs;
#define SDR_DEBUG_IRQS sdr_debug_irqs

[...]
int sdr_debug_irqs;
[...]

	debugfs_create_bool("sdr_debug_sched", 0644, NULL, &sdr_debug_irqs);

Of course this is all for temporary debugging. I use SDR and sdr and
sprintk to search and delete the debugging output when done.

-- Steve



^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [ANNOUNCE] New utility: 'trace'
  2010-11-17 14:11         ` Peter Zijlstra
  2010-11-17 14:18           ` Peter Zijlstra
@ 2010-11-17 15:10           ` Steven Rostedt
  2010-11-17 15:43             ` Peter Zijlstra
  2010-11-17 15:33           ` Tom Zanussi
  2 siblings, 1 reply; 88+ messages in thread
From: Steven Rostedt @ 2010-11-17 15:10 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, Ted Ts'o, Thomas Gleixner, LKML, Linus Torvalds,
	Andrew Morton, Arjan van de Ven, Arnaldo Carvalho de Melo,
	Frederic Weisbecker, Masami Hiramatsu, Tom Zanussi,
	Mathieu Desnoyers, Li Zefan, Jason Baron, David S. Miller,
	Christoph Hellwig, Pekka Enberg, Lai Jiangshan, Eric Dumazet

On Wed, 2010-11-17 at 15:11 +0100, Peter Zijlstra wrote:
> On Wed, 2010-11-17 at 15:00 +0100, Ingo Molnar wrote:
> > > But yes, it is functional.
> > 
> > I suspect that is what matters mostly - unless you think that it's impossible to 
> > have a sane implementation of it, if the users come. 
> 
> I'm not sure, I raised the point several times, and I think Steve and
> Tom though it was possible to implement sanely, but I'm not sure if it
> was expression invariant (the fix that is).
> 
> It would be very good to have a definite answer on that.
> 
> The idea was to not let the filter engine work on the trace data (once
> its gathered) but on the trace argument right at the beginning of the
> tracepoint callchain, since some of the trace data is an expression of
> the trace argument (say next->prio instead of next), the trace
> expression wouldn't stay invariant, you'd have to write a different
> filter for the same effect.
> 
> So I think it would be wise to make this change sooner rather than
> later.

Right, the problem with filtering is what do we want to filter, and what
about copying?

Currently, we copy the data into the buffer and then filter on that
data. We could also easily filter on the parameters of the tracepoint,
but sometimes those parameters do not match the final output (as the
case with sched_switch). Do we copy the data into a separate "per cpu"
temp buffer, and figure out the filter then? And if the filter is fine,
then copy into the buffer. This obviously is slow, due to the multiple
copies. We could do this only if the filtering is enabled.

Note, the stable event proposal I suggested passes the same info via the
trace callback parameters that is expected to go into the buffer. Arrays
are passed as pointers so the copying is of the minimum. This would give
us the ability to filter _before_ writing into the buffer, and no
unnecessary copying.

-- Steve



^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [ANNOUNCE] New utility: 'trace'
  2010-11-17 14:11         ` Peter Zijlstra
  2010-11-17 14:18           ` Peter Zijlstra
  2010-11-17 15:10           ` Steven Rostedt
@ 2010-11-17 15:33           ` Tom Zanussi
  2010-11-17 18:36             ` Mathieu Desnoyers
  2 siblings, 1 reply; 88+ messages in thread
From: Tom Zanussi @ 2010-11-17 15:33 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, Ted Ts'o, Thomas Gleixner, LKML, Linus Torvalds,
	Andrew Morton, Steven Rostedt, Arjan van de Ven,
	Arnaldo Carvalho de Melo, Frederic Weisbecker, Masami Hiramatsu,
	Mathieu Desnoyers, Li Zefan, Jason Baron, David S. Miller,
	Christoph Hellwig, Pekka Enberg, Lai Jiangshan, Eric Dumazet

On Wed, 2010-11-17 at 15:11 +0100, Peter Zijlstra wrote:
> On Wed, 2010-11-17 at 15:00 +0100, Ingo Molnar wrote:
> > > But yes, it is functional.
> > 
> > I suspect that is what matters mostly - unless you think that it's impossible to 
> > have a sane implementation of it, if the users come. 
> 
> I'm not sure, I raised the point several times, and I think Steve and
> Tom though it was possible to implement sanely, but I'm not sure if it
> was expression invariant (the fix that is).
> 
> It would be very good to have a definite answer on that.
> 
> The idea was to not let the filter engine work on the trace data (once
> its gathered) but on the trace argument right at the beginning of the
> tracepoint callchain, since some of the trace data is an expression of
> the trace argument (say next->prio instead of next), the trace
> expression wouldn't stay invariant, you'd have to write a different
> filter for the same effect.
> 

IIRC, I think the conclusion we came to was that it could be done
mechanically if for example the right-hand-side of an assignment in
TP_fast_assign() only involved a simple variable assignment, but as
Steve pointed out, some assignments are more complicated than that.

For example, in the sched_switch tracepoint assignments:

	__entry->prev_prio      = prev->prio;
        __entry->prev_state     = __trace_sched_switch_state(prev);

so the prev_prio should be able to be tested 'in-line' but the
prev_state would require a temporary buffer to write the value into
before doing the test as mentioned by Steve.  At which point you're no
further ahead (in that case) than the current situation...

Tom

> So I think it would be wise to make this change sooner rather than
> later.



^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [ANNOUNCE] New utility: 'trace'
  2010-11-17 14:37             ` Thomas Gleixner
  2010-11-17 14:41               ` Peter Zijlstra
@ 2010-11-17 15:41               ` Tom Zanussi
  2010-11-17 15:55                 ` Avi Kivity
  1 sibling, 1 reply; 88+ messages in thread
From: Tom Zanussi @ 2010-11-17 15:41 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Peter Zijlstra, Ingo Molnar, Ted Ts'o, LKML, Linus Torvalds,
	Andrew Morton, Steven Rostedt, Arjan van de Ven,
	Arnaldo Carvalho de Melo, Frederic Weisbecker, Masami Hiramatsu,
	Mathieu Desnoyers, Li Zefan, Jason Baron, David S. Miller,
	Christoph Hellwig, Pekka Enberg, Lai Jiangshan, Eric Dumazet

On Wed, 2010-11-17 at 15:37 +0100, Thomas Gleixner wrote:
> On Wed, 17 Nov 2010, Peter Zijlstra wrote:
> 
> > On Wed, 2010-11-17 at 15:11 +0100, Peter Zijlstra wrote:
> > > The idea was to not let the filter engine work on the trace data (once
> > > its gathered) but on the trace argument right at the beginning of the
> > > tracepoint callchain, since some of the trace data is an expression of
> > > the trace argument (say next->prio instead of next), the trace
> > > expression wouldn't stay invariant, you'd have to write a different
> > > filter for the same effect.
> > > 
> > > So I think it would be wise to make this change sooner rather than
> > > later. 
> > 
> > Also, I see a lot of overlap with the dynamic probes stuff which needs
> > the help of magic dwarves to come up with the right data to gather.
> > 
> > Merging the dynamic tracepoint and filter stuff would be nice, there's
> > no way you can express next->prio without the help of these short
> > buggers with large axes.
> > 
> > The trouble is that the dynamic tracepoint stuff is privileged for good
> > reasons, try next+0x1000000 and you're out in the woods, priv. only
> > filters otoh just doesn't sound too hot.
> > 
> > Another nasty thing is that you actually need to have these dwarves
> > present, which means having the -debug package installed.
> 
> That sounds utterly insane for the basic use case where you trace
> application context. There is no point to filter out the tracepoints,
> really. Postprocessing can do that just fine.
> 
> I consider myself a power user of tracing, but hell I never used any
> of those filters in a real use case. awk, grep, scripting languages do
> just a better job and you don't miss any data just because you got
> your filter expression slightly wrong. Postprocessing always wins in
> that regard.
> 
> The only reason why these filters might make sense is to reduce the
> trace volume on system wide traces in production environments, but
> that's a completely different story. These scenarios can do with the
> dynamic tracepoint stuff and the custom filtering w/o putting the
> burden on the majority of users.
> 

Yeah, in my mind, the main point of the filters was to be a 'blunt
force' instrument preventing userspace from being overwhelmed by events.
The real filtering would happen in userspace with e.g. real scripting
languages.

Tom 

> Thanks,
> 
> 	tglx



^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [ANNOUNCE] New utility: 'trace'
  2010-11-17 15:10           ` Steven Rostedt
@ 2010-11-17 15:43             ` Peter Zijlstra
  2010-11-17 16:04               ` Steven Rostedt
                                 ` (3 more replies)
  0 siblings, 4 replies; 88+ messages in thread
From: Peter Zijlstra @ 2010-11-17 15:43 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Ingo Molnar, Ted Ts'o, Thomas Gleixner, LKML, Linus Torvalds,
	Andrew Morton, Arjan van de Ven, Arnaldo Carvalho de Melo,
	Frederic Weisbecker, Masami Hiramatsu, Tom Zanussi,
	Mathieu Desnoyers, Li Zefan, Jason Baron, David S. Miller,
	Christoph Hellwig, Pekka Enberg, Lai Jiangshan, Eric Dumazet

On Wed, 2010-11-17 at 10:10 -0500, Steven Rostedt wrote:
> 
> Right, the problem with filtering is what do we want to filter, and what
> about copying?
> 
> Currently, we copy the data into the buffer and then filter on that
> data. We could also easily filter on the parameters of the tracepoint,
> but sometimes those parameters do not match the final output (as the
> case with sched_switch). Do we copy the data into a separate "per cpu"
> temp buffer, and figure out the filter then? And if the filter is fine,
> then copy into the buffer. This obviously is slow, due to the multiple
> copies. We could do this only if the filtering is enabled.

Right, so what is the primary purpose of this filtering stuff? As it
stands it makes stuff terribly slow, so you add overhead but the win
(presumably) is less data output, is that a sane trade-off?

Most people I've heard -- both at LinuxCon.JP and LPC -- are asking for
lower overhead tracing (while at the same time demanding more features).

Who are actually using this and can they provide some input on this?


^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [ANNOUNCE] New utility: 'trace'
  2010-11-17 15:41               ` Tom Zanussi
@ 2010-11-17 15:55                 ` Avi Kivity
  2010-11-17 15:58                   ` Thomas Gleixner
  2010-11-17 15:59                   ` Peter Zijlstra
  0 siblings, 2 replies; 88+ messages in thread
From: Avi Kivity @ 2010-11-17 15:55 UTC (permalink / raw)
  To: Tom Zanussi
  Cc: Thomas Gleixner, Peter Zijlstra, Ingo Molnar, Ted Ts'o, LKML,
	Linus Torvalds, Andrew Morton, Steven Rostedt, Arjan van de Ven,
	Arnaldo Carvalho de Melo, Frederic Weisbecker, Masami Hiramatsu,
	Mathieu Desnoyers, Li Zefan, Jason Baron, David S. Miller,
	Christoph Hellwig, Pekka Enberg, Lai Jiangshan, Eric Dumazet

On 11/17/2010 05:41 PM, Tom Zanussi wrote:
>
>> The only reason why these filters might make sense is to reduce the
>> trace volume on system wide traces in production environments, but
>> that's a completely different story. These scenarios can do with the
>> dynamic tracepoint stuff and the custom filtering w/o putting the
>> burden on the majority of users.
>>
> Yeah, in my mind, the main point of the filters was to be a 'blunt
> force' instrument preventing userspace from being overwhelmed by events.
> The real filtering would happen in userspace with e.g. real scripting
> languages.
>

Filtering is also useful for fine-grained perf events; this cannot be 
done in userspace.


^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [ANNOUNCE] New utility: 'trace'
  2010-11-17 15:55                 ` Avi Kivity
@ 2010-11-17 15:58                   ` Thomas Gleixner
  2010-11-17 16:20                     ` Avi Kivity
  2010-11-17 15:59                   ` Peter Zijlstra
  1 sibling, 1 reply; 88+ messages in thread
From: Thomas Gleixner @ 2010-11-17 15:58 UTC (permalink / raw)
  To: Avi Kivity
  Cc: Tom Zanussi, Peter Zijlstra, Ingo Molnar, Ted Ts'o, LKML,
	Linus Torvalds, Andrew Morton, Steven Rostedt, Arjan van de Ven,
	Arnaldo Carvalho de Melo, Frederic Weisbecker, Masami Hiramatsu,
	Mathieu Desnoyers, Li Zefan, Jason Baron, David S. Miller,
	Christoph Hellwig, Pekka Enberg, Lai Jiangshan, Eric Dumazet

On Wed, 17 Nov 2010, Avi Kivity wrote:

> On 11/17/2010 05:41 PM, Tom Zanussi wrote:
> > 
> > > The only reason why these filters might make sense is to reduce the
> > > trace volume on system wide traces in production environments, but
> > > that's a completely different story. These scenarios can do with the
> > > dynamic tracepoint stuff and the custom filtering w/o putting the
> > > burden on the majority of users.
> > > 
> > Yeah, in my mind, the main point of the filters was to be a 'blunt
> > force' instrument preventing userspace from being overwhelmed by events.
> > The real filtering would happen in userspace with e.g. real scripting
> > languages.
> > 
> 
> Filtering is also useful for fine-grained perf events; this cannot be done in
> userspace.

Could you explain that a bit more detailed please ?

Thanks,

	tglx
 

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [ANNOUNCE] New utility: 'trace'
  2010-11-17 15:55                 ` Avi Kivity
  2010-11-17 15:58                   ` Thomas Gleixner
@ 2010-11-17 15:59                   ` Peter Zijlstra
  2010-11-17 16:08                     ` Avi Kivity
  1 sibling, 1 reply; 88+ messages in thread
From: Peter Zijlstra @ 2010-11-17 15:59 UTC (permalink / raw)
  To: Avi Kivity
  Cc: Tom Zanussi, Thomas Gleixner, Ingo Molnar, Ted Ts'o, LKML,
	Linus Torvalds, Andrew Morton, Steven Rostedt, Arjan van de Ven,
	Arnaldo Carvalho de Melo, Frederic Weisbecker, Masami Hiramatsu,
	Mathieu Desnoyers, Li Zefan, Jason Baron, David S. Miller,
	Christoph Hellwig, Pekka Enberg, Lai Jiangshan, Eric Dumazet

On Wed, 2010-11-17 at 17:55 +0200, Avi Kivity wrote:
> 
> Filtering is also useful for fine-grained perf events; this cannot be 
> done in userspace.
> 
You're hinting at the kvm_exit histogram thing you've constructed,
right?



^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [ANNOUNCE] New utility: 'trace'
  2010-11-17 15:43             ` Peter Zijlstra
@ 2010-11-17 16:04               ` Steven Rostedt
  2010-11-17 16:15               ` Steven Rostedt
                                 ` (2 subsequent siblings)
  3 siblings, 0 replies; 88+ messages in thread
From: Steven Rostedt @ 2010-11-17 16:04 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, Ted Ts'o, Thomas Gleixner, LKML, Linus Torvalds,
	Andrew Morton, Arjan van de Ven, Arnaldo Carvalho de Melo,
	Frederic Weisbecker, Masami Hiramatsu, Tom Zanussi,
	Mathieu Desnoyers, Li Zefan, Jason Baron, David S. Miller,
	Christoph Hellwig, Pekka Enberg, Lai Jiangshan, Eric Dumazet

On Wed, 2010-11-17 at 16:43 +0100, Peter Zijlstra wrote:

> Most people I've heard -- both at LinuxCon.JP and LPC -- are asking for
> lower overhead tracing (while at the same time demanding more features).

Tell them we have:

 1) features
 2) fast
 3) simple to use

and have them pick any two.

;-)

-- Steve


^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [ANNOUNCE] New utility: 'trace'
  2010-11-17 15:59                   ` Peter Zijlstra
@ 2010-11-17 16:08                     ` Avi Kivity
  0 siblings, 0 replies; 88+ messages in thread
From: Avi Kivity @ 2010-11-17 16:08 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Tom Zanussi, Thomas Gleixner, Ingo Molnar, Ted Ts'o, LKML,
	Linus Torvalds, Andrew Morton, Steven Rostedt, Arjan van de Ven,
	Arnaldo Carvalho de Melo, Frederic Weisbecker, Masami Hiramatsu,
	Mathieu Desnoyers, Li Zefan, Jason Baron, David S. Miller,
	Christoph Hellwig, Pekka Enberg, Lai Jiangshan, Eric Dumazet

On 11/17/2010 05:59 PM, Peter Zijlstra wrote:
> On Wed, 2010-11-17 at 17:55 +0200, Avi Kivity wrote:
> >
> >  Filtering is also useful for fine-grained perf events; this cannot be
> >  done in userspace.
> >
> You're hinting at the kvm_exit histogram thing you've constructed,
> right?

Yes.


-- 
error compiling committee.c: too many arguments to function


^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [ANNOUNCE] New utility: 'trace'
  2010-11-17 15:43             ` Peter Zijlstra
  2010-11-17 16:04               ` Steven Rostedt
@ 2010-11-17 16:15               ` Steven Rostedt
  2010-11-17 19:40                 ` Darren Hart
  2010-11-17 18:23               ` Mathieu Desnoyers
  2010-11-18  6:00               ` Masami Hiramatsu
  3 siblings, 1 reply; 88+ messages in thread
From: Steven Rostedt @ 2010-11-17 16:15 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, Ted Ts'o, Thomas Gleixner, LKML, Linus Torvalds,
	Andrew Morton, Arjan van de Ven, Arnaldo Carvalho de Melo,
	Frederic Weisbecker, Masami Hiramatsu, Tom Zanussi,
	Mathieu Desnoyers, Li Zefan, Jason Baron, David S. Miller,
	Christoph Hellwig, Pekka Enberg, Lai Jiangshan, Eric Dumazet

On Wed, 2010-11-17 at 16:43 +0100, Peter Zijlstra wrote:
> On Wed, 2010-11-17 at 10:10 -0500, Steven Rostedt wrote:
> > 
> > Right, the problem with filtering is what do we want to filter, and what
> > about copying?
> > 
> > Currently, we copy the data into the buffer and then filter on that
> > data. We could also easily filter on the parameters of the tracepoint,
> > but sometimes those parameters do not match the final output (as the
> > case with sched_switch). Do we copy the data into a separate "per cpu"
> > temp buffer, and figure out the filter then? And if the filter is fine,
> > then copy into the buffer. This obviously is slow, due to the multiple
> > copies. We could do this only if the filtering is enabled.
> 
> Right, so what is the primary purpose of this filtering stuff? As it
> stands it makes stuff terribly slow, so you add overhead but the win
> (presumably) is less data output, is that a sane trade-off?

I've actually used filtering too. Not for speed up, but because I was
recording a lot of data and the reader could not keep up. By filtering,
I was able to get all the relevant information without needing to make
the kernel buffer a Gig.

-- Steve



^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [ANNOUNCE] New utility: 'trace'
  2010-11-17 15:58                   ` Thomas Gleixner
@ 2010-11-17 16:20                     ` Avi Kivity
  0 siblings, 0 replies; 88+ messages in thread
From: Avi Kivity @ 2010-11-17 16:20 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Tom Zanussi, Peter Zijlstra, Ingo Molnar, Ted Ts'o, LKML,
	Linus Torvalds, Andrew Morton, Steven Rostedt, Arjan van de Ven,
	Arnaldo Carvalho de Melo, Frederic Weisbecker, Masami Hiramatsu,
	Mathieu Desnoyers, Li Zefan, Jason Baron, David S. Miller,
	Christoph Hellwig, Pekka Enberg, Lai Jiangshan, Eric Dumazet

On 11/17/2010 05:58 PM, Thomas Gleixner wrote:
> >
> >  Filtering is also useful for fine-grained perf events; this cannot be done in
> >  userspace.
>
> Could you explain that a bit more detailed please ?

I have a tracepoint kvm:kvm_exit which is called every time a guest 
exits to the host (for whatever reason).  The first step in profiling 
kvm is checking the rate at which this tracepoint is triggered.

(I have a small tool kvm_stat which shows a 'perf top' like display, 
except that instead of showing functions, it shows kvm tracepoints 
sorted by triggering rate)

If you see a high exit rate, you may also be interested in what exit 
reasons triggered those exits.  So kvm_stat creates additional events, 
one per possible exit_reason (which is a field in the kvm_exit 
tracepoint), with a filter matching exit_reason to the enum describing 
the possible exit reasons.  We add that to the display and get a sort of 
histogram showing which exit_reason is responsible for how many exits.

A screenshot:

kvm statistics

  kvm_exit                                     40075   10293
  kvm_entry                                    40075   10293
  kvm_exit(EXCEPTION_NMI)                      29824    7712
  kvm_page_fault                               29642    7672
  kvm_emulate_insn                              9117    2205
  kvm_inj_exception                             8052    2134
  kvm_mmio                                      6160    1514
  kvm_apic                                      6143    1509
  kvm_exit(CR_ACCESS)                           2307     635
  kvm_cr                                        2307     635
  kvm_exit(INVLPG)                              1998     599
  kvm_userspace_exit                            2956     582
  kvm_exit(IO_INSTRUCTION)                      2954     582
  kvm_pio                                       2954     582
  kvm_inj_virq                                  2194     551
  kvm_apic_accept_irq                           2181     547
  kvm_exit(HLT)                                 1523     372
  kvm_exit(EXTERNAL_INTERRUPT)                  1131     304
  kvm_fpu                                        706     176
  kvm_set_irq                                    802     156
  kvm_pic_set_irq                                802     156
  kvm_ioapic_set_irq                             802     156
  kvm_apic_ipi                                   454     132
  kvm_exit(PENDING_INTERRUPT)                    175      44
  kvm_cpuid                                      163      42
  kvm_exit(CPUID)                                163      42
  kvm_ack_irq                                    152      29


So there are 10K exits/sec, of which 7.7K are EXCEPTION_NMI and 635 are 
CR_ACCESS.

Tool source http://tinyurl.com/2ue5tev; one day I'll make it a 
tools/perf script.

-- 
error compiling committee.c: too many arguments to function


^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [ANNOUNCE] New utility: 'trace'
  2010-11-17 13:02           ` Peter Zijlstra
  2010-11-17 13:10             ` Ingo Molnar
@ 2010-11-17 16:49             ` Darren Hart
  1 sibling, 0 replies; 88+ messages in thread
From: Darren Hart @ 2010-11-17 16:49 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Frederic Weisbecker, Ingo Molnar, Thomas Gleixner, LKML,
	Linus Torvalds, Andrew Morton, Steven Rostedt, Arjan van de Ven,
	Arnaldo Carvalho de Melo, Masami Hiramatsu, Tom Zanussi,
	Mathieu Desnoyers, Li Zefan, Jason Baron, David S. Miller,
	Christoph Hellwig, Pekka Enberg, Lai Jiangshan, Eric Dumazet

On 11/17/2010 05:02 AM, Peter Zijlstra wrote:
> On Wed, 2010-11-17 at 13:53 +0100, Frederic Weisbecker wrote:
>> On Wed, Nov 17, 2010 at 12:35:50PM +0100, Peter Zijlstra wrote:
>>> On Wed, 2010-11-17 at 09:30 +0100, Ingo Molnar wrote:
>>>>> For example I'm currently working with dozens of trace_printk() and I would be
>>>>> very happy to turn some of them off half of the time.
>>>>
>>>> I guess we could try such a patch. If you send a prototype i'd be interested in
>>>> testing it out.
>>>
>>> I don't see the point, the kernel shouldn't contain any trace_printk()s
>>> to begin with..
>>
>>
>> It's oriented toward developers. Those who use dozens of tracepoints in
>> their tree because they are debugging something or developing a new feature,
>> they might to deactivate/reactivate some of these independant points.
>>
>> This can also apply to dynamic_printk of course.
>>
>> Well, the very first and main point is to standardize trace_printk into
>> a trace event so that it gets usable by perf tools. I have been asked many
>> times "how to use trace_printk() with perf?".
>
> Thing is, since its these dev who add the trace_printk()s to begin with,
> I don't see the point in splitting them out, if you didn't want them why
> did you add them to begin with?!

What I understood from Frederic's email was that during a debug session 
it is sometimes helpful to be able to enable and disable the 
trace_printk's. This makes sense as it reduces the number of kernel 
build/reboot cycles. However, I would think most of that could be 
accomplished with some judicious message tagging and post-processing to 
filter out the unwanted trace_printk's. The only exception might be when 
the trace_printk's add enough overhead to mask a timing related bug. In 
this case, I'd probably be tempted to remove the stubs anyway.

-- 
Darren Hart
Yocto Linux Kernel

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [ANNOUNCE] New utility: 'trace'
  2010-11-17 14:10                     ` Frederic Weisbecker
  2010-11-17 15:02                       ` Steven Rostedt
@ 2010-11-17 18:13                       ` Ted Ts'o
  2010-11-17 18:29                         ` Frederic Weisbecker
  1 sibling, 1 reply; 88+ messages in thread
From: Ted Ts'o @ 2010-11-17 18:13 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Peter Zijlstra, Ingo Molnar, Darren Hart, Thomas Gleixner, LKML,
	Linus Torvalds, Andrew Morton, Steven Rostedt, Arjan van de Ven,
	Arnaldo Carvalho de Melo, Masami Hiramatsu, Tom Zanussi,
	Mathieu Desnoyers, Li Zefan, Jason Baron, David S. Miller,
	Christoph Hellwig, Pekka Enberg, Lai Jiangshan, Eric Dumazet

On Wed, Nov 17, 2010 at 03:10:33PM +0100, Frederic Weisbecker wrote:
> Yeah I have a strange workflow. I'm working on that CPU isolation thing
> and I have dozens of trace_printk all over the place for tons of
> things. And everytime I remove one to unwind some output or to focus
> on another one, I often have to restore it later because I need it
> again. Usually I even just comment it out instead of removing it.

What I do for my file system development work is to drop in
trace_printk's everywhere, since they are lightweight and don't slow
down the system much.  Then when the system wedges, I use sysrq-z to
get a "flight data recorder" printout of what happened up til the
system hung.

I love the fact that the ring buffer is in "overwrite" mode (aka
flight data recorder mode), and hope this doesn't go away.

Per line filtering is also great, but very often when I'm interacting
with the block I/O layer, if something screws up, what happens is "and
then whole machine locks up", and sysrq-z is literally all I have. 

     	   	   	      	  	     - Ted

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [ANNOUNCE] New utility: 'trace'
  2010-11-17 15:43             ` Peter Zijlstra
  2010-11-17 16:04               ` Steven Rostedt
  2010-11-17 16:15               ` Steven Rostedt
@ 2010-11-17 18:23               ` Mathieu Desnoyers
  2010-11-18  5:58                 ` Masami Hiramatsu
  2010-11-18  6:00               ` Masami Hiramatsu
  3 siblings, 1 reply; 88+ messages in thread
From: Mathieu Desnoyers @ 2010-11-17 18:23 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Steven Rostedt, Ingo Molnar, Ted Ts'o, Thomas Gleixner, LKML,
	Linus Torvalds, Andrew Morton, Arjan van de Ven,
	Arnaldo Carvalho de Melo, Frederic Weisbecker, Masami Hiramatsu,
	Tom Zanussi, Li Zefan, Jason Baron, David S. Miller,
	Christoph Hellwig, Pekka Enberg, Lai Jiangshan, Eric Dumazet

* Peter Zijlstra (peterz@infradead.org) wrote:
> On Wed, 2010-11-17 at 10:10 -0500, Steven Rostedt wrote:
> > 
> > Right, the problem with filtering is what do we want to filter, and what
> > about copying?
> > 
> > Currently, we copy the data into the buffer and then filter on that
> > data. We could also easily filter on the parameters of the tracepoint,
> > but sometimes those parameters do not match the final output (as the
> > case with sched_switch). Do we copy the data into a separate "per cpu"
> > temp buffer, and figure out the filter then? And if the filter is fine,
> > then copy into the buffer. This obviously is slow, due to the multiple
> > copies. We could do this only if the filtering is enabled.
> 
> Right, so what is the primary purpose of this filtering stuff? As it
> stands it makes stuff terribly slow, so you add overhead but the win
> (presumably) is less data output, is that a sane trade-off?
> 
> Most people I've heard -- both at LinuxCon.JP and LPC -- are asking for
> lower overhead tracing (while at the same time demanding more features).
> 
> Who are actually using this and can they provide some input on this?

Amongst others, Ericsson rely on this kind of feature. The case where we're
filtering "out" should be really, really fast (even if it makes the recording
slightly slower). If there is a race modifying the data underneath between the
filter execution and the copy to the buffers, I really don't think it matters.
If the tracepoint caller context don't provide data consistency guarantees for
the parameters it passes, we should not expect 100% perfect consistency between
filter and what is actually saved in the trace. The trace analysis tools should
just be able to cope with that, but I really don't think anyone should care.

So I would recommend no copy, filter directly on the source data, stop the
filter chain as soon as a non-match is observed. Copy the data into the buffers
if the filter passes.

Thanks,

Mathieu

-- 
Mathieu Desnoyers
Operating System Efficiency R&D Consultant
EfficiOS Inc.
http://www.efficios.com

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [ANNOUNCE] New utility: 'trace'
  2010-11-17 18:13                       ` Ted Ts'o
@ 2010-11-17 18:29                         ` Frederic Weisbecker
  2010-11-17 18:30                           ` Darren Hart
  0 siblings, 1 reply; 88+ messages in thread
From: Frederic Weisbecker @ 2010-11-17 18:29 UTC (permalink / raw)
  To: Ted Ts'o
  Cc: Peter Zijlstra, Ingo Molnar, Darren Hart, Thomas Gleixner, LKML,
	Linus Torvalds, Andrew Morton, Steven Rostedt, Arjan van de Ven,
	Arnaldo Carvalho de Melo, Masami Hiramatsu, Tom Zanussi,
	Mathieu Desnoyers, Li Zefan, Jason Baron, David S. Miller,
	Christoph Hellwig, Pekka Enberg, Lai Jiangshan, Eric Dumazet

On Wed, Nov 17, 2010 at 01:13:40PM -0500, Ted Ts'o wrote:
> On Wed, Nov 17, 2010 at 03:10:33PM +0100, Frederic Weisbecker wrote:
> > Yeah I have a strange workflow. I'm working on that CPU isolation thing
> > and I have dozens of trace_printk all over the place for tons of
> > things. And everytime I remove one to unwind some output or to focus
> > on another one, I often have to restore it later because I need it
> > again. Usually I even just comment it out instead of removing it.
> 
> What I do for my file system development work is to drop in
> trace_printk's everywhere, since they are lightweight and don't slow
> down the system much.  Then when the system wedges, I use sysrq-z to
> get a "flight data recorder" printout of what happened up til the
> system hung.
> 
> I love the fact that the ring buffer is in "overwrite" mode (aka
> flight data recorder mode), and hope this doesn't go away.
> 
> Per line filtering is also great, but very often when I'm interacting
> with the block I/O layer, if something screws up, what happens is "and
> then whole machine locks up", and sysrq-z is literally all I have. 

Yeah all agreed.

Steve proposed to keep the current trace_printk() implementation that relies
on ftrace but rename in into ftrace_printk(). So that we can develop a new
trace_printk() based on trace_event interface and in the meantime keep the
old version in case something messes up with the new thing.


^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [ANNOUNCE] New utility: 'trace'
  2010-11-17 18:29                         ` Frederic Weisbecker
@ 2010-11-17 18:30                           ` Darren Hart
  0 siblings, 0 replies; 88+ messages in thread
From: Darren Hart @ 2010-11-17 18:30 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Ted Ts'o, Peter Zijlstra, Ingo Molnar, Thomas Gleixner, LKML,
	Linus Torvalds, Andrew Morton, Steven Rostedt, Arjan van de Ven,
	Arnaldo Carvalho de Melo, Masami Hiramatsu, Tom Zanussi,
	Mathieu Desnoyers, Li Zefan, Jason Baron, David S. Miller,
	Christoph Hellwig, Pekka Enberg, Lai Jiangshan, Eric Dumazet

On 11/17/2010 10:29 AM, Frederic Weisbecker wrote:
> On Wed, Nov 17, 2010 at 01:13:40PM -0500, Ted Ts'o wrote:
>> On Wed, Nov 17, 2010 at 03:10:33PM +0100, Frederic Weisbecker wrote:
>>> Yeah I have a strange workflow. I'm working on that CPU isolation thing
>>> and I have dozens of trace_printk all over the place for tons of
>>> things. And everytime I remove one to unwind some output or to focus
>>> on another one, I often have to restore it later because I need it
>>> again. Usually I even just comment it out instead of removing it.
>>
>> What I do for my file system development work is to drop in
>> trace_printk's everywhere, since they are lightweight and don't slow
>> down the system much.  Then when the system wedges, I use sysrq-z to
>> get a "flight data recorder" printout of what happened up til the
>> system hung.
>>
>> I love the fact that the ring buffer is in "overwrite" mode (aka
>> flight data recorder mode), and hope this doesn't go away.
>>
>> Per line filtering is also great, but very often when I'm interacting
>> with the block I/O layer, if something screws up, what happens is "and
>> then whole machine locks up", and sysrq-z is literally all I have.
>
> Yeah all agreed.
>
> Steve proposed to keep the current trace_printk() implementation that relies
> on ftrace but rename in into ftrace_printk(). So that we can develop a new
> trace_printk() based on trace_event interface and in the meantime keep the
> old version in case something messes up with the new thing.

Seems reasonable.


-- 
Darren Hart
Yocto Linux Kernel

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [ANNOUNCE] New utility: 'trace'
  2010-11-17 15:33           ` Tom Zanussi
@ 2010-11-17 18:36             ` Mathieu Desnoyers
  2010-11-17 18:53               ` Tom Zanussi
  2010-11-17 19:25               ` Steven Rostedt
  0 siblings, 2 replies; 88+ messages in thread
From: Mathieu Desnoyers @ 2010-11-17 18:36 UTC (permalink / raw)
  To: Tom Zanussi
  Cc: Peter Zijlstra, Ingo Molnar, Ted Ts'o, Thomas Gleixner, LKML,
	Linus Torvalds, Andrew Morton, Steven Rostedt, Arjan van de Ven,
	Arnaldo Carvalho de Melo, Frederic Weisbecker, Masami Hiramatsu,
	Li Zefan, Jason Baron, David S. Miller, Christoph Hellwig,
	Pekka Enberg, Lai Jiangshan, Eric Dumazet

* Tom Zanussi (tzanussi@gmail.com) wrote:
[...]
> IIRC, I think the conclusion we came to was that it could be done
> mechanically if for example the right-hand-side of an assignment in
> TP_fast_assign() only involved a simple variable assignment, but as
> Steve pointed out, some assignments are more complicated than that.

Yep, we came up to the same conclusions in UST.

> For example, in the sched_switch tracepoint assignments:
> 
> 	__entry->prev_prio      = prev->prio;
>         __entry->prev_state     = __trace_sched_switch_state(prev);
> 
> so the prev_prio should be able to be tested 'in-line' but the
> prev_state would require a temporary buffer to write the value into
> before doing the test as mentioned by Steve.  At which point you're no
> further ahead (in that case) than the current situation...

if we change all assignments to, e.g.:

_tp_assign(__entry->prev_prio, prev->prio)
_tp_assign(__entry->prev_state, __trace_sched_switch_state(prev))

then we can redefine the macros for filtering much more easily than with the
" = " assignment operator.

About your comment above, what is the problem with evaluating
"__trace_sched_switch_state(prev)" twice ? It will typically be cache-hot after
the first evaluation, so I wonder if, in practice, we really save a significant
amount of cycles by saving its result between filtering and writing into trace
buffers. As I pointed out earlier, for my customers, having a very, very fast
filter "out" case is more important that trying to squeeze a few cycles out of
the filter passed case.

Also, how many of these "__trace_sched_switch_state(prev)" are static inlines vs
actual function calls ? If it's mostly static inlines to dereference a few
pointers, doing it the second time when the filter passed won't hurt much.

Thanks,

Mathieu

-- 
Mathieu Desnoyers
Operating System Efficiency R&D Consultant
EfficiOS Inc.
http://www.efficios.com

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [ANNOUNCE] New utility: 'trace'
  2010-11-17 18:36             ` Mathieu Desnoyers
@ 2010-11-17 18:53               ` Tom Zanussi
  2010-11-17 19:02                 ` Mathieu Desnoyers
  2010-11-17 19:25               ` Steven Rostedt
  1 sibling, 1 reply; 88+ messages in thread
From: Tom Zanussi @ 2010-11-17 18:53 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: Peter Zijlstra, Ingo Molnar, Ted Ts'o, Thomas Gleixner, LKML,
	Linus Torvalds, Andrew Morton, Steven Rostedt, Arjan van de Ven,
	Arnaldo Carvalho de Melo, Frederic Weisbecker, Masami Hiramatsu,
	Li Zefan, Jason Baron, David S. Miller, Christoph Hellwig,
	Pekka Enberg, Lai Jiangshan, Eric Dumazet

On Wed, 2010-11-17 at 13:36 -0500, Mathieu Desnoyers wrote:
> * Tom Zanussi (tzanussi@gmail.com) wrote:
> [...]
> > IIRC, I think the conclusion we came to was that it could be done
> > mechanically if for example the right-hand-side of an assignment in
> > TP_fast_assign() only involved a simple variable assignment, but as
> > Steve pointed out, some assignments are more complicated than that.
> 
> Yep, we came up to the same conclusions in UST.
> 
> > For example, in the sched_switch tracepoint assignments:
> > 
> > 	__entry->prev_prio      = prev->prio;
> >         __entry->prev_state     = __trace_sched_switch_state(prev);
> > 
> > so the prev_prio should be able to be tested 'in-line' but the
> > prev_state would require a temporary buffer to write the value into
> > before doing the test as mentioned by Steve.  At which point you're no
> > further ahead (in that case) than the current situation...
> 
> if we change all assignments to, e.g.:
> 
> _tp_assign(__entry->prev_prio, prev->prio)
> _tp_assign(__entry->prev_state, __trace_sched_switch_state(prev))
> 
> then we can redefine the macros for filtering much more easily than with the
> " = " assignment operator.
> 
> About your comment above, what is the problem with evaluating
> "__trace_sched_switch_state(prev)" twice ? It will typically be cache-hot after
> the first evaluation, so I wonder if, in practice, we really save a significant
> amount of cycles by saving its result between filtering and writing into trace
> buffers. As I pointed out earlier, for my customers, having a very, very fast
> filter "out" case is more important that trying to squeeze a few cycles out of
> the filter passed case.
> 

But the idea is to avoid allocating the trace buffer in the first place,
until we've decided we want the event.  So how do you check the result
of __trace_sched_switch_state(prev) with the filter value if you don't
have it temporarily stored somewhere (not in the trace buffer, which
doesn't exist yet as far as this event is concerned)?

Tom

> Also, how many of these "__trace_sched_switch_state(prev)" are static inlines vs
> actual function calls ? If it's mostly static inlines to dereference a few
> pointers, doing it the second time when the filter passed won't hurt much.
> 
> Thanks,
> 
> Mathieu
> 



^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [ANNOUNCE] New utility: 'trace'
  2010-11-17  3:34       ` Steven Rostedt
@ 2010-11-17 19:00         ` Bob Copeland
  0 siblings, 0 replies; 88+ messages in thread
From: Bob Copeland @ 2010-11-17 19:00 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Ted Ts'o, Thomas Gleixner, LKML, Linus Torvalds,
	Andrew Morton, Ingo Molnar, Peter Zijlstra, Arjan van de Ven,
	Arnaldo Carvalho de Melo, Frederic Weisbecker, Masami Hiramatsu,
	Tom Zanussi, Mathieu Desnoyers, Li Zefan, Jason Baron,
	David S. Miller, Christoph Hellwig, Pekka Enberg, Lai Jiangshan,
	Eric Dumazet

On Tue, Nov 16, 2010 at 10:34 PM, Steven Rostedt <rostedt@goodmis.org> wrote:
> (*) Currently in trace-cmd I have plugins. These plugins are simple
> functions to read events and either output better displays of the event
> or do something special (Avi wrote some for his kvm events).

We also use this for debugging some wireless drivers.  The (unstable)
raw events can capture entire packets and produce psuedo-pcap files with
a plugin so we can compare what we thought went over the air versus what
actually did, and correlate them with api-level tracepoints in upper
layers.  It's perhaps a bit of a bending of the infrastructure, but one
that has been helpful.

-- 
Bob Copeland %% www.bobcopeland.com

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [ANNOUNCE] New utility: 'trace'
  2010-11-17 18:53               ` Tom Zanussi
@ 2010-11-17 19:02                 ` Mathieu Desnoyers
  2010-11-17 19:25                   ` Tom Zanussi
  0 siblings, 1 reply; 88+ messages in thread
From: Mathieu Desnoyers @ 2010-11-17 19:02 UTC (permalink / raw)
  To: Tom Zanussi
  Cc: Peter Zijlstra, Ingo Molnar, Ted Ts'o, Thomas Gleixner, LKML,
	Linus Torvalds, Andrew Morton, Steven Rostedt, Arjan van de Ven,
	Arnaldo Carvalho de Melo, Frederic Weisbecker, Masami Hiramatsu,
	Li Zefan, Jason Baron, David S. Miller, Christoph Hellwig,
	Pekka Enberg, Lai Jiangshan, Eric Dumazet

* Tom Zanussi (tzanussi@gmail.com) wrote:
> On Wed, 2010-11-17 at 13:36 -0500, Mathieu Desnoyers wrote:
> > * Tom Zanussi (tzanussi@gmail.com) wrote:
> > [...]
> > > IIRC, I think the conclusion we came to was that it could be done
> > > mechanically if for example the right-hand-side of an assignment in
> > > TP_fast_assign() only involved a simple variable assignment, but as
> > > Steve pointed out, some assignments are more complicated than that.
> > 
> > Yep, we came up to the same conclusions in UST.
> > 
> > > For example, in the sched_switch tracepoint assignments:
> > > 
> > > 	__entry->prev_prio      = prev->prio;
> > >         __entry->prev_state     = __trace_sched_switch_state(prev);
> > > 
> > > so the prev_prio should be able to be tested 'in-line' but the
> > > prev_state would require a temporary buffer to write the value into
> > > before doing the test as mentioned by Steve.  At which point you're no
> > > further ahead (in that case) than the current situation...
> > 
> > if we change all assignments to, e.g.:
> > 
> > _tp_assign(__entry->prev_prio, prev->prio)
> > _tp_assign(__entry->prev_state, __trace_sched_switch_state(prev))
> > 
> > then we can redefine the macros for filtering much more easily than with the
> > " = " assignment operator.
> > 
> > About your comment above, what is the problem with evaluating
> > "__trace_sched_switch_state(prev)" twice ? It will typically be cache-hot after
> > the first evaluation, so I wonder if, in practice, we really save a significant
> > amount of cycles by saving its result between filtering and writing into trace
> > buffers. As I pointed out earlier, for my customers, having a very, very fast
> > filter "out" case is more important that trying to squeeze a few cycles out of
> > the filter passed case.
> > 
> 
> But the idea is to avoid allocating the trace buffer in the first place,
> until we've decided we want the event.  So how do you check the result
> of __trace_sched_switch_state(prev) with the filter value if you don't
> have it temporarily stored somewhere (not in the trace buffer, which
> doesn't exist yet as far as this event is concerned)?

It seems I might be missing something important, but what's wrong with using
registers or the stack to hold the value for comparison ? In this case, it's a
"long", so a register seems perfectly reasonable. But again, I feel I'm missing
a key point -- what is it ?

Thanks,

Mathieu

> 
> Tom
> 
> > Also, how many of these "__trace_sched_switch_state(prev)" are static inlines vs
> > actual function calls ? If it's mostly static inlines to dereference a few
> > pointers, doing it the second time when the filter passed won't hurt much.
> > 
> > Thanks,
> > 
> > Mathieu
> > 
> 
> 

-- 
Mathieu Desnoyers
Operating System Efficiency R&D Consultant
EfficiOS Inc.
http://www.efficios.com

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [ANNOUNCE] New utility: 'trace'
  2010-11-17 18:36             ` Mathieu Desnoyers
  2010-11-17 18:53               ` Tom Zanussi
@ 2010-11-17 19:25               ` Steven Rostedt
  1 sibling, 0 replies; 88+ messages in thread
From: Steven Rostedt @ 2010-11-17 19:25 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: Tom Zanussi, Peter Zijlstra, Ingo Molnar, Ted Ts'o,
	Thomas Gleixner, LKML, Linus Torvalds, Andrew Morton,
	Arjan van de Ven, Arnaldo Carvalho de Melo, Frederic Weisbecker,
	Masami Hiramatsu, Li Zefan, Jason Baron, David S. Miller,
	Christoph Hellwig, Pekka Enberg, Lai Jiangshan, Eric Dumazet

On Wed, 2010-11-17 at 13:36 -0500, Mathieu Desnoyers wrote:
> * Tom Zanussi (tzanussi@gmail.com) wrote:
> [...]
> > IIRC, I think the conclusion we came to was that it could be done
> > mechanically if for example the right-hand-side of an assignment in
> > TP_fast_assign() only involved a simple variable assignment, but as
> > Steve pointed out, some assignments are more complicated than that.
> 
> Yep, we came up to the same conclusions in UST.
> 
> > For example, in the sched_switch tracepoint assignments:
> > 
> > 	__entry->prev_prio      = prev->prio;
> >         __entry->prev_state     = __trace_sched_switch_state(prev);
> > 
> > so the prev_prio should be able to be tested 'in-line' but the
> > prev_state would require a temporary buffer to write the value into
> > before doing the test as mentioned by Steve.  At which point you're no
> > further ahead (in that case) than the current situation...
> 
> if we change all assignments to, e.g.:
> 
> _tp_assign(__entry->prev_prio, prev->prio)
> _tp_assign(__entry->prev_state, __trace_sched_switch_state(prev))

I would just call it __assign().


> 
> then we can redefine the macros for filtering much more easily than with the
> " = " assignment operator.
> 
> About your comment above, what is the problem with evaluating
> "__trace_sched_switch_state(prev)" twice ? It will typically be cache-hot after
> the first evaluation, so I wonder if, in practice, we really save a significant
> amount of cycles by saving its result between filtering and writing into trace
> buffers. As I pointed out earlier, for my customers, having a very, very fast
> filter "out" case is more important that trying to squeeze a few cycles out of
> the filter passed case.
> 
> Also, how many of these "__trace_sched_switch_state(prev)" are static inlines vs
> actual function calls ? If it's mostly static inlines to dereference a few
> pointers, doing it the second time when the filter passed won't hurt much.

Yes, something like this could work. It would require a bit more CPP
magic to handle it though.

I could write something up to do this. I just have to add it to the 100
other things I'm doing at the same time :-p

-- Steve



^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [ANNOUNCE] New utility: 'trace'
  2010-11-17 19:02                 ` Mathieu Desnoyers
@ 2010-11-17 19:25                   ` Tom Zanussi
  0 siblings, 0 replies; 88+ messages in thread
From: Tom Zanussi @ 2010-11-17 19:25 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: Peter Zijlstra, Ingo Molnar, Ted Ts'o, Thomas Gleixner, LKML,
	Linus Torvalds, Andrew Morton, Steven Rostedt, Arjan van de Ven,
	Arnaldo Carvalho de Melo, Frederic Weisbecker, Masami Hiramatsu,
	Li Zefan, Jason Baron, David S. Miller, Christoph Hellwig,
	Pekka Enberg, Lai Jiangshan, Eric Dumazet

On Wed, 2010-11-17 at 14:02 -0500, Mathieu Desnoyers wrote: 
> * Tom Zanussi (tzanussi@gmail.com) wrote:
> > On Wed, 2010-11-17 at 13:36 -0500, Mathieu Desnoyers wrote:
> > > * Tom Zanussi (tzanussi@gmail.com) wrote:
> > > [...]
> > > > IIRC, I think the conclusion we came to was that it could be done
> > > > mechanically if for example the right-hand-side of an assignment in
> > > > TP_fast_assign() only involved a simple variable assignment, but as
> > > > Steve pointed out, some assignments are more complicated than that.
> > > 
> > > Yep, we came up to the same conclusions in UST.
> > > 
> > > > For example, in the sched_switch tracepoint assignments:
> > > > 
> > > > 	__entry->prev_prio      = prev->prio;
> > > >         __entry->prev_state     = __trace_sched_switch_state(prev);
> > > > 
> > > > so the prev_prio should be able to be tested 'in-line' but the
> > > > prev_state would require a temporary buffer to write the value into
> > > > before doing the test as mentioned by Steve.  At which point you're no
> > > > further ahead (in that case) than the current situation...
> > > 
> > > if we change all assignments to, e.g.:
> > > 
> > > _tp_assign(__entry->prev_prio, prev->prio)
> > > _tp_assign(__entry->prev_state, __trace_sched_switch_state(prev))
> > > 
> > > then we can redefine the macros for filtering much more easily than with the
> > > " = " assignment operator.
> > > 
> > > About your comment above, what is the problem with evaluating
> > > "__trace_sched_switch_state(prev)" twice ? It will typically be cache-hot after
> > > the first evaluation, so I wonder if, in practice, we really save a significant
> > > amount of cycles by saving its result between filtering and writing into trace
> > > buffers. As I pointed out earlier, for my customers, having a very, very fast
> > > filter "out" case is more important that trying to squeeze a few cycles out of
> > > the filter passed case.
> > > 
> > 
> > But the idea is to avoid allocating the trace buffer in the first place,
> > until we've decided we want the event.  So how do you check the result
> > of __trace_sched_switch_state(prev) with the filter value if you don't
> > have it temporarily stored somewhere (not in the trace buffer, which
> > doesn't exist yet as far as this event is concerned)?
> 
> It seems I might be missing something important, but what's wrong with using
> registers or the stack to hold the value for comparison ? In this case, it's a

Nothing, it shouldn't matter where the temporary storage is, as long as
it's not in the trace buffer.

Tom

> "long", so a register seems perfectly reasonable. But again, I feel I'm missing
> a key point -- what is it ?
> 
> Thanks,
> 
> Mathieu
> 
> > 
> > Tom
> > 
> > > Also, how many of these "__trace_sched_switch_state(prev)" are static inlines vs
> > > actual function calls ? If it's mostly static inlines to dereference a few
> > > pointers, doing it the second time when the filter passed won't hurt much.
> > > 
> > > Thanks,
> > > 
> > > Mathieu
> > > 
> > 
> > 
> 




^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [ANNOUNCE] New utility: 'trace'
  2010-11-17 16:15               ` Steven Rostedt
@ 2010-11-17 19:40                 ` Darren Hart
  0 siblings, 0 replies; 88+ messages in thread
From: Darren Hart @ 2010-11-17 19:40 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Peter Zijlstra, Ingo Molnar, Ted Ts'o, Thomas Gleixner, LKML,
	Linus Torvalds, Andrew Morton, Arjan van de Ven,
	Arnaldo Carvalho de Melo, Frederic Weisbecker, Masami Hiramatsu,
	Tom Zanussi, Mathieu Desnoyers, Li Zefan, Jason Baron,
	David S. Miller, Christoph Hellwig, Pekka Enberg, Lai Jiangshan,
	Eric Dumazet

On 11/17/2010 08:15 AM, Steven Rostedt wrote:
> On Wed, 2010-11-17 at 16:43 +0100, Peter Zijlstra wrote:
>> On Wed, 2010-11-17 at 10:10 -0500, Steven Rostedt wrote:
>>>
>>> Right, the problem with filtering is what do we want to filter, and what
>>> about copying?
>>>
>>> Currently, we copy the data into the buffer and then filter on that
>>> data. We could also easily filter on the parameters of the tracepoint,
>>> but sometimes those parameters do not match the final output (as the
>>> case with sched_switch). Do we copy the data into a separate "per cpu"
>>> temp buffer, and figure out the filter then? And if the filter is fine,
>>> then copy into the buffer. This obviously is slow, due to the multiple
>>> copies. We could do this only if the filtering is enabled.
>>
>> Right, so what is the primary purpose of this filtering stuff? As it
>> stands it makes stuff terribly slow, so you add overhead but the win
>> (presumably) is less data output, is that a sane trade-off?
>
> I've actually used filtering too. Not for speed up, but because I was
> recording a lot of data and the reader could not keep up. By filtering,
> I was able to get all the relevant information without needing to make
> the kernel buffer a Gig.

I have run into situations where the volume of output becomes a problem 
and not every system will have the memory to dedicate to massive trace 
buffers. This is, for me, probably the one motivating argument for doing 
filtering in the kernel as opposed to post-processing scripts.

-- 
Darren Hart
Yocto Linux Kernel

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [ANNOUNCE] New utility: 'trace'
  2010-11-16 21:04 [ANNOUNCE] New utility: 'trace' Thomas Gleixner
                   ` (5 preceding siblings ...)
  2010-11-17 12:47 ` [ANNOUNCE] New utility: 'trace' Török Edwin
@ 2010-11-18  0:47 ` Ian Munsie
       [not found]   ` <20101118151141.GA3368@redhat.com>
  6 siblings, 1 reply; 88+ messages in thread
From: Ian Munsie @ 2010-11-18  0:47 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: LKML, Linus Torvalds, Andrew Morton, Ingo Molnar, Peter Zijlstra,
	Steven Rostedt, Arjan van de Ven, Arnaldo Carvalho de Melo,
	Frederic Weisbecker, Masami Hiramatsu, Tom Zanussi,
	Mathieu Desnoyers, Li Zefan, Jason Baron, David S. Miller,
	Christoph Hellwig, Pekka Enberg, Lai Jiangshan, Eric Dumazet

Excerpts from Thomas Gleixner's message of Wed Nov 17 08:04:40 +1100 2010:
> We are pleased to announce a new tracing tool called 'trace'.

Cool!

> Build the -tip kernel and reboot into it to get the most out of the
> tool - but even on default kernels it will try to cope. Syscall events
> (CONFIG_FTRACE_SYSCALLS=y) is a must for most methods of analysis
> though. (with the above -tip tree you'll get it all configured by
> default)

OK, now I'll really have to find some time in the near future to go back
to the work I was doing on the ftrace syscalls + building on Jason's
ftrace compat syscall patches to make the most of this.

Cheers,
-Ian

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [patch] trace: Add user-space event tracing/injection
  2010-11-17 12:07   ` [patch] trace: Add user-space event tracing/injection Ingo Molnar
                       ` (2 preceding siblings ...)
  2010-11-17 12:29     ` Peter Zijlstra
@ 2010-11-18  1:18     ` Darren Hart
  2010-11-18  8:55       ` Ingo Molnar
  3 siblings, 1 reply; 88+ messages in thread
From: Darren Hart @ 2010-11-18  1:18 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Pekka Enberg, Thomas Gleixner, Peter Zijlstra, Steven Rostedt,
	Arjan van de Ven, Arnaldo Carvalho de Melo, Frederic Weisbecker,
	linux-kernel, Linus Torvalds, Andrew Morton, Arjan van de Ven

On 11/17/2010 04:07 AM, Ingo Molnar wrote:
>
> * Pekka Enberg<penberg@kernel.org>  wrote:
>
>> (Trimming CC for sanity.)
>
> [ Added back Linus (in case he wants to object to the new prctl()) and lkml. ]
>
>> On 11/16/10 11:04 PM, Thomas Gleixner wrote:
>>> 'trace' is our shot at improving the situation: it aims at providing a
>>> simple to use and straightforward tracing tool based on the perf
>>> infrastructure and on the well-known perf profiling workflow:
>>>
>>>     trace record firefox          # trace firefox and all child tasks,
>>>                                     put it into trace.data
>>>
>>>     trace summary                 # show quick overview of the trace,
>>>                                     based on trace.data
>>>
>>>     trace report                  # detailed traces, based on trace.data
>>
>> Nice work guys!
>>
>> Does this concept lend itself to tracing latencies in userspace applications that
>> run in virtual machines (e.g. the Java kind)? I'm of course interested in this
>> because of Jato [1] where bunch of interesting things can cause jitter: JIT
>> compilation, GC, kernel, and the actual application doing something (in either
>> native code or JIT'd code). It's important to be able to measure where "slowness"
>> to desktop applications and certain class of server applications comes from to be
>> able to improve things.
>
> Makes quite a bit of sense.
>
> How about the attached patch? It works fine with the simple testcase included in the
> changelog. There's a common-sense limit on the message size - but otherwise it adds
> support for apps to generate a free-form string trace event.


Ideally I would like to see something just like trace_printf() without 
having to define it myself in each of my testcases. I only see a few 
ways of doing that, none of which are particularly nice:

# Perf trace wrappers in linux header files
#include <linux/perf-trace.h>
   - requires working with glibc for headers installation
   - could just be something like trace_printf() (that Ingo posted in
     another mail in this thread) in the header file itself, no need for
     a library.


# Perf vDSO and /sys exported headers
-lperf -I/sys/kernel/include/perf/
#include <perf-trace.h>
   - introduces including from and linking to the kernel to applications
     (not just glibc)
   - introduces kernel /sys exported headers (ewe)


# Limited variable count
Use the max syscall argument count (7?) to allow for an opcode, format 
string, and at most 5 arguments. This isn't very pretty in the user code 
and I'd probably rather wrap the prctl version below than use this idea.

# char[] pointer argument
This requires the conversion of all arguments to strings and rather 
defeats the purpose of having the kernel do the formatting.


Are there any other tricks I'm not thinking of which we could use to 
implement:

int trace_printf(const char *format, ...);


The only other approach I can think of is a per process file descriptor 
which could be used with fprintf. Something like 
/proc/$PID/trace_marker. That wouldn't require any new headers or vDSO, 
but it would add to the vfs, which Ingo objected to. However, placing it 
in proc makes it feasible to access even in chroot environments. Leaving 
the existing prctl implementation would provide a bare bones fallback 
when needed.

Thoughts?

Thanks,

Darren

>
> Thanks,
>
> 	Ingo
>
> ---------------------------------->
> Subject: trace: Add user-space event tracing/injection
> From: Ingo Molnar<mingo@elte.hu>
> Date: Wed Nov 17 10:11:53 CET 2010
>
> This feature (suggested by Darren Hart and Pekka Engberg) allows user-space
> programs to print trace events in a very simple and self-contained way:
>
>   #include<sys/prctl.h>
>   #include<string.h>
>
>   #define PR_TASK_PERF_USER_TRACE 35
>
>   int main(void)
>   {
>           char *msg = "Hello World!\n";
>
>           prctl(PR_TASK_PERF_USER_TRACE, msg, strlen(msg));
>
>           return 0;
>   }
>
> These show up in 'trace' output as:
>
>   $ trace report
>   #
>   # trace events of 'sleep 1':
>   #
>          testit/ 6006 ( 0.002 ms):<"Hello World!">
>          testit/ 6006 ( 0.002 ms):<"Hello World!">
>
> Suggested-by: Darren Hart<dvhart@linux.intel.com>
> Suggested-by: Pekka Enberg<penberg@kernel.org>
> Signed-off-by: Ingo Molnar<mingo@elte.hu>
> ---
>   include/linux/prctl.h       |    2 ++
>   include/trace/events/user.h |   32 ++++++++++++++++++++++++++++++++
>   kernel/sys.c                |   23 +++++++++++++++++++++++
>   kernel/trace/trace_events.c |   34 +++++++++++-----------------------
>   tools/perf/builtin-trace.c  |   41 +++++++++++++++++++++++++++++++++++++++++
>   5 files changed, 109 insertions(+), 23 deletions(-)
>
> Index: linux/include/linux/prctl.h
> ===================================================================
> --- linux.orig/include/linux/prctl.h
> +++ linux/include/linux/prctl.h
> @@ -102,4 +102,6 @@
>
>   #define PR_MCE_KILL_GET 34
>
> +#define PR_TASK_PERF_USER_TRACE			35
> +
>   #endif /* _LINUX_PRCTL_H */
> Index: linux/include/trace/events/user.h
> ===================================================================
> --- /dev/null
> +++ linux/include/trace/events/user.h
> @@ -0,0 +1,32 @@
> +#undef TRACE_SYSTEM
> +#define TRACE_SYSTEM user
> +
> +#if !defined(_TRACE_USER_H) || defined(TRACE_HEADER_MULTI_READ)
> +#define _TRACE_USER_H_
> +
> +#include<linux/tracepoint.h>
> +#include<linux/ftrace.h>
> +
> +TRACE_EVENT(user,
> +
> +	TP_PROTO(const char *message),
> +
> +	TP_ARGS(message),
> +
> +	TP_STRUCT__entry(
> +		__string(	message, message);
> +	),
> +
> +	TP_fast_assign(
> +		__assign_str(message, message);
> +	),
> +
> +	TP_printk("user %s", __get_str(message))
> +);
> +
> +#undef NO_DEV
> +
> +#endif /* _TRACE_USER_H_ */
> +
> +/* This part must be outside protection */
> +#include<trace/define_trace.h>
> Index: linux/kernel/sys.c
> ===================================================================
> --- linux.orig/kernel/sys.c
> +++ linux/kernel/sys.c
> @@ -47,6 +47,11 @@
>   #include<asm/io.h>
>   #include<asm/unistd.h>
>
> +#define MAX_USER_TRACE_SIZE 128
> +
> +#define CREATE_TRACE_POINTS
> +#include<trace/events/user.h>
> +
>   #ifndef SET_UNALIGN_CTL
>   # define SET_UNALIGN_CTL(a,b)	(-EINVAL)
>   #endif
> @@ -1681,6 +1686,24 @@ SYSCALL_DEFINE5(prctl, int, option, unsi
>   		case PR_TASK_PERF_EVENTS_ENABLE:
>   			error = perf_event_task_enable();
>   			break;
> +		/*
> +		 * Inject a trace event into the current tracing context:
> +		 */
> +		case PR_TASK_PERF_USER_TRACE:
> +		{
> +			void __user *uevent_ptr = (void *)arg2;
> +			char kstring[MAX_USER_TRACE_SIZE+1];
> +			unsigned long uevent_len = arg3;
> +
> +			if (uevent_len>  MAX_USER_TRACE_SIZE)
> +				return -EINVAL;
> +			if (copy_from_user(kstring, uevent_ptr, uevent_len))
> +				return -EFAULT;
> +			kstring[uevent_len] = 0;
> +
> +			trace_user(kstring);
> +			return 0;
> +		}
>   		case PR_GET_TIMERSLACK:
>   			error = current->timer_slack_ns;
>   			break;
> Index: linux/kernel/trace/trace_events.c
> ===================================================================
> --- linux.orig/kernel/trace/trace_events.c
> +++ linux/kernel/trace/trace_events.c
> @@ -989,8 +989,7 @@ static struct dentry *event_trace_events
>
>   	d_events = debugfs_create_dir("events", d_tracer);
>   	if (!d_events)
> -		pr_warning("Could not create debugfs "
> -			   "'events' directory\n");
> +		pr_warning("Could not create debugfs 'events' directory\n");
>
>   	return d_events;
>   }
> @@ -1014,15 +1013,13 @@ event_subsystem_dir(const char *name, st
>   	/* need to create new entry */
>   	system = kmalloc(sizeof(*system), GFP_KERNEL);
>   	if (!system) {
> -		pr_warning("No memory to create event subsystem %s\n",
> -			   name);
> +		pr_warning("No memory to create event subsystem %s\n", name);
>   		return d_events;
>   	}
>
>   	system->entry = debugfs_create_dir(name, d_events);
>   	if (!system->entry) {
> -		pr_warning("Could not create event subsystem %s\n",
> -			   name);
> +		pr_warning("Could not create event subsystem %s\n", name);
>   		kfree(system);
>   		return d_events;
>   	}
> @@ -1041,8 +1038,7 @@ event_subsystem_dir(const char *name, st
>
>   	system->filter = kzalloc(sizeof(struct event_filter), GFP_KERNEL);
>   	if (!system->filter) {
> -		pr_warning("Could not allocate filter for subsystem "
> -			   "'%s'\n", name);
> +		pr_warning("Could not allocate filter for subsystem '%s'\n", name);
>   		return system->entry;
>   	}
>
> @@ -1051,8 +1047,7 @@ event_subsystem_dir(const char *name, st
>   	if (!entry) {
>   		kfree(system->filter);
>   		system->filter = NULL;
> -		pr_warning("Could not create debugfs "
> -			   "'%s/filter' entry\n", name);
> +		pr_warning("Could not create debugfs '%s/filter' entry\n", name);
>   	}
>
>   	trace_create_file("enable", 0644, system->entry,
> @@ -1081,19 +1076,16 @@ event_create_dir(struct ftrace_event_cal
>
>   	call->dir = debugfs_create_dir(call->name, d_events);
>   	if (!call->dir) {
> -		pr_warning("Could not create debugfs "
> -			   "'%s' directory\n", call->name);
> +		pr_warning("Could not create debugfs '%s' directory\n", call->name);
>   		return -1;
>   	}
>
>   	if (call->class->reg)
> -		trace_create_file("enable", 0644, call->dir, call,
> -				  enable);
> +		trace_create_file("enable", 0644, call->dir, call, enable);
>
>   #ifdef CONFIG_PERF_EVENTS
>   	if (call->event.type&&  call->class->reg)
> -		trace_create_file("id", 0444, call->dir, call,
> -		 		  id);
> +		trace_create_file("id", 0444, call->dir, call, id);
>   #endif
>
>   	/*
> @@ -1104,16 +1096,12 @@ event_create_dir(struct ftrace_event_cal
>   	if (list_empty(head)) {
>   		ret = call->class->define_fields(call);
>   		if (ret<  0) {
> -			pr_warning("Could not initialize trace point"
> -				   " events/%s\n", call->name);
> +			pr_warning("Could not initialize trace point events/%s\n", call->name);
>   			return ret;
>   		}
>   	}
> -	trace_create_file("filter", 0644, call->dir, call,
> -			  filter);
> -
> -	trace_create_file("format", 0444, call->dir, call,
> -			  format);
> +	trace_create_file("filter", 0644, call->dir, call, filter);
> +	trace_create_file("format", 0444, call->dir, call, format);
>
>   	return 0;
>   }
> Index: linux/tools/perf/builtin-trace.c
> ===================================================================
> --- linux.orig/tools/perf/builtin-trace.c
> +++ linux/tools/perf/builtin-trace.c
> @@ -719,6 +719,44 @@ static void vfs_getname(void *data,
>   	}
>   }
>
> +static void user_event(void *data,
> +			struct event *event __used,
> +			int cpu __used,
> +			u64 timestamp __used,
> +			struct thread *thread __used)
> +{
> +	struct thread_data *tdata = get_thread_data(thread);
> +	unsigned int i;
> +	char *message;
> +	u64 t = 0;
> +
> +	if (!tdata->enabled)
> +		return;
> +	if (time_filtered(timestamp))
> +		return;
> +	if (cpu_filtered(cpu))
> +		return;
> +	if (tdata->entry_time) {
> +		t = timestamp - tdata->entry_time;
> +		if (filter_duration(t))
> +			return;
> +	} else if (duration_filter)
> +		return;
> +
> +	print_entry_head(thread, timestamp, t, cpu);
> +	message = strdup(raw_field_ptr(event, "message", data));
> +
> +	/* Sanitize the string a bit before printing it: */
> +	for (i = 0; i<  strlen(message); i++) {
> +		if (message[i] == '\n')
> +			message[i] = ' ';
> +	}
> +
> +	color_fprintf(stdout, PERF_COLOR_GREEN, "<\"%s\">\n", message);
> +
> +	free(message);
> +}
> +
>   static int pagefault_preprocess_sample(const event_t *self,
>   				       struct addr_location *al,
>   				       struct sample_data *data,
> @@ -1036,6 +1074,8 @@ process_raw_event(event_t *self, void *d
>   		pagefault_exit(data, event, cpu, timestamp, thread);
>   	if (!strcmp(event->name, "vfs_getname"))
>   		vfs_getname(data, event, cpu, timestamp, thread);
> +	if (!strcmp(event->name, "user"))
> +		user_event(data, event, cpu, timestamp, thread);
>
>   	if (!scheduler_events&&  !scheduler_all_events)
>   		return;
> @@ -1226,6 +1266,7 @@ static const char *record_args[] = {
>   	"-e", "sched:sched_stat_sleep:r",
>   	"-e", "sched:sched_stat_iowait:r",
>   	"-e", "sched:sched_stat_runtime:r",
> +	"-e", "user:user:r",
>   };
>
>   static int __cmd_record(int argc, const char **argv)
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/


-- 
Darren Hart
Yocto Linux Kernel

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [ANNOUNCE] New utility: 'trace'
  2010-11-17 18:23               ` Mathieu Desnoyers
@ 2010-11-18  5:58                 ` Masami Hiramatsu
  0 siblings, 0 replies; 88+ messages in thread
From: Masami Hiramatsu @ 2010-11-18  5:58 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: Peter Zijlstra, Steven Rostedt, Ingo Molnar, Ted Ts'o,
	Thomas Gleixner, LKML, Linus Torvalds, Andrew Morton,
	Arjan van de Ven, Arnaldo Carvalho de Melo, Frederic Weisbecker,
	Tom Zanussi, Li Zefan, Jason Baron, David S. Miller,
	Christoph Hellwig, Pekka Enberg, Lai Jiangshan, Eric Dumazet,
	2nddept-manager

(2010/11/18 3:23), Mathieu Desnoyers wrote:
> Amongst others, Ericsson rely on this kind of feature. The case where we're
> filtering "out" should be really, really fast (even if it makes the recording
> slightly slower). If there is a race modifying the data underneath between the
> filter execution and the copy to the buffers, I really don't think it matters.
> If the tracepoint caller context don't provide data consistency guarantees for
> the parameters it passes, we should not expect 100% perfect consistency between
> filter and what is actually saved in the trace. The trace analysis tools should
> just be able to cope with that, but I really don't think anyone should care.
> 
> So I would recommend no copy, filter directly on the source data, stop the
> filter chain as soon as a non-match is observed. Copy the data into the buffers
> if the filter passes.

Indeed, so that we can eliminate memory write accesses. :)

Thanks,

-- 
Masami HIRAMATSU
2nd Dept. Linux Technology Center
Hitachi, Ltd., Systems Development Laboratory
E-mail: masami.hiramatsu.pt@hitachi.com

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [ANNOUNCE] New utility: 'trace'
  2010-11-17 15:43             ` Peter Zijlstra
                                 ` (2 preceding siblings ...)
  2010-11-17 18:23               ` Mathieu Desnoyers
@ 2010-11-18  6:00               ` Masami Hiramatsu
  3 siblings, 0 replies; 88+ messages in thread
From: Masami Hiramatsu @ 2010-11-18  6:00 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Steven Rostedt, Ingo Molnar, Ted Ts'o, Thomas Gleixner, LKML,
	Linus Torvalds, Andrew Morton, Arjan van de Ven,
	Arnaldo Carvalho de Melo, Frederic Weisbecker, Tom Zanussi,
	Mathieu Desnoyers, Li Zefan, Jason Baron, David S. Miller,
	Christoph Hellwig, Pekka Enberg, Lai Jiangshan, Eric Dumazet,
	2nddept-manager

(2010/11/18 0:43), Peter Zijlstra wrote:
> On Wed, 2010-11-17 at 10:10 -0500, Steven Rostedt wrote:
>>
>> Right, the problem with filtering is what do we want to filter, and what
>> about copying?
>>
>> Currently, we copy the data into the buffer and then filter on that
>> data. We could also easily filter on the parameters of the tracepoint,
>> but sometimes those parameters do not match the final output (as the
>> case with sched_switch). Do we copy the data into a separate "per cpu"
>> temp buffer, and figure out the filter then? And if the filter is fine,
>> then copy into the buffer. This obviously is slow, due to the multiple
>> copies. We could do this only if the filtering is enabled.
> 
> Right, so what is the primary purpose of this filtering stuff? As it
> stands it makes stuff terribly slow, so you add overhead but the win
> (presumably) is less data output, is that a sane trade-off?
> 
> Most people I've heard -- both at LinuxCon.JP and LPC -- are asking for
> lower overhead tracing (while at the same time demanding more features).

I've also heard a dynamic filtering (or kicking buffer snapshot) request
in LinuxCon.JP. I think filtering is not only for filtering-out purpose,
but also useful for hooking event to take some action. :)

Thank you,


-- 
Masami HIRAMATSU
2nd Dept. Linux Technology Center
Hitachi, Ltd., Systems Development Laboratory
E-mail: masami.hiramatsu.pt@hitachi.com

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [patch] trace: Add user-space event tracing/injection
  2010-11-18  1:18     ` Darren Hart
@ 2010-11-18  8:55       ` Ingo Molnar
  2010-11-18 16:25         ` hp
  2010-11-18 17:06         ` Darren Hart
  0 siblings, 2 replies; 88+ messages in thread
From: Ingo Molnar @ 2010-11-18  8:55 UTC (permalink / raw)
  To: Darren Hart
  Cc: Pekka Enberg, Thomas Gleixner, Peter Zijlstra, Steven Rostedt,
	Arjan van de Ven, Arnaldo Carvalho de Melo, Frederic Weisbecker,
	linux-kernel, Linus Torvalds, Andrew Morton, Arjan van de Ven


* Darren Hart <dvhart@linux.intel.com> wrote:

> Ideally I would like to see something just like trace_printf()
> without having to define it myself in each of my testcases. [...]

We can make the prctl a single-argument thing, at the cost of not allowing \0 in the 
content. (which is probably sane anyway)

That way deployment is super-simple:

	prctl(35, "My Trace Message");
	...


	if (asprintf(&msg, "My Trace Message: %d\n", 1234) != -1) {
		prctl(35, *msg);
		free(*msg);
	}

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [patch] trace: Add user-space event tracing/injection
  2010-11-18  8:55       ` Ingo Molnar
@ 2010-11-18 16:25         ` hp
  2010-11-18 18:56           ` Ingo Molnar
  2010-11-18 17:06         ` Darren Hart
  1 sibling, 1 reply; 88+ messages in thread
From: hp @ 2010-11-18 16:25 UTC (permalink / raw)
  To: linux-kernel

Ingo Molnar <mingo <at> elte.hu> writes:

> 
> 
> * Darren Hart <dvhart <at> linux.intel.com> wrote:
> 
> > Ideally I would like to see something just like trace_printf()
> > without having to define it myself in each of my testcases. [...]
> 
> We can make the prctl a single-argument thing, at the cost of not allowing \0
in the 
> content. (which is probably sane anyway)
> 
> That way deployment is super-simple:
> 
> 	prctl(35, "My Trace Message");
> 	...
> 
> 	if (asprintf(&msg, "My Trace Message: %d\n", 1234) != -1) {
> 		prctl(35, *msg);
> 		free(*msg);
> 	}
> 
> Thanks,
> 
> 	Ingo
> 
I like this approach - it is doing it nearly the same way I did it with an extra
k-mod (no patch needed) and a debugfs entry handled in that mod.
I only see one thing with the string only data - I am doing stuff where there
are long recording times with also a lot of user events,
in such an environment I need more semantics on the event contents.
In my k-mod solution there's an event ID and the opportunity to log binary data.
As prctl() has 4 additional args after the option, it would be possible to use
it in the following way:
prtctl( 35, int eventID, int data_type, int msglen, void *buf);
or without the data_type
prtctl( 35, int eventID, int msglen, void *buf);
decoding would be of more effort but it would be worth

The event definition would be like this (with data_type):

TRACE_EVENT(user,
        TP_PROTO(int id, int dtype, int dlen, unsigned char *bytes),
        TP_ARGS(id, dtype, dlen, bytes),
        TP_STRUCT__entry(
                __field(int, ev_id)
                __field(int, ev_type)
                __dynamic_array(unsigned char, ev_data, dlen)
        ),
        TP_fast_assign(
                __entry->ev_id = id;
		__entry->ev_type = dtype;
		memcpy(__get_dynamic_array(ev_data), bytes, dlen);
        ),

        TP_printk("ID: %d type: %s data: %s", __entry->ev_id,
__print_symbolic(__entry->ev_type, {0,"V"}, {1,"I"}, {2,"S"}, {4,"B"}),
__entry->ev_type == 0 ? "n/a" : __get_str(ev_data))
);


What do you think about this?
/hp


^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [patch] trace: Add user-space event tracing/injection
  2010-11-18  8:55       ` Ingo Molnar
  2010-11-18 16:25         ` hp
@ 2010-11-18 17:06         ` Darren Hart
  1 sibling, 0 replies; 88+ messages in thread
From: Darren Hart @ 2010-11-18 17:06 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Pekka Enberg, Thomas Gleixner, Peter Zijlstra, Steven Rostedt,
	Arjan van de Ven, Arnaldo Carvalho de Melo, Frederic Weisbecker,
	linux-kernel, Linus Torvalds, Andrew Morton, Arjan van de Ven

On 11/18/2010 12:55 AM, Ingo Molnar wrote:
>
> * Darren Hart<dvhart@linux.intel.com>  wrote:
>
>> Ideally I would like to see something just like trace_printf()
>> without having to define it myself in each of my testcases. [...]
>
> We can make the prctl a single-argument thing, at the cost of not allowing \0 in the
> content. (which is probably sane anyway)
>
> That way deployment is super-simple:
>
> 	prctl(35, "My Trace Message");
> 	...
>
>
> 	if (asprintf(&msg, "My Trace Message: %d\n", 1234) != -1) {
> 		prctl(35, *msg);
> 		free(*msg);
> 	}


I think that would be ideal.

-- 
Darren Hart
Yocto Linux Kernel

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [patch] trace: Add user-space event tracing/injection
  2010-11-18 16:25         ` hp
@ 2010-11-18 18:56           ` Ingo Molnar
  2010-11-18 19:13             ` AW: " Reichert, Hans-Peter
  0 siblings, 1 reply; 88+ messages in thread
From: Ingo Molnar @ 2010-11-18 18:56 UTC (permalink / raw)
  To: hp
  Cc: linux-kernel, Peter Zijlstra, Thomas Gleixner,
	Arnaldo Carvalho de Melo, Frédéric Weisbecker


* hp <hp.reichert@xse.de> wrote:

> Ingo Molnar <mingo <at> elte.hu> writes:
> 
> > 
> > 
> > * Darren Hart <dvhart <at> linux.intel.com> wrote:
> > 
> > > Ideally I would like to see something just like trace_printf()
> > > without having to define it myself in each of my testcases. [...]
> > 
> > We can make the prctl a single-argument thing, at the cost of not allowing \0
> in the 
> > content. (which is probably sane anyway)
> > 
> > That way deployment is super-simple:
> > 
> > 	prctl(35, "My Trace Message");
> > 	...
> > 
> > 	if (asprintf(&msg, "My Trace Message: %d\n", 1234) != -1) {
> > 		prctl(35, *msg);
> > 		free(*msg);
> > 	}
> > 
> > Thanks,
> > 
> > 	Ingo
> > 
> I like this approach - it is doing it nearly the same way I did it with an extra
> k-mod (no patch needed) and a debugfs entry handled in that mod.
> I only see one thing with the string only data - I am doing stuff where there
> are long recording times with also a lot of user events,
> in such an environment I need more semantics on the event contents.
> In my k-mod solution there's an event ID and the opportunity to log binary data.
> As prctl() has 4 additional args after the option, it would be possible to use
> it in the following way:
> prtctl( 35, int eventID, int data_type, int msglen, void *buf);
> or without the data_type
> prtctl( 35, int eventID, int msglen, void *buf);
> decoding would be of more effort but it would be worth
> 
> The event definition would be like this (with data_type):
> 
> TRACE_EVENT(user,
>         TP_PROTO(int id, int dtype, int dlen, unsigned char *bytes),
>         TP_ARGS(id, dtype, dlen, bytes),
>         TP_STRUCT__entry(
>                 __field(int, ev_id)
>                 __field(int, ev_type)
>                 __dynamic_array(unsigned char, ev_data, dlen)
>         ),
>         TP_fast_assign(
>                 __entry->ev_id = id;
> 		__entry->ev_type = dtype;
> 		memcpy(__get_dynamic_array(ev_data), bytes, dlen);
>         ),
> 
>         TP_printk("ID: %d type: %s data: %s", __entry->ev_id,
> __print_symbolic(__entry->ev_type, {0,"V"}, {1,"I"}, {2,"S"}, {4,"B"}),
> __entry->ev_type == 0 ? "n/a" : __get_str(ev_data))
> );
> 
> 
> What do you think about this?

The transport was not limited to strings - it's a memory buffer of 'len' bytes.

In that sense 'ev_id' and 'ev_type' above is really just hardcoding something that 
the app might not care about.

For example with the patch i sent one could send 1 byte messages - no other 
overhead. (beyond the standard record header)

While if an app does want to use an (ev_id, ev_type), it can still do so. Or if an 
app wants to do some other message type, that can be done too - it's free-form.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 88+ messages in thread

* AW: [patch] trace: Add user-space event tracing/injection
  2010-11-18 18:56           ` Ingo Molnar
@ 2010-11-18 19:13             ` Reichert, Hans-Peter
  0 siblings, 0 replies; 88+ messages in thread
From: Reichert, Hans-Peter @ 2010-11-18 19:13 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: linux-kernel, Peter Zijlstra, Thomas Gleixner,
	Arnaldo Carvalho de Melo, Frédéric Weisbecker




XS Embedded GmbH
Roggenbachstr. 6
D-78050 Villingen - Schwenningen

Telefon +49 (7721) 4060 - 450
Fax     +49 (7721) 4060 - 499

www.xse.de
Hans-Peter.Reichert@xse.de

:::::::::: based.on.visions ::::::::::


XS Embedded GmbH
Geschäftsführung: Michael Maier, Rainer Oder

Handelsregister: Amtsgericht Freiburg i.Br., HRB 705237
Sitz der Gesellschaft: Roggenbachstrasse 6, 78050 Villingen-Schwenningen
USt-Id Nummer: DE268786819

Confidentiality Notice: This e-mail message, including any attachments, is for the sole use of the intended recipient(s) and may contain confidential and privileged information. Any unauthorized review, use, disclosure or distribution is prohibited. If you are not the intended recipient, please contact the sender by return e-mail and destroy all copies of the original message.

-----Ursprüngliche Nachricht-----
Von: Ingo Molnar [mailto:mingo@elte.hu]
Gesendet: Donnerstag, 18. November 2010 19:57
An: Reichert, Hans-Peter
Cc: linux-kernel@vger.kernel.org; Peter Zijlstra; Thomas Gleixner; Arnaldo Carvalho de Melo; Frédéric Weisbecker
Betreff: Re: [patch] trace: Add user-space event tracing/injection


* hp <hp.reichert@xse.de> wrote:

> Ingo Molnar <mingo <at> elte.hu> writes:
>
> >
> >
> > * Darren Hart <dvhart <at> linux.intel.com> wrote:
> >
> > > Ideally I would like to see something just like trace_printf()
> > > without having to define it myself in each of my testcases. [...]
> >
> > We can make the prctl a single-argument thing, at the cost of not allowing \0
> in the
> > content. (which is probably sane anyway)
> >
> > That way deployment is super-simple:
> >
> >     prctl(35, "My Trace Message");
> >     ...
> >
> >     if (asprintf(&msg, "My Trace Message: %d\n", 1234) != -1) {
> >             prctl(35, *msg);
> >             free(*msg);
> >     }
> >
> > Thanks,
> >
> >     Ingo
> >
> I like this approach - it is doing it nearly the same way I did it with an extra
> k-mod (no patch needed) and a debugfs entry handled in that mod.
> I only see one thing with the string only data - I am doing stuff where there
> are long recording times with also a lot of user events,
> in such an environment I need more semantics on the event contents.
> In my k-mod solution there's an event ID and the opportunity to log binary data.
> As prctl() has 4 additional args after the option, it would be possible to use
> it in the following way:
> prtctl( 35, int eventID, int data_type, int msglen, void *buf);
> or without the data_type
> prtctl( 35, int eventID, int msglen, void *buf);
> decoding would be of more effort but it would be worth
>
> The event definition would be like this (with data_type):
>
> TRACE_EVENT(user,
>         TP_PROTO(int id, int dtype, int dlen, unsigned char *bytes),
>         TP_ARGS(id, dtype, dlen, bytes),
>         TP_STRUCT__entry(
>                 __field(int, ev_id)
>                 __field(int, ev_type)
>                 __dynamic_array(unsigned char, ev_data, dlen)
>         ),
>         TP_fast_assign(
>                 __entry->ev_id = id;
>               __entry->ev_type = dtype;
>               memcpy(__get_dynamic_array(ev_data), bytes, dlen);
>         ),
>
>         TP_printk("ID: %d type: %s data: %s", __entry->ev_id,
> __print_symbolic(__entry->ev_type, {0,"V"}, {1,"I"}, {2,"S"}, {4,"B"}),
> __entry->ev_type == 0 ? "n/a" : __get_str(ev_data))
> );
>
>
> What do you think about this?

The transport was not limited to strings - it's a memory buffer of 'len' bytes.

In that sense 'ev_id' and 'ev_type' above is really just hardcoding something that
the app might not care about.

For example with the patch i sent one could send 1 byte messages - no other
overhead. (beyond the standard record header)

While if an app does want to use an (ev_id, ev_type), it can still do so. Or if an
app wants to do some other message type, that can be done too - it's free-form.

Thanks,

        Ingo

I agree.
The reason for this was to have a format description that could be used within "trace-cmd report"
without having the need to implement a separate decoder for (each) user land application.
I did a lot of system tracing for the last 10years on a different 'NX system.
>From an analysts sight it is more than just convenience that the tracing format is the same for every application,
just think of a system that is build up of binaries of several suppliers - like it will be in automotive projects.
Having a uniform format description would highly increase usage and acceptance.

Thanks
/hp





^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [ANNOUNCE] New utility: 'trace'
  2010-11-17  8:30     ` Ingo Molnar
  2010-11-17 11:35       ` Peter Zijlstra
@ 2010-11-18 23:23       ` Joe Perches
  1 sibling, 0 replies; 88+ messages in thread
From: Joe Perches @ 2010-11-18 23:23 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Frederic Weisbecker, Darren Hart, Thomas Gleixner, LKML,
	Linus Torvalds, Andrew Morton, Peter Zijlstra, Steven Rostedt,
	Arjan van de Ven, Arnaldo Carvalho de Melo, Masami Hiramatsu,
	Tom Zanussi, Mathieu Desnoyers, Li Zefan, Jason Baron,
	David S. Miller, Christoph Hellwig, Pekka Enberg, Lai Jiangshan,
	Eric Dumazet

On Wed, 2010-11-17 at 09:30 +0100, Ingo Molnar wrote:
> * Frederic Weisbecker <fweisbec@gmail.com> wrote:
> > An idea is to reproduce the kernel file hierarchy in a "printk" event
> > subsystem, but this implies to allow subsystems nesting.
> > But may be the whole idea is just fancy and nobody will care, [...]
> I think it's an excellent idea, and it could also give us the framework to integrate 
> the dynamic_printk points.
[]
> > For example I'm currently working with dozens of trace_printk() and I would be 
> > very happy to turn some of them off half of the time.
> I guess we could try such a patch. If you send a prototype i'd be interested in 
> testing it out.

Another concept you might consider would be to
selectively compile trace points per compilation
unit.  This might help embedded users that might
want to use trace only in a few areas without the
additional text overhead.

Perhaps something like:

(default on)

#ifdef ENABLE_TRACE_COMPILATION
# normal tracepoint macro defines
#else
# static inline tracepoint functions {} or null macro defines
#endif



^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [ANNOUNCE] New utility: 'trace'
       [not found]   ` <20101118151141.GA3368@redhat.com>
@ 2010-11-19  2:32     ` Ian Munsie
  2010-11-19 15:23       ` Jason Baron
       [not found]       ` <4CECACF9.3080907@linux.vnet.ibm.com>
  0 siblings, 2 replies; 88+ messages in thread
From: Ian Munsie @ 2010-11-19  2:32 UTC (permalink / raw)
  To: Jason Baron; +Cc: Avantika Mathur, linux-kernel

Hi Jason,

I'm adding Avantika to the CC list as she recently contacted me
regarding syscall tracepoints on PowerPC.

Excerpts from Jason Baron's message of Fri Nov 19 02:11:42 +1100 2010:
> I was wondering what the status of those patches were...last I remember
> they looked pretty good to me...I'd be more than happy to help review
> those.

Yeah there isn't much left to do on them - the main delay has just been
finding some time to work on them.

I did want to revisit the naming of the events and the subtle
differences for syscalls with a single implementation for both the
native and compat versions versus those that have separate
implementations - I don't want userspace to have to care about that
distinction.

I recall there were a few comments on some of your patches in the series
- if I rebase the patches on the current tip tree and publish the tree
somewhere would you be willing to take a look at those?


Cheers,
-Ian

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [ANNOUNCE] New utility: 'trace'
  2010-11-19  2:32     ` Ian Munsie
@ 2010-11-19 15:23       ` Jason Baron
       [not found]       ` <4CECACF9.3080907@linux.vnet.ibm.com>
  1 sibling, 0 replies; 88+ messages in thread
From: Jason Baron @ 2010-11-19 15:23 UTC (permalink / raw)
  To: Ian Munsie; +Cc: Avantika Mathur, linux-kernel

On Fri, Nov 19, 2010 at 01:32:08PM +1100, Ian Munsie wrote:
> Hi Jason,
> 
> I'm adding Avantika to the CC list as she recently contacted me
> regarding syscall tracepoints on PowerPC.
> 
> Excerpts from Jason Baron's message of Fri Nov 19 02:11:42 +1100 2010:
> > I was wondering what the status of those patches were...last I remember
> > they looked pretty good to me...I'd be more than happy to help review
> > those.
> 
> Yeah there isn't much left to do on them - the main delay has just been
> finding some time to work on them.
> 
> I did want to revisit the naming of the events and the subtle
> differences for syscalls with a single implementation for both the
> native and compat versions versus those that have separate
> implementations - I don't want userspace to have to care about that
> distinction.
> 
> I recall there were a few comments on some of your patches in the series
> - if I rebase the patches on the current tip tree and publish the tree
> somewhere would you be willing to take a look at those?
> 

sure. the remaining issue was to combine the enabled/disabled flags into
the 'int nb_args' field since we are only using 3 bits of it. that
shouldn't be too bad...

thanks,

-Jason


^ permalink raw reply	[flat|nested] 88+ messages in thread

* PowerPC, ftrace: Add PPC raw syscall tracepoints & ftrace fixes
       [not found]         ` <4CFD7182.4060206@linux.vnet.ibm.com>
@ 2010-12-07  4:03           ` Ian Munsie
  0 siblings, 0 replies; 88+ messages in thread
From: Ian Munsie @ 2010-12-07  4:03 UTC (permalink / raw)
  To: Avantika Mathur, Jason Baron, linux-kernel

Excerpts from Avantika Mathur's message of Tue Dec 07 10:28:02 +1100 2010:
> Were you able to find any time to work on the patches?  I understand if 
> you are not able to work on this.  If you don't have time to push the 
> patches soon,  I would like to push at your 4 patches that add the 
> powerpc syscall tracing support.
> Let me know what your status is.  I woud like to plan to do this in the 
> next couple days

Hi Avantika,

I haven't had a chance to work on them unfortunately, I've been pretty
tied up with other work.

I've rebased the subset of patches necessary to get the raw syscall
tracepoints working and to ensure that none of the unworking ftrace
syscall events show up. This won't give you the per syscall events that
ftrace can provide, but if all you need is the raw tracepoints they will
work. I'll send them to the mailing list shortly.


Jason: It seems that publishing a tree for you to work on will be time
consuming due to legal concerns of publishing the whole kernel tree.
Rather than go down that path it will be easier for me to just send you
the patches directly. I'll try to find some time to rebase them and send
them to you before I go away next week.


Cheers,
-Ian

^ permalink raw reply	[flat|nested] 88+ messages in thread

end of thread, other threads:[~2010-12-07  4:03 UTC | newest]

Thread overview: 88+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2010-11-16 21:04 [ANNOUNCE] New utility: 'trace' Thomas Gleixner
2010-11-16 21:27 ` Darren Hart
2010-11-16 21:59   ` Ingo Molnar
2010-11-16 22:03     ` Peter Zijlstra
2010-11-16 22:08       ` Ingo Molnar
2010-11-16 22:09       ` Darren Hart
2010-11-16 22:48         ` Darren Hart
2010-11-16 22:07     ` Darren Hart
2010-11-16 22:17   ` Frederic Weisbecker
2010-11-17  8:30     ` Ingo Molnar
2010-11-17 11:35       ` Peter Zijlstra
2010-11-17 12:53         ` Frederic Weisbecker
2010-11-17 13:02           ` Peter Zijlstra
2010-11-17 13:10             ` Ingo Molnar
2010-11-17 13:36               ` Peter Zijlstra
2010-11-17 13:43                 ` Frederic Weisbecker
2010-11-17 13:53                   ` Peter Zijlstra
2010-11-17 14:10                     ` Frederic Weisbecker
2010-11-17 15:02                       ` Steven Rostedt
2010-11-17 18:13                       ` Ted Ts'o
2010-11-17 18:29                         ` Frederic Weisbecker
2010-11-17 18:30                           ` Darren Hart
2010-11-17 16:49             ` Darren Hart
2010-11-18 23:23       ` Joe Perches
2010-11-16 21:35 ` Ingo Molnar
2010-11-17  1:37 ` Ted Ts'o
2010-11-17  1:47   ` Steven Rostedt
2010-11-17  3:16     ` Ted Ts'o
2010-11-17  3:34       ` Steven Rostedt
2010-11-17 19:00         ` Bob Copeland
2010-11-17 13:24   ` Ingo Molnar
2010-11-17 13:38     ` Peter Zijlstra
2010-11-17 14:00       ` Ingo Molnar
2010-11-17 14:11         ` Peter Zijlstra
2010-11-17 14:18           ` Peter Zijlstra
2010-11-17 14:37             ` Thomas Gleixner
2010-11-17 14:41               ` Peter Zijlstra
2010-11-17 15:41               ` Tom Zanussi
2010-11-17 15:55                 ` Avi Kivity
2010-11-17 15:58                   ` Thomas Gleixner
2010-11-17 16:20                     ` Avi Kivity
2010-11-17 15:59                   ` Peter Zijlstra
2010-11-17 16:08                     ` Avi Kivity
2010-11-17 15:10           ` Steven Rostedt
2010-11-17 15:43             ` Peter Zijlstra
2010-11-17 16:04               ` Steven Rostedt
2010-11-17 16:15               ` Steven Rostedt
2010-11-17 19:40                 ` Darren Hart
2010-11-17 18:23               ` Mathieu Desnoyers
2010-11-18  5:58                 ` Masami Hiramatsu
2010-11-18  6:00               ` Masami Hiramatsu
2010-11-17 15:33           ` Tom Zanussi
2010-11-17 18:36             ` Mathieu Desnoyers
2010-11-17 18:53               ` Tom Zanussi
2010-11-17 19:02                 ` Mathieu Desnoyers
2010-11-17 19:25                   ` Tom Zanussi
2010-11-17 19:25               ` Steven Rostedt
2010-11-17  9:49 ` Philipp Marek
     [not found] ` <4CE38C53.8090606@kernel.org>
2010-11-17 12:07   ` [patch] trace: Add user-space event tracing/injection Ingo Molnar
2010-11-17 12:17     ` Pekka Enberg
2010-11-17 12:30       ` Ingo Molnar
2010-11-17 12:33         ` Pekka Enberg
2010-11-17 12:42           ` Peter Zijlstra
2010-11-17 12:58             ` Ingo Molnar
2010-11-17 13:09               ` Peter Zijlstra
2010-11-17 13:36                 ` Ingo Molnar
2010-11-17 12:24     ` Ingo Molnar
2010-11-17 12:30       ` Pekka Enberg
2010-11-17 12:29     ` Peter Zijlstra
2010-11-17 12:37       ` Peter Zijlstra
2010-11-17 13:10       ` Frederic Weisbecker
2010-11-17 13:31         ` Peter Zijlstra
2010-11-17 13:33           ` Frederic Weisbecker
2010-11-18  1:18     ` Darren Hart
2010-11-18  8:55       ` Ingo Molnar
2010-11-18 16:25         ` hp
2010-11-18 18:56           ` Ingo Molnar
2010-11-18 19:13             ` AW: " Reichert, Hans-Peter
2010-11-18 17:06         ` Darren Hart
2010-11-17 12:47 ` [ANNOUNCE] New utility: 'trace' Török Edwin
2010-11-17 12:51   ` Peter Zijlstra
2010-11-17 13:05     ` Török Edwin
2010-11-17 13:10       ` Peter Zijlstra
2010-11-17 13:32   ` Frederic Weisbecker
2010-11-18  0:47 ` Ian Munsie
     [not found]   ` <20101118151141.GA3368@redhat.com>
2010-11-19  2:32     ` Ian Munsie
2010-11-19 15:23       ` Jason Baron
     [not found]       ` <4CECACF9.3080907@linux.vnet.ibm.com>
     [not found]         ` <4CFD7182.4060206@linux.vnet.ibm.com>
2010-12-07  4:03           ` PowerPC, ftrace: Add PPC raw syscall tracepoints & ftrace fixes Ian Munsie

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).