* [PATCHSET 00/21] perf tools: Add support to accumulate hist periods (v4)
@ 2013-12-24  8:22 Namhyung Kim
  2013-12-24  8:22 ` [PATCH 01/21] perf tools: Introduce struct add_entry_iter Namhyung Kim
                   ` (20 more replies)
  0 siblings, 21 replies; 56+ messages in thread
From: Namhyung Kim @ 2013-12-24  8:22 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Peter Zijlstra, Paul Mackerras, Ingo Molnar, Namhyung Kim, LKML,
	Arun Sharma, Frederic Weisbecker, Jiri Olsa, Rodrigo Campos

Hello,

This is my third attempt to implement cumulative hist period report.
This work begins from Arun's SORT_INCLUSIVE patch [1] but I completely
rewrote it from scratch.

Please see patch 01/21.  I refactored the functions that add hist
entries to use struct add_entry_iter.  While I converted all functions
carefully, it'd be better if someone could test and confirm that I
didn't mess something up - especially for branch stack and mem stuff.

This patchset basically adds the period of a sample to every node in
its callchain.  A hist_entry now has additional fields to keep the
cumulative period if the --children option is given to perf report.
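
A minimal sketch of this accumulation, under my own assumptions: struct
entry, get_entry() and add_sample() are hypothetical names and not the
code in this series, and the skip for a symbol that appears more than
once in a chain is only my reading of the "Check if accumulated when
adding a hist entry" patch title.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MAX_ENTRIES 32

/* Hypothetical, simplified stand-in for a hist entry: one per symbol. */
struct entry {
	const char *sym;
	uint64_t self;      /* period of samples that hit this symbol directly */
	uint64_t children;  /* self plus the period of samples in its callees */
};

static struct entry entries[MAX_ENTRIES];
static int nr_entries;

/* Find the entry for a symbol, creating it on first use. */
static struct entry *get_entry(const char *sym)
{
	for (int i = 0; i < nr_entries; i++)
		if (strcmp(entries[i].sym, sym) == 0)
			return &entries[i];

	assert(nr_entries < MAX_ENTRIES);
	entries[nr_entries].sym = sym;
	return &entries[nr_entries++];
}

/*
 * Account one sample.  chain[0] is the sampled function and the
 * remaining elements are its callers, innermost first.
 */
static void add_sample(const char **chain, int depth, uint64_t period)
{
	for (int i = 0; i < depth; i++) {
		struct entry *e = get_entry(chain[i]);
		int seen = 0;

		if (i == 0)
			e->self += period;

		/* Assumption: credit each symbol once per sample (recursion). */
		for (int j = 0; j < i; j++)
			if (strcmp(chain[j], chain[i]) == 0)
				seen = 1;
		if (!seen)
			e->children += period;
	}
}
```

Feeding it the chain { "a", "b", "c", "main" } with some period leaves
only "a" with a non-zero self value, while all four entries get the
same children value.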

I changed the option to a separate --children option and added a new
"Children" column (and renamed the default "Overhead" column to
"Self").  The output is sorted by children (cumulative) overhead
for now.  The reason I switched to --children is that I still think
it's quite different from the other --callchain options, and I plan to
add support for showing the (remaining) callchains of cumulative
entries too, as Arun requested.  The --callchain option will take care
of that even when --children is given.

I know that the UI should also be changed to be more flexible, as Ingo
requested, but I'd like to do this first and then move on to that.
I also added a new config option to enable it by default.

 * changes in v4:
  - change to --children option (Ingo)
  - rebased on new annotation change (Arnaldo)
  - support perf top also
  - enable --children option by default (Ingo)

 * changes in v3:
  - change to --cumulate option
  - fix a couple of bugs (Jiri, Rodrigo)
  - rename some helper functions (Arnaldo)
  - cache previous hist entries rather than just symbol and dso
  - add some preparatory cleanups
  - add report.cumulate config option


Let me show you an example:

  $ cat abc.c
  #define barrier() asm volatile("" ::: "memory")

  void a(void)
  {
  	int i;
  	for (i = 0; i < 1000000; i++)
  		barrier();
  }
  void b(void)
  {
  	a();
  }
  void c(void)
  {
  	b();
  }
  int main(void)
  {
  	c();
  	return 0;
  }

With this simple program I ran perf record and report:

  $ perf record -g -e cycles:u ./abc

  $ perf report --stdio
      88.29%      abc  abc                [.] a                  
                  |
                  --- a
                      b
                      c
                      main
                      __libc_start_main

       9.43%      abc  ld-2.17.so         [.] _dl_relocate_object
                  |
                  --- _dl_relocate_object
                      dl_main
                      _dl_sysdep_start

       2.27%      abc  [kernel.kallsyms]  [k] page_fault         
                  |
                  --- page_fault
                     |          
                     |--95.94%-- _dl_sysdep_start
                     |          _dl_start_user
                     |          
                      --4.06%-- _start

       0.00%      abc  ld-2.17.so         [.] _start             
                  |
                  --- _start


When the --children option is given, it'll be shown like this:

  $ perf report --children --stdio

  #     Self  Children  Command      Shared Object                   Symbol
  # ........  ........  .......  .................  .......................
  #
       0.00%    88.29%      abc  libc-2.17.so       [.] __libc_start_main  
       0.00%    88.29%      abc  abc                [.] main               
       0.00%    88.29%      abc  abc                [.] c                  
       0.00%    88.29%      abc  abc                [.] b                  
      88.29%    88.29%      abc  abc                [.] a                  
       0.00%    11.61%      abc  ld-2.17.so         [.] _dl_sysdep_start   
       0.00%     9.43%      abc  ld-2.17.so         [.] dl_main            
       9.43%     9.43%      abc  ld-2.17.so         [.] _dl_relocate_object
       2.27%     2.27%      abc  [kernel.kallsyms]  [k] page_fault         
       0.00%     2.18%      abc  ld-2.17.so         [.] _dl_start_user     
       0.00%     0.10%      abc  ld-2.17.so         [.] _start             

As you can see, the __libc_start_main -> main -> c -> b -> a callchain
shows up in the output.
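
The Children values can be cross-checked against the callchains in the
first report.  The sketch below (function names are mine, for
illustration only) recomputes the _dl_sysdep_start line: it is a caller
of the whole _dl_relocate_object chain and of the 95.94% branch of
page_fault.

```c
#include <assert.h>

/* Portion of an entry's overhead that flows through one callchain branch. */
static double branch_overhead(double entry_pct, double branch_pct)
{
	assert(branch_pct >= 0.0 && branch_pct <= 100.0);
	return entry_pct * branch_pct / 100.0;
}

/* Children overhead of _dl_sysdep_start, from the numbers printed above. */
static double dl_sysdep_start_children(void)
{
	/* _dl_relocate_object: the whole chain goes through _dl_sysdep_start */
	double via_relocate = 9.43;
	/* page_fault: only the first branch goes through _dl_sysdep_start */
	double via_page_fault = branch_overhead(2.27, 95.94);

	return via_relocate + via_page_fault;
}
```

The page_fault branch comes to about 2.18%, which is also the Children
value shown for _dl_start_user, and the sum comes to about 11.61%.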

I know it has some rough edges or even bugs, but I really want to
release it and get reviews.  It does not handle event groups and
annotations yet.

You can also get this series on 'perf/cumulate-v4' branch in my tree at:

  git://git.kernel.org/pub/scm/linux/kernel/git/namhyung/linux-perf.git


Any comments are welcome, thanks.
Namhyung


Cc: Arun Sharma <asharma@fb.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>

[1] https://lkml.org/lkml/2012/3/31/6


Namhyung Kim (21):
  perf tools: Introduce struct add_entry_iter
  perf hists: Convert hist entry functions to use struct he_stat
  perf hists: Add support for accumulated stat of hist entry
  perf hists: Check if accumulated when adding a hist entry
  perf hists: Accumulate hist entry stat based on the callchain
  perf tools: Update cpumode for each cumulative entry
  perf report: Cache cumulative callchains
  perf hists: Sort hist entries by accumulated period
  perf ui/hist: Add support to accumulated hist stat
  perf ui/browser: Add support to accumulated hist stat
  perf ui/gtk: Add support to accumulated hist stat
  perf tools: Apply percent-limit to cumulative percentage
  perf tools: Add more hpp helper functions
  perf report: Add --children option
  perf report: Add report.children config option
  perf tools: Factor out sample__resolve_callchain()
  perf tools: Factor out fill_callchain_info()
  perf top: Support callchain accumulation
  perf top: Add --children option
  perf top: Add top.children config option
  perf tools: Enable --children option by default

 tools/perf/Documentation/perf-report.txt |   5 +
 tools/perf/Documentation/perf-top.txt    |   6 +
 tools/perf/builtin-annotate.c            |   3 +-
 tools/perf/builtin-diff.c                |   2 +-
 tools/perf/builtin-report.c              | 534 +++++++++++++++++++++++++------
 tools/perf/builtin-top.c                 | 137 +++++++-
 tools/perf/tests/hists_link.c            |   4 +-
 tools/perf/ui/browsers/hists.c           |  51 ++-
 tools/perf/ui/gtk/hists.c                |  27 +-
 tools/perf/ui/hist.c                     |  62 ++++
 tools/perf/ui/stdio/hist.c               |  13 +-
 tools/perf/util/callchain.c              |  65 ++++
 tools/perf/util/callchain.h              |   8 +
 tools/perf/util/hist.c                   |  73 +++--
 tools/perf/util/hist.h                   |   7 +-
 tools/perf/util/sort.h                   |   1 +
 tools/perf/util/symbol.c                 |  11 +-
 tools/perf/util/symbol.h                 |   1 +
 18 files changed, 855 insertions(+), 155 deletions(-)

-- 
1.7.11.7


* [PATCHSET 00/24] perf tools: Add support to accumulate hist periods (v7)
@ 2014-01-23  0:13 Namhyung Kim
  2014-01-23  0:14 ` [PATCH 21/21] perf tools: Enable --children option by default Namhyung Kim
  0 siblings, 1 reply; 56+ messages in thread
From: Namhyung Kim @ 2014-01-23  0:13 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Peter Zijlstra, Ingo Molnar, Paul Mackerras, Namhyung Kim, LKML,
	Arun Sharma, Jiri Olsa, Rodrigo Campos, Andi Kleen,
	Frederic Weisbecker

Hello,

This is a new attempt to implement cumulative hist period report.
This work begins from Arun's SORT_INCLUSIVE patch [1] but I completely
rewrote it from scratch.

This patchset is based on my previous patchset [2] but I think it's
almost independent so that it can be applied separately.

Please see patch 03/24.  I refactored the functions that add hist
entries to use struct hist_entry_iter.  While I converted all functions
carefully, it'd be better if someone could test and confirm that I
didn't mess something up - especially for branch stack and mem stuff.

This patchset basically adds the period of a sample to every node in
its callchain.  A hist_entry now has additional fields to keep the
cumulative period if the --children option is given to perf report.

I changed the option to a separate --children option and added a new
"Children" column (and renamed the default "Overhead" column to
"Self").  The output is sorted by children (cumulative) overhead
for now.  The reason I switched to --children is that I still think
it's quite different from the other --call-graph options.  The
--call-graph option still takes care of showing callchains even when
--children is given.

I know that the UI should also be changed to be more flexible, as Ingo
requested, but I'd like to do this first and then move on to that.
I also added a new config option to enable it by default.
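
Going by the patch titles in this series (report.children and
top.children), the setting would presumably live in ~/.perfconfig as
below.  This is a sketch, not taken from the patches; the exact values
accepted are an assumption on my part.

```
[report]
	children = true

[top]
	children = true
```

Once the last patch makes --children the default, setting these to
false would presumably be the way to turn it off again.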

 * changes in v7:
  - add Tested-by tags from Arun
  - rebase onto current acme/perf/core

 * changes in v6:
  - separate struct hist_iter_ops (Jiri)
  - check iter->he before calling ->add_entry_cb (Jiri)
  - fix locking issue on perf top (Jiri)

 * changes in v5:
  - support both of --children and --call-graph (Arun)
  - refactor hist_entry_iter to share with perf top (Jiri)
  - various cleanups and fixes (Jiri)
  - add ack's from Jiri

 * changes in v4:
  - change to --children option (Ingo)
  - rebased on new annotation change (Arnaldo)
  - support perf top also
  - enable --children option by default (Ingo)

 * changes in v3:
  - change to --cumulate option
  - fix a couple of bugs (Jiri, Rodrigo)
  - rename some helper functions (Arnaldo)
  - cache previous hist entries rather than just symbol and dso
  - add some preparatory cleanups
  - add report.cumulate config option


Let me show you an example:

  $ cat abc.c
  #define barrier() asm volatile("" ::: "memory")

  void a(void)
  {
  	int i;
  	for (i = 0; i < 1000000; i++)
  		barrier();
  }
  void b(void)
  {
  	a();
  }
  void c(void)
  {
  	b();
  }
  int main(void)
  {
  	c();
  	return 0;
  }

With this simple program I ran perf record and report:

  $ perf record -g -e cycles:u ./abc


Case 1.

  $ perf report --stdio --no-call-graph --no-children

  # Overhead  Command      Shared Object          Symbol
  # ........  .......  .................  ..............
  #
      91.50%      abc  abc                [.] a         
       8.18%      abc  ld-2.17.so         [.] strlen    
       0.31%      abc  [kernel.kallsyms]  [k] page_fault
       0.01%      abc  ld-2.17.so         [.] _start    


Case 2. (current default behavior)

  $ perf report --stdio --call-graph --no-children

  # Overhead  Command      Shared Object          Symbol
  # ........  .......  .................  ..............
  #
      91.50%      abc  abc                [.] a         
                  |
                  --- a
                      b
                      c
                      main
                      __libc_start_main

       8.18%      abc  ld-2.17.so         [.] strlen    
                  |
                  --- strlen
                      _dl_sysdep_start

       0.31%      abc  [kernel.kallsyms]  [k] page_fault
                  |
                  --- page_fault
                      _start

       0.01%      abc  ld-2.17.so         [.] _start    
                  |
                  --- _start


Case 3.

  $ perf report --no-call-graph --children --stdio

  #     Self  Children  Command      Shared Object                 Symbol
  # ........  ........  .......  .................  .....................
  #
       0.00%    91.50%      abc  libc-2.17.so       [.] __libc_start_main
       0.00%    91.50%      abc  abc                [.] main             
       0.00%    91.50%      abc  abc                [.] c                
       0.00%    91.50%      abc  abc                [.] b                
      91.50%    91.50%      abc  abc                [.] a                
       0.00%     8.18%      abc  ld-2.17.so         [.] _dl_sysdep_start 
       8.18%     8.18%      abc  ld-2.17.so         [.] strlen           
       0.01%     0.33%      abc  ld-2.17.so         [.] _start           
       0.31%     0.31%      abc  [kernel.kallsyms]  [k] page_fault       

As you can see, the __libc_start_main -> main -> c -> b -> a callchain
shows up in the output.

Finally, it looks like this with both options enabled:

Case 4. (default behavior?)

  $ perf report --call-graph --children --stdio

  #     Self  Children  Command      Shared Object                 Symbol
  # ........  ........  .......  .................  .....................
  #
       0.00%    91.50%      abc  libc-2.17.so       [.] __libc_start_main
                  |
                  --- __libc_start_main

       0.00%    91.50%      abc  abc                [.] main             
                  |
                  --- main
                      __libc_start_main

       0.00%    91.50%      abc  abc                [.] c                
                  |
                  --- c
                      main
                      __libc_start_main

       0.00%    91.50%      abc  abc                [.] b                
                  |
                  --- b
                      c
                      main
                      __libc_start_main

      91.50%    91.50%      abc  abc                [.] a                
                  |
                  --- a
                      b
                      c
                      main
                      __libc_start_main
  ...


Currently perf enables both --call-graph and --children when it
finds callchains in the samples.  While this is useful for TUI or GTK,
I'm not sure about stdio, as it'd consume so many lines.

It does not yet handle all kinds of cases, such as event groups and
annotations, but I really want to release it and get reviews.

You can also get this series on 'perf/cumulate-v7' branch in my tree at:

  git://git.kernel.org/pub/scm/linux/kernel/git/namhyung/linux-perf.git


Any comments are welcome, thanks.
Namhyung


Cc: Arun Sharma <asharma@fb.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>

[1] https://lkml.org/lkml/2012/3/31/6
[2] https://lkml.org/lkml/2014/1/22/514


Namhyung Kim (21):
  perf tools: Introduce struct hist_entry_iter
  perf hists: Add support for accumulated stat of hist entry
  perf hists: Check if accumulated when adding a hist entry
  perf hists: Accumulate hist entry stat based on the callchain
  perf tools: Update cpumode for each cumulative entry
  perf report: Cache cumulative callchains
  perf callchain: Add callchain_cursor_snapshot()
  perf tools: Save callchain info for each cumulative entry
  perf hists: Sort hist entries by accumulated period
  perf ui/hist: Add support to accumulated hist stat
  perf ui/browser: Add support to accumulated hist stat
  perf ui/gtk: Add support to accumulated hist stat
  perf tools: Apply percent-limit to cumulative percentage
  perf tools: Add more hpp helper functions
  perf report: Add --children option
  perf report: Add report.children config option
  perf tools: Add callback function to hist_entry_iter
  perf top: Convert to hist_entry_iter
  perf top: Add --children option
  perf top: Add top.children config option
  perf tools: Enable --children option by default

 tools/perf/Documentation/perf-report.txt |   5 +
 tools/perf/Documentation/perf-top.txt    |   6 +
 tools/perf/builtin-annotate.c            |   3 +-
 tools/perf/builtin-diff.c                |   2 +-
 tools/perf/builtin-report.c              | 184 +++--------
 tools/perf/builtin-top.c                 |  97 +++---
 tools/perf/tests/hists_link.c            |   4 +-
 tools/perf/ui/browsers/hists.c           |  50 +--
 tools/perf/ui/gtk/hists.c                |  20 +-
 tools/perf/ui/hist.c                     |  62 ++++
 tools/perf/ui/stdio/hist.c               |   4 +-
 tools/perf/util/callchain.c              |  42 +++
 tools/perf/util/callchain.h              |  11 +
 tools/perf/util/hist.c                   | 506 ++++++++++++++++++++++++++++++-
 tools/perf/util/hist.h                   |  47 ++-
 tools/perf/util/sort.h                   |  18 +-
 tools/perf/util/symbol.c                 |  11 +-
 tools/perf/util/symbol.h                 |   1 +
 18 files changed, 850 insertions(+), 223 deletions(-)

-- 
1.7.11.7


* [PATCHSET 00/21] perf tools: Add support to accumulate hist periods (v8)
@ 2014-02-07  1:35 Namhyung Kim
  2014-02-07  1:35 ` [PATCH 21/21] perf tools: Enable --children option by default Namhyung Kim
  0 siblings, 1 reply; 56+ messages in thread
From: Namhyung Kim @ 2014-02-07  1:35 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Peter Zijlstra, Ingo Molnar, Paul Mackerras, Namhyung Kim, LKML,
	Arun Sharma, Frederic Weisbecker, Rodrigo Campos, Andi Kleen,
	David Ahern

Hello,

This is a new attempt to implement cumulative hist period report.
This work begins from Arun's SORT_INCLUSIVE patch [1] but I completely
rewrote it from scratch.

In this version, I separated out the --percentage patchset so that this
series no longer depends on it, since that one requires further work.

Please see patch 01/21.  I refactored the functions that add hist
entries to use struct hist_entry_iter.  While I converted all functions
carefully, it'd be better if someone could test and confirm that I
didn't mess something up - especially for branch stack and mem stuff.

This patchset basically adds the period of a sample to every node in
its callchain.  A hist_entry now has additional fields to keep the
cumulative period if the --children option is given to perf report.

I changed the option to a separate --children option and added a new
"Children" column (and renamed the default "Overhead" column to
"Self").  The output is sorted by children (cumulative) overhead
for now.  The reason I switched to --children is that I still think
it's quite different from the other --call-graph options.  The
--call-graph option still takes care of showing callchains even when
--children is given.
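
As an illustration of sorting by the cumulative value, here is a small
sketch using qsort().  struct row and both function names are made up
for this example and are not the series' own sort code; how rows with
equal children values are ordered is deliberately left open.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical row of the report; field names are illustrative only. */
struct row {
	const char *sym;
	uint64_t self;
	uint64_t children;
};

/* qsort() comparator: a larger children (cumulative) period sorts first. */
static int cmp_children_desc(const void *a, const void *b)
{
	const struct row *ra = a;
	const struct row *rb = b;

	if (ra->children != rb->children)
		return ra->children < rb->children ? 1 : -1;
	return 0;	/* order among equal rows is left unspecified here */
}

/* Sort the rows in place, highest children period first. */
static void sort_rows(struct row *rows, size_t nr)
{
	assert(rows != NULL || nr == 0);
	qsort(rows, nr, sizeof(*rows), cmp_children_desc);
}
```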

I know that the UI should also be changed to be more flexible, as Ingo
requested, but I'd like to do this first and then move on to that.
I also added a new config option to enable it by default.

 * changes in v8:
  - no longer depends on --percentage patchkit
  - fix callchain resolving bug (Jiri)
  - convert to sample__resolve_{mem,bstack}
  - eliminate 'event' field from hist_entry_iter

 * changes in v7:
  - add Tested-by tags from Arun
  - rebase onto current acme/perf/core

 * changes in v6:
  - separate struct hist_iter_ops (Jiri)
  - check iter->he before calling ->add_entry_cb (Jiri)
  - fix locking issue on perf top (Jiri)

 * changes in v5:
  - support both of --children and --call-graph (Arun)
  - refactor hist_entry_iter to share with perf top (Jiri)
  - various cleanups and fixes (Jiri)
  - add ack's from Jiri

 * changes in v4:
  - change to --children option (Ingo)
  - rebased on new annotation change (Arnaldo)
  - support perf top also
  - enable --children option by default (Ingo)

 * changes in v3:
  - change to --cumulate option
  - fix a couple of bugs (Jiri, Rodrigo)
  - rename some helper functions (Arnaldo)
  - cache previous hist entries rather than just symbol and dso
  - add some preparatory cleanups
  - add report.cumulate config option


Let me show you an example:

  $ cat abc.c
  #define barrier() asm volatile("" ::: "memory")

  void a(void)
  {
  	int i;
  	for (i = 0; i < 1000000; i++)
  		barrier();
  }
  void b(void)
  {
  	a();
  }
  void c(void)
  {
  	b();
  }
  int main(void)
  {
  	c();
  	return 0;
  }

With this simple program I ran perf record and report:

  $ perf record -g -e cycles:u ./abc


Case 1.

  $ perf report --stdio --no-call-graph --no-children

  # Overhead  Command      Shared Object          Symbol
  # ........  .......  .................  ..............
  #
      91.50%      abc  abc                [.] a         
       8.18%      abc  ld-2.17.so         [.] strlen    
       0.31%      abc  [kernel.kallsyms]  [k] page_fault
       0.01%      abc  ld-2.17.so         [.] _start    


Case 2. (current default behavior)

  $ perf report --stdio --call-graph --no-children

  # Overhead  Command      Shared Object          Symbol
  # ........  .......  .................  ..............
  #
      91.50%      abc  abc                [.] a         
                  |
                  --- a
                      b
                      c
                      main
                      __libc_start_main

       8.18%      abc  ld-2.17.so         [.] strlen    
                  |
                  --- strlen
                      _dl_sysdep_start

       0.31%      abc  [kernel.kallsyms]  [k] page_fault
                  |
                  --- page_fault
                      _start

       0.01%      abc  ld-2.17.so         [.] _start    
                  |
                  --- _start


Case 3.

  $ perf report --no-call-graph --children --stdio

  #     Self  Children  Command      Shared Object                 Symbol
  # ........  ........  .......  .................  .....................
  #
       0.00%    91.50%      abc  libc-2.17.so       [.] __libc_start_main
       0.00%    91.50%      abc  abc                [.] main             
       0.00%    91.50%      abc  abc                [.] c                
       0.00%    91.50%      abc  abc                [.] b                
      91.50%    91.50%      abc  abc                [.] a                
       0.00%     8.18%      abc  ld-2.17.so         [.] _dl_sysdep_start 
       8.18%     8.18%      abc  ld-2.17.so         [.] strlen           
       0.01%     0.33%      abc  ld-2.17.so         [.] _start           
       0.31%     0.31%      abc  [kernel.kallsyms]  [k] page_fault       

As you can see, the __libc_start_main -> main -> c -> b -> a callchain
shows up in the output.

Finally, it looks like this with both options enabled:

Case 4. (default behavior?)

  $ perf report --call-graph --children --stdio

  #     Self  Children  Command      Shared Object                 Symbol
  # ........  ........  .......  .................  .....................
  #
       0.00%    91.50%      abc  libc-2.17.so       [.] __libc_start_main
                  |
                  --- __libc_start_main

       0.00%    91.50%      abc  abc                [.] main             
                  |
                  --- main
                      __libc_start_main

       0.00%    91.50%      abc  abc                [.] c                
                  |
                  --- c
                      main
                      __libc_start_main

       0.00%    91.50%      abc  abc                [.] b                
                  |
                  --- b
                      c
                      main
                      __libc_start_main

      91.50%    91.50%      abc  abc                [.] a                
                  |
                  --- a
                      b
                      c
                      main
                      __libc_start_main
  ...


Currently perf enables both --call-graph and --children when it
finds callchains in the samples.  While this is useful for TUI or GTK,
I'm not sure about stdio, as it'd consume so many lines.

It does not yet handle all kinds of cases, such as event groups and
annotations, but I really want to release it and get reviews.

You can also get this series on 'perf/cumulate-v8' branch in my tree at:

  git://git.kernel.org/pub/scm/linux/kernel/git/namhyung/linux-perf.git


Any comments are welcome, thanks.
Namhyung


Cc: Arun Sharma <asharma@fb.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>

[1] https://lkml.org/lkml/2012/3/31/6


Namhyung Kim (21):
  perf tools: Introduce struct hist_entry_iter
  perf hists: Add support for accumulated stat of hist entry
  perf hists: Check if accumulated when adding a hist entry
  perf hists: Accumulate hist entry stat based on the callchain
  perf tools: Update cpumode for each cumulative entry
  perf report: Cache cumulative callchains
  perf callchain: Add callchain_cursor_snapshot()
  perf tools: Save callchain info for each cumulative entry
  perf hists: Sort hist entries by accumulated period
  perf ui/hist: Add support to accumulated hist stat
  perf ui/browser: Add support to accumulated hist stat
  perf ui/gtk: Add support to accumulated hist stat
  perf tools: Apply percent-limit to cumulative percentage
  perf tools: Add more hpp helper functions
  perf report: Add --children option
  perf report: Add report.children config option
  perf tools: Add callback function to hist_entry_iter
  perf top: Convert to hist_entry_iter
  perf top: Add --children option
  perf top: Add top.children config option
  perf tools: Enable --children option by default

 tools/perf/Documentation/perf-report.txt |   5 +
 tools/perf/Documentation/perf-top.txt    |   6 +
 tools/perf/builtin-annotate.c            |   3 +-
 tools/perf/builtin-diff.c                |   2 +-
 tools/perf/builtin-report.c              | 178 +++--------
 tools/perf/builtin-top.c                 |  89 +++---
 tools/perf/tests/hists_link.c            |   4 +-
 tools/perf/ui/browsers/hists.c           |  50 ++--
 tools/perf/ui/gtk/hists.c                |  20 +-
 tools/perf/ui/hist.c                     |  62 ++++
 tools/perf/ui/stdio/hist.c               |   4 +-
 tools/perf/util/callchain.c              |  45 ++-
 tools/perf/util/callchain.h              |  11 +
 tools/perf/util/hist.c                   | 499 ++++++++++++++++++++++++++++++-
 tools/perf/util/hist.h                   |  45 ++-
 tools/perf/util/sort.h                   |  18 +-
 tools/perf/util/symbol.c                 |  11 +-
 tools/perf/util/symbol.h                 |   1 +
 18 files changed, 836 insertions(+), 217 deletions(-)

-- 
1.7.11.7


* [PATCHSET 00/21] perf tools: Add support to accumulate hist periods (v9)
@ 2014-03-20  5:36 Namhyung Kim
  2014-03-20  5:36 ` [PATCH 21/21] perf tools: Enable --children option by default Namhyung Kim
  0 siblings, 1 reply; 56+ messages in thread
From: Namhyung Kim @ 2014-03-20  5:36 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Peter Zijlstra, Ingo Molnar, Paul Mackerras, Namhyung Kim,
	Namhyung Kim, LKML, Jiri Olsa, David Ahern, Frederic Weisbecker,
	Andi Kleen, Arun Sharma, Rodrigo Campos

Hello,

This is a new attempt to implement cumulative hist period report.
This work begins from Arun's SORT_INCLUSIVE patch [1] but I completely
rewrote it from scratch.

This version depends on my previous percentage patchset [2] and output
field patchset [3].  So please test this again, although there isn't
much change in the series itself.

This patchset basically adds the period of a sample to every node in
its callchain.  A hist_entry now has additional fields to keep the
cumulative period if the --children option is given to perf report.

I changed the option to a separate --children option and added a new
"Children" column (and renamed the default "Overhead" column to
"Self").  The output is sorted by children (cumulative) overhead
for now.  It also adds an 'overhead_children' field that the user can
select with the -F/--fields option; it shows "N/A" if accumulation is
not supported (due to a missing callchain).
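
A rough sketch of the "N/A" fallback, with hypothetical names
(format_children() and its parameters are mine, not the hpp helpers in
this series).  The guard against a zero total is also my own addition,
there to avoid a division by zero.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Format the "Children" column for one entry into buf.
 * has_callchain says whether cumulative data exists for the entry;
 * the parameter names are made up for this sketch.
 */
static int format_children(char *buf, size_t size, bool has_callchain,
			   uint64_t children, uint64_t total)
{
	assert(buf != NULL && size > 0);

	/* No callchain means no cumulative period to report. */
	if (!has_callchain || total == 0)
		return snprintf(buf, size, "%8s", "N/A");

	return snprintf(buf, size, "%7.2f%%", 100.0 * children / total);
}
```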

 * changes in v9:
  - support output field option
  - add Acked-by tags from Jiri

 * changes in v8:
  - no longer depends on --percentage patchkit
  - fix callchain resolving bug (Jiri)
  - convert to sample__resolve_{mem,bstack}
  - eliminate 'event' field from hist_entry_iter

 * changes in v7:
  - add Tested-by tags from Arun
  - rebase onto current acme/perf/core

 * changes in v6:
  - separate struct hist_iter_ops (Jiri)
  - check iter->he before calling ->add_entry_cb (Jiri)
  - fix locking issue on perf top (Jiri)

 * changes in v5:
  - support both of --children and --call-graph (Arun)
  - refactor hist_entry_iter to share with perf top (Jiri)
  - various cleanups and fixes (Jiri)
  - add ack's from Jiri

 * changes in v4:
  - change to --children option (Ingo)
  - rebased on new annotation change (Arnaldo)
  - support perf top also
  - enable --children option by default (Ingo)

 * changes in v3:
  - change to --cumulate option
  - fix a couple of bugs (Jiri, Rodrigo)
  - rename some helper functions (Arnaldo)
  - cache previous hist entries rather than just symbol and dso
  - add some preparatory cleanups
  - add report.cumulate config option


Let me show you an example:

  $ cat abc.c
  #define barrier() asm volatile("" ::: "memory")

  void a(void)
  {
  	int i;
  	for (i = 0; i < 1000000; i++)
  		barrier();
  }
  void b(void)
  {
  	a();
  }
  void c(void)
  {
  	b();
  }
  int main(void)
  {
  	c();
  	return 0;
  }

With this simple program I ran perf record and report:

  $ perf record -g -e cycles:u ./abc


Case 1.

  $ perf report --stdio --no-call-graph --no-children

  # Overhead  Command      Shared Object          Symbol
  # ........  .......  .................  ..............
  #
      91.50%      abc  abc                [.] a         
       8.18%      abc  ld-2.17.so         [.] strlen    
       0.31%      abc  [kernel.kallsyms]  [k] page_fault
       0.01%      abc  ld-2.17.so         [.] _start    


Case 2. (current default behavior)

  $ perf report --stdio --call-graph --no-children

  # Overhead  Command      Shared Object          Symbol
  # ........  .......  .................  ..............
  #
      91.50%      abc  abc                [.] a         
                  |
                  --- a
                      b
                      c
                      main
                      __libc_start_main

       8.18%      abc  ld-2.17.so         [.] strlen    
                  |
                  --- strlen
                      _dl_sysdep_start

       0.31%      abc  [kernel.kallsyms]  [k] page_fault
                  |
                  --- page_fault
                      _start

       0.01%      abc  ld-2.17.so         [.] _start    
                  |
                  --- _start


Case 3.

  $ perf report --no-call-graph --children --stdio

  #     Self  Children  Command      Shared Object                 Symbol
  # ........  ........  .......  .................  .....................
  #
       0.00%    91.50%      abc  libc-2.17.so       [.] __libc_start_main
       0.00%    91.50%      abc  abc                [.] main             
       0.00%    91.50%      abc  abc                [.] c                
       0.00%    91.50%      abc  abc                [.] b                
      91.50%    91.50%      abc  abc                [.] a                
       0.00%     8.18%      abc  ld-2.17.so         [.] _dl_sysdep_start 
       8.18%     8.18%      abc  ld-2.17.so         [.] strlen           
       0.01%     0.33%      abc  ld-2.17.so         [.] _start           
       0.31%     0.31%      abc  [kernel.kallsyms]  [k] page_fault       

As you can see, the __libc_start_main -> main -> c -> b -> a callchain
shows up in the output.

Finally, it looks like this with both options enabled:

Case 4. (default behavior?)

  $ perf report --call-graph --children --stdio

  #     Self  Children  Command      Shared Object                 Symbol
  # ........  ........  .......  .................  .....................
  #
       0.00%    91.50%      abc  libc-2.17.so       [.] __libc_start_main
                  |
                  --- __libc_start_main

       0.00%    91.50%      abc  abc                [.] main             
                  |
                  --- main
                      __libc_start_main

       0.00%    91.50%      abc  abc                [.] c                
                  |
                  --- c
                      main
                      __libc_start_main

       0.00%    91.50%      abc  abc                [.] b                
                  |
                  --- b
                      c
                      main
                      __libc_start_main

      91.50%    91.50%      abc  abc                [.] a                
                  |
                  --- a
                      b
                      c
                      main
                      __libc_start_main
  ...


Currently perf enables both --call-graph and --children when it finds
callchains in the samples.  While this is useful for the TUI or GTK,
I'm not sure about stdio, as it would consume so many lines.

It does not yet handle every case, such as event annotation, but I
really want to release it and get reviews.

You can also get this series on 'perf/cumulate-v9' branch in my tree at:

  git://git.kernel.org/pub/scm/linux/kernel/git/namhyung/linux-perf.git


Any comments are welcome, thanks.
Namhyung


Cc: Arun Sharma <asharma@fb.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>

[1] https://lkml.org/lkml/2012/3/31/6
[2] https://lkml.org/lkml/2014/3/10/48
[3] https://lkml.org/lkml/2014/3/19/689


Namhyung Kim (21):
  perf tools: Introduce struct hist_entry_iter
  perf hists: Add support for accumulated stat of hist entry
  perf hists: Check if accumulated when adding a hist entry
  perf hists: Accumulate hist entry stat based on the callchain
  perf tools: Update cpumode for each cumulative entry
  perf report: Cache cumulative callchains
  perf callchain: Add callchain_cursor_snapshot()
  perf tools: Save callchain info for each cumulative entry
  perf ui/hist: Add support to accumulated hist stat
  perf ui/browser: Add support to accumulated hist stat
  perf ui/gtk: Add support to accumulated hist stat
  perf tools: Apply percent-limit to cumulative percentage
  perf tools: Add more hpp helper functions
  perf report: Add --children option
  perf report: Add report.children config option
  perf tools: Do not auto-remove Children column if --fields given
  perf tools: Add callback function to hist_entry_iter
  perf top: Convert to hist_entry_iter
  perf top: Add --children option
  perf top: Add top.children config option
  perf tools: Enable --children option by default

 tools/perf/Documentation/perf-report.txt |   7 +-
 tools/perf/Documentation/perf-top.txt    |   8 +-
 tools/perf/builtin-annotate.c            |   3 +-
 tools/perf/builtin-diff.c                |   2 +-
 tools/perf/builtin-report.c              | 193 +++---------
 tools/perf/builtin-top.c                 |  95 +++---
 tools/perf/tests/hists_link.c            |   4 +-
 tools/perf/ui/browsers/hists.c           |  68 +++--
 tools/perf/ui/gtk/hists.c                |  23 +-
 tools/perf/ui/hist.c                     | 119 ++++++++
 tools/perf/ui/stdio/hist.c               |   4 +-
 tools/perf/util/callchain.c              |  45 ++-
 tools/perf/util/callchain.h              |  11 +
 tools/perf/util/hist.c                   | 495 ++++++++++++++++++++++++++++++-
 tools/perf/util/hist.h                   |  49 ++-
 tools/perf/util/sort.c                   |   1 +
 tools/perf/util/sort.h                   |  18 +-
 tools/perf/util/symbol.c                 |  11 +-
 tools/perf/util/symbol.h                 |   1 +
 19 files changed, 910 insertions(+), 247 deletions(-)

-- 
1.7.11.7


