linux-kernel.vger.kernel.org archive mirror
From: Arnaldo Carvalho de Melo <acme@redhat.com>
To: Ingo Molnar <mingo@kernel.org>
Cc: "Wangnan (F)" <wangnan0@huawei.com>,
	linux-kernel@vger.kernel.org,
	Adrian Hunter <adrian.hunter@intel.com>,
	Borislav Petkov <bp@suse.de>,
	Chandler Carruth <chandlerc@gmail.com>,
	David Ahern <dsahern@gmail.com>,
	Frederic Weisbecker <fweisbec@gmail.com>,
	Jiri Olsa <jolsa@redhat.com>, Namhyung Kim <namhyung@kernel.org>,
	Stephane Eranian <eranian@google.com>,
	pi3orama <pi3orama@163.com>
Subject: Re: [PATCH 13/16] perf callchain: Switch default to 'graph,0.5,caller'
Date: Wed, 21 Oct 2015 10:43:48 -0300
Message-ID: <20151021134348.GI10639@kernel.org>
In-Reply-To: <20151021084816.GA12992@gmail.com>

On Wed, Oct 21, 2015 at 10:48:16AM +0200, Ingo Molnar wrote:
> 
> * Arnaldo Carvalho de Melo <acme@redhat.com> wrote:
> 
> > I was bitten by the --children thing and took some time to get used to it, so I 
> > can relate to that...
> > 
> > I think we should revert this change in callchain default, enough complaints...  
> > Ingo, since you suggested that change, what are your thoughts?
> 
> Btw., one side note, I noticed that the call-graph options to 'perf top' do not 
> match that of perf report. I tried for a couple of minutes to figure out why this 
> doesn't work:
> 
>   perf top -g graph,0.5,caller
> 
> ... only to notice that it's perf report options.

Right, 'perf top' needs 'record' and 'report' knobs, so, for -g, it uses
the 'record' semantics where its parameters specify how to _collect_ the
callchains, not how to _present_ them, i.e.:

perf report:
    -g, --call-graph <output_type,min_percent[,print_limit],call_order[,branch]>
                  Display callchains using output_type (graph, flat, fractal or
                  none), min percent threshold, optional print limit, callchain
                  order, key (function or address), add branches.
                  Default: graph,0.5,caller

perf record:

    -g            enables call-graph recording
        --call-graph <mode[,dump_size]>
                  setup and enables call-graph (stack chain/backtrace)
                  recording: fp dwarf lbr

perf top:
    -g            enables call-graph recording
        --call-graph <mode[,dump_size]>
                  setup and enables call-graph (stack chain/backtrace)
                  recording: fp dwarf lbr

Possibly we could make it smart and accept both cases, interpreting the
parameters as 'report'-like if they start with one of (graph, flat,
fractal or none), with one of (fp, dwarf or lbr) acting as the separator
between the 'report' part and the 'record' part.

This way one could specify both how to collect and how to present
callchains in one --call-graph call.
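
As a rough illustration of the "accept both cases" idea, here is a
minimal Python sketch. It is not perf code: the function name
split_call_graph and the exact splitting rule are assumptions made for
illustration only. The first token decides which kind of list comes
first, and the first keyword of the other kind acts as the separator:

```python
# Hypothetical sketch, not perf's parser: split one combined
# --call-graph value into its 'record' part and its 'report' part.
REPORT_TYPES = {"graph", "flat", "fractal", "none"}   # how to present
RECORD_MODES = {"fp", "dwarf", "lbr"}                 # how to collect


def split_call_graph(spec):
    """Return (record_opts, report_opts) as two lists of tokens."""
    tokens = [t.strip() for t in spec.split(",") if t.strip()]
    if not tokens:
        raise ValueError("empty call-graph specification")

    first = tokens[0]
    if first in RECORD_MODES:
        # 'record' options first; a report keyword starts the second part.
        for i, tok in enumerate(tokens):
            if tok in REPORT_TYPES:
                return tokens[:i], tokens[i:]
        return tokens, []
    if first in REPORT_TYPES:
        # 'report' options first; a record keyword starts the second part.
        for i, tok in enumerate(tokens):
            if tok in RECORD_MODES:
                return tokens[i:], tokens[:i]
        return [], tokens
    raise ValueError("unknown call-graph keyword: " + first)


if __name__ == "__main__":
    print(split_call_graph("graph,0.5,caller"))
    print(split_call_graph("dwarf,8192"))
    print(split_call_graph("graph,0.5,caller,dwarf,8192"))
```

With a rule like this, 'perf top' could keep accepting today's
'record' style values unchanged, while a value starting with a
'report' keyword would be routed to the presentation side.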
 
> A couple of thoughts about defaults:
> 
> 1)
> 
> I think 'perf top' and 'perf report' should provide the very same output by 
> default. The two tools are unified, and we should think of 'perf top' more of a 
> rolling, continuously updated perf report, with some dynamic runtime features that 
> go beyond a simple perf report. Making them diverge only creates confusion.

That is the idea. Now, with 'f' (Enable/Disable events) in the TUI, it is
one step closer to that, i.e. it can switch between top/dynamic and
report/static. We still need more code to ask it to start collecting into
a perf.data file, so that it covers 'record' as well, but the general idea
is to have it all integrated.

> 2) min-percentage
> 
> I suspect the '0.5%' part of the default is not contested by anyone?
> 
> 3) 'graph' vs. 'fractal'
> 
> The 'graph' part of the default: I think 'graph' (absolute percentages) is more 
> intuitive in general than 'fractal' (relative percentages), especially when 
> drilling down deep into more complex call graphs.

Here I'm more worried about polishing the invalid entries in callchains
so that we can have sane caller-based output.

With some work we could switch between caller and callee order without a
huge impact, i.e. without the need to reprocess everything.
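
To make the 'graph' vs. 'fractal' distinction in point 3 concrete:
'fractal' percentages are relative to the parent node, 'graph'
percentages are relative to the whole profile. A minimal Python sketch
of the conversion follows. The symbol names and numbers are made up for
illustration and are not taken from the profile quoted in this thread,
and perf's own accounting may differ in rounding and in how self time
is attributed:

```python
# Hypothetical sketch: turn parent-relative ('fractal') percentages
# into profile-wide ('graph') percentages by multiplying down the tree.
def to_absolute(node, parent_abs):
    """node is (name, percent_of_parent, children); returns [(name, abs_percent)]."""
    name, rel, children = node
    abs_pct = parent_abs * rel / 100.0
    rows = [(name, abs_pct)]
    for child in children:
        rows.extend(to_absolute(child, abs_pct))
    return rows


# Made-up tree: the entry owns 2% of all samples; 'timer_irq' is half
# of that, and its two children split it 80/20.
tree = ("timer_irq", 50.0, [("wakeup", 80.0, []), ("other", 20.0, [])])
rows = to_absolute(tree, 2.0)

if __name__ == "__main__":
    for name, pct in rows:
        print(f"{name}: {pct:.2f}%")
```

A node shown as 80% in 'fractal' mode can thus be well under 1% of the
profile, which is the effect discussed in the example below.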
 
> For example, if you look at this output:
> 
>                     |          |          |          |--41.61%-- local_apic_timer_interrupt
>                     |          |          |          |          |          
>                     |          |          |          |           --100.00%-- hrtimer_interrupt
>                     |          |          |          |                     __run_hrtimer
>                     |          |          |          |                     |          
>                     |          |          |          |                     |--72.98%-- hrtimer_wakeup
>                     |          |          |          |                     |          wake_up_process
>                     |          |          |          |                     |          |          
>                     |          |          |          |                     |           --100.00%-- try_to_wake_up
>                     |          |          |          |                     |                     ttwu_do_activate.constprop.93
>                     |          |          |          |                     |                     activate_task
>                     |          |          |          |                     |                     enqueue_task
>                     |          |          |          |                     |                     enqueue_task_fair
>                     |          |          |          |                     |                     enqueue_entity
>                     |          |          |          |                     |          
>                     |          |          |          |                      --27.02%-- ehci_hrtimer_func
> 
> Would you have guessed that its relevance in reality is:
> 
>                     |          |          |          |          
>                     |          |          |          |--0.11%-- local_apic_timer_interrupt
>                     |          |          |          |          |          
>                     |          |          |          |           --0.10%-- hrtimer_interrupt
>                     |          |          |          |                     __run_hrtimer
>                     |          |          |          |                     |          
>                     |          |          |          |                     |--0.07%-- hrtimer_wakeup
>                     |          |          |          |                     |          wake_up_process
>                     |          |          |          |                     |          |          
>                     |          |          |          |                     |           --0.01%-- try_to_wake_up
>                     |          |          |          |                     |                     ttwu_do_activate.constprop.93
>                     |          |          |          |                     |                     activate_task
>                     |          |          |          |                     |                     enqueue_task
>                     |          |          |          |                     |                     enqueue_task_fair
>                     |          |          |          |                     |                     enqueue_entity
>                     |          |          |          |                     |          
>                     |          |          |          |                      --0.03%-- ehci_hrtimer_func
>                     |          |          |          |          
> 
> ?
> 
> I think the 'big picture' should always be apparent, even when looking at a small 
> detail. Also, it's not _that_ hard to see the relative weight of each entry even 
> if they are small numbers.
> 
> Fractal output can be useful if you are trying to drill down really, really deep 
> and only concentrate on that aspect - but that kind of workflow is probably best 
> served via a search option in any case:
> 
>   perf report --call-graph fractal,0.5,caller --stdio --symbol-filter local_apic_timer_interrupt
> 
> In which case fractal output is the more intuitive one I suspect:
> 
>   # To display the perf.data header info, please use --header/--header-only options.
>   #
>   #
>   # Total Lost Samples: 0
>   #
>   # Samples: 1K of event 'cycles:pp'
>   # Event count (approx.): 1155803425
>   #
>   # Children      Self  Command  Shared Object      Symbol                        
>   # ........  ........  .......  .................  ..............................
>   #
>      0.11%     0.01%  swapper  [kernel.kallsyms]  [k] local_apic_timer_interrupt
>             |          
>             |--89.19%-- local_apic_timer_interrupt
>             |          hrtimer_interrupt
>             |          __run_hrtimer
>             |          |          
>             |          |--72.98%-- hrtimer_wakeup
>             |          |          wake_up_process
>             |          |          |          
>             |          |           --100.00%-- try_to_wake_up
>             |          |                     ttwu_do_activate.constprop.93
>             |          |                     activate_task
>             |          |                     enqueue_task
>             |          |                     enqueue_task_fair
>             |          |                     enqueue_entity
>             |          |          
>             |           --27.02%-- ehci_hrtimer_func
>             |          
>              --10.81%-- start_secondary
>                        cpu_startup_entry
>                        cpuidle_enter
>                        apic_timer_interrupt
>                        smp_apic_timer_interrupt
>                        local_apic_timer_interrupt
> 
> 
> 
>   #
>   # (For a higher level overview, try: perf report --sort comm,dso)
>   #
> 
> Btw., I noticed an oddity, why doesn't "-S local_apic_timer_interrupt" produce any 
> output? It was the first option I tried, and it only gave me:

I'll check that
 
>   triton:~/tip> perf report --call-graph fractal,0.5,caller --stdio -S local_apic_timer_interrupt
>   [nv] with build id 744b5b4279152a54e61208989daf5d3d6b375aa3 not found, continuing without symbols
>   Failed to open /tmp/perf-6650.map, continuing without symbols
>   # To display the perf.data header info, please use --header/--header-only options.
>   #
>   # symbol: local_apic_timer_interrupt
>   #
>   # Total Lost Samples: 0
>   #
>   # Samples: 1K of event 'cycles:pp'
>   # Event count (approx.): 1155803425
>   #
>   # Children      Self  Command  Shared Object
>   # ........  ........  .......  .............
>   #
> 
> 
>   #
>   # (For a higher level overview, try: perf report --sort comm,dso)
>   #
> 
> some symbols could not be found - but the output is pretty confusing and 
> misleading in outputting just empty headers, plus it doesn't explain why it does 
> so.

Needs fixing, will check
 
> 4) 'caller' vs. 'callee'.
> 
> If I change 'caller' to 'callee' in the above example, I get this output:
> 
>   triton:~/tip> perf report --call-graph fractal,0.5,callee --stdio --symbol-filter local_apic_timer_interrupt
>   [nv] with build id 744b5b4279152a54e61208989daf5d3d6b375aa3 not found, continuing without symbols
>   Failed to open /tmp/perf-6650.map, continuing without symbols
>   # To display the perf.data header info, please use --header/--header-only options.
>   #
>   #
>   # Total Lost Samples: 0
>   #
>   # Samples: 1K of event 'cycles:pp'
>   # Event count (approx.): 1155803425
>   #
>   # Children      Self  Command  Shared Object      Symbol                        
>   # ........  ........  .......  .................  ..............................
>   #
>        0.11%     0.01%  swapper  [kernel.kallsyms]  [k] local_apic_timer_interrupt
>               |
>               ---local_apic_timer_interrupt
>                  smp_apic_timer_interrupt
>                  apic_timer_interrupt
>                  cpuidle_enter
>                  cpu_startup_entry
>                  start_secondary
> 
> 
> 
>   #
>   # (For a higher level overview, try: perf report --sort comm,dso)
>   #
> 
> That does not look very helpful, does it?
> 
> Now I tried to test caller vs. callee in perf top - but couldn't find a command 
> line option to do it - is there any?
> 
> 5) --no-children
> 
> I agree that 'perf top -g --no-children' looks more intuitive than 'perf top -g'.

So, what do you propose, to switch back the default to --no-children,
for both tools, top and report? Now that I am getting used to it... ;-)

- Arnaldo

