From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1754998AbdERJof (ORCPT ); Thu, 18 May 2017 05:44:35 -0400 Received: from LGEAMRELO11.lge.com ([156.147.23.51]:48891 "EHLO lgeamrelo11.lge.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1754325AbdERJob (ORCPT ); Thu, 18 May 2017 05:44:31 -0400 X-Original-SENDERIP: 156.147.1.127 X-Original-MAILFROM: namhyung@kernel.org X-Original-SENDERIP: 10.177.227.17 X-Original-MAILFROM: namhyung@kernel.org Date: Thu, 18 May 2017 18:44:25 +0900 From: Namhyung Kim To: Milian Wolff Cc: Linux-kernel@vger.kernel.org, linux-perf-users@vger.kernel.org, Arnaldo Carvalho de Melo , David Ahern , Peter Zijlstra , Yao Jin , kernel-team@lge.com Subject: Re: [PATCH v3] perf report: fix off-by-one for non-activation frames Message-ID: <20170518094425.GD4885@sejong> References: <20170518083735.14126-1-milian.wolff@kdab.com> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline In-Reply-To: <20170518083735.14126-1-milian.wolff@kdab.com> User-Agent: Mutt/1.8.2 (2017-04-18) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Hi Milian, On Thu, May 18, 2017 at 10:37:35AM +0200, Milian Wolff wrote: > As the documentation for dwfl_frame_pc says, frames that > are no activation frames need to have their program counter > decremented by one to properly find the function of the caller. > > This fixes many cases where perf report currently attributes > the cost to the next line. I.e. I have code like this: > > ~~~~~~~~~~~~~~~ > #include > #include > > using namespace std; > > int main() > { > this_thread::sleep_for(chrono::milliseconds(1000)); > this_thread::sleep_for(chrono::milliseconds(100)); > this_thread::sleep_for(chrono::milliseconds(10)); > > return 0; > } > ~~~~~~~~~~~~~~~ > > Now compile and record it: > > ~~~~~~~~~~~~~~~ > g++ -std=c++11 -g -O2 test.cpp > echo 1 | sudo tee /proc/sys/kernel/sched_schedstats > perf record \ > --event sched:sched_stat_sleep \ > --event sched:sched_process_exit \ > --event sched:sched_switch --call-graph=dwarf \ > --output perf.data.raw \ > ./a.out > echo 0 | sudo tee /proc/sys/kernel/sched_schedstats > perf inject --sched-stat --input perf.data.raw --output perf.data > ~~~~~~~~~~~~~~~ > > Before this patch, the report clearly shows the off-by-one issue. > Most notably, the last sleep invocation is incorrectly attributed > to the "return 0;" line: > > ~~~~~~~~~~~~~~~ > Overhead Source:Line > ........ ........... > > 100.00% core.c:0 > | > ---__schedule core.c:0 > schedule > do_nanosleep hrtimer.c:0 > hrtimer_nanosleep > sys_nanosleep > entry_SYSCALL_64_fastpath .tmp_entry_64.o:0 > __nanosleep_nocancel .:0 > std::this_thread::sleep_for > thread:323 > | > |--90.08%--main test.cpp:9 > | __libc_start_main > | _start > | > |--9.01%--main test.cpp:10 > | __libc_start_main > | _start > | > --0.91%--main test.cpp:13 > __libc_start_main > _start > ~~~~~~~~~~~~~~~ > > With this patch here applied, the issue is fixed. The report becomes > much more usable: > > ~~~~~~~~~~~~~~~ > Overhead Source:Line > ........ ........... > > 100.00% core.c:0 > | > ---__schedule core.c:0 > schedule > do_nanosleep hrtimer.c:0 > hrtimer_nanosleep > sys_nanosleep > entry_SYSCALL_64_fastpath .tmp_entry_64.o:0 > __nanosleep_nocancel .:0 > std::this_thread::sleep_for > thread:323 > | > |--90.08%--main test.cpp:8 > | __libc_start_main > | _start > | > |--9.01%--main test.cpp:9 > | __libc_start_main > | _start > | > --0.91%--main test.cpp:10 > __libc_start_main > _start > ~~~~~~~~~~~~~~~ > > Similarly it works for signal frames: > > ~~~~~~~~~~~~~~~ > #include > #include > #include > > #define __noinline __attribute__((noinline)) > > __noinline void bar(void) > { > volatile long cnt = 0; > > for (cnt = 0; cnt < 100000000; cnt++); > } > > __noinline void foo(void) > { > bar(); > } > > void sig_handler(int sig) > { > foo(); > } > > int main(void) > { > signal(SIGUSR1, sig_handler); > raise(SIGUSR1); > > foo(); > return 0; > } > ~~~~~~~~~~~~~~~~ > > Before, the report wrongly points to `signal.c:29` after raise(): > > ~~~~~~~~~~~~~~~~ > $ perf report --stdio --no-children -g srcline -s srcline > ... > 100.00% signal.c:11 > | > ---bar signal.c:11 > | > |--50.49%--main signal.c:29 > | __libc_start_main > | _start > | > --49.51%--0x33a8f > raise .:0 > main signal.c:29 > __libc_start_main > _start > ~~~~~~~~~~~~~~~~ > > With this patch in, the issue is fixed and we instead get: > > ~~~~~~~~~~~~~~~~ > 100.00% signal signal [.] bar > | > ---bar signal.c:11 > | > |--50.49%--main signal.c:29 > | __libc_start_main > | _start > | > --49.51%--0x33a8f > raise .:0 > main signal.c:27 > __libc_start_main > _start > ~~~~~~~~~~~~~~~~ > > Note how this patch fixes this issue for both unwinding methods, i.e. > both dwfl and libunwind. The former case is straight-forward thanks > to dwfl_frame_pc. For libunwind, we replace the functionality via > unw_is_signal_frame for any but the very first frame. > > Cc: Arnaldo Carvalho de Melo > Cc: David Ahern > Cc: Peter Zijlstra > Cc: Yao Jin > Signed-off-by: Milian Wolff Acked-by: Namhyung Kim Thanks, Namhyung > --- > tools/perf/util/unwind-libdw.c | 6 +++++- > tools/perf/util/unwind-libunwind-local.c | 11 +++++++++++ > 2 files changed, 16 insertions(+), 1 deletion(-) > > v2: > - use unw_is_signal_frame to also fix this issue for libunwind > > v3: > - only check current signal in libunwind case, which fixes the > example code provided by Namhyung. The example is added in the > commit message and I confirmed that libdw and libunwind now > both produce the correct results for this case > > diff --git a/tools/perf/util/unwind-libdw.c b/tools/perf/util/unwind-libdw.c > index f90e11a555b2..943a06291587 100644 > --- a/tools/perf/util/unwind-libdw.c > +++ b/tools/perf/util/unwind-libdw.c > @@ -168,12 +168,16 @@ frame_callback(Dwfl_Frame *state, void *arg) > { > struct unwind_info *ui = arg; > Dwarf_Addr pc; > + bool isactivation; > > - if (!dwfl_frame_pc(state, &pc, NULL)) { > + if (!dwfl_frame_pc(state, &pc, &isactivation)) { > pr_err("%s", dwfl_errmsg(-1)); > return DWARF_CB_ABORT; > } > > + if (!isactivation) > + --pc; > + > return entry(pc, ui) || !(--ui->max_stack) ? > DWARF_CB_ABORT : DWARF_CB_OK; > } > diff --git a/tools/perf/util/unwind-libunwind-local.c b/tools/perf/util/unwind-libunwind-local.c > index f8455bed6e65..84d553898e2a 100644 > --- a/tools/perf/util/unwind-libunwind-local.c > +++ b/tools/perf/util/unwind-libunwind-local.c > @@ -692,6 +692,17 @@ static int get_entries(struct unwind_info *ui, unwind_entry_cb_t cb, > > while (!ret && (unw_step(&c) > 0) && i < max_stack) { > unw_get_reg(&c, UNW_REG_IP, &ips[i]); > + > + /* > + * Decrement the IP for any non-activation frames. > + * this is required to properly find the srcline > + * for caller frames. > + * See also the documentation for dwfl_frame_pc, > + * which this code tries to replicate. > + */ > + if (unw_is_signal_frame(&c) <= 0) > + --ips[i]; > + > ++i; > } > > -- > 2.13.0 >